Whenever there are a number of ways to achieve the same goal, I’m always inquisitive as what the performance difference is between them. I’m a firm believer in thinking about scalability from the start - if you can make your best effort to prepare for scale, then you can save yourself time and effort further down the line. I like to try and avoid pain - it doesn’t agree with me.
In this instance, I become curious about sorting an in-memory DataTable, using C# / .NET 3.5. So I decided to run a few tests using the common approaches. Now, you may be thinking “Why not just order the results from the database using an ORDER BY clause?”. Well yes, you could do that. But what about when you want to cache some data once in memory, and then use that cached copy for subsequent purposes to prevent round-tripping back/re-hitting the database server? Or, what if the data isn’t actually coming from a database but some other source?
So I knocked up a quick test harness. For each method, I tested sorting a DataTable containing between 100 and 2 million rows of data. The DataTable contained 2 columns:
- ColumnA - integer, just an incrementing number
- ColumnB - string, in the format {Character}{Row Number} where {Character} just loops round from A-Z just to mix the records up a bit and give the need for ordering.
Method 1 - DataView.Sort
DataView vw = dt.DefaultView;
vw.Sort = "ColumnB ASC";
Method 2 - DataTable.Select
DataRow[] rows = dt.Select("", "ColumnB ASC");
Method 3 - LINQ to DataSet
var rows = (from r in dt.AsEnumerable()
orderby r["ColumnB"] ascending
select r).ToArray();
Note: the .ToArray()
bit in the LINQ above is important - this makes the execution of the query immediate. Without it, what you are actually really doing is just defining a query object. It does not execute until you try to request data from the query object - this is known as deferred execution. So, in this example, without the enclosing brackets and the subsequent .ToArray()
, the data wouldn’t actually be being sorted at this point.
Here’s the results:
No. Of Rows | Method 1 - DataView.Sort | Method 2 - DataTable.Select | Method 3 - LINQ |
---|---|---|---|
100 | 0.0625s | 0.0520s | 0.0475s |
1000 | 0.0781s | 0.0573s | 0.0573s |
10,000 | 0.1618s | 0.1094s | 0.0989s |
100,000 | 1.4793s | 0.8959s | 0.7084s |
1,000,000 | 16.1318s | 9.8290s | 8.4534s |
2,000,000 | 35.094s | 21.5995s | 18.3420s |
As you can see from my tests, LINQ to DataSet came out tops. With a smaller DataTable the difference is, as you’d expect, minimal. Though as the volume of rows increases, LINQ seems to keep out-performing the other two approaches, being nearly 100% quicker than a DataView Sort as you get to the level of hundreds of thousands of rows, and about 14% quicker than a DataTable.Select.