Sorting a DataTable - LINQ performance

Whenever there are a number of ways to achieve the same goal, I’m always inquisitive as to what the performance difference is between them. I’m a firm believer in thinking about scalability from the start - if you make your best effort to prepare for scale, you can save yourself time and effort further down the line. I like to try and avoid pain - it doesn’t agree with me.

In this instance, I became curious about sorting an in-memory DataTable using C# / .NET 3.5, so I decided to run a few tests using the common approaches. Now, you may be thinking “Why not just order the results from the database using an ORDER BY clause?” Well yes, you could do that. But what about when you want to cache some data in memory once, and then use that cached copy for subsequent operations to avoid round trips back to the database server? Or, what if the data isn’t actually coming from a database but from some other source?

So I knocked up a quick test harness. For each method, I tested sorting a DataTable containing between 100 and 2 million rows of data. The DataTable contained 2 columns:

ColumnA - integer, just an incrementing number
ColumnB - string, in the format {Character}{Row Number}, where {Character} cycles through A-Z to mix the records up a bit so that there is actually something to sort.
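The harness itself isn't shown in the post, so here is a minimal sketch of what one might look like, written as a modern C# top-level program rather than .NET 3.5 code. `BuildTestTable` and `Time` are hypothetical helpers, numbering the rows from 1 is an assumption, and the row counts are kept small so it runs quickly.

```csharp
using System;
using System.Data;
using System.Diagnostics;
using System.Linq;

// Compare the three sorting approaches on freshly built tables of increasing size.
foreach (int n in new[] { 100, 1_000, 10_000 })
{
    // A fresh table per method, so one method cannot benefit from an index another built.
    DataTable t1 = BuildTestTable(n), t2 = BuildTestTable(n), t3 = BuildTestTable(n);

    TimeSpan viewTime = Time(() => { t1.DefaultView.Sort = "ColumnB ASC"; });
    TimeSpan selectTime = Time(() => { t2.Select("", "ColumnB ASC"); });
    TimeSpan linqTime = Time(() => { t3.AsEnumerable().OrderBy(r => r["ColumnB"]).ToArray(); });

    Console.WriteLine($"{n,9:N0} rows  DataView.Sort {viewTime.TotalSeconds:F4}s  " +
                      $"Select {selectTime.TotalSeconds:F4}s  LINQ {linqTime.TotalSeconds:F4}s");
}

// Hypothetical helper: builds a table shaped like the one described above,
// assuming row numbers start at 1 and the leading letter cycles A..Z.
DataTable BuildTestTable(int rowCount)
{
    var table = new DataTable();
    table.Columns.Add("ColumnA", typeof(int));
    table.Columns.Add("ColumnB", typeof(string));

    table.BeginLoadData();   // turn off notifications and index maintenance during the bulk load
    for (int i = 1; i <= rowCount; i++)
    {
        char letter = (char)('A' + ((i - 1) % 26));
        table.Rows.Add(i, $"{letter}{i}");
    }
    table.EndLoadData();
    return table;
}

// Hypothetical helper: measures a single call of the given action with a Stopwatch.
TimeSpan Time(Action action)
{
    var stopwatch = Stopwatch.StartNew();
    action();
    stopwatch.Stop();
    return stopwatch.Elapsed;
}
```

Building a separate table per method is deliberate: a DataTable keeps the indexes its DataViews create, and Select may be able to reuse one that matches its sort expression, which would flatter whichever method runs second. The sketch also times a single cold run per method, so JIT compilation lands in the first iteration; for figures you'd want to rely on, warm up and repeat each measurement, or use a tool such as BenchmarkDotNet.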

Method 1 - DataView.Sort

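// Needs "using System.Data;". Setting Sort on the table's default view builds a sort index on ColumnB
DataView vw = dt.DefaultView;
vw.Sort = "ColumnB ASC";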
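Setting Sort does not reorder `dt.Rows` itself; the sorted order is only visible through the view. A small sketch of reading the rows back, using a three-row table invented purely for illustration:

```csharp
using System;
using System.Data;

// Illustrative data only: three rows deliberately out of order.
var dt = new DataTable();
dt.Columns.Add("ColumnA", typeof(int));
dt.Columns.Add("ColumnB", typeof(string));
dt.Rows.Add(1, "C1");
dt.Rows.Add(2, "A2");
dt.Rows.Add(3, "B3");

DataView vw = dt.DefaultView;
vw.Sort = "ColumnB ASC";

// The underlying row collection keeps its original order.
Console.WriteLine(dt.Rows[0]["ColumnB"]);    // C1

// Enumerating the view yields DataRowView objects in sorted order.
foreach (DataRowView rowView in vw)
    Console.WriteLine(rowView["ColumnB"]);   // A2, B3, C1

// ToTable() copies the rows into a new DataTable in the view's order.
DataTable sorted = vw.ToTable();
```

ToTable() is handy when you want a standalone sorted copy, at the cost of copying every row.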

Method 2 - DataTable.Select

// An empty filter expression selects every row; the second argument is the sort order
DataRow[] rows = dt.Select("", "ColumnB ASC");

Method 3 - LINQ to DataSet

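// Needs "using System.Linq;" and, on .NET 3.5, a reference to System.Data.DataSetExtensions.dll (for AsEnumerable)
var rows = (from r in dt.AsEnumerable()
            orderby r["ColumnB"] ascending
            select r).ToArray();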
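For reference, the same query can be written in method syntax. This variant also reads the key with `Field<string>` (from `DataRowExtensions`) instead of the `object` indexer; it is a sketch, not the code that was benchmarked here, and the sample rows are invented.

```csharp
using System;
using System.Data;
using System.Linq;

// Illustrative data only.
var dt = new DataTable();
dt.Columns.Add("ColumnA", typeof(int));
dt.Columns.Add("ColumnB", typeof(string));
dt.Rows.Add(1, "C1");
dt.Rows.Add(2, "A2");
dt.Rows.Add(3, "B3");

// Method-syntax equivalent of the query above, with a strongly typed sort key.
DataRow[] rows = dt.AsEnumerable()
                   .OrderBy(r => r.Field<string>("ColumnB"))
                   .ToArray();

foreach (DataRow row in rows)
    Console.WriteLine(row.Field<string>("ColumnB"));   // A2, B3, C1
```

One difference worth knowing: DataView.Sort and DataTable.Select compare strings using the table's Locale and CaseSensitive settings (case-insensitive by default), whereas OrderBy with a string key uses the current culture's case-sensitive comparer, so strings that differ only in case may come out in a different order. OrderBy has an overload that takes an IComparer<string>, such as StringComparer.CurrentCultureIgnoreCase, if you need the results to match.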

Note: the .ToArray() call in the LINQ query above is important, because it makes the query execute immediately. Without it, all you are doing is defining a query object, which does not run until you request data from it - this is known as deferred execution. So, in this example, without the enclosing brackets and the trailing .ToArray(), the data would not actually be sorted at this point.
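Deferred execution is easy to observe by counting how often the sort key is read. In this sketch, `CountedKey` and `keyLookups` are hypothetical names used only for the demonstration, and the two rows are invented:

```csharp
using System;
using System.Data;
using System.Linq;

var dt = new DataTable();
dt.Columns.Add("ColumnB", typeof(string));
dt.Rows.Add("B1");
dt.Rows.Add("A2");

int keyLookups = 0;

// Defining the query only builds a query object; no keys have been read yet.
var query = from r in dt.AsEnumerable()
            orderby CountedKey(r)
            select r;
Console.WriteLine(keyLookups);   // 0

// ToArray() enumerates the query, which is when the sort actually happens.
DataRow[] sorted = query.ToArray();
Console.WriteLine(keyLookups);   // no longer 0: keys were read while sorting

// Hypothetical key selector that records each time it is called.
string CountedKey(DataRow r)
{
    keyLookups++;
    return r.Field<string>("ColumnB");
}
```

The counter stays at zero until ToArray() enumerates the query, which is the point at which the rows are actually sorted.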

Here are the results:

No. of rows	Method 1 - DataView.Sort	Method 2 - DataTable.Select	Method 3 - LINQ
100	0.0625s	0.0520s	0.0475s
1,000	0.0781s	0.0573s	0.0573s
10,000	0.1618s	0.1094s	0.0989s
100,000	1.4793s	0.8959s	0.7084s
1,000,000	16.1318s	9.8290s	8.4534s
2,000,000	35.094s	21.5995s	18.3420s

As you can see from my tests, LINQ to DataSet came out on top. With a smaller DataTable the difference is, as you’d expect, minimal. But as the number of rows increases, LINQ keeps out-performing the other two approaches: from 100,000 rows upwards it is roughly twice as fast as DataView.Sort (taking about 48-52% less time), and it takes about 14-21% less time than DataTable.Select.
