A A
RSS

Improve LINQ Query Performance

Tue, Feb 19, 2008

Random

I was writing a small utility for Outlook 2007 and was using LINQ to query Outlook Tasks.  This query was nested within another query.  While debugging, I realized that looping though the query results in my For Each loop was taking too long.

If you have read anything about LINQ, I am sure you already know about LINQ’s deferred execution.  Deferred execution basically means that your query will not execute until you actually try to use it.  By "use it", I mean anything from looping through it, accessing the "Count" property, and so on. 

The idea behind deferred execution makes sense for LINQ to SQL, since the idea is that your SQL query will only be run when it is absolutely needed and can be optimized at the time it is needed – specially if you try to nest it within another query.  This could indoubtedly produce optimized SQL queries.  Like I said, this makes more sense for LINQ to SQL but doesn’t make too much sense for LINQ to objects.

Maybe an example would make more sense.  The LINQ query below is trying to find all the Outlook tasks that are not in my list of tasks.

Dim results = _
   From myTask In myTasks _
   Where Not (From outlookTask In outlookTasks _
             Select outlookTask.Subject).Contains(myTask) _
   Select myTask 

To clarify, if this was a SQL query it will be something like

select * from myTasks where name not in 
            (select subject from outlooktasks)

I think what happened was that the nested query selecting the outlook tasks was executed on every iteration in my loop which means that the Contains method ran on the entire outlook tasks list in every iteration.  I was kind of disappointed that the nested query didn’t get cached somehow or that the LINQ engine was smart enough to run it once.

Luckily I found a really neat trick to speed this up by orders of magnitudes.  I am talking minutes to seconds here.  And it was very simple.  By calling the ToList method on the LINQ query, I cause instant execution of the LINQ query and if I store the results in a variable then the query is ran only once.  So I got rid of deferred execution and I got rid of multiple executions.

My final LINQ query looked like this:

Dim outlookTasksName = (From outlookTask _
                        In outlookTasks _
                        Select outlookTask.Subject).ToList
Dim results = _
    (From myTask In myTasks _
    Where Not (outlookTasksName).Contains(myTask) _
    Select myTask).ToList

What do you think of this solution?  Do you see any drawbacks or issues?  Do you have other LINQ optimization tricks  or way to improve LINQ query performance?  Do  you think deferred execution is necessary for LINQ to objects?

NOTE: As I was writing this, I just realized that one drawback for large collections would be memory utilization.  But I think the gains in performance outweigh the cost in memory.

Tags: , , , , ,

  • Essentially what you were looking for here was the difference of two collections (myTasks and outlookTasks). Enumerable has a built in method (Except()) to handle these when the collections are of the same type, and LINQ adds the ability to pass in a query to the "Except" method so that you can transform one collection before it is used.

    Your first approach was O(N^2), iterating over one collection then iterating over the second collection once for each iteration of the first collection.

    Your second approach is better: O(N) + O(N), which will iterate through collection one, then collection 2. However as the collections get bigger this will begin to take longer, which an optimized intersect algorithm can help you to avoid!

    if you instead used:

    Dim results = _
    myTasks.Except(From outlookTask In outlookTasks _
    Select outlookTask.Subject)

    which I believe is implemented as O(N log N) which should nicely plateau and provide reliable performance even as the collections get larger, not to mention fewer lines of code!

    Hope this helped!
  • Ed, thanks for the tip, I will give it a shot and report back.
  • I'm wondering Emad, have you considered optimizing using the new GetTable method on the folder class?

    Ed
blog comments powered by Disqus
Advertise Here
The Most Intelligent Add-In To Visual Studio Happy fan of

What I'm Doing...

Yonkly Open Source

Sign up for my newsletter




* = required field

powered by MailChimp!

megree Widget

Apparently, I am connected to Obama. Check this out...
My path to Obama

Cyber Identity