Debunking the LINQ “magic” myth again


I’ve blogged before about how LINQ-to-Objects, at it’s most basic, is just about building on top of enumerating, one at a time, over collections via MoveNext(). It wraps it up in a beautiful API, but it’s still generally crawling through collections. I wanted to give an example of this in more depth and how the C# team introduced some smart optimisations where possible.

And I’m sure that Jon Skeet’s fantastic EduLinq blog post goes into more detail than here, but anyway…

LINQ’s Any() vs Count()

I know a lot of people that tend to check for the existence of an empty collection with a collection.Count() == 0 check (or > 1 for non-empty ones). This obviously works, but there’s a better alternative i.e. collection.Any(). There are two reasons why you should favour this over Count: –

Readability

Compare the two examples below: –

  • if (employees.Count() > 0)
  • if (employees.Any())

The latter reads much better. The former invariably means that in your head you’re doing a “conversion” to “if there are any items”.

Performance

How does .Count() work? It simply iterates over every item in the collection to determine how many there are. In this situation, .Any() will be much, much quicker because it simply tests if there is at least one item in the collection i.e. does MoveNext() return true or not.

Decompiling LINQ

There’s one situation where the above is not true. If the underlying enumerable collection implements either version of ICollection, and you’re calling the parameter-less version of Count(), that extension method is smart enough to simply delegate to the already-calculated Count property. Again – this only applies to the parameter-less version! So for the version of Count that takes in a predicate, the first of the next two samples will generally be much quicker than the latter: –

  • employees.Any(e => e.Age < 25);
  • employees.Count(e => e.Age < 25) > 0;

How can I prove this? Well, newer versions of CodeRush have a built-in decompiler so you can look at the source code of these methods (like Reflector does). I’m sure there are other tools out there that do the same… anyway, here’s an (ever so slightly simplified) sample of the implementations of the Count() and Any() methods. First, the parameter-less versions: –

image

image

The former optimises where possible, but if it can’t, has to fall back to iterating over the entire collection. The latter simply falls out if it succeeds in finding the first element.

Now here’s the predicated versions of those methods: –

image

image

The former has to iterate over every item in the collection to determine how many passed the predicate – there’s no optimisation possible here. The latter simply finds the first item in the collection that passes the predicate and breaks there.

Conclusion

Having read through those code samples, re-read the initial example of Any versus Count and ask yourself why you would use Count Smile

Again, just to drill home the point – there is no magic to LINQ. The C# developers did a great job optimising code where possible, but at the end of the day, when you have several options open to you regarding queries, make sure you choose the correct option for the right reason to ensure your code performs as well as possible.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s