The cost of creating Expression Trees


First, merry Xmas etc. etc.

Second: expression trees. I just wanted to write a short post about the actual cost of creating expression trees, which can be illustrated by comparing the following two code samples: –

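[Images: code for Sample A, which creates the expression inside the for loop, and Sample B, which creates it once outside the loop]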

See the difference? Rather than creating the expression inside the for loop (Sample A), we create it once outside the loop and reuse it for every iteration (Sample B). (FYI: ParseExpression() does absolutely nothing; it’s just there as a means to an end.) Here are the timing results for 1,000,000 iterations: –

[Image: timing results for Sample A and Sample B over 1,000,000 iterations]
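The original samples were screenshots, so here is a rough sketch of the pattern they compare. It assumes a hypothetical Employee class with an Age property and a ParseExpression method that, like the post’s, does nothing with its argument. In Sample A the compiler converts the lambda into a brand-new expression tree on every iteration; in Sample B the tree is built once and reused.

```csharp
using System;
using System.Diagnostics;
using System.Linq.Expressions;

// Hypothetical model class; the post's own types were only shown in screenshots.
public class Employee
{
    public int Age { get; set; }
}

public static class ExpressionCost
{
    // Stand-in for the post's ParseExpression(): deliberately does nothing.
    public static void ParseExpression(Expression<Func<Employee, int>> expression) { }

    // Every call converts the lambda into a new tree (new parameter, member and lambda nodes).
    public static Expression<Func<Employee, int>> CreateTree() => e => e.Age;

    // Sample A: a fresh tree is created inside the loop on every iteration.
    public static TimeSpan TimeInline(int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            ParseExpression(e => e.Age);
        }
        return stopwatch.Elapsed;
    }

    // Sample B: the tree is created once, outside the loop, and the same instance is reused.
    public static TimeSpan TimeCached(int iterations)
    {
        Expression<Func<Employee, int>> cached = e => e.Age;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            ParseExpression(cached);
        }
        return stopwatch.Elapsed;
    }
}
```

Calling TimeInline(1_000_000) and TimeCached(1_000_000) reproduces the shape of the comparison; the absolute numbers will depend on your machine and runtime.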

Running the performance tuning wizard against this shows some cheeky reflection calls going on behind the scenes, which even trigger a performance warning: –

[Images: profiler output showing the reflection calls, and the resulting performance warning]
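To see where the reflection creeps in, here is a hand-written sketch of roughly what the compiler builds for e => e.Age each time Sample A’s loop body runs. The compiler’s real output differs in detail (it resolves the property getter through a method handle rather than looking the property up by name), but it still performs a reflection lookup and allocates new node objects on every evaluation. Employee is again a hypothetical class.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical model class for illustration.
public class Employee
{
    public int Age { get; set; }
}

public static class ManualTree
{
    // Approximately what `Expression<Func<Employee, int>> f = e => e.Age;` expands to.
    public static Expression<Func<Employee, int>> BuildAgeSelector()
    {
        // Node for the lambda parameter "e".
        ParameterExpression e = Expression.Parameter(typeof(Employee), "e");

        // Reflection: look up the metadata for the Age property.
        PropertyInfo age = typeof(Employee).GetProperty("Age");

        // Node for the member access "e.Age".
        MemberExpression body = Expression.Property(e, age);

        // Root node: the lambda itself.
        return Expression.Lambda<Func<Employee, int>>(body, e);
    }
}
```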

The lesson to be learned? Expression Trees are an extremely useful tool in your arsenal and can give big performance gains over, e.g., reflection or even dynamic, but make sure you cache the expressions themselves; otherwise the cost of creating the tree can outweigh the benefits gained from them.

Performance or readability with Expression Trees in C#


On Friday I was fumbling around trying to create some expression trees, working through them with a colleague of mine (someone who actually knows how to do it better than me!), and I got to thinking about how different ways of doing the same thing in C# compare, in terms of both runtime performance and readability / ease of development.

Let’s say we have the following type hierarchy:

[Image: the example type hierarchy]

Easy enough. Now we want, for some arbitrary reason, to write a generic method that will set a specific integer property to the value of 25, but we don’t know in advance which property it is. In our example above, that could be either the employee’s Age or the AddressId. You could do this with reflection: –

[Images: the reflection-based code]
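The reflection version was also a screenshot. A minimal sketch of the same idea follows, assuming a hypothetical SetTo25 helper that takes the property name as a string and a flat Employee class with Age and AddressId (the post’s actual type hierarchy isn’t reproduced here).

```csharp
using System;
using System.Reflection;

// Hypothetical model; in the post these properties come from a small type hierarchy.
public class Employee
{
    public int Age { get; set; }
    public int AddressId { get; set; }
}

public static class ReflectionSetter
{
    // Sets the named int property on any object to 25, using reflection.
    public static void SetTo25<T>(T instance, string propertyName)
    {
        PropertyInfo property = typeof(T).GetProperty(propertyName);
        if (property == null || property.PropertyType != typeof(int) || !property.CanWrite)
        {
            throw new ArgumentException($"{typeof(T).Name} has no writable int property '{propertyName}'.");
        }

        // The value is boxed and the setter is invoked late-bound.
        property.SetValue(instance, 25, null);
    }
}
```

Nothing stops a caller passing "Agee" by mistake; the typo only surfaces at runtime, which is the weakness the expression tree version addresses.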

Expression Trees

Nice. This works well, no problems whatsoever. There is another, strongly typed way of doing it though: expression trees. By passing in an expression tree that navigates to the property, we can construct at runtime an Action method specific to what we want, and then call that method as needed. Here’s how we would consume such a method: –

[Image: code consuming CreatePropertySetter to build ageSetter and addressIdSetter]

ageSetter and addressIdSetter are two delegates that we construct at runtime to set the appropriate property to the value of 25. The main benefit of this approach is obvious, i.e. strong typing. But there’s another, more subtle benefit: performance. To understand why, we need to look at how the method is implemented.

CreatePropertySetter returns an Action<T> which, when called, sets the appropriate property to 25 (I’ve elided the GetMembersEnumerator method for clarity): –

[Image: implementation of CreatePropertySetter]
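The implementation itself was a screenshot. Below is a minimal sketch of how such a method could work on .NET 4 and later, which added Expression.Assign. The signature (a selector of type Expression<Func<T, int>>) is an assumption, the Setters class and Employee model are hypothetical, and the sketch only handles a property access such as e => e.Age; it does not attempt whatever the elided GetMembersEnumerator method does.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical model class for illustration.
public class Employee
{
    public int Age { get; set; }
    public int AddressId { get; set; }
}

public static class Setters
{
    // Turns a selector such as e => e.Age into a compiled instance => instance.Age = 25.
    public static Action<T> CreatePropertySetter<T>(Expression<Func<T, int>> selector)
    {
        if (!(selector.Body is MemberExpression member))
        {
            throw new ArgumentException("Selector must be a property access such as e => e.Age.");
        }

        // Reuse the selector's own parameter so the member access stays bound to it.
        ParameterExpression instance = selector.Parameters[0];

        // "assign property" node: instance.Age = 25
        BinaryExpression assign = Expression.Assign(member, Expression.Constant(25));

        // Compiling is the expensive step; the returned delegate is cheap to call.
        return Expression.Lambda<Action<T>>(assign, instance).Compile();
    }
}
```

Usage mirrors the post: var ageSetter = Setters.CreatePropertySetter<Employee>(e => e.Age); then ageSetter(employee). A misspelt property in the selector is now a compile-time error rather than a runtime one.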

To make it easier to see what happens when we call the method with e => e.Age, here’s what the above method looks like with debugger quickwatches pinned: –

[Image: CreatePropertySetter with debugger quickwatches pinned]

By compiling the final expression, we get a proper Action method that is equivalent to instance => instance.Age = 25, except that we’ve constructed the code at runtime.

I find Expression Trees difficult to get my head around. I understand the idea of them: essentially building up a tree of nodes like “assign property”, “call method” and “if”, which can then be compiled into a lambda. But the Expression API has lots of factory methods and many ways to do the same thing; I guess with time you gain familiarity with the API, but it’s definitely not the easiest thing in the world to learn. Worse still, look at the code required to do the same thing as that reflection code: much more effort and much less readable.
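To make the “assign property”, “call method” and “if” idea concrete, here is a small sketch (the rule itself is made up for illustration, and Employee is a hypothetical class) that builds instance => { if (instance.Age < 0) instance.Age = Math.Abs(instance.Age); } node by node with the Expression factory methods.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical model class for illustration.
public class Employee
{
    public int Age { get; set; }
}

public static class TreeNodes
{
    // Builds: instance => { if (instance.Age < 0) instance.Age = Math.Abs(instance.Age); }
    public static Action<Employee> BuildAgeFixer()
    {
        ParameterExpression instance = Expression.Parameter(typeof(Employee), "instance");
        MemberExpression age = Expression.Property(instance, "Age");

        // "call method" node: Math.Abs(instance.Age)
        MethodCallExpression abs = Expression.Call(
            typeof(Math).GetMethod("Abs", new[] { typeof(int) }), age);

        // "if" node wrapping an "assign property" node.
        ConditionalExpression body = Expression.IfThen(
            Expression.LessThan(age, Expression.Constant(0)),
            Expression.Assign(age, abs));

        return Expression.Lambda<Action<Employee>>(body, instance).Compile();
    }
}
```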

However, where performance is concerned, Expression Trees will beat Reflection by miles, as long as you cache the Action method that is generated! Constructing, parsing and compiling the expression tree that produces the Action is an expensive operation, but once the Action is cached the cost of each call is extremely low, as you are simply calling an Action method, almost the same as if you had written it yourself. For example, when setting both the Age and AddressId properties to 25 for 2.5 million Employees, I observed the following timings: –

Type of code applied    Time required (ms)
Direct assignment       144
Reflection              3,881
Expression Trees        349
Dynamic                 597
Hard-coded lambdas      136

Obviously the other three options (direct assignment, dynamic and hard-coded lambdas) wouldn’t be appropriate for the problem at hand (generic property assignment), but I wanted to show how reflection and expression trees compare against them. By the way, notice how much quicker dynamic is than reflection (and the reflection timings were obtained with the PropertyInfo objects cached, too). Funnily enough, when I wrote hard-coded lambdas such as instance => instance.Age = 25 and called them, they outperformed plain code like instance.Age = 25. Why?
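The expression tree timings depend on building each setter once and reusing it. One way to do that is sketched here with a hypothetical SetterCache<T> keyed on property name (a ConcurrentDictionary, so it is safe to share across threads): the delegate is compiled on first request and the same instance is handed back afterwards.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

// Hypothetical model class for illustration.
public class Employee
{
    public int Age { get; set; }
    public int AddressId { get; set; }
}

public static class SetterCache<T>
{
    // One compiled setter per property name, shared by all callers.
    private static readonly ConcurrentDictionary<string, Action<T, int>> Setters =
        new ConcurrentDictionary<string, Action<T, int>>();

    public static Action<T, int> Get(string propertyName) =>
        Setters.GetOrAdd(propertyName, Build);

    // The expensive part: build and compile (instance, value) => instance.Property = value.
    private static Action<T, int> Build(string propertyName)
    {
        ParameterExpression instance = Expression.Parameter(typeof(T), "instance");
        ParameterExpression value = Expression.Parameter(typeof(int), "value");
        BinaryExpression assign = Expression.Assign(Expression.Property(instance, propertyName), value);
        return Expression.Lambda<Action<T, int>>(assign, instance, value).Compile();
    }
}
```

In a hot loop you would still fetch the delegate once (var setAge = SetterCache<Employee>.Get("Age");) and then call setAge(employee, 25), so that even the dictionary lookup isn’t repeated.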

Conclusion

When you next use reflection for property assignments or method calls, think about using Expression Trees instead, particularly where performance is a factor. Expression Trees are time-consuming to write and understand, but they offer superior performance and can be consumed in a strongly-typed manner. Alternatively, consider using dynamic where possible: for property setters and getters and method calls it also performs better than reflection, and whilst obviously not strongly-typed, the code is far more readable.

VS2010 Professional Review Part 2 – Dynamic Typing in C#4.0


Visual Studio 2010 comes with the next version of C#: version 4.0. The most controversial feature of this version seems to be dynamic typing, which is built on top of the DLR (also part of .NET 4). No, that’s not the Docklands Light Railway; it’s the Dynamic Language Runtime. The DLR is a new layer that sits on top of the Common Language Runtime (CLR) in .NET and brings some of the features of dynamic languages (Python, Ruby) to existing CLR languages such as C# and VB, as well as to some of the newer .NET languages that are appearing.

My (hugely limited) understanding of dynamic languages is that they are primarily weakly typed, i.e. there is little or no compile-time checking that, e.g., the properties you access actually exist. Think how in JavaScript you can simply assign a value to a property without having explicitly declared that property first. In some languages, like Ruby, this is a fundamental part of the language, and you can do things that look positively weird to a C# coder the first time you see them.

So, what is dynamic typing in the C# sense? Something like this (albeit a contrived example, as usual). Here are two classes, Employee and Person. They are unrelated in terms of class hierarchy:

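// Gender isn't shown in the original; a minimal enum is assumed here so the sample compiles.
enum Gender { Female, Male }

class Employee
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string Department { get; set; }
}

class Person
{
    public string Name { get; set; }
    public Gender Gender { get; set; }
    public int Age { get; set; }
}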
Suppose we wanted to print the Name and Age of all Persons and all Employees to the console. We would probably write two overloads of a method, one taking an Employee and one taking a Person, each printing the Name and Age, and let the compiler choose which one to call depending on the type of object we’re dealing with. Or we might have a single method that does an if / then check on the type (ugly :-). Or maybe we’d have a single method that uses reflection to get the two common properties out of these types and print the details:
 
static void PrintDetails(object detailsContainer)
{
    object name = detailsContainer.GetType().GetProperty("Name").GetValue(detailsContainer, null);
    object age = detailsContainer.GetType().GetProperty("Age").GetValue(detailsContainer, null);

    System.Console.WriteLine("{0} is {1} years old.", name, age);
}

Obviously, this is weakly typed: we’re passing in objects. At first glance, this is completely against what most C# coders have been taught to do. But consider: in some ways this is nicer than having two separate methods that do 99% of the same thing. It’s easier to see what’s going on, i.e. a single method that prints the details of an object to the console, rather than two methods that at first glance do the same thing. But there are problems: –
  1. Weak typing. Let’s leave this discussion for later on…
  2. The syntax to get the properties is ugly. Let’s deal with this now 🙂

Reflection is Ugly!

Using reflection to get the value of a property isn’t hard to do in .NET, but it is a little strange to look at.

  1. Get the Type of the object you want to interrogate, e.g. MyObject.GetType();
  2. Get the PropertyInfo of the property that you want, e.g. MyType.GetProperty("ThePropertyIWant");
  3. Get the value from the PropertyInfo for a particular object of that type, e.g. MyProperty.GetValue(MyObject, null);

So that’s a chain of three method calls to get any given property; at a glance, it’s all a bit “weird” to see what’s going on.

Here’s how we would write the same PrintDetails method in C# using Dynamic typing:

static void PrintDetails(dynamic detailsContainer)
{
    System.Console.WriteLine("{0} is {1} years old.", detailsContainer.Name, detailsContainer.Age);
}

The main difference is the use of the “dynamic” type instead of “object” as the type of the method parameter. “dynamic” is a new type in C# which, in reality, is just plain old object. But it tells the compiler not to check any method invocations or property accesses until runtime, when they are resolved by the C# runtime binder on top of the DLR (and you therefore get no IntelliSense when manipulating dynamic objects). So it’s no more or less weakly typed than the reflection-based example, just a whole lot easier to read.

You can treat any object as dynamic and then do what you want with it, but you can of course get this “wrong”: if I called a property or method that did not exist, I’d get a runtime error, just like you would with reflection. (Mike Taulty has a good blog post about how overloaded methods are resolved when dynamic arguments are involved.)
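A small sketch of both sides of that trade-off, using hypothetical Describe and MissingMemberFails helpers (and Employee / Person classes trimmed to the properties used): the same dynamic method works for the two unrelated classes, but a misspelt member compiles happily and only fails at runtime with a RuntimeBinderException from Microsoft.CSharp.RuntimeBinder.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public class Employee
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class DynamicDemo
{
    // Same idea as PrintDetails, but returns the text so it can be checked.
    public static string Describe(dynamic detailsContainer) =>
        string.Format("{0} is {1} years old.", detailsContainer.Name, detailsContainer.Age);

    // True if reading a member that doesn't exist fails at runtime rather than at compile time.
    public static bool MissingMemberFails(object target)
    {
        dynamic container = target;
        try
        {
            object salary = container.Salary; // compiles: members of dynamic aren't checked
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }
}
```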

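And just to clarify: dynamic is NOT the same as var! A dynamic variable is completely weakly typed; you don’t even get IntelliSense for the Object methods like ToString() and GetType() that every type has (although calls to them still work at runtime). var, on the other hand, just asks the compiler to infer the type, so the variable is always strongly typed, even if it’s an anonymous type.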
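The difference shows up clearly in a short sketch (the VarVersusDynamic.Demo name is just for illustration): a var local keeps the type the compiler inferred, even for an anonymous type, whereas a dynamic local can be reassigned to anything, and calls on it, including GetType(), are bound at runtime.

```csharp
using System;

public static class VarVersusDynamic
{
    public static string Demo()
    {
        var count = 5;                 // inferred as int; `count = "five";` would not compile
        var anonymous = new { Name = "Ann", Age = 30 };
        int age = anonymous.Age;       // still checked at compile time, even for an anonymous type

        dynamic anything = 5;          // static type is dynamic, runtime type is int
        anything = "five";             // allowed: it now holds a string
        string runtimeType = anything.GetType().Name; // resolved at runtime, works fine

        return $"{count.GetType().Name},{age},{runtimeType}";
    }
}
```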


Static versus Dynamic Typing

Coming from a C# or C++ background, you might be wondering “why do we want features like this? Isn’t static typing better than dynamic typing?”. Well, the impression I get from some of the interviews with people involved in the development and evolution of .NET is that they see C# and VB .NET becoming more like hybrid languages in the future, offering “best of breed” features from both static and dynamic languages, just as in C# 3 they took declarative features from, e.g., SQL, which became LINQ, and merged them with imperative language features like for loops.

So, I think the answer to “is dynamic typing a good thing” is something like “yes, in its right place”. I wouldn’t expect us to chuck interfaces and class hierarchies etc. out of the window just because we can – but dynamic typing can be used in a few places in C# to make our lives a lot easier (and more readable!). Here are some examples: –

  • Making code more readable by hiding Reflection. Talking of reflection, a lot of the time (always?) you will be able to avoid using reflection-style calls to get properties and methods and simply use the dynamic features in C# 4 instead.
  • Interacting with dynamic languages. You can already have C# and VB .NET classes talk to one another, because they are both built on .NET. However, there are now some languages cropping up in the .NET world, such as IronPython and IronRuby, which are .NET versions of Python and Ruby. In order to interact from the C# world with .NET assemblies written in those languages, you need support for dynamic typing in C#.
  • Talking to COM libraries. There are some great examples on the Internet showing how much easier it is to interact with COM methods using dynamic typing than using Reflection. If you’ve ever done COM interop, such as Excel or Word document manipulation, you’ll know that this is a nightmare – using the dynamic keyword makes your code a lot more readable! This is also especially useful when interacting with e.g. JavaScript in Silverlight.

However, there’s also a risk that people stop using “proper” static typing concepts such as interfaces and class hierarchies simply because they can’t be bothered and using dynamic is “easier”. I’m not saying that this approach is something out of the “dark side”; I’m sure there are times when it’ll be a real timesaver. And, provided you have lots of good unit tests, you can probably get away with dynamic typing when required, but it’s not an excuse for breaking OO rules!

If you want to find out more about dynamic, there are some decent videos on Channel9 and MSDN; there’s one in particular by Anders Hejlsberg that goes through the feature in great depth and is worth checking out.