Archive
Why Entity Framework renders the Repository pattern obsolete?
A post here on a pattern I thought was obsolete yet I still see cropping up in projects using EF time and time again…
What is a Repository?
The repository pattern – to me – is just a form of data access gateway. We use it to provide an abstraction above the details of data access, as well as to provide testability to its calling clients, e.g. services or perhaps even view models / controllers. A typical repository will have methods such as the following:
interface IRepository<T>
{
    T GetById(Int32 id);
    T Insert(T item);
    T Update(T item);
    T Delete(T item);
}
interface ICustomerRepository : IRepository<Customer>
{
    Customer GetByName(String name);
}
And so on. You’ll probably create a Repository<T> class which does the basic CRUD work for any <T>. Each one of these repositories will delegate to an EF ObjectContext (or DbContext for newer EF versions), and they’ll offer you absolutely nothing. Allow me to explain…
Getting to EF data in Services
Let’s illustrate the two different approaches with a simple example service method that gets the first customer whose name is an arbitrary string. In terms of objects and responsibilities, the two approaches are somewhat different. Here’s the Repository version: -
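public class Service
{
    private readonly ICustomerRepository customerRepository;
    public Service(ICustomerRepository customerRepository)
    {
        this.customerRepository = customerRepository;
    }
    public Customer GetCustomer(String customerName)
    {
        return customerRepository.GetByName(customerName);
    }
}
public class CustomerRepository : ICustomerRepository
{
    private readonly DatabaseContext context;
    public CustomerRepository(DatabaseContext context)
    {
        this.context = context;
    }
    // GetById, Insert, Update and Delete omitted for brevity.
    public Customer GetByName(string customerName)
    {
        return context.Customers.First(c => c.Name == customerName);
    }
}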
Using the Repository pattern, you generally abstract out your actual query so that your service does any “business logic”, e.g. validation, and then orchestrates repository calls, e.g. get customer 4, amend name, update customer 4 and so on. You’ll also invariably end up templating (which if you read my blog regularly you know I hate) your Repositories for common logic like First, Where etc. – all these methods will just delegate onto the equivalent method on DbSet.
If you go with the approach of talking to EF directly, you enter your queries directly in your service layer. There’s no abstraction layer between the service and EF.
public class ServiceTwo
{
    private readonly DatabaseContext context;
    public ServiceTwo(DatabaseContext context)
    {
        this.context = context;
    }
    public Customer GetCustomer(String customerName)
    {
        return context.Customers.First(c => c.Name == customerName);
    }
}
So there’s now just one class, the service, which is coupled to DatabaseContext rather than CustomerRepository; we perform the query directly in the service. Notice also that Context contains all our repositories e.g. Customers, Orders etc. as a single dependency rather than one per type. Why would we want to do this? Well, you cut out a layer of indirection, reduce the number of classes you have (i.e. the whole Repository hierarchy vs a fake DbContext + Set), making your code quicker to write as well as easier to reason about.
Aha! Surely now we can’t test out our services because we’re coupled to EF! And aren’t we violating SRP by putting our queries directly into our service? I say “no” to both.
Testability without Repository
How do we fix the first issue, that of testability? There are actually many good examples online for this, but essentially, think about this – what is DbContext? At its most basic, it’s a class which contains multiple properties, each implementing IDbSet<T> (notice – IDbSet, not DbSet). What is IDbSet<T>? It’s the same thing as our old friend, IRepository<T>. It contains methods to Add, Remove etc., and in addition implements IQueryable<T> – so you get basically the whole LINQ query set including things like First, Single, Where etc.
Because DbSet<T> implements the interface IDbSet<T>, you can write your own one which uses e.g. an in-memory List<T> as a backing store instead. This way your service methods can work against in-memory lists during unit tests (easy to generate test data, easy to prove tests for), whilst going against the real DbContext at runtime. You don’t need to play around with mocking frameworks – in your unit tests you can simply generate fake data and place them into your fake DbSet lists.
I know that some people whinge about this saying “it doesn’t prove the real SQL that EF will generate; it won’t test performance” etc. That’s true – however, this approach doesn’t try to solve that – what it does try to do is to remove the unnecessary IRepository layer and reduce friction, whilst improving testability – for 90% of your EF queries e.g. Where, First, GroupBy etc., this will work just fine.
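Here is a minimal sketch of the idea, written without an EF reference so that it stands alone. The ICustomerContext interface is an assumption standing in for a context interface whose property would, with real EF, be an IDbSet<Customer>; Customer, FakeCustomerContext and CustomerService are likewise illustrative names rather than anything from EF itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for this sketch.
public class Customer
{
    public string Name { get; set; }
}

// Assumed stand-in for the context interface. With real EF this property
// would be an IDbSet<Customer>, which is itself an IQueryable<Customer>.
public interface ICustomerContext
{
    IQueryable<Customer> Customers { get; }
}

// Fake context backed by an in-memory list, intended for unit tests.
public class FakeCustomerContext : ICustomerContext
{
    private readonly List<Customer> customers = new List<Customer>();

    public IQueryable<Customer> Customers
    {
        get { return customers.AsQueryable(); }
    }

    public void Add(Customer customer)
    {
        customers.Add(customer);
    }
}

// The service keeps its query inline and depends only on the interface.
public class CustomerService
{
    private readonly ICustomerContext context;

    public CustomerService(ICustomerContext context)
    {
        this.context = context;
    }

    public Customer GetCustomer(string customerName)
    {
        return context.Customers.First(c => c.Name == customerName);
    }
}
```

In a unit test you would fill a FakeCustomerContext with a handful of customers and hand it to the service; at runtime the real context would take its place.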
Violation of SRP
This one is trickier. You ideally want to be able to reuse your queries across service methods – how do we do that if we’re writing our queries inline in the service? The answer is – be pragmatic. If you have a query that is used once and once only, or a few times but is a simple Where clause – don’t bother refactoring for reuse.
If, on the other hand you have a large query that is being used in many places and is difficult to test, consider making a mockable query builder that takes in an IQueryable, composes on top of it and then returns another IQueryable back out. This allows you to create common queries yet still be flexible in their application – whilst still giving you the ability to go directly to your EF context.
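A rough sketch of such a query builder might look like the following. Order, IOrderQueries and OrderQueries are hypothetical names invented for the example, and the filter itself is arbitrary; the point is only the IQueryable-in, IQueryable-out shape.

```csharp
using System;
using System.Linq;

// Hypothetical entity used only for this sketch.
public class Order
{
    public int CustomerId { get; set; }
    public decimal Total { get; set; }
    public bool IsCancelled { get; set; }
}

// An interface makes the builder easy to substitute in service tests.
public interface IOrderQueries
{
    IQueryable<Order> LargeActiveOrders(IQueryable<Order> orders, decimal minimumTotal);
}

public class OrderQueries : IOrderQueries
{
    // Composes on top of the incoming query and hands back another query,
    // so nothing is executed until the caller enumerates the result.
    public IQueryable<Order> LargeActiveOrders(IQueryable<Order> orders, decimal minimumTotal)
    {
        return orders.Where(o => !o.IsCancelled && o.Total >= minimumTotal);
    }
}
```

Because the result is still an IQueryable, a service can carry on composing, for example by adding a further Where on CustomerId before the query is finally executed.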
Conclusion
Testability is important when writing EF-based data-driven services. However, the Repository pattern offers little when you can write your services directly against a testable EF context. You can in fact get much better testability from a service-with-an-EF-context based approach than with a repository alone, as you can test out your LINQ queries against a fake context, which at least proves your query represents what you want semantically. It’s still not a 100% tested solution, because your tests do not exercise the EF IQueryable provider – so it’s important that you still have some form of integration and / or performance tests against your services.
Tips on writing an EF-based ETL
I’ve been working on a relatively small ETL (that’s Extract-Transform-Load) process recently. It’s been written in C# using EF4 and is designed to migrate some data – perhaps a million rows in total – from one database schema to another as a one-off job. Nothing particularly out of the ordinary there; the object mapping is somewhat tricky but nothing too difficult. There were a few things I’ve been doing lately to improve the performance of it, and I thought I’d share some of those with you.
Separate out query and command. This is probably the hardest thing to change after you get started, so think about this up front. What I mean is this: let’s say you write some code that navigates through a one-to-many object graph and for each child, fires off a query to the DB to retrieve some other data, and then acts on that bit of retrieved data. You then discover that sometimes, you’ll have 5,000 children in the graph, which equates to 5,000 queries being fired off to the DB. Instead of this, why not just write a single set-based query which performs a “where in”-style query to retrieve all the data you’ll need in one go? Then you can, in memory, iterate over each of them one at a time. This will give you a big performance boost. In order to do this, you need to be careful to construct your code such that you decouple the part that queries the data to retrieve a collection of objects from the part that operates on each single object one at a time, and then have a controlling method which orchestrates the two together. Doing this design upfront is much, much easier than trying to do it afterwards. In essence, if you know up front what data you need to operate on, try to pull as much of that in together rather than doing it bit-by-bit.
Keep your object graphs as small as possible. By this I mean do not read in fields or elements of the graph that you do not need. If necessary, construct DTOs and use EF’s projection capabilities to construct them, rather than reading back an entire EF-generated object graph when you only need 5 properties out of the 150 available.
Use the VS performance profiler. Or ANTS etc. The best part of the VS perf tool is the Tier Interaction Profiler (TIP). This monitors all the queries you fire off to the DB and shows you stats like how many of them there were, how long they took etc. – great for finding bottlenecks.
Avoid lazy loading. It’s seductive but will again negatively impact performance by quietly hammering the database without you realising it.
Use compiled EF queries. For queries that you are repeatedly calling – especially complex ones – compiling them will give you a nice boost in performance.
Keep the “destination” EF context load as low as possible. In the context of EF, this means NOT doing things like using a single context for the entire process. It’ll just get bigger and bigger as you add stuff to it. Instead, try to keep contexts relatively short-lived – perhaps one for every x source aggregates that you process.
Use No-Tracking for read-only (source) EF contexts. This means you can essentially just re-use the same context across the whole application as the context just becomes a gateway to the DB, but nothing more.
Do not batch up too much in one go. The ObjectStateManager that tracks changes behind the scenes of an EF context is a glutton for memory. If you have a large graph to save, do even 500 or 1,000 insertions and you’ll see your memory footprint creeping up and up; calling SaveChanges() on a regular basis can alleviate this (at least, that’s the behaviour that I’ve observed).
Separate out writing to reference and entity data. If you are inserting reference data, create an entirely separate context for it. Do NOT share the entities with your main “destination” entity model. Instead, just refer to reference data items by ID. The benefits of doing this are that you can much more easily cache your reference data without falling foul of things like attaching the same object into multiple contexts etc.
Ask yourself if you really need to use EF. There are many forward-only data-access mechanisms available in .NET that out-perform EF. For reading from the source database, you could use something like Dapper or Massive instead of EF. I can’t comment on the performance of Massive, but Dapper is certainly extremely quick. You will lose the ability to write LINQ queries though, and will have to manually construct your entire source database domain model. Again though, that may not be such a bad thing if you design DTOs that work well in a set-based query environment.
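To make the first tip (separating query from command) more concrete, here is a small sketch. Child, Detail and EtlSketch are hypothetical names, and the IQueryable<Detail> is assumed to be whatever your source context exposes; in-memory data behaves the same way, which is what makes the split easy to test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical source types used only for this sketch.
public class Child
{
    public int Id { get; set; }
}

public class Detail
{
    public int ChildId { get; set; }
    public string Value { get; set; }
}

public static class EtlSketch
{
    // Query: a single set-based ("where in") lookup for every child at once,
    // grouped by child id so each child can find its rows in memory.
    public static ILookup<int, Detail> LoadDetails(IQueryable<Detail> details, IEnumerable<Child> children)
    {
        List<int> ids = children.Select(c => c.Id).ToList();
        return details.Where(d => ids.Contains(d.ChildId)).ToLookup(d => d.ChildId);
    }

    // Command: works on one child using only data that is already loaded.
    public static string Transform(Child child, IEnumerable<Detail> detailsForChild)
    {
        return child.Id + ":" + string.Join(",", detailsForChild.Select(d => d.Value));
    }

    // Controller: orchestrates the query and the per-item work.
    public static List<string> Run(IQueryable<Detail> details, IList<Child> children)
    {
        ILookup<int, Detail> lookup = LoadDetails(details, children);
        return children.Select(c => Transform(c, lookup[c.Id])).ToList();
    }
}
```

With thousands of ids you may want to chunk the id list rather than send one enormous “where in” clause; that is a judgment call for your own data rather than something the sketch attempts.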
Creating a testable WCF RIA Domain Service – Part 2
I posted a few weeks ago about writing a testable LinqToEntities domain service and came up with a number of options. The last one was my preferred choice i.e. use of the new keyword to “overwrite” the real context on the base class with an interface that your context implements. I’ve now refined it to create a very simple class that I call the MockableDomainService.
Here’s how it works: -
1. Create a mockable EF object context
This has been covered in many places on the net so I won’t bore you with the details; instead I’ll point you here again and show a class diagram of how your mocked context might look:
2. Copy the code for MockableDomainService
MockableDomainService inherits from LinqToEntitiesDomainService, but unlike that service class, this one is not as closely coupled to your ObjectContext. Instead it has two generic parameters – the ObjectContext (in our example, BusinessContext) and the testable interface that it implements (IBusinessContext).
You can download the code here, but here’s the code (I’m pasting the code as a screenshot since I still can’t figure out how to gracefully paste code into WordPress with Windows Live Writer…)
3. Inherit from MockableDomainService
Now take your real domain service, and inherit from MockableDomainService instead of the LinqToEntitiesDomainService. In your production code, simply create it with the default constructor. When you want to test your domain service, create it passing in your Fake object context instead. All your Domain Service queries will now execute against that instead – job done!
There are still things that need work on this mockable domain service – you’ll still need to get it to handle change sets and the like – basically all the other stuff that the LinqToEntities domain service gives you – but for getting going and at least testing out your queries, this should work just fine.
Testing out the Entity Framework WCF RIA Services Domain Service
I’ve started using RIA Services lately and have been testing it out with an EF4 back-end. Generally I’m quite impressed with it, as it gives you several things out of the box that you would otherwise need to spend time coding:
- Hosting of the WCF service
- IQueryable on the client
- Batch updates through the client-side domain context
- Change tracking of client-side entities
- Change notification of client-side entities
The last two are particularly nice as they mean that you don’t have to do this work on your server-side domain model – RIA services automatically code gens up client-side types that do this automatically.
On the server side, you have a domain service which is essentially just a class that inherits from DomainService, which in turn acts as a smart WCF service, doing a lot of the work of hosting the service etc. for you for free. However, if you’re using EF4, you can inherit from LinqToEntitiesDomainService<TObjectContext>, which gives you some more things out of the box that you would normally have to code up for your domain service. This includes a property called ObjectContext which is the context used for accessing your data store.
Testing out your domain service
Unfortunately, the above EF Domain Service couples itself to the concrete object context, and thus rules out testing your domain service or the queries that you execute in that service. I looked around and found precious few examples on how to test out a LinqToEntities Domain Service. There’s an example for testing out a LinqToSql service, but that’s it. So, here are a few options I came up with to allow you to test out your Entity Framework Domain Service.
Use the standard Domain Service
Here, you simply use the standard domain service rather than the EF one, which gives you the ability to inject your ObjectContext as you see fit.
Create a “regular” domain service, and reference an interface to your object context (let’s call it IObjectContext) rather than a concrete object context (I suggest looking here for details on how to use IObjectSet as a way to mock the object context). You can inject the context manually or use an IoC framework to do it for you. Then, you write queries to your data source through the IObjectContext.
This solution functions, but you will lose any extra features that the LinqToEntitiesDomainService gives you – I’ll blog about this in the future.
Use the repository pattern to test your queries
In this example, you use the real EF Domain Service, but instead of writing queries directly in your service, you create a Repository class which takes in an IObjectContext and executes its queries there. You can test out your repository queries in the normal manner i.e. in your unit tests, create a fake ObjectContext and supply that to your repository class.
However, you cannot test out any of your domain service “business logic” code in MyService as this is still tightly coupled to the concrete ObjectContext. Only your queries against the context can be tested.
Put your real logic in a delegated class
Here, you create a class which will have the same method signatures as your domain service, and your domain service delegates all calls to it. Your domain service is a “proper” EF Domain Service, so you get all the benefits of lifecycle management etc. of the context, but still have the ability to fake out the context to make your code testable. In effect, your Domain Service becomes a proxy for the real business logic.
Create an interface for your Domain Service which has all the methods that you’re going to expose. Then, create another class which implements that interface and contains all your business logic and queries. In essence, this is the “real” domain service, except it doesn’t inherit from DomainService. In your Domain Service class, you create an instance of this “delegated” service, passing in the concrete object context, and delegate all calls to it. Because the “delegated” service class is only coupled to IObjectContext, you can fully test it, yet you get the benefits of the EF Domain Service because that class is still created.
However, this is possibly the most complicated of all the different ways of abstracting away the context, and you have a somewhat ugly manual delegation of calls going on – and for every new domain method you create, you need to update the interface etc., so it’s a bit of work.
Shim in a fake context
In this scenario, you simply create a new property on your EF Domain Service which hides the “real” Object Context. The new property is typed as the interface; all access to the context goes via this property rather than the base ObjectContext.
With this fairly lightweight approach, you create a property with the same name as the “real” context that exists on the base class, ObjectContext, and use the new keyword to hide it. You then use constructor-based injection to use a fake context when required; otherwise you use the base ObjectContext.
With this approach, as long as you remember to never use base.ObjectContext, you can effectively shim in a fake context without any need for proxies or repository classes; your service is still fully testable and you get all the goodness of the LinqToEntities Domain Service.
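The shim can be sketched roughly as follows. To keep the sample self-contained, DomainServiceBase<TContext> is an assumed stand-in for LinqToEntitiesDomainService<TContext>, and BusinessContext, IBusinessContext, FakeBusinessContext and CustomerDomainService are illustrative names; only the use of the new modifier to hide the base property is the point here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string Name { get; set; }
}

public interface IBusinessContext
{
    IQueryable<Customer> Customers { get; }
}

// Stand-in for the concrete EF ObjectContext.
public class BusinessContext : IBusinessContext
{
    public IQueryable<Customer> Customers
    {
        get { throw new InvalidOperationException("Would hit the real database."); }
    }
}

// Assumed stand-in for LinqToEntitiesDomainService<TContext>.
public abstract class DomainServiceBase<TContext> where TContext : class, new()
{
    private TContext context;

    protected TContext ObjectContext
    {
        get { return context ?? (context = new TContext()); }
    }
}

public class CustomerDomainService : DomainServiceBase<BusinessContext>
{
    private readonly IBusinessContext injectedContext;

    // Production: no fake supplied, so the base context is used.
    public CustomerDomainService() { }

    // Tests: supply a fake context.
    public CustomerDomainService(IBusinessContext fakeContext)
    {
        injectedContext = fakeContext;
    }

    // 'new' hides (rather than overrides) the base property.
    protected new IBusinessContext ObjectContext
    {
        get { return injectedContext ?? base.ObjectContext; }
    }

    public IQueryable<Customer> GetCustomersNamed(string name)
    {
        return ObjectContext.Customers.Where(c => c.Name == name);
    }
}

// In-memory fake used by tests.
public class FakeBusinessContext : IBusinessContext
{
    private readonly List<Customer> customers;

    public FakeBusinessContext(IEnumerable<Customer> seed)
    {
        customers = seed.ToList();
    }

    public IQueryable<Customer> Customers
    {
        get { return customers.AsQueryable(); }
    }
}
```

Since hiding is resolved at compile time, any code that reaches the context through base.ObjectContext, or through a reference typed as the base class, bypasses the shim entirely.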
NuGet, EF4.1 and SQL Compact 4
As I’m waiting around this week for furniture to be delivered to my new abode, I was passing the time today by trying out EF4.1. I thought it’d also be a good opportunity to try out NuGet in order to see how easy it is to download packages, dependencies etc.
So, the test was: Download EF4.1 and SQL Compact 4 with NuGet to allow me to create a simple data model and get some CRUD functionality up on screen in a WPF screen. It failed me.
NuGet itself seems very nice – the ability to easily download packages and their associated dependencies for a VS solution, integrated within the IDE etc. – great stuff; I will definitely use it in future.
Unfortunately, neither the package for EF4.1 nor SQL Compact 4 contains the System.Data.SqlServerCe.Entity.dll, thus as soon as you try to make a connection to a SQL Compact database with EF4.1, your application will crash:
Could not load System.Data.SqlServerCe.Entity.dll. Reinstall SQL Server Compact.
The solution is to manually install SQL Compact 4 from this link. Once done, you should be good to go.
So, a failure for NuGet insofar as the packages supplied did not contain the correct assemblies for what was required – however, NuGet in general seems very impressive and I would recommend taking a look at it.
UPDATE
There are actually two SQL Compact packages on NuGet. One is simply called SQLServerCompact; the other is called EntityFramework.SqlServerCompact. I had downloaded the former during my attempt described above. If you download the latter package, you will find that you get the .Entity.dll and don’t need to install SQL Compact 4 separately. I’m not sure that it’s entirely clear, though, that there are two versions of SQL Compact 4 in NuGet: one which is “compatible” with EF4 and one which isn’t.
Entity Framework Code-First almost released
I should actually qualify that by saying the first version of it is almost released. I blogged about Code First what seems like ages ago, and it’s great to see a version of it getting out there. However, this version won’t ship with several features – some of which are in the current model- and database-first scenarios, and some of which aren’t – for example: -
- No SP support
- No compiled queries
- No enum support (as per the rest of EF)
- No user-defined mapping conventions
All are on the roadmap – but in the interests of getting it out there, they’ve been pushed back to version 2. The feedback on one of the announcement pages has been pretty negative – I can’t understand why. Yes, the work isn’t complete yet. Yes, there are features that need to be added before everyone’s happy. In fact, there’ll always be someone who doesn’t like it, but hey….
What is important IMHO is that the ADO .NET team are adopting a more community-based approach – get smaller iterations of code out there and get feedback quickly. EF3.5 was OK but not much more – EF4 went a long way to improving the state of affairs; EF4.1 adds a brand new class-based approach without any EDMX etc. – give them some time and I’m sure it’ll have all the features that we want. Otherwise, they’d never release it. Just look at Firefox 4 – a great browser, but it took so long to get out there due to (IMHO) large scope creep and poor roadmapping.
My personal wish list feature would be to see better integration between EF’s model-first approach and DB projects. Currently, once you go live with a database, you effectively have to move over to database-first modelling since EF offers no integration with DB projects. I’d love to see EF modify tables and such in a database project once you’ve modified your model.
So, EF4.1 might not do everything you want. But at least it’s out there – we can try it out with full support from MS (or as much as MS Connect offers….) and go from there.
Entity Framework and Concrete-Table-per-Type Inheritance
As I alluded to on a recent post of mine, I’ve been experimenting with inheritance in EF4. EF4 (allegedly) supports 3 types: -
- Table-per-Hierarchy: One physical DB table contains all fields for the entire type hierarchy. This is the only mode of inheritance in Linq to SQL.
- Table-per-Type: One physical DB table stores all shared fields, whilst each type has its own table for type-specific fields.
- Table-per-Concrete Type: Each type has its own physical DB table, even for common (base type) fields.
Now, the first two are well documented on MSDN but the third is conspicuously absent. It turns out that there is no design-time support for this model, nor is there any real documentation on how to do it! I’ve surfed around and not really found much in the way of helpful guidance on this, so here’s my take on it.
Here’s a (very) simple set of database tables:
Let’s now push that into EF4 using the designer:
Oh look! There are two common fields. Let’s make a nice type hierarchy out of them, by creating a new abstract entity called Employee and then tying them all together (and removing the common fields from the two concrete entities):
Hit compile and… it doesn’t work. The first problem you’ll get is that you need to map the Id and Name properties from the base Employee entity to both physical tables. How do we do that? Well, you can try going to the mapping of Employee in the designer and doing the following:
It won’t work. You still need to map the unique identifier column (ID) to both the Developer and Manager entities – but you can’t do this in the designer.
So you open up the model in the XML Editor (Open with…) and you get a whole load of XML.
- SSDL: The database definitions; you won’t need to modify this.
- CSDL: The conceptual model i.e. the entities you see in the designer. You won’t need to modify this.
- CS Mapping: The mapping between the above two models.
- Designer content: Ignore this; it contains the designer surface details e.g. where on the screen the entities should be displayed etc.
So, you go into the CS Mapping section, and create a new scalar property in both Manager and Developer mapping fragments for ID e.g.
<EntityContainerMapping StorageEntityContainer="TestModelStoreContainer" CdmEntityContainer="TestEntities">
  <EntitySetMapping Name="Employees">
    <EntityTypeMapping TypeName="IsTypeOf(TestModel.Developer)">
      <MappingFragment StoreEntitySet="Developer">
        <ScalarProperty Name="Skill" ColumnName="Skill" />
        <ScalarProperty Name="Age" ColumnName="Age" />
      </MappingFragment>
    </EntityTypeMapping>
becomes
<EntityContainerMapping StorageEntityContainer="TestModelStoreContainer" CdmEntityContainer="TestEntities">
  <EntitySetMapping Name="Employees">
    <EntityTypeMapping TypeName="IsTypeOf(TestModel.Developer)">
      <MappingFragment StoreEntitySet="Developer">
        <ScalarProperty Name="Id" ColumnName="Id" />
        <ScalarProperty Name="Skill" ColumnName="Skill" />
        <ScalarProperty Name="Age" ColumnName="Age" />
      </MappingFragment>
    </EntityTypeMapping>
OK, let’s try to compile again. Still no luck – VS now complains about the lack of the mapping for the Name property, too! In other words:
you need to manually map across all shared properties from the base class to all derived classes
Once you create another mapping for Name (just like I did for ID), your code will compile, and you can do code like the following: -
var context = new TestEntities();
context.AddToEmployees(new Manager());
context.AddToEmployees(new Developer());
context.SaveChanges();
You can also query the Employees collection on the context in order to get all Employees, or do an Employees.OfType<Developer>().
EF3.5 and relationships
Having been using EF3.5 lately, I just thought I’d point out one of the things that they did which – to my mind – was completely mad (thankfully rectified in EF4).
- You have a parent-child relationship e.g. Course and Pupil
- You load a Pupil instance on its own
- You want to find out the ID of the Course that this Pupil belongs to i.e. the foreign key.
- You are in an n-tier, or otherwise disconnected environment, or one where performance is a key factor.
In EF3.5, there is no easy, strongly-typed way of doing this, without a further round trip back to the data source in order to load the parent Course, and then navigate to that object e.g.
Pupil pupil = LoadPupil("Isaac");
if (!pupil.CourseReference.IsLoaded)
    pupil.CourseReference.Load();
int courseId = pupil.Course.Id;
Of course, in a perfect world, you have a superfast database with lots of memory etc. etc. so this approach works fine, in theory.
In the real world, it’s not that good when you need something high performing, with minimal unnecessary round trips to the DB. You could argue that I should simply Include() the Course when loading the Pupil – yep, that’d work – but again, why should I need to load the entire entity in just to get that one field? Why should I have to do a join or union in SQL just for that one field when the Pupil should already be able to provide this information to me?
Thankfully, in EF4 you can expose the FK fields so that you can get to them as e.g. Pupil.CourseId. I can sort of see why someone might not want them in the conceptual model, but compare that with how you have to do it in EF3.5:
Pupil pupil = LoadPupil("Isaac");
int courseId;
if (!pupil.CourseReference.IsLoaded)
    courseId = (int)pupil.CourseReference.EntityKey.EntityKeyValues[0].Value;
else
    courseId = pupil.Course.Id;
That’s right – an array of key/value pairs which represent the key between Pupil and Course. Naturally, the key/value pair is weakly typed, so you have to cast the id back to an int. Another developer without a decent amount of EF experience will not be able to look at this code and automatically know what is going on; you’d need to comment it or similar to explain the intent.
Anyway – rant over. It just annoyed me today that this was the best solution that they came up with for EF3.5.
Coming up: Entity Framework and inheritance!
Entity Framework 4 “Code First” CTP
There are all sorts of blog posts flying about at the moment regarding EF4’s Code First CTP.
Since they all cover the finer points in excruciating detail, I wanted to just talk about my brief experience with it yesterday and today.
Firstly, installation is easy – just download the CTP; it installs a folder in Program Files containing the assembly that you reference in your .NET projects (Microsoft.Data.Entity.CTP).
Actually using it is extremely easy and quick, as those blogs attest – you can just create POCOs, put the type within your context and off you go – it really is that easy. Unfortunately I could find no way of actually viewing the database tables that the framework creates, as I’m using SQL Compact 4 (Code First does not (yet?) work with SQL Compact 3.5).
Anyway – just an example of how easy it was to create a simple two-tier inheritance hierarchy…
Here’s the code for it:
class EmployeeDbContext : DbContext
{
    public DbSet<Employee> Employees { get; set; }
    public DbSet<Customer> Customers { get; set; }
    public DbSet<Person> People { get; set; }
}
class Person
{
    public int PersonId { get; set; }
    public string Forename { get; set; }
    public string Surname { get; set; }
    public string Name { get { return String.Format("{0} {1}", Forename, Surname); } }
}
class Employee : Person
{
    public string Role { get; set; }
    public int Salary { get; set; }
}
class Customer : Person
{
    public int OrderCount { get; set; }
}
Not a lot of code really – and that’s about all that’s required in terms of set-up of code. Here’s an example of my using it – pretty standard Entity Framework really aside from the second line that tells EF to automatically update the database schema if you make changes to the object model: -
static void Main()
{
    Database.DefaultConnectionFactory = new SqlCeConnectionFactory("System.Data.SqlServerCe.4.0");
    Database.SetInitializer(new RecreateDatabaseIfModelChanges<EmployeeDbContext>());
    var context = new EmployeeDbContext();
    AddNewEmployee(context);
    AddNewCustomer(context);
    context.SaveChanges();
    Console.WriteLine("Employees");
    foreach (var employee in context.Employees)
        Console.WriteLine(employee.Name);
    Console.WriteLine();
    Console.WriteLine("Customers");
    foreach (Customer customer in context.Customers)
        Console.WriteLine(customer.Name);
    Console.WriteLine();
    Console.WriteLine("People");
    foreach (var person in context.People)
        Console.WriteLine(person.Name);
    Console.Read();
}
private static void AddNewCustomer(EmployeeDbContext employeeDbContext)
{
    employeeDbContext.Customers.Add(new Customer
    {
        Forename = "John",
        Surname = "Smith",
        OrderCount = 10
    });
}
private static void AddNewEmployee(EmployeeDbContext employeeDbContext)
{
    employeeDbContext.Employees.Add(new Employee
    {
        Forename = "Isaac",
        Surname = "Abraham",
        Role = "Developer",
        Salary = 2000
    });
}
And just to prove that EF understands the inheritance hierarchy, here’s a screenshot with the output from the code above:
The nice thing about this is the ease of creation – you can be up and running with a database in literally minutes. Just hitting F5 with the code above will implicitly create a new database and the schema to match the object model. Nice.
I’m not sure how easily you could push updates post-deployment e.g. you’ve got a production database and want to amend some fields etc. – I believe that this sort of thing is still to come – but it’s still quite impressive.
Assembly references and dependencies – Part II
I did finally fix the problem in my last blog post after a late night session of VS2008. It turns out the culprit was the Entity Framework model. How did I figure this out? Using good old Holmesian powers of deduction.
I started by creating a simple data model in Visio – I picked any old Software diagram – just so that I could visually map out all the assembly dependencies graphically (how I would have liked VS2010 architectural modeller here!). I then took a close look at the build order of my solution when doing a build of the unit test project. I noticed that one project actually did not need a rebuild when I amended my unit test project. All my other projects did, but not this one (let’s call this one Framework).
What I then did was create a single Unit Test project and referenced Framework. Sure enough, amending my unit test project did NOT require a rebuild of the Framework project. This immediately told me that the problem I was facing was nothing to do with my PC e.g. virus scanner / solution settings or such. It must have been something specific to a project, or projects, within my solution.
I looked at my architectural diagram and realised that all my projects, except for Framework, were dependent on a single assembly in some way – my Data assembly, which contains my business entities.
The screenshot above illustrates this sort of dependency chain. The way that Visual Studio works (and this makes good sense), the only time an assembly needs to be rebuilt is either if that assembly itself has changed, or if an assembly it depends on has changed. In this case, any change in Data would cause ClassLibrary1 and 2 to be rebuilt.
The upshot is that after going through adding and removing references one-by-one to my unit test project, I determined that the Data assembly was getting rebuilt every time, even if it hadn’t changed. This was having a knock-on effect on all my other assemblies, which is where the long build time came from.
Why was the Data assembly rebuilding? Because of my Entity Framework object model in the assembly. To prove this, just create an empty project and add an EF model to it. As soon as you do this, every rebuild of any assembly will cause a rebuild of that one as well. I need to understand why this is – I think that it’s to do with how it adds the underlying XML files that make up the EF model into the assembly at build time rather than leave them as files lying around in the bin folder – but not sure. So, for the meantime I have removed all project references to the Data assembly and replaced them with assembly references – this eliminates the problem.
Either way, I’ve now reduced my build time from ~20 seconds to ~7 seconds.

