Meet Entity Framework, the Anti-SQL Framework


Entity Framework 6 is coming!

Entity Framework 6 is now ramping up for a release. It brings nice Async functionality, but also gives lots more power to the Code First capability, as well as also bringing EF completely out of the core .NET framework – it’s instead now a standalone NuGet package. So, the story now for creating and modifying your SQL database within EF can be considered to use these four features: –

  1. Get hold of EF via a NuGet package
  2. Create a physical data model based on an object model
  3. Update and evolve your database through code-first migrations
  4. Create a database quickly and easily using LocalDB

No dependency on .NET 4.5, or on a specific version of Visual Studio, or on having a running instance of SQL Server (even SQL Express) as a Windows Service. This sounds all great. We can now create a physical database, create migrations from one version to the next, and write queries in the language that automatically get translated into SQL. In effect, EF6 with code-first allows you to write an application without having to ever look at the database. This may be a somewhat short-sighted or simplistic view – you’ll always need to optimise your DB with indexes and perhaps SPs etc. for high performance corner cases etc. – but for quickly getting up and running, and I would suggest for a fair proportion of your SQL database requirements, EF6 as a framework will sort you out .

Why SQL?

This is all great – the barrier to entry to getting up and running in a short space of time is extremely low. However, I would then raise the question: if you’re using your database in a fashion akin to that above – where it’s just a means to an end – why use a SQL database at all? What are you actually using SQL for? As a way to store and retrieve data into your application. Your queries are written outside of SQL in e.g. LINQ. Your tables are generated outside of T-SQL. You never interact with tables because EF is acting as an ORM that does all the funky mapping for you. So what exactly are you getting from using SQL?

The point is that you can remove almost all of the impedance mismatch between your object model and your data-store in a much simpler way simply by using something like MongoDB. You don’t have a formal schema as such in the database – you simply read and write object graphs to and from a document. The concept of a “mapping” between DB and object model are almost entirely removed. There’s no need to formally upgrade a “schema” e.g. foreign keys, indexes, columns etc. etc – there’s your object model, and nothing else.

Conclusion

I’m not suggesting you go and dump SQL tomorrow if you’ve already invested in it. What I would suggest is that the next time you create a new database you consider why you use SQL and whether you could achieve the same use cases by using a schema-free document database like MongoDB or RavenDB. What are you really using your database for today?

Trying out RavenDB


I came across this the other day and decided to give it a whirl. My initial experiences have been generally quite positive.

I’m sure that the official site can explain it much better than I can but essentially RavenDB is different from a regular relational database in that there are no “tables” as such. Instead, the database stores entire object graphs as “documents”.

Disclaimer: This is not a conclusive review! Think of it more as my initial impressions and experiences given an evening playing around with it, looking through the online docs and searching through Google a bit.

Getting started

I was really impressed at how quick it was to get going – essentially just download and unzip the package from the website, run the Start batch file, and off you go! You get an HTTP endpoint that will return JSON for your queries (more on this later). You also get a Silverlight website which gives you a nice front end to query the database or just browse your documents (I should also point out that you can run the database “in process” as well). All in all, I had the server and was querying the database (you get a stock DB with some sample data) in under five minutes – really good.

From a coding point of view, it’s also very easy to get going. Add a reference to a couple of RavenDB assemblies (note – the Client folder contains the richest version of the API) and you’re ready to start working with the DB (it’s also on NuGet although I don’t know what version is on there). The API seems reasonably clear – the basic functions are accessible at the top level of the namespace, with more complex features in the .Advanced namespace – another good idea.

Adding documents

Adding data to RavenDB the first time was a painless experience. I wrote a simple routine to pull out all the music files on my hard disk and created a simple Artist/Albums/Tracks type hierarchy (very similar to what the sample Raven database contains. actually…). The code is as follows: –

image

I’ve elided the GetArtists() method – it just scans my hard disk for certain files and constructs the object graph. The session object is analogous to the ObjectContext for EF people – it follows the unit-of-work pattern etc., and the SaveChanges persists all modifications to the session. Easy.

The nicest thing is that this is literally all the coding required. No database creation scripts are required. No mappings to tables. No nothing – even EF’s code-first approach is more heavy-handed than this. In fact, in some ways this completely removes the impedance mismatch of database / object modelling – there is no ORM as the database stores the entire graph as a document.

Once an object is in the session, Raven change-tracks the objects, just like the EF ObjectContext, so you get updates for free.

This is all great stuff – you can be up and running, inserting stuff into your database on a clean install of Raven within a couple of minutes.

Querying the database

Here’s where things start to get interesting! So we’ve added a load of objects to our database, and now we want to query it. First off, there is a LINQ provider so that you can write simple queries without too much difficulty: –

image

Notice how the Id is stored as a String (this is the default – I think that you can amend the format though), but it works easily enough. Before someone flames me, I want to point out that there’s actually a Load() method which you should probably use for explicitly loading single entities rather than First().

You can also pretty easily do other basic queries that don’t do projections e.g.

image

I assume that RavenDB converts the IQueryable into HTTP requests, and in the Raven console you can see the queries coming in. I believe that RavenDB stores all your object graphs as JSON internally, and when you query this HTTP endpoint directly, that’s what you get back as well – nice: –

image

RavenDB Indexes

I don’t want to get into performance metrics too much here – I don’t know enough about RavenDB to go into depth about it – but I do want to talk about about Indexes as they seem to be a key part of Raven.

Whenever you make a LINQ query, RavenDB will try to build an Index to speed up performance. An Index in RavenDB terms is not like a SQL Server Index – as I understand it, it’s more like a cached view of data based on a query – almost like a Stored Procedure which caches the results. The performance benefits are quite large – for example, in the example query above (with the Count() > 10), the first time I ran the query it took around 2800ms; the second time it took just 73ms. Raven will silently create these temporary index dynamically and update the results of them in the background (although Raven makes no guarantees that Indexes will be up to date – although you can manually refresh them if required).

Accessing Indexes

There were some issues I had with indexes though. For example, these dynamic indexes get trashed when you stop and start Raven. So I thought “let’s try to save them so when we restart raven it’s still nice and quick”. I couldn’t get it to work. Let’s say we have that Album Count query from earlier. Raven mad an index automatically after the first time I executed the LINQ query. I then renamed it through the website so it got saved as a “permanent” index. When I restarted Raven and ran the same LINQ query, it didn’t know to hit that index so created a brand new temp index from scratch with exactly the same indexing query. If I tried to force Raven to use the saved index when writing in the LINQ query on the client, it failed to do the range search and threw an exception. Even when I directly queried the index through the Silverlight UI as a Lucene query, I failed there too – it would treat the Count field as a text field and therefore treated 20 as less than 3. I’m sure that there’s a way to do this, but I couldn’t figure it out from a scan through the documentation.

Projections and MapReduce

Another time I had to get my hands dirty was with projections. Let’s say we want to get a result back from the DB which gives us a summary of all Artists names, the number of albums, and the total number of tracks. Normally in e.g. Entity Framework you can do something like this: –

image

It won’t work in Raven. First it will complain because you can’t use an anonymous type on the projection. So you make a proper type – and then will simply get back a set of empty objects! As I understand it, this is the crux of the difference between document and relational databases. With a relational database, you can construct result sets by joining between tables etc. etc.. but you cannot do this with document databases (or rather, you don’t want to do this with document databases!).

You could get around this by reading all Artists onto the client and doing the projection there – but this would of course be inefficient (in fact, to discourage you from this sort of sloppiness, by default Raven will only return a maximum of 1024 documents in a single query and 30 queries per session!).

So how do we do projections? With indexes. In RavenDB, we can use MapReduce to construct pre-defined indexes – ironically these are written in LINQ, but are stored on the database rather executed on the client. I found a few articles, including this great blog post, on writing them, so I won’t reproduce it here. Suffice it to say that you write a couple of LINQ queries to perform your projection and then query that index in code by name (although there is a strongly-typed method for querying indexes, too). It then “just works”, nice and quickly etc.

The biggest “issue” I have with this sort of approach is that your application becomes closely coupled to implementation details of your database. Why should you care that there’s an index on the database in order to retrieve a result set? By putting all of the queries in a repository you can abstract it away I guess – it’s just that I’m used to not having to care about that in EF etc. and suddenly now we have this mix of query code on the DB and queries on the client – it’s like we’re back in the land of stored procedures for CRUD. Perhaps the best way to think of it is as if the IQueryable implementation of RavenDB doesn’t support certain methods e.g. GroupBy, Sum etc. etc.

Another problem I had was with Contains e.g. Where(artist => artist.Name.Contains(“van”)); This initially did not work; I then discovered that it expects a Lucene-style query to be put in there e.g. “*van*”. Then it works just great. But this, to my mind, changes the semantics of Contains – surely Contains should, by default, just do a wildcard search anyway?

Conclusion

I’ve only been playing with RavenDB for a few days, so this is by no means an exhaustive review or anything like that. I just was quite excited when I started using it and wanted to share my initial thoughts. There are probably mistakes in what I’ve written above – and in a sense that’s a good thing – all I’ve done so far are read through the RavenDB website and Googled around a bit when I got stuck. And with that I was able in, literally, just a few minutes to get up and running with inserting and querying etc.. The main problems I’ve encountered are more to do with the fact that one shouldn’t treat a document database like a relational database – they’re two different beasts that have different features and ways of working.

I’m really interested in using Raven more though – not only does it have some very nice features, and is easy to get up and running, but it’s a different way of looking at something that we often take for granted – I would urge you to give it a go as it might change the way you think about databases. Just be prepared to do a bit of digging around – I think that the documentation could be a bit deeper – a lot of the samples on the website don’t even mention the LINQ provider or how to create MapReduce indexes etc. etc..