Trying out RavenDB


I came across RavenDB the other day and decided to give it a whirl. My initial experiences have been generally quite positive.

I’m sure that the official site can explain it much better than I can, but essentially RavenDB differs from a regular relational database in that there are no “tables” as such. Instead, the database stores entire object graphs as “documents”.

Disclaimer: This is not a conclusive review! Think of it more as my initial impressions and experiences given an evening playing around with it, looking through the online docs and searching through Google a bit.

Getting started

I was really impressed at how quick it was to get going – essentially just download and unzip the package from the website, run the Start batch file, and off you go! You get an HTTP endpoint that will return JSON for your queries (more on this later). You also get a Silverlight website which gives you a nice front end to query the database or just browse your documents (I should also point out that you can run the database “in process” as well). All in all, I had the server running and was querying the database (you get a stock DB with some sample data) in under five minutes – really good.

From a coding point of view, it’s also very easy to get going. Add a reference to a couple of RavenDB assemblies (note – the Client folder contains the richest version of the API) and you’re ready to start working with the DB (it’s also on NuGet although I don’t know what version is on there). The API seems reasonably clear – the basic functions are accessible at the top level of the namespace, with more complex features in the .Advanced namespace – another good idea.

Adding documents

Adding data to RavenDB the first time was a painless experience. I wrote a simple routine to pull out all the music files on my hard disk and created a simple Artist/Albums/Tracks type hierarchy (very similar to what the sample Raven database contains, actually…). The code is as follows: –

image

I’ve elided the GetArtists() method – it just scans my hard disk for certain files and constructs the object graph. The session object is analogous to the ObjectContext for EF people – it follows the unit-of-work pattern etc., and the SaveChanges persists all modifications to the session. Easy.
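To make the unit-of-work idea concrete, here is a minimal, language-neutral sketch in Python. It is not the RavenDB client API: the Session class, its store and save_changes methods, and the dict standing in for the server are all hypothetical names made up for illustration. It only shows the behaviour described above, namely that stored objects are held in the session and nothing reaches the database until the changes are saved.

```python
class Session:
    """Toy unit of work (hypothetical, not RavenDB's API)."""

    def __init__(self, database):
        self._database = database  # a plain dict standing in for the server
        self._pending = {}         # documents stored but not yet persisted

    def store(self, doc_id, document):
        # Only remembers the document; nothing is written yet.
        self._pending[doc_id] = document

    def save_changes(self):
        # Persist every pending document in one go, then reset the session.
        count = len(self._pending)
        self._database.update(self._pending)
        self._pending.clear()
        return count


database = {}
session = Session(database)
session.store("artists/1", {"Name": "Artist A", "Albums": []})
session.store("artists/2", {"Name": "Artist B", "Albums": []})

written_before_save = len(database)  # still empty at this point
saved = session.save_changes()       # both documents are written together
```

The point of the pattern is that a batch of modifications either gets sent together or not at all, which is why forgetting to save means the changes are simply discarded.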

The nicest thing is that this is literally all the coding required. No database creation scripts are required. No mappings to tables. No nothing – even EF’s code-first approach is more heavy-handed than this. In fact, in some ways this completely removes the impedance mismatch of database / object modelling – there is no ORM as the database stores the entire graph as a document.

Once an object is in the session, Raven change-tracks the objects, just like the EF ObjectContext, so you get updates for free.

This is all great stuff – you can be up and running, inserting stuff into your database on a clean install of Raven within a couple of minutes.

Querying the database

Here’s where things start to get interesting! So we’ve added a load of objects to our database, and now we want to query it. First off, there is a LINQ provider so that you can write simple queries without too much difficulty: –

image

Notice how the Id is stored as a String (this is the default – I think that you can amend the format though), but it works easily enough. Before someone flames me, I want to point out that there’s actually a Load() method which you should probably use for explicitly loading single entities rather than First().

You can also pretty easily do other basic queries that don’t do projections e.g.

image

I assume that RavenDB converts the IQueryable into HTTP requests, and in the Raven console you can see the queries coming in. I believe that RavenDB stores all your object graphs as JSON internally, and when you query this HTTP endpoint directly, that’s what you get back as well – nice: –

image

RavenDB Indexes

I don’t want to get into performance metrics too much here – I don’t know enough about RavenDB to go into depth about it – but I do want to talk about Indexes as they seem to be a key part of Raven.

Whenever you make a LINQ query, RavenDB will try to build an Index to speed up performance. An Index in RavenDB terms is not like a SQL Server Index – as I understand it, it’s more like a cached view of data based on a query – almost like a Stored Procedure which caches the results. The performance benefits are quite large – for example, in the example query above (with the Count() > 10), the first time I ran the query it took around 2800ms; the second time it took just 73ms. Raven will silently create these temporary indexes dynamically and update their results in the background (Raven makes no guarantee that Indexes will be up to date, although you can manually refresh them if required).

Accessing Indexes

There were some issues I had with indexes though. For example, these dynamic indexes get trashed when you stop and start Raven. So I thought “let’s try to save them so when we restart Raven it’s still nice and quick”. I couldn’t get it to work. Let’s say we have that Album Count query from earlier. Raven made an index automatically after the first time I executed the LINQ query. I then renamed it through the website so it got saved as a “permanent” index. When I restarted Raven and ran the same LINQ query, it didn’t know to hit that index so created a brand new temp index from scratch with exactly the same indexing query. If I tried to force Raven to use the saved index when writing the LINQ query on the client, it failed to do the range search and threw an exception. Even when I directly queried the index through the Silverlight UI as a Lucene query, I failed there too – it would treat the Count field as a text field and therefore treated 20 as less than 3. I’m sure that there’s a way to do this, but I couldn’t figure it out from a scan through the documentation.
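The “20 is less than 3” behaviour is what you get whenever numbers are compared as text, character by character. The short Python sketch below reproduces it outside of Raven. The zero-padding helper is just one generic way of making text ordering agree with numeric ordering; it is my own illustration and I am not claiming it is what RavenDB or Lucene does internally.

```python
counts = [3, 20, 100]

# Compared as text, "20" sorts before "3" because '2' < '3'.
text_says_20_less_than_3 = "20" < "3"
sorted_as_text = sorted(str(n) for n in counts)


def pad(number, width=10):
    """Hypothetical helper: left-pad with zeros so text order matches numeric order."""
    return str(number).zfill(width)


padded_says_20_less_than_3 = pad(20) < pad(3)
sorted_padded = sorted(pad(n) for n in counts)
```

With the padded form, a range search such as “greater than 10” behaves the way you would expect from an integer field.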

Projections and MapReduce

Another time I had to get my hands dirty was with projections. Let’s say we want to get a result back from the DB which gives us a summary of all Artists names, the number of albums, and the total number of tracks. Normally in e.g. Entity Framework you can do something like this: –

image

It won’t work in Raven. First it will complain because you can’t use an anonymous type on the projection. So you make a proper type – and then you will simply get back a set of empty objects! As I understand it, this is the crux of the difference between document and relational databases. With a relational database, you can construct result sets by joining between tables etc., but you cannot do this with document databases (or rather, you don’t want to do this with document databases!).

You could get around this by reading all Artists onto the client and doing the projection there – but this would of course be inefficient (in fact, to discourage you from this sort of sloppiness, by default Raven will only return a maximum of 1024 documents in a single query and 30 queries per session!).

So how do we do projections? With indexes. In RavenDB, we can use MapReduce to construct pre-defined indexes – ironically these are written in LINQ, but are stored on the database rather than executed on the client. I found a few articles, including this great blog post, on writing them, so I won’t reproduce it here. Suffice it to say that you write a couple of LINQ queries to perform your projection and then query that index in code by name (although there is a strongly-typed method for querying indexes, too). It then “just works”, nice and quickly etc.
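For anyone who hasn’t met MapReduce before, here is a rough sketch of the idea in plain Python rather than in Raven’s LINQ index syntax. Everything in it is made up for illustration: the sample artists, the map_artist and reduce_entries functions, and the field names AlbumCount and TrackCount. It shows the shape of the Artist summary projection from above: a map step that emits one entry per document, and a reduce step that groups the entries by artist name and adds up the counts, so that a later “query” is just a lookup of pre-computed results.

```python
from collections import defaultdict

# Hypothetical sample documents in the Artist/Albums/Tracks shape.
artists = [
    {"Name": "Artist A", "Albums": [{"Tracks": ["t1", "t2"]}, {"Tracks": ["t3"]}]},
    {"Name": "Artist B", "Albums": [{"Tracks": ["t1"]}]},
]


def map_artist(artist):
    """Map step: emit one summary entry per artist document."""
    return {
        "Name": artist["Name"],
        "AlbumCount": len(artist["Albums"]),
        "TrackCount": sum(len(album["Tracks"]) for album in artist["Albums"]),
    }


def reduce_entries(entries):
    """Reduce step: group entries by name and sum the counts."""
    totals = defaultdict(lambda: {"AlbumCount": 0, "TrackCount": 0})
    for entry in entries:
        totals[entry["Name"]]["AlbumCount"] += entry["AlbumCount"]
        totals[entry["Name"]]["TrackCount"] += entry["TrackCount"]
    return dict(totals)


# The "index" is built ahead of time; querying it is only a dictionary lookup.
index = reduce_entries(map_artist(artist) for artist in artists)
summary = index["Artist A"]
```

The work happens when the index is built, not when it is read, which is roughly why querying a pre-defined index feels so quick.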

The biggest “issue” I have with this sort of approach is that your application becomes closely coupled to implementation details of your database. Why should you care that there’s an index on the database in order to retrieve a result set? By putting all of the queries in a repository you can abstract it away I guess – it’s just that I’m used to not having to care about that in EF etc. and suddenly now we have this mix of query code on the DB and queries on the client – it’s like we’re back in the land of stored procedures for CRUD. Perhaps the best way to think of it is as if the IQueryable implementation of RavenDB doesn’t support certain methods e.g. GroupBy, Sum etc. etc.

Another problem I had was with Contains e.g. Where(artist => artist.Name.Contains(“van”)); This initially did not work; I then discovered that it expects a Lucene-style query to be put in there e.g. “*van*”. Then it works just great. But this, to my mind, changes the semantics of Contains – surely Contains should, by default, just do a wildcard search anyway?
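My guess at what is going on is that the search matches whole indexed terms unless you ask for a wildcard. The Python sketch below is a simplified model of that difference and not how Lucene actually analyses text: the names list, the terms, term_match and wildcard_match functions are all hypothetical, and splitting on whitespace is an assumption standing in for a real analyzer.

```python
from fnmatch import fnmatchcase

# Hypothetical sample data.
names = ["Van Halen", "Vanilla Ice", "Armin van Buuren", "Savant"]


def terms(name):
    """Very rough stand-in for an analyzer: lower-case and split on whitespace."""
    return name.lower().split()


def term_match(name, term):
    """True only if the search term equals one whole indexed term."""
    return term.lower() in terms(name)


def wildcard_match(name, pattern):
    """True if any indexed term matches the wildcard pattern, e.g. '*van*'."""
    return any(fnmatchcase(term, pattern.lower()) for term in terms(name))


whole_term_hits = [name for name in names if term_match(name, "van")]
wildcard_hits = [name for name in names if wildcard_match(name, "*van*")]
```

In this model a search for the whole word “van” finds only the names that contain it as a separate word, while “*van*” also picks up names where it appears inside a longer word, at the cost of checking every term.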

Conclusion

I’ve only been playing with RavenDB for a few days, so this is by no means an exhaustive review or anything like that. I just was quite excited when I started using it and wanted to share my initial thoughts. There are probably mistakes in what I’ve written above – and in a sense that’s a good thing – all I’ve done so far is read through the RavenDB website and Google around a bit when I got stuck. And with that I was able in, literally, just a few minutes to get up and running with inserting and querying etc. The main problems I’ve encountered are more to do with the fact that one shouldn’t treat a document database like a relational database – they’re two different beasts that have different features and ways of working.

I’m really interested in using Raven more though – not only does it have some very nice features, and is easy to get up and running, but it’s a different way of looking at something that we often take for granted – I would urge you to give it a go as it might change the way you think about databases. Just be prepared to do a bit of digging around – I think that the documentation could be a bit deeper – a lot of the samples on the website don’t even mention the LINQ provider or how to create MapReduce indexes etc.


4 thoughts on “Trying out RavenDB”

  1. Nice review.

    First, you’d want to visit our new website at http://beta.ravendb.net/, where you will find new, more comprehensive docs (still ongoing work tho) and a KnowledgeBase. To more easily view JSON, btw, use Chrome and the JSON View addon.

    A few points:

    New nuget packages are published for every stable build. We have a stable build once every 2-3 weeks, so nuget is the recommended way to get them since it allows for easier updates.

    Your code for storing new objects is incorrect. With RavenDB being Safe-By-Default, you want to have it call SaveChanges() and reopen the session every now and then. Also, for changes to UoW to be persisted, you will need to call SaveChanges() explicitly or they will be discarded. See http://beta.ravendb.net/kb/3/using-ravendb-in-an-asp-net-mvc-website for an example how to automate this with ASP.NET MVC.

    IDs in RavenDB are string by default and use the entity’s collection name. You can find more info on this here: http://beta.ravendb.net/docs/consumer/basic-concepts#documents-collections-and-document-unique-identifiers

    Indexes in RavenDB are not cached views. They are definitions of data extraction contracts RavenDB runs against all data to extract and map entity properties into a Lucene index. The queries you run in code are translated to Lucene syntax that is then run against the appropriate index.

    As you wrote, ad-hoc queries will create indexes to be run on if none existed before. These will be temporary indexes that will run in memory and erased when the DB shuts down. If used enough, they will graduate to be a static index which is kept on disk for future usage. Static indexes can also be created using the Client API (see PutIndex and AbstractIndexCreationTask).

    Static indexes allow for more advanced features too, such as Full-Text search. The Contains operator didn’t work because the index you were running against is not a FTS-enabled one. You will need to manually create an index and define a field as “Indexed” for it to work. Also look at the Search() operator.

    RavenDB can actually guarantee indexes are up to date. Indexes can be stale, and Raven can tell you if they are.

    In your queries, use the .Where() operator – the lack of it is probably what made your experience go south. If that was not it, please post it to the mailing list so we can investigate further.

    RavenDB is more closely coupled with your data as any other DB would be. If you think of indexes as documents – would that make more sense?

    HTH,

    Itamar.

  2. Here is the comment I left there.

    re: Accessing Indexing
    We recently fixed several issues related to just that scenario in the query optimizer, now if you make a temporary index permanent, it will recognize that.

    For querying integer fields on the range, you need to use the FieldName +”_Range” field, which stores the value in a format that allows integer comparisons

    re: Projections & Map Reduce

    Your initial code should actually work, except for the Sum part. We can project _values_ from documents, but we don’t allow computation during the actual query.

    The reason we require those types of things to be defined ahead of time is that we are doing something here that is quite different from what you are used to in relational databases.
    Instead of executing your query when you make it, we are actually processing the index as soon as you create it. When you make a query on an index, we give you the pre-computed results. What this means in turn is that you don’t have to do any computation during a query, which makes it _really_ fast.

    As for Contains, that has to do with inefficiencies and how you set things up. If you are looking for a word, and you set up your index properly, this would actually work (because of how RavenDB handles full text searches). If you are doing a partial string match, it means that we have to do a LOT of work internally, and we are far less efficient about it.
    It is similar to why you should avoid doing LIKE ‘%van%’ in a relational database: it causes a big performance issue.

    1. Hey guys! Thanks for the super quick responses.

      Firstly – I wouldn’t really call this a review of RavenDB as I’m so new to it – rather call it a first experiences with it – this way I’m sort of covered for any mistakes I’ve made 😉 Seriously though – let me go through both of your comments…

      Itamar… the beta site looks very nice and if you can expand the documentation, I’m sure that’ll make Raven even more accessible for us folks coming from a relational database point of view. As for not calling SaveChanges – the example I gave was contrived, I’m sure in a real-world situation you would batch them up etc.; I simply wanted to illustrate batching up of a change set which is very similar to the EF way of doing things. Not really sure what you mean about usage of Where clause – I think Oren has answered my issue with things like Contains in his comment though.

      Oren: re: ranged search – I did actually try that, but when I look at the static index through raven studio, there’s no such field. I’ll post on the mailing list with my experiences in more detail. And thanks for the explanation re: indexes vs queries – that’s the real difference between the two databases in terms of getting data out; the fact that the method on the session API is called “Query” made me initially think of it as running a regular query. It’s more like a “QueryIndex” 🙂

      Thanks again guys for the responses.

      Isaac
