This is a comment I recently saw on a headline-grabbing article about Swift: -
I also don’t think that “type inferring” is of great use. If you cannot be bothered to hack in a variable’s data type, maybe you should not develop software in the first place.
I was so infuriated by this comment that I ended up writing this blog post. What infuriated me was as much the arrogance of the second sentence as the ignorance of why type inference can be such a massive productivity gain.
Here is the same sentence written in three different ways – slightly contrived as usual to make my point…
- Hello, I’m Java. I have a bag of apples. In this bag of apples there are five apples.
- Hello, I’m C# 3. I have a bag of apples that has five apples in it.
- Hello, I’m F#. I have a bag of five apples.
These statements don’t actually translate directly to code but you get the idea. Why would you need to write out the above sentence in either of the first two ways? Are you learning the language from a school textbook? Or using it in a professional & fluent manner?
C# can declare locally scoped, implicitly typed variables and create implicitly typed arrays. F# can do full H-M type inference. Java can’t do much of anything really. Here are the above sentences sort of written as code (the final example uses the same C#-style syntax to keep things consistent): -
The latter example is more in the F# vein, without braces, or keywords whose usage can be inferred.
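As a rough sketch in F# itself (not the C#-style samples this post originally showed), the difference between spelling out every type and letting the compiler infer it looks something like this. The apple names are made up for illustration.

```fsharp
// Fully annotated: every type is written out by hand.
let applesExplicit : string list = [ "Braeburn"; "Gala"; "Fuji"; "Cox"; "Jazz" ]
let countExplicit : int = List.length applesExplicit

// Inferred: the compiler works out string list and int on its own.
let apples = [ "Braeburn"; "Gala"; "Fuji"; "Cox"; "Jazz" ]
let count = List.length apples

// Both versions are equally statically typed; this annotation only compiles
// because count really was inferred as an int.
let stillAnInt : int = count
```

Nothing here is dynamic: the inferred version is checked at compile time in exactly the same way as the annotated one.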
The real power of type inference comes from the ability to chain multiple functions together in a succinct manner, and then at some point in the future changing the type of one of those functions and not having to change any type signatures of dependent functions. It’s quite addictive, and once you get used to it it’s quite difficult to go back to an explicitly-typed language.
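A minimal sketch of that chaining, assuming a made-up "name,quantity" line format and hypothetical function names. Only `parse` mentions a type; everything downstream is inferred from it.

```fsharp
// Parse a line such as "apple,3" into a name and a quantity.
let parse (line: string) =
    let parts = line.Split(',')
    parts.[0], int parts.[1]

// No annotations: this is inferred as generic over any comparable quantity.
let largest items = items |> List.maxBy snd

// The element types here flow from parse through the pipeline.
let largestOrder lines = lines |> List.map parse |> largest
```

If `parse` were later changed to return the quantity as a `decimal` (say `decimal parts.[1]`), neither `largest` nor `largestOrder` would need touching, because neither ever wrote the type down.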
Last minute update: It turns out that the individual who I quoted above at the start of this post had confused type inference with dynamic typing. I suspect that this won’t be the first or last person who has done this.
So, last week I finally released the F# Azure Storage Type Provider as v1! I learned a hell of a lot about writing Type Providers in F# as a result over the last few months… Anyway – v1.0 deals with Blobs and Tables; I’m hoping to integrate Queues and possibly Files in the future (the former is particularly powerful from a scripting point of view). You can get it on NuGet or download the source (and add issues etc.) through GitHub.
Working with Blobs
Here’s a sample set of containers and blobs in my local development storage account displayed through Visual Studio’s Server Explorer and some files in the “tp-test” container: -
You can get to a particular blob in two lines of code: -
The first line connects to the storage account – in this example I’m connecting to the local storage emulator, but to connect to a live account, just provide the account name and storage key. Once you navigate to a blob, you can download the file to the local file system, read it as a string, or stream it line-by-line (useful for dealing with large files). Of course you get full intellisense for the containers and folders automatically – this makes navigating through a storage account extremely easy to do: -
Working with Azure Tables
The Table section of the type provider gives you quick access to tables, does a lot of the heavy lifting for doing bulk inserts (automatically batching up based on partition and maximum batch size), and gives you a schema for free. This last part means that you can literally go to any pre-existing Azure Table that you might have and start working with it for CRUD purposes without any predefined data model.
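To give a feel for what that batching involves, here is a sketch under stated assumptions: it is not the provider's actual code, and the `Row` record is a hypothetical stand-in for a table entity. The constraints it encodes are the Azure Table ones: a batch may only contain entities from a single partition, and at most 100 of them.

```fsharp
// Hypothetical stand-in for a table row; not a type from the provider or the SDK.
type Row = { PartitionKey: string; RowKey: string; Payload: string }

// Azure Table batches are limited to 100 entities sharing one partition key.
let maxBatchSize = 100

// Group rows by partition, then cut each partition into batches of at most 100.
let toBatches (rows: Row list) =
    rows
    |> List.groupBy (fun row -> row.PartitionKey)
    |> List.collect (fun (partition, group) ->
        group
        |> List.chunkBySize maxBatchSize
        |> List.map (fun batch -> partition, batch))
```

Each resulting pair of partition key and batch could then be submitted as one batch operation.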
Tables automatically present themselves through intellisense, and give you a simple API to work with: -
Results are essentially DTOs that represent the table schema. Whilst Tables have no enforced schema, individual rows themselves do have one, and we can interrogate Azure to understand that schema and build a strongly-typed data model over the top of it. So the following schema in an Azure table:
becomes this in the type provider: -
All the properties are strongly typed based on their EDM type e.g. string, float etc. We can also execute arbitrary plain-text queries or use a strongly-typed query builder to chain clauses and ultimately execute a query remotely: -
Whilst this is not quite LINQ over Azure, there’s a reason for this. Ironically, the Azure SDK supports IQueryable over table storage. But because Table Storage is weak, computationally speaking, there are severe restrictions on what you can do with LINQ – basically just Where and Take. The benefit of the more restrictive query set that the Type Provider delivers is that it is guaranteed at compile time to generate a query that will be accepted by Azure Tables, whereas IQueryable over Tables is not.
The generated provided types also expose the raw set of values for the entities as key/values so that you can easily push this data into other formats e.g. Deedle etc. if you want.
I’d like to add some other features to the Storage Type Provider going forward, such as: -
- Azure Storage Queue Support
- “Individuals” support (a la the SQL Type Provider) for Tables
- Support for data binding on generated entities for e.g. WPF integration
- Potentially removing the dependency on the Azure Storage SDK
- Option types on Tables (either through schema inference or provided by configuration)
- Connection string through configuration
- More Async support
If you are looking to get involved with working on a Type Provider – or have some Azure experience and want to learn more about F# – this would be a good project to cut your teeth on :-)
As C# 6 previews have come out, I was not surprised to see pattern matching absent in the feature set. It’s a shame but I can understand why it’s not included – it’s far more powerful than switch/case but with both in the language, it’d probably be difficult to make the two work together without becoming a bit of a mish mash.
Existing Pattern Matching in C#
One of the things that C# devs often say to me is that they can’t see what pattern matching really gives over switch / case – the answer is that it’s the ability to not only branch on conditions but also to implicitly bind the result to a variable simultaneously. I then realised that C# actually does already support pattern matching for a specific use case: exception handling. Look at the two versions of a simple try / catch block: -
In the first example, notice how the compiler will automatically set the next line of execution appropriately depending on which exception was raised AND will automatically bind the exception (by this I mean cast and assign) to the ex variable as appropriate. “Yeah yeah yeah” you might say – this is basic exception handling. But imagine that we didn’t have this sort of program flow and had to do it all with just if / then statements as per the second sample – it’s a bit of a mess, with typecasts and whatnot.
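The same branch-and-bind idea, sketched in F# rather than the C# of the original samples. The function name and messages are made up; the point is that each `:? ... as ex` clause both tests the exception type and binds it, already cast, to `ex`.

```fsharp
open System

// Run an action and describe how it failed, if it did.
let describeFailure (action: unit -> unit) =
    try
        action ()
        "ok"
    with
    // Each clause tests the type AND binds the typed exception in one go.
    | :? ArgumentNullException as ex -> sprintf "missing argument: %s" ex.ParamName
    | :? InvalidOperationException as ex -> sprintf "invalid operation: %s" ex.Message
    | ex -> sprintf "unexpected: %s" (ex.GetType().Name)
```

Note that `ex.ParamName` is only available in the first clause because the pattern has already narrowed the type to `ArgumentNullException`.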
Pattern Matching over collections
Now, imagine that we wanted to do some conditional logic over an array of numbers: -
- If the array is 2 elements long and starts with “7”, do something
- Otherwise, if it’s 2 elements long, do something else
- Otherwise, do something else
Look at this with conventional if / then statements or with pattern matching (obviously with code that wouldn’t compile in C# but is somewhat similar to what can be achieved in F#): -
This illustrates how you can use pattern matching as a means of drastically simplifying code – even for a simple example like the one above. In F# it’s more effective because of the lightweight syntax, but even here you can see where we’re going.
This sort of pattern matching is intrinsic to F# where you can match over lists, arrays, types etc. etc. very easily, and another example of how branching and binding can easily simplify your code.
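For reference, the three rules from the array example can be written in real F# roughly as follows. This is a sketch with made-up result strings, not the original sample from this post.

```fsharp
// Branch on the shape of the array and bind its elements at the same time.
let describe (numbers: int[]) =
    match numbers with
    | [| 7; second |] -> sprintf "starts with 7, then %d" second
    | [| first; second |] -> sprintf "a pair: %d and %d" first second
    | _ -> "something else"
```

There are no length checks, index lookups or temporary variables: the patterns carry all of that.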
Just a short post to say that I’ve re-released the Azure Storage Type Provider on NuGet with a number of changes.
In short, on the back end I’ve re-written a lot of the backing code to reduce the size of the codebase, reorganised the folder structure to comply with other F# projects, and introduced FAKE. From the end-user perspective I’ve added a load of features to improve usage of the provider across both Blobs and Tables. I’ve also tried my best to fix the issues around NuGet package dependencies, ultimately by removing them completely and simply embedding the required DLLs in the lib folder of the package. It works, but it’s not particularly pretty.
In fact, I made so many changes, I also decided to repackage and rebrand it completely. The namespace is changed, and the package has been created anew: -
The old package has now been delisted.
It’s now versioned at 0.9.0 – by this I mean it’s almost feature complete for what I would consider ready to go, but what I desperately need from the community is some feedback. Does it work out of the box? Are there massive bugs that I’ve left in? Does it perform poorly? Are there extra features you would like to see added? Not enough async support? Don’t like the way something works? Tell me :-)
F# fanbois often talk about how F# supposedly makes “composition” easier than C# (or indeed any OO-first language). If you come from a C# background, you might not really think about what people mean by “composition”, because to be honest functional composition in the OO world is pretty difficult to achieve. You normally achieve it through inheritance, which is a bit of a cop-out, or you start looking at things like the Strategy pattern to achieve it in a more decoupled manner, typically through interfaces. But I tended to think of composition as something abstract before I started looking at F#.
One simpler way to achieve composition in the OO world is to use a (somewhat underused and misunderstood) feature common to IoC containers – to apply interceptors & policies to implement composition and pipelining (e.g. Decorator and Chain of Responsibility patterns). This is commonly known as Aspect Oriented Programming (AOP); common aspects include cross cutting concerns like logging and caching. Well, here’s an implementation of a fairly simple interception framework in F# that can chain up arbitrary decorators on a target function quickly and easily.
Composition through Aspects in F#
The goal would be to have a function – let’s say one that adds two numbers together – and be able to apply validation or logging to it in a decoupled manner.
First thing we do is define the signature of any “aspect”: -
Very simple – every aspect takes in a value of the same type as the “real” function, and returns a message that says either “fine, carry on”, “something is wrong, here’s an exception”, or “return prematurely with this value”. Easy. Let’s take our usual friendly calculator add function and a set of sample aspects: -
How do we chain these aspects up together with the raw add function? Quite easily as it turns out, with two simple functions, plus F#’s built-in compose operator (>>). The first one “bolts” one aspect to another, and returns a function with the same signature. The second function closes the chain and bolts the final aspect to the “real” function: -
Now that we’ve done that, we can chain up our aspects together using the following syntax: -
Nice, but we can make it better by defining a couple of custom operators which allow us to write code much more succinctly: -
This sort of compositional framework is relatively easy on the eye, and thanks to F#’s ability to generalise functions automatically, aspects are generic enough to achieve decent reuse without the need for reflection or codegen. Whilst this isn’t a replacement for everything that you can do with a framework like Unity, Ninject or PostSharp, this very simple example illustrates how you can rapidly compose functions together in a succinct manner. If you’re interested in more of this sort of thing, have a look at Railway Oriented Programming on the excellent F# for Fun and Profit website.
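Since the original listings are not reproduced here, the following is a reconstruction sketch of the whole idea rather than the original code. All of the names (`AspectResult`, `bolt`, `close`, the `>=>` and `>->` operators and the two sample aspects) are assumptions, and this version does not use the built-in `>>` operator.

```fsharp
open System

// Hypothetical message type: what an aspect tells the pipeline to do next.
type AspectResult<'result> =
    | Continue                // fine, carry on
    | Fail of exn             // something is wrong, here's an exception
    | Return of 'result       // return prematurely with this value

// An aspect sees the same input as the "real" function.
type Aspect<'input, 'result> = 'input -> AspectResult<'result>

// Bolt one aspect to another; the result is itself an aspect.
let bolt (first: Aspect<'input, 'result>) (second: Aspect<'input, 'result>) : Aspect<'input, 'result> =
    fun input ->
        match first input with
        | Continue -> second input
        | stop -> stop

// Close the chain by bolting the aspects to the real function.
let close (aspects: Aspect<'input, 'result>) (target: 'input -> 'result) : 'input -> 'result =
    fun input ->
        match aspects input with
        | Continue -> target input
        | Return value -> value
        | Fail error -> raise error

// Custom operators so that a chain reads left to right.
let (>=>) first second = bolt first second
let (>->) aspects target = close aspects target

// The real function plus two sample aspects.
let add (x: int, y: int) = x + y

let rejectNegatives (x: int, y: int) : AspectResult<int> =
    if x < 0 || y < 0 then Fail (ArgumentException("negative input") :> exn)
    else Continue

let shortCircuitZero (x: int, y: int) : AspectResult<int> =
    if x = 0 then Return y
    elif y = 0 then Return x
    else Continue

// Validation, then the shortcut, then the real add.
let guardedAdd = rejectNegatives >=> shortCircuitZero >-> add
```

Calling `guardedAdd (2, 3)` runs both aspects and then `add`; a zero argument returns early, and a negative one raises the exception carried by `Fail`. Both operators start with `>`, so they share a precedence level and associate to the left, which is what lets the chain be written without brackets.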
Having spent a while using Hadoop on HDInsight now, I wanted to look at writing Hadoop mappers and reducers in F#. There are several reasons for choosing it over other languages such as Java, Python and C#. I’m not going to go into all of the usual advantages of F# over other languages, but the main reason is that F# lets you just “get on” with dealing with data. That’s one of its main strengths, in my opinion, and is what most map-reduce jobs are about.
There’s already a .NET SDK for Hadoop that Microsoft have released. However, it does have some issues with it, not just in terms of functionality but also in terms of how well it maps with F#. The main problem that I have with it is that you write your code in an object hierarchy, inheriting from MapperBase or ReducerCombinerBase. You then have to mutate the Context that’s passed in with any outputs from your Mapper or Reducer.
I wanted something that was a bit more lightweight, and also allowed me to explore creating a parser from the Streaming Hadoop inputs. So, I’ve now put HadoopFs on GitHub, with the intention of putting it on NuGet in the near future. The main thing it gives you is the ability to write mappers and reducers very easily without the need to “inherit” from any classes or anything, and also a flexible IO mechanism, so you can pipe data in or out from the “real” console (for use with the real Hadoop), file system or in-memory lists etc. (essentially anything that can be used to generate a sequence of strings). So the prototypical wordcount map / reduce looks like this: -
Three lines for the mapper (including function declaration) and four lines for the reducer. Nice. Notice that you do not need to have any dependency on HadoopFs to write your map / reduce code. It’s just a couple of arbitrary functions, which has several benefits. Firstly, it’s more accessible than having to understand a “framework” – all you have to do is understand the Hadoop MR paradigm and you’re good to go. Secondly, it’s easier to test – you can always much more easily test a pure function than something which involves e.g. mutating state of some “context” object that you need to create and provide.
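To illustrate the "just a couple of arbitrary functions" point, here is a sketch of a wordcount mapper and reducer as plain F# functions. The function shapes are an assumption and may not match the signatures HadoopFs actually expects; `wordCount` is a hypothetical in-memory stand-in for the shuffle step, the sort of thing you could use in a unit test.

```fsharp
open System

// Mapper: one input line in, a (word, 1) pair out for every word.
let mapper (line: string) =
    line.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries)
    |> Seq.map (fun word -> word.ToLowerInvariant(), 1)

// Reducer: one word plus all of its counts in, one total out.
let reducer (word: string, counts: seq<int>) =
    word, Seq.sum counts

// In-memory stand-in for Hadoop's shuffle: group the mapped pairs by word
// and hand each group to the reducer.
let wordCount (lines: seq<string>) =
    lines
    |> Seq.collect mapper
    |> Seq.groupBy fst
    |> Seq.map (fun (word, pairs) -> reducer (word, Seq.map snd pairs))
    |> Seq.toList
```

Neither `mapper` nor `reducer` touches any context object, so testing them is just a matter of calling them with a string or a list.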
The only time you use the HadoopFs types and functions is when plugging your MR code into an executable for use with Hadoop: -
You can see from the last example how you can essentially plug in any input / output source e.g. file system or console etc. This is very useful for e.g. unit testing as you can simply provide an in-memory list of strings and get back the output from a full map-reduce.
I still have some more work to do on it – some cleaning up of the function signatures for consistency etc., and there’s no doubt some extra corner cases to deal with, but as an experiment in doing this in a day or so, it was a good learning exercise in Hadoop streaming. Indeed, the hardest part was actually in generating a lazy group of key/values for the reduce from a flat list of sorted input rows. I’d also like to write a generic MapReduce executable that can be parameterised for the mapper or reducer that you need.
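As an illustration of that grouping problem, here is one possible sketch. It is not the HadoopFs implementation: it assumes tab-separated "key, tab, value" rows already sorted by key, and it is lazy group by group only, since each group's values are buffered in a list before the group is yielded.

```fsharp
// Split a streaming row of the form "key<TAB>value" into its two parts.
let parseRow (line: string) =
    let parts = line.Split([| '\t' |], 2)
    parts.[0], parts.[1]

// Turn key-sorted rows into (key, values) groups, emitting each group as
// soon as the key changes rather than reading the whole input first.
let groupSorted (rows: seq<string * string>) : seq<string * string list> =
    seq {
        let currentKey : string option ref = ref None
        let values : string list ref = ref []
        for (key, value) in rows do
            match currentKey.Value with
            | Some k when k = key ->
                values.Value <- value :: values.Value
            | Some k ->
                yield (k, List.rev values.Value)
                currentKey.Value <- Some key
                values.Value <- [ value ]
            | None ->
                currentKey.Value <- Some key
                values.Value <- [ value ]
        // Flush the final group, if there was any input at all.
        match currentKey.Value with
        | Some k -> yield (k, List.rev values.Value)
        | None -> ()
    }
```

Because the input is already sorted, only one group ever needs to be held in memory at a time.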
All said though, considering the entire framework including test helper classes is less than 150 lines of code, it’s quite nice I think.
In the words of Professor Farnsworth – Good news everybody! I’ve finally gotten around to looking at adding some basic Azure Table Storage support to the Azure Type Provider.
Why Table Storage?
There are some difficulties with interacting with Azure Table Storage through the native .NET API, some of which impacts how useful (or not) the Type Provider can be, and some of which the Type Provider can help with: -
- The basic API gives you back an IQueryable, but you can only use Where, Take and First. Any other calls will give a runtime exception
- You can write arbitrary queries against a table with the above restriction, but this will invoke a full table scan
- The quickest way of getting an entity is by the Partition and Row keys, otherwise you’ll effectively initiate a full (or at best, a partial) table scan
- You can’t get the number of rows in a table without iterating through every row
- You can’t get a complete list of partitions in a table without iterating through every row
- There’s no fixed schema. You can create your own types, but these need to inherit from TableEntity. Alternatively, you can use the DynamicTableEntity to give you key/value pair access to every row; however, accessing values of an entity is a pain as you must pick a specific “getter” e.g. BooleanValue or StringValue.
So, how does the Type Provider help you?
Well, first, you’ll automatically get back the list of tables in your storage account, for free. On dotting to a table, the provider will automatically infer the schema based upon the first x number of rows (currently I’ve set this to 20 rows) and will automatically generate the entity type.
How do we do this? Well, a table collection doesn’t have a schema that all rows must conform to, but what you do get on each cell of each entity returned is metadata including the type which can be mapped to regular .NET types; this is made easier when using the DynamicTableEntity. The generated properties in the Type Provider will use the EDM data from the row to get the data back as the correct type e.g. String, Int32 etc., and will collate different entities in the same table as a single merged entity which is the sum of both shapes.
Once this is done, you can pull back all the rows from a specific table partition into memory and then query it to your heart’s content. Here’s a little sample to get you started – imagine a table as follows: -
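The merging step can be pictured with plain F# maps. This is only an illustration of the idea, not the provider's code: `RowShape` is a hypothetical representation of one sampled row as property name to EDM type name, and the two sample rows are invented.

```fsharp
// Hypothetical representation: one sampled row as property name -> EDM type name.
type RowShape = Map<string, string>

// Merge the shapes of all sampled rows into a single shape containing every
// property seen on any row (a later row wins if a name repeats).
let mergeShapes (sampledRows: RowShape list) : RowShape =
    let addRow (merged: RowShape) (row: RowShape) =
        Map.fold (fun acc name edmType -> Map.add name edmType acc) merged row
    List.fold addRow Map.empty sampledRows

// Two invented rows with different shapes in the same table.
let playerRow = Map.ofList [ "Name", "Edm.String"; "Cost", "Edm.Double" ]
let teamRow = Map.ofList [ "Name", "Edm.String"; "Stadium", "Edm.String" ]

let mergedShape = mergeShapes [ playerRow; teamRow ]
```

The merged shape has Name, Cost and Stadium. Since any given row may be missing some of those, it is natural for every generated property to be an option type.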
Then with the Azure Type Provider you can do as follows: -
- The good: player is strongly typed, down to the fact that the Cost property is a float option (not a string or object).
- The ugly: You have to explicitly supply the Partition key as plain text. There’s no easy way to get a complete list of all partitions, although I am hoping to at least suggest some partition keys based on e.g. first 100 rows.
What doesn’t it do (yet)?
- You currently can’t write arbitrary queries to execute on the server. You can pull back all the entities for a particular partition key, but that’s it, nor can you specify a limit on how many entities to bring back. I want to look at ways that you can create query expressions over these provided types, or at least ways you can create “weak” queries (off of the standard CreateQuery() call) and then pipe that into the provider
- All properties of all entities are option types. This is not so different from the real underlying Table Storage fields in a DynamicTableEntity, which are returned as nullables for all value types (the only EDM reference type is String), and is in part because there’s no way to easily know whether any column is optional or not. I would like to give the user the option to say that they want e.g. all fields to not be option types and to e.g. return default(T) or throw an exception instead
- You can’t search an individual entity by Row Key (yet)
- You can’t download an entire table as a CSV yet – but you will be able to shortly
- No write support
- No async support (yet)