Demystifying the Enigma machine with F#

24 December, 2014 2 comments

I had a couple of evenings free this week, so I decided to see if I could implement the Enigma machine in F#. The Enigma was used during WW2 by the Nazis, and was famously broken by the Polish, the French and ultimately the British at Bletchley Park with the aid of some of the first programmable computers.

An overview of Enigma

The first task was to gain an understanding of how the machine worked. I’d actually already visited Bletchley Park a few years back, as well as read up on the Enigma machine, so I had a limited understanding of how it worked. However, actually implementing it in code taught me quite a few things about it that I didn’t know!

At a high level, you can think of the Enigma as a machine that performs a number of substitution cyphers in a fixed pattern, with the substitutions changing after every letter. Although the shifts in substitution are relatively simple, I do wonder at just how individuals were able to crack these codes without detailed knowledge of the machines. Even with that knowledge, and even if you knew which rotors were used, without the keys there are still many permutations to consider. Apparently one of the main reasons that the Enigmas were eventually broken was down to human error e.g. many messages were initiated (or signed off) with the same common text, or the same message was sent multiple times but with different encryption settings, thus enabling their decryption.

The Enigma we’re modelling comprised several components, each with unique (and here, slightly simplified) behaviours: –

  • Three rotors. Each rotor linked to another rotor and acted as a substitution matrix, e.g. the 1st character on one rotor connected to the 5th character on the adjacent rotor. Before each key press, the first rotor would cycle to the next position. After passing a specific letter (known as the “Knock On”), the next rotor would also cycle forward one notch.
  • A reflector. This took a signal from a rotor, performed a simple substitution that was guaranteed not to substitute any letter to itself, and sent it back to the rotors. These rotors would then process backwards, performing another set of three substitutions, but using the reverse cypher.
  • A plugboard. This acted as an additional substitution layer for mapping letters bi-directionally e.g. A <-> B, C <-> D etc.

Here’s how a single character would flow through the machine to be encoded: –

(Diagram: the Enigma encoding pipeline.)

After each “stage” in the pipeline, the character would be substituted for another one, so by the time you finish, there have been nine separate substitutions and you end up with the letter “I”. Because of the nature of the rotors, at least one of which moves after every keypress, sending the same letter again immediately afterwards would not generate the same output.

You could also configure the Enigma in several ways: –

  • There were nine rotors, each hard coded with different substitutions, and three rotor sockets on an Enigma; thus many permutations existed depending on which rotors were inserted.
  • There were two reflectors in wide operation, one of which would be used.
  • The plugboard would be used to perform an initial substitution from the letters on the keyboard.
  • Each rotor could be given an arbitrary starting position (1-26).
  • Each rotor could be given a specific offset (ring setting) which would also apply to the substitution.

Mapping the problem into code

So, enough about the Enigma itself – how do we map this into F#! Well, to start with, here’s a simple set of types to model our (slightly dumbed down) domain: –
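The actual types are embedded as a gist, so here is a rough sketch of what such a domain might look like; the names and shapes below are my assumptions rather than the original code: -

```fsharp
// Sketch only - illustrative names, not the original gist.
type RingSetting = RingSetting of char
type WheelPosition = WheelPosition of char
type KnockOn = KnockOn of char

type Rotor =
    { Mapping : string            // 26-character substitution alphabet
      KnockOn : KnockOn           // position past which the next rotor advances
      RingSetting : RingSetting
      Position : WheelPosition }

type Reflector = Reflector of string
type PlugBoard = PlugBoard of Map<char, char>

type Enigma =
    { Left : Rotor
      Middle : Rotor
      Right : Rotor
      Reflector : Reflector
      PlugBoard : PlugBoard }
```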

And that’s all we need to model the system. Note the use of single-case Discriminated Unions to provide a way to easily wrap around primitive types that are used in multiple places e.g. RingSetting and WheelPosition. Using these not only guides the compiler, but also allows us to infer usage based solely on the type being used – we don’t need to rely on variable names.

Composing functionality together

What’s interesting is how in F# you can get good results very quickly by simply starting with small functions and not necessarily worrying too much about the big picture. Once you have these small bits of functionality, you can compose them together to build much more powerful systems. Look at the pipeline diagram above, and then map that to this code below: –
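The snippet itself is an embedded gist; as a rough sketch, the pipeline might be composed like this, where plugBoard, the rotor functions and reflect are assumed to be char -> char stages built from the machine's current state: -

```fsharp
// Sketch only: nine char -> char substitutions composed into a single pipeline.
let translateChar plugBoard right middle left reflect leftBack middleBack rightBack =
    plugBoard                                   // 1. plugboard
    >> right >> middle >> left                  // 2-4. forward through the rotors
    >> reflect                                  // 5. the reflector
    >> leftBack >> middleBack >> rightBack      // 6-8. back through the rotors (reverse cypher)
    >> plugBoard                                // 9. and through the plugboard again
```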

Notice how closely the code above and the previous diagram map to one another. Indeed, all of these functions have identical signatures, i.e. char -> char. This makes perfect sense when you consider the task of each stage of the pipeline – give me a char, and I’ll give you another one back out. You could even view the pipeline as a list of (char -> char) functions that you can fold with a single character to get a result out.

Having created this simple function to translate a single character, we can now compose this function into one that can translate a whole string of characters: –
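Again, the original is an embedded gist; here is a minimal sketch, assuming rotate and translateChar are helpers that advance the wheels and encode a single character respectively: -

```fsharp
// Sketch: fold over the text, threading a fresh (rotated) copy of the Enigma through each step.
let encrypt rotate translateChar (enigma : Enigma) (text : string) =
    let _, output =
        text
        |> Seq.fold (fun (enigma, output) letter ->
            let enigma = rotate enigma                // advance the wheels before the "key press"
            let letter = translateChar enigma letter  // push the character through the pipeline
            enigma, letter :: output) (enigma, [])
    output |> List.rev |> List.map string |> String.concat ""
```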

Notice that although we need to track the state of the Enigma (to manage wheel rotations after each translated character), we don’t mutate the actual Enigma that was provided; rather, we internally create copies with the required changes on each pass. Once we have the capability to encrypt an entire string, we can easily build up an easy-to-consume API: –
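As a purely hypothetical example of the sort of API this builds up to (the function names here are illustrative, not the ones in the repository): -

```fsharp
// Hypothetical consumer-facing usage.
let encoded =
    defaultEnigma
    |> withWheelPositions 'A' 'B' 'C'
    |> withRingSettings 'D' 'E' 'F'
    |> encryptMessage "THISISASECRETMESSAGE"
```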

Conclusion

Please feel free to have a look at the full source code here. Things to take away: –

  • Problems can often easily be solved through a bottom-up approach, naturally allowing a higher-level design and API to present itself.
  • Composition and partial function application are key enablers to easily manipulate functions together.
  • A large part of the Enigma process can essentially be seen as an ordered set of pure char -> char functions.
  • F# has a concise type system that allows us to express our domain in just a few lines of code.

Also check out the use of FsCheck within the set of unit tests as a way of testing specific properties of the Enigma machine with a large set of data!

Analysing website download sizes

7 December, 2014 Leave a comment

When I go to watch football matches at Tottenham, I find it interesting to see the scores and summaries of other teams during e.g. the half time break. There are many sites and applications that can do this. One I use often is the BBC website, where they have a page that summarises what is happening “around the grounds” in one page, with a list of “comments”. A comment might look as follows: –

90+2 mins Man City 1-0 Everton

Throw to Everton deep inside their own half. They have to hurry up.

Some comments may be longer. Others will have images or thumbnails alongside them. But the basic idea is to let you know what is happening across all the games in the Premier League (there are usually 4-5 games being played at once, but sometimes this can rise to 10).

If my mobile phone network provider EE are not playing up, and I can get reception, then I will use that site at half time, full time, and sometimes mid-way through the game if there’s a stoppage for e.g. an injury. The problem is that the site has recently had a bit of an overhaul, and I’ve noticed that it sometimes takes ages to load, so I wanted to find out why this was – was it EE or just the new BBC site design?

Analysing website performance

The site has changed: in the past it was simply a static list of “comments” that would auto-refresh at a regular interval; it’s now more like a SPA, with a static HTML shell plus JS, and a dynamically updating set of data in the centre. It also now loads in two parts – first a subset of data containing the 10 most recent comments, and then a button to load all comments for the day (for a typical Saturday, this might be up to 500 comments).

To capture the data, I used the excellent debugging and profiling support built into Firefox that lets you monitor network activity and then divide it up based on the type of data e.g. HTML or CSS. I then put that data into a simple F# script and used FsPlot to push it into Highcharts as a variety of graphs.
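As a sketch of the kind of script involved (the numbers below are made up, and the FsPlot charting call itself is omitted; this only shows the aggregation step): -

```fsharp
// Illustrative only: summarise captured downloads by content type before charting them.
let downloads =
    [ "text/html", 92000
      "text/css", 180000
      "application/javascript", 760000
      "image/png", 690000
      "application/json", 79000 ]

let total = downloads |> List.sumBy snd |> float

let breakdown =
    downloads
    |> List.map (fun (contentType, bytes) -> contentType, float bytes * 100.0 / total)
```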

Some of the results were quite surprising.

The first is that in the “initial” load of the page (just to see the ten most recent comments), you have to download an astonishing 2.2MB of data, across 85 files. Worse still is how this is spread: –

(Chart: breakdown of the initial page load by content type.)

To put it another way – around two-thirds of the data is taken up by JavaScript or images, some more for the HTML shell and CSS, and a measly 3.5% is used for those actual ten most recent comments that you want to read. It gets even worse if you then click the “load all comments” button: –

(Chart: breakdown of the full download, after loading all comments.)

That’s right. Over 9MB of data needs to be downloaded to your phone just to see what’s happening with the eight or ten teams playing football today. Whilst the JS, CSS and HTML shell are unchanged, over 7MB is taken up by 192 images (some of which weigh in at over 700KB each). The extra content file (containing the remaining 400+ text comments) comes in at about 700KB in total, which accounts for around 8% of the total download (and although this is JSON, there’s embedded HTML within the payloads).

Conclusion

BBC web team – this is not a good experience. Even on my desktop PC, when I click the “load all comments” button, it locks Firefox for a couple of seconds whilst it downloads the remaining comments and the associated images. And the main use case is very simple: –

As a football fan, I want to quickly read what is happening around the grounds so that I can laugh/cry at the results.

Nowhere in the above sentence does it say anything about wanting to wait 5 minutes whilst it loads a million images. Nowhere does it say anything about wanting to use up a percentage of my data plan for tons of irrelevant pictures. I just want to read some football commentary!


Distributing the F# Mailbox Processor

4 December, 2014 12 comments

Note: This blog post is part of the 2014 F# Advent Calendar. Be sure to check out yesterday’s Intro to Data Science post by Jon Wood!

Mailbox Processors 101

If you’ve been using F# for any reasonable length of time, you’ll have come across the MailboxProcessor, AKA the F# Agent (or Actor). Mailbox Processors are cool. They give us the ability to offload work to background processors without worrying about managing the thread that they live on (agents silently “go to sleep” when they aren’t processing anything), and they take away the pain of locking, as they ensure that only one message will be processed at a time whilst automatically queuing up pending messages. They also allow us to visualise problems differently from how we might with a raw Task – in terms of message passing. We can partition data by pushing it to different actors, and thus in some cases eliminate locking issues and transactions altogether.

Here’s a sample Mailbox Processor that receives an arbitrary “message” and “priority” for a specific user and automatically accumulates them together into a file (let’s imagine it’s a distributed file e.g. Azure Blob Storage or similar):-
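The original agent is an embedded gist; the sketch below captures the idea, with saveMessages standing in for the hypothetical distributed-file write: -

```fsharp
type Priority = Low | Normal | High

type Command =
    | AddMessage of string * Priority
    | Clear

// Hypothetical persistence call - imagine this writes to e.g. Azure Blob Storage.
let saveMessages (user : string) messages =
    printfn "Saving %d messages for %s" (List.length messages) user

// Generator function so that we can create a new agent per user on demand.
let createAgent (user : string) =
    MailboxProcessor.Start(fun inbox ->
        let rec loop messages = async {
            let! command = inbox.Receive()
            match command with
            | AddMessage(text, priority) ->
                let messages = (text, priority) :: messages
                saveMessages user messages
                return! loop messages
            | Clear ->
                return! loop [] }
        loop [])
```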

As is common with F#, we use a discriminated union to show the different types of messages that can be received and pattern match on them to determine the outcome. Also notice how the agent code itself is wrapped in a generator function so that we can easily create new agents as required.

This is all great – what’s not to love? Well, it turns out that the Mailbox Processor does have a few limitations: –

  • In process only. You can’t distribute workloads across multiple machines or processes.
  • Does not handle routing. If you want multiple instances of the same agent, you need to manually create / destroy those agents on demand, and ensure that there is a way to route messages to the correct agent. This might be something as simple as a Map of string and agent, or something more complex.
  • Does not have any fault tolerance built-in. If your agent dies whilst processing a message, your message will be lost (unless you have written your own fault handling mechanism). If the agent crashes, all queued messages will also be lost.

These are things that are natively provided in other languages like Erlang, or by some frameworks.

Actors on F#

At this point, it’s worth pointing out that there are already several actor frameworks that run on F# (some of more utility than others…): –

  • Akka .NET – a port of the Scala framework Akka. Akka is a tried-and-tested framework, and the port is apparently very close to the original version, so it might be a good call if you’re coming from that background.
  • Cricket - an extensible F# actor framework that has support for multiple routing and supervision mechanisms.
  • Orleans - an actor framework by Microsoft that was designed for C# but can be made to work with F#. However, it’s not particularly idiomatic in terms of F#, and seems to have been in alpha and beta forever.
  • MBrace - a general distributed compute / data framework that can be made to work as an actor framework.

One commonality between the above is that none of them use the native Mailbox Processor as a starting point. I wanted something that would enable me to simply lift my existing Mailbox Processor code into a distributed pool of workers. So, I wrote CloudAgent – a simple, easy-to-use library that is designed to do one thing and one thing only – distribute Mailbox Processors with the minimal amount of change to your existing codebase. It adds the above three features (distribution, routing and resilience) to Mailbox Processors by pushing the bulk of the work into another component – Azure Service Bus (ASB).

Azure Service Bus

ASB is a resilient, cheap and high-performance service bus offered on Azure as a platform-as-a-service (PAAS). As such, you do not worry about the physical provisioning of the service – you simply sign into the Azure portal, create a service bus, and then create things like FIFO queues or multicast Topics underneath it. The billing for this is cheap – something like $0.01 for 10,000 messages, and it takes literally seconds to create a service bus and queue.

How do we use ASB for distribution of Mailbox Processors? Well, CloudAgent uses it as a resilient backing store for the Mailbox Processor queue, so instead of messages stacking up in the mailbox processor itself, they stack up in Azure and are pulled one at a time into the mailbox processor. CloudAgent automatically serializes and deserializes the messages, so as far as the Mailbox Processor is concerned, this happens transparently (currently this is JSON, but I’m looking to plug in other frameworks such as FsPickler in the future). We’ll now see how we use the features of ASB to provide the three features that we want to add to Mailbox Processors.

Distribution and Routing

These first two characteristics can essentially be dealt with in one question: “How can I scale Mailbox Processors?”. Firstly, as we’re using Service Bus, it automatically handles both multiple publishers and subscribers for a given FIFO queue. This allows us to push many messages onto a queue, and have many worker processes handling messages simultaneously. This is something CloudAgent does automatically for us – when a consumer starts listening to a Service Bus Queue, it will immediately start polling for new messages (or, as we’ll see shortly, sessions), and then route them to an “appropriate” worker. What does this mean? To answer that, we need to understand that there are two types of worker models: –

Worker Pools

Worker Pools in CloudAgent are what I would classify as “dumb” agents. They do not fit in with the “actor” paradigm, but are more for processing generic messages that do not necessarily need to be ordered in a specific sequence, or by a single instance. This might be useful where you need “burst out” capability for purely functional computations that can be scaled horizontally without reliance on other external sources of data. In this model, we use a standard ASB queue to hold messages, and set up as many machines as we want to process messages. (By default, CloudAgent will create 512 workers per node). Each CloudAgent node will simply poll for messages and allocate each one to a random agent in its local pool.

Actor Pools

Actor Pools fit more with the classic Agent / Actor paradigm. Here, messages are tagged with a specific ActorKey, which ensures that only a single instance of a F# Mailbox Processor will process messages for this actor at any one time. We use a feature of ASB Queues, called “Sessions”, to perform the routing: Each CloudAgent node will request the next available “session” to process; the session represents the stream of messages for a particular actor. Once a session is made available (by a message being sent to the queue, with a new actor key), this will be allocated to a particular worker node, and subsequently to a new instance of Mailbox Processor for that actor (CloudAgent maintains a map of Actor Key / Mailbox Processors for local routing).

So if you send 10 messages to Actor “Joe Bloggs”, these will all be routed to the same physical machine, and the same instance of Mailbox Processor. Once the messages “dry up”, that specific mailbox processor will be disposed of by the CloudAgent node; when new messages appear, a new instance will be allocated within the pool and the whole cycle starts again.

Here’s an example of how we would connect our existing Mailbox Processor from above into CloudAgent in terms of both producer of messages (equivalent of Post) and consumer of messages:
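The code itself is an embedded gist; roughly speaking it looks like the sketch below, although the exact CloudAgent type and function names here are from memory and should be treated as assumptions: -

```fsharp
// Sketch: bind the agent generator to an Azure Service Bus queue via CloudAgent.
// ServiceBusConnection, WorkerCloudConnection, ConnectionFactory and BasicCloudAgent are assumed names.
let connection =
    WorkerCloudConnection(ServiceBusConnection "servicebus-connection-string", Queue "userMessages")

// Consumer: start listening on this node, spinning up standard Mailbox Processors as required.
ConnectionFactory.StartListening(connection, createAgent >> BasicCloudAgent)

// Producer: instead of Post-ing to an agent directly, send the message to Service Bus.
let sendMessage = ConnectionFactory.SendToWorkerPool connection
sendMessage (AddMessage("hello from the cloud", Normal))
```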

Notice that there is no change to the Mailbox Processor code whatsoever. All we have done is bind up the creation of a Mailbox Processor to ASB through CloudAgent. Instead of us calling Post on an agent directly, we send a message to Service Bus, which in turn is captured by CloudAgent on a worker node, and then internally Posted to the appropriate Mailbox Processor. In this context, you can think of CloudAgent as a framework over Mailbox Processors to route and receive messages through Azure Service Bus Queues.

Resiliency

An orthogonal concern to routing and distribution is that of message resiliency. One of the features that we get for free by pushing messages through Azure Service Bus is that messages waiting to be processed are by definition resilient – they’re stored on Service Bus for a user-defined period until they are picked up off the queue and processed. You might set this to a minute, a day, or a week – it doesn’t matter. So until a message starts to be processed, we do not have to worry if no consumers are available. But what about messages that are being processed – what if the Mailbox Processor crashes part way through? Again, CloudAgent offers us a way of solving this: –

Basic Agents

Basic Agents contain no fault tolerance within the context of message processing. This actually fits nicely within the context of the standard “Fire-and-forget” mechanism of Posting messages to F# MPs. Whilst it’s your responsibility to ensure that you handle failures yourself, you don’t have to worry about repeats of messages – they will only ever be sent to the pool once. You might use this for messages that might go wrong once in a while where it’s not the end of the world if they do. With a Basic Agent, the above Mailbox Processor code sample would not need to change at all.

Resilient Agents

Service Bus also optionally gives us “at least once” processing. This means that once we finish processing a message, we must respond to Service Bus and “confirm” that we successfully processed it. If we don’t respond in time, Service Bus will assume that the processor has failed, and resend the message; if too many attempts fail, the message will automatically get dead-lettered. How do we map this “confirmation” process onto Mailbox Processors? That’s easy – through a variant of the native PostAndReply mechanism offered by Mailbox Processors. Here, every message we receive contains a payload and a reply channel that we call with a choice of Completed, Failed or Abandoned. Failed tells Service Bus to retry (until it exceeds the retry limit), whilst Abandoned will immediately dead-letter the message. The latter is useful for “bad data”, rather than for transient failures such as database connection failures, where you would probably want the retry functionality.

Here’s how we change our Mailbox Processor code to take advantage of this resilient behaviour; notice that the Receive() call now returns a payload and a callback function that gets translated into ASB. We also add a new business rule that says we can never have more than 5 messages saved at once; if we do, we’ll reject the message by Abandoning it: –
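A sketch of the change (the reply channel's status type is modelled locally here; in CloudAgent the equivalent type is supplied by the library, and its exact name is an assumption): -

```fsharp
// Local stand-in for the library-provided processing status.
type MessageStatus = Completed | Failed | Abandoned

// Each dequeued item is now a payload plus a reply function, which CloudAgent translates back into ASB.
let createResilientAgent (user : string) =
    MailboxProcessor.Start(fun inbox ->
        let rec loop messages = async {
            let! command, reply = inbox.Receive()
            match command with
            | AddMessage _ when List.length messages >= 5 ->
                reply Abandoned                     // business rule: never more than 5 messages saved at once
                return! loop messages
            | AddMessage(text, priority) ->
                let messages = (text, priority) :: messages
                saveMessages user messages
                reply Completed
                return! loop messages
            | Clear ->
                reply Completed
                return! loop [] }
        loop [])
```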

At the cost of having to explicitly reply to every message we process, we now get retry functionality with automatic dead-lettering. If the agent crashes and does not respond within a specific time, Service Bus will also automatically resend the message for a new agent to pick up. Bear in mind, though, that this also means that if the first consumer does not respond in time, Service Bus will assume it has died and repost the message – so a message may be posted many times. Therefore, in this model, you should design your agents to process messages in an idempotent manner.

Conclusion

Here’s a screenshot of three processes (all running on the same machine, but could be distributed) subscribing to the same service bus through CloudAgent and being sent messages for three different Actors. Notice how all the messages for a given actor have an affinity to a particular console window (and therefore consumer and agent): –

Service Bus enables us to quickly and easily convert Mailbox Processors from single-process concepts into massively scalable and fault-tolerant workers that can be used as dumb workers or as actors with in-built routing. The actual CloudAgent code is pretty small – just four or five .fs files and around 500 lines of code. This isn’t only because F# is extremely succinct and powerful, but also because Azure is doing the heavy lifting of everything we want from a messaging subsystem; when coupled with the Mailbox Processor / Agent paradigm, I believe that this forms a simple, yet compelling offering for distributing processing in a fairly friction-free manner.

Lightweight websites with F#

3 September, 2014 16 comments

There are several common approaches I’ve seen people take on the .NET platform when writing web-based applications that I want to review in terms of language and framework choice: -

  • Adopt a conventional MVC application approach. Write static HTML that is emitted from the server using e.g. Razor markup + C# / VB .NET, write your controllers and any back-end logic in C#.
  • As above, but replace your back-end logic with F#. This is a reasonable first step to take, because essentially all your data access and back-end processing is performed in the language best suited to it, whilst your C# is relegated to thin controllers and some simple markup logic.
  • Adopt a “SPA”-style approach. By this I mean split your web application into two distinct applications – a client-side application that is self-managing, typically using Javascript and some framework like Knockout or AngularJS; meanwhile your back-end is a hosted Web API written in F#.
  • Write the entire application in F#. Surely you can’t write websites in F# can you? Well, actually, there are some (pretty sophisticated) frameworks like WebSharper out there that can do that, rewriting your F# into e.g. Typescript and the like.

I haven’t used WebSharper in depth, so I can’t comment on the effectiveness of writing your client-side code in F# and am therefore not going to talk about the latter option today. I have, however, written Web APIs in F# and want to talk about where I think your separation of concerns should lie with respect to client- and server-side code.

As far as I’m concerned, if you’re a .NET developer today, writing websites, then you should be writing as much of the CLR-side code as possible in F#. I am really pleased with the brevity that you can get from the combination of OWIN, Katana (Microsoft’s OWIN web-hosting framework), Web API and F#. This combination allows you to create Web APIs simply and easily, and when combined with a SPA client-side website it is a compelling architectural offering.

Sudoku in F#

Some months ago, I wrote a Sudoku solver in F# (I think that there’s a gist somewhere with the implementation). I wanted to try to write a website on top of it with a visual board that allowed you to quickly enter a puzzle and get the solution back. So, having borrowed some HTML and CSS from an existing website, I set about doing it. You can see the finished site here and the source code is here.


Client

  • HTML
  • AngularJS
  • Typescript (no native Javascript please!)

Server

  • F#
  • F#
  • F#

Standard JSON is used to pass data between website and server. On the server side, we use OWIN, Katana and Web API to handle the web “stuff”. This then ties into the real processing with the minimum of effort. This was all done in a single solution and a single F# project.

OWIN with F#

I’m no Angular or Typescript expert so I’m not going to focus on them – suffice it to say that Typescript is a massive leap over standard Javascript whilst retaining backwards compatibility, and AngularJS is a decent MVC framework that runs in Javascript. What I’m more interested in talking about is how to host and run the entire site through a single F# project. Mark Seemann‘s excellent blog has already discussed creating ASP .NET websites through F#, and there are indeed some templates that you can download for Visual Studio that enable this. However, they still use ASP .NET and the full code-bloat that comes with it. Conversely, using OWIN and Katana, this all goes away. What I like about OWIN is that there’s no code generation, no uber folder hierarchies or anything like that; you have full control over the request / response pipeline, plus you get the flexibility to change hosting mechanisms extremely easily. To get started, all we need to do is download a (fair) few NuGet packages, and then create a Startup class with a Configuration method: -
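A minimal sketch of that Startup class, assuming the Microsoft.AspNet.WebApi.Owin package is referenced (the original gist may differ): -

```fsharp
namespace SudokuSite

open Owin
open System.Web.Http

type Startup() =
    member __.Configuration(app : IAppBuilder) =
        let config = new HttpConfiguration()
        // Map e.g. /api/sudoku onto SudokuController.
        config.Routes.MapHttpRoute("DefaultApi", "api/{controller}") |> ignore
        app.UseWebApi(config) |> ignore
```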

Once you have that, you can simply create Web API controllers: -
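Something along these lines (Solver.solve is a stand-in for the real puzzle-solving function): -

```fsharp
namespace SudokuSite

open System.Web.Http

// Stand-in for the real solver logic.
module Solver =
    let solve (puzzle : string []) = puzzle

type SudokuController() =
    inherit ApiController()
    // POST api/sudoku - takes the puzzle from the request body and returns the solved board.
    member __.Post([<FromBody>] puzzle : string []) = Solver.solve puzzle
```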

So two F# files, a web.config and you’re good to go from a server-side point of view. Talking of web.config – how do you create an F# web project? Mark Seemann’s blog gives full details on creating Visual Studio web projects that are F# compliant, but essentially just adding the “Project Type” GUID in the .fsproj file (I think it’s 349C5851-65DF-11DA-9384-00065B846F21) will do the job.

Combining client and server side assets

Because this is a full .NET web project you can do all the things that you would normally do in C# web projects, such as serve up static files (perfect for a SPA) like HTML, Javascript and CSS, as well as generating JavaScript from Typescript files (just add a project import for the Typescript msbuild target). If you appreciate the extra security you get from F# over other statically typed .NET languages, you’ll almost certainly want to use Typescript over raw Javascript as well, so this should be a given.

A single project that can serve up your web assets and the server side logic to go with it looks pretty simple – in this screenshot, in the api folder is my back-end logic – message contracts between client and server, the actual puzzle solver and the Web API controller.

Client side assets are few and far between – just a SudokuController.ts to hold the controller logic and Index.HTML + stylesheet for the presentation layer. It’s important to note that with a SPA framework like AngularJS, you serve static HTML and Javascript; the Javascript then essentially bootstraps, modifying the HTML dynamically, requesting JSON from the WebAPI and occasionally getting more static HTML. You never modify HTML on the server as you would do with something like Razor.

In addition, as it’s a “normal” website, with Visual F# 3.1.2, you can use Azure websites to deploy this easily – either through VS to manually publish out to Azure, or through Azure’s excellent source control integration to e.g. GitHub or BitBucket webhooks. It’s never been easier to get a CI deploy of a website out.

More flexibility with Web API

Another important thing about Owin is that it separates out the hosting element of the website from the actual project structure. So, after talking about all this nice website project integration, there’s actually nothing to stop you creating a standard F# library project, and then use either the Owin WebHost console application (available over NuGet), or create an empty website or Azure worker and then host it through that via the Owin Host. All this can be done without making any changes to your actual configuration class or actual controllers.
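For example, self-hosting the same Startup class from a console application via Katana might look something like this (assuming the Microsoft.Owin.Hosting and Microsoft.Owin.Host.HttpListener packages): -

```fsharp
open Microsoft.Owin.Hosting
open SudokuSite

[<EntryPoint>]
let main _ =
    // Start the exact same Startup class, but outside of IIS / ASP .NET.
    use server = WebApp.Start<Startup>("http://localhost:8080")
    printfn "Listening on http://localhost:8080 - press Enter to exit."
    System.Console.ReadLine() |> ignore
    0
```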

Conclusion

A common misconception around F# is that it’s great for use as a “computation engine” where you give it a number and it gives you back another number. Or perhaps a “data processing engine” where it can read some data from a flat file or a web service and do something to it. These are both true – however, there is very little reason why you can’t use it for full featured Web APIs using Owin (as there’s no code generation to miss out on from e.g. VS/C# projects), and with a minimum amount of effort, even as a full website host for a SPA that will consume that same Web API.

In my next post I want to replace the use of Typescript with F# using Funscript to illustrate how you can have a fully-managed end to end solution for both client and server in F#.

On Type Inference


This is a comment I recently saw on a headline-grabbing article about Swift: -

I also don’t think that “type inferring” is of great use. If you cannot be bothered to hack in a variable’s data type, maybe you should not develop software in the first place.

I was so infuriated by this comment that I ended up writing this blog post. What infuriated me was the arrogance of the second sentence as much as the ignorance of why type inference can be such a massive productivity gain.

Here are three versions of the same sentence written in three different ways – slightly contrived as usual to make my point…

  • Hello, I’m Java. I have a bag of apples. In this bag of apples there are five apples.
  • Hello, I’m C# 3. I have a bag of apples that has five apples in it.
  • Hello, I’m F#. I have a bag of five apples.

These statements don’t actually translate directly to code but you get the idea. Why would you need to write out the above sentence in either of the first two ways? Are you learning the language from a school textbook? Or using it in a professional & fluent manner?

C# can declare locally-scoped implicit variables and create implicitly-typed arrays. F# can do full Hindley-Milner type inference. Java can’t do much of anything, really. Here are the above sentences sort of written as code (the final example uses the same C#-style syntax to keep things consistent): –

The latter example is more in the F# vein, without braces, or keywords whose usage can be inferred.

The real power of type inference comes from the ability to chain multiple functions together in a succinct manner, and then at some point in the future changing the type of one of those functions and not having to change any type signatures of dependent functions. It’s quite addictive, and once you get used to it it’s quite difficult to go back to an explicitly-typed language.
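A small illustration of that: none of the functions below carries a type annotation, and if loadOrders later changed shape, only the code that genuinely cares about the change would need editing: -

```fsharp
// All types below are inferred from usage.
let loadOrders customerId = [ customerId, "Widget", 3; customerId, "Gadget", 1 ]
let totalItems orders = orders |> List.sumBy (fun (_, _, quantity) -> quantity)
let describe customerId =
    sprintf "Customer %d has %d items on order" customerId (customerId |> loadOrders |> totalItems)
```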

Last minute update: it turns out that the individual I quoted at the start of this post had confused type inference with dynamic typing. I suspect they won’t be the first or last person to do so.

F# Azure Storage Type Provider v1.0 released!

22 June, 2014 1 comment

So, last week I finally released the F# Azure Storage Type Provider as v1! I learned a hell of a lot about writing Type Providers in F# over the last few months as a result… Anyway – v1.0 deals with Blobs and Tables; I’m hoping to integrate Queues and possibly Files in the future (the former is particularly powerful from a scripting point of view). You can get it on NuGet or download the source (and add issues etc.) through GitHub.

Working with Blobs

Here’s a sample set of containers and blobs in my local development storage account displayed through Visual Studio’s Server Explorer and some files in the “tp-test” container: –

(Screenshots: the storage account in Visual Studio’s Server Explorer, and the files in the “tp-test” container.)

You can get to a particular blob in two lines of code: -

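A sketch of those two lines (the member names reflect my understanding of the provider's API, and the file name is illustrative): -

```fsharp
open FSharp.Azure.StorageTypeProvider

// Connect to the local storage emulator; for a live account, supply the account name and key instead.
type Local = AzureTypeProvider<"UseDevelopmentStorage=true">
let blob = Local.Containers.``tp-test``.``data.xml``
```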

The first line connects to the storage account – in this example I’m connecting to the local storage emulator, but to connect to a live account, just provide the account name and storage key. Once you navigate to a blob, you can download the file to the local file system, read it as a string, or stream it line-by-line (useful for dealing with large files). Of course you get full intellisense for the containers and folders automatically – this makes navigating through a storage account extremely easy to do: –

(Screenshots: intellisense over the containers and blobs.)

Working with Azure Tables

The Table section of the type provider gives you quick access to tables, does a lot of the heavy lifting for doing bulk inserts (automatically batching up based on partition and maximum batch size), and gives you a schema for free. This last part means that you can literally go to any pre-existing Azure Table that you might have and start working with it for CRUD purposes without any predefined data model.

Tables automatically represent themselves through intellisense, and give you a simple API to work with: –

(Screenshot: working with a table through the provider.)

Results are essentially DTOs that represent the table schema. Whilst Tables have no enforced schema, individual rows themselves do have one, and we can interrogate Azure to understand that schema and build a strongly-typed data model over the top of it. So the following schema in an Azure table:

(Screenshot: the table’s schema in Azure.)

becomes this in the type provider: –

(Screenshot: the equivalent provided type.)

All the properties are strongly typed based on their EDM type e.g. string, float etc. We can also execute arbitrary plain-text queries, or use a strongly-typed query builder to chain clauses and ultimately execute a query remotely: –

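A sketch of that query builder (the member names are my recollection of the API rather than gospel, and "employee" is a placeholder table name): -

```fsharp
// Assumed shape of the strongly-typed query builder.
let results =
    Local.Tables.employee
        .Query()
        .``Where Name Is``.``Equal To``("fred")
        .``Where Age Is``.``Greater Than``(21)
        .Execute()
```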

Whilst this is not quite LINQ over Azure, there’s a reason for that. Ironically, the Azure SDK does support IQueryable over table storage; but because Table Storage is computationally weak, there are severe restrictions on what you can do with LINQ – basically just Where and Take. The benefit of the more restrictive query set that the Type Provider delivers is that it is guaranteed at compile time to generate a query that will be accepted by Azure Tables, whereas IQueryable over Tables is not.

The provided types also expose the raw set of values for each entity as key/value pairs, so that you can easily push this data into other formats e.g. Deedle if you want.

Future Plans

I’d like to make some other features for the Storage Type Provider going forward, such as: –

  • Azure Storage Queue Support
  • “Individuals” support (a la the SQL Type Provider) for Tables
  • Support for data binding on generated entities for e.g. WPF integration
  • Potentially removing the dependency on the Azure Storage SDK
  • Option types on Tables (either through schema inference or provided by configuration)
  • Connection string through configuration
  • More Async support

If you are looking to get involved with working on a Type Provider – or have some Azure experience and want to learn more about F# – this would be a good project to cut your teeth on :-)

Pattern Matching in C#?


As the C# 6 previews have come out, I was not surprised to see pattern matching absent from the feature set. It’s a shame, but I can understand why it’s not included – it’s far more powerful than switch / case, but with both in the language it’d probably be difficult to make the two work together without becoming a bit of a mishmash.

Existing Pattern Matching in C#

One of the things that C# devs often say to me is that they can’t see what pattern matching really gives over switch / case – the answer is that it’s the ability to not only branch on conditions but also to implicitly bind the result to a variable simultaneously. I then realised that C# actually does already support pattern matching for a specific use case: exception handling. Look at the two versions of a simple try / catch block: –

In the first example, notice how the compiler will automatically set the next line of execution appropriately depending on which exception was raised AND will automatically bind the exception (by this I mean cast and assign) to the ex variable as appropriate. “Yeah yeah yeah” you might say – this is basic exception handling. But imagine that we didn’t have this sort of program flow and had to do it all with just if / then statements as per the second sample – it’s a bit of a mess, with typecasts and whatnot.

Pattern Matching over collections

Now, imagine that we wanted to do some conditional logic over an array of numbers: –

  • If the array is 2 elements long and starts with “7”, do something
  • Otherwise, if it’s 2 elements long, do something else
  • Otherwise, do something else

Look at this with conventional if / then statements or with pattern matching (obviously with code that wouldn’t compile in C# but is somewhat similar to what can be achieved in F#): –

This illustrates how you can use pattern matching as a means of drastically simplifying code – even for a simple example like the one above. In F# it’s more effective because of the lightweight syntax, but even here you can see where we’re going.
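For reference, the F# version of the array example might look something like this: -

```fsharp
let describe numbers =
    match numbers with
    | [| 7; second |] -> sprintf "Two elements starting with 7; the second is %d" second
    | [| _; _ |] -> "Some other two-element array"
    | _ -> "Something else entirely"
```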

Summary

This sort of pattern matching is intrinsic to F# where you can match over lists, arrays, types etc. etc. very easily, and another example of how branching and binding can easily simplify your code.
