Introducing Type Providers


Note: This article has been excerpted from my upcoming book, Learn F#. Save 37% off with code fccabraham!

Welcome to the world of data! This article will:

  • Gently introduce us to what type providers are
  • Get us up to speed with the most popular type provider, FSharp.Data.

What Are Type Providers?

Type Providers are a language feature first introduced in F# 3.0:

An F# type provider is a component that provides types, properties, and methods for use in your program. Type providers are a significant part of F# 3.0 support for information-rich programming.

https://docs.microsoft.com/en-us/dotnet/articles/fsharp/tutorials/type-providers/index

At first glance, this sounds a bit fluffy – we already know what types, properties and methods are. And what does “information-rich programming” mean? Well, the short answer is to think of Type Providers as T4 Templates on steroids – that is, a form of code generation, but one that lives inside the F# compiler. Confused? Read on.

Understanding Type Providers

Let’s look at a somewhat holistic view of type providers first, before diving in and working with one to actually see what the fuss is all about. You’re already familiar with the notion of a compiler that parses C# (or F#) code and builds MSIL from which we can run applications, and if you’ve ever used Entity Framework (particularly the earlier versions) – or old-school SOAP web services in Visual Studio – you’ll be familiar with the idea of code generation tools such as T4 Templates or the like. These are tools which can generate C# code from another language or data source.

Figure 1 – Entity Framework Database-First code generation process

Ultimately, T4 Templates and the like, whilst useful, are somewhat awkward to use. For example, you need to hook them into the build system to get them up and running, and they use a custom markup language with C# embedded in them – they’re not great to work with or distribute.

At their most basic, Type Providers are themselves just F# assemblies (that anyone can write) that can be plugged into the F# compiler – and can then be used at edit-time to generate entire type systems for you to work with as you type. In a sense, Type Providers serve a similar purpose to T4 Templates, except they are much more powerful, more extensible, more lightweight to use, and are extremely flexible – they can be used with what I call “live” data sources, as well as offering a gateway not just to data sources but also to other programming languages.

Figure 2 – A set of F# Type Providers with supported data sources

Unlike T4 Templates, Type Providers can affect the type system without rebuilding the project, since they run in the background as you write code. There are dozens, if not hundreds, of Type Providers out there, covering everything from simple flat files such as CSV, to SQL, to cloud-based data storage repositories such as Microsoft Azure Storage or Amazon Web Services S3. The term “information-rich programming” refers to the concept of bringing disparate data sources into the F# programming language in an extensible way.

Don’t worry if that sounds a little confusing – we’ll take a look at our first Type Provider in just a second.

Quick Check

  1. What is a Type Provider?
  2. How do type providers differ from T4 templates?
  3. Is the number of type providers fixed?

Working with our first Type Provider

Let’s look at a simple example of a data access challenge – working with some soccer results. Rather than use an in-memory dataset, we’ll work with a larger, external data source – a CSV file that you can download at https://raw.githubusercontent.com/isaacabraham/learnfsharp/master/data/FootballResults.csv. You need to answer the following question: which three teams won at home the most over the whole season?

Working with CSV files today

Let’s first think about the typical process that you might use to answer this question: –

Figure 3 – Steps to parse a CSV file in order to perform a calculation on it.

Before we can even begin to perform the calculation, we first need to understand the data. This normally means looking at the source CSV file in something like Excel, and then designing a C# type to “match” the data in the CSV. Then we do all of the usual boilerplate parsing – opening a handle to the file, skipping the header row, splitting on commas, pulling out the correct columns and parsing them into the correct data types, etc. Only after doing all of that can you start to work with the data and produce something genuinely valuable. Most likely, you’ll use a console application to get the results, too. This process is more like typical software engineering – not a great fit when we want to explore some data quickly and easily.
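
To make that boilerplate concrete, here is a minimal hand-rolled sketch in F#. The column positions and types are assumptions about the file’s layout, which is exactly the sort of guesswork this approach forces on you:

open System.IO

// A hand-rolled record we'd have to design up front by eyeballing the CSV in Excel
type ResultRow =
    { HomeTeam : string; AwayTeam : string; HomeGoals : int; AwayGoals : int }

let results =
    File.ReadAllLines @"..\..\data\FootballResults.csv"
    |> Array.skip 1                       // skip the header row
    |> Array.map(fun line ->
        let cols = line.Split(',')        // naive split - breaks on quoted commas
        { HomeTeam = cols.[1]             // column indexes are guesses at the file layout
          AwayTeam = cols.[2]
          HomeGoals = int cols.[3]
          AwayGoals = int cols.[4] })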

Introducing FSharp.Data

We could quite happily do the above in F#; at the very least, using the REPL affords us a more “exploratory” style of development. However, it wouldn’t remove the boilerplate of parsing the file – and this is where our first Type Provider comes in: FSharp.Data.

FSharp.Data is an open source, freely distributable NuGet package which is designed to provide generated types when working with data in CSV, JSON, or XML formats. Let’s try it out with our CSV file.

Scripts for the win

At this point, I’m going to advise that you move away from heavyweight solutions and start to work exclusively with standalone scripts – this fits much better with what we’re going to be doing. You’ll notice a build.cmd file in the learnfsharp code repository (https://github.com/isaacabraham/learnfsharp). Run it – it uses Paket to download a number of NuGet packages into the packages folder, which you can reference directly from your scripts. This means we don’t need a project or solution to start coding – we can just create scripts and jump straight in. I’d recommend creating your scripts in the src/code-listings/ folder (or another folder at the same level, e.g. src/learning/) so that the package references shown in the listings here work without needing changes.

Now you try

  1. Create a new standalone script in Visual Studio using File -> New. You don’t need a solution here – remember that a script can work standalone.
  2. Save the newly created file into an appropriate location as described in “Scripts for the win”.
  3. Enter the following code from listing 1:
// Referencing the FSharp.Data assembly
#r @"..\..\packages\FSharp.Data\lib\net40\FSharp.Data.dll"
open FSharp.Data

// Connecting to the CSV file to provide types based on the supplied file
type Football = CsvProvider< @"..\..\data\FootballResults.csv">

// Loading in all data from the supplied CSV file
let data = Football.GetSample().Rows |> Seq.toArray

That’s it. You’ve now parsed the data, converted it into a type that you can consume from F# and loaded it into memory. Don’t believe me? Check this out: –

Figure 4 – Accessing a Provided Type from FSharp.Data

You now have full intellisense over the dataset. You don’t have to manually parse the data set – that’s been done for you. You also don’t need to “figure out” the types – the Type Provider scans through the first few rows and infers the types based on the contents of the file! In effect, rather than using a tool such as Excel to “understand” the data, you can now use F# as a tool to both understand and explore your data.

Backtick members

You’ll see from the screenshot above, as well as from the code when you try it out yourself, that the fields listed have spaces in them! It turns out that this isn’t actually a type provider feature, but one that’s available throughout F#, called backtick members. Just place a double backtick (``) at the beginning and end of the member name and you can put spaces, numbers or other characters in it. Note that Visual Studio doesn’t correctly provide intellisense for these in all cases, e.g. let-bound members on modules, but it works fine on classes and records.
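
For example, here’s a quick sketch of backtick members on an ordinary record – nothing type-provider-specific about it, and the field names are just illustrative:

// Backtick members work on regular records and classes, not just provided types
type MatchResult =
    { ``Home Team`` : string
      ``Full Time Home Goals`` : int }

let game = { ``Home Team`` = "Arsenal"; ``Full Time Home Goals`` = 3 }
printfn "%s scored %d at home" game.``Home Team`` game.``Full Time Home Goals``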

Whilst we’re at it, we’ll also bring down an easy-to-use F#-friendly charting library, XPlot. This library gives us access to charts available in Google Charts as well as Plotly. We’ll use the Google Charts API here, which means adding dependencies to XPlot.GoogleCharts (which also brings down the Google.DataTable.Net.Wrapper package).

  1. Add references to both the XPlot.GoogleCharts and Google.DataTable.Net.Wrapper assemblies. If you’re using standalone scripts, both packages will be in the packages folder after running build.cmd – just use #r to reference the assembly inside one of the lib/net folders.
  2. Open up the XPlot.GoogleCharts namespace.
  3. Execute the following code to calculate the result and plot it as a chart.
// Reference the charting assemblies (adjust the paths and framework folder to match your packages folder)
#r @"..\..\packages\XPlot.GoogleCharts\lib\net45\XPlot.GoogleCharts.dll"
#r @"..\..\packages\Google.DataTable.Net.Wrapper\lib\Google.DataTable.Net.Wrapper.dll"
open XPlot.GoogleCharts

data
|> Seq.filter(fun row ->
    row.``Full Time Home Goals`` > row.``Full Time Away Goals``)
|> Seq.countBy(fun row -> row.``Home Team``) // generates a sequence of tuples (team, number of home wins)
|> Seq.sortByDescending snd
|> Seq.take 10
|> Chart.Column // converts the sequence of tuples into an XPlot Column chart
|> Chart.Show   // displays the chart in a browser window
Figure 5 – Visualising data sourced from the CSV Type Provider

In less than 20 lines of code, we were able to open up a CSV file we’d never seen before, explore its schema, perform some operations on it, and then chart the results – not bad! This ability to rapidly work with and explore datasets we’ve not even seen before, whilst still interacting with the full breadth of .NET libraries out there, gives F# unparalleled abilities for bringing disparate data sources into full-blown applications.

Type Erasure

The vast majority of type providers fall into the category of erasing type providers. The upshot of this is that the types generated by the provider exist only at compile time. At runtime, the types are erased and usually compile down to plain objects; if you try to use reflection over them, you won’t see the fields that you get in the code editor.

One of the downsides is that this makes them extremely difficult (if not impossible) to work with in C#. On the flip side, they are extremely efficient – you can use erasing type providers to create type systems with thousands of types without any runtime overhead, since at runtime they’re just of type Object.
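
You can see the erasure for yourself with the data from listing 1 – a quick sketch, assuming the data array is still in scope (the exact runtime type name will depend on the provider and version):

// Reflection sees only the erased runtime representation, not the provided columns
let row = data.[0]
printfn "Runtime type: %s" (row.GetType().Name) // the erased type name, not the provided one
row.GetType().GetProperties()
|> Array.iter(fun prop -> printfn "Property: %s" prop.Name) // no ``Home Team`` here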

Generative type providers allow for run-time reflection, but are much less commonly used (and from what I understand, much harder to develop).

If you want to know more, download the free first chapter of Learn F# and see this Slideshare presentation. Don’t forget to save 37% with code fccabraham.

F# Azure Storage Type Provider v1.0 released!


So, last week I finally released the F# Azure Storage Type Provider as v1! I’ve learned a hell of a lot about writing Type Providers in F# over the last few months as a result… Anyway – v1.0 deals with Blobs and Tables; I’m hoping to integrate Queues and possibly Files in the future (the former is particularly powerful from a scripting point of view). You can get it on NuGet or download the source (and add issues etc.) through GitHub.

Working with Blobs

Here’s a sample set of containers and blobs in my local development storage account, displayed through Visual Studio’s Server Explorer, along with some files in the “tp-test” container.


You can get to a particular blob in two lines of code: –

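Something along these lines – the static parameters and member names below are my best-guess illustration rather than the exact API, so check the GitHub README if you’re following along:

// Reference the provider (adjust the path to wherever the package lives)
#r @"packages\FSharp.Azure.StorageTypeProvider\lib\net40\FSharp.Azure.StorageTypeProvider.dll"
open FSharp.Azure.StorageTypeProvider

// Connect to the local storage emulator - the exact static parameters are an assumption;
// for a live account you'd supply the account name and storage key instead
type Azure = AzureTypeProvider<"UseDevelopmentStorage=true">

// Dot through the provided containers and files to get to a blob ("SomeFile.txt" is hypothetical)
let blob = Azure.Containers.``tp-test``.``SomeFile.txt``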

The first line connects to the storage account – in this example I’m connecting to the local storage emulator, but to connect to a live account, just provide the account name and storage key. Once you navigate to a blob, you can download the file to the local file system, read it as a string, or stream it line-by-line (useful for dealing with large files). Of course, you get full intellisense for the containers and folders automatically – this makes navigating through a storage account extremely easy to do.


Working with Azure Tables

The Table section of the type provider gives you quick access to tables, does a lot of the heavy lifting for bulk inserts (automatically batching up based on partition and maximum batch size), and gives you a schema for free. This last part means that you can literally go to any pre-existing Azure Table that you might have and start working with it for CRUD purposes without any predefined data model.

Tables automatically represent themselves through intellisense and give you a simple API to work with.

Results are essentially DTOs that represent the table schema. Whilst Tables have no enforced schema, individual rows do, and we can interrogate Azure to understand that schema and build a strongly-typed data model over the top of it – so the columns of a pre-existing Azure table surface directly as properties on the provided entity type.

All the properties are strongly typed based on their EDM type, e.g. string or float. We can also execute arbitrary plain-text queries, or use a strongly-typed query builder to chain clauses and ultimately execute a query remotely: –

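The shape of a query-builder call is roughly as follows – the table name and the per-column query members are hypothetical stand-ins for whatever the provider infers from your own table:

// Hypothetical table and member names - the provider generates equivalent
// "Where ... Is" style members for each column it infers from your table
let expensivePlayers =
    Azure.Tables.FootballPlayers
        .Query()
        .``Where Cost Is``.``Greater Than``(10.0)
        .Execute()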

Whilst this is not quite LINQ over Azure, there’s a reason for that. Ironically, the Azure SDK supports IQueryable over table storage, but because Table Storage is weak, computationally speaking, there are severe restrictions on what you can do with LINQ – basically just Where and Take. The benefit of the more restrictive query set that the Type Provider delivers is that it is guaranteed at compile time to generate a query that will be accepted by Azure Tables, whereas IQueryable over Tables offers no such guarantee.

The provided types also expose the raw set of values for each entity as key/value pairs, so you can easily push the data into other formats, e.g. Deedle, if you want.

Future Plans

I’d like to add some other features to the Storage Type Provider going forward, such as: –

  • Azure Storage Queue Support
  • “Individuals” support (a la the SQL Type Provider) for Tables
  • Support for data binding on generated entities for e.g. WPF integration
  • Potentially removing the dependency on the Azure Storage SDK
  • Option types on Tables (either through schema inference or provided by configuration)
  • Connection string through configuration
  • More Async support

If you are looking to get involved with working on a Type Provider – or have some Azure experience and want to learn more about F# – this would be a good project to cut your teeth on 🙂

Azure Storage Type Provider re-released


Just a short post to say that I’ve re-released the Azure Storage Type Provider on NuGet with a number of changes.

In short, on the back end I’ve rewritten a lot of the backing code to reduce the size of the codebase, reorganised the folder structure to match the conventions of other F# projects, and introduced FAKE. From the end-user perspective, I’ve added a load of features to improve usage of the provider across both Blobs and Tables. I’ve also tried my best to fix the issues around NuGet package dependencies, ultimately by removing them completely and simply embedding the required DLLs in the lib folder of the package. It works, but it’s not particularly pretty.

In fact, I made so many changes that I decided to repackage and rebrand it completely. The namespace has changed, and the package has been created anew: –

https://www.nuget.org/packages/FSharp.Azure.StorageTypeProvider

The old package has now been delisted.

It’s now versioned at 0.9.0 – by this I mean it’s almost feature complete for what I would consider ready to go, but what I desperately need from the community is some feedback. Does it work out of the box? Are there massive bugs that I’ve left in? Does it perform poorly? Are there extra features you would like to see added? Not enough async support? Don’t like the way something works? Tell me 🙂

 

Azure Storage F# Type Provider – Now with Table Support


In the words of Professor Farnsworth – Good news everybody! I’ve finally gotten around to looking at adding some basic Azure Table Storage support to the Azure Type Provider.

Why Table Storage?

There are some difficulties with interacting with Azure Table Storage through the native .NET API, some of which impact how useful (or not) the Type Provider can be, and some of which the Type Provider can help with: –

  • The basic API gives you back an IQueryable, but you can only use Where, Take and First. Any other calls will give a runtime exception
  • You can write arbitrary queries against a table, subject to the above restriction, but this will invoke a full table scan
  • The quickest way of getting an entity is by the Partition and Entity keys, otherwise you’ll effectively initiate a full (or at best, a partial) table scan
  • You can’t get the number of rows in a table without iterating through every row
  • You can’t get a complete list of partitions in a table without iterating through every row
  • There’s no fixed schema. You can create your own types, but these need to inherit from TableEntity. Alternatively, you can use the DynamicTableEntity to give you key/value pair access to every row; however, accessing values of an entity is a pain as you must pick a specific “getter”, e.g. ValueAsBoolean or ValueAsString.

So, how does the Type Provider help you?

Well, first, you’ll automatically get back the list of tables in your storage account, for free. On dotting to a table, the provider will automatically infer the schema based upon the first x number of rows (currently I’ve set this to 20 rows) and will automatically generate the entity type.

How do we do this? Well, a table doesn’t have a schema that all rows must conform to, but each cell of each entity returned carries metadata, including its type, which can be mapped to regular .NET types; this is made easier when using the DynamicTableEntity. The generated properties in the Type Provider use the EDM data from the row to get the data back as the correct type, e.g. String or Int32, and will collate different entities in the same table into a single merged entity which is the sum of both shapes.

Once this is done, you can pull back all the rows from a specific table partition into memory and then query them to your heart’s content. Here’s a little sample to get you started – imagine a table as follows: –

Table Storage Sample

Then with the Azure Type Provider you can do as follows: –
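
A rough sketch of the sort of code this enables – the table, partition and property names below are hypothetical, and the exact member names in this early version of the provider may well differ:

// Hypothetical names throughout - the point is the shape of the API, not the exact members
let players = Azure.Tables.FootballPlayers.GetPartition("Midfielders") // partition key supplied as plain text
let player = players |> Seq.head

// Cost is inferred from the table's EDM metadata as a float option, not a string or obj
let cost : float option = player.Cost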

  • The good: player is strongly typed, down to the fact that the Cost property is a float option (not a string or object).
  • The ugly: You have to explicitly supply the Partition key as plain text. There’s no easy way to get a complete list of all partitions, although I am hoping to at least suggest some partition keys based on e.g. the first 100 rows.

What doesn’t it do (yet)?

  • You currently can’t write arbitrary queries to execute on the server. You can pull back all the entities for a particular partition key, but that’s it – nor can you specify a limit on how many entities to bring back. I want to look at ways of creating query expressions over these provided types, or at least ways of creating “weak” queries (off the standard CreateQuery() call) and then piping them into the provider
  • All properties of all entities are option types. This is not so different from the real underlying Table Storage fields in a DynamicTableEntity, which are returned as nullables for all value types (the only EDM reference type is String), and is in part because there’s no easy way to know whether any column is optional or not. I would, however, like to give users the option to say that they want e.g. all fields to not be option types, and to e.g. return default(T) or throw an exception instead
  • You can’t search for an individual entity by Entity Key (yet)
  • You can’t download an entire table as a CSV yet – but you will be able to shortly
  • No write support
  • No async support (yet)

The code is currently in GitHub, but I’ll be updating the NuGet package later this week for all to enjoy.

An Azure Type Provider for F# – first steps


I plan on blogging a bit more about my experiences with writing Type Providers in general, as there’s a dearth of material readily available online. At any rate, after several false starts, I now have a moderately usable version of an Azure Blob Storage type provider on GitHub!

It’s easy to consume – just create a specific type of account, passing in your account name and key, and away you go!

Using Type Providers to improve the developer experience

It’s important to notice the developer experience as you dot your way through. First, you’ll get a live list of containers.


Then, on dotting into a container, you’ll get a lazily-loaded list of files in that container.


Finally, on dotting into a file, you’ll get some details on the last Copy operation for that file, as well as the option to download it.


It’s important to note that the signature of the Download() function changes from file to file, depending on the file’s extension. If the file ends in .txt, .xml or .csv, it returns Async<string>; otherwise it returns Async<byte[]>. This is completely strongly typed – there’s no casting or dynamic typing involved, and you don’t have to worry about picking the correct overload for the file :-). This, for me, is a big value-add over a static API, which cannot respond in this manner to the contents of the data it operates over – and yet with a type provider it still retains static typing!
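
A quick sketch of what that looks like in practice – the provider alias, container and file names below are hypothetical, following the pattern from the earlier examples:

// Hypothetical container and file names - the return type of Download() is
// chosen at compile time from the file extension
let textContent : Async<string> = Azure.Containers.``tp-test``.``results.csv``.Download()
let binaryContent : Async<byte[]> = Azure.Containers.``tp-test``.``photo.png``.Download()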

Conclusion

I think that this type provider is somewhat different to others like the excellent FSharp.Data ones, which are geared towards programmability – this one is (currently) more suited to scripting and exploration of a Blob Storage account. I still need to make the provider more stable, and add some more creature comforts to it, but I’m hoping that this will make people’s lives a bit easier when they need to quickly and easily get some data out of (and in the future, into) a Blob store.