Introducing Type Providers


Note: This article has been excerpted from my upcoming book, Learn F#. Save 37% with code fccabraham!

Welcome to the world of data! This article will:

  • Gently introduce us to what type providers are
  • Get us up to speed with the most popular type provider, FSharp.Data.

What Are Type Providers?

Type Providers are a language feature first introduced in F# 3.0:

An F# type provider is a component that provides types, properties, and methods for use in your program. Type providers are a significant part of F# 3.0 support for information-rich programming.

https://docs.microsoft.com/en-us/dotnet/articles/fsharp/tutorials/type-providers/index

At first glance, this sounds a bit fluffy – we already know what types, properties and methods are. And what does “information-rich programming” mean? Well, the short answer is to think of Type Providers as T4 Templates on steroids – that is, a form of code generation, but one that lives inside the F# compiler. Confused? Read on.

Understanding Type Providers

Let’s start with a high-level view of type providers before diving in and working with one to see what the fuss is all about. You’re already familiar with the notion of a compiler that parses C# (or F#) code and produces MSIL that the .NET runtime can execute. If you’ve ever used Entity Framework (particularly the earlier versions) or old-school SOAP web services in Visual Studio, you’ll also be familiar with code generation tools such as T4 Templates: tools that generate C# code from another language or data source.

Figure 1 – Entity Framework Database-First code generation process

Ultimately, T4 Templates and the like, whilst useful, are somewhat awkward to use. For example, you need to hook them into the build system to get them up and running, and they use a custom markup language with embedded C#, which makes them unpleasant to work with and hard to distribute.

At their most basic, Type Providers are themselves just F# assemblies (which anyone can write) that plug into the F# compiler. They can then be used at edit time to generate entire type systems for you to work with as you type. In a sense, Type Providers serve a similar purpose to T4 Templates, except that they are much more powerful, more extensible, more lightweight to use and extremely flexible: they can work with what I call “live” data sources, and they offer a gateway not just to data sources but also to other programming languages.

Figure 2 – A set of F# Type Providers with supported data sources

Unlike T4 Templates, Type Providers can affect type systems without re-building the project, since they run in the background as you write code. There are dozens, if not hundreds, of Type Providers out there, covering everything from simple flat files such as CSV, to SQL, to cloud-based data storage repositories such as Microsoft Azure Storage or Amazon Web Services S3. The term “information-rich programming” refers to the concept of bringing disparate data sources into the F# programming language in an extensible way.

Don’t worry if that sounds a little confusing – we’ll take a look at our first Type Provider in just a second.

Quick Check

  1. What is a Type Provider?
  2. How do type providers differ from T4 templates?
  3. Is the number of type providers fixed?

Working with our first Type Provider

Let’s look at a simple data access challenge: working with some soccer results. Rather than work with an in-memory dataset, we’ll work with a larger, external data source, a CSV file that you can download at https://raw.githubusercontent.com/isaacabraham/learnfsharp/master/data/FootballResults.csv. You need to answer the following question: which three teams won the most games at home over the whole season?

Working with CSV files today

Let’s first think about the typical process that you might use to answer this question:

Figure 3 – Steps to parse a CSV file in order to perform a calculation on it.

Before we can even begin to perform the calculation, we first need to understand the data. This normally means looking at the source CSV file in something like Excel, and then designing a C# type to “match” the data in the CSV. Then we do all of the usual boilerplate parsing: opening a handle to the file, skipping the header row, splitting on commas, pulling out the correct columns and parsing them into the correct data types. Only after doing all of that can you start to work with the data and produce something valuable. Most likely, you’ll also need a console application to get the results. This process looks more like typical software engineering, which is not a great fit when we want to explore some data quickly and easily.
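For comparison, here is roughly what that hand-written route might look like in F#. It’s a minimal sketch: the FootballResult record, the parseResults function and the column layout (Date, Home Team, Away Team, Full Time Home Goals, Full Time Away Goals) are all assumptions for illustration, and the sample lines are made up rather than taken from the real file.

```fsharp
// A hand-written record type designed to "match" the CSV.
// ASSUMPTION: the column layout below is illustrative; the real file may differ.
type FootballResult =
    { HomeTeam : string
      AwayTeam : string
      HomeGoals : int
      AwayGoals : int }

// The usual boilerplate: skip the header, split on commas,
// pull out the right columns and convert them to the right types.
let parseResults (lines : seq<string>) =
    lines
    |> Seq.skip 1                               // skip the header row
    |> Seq.map (fun line -> line.Split ',')     // naive split (no quoted fields)
    |> Seq.map (fun cols ->
        { HomeTeam = cols.[1]
          AwayTeam = cols.[2]
          HomeGoals = int cols.[3]
          AwayGoals = int cols.[4] })
    |> Seq.toArray

// Hypothetical sample lines; in practice these would come from
// System.IO.File.ReadLines on the downloaded CSV file.
let sampleLines =
    [ "Date,Home Team,Away Team,Full Time Home Goals,Full Time Away Goals"
      "13/08/2016,Team A,Team B,2,1"
      "13/08/2016,Team C,Team A,0,0" ]

let parsed = parseResults sampleLines
```

Every line of this is plumbing that has to be revisited whenever the file’s shape changes, and the naive comma split would break on quoted fields that contain commas.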

Introducing FSharp.Data

We could quite happily do the above in F#, and the REPL would at least give us a more “exploratory” way of working. However, it wouldn’t remove the boilerplate of parsing the file, and this is where our first Type Provider, FSharp.Data, comes in.

FSharp.Data is an open source, freely distributable NuGet package which is designed to provide generated types when working with data in CSV, JSON, or XML formats. Let’s try it out with our CSV file.

Scripts for the win

At this point, I’m going to advise that you move away from heavyweight solutions and start to work exclusively with standalone scripts – this fits much better with what we’re going to be doing. You’ll notice a build.cmd file in the learnfsharp code repository (https://github.com/isaacabraham/learnfsharp). Run it – it uses Paket to download a number of NuGet packages into the packages folder, which you can reference directly from your scripts. This means we don’t need a project or solution to start coding – we can just create scripts and jump straight in. I’d recommend creating your scripts in the src/code-listings/ folder (or another folder at the same level, e.g. src/learning/) so that the package references shown in the listings here work without needing changes.

Now you try

  1. Create a new standalone script in Visual Studio using File -> New. You don’t need a solution here – remember that a script can work standalone.
  2. Save the newly created file into an appropriate location as described in “Scripts for the win”.
  3. Enter the following code from listing 1:
// Referencing the FSharp.Data assembly
#r @"..\..\packages\FSharp.Data\lib\net40\FSharp.Data.dll"
open FSharp.Data

// Connecting to the CSV file to provide types based on the supplied file
type Football = CsvProvider< @"..\..\data\FootballResults.csv">

// Loading in all data from the supplied CSV file
let data = Football.GetSample().Rows |> Seq.toArray

That’s it. You’ve now parsed the data, converted it into a type that you can consume from F# and loaded it into memory. Don’t believe me? Check this out:

Figure 4 – Accessing a Provided Type from FSharp.Data

You now have full IntelliSense over the dataset, and that’s it! You don’t have to parse the data set manually; that’s been done for you. You also don’t need to “figure out” the types: the Type Provider scans a sample of rows from the file (by default, the first 1,000) and infers the types from their contents. In effect, rather than using a tool such as Excel to “understand” the data, you can now use F# itself as a tool to both understand and explore your data.
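To make the idea of inferring types from sample rows more concrete, here is a deliberately simplified model of it. This is not how FSharp.Data actually works internally (its rules are considerably richer, covering dates, booleans, missing values and so on); the isInt, isFloat and inferColumnType helpers and the sample rows are hypothetical.

```fsharp
open System
open System.Globalization

// A toy model of sample-based inference, NOT FSharp.Data's real algorithm:
// for each column, pick the first of int, float or string that every
// sampled value can be parsed as.
let isInt (value : string) =
    fst (Int32.TryParse(value, NumberStyles.Integer, CultureInfo.InvariantCulture))

let isFloat (value : string) =
    fst (Double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture))

let inferColumnType (values : string list) =
    if values |> List.forall isInt then "int"
    elif values |> List.forall isFloat then "float"
    else "string"

// Hypothetical sample: a header row followed by a couple of data rows
let sampleRows =
    [ [ "Home Team"; "Full Time Home Goals" ]
      [ "Team A"; "2" ]
      [ "Team B"; "0" ] ]

let inferredSchema =
    let header = List.head sampleRows
    let body = List.tail sampleRows
    header
    |> List.mapi (fun i name -> name, inferColumnType (body |> List.map (List.item i)))
// [("Home Team", "string"); ("Full Time Home Goals", "int")]
```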

Backtick members

You’ll see from the screenshot above, as well as from the code when you try it out yourself, that the field names have spaces in them! This isn’t actually a type provider feature but one that’s available throughout F#, called backtick members. Place a double backtick (``) at the beginning and end of a member name and you can put spaces, numbers or other characters in it. Note that Visual Studio doesn’t provide IntelliSense for these correctly in all cases, e.g. let-bound members on modules, but it works fine on classes and records.
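A quick sketch showing that backtick names work in ordinary F# code too. The MatchRecord type and the ``home side scored`` function are hypothetical examples, not part of FSharp.Data.

```fsharp
// Backtick identifiers are an ordinary F# feature, not something specific
// to type providers. MatchRecord is a hypothetical record type.
type MatchRecord =
    { ``Home Team`` : string
      ``Full Time Home Goals`` : int }

// A hypothetical let-bound function whose name contains spaces
let ``home side scored`` (result : MatchRecord) =
    result.``Full Time Home Goals`` > 0

let example = { ``Home Team`` = "Team A"; ``Full Time Home Goals`` = 2 }
let scored = ``home side scored`` example // true
```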

Whilst we’re at it, we’ll also bring down an easy-to-use F#-friendly charting library, XPlot. This library gives us access to charts available in Google Charts as well as Plotly. We’ll use the Google Charts API here, which means adding dependencies to XPlot.GoogleCharts (which also brings down the Google.DataTable.Net.Wrapper package).

  1. Add references to both the XPlot.GoogleCharts and Google.DataTable.Net.Wrapper assemblies. If you’re using standalone scripts, both packages will be in the packages folder after running build.cmd; just use #r to reference the assembly inside one of the lib/net folders.
  2. Open the XPlot.GoogleCharts namespace.
  3. Execute the following code to calculate the result and plot it as a chart.
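// Assumes the XPlot.GoogleCharts and Google.DataTable.Net.Wrapper assemblies are referenced (step 1)
open XPlot.GoogleCharts

// Keep only the matches won by the home side
data
|> Seq.filter(fun row ->
    row.``Full Time Home Goals`` > row.``Full Time Away Goals``)
// countBy generates a sequence of tuples (team vs number of wins)
|> Seq.countBy(fun row -> row.``Home Team``)
|> Seq.sortByDescending snd
// Keep the ten most successful home sides; the first three answer our question
|> Seq.take 10
// Chart.Column converts the sequence of tuples into an XPlot Column Chart
|> Chart.Column
// Chart.Show displays the chart in a browser window
|> Chart.Show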
Figure 5 – Visualising data sourced from the CSV Type Provider
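If you want to check the calculation without installing XPlot, the same pipeline can be run over in-memory data. This is a sketch: the tuples below stand in for the provided CSV rows, and both the (home team, home goals, away goals) layout and the team names are hypothetical.

```fsharp
// Hypothetical rows: (home team, full-time home goals, full-time away goals)
let results =
    [ "Team A", 2, 1
      "Team B", 0, 3
      "Team A", 1, 0
      "Team C", 4, 2
      "Team B", 2, 2 ]

let topHomeWinners =
    results
    |> Seq.filter (fun (_, homeGoals, awayGoals) -> homeGoals > awayGoals) // home wins only
    |> Seq.countBy (fun (homeTeam, _, _) -> homeTeam)                       // (team, wins)
    |> Seq.sortByDescending snd                                             // most wins first
    |> Seq.truncate 3                                                       // at most three teams
    |> Seq.toList
// [("Team A", 2); ("Team C", 1)]
```

Seq.truncate is used here instead of Seq.take because Seq.take throws if the sequence has fewer elements than requested, which is easy to hit with a small sample like this one.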

We were able to open up a CSV file we’d never seen, explore its schema, perform some operations on it and then chart it, all in fewer than 20 lines of code. Not bad! This ability to rapidly explore datasets we’ve never seen before, whilst still being able to use the full breadth of .NET libraries out there, makes F# an extremely powerful tool for bringing disparate data sources into full-blown applications.

Type Erasure

The vast majority of type providers fall into the category of erasing type providers. The upshot of this is that the types generated by the provider exist only at compile time. At runtime, the types are erased and usually compile down to plain objects; if you try to use reflection over them, you won’t see the fields that you get in the code editor.

One of the downsides is that this makes them extremely difficult (if not impossible) to work with in C#. On the flip side, they are extremely efficient – you can use erasing type providers to create type systems with thousands of types without any runtime overhead, since at runtime they’re just of type Object.

Generative type providers allow for run-time reflection, but are much less commonly used (and from what I understand, much harder to develop).

If you want to know more, download the free first chapter of Learn F# and see this Slideshare presentation. Don’t forget to save 37% with code fccabraham.

Modelling State in F#


This article has been excerpted from Learn F#.

Working with mutable data

Working with mutable data structures in the OO world follows a simple model — you create an object, and then modify its state through operations on that object.

Figure 1 – Mutating an object repeatedly

What’s tricky about this model is that it can be hard to reason about your code. Calling a method like UpdateState() above will generally have no return value; the result of calling the method is a side effect that takes place on the object.
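A minimal OO-style sketch of that model, in F#. The Counter class and its UpdateState method are hypothetical; the point is that UpdateState returns unit, so its only effect is a hidden change to the object’s internal state.

```fsharp
// Hypothetical class illustrating in-place mutation through a method call
type Counter() =
    let mutable state = 0
    member this.State = state
    // Returns unit: calling it only mutates the hidden state
    member this.UpdateState() = state <- state + 1

let counter = Counter()
counter.UpdateState()
counter.UpdateState()
let current = counter.State // 2
```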

Now you try

Let’s now put this into practice with an example: driving a car. We want to write code that lets us drive() a car while tracking how much petrol is left; the distance we drive determines how much petrol is used.

let mutable petrol = 100.0 // initial state

let drive(distance) = // modify through mutation
    if distance = "far" then petrol <- petrol / 2.0
    elif distance = "medium" then petrol <- petrol - 10.0
    else petrol <- petrol - 1.0

drive("far") // repeatedly modify state
drive("medium")
drive("short")

petrol // check current state

Working like this, it’s worth noting a few things:

  1. Calling drive() has no output. We call it, and it silently modifies the mutable petrol variable; we can’t know this from the type system.
  2. Methods aren’t deterministic. You can’t know what a method will do without knowing the (often hidden) state, and if you call drive("far") 3 times, the value of petrol will change every time, depending on the previous calls.
  3. We’ve no control over the ordering of method calls. If you switch the order of calls to drive(), you’ll get a different answer.
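These points are easy to see in FSI. The following block repeats the mutable drive function so that it runs on its own, then shows identical calls producing different results (point 2) and reordered calls producing a different answer (point 3). The names afterEachFarCall, farThenMedium and mediumThenFar are just illustrative.

```fsharp
// Self-contained copy of the mutable drive function
let mutable petrol = 100.0

let drive (distance : string) =
    if distance = "far" then petrol <- petrol / 2.0
    elif distance = "medium" then petrol <- petrol - 10.0
    else petrol <- petrol - 1.0

// Point 2: three identical calls, three different outcomes
let afterEachFarCall =
    [ for _ in 1 .. 3 do
        drive "far"
        yield petrol ]
// [50.0; 25.0; 12.5]

// Point 3: the same two calls in a different order give a different answer
petrol <- 100.0
drive "far"; drive "medium"
let farThenMedium = petrol // 40.0

petrol <- 100.0
drive "medium"; drive "far"
let mediumThenFar = petrol // 45.0
```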

Working with immutable data

Let’s now compare that with working with immutable data structures.

Figure 2 – Generating new state working with immutable data

In this mode of operation, we can’t mutate data. Instead, we create copies of the state with updates applied, and return that to the caller to work with; that state may be passed in to other calls that generate a new state yet again. Let’s now rewrite our code to use immutable data.

// Function explicitly dependent on state: takes in petrol and
// distance, and returns new petrol
let drive(petrol, distance) = 
    if distance = "far" then petrol / 2.0
    elif distance = "medium" then petrol - 10.0
    else petrol - 1.0

let petrol = 100.0 // initial state

// storing output state in a value
let firstState = drive(petrol, "far")
let secondState = drive(firstState, "medium")

// chaining calls together manually
let finalState = drive(secondState, "short")

We’ve made a few key changes to our code. The most obvious is that we aren’t using a mutable variable for our state any longer, but a set of immutable values. We “thread” the state through each function call, storing the intermediate states in values, which are manually passed to the next function call. Working in this manner, we gain a few benefits immediately.

  1. We can reason about behaviour more easily. Rather than hidden side effects on private fields, each method or function call can return a new version of the state that we can easily understand. This makes unit testing much easier, for example.
  2. Function calls are repeatable. We can call drive(50.0, "far") as many times as we want, and it’ll always give us the same result. This is known as a pure function. Pure functions have useful properties, such as being able to be cached or pre-generated.
  3. The compiler protects us, in this case, from accidentally mis-ordering function calls, because each function call is explicitly dependent on the output of the previous call.
  4. We can see the value of each intermediate step as we “work up” towards the final state.
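A small sketch of points 2 and 4, again repeating the immutable drive function so the block stands alone (the value names first, second and states are illustrative):

```fsharp
// Self-contained copy of the immutable drive function
let drive (petrol : float, distance : string) =
    if distance = "far" then petrol / 2.0
    elif distance = "medium" then petrol - 10.0
    else petrol - 1.0

// Point 2: a pure function returns the same output for the same input, every time
let first = drive (50.0, "far")
let second = drive (50.0, "far")
// first = second = 25.0

// Point 4: every intermediate state is still available to inspect
let states =
    let s1 = drive (100.0, "far")
    let s2 = drive (s1, "medium")
    let s3 = drive (s2, "short")
    [ s1; s2; s3 ]
// [50.0; 40.0; 39.0]
```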

Passing immutable state in F#

In this example, you’ll see that we’re manually storing intermediate state and explicitly passing that to the next function call. That’s not strictly necessary, as F# has language syntax to avoid having to do this explicitly.
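One way to avoid naming each intermediate state is the pipeline operator (|>) together with a curried function. The sketch below makes an assumption the original code doesn’t: drive is redefined to take the distance first and the petrol last, so that |> can feed each new state into the next call. List.fold achieves the same thing over a list of trips.

```fsharp
// Curried version of drive (our choice of argument order): distance first, petrol last
let drive (distance : string) (petrol : float) =
    if distance = "far" then petrol / 2.0
    elif distance = "medium" then petrol - 10.0
    else petrol - 1.0

// |> passes each new state along without naming it
let finalState =
    100.0
    |> drive "far"
    |> drive "medium"
    |> drive "short"
// 39.0

// List.fold threads the state through a list of trips;
// fold expects (state, item), so the arguments are flipped in the lambda
let finalStateFromTrips =
    [ "far"; "medium"; "short" ]
    |> List.fold (fun petrol distance -> drive distance petrol) 100.0
// 39.0
```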

Now you try

Let’s try to make some changes to our drive code.

  1. Instead of using a string to represent how far we’ve driven, use an integer.
  2. Instead of “far”, check if the distance is more than 50.
  3. Instead of “medium”, check if the distance is more than 25.
  4. If the distance is > 0, reduce petrol by 1.
  5. If the distance is 0, make no change to the petrol consumption. Return the same state that was provided.
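One possible solution, sketched under a couple of assumptions: petrol stays a float as in the earlier code, and negative distances (which the exercise doesn’t mention) simply fall through to the “no change” branch.

```fsharp
// Distance is now an int; petrol remains a float
let drive (petrol : float, distance : int) =
    if distance > 50 then petrol / 2.0
    elif distance > 25 then petrol - 10.0
    elif distance > 0 then petrol - 1.0
    else petrol // distance of 0 (or less): return the same state

// Example journey: 60, then 30, then 0
let afterTrips =
    let s1 = drive (100.0, 60)
    let s2 = drive (s1, 30)
    drive (s2, 0)
// 40.0
```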

Other benefits of immutable data

A few other benefits that aren’t necessarily obvious from the above sample:

  1. When working with immutable data, encapsulation isn’t necessarily as important as it is when working with mutable data. Encapsulation is sometimes still valuable, e.g. as part of a public API, but there are occasions where making your data read-only removes the need to “hide” it.
  2. Multi-threading. One of the benefits of working with immutable data is that you don’t need to worry about locks in a multi-threaded environment. Because there’s never any shared mutable state, you don’t need to be concerned with race conditions: every thread can read the same data as often as necessary, and it will never change underneath it.
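As a minimal sketch of the multi-threading point, several workers below read the same immutable list in parallel via Array.Parallel.map, with no locks at all. The readings data and the per-worker calculation are hypothetical stand-ins for real work.

```fsharp
// Hypothetical shared data: an immutable list that every worker reads
let readings = [ 1 .. 1000 ]

// Four workers run in parallel; nobody can modify readings, so no lock is needed
let totals =
    [| 1 .. 4 |]
    |> Array.Parallel.map (fun worker -> readings |> List.sumBy (fun r -> r * worker))
// [| 500500; 1001000; 1501500; 2002000 |]
```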

Performance of immutable data

I often hear this question: isn’t it much slower to constantly make copies rather than modify a single object? The answer is yes and no. Yes, it’s slower to copy an object graph than to make an in-place update. However, unless you’re in a tight loop performing millions of mutations, the cost is negligible compared to, say, opening a database connection. Plus, many languages (including F#) have data structures designed specifically to work with immutable data in a highly performant manner.
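Two of F#’s built-in tools for cheap immutable updates are record copy-and-update expressions and lists that share structure. The sketch below is illustrative only; the Car record and the trip lists are hypothetical.

```fsharp
// Hypothetical record type
type Car = { Make : string; Petrol : float }

let original = { Make = "Hypothetical"; Petrol = 100.0 }

// Copy-and-update: a new record with one field changed; original is untouched
let refuelled = { original with Petrol = 50.0 }

// Lists share structure: prepending creates one new node pointing at the old list
let trips = [ "medium"; "short" ]
let moreTrips = "far" :: trips
let shared = obj.ReferenceEquals(moreTrips.Tail, trips) // true: the tail wasn't copied
```

Note that copy-and-update is a shallow copy: any objects the record’s fields refer to are shared rather than cloned, which is safe precisely because they are immutable too.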

If you want to learn more about F#, go download the free first chapter of Learn F# and see this Slideshare presentation for more information and a discount code.

Learn F# for the masses


Anyone who reads my blog will have observed that I’ve not posted anything for several months now. In addition to my moving country and trying to build a company in 2016, I’ve also been writing a book.

I’m delighted to share that Learn F# is now available on Manning’s MEAP program; hopefully the book will be content complete within the next couple of months.

The book is designed specifically to help C# and VB .NET developers who are already familiar with Visual Studio and the .NET ecosystem get up to speed with F#. The first half of the book focuses on core language features, whilst the second half looks at practical use cases of F# in areas such as web programming, data access and interoperability.

The book doesn’t focus on the theoretical aspects of functional programming (so no discussion of monads or category theory here) but rather tries to explain the core fundamentals of functional programming (at least, in my opinion) and apply them in a practical sense. In this respect I think the book doesn’t overlap too much with many of the F# books out there. It won’t give you a hardcore understanding of the mathematical fundamentals of FP, and it relates many concepts to ones you’ll already be familiar with from C# and the like, but it will give you the confidence to use, explore and learn more about F# alongside what you already know.

I’d like to think it will appeal to developers who are already on the .NET platform and want to see how they can utilise and benefit from F# in their “day-to-day” without having to throw away everything they’ve learned so far. So you’ll see how to perform data access more easily without resorting to Entity Framework, handle errors in F# in a saner manner, parse data files and create web APIs, all whilst using FP and F#-specific language features directly to solve those problems.

I’ll blog about my experiences of writing a book when it’s done – for now, I hope that this book is well received and a useful addition to the excellent learning materials already available in the F# world.