Source Code

Managing Pointers, or F#’s Platform-Invoke “Gotcha”

I love how well F# “plays” with other languages. This is obviously true where its in-the-box .NET siblings are concerned. However, over the past few years, I’ve come to find it is just as seamless when mixed with good old-fashioned C code.

Well, it’s nearly as seamless.

For the most part, when using P/Invoke, one may simply copy-and-paste the signatures from a C header file (sans semicolons, of course). However, there is at least one scenario where naïvely doing so produces code which is not verifiably type-safe. Let’s look at a specific example. Given the following function prototype in C:

__declspec(dllexport) void getVersion (int* major, int* minor, int* patch);

One might use the following P/Invoke signature (and associated call) in F#:

[<DllImport("somelib",CallingConvention=CallingConvention.Cdecl)>]
extern void getVersion (int* major, int* minor, int* patch)

// ...

let mutable major,minor,patch = 0,0,0
getVersion(&&major,&&minor,&&patch)
printfn "Version: %i.%i.%i" major minor patch

At this point, everything will compile and run just fine. So where’s the problem? To find out, we have to turn to an under-utilised tool in the .NET developer’s arsenal — PEVerify.



SIDEBAR — What the heck is PEVerify?

According to MSDN:

The PEVerify tool helps developers… to determine whether their MSIL code and associated metadata meet type safety requirements. Some compilers generate verifiably type-safe code only if you avoid using certain language constructs.

That’s great. And the reason we want “verifiably type-safe code” is?

Also, according to MSDN:

The common language runtime relies on the type-safe execution of application code to help enforce security and isolation mechanisms. Normally, code that is not verifiably type-safe cannot run, although you can set security policy to allow the execution of trusted but unverifiable code.

and elsewhere:

Type-safe code accesses only the memory locations it is authorized to access. (For this discussion, type safety specifically refers to memory type safety and should not be confused with type safety in a broader respect.) For example, type-safe code cannot read values from another object’s private fields. It accesses types only in well-defined, allowable ways.

So, clearly, “verifiably type-safe” is a distinction worth having. And PEVerify is a tool worth knowing.
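
Running it is straightforward: from a Visual Studio (or Windows SDK) command prompt, where peverify.exe is already on the PATH, just pass it the assembly you want checked. For instance:

peverify C:\dev\somelibfs.dll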



And now back to our program (already in progress)…

Running the tool gives us the following output (reformatted for readability):

Microsoft (R) .NET Framework PE Verifier. Version 4.0.30319.1
 Copyright (c) Microsoft Corporation. All rights reserved.

[IL]: Error:
 [C:\dev\somelibfs.dll : .$Constants::.cctor]
 [mdToken=0x600008f][offset 0x0000000D]
 [found address of Int32][expected unmanaged pointer] 
 Unexpected type on the stack.

[IL]: Error:
 [C:\dev\somelibfs.dll : .$Constants::.cctor]
 [mdToken=0x600008f][offset 0x0000000D]
 [found address of Int32][expected unmanaged pointer]
 Unexpected type on the stack.

[IL]: Error:
 [C:\dev\somelibfs.dll : .$Constants::.cctor]
 [mdToken=0x600008f][offset 0x0000000D]
 [found address of Int32][expected unmanaged pointer]
 Unexpected type on the stack.

3 Error(s) Verifying C:\dev\somelibfs.dll

Clearly, something isn’t right.

It’s not very obvious, but the big clue is where PEVerify tells us it was expecting an unmanaged pointer. Turns out, when dealing with the CLR, there are two types of pointers: unmanaged and managed. The latter are what you use when passing around CLR types by-reference (i.e. “byref<'T>” in F#, or “ref” in C#, or “ByRef” in VB). It also happens that you should use the managed variety if you want your F# code to be verifiably type-safe — and this includes P/Invoke calls. If you think about it, this makes sense. The runtime can only guarantee the bits it can control (i.e. the parts which are “managed”). So here’s what the F# code looks like using managed pointers instead:

[<DllImport("somelib",CallingConvention=CallingConvention.Cdecl)>]
extern void getVersion (int& major, int& minor, int& patch)

// ...

let mutable major,minor,patch = 0,0,0
getVersion(&major,&minor,&patch)
printfn "Version: %i.%i.%i" major minor patch

And, if we run PEVerify on the updated code, we get the following report:

Microsoft (R) .NET Framework PE Verifier.  Version  4.0.30319.1
Copyright (c) Microsoft Corporation.  All rights reserved.

All Classes and Methods in C:\dev\somelibfs.dll

That’s much better!

So, to recap, there are two types of pointers, as summarized in the following table:

Pointer      F# Type      Declaration   Invocation
Unmanaged    nativeint    <type>*       &&<type>
Managed      byref<'T>    <type>&       &<type>

In nearly all cases, a .NET developer should prefer the managed pointer. Leave the unmanaged risks with your C code.



I’d like to give special thanks to Jack Pappas, for finding (and helping me to understand and vanquish) this issue in fszmq.

Presentations

Double Trouble

Updated 23 Jan 2012

A nice recording of my presentation, along with other NYCFSUG talks, can be found here.


Updated 30th Sep 2011

Unfortunately, due to (highly disappointing) Hurricane Irene, the New York City F# Users’ Group meeting has been rescheduled. Please check out this link for the details.


Shameless plug!!!

Come check out the next New York City F# Users’ Group meeting on Tuesday 30th August 2011 at 18:30 hours. It’s a double-header featuring two guys from Jersey — Steve Goguen and yours truly. Click the link below for all the pertinent details. It’s gonna be a blast!

Double Trouble with Paulmichael Blasucci and Steve Goguen


Random

FSharpy Goodness from the Land of Monkeys

This isn’t much of a post… except to say how pleased I am with the latest release of Mono (2.10.3). Specifically, F# Interactive (fsi, to friends) finally runs correctly on my MacBook Pro (Snow Leopard). Yay! It no longer throws an exception when you issue the “#quit” command (or just “#q”, for the cool kids). Also, it no longer spawns a separate (headless) application just to re-host the GUI message-pump. This was a long-standing pain for me, because it would cause the terminal to lose focus rather abruptly. Of course, I haven’t really tested thoroughly to ensure no other issues have appeared. But, as of right now, I’m pleased as punch. Well done, you monkey-loving geniuses!

Source Code

Getting JSON.NET to “Talk” F#, Part 1: Tuples

JavaScript Object Notation (hereafter, JSON) has become a very popular means of encoding data in a “plain text” format. And, as with so many things, there are several implementations of JSON-encoding libraries available in the .NET ecosystem. One such library, which I’ve used quite a bit, is the Newtonsoft JSON.NET library. It’s both simple and efficient. However, it has a bit of trouble understanding some of F#’s “bread and butter” data types. Fortunately, the JSON.NET library also provides several extensibility points. In this post, we’ll extend this library to support one of F#’s most fundamental types — the tuple. (Please note: I’ve assumed you already have a good working knowledge of F# and the .NET run-time.)

Before diving into the “meat” of our converter, let’s look at a sample of it in action, taken from an F# interactive session (where I’ve added some white space for the sake of clarity).

> open System;;
> open Newtonsoft.Json;;
> open Newtonsoft.Json.FSharp;;

> let employee = (012345,("Bob","Smith"),28500.00,DateTime.Today);;
val employee : int * (string * string) * float * DateTime = (12345, ("Bob", "Smith"), 28500.0, 7/4/2011 12:00:00 AM)

> let converters : JsonConverter[] = [| TupleConverter() |];;
val converters : JsonConverter [] = [|FSI_0006.Newtonsoft.Json.FSharp.TupleConverter|]

> let rawData = JsonConvert.SerializeObject(employee,converters);;
val rawData : string = "{"Item1":12345,"Item2":{"Item1":"Bob","Item2":"Smith"},"Item3"+[49 chars]

> let backAgain : (int * (string * string) * float * DateTime) = JsonConvert.DeserializeObject(rawData,converters);;
val backAgain : int * (string * string) * float * DateTime = (12345, ("Bob", "Smith"), 28500.0, 7/4/2011 12:00:00 AM)

> printfn "%b" (employee = backAgain);;
true
val it : unit = ()

As alluded to in the previous example, we can encode (and decode) tuples of any length by enriching JSON.NET with a custom type converter. This may seem involved, but we’ll break the actual code into logical, easy-to-digest “chunks”. First, we’ve got some “boiler-plate” code which wires our class into the JSON.NET machinery.

open System                       // Type
open Microsoft.FSharp.Reflection  // FSharpType, FSharpValue
open Newtonsoft.Json              // JsonConverter, JsonToken

type TupleConverter() =
  inherit JsonConverter()

  override __.CanRead  = true
  override __.CanWrite = true

  override __.CanConvert(vType) = vType |> FSharpType.IsTuple

We start by inheriting from JsonConverter, which is the abstract base class provided by the Newtonsoft library for building custom type converters. As part of inheriting this class, we must tell JSON.NET whether our class is meant to be used for serialization (i.e. CanWrite = true), deserialization (i.e. CanRead = true), or both. We also provide an implementation of the CanConvert method. This method will be invoked (potentially very frequently) at run-time when JSON.NET wants to know if it should transfer control to us. Our logic here is very simple: if the input type is a tuple, we want it and return true; otherwise, we’re not interested and return false. Of course, the “is it a tuple?” check is delegated to a helper function provided by the F# run-time. Next, we’ve got to implement the methods for doing the actual encoding and decoding.
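
Quick aside before diving into those methods: if FSharpType.IsTuple is unfamiliar, a throw-away F# Interactive session shows exactly what it reports (purely illustrative; it’s not part of the converter):

> open Microsoft.FSharp.Reflection;;
> FSharpType.IsTuple(typeof<int * string>);;
val it : bool = true
> FSharpType.IsTuple(typeof<System.DateTime>);;
val it : bool = false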

  override __.WriteJson(writer,value,serializer) =

Overriding the WriteJson method allows us to turn tuple instances into JSON. The Newtonsoft machinery passes three values into our method. The first, writer, is the stream to which we should write encoded data. Next up: value is the actual tuple instance to be serialized. And third comes serializer, which is a general sort of context which is threaded throughout the serialization process.

The algorithm for encoding is actually very simple and aligns with the way tuples appear when used in other .NET languages (e.g. C#). Specifically, the tuple is turned into an object with a property for each tuple element. The name for each property is the word “Item” suffixed by the tuple element’s one-based index. So, the value

("paul",32)

will be encoded to

{ "Item1" : "paul", "Item2" : 32 }

To realize this algorithm, we use reflection to get the list of tuple fields. Then we iterate over those fields, writing each value to the output after being sure to emit the appropriate property name.

  let fields = value |> FSharpValue.GetTupleFields
  fields |> Array.iteri (fun i v ->
    // emit name based on the value's position in the tuple
    let n = sprintf "Item%i" (i + 1)
    writer.WritePropertyName(n)
    // emit value or reference thereto, if necessary
    if v <> null && serializer.HasReference(v)
      then writer.WriteReference(serializer,v)
      else serializer.Serialize(writer,v))

Of course, these values need to be wrapped in curly braces (i.e. WriteStartObject and WriteEndObject). Also, in case any users of our converter want to use JSON.NET’s instance tracking feature, we’ll add a one-liner which optionally records the existence of the tuple being processed (i.e. WriteIdentity). Finally, we’ll include a bit of defensive coding, leaving the implementation of WriteJson as follows.

  override __.WriteJson(writer,value,serializer) =
    match value with
    | null -> nullArg "value" // a 'null' tuple doesn't make sense!
    | data ->
        writer.WriteStartObject()
        let fields = value |> FSharpValue.GetTupleFields
        if fields.Length > 0 then
          // emit "system" metadata, if necessary
          if serializer.IsTracking then
            writer.WriteIdentity(serializer,value)

          fields |> Array.iteri (fun i v ->
            // emit name based on the value's position in the tuple
            let n = sprintf "Item%i" (i + 1)
            writer.WritePropertyName(n)
            // emit value or reference thereto, if necessary
            if v <> null && serializer.HasReference(v)
              then writer.WriteReference(serializer,v)
              else serializer.Serialize(writer,v))
        writer.WriteEndObject()

Now on to the most complex portion of this converter — deserialization.

  override __.ReadJson(reader,vType,_,serializer) =

We’ll again override a method; this time it’s ReadJson. The JSON.NET runtime will pass us four pieces of data when invoking our override. The first, reader, is the stream of JSON tokens from which we’ll build a tuple instance. Second, we have the CLR type which JSON.NET thinks we should return. Next up is any existing value the Newtonsoft machinery might have for us. We’ll be ignoring this parameter, as it’s not useful for our purposes. The last piece of input is serializer, which we’ve already seen in the WriteJson method.

In order to generate a tuple properly, we need all of its constituent values up front. However, the Newtonsoft machinery is designed around advancing through the input stream one-token-at-a-time. To make this work, we’ll read the entire object (all of the key/value pairs between the curly braces) into a Map<string,obj> instance, via a recursive helper function.

  let readProperties (fields:Type[]) =
    let rec readProps index pairs =
      match reader.TokenType with
      | JsonToken.EndObject -> pairs // no more pairs, return map
      | JsonToken.PropertyName ->
          // get the key of the next key/value pair
          let name = readName ()
          let value,index' = match name with
                              // for "system" metadata, process normally
                              | JSON_ID | JSON_REF -> decode (),index
                              // for tuple data...
                              // use type info for current field
                              // bump offset to the next type info
                              | _ -> decode' fields.[index],index+1
          advance ()
          // add decoded key/value pair to map and continue to next pair
          readProps (index') (pairs |> Map.add name value)
      | _ -> reader |> invalidToken
    advance ()
    readProps 0 Map.empty
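
To make the target concrete: for the ("paul",32) example from earlier, and with reference tracking switched off (so no "$id" entry), the resulting map would have roughly this shape (a sketch, not output captured from a real run):

  Map.ofList [ "Item1", box "paul"
               "Item2", box 32 ]

(In practice, JSON.NET may hand the number back as a boxed int64 rather than an int, which is exactly why the coerceType step appears later on.)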

One of the interesting aspects of the readProperties function is its input. When called, we’ll give it an array of the CLR types which comprise the tuple. Then, while stepping through the JSON tokens, we can match each “raw” value to a CLR type as part of the deserialization process. This introduces a subtle wrinkle, though. We should ignore this type information when we encounter any Newtonsoft “metadata” in the input stream. We accomplish this by keeping track of an offset into the type array, which will only get incremented when the key/value pair under scrutiny is not “metadata”. Now, with the actual JSON traversal finished, we can analyse our Map<string,obj> and take appropriate action.

If the map is simply a reference to data which has already been decoded, it will contain only a tracking identifier. We can use this identifier to fetch the tuple instance from the JSON.NET run-time context.

 | Ref(trackingId) ->
     // tuple value is a reference, de-reference to actual value
     serializer.GetReference(string trackingId)

If the map holds a more sophisticated set of key/value pairs, we’ll use it as input to the construction of a new tuple instance.

  | Map(data) ->
      let inputs =
        data
          // strip away "system" meta data
          |> Seq.filter (fun (KeyValue(k,_)) -> k <> JSON_ID)
          // discard keys, retain values
          |> Seq.map (fun (KeyValue(_,v)) -> v)
          // merge values with type info
          |> Seq.zip fields
          // marshal values to correct data types
          |> Seq.map (fun (t,v) -> v |> coerceType t)
          |> Seq.toArray
      // create tuple instance
      let value = FSharpValue.MakeTuple(inputs,vType)

This bit of logic massages the map into an array of appropriately typed values, and uses a helper function from the F# run-time to instantiate the tuple. Finally, we’ll put this code together with some helper methods, and some caching logic (again, in case any users of our converter want to use JSON.NET’s instance tracking feature), which leaves the complete method as follows.

  override __.ReadJson(reader,vType,_,serializer) =
    let decode,decode',advance,readName = makeHelpers reader serializer

    let readProperties (fields:Type[]) =
      let rec readProps index pairs =
        match reader.TokenType with
        | JsonToken.EndObject -> pairs // no more pairs, return map
        | JsonToken.PropertyName ->
            // get the key of the next key/value pair
            let name = readName ()
            let value,index' = match name with
                                // for "system" metadata, process normally
                                | JSON_ID | JSON_REF -> decode (),index
                                // for tuple data...
                                // use type info for current field
                                // bump offset to the next type info
                                | _ -> decode' fields.[index],index+1
            advance ()
            // add decoded key/value pair to map and continue to next pair
            readProps (index') (pairs |> Map.add name value)
        | _ -> reader |> invalidToken
      advance ()
      readProps 0 Map.empty

    match reader.TokenType with
    | JsonToken.StartObject ->
        let fields = vType |> FSharpType.GetTupleElements
        // read all key/value pairs, reifying with tuple field types
        match readProperties fields with
        | Ref(trackingId) ->
            // tuple value is a reference, de-reference to actual value
            serializer.GetReference(string trackingId)
        | Map(data) ->
            let inputs =
              data
                // strip away "system" meta data
                |> Seq.filter (fun (KeyValue(k,_)) -> k <> JSON_ID)
                // discard keys, retain values
                |> Seq.map (fun (KeyValue(_,v)) -> v)
                // merge values with type info
                |> Seq.zip fields
                // marshal values to correct data types
                |> Seq.map (fun (t,v) -> v |> coerceType t)
                |> Seq.toArray
            // create tuple instance (and cache it if necessary)
            let value = FSharpValue.MakeTuple(inputs,vType)
            if serializer.IsTracking then
              match data |> Map.tryFindKey (fun k _ -> k = JSON_ID) with
              // use existing "$id"
              | Some(k) -> serializer.AddReference(string data.[k],value)
              // make a new "$id"
              | None -> serializer.MakeReference(value) |> ignore
            value
        | _ -> raise InvalidPropertySet
    | _ -> reader |> invalidToken
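
In case the reflection call at the heart of the Map branch is unfamiliar, here’s FSharpValue.MakeTuple in isolation (a quick F# Interactive check, for illustration only):

> open Microsoft.FSharp.Reflection;;
> FSharpValue.MakeTuple([| box 1; box "two" |], typeof<int * string>);;
val it : obj = (1, "two")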

And that’s what’s needed to get JSON.NET to properly understand tuples of any length. Hopefully, this post has helped to shed some light on an important — but relatively undocumented — aspect of one of the better libraries currently available in the .NET ecosystem. (It should be noted, however, there is one “feature” of JSON.NET which this converter does NOT support: embedded type information. In brief, this is one feature I wish was never added to any JSON library… but that rant could be a whole separate blog entry.) In future posts, I will (hopefully) provide similar coverage of converters for other idiomatically F# types like discriminated unions and linked lists.

The complete source code for this class, as well as some other useful code for combining F# and JSON.NET, can be found in a GitHub repository.

Uncategorized

Greetings and Salutations!

Welcome! Welcome!

Sure, this might look like any other techno-geek blog; but I hope to prove you wrong.

Please don’t click away yet!

This blog will, over the coming weeks, be a chance for me to crystallize all the thoughts swirling about in my head. But — I promise — I’ll try my best to focus on the somewhat-less-tangible aspects of software engineering. Anyone can dryly detail feature ‘x’ of technology ‘y’. (Indeed, the “blog-o-sphere” is full of these human help-file mimeographs.) Some of the topics I hope to cover in the near future include: embracing polyglot software solutions, why syntax matters, better software through right-brain/left-brain harmony, and probably something about getting it right (enough) the first time. Of course, I’ll also likely pepper in a few shorter, more-concrete entries, showcasing tricks one needs in the odd corners of particular languages, or perhaps some novel unions of technology and problem domain.

Be warned! I do a lot of development in the .NET eco-system, so most of my examples will be flavored as such (and I don’t make much effort at hiding, or sugar-coating, my opinions). Also, I make no promises as to how frequently new entries will appear. I’m all about quality — not quantity.

Finally, here’s something to ponder:

Nothing simple is ever easy, and nothing free is ever cheap.

Until next time…