Well-Typed are delighted to announce a release preview of hs-bindgen, a tool for automatic Haskell binding generation from C header files. We hope to invite some feedback on this initial release and then publish the first “official” version 0.1 in a few weeks. No significant backwards-incompatible changes are planned between these two versions (though there may be some very minor ones), so if you do start using the alpha release, your code should not break when upgrading to 0.1.

This blog post will be a brief overview of what hs-bindgen can do for you, as well as a summary of the status of the project. You will find links for further reading at the very end, the most important of which is probably the (draft) manual. The hs-bindgen repository also contains a number of partial example bindings, including one for rpm; the hs-bindgen Nix tutorial includes one further partial example set of bindings, for wlroots.

Installation

The alpha version of hs-bindgen is not yet released on Hackage. Instead, add the following to your cabal.project file:

source-repository-package
  type: git
  location: https://github.com/well-typed/hs-bindgen
  tag: release-0.1-alpha
  subdir: c-expr-dsl c-expr-runtime hs-bindgen hs-bindgen-runtime

source-repository-package
  type: git
  location: https://github.com/well-typed/libclang
  tag: release-0.1-alpha

We have, however, uploaded three package candidates, primarily so that they can be used to look up Haddocks:

  • hs-bindgen-runtime provides runtime support for the code generated by hs-bindgen, and in many cases is also necessary for interacting with that code

  • c-expr-runtime provides similar support for bindings generated for CPP macros

  • hs-bindgen-the-library for using hs-bindgen in Template Haskell mode.

Introduction

We’ll start very simple. Let’s generate bindings for some library A, which offers the following API:

struct Version {
  int major;
  int minor;
};

void showVersion(struct Version v);

If you want to follow along, you can find the examples in this blog post on GitHub.

Invoking hs-bindgen

There are multiple ways to integrate hs-bindgen into your project (we’ll mention a few others below), but for now we will focus on just running it on the command line:

cabal run -- hs-bindgen-cli preprocess \
  --overwrite-files \
  --unique-id com.well-typed.hs-bindgen-0.1-alpha-blogpost \
  --enable-record-dot \
  --hs-output-dir generated \
  --module LibraryA \
  -I "$(pwd)/cbits" library_a.h

The first few arguments tell hs-bindgen-cli that it is okay to overwrite existing files, specify a unique identifier to avoid generating C name collisions (see footnote 1), enable the optional record-dot syntax to avoid having to prefix all field names (following the recommendations in Haskell Unfolder #45: Haskell records in 2025), set the output directory, and specify the desired name of the generated Haskell module.

The final line deserves a slightly more detailed explanation. Suppose you are generating bindings for a library that is installed in /opt/library, and that library has a header /opt/library/api/server/secure.h. The bindings generated by hs-bindgen will need to refer to this header using a #include statement; the question is what that #include statement should look like. If we generated

#include </opt/library/api/server/secure.h>

then the generated bindings would only work on machines where that library is installed in that exact location. If this include path should instead be

#include <api/server/secure.h>

with /opt/library in the search path, then hs-bindgen must be invoked with -I /opt/library and main argument api/server/secure.h: the final argument is included precisely as-is in the #include. For our simple example, -I "$(pwd)/cbits" means that the cbits folder of our Haskell package is added to the C include path, and the generated #include will be

#include <library_a.h>

Generated bindings

The above invocation of hs-bindgen will have generated a few Haskell modules. LibraryA.hs contains the translation of the types in the header:

module LibraryA where

data Version = Version {
    major :: CInt
  , minor :: CInt
  }
  deriving stock (Eq, Show)

instance Storable Version where (..)

instance HasField "major" (Ptr Version) (Ptr CInt) where (..)
instance HasField "minor" (Ptr Version) (Ptr CInt) where (..)

(The above code is slightly cleaned up for readability, and various parts of the generated module are omitted: boilerplate such as imports and required language extensions, Haddocks, as well as some more specialized type class instances.)
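
To give a feel for what sits behind the elided Storable instance, here is a hand-written sketch. It is not hs-bindgen's actual output: it assumes the two int fields are stored back to back without padding, and roundTrip is a hypothetical helper introduced only for this illustration.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (Storable (..))

-- Same shape as the record shown above.
data Version = Version
  { major :: CInt
  , minor :: CInt
  }
  deriving (Eq, Show)

-- Size of one C int on the current platform.
intSize :: Int
intSize = sizeOf (undefined :: CInt)

-- Assumption: two consecutive ints, no padding between or after them.
instance Storable Version where
  sizeOf _ = 2 * intSize
  alignment _ = alignment (undefined :: CInt)
  peek ptr = Version <$> peekByteOff ptr 0 <*> peekByteOff ptr intSize
  poke ptr (Version ma mi) = do
    pokeByteOff ptr 0 ma
    pokeByteOff ptr intSize mi

-- Hypothetical helper: write a value to temporary C memory and read it back.
roundTrip :: Version -> IO Version
roundTrip v = alloca $ \ptr -> poke ptr v >> peek ptr

main :: IO ()
main = roundTrip (Version 2 3) >>= print
```

A generated instance takes sizes and offsets from libclang rather than computing them by hand, which is part of why the generated code is platform-specific.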

In addition, module LibraryA/Safe.hs will contain safe imports for all functions in the header:

module LibraryA.Safe where

showVersion :: Version -> IO ()
showVersion = (..)

Since the C function showVersion takes a struct argument by value (rather than as a pointer to the struct), which is not supported by the Haskell FFI, hs-bindgen will also have generated a C wrapper; that all happens transparently and automatically.

Module LibraryA/Unsafe.hs contains the same API, but using unsafe imports (if you need a refresher on safe versus unsafe, you might like to watch Haskell Unfolder #36: concurrency and the FFI). Finally, module LibraryA/FunPtr.hs contains the addresses of all functions, in case you have C code that works with function pointers.

Using the generated bindings

We can call showVersion very simply as

import LibraryA
import LibraryA.Safe

main :: IO ()
main = showVersion $ Version 2 3

We will come back to the HasField instances for Ptr Version below; no explicit HasField instances for Version itself are necessary, because they are generated by ghc.
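
As a small self-contained illustration of the GHC-solved HasField instances, the sketch below declares the same record by hand (rather than importing the generated module) and reads its fields with record-dot syntax. The function name render is hypothetical.

```haskell
{-# LANGUAGE OverloadedRecordDot #-}

import Foreign.C.Types (CInt)

-- Hand-written copy of the record, so that the example stands alone.
data Version = Version
  { major :: CInt
  , minor :: CInt
  }

-- v.major and v.minor go through HasField constraints that GHC solves
-- automatically for ordinary record fields.
render :: Version -> String
render v = show v.major ++ "." ++ show v.minor

main :: IO ()
main = putStrLn (render (Version 2 3))
```

This needs GHC 9.2 or later for the OverloadedRecordDot extension.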

Dependencies

Suppose library B defines some kind of API for drivers, and suppose it uses library A:

#include "library_a.h"

struct Driver {
  char* name;
  struct Version version;
};

void initDriver(struct Driver *d);
void showDriver(struct Driver *d);

Main headers

If we run hs-bindgen on this header in the same way as we did for library_a.h, we will get warnings such as

[Warning] [HsBindgen] [select] 'struct Driver' at "../cbits/library_b.h 8:8"
  Could not select declaration (direct select predicate match):
    Transitive dependency not selected:
      'struct Version' at "../cbits/library_a.h 3:8"
      Adjust the select predicate or enable program slicing

When we generate bindings for a header, we need to know which declarations in that header the user wants to generate bindings for; in hs-bindgen this happens by means of selection predicates. The default selection predicate is --select-from-main-headers, which means that any declarations in headers explicitly mentioned on the command line are selected, but any declarations in headers that might be imported by those headers are not. The first way in which we can fix this warning therefore is by explicitly generating bindings for both library_a.h and library_b.h:

cabal run -- hs-bindgen-cli preprocess \
  (..other arguments as before..)
  --module LibraryB \
  -I "$(pwd)/cbits" library_a.h library_b.h

Program slicing

Suppose we are really only interested in library B, and want to generate only those bindings in library A that are required by library B. To do this, we can enable program slicing:

cabal run -- hs-bindgen-cli preprocess \
  (..)
  --enable-program-slicing \
  -I "$(pwd)/cbits" library_b.h

This will pull in only those declarations in library A that are referenced by library B.
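
As a rough mental model (not hs-bindgen's actual implementation), program slicing can be pictured as taking the selected declarations and following their dependencies transitively. The dependency table and the slice function below are hypothetical and modelled on the two example headers.

```haskell
import Data.Maybe (fromMaybe)

type Decl = String

-- Hypothetical dependency table: each declaration with what it refers to.
dependencies :: [(Decl, [Decl])]
dependencies =
  [ ("struct Version", [])
  , ("showVersion", ["struct Version"])
  , ("struct Driver", ["struct Version"])
  , ("initDriver", ["struct Driver"])
  , ("showDriver", ["struct Driver"])
  ]

-- Collect the selected declarations plus everything they depend on,
-- in the order in which they are first reached.
slice :: [(Decl, [Decl])] -> [Decl] -> [Decl]
slice graph = go []
  where
    go seen [] = reverse seen
    go seen (d : ds)
      | d `elem` seen = go seen ds
      | otherwise = go (d : seen) (fromMaybe [] (lookup d graph) ++ ds)

main :: IO ()
main = mapM_ putStrLn (slice dependencies ["struct Driver", "initDriver", "showDriver"])
```

In this toy model struct Version is pulled in because struct Driver refers to it, while showVersion is left out because nothing selected from library B depends on it.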

Opaque types

The API provided by library B exclusively works with pointers to struct Driver; so perhaps we don’t need a Haskell-side representation of that struct at all. If that is the case, we can configure hs-bindgen through a prescriptive binding specification, and tell it that it should keep the Haskell type opaque:

version:
  hs_bindgen: 0.1.0
  binding_specification: '1.0'
ctypes:
- headers: library_b.h
  cname: struct Driver
  hsname: Driver
hstypes:
- hsname: Driver
  representation: emptydata

This states that the C declaration struct Driver, found in header library_b.h, should be mapped to a Haskell type called Driver (we could pick a different name here if we wanted to, overriding naming decisions made by hs-bindgen), and that the Haskell type Driver be represented as an empty datatype. If we now run

cabal run -- hs-bindgen-cli preprocess \
  (..)
  --prescriptive-binding-spec libraryB.yaml \
  -I "$(pwd)/cbits" library_b.h

then the generated LibraryB is simply

data Driver

It can be quite useful to combine emptydata with program slicing, limiting how many declarations from imported headers are in fact needed.

Composability

The solutions in the previous section all had one important downside: in none of them did the generated code for library B reuse the generated code for library A. This kind of composability of generated bindings is an important goal of hs-bindgen, influencing many design decisions. Composability is achieved through (external) binding specifications; a binding specification is a .yaml (or .json) file describing a set of generated bindings, a bit like .hi files in Haskell, or a module signature in OCaml or Backpack.

When we generate the bindings for library A we can ask hs-bindgen to additionally generate a binding spec:

cabal run -- hs-bindgen-cli preprocess \
  (..)
  --gen-binding-spec libraryA.yaml

The resulting .yaml file describes the types generated for library A, including which type class instances they have (necessary in order to know which type class instances we can generate when these types are used in other libraries). When generating bindings for library B, we can pass this binding specification along:

cabal run -- hs-bindgen-cli preprocess \
  (..)
  --external-binding-spec libraryA.yaml \
  -I "$(pwd)/cbits" library_b.h

The generated code then looks like

module LibraryB where

import qualified LibraryA

data Driver = Driver {
    name    :: Ptr CChar
  , version :: LibraryA.Version
  }
  deriving stock (Eq, Show)

instance Storable Driver where (..)

instance HasField "name"    (Ptr Driver) (Ptr (Ptr CChar))      where (..)
instance HasField "version" (Ptr Driver) (Ptr LibraryA.Version) where (..)

External binding specifications are useful not only when generating bindings for multiple libraries, but also for structuring bindings for multiple headers of the same library, or when an identical header is included in lots of libraries (such as the rtwtypes.h header generated by MATLAB). We consider external binding specifications to be an essential feature of hs-bindgen.

Pointers

HasField

Suppose we want to override one value deeply nested in some C data structure. We could use the Storable instances to peek the value, then override the appropriate field, and finally poke the updated value:

main :: IO ()
main = do
    alloca $ \(driverPtr :: Ptr Driver) -> do
      initDriver driverPtr

      driver <- peek driverPtr
      poke driverPtr $ driver & #version % #minor .~ 2
      showDriver driverPtr

(The use of lenses here is optional of course.)

However, it may well be undesirable to marshall the entire structure back and forth merely to change a single value. This is why hs-bindgen also generates HasField instances for pointers, so that record dot syntax can be used to index C structures. We can update the minor version number without marshalling the entire structure as follows:

poke driverPtr.version.minor 3

If you prefer to avoid record-dot syntax, you can use the HsBindgen.Runtime.HasCField infrastructure directly.
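
To make the pointer-indexing idea concrete, here is a hand-written sketch of HasField instances on Ptr Version. It is an illustration under stated assumptions, not the generated code: the offsets assume two consecutive ints without padding, and intSize and demo are hypothetical names.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedRecordDot #-}

import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, castPtr, plusPtr)
import Foreign.Storable (peek, poke, sizeOf)
import GHC.Records (HasField (..))

-- Only ever used behind a pointer in this sketch, so it needs no fields.
data Version

intSize :: Int
intSize = sizeOf (undefined :: CInt)

-- Selecting a field on a pointer yields a pointer to that field.
instance HasField "major" (Ptr Version) (Ptr CInt) where
  getField = castPtr

-- Assumption: minor directly follows major, with no padding.
instance HasField "minor" (Ptr Version) (Ptr CInt) where
  getField ptr = ptr `plusPtr` intSize

-- Write both fields through field pointers, then read them back.
demo :: IO (CInt, CInt)
demo = allocaBytes (2 * intSize) $ \buf -> do
  let versionPtr = buf :: Ptr Version
  poke versionPtr.major 2
  poke versionPtr.minor 3
  (,) <$> peek versionPtr.major <*> peek versionPtr.minor

main :: IO ()
main = demo >>= print
```

No Version value is ever constructed here; only the individual ints are read and written, which is the point of indexing through the pointer.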

FunPtr

When dealing with a higher order API, hs-bindgen will generate additional bindings to convert back and forth between C function pointers and Haskell functions, and package these conversions up as instances of two type classes in hs-bindgen-runtime:

class ToFunPtr a where
  toFunPtr :: a -> IO (FunPtr a)

class FromFunPtr a where
  fromFunPtr :: FunPtr a -> a

withFunPtr :: ToFunPtr a => a -> (FunPtr a -> IO b) -> IO b

For example, suppose the driver API in library B additionally contains

struct Driver;
typedef int RunDriver(struct Driver* self);

struct Driver {
  // .. other fields as before ..
  RunDriver* run;
};

int callDriver(struct Driver* d);

then we generate

newtype RunDriver = RunDriver {
    unwrap :: Ptr Driver -> IO CInt
  }

instance ToFunPtr   RunDriver where (..)
instance FromFunPtr RunDriver where (..)

Here’s how we might use this:

counter <- newIORef 0
let run = RunDriver $ \_self -> atomicModifyIORef counter $ \x -> (succ x, x)
withFunPtr run $ \funPtr -> do
  poke driverPtr.run funPtr
  replicateM_ 5 $ print =<< callDriver driverPtr
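
The generated ToFunPtr and FromFunPtr instances are elided above. The standard Haskell FFI offers "wrapper" and "dynamic" imports for this kind of conversion, and the sketch below uses them by hand for a simpler callback type. Whether hs-bindgen's instances are built exactly this way is an assumption of this sketch; Callback, mkCallback, callCallback and withCallback are names made up for the example.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Control.Exception (bracket)
import Foreign.C.Types (CInt)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- Haskell view of a C function of type: int (*)(int)
type Callback = CInt -> IO CInt

-- Haskell function to C function pointer (compare toFunPtr).
foreign import ccall "wrapper"
  mkCallback :: Callback -> IO (FunPtr Callback)

-- C function pointer to Haskell function (compare fromFunPtr).
foreign import ccall "dynamic"
  callCallback :: FunPtr Callback -> Callback

-- Allocate the function pointer, run the action, and free the pointer
-- afterwards even if the action throws (compare withFunPtr).
withCallback :: Callback -> (FunPtr Callback -> IO b) -> IO b
withCallback f = bracket (mkCallback f) freeHaskellFunPtr

main :: IO ()
main = do
  r <- withCallback (\x -> pure (x + 1)) $ \funPtr -> callCallback funPtr 41
  print r
```

Function pointers created by a "wrapper" import must be released with freeHaskellFunPtr, which is why a bracket-style helper is convenient.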

Macros

Some C libraries make part of their API available as CPP macros. Macros in C don’t have a semantics per se; they are merely a list of tokens that the preprocessor splices in whenever they are used. In order to nonetheless be able to generate “bindings” for macros, hs-bindgen imbues macros with a bespoke semantics. In future versions of hs-bindgen this macro infrastructure will be pluggable (#942), because the default semantics may not suit all applications.

Constants

Low-level C libraries (such as this Analog Devices Talise driver) may make certain constants such as bitfields available as CPP macros:

#define SIGNALID 0x01

We translate these to Haskell constants:

sIGNALID :: CInt
sIGNALID = 1

Expressions

It may also happen that libraries offer certain functionality as macro functions. For example, a library might provide a definition that provides a pointer offset for devices with memory-mapped I/O:

#define INPUT_PORT(x) x + 4

Since hs-bindgen has no way of knowing if that (+) operator should be interpreted as integer addition or pointer offset (or indeed something else), it generates a very general definition:

iNPUT_PORT :: Add a CInt => a -> AddRes a CInt

(Add and AddRes come from c-expr-runtime); one way that we can instantiate this type is:

inputPort :: Ptr Word8 -> Ptr Word8
inputPort = iNPUT_PORT
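
To illustrate how one definition can serve both as integer addition and as pointer offset, here is a deliberately simplified stand-in for that kind of class. Plus, PlusRes and inputPortGeneric are hypothetical names; the real Add and AddRes in c-expr-runtime may be defined quite differently.

```haskell
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Word (Word8)
import Foreign.C.Types (CInt)
import Foreign.Ptr (Ptr, nullPtr, plusPtr)

-- Hypothetical class: the result type depends on both operand types.
class Plus a b where
  type PlusRes a b
  plus :: a -> b -> PlusRes a b

-- int + int is ordinary integer addition.
instance Plus CInt CInt where
  type PlusRes CInt CInt = CInt
  plus = (+)

-- Byte pointer + int is a pointer offset (one byte per unit for Word8).
instance Plus (Ptr Word8) CInt where
  type PlusRes (Ptr Word8) CInt = Ptr Word8
  plus p n = p `plusPtr` fromIntegral n

-- General translation of "x + 4": the caller picks the interpretation.
inputPortGeneric :: Plus a CInt => a -> PlusRes a CInt
inputPortGeneric x = x `plus` (4 :: CInt)

main :: IO ()
main = do
  print (inputPortGeneric (10 :: CInt))
  print (inputPortGeneric (nullPtr :: Ptr Word8) == nullPtr `plusPtr` 4)
```

Fixing the argument type at the use site, as inputPort does above, is what selects one of the instances.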

Types

Finally, some low-level libraries define types as macros:

#define S16_TYPE short int

In the case (and only in the case) that these can be parsed as the corresponding typedef

typedef short int S16_TYPE;

hs-bindgen will treat these as typedefs, and generate a Haskell newtype:

newtype S16_TYPE = S16_TYPE {
    unwrap :: CShort
  }

Squashing

C structs are often defined using a typedef:

typedef struct Point {
  int x;
  int y;
} Point;

This is typically done for syntactic convenience only, making it possible to write simply Point rather than struct Point. When the only use of a struct is within a typedef in this manner, hs-bindgen will “squash” the typedef, and generate a single type only:

data Point = Point {
    x :: CInt
  , y :: CInt
  }

If however the typedef is not the only use of the struct, then hs-bindgen assumes that this is intended to convey some kind of semantic information, and will not squash. For example, suppose the device driver API includes

typedef struct Driver DeviceDriver;

then we generate

newtype DeviceDriver = DeviceDriver {
    unwrap :: Driver
  }

To avoid confusion, a notice will be emitted whenever a type is squashed. If desired, this notice can be suppressed with --log-as-info select-mangle-names-squashed. In a future release of hs-bindgen squashing will be configurable per typedef using a prescriptive binding spec (#1436).

Build process integration

Preprocessor

The primary way of invoking hs-bindgen is by calling hs-bindgen-cli, as we have discussed in this blog post, raising the question of how to integrate that into your build process. We don’t have a perfect answer here just yet. You can of course write your own Makefile, or integrate this into a nix derivation (see the Nix tutorial). If you want to use cabal as the main driver, you can use a custom setup script, or the new SetupHooks API; we don’t provide explicit support for either just yet (#1666 tracks adding support for SetupHooks).

We do offer a “literate” mode, abusing Cabal’s support for literate Haskell to trick it into running hs-bindgen instead; see section Cabal preprocessor integration of the hs-bindgen manual for details.

Template Haskell

If you prefer, you can use hs-bindgen in TH mode; this avoids the need for running the preprocessor altogether. Instead, you can #include a C header in a Haskell module:

let cfg :: Config
    cfg = def & #clang % #extraIncludeDirs .~ [Pkg "cbits"]
 in withHsBindgen cfg def $
       hashInclude "library_a.h"

Since we cannot generate more than one module in this way, by default this splices in bindings for all the types in the header along with the safe imports, though you can override that if you wish. Of course, you can also use hashInclude in more than one Haskell module to manually generate multiple modules. See Template Haskell mode in the manual for details.

Cross-compilation

We rely on libclang for all machine-dependent decisions (as well as parsing the C headers in the first place). Therefore if you want to generate code for a target differing from your host platform, in principle it should suffice to provide the appropriate clang arguments. We are working on a guide for doing so; this will become section Cross-compilation in the manual; however, this section is not quite ready yet (#1630).

Non-portability

It is important to note that the bindings generated by hs-bindgen are not portable in general (indeed, it will be rare that they are). The slogan to remember is:

The bindings generated by hs-bindgen should be regarded as build artefacts.

If you do want to distribute generated bindings as part of your package, you can of course do so, but then you are responsible for making the appropriate provisions in your .cabal file (perhaps using SetupHooks) to check machine architecture, choose between different sets of bindings, etc. At present hs-bindgen does not yet provide any explicit support for making this process easier.
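
One quick way to see why generated bindings are machine-dependent is to print the sizes of a few C types on your own machine. The sketch below uses only the base library; the name typeSizes is hypothetical. On typical 64-bit Linux a long is 8 bytes, whereas on 64-bit Windows it is 4, so struct sizes and field offsets baked into generated code can differ between platforms.

```haskell
import Foreign.C.Types (CInt, CLong)
import Foreign.Ptr (Ptr)
import Foreign.Storable (sizeOf)

-- Sizes as seen by the platform GHC is compiling for.
typeSizes :: [(String, Int)]
typeSizes =
  [ ("int", sizeOf (undefined :: CInt))
  , ("long", sizeOf (undefined :: CLong))
  , ("void*", sizeOf (undefined :: Ptr ()))
  ]

main :: IO ()
main = mapM_ report typeSizes
  where
    report (name, size) = putStrLn (name ++ ": " ++ show size ++ " bytes")
```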

Conclusions

Although Haskell provides good FFI features, writing bindings to large C libraries by hand can be a laborious process. The bindings generated by hs-bindgen are low-level: char* is translated to Ptr CChar, not ByteString or Text or something else still; nonetheless, it should make the process of writing bindings significantly easier. Automatically generating high-level bindings is something we’ll soon turn our attention to.

Before we do so, however, there is still some cleanup to be done before we can release version 0.1. As it stands, the alpha release supports nearly all of C, although some corner cases are still missing (such as implicit fields, #1649); see the issue tracker for a list of missing C features. However, we would like to invite you to start using the alpha release now; there should only be very minor backwards incompatible changes introduced in between versions 0.1-alpha and 0.1, so your code should not break when upgrading to version 0.1.

Acknowledgements

Many people at Well-Typed have contributed to hs-bindgen, including Travis Cardwell, Joris Dral, Dominik Schrempf, Armando Santos, Sam Derbyshire, and others (as well as myself, Edsko de Vries).

Well-Typed is grateful to Anduril Industries for sponsoring this work.

Further reading


  1. The C namespace is a global namespace, so when hs-bindgen introduces new C symbols, they must also be globally unique. Normally the uniqueness of the symbols introduced by hs-bindgen is derived from the uniqueness of the C symbols for which we are generating bindings, but it may happen that multiple Haskell libraries include bindings for the same C function. In such a case it is important that the --unique-id used for these different Haskell modules is different.