Well-Typed are delighted to announce a release preview of hs-bindgen, a tool
for automatic Haskell binding generation from C header files. We hope to invite
some feedback on this initial release and then publish the first “official”
version 0.1 in a few weeks. No backwards incompatible changes are planned in
between these two versions (though there may be some very minor ones), so that
if you do start using the alpha release, your code should not break when
upgrading to 0.1.
This blog post will be a brief overview of what hs-bindgen can do for you,
as well as a summary of the status of the project. You will find links for
further reading at the very end, the most important of which is probably the
(draft) manual. The
hs-bindgen repository also contains a number of partial
example bindings,
including one for rpm; the
hs-bindgen nix tutorial
includes one further partial example set of bindings, for
wlroots.
Installation
The alpha version of hs-bindgen is not yet released on Hackage. Instead,
add the following to your cabal.project file:
source-repository-package
  type: git
  location: https://github.com/well-typed/hs-bindgen
  tag: release-0.1-alpha
  subdir: c-expr-dsl c-expr-runtime hs-bindgen hs-bindgen-runtime

source-repository-package
  type: git
  location: https://github.com/well-typed/libclang
  tag: release-0.1-alpha
We have however uploaded three package candidates, primarily so that they can be used to look up Haddocks:
- hs-bindgen-runtime provides runtime support for the code generated by hs-bindgen, and in many cases is also necessary for interacting with that code
- c-expr-runtime provides similar support for bindings generated for CPP macros
- hs-bindgen, the library for using hs-bindgen in Template Haskell mode.
Introduction
We’ll start very simple. Let’s generate bindings for some library A, which offers the following API:
struct Version {
  int major;
  int minor;
};
void showVersion(struct Version v);
If you want to follow along, you can find the examples in this blog post on GitHub.
Invoking hs-bindgen
There are multiple ways to integrate hs-bindgen into your project (we’ll
mention a few others below), but for now we will focus on just running it on the
command line:
cabal run -- hs-bindgen-cli preprocess \
--overwrite-files \
--unique-id com.well-typed.hs-bindgen-0.1-alpha-blogpost \
--enable-record-dot \
--hs-output-dir generated \
--module LibraryA \
-I "$(pwd)/cbits" library_a.h
The first few arguments tell hs-bindgen-cli that it is okay to overwrite
existing files, specify a unique identifier to avoid generating C name
collisions [1], enable the optional record-dot syntax to avoid
having to prefix all field names (following the recommendations in Haskell
Unfolder #45: Haskell records in
2025), set the output directory,
and specify the desired name of the generated Haskell module.
The final line deserves a slightly more detailed explanation. Suppose you are
generating bindings for a library that is installed in /opt/library, and that
library has a header /opt/library/api/server/secure.h. The bindings generated
by hs-bindgen will need to refer to this header using a #include statement;
the question is what that #include statement should look like. If we generated
#include </opt/library/api/server/secure.h>
then the generated bindings would only work on machines where that library is installed in that exact location. If this include path should instead be
#include <api/server/secure.h>
with /opt/library in the search path, then hs-bindgen must be invoked with
-I /opt/library and main argument api/server/secure.h: the final argument
is included precisely as-is in the #include. For our simple example,
-I "$(pwd)/cbits" means that the cbits folder of our Haskell package is
added to the C include path, and the generated #include will be
#include <library_a.h>
Generated bindings
The above invocation of hs-bindgen will have generated a few Haskell modules.
LibraryA.hs contains the translation of the types in the header:
module LibraryA where
data Version = Version {
    major :: CInt
  , minor :: CInt
  }
  deriving stock (Eq, Show)
instance Storable Version where (..)
instance HasField "major" (Ptr Version) (Ptr CInt) where (..)
instance HasField "minor" (Ptr Version) (Ptr CInt) where (..)
(The above code is slightly cleaned up for readability, and various parts of the generated module are omitted: boilerplate such as imports and required language extensions, Haddocks, as well as some more specialized type class instances.)
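As a rough idea of what the elided Storable instance has to do, here is a hand-written sketch for the two-field struct above. It is not the code hs-bindgen emits: the field offsets are worked out by hand from the size of CInt, and roundTrip is a helper introduced only for this illustration.

```haskell
{-# LANGUAGE DerivingStrategies #-}
module Main (main) where

import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (Storable (..))

-- Hand-written stand-in for the generated type.
data Version = Version
  { major :: CInt
  , minor :: CInt
  }
  deriving stock (Eq, Show)

-- Assumed layout: two consecutive ints, so the second field
-- starts exactly one CInt after the first.
instance Storable Version where
  sizeOf    _ = 2 * sizeOf (undefined :: CInt)
  alignment _ = alignment (undefined :: CInt)
  peek p = Version
    <$> peekByteOff p 0
    <*> peekByteOff p (sizeOf (undefined :: CInt))
  poke p (Version ma mi) = do
    pokeByteOff p 0 ma
    pokeByteOff p (sizeOf (undefined :: CInt)) mi

-- Hypothetical helper: write a value to C-compatible memory and read it back.
roundTrip :: Version -> IO Version
roundTrip v = alloca $ \p -> poke p v >> peek p

main :: IO ()
main = roundTrip (Version 2 3) >>= print
```

Having this instance is what makes it possible to pass a Version to alloca, peek and poke like any other FFI type.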
In addition, module LibraryA/Safe.hs will contain safe imports for all
functions in the header:
module LibraryA.Safe where
showVersion :: Version -> IO ()
showVersion = (..)
Since the C function showVersion takes a struct argument by value (rather than
as a pointer to the struct), which is not supported by Haskell FFI, hs-bindgen
will also have generated a C wrapper; that all happens transparently and
automatically.
Module LibraryA/Unsafe.hs contains the same API, but using unsafe imports
(if you need a refresher on safe versus unsafe, you might like to watch
Haskell Unfolder #36: concurrency and the
FFI). Module LibraryA/FunPtr.hs
finally contains the addresses of all functions, in case you have C code that
works with function pointers.
Using the generated bindings
We can call showVersion very simply as
import LibraryA
import LibraryA.Safe
main :: IO ()
main = showVersion $ Version 2 3
We will come back to the HasField instances for Ptr Version below; no
explicit HasField instances for Version itself are necessary, because they
are generated by ghc.
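For plain values, record-dot access needs nothing beyond the OverloadedRecordDot extension. A minimal sketch, using a hand-written copy of Version rather than the generated module (render is a hypothetical helper, not part of any generated API):

```haskell
{-# LANGUAGE OverloadedRecordDot #-}
module Main (main) where

import Foreign.C.Types (CInt)

-- Hand-written copy of the record; the real one comes from LibraryA.
data Version = Version
  { major :: CInt
  , minor :: CInt
  }

-- GHC solves the HasField constraints for record fields by itself,
-- so v.major and v.minor work without any explicit instances.
render :: Version -> String
render v = show v.major ++ "." ++ show v.minor

main :: IO ()
main = putStrLn (render (Version 2 3))  -- prints "2.3"
```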
Dependencies
Suppose library B defines some kind of API for drivers, and suppose it uses library A:
#include "library_a.h"
struct Driver {
  char* name;
  struct Version version;
};
void initDriver(struct Driver *d);
void showDriver(struct Driver *d);
Main headers
If we run hs-bindgen on this header in the same way as we did for
library_a.h, we will get warnings such as
[Warning] [HsBindgen] [select] 'struct Driver' at "../cbits/library_b.h 8:8"
Could not select declaration (direct select predicate match):
Transitive dependency not selected:
'struct Version' at "../cbits/library_a.h 3:8"
Adjust the select predicate or enable program slicing
When we generate bindings for a header, we need to know which declarations in
that header the user wants to generate bindings for; in hs-bindgen this
happens by means of selection predicates. The default selection predicate is
--select-from-main-headers, which means that any declarations in headers
explicitly mentioned on the command line are selected, but any declarations in
headers that might be imported by those headers are not. The first way in
which we can fix this warning is therefore to explicitly generate bindings for
both library_a.h and library_b.h:
cabal run -- hs-bindgen-cli preprocess \
(..other arguments as before..)
--module LibraryB \
-I "$(pwd)/cbits" library_a.h library_b.h
Program slicing
Suppose we are really only interested in library B, and want to generate only those bindings in library A that are required by library B. To do this, we can enable program slicing:
cabal run -- hs-bindgen-cli preprocess \
(..)
--enable-program-slicing \
-I "$(pwd)/cbits" library_b.h
This will pull in only those declarations in library A that are referenced by library B.
Opaque types
The API provided by library B exclusively works with pointers to struct Driver; so perhaps we don’t need a Haskell-side representation of that struct
at all. If that is the case, we can configure hs-bindgen through a
prescriptive binding specification, and tell it that it should keep the
Haskell type opaque:
version:
  hs_bindgen: 0.1.0
  binding_specification: '1.0'
ctypes:
- headers: library_b.h
  cname: struct Driver
  hsname: Driver
hstypes:
- hsname: Driver
  representation: emptydata
This states that the C declaration struct Driver, found in header
library_b.h, should be mapped to a Haskell type called Driver (we could pick
a different name here if we wanted to, overriding naming decisions made by
hs-bindgen), and that the Haskell type Driver be represented as an empty datatype.
If we now run
cabal run -- hs-bindgen-cli preprocess \
(..)
--prescriptive-binding-spec libraryB.yaml \
-I "$(pwd)/cbits" library_b.h
then the generated LibraryB is simply
data Driver
It can be quite useful to combine emptydata with program slicing, limiting
how many declarations from imported headers are in fact needed.
Composability
The solutions in the previous section all had one important downside: in none of
them did the generated code for library B reuse the generated code for library A.
This kind of composability of generated bindings is an important goal of
hs-bindgen, influencing many design decisions. Composability is achieved
through (external) binding specifications; a binding specification is a
.yaml (or .json) file describing a set of generated bindings, a bit like
.hi files in Haskell, or a module signature in
OCaml or
Backpack.
When we generate the bindings for library A we can ask hs-bindgen to
additionally generate a binding spec:
cabal run -- hs-bindgen-cli preprocess \
(..)
--gen-binding-spec libraryA.yaml
The resulting .yaml file describes the types generated for library A,
including which type class instances they have (necessary in order to know
which type class instances we can generate when these types are used in other
libraries). When generating bindings for library B, we can pass this binding
specification along:
cabal run -- hs-bindgen-cli preprocess \
(..)
--external-binding-spec libraryA.yaml \
-I "$(pwd)/cbits" library_b.h
The generated code then looks like
module LibraryB where
import qualified LibraryA
data Driver = Driver {
    name :: Ptr CChar
  , version :: LibraryA.Version
  }
  deriving stock (Eq, Show)
instance Storable Driver where (..)
instance HasField "name" (Ptr Driver) (Ptr (Ptr CChar)) where (..)
instance HasField "version" (Ptr Driver) (Ptr LibraryA.Version) where (..)
External binding specifications are useful not only when generating bindings for
multiple libraries, but also for structuring bindings for multiple headers of
the same library, or when an identical header is included in lots of libraries
(such as the rtwtypes.h header generated by MATLAB). We consider external
binding specifications to be an essential feature of hs-bindgen.
Pointers
HasField
Suppose we want to override one value deeply nested in some C data structure. We
could use the Storable instances to peek the value, then override the
appropriate field, and finally poke the updated value:
main :: IO ()
main = do
  alloca $ \(driverPtr :: Ptr Driver) -> do
    initDriver driverPtr
    driver <- peek driverPtr
    poke driverPtr $ driver & #version % #minor .~ 2
    showDriver driverPtr
(The use of lenses here is optional of course.)
However, it may well be undesirable to marshall the entire structure back and
forth merely to change a single value. This is why hs-bindgen also generates
HasField instances for pointers, so that record dot syntax can be used
to index C structures. We can update the minor version number without
marshalling the entire structure as follows:
poke driverPtr.version.minor 3
If you prefer to avoid record-dot syntax, you can use the
HsBindgen.Runtime.HasCField
infrastructure directly.
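The idea behind these pointer instances can be sketched with nothing but base: field access on a pointer is address arithmetic, so no struct is ever marshalled. The instances below are hand-written for struct Version, with offsets assumed from its two-int layout. The generated instances and the HasCField machinery will differ in detail, and setMinor is a hypothetical helper.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedRecordDot #-}
module Main (main) where

import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (peek, poke, sizeOf)
import GHC.Records (HasField (..))

-- Opaque stand-in for: struct Version { int major; int minor; };
data Version

-- Selecting a field of a pointer yields a pointer to that field.
instance HasField "major" (Ptr Version) (Ptr CInt) where
  getField p = p `plusPtr` 0

instance HasField "minor" (Ptr Version) (Ptr CInt) where
  getField p = p `plusPtr` sizeOf (undefined :: CInt)

-- Hypothetical helper: overwrite only the minor field in place,
-- then read both fields back through their field pointers.
setMinor :: CInt -> IO (CInt, CInt)
setMinor new = allocaBytes (2 * sizeOf (undefined :: CInt)) $ \p -> do
  let vp = p :: Ptr Version
  poke vp.major 2
  poke vp.minor 0
  poke vp.minor new
  (,) <$> peek vp.major <*> peek vp.minor

main :: IO ()
main = setMinor 3 >>= print  -- prints (2,3)
```

Because each selection returns another pointer, selections chain naturally, which is what makes an expression like driverPtr.version.minor possible.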
FunPtr
When dealing with a higher order API, hs-bindgen will generate additional
bindings to convert back and forth between C function pointers and Haskell
functions, and package these conversions up as instances of two type classes in
hs-bindgen-runtime:
class ToFunPtr a where
  toFunPtr :: a -> IO (FunPtr a)
class FromFunPtr a where
  fromFunPtr :: FunPtr a -> a
withFunPtr :: ToFunPtr a => a -> (FunPtr a -> IO b) -> IO b
For example, suppose the driver API in library B additionally contains
struct Driver;
typedef int RunDriver(struct Driver* self);
struct Driver {
  // .. other fields as before ..
  RunDriver* run;
};
int callDriver(struct Driver* d);
then we generate
newtype RunDriver = RunDriver {
    unwrap :: Ptr Driver -> IO CInt
  }
instance ToFunPtr RunDriver where (..)
instance FromFunPtr RunDriver where (..)
Here’s how we might use this:
counter <- newIORef 0
let run = RunDriver $ \_self -> atomicModifyIORef counter $ \x -> (succ x, x)
withFunPtr run $ \funPtr -> do
  poke driverPtr.run funPtr
  replicateM_ 5 $ print =<< callDriver driverPtr
Macros
Some C libraries make part of their API available as CPP macros. Macros in C
don’t have a semantics per se; they are merely a list of tokens that the
preprocessor splices in whenever they are used. In order to nonetheless be able
to generate “bindings” for macros, hs-bindgen imbues macros with a bespoke
semantics. In future versions of hs-bindgen this macro infrastructure
will be pluggable (#942),
because the default semantics may not suit all applications.
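As a rough illustration of what giving an operator such as + a semantics can involve, the sketch below uses a two-parameter type class with an associated result type, so that one definition covers both integer addition and pointer offsets. Plus, PlusRes and plusFour are hypothetical names made up for this example; the classes that hs-bindgen actually uses live in c-expr-runtime and are not reproduced here.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}
module Main (main) where

import Data.Word (Word8)
import Foreign.C.Types (CInt)
import Foreign.Ptr (Ptr, minusPtr, nullPtr, plusPtr)

-- Hypothetical stand-in for an overloaded C-style addition.
class Plus a b where
  type PlusRes a b
  plus :: a -> b -> PlusRes a b

-- Integer addition.
instance Plus CInt CInt where
  type PlusRes CInt CInt = CInt
  plus = (+)

-- Pointer offset; for a byte pointer, elements and bytes coincide.
instance Plus (Ptr Word8) CInt where
  type PlusRes (Ptr Word8) CInt = Ptr Word8
  plus p n = p `plusPtr` fromIntegral n

-- A single general definition of "x + 4", usable at either instance.
plusFour :: Plus a CInt => a -> PlusRes a CInt
plusFour x = plus x (4 :: CInt)

main :: IO ()
main = do
  print (plusFour (1 :: CInt))                                -- prints 5
  print (plusFour (nullPtr :: Ptr Word8) `minusPtr` nullPtr)  -- prints 4
```

The caller picks the interpretation simply by fixing the argument type, which keeps the generated definition general without guessing at the macro author's intent.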
Constants
Low-level C libraries (such as this Analog Devices Talise driver) may make certain constants such as bitfields available as CPP macros:
#define SIGNALID 0x01
We translate these to Haskell constants:
sIGNALID :: CInt
sIGNALID = 1
Expressions
It may also happen that libraries offer certain functionality as macro functions. For example, a library might provide a definition that provides a pointer offset for devices with memory-mapped I/O:
#define INPUT_PORT(x) x + 4
Since hs-bindgen has no way of knowing if that (+) operator should be
interpreted as integer addition or pointer offset (or indeed something else),
it generates a very general definition:
iNPUT_PORT :: Add a CInt => a -> AddRes a CInt
(Add and AddRes come from c-expr-runtime); one way that we can instantiate this type is:
inputPort :: Ptr Word8 -> Ptr Word8
inputPort = iNPUT_PORT
Types
Finally, some low-level libraries define types as macros:
#define S16_TYPE short int
In the case (and only in the case) that these can be parsed as the corresponding typedef
typedef short int S16_TYPE;
hs-bindgen will treat these as typedefs, and generate a Haskell newtype:
newtype S16_TYPE = S16_TYPE {
    unwrap :: CShort
  }
Squashing
C structs are often defined using a typedef:
typedef struct Point {
  int x;
  int y;
} Point;
This is typically done for syntactic convenience only, making it possible to
write simply Point rather than struct Point. When the only use of a struct
is within a typedef in this manner, hs-bindgen will “squash” the typedef,
and generate a single type only:
data Point = Point {
    x :: CInt
  , y :: CInt
  }
If however the typedef is not the only use of the struct, then
hs-bindgen assumes that this is intended to convey some kind of semantic
information, and will not squash. For example, suppose the device driver API
includes
typedef struct Driver DeviceDriver;
then we generate
newtype DeviceDriver = DeviceDriver {
    unwrap :: Driver
  }
To avoid confusion, a notice will be emitted whenever a type is squashed. If
desired, this notice can be suppressed with
--log-as-info select-mangle-names-squashed.
In a future release of hs-bindgen squashing will be configurable per typedef
using a prescriptive binding spec
(#1436).
Build process integration
Preprocessor
The primary way of invoking hs-bindgen is by calling hs-bindgen-cli, as we
have discussed in this blog post, raising the question of how to integrate that
into your build process. We don’t have a perfect answer here just yet. You
can of course write your own Makefile, or integrate this into a nix
derivation (see the Nix tutorial). If you want to use cabal as the main driver, you can use a custom
setup script, or the new
SetupHooks
API; we don’t provide explicit support for either just yet
(#1666 tracks adding
support for SetupHooks).
We do offer a “literate” mode, abusing Cabal’s support for literate Haskell to
trick it into running hs-bindgen instead; see section
Cabal preprocessor integration
of the hs-bindgen manual for details.
Template Haskell
If you prefer, you can use hs-bindgen in TH mode; this avoids the need for
running the preprocessor altogether. Instead, you can
#include a C header in a Haskell module:
let cfg :: Config
    cfg = def & #clang % #extraIncludeDirs .~ [Pkg "cbits"]
 in withHsBindgen cfg def $
      hashInclude "library_a.h"
Since we cannot generate more than one module in this way, by default this
splices in bindings for all the types in the header along with the safe
imports, though you can override that if you wish. Of course, you can also
use hashInclude in more than one Haskell module to manually generate multiple
modules. See Template Haskell
mode
in the manual for details.
Cross-compilation
We rely on libclang for
all machine-dependent decisions (as well as parsing the C headers in the first
place). Therefore if you want to generate code for a target differing from your host
platform, in principle it should suffice to provide the appropriate clang
arguments. We are working on a guide for doing so; this will become section
Cross-compilation
in the manual; however, this section is not quite ready yet
(#1630).
Non-portability
It is important to note that the bindings generated by hs-bindgen are not
portable in general (indeed, it will be rare that they are). The slogan to
remember is:
The bindings generated by hs-bindgen should be regarded as build artefacts.
If you do want to distribute generated bindings as part of your package, you
can of course do so, but then you are responsible for making the appropriate
provisions in your .cabal file (perhaps using SetupHooks) to check machine
architecture, choose between different sets of bindings, etc. At present
hs-bindgen does not yet provide any explicit support for making this process
easier.
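One concrete reason bindings are tied to a platform is that the sizes of basic C types differ between targets, and struct layouts follow from those sizes. The small program below, which uses only base, prints the sizes that GHC sees on the machine it runs on; cTypeSizes is a name chosen for this illustration, and the output is deliberately not shown because it depends on the platform.

```haskell
module Main (main) where

import Foreign.C.Types (CInt, CLong, CSize)
import Foreign.Ptr (Ptr)
import Foreign.Storable (sizeOf)

-- Sizes (in bytes) of a few C types on the platform running this program.
cTypeSizes :: [(String, Int)]
cTypeSizes =
  [ ("int",    sizeOf (undefined :: CInt))
  , ("long",   sizeOf (undefined :: CLong))
  , ("size_t", sizeOf (undefined :: CSize))
  , ("void*",  sizeOf (undefined :: Ptr ()))
  ]

main :: IO ()
main = mapM_ (\(name, size) -> putStrLn (name ++ ": " ++ show size)) cTypeSizes
```

Running it on two different targets (for example 64-bit Linux and 64-bit Windows, where long differs in size) shows why offsets baked into generated code cannot simply be reused elsewhere.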
Conclusions
Although Haskell provides good FFI features, writing bindings to large C
libraries by hand can be a laborious process. The bindings generated by
hs-bindgen are low-level: char* is translated to Ptr CChar, not
ByteString or Text or something else still; nonetheless, it should make the
process of writing bindings significantly easier. Automatically generating
high-level bindings is something we’ll soon turn our attention to.
Before we do so, however, there is still some
cleanup to be done
before we can release version 0.1. As it stands, the alpha release supports
nearly all of C, although some corner cases are still missing (such
as implicit fields,
#1649); see the issue
tracker for a
list of missing C features.
However, we would like to invite you to start using the alpha release now; there
should only be very minor backwards incompatible changes introduced in between
versions 0.1-alpha and 0.1, so your code should not break when upgrading to version 0.1.
Acknowledgements
Many people at Well-Typed have contributed to hs-bindgen, including Travis
Cardwell, Joris Dral, Dominik Schrempf, Armando Santos, Sam Derbyshire, and
others (as well as myself, Edsko de Vries).
Well-Typed is grateful to Anduril Industries for sponsoring this work.
Further reading
- Paper on hs-bindgen: Automatic C Bindings Generation for Haskell, published at Haskell 2025. You might also like to watch the presentation.
- The hs-bindgen manual. Note that while many sections of the manual have already been written, some are still empty or out of date; this is one of the things that will be completed prior to the official 0.1 release.
- The hs-bindgen nix tutorial. This tutorial shows how to generate partial bindings for two libraries (pcap and wlroots). Although this assumes nix, most of the information here will be relevant for non-nix users also.
- The examples folder of the hs-bindgen repo, which contains example partial bindings for a bunch of libraries, such as Botan, minisat, QR-Code-generator, rogueutil, rpm and libpcap.
[1] The C namespace is a global namespace, so when hs-bindgen introduces new C symbols, they must also be globally unique. Normally the uniqueness of the symbols introduced by hs-bindgen is derived from the uniqueness of the C symbols for which we are generating bindings, but it may happen that multiple Haskell libraries include bindings for the same C function. In such a case it is important that the --unique-id used for these different Haskell modules is different.