Information plumbing

Duncan Coutts – Thursday, 08 May 2008


If you look at real programs you'll find there is often a lot of code spent just on information plumbing. Plumbing is code that doesn't do any "real work" but is just concerned with getting the right information from one bit of a program to another, often with some rearranging or impedance matching. It's not as cool as code with real algorithmic content. So how can we stop plumbing code taking over and obscuring the code that is doing real work?

A nice example is some code in cabal-install for installing a package. Well, I say installing a package, but really the actual installing is done by the Cabal library. This code just gathers information from various sources and uses that to decide how to configure a package. There is a little bit of real work mixed in, like unpacking tarballs, but it is mostly plumbing. For example, there are the global configure options that apply to each package, plus per-package flag assignments and dependencies. Then there are various options for controlling how we compile and run the Setup.hs scripts for each package, including what compiler to use and what version of the Cabal library to run them with. There's also an option to use some kind of sudo-style wrapper for the install phase.

The code followed a common pattern of a series of layers, each one calling the layer below. In the cabal-install example we have executeInstallPlan, which calls installConfiguredPackage, which calls installAvailablePackage, which calls installUnpackedPackage, which calls setupWrapper (which, as the name suggests, is a layer on top of something else). It's a lot of layers. Let's pick out just two examples:

installConfiguredPackage verbosity
    scriptOptions miscOptions configFlags
    (ConfiguredPackage pkg flags deps)
  = installAvailablePackage verbosity
    scriptOptions miscOptions configFlags' pkg
  where
    configFlags' = configFlags {
      configConfigurationsFlags = flags,
      configConstraints = deps
    }

installAvailablePackage verbosity
    scriptOptions miscOptions configFlags
    (AvailablePackage _ pkg LocalPackage)
  = installUnpackedPackage verbosity
      scriptOptions miscOptions configFlags
      pkg Nothing

As you can see, the problem with this style is that each layer gets cluttered with passing all the information that the lower layers need. Bundling related bits of information into tuples or records helps to some extent. In this example you can see it's been done already; scriptOptions, miscOptions and configFlags are bundles of other parameters. (The name miscOptions is a sure sign that we're bundling arbitrary things together just to try and reduce the number of parameters we're passing.)

What is worse, though, is that we end up with a tightly coupled system. Adding a new feature at the bottom layer requires changing all the layers above it to pipe that information all the way down.

So one nice trick is instead of having each layer directly call the next one down, we parametrise each layer by all the layers below it. For example:

installConfiguredPackage configFlags
    (ConfiguredPackage pkg flags deps)
    installPkg
  = installPkg configFlags' pkg
  where
    configFlags' = configFlags {
      configConfigurationsFlags = flags,
      configConstraints = deps
    }

installAvailablePackage
    (AvailablePackage _ pkg LocalPackage)
    installPkg
  = installPkg pkg Nothing

So we take the next layer as an installPkg parameter, do whatever transformation this layer is supposed to do, and then call installPkg as the next layer. All the extra parameters that this layer did not care about are gone. They can be passed directly to installPkg by the caller.
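To see the shape of such a layer with its type written out, here is a minimal self-contained sketch. Everything in it (Config, Pkg, ConfiguredPkg, configuredLayer, example) is a hypothetical stand-in made up for illustration, not the real Cabal or cabal-install types.

```haskell
-- Toy stand-ins for the real cabal-install types (all hypothetical).
data Config = Config { cfgFlags :: [String], cfgDeps :: [String] }
  deriving (Eq, Show)

newtype Pkg = Pkg String
  deriving (Eq, Show)

data ConfiguredPkg = ConfiguredPkg Pkg [String] [String]

-- A layer parametrised by the next layer down. It mentions only the
-- values it actually transforms; anything else the lower layers need
-- has already been supplied to 'next' by the caller.
-- The result type 'r' is whatever the next layer returns, so this
-- layer itself never commits to IO.
configuredLayer :: Config -> ConfiguredPkg -> (Config -> Pkg -> r) -> r
configuredLayer cfg (ConfiguredPkg pkg flags deps) next =
  next (cfg { cfgFlags = flags, cfgDeps = deps }) pkg

-- Using a trivial pure function as the "next layer".
example :: ([String], Pkg)
example =
  configuredLayer (Config [] []) (ConfiguredPkg (Pkg "foo") ["-O2"] ["base"]) $
    \cfg pkg -> (cfgFlags cfg, pkg)
```

In this sketch, example evaluates to (["-O2"], Pkg "foo"): the layer has overridden the flags and handed the package on, and nothing else had to pass through it.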

So in the end we stick all the layers together:

executeInstallPlan installPlan $ \cpkg ->
  installConfiguredPackage configFlags cpkg $
  \configFlags' apkg ->
    installAvailablePackage apkg $
      installUnpackedPackage verbosity
          scriptOptions miscOptions configFlags'

Unlike before, we can now see all the layers at once rather than just the top layer. So we can pass those extra parameters directly down to the bottom layer. As you can see from the installConfiguredPackage layer, we can modify the values on the way down but we only have to pay in plumbing for the ones we're using at each layer rather than in every layer.
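The same wiring can be tried out on a toy scale. The sketch below is only an illustration under made-up assumptions: Verbosity, Pkg, ConfiguredPkg, runPlan, withFlags, installUnpacked and installAll are all hypothetical names, and the bottom layer merely reports what it was asked to do instead of building anything.

```haskell
-- Hypothetical toy types, not the real cabal-install ones.
newtype Verbosity = Verbosity Int
newtype Pkg = Pkg String
data ConfiguredPkg = ConfiguredPkg Pkg [String]

-- Top layer: only walks the plan.
runPlan :: [ConfiguredPkg] -> (ConfiguredPkg -> IO r) -> IO [r]
runPlan plan installOne = mapM installOne plan

-- Middle layer: only cares about the flags it merges in.
withFlags :: [String] -> ConfiguredPkg -> ([String] -> Pkg -> r) -> r
withFlags globalFlags (ConfiguredPkg pkg flags) next =
  next (globalFlags ++ flags) pkg

-- Bottom layer: the only one that needs the verbosity.
-- A real bottom layer would run the build here; this one just
-- returns a description of the request.
installUnpacked :: Verbosity -> [String] -> Pkg -> IO String
installUnpacked (Verbosity v) flags (Pkg name) =
  pure (unwords (("install" : name : flags) ++ ["-v" ++ show v]))

-- All the layers are stuck together in one place, so 'verbosity'
-- goes straight to the bottom without touching the layers above.
installAll :: Verbosity -> [String] -> [ConfiguredPkg] -> IO [String]
installAll verbosity globalFlags plan =
  runPlan plan $ \cpkg ->
    withFlags globalFlags cpkg $
      installUnpacked verbosity
```

If the bottom layer later needs another option, only installUnpacked and the one call site in installAll change; runPlan and withFlags stay as they are.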

After the refactoring the code can become more general, which has advantages for testing and reuse. For example, installConfiguredPackage turns out to be pure and executeInstallPlan can work in any monad, whereas previously it was tied down to the IO monad because the bottom layer was in the IO monad. Not being in IO makes it much easier to use a testing system like QuickCheck, and being able to pass a dummy in place of the next layer makes it possible to test each layer in isolation.
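As a rough illustration of the testing point, here is a sketch of a top layer that works in any monad, run against a dummy next layer in the Identity monad. The names executePlan, dummyInstall, planResult and prop_onePerPackage are hypothetical, and executePlan is far simpler than the real executeInstallPlan.

```haskell
import Data.Functor.Identity (Identity (..))

newtype Pkg = Pkg String
  deriving (Eq, Show)

-- A toy top layer that is polymorphic in the monad: it never
-- mentions IO, it just sequences whatever the next layer does.
executePlan :: Monad m => [Pkg] -> (Pkg -> m r) -> m [r]
executePlan plan installOne = mapM installOne plan

-- A dummy next layer that does no real work.
dummyInstall :: Pkg -> Identity String
dummyInstall (Pkg name) = Identity ("would install " ++ name)

-- Running the top layer purely, with the dummy plugged in.
planResult :: [String]
planResult = runIdentity (executePlan [Pkg "a", Pkg "b"] dummyInstall)

-- A pure property about the top layer alone: every package in the
-- plan is handed to the next layer exactly once.
prop_onePerPackage :: [String] -> Bool
prop_onePerPackage names =
  length (runIdentity (executePlan (map Pkg names) dummyInstall))
    == length names
```

Here planResult is ["would install a", "would install b"], and a property of the shape of prop_onePerPackage is the kind of thing that could be handed to QuickCheck, since no IO is involved.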

So obviously the message must be that higher order plumbing is cool!

Well, perhaps not, but it can make the code a little less interconnected and easier to manage.