The Interface for Multiple Home Units

Matthew Pickering – Friday, 07 January 2022

all cabal ghc hls

Over the last few weeks I have been finishing and improving the implementation of support for Multiple Home Units. A lot of the preliminary work was completed by Fendor. In short, multiple home units allows you to load different packages which may depend on each other into one GHC session. This will allow both GHCi and HLS to support multi component projects more naturally. To get a more complete overview of the why then you should first consult Fendor’s excellent introduction.

This post will concentrate on the interface and implementation of the feature. In particular we will talk about the solution to two of the issues he summarises at the end of the post, all the flags you need to know about and other limitations of the current implementation.

I originally implemented support for multiple components in Haskell Language Server (HLS) at the start of 2020 but the implementation has always been hacky. Stack also has rudimentary support for loading multiple home units into one ghci session but doesn’t allow different options or dependencies per component. Given the increasing importance of HLS, fundamental issues which affect it are now being given more priority under the normal GHC maintenance budget. Multiple Home Units is one of the first bigger projects which we hope will allow the language server to be implemented more robustly.

Interface of Multiple Home Units

Imagine that you have a project which contains two libraries, named lib-core and lib. lib-core contains some utility functions which are used by lib, so when editing lib you are often also editing lib-core. Multiple home units can make this less painful by allowing a build tool to compile lib and lib-core with one command line invocation. How would a build tool make use of this feature?

In order to specify multiple units, the -unit @⟨filename⟩ flag is given multiple times with a response file containing the arguments for each unit. The response file contains a newline separated list of arguments.

ghc -unit @unitLibCore -unit @unitLib

where the unitLibCore response file contains the normal arguments that cabal would pass to --make mode.

-this-unit-id lib-core-0.1.0.0
-i
-isrc
LibCore.Utils
LibCore.Types

The response file for lib, can specify a dependency on lib-core, so then modules in lib can use modules from lib-core.

-this-unit-id lib-0.1.0.0
-package-id lib-core-0.1.0.0
-i
-isrc
Lib.Parse
Lib.Render

Then when the compiler starts in --make mode it will compile both units lib and lib-core.

There is also very basic support for multiple home units in GHCi: at the moment you can start a GHCi session with multiple units but only the :reload command is supported. Most commands in GHCi assume a single home unit, and so it is additional work (#20889) to modify the interface to support multiple loaded home units.

Options used when working with Multiple Home Units

There are a few extra flags which have been introduced specifically for working with multiple home units. The flags allow a home unit to pretend it’s more like an installed package, for example, specifying the package name, module visibility and reexported modules.

-working-dir ⟨dir⟩

It is common to assume that a package is compiled in the directory where its cabal file resides. Thus, all paths used in the compiler are assumed to be relative to this directory. When there are multiple home units the compiler is often not operating in the standard directory and instead where the cabal.project file is located. In this case the -working-dir option can be passed which specifies the path from the current directory to the directory the unit assumes to be it’s root, normally the directory which contains the cabal file.

When the flag is passed, any relative paths used by the compiler are offset by the working directory. Notably this includes -i and -I⟨dir⟩ flags.

-this-package-name ⟨name⟩

This flag papers over the awkward interaction of the PackageImports and multiple home units. When using PackageImports you can specify the name of the package in an import to disambiguate between modules which appear in multiple packages with the same name.

This flag allows a home unit to be given a package name so that you can also disambiguate between multiple home units which provide modules with the same name.

This solves one problem that Fendor described in his blog post.

-hidden-module ⟨module name⟩

This flag can be supplied multiple times in order to specify which modules in a home unit should not be visible outside of the unit it belongs to.

The main use of this flag is to be able to recreate the difference between an exposed and hidden module for installed packages.

Fendor talked about the issue of module visibility in his blog post, and this flag solves the issue.

-reexported-module ⟨module name⟩

This flag can be supplied multiple times in order to specify which modules are not defined in a unit but should be reexported. The effect is that other units will see this module as if it was defined in this unit.

The use of this flag is to be able to replicate the reexported modules feature of packages with multiple home units.

Offsetting Paths in Template Haskell splices

When using Template Haskell to embed files into your program, traditionally the paths have been interpreted relative to the directory where the .cabal file resides. This causes problems for multiple home units as we are compiling many different libraries at once which have .cabal files in different directories.

For this purpose we have introduced a way to query the value of the -working-dir flag to the Template Haskell API. By using this function we can implement a makeRelativeToProject function which offsets a path which is relative to the original project root by the value of -working-dir.

import Language.Haskell.TH.Syntax ( makeRelativeToProject )

foo = $(makeRelativeToProject "./relative/path" >>= embedFile)

If you write a relative path in a Template Haskell splice you should use the makeRelativeToProject function so that your library works correctly with multiple home units.

A similar function already exists in the file-embed library. The function in template-haskell implements this function in a more robust manner by honouring the -working-dir flag rather than searching the file system.

Closure Property for Home Units

For tools or libraries using the GHC API there is one very important closure property which must be adhered to:

Any dependency which is not a home unit must not (transitively) depend on a home unit.

For example, if you have three packages p, q and r, then if p depends on q which depends on r then it is illegal to load both p and r as home units but not q, because q is a dependency of the home unit p which depends on another home unit r.

If you are using GHC by the command line then this property is checked, but if you are using the GHC API then you need to check this property yourself. If you get it wrong you will probably get some very confusing errors about overlapping instances.

Limitations of Multiple Home Units

There are a few limitations of the initial implementation which will be smoothed out on user demand.

Package thinning/renaming syntax is not supported (#20888)
More complicated reexports/renaming are not yet supported.
It’s more common to run into existing linker bugs when loading a large number of packages in a session (for example #20674, #20689)
Backpack is not yet supported when using multiple home units (#20890)
Dependency chasing can be quite slow with a large number of modules and packages (#20891).
Loading wired-in packages as home units is currently not supported (this only really affects GHC developers attempting to load template-haskell).
Barely any normal GHCi features are supported (#20889). It would be good to support enough for ghcid to work correctly.

Despite these limitations, the implementation works already for nearly all packages. It has been testing on large dependency closures, including the whole of head.hackage which is a total of 4784 modules from 452 packages.

Conclusion

With the first iteration of this implementation the necessary foundational aspects have been implemented to allow GHC API clients such as HLS to load multiple home units at once. The next steps are for the maintainers of build tools such as Cabal and Stack to modify their repl commands to support the new interface.

Well-Typed is able to work on GHC, HLS, Cabal and other core Haskell infrastructure thanks to funding from various sponsors. If your company is interested in contributing to this work, sponsoring maintenance efforts, or funding the implementation of other features, please get in touch.