This is the twenty-second edition of our GHC activities report, which describes the work on GHC, Cabal and related projects that we are doing at Well-Typed. The current edition covers roughly the months of December 2023 to February 2024. You can find the previous editions collected under the ghc-activities-report tag.

Many thanks to our sponsors who make this work possible: Anduril, Hasura and Juspay. In addition, we are grateful to Mercury for funding specific work on improved performance for developer tools on large codebases, and to the Sovereign Tech Fund for funding work on Cabal.

However, we need more sponsorship to sustain the team! If your company might be able to contribute funding to sustain this work, please read about how you can help or get in touch.

Of course, Haskell tooling is a large community effort, and Well-Typed’s contributions are just a small part of this. This report does not aim to give an exhaustive picture of all GHC work that is ongoing, and there are many fantastic features currently being worked on that are omitted here simply because none of us are currently involved in them. Furthermore, the aspects we do mention are still the work of many people. In many cases, we have just been helping with the last few steps of integration. We are immensely grateful to everyone contributing to GHC!

Team

The GHC team at Well-Typed currently consists of Ben Gamari, Andreas Klebinger, Matthew Pickering, Zubin Duggal, Sam Derbyshire and Rodrigo Mesquita, with Hannes Siebenhandl joining the team in January and Finley McIlwaine moving to another client project. In addition, many others within Well-Typed are contributing to GHC more occasionally.

Releases

Zubin released GHC 9.6.4 in January and GHC 9.8.2 in February. We are now working towards the release of GHC 9.10 later in the year. Check out the GHC status page for more information on release plans.

Eras profiling

Matthew and Zubin recently implemented a new profiling mode, eras profiling, that can give insight into when particular objects are allocated. This can be a great boon in diagnosing memory leaks in long-running programs.

Check out our blog post introducing eras profiling for more information about this new feature, and an exploration of how we used this new profiling mode to diagnose a memory leak in GHCi. Matthew also used eras profiling to diagnose a space leak in GHC’s simplifier (!11914).

The combination of eras profiling and ghc-debug works particularly well for analysing memory leaks, so Zubin has been making various improvements to ghc-debug (MR 32), including improving how it handles profiled executables (MR 35, MR 36).

A new home for GHC’s internals

GHC’s base library has long served a dual purpose: on one hand it is the user-facing standard library interface, but at the same time it contains many internal details used to implement the standard library. This dual purpose has led to problems for implementors and users alike, as internal interfaces are freely interspersed with long-stable interfaces intended for general consumption. Even worse, the documentation of base often provided little guidance to users regarding which interfaces fell into which category.

Earlier this year, the Core Libraries Committee and GHC Team agreed on a path to improve this situation by splitting base into three libraries: base, ghc-internal, and ghc-experimental. Our hope is that this approach will allow us to solve several problems at once:

  • base gives users a clearly-demarcated set of stable interfaces, overseen by the Core Libraries Committee.
  • ghc-experimental gives developers of new language and library features a dedicated place to iterate on their designs, while still making those features available to users willing to accept a slightly lower degree of stability.
  • ghc-internal provides a home for internal implementation details that are not intended for consumption by users, and that may change from release to release.

Ben has been working on implementing this split by separating out definitions that belong in the ghc-internal package (!11400). This split has led to a number of improvements across the ecosystem, ranging from Haddock improvements (see Haddock issues 1629, 1630) to compiler bug-fixes (#24436) and implementation cleanups (#24472).

Exception backtraces

Ben has been working to land his long-running and long-awaited Exception Backtrace Proposal (!8869) following extensive discussions with the Core Libraries Committee. This is expected to form part of GHC 9.10 and will be a major step towards making exception diagnosis easier for users.

GHC Steering Committee and GHC2024

Adam has now taken on the role of Secretary to the GHC Steering Committee, following Joachim Breitner stepping down after many years of dedicated service in the role. His first major task as secretary has been seeking new volunteers to serve on the committee. If you would be interested, please read more and get in touch.

The committee has updated the collection of recommended language extensions by introducing GHC2024. GHC 9.10 will ship with GHC2024 available (!12084), but it is unclear when it will become the default (see ghc-proposals MR 632).
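As a rough illustration of what GHC2024 means in practice, the sketch below uses three of the extensions that GHC2024 adds on top of GHC2021: GADTs, LambdaCase and DerivingStrategies. They are enabled individually here so that the module also compiles with older compilers; with a compiler that ships GHC2024, the three pragmas could presumably be replaced by a single GHC2024 pragma. The types and function names are purely illustrative.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE LambdaCase #-}
module Main (main) where

-- | A small typed expression language, declared with GADT syntax.
-- (Illustrative type, not taken from the report.)
data Expr a where
  IntLit  :: Int -> Expr Int
  BoolLit :: Bool -> Expr Bool
  Add     :: Expr Int -> Expr Int -> Expr Int
  If      :: Expr Bool -> Expr a -> Expr a -> Expr a

-- | Evaluate an expression. The \case syntax comes from LambdaCase.
eval :: Expr a -> a
eval = \case
  IntLit n  -> n
  BoolLit b -> b
  Add x y   -> eval x + eval y
  If c t e  -> if eval c then eval t else eval e

-- | DerivingStrategies lets the deriving mechanism be named explicitly.
newtype Score = Score Int
  deriving stock (Show, Eq)

main :: IO ()
main = print (Score (eval (If (BoolLit True) (Add (IntLit 1) (IntLit 2)) (IntLit 0))))
```

Running this prints `Score 3`.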

STM correctness and performance

Andreas has been diagnosing progress and performance issues with STM prompted by a user reporting STM starvation problems (#24142). In particular:

  • STM transaction performance scales badly with the number of TVars involved (#24410), because the current implementation uses a linked list to keep track of all TVars used by a transaction. Ben explored one approach for improving this situation, using a hashmap for these lookups (!12030).

  • Transactions with a large number of TVars may perform badly (#24427) due to a check performed by the RTS each time Haskell threads return to the scheduler. This check identifies potentially non-terminating STM transactions by validating the transaction’s view of the STM memory against the memory’s current state. While very useful, this check is somewhat costly to perform, and under the current implementation can also lead to false negatives when multiple validations happen in parallel. It is likely that the best solution for this issue is to perform validations less frequently, especially on long-running transactions.

  • In pathological cases, two transactions running in parallel may be unable to make progress (#24446), even if both transactions are read-only. This should be solvable with a rework of how TVars are locked during validation.

Unfortunately, fixing these issues will require further work.
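To make the shape of the affected workloads concrete, here is a minimal sketch of a transaction that touches a large number of TVars. Every TVar read or written inside `atomically` is recorded by the transaction, so the bookkeeping grows with the number of TVars involved. The function names and the count of 2000 are illustrative assumptions, not taken from the tickets; the sketch uses the STM primitives exported by GHC.Conc in base so that it does not depend on the stm package.

```haskell
module Main (main) where

import Control.Monad (forM_, replicateM)
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar, writeTVar)

-- | Read every TVar within a single transaction. Each read is tracked by
-- the transaction, so the tracked set grows with the length of the list.
sumAll :: [TVar Int] -> STM Int
sumAll tvars = sum <$> mapM readTVar tvars

-- | Increment every TVar within a single transaction.
bumpAll :: [TVar Int] -> STM ()
bumpAll tvars = forM_ tvars $ \tv -> do
  x <- readTVar tv
  writeTVar tv (x + 1)

main :: IO ()
main = do
  -- 2000 is an arbitrary illustrative size.
  tvars <- replicateM 2000 (newTVarIO 0)
  atomically (bumpAll tvars)
  total <- atomically (sumAll tvars)
  print total
```

Where the application allows it, splitting such a transaction into several smaller ones (at the cost of losing atomicity across the whole set) is one way to keep the per-transaction bookkeeping small.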

Specialisation and late plugins

Finley has been exploring techniques to make it easier to diagnose issues with specialisation in large applications, such as poor runtime performance due to overloaded calls not being specialised. One workaround for such problems is exposing all unfoldings and using aggressive specialisation, but this tends to lead to poor compile-time performance instead.

Motivated by these investigations, he added “late plugins”, which are plugins that are run at the very end of the Core pipeline, after the addition of late cost centres (!11765). This allows plugins to analyse and modify the Core that is compiled down to STG, without the changes ending up in interface files.
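For readers less familiar with the specialisation problem mentioned above, the following minimal sketch shows an overloaded function together with the per-function pragmas that ask GHC to specialise it. The function `sumSquares` is an illustrative example, not code from the investigation. The blanket workaround described above presumably corresponds to compiling with the GHC flags -fexpose-all-unfoldings and -fspecialise-aggressively, rather than annotating functions one at a time.

```haskell
module Main (main) where

-- INLINABLE exposes the unfolding in the interface file, so that GHC can
-- specialise the function in other modules that call it at a known type.
{-# INLINABLE sumSquares #-}
-- SPECIALISE requests a dictionary-free copy at Int in this module.
{-# SPECIALISE sumSquares :: [Int] -> Int #-}

-- | An overloaded function: unless it is specialised or inlined, each call
-- passes a Num dictionary at runtime and the arithmetic goes through it.
sumSquares :: Num a => [a] -> a
sumSquares = foldr (\x acc -> x * x + acc) 0

main :: IO ()
main = print (sumSquares [1 .. 10 :: Int])
```

Running this prints 385. The pragmas do not change the result, only whether the call is compiled against a specialised copy.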

Cabal

Matthew, Rodrigo and Sam have been working to address longstanding architectural and maintenance issues in the Cabal library and the cabal-install build tool. This work is being supported by the Sovereign Tech Fund as discussed in our previous blog post.

Some of the changes have included:

  • Designing and implementing a new build-type: Hooks feature to provide a path towards deprecating build-type: Custom. Based on community feedback, Sam iterated on the design, with a particular focus on pre-build rules, arriving at a design inspired by Cloud Haskell, using static pointers. See the detailed HF Tech Proposal for an in-depth explanation of the design and its benefits. The implementation is now being prepared for review (PR 9551).

  • Disentangling implicit global state from the Cabal library, allowing it to take a working directory as an argument instead of using the working directory of the current process (PR 9718). This is intended to allow directly calling the Cabal library to build packages in a concurrent setting.

  • Working on a design and prototype implementation for private dependencies (issue 4035), allowing packages to express the fact that they do not expose any types from a dependency in their API. This gives greater flexibility to construct build plans, potentially making library version upgrades easier, and allows tests and benchmarks to compare different versions of the same library.

  • Making the testsuite more robust, including refactoring it to run tests in a separate temporary directory so they are not influenced by the external configuration of the user’s system (PR 9717).

  • Allowing per-component builds with Haskell Program Coverage (HPC) information (PR 9464).

  • Refactoring to eliminate long-standing code duplication that was a regular source of bugs in the logic for building components (PR 9602) and in glob support (PR 9673).

  • Fixing several longstanding bugs with the install command often ignoring CLI flags (PR 9697).

  • Robustly handling the same GHC version having been compiled from source multiple times (PR 9618), as the GHC version number is not enough to ensure ABI-compatibility.

  • Many more bug fixes and refactorings to improve maintainability and robustness of the codebase (e.g. PR 9524, PR 9554).

GHC bug fixes

  • Ben investigated memory-ordering issues using ThreadSanitizer and fixed numerous data races (!9372, !11795, !11768).

  • Ben fixed a thread-safety issue due to GHC’s use of the C strerror utility (#24344).

  • Sam fixed a 9.8 regression in shadowing error messages involving record fields with no field selectors (!11981).

  • Hannes fixed a 9.8 regression in how Haddock resolves qualified references (!11920).

  • Zubin fixed a regression in which GHC reported a poor error message in the presence of module cycles including hs-boot files (!11718, !11792).

  • Zubin fixed cross-module breakpoints using incorrect cost centres (!11892).

  • Sam and Andreas fixed a variety of bugs in the handling of fused-multiply-add primops that were added in GHC 9.8.1 (!11587, !11893, !11902, !11987).

  • Ben fixed a subtle bug in the implementation of unique generation on 32-bit platforms (!11802).

  • Andreas fixed a bug in the C foreign-function interface that was introduced by using sub-word-sized arguments (!11989).

  • Zubin set -DPROFILING when compiling C++ sources with profiling (!11871).

  • Matthew fixed an off-by-one error when handling info-table provenance entries (!11873).

  • Zubin fixed a bug with ghcup-metadata generation (!11791).

  • Zubin updated the users’ guide to take into account the unrestricted overloaded labels GHC proposal, which landed in GHC 9.6 (!11774).

  • Hannes fixed a bug arising from GHC being installed at a filepath that includes spaces on Windows (!11938).

Build system, CI and distribution improvements

  • Ben carried out a number of submodule bumps in preparation for the GHC 9.10 release.

  • Rodrigo allowed the configure script to use autoconf 2.72 (!11942).

  • Matthew fixed a bug in the configuration of hsc2hs when building GHC, which was the source of linker errors (#24050, !11384).

  • Matthew updated the CI images, with a particular focus on improving the testing of the LLVM backend on CI (#24369, !11976).

  • Matthew ensured that documentation is built on more configurations in CI (e.g. on alpine, rocky8, Windows, Darwin) (!12134).

  • Ben adapted GHC to LLVM’s new pass manager CLI (!8999).