Using GHC low-latency garbage collection in production

Domen Kožar – Tuesday, 16 March 2021


This is a guest post by Domen Kožar.

In this post I’ll dive into how low-latency garbage collection (GC) has improved developer experience for Cachix users.

The need for low latency

Cachix serves the binary cache protocol for the Nix package manager.

Before Nix builds a package, it asks the binary cache whether it already contains the binary for that package. A typical invocation of Nix can involve hundreds or even thousands of packages that need to be checked. If a binary is present in the cache, building the package is no longer required, potentially saving a lot of time and CPU.

It is crucial for the backend of such a binary cache service to respond in a timely manner so that the optimisation of skipping builds actually pays off.
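To make the lookup concrete: as far as I understand the binary cache protocol, the "is this binary cached?" check is an HTTP request for a small `.narinfo` file named after the hash part of the store path. The sketch below only builds that URL. The helper name `narinfoUrl` and the example cache URL are made up for illustration, and it assumes the default `/nix/store/` prefix.

```haskell
import Data.List (stripPrefix)

-- Hypothetical helper (not part of Cachix or Nix): derive the narinfo URL
-- that a client would request for a given store path.
-- Assumes the default store prefix "/nix/store/" and that the hash part
-- is everything up to the first '-'.
narinfoUrl :: String -> FilePath -> Maybe String
narinfoUrl cacheUrl storePath = do
  rest <- stripPrefix "/nix/store/" storePath
  let (hashPart, remainder) = break (== '-') rest
  if null hashPart || null remainder
    then Nothing -- not of the form <hash>-<name>
    else Just (cacheUrl ++ "/" ++ hashPart ++ ".narinfo")

main :: IO ()
main =
  -- The cache URL and the store path are placeholders.
  print (narinfoUrl "https://example.cachix.org"
                    "/nix/store/0123456789abcdfghijklmnpqrsvwxyz-hello-2.10")
```

With hundreds or thousands of such requests per Nix invocation, any stall in the server delays every one of them.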

Monitoring GC pauses

The easiest way to monitor and graph how long GC pauses last is via ekg-statsd, by exposing the rts.gc.gc_wall_ms metric. For Cachix, a typical plot of this metric used to look like this:

Garbage collection pause time versus time of a typical one-hour period of the Cachix server when run under GHC’s parallel, copying garbage collector. The right pane depicts a pause-time histogram on logarithmic scale.

From the picture, we can see that under load, we experience GC pause times of up to nearly 800 ms. Having 800 ms pauses stopping the world is far from ideal (I’ve even observed some pauses that last over a second under really heavy load), since the endpoint for checking if a certain binary exists normally takes only about 2–4 ms.
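If you don’t have a StatsD pipeline at hand, a rough first impression of GC times can be had by reading the RTS statistics directly through GHC.Stats from base. This is a minimal sketch and not how Cachix collects its metrics; the helper `nsToMs` is made up for illustration, and the program has to be run with `+RTS -T` so that the RTS collects statistics at all.

```haskell
import Data.Int (Int64)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Hypothetical helper: convert a nanosecond duration, as reported by the
-- RTS, to milliseconds.
nsToMs :: Int64 -> Double
nsToMs ns = fromIntegral ns / 1e6

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS statistics are disabled; rerun with +RTS -T"
    else do
      stats <- getRTSStats
      -- Cumulative numbers since program start.
      putStrLn ("collections so far:      " ++ show (gcs stats))
      putStrLn ("total GC wall time (ms): " ++ show (nsToMs (gc_elapsed_ns stats)))
      -- Details of the most recent collection only.
      putStrLn ("last GC wall time (ms):  "
                ++ show (nsToMs (gcdetails_elapsed_ns (gc stats))))
```

In a long-running server you would sample these values periodically from a background thread rather than once; the cumulative counter only becomes a pause-time plot once you look at the differences between samples.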

Switching to the low-latency GC

If you want to try the low-latency GC in your own code, make sure to use GHC 8.10.3 or later, since that release fixes a few crashing bugs that you don’t want to encounter. The executable also has to accept RTS options on the command line, which means linking it with -rtsopts. Then, to enable the low-latency GC, append the following flags when invoking the executable:

myexecutable +RTS --nonmoving-gc

For Cachix, a typical picture of GC pauses plotted over time then looks as follows:

Garbage collection pause time versus time of another one-hour period after switching to the non-moving garbage collector.

This is comparing apples to oranges, since the load is not exactly the same between the two pictures, but you can see that the distribution is now significantly different. The vast majority of pauses are now in the range of just a few milliseconds.

Unfortunately, non-moving GC can occasionally still cause relatively long pauses in the worst case (measured at 150–200 ms). We believe that this is due to the workload spawning many threads. There is still work to be done to further reduce pause times of the low-latency collector under such circumstances.

The throughput impact of the low-latency GC hasn’t been measured for this case. The response time of a “does this binary exist?” request is still within the 2–4 ms range most of the time.

Conclusion

Monitor GC pauses to understand how they impact your application response times.
Non-moving GC has been running in production for over a month, reducing worst-case response time for a performance-sensitive endpoint without any issues.
Being able to monitor the total number of threads in the RTS would improve production insights, but that is yet to be implemented.