The large-records library provides support for large records in Haskell with much better compilation time performance than vanilla ghc does. Well-Typed and MonadFix are happy to announce a new release of this library, which avoids all Template Haskell or quasi-quote brackets. Example:
{-# ANN type User largeRecordLazy #-}
data User = MkUser {
      name   :: String
    , active :: Bool
    }
  deriving stock (Show, Eq)

instance ToJSON User where
  toJSON = gtoJSON

john :: User
john = MkUser { name = "john", active = True }

setInactive :: User -> User
setInactive u = u{active = False}
This makes for a nicer user experience and provides better integration with tooling (for example, better syntax highlighting, auto-formatting, and auto-completion). Importantly, avoiding Template Haskell also means we avoid the unnecessary recompilations that this incurs [1], a significant benefit for a library aimed at improving compilation time.
In this blog post we will briefly discuss how this was achieved.
Avoiding quotation
Record declaration
The previous large-records version used quotation in two places. First, it was using Template Haskell quotes for record definitions, something like:
largeRecord defaultLazyOptions [d|
data User = MkUser {
name :: String
, active :: Bool
}
deriving stock (Show, Eq) |]
The new version avoids this by using a ghc source plugin instead of TH. The source plugin generates much the same code as the TH code used to do; if you’d like to see what definitions are generated, you can use
{-# ANN type User largeRecordLazy { debugLargeRecords = True } #-}
Record expressions
Record updates such as

setInactive :: User -> User
setInactive u = u{active = False}

were already supported by the old version (and are still supported by the new), since these rely only on RecordDotSyntax as provided by record-dot-preprocessor. However, record values required quasi-quotation in the previous version:

john :: User
john = [lr| MkUser { name = "john", active = True } |]
Here it was less obvious how to replace this with a source plugin, because we cannot see from the syntax whether or not MkUser is the constructor of a large record. Moreover, the old internal representation of large records (described in detail in Avoiding quadratic core code size with large records) meant that ghc was not even aware of name or active as record fields. This means that the source plugin must run before the renamer: after all, name resolution would fail for these names. This in turn essentially means that the plugin gets the syntax to work with and nothing else.
The solution is an alternative internal representation of records, after a cunning idea from Adam Gundry. For our running example, the code that is generated for User is

data User = forall n a. (n ~ String, a ~ Bool) => MkUser {
      name   :: n
    , active :: a
    }
This representation achieves two things:
- ghc won’t generate field accessors for fields with an existential type (avoiding quadratic blow-up)
- but it still works much like a normal record constructor; in particular, record values such as john work just fine.
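To make the two points above concrete, here is a minimal hand-written sketch of the same representation. It is not the plugin's actual output (which generates considerably more); the helper names userName and userActive are hypothetical and exist only for illustration.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeOperators #-}

-- Hand-written copy of the representation discussed in the text.
-- Every field has an existential type that is pinned down by an
-- equality constraint in the constructor's context.
data User = forall n a. (n ~ String, a ~ Bool) => MkUser {
      name   :: n
    , active :: a
    }

-- Ordinary record construction syntax is accepted.
john :: User
john = MkUser { name = "john", active = True }

-- No top-level selector functions `name` or `active` are available,
-- so fields are read by pattern matching. The match brings the
-- equalities (n ~ String, a ~ Bool) into scope.
userName :: User -> String    -- hypothetical helper
userName (MkUser n _) = n

userActive :: User -> Bool    -- hypothetical helper
userActive MkUser { active = a } = a
```

Loaded into GHCi, userName john evaluates to "john". In real use you would not write such helpers by hand; field access goes through the HasField instances that large-records provides.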
This representation does mean that regular record updates won’t work; something like

setInactive :: User -> User
setInactive u = u { active = False }

will result in an error

Record update for insufficiently polymorphic field

When using RecordDotSyntax, however, all is fine, which was already a requirement for using large-records anyway.
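The following sketch illustrates why a HasField-style class sidesteps the failing record update: the setter is written by pattern matching and rebuilding the constructor, so ghc's built-in record update is never used. The class FieldLens below is a local, hypothetical stand-in modelled on the getter-plus-setter shape of the HasField class used with record-dot-preprocessor; it is not the class that large-records actually uses, and the instances here are written by hand rather than generated.

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeOperators #-}

import GHC.TypeLits (Symbol)

data User = forall n a. (n ~ String, a ~ Bool) => MkUser {
      name   :: n
    , active :: a
    }

-- Hypothetical class for this sketch: given a record, return a setter
-- and the current value of field x.
class FieldLens (x :: Symbol) r a | x r -> a where
  fieldLens :: r -> (a -> r, a)

-- Hand-written instances; each setter rebuilds the whole constructor.
instance FieldLens "name" User String where
  fieldLens (MkUser n a) = (\n' -> MkUser n' a, n)

instance FieldLens "active" User Bool where
  fieldLens (MkUser n a) = (\a' -> MkUser n a', a)

getField :: forall x r a. FieldLens x r a => r -> a
getField = snd . fieldLens @x

setField :: forall x r a. FieldLens x r a => r -> a -> r
setField = fst . fieldLens @x

john :: User
john = MkUser { name = "john", active = True }

-- Roughly what u{active = False} amounts to once the update is
-- expressed through the class instead of a built-in record update.
setInactive :: User -> User
setInactive u = setField @"active" u False
```

With these definitions, getField @"active" (setInactive john) evaluates to False while the name field is left unchanged.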
Performance
The main benchmark for large-records is a module containing a record declaration with n fields with Eq, Show, Generic and HasField instances, and a ToJSON instance defined using a generic function. See the Benchmarks section of the first blog post on large-records for additional information.
The code generated by the new source plugin is very similar to the code that was previously generated by TH. Critically, it is still linear in the size of the record (unlike standard ghc, which is quadratic); see the full report on the (compile time) performance of large-records for details. We therefore don’t expect any super-linear improvements in compilation time; indeed, improvement of compilation time was not the point of this refactoring (other than avoiding unnecessary recompilations due to TH). It is nonetheless nice to see that the plugin is roughly 25% faster than TH.
Although we didn’t measure it, avoiding quasi-quotation for record values should also help improve compilation time further, depending on how common these are in any particular codebase.
Conclusions
The large-records library is part of our work on improving compilation time on behalf of Juspay. We have written extensively about these compilation time problems before (see blog posts tagged with compile-time-performance), and have also given various presentations on this topic (HIW 2021, HaskellX 2021). This new release of large-records is not fundamentally different from the previous one. It still offers the same features:
- linear-size ghc code and therefore much better compilation time performance
- stock derivation support (Show, Eq, Ord)
- Generics support (through large-generics style generics, similar in style to generics-sop)
- HasField support for integration with record-dot-preprocessor
However, the fact that Template Haskell quotes and quasi-quotation are no longer required in the new version should make for a much better user experience, as well as further speed up compilation of projects with deep module hierarchies.
[1] Suppose module B imports module A. If B uses Template Haskell splices, it will be recompiled whenever A changes, whether or not the change to A is relevant. Specifically, even with optimizations disabled, a change to the implementation of a function f in A will trigger a recompilation of B. The reason is that B might execute f in the splice, and ghc makes no attempt at all to figure out what the splice may or may not execute. We have recently improved this in GHC HEAD; a blog post on that change is coming soon.