Edsko will be talking about the problems discussed in this blog post in his Haskell Implementors’ Workshop talk this Sunday, Aug 22. The talk will be broadcast live on YouTube.
Consider a module that contains nothing but the definition of a single large record and some type class instances:
{-# OPTIONS_GHC -fplugin=RecordDotPreprocessor #-}
module Example where
import Data.Aeson
import Generics.SOP.JSON
import Generics.SOP.TH
import GHC.TypeLits
newtype T (i :: Nat) = MkT Word
deriving (Show, Eq, ToJSON)
data R = MkR {
      f00 :: T 00
    , f01 :: T 01
      -- .. lots more ..
    , f98 :: T 98
    , f99 :: T 99
    }
  deriving (Eq, Show)

deriveGeneric ''R

instance ToJSON R where
  toJSON = gtoJSON defaultJsonOptions
As it stands—using ghc’s standard representation for records, along with the code generated by the RecordDotPreprocessor plugin and the Generic instance generated by generics-sop—this results in a core representation of a whopping 450,000 terms/types/coercions, takes 3 seconds to compile, and requires 500M of RAM to compile.1
If we change this module to
module Example where
import Data.Aeson (ToJSON(..))
import Data.Record.Generic.JSON
import Data.Record.TH
import Test.Record.Size.Infra
largeRecord defaultLazyOptions [d|
data R = MkR {
f00 :: T 00
, f01 :: T 01
-- .. lots more ..
, f98 :: T 98
, f99 :: T 99
}
deriving (Eq, Show)
|]
instance ToJSON R where
  toJSON = gtoJSON
we get a module with essentially the same functionality, but with a core size of a mere 14,000 terms/types/coercions, which compiles within 1 second and requires roughly 100M of RAM.

In this blog post we describe why this simple module generates so much code, and how the large-records library manages to reduce this by more than an order of magnitude. We wrote this library because Juspay recently engaged Well-Typed’s services, and one of their requests to us was to try and improve compilation time and compilation memory requirements for their code base. Juspay very generously allowed us to make large-records open source, and it is now available on Hackage.
Quadratic code size at every level
The reason the core
representation of our example module is so large is that
unfortunately there are many examples of ghc
and other libraries being
accidentally quadratic.
Before we look at some concrete examples, let’s first investigate where
this quadratic code size is coming from. As we will see, it arises at every
level: terms, types, type classes, and type level programming.
Warmup: terms
For our running example, we will want to have a record with lots of fields. To avoid some “accidental optimizations”, we’ll give each of those fields a different type. To make that a little bit easier, we’ll just introduce a single type that is indexed by a natural number, so that this one type definition gives us as many different types as we need:
data T (n :: Nat) = MkT Int
That out of the way, consider a record with lots of fields, such as
data R = MkR {
      f00 :: T 00
    , f01 :: T 01
    , f02 :: T 02
      -- .. lots more ..
    , f98 :: T 98
    , f99 :: T 99
    }
When we define a record, ghc
will generate field accessors for all fields
in the record. In other words, it will derive functions such as
f00 :: R -> T 0
These functions are not difficult to generate, of course. Each function is
just a simple case
statement:
f00 = \(r :: R) ->
    case r of
      MkR x00 x01 x02 x03 x04 x05 x06 x07 x08 x09
          x10 x11 x12 x13 x14 x15 x16 x17 x18 x19
          -- .. lots more ..
          x90 x91 x92 x93 x94 x95 x96 x97 x98 x99
        -> x00
Although simple, this case statement mentions a lot of unused variables
(99 of them, in fact). Moreover, each of those variables is annotated with
their type. This means that this one function is actually rather large;
ghc
reports that it contains 5 terms and 202 types. The size of this
function is clearly linear in the number n
of fields we have; moreover, ghc
will generate one function for each of those n
fields; that means that
simply declaring the record will already generate code that is O(n²)
in size.
More subtle: types
Suppose we define an applicative “zip” function for R, something like
zipMyRecordWith ::
     Applicative f
  => (forall n. T n -> T n -> f (T n))
  -> R -> R -> f R
zipMyRecordWith f r r' =
        pure MkR
    <*> f (f00 r) (f00 r')
    <*> f (f01 r) (f01 r')
    <*> f (f02 r) (f02 r')
    -- .. lots more ..
    <*> f (f98 r) (f98 r')
    <*> f (f99 r) (f99 r')
Clearly the size of this function is at least linear in the number of record
fields, that much is expected. However, -ddump-simpl
tells us that this
function contains 50,818 types! Where are all of those coming from?
Recall the type of (<*>):
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
Those type variables need to be instantiated; f is always instantiated to the f type parameter passed to zipMyRecordWith, a is the type of the next field we’re applying, but what about b? Let’s annotate zipMyRecordWith with the types of a and b in pseudo-Haskell:
zipMyRecordWith ::
     Applicative f
  => (forall n. T n -> T n -> f (T n))
  -> R -> R -> f R
zipMyRecordWith f r r' =
        pure MkR
    <*> @(T 00) @(T 01 -> T 02 -> T 03 -> .. -> T 99 -> R) f (f00 r) (f00 r')
    <*> @(T 01) @(        T 02 -> T 03 -> .. -> T 99 -> R) f (f01 r) (f01 r')
    <*> @(T 02) @(                T 03 -> .. -> T 99 -> R) f (f02 r) (f02 r')
    -- .. lots more ..
    <*> @(T 98) @(                              T 99 -> R) f (f98 r) (f98 r')
    <*> @(T 99) @(                                      R) f (f99 r) (f99 r')
The first instantiation of (<*>)
mentions the type of every single field;
the second mentions the types of all-but-one field, the next of all-but-two,
etc. This means that the size of this single function is once again O(n²)
in the number of record fields.
Type class dictionaries
Suppose we wanted to capture the concept “some constraint c applied to the types of all fields in our record”:
class ( c (T 00)
      , c (T 01)
      , c (T 02)
        -- .. lots more ..
      , c (T 98)
      , c (T 99)
      ) => Constraints_R c
This should be fine right? Right? Wrong.
When we declare a type class, we’re effectively constructing a record type with fields for each of the methods of the class; this record is known as the “dictionary” for the class. Superclass constraints translate to “subdictionaries”: when a class such as Ord has a superclass constraint on Eq, then the dictionary for Ord will have a field for an Eq dictionary.

This means that this definition of Constraints_R is actually of a very similar nature to the definition of R itself: it defines a record with 100 fields. And just like for the record, ghc will generate “field accessors” to extract the fields of this dictionary; put another way, those field accessors “prove” that if we know Constraints_R c, we also know c (T 00), c (T 01), etc.

What do those field accessors look like? You guessed it, a big pattern match; in pseudo-Haskell:
$p1Constraints_R :: Constraints_R c => c (T 0)
$p1Constraints_R = \dict ->
    case dict of
      Constraints_R d00 d01 d02 d03 d04 d05 d06 d07 d08 d09
                    d10 d11 d12 d13 d14 d15 d16 d17 d18 d19
                    -- .. lots more ..
                    d90 d91 d92 d93 d94 d95 d96 d97 d98 d99
        -> d00
Since ghc
generates a projection like this for each superclass constraint,
this once again results in code of size that is O(n²)
in the number of
record fields.
Type level induction
So far all our examples have been simple Haskell; for our next example, we’ll get a bit more advanced. Just like we can have list values, we can also have list types; for example, here is a type-level list of the indices of the T types used inside our running example record R:
type IndicesR = '[
      00, 01, 02, 03, 04, 05, 06, 07, 08, 09
    , 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
      -- .. lots more ..
    , 90, 91, 92, 93, 94, 95, 96, 97, 98, 99
    ]
There are many use cases for type level lists. For example, we can define a type NP such that NP f [x, y, .., z] is basically the same as (f x, f y, .., f z):
data NP (f :: k -> Type) (xs :: [k]) where
Nil :: NP f '[]
(:*) :: f x -> NP f xs -> NP f (x ': xs)
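To make this concrete, here is a small example value of this type, using the T type from above (a sketch; it assumes the usual right-associative fixity for (:*), as declared in generics-sop):

-- A 3-element n-ary product; assumes infixr fixity for (:*), as in generics-sop
exampleNP :: NP T '[0, 1, 2]
exampleNP = MkT 0 :* MkT 1 :* MkT 2 :* Nil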
If we have a T for every index in IndicesR, we can construct a value of our record:
npToR :: NP T IndicesR -> R
npToR (    f00 :* f01 :* f02 :* f03 :* f04 :* f05 :* f06 :* f07 :* f08 :* f09
        :* f10 :* f11 :* f12 :* f13 :* f14 :* f15 :* f16 :* f17 :* f18 :* f19
        -- .. lots more ..
        :* f90 :* f91 :* f92 :* f93 :* f94 :* f95 :* f96 :* f97 :* f98 :* f99
        :* Nil ) = MkR {..}
The compiled size of npToR
is large, but it is linear in size (total
size of 4441 terms, types and coercions for 100 fields, and a total size of
2241 for 50 fields). So far so good.
In order to get to the problem I’d like to illustrate in this section, we need one more concept. Suppose we wanted to write a function that can construct a value of NP T xs for any xs:
mkNP :: NP T xs
This should be possible, since T is just a wrapper around an Int, and so all we need to do is generate as many Ts as there are elements in the type level list xs. However, xs is a type level list, and we cannot pattern match on types in Haskell; indeed, they do not even exist at all at run-time.
Therefore we somehow need to reflect the type level list at the term level: we need a value that corresponds exactly to the type. We do this by introducing a new type, indexed by a type level list, so that given a type level list of a particular length, our new type has exactly one value. Such a type—a type with exactly one value—is known as a singleton type:
data SList :: [k] -> Type where
SNil :: SList '[]
SCons :: SList xs -> SList (x ': xs)
SList gives us a value that we can pattern match on, and when we do we discover something about the shape of the type level list xs:
mkNP' :: SList xs -> NP T xs
mkNP' SNil      = Nil
mkNP' (SCons s) = MkT 0 :* mkNP' s
The closest we can come to mkNP
is to make this singleton value implicit:
class SListI (xs :: [k]) where
  sList :: SList xs

mkNP :: SListI xs => NP T xs
mkNP = mkNP' sList
If we now try to use mkNP to construct a value of our record

r0 :: R
r0 = npToR mkNP
we will of course find that we need an instance of SListI for IndicesR. Our first instinct might be to write something like
instance SListI IndicesR where
  sList =
        SCons
      $ SCons
      $ SCons
      -- .. lots more ..
      $ SCons
      $ SCons
      $ SNil
but if we do that, we will soon discover that the compiled code is quadratic in size. We could have predicted that: it’s the same problem as in the “Types” section above, with ($) playing the role of (<*>). But even if we write it as
instance SListI IndicesR where
  sList =
      SCons (
      SCons (
      SCons (
      -- .. lots more ..
      SCons (
      SCons (
      SNil
      ))))) {- .. lots more brackets .. -}
we’re still in trouble: each of those SCons applications has two type arguments, x and xs (the type level list of the tail). So with some type annotations, this code is
instance SListI IndicesR where
  sList =
      SCons @00 @'[01, 02, 03, .., 99] (
      SCons @01 @'[    02, 03, .., 99] (
      SCons @02 @'[        03, .., 99] (
      -- .. lots more ..
      SCons @98 @'[                99] (
      SCons @99 @'[                  ] (
      SNil
      ))))) {- .. lots more brackets .. -}
So this code is again O(n²) in size (actually, the real code generated by ghc is much worse than this, due to the fact that SList is a GADT; after desugaring, the function has a total size of 15,352, and after the simplifier runs (in -O0) that expands to a whopping 46,151).
Experienced type-level Haskellers might be surprised that we’d try to write
this SListI
instance by hand. After all, the definition of a singleton type
is that it is a type with only a single value, and so we should be able to
just derive it automatically. Indeed we can:
instance SListI '[] where
= SNil
sList
instance SListI xs => SListI (x ': xs) where
= SCons sList sList
Surely we should be good now, right? These definitions are small, and don’t deal with concrete large lists, and so we avoid quadratic code size. Right? Wrong.
Although it is true that the two instances for SListI are unproblematic, the moment that we use npToR mkNP, ghc needs to prove SListI '[00, 01, .. 99]. In other words, it must generate code that produces a dictionary for SListI [00, 01, .. 99]. Since SListI for (x ': xs) has a superclass constraint SListI xs, the dictionary for SListI [00, 01, .., 99] will have a field for the dictionary for SListI [01, .., 99], all the way down to the empty type level list. This means that ghc will generate 100 dictionaries; each of those dictionaries contains an SCons application with the same type annotations as in the hand-written code above, so we still end up with code that is O(n²) in size.
Concrete examples
In the previous section we discussed the ways in which we might end up with accidentally quadratic code size. In this section we will consider some examples of code generated by specific libraries. We will start with GHC generics, which is actually a good example: it generates code of size O(n log n) rather than O(n²). After that we will discuss record-dot-preprocessor and generics-sop, both of which do generate code of O(n²) size.
GHC Generics
The goal of generic programming is to be able to write a single function that
can be applied to values of lots of different types. Generics libraries such as
GHC.Generics
and generics-sop
(discussed below) do this by translating the value to a representation type;
since every type can be translated to only a handful of different representation
types, it suffices to write a function over all of those representation types.
Here we will discuss a simplified form of GHC.Generics
that still
illustrates the same point. The generic representation of a record such as the
one above is essentially just a large nested tuple. For the GHC library itself
it does not actually matter terribly how this tuple is created; for example,
this would work:
type family GHC_Rep (a :: Type) :: Type
type instance GHC_Rep R = (T 00, (T 01, (T 02, ... (T 98, (T 99, ())))))
Although it would work, it would not be great from a code size perspective. Consider the function that would translate R to GHC_Rep R:
ghcTo :: R -> GHC_Rep R
ghcTo MkR{..} =
    (,) @(T 00) @(T 01, (T 02, ... (T 99, ()))) f00 (
    (,) @(T 01) @(       T 02, ... (T 99, ()))  f01 (
    (,) @(T 02) @(              ... (T 99, ())) f02 (
    -- .. lots more ..
    (,) @(T 99) @(()) f99 (
    () ))))
This pattern is starting to get familiar at this point; with this
representation, ghcTo
would be O(n²)
in size. Fortunately, GHC generics
avoids this problem by instead generating a balanced representation,
something like
type instance GHC_Rep R =
  ( ( ( ( ( ( T 00, ( T 01, T 02 ) )
          , ( T 03, ( T 04, T 05 ) )
          )
          -- .. lots more ..
        , ( ( T 46, T 47 )
          , ( T 48, T 49 )
          ) ) ) ) )
  , ( ( ( ( ( T 50, ( T 51, T 52 ) )
          , ( T 53, ( T 54, T 55 ) )
          )
          -- .. lots more ..
        , ( ( T 96, T 97 )
          , ( T 98, T 99 )
          ) ) ) ) ) )
With this representation, the number of branches is still the same, but in the translation function the type annotations are now halved in size at each branch, rather than reduced by 1:
ghcTo :: R -> GHC_Rep R
ghcTo MkR{..} =
  ( ( ( ( ( ( f00, ( f01, f02 ) )
          , ( f03, ( f04, f05 ) )
          )
          -- .. lots more ..
        , ( ( f96, f97 )
          , ( f98, f99 )
          ) ) ) ) ) )
As a consequence, the size of this version, and the cost of GHC generics
in general, is actually O(n log n)
in the number of record fields,
although the constant factor is reasonably high.
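For readers who want to connect this simplified presentation back to the real library: GHC.Generics builds its products with the (:*:) type, and the derived Rep instances balance that product. The following comment-only sketch is approximate and omits the metadata wrappers:

-- Approximate GHC.Generics shape for a small 4-field record (the real Rep
-- additionally wraps every field in metadata such as M1/S1):
--
--   (Rec0 (T 0) :*: Rec0 (T 1)) :*: (Rec0 (T 2) :*: Rec0 (T 3))
--
-- rather than the right-nested
--
--   Rec0 (T 0) :*: (Rec0 (T 1) :*: (Rec0 (T 2) :*: Rec0 (T 3)))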
It is worth emphasizing how much better O(n log n) is than O(n²). Here is a plot of the cost of GHC generics (in terms of AST size: terms, types and coercions) as we vary the number of record fields from 100 fields to 1000 fields:
This almost looks linear. It isn’t; the cost per field is roughly 460 terms/types/coercions when we have 100 fields, and that increases to roughly 625 when we get to 1000 fields, but the cost only goes up very slowly.
The RecordDotSyntax preprocessor
The record-dot-preprocessor is a preprocessor and GHC plugin for the RecordDotSyntax GHC proposal. The preprocessor interprets specialized “record syntax”; for example, it translates

expr{lbl1.lbl2 = val}

to

setField @"lbl1" expr $ setField @"lbl2" (getField @"lbl1" expr) val
These two functions getField and setField come from a HasField class currently provided by record-hasfield (although now that the “Add setField to HasField” proposal is accepted, this should eventually move to base):
class HasField x r a | x r -> a where
hasField :: r -> (a -> r, a)
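For intuition, getField and setField can be recovered from hasField roughly as follows; record-hasfield exports functions along these lines, but treat this as a hedged sketch rather than the library’s exact definitions:

-- Hedged sketch (needs AllowAmbiguousTypes, ScopedTypeVariables and
-- TypeApplications); the real record-hasfield definitions may differ:
getField :: forall x r a. HasField x r a => r -> a
getField = snd . hasField @x

setField :: forall x r a. HasField x r a => r -> a -> r
setField = fst . hasField @x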
When you include
{-# OPTIONS_GHC -fplugin=RecordDotPreprocessor #-}
at the top of your Haskell file, the RecordDotPreprocessor
plugin will
generate HasField
instances for you. They look innocuous enough:
instance HasField "f00" R (T 00) where
  hasField r = (\x -> r { f00 = x }, f00 r)
Unfortunately, once we get to ghc’s internal representation, this is much less innocent:
hasField_f00 :: R -> (T 0 -> R, T 0)
hasField_f00 r = (
      \new -> case r of
        MkR x00 x01 x02 x03 x04 x05 x06 x07 x08 x09
            x10 x11 x12 x13 x14 x15 x16 x17 x18 x19
            -- .. lots more ..
            x90 x91 x92 x93 x94 x95 x96 x97 x98 x99
          -> MkR new x01 x02 x03 x04 x05 x06 x07 x08 x09
                 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19
                 -- .. lots more ..
                 x90 x91 x92 x93 x94 x95 x96 x97 x98 x99
    , case r of
        MkR x00 x01 x02 x03 x04 x05 x06 x07 x08 x09
            x10 x11 x12 x13 x14 x15 x16 x17 x18 x19
            -- .. lots more ..
            x90 x91 x92 x93 x94 x95 x96 x97 x98 x99
          -> x00
    )
We saw this before when we discussed the record field accessors; the same
linear cost that a field accessor induces is induced here as well, with a larger
constant factor. As for field accessors, we need to generate a HasField
instance for every field, and hence altogether we once again have code that is
O(n²)
in size.
This really matters: a module containing just the declaration of our record has a total size of 22,065 terms/types/coercions after desugaring, and 22,277 after the simplifier (with -O0). This is already much bigger than it should be, due to the quadratic nature of the field accessors. Generating HasField instances for all fields results in a total code size of 58,665 after desugaring and 78,977 after the simplifier. And all we’ve done is define the record! (For comparison, with large-records, a module containing a single record with 100 fields has a mere total size of 8,305 after desugaring, expanding to 13,958 after the simplifier, and that is including support for generics.)
SOP generics
The generics-sop
library is similar in nature to GHC generics,
but it uses a different generic representation. It is described in detail
in the paper True Sums of Products; here we give a simplified
presentation.
In fact, we have already seen most ingredients. The generics-sop
representation for a record is essentially the NP
type that we discussed in
“Type level induction”, above. In that section we saw that the function npToR
that translates from the generic representation to R
is linear in size.
Unfortunately, the same is not true for the inverse function:
npFromR :: R -> NP T IndicesR
npFromR MkR{..} = (
       f00 :* f01 :* f02 :* f03 :* f04 :* f05 :* f06 :* f07 :* f08 :* f09
    :* f10 :* f11 :* f12 :* f13 :* f14 :* f15 :* f16 :* f17 :* f18 :* f19
    -- .. lots more ..
    :* f90 :* f91 :* f92 :* f93 :* f94 :* f95 :* f96 :* f97 :* f98 :* f99
    :* Nil
    )
That wildcard pattern match MkR{..}
will expand to a pattern match for a
variable for every field, but that doesn’t matter here: we only generate
one translation function, and it’s fine if that is linear in size.
The problem however is in the body of this function. After the previous
examples, perhaps you can spot the problem already: (:*)
has a bunch of
type arguments, and one of those is the list of indices at the tail; so
this code looks something like
npFromR MkR{..} =
    (:*) @00 @'[1, 2, .., 98, 99] f00 (
    (:*) @01 @'[   2, .., 98, 99] f01 (
    (:*) @02 @'[      .., 98, 99] f02 (
    -- .. lots more ..
    (:*) @98 @'[              99] f98 (
    (:*) @99 @'[                ] f99 (
    Nil )))))
a depressingly familiar sight at this point (and again, the real code is
worse, due to the fact that NP
is a GADT). This matters: the size of npToR
is 4,441 terms/types/coercions, the size of npFromR
is 46,459.
The generics-sop library suffers from quadratic code size in other places as well. It makes heavy use of type-level lists, in a similar style to SListI above, and with the same kinds of problems. It also represents metadata at the type level, rather than just at the term level, which means yet more type-level lists. These are not fundamental problems; generics-sop simply wasn’t designed with code size in mind. As we saw in the section on GHC generics, these costs can probably be brought down to O(n log n), though this will require careful thought.
The large-records library
When you use the large-records
library and define
largeRecord defaultLazyOptions [d|
data R = MkR {
f00 :: T 00
, f01 :: T 01
, f02 :: T 02
-- .. lots more ..
, f98 :: T 98
, f99 :: T 99
} |]
you get the definition of a type R with field accessors, HasField instances for every field, and a Generic instance (albeit for a custom generics library), but the code will be entirely linear—O(n)—in the size of the record.

In this section we will see how large-records achieves this for the basic definitions; we will discuss generics separately in the next section.
Representation
As we saw, the moment that we declare a record, ghc
will generate field
accessors for each field of the record, resulting in code that is O(n²)
in size. It follows that we cannot use the normal representation of the record.
Instead, large-records
generates the following:
newtype R = LR__MkR { vectorFromR :: Vector Any }
That is, we are representing the record basically by an untyped vector which
will have an entry for every field in the record.2 Of course,
typically users will never deal with this untyped representation directly,
but use the field accessors or HasField
instances, which we will discuss next.
Field accessors
Along with the definition of R, large-records generates an unsafe function that can return any element of the vector at any type:
unsafeGetIndexR :: Int -> R -> a
unsafeGetIndexR n t = noInlineUnsafeCo (vectorFromR t ! n)
where noInlineUnsafeCo is a non-inlinable form of unsafeCoerce.3
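A minimal sketch of such a function (the actual definition in large-records may differ in its details):

import Unsafe.Coerce (unsafeCoerce)

-- A non-inlinable wrapper around unsafeCoerce (sketch; see footnote 3 for
-- why inlining must be prevented):
noInlineUnsafeCo :: a -> b
noInlineUnsafeCo = unsafeCoerce
{-# NOINLINE noInlineUnsafeCo #-}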
Just like the internal representation of R, this function is not intended for normal use. Instead, it is used to define field accessors for each field. For example, here is the definition of the accessor f00:
f00 :: R -> T 0
f00 = unsafeGetIndexR 0
One of these accessors is generated for every field, but the size of each
accessor is constant (and tiny), so the generation of all accessors results
in code that is O(n)
in size.
HasField instance
The HasField
instance is very similar. Along with the unsafe accessor, we
also define an unsafe update function:
unsafeSetIndexR :: Int -> R -> a -> R
unsafeSetIndexR n r x = LR__MkR $
    unsafeUpd (vectorFromR r) [(n, noInlineUnsafeCo x)]
The HasField
instance is now easy, and once again constant in size and tiny:
instance HasField "f00" R (T 0) where
  hasField r = (unsafeSetIndexR 0 r, unsafeGetIndexR 0 r)
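As a hedged usage sketch: with these instances in place, ordinary HasField-based code works on the vector-backed R as usual. The getField and setField functions here are the ones from record-hasfield; the r.f00 and r{f00 = x} syntax would additionally require the preprocessor:

-- Sketch only: HasField-based access and update on the vector-backed record
getF00 :: R -> T 0
getF00 r = getField @"f00" r       -- roughly what r.f00 desugars to

setF00 :: T 0 -> R -> R
setF00 x r = setField @"f00" r x   -- roughly what r{f00 = x} desugars to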
Pattern synonym
By default large-records does not generate a pattern synonym for R. It can do so, if requested:
largeRecord (defaultLazyOptions {generatePatternSynonym = True}) [d|
    data R = MkR {
          f00 :: T 00
        , f01 :: T 01
        , f02 :: T 02
          -- .. lots more ..
        , f98 :: T 98
        , f99 :: T 99
        }
  |]
With the generatePatternSynonym
option, large-records
generates two
new definitions. First, a function that constructs a tuple containing all
fields of the record:
tupleFromR :: R
           -> ( (T 00, T 01, T 02, {- .. lots more .. -}, T 60, T 61)
              , (T 62, T 63, T 64, {- .. lots more .. -}, T 98, T 99)
              )
tupleFromR r = (
      ( unsafeGetIndexR 00 r
      , unsafeGetIndexR 01 r
      , unsafeGetIndexR 02 r
        -- .. lots more ..
      , unsafeGetIndexR 60 r
      , unsafeGetIndexR 61 r
      )
    , ( unsafeGetIndexR 62 r
      , unsafeGetIndexR 63 r
      , unsafeGetIndexR 64 r
        -- .. lots more ..
      , unsafeGetIndexR 98 r
      , unsafeGetIndexR 99 r
      )
    )
(We have to use a nested tuple here, because ghc does not support tuples with more than 62 fields.) It then uses this function as a view pattern in an explicitly bidirectional pattern synonym:
pattern MkR :: T 0 -> T 1 -> T 2 -> {- .. lots more .. -} -> T 98 -> T 99 -> R
pattern MkR{f00, f01, f02, {- .. lots more .. -}, f98, f99} <-
    (tupleFromR -> ( ( f00, f01, f02, f03, f04, f05, f06, f07, f08, f09
                     , f10, f11, f12, f13, f14, f15, f16, f17, f18, f19
                       -- .. lots more ..
                     , f60, f61
                     )
                   , ( f62, f63, f64, f65, f66, f67, f68, f69
                       -- .. lots more ..
                     , f90, f91, f92, f93, f94, f95, f96, f97, f98, f99
                     ) ))
  where
    MkR x00 x01 x02 {- .. lots more .. -} x98 x99 = RFromVector $ fromList [
          unsafeCoerce x00
        , unsafeCoerce x01
        , unsafeCoerce x02
          -- .. lots more ..
        , unsafeCoerce x98
        , unsafeCoerce x99
        ]
The pattern synonym makes it possible to pattern match on R or construct R values as if it was defined in the normal way. The reason that we don’t generate it by default is an annoying one: when we declare a record pattern synonym, ghc very “helpfully” generates field accessors, which was precisely what we were trying to avoid.

So the option is there for code bases that need it; when enabled, we reintroduce a quadratic component, albeit a reasonably small one, and the rest of the generated code is still linear. Fortunately, many code bases that deal with large records never actually construct such records directly. They are instead populated from databases or from REST requests; in such cases, the pattern synonym is not required. In addition, large-records provides a quasi-quoter that can be used in lieu of the pattern synonym so that you can write, for example,
firstTwo :: R -> (T 0, T 1)
firstTwo [lr| MkR { f00 = x0, f01 = x1 } |] = (x0, x1)
In ghc-9.2 we have the new NoFieldSelectors language pragma (ghc proposal, merge request) which might improve the situation here, but I haven’t experimented with that yet.
Generics
The large-records
library comes with its own Generic
class. Although the
class and its instances are defined rather differently, usage of the
class—that is, the way you’d write generic functions—is very similar in style
to generics-sop
. The class is defined as
class Generic a where
  -- Translation
  from :: a -> Rep I a
  to   :: Rep I a -> a

  -- Constraints
  type Constraints a :: (Type -> Constraint) -> Constraint
  dict :: Constraints a c => Proxy c -> Rep (Dict c) a

  -- Metadata
  metadata :: Proxy a -> Metadata a
We will discuss the various parts of this class separately, and finish with an example of a generic function that ties all this together.
Translation to and from the generic representation
The generic representation used by large-records
is
newtype Rep f a = Rep (Vector (f Any))
In other words, if we pick the identity functor for f, then Rep f a is just Vector Any; this means that from and to can just be coerce:4
instance Generic R where
  from = coerce
  to   = coerce
  -- .. more ..
Constraints
The Constraints
type family is meant to capture the concept of “a constraint
applied to all fields in the record”. We encountered this idea above in the
section “Type class dictionaries”, where we noticed we have to be careful.
Here is how large-records
instantiates Constraints
for our running example:
class Constraints_R c where
  dictConstraints_R :: Proxy c -> Rep (Dict c) R

instance Generic R where
  type Constraints R = Constraints_R
  dict = dictConstraints_R
  -- .. more ..
Constraints_R
does not have any superclass constraints, thus avoiding the
quadratic cost of all the dictionary projections. Instead, it is constructing a
vector of dictionaries. Essentially, we are constructing our own
representation of a dictionary, just like we constructed our own representation
for a record. Unlike the GHC representation of dictionaries, we can project from
our representation by just doing a vector index.
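A hedged sketch of what “projecting by a vector index” means here; this is not the library’s actual code, and the helper name is made up for illustration. The (!) operator is ordinary Data.Vector indexing:

import qualified Data.Vector as V

-- Sketch: pull the dictionary for the i-th field out of the vector of
-- dictionaries, at whatever field type the caller expects:
unsafeDictAt :: Int -> Rep (Dict c) a -> Dict c x
unsafeDictAt i (Rep v) = noInlineUnsafeCo (v V.! i)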
Fortunately, we can give an instance for Constraints_R:
instance ( c (T 0)
         , c (T 1)
         , c (T 2)
           -- .. lots more ..
         , c (T 98)
         , c (T 99)
         ) => Constraints_R c where
  dictConstraints_R p = Rep $ fromList [
        unsafeCoerce (dictFor p) (Proxy @(T 0))
      , unsafeCoerce (dictFor p) (Proxy @(T 1))
      , unsafeCoerce (dictFor p) (Proxy @(T 2))
        -- .. lots more ..
      , unsafeCoerce (dictFor p) (Proxy @(T 98))
      , unsafeCoerce (dictFor p) (Proxy @(T 99))
      ]
In core, this results in a single function with 100 arguments, that
constructs our custom dictionary representation. This function is reasonably
large (3523 terms/types/coercions), but not excessively so, and moreover,
it is O(n)
in the number of record fields.
Metadata
The Metadata
provided by large-records
is currently fairly minimal;
it tells us the names of the record, its constructor, and the fields, as well
as the strictness of each field:
data Metadata a = Metadata {
      recordName          :: String
    , recordConstructor   :: String
    , recordSize          :: Int
    , recordFieldMetadata :: Rep FieldMetadata a
    }

data FieldStrictness = FieldStrict | FieldLazy

data FieldMetadata x where
  FieldMetadata ::
       KnownSymbol name
    => Proxy name
    -> FieldStrictness
    -> FieldMetadata x
Metadata is generated automatically by the largeRecord
invocation.
Note that the name of each field is at the type level; this is primarily to help
with integration with GHC generics. This is
essential for the integration of large-records
with other libraries, but a detailed
discussion of this is outside the scope of this blog post.
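As a hedged illustration of how this metadata gets used, here is one way to turn the per-field metadata into a vector of field names, using the Rep.map combinator introduced in the next section. The helper name fieldNames is made up for this sketch, but it corresponds to the kind of value that the gtoJSON example below zips against the field values:

-- Sketch: extract the field names as a Rep (K String) a
-- (symbolVal comes from GHC.TypeLits)
fieldNames :: Generic a => Proxy a -> Rep (K String) a
fieldNames p = Rep.map aux (recordFieldMetadata (metadata p))
  where
    aux :: FieldMetadata x -> K String x
    aux (FieldMetadata name _strictness) = K (symbolVal name)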
Writing generic functions
An in-depth discussion on how to write generic functions in the style of
generics-sop
is beyond the scope of this blog post; interested readers may
wish to refer to the True Sums of Products paper or watch Andres
Löh’s generics-sop
tutorial at ZuriHac 2016 or read his SSGEP
summer school lecture notes. Here we will discuss one (typical)
example only.
Generic functions in generics-sop style are written using a set of combinators on the generic representation. As a simple example, we can zip two Reps:
zipWith :: (forall x. f x -> g x -> h x)
-> Rep f a -> Rep g a -> Rep h a
It is worth spelling out what this function is doing: it is zipping together all the fields of two records, albeit in their generic representation (Rep). We therefore require that the function we’re zipping with is polymorphic: we must be able to zip fields no matter their type.
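As an illustrative (if not terribly useful) sketch, here is a complete zipWith-based generic function. It assumes the Rep combinators live in a module imported qualified as Rep, as in the gtoJSON example below, and that I is the identity functor with constructor I:

-- Sketch: combine two records of the same type field by field, keeping the
-- first record's value everywhere; it only serves to show the shape of the code
keepFirst :: Generic a => a -> a -> a
keepFirst x y = to (Rep.zipWith (\(I v) (I _) -> I v) (from x) (from y))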
As another example, we can map a function over all fields in the record; there is a function map, as you’d expect:
map :: (forall x. f x -> g x) -> Rep f a -> Rep g a
but there is also a slightly more useful version of this:
cmap :: (Generic a, Constraints a c)
=> Proxy c
-> (forall x. c x => f x -> g x)
-> Rep f a -> Rep g a
This constrained map still requires the function we’re applying to all fields to be polymorphic, but it is given the guarantee that the types it is applied to all satisfy constraint c. Of course, in order to be able to do that, we need constraint c to hold for all fields, which is why this function requires Constraints a c. There is a similar generalization of zipWith called czipWith, as well as applicative versions of both of these functions.
The final combinator we will discuss applies when we have a vector in which every element has the same type: in this case, we can collapse the vector to an ordinary list:
collapse :: Rep (K a) b -> [a]
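For reference, the functors that keep showing up in this style of code are the identity functor I and the constant functor K. A sketch of their definitions (these mirror generics-sop; the exact definitions used by large-records are assumed here), followed by a small hedged example in the same style as gtoJSON below:

newtype I a   = I { unI :: a }   -- a "plain" field of type a
newtype K a b = K { unK :: a }   -- a value of type a, ignoring the index b

-- Hedged example: render every field of a record as a String, given Show
-- instances for all fields
gshowFields :: forall a. (Generic a, Constraints a Show) => a -> [String]
gshowFields =
      Rep.collapse
    . Rep.cmap (Proxy @Show) (K . show . unI)
    . from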
Here is an example of a generic function that can translate any value to
JSON, provided it has a Generic
instance and provided we have ToJSON
instances for all fields:
gtoJSON :: forall a. (Generic a, Constraints a ToJSON) => a -> Value
gtoJSON =
    Aeson.object
  . Rep.collapse
. Rep.zipWith (mapKKK $ \n x -> (Text.pack n, x)) recordFieldNames
. Rep.cmap (Proxy @ToJSON) (K . toJSON . unI)
. from
where
Metadata{..} = metadata (Proxy @a)
If you are new to generics-sop style programming, this function may require careful reading, but the essence of it is simple:

- Translate the value to its generic representation using from
- Apply toJSON to all fields
- Zip it with the metadata to get a vector of (name, value) pairs
- Collapse that vector to a list
- Use Aeson’s standard object function to create the JSON object.
Once you are used to the paradigm, generics-sop style programming allows for the very succinct definition of generic functions (I’d go so far as calling it “elegant”, but then I am biased). Typically a function like gtoJSON would be used to derive a ToJSON instance:

instance ToJSON R where
  toJSON = gtoJSON
Indeed, the large-records
library can generate such instances automatically
for a handful of classes.
Transforms
The only aspect of generic programming in large-records that we have not covered here is what is known as “transforms” in generics-sop style programming. Transforms allow you to generically modify type indices; for example, suppose we have a HKD-style definition of a table, as used in the beam database library:
data Table f = MkTable {
      tableField1 :: f Int
    , tableField2 :: f Bool
    }
We could write a generic function that could, say, turn a Table Identity into a Table Maybe. Doing this without using type-level lists anywhere is surprisingly difficult, and a discussion of how this is done in large-records is probably a good topic for a separate blog post. For now we just refer any interested readers to the examples in the test suite.
Talking of beam, large-records integrates smoothly with beam-core by means of an auxiliary library beam-large-records. This library is on GitHub but is not available on Hackage yet because it depends on a small patch to beam. Hopefully it will be released soon.
Benchmarks
For the purpose of these benchmarks, we are interested in compile-time, not
run-time performance. I have not yet done any benchmarks of run-time performance.
There might be some run-time benefits compared to other generics approaches
because there is no conversion to and from a generic representation, but there
are probably some performance penalties for normal functions because the
record representation used internally in large-records
is much less
transparent to the GHC optimizer. Measuring this is future work.
In terms of compile time performance, however, the results are pretty clear. Here is a graph of the size of the core AST of the module (the sum of the terms, types and coercions in the module) plotted against the number of record fields, after desugaring:
As you can see, for vanilla ghc the size grows quadratically to nearly 450,000 nodes, but stays nice and flat for large-records. The GHC simplifier reduces the size of the AST a bit in the case of vanilla ghc and increases it a bit for large-records, but large-records is still the clear winner:
In terms of “real” performance measurements, here is a plot of ghc memory usage as reported by the OS (in kB), against the number of record fields:
As you can see, ghc climbs to roughly 500 MB for 100 fields, whereas large-records stays near the 100 MB mark. There is a similar graph for compilation time (in terms of elapsed wall-clock time in seconds):
For vanilla ghc compilation takes roughly 3 seconds for 100 fields, whereas large-records is 3x faster and stays under one second.
(These numbers have been normalized against compiling an empty module. The error bars here indicate standard error; each module has been compiled 100 times.)
Conclusions
The large-records library generates code that is strictly O(n) in size in the number of record fields. It does so at the cost of generating basically untyped core. This does not matter for code using large-records, which is still presented with a perfectly safe, typed interface. It might however reduce the applicability of the techniques employed by large-records more generally within ghc: the History of Haskell paper (Section 9.1) emphasizes the importance of having a typed internal ghc language to catch and help debug compiler bugs.
Perhaps aiming for linear cost however is not necessary. We saw in the section on GHC generics that if we are careful how we set things up, we can generate code that is O(n log n) in size. Similar techniques can be applied in libraries such as generics-sop as well, provided that we are careful at every step; for example, the use of type-level lists should be avoided and we should be using type-level (balanced) trees instead, as sketched below.
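A hedged sketch of the kind of encoding that last suggestion would involve; all names here are made up for illustration:

-- Index the n-ary product by a type-level (balanced) tree instead of a list,
-- so that the type annotations halve at every branch instead of shrinking by
-- one element at a time (Type comes from Data.Kind; DataKinds/GADTs assumed):
data Tree a = Leaf a | Branch (Tree a) (Tree a)

data NPTree (f :: k -> Type) (t :: Tree k) where
  NPLeaf   :: f x -> NPTree f ('Leaf x)
  NPBranch :: NPTree f l -> NPTree f r -> NPTree f ('Branch l r)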
In order to tackle the problem at a more fundamental level, we might need to be able to introduce and control sharing at the type level; Richard Eisenberg opened ghc ticket #20264 after my HIW talk on large-records with some thoughts on the topic. As a completely alternative approach, the work on Scrap your type applications removes type arguments altogether. This may well address many of the concerns in this blog post, but it’s such a radically different internal language that it’s hard to foresee quite all the consequences.
Addressing the overabundance of types is however only one aspect of the problem, albeit an important one. Specifically, it does not prevent record pattern matches and record updates (either on term level records or on type class dictionaries) in core from always being linear in the size of the record. Perhaps an extension to core could alleviate this problem, though changes to core affect the entire compiler backend and so are rightly treated with extreme prejudice.
When such problems are addressed in ghc, a library like large-records may no longer be required. Until such time, however, the large-records library can be used to significantly reduce compile time and memory requirements for code bases that contain many large records.
Compilation time and memory usage normalised to an empty module.↩︎
We will not discuss it in this blog post, but the library currently has support for records where all fields are strict and records where all fields are lazy. It would not be too difficult to extend it to records where some fields are lazy and some are strict. It does not currently support UNPACK pragmas; adding support for that would be more difficult, as we would need to add an unboxed vector for each type of unpacked field.↩︎
This is to avoid a segfault with ghc prior to version 9.0. See GHC ticket #16893.↩︎
The actual code generated uses an indirection through another function to deal with strictness: if the record has strict fields, then those fields are forced in the to translation.↩︎