Anonymous or large records with OverloadedRecordDot and OverloadedRecordUpdate

Edsko de Vries – Tuesday, 07 March 2023

GHC 9.2 introduces two new language extensions called OverloadedRecordDot and OverloadedRecordUpdate, which overload record field access (myRecord.x.y) and record field update (myRecord{x.y = a}), respectively. In this short blog post we will discuss these two extensions in the context of two libraries, large-anon and large-records. Of course, OverloadedRecordDot and OverloadedRecordUpdate can be useful in many other contexts too; the principles we explain in this blog post should apply universally.

We will start by discussing large-anon, which is a library that provides support for scalable anonymous records (records that do not have to be defined before they can be used), along with a rich API for manipulating such records. We will see how we can take advantage of OverloadedRecordDot and OverloadedRecordUpdate to access and update fields of such records.

We then briefly discuss large-records, which provides “regular” records but with a linear compilation time, as opposed to the quadratic compilation time for standard records; this can really matter for large records (records with more than say 20 or 30 fields). The usage of OverloadedRecordDot and OverloadedRecordUpdate is pretty similar to their usage in large-anon, but as we will see, there is a small technical complication.

`OverloadedRecordDot`

When the OverloadedRecordDot language extension is enabled, then r.b for some record r and field b is interpreted as getField @"b" r, where getField is a method of the HasField class defined in GHC.Records:

class HasField x r a | x r -> a where
  getField :: r -> a

For example, the following code¹, which uses the large-anon library for anonymous records, will print True:

{-# LANGUAGE OverloadedRecordDot #-}
{-# OPTIONS_GHC -fplugin=Data.Record.Anon.Plugin #-}

import Data.Record.Anon
import Data.Record.Anon.Simple qualified as S

test :: IO ()
test = print r.b
  where
    r :: S.Record [ "a" := Int, "b" := Bool ]
    r = ANON { a = 5, b = True }

The syntax also extends to nested fields; for example this code will print Just 'x':

import Data.Record.Anon.Advanced qualified as A

test :: IO ()
test = print r.b.d
  where
    r :: S.Record [ "a" := Int, "b" := A.Record Maybe [ "c" := Bool, "d" := Char ] ]
    r = ANON { a = 5, b = ANON_F { c = Nothing, d = Just 'x' } }

(The difference between “simple” anonymous records and “advanced” anonymous records is that the latter support an additional functor argument; S.Record ["a" := Int, "b" := Bool ] is a record with one field or type Int and one field of type Bool, whereas A.Record Maybe [ "c" := Bool, "d" := Char ] is a record with one field of type Maybe Bool and one field of type Maybe Char. For an in-depth introduction, see large-anon: Practical scalable anonymous records for Haskell.)

`OverloadedRecordUpdate`

Unfortunately, the situation with OverloadedRecordUpdate is not as clean as with OverloadedRecordDot. The current situation is that the syntax r{a = 6} translates to setField @"a" r 6, without specifying what setField is, precisely. The design and implementation of the corresponding class is not yet finalized; the GHC wiki entry on overloaded record fields contains a bunch of proposals.

In an attempt to guarantee forwards compatibility, therefore, OverloadedRecordUpdate currently requires RebindableSyntax. This language extension changes GHC so certain pieces of syntax no longer refer to their standard definition, but rather to whatever definition is in scope; for example, it makes it possible to change what OverloadedStrings does (by using a different definition of fromString) or to change how do notation is desugared (by using a different definition of >>=). For records, it will use whatever definition of getField is in scope instead of using the HasField class; and it will also whatever definition of setField is in scope, except that in this case there is no standard definition yet.

The large-anon library offers a module Data.Record.Anon.Overloading which can be used in combination with RebindableSyntax: it restores all definitions to their standard definition again, and it defines a setField function which depends on the HasField class from the record-hasfield package:

class HasField x r a | x r -> a where
  hasField :: r -> (a -> r, a)

For example, the following code prints ANON { a = 6, b = True }:

{-# LANGUAGE OverloadedRecordUpdate #-}
{-# LANGUAGE RebindableSyntax       #-}

import Data.Record.Anon.Overloading

test :: IO ()
test = print r { a = 6 }
  where
    r :: S.Record [ "a" := Int, "b" := Bool ]
    r = ANON { a = 5, b = True }

Like record field access, OverloadedRecordUpdate also works for nested fields (in this case OverloadedRecordDot is also required):

test :: IO ()
test = print r { b.d = Just 'a' }
  where
    r :: S.Record [ "a" := Int, "b" := A.Record Maybe [ "c" := Bool, "d" := Char ] ]
    r = ANON { a = 5, b = ANON_F { c = Just True, d = Nothing  } }

`large-records`

The above examples all used the large-anon library for anoynmous records; the situation with the large-records is similar, with one somewhat annoying technical difference. The large-records library provides records whose compilation time scales linearly with the size of the record, as opposed to the quadratic compilation time of standard records. As discussed in Avoiding quadratic core code size with large records, it needs to jump through quite a few hoops to achieve this. When a module contains a record declaration such as

{-# OPTIONS_GHC -fplugin=Data.Record.Plugin #-}

{-# ANN type Person largeRecord #-}
data Person = Person { name :: String }

then large-records turns it into something like this:

data Person = forall a. a ~ String => Person { name :: a }

This unusual representation achieves two goals: the strange existential type parameter prevent GHC from generating field accessors (one of the many sources of quadratic code), but we can still construct and pattern match on these records in a normal way.

As discussed in the GHC manual, the existential type parameter will also prevent GHC from automatically derive a HasField instance for fields such as name. This wouldn’t be a problem (large-records could generate the instance instead), if it wasn’t for this proviso in the manual:

If a field has a higher-rank or existential type, the corresponding HasField constraint will not be solved automatically (..), but in the interests of simplicity we do not permit users to define their own instances either.

There is a GHC proposal to relax this condition; until that proposal is implemented, however, the use of OverloadedDot with large-records will depend on RebindableSyntax (which then bypasses the use of the standard HasField class entirely). Of course, as we saw in the previous section, OverloadedRecordUpdate depends on RebindableSyntax anyway, so this isn’t as much of a limitation as it might seem.

Conclusions

Record dot syntax has been available since GHC 8.6 through the use of the record-dot-preprocessor. From 9.2 onwards this syntax is now available without a preprocessor or plugin; for record field access this is achieved through the use of the HasField class from GHC.Records. For record field update the design is not yet finalized, and we instead rely on RebindableSyntax.

As of their latest releases, the large-anon and large-records libraries, as well as the supporting libraries large-generics and typelet, are now compatible with GHC 9.2 and 9.4 and make use of these extensions. It is also still possible to use record-dot-preprocessor (you will need at least version 0.2.16 for GHC 9.4). The only complication is that GHC 9.4 changed the order in which plugins are loaded. This matters because the large-records plugin must be loaded before RecordDotPreprocessor; to avoid users having to use CPP to distinguish between GHC versions, large-records now offers Data.Record.Plugin.WithRDP which combines both plugins (for large-anon the order does not matter).

Here and elsewhere we will only show the most important language pragmas and includes.↩︎