Suppose you are writing a compiler for some programming language or DSL. If you are doing source to source transformations in your compiler, perhaps as part of an optimization pass, you will need to construct and deconstruct bits of abstract syntax. It would be very convenient if we could write that abstract syntax using the syntax of your language. In this blog post we show how you can reuse your existing compiler infrastructure to make this possible by writing a quasi-quoter with support for metavariables. As we will see, a key insight is that we can reuse object variables as meta variables.
Toy Language “Imp
”
For the sake of this blog post we will be working with a toy language called
Imp
. The abstract syntax for Imp
is defined by
type VarName = String
data Expr =
Var VarName
| Add Expr Expr
| Sub Expr Expr
| Int Integer
| Read
deriving (Data, Typeable, Show, Eq)
data Cmd =
Write Expr
| Assign VarName Expr
| Decl VarName
deriving (Data, Typeable, Show)
data Prog = Prog [Cmd]
deriving (Data, Typeable, Show)
and we will assume that we have some parsec
parsers
parseExpr :: Parser Expr
parseProg :: Parser Prog
We will also make use of
topLevel :: Parser a -> Parser a
= whiteSpace *> p <* eof topLevel p
and the following useful combinator for running a parser:
parseIO :: Parser a -> String -> IO a
The details of these parsers are beyond the scope of this post. There are
plenty of parsec
tutorials online; for instance, you could start with the
parsec
chapter in Real World Haskell. Moreover, the full code
for this blog post, including a simple interpreter for the language, is
available on github if you want to play with it. Here is a simple
example of an Imp
program:
var x ;
x := read ;
write (x + x + 1)
A simple quasi-quoter
We want to be able to write something like
prog1 :: Prog
= [prog|
prog1
var x ;
x := read ;
write (x + x + 1) |]
where the intention is that the [prog| ... |]
quasi-quote will expand to
something like
= Prog [
prog1 Decl "x"
Assign "x" Read
, Write (Add (Add (Var "x") (Var "x")) (Int 1))
, ]
To achieve this, we have to write a quasi-quoter. A quasi-quoter is an instance of the following data type:
data QuasiQuoter = QuasiQuoter {
quoteExp :: String -> Q Exp
quotePat :: String -> Q Pat
, quoteType :: String -> Q Type
, quoteDec :: String -> Q [Dec]
, }
The different fields are used when using the quasi-quoter in different places in your Haskell program: at a position where we expect a (Haskell) expression, a pattern (we will see an example of that later), a type or a declaration; we will not consider the latter two at all in this blog post.
In order to make the above example (prog1
) work, we need to implement
quoteExp
but we can leave the other fields undefined:
prog :: QuasiQuoter
= QuasiQuoter {
prog = \str -> do
quoteExp <- location'
l <- runIO $ parseIO (setPosition l *> topLevel parseProg) str
c const Nothing) c
dataToExpQ (= undefined
, quotePat = undefined
, quoteType = undefined
, quoteDec }
Let’s see what’s going on here. The quasi-quoter gets as argument the string in
the quasi-quote brackets, and must return a Haskell expression in the
Template-Haskell Q
monad. This monad supports, amongst other things, getting
the current location in the Haskell file. It also supports IO.
Location
The first thing that we do is find the current location in the Haskell source
file and convert it to parsec
format:
location' :: Q SourcePos
= aux <$> location
location' where
aux :: Loc -> SourcePos
= uncurry (newPos (loc_filename loc)) (loc_start loc) aux loc
Running the parser
Once we have the location we then parse the input string to a term in our
abstract syntax (something of type Prog
). We use parsec
’s setPosition
to
tell parsec
where we are in the Haskell source file, so that if we make a
mistake such as
prog1 :: Prog
= [prog|
prog1
var x ;
x := read ;
write (x + x + ) |]
we get an error that points to the correct location in our Haskell file:
TestQQAST.hs:6:9:
Exception when trying to run compile-time code:
user error ("TestQQAST.hs" (line 9, column 20):
unexpected ")"
expecting "(", "read", identifier or integer)
Converting to Haskell abstract syntax
The parser returns something of type Prog
, but we want something of type
Exp
; Exp
is defined in Template Haskell and reifies the abstract
syntax of Haskell. For example, we would have to translate the Imp
abstract
syntax term
Var "x" :: Prog
to its reflection as a piece of abstract Haskell syntax as
AppE (ConE 'Var) (LitE (StringL "x")) :: TH.Exp
which, when spliced into the Haskell source, yields the original Prog
value.
Fortunately, we don’t have to write this translation by hand, but we can make
use of the following Template Haskell function:
dataToExpQ :: Data a
=> (forall b. Data b => b -> Maybe (Q Exp))
-> a -> Q Exp
This function can translate any term to a reified Haskell expression, as long
as the type of the term derives Data
(Data
instances can be auto-derived by
ghc
if you enable the DeriveDataTypeable
language extension). The first
argument allows you to override the behaviour of the function for specific
cases; we will see an example use case in the next section. In our quasi-quoter
so far we don’t want to override anything, so we pass a function that always
returns Nothing
.
Once we have defined this quasi-quoter we can write
prog1 :: Prog
= [prog|
prog1
var x ;
x := read ;
write (x + x + 1) |]
and ghc
will run our quasi-quoter and splice in the Haskell expression
corresponding to the abstract syntax tree of this program (provided that we
enable the QuasiQuotes
language extension).
Meta-variables
Consider this function:
prog2 :: VarName -> Integer -> Prog
= [prog|
prog2 y n
var x ;
x := read ;
write (x + y + n) |]
As mentioned, in the source code for this blog post we also have an
interpreter for the language. What happens if we try to run (prog2 "x" 1)
?
*Main> intIO $ intProg (prog2 "x" 2)
5
*** Exception: user error (Unbound variable "y")
Indeed, when we look at the syntax tree that got spliced in for prog2
we see
Prog [ Decl "x"
Assign "x" Read
, Write (Add (Add (Var "x") (Var "y")) (Var "n"))
, ]
What happened? Didn’t we pass in "x"
as the argument y
? Actually, on
second thought, this makes perfect sense: this is after all what our string
parses to. The fact that y
and n
also happen to be Haskell variables, and
happen to be in scope at the point of the quasi-quote, is really irrelevant.
But we would still like prog2
to do what we expected it to do.
Meta-variables in Template Haskell
To do that, we have to support meta variables: variables from the
“meta” language (Haskell) instead of the object language (Imp
).
Template Haskell supports this out of the box. For example, we can define
ex :: Lift a => a -> Q Exp
= [| id x |] ex x
Given any argument that supports Lift
, ex
constructs a piece of abstract
Haskell syntax which corresponds to the application of the identity function to
x
. (Don’t confuse this with anti-quotation; see Brief Intro to
Quasi-Quotation.) Lift
is a type class with a single
method
class Lift t where
lift :: t -> Q Exp
For example, here is the instance for Integer
:
instance Lift Integer where
= return (LitE (IntegerL x)) lift x
Meta-variables in quasi-quotes
Quasi-quotes don’t have automatic support for meta-variables. This makes sense: Template Haskell is for quoting Haskell so it has a specific concrete syntax to work with, where as quasi-quotes are for arbitrary custom syntaxes and so we have to decide what the syntax and behaviour of meta-variables is going to be.
For Imp
we want to translate any unbound Imp
(object-level) variable in the
quasi-quote to a reference to a Haskell
(meta-level) variable. To do that, we
will introduce a similar type class to Lift
:
class ToExpr a where
toExpr :: a -> Expr
and provide instances for variables and integers:
instance ToExpr VarName where
= Var
toExpr
instance ToExpr Integer where
= Int toExpr
We will also need to know which variables in an Imp
program are bound
and unbound; in the source code you will find a function which
returns the set of free variables in an Imp
program:
fvProg :: Prog -> Set VarName
Overriding the behaviour of dataToExpQ
In the previous section we mentioned that rather than doing the Prog -> Q Exp
transformation by hand we use the generic function dataToExpQ
to do it for
us. However, now we want to override the behaviour of this function for the
specific case of unbound Imp
variables, which we want to translate to
Haskell variables.
Recall that dataToExpQ
has type
dataToExpQ :: Data a
=> (forall b. Data b => b -> Maybe (Q Exp))
-> a -> Q Exp
This is a rank-2 type: the first argument to dataToExpQ
must itself
be polymorphic in b
: it must work on any type b
that derives Data
. So far
we have been passing in
const Nothing
which is obviously polymorphic in b
since it completely ignores its argument.
But how do we do something more interesting? Data
and its associated
combinators come from a generic programming library called Scrap Your
Boilerplate (Data.Generics
). A full discussion of SYB is beyond
the scope of this blog post; the SYB papers are a good starting
point if you would like to know more (I would recommend reading them in
chronological order, the first published paper first). For the sake of what we
are trying to do it suffices to know about the existence of the following
combinator:
extQ :: (Typeable a, Typeable b) => (a -> q) -> (b -> q) -> a -> q
Given a polymorphic query (forall a
)—in our case this is const Nothing
—extQ
allows to extend the query with a type specific case (for
a specific type b
). We will use this to give a specific case for Expr
: when
we see a free variable in an expression we translate it to an application of
toExpr
to a Haskell variable with the same name:
metaExp :: Set VarName -> Expr -> Maybe ExpQ
Var x) | x `Set.member` fvs =
metaExp fvs (Just [| toExpr $(varE (mkName x)) |]
=
metaExp _ _ Nothing
The improved quasi-quoter
With this in hand we can define our improved quasi-quoter:
prog :: QuasiQuoter
= QuasiQuoter {
prog = \str -> do
quoteExp <- location'
l <- runIO $ parseIO (setPosition l *> topLevel parseProg) str
c const Nothing `extQ` metaExp (fvProg c)) c
dataToExpQ (= undefined
, quotePat = undefined
, quoteType = undefined
, quoteDec }
Note that we are extending the query for Expr
, not for Prog
; dataToExpQ
(or, more accurately, SYB) makes sure that this extension is applied at all the
right places. Running (prog2 "x" 2)
now has the expected behaviour:
*Main> intIO $ intProg (prog2 "x" 2)
6
14
Indeed, when we have a variable in our code that is unbound both in Imp
and in Haskell, we now get a Haskell type error:
prog2 :: VarName -> Integer -> Prog
= [prog|
prog2 y n
var x ;
x := read ;
write (x + z + n) |]
gives
TestQQAST.hs:15:19: Not in scope: ‘z’
Parenthetical remark: it is a design decision whether or not we want to allow
local binding sites in a splice to “capture” meta-variables. Put
another way, when we pass in "x"
to prog2
, do we mean the x
that is bound
locally in prog2
, or do we mean a different x
? Certainly a case can be made
that we should not be able to refer to the locally bound x
at all—after
all, it’s not bound outside of the snippet! This is an orthogonal concern
however and we will not discuss it any further in this blog post.
Quasi-quoting patterns
We can also use quasi-quoting to support patterns. This enables us to write something like
optimize :: Expr -> Expr
| n == m = optimize a
optimize [expr| a + n - m |] = other optimize other
As before, the occurrence of a
in this pattern is free, and we intend it
to correspond to a Haskell variable, not an Imp
variable; the above code
should correspond to
Sub (Add a n) m) | n == m = optimize a optimize (
(note that this is comparing Expr
s for equality, hence the need for Expr
to derive Eq
). We did not mean the pattern
Sub (Add (Var "a") (Var "n")) (Var "m")) optimize (
To achieve this, we can define a quasi-quoter for Expr
that supports patterns
(as well as expressions):
expr :: QuasiQuoter
= QuasiQuoter {
expr = \str -> do
quoteExp <- location'
l <- runIO $ parseIO (setPosition l *> topLevel parseExpr) str
e const Nothing `extQ` metaExp (fvExpr e)) e
dataToExpQ (= \str -> do
, quotePat <- location'
l <- runIO $ parseIO (setPosition l *> topLevel parseExpr) str
e const Nothing `extQ` metaPat (fvExpr e)) e
dataToPatQ (= undefined
, quoteType = undefined
, quoteDec }
The implementation of quotePat
is very similar to the definition of
quoteExp
. The only difference is that we use dataToPatQ
instead of
dataToExpQ
to generate a Haskell pattern rather than a Haskell expression,
and we use metaPat
to give a type specific case which translates free
Imp
variables to Haskell pattern variables:
metaPat :: Set VarName -> Expr -> Maybe PatQ
Var x) | x `Set.member` fvs = Just (varP (mkName x))
metaPat fvs (= Nothing metaPat _ _
Note that there is no need to lift now; the Haskell variable will be bound to whatever matches in the expression.
Limitations
We might be tempted to also add support for Prog
patterns. While that is
certainly possible, it’s of limited use if we follow the same strategy that we
followed for expressions. For instance, we would not be able to write something
like
| x `Set.notMember` fvProg c = opt c opt [prog| var x ; c |]
The intention here is that we can remove unused variables. Unfortunately, this
will not work because this will cause a parse error: the syntax for Imp
does
not allow for variables for commands, and hence we also don’t allow for
meta-variables at this point. This is important to remember:
By using object-level variables as stand-ins for meta-level variables, we only allow for meta-level variables where the syntax for the object-level language allows variables.
If this is too restrictive, we need to add special support in the ADT and in the corresponding parsers for meta-variables. This is a trade-off in increased expressiveness of the quasi-quotes against additional complexity in their implementation (new parsers, new ADTs).
Conclusions
By reusing object-level variables as stand-ins for meta variables you can reuse existing parsers and ADTs to define quasi-quoters. Using the approach described in this blog we were able to add support for quasi-quoting to a real compiler for a domain specific programming language with a minimum of effort. The implementation is very similar to what we have shown above, except that we also dealt with renaming (so that meta variables cannot be captured by binding sites in the quasi quotes) and type checking (reusing the existing renamer and type checker, of course).