The Plutus Compilation Pipeline: Understanding Plutus Core(s)

Edsko de Vries – Wednesday, 03 August 2022

Plutus is a strict, pure functional language. It is developed by IOHK for use on the Cardano blockchain, but in this blog post we will not be concerned with specific applications of the language, but instead look at its compilation pipeline.

Technically speaking, Plutus is not one language, but three, and most people who write “Plutus” are not really writing Plutus at all, but Haskell. These Haskell programs are translated to the Plutus Intermediate Representation (PIR). After that, data types are replaced by their Scott encodings, and recursion is replaced by explicit fixpoints to get to typed Plutus Core (PLC). Finally, types are erased to get to the Untyped Plutus Core (UPLC).

In this blog post, we will explain this entire process, with a particular focus on

The consequences of polymorphism in the typed language on the untyped language
Scott encoding
Type and term level fixpoints

We won’t try to explain the default syntax of Plutus Core in this blog post, because it is very verbose; the syntax we used here is generated by a custom pretty-printer we wrote. If you want to learn more about the default syntax, or need a more gentle introduction to lambda calculus, you might want to check out An Introduction to Plutus Core.

Polymorphism

Throughout this blog post we will assume the reader is familiar with Haskell, and we will use Haskell in sections marked In Haskell as a vehicle for explaining some of the concepts needed to understand Plutus code, before showing those concepts in Plutus itself.

In Haskell. Force and Delay

Suppose we need to work with an API that is stricter than we’d like it to be. To have a concrete but simple (if contrived) example, let’s say we want to use this strict implementation of if-then-else:

strictIfThenElse :: Bool -> a -> a -> a
strictIfThenElse !b !t !f = if b then t else f

If we call this function with two arguments t and f, it will evaluate both arguments before even looking at b, even though only one will be required.

> strictIfThenElse True 'a' undefined
*** Exception: Prelude.undefined

If we want to avoid this problem, we somehow need to delay evaluation of the arguments. We could do this by giving them a dummy argument of unit type:

type Delayed a = () -> a

delay :: a -> Delayed a
delay x = const x

Going the other direction, if we have a delayed argument, in order to evaluate it be must force it:

force :: Delayed a -> a
force x = x ()

Using delay and force we can get the behaviour we’d expect:

> force $ strictIfThenElse True (delay 'a') (delay undefined)
'a'

Simple example: function application

With these first preliminaries out of the way, let’s start taking a look at Plutus now. The compilation pipeline is as follows:

Plutus programs are written using Haskell as a surface language.
The Haskell code is translated to the Plutus Intermediate Representation (PIR).
Through a series of transformations the compiler turns PIR into (typed) Plutus Core (PLC). The main difference between PIR and PLC is that the former has support for datatypes and recursion, and the latter does not.
Finally, PLC is turned into Untyped Plutus Core (UPLC).

We will see lots of examples of each of these representations in this blog post, but let’s start with something very simple. Consider the following Haskell code:

apMono :: (Integer -> Integer) -> Integer -> Integer
apMono f x = f x

For these first few simple examples, the PIR and PLC representations are identical:

λ (f :: Integer -> Integer) (x :: Integer) -> f x

PIR and PLC are explicitly typed languages, so every variable binder is given a type annotation. These types are then removed in the translation to UPLC:

λ f x -> f x

No surprises so far, so let’s consider something a bit more interesting:

apId :: (forall a. a -> a) -> Integer -> Integer
apId f x = f x

Same code as above, but with a different type: we now insist that the first argument is polymorphic in a. The only (total) function of type forall a. a -> a is the identity, so apId isn’t particularly useful, but it will serve to illustrate a point. The PIR/PLC translation of apId is:

λ (f :: ∀ a. a -> a) (x :: Integer) -> f {Integer} x

Polymorphic functions in Haskell become functions that take type arguments in PIR; in the body we see f being applied to the type argument Integer and then to the regular argument x. (We will use curly brackets for type arguments and type application.) Still nothing unusual, but in the translation to UPLC we see the first Plutus specific feature:

λ f x -> f! x

That exclamation mark is the force operator we discussed above. UPLC has explicit constructs for delay and force, which we will write as λ ~ -> e and e!, respectively:

λ ~ -> e  ==  delay e
e!        ==  force e

The question is: why is f being forced here?

Polymorphism versus strictness

To illustrate why f was being forced in that last example, let’s consider a slightly more interesting PIR program¹

let* !boom = λ {a} -> ERROR
     !id   = λ {a} (x :: a) -> x
in λ (b :: Bool) -> ifThenElse {∀ a. a -> a} b boom id

This defines a function that takes an argument b of type Bool. When the caller passes True, the function evaluates to ERROR (aka undefined in Haskell); otherwise it evaluates to the identity function.

You might expect the untyped version of this program to look something like this:

λ b -> ifThenElse b ERROR (λ x -> x)

However, as we mentioned above, Plutus Core is a strict language: function arguments are evaluated before the function is called. This means that this translation is not correct: the call to ERROR would be evaluated and the program would crash, independent of the value of b.

In the original program this is not the case: the argument passed to ifThenElse is (λ {a} -> ERROR), which is a function. Evaluating a function does nothing, so the original PIR program does have the behaviour we need.²

This means that when we erase types to get from PLC to UPLC, we have to make sure that we don’t make programs stricter than they were. The Plutus compiler does this by replacing type arguments by delays. The UPLC translation of our example is:

λ b -> ifThenElse# ! b (λ ~ -> ERROR) (λ ~ y -> y)

Using the definitions from the section on Force and Delay, we could write this UPLC code in (Pseudo-)Haskell as

maybeExplode :: Bool -> Delayed (∀ a. a -> a)
maybeExplode b = strictIfThenElse b (delay undefined) (delay id)

This is nearly identical to the strictIfThenElse example we saw before, except that there is no need to force the final result: we want to return a delayed function, because we are returning a polymorphic function.

Finally, this also explains why we need to force the call to the built-in function ifThenElse#: it too is a polymorphic function, and hence it too is delayed.

Datatypes

In Haskell. Scott encoding

Consider pattern matching on a value of type Maybe a: we will need two cases, one case for Nothing and one case for Just (x :: a). We can capture this notion of “pattern matching on Maybe” as a function:

toSMaybe :: ∀ a. Maybe a -> (∀ r. r -> (a -> r) -> r)
toSMaybe x = λ n j -> maybe n j x

As is clear from the definition, this is just the well-known function maybe from the standard library with the arguments in a different order (the reason for the strange name toSMaybe will become apparent soon). We can also go the other way: from the pattern matching function to the value of type Maybe a:

fromSMaybe :: (∀ r. r -> (a -> r) -> r) -> Maybe a
fromSMaybe x = x Nothing Just

In fact, it’s not difficult to see that these two functions are mutually inverse. This means that the two representations Maybe a and (∀ r. r -> (a -> r) -> r) are isomorphic, and we could define Maybe to be the type of pattern matching functions. This is known as the Scott encoding of Maybe:

type SMaybe a = forall r. r -> (a -> r) -> r

As some further examples of Scott encoding, we will consider the case for booleans, pairs, and unit (we will see the Plutus equivalent of all of these later). In the case of Bool, pattern matching will need a case for True and a case for False; this means that the translation to the Scott encoding turns out to be if-then-else:

type SBool = forall r. r -> r -> r

toSBool :: Bool -> SBool
toSBool x = λ t f -> if x then t else f

fromSBool :: SBool -> Bool
fromSBool x = x True False

For pairs, we only need one case, which will need to accept both components of the pair as arguments. This means that the Scott-encoding is essentially uncurry:

type SPair a b = forall r. (a -> b -> r) -> r

toSPair :: (a, b) -> SPair a b
toSPair x = λ f -> uncurry f x

fromSPair :: SPair a b -> (a, b)
fromSPair x = x (,)

Finally, for unit we only need one case, with no arguments; i.e., the Scott-encoding of the unit value () is the identity function:

type SUnit = forall r. r -> r

toSUnit :: () -> SUnit
toSUnit () = id

fromSUnit :: SUnit -> ()
fromSUnit x = x ()

This last example may seem a bit silly, but we will encounter it again later.

Scott encoding is not Church encoding

For readers familiar with the Church encoding, it may be worth pointing out that the Scott encoding is not a fold; it only captures pattern matching, not recursion. Concretely, the Scott encoding of natural numbers is given by

newtype SNat = SNat (forall r. (SNat -> r) -> r -> r)

Note the recursive occurrence of SNat here. This means for example that unlike with Church encodings, we can easily define a predecessor function:

pred :: SNat -> SNat
pred (SNat x) = x id (error "pred: zero")

This example will look different in Plutus, however; we will come back to it when we discuss recursion.

Simple example: booleans

Now that we understand what Scott encodings are, let’s look at how booleans work in Plutus Core, and consider compilation of the the Haskell expression

ifThenElse :: Bool -> a -> a -> a
ifThenElse = λ b t f -> if b then t else f

In PIR, this looks like

let* data Bool = True  | False
in λ {a} (b :: Bool) (t :: a) (f :: a) ->
     Bool_match b {∀ _. a} (λ {_} -> t) (λ {_} -> f) {_}

Every PIR program is entirely self-contained (no external libraries), so all datatype and function definitions are included. Unlike in Haskell, datatype declarations are scoped, and automatically introduce an explicit pattern matching function; for Bool this pattern matching function is called Bool_match.

The body of the function does not introduce any concepts we didn’t see before: if-then-else is translated³ by a call to the pattern matching function on Bool. In order to prevent the evaluation of t and f before the pattern match on b, we delay those arguments by giving them a dummy type argument; this makes them polymorphic functions, and we then force the choice once it is made by applying it a dummy type argument⁴. As we discussed above, this type abstraction and application will then be translated to force/delay calls when the program is translated to UPLC, as we will see shortly.

Of course, in this particular case, the Plutus compiler is being overly conservative: since t and f are variables (lambda arguments), we know that they must already have been evaluated (Plutus is a strict language after all), and so the force/delay in this example is redundant and could in principle be optimized away.

This is our first example of a program that looks different in PIR and PLC, because PLC does not have data types; instead, we see the Scott encoding of Bool appear:

let* type Bool   = ∀ r. r -> r -> r
     !True       = λ {r} (t :: r) (f :: r) -> t
     !False      = λ {r} (t :: r) (f :: r) -> f
     !Bool_match = λ (x :: ∀ r. r -> r -> r) -> x
in λ {a} (b :: Bool) (t :: a) (f :: a) ->
     Bool_match b {∀ _. a} (λ {_} -> t) (λ {_} -> f) {_}

We can see that Bool is now the Scott-encoding of Bool we saw in the previous section; True and False have become functions that select the first and second argument respectively, and Bool_match is now the identity function because Scott-encoded data types are pattern matching functions.

Finally, the UPLC translation is

λ ~ b t f -> ( b! (λ ~ -> t) (λ ~ -> f) )!

We have now seen all concepts required to understand this code:

The function itself is polymorphic, and so it is delayed; this explains the initial dummy argument.
b is the Scott-encoding of a Bool, is therefore a polymorphic function, which turned into a function with a dummy argument in UPLC, and so we must force b in order to actually use that function.
We delay both t and f, and once the choice is made, we force the result.

This definition is very similar to the strictIfThenElse example above; the only difference really is the top-level delay; it’s as if we had written this in Haskell:

delay $ λ b t f -> force $ force b (delay t) (delay f)

Types with arguments: pairs

The Bool type discussed in the previous section had no arguments: it had kind Type. As a second example of Scott encodings, let’s consider a type that does have arguments: the type of pairs. This type has two type arguments (the two components), and hence has kind Type -> Type -> Type (also written * -> * -> *).

Let’s consider the compilation of the Haskell program

fst :: (a, b) -> a
fst (x, _y) = x

In PIR, this program looks like:

let* data (Tuple2 :: * -> * -> *) a b = Tuple2 a b
in λ {a} {b} (pair :: Tuple2 a b) ->
     Tuple2_match {a} {b} pair {a} (λ (x :: a) (y :: b) -> x)

Like before, the pattern match on the pair has been replaced by an explicit call to the pattern matching function. Unlike in the Bool case, this pattern matching function now has three type arguments: two for the type arguments of the pair (a and b), and one for the result type (here, a, because we are selecting the first component of the pair).

The PLC version of this program is

let* type Tuple2 :: * -> * -> * = Λ a b. ∀ r. (a -> b -> r) -> r
     !Tuple2       = λ {a} {b} (x :: a) (y :: b) {r} (f :: a -> b -> r) -> f x y
     !Tuple2_match = λ {a} {b} (x :: (Λ a b. ∀ r. (a -> b -> r) -> r) a b) -> x
in λ {a} {b} (pair :: Tuple2 a b) ->
     Tuple2_match {a} {b} pair {a} (λ (x :: a) (y :: b) -> x)

The most interesting thing here is that the type Tuple2 has become a type-level function (syntax Λ a. t), which computes the type of the pattern matching function given two arguments a and b. Note the kind difference between

Λ a b. ... :: * -> * -> *

and (for a given a and b)

∀ r. ... :: *

The rest of the code is reasonably straight-forward: the Tuple2 term is the constructor for pairs, Tuple2_match is the pattern matching function and is again the identity function (because data types are pattern matching functions when we use Scott encodings).

Finally, this program compiles down to the following UPLC:

λ ~ ~ pair -> pair! (λ x y -> x)

No new concepts here; it is perhaps somewhat surprising to see the double delay of this function, arising from the fact that it has two type arguments. Indeed, one might wonder if some of these calls to delay/force might not be avoided through a compiler optimization pass.

Type-level recursion

In Haskell. Type-level fixpoints

When we introduced Scott encoding, we saw that we can take something that we normally think of as a feature of a language (pattern matching) and turn it into something that we can work with in the language (functions). In this section we will see that we can do the same with another language feature: type-level recursion (we will delay a discussion of term level recursion until the next section). For a discussion of why one might do this in Haskell, see for example the paper by Rodriguez et al. on generic programming, or the paper by Bahr and Hvitved on compositional data types; why the Plutus compiler does it is discussed in the Plutus paper.

Consider the standard definition of lists:

data List a = Nil | Cons a (List a)

List takes a single type argument, and hence has kind

List :: * -> *

In order to identify recursion as its own concept, we can abstract out the recursive use of List, and give List an additional argument that will stand in for “recursive calls”:

data ListF f a = Nil | Cons a (f a)

Now f has the same kind that List had previously, which means that we have

ListF :: (* -> *) -> (* -> *)

What should we pass as the value for f? In a sense, we want to pass ListF itself, but if we do, then that ListF also wants an argument, ad infinitum:

ListF (ListF (ListF (ListF ..)))  a

This, of course, is precisely recursion. We can capture this using a type-level fixpoint operator:

newtype Fix (t :: (* -> *) -> (* -> *)) (a :: *) = WrapFix (t (Fix t) a)

The kind of the argument t to Fix is precisely the kind of ListF, so we can define

type List = Fix ListF

Let’s check if this makes sense:

   List a
== Fix ListF a
== ListF (Fix ListF) a
== ListF List a

In Haskell. Generalizing to arbitrary kinds: `IFix`

If we look closely at the arguments of Fix, we realize that it is more (kind) monomorphic than it needs to be, and there is an obvious generalization:

newtype IFix (t :: (k -> *) -> (k -> *)) (a :: k) = WrapIFix (t (IFix t) a)

The Plutus paper goes through some length explaining why this definition is sufficient to capture all datatypes (Section 3.1, Recursive types). We will not do that in this blog post, but we will see three different example use cases, starting with lists: notice that we get Fix back simply by instantiating k to *:

type List = IFix FListF

Let’s introduce two functions to wrap and unwrap IFix:

wrap :: t (IFix t) a -> IFix t a
wrap = WrapIFix

unwrap :: IFix t a -> t (IFix t) a
unwrap (WrapIFix t) = t

If we now write functions on lists, we must wrap and unwrap at the appropriate times:

nil :: FList a
nil = wrap $ Nil

cons :: a -> FList a -> FList a
cons x xs = wrap $ Cons x xs

null :: FList a -> Bool
null xs =
   case unwrap xs of
     Nil      -> True
     Cons _ _ -> False

In Haskell. Combining with Scott encoding

Before we look at how these fixpoints are used in Plutus, let’s stay in Haskell a tiny bit longer, and first see how fixpoints can be combined with Scott encoding.

First, let’s see the Scott encoding of lists without the use of IFix:

newtype List a = List {
      listMatch :: forall r. r -> (a -> List a -> r) -> r
    }

The pattern matching function needs two cases: one for nil and one for cons; the latter then takes two arguments, the head and tail of the list. Some simple examples:

nil :: List a
nil = List $ λ n _c -> n

cons :: a -> List a -> List a
cons x xs = List $ λ _ c -> c x xs

null :: List a -> Bool
null xs = listMatch xs True (λ _ _ -> False)

To combine the Scott encoding with the use of explicit fixpoints, we need to go through the same process as before, and give List an additional argument that captures the recursive uses:

newtype ListF f a = List {
      listMatch :: forall r. r -> (a -> f a -> r) -> r
    }

type List = IFix IListF

The same three functions we defined above now become:

nil :: List a
nil = wrap $ List $ λ n _c -> n

cons :: a -> List a -> List a
cons x xs = wrap $ List $ λ _n c -> c x xs

null :: List a -> Bool
null xs = listMatch (unwrap xs) True (λ _ _ -> False)

Simple example: lists

We are now finally ready to look at how lists work in Plutus. Let’s start with the definition of nil. In PIR:

let rec data (List :: * -> *) a = Nil  | Cons a (List a)
in Nil

The only new PIR feature we see here is marking the definition of Nat as recursive: let-bindings in PIR are non-recursive unless otherwise indicated.

Let’s look the PLC version now, which is considerably more involved; PLC has neither datatypes nor recursion, and instead uses Scott encodings and the IFix combinator⁵:

let* type ListF = Λ (f :: * -> *). Λ a. ∀ r. r -> (a -> (f a) -> r) -> r
     type List  = Λ a. ifix ListF a
     !Nil       = λ {a} ->
                    wrap {ListF} {a} (
                      λ {r} (n :: r) (c :: a -> List a -> r) -> n)
     !Cons      = λ {a} (x :: a) (xs :: List a) ->
                    wrap {ListF} {a} (
                      λ {r} (n :: r) (c :: a -> (List a) -> r) -> c x xs)
     !List_match = λ {a} (x :: List a) ->
                    unwrap x
in Nil

Hopefully this PLC code is not difficult to understand anymore now; ifix, wrap and unwrap are PLC primitives. To understand the untyped Plutus code, we merely need to know that wrap and unwrap are simply erased, so this whole program compiles down to

λ ~ ~ n c -> n

The double delay comes from the two type parameters: the type of the list elements a, and the type of the continuation r.

As another example, let’s look at null (check if a list is empty). First, in PIR:

let* data Bool = True  | False
     rec data List :: * -> * a = Nil  | Cons a (List a)
in λ {a} (xs :: List a) ->
     List_match {a} xs {∀ _. Bool}
       (λ {_} -> True)
       (λ (x :: a) (xs' :: List a) {_} -> False)
       {_}

The pattern match on the list is replaced by a call to the pattern matching function, and the two arguments for the Nil and Cons cases are delayed, to avoid unnecessary computation; we have seen these patterns before (and, as before, here too the use of delay is unnecessarily conservative). In PLC:

let* -- definition of IListF, List, Nil, Cons, List_match as above
     -- definition of Bool, True, False as discussed in section on datatypes
     ..
in λ {a} (xs :: List a) ->
     List_match {a} xs {∀ _. Bool}
       (λ {_} -> True)
       (λ (x :: a) (xs :: List a) {_} -> False)
       {_}

The main code looks identical, but the definitions have changed from the PLC version: List_match, True and False are now all using the Scott-encoding. In UPLC:

λ ~ xs -> ( xs! (λ ~ ~ t f -> t) (λ x xs ~ ~ t f -> f) )!

Let’s make sure we really understand this code:

The top-level delay (λ ~ xs -> ..) is there because nil is a polymorphic function (type argument a).
We must force xs because xs is represented by the Scott-encoding of a list, which is therefore itself a polymorphic function (type argument r).
We give two arguments to xs: the case for nil and the case for cons. We delay both of those arguments, and then force the result once the choice is made.
Finally, True (i.e., (λ t f -> t) in Scott encoding) and False (i.e, (λ t f -> f)) themselves are polymorphic functions (they have their own r type argument), and we therefore have one more delay in both arguments to xs.

In Haskell. Types of kind `*`

Lists are probably the simplest example of IFix, because the kind of ListF

ListF :: (* -> *) -> (* -> *)

lines up very nicely with the kind of IFix:

IFix :: ∀ k. ((k -> *) -> (k -> *)) -> (k -> *)

simply by picking k = *. We’ll now discuss an example that doesn’t line up quite so nicely: natural numbers. Basic definition:

data Nat = Succ Nat | Zero

Abstracting out the recursion:

data NatF f = Succ f | Zero

Since NatF does not take any argument, its kind does not line up with what IFix wants:

NatF :: * -> *

What we will do instead is pick some type SimpleRec such that

IFix SimpleRec :: (* -> *) -> *

In other words, we will pick k = * -> *. This then allows us to then define

type Nat = IFix SimpleRec NatF

Unlike in the list example, NatF is now the second argument to IFix, not the first; this is quite a different use of IFix.⁶

Here’s how we can define SimpleRec:

newtype SimpleRec (rec :: (* -> *) -> *) (f :: (* -> *)) = SimpleRec (f (rec f))

Let’s see if this makes sense:

   Nat
== IFix SimpleRec NatF
== SimpleRec (IFix SimpleRec) NatF
== NatF (IFix SimpleRec NatF)
== NatF Nat

Example: natural numbers

Now that we have seen the theory in Haskell, the Plutus translation is not that difficult anymore. Let’s consider the compilation of

data Nat = Zero | Succ Nat

isZero :: Nat -> Bool
isZero Zero = True
isZero _    = False

to Plutus. First, PIR:

let*    data Bool = True | False
let rec data Nat  = Succ Nat | Zero
in λ (n :: Nat) ->
     Nat_match n {∀ _. Bool}
       (λ (n' :: Nat) {_} -> False)
       (λ {_} -> True)
       {_}

Nothing new here. Let’s look at PLC:

let* type SimpleRec = Λ (rec :: (* -> *) -> *). Λ (f :: * -> *). f (rec f)
     type NatF      = Λ f. ∀ r. (f -> r) -> r -> r
     type Nat       = ifix SimpleRec NatF

     !Succ      = λ (n :: Nat) -> wrap {SimpleRec} {NatF} (
                    λ {r} (s :: Nat -> r) (z :: r) -> s n)
     !Zero      = wrap {SimpleRec} {NatF} (
                    λ {r} (s :: Nat -> r) (z :: r) -> z)
     !Nat_match = λ (x :: Nat) -> unwrap x

     -- definition of Bool, True, False as discussed in section on datatypes

in λ (n :: Nat) ->
     Nat_match n {∀ _. Bool}
       (λ (ipv :: Nat) {_} -> False)
       (λ {_} -> True)
       {_}

Other than the use of SimpleRec that we discussed in the previous section⁷, this introduces no new concepts. The UPLC code is straight-forward, because the wrapping/unwrapping simply disappears:

λ n -> ( n! (λ ipv ~ ~ t f -> f) (λ ~ ~ t f -> t) )!

Term-level recursion

In Haskell. Term-level fixpoints

So far we have been talking about recursive types; let’s now turn our attention to recursive terms. Let’s consider addition on natural numbers:

data Nat = Succ Nat | Zero

add :: Nat -> Nat -> Nat
add Zero     m = m
add (Succ n) m = Succ (add n m)

Just like we did with types, we can abstract out the recursive call:

addF :: (Nat -> Nat -> Nat) -> Nat -> Nat -> Nat
addF f Zero     m = m
addF f (Succ n) m = Succ (f n m)

And as with types, we really want to pass the function itself as its own argument:

addF (addF (addF (addF ..)))

We can do this by defining a term-level fixpoint combinator:

fix :: (a -> a) -> a
fix f = f (fix f)

add :: Nat -> Nat -> Nat
add = fix addF

Let’s verify:

   add
== fix addF
== addF (fix addF)
== addF add

In Haskell. Deriving term-level recursion from type-level recursion

In the previous section, we abstracted out the recursion from add, but we still used recursion in the definition of fix. Stunningly, it is possible to define the term-level fixpoint combinator without using recursion at all! This is the famous Y-combinator. As we saw above, at the heart of recursion is applying a function to itself, and this self-application is also at the heart of the Y-combinator; in pseudo-Haskell:

fix :: forall a. (a -> a) -> a
fix = λ f -> (λ s -> f (s s)) (λ s -> f (s s)) -- not real Haskell

Let’s verify:

   add
== fix addF
== (λ s -> addF (s s)) (λ s -> addF (s s))
== addF ((λ s -> addF (s s)) (λ s -> addF (s s)))
== addF (fix addF)
== addF add

The above definition is pseudo-Haskell, because it does not typecheck. Consider

selfApply :: ??
selfApply f = f f

What would be the type of selfApply? Since we are apply f to something, it must have type a -> b for some a and b; and since we are applying it to itself, we must then also have that a is the same as a -> b. If we tried this in ghc, it would complain that it cannot solve this problem:

• Occurs check: cannot construct the infinite type: a ~ a -> b
• In the first argument of ‘f’, namely ‘f’
  In the expression: f f
  In an equation for ‘selfApply’: selfApply f = f f

However, we can define a type T b such that T b is isomorphic to T b -> b:

newtype Self b = WrapSelf { unwrapSelf :: Self b -> b }

This allows us to define self-application⁸:

selfApply :: Self b -> b
{-# NOINLINE selfApply #-}
selfApply f = (unwrapSelf f) f

and therefore the Y-combinator:

fix :: forall a. (a -> a) -> a
fix f = selfApply $ WrapSelf $ λ s -> f (selfApply s)

If we remove all the newtype wrapping and unwrapping, we can see that this is indeed the Y-combinator:

   fix f
== selfApply $ WrapSelf $ λ s -> f (selfApply s)
== selfApply $ λ s -> f (selfApply s)
== selfApply $ λ s -> f (s s)
== (λ s -> f (s s)) (λ s -> f (s s))

In Haskell. Using `IFix`

We have now managed to define a term-level fixpoint combinator without using term-level recursion, but we still used type-level recursion in the definition of Self. The final step is to define Self using IFix instead. As always, we first abstract out the recursion from Self:

newtype SelfF (self :: * -> *) (a :: *) = SelfF {
      unSelfF :: self a -> a
    }

Fortunately, the kind of SelfF lines up nicely with the kind expected by IFix, so we can simply define

type Self = IFix SelfF

The definition of the fixpoint combinator remains essentially the same; we just need to now wrap/unwrap both the SelfF newtype and the IFix newtype.

fix :: forall a. (a -> a) -> a
fix f = let w = wrap $ SelfF $ λ s -> f (selfApply s) in selfApply w

However, Since Plutus is a strict language, it only makes sense to take the fixpoint of functions (you can’t define an infinite list in Plutus, for example). The fixpoint in Plutus therefore is specialized to functions; in Haskell we could define this as:⁹

fixFun :: forall a b. ((a -> b) -> (a -> b)) -> (a -> b)
fixFun f = let w = wrap $ SelfF $ λ s x -> f (selfApply s) x in selfApply w

Example: addition

We are now ready to discuss our final and most involved Plutus program: addition. As a reminder, the Haskell code looked like

data Nat = Succ Nat | Zero

add :: Nat -> Nat -> Nat
add Zero     m = m
add (Succ n) m = Succ (add n m)

In PIR:

let rec data Nat = Succ Nat | Zero
    rec !add = λ (n :: Nat) (m :: Nat) ->
           Nat_match n {∀ _. Nat}
             (λ (n' :: Nat) {_} -> Succ (add n' m))
             (λ {_} -> m)
             {_}
in add

To go from PIR to PLC, we need to move the type-level recursion from Nat, the term-level recursion from add, and introduce Scott encoding:¹⁰

let* type SimpleRec = Λ (rec :: (* -> *) -> *). Λ (f :: * -> *). f (rec f)
     type NatF      = Λ f. ∀ r. (f -> r) -> r -> r
     type Nat       = ifix SimpleRec NatF

     !Succ      = λ (n :: Nat) ->
                   wrap {SimpleRec} {NatF} (
                     λ {r} (s :: Nat -> r) (z :: r) -> s n)
     !Zero      = wrap {SimpleRec} {NatF} (
                     λ {r} (s :: Nat -> r) (z :: r) -> z)
     !Nat_match = λ (x :: Nat) -> unwrap x

     type SelfF = Λ (self :: * -> *). Λ a. self a -> a
     type Self  = Λ a. ifix SelfF a

     !addF = λ (f :: Nat -> Nat -> Nat) (n :: Nat) (m :: Nat) ->
               Nat_match n {∀ _. Nat}
                 (λ (n' :: Nat) {_} -> Succ (f n' m))
                 (λ {_} -> m)
                 {_}
     !w    = wrap {SelfF} {Nat -> Nat -> Nat} (
               λ (s :: Self (Nat -> Nat -> Nat)) (n :: Nat) ->
                 addF ((unwrap s) s) n)
     !add  = (unwrap w) w

in add

There is a lot of detail here, but no new concepts. The most important thing to realize is that the definition of addF is precisely the one we expect, and that we can recognize the definition of w and the use of (unwrap w) w as being precisely the application of (what we called) fixFun to addF.

We are now ready to understand the UPLC version:

let* w = λ s n ->
       let* add = s s
       in λ m -> ( n! (λ n' ~ ~ s z -> s (add n' m)) (λ ~ -> m) )!
in w w

This code is pretty difficult to understand without seeing the typed version, but if we compare this to the PLC version we can see what is going on. The only somewhat confusing part is that the compiler has inlined the definition of addF, and assigned the name add to the term (s s) (not unjustifiably: it corresponds to the recursive calls). The double delay in the first argument to n comes from two places: one delay from the definition of addF, and one delay from the definition of Succ.

The cost of polymorphism

In this final section we will not introduce any new Plutus concepts, but we will consider the following question: suppose we have a monomorphic top-level Plutus program; what is the cost of defining this in terms of some polymorphic functions? In an ideal world, this cost would be zero, but unfortunately this is not the case at the moment.

Parametric polymorphism

We have already seen the cost of polymorphism in the proliferation of delay/force calls in UPLC. However, there is another cost. Let’s consider the compilation of this Haskell code:

reversePoly :: [a] -> [a]
reversePoly = reversePoly' []

reversePoly' :: [a] -> [a] -> [a]
reversePoly' acc []     = acc
reversePoly' acc (x:xs) = reversePoly' (x:acc) xs

useReversePoly :: [Integer] -> [Integer]
useReversePoly = reversePoly . reversePoly

The PIR code is unsurprising:

let* rec data List :: * -> * a = Nil  | Cons a (List a)
     rec !reversePoly' =
            λ {a} (acc :: List a) (xs :: List a) ->
               List_match {a} xs {∀ _. List a}
                 (λ {_} -> acc)
                 (λ (x :: a) (xs' :: List a) {_} -> reversePoly' {a} (Cons {a} x acc) xs')
                 {_}
     !reversePoly = λ {a} -> reversePoly' {a} (Nil {a})
in (λ {b} {c} {a} (f :: b -> c) (g :: a -> b) (x :: a) -> f (g x))
     {List Integer}
     {List Integer}
     {List Integer}
     (reversePoly {Integer})
     (reversePoly {Integer})

But the PLC code isn’t; specifically, the compilation of reversePoly':

let* type Id      = ∀ a. a -> a
     type Reverse = ∀ a. List a -> List a -> List a

     !reversePoly'F =
       λ (f :: Id -> Reverse) (_ :: Id) {a} (acc :: List a) (xs :: List a) ->
         List_match {a} xs {∀ _. List a}
           (λ {_} -> acc)
           (λ (x :: a) (xs' :: List a) {_} -> f (λ {a} (x :: a) -> x) {a} (Cons {a} x acc) xs')
           {_}

     !w = wrap {SelfF} {Id -> Reverse} (λ (s :: Self (Id -> Reverse)) (x :: Id) ->
            reversePoly'F ((unwrap s) s) x)
in (unwrap w) w

The definition of reversePoly'F is the version of reversePoly' with the recursive call abstracted out, as before. However, it seems to have gotten an additional argument of type Id = ∀ a. a -> a. This is essentially a different way to delay. Recall from the section on Force and Delay that delay conceptually corresponds to introducing a parameter of type unit; recall furthermore from the section on Scott encoding that the Scott encoding of unit is ∀ a. a -> a.

So we can understand the (dummy) parameter of type Id as another delay, but the reason why this delay is there, and why it doesn’t just use another type abstraction, is rather technical. Here we will merely state that any recursive polymorphic function will receive such a dummy argument at present; unfortunately, this then also results in some additional code in UPLC, because this does not get translated the native delay/force constructs:

let* w = λ s _ -> let* reversePoly = s s
                  in λ ~ acc xs ->
                       ( xs!
                           (λ ~ -> acc)
                           (λ x xs' ~ -> reversePoly (λ ~ x -> x) ! (λ ~ n c -> c x acc) xs')
                       ) !
     reversePoly_0 = w w
     reversePoly_1 = λ ~ -> reversePoly_0 (λ ~ x -> x) ! (λ ~ n c -> n)
     f = reversePoly_1 !
     g = reversePoly_1 !
in λ x_3 -> f (g x_3)

Not only do we see that (λ ~ x -> x) being passed around, but it’s also not shared, so this can result in fairly significant code expansion.

Adhoc polymorphism

Let’s now consider what happens if we have a type class constraint:

allEqualPoly :: forall a. Eq a => a -> [a] -> Bool
allEqualPoly _ []     = True
allEqualPoly x (y:ys) = x == y && allEqualPoly x ys

useAllEqualPoly :: [Integer] -> Bool
useAllEqualPoly = allEqualPoly 1

PIR:

let rec !allEqualPoly = λ {a} (dEq :: (Λa. a -> a -> Bool) a) (x :: a) (ys :: List a) ->
      List_match {a} ys {∀ _. Bool}
        (λ {_} -> True)
        (λ (y :: a) (ys' :: List a) {_} ->
          let* !l = dEq x y
               !r = allEqualPoly {a} dEq x ys'
          in Bool_match l {∀ _. Bool} (λ {_} -> r) (λ {_} -> False) {_})
        {_}
in allEqualPoly {Integer} (λ (x :: Integer) (y :: Integer) ->
     ifThenElse {Bool} (equalsInteger x y) True False) 1

We see that the type class constraint has been replaced with a dictionary argument. The unfortunate thing is that this dictionary argument does not get inlined; in UPLC this program looks like

let* True  = λ ~ t f -> t
     False = λ ~ t f -> f

     w = λ s _ -> let* allEqualPoly = s s
                  in λ ~ dEq x ys ->
                      ( ys!
                          (λ ~ -> True)
                          (λ y ys' ~ -> let* l = dEq x y;
                                             r = (allEqualPoly (λ ~ x -> x))! dEq x ys'
                                        in ( l! (λ ~ -> r) (λ ~ -> False) )!
                          )
                      )!
     allEqualPoly = w w

in (allEqualPoly (λ ~ x -> x))! (λ x y -> ifThenElse# ! (equalsInteger# x y) True False) 1#

This lack of specialization may also prevent further optimizations (we see the same Id dummy argument that we discussed in the previous section here too, because allEqualPoly is a polymorphic recursion function).

Multi-parameter type classes

Finally, let’s consider what happens with multi-parameter type classes

class Convert a b where
  convert :: a -> b

instance Convert Integer Bool where
  convert n = n == 0

convertPair :: (Convert a a', Convert b b') => (a, b) -> (a', b')
convertPair (x, y) = (convert x, convert y)

useConvertPair :: (Integer, Integer) -> (Bool, Bool)
useConvertPair = convertPair

(The Plutus compiler does not currently support associated data types, nor associated type synonyms.) In PIR:

let* data Bool = True  | False
     fConvertIntegerBool = λ(n :: Integer) -> ifThenElse {Bool} (equalsInteger n 0) True False
     data Tuple2 :: * -> * -> * a b = Tuple2 a b
     !dConvert = fConvertIntegerBool
     !dConvert = fConvertIntegerBool
in λ(pair :: Tuple2 Integer Integer) ->
     Tuple2_match
       {Integer}
       {Integer}
       pair
       {Tuple2 Bool Bool}
       (λ(x :: Integer) (y :: Integer) -> Tuple2 {Bool} {Bool} (dConvert x) (dConvert y))

UPLC:

let* fConvertIntegerBool = \ ~ n ->
         ifThenElse# ! (equalsInteger# n 0#) (\ ~ t f -> t) (\ ~ t f -> f)

     dConvert  = fConvertIntegerBool!
     dConvert' = fConvertIntegerBool!

in \ ds -> ds ! (\ x y ~ pair -> pair (dConvert x) (dConvert' y))

Fortunately, here we see no remnants of the fact that we used a type class with multiple parameters.

Taking advantage of `ghc` optimizations

You might wonder why ghc didn’t inline the dictionary in the example of adhoc polymorphism that we saw above. The answer is that it can. All of the examples above had the definition of the Haskell code in the same Haskell module as the call to compile:

useAllEqualPoly :: [Integer] -> Bool
useAllEqualPoly = ..

compiledUseAllEqualPoly :: CompiledCode ([Integer] -> Bool)
compiledUseAllEqualPoly = $$(compile [|| useAllEqualPoly ||])

A consequence of this setup is that the Plutus compilation plugin will see the code before the ghc optimizer gets to work on it. If instead we put the definition of useAllEqualPoly in a different module than the call to compile, and we add

{-# OPTIONS_GHC -fexpose-all-unfoldings   #-}
{-# OPTIONS_GHC -fspecialize-aggressively #-}

to the module defining useAllEqualPoly, we see that ghc inlines the dictionary entirely, and the resulting UPLC code is of the same size as the code resulting from a monomorphic definition of allEqual.

That is encouraging, but unfortunately this does not always result in smaller code. As a simple example, consider something like

pred :: Nat -> Nat
pred Zero     = error ()
pred (Succ n) = n

predder :: Nat -> Nat
predder = pred . pred

The compilation of predder is about 50% larger when compiled with ghc optimizations, because ghc inlined the definition of pred. There are other, less obvious, examples too. For instance, in the parametric polymorphic example of reverse, enabling the ghc optimizer results in more polymorphic recursive let-bindings of the shape discussed in that section, and hence in more of those (λ ~ x -> x) terms, resulting in larger code size.

So your mileage may vary; perhaps enabling the ghc optimizer will result in smaller UPLC code, perhaps not. It may require careful tweaking of the code and/or the ghc optimization flags.

Conclusions

For many Plutus developers, UPLC is of particular interest, because Plutus programs are subject to stringent resource bounds and so it is often necessary to optimize the UPLC code by making tweaks to the Haskell source code. In order to do so, it helps to understand the compilation pipeline, which is what this blog post attempted to illustrate.

This is of course not the only way to improve the UPLC code. For example, Well-Typed has been working on an optimizing compiler for UPLC called Plutonomy; as one example, Plutonomy can reduce the size of the Marlowe Plutus script by about 20%. However, Plutonomy is hampered by the lack of type information in UPLC; indeed, the presence of explicit delay/force constructs in UPLC is a blessing, enabling some optimizations that would not otherwise be possible without type inference or some other kind of global analysis. Perhaps an optimizer that operated at the level of PLC instead of UPLC would be beneficial for some applications.

An alternative approach is to bypass the entire compilation pipeline altogether, and instead write UPLC code “directly”. One project that makes this possible is Plutarch, which provides a typed EDSL in Haskell for UPLC.

We hand-wrote this example in PIR for this blog post, since illustrating this with a PIR program that would result from the normal Haskell to PIR compiler would look more complicated.↩︎
This is different to the semantics of type arguments in ghccore: there evaluation does continue underneath the binder.↩︎
The Haskell Bool type does not translate to the built-in Bool type in Plutus Core; that is used only by built-in functions.↩︎
The actual dummy argument provided is ∀ a. a -> a.↩︎
This is not quite the output of the Plutus compiler; cleaned up a bit to improve readability↩︎
This is not the only possible approach. An alternative would be to define a version of Nat with an argument, over the unit kind. This would perhaps be more consistent; what we illustrate here is what the Plutus compiler does and it may not be unreasonable to special case types of kind *, given how common they are.↩︎
SimpleRec is not named as such in PLC; the definition is simply inlined.↩︎
selfApply must be marked as no-inline to prevent ghc from unrolling recursion at compile time forever.↩︎
We could of course just define fixFun = fix; the definition of fixFun we show here matches the one used in Plutus, and is therefore more helpful when trying to understand PLC/UPLC code.↩︎
Again, code cleaned up a bit for improved readability.↩︎