Plutus is a strict, pure functional language. It is developed by IOHK for use on the Cardano blockchain, but in this blog post we will not be concerned with specific applications of the language, but instead look at its compilation pipeline.

Technically speaking, Plutus is not one language, but three, and most people who write “Plutus” are not really writing Plutus at all, but Haskell. These Haskell programs are translated to the Plutus Intermediate Representation (PIR). After that, data types are replaced by their Scott encodings, and recursion is replaced by explicit fixpoints to get to typed Plutus Core (PLC). Finally, types are erased to get to the Untyped Plutus Core (UPLC).

In this blog post, we will explain this entire process, with a particular focus on

- The consequences of polymorphism in the typed language on the untyped language
- Scott encoding
- Type and term level fixpoints

We won’t try to explain the default syntax of Plutus Core in this blog post, because it is very verbose; the syntax we used here is generated by a custom pretty-printer we wrote. If you want to learn more about the default syntax, or need a more gentle introduction to lambda calculus, you might want to check out An Introduction to Plutus Core.

### Polymorphism

Throughout this blog post we will assume the reader is familiar with Haskell, and we will use Haskell in sections marked *In Haskell* as a vehicle for explaining some of the concepts needed to understand Plutus code, before showing those concepts in Plutus itself.

*In Haskell*. Force and Delay

Suppose we need to work with an API that is stricter than we’d like it to be. To have a concrete but simple (if contrived) example, let’s say we want to use this strict implementation of `if-then-else`

:

```
strictIfThenElse :: Bool -> a -> a -> a
!b !t !f = if b then t else f strictIfThenElse
```

If we call this function with two arguments `t`

and `f`

, it will evaluate *both* arguments before even looking at `b`

, even though only one will be required.

```
> strictIfThenElse True 'a' undefined
*** Exception: Prelude.undefined
```

If we want to avoid this problem, we somehow need to *delay* evaluation of the arguments. We could do this by giving them a dummy argument of unit type:

```
type Delayed a = () -> a
delay :: a -> Delayed a
= const x delay x
```

Going the other direction, if we have a delayed argument, in order to evaluate it be must *force* it:

```
force :: Delayed a -> a
= x () force x
```

Using `delay`

and `force`

we can get the behaviour we’d expect:

```
> force $ strictIfThenElse True (delay 'a') (delay undefined)
'a'
```

#### Simple example: function application

With these first preliminaries out of the way, let’s start taking a look at Plutus now. The compilation pipeline is as follows:

- Plutus programs are written using Haskell as a surface language.
- The Haskell code is translated to the Plutus Intermediate Representation (
**PIR**). - Through a series of transformations the compiler turns PIR into (typed) Plutus Core (
**PLC**). The main difference between PIR and PLC is that the former has support for datatypes and recursion, and the latter does not. - Finally, PLC is turned into Untyped Plutus Core (
**UPLC**).

We will see lots of examples of each of these representations in this blog post, but let’s start with something very simple. Consider the following Haskell code:

```
apMono :: (Integer -> Integer) -> Integer -> Integer
= f x apMono f x
```

For these first few simple examples, the PIR and PLC representations are identical:

`f :: Integer -> Integer) (x :: Integer) -> f x λ (`

PIR and PLC are explicitly typed languages, so every variable binder is given a type annotation. These types are then removed in the translation to UPLC:

`λ f x -> f x`

No surprises so far, so let’s consider something a bit more interesting:

```
apId :: (forall a. a -> a) -> Integer -> Integer
= f x apId f x
```

Same code as above, but with a different type: we now insist that the first argument is polymorphic in `a`

. The only (total) function of type `forall a. a -> a`

is the identity, so `apId`

isn’t particularly useful, but it will serve to illustrate a point. The PIR/PLC translation of `apId`

is:

`f :: ∀ a. a -> a) (x :: Integer) -> f {Integer} x λ (`

Polymorphic functions in Haskell become functions that take type arguments in PIR; in the body we see `f`

being applied to the type argument `Integer`

and then to the regular argument `x`

. (We will use curly brackets for type arguments and type application.) Still nothing unusual, but in the translation to UPLC we see the first Plutus specific feature:

`-> f! x λ f x `

That exclamation mark is the force operator we discussed above. UPLC has explicit constructs for delay and force, which we will write as `λ ~ -> e`

and `e!`

, respectively:

```
~ -> e == delay e
λ ! == force e e
```

The question is: why is `f`

being forced here?

#### Polymorphism versus strictness

To illustrate why `f`

was being forced in that last example, let’s consider a slightly more interesting PIR program^{1}

```
let* !boom = λ {a} -> ERROR
!id = λ {a} (x :: a) -> x
in λ (b :: Bool) -> ifThenElse {∀ a. a -> a} b boom id
```

This defines a function that takes an argument `b`

of type `Bool`

. When the caller passes `True`

, the function evaluates to `ERROR`

(aka `undefined`

in Haskell); otherwise it evaluates to the identity function.

You might expect the untyped version of this program to look something like this:

`-> ifThenElse b ERROR (λ x -> x) λ b `

However, as we mentioned above, Plutus Core is a strict language: function arguments are evaluated *before* the function is called. This means that this translation is not correct: the call to `ERROR`

would be evaluated and the program would crash, independent of the value of `b`

.

In the original program this is not the case: the argument passed to `ifThenElse`

is `(λ {a} -> ERROR)`

, *which is a function*. Evaluating a function does nothing, so the original PIR program does have the behaviour we need.^{2}

This means that when we erase types to get from PLC to UPLC, we have to make sure that we don’t make programs stricter than they were. The Plutus compiler does this by replacing type arguments by delays. The UPLC translation of our example is:

`-> ifThenElse# ! b (λ ~ -> ERROR) (λ ~ y -> y) λ b `

Using the definitions from the section on Force and Delay, we could write this UPLC code in (Pseudo-)Haskell as

```
maybeExplode :: Bool -> Delayed (∀ a. a -> a)
= strictIfThenElse b (delay undefined) (delay id) maybeExplode b
```

This is nearly identical to the `strictIfThenElse`

example we saw before, except that there is no need to force the final result: we *want* to return a delayed function, because we are returning a *polymorphic* function.

Finally, this also explains why we need to force the call to the built-in function `ifThenElse#`

: it too is a polymorphic function, and hence it too is delayed.

### Datatypes

*In Haskell*. Scott encoding

Consider pattern matching on a value of type `Maybe a`

: we will need two cases, one case for `Nothing`

and one case for `Just (x :: a)`

. We can capture this notion of “pattern matching on `Maybe`

” as a function:

```
toSMaybe :: ∀ a. Maybe a -> (∀ r. r -> (a -> r) -> r)
= λ n j -> maybe n j x toSMaybe x
```

As is clear from the definition, this is just the well-known function `maybe`

from the standard library with the arguments in a different order (the reason for the strange name `toSMaybe`

will become apparent soon). We can also go the other way: from the pattern matching function to the value of type `Maybe a`

:

```
fromSMaybe :: (∀ r. r -> (a -> r) -> r) -> Maybe a
= x Nothing Just fromSMaybe x
```

In fact, it’s not difficult to see that these two functions are mutually inverse. This means that the two representations `Maybe a`

and `(∀ r. r -> (a -> r) -> r)`

are isomorphic, and we could define `Maybe`

to *be* the type of pattern matching functions. This is known as the *Scott encoding* of `Maybe`

:

`type SMaybe a = forall r. r -> (a -> r) -> r`

As some further examples of Scott encoding, we will consider the case for booleans, pairs, and unit (we will see the Plutus equivalent of all of these later). In the case of `Bool`

, pattern matching will need a case for `True`

and a case for `False`

; this means that the translation to the Scott encoding turns out to be `if-then-else`

:

```
type SBool = forall r. r -> r -> r
toSBool :: Bool -> SBool
= λ t f -> if x then t else f
toSBool x
fromSBool :: SBool -> Bool
= x True False fromSBool x
```

For pairs, we only need *one* case, which will need to accept both components of the pair as arguments. This means that the Scott-encoding is essentially `uncurry`

:

```
type SPair a b = forall r. (a -> b -> r) -> r
toSPair :: (a, b) -> SPair a b
= λ f -> uncurry f x
toSPair x
fromSPair :: SPair a b -> (a, b)
= x (,) fromSPair x
```

Finally, for unit we only need one case, with no arguments; i.e., the Scott-encoding of the unit value `()`

is the identity function:

```
type SUnit = forall r. r -> r
toSUnit :: () -> SUnit
= id
toSUnit ()
fromSUnit :: SUnit -> ()
= x () fromSUnit x
```

This last example may seem a bit silly, but we will encounter it again later.

##### Scott encoding is not Church encoding

For readers familiar with the Church encoding, it may be worth pointing out that the Scott encoding is *not* a fold; it *only* captures pattern matching, not recursion. Concretely, the Scott encoding of natural numbers is given by

`newtype SNat = SNat (forall r. (SNat -> r) -> r -> r)`

Note the recursive occurrence of `SNat`

here. This means for example that unlike with Church encodings, we can easily define a predecessor function:

```
pred :: SNat -> SNat
pred (SNat x) = x id (error "pred: zero")
```

This example will look different in Plutus, however; we will come back to it when we discuss recursion.

#### Simple example: booleans

Now that we understand what Scott encodings are, let’s look at how booleans work in Plutus Core, and consider compilation of the the Haskell expression

```
ifThenElse :: Bool -> a -> a -> a
= λ b t f -> if b then t else f ifThenElse
```

In PIR, this looks like

```
let* data Bool = True | False
in λ {a} (b :: Bool) (t :: a) (f :: a) ->
Bool_match b {∀ _. a} (λ {_} -> t) (λ {_} -> f) {_}
```

Every PIR program is entirely self-contained (no external libraries), so all datatype and function definitions are included. Unlike in Haskell, datatype declarations are scoped, and automatically introduce an explicit pattern matching function; for `Bool`

this pattern matching function is called `Bool_match`

.

The body of the function does not introduce any concepts we didn’t see before: `if-then-else`

is translated^{3} by a call to the pattern matching function on `Bool`

. In order to prevent the evaluation of `t`

and `f`

before the pattern match on `b`

, we *delay* those arguments by giving them a dummy type argument; this makes them polymorphic functions, and we then *force* the choice once it is made by applying it a dummy type argument^{4}. As we discussed above, this type abstraction and application will then be translated to `force`

/`delay`

calls when the program is translated to UPLC, as we will see shortly.

Of course, in this particular case, the Plutus compiler is being overly conservative: since `t`

and `f`

are *variables* (lambda arguments), we know that they *must already have been evaluated* (Plutus is a strict language after all), and so the force/delay in this example is redundant and could in principle be optimized away.

This is our first example of a program that looks different in PIR and PLC, because PLC does not have data types; instead, we see the Scott encoding of `Bool`

appear:

```
let* type Bool = ∀ r. r -> r -> r
!True = λ {r} (t :: r) (f :: r) -> t
!False = λ {r} (t :: r) (f :: r) -> f
!Bool_match = λ (x :: ∀ r. r -> r -> r) -> x
in λ {a} (b :: Bool) (t :: a) (f :: a) ->
Bool_match b {∀ _. a} (λ {_} -> t) (λ {_} -> f) {_}
```

We can see that `Bool`

is now the Scott-encoding of `Bool`

we saw in the previous section; `True`

and `False`

have become functions that select the first and second argument respectively, and `Bool_match`

is now the identity function because Scott-encoded data types *are* pattern matching functions.

Finally, the UPLC translation is

`~ b t f -> ( b! (λ ~ -> t) (λ ~ -> f) )! λ `

We have now seen all concepts required to understand this code:

- The function
*itself*is polymorphic, and so it is delayed; this explains the initial dummy argument. `b`

is the Scott-encoding of a Bool, is therefore a polymorphic function, which turned into a function with a dummy argument in UPLC, and so we must*force*`b`

in order to actually use that function.- We delay both
`t`

and`f`

, and once the choice is made, we force the result.

This definition is very similar to the `strictIfThenElse`

example above; the only difference really is the top-level delay; it’s as if we had written this in Haskell:

`$ λ b t f -> force $ force b (delay t) (delay f) delay `

#### Types with arguments: pairs

The `Bool`

type discussed in the previous section had no arguments: it had kind `Type`

. As a second example of Scott encodings, let’s consider a type that *does* have arguments: the type of pairs. This type has *two* type arguments (the two components), and hence has kind `Type -> Type -> Type`

(also written `* -> * -> *`

).

Let’s consider the compilation of the Haskell program

```
fst :: (a, b) -> a
fst (x, _y) = x
```

In PIR, this program looks like:

```
let* data (Tuple2 :: * -> * -> *) a b = Tuple2 a b
in λ {a} {b} (pair :: Tuple2 a b) ->
Tuple2_match {a} {b} pair {a} (λ (x :: a) (y :: b) -> x)
```

Like before, the pattern match on the pair has been replaced by an explicit call to the pattern matching function. Unlike in the `Bool`

case, this pattern matching function now has *three* type arguments: two for the type arguments of the pair (`a`

and `b`

), and one for the result type (here, `a`

, because we are selecting the first component of the pair).

The PLC version of this program is

```
let* type Tuple2 :: * -> * -> * = Λ a b. ∀ r. (a -> b -> r) -> r
!Tuple2 = λ {a} {b} (x :: a) (y :: b) {r} (f :: a -> b -> r) -> f x y
!Tuple2_match = λ {a} {b} (x :: (Λ a b. ∀ r. (a -> b -> r) -> r) a b) -> x
in λ {a} {b} (pair :: Tuple2 a b) ->
Tuple2_match {a} {b} pair {a} (λ (x :: a) (y :: b) -> x)
```

The most interesting thing here is that the type `Tuple2`

has become a *type-level function* (syntax `Λ a. t`

), which computes the type of the pattern matching function *given* two arguments `a`

and `b`

. Note the kind difference between

`Λ a b. ... :: * -> * -> *`

and (for a given `a`

and `b`

)

`∀ r. ... :: *`

The rest of the code is reasonably straight-forward: the `Tuple2`

term is the constructor for pairs, `Tuple2_match`

is the pattern matching function and is again the identity function (because data types *are* pattern matching functions when we use Scott encodings).

Finally, this program compiles down to the following UPLC:

`~ ~ pair -> pair! (λ x y -> x) λ `

No new concepts here; it is perhaps somewhat surprising to see the *double* delay of this function, arising from the fact that it has *two* type arguments. Indeed, one might wonder if some of these calls to delay/force might not be avoided through a compiler optimization pass.

### Type-level recursion

*In Haskell*. Type-level fixpoints

When we introduced Scott encoding, we saw that we can take something that we normally think of as a feature *of* a language (pattern matching) and turn it into something that we can work with *in* the language (functions). In this section we will see that we can do the same with another language feature: type-level recursion (we will delay a discussion of *term* level recursion until the next section). For a discussion of why one might do this in Haskell, see for example the paper by Rodriguez et al. on generic programming, or the paper by Bahr and Hvitved on compositional data types; why the Plutus compiler does it is discussed in the Plutus paper.

Consider the standard definition of lists:

`data List a = Nil | Cons a (List a)`

`List`

takes a single type argument, and hence has kind

`List :: * -> *`

In order to identify recursion as its own concept, we can abstract out the recursive use of `List`

, and give `List`

an additional argument that will stand in for “recursive calls”:

`data ListF f a = Nil | Cons a (f a)`

Now `f`

has the same kind that `List`

had previously, which means that we have

`ListF :: (* -> *) -> (* -> *)`

What should we pass as the value for `f`

? In a sense, we want to pass `ListF`

*itself*, but if we do, then *that* `ListF`

also wants an argument, ad infinitum:

`ListF (ListF (ListF (ListF ..))) a`

This, of course, is precisely recursion. We can capture this using a type-level fixpoint operator:

`newtype Fix (t :: (* -> *) -> (* -> *)) (a :: *) = WrapFix (t (Fix t) a)`

The kind of the argument `t`

to `Fix`

is precisely the kind of `ListF`

, so we can define

`type List = Fix ListF`

Let’s check if this makes sense:

```
List a
== Fix ListF a
== ListF (Fix ListF) a
== ListF List a
```

*In Haskell*. Generalizing to arbitrary kinds: `IFix`

If we look closely at the arguments of `Fix`

, we realize that it is more (kind) monomorphic than it needs to be, and there is an obvious generalization:

`newtype IFix (t :: (k -> *) -> (k -> *)) (a :: k) = WrapIFix (t (IFix t) a)`

The Plutus paper goes through some length explaining why this definition is sufficient to capture *all* datatypes (Section 3.1, Recursive types). We will not do that in this blog post, but we will see three different example use cases, starting with lists: notice that we get `Fix`

back simply by instantiating `k`

to `*`

:

`type List = IFix FListF`

Let’s introduce two functions to wrap and unwrap `IFix`

:

```
wrap :: t (IFix t) a -> IFix t a
= WrapIFix
wrap
unwrap :: IFix t a -> t (IFix t) a
WrapIFix t) = t unwrap (
```

If we now write functions on lists, we must `wrap`

and `unwrap`

at the appropriate times:

```
nil :: FList a
= wrap $ Nil
nil
cons :: a -> FList a -> FList a
= wrap $ Cons x xs
cons x xs
null :: FList a -> Bool
null xs =
case unwrap xs of
Nil -> True
Cons _ _ -> False
```

*In Haskell*. Combining with Scott encoding

Before we look at how these fixpoints are used in Plutus, let’s stay in Haskell a tiny bit longer, and first see how fixpoints can be combined with Scott encoding.

First, let’s see the Scott encoding of lists *without* the use of `IFix`

:

```
newtype List a = List {
listMatch :: forall r. r -> (a -> List a -> r) -> r
}
```

The pattern matching function needs two cases: one for `nil`

and one for `cons`

; the latter then takes two arguments, the head and tail of the list. Some simple examples:

```
nil :: List a
= List $ λ n _c -> n
nil
cons :: a -> List a -> List a
= List $ λ _ c -> c x xs
cons x xs
null :: List a -> Bool
null xs = listMatch xs True (λ _ _ -> False)
```

To combine the Scott encoding with the use of explicit fixpoints, we need to go through the same process as before, and give `List`

an additional argument that captures the recursive uses:

```
newtype ListF f a = List {
listMatch :: forall r. r -> (a -> f a -> r) -> r
}
type List = IFix IListF
```

The same three functions we defined above now become:

```
nil :: List a
= wrap $ List $ λ n _c -> n
nil
cons :: a -> List a -> List a
= wrap $ List $ λ _n c -> c x xs
cons x xs
null :: List a -> Bool
null xs = listMatch (unwrap xs) True (λ _ _ -> False)
```

#### Simple example: lists

We are now finally ready to look at how lists work in Plutus. Let’s start with the definition of `nil`

. In PIR:

```
let rec data (List :: * -> *) a = Nil | Cons a (List a)
in Nil
```

The only new PIR feature we see here is marking the definition of `Nat`

as recursive: `let`

-bindings in PIR are non-recursive unless otherwise indicated.

Let’s look the PLC version now, which is considerably more involved; PLC has neither datatypes nor recursion, and instead uses Scott encodings and the `IFix`

combinator^{5}:

```
let* type ListF = Λ (f :: * -> *). Λ a. ∀ r. r -> (a -> (f a) -> r) -> r
type List = Λ a. ifix ListF a
!Nil = λ {a} ->
ListF} {a} (
wrap {n :: r) (c :: a -> List a -> r) -> n)
λ {r} (!Cons = λ {a} (x :: a) (xs :: List a) ->
ListF} {a} (
wrap {n :: r) (c :: a -> (List a) -> r) -> c x xs)
λ {r} (!List_match = λ {a} (x :: List a) ->
unwrap xin Nil
```

Hopefully this PLC code is not difficult to understand anymore now; `ifix`

, `wrap`

and `unwrap`

are PLC primitives. To understand the untyped Plutus code, we merely need to know that `wrap`

and `unwrap`

are simply erased, so this whole program compiles down to

`~ ~ n c -> n λ `

The double delay comes from the two type parameters: the type of the list elements `a`

, and the type of the continuation `r`

.

As another example, let’s look at `null`

(check if a list is empty). First, in PIR:

```
let* data Bool = True | False
data List :: * -> * a = Nil | Cons a (List a)
rec in λ {a} (xs :: List a) ->
List_match {a} xs {∀ _. Bool}
-> True)
(λ {_} x :: a) (xs' :: List a) {_} -> False)
(λ ( {_}
```

The pattern match on the list is replaced by a call to the pattern matching function, and the two arguments for the `Nil`

and `Cons`

cases are delayed, to avoid unnecessary computation; we have seen these patterns before (and, as before, here too the use of `delay`

is unnecessarily conservative). In PLC:

```
let* -- definition of IListF, List, Nil, Cons, List_match as above
-- definition of Bool, True, False as discussed in section on datatypes
..
in λ {a} (xs :: List a) ->
List_match {a} xs {∀ _. Bool}
-> True)
(λ {_} x :: a) (xs :: List a) {_} -> False)
(λ ( {_}
```

The main code looks identical, but the definitions have changed from the PLC version: `List_match`

, `True`

and `False`

are now all using the Scott-encoding. In UPLC:

`~ xs -> ( xs! (λ ~ ~ t f -> t) (λ x xs ~ ~ t f -> f) )! λ `

Let’s make sure we really understand this code:

- The top-level delay
`(λ ~ xs -> ..)`

is there because`nil`

is a polymorphic function (type argument`a`

). - We must force
`xs`

because`xs`

is represented by the Scott-encoding of a list, which is therefore itself a polymorphic function (type argument`r`

). - We give two arguments to
`xs`

: the case for`nil`

and the case for`cons`

. We delay both of those arguments, and then force the result once the choice is made. - Finally,
`True`

(i.e.,`(λ t f -> t)`

in Scott encoding) and`False`

(i.e,`(λ t f -> f)`

)*themselves*are polymorphic functions (they have their own`r`

type argument), and we therefore have one more delay in both arguments to`xs`

.

*In Haskell*. Types of kind `*`

Lists are probably the simplest example of `IFix`

, because the kind of `ListF`

`ListF :: (* -> *) -> (* -> *)`

lines up very nicely with the kind of `IFix`

:

`IFix :: ∀ k. ((k -> *) -> (k -> *)) -> (k -> *)`

simply by picking `k = *`

. We’ll now discuss an example that doesn’t line up quite so nicely: natural numbers. Basic definition:

`data Nat = Succ Nat | Zero`

Abstracting out the recursion:

`data NatF f = Succ f | Zero`

Since `NatF`

does not take any argument, its kind does not line up with what `IFix`

wants:

`NatF :: * -> *`

What we will do instead is pick some type `SimpleRec`

such that

`IFix SimpleRec :: (* -> *) -> *`

In other words, we will pick `k = * -> *`

. This then allows us to then define

`type Nat = IFix SimpleRec NatF`

Unlike in the list example, `NatF`

is now the *second* argument to `IFix`

, not the first; this is quite a different use of `IFix`

.^{6}

Here’s how we can define `SimpleRec`

:

`newtype SimpleRec (rec :: (* -> *) -> *) (f :: (* -> *)) = SimpleRec (f (rec f))`

Let’s see if this makes sense:

```
Nat
== IFix SimpleRec NatF
== SimpleRec (IFix SimpleRec) NatF
== NatF (IFix SimpleRec NatF)
== NatF Nat
```

#### Example: natural numbers

Now that we have seen the theory in Haskell, the Plutus translation is not that difficult anymore. Let’s consider the compilation of

```
data Nat = Zero | Succ Nat
isZero :: Nat -> Bool
Zero = True
isZero = False isZero _
```

to Plutus. First, PIR:

```
let* data Bool = True | False
let rec data Nat = Succ Nat | Zero
in λ (n :: Nat) ->
Nat_match n {∀ _. Bool}
n' :: Nat) {_} -> False)
(λ (-> True)
(λ {_} {_}
```

Nothing new here. Let’s look at PLC:

```
let* type SimpleRec = Λ (rec :: (* -> *) -> *). Λ (f :: * -> *). f (rec f)
type NatF = Λ f. ∀ r. (f -> r) -> r -> r
type Nat = ifix SimpleRec NatF
!Succ = λ (n :: Nat) -> wrap {SimpleRec} {NatF} (
s :: Nat -> r) (z :: r) -> s n)
λ {r} (!Zero = wrap {SimpleRec} {NatF} (
s :: Nat -> r) (z :: r) -> z)
λ {r} (!Nat_match = λ (x :: Nat) -> unwrap x
-- definition of Bool, True, False as discussed in section on datatypes
in λ (n :: Nat) ->
Nat_match n {∀ _. Bool}
ipv :: Nat) {_} -> False)
(λ (-> True)
(λ {_} {_}
```

Other than the use of `SimpleRec`

that we discussed in the previous section^{7}, this introduces no new concepts. The UPLC code is straight-forward, because the wrapping/unwrapping simply disappears:

`-> ( n! (λ ipv ~ ~ t f -> f) (λ ~ ~ t f -> t) )! λ n `

### Term-level recursion

*In Haskell*. Term-level fixpoints

So far we have been talking about recursive *types*; let’s now turn our attention to recursive *terms*. Let’s consider addition on natural numbers:

```
data Nat = Succ Nat | Zero
add :: Nat -> Nat -> Nat
Zero m = m
add Succ n) m = Succ (add n m) add (
```

Just like we did with types, we can abstract out the recursive call:

```
addF :: (Nat -> Nat -> Nat) -> Nat -> Nat -> Nat
Zero m = m
addF f Succ n) m = Succ (f n m) addF f (
```

And as with types, we really want to pass the function *itself* as its own argument:

`..))) addF (addF (addF (addF `

We can do this by defining a term-level fixpoint combinator:

```
fix :: (a -> a) -> a
= f (fix f)
fix f
add :: Nat -> Nat -> Nat
= fix addF add
```

Let’s verify:

```
add== fix addF
== addF (fix addF)
== addF add
```

*In Haskell*. Deriving term-level recursion from type-level recursion

In the previous section, we abstracted out the recursion from `add`

, but we still used recursion in the definition of `fix`

. Stunningly, it is possible to define the term-level fixpoint combinator without using recursion at all! This is the famous Y-combinator. As we saw above, at the heart of recursion is applying a function to itself, and this self-application is also at the heart of the Y-combinator; in pseudo-Haskell:

```
fix :: forall a. (a -> a) -> a
= λ f -> (λ s -> f (s s)) (λ s -> f (s s)) -- not real Haskell fix
```

Let’s verify:

```
add== fix addF
== (λ s -> addF (s s)) (λ s -> addF (s s))
== addF ((λ s -> addF (s s)) (λ s -> addF (s s)))
== addF (fix addF)
== addF add
```

The above definition is pseudo-Haskell, because it does not typecheck. Consider

```
selfApply :: ??
= f f selfApply f
```

What would be the type of `selfApply`

? Since we are apply `f`

to *something*, it must have type `a -> b`

for some `a`

and `b`

; and since we are applying it to itself, we must then also have that `a`

*is the same as* `a -> b`

. If we tried this in `ghc`

, it would complain that it cannot solve this problem:

```
• Occurs check: cannot construct the infinite type: a ~ a -> b
• In the first argument of ‘f’, namely ‘f’
In the expression: f f
In an equation for ‘selfApply’: selfApply f = f f
```

However, we *can* define a type `T b`

such that `T b`

is isomorphic to `T b -> b`

:

`newtype Self b = WrapSelf { unwrapSelf :: Self b -> b }`

This allows us to define self-application^{8}:

```
selfApply :: Self b -> b
{-# NOINLINE selfApply #-}
= (unwrapSelf f) f selfApply f
```

and therefore the Y-combinator:

```
fix :: forall a. (a -> a) -> a
= selfApply $ WrapSelf $ λ s -> f (selfApply s) fix f
```

If we remove all the newtype wrapping and unwrapping, we can see that this is indeed the Y-combinator:

```
fix f== selfApply $ WrapSelf $ λ s -> f (selfApply s)
== selfApply $ λ s -> f (selfApply s)
== selfApply $ λ s -> f (s s)
== (λ s -> f (s s)) (λ s -> f (s s))
```

*In Haskell*. Using `IFix`

We have now managed to define a term-level fixpoint combinator without using term-level recursion, but we still used type-level recursion in the definition of `Self`

. The final step is to define `Self`

using `IFix`

instead. As always, we first abstract out the recursion from `Self`

:

```
newtype SelfF (self :: * -> *) (a :: *) = SelfF {
unSelfF :: self a -> a
}
```

Fortunately, the kind of `SelfF`

lines up nicely with the kind expected by `IFix`

, so we can simply define

`type Self = IFix SelfF`

The definition of the fixpoint combinator remains essentially the same; we just need to now wrap/unwrap both the `SelfF`

newtype and the `IFix`

newtype.

```
fix :: forall a. (a -> a) -> a
= let w = wrap $ SelfF $ λ s -> f (selfApply s) in selfApply w fix f
```

However, Since Plutus is a strict language, it only makes sense to take the fixpoint of functions (you can’t define an infinite list in Plutus, for example). The fixpoint in Plutus therefore is specialized to functions; in Haskell we could define this as:^{9}

```
fixFun :: forall a b. ((a -> b) -> (a -> b)) -> (a -> b)
= let w = wrap $ SelfF $ λ s x -> f (selfApply s) x in selfApply w fixFun f
```

#### Example: addition

We are now ready to discuss our final and most involved Plutus program: addition. As a reminder, the Haskell code looked like

```
data Nat = Succ Nat | Zero
add :: Nat -> Nat -> Nat
Zero m = m
add Succ n) m = Succ (add n m) add (
```

In PIR:

```
let rec data Nat = Succ Nat | Zero
!add = λ (n :: Nat) (m :: Nat) ->
rec Nat_match n {∀ _. Nat}
n' :: Nat) {_} -> Succ (add n' m))
(λ (-> m)
(λ {_}
{_}in add
```

To go from PIR to PLC, we need to move the type-level recursion from `Nat`

, the term-level recursion from `add`

, and introduce Scott encoding:^{10}

```
let* type SimpleRec = Λ (rec :: (* -> *) -> *). Λ (f :: * -> *). f (rec f)
type NatF = Λ f. ∀ r. (f -> r) -> r -> r
type Nat = ifix SimpleRec NatF
!Succ = λ (n :: Nat) ->
SimpleRec} {NatF} (
wrap {s :: Nat -> r) (z :: r) -> s n)
λ {r} (!Zero = wrap {SimpleRec} {NatF} (
s :: Nat -> r) (z :: r) -> z)
λ {r} (!Nat_match = λ (x :: Nat) -> unwrap x
type SelfF = Λ (self :: * -> *). Λ a. self a -> a
type Self = Λ a. ifix SelfF a
!addF = λ (f :: Nat -> Nat -> Nat) (n :: Nat) (m :: Nat) ->
Nat_match n {∀ _. Nat}
n' :: Nat) {_} -> Succ (f n' m))
(λ (-> m)
(λ {_}
{_}!w = wrap {SelfF} {Nat -> Nat -> Nat} (
s :: Self (Nat -> Nat -> Nat)) (n :: Nat) ->
λ (
addF ((unwrap s) s) n)!add = (unwrap w) w
in add
```

There is a lot of detail here, but no new concepts. The most important thing to realize is that the definition of `addF`

is precisely the one we expect, and that we can recognize the definition of `w`

and the use of `(unwrap w) w`

as being precisely the application of (what we called) `fixFun`

to `addF`

.

We are now ready to understand the UPLC version:

```
let* w = λ s n ->
let* add = s s
in λ m -> ( n! (λ n' ~ ~ s z -> s (add n' m)) (λ ~ -> m) )!
in w w
```

This code is pretty difficult to understand without seeing the typed version, but if we compare this to the PLC version we can see what is going on. The only somewhat confusing part is that the compiler has inlined the definition of `addF`

, and assigned the name `add`

to the term `(s s)`

(not unjustifiably: it corresponds to the recursive calls). The double delay in the first argument to `n`

comes from two places: one delay from the definition of `addF`

, and one delay from the definition of `Succ`

.

### The cost of polymorphism

In this final section we will not introduce any new Plutus concepts, but we will consider the following question: suppose we have a *monomorphic* top-level Plutus program; what is the cost of defining this in terms of some *polymorphic* functions? In an ideal world, this cost would be zero, but unfortunately this is not the case at the moment.

#### Parametric polymorphism

We have already seen the cost of polymorphism in the proliferation of `delay`

/`force`

calls in UPLC. However, there is another cost. Let’s consider the compilation of this Haskell code:

```
reversePoly :: [a] -> [a]
= reversePoly' []
reversePoly
reversePoly' :: [a] -> [a] -> [a]
= acc
reversePoly' acc [] :xs) = reversePoly' (x:acc) xs
reversePoly' acc (x
useReversePoly :: [Integer] -> [Integer]
= reversePoly . reversePoly useReversePoly
```

The PIR code is unsurprising:

```
let* rec data List :: * -> * a = Nil | Cons a (List a)
!reversePoly' =
rec acc :: List a) (xs :: List a) ->
λ {a} (List_match {a} xs {∀ _. List a}
-> acc)
(λ {_} x :: a) (xs' :: List a) {_} -> reversePoly' {a} (Cons {a} x acc) xs')
(λ (
{_}!reversePoly = λ {a} -> reversePoly' {a} (Nil {a})
in (λ {b} {c} {a} (f :: b -> c) (g :: a -> b) (x :: a) -> f (g x))
List Integer}
{List Integer}
{List Integer}
{Integer})
(reversePoly {Integer}) (reversePoly {
```

But the PLC code isn’t; specifically, the compilation of `reversePoly'`

:

```
let* type Id = ∀ a. a -> a
type Reverse = ∀ a. List a -> List a -> List a
!reversePoly'F =
f :: Id -> Reverse) (_ :: Id) {a} (acc :: List a) (xs :: List a) ->
λ (List_match {a} xs {∀ _. List a}
-> acc)
(λ {_} x :: a) (xs' :: List a) {_} -> f (λ {a} (x :: a) -> x) {a} (Cons {a} x acc) xs')
(λ (
{_}
!w = wrap {SelfF} {Id -> Reverse} (λ (s :: Self (Id -> Reverse)) (x :: Id) ->
reversePoly'F ((unwrap s) s) x)in (unwrap w) w
```

The definition of `reversePoly'F`

is the version of `reversePoly'`

with the recursive call abstracted out, as before. However, it seems to have gotten an additional argument of type `Id = ∀ a. a -> a`

. This is essentially a different way to delay. Recall from the section on Force and Delay that delay conceptually corresponds to introducing a parameter of type unit; recall furthermore from the section on Scott encoding that the Scott encoding of unit is `∀ a. a -> a`

.

So we can understand the (dummy) parameter of type `Id`

as another delay, but the reason *why* this delay is there, and why it doesn’t just use another type abstraction, is rather technical. Here we will merely state that any recursive polymorphic function will receive such a dummy argument at present; unfortunately, this then also results in some additional code in UPLC, because this does *not* get translated the native `delay`

/`force`

constructs:

```
let* w = λ s _ -> let* reversePoly = s s
in λ ~ acc xs ->
!
( xs~ -> acc)
(λ ~ -> reversePoly (λ ~ x -> x) ! (λ ~ n c -> c x acc) xs')
(λ x xs' !
) = w w
reversePoly_0 = λ ~ -> reversePoly_0 (λ ~ x -> x) ! (λ ~ n c -> n)
reversePoly_1 = reversePoly_1 !
f = reversePoly_1 !
g in λ x_3 -> f (g x_3)
```

Not only do we see that `(λ ~ x -> x)`

being passed around, but it’s also not shared, so this can result in fairly significant code expansion.

#### Adhoc polymorphism

Let’s now consider what happens if we have a type class constraint:

```
allEqualPoly :: forall a. Eq a => a -> [a] -> Bool
= True
allEqualPoly _ [] :ys) = x == y && allEqualPoly x ys
allEqualPoly x (y
useAllEqualPoly :: [Integer] -> Bool
= allEqualPoly 1 useAllEqualPoly
```

PIR:

```
let rec !allEqualPoly = λ {a} (dEq :: (Λa. a -> a -> Bool) a) (x :: a) (ys :: List a) ->
List_match {a} ys {∀ _. Bool}
-> True)
(λ {_} y :: a) (ys' :: List a) {_} ->
(λ (let* !l = dEq x y
!r = allEqualPoly {a} dEq x ys'
in Bool_match l {∀ _. Bool} (λ {_} -> r) (λ {_} -> False) {_})
{_}in allEqualPoly {Integer} (λ (x :: Integer) (y :: Integer) ->
Bool} (equalsInteger x y) True False) 1 ifThenElse {
```

We see that the type class constraint has been replaced with a dictionary argument. The unfortunate thing is that this dictionary argument does not get inlined; in UPLC this program looks like

```
let* True = λ ~ t f -> t
False = λ ~ t f -> f
= λ s _ -> let* allEqualPoly = s s
w in λ ~ dEq x ys ->
!
( ys~ -> True)
(λ ~ -> let* l = dEq x y;
(λ y ys' = (allEqualPoly (λ ~ x -> x))! dEq x ys'
r in ( l! (λ ~ -> r) (λ ~ -> False) )!
)!
)= w w
allEqualPoly
in (allEqualPoly (λ ~ x -> x))! (λ x y -> ifThenElse# ! (equalsInteger# x y) True False) 1#
```

This lack of specialization may also prevent further optimizations (we see the same `Id`

dummy argument that we discussed in the previous section here too, because `allEqualPoly`

is a polymorphic recursion function).

#### Multi-parameter type classes

Finally, let’s consider what happens with multi-parameter type classes

```
class Convert a b where
convert :: a -> b
instance Convert Integer Bool where
= n == 0
convert n
convertPair :: (Convert a a', Convert b b') => (a, b) -> (a', b')
= (convert x, convert y)
convertPair (x, y)
useConvertPair :: (Integer, Integer) -> (Bool, Bool)
= convertPair useConvertPair
```

(The Plutus compiler does not currently support associated data types, nor associated type synonyms.) In PIR:

```
let* data Bool = True | False
= λ(n :: Integer) -> ifThenElse {Bool} (equalsInteger n 0) True False
fConvertIntegerBool data Tuple2 :: * -> * -> * a b = Tuple2 a b
!dConvert = fConvertIntegerBool
!dConvert = fConvertIntegerBool
in λ(pair :: Tuple2 Integer Integer) ->
Tuple2_match
Integer}
{Integer}
{
pairTuple2 Bool Bool}
{x :: Integer) (y :: Integer) -> Tuple2 {Bool} {Bool} (dConvert x) (dConvert y)) (λ(
```

UPLC:

```
let* fConvertIntegerBool = \ ~ n ->
ifThenElse# ! (equalsInteger# n 0#) (\ ~ t f -> t) (\ ~ t f -> f)
dConvert = fConvertIntegerBool!
dConvert' = fConvertIntegerBool!
in \ ds -> ds ! (\ x y ~ pair -> pair (dConvert x) (dConvert' y))
```

Fortunately, here we see no remnants of the fact that we used a type class with multiple parameters.

#### Taking advantage of `ghc`

optimizations

You might wonder why `ghc`

didn’t inline the dictionary in the example of adhoc polymorphism that we saw above. The answer is that it *can*. All of the examples above had the definition of the Haskell code in the same Haskell module as the call to `compile`

:

```
useAllEqualPoly :: [Integer] -> Bool
= ..
useAllEqualPoly
compiledUseAllEqualPoly :: CompiledCode ([Integer] -> Bool)
= $$(compile [|| useAllEqualPoly ||]) compiledUseAllEqualPoly
```

A consequence of this setup is that the Plutus compilation plugin will see the code *before* the `ghc`

optimizer gets to work on it. If instead we put the definition of `useAllEqualPoly`

in a different module than the call to `compile`

, and we add

```
{-# OPTIONS_GHC -fexpose-all-unfoldings #-}
{-# OPTIONS_GHC -fspecialize-aggressively #-}
```

to the module defining `useAllEqualPoly`

, we see that `ghc`

inlines the dictionary entirely, and the resulting UPLC code is of the same size as the code resulting from a *monomorphic* definition of `allEqual`

.

That is encouraging, but unfortunately this does not always result in smaller code. As a simple example, consider something like

```
pred :: Nat -> Nat
pred Zero = error ()
pred (Succ n) = n
predder :: Nat -> Nat
= pred . pred predder
```

The compilation of `predder`

is about 50% larger when compiled with `ghc`

optimizations, because `ghc`

inlined the definition of `pred`

. There are other, less obvious, examples too. For instance, in the parametric polymorphic example of `reverse`

, enabling the `ghc`

optimizer results in more polymorphic recursive let-bindings of the shape discussed in that section, and hence in more of those `(λ ~ x -> x)`

terms, resulting in larger code size.

So your mileage may vary; perhaps enabling the `ghc`

optimizer will result in smaller UPLC code, perhaps not. It may require careful tweaking of the code and/or the `ghc`

optimization flags.

### Conclusions

For many Plutus developers, UPLC is of particular interest, because Plutus programs are subject to stringent resource bounds and so it is often necessary to optimize the UPLC code by making tweaks to the Haskell source code. In order to do so, it helps to understand the compilation pipeline, which is what this blog post attempted to illustrate.

This is of course not the only way to improve the UPLC code. For example, Well-Typed has been working on an optimizing compiler for UPLC called Plutonomy; as one example, Plutonomy can reduce the size of the Marlowe Plutus script by about 20%. However, Plutonomy is hampered by the lack of type information in UPLC; indeed, the presence of explicit delay/force constructs in UPLC is a blessing, enabling some optimizations that would not otherwise be possible without type inference or some other kind of global analysis. Perhaps an optimizer that operated at the level of PLC instead of UPLC would be beneficial for some applications.

An alternative approach is to bypass the entire compilation pipeline altogether, and instead write UPLC code “directly.” One project that makes this possible is Plutarch, which provides a typed EDSL in Haskell for UPLC.

We hand-wrote this example in PIR for this blog post, since illustrating this with a PIR program that would result from the normal Haskell to PIR compiler would look more complicated.↩︎

This is different to the semantics of type arguments in

`ghc`

core: there evaluation does continue underneath the binder.↩︎The Haskell

`Bool`

type does not translate to the built-in`Bool`

type in Plutus Core; that is used only by built-in functions.↩︎The actual dummy argument provided is

`∀ a. a -> a`

.↩︎This is not quite the output of the Plutus compiler; cleaned up a bit to improve readability↩︎

This is not the only possible approach. An alternative would be to define a version of

`Nat`

with an argument, over the unit kind. This would perhaps be more consistent; what we illustrate here is what the Plutus compiler does and it may not be unreasonable to special case types of kind`*`

, given how common they are.↩︎`SimpleRec`

is not named as such in PLC; the definition is simply inlined.↩︎`selfApply`

must be marked as no-inline to prevent`ghc`

from unrolling recursion at compile time forever.↩︎We could of course just define

`fixFun = fix`

; the definition of`fixFun`

we show here matches the one used in Plutus, and is therefore more helpful when trying to understand PLC/UPLC code.↩︎Again, code cleaned up a bit for improved readability.↩︎