Plutus is a strict, pure functional language. It is developed by IOHK for use on the Cardano blockchain, but in this blog post we will not be concerned with specific applications of the language, but instead look at its compilation pipeline.
Technically speaking, Plutus is not one language, but three, and most people who write “Plutus” are not really writing Plutus at all, but Haskell. These Haskell programs are translated to the Plutus Intermediate Representation (PIR). After that, data types are replaced by their Scott encodings, and recursion is replaced by explicit fixpoints to get to typed Plutus Core (PLC). Finally, types are erased to get to the Untyped Plutus Core (UPLC).
In this blog post, we will explain this entire process, with a particular focus on
- The consequences of polymorphism in the typed language on the untyped language
- Scott encoding
- Type and term level fixpoints
We won’t try to explain the default syntax of Plutus Core in this blog post, because it is very verbose; the syntax we used here is generated by a custom pretty-printer we wrote. If you want to learn more about the default syntax, or need a more gentle introduction to lambda calculus, you might want to check out An Introduction to Plutus Core.
Polymorphism
Throughout this blog post we will assume the reader is familiar with Haskell, and we will use Haskell in sections marked In Haskell as a vehicle for explaining some of the concepts needed to understand Plutus code, before showing those concepts in Plutus itself.
In Haskell. Force and Delay
Suppose we need to work with an API that is stricter than we’d like it to be.
To have a concrete but simple (if contrived) example, let’s say we want to use
this strict implementation of if-then-else
:
strictIfThenElse :: Bool -> a -> a -> a
!b !t !f = if b then t else f strictIfThenElse
If we call this function with two arguments t
and f
, it will evaluate both
arguments before even looking at b
, even though only one will be required.
> strictIfThenElse True 'a' undefined
*** Exception: Prelude.undefined
If we want to avoid this problem, we somehow need to delay evaluation of the arguments. We could do this by giving them a dummy argument of unit type:
type Delayed a = () -> a
delay :: a -> Delayed a
= const x delay x
Going the other direction, if we have a delayed argument, in order to evaluate it be must force it:
force :: Delayed a -> a
= x () force x
Using delay
and force
we can get the behaviour we’d expect:
> force $ strictIfThenElse True (delay 'a') (delay undefined)
'a'
Simple example: function application
With these first preliminaries out of the way, let’s start taking a look at Plutus now. The compilation pipeline is as follows:
- Plutus programs are written using Haskell as a surface language.
- The Haskell code is translated to the Plutus Intermediate Representation (PIR).
- Through a series of transformations the compiler turns PIR into (typed) Plutus Core (PLC). The main difference between PIR and PLC is that the former has support for datatypes and recursion, and the latter does not.
- Finally, PLC is turned into Untyped Plutus Core (UPLC).
We will see lots of examples of each of these representations in this blog post, but let’s start with something very simple. Consider the following Haskell code:
apMono :: (Integer -> Integer) -> Integer -> Integer
= f x apMono f x
For these first few simple examples, the PIR and PLC representations are identical:
f :: Integer -> Integer) (x :: Integer) -> f x λ (
PIR and PLC are explicitly typed languages, so every variable binder is given a type annotation. These types are then removed in the translation to UPLC:
λ f x -> f x
No surprises so far, so let’s consider something a bit more interesting:
apId :: (forall a. a -> a) -> Integer -> Integer
= f x apId f x
Same code as above, but with a different type: we now insist that the first
argument is polymorphic in a
. The only (total) function of type
forall a. a -> a
is the identity, so apId
isn’t particularly useful, but it
will serve to illustrate a point. The PIR/PLC translation of apId
is:
f :: ∀ a. a -> a) (x :: Integer) -> f {Integer} x λ (
Polymorphic functions in Haskell become functions that take type arguments in
PIR; in the body we see f
being applied to the type argument Integer
and
then to the regular argument x
. (We will use curly brackets for type arguments
and type application.) Still nothing unusual, but in the translation to UPLC we
see the first Plutus specific feature:
-> f! x λ f x
That exclamation mark is the
force operator we discussed above. UPLC has explicit constructs for delay and force,
which we will write as λ ~ -> e
and e!
, respectively:
~ -> e == delay e
λ ! == force e e
The question is: why is f
being forced here?
Polymorphism versus strictness
To illustrate why f
was being forced in that last example, let’s consider
a slightly more interesting PIR program1
let* !boom = λ {a} -> ERROR
!id = λ {a} (x :: a) -> x
in λ (b :: Bool) -> ifThenElse {∀ a. a -> a} b boom id
This defines a function that takes an argument b
of type Bool
. When the
caller passes True
, the function evaluates to ERROR
(aka undefined
in
Haskell); otherwise it evaluates to the identity function.
You might expect the untyped version of this program to look something like this:
-> ifThenElse b ERROR (λ x -> x) λ b
However, as we mentioned above, Plutus Core is a strict language: function
arguments are evaluated before the function is called. This means that this
translation is not correct: the call to ERROR
would be evaluated and the
program would crash, independent of the value of b
.
In the original program this is not the case: the argument passed to
ifThenElse
is (λ {a} -> ERROR)
, which is a function. Evaluating a function
does nothing, so the original PIR program does have the behaviour we need.2
This means that when we erase types to get from PLC to UPLC, we have to make sure that we don’t make programs stricter than they were. The Plutus compiler does this by replacing type arguments by delays. The UPLC translation of our example is:
-> ifThenElse# ! b (λ ~ -> ERROR) (λ ~ y -> y) λ b
Using the definitions from the section on Force and Delay, we could write this UPLC code in (Pseudo-)Haskell as
maybeExplode :: Bool -> Delayed (∀ a. a -> a)
= strictIfThenElse b (delay undefined) (delay id) maybeExplode b
This is nearly identical to the strictIfThenElse
example we saw before, except
that there is no need to force the final result: we want to return a delayed
function, because we are returning a polymorphic function.
Finally, this also explains why we need to force the call to the built-in
function ifThenElse#
: it too is a polymorphic function, and hence it too is
delayed.
Datatypes
In Haskell. Scott encoding
Consider pattern matching on a value of type Maybe a
: we will need two cases,
one case for Nothing
and one case for Just (x :: a)
. We can capture this
notion of “pattern matching on Maybe
” as a function:
toSMaybe :: ∀ a. Maybe a -> (∀ r. r -> (a -> r) -> r)
= λ n j -> maybe n j x toSMaybe x
As is clear from the definition, this is just the well-known function maybe
from the standard library with the arguments in a different order (the reason
for the strange name toSMaybe
will become apparent soon). We can also go the
other way: from the pattern matching function to the value of type Maybe a
:
fromSMaybe :: (∀ r. r -> (a -> r) -> r) -> Maybe a
= x Nothing Just fromSMaybe x
In fact, it’s not difficult to see that these two functions are mutually
inverse. This means that the two representations Maybe a
and
(∀ r. r -> (a -> r) -> r)
are isomorphic, and we could define Maybe
to be
the type of pattern matching functions. This is known as the Scott encoding of
Maybe
:
type SMaybe a = forall r. r -> (a -> r) -> r
As some further examples of Scott encoding, we will consider the case for
booleans, pairs, and unit (we will see the Plutus equivalent of all of these
later). In the case of Bool
, pattern matching will need a case for True
and
a case for False
; this means that the translation to the Scott encoding turns
out to be if-then-else
:
type SBool = forall r. r -> r -> r
toSBool :: Bool -> SBool
= λ t f -> if x then t else f
toSBool x
fromSBool :: SBool -> Bool
= x True False fromSBool x
For pairs, we only need one case, which will need to accept both components of
the pair as arguments. This means that the Scott-encoding is essentially
uncurry
:
type SPair a b = forall r. (a -> b -> r) -> r
toSPair :: (a, b) -> SPair a b
= λ f -> uncurry f x
toSPair x
fromSPair :: SPair a b -> (a, b)
= x (,) fromSPair x
Finally, for unit we only need one case, with no arguments; i.e., the
Scott-encoding of the unit value ()
is the identity function:
type SUnit = forall r. r -> r
toSUnit :: () -> SUnit
= id
toSUnit ()
fromSUnit :: SUnit -> ()
= x () fromSUnit x
This last example may seem a bit silly, but we will encounter it again later.
Scott encoding is not Church encoding
For readers familiar with the Church encoding, it may be worth pointing out that the Scott encoding is not a fold; it only captures pattern matching, not recursion. Concretely, the Scott encoding of natural numbers is given by
newtype SNat = SNat (forall r. (SNat -> r) -> r -> r)
Note the recursive occurrence of SNat
here. This means for example that unlike
with Church encodings, we can easily define a predecessor function:
pred :: SNat -> SNat
pred (SNat x) = x id (error "pred: zero")
This example will look different in Plutus, however; we will come back to it when we discuss recursion.
Simple example: booleans
Now that we understand what Scott encodings are, let’s look at how booleans work in Plutus Core, and consider compilation of the the Haskell expression
ifThenElse :: Bool -> a -> a -> a
= λ b t f -> if b then t else f ifThenElse
In PIR, this looks like
let* data Bool = True | False
in λ {a} (b :: Bool) (t :: a) (f :: a) ->
Bool_match b {∀ _. a} (λ {_} -> t) (λ {_} -> f) {_}
Every PIR program is entirely self-contained (no external libraries), so all
datatype and function definitions are included. Unlike in Haskell, datatype
declarations are scoped, and automatically introduce an explicit pattern
matching function; for Bool
this pattern matching function is called
Bool_match
.
The body of the function does not introduce any concepts we didn’t see before:
if-then-else
is translated3
by a call to the pattern matching function on Bool
. In order to prevent the
evaluation of t
and f
before the pattern match on b
, we delay those
arguments by giving them a dummy type argument; this makes them polymorphic
functions, and we then force the choice once it is made by applying it a dummy
type argument4. As we
discussed above, this type abstraction and application will then be translated
to force
/delay
calls when the program is translated to UPLC, as we will see
shortly.
Of course, in this particular case, the Plutus compiler is being overly
conservative: since t
and f
are variables (lambda arguments), we know
that they must already have been evaluated (Plutus is a strict language
after all), and so the force/delay in this example is redundant and could in
principle be optimized away.
This is our first example of a program that looks different in PIR and PLC,
because PLC does not have data types; instead, we see the Scott encoding of
Bool
appear:
let* type Bool = ∀ r. r -> r -> r
!True = λ {r} (t :: r) (f :: r) -> t
!False = λ {r} (t :: r) (f :: r) -> f
!Bool_match = λ (x :: ∀ r. r -> r -> r) -> x
in λ {a} (b :: Bool) (t :: a) (f :: a) ->
Bool_match b {∀ _. a} (λ {_} -> t) (λ {_} -> f) {_}
We can see that Bool
is now the Scott-encoding of Bool
we saw in the
previous section; True
and False
have become functions that select the
first and second argument respectively, and Bool_match
is now the identity
function because Scott-encoded data types are pattern matching functions.
Finally, the UPLC translation is
~ b t f -> ( b! (λ ~ -> t) (λ ~ -> f) )! λ
We have now seen all concepts required to understand this code:
- The function itself is polymorphic, and so it is delayed; this explains the initial dummy argument.
b
is the Scott-encoding of a Bool, is therefore a polymorphic function, which turned into a function with a dummy argument in UPLC, and so we must forceb
in order to actually use that function.- We delay both
t
andf
, and once the choice is made, we force the result.
This definition is very similar to the
strictIfThenElse
example above;
the only difference really is the top-level delay; it’s as if we had written
this in Haskell:
$ λ b t f -> force $ force b (delay t) (delay f) delay
Types with arguments: pairs
The Bool
type discussed in the previous section had no arguments: it had kind
Type
. As a second example of Scott encodings, let’s consider a type that
does have arguments: the type of pairs. This type has two type arguments
(the two components), and hence has kind Type -> Type -> Type
(also written * -> * -> *
).
Let’s consider the compilation of the Haskell program
fst :: (a, b) -> a
fst (x, _y) = x
In PIR, this program looks like:
let* data (Tuple2 :: * -> * -> *) a b = Tuple2 a b
in λ {a} {b} (pair :: Tuple2 a b) ->
Tuple2_match {a} {b} pair {a} (λ (x :: a) (y :: b) -> x)
Like before, the pattern match on the pair has been replaced by an explicit call
to the pattern matching function. Unlike in the Bool
case, this pattern
matching function now has three type arguments: two for the type arguments of
the pair (a
and b
), and one for the result type (here, a
, because we are
selecting the first component of the pair).
The PLC version of this program is
let* type Tuple2 :: * -> * -> * = Λ a b. ∀ r. (a -> b -> r) -> r
!Tuple2 = λ {a} {b} (x :: a) (y :: b) {r} (f :: a -> b -> r) -> f x y
!Tuple2_match = λ {a} {b} (x :: (Λ a b. ∀ r. (a -> b -> r) -> r) a b) -> x
in λ {a} {b} (pair :: Tuple2 a b) ->
Tuple2_match {a} {b} pair {a} (λ (x :: a) (y :: b) -> x)
The most interesting thing here is that the type Tuple2
has become a
type-level function (syntax Λ a. t
), which computes the type of the pattern
matching function given two arguments a
and b
. Note the kind difference
between
Λ a b. ... :: * -> * -> *
and (for a given a
and b
)
∀ r. ... :: *
The rest of the code is reasonably straight-forward: the Tuple2
term is the
constructor for pairs, Tuple2_match
is the pattern matching function and is
again the identity function (because data types are pattern matching functions
when we use Scott encodings).
Finally, this program compiles down to the following UPLC:
~ ~ pair -> pair! (λ x y -> x) λ
No new concepts here; it is perhaps somewhat surprising to see the double delay of this function, arising from the fact that it has two type arguments. Indeed, one might wonder if some of these calls to delay/force might not be avoided through a compiler optimization pass.
Type-level recursion
In Haskell. Type-level fixpoints
When we introduced Scott encoding, we saw that we can take something that we normally think of as a feature of a language (pattern matching) and turn it into something that we can work with in the language (functions). In this section we will see that we can do the same with another language feature: type-level recursion (we will delay a discussion of term level recursion until the next section). For a discussion of why one might do this in Haskell, see for example the paper by Rodriguez et al. on generic programming, or the paper by Bahr and Hvitved on compositional data types; why the Plutus compiler does it is discussed in the Plutus paper.
Consider the standard definition of lists:
data List a = Nil | Cons a (List a)
List
takes a single type argument, and hence has kind
List :: * -> *
In order to identify recursion as its own concept, we can abstract out the
recursive use of List
, and give List
an additional argument that
will stand in for “recursive calls”:
data ListF f a = Nil | Cons a (f a)
Now f
has the same kind that List
had previously, which means that we have
ListF :: (* -> *) -> (* -> *)
What should we pass as the value for f
? In a sense, we want to pass ListF
itself, but if we do, then that ListF
also wants an argument, ad infinitum:
ListF (ListF (ListF (ListF ..))) a
This, of course, is precisely recursion. We can capture this using a type-level fixpoint operator:
newtype Fix (t :: (* -> *) -> (* -> *)) (a :: *) = WrapFix (t (Fix t) a)
The kind of the argument t
to Fix
is precisely the kind of ListF
, so
we can define
type List = Fix ListF
Let’s check if this makes sense:
List a
== Fix ListF a
== ListF (Fix ListF) a
== ListF List a
In Haskell. Generalizing to arbitrary kinds: IFix
If we look closely at the arguments of Fix
, we realize that it is more (kind)
monomorphic than it needs to be, and there is an obvious generalization:
newtype IFix (t :: (k -> *) -> (k -> *)) (a :: k) = WrapIFix (t (IFix t) a)
The Plutus paper goes through some length explaining why this
definition is sufficient to capture all datatypes (Section 3.1, Recursive
types). We will not do that in this blog post, but we will see three different
example use cases, starting with lists: notice that we get Fix
back simply by
instantiating k
to *
:
type List = IFix FListF
Let’s introduce two functions to wrap and unwrap IFix
:
wrap :: t (IFix t) a -> IFix t a
= WrapIFix
wrap
unwrap :: IFix t a -> t (IFix t) a
WrapIFix t) = t unwrap (
If we now write functions on lists, we must wrap
and unwrap
at the
appropriate times:
nil :: FList a
= wrap $ Nil
nil
cons :: a -> FList a -> FList a
= wrap $ Cons x xs
cons x xs
null :: FList a -> Bool
null xs =
case unwrap xs of
Nil -> True
Cons _ _ -> False
In Haskell. Combining with Scott encoding
Before we look at how these fixpoints are used in Plutus, let’s stay in Haskell a tiny bit longer, and first see how fixpoints can be combined with Scott encoding.
First, let’s see the Scott encoding of lists without the use of IFix
:
newtype List a = List {
listMatch :: forall r. r -> (a -> List a -> r) -> r
}
The pattern matching function needs two cases: one for nil
and one for cons
;
the latter then takes two arguments, the head and tail of the list. Some
simple examples:
nil :: List a
= List $ λ n _c -> n
nil
cons :: a -> List a -> List a
= List $ λ _ c -> c x xs
cons x xs
null :: List a -> Bool
null xs = listMatch xs True (λ _ _ -> False)
To combine the Scott encoding with the use of explicit fixpoints, we need to
go through the same process as before, and give List
an additional argument
that captures the recursive uses:
newtype ListF f a = List {
listMatch :: forall r. r -> (a -> f a -> r) -> r
}
type List = IFix IListF
The same three functions we defined above now become:
nil :: List a
= wrap $ List $ λ n _c -> n
nil
cons :: a -> List a -> List a
= wrap $ List $ λ _n c -> c x xs
cons x xs
null :: List a -> Bool
null xs = listMatch (unwrap xs) True (λ _ _ -> False)
Simple example: lists
We are now finally ready to look at how lists work in Plutus. Let’s start with
the definition of nil
. In PIR:
let rec data (List :: * -> *) a = Nil | Cons a (List a)
in Nil
The only new PIR feature we see here is marking the definition of Nat
as
recursive: let
-bindings in PIR are non-recursive unless otherwise indicated.
Let’s look the PLC version now, which is considerably more involved; PLC has
neither datatypes nor recursion, and instead uses Scott encodings and the IFix
combinator5:
let* type ListF = Λ (f :: * -> *). Λ a. ∀ r. r -> (a -> (f a) -> r) -> r
type List = Λ a. ifix ListF a
!Nil = λ {a} ->
ListF} {a} (
wrap {n :: r) (c :: a -> List a -> r) -> n)
λ {r} (!Cons = λ {a} (x :: a) (xs :: List a) ->
ListF} {a} (
wrap {n :: r) (c :: a -> (List a) -> r) -> c x xs)
λ {r} (!List_match = λ {a} (x :: List a) ->
unwrap xin Nil
Hopefully this PLC code is not difficult to understand anymore now; ifix
,
wrap
and unwrap
are PLC primitives. To understand the untyped Plutus code,
we merely need to know that wrap
and unwrap
are simply erased, so this whole
program compiles down to
~ ~ n c -> n λ
The double delay comes from the two type parameters: the type of the list
elements a
, and the type of the continuation r
.
As another example, let’s look at null
(check if a list is empty). First,
in PIR:
let* data Bool = True | False
data List :: * -> * a = Nil | Cons a (List a)
rec in λ {a} (xs :: List a) ->
List_match {a} xs {∀ _. Bool}
-> True)
(λ {_} x :: a) (xs' :: List a) {_} -> False)
(λ ( {_}
The pattern match on the list is replaced by a call to the pattern matching
function, and the two arguments for the Nil
and Cons
cases are delayed, to
avoid unnecessary computation; we have seen these patterns before (and, as
before, here too the use of delay
is unnecessarily conservative). In PLC:
let* -- definition of IListF, List, Nil, Cons, List_match as above
-- definition of Bool, True, False as discussed in section on datatypes
..
in λ {a} (xs :: List a) ->
List_match {a} xs {∀ _. Bool}
-> True)
(λ {_} x :: a) (xs :: List a) {_} -> False)
(λ ( {_}
The main code looks identical, but the definitions have changed from the PLC
version: List_match
, True
and False
are now all using the Scott-encoding.
In UPLC:
~ xs -> ( xs! (λ ~ ~ t f -> t) (λ x xs ~ ~ t f -> f) )! λ
Let’s make sure we really understand this code:
- The top-level delay
(λ ~ xs -> ..)
is there becausenil
is a polymorphic function (type argumenta
). - We must force
xs
becausexs
is represented by the Scott-encoding of a list, which is therefore itself a polymorphic function (type argumentr
). - We give two arguments to
xs
: the case fornil
and the case forcons
. We delay both of those arguments, and then force the result once the choice is made. - Finally,
True
(i.e.,(λ t f -> t)
in Scott encoding) andFalse
(i.e,(λ t f -> f)
) themselves are polymorphic functions (they have their ownr
type argument), and we therefore have one more delay in both arguments toxs
.
In Haskell. Types of kind *
Lists are probably the simplest example of IFix
, because the kind of ListF
ListF :: (* -> *) -> (* -> *)
lines up very nicely with the kind of IFix
:
IFix :: ∀ k. ((k -> *) -> (k -> *)) -> (k -> *)
simply by picking k = *
. We’ll now discuss an example that doesn’t line up
quite so nicely: natural numbers. Basic definition:
data Nat = Succ Nat | Zero
Abstracting out the recursion:
data NatF f = Succ f | Zero
Since NatF
does not take any argument, its kind does not line up with what
IFix
wants:
NatF :: * -> *
What we will do instead is pick some type SimpleRec
such that
IFix SimpleRec :: (* -> *) -> *
In other words, we will pick k = * -> *
. This then allows us to then define
type Nat = IFix SimpleRec NatF
Unlike in the list example, NatF
is now the second argument to IFix
, not
the first; this is quite a different use of IFix
.6
Here’s how we can define SimpleRec
:
newtype SimpleRec (rec :: (* -> *) -> *) (f :: (* -> *)) = SimpleRec (f (rec f))
Let’s see if this makes sense:
Nat
== IFix SimpleRec NatF
== SimpleRec (IFix SimpleRec) NatF
== NatF (IFix SimpleRec NatF)
== NatF Nat
Example: natural numbers
Now that we have seen the theory in Haskell, the Plutus translation is not that difficult anymore. Let’s consider the compilation of
data Nat = Zero | Succ Nat
isZero :: Nat -> Bool
Zero = True
isZero = False isZero _
to Plutus. First, PIR:
let* data Bool = True | False
let rec data Nat = Succ Nat | Zero
in λ (n :: Nat) ->
Nat_match n {∀ _. Bool}
n' :: Nat) {_} -> False)
(λ (-> True)
(λ {_} {_}
Nothing new here. Let’s look at PLC:
let* type SimpleRec = Λ (rec :: (* -> *) -> *). Λ (f :: * -> *). f (rec f)
type NatF = Λ f. ∀ r. (f -> r) -> r -> r
type Nat = ifix SimpleRec NatF
!Succ = λ (n :: Nat) -> wrap {SimpleRec} {NatF} (
s :: Nat -> r) (z :: r) -> s n)
λ {r} (!Zero = wrap {SimpleRec} {NatF} (
s :: Nat -> r) (z :: r) -> z)
λ {r} (!Nat_match = λ (x :: Nat) -> unwrap x
-- definition of Bool, True, False as discussed in section on datatypes
in λ (n :: Nat) ->
Nat_match n {∀ _. Bool}
ipv :: Nat) {_} -> False)
(λ (-> True)
(λ {_} {_}
Other than the use of SimpleRec
that we discussed in the previous
section7, this introduces no new concepts. The UPLC code is straight-forward,
because the wrapping/unwrapping simply disappears:
-> ( n! (λ ipv ~ ~ t f -> f) (λ ~ ~ t f -> t) )! λ n
Term-level recursion
In Haskell. Term-level fixpoints
So far we have been talking about recursive types; let’s now turn our attention to recursive terms. Let’s consider addition on natural numbers:
data Nat = Succ Nat | Zero
add :: Nat -> Nat -> Nat
Zero m = m
add Succ n) m = Succ (add n m) add (
Just like we did with types, we can abstract out the recursive call:
addF :: (Nat -> Nat -> Nat) -> Nat -> Nat -> Nat
Zero m = m
addF f Succ n) m = Succ (f n m) addF f (
And as with types, we really want to pass the function itself as its own argument:
..))) addF (addF (addF (addF
We can do this by defining a term-level fixpoint combinator:
fix :: (a -> a) -> a
= f (fix f)
fix f
add :: Nat -> Nat -> Nat
= fix addF add
Let’s verify:
add== fix addF
== addF (fix addF)
== addF add
In Haskell. Deriving term-level recursion from type-level recursion
In the previous section, we abstracted out the recursion from add
, but we
still used recursion in the definition of fix
. Stunningly, it is possible
to define the term-level fixpoint combinator without using recursion at all!
This is the famous Y-combinator. As we saw above, at the heart
of recursion is applying a function to itself, and this self-application is
also at the heart of the Y-combinator; in pseudo-Haskell:
fix :: forall a. (a -> a) -> a
= λ f -> (λ s -> f (s s)) (λ s -> f (s s)) -- not real Haskell fix
Let’s verify:
add== fix addF
== (λ s -> addF (s s)) (λ s -> addF (s s))
== addF ((λ s -> addF (s s)) (λ s -> addF (s s)))
== addF (fix addF)
== addF add
The above definition is pseudo-Haskell, because it does not typecheck. Consider
selfApply :: ??
= f f selfApply f
What would be the type of selfApply
? Since we are apply f
to something,
it must have type a -> b
for some a
and b
; and since we are applying it
to itself, we must then also have that a
is the same as a -> b
. If we
tried this in ghc
, it would complain that it cannot solve this problem:
• Occurs check: cannot construct the infinite type: a ~ a -> b
• In the first argument of ‘f’, namely ‘f’
In the expression: f f
In an equation for ‘selfApply’: selfApply f = f f
However, we can define a type T b
such that T b
is isomorphic to
T b -> b
:
newtype Self b = WrapSelf { unwrapSelf :: Self b -> b }
This allows us to define self-application8:
selfApply :: Self b -> b
{-# NOINLINE selfApply #-}
= (unwrapSelf f) f selfApply f
and therefore the Y-combinator:
fix :: forall a. (a -> a) -> a
= selfApply $ WrapSelf $ λ s -> f (selfApply s) fix f
If we remove all the newtype wrapping and unwrapping, we can see that this is indeed the Y-combinator:
fix f== selfApply $ WrapSelf $ λ s -> f (selfApply s)
== selfApply $ λ s -> f (selfApply s)
== selfApply $ λ s -> f (s s)
== (λ s -> f (s s)) (λ s -> f (s s))
In Haskell. Using IFix
We have now managed to define a term-level fixpoint combinator without using
term-level recursion, but we still used type-level recursion in the definition
of Self
. The final step is to define Self
using IFix
instead. As always,
we first abstract out the recursion from Self
:
newtype SelfF (self :: * -> *) (a :: *) = SelfF {
unSelfF :: self a -> a
}
Fortunately, the kind of SelfF
lines up nicely with the kind expected by
IFix
, so we can simply define
type Self = IFix SelfF
The definition of the fixpoint combinator remains essentially the same; we just
need to now wrap/unwrap both the SelfF
newtype and the IFix
newtype.
fix :: forall a. (a -> a) -> a
= let w = wrap $ SelfF $ λ s -> f (selfApply s) in selfApply w fix f
However, Since Plutus is a strict language, it only makes sense to take the fixpoint of functions (you can’t define an infinite list in Plutus, for example). The fixpoint in Plutus therefore is specialized to functions; in Haskell we could define this as:9
fixFun :: forall a b. ((a -> b) -> (a -> b)) -> (a -> b)
= let w = wrap $ SelfF $ λ s x -> f (selfApply s) x in selfApply w fixFun f
Example: addition
We are now ready to discuss our final and most involved Plutus program: addition. As a reminder, the Haskell code looked like
data Nat = Succ Nat | Zero
add :: Nat -> Nat -> Nat
Zero m = m
add Succ n) m = Succ (add n m) add (
In PIR:
let rec data Nat = Succ Nat | Zero
!add = λ (n :: Nat) (m :: Nat) ->
rec Nat_match n {∀ _. Nat}
n' :: Nat) {_} -> Succ (add n' m))
(λ (-> m)
(λ {_}
{_}in add
To go from PIR to PLC, we need to move the type-level recursion from Nat
, the
term-level recursion from add
, and introduce Scott encoding:10
let* type SimpleRec = Λ (rec :: (* -> *) -> *). Λ (f :: * -> *). f (rec f)
type NatF = Λ f. ∀ r. (f -> r) -> r -> r
type Nat = ifix SimpleRec NatF
!Succ = λ (n :: Nat) ->
SimpleRec} {NatF} (
wrap {s :: Nat -> r) (z :: r) -> s n)
λ {r} (!Zero = wrap {SimpleRec} {NatF} (
s :: Nat -> r) (z :: r) -> z)
λ {r} (!Nat_match = λ (x :: Nat) -> unwrap x
type SelfF = Λ (self :: * -> *). Λ a. self a -> a
type Self = Λ a. ifix SelfF a
!addF = λ (f :: Nat -> Nat -> Nat) (n :: Nat) (m :: Nat) ->
Nat_match n {∀ _. Nat}
n' :: Nat) {_} -> Succ (f n' m))
(λ (-> m)
(λ {_}
{_}!w = wrap {SelfF} {Nat -> Nat -> Nat} (
s :: Self (Nat -> Nat -> Nat)) (n :: Nat) ->
λ (
addF ((unwrap s) s) n)!add = (unwrap w) w
in add
There is a lot of detail here, but no new concepts. The most important thing to
realize is that the definition of addF
is precisely
the one we expect, and that we can
recognize the definition of w
and the use of (unwrap w) w
as being precisely
the application of (what we called) fixFun
to addF
.
We are now ready to understand the UPLC version:
let* w = λ s n ->
let* add = s s
in λ m -> ( n! (λ n' ~ ~ s z -> s (add n' m)) (λ ~ -> m) )!
in w w
This code is pretty difficult to understand without seeing the typed version,
but if we compare this to the PLC version we can see what is going on. The only
somewhat confusing part is that the compiler has inlined the definition of
addF
, and assigned the name add
to the term (s s)
(not unjustifiably: it
corresponds to the recursive calls). The double delay in the first argument
to n
comes from two places: one delay from the definition of addF
, and
one delay from the definition of Succ
.
The cost of polymorphism
In this final section we will not introduce any new Plutus concepts, but we will consider the following question: suppose we have a monomorphic top-level Plutus program; what is the cost of defining this in terms of some polymorphic functions? In an ideal world, this cost would be zero, but unfortunately this is not the case at the moment.
Parametric polymorphism
We have already seen the cost of polymorphism in the proliferation of
delay
/force
calls in UPLC. However, there is another cost. Let’s consider
the compilation of this Haskell code:
reversePoly :: [a] -> [a]
= reversePoly' []
reversePoly
reversePoly' :: [a] -> [a] -> [a]
= acc
reversePoly' acc [] :xs) = reversePoly' (x:acc) xs
reversePoly' acc (x
useReversePoly :: [Integer] -> [Integer]
= reversePoly . reversePoly useReversePoly
The PIR code is unsurprising:
let* rec data List :: * -> * a = Nil | Cons a (List a)
!reversePoly' =
rec acc :: List a) (xs :: List a) ->
λ {a} (List_match {a} xs {∀ _. List a}
-> acc)
(λ {_} x :: a) (xs' :: List a) {_} -> reversePoly' {a} (Cons {a} x acc) xs')
(λ (
{_}!reversePoly = λ {a} -> reversePoly' {a} (Nil {a})
in (λ {b} {c} {a} (f :: b -> c) (g :: a -> b) (x :: a) -> f (g x))
List Integer}
{List Integer}
{List Integer}
{Integer})
(reversePoly {Integer}) (reversePoly {
But the PLC code isn’t; specifically, the compilation of reversePoly'
:
let* type Id = ∀ a. a -> a
type Reverse = ∀ a. List a -> List a -> List a
!reversePoly'F =
f :: Id -> Reverse) (_ :: Id) {a} (acc :: List a) (xs :: List a) ->
λ (List_match {a} xs {∀ _. List a}
-> acc)
(λ {_} x :: a) (xs' :: List a) {_} -> f (λ {a} (x :: a) -> x) {a} (Cons {a} x acc) xs')
(λ (
{_}
!w = wrap {SelfF} {Id -> Reverse} (λ (s :: Self (Id -> Reverse)) (x :: Id) ->
reversePoly'F ((unwrap s) s) x)in (unwrap w) w
The definition of reversePoly'F
is the version of reversePoly'
with the
recursive call abstracted out, as before. However, it seems to have gotten an
additional argument of type Id = ∀ a. a -> a
. This is essentially a different
way to delay. Recall from the section on Force and
Delay that delay conceptually corresponds to
introducing a parameter of type unit; recall furthermore from the section on
Scott encoding that the Scott encoding of unit is
∀ a. a -> a
.
So we can understand the (dummy) parameter of type Id
as another delay, but
the reason why this delay is there, and why it doesn’t just use another type
abstraction, is rather
technical. Here we will
merely state that any recursive polymorphic function will receive such a dummy
argument at present; unfortunately, this then also results in some additional
code in UPLC, because this does not get translated the native delay
/force
constructs:
let* w = λ s _ -> let* reversePoly = s s
in λ ~ acc xs ->
!
( xs~ -> acc)
(λ ~ -> reversePoly (λ ~ x -> x) ! (λ ~ n c -> c x acc) xs')
(λ x xs' !
) = w w
reversePoly_0 = λ ~ -> reversePoly_0 (λ ~ x -> x) ! (λ ~ n c -> n)
reversePoly_1 = reversePoly_1 !
f = reversePoly_1 !
g in λ x_3 -> f (g x_3)
Not only do we see that (λ ~ x -> x)
being passed around, but it’s also
not shared, so this can result in fairly significant code expansion.
Adhoc polymorphism
Let’s now consider what happens if we have a type class constraint:
allEqualPoly :: forall a. Eq a => a -> [a] -> Bool
= True
allEqualPoly _ [] :ys) = x == y && allEqualPoly x ys
allEqualPoly x (y
useAllEqualPoly :: [Integer] -> Bool
= allEqualPoly 1 useAllEqualPoly
PIR:
let rec !allEqualPoly = λ {a} (dEq :: (Λa. a -> a -> Bool) a) (x :: a) (ys :: List a) ->
List_match {a} ys {∀ _. Bool}
-> True)
(λ {_} y :: a) (ys' :: List a) {_} ->
(λ (let* !l = dEq x y
!r = allEqualPoly {a} dEq x ys'
in Bool_match l {∀ _. Bool} (λ {_} -> r) (λ {_} -> False) {_})
{_}in allEqualPoly {Integer} (λ (x :: Integer) (y :: Integer) ->
Bool} (equalsInteger x y) True False) 1 ifThenElse {
We see that the type class constraint has been replaced with a dictionary argument. The unfortunate thing is that this dictionary argument does not get inlined; in UPLC this program looks like
let* True = λ ~ t f -> t
False = λ ~ t f -> f
= λ s _ -> let* allEqualPoly = s s
w in λ ~ dEq x ys ->
!
( ys~ -> True)
(λ ~ -> let* l = dEq x y;
(λ y ys' = (allEqualPoly (λ ~ x -> x))! dEq x ys'
r in ( l! (λ ~ -> r) (λ ~ -> False) )!
)!
)= w w
allEqualPoly
in (allEqualPoly (λ ~ x -> x))! (λ x y -> ifThenElse# ! (equalsInteger# x y) True False) 1#
This lack of specialization may also prevent further optimizations (we see the
same Id
dummy argument that we discussed in the previous section here too,
because allEqualPoly
is a polymorphic recursion function).
Multi-parameter type classes
Finally, let’s consider what happens with multi-parameter type classes
class Convert a b where
convert :: a -> b
instance Convert Integer Bool where
= n == 0
convert n
convertPair :: (Convert a a', Convert b b') => (a, b) -> (a', b')
= (convert x, convert y)
convertPair (x, y)
useConvertPair :: (Integer, Integer) -> (Bool, Bool)
= convertPair useConvertPair
(The Plutus compiler does not currently support associated data types, nor associated type synonyms.) In PIR:
let* data Bool = True | False
= λ(n :: Integer) -> ifThenElse {Bool} (equalsInteger n 0) True False
fConvertIntegerBool data Tuple2 :: * -> * -> * a b = Tuple2 a b
!dConvert = fConvertIntegerBool
!dConvert = fConvertIntegerBool
in λ(pair :: Tuple2 Integer Integer) ->
Tuple2_match
Integer}
{Integer}
{
pairTuple2 Bool Bool}
{x :: Integer) (y :: Integer) -> Tuple2 {Bool} {Bool} (dConvert x) (dConvert y)) (λ(
UPLC:
let* fConvertIntegerBool = \ ~ n ->
ifThenElse# ! (equalsInteger# n 0#) (\ ~ t f -> t) (\ ~ t f -> f)
dConvert = fConvertIntegerBool!
dConvert' = fConvertIntegerBool!
in \ ds -> ds ! (\ x y ~ pair -> pair (dConvert x) (dConvert' y))
Fortunately, here we see no remnants of the fact that we used a type class with multiple parameters.
Taking advantage of ghc
optimizations
You might wonder why ghc
didn’t inline the dictionary in the
example of adhoc polymorphism that we saw above.
The answer is that it can. All of the examples above had the definition of the
Haskell code in the same Haskell module as the call to compile
:
useAllEqualPoly :: [Integer] -> Bool
= ..
useAllEqualPoly
compiledUseAllEqualPoly :: CompiledCode ([Integer] -> Bool)
= $$(compile [|| useAllEqualPoly ||]) compiledUseAllEqualPoly
A consequence of this setup is that the Plutus compilation plugin will see the
code before the ghc
optimizer gets to work on it. If instead we put the
definition of useAllEqualPoly
in a different module than the call to
compile
, and we add
{-# OPTIONS_GHC -fexpose-all-unfoldings #-}
{-# OPTIONS_GHC -fspecialize-aggressively #-}
to the module defining useAllEqualPoly
, we see that ghc
inlines the
dictionary entirely, and the resulting UPLC code is of the same size as the
code resulting from a monomorphic definition of allEqual
.
That is encouraging, but unfortunately this does not always result in smaller code. As a simple example, consider something like
pred :: Nat -> Nat
pred Zero = error ()
pred (Succ n) = n
predder :: Nat -> Nat
= pred . pred predder
The compilation of predder
is about 50% larger when compiled with ghc
optimizations, because ghc
inlined the definition of pred
. There are other,
less obvious, examples too. For instance, in the parametric polymorphic
example of reverse
, enabling the ghc
optimizer
results in more polymorphic recursive let-bindings of the shape discussed in
that section, and hence in more of those (λ ~ x -> x)
terms, resulting in
larger code size.
So your mileage may vary; perhaps enabling the ghc
optimizer will result in
smaller UPLC code, perhaps not. It may require careful tweaking of the code
and/or the ghc
optimization flags.
Conclusions
For many Plutus developers, UPLC is of particular interest, because Plutus programs are subject to stringent resource bounds and so it is often necessary to optimize the UPLC code by making tweaks to the Haskell source code. In order to do so, it helps to understand the compilation pipeline, which is what this blog post attempted to illustrate.
This is of course not the only way to improve the UPLC code. For example, Well-Typed has been working on an optimizing compiler for UPLC called Plutonomy; as one example, Plutonomy can reduce the size of the Marlowe Plutus script by about 20%. However, Plutonomy is hampered by the lack of type information in UPLC; indeed, the presence of explicit delay/force constructs in UPLC is a blessing, enabling some optimizations that would not otherwise be possible without type inference or some other kind of global analysis. Perhaps an optimizer that operated at the level of PLC instead of UPLC would be beneficial for some applications.
An alternative approach is to bypass the entire compilation pipeline altogether, and instead write UPLC code “directly”. One project that makes this possible is Plutarch, which provides a typed EDSL in Haskell for UPLC.
We hand-wrote this example in PIR for this blog post, since illustrating this with a PIR program that would result from the normal Haskell to PIR compiler would look more complicated.↩︎
This is different to the semantics of type arguments in
ghc
core: there evaluation does continue underneath the binder.↩︎The Haskell
Bool
type does not translate to the built-inBool
type in Plutus Core; that is used only by built-in functions.↩︎The actual dummy argument provided is
∀ a. a -> a
.↩︎This is not quite the output of the Plutus compiler; cleaned up a bit to improve readability↩︎
This is not the only possible approach. An alternative would be to define a version of
Nat
with an argument, over the unit kind. This would perhaps be more consistent; what we illustrate here is what the Plutus compiler does and it may not be unreasonable to special case types of kind*
, given how common they are.↩︎SimpleRec
is not named as such in PLC; the definition is simply inlined.↩︎selfApply
must be marked as no-inline to preventghc
from unrolling recursion at compile time forever.↩︎We could of course just define
fixFun = fix
; the definition offixFun
we show here matches the one used in Plutus, and is therefore more helpful when trying to understand PLC/UPLC code.↩︎Again, code cleaned up a bit for improved readability.↩︎