Lambda calculus predecessor function reduction steps

Question

I am getting stuck with the Wikipedia description of the predecessor function in lambda calculus.

What Wikipedia says is the following:

PRED := λn.λf.λx. n (λg.λh. h (g f)) (λu.x) (λu.u)

Can someone explain reduction processes step-by-step?

Thanks.

I find lambda notation hard to follow, so I've done step-by-step reductions with S-expressions using Clojure for `(pred zero)`, `(pred one)` and `(pred two)` and [published it on Github.](https://github.com/abiro/church-encoding/blob/master/src/church_encoding/pred_reductions.clj) — Agost Biro, Apr 15 '17 at 11:05

score 32 · Answer 1 · answered Jan 10 '12 at 01:32

Ok, so the idea of Church numerals is to encode "data" using functions, right? The way that works is by representing a value by some generic operation you'd perform with it. We can therefore go in the other direction as well, which can sometimes make things clearer.

Church numerals are a unary representation of the natural numbers. So, let's use Z to mean zero and Sn to represent the successor of n. Now we can count like this: Z, SZ, SSZ, SSSZ... The equivalent Church numeral takes two arguments--the first corresponding to S, and second to Z--then uses them to construct the above pattern. So given arguments f and x, we can count like this: x, f x, f (f x), f (f (f x))...

Let's look at what PRED does.

First, it creates a lambda taking three arguments--n is the Church numeral whose predecessor we want, of course, which means that f and x are the arguments to the resulting numeral, which thus means that the body of that lambda will be f applied to x one time fewer than n would.

Next, it applies n to three arguments. This is the tricky part.

The second argument, that corresponds to Z from earlier, is λu.x--a constant function that ignores one argument and returns x.

The first argument, that corresponds to S from earlier, is λgh.h (g f). We can rewrite this as λg. (λh.h (g f)) to reflect the fact that only the outermost lambda is being applied n times. What this function does is take the accumulated result so far as g and return a new function taking one argument, which applies that argument to g applied to f. Which is absolutely baffling, of course.

So... what's going on here? Consider the direct substitution with S and Z. In a non-zero number Sn, the n corresponds to the argument bound to g. So, remembering that f and x are bound in an outside scope, we can count like this: λu.x, λh. h ((λu.x) f), λh'. h' ((λh. h ((λu.x) f)) f) ... Performing the obvious reductions, we get this: λu.x, λh. h x, λh'. h' (f x) ... The pattern here is that a function is being passed "inward" one layer, at which point an S will apply it, while a Z will ignore it. So we get one application of f for each S except the outermost.

The third argument is simply the identity function, which is dutifully applied by the outermost S, returning the final result--f applied one fewer times than the number of S layers n corresponds to.

+1 for the important insight that "the body of that lambda will be `f` applied to `x` one time fewer than `n` would." But how does it achieve this goal is still beyond me from your description. It would probably help with adding some more abstractions to this formula, abstracting the ideas in a bit higher level. For example, by replacing that `Lu.u` with `I`, the Identity function. And maybe some others too. I saw an interesting explanation of it here: http://www.mactech.com/articles/mactech/Vol.07/07.05/LambdaCalculus/index.html which deciphers these lambdas as list operations (cons,car,cdr). — SasQ, Mar 14 '12 at 02:58
I think the list version ends up being a different, more elaborate implementation, albeit an easier to understand one. The predecessor definition here really is just hard to comprehend, and the best way to see how it works might be to step through the evaluation manually to get a feel for what's going on. — C. A. McCann, Mar 14 '12 at 03:20

score 7 · Answer 2 · edited Feb 12 '17 at 02:40

McCann's answer explains it pretty well. Let's take a concrete example for Pred 3 = 2:

Consider expression: n (λgh.h (g f)) (λu.x). Let K = (λgh.h (g f))

For n = 0, we encode 0 = λfx.x, so when we apply the beta reduction for (λfx.x)(λgh.h(gf)) means (λgh.h(gf)) is replaced 0 times. After further beta-reduction we get:

λfx.(λu.x)(λu.u)

reduces to

λfx.x

where λfx.x = 0, as expected.

For n = 1, we apply K for 1 times:

(λgh.h (g f)) (λu.x) => λh. h((λu.x) f) => λh. h x

For n = 2, we apply K for 2 times:

(λgh.h (g f)) (λh. h x) => λh. h ((λh. h x) f) => λh. h (f x)

For n = 3, we apply K for 3 times:

(λgh.h (g f)) (λh. h (f x)) => λh.h ((λh. h (f x)) f) => λh.h (f (f x))

Finally, we take this result and apply an id function to it, we got

λh.h (f (f x)) (λu.u) => (λu.u)(f (f x)) => f (f x)

This is the definition of number 2.

The list based implementation might be easier to understand, but it takes many intermediate steps. So it is not as nice as the Church's original implementation IMO.

I don't understand how you get from "For n = so and so many times we apply K for so and so many times" to each of the first lines of the derivations. — Zelphir Kaltstahl, Sep 26 '16 at 23:38
Yeah it's not explicitly stated; but it comes from the `f` term in a numeral. For instance 2 is defined as `λfx.f(fx)`. So that means when the beta reduction is done for `(λfx.f(fx))(λgh.h (g f))` the `(λgh.h (g f))` function is applied twice. — Greg, Feb 12 '17 at 02:33

score 1 · Answer 3 · answered Aug 27 '17 at 17:36

After Reading the previous answers (good ones), I’d like to give my own vision of the matter in hope it helps someone (corrections are welcomed). I’ll use an example.

First off, I’d like to add some parenthesis to the definition that made everything clearer to me. Let’s redifine the given formula to:

PRED := λn λf λx.(n (λgλh.h (g f)) (λu.x)) (λu.u)

Let’s also define three Church numerals that will help with the example:

Zero := λfλx.x
One := λfλx. f (Zero f x)
Two := λfλx. f (One f x)
Three := λfλx. f (Two f x)

In order to understand how this works, let's focus first on this part of the formula:

n (λgλh.h (g f)) (λu.x)

From here, we can extract this conclusions: n is a Church numeral, the function to be applied is λgλh.h (g f) and the starting data is λu.x

With this in mind, let's try an example:

PRED Three := λf λx.(Three (λgλh.h (g f)) (λu.x)) (λu.u)

Let's focus first on the reduction of the numeral (the part we explained before):

Three (λgλh.h (g f)) (λu.x)

Which reduces to:

(λgλh.h (g f)) (Two (λgλh.h (g f)) (λu.x))
(λgλh.h (g f)) ((λgλh.h (g f)) (One (λgλh.h (g f)) (λu.x)))
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) (Zero (λgλh.h (g f)) (λu.x))))
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) ((λfλx.x) (λgλh.h (g f)) (λu.x)))) -- Here we lose one application of f
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) (λu.x)))
(λgλh.h (g f)) ((λgλh.h (g f)) (λh.h ((λu.x) f)))
(λgλh.h (g f)) ((λgλh.h (g f)) (λh.h x))
(λgλh.h (g f)) (λh.h ((λh.h x) f))
(λgλh.h (g f)) (λh.h (f x))
(λh.h ((λh.h (f x) f)))

Ending up with:

λh.h f (f x)

So, we have:

PRED Three := λf λx.(λh.h (f (f x))) (λu.u)

Reducing again:

PRED Three := λf λx.((λu.u) (f (f x)))
PRED Three := λf λx.f (f x)

As you can see in the reductions, we end up applying the function one time less thanks to a clever way of using functions.

Using add1 as f and 0 as x, we get:

PRED Three add1 0 := add1 (add1 0) = 2

Hope this helps.

score 0 · Answer 4 · answered Jan 21 '17 at 21:50

You can try to understand this definition of the predecessor function (not my favourite one) in terms of continuations.

To simplify the matter a bit, let us consider the following variant

    PRED := λn.n (λgh.h (g S)) (λu.0) (λu.u)

then, you can replace S with f, and 0 with x.

The body of the function iterates n times a transformation M over an argument N. The argument N is a function of type (nat -> nat) -> nat that expects a continuation for nat and returns a nat. Initially, N = λu.0, that is it ignores the continuation and just returns 0. Let us call N the current computation.

The function M: (nat -> nat) -> nat) -> (nat -> nat) -> nat modifies the computation g: (nat -> nat)->nat as follows. It takes in input a continuation h, and applies it to the result of continuing the current computation g with S.

Since the initial computation ignored the continuation, after one application of M we get the computation (λh.h 0), then (λh.h (S 0)), and so on.

At the end, we apply the computation to the identity continuation to extract the result.

Dmitri Gekhtman · Answer 5 · 2019-09-25T16:57:18.017

I'll add my explanation to the above good ones, mostly for the sake of my own understanding. Here's the definition of PRED again:

PRED := λnfx. (n (λg (λh.h (g f))) )  λu.x λu.u

The stuff on the right side of the first dot is supposed to be the (n-1) fold composition of f applied to x: f^(n-1)(x).

Let's see why this is the case by incrementally grokking the expression.

λu.x is the constant function valued at x. Let's just denote it const_x.

λu.u is the identity function. Let's call it id.

λg (λh.h (g f)) is a weird function that we need to understand. Let's call it F.

Ok, so PRED tells us to evaluate the n-fold composition of F on the constant function and then to evaluate the result on the identity function.

PRED := λnfx. F^n const_x id

Let's take a closer look at F:

F:= λg (λh.h (g f))

F sends g to evaluation at g(f). Let's denote evaluation at value y by ev_y. That is, ev_y := λh.h y

So

F = λg. ev_{g(f)}

Now we figure out what F^n const_x is.

F const_x = ev_{const_x(f)} = ev_x

and

F^2 const_x = F ev_x = ev_{ev_x(f)} = ev_{f(x)}

Similarly,

F^3 const_x = F ev_{f(x)} = ev_{f^2(x)}

and so on:

F^n const_x = ev_{f^(n-1)(x)}

Now,

PRED = λnfx. F^n const_x id

     = λnfx. ev_{f^(n-1)(x)} id

     = λnfx. id(f^(n-1)(x))

     = λnfx. f^(n-1)(x)

which is what we wanted.

Super goofy. The idea is to turn doing something n times into doing f n-1 times. The solution is to apply F n times to const_x to obtain ev_{f^(n-1)(x)} and then to extract f^(n-1)(x) by evaluating at the identity function.

score 0 · Answer 6 · answered Oct 16 '22 at 06:36

Split this definition

PRED := λn.λf.λx.n (λg.λh.h (g f)) (λu.x) (λu.u)

into 4 parts:

PRED := λn.λf.λx. | n | (λg.λh.h (g f)) | (λu.x) | (λu.u)
                    -   ---------------   ------   ------
                    A   B                 C        D

For now, ignore D. By definition of Church numerals, A B C is B^n C: Apply n folds of B to C.

Now treat B like a machine that turns one input into one output. Its input g has form λh.h *, when appended by f, becomes (λh.h *) f = f *. This adds one more application of f to *. The result f * is then prepended by λh.h to become λh.h (f *).

You see the pattern: Each application of B turns λh.h * into λh.h (f *). If we had λh.h x as the begin term, we would have λh.h (f^n x) as the end term (after n applications of B).

However, the begin term is C = (λu.x), when appended by f, becomes (λu.x) f = x, then prepended by λh.h to become λh.h x. So we had λh.h x after, not before, the first application of B. This is why we have λh.h (f^(n-1) x) as the end term: The first application of f was ignored.

Finally, apply λh.h (f^(n-1) x) to D = (λu.u), which is identity, to get f^(n-1) x. That is:

PRED := λn.λf.λx.f^(n-1) x

Lambda calculus predecessor function reduction steps

6 Answers6