Here’s a quote from a computer scientist living in what was clearly simpler times:
[…]. Hence I have written the following simple routine, which may separate the man-compilers from the boy-compilers: […]
– Donald Knuth
Here is Knuth’s program in ALGOL60:
begin real procedure A(k, x1, x2, x3, x4, x5); value k; integer k; begin real procedure B; begin k := k - 1; B := A := A(k, B, x1, x2, x3, x4) end; if k ≤ then A : = x4 + x5 else B end; outreal(A(10, 1, -1, -1, 1, 0)) end;
This seemingly innocent program uses a lot of resources to compute its result. It takes a parameter k that is used to scale up the difficulty of the problem. It’s an interesting program to use for testing language implementations, since it’s easy to verify the result and also to compare the performance with other language implementations. People have translated the program into many languages over at Rosetta code.
But what if you have a groundbreaking compiler that manages to compute the solution to a larger k-value than you can find in the tables? You would want to know if you got the right answer. The “last word” from Knuth in ALGOL bulletin #19, page 8, gives us a way to easily compute the answer much faster than running the actual program:
Since then my right hand has observed that the value of A(k, x1, x2, x3, x4, x5) is equal to c1 × x1 + c2 × x2 + c3 × x3 + c4 × x4 + c5 × x5 where the coefficients are given in the following table:
k c1(k) c2(k) c3(k) c4(k) c5(k) ≤0 0 0 0 1 1 1 0 0 1 1 0 2 0 1 1 0 0 3 1 1 0 0 0 4 2 1 0 0 0 5 3 2 1 0 0 6 5 3 3 2 0 7 8 6 9 6 0 8 14 15 22 13 0 9 29 37 48 26 0 10 66 85 102 54 0
When k ≥ 5, these values may be obtained by the relations
c1(k + 1) = c1(k) + c2(k)
c2(k + 1) = c2(k) + c3(k)
c3(k + 1) = c3(k) + c4(k + 1)
c4(k + 1) = c1(k) + c4(k) - 1
c5(k) = 0
By using Knuth’s table and relations it’s possible to compute the man or boy function quite quickly. I wrote a program that does this. A walk-through follows.
The first part of the program is a gratuitous use of macros. The
define-coefficients macro defines one of the coefficients mentioned
in Knuth’s last word letter (c1 to c5). Each coefficient is
parameterized by the k-value, which means that we should define a
procedure that takes k as an argument and returns cn(k). Knuth’s
table gives us the first values it should return (consts). For the
rest of the values we’ll use the corresponding relation. As a bonus
the macro also creates a hashtable where it stores previously computed
values, which is key to speeding up the algorithm.
(define-syntax define-coefficients (syntax-rules () ((_ (name k) consts equation) (define name (let ((h (make-eqv-hashtable))) (lambda (k) (let ((v consts)) (cond ((< k 0) (vector-ref v 0)) ((< k (vector-length v)) (vector-ref v k)) ((hashtable-ref h k #f)) (else (let ((t equation)) (hashtable-set! h k t) ;memoize t))))))))))
Next the macro is used to define the coefficients.
(define-coefficients (c1 k) '#(0 0 0 1 2 3) (+ (c1 (- k 1)) (c2 (- k 1)))) (define-coefficients (c2 k) '#(0 0 1 1 1 2) (+ (c2 (- k 1)) (c3 (- k 1)))) (define-coefficients (c3 k) '#(0 1 1 0 0 1) (+ (c3 (- k 1)) (c4 k))) (define-coefficients (c4 k) '#(1 1 0 0 0 0) (+ (c1 (- k 1)) (c4 (- k 1)) -1)) (define-coefficients (c5 k) '#(1) 0)
The equations given as the last argument to the macro are the relations adjusted to cn(k) instead of cn(k+1). Now that the coefficients have been defined it is easy to define A, just as Knuth observed:
(define (A k x1 x2 x3 x4 x5) (+ (* (c1 k) x1) (* (c2 k) x2) (* (c3 k) x3) (* (c4 k) x4) (* (c5 k) x5)))
Computing what is commonly referred to as A(k) is now achieved by
(A k 1 -1 -1 1 0). In Chez Scheme the computation takes no
time at all, even for very large k that no existing machine could
possibly handle using the original algorithm:
> (time (A 11 1 -1 -1 1 0)) (time (A 11 ...)) no collections 0.000004543s elapsed cpu time 0.000004149s elapsed real time 96 bytes allocated -138 > (time (A 128 1 -1 -1 1 0)) (time (A 128 ...)) no collections 0.000007392s elapsed cpu time 0.000007114s elapsed real time 304 bytes allocated -4635937302118988398753530618599286738522250 > (time (A 1024 1 -1 -1 1 0)) (time (A 1024 ...)) no collections 0.000744346s elapsed cpu time 0.000743677s elapsed real time 635472 bytes allocated -134525946204897748012677078309699584745858677996302504232754322006054514469295899995329805478290190232956296857851394596551319734862436102284482400522332522964879376131124504760141628006776332968808079671234906412428151832076596502192653046509414077174915139024920128557185075083924034657098210492453447406606370793777979291350811243549339280709645840417
Only one question is left unanswered. Whether or not a sufficiently smart compiler can (or indeed should) transform Knuth’s original algorithm into the fast algorithm presented here.
This was written as the first part of a series of articles on stuff that has been lying around on this website for a long time without any real commentary. The code it talks about was written in 2012.