While I was writing a manpage for Scheme’s quasiquote, something I saw surprised me and changed my understanding of quasiquote. It turns out that a new language, with semantics that are interesting to PLT enthusiasts, hides behind the innocent backtick character. Starting with R6RS Scheme, quasiquote became total magic.
Background
It is not going to be easy to understand the argument in this article if you lack some background knowledge, so here’s a brief explanation of quasiquote, Scheme’s concept of locations, Scheme’s handling of literal constants, and finally referential transparency.
Briefly on quasiquote
Quasiquote is a language feature in Scheme that lets you write a template for a structure of lists and vectors. These templates are more like web templates than C++ templates; don’t let the terminology confuse you.
Basically you write a backtick character to start a template. The code immediately following the backtick is the template. You write a comma wherever you want to fill in some variable or other expression. (There’s also a list-splicing version of the comma which often comes in handy).
The expression `(b "Hello " ,x)
builds a list with these elements:
the symbol b
, the string "Hello"
and lastly whatever value the
variable x
happened to have. So perhaps (b "Hello " "John")
.
Quasiquote is very useful to build SXML and SHTML. Forget learning a new templating system every week; this one’s a keeper. But this being Scheme, the most popular use is likely to write S-expressions that represent code in some language. It’s used for just that in the nanopass framework.
Location, location, location!
Making a language is a difficult job. Everything should ideally work smoothly together as a coherent whole, like pineapples on a pizza. Objects in Scheme programs implicitly refer to locations. There are many details around that which affect the whole language, and they have interesting consequences for quasiquote.
What’s a location? It’s just a place where you can store a value. The
vector #(1 2 3)
has three locations where values are stored, and
currently it’s the numbers 1, 2 and 3. If you make a vector using
make-vector
then the vector is mutable and you can change the
values. Later when you see the same vector again it will still contain
the new values.
In practice a location is some address in memory, but the garbage collector might move it around, so its address changes, but it is still the same location.
Other objects in Scheme do not have locations. The number 1
does not
have any locations that hold values that you can change. This is only
because of the wisdom, kindheartedness and foresight of the language
designers, because it is possible to design things differently.
As a consequence of numbers not having locations, there is also very
little point in worrying about which number object you have. Suppose
that numbers did have locations and you could store important
information in them. You’d be very concerned that the 1
number
object where you stored all your passwords is the same 1
that you
now have in your secrets variable. (Nobody will ever think to look
inside the number 1
for your passwords, so your secrets are safe
there.) But number objects do not actually have locations, so it
doesn’t matter if the Scheme implementation fiddles with them behind
your back and gives you a different 1
object the next time you’re looking.
Literal constants
Pairs and vectors have locations, but the rules for their locations are much relaxed if they are literal constants in the program code.
Constants in Scheme are allowed to have read-only locations. If you compile a program with Loko Scheme, you will notice that you get an assertion if you try to change any of the constants. From R6RS:
It is desirable for constants (i.e. the values of literal expressions) to reside in read-only memory. To express this, it is convenient to imagine that every object that refers to locations is associated with a flag telling whether that object is mutable or immutable. Literal constants, the strings returned by
symbol->string
, records with no mutable fields, and other values explicitly designated as immutable are immutable objects, while all objects created by the other procedures listed in this report are mutable. An attempt to store a new value into a location referred to by an immutable object should raise an exception with condition type&assertion
.
Literal constants can also share locations. If the same constant
appears in different places in the program then the compiler is
allowed to create a single shared instance of the constant, here
explained as it applies to structures, in the section on eqv?
:
Implementations may share structure between constants where appropriate. Thus the value of
eqv?
on constants is sometimes implementation-dependent.
The illustrative examples are:
(eqv? '(a) '(a)) ;⇒ unspecified
(eqv? "a" "a") ;⇒ unspecified
(eqv? '(b) (cdr '(a b))) ;⇒ unspecified
(let ((x '(a)))
(eqv? x x)) ;⇒ #t
So when it comes to literal constants, Scheme’s normal storage rules
do not apply. A program might find that two different locations have
become the same location, so that changing the value in a quoted
vector ends up changing the value in another quoted vector. It’s also
very likely that the program gets an exception when it tries to change
the value in such a location. The last example shows that going the
other way is not allowed: the compiler is not allowed to create two
different versions of the list (a)
in that program.
Referential transparency
Referential transparency is a concept that is important in purely functional programming languages. An expression that is referentially transparent can be replaced by the values it returns.
Some expressions in Scheme are referentially transparent. Constants,
references to variables that are never mutated, arithmetic in general,
type predicates, etc. A Scheme compiler is allowed to replace (+ 1
2)
with 3
. It doesn’t matter that the program might actually have
returned a “different” 3
each time the expression runs. In the same
way it doesn’t matter if the compiler turns two different constants
into the same constant.
Most parts of Scheme are not referentially transparent. As an example,
a Scheme compiler cannot replace (vector 1 2 3)
with '#(1 2 3)
. The
locations created by the vector procedure need to be fresh and
mutable. But it can replace (vector-ref '#(1 2 3) 0)
with 1
, so
this expression is referentially transparent. And as we previously
saw, it can replace (cdr '(a b))
with '(b)
.
All of this should be fairly widely known, but now comes the interesting part.
There is a crack in everything
So far I have explained Scheme’s notion of locations, referential transparency and that the rules are different for literal constants.
Behold this hidden gem in R6RS:
A quasiquote expression may return either fresh, mutable objects or literal structure for any structure that is constructed at run time during the evaluation of the expression. Portions that do not need to be rebuilt are always literal. Thus,
(let ((a 3)) `((1 2) ,a ,4 ,'five 6))
may be equivalent to either of the following expressions:
'((1 2) 3 4 five 6) (let ((a 3)) (cons '(1 2) (cons a (cons 4 (cons 'five '(6))))))
However, it is not equivalent to this expression:
(let ((a 3)) (list (list 1 2) a 4 'five 6))
This part of R6RS originally came from Kent Dybvig’s formal comment #204. The same type of language was adopted in R7RS.
The meaning is that a quasiquoted expression can be turned into a
literal, or parts may be turned into literals. Where there was code in
the quasiquote expression, there can now be a literal. Going the other
direction is not allowed: literals cannot be turned into code that
returns fresh, mutable structure. But as the example '((1 2) 3 4 five
6)
shows, a compiler is allowed to even propagate constants into quasiquote.
There is a very deep rabbit hole here! Have a look again: return […] literal structure for any structure that is constructed at run time during the evaluation of the expression.
There is now a way to construct literals from run time code, but to do so ahead of run time.
Literal magic
Let me demonstrate the power of Scheme’s magic quasiquote. Let -->
mean “equivalent to”. It can be the result of an expansion or another
compiler pass, such as a partial evaluator. Here is the original,
innocuous-looking example:
(let ((a 3)) `((1 2) ,a ,4 ,'five 6))
; -->
'((1 2) 3 4 five 6)
Can we get literal structure copied into the constant part? Easy:
(let ((a '(3))) `((1 2) ,a ,4 ,'five 6))
; -->
'((1 2) (3) 4 five 6)
But we’re just getting started. Can we construct a structure at runtime and have that appear as a constant? Of course!
`((1 2) ,(list 3) ,4 ,'five 6)
;; -->
'((1 2) (3) 4 five 6)
Those Schemers who are paying attention will be thinking I’ve gone mad now. Maybe I have, but this example simply returned literal structure for the (list) structure that was constructed at run time during the evaluation of the expression, to paraphrase the quasiquote specification. Let’s increase the volume:
`((1 2) ,(map + '(1 1) '(2 3)) ,'five 6)
;; -->
'((1 2) (3 4) five 6)
Hang on, shouldn’t map
return a fresh, mutable list? Not anymore,
this is quasiquote. The map
function constructs a list at run time
during the evaluation of the quasiquote expression, so the structure
no longer needs to be fresh. (Besides, the R6RS and R7RS definitions
of map
do not actually say that the list needs to be fresh and
mutable, but everyone probably assumes it does.)
(letrec ((iota
(lambda (n)
(let lp ((m 0) (n n))
(if (>= m n)
'()
(cons m (lp (+ m 1) n)))))))
`((1 2) ,(iota 4) ,'five 6))
;; -->
'((1 2) (0 1 2 3) five 6)
Is this valid? I think most would say it isn’t; the list created by
iota
is constructed with cons
, which returns fresh, mutable pairs.
But this is happening inside quasiquote where the normal rules of
society break down. The iota
procedure constructs a list structure
at runtime during the evaluation of a quasiquote expression, so a
compiler is allowed to return literal structure for that list structure.
These go to 11
Let’s crank it up and make quasiquote transform code.
Compilers like Chez Scheme, Loko Scheme, Guile Scheme and many others use partial evaluators to perform source-level optimizations. A partial evaluator runs code symbolically. The output is a new program that is hopefully more efficient than the program that went into it.
The partial evaluators used by Scheme compilers are pretty mild as far as partial evaluators go, mostly because of the semantics of Scheme. Doing more powerful transformations on Scheme programs would require quite powerful static analysis, and that is both slow and difficult.
To get quasiquote to work with code, we need something that enables a
partial evaluator to run a given procedure in such as way that it’s
always inside a quasiquote expression. If we have such a procedure
then the partial evaluator can start using the tricks described above,
and treat all code that constructs new structures as if their
structures were literals. This makes the partial evaluator very happy,
so here is happy
:
(define (happy proc)
(lambda x
`,(apply proc x)))
In a normal Scheme implementation this operator doesn’t do much more
than maybe waste a little time and space. But in a Scheme that knows
the magic nature of quasiquote, it would enable powerful program
transformations on lists and vectors, without the need for as much
analysis as it would normally require. In particular, it should no
longer be necessary to analyze if intermediate results are mutated,
nor to analyze if programs check results for pointer equality with
eq?
.
Here is an illustrative example of a potential program transformation, based on Philip Wadler’s 1990 paper Deforestation: Transforming programs to eliminate trees:
(define (upto m n)
(if (> m n)
'()
(cons m (upto (+ m 1) n))))
(define (square x)
(* x x))
(define (sum xs)
(let sum* ((a 0) (xs xs))
(match xs
[() a]
[(x . xs) (sum* (+ a x) xs)])))
(define (squares xs)
(match xs
[() '()]
[(x . xs) (cons (square x) (squares xs))]))
(define f
(happy (lambda (n) (sum (squares (upto 1 n))))))
;; -->
(define f
(lambda (n)
(letrec ((h0 (lambda (u0 u1 n)
(if (> u1 n)
u0
(h0 (+ u0 (square u1)) (+ u1 1) n)))))
(h0 0 1 n))))
After deforestation (or fusion), the intermediate lists used in
f
have been eliminated. This is beneficial in that you can write
high-level code, but still have the compiler produce the efficient
loop you would have had to write by hand. Scheme compilers normally
don’t do these transformations due to the required analysis.
The happy
operator does not completely open the barn doors: the
transformation still needs to not change other program side effects,
such as I/O and exceptions.
The fly in the ointment
Imagine a program written according to this template:
(define (main)
...)
(define main* (happy main))
(main*)
What are the consequences for the main program? It would seem that
everything in it follows the rules of quasiquote and it can’t use
cons
in the normal way. This is bad news for the main program.
Conclusions
I don’t know where this leads. What is the precise limit for what a compiler can and can’t turn into literal structure in quasiquote? The example with the main program makes it seem that quasiquote actually gives the compiler a bit too much freedom.
Perhaps it’s actually just poor wording, so R6RS and R7RS will get a new erratum that clarifies what is and what isn’t allowed. I suspect that this is the most likely outcome.
But it doesn’t stop someone who is working on a partial evaluator or
another program transformation from proposing a happy
operator as a
SRFI, giving it semantics that enable even more powerful
transformations, but without the need to rely on language lawyering.
There is one conclusion I can draw from all this: don’t assume that what comes of out quasiquote can be mutated.