Quasiquote - Literal Magic

Written by Gwen Weinholt on 2020-05-15

While I was writing a manpage for Scheme’s quasiquote, something I saw surprised me and changed my understanding of quasiquote. It turns out that a new language, with semantics that are interesting to PLT enthusiasts, hides behind the innocent backtick character. Starting with R6RS Scheme, quasiquote became total magic.

Background

It is not going to be easy to understand the argument in this article if you lack some background knowledge, so here’s a brief explanation of quasiquote, Scheme’s concept of locations, Scheme’s handling of literal constants, and finally referential transparency.

Briefly on quasiquote

Quasiquote is a language feature in Scheme that lets you write a template for a structure of lists and vectors. These templates are more like web templates than C++ templates; don’t let the terminology confuse you.

Basically you write a backtick character to start a template. The code immediately following the backtick is the template. You write a comma wherever you want to fill in some variable or other expression. (There’s also a list-splicing version of the comma which often comes in handy).

The expression `(b "Hello " ,x) builds a list with these elements: the symbol b, the string "Hello" and lastly whatever value the variable x happened to have. So perhaps (b "Hello " "John").

Quasiquote is very useful to build SXML and SHTML. Forget learning a new templating system every week; this one’s a keeper. But this being Scheme, the most popular use is likely to write S-expressions that represent code in some language. It’s used for just that in the nanopass framework.

Location, location, location!

Making a language is a difficult job. Everything should ideally work smoothly together as a coherent whole, like pineapples on a pizza. Objects in Scheme programs implicitly refer to locations. There are many details around that which affect the whole language, and they have interesting consequences for quasiquote.

What’s a location? It’s just a place where you can store a value. The vector #(1 2 3) has three locations where values are stored, and currently it’s the numbers 1, 2 and 3. If you make a vector using make-vector then the vector is mutable and you can change the values. Later when you see the same vector again it will still contain the new values.

In practice a location is some address in memory, but the garbage collector might move it around, so its address changes, but it is still the same location.

Other objects in Scheme do not have locations. The number 1 does not have any locations that hold values that you can change. This is only because of the wisdom, kindheartedness and foresight of the language designers, because it is possible to design things differently.

As a consequence of numbers not having locations, there is also very little point in worrying about which number object you have. Suppose that numbers did have locations and you could store important information in them. You’d be very concerned that the 1 number object where you stored all your passwords is the same 1 that you now have in your secrets variable. (Nobody will ever think to look inside the number 1 for your passwords, so your secrets are safe there.) But number objects do not actually have locations, so it doesn’t matter if the Scheme implementation fiddles with them behind your back and gives you a different 1 object the next time you’re looking.

Literal constants

Pairs and vectors have locations, but the rules for their locations are much relaxed if they are literal constants in the program code.

Constants in Scheme are allowed to have read-only locations. If you compile a program with Loko Scheme, you will notice that you get an assertion if you try to change any of the constants. From R6RS:

It is desirable for constants (i.e. the values of literal expressions) to reside in read-only memory. To express this, it is convenient to imagine that every object that refers to locations is associated with a flag telling whether that object is mutable or immutable. Literal constants, the strings returned by symbol->string, records with no mutable fields, and other values explicitly designated as immutable are immutable objects, while all objects created by the other procedures listed in this report are mutable. An attempt to store a new value into a location referred to by an immutable object should raise an exception with condition type &assertion.

Literal constants can also share locations. If the same constant appears in different places in the program then the compiler is allowed to create a single shared instance of the constant, here explained as it applies to structures, in the section on eqv?:

Implementations may share structure between constants where appropriate. Thus the value of eqv? on constants is sometimes implementation-dependent.

The illustrative examples are:

(eqv? '(a) '(a))         ;⇒ unspecified
(eqv? "a" "a")           ;⇒ unspecified
(eqv? '(b) (cdr '(a b))) ;⇒ unspecified
(let ((x '(a)))
  (eqv? x x))            ;⇒ #t

So when it comes to literal constants, Scheme’s normal storage rules do not apply. A program might find that two different locations have become the same location, so that changing the value in a quoted vector ends up changing the value in another quoted vector. It’s also very likely that the program gets an exception when it tries to change the value in such a location. The last example shows that going the other way is not allowed: the compiler is not allowed to create two different versions of the list (a) in that program.

Referential transparency

Referential transparency is a concept that is important in purely functional programming languages. An expression that is referentially transparent can be replaced by the values it returns.

Some expressions in Scheme are referentially transparent. Constants, references to variables that are never mutated, arithmetic in general, type predicates, etc. A Scheme compiler is allowed to replace (+ 1 2) with 3. It doesn’t matter that the program might actually have returned a “different” 3 each time the expression runs. In the same way it doesn’t matter if the compiler turns two different constants into the same constant.

Most parts of Scheme are not referentially transparent. As an example, a Scheme compiler cannot replace (vector 1 2 3) with '#(1 2 3). The locations created by the vector procedure need to be fresh and mutable. But it can replace (vector-ref '#(1 2 3) 0) with 1, so this expression is referentially transparent. And as we previously saw, it can replace (cdr '(a b)) with '(b).

All of this should be fairly widely known, but now comes the interesting part.

There is a crack in everything

So far I have explained Scheme’s notion of locations, referential transparency and that the rules are different for literal constants.

Behold this hidden gem in R6RS:

A quasiquote expression may return either fresh, mutable objects or literal structure for any structure that is constructed at run time during the evaluation of the expression. Portions that do not need to be rebuilt are always literal. Thus,

(let ((a 3)) `((1 2) ,a ,4 ,'five 6))

may be equivalent to either of the following expressions:

'((1 2) 3 4 five 6)

(let ((a 3))
   (cons '(1 2)
         (cons a (cons 4 (cons 'five '(6))))))

However, it is not equivalent to this expression:

(let ((a 3)) (list (list 1 2) a 4 'five 6))

This part of R6RS originally came from Kent Dybvig’s formal comment #204. The same type of language was adopted in R7RS.

The meaning is that a quasiquoted expression can be turned into a literal, or parts may be turned into literals. Where there was code in the quasiquote expression, there can now be a literal. Going the other direction is not allowed: literals cannot be turned into code that returns fresh, mutable structure. But as the example '((1 2) 3 4 five 6) shows, a compiler is allowed to even propagate constants into quasiquote.

There is a very deep rabbit hole here! Have a look again: return […] literal structure for any structure that is constructed at run time during the evaluation of the expression.

There is now a way to construct literals from run time code, but to do so ahead of run time.

Literal magic

Let me demonstrate the power of Scheme’s magic quasiquote. Let --> mean “equivalent to”. It can be the result of an expansion or another compiler pass, such as a partial evaluator. Here is the original, innocuous-looking example:

(let ((a 3)) `((1 2) ,a ,4 ,'five 6))
; -->
'((1 2) 3 4 five 6)

Can we get literal structure copied into the constant part? Easy:

(let ((a '(3))) `((1 2) ,a ,4 ,'five 6))
; -->
'((1 2) (3) 4 five 6)

But we’re just getting started. Can we construct a structure at runtime and have that appear as a constant? Of course!

`((1 2) ,(list 3) ,4 ,'five 6)
;; -->
'((1 2) (3) 4 five 6)

Those Schemers who are paying attention will be thinking I’ve gone mad now. Maybe I have, but this example simply returned literal structure for the (list) structure that was constructed at run time during the evaluation of the expression, to paraphrase the quasiquote specification. Let’s increase the volume:

`((1 2) ,(map + '(1 1) '(2 3)) ,'five 6)
;; -->
'((1 2) (3 4) five 6)

Hang on, shouldn’t map return a fresh, mutable list? Not anymore, this is quasiquote. The map function constructs a list at run time during the evaluation of the quasiquote expression, so the structure no longer needs to be fresh. (Besides, the R6RS and R7RS definitions of map do not actually say that the list needs to be fresh and mutable, but everyone probably assumes it does.)

(letrec ((iota
          (lambda (n)
            (let lp ((m 0) (n n))
              (if (>= m n)
                  '()
                  (cons m (lp (+ m 1) n)))))))
  `((1 2) ,(iota 4) ,'five 6))
;; -->
'((1 2) (0 1 2 3) five 6)

Is this valid? I think most would say it isn’t; the list created by iota is constructed with cons, which returns fresh, mutable pairs. But this is happening inside quasiquote where the normal rules of society break down. The iota procedure constructs a list structure at runtime during the evaluation of a quasiquote expression, so a compiler is allowed to return literal structure for that list structure.

These go to 11

Let’s crank it up and make quasiquote transform code.

Compilers like Chez Scheme, Loko Scheme, Guile Scheme and many others use partial evaluators to perform source-level optimizations. A partial evaluator runs code symbolically. The output is a new program that is hopefully more efficient than the program that went into it.

The partial evaluators used by Scheme compilers are pretty mild as far as partial evaluators go, mostly because of the semantics of Scheme. Doing more powerful transformations on Scheme programs would require quite powerful static analysis, and that is both slow and difficult.

To get quasiquote to work with code, we need something that enables a partial evaluator to run a given procedure in such as way that it’s always inside a quasiquote expression. If we have such a procedure then the partial evaluator can start using the tricks described above, and treat all code that constructs new structures as if their structures were literals. This makes the partial evaluator very happy, so here is happy:

(define (happy proc)
  (lambda x
    `,(apply proc x)))

In a normal Scheme implementation this operator doesn’t do much more than maybe waste a little time and space. But in a Scheme that knows the magic nature of quasiquote, it would enable powerful program transformations on lists and vectors, without the need for as much analysis as it would normally require. In particular, it should no longer be necessary to analyze if intermediate results are mutated, nor to analyze if programs check results for pointer equality with eq?.

Here is an illustrative example of a potential program transformation, based on Philip Wadler’s 1990 paper Deforestation: Transforming programs to eliminate trees:

(define (upto m n)
  (if (> m n)
      '()
      (cons m (upto (+ m 1) n))))

(define (square x)
  (* x x))

(define (sum xs)
  (let sum* ((a 0) (xs xs))
    (match xs
      [() a]
      [(x . xs) (sum* (+ a x) xs)])))

(define (squares xs)
  (match xs
    [() '()]
    [(x . xs) (cons (square x) (squares xs))]))

(define f
  (happy (lambda (n) (sum (squares (upto 1 n))))))
;; -->
(define f
  (lambda (n)
    (letrec ((h0 (lambda (u0 u1 n)
                   (if (> u1 n)
                       u0
                       (h0 (+ u0 (square u1)) (+ u1 1) n)))))
      (h0 0 1 n))))

After deforestation (or fusion), the intermediate lists used in f have been eliminated. This is beneficial in that you can write high-level code, but still have the compiler produce the efficient loop you would have had to write by hand. Scheme compilers normally don’t do these transformations due to the required analysis.

The happy operator does not completely open the barn doors: the transformation still needs to not change other program side effects, such as I/O and exceptions.

The fly in the ointment

Imagine a program written according to this template:

(define (main)
  ...)

(define main* (happy main))

(main*)

What are the consequences for the main program? It would seem that everything in it follows the rules of quasiquote and it can’t use cons in the normal way. This is bad news for the main program.

Conclusions

I don’t know where this leads. What is the precise limit for what a compiler can and can’t turn into literal structure in quasiquote? The example with the main program makes it seem that quasiquote actually gives the compiler a bit too much freedom.

Perhaps it’s actually just poor wording, so R6RS and R7RS will get a new erratum that clarifies what is and what isn’t allowed. I suspect that this is the most likely outcome.

But it doesn’t stop someone who is working on a partial evaluator or another program transformation from proposing a happy operator as a SRFI, giving it semantics that enable even more powerful transformations, but without the need to rely on language lawyering.

There is one conclusion I can draw from all this: don’t assume that what comes of out quasiquote can be mutated.