I’ve been working on optimizations in Loko Scheme recently and have implemented large parts of A Sufficiently Smart Compiler for Procedural Records (Keep & Dybvig, 2012). At the same time I have improved the representation of record type descriptors and wanted to share a simple trick I used to improve record type checks for non-sealed records. But first I should explain what a record is in Scheme.
Background: Records in Scheme
Scheme supports record types, which are user-defined data types. R⁷RS Scheme has a syntax-based variant of this feature, based on SRFI-9. Here’s an example:
(import (scheme base))
(define-record-type point
(make-point x y)
point?
(x point-x point-x-set!)
(y point-y point-y-set))
This will let you write (make-point 0.0 0.0)
to get a point at (0.0,
0.0), and (point-x p)
to access the x field of p. That’s it for
records in R⁷RS and SRFI-9.
The record type system in R⁶RS Scheme improves on SRFI-9 in several ways. In R⁶RS you would instead write this:
(import (rnrs))
(define-record-type point
(fields (mutable x)
(mutable y)))
There is no longer any need to explicitly write out the names of the
constructor, predicate, accessors and mutators (unless you want to).
Additionally, you can extend a record type, you can customize the
constructor, and you can control what happens if define-record-type
runs multiple times, i.e. if it makes a new type each time or not.
A syntactical record layer can be abstraction on top of a procedural layer. What you can do with syntax, you can also do with procedure calls at runtime. R⁶RS standardizes this layer as well. It also standardizes a record inspection layer that lets you get the record type descriptor (RTD) from a record at runtime (unless it’s marked as opaque) and also to inspect all aspects of RTDs. In fact, RTDs are objects in their own right, just like pairs and symbols.
The above record type definition might expand to this code that uses the procedural layer (a real expansion would use fresh identifiers for the RTD and RCD):
(define point-rtd
(make-record-type-descriptor
'point #f #f #f #f
'#((mutable x) (mutable y))))
(define point-rcd
(make-record-constructor-descriptor
point-rtd #f #f))
(define point? (record-predicate point-rtd))
(define make-point (record-constructor point-rcd))
(define point-x (record-accessor point-rtd 0))
(define point-y (record-accessor point-rtd 1))
(define point-x-set! (record-mutator point-rtd 0))
(define point-y-set! (record-mutator point-rtd 1))
This code uses the helpers record-predicate
, record-constructor
,
record-accessor
and record-mutator
to create procedures. An
intermediate record constructor descriptor contains the information
needed to make the constructor.
Next we will have a look at records in memory, how the code above is optimized, and finally a trick to speed up record type checks.
Record Type Representation
Loko Scheme has a straightforward representation of records. Using the
above point
type as an example, here are the records returned by
(make-point 1.0 2.0)
and (make-point 1.5 -2.0)
as they are
structured in memory. The rows are 64-bit words and the arrows are
tagged pointers. Memory allocations are aligned to 16 bytes, and the
empty space created by alignment is also shown.
The information stored in the record is a pointer to the record type descriptor, which is reused for each record of the same type, followed by a slot for each field in the record.
The information in the record type descriptor is used by the record inspection procedures and the garbage collector. The slots contain: a type tag for the rtd itself, the size of the records, an optional parent type, an optional record unique identifier, the field names, field mutability (a bit-field), and an optional record writer procedure.
Loko uses the first slot in the RTD to store the length of the RTD and these flags: opaque?, sealed?, generative?.
Other R⁶RS implementations will have similar representations of RTDs because all this information is needed at runtime.
Single Inheritance
R⁶RS supports single inheritance for record types. Instead of demonstrating this with some contrived geometric shapes or balloon animals, let’s use an example from working code. The following record types are simplified variants of records used in Loko’s PCI library.
(define-record-type pcibar
(fields reg base size))
(define-record-type pcibar-i/o
(parent pcibar))
(define-record-type pcibar-mem
(parent pcibar)
(fields type prefetchable?))
PCI base address registers (BARs) all have these fields: reg, base, and size. If they are in I/O space then that’s all, but BARs in memory space have two additional fields: type and prefetchable?.
Notice that both pcibar-i/o
and pcibar-mem
point to pcibar
as
their parent. The size field is larger in pcibar-mem
to account for
the extra fields. The extra fields in the pcibar-mem
record are
placed immediately after the fields that belong to pcibar
, so
accessors for pcibar
don’t need to recompute the slot numbers when
passed a pcibar-mem
.
Predicates for Non-Sealed Types
R⁶RS lets you say that a record type is sealed. This prevents a record type from being extended. As a consequence, type checks are more efficient. Why is that?
The record predicate pcibar?
is given an object and returns true if
the object has that record type, and false otherwise. If the
implementation uses tagged pointers then the predicate first checks
the tag. Next, it reads the type field of the object and compares it
to the pcibar
type.
But if types are not sealed then they can be extended, and it’s
possible that the type that the predicate is checking for was used as
a parent. The pcibar?
predicate should return true even for a
pcibar-mem
record.
The Trick
Previously Loko Scheme’s record-predicate
procedure worked as
follows. It checked the RTD to see if it’s sealed. For sealed RTDs it
returned a procedure that implemented the fast check described above.
For non-sealed RTDs another procedure was returned that did that check, and additionally looped over all parent RTDs to see if any of them was the desired RTD:
(define (record-predicate rtd)
(if (record-type-sealed? rtd)
(lambda (obj) ;fast path
(and
($box? obj)
(eq? ($box-type obj) rtd)))
(lambda (obj) ;slow path
(and
($box? obj)
(let ((t ($box-type obj)))
(or
(eq? t rtd)
(and
(record-type-descriptor? t)
(let lp ((t (record-type-parent t)))
(cond
((eq? t rtd) #t)
((not t) #f)
(else
(lp (record-type-parent t))))))))))))
I haven’t checked around, but I suspect that most R⁶RS implementations do something similar. Even when I checked Chez Scheme’s assembly output I saw a loop that’s morally equivalent to this one. This loop also shows up in accessors and mutators, because they need to know that the object they’ve been passed has the right type.
The trick: this loop can be avoided by extending the RTD representation so that each RTD directly contains pointers to all parent RTDs. The pointers are laid out so that the base type is placed first, followed by the sub-types in order. An RTD will then appear at a fixed location in any RTD that extends it.
I’m sure that there are other language implementations where this problem of sub-typing shows up and someone else has come up with just this optimization, because it’s kind of obvious.
Suppose that we have a base record type and some record types that extend each other. For simplicity, I will not give them any fields.
(define-record-type A)
(define-record-type B (parent A))
(define-record-type C (parent B))
(define-record-type S (parent A))
(define-record-type T (parent S))
The expression (A? (make-A))
evaluates to true, which also A?
does
for all types shown. But (B? (make-T))
evaluates to false because
T
does not have B
anywhere in its chain of parents. That’s what
the loop would be checking.
This picture shows the memory layout when the trick is used on these RTDs.
The pointer to the A
RTD is always in the 0:
slot for any RTD that
extends it. Similarly, the predicate for B
knows to always check in
the 1:
slot. A bounds check on the RTD is also needed in this layout.
Taking it further
Further improvements on this layout are possible. Loko Scheme always
allocates memory for four parent RTDs. If an RTD will appear at slots
0 to 3, then the predicate does not need to do bounds checking on the
RTD. The parent:
slot is not strictly needed and can be removed.
Specially just for Loko Scheme, the predicates hidden inside accessors and mutators use slightly less code than the predicate procedures. These hidden predicates do not explicitly verify the tags on the pointers, instead leaving it up to the processor’s built-in alignment checking to trap invalid references.
The type checks are even faster if specialized predicates, accessors, and mutators can be generated at compile time. If the RTD is known at compile time then the slot that contains the RTD is also known and can be inlined.
Sufficiently Smart
A sufficiently smart compiler is a legendary compiler that does your favorite optimizations so that your favorite language feature becomes very efficient.
In A Sufficiently Smart Compiler for Procedural Records, Andy Keep and Kent Dybvig present their work on optimizing the R⁶RS procedural record system in Chez Scheme. It builds on top of the source-level optimizer cp0. Loko Scheme has its own implementation of cp0, so adapting their work has been pretty simple.
The basic idea is to have cp0 generate static or partially static
RTDs, which are then propagated throughout the program using cp0’s
existing mechanisms. If cp0 succeeds in propagating the RTDs to where
record-accessor
(etc) are called, then it can also generate code
specialized to each record type.
I’ve implemented large parts of the ideas in the paper in Loko Scheme, together with the improved record type representation.
Post-script
Compiling Loko Scheme with Loko Scheme is now almost as fast as
compiling it with Chez Scheme, if garbage collection time is not
counted (run the compilation with something like LOKO_HEAP=28000
if
you have enough RAM). I’m not sure it’s an apples-to-apples comparison
though, because when Chez is used, it also has to load and compile
Loko’s compiler, whereas Loko cheats by already having it loaded. But
still, Loko’s performance is improving. I’m interested in seeing how
well the next release will fare in the Scheme Benchmarks.