Low-bit tagging is a technique where the low bits of values are used to store type information. There are numerous benefits that come with this technique and it is quite popular in implementations of Scheme, JavaScript and other languages. But once you start down the road of bit-twiddling it is hard to stop and the design of the tagging system may become difficult to understand. So that’s when you look in your tool box and pull out something like Z3, which this article explores.
Example of Cleverness
Let us begin with a look at a real life example of low-bit tagging from Chez Scheme on an AMD64 system, where integers in the interval [-260, 260-1] are encoded directly into the value’s bit pattern:
Chez Scheme Version 9.5
Copyright 1984-2017 Cisco Systems, Inc.
> (#%$assembly-output #t)
> (lambda (x y) (fx+ x y))
entry.28:
0: cmpi (imm 2), %ac0
4: bne lf (35)
dcl.29:
6: mov %rdi, %rcx
9: or %r8, %rcx
12: testib (imm 7), %rcx
15: bne Llib (12)
lt.30:
…
Chez Scheme has its own mnemonics for the assembler instructions, but
it should be familiar enough. In this example the rdi
and r8
registers are bitwise logical OR’d together, the three low bits are
tested and at 15:
it branches to some error handler if the bits were
not 0. In C terms: if (((x | y) & 7) != 0) { goto error; }
. This is an
example of the cleverness possible in a well-designed low-bit tagging
system: two values can be type checked with a single branch instruction.
This works because all fixnums are tagged with three 0 bits and bitwise logical OR with something not a fixnum would introduce some 1 bits. Chez Scheme is careful to never create non-fixnum objects with three lower 0 bits.
Starting with Z3Py
Let us have a look at how to enter this system into Z3 via Z3Py. Z3 is
an MIT-licensed theorem prover from Microsoft Research. The native
language of Z3 is actually SMT-LIB, but
I will use Z3Py here because I find it helps with writing logic at a
higher level. Z3Py is available in many Linux distributions, including
Debian: apt-get install python-z3
(currently Python 2.7 only).
So what does Z3 actually do? Think of it as a tool that does a brute force search through a whole problem space, looking at every possible model that satisfies a set of constraints, but that it also knows a lot of shortcuts that speed up the search. If you gave it the assertion x + 1 = 2 it would be clever enough that it would not need to go and search through all possible values of x until it found x = 1 (but in some situations would do basically this).
Back to tagging. One interesting property of tagging fixnums with 0 bits is that addition can be performed without masking away the tag bits. This Z3Py script checks this property:
from __future__ import print_function
from z3 import *
# We want fixnums to be tagged with some three low bits.
tag_fixnum = BitVec('tag-fixnum', 64)
mask_fixnum = BitVecVal(0b111, 64)
# Keep these ordered.
tags = ( tag_fixnum, )
masks = ( mask_fixnum, )
# Create a solver.
s = Solver()
# Two fixnums can be added and the result is a fixnum.
x = BitVec('x', 64)
y = BitVec('y', 64)
s.add(ForAll([x, y],
Implies(And((x & mask_fixnum) == tag_fixnum,
(y & mask_fixnum) == tag_fixnum),
((x + y) & mask_fixnum) == tag_fixnum)))
print(s.sexpr())
print(s.check())
print(s.model().sexpr())
The BitVec
objects here are bit-vector variables in the model,
representing some 64-bit value and BitVecVal
is a 64-bit constant.
The argument to s.add
is read as: for all x and for all y it is
true that x and y being fixnums implies (as in P → Q) that x + y
is also a fixnum. It is important to use x and y inside ForAll
,
otherwise Z3 will look for some specific x and y that satisfy the
assertions instead of proving a model for all fixnums. (There is a
missing constraint, though. Can you see it?)
When run through Python it generates this output:
(declare-fun tag-fixnum () (_ BitVec 64))
(assert (forall ((x (_ BitVec 64)) (y (_ BitVec 64)))
(=> (and (= (bvand x #x0000000000000007) tag-fixnum)
(= (bvand y #x0000000000000007) tag-fixnum))
(= (bvand (bvadd x y) #x0000000000000007) tag-fixnum))))
sat
(define-fun tag-fixnum () (_ BitVec 64)
#x0000000000000000)
The first part is the solver in the SMT-LIB language and sat
means
it is satisfiable. That it is satisfiable means that Z3 also found a
model, which is the last output shown. Here it has assigned all 0 bits
to the fixnum tag, having proved that tagging with 0 bits allows
addition to work directly with fixnums. (Almost, actually. We didn’t
check that addition itself actually gives the right result).
Something that’s tricky with these theorem solvers is that you really need to tell them all the constraints or, like little children, they will find that one electrical outlet that you didn’t secure. The missing constraint here is that the tag must fit within the mask. Z3 actually found a model where the tag is zero, which is what we were looking for, but it could just as well have given us a model where some high bit of the fixnum tag is set. That will be seen in the next section.
Types, types, types!
Just having fixnums is no fun, so let’s add pairs, characters, booleans and the empty list (nil). Some additional constraints will be needed, but let us first see what happens without them.
from __future__ import print_function
from z3 import *
tag_fixnum = BitVec('tag-fixnum', 64)
tag_pair = BitVec('tag-pair', 64)
tag_char = BitVec('tag-char', 64)
tag_boolean = BitVec('tag-boolean', 64)
tag_nil = BitVec('tag-nil', 64)
mask_fixnum = BitVecVal(0b111, 64)
mask_pair = BitVecVal(0b111, 64)
mask_char = BitVec('mask-char', 64)
mask_boolean = BitVec('mask-boolean', 64)
mask_nil = BitVec('mask-nil', 64)
tags = ( tag_fixnum, tag_pair, tag_char, tag_boolean, tag_nil )
masks = ( mask_fixnum, mask_pair, mask_char, tag_boolean, mask_nil )
s = Solver()
x = BitVec('x', 64)
y = BitVec('y', 64)
s.add(ForAll([x, y],
Implies(And((x & mask_fixnum) == tag_fixnum,
(y & mask_fixnum) == tag_fixnum),
((x + y) & mask_fixnum) == tag_fixnum)))
# Uncommented one by one in the discussion below.
#s.add(Distinct(tags))
#s.add([(tag & mask) == tag for (tag, mask) in zip(tags, masks)])
#s.add([And(mask > 0, mask <= 0xff) for mask in masks])
print(s.sexpr())
print(s.check())
print(s.model().sexpr())
If you were to run this through Python you would find that Z3Py
probably prints exactly the same thing as before! That’s because the
new tag and mask variables are not referenced anywhere in the model.
Uncomment the Distinct
constraint and you might get this model:
(define-fun tag-pair () (_ BitVec 64)
#x0000000000000003)
(define-fun tag-nil () (_ BitVec 64)
#x0000000000000000)
(define-fun tag-char () (_ BitVec 64)
#x0000000000000002)
(define-fun tag-boolean () (_ BitVec 64)
#x0000000000000001)
(define-fun tag-fixnum () (_ BitVec 64)
#x2000000000000000)
Distinct means that the tags must have different values. Note that Z3
gave the all-0 tag to nil and fixnums have a high bit set. And
indeed, the addition constraint still holds in this model. Oops.
Uncomment the constraint after the Distinct
line to constrain tags
to fit inside their mask. Here is a model with the new constraints:
(define-fun tag-pair () (_ BitVec 64)
#x0000000000000004)
(define-fun tag-nil () (_ BitVec 64)
#x32212a2aa3282220)
(define-fun mask-char () (_ BitVec 64)
#x0000000010000000)
(define-fun mask-nil () (_ BitVec 64)
#x32212a2aa3282220)
(define-fun tag-char () (_ BitVec 64)
#x0000000010000000)
(define-fun tag-boolean () (_ BitVec 64)
#x0100000000000000)
(define-fun tag-fixnum () (_ BitVec 64)
#x0000000000000000)
Z3 really got creative here with the tag and mask for nil, but fixnums are back to zero tags, so that’s good. In general the tags are a bit too large, so let’s enable the next constraint, saying that the masks should be 8-bit values. Here is the new model (again, several models are possible):
sat
(define-fun tag-pair () (_ BitVec 64)
#x0000000000000004)
(define-fun tag-nil () (_ BitVec 64)
#x0000000000000040)
(define-fun mask-char () (_ BitVec 64)
#x0000000000000061)
(define-fun mask-nil () (_ BitVec 64)
#x0000000000000040)
(define-fun tag-char () (_ BitVec 64)
#x0000000000000061)
(define-fun tag-boolean () (_ BitVec 64)
#x0000000000000080)
(define-fun tag-fixnum () (_ BitVec 64)
#x0000000000000000)
This iterative approach to the design is where using a theorem prover really shines. In this model the nil value and booleans are actually fixnums, which is wrong, so assertions should be added that prevents this from happening. But now on to some cleverness.
Clever masking
The masks in the current design are small and neat and fit as immediates in instruction encodings. However, anyone who is familiar with the x86 instruction set knows that registers can be addressed in smaller parts without separate masking. Here is a handy table for one of the registers:
Register name | Register size | Addressed data |
---|---|---|
rax |
64 | rax |
eax |
32 | rax & 0xffffffff |
ax |
16 | rax & 0xffff |
al |
16 | rax & 0xff |
ah |
16 | (rax >> 8) & 0xff |
Using al
would make it possible to type check without applying the
mask explicitly, if it is exactly 0xff
. This also means that the
original value is not overwritten, so a temporary register is not
needed. Decreasing register pressure is important when optimizing some
code, e.g. tight loops.
Let us see what Chez Scheme does here.
> (#%$assembly-output #t)
> (lambda (x) (char=? x #\space))
entry.21:
0: cmpi (imm 1), %ac0
4: bne lf (66)
dcl.22:
6: mov %r8, %rcx
9: andi (imm 255), %rcx
16: cmpi (imm 22), %rcx
20: bne lf (31)
lt.23:
…
Hmm! The equivalent C code is rcx = (r8 & 255);
if (rcx != 255) {
goto error; }
. It turns out that Chez Scheme doesn’t know that it can
use r8l
to do the check without involving a temporary register, even
though the mask allows for this. Perhaps Chez Scheme has some
low-hanging fruit for the intrepid compiler developer.
When you find a trick that you want to use in your tagging system, you just add it as a constraint. This Z3Py snippet adds the new constraint for characters:
s.add(Or(mask_char == 0xff, # for free with al
mask_char == 0xffff, # ax
mask_char == 0xffffffff)) # eax
Clever shifting
Any language with both characters and integers needs some way to
convert between them and Scheme is no exception. Chez Scheme’s
implementation of char->integer
contains a piece of cleverness (not
unique to itself) that works because of how the character and fixnum
tags are arranged:
> (#%$assembly-output #t)
> (lambda (x) (char->integer x))
entry.28:
0: cmpi (imm 1), %ac0
4: bne lf (39)
dcl.29:
6: mov %r8, %rcx
9: andi (imm 255), %rcx
16: cmpi (imm 22), %rcx
20: bne lf (11)
lt.30:
22: mov %r8, %ac0
25: lsri (imm 5), %ac0
…
It first does a type check on r8
to ensure that it’s a character.
Then it writes the return value as (in C terms) r8 >> 5
. How can 5
work with no unmasking or tagging? Let’s add it as a constraint. This
requires some leg work, see the comments:
# SPDX-License-Identifier: MIT
# Unchanged from before
from __future__ import print_function
from z3 import *
tag_fixnum = BitVec('tag-fixnum', 64)
tag_pair = BitVec('tag-pair', 64)
tag_char = BitVec('tag-char', 64)
tag_boolean = BitVec('tag-boolean', 64)
tag_nil = BitVec('tag-nil', 64)
mask_fixnum = BitVecVal(0b111, 64)
mask_pair = BitVecVal(0b111, 64)
mask_char = BitVec('mask-char', 64)
mask_boolean = BitVec('mask-boolean', 64)
mask_nil = BitVec('mask-nil', 64)
tags = ( tag_fixnum, tag_pair, tag_char, tag_boolean, tag_nil )
masks = ( mask_fixnum, mask_pair, mask_char, tag_boolean, mask_nil )
s = Solver()
x = BitVec('x', 64)
y = BitVec('y', 64)
s.add(ForAll([x, y],
Implies(And((x & mask_fixnum) == tag_fixnum,
(y & mask_fixnum) == tag_fixnum),
((x + y) & mask_fixnum) == tag_fixnum)))
s.add(Distinct(tags))
s.add([(tag & mask) == tag for (tag, mask) in zip(tags, masks)])
s.add([And(mask > 0, mask <= 0xff) for mask in masks])
s.add(Or(mask_char == 0xff,
mask_char == 0xffff,
mask_char == 0xffffffff))
# New code starts here:
# A trick (see Hacker's Delight) to get the shift amounts
# that match the masks.
shift_char = BitVec('shift-char', 64)
shift_fixnum = BitVec('shift-fixnum', 64)
s.add(mask_char == ((1 << shift_char) - 1))
s.add(mask_fixnum == ((1 << shift_fixnum) - 1))
s.add(shift_char > 0)
s.add(shift_fixnum > 0)
# Z3Py versions of char->integer and integer->char
def char_to_integer(ch):
return ((ch >> shift_char) << shift_fixnum) | tag_fixnum
def integer_to_char(fx):
return ((fx >> shift_fixnum) << shift_char) | tag_char
# Sets fx_A to the fixnum that represents the 'A' code point
# and sets ch_A to the character 'A'. Then asserts that the
# conversion functions work. I'm a little bit lazy and do this
# for 'A' instead of all chars.
ch_A = BitVec('ch-A', 64)
fx_A = BitVec('fx-A', 64)
s.add(fx_A == ((ord('A') << shift_fixnum) | tag_fixnum))
s.add(ch_A == ((ord('A') << shift_char) | tag_char))
s.add(char_to_integer(ch_A) == fx_A)
s.add(integer_to_char(fx_A) == ch_A)
# Assert that no object satisfies both fixnump and charp.
# Ideally there should be a complete set of these.
def fixnump(obj): return (obj & mask_fixnum) == tag_fixnum
def charp(obj): return (obj & mask_char) == tag_char
s.add(ForAll([x], Implies(fixnump(x), Not(charp(x)))))
s.add(ForAll([x], Implies(charp(x), Not(fixnump(x)))))
# Assert that char->integer is equivalent to (ch >> n) for some n.
# This is the main point of this section.
shift_ch_to_fx = BitVec('shift-ch->fx', 64)
s.add(char_to_integer(ch_A) == (ch_A >> shift_ch_to_fx))
print(s.check())
print(s.model().sexpr())
Quite a bit of code, but it is necessary so that Z3 will not find a loophole. Here is one possible output:
sat
(define-fun shift-char () (_ BitVec 64)
#x0000000000000008)
(define-fun ch-A () (_ BitVec 64)
#x0000000000004102)
(define-fun shift-fixnum () (_ BitVec 64)
#x0000000000000003)
(define-fun mask-char () (_ BitVec 64)
#x00000000000000ff)
(define-fun tag-char () (_ BitVec 64)
#x0000000000000002)
(define-fun mask-nil () (_ BitVec 64)
#x000000000000000c)
(define-fun tag-boolean () (_ BitVec 64)
#x0000000000000014)
(define-fun tag-pair () (_ BitVec 64)
#x0000000000000004)
(define-fun tag-nil () (_ BitVec 64)
#x000000000000000c)
(define-fun tag-fixnum () (_ BitVec 64)
#x0000000000000000)
(define-fun fx-A () (_ BitVec 64)
#x0000000000000208)
(define-fun shift-ch->fx () (_ BitVec 64)
#x0000000000000005)
The interesting part is shift-ch->fx
, which is 5, just like in Chez
Scheme. Furthermore, if you add the assertion shift_ch_to_fx != 5
then Z3 will say “model is not available”, meaning that only this
shift amount has the desired properties. There is nothing particular
that stands out in the assertions that causes this to happen (although
it is possible to get other shift amounts if some constraints are relaxed).
This has also affected the selections of the other values in the model. If you were to change the shift amount for characters then you could no longer rely on this trick and Z3 would happily let you know about it. If you rely on this trick in your compiler then it’s a good idea to add it as an assertion. In fact, while writing your Z3Py code it’s a good idea to add all your assumptions as assertions.
Summing up
I could go on and show a few more missed optimization opportunities in Chez Scheme, like how it doesn’t type check multiple characters with one branch, but I hope that you have already seen how Z3 is useful. It lets you prove properties of your tagging system and, perhaps just as importantly, lets you document your assumptions.
Z3 itself is lacking in documentation, attempting to use tutorials as a substitute, and its web site is full of dead links. When Z3 fails to find a solution it either goes into a seemingly infinite spin and/or prints unsat, hoping you will go away. (There is a way to get it to print a counterexample, but the output is sadly incomprehensible.) Quite often I found myself sitting at my terminal wondering why my assertions were unsatisfiable.
My advice, if you find yourself in the situation where you are 100% certain something should work, is to remove the general cases and add assertions for very specific cases. At some point a specific case will cause Z3 to reject your assertions, which will give you a clue as to what has gone wrong.
Give it a spin if you’re thinking about revamping your tagging system or if you want to add extra tricks and be certain that they are well founded.