Design Your Low-Bit Tagging with Z3Py

Written by Gwen Weinholt on 2018-11-18

Low-bit tagging is a technique where the low bits of values are used to store type information. There are numerous benefits that come with this technique and it is quite popular in implementations of Scheme, JavaScript and other languages. But once you start down the road of bit-twiddling it is hard to stop and the design of the tagging system may become difficult to understand. So that’s when you look in your tool box and pull out something like Z3, which this article explores.

Example of Cleverness

Let us begin with a look at a real life example of low-bit tagging from Chez Scheme on an AMD64 system, where integers in the interval [-260, 260-1] are encoded directly into the value’s bit pattern:

Chez Scheme Version 9.5
Copyright 1984-2017 Cisco Systems, Inc.

> (#%$assembly-output #t)
> (lambda (x y) (fx+ x y))

entry.28:
0:       cmpi           (imm 2), %ac0
4:       bne            lf.27(35)
dcl.29:
6:       mov            %rdi, %rcx
9:       or             %r8, %rcx
12:      testib         (imm 7), %rcx
15:      bne            Llib.26(12)
lt.30:

Chez Scheme has its own mnemonics for the assembler instructions, but it should be familiar enough. In this example the rdi and r8 registers are bitwise logical OR’d together, the three low bits are tested and at 15: it branches to some error handler if the bits were not 0. In C terms: if (((x | y) & 7) != 0) { goto error; }. This is an example of the cleverness possible in a well-designed low-bit tagging system: two values can be type checked with a single branch instruction.

This works because all fixnums are tagged with three 0 bits and bitwise logical OR with something not a fixnum would introduce some 1 bits. Chez Scheme is careful to never create non-fixnum objects with three lower 0 bits.

Starting with Z3Py

Let us have a look at how to enter this system into Z3 via Z3Py. Z3 is an MIT-licensed theorem prover from Microsoft Research. The native language of Z3 is actually SMT-LIB, but I will use Z3Py here because I find it helps with writing logic at a higher level. Z3Py is available in many Linux distributions, including Debian: apt-get install python-z3 (currently Python 2.7 only).

So what does Z3 actually do? Think of it as a tool that does a brute force search through a whole problem space, looking at every possible model that satisfies a set of constraints, but that it also knows a lot of shortcuts that speed up the search. If you gave it the assertion x + 1 = 2 it would be clever enough that it would not need to go and search through all possible values of x until it found x = 1 (but in some situations would do basically this).

Back to tagging. One interesting property of tagging fixnums with 0 bits is that addition can be performed without masking away the tag bits. This Z3Py script checks this property:

from __future__ import print_function
from z3 import *

# We want fixnums to be tagged with some three low bits.
tag_fixnum = BitVec('tag-fixnum', 64)
mask_fixnum = BitVecVal(0b111, 64)

# Keep these ordered.
tags = ( tag_fixnum, )
masks = ( mask_fixnum, )

# Create a solver.
s = Solver()

# Two fixnums can be added and the result is a fixnum.
x = BitVec('x', 64)
y = BitVec('y', 64)
s.add(ForAll([x, y],
             Implies(And((x & mask_fixnum) == tag_fixnum,
                         (y & mask_fixnum) == tag_fixnum),
                     ((x + y) & mask_fixnum) == tag_fixnum)))

print(s.sexpr())
print(s.check())
print(s.model().sexpr())

The BitVec objects here are bit-vector variables in the model, representing some 64-bit value and BitVecVal is a 64-bit constant. The argument to s.add is read as: for all x and for all y it is true that x and y being fixnums implies (as in P → Q) that x + y is also a fixnum. It is important to use x and y inside ForAll, otherwise Z3 will look for some specific x and y that satisfy the assertions instead of proving a model for all fixnums. (There is a missing constraint, though. Can you see it?)

When run through Python it generates this output:

(declare-fun tag-fixnum () (_ BitVec 64))
(assert (forall ((x (_ BitVec 64)) (y (_ BitVec 64)))
  (=> (and (= (bvand x #x0000000000000007) tag-fixnum)
           (= (bvand y #x0000000000000007) tag-fixnum))
      (= (bvand (bvadd x y) #x0000000000000007) tag-fixnum))))

sat
(define-fun tag-fixnum () (_ BitVec 64)
  #x0000000000000000)

The first part is the solver in the SMT-LIB language and sat means it is satisfiable. That it is satisfiable means that Z3 also found a model, which is the last output shown. Here it has assigned all 0 bits to the fixnum tag, having proved that tagging with 0 bits allows addition to work directly with fixnums. (Almost, actually. We didn’t check that addition itself actually gives the right result).

Something that’s tricky with these theorem solvers is that you really need to tell them all the constraints or, like little children, they will find that one electrical outlet that you didn’t secure. The missing constraint here is that the tag must fit within the mask. Z3 actually found a model where the tag is zero, which is what we were looking for, but it could just as well have given us a model where some high bit of the fixnum tag is set. That will be seen in the next section.

Types, types, types!

Just having fixnums is no fun, so let’s add pairs, characters, booleans and the empty list (nil). Some additional constraints will be needed, but let us first see what happens without them.

from __future__ import print_function
from z3 import *

tag_fixnum   = BitVec('tag-fixnum', 64)
tag_pair     = BitVec('tag-pair', 64)
tag_char     = BitVec('tag-char', 64)
tag_boolean  = BitVec('tag-boolean', 64)
tag_nil      = BitVec('tag-nil', 64)

mask_fixnum  = BitVecVal(0b111, 64)
mask_pair    = BitVecVal(0b111, 64)
mask_char    = BitVec('mask-char', 64)
mask_boolean = BitVec('mask-boolean', 64)
mask_nil     = BitVec('mask-nil', 64)

tags = ( tag_fixnum, tag_pair, tag_char, tag_boolean, tag_nil )
masks = ( mask_fixnum, mask_pair, mask_char, tag_boolean, mask_nil )

s = Solver()
x = BitVec('x', 64)
y = BitVec('y', 64)
s.add(ForAll([x, y],
             Implies(And((x & mask_fixnum) == tag_fixnum,
                         (y & mask_fixnum) == tag_fixnum),
                     ((x + y) & mask_fixnum) == tag_fixnum)))

# Uncommented one by one in the discussion below.
#s.add(Distinct(tags))
#s.add([(tag & mask) == tag for (tag, mask) in zip(tags, masks)])
#s.add([And(mask > 0, mask <= 0xff) for mask in masks])

print(s.sexpr())
print(s.check())
print(s.model().sexpr())

If you were to run this through Python you would find that Z3Py probably prints exactly the same thing as before! That’s because the new tag and mask variables are not referenced anywhere in the model. Uncomment the Distinct constraint and you might get this model:

(define-fun tag-pair () (_ BitVec 64)
  #x0000000000000003)
(define-fun tag-nil () (_ BitVec 64)
  #x0000000000000000)
(define-fun tag-char () (_ BitVec 64)
  #x0000000000000002)
(define-fun tag-boolean () (_ BitVec 64)
  #x0000000000000001)
(define-fun tag-fixnum () (_ BitVec 64)
  #x2000000000000000)

Distinct means that the tags must have different values. Note that Z3 gave the all-0 tag to nil and fixnums have a high bit set. And indeed, the addition constraint still holds in this model. Oops. Uncomment the constraint after the Distinct line to constrain tags to fit inside their mask. Here is a model with the new constraints:

(define-fun tag-pair () (_ BitVec 64)
  #x0000000000000004)
(define-fun tag-nil () (_ BitVec 64)
  #x32212a2aa3282220)
(define-fun mask-char () (_ BitVec 64)
  #x0000000010000000)
(define-fun mask-nil () (_ BitVec 64)
  #x32212a2aa3282220)
(define-fun tag-char () (_ BitVec 64)
  #x0000000010000000)
(define-fun tag-boolean () (_ BitVec 64)
  #x0100000000000000)
(define-fun tag-fixnum () (_ BitVec 64)
  #x0000000000000000)

Z3 really got creative here with the tag and mask for nil, but fixnums are back to zero tags, so that’s good. In general the tags are a bit too large, so let’s enable the next constraint, saying that the masks should be 8-bit values. Here is the new model (again, several models are possible):

sat
(define-fun tag-pair () (_ BitVec 64)
  #x0000000000000004)
(define-fun tag-nil () (_ BitVec 64)
  #x0000000000000040)
(define-fun mask-char () (_ BitVec 64)
  #x0000000000000061)
(define-fun mask-nil () (_ BitVec 64)
  #x0000000000000040)
(define-fun tag-char () (_ BitVec 64)
  #x0000000000000061)
(define-fun tag-boolean () (_ BitVec 64)
  #x0000000000000080)
(define-fun tag-fixnum () (_ BitVec 64)
  #x0000000000000000)

This iterative approach to the design is where using a theorem prover really shines. In this model the nil value and booleans are actually fixnums, which is wrong, so assertions should be added that prevents this from happening. But now on to some cleverness.

Clever masking

The masks in the current design are small and neat and fit as immediates in instruction encodings. However, anyone who is familiar with the x86 instruction set knows that registers can be addressed in smaller parts without separate masking. Here is a handy table for one of the registers:

Register name Register size Addressed data
rax 64 rax
eax 32 rax & 0xffffffff
ax 16 rax & 0xffff
al 16 rax & 0xff
ah 16 (rax >> 8) & 0xff

Using al would make it possible to type check without applying the mask explicitly, if it is exactly 0xff. This also means that the original value is not overwritten, so a temporary register is not needed. Decreasing register pressure is important when optimizing some code, e.g. tight loops.

Let us see what Chez Scheme does here.

> (#%$assembly-output #t)
> (lambda (x) (char=? x #\space))

entry.21:
0:       cmpi           (imm 1), %ac0
4:       bne            lf.20(66)
dcl.22:
6:       mov            %r8, %rcx
9:       andi           (imm 255), %rcx
16:      cmpi           (imm 22), %rcx
20:      bne            lf.19(31)
lt.23:

Hmm! The equivalent C code is rcx = (r8 & 255); if (rcx != 255) { goto error; }. It turns out that Chez Scheme doesn’t know that it can use r8l to do the check without involving a temporary register, even though the mask allows for this. Perhaps Chez Scheme has some low-hanging fruit for the intrepid compiler developer.

When you find a trick that you want to use in your tagging system, you just add it as a constraint. This Z3Py snippet adds the new constraint for characters:

s.add(Or(mask_char == 0xff,         # for free with al
         mask_char == 0xffff,       # ax
         mask_char == 0xffffffff))  # eax

Clever shifting

Any language with both characters and integers needs some way to convert between them and Scheme is no exception. Chez Scheme’s implementation of char->integer contains a piece of cleverness (not unique to itself) that works because of how the character and fixnum tags are arranged:

> (#%$assembly-output #t)
> (lambda (x) (char->integer x))

entry.28:
0:       cmpi           (imm 1), %ac0
4:       bne            lf.27(39)
dcl.29:
6:       mov            %r8, %rcx
9:       andi           (imm 255), %rcx
16:      cmpi           (imm 22), %rcx
20:      bne            lf.26(11)
lt.30:
22:      mov            %r8, %ac0
25:      lsri           (imm 5), %ac0
…

It first does a type check on r8 to ensure that it’s a character. Then it writes the return value as (in C terms) r8 >> 5. How can 5 work with no unmasking or tagging? Let’s add it as a constraint. This requires some leg work, see the comments:

# SPDX-License-Identifier: MIT
# Unchanged from before
from __future__ import print_function
from z3 import *
tag_fixnum   = BitVec('tag-fixnum', 64)
tag_pair     = BitVec('tag-pair', 64)
tag_char     = BitVec('tag-char', 64)
tag_boolean  = BitVec('tag-boolean', 64)
tag_nil      = BitVec('tag-nil', 64)
mask_fixnum  = BitVecVal(0b111, 64)
mask_pair    = BitVecVal(0b111, 64)
mask_char    = BitVec('mask-char', 64)
mask_boolean = BitVec('mask-boolean', 64)
mask_nil     = BitVec('mask-nil', 64)
tags = ( tag_fixnum, tag_pair, tag_char, tag_boolean, tag_nil )
masks = ( mask_fixnum, mask_pair, mask_char, tag_boolean, mask_nil )
s = Solver()
x = BitVec('x', 64)
y = BitVec('y', 64)
s.add(ForAll([x, y],
             Implies(And((x & mask_fixnum) == tag_fixnum,
                         (y & mask_fixnum) == tag_fixnum),
                     ((x + y) & mask_fixnum) == tag_fixnum)))
s.add(Distinct(tags))
s.add([(tag & mask) == tag for (tag, mask) in zip(tags, masks)])
s.add([And(mask > 0, mask <= 0xff) for mask in masks])
s.add(Or(mask_char == 0xff, 
         mask_char == 0xffff, 
         mask_char == 0xffffffff))
# New code starts here:

# A trick (see Hacker's Delight) to get the shift amounts
# that match the masks.
shift_char = BitVec('shift-char', 64)
shift_fixnum = BitVec('shift-fixnum', 64)
s.add(mask_char == ((1 << shift_char) - 1))
s.add(mask_fixnum == ((1 << shift_fixnum) - 1))
s.add(shift_char > 0)
s.add(shift_fixnum > 0)

# Z3Py versions of char->integer and integer->char
def char_to_integer(ch):
    return ((ch >> shift_char) << shift_fixnum) | tag_fixnum
def integer_to_char(fx):
    return ((fx >> shift_fixnum) << shift_char) | tag_char

# Sets fx_A to the fixnum that represents the 'A' code point
# and sets ch_A to the character 'A'. Then asserts that the
# conversion functions work. I'm a little bit lazy and do this
# for 'A' instead of all chars.
ch_A = BitVec('ch-A', 64)
fx_A = BitVec('fx-A', 64)
s.add(fx_A == ((ord('A') << shift_fixnum) | tag_fixnum))
s.add(ch_A == ((ord('A') << shift_char) | tag_char))
s.add(char_to_integer(ch_A) == fx_A)
s.add(integer_to_char(fx_A) == ch_A)

# Assert that no object satisfies both fixnump and charp.
# Ideally there should be a complete set of these.
def fixnump(obj): return (obj & mask_fixnum) == tag_fixnum
def charp(obj): return (obj & mask_char) == tag_char
s.add(ForAll([x], Implies(fixnump(x), Not(charp(x)))))
s.add(ForAll([x], Implies(charp(x), Not(fixnump(x)))))

# Assert that char->integer is equivalent to (ch >> n) for some n.
# This is the main point of this section.
shift_ch_to_fx = BitVec('shift-ch->fx', 64)
s.add(char_to_integer(ch_A) == (ch_A >> shift_ch_to_fx))

print(s.check())
print(s.model().sexpr())

Quite a bit of code, but it is necessary so that Z3 will not find a loophole. Here is one possible output:

sat
(define-fun shift-char () (_ BitVec 64)
  #x0000000000000008)
(define-fun ch-A () (_ BitVec 64)
  #x0000000000004102)
(define-fun shift-fixnum () (_ BitVec 64)
  #x0000000000000003)
(define-fun mask-char () (_ BitVec 64)
  #x00000000000000ff)
(define-fun tag-char () (_ BitVec 64)
  #x0000000000000002)
(define-fun mask-nil () (_ BitVec 64)
  #x000000000000000c)
(define-fun tag-boolean () (_ BitVec 64)
  #x0000000000000014)
(define-fun tag-pair () (_ BitVec 64)
  #x0000000000000004)
(define-fun tag-nil () (_ BitVec 64)
  #x000000000000000c)
(define-fun tag-fixnum () (_ BitVec 64)
  #x0000000000000000)
(define-fun fx-A () (_ BitVec 64)
  #x0000000000000208)
(define-fun shift-ch->fx () (_ BitVec 64)
  #x0000000000000005)

The interesting part is shift-ch->fx, which is 5, just like in Chez Scheme. Furthermore, if you add the assertion shift_ch_to_fx != 5 then Z3 will say “model is not available”, meaning that only this shift amount has the desired properties. There is nothing particular that stands out in the assertions that causes this to happen (although it is possible to get other shift amounts if some constraints are relaxed).

This has also affected the selections of the other values in the model. If you were to change the shift amount for characters then you could no longer rely on this trick and Z3 would happily let you know about it. If you rely on this trick in your compiler then it’s a good idea to add it as an assertion. In fact, while writing your Z3Py code it’s a good idea to add all your assumptions as assertions.

Summing up

I could go on and show a few more missed optimization opportunities in Chez Scheme, like how it doesn’t type check multiple characters with one branch, but I hope that you have already seen how Z3 is useful. It lets you prove properties of your tagging system and, perhaps just as importantly, lets you document your assumptions.

Z3 itself is lacking in documentation, attempting to use tutorials as a substitute, and its web site is full of dead links. When Z3 fails to find a solution it either goes into a seemingly infinite spin and/or prints unsat, hoping you will go away. (There is a way to get it to print a counterexample, but the output is sadly incomprehensible.) Quite often I found myself sitting at my terminal wondering why my assertions were unsatisfiable.

My advice, if you find yourself in the situation where you are 100% certain something should work, is to remove the general cases and add assertions for very specific cases. At some point a specific case will cause Z3 to reject your assertions, which will give you a clue as to what has gone wrong.

Give it a spin if you’re thinking about revamping your tagging system or if you want to add extra tricks and be certain that they are well founded.