-*- coding: utf-8-with-signature; mode: org -*- Notes on the implementation of fixnums Copyright (c) 2011 Göran Weinholt WORK IN PROGRESS! Keywords: scheme, lisp, fixnums, immediate integers, assembler * Introduction In Lisp implementations small integers are often implemented as fixnums. A fixnum fits in a machine register, which means that no memory has to be allocated for arithmetic performed on fixnums. Fixnums can represent integers in the interval [-2^(w-1), 2^(w-1)-1] where w is the fixnum-width. The fixnum-width is generally less than the width of the machine registers. This is done in order to implement "low-tagging", where the low bits of a pointer indicate the type of the object it represents. One tag might be reserved for pairs and another for vectors. The fixnum tag is generally zero because this makes some operations more efficient. This document contains some notes on efficient implementation of fixnum operations. The context is that of R6RS Scheme and some reasonable assumptions are made with regard to implementation details. In order to make things more concrete examples are in amd64 assembly, fixnum-width is assumed to be 61 and the tag is three zero bits. The assembly syntax is that used by the disassembler in the Industria Libraries. It is also assumed that the constants for truth and falsehood fit inside a 32-bit machine register. This affects the CMOVcc instructions, which would otherwise have to use 64-bit operands. The examples given here can be used to implement the general form of the procedures. Better code can be generated if some procedure arguments are known. * Acknowledgements During the development of the ideas found here I've received quite some inspiration from a book, Hacker's Delight. Christian Häggström suggested improvements to the machine code and also pushed me to find better code sequences. * Representation of fixnums An integer is converted to a fixnum by shifting it left by three bits. The exact integer 1, when stored in a register or memory location, is represented as 8. This instruction loads (least-fixnum): 48 B8 0000000000000080 (mov rax #x8000000000000000) This instruction loads (greatest-fixnum): 48 B8 F8FFFFFFFFFFFF7F (mov rax #x7FFFFFFFFFFFFFF8) And for completeness, this instruction loads (fixnum-width): B8 E8010000 (mov eax #x1E8) * Type dispatch To check if the value in RAX is a fixnum is a simple matter: A8 07 (test al #x7) After this RFLAGS.Z is set if RAX contains a fixnum. Many fixnum operations, e.g. fx+ and fx*, accept two arguments. To check if two object are both fixnums is not much more complicated. Assuming that RDI and RSI contain the objects to be tested, it can be done like this, using RDX for temporary storage: 8B C6 (mov eax esi) 0B C7 (or eax edi) A8 07 (test al #x7) Note that the MOV and OR only use the lower 32 bits of the registers, and the TEST instruction only uses the lower 8 bits. This is because only the low bits matter in this case. The instructions have shorter encodings than their 64-bit equivalents. The design of the fixnum operators in R6RS is such that fixnum operators always return fixnums, if they return at all. This means that a clever compiler can remove a lot of runtime type checking. For example, the fx+ in this expression does not need to check if its operands are fixnums: (fx+ (fx- a b) (fx* c d)). From now on this document will assume that the operands have been checked and confirmed to be fixnums. * Comparison predicates Comparing two fixnums is merely a matter of knowing which condition code (cc) to use. The comparison is first done with CMP. After this an instruction like CMOVcc (for a predicate) or Jcc (for a conditional expression) is used. The instructions SETcc might also be useful. The following example makes two assumptions. It assumes that RDI is the first operand and that RSI is the second operand. Furthermore it assumes that the constants #f and #t have been loaded into RAX and RDX respectively. This is how fx? | g | | fx=? | ge | | fx<=? | le | * Numerical predicates The numerical predicates are straightforward. The following examples assume that the operand is in RDI and that RAX/RDX have been prepared just like for the comparison predicates. This is fxzero?: 48 3B FF (cmp rdi rdi) 0F44 C2 (cmovz eax edx) Here's fxpositive?: 48 85 FF (test rdi rdi) 0F4F C2 (cmovns eax edx) This is fxnegative?: 48 85 FF (test rdi rdi) 0F4C C2 (cmovs eax edx) This is fxodd?: 40 F6 C7 08 (test dil #x8) 0F45 C2 (cmovnz eax edx) And finally fxeven?: 40 F6 C7 08 (test dil #x8) 0F44 C2 (cmovz eax edx) The last two examples use DIL, which accesses the lower 8 bits of the RDI register. * Bounds checking vector references It is beneficial if the length field of vectors is restricted to fixnums. To perform bounds checking in vector-ref, code with this semantics is needed: (if (not (and (fx>=? k 0) (fxbignum a) (fixnum->bignum b)))) To implement fx+/false is a simple matter of using CMOVO instead of JO. Here is what it looks like, assuming that RDX has been loaded with the constant #f: 48 03 F7 (add rsi rdi) 48 0F40 F2 (cmovo rsi rdx) * Arithmetic shifting The procedures fxarithmetic-shift, fxarithmetic-shift-left and fxarithmetic-shift-right implement bitwise arithmetic shifting. The R6RS explains it this way: (fxarithmetic-shift fx1 fx2) procedure The absolute value of fx2 must be less than (fixnum-width). If (floor (* fx1 (expt 2 fx2))) is a fixnum, then that fixnum is returned. Otherwise an exception with condition type &implementation-restriction is raised. Obviously the example given is only for clarification. There is no need to use floor, * or expt because the processor has instructions that implement arithmetic shift. You're probably already familiar with arithmetic shift if you've programmed in C. These operators are a bit different from the '<<' and '>>' operators. The most obvious difference is that they check for overflow and that they check that the shift amount is valid. Less obvious is that '>>' for signed negative integers is implementation- defined. Shifting to the right always produces a number that is closer to zero, so there is no way it can overflow. The following example shows how fxarithmetic-shift-right can be done, assuming the first operand is in RAX and the second in RCX: 48 C1 F9 03 (sar rcx #x3) 48 D3 F8 (sar rax cl) 48 83 E0 F8 (and rax #xFFFFFFFFFFFFFFF8) 48 83 F9 3D (cmp rcx 61) 73 nn (jnb out-of-bounds-handler) The first SAR converts RCX from a fixnum and the second one does the actual arithmetic shift. The AND turns RAX back into a fixnum. The last two instructions verify that the shift amount is valid. The use of CL (the lower byte of RCX) is necessary because that is the only way to get SAR to use a variable shift amount. Shifting to the left is slightly more tricky, because of the need to check for overflow in addition to verifying the shift amount. Here's one convoluted way of doing it: 48 C1 F9 03 (sar rcx #x3) 48 8B F8 (mov rdi rax) 48 D3 E0 (shl rax cl) 48 8B D0 (mov rdx rax) 48 D3 FA (sar rdx cl) 48 3B FA (cmp rdi rdx) BA nnnnnnnn (mov edx the-false-constant) 48 0F45 C2 (cmovnz rax rdx) 48 83 F9 3D (cmp rcx 61) 48 0F43 C2 (cmovnb rax rdx) The basic idea comes from Christian Häggström. First the number is shifted to the left and then it is shifted back to the right. If that number is not equal to the original number, then some bits were lost and the result is not representable as a fixnum. The version above is branchless and will either return the correct result (given that RAX is a fixnum), or it will return #f. It can easily be made to jump to an error handler by replacing the CMOVcc instructions. * The division operations Division is slightly more complicated than the other arithmetic operations. The operations fxdiv and fxdiv0 are not directly implemented by the processor. The IDIV instruction truncates towards zero, whereas fxdiv performs number-theoretic (Euclidean) division. The fxdiv and fxmod procedures can be implemented in terms of quotient and remainder, which can be implemented with IDIV. The fixnum equivalents of these procedures can be called fxquotient and fxremainder. Here's how fxremainder can be implemented, given that the first operand is in RAX and the second in RDI: 48 99 (cqo) 48 F7 FF (idiv rdi) The remainder is placed in RDX. The CQO instruction propagates the sign bit from RAX into RDX. To implement fxquotient there is one special case that must be handled: (fxquotient (least-fixnum) -1). The result would be one greater than (greatest-fixnum), and that number is not representable as a fixnum. This is probably best handled by an explicit check in fxdiv. Here's what fxquotient looks like, without the check and given the same assumptions as above: 48 99 (cqo) 48 F7 FF (idiv rdi) 48 C1 E0 03 (shl rax #x3) One must also keep in mind that division by zero causes the processor to signal a Divide Error exception. Either this can be handled by catching the signal, or an explicit check must be inserted somewhere before the IDIV instruction. One way of implementing fxdiv requires greater precision than fixnums provide. This can easily be accomplished in assembler by shifting the operands to the right, or by using bignums. Here's one relationship between div and quotient: (define (div n d) (cond ((negative? d) (if (negative? n) (quotient (+ n d 1) d) (quotient n d))) (else (if (negative? n) (quotient (+ (- n d) 1) d) (quotient n d))))) The mod can be calculated with (- n (* d (div n d))). Alternatively this relationship can be used: (define (mod n d) (let ((r (remainder n d))) (cond ((negative? d) (if (negative? r) (- r d) r)) (else (if (negative? r) (+ r d) r))))) There's an alternative way of calculating fxdiv-and-mod, requiring no more precision than that provided by fixnums. Let (q,r) be the quotient and remainder of n/d: fxdiv-and-mod n d = error d = 0 or (n,d)=(-2^(w-1),-1) fxdiv-and-mod n d = (q,r) r >= 0 fxdiv-and-mod n d = (q+1,r-d) d < 0 fxdiv-and-mod n d = (q-1,r+d) otherwise This also has the advantage that it only requires a single IDIV. * The fxlength procedure The fxlength procedure finds the length of a fixnum. To quote R6RS: Returns the number of bits needed to represent fx if it is positive, and the number of bits needed to represent (fxnot fx) if it is negative, [...]. The document then gives an implementation of fxlength using a loop written in Scheme. A more efficient implementation is possible. The following example assumes that RAX contains the operand: 33 F6 (xor esi esi) 48 C1 F8 02 (sar rax #x2) 48 0FBD C0 (bsr rax rax) 0F44 C6 (cmovz eax esi) C1 E0 03 (shl eax #x3) Note that this only handles non-negative operands. The main ingredient is the BSR (Bit Scan Reverse) instruction. The CMOVZ instruction is needed because BSR is undefined for zero. There are several ways of handling negative numbers, here's how to do it with CMOVL: 33 F6 (xor esi esi) 48 C1 F8 02 (sar rax #x2) 48 8B F8 (mov rdi rax) 48 F7 D0 (not rax) 48 0F4C F8 (cmovl rdi rax) 48 0FBD C7 (bsr rax rdi) 0F44 C6 (cmovz eax esi) C1 E0 03 (shl eax #x3) If you're using GCC then the __builtin_clz function provides the same functionality as BSR. ** Decimal version of fxlength In Hacker's Delight, on page 219, there's a function that computes the integer logarithm base 10. Here's a definition for R6RS Scheme. It assumes the number is positive: (define (fxilog10 n) (define table1 '#(0 0 0 0 1 1 1 2 2 2 3 3 3 3 4 4 4 5 5 5 6 6 6 6 7 7 7 8 8 8 9 9 9 9 10 10 10 11 11 11 12 12 12 12 13 13 13 14 14 14 15 15 15 15 16 16 16 17 17 17 18 18 18 18 19 19 19)) (define table2 '#(1 10 100 1000 10000 100000 1000000 10000000 100000000 1000000000 10000000000 100000000000 1000000000000 10000000000000 100000000000000 1000000000000000 10000000000000000 100000000000000000 1000000000000000000)) (let ((y (vector-ref table1 (fxlength n)))) (if (fx