I had already been programming for twenty years before I started my current project at Ericsson. During my time on the project I’ve come to really appreciate a few things that were new to me, like Continuous Integration (CI) and automated testing. I recently set up CI for Zabavno on GitHub with a new test case generator and immediately found bugs.
The approach
Zabavno is an x86 emulator, and the x86 is a notoriously tricky architecture. Of course, it’s not the first x86 emulator, nor the first one that needs testing. The paper Design and Testing of a CPU Emulator (2009, Forin and Liu) describes a very systematic way to generate test cases for the x86. But the findings in their evaluation section are a bit puzzling to me. When the processor manuals say that the flags are undefined, it should not really be surprising that they can have been modified. They describe errors in the processor manual’s description of the instruction encodings; my own approach to the x86 has been to use the opcode tables, but even they have errors. They’ve also rediscovered opcode 82, which is actually already in the opcode map. Their techniques seem quite complex for what they accomplish.
My main takeaway from their paper is to generate test cases automatically and run them on actual hardware, but to do it in an easier way. (I’ll also be stealing their parity matrix for the aas instruction.)
A simpler approach is to use the opcode tables to generate random operands for instructions. The instructions can then be run in the emulator and the results incorporated into a binary that runs them on real hardware. I took a similar approach in schjig, a program that tests R6RS Scheme implementations by comparing two implementations against each other. In the 2009 paper they generate C programs that are run under a “test execution engine”. Something like that might be necessary later, but the first version will incorporate everything into the test binary, which will be built using the x86 assembler from the machine-code project.
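To give an idea of the shape of this, here is a minimal sketch of the flow with made-up procedure names (random-operand, emulate-instruction and emit-test-case are hypothetical, not Zabavno’s actual API): walk the opcode table, invent operands, ask the emulator for the expected result, and record a test case for the hardware binary.

(define (generate-tests opcodes)
  (vector-for-each
   (lambda (entry)
     ;; Only plain (mnemonic opsyntax ...) entries are handled here;
     ;; the nested sub-tables in the real opcode map are skipped.
     (when (pair? entry)
       (let* ((operands (map random-operand (cdr entry)))
              (instruction (cons (car entry) operands))
              (expected (emulate-instruction instruction)))
         (emit-test-case instruction expected))))
   opcodes))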
Generating test cases
In schjig there is a table of Scheme procedures along with a description of their arguments and return values (similar to what is found in the R6RS documents themselves):
(define ops
  '#(((bool) number? obj)
     ((bool) complex? obj)
     ((bool) real? obj)
     ((bool) rational? obj)
     ((bool) integer? obj)
     ((bool) real-valued? obj)
     ((bool) rational-valued? obj)
     ((bool) integer-valued? obj)
     ((bool) exact? z)
     ((bool) inexact? z)
     ((z) exact z)
     ((bool) = z z z ...)
     ((bool) < x x x ...)
     …
These descriptions are used to generate test cases. If an argument has type z then a random complex number is generated for it, and of course the right number of arguments is generated. (This table was partly generated automatically by eval’ing procedure calls with random arguments; it turns out that the procedure signatures differ slightly between Scheme implementations.)
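As an illustration (this is not schjig’s actual code, and the helpers are made up), one entry can be turned into a test expression roughly like this, using Chez Scheme’s random to pick values; variadic entries ending in ... are ignored here:

(define (random-argument type)
  (case type
    ((z) (make-rectangular (- (random 20.0) 10.0)   ; random complex
                           (- (random 20.0) 10.0)))
    ((x) (- (random 20.0) 10.0))                    ; random real
    (else (random 256))))                           ; anything else

(define (make-test-call entry)
  ;; entry looks like ((return-types) procedure-name arg-type ...)
  (let ((name (cadr entry))
        (arg-types (cddr entry)))
    (cons name (map random-argument arg-types))))

;; (make-test-call '((bool) real? obj))  might produce  (real? 137)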
Instructions in x86 opcode tables are described using what is basically the same notation (albeit more complex, with nested tables):
(define opcodes
  '#((add Eb Gb)
     (add Ev Gv)
     (add Gb Eb)
     (add Gv Ev)
     (add *AL Ib)
     (add *rAX Iz)
     #(Mode (push *ES) #f)
     #(Mode (pop *ES) #f)
     ;; 08
     (or Eb Gb)
     (or Ev Gv)
     (or Gb Eb)
     (or Gv Ev)
     (or *AL Ib)
     (or *rAX Iz)
     #(Mode (push *CS) #f)
     …
The structure of the table itself is used to navigate the opcode space. From the snippet above it can be seen that the add instruction uses opcodes 00 to 05, push es is on 06, and so on. Some instructions take implicitly encoded operands (e.g. *AL can only be the al register operand), but most operands need to be provided separately using additional bytes. The Eb opsyntax can be either a byte register or a byte memory reference. The encoding details are left to the assembler; the task here is just to generate operands that match the requirements.
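A rough sketch of what that matching can look like, again using Chez Scheme’s random and hypothetical names, and covering only the operand kinds the generator handles so far:

(define byte-registers '#(al cl dl bl ah ch dh bh))

(define (random-operand opsyntax)
  (case opsyntax
    ((Gb Eb) (vector-ref byte-registers (random 8)))  ; byte register
    ((Ib)    (random 256))    ; immediate byte (see below for how the
                              ; values are actually biased)
    ((*AL)   'al)             ; implicitly encoded operand
    (else (error 'random-operand "Not yet supported" opsyntax))))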
Both schjig and the new test generator for Zabavno prefer to generate integers that lie close to power-of-two boundaries, which tends to uncover a lot of edge cases. The test generator is very simple right now, at around 380 lines; it only generates byte register operands and byte immediates, but it is easy to extend with additional operand types.
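One way to get that bias (a sketch; the generator’s actual code may differ) is to pick a random bit position and then a small offset around that power of two:

(define (random-immediate bits)
  ;; Pick a value within a couple of units of a random power of two,
  ;; truncated to the requested width.
  (let* ((base (expt 2 (random bits)))
         (offset (- (random 5) 2)))       ; -2 .. +2
    (bitwise-and (+ base offset)
                 (- (expt 2 bits) 1))))

;; (random-immediate 8) tends to yield values such as #x01, #x0F,
;; #x10, #x7F, #x80 and #xFF.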
The tested instruction is placed in a Linux i386 ELF binary that’s generated using the x86 assembler of the machine-code project. There is no need to emit C code or interact with other tools at all, except for the Linux kernel, so the test runtime environment is pretty simple to build and execute. For each test case the ELF binary contains a short setup sequence followed by the tested instruction itself. The binary then compares the actual register values with the register values produced by Zabavno. If there’s a mismatch it prints a failure report and exits with a non-zero status.
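The check emitted after each tested instruction could look roughly like the following, written in the same sexp assembly notation as the listing below. This is only indicative: the expected values are made up, the failure label is hypothetical and the instructions the generator actually emits may differ.

'((add al cl)               ; the instruction under test
  (cmp eax #x00010203)      ; expected register value from Zabavno
  (jne test-failed)
  (cmp ecx #x80000001)
  (jne test-failed)
  ;; ... and so on for the remaining registers and the flags ...
  )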
Bugs found
Even before the test program was finished it found a bug in the dec instruction. Here’s the report (note the difference in the flags line: bit 4, the auxiliary carry flag AF, differs):
Test failed: dec-Eb-0
(mov (mem32+ scratch-flags) #x41)
(mov esp scratch-flags)
(popfd)
(mov eax #x10001)
(mov ecx #x80000001)
(mov edx #xFFFFFFF)
(mov ebx #x1)
(mov esp #x7FFF)
(mov ebp #x3FFFF)
(mov esi #x1FFFFF)
(mov edi #x40000001)
(dec al)
Result from emulation in Zabavno:
eax #x00010000
ecx #x80000001
edx #x0FFFFFFF
ebx #x00000001
esp #x00007FFF
ebp #x0003FFFF
esi #x001FFFFF
edi #x40000001
flags #x00000257
Result from processor execution:
eax #x00010000
ecx #x80000001
edx #x0FFFFFFF
ebx #x00000001
esp #x00007FFF
ebp #x0003FFFF
esi #x001FFFFF
edi #x40000001
flags #x00000247
So far it has found bugs in the AF flag handling of adc, dec and sbb, and it hasn’t even been activated yet for most instructions. One complication with enabling it for more instructions is that Zabavno doesn’t emulate the undefined processor flags “correctly” (it just clears them). It remains to be seen what can be done there, but it should not be very difficult for the emulator to track which flags are undefined.
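One possible direction (just a sketch of the idea, not something Zabavno does today) is to record which flag bits the manuals list as undefined for each instruction and mask them out of the comparison. The bit positions below are the standard EFLAGS ones (CF=0, PF=2, AF=4, ZF=6, SF=7, OF=11).

(define (undefined-flags mnemonic)
  (case mnemonic
    ((mul imul) #b11010100)     ; SF, ZF, AF and PF are undefined
    (else 0)))                  ; everything is defined

(define (flags-match? mnemonic emulated actual)
  (let ((mask (bitwise-not (undefined-flags mnemonic))))
    (= (bitwise-and emulated mask)
       (bitwise-and actual mask))))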
Continuous Integration
The test suite should run automatically. A large number of CI tools integrate with GitHub, so it can be hard to know where to start. I naturally picked the one with a moustache, Travis CI.
Setting up an account on Travis CI is just a matter of logging in via your GitHub account and granting some innocuous permissions. Travis will then let you enable testing for any of your repositories.
Tests are configured in a file that you commit to your repository as .travis.yml. Right now Travis doesn’t have a runtime for Scheme, but they do have runtimes for C and most of the big popular languages, so installing a Scheme as part of the build process isn’t very difficult. (Besides that, they also let you use apt to install packages from Ubuntu, and there are a bunch of Schemes available that way.) Here’s the configuration file used by Zabavno (slightly reformatted for the web). It downloads Chez Scheme, generates a test suite and runs it:
language: c
os:
  - linux
compiler:
  - gcc
before_script:
  # Install Chez Scheme
  - "wget https://github.com/cisco/ChezScheme/archive/master.zip
       -O ChezScheme-master.zip"
  - unzip ChezScheme-master.zip
  - "pushd ChezScheme-master && ./configure
       --installprefix=$TRAVIS_BUILD_DIR/chez &&
       make && make install && popd"
  - export PATH=$TRAVIS_BUILD_DIR/chez/bin:$PATH
  - export CHEZSCHEMELIBDIRS=$TRAVIS_BUILD_DIR/..:$TRAVIS_BUILD_DIR
  # Install machine-code
  - "wget https://github.com/weinholt/machine-code/archive/master.zip
       -O machine-code-master.zip"
  - unzip machine-code-master.zip
  - mv machine-code-master machine-code
script:
  - programs/zabavno --help
  - tests/x86/generate.scm && chmod +x generate.out
  - ./generate.out
Simply commit a file similar to this one, named .travis.yml, and push it to a branch. For the initial setup you can push it to a test branch and Travis will still see it.
Travis sends you emails about the build status and also shows the build output as the build is happening. To top it all off there’s a status image you can link to from your project. Now everyone can see that the code is working.