Automated Testing of Zabavno

Written by Gwen Weinholt on 2016-12-23

I had already been programming for twenty years before I started my current project at Ericsson. During my time in the project I’ve come to really appreciate a few things that were new to me, like Continuous Integration (CI) and automated testing. I recently setup CI for Zabavno on GitHub with a new test case generator and immediately found bugs.

The approach

Zabavno is an x86 emulator and the x86 is a notoriously tricky architecture. Of course, it’s not the first x86 emulator and not the first one that needs testing. In the paper Design and Testing of a CPU Emulator (2009, Forin and Liu) a very systematic way to generate test cases for the x86 is described. But the findings in their evaluation section are a bit puzzling to me. When the processor manuals say that the flags are undefined it should not really be surprising that they can have been modified. They describe errors in the processor manual’s description of the instruction encodings. My own approach to the x86 has been to use the opcode tables, but even they have errors. They’ve also rediscovered opcode 82, which is actually already in the opcode map. Their techniques seem quite complex for what they accomplish. My main takeaway from this article is to generate test cases automatically and run them on actual hardware, but to do it in an easier way. (I’ll also be stealing their parity matrix for the aas instruction).

A simpler approach is to use the opcode tables to generate random operands for instructions. The instructions can then be run in the emulator and the results incorporated into a binary that runs them on real hardware. I took a similar approach in schjig, which is a program that tests R6RS Scheme implementations by comparing two implementations. In the 2009 paper they generate C programs that are run under a “test execution engine”. This might be necessary later but the first version will incorporate everything into the test binary, which will be built using the machine-code x86 assembler.

Generating test cases

In schjig there is a table of Scheme procedures along with a description of their arguments and return values (similar to what is found in the R6RS documents themselves):

(define ops
  '#(((bool) number? obj)
     ((bool) complex? obj)
     ((bool) real? obj)
     ((bool) rational? obj)
     ((bool) integer? obj)
     ((bool) real-valued? obj)
     ((bool) rational-valued? obj)
     ((bool) integer-valued? obj)
     ((bool) exact? z)
     ((bool) inexact? z)
     ((z) exact z)
     ((bool) = z z z ...)
     ((bool) < x x x ...)

These descriptions are used to generate test cases. If an argument has type z then a random complex number is generated as an argument, and of course the right number of arguments should be generated. (Partly this table was also automatically generated by eval’ing procedure calls with random arguments. It turns out that the procedure signatures differ slightly between different Scheme implementations).

Instructions in x86 opcode tables are described using what is basically the same notation (albeit more complex, with nested tables):

(define opcodes
  '#((add Eb Gb)
     (add Ev Gv)
     (add Gb Eb)
     (add Gv Ev)
     (add *AL Ib)
     (add *rAX Iz)
     #(Mode (push *ES) #f)
     #(Mode (pop *ES) #f)
     ;; 08
     (or Eb Gb)
     (or Ev Gv)
     (or Gb Eb)
     (or Gv Ev)
     (or *AL Ib)
     (or *rAX Iz)
     #(Mode (push *CS) #f)

The structure of the table itself is used to navigate the opcode space. From the snippet above it can be seen that the add instruction uses opcodes 00 to 05, push es is on 06, etc. Some instructions take implicitly encoded operands (e.g. *AL can only be the al register operand), but most operands need to be provided separately using additional bytes. The Eb opsyntax can be a byte register or a byte memory reference. The encoding details are left to the assembler and the task is just to generate operands that match the requirements.

Both schjig and the new test generator for Zabavno prefer to generate integers that lie close to power-of-two boundaries. This tends to uncover a lot of edge cases. The test generator is very simple right now at around 380 lines, it only generates byte register operands and byte immediates, but is easy to extend with additional operands.

The tested instruction is placed in a Linux i386 ELF binary that’s generated using the x86 assembler of the machine-code project. There is no need to emit C code or interact with other tools at all, except for the Linux kernel, so the test runtime environment is pretty simple to build and execute. For each test case the ELF binary contains a short setup sequence followed by the tested instruction itself. Then it compares the actual register values with the register values produced by Zabavno. If there’s a mismatch it prints a failure report and exits with a non-zero status.

Bugs found

Even before the test program was finished it found a bug in the dec instruction. Here’s the report (note the difference in the flags line):

Test failed: dec-Eb-0
(mov (mem32+ scratch-flags) #x41)
(mov esp scratch-flags)
(mov eax #x10001)
(mov ecx #x80000001)
(mov edx #xFFFFFFF)
(mov ebx #x1)
(mov esp #x7FFF)
(mov ebp #x3FFFF)
(mov esi #x1FFFFF)
(mov edi #x40000001)
(dec al)
Result from emulation in Zabavno:
eax     #x00010000
ecx     #x80000001
edx     #x0FFFFFFF
ebx     #x00000001
esp     #x00007FFF
ebp     #x0003FFFF
esi     #x001FFFFF
edi     #x40000001
flags   #x00000257
Result from processor execution:
eax     #x00010000
ecx     #x80000001
edx     #x0FFFFFFF
ebx     #x00000001
esp     #x00007FFF
ebp     #x0003FFFF
esi     #x001FFFFF
edi     #x40000001
flags   #x00000247

It has so far found bugs in the AF flag handling of adc, dec and sbb. And it hasn’t even been activated yet for most instructions. One complication with enabling this for more instructions is that Zabavno doesn’t emulate the undefined processor flags “correctly” (it just clears them). It remains to be seen what can be done there, but it should not be very difficult for the emulator to track which flags are undefined.

Continuous Integration

The test suite should run automatically. GitHub offers a large amount of CI tools, so it can be hard to know where to start. I naturally picked the one with a moustache, Travis CI.

Setting up an account on Travis CI is just a matter of logging in via your GitHub account and granting some innocuous permissions. Travis will then let you enable testing for any of your repositories.

Tests are configured in a configuration file that you commit to your repository as .travis.yml. Right now they don’t have a runtime for Scheme. But they do have runtimes for C and most of the big popular languages, so installing a Scheme as part of the build process isn’t very difficult. (And besides that, they also let you use apt to install packages from Ubuntu, and there are a bunch of Schemes available through there). Here’s the configuration file used by Zabavno (slightly reformatted for the web). It downloads Chez Scheme, generates a test suite and runs it:

language: c

  - linux

  - gcc

  # Install Chez Scheme
  - "wget
  - unzip
  - "pushd ChezScheme-master && ./configure
      --installprefix=$TRAVIS_BUILD_DIR/chez &&
     make && make install && popd"
  - export PATH=$TRAVIS_BUILD_DIR/chez/bin:$PATH
  # Install machine-code
  - "wget
  - unzip
  - mv machine-code-master machine-code

  - programs/zabavno --help
  - tests/x86/generate.scm && chmod +x generate.out
  - ./generate.out

Simply commit a file similar to this one this named .travis.yml and push it to a branch. For the initial setup you can push it to a test branch and Travis will still see it.

Travis sends you emails about the build status and also shows the build output as the build is happening. To top it all off there’s a status image you can link to from your project. Now everyone can see that the code is working.

Build Status