The comments on this blog are now back from their GDPR-induced coma. I’m using a custom comment system powered by HTMX and a backend built on Loko Scheme. While writing the backend, one thing lead to another and I wanted to see if my HTTP message parser could crash. This is when I discovered that the AFL support in Loko Scheme had suffered bit rot. I have repaired it now and wanted to demonstrate how to fuzz Scheme code with Loko Scheme and AFL++.
AFL++ is a fuzzer based on the original
American Fuzzy Lop (AFL)
by Michał “lcamtuf” Zalewski. Loko
Scheme previously had support for AFL but it inadvertently stopped working back when the
.data segment was made read-only.
The fuzzer support is now repaired
and accessible with a command line flag.
I did find one bug in the HTTP message parser, but it was not very interesting. More interesting were the problems that I found in laesare, my Scheme reader library. AFL++ found a way to crash it and to make it hang. Oops!
Steps to Fuzzing
Here is how you fuzz a Scheme program (R6RS or R7RS):
- First make sure you have AFL++ installed:
sudo apt install afl++.
- Install Akku.scm, the Scheme package manager. It is required by Loko Scheme.
- Install Loko Scheme from git, or any future version later than 0.12.0.
- Write a program that reads input from standard input and passes it to your function under test.
- Compile that program with
loko -fcoverage=afl++ --compile coverage.sps.
- Create a directory called
inputswith sample input files. It is often enough to provide a single file, but it can help speed up the fuzzing process if you have samples of a wide variety.
- Run AFL++ with
env AFL_CRASH_EXITCODE=70 afl-fuzz -i inputs/ -o outputs -- ./coverage. Watch the pretty status screen and wait for it to find crashes and timeouts.
- Analyze the outputs.
That’s essentially it. Only steps 4—8 are really specific to fuzzing, so I will go through them in detail.
Program Under Test
You need a program that reads from standard input and passes the data to the code you want to test with AFL++. This is usually very simple to accomplish.
If your program works with textual data then you read from
(current-input-port), but if it binary data then you make a new
binary input port with
(standard-input-port) and read from that. You
can either read until you see the eof object and pass all the data
directly to the code under test, or you can pass the port directly to
the code under test. It depends on what your API needs as input.
Here is an example program that feeds data to
get-token from laesare:
;; SPDX-License-Identifier: MIT (import (rnrs) (laesare reader)) (let ((reader (make-reader (current-input-port) "stdin"))) (reader-mode-set! reader 'r6rs) (let lp () (let-values ([(type token) (guard (con ((lexical-violation? con) (values 'condition con))) (get-token reader))]) (write type) (display #\space) (write token) (newline) (unless (eof-object? token) (lp)))))
Compile with Instrumentation
The program now needs to be compiled with instrumentation for AFL++. This is done by passing a new flag to loko:
.akku/env loko -fcoverage=afl++ --compile coverage.sps
-fcoverage=afl++ flag tells the code generator to insert a
special code sequence inside every
The way that AFL++ works is that the program receives a shared memory
afl-fuzz that it mutates during the execution of the
program. You can imagine that this program:
(if test conseq altern)
is transformed into this program:
(if test (begin (afl-mutate (compile-time-random)) conseq) (begin (afl-mutate (compile-time-random)) altern))
(compile-time-random) should be fixed at compile-time
and should be different for the two branches. Therefore the effect of
afl-mutate on the shared memory segment will be different depending
on which branch is taken at runtime, but it will be identical for
different runs of the program if the input is identical.
Now suppose that this transformation is applied to the whole program. The path of all branches taken through the program for a given input generates a unique fingerprint that AFL++ uses to explore all the branches in the program. It uses some clever algorithms to mutate the input in ways that uncover new paths through the program. This is repeated thousands of times per second to eventually (maybe) find inputs that crash or hang the program.
By the way, when you use the
-fcoverage=afl++ flag with Loko you
also get an instrumented standard library. This means that AFL++ can
(scheme base), etc, and can fuzz them along with
your code. This means that AFL++ can be smarter when it searches for
bugs that are triggered by how your program interacts with the
standard library, which would otherwise be a black box.
Run the Fuzzer
With the binary built you can start the fuzzer:
$ env AFL_CRASH_EXITCODE=70 afl-fuzz -i inputs \ -o outputs -- ./coverage
It will tell you if something is wrong with the program. Otherwise it starts up a screen that looks like this:
Not all crashes and timeouts are necessarily unique, many of them are likely to be triggered by the same bug.
The speed of fuzzing can shift over time, but I commonly see around
2500 4300 executions/sec using a single core on my machine. This can be
further sped up by using multiple cores and system tuning that the
AFL++ manual can tell you more about.
Analyze the outputs
If the “findings in depth” box reports crashes and timeouts then you
can go and look in the output directory. The
directory contains files that you can just feed directly into your
$ ./coverage < outputs/default/crashes/id:000000,* … Frame 2 has return address #x26987C. Local 3: #<closure dynamic-wind /usr/local/lib/loko/loko/runtime/control.loko.sls:164:0> Local 4: #[reader port: #<textual-input-port "*stdin*" fd: 0> filename: "stdin" line: 1 column: 40 saved-line: 1 saved-column: 10 fold-case?: #f mode: r6rs tolerant?: #f] Local 5: &lexical Frame 3 has return address #x24282F. Local 0: #f Local 1: 0 Local 2: 0 End of stack trace. The condition has 6 components: 1. &assertion &violation &serious 2. &who: get-token 3. &who: "/usr/local/lib/loko/laesare/reader.sls:460:0" 4. &message: "Type error: expected a fixnum" 5. &program-counter: #x2E68F1 6. &continuation k: #<closure continuation /usr/local/lib/loko/loko/runtime/control.loko.sls:152:21> End of condition components. …
This tells us there is a crash in the
Unfortunately this is a huge procedure and Loko does not yet generate
DWARF information that lets us get source line information from the
instruction pointer. We know that something in
a fixnum, but Loko is a bit sloppy with the
&who condition when
it comes to assertions from inlined built-ins.
We can use
objdump -d ./coverage and look for the instruction at
0x2E68F1 or the instruction that jumps to that address and try
to make sense of the context. But there is another way.
More Tools for Easier Fun
AFL++ comes with tools that let you analyze the crashes and minimize the inputs. When you report a bug found using a fuzzer it is important to first use a minimizer to find a minimal reproducer. The person reading your bug report does not want to have to guess which parts of the input are relevant and which are noise. This applies also to us when we’re the ones using AFL++ for our own code.
Minimize with afl-tmin
Here is how you run the minimizer:
$ env AFL_CRASH_EXITCODE=70 afl-tmin \ -i outputs/default/crashes/id:000000,* \ -o crash -- ./coverage
The afl-tmin program uses the same binary as before but has a different goal: remove as much of the input as possible.
This is what happened to the file:
$ hexdump outputs/default/crashes/id:000000,* 0000000 0023 0100 2300 0000 0001 5c23 4678 4646 0000010 4646 4646 4646 4646 4646 4646 4646 4646 0000020 4646 4646 4646 4646 4623 4646 1821 3070 0000030 1818 1818 1818 1702 1818 5c00 7038 000003e $ hexdump crash 0000000 5c23 4678 3030 3030 3030 3030 3030 3030 0000010 3030 0030 0000013 $ cat -vet crash #\xF000000000000000
AFL++ has just told us that laesare crashes if it attempts to read a large character constant! That bug is now fixed in the git repo.
Perhaps even more impressive is that the minimizer works even when the
program hangs and it turns out that
(string->number "0F800000") hangs
Loko Scheme. Oops again!
Analyze with afl-analyze
The analyzer is another fun tool and here is how you run it:
$ afl-analyze -i crash -- ./coverage
In this case we already know what the problem is a character constant that is too large, so it is not telling us anything new. But it has figured out that the middle zeros do not really affect the program flow, which can be useful information when analyzing other test cases.
Fuzzing is a powerful technique that automatically searches for inputs that crash or hang your program. Loko Scheme can now be used with AFL++ to fuzz Scheme programs.
Write a program
coverage.sps that passes standard input to the
procedure you want to test and then compile it with a recent Loko Scheme:
mkdir inputs echo '()' > inputs/nil .akku/env loko -ftarget=linux -fcoverage=afl++ --compile coverage.sps env AFL_CRASH_EXITCODE=70 afl-fuzz -i inputs -o outputs -- ./coverage