weinholt.se

Chez Scheme 10 in Debian experimental

weinholt — Sun, 21 Jul 2024 02:00:00 +0200

I have recently been working on getting Chez Scheme 10.0.0 into Debian and have uploaded it to Debian experimental. After some minor fixes it now builds on most archs except armel and x32, using bytecode where a native port is not available. Please test it and report any bugs in the Debian bug tracker.

Fuzzing Scheme with AFL++

weinholt — Thu, 18 May 2023 02:00:00 +0200

The comments on this blog are now back from their GDPR-induced coma. I’m using a custom comment system powered by HTMX and a backend built on Loko Scheme. While writing the backend, one thing led to another and I wanted to see if my HTTP message parser could crash. This is when I discovered that the AFL support in Loko Scheme had suffered bit rot. I have repaired it now and wanted to demonstrate how to fuzz Scheme code with Loko Scheme and AFL++.

AFL++ is a fuzzer based on the original American Fuzzy Lop (AFL) by Michał “lcamtuf” Zalewski. Loko Scheme previously had support for AFL but it inadvertently stopped working back when the .data segment was made read-only. The fuzzer support is now repaired and accessible with a command line flag.

I did find one bug in the HTTP message parser, but it was not very interesting. More interesting were the problems that I found in laesare, my Scheme reader library. AFL++ found a way to crash it and to make it hang. Oops!

Steps to Fuzzing

Here is how you fuzz a Scheme program (R6RS or R7RS):

1. First make sure you have AFL++ installed: sudo apt install afl++.
2. Install Akku.scm, the Scheme package manager. It is required by Loko Scheme.
3. Install Loko Scheme from git, or any future version later than 0.12.0.
4. Write a program that reads input from standard input and passes it to your function under test.
5. Compile that program with loko -fcoverage=afl++ --compile coverage.sps.
6. Create a directory called inputs with sample input files. It is often enough to provide a single file, but it can help speed up the fuzzing process if you have samples of a wide variety.
7. Run AFL++ with env AFL_CRASH_EXITCODE=70 afl-fuzz -i inputs/ -o outputs -- ./coverage. Watch the pretty status screen and wait for it to find crashes and timeouts.
8. Analyze the outputs.

That’s essentially it. Only steps 4 to 8 are really specific to fuzzing, so I will go through them in detail.

Program Under Test

You need a program that reads from standard input and passes the data to the code you want to test with AFL++. This is usually very simple to accomplish.

If your program works with textual data then you read from (current-input-port), but if it is binary data then you make a new binary input port with (standard-input-port) and read from that. You can either read until you see the eof object and pass all the data directly to the code under test, or you can pass the port directly to the code under test. It depends on what your API needs as input.

Here is an example program that feeds data to get-token from laesare:

;; SPDX-License-Identifier: MIT
(import
  (rnrs)
  (laesare reader))

(let ((reader (make-reader (current-input-port) "stdin")))
  (reader-mode-set! reader 'r6rs)
  (let lp ()
    (let-values ([(type token)
                  (guard (con
                          ((lexical-violation? con)
                           (values 'condition con)))
                    (get-token reader))])
      (write type)
      (display #\space)
      (write token)
      (newline)
      (unless (eof-object? token)
        (lp)))))
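For code that works on binary data the harness has the same shape. The following is a minimal sketch, assuming a hypothetical procedure check-message as the code under test (it is only a stand-in and not part of any library mentioned here). It reads all of standard input into a bytevector and passes it on:

```scheme
;; A sketch of a binary fuzzing harness (R6RS).
(import (rnrs))

;; Read everything from a binary input port. get-bytevector-all returns
;; the eof object when there is no data, so map that to an empty bytevector.
(define (read-all-bytes port)
  (let ((data (get-bytevector-all port)))
    (if (eof-object? data)
        (make-bytevector 0)
        data)))

;; Hypothetical procedure under test; replace it with your own parser.
(define (check-message bv)
  (bytevector-length bv))

;; (standard-input-port) returns a fresh binary port for standard input.
(write (check-message (read-all-bytes (standard-input-port))))
(newline)
```

Reading everything up front suits APIs that take a bytevector. If the API takes a port, the port from (standard-input-port) can be passed in directly instead.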

Compile with Instrumentation

The program now needs to be compiled with instrumentation for AFL++. This is done by passing a new flag to loko:

.akku/env loko -fcoverage=afl++ --compile coverage.sps

The new -fcoverage=afl++ flag tells the code generator to insert a special code sequence inside every if expression.

The way that AFL++ works is that the program receives a shared memory segment from afl-fuzz that it mutates during the execution of the program. You can imagine that this program:

(if test
    conseq
    altern)

is transformed into this program:

(if test
    (begin (afl-mutate (compile-time-random)) conseq)
    (begin (afl-mutate (compile-time-random)) altern))

The expression (compile-time-random) should be fixed at compile-time and should be different for the two branches. Therefore the effect of afl-mutate on the shared memory segment will be different depending on which branch is taken at runtime, but it will be identical for different runs of the program if the input is identical.

Now suppose that this transformation is applied to the whole program. The path of all branches taken through the program for a given input generates a unique fingerprint that AFL++ uses to explore all the branches in the program. It uses some clever algorithms to mutate the input in ways that uncover new paths through the program. This is repeated thousands of times per second to eventually (maybe) find inputs that crash or hang the program.
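To make the fingerprint idea concrete, here is a toy model in plain Scheme. It is a sketch under assumptions: the bytevector shared-map stands in for the shared memory segment, the constants 12345 and 54321 stand in for (compile-time-random), and the update rule follows the edge-counting scheme documented for classic AFL (index by current location XOR previous location). It is not the code that Loko actually generates.

```scheme
;; Toy model of AFL-style coverage tracking (R6RS).
(import (rnrs))

(define map-size 65536)                       ; power of two, as in classic AFL
(define shared-map (make-bytevector map-size 0))
(define prev-location 0)

;; Bump the counter for the edge prev-location -> cur-location.
(define (afl-mutate cur-location)
  (let ((idx (bitwise-and (bitwise-xor cur-location prev-location)
                          (- map-size 1))))
    (bytevector-u8-set! shared-map idx
                        (bitwise-and (+ 1 (bytevector-u8-ref shared-map idx))
                                     255))
    (set! prev-location (bitwise-arithmetic-shift-right cur-location 1))))

;; An "instrumented" procedure: each branch has its own fixed location ID.
(define (classify n)
  (if (< n 0)
      (begin (afl-mutate 12345) 'negative)
      (begin (afl-mutate 54321) 'non-negative)))

;; Run thunk with a fresh map and return a copy of the resulting map.
(define (fingerprint thunk)
  (bytevector-fill! shared-map 0)
  (set! prev-location 0)
  (thunk)
  (bytevector-copy shared-map))
```

Calling (fingerprint (lambda () (classify -1))) twice gives identical maps, while (classify 1) gives a different one. That difference is the signal the fuzzer uses to notice that an input reached a new path.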

By the way, when you use the -fcoverage=afl++ flag with Loko you also get an instrumented standard library. This means that AFL++ can see into (rnrs), (scheme base), etc, and can fuzz them along with your code. As a result, AFL++ can be smarter when it searches for bugs that are triggered by how your program interacts with the standard library, which would otherwise be a black box.

Run the Fuzzer

With the binary built you can start the fuzzer:

$ env AFL_CRASH_EXITCODE=70 afl-fuzz -i inputs \
    -o outputs -- ./coverage

It will tell you if something is wrong with the program. Otherwise it starts up a screen that looks like this:

Not all crashes and timeouts are necessarily unique; many of them are likely to be triggered by the same bug.

The speed of fuzzing can shift over time, but I commonly see around 4300 executions/sec (previously 2500) using a single core on my machine. This can be sped up further by using multiple cores and by system tuning, which the AFL++ manual can tell you more about.

Analyze the outputs

If the “findings in depth” box reports crashes and timeouts then you can go and look in the output directory. The outputs/default/crashes directory contains files that you can just feed directly into your test program:

$ ./coverage < outputs/default/crashes/id:000000,*
…
 Frame 2 has return address #x26987C.
  Local 3: #<closure dynamic-wind /usr/local/lib/loko/loko/runtime/control.loko.sls:164:0>
  Local 4: #[reader port: #<textual-input-port "*stdin*" fd: 0>
                    filename: "stdin" line: 1 column: 40 saved-line: 1
                    saved-column: 10 fold-case?: #f mode: r6rs tolerant?: #f]
  Local 5: &lexical
 Frame 3 has return address #x24282F.
  Local 0: #f
  Local 1: 0
  Local 2: 0
End of stack trace.
The condition has 6 components:
 1. &assertion &violation &serious
 2. &who: get-token
 3. &who: "/usr/local/lib/loko/laesare/reader.sls:460:0"
 4. &message: "Type error: expected a fixnum"
 5. &program-counter: #x2E68F1
 6. &continuation
     k: #<closure continuation /usr/local/lib/loko/loko/runtime/control.loko.sls:152:21>
End of condition components.
…

This tells us there is a crash in the get-token procedure. Unfortunately this is a huge procedure and Loko does not yet generate DWARF information that lets us get source line information from the instruction pointer. We know that something in get-token expected a fixnum, but Loko is a bit sloppy with the &who condition when it comes to assertions from inlined built-ins.

We can use objdump -d ./coverage and look for the instruction at 0x2E68F1 or the instruction that jumps to that address and try to make sense of the context. But there is another way.

More Tools for Easier Fun

AFL++ comes with tools that let you analyze the crashes and minimize the inputs. When you report a bug found using a fuzzer it is important to first use a minimizer to find a minimal reproducer. The person reading your bug report does not want to have to guess which parts of the input are relevant and which are noise. This applies also to us when we’re the ones using AFL++ for our own code.

Minimize with afl-tmin

Here is how you run the minimizer:

$ env AFL_CRASH_EXITCODE=70 afl-tmin  \
  -i outputs/default/crashes/id:000000,* \
  -o crash -- ./coverage

The afl-tmin program uses the same binary as before but has a different goal: remove as much of the input as possible.

This is what happened to the file:

$ hexdump outputs/default/crashes/id:000000,*
0000000 0023 0100 2300 0000 0001 5c23 4678 4646
0000010 4646 4646 4646 4646 4646 4646 4646 4646
0000020 4646 4646 4646 4646 4623 4646 1821 3070
0000030 1818 1818 1818 1702 1818 5c00 7038
000003e
$ hexdump crash
0000000 5c23 4678 3030 3030 3030 3030 3030 3030
0000010 3030 0030                              
0000013
$ cat -vet crash
#\xF000000000000000

AFL++ has just told us that laesare crashes if it attempts to read a large character constant! That bug is now fixed in the git repo.

Perhaps even more impressive is that the minimizer works even when the program hangs. It turns out that (string->number "0F800000") hangs Loko Scheme. Oops again!
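The idea behind minimization can be shown with a few lines of Scheme. This is only a sketch of the general principle (drop a piece of the input and keep the change if the failure still reproduces). It is not the algorithm that afl-tmin implements, and the predicate crashes? is a hypothetical stand-in for running the program under test:

```scheme
;; Toy greedy input minimizer (R6RS).
(import (rnrs))

;; Return s without the character at index i.
(define (string-remove s i)
  (string-append (substring s 0 i)
                 (substring s (+ i 1) (string-length s))))

;; Try to delete each character in turn; keep a deletion only if the
;; shortened input still triggers the failure.
(define (minimize input crashes?)
  (let lp ((s input) (i 0))
    (cond ((>= i (string-length s)) s)
          ((crashes? (string-remove s i))
           (lp (string-remove s i) i))
          (else (lp s (+ i 1))))))

;; Hypothetical failure condition: the input contains both # and x.
(define (crashes? s)
  (let ((chars (string->list s)))
    (and (memv #\# chars) (memv #\x chars) #t)))

(write (minimize "ab#cdxef" crashes?))   ; writes "#x"
(newline)
```

A real minimizer runs the instrumented binary for every candidate and uses smarter strategies than single-character deletion, but the contract is the same: the output must still reproduce the problem.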

Analyze with afl-analyze

The analyzer is another fun tool and here is how you run it:

$ afl-analyze -i crash -- ./coverage

In this case we already know that the problem is a character constant that is too large, so it is not telling us anything new. But it has figured out that the middle zeros do not really affect the program flow, which can be useful information when analyzing other test cases.

tl;dr

Fuzzing is a powerful technique that automatically searches for inputs that crash or hang your program. Loko Scheme can now be used with AFL++ to fuzz Scheme programs.

Write a program coverage.sps that passes standard input to the procedure you want to test and then compile it with a recent Loko Scheme:

mkdir inputs
echo '()' > inputs/nil
.akku/env loko -ftarget=linux -fcoverage=afl++ --compile coverage.sps
env AFL_CRASH_EXITCODE=70 afl-fuzz -i inputs -o outputs -- ./coverage

Akku website updates

weinholt — Sun, 15 Jan 2023 01:00:00 +0100

I have had some free time recently between working for clients, and took this opportunity to implement new features for Akku’s website.

In case you did not know, Akku is a package manager with features specially designed for R6RS and R7RS Scheme.

The library systems in Scheme make it possible to automatically analyze source code to find libraries, exports and imports. Akku combines such analysis with a package index that lets you install packages from the command line, automatically resolving dependencies and placing files in the right place. Dependencies and installed files are project-specific, so you can concentrate on each project separately.

The top of the page now has a handy search box:

It uses DuckDuckGo, which I think is a good balance between privacy and functionality.

At some point it would be good to set up a custom search engine which also searches through all source code.

Who uploaded the package?

Akku has over 500 packages in its index. Many of them were uploaded by me personally when I initially set up the archive. Packaging is really simple due to the automatic analysis built into Akku, so when packaging something new I just needed to read through the code to see that nothing bad was going on, then add the required dependencies and write a description.

The package uploader is identified through their OpenPGP signature on the uploaded package at https://archive.akkuscm.org/archive/packages/. The website generator uses this signature to figure out who uploaded the package and shows their name if it is different from the (single) package author.

There’s an exception for packages mirrored from Snow. Those are all signed by me anyway, as there is no way to directly upload a Snow package to Akku. Those all have to go through snow-fort.org first.

Anyone can upload packages to Akku’s archive, so if you find some cool R6RS project that’s missing in Akku then you can go ahead and upload it. See the man page for instructions. (If you want to upload a new version of a package that already exists then that’s okay too, but if the author themselves uploaded the previous version then please check with them first).

Before packages are published in the archive they are manually reviewed to verify that they’re not up to any funny business. I think this is the only way to truly prevent the attacks that regularly happen to the larger package repositories for other languages. The work is manageable today and if Akku gets more popular then it should still be sustainable with increased automation to help with the tedious parts of the manual review.

Where do I get that library?

If you have some Scheme code in front of you that imports a library then you might like to find the package that contains it. There is now a new page with just such an index:

The index is still small enough that everything fits on one page. On the left you see libraries, which are tagged as R6RS libraries, R7RS libraries or implementation-specific modules. On the right you see which packages contain the library.

A library can exist in multiple packages, so you will see some libraries with links to multiple packages. When Akku encounters this situation it is the order of the dependencies in Akku.manifest that determines which variant “wins”: dependencies can overwrite files from dependencies specified earlier in the list.

Who exports this identifier?

Akku’s archive also has information about what identifiers are exported by each library, so I have made a page where you can look for identifiers and find the relevant libraries and packages.

I’m not too happy about the usability of this page, and ideas for improvements are very welcome. I tried putting it all on one page, thinking that at least then you can search in the browser, but Mr. Browser got slow and the page was 11MB. So instead there’s an awkward split into multiple pages. It should however make search engines happy, and that’s good enough for now.

What’s in the package?

Akku’s analyzer knows what is in each package and the archive software has been publishing this information for some time, but it has never been visualized before. The only information you got on the website was a synopsis, a list of authors, maybe a description and a reference to the source code.

The new “Package contents” box lists all libraries and modules that are found in the package. The first row under each library name is the list of exported identifiers, followed by one row for each imported library. Meta-information is added to the rows, like a little tag showing if it’s an R6RS library, an R7RS library, or an implementation-specific module format.

A library name can show up multiple times if there are implementation-specific variants, e.g., one variant for Chez Scheme and another one for Chibi-Scheme. This is also shown with a little tag on the package name.

SRFI library names are linked to the relevant page on https://srfi.schemers.org. This type of linking is something that should be expanded on later, but for now that’s the only type of link you will see on a library.

I think this type of information adds a lot to the package pages. You get a deeper insight into what libraries there are, what libraries they use, and you can quickly see if the package is missing library variants for your Scheme implementation. An example is the wak-common package and its (wak private include compat) library, which exists only for Chez Scheme, GNU Guile, Ikarus, Mosh, Racket and Ypsilon. However, if you look in the library index then you can see that the akku package also has a variant of this library for Loko Scheme.

Future work

There is more I would like to add to the website, and these are not small projects:

Documentation. Some packages have proper documentation and it would be good to link to this. It might require an update to Akku’s package format so that it will be built properly, e.g., if there are PDFs to generate. It’s also common for other languages to have automatically extracted documentation from comments, but today there is no widespread tool for Scheme that does this.
Test reports. Many packages have automatic test suites, but Akku does nothing with them at the moment. An even simpler test would be to just try importing each library in each Scheme and see if that works.

Suggestions and merge requests are welcome at the website’s project page: https://gitlab.com/akkuscm/akku-web.

Loko Scheme 2022 Q4 Update

weinholt — Sun, 06 Nov 2022 01:00:00 +0100

I released Loko Scheme 0.12.0 last month and forgot to blog about it. I’ve been busy starting my own consulting company so it just slipped my mind. There are two cool milestones with this release.

Self-compilation on bare metal

A cool milestone in 0.12.0 is one of those things that is pretty significant but that you can’t really demonstrate visually.

I have implemented enough of the Linux syscall layer that I was able to run Loko’s compiler on bare metal. I used an old Acer laptop to compile Loko itself while running only Loko on the laptop. Many compilers can compile themselves but this is a new extreme.

Valand, a windowing system

Loko now has a windowing system called Valand. Its design is somewhat inspired by Wayland, except it’s integrated in the kernel and is meant to be used on bare metal. So Loko on bare metal now has support for running multiple graphical programs with preemptive multitasking. You can even run Doom through a port of doomgeneric:

The way this works is through an extension to the Linux syscall ABI emulation. When you cross-compile doomgeneric on Linux you get an ELF binary that you can copy to the hard drive and then load with @/doomgeneric in the REPL window. That starts a doomgeneric process that opens /dev/valand, which gives it a file descriptor for Valand.

The Valand file descriptor supports an ioctl for creating a graphical surface which is then mapped into the process memory with mmap. Doomgeneric writes pixel data to this memory and calls another ioctl to mark the surface as damaged. Valand regularly fixes the damages by copying the damaged pixels to the framebuffer, which means that the screen is updated with a new frame from the game.

Keyboard events are returned by doing a non-blocking read on the Valand file descriptor. If there is an event then it’s returned as a struct that specifies a USB HID page and usage. Using USB HID means that there is no need to invent yet another scancode table just for Loko.

Valand keeps track of surfaces and composes an image from them. The composing magic is done with a bunch of rectangle math and a z-buffer. It is all done in Scheme code compiled to native machine code by Loko. I haven’t benchmarked it, but it’s fast enough to not be laggy.

It’s not much but it’s enough to get Doom running. You might notice that there are no title bars and controls on the windows. There’s very little that the window system gives you in the current version. You can move windows and keyboard focus will follow the mouse. Valand is starting out small and simple.

Dreaming up what’s next

The next milestone could be to port an editor. With an editor running on Loko and Valand it would in principle be possible to keep developing Loko without using another OS. I’m thinking that the fork of uEmacs/PK that Torvalds maintains should be pretty simple to port. Loko doesn’t have a terminal emulator, not even a tty layer, but you could build the terminal renderer into the uEmacs binary and have it use Valand for the UI. Update 2023-02-05: I just learned that uEmacs has a non-free license. I will find something else.

And I intend for Valand to be an integral part of the operating system that I’m building with Loko. This will make it possible to do some things that you can’t do in an OS like GNU/Linux where these components are much more loosely coupled. The Linux kernel has no idea about the desktop environment you’re using, which is the right thing for its design, but which also limits what can be done.

The tighter coupling means that Valand can provide a trusted path. The user should have a way into the system which they know with certainty can’t be faked. The system menu on top of the screen will be one such trusted path. It’s a placeholder in the screenshot shown above, but you can imagine something like the macOS menu. Window decorations will be another trusted path; it should not be possible to fake them.

A mini-rant

Linux systems sometimes freeze because the kernel overcommits memory and under heavy memory pressure begins discarding the pages of demand-paged executables. The kernel can basically decide to discard all of user space in favor of a rogue memory hog, so user space grinds to a halt.

Loko should guarantee that the computer always remains responsive, even if a program goes rogue and uses up all resources. I’m pretty weary of my Linux desktop occasionally freezing, so I’m not going to allow that in Loko.

And I don’t want to support anything that steals keyboard focus. Not even dialogue windows. Imagine typing and knowing with utter certainty where your keystrokes will be sent. I haven’t experienced that since DOS.

So when 1.0?

Obviously a version number like 0.12.0 is getting ridiculous and it’s time for 1.0.0 soon. The big milestone that I’ve been wanting to reach before 1.0.0 is to make eval use the compiler. I’ve been putting it off, even though it’s not really all that difficult. Perhaps I’ll get to it once my company is off the ground.

Cond-expand and #ifdef

weinholt — Thu, 09 Jun 2022 02:00:00 +0200

In the C programming language you can ask the macro preprocessor to keep or remove part of a source file. This is done with #ifdef. The equivalent in Scheme is called cond-expand. R7RS Scheme has two different instances of cond-expand, while R6RS Scheme does not have it at all. What does R6RS do instead, and is cond-expand a bad idea?

Use cases

What are #ifdef and its cousins #if and #ifndef used for? And why might we want their equivalent in Scheme? There are two major use cases for #ifdef: build-time configuration and portability. The build system will usually have some sort of configuration script that detects what system it’s running on. Usually these scripts also let the user enable or disable features.

Chez Scheme runs on top of a chunk of fairly portable C and uses #ifdef as described:

$ grep -h '^#ifdef' ChezScheme/c/*.[ch] |awk '{print $2}' \
   | sort -u | xargs
ARCHYPERBOLIC ARMV6 BSDI CHAFF CHECK_FOR_ROSETTA CLOCK_HIGHRES
CLOCK_MONOTONIC CLOCK_MONOTONIC_HR CLOCK_PROCESS_CPUTIME_ID
CLOCK_REALTIME CLOCK_REALTIME_HR CLOCK_THREAD_CPUTIME_ID DEBUG
DEFINE_MATHERR DISABLE_CURSES EINTR ENABLE_OBJECT_COUNTS
FEATURE_EXPEDITOR FEATURE_ICONV FEATURE_PTHREADS FEATURE_WINDOWS FLOCK
FLUSHCACHE FunCRepl GETWD HANDLE_SIGWINCH HPUX I386 IEEE_DOUBLE ITEST
KEEPSMALLPUPPIES LIBX11 LITTLE_ENDIAN_IEEE_DOUBLE LOAD_SHARED_OBJECT
LOCKF LOG1P LOOKUP_DYNAMIC MACOSX MAP_32BIT __MINGW32__ MMAP_HEAP
NAN_INCLUDE NO_DIRTY_NEWSPACE_POINTERS NOISY
NO_LOCKED_OLDSPACE_OBJECTS PPC32 PROMPT PTHREADS SA_INTERRUPT
SA_RESTART SAVEDHEAPS segment_t2_bits segment_t3_bits SIGBUS SIGQUIT
SOLARIS SPARC SPARC64 TIOCGWINSZ USE_MBRTOWC_L WIN32 _WIN64 WIPECLEAN
X86_64

Macros like FEATURE_EXPEDITOR turn on and off functionality, while macros like HPUX and I386 are used for portability. So, configuration and portability.

Portability?

Does #ifdef truly help with portability? It can certainly seem this way, but there’s a different way to think about this issue. This is what Rob Pike had to say on the TUHS mailing list:

C with #ifdefs is not portable, it is a collection of 2^n overlaid programs, where n is the number of distinct #if[n]def tags. It’s too bad the problems of that approach were not appreciated by the C standard committee, who mandated the #ifndef guard approach that I’m sure could count as a provable billion dollar mistake, probably much more. The cost of building #ifdef’ed code, especially with C++, which decided to be more fine-grained about it, is unfathomable.

For each C file with #ifdef you need to understand what happens if the condition is true versus if it’s false. It’s simple with just one #ifdef, but the problem grows exponentially.

Configuration?

You can use #ifdef for conditional compilation, to turn on and off features. But this can also create a mess like that described by Pike above. The GNU Coding Standards have this to say:

When supporting configuration options already known when building your program we prefer using if (... ) over conditional compilation, as in the former case the compiler is able to perform more extensive checking of all possible code paths.

The same sentiment is echoed by Douglas McIlroy in the message that preceded Pike’s message above.

This approach is generally a good idea. All those conditionals give you a lot of code paths. Checking that all of them even compile is difficult to do by hand, and you need all the help you can get. When you use #ifdef you hide the code from the compiler. The compiler can’t check code that it can’t see. Using if (... ) when the expression is constant at compile time should give the same result as conditional inclusion, at least if you have an optimizing compiler.

Also Considered Harmful

It’s not just people on the Internet saying these things about #ifdef, and the complaints are not new either.

We believe that a C programmer’s impulse to use #ifdef in an attempt at portability is usually a mistake. Portability is generally the result of advance planning rather than trench warfare involving #ifdef. In the course of developing C News on different systems, we evolved various tactics for dealing with differences among systems without producing a welter of #ifdefs at points of difference. We discuss the alternatives to, and occasional proper use of, #ifdef.

Source: SPENCER, Henry; COLLYER, Geoff. #ifdef considered harmful, or portability experience with C News. In: USENIX Summer 1992 Technical Conference. 1992.

You can use Google Scholar to find papers that cite this paper, if you’re interested in more reading.

cond-expand is not as bad…

The first standardization of cond-expand that I know of is Marc Feeley’s SRFI-0. It is also part of R7RS Scheme, where I believe it has seen wider adoption than plain SRFI-0.

There is one major difference between #ifdef and cond-expand. The former is handled by a preprocessor that does not understand the lexical syntax of the language it is working with. You can even use cpp with languages other than C, e.g. assembly. This means you can easily introduce latent syntax errors with #ifdef.

There is a cond-expand available from inside define-library and another one available as syntax in (scheme base). Both of these are handled after the source file has been parsed. This means that cond-expand cannot create an unbalanced syntax tree. What I mean by this is that you can’t somehow use cond-expand wrong in such a way that the parentheses become unbalanced. To make this mistake with #ifdef is trivial; simply place } before #endif when it should have been after, or vice versa.

You get bonus points, so to speak, if you do this near code that handles portability to operating systems that you can’t test your changes on.

… but not really better

Apart from the differences in how the compiler handles them, they do actually express the same thing. One can look at cond-expand as morally equivalent to a series of #if, #else and #endif directives. Therefore the very same problems that happen with #ifdef also happen with cond-expand.

I’m not optimistic about the future landscape of R7RS code if cond-expand is not recognized for the problems it brings. It may be that each R7RS library will become a jungle of 2ⁿ overlaid libraries. I have seen some indication of this process already beginning when looking at the packages in Snow Fort.

Fortunately I have not seen any examples where cond-expand is used to change which identifiers are exported from a library, but that day may yet come.

Back to R6RS

So in the beginning of this article I wrote that R6RS Scheme does not have cond-expand. Does that mean it has another way to handle these problems?

No, but in practice: yes. In the R6RS report there are only libraries as a suggested way to handle this. And there isn’t really a way to conditionally import libraries at compile time.

This situation has given rise to some creative solutions. The configuration and portability problems do not disappear just like that, so people have tried to solve it within the restrictions of the language.

Portability between R6RS implementations

I believe that all R6RS Scheme implementations implement the de facto standard of importing libraries by first looking for them in files that end with the .<impl>.sls suffix before they try .sls. If Chez Scheme sees (import (foo)) then it first tries foo.chezscheme.sls before it tries foo.sls. This mechanism, even though it’s not in R6RS, is widely implemented.
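The lookup order can be written down as a small procedure. This is a sketch of the naming convention only. The procedure names are made up for illustration, and real implementations also search a list of library directories and escape special characters in file names:

```scheme
;; Candidate file names for an R6RS library, most specific first.
(import (rnrs))

;; (foo bar) => "foo/bar"
(define (library-path name)
  (fold-left (lambda (acc part)
               (if (string=? acc "")
                   (symbol->string part)
                   (string-append acc "/" (symbol->string part))))
             ""
             name))

;; impl is the suffix used by the implementation, e.g. "chezscheme".
(define (candidate-files name impl)
  (let ((base (library-path name)))
    (list (string-append base "." impl ".sls")
          (string-append base ".sls"))))

(write (candidate-files '(foo) "chezscheme"))
;; writes ("foo.chezscheme.sls" "foo.sls")
(newline)
```

The first file in the list that exists is the one that gets loaded, so an implementation-specific variant shadows the generic one.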

This mechanism is used to create compatibility libraries. One striking example is the (xitomatl common) library from Derick Eddington’s xitomatl. It contains a few procedures that traditionally appear in Scheme implementations, but which are not in the reports. Here is common.chezscheme.sls:

;; Copyright 2009 Derick Eddington.  My MIT-style license is in the file named
;; LICENSE from the original collection this file is distributed with.

(library (xitomatl common)
  (export
    add1 sub1
    format printf fprintf pretty-print
    gensym
    time
    with-input-from-string with-output-to-string
    system
    ;; TODO: add to as needed/appropriate
    )
  (import
    (chezscheme))
)

There are matching libraries for Guile, Ikarus, Larceny, Racket, Mosh and Ypsilon. They export the same identifiers, but they all have some tweaks to adapt them to the various implementations.

This is the “Plan 9” approach to portability, as briefly described in the mailing list thread referenced above. Define APIs and let the implementation of the API hide the portability problems from the rest of the program.

Configuration for R6RS code

What is to be done for configuration? When I need this in my own code, I create a library that exports the configuration as identifier syntax. Here’s an abbreviated example from Loko Scheme:

(library (loko arch amd64 config)
  (export
    ; ...
    use-popcnt)
  (import
    (rnrs))

(define-syntax define-const
  (syntax-rules ()
    ((_ name v)
     (define-syntax name
       (identifier-syntax v)))))

; ...

(define-const use-popcnt #f))

I can then use this identifier syntax as a regular variable, like (if use-popcnt <do-this> <do-that>). But thanks to define-const it becomes inlined at the place where it is used. So if it’s set to #f then the expanded conditional is actually (if #f <do-this> <do-that>), which is trivial to optimize.
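Here is a self-contained sketch of the same trick as a top-level program. The procedures population-count and hypothetical-popcnt are invented for this example and are not taken from Loko; the fallback uses bitwise-bit-count from (rnrs):

```scheme
;; Configuration constants as identifier syntax (R6RS).
(import (rnrs))

(define-syntax define-const
  (syntax-rules ()
    ((_ name v)
     (define-syntax name
       (identifier-syntax v)))))

(define-const use-popcnt #f)

;; Hypothetical fast path; a real one would use the POPCNT instruction.
(define (hypothetical-popcnt n)
  (error 'hypothetical-popcnt "no POPCNT support in this sketch" n))

;; Expands to (if #f ...), so the compiler can drop the dead branch,
;; but both branches are still parsed and checked.
(define (population-count n)
  (if use-popcnt
      (hypothetical-popcnt n)
      (bitwise-bit-count n)))

(write (population-count 255))   ; writes 8
(newline)
```

Unlike with conditional inclusion, a typo in the disabled branch is still seen by the expander, which is the benefit the GNU Coding Standards quote is after.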

Akku supports this stuff

I have included support in Akku for both the R6RS and R7RS approach to portability. Akku will keep track of which Scheme implementation an R6RS library is meant for and adapt the way it installs the library. It will use the .<impl>.sls extension and even escape the file name correctly for that implementation.

With R7RS libraries it merely needs to install the .sld file to the right location. This is simple enough to do. But Akku also translates R7RS libraries into R6RS libraries. Akku has to do some interesting juggling when cond-expand appears at the define-library level.

Akku checks the list of features that appear in cond-expand and looks to see if it recognizes any implementation names. For each implementation it then creates a copy of the library that is specific to that implementation. For each such copy it expands all cond-expand expressions at the define-library level, as best as it can, and installs .<impl>.sls files. Kind of dirty, but it mostly works.

For example, this library will be installed as hello.chezscheme.sls, hello.loko.sls, hello.sld and hello.sls:

(define-library (hello)
  (export hello)
  (cond-expand
   ((library (rnrs))
    (import (rnrs)))
   (else
    (import (scheme base))))

  (cond-expand
   (chezscheme
    (begin
      (define (hello)
        (display "Hello Chez!\n"))))
   (loko
    (begin
      (define (hello)
        (display "Hello Loko!\n"))))
   (else
    (begin
      (define (hello)
        (display "Hello world!\n"))))))

(The hello.loko.sls file it creates is actually a symlink to hello.sld because Akku knows that Loko supports R7RS).

Maybe a way forward

The picture I’ve painted seems quite damning for cond-expand. But the problem is not really cond-expand itself. The problem is when the misuse of cond-expand leads to a mess. I’m not advocating for its removal, but I would like it to be better understood for what it is. Some widely distributed guidelines for how to use it would go far in reducing the damage.

The “Plan 9” approach of making APIs can be done with cond-expand just as well as with the R6RS approach of .<impl>.sls. It just requires that you know what you’re doing: that it’s a bad idea to sprinkle all your code with cond-expand and that you should keep this code in isolated libraries.

Finally, I’d like to say that cond-expand is actually more powerful than the .<impl>.sls approach. This power unfortunately makes static analysis of library declarations more difficult, but it also gives you access to feature identifiers that are more interesting than just the name of the Scheme implementation, such as x86-64 and clr. But please do hide your use of these behind an appropriate API.

Loko Scheme 0.9.0

weinholt — Sat, 21 Aug 2021 02:00:00 +0200

Loko Scheme 0.9.0 is now available from:

A bootable disk image for 64-bit PCs is available from:

The signatures are made with the GnuPG key 0xE33E61A2E9B8C3A2.

Loko Scheme 0.9.0 fixes bugs, improves performance and adds features. See NEWS.md in the distribution for a more detailed summary of changes.

Loko Scheme is an optimizing Scheme compiler that builds statically linked binaries for bare metal, Linux and NetBSD/amd64. It supports the R6RS Scheme and R7RS Scheme standards.

Loko Scheme’s web site is https://scheme.fail, where you can find the release tarballs and the manual.

Loko Scheme is available under GNU Affero GPL version 3 or later.

A Record Type Representation Trick

weinholt — Sat, 14 Aug 2021 02:00:00 +0200

I’ve been working on optimizations in Loko Scheme recently and have implemented large parts of A Sufficiently Smart Compiler for Procedural Records (Keep & Dybvig, 2012). At the same time I have improved the representation of record type descriptors and wanted to share a simple trick I used to improve record type checks for non-sealed records. But first I should explain what a record is in Scheme.

Background: Records in Scheme

Scheme supports record types, which are user-defined data types. R⁷RS Scheme has a syntax-based variant of this feature, based on SRFI-9. Here’s an example:

(import (scheme base))

(define-record-type point
  (make-point x y)
  point?
  (x point-x point-x-set!)
  (y point-y point-y-set!))

This will let you write (make-point 0.0 0.0) to get a point at (0.0, 0.0), and (point-x p) to access the x field of p. That’s it for records in R⁷RS and SRFI-9.

The record type system in R⁶RS Scheme improves on SRFI-9 in several ways. In R⁶RS you would instead write this:

(import (rnrs))

(define-record-type point
  (fields (mutable x)
          (mutable y)))

There is no longer any need to explicitly write out the names of the constructor, predicate, accessors and mutators (unless you want to). Additionally, you can extend a record type, you can customize the constructor, and you can control what happens if define-record-type runs multiple times, i.e. if it makes a new type each time or not.
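As a brief illustration of the implicit names and of extension, here is a sketch in which point3d is a made-up subtype added only for this example:

```scheme
(import (rnrs))

(define-record-type point
  (fields (mutable x)
          (mutable y)))

;; Hypothetical subtype: inherits x and y from point and adds z.
(define-record-type point3d
  (parent point)
  (fields (mutable z)))

(define p (make-point 0.0 0.0))
(define q (make-point3d 1.0 2.0 3.0))   ; parent fields come first

(point-x-set! p 1.5)                    ; mutator name was generated
```

After this, (point-x p) returns 1.5, (point? q) is true because point3d extends point, and (point-y q) returns 2.0.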

A syntactical record layer can be an abstraction on top of a procedural layer. What you can do with syntax, you can also do with procedure calls at runtime. R⁶RS standardizes this layer as well. It also standardizes a record inspection layer that lets you get the record type descriptor (RTD) from a record at runtime (unless it’s marked as opaque) and also to inspect all aspects of RTDs. In fact, RTDs are objects in their own right, just like pairs and symbols.

The above record type definition might expand to this code that uses the procedural layer (a real expansion would use fresh identifiers for the RTD and RCD):

(define point-rtd
  (make-record-type-descriptor
    'point #f #f #f #f
    '#((mutable x) (mutable y))))
(define point-rcd
  (make-record-constructor-descriptor
    point-rtd #f #f))

(define point? (record-predicate point-rtd))
(define make-point (record-constructor point-rcd))
(define point-x (record-accessor point-rtd 0))
(define point-y (record-accessor point-rtd 1))
(define point-x-set! (record-mutator point-rtd 0))
(define point-y-set! (record-mutator point-rtd 1))

This code uses the helpers record-predicate, record-constructor, record-accessor and record-mutator to create procedures. An intermediate record constructor descriptor contains the information needed to make the constructor.
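The inspection layer mentioned earlier can be tried out directly. This is a minimal sketch that uses only standard R⁶RS procedures on the point type from above:

```scheme
(import (rnrs))

(define-record-type point
  (fields (mutable x)
          (mutable y)))

(define p (make-point 1.0 2.0))

;; Fetch the RTD from the record itself (works because point is not opaque).
(define rtd (record-rtd p))

;; Build an accessor at run time from the RTD and a field index.
(define get-y (record-accessor rtd 1))
```

Here (record-type-name rtd) returns the symbol point, (record-type-field-names rtd) returns #(x y), and (get-y p) returns 2.0. The RTD obtained this way is the same object as (record-type-descriptor point).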

Next we will have a look at records in memory, how the code above is optimized, and finally a trick to speed up record type checks.

Record Type Representation

Loko Scheme has a straightforward representation of records. Using the above point type as an example, here are the records returned by (make-point 1.0 2.0) and (make-point 1.5 -2.0) as they are structured in memory. The rows are 64-bit words and the arrows are tagged pointers. Memory allocations are aligned to 16 bytes, and the empty space created by alignment is also shown.

The information stored in the record is a pointer to the record type descriptor, which is reused for each record of the same type, followed by a slot for each field in the record.

The information in the record type descriptor is used by the record inspection procedures and the garbage collector. The slots contain: a type tag for the rtd itself, the size of the records, an optional parent type, an optional record unique identifier, the field names, field mutability (a bit-field), and an optional record writer procedure.

Loko uses the first slot in the RTD to store the length of the RTD and these flags: opaque?, sealed?, generative?.

Other R⁶RS implementations will have similar representations of RTDs because all this information is needed at runtime.

Single Inheritance

R⁶RS supports single inheritance for record types. Instead of demonstrating this with some contrived geometric shapes or balloon animals, let’s use an example from working code. The following record types are simplified variants of records used in Loko’s PCI library.

(define-record-type pcibar
  (fields reg base size))

(define-record-type pcibar-i/o
  (parent pcibar))

(define-record-type pcibar-mem
  (parent pcibar)
  (fields type prefetchable?))

PCI base address registers (BARs) all have these fields: reg, base, and size. If they are in I/O space then that’s all, but BARs in memory space have two additional fields: type and prefetchable?.

Notice that both pcibar-i/o and pcibar-mem point to pcibar as their parent. The size field is larger in pcibar-mem to account for the extra fields. The extra fields in the pcibar-mem record are placed immediately after the fields that belong to pcibar, so accessors for pcibar don’t need to recompute the slot numbers when passed a pcibar-mem.

Predicates for Non-Sealed Types

R⁶RS lets you say that a record type is sealed. This prevents a record type from being extended. As a consequence, type checks are more efficient. Why is that?

The record predicate pcibar? is given an object and returns true if the object has that record type, and false otherwise. If the implementation uses tagged pointers then the predicate first checks the tag. Next, it reads the type field of the object and compares it to the pcibar type.

But if types are not sealed then they can be extended, and it’s possible that the type that the predicate is checking for was used as a parent. The pcibar? predicate should return true even for a pcibar-mem record.
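To make the expected behavior concrete, here is a short sketch using the simplified BAR types from above. The register numbers, addresses and the mem64 symbol are invented values for illustration:

```scheme
(import (rnrs))

(define-record-type pcibar
  (fields reg base size))

(define-record-type pcibar-i/o
  (parent pcibar))

(define-record-type pcibar-mem
  (parent pcibar)
  (fields type prefetchable?))

;; Invented example values; parent fields come first in the constructors.
(define io  (make-pcibar-i/o #x10 #xC000 #x100))
(define mem (make-pcibar-mem #x14 #xFE000000 #x1000 'mem64 #t))
```

Both (pcibar? io) and (pcibar? mem) are true, while (pcibar-i/o? mem) is false. The parent accessor works unchanged on the subtype: (pcibar-base mem) returns #xFE000000.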

The Trick

Previously Loko Scheme’s record-predicate procedure worked as follows. It checked the RTD to see if it’s sealed. For sealed RTDs it returned a procedure that implemented the fast check described above.

For non-sealed RTDs another procedure was returned that did that check, and additionally looped over all parent RTDs to see if any of them was the desired RTD:

(define (record-predicate rtd)
  (if (record-type-sealed? rtd)
      (lambda (obj)     ;fast path
        (and
          ($box? obj)
          (eq? ($box-type obj) rtd)))
      (lambda (obj)     ;slow path
        (and
          ($box? obj)
          (let ((t ($box-type obj)))
            (or
              (eq? t rtd)
              (and
                (record-type-descriptor? t)
                (let lp ((t (record-type-parent t)))
                  (cond
                    ((eq? t rtd) #t)
                    ((not t) #f)
                    (else
                     (lp (record-type-parent t))))))))))))

I haven’t checked around, but I suspect that most R⁶RS implementations do something similar. Even when I checked Chez Scheme’s assembly output I saw a loop that’s morally equivalent to this one. This loop also shows up in accessors and mutators, because they need to know that the object they’ve been passed has the right type.

The trick: this loop can be avoided by extending the RTD representation so that each RTD directly contains pointers to all parent RTDs. The pointers are laid out so that the base type is placed first, followed by the sub-types in order. An RTD will then appear at a fixed location in any RTD that extends it.

I’m sure that there are other language implementations where this problem of sub-typing shows up and someone else has come up with just this optimization, because it’s kind of obvious.

Suppose that we have a base record type and some record types that extend each other. For simplicity, I will not give them any fields.

(define-record-type A)

(define-record-type B (parent A))
(define-record-type C (parent B))

(define-record-type S (parent A))
(define-record-type T (parent S))

The expression (A? (make-A)) evaluates to true, and A? also returns true for records of all the other types shown. But (B? (make-T)) evaluates to false because T does not have B anywhere in its chain of parents. That’s what the loop would be checking.

This picture shows the memory layout when the trick is used on these RTDs.

The pointer to the A RTD is always in the 0: slot for any RTD that extends it. Similarly, the predicate for B knows to always check in the 1: slot. A bounds check on the RTD is also needed in this layout.
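The layout can be modelled in portable Scheme. The following is only a toy sketch and not Loko's actual representation: toy-rtd, make-type and subtype-predicate are made-up names, and the predicate works on type descriptors directly instead of on record instances.

```scheme
(import (rnrs))

;; Toy RTD: a name plus a table of ancestors, base type first,
;; the type itself last.
(define-record-type toy-rtd
  (fields name ancestors))

(define (make-type name parent)
  (let* ((inherited (if parent (toy-rtd-ancestors parent) '#()))
         (depth (vector-length inherited))
         (table (make-vector (+ depth 1) #f)))
    ;; Copy the parent's table, then put the new type in the next slot.
    (do ((i 0 (+ i 1)))
        ((= i depth))
      (vector-set! table i (vector-ref inherited i)))
    (let ((rtd (make-toy-rtd name table)))
      (vector-set! table depth rtd)
      rtd)))

;; The slot where rtd appears is fixed, so no loop over parents is needed.
(define (subtype-predicate rtd)
  (let ((slot (- (vector-length (toy-rtd-ancestors rtd)) 1)))
    (lambda (t)
      (let ((table (toy-rtd-ancestors t)))
        (and (< slot (vector-length table))     ; bounds check
             (eq? (vector-ref table slot) rtd))))))

(define A (make-type 'A #f))
(define B (make-type 'B A))
(define C (make-type 'C B))
(define S (make-type 'S A))
(define T (make-type 'T S))

(define A? (subtype-predicate A))
(define B? (subtype-predicate B))
```

In this model (A? T) and (B? C) are true, while (B? T) is false because slot 1 of T holds S. (B? A) is false thanks to the bounds check.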

Taking it further

Further improvements on this layout are possible. Loko Scheme always allocates memory for four parent RTDs. If an RTD will appear at slots 0 to 3, then the predicate does not need to do bounds checking on the RTD. The parent: slot is not strictly needed and can be removed.

Specific to Loko Scheme, the predicates hidden inside accessors and mutators use slightly less code than the predicate procedures. These hidden predicates do not explicitly verify the tags on the pointers, instead leaving it up to the processor’s built-in alignment checking to trap invalid references.

The type checks are even faster if specialized predicates, accessors, and mutators can be generated at compile time. If the RTD is known at compile time then the slot that contains the RTD is also known and can be inlined.

Sufficiently Smart

A sufficiently smart compiler is a legendary compiler that does your favorite optimizations so that your favorite language feature becomes very efficient.

In A Sufficiently Smart Compiler for Procedural Records, Andy Keep and Kent Dybvig present their work on optimizing the R⁶RS procedural record system in Chez Scheme. It builds on top of the source-level optimizer cp0. Loko Scheme has its own implementation of cp0, so adapting their work has been pretty simple.

The basic idea is to have cp0 generate static or partially static RTDs, which are then propagated throughout the program using cp0’s existing mechanisms. If cp0 succeeds in propagating the RTDs to where record-accessor (etc) are called, then it can also generate code specialized to each record type.

I’ve implemented large parts of the ideas in the paper in Loko Scheme, together with the improved record type representation.

Post-script

Compiling Loko Scheme with Loko Scheme is now almost as fast as compiling it with Chez Scheme, if garbage collection time is not counted (run the compilation with something like LOKO_HEAP=28000 if you have enough RAM). I’m not sure it’s an apples-to-apples comparison though, because when Chez is used, it also has to load and compile Loko’s compiler, whereas Loko cheats by already having it loaded. But still, Loko’s performance is improving. I’m interested in seeing how well the next release will fare in the Scheme Benchmarks.

Akku.scm 1.1.0 released

weinholt — Sat, 06 Feb 2021 01:00:00 +0100

Akku.scm version 1.1.0, a language package manager for R6RS and R7RS Scheme, is now generally available. It can be downloaded from GitLab. This version adds support for Guile 3.0 and Digamma, and includes some bug fixes and new features.

Akku is a language package manager designed for Scheme. In Scheme, libraries can be analyzed to find their names, exports and imports. Akku uses this information, plus knowledge of how the various Scheme implementations work, to automatically install libraries where they will be found. Libraries can come from the current project or from packages downloaded from the Internet.

Akku installs libraries to a per-project library directory that works across all supported Scheme implementations. On top of this there is a traditional package index and a dependency solver. Packages are manually reviewed before they are published in the cryptographically signed index.

To increase the portability of R7RS code, Akku also performs an automatic conversion from the R7RS define-library form to the R6RS library form.

Akku supports Chez Scheme, Chibi Scheme, Digamma, GNU Guile, Gauche Scheme, Ikarus Scheme, IronScheme, Larceny Scheme, Loko Scheme, Mosh Scheme, Racket (plt-r6rs), Sagittarius Scheme, Vicare Scheme and Ypsilon Scheme. It has been tested on Cygwin, FreeBSD, GNU/Linux, MSYS2, OpenBSD and macOS.

Loko Scheme 0.6.0

weinholt — Sat, 29 Aug 2020 02:00:00 +0200

Loko Scheme 0.6.0 is now available from:

The release tarball is signed by the GnuPG key 0xE33E61A2E9B8C3A2.

Loko Scheme 0.6.0 introduces support for R7RS-small. The release tarballs now include a pre-built compiler and all dependencies needed for building Loko. See NEWS.md in the distribution for a more detailed summary of changes.

Loko Scheme is an optimizing Scheme compiler that builds statically linked binaries for bare metal, Linux and NetBSD/amd64. It supports the R6RS Scheme and R7RS Scheme standards.

Loko Scheme’s web site is https://scheme.fail, where you can find the release tarballs and the manual.

Loko Scheme is available under GNU Affero GPL version 3 or later.

Akku Archive Improvements

weinholt — Sat, 20 Jun 2020 02:00:00 +0200

Akku.scm is a language package manager for R6RS and R7RS Scheme. The software that powers the package index has been growing beyond the simple one-liner it was in the beginning and today I’ve finally pushed it to a public repository. I’ve also made preparations for hosting packages as tarballs directly in the archive.

Tarballs

The Akku archive has never hosted packages directly. The index points at git repositories and commit revisions. These are added to each project’s Akku.lock file and are used when akku install clones the repository.

This has two major drawbacks. Cloning a git repository can be really slow. The repositories are also hosted on sites like GitHub where users sometimes decide to force-push or remove the repositories completely. I feel this is likely to happen more often the more politics and business influence GitHub in the future.

I’ve prepared the Akku archive to host tarballs directly. These are made with git archive from the submitted git repository. Downloading these is much faster than cloning a repository, they are not at risk of being removed at a whim, and they are cached in a local shared cache. Other package managers as a rule host their own archives as well, so this is nothing unusual.

Provenance

It’s important to me that users of Akku can trust that they get original software that has not been tampered with. I review all code that goes into the archive to protect Akku against use in supply chain attacks.

Building tarballs changes the equation a little bit since you now need to trust that the tarballs have not been tampered with. Tarballs are verified when they are downloaded, but how do you know that they match the original software?

This can be seen as an issue of provenance, or providing proof of the history of a piece of software. Here is the chain for the new tarballs:

Akku packages are submitted through akku publish by a developer (or by the Snow mirror software) as a .akku file with a detached GPG signature. This signature can be independently verified by fetching the key from the keyservers.

The signed .akku file contains a git commit id. Because it is signed by the person who submitted the package, we can use the signature to verify that it was not tampered with after it went into the archive.

Copies of these files are hosted under /archive/packages.
The archive software creates a tarball from the original repository using the git archive command. It also creates a new .akku file which contains information about the original repository and commit id as a comment. The non-comment part of the file contains the URL and hash of the new tarball. Like other .akku files, it is signed. This provides a signature linking the original git commit to the new tarball’s hash.

These files are available under /archive/pkg. The signature is made with the current Akku archive key, which is in turn signed by my own key (which is in the Debian keyring).
The .akku files for Snow packages and the new tarballs are combined using akku archive-scan and written to Akku-index.scm, which is then XZ-compressed and signed with the archive key. The akku update command verifies the signature when it downloads this file. When Akku creates an Akku.lock file it incorporates the hash from the index, which is verified when akku install runs.

The above should make it possible for any interested party to check the integrity of the archive. It also protects against attackers uploading funky tarballs that don’t match the git repository.

All git repositories and Snow packages are mirrored in the archive under /archive/mirror. This mirror is not used in the index and is mostly provided for backup purposes.

Beta testers

The new index with tarballs is not live yet; it needs some testing.

Anyone who wants to do so can try it and report successes or failures in the comments section below or in GitLab issues. Here is how to update to the new archive manually:

curl https://archive.akkuscm.org/beta/Akku-index.scm \
  > ~/.local/share/akku/index.db

There is a GPG signature (.sig) in the same directory in case you want to verify that it was not tampered with.

Run akku lock in your existing project to get a lockfile that uses the new index. Then run akku install to download your packages as usual.

If all goes well then some time soon the switch to the new index will happen and akku update will use the new style index. You will still be able to revert to the old index by downloading Akku-origin.scm manually from the archive site and then using that as your index. This file will keep being maintained because that is where the Akku website generator finds pointers to upstream Git repositories.

Quasiquote - Literal Magic

weinholt — Fri, 15 May 2020 02:00:00 +0200

While I was writing a manpage for Scheme’s quasiquote, something I saw surprised me and changed my understanding of quasiquote. It turns out that a new language, with semantics that are interesting to PLT enthusiasts, hides behind the innocent backtick character. Starting with R6RS Scheme, quasiquote became total magic.

Background

It is not going to be easy to understand the argument in this article if you lack some background knowledge, so here’s a brief explanation of quasiquote, Scheme’s concept of locations, Scheme’s handling of literal constants, and finally referential transparency.

Briefly on quasiquote

Quasiquote is a language feature in Scheme that lets you write a template for a structure of lists and vectors. These templates are more like web templates than C++ templates; don’t let the terminology confuse you.

Basically you write a backtick character to start a template. The code immediately following the backtick is the template. You write a comma wherever you want to fill in some variable or other expression. (There’s also a list-splicing version of the comma which often comes in handy).

The expression `(b "Hello " ,x) builds a list with these elements: the symbol b, the string "Hello " and lastly whatever value the variable x happened to have. So perhaps (b "Hello " "John").
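Here is the same template as a runnable sketch, together with the list-splicing form. The names list and the ul/li structure are invented for this example:

```scheme
(import (rnrs))

(define x "John")

;; Plain unquote: the value of x is filled in where the comma is.
(define greeting `(b "Hello " ,x))

;; Unquote-splicing (,@) splices the elements of a list into the template.
(define names '("Ada" "Grace"))
(define items `(ul ,@(map (lambda (n) `(li ,n)) names)))
```

Here greeting is (b "Hello " "John") and items is (ul (li "Ada") (li "Grace")).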

Quasiquote is very useful to build SXML and SHTML. Forget learning a new templating system every week; this one’s a keeper. But this being Scheme, the most popular use is likely to write S-expressions that represent code in some language. It’s used for just that in the nanopass framework.

Location, location, location!

Making a language is a difficult job. Everything should ideally work smoothly together as a coherent whole, like pineapples on a pizza. Objects in Scheme programs implicitly refer to locations. There are many details around that which affect the whole language, and they have interesting consequences for quasiquote.

What’s a location? It’s just a place where you can store a value. The vector #(1 2 3) has three locations where values are stored, and currently it’s the numbers 1, 2 and 3. If you make a vector using make-vector then the vector is mutable and you can change the values. Later when you see the same vector again it will still contain the new values.
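A tiny sketch of what this means in practice. The variable names are arbitrary:

```scheme
(import (rnrs))

(define v (make-vector 3 0))   ; three fresh, mutable locations
(define w v)                   ; the same vector, so the same locations

(vector-set! w 1 'changed)     ; store a new value through w
```

Because v and w refer to the same locations, (vector-ref v 1) now returns the symbol changed.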

In practice a location is some address in memory, but the garbage collector might move it around, so its address changes, but it is still the same location.

Other objects in Scheme do not have locations. The number 1 does not have any locations that hold values that you can change. This is only because of the wisdom, kindheartedness and foresight of the language designers, because it is possible to design things differently.

As a consequence of numbers not having locations, there is also very little point in worrying about which number object you have. Suppose that numbers did have locations and you could store important information in them. You’d be very concerned that the 1 number object where you stored all your passwords is the same 1 that you now have in your secrets variable. (Nobody will ever think to look inside the number 1 for your passwords, so your secrets are safe there.) But number objects do not actually have locations, so it doesn’t matter if the Scheme implementation fiddles with them behind your back and gives you a different 1 object the next time you’re looking.

Literal constants

Pairs and vectors have locations, but the rules for their locations are much relaxed if they are literal constants in the program code.

Constants in Scheme are allowed to have read-only locations. If you compile a program with Loko Scheme, you will notice that you get an assertion if you try to change any of the constants. From R6RS:

It is desirable for constants (i.e. the values of literal expressions) to reside in read-only memory. To express this, it is convenient to imagine that every object that refers to locations is associated with a flag telling whether that object is mutable or immutable. Literal constants, the strings returned by symbol->string, records with no mutable fields, and other values explicitly designated as immutable are immutable objects, while all objects created by the other procedures listed in this report are mutable. An attempt to store a new value into a location referred to by an immutable object should raise an exception with condition type &assertion.

Literal constants can also share locations. If the same constant appears in different places in the program then the compiler is allowed to create a single shared instance of the constant, here explained as it applies to structures, in the section on eqv?:

Implementations may share structure between constants where appropriate. Thus the value of eqv? on constants is sometimes implementation-dependent.

The illustrative examples are:

(eqv? '(a) '(a))         ;⇒ unspecified
(eqv? "a" "a")           ;⇒ unspecified
(eqv? '(b) (cdr '(a b))) ;⇒ unspecified
(let ((x '(a)))
  (eqv? x x))            ;⇒ #t

So when it comes to literal constants, Scheme’s normal storage rules do not apply. A program might find that two different locations have become the same location, so that changing the value in a quoted vector ends up changing the value in another quoted vector. It’s also very likely that the program gets an exception when it tries to change the value in such a location. The last example shows that going the other way is not allowed: the compiler is not allowed to create two different versions of the list (a) in that program.

Referential transparency

Referential transparency is a concept that is important in purely functional programming languages. An expression that is referentially transparent can be replaced by the values it returns.

Some expressions in Scheme are referentially transparent: constants, references to variables that are never mutated, arithmetic in general, type predicates, etc. A Scheme compiler is allowed to replace (+ 1 2) with 3. It doesn’t matter that the program might actually have returned a “different” 3 each time the expression runs. In the same way it doesn’t matter if the compiler turns two different constants into the same constant.

Most parts of Scheme are not referentially transparent. As an example, a Scheme compiler cannot replace (vector 1 2 3) with '#(1 2 3). The locations created by the vector procedure need to be fresh and mutable. But it can replace (vector-ref '#(1 2 3) 0) with 1, so this expression is referentially transparent. And as we previously saw, it can replace (cdr '(a b)) with '(b).
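The difference is observable from within a program. In this small sketch, fresh-vector is a made-up helper name:

```scheme
(import (rnrs))

(define (fresh-vector)
  (vector 1 2 3))              ; must allocate new locations on every call

(define a (fresh-vector))
(define b (fresh-vector))

(vector-set! a 0 'changed)     ; only affects the locations of a
```

Afterwards (eq? a b) is false and (vector-ref b 0) is still 1. Replacing the body of fresh-vector with '#(1 2 3) would break both properties, which is why the compiler may not do it.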

All of this should be fairly widely known, but now comes the interesting part.

There is a crack in everything

So far I have explained Scheme’s notion of locations, referential transparency and that the rules are different for literal constants.

Behold this hidden gem in R6RS:

A quasiquote expression may return either fresh, mutable objects or literal structure for any structure that is constructed at run time during the evaluation of the expression. Portions that do not need to be rebuilt are always literal. Thus,
(let ((a 3)) `((1 2) ,a ,4 ,'five 6))
may be equivalent to either of the following expressions:
'((1 2) 3 4 five 6)

(let ((a 3))
   (cons '(1 2)
         (cons a (cons 4 (cons 'five '(6))))))
However, it is not equivalent to this expression:
(let ((a 3)) (list (list 1 2) a 4 'five 6))

This part of R6RS originally came from Kent Dybvig’s formal comment #204. The same type of language was adopted in R7RS.

The meaning is that a quasiquoted expression can be turned into a literal, or parts may be turned into literals. Where there was code in the quasiquote expression, there can now be a literal. Going the other direction is not allowed: literals cannot be turned into code that returns fresh, mutable structure. But as the example '((1 2) 3 4 five 6) shows, a compiler is allowed to even propagate constants into quasiquote.
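One practical consequence for portable code is to treat whatever quasiquote returns as possibly immutable, and to build the structure with list or cons when it is going to be mutated. A minimal sketch, where make-row and make-mutable-row are hypothetical names:

```scheme
(import (rnrs)
        (rnrs mutable-pairs))

;; May return literal structure, so only read from the result.
(define (make-row a)
  `((1 2) ,a ,4 ,'five 6))

;; Always returns fresh, mutable pairs.
(define (make-mutable-row a)
  (list (list 1 2) a 4 'five 6))

(define row (make-row 3))
(define m (make-mutable-row 3))

(set-car! (car m) 'one)        ; safe: m is known to be fresh
```

Both results start out equal? to ((1 2) 3 4 five 6). Only m is guaranteed to accept the set-car!, after which it is ((one 2) 3 4 five 6).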

There is a very deep rabbit hole here! Have a look again: return […] literal structure for any structure that is constructed at run time during the evaluation of the expression.

There is now a way to construct literals from run time code, but to do so ahead of run time.

Literal magic

Let me demonstrate the power of Scheme’s magic quasiquote. Let --> mean “equivalent to”. It can be the result of an expansion or another compiler pass, such as a partial evaluator. Here is the original, innocuous-looking example:

(let ((a 3)) `((1 2) ,a ,4 ,'five 6))
; -->
'((1 2) 3 4 five 6)

Can we get literal structure copied into the constant part? Easy:

(let ((a '(3))) `((1 2) ,a ,4 ,'five 6))
; -->
'((1 2) (3) 4 five 6)

But we’re just getting started. Can we construct a structure at runtime and have that appear as a constant? Of course!

`((1 2) ,(list 3) ,4 ,'five 6)
;; -->
'((1 2) (3) 4 five 6)

Those Schemers who are paying attention will be thinking I’ve gone mad now. Maybe I have, but this example simply returned literal structure for the (list) structure that was constructed at run time during the evaluation of the expression, to paraphrase the quasiquote specification. Let’s increase the volume:

`((1 2) ,(map + '(1 1) '(2 3)) ,'five 6)
;; -->
'((1 2) (3 4) five 6)

Hang on, shouldn’t map return a fresh, mutable list? Not anymore, this is quasiquote. The map function constructs a list at run time during the evaluation of the quasiquote expression, so the structure no longer needs to be fresh. (Besides, the R6RS and R7RS definitions of map do not actually say that the list needs to be fresh and mutable, but everyone probably assumes it does.)

(letrec ((iota
          (lambda (n)
            (let lp ((m 0) (n n))
              (if (>= m n)
                  '()
                  (cons m (lp (+ m 1) n)))))))
  `((1 2) ,(iota 4) ,'five 6))
;; -->
'((1 2) (0 1 2 3) five 6)

Is this valid? I think most would say it isn’t; the list created by iota is constructed with cons, which returns fresh, mutable pairs. But this is happening inside quasiquote where the normal rules of society break down. The iota procedure constructs a list structure at runtime during the evaluation of a quasiquote expression, so a compiler is allowed to return literal structure for that list structure.

These go to 11

Let’s crank it up and make quasiquote transform code.

Compilers like Chez Scheme, Loko Scheme, Guile Scheme and many others use partial evaluators to perform source-level optimizations. A partial evaluator runs code symbolically. The output is a new program that is hopefully more efficient than the program that went into it.

The partial evaluators used by Scheme compilers are pretty mild as far as partial evaluators go, mostly because of the semantics of Scheme. Doing more powerful transformations on Scheme programs would require quite powerful static analysis, and that is both slow and difficult.

To get quasiquote to work with code, we need something that enables a partial evaluator to run a given procedure in such a way that it’s always inside a quasiquote expression. If we have such a procedure then the partial evaluator can start using the tricks described above, and treat all code that constructs new structures as if their structures were literals. This makes the partial evaluator very happy, so here is happy:

(define (happy proc)
  (lambda x
    `,(apply proc x)))

In a normal Scheme implementation this operator doesn’t do much more than maybe waste a little time and space. But in a Scheme that knows the magic nature of quasiquote, it would enable powerful program transformations on lists and vectors, without the need for as much analysis as it would normally require. In particular, it should no longer be necessary to analyze if intermediate results are mutated, nor to analyze if programs check results for pointer equality with eq?.

Here is an illustrative example of a potential program transformation, based on Philip Wadler’s 1990 paper Deforestation: Transforming programs to eliminate trees:

(define (upto m n)
  (if (> m n)
      '()
      (cons m (upto (+ m 1) n))))

(define (square x)
  (* x x))

(define (sum xs)
  (let sum* ((a 0) (xs xs))
    (match xs
      [() a]
      [(x . xs) (sum* (+ a x) xs)])))

(define (squares xs)
  (match xs
    [() '()]
    [(x . xs) (cons (square x) (squares xs))]))

(define f
  (happy (lambda (n) (sum (squares (upto 1 n))))))
;; -->
(define f
  (lambda (n)
    (letrec ((h0 (lambda (u0 u1 n)
                   (if (> u1 n)
                       u0
                       (h0 (+ u0 (square u1)) (+ u1 1) n)))))
      (h0 0 1 n))))

After deforestation (or fusion), the intermediate lists used in f have been eliminated. This is beneficial in that you can write high-level code, but still have the compiler produce the efficient loop you would have had to write by hand. Scheme compilers normally don’t do these transformations due to the required analysis.
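To make the effect concrete outside Scheme, here is a rough Python sketch of the same program before and after fusion. The names f_naive and f_fused are made up for this illustration, and the fused loop is written by hand rather than produced by any compiler; it only mirrors the shape of the h0 loop above.

```python
def upto(m, n):
    """Build the list m, m+1, ..., n (the first intermediate list)."""
    return list(range(m, n + 1))

def square(x):
    return x * x

def squares(xs):
    """Build a second intermediate list holding the squares."""
    return [square(x) for x in xs]

def f_naive(n):
    # Allocates two lists that are thrown away immediately.
    return sum(squares(upto(1, n)))

def f_fused(n):
    # Hand-written equivalent of the h0 loop: u0 accumulates, u1 counts.
    u0, u1 = 0, 1
    while not u1 > n:
        u0, u1 = u0 + square(u1), u1 + 1
    return u0

print(f_naive(10), f_fused(10))
```

Both functions return the same number for every n, but only the first one allocates lists along the way.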

The happy operator does not completely open the barn doors: the transformation still needs to not change other program side effects, such as I/O and exceptions.

The fly in the ointment

Imagine a program written according to this template:

(define (main)
  ...)

(define main* (happy main))

(main*)

What are the consequences for the main program? It would seem that everything in it follows the rules of quasiquote and it can’t use cons in the normal way. This is bad news for the main program.

Conclusions

I don’t know where this leads. What is the precise limit for what a compiler can and can’t turn into literal structure in quasiquote? The example with the main program makes it seem that quasiquote actually gives the compiler a bit too much freedom.

Perhaps it’s actually just poor wording, so R6RS and R7RS will get a new erratum that clarifies what is and what isn’t allowed. I suspect that this is the most likely outcome.

But it doesn’t stop someone who is working on a partial evaluator or another program transformation from proposing a happy operator as a SRFI, giving it semantics that enable even more powerful transformations, but without the need to rely on language lawyering.

There is one conclusion I can draw from all this: don’t assume that what comes out of quasiquote can be mutated.

Loko Scheme 0.4.3

weinholt — Mon, 02 Mar 2020 01:00:00 +0100

Loko Scheme 0.4.3 is now out with a few important fixes, new features and network card drivers for eepro100, rtl8139, virtio net and Linux tuntap devices. The include form is now available and #u8() is recognized. Hashtables are written using Racket’s #hasheq() syntax, and cycles are handled while printing records.

Read more about the drivers in the companion article Device Drivers in Loko Scheme.

A New R6RS Scheme Compiler

weinholt — Wed, 02 Oct 2019 02:00:00 +0200

Some readers already know this and a few have suspected. I’ve been working on a new R6RS Scheme compiler for a while. Now I have released it as free software. Read on to learn the many wonderful drawbacks of this niche compiler.

I will start with what many will find to be the largest drawback, so that those of you who don’t want it can close this tab right away and never look back (but don’t close the tab!). The compiler is licensed under the GNU Affero General Public License, version 3 or later. I chose this license, not only to promote chaos and disorder, but also because of where I see technology and society heading.

With that out of the way, here are some questionable facts about Loko Scheme:

You can download it from the Loko Scheme web site.
It is written in R6RS Scheme and a wafer thin amount of assembly. Once it has been bootstrapped it can self-compile. There is no C code in the compiler or the runtime.
It generates code for the AMD64 instruction set.
It has a few optimization passes, such as Fixing Letrec (reloaded), the inliner cp0 (also used in Chez Scheme and Ikarus Scheme) and a low-level instruction level optimizer. Some code runs really fast, but most code runs just okay.
It outputs statically compiled binaries only, although there is also an interpreter.
The binaries are simultaneously Linux ELF binaries and multiboot binaries for bare hardware.
The Linux port of Loko starts in just 3 ms on my machine.
It has concurrency based on Concurrent ML with an API surface mostly nicked from fibers for Guile. I/O is non-blocking by default. If you’re familiar with Golang then this part will feel familiar.
Most SRFIs from chez-srfi are supported and there’s an early POSIX library for the Linux port based on the current SRFI 170 draft.

Why Loko

Why not? If you can live with the license (which really isn’t as bad as you might think), then Loko is definitely for you. If one looks at how the GPL has worked out for Linux, I think it will be okay. Linux’s license doesn’t extend to user space and I want that aspect to work the same for Loko.

My original use case for Loko Scheme is experimental operating system development. Forget about all legacy software and build your own kernel, with blackjack and hookers, so to speak. Due to the nature of that kind of work, I think it will necessarily be useful for more things.

Suppose that you can compile a Scheme program on your machine and have a guarantee that it will work on all other Linux AMD64 systems. No confusion with glibc vs musl vs whatever. You also have access to non-blocking I/O, a concurrency library and direct syscalls. What can you build with that?

There’s a directory in the Loko repository with the following samples, which might give you an idea of where Loko is today:

bga-graphics - this is a simple program that uses the linear framebuffer of the Bochs graphics adapter (available in QEMU), reads a 3D model and renders it on the screen.
etherdump - simple driver for an RTL8139 networking chip combined with a text-mode based Ethernet frame dumper.
hello - it’s just Hello World as a library, that runs on Linux or on bare hardware (printing to the serial port)
lspci - scans the devices on the PCI bus, prints the register locations, IRQs and option ROM sizes, and uses the PCI ID database to print the name of the devices and the vendors
web-server - this Linux program sets up a concurrent web server that responds with a static payload. Pretty simplistic code, but not too far away from a working web server. Handles maybe 15k requests per second.

Future direction

In the future I think there will be ports to more kernels, probably NetBSD and FreeBSD, and more instruction sets. I’d like to try to port it to AArch64 myself unless someone gets there before me.

But before that, I will work on more operating system stuff. There’s an experimental USB stack. I’ve got some code that reads a file from a FAT file system on a USB stick, but it’s very slow in its current manifestation. I just recently added a buddy allocator and fibers and just haven’t had time to write more drivers.

Kernel programming with Loko is not as difficult as regular kernel programming. The concurrency model of kernel code is the same as the one for user programs. There is no need to care about writing special code safe for interrupt contexts. If you’re a Scheme programmer then you can probably already do kernel programming with Loko; you just don’t know it yet.

I also want to have user space support in Loko. This would mean you could have a kernel in Scheme that can run regular Unix-like programs. (It’s not a pipe dream either, it’s actually pretty straightforward). This would let you design your own syscall interface for programs. Loko doesn’t really care about what those syscalls do and doesn’t have any opinions about file systems, networking and drivers. If you preferred how file systems worked in TOPS-20, ITS or VMS, then you could make it work that way.

Do not cling to the past. Download Loko and start experimenting today!

Announcing Akku.scm 1.0.0

weinholt — Fri, 26 Jul 2019 02:00:00 +0200

I am happy to announce the general availability of Akku.scm 1.0.0, a language package manager for R6RS and R7RS Scheme. It can be downloaded from GitLab and GitHub.

Akku is a package manager with features specially designed for Scheme. The library systems of R6RS and R7RS, where libraries are fully self describing, make it possible to automatically analyze source code to find libraries and imports.

To increase the portability of R7RS code, Akku also performs an automatic conversion from the R7RS define-library form to the R6RS library form.

Akku supports Chez Scheme, Chibi Scheme, GNU Guile, Gauche Scheme, Ikarus Scheme, IronScheme, Larceny Scheme, Loko Scheme, Mosh Scheme, Racket (plt-r6rs), Sagittarius Scheme, Vicare Scheme and Ypsilon Scheme. It has been tested on Cygwin, FreeBSD, GNU/Linux, MSYS2, OpenBSD and macOS.

Akku has been in development for 21 months.

Terminfo and its DSL

weinholt — Tue, 05 Feb 2019 01:00:00 +0100

Programs for Linux that run in the terminal often use color. There are a few approaches to making this work. Many programs use hardcoded ANSI compatible escape sequences, which are widespread enough today that they work almost everywhere. There are drawbacks to hardcoding these and for that reason there’s a database called terminfo, which has its own stack-based Domain Specific Language (DSL).

The terminfo database has entries for most terminals that were ever made, even very obscure brands. My machine has 1743 entries in /lib/terminfo and /usr/share/terminfo. Terminfo provides a standardized interface towards all these terminals, including future terminals.

The terminfo database is also used by ncurses, a library for making programs like the one shown above. These programs will work more or less the same on all terminals that support cursor movement. If there is no color support then they are monochrome, but still they work. Terminfo has a standard set of booleans and numbers that tell you how many colors the terminal supports, how many rows and columns it has (by default), whether it supports a mouse and also which quirks it has.

$ echo "$(tput bold; tput setaf 3;
tput setab 4)Yellow on blue$(tput sgr0)"

The tput tool uses the terminfo library to generate escape sequences for the current terminal, which it finds through the TERM environment variable.

A dumb terminfo entry

The infocmp tool can inspect terminfo entries:

$ TERM=dumb infocmp -x
dumb|80-column dumb tty,
    am,
    cols#80,
    bel=^G, cr=\r, cud1=\n, ind=\n,

All terminfo entries consist of named fields (called capabilities) that are either booleans, numbers or strings. The names are short and cryptic, although there is a longer version available as well. The terminfo(5) manpage has a short description of all fields, e.g. cud1 means cursor_down and contains the bytes that move the cursor down one line. Unsurprisingly, it’s a newline character. A pre-requisite is that the terminal is in raw mode, so the kernel does not translate newline to newline + carriage return as it would normally do.

There is also support for extended capabilities, which are not in the pre-defined list. The difference is basically that the binary terminfo format then has to explicitly encode the name of the field, whereas otherwise it is implicit by its location in the file. Terminfo can use these to encode new and interesting capabilities, like support for 24-bit color.

An advanced entry

Let’s go from dumb to advanced. The terminal emulator xterm has a variant with support for 24-bit color. It’s big and cryptic!

$ TERM=vt100 infocmp -x
#    Reconstructed via infocmp from file: /lib/terminfo/v/vt100
vt100|vt100-am|dec vt100 (w/advanced video),
    OTbs, am, mc5i, msgr, xenl, xon,
    cols#80, it#8, lines#24, vt#3,
    acsc=``aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~,
    bel=^G, blink=\E[5m$<2>, bold=\E[1m$<2>,
    clear=\E[H\E[J$<50>, cr=\r, csr=\E[%i%p1%d;%p2%dr,
    cub=\E[%p1%dD, cub1=^H, cud=\E[%p1%dB, cud1=\n,
    cuf=\E[%p1%dC, cuf1=\E[C$<2>,
    cup=\E[%i%p1%d;%p2%dH$<5>, cuu=\E[%p1%dA,
    cuu1=\E[A$<2>, ed=\E[J$<50>, el=\E[K$<3>, el1=\E[1K$<3>,
    enacs=\E(B\E)0, home=\E[H, ht=^I, hts=\EH, ind=\n, ka1=\EOq,
    ka3=\EOs, kb2=\EOr, kbs=^H, kc1=\EOp, kc3=\EOn, kcub1=\EOD,
    kcud1=\EOB, kcuf1=\EOC, kcuu1=\EOA, kent=\EOM, kf0=\EOy,
    kf1=\EOP, kf10=\EOx, kf2=\EOQ, kf3=\EOR, kf4=\EOS, kf5=\EOt,
    kf6=\EOu, kf7=\EOv, kf8=\EOl, kf9=\EOw, lf1=pf1, lf2=pf2,
    lf3=pf3, lf4=pf4, mc0=\E[0i, mc4=\E[4i, mc5=\E[5i, rc=\E8,
    rev=\E[7m$<2>, ri=\EM$<5>, rmacs=^O, rmam=\E[?7l,
    rmkx=\E[?1l\E>, rmso=\E[m$<2>, rmul=\E[m$<2>,
    rs2=\E<\E>\E[?3;4;5l\E[?7;8h\E[r, sc=\E7,
    sgr=\E[0%?%p1%p6%|%t;1%;%?%p2%t;4%;%?%p1%p3%|%t;7%;%?%p4%t;5%;m%?%p9%t\016%e\017%;$<2>,
    sgr0=\E[m\017$<2>, smacs=^N, smam=\E[?7h, smkx=\E[?1h\E=,
    smso=\E[7m$<2>, smul=\E[4m$<2>, tbc=\E[3g,

Many interesting things are going on here. All the string capabilities that start with k, plus a few more, tell you how the terminal encodes keys. The kf1 entry says that the F1 key sends \EOP (i.e. ESC O P). A quirky thing with this terminal is that it requires delays after some commands. Those are encoded as e.g. $<5>, which is a 5 millisecond delay. The delays are generated by sending a number of NUL bytes that will generate the appropriate delay given the terminal’s current baud rate (although today they are often omitted).
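As a back-of-the-envelope sketch of how a delay turns into padding, the following Python function computes how many pad bytes fill a given delay. It assumes 10 bits on the wire per character (8 data bits plus start and stop bits), which is my assumption and not something taken from ncurses; pad_bytes is a made-up name.

```python
def pad_bytes(delay_ms, baud_rate, bits_per_char=10):
    """Number of pad characters that take at least delay_ms to transmit.

    bits_per_char=10 assumes 8N1 serial framing (an assumption).
    """
    # Bits sent during the delay = delay_ms / 1000 * baud_rate.
    # Divide by bits per character and round up to whole characters.
    denom = 1000 * bits_per_char
    return (delay_ms * baud_rate + denom - 1) // denom

print(pad_bytes(5, 9600))    # the $<5> after cup at 9600 baud
print(pad_bytes(50, 9600))   # the $<50> after clear at 9600 baud
```

At 9600 baud the 5 ms delay costs five bytes and clearing the screen costs 48, which shows why padding is something you want to avoid on a fast link.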

Anatomy of a string

All non-dumb terminals can be told to move the cursor using the cup (cursor_address) capability. When tput is called as tput cup 5 10 it takes two arguments: the row and the column. In this vt100 entry this string is fairly simple:

cup=\E[%i%p1%d;%p2%dH$<5>

It’s ASCII soup. This is a program written in a DSL. Actually, all the strings that generate output are written in this DSL, even if they just output the same bytes every time. I have written a parser and compiler in R6RS Scheme for this DSL, which you can find in the text-mode package. Here is the result of tokenizing the string:

> (import (text-mode terminfo parser))
> (parse-term-string (string->utf8 "\x1b;[%i%p1%d;%p2%dH$<5>"))
((print #vu8(27 91))      ;prints ESC [
 (inc-p1/p2)              ;increment parameters 1 and 2
 (parameter 1)            ;push parameter 1 to the stack
 (printf "%d" #f #f #f #\space #f #f #\d)  ;pop and printf as %d
 (print #vu8(59))         ;print ;
 (parameter 2)            ;push parameter 2 to the stack
 (printf "%d" #f #f #f #\space #f #f #\d)  ;pop and printf as %d
 (print #vu8(72))         ;print H
 (msleep #vu8(53) #f #f)) ;sleep for 5 ms

Parameters passed to the program itself are addressed by their index (1-9), whereas operations in the program implicitly pop from or push to an argument stack. Some operations pop their arguments, e.g. the %d operation that pops a number and formats it as the printf function in the C library would. Other operations push to the stack, e.g. %p1 that pushes the first parameter.

The %i operation is an interesting special operation that increments the first two parameters. This is useful because terminfo (and termcap, its historical predecessor) uses zero-based indexing for rows and columns, whereas VT100/ANSI terminals use one-based indexing.
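The stack discipline is easy to see in a toy interpreter. The Python sketch below handles only the operations used by the cup string (literal characters, %i, %p1 to %p9, %d and %%) and simply skips $<...> delays; run_terminfo is a hypothetical name and this is in no way how ncurses or the text-mode package is implemented.

```python
def run_terminfo(program, *params):
    """Interpret a tiny subset of terminfo: literals, %i, %pN, %d, %%."""
    params = list(params) + [0] * (9 - len(params))
    stack = []
    out = []
    i = 0
    while i < len(program):
        ch = program[i]
        if ch == "$" and program[i + 1:i + 2] == "<":
            # Skip a padding delay such as $<5>; this sketch never sleeps.
            i = program.index(">", i) + 1
        elif ch != "%":
            out.append(ch)          # literal character, printed as-is
            i += 1
        else:
            op = program[i + 1]
            if op == "i":           # increment parameters 1 and 2
                params[0] += 1
                params[1] += 1
                i += 2
            elif op == "p":         # push parameter N
                stack.append(params[int(program[i + 2]) - 1])
                i += 3
            elif op == "d":         # pop and print as decimal
                out.append(str(stack.pop()))
                i += 2
            elif op == "%":         # literal percent sign
                out.append("%")
                i += 2
            else:
                raise ValueError("unsupported operation: %" + op)
    return "".join(out)

# Equivalent of "tput cup 5 10" with the vt100 cup string.
print(repr(run_terminfo("\x1b[%i%p1%d;%p2%dH$<5>", 5, 10)))
```

The zero-based row 5 and column 10 come out as the one-based 6 and 11 that the terminal expects.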

The terminfo language

The terminfo language is a simple stack-based DSL that has parameters, if-then-else, built-in printf (with padding and decimal, octal and hex output), persistent variables and basic math/logic operators.

It is a dynamically typed language that supports integers of some unspecified type (e.g. signed 32-bit) and NUL terminated strings. There are two operations on strings: %l (pop and push the length of a string) and %s (pop and print a string). Terminfo generally has no clue if what is on the argument stack is a valid string or not, so these operations can very easily be made to cause segfaults in C implementations of terminfo.

The persistent variables are interesting. There are two sets of them: the static and the dynamic. Historically there was likely some difference between these, but today they appear to be the same. They are commonly used as temporary variables inside programs, as a way to get around the need to manage the argument stack properly. But they can also be used to store information that persists between calls to the terminfo library (when used through tparm/tiparm rather than tput).

So basically terminfo comes equipped with a language that can scan arbitrary addresses for NUL bytes (strlen), copy memory from arbitrary addresses in your program to the terminal, persist information between library calls, and perform basic logic and arithmetic. It lacks loops, which means it is not really Turing complete as such, but that can be worked around by relying on multiple calls to the program and by using the persistent variables to drive control flow through if-then-else constructs. Let’s hope your terminfo entry comes from a trustworthy source.

Compiling terminfo programs

The ncurses implementation of terminfo uses a simple combined lexer-and-interpreter to run the programs. One of my hobbies is compilers, so I decided to do things differently in the text-mode package. I decided to compile the programs.

Let’s have a look at the setab string from the xterm-direct entry:

> (import (text-mode terminfo parser))
> (define setab "\x1b;[%?%p1%{8}%<%t4%p1%d%e48\\:2\\:\\:%p1%{65536}%/%d\\:%p1%{256}%/%{255}%&%d\\:%p1%{255}%&%d%;m")
; Tokenizer output
> (parse-term-string (string->utf8 setab))
((print #vu8(27 91))
 (if)
 (parameter 1)
 (push 8)
 (<)
 (then)
 (print #vu8(52))
 (parameter 1)
 (printf "%d" #f #f #f #\space #f #f #\d)
 (else)
 (print #vu8(52 56 92 58 50 92 58 92 58))
 (parameter 1)
 (push 65536)
 (quotient)
 (printf "%d" #f #f #f #\space #f #f #\d)
 (print #vu8(92 58))
 (parameter 1)
 (push 256)
 (quotient)
 (push 255)
 (bitwise-and)
 (printf "%d" #f #f #f #\space #f #f #\d)
 (print #vu8(92 58))
 (parameter 1)
 (push 255)
 (bitwise-and)
 (printf "%d" #f #f #f #\space #f #f #\d)
 (endif)
 (print #vu8(109)))
; Compiler output
> (expand/optimize (terminfo-expression setab))
(lambda (p dvar svar baudrate lines p1 p2 p3 p4 p5 p6 p7 p8 p9)
  (put-bytevector p #vu8(27 91))
  (if (< p1 8)
      (begin
        (put-u8 p 52)
        (ti-printf p "%d" p1 #f #f #f #\space #f #f #\d))
      (begin
        (put-bytevector p #vu8(52 56 92 58 50 92 58 92 58))
        (let ([v (quotient p1 65536)])
          (ti-printf p "%d" v #f #f #f #\space #f #f #\d)
          (put-bytevector p #vu8(92 58))
          (let ([v (bitwise-and 255 (quotient p1 256))])
            (ti-printf p "%d" v #f #f #f #\space #f #f #\d)
            (put-bytevector p #vu8(92 58))
            (let ([v (bitwise-and 255 p1)])
              (ti-printf p "%d" v #f #f #f #\space #f #f #\d))))))
  (put-u8 p 109)
  (void))

The output from the compiler is ready to be consumed by eval, which in Chez Scheme will generate fast machine code for this program.
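For readers who don’t speak Scheme, here is roughly the same function written by hand in Python. It returns a string instead of writing to a port, and it assumes that \: in the terminfo source stands for a plain colon, so the backslashes are left out; setab here is just an illustrative function, not part of any library.

```python
def setab(p1):
    """Background color escape sequence, after the xterm-direct setab string."""
    if p1 < 8:
        # One of the eight classic colors: ESC [ 4 <n> m
        return "\x1b[4%dm" % p1
    # Otherwise p1 is a packed 24-bit RGB value.
    red = p1 // 65536
    green = (p1 // 256) & 255
    blue = p1 & 255
    return "\x1b[48:2::%d:%d:%dm" % (red, green, blue)

print(repr(setab(3)))
print(repr(setab(0xFF8000)))
```

Note how the stack has disappeared completely, just as in the compiler output: every %p1 has become a direct reference to the parameter.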

The process is a whole lot more complex than it looks. The compiler has to translate stack-based code to direct code, which means that it has to keep track of the argument stack and translate it to procedure arguments and return values. It has to implement if-then-else and handle arguments going into the branches and possibly going out of the branches. On top of that the first two parameters can be mutated and the resulting code should still be efficient.

To make things easier, I relied on cp0 (compiler pass 0). This is a partial evaluator described in Oscar Waddell’s Ph.D. dissertation and is built into Chez Scheme. It basically lets my compiler write pretty terrible code, but terrible in a particularly good way that cp0 likes. The call to expand/optimize above shows the output after cp0 has done its job.

The first thing the compiler does is to statically analyze the operations in the program to find the required stack size. Taking the %+ operation as an example, it first pops two values and then pushes a value. This means that: the stack has to have room for at least two values, there are two values going in to the operation, and there is one value going out. The concatenative nature of the language makes the analysis simple to perform for any sequence of operations. This analysis is carried out for the whole program, but also for branches in the program.
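The analysis can be sketched in a few lines of Python. The operation names and the EFFECTS table below are made up for this example and do not match the actual compiler; each entry says how many values an operation pops and how many it pushes.

```python
# (pops, pushes) for a few operations; the names are hypothetical.
EFFECTS = {
    "param": (0, 1),    # %p1 .. %p9
    "push": (0, 1),     # %{n}
    "+": (2, 1),        # %+
    "printf": (1, 0),   # %d and friends
}

def analyze(ops):
    """Return (values going in, values going out, stack slots needed)."""
    depth = lowest = highest = 0
    for op in ops:
        pops, pushes = EFFECTS[op]
        depth -= pops
        lowest = min(lowest, depth)     # how far below the start we reached
        depth += pushes
        highest = max(highest, depth)   # how far above the start we reached
    return -lowest, depth - lowest, highest - lowest

print(analyze(["+"]))                 # a lone %+
print(analyze(["param", "printf"]))   # %p1%d
```

A lone %+ needs two values going in, leaves one and needs two slots, while %p1%d is self-contained and gets by with a single slot.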

The stack is made explicit as variables in the generated program. This is key to letting cp0 get rid of the stack completely. Here is the code generated for the program %p1%d before cp0:

(lambda (p dvar svar baudrate lines p1 p2 p3 p4 p5 p6 p7 p8 p9)
  (define (k . x) (if #f #f))
  (let ([p1^ p1] [p2^ p2] [s0 0])
    (let ()
      ((lambda (s0)
         (let ([s0 p1^])
           ((lambda (s0)
              (let ([v s0])
                (ti-printf p "%d" v #f #f #f #\space #f #f #\d)
                (k s0)))
             s0)))
        s0)))
  (if #f #f))

This is lambda soup. The k procedure acts as a continuation in the case when the compiler detects that it can’t generate sequential code for an if-then-else. This is rare, but happens when a branch affects the explicit program state in a bad way. It can be due to either a side-effect in a branch (e.g. %i) or that a branch affects the stack. In these cases the output from the branches has to be passed to the continuation of the program, which is therefore made explicit as a new k procedure. In all examples I looked at, cp0 optimized this quite well.

The program after cp0:

(lambda (p dvar svar baudrate lines p1 p2 p3 p4 p5 p6 p7 p8 p9)
  (ti-printf p "%d" p1 #f #f #f #\space #f #f #\d)
  (void))

Notice that the program before cp0 has an s0 variable. This is the single stack location needed for this program. Pushing a value to the stack is simply done by rebinding s0 to the new value, (let ([s0 p1^]). This binds s0 to p1^, which is the program’s current value of parameter 1. It is different from p1, because incrementing p1 in %i is handled by rebinding p1^ as (+ 1 p1). The mutable program state (s0 in this case) is passed to the rest of the program by application of a procedure that rebinds all state variables and contains the rest of the program, in this case ((lambda (s0) ...) s0).

When popping from the stack, a temporary variable is bound to s0 and the rest of the stack is rebound. The temporary variable is then used in the implementation of the operation, in this case ti-printf. Since this example only has one stack slot nothing happens to s0, but otherwise s0 would be rebound to s1, and so on.

This is a lot of rebinding and unnecessary lambdas, but cp0 eats this kind of code for breakfast and transforms it into (most of the time) optimal code. This approach to compiling the programs is made easier by the fact that there is no advanced control flow.

Summary

Down the rabbit hole, indeed. Terminfo has a strangely powerful DSL that specializes in generating escape sequences for terminals. Although ncurses uses an interpreter to run the programs, it is also possible to compile them and get efficient direct code.

Someone clever in a white hat should probably have a look at how good it is that terminfo programs, in the ncurses implementation, can copy arbitrary memory to the terminal. This has not been possible to implement in the Scheme implementation of the same.

Alignment Checking & Meltdown

weinholt — Sun, 06 Jan 2019 01:00:00 +0100

Here is some interesting news for compiler writers worried about Meltdown. I have previously described a way to get hardware-based type checks (think branchless car, cdr, vector-ref, etc.) using alignment checks. It now appears that this technique may be immune to Meltdown-type attacks:

Alignment Faults.

Upon detecting an unaligned memory operand, the processor can (optionally) generate an alignment check exception (#AC). We found that the results of unaligned memory accesses never reach the transient execution. We suspect that this is because #AC is generated early-on (even before the operand’s virtual address is translated to a physical one). Thus, Meltdown-AC is not possible.

– A Systematic Evaluation of Transient Execution Attacks and Defenses (2018, Canella, et al.)

The kernel unfortunately can’t use it because #AC does not work at CPL=0, but for user space it could be a great way to avoid some Meltdown vulnerabilities.

Design Your Low-Bit Tagging with Z3Py

weinholt — Sun, 18 Nov 2018 01:00:00 +0100

Low-bit tagging is a technique where the low bits of values are used to store type information. There are numerous benefits that come with this technique and it is quite popular in implementations of Scheme, JavaScript and other languages. But once you start down the road of bit-twiddling it is hard to stop and the design of the tagging system may become difficult to understand. So that’s when you look in your tool box and pull out something like Z3, which this article explores.

Example of Cleverness

Let us begin with a look at a real life example of low-bit tagging from Chez Scheme on an AMD64 system, where integers in the interval [-2⁶⁰, 2⁶⁰-1] are encoded directly into the value’s bit pattern:

Chez Scheme Version 9.5
Copyright 1984-2017 Cisco Systems, Inc.

> (#%$assembly-output #t)
> (lambda (x y) (fx+ x y))

entry.28:
0:       cmpi           (imm 2), %ac0
4:       bne            lf.27(35)
dcl.29:
6:       mov            %rdi, %rcx
9:       or             %r8, %rcx
12:      testib         (imm 7), %rcx
15:      bne            Llib.26(12)
lt.30:
…

Chez Scheme has its own mnemonics for the assembler instructions, but it should be familiar enough. In this example the rdi and r8 registers are bitwise logical OR’d together, the three low bits are tested and at 15: it branches to some error handler if the bits were not 0. In C terms: if (((x | y) & 7) != 0) { goto error; }. This is an example of the cleverness possible in a well-designed low-bit tagging system: two values can be type checked with a single branch instruction.

This works because all fixnums are tagged with three 0 bits and bitwise logical OR with something not a fixnum would introduce some 1 bits. Chez Scheme is careful to never create non-fixnum objects with three lower 0 bits.
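Here is a small Python model of the trick, assuming a three-bit all-zero fixnum tag as described above. The helper names and the 0b001 tag used for the non-fixnum value are made up for the illustration.

```python
TAG_BITS = 3
MASK = (1 << TAG_BITS) - 1   # 0b111

def make_fixnum(n):
    """Encode an integer with three 0 tag bits at the bottom."""
    return n << TAG_BITS

def fixnum_value(word):
    """Decode a tagged fixnum back to the integer it represents."""
    return word >> TAG_BITS

def both_fixnums(x, y):
    # A single test covers both operands: any 1 bit in either tag
    # survives the OR and shows up in the masked result.
    return ((x | y) & MASK) == 0

def fx_add(x, y):
    if not both_fixnums(x, y):
        raise TypeError("fx+: not a fixnum")
    return x + y             # no untagging needed before the addition

a, b = make_fixnum(20), make_fixnum(22)
not_a_fixnum = 0x1000 | 0b001    # hypothetical non-fixnum tag
print(fixnum_value(fx_add(a, b)))
print(both_fixnums(a, not_a_fixnum))
```

The addition works directly on the tagged words, which is the property explored with Z3 in the rest of this article.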

Starting with Z3Py

Let us have a look at how to enter this system into Z3 via Z3Py. Z3 is an MIT-licensed theorem prover from Microsoft Research. The native language of Z3 is actually SMT-LIB, but I will use Z3Py here because I find it helps with writing logic at a higher level. Z3Py is available in many Linux distributions, including Debian: apt-get install python-z3 (currently Python 2.7 only).

So what does Z3 actually do? Think of it as a tool that does a brute force search through a whole problem space, looking at every possible model that satisfies a set of constraints, but one that also knows a lot of shortcuts that speed up the search. If you gave it the assertion x + 1 = 2 it would be clever enough not to search through all possible values of x until it found x = 1 (although in some situations it would do basically this).

Back to tagging. One interesting property of tagging fixnums with 0 bits is that addition can be performed without masking away the tag bits. This Z3Py script checks this property:

from __future__ import print_function
from z3 import *

# We want fixnums to be tagged with some three low bits.
tag_fixnum = BitVec('tag-fixnum', 64)
mask_fixnum = BitVecVal(0b111, 64)

# Keep these ordered.
tags = ( tag_fixnum, )
masks = ( mask_fixnum, )

# Create a solver.
s = Solver()

# Two fixnums can be added and the result is a fixnum.
x = BitVec('x', 64)
y = BitVec('y', 64)
s.add(ForAll([x, y],
             Implies(And((x & mask_fixnum) == tag_fixnum,
                         (y & mask_fixnum) == tag_fixnum),
                     ((x + y) & mask_fixnum) == tag_fixnum)))

print(s.sexpr())
print(s.check())
print(s.model().sexpr())

The BitVec objects here are bit-vector variables in the model, representing some 64-bit value and BitVecVal is a 64-bit constant. The argument to s.add is read as: for all x and for all y it is true that x and y being fixnums implies (as in P → Q) that x + y is also a fixnum. It is important to use x and y inside ForAll, otherwise Z3 will look for some specific x and y that satisfy the assertions instead of proving a model for all fixnums. (There is a missing constraint, though. Can you see it?)

When run through Python it generates this output:

(declare-fun tag-fixnum () (_ BitVec 64))
(assert (forall ((x (_ BitVec 64)) (y (_ BitVec 64)))
  (=> (and (= (bvand x #x0000000000000007) tag-fixnum)
           (= (bvand y #x0000000000000007) tag-fixnum))
      (= (bvand (bvadd x y) #x0000000000000007) tag-fixnum))))

sat
(define-fun tag-fixnum () (_ BitVec 64)
  #x0000000000000000)

The first part is the solver in the SMT-LIB language and sat means it is satisfiable. That it is satisfiable means that Z3 also found a model, which is the last output shown. Here it has assigned all 0 bits to the fixnum tag, having proved that tagging with 0 bits allows addition to work directly with fixnums. (Almost, actually. We didn’t check that addition itself actually gives the right result).

Something that’s tricky with these theorem solvers is that you really need to tell them all the constraints or, like little children, they will find that one electrical outlet that you didn’t secure. The missing constraint here is that the tag must fit within the mask. Z3 actually found a model where the tag is zero, which is what we were looking for, but it could just as well have given us a model where some high bit of the fixnum tag is set. That will be seen in the next section.
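Without Z3 at hand, the same property can be checked by actual brute force if the word size is shrunk. This Python sketch assumes 8-bit words instead of 64-bit ones and only tries tags that fit within the mask, which is exactly the constraint that was missing; the function names are made up.

```python
WORD_BITS = 8                       # shrunk from 64 to keep the search tiny
WORD_MASK = (1 << WORD_BITS) - 1
FIXNUM_MASK = 0b111

def addition_preserves_tag(tag):
    """True if x + y carries the tag whenever both x and y do."""
    words = [w for w in range(1 << WORD_BITS) if (w & FIXNUM_MASK) == tag]
    return all((((x + y) & WORD_MASK) & FIXNUM_MASK) == tag
               for x in words for y in words)

def valid_tags():
    # Only tags within the mask are candidates.
    return [t for t in range(FIXNUM_MASK + 1) if addition_preserves_tag(t)]

print(valid_tags())
```

It prints [0], since adding two words tagged t yields the tag 2t modulo 8, and that equals t only when t is zero.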

Types, types, types!

Just having fixnums is no fun, so let’s add pairs, characters, booleans and the empty list (nil). Some additional constraints will be needed, but let us first see what happens without them.

from __future__ import print_function
from z3 import *

tag_fixnum   = BitVec('tag-fixnum', 64)
tag_pair     = BitVec('tag-pair', 64)
tag_char     = BitVec('tag-char', 64)
tag_boolean  = BitVec('tag-boolean', 64)
tag_nil      = BitVec('tag-nil', 64)

mask_fixnum  = BitVecVal(0b111, 64)
mask_pair    = BitVecVal(0b111, 64)
mask_char    = BitVec('mask-char', 64)
mask_boolean = BitVec('mask-boolean', 64)
mask_nil     = BitVec('mask-nil', 64)

tags = ( tag_fixnum, tag_pair, tag_char, tag_boolean, tag_nil )
masks = ( mask_fixnum, mask_pair, mask_char, mask_boolean, mask_nil )

s = Solver()
x = BitVec('x', 64)
y = BitVec('y', 64)
s.add(ForAll([x, y],
             Implies(And((x & mask_fixnum) == tag_fixnum,
                         (y & mask_fixnum) == tag_fixnum),
                     ((x + y) & mask_fixnum) == tag_fixnum)))

# Uncommented one by one in the discussion below.
#s.add(Distinct(tags))
#s.add([(tag & mask) == tag for (tag, mask) in zip(tags, masks)])
#s.add([And(mask > 0, mask <= 0xff) for mask in masks])

print(s.sexpr())
print(s.check())
print(s.model().sexpr())

If you were to run this through Python you would find that Z3Py probably prints exactly the same thing as before! That’s because the new tag and mask variables are not referenced anywhere in the model. Uncomment the Distinct constraint and you might get this model:

(define-fun tag-pair () (_ BitVec 64)
  #x0000000000000003)
(define-fun tag-nil () (_ BitVec 64)
  #x0000000000000000)
(define-fun tag-char () (_ BitVec 64)
  #x0000000000000002)
(define-fun tag-boolean () (_ BitVec 64)
  #x0000000000000001)
(define-fun tag-fixnum () (_ BitVec 64)
  #x2000000000000000)

Distinct means that the tags must have different values. Note that Z3 gave the all-0 tag to nil and fixnums have a high bit set. And indeed, the addition constraint still holds in this model. Oops. Uncomment the constraint after the Distinct line to constrain tags to fit inside their mask. Here is a model with the new constraints:

(define-fun tag-pair () (_ BitVec 64)
  #x0000000000000004)
(define-fun tag-nil () (_ BitVec 64)
  #x32212a2aa3282220)
(define-fun mask-char () (_ BitVec 64)
  #x0000000010000000)
(define-fun mask-nil () (_ BitVec 64)
  #x32212a2aa3282220)
(define-fun tag-char () (_ BitVec 64)
  #x0000000010000000)
(define-fun tag-boolean () (_ BitVec 64)
  #x0100000000000000)
(define-fun tag-fixnum () (_ BitVec 64)
  #x0000000000000000)

Z3 really got creative here with the tag and mask for nil, but fixnums are back to zero tags, so that’s good. In general the tags are a bit too large, so let’s enable the next constraint, saying that the masks should be 8-bit values. Here is the new model (again, several models are possible):

sat
(define-fun tag-pair () (_ BitVec 64)
  #x0000000000000004)
(define-fun tag-nil () (_ BitVec 64)
  #x0000000000000040)
(define-fun mask-char () (_ BitVec 64)
  #x0000000000000061)
(define-fun mask-nil () (_ BitVec 64)
  #x0000000000000040)
(define-fun tag-char () (_ BitVec 64)
  #x0000000000000061)
(define-fun tag-boolean () (_ BitVec 64)
  #x0000000000000080)
(define-fun tag-fixnum () (_ BitVec 64)
  #x0000000000000000)

This iterative approach to the design is where using a theorem prover really shines. In this model the nil value and booleans are actually fixnums, which is wrong, so assertions should be added that prevent this from happening. But now on to some cleverness.
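Before that, the overlap in the last model can be confirmed without a solver. The sketch below is plain Python, not Z3Py. It copies the tag values from the model above and the fixnum mask from the script; the `TAGS` table and the `is_fixnum` helper are names made up for this illustration.

```python
# Tag values copied from the last model above; the fixnum mask is the
# constant 0b111 from the script.
MASK_FIXNUM = 0b111
TAG_FIXNUM = 0x00
TAGS = {
    "pair": 0x04,
    "char": 0x61,
    "boolean": 0x80,
    "nil": 0x40,
}

def is_fixnum(obj):
    """The same check as the fixnum type test: mask, then compare."""
    return (obj & MASK_FIXNUM) == TAG_FIXNUM

# Names of the non-fixnum tags that nevertheless pass the fixnum check.
overlaps = sorted(name for name, tag in TAGS.items() if is_fixnum(tag))
print(overlaps)
```

Both the boolean and nil tags have their low three bits clear, so they pass the fixnum test. Assertions like the fixnump/charp pair used later in this article are what rule such models out.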

Clever masking

The masks in the current design are small and neat and fit as immediates in instruction encodings. However, anyone who is familiar with the x86 instruction set knows that registers can be addressed in smaller parts without separate masking. Here is a handy table for one of the registers:

Register name	Register size (bits)	Addressed data
`rax`	64	`rax`
`eax`	32	`rax & 0xffffffff`
`ax`	16	`rax & 0xffff`
`al`	8	`rax & 0xff`
`ah`	8	`(rax >> 8) & 0xff`

Using al would make it possible to type check without applying the mask explicitly, provided the mask is exactly 0xff. This also means that the original value is not overwritten, so a temporary register is not needed. Decreasing register pressure is important when optimizing some code, e.g. tight loops.
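To make the table concrete, here is a small Python model of the sub-register views. It only illustrates the arithmetic: the `SUBREGISTERS` table and the `view` function are hypothetical names, and the tag value is an arbitrary example.

```python
# (shift, mask) pairs describing which bits of rax each name addresses.
SUBREGISTERS = {
    "rax": (0, 0xFFFFFFFFFFFFFFFF),
    "eax": (0, 0xFFFFFFFF),
    "ax": (0, 0xFFFF),
    "al": (0, 0xFF),
    "ah": (8, 0xFF),
}

def view(rax, name):
    """Return the bits of the 64-bit value rax that `name` addresses."""
    shift, mask = SUBREGISTERS[name]
    return (rax >> shift) & mask

# Example object: code point 0x41 stored above an 8-bit tag. The tag
# value 0x16 is just an example for this sketch.
EXAMPLE_TAG = 0x16
obj = (0x41 << 8) | EXAMPLE_TAG

# Comparing al against the tag is the same test as masking with 0xff
# first, and obj itself is left untouched.
print(view(obj, "al") == (obj & 0xFF) == EXAMPLE_TAG)
print(hex(view(obj, "ah")))
```

With an 8-bit mask the type check reads the tag straight out of the low byte, which is what makes the 0xff mask attractive.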

Let us see what Chez Scheme does here.

> (#%$assembly-output #t)
> (lambda (x) (char=? x #\space))

entry.21:
0:       cmpi           (imm 1), %ac0
4:       bne            lf.20(66)
dcl.22:
6:       mov            %r8, %rcx
9:       andi           (imm 255), %rcx
16:      cmpi           (imm 22), %rcx
20:      bne            lf.19(31)
lt.23:
…

Hmm! The equivalent C code is rcx = (r8 & 255); if (rcx != 22) { goto error; }. It turns out that Chez Scheme doesn’t know that it can use r8l to do the check without involving a temporary register, even though the mask allows for this. Perhaps Chez Scheme has some low-hanging fruit for the intrepid compiler developer.

When you find a trick that you want to use in your tagging system, you just add it as a constraint. This Z3Py snippet adds the new constraint for characters:

s.add(Or(mask_char == 0xff,         # for free with al
         mask_char == 0xffff,       # ax
         mask_char == 0xffffffff))  # eax

Clever shifting

Any language with both characters and integers needs some way to convert between them and Scheme is no exception. Chez Scheme’s implementation of char->integer contains a piece of cleverness (not unique to itself) that works because of how the character and fixnum tags are arranged:

> (#%$assembly-output #t)
> (lambda (x) (char->integer x))

entry.28:
0:       cmpi           (imm 1), %ac0
4:       bne            lf.27(39)
dcl.29:
6:       mov            %r8, %rcx
9:       andi           (imm 255), %rcx
16:      cmpi           (imm 22), %rcx
20:      bne            lf.26(11)
lt.30:
22:      mov            %r8, %ac0
25:      lsri           (imm 5), %ac0
…

It first does a type check on r8 to ensure that it’s a character. Then it writes the return value as (in C terms) r8 >> 5. How can 5 work with no unmasking or tagging? Let’s add it as a constraint. This requires some legwork; see the comments:

# SPDX-License-Identifier: MIT
# Unchanged from before
from __future__ import print_function
from z3 import *
tag_fixnum   = BitVec('tag-fixnum', 64)
tag_pair     = BitVec('tag-pair', 64)
tag_char     = BitVec('tag-char', 64)
tag_boolean  = BitVec('tag-boolean', 64)
tag_nil      = BitVec('tag-nil', 64)
mask_fixnum  = BitVecVal(0b111, 64)
mask_pair    = BitVecVal(0b111, 64)
mask_char    = BitVec('mask-char', 64)
mask_boolean = BitVec('mask-boolean', 64)
mask_nil     = BitVec('mask-nil', 64)
tags = ( tag_fixnum, tag_pair, tag_char, tag_boolean, tag_nil )
masks = ( mask_fixnum, mask_pair, mask_char, mask_boolean, mask_nil )
s = Solver()
x = BitVec('x', 64)
y = BitVec('y', 64)
s.add(ForAll([x, y],
             Implies(And((x & mask_fixnum) == tag_fixnum,
                         (y & mask_fixnum) == tag_fixnum),
                     ((x + y) & mask_fixnum) == tag_fixnum)))
s.add(Distinct(tags))
s.add([(tag & mask) == tag for (tag, mask) in zip(tags, masks)])
s.add([And(mask > 0, mask <= 0xff) for mask in masks])
s.add(Or(mask_char == 0xff, 
         mask_char == 0xffff, 
         mask_char == 0xffffffff))
# New code starts here:

# A trick (see Hacker's Delight) to get the shift amounts
# that match the masks.
shift_char = BitVec('shift-char', 64)
shift_fixnum = BitVec('shift-fixnum', 64)
s.add(mask_char == ((1 << shift_char) - 1))
s.add(mask_fixnum == ((1 << shift_fixnum) - 1))
s.add(shift_char > 0)
s.add(shift_fixnum > 0)

# Z3Py versions of char->integer and integer->char
def char_to_integer(ch):
    return ((ch >> shift_char) << shift_fixnum) | tag_fixnum
def integer_to_char(fx):
    return ((fx >> shift_fixnum) << shift_char) | tag_char

# Sets fx_A to the fixnum that represents the 'A' code point
# and sets ch_A to the character 'A'. Then asserts that the
# conversion functions work. I'm a little bit lazy and do this
# for 'A' instead of all chars.
ch_A = BitVec('ch-A', 64)
fx_A = BitVec('fx-A', 64)
s.add(fx_A == ((ord('A') << shift_fixnum) | tag_fixnum))
s.add(ch_A == ((ord('A') << shift_char) | tag_char))
s.add(char_to_integer(ch_A) == fx_A)
s.add(integer_to_char(fx_A) == ch_A)

# Assert that no object satisfies both fixnump and charp.
# Ideally there should be a complete set of these.
def fixnump(obj): return (obj & mask_fixnum) == tag_fixnum
def charp(obj): return (obj & mask_char) == tag_char
s.add(ForAll([x], Implies(fixnump(x), Not(charp(x)))))
s.add(ForAll([x], Implies(charp(x), Not(fixnump(x)))))

# Assert that char->integer is equivalent to (ch >> n) for some n.
# This is the main point of this section.
shift_ch_to_fx = BitVec('shift-ch->fx', 64)
s.add(char_to_integer(ch_A) == (ch_A >> shift_ch_to_fx))

print(s.check())
print(s.model().sexpr())

Quite a bit of code, but it is necessary so that Z3 will not find a loophole. Here is one possible output:

sat
(define-fun shift-char () (_ BitVec 64)
  #x0000000000000008)
(define-fun ch-A () (_ BitVec 64)
  #x0000000000004102)
(define-fun shift-fixnum () (_ BitVec 64)
  #x0000000000000003)
(define-fun mask-char () (_ BitVec 64)
  #x00000000000000ff)
(define-fun tag-char () (_ BitVec 64)
  #x0000000000000002)
(define-fun mask-nil () (_ BitVec 64)
  #x000000000000000c)
(define-fun tag-boolean () (_ BitVec 64)
  #x0000000000000014)
(define-fun tag-pair () (_ BitVec 64)
  #x0000000000000004)
(define-fun tag-nil () (_ BitVec 64)
  #x000000000000000c)
(define-fun tag-fixnum () (_ BitVec 64)
  #x0000000000000000)
(define-fun fx-A () (_ BitVec 64)
  #x0000000000000208)
(define-fun shift-ch->fx () (_ BitVec 64)
  #x0000000000000005)

The interesting part is shift-ch->fx, which is 5, just like in Chez Scheme. Furthermore, if you add the assertion shift_ch_to_fx != 5 then Z3 will say “model is not available”, meaning that only this shift amount has the desired properties. There is nothing particular that stands out in the assertions that causes this to happen (although it is possible to get other shift amounts if some constraints are relaxed).

This has also affected the selections of the other values in the model. If you were to change the shift amount for characters then you could no longer rely on this trick and Z3 would happily let you know about it. If you rely on this trick in your compiler then it’s a good idea to add it as an assertion. In fact, while writing your Z3Py code it’s a good idea to add all your assumptions as assertions.
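The shift trick can also be checked exhaustively in plain Python, without Z3. This sketch assumes the layout from the model above (character shift 8, fixnum shift 3, fixnum tag 0). It tries the model's character tag 2 and the tag 22 seen in the Chez Scheme listing; the function names exist only for this example. One way to read the result: both tags are smaller than 32, so a right shift by 5 (which is 8 minus 3) discards the tag and leaves the code point at the fixnum position.

```python
SHIFT_CHAR = 8      # from the model: mask-char is 0xff
SHIFT_FIXNUM = 3    # from the model: mask-fixnum is 0b111
TAG_FIXNUM = 0

def make_char(code_point, tag_char):
    return (code_point << SHIFT_CHAR) | tag_char

def make_fixnum(n):
    return (n << SHIFT_FIXNUM) | TAG_FIXNUM

def char_to_integer(ch):
    """The general conversion: untag, then retag as a fixnum."""
    return ((ch >> SHIFT_CHAR) << SHIFT_FIXNUM) | TAG_FIXNUM

def shortcut_holds(tag_char, shift=SHIFT_CHAR - SHIFT_FIXNUM):
    """True if ch >> shift equals char_to_integer(ch) for every code point."""
    for cp in range(0x110000):
        ch = make_char(cp, tag_char)
        if ch >> shift != char_to_integer(ch):
            return False
    return True

print(hex(make_char(ord("A"), 2)), hex(make_fixnum(ord("A"))))
print(shortcut_holds(2), shortcut_holds(22))
# A tag with bits at or above bit 5, such as 0x61, leaks into the result.
print(shortcut_holds(0x61))
```

The first line printed matches ch-A and fx-A in the model. The last check fails because part of the tag 0x61 survives the shift.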

Summing up

I could go on and show a few more missed optimization opportunities in Chez Scheme, like how it doesn’t type check multiple characters with one branch, but I hope that you have already seen how Z3 is useful. It lets you prove properties of your tagging system and, perhaps just as importantly, lets you document your assumptions.

Z3 itself is lacking in documentation, attempting to use tutorials as a substitute, and its web site is full of dead links. When Z3 fails to find a solution it either goes into a seemingly infinite spin or prints unsat, hoping you will go away. (There is a way to get it to print a counterexample, but the output is sadly incomprehensible.) Quite often I found myself sitting at my terminal wondering why my assertions were unsatisfiable.

My advice, if you find yourself in the situation where you are 100% certain something should work, is to remove the general cases and add assertions for very specific cases. At some point a specific case will cause Z3 to reject your assertions, which will give you a clue as to what has gone wrong.

Give it a spin if you’re thinking about revamping your tagging system or if you want to add extra tricks and be certain that they are well founded.

R7RS versus R6RS

weinholt — Fri, 22 Jun 2018 02:00:00 +0200

InPhase asked today on #scheme about the R7RS vs R6RS debate. I followed the original debate closely and have experience both using and implementing R6RS. I also recently added R7RS support in Akku.scm 0.3.0, so I feel like I can weigh in on this. It’s a topic that many feel passionately about, and I’m also firmly on one side of the debate, but I will try to keep my own opinions and hyperbole out of it this time.

[Note from January 2020: This article is about R7RS-small. As R7RS-large is coming together, new readers could assume that it is about R7RS-large, but it is not. Thanks to cos on lobste.rs for pointing this out.]

The number argument

If you simply asked around, you might get the answer that R6RS is just so much bigger than R5RS/R7RS (and bigger is presumed to be not as good). It looks obvious on the surface, but defenders of R6RS see it as a canard. Here are the numbers, based on the latest updated documents.

AI Memo № 349: 43 pages.
RRS: 35 pages.
R2RS: 76 pages.
R3RS: 43 pages.
R4RS: 55 pages.
R5RS: 50 pages.
R6RS: 91 + 72 = 163 pages (not counting the non-normative appendices and the rationale, which I think is fair).
R7RS: 88 pages (similarly not counting the overview).

By these numbers we see that R6RS is 185% the size of R7RS and 326% the size of R5RS. The argument looks true, at least on the surface. These numbers hide the fact that R6RS contains 15 pages of formal semantics.

Is it fair to count these pages as a point against R6RS? The formal semantics are a good reference for implementers who are unsure about some corner of the language and can be used to validate an implementation’s semantics. Here are the recent numbers again, excluding formal semantics, appendices, bibliographies and indices:

R5RS: 40 pages (excluding section 7.2 and forwards).
R6RS: 60 + 65 = 125 pages (excluding Appendix A and forwards).
R7RS: 65 pages (excluding section 7.2 and forwards).

The 65 pages from the R6RS standard libraries are the only remaining part of the number argument that still holds.
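The percentages can be recomputed from the page counts quoted above. This short Python snippet is only arithmetic on those numbers; the dictionary and function names are made up, and percentages are rounded down.

```python
# Page counts quoted in the lists above.
FULL = {"R5RS": 50, "R6RS": 91 + 72, "R7RS": 88}
TRIMMED = {"R5RS": 40, "R6RS": 60 + 65, "R7RS": 65}
R6RS_CORE_TRIMMED = 60  # R6RS without its standard libraries document

def percent_of(a, b):
    """Size of a as a whole percentage of b, rounded down."""
    return (100 * a) // b

for label, pages in (("full", FULL), ("trimmed", TRIMMED)):
    print(label,
          percent_of(pages["R6RS"], pages["R7RS"]),
          percent_of(pages["R6RS"], pages["R5RS"]))
print("core only", percent_of(R6RS_CORE_TRIMMED, TRIMMED["R7RS"]))
```

By the trimmed counts, the R6RS core report on its own comes out slightly smaller than R7RS, so it is the standard libraries document that accounts for the difference.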

R7RS side: The R6RS is too big. R6RS side: It’s smaller than it appears.

The condition system

R6RS specifies a condition system with 14 standard condition types for code that raises exceptions. R5RS does not provide any standard way to handle or distinguish exceptions and not even a way to raise exceptions. R7RS borrows guard, raise and raise-continuable from R6RS but does not specify a complete condition system.

Instead of a condition system, R7RS has error-object-message, error-object-irritants, error-object?, read-error? and file-error?. These are not necessarily meant to work with a new distinct type, but may simply work with e.g. symbols and vectors. This gives the implementer the freedom to reuse whatever condition objects were used before they implemented R7RS support.

The condition system in R6RS comes from the pain of trying to write any kind of error handling at all in R5RS. It was not possible to, let’s say, write portable code that reliably writes to the file system and correctly handles I/O errors. In contrast, if an R6RS program tries to open a file it does not have access to then it will get an exception with an &i/o-file-protection condition as well as a few other conditions that together give a complete picture of the condition. In Chez Scheme (which also adds the extra &format):

(guard (exn
        ((i/o-file-protection-error? exn)
         (simple-conditions exn)))
  (open-file-output-port "/dev/foo"))
;; => (#<condition &i/o-file-protection> #<condition &format>
;;     #<condition &who> #<condition &message>
;;     #<condition &irritants> #<condition &continuation>)

User code has the same access to these conditions as the implementer and can construct, inspect and pretty print them.

However, this means that anyone implementing R6RS should go through all their code and update it to raise the correct conditions.

R7RS side: The condition system is too big and burdensome. R6RS side: It lets us write portable code that catches exceptions.

Undefined behavior controversy

This is a big philosophical difference between the reports. I’ll let the documents themselves tell the story.

As defined by this document, the Scheme programming language is safe in the following sense: The execution of a safe top-level program cannot go so badly wrong as to crash or to continue to execute while behaving in ways that are inconsistent with the semantics described in this document, unless an exception is raised.

— Revised⁶ Report on the Algorithmic Language Scheme

That’s for R6RS, although it does later leave room for implementations to add unsafe features. But it’s clear that a program that doesn’t import such extra libraries is safe. It will not have a buffer overflow waiting for an attacker to use it.

When speaking of an error situation, this report uses the phrase “an error is signaled” to indicate that implementations must detect and report the error. […]

If such wording does not appear in the discussion of an error, then implementations are not required to detect or report the error, though they are encouraged to do so. Such a situation is sometimes, but not always, referred to with the phrase “an error.” In such a situation, an implementation may or may not signal an error; […]

For example, it is an error for a procedure to be passed an argument of a type that the procedure is not explicitly specified to handle, even though such domain errors are seldom mentioned in this report. Implementations may signal an error, extend a procedure’s domain of definition to include such arguments, or fail catastrophically.

— Revised⁷ Report on the Algorithmic Language Scheme

“Fail catastrophically” is presumably not too far away from the nasal demons of C compilers. The report does not say what will happen if a program does (string-ref "" -1) or (car 0).

Implementation restrictions provide even more ways that things can go wrong:

This report uses the phrase “may report a violation of an implementation restriction” to indicate circumstances under which an implementation is permitted to report that it is unable to continue execution of a correct program because of some restriction imposed by the implementation. Implementation restrictions are discouraged, but implementations are encouraged to report violations of implementation restrictions.

For example, an implementation may report a violation of an implementation restriction if it does not have enough storage to run a program, or if an arithmetic operation would produce an exact number that is too large for the implementation to represent.

— Revised⁷ Report on the Algorithmic Language Scheme

So an implementation is also within its rights to not detect out of memory errors or integer overflow. R6RS does not work that way:

Implementations must raise an exception when they are unable to continue correct execution of a correct program due to some implementation restriction. For example, an implementation that does not support infinities must raise an exception with condition type &implementation-restriction when it evaluates an expression whose result would be an infinity.

— Revised⁶ Report on the Algorithmic Language Scheme

R7RS side: Safety is not an essential language feature. R6RS side: Safety is an essential language feature.

[Updated in January 2020: this previously stated the R7RS side as “Safety is not a desirable language feature”, but that does not accurately describe the situation. Thanks to John Cowan for pointing this out.]

Optional is better argument

R7RS requires that implementations support 7-bit ASCII (except for NUL in strings). This is different from R5RS, which is character set agnostic. And it’s different again from R6RS which requires full Unicode support.

Unicode is one of several optional features in R7RS. Appendix B gives a list of feature identifiers that may be missing in any given implementation:

exact-closed - The algebraic operations +, -, *, and expt where the second argument is a non-negative integer produce exact values given exact inputs.
exact-complex - Exact complex numbers are provided.
ieee-float - Inexact numbers are IEEE 754 binary floating point values.
full-unicode - All Unicode characters present in Unicode version 6.0 are supported as Scheme characters.
ratios - / with exact arguments produces an exact result when the divisor is nonzero.

A benefit of these features being optional is that R7RS is easier to implement in certain environments. We can see that R7RS strings can be implemented as C strings, which also do not support NUL characters. An R7RS Scheme targeting ECMAScript can let (+ 1 1) evaluate to 2.0 and (/ 1 2) to 0.5. An R7RS targeting an AVR microcontroller can exclude Unicode support. This will lead to more R7RS implementations, which is good.

Now the other side of the argument. Implementations which don’t implement these features will likely list them as restrictions in their documentation. Nothing stops an implementer from similarly claiming compliance with R6RS and listing some restrictions. Some targets will require certain restrictions, such as due to memory limits on microcontrollers. But if these features are taken as optional in the language itself then we can’t write portable code that uses these features. The burden is on the user to provide a full Unicode library if our software requires the use of Unicode.

R7RS side: It is too burdensome and/or restrictive to require all these features. R6RS side: It is too burdensome and/or difficult to write portable code without these features.

syntax-case

The choice of macro system is a very contended issue. The result of this particular controversy was that R7RS just kept syntax-rules from R5RS. In R6RS there is both syntax-rules and the more powerful syntax-case, in which syntax-rules can be written in just a few lines.

I don’t think I can properly do justice to the arguments for and against syntax-case in this article. There are other popular macro systems with the same expressiveness, and perhaps the popularity of some of those is the largest reason why R7RS didn’t choose syntax-case.

In R5RS and R7RS it is not possible to write macros that violate syntactic hygiene. The macro system is based on rewriting rules, which happen to easily be Turing complete, but which do not have access to the Scheme language itself. They can therefore also not deconstruct strings or create new identifiers. That’s why define-record-type in R7RS (and SRFI 9) requires the user to write out all procedure names for all fields. This is also a simple motivation for wanting a more powerful macro system.

A macro expander like syntax-rules is a very tricky piece of code and syntax-case is even trickier. Those interested can check out Oscar Waddell’s PhD thesis. Requiring a tricky macro system is obviously a burden for the implementer and perhaps another reason why R7RS did not add syntax-case.

From the side of R6RS, adding syntax-case made sense. It is a great feature to have as a user of the language. We can run any Scheme code at expansion time and easily write macros that insert new identifiers, even while preserving hygiene.

Bottom line

I’ve not written about all controversies. The record system of R6RS has also received criticism. But I have shown a number of essential differences between R7RS and R6RS. I think that this is a fair summary:

R6RS is more demanding on implementers but easier on users. Conversely, R7RS is easier on implementers but more demanding on users.

Finally, a caveat for this whole article is that it applies to R7RS-small vs R6RS. Much might change with R7RS-large.

R7RS comes to Akku

weinholt — Sun, 10 Jun 2018 02:00:00 +0200

I have made some strides with Akku.scm since the introductory blog article and the announcement on the Chez Scheme mailing list. The big feature on the horizon is support for translating R7RS libraries to run on R6RS.

But first of all I need to apologize for building version 0.2.3 with libncurses6. Chez Scheme uses ncurses for its expression editor and Debian sid, which I use, has just had a migration from libncurses5 to libncurses6. Most Linux systems do not have that version of ncurses yet, so many of you could not get the pre-compiled version of Akku to run (the +src version works). I will have this fixed for the next release.

So what is this about translating R7RS to R6RS? The subset of R7RS that already exists in R6RS is quite large and the incompatible bits are manageable. Anyone interested in the details can read Implementing R7RS on an R6RS Scheme system (Kato, 2014). There are some syntactical differences, so a new reader is needed and the define-library forms need to be translated into library forms. Lastly the R7RS standard library needs to be implemented.

For this purpose I’ve written a reader called laesare that can handle both R6RS and R7RS. The bulk of the reader already existed and I added support for the R7RS lexical syntax. Next I added code to Akku to have it understand R7RS libraries and translate them to R6RS libraries. The final piece of the puzzle is the akku-r7rs package that provides the standard library. The latter is based on yuni by okuoku, with my own additions.

Some trickery was needed to support include and cond-expand. The include form in R7RS is somewhat loosely specified and leaves it up to the implementation to decide how to search for the files, but in practice the file paths are relative to the file where the include form appeared. This is not trivial to get working in straight up R6RS, but is easy with some help from Akku.

The next release of Akku will install the (akku metadata) library that describes all the libraries and assets (i.e. included files) that exist in the project. The include form uses this library to look up the location of the referenced files:

(define installed-assets
  '(((include "match/match.scm")
      ("chibi/match/match.scm")
      (chibi match))
    ...))

The location is relative to the library search path, which is how all R6RS include forms already do the job today. The metadata library also contains a list of installed libraries so that cond-expand library clauses work. These tricks work because Akku is project-oriented: it has full knowledge of all files it has installed.

Currently the R7RS support works with Chez Scheme. There is a slight problem with Racket: it has its own (scheme *) modules. Other implementations either already have R7RS support (Sagittarius and Larceny) or they don’t have a compatibility library in akku-r7rs and/or chez-srfi.

Another fly in the ointment is that many of the packages in Snow require implementation-specific code, which doesn’t exist yet for R6RS implementations. Hopefully that will change if Akku gains some popularity.

And finally, a bit of news for everyone who wants to use the git master version of Akku: you no longer need to manually bootstrap the dependencies. See CONTRIBUTING.md for instructions. Happy hacking!

So many package managers

weinholt — Sun, 25 Feb 2018 01:00:00 +0100

In a previous article, I wrote about Akku.scm, a package manager for Scheme. It is far from being the first package manager or even the first for Scheme. There have been at least a dozen failed attempts at getting something going.

Dorodango is another package manager aimed at R6RS. It works more like a system package manager (apt, dpkg) in some sense. You point it at a repository and it can then install packages for you. It is not project-oriented and does not help you with locking specific dependencies for your project. It was later forked as Guildhall for GNU Guile. That one has also stopped moving (perhaps they moved to Guix).

Another package manager is Raven, which is targeted at Chez Scheme. It wants to be installed by root into /usr/local but can then be used by unprivileged users. It’s quite new and doesn’t really have a lot of features yet. It works and it can download some packages. You can have a look at the code yourself, it’s quite small.

There are a bunch of defunct package managers: Alex Shinn’s Common-Scheme, Will Donnelly’s UnCommon Lisp (UCL), Marc Feeley’s Scheme Now! (the first Snow), Aaron Hsu’s DeSCoT, Higepon Taro Minowa’s spon and Manuel Serrano’s ScmPkg.

Centrifugal vs. centripetal

The ScmPkg paper (Serrano and Gallesio, 2007) is interesting reading. They make the distinction between centrifugal and centripetal approaches. The centrifugal approach is to attract users to the language implementation by having a lot of libraries, so that a community forms around it, which then contributes more libraries to it. The centripetal approach is to create some common framework that, if used, lets unmodified Scheme code run in multiple implementations.

There are successful centrifugal projects. SchemeSpheres is, according to their site, like the batteries (as in batteries included) for the Gambit Scheme compiler. The site is a bit wonky at the moment and the blog was last updated in 2014, but they have around 150 libraries. Even more successful, I would say, is Eggs for Felix Winkelmann’s Chicken Scheme. But the grand winner in this contest must be Racket Packages for Racket. You know you have a winner on your hands when they have four different sets of bindings for ØMQ. (Snide remarks aside, they are very popular.)

There are centripetal approaches alive today, but they are not anywhere near as popular as Eggs or Racket Packages. Snow is the successor to the old Snow (Scheme Now). It’s alive and in use by some Scheme implementers and a handful of other people. Eerily similar to Snow is Snow2 by Seth Alves. Some or all of this is the basis for a rather flawed R7RS package repository format.

My pointy analysis (the rant)

The centrifugal approaches are excellent for the implementations that they target. Chicken and Racket are themselves excellent implementations and better off for their package repositories. But it’s not directly useful to Scheme as such. The packages are not portable between implementations. Not what I’m looking for.

Centripetal is no good either. In my opinion, Snow suffers from the same problems as other centripetal approaches. The packages in ScmPkg used .spi files that defined an interface and did include of some code from regular naked .scm files. Snow does the same but calls those files .sld and relies heavily on cond-expand. Same kind of animal. So every library is divided in two files and on top of that it’s #ifdef Hell all over again.

I don’t think Akku falls directly into either centrifugal or centripetal, and neither should it. Something else is going on. As I wrote in Akku’s README: It grabs hold of code and vigorously shakes it until it behaves properly.

Akku is built on the solid ground of R6RS Scheme. This provides a target that is well-specified and stable. Sure, some things are not specified, such as how libraries are stored in the file system, but that is where Akku comes in and bridges the gap between the code and the implementation. I think this is a better approach than, let’s say, smugly not providing a REPL just because R6RS didn’t specify the procedures necessary for it. (Such an implementation should, to be consistent with its ideals, not permit loading libraries from the file system either).

R6RS does not have cond-expand. Instead there is a de facto standard of loading .<impl>.sls before .sls that all R6RS implementations support, as well as Akku. This works better than the cond-expand forms which tend to show up in every little library file. (Can you find the typo’d SRFI number hidden in one of the cond-expands in Chibi Scheme?). The .<impl>.sls approach, on the other hand, results in a few libraries where all the non-portable stuff goes. You end up building reusable abstractions instead of clutter. That is just better.
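As a rough illustration of that convention, the lookup can be thought of as trying an implementation-specific filename before the generic one. The sketch below is Python rather than Scheme. The function name, the implementation names and the `compat` library stem are assumptions for the example, and real implementations differ in details such as search paths and name escaping.

```python
import os
import tempfile

def find_library_file(directory, stem, impl):
    """Return the first existing candidate, preferring stem.<impl>.sls."""
    for name in (f"{stem}.{impl}.sls", f"{stem}.sls"):
        path = os.path.join(directory, name)
        if os.path.exists(path):
            return path
    return None

with tempfile.TemporaryDirectory() as d:
    # A portable fallback and an implementation-specific override.
    for name in ("compat.sls", "compat.chezscheme.sls"):
        open(os.path.join(d, name), "w").close()
    chez = os.path.basename(find_library_file(d, "compat", "chezscheme"))
    other = os.path.basename(find_library_file(d, "compat", "ikarus"))
    print(chez, other)
```

The implementation with an override picks it up, and everyone else falls back to the portable file, so the non-portable code stays in a handful of clearly named files.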

And R6RS already supports libraries. ScmPkg had to target any number of module systems. I think it is rather telling that the centripetal approaches tend to expect implementers to write their own clients for the package system.

Minor R7RS rant

I cannot write my software with R7RS as the target language. My critiques are too numerous and I don’t even know how my name got into the standard.

However the library system in R7RS is quite alright, basically being a downgraded copy of that in R6RS, and a future Akku version is likely to support R7RS/Snow packages. They would be installed both for use directly in R7RS implementations and lightly converted for use in R6RS implementations. Going in the other direction is not possible in the general case (see the botched attempts at porting Industria to R7RS).

To me R7RS-large looks like the centrifugal approach, except applied directly to the language standard. Most of what it attempts to accomplish has been possible with R6RS all along. More progress would have been made in this if Scheme standardization had not been turned into an “Us vs. Them” game. It disillusioned and demotivated many schemers.

Besides, implementers who want access to many libraries can add R6RS support, which I think will be less effort than adding R7RS-large.

Security

I stand in front of you. I’ll take the force of the blow. Protection.

– Massive Attack – Protection

Am I the only one to see that the Snow repository format uses unpadded RSA signatures, to be verified by keys sent next to the signatures over plain http? As we say in Sweden: va?
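For readers wondering why the missing padding matters: textbook RSA is multiplicative, so valid signatures can be combined into a valid signature on a new value without the private key. The toy Python example below uses tiny demonstration parameters (p = 61, q = 53) that have nothing to do with the Snow format, and it signs bare integers rather than hashed and padded messages.

```python
# Toy textbook-RSA key, far too small for real use.
p, q = 61, 53
n = p * q            # modulus
e = 17               # public exponent
d = 2753             # private exponent: (e * d) % ((p - 1) * (q - 1)) == 1

def sign(m):
    """Unpadded RSA signature; needs the private exponent d."""
    return pow(m, d, n)

def verify(m, s):
    """Unpadded RSA verification; needs only the public key (n, e)."""
    return pow(s, e, n) == m % n

m1, m2 = 42, 55
s1, s2 = sign(m1), sign(m2)

# Forge a signature on m1 * m2 using only the two existing signatures.
forged = (s1 * s2) % n
print(verify(m1 * m2, forged))
```

A padding scheme destroys this algebraic structure. Fetching the verification key over the same unauthenticated channel as the signature is a separate problem that padding does not fix.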

The other package managers listed are not any more serious about security, but at least they don’t pretend to be. Racket is again the best of the bunch.

I’m taking Akku’s security seriously. That includes using standard cryptographic protocols and algorithms, independently verifiable signatures on the package index, message digests on all downloaded code, no arbitrary code execution on installation and manual reviews before code even shows up in the official index.

We are all friends here but let’s not kid ourselves, the Internet is a wild place, and let’s have some standards.

Introduction to Akku.scm

weinholt — Sat, 24 Feb 2018 01:00:00 +0100

For the past few months I’ve been working on Akku.scm, a language package manager for R6RS Scheme. It’s not the first one for Scheme and it’s not even the first for R6RS. But it’s here, it’s yet another package manager, it works and I’m using it.

Language package managers are specialized to some specific set of programming languages. They are not general tools to distribute any kind of software. The purpose is to aid developers in managing the dependencies of their code. Here’s a quick demo:

As the demonstration shows, there are three parts: manifest, lock and install. I started writing Akku.scm in reverse order. I first wrote the installer, then applied it to manually written lockfiles, and then wrote the code that computed the lockfile automatically.

Installing a locked set of projects

The lockfile contains specifications of projects to download and install. Any code pulled in by Akku.scm’s installer will come from a lock specification in the lockfile. Currently Akku can clone git repositories and install R6RS libraries/programs, but the format is flexible enough to support anything.

The lockfile will always contain some cryptographic checksums. Today this is the sha1 commit id to check out in git. Even when a git tag is used for the checkout, it is verified against the sha1 commit id. This is because git tags can be replaced and are not cryptographically secure by themselves. When file downloads are later added they will have their sha256 digests in the lock, and so on. The locks come from the package index and that is signed with an OpenPGP signature.
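A minimal sketch of such a digest check, written in Python for brevity rather than in Akku’s Scheme. The lock entry layout and the field name sha256 are made up for this illustration; the text above only says that downloads will carry sha256 digests in the lock.

```python
import hashlib
import hmac

def digest_matches(data: bytes, lock_entry: dict) -> bool:
    """Check downloaded bytes against the digest pinned in a lock entry.

    The "sha256" key is a hypothetical field name, not Akku's lockfile format.
    """
    actual = hashlib.sha256(data).hexdigest()
    expected = lock_entry["sha256"].lower()
    # compare_digest does not stop at the first differing character
    return hmac.compare_digest(actual, expected)
```

The point of the design is that the expected value comes from the signed index, never from the server that delivered the file.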

Other than code locations and checksums, the lockfile can also contain instructions for what to do with the downloaded projects. By default everything is treated the same, as R6RS libraries and programs. Reasonable future extensions include: library name prefixing, library filtering (to split a project into multiple packages), build instructions for code loaded via FFIs, and conversion of R7RS libraries.

Installation is not as simple as just copying files that match *.sls. Akku has a repository scanner that analyzes all files to locate and categorize libraries, programs, included files and license notices. It furthermore has code to recreate the pathnames of libraries based on the library names and the various rules used in different Scheme implementations. Example: (srfi :1 lists) is loaded from srfi/:1/lists.sls in Chez Scheme, srfi/1/lists.sls in IronScheme and srfi/%3a1/lists.sls in Ikarus. Akku installs it at all these locations.
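The pathname rules from that example can be sketched in a few lines of Python. This only generalizes from the single (srfi :1 lists) example given above; the function name and the dictionary keys are invented here, and the real rules in Akku and in each implementation cover more cases.

```python
def library_paths(name):
    """Map a library name, e.g. ["srfi", ":1", "lists"], to candidate paths.

    Assumption: only the leading-colon convention from the example is handled.
    """
    def join(parts):
        return "/".join(parts) + ".sls"

    return {
        # Chez Scheme keeps the component as written
        "chez": join(name),
        # IronScheme drops the leading colon
        "ironscheme": join([p[1:] if p.startswith(":") else p for p in name]),
        # Ikarus percent-encodes the colon
        "ikarus": join([p.replace(":", "%3a") for p in name]),
    }
```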

The result is that you can simply point the installer at a repository and it automatically figures out what code there is and how to install it in the library path.

Solving

I’ve briskly ripped out the dependency solver from Andreas Rottmann’s Dorodango. The solver is a Scheme port of the one in Debian’s aptitude. Its inputs are the dependencies listed in the package manifest in combination with the package index. The output is a set of package versions that go into the lockfile.

All packages have SemVer versions and package dependencies are listed as ranges with a syntax borrowed from npm. Essentially SemVer means that versions are written as X.Y.Z, where X is incremented when breaking backwards compatibility, Y when adding features and Z when fixing bugs. Usually 0.x.x implies that compatibility is not guaranteed.
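As an illustration of what such a range means, here is a Python sketch of npm’s caret operator for plain X.Y.Z versions. It ignores pre-release tags and the other range operators, and it is not taken from Akku’s code.

```python
def parse(version):
    """Turn "1.2.3" into the tuple (1, 2, 3)."""
    return tuple(int(part) for part in version.split("."))

def caret_upper_bound(base):
    """First version that ^base no longer accepts."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)   # 0.y.z: a minor bump may break things
    return (0, 0, patch + 1)       # 0.0.z: every release may break things

def satisfies_caret(version, base):
    """True if version lies in the half-open range accepted by ^base."""
    v, b = parse(version), parse(base)
    return b <= v < caret_upper_bound(b)
```

So ^1.2.3 accepts 1.4.0 but not 2.0.0, while ^0.2.3 already rejects 0.3.0.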

The solver’s job is to take the package’s direct dependencies and select a set of compatible package versions that pulls in not just the immediate dependencies, but also their dependencies, and then the next level of dependencies. You need package A and package A needs package B, so the lockfile must have both A and B. The trick for the solver is to get the highest versions which are all compatible with each other. This is in general a difficult problem, NP-complete actually.
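To make the problem concrete, here is a toy backtracking solver in Python. It is not the aptitude algorithm that Akku uses, just a naive highest-first search over a made-up index format where versions are tuples and constraints are half-open (low, high) ranges.

```python
def solve(index, pending, chosen=None):
    """Pick one version per package so that every constraint holds.

    index:   {name: {version: {dep_name: (low, high)}}}  (hypothetical format)
    pending: list of (name, (low, high)) constraints still to satisfy
    Returns {name: version}, or None if no selection works.
    """
    chosen = dict(chosen or {})
    if not pending:
        return chosen
    (name, (low, high)), rest = pending[0], pending[1:]
    if name in chosen:
        # Already locked: the earlier pick must satisfy this constraint too
        return solve(index, rest, chosen) if low <= chosen[name] < high else None
    for version in sorted(index[name], reverse=True):  # try highest first
        if low <= version < high:
            deps = list(index[name][version].items())
            result = solve(index, rest + deps, {**chosen, name: version})
            if result is not None:
                return result
    return None  # dead end, the caller backtracks
```

In the worst case this explores every combination of versions, which is where the NP-completeness shows.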

For now the solver is working beautifully, but that might change as the package index grows and dependencies grow more complex. The way out of the NP trap is to switch to a simpler problem. Akku can later be extended to do what npm does: if the dependencies want to use package A both in version X and Y, then npm will install both X and Y at the same time. I think that with proper care this will mostly work in R6RS.

Infrastructure - not there yet

The next natural steps for Akku involve infrastructure for publishing and discovering packages. These are in progress. Currently the package index can be updated with akku update, but there also need to be commands to publish packages and securely bind an OpenPGP key to the publisher and the package names.

To make the situation complete there also needs to be a web site connected with the package index. Package documentation and testing need to be handled as well. There must also be support for publishing tarballs rather than git repos.

Epilogue

I have mentioned npm a few times, but don’t be led to believe that I think npm is a good example of a package manager. It seems that every few months there’s a scandal about npm or some other popular package manager. It has become something of a fashion to publicize package manager failures. Everyone has heard of left-pad and recently npm would even chown your filesystem into chaos. So now everyone should switch to Yarn, which is advertised as… Mega Secure? Golang’s go get uses GitHub URLs without any commit ids and someone thinks it’s GitHub’s fault that things can go wrong. Okay. It’s the new normal. (Russ Cox writes about vgo where these problems are fixed).

I have been wanting to get a package manager going for a while, but it was only after reading Sam Boyer’s article So you want to write a package manager that the final pieces fell in place. Sam’s work is for Golang, which is somewhat more popular than R6RS. Akku has an easier problem to solve.

“Python is popular; we’re not! We have an advantage!”

— Abdulaziz Ghuloum

For even more opinions about package managers and Scheme in general, see the next article: So many package managers.

Linting Scheme with r6lint

weinholt — Sat, 08 Apr 2017 02:00:00 +0200

I find it useful while working in Python, JavaScript or C to have Emacs show me the location of code errors. For Python there is Pylint and for JavaScript one can use JSHint and a few others. And of course with C there was the original lint, but today the compilers themselves generate quite good warnings. These linters are easily integrated with Emacs via Flycheck, which highlights errors in the code. Finding that they produce too many errors when fed Scheme code, I decided to make my own linter, r6lint.

Some normal things are expected from a linter:

It should not produce irrelevant messages.
It should show the location of the problem.
It should do some useful analysis.

I wanted my linter to warn about bad style (possibly according to Riastradh’s Lisp Style Rules), improper usage of procedures and to show the location of unused variables. This last part is something that even the original lint did. Of course, a linter performs static analysis and this limits what it can do. But it should at least find some problems that compilers don’t bother to warn about. And a linter can be a social tool, helping to spread awareness of good style.

Practical issues

Scheme is pretty good for working with Scheme code, but in this case the tools in the standard library are not adequate. The read procedure that parses S-expressions does not keep the source information, so e.g. the location of procedure definitions would be lost. The linter needs this information and that means a custom reader must be used.
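What “keeping the source information” means can be shown with a tiny tokenizer, here in Python. It is only a sketch: it knows about parentheses and whitespace-delimited atoms, not strings, comments or the rest of the R6RS lexical syntax, and it is unrelated to the actual reader in r6lint.

```python
def tokenize(text):
    """Yield (token, line, column) with 1-based lines and 0-based columns."""
    line, col, i = 1, 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "\n":
            line, col, i = line + 1, 0, i + 1
        elif ch.isspace():
            col, i = col + 1, i + 1
        elif ch in "()":
            yield ch, line, col
            col, i = col + 1, i + 1
        else:
            # An atom runs until whitespace or a parenthesis
            start = i
            while i < len(text) and not text[i].isspace() and text[i] not in "()":
                i += 1
            yield text[start:i], line, col
            col += i - start
```

A reader built on top of such tokens can attach a position to every datum it returns, which is exactly what the standard read throws away.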

Scheme is not limited to the syntax provided by the language designer and compiler implementer. Code can define new syntax and package it in libraries. R6RS Scheme supports syntax-case, which allows macros to run arbitrary code at compile time. These can introduce new variables and control structures. If the linter didn’t understand these then the analysis would be very lacking. So when the parsed S-expressions are in memory they need to be macro expanded.

The macro expander is however not exported by the standard libraries. One of the reasons for this is that the output from the expander is very implementation-specific. Exposing the macro expander wouldn’t automatically mean that programs could do anything useful with its output, because the forms it returns do not need to be standard Scheme forms.

Practical solutions

I happened to already have a lexer and parser for R6RS and for this project I’ve improved it so that it keeps source information. I also modified the reader to be tolerant of errors, so it can emit more than one error message. If packaged separately it should be called tra-la-la, because it will happily ignore all possible errors and continue reading until end of file.

The next part of the solution is a macro expander. For this I dug up the portable syntax-case implementation by Abdulaziz Ghuloum and R. Kent Dybvig. The official code repository is in Launchpad as lp:r6rs-libraries, but some fixes and improvements can be found in the psyntax embedded in Ikarus and IronScheme.

I made my own modifications to psyntax. There is a small change in how source information is handled so that the reader’s annotations can be used. Furthermore there is no longer any need for compatibility libraries. One assumption of psyntax is that it will be integrated in a Scheme implementation. This means that it needs access to a lower level of the Scheme implementation than is accessible from R6RS. It wants to read and set top-level variables that are reachable from eval’d code (it uses eval to run user-provided macros). In R6RS there is no portable interaction environment, which would normally provide this kind of semantics. I’ve worked around this by placing macro-defined global variables in a hashtable.

Another problem is that psyntax needs to generate unique symbols. This is an important feature for syntactical hygiene: if a macro contains the variable x it should not clash with the macro user’s variable x. In R6RS there isn’t really a gensym, but macros still need to be able to generate temporary names, so access to the host Scheme’s gensym can more or less be finagled by using the standard procedures generate-temporaries and syntax->datum. A requirement from the linter (not psyntax) is that a gensym must be possible to turn back into the original symbol. In Chez Scheme this isn’t a problem due to an innovative gensym that works with symbol->string. But generally in other implementations the name returned could be anything, so the linter saves all gensym names in a hashtable.

Finally the output from psyntax is records instead of S-expressions. In part this eliminated the need to represent the void value, but primarily it was useful to get a more general way to store source information.

Lint it

In r6lint the analysis happens on several levels. The lexer itself warns about lexical violations, e.g. unexpected end of file, characters outside the valid Unicode range and invalid identifiers. The reader finds problems with mismatching braces and other structural problems. The tokens from the lexer are also used to detect formatting errors, e.g. hanging parentheses, trailing whitespace and other whitespace issues.

Syntactical violations are reported during macro expansion. Exceptions from the expander are caught and transformed into something useful. This doesn’t do much more than a compiler already does, except it tries to preserve source information.

The more interesting analysis has barely even been implemented, but a proof of concept is there. The records returned by psyntax are fed into a simple analyzer that warns about unused variables.
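The idea behind such an unused-variable pass can be sketched in Python over a toy expression tree. The node shapes ("let", "ref", "call", "const") are invented for this example and have nothing to do with the records that psyntax returns in r6lint.

```python
def unused_variables(expr):
    """Return the names bound by "let" nodes that are never referenced."""
    unused = []

    def walk(node, scopes):
        kind = node[0]
        if kind == "ref":
            # Mark the innermost binding of this name as used
            for scope in reversed(scopes):
                if node[1] in scope:
                    scope[node[1]] = True
                    break
        elif kind == "let":
            _, name, init, body = node
            walk(init, scopes)            # the init cannot see the new binding
            scope = {name: False}
            walk(body, scopes + [scope])
            if not scope[name]:
                unused.append(name)
        elif kind == "call":
            for part in node[1:]:
                walk(part, scopes)
        # "const" nodes have nothing to visit

    walk(expr, [])
    return unused
```

Because this runs after macro expansion in the real linter, it only has to understand a handful of core forms rather than every macro a program defines.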

Wonders

I integrated the linter with my editor before it was working. At one point while I was coding the linter sprang to life and started to warn about errors in itself. This sort of thing tends to happen a lot with Scheme.

In summary there is a new R6RS Scheme frontend that is designed to run standalone in any R6RS implementation. It feeds a simple static analyzer where new analysis passes can be plugged in. It’s an interesting framework that I hope will grow more and more featureful.

Structure of the ARM A64 instruction set

weinholt — Sun, 29 Jan 2017 01:00:00 +0100

Earlier this year I bought a Raspberry Pi 3 to have as an AArch64 development machine. The fastest way to get familiar with an instruction set is to write a disassembler for it and I’ve made one for 64-bit ARM in R6RS Scheme as part of the machine-code project. The instruction set is called ARM A64, instructions are always 32 bits wide and they have a neat structure which is pretty fast to decode in software.

The architecture has 31 integer registers (x0-x30). There is also a stack pointer register and a zero register that always contains zeroes. Both these registers are encoded as register number 31, and it’s up to each instruction if an operand can use the stack pointer or the zero register. The x30 register is used to store the return address. These registers are all 64-bit registers and the lower 32 bits can be accessed using the names w0-w30. Operations that write to the lower 32 bits also clear the upper 32 bits, just like on AMD64.

There are also 32 registers usable as either floating point registers or 128-bit vector registers. As vectors they support different arrangements that are either 64 or 128 bits in total, containing 8-bit, 16-bit, 32-bit or 64-bit quantities. There are many instructions that operate on multiple quantities at the same time, which is an interesting way to speed up code. Multiple loop iterations can be run simultaneously.

The instructions are documented in the ARM ARM for ARMv8-A. I’ve counted, not including instruction aliases, 442 instruction mnemonics (things like ADD, EOR, B.EQ, etc). They are organized in what is basically a four-level table: main encoding, instruction group, decode group and instruction. Chapter C4 of the manual follows the same structure. This structure is nice for fast decoding, but it’s not strictly necessary since all encodings at the instruction level still need to have a unique meaning.
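The first level of that table can be sketched as a mask-and-value match on bits 28 to 25 of the instruction word. This is a Python illustration, not code from the disassembler; the bit patterns follow the top-level table in the ARMv8-A manual as I understand it, and the group labels are paraphrased.

```python
# (mask, value, group) applied to bits 28..25 of the instruction word
MAIN_ENCODINGS = [
    (0b1110, 0b1000, "data processing (immediate)"),
    (0b1110, 0b1010, "branches, exceptions and system"),
    (0b0101, 0b0100, "loads and stores"),
    (0b0111, 0b0101, "data processing (register)"),
    (0b0111, 0b0111, "data processing (SIMD and floating point)"),
]

def main_encoding(insn: int) -> str:
    """Classify a 32-bit A64 instruction word by its main encoding."""
    op0 = (insn >> 25) & 0b1111
    for mask, value, group in MAIN_ENCODINGS:
        if (op0 & mask) == value:
            return group
    return "unallocated"
```

For example, the word #x8B020020 (add x0, x1, x2) lands in the register data processing group. Each further level of the table repeats the same trick with more fixed bits.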

For each instruction mnemonic there can be multiple variants that enable the instruction to handle different types of operands. An example of this is the FMUL instruction that multiplies two floating point values. In a C program it would look like a = b * c. In A64 assembler it might look like one of these, depending on what the surrounding code does:

fmul s0, s1, s2            ;single precision floats
fmul d0, d1, d2            ;double precision floats
fmul v0.2s, v1.2s, v2.2s   ;vectors with two singles
fmul v0.4s, v1.4s, v2.4s   ;vectors with four singles
fmul v0.2d, v1.2d, v2.2d   ;vectors with two doubles
fmul s0, s1, v2.s[0]       ;multiplies by a vector element
fmul d0, d1, v2.d[0]
fmul v0.2s, v1.2s, v2.s[0] ;combinations of the above
fmul v0.4s, v1.4s, v2.s[0]
fmul v0.2d, v1.2d, v2.d[0]

That’s quite a few variants for a single mnemonic. Not all mnemonics have this many variants, but depending on how one counts I estimate that there are in total around 1000-2000 variants. The instruction set designers had to fit all these variants into 32 bits, while at the same time making space for instructions that encode relatively large immediate operands, and not forgetting about leaving space for future extensions. As if that wasn’t difficult enough, the instructions should also be easy to decode with hardware.

Instruction encodings

I’ve extracted the tables from my disassembler, rendered them with the bit-field package, and made them slightly interactive. If you’re reading this in a browser you can see the encodings below. The thing to notice is that each layer adds extra fixed bits: fields that must be a fixed 0 or 1 value. (The last level, the instruction level, is not shown in this table). Two encodings under the same parent always have some differences in these fields, so that they can be separated by an instruction decoder. Click an encoding to expand the next level of encodings.

There are many conventions in the field names. Instructions that take register operands encode them in fields named Rd, Rn and Rm. Immediate values (integers, PC-relative offsets, etc) are named imm. Fields that change the type of operation tend to be called opN or opcode. In general a few of the fields encode the operation (or the size of the operation) and the rest encode the operands.
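Extracting such fields is plain shifting and masking. The Python sketch below uses the usual positions of Rd, Rn and Rm (bits 4-0, 9-5 and 20-16); not every instruction has all three fields, so a real decoder only reads the ones that the matched encoding defines.

```python
def bits(word: int, high: int, low: int) -> int:
    """Return bits high..low (inclusive) of word as an integer."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)

def register_fields(insn: int) -> dict:
    """Pull the three common register fields out of an instruction word."""
    return {
        "Rd": bits(insn, 4, 0),    # destination register
        "Rn": bits(insn, 9, 5),    # first source register
        "Rm": bits(insn, 20, 16),  # second source register
    }
```

Applied to #x8B020020, which encodes add x0, x1, x2, this gives Rd=0, Rn=1 and Rm=2.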

Room to grow

The image below shows the encoding space of the instruction set. The x axis goes from 0 to 2¹⁶-1 and encodes the lower 16 bits of the instruction space, and the y axis contains the upper 16 bits. The different colors denote different decode groups, i.e. all the encodings at the third level of the table above. (There is probably a better representation).

All the dark spots are places where ARMv8 does not have any allocated instructions, or the encoding is reserved. For many instructions there are some fields that have reserved encodings and these are also dark.

Even if instructions are kept to the fixed 32 bit encoding there is still plenty of room for the instruction set to grow.

Impression

ARM A64 is a quite clean instruction set with only a few quirks here and there in its encoding. Compared to AMD64 it has twice the number of registers, a clean separation of load/store instructions, clean RISCy operands (mostly one destination register and two source registers) and of course the register names and most mnemonics are totally different. Both have 128-bit vector registers and 64-bit integer registers and a 64-bit address space. They look quite similar, except everything’s different.

Splitting Industria

weinholt — Sat, 14 Jan 2017 01:00:00 +0100

Recently a friend lent me the book Start With Why by Simon Sinek. It made a lot of sense to me and made me look at my own projects in a new light. Industria is a set of libraries for R6RS Scheme that do, well, quite a few different things. There’s cryptography, compression, a few network protocols, various things, but also an assembler and a few disassemblers. It has many things, but it doesn’t truly have a “why”.

The original idea was to make pure Scheme implementations of things that are needed by user space. Libraries for those things that are needed at the base of an operating system, and everything that’s needed to communicate with computers running other modern operating systems. So a few things made sense to add. But at some point there were unnecessary things added (e.g. the broken FiSH crypto protocol for IRC), and at some point the complexity of multiple systems is just too much to contain in a single project (the DNS libraries languish, the TLS client is not up to date, etc).

So the original “why” got lost at some point. And now the project can move in so many different directions that it’s not clear what, if anything, should be done.

That’s the background for my decision to split Industria into multiple projects. The first split-off is the machine-code project. This is where Industria’s assembler, disassembler and object code libraries have moved. To get some momentum into this project I’m also writing a new disassembler for 64-bit ARM which will be released soon.

There are a few other mega projects like Industria in the R6RS world. I think it would be beneficial for more projects to do similar splits. Of course, this will increase the need for a proper package manager and, more importantly, a public package repository. But I see that as a positive side effect. I think that we need to experience some discomfort before we can gather enough motivation to improve our infrastructure. There is already Dorodango, but I’m not aware of a public package repository for R6RS. I think that would be an interesting project in itself.

Automated Testing of Zabavno

weinholt — Fri, 23 Dec 2016 01:00:00 +0100

I had already been programming for twenty years before I started my current project at Ericsson. During my time in the project I’ve come to really appreciate a few things that were new to me, like Continuous Integration (CI) and automated testing. I recently set up CI for Zabavno on GitHub with a new test case generator and immediately found bugs.

The approach

Zabavno is an x86 emulator and the x86 is a notoriously tricky architecture. Of course, it’s not the first x86 emulator and not the first one that needs testing. In the paper Design and Testing of a CPU Emulator (2009, Forin and Liu) a very systematic way to generate test cases for the x86 is described. But the findings in their evaluation section are a bit puzzling to me. When the processor manuals say that the flags are undefined it should not really be surprising that they may have been modified. They describe errors in the processor manual’s description of the instruction encodings. My own approach to the x86 has been to use the opcode tables, but even they have errors. They’ve also rediscovered opcode 82, which is actually already in the opcode map. Their techniques seem quite complex for what they accomplish. My main takeaway from this paper is to generate test cases automatically and run them on actual hardware, but to do it in an easier way. (I’ll also be stealing their parity matrix for the aas instruction).

A simpler approach is to use the opcode tables to generate random operands for instructions. The instructions can then be run in the emulator and the results incorporated into a binary that runs them on real hardware. I took a similar approach in schjig, which is a program that tests R6RS Scheme implementations by comparing two implementations. In the 2009 paper they generate C programs that are run under a “test execution engine”. This might be necessary later but the first version will incorporate everything into the test binary, which will be built using the machine-code x86 assembler.

Generating test cases

In schjig there is a table of Scheme procedures along with a description of their arguments and return values (similar to what is found in the R6RS documents themselves):

(define ops
  '#(((bool) number? obj)
     ((bool) complex? obj)
     ((bool) real? obj)
     ((bool) rational? obj)
     ((bool) integer? obj)
     ((bool) real-valued? obj)
     ((bool) rational-valued? obj)
     ((bool) integer-valued? obj)
     ((bool) exact? z)
     ((bool) inexact? z)
     ((z) exact z)
     ((bool) = z z z ...)
     ((bool) < x x x ...)
…

These descriptions are used to generate test cases. If an argument has type z then a random complex number is generated as an argument, and of course the right number of arguments should be generated. (Partly this table was also automatically generated by eval’ing procedure calls with random arguments. It turns out that the procedure signatures differ slightly between different Scheme implementations).
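A rough Python sketch of how one table row can turn into a random call. The tuple layout mirrors the Scheme table above, but the value generators and the choice of at most three extra arguments are assumptions made for the example, not what schjig does.

```python
import random

def random_value(kind, rng):
    """Produce a random argument for one type letter from the table."""
    if kind == "z":    # any complex number
        return complex(rng.randint(-9, 9), rng.randint(-9, 9))
    if kind == "x":    # a real number
        return rng.uniform(-9, 9)
    if kind == "obj":  # any object at all
        return rng.choice([0, 1.5, "text", True, None])
    raise ValueError(f"unknown type letter: {kind}")

def random_call(signature, rng):
    """Turn (returns, name, *params) into (name, [arguments])."""
    _returns, name, *params = signature
    if params and params[-1] == "...":
        # "z ..." stands for zero or more further arguments of type z
        fixed, repeated = params[:-2], params[-2]
        params = fixed + [repeated] * rng.randint(0, 3)
    return name, [random_value(p, rng) for p in params]
```

With the row for =, this yields between two and five complex arguments per generated call.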

Instructions in x86 opcode tables are described using what is basically the same notation (albeit more complex, with nested tables):

(define opcodes
  '#((add Eb Gb)
     (add Ev Gv)
     (add Gb Eb)
     (add Gv Ev)
     (add *AL Ib)
     (add *rAX Iz)
     #(Mode (push *ES) #f)
     #(Mode (pop *ES) #f)
     ;; 08
     (or Eb Gb)
     (or Ev Gv)
     (or Gb Eb)
     (or Gv Ev)
     (or *AL Ib)
     (or *rAX Iz)
     #(Mode (push *CS) #f)
…

The structure of the table itself is used to navigate the opcode space. From the snippet above it can be seen that the add instruction uses opcodes 00 to 05, push es is on 06, etc. Some instructions take implicitly encoded operands (e.g. *AL can only be the al register operand), but most operands need to be provided separately using additional bytes. The Eb opsyntax can be a byte register or a byte memory reference. The encoding details are left to the assembler and the task is just to generate operands that match the requirements.
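The way the table position doubles as the opcode can be shown with a flattened Python copy of the first nine entries. The Mode vectors are simplified to plain tuples here, so this is an illustration of the indexing idea only.

```python
# First entries of the one-byte opcode table, index == opcode byte
OPCODES = [
    ("add", "Eb", "Gb"), ("add", "Ev", "Gv"),
    ("add", "Gb", "Eb"), ("add", "Gv", "Ev"),
    ("add", "*AL", "Ib"), ("add", "*rAX", "Iz"),
    ("push", "*ES"), ("pop", "*ES"),
    ("or", "Eb", "Gb"),
]

def opcodes_for(mnemonic):
    """Return every opcode byte whose table entry has the given mnemonic."""
    return [byte for byte, entry in enumerate(OPCODES) if entry[0] == mnemonic]

def operand_syntax(byte):
    """Return the operand descriptions that a generator has to satisfy."""
    return OPCODES[byte][1:]
```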

Both schjig and the new test generator for Zabavno prefer to generate integers that lie close to power-of-two boundaries. This tends to uncover a lot of edge cases. The test generator is very simple right now at around 380 lines. It only generates byte register operands and byte immediates, but it is easy to extend with additional operands.
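One simple way to get integers near power-of-two boundaries is sketched below in Python. The exact distribution used by schjig and the Zabavno generator is not described here, so picking the exponent uniformly and the offset from -1, 0 and 1 is an assumption.

```python
import random

def boundary_integer(bits, rng):
    """Return 2**k - 1, 2**k or 2**k + 1 for a random k, wrapped to bits bits."""
    exponent = rng.randrange(bits + 1)
    value = (1 << exponent) + rng.choice([-1, 0, 1])
    return value & ((1 << bits) - 1)  # wrap around at the operand size

if __name__ == "__main__":
    rng = random.Random()
    print([hex(boundary_integer(32, rng)) for _ in range(8)])
```

Values of this shape, such as #x10001, #x7FFF and #x80000001, sit right where carries, borrows and sign changes happen.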

The tested instruction is placed in a Linux i386 ELF binary that’s generated using the x86 assembler of the machine-code project. There is no need to emit C code or interact with other tools at all, except for the Linux kernel, so the test runtime environment is pretty simple to build and execute. For each test case the ELF binary contains a short setup sequence followed by the tested instruction itself. Then it compares the actual register values with the register values produced by Zabavno. If there’s a mismatch it prints a failure report and exits with a non-zero status.

Bugs found

Even before the test program was finished it found a bug in the dec instruction. Here’s the report (note the difference in the flags line):

Test failed: dec-Eb-0
(mov (mem32+ scratch-flags) #x41)
(mov esp scratch-flags)
(popfd)
(mov eax #x10001)
(mov ecx #x80000001)
(mov edx #xFFFFFFF)
(mov ebx #x1)
(mov esp #x7FFF)
(mov ebp #x3FFFF)
(mov esi #x1FFFFF)
(mov edi #x40000001)
(dec al)
Result from emulation in Zabavno:
eax     #x00010000
ecx     #x80000001
edx     #x0FFFFFFF
ebx     #x00000001
esp     #x00007FFF
ebp     #x0003FFFF
esi     #x001FFFFF
edi     #x40000001
flags   #x00000257
Result from processor execution:
eax     #x00010000
ecx     #x80000001
edx     #x0FFFFFFF
ebx     #x00000001
esp     #x00007FFF
ebp     #x0003FFFF
esi     #x001FFFFF
edi     #x40000001
flags   #x00000247

It has so far found bugs in the AF flag handling of adc, dec and sbb. And it hasn’t even been activated yet for most instructions. One complication with enabling this for more instructions is that Zabavno doesn’t emulate the undefined processor flags “correctly” (it just clears them). It remains to be seen what can be done there, but it should not be very difficult for the emulator to track which flags are undefined.
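The two flags values in the report differ in a single bit, which a few lines of Python can name. The bit positions are the standard EFLAGS layout; the helper itself is just an illustration and not part of the test program.

```python
# Bit positions of the x86 status and direction flags in EFLAGS
FLAG_BITS = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 10: "DF", 11: "OF"}

def differing_flags(emulated: int, actual: int) -> list:
    """Name the flags that differ between two EFLAGS values."""
    diff = emulated ^ actual
    return [name for bit, name in sorted(FLAG_BITS.items()) if (diff >> bit) & 1]
```

For #x257 against #x247 the XOR is #x10, which is bit 4, the AF flag: the emulator set it after dec al and the processor did not.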

Continuous Integration

The test suite should run automatically. GitHub offers a large number of CI tools, so it can be hard to know where to start. I naturally picked the one with a moustache, Travis CI.

Setting up an account on Travis CI is just a matter of logging in via your GitHub account and granting some innocuous permissions. Travis will then let you enable testing for any of your repositories.

Tests are configured in a configuration file that you commit to your repository as .travis.yml. Right now they don’t have a runtime for Scheme. But they do have runtimes for C and most of the big popular languages, so installing a Scheme as part of the build process isn’t very difficult. (And besides that, they also let you use apt to install packages from Ubuntu, and there are a bunch of Schemes available through there). Here’s the configuration file used by Zabavno (slightly reformatted for the web). It downloads Chez Scheme, generates a test suite and runs it:

language: c

os:
  - linux

compiler:
  - gcc

before_script:
  # Install Chez Scheme
  - "wget https://github.com/cisco/ChezScheme/archive/master.zip
    -O ChezScheme-master.zip"
  - unzip ChezScheme-master.zip
  - "pushd ChezScheme-master && ./configure
      --installprefix=$TRAVIS_BUILD_DIR/chez &&
     make && make install && popd"
  - export PATH=$TRAVIS_BUILD_DIR/chez/bin:$PATH
  - export CHEZSCHEMELIBDIRS=$TRAVIS_BUILD_DIR/..:$TRAVIS_BUILD_DIR
  # Install machine-code
  - "wget https://github.com/weinholt/machine-code/archive/master.zip
      -O machine-code-master.zip"
  - unzip machine-code-master.zip
  - mv machine-code-master machine-code

script:
  - programs/zabavno --help
  - tests/x86/generate.scm && chmod +x generate.out
  - ./generate.out

Simply commit a file similar to this one, named .travis.yml, and push it to a branch. For the initial setup you can push it to a test branch and Travis will still see it.

Travis sends you emails about the build status and also shows the build output as the build is happening. To top it all off there’s a status image you can link to from your project. Now everyone can see that the code is working.

Make Test Inputs with Prolog

weinholt — Wed, 23 Nov 2016 01:00:00 +0100

A while back I wrote a parser for R6RS Scheme numbers, or the string->number procedure. Numbers in Scheme are somewhat sophisticated and can be written in some surprising variations and I wanted some test inputs for verifying that the parser doesn’t crash on valid inputs. Luckily, the number syntax is specified in such a way that a Prolog program can easily be written that generates test inputs.

SWI-Prolog supports an alternative syntax called definite clause grammars (DCG) that is suitable for this task. The specification in R6RS is written in a similar BNF syntax so translation is very easy. Except for this there is nothing particular about Prolog itself that makes it suitable for this task and alternatives such as miniKanren or µKanren could be used instead.

Let’s get down to numbers. The datum syntax can be found in R6RS section 4.2.1, which has this to say:

The rules for ⟨num R⟩, ⟨complex R⟩, ⟨real R⟩, ⟨ureal R⟩, ⟨uinteger R⟩, and ⟨prefix R⟩ below should be replicated for R = 2, 8, 10, and 16. There are no rules for ⟨decimal 2⟩, ⟨decimal 8⟩, and ⟨decimal 16⟩, which means that number representations containing decimal points or exponents must be in decimal radix.

In the following rules, case is insignificant.

So we’ll need to remember to replicate the rules for every radix and to handle both upper and lower case letters. (Luckily DCG handles the first for us). This text is followed up by rules that look something like this (made to be less compact than in the specification):

⟨digit⟩ → 0 ∣ 1 ∣ 2 ∣ 3 ∣ 4 ∣ 5 ∣ 6 ∣ 7 ∣ 8 ∣ 9
…
⟨number⟩ → ⟨num 2⟩ ∣ ⟨num 8⟩ ∣ ⟨num 10⟩ ∣ ⟨num 16⟩
⟨num R⟩ → ⟨prefix R⟩ ⟨complex R⟩
⟨complex R⟩ → ⟨real R⟩
    ∣ ⟨real R⟩ @ ⟨real R⟩
    ∣ ⟨real R⟩ + ⟨ureal R⟩ i
    ∣ ⟨real R⟩ - ⟨ureal R⟩ i
    ∣ ⟨real R⟩ + ⟨naninf⟩ i
    ∣ ⟨real R⟩ - ⟨naninf⟩ i
    ∣ ⟨real R⟩ + i
    ∣ ⟨real R⟩ - i
    ∣ + ⟨ureal R⟩ i
    ∣ - ⟨ureal R⟩ i
    ∣ + ⟨naninf⟩ i
    ∣ - ⟨naninf⟩ i
    ∣ + i
    ∣ - i
…

If one’s not familiar with BNF this might be tricky to read. Rule names are written in ⟨brackets⟩ and are referred to using the same syntax. A right arrow (→) says that the thing on its left side is defined as what’s on its right side. The right side contains references to rules, vertical bars (∣) that separate alternatives, and literal characters that say “this character must be here”.

One way to use this is for determining if a particular string is a valid number or not. The first rule defines a number as one of four possible things. To see if a string is a number a program can follow the rules and try to match every character in the string to a character in a rule. For the string "52697461" there should be a way to get all the way from the ⟨number⟩ rule to ⟨digit⟩, and not just once but eight times. The path might be through ⟨num 10⟩ or ⟨num 16⟩, it doesn’t really matter which.

Here is one (abbreviated) path that takes us through the rules from ⟨number⟩ to "+i": ⟨number⟩ ⇒ ⟨num 2⟩ ⇒ ⟨prefix 2⟩ ⟨complex 2⟩ ⇒ (prefix can be empty) ⇒ ⟨complex 2⟩ ⇒ + i. A program can be written that walks all paths in the rules and when it finds a dead end prints the characters it has collected along the way, and then backtracks to continue on another path.
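That walk can be sketched in Python with a generator, where trying the next alternative plays the role of backtracking. The grammar below is a made-up miniature (binary digits, at most two of them) and not the R6RS number grammar; a recursive digits rule would need a depth limit to terminate.

```python
# Toy grammar: each rule maps to a list of alternatives,
# each alternative is a list of rule names or literal strings
GRAMMAR = {
    "number": [["sign", "digits"]],
    "sign": [[""], ["+"], ["-"]],
    "digits": [["digit"], ["digit", "digit"]],
    "digit": [["0"], ["1"]],
}

def expand(symbols):
    """Yield every string that the sequence of symbols can produce."""
    if not symbols:
        yield ""
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Try each alternative in turn; exhausting one is the backtracking step
        for alternative in GRAMMAR[first]:
            yield from expand(alternative + rest)
    else:
        # A literal: emit it in front of everything the rest can produce
        for tail in expand(rest):
            yield first + tail
```

Calling list(expand(["number"])) enumerates all 18 strings of this toy grammar, from "0" to "-11".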

In Prolog with DCG the program looks remarkably similar to the rules in the specification (disregarding the insignificance of case for a moment):

scheme_number --> (num(2); num(8); num(10); num(16)).

num(R) --> prefix(R), complex(R).

complex(R) --> (real(R);
                real(R), "@", real(R);
                real(R), "+", ureal(R), "i";
                real(R), "-", ureal(R), "i";
                real(R), "+", naninf(R), "i";
                real(R), "-", naninf(R), "i";
                real(R), "+i";
                real(R), "-i";
                "+", ureal(R), "i";
                "-", ureal(R), "i";
                "+", naninf(R), "i";
                "-", naninf(R), "i";
                "+i";
                "-i").

real(R) --> (sign, ureal(R);
             "+", naninf(R);
             "-", naninf(R)).

naninf(10) --> ("nan.0"; "inf.0").

ureal(R) --> (uinteger(R);
              uinteger(R), "/", uinteger(R);
              decimal(R), mantissa_width).

decimal(10) --> (uinteger(10), suffix;
                 ".", digits(10), suffix;
                 digits(10), ".", digits0(10), suffix;
                 digits(10), ".", suffix).

uinteger(R) --> digits(R).

prefix(R) --> (radix(R), exactness;
               exactness, radix(R)).

suffix --> ("";
            exponent_marker, sign, digits(10)).
exponent_marker --> ("e"; "s"; "f"; "d"; "l").
mantissa_width --> (""; "|", digits(10)).
sign --> (""; "+"; "-").
exactness --> (""; "#i"; "#e").

radix(2) --> "#b".
radix(8) --> "#o".
radix(10) --> ""; "#d".
radix(16) --> "#x".

digit(2) --> ("0";"1").
digit(8) --> ("0";"1";"2";"3";"4";"5";"6";"7").
digit(10) --> ("0";"1";"2";"3";"4";"5";"6";"7";"8";"9").
digit(16) --> ("0";"1";"2";"3";"4";"5";"6";"7";"8";"9";
               "a";"b";"c";"d";"e";"f";
               "A";"B";"C";"D";"E";"F").

digits(R) --> (digit(R);
               digit(R), digits(R)).

When developing one’s own solution I suggest starting small: focus on the basic rules that consist almost only of characters, and build on more rules later when that’s working. There are two primary ways to use this program. Firstly, it can check if a string is a valid number or not. Load the program into SWI-Prolog and call scheme_number with a string and the empty list. Prolog will print true or false and might wait for the user to type a command (try . or ;).

$ swipl -s numbers.pl
% numbers.pl compiled 0.00 sec, 42 clauses
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 6.6.6)
…
?- scheme_number("52697461", []).
true ;
true ;
true ;
true ;
false.
?- scheme_number("42i", []).
false.

From the first test it looks like there are four different ways to parse "52697461" as a number (e.g. as decimal and hexadecimal). The program can also be run in the other direction and generate all numbers:

$ swipl -s numbers.pl
% numbers.pl compiled 0.00 sec, 42 clauses
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 6.6.6)
…
?- forall(scheme_number(X, []), writef("%s\n", [X])).
#b0
#b1
#b00
#b01
#b000
#b001
…

Oops! It is really printing all numbers. Some modifications need to be made before the program is useful. The final version has case-insensitivity added and some sample strings for those cases that would otherwise generate digits forever. After these modifications it prints useful test inputs, although it can no longer recognize all numbers. Even though the search space is now finite the program still outputs 523 908 unique numbers. But that is just a consequence of the notation.
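To give a rough idea of what “sample strings” can mean here, this is a small Scheme sketch that swaps the endless digits rule for a short list of samples and enumerates a finite slice of the grammar. It is not the final Prolog program, and the names combine and sample-integers are made up for this example.

```scheme
(import (rnrs))

;; All concatenations of one string from each list, in order.
(define (combine . choices)
  (if (null? choices)
      '("")
      (let ((tails (apply combine (cdr choices))))
        (apply append
               (map (lambda (head)
                      (map (lambda (tail) (string-append head tail))
                           tails))
                    (car choices))))))

;; radix marker, then exactness, then sign, then one of the sample
;; digit strings (instead of every possible digit sequence).
(define (sample-integers radix-marker samples)
  (combine (list radix-marker)
           '("" "#i" "#e")
           '("" "+" "-")
           samples))

(for-each (lambda (s) (display s) (newline))
          (sample-integers "#b" '("0" "101")))
```

With two samples this prints 1 × 3 × 3 × 2 = 18 strings. The search space grows multiplicatively with every rule that has several alternatives, which is how the full grammar ends up with hundreds of thousands of test inputs.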

The program finds some very strange looking numbers, e.g. #e#d-inf.0-49.83e+49|53i and #o#e-0755/0755@-0755/0755. The first is the exact decimal complex number with real part -∞ and imaginary part -49.83⋅10⁴⁹ with mantissa width 53. The second is the octal exact complex number with both magnitude and angle -1 (i.e. -1∠-1). Kind of hard to see, though. Many Scheme implementations do not support exact complex numbers and they should reject these inputs.

The testing doesn’t stop after all the lovely test cases have been fed through the parser that’s being tested. Unfortunately the program can’t generate all invalid numbers, so using its output does not completely test an implementation. A parser tested only using the strategy presented here could accept or even crash on invalid inputs. (However if crashes are possible then american fuzzy lop can be used to automatically find crashing test cases).

The fact that the parser doesn’t reject valid inputs also doesn’t say much about whether the input was parsed correctly. However, the test inputs can be fed through the parser and printed, and then be compared to a reference printout (e.g. generated with a trusted implementation).
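A comparison against a reference printout could look something like the following sketch. The reference list is a tiny hand-written stand-in for a real printout, the names check-entry and failures are invented for this example, and string->number merely plays the role of the parser under test.

```scheme
(import (rnrs))

;; Hypothetical reference data: (input . expected printout).
;; #f marks an input that the parser must reject.
(define reference
  '(("#b101" . "5")
    ("#x-ff" . "-255")
    ("1/2"   . "1/2")
    ("42i"   . #f)))

;; Parse the input, print the result and compare with the reference.
(define (check-entry parse entry)
  (let* ((n (parse (car entry)))
         (printed (and n (number->string n))))
    (equal? printed (cdr entry))))

;; Returns the entries where the parser disagrees with the reference.
(define (failures parse)
  (filter (lambda (entry) (not (check-entry parse entry)))
          reference))

(write (failures string->number))
(newline)
```

An empty list means that the parser agrees with the reference on every input. Printed forms can legitimately differ between implementations (for inexact numbers in particular), so a real comparison probably needs to be a bit more forgiving than equal?.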

This was written as part of a series of articles on stuff that has been lying around on this website for a long time without any real commentary. The original version of the number generator was written in 2012, revised the following year, and revised again for this article.

Efficient computation of the "man or boy" test

weinholt — Tue, 25 Oct 2016 02:00:00 +0200

Here’s a quote from a computer scientist living in what were clearly simpler times:

[…]. Hence I have written the following simple routine, which may separate the man-compilers from the boy-compilers: […]
– Donald Knuth

Here is Knuth’s program in ALGOL60:

begin real procedure A(k, x1, x2, x3, x4, x5);
           value k; integer k;
           begin real procedure B;
                      begin k := k - 1;
                            B := A := A(k, B, x1, x2, x3, x4)
                      end;
                 if k ≤ 0 then A := x4 + x5 else B
           end;
      outreal(A(10, 1, -1, -1, 1, 0))
end;

This seemingly innocent program uses a lot of resources to compute its result. It takes a parameter k that is used to scale up the difficulty of the problem. It’s an interesting program to use for testing language implementations, since it’s easy to verify the result and also to compare the performance with other language implementations. People have translated the program into many languages over at Rosetta Code.
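For reference, a direct Scheme translation might look like this. It is a sketch along the lines of the usual translations, with ALGOL’s call-by-name parameters modelled as procedures of zero arguments; the names A-by-name and man-or-boy are chosen for this example.

```scheme
(import (rnrs))

;; x1 ... x5 are thunks, standing in for call-by-name parameters.
(define (A-by-name k x1 x2 x3 x4 x5)
  (define (B)
    (set! k (- k 1))                  ; k is shared with the enclosing call
    (A-by-name k B x1 x2 x3 x4))
  (if (<= k 0)
      (+ (x4) (x5))
      (B)))

(define (man-or-boy k)
  (A-by-name k
             (lambda () 1) (lambda () -1) (lambda () -1)
             (lambda () 1) (lambda () 0)))

(display (man-or-boy 10))             ; prints -67
(newline)
```

This is fine for small k, but the number of calls grows so quickly that it soon becomes impractical.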

But what if you have a groundbreaking compiler that manages to compute the solution to a larger k-value than you can find in the tables? You would want to know if you got the right answer. The “last word” from Knuth in ALGOL bulletin #19, page 8, gives us a way to easily compute the answer much faster than running the actual program:

Since then my right hand has observed that the value of A(k, x1, x2, x3, x4, x5) is equal to c1 × x1 + c2 × x2 + c3 × x3 + c4 × x4 + c5 × x5 where the coefficients are given in the following table:

k	c1(k)	c2(k)	c3(k)	c4(k)	c5(k)
≤0	0	0	0	1	1
1	0	0	1	1	0
2	0	1	1	0	0
3	1	1	0	0	0
4	2	1	0	0	0
5	3	2	1	0	0
6	5	3	3	2	0
7	8	6	9	6	0
8	14	15	22	13	0
9	29	37	48	26	0
10	66	85	102	54	0

When k ≥ 5, these values may be obtained by the relations

c1(k + 1) = c1(k) + c2(k)
c2(k + 1) = c2(k) + c3(k)
c3(k + 1) = c3(k) + c4(k + 1)
c4(k + 1) = c1(k) + c4(k) - 1
c5(k) = 0

By using Knuth’s table and relations it’s possible to compute the man or boy function quite quickly. I wrote a program that does this. A walk-through follows.

The first part of the program is a gratuitous use of macros. The define-coefficients macro defines one of the coefficients mentioned in Knuth’s last word letter (c1 to c5). Each coefficient is parameterized by the k-value, which means that we should define a procedure that takes k as an argument and returns cn(k). Knuth’s table gives us the first values it should return (consts). For the rest of the values we’ll use the corresponding relation. As a bonus the macro also creates a hashtable where it stores previously computed values, which is key to speeding up the algorithm.

(define-syntax define-coefficients
  (syntax-rules ()
    ((_ (name k) consts equation)
     (define name
       (let ((h (make-eqv-hashtable)))
         (lambda (k)
           (let ((v consts))
             (cond ((< k 0)
                    (vector-ref v 0))
                   ((< k (vector-length v))
                    (vector-ref v k))
                   ((hashtable-ref h k #f))
                   (else
                    (let ((t equation))
                      (hashtable-set! h k t) ;memoize
                      t))))))))))

Next the macro is used to define the coefficients.

(define-coefficients (c1 k) '#(0 0 0 1 2 3)
                     (+ (c1 (- k 1))
                        (c2 (- k 1))))

(define-coefficients (c2 k) '#(0 0 1 1 1 2)
                     (+ (c2 (- k 1))
                        (c3 (- k 1))))

(define-coefficients (c3 k) '#(0 1 1 0 0 1)
                     (+ (c3 (- k 1))
                        (c4 k)))

(define-coefficients (c4 k) '#(1 1 0 0 0 0)
                     (+ (c1 (- k 1))
                        (c4 (- k 1))
                        -1))

(define-coefficients (c5 k) '#(1)
                     0)

The equations given as the last argument to the macro are the relations adjusted to cn(k) instead of cn(k+1). Now that the coefficients have been defined it is easy to define A, just as Knuth observed:

(define (A k x1 x2 x3 x4 x5)
  (+ (* (c1 k) x1)
     (* (c2 k) x2)
     (* (c3 k) x3)
     (* (c4 k) x4)
     (* (c5 k) x5)))

Computing what is commonly referred to as A(k) is now achieved by calling (A k 1 -1 -1 1 0). In Chez Scheme the computation takes no time at all, even for very large k that no existing machine could possibly handle using the original algorithm:

> (time (A 11 1 -1 -1 1 0))
(time (A 11 ...))
    no collections
    0.000004543s elapsed cpu time
    0.000004149s elapsed real time
    96 bytes allocated
-138
> (time (A 128 1 -1 -1 1 0))
(time (A 128 ...))
    no collections
    0.000007392s elapsed cpu time
    0.000007114s elapsed real time
    304 bytes allocated
-4635937302118988398753530618599286738522250
> (time (A 1024 1 -1 -1 1 0))
(time (A 1024 ...))
    no collections
    0.000744346s elapsed cpu time
    0.000743677s elapsed real time
    635472 bytes allocated
-134525946204897748012677078309699584745858677996302504232754322006054514469295899995329805478290190232956296857851394596551319734862436102284482400522332522964879376131124504760141628006776332968808079671234906412428151832076596502192653046509414077174915139024920128557185075083924034657098210492453447406606370793777979291350811243549339280709645840417
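As a cross-check that avoids both the macro and the hashtables, the relations can also be stepped upward in a plain loop, starting from the k = 5 row of the table. This is only a sketch and not the program described above: coefficients and A/iterative are names invented here, and it assumes k ≥ 5.

```scheme
(import (rnrs))

;; Returns (c1 c2 c3 c4) for the given k, where k >= 5.
(define (coefficients k)
  (let loop ((i 5) (c1 3) (c2 2) (c3 1) (c4 0))   ; table row k = 5
    (if (= i k)
        (list c1 c2 c3 c4)
        ;; c4(i+1) is needed first, because c3(i+1) depends on it.
        (let ((c4^ (+ c1 c4 -1)))
          (loop (+ i 1)
                (+ c1 c2)
                (+ c2 c3)
                (+ c3 c4^)
                c4^)))))

;; A(k, 1, -1, -1, 1, 0) = c1 - c2 - c3 + c4, since c5(k) = 0 here.
(define (A/iterative k)
  (let ((c (coefficients k)))
    (+ (car c)
       (- (cadr c))
       (- (caddr c))
       (cadddr c))))

(display (A/iterative 11))            ; prints -138
(newline)
```

The loop needs no memoization because each step only looks at the previous row.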

Only one question is left unanswered: whether or not a sufficiently smart compiler can (or indeed should) transform Knuth’s original algorithm into the fast algorithm presented here.

This was written as the first part of a series of articles on stuff that has been lying around on this website for a long time without any real commentary. The code it talks about was written in 2012.

Internals of Zabavno the x86 emulator

weinholt — Mon, 24 Oct 2016 02:00:00 +0200

Zabavno (Забавно) is an x86 emulator I’ve been working on in my spare time. It translates x86 instructions into Scheme and evals them, which works surprisingly well.

The initial commit was made two years ago, but at the time I only worked on it for a few weeks. When Chez Scheme was open sourced I got interested in it again, since the techniques used depend on having a good compiler. (And Chez Scheme is a really good compiler).

Here’s a look at the internals of the emulator. The rest of this article gets very technical and assumes that the reader knows something about CPUs. The core of the CPU emulation happens in this procedure:

(define (%run-until-abort M debug instruction-limit
                          fl ip AX CX DX BX SP BP SI DI
                          cs ds ss es fs gs)
  (call/cc
    (lambda (abort)
      (let loop ((ip ip) (fl fl) (AX AX) (CX CX) (DX DX) (BX BX)
                 (SP SP) (BP BP) (SI SI) (DI DI)
                 (cs cs) (ds ds) (ss ss) (es es) (fs fs) (gs gs))
        ;; Translate instruction(s) or get an existing translation.
        (let ((trans (translate cs ip debug instruction-limit)))
          ;; Call the translation and get new values for the
          ;; registers. The translation may choose to abort in the
          ;; middle of a translation.
          (let-values (((ip^ fl^ AX^ CX^ DX^ BX^ SP^ BP^ SI^ DI^
                             cs^ ds^ ss^ es^ fs^ gs^)
                        (trans abort fl AX CX DX BX SP BP SI DI
                               cs ds ss es fs gs)))
            (loop ip^ fl^ AX^ CX^ DX^ BX^ SP^ BP^ SI^ DI^
                  cs^ ds^ ss^ es^ fs^ gs^)))))))

The translate procedure translates a block of machine instructions from memory at the address pointed to by cs:ip. The translated code is a procedure that takes the registers as arguments and returns new values for the registers. This is done over and over in a loop, with the result that the program in the emulated machine runs. Translations are cached, so if the emulator sees the same code address again it can just use the previous translation. This makes things a lot faster.
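The caching idea can be sketched with a hashtable keyed on the code address. This is not Zabavno’s actual code: the names translation-cache, linear-address, translate/cached and invalidate-translation! are made up here, the real-mode style address calculation is an assumption, and the generate argument stands in for the real code generator.

```scheme
(import (rnrs))

(define translation-cache (make-eqv-hashtable))

;; Assumption: real-mode style addressing, segment * 16 + offset.
(define (linear-address cs ip)
  (+ (* cs 16) ip))

;; Return the cached translation for cs:ip, or generate and store one.
(define (translate/cached cs ip generate)
  (let ((addr (linear-address cs ip)))
    (or (hashtable-ref translation-cache addr #f)
        (let ((translation (generate addr)))
          (hashtable-set! translation-cache addr translation)
          translation))))

;; Called when the emulated program writes to memory. Simplified: a
;; real block covers a range of addresses, not just its start address.
(define (invalidate-translation! addr)
  (hashtable-delete! translation-cache addr))
```

The second time the same cs:ip shows up, generate is not called at all and the stored procedure is returned directly.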

One might think that translation caching would violate the processor semantics in some way. Actually, it’s the opposite. Real processors have a buffer that tries to stay ahead of the actual code execution. One way to detect that code is running under a debugger is to modify an instruction that should have been pre-fetched by the processor. If a debugger is single-stepping then the modified code will be used, otherwise the original pre-fetched code is used. We can mimic the real processor’s semantics by putting multiple instructions into the same translated block. One just needs to be careful to invalidate the cache when writing to memory, but there is no need to interrupt due to writes in the middle of a translated block.

There are other benefits to putting multiple instructions in the same translated block. A lot of translated code will compute some intermediate result that is never actually used. This seems counter-intuitive since an optimizing compiler should have removed all such computations. However, the x86 architecture has a flags register that is updated automatically in a very inconsistent manner. This is how it looked in Intel’s 80386 programmer’s reference manual (1986):

The lower bits of the flags register hold information about arithmetic results: overflow, sign, zero and carry. Then there is the more interesting parity flag that contains the even parity of the lowest byte of the result, and the even more interesting auxiliary carry that signals carry/borrow from the lower four bits of the result (used for BCD). The processor can compute these flags as a side-effect of its normal operation, but the emulator has no such luxury. This is a lot of overhead for each and every arithmetic instruction.
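To make the two odd flags a bit more concrete, here is a sketch of how they can be computed in Scheme. The table lookup mirrors the way byte-parity-table is used in the translated code further down, but how Zabavno actually builds its table is an assumption on my part, and the names parity-table, parity-flag and aux-carry-flag/add are made up for this example.

```scheme
(import (rnrs))

;; Entry i is #t when byte i has an even number of 1 bits.
(define parity-table
  (let ((v (make-vector 256 #f)))
    (do ((i 0 (+ i 1)))
        ((= i 256) v)
      (vector-set! v i (even? (fxbit-count i))))))

;; PF is bit 2 (value 4); only the lowest byte of the result counts.
(define (parity-flag result)
  (if (vector-ref parity-table (fxand result 255)) 4 0))

;; AF is bit 4 (value 16): a carry out of the lower four bits of an add.
(define (aux-carry-flag/add a b)
  (if (fxbit-set? (fx+ (fxand a 15) (fxand b 15)) 4) 16 0))
```

For example, (parity-flag 3) is 4 because the byte has two bits set, and (aux-carry-flag/add 15 1) is 16 because the low nibble overflows.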

Zabavno emits code to update the flags, but it’s done in a clever way so that in the majority of cases the code is never used. As was mentioned earlier, a translation block can contain multiple instructions. When one arithmetic instruction is followed by another one, the second tends to overwrite the flags. In that case it’s completely unnecessary to do the first flag update.
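The effect can be demonstrated with a toy version of the idea, where each instruction returns its result together with a procedure that computes a flag on demand. This is a simplified sketch and not Zabavno’s code generator: emulate, run-example and the flag-evaluations counter are invented for this example, and only the zero flag is modelled.

```scheme
(import (rnrs))

(define flag-evaluations 0)   ; counts how often a flag is really computed

;; A hypothetical "instruction": returns the 32-bit result and a thunk
;; that computes the zero flag (bit 6, value 64) only when called.
(define (emulate op a b)
  (let ((result (bitwise-and (op a b) #xffffffff)))
    (values result
            (lambda ()
              (set! flag-evaluations (+ flag-evaluations 1))
              (if (eqv? result 0) 64 0)))))

;; sub eax,edx followed by add eax,ecx. The flag thunk from sub is
;; simply dropped, because add overwrites the flag.
(define (run-example eax ecx edx)
  (let-values (((eax zf-from-sub) (emulate - eax edx)))
    (let-values (((eax zf-from-add) (emulate + eax ecx)))
      (values eax (zf-from-add)))))
```

After one call to run-example the counter is 1 and not 2: the zero flag of the sub was never computed.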

The bookkeeping for tracking which flags should be used and which should be discarded is generally a bit tricky, since a lot of instructions use the flags as input, and some instructions update only a few of them, and a lot of the time the flags are left undefined. Zabavno outsources almost all bookkeeping to the host Scheme’s optimizer. The code generator was written with cp0 (from Oscar Waddell’s PhD thesis) in mind. This optimizer is available in Chez Scheme and a few other ones.

Let’s look at the translation of a small instruction sequence. This code updates the eax register twice and computes flags twice:

example:
    sub eax,edx
    add eax,ecx

As previously mentioned, the code generator in Zabavno emits code that returns new values for the registers. This makes it easy to wrap each translated instruction in a lambda and call it with the registers returned by the previous instruction. It also happens that cp0 is very good at optimizing this kind of code. The arithmetic flags are also wrapped in lambdas and are called when they are needed. They are never used as first-class procedures, and they are quite small, so cp0 inlines them every time and no memory allocations are needed. Here is the initial (huge, nightmarish) translation of the example program:

(lambda (abort fl AX CX DX BX SP BP SI DI cs ds ss es fs gs)
  (define RAM
    (case-lambda
      [(addr size)
       (case size
         [(8) (memory-u8-ref addr)]
         [(16) (memory-u16-ref addr)]
         [(32) (memory-u32-ref addr)])]
      [(addr size value)
       (case size
         [(8) (memory-u8-set! addr value)]
         [(16) (memory-u16-set! addr value)]
         [(32) (memory-u32-set! addr value)])]))
  (define I/O
    (case-lambda
      [(addr size) (port-read addr size)]
      [(addr size value) (port-write addr size value)]))
  (let ([fl (lambda () fl)]
        [fl-OF (lambda () (fxand fl 2048))]
        [fl-SF (lambda () (fxand fl 128))]
        [fl-ZF (lambda () (fxand fl 64))]
        [fl-AF (lambda () (fxand fl 16))]
        [fl-PF (lambda () (fxand fl 4))]
        [fl-CF (lambda () (fxand fl 1))])
    ((lambda (fl AX CX DX BX SP BP SI DI cs ds ss es fs gs)
       (let* ([t0 AX]
              [t1 DX]
              [tmp (fx- t0 t1)]
              [result (fxand tmp 4294967295)]
              [fl-OF (lambda ()
                       (if (fxbit-set?
                            (fxand (fxxor t0 t1) (fxxor t0 result))
                            31)
                           2048
                           0))]
              [fl-SF (lambda ()
                       (if (fxbit-set? result 31) 128 0))]
              [fl-ZF (lambda ()
                       (if (eqv? result 0) 64 0))]
              [fl-AF (lambda ()
                       (if (fxbit-set? (fx- (fxand t0 15)
                                            (fxand t1 15))
                                       4)
                           16
                           0))]
              [fl-PF (lambda ()
                       (if (vector-ref
                            byte-parity-table
                            (fxand result 255))
                           4
                           0))]
              [fl-CF (lambda () (if (fxbit-set? tmp 32) 1 0))]
              [AX result])
         ((lambda (fl AX CX DX BX SP BP SI DI cs ds ss es fs gs)
            (let* ([t0 AX]
                   [t1 CX]
                   [tmp (fx+ t0 t1)]
                   [result (fxand tmp 4294967295)]
                   [fl-OF (lambda ()
                            (if (fxbit-set?
                                 (fxand (fxxor t0 result)
                                        (fxxor t1 result))
                                 31)
                                2048
                                0))]
                   [fl-SF (lambda ()
                            (if (fxbit-set? result 31) 128 0))]
                   [fl-ZF (lambda ()
                            (if (eqv? result 0) 64 0))]
                   [fl-AF (lambda ()
                            (if (fxbit-set?
                                 (fx+ (fxand t0 15) (fxand t1 15))
                                 4)
                                16 0))]
                   [fl-PF (lambda ()
                            (if (vector-ref byte-parity-table
                                            (fxand result 255))
                                4
                                0))]
                   [fl-CF (lambda () (if (fxbit-set? tmp 32) 1 0))]
                   [AX result])
              (let* ([fl (lambda ()
                           (fxior (fxand (fl) -2262) (fl-OF) (fl-SF)
                                  (fl-ZF) (fl-PF)
                                  (fxior (fl-AF) (fl-CF))))])
                (values 262 (fl) AX CX DX BX SP BP SI DI
                        cs ds ss es fs gs))))
          fl AX CX DX BX SP BP SI DI
          cs ds ss es fs gs)))
     fl AX CX DX BX SP BP SI DI
     cs ds ss es fs gs)))

There is some initial setup code for accessing memory and the i/o bus. Then the initial values of the flags are wrapped in lambdas. The instructions themselves correspond to the let* expressions. The innermost let* computes the flags and then returns all registers. Pretty terrible with a lot of lambdas, so the GC will be invoked very frequently if this is what the emulator uses. But this is the code after cp0 has optimized it:

(lambda (abort fl AX CX DX BX SP BP SI DI cs ds ss es fs gs)
  ;; The printout uses the name "result" for both the sub result (outer
  ;; binding) and the add result (inner binding). As the first operand
  ;; of fxxor and inside (fxand 15 result) it is the sub result.
  (let ([result (fxand #xffffffff (fx- AX DX))])
    (let ([tmp (fx+ result CX)])
      (let ([result (fxand #xffffffff tmp)])
        (values 262
          (fxior (fxand -2262 fl)
                 (if (fxbit-set? (fxand (fxxor result result)
                                        (fxxor CX result))
                                 31)
                     2048 0)
                 (if (fxbit-set? result 31) 128 0)
                 (if (eqv? result 0) 64 0)
                 (if (vector-ref byte-parity-table (fxand #xff result))
                     4 0)
                 (fxior (if (fxbit-set? (fx+ (fxand 15 result)
                                             (fxand 15 CX))
                                        4)
                            16 0)
                        (if (fxbit-set? tmp 32)
                            1 0)))
          result CX DX BX SP BP SI DI cs ds ss es fs gs)))))

None of the remaining code needs to allocate memory (in fact only the vector-ref call needs to read from memory). There is still some room for improvement, but the flags are only computed once. If there was an instruction that did need a flag from sub then cp0 would see that, and that flag would be computed. In general the flags tend to be computed only once per block.

This is how the core of the CPU emulation works. Now that you’ve seen it, why don’t you give it a try? Zabavno is available from GitHub:

git clone https://github.com/weinholt/zabavno/

Shiny new website layout

weinholt — Sun, 23 Oct 2016 02:00:00 +0200

This website has a new layout. The previous ~~anti-social~~ directory listing layout is gone and the future is shiny. I’ve been putting this off for maybe ten years now.

I still wanted a static web site because it’s so much easier to run a web server that way. So I went looking for a Node.js based static site generator. Node.js because, honestly, JavaScript is the language for the web.

There are a few of these. My previous web site was made with a self-made static site generator, and they are not actually very complicated when you get down to it. It’s just that there are more design choices than there are lines of code. The ones that caught my eye were Wintersmith and DocPad. DocPad talks about setting me free, suggesting that I’m currently a captive. Not a positive image. I went with Wintersmith because I happen to like winter. And the author is Swedish, so there’s that. The obvious choice.

Wintersmith lets you write in Markdown and it automatically turns it into web pages. Markdown is a shiny web thing. But the layout of the site itself is controlled with Jade (apparently now renamed Pug?). In my own templating engine I used SXML for both the job of Markdown and Jade. I hadn’t used Jade for more than five minutes before I started to miss quasiquote, unquote and in particular my good old friend unquote-splicing. (Search for how to make a list of tags separated by a comma). If I were to do a site generator again I’d use Markdown for the content and SXML for the structure. Scheme gets a lot of things right, in the end.

Faster Dynamic Type Checks

weinholt — Wed, 07 Mar 2012 01:00:00 +0100

“Arranging for Safety Checks with Hardware Traps” was the title of an article I wrote for a class project. It describes how to use the Alignment Checking feature of the x86/AMD64 architecture to get branchless dynamic type checks.

The article has not been published formally, although it was written in a style that makes it look like it could be published. I was unable to find anything in the literature that describes how to use alignment checking for dynamic type checks.

After writing the article I discovered that the idea is mentioned in passing in Olin Shivers’s dissertation and there are several mentions of it on Usenet. The method is old but has not been used on the x86 because it is difficult to make it work with other software.

And for what it’s worth: the branchless vector-ref is probably slower than the one with the branch. But it is cool.

“A hack is a terrible thing to waste, please give to the implementation of your choice…” – GJC
