Cond-expand and #ifdef

Written by Gwen Weinholt on 2022-06-09

In the C programming language you can ask the macro preprocessor to keep or remove part of a source file. This is done with #ifdef. The equivalent in Scheme is called cond-expand. R7RS Scheme has two different instances of cond-expand, while R6RS Scheme does not have it at all. What does R6RS do instead, and is cond-expand a bad idea?

Use cases

What are #ifdef and its cousins #if and #ifndef used for? And why might we want their equivalent in Scheme? There are two major use cases for #ifdef: build-time configuration and portability. The build system will usually have some sort of configuration script that detects what system it’s running on. Usually these scripts also let the user enable or disable features.

Chez Scheme runs on top of a chunk of fairly portable C and uses #ifdef as described:

$ grep -h '^#ifdef' ChezScheme/c/*.[ch] |awk '{print $2}' \
   | sort -u | xargs
ARCHYPERBOLIC ARMV6 BSDI CHAFF CHECK_FOR_ROSETTA CLOCK_HIGHRES
CLOCK_MONOTONIC CLOCK_MONOTONIC_HR CLOCK_PROCESS_CPUTIME_ID
CLOCK_REALTIME CLOCK_REALTIME_HR CLOCK_THREAD_CPUTIME_ID DEBUG
DEFINE_MATHERR DISABLE_CURSES EINTR ENABLE_OBJECT_COUNTS
FEATURE_EXPEDITOR FEATURE_ICONV FEATURE_PTHREADS FEATURE_WINDOWS FLOCK
FLUSHCACHE FunCRepl GETWD HANDLE_SIGWINCH HPUX I386 IEEE_DOUBLE ITEST
KEEPSMALLPUPPIES LIBX11 LITTLE_ENDIAN_IEEE_DOUBLE LOAD_SHARED_OBJECT
LOCKF LOG1P LOOKUP_DYNAMIC MACOSX MAP_32BIT __MINGW32__ MMAP_HEAP
NAN_INCLUDE NO_DIRTY_NEWSPACE_POINTERS NOISY
NO_LOCKED_OLDSPACE_OBJECTS PPC32 PROMPT PTHREADS SA_INTERRUPT
SA_RESTART SAVEDHEAPS segment_t2_bits segment_t3_bits SIGBUS SIGQUIT
SOLARIS SPARC SPARC64 TIOCGWINSZ USE_MBRTOWC_L WIN32 _WIN64 WIPECLEAN
X86_64

Macros like FEATURE_EXPEDITOR turn on and off functionality, while macros like HPUX and I386 are used for portability. So, configuration and portability.

Portability?

Does #ifdef truly help with portability? It can certainly seem this way, but there’s a different way to think about this issue. This is what Rob Pike had to say on the TUHS mailing list:

C with #ifdefs is not portable, it is a collection of 2^n overlaid programs, where n is the number of distinct #if[n]def tags. It’s too bad the problems of that approach were not appreciated by the C standard committee, who mandated the #ifndef guard approach that I’m sure could count as a provable billion dollar mistake, probably much more. The cost of building #ifdef’ed code, especially with C++, which decided to be more fine-grained about it, is unfathomable.

For each C file with #ifdef you need to understand what happens if the condition is true versus if it’s false. It’s simple with just one #ifdef, but the problem grows exponentially.

Configuration?

You can use #ifdef for conditional compilation, to turn on and off features. But this can also create a mess like that described by Pike above. The GNU Coding Standards have this to say:

When supporting configuration options already known when building your program we prefer using if (... ) over conditional compilation, as in the former case the compiler is able to perform more extensive checking of all possible code paths.

The same sentiment is echoed by Douglas McIlroy in the message that preceded Pike’s message above.

This approach is generally a good idea. All those conditionals give you a lot of code paths. Checking that all of them even compile is difficult to do by hand, and you need all the help you can get. When you use #ifdef you hide the code from the compiler. The compiler can’t check code that it can’t see. Using if (... ) when the expression is constant at compile time should give the same result as conditional inclusion, at least if you have an optimizing compiler.
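
A minimal sketch of the difference, assuming a hypothetical build option USE_FAST_PATH and made-up function names (none of this comes from a real project). The option is turned into an ordinary constant, so both branches are parsed and type-checked in every build, and an optimizing compiler can be expected to drop the dead one:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical build-time option; a build system might pass -DUSE_FAST_PATH=1. */
#ifndef USE_FAST_PATH
#define USE_FAST_PATH 0
#endif

/* The option as a plain constant that the compiler can see and fold. */
static const bool use_fast_path = USE_FAST_PATH;

/* Straightforward loop over every character. */
static size_t count_spaces_simple(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == ' ')
            n++;
    return n;
}

/* Alternative that lets strchr do the scanning (assumed here to be faster). */
static size_t count_spaces_fast(const char *s) {
    size_t n = 0;
    const char *p = s;
    while ((p = strchr(p, ' ')) != NULL) {
        n++;
        p++;
    }
    return n;
}

size_t count_spaces(const char *s) {
    assert(s != NULL);
    /* Both calls are compiled in every configuration; only one survives
       optimization because the condition is a compile-time constant. */
    if (use_fast_path)
        return count_spaces_fast(s);
    else
        return count_spaces_simple(s);
}
```

Had the fast-path code been wrapped in #ifdef USE_FAST_PATH instead, a typo inside count_spaces_fast would go unnoticed until somebody builds with the option turned on.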

Also Considered Harmful

It’s not just people on the Internet saying these things about #ifdef, and the complaints are not new either.

We believe that a C programmer’s impulse to use #ifdef in an attempt at portability is usually a mistake. Portability is generally the result of advance planning rather than trench warfare involving #ifdef. In the course of developing C News on different systems, we evolved various tactics for dealing with differences among systems without producing a welter of #ifdefs at points of difference. We discuss the alternatives to, and occasional proper use of, #ifdef.

Source: SPENCER, Henry; COLLYER, Geoff. #ifdef considered harmful, or portability experience with C News. In: USENIX Summer 1992 Technical Conference. 1992.

You can use Google Scholar to find papers that cite this paper, if you’re interested in more reading.

cond-expand is not as bad…

The first standardization of cond-expand that I know of is Marc Feeley’s SRFI-0. It is also part of R7RS Scheme, where I believe it has seen wider adoption than plain SRFI-0.

There is one major difference between #ifdef and cond-expand. The former is handled by a preprocessor that does not understand the lexical syntax of the language it is working with. You can even use cpp with languages other than C, e.g. assembly. This means you can easily introduce latent syntax errors with #ifdef.

There is a cond-expand available from inside define-library and another one available as syntax in (scheme base). Both of these are handled after the source file has been parsed. This means that cond-expand cannot create an unbalanced syntax tree. What I mean by this is that you can’t somehow use cond-expand wrong in such a way that the parentheses become unbalanced. To make this mistake with #ifdef is trivial; simply place } before #endif when it should have been after, or vice versa.

You get bonus points, so to speak, if you do this near code that handles portability to operating systems that you can’t test your changes on.
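
The mistake can be sketched like this. HYPOTHETICAL_TESTED_OS and clamp_percent are invented for illustration; the #define stands in for whatever the build system would define on the one platform the code was tested on. As written the file compiles, but delete the #define and the if block swallows the return statement, leaving the function body unclosed:

```c
/* Hypothetical platform macro. Deleting this line stands in for
   building on any system other than the one the author tested on. */
#define HYPOTHETICAL_TESTED_OS 1

int clamp_percent(int x) {
    if (x > 100) {
        x = 100;
#ifdef HYPOTHETICAL_TESTED_OS
        x -= 1;  /* stand-in for a platform-specific adjustment */
    }            /* misplaced: this brace belongs after #endif */
#endif
    return x;
}
```

The preprocessor only sees lines of text, so it has no way to notice that one configuration has a missing brace. A cond-expand form cannot produce this kind of error, because the reader has already checked that the parentheses balance before any clause is selected.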

… but not really better

Apart from the differences in how the compiler handles them, they do actually express the same thing. One can look at cond-expand as morally equivalent to a series of #if, #else and #endif directives. Therefore the very same problems that happen with #ifdef also happen with cond-expand.

I’m not optimistic about the future landscape of R7RS code if cond-expand is not recognized for the problems it brings. It may be that each R7RS library will become a jungle of 2^n overlaid libraries. I have seen some indication of this process already beginning when looking at the packages in Snow Fort.

Fortunately I have not seen any examples where cond-expand is used to change which identifiers are exported from a library, but that day may yet come.

Back to R6RS

So in the beginning of this article I wrote that R6RS Scheme does not have cond-expand. Does that mean it has another way to handle these problems?

No, but in practice: yes. In the R6RS report there are only libraries as a suggested way to handle this. And there isn’t really a way to conditionally import libraries at compile time.

This situation has given rise to some creative solutions. The configuration and portability problems do not disappear just like that, so people have tried to solve them within the restrictions of the language.

Portability between R6RS implementations

I believe that all R6RS Scheme implementations implement the de facto standard of importing libraries by first looking for them in files that end with the .<impl>.sls suffix before they try .sls. If Chez Scheme sees (import (foo)) then it first tries foo.chezscheme.sls before it tries foo.sls. This mechanism, even though it’s not in R6RS, is widely implemented.

This mechanism is used to create compatibility libraries. One striking example is the (xitomatl common) library from Derick Eddington’s xitomatl. It contains a few procedures that traditionally appear in Scheme implementations, but which are not in the reports. Here is common.chezscheme.sls:

;; Copyright 2009 Derick Eddington.  My MIT-style license is in the file named
;; LICENSE from the original collection this file is distributed with.

(library (xitomatl common)
  (export
    add1 sub1
    format printf fprintf pretty-print
    gensym
    time
    with-input-from-string with-output-to-string
    system
    ;; TODO: add to as needed/appropriate
    )
  (import
    (chezscheme))
)

There are matching libraries for Guile, Ikarus, Larceny, Racket, Mosh and Ypsilon. They export the same identifiers, but they all have some tweaks to adapt them to the various implementations.

This is the “Plan 9” approach to portability, as briefly described in the mailing list thread referenced above. Define APIs and let the implementation of the API hide the portability problems from the rest of the program.

Configuration for R6RS code

What is to be done for configuration? When I need this in my own code, I create a library that exports the configuration as identifier syntax. Here’s an abbreviated example from Loko Scheme:

(library (loko arch amd64 config)
  (export
    ; ...
    use-popcnt)
  (import
    (rnrs))

(define-syntax define-const
  (syntax-rules ()
    ((_ name v)
     (define-syntax name
       (identifier-syntax v)))))

; ...

(define-const use-popcnt #f))

I can then use this identifier syntax as a regular variable, like (if use-popcnt <do-this> <do-that>). But thanks to define-const it becomes inlined at the place where it is used. So if it’s set to #f then the expanded conditional is actually (if #f <do-this> <do-that>), which is trivial to optimize.

Akku supports this stuff

I have included support in Akku for both the R6RS and R7RS approach to portability. Akku will keep track of which Scheme implementation an R6RS library is meant for and adapt the way it installs the library. It will use the .<impl>.sls extension and even escape the file name correctly for that implementation.

With R7RS libraries it merely needs to install the .sld file to the right location. This is simple enough to do. But Akku also translates R7RS libraries into R6RS libraries. Akku has to do some interesting juggling when cond-expand appears at the define-library level.

Akku checks the list of features that appear in cond-expand and looks to see if it recognizes any implementation names. For each implementation it then creates a copy of the library that is specific to that implementation. For each such copy it expands all cond-expand expressions at the define-library level, as best as it can, and installs .<impl>.sls files. Kind of dirty, but it mostly works.

For example, this library will be installed as hello.chezscheme.sls, hello.loko.sls, hello.sld and hello.sls:

(define-library (hello)
  (export hello)
  (cond-expand
   ((library (rnrs))
    (import (rnrs)))
   (else
    (import (scheme base))))

  (cond-expand
   (chezscheme
    (begin
      (define (hello)
        (display "Hello Chez!\n"))))
   (loko
    (begin
      (define (hello)
        (display "Hello Loko!\n"))))
   (else
    (begin
      (define (hello)
        (display "Hello world!\n"))))))

(The hello.loko.sls file it creates is actually a symlink to hello.sld because Akku knows that Loko supports R7RS.)

Maybe a way forward

The picture I’ve painted seems quite damning for cond-expand. But the problem is not really cond-expand itself. The problem is when the misuse of cond-expand leads to a mess. I’m not advocating for its removal, but I would like it to be better understood for what it is. Some widely distributed guidelines for how to use it would go far in reducing the damage.

The “Plan 9” approach of making APIs can be done with cond-expand just as well as with the R6RS approach of .<impl>.sls. It just requires that you know what you’re doing: that it’s a bad idea to sprinkle all your code with cond-expand, and that you should keep this code in isolated libraries.

Finally, I’d like to say that cond-expand is actually more powerful than the .<impl>.sls approach. This power unfortunately makes static analysis of library declarations more difficult, but it also gives you access to feature identifiers that are more interesting than just the name of the Scheme implementation, such as x86-64 and clr. But please do hide your use of these behind an appropriate API.