Scheme "load" in various compilers


Most other programming languages I have used are either strictly compiled (C, C++, C#, Java) or strictly interpreted (JavaScript, POSIX Shell/Bash). Languages that can be compiled and interpreted, like Haskell or Python, have a clear distinction between compiling and interpreting code.

The Scheme programming language is a bit different. Many Scheme compilers work like Python with bytecode compilation, or like Haskell, where you can compile and then use compiled code in the interpreter REPL. But the R7RS standard in fact does not even define a compile command, nor does it specify compilation or interpretation phases in the operational semantics at all. Compilation seems to be considered an implementation detail, at least from the point of view of the R7RS document.

So for me, understanding the difference between compile time and runtime in Scheme was a little confusing at first. Also, Scheme does macro expansion, so it was not clear to me how a Scheme compiler could do macro expansion strictly at compile time when the language itself doesn't even define what "compile time" means.

Fortunately, most Scheme compilers operate in somewhat similar fashion regardless of their implementation details, and the one thing that ties them all together is the "load" procedure, which is clearly defined in the R7RS standard. So what does "load" actually do? And by the way, Scheme also has an "import" keyword. When should we use "load", when should we use "import"?

I would like to answer these questions in this article, and develop an understanding of what a "compiler" is from a Lisp/Scheme way of thinking. I also provide some concrete examples of how the "load" procedure works across four different R7RS standard compliant Scheme implementations: Guile, Gambit, STklos, and MIT/GNU Scheme.

TL;DR

Before we begin: A note on the meaning of "REPL"

I worry that the term "REPL" nowadays is widely misunderstood. The acronym, pronounced "Reh-Pull", is usually used only to describe "interactive" REPLs for languages like Python, Ruby, Perl, and so on. But notice that the word "interactive" is not included in the REPL acronym: "Read-Eval-Print Loop." A REPL need not be interactive at all.

Lisp was the very first language to ever use a REPL, both in the interactive sense and in the non-interactive sense. In the Lisp family of languages, the "load" function is the REPL. That is to say, what "load" actually does is enter a loop in which a single expression (which Lisp calls a "form") is read, the form that was read is evaluated, and then the result of evaluation may (or may not) be printed. Then the loop begins again at the "read" step.

In the rest of this article, I want readers to keep in mind that when I say "REPL", I am talking about what the "load" procedure is doing, and not about an "interactive REPL" like what many programmers may have in mind when the term is usually mentioned.
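To make this concrete, here is a minimal sketch of a non-interactive read-eval loop written with standard R7RS procedures. The name "load-from-port" is hypothetical, and the sketch reads from a string port instead of a file so that it is self-contained. A real "load" does more than this (error reporting, and possibly compilation), so treat it only as an illustration of the loop itself.

```scheme
(import (scheme base) (scheme read) (scheme eval) (scheme repl))

;; Hypothetical helper: read one form at a time from PORT and
;; evaluate it in ENV until end-of-file is reached. There is no
;; prompt and no user interaction, yet it is still a read-eval loop.
(define (load-from-port port env)
  (let loop ((form (read port)))
    (if (eof-object? form)
        'done
        (begin
          (eval form env)
          (loop (read port))))))

;; A string port stands in for the "file" here.
(define source "(define answer 41) (set! answer (+ answer 1))")

(load-from-port (open-input-string source) (interaction-environment))
```

After this runs, I would expect "answer" to be bound to 42 in the interaction environment, assuming the implementation allows definitions to be evaluated there (most do).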

What does "load" actually do?

According to the Scheme report:

The "load" procedure reads expressions and definitions from the file and evaluates them sequentially in the environment specified by ENVIRONMENT-SPECIFIER.

(...for some arbitrary definition of "file.") Note also the use of the words "evaluates them sequentially," which implies the use of "eval" to execute code, but does a compiled Scheme program use "eval"?

Also, what is environment-specifier? Well, for now, let's think of an environment object as the state of a REPL, although environments are such a deep topic that they must be covered in a separate article (stay tuned).
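As a small taste of what an environment-specifier can be, R7RS provides the "environment" procedure in the (scheme eval) library, which builds an environment from a list of import sets, and "interaction-environment" in the (scheme repl) library. The sketch below evaluates the same expression against both. The variable names are my own, chosen only for illustration.

```scheme
(import (scheme base) (scheme eval) (scheme repl))

;; An environment containing only the bindings of (scheme base).
(define base-env (environment '(scheme base)))

;; The same expression can be evaluated against different environments.
(define result-in-base (eval '(* 6 7) base-env))
(define result-in-repl (eval '(* 6 7) (interaction-environment)))
```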

In simplest possible terms, the "load" procedure creates or updates a Scheme "environment" object with data and executable code taken from outside of the Scheme environment. If we compare Scheme to other programming languages, the behavior of "load" is as follows:

But the R7RS standard does not say your Scheme implementation must necessarily operate like this at all.

When you look at it more closely, the R7RS specification really gives a lot of leeway to Scheme implementations. The "load" procedure simply takes a reference to a piece of information that the R7RS document calls a "file", which exists somewhere outside of the program, and uses that reference to modify an environment object, presumably by evaluating code in the file. The reference to this "file" must be represented by a string, but the semantics of the string are unspecified. The string could be a file path, a URL, a database query, a telephone number, a radio station call sign... it could display a name on a marquee indicating a trained pigeon to peck Morse code on a small red button.

(let ((game (interaction-environment)))
  (load "./my-game.scm" game)
  (load "~/directory/with/many/files/" game)
  (load "https://example.com/scheme/my-game.scm" game)
  (load "SELECT code FROM scheme-code-DB WHERE name = 'my-game'" game)
  (load "(system \"/usr/bin/python -m symbol-server\")" game)
  (load "(617) 258-8682" game)
  (load "WMBR 88.1 FM" game)
  )
  ;; WARNING: none of these are guaranteed to work, consult
  ;; the documentation for your scheme implementation for
  ;; how to use the "LOAD" procedure.

But of course, a useful Scheme implementation will try to execute loaded procedures as quickly as possible, and to that end, a Scheme compiler might try to optimize the loaded procedures by transforming textual, human-readable code into efficient binary code that can be copied directly into the working memory of the computer, and caching that binary code somewhere it can be retrieved the next time "load" is applied to the same string again. But these are implementation details, and not specified in, or required by, the R7RS standard.

So what is "load" used for? It is used to give you access to data/procedures taken from some resource outside the system and bind them to symbols inside of a REPL. Loading a Scheme program should behave the same regardless of whether the REPL is an interactive shell, or whether it is a compiler. Again, keep in mind that the affected "REPL" might or might not be an interactive REPL like Python or a POSIX shell.

Notably, however, "load" should not be used to assemble many Scheme programs together into a larger one. For that, "define-library" and "import" should be used instead (explained later).
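For comparison, here is a minimal sketch of the "define-library" and "import" approach. The library name (my-game utils) and the "square" procedure are hypothetical. Whether a library definition may live in the same file as the program that imports it, or must be placed in its own file on a library search path, depends on your Scheme implementation, so consult its documentation.

```scheme
;; Hypothetical library name and contents, for illustration only.
(define-library (my-game utils)
  (export square)
  (import (scheme base))
  (begin
    (define (square x) (* x x))))

;; A program that uses the library via "import" rather than "load".
(import (scheme base) (scheme write) (my-game utils))

(display (square 7))
(newline)
```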

How "load" is unique to Lisp-like languages

When "eval" is being run by a compiler, it is not the same "eval" used by the interpreter. The "eval" for a Scheme compiler will only be performing a partial evaluation of the code as it is translated to a binary object program, and optimizing that binary as it goes.

You can still call "eval" from a compiled procedure and run arbitrary code constructed at runtime. But to accomplish this, the compiler will usually link a purely interpreted version of "eval" for use at runtime; it will not link to the version of "eval" currently being used by the compiler. The call to the interpreted "eval" procedure will probably also have to be updated with the set of macros that are in scope at the site at which "eval" is invoked. This is the only way for a compiler to ensure that the semantics of "eval" in the compiled program will be the same as that of the purely interpreted program.
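Here is a minimal sketch of what calling "eval" on code constructed at runtime looks like. The procedure name "make-adder-source" is hypothetical. The point is only that the lambda expression does not exist until the program runs, so no compiler could have translated it ahead of time.

```scheme
(import (scheme base) (scheme eval))

;; Build source code as a list at run time.
(define (make-adder-source n)
  `(lambda (x) (+ x ,n)))

;; Whether or not this file is compiled, the form below is only known
;; at run time, so it has to go through "eval".
(define add5
  (eval (make-adder-source 5) (environment '(scheme base))))

(define fifteen (add5 10))
```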

When a macro definition is evaluated in the compiler REPL environment, it is immediately included into the set of rules used by the "eval" procedure which is, in turn, being used by "load". So if your Scheme program does define a macro, "eval" will insert your own code transformation rules directly into the compiler itself. This may include your own custom code optimizations.
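A small sketch of this ordering: the macro is defined by one top-level form, and a later form in the same file is expanded using it. The macro "swap!" and the variables are my own illustrative names, not part of any standard library.

```scheme
(import (scheme base))

;; Evaluating this form adds a new rewrite rule to the expander.
(define-syntax swap!
  (syntax-rules ()
    ((_ a b)
     (let ((tmp a))
       (set! a b)
       (set! b tmp)))))

(define x 1)
(define y 2)

;; This form appears after the macro definition, so it is expanded
;; using the rule defined above.
(swap! x y)
```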

But Lisp and Scheme macros go beyond simply performing compiler optimizations. The "load" procedure may encounter code that overwrites certain built-in symbols such as define or let so that type checking can be performed at binding sites. Comments could be extracted to generate documentation. Theoretically, you could define Lisp systems to perform all kinds of analysis and transformation on your code:

Note that all of the above code transformations are theoretically possible, but just because each of the above systems is possible does not necessarily mean it is easy or inexpensive to find a good Lisp system that actually does any of the above-mentioned things.

How is "import" different from "load"

The Scheme "import" statement is different from "load" in a few important ways. The "import" statement is:

Also, there is some difference between implementations as to whether you are allowed to use "import" in an interactive REPL or not. According to section 5.1 of the R7RS standard:

A Scheme program consists of one or more import declarations followed by a sequence of expressions and definitions.

So "import" can be used in the first expressions at the top of a program file, but not used again after the first non-import expression. "import" can also be used in a "define-library" expression. But an interactive REPL is not necessarily the same REPL that "load" is given when you begin loading a program file. So here is a quick overview of the use of "import" across four of the R7RS-compliant Scheme implementations:

Specific examples

So let's see how various actual Scheme compilers compile things. I have here an example program, which can be loaded as-is by Guile, MIT/GNU Scheme, Gambit, and STklos: four Scheme implementations which have made a best effort at implementing the R7RS Scheme standard. Here is the example code:

(import
  (scheme base)
  (scheme write) ;; provides "display"
  (only (scheme file) open-binary-input-file))

(display "This code runs at load-time.")

(define a-number 5) ;; small objects can be stored in an environment

(define a-huge-binary-blob
  ;; Large objects can be stored in an environment as well.
  ;; Here we load a file up to (expt 2 22) in size, which is 4 MiB
  ;; If this program is compiled, the compiled program /could/ have
  ;; this entire binary blob stored within it.
  (call-with-port
    (open-binary-input-file "binary-data.raw")
    (lambda (port) (read-bytevector (expt 2 22) port))))

;; Of course, executable procedures can be stored in an environment:

(define (load-time-and-runtime-code)
  (display "I want this code to run at load-time and run-time."))

(define (main . args)
  (display "This code runs only at run-time.")
  (load-time-and-runtime-code))

(display "This code also runs at load-time.")

;; If an executable procedure is fully defined, it can be called
;; at load time as well.

(load-time-and-runtime-code)
(newline)
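The example above expects a file called "binary-data.raw" to exist in the current directory. If you want to follow along and do not have one, here is a minimal sketch that creates it. The size (1024 bytes) and the contents (all zeros) are arbitrary assumptions on my part; any binary file of up to 4 MiB would do.

```scheme
(import (scheme base) (scheme file))

;; Write 1024 zero bytes to "binary-data.raw". The size and the
;; contents are arbitrary; they only need to give the example
;; program something to read.
(call-with-port (open-binary-output-file "binary-data.raw")
  (lambda (port)
    (write-bytevector (make-bytevector 1024 0) port)))
```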

How the Guile compiler operates

Guile compiles any file you load to bytecode, and keeps a cache of compiled bytecode objects. Once load-ed into memory, the bytecode can be JIT compiled (compiled on demand) to further improve performance while the program is running. You can force re-compilation of the cached file with the --fresh-auto-compile flag.

$ #-------------------- First invocation --------------------
$ guile --r7rs
GNU Guile 3.0.8
Copyright (C) 1995-2021 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (load "example.scm")
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling /home/ramin/example.scm
WARNING: (guile-user): imported module (scheme base) overrides core binding `expt'
WARNING: (guile-user): imported module (scheme base) overrides core binding `expt'
;;; compiled /home/ramin/.cache/guile/ccache/3.0-LE-8-4.6/home/ramin/example.scm.go
This code runs at load-time.
This code also runs at load-time.
I want this code to run at load-time and run-time.

scheme@(guile-user)> ;; The program was compiled, and
scheme@(guile-user)> ;; we can access data/procedures in the compiled file
scheme@(guile-user)> (bytevector-u8-ref a-huge-binary-blob 0)
$1 = 0
scheme@(guile-user)> (main)
This code runs only at run-time.
I want this code to run at load-time and run-time.

scheme@(guile-user)> ,q
$ #-------------------- Second invocation --------------------
$ guile --r7rs -e main example.scm
This code runs at load-time.
This code also runs at load-time.
I want this code to run at load-time and run-time.
This code runs only at run-time.
I want this code to run at load-time and run-time.
$ # The program was already compiled so it isn't compiled again.

Notice that on the first invocation the compiler log messages indicate the file is compiled. Then we see the load time messages displayed. Note that it reports the file path where the bytecode file is cached:

/home/ramin/.cache/guile/ccache/3.0-LE-8-4.6/home/ramin/example.scm.go

The cached object file has a ".go" extension. This is not a Go programming language file; it is a "Guile Object" file. Guile has been using this ".go" file extension since before the Go programming language was ever even invented.

On the second invocation of Guile on my example program, there are no messages from the compiler because the compiler does not run at all. Guile automatically reloads the compiled bytecode file from cache and runs it immediately. The same would happen if I had invoked the example program using "load" from the interactive REPL.

In the second invocation, I use the "-e main" command line argument to apply the main procedure after the program is loaded. We could, however, simply write "(main)" as the last line of the program file as well. We then see the load time and runtime messages displayed.

How the MIT/GNU Scheme compiler operates

MIT/GNU Scheme does everything via the REPL; it doesn't really allow you to control many of its features via command line arguments. Also, it defaults to interpretation. If you want a file to be compiled, you must explicitly compile it before load-ing it, otherwise it is interpreted. To compile a file, use the "cf" procedure (CF means "compile file"). It then produces a compiled object file with a ".com" extension, and you can load this ".com" file using the load procedure.

$ mit-scheme
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.

Copyright (C) 2020 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even for
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Image saved on Sunday March 7, 2021 at 3:24:56 PM
  Release 11.2 || SF || LIAR/x86-64

1 ]=> (cf "example.scm") ;; --------------- this compiles the file

;Generating SCode for file: "example.scm" => "example.bin"...
;  This program does not have a USUAL-INTEGRATIONS declaration.
;  Without this declaration, the compiler will be unable to perform
;  many optimizations, and as a result the compiled program will be
;  slower and perhaps larger than it could be.  Please read the MIT
;  Scheme User's Guide for more information about USUAL-INTEGRATIONS.
;Warning: Unreferenced bound variable: args (main)
;... done
;Compiling file: "example.bin" => "example.com"... done
;Unspecified return value

1 ]=> (load "example.com") ;; ---- loading now happens much faster

;Loading "example.com"...

This code runs at load-time.
This code also runs at load-time.
I want this code to run at load-time and run-time.
;... done
;Unspecified return value

1 ]=> (bytevector-u8-ref a-huge-binary-blob 0) ;; data/procedures are now available
;Value: 0
1 ]=> (main)
This code runs only at run-time.
I want this code to run at load-time and run-time.
;Unspecified return value

1 ]=> (exit 0) ;; ------ it is even harder to get out of than "vi"

..#]^@^@^@ NO CARRIER
$ # "No carrier." Very funny.
$ # I'm old enough to remember modems and dialup connections.
$ 

Notice how the load time messages are still displayed at load time even after the file has been compiled. Scheme compilers must maintain the "load" semantics even when files are compiled, and so they compile the file to executable code that works the same as it would if it were interpreted. And of course, the runtime messages are displayed as soon as the main procedure is applied.

By the way, to get rid of the message "This program does not have a USUAL-INTEGRATIONS declaration..." add this block of code to the top of the example.scm program:

(cond-expand
  (mit/gnu
   (declare (usual-integrations)))
  (else))

Also note that the compiler generates the following files. According to the user manual:

example.com: contains binary executable code
example.bci: contains debugging information, source locations
example.bin: contains intermediate bytecode (called "SCode")

How the Gambit compiler operates

NOTE: this only works for Gambit 4.9.5 or later; earlier versions will probably not work as described here.

The Gambit compiler operates the same way as MIT/GNU Scheme, interpreting by default, although the procedure to compile a file is called "compile-file" instead of "cf". The compiler-enabled REPL is launched with the "gsc" (Gambit Scheme Compiler) command. Note that there is also a Gambit Scheme Interpreter executable called "gsi", but it does not provide the "compile-file" procedure. So the following must use "gsc":

$ gsc -:r7rs
Gambit v4.9.5

> (compile-file "example.scm") ;; ------------------- compile
"/home/ramin/example.o1"
> (load "example.o1") ;; --------------- load compiled object
This code runs at load-time.
This code also runs at load-time.
I want this code to run at load-time and run-time.
"/home/ramin/example.o1"
> (bytevector-u8-ref a-huge-binary-blob 0)
0
> (main) ;; --------------------------------- run the program
This code runs only at run-time.
I want this code to run at load-time and run-time.
> ,q ;; --------------------------------------- exit the REPL
$ 

As you can see, the compiled binary object file has an ".o1" filename extension. Compiling the example again will increment the number appended to ".o" so you will see "example.o2" the next time you compile, unless you delete "example.o1".

The next time you launch the "gsc" or "gsi" interactive REPL, you can apply "load" to these compiled ".o*" object files to use the optimized binary form of your program. It is also possible to invoke an ".o*" object program using "gsi":

$ gsi ./example.o1
This code runs at load-time.
This code also runs at load-time.
I want this code to run at load-time and run-time.

Gambit also makes it easier to invoke compilation from the command line, and to generate programs that can be executed as stand-alone binary files. Simply use the "-exe" option, together with "-o <filepath>" to indicate where the stand-alone executable should be created.

$ gsc -:r7rs -exe -o example example.scm
$ ./example
This code runs at load-time.
This code also runs at load-time.
I want this code to run at load-time and run-time.
$ 

The load time messages are still displayed when running the program, in keeping with the "load" semantics that loading calls all procedures applied at the top level of the program. If you want the "main" procedure to run, write a line of code at the end of the program applying it, that is, "(main)".

How the STklos compiler operates

STklos uses the same procedure as MIT/GNU Scheme and Gambit, and produces bytecode files. Since version 2.0 (and if I recall correctly, even as early as version 1.7) STklos provides the "compile-file" function similar to Gambit, although you must specify the compilation target output file path.

$ stklos
  \    STklos version 2.00 (stable)
   \   Copyright (C) 1999-2023 Erick Gallesio 
  / \  [Linux-6.1.0-21-arm64-aarch64/pthreads/readline/utf8]
 /   \ Type ',h' for help
stklos> (compile-file "example.scm" "example.stklos")
stklos> (load "example.stklos")
This code runs at load-time.
This code also runs at load-time.
I want this code to run at load-time and run-time.
stklos> (bytevector-u8-ref a-huge-binary-blob 0)
0
stklos> (main)
This code runs only at run-time.
I want this code to run at load-time and run-time.
stklos> ,q

The compiled file "example.stklos" is actually a stand-alone executable file, and if you mark this file as executable to your host OS (such as with "chmod") you can run the compiled Scheme program directly from the host OS command line.

$ chmod 755 example.stklos
$ ./example.stklos
This code runs at load-time.
This code also runs at load-time.
I want this code to run at load-time and run-time.

This code runs only at run-time.
I want this code to run at load-time and run-time.

Notice that if you do run the compiled program as a stand-alone executable, STklos will automatically invoke any procedure called "main" after load time completes. However, "main" is not applied if you "load" the compiled program in an interactive REPL.

As with Gambit, STklos provides an easier way to build stand-alone executables from the host OS command line, so you do not need to use "chmod". There is actually a Scheme SRFI, number 138, specifying the command line arguments that Scheme compilers should accept, and STklos follows the standards specified in this SRFI, as well as providing its own options.

$ stklos-compile -o example example.scm
Compilation time 15 ms
$ ./example
This code runs at load-time.
This code also runs at load-time.
I want this code to run at load-time and run-time.

This code runs only at run-time.
I want this code to run at load-time and run-time.

It should also be noted that a STklos stand-alone executable is actually just a binary blob of the STklos bytecode form of your Scheme program. This blob is automatically invoked by a copy of the STklos bytecode interpreter at the entry point of the stand-alone executable. So it is likely not as efficient as the compiled binary program produced by compilers like Gambit.

Conclusion

Hopefully now we have a better understanding of when to use the Scheme "load" procedure, and when to use "import" instead. Now we know that "load" updates a REPL, whether that REPL be an interactive interpreter or a compiler. The operational semantics of load-ing are to simply macro-expand and evaluate one Scheme expression at a time, and each evaluated expression updates the REPL.

Now we know that the "eval" procedure for a Scheme compiler is only a partial evaluation of the Scheme program which emits a more efficient representation of that program, but a compiler's "eval" still has the same semantics as an interpreter's "eval". We also know that macros defined at compile time are expanded the same as they would be if they were load-ed in an interactive interpreter, meaning your program can change how it is compiled by the Scheme compiler.

And I hope you found it interesting how the various R7RS-compliant Scheme compilers each perform compilation of a Scheme program, and how in spite of the big differences in the implementation details between each of them, the semantics of "load" is the same across all implementations.

I hope also that if you have not tried using Scheme yet, seeing examples of how you can actually use four different Scheme implementations may have given you at least a vague idea of the different features provided by each of the compilers demonstrated above, and perhaps this may help inspire you to choose one of those Scheme compilers to use for one of your own programming projects.