Scheme "load" in various compilers
Most other programming languages I have used are either strictly compiled (C, C++, C#, Java) or strictly interpreted (JavaScript, POSIX Shell/Bash). Languages that can be compiled and interpreted, like Haskell or Python, have a clear distinction between compiling and interpreting code.
The Scheme programming language is a bit different. Many Scheme compilers work like Python with bytecode compilation, or like Haskell, where you can compile code and then use the compiled code in the interpreter REPL. But the R7RS standard does not even define a "compile" command, nor does it specify compilation or interpretation phases in its formal semantics at all. Compilation seems to be considered an implementation detail, at least from the point of view of the R7RS document.
So for me, understanding the difference between compile time and runtime in Scheme was a little confusing at first. Also Scheme does macro expansion, and so it was not clear to me why a Scheme compiler could do macro expansion strictly at compile time when the language itself doesn't even define what "compile time" means.
Fortunately, most Scheme compilers operate in somewhat similar fashion regardless of their implementation details, and the one thing that ties them all together is the "load" procedure, which is clearly defined in the R7RS standard. So what does "load" actually do? And by the way, Scheme also has an "import" keyword. When should we use "load", and when should we use "import"?
I would like to answer these questions in this article, and develop an understanding of what a "compiler" is from a Lisp/Scheme way of thinking. I also provide some concrete examples of how the "load" procedure works across four different R7RS standard compliant Scheme implementations: Guile, Gambit, STkLos, and MIT Scheme.
TL;DR
You use the "load" procedure to update the current REPL environment with useful data and procedures. Therefore, in order to use a Scheme program, you must always first "load" it, and then call the procedures that were defined as a result. The REPL environment into which code is "load-ed" need not be an interactive REPL (though it usually is); the REPL environment could be the state of a Scheme compiler as it emits compiled binary code. "REPL" means "read, evaluate, print, and loop"; it does not necessarily mean "read interactively."
The "import" statement is built-in syntax, whereas "load" is a procedure. Use the "import" statement to compose libraries of code together into larger programs; do not use "load" for this purpose. Probably the only time you will ever need to use "load" is in an interactive REPL.
The "eval" procedure used by "load" in a Scheme compiler is different from the "eval" used by an interpreter: the version of "eval" used by a Scheme compiler performs only a partial evaluation of the code. But the compiled version of a Scheme program must have the same semantics as the interpreted version of that program.
Defining a new macro immediately changes the behavior of "eval" and "load", regardless of whether you are using a compiling "eval" or an interpreting "eval". This means that if your program defines a macro, that macro affects the behavior of the Scheme compiler itself while the program is being compiled, so that the compiled program still behaves as though it were interpreted. This allows you to apply custom optimizations or type checking. You can make the compiler perform many other kinds of code transformations as well.
If your compiled Scheme program uses "eval" somewhere at runtime, and not just at compile time, the version of "eval" that is linked into the compiled program is probably going to be the purely interpreted version of "eval", and not the version of "eval" that was used to compile the program. The interpreted version of "eval" that is linked into your compiled program will probably also be modified (parameterized) to include any macros that were in scope where the "eval" procedure was called, so as to ensure that running a compiled version of your code works identically to the interpreted version of your code.
Although the behavior of "load" is fairly consistent across all Scheme implementations, it is ultimately a very implementation-specific procedure. So please read the documentation of your Scheme implementation to better understand how "load" works. Some Scheme compilers, like Guile, automatically compile a program as soon as you "load" it. Other compilers, like Gambit, will interpret your program unless you explicitly compile it and then "load" the compiled code.
Before we begin: A note on the meaning of "REPL"
I worry that the term "REPL" nowadays is widely misunderstood. The acronym, pronounced "Reh-Pull" is usually used only to describe "interactive" REPLs for languages like Python, Ruby, Perl, and so on. But notice that the word "interactive" is not included in the REPL acronym: "Read-Eval-Print Loop." A REPL need not be interactive at all.
Lisp was the very first language to ever use a REPL, both in the interactive sense and in the non-interactive sense. In the Lisp family of languages, the "load" function is the REPL. That is to say, what "load" actually does is enter into a loop in which a single expression (which Lisp calls a "form") is read, the form is evaluated, and then the result of evaluation may (or may not) be printed. Then the loop begins again at the "read" step.
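To make the idea concrete, here is a minimal sketch of that read-eval loop in portable R7RS Scheme. The name "load-forms" is hypothetical (it is not a standard procedure); it reads from any input port rather than a named file, and it omits the "print" step. A real "load" does considerably more (locating the resource, possibly compiling and caching it), so treat this only as an illustration of the loop itself.

```scheme
(import (scheme base)
        (scheme read)   ; read
        (scheme eval)   ; eval
        (scheme repl))  ; interaction-environment

;; Hypothetical, simplified "load": READ one form, EVAL it in ENV,
;; then LOOP until the port is exhausted. There is no "print" step.
(define (load-forms port env)
  (let loop ((form (read port)))
    (unless (eof-object? form)
      (eval form env)          ; later forms can see earlier definitions
      (loop (read port)))))    ; back to the "read" step

;; Usage: here the "file" is just a string port.
(load-forms (open-input-string "(define answer (* 6 7))")
            (interaction-environment))
```

Because each form is evaluated before the next one is read, a definition made early in the input is visible to every form after it, which is exactly the "evaluates them sequentially" behavior the R7RS report describes.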
In the rest of this article, I want readers to keep in mind that when I say "REPL", I am talking about what the "load" procedure is doing, and not about an "interactive REPL" like what many programmers may have in mind when the term is usually mentioned.
What does "load" actually do?
According to the Scheme report:
The "load" procedure reads expressions and definitions from the file and evaluates them sequentially in the environment specified by environment-specifier.
(...for some arbitrary definition of "file.") Note also the use of the words "evaluates them sequentially," which implies the use of "eval" to execute code. But does a compiled Scheme program use "eval"?
Also, what is environment-specifier? Well, for now, let's think of an environment object as the state of a REPL, although environments are such a deep topic that they must be covered in a separate article (stay tuned).
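As a small illustration of passing an explicit environment object to "load", the following R7RS program writes a one-line source file, loads it into the environment returned by "interaction-environment", and then looks up the new binding with "eval". The file name "greeting.scm" is an assumption made for this sketch, and how "load" resolves its string argument is implementation-specific; this assumes the common case of a path relative to the current directory.

```scheme
(import (scheme base)
        (scheme file)   ; with-output-to-file, delete-file
        (scheme write)  ; write, display
        (scheme load)   ; load
        (scheme eval)   ; eval
        (scheme repl))  ; interaction-environment

;; Write a tiny source file so the example is self-contained.
(with-output-to-file "greeting.scm"
  (lambda ()
    (write '(define greeting "hello from greeting.scm"))))

;; R7RS specifies that interaction-environment is mutable,
;; so LOAD is allowed to add new definitions to it.
(define repl-env (interaction-environment))
(load "greeting.scm" repl-env)
(delete-file "greeting.scm")   ; the binding outlives the file

;; The new binding lives in REPL-ENV, so we look it up there.
(display (eval 'greeting repl-env))
(newline)
```

The second argument to "load" is optional; when it is omitted, R7RS says the interaction environment is assumed, so this program would behave the same without it.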
In simplest possible terms, the "load" procedure creates or updates a Scheme "environment" object with data and executable code taken from outside of the Scheme environment. If we compare Scheme to other programming languages, the behavior of "load" is as follows:
For Scheme compilers, how "load" works under the hood closely resembles how Python "import" works under the hood if your Python implementation were configured to always byte-compile a program. That is, the program is partially evaluated to produce an efficient binary representation of the program, then that binary is executed in a way that preserves the semantics of applying "load" to an interpreted form of the program.
For Scheme interpreters, "load" is analogous to the Bash/Zsh "source" command, which evaluates a file as though it were typed into the REPL directly.
But the R7RS standard does not say your Scheme implementation must necessarily operate like this at all.
When you look at it more closely, the R7RS specification really gives a lot of leeway to Scheme implementations. The "load" procedure simply takes a reference to a piece of information that the R7RS document calls a "file", which exists somewhere outside of the program, and uses that reference to modify an environment object, presumably by evaluating code in the file. The reference to this "file" must be represented by a string, but the semantics of the string are unspecified. The string could be a file path, a URL, a database query, a telephone number, a radio station call sign... it could even display a name on a marquee, signaling a trained pigeon to peck Morse code on a small red button.
(let ((game (interaction-environment)))
(load "./my-game.scm" game)
(load "~/directory/with/many/files/" game)
(load "https://example.com/scheme/my-game.scm" game)
(load "SELECT code FROM scheme-code-DB WHERE name = 'my-game'" game)
(load "(system \"/usr/bin/python -m symbol-server\")" game)
(load "(617) 258-8682" game)
(load "WMBR 88.1 FM" game)
)
;; WARNING: none of these are guaranteed to work, consult
;; the documentation for your scheme implementation for
;; how to use the "LOAD" procedure.
But of course, a useful Scheme implementation will try to execute load-ed procedures as quickly as possible, and to that end, a Scheme compiler might try to optimize the load-ed procedures by transforming textual, human-readable code into efficient binary code that can be copied directly into the working memory of the computer, and caching that binary code somewhere it can be retrieved the next time "load" is applied to the same string. But these are implementation details, not specified in, or required by, the R7RS standard.
So what is "load" used for? It is used to give you access to data/procedures taken from some resource outside the system and bind them to symbols inside of a REPL. Loading a Scheme program should behave the same regardless of whether the REPL is an interactive shell or a compiler. Again, keep in mind that the affected "REPL" might or might not be an interactive REPL like Python or a POSIX shell.
Notably, however, "load" should not be used to assemble many Scheme programs together into a larger one. For that, "define-library" and "import" should be used instead (explained later).
How "load" is unique to Lisp-like languages
When "eval" is being run by a compiler, it is not the same "eval" used by the interpreter. The "eval" of a Scheme compiler only performs a partial evaluation of the code as it is translated into a binary object program, optimizing that binary as it goes.
You can still call "eval" from a compiled procedure and run arbitrary code constructed at runtime. But to accomplish this, the compiler will usually link a purely interpreted version of "eval" for use at runtime; it will not link the version of "eval" that the compiler itself is currently using. The call to the interpreted "eval" procedure will probably also have to be updated with the set of macros that are in scope at the site at which "eval" is invoked. This is the only way for a compiler to ensure that the semantics of "eval" in the compiled program will be the same as those of the purely interpreted program.
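For example, the following sketch builds a lambda expression as ordinary list data while the program is running and hands it to "eval". Whether that evaluation is carried out by an interpreter linked into a compiled program is up to the implementation; the program itself only relies on the standard R7RS "eval" and "environment" procedures. The name "make-adder-expression" is made up for this example.

```scheme
(import (scheme base)
        (scheme eval)    ; eval, environment
        (scheme write))  ; display

;; Build a lambda expression as plain list data at runtime...
(define (make-adder-expression n)
  `(lambda (x) (+ x ,n)))

;; ...then evaluate it in an environment containing only (scheme base).
(define add-five
  (eval (make-adder-expression 5) (environment '(scheme base))))

(display (add-five 10)) ; prints 15
(newline)
```

Note that the code handed to "eval" is not visible to the compiler when the program is compiled, which is why some form of runtime evaluator has to be available in the compiled program.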
When a macro definition is evaluated in the compiler's REPL environment, it is immediately included into the set of rules used by the "eval" procedure, which is, in turn, being used by "load". So if your Scheme program defines a macro, "eval" will insert your own code transformation rules directly into the compiler itself. This may include your own custom code optimizations.
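A minimal sketch of such a transformation rule: once "load" has evaluated the "define-syntax" form below, every later "while" form in the same environment is rewritten into a named-let loop before it is evaluated or compiled. "while" is not part of R7RS; it is our own rule, shown here only as an illustration.

```scheme
(import (scheme base) (scheme write))

;; (while condition body ...) is rewritten into a named-let loop.
;; Hygiene guarantees that the macro's LOOP cannot clash with user names.
(define-syntax while
  (syntax-rules ()
    ((_ condition body ...)
     (let loop ()
       (when condition
         body ...
         (loop))))))

;; Sum the integers 1 through 10 using the new syntax.
(define total 0)
(define i 1)
(while (<= i 10)
  (set! total (+ total i))
  (set! i (+ i 1)))

(display total) ; prints 55
(newline)
```

A compiler never sees the "while" form itself, only its expansion, so any optimization it applies to named-let loops applies here too.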
But Lisp and Scheme macros go beyond simply performing compiler optimizations. The "load" procedure may encounter code that overwrites certain built-in symbols such as "define" or "let" so that type checking can be performed at binding sites. Comments could be extracted to generate documentation. Theoretically, you could define Lisp systems to perform all kinds of analysis and transformation on your code:
defining generic methods dispatched by data type, as in the Meta Object Protocol
compiling a user manual from literate code
type checking and linting
generating serialization protocols from data types
generating object-relational mappings to data types
running regression tests
gathering profiling statistics on regression tests
statistics-guided compiler optimization, perhaps even using modern machine learning techniques such as LLMs to optimize code
Note that all of the above code transformations are theoretically possible, but just because each of the above systems is possible does not necessarily mean it is easy or inexpensive to find a good Lisp system that actually does any of the above mentioned things.
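As a rough sketch of type checking at binding sites, the macro below avoids overwriting "define" itself (which is awkward to do portably) and instead introduces a separate, hypothetical "define-checked" form. Each parameter is paired with a predicate, and the expansion is an ordinary "define" whose body first checks every argument. This only checks at call time; a real type checker would do more of its work during expansion.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical DEFINE-CHECKED: (define-checked (f (x pred?) ...) body ...)
;; expands into a DEFINE that checks each argument before running BODY.
(define-syntax define-checked
  (syntax-rules ()
    ((_ (name (arg pred?) ...) body1 body2 ...)
     (define (name arg ...)
       (unless (pred? arg)
         (error "argument failed type check" 'arg arg))
       ...
       ;; LET () allows the original body to contain internal definitions.
       (let () body1 body2 ...)))))

(define-checked (add (a number?) (b number?))
  (+ a b))

(display (add 1 2)) ; prints 3
(newline)

;; A bad argument is reported before (+ a b) is ever attempted.
(display (guard (e ((error-object? e) (error-object-message e)))
           (add 1 "two")))
(newline)
```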
How is "import" different from "load"?
The Scheme "import" statement is different from "load" in a few important ways. The "import" statement:
is built-in syntax (a keyword), whereas "load" is a procedure (an executable function).
allows you to limit which symbols are imported with the "only" and "except" keywords, and to rename symbols with the "prefix" and "rename" keywords.
only modifies the current environment object and cannot be given an alternative environment object on which to operate, whereas "load" takes an environment object as an optional second argument.
is specifically designed to work with the "define-library" keyword. The "import" mechanism takes a logical library name, such as "(scheme base)" or a name that has been defined by "define-library". Contrast this with "load", which takes a string indicating some arbitrary resource external to the Scheme system.
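The four import-set modifiers mentioned above can be combined in a single "import" declaration. The following is a small sketch using only standard R7RS libraries; the prefix "w:" and the new name "upcase" are arbitrary choices for the example.

```scheme
(import
 ;; everything from (scheme base)
 (scheme base)
 ;; prefix: every (scheme write) binding gets a "w:" prefix, e.g. w:display
 (prefix (scheme write) w:)
 ;; only + rename: take just char-upcase, and bind it as "upcase"
 (rename (only (scheme char) char-upcase)
         (char-upcase upcase))
 ;; except: everything from (scheme inexact) except sqrt
 (except (scheme inexact) sqrt))

(w:display (upcase #\a)) ; prints A
(newline)
(w:display (exp 0))      ; exp comes from (scheme inexact)
(newline)
```

None of this is possible with "load", which simply evaluates whatever definitions the loaded code contains, under whatever names that code chose.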
Also, implementations differ as to whether you are allowed to use "import" in an interactive REPL or not. According to section 5.1 of the R7RS standard:
A Scheme program consists of one or more import declarations followed by a sequence of expressions and definitions.
So "import" declarations can appear at the top of a program file, but not again after the first non-import expression or definition. "import" can also be used in a "define-library" expression. But an interactive REPL is not necessarily the same REPL that "load" is given when you begin loading a program file. So here is a quick overview of the use of "import" across four of the R7RS-compliant Scheme implementations:
MIT Scheme never allows "import" in the REPL, only at the top level of a program file or in a "define-library" expression.
Guile v3 and later allows "import" in the interactive REPL, much like Python.
STkLos v2 and later is like Guile, allowing "import" statements in the interactive REPL.
Gambit v4.9.5 and later behaves like Guile and STkLos, allowing "import" to be used in the interactive REPL. However, even slightly older versions of Gambit (v4.9.3 and older) are not fully R7RS compliant with regard to "import", and do not handle "import" properly.
Specific examples
So let's see how various actual Scheme compilers compile things. I have here an example program, which can be load-ed as-is by Guile, MIT-Scheme, Gambit, and STkLos, four Scheme implementations which have made a best effort at implementing the R7RS Scheme standard. Here is the example code:
(import
 (scheme base)
 (scheme write) ;; "display" lives in (scheme write), not (scheme base)
 (only (scheme file) open-binary-input-file))
(display "This code runs at load-time.")
(newline)
(define a-number 5) ;; small objects can be stored in an environment
(define a-huge-binary-blob
  ;; Large objects can be stored in an environment as well.
  ;; Here we load a file up to (expt 2 22) bytes in size, which is 4 MiB.
  ;; If this program is compiled, the compiled program /could/ have
  ;; this entire binary blob stored within it.
  ;; "call-with-port" takes an already-open port and closes it when done;
  ;; "call-with-input-file" would expect a file name, not a port.
  (call-with-port
   (open-binary-input-file "binary-data.raw")
   (lambda (port) (read-bytevector (expt 2 22) port))))
;; Of course, executable procedures can be stored in an environment:
(define (load-time-and-runtime-code)
  (display "I want this code to run at load-time and run-time.")
  (newline))
(define (main . args)
  (display "This code runs only at run-time.")
  (newline)
  (load-time-and-runtime-code))
(display "This code also runs at load-time.")
(newline)
;; If an executable procedure is fully defined, it can be called
;; at load time as well.
(load-time-and-runtime-code)
How the Guile compiler operates
Guile compiles any file you load to bytecode, and keeps a cache of compiled bytecode objects. Once load-ed into memory, the bytecode can be JIT compiled (compiled on demand) to further improve performance while the program is running. You can force re-compilation of the cached file with the --fresh-auto-compile flag.
$ #-------------------- First invocation --------------------
$ guile --r7rs
GNU Guile 3.0.8
Copyright (C) 1995-2021 Free Software Foundation, Inc.
Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.
Enter `,help' for help.
scheme@(guile-user)> (load "example.scm")
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;; or pass the --no-auto-compile argument to disable.
;;; compiling /home/ramin/example.scm
WARNING: (guile-user): imported module (scheme base) overrides core binding `expt'
WARNING: (guile-user): imported module (scheme base) overrides core binding `expt'
;;; compiled /home/ramin/.cache/guile/ccache/3.0-LE-8-4.6/home/ramin/example.scm.go
This code runs at load-time.
This code also runs at load-time.
I want this code to run at load-time and run-time.
scheme@(guile-user)> ;; The program was compiled, and
scheme@(guile-user)> ;; we can access data/procedures in the compiled file
scheme@(guile-user)> (bytevector-u8-ref a-huge-binary-blob 0)
$1 = 0
scheme@(guile-user)> (main)
This code runs only at run-time.
I want this code to run at load-time and run-time.
scheme@(guile-user)> ,q
$ #-------------------- Second invocation --------------------
$ guile --r7rs -e main example.scm
This code runs at load time.
This code also runs at load-time.
I want this code to run at load time and runtime.
This code runs only at run-time.
I want this code to run at load time and runtime.
$ # The program was already compiled so it isn't compiled again.
Notice that on the first invocation the compiler log messages indicate the file is compiled. Then we see the load time messages displayed. Note that it reports the file path where the bytecode file is cached:
/home/ramin/.cache/guile/ccache/3.0-LE-8-4.6/home/ramin/example.scm.go
The cached object file has a ".go" extension. This is not a Go programming language file; it is a "Guile Object" file. Guile has been using this ".go" file extension since before the Go programming language was ever even invented.
On the second invocation of Guile on my example program, there are no messages from the compiler because the compiler does not run at all. Guile automatically reloads the compiled bytecode file from cache and runs it immediately. The same would happen if I had invoked the example program using "load" from the interactive REPL.
In the second invocation, I use the "-e main" command line argument to apply the main procedure after the program is loaded. We could, however, simply write "(main)" as the last line of the program file as well. We then see the load time and runtime messages displayed.
How the MIT/GNU Scheme compiler operates
MIT/GNU Scheme does everything via the REPL; it doesn't really allow you to control many of its features via command line arguments. Also, it defaults to interpretation. If you want a file to be compiled, you must explicitly compile it before load-ing it; otherwise it is interpreted. To compile a file, use the "cf" procedure (CF means "compile file"). It produces a compiled object file with a ".com" extension, and you can load this ".com" file using the "load" procedure.
$ mit-scheme
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.
Copyright (C) 2020 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even for
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Sunday March 7, 2021 at 3:24:56 PM
Release 11.2 || SF || LIAR/x86-64
1 ]=> (cf "example.scm") ;; --------------- this compiles the file
;Generating SCode for file: "example.scm" => "example.bin"...
; This program does not have a USUAL-INTEGRATIONS declaration.
; Without this declaration, the compiler will be unable to perform
; many optimizations, and as a result the compiled program will be
; slower and perhaps larger than it could be. Please read the MIT
; Scheme User's Guide for more information about USUAL-INTEGRATIONS.
;Warning: Unreferenced bound variable: args (main)
;... done
;Compiling file: "example.bin" => "example.com"... done
;Unspecified return value
1 ]=> (load "example.com") ;; ---- loading now happens much faster
;Loading "example.com"...
This code runs at load time.
This code also runs at load-time.
I want this code to run at load time and runtime.
;... done
;Unspecified return value
1 ]=> (bytevector-u8-ref a-huge-binary-blob 0) ;; data/procedures are now available
;Value: 0
1 ]=> (main)
This code runs only at run-time.
I want this code to run at load time and runtime.
;Unspecified return value
1 ]=> (exit 0) ;; ------ it is even harder to get out of than "vi"
..#]^@^@^@ NO CARRIER
$ # "No carrier." Very funny.
$ # I'm old enough to remember modems and dialup connections.
$
Notice how the load time messages are still displayed at load time even after the file has been compiled. Scheme compilers must maintain the "load" semantics even when files are compiled, and so the compiler produces executable code that works the same as it would if it were interpreted. And of course, the runtime messages are displayed as soon as the main procedure is applied.
By the way, to get rid of the message "This program does not have a USUAL-INTEGRATIONS declaration...", add this block of code to the top of the example.scm program:
(cond-expand
(mit/gnu
(declare (usual-integrations)))
(else))
Also note that the compiler generates the following files, which, according to the user manual, contain:
example.com - binary executable code
example.bci - debugging information and source locations
example.bin - an intermediate code representation (called "SCode")
How the Gambit compiler operates
NOTE: this only works for Gambit 4.9.5 or later; earlier versions will probably not work as described here.
The Gambit compiler operates the same way as MIT/GNU Scheme, interpreting code by default, although the procedure to compile a file is called "compile-file" instead of "cf". The compiler-enabled REPL is launched with the "gsc" (Gambit Scheme Compiler) command. Note that there is also a Gambit Scheme Interpreter executable called "gsi", but it does not provide the "compile-file" procedure. So the following must use "gsc":
$ gsc -:r7rs
Gambit v4.9.5
> (compile-file "example.scm") ;; ------------------- compile
"/home/ramin/example.o1"
> (load "example.o1") ;; --------------- load compiled object
This code runs at load time.
This code also runs at load-time.
I want this code to run at load time and runtime.
"/home/ramin/example.o1"
> (bytevector-u8-ref a-huge-binary-blob 0)
0
> (main) ;; --------------------------------- run the program
This code runs only at run-time.
I want this code to run at load time and runtime.
> ,q ;; --------------------------------------- exit the REPL
$
As you can see, the compiled binary object file has an ".o1" filename extension. Compiling the example again will increment the number appended to ".o", so you will see "example.o2" the next time you compile, unless you delete "example.o1".
The next time you launch the "gsc" or "gsi" interactive REPL, you can apply "load" to these compiled ".o*" object files to use the optimized binary form of your program. It is also possible to run an ".o*" object program directly using "gsi":
$ gsi ./example.o1
This code runs at load time.
This code also runs at load-time.
I want this code to run at load time and runtime.
Gambit also makes it easier to invoke compilation from the command line, and to generate programs that can be executed as stand-alone binary files. Simply use the "-exe" option, together with "-o <filepath>" to indicate the file path where the stand-alone executable should be created.
$ gsc -:r7rs -exe -o example example.scm
$ ./example
This code runs at load time.
This code also runs at load-time.
I want this code to run at load time and runtime.
$
The load time messages are still displayed when running the program, in keeping with the "load" semantics: loading evaluates every expression at the top level of the program. If you want the "main" procedure to run, add a line such as "(main)" at the end of the program to apply it.
How the STkLos compiler operates
STkLos uses the same procedure as MIT/GNU and Gambit, and produces bytecode files. Since version 2.0 (and, if I recall correctly, even as early as version 1.7), STkLos provides a "compile-file" procedure similar to Gambit's, although you must specify the path of the compilation output file.
$ stklos
\ STklos version 2.00 (stable)
\ Copyright (C) 1999-2023 Erick Gallesio
/ \ [Linux-6.1.0-21-arm64-aarch64/pthreads/readline/utf8]
/ \ Type ',h' for help
stklos> (compile-file "example.scm" "example.stklos")
stklos> (load "example.stklos")
This code runs at load time.
This code also runs at load-time.
I want this code to run at load time and runtime.
stklos> (bytevector-u8-ref a-huge-binary-blob 0)
0
stklos> (main)
This code runs only at run-time.
I want this code to run at load time and runtime.
stklos> ,q
The compiled file "example.stklos" is actually a stand-alone executable file, and if you mark this file as executable for your host OS (such as with "chmod"), you can run the compiled Scheme program directly from the host OS command line.
$ chmod 755 example.stklos
$ ./example.stklos
This code runs at load-time.
This code also runs at load-time.
I want this code to run at load-time and run-time.
This code runs only at run-time.
I want this code to run at load-time and run-time.
Notice that if you run the compiled program as a stand-alone executable, STkLos will automatically invoke any procedure called "main" after load time completes. However, "main" is not applied if you "load" the compiled program in an interactive REPL.
As with Gambit, STkLos provides an easier way to build stand-alone executables from the host OS command line, so you do not need to use "chmod". There is actually a Scheme SRFI, number 138, specifying the command line arguments that Scheme compilers should accept, and STkLos follows this SRFI, as well as providing its own options.
$ stklos-compile -o example example.scm
Compilation time 15 ms
$ ./example
This code runs at load time.
This code also runs at load-time.
I want this code to run at load time and runtime.
This code runs only at run-time.
I want this code to run at load-time and run-time.
It should also be noted that a STkLos stand-alone executable is actually just a binary blob of the STkLos bytecode form of your Scheme program. This blob is automatically invoked by a copy of the STkLos bytecode interpreter at the entry point of the stand-alone executable. So it is likely not as efficient as the compiled binary program produced by compilers like Gambit.
Conclusion
Hopefully now we have a better understanding of when to use the Scheme "load" procedure, and when to use "import" instead. Now we know that "load" updates a REPL, whether that REPL be an interactive interpreter or a compiler. The operational semantics of load-ing are to simply macro-expand and evaluate one Scheme expression at a time, and each evaluated expression updates the REPL.
Now we know that the "eval" procedure of a Scheme compiler performs only a partial evaluation of the Scheme program, emitting a more efficient representation of that program, but that a compiler's "eval" still has the same semantics as an interpreter's "eval". Macros defined at compile time are expanded the same way as they would be if they were load-ed in an interactive interpreter, meaning your program can change how it is compiled by the Scheme compiler.
And I hope you found it interesting how the various R7RS-compliant Scheme compilers each perform compilation of a Scheme program, and how, in spite of the big differences in implementation details between them, the semantics of "load" are the same across all implementations.
I hope also that if you have not tried using Scheme yet, seeing examples of how you can actually use four different Scheme implementations may have given you at least a vague idea of the different features provided by each of the compilers demonstrated above, and perhaps this may help inspire you to choose one of those Scheme compilers to use for one of your own programming projects.