Concentric rings from a crescent shaped prototile.

Chapter 1. Introduction

Scheme is a general-purpose computer programming language. It is a high-level language, supporting operations on structured data such as strings, lists, and vectors, as well as operations on more traditional data such as numbers and characters. While Scheme is often identified with symbolic applications, its rich set of data types and flexible control structures make it a truly versatile language. Scheme has been employed to write text editors, optimizing compilers, operating systems, graphics packages, expert systems, numerical applications, financial analysis packages, and virtually every other type of application imaginable. Scheme is a fairly simple language to learn, since it is based on a handful of syntactic forms and semantic concepts and since the interactive nature of most implementations encourages experimentation. Scheme is a challenging language to understand fully, however; developing the ability to use its full potential requires careful study and practice.

Scheme programs are highly portable across implementations of the same Scheme system on different machines, because machine dependencies are almost completely hidden from the programmer. Also, because of two related Scheme standardization efforts, it is possible to use a standard dialect of Scheme to write programs that are portable across different Scheme implementations. The standard dialect is defined by an ANSI/IEEE Standard described in the "IEEE Standard for the Scheme Programming Language" [15]. The ANSI/IEEE standard grew out of an ongoing effort by a group of Scheme designers, who have published a series of less formal reports, the "Revised Reports" on Scheme. The most recent revised report, the "Revised⁴ Report" [4], describes a dialect very close to the standard dialect, differing primarily in the addition of a few primitives. A "Revised⁵ Report" is expected to appear soon, highlighted by the addition of a high-level syntactic extension system, an eval procedure, and multiple return values.

Although some early Scheme systems were inefficient and slow, many newer compiler-based implementations are fast, with programs running on par with equivalent programs written in lower-level languages. The relative inefficiency that sometimes remains results from run-time checks that help the programmer detect and correct various common programming errors. These checks may be disabled in most implementations.

Scheme handles data values quite differently from most languages. Data values, or objects, are dynamically allocated in a heap where they are retained until no longer needed, then automatically deallocated. Objects are first-class data values; because they are heap-allocated and retained indefinitely, they may be passed freely as arguments to procedures, returned as values from procedures, and combined to form new objects. This is in contrast with most other languages where composite data values such as arrays are either statically allocated and never deallocated, allocated on entry to a block of code and unconditionally deallocated on exit from the block, or explicitly allocated and deallocated by the programmer.

Scheme supports many types of objects, including numbers, characters, strings, symbols, and lists or vectors of objects. A full set of numeric data types, including complex, real, and arbitrary-precision rational numbers, allows Scheme to support many numerical applications typically coded in lower-level languages.

At the heart of the Scheme language is a small core of syntactic forms from which all other forms are built. These core forms, a set of extended syntactic forms derived from them, and a library of primitive procedures make up the full Scheme language. An interpreter or compiler for Scheme can be quite small, and potentially fast and highly reliable. The extended syntactic forms and many primitive procedures can be defined in Scheme itself, simplifying the implementation and increasing reliability.

Scheme programs share a common printed representation with Scheme data structures. As a result, any Scheme program has a natural and obvious internal representation as a Scheme object. For example, variables and syntactic keywords correspond to symbols, while structured syntactic forms correspond to lists. This representation is the basis for the syntactic extension facilities provided by most Scheme systems for the definition of new syntactic forms in terms of existing syntactic forms and procedures. It also facilitates the implementation of interpreters, compilers, and other program transformation tools for Scheme directly in Scheme, as well as program transformation tools for other languages in Scheme.

Scheme variables and keywords are lexically scoped, and Scheme programs are block-structured. Identifiers may be bound at top level (as are the names of primitive Scheme procedures and syntactic forms) or locally, within a given block of code. A local binding is visible only lexically, i.e., within the program text that makes up the particular block of code. An occurrence of an identifier of the same name outside this block refers to a different binding; if no binding for the identifier exists outside of the block, then the reference is invalid. Blocks may be nested, and a binding in one block may shadow a binding for an identifier of the same name in a surrounding block. The scope of a binding is the block in which the bound identifier is visible minus any portions of the block in which the identifier is shadowed. Block structure and lexical scoping help create programs that are modular, easy to read, easy to maintain, and reliable. Efficient code for lexical scoping is possible because a compiler can determine before program evaluation the scope of all bindings and the binding to which each identifier reference resolves. This does not mean, of course, that a compiler can determine the values of all variables, since the actual values are not computed in most cases until the program executes.

In most languages, a procedure definition is simply the association of a name with a block of code. Certain variables local to the block are the parameters of the procedure. In some languages, a procedure definition may appear within another block or procedure so long as the procedure is invoked only during execution of the enclosing block. In others, procedures can be defined only at top level. In Scheme, a procedure definition may appear within another block or procedure, and the procedure may be invoked at any time thereafter, even if the enclosing block has completed its execution. To support lexical scoping, a procedure carries the lexical context (environment) along with its code.

Furthermore, Scheme procedures are not always named. Instead, procedures are first-class data objects like strings or numbers, and variables are bound to procedures in the same way they are bound to other objects.

As with procedures in most other languages, Scheme procedures may be recursive. That is, any procedure may invoke itself directly or indirectly. Many algorithms are most elegantly or efficiently specified recursively. A special case of recursion, called tail recursion, is used to express iteration, or looping. A tail call occurs when one procedure directly returns the result of invoking another procedure; tail recursion occurs when a procedure recursively tail calls itself, directly or indirectly. Scheme implementations are required to implement tail calls as jumps (gotos), so the storage overhead normally associated with recursion is avoided. As a result, Scheme programmers need master only simple procedure calls and recursion and need not be burdened with the usual assortment of looping constructs.

Scheme supports the definition of arbitrary control structures with continuations. A continuation is a procedure that embodies the remainder of a program at a given point in the program. When a continuation is invoked, the program immediately continues from that point. A continuation may be obtained at any time during the execution of a program. As with other procedures, a continuation is a first-class object and may be invoked at any time after its creation. Continuations allow the implementation of complex control mechanisms including explicit backtracking, multithreading, and coroutines.

Many Scheme implementations support the syntax-rules syntactic extension (macro) system adopted for inclusion in the Revised⁵ Report on Scheme. This system allows programmers to define extended syntactic forms in terms of existing syntactic forms using a convenient high-level pattern language. Of those implementations that do not support syntax-rules, virtually all provide some other mechanism for defining extended syntactic forms. Syntactic extensions are useful for defining new language constructs, for emulating language constructs found in other languages, for achieving the effects of in-line code expansion, and even for emulating entire languages in Scheme. Most large Scheme programs are built from a mix of syntactic extensions and procedure definitions.

Scheme evolved from the Lisp language and is considered to be a dialect of Lisp. Scheme inherited from Lisp the treatment of values as first-class objects, several important data types, including symbols and lists, and the representation of programs as objects, among other things. Lexical scoping and block structure are features taken from Algol 60 [18]. Scheme was the first Lisp dialect to adopt lexical scoping and block structure, the notion of first-class procedures, treatment of tail calls as jumps, and continuations.

Common Lisp [22] and Scheme are both contemporary Lisp languages, and the development of each has been influenced by the other. Like Scheme but unlike earlier Lisp languages, Common Lisp adopted lexical scoping and first-class procedures. Common Lisp's evaluation rules for procedures are different from the evaluation rules for other objects, however, and it maintains a separate namespace for procedure variables, thereby discouraging the use of procedures as first-class objects. Also, Common Lisp does not support continuations or require proper treatment of tail calls, but it does support several less general control structures not found in Scheme. While the two languages are similar, Common Lisp includes more specialized operators, while Scheme includes more general-purpose building blocks out of which such operators (and others) may be built.

The remainder of this chapter describes Scheme's syntax and naming conventions and the typographical conventions used throughout this book.

Section 1.1. Scheme Syntax

Scheme programs are made up of keywords, variables, structured forms, constant data (numbers, characters, strings, quoted vectors, quoted lists, quoted symbols, etc.), whitespace, and comments.

Keywords, variables, and symbols are collectively called identifiers. Identifiers may be formed from the following set of characters:

the lowercase letters a through z,
the uppercase letters A through Z,
the digits 0 through 9, and
the characters ? ! . + - * / < = > : $ % ^ & _ ~.

Identifiers normally cannot start with any character that may start a number, i.e., a digit, plus sign ( + ), minus sign ( - ), or decimal point ( . ). Exceptions are +, -, and ..., which are valid identifiers. For example, hi, Hello, n, x, x3, and ?$&*!!! are all identifiers. Identifiers must be delimited by whitespace, parentheses, a string (double) quote ( " ), or the comment character ( ; ). All implementations must recognize as identifiers any sequences of characters that adhere to these rules. Other sequences of characters, such as -1234a, that do not represent numbers or other syntactic entities may be recognized as identifiers in some implementations, although it is best to avoid such identifiers in code that may need to run in more than one Scheme system.

There is no inherent limit on the length of a Scheme identifier; programmers may use as many characters as necessary. Long identifiers are no substitute for comments, however, and frequent use of long identifiers can make a program difficult to format and consequently difficult to read.

Identifiers may be written in any mix of uppercase and lowercase letters. The case is not important, in that two identifiers differing only in case are identical. For example, abcde, Abcde, AbCdE, and ABCDE all refer to the same identifier. Scheme systems typically print an identifier in either all uppercase or all lowercase letters regardless of the way it is entered.

Structured forms and list constants are enclosed within parentheses, e.g., (a b c) or (* (- x 2) y). The empty list is written (). Some implementations permit the use of brackets ( [ ] ) in place of parentheses, and brackets are sometimes used to set off particular subexpressions for readability.

The boolean values representing true and false are written as #t and #f. Scheme conditional expressions actually treat #f as false and all other objects as true.

The ANSI/IEEE standard requires that () and #f be distinct objects, but the Revised⁴ Report allows them to be the same. If they are the same, () counts, naturally, as false; otherwise, it counts as true. Scheme implementations in which () and #f are the same object always choose one way or the other to write the object when it is printed, which can lead to some confusion. This book always uses () for the empty list and #f for false.

Vectors are written similarly to lists, except that they are preceded by #( and terminated by ), e.g., #(this is a vector of symbols). Strings are enclosed in double quotation marks, e.g., "I am a string". Characters are preceded by #\, e.g., #\a. Case is important within character and string constants, unlike within identifiers. Numbers may be written as integers, e.g., -123, as ratios, e.g., 1/2, in floating-point or scientific notation, e.g., 1.3 or 1e23, or as complex numbers in rectangular or polar notation, e.g., 1.3-2.7i or -1.2@73. Details of the syntax for each type of constant data are given in the individual sections of Chapter 6 and in the formal syntax of Scheme given in the back of the book.

Scheme expressions may span several lines, and no explicit terminator is required. Since the number of whitespace characters (spaces and newlines) between expressions is not significant, Scheme programs are normally indented to show the structure of the code in a way that is pleasing to the author of the program. Comments may appear on any line of a Scheme program, between a semicolon ( ; ) and the end of the line. Comments explaining a particular Scheme expression are normally placed at the same indentation level as the expression, on the line before the expression. Comments explaining a procedure or group of procedures are normally placed before the procedures, without indentation. Multiple comment characters are often used to set off the latter kind of comment, e.g., ;;; The following procedures ....

Section 1.2. Scheme Naming Conventions

Scheme's naming conventions are designed to provide a high degree of regularity. The following is a list of these naming conventions:

Predicate names end in a question mark ( ? ). Predicates are procedures that return with a true or false answer, such as eq?, zero?, and string=?. The common numeric comparators =, <, >, <=, and >= are exceptions to this rule.
Type predicates, such as pair?, are created from the name of the type, in this case pair, and the question mark.
The names of most character, string, and vector procedures start with the prefix char-, string-, and vector-, e.g., string-append. (The names of some list procedures start with list-, but most do not.)
The names of procedures that convert an object of one type into an object of another type are written as type₁->type₂, e.g., vector->list.
The names of procedures and syntactic forms that cause side effects end with an exclamation point ( ! ). These include set! and vector-set!. Procedures that perform input or output technically cause side effects, but their names are exceptions to this rule.

Section 1.3. Typographical and Notational Conventions

Often, the value of a procedure or syntactic form is said to be unspecified. This means that an implementation is free to return any Scheme object as the value of the procedure or syntactic form. Do not count on this value being the same across implementations, the same across versions of the same implementation, or even the same across two uses of the procedure or syntactic form. Some Scheme systems routinely use a special object to represent unspecified values. Printing of this object is often suppressed by interactive Scheme systems, so that the values of expressions returning unspecified values are not printed.

Scheme expressions usually evaluate to a single value, although the multiple values mechanism described in Section 5.8 allows an expression to evaluate to zero or more than one value. To simplify the presentation, this book usually refers to the result of an expression as a single value even if the expression may in fact evaluate to zero or more than one value.

This book sometimes says "it is an error" or "an error will be signaled" when describing a circumstance in violation of the rules of Scheme. Something that is an error is not valid in Scheme, and the behavior of a Scheme implementation in such a case is not specified. A signaled error results in the invocation of an implementation-dependent error handler, which typically results in an error message being printed and a reset of the interactive programming system or entry into a debugging subsystem.

The typographic conventions used in this book are straightforward. All Scheme objects are printed in a typewriter typeface, just as they are to be typed at the keyboard. This includes syntactic keywords, variables, constant objects, Scheme expressions, and example programs. An italic typeface is used to set off syntax variables in the descriptions of syntactic forms and arguments in the descriptions of procedures. Italics are also used to set off technical terms the first time they appear. In general, names written in typewriter font are never capitalized (even at the beginning of a sentence). The same is true for syntax variables written in italics.

In the description of a syntactic form or procedure, a pattern shows the syntactic form or the application of the procedure. The syntax keyword or procedure name is given in typewriter font, as are parentheses. The remaining pieces of the syntax or arguments are shown in italics, using a name that implies the type of expression or argument expected by the syntactic form or procedure. Ellipses are used to specify zero or more occurrences of a subexpression or argument. For example, (or exp ...) describes the or syntactic form, which has zero or more subexpressions, and (member obj list) describes the member procedure, which expects two arguments, an object and a list.

R. Kent Dybvig
The Scheme Programming Language, Second Edition
© 1996. Electronically reproduced by permission of Prentice Hall, Upper Saddle River, New Jersey.
http://www.scheme.com
Illustrations © 1997 Jean-Pierre Hébert
to order this book
about this book