Formal Syntax of Scheme

The formal grammars and accompanying text appearing here describe the syntax of Scheme programs and data. Consult the Summary of Forms and Procedures and the individual descriptions given in Chapters 4 through 8 for additional details on specific syntactic forms.

Programs and data are formed from tokens, whitespace, and comments. Tokens include identifiers, booleans, numbers, characters, strings, open and close parentheses, the open vector parenthesis #(, the dotted pair marker . (dot), the quotation marks ' and `, and the unquotation marks , and ,@. Whitespace consists of spaces and newline characters; some implementations also treat other characters, such as tabs or form feeds, as whitespace. A comment consists of a semicolon ( ; ) followed by any number of characters up to the next line break. A token may be surrounded by any number of whitespace characters and comments. Identifiers, numbers, characters, and dot are delimited by whitespace, the start of a comment, an open or close parenthesis, or a string quote.
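As an illustration of how tokens, whitespace, and comments combine, the following sketch exercises most of the token types listed above. The variable names are made up for the example, and it assumes a reader that follows the rules just described.

```scheme
; A comment runs from the semicolon to the end of the line.
(define vec '#(1 2 3))                    ; #( opens a vector; ' is a quotation mark
(define pair '(a . b))                    ; . is the dotted pair marker
(define lst `(1 ,(+ 1 1) ,@(list 3 4)))   ; ` , and ,@ in one expression

; Whitespace between tokens is free-form, so an application
; may be spread across several lines.
(define total
  (+ 1
     2      ; comments may appear between tokens
     3))

; A string quote delimits the identifier that precedes it,
; so no space is needed after string-length here.
(define no-space (string-length"abc"))
```

Here lst should be the list (1 2 3 4), total should be 6, and no-space should be 3.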

In the productions below, <empty> stands for the empty string. An item followed by an asterisk ( * ) represents zero or more occurrences of the item, and an item followed by a raised plus sign ( + ) represents one or more occurrences. Spacing between items within a production appears for readability only and should be treated as if it were not present.

Programs.  A program consists of a sequence of definitions and expressions.

<program> → <form>*
<form> → <definition> | <expression>

Definitions.  Definitions include variable and syntax definitions, begin forms containing zero or more definitions, let-syntax and letrec-syntax forms expanding into zero or more definitions, and derived definitions. Derived definitions are syntactic extensions that expand into some form of definition. A transformer expression is a syntax-rules form or some other expression that produces a transformer.

<definition> → <variable definition>
    | <syntax definition>
    | (begin <definition>*)
    | (let-syntax (<syntax binding>*) <definition>*)
    | (letrec-syntax (<syntax binding>*) <definition>*)
    | <derived definition>
<variable definition> → (define <variable> <expression>)
    | (define (<variable> <variable>*) <body>)
    | (define (<variable> <variable>* . <variable>) <body>)
<variable> → <identifier>
<body> → <definition>* <expression>+
<syntax definition> → (define-syntax <keyword> <transformer expression>)
<keyword> → <identifier>
<syntax binding> → (<keyword> <transformer expression>)
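The following sketch gives one instance of several of the definition forms above. All of the names (x, square, sum-all, hypotenuse, swap!, and so on) are invented for the example, and the syntax definition assumes an implementation that supports syntax-rules.

```scheme
(define x 10)                      ; (define <variable> <expression>)

(define (square n) (* n n))        ; (define (<variable> <variable>*) <body>)

(define (sum-all first . rest)     ; dotted form: rest receives the extra arguments
  (apply + first rest))

(define (hypotenuse a b)
  (define (sq n) (* n n))          ; a <body> may begin with definitions
  (sqrt (+ (sq a) (sq b))))

(define-syntax swap!               ; <syntax definition>
  (syntax-rules ()
    ((_ a b) (let ((tmp a)) (set! a b) (set! b tmp)))))

(begin                             ; (begin <definition>*)
  (define p 1)
  (define q 2))

(swap! p q)                        ; p is now 2 and q is now 1
```

For instance, (square 4) should return 16 and (sum-all 1 2 3) should return 6.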

Expressions.  Expressions include core expressions, let-syntax or letrec-syntax forms expanding into a sequence of one or more expressions, and derived expressions. The core expressions are self-evaluating constants, variable references, applications, and quote, lambda, if, and set! expressions. Derived expressions include and, begin, case, cond, delay, do, let, let*, letrec, or, and quasiquote expressions plus syntactic extensions that expand into some form of expression.

<expression> → <constant>
    | <variable>
    | (quote <datum>) | ' <datum>
    | (lambda <formals> <body>)
    | (if <expression> <expression> <expression>) | (if <expression> <expression>)
    | (set! <variable> <expression>)
    | <application>
    | (let-syntax (<syntax binding>*) <expression>+)
    | (letrec-syntax (<syntax binding>*) <expression>+)
    | <derived expression>
<constant> → <boolean> | <number> | <character> | <string>
<formals> → <variable> | (<variable>*) | (<variable>+ . <variable>)
<application> → (<expression> <expression>*)
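The core expression forms can be seen side by side in a small sketch such as the one below. The names counter and results are made up for the example; each element of results corresponds to one alternative of the grammar.

```scheme
(define counter 0)

(define results
  (list
    42                                  ; <constant>
    counter                             ; <variable>
    (quote (a b))                       ; (quote <datum>)
    '(a b)                              ; ' <datum>, the same datum
    ((lambda (x y) (+ x y)) 1 2)        ; <formals> is (<variable>*)
    ((lambda args args) 1 2)            ; <formals> is a single <variable>
    ((lambda (x . rest) rest) 1 2 3)    ; <formals> is (<variable>+ . <variable>)
    (if (> 1 0) 'yes 'no)))             ; if with both branches

(set! counter (+ counter 1))            ; (set! <variable> <expression>)
(if (= counter 1) (set! counter 10))    ; if with no alternative
```

After these forms are evaluated, results should be (42 0 (a b) (a b) 3 (1 2) (2 3) yes) and counter should be 10.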

Identifiers.  Identifiers may denote variables, keywords, or symbols, depending upon context. They are formed from sequences of letters, digits, and special characters. With three exceptions, identifiers cannot begin with a character that can also begin a number, i.e., they cannot begin with ., +, -, or a digit. The three exceptions are the identifiers ..., +, and -. Case is insignificant in symbols so that, for example, newspaper, NewsPaper, and NEWSPAPER all represent the same identifier.

<identifier> → <initial> <subsequent>* | + | - | ...
<initial> → <letter> | ! | $ | % | & | * | / | : | < | = | > | ? | ~ | _ | ^
<subsequent> → <initial> | <digit> | . | + | -
<letter> → a | b | ... | z
<digit> → 0 | 1 | ... | 9
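One way to check a string against the identifier productions is sketched below. The procedure names are hypothetical and are not part of any standard library. The sketch also relies on char-alphabetic?, which accepts uppercase letters and, in some implementations, letters outside a through z, so it is slightly more permissive than the grammar.

```scheme
;; Special characters allowed by <initial> and, additionally, by <subsequent>.
(define special-initials (string->list "!$%&*/:<=>?~_^"))
(define special-subsequents (string->list ".+-"))

(define (initial? c)
  (or (char-alphabetic? c)
      (and (memv c special-initials) #t)))

(define (subsequent? c)
  (or (initial? c)
      (char-numeric? c)
      (and (memv c special-subsequents) #t)))

(define (all-subsequent? chars)
  (or (null? chars)
      (and (subsequent? (car chars))
           (all-subsequent? (cdr chars)))))

;; True if s matches <initial> <subsequent>* or is one of the
;; three exceptional identifiers +, -, and ...
(define (identifier-string? s)
  (or (string=? s "+")
      (string=? s "-")
      (string=? s "...")
      (and (> (string-length s) 0)
           (initial? (string-ref s 0))
           (all-subsequent? (cdr (string->list s))))))
```

Under these assumptions, (identifier-string? "list->vector") and (identifier-string? "...") should return #t, while (identifier-string? "1+") and (identifier-string? "+x") should return #f.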

Data.  Data include booleans, numbers, characters, strings, symbols, lists, and vectors. Case is insignificant in the syntax for booleans, numbers, and character names, but it is significant in other character constants and in strings. For example, #T is equivalent to #t, #E1E3 is equivalent to #e1e3, #X2aBc is equivalent to #x2abc, and #\NewLine is equivalent to #\newline; but #\A is distinct from #\a and "String" is distinct from "string".

<datum> → <boolean> | <number> | <character> | <string> | <symbol> | <list> | <vector>
<boolean> → #t | #f
<number> → <num 2> | <num 8> | <num 10> | <num 16>
<character> → #\ <any character> | #\newline | #\space
<string> → " <string character>* "
<string character> → \" | \\ | <any character other than " or \>
<symbol> → <identifier>
<list> → (<datum>*) | (<datum>+ . <datum>) | <abbreviation>
<abbreviation> → ' <datum> | ` <datum> | , <datum> | ,@ <datum>
<vector> → #(<datum>*)
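The sketch below writes out one datum of each kind and then shows how the four abbreviations are read. The names samples and abbreviations are made up for the example.

```scheme
(define samples
  (list #t #f                          ; <boolean>
        #\a #\space #\newline          ; <character>
        "say \"hi\" with a \\"         ; <string> using both escapes
        'apple                         ; <symbol>
        '(1 2 3)                       ; <list>, proper
        '(1 2 . 3)                     ; <list>, dotted
        '#(1 "two" #\3)))              ; <vector>

;; Each <abbreviation> is read as a two-element list whose first
;; element is quote, quasiquote, unquote, or unquote-splicing.
(define abbreviations
  (list ''a '`a ',a ',@a))
```

Here abbreviations should be equal to the list ((quote a) (quasiquote a) (unquote a) (unquote-splicing a)), although an implementation may print each element in its abbreviated form.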

Numbers.  Numbers can appear in one of four radixes: 2, 8, 10, and 16, with 10 the default. The first several productions below are parameterized by the radix, r, and each represents four productions, one for each of the four possible radixes. Numbers that contain radix points or exponents are constrained to appear in radix 10, so <decimal r> is valid only when r is 10.

<num r> → <prefix r> <complex r>
<complex r> → <real r> | <real r> @ <real r>
    | <real r> + <imag r> | <real r> - <imag r>
    | + <imag r> | - <imag r>
<imag r> → i | <ureal r> i
<real r> → <sign> <ureal r>
<ureal r> → <uinteger r> | <uinteger r> / <uinteger r> | <decimal r>
<uinteger r> → <digit r>+ #*
<prefix r> → <radix r> <exactness> | <exactness> <radix r>
<decimal 10> → <uinteger 10> <exponent>
    | . <digit 10>+ #* <suffix>
    | <digit 10>+ . <digit 10>* #* <suffix>
    | <digit 10>+ #+ . #* <suffix>
<suffix> → <empty> | <exponent>
<exponent> → <exponent marker> <sign> <digit 10>+
<exponent marker> → e | s | f | d | l
<sign> → <empty> | + | -
<exactness> → <empty> | #i | #e
<radix 2> → #b
<radix 8> → #o
<radix 10> → <empty> | #d
<radix 16> → #x
<digit 2> → 0 | 1
<digit 8> → 0 | 1 | ... | 7
<digit 10> → <digit>
<digit 16> → <digit> | a | b | c | d | e | f
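A few numeric constants written according to these productions are sketched below. The variable names are made up for the example. The sketch assumes an implementation with exact rational arithmetic and leaves out complex constants, # placeholder digits, and the s, f, d, and l exponent markers, since support for those varies.

```scheme
;; The same integer written in each radix; 10 and #d10 are both radix 10.
(define ten-in-each-radix (list #b1010 #o12 10 #d10 #xA))

;; An exactness prefix may be combined with a <decimal 10> or a ratio.
(define exact-from-decimal #e1.5)      ; the exact rational 3/2
(define inexact-from-ratio #i3/4)      ; the inexact number 0.75

;; <prefix r> allows the radix and exactness prefixes in either order.
(define both-prefix-orders (list #x#e10 #e#x10))

(define with-exponent 1e3)             ; <uinteger 10> <exponent>
(define signed-hex #x-ff)              ; the <sign> follows the prefix
```

Every element of ten-in-each-radix should equal 10, both elements of both-prefix-orders should equal 16, and signed-hex should equal -255.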


R. Kent Dybvig
The Scheme Programming Language, Second Edition
© 1996. Electronically reproduced by permission of Prentice Hall, Upper Saddle River, New Jersey.
http://www.scheme.com
Illustrations © 1997 Jean-Pierre Hébert