Formal Syntax

The formal grammar and accompanying text appearing here describe the syntax of Scheme programs and data with Chez Scheme extensions. See the formal syntax appendix in The Scheme Programming Language, Second Edition for a grammar that excludes Chez Scheme extensions.

Details on individual syntactic forms can be found elsewhere in this book or in The Scheme Programming Language, Second Edition. A summary can be found in the Summary of Forms on page 245.

Programs and data are formed from tokens, whitespace, and comments. Tokens include the standard Scheme tokens: identifiers, booleans, numbers, characters, strings, open and close parentheses, open vector parentheses (#(), dotted pair markers (.), quotation abbreviation marks (' and `), and unquotation abbreviation marks (, and ,@). They also include some specific to Chez Scheme: open vector parentheses with explicit lengths (#n(), open and close square brackets ([ and ]), open record brackets (#[), syntax abbreviation marks (#'), open and close braces ({ and }), box prefixes (#&), #primitive abbreviations (#% and #n%), mark and reference tags (#n= and #n#), end-of-file (#!eof), the broken-weak-pointer object (#!bwp), and the fasl object prefix (#@). A few additional tokens are supported for compatibility with early dialects of Scheme: #!true, #!false, and #!null, which are equivalent to #t, #f, and (), respectively.

Whitespace consists of spaces, newlines, carriage returns, tabs, and formfeeds. A comment may consist of a semicolon ( ; ) followed by any number of characters up to the next line break. A comment may also consist of any sequence of characters enclosed in properly nested #|, |# pairs or of the prefix #; followed by any datum. A token may be surrounded by whitespace and comments. Identifiers, numbers, characters, and dot are delimited by whitespace, a comment beginning with a semicolon, an open or close parenthesis, an open or close square bracket, or a string quote.
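The three comment forms described above can be seen side by side in a short sketch. The variable name total is made up for illustration, and the sketch assumes a reader that accepts all three forms.

```scheme
; A semicolon comment runs to the end of the line.

#| A block comment may span several lines
   and may contain #| a nested block comment |#
   before it closes. |#

(define total
  (+ 1             ; comments may sit between tokens
     #;(* 100 100) ; the datum following #; is skipped by the reader
     2))
```

Since the datum after #; is discarded, total is the sum of 1 and 2 only.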

In the productions below, <empty> stands for the empty string. An item followed by an asterisk ( * ) represents zero or more occurrences of the item, and an item followed by a raised plus sign ( + ) represents one or more occurrences. Spacing between items within a production appears for readability only and should be treated as if it were not present.

Wherever matching open and close parentheses appear, matching open and close square brackets may be used instead, except within the syntax for vectors.
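As a small illustration of this rule, the two let expressions below differ only in their bracket style. The names are hypothetical and chosen only for this sketch.

```scheme
;; Binding lists written with parentheses only
(define sum-with-parens
  (let ((x 1) (y 2)) (+ x y)))

;; The same expression with square brackets around each binding
(define sum-with-brackets
  (let ([x 1] [y 2]) (+ x y)))

;; Vector syntax is the exception: it is written with #( ... ) only
(define sample-vector '#(1 2 3))
```

A common convention, though not a requirement of the grammar, is to reserve square brackets for binding pairs and clauses so that nested forms are easier to match by eye.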

Programs.  A program consists of a sequence of definitions and expressions.

<program> → <form>*
<form> → <definition> | <expression>

Definitions.  Definitions include variable and syntax definitions; begin forms containing zero or more definitions; module, import, and import-only forms; let-syntax and letrec-syntax forms expanding into zero or more definitions; and derived definitions. Derived definitions are syntactic extensions that expand into some form of definition.

<definition> → <variable definition>
    | <syntax definition>
    | (begin <definition>*)
    | <module form>
    | <import form>
    | (let-syntax (<syntax binding>*) <definition>*)
    | (letrec-syntax (<syntax binding>*) <definition>*)
    | <derived definition>
<variable definition> → (define <variable> <expression>)
    | (define <variable>)
    | (define (<variable> <variable>*) <body>)
    | (define (<variable> <variable>* . <variable>) <body>)
<variable> → <identifier>
<body> → <definition>* <expression>+
<syntax definition> → (define-syntax <keyword> <transformer expression>)
<keyword> → <identifier>
<syntax binding> → (<keyword> <transformer expression>)
<module form> → (module <module name> <interface> <definition>* <init>*)
    | (module <interface> <definition>* <init>*)
<import form> → (import <module name>)
    | (import-only <module name>)
<interface> → (<export>*)
<export> → <identifier> | (<identifier> <export>*)
<init> → <expression>

A transformer expression is an expression that evaluates to a transformer.
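The sketch below shows one instance of several of the definition productions. All names (limit, scratch, add, tag, swap!, counter, next!) are invented for illustration, and the transformer expression is assumed to be a syntax-rules form, which is only one of the possibilities.

```scheme
;; <variable definition>, in each of its four shapes
(define limit 10)                  ; (define <variable> <expression>)
(define scratch)                   ; (define <variable>), value left unspecified
(define (add a b) (+ a b))         ; fixed list of formals
(define (tag first . rest)         ; dotted formals collect extra arguments
  (cons first rest))

;; <syntax definition> whose transformer expression is a syntax-rules form
(define-syntax swap!
  (syntax-rules ()
    [(_ a b) (let ([tmp a]) (set! a b) (set! b tmp))]))

;; <module form> with a name, an interface, two definitions, and no inits
(module counter (next!)
  (define n 0)
  (define (next!) (set! n (+ n 1)) n))

;; <import form>: makes the exported binding next! visible here
(import counter)
```

After the import, next! can be called directly, while n stays private to the module because it does not appear in the interface.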

Expressions.  Expressions include core expressions, let-syntax or letrec-syntax forms expanding into a sequence of one or more expressions, and derived expressions. The core expressions are self-evaluating constants, variable references, applications, and quote, case-lambda, if, and set! expressions. Derived expressions include all other nondefinition syntactic forms described in this book, plus syntactic extensions that expand into some form of expression.

<expression> → <constant>
    | <variable>
    | (quote <datum>) | ' <datum>
    | (lambda <formals> <body>)
    | (case-lambda (<formals> <body>) ...)
    | (if <expression> <expression> <expression>) | (if <expression> <expression>)
    | (set! <variable> <expression>)
    | <application>
    | (let-syntax (<syntax binding>*) <expression>+)
    | (letrec-syntax (<syntax binding>*) <expression>+)
    | <derived expression>
<constant> → <boolean> | <number> | <character> | <string> | <special>
<formals> → <variable> | (<variable>*) | (<variable>+ . <variable>)
<application> → (<expression> <expression>*)
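To make the expression productions concrete, here is a minimal sketch with one example per form. The names are hypothetical and serve only to label each case.

```scheme
(define greeting (quote (hello world)))   ; the same datum as '(hello world)

(define square (lambda (x) (* x x)))      ; <formals> is (<variable>*)
(define count-args
  (lambda args (length args)))            ; <formals> is a single <variable>

(define area                              ; case-lambda selects a clause by argument count
  (case-lambda
    [(side) (* side side)]
    [(width height) (* width height)]))

(define (sign-of n)                       ; if with both a consequent and an alternative
  (if (< n 0) 'negative 'non-negative))

(define calls 0)
(define (note-call!)                      ; set! assigns an existing variable
  (set! calls (+ calls 1))
  calls)
```

Each call such as (square 4) or (area 2 5) is itself an instance of the application production.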

Identifiers.  Identifiers may denote variables, keywords, module names, or symbols, depending upon context. They are formed from sequences of letters, digits, and special characters. With three exceptions, standard identifiers cannot begin with a character that can also begin a number, i.e., they cannot begin with ., +, -, or a digit. The three exceptions are the identifiers ..., +, and -. Case is insignificant in identifiers so that, for example, newspaper, NewsPaper, and NEWSPAPER all represent the same identifier.

Chez Scheme extends the syntax of identifiers in several ways. Uninterned identifiers are printed with a #: prefix. Other identifiers may contain #, but not as the first character. Identifiers may also contain @, but ,@abc is parsed as (unquote-splicing abc); to produce (unquote @abc) one can type either , @abc or ,|@abc|. The single-character sequences { and } are identifiers. In addition to the exceptions listed above, an identifier may begin with ., +, -, or a digit as long as it cannot be parsed as a number. The grammar for identifiers below is thus in conflict with the grammar for numbers; where both grammars accept a given string, the datum is parsed as a number. Finally, identifiers containing arbitrary sequences of characters may be written by escaping them with \ or with | as described in Section 1.1.

<identifier> → <initial> <subsequent>*
<initial> → <letter> | <digit> | <symbol escape> | #:
    | . | + | - | ! | $ | % | & | * | / | : | < | = | > | ? | ~ | _ | ^ | @
<subsequent> → <initial> | #
<letter> → a | b | ... | z
<digit> → 0 | 1 | ... | 9
<symbol escape> → \ <any character> | | <any character other than |>* |
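A few quoted symbols illustrate the identifier rules above. This is a sketch that assumes the reader is running with the Chez Scheme extensions enabled; the variable names are made up.

```scheme
(define plain         'newspaper)
(define with-hash     'a#b)            ; # is allowed after the first character
(define looks-numeric '1+)             ; begins with a digit but cannot be parsed as a number
(define bar-escaped   '|hello world|)  ; | ... | escapes a run of characters
(define slash-escaped 'hello\ world)   ; \ escapes the single character that follows

;; ,@abc is the splicing abbreviation, while , @abc unquotes the identifier @abc
(define spliced  ',@abc)               ; the list (unquote-splicing abc)
(define unquoted ', @abc)              ; the list (unquote @abc)
```

Both escaped forms denote the same symbol, so (eq? bar-escaped slash-escaped) holds.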

Data.  Data include booleans, numbers, characters, strings, symbols, lists, vectors, boxes, records, and special objects such as #!eof. Case is insignificant in the syntax for booleans, numbers, special objects, and character names, but it is significant in other character constants and in strings. For example, #T is equivalent to #t, #E1E3 is equivalent to #e1e3, #X2aBc is equivalent to #x2abc, and #\NewLine is equivalent to #\newline; but #\A is distinct from #\a and "String" is distinct from "string".

<datum> → <boolean> | <number> | <character> | <string> | <symbol> | <special>
    | <list> | <vector> | <box> | <record>
    | <marked datum> | <mark reference>
<boolean> → #t | #f
<number> → <num 2> | <num 3> | ... | <num 36>
<character> → #\ <any character> | <named character> | <octal character>
<named character> → #\backspace | #\linefeed | #\newline | #\nul
    | #\page | #\return | #\rubout | #\space | #\tab
<octal character> → #\ <digit 8> <digit 8> <digit 8>
<string> → " <string character>* "
<string character> → \" | \\ | <any character other than " or \>
<symbol> → <identifier>
<special> → #!eof | #!bwp
<list> → ( <datum>* ) | ( <datum>+ . <datum> ) | <abbreviation>
<abbreviation> → ' <datum> | ` <datum> | , <datum> | ,@ <datum> | #' <datum>
    | #% <identifier> | #2% <identifier> | #3% <identifier>
<vector> → #( <datum>* ) | # <digit 10>+ ( <datum>* )
<box> → #& <datum>
<record> → #[ <string> <datum>* ]
<marked datum> → # <digit 10>+ = <datum>
<mark reference> → # <digit 10>+ #
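The following sketch writes out one datum for most of the productions above. The names are hypothetical, and the cyclic datum is read from a string port (using open-input-string and read) rather than quoted directly, which is a choice made for this sketch and not something the grammar requires.

```scheme
(define flags      '(#t #f))
(define chars      '(#\a #\A #\space #\newline))
(define text       "a \"quoted\" word and a backslash \\")
(define pair       '(a . b))
(define vec        '#(1 2 3))
(define padded-vec '#5(1 2))          ; vector with an explicit length of 5
(define boxed      '#&(a b))          ; a box holding the list (a b)
(define end-marker #!eof)             ; the end-of-file object

;; Abbreviations read as ordinary two-element lists
(define quoted-a   ''a)               ; the list (quote a)
(define syntax-x   '#'x)              ; the list (syntax x)

;; #0= marks a datum and #0# refers back to it, here forming a cycle
(define shared
  (read (open-input-string "#0=(x . #0#)")))
```

Here (unbox boxed) returns the list (a b), and the cdr of shared is shared itself.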

Numbers.  Numbers can appear in any radix from 2 through 36. The first several productions below are parameterized by the radix, r, and each represents 35 productions, one for each of the possible radixes. When digits can be confused for exponent markers (for example, e, d, and f in radix 16), they are treated as digits.

<num r> → <prefix r> <complex r>
<complex r> → <real r> | <real r> @ <real r> | <real r> <sreal r> i | <sreal r> i
<real r> → <ureal r> | <sreal r>
<sreal r> → <sign> <ureal r> | <sign> inf.0 | <sign> nan.0
<ureal r> → <uinteger r> | <uinteger r> / <uinteger r> | <decimal r>
<uinteger r> → <digit r>+ #*
<decimal r> → <uinteger r> <exponent r>
    | . <digit r>+ #* <suffix r>
    | <digit r>+ . <digit r>* #* <suffix r>
    | <digit r>+ #+ . #* <suffix r>
<suffix r> → <empty> | <exponent r>
<exponent r> → <exponent marker> <sign> <digit r>+ | <exponent marker> <digit r>+
<exponent marker> → e | s | f | d | l
<sign> → + | -
<prefix r> → <radix r> <exactness> | <exactness> <radix r>
<exactness> → <empty> | #i | #e
<radix 2> → #b
<radix 8> → #o
<radix 10> → <empty> | #d
<radix 16> → #x
<radix r> → #rr
<digit 2> → 0 | 1
<digit 3> → <digit 2> | 2
    ...
<digit 10> → <digit 9> | 9
<digit 11> → <digit 10> | a
<digit 12> → <digit 11> | b
    ...
<digit 36> → <digit 35> | z
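A handful of literals may help in reading the number productions. The list names are invented for this sketch, and the values noted in the comments follow from ordinary positional arithmetic.

```scheme
;; Radix prefixes
(define radix-examples
  (list #b101       ; binary, 5
        #o17        ; octal, 15
        #d42        ; decimal, the same as 42
        #x2abc      ; hexadecimal, 10940
        #36rzz))    ; general radix prefix, 35*36 + 35 = 1295

;; Exactness prefixes may come before or after the radix prefix
(define exactness-examples
  (list #e1.5       ; exact 3/2
        #i3/4       ; inexact 0.75
        #e#x1f      ; exactness first, 31
        #x#e1f))    ; radix first, 31

;; In radix 10, e is an exponent marker; in radix 16 it is a digit
(define thousand 1e3)     ; inexact 1000.0
(define hex-1e3  #x1e3)   ; 1*256 + 14*16 + 3 = 483

;; Rectangular complex syntax and the special real inf.0
(define z 3+4i)
(define infinity +inf.0)
```

The pair thousand and hex-1e3 shows the rule that a character which could be an exponent marker is read as a digit whenever it is a valid digit in the current radix.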


Chez Scheme User's Guide
© 1998 R. Kent Dybvig
Cadence Research Systems
http://www.scheme.com
Illustrations © 1998 Jean-Pierre Hébert