2 \chapter{Syntax Transformations} \label{chap:syntax}
3 \newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
4 \newcommand\mtt[1]{\mbox{\tt #1}}
5 \newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
6 \newcommand\Constant{\ttfct{Constant}}
7 \newcommand\Variable{\ttfct{Variable}}
8 \newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
9 \index{syntax!transformations|(}
11 This chapter is intended for experienced Isabelle users who need to define
12 macros or code their own translation functions. It describes the
13 transformations between parse trees, abstract syntax trees and terms.
16 \section{Abstract syntax trees} \label{sec:asts}
19 The parser, given a token list from the lexer, applies productions to yield
20 a parse tree\index{parse trees}. By applying some internal transformations
21 the parse tree becomes an abstract syntax tree, or \AST{}. Macro
22 expansion, further translations and finally type inference yields a
23 well-typed term. The printing process is the reverse, except for some
24 subtleties to be discussed later.
26 Figure~\ref{fig:parse_print} outlines the parsing and printing process.
27 Much of the complexity is due to the macro mechanism. Using macros, you
28 can specify most forms of concrete syntax without writing any \ML{} code.
34 $\downarrow$ & parser \\
36 $\downarrow$ & parse \AST{} translation \\
38 $\downarrow$ & \AST{} rewriting (macros) \\
40 $\downarrow$ & parse translation, type inference \\
41 --- well-typed term --- & \\
42 $\downarrow$ & print translation \\
44 $\downarrow$ & \AST{} rewriting (macros) \\
46 $\downarrow$ & print \AST{} translation, printer \\
51 \caption{Parsing and printing}\label{fig:parse_print}
54 Abstract syntax trees are an intermediate form between the raw parse trees
55 and the typed $\lambda$-terms. An \AST{} is either an atom (constant or
56 variable) or a list of {\em at least two\/} subtrees. Internally, they
57 have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
60 datatype ast = Constant of string
65 Isabelle uses an S-expression syntax for abstract syntax trees. Constant
66 atoms are shown as quoted strings, variable atoms as non-quoted strings and
67 applications as a parenthesised list of subtrees. For example, the \AST
69 Appl [Constant "_constrain",
70 Appl [Constant "_abs", Variable "x", Variable "t"],
71 Appl [Constant "fun", Variable "'a", Variable "'b"]]
73 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
74 Both {\tt ()} and {\tt (f)} are illegal because they have too few
77 The resemblance to Lisp's S-expressions is intentional, but there are two
78 kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the
79 names {\tt Constant} and {\tt Variable} too literally; in the later
80 translation to terms, $\Variable x$ may become a constant, free or bound
81 variable, even a type constructor or class name; the actual outcome depends
84 Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
85 application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of
86 application is determined later by context; it could be a type constructor
89 Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
90 first-order: the {\tt "_abs"} does not bind the {\tt x} in any way. Later
91 at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
92 occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
93 constructor \ttindex{Bound}).
96 \section{Transforming parse trees to \AST{}s}\label{sec:astofpt}
97 \index{ASTs!made from parse trees}
98 \newcommand\astofpt[1]{\lbrakk#1\rbrakk}
100 The parse tree is the raw output of the parser. Translation functions,
101 called {\bf parse AST translations}\indexbold{translations!parse AST},
102 transform the parse tree into an abstract syntax tree.
104 The parse tree is constructed by nesting the right-hand sides of the
105 productions used to recognize the input. Such parse trees are simply lists
106 of tokens and constituent parse trees, the latter representing the
107 nonterminals of the productions. Let us refer to the actual productions in
108 the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
111 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
112 by stripping out delimiters and copy productions. More precisely, the
113 mapping $\astofpt{-}$ is derived from the productions as follows:
115 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
116 \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{xnum} or \ndx{xstr} token, and $s$
117 its associated string. Note that for {\tt xstr} this does not include the
120 \item Copy productions:\index{productions!copy}
121 $\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for
122 strings of delimiters, which are discarded. $P$ stands for the single
123 constituent that is not a delimiter; it is either a nonterminal symbol or
126 \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
127 Here there are no constituents other than delimiters, which are
130 \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
131 the remaining constituents $P@1$, \ldots, $P@n$ are built into an
132 application whose head constant is~$c$:
133 \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
134 \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
137 Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
138 {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
139 These examples illustrate the need for further translations to make \AST{}s
140 closer to the typed $\lambda$-calculus. The Pure syntax provides
141 predefined parse \AST{} translations\index{translations!parse AST} for
142 ordinary applications, type applications, nested abstractions, meta
143 implications and function types. Figure~\ref{fig:parse_ast_tr} shows their
144 effect on some representative input strings.
149 \tt\begin{tabular}{ll}
150 \rm input string & \rm \AST \\\hline
153 "t == u" & ("==" t u) \\
154 "f(x)" & ("_appl" f x) \\
155 "f(x, y)" & ("_appl" f ("_args" x y)) \\
156 "f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\
157 "\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\
160 \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
165 \tt\begin{tabular}{ll}
166 \rm input string & \rm \AST{} \\\hline
167 "f(x, y, z)" & (f x y z) \\
169 "('a, 'b) ty" & (ty 'a 'b) \\
170 "\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\
171 "\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\
172 "[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\
173 "['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
176 \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
179 The names of constant heads in the \AST{} control the translation process.
180 The list of constants invoking parse \AST{} translations appears in the
181 output of {\tt print_syntax} under {\tt parse_ast_translation}.
184 \section{Transforming \AST{}s to terms}\label{sec:termofast}
185 \index{terms!made from ASTs}
186 \newcommand\termofast[1]{\lbrakk#1\rbrakk}
188 The \AST{}, after application of macros (see \S\ref{sec:macros}), is
189 transformed into a term. This term is probably ill-typed since type
190 inference has not occurred yet. The term may contain type constraints
191 consisting of applications with head {\tt "_constrain"}; the second
192 argument is a type encoded as a term. Type inference later introduces
193 correct types or rejects the input.
195 Another set of translation functions, namely parse
196 translations\index{translations!parse}, may affect this process. If we
197 ignore parse translations for the time being, then \AST{}s are transformed
198 to terms by mapping \AST{} constants to constants, \AST{} variables to
199 schematic or free variables and \AST{} applications to applications.
201 More precisely, the mapping $\termofast{-}$ is defined by
203 \item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
206 \item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
207 \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
208 the index extracted from~$xi$.
210 \item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
213 \item Function applications with $n$ arguments:
214 \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
216 \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
219 Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
220 \verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
221 while \ttindex{dummyT} stands for some dummy type that is ignored during
224 So far the outcome is still a first-order term. Abstractions and bound
225 variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
226 by parse translations. Such translations are attached to {\tt "_abs"},
227 {\tt "!!"} and user-defined binders.
230 \section{Printing of terms}
231 \newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
233 The output phase is essentially the inverse of the input phase. Terms are
234 translated via abstract syntax trees into strings. Finally the strings are
237 Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
238 terms into \AST{}s. Ignoring those, the transformation maps
239 term constants, variables and applications to the corresponding constructs
240 on \AST{}s. Abstractions are mapped to applications of the special
243 More precisely, the mapping $\astofterm{-}$ is defined as follows:
245 \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
247 \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
250 \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
251 \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
252 the {\tt indexname} $(x, i)$.
254 \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
255 of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
256 be obtained from~$t$ by replacing all bound occurrences of~$x$ by
257 the free variable $x'$. This replaces corresponding occurrences of the
258 constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
260 \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
261 \Appl{\Constant \mtt{"_abs"},
262 constrain(\Variable x', \tau), \astofterm{t'}}.
265 \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
266 The occurrence of constructor \ttindex{Bound} should never happen
267 when printing well-typed terms; it indicates a de Bruijn index with no
268 matching abstraction.
270 \item Where $f$ is not an application,
271 \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
272 \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
276 Type constraints\index{type constraints} are inserted to allow the printing
277 of types. This is governed by the boolean variable \ttindex{show_types}:
279 \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
280 \ttindex{show_types} is set to {\tt false}.
282 \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
283 \astofterm{\tau}}$ \ otherwise.
285 Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
286 constructors go to {\tt Constant}s; type identifiers go to {\tt
287 Variable}s; type applications go to {\tt Appl}s with the type
288 constructor as the first element. If \ttindex{show_sorts} is set to
289 {\tt true}, some type variables are decorated with an \AST{} encoding
293 The \AST{}, after application of macros (see \S\ref{sec:macros}), is
294 transformed into the final output string. The built-in {\bf print AST
295 translations}\indexbold{translations!print AST} reverse the
296 parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
298 For the actual printing process, the names attached to productions
299 of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
300 a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
301 $(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
302 for~$c$. Each argument~$x@i$ is converted to a string, and put in
303 parentheses if its priority~$(p@i)$ requires this. The resulting strings
304 and their syntactic sugar (denoted by \dots{} above) are joined to make a
307 If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
308 corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
309 x@n) ~ x@{n+1} \ldots x@m)$. Applications with too few arguments or with
310 non-constant head or without a corresponding production are printed as
311 $f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$. An occurrence of
312 $\Variable x$ is simply printed as~$x$.
314 Blanks are {\em not\/} inserted automatically. If blanks are required to
315 separate tokens, specify them in the mixfix declaration, possibly preceded
316 by a slash~({\tt/}) to allow a line break.
321 \section{Macros: Syntactic rewriting} \label{sec:macros}
322 \index{macros|(}\index{rewriting!syntactic|(}
324 Mixfix declarations alone can handle situations where there is a direct
325 connection between the concrete syntax and the underlying term. Sometimes
326 we require a more elaborate concrete syntax, such as quantifiers and list
327 notation. Isabelle's {\bf macros} and {\bf translation functions} can
328 perform translations such as
330 \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
331 ALL x:A.P & Ball(A, \%x.P) \\ \relax
332 [x, y, z] & Cons(x, Cons(y, Cons(z, Nil)))
335 Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
336 are the most powerful translation mechanism but are difficult to read or
337 write. Macros are specified by first-order rewriting systems that operate
338 on abstract syntax trees. They are usually easy to read and write, and can
339 express all but the most obscure translations.
341 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
342 theory.\footnote{This and the following theories are complete working
343 examples, though they specify only syntax, no axioms. The file {\tt
344 ZF/ZF.thy} presents a full set theory definition, including many
345 macro rules.} Theory {\tt SET} defines constants for set comprehension
346 ({\tt Collect}), replacement ({\tt Replace}) and bounded universal
347 quantification ({\tt Ball}). Each of these binds some variables. Without
348 additional syntax we should have to write $\forall x \in A. P$ as {\tt
349 Ball(A,\%x.P)}, and similarly for the others.
359 Trueprop :: "o => prop" ("_" 5)
360 Collect :: "[i, i => o] => i"
361 Replace :: "[i, [i, i] => o] => i"
362 Ball :: "[i, i => o] => o"
364 "{\at}Collect" :: "[idt, i, o] => i" ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
365 "{\at}Replace" :: "[idt, idt, i, o] => i" ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
366 "{\at}Ball" :: "[idt, i, o] => o" ("(3ALL _:_./ _)" 10)
368 "{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)"
369 "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
370 "ALL x:A. P" == "Ball(A, \%x. P)"
373 \caption{Macro example: set theory}\label{fig:set_trans}
376 The theory specifies a variable-binding syntax through additional productions
377 that have mixfix declarations. Each non-copy production must specify some
378 constant, which is used for building \AST{}s. \index{constants!syntactic} The
379 additional constants are decorated with {\tt\at} to stress their purely
380 syntactic purpose; they may not occur within the final well-typed terms,
381 being declared as {\tt syntax} rather than {\tt consts}.
383 The translations cause the replacement of external forms by internal forms
384 after parsing, and vice versa before printing of terms. As a specification
385 of the set theory notation, they should be largely self-explanatory. The
386 syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
387 appear implicitly in the macro rules via their mixfix forms.
389 Macros can define variable-binding syntax because they operate on \AST{}s,
390 which have no inbuilt notion of bound variable. The macro variables {\tt
391 x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
392 in this case bound variables. The macro variables {\tt P} and~{\tt Q}
393 range over formulae containing bound variable occurrences.
395 Other applications of the macro system can be less straightforward, and
396 there are peculiarities. The rest of this section will describe in detail
397 how Isabelle macros are preprocessed and applied.
400 \subsection{Specifying macros}
401 Macros are basically rewrite rules on \AST{}s. But unlike other macro
402 systems found in programming languages, Isabelle's macros work in both
403 directions. Therefore a syntax contains two lists of rewrites: one for
404 parsing and one for printing.
406 \index{*translations section}
407 The {\tt translations} section specifies macros. The syntax for a macro is
408 \[ (root)\; string \quad
409 \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
414 This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
415 ({\tt ==}). The two strings specify the left and right-hand sides of the
416 macro rule. The $(root)$ specification is optional; it specifies the
417 nonterminal for parsing the $string$ and if omitted defaults to {\tt
418 logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions:
420 \item Rules must be left linear: $l$ must not contain repeated variables.
422 \item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =
423 (\mtt"c\mtt" ~ x@1 \ldots x@n)$.
425 \item Every variable in~$r$ must also occur in~$l$.
428 Macro rules may refer to any syntax from the parent theories. They may
429 also refer to anything defined before the {\tt .thy} file's {\tt
430 translations} section --- including any mixfix declarations.
432 Upon declaration, both sides of the macro rule undergo parsing and parse
433 \AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
434 macro expansion. The lexer runs in a different mode that additionally
435 accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
436 {\tt _K}). Thus, a constant whose name starts with an underscore can
437 appear in macro rules but not in ordinary terms.
439 Some atoms of the macro rule's \AST{} are designated as constants for
440 matching. These are all names that have been declared as classes, types or
441 constants (logical and syntactic).
443 The result of this preprocessing is two lists of macro rules, each stored
444 as a pair of \AST{}s. They can be viewed using {\tt print_syntax}
445 (sections \ttindex{parse_rules} and \ttindex{print_rules}). For
446 theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
449 ("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P))
450 ("{\at}Replace" y x A Q) -> ("Replace" A ("_abs" x ("_abs" y Q)))
451 ("{\at}Ball" x A P) -> ("Ball" A ("_abs" x P))
453 ("Collect" A ("_abs" x P)) -> ("{\at}Collect" x A P)
454 ("Replace" A ("_abs" x ("_abs" y Q))) -> ("{\at}Replace" y x A Q)
455 ("Ball" A ("_abs" x P)) -> ("{\at}Ball" x A P)
459 Avoid choosing variable names that have previously been used as
460 constants, types or type classes; the {\tt consts} section in the output
461 of {\tt print_syntax} lists all such names. If a macro rule works
462 incorrectly, inspect its internal form as shown above, recalling that
463 constants appear as quoted strings and variables without quotes.
467 If \ttindex{eta_contract} is set to {\tt true}, terms will be
468 $\eta$-contracted {\em before\/} the \AST{} rewriter sees them. Thus some
469 abstraction nodes needed for print rules to match may vanish. For example,
470 \verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
471 not apply and the output will be {\tt Ball(A, P)}. This problem would not
472 occur if \ML{} translation functions were used instead of macros (as is
473 done for binder declarations).
478 Another trap concerns type constraints. If \ttindex{show_types} is set to
479 {\tt true}, bound variables will be decorated by their meta types at the
480 binding place (but not at occurrences in the body). Matching with
481 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
482 "i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to
483 appear in the external form, say \verb|{y::i:A::i. P::o}|.
485 To allow such constraints to be re-read, your syntax should specify bound
486 variables using the nonterminal~\ndx{idt}. This is the case in our
487 example. Choosing {\tt id} instead of {\tt idt} is a common error,
488 especially since it appears in former versions of most of Isabelle's
494 \subsection{Applying rules}
495 As a term is being parsed or printed, an \AST{} is generated as an
496 intermediate form (recall Fig.\ts\ref{fig:parse_print}). The \AST{} is
497 normalised by applying macro rules in the manner of a traditional term
498 rewriting system. We first examine how a single rule is applied.
500 Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some
501 translation rule. A subtree~$u$ of $t$ is a {\bf redex} if it is an
502 instance of~$l$; in this case $l$ is said to {\bf match}~$u$. A redex
503 matched by $l$ may be replaced by the corresponding instance of~$r$, thus
504 {\bf rewriting} the \AST~$t$. Matching requires some notion of {\bf
505 place-holders} that may occur in rule patterns but not in ordinary
506 \AST{}s; {\tt Variable} atoms serve this purpose.
508 The matching of the object~$u$ by the pattern~$l$ is performed as follows:
510 \item Every constant matches itself.
512 \item $\Variable x$ in the object matches $\Constant x$ in the pattern.
513 This point is discussed further below.
515 \item Every \AST{} in the object matches $\Variable x$ in the pattern,
518 \item One application matches another if they have the same number of
519 subtrees and corresponding subtrees match.
521 \item In every other case, matching fails. In particular, {\tt
522 Constant}~$x$ can only match itself.
524 A successful match yields a substitution that is applied to~$r$, generating
525 the instance that replaces~$u$.
527 The second case above may look odd. This is where {\tt Variable}s of
528 non-rule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not
529 far removed from parse trees; at this level it is not yet known which
530 identifiers will become constants, bounds, frees, types or classes. As
531 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
532 {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},
533 \ndx{tvar}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s. On the other
534 hand, when \AST{}s generated from terms for printing, all constants and type
535 constructors become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s may
536 contain a messy mixture of {\tt Variable}s and {\tt Constant}s. This is
537 insignificant at macro level because matching treats them alike.
539 Because of this behaviour, different kinds of atoms with the same name are
540 indistinguishable, which may make some rules prone to misbehaviour. Example:
547 "[]" :: "'a list" ("[]")
551 The term {\tt Nil} will be printed as {\tt []}, just as expected.
552 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
553 expected! Guess how type~{\tt Nil} is printed?
555 Normalizing an \AST{} involves repeatedly applying macro rules until none
556 are applicable. Macro rules are chosen in the order that they appear in the
557 {\tt translations} section. You can watch the normalization of \AST{}s
558 during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
559 {\tt true}.\index{tracing!of macros} Alternatively, use
560 \ttindex{Syntax.test_read}. The information displayed when tracing
561 includes the \AST{} before normalization ({\tt pre}), redexes with results
562 ({\tt rewrote}), the normal form finally reached ({\tt post}) and some
563 statistics ({\tt normalize}). If tracing is off,
564 \ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
565 printing of the normal form and statistics only.
568 \subsection{Example: the syntax of finite sets}
569 \index{examples!of macros}
571 This example demonstrates the use of recursive macros to implement a
572 convenient notation for finite sets.
573 \index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}
574 \index{"@Enum@{\tt\at Enum} constant}
575 \index{"@Finset@{\tt\at Finset} constant}
581 "" :: "i => is" ("_")
582 "{\at}Enum" :: "[i, is] => is" ("_,/ _")
584 empty :: "i" ("{\ttlbrace}{\ttrbrace}")
585 insert :: "[i, i] => i"
587 "{\at}Finset" :: "is => i" ("{\ttlbrace}(_){\ttrbrace}")
589 "{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})"
590 "{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})"
593 Finite sets are internally built up by {\tt empty} and {\tt insert}. The
594 declarations above specify \verb|{x, y, z}| as the external representation
597 insert(x, insert(y, insert(z, empty)))
599 The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
600 i} separated by commas. The mixfix declaration \hbox{\verb|"_,/ _"|}
601 allows a line break after the comma for \rmindex{pretty printing}; if no
602 line break is required then a space is printed instead.
604 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
605 declaration. Hence {\tt is} is not a logical type and may be used safely as
606 a new nonterminal for custom syntax. The nonterminal~{\tt is} can later be
607 re-used for other enumerations of type~{\tt i} like lists or tuples. If we
608 had needed polymorphic enumerations, we could have used the predefined
609 nonterminal symbol \ndx{args} and skipped this part altogether.
611 \index{"@Finset@{\tt\at Finset} constant}
612 Next follows {\tt empty}, which is already equipped with its syntax
613 \verb|{}|, and {\tt insert} without concrete syntax. The syntactic
614 constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
615 i} enclosed in curly braces. Remember that a pair of parentheses, as in
616 \verb|"{(_)}"|, specifies a block of indentation for pretty printing.
618 The translations may look strange at first. Macro rules are best
619 understood in their internal forms:
622 ("{\at}Finset" ("{\at}Enum" x xs)) -> ("insert" x ("{\at}Finset" xs))
623 ("{\at}Finset" x) -> ("insert" x "empty")
625 ("insert" x ("{\at}Finset" xs)) -> ("{\at}Finset" ("{\at}Enum" x xs))
626 ("insert" x "empty") -> ("{\at}Finset" x)
628 This shows that \verb|{x,xs}| indeed matches any set enumeration of at least
629 two elements, binding the first to {\tt x} and the rest to {\tt xs}.
630 Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.
631 The parse rules only work in the order given.
634 The \AST{} rewriter cannot distinguish constants from variables and looks
635 only for names of atoms. Thus the names of {\tt Constant}s occurring in
636 the (internal) left-hand side of translation rules should be regarded as
637 \rmindex{reserved words}. Choose non-identifiers like {\tt\at Finset} or
638 sufficiently long and strange names. If a bound variable's name gets
639 rewritten, the result will be incorrect; for example, the term
641 \%empty insert. insert(x, empty)
643 \par\noindent is incorrectly printed as \verb|%empty insert. {x}|.
647 \subsection{Example: a parse macro for dependent types}\label{prod_trans}
648 \index{examples!of macros}
650 As stated earlier, a macro rule may not introduce new {\tt Variable}s on
651 the right-hand side. Something like \verb|"K(B)" => "%x.B"| is illegal;
652 if allowed, it could cause variable capture. In such cases you usually
653 must fall back on translation functions. But a trick can make things
654 readable in some cases: {\em calling\/} translation functions by parse
659 Pi :: "[i, i => i] => i"
661 "{\at}PROD" :: "[idt, i, i] => i" ("(3PROD _:_./ _)" 10)
662 "{\at}->" :: "[i, i] => i" ("(_ ->/ _)" [51, 50] 50)
665 "PROD x:A. B" => "Pi(A, \%x. B)"
666 "A -> B" => "Pi(A, _K(B))"
669 val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
672 Here {\tt Pi} is a logical constant for constructing general products.
673 Two external forms exist: the general case {\tt PROD x:A.B} and the
674 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
675 does not depend on~{\tt x}.
677 The second parse macro introduces {\tt _K(B)}, which later becomes
678 \verb|%x.B| due to a parse translation associated with \cdx{_K}.
679 Unfortunately there is no such trick for printing, so we have to add a {\tt
680 ML} section for the print translation \ttindex{dependent_tr'}.
682 Recall that identifiers with a leading {\tt _} are allowed in translation
683 rules, but not in ordinary terms. Thus we can create \AST{}s containing
684 names that are not directly expressible.
686 The parse translation for {\tt _K} is already installed in Pure, and {\tt
687 dependent_tr'} is exported by the syntax module for public use. See
688 \S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
689 \index{macros|)}\index{rewriting!syntactic|)}
692 \section{Translation functions} \label{sec:tr_funs}
693 \index{translations|(}
695 This section describes the translation function mechanism. By writing
696 \ML{} functions, you can do almost everything with terms or \AST{}s during
697 parsing and printing. The logic \LK\ is a good example of sophisticated
698 transformations between internal and external representations of sequents;
699 here, macros would be useless.
701 A full understanding of translations requires some familiarity
702 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
703 {\tt Syntax.ast} and the encodings of types and terms as such at the various
704 stages of the parsing or printing process. Most users should never need to
705 use translation functions.
707 \subsection{Declaring translation functions}
708 There are four kinds of translation functions. Each such function is
709 associated with a name, which triggers calls to it. Such names can be
710 constants (logical or syntactic) or type constructors.
712 {\tt print_syntax} displays the sets of names associated with the
713 translation functions of a {\tt Syntax.syntax} under
714 \ttindex{parse_ast_translation}, \ttindex{parse_translation},
715 \ttindex{print_translation} and \ttindex{print_ast_translation}. You can
716 add new ones via the {\tt ML} section\index{*ML section} of
717 a {\tt .thy} file. There may never be more than one function of the same
718 kind per name. Even though the {\tt ML} section is the very last part of a
719 {\tt .thy} file, newly installed translation functions are effective when
720 processing the preceding section.
722 The {\tt ML} section is copied verbatim near the beginning of the \ML\ file
723 generated from a {\tt .thy} file. Definitions made here are accessible as
724 components of an \ML\ structure; to make some definitions private, use an
725 \ML{} {\tt local} declaration. The {\tt ML} section may install translation
726 functions by declaring any of the following identifiers:
728 val parse_ast_translation : (string * (ast list -> ast)) list
729 val print_ast_translation : (string * (ast list -> ast)) list
730 val parse_translation : (string * (term list -> term)) list
731 val print_translation : (string * (term list -> term)) list
734 \subsection{The translation strategy}
735 All four kinds of translation functions are treated similarly. They are
736 called during the transformations between parse trees, \AST{}s and terms
737 (recall Fig.\ts\ref{fig:parse_print}). Whenever a combination of the form
738 $(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function
739 $f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
740 function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
742 For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s. A
743 combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
744 x@n}$. For term translations, the arguments are terms and a combination
745 has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
746 x@1 \ttapp \ldots \ttapp x@n$. Terms allow more sophisticated
747 transformations than \AST{}s do, typically involving abstractions and bound
750 Regardless of whether they act on terms or \AST{}s, parse translations differ
751 from print translations in their overall behaviour fundamentally:
753 \item[Parse translations] are applied bottom-up. The arguments are already
754 in translated form. The translations must not fail; exceptions trigger
757 \item[Print translations] are applied top-down. They are supplied with
758 arguments that are partly still in internal form. The result again
759 undergoes translation; therefore a print translation should not introduce
760 as head the very constant that invoked it. The function may raise
761 exception \xdx{Match} to indicate failure; in this event it has no
765 Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
766 \ttindex{Const} for terms --- can invoke translation functions. This
767 causes another difference between parsing and printing.
769 Parsing starts with a string and the constants are not yet identified.
770 Only parse tree heads create {\tt Constant}s in the resulting \AST, as
771 described in \S\ref{sec:astofpt}. Macros and parse \AST{} translations may
772 introduce further {\tt Constant}s. When the final \AST{} is converted to a
773 term, all {\tt Constant}s become {\tt Const}s, as described in
774 \S\ref{sec:termofast}.
776 Printing starts with a well-typed term and all the constants are known. So
777 all logical constants and type constructors may invoke print translations.
778 These, and macros, may introduce further constants.
781 \subsection{Example: a print translation for dependent types}
782 \index{examples!of translations}\indexbold{*dependent_tr'}
784 Let us continue the dependent type example (page~\pageref{prod_trans}) by
785 examining the parse translation for~\cdx{_K} and the print translation
786 {\tt dependent_tr'}, which are both built-in. By convention, parse
787 translations have names ending with {\tt _tr} and print translations have
788 names ending with {\tt _tr'}. Search for such names in the Isabelle
789 sources to locate more examples.
791 Here is the parse translation for {\tt _K}:
793 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
794 | k_tr ts = raise TERM ("k_tr", ts);
796 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
797 {\tt Abs} node with a body derived from $t$. Since terms given to parse
798 translations are not yet typed, the type of the bound variable in the new
799 {\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound}
800 nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
801 a basic term manipulation function defined in {\tt Pure/term.ML}.
803 Here is the print translation for dependent types:
805 fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
806 if 0 mem (loose_bnos B) then
807 let val (x', B') = variant_abs (x, dummyT, B);
808 in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
810 else list_comb (Const (r, dummyT) $ A $ B, ts)
811 | dependent_tr' _ _ = raise Match;
813 The argument {\tt (q,r)} is supplied to the curried function {\tt
814 dependent_tr'} by a partial application during its installation. We
815 can set up print translations for both {\tt Pi} and {\tt Sigma} by
817 \begin{ttbox}\index{*ML section}
818 val print_translation =
819 [("Pi", dependent_tr' ("{\at}PROD", "{\at}->")),
820 ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
822 within the {\tt ML} section. The first of these transforms ${\tt Pi}(A,
823 \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
824 $\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not depend
825 on~$x$. It checks this using \ttindex{loose_bnos}, yet another function
826 from {\tt Pure/term.ML}. Note that $x'$ is a version of $x$ renamed away
827 from all names in $B$, and $B'$ is the body $B$ with {\tt Bound} nodes
828 referring to the {\tt Abs} node replaced by $\ttfct{Free} (x',
831 We must be careful with types here. While types of {\tt Const}s are
832 ignored, type constraints may be printed for some {\tt Free}s and
833 {\tt Var}s if \ttindex{show_types} is set to {\tt true}. Variables of type
834 \ttindex{dummyT} are never printed with constraint, though. The line
836 let val (x', B') = variant_abs (x, dummyT, B);
837 \end{ttbox}\index{*variant_abs}
838 replaces bound variable occurrences in~$B$ by the free variable $x'$ with
839 type {\tt dummyT}. Only the binding occurrence of~$x'$ is given the
840 correct type~{\tt T}, so this is the only place where a type
841 constraint might appear.
842 \index{translations|)}
843 \index{syntax!transformations|)}