2 \chapter{Syntax Transformations} \label{chap:syntax}
3 \newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
4 \newcommand\mtt[1]{\mbox{\tt #1}}
5 \newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
6 \newcommand\Constant{\ttfct{Constant}}
7 \newcommand\Variable{\ttfct{Variable}}
8 \newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
9 \index{syntax!transformations|(}
11 This chapter is intended for experienced Isabelle users who need to define
12 macros or code their own translation functions. It describes the
13 transformations between parse trees, abstract syntax trees and terms.
16 \section{Abstract syntax trees} \label{sec:asts}
19 The parser, given a token list from the lexer, applies productions to yield
20 a parse tree\index{parse trees}. By applying some internal transformations
21 the parse tree becomes an abstract syntax tree, or \AST{}. Macro
22 expansion, further translations and finally type inference yields a
23 well-typed term. The printing process is the reverse, except for some
24 subtleties to be discussed later.
26 Figure~\ref{fig:parse_print} outlines the parsing and printing process.
27 Much of the complexity is due to the macro mechanism. Using macros, you
28 can specify most forms of concrete syntax without writing any \ML{} code.
34 $\downarrow$ & parser \\
36 $\downarrow$ & parse \AST{} translation \\
38 $\downarrow$ & \AST{} rewriting (macros) \\
40 $\downarrow$ & parse translation, type inference \\
41 --- well-typed term --- & \\
42 $\downarrow$ & print translation \\
44 $\downarrow$ & \AST{} rewriting (macros) \\
46 $\downarrow$ & print \AST{} translation, printer \\
51 \caption{Parsing and printing}\label{fig:parse_print}
54 Abstract syntax trees are an intermediate form between the raw parse trees
55 and the typed $\lambda$-terms. An \AST{} is either an atom (constant or
56 variable) or a list of {\em at least two\/} subtrees. Internally, they
57 have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
60 datatype ast = Constant of string
65 Isabelle uses an S-expression syntax for abstract syntax trees. Constant
66 atoms are shown as quoted strings, variable atoms as non-quoted strings and
67 applications as a parenthesized list of subtrees. For example, the \AST
69 Appl [Constant "_constrain",
70 Appl [Constant "_abs", Variable "x", Variable "t"],
71 Appl [Constant "fun", Variable "'a", Variable "'b"]]
73 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
74 Both {\tt ()} and {\tt (f)} are illegal because they have too few
77 The resemblance to Lisp's S-expressions is intentional, but there are two
78 kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the
79 names {\tt Constant} and {\tt Variable} too literally; in the later
80 translation to terms, $\Variable x$ may become a constant, free or bound
81 variable, even a type constructor or class name; the actual outcome depends
84 Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
85 application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of
86 application is determined later by context; it could be a type constructor
89 Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
90 first-order: the {\tt "_abs"} does not bind the {\tt x} in any way. Later
91 at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
92 occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
93 constructor \ttindex{Bound}).
96 \section{Transforming parse trees to \AST{}s}\label{sec:astofpt}
97 \index{ASTs!made from parse trees}
98 \newcommand\astofpt[1]{\lbrakk#1\rbrakk}
100 The parse tree is the raw output of the parser. Translation functions,
101 called {\bf parse AST translations}\indexbold{translations!parse AST},
102 transform the parse tree into an abstract syntax tree.
104 The parse tree is constructed by nesting the right-hand sides of the
105 productions used to recognize the input. Such parse trees are simply lists
106 of tokens and constituent parse trees, the latter representing the
107 nonterminals of the productions. Let us refer to the actual productions in
108 the form displayed by {\tt Syntax.print_syntax}.
110 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
111 by stripping out delimiters and copy productions. More precisely, the
112 mapping $\astofpt{-}$ is derived from the productions as follows:
114 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
115 \ndx{var}, \ndx{tid} or \ndx{tvar} token, and $s$ its associated string.
117 \item Copy productions:\index{productions!copy}
118 $\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for
119 strings of delimiters, which are discarded. $P$ stands for the single
120 constituent that is not a delimiter; it is either a nonterminal symbol or
123 \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
124 Here there are no constituents other than delimiters, which are
127 \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
128 the remaining constituents $P@1$, \ldots, $P@n$ are built into an
129 application whose head constant is~$c$:
130 \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
131 \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
134 Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
135 {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
136 These examples illustrate the need for further translations to make \AST{}s
137 closer to the typed $\lambda$-calculus. The Pure syntax provides
138 predefined parse \AST{} translations\index{translations!parse AST} for
139 ordinary applications, type applications, nested abstractions, meta
140 implications and function types. Figure~\ref{fig:parse_ast_tr} shows their
141 effect on some representative input strings.
146 \tt\begin{tabular}{ll}
147 \rm input string & \rm \AST \\\hline
150 "t == u" & ("==" t u) \\
151 "f(x)" & ("_appl" f x) \\
152 "f(x, y)" & ("_appl" f ("_args" x y)) \\
153 "f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\
154 "\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\
157 \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
162 \tt\begin{tabular}{ll}
163 \rm input string & \rm \AST{} \\\hline
164 "f(x, y, z)" & (f x y z) \\
166 "('a, 'b) ty" & (ty 'a 'b) \\
167 "\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\
168 "\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\
169 "[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\
170 "['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
173 \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
176 The names of constant heads in the \AST{} control the translation process.
177 The list of constants invoking parse \AST{} translations appears in the
178 output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
181 \section{Transforming \AST{}s to terms}\label{sec:termofast}
182 \index{terms!made from ASTs}
183 \newcommand\termofast[1]{\lbrakk#1\rbrakk}
185 The \AST{}, after application of macros (see \S\ref{sec:macros}), is
186 transformed into a term. This term is probably ill-typed since type
187 inference has not occurred yet. The term may contain type constraints
188 consisting of applications with head {\tt "_constrain"}; the second
189 argument is a type encoded as a term. Type inference later introduces
190 correct types or rejects the input.
192 Another set of translation functions, namely parse
193 translations\index{translations!parse}, may affect this process. If we
194 ignore parse translations for the time being, then \AST{}s are transformed
195 to terms by mapping \AST{} constants to constants, \AST{} variables to
196 schematic or free variables and \AST{} applications to applications.
198 More precisely, the mapping $\termofast{-}$ is defined by
200 \item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
203 \item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
204 \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
205 the index extracted from~$xi$.
207 \item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
210 \item Function applications with $n$ arguments:
211 \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
213 \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
216 Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
217 \verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
218 while \ttindex{dummyT} stands for some dummy type that is ignored during
221 So far the outcome is still a first-order term. Abstractions and bound
222 variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
223 by parse translations. Such translations are attached to {\tt "_abs"},
224 {\tt "!!"} and user-defined binders.
227 \section{Printing of terms}
228 \newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
230 The output phase is essentially the inverse of the input phase. Terms are
231 translated via abstract syntax trees into strings. Finally the strings are
234 Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
235 terms into \AST{}s. Ignoring those, the transformation maps
236 term constants, variables and applications to the corresponding constructs
237 on \AST{}s. Abstractions are mapped to applications of the special
240 More precisely, the mapping $\astofterm{-}$ is defined as follows:
242 \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
244 \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
247 \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
248 \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
249 the {\tt indexname} $(x, i)$.
251 \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
252 of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
253 be obtained from~$t$ by replacing all bound occurrences of~$x$ by
254 the free variable $x'$. This replaces corresponding occurrences of the
255 constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
257 \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
258 \Appl{\Constant \mtt{"_abs"},
259 constrain(\Variable x', \tau), \astofterm{t'}}.
262 \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
263 The occurrence of constructor \ttindex{Bound} should never happen
264 when printing well-typed terms; it indicates a de Bruijn index with no
265 matching abstraction.
267 \item Where $f$ is not an application,
268 \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
269 \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
273 Type constraints are inserted to allow the printing of types. This is
274 governed by the boolean variable \ttindex{show_types}:
276 \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
277 \ttindex{show_types} not set to {\tt true}.
279 \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
280 \astofterm{\tau}}$ \ otherwise.
282 Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
283 constructors go to {\tt Constant}s; type identifiers go to {\tt
284 Variable}s; type applications go to {\tt Appl}s with the type
285 constructor as the first element. If \ttindex{show_sorts} is set to
286 {\tt true}, some type variables are decorated with an \AST{} encoding
290 The \AST{}, after application of macros (see \S\ref{sec:macros}), is
291 transformed into the final output string. The built-in {\bf print AST
292 translations}\indexbold{translations!print AST} effectively reverse the
293 parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
295 For the actual printing process, the names attached to productions
296 of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
297 a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
298 $(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
299 for~$c$. Each argument~$x@i$ is converted to a string, and put in
300 parentheses if its priority~$(p@i)$ requires this. The resulting strings
301 and their syntactic sugar (denoted by \dots{} above) are joined to make a
304 If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
305 corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
306 x@n) ~ x@{n+1} \ldots x@m)$. Applications with too few arguments or with
307 non-constant head or without a corresponding production are printed as
308 $f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$. An occurrence of
309 $\Variable x$ is simply printed as~$x$.
311 Blanks are {\em not\/} inserted automatically. If blanks are required to
312 separate tokens, specify them in the mixfix declaration, possibly preceded
313 by a slash~({\tt/}) to allow a line break.
318 \section{Macros: Syntactic rewriting} \label{sec:macros}
319 \index{macros|(}\index{rewriting!syntactic|(}
321 Mixfix declarations alone can handle situations where there is a direct
322 connection between the concrete syntax and the underlying term. Sometimes
323 we require a more elaborate concrete syntax, such as quantifiers and list
324 notation. Isabelle's {\bf macros} and {\bf translation functions} can
325 perform translations such as
327 \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
328 ALL x:A.P & Ball(A, \%x.P) \\ \relax
329 [x, y, z] & Cons(x, Cons(y, Cons(z, Nil)))
332 Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
333 are the most powerful translation mechanism but are difficult to read or
334 write. Macros are specified by first-order rewriting systems that operate
335 on abstract syntax trees. They are usually easy to read and write, and can
336 express all but the most obscure translations.
338 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
339 theory.\footnote{This and the following theories are complete working
340 examples, though they specify only syntax, no axioms. The file {\tt
341 ZF/zf.thy} presents the full set theory definition, including many
342 macro rules.} Theory {\tt SET} defines constants for set comprehension
343 ({\tt Collect}), replacement ({\tt Replace}) and bounded universal
344 quantification ({\tt Ball}). Each of these binds some variables. Without
345 additional syntax we should have to express $\forall x \in A. P$ as {\tt
346 Ball(A,\%x.P)}, and similarly for the others.
356 Trueprop :: "o => prop" ("_" 5)
357 Collect :: "[i, i => o] => i"
358 "{\at}Collect" :: "[idt, i, o] => i" ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
359 Replace :: "[i, [i, i] => o] => i"
360 "{\at}Replace" :: "[idt, idt, i, o] => i" ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
361 Ball :: "[i, i => o] => o"
362 "{\at}Ball" :: "[idt, i, o] => o" ("(3ALL _:_./ _)" 10)
364 "{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)"
365 "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
366 "ALL x:A. P" == "Ball(A, \%x. P)"
369 \caption{Macro example: set theory}\label{fig:set_trans}
372 The theory specifies a variable-binding syntax through additional
373 productions that have mixfix declarations. Each non-copy production must
374 specify some constant, which is used for building \AST{}s.
375 \index{constants!syntactic} The additional constants are decorated with
376 {\tt\at} to stress their purely syntactic purpose; they should never occur
377 within the final well-typed terms. Furthermore, they cannot be written in
378 formulae because they are not legal identifiers.
380 The translations cause the replacement of external forms by internal forms
381 after parsing, and vice versa before printing of terms. As a specification
382 of the set theory notation, they should be largely self-explanatory. The
383 syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
384 appear implicitly in the macro rules via their mixfix forms.
386 Macros can define variable-binding syntax because they operate on \AST{}s,
387 which have no inbuilt notion of bound variable. The macro variables {\tt
388 x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
389 in this case bound variables. The macro variables {\tt P} and~{\tt Q}
390 range over formulae containing bound variable occurrences.
392 Other applications of the macro system can be less straightforward, and
393 there are peculiarities. The rest of this section will describe in detail
394 how Isabelle macros are preprocessed and applied.
397 \subsection{Specifying macros}
398 Macros are basically rewrite rules on \AST{}s. But unlike other macro
399 systems found in programming languages, Isabelle's macros work in both
400 directions. Therefore a syntax contains two lists of rewrites: one for
401 parsing and one for printing.
403 \index{*translations section}
404 The {\tt translations} section specifies macros. The syntax for a macro is
405 \[ (root)\; string \quad
406 \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
411 This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
412 ({\tt ==}). The two strings specify the left and right-hand sides of the
413 macro rule. The $(root)$ specification is optional; it specifies the
414 nonterminal for parsing the $string$ and if omitted defaults to {\tt
415 logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions:
417 \item Rules must be left linear: $l$ must not contain repeated variables.
419 \item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =
420 (\mtt"c\mtt" ~ x@1 \ldots x@n)$.
422 \item Every variable in~$r$ must also occur in~$l$.
425 Macro rules may refer to any syntax from the parent theories. They may
426 also refer to anything defined before the the {\tt .thy} file's {\tt
427 translations} section --- including any mixfix declarations.
429 Upon declaration, both sides of the macro rule undergo parsing and parse
430 \AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
431 macro expansion. The lexer runs in a different mode that additionally
432 accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
433 {\tt _K}). Thus, a constant whose name starts with an underscore can
434 appear in macro rules but not in ordinary terms.
436 Some atoms of the macro rule's \AST{} are designated as constants for
437 matching. These are all names that have been declared as classes, types or
440 The result of this preprocessing is two lists of macro rules, each stored
441 as a pair of \AST{}s. They can be viewed using {\tt Syntax.print_syntax}
442 (sections \ttindex{parse_rules} and \ttindex{print_rules}). For
443 theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
446 ("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P))
447 ("{\at}Replace" y x A Q) -> ("Replace" A ("_abs" x ("_abs" y Q)))
448 ("{\at}Ball" x A P) -> ("Ball" A ("_abs" x P))
450 ("Collect" A ("_abs" x P)) -> ("{\at}Collect" x A P)
451 ("Replace" A ("_abs" x ("_abs" y Q))) -> ("{\at}Replace" y x A Q)
452 ("Ball" A ("_abs" x P)) -> ("{\at}Ball" x A P)
456 Avoid choosing variable names that have previously been used as
457 constants, types or type classes; the {\tt consts} section in the output
458 of {\tt Syntax.print_syntax} lists all such names. If a macro rule works
459 incorrectly, inspect its internal form as shown above, recalling that
460 constants appear as quoted strings and variables without quotes.
464 If \ttindex{eta_contract} is set to {\tt true}, terms will be
465 $\eta$-contracted {\em before\/} the \AST{} rewriter sees them. Thus some
466 abstraction nodes needed for print rules to match may vanish. For example,
467 \verb|Ball(A, %x. P(x))| contracts {\tt Ball(A, P)}; the print rule does
468 not apply and the output will be {\tt Ball(A, P)}. This problem would not
469 occur if \ML{} translation functions were used instead of macros (as is
470 done for binder declarations).
475 Another trap concerns type constraints. If \ttindex{show_types} is set to
476 {\tt true}, bound variables will be decorated by their meta types at the
477 binding place (but not at occurrences in the body). Matching with
478 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
479 "i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to
480 appear in the external form, say \verb|{y::i:A::i. P::o}|.
482 To allow such constraints to be re-read, your syntax should specify bound
483 variables using the nonterminal~\ndx{idt}. This is the case in our
484 example. Choosing {\tt id} instead of {\tt idt} is a common error,
485 especially since it appears in former versions of most of Isabelle's
491 \subsection{Applying rules}
492 As a term is being parsed or printed, an \AST{} is generated as an
493 intermediate form (recall Fig.\ts\ref{fig:parse_print}). The \AST{} is
494 normalized by applying macro rules in the manner of a traditional term
495 rewriting system. We first examine how a single rule is applied.
497 Let $t$ be the abstract syntax tree to be normalized and $(l, r)$ some
498 translation rule. A subtree~$u$ of $t$ is a {\bf redex} if it is an
499 instance of~$l$; in this case $l$ is said to {\bf match}~$u$. A redex
500 matched by $l$ may be replaced by the corresponding instance of~$r$, thus
501 {\bf rewriting} the \AST~$t$. Matching requires some notion of {\bf
502 place-holders} that may occur in rule patterns but not in ordinary
503 \AST{}s; {\tt Variable} atoms serve this purpose.
505 The matching of the object~$u$ by the pattern~$l$ is performed as follows:
507 \item Every constant matches itself.
509 \item $\Variable x$ in the object matches $\Constant x$ in the pattern.
510 This point is discussed further below.
512 \item Every \AST{} in the object matches $\Variable x$ in the pattern,
515 \item One application matches another if they have the same number of
516 subtrees and corresponding subtrees match.
518 \item In every other case, matching fails. In particular, {\tt
519 Constant}~$x$ can only match itself.
521 A successful match yields a substitution that is applied to~$r$, generating
522 the instance that replaces~$u$.
524 The second case above may look odd. This is where {\tt Variable}s of
525 non-rule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not
526 far removed from parse trees; at this level it is not yet known which
527 identifiers will become constants, bounds, frees, types or classes. As
528 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
529 {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid} and
530 \ndx{tvar} become {\tt Variable}s. On the other hand, when \AST{}s
531 generated from terms for printing, all constants and type constructors
532 become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s may contain a
533 messy mixture of {\tt Variable}s and {\tt Constant}s. This is
534 insignificant at macro level because matching treats them alike.
536 Because of this behaviour, different kinds of atoms with the same name are
537 indistinguishable, which may make some rules prone to misbehaviour. Example:
543 "[]" :: "'a list" ("[]")
547 The term {\tt Nil} will be printed as {\tt []}, just as expected.
548 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
551 Normalizing an \AST{} involves repeatedly applying macro rules until none
552 is applicable. Macro rules are chosen in the order that they appear in the
553 {\tt translations} section. You can watch the normalization of \AST{}s
554 during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
555 {\tt true}.\index{tracing!of macros} Alternatively, use
556 \ttindex{Syntax.test_read}. The information displayed when tracing
557 includes the \AST{} before normalization ({\tt pre}), redexes with results
558 ({\tt rewrote}), the normal form finally reached ({\tt post}) and some
559 statistics ({\tt normalize}). If tracing is off,
560 \ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
561 printing of the normal form and statistics only.
564 \subsection{Example: the syntax of finite sets}
565 \index{examples!of macros}
567 This example demonstrates the use of recursive macros to implement a
568 convenient notation for finite sets.
569 \index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}
570 \index{"@Enum@{\tt\at Enum} constant}
571 \index{"@Finset@{\tt\at Finset} constant}
577 "" :: "i => is" ("_")
578 "{\at}Enum" :: "[i, is] => is" ("_,/ _")
579 empty :: "i" ("{\ttlbrace}{\ttrbrace}")
580 insert :: "[i, i] => i"
581 "{\at}Finset" :: "is => i" ("{\ttlbrace}(_){\ttrbrace}")
583 "{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})"
584 "{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})"
587 Finite sets are internally built up by {\tt empty} and {\tt insert}. The
588 declarations above specify \verb|{x, y, z}| as the external representation
591 insert(x, insert(y, insert(z, empty)))
593 The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
594 i} separated by commas. The mixfix declaration \hbox{\verb|"_,/ _"|}
595 allows a line break after the comma for \rmindex{pretty printing}; if no
596 line break is required then a space is printed instead.
598 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
599 declaration. Hence {\tt is} is not a logical type and no default
600 productions are added. If we had needed enumerations of the nonterminal
601 {\tt logic}, which would include all the logical types, we could have used
602 the predefined nonterminal symbol \ndx{args} and skipped this part
603 altogether. The nonterminal~{\tt is} can later be reused for other
604 enumerations of type~{\tt i} like lists or tuples.
606 \index{"@Finset@{\tt\at Finset} constant}
607 Next follows {\tt empty}, which is already equipped with its syntax
608 \verb|{}|, and {\tt insert} without concrete syntax. The syntactic
609 constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
610 i} enclosed in curly braces. Remember that a pair of parentheses, as in
611 \verb|"{(_)}"|, specifies a block of indentation for pretty printing.
613 The translations may look strange at first. Macro rules are best
614 understood in their internal forms:
617 ("{\at}Finset" ("{\at}Enum" x xs)) -> ("insert" x ("{\at}Finset" xs))
618 ("{\at}Finset" x) -> ("insert" x "empty")
620 ("insert" x ("{\at}Finset" xs)) -> ("{\at}Finset" ("{\at}Enum" x xs))
621 ("insert" x "empty") -> ("{\at}Finset" x)
623 This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
624 two elements, binding the first to {\tt x} and the rest to {\tt xs}.
625 Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.
626 The parse rules only work in the order given.
629 The \AST{} rewriter cannot discern constants from variables and looks
630 only for names of atoms. Thus the names of {\tt Constant}s occurring in
631 the (internal) left-hand side of translation rules should be regarded as
632 \rmindex{reserved words}. Choose non-identifiers like {\tt\at Finset} or
633 sufficiently long and strange names. If a bound variable's name gets
634 rewritten, the result will be incorrect; for example, the term
636 \%empty insert. insert(x, empty)
638 gets printed as \verb|%empty insert. {x}|.
642 \subsection{Example: a parse macro for dependent types}\label{prod_trans}
643 \index{examples!of macros}
645 As stated earlier, a macro rule may not introduce new {\tt Variable}s on
646 the right-hand side. Something like \verb|"K(B)" => "%x. B"| is illegal;
647 if allowed, it could cause variable capture. In such cases you usually
648 must fall back on translation functions. But a trick can make things
649 readable in some cases: {\em calling translation functions by parse
654 Pi :: "[i, i => i] => i"
655 "{\at}PROD" :: "[idt, i, i] => i" ("(3PROD _:_./ _)" 10)
656 "{\at}->" :: "[i, i] => i" ("(_ ->/ _)" [51, 50] 50)
659 "PROD x:A. B" => "Pi(A, \%x. B)"
660 "A -> B" => "Pi(A, _K(B))"
663 val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
666 Here {\tt Pi} is an internal constant for constructing general products.
667 Two external forms exist: the general case {\tt PROD x:A.B} and the
668 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
669 does not depend on~{\tt x}.
671 The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B|
672 due to a parse translation associated with \cdx{_K}. The order of the
673 parse rules is critical. Unfortunately there is no such trick for
674 printing, so we have to add a {\tt ML} section for the print translation
675 \ttindex{dependent_tr'}.
677 Recall that identifiers with a leading {\tt _} are allowed in translation
678 rules, but not in ordinary terms. Thus we can create \AST{}s containing
679 names that are not directly expressible.
681 The parse translation for {\tt _K} is already installed in Pure, and {\tt
682 dependent_tr'} is exported by the syntax module for public use. See
683 \S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
684 \index{macros|)}\index{rewriting!syntactic|)}
687 \section{Translation functions} \label{sec:tr_funs}
688 \index{translations|(}
690 This section describes the translation function mechanism. By writing
691 \ML{} functions, you can do almost everything with terms or \AST{}s during
692 parsing and printing. The logic \LK\ is a good example of sophisticated
693 transformations between internal and external representations of
694 associative sequences; here, macros would be useless.
696 A full understanding of translations requires some familiarity
697 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
698 {\tt Syntax.ast} and the encodings of types and terms as such at the various
699 stages of the parsing or printing process. Most users should never need to
700 use translation functions.
702 \subsection{Declaring translation functions}
703 There are four kinds of translation functions. Each such function is
704 associated with a name, which triggers calls to it. Such names can be
705 constants (logical or syntactic) or type constructors.
707 {\tt Syntax.print_syntax} displays the sets of names associated with the
708 translation functions of a {\tt Syntax.syntax} under
709 \ttindex{parse_ast_translation}, \ttindex{parse_translation},
710 \ttindex{print_translation} and \ttindex{print_ast_translation}. You can
711 add new ones via the {\tt ML} section\index{*ML section} of
712 a {\tt .thy} file. There may never be more than one function of the same
713 kind per name. Conceptually, the {\tt ML} section should appear between
714 {\tt consts} and {\tt translations}; newly installed translation functions
715 are already effective when macros and logical rules are parsed.
717 The {\tt ML} section is copied verbatim into the \ML\ file generated from a
718 {\tt .thy} file. Definitions made here are accessible as components of an
719 \ML\ structure; to make some definitions private, use an \ML{} {\tt local}
720 declaration. The {\tt ML} section may install translation functions by
721 declaring any of the following identifiers:
723 val parse_ast_translation : (string * (ast list -> ast)) list
724 val print_ast_translation : (string * (ast list -> ast)) list
725 val parse_translation : (string * (term list -> term)) list
726 val print_translation : (string * (term list -> term)) list
729 \subsection{The translation strategy}
730 All four kinds of translation functions are treated similarly. They are
731 called during the transformations between parse trees, \AST{}s and terms
732 (recall Fig.\ts\ref{fig:parse_print}). Whenever a combination of the form
733 $(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function
734 $f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
735 function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
737 For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s. A
738 combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
739 x@n}$. For term translations, the arguments are terms and a combination
740 has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
741 x@1 \ttapp \ldots \ttapp x@n$. Terms allow more sophisticated
742 transformations than \AST{}s do, typically involving abstractions and bound
745 Regardless of whether they act on terms or \AST{}s,
746 parse translations differ from print translations fundamentally:
748 \item[Parse translations] are applied bottom-up. The arguments are already
749 in translated form. The translations must not fail; exceptions trigger
752 \item[Print translations] are applied top-down. They are supplied with
753 arguments that are partly still in internal form. The result again
754 undergoes translation; therefore a print translation should not introduce
755 as head the very constant that invoked it. The function may raise
756 exception \xdx{Match} to indicate failure; in this event it has no
760 Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
761 \ttindex{Const} for terms --- can invoke translation functions. This
762 causes another difference between parsing and printing.
764 Parsing starts with a string and the constants are not yet identified.
765 Only parse tree heads create {\tt Constant}s in the resulting \AST, as
766 described in \S\ref{sec:astofpt}. Macros and parse \AST{} translations may
767 introduce further {\tt Constant}s. When the final \AST{} is converted to a
768 term, all {\tt Constant}s become {\tt Const}s, as described in
769 \S\ref{sec:termofast}.
771 Printing starts with a well-typed term and all the constants are known. So
772 all logical constants and type constructors may invoke print translations.
773 These, and macros, may introduce further constants.
776 \subsection{Example: a print translation for dependent types}
777 \index{examples!of translations}\indexbold{*dependent_tr'}
779 Let us continue the dependent type example (page~\pageref{prod_trans}) by
780 examining the parse translation for~\cdx{_K} and the print translation
781 {\tt dependent_tr'}, which are both built-in. By convention, parse
782 translations have names ending with {\tt _tr} and print translations have
783 names ending with {\tt _tr'}. Search for such names in the Isabelle
784 sources to locate more examples.
786 Here is the parse translation for {\tt _K}:
788 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
789 | k_tr ts = raise TERM("k_tr",ts);
791 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
792 {\tt Abs} node with a body derived from $t$. Since terms given to parse
793 translations are not yet typed, the type of the bound variable in the new
794 {\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound}
795 nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
796 a basic term manipulation function defined in {\tt Pure/term.ML}.
798 Here is the print translation for dependent types:
800 fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =
801 if 0 mem (loose_bnos B) then
802 let val (x', B') = variant_abs (x, dummyT, B);
803 in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
805 else list_comb (Const (r, dummyT) $ A $ B, ts)
806 | dependent_tr' _ _ = raise Match;
808 The argument {\tt (q,r)} is supplied to {\tt dependent_tr'} by a curried
809 function application during its installation. We could set up print
810 translations for both {\tt Pi} and {\tt Sigma} by including
811 \begin{ttbox}\index{*ML section}
812 val print_translation =
813 [("Pi", dependent_tr' ("{\at}PROD", "{\at}->")),
814 ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
816 within the {\tt ML} section. The first of these transforms ${\tt Pi}(A,
817 \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
818 $\hbox{\tt{\at}->}r(A, B)$, choosing the latter form if $B$ does not depend
819 on~$x$. It checks this using \ttindex{loose_bnos}, yet another function
820 from {\tt Pure/term.ML}. Note that $x'$ is a version of $x$ renamed away
821 from all names in $B$, and $B'$ the body $B$ with {\tt Bound} nodes
822 referring to our {\tt Abs} node replaced by $\ttfct{Free} (x',
825 We must be careful with types here. While types of {\tt Const}s are
826 ignored, type constraints may be printed for some {\tt Free}s and
827 {\tt Var}s if \ttindex{show_types} is set to {\tt true}. Variables of type
828 \ttindex{dummyT} are never printed with constraint, though. The line
830 let val (x', B') = variant_abs (x, dummyT, B);
831 \end{ttbox}\index{*variant_abs}
832 replaces bound variable occurrences in~$B$ by the free variable $x'$ with
833 type {\tt dummyT}. Only the binding occurrence of~$x'$ is given the
834 correct type~{\tt T}, so this is the only place where a type
835 constraint might appear.
836 \index{translations|)}
837 \index{syntax!transformations|)}