2 \chapter{Syntax Transformations} \label{chap:syntax}
3 \newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
4 \newcommand\mtt[1]{\mbox{\tt #1}}
5 \newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
6 \newcommand\Constant{\ttfct{Constant}}
7 \newcommand\Variable{\ttfct{Variable}}
8 \newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
9 \index{syntax!transformations|(}
11 This chapter is intended for experienced Isabelle users who need to define
12 macros or code their own translation functions. It describes the
13 transformations between parse trees, abstract syntax trees and terms.
16 \section{Abstract syntax trees} \label{sec:asts}
19 The parser, given a token list from the lexer, applies productions to yield
20 a parse tree\index{parse trees}. By applying some internal transformations
21 the parse tree becomes an abstract syntax tree, or \AST{}. Macro
22 expansion, further translations and finally type inference yields a
23 well-typed term. The printing process is the reverse, except for some
24 subtleties to be discussed later.
26 Figure~\ref{fig:parse_print} outlines the parsing and printing process.
27 Much of the complexity is due to the macro mechanism. Using macros, you
28 can specify most forms of concrete syntax without writing any \ML{} code.
34 $\downarrow$ & lexer, parser \\
36 $\downarrow$ & parse \AST{} translation \\
38 $\downarrow$ & \AST{} rewriting (macros) \\
40 $\downarrow$ & parse translation, type inference \\
41 --- well-typed term --- & \\
42 $\downarrow$ & print translation \\
44 $\downarrow$ & \AST{} rewriting (macros) \\
46 $\downarrow$ & print \AST{} translation, token translation \\
51 \caption{Parsing and printing}\label{fig:parse_print}
54 Abstract syntax trees are an intermediate form between the raw parse trees
55 and the typed $\lambda$-terms. An \AST{} is either an atom (constant or
56 variable) or a list of {\em at least two\/} subtrees. Internally, they
57 have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
60 datatype ast = Constant of string
65 Isabelle uses an S-expression syntax for abstract syntax trees. Constant
66 atoms are shown as quoted strings, variable atoms as non-quoted strings and
67 applications as a parenthesised list of subtrees. For example, the \AST
69 Appl [Constant "_constrain",
70 Appl [Constant "_abs", Variable "x", Variable "t"],
71 Appl [Constant "fun", Variable "'a", Variable "'b"]]
73 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
74 Both {\tt ()} and {\tt (f)} are illegal because they have too few
77 The resemblance to Lisp's S-expressions is intentional, but there are two
78 kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the
79 names {\tt Constant} and {\tt Variable} too literally; in the later
80 translation to terms, $\Variable x$ may become a constant, free or bound
81 variable, even a type constructor or class name; the actual outcome depends
84 Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
85 application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of
86 application is determined later by context; it could be a type constructor
89 Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
90 first-order: the {\tt "_abs"} does not bind the {\tt x} in any way. Later
91 at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
92 occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
93 constructor \ttindex{Bound}).
96 \section{Transforming parse trees to ASTs}\label{sec:astofpt}
97 \index{ASTs!made from parse trees}
98 \newcommand\astofpt[1]{\lbrakk#1\rbrakk}
100 The parse tree is the raw output of the parser. Translation functions,
101 called {\bf parse AST translations}\indexbold{translations!parse AST},
102 transform the parse tree into an abstract syntax tree.
104 The parse tree is constructed by nesting the right-hand sides of the
105 productions used to recognize the input. Such parse trees are simply lists
106 of tokens and constituent parse trees, the latter representing the
107 nonterminals of the productions. Let us refer to the actual productions in
108 the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
111 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
112 by stripping out delimiters and copy productions. More precisely, the
113 mapping $\astofpt{-}$ is derived from the productions as follows:
115 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
116 \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{xnum} or \ndx{xstr} token, and $s$
117 its associated string. Note that for {\tt xstr} this does not include the
120 \item Copy productions:\index{productions!copy}
121 $\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for
122 strings of delimiters, which are discarded. $P$ stands for the single
123 constituent that is not a delimiter; it is either a nonterminal symbol or
126 \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
127 Here there are no constituents other than delimiters, which are
130 \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
131 the remaining constituents $P@1$, \ldots, $P@n$ are built into an
132 application whose head constant is~$c$:
133 \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
134 \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
137 Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
138 {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
139 These examples illustrate the need for further translations to make \AST{}s
140 closer to the typed $\lambda$-calculus. The Pure syntax provides
141 predefined parse \AST{} translations\index{translations!parse AST} for
142 ordinary applications, type applications, nested abstractions, meta
143 implications and function types. Figure~\ref{fig:parse_ast_tr} shows their
144 effect on some representative input strings.
149 \tt\begin{tabular}{ll}
150 \rm input string & \rm \AST \\\hline
153 "t == u" & ("==" t u) \\
154 "f(x)" & ("_appl" f x) \\
155 "f(x, y)" & ("_appl" f ("_args" x y)) \\
156 "f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\
157 "\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\
160 \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
165 \tt\begin{tabular}{ll}
166 \rm input string & \rm \AST{} \\\hline
167 "f(x, y, z)" & (f x y z) \\
169 "('a, 'b) ty" & (ty 'a 'b) \\
170 "\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\
171 "\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\
172 "[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\
173 "['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
176 \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
179 The names of constant heads in the \AST{} control the translation process.
180 The list of constants invoking parse \AST{} translations appears in the
181 output of {\tt print_syntax} under {\tt parse_ast_translation}.
184 \section{Transforming ASTs to terms}\label{sec:termofast}
185 \index{terms!made from ASTs}
186 \newcommand\termofast[1]{\lbrakk#1\rbrakk}
188 The \AST{}, after application of macros (see \S\ref{sec:macros}), is
189 transformed into a term. This term is probably ill-typed since type
190 inference has not occurred yet. The term may contain type constraints
191 consisting of applications with head {\tt "_constrain"}; the second
192 argument is a type encoded as a term. Type inference later introduces
193 correct types or rejects the input.
195 Another set of translation functions, namely parse
196 translations\index{translations!parse}, may affect this process. If we
197 ignore parse translations for the time being, then \AST{}s are transformed
198 to terms by mapping \AST{} constants to constants, \AST{} variables to
199 schematic or free variables and \AST{} applications to applications.
201 More precisely, the mapping $\termofast{-}$ is defined by
203 \item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
206 \item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
207 \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
208 the index extracted from~$xi$.
210 \item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
213 \item Function applications with $n$ arguments:
214 \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
216 \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
219 Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
220 \verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
221 while \ttindex{dummyT} stands for some dummy type that is ignored during
224 So far the outcome is still a first-order term. Abstractions and bound
225 variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
226 by parse translations. Such translations are attached to {\tt "_abs"},
227 {\tt "!!"} and user-defined binders.
230 \section{Printing of terms}
231 \newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
233 The output phase is essentially the inverse of the input phase. Terms are
234 translated via abstract syntax trees into strings. Finally the strings are
237 Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
238 terms into \AST{}s. Ignoring those, the transformation maps
239 term constants, variables and applications to the corresponding constructs
240 on \AST{}s. Abstractions are mapped to applications of the special
243 More precisely, the mapping $\astofterm{-}$ is defined as follows:
245 \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
247 \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
250 \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
251 \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
252 the {\tt indexname} $(x, i)$.
254 \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
255 of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
256 be obtained from~$t$ by replacing all bound occurrences of~$x$ by
257 the free variable $x'$. This replaces corresponding occurrences of the
258 constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
260 \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
261 \Appl{\Constant \mtt{"_abs"},
262 constrain(\Variable x', \tau), \astofterm{t'}}
265 \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
266 The occurrence of constructor \ttindex{Bound} should never happen
267 when printing well-typed terms; it indicates a de Bruijn index with no
268 matching abstraction.
270 \item Where $f$ is not an application,
271 \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
272 \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
276 Type constraints\index{type constraints} are inserted to allow the printing
277 of types. This is governed by the boolean variable \ttindex{show_types}:
279 \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
280 \ttindex{show_types} is set to {\tt false}.
282 \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
283 \astofterm{\tau}}$ \ otherwise.
285 Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
286 constructors go to {\tt Constant}s; type identifiers go to {\tt
287 Variable}s; type applications go to {\tt Appl}s with the type
288 constructor as the first element. If \ttindex{show_sorts} is set to
289 {\tt true}, some type variables are decorated with an \AST{} encoding
293 The \AST{}, after application of macros (see \S\ref{sec:macros}), is
294 transformed into the final output string. The built-in {\bf print AST
295 translations}\indexbold{translations!print AST} reverse the
296 parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
298 For the actual printing process, the names attached to productions
299 of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
300 a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
301 $(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
302 for~$c$. Each argument~$x@i$ is converted to a string, and put in
303 parentheses if its priority~$(p@i)$ requires this. The resulting strings
304 and their syntactic sugar (denoted by \dots{} above) are joined to make a
307 If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments
308 than the corresponding production, it is first split into
309 $((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$. Applications
310 with too few arguments or with non-constant head or without a
311 corresponding production are printed as $f(x@1, \ldots, x@l)$ or
312 $(\alpha@1, \ldots, \alpha@l) ty$. Multiple productions associated
313 with some name $c$ are tried in order of appearance. An occurrence of
314 $\Variable x$ is simply printed as~$x$.
316 Blanks are {\em not\/} inserted automatically. If blanks are required to
317 separate tokens, specify them in the mixfix declaration, possibly preceded
318 by a slash~({\tt/}) to allow a line break.
323 \section{Macros: syntactic rewriting} \label{sec:macros}
324 \index{macros|(}\index{rewriting!syntactic|(}
326 Mixfix declarations alone can handle situations where there is a direct
327 connection between the concrete syntax and the underlying term. Sometimes
328 we require a more elaborate concrete syntax, such as quantifiers and list
329 notation. Isabelle's {\bf macros} and {\bf translation functions} can
330 perform translations such as
332 \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
333 ALL x:A.P & Ball(A, \%x.P) \\ \relax
334 [x, y, z] & Cons(x, Cons(y, Cons(z, Nil)))
337 Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
338 are the most powerful translation mechanism but are difficult to read or
339 write. Macros are specified by first-order rewriting systems that operate
340 on abstract syntax trees. They are usually easy to read and write, and can
341 express all but the most obscure translations.
343 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and
344 set theory.\footnote{This and the following theories are complete
345 working examples, though they specify only syntax, no axioms. The
346 file {\tt ZF/ZF.thy} presents a full set theory definition,
347 including many macro rules.} Theory {\tt SetSyntax} declares
348 constants for set comprehension ({\tt Collect}), replacement ({\tt
349 Replace}) and bounded universal quantification ({\tt Ball}). Each
350 of these binds some variables. Without additional syntax we should
351 have to write $\forall x \in A. P$ as {\tt Ball(A,\%x.P)}, and
352 similarly for the others.
362 Trueprop :: o => prop ("_" 5)
363 Collect :: [i, i => o] => i
364 Replace :: [i, [i, i] => o] => i
365 Ball :: [i, i => o] => o
367 "{\at}Collect" :: [idt, i, o] => i ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
368 "{\at}Replace" :: [idt, idt, i, o] => i ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
369 "{\at}Ball" :: [idt, i, o] => o ("(3ALL _:_./ _)" 10)
371 "{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)"
372 "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
373 "ALL x:A. P" == "Ball(A, \%x. P)"
376 \caption{Macro example: set theory}\label{fig:set_trans}
379 The theory specifies a variable-binding syntax through additional productions
380 that have mixfix declarations. Each non-copy production must specify some
381 constant, which is used for building \AST{}s. \index{constants!syntactic} The
382 additional constants are decorated with {\tt\at} to stress their purely
383 syntactic purpose; they may not occur within the final well-typed terms,
384 being declared as {\tt syntax} rather than {\tt consts}.
386 The translations cause the replacement of external forms by internal forms
387 after parsing, and vice versa before printing of terms. As a specification
388 of the set theory notation, they should be largely self-explanatory. The
389 syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
390 appear implicitly in the macro rules via their mixfix forms.
392 Macros can define variable-binding syntax because they operate on \AST{}s,
393 which have no inbuilt notion of bound variable. The macro variables {\tt
394 x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
395 in this case bound variables. The macro variables {\tt P} and~{\tt Q}
396 range over formulae containing bound variable occurrences.
398 Other applications of the macro system can be less straightforward, and
399 there are peculiarities. The rest of this section will describe in detail
400 how Isabelle macros are preprocessed and applied.
403 \subsection{Specifying macros}
404 Macros are basically rewrite rules on \AST{}s. But unlike other macro
405 systems found in programming languages, Isabelle's macros work in both
406 directions. Therefore a syntax contains two lists of rewrites: one for
407 parsing and one for printing.
409 \index{*translations section}
410 The {\tt translations} section specifies macros. The syntax for a macro is
411 \[ (root)\; string \quad
412 \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
417 This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
418 ({\tt ==}). The two strings specify the left and right-hand sides of the
419 macro rule. The $(root)$ specification is optional; it specifies the
420 nonterminal for parsing the $string$ and if omitted defaults to {\tt
421 logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions:
423 \item Rules must be left linear: $l$ must not contain repeated variables.
425 \item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =
426 (\mtt"c\mtt" ~ x@1 \ldots x@n)$.
428 \item Every variable in~$r$ must also occur in~$l$.
431 Macro rules may refer to any syntax from the parent theories. They
432 may also refer to anything defined before the current {\tt
433 translations} section --- including any mixfix declarations.
435 Upon declaration, both sides of the macro rule undergo parsing and parse
436 \AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
437 macro expansion. The lexer runs in a different mode that additionally
438 accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
439 {\tt _K}). Thus, a constant whose name starts with an underscore can
440 appear in macro rules but not in ordinary terms.
442 Some atoms of the macro rule's \AST{} are designated as constants for
443 matching. These are all names that have been declared as classes, types or
444 constants (logical and syntactic).
446 The result of this preprocessing is two lists of macro rules, each
447 stored as a pair of \AST{}s. They can be viewed using {\tt
448 print_syntax} (sections \ttindex{parse_rules} and
449 \ttindex{print_rules}). For theory~{\tt SetSyntax} of
450 Fig.~\ref{fig:set_trans} these are
453 ("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P))
454 ("{\at}Replace" y x A Q) -> ("Replace" A ("_abs" x ("_abs" y Q)))
455 ("{\at}Ball" x A P) -> ("Ball" A ("_abs" x P))
457 ("Collect" A ("_abs" x P)) -> ("{\at}Collect" x A P)
458 ("Replace" A ("_abs" x ("_abs" y Q))) -> ("{\at}Replace" y x A Q)
459 ("Ball" A ("_abs" x P)) -> ("{\at}Ball" x A P)
463 Avoid choosing variable names that have previously been used as
464 constants, types or type classes; the {\tt consts} section in the output
465 of {\tt print_syntax} lists all such names. If a macro rule works
466 incorrectly, inspect its internal form as shown above, recalling that
467 constants appear as quoted strings and variables without quotes.
471 If \ttindex{eta_contract} is set to {\tt true}, terms will be
472 $\eta$-contracted {\em before\/} the \AST{} rewriter sees them. Thus some
473 abstraction nodes needed for print rules to match may vanish. For example,
474 \verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
475 not apply and the output will be {\tt Ball(A, P)}. This problem would not
476 occur if \ML{} translation functions were used instead of macros (as is
477 done for binder declarations).
482 Another trap concerns type constraints. If \ttindex{show_types} is set to
483 {\tt true}, bound variables will be decorated by their meta types at the
484 binding place (but not at occurrences in the body). Matching with
485 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
486 "i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to
487 appear in the external form, say \verb|{y::i:A::i. P::o}|.
489 To allow such constraints to be re-read, your syntax should specify bound
490 variables using the nonterminal~\ndx{idt}. This is the case in our
491 example. Choosing {\tt id} instead of {\tt idt} is a common error.
496 \subsection{Applying rules}
497 As a term is being parsed or printed, an \AST{} is generated as an
498 intermediate form (recall Fig.\ts\ref{fig:parse_print}). The \AST{} is
499 normalised by applying macro rules in the manner of a traditional term
500 rewriting system. We first examine how a single rule is applied.
502 Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some
503 translation rule. A subtree~$u$ of $t$ is a {\bf redex} if it is an
504 instance of~$l$; in this case $l$ is said to {\bf match}~$u$. A redex
505 matched by $l$ may be replaced by the corresponding instance of~$r$, thus
506 {\bf rewriting} the \AST~$t$. Matching requires some notion of {\bf
507 place-holders} that may occur in rule patterns but not in ordinary
508 \AST{}s; {\tt Variable} atoms serve this purpose.
510 The matching of the object~$u$ by the pattern~$l$ is performed as follows:
512 \item Every constant matches itself.
514 \item $\Variable x$ in the object matches $\Constant x$ in the pattern.
515 This point is discussed further below.
517 \item Every \AST{} in the object matches $\Variable x$ in the pattern,
520 \item One application matches another if they have the same number of
521 subtrees and corresponding subtrees match.
523 \item In every other case, matching fails. In particular, {\tt
524 Constant}~$x$ can only match itself.
526 A successful match yields a substitution that is applied to~$r$, generating
527 the instance that replaces~$u$.
529 The second case above may look odd. This is where {\tt Variable}s of
530 non-rule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not
531 far removed from parse trees; at this level it is not yet known which
532 identifiers will become constants, bounds, frees, types or classes. As
533 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
534 {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},
535 \ndx{tvar}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s. On the other
536 hand, when \AST{}s generated from terms for printing, all constants and type
537 constructors become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s may
538 contain a messy mixture of {\tt Variable}s and {\tt Constant}s. This is
539 insignificant at macro level because matching treats them alike.
541 Because of this behaviour, different kinds of atoms with the same name are
542 indistinguishable, which may make some rules prone to misbehaviour. Example:
549 "[]" :: 'a list ("[]")
553 The term {\tt Nil} will be printed as {\tt []}, just as expected.
554 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
555 expected! Guess how type~{\tt Nil} is printed?
557 Normalizing an \AST{} involves repeatedly applying macro rules until
558 none are applicable. Macro rules are chosen in order of appearance in
559 the theory definitions. You can watch the normalization of \AST{}s
560 during parsing and printing by setting \ttindex{Syntax.trace_norm_ast}
561 to {\tt true}.\index{tracing!of macros} Alternatively, use
562 \ttindex{Syntax.test_read}. The information displayed when tracing
563 includes the \AST{} before normalization ({\tt pre}), redexes with
564 results ({\tt rewrote}), the normal form finally reached ({\tt post})
565 and some statistics ({\tt normalize}). If tracing is off,
566 \ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to
567 enable printing of the normal form and statistics only.
570 \subsection{Example: the syntax of finite sets}
571 \index{examples!of macros}
573 This example demonstrates the use of recursive macros to implement a
574 convenient notation for finite sets.
575 \index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}
576 \index{"@Enum@{\tt\at Enum} constant}
577 \index{"@Finset@{\tt\at Finset} constant}
579 FinSyntax = SetSyntax +
584 "{\at}Enum" :: [i, is] => is ("_,/ _")
586 empty :: i ("{\ttlbrace}{\ttrbrace}")
587 insert :: [i, i] => i
589 "{\at}Finset" :: is => i ("{\ttlbrace}(_){\ttrbrace}")
591 "{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})"
592 "{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})"
595 Finite sets are internally built up by {\tt empty} and {\tt insert}. The
596 declarations above specify \verb|{x, y, z}| as the external representation
599 insert(x, insert(y, insert(z, empty)))
601 The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
602 i} separated by commas. The mixfix declaration \hbox{\verb|"_,/ _"|}
603 allows a line break after the comma for \rmindex{pretty printing}; if no
604 line break is required then a space is printed instead.
606 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
607 declaration. Hence {\tt is} is not a logical type and may be used safely as
608 a new nonterminal for custom syntax. The nonterminal~{\tt is} can later be
609 re-used for other enumerations of type~{\tt i} like lists or tuples. If we
610 had needed polymorphic enumerations, we could have used the predefined
611 nonterminal symbol \ndx{args} and skipped this part altogether.
613 \index{"@Finset@{\tt\at Finset} constant}
614 Next follows {\tt empty}, which is already equipped with its syntax
615 \verb|{}|, and {\tt insert} without concrete syntax. The syntactic
616 constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
617 i} enclosed in curly braces. Remember that a pair of parentheses, as in
618 \verb|"{(_)}"|, specifies a block of indentation for pretty printing.
620 The translations may look strange at first. Macro rules are best
621 understood in their internal forms:
624 ("{\at}Finset" ("{\at}Enum" x xs)) -> ("insert" x ("{\at}Finset" xs))
625 ("{\at}Finset" x) -> ("insert" x "empty")
627 ("insert" x ("{\at}Finset" xs)) -> ("{\at}Finset" ("{\at}Enum" x xs))
628 ("insert" x "empty") -> ("{\at}Finset" x)
630 This shows that \verb|{x,xs}| indeed matches any set enumeration of at least
631 two elements, binding the first to {\tt x} and the rest to {\tt xs}.
632 Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.
633 The parse rules only work in the order given.
636 The \AST{} rewriter cannot distinguish constants from variables and looks
637 only for names of atoms. Thus the names of {\tt Constant}s occurring in
638 the (internal) left-hand side of translation rules should be regarded as
639 \rmindex{reserved words}. Choose non-identifiers like {\tt\at Finset} or
640 sufficiently long and strange names. If a bound variable's name gets
641 rewritten, the result will be incorrect; for example, the term
643 \%empty insert. insert(x, empty)
645 \par\noindent is incorrectly printed as \verb|%empty insert. {x}|.
649 \subsection{Example: a parse macro for dependent types}\label{prod_trans}
650 \index{examples!of macros}
652 As stated earlier, a macro rule may not introduce new {\tt Variable}s on
653 the right-hand side. Something like \verb|"K(B)" => "%x.B"| is illegal;
654 if allowed, it could cause variable capture. In such cases you usually
655 must fall back on translation functions. But a trick can make things
656 readable in some cases: {\em calling\/} translation functions by parse
659 ProdSyntax = SetSyntax +
661 Pi :: [i, i => i] => i
663 "{\at}PROD" :: [idt, i, i] => i ("(3PROD _:_./ _)" 10)
664 "{\at}->" :: [i, i] => i ("(_ ->/ _)" [51, 50] 50)
667 "PROD x:A. B" => "Pi(A, \%x. B)"
668 "A -> B" => "Pi(A, _K(B))"
671 val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
674 Here {\tt Pi} is a logical constant for constructing general products.
675 Two external forms exist: the general case {\tt PROD x:A.B} and the
676 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
677 does not depend on~{\tt x}.
679 The second parse macro introduces {\tt _K(B)}, which later becomes
680 \verb|%x.B| due to a parse translation associated with \cdx{_K}.
681 Unfortunately there is no such trick for printing, so we have to add a {\tt
682 ML} section for the print translation \ttindex{dependent_tr'}.
684 Recall that identifiers with a leading {\tt _} are allowed in translation
685 rules, but not in ordinary terms. Thus we can create \AST{}s containing
686 names that are not directly expressible.
688 The parse translation for {\tt _K} is already installed in Pure, and the
689 function {\tt dependent_tr'} is exported by the syntax module for public use.
690 See \S\ref{sec:tr_funs} below for more of the arcane lore of translation
691 functions. \index{macros|)}\index{rewriting!syntactic|)}
694 \section{Translation functions} \label{sec:tr_funs}
695 \index{translations|(}
697 This section describes the translation function mechanism. By writing
698 \ML{} functions, you can do almost everything to terms or \AST{}s
699 during parsing and printing. The logic \LK\ is a good example of
700 sophisticated transformations between internal and external
701 representations of sequents; here, macros would be useless.
703 A full understanding of translations requires some familiarity
704 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
705 {\tt Syntax.ast} and the encodings of types and terms as such at the various
706 stages of the parsing or printing process. Most users should never need to
707 use translation functions.
709 \subsection{Declaring translation functions}
710 There are four kinds of translation functions, with one of these
711 coming in two variants. Each such function is associated with a name,
712 which triggers calls to it. Such names can be constants (logical or
713 syntactic) or type constructors.
715 Function {\tt print_syntax} displays the sets of names associated with the
716 translation functions of a theory under \texttt{parse_ast_translation}, etc.
717 You can add new ones via the {\tt ML} section\index{*ML section} of a theory
718 definition file. Even though the {\tt ML} section is the very last part of
719 the file, newly installed translation functions are already effective when
720 processing all of the preceding sections.
722 The {\tt ML} section's contents are simply copied verbatim near the
723 beginning of the \ML\ file generated from a theory definition file.
724 Definitions made here are accessible as components of an \ML\
725 structure; to make some parts private, use an \ML{} {\tt local}
726 declaration. The {\ML} code may install translation functions by
727 declaring any of the following identifiers:
729 val parse_ast_translation : (string * (ast list -> ast)) list
730 val print_ast_translation : (string * (ast list -> ast)) list
731 val parse_translation : (string * (term list -> term)) list
732 val print_translation : (string * (term list -> term)) list
733 val typed_print_translation :
734 (string * (bool -> typ -> term list -> term)) list
737 \subsection{The translation strategy}
738 The different kinds of translation functions are called during the
739 transformations between parse trees, \AST{}s and terms (recall
740 Fig.\ts\ref{fig:parse_print}). Whenever a combination of the form
741 $(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation
742 function $f$ of appropriate kind exists for $c$, the result is
743 computed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
745 For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.
746 A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1,
747 \ldots, x@n}$. For term translations, the arguments are terms and a
748 combination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const}
749 (c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$. Terms allow more
750 sophisticated transformations than \AST{}s do, typically involving
751 abstractions and bound variables. {\em Typed} print translations may
752 even peek at the type $\tau$ of the constant they are invoked on; they
753 are also passed the current value of the \ttindex{show_sorts} flag.
755 Regardless of whether they act on terms or \AST{}s, translation
756 functions called during the parsing process differ from those for
757 printing more fundamentally in their overall behaviour:
759 \item[Parse translations] are applied bottom-up. The arguments are already in
760 translated form. The translations must not fail; exceptions trigger an
761 error message. There may never be more than one function associated with
764 \item[Print translations] are applied top-down. They are supplied with
765 arguments that are partly still in internal form. The result again
766 undergoes translation; therefore a print translation should not introduce as
767 head the very constant that invoked it. The function may raise exception
768 \xdx{Match} to indicate failure; in this event it has no effect. Multiple
769 functions associated with some syntactic name are tried in an unspecified
773 Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
774 \ttindex{Const} for terms --- can invoke translation functions. This
775 causes another difference between parsing and printing.
777 Parsing starts with a string and the constants are not yet identified.
778 Only parse tree heads create {\tt Constant}s in the resulting \AST, as
779 described in \S\ref{sec:astofpt}. Macros and parse \AST{} translations may
780 introduce further {\tt Constant}s. When the final \AST{} is converted to a
781 term, all {\tt Constant}s become {\tt Const}s, as described in
782 \S\ref{sec:termofast}.
784 Printing starts with a well-typed term and all the constants are known. So
785 all logical constants and type constructors may invoke print translations.
786 These, and macros, may introduce further constants.
789 \subsection{Example: a print translation for dependent types}
790 \index{examples!of translations}\indexbold{*dependent_tr'}
792 Let us continue the dependent type example (page~\pageref{prod_trans}) by
793 examining the parse translation for~\cdx{_K} and the print translation
794 {\tt dependent_tr'}, which are both built-in. By convention, parse
795 translations have names ending with {\tt _tr} and print translations have
796 names ending with {\tt _tr'}. Search for such names in the Isabelle
797 sources to locate more examples.
799 Here is the parse translation for {\tt _K}:
801 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
802 | k_tr ts = raise TERM ("k_tr", ts);
804 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
805 {\tt Abs} node with a body derived from $t$. Since terms given to parse
806 translations are not yet typed, the type of the bound variable in the new
807 {\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound}
808 nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
809 a basic term manipulation function defined in {\tt Pure/term.ML}.
811 Here is the print translation for dependent types:
813 fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
814 if 0 mem (loose_bnos B) then
815 let val (x', B') = Syntax.variant_abs' (x, dummyT, B) in
818 Syntax.mark_boundT (x',{\thinspace}T) $ A $ B', ts)
820 else list_comb (Const (r, dummyT) $ A $ B, ts)
821 | dependent_tr' _ _ = raise Match;
823 The argument {\tt (q,{\thinspace}r)} is supplied to the curried function {\tt
824 dependent_tr'} by a partial application during its installation.
825 For example, we could set up print translations for both {\tt Pi} and
826 {\tt Sigma} by including
827 \begin{ttbox}\index{*ML section}
828 val print_translation =
829 [("Pi", dependent_tr' ("{\at}PROD", "{\at}->")),
830 ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
832 within the {\tt ML} section. The first of these transforms ${\tt
833 Pi}(A, \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
834 $\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not
835 depend on~$x$. It checks this using \ttindex{loose_bnos}, yet another
836 function from {\tt Pure/term.ML}. Note that $x'$ is a version of $x$
837 renamed away from all names in $B$, and $B'$ is the body $B$ with {\tt
838 Bound} nodes referring to the {\tt Abs} node replaced by
839 $\ttfct{Free} (x', \mtt{dummyT})$ (but marked as representing a bound
842 We must be careful with types here. While types of {\tt Const}s are
843 ignored, type constraints may be printed for some {\tt Free}s and
844 {\tt Var}s if \ttindex{show_types} is set to {\tt true}. Variables of type
845 \ttindex{dummyT} are never printed with constraint, though. The line
847 let val (x', B') = Syntax.variant_abs' (x, dummyT, B);
848 \end{ttbox}\index{*Syntax.variant_abs'}
849 replaces bound variable occurrences in~$B$ by the free variable $x'$ with
850 type {\tt dummyT}. Only the binding occurrence of~$x'$ is given the
851 correct type~{\tt T}, so this is the only place where a type
852 constraint might appear.
854 Also note that we are responsible to mark free identifiers that
855 actually represent bound variables. This is achieved by
856 \ttindex{Syntax.variant_abs'} and \ttindex{Syntax.mark_boundT} above.
857 Failing to do so may cause these names to be printed in the wrong
858 style. \index{translations|)} \index{syntax!transformations|)}
861 \section{Token translations} \label{sec:tok_tr}
862 \index{token translations|(}
864 Isabelle's meta-logic features quite a lot of different kinds of
865 identifiers, namely {\em class}, {\em tfree}, {\em tvar}, {\em free},
866 {\em bound}, {\em var}. One might want to have these printed in
867 different styles, e.g.\ in bold or italic, or even transcribed into
868 something more readable like $\alpha, \alpha', \beta$ instead of {\tt
869 'a}, {\tt 'aa}, {\tt 'b} for type variables. Token translations
870 provide a means to such ends, enabling the user to install certain
871 \ML{} functions associated with any logical \rmindex{token class} and
872 depending on some \rmindex{print mode}.
874 The logical class of identifiers can not necessarily be determined by
875 its syntactic category, though. For example, consider free vs.\ bound
876 variables. So Isabelle's pretty printing mechanism, starting from
877 fully typed terms, has to be careful to preserve this additional
878 information\footnote{This is done by marking atoms in abstract syntax
879 trees appropriately. The marks are actually visible by print
880 translation functions -- they are just special constants applied to
881 atomic asts, for example \texttt{("_bound" x)}.}. In particular,
882 user-supplied print translation functions operating on terms have to
883 be well-behaved in this respect. Free identifiers introduced to
884 represent bound variables have to be marked appropriately (cf.\ the
885 example at the end of \S\ref{sec:tr_funs}).
887 \medskip Token translations may be installed by declaring the
888 \ttindex{token_translation} value within the \texttt{ML} section of a theory
891 val token_translation:
892 (string * string * (string -> string * real)) list
894 The elements of this list are of the form $(m, c, f)$, where $m$ is a print
895 mode identifier, $c$ a token class, and $f\colon string \to string \times
896 real$ the actual translation function. Assuming that $x$ is of identifier
897 class $c$, and print mode $m$ is the first (active) mode providing some
898 translation for $c$, then $x$ is output according to $f(x) = (x', len)$.
899 Thereby $x'$ is the modified identifier name and $len$ its visual length in
900 terms of characters (e.g.\ length 1.0 would correspond to $1/2$\,em in
901 \LaTeX). Thus $x'$ may include non-printing parts like control sequences or
902 markup information for typesetting systems.
905 \index{token translations|)}
910 %%% TeX-master: "ref"