doc-src/Ref/syntax.tex
author lcp
Fri, 15 Apr 1994 17:16:23 +0200
changeset 323 361a71713176
child 332 01b87a921967
permissions -rw-r--r--
penultimate Springer draft
     1 %% $Id$
     2 \chapter{Syntax Transformations} \label{chap:syntax}
     3 \newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
     4 \newcommand\mtt[1]{\mbox{\tt #1}}
     5 \newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
     6 \newcommand\Constant{\ttfct{Constant}}
     7 \newcommand\Variable{\ttfct{Variable}}
     8 \newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
     9 \index{syntax!transformations|(}
    10 
    11 This chapter is intended for experienced Isabelle users who need to define
    12 macros or code their own translation functions.  It describes the
    13 transformations between parse trees, abstract syntax trees and terms.
    14 
    15 
    16 \section{Abstract syntax trees} \label{sec:asts}
    17 \index{ASTs|(} 
    18 
    19 The parser, given a token list from the lexer, applies productions to yield
    20 a parse tree\index{parse trees}.  By applying some internal transformations
    21 the parse tree becomes an abstract syntax tree, or \AST{}.  Macro
    22 expansion, further translations and finally type inference yields a
    23 well-typed term.  The printing process is the reverse, except for some
    24 subtleties to be discussed later.
    25 
    26 Figure~\ref{fig:parse_print} outlines the parsing and printing process.
    27 Much of the complexity is due to the macro mechanism.  Using macros, you
    28 can specify most forms of concrete syntax without writing any \ML{} code.
    29 
    30 \begin{figure}
    31 \begin{center}
    32 \begin{tabular}{cl}
    33 string          & \\
    34 $\downarrow$    & parser \\
    35 parse tree      & \\
    36 $\downarrow$    & parse \AST{} translation \\
    37 \AST{}             & \\
    38 $\downarrow$    & \AST{} rewriting (macros) \\
    39 \AST{}             & \\
    40 $\downarrow$    & parse translation, type inference \\
    41 --- well-typed term --- & \\
    42 $\downarrow$    & print translation \\
    43 \AST{}             & \\
    44 $\downarrow$    & \AST{} rewriting (macros) \\
    45 \AST{}             & \\
    46 $\downarrow$    & print \AST{} translation, printer \\
    47 string          &
    48 \end{tabular}
    49 
    50 \end{center}
    51 \caption{Parsing and printing}\label{fig:parse_print}
    52 \end{figure}
    53 
    54 Abstract syntax trees are an intermediate form between the raw parse trees
    55 and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or
    56 variable) or a list of {\em at least two\/} subtrees.  Internally, they
    57 have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
    58 \index{*Appl}
    59 \begin{ttbox}
    60 datatype ast = Constant of string
    61              | Variable of string
    62              | Appl of ast list
    63 \end{ttbox}
    64 %
    65 Isabelle uses an S-expression syntax for abstract syntax trees.  Constant
    66 atoms are shown as quoted strings, variable atoms as non-quoted strings and
    67 applications as a parenthesized list of subtrees.  For example, the \AST
    68 \begin{ttbox}
    69 Appl [Constant "_constrain",
    70       Appl [Constant "_abs", Variable "x", Variable "t"],
    71       Appl [Constant "fun", Variable "'a", Variable "'b"]]
    72 \end{ttbox}
    73 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
    74 Both {\tt ()} and {\tt (f)} are illegal because they have too few
    75 subtrees. 
    76 
    77 The resemblance to Lisp's S-expressions is intentional, but there are two
    78 kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the
    79 names {\tt Constant} and {\tt Variable} too literally; in the later
    80 translation to terms, $\Variable x$ may become a constant, free or bound
    81 variable, even a type constructor or class name; the actual outcome depends
    82 on the context.
    83 
    84 Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
    85 application of~$f$ to the arguments $x@1, \ldots, x@n$.  But the kind of
    86 application is determined later by context; it could be a type constructor
    87 applied to types.
    88 
    89 Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
    90 first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later
    91 at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
    92 occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
    93 constructor \ttindex{Bound}).
    94 
    95 
    96 \section{Transforming parse trees to \AST{}s}\label{sec:astofpt}
    97 \index{ASTs!made from parse trees}
    98 \newcommand\astofpt[1]{\lbrakk#1\rbrakk}
    99 
   100 The parse tree is the raw output of the parser.  Translation functions,
   101 called {\bf parse AST translations}\indexbold{translations!parse AST},
   102 transform the parse tree into an abstract syntax tree.
   103 
   104 The parse tree is constructed by nesting the right-hand sides of the
   105 productions used to recognize the input.  Such parse trees are simply lists
   106 of tokens and constituent parse trees, the latter representing the
   107 nonterminals of the productions.  Let us refer to the actual productions in
   108 the form displayed by {\tt Syntax.print_syntax}.
   109 
   110 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
   111 by stripping out delimiters and copy productions.  More precisely, the
   112 mapping $\astofpt{-}$ is derived from the productions as follows:
   113 \begin{itemize}
   114 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
   115   \ndx{var}, \ndx{tid} or \ndx{tvar} token, and $s$ its associated string.
   116 
   117 \item Copy productions:\index{productions!copy}
   118   $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
   119   strings of delimiters, which are discarded.  $P$ stands for the single
   120   constituent that is not a delimiter; it is either a nonterminal symbol or
   121   a name token.
   122 
   123   \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
   124     Here there are no constituents other than delimiters, which are
   125     discarded. 
   126 
   127   \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
   128     the remaining constituents $P@1$, \ldots, $P@n$ are built into an
   129     application whose head constant is~$c$:
   130     \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = 
   131        \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
   132     \]
   133 \end{itemize}
   134 Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
   135 {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
   136 These examples illustrate the need for further translations to make \AST{}s
   137 closer to the typed $\lambda$-calculus.  The Pure syntax provides
   138 predefined parse \AST{} translations\index{translations!parse AST} for
   139 ordinary applications, type applications, nested abstractions, meta
   140 implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
   141 effect on some representative input strings.
   142 
   143 
   144 \begin{figure}
   145 \begin{center}
   146 \tt\begin{tabular}{ll}
   147 \rm input string    & \rm \AST \\\hline
   148 "f"                 & f \\
   149 "'a"                & 'a \\
   150 "t == u"            & ("==" t u) \\
   151 "f(x)"              & ("_appl" f x) \\
   152 "f(x, y)"           & ("_appl" f ("_args" x y)) \\
   153 "f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
   154 "\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
   155 \end{tabular}
   156 \end{center}
   157 \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} 
   158 \end{figure}
   159 
   160 \begin{figure}
   161 \begin{center}
   162 \tt\begin{tabular}{ll}
   163 \rm input string            & \rm \AST{} \\\hline
   164 "f(x, y, z)"                & (f x y z) \\
   165 "'a ty"                     & (ty 'a) \\
   166 "('a, 'b) ty"               & (ty 'a 'b) \\
   167 "\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
   168 "\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
   169 "[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
   170 "['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
   171 \end{tabular}
   172 \end{center}
   173 \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
   174 \end{figure}
   175 
   176 The names of constant heads in the \AST{} control the translation process.
   177 The list of constants invoking parse \AST{} translations appears in the
   178 output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
   179 
   180 
   181 \section{Transforming \AST{}s to terms}\label{sec:termofast}
   182 \index{terms!made from ASTs}
   183 \newcommand\termofast[1]{\lbrakk#1\rbrakk}
   184 
   185 The \AST{}, after application of macros (see \S\ref{sec:macros}), is
   186 transformed into a term.  This term is probably ill-typed since type
   187 inference has not occurred yet.  The term may contain type constraints
   188 consisting of applications with head {\tt "_constrain"}; the second
   189 argument is a type encoded as a term.  Type inference later introduces
   190 correct types or rejects the input.
   191 
   192 Another set of translation functions, namely parse
   193 translations\index{translations!parse}, may affect this process.  If we
   194 ignore parse translations for the time being, then \AST{}s are transformed
   195 to terms by mapping \AST{} constants to constants, \AST{} variables to
   196 schematic or free variables and \AST{} applications to applications.
   197 
   198 More precisely, the mapping $\termofast{-}$ is defined by
   199 \begin{itemize}
   200 \item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
   201   \mtt{dummyT})$.
   202 
   203 \item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
   204   \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
   205   the index extracted from~$xi$.
   206 
   207 \item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
   208   \mtt{dummyT})$.
   209 
   210 \item Function applications with $n$ arguments:
   211     \[ \termofast{\Appl{f, x@1, \ldots, x@n}} = 
   212        \termofast{f} \ttapp
   213          \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
   214     \]
   215 \end{itemize}
   216 Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
   217 \verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
   218 while \ttindex{dummyT} stands for some dummy type that is ignored during
   219 type inference.
   220 
   221 So far the outcome is still a first-order term.  Abstractions and bound
   222 variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
   223 by parse translations.  Such translations are attached to {\tt "_abs"},
   224 {\tt "!!"} and user-defined binders.
   225 
   226 
   227 \section{Printing of terms}
   228 \newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
   229 
   230 The output phase is essentially the inverse of the input phase.  Terms are
   231 translated via abstract syntax trees into strings.  Finally the strings are
   232 pretty printed.
   233 
   234 Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
   235 terms into \AST{}s.  Ignoring those, the transformation maps
   236 term constants, variables and applications to the corresponding constructs
   237 on \AST{}s.  Abstractions are mapped to applications of the special
   238 constant {\tt _abs}.
   239 
   240 More precisely, the mapping $\astofterm{-}$ is defined as follows:
   241 \begin{itemize}
   242   \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
   243 
   244   \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
   245     \tau)$.
   246 
   247   \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
   248     \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
   249     the {\tt indexname} $(x, i)$.
   250 
   251   \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
   252     of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
   253     be obtained from~$t$ by replacing all bound occurrences of~$x$ by
   254     the free variable $x'$.  This replaces corresponding occurrences of the
   255     constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
   256     \mtt{dummyT})$:
   257    \[ \astofterm{\ttfct{Abs} (x, \tau, t)} = 
   258       \Appl{\Constant \mtt{"_abs"}, 
   259         constrain(\Variable x', \tau), \astofterm{t'}}.
   260     \]
   261 
   262   \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.  
   263     The occurrence of constructor \ttindex{Bound} should never happen
   264     when printing well-typed terms; it indicates a de Bruijn index with no
   265     matching abstraction.
   266 
   267   \item Where $f$ is not an application,
   268     \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = 
   269        \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
   270     \]
   271 \end{itemize}
   272 %
   273 Type constraints are inserted to allow the printing of types.  This is
   274 governed by the boolean variable \ttindex{show_types}:
   275 \begin{itemize}
   276   \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
   277     \ttindex{show_types} not set to {\tt true}.
   278 
   279   \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, 
   280          \astofterm{\tau}}$ \ otherwise.  
   281 
   282     Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
   283     constructors go to {\tt Constant}s; type identifiers go to {\tt
   284       Variable}s; type applications go to {\tt Appl}s with the type
   285     constructor as the first element.  If \ttindex{show_sorts} is set to
   286     {\tt true}, some type variables are decorated with an \AST{} encoding
   287     of their sort.
   288 \end{itemize}
   289 %
   290 The \AST{}, after application of macros (see \S\ref{sec:macros}), is
   291 transformed into the final output string.  The built-in {\bf print AST
   292   translations}\indexbold{translations!print AST} effectively reverse the
   293 parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
   294 
   295 For the actual printing process, the names attached to productions
   296 of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
   297 a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
   298 $(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
   299 for~$c$.  Each argument~$x@i$ is converted to a string, and put in
   300 parentheses if its priority~$(p@i)$ requires this.  The resulting strings
   301 and their syntactic sugar (denoted by \dots{} above) are joined to make a
   302 single string.
   303 
   304 If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
   305 corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
   306 x@n) ~ x@{n+1} \ldots x@m)$.  Applications with too few arguments or with
   307 non-constant head or without a corresponding production are printed as
   308 $f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$.  An occurrence of
   309 $\Variable x$ is simply printed as~$x$.
   310 
   311 Blanks are {\em not\/} inserted automatically.  If blanks are required to
   312 separate tokens, specify them in the mixfix declaration, possibly preceded
   313 by a slash~({\tt/}) to allow a line break.
   314 \index{ASTs|)}
   315 
   316 
   317 
   318 \section{Macros: Syntactic rewriting} \label{sec:macros}
   319 \index{macros|(}\index{rewriting!syntactic|(} 
   320 
   321 Mixfix declarations alone can handle situations where there is a direct
   322 connection between the concrete syntax and the underlying term.  Sometimes
   323 we require a more elaborate concrete syntax, such as quantifiers and list
   324 notation.  Isabelle's {\bf macros} and {\bf translation functions} can
   325 perform translations such as
   326 \begin{center}\tt
   327   \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
   328     ALL x:A.P   & Ball(A, \%x.P)        \\ \relax
   329     [x, y, z]   & Cons(x, Cons(y, Cons(z, Nil)))
   330   \end{tabular}
   331 \end{center}
   332 Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
   333 are the most powerful translation mechanism but are difficult to read or
   334 write.  Macros are specified by first-order rewriting systems that operate
   335 on abstract syntax trees.  They are usually easy to read and write, and can
   336 express all but the most obscure translations.
   337 
   338 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
   339 theory.\footnote{This and the following theories are complete working
   340   examples, though they specify only syntax, no axioms.  The file {\tt
   341     ZF/zf.thy} presents the full set theory definition, including many
   342   macro rules.}  Theory {\tt SET} defines constants for set comprehension
   343 ({\tt Collect}), replacement ({\tt Replace}) and bounded universal
   344 quantification ({\tt Ball}).  Each of these binds some variables.  Without
   345 additional syntax we should have to express $\forall x \in A.  P$ as {\tt
   346   Ball(A,\%x.P)}, and similarly for the others.
   347 
   348 \begin{figure}
   349 \begin{ttbox}
   350 SET = Pure +
   351 types
   352   i, o
   353 arities
   354   i, o :: logic
   355 consts
   356   Trueprop      :: "o => prop"              ("_" 5)
   357   Collect       :: "[i, i => o] => i"
   358   "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
   359   Replace       :: "[i, [i, i] => o] => i"
   360   "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
   361   Ball          :: "[i, i => o] => o"
   362   "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
   363 translations
   364   "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
   365   "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
   366   "ALL x:A. P"  == "Ball(A, \%x. P)"
   367 end
   368 \end{ttbox}
   369 \caption{Macro example: set theory}\label{fig:set_trans}
   370 \end{figure}
   371 
   372 The theory specifies a variable-binding syntax through additional
   373 productions that have mixfix declarations.  Each non-copy production must
   374 specify some constant, which is used for building \AST{}s.
   375 \index{constants!syntactic} The additional constants are decorated with
   376 {\tt\at} to stress their purely syntactic purpose; they should never occur
   377 within the final well-typed terms.  Furthermore, they cannot be written in
   378 formulae because they are not legal identifiers.
   379 
   380 The translations cause the replacement of external forms by internal forms
   381 after parsing, and vice versa before printing of terms.  As a specification
   382 of the set theory notation, they should be largely self-explanatory.  The
   383 syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
   384 appear implicitly in the macro rules via their mixfix forms.
   385 
   386 Macros can define variable-binding syntax because they operate on \AST{}s,
   387 which have no inbuilt notion of bound variable.  The macro variables {\tt
   388   x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
   389 in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
   390 range over formulae containing bound variable occurrences.
   391 
   392 Other applications of the macro system can be less straightforward, and
   393 there are peculiarities.  The rest of this section will describe in detail
   394 how Isabelle macros are preprocessed and applied.
   395 
   396 
   397 \subsection{Specifying macros}
   398 Macros are basically rewrite rules on \AST{}s.  But unlike other macro
   399 systems found in programming languages, Isabelle's macros work in both
   400 directions.  Therefore a syntax contains two lists of rewrites: one for
   401 parsing and one for printing.
   402 
   403 \index{*translations section}
   404 The {\tt translations} section specifies macros.  The syntax for a macro is
   405 \[ (root)\; string \quad
   406    \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
   407    \right\} \quad
   408    (root)\; string 
   409 \]
   410 %
   411 This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
   412 ({\tt ==}).  The two strings specify the left and right-hand sides of the
   413 macro rule.  The $(root)$ specification is optional; it specifies the
   414 nonterminal for parsing the $string$ and if omitted defaults to {\tt
   415   logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
   416 \begin{itemize}
   417 \item Rules must be left linear: $l$ must not contain repeated variables.
   418 
   419 \item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =
   420   (\mtt"c\mtt" ~ x@1 \ldots x@n)$.
   421 
   422 \item Every variable in~$r$ must also occur in~$l$.
   423 \end{itemize}
   424 
   425 Macro rules may refer to any syntax from the parent theories.  They may
   426 also refer to anything defined before the the {\tt .thy} file's {\tt
   427   translations} section --- including any mixfix declarations.
   428 
   429 Upon declaration, both sides of the macro rule undergo parsing and parse
   430 \AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
   431 macro expansion.  The lexer runs in a different mode that additionally
   432 accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
   433 {\tt _K}).  Thus, a constant whose name starts with an underscore can
   434 appear in macro rules but not in ordinary terms.
   435 
   436 Some atoms of the macro rule's \AST{} are designated as constants for
   437 matching.  These are all names that have been declared as classes, types or
   438 constants.
   439 
   440 The result of this preprocessing is two lists of macro rules, each stored
   441 as a pair of \AST{}s.  They can be viewed using {\tt Syntax.print_syntax}
   442 (sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
   443 theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
   444 \begin{ttbox}
   445 parse_rules:
   446   ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
   447   ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
   448   ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
   449 print_rules:
   450   ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
   451   ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
   452   ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
   453 \end{ttbox}
   454 
   455 \begin{warn}
   456   Avoid choosing variable names that have previously been used as
   457   constants, types or type classes; the {\tt consts} section in the output
   458   of {\tt Syntax.print_syntax} lists all such names.  If a macro rule works
   459   incorrectly, inspect its internal form as shown above, recalling that
   460   constants appear as quoted strings and variables without quotes.
   461 \end{warn}
   462 
   463 \begin{warn}
   464 If \ttindex{eta_contract} is set to {\tt true}, terms will be
   465 $\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
   466 abstraction nodes needed for print rules to match may vanish.  For example,
   467 \verb|Ball(A, %x. P(x))| contracts {\tt Ball(A, P)}; the print rule does
   468 not apply and the output will be {\tt Ball(A, P)}.  This problem would not
   469 occur if \ML{} translation functions were used instead of macros (as is
   470 done for binder declarations).
   471 \end{warn}
   472 
   473 
   474 \begin{warn}
   475 Another trap concerns type constraints.  If \ttindex{show_types} is set to
   476 {\tt true}, bound variables will be decorated by their meta types at the
   477 binding place (but not at occurrences in the body).  Matching with
   478 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
   479 "i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
   480 appear in the external form, say \verb|{y::i:A::i. P::o}|.  
   481 
   482 To allow such constraints to be re-read, your syntax should specify bound
   483 variables using the nonterminal~\ndx{idt}.  This is the case in our
   484 example.  Choosing {\tt id} instead of {\tt idt} is a common error,
   485 especially since it appears in former versions of most of Isabelle's
   486 object-logics.
   487 \end{warn}
   488 
   489 
   490 
   491 \subsection{Applying rules}
   492 As a term is being parsed or printed, an \AST{} is generated as an
   493 intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
   494 normalized by applying macro rules in the manner of a traditional term
   495 rewriting system.  We first examine how a single rule is applied.
   496 
   497 Let $t$ be the abstract syntax tree to be normalized and $(l, r)$ some
   498 translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
   499 instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
   500 matched by $l$ may be replaced by the corresponding instance of~$r$, thus
   501 {\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
   502   place-holders} that may occur in rule patterns but not in ordinary
   503 \AST{}s; {\tt Variable} atoms serve this purpose.
   504 
   505 The matching of the object~$u$ by the pattern~$l$ is performed as follows:
   506 \begin{itemize}
   507   \item Every constant matches itself.
   508 
   509   \item $\Variable x$ in the object matches $\Constant x$ in the pattern.
   510     This point is discussed further below.
   511 
   512   \item Every \AST{} in the object matches $\Variable x$ in the pattern,
   513     binding~$x$ to~$u$.
   514 
   515   \item One application matches another if they have the same number of
   516     subtrees and corresponding subtrees match.
   517 
   518   \item In every other case, matching fails.  In particular, {\tt
   519       Constant}~$x$ can only match itself.
   520 \end{itemize}
   521 A successful match yields a substitution that is applied to~$r$, generating
   522 the instance that replaces~$u$.
   523 
   524 The second case above may look odd.  This is where {\tt Variable}s of
   525 non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
   526 far removed from parse trees; at this level it is not yet known which
   527 identifiers will become constants, bounds, frees, types or classes.  As
   528 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
   529 {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid} and
   530 \ndx{tvar} become {\tt Variable}s.  On the other hand, when \AST{}s
   531 generated from terms for printing, all constants and type constructors
   532 become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may contain a
   533 messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
   534 insignificant at macro level because matching treats them alike.
   535 
   536 Because of this behaviour, different kinds of atoms with the same name are
   537 indistinguishable, which may make some rules prone to misbehaviour.  Example:
   538 \begin{ttbox}
   539 types
   540   Nil
   541 consts
   542   Nil     :: "'a list"
   543   "[]"    :: "'a list"    ("[]")
   544 translations
   545   "[]"    == "Nil"
   546 \end{ttbox}
   547 The term {\tt Nil} will be printed as {\tt []}, just as expected.
   548 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
   549 expected! 
   550 
   551 Normalizing an \AST{} involves repeatedly applying macro rules until none
   552 is applicable.  Macro rules are chosen in the order that they appear in the
   553 {\tt translations} section.  You can watch the normalization of \AST{}s
   554 during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
   555 {\tt true}.\index{tracing!of macros} Alternatively, use
   556 \ttindex{Syntax.test_read}.  The information displayed when tracing
   557 includes the \AST{} before normalization ({\tt pre}), redexes with results
   558 ({\tt rewrote}), the normal form finally reached ({\tt post}) and some
   559 statistics ({\tt normalize}).  If tracing is off,
   560 \ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
   561 printing of the normal form and statistics only.
   562 
   563 
   564 \subsection{Example: the syntax of finite sets}
   565 \index{examples!of macros}
   566 
   567 This example demonstrates the use of recursive macros to implement a
   568 convenient notation for finite sets.
   569 \index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}
   570 \index{"@Enum@{\tt\at Enum} constant}
   571 \index{"@Finset@{\tt\at Finset} constant}
   572 \begin{ttbox}
   573 FINSET = SET +
   574 types
   575   is
   576 consts
   577   ""            :: "i => is"                ("_")
   578   "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
   579   empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
   580   insert        :: "[i, i] => i"
   581   "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
   582 translations
   583   "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
   584   "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
   585 end
   586 \end{ttbox}
   587 Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
   588 declarations above specify \verb|{x, y, z}| as the external representation
   589 of
   590 \begin{ttbox}
   591 insert(x, insert(y, insert(z, empty)))
   592 \end{ttbox}
   593 The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
   594   i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
   595 allows a line break after the comma for \rmindex{pretty printing}; if no
   596 line break is required then a space is printed instead.
   597 
   598 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
   599 declaration.  Hence {\tt is} is not a logical type and no default
   600 productions are added.  If we had needed enumerations of the nonterminal
   601 {\tt logic}, which would include all the logical types, we could have used
   602 the predefined nonterminal symbol \ndx{args} and skipped this part
   603 altogether.  The nonterminal~{\tt is} can later be reused for other
   604 enumerations of type~{\tt i} like lists or tuples.
   605 
   606 \index{"@Finset@{\tt\at Finset} constant}
   607 Next follows {\tt empty}, which is already equipped with its syntax
   608 \verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
   609 constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
   610   i} enclosed in curly braces.  Remember that a pair of parentheses, as in
   611 \verb|"{(_)}"|, specifies a block of indentation for pretty printing.
   612 
   613 The translations may look strange at first.  Macro rules are best
   614 understood in their internal forms:
   615 \begin{ttbox}
   616 parse_rules:
   617   ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
   618   ("{\at}Finset" x)  ->  ("insert" x "empty")
   619 print_rules:
   620   ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
   621   ("insert" x "empty")  ->  ("{\at}Finset" x)
   622 \end{ttbox}
   623 This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
   624 two elements, binding the first to {\tt x} and the rest to {\tt xs}.
   625 Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.  
   626 The parse rules only work in the order given.
   627 
   628 \begin{warn}
   629   The \AST{} rewriter cannot discern constants from variables and looks
   630   only for names of atoms.  Thus the names of {\tt Constant}s occurring in
   631   the (internal) left-hand side of translation rules should be regarded as
   632   \rmindex{reserved words}.  Choose non-identifiers like {\tt\at Finset} or
   633   sufficiently long and strange names.  If a bound variable's name gets
   634   rewritten, the result will be incorrect; for example, the term
   635 \begin{ttbox}
   636 \%empty insert. insert(x, empty)
   637 \end{ttbox}
   638   gets printed as \verb|%empty insert. {x}|.
   639 \end{warn}
   640 
   641 
   642 \subsection{Example: a parse macro for dependent types}\label{prod_trans}
   643 \index{examples!of macros}
   644 
   645 As stated earlier, a macro rule may not introduce new {\tt Variable}s on
   646 the right-hand side.  Something like \verb|"K(B)" => "%x. B"| is illegal;
   647 if allowed, it could cause variable capture.  In such cases you usually
   648 must fall back on translation functions.  But a trick can make things
   649 readable in some cases: {\em calling translation functions by parse
   650   macros}:
   651 \begin{ttbox}
   652 PROD = FINSET +
   653 consts
   654   Pi            :: "[i, i => i] => i"
   655   "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
   656   "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
   657 \ttbreak
   658 translations
   659   "PROD x:A. B" => "Pi(A, \%x. B)"
   660   "A -> B"      => "Pi(A, _K(B))"
   661 end
   662 ML
   663   val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
   664 \end{ttbox}
   665 
   666 Here {\tt Pi} is an internal constant for constructing general products.
   667 Two external forms exist: the general case {\tt PROD x:A.B} and the
   668 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
   669 does not depend on~{\tt x}.
   670 
   671 The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B|
   672 due to a parse translation associated with \cdx{_K}.  The order of the
   673 parse rules is critical.  Unfortunately there is no such trick for
   674 printing, so we have to add a {\tt ML} section for the print translation
   675 \ttindex{dependent_tr'}.
   676 
   677 Recall that identifiers with a leading {\tt _} are allowed in translation
   678 rules, but not in ordinary terms.  Thus we can create \AST{}s containing
   679 names that are not directly expressible.
   680 
   681 The parse translation for {\tt _K} is already installed in Pure, and {\tt
   682 dependent_tr'} is exported by the syntax module for public use.  See
   683 \S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
   684 \index{macros|)}\index{rewriting!syntactic|)}
   685 
   686 
   687 \section{Translation functions} \label{sec:tr_funs}
   688 \index{translations|(} 
   689 %
   690 This section describes the translation function mechanism.  By writing
   691 \ML{} functions, you can do almost everything with terms or \AST{}s during
   692 parsing and printing.  The logic \LK\ is a good example of sophisticated
   693 transformations between internal and external representations of
   694 associative sequences; here, macros would be useless.
   695 
   696 A full understanding of translations requires some familiarity
   697 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
   698 {\tt Syntax.ast} and the encodings of types and terms as such at the various
   699 stages of the parsing or printing process.  Most users should never need to
   700 use translation functions.
   701 
   702 \subsection{Declaring translation functions}
   703 There are four kinds of translation functions.  Each such function is
   704 associated with a name, which triggers calls to it.  Such names can be
   705 constants (logical or syntactic) or type constructors.
   706 
   707 {\tt Syntax.print_syntax} displays the sets of names associated with the
   708 translation functions of a {\tt Syntax.syntax} under
   709 \ttindex{parse_ast_translation}, \ttindex{parse_translation},
   710 \ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
   711 add new ones via the {\tt ML} section\index{*ML section} of
   712 a {\tt .thy} file.  There may never be more than one function of the same
   713 kind per name.  Conceptually, the {\tt ML} section should appear between
   714 {\tt consts} and {\tt translations}; newly installed translation functions
   715 are already effective when macros and logical rules are parsed.
   716 
   717 The {\tt ML} section is copied verbatim into the \ML\ file generated from a
   718 {\tt .thy} file.  Definitions made here are accessible as components of an
   719 \ML\ structure; to make some definitions private, use an \ML{} {\tt local}
   720 declaration.  The {\tt ML} section may install translation functions by
   721 declaring any of the following identifiers:
   722 \begin{ttbox}
   723 val parse_ast_translation : (string * (ast list -> ast)) list
   724 val print_ast_translation : (string * (ast list -> ast)) list
   725 val parse_translation     : (string * (term list -> term)) list
   726 val print_translation     : (string * (term list -> term)) list
   727 \end{ttbox}
   728 
   729 \subsection{The translation strategy}
   730 All four kinds of translation functions are treated similarly.  They are
   731 called during the transformations between parse trees, \AST{}s and terms
   732 (recall Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
   733 $(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function
   734 $f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
   735 function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
   736 
   737 For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.  A
   738 combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
   739   x@n}$.  For term translations, the arguments are terms and a combination
   740 has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
   741 x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more sophisticated
   742 transformations than \AST{}s do, typically involving abstractions and bound
   743 variables.
   744 
   745 Regardless of whether they act on terms or \AST{}s,
   746 parse translations differ from print translations fundamentally:
   747 \begin{description}
   748 \item[Parse translations] are applied bottom-up.  The arguments are already
   749   in translated form.  The translations must not fail; exceptions trigger
   750   an error message.
   751 
   752 \item[Print translations] are applied top-down.  They are supplied with
   753   arguments that are partly still in internal form.  The result again
   754   undergoes translation; therefore a print translation should not introduce
   755   as head the very constant that invoked it.  The function may raise
   756   exception \xdx{Match} to indicate failure; in this event it has no
   757   effect.
   758 \end{description}
   759 
   760 Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
   761 \ttindex{Const} for terms --- can invoke translation functions.  This
   762 causes another difference between parsing and printing.
   763 
   764 Parsing starts with a string and the constants are not yet identified.
   765 Only parse tree heads create {\tt Constant}s in the resulting \AST, as
   766 described in \S\ref{sec:astofpt}.  Macros and parse \AST{} translations may
   767 introduce further {\tt Constant}s.  When the final \AST{} is converted to a
   768 term, all {\tt Constant}s become {\tt Const}s, as described in
   769 \S\ref{sec:termofast}.
   770 
   771 Printing starts with a well-typed term and all the constants are known.  So
   772 all logical constants and type constructors may invoke print translations.
   773 These, and macros, may introduce further constants.
   774 
   775 
   776 \subsection{Example: a print translation for dependent types}
   777 \index{examples!of translations}\indexbold{*dependent_tr'}
   778 
   779 Let us continue the dependent type example (page~\pageref{prod_trans}) by
   780 examining the parse translation for~\cdx{_K} and the print translation
   781 {\tt dependent_tr'}, which are both built-in.  By convention, parse
   782 translations have names ending with {\tt _tr} and print translations have
   783 names ending with {\tt _tr'}.  Search for such names in the Isabelle
   784 sources to locate more examples.
   785 
   786 Here is the parse translation for {\tt _K}:
   787 \begin{ttbox}
   788 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
   789   | k_tr ts = raise TERM("k_tr",ts);
   790 \end{ttbox}
   791 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
   792 {\tt Abs} node with a body derived from $t$.  Since terms given to parse
   793 translations are not yet typed, the type of the bound variable in the new
   794 {\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
   795 nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
   796 a basic term manipulation function defined in {\tt Pure/term.ML}.
   797 
   798 Here is the print translation for dependent types:
   799 \begin{ttbox}
   800 fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =
   801       if 0 mem (loose_bnos B) then
   802         let val (x', B') = variant_abs (x, dummyT, B);
   803         in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
   804         end
   805       else list_comb (Const (r, dummyT) $ A $ B, ts)
   806   | dependent_tr' _ _ = raise Match;
   807 \end{ttbox}
   808 The argument {\tt (q,r)} is supplied to {\tt dependent_tr'} by a curried
   809 function application during its installation.  We could set up print
   810 translations for both {\tt Pi} and {\tt Sigma} by including
   811 \begin{ttbox}\index{*ML section}
   812 val print_translation =
   813   [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
   814    ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
   815 \end{ttbox}
   816 within the {\tt ML} section.  The first of these transforms ${\tt Pi}(A,
   817 \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
   818 $\hbox{\tt{\at}->}r(A, B)$, choosing the latter form if $B$ does not depend
   819 on~$x$.  It checks this using \ttindex{loose_bnos}, yet another function
   820 from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$ renamed away
   821 from all names in $B$, and $B'$ the body $B$ with {\tt Bound} nodes
   822 referring to our {\tt Abs} node replaced by $\ttfct{Free} (x',
   823 \mtt{dummyT})$.
   824 
   825 We must be careful with types here.  While types of {\tt Const}s are
   826 ignored, type constraints may be printed for some {\tt Free}s and
   827 {\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
   828 \ttindex{dummyT} are never printed with constraint, though.  The line
   829 \begin{ttbox}
   830         let val (x', B') = variant_abs (x, dummyT, B);
   831 \end{ttbox}\index{*variant_abs}
   832 replaces bound variable occurrences in~$B$ by the free variable $x'$ with
   833 type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
   834 correct type~{\tt T}, so this is the only place where a type
   835 constraint might appear. 
   836 \index{translations|)}
   837 \index{syntax!transformations|)}