doc-src/Ref/syntax.tex
author wenzelm
Wed, 18 Jan 1995 10:17:55 +0100
changeset 864 d63b111b917a
parent 499 5a54c796b808
child 1387 9bcad9c22fd4
permissions -rw-r--r--
quite a lot of minor and major revisions (inspecting theories, read_axm,
cert_axm, grammar generation, lexical matters, macro examples, ...);
     1 %% $Id$
     2 \chapter{Syntax Transformations} \label{chap:syntax}
     3 \newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
     4 \newcommand\mtt[1]{\mbox{\tt #1}}
     5 \newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
     6 \newcommand\Constant{\ttfct{Constant}}
     7 \newcommand\Variable{\ttfct{Variable}}
     8 \newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
     9 \index{syntax!transformations|(}
    10 
    11 This chapter is intended for experienced Isabelle users who need to define
    12 macros or code their own translation functions.  It describes the
    13 transformations between parse trees, abstract syntax trees and terms.
    14 
    15 
    16 \section{Abstract syntax trees} \label{sec:asts}
    17 \index{ASTs|(}
    18 
    19 The parser, given a token list from the lexer, applies productions to yield
    20 a parse tree\index{parse trees}.  By applying some internal transformations
    21 the parse tree becomes an abstract syntax tree, or \AST{}.  Macro
    22 expansion, further translations and finally type inference yields a
    23 well-typed term.  The printing process is the reverse, except for some
    24 subtleties to be discussed later.
    25 
    26 Figure~\ref{fig:parse_print} outlines the parsing and printing process.
    27 Much of the complexity is due to the macro mechanism.  Using macros, you
    28 can specify most forms of concrete syntax without writing any \ML{} code.
    29 
    30 \begin{figure}
    31 \begin{center}
    32 \begin{tabular}{cl}
    33 string          & \\
    34 $\downarrow$    & parser \\
    35 parse tree      & \\
    36 $\downarrow$    & parse \AST{} translation \\
    37 \AST{}             & \\
    38 $\downarrow$    & \AST{} rewriting (macros) \\
    39 \AST{}             & \\
    40 $\downarrow$    & parse translation, type inference \\
    41 --- well-typed term --- & \\
    42 $\downarrow$    & print translation \\
    43 \AST{}             & \\
    44 $\downarrow$    & \AST{} rewriting (macros) \\
    45 \AST{}             & \\
    46 $\downarrow$    & print \AST{} translation, printer \\
    47 string          &
    48 \end{tabular}
    49 
    50 \end{center}
    51 \caption{Parsing and printing}\label{fig:parse_print}
    52 \end{figure}
    53 
    54 Abstract syntax trees are an intermediate form between the raw parse trees
    55 and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or
    56 variable) or a list of {\em at least two\/} subtrees.  Internally, they
    57 have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
    58 \index{*Appl}
    59 \begin{ttbox}
    60 datatype ast = Constant of string
    61              | Variable of string
    62              | Appl of ast list
    63 \end{ttbox}
    64 %
    65 Isabelle uses an S-expression syntax for abstract syntax trees.  Constant
    66 atoms are shown as quoted strings, variable atoms as non-quoted strings and
    67 applications as a parenthesised list of subtrees.  For example, the \AST
    68 \begin{ttbox}
    69 Appl [Constant "_constrain",
    70       Appl [Constant "_abs", Variable "x", Variable "t"],
    71       Appl [Constant "fun", Variable "'a", Variable "'b"]]
    72 \end{ttbox}
    73 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
    74 Both {\tt ()} and {\tt (f)} are illegal because they have too few
    75 subtrees.
    76 
    77 The resemblance to Lisp's S-expressions is intentional, but there are two
    78 kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the
    79 names {\tt Constant} and {\tt Variable} too literally; in the later
    80 translation to terms, $\Variable x$ may become a constant, free or bound
    81 variable, even a type constructor or class name; the actual outcome depends
    82 on the context.
    83 
    84 Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
    85 application of~$f$ to the arguments $x@1, \ldots, x@n$.  But the kind of
    86 application is determined later by context; it could be a type constructor
    87 applied to types.
    88 
    89 Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
    90 first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later
    91 at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
    92 occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
    93 constructor \ttindex{Bound}).
    94 
    95 
    96 \section{Transforming parse trees to \AST{}s}\label{sec:astofpt}
    97 \index{ASTs!made from parse trees}
    98 \newcommand\astofpt[1]{\lbrakk#1\rbrakk}
    99 
   100 The parse tree is the raw output of the parser.  Translation functions,
   101 called {\bf parse AST translations}\indexbold{translations!parse AST},
   102 transform the parse tree into an abstract syntax tree.
   103 
   104 The parse tree is constructed by nesting the right-hand sides of the
   105 productions used to recognize the input.  Such parse trees are simply lists
   106 of tokens and constituent parse trees, the latter representing the
   107 nonterminals of the productions.  Let us refer to the actual productions in
   108 the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
   109 example).
   110 
   111 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
   112 by stripping out delimiters and copy productions.  More precisely, the
   113 mapping $\astofpt{-}$ is derived from the productions as follows:
   114 \begin{itemize}
   115 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
   116   \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{xnum} or \ndx{xstr} token, and $s$
   117   its associated string. Note that for {\tt xstr} this does not include the
   118   quotes.
   119 
   120 \item Copy productions:\index{productions!copy}
   121   $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
   122   strings of delimiters, which are discarded.  $P$ stands for the single
   123   constituent that is not a delimiter; it is either a nonterminal symbol or
   124   a name token.
   125 
   126   \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
   127     Here there are no constituents other than delimiters, which are
   128     discarded.
   129 
   130   \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
   131     the remaining constituents $P@1$, \ldots, $P@n$ are built into an
   132     application whose head constant is~$c$:
   133     \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
   134        \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
   135     \]
   136 \end{itemize}
   137 Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
   138 {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
   139 These examples illustrate the need for further translations to make \AST{}s
   140 closer to the typed $\lambda$-calculus.  The Pure syntax provides
   141 predefined parse \AST{} translations\index{translations!parse AST} for
   142 ordinary applications, type applications, nested abstractions, meta
   143 implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
   144 effect on some representative input strings.
   145 
   146 
   147 \begin{figure}
   148 \begin{center}
   149 \tt\begin{tabular}{ll}
   150 \rm input string    & \rm \AST \\\hline
   151 "f"                 & f \\
   152 "'a"                & 'a \\
   153 "t == u"            & ("==" t u) \\
   154 "f(x)"              & ("_appl" f x) \\
   155 "f(x, y)"           & ("_appl" f ("_args" x y)) \\
   156 "f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
   157 "\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
   158 \end{tabular}
   159 \end{center}
   160 \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
   161 \end{figure}
   162 
   163 \begin{figure}
   164 \begin{center}
   165 \tt\begin{tabular}{ll}
   166 \rm input string            & \rm \AST{} \\\hline
   167 "f(x, y, z)"                & (f x y z) \\
   168 "'a ty"                     & (ty 'a) \\
   169 "('a, 'b) ty"               & (ty 'a 'b) \\
   170 "\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
   171 "\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
   172 "[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
   173 "['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
   174 \end{tabular}
   175 \end{center}
   176 \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
   177 \end{figure}
   178 
   179 The names of constant heads in the \AST{} control the translation process.
   180 The list of constants invoking parse \AST{} translations appears in the
   181 output of {\tt print_syntax} under {\tt parse_ast_translation}.
   182 
   183 
   184 \section{Transforming \AST{}s to terms}\label{sec:termofast}
   185 \index{terms!made from ASTs}
   186 \newcommand\termofast[1]{\lbrakk#1\rbrakk}
   187 
   188 The \AST{}, after application of macros (see \S\ref{sec:macros}), is
   189 transformed into a term.  This term is probably ill-typed since type
   190 inference has not occurred yet.  The term may contain type constraints
   191 consisting of applications with head {\tt "_constrain"}; the second
   192 argument is a type encoded as a term.  Type inference later introduces
   193 correct types or rejects the input.
   194 
   195 Another set of translation functions, namely parse
   196 translations\index{translations!parse}, may affect this process.  If we
   197 ignore parse translations for the time being, then \AST{}s are transformed
   198 to terms by mapping \AST{} constants to constants, \AST{} variables to
   199 schematic or free variables and \AST{} applications to applications.
   200 
   201 More precisely, the mapping $\termofast{-}$ is defined by
   202 \begin{itemize}
   203 \item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
   204   \mtt{dummyT})$.
   205 
   206 \item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
   207   \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
   208   the index extracted from~$xi$.
   209 
   210 \item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
   211   \mtt{dummyT})$.
   212 
   213 \item Function applications with $n$ arguments:
   214     \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
   215        \termofast{f} \ttapp
   216          \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
   217     \]
   218 \end{itemize}
   219 Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
   220 \verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
   221 while \ttindex{dummyT} stands for some dummy type that is ignored during
   222 type inference.
   223 
   224 So far the outcome is still a first-order term.  Abstractions and bound
   225 variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
   226 by parse translations.  Such translations are attached to {\tt "_abs"},
   227 {\tt "!!"} and user-defined binders.
   228 
   229 
   230 \section{Printing of terms}
   231 \newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
   232 
   233 The output phase is essentially the inverse of the input phase.  Terms are
   234 translated via abstract syntax trees into strings.  Finally the strings are
   235 pretty printed.
   236 
   237 Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
   238 terms into \AST{}s.  Ignoring those, the transformation maps
   239 term constants, variables and applications to the corresponding constructs
   240 on \AST{}s.  Abstractions are mapped to applications of the special
   241 constant {\tt _abs}.
   242 
   243 More precisely, the mapping $\astofterm{-}$ is defined as follows:
   244 \begin{itemize}
   245   \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
   246 
   247   \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
   248     \tau)$.
   249 
   250   \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
   251     \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
   252     the {\tt indexname} $(x, i)$.
   253 
   254   \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
   255     of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
   256     be obtained from~$t$ by replacing all bound occurrences of~$x$ by
   257     the free variable $x'$.  This replaces corresponding occurrences of the
   258     constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
   259     \mtt{dummyT})$:
   260    \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
   261       \Appl{\Constant \mtt{"_abs"},
   262         constrain(\Variable x', \tau), \astofterm{t'}}.
   263     \]
   264 
   265   \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
   266     The occurrence of constructor \ttindex{Bound} should never happen
   267     when printing well-typed terms; it indicates a de Bruijn index with no
   268     matching abstraction.
   269 
   270   \item Where $f$ is not an application,
   271     \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
   272        \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
   273     \]
   274 \end{itemize}
   275 %
   276 Type constraints\index{type constraints} are inserted to allow the printing
   277 of types.  This is governed by the boolean variable \ttindex{show_types}:
   278 \begin{itemize}
   279   \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
   280     \ttindex{show_types} is set to {\tt false}.
   281 
   282   \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
   283          \astofterm{\tau}}$ \ otherwise.
   284 
   285     Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
   286     constructors go to {\tt Constant}s; type identifiers go to {\tt
   287       Variable}s; type applications go to {\tt Appl}s with the type
   288     constructor as the first element.  If \ttindex{show_sorts} is set to
   289     {\tt true}, some type variables are decorated with an \AST{} encoding
   290     of their sort.
   291 \end{itemize}
   292 %
   293 The \AST{}, after application of macros (see \S\ref{sec:macros}), is
   294 transformed into the final output string.  The built-in {\bf print AST
   295   translations}\indexbold{translations!print AST} reverse the
   296 parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
   297 
   298 For the actual printing process, the names attached to productions
   299 of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
   300 a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
   301 $(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
   302 for~$c$.  Each argument~$x@i$ is converted to a string, and put in
   303 parentheses if its priority~$(p@i)$ requires this.  The resulting strings
   304 and their syntactic sugar (denoted by \dots{} above) are joined to make a
   305 single string.
   306 
   307 If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
   308 corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
   309 x@n) ~ x@{n+1} \ldots x@m)$.  Applications with too few arguments or with
   310 non-constant head or without a corresponding production are printed as
   311 $f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$.  An occurrence of
   312 $\Variable x$ is simply printed as~$x$.
   313 
   314 Blanks are {\em not\/} inserted automatically.  If blanks are required to
   315 separate tokens, specify them in the mixfix declaration, possibly preceded
   316 by a slash~({\tt/}) to allow a line break.
   317 \index{ASTs|)}
   318 
   319 
   320 
   321 \section{Macros: Syntactic rewriting} \label{sec:macros}
   322 \index{macros|(}\index{rewriting!syntactic|(}
   323 
   324 Mixfix declarations alone can handle situations where there is a direct
   325 connection between the concrete syntax and the underlying term.  Sometimes
   326 we require a more elaborate concrete syntax, such as quantifiers and list
   327 notation.  Isabelle's {\bf macros} and {\bf translation functions} can
   328 perform translations such as
   329 \begin{center}\tt
   330   \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
   331     ALL x:A.P   & Ball(A, \%x.P)        \\ \relax
   332     [x, y, z]   & Cons(x, Cons(y, Cons(z, Nil)))
   333   \end{tabular}
   334 \end{center}
   335 Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
   336 are the most powerful translation mechanism but are difficult to read or
   337 write.  Macros are specified by first-order rewriting systems that operate
   338 on abstract syntax trees.  They are usually easy to read and write, and can
   339 express all but the most obscure translations.
   340 
   341 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
   342 theory.\footnote{This and the following theories are complete working
   343   examples, though they specify only syntax, no axioms.  The file {\tt
   344     ZF/ZF.thy} presents a full set theory definition, including many
   345   macro rules.}  Theory {\tt SET} defines constants for set comprehension
   346 ({\tt Collect}), replacement ({\tt Replace}) and bounded universal
   347 quantification ({\tt Ball}).  Each of these binds some variables.  Without
   348 additional syntax we should have to write $\forall x \in A.  P$ as {\tt
   349   Ball(A,\%x.P)}, and similarly for the others.
   350 
   351 \begin{figure}
   352 \begin{ttbox}
   353 SET = Pure +
   354 types
   355   i o
   356 arities
   357   i, o :: logic
   358 consts
   359   Trueprop      :: "o => prop"              ("_" 5)
   360   Collect       :: "[i, i => o] => i"
   361   Replace       :: "[i, [i, i] => o] => i"
   362   Ball          :: "[i, i => o] => o"
   363 syntax
   364   "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
   365   "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
   366   "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
   367 translations
   368   "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
   369   "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
   370   "ALL x:A. P"  == "Ball(A, \%x. P)"
   371 end
   372 \end{ttbox}
   373 \caption{Macro example: set theory}\label{fig:set_trans}
   374 \end{figure}
   375 
   376 The theory specifies a variable-binding syntax through additional productions
   377 that have mixfix declarations.  Each non-copy production must specify some
   378 constant, which is used for building \AST{}s. \index{constants!syntactic} The
   379 additional constants are decorated with {\tt\at} to stress their purely
   380 syntactic purpose; they may not occur within the final well-typed terms,
   381 being declared as {\tt syntax} rather than {\tt consts}.
   382 
   383 The translations cause the replacement of external forms by internal forms
   384 after parsing, and vice versa before printing of terms.  As a specification
   385 of the set theory notation, they should be largely self-explanatory.  The
   386 syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
   387 appear implicitly in the macro rules via their mixfix forms.
   388 
   389 Macros can define variable-binding syntax because they operate on \AST{}s,
   390 which have no inbuilt notion of bound variable.  The macro variables {\tt
   391   x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
   392 in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
   393 range over formulae containing bound variable occurrences.
   394 
   395 Other applications of the macro system can be less straightforward, and
   396 there are peculiarities.  The rest of this section will describe in detail
   397 how Isabelle macros are preprocessed and applied.
   398 
   399 
   400 \subsection{Specifying macros}
   401 Macros are basically rewrite rules on \AST{}s.  But unlike other macro
   402 systems found in programming languages, Isabelle's macros work in both
   403 directions.  Therefore a syntax contains two lists of rewrites: one for
   404 parsing and one for printing.
   405 
   406 \index{*translations section}
   407 The {\tt translations} section specifies macros.  The syntax for a macro is
   408 \[ (root)\; string \quad
   409    \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
   410    \right\} \quad
   411    (root)\; string
   412 \]
   413 %
   414 This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
   415 ({\tt ==}).  The two strings specify the left and right-hand sides of the
   416 macro rule.  The $(root)$ specification is optional; it specifies the
   417 nonterminal for parsing the $string$ and if omitted defaults to {\tt
   418   logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
   419 \begin{itemize}
   420 \item Rules must be left linear: $l$ must not contain repeated variables.
   421 
   422 \item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =
   423   (\mtt"c\mtt" ~ x@1 \ldots x@n)$.
   424 
   425 \item Every variable in~$r$ must also occur in~$l$.
   426 \end{itemize}
   427 
   428 Macro rules may refer to any syntax from the parent theories.  They may
   429 also refer to anything defined before the {\tt .thy} file's {\tt
   430   translations} section --- including any mixfix declarations.
   431 
   432 Upon declaration, both sides of the macro rule undergo parsing and parse
   433 \AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
   434 macro expansion.  The lexer runs in a different mode that additionally
   435 accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
   436 {\tt _K}).  Thus, a constant whose name starts with an underscore can
   437 appear in macro rules but not in ordinary terms.
   438 
   439 Some atoms of the macro rule's \AST{} are designated as constants for
   440 matching.  These are all names that have been declared as classes, types or
   441 constants (logical and syntactic).
   442 
   443 The result of this preprocessing is two lists of macro rules, each stored
   444 as a pair of \AST{}s.  They can be viewed using {\tt print_syntax}
   445 (sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
   446 theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
   447 \begin{ttbox}
   448 parse_rules:
   449   ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
   450   ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
   451   ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
   452 print_rules:
   453   ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
   454   ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
   455   ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
   456 \end{ttbox}
   457 
   458 \begin{warn}
   459   Avoid choosing variable names that have previously been used as
   460   constants, types or type classes; the {\tt consts} section in the output
   461   of {\tt print_syntax} lists all such names.  If a macro rule works
   462   incorrectly, inspect its internal form as shown above, recalling that
   463   constants appear as quoted strings and variables without quotes.
   464 \end{warn}
   465 
   466 \begin{warn}
   467 If \ttindex{eta_contract} is set to {\tt true}, terms will be
   468 $\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
   469 abstraction nodes needed for print rules to match may vanish.  For example,
   470 \verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
   471 not apply and the output will be {\tt Ball(A, P)}.  This problem would not
   472 occur if \ML{} translation functions were used instead of macros (as is
   473 done for binder declarations).
   474 \end{warn}
   475 
   476 
   477 \begin{warn}
   478 Another trap concerns type constraints.  If \ttindex{show_types} is set to
   479 {\tt true}, bound variables will be decorated by their meta types at the
   480 binding place (but not at occurrences in the body).  Matching with
   481 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
   482 "i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
   483 appear in the external form, say \verb|{y::i:A::i. P::o}|.
   484 
   485 To allow such constraints to be re-read, your syntax should specify bound
   486 variables using the nonterminal~\ndx{idt}.  This is the case in our
   487 example.  Choosing {\tt id} instead of {\tt idt} is a common error,
   488 especially since it appears in former versions of most of Isabelle's
   489 object-logics.
   490 \end{warn}
   491 
   492 
   493 
   494 \subsection{Applying rules}
   495 As a term is being parsed or printed, an \AST{} is generated as an
   496 intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
   497 normalised by applying macro rules in the manner of a traditional term
   498 rewriting system.  We first examine how a single rule is applied.
   499 
   500 Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some
   501 translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
   502 instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
   503 matched by $l$ may be replaced by the corresponding instance of~$r$, thus
   504 {\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
   505   place-holders} that may occur in rule patterns but not in ordinary
   506 \AST{}s; {\tt Variable} atoms serve this purpose.
   507 
   508 The matching of the object~$u$ by the pattern~$l$ is performed as follows:
   509 \begin{itemize}
   510   \item Every constant matches itself.
   511 
   512   \item $\Variable x$ in the object matches $\Constant x$ in the pattern.
   513     This point is discussed further below.
   514 
   515   \item Every \AST{} in the object matches $\Variable x$ in the pattern,
   516     binding~$x$ to~$u$.
   517 
   518   \item One application matches another if they have the same number of
   519     subtrees and corresponding subtrees match.
   520 
   521   \item In every other case, matching fails.  In particular, {\tt
   522       Constant}~$x$ can only match itself.
   523 \end{itemize}
   524 A successful match yields a substitution that is applied to~$r$, generating
   525 the instance that replaces~$u$.
   526 
   527 The second case above may look odd.  This is where {\tt Variable}s of
   528 non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
   529 far removed from parse trees; at this level it is not yet known which
   530 identifiers will become constants, bounds, frees, types or classes.  As
   531 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
   532 {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},
   533 \ndx{tvar}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s.  On the other
   534 hand, when \AST{}s generated from terms for printing, all constants and type
   535 constructors become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may
   536 contain a messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
   537 insignificant at macro level because matching treats them alike.
   538 
   539 Because of this behaviour, different kinds of atoms with the same name are
   540 indistinguishable, which may make some rules prone to misbehaviour.  Example:
   541 \begin{ttbox}
   542 types
   543   Nil
   544 consts
   545   Nil     :: "'a list"
   546 syntax
   547   "[]"    :: "'a list"    ("[]")
   548 translations
   549   "[]"    == "Nil"
   550 \end{ttbox}
   551 The term {\tt Nil} will be printed as {\tt []}, just as expected.
   552 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
   553 expected!  Guess how type~{\tt Nil} is printed?
   554 
   555 Normalizing an \AST{} involves repeatedly applying macro rules until none
   556 are applicable.  Macro rules are chosen in the order that they appear in the
   557 {\tt translations} section.  You can watch the normalization of \AST{}s
   558 during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
   559 {\tt true}.\index{tracing!of macros} Alternatively, use
   560 \ttindex{Syntax.test_read}.  The information displayed when tracing
   561 includes the \AST{} before normalization ({\tt pre}), redexes with results
   562 ({\tt rewrote}), the normal form finally reached ({\tt post}) and some
   563 statistics ({\tt normalize}).  If tracing is off,
   564 \ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
   565 printing of the normal form and statistics only.
   566 
   567 
   568 \subsection{Example: the syntax of finite sets}
   569 \index{examples!of macros}
   570 
   571 This example demonstrates the use of recursive macros to implement a
   572 convenient notation for finite sets.
   573 \index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}
   574 \index{"@Enum@{\tt\at Enum} constant}
   575 \index{"@Finset@{\tt\at Finset} constant}
   576 \begin{ttbox}
   577 FINSET = SET +
   578 types
   579   is
   580 syntax
   581   ""            :: "i => is"                ("_")
   582   "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
   583 consts
   584   empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
   585   insert        :: "[i, i] => i"
   586 syntax
   587   "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
   588 translations
   589   "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
   590   "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
   591 end
   592 \end{ttbox}
   593 Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
   594 declarations above specify \verb|{x, y, z}| as the external representation
   595 of
   596 \begin{ttbox}
   597 insert(x, insert(y, insert(z, empty)))
   598 \end{ttbox}
   599 The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
   600   i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
   601 allows a line break after the comma for \rmindex{pretty printing}; if no
   602 line break is required then a space is printed instead.
   603 
   604 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
   605 declaration.  Hence {\tt is} is not a logical type and may be used safely as
   606 a new nonterminal for custom syntax.  The nonterminal~{\tt is} can later be
   607 re-used for other enumerations of type~{\tt i} like lists or tuples. If we
   608 had needed polymorphic enumerations, we could have used the predefined
   609 nonterminal symbol \ndx{args} and skipped this part altogether.
   610 
   611 \index{"@Finset@{\tt\at Finset} constant}
   612 Next follows {\tt empty}, which is already equipped with its syntax
   613 \verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
   614 constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
   615   i} enclosed in curly braces.  Remember that a pair of parentheses, as in
   616 \verb|"{(_)}"|, specifies a block of indentation for pretty printing.
   617 
   618 The translations may look strange at first.  Macro rules are best
   619 understood in their internal forms:
   620 \begin{ttbox}
   621 parse_rules:
   622   ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
   623   ("{\at}Finset" x)  ->  ("insert" x "empty")
   624 print_rules:
   625   ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
   626   ("insert" x "empty")  ->  ("{\at}Finset" x)
   627 \end{ttbox}
   628 This shows that \verb|{x,xs}| indeed matches any set enumeration of at least
   629 two elements, binding the first to {\tt x} and the rest to {\tt xs}.
   630 Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.
   631 The parse rules only work in the order given.
   632 
   633 \begin{warn}
   634   The \AST{} rewriter cannot distinguish constants from variables and looks
   635   only for names of atoms.  Thus the names of {\tt Constant}s occurring in
   636   the (internal) left-hand side of translation rules should be regarded as
   637   \rmindex{reserved words}.  Choose non-identifiers like {\tt\at Finset} or
   638   sufficiently long and strange names.  If a bound variable's name gets
   639   rewritten, the result will be incorrect; for example, the term
   640 \begin{ttbox}
   641 \%empty insert. insert(x, empty)
   642 \end{ttbox}
   643 \par\noindent is incorrectly printed as \verb|%empty insert. {x}|.
   644 \end{warn}
   645 
   646 
   647 \subsection{Example: a parse macro for dependent types}\label{prod_trans}
   648 \index{examples!of macros}
   649 
   650 As stated earlier, a macro rule may not introduce new {\tt Variable}s on
   651 the right-hand side.  Something like \verb|"K(B)" => "%x.B"| is illegal;
   652 if allowed, it could cause variable capture.  In such cases you usually
   653 must fall back on translation functions.  But a trick can make things
   654 readable in some cases: {\em calling\/} translation functions by parse
   655 macros:
   656 \begin{ttbox}
   657 PROD = FINSET +
   658 consts
   659   Pi            :: "[i, i => i] => i"
   660 syntax
   661   "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
   662   "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
   663 \ttbreak
   664 translations
   665   "PROD x:A. B" => "Pi(A, \%x. B)"
   666   "A -> B"      => "Pi(A, _K(B))"
   667 end
   668 ML
   669   val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
   670 \end{ttbox}
   671 
   672 Here {\tt Pi} is a logical constant for constructing general products.
   673 Two external forms exist: the general case {\tt PROD x:A.B} and the
   674 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
   675 does not depend on~{\tt x}.
   676 
   677 The second parse macro introduces {\tt _K(B)}, which later becomes
   678 \verb|%x.B| due to a parse translation associated with \cdx{_K}.
   679 Unfortunately there is no such trick for printing, so we have to add a {\tt
   680 ML} section for the print translation \ttindex{dependent_tr'}.
   681 
   682 Recall that identifiers with a leading {\tt _} are allowed in translation
   683 rules, but not in ordinary terms.  Thus we can create \AST{}s containing
   684 names that are not directly expressible.
   685 
   686 The parse translation for {\tt _K} is already installed in Pure, and {\tt
   687 dependent_tr'} is exported by the syntax module for public use.  See
   688 \S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
   689 \index{macros|)}\index{rewriting!syntactic|)}
   690 
   691 
   692 \section{Translation functions} \label{sec:tr_funs}
   693 \index{translations|(}
   694 %
   695 This section describes the translation function mechanism.  By writing
   696 \ML{} functions, you can do almost everything with terms or \AST{}s during
   697 parsing and printing.  The logic \LK\ is a good example of sophisticated
   698 transformations between internal and external representations of sequents;
   699 here, macros would be useless.
   700 
   701 A full understanding of translations requires some familiarity
   702 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
   703 {\tt Syntax.ast} and the encodings of types and terms as such at the various
   704 stages of the parsing or printing process.  Most users should never need to
   705 use translation functions.
   706 
   707 \subsection{Declaring translation functions}
   708 There are four kinds of translation functions.  Each such function is
   709 associated with a name, which triggers calls to it.  Such names can be
   710 constants (logical or syntactic) or type constructors.
   711 
   712 {\tt print_syntax} displays the sets of names associated with the
   713 translation functions of a {\tt Syntax.syntax} under
   714 \ttindex{parse_ast_translation}, \ttindex{parse_translation},
   715 \ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
   716 add new ones via the {\tt ML} section\index{*ML section} of
   717 a {\tt .thy} file.  There may never be more than one function of the same
   718 kind per name.  Even though the {\tt ML} section is the very last part of a
   719 {\tt .thy} file, newly installed translation functions are effective when
   720 processing the preceding section.
   721 
   722 The {\tt ML} section is copied verbatim near the beginning of the \ML\ file
   723 generated from a {\tt .thy} file.  Definitions made here are accessible as
   724 components of an \ML\ structure; to make some definitions private, use an
   725 \ML{} {\tt local} declaration.  The {\tt ML} section may install translation
   726 functions by declaring any of the following identifiers:
   727 \begin{ttbox}
   728 val parse_ast_translation : (string * (ast list -> ast)) list
   729 val print_ast_translation : (string * (ast list -> ast)) list
   730 val parse_translation     : (string * (term list -> term)) list
   731 val print_translation     : (string * (term list -> term)) list
   732 \end{ttbox}
   733 
   734 \subsection{The translation strategy}
   735 All four kinds of translation functions are treated similarly.  They are
   736 called during the transformations between parse trees, \AST{}s and terms
   737 (recall Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
   738 $(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function
   739 $f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
   740 function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
   741 
   742 For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.  A
   743 combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
   744   x@n}$.  For term translations, the arguments are terms and a combination
   745 has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
   746 x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more sophisticated
   747 transformations than \AST{}s do, typically involving abstractions and bound
   748 variables.
   749 
   750 Regardless of whether they act on terms or \AST{}s, parse translations differ
   751 from print translations in their overall behaviour fundamentally:
   752 \begin{description}
   753 \item[Parse translations] are applied bottom-up.  The arguments are already
   754   in translated form.  The translations must not fail; exceptions trigger
   755   an error message.
   756 
   757 \item[Print translations] are applied top-down.  They are supplied with
   758   arguments that are partly still in internal form.  The result again
   759   undergoes translation; therefore a print translation should not introduce
   760   as head the very constant that invoked it.  The function may raise
   761   exception \xdx{Match} to indicate failure; in this event it has no
   762   effect.
   763 \end{description}
   764 
   765 Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
   766 \ttindex{Const} for terms --- can invoke translation functions.  This
   767 causes another difference between parsing and printing.
   768 
   769 Parsing starts with a string and the constants are not yet identified.
   770 Only parse tree heads create {\tt Constant}s in the resulting \AST, as
   771 described in \S\ref{sec:astofpt}.  Macros and parse \AST{} translations may
   772 introduce further {\tt Constant}s.  When the final \AST{} is converted to a
   773 term, all {\tt Constant}s become {\tt Const}s, as described in
   774 \S\ref{sec:termofast}.
   775 
   776 Printing starts with a well-typed term and all the constants are known.  So
   777 all logical constants and type constructors may invoke print translations.
   778 These, and macros, may introduce further constants.
   779 
   780 
   781 \subsection{Example: a print translation for dependent types}
   782 \index{examples!of translations}\indexbold{*dependent_tr'}
   783 
   784 Let us continue the dependent type example (page~\pageref{prod_trans}) by
   785 examining the parse translation for~\cdx{_K} and the print translation
   786 {\tt dependent_tr'}, which are both built-in.  By convention, parse
   787 translations have names ending with {\tt _tr} and print translations have
   788 names ending with {\tt _tr'}.  Search for such names in the Isabelle
   789 sources to locate more examples.
   790 
   791 Here is the parse translation for {\tt _K}:
   792 \begin{ttbox}
   793 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
   794   | k_tr ts = raise TERM ("k_tr", ts);
   795 \end{ttbox}
   796 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
   797 {\tt Abs} node with a body derived from $t$.  Since terms given to parse
   798 translations are not yet typed, the type of the bound variable in the new
   799 {\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
   800 nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
   801 a basic term manipulation function defined in {\tt Pure/term.ML}.
   802 
   803 Here is the print translation for dependent types:
   804 \begin{ttbox}
   805 fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
   806       if 0 mem (loose_bnos B) then
   807         let val (x', B') = variant_abs (x, dummyT, B);
   808         in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
   809         end
   810       else list_comb (Const (r, dummyT) $ A $ B, ts)
   811   | dependent_tr' _ _ = raise Match;
   812 \end{ttbox}
   813 The argument {\tt (q,r)} is supplied to the curried function {\tt
   814   dependent_tr'} by a partial application during its installation.  We
   815 can set up print translations for both {\tt Pi} and {\tt Sigma} by
   816 including
   817 \begin{ttbox}\index{*ML section}
   818 val print_translation =
   819   [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
   820    ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
   821 \end{ttbox}
   822 within the {\tt ML} section.  The first of these transforms ${\tt Pi}(A,
   823 \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
   824 $\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not depend
   825 on~$x$.  It checks this using \ttindex{loose_bnos}, yet another function
   826 from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$ renamed away
   827 from all names in $B$, and $B'$ is the body $B$ with {\tt Bound} nodes
   828 referring to the {\tt Abs} node replaced by $\ttfct{Free} (x',
   829 \mtt{dummyT})$.
   830 
   831 We must be careful with types here.  While types of {\tt Const}s are
   832 ignored, type constraints may be printed for some {\tt Free}s and
   833 {\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
   834 \ttindex{dummyT} are never printed with constraint, though.  The line
   835 \begin{ttbox}
   836         let val (x', B') = variant_abs (x, dummyT, B);
   837 \end{ttbox}\index{*variant_abs}
   838 replaces bound variable occurrences in~$B$ by the free variable $x'$ with
   839 type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
   840 correct type~{\tt T}, so this is the only place where a type
   841 constraint might appear.
   842 \index{translations|)}
   843 \index{syntax!transformations|)}