doc-src/Ref/syntax.tex
author wenzelm
Mon, 28 Aug 2000 13:52:38 +0200
changeset 9695 ec7d7f877712
parent 9350 d855d9d1add9
child 14893 55e83c32cdec
permissions -rw-r--r--
proper setup of iman.sty/extra.sty/ttbox.sty;
     1 %% $Id$
     2 \chapter{Syntax Transformations} \label{chap:syntax}
     3 \newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
     4 \newcommand\mtt[1]{\mbox{\tt #1}}
     5 \newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
     6 \newcommand\Constant{\ttfct{Constant}}
     7 \newcommand\Variable{\ttfct{Variable}}
     8 \newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
     9 \index{syntax!transformations|(}
    10 
    11 This chapter is intended for experienced Isabelle users who need to define
    12 macros or code their own translation functions.  It describes the
    13 transformations between parse trees, abstract syntax trees and terms.
    14 
    15 
    16 \section{Abstract syntax trees} \label{sec:asts}
    17 \index{ASTs|(}
    18 
    19 The parser, given a token list from the lexer, applies productions to yield
    20 a parse tree\index{parse trees}.  By applying some internal transformations
    21 the parse tree becomes an abstract syntax tree, or \AST{}.  Macro
    22 expansion, further translations and finally type inference yields a
    23 well-typed term.  The printing process is the reverse, except for some
    24 subtleties to be discussed later.
    25 
    26 Figure~\ref{fig:parse_print} outlines the parsing and printing process.
    27 Much of the complexity is due to the macro mechanism.  Using macros, you
    28 can specify most forms of concrete syntax without writing any \ML{} code.
    29 
    30 \begin{figure}
    31 \begin{center}
    32 \begin{tabular}{cl}
    33 string          & \\
    34 $\downarrow$    & lexer, parser \\
    35 parse tree      & \\
    36 $\downarrow$    & parse \AST{} translation \\
    37 \AST{}             & \\
    38 $\downarrow$    & \AST{} rewriting (macros) \\
    39 \AST{}             & \\
    40 $\downarrow$    & parse translation, type inference \\
    41 --- well-typed term --- & \\
    42 $\downarrow$    & print translation \\
    43 \AST{}             & \\
    44 $\downarrow$    & \AST{} rewriting (macros) \\
    45 \AST{}             & \\
    46 $\downarrow$    & print \AST{} translation, token translation \\
    47 string          &
    48 \end{tabular}
    49 
    50 \end{center}
    51 \caption{Parsing and printing}\label{fig:parse_print}
    52 \end{figure}
    53 
    54 Abstract syntax trees are an intermediate form between the raw parse trees
    55 and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or
    56 variable) or a list of {\em at least two\/} subtrees.  Internally, they
    57 have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
    58 \index{*Appl}
    59 \begin{ttbox}
    60 datatype ast = Constant of string
    61              | Variable of string
    62              | Appl of ast list
    63 \end{ttbox}
    64 %
    65 Isabelle uses an S-expression syntax for abstract syntax trees.  Constant
    66 atoms are shown as quoted strings, variable atoms as non-quoted strings and
    67 applications as a parenthesised list of subtrees.  For example, the \AST
    68 \begin{ttbox}
    69 Appl [Constant "_constrain",
    70       Appl [Constant "_abs", Variable "x", Variable "t"],
    71       Appl [Constant "fun", Variable "'a", Variable "'b"]]
    72 \end{ttbox}
    73 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
    74 Both {\tt ()} and {\tt (f)} are illegal because they have too few
    75 subtrees.
    76 
    77 The resemblance to Lisp's S-expressions is intentional, but there are two
    78 kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the
    79 names {\tt Constant} and {\tt Variable} too literally; in the later
    80 translation to terms, $\Variable x$ may become a constant, free or bound
    81 variable, even a type constructor or class name; the actual outcome depends
    82 on the context.
    83 
    84 Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
    85 application of~$f$ to the arguments $x@1, \ldots, x@n$.  But the kind of
    86 application is determined later by context; it could be a type constructor
    87 applied to types.
    88 
    89 Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
    90 first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later
    91 at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
    92 occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
    93 constructor \ttindex{Bound}).
    94 
    95 
    96 \section{Transforming parse trees to ASTs}\label{sec:astofpt}
    97 \index{ASTs!made from parse trees}
    98 \newcommand\astofpt[1]{\lbrakk#1\rbrakk}
    99 
   100 The parse tree is the raw output of the parser.  Translation functions,
   101 called {\bf parse AST translations}\indexbold{translations!parse AST},
   102 transform the parse tree into an abstract syntax tree.
   103 
   104 The parse tree is constructed by nesting the right-hand sides of the
   105 productions used to recognize the input.  Such parse trees are simply lists
   106 of tokens and constituent parse trees, the latter representing the
   107 nonterminals of the productions.  Let us refer to the actual productions in
   108 the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
   109 example).
   110 
   111 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
   112 by stripping out delimiters and copy productions.  More precisely, the
   113 mapping $\astofpt{-}$ is derived from the productions as follows:
   114 \begin{itemize}
   115 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
   116   \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{xnum} or \ndx{xstr} token, and $s$
   117   its associated string.  Note that for {\tt xstr} this does not include the
   118   quotes.
   119 
   120 \item Copy productions:\index{productions!copy}
   121   $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
   122   strings of delimiters, which are discarded.  $P$ stands for the single
   123   constituent that is not a delimiter; it is either a nonterminal symbol or
   124   a name token.
   125 
   126   \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
   127     Here there are no constituents other than delimiters, which are
   128     discarded.
   129 
   130   \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
   131     the remaining constituents $P@1$, \ldots, $P@n$ are built into an
   132     application whose head constant is~$c$:
   133     \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
   134        \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
   135     \]
   136 \end{itemize}
   137 Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
   138 {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
   139 These examples illustrate the need for further translations to make \AST{}s
   140 closer to the typed $\lambda$-calculus.  The Pure syntax provides
   141 predefined parse \AST{} translations\index{translations!parse AST} for
   142 ordinary applications, type applications, nested abstractions, meta
   143 implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
   144 effect on some representative input strings.
   145 
   146 
   147 \begin{figure}
   148 \begin{center}
   149 \tt\begin{tabular}{ll}
   150 \rm input string    & \rm \AST \\\hline
   151 "f"                 & f \\
   152 "'a"                & 'a \\
   153 "t == u"            & ("==" t u) \\
   154 "f(x)"              & ("_appl" f x) \\
   155 "f(x, y)"           & ("_appl" f ("_args" x y)) \\
   156 "f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
   157 "\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
   158 \end{tabular}
   159 \end{center}
   160 \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
   161 \end{figure}
   162 
   163 \begin{figure}
   164 \begin{center}
   165 \tt\begin{tabular}{ll}
   166 \rm input string            & \rm \AST{} \\\hline
   167 "f(x, y, z)"                & (f x y z) \\
   168 "'a ty"                     & (ty 'a) \\
   169 "('a, 'b) ty"               & (ty 'a 'b) \\
   170 "\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
   171 "\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
   172 "[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
   173 "['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
   174 \end{tabular}
   175 \end{center}
   176 \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
   177 \end{figure}
   178 
   179 The names of constant heads in the \AST{} control the translation process.
   180 The list of constants invoking parse \AST{} translations appears in the
   181 output of {\tt print_syntax} under {\tt parse_ast_translation}.
   182 
   183 
   184 \section{Transforming ASTs to terms}\label{sec:termofast}
   185 \index{terms!made from ASTs}
   186 \newcommand\termofast[1]{\lbrakk#1\rbrakk}
   187 
   188 The \AST{}, after application of macros (see \S\ref{sec:macros}), is
   189 transformed into a term.  This term is probably ill-typed since type
   190 inference has not occurred yet.  The term may contain type constraints
   191 consisting of applications with head {\tt "_constrain"}; the second
   192 argument is a type encoded as a term.  Type inference later introduces
   193 correct types or rejects the input.
   194 
   195 Another set of translation functions, namely parse
   196 translations\index{translations!parse}, may affect this process.  If we
   197 ignore parse translations for the time being, then \AST{}s are transformed
   198 to terms by mapping \AST{} constants to constants, \AST{} variables to
   199 schematic or free variables and \AST{} applications to applications.
   200 
   201 More precisely, the mapping $\termofast{-}$ is defined by
   202 \begin{itemize}
   203 \item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
   204   \mtt{dummyT})$.
   205 
   206 \item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
   207   \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
   208   the index extracted from~$xi$.
   209 
   210 \item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
   211   \mtt{dummyT})$.
   212 
   213 \item Function applications with $n$ arguments:
   214     \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
   215        \termofast{f} \ttapp
   216          \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
   217     \]
   218 \end{itemize}
   219 Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
   220 \verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
   221 while \ttindex{dummyT} stands for some dummy type that is ignored during
   222 type inference.
   223 
   224 So far the outcome is still a first-order term.  Abstractions and bound
   225 variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
   226 by parse translations.  Such translations are attached to {\tt "_abs"},
   227 {\tt "!!"} and user-defined binders.
   228 
   229 
   230 \section{Printing of terms}
   231 \newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
   232 
   233 The output phase is essentially the inverse of the input phase.  Terms are
   234 translated via abstract syntax trees into strings.  Finally the strings are
   235 pretty printed.
   236 
   237 Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
   238 terms into \AST{}s.  Ignoring those, the transformation maps
   239 term constants, variables and applications to the corresponding constructs
   240 on \AST{}s.  Abstractions are mapped to applications of the special
   241 constant {\tt _abs}.
   242 
   243 More precisely, the mapping $\astofterm{-}$ is defined as follows:
   244 \begin{itemize}
   245   \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
   246 
   247   \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
   248     \tau)$.
   249 
   250   \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
   251     \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
   252     the {\tt indexname} $(x, i)$.
   253 
   254   \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
   255     of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
   256     be obtained from~$t$ by replacing all bound occurrences of~$x$ by
   257     the free variable $x'$.  This replaces corresponding occurrences of the
   258     constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
   259     \mtt{dummyT})$:
   260    \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
   261       \Appl{\Constant \mtt{"_abs"},
   262         constrain(\Variable x', \tau), \astofterm{t'}}
   263     \]
   264 
   265   \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
   266     The occurrence of constructor \ttindex{Bound} should never happen
   267     when printing well-typed terms; it indicates a de Bruijn index with no
   268     matching abstraction.
   269 
   270   \item Where $f$ is not an application,
   271     \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
   272        \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
   273     \]
   274 \end{itemize}
   275 %
   276 Type constraints\index{type constraints} are inserted to allow the printing
   277 of types.  This is governed by the boolean variable \ttindex{show_types}:
   278 \begin{itemize}
   279   \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
   280     \ttindex{show_types} is set to {\tt false}.
   281 
   282   \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
   283          \astofterm{\tau}}$ \ otherwise.
   284 
   285     Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
   286     constructors go to {\tt Constant}s; type identifiers go to {\tt
   287       Variable}s; type applications go to {\tt Appl}s with the type
   288     constructor as the first element.  If \ttindex{show_sorts} is set to
   289     {\tt true}, some type variables are decorated with an \AST{} encoding
   290     of their sort.
   291 \end{itemize}
   292 %
   293 The \AST{}, after application of macros (see \S\ref{sec:macros}), is
   294 transformed into the final output string.  The built-in {\bf print AST
   295   translations}\indexbold{translations!print AST} reverse the
   296 parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
   297 
   298 For the actual printing process, the names attached to productions
   299 of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
   300 a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
   301 $(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
   302 for~$c$.  Each argument~$x@i$ is converted to a string, and put in
   303 parentheses if its priority~$(p@i)$ requires this.  The resulting strings
   304 and their syntactic sugar (denoted by \dots{} above) are joined to make a
   305 single string.
   306 
   307 If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments
   308 than the corresponding production, it is first split into
   309 $((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$.  Applications
   310 with too few arguments or with non-constant head or without a
   311 corresponding production are printed as $f(x@1, \ldots, x@l)$ or
   312 $(\alpha@1, \ldots, \alpha@l) ty$.  Multiple productions associated
   313 with some name $c$ are tried in order of appearance.  An occurrence of
   314 $\Variable x$ is simply printed as~$x$.
   315 
   316 Blanks are {\em not\/} inserted automatically.  If blanks are required to
   317 separate tokens, specify them in the mixfix declaration, possibly preceded
   318 by a slash~({\tt/}) to allow a line break.
   319 \index{ASTs|)}
   320 
   321 
   322 
   323 \section{Macros: syntactic rewriting} \label{sec:macros}
   324 \index{macros|(}\index{rewriting!syntactic|(}
   325 
   326 Mixfix declarations alone can handle situations where there is a direct
   327 connection between the concrete syntax and the underlying term.  Sometimes
   328 we require a more elaborate concrete syntax, such as quantifiers and list
   329 notation.  Isabelle's {\bf macros} and {\bf translation functions} can
   330 perform translations such as
   331 \begin{center}\tt
   332   \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
   333     ALL x:A.P   & Ball(A, \%x.P)        \\ \relax
   334     [x, y, z]   & Cons(x, Cons(y, Cons(z, Nil)))
   335   \end{tabular}
   336 \end{center}
   337 Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
   338 are the most powerful translation mechanism but are difficult to read or
   339 write.  Macros are specified by first-order rewriting systems that operate
   340 on abstract syntax trees.  They are usually easy to read and write, and can
   341 express all but the most obscure translations.
   342 
   343 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and
   344 set theory.\footnote{This and the following theories are complete
   345   working examples, though they specify only syntax, no axioms.  The
   346   file {\tt ZF/ZF.thy} presents a full set theory definition,
   347   including many macro rules.} Theory {\tt SetSyntax} declares
   348 constants for set comprehension ({\tt Collect}), replacement ({\tt
   349   Replace}) and bounded universal quantification ({\tt Ball}).  Each
   350 of these binds some variables.  Without additional syntax we should
   351 have to write $\forall x \in A.  P$ as {\tt Ball(A,\%x.P)}, and
   352 similarly for the others.
   353 
   354 \begin{figure}
   355 \begin{ttbox}
   356 SetSyntax = Pure +
   357 types
   358   i o
   359 arities
   360   i, o :: logic
   361 consts
   362   Trueprop      :: o => prop                ("_" 5)
   363   Collect       :: [i, i => o] => i
   364   Replace       :: [i, [i, i] => o] => i
   365   Ball          :: [i, i => o] => o
   366 syntax
   367   "{\at}Collect"    :: [idt, i, o] => i         ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
   368   "{\at}Replace"    :: [idt, idt, i, o] => i    ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
   369   "{\at}Ball"       :: [idt, i, o] => o         ("(3ALL _:_./ _)" 10)
   370 translations
   371   "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
   372   "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
   373   "ALL x:A. P"  == "Ball(A, \%x. P)"
   374 end
   375 \end{ttbox}
   376 \caption{Macro example: set theory}\label{fig:set_trans}
   377 \end{figure}
   378 
   379 The theory specifies a variable-binding syntax through additional productions
   380 that have mixfix declarations.  Each non-copy production must specify some
   381 constant, which is used for building \AST{}s. \index{constants!syntactic} The
   382 additional constants are decorated with {\tt\at} to stress their purely
   383 syntactic purpose; they may not occur within the final well-typed terms,
   384 being declared as {\tt syntax} rather than {\tt consts}.
   385 
   386 The translations cause the replacement of external forms by internal forms
   387 after parsing, and vice versa before printing of terms.  As a specification
   388 of the set theory notation, they should be largely self-explanatory.  The
   389 syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
   390 appear implicitly in the macro rules via their mixfix forms.
   391 
   392 Macros can define variable-binding syntax because they operate on \AST{}s,
   393 which have no inbuilt notion of bound variable.  The macro variables {\tt
   394   x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
   395 in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
   396 range over formulae containing bound variable occurrences.
   397 
   398 Other applications of the macro system can be less straightforward, and
   399 there are peculiarities.  The rest of this section will describe in detail
   400 how Isabelle macros are preprocessed and applied.
   401 
   402 
   403 \subsection{Specifying macros}
   404 Macros are basically rewrite rules on \AST{}s.  But unlike other macro
   405 systems found in programming languages, Isabelle's macros work in both
   406 directions.  Therefore a syntax contains two lists of rewrites: one for
   407 parsing and one for printing.
   408 
   409 \index{*translations section}
   410 The {\tt translations} section specifies macros.  The syntax for a macro is
   411 \[ (root)\; string \quad
   412    \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
   413    \right\} \quad
   414    (root)\; string
   415 \]
   416 %
   417 This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
   418 ({\tt ==}).  The two strings specify the left and right-hand sides of the
   419 macro rule.  The $(root)$ specification is optional; it specifies the
   420 nonterminal for parsing the $string$ and if omitted defaults to {\tt
   421   logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
   422 \begin{itemize}
   423 \item Rules must be left linear: $l$ must not contain repeated variables.
   424 
   425 \item Every variable in~$r$ must also occur in~$l$.
   426 \end{itemize}
   427 
   428 Macro rules may refer to any syntax from the parent theories.  They
   429 may also refer to anything defined before the current {\tt
   430   translations} section --- including any mixfix declarations.
   431 
   432 Upon declaration, both sides of the macro rule undergo parsing and parse
   433 \AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
   434 macro expansion.  The lexer runs in a different mode that additionally
   435 accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
   436 {\tt _K}).  Thus, a constant whose name starts with an underscore can
   437 appear in macro rules but not in ordinary terms.
   438 
   439 Some atoms of the macro rule's \AST{} are designated as constants for
   440 matching.  These are all names that have been declared as classes, types or
   441 constants (logical and syntactic).
   442 
   443 The result of this preprocessing is two lists of macro rules, each
   444 stored as a pair of \AST{}s.  They can be viewed using {\tt
   445   print_syntax} (sections \ttindex{parse_rules} and
   446 \ttindex{print_rules}).  For theory~{\tt SetSyntax} of
   447 Fig.~\ref{fig:set_trans} these are
   448 \begin{ttbox}
   449 parse_rules:
   450   ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
   451   ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
   452   ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
   453 print_rules:
   454   ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
   455   ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
   456   ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
   457 \end{ttbox}
   458 
   459 \begin{warn}
   460   Avoid choosing variable names that have previously been used as
   461   constants, types or type classes; the {\tt consts} section in the output
   462   of {\tt print_syntax} lists all such names.  If a macro rule works
   463   incorrectly, inspect its internal form as shown above, recalling that
   464   constants appear as quoted strings and variables without quotes.
   465 \end{warn}
   466 
   467 \begin{warn}
   468 If \ttindex{eta_contract} is set to {\tt true}, terms will be
   469 $\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
   470 abstraction nodes needed for print rules to match may vanish.  For example,
   471 \verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
   472 not apply and the output will be {\tt Ball(A, P)}.  This problem would not
   473 occur if \ML{} translation functions were used instead of macros (as is
   474 done for binder declarations).
   475 \end{warn}
   476 
   477 
   478 \begin{warn}
   479 Another trap concerns type constraints.  If \ttindex{show_types} is set to
   480 {\tt true}, bound variables will be decorated by their meta types at the
   481 binding place (but not at occurrences in the body).  Matching with
   482 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
   483 "i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
   484 appear in the external form, say \verb|{y::i:A::i. P::o}|.
   485 
   486 To allow such constraints to be re-read, your syntax should specify bound
   487 variables using the nonterminal~\ndx{idt}.  This is the case in our
   488 example.  Choosing {\tt id} instead of {\tt idt} is a common error.
   489 \end{warn}
   490 
   491 
   492 
   493 \subsection{Applying rules}
   494 As a term is being parsed or printed, an \AST{} is generated as an
   495 intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
   496 normalised by applying macro rules in the manner of a traditional term
   497 rewriting system.  We first examine how a single rule is applied.
   498 
   499 Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some
   500 translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
   501 instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
   502 matched by $l$ may be replaced by the corresponding instance of~$r$, thus
   503 {\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
   504   place-holders} that may occur in rule patterns but not in ordinary
   505 \AST{}s; {\tt Variable} atoms serve this purpose.
   506 
   507 The matching of the object~$u$ by the pattern~$l$ is performed as follows:
   508 \begin{itemize}
   509   \item Every constant matches itself.
   510 
   511   \item $\Variable x$ in the object matches $\Constant x$ in the pattern.
   512     This point is discussed further below.
   513 
   514   \item Every \AST{} in the object matches $\Variable x$ in the pattern,
   515     binding~$x$ to~$u$.
   516 
   517   \item One application matches another if they have the same number of
   518     subtrees and corresponding subtrees match.
   519 
   520   \item In every other case, matching fails.  In particular, {\tt
   521       Constant}~$x$ can only match itself.
   522 \end{itemize}
   523 A successful match yields a substitution that is applied to~$r$, generating
   524 the instance that replaces~$u$.
   525 
   526 The second case above may look odd.  This is where {\tt Variable}s of
   527 non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
   528 far removed from parse trees; at this level it is not yet known which
   529 identifiers will become constants, bounds, frees, types or classes.  As
   530 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
   531 {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},
   532 \ndx{tvar}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s.  On the other
   533 hand, when \AST{}s generated from terms for printing, all constants and type
   534 constructors become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may
   535 contain a messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
   536 insignificant at macro level because matching treats them alike.
   537 
   538 Because of this behaviour, different kinds of atoms with the same name are
   539 indistinguishable, which may make some rules prone to misbehaviour.  Example:
   540 \begin{ttbox}
   541 types
   542   Nil
   543 consts
   544   Nil     :: 'a list
   545 syntax
   546   "[]"    :: 'a list    ("[]")
   547 translations
   548   "[]"    == "Nil"
   549 \end{ttbox}
   550 The term {\tt Nil} will be printed as {\tt []}, just as expected.
   551 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
   552 expected!  Guess how type~{\tt Nil} is printed?
   553 
   554 Normalizing an \AST{} involves repeatedly applying macro rules until
   555 none are applicable.  Macro rules are chosen in order of appearance in
   556 the theory definitions.  You can watch the normalization of \AST{}s
   557 during parsing and printing by setting \ttindex{Syntax.trace_ast}
   558 to {\tt true}.\index{tracing!of macros} Alternatively, use
   559 \ttindex{Syntax.test_read}.  The information displayed when tracing
   560 includes the \AST{} before normalization ({\tt pre}), redexes with
   561 results ({\tt rewrote}), the normal form finally reached ({\tt post})
   562 and some statistics ({\tt normalize}).
   563 
   564 \subsection{Example: the syntax of finite sets}
   565 \index{examples!of macros}
   566 
   567 This example demonstrates the use of recursive macros to implement a
   568 convenient notation for finite sets.
   569 \index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}
   570 \index{"@Enum@{\tt\at Enum} constant}
   571 \index{"@Finset@{\tt\at Finset} constant}
   572 \begin{ttbox}
   573 FinSyntax = SetSyntax +
   574 types
   575   is
   576 syntax
   577   ""            :: i => is                  ("_")
   578   "{\at}Enum"       :: [i, is] => is            ("_,/ _")
   579 consts
   580   empty         :: i                        ("{\ttlbrace}{\ttrbrace}")
   581   insert        :: [i, i] => i
   582 syntax
   583   "{\at}Finset"     :: is => i                  ("{\ttlbrace}(_){\ttrbrace}")
   584 translations
   585   "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
   586   "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
   587 end
   588 \end{ttbox}
   589 Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
   590 declarations above specify \verb|{x, y, z}| as the external representation
   591 of
   592 \begin{ttbox}
   593 insert(x, insert(y, insert(z, empty)))
   594 \end{ttbox}
   595 The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
   596   i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
   597 allows a line break after the comma for \rmindex{pretty printing}; if no
   598 line break is required then a space is printed instead.
   599 
   600 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
   601 declaration.  Hence {\tt is} is not a logical type and may be used safely as
   602 a new nonterminal for custom syntax.  The nonterminal~{\tt is} can later be
   603 re-used for other enumerations of type~{\tt i} like lists or tuples.  If we
   604 had needed polymorphic enumerations, we could have used the predefined
   605 nonterminal symbol \ndx{args} and skipped this part altogether.
   606 
   607 \index{"@Finset@{\tt\at Finset} constant}
   608 Next follows {\tt empty}, which is already equipped with its syntax
   609 \verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
   610 constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
   611   i} enclosed in curly braces.  Remember that a pair of parentheses, as in
   612 \verb|"{(_)}"|, specifies a block of indentation for pretty printing.
   613 
   614 The translations may look strange at first.  Macro rules are best
   615 understood in their internal forms:
   616 \begin{ttbox}
   617 parse_rules:
   618   ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
   619   ("{\at}Finset" x)  ->  ("insert" x "empty")
   620 print_rules:
   621   ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
   622   ("insert" x "empty")  ->  ("{\at}Finset" x)
   623 \end{ttbox}
   624 This shows that \verb|{x,xs}| indeed matches any set enumeration of at least
   625 two elements, binding the first to {\tt x} and the rest to {\tt xs}.
   626 Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.
   627 The parse rules only work in the order given.
   628 
   629 \begin{warn}
   630   The \AST{} rewriter cannot distinguish constants from variables and looks
   631   only for names of atoms.  Thus the names of {\tt Constant}s occurring in
   632   the (internal) left-hand side of translation rules should be regarded as
   633   \rmindex{reserved words}.  Choose non-identifiers like {\tt\at Finset} or
   634   sufficiently long and strange names.  If a bound variable's name gets
   635   rewritten, the result will be incorrect; for example, the term
   636 \begin{ttbox}
   637 \%empty insert. insert(x, empty)
   638 \end{ttbox}
   639 \par\noindent is incorrectly printed as \verb|%empty insert. {x}|.
   640 \end{warn}
   641 
   642 
   643 \subsection{Example: a parse macro for dependent types}\label{prod_trans}
   644 \index{examples!of macros}
   645 
   646 As stated earlier, a macro rule may not introduce new {\tt Variable}s on
   647 the right-hand side.  Something like \verb|"K(B)" => "%x.B"| is illegal;
   648 if allowed, it could cause variable capture.  In such cases you usually
   649 must fall back on translation functions.  But a trick can make things
   650 readable in some cases: {\em calling\/} translation functions by parse
   651 macros:
   652 \begin{ttbox}
   653 ProdSyntax = SetSyntax +
   654 consts
   655   Pi            :: [i, i => i] => i
   656 syntax
   657   "{\at}PROD"       :: [idt, i, i] => i       ("(3PROD _:_./ _)" 10)
   658   "{\at}->"         :: [i, i] => i            ("(_ ->/ _)" [51, 50] 50)
   659 \ttbreak
   660 translations
   661   "PROD x:A. B" => "Pi(A, \%x. B)"
   662   "A -> B"      => "Pi(A, _K(B))"
   663 end
   664 ML
   665   val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
   666 \end{ttbox}
   667 
   668 Here {\tt Pi} is a logical constant for constructing general products.
   669 Two external forms exist: the general case {\tt PROD x:A.B} and the
   670 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
   671 does not depend on~{\tt x}.
   672 
   673 The second parse macro introduces {\tt _K(B)}, which later becomes
   674 \verb|%x.B| due to a parse translation associated with \cdx{_K}.
   675 Unfortunately there is no such trick for printing, so we have to add a {\tt
   676 ML} section for the print translation \ttindex{dependent_tr'}.
   677 
   678 Recall that identifiers with a leading {\tt _} are allowed in translation
   679 rules, but not in ordinary terms.  Thus we can create \AST{}s containing
   680 names that are not directly expressible.
   681 
   682 The parse translation for {\tt _K} is already installed in Pure, and the
   683 function {\tt dependent_tr'} is exported by the syntax module for public use.
   684 See \S\ref{sec:tr_funs} below for more of the arcane lore of translation
   685 functions.  \index{macros|)}\index{rewriting!syntactic|)}
   686 
   687 
   688 \section{Translation functions} \label{sec:tr_funs}
   689 \index{translations|(}
   690 %
   691 This section describes the translation function mechanism.  By writing \ML{}
   692 functions, you can do almost everything to terms or \AST{}s during parsing and
   693 printing.  The logic LK is a good example of sophisticated transformations
   694 between internal and external representations of sequents; here, macros would
   695 be useless.
   696 
   697 A full understanding of translations requires some familiarity
   698 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
   699 {\tt Syntax.ast} and the encodings of types and terms as such at the various
   700 stages of the parsing or printing process.  Most users should never need to
   701 use translation functions.
   702 
   703 \subsection{Declaring translation functions}
   704 There are four kinds of translation functions, with one of these
   705 coming in two variants.  Each such function is associated with a name,
   706 which triggers calls to it.  Such names can be constants (logical or
   707 syntactic) or type constructors.
   708 
   709 Function {\tt print_syntax} displays the sets of names associated with the
   710 translation functions of a theory under \texttt{parse_ast_translation}, etc.
   711 You can add new ones via the {\tt ML} section\index{*ML section} of a theory
   712 definition file.  Even though the {\tt ML} section is the very last part of
   713 the file, newly installed translation functions are already effective when
   714 processing all of the preceding sections.
   715 
   716 The {\tt ML} section's contents are simply copied verbatim near the
   717 beginning of the \ML\ file generated from a theory definition file.
   718 Definitions made here are accessible as components of an \ML\ 
   719 structure; to make some parts private, use an \ML{} {\tt local}
   720 declaration.  The {\ML} code may install translation functions by
   721 declaring any of the following identifiers:
   722 \begin{ttbox}
   723 val parse_ast_translation   : (string * (ast list -> ast)) list
   724 val print_ast_translation   : (string * (ast list -> ast)) list
   725 val parse_translation       : (string * (term list -> term)) list
   726 val print_translation       : (string * (term list -> term)) list
   727 val typed_print_translation :
   728     (string * (bool -> typ -> term list -> term)) list
   729 \end{ttbox}
   730 
   731 \subsection{The translation strategy}
   732 The different kinds of translation functions are called during the
   733 transformations between parse trees, \AST{}s and terms (recall
   734 Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
   735 $(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation
   736 function $f$ of appropriate kind exists for $c$, the result is
   737 computed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
   738 
   739 For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.
   740 A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1,
   741   \ldots, x@n}$.  For term translations, the arguments are terms and a
   742 combination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const}
   743 (c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more
   744 sophisticated transformations than \AST{}s do, typically involving
   745 abstractions and bound variables. {\em Typed} print translations may
   746 even peek at the type $\tau$ of the constant they are invoked on; they
   747 are also passed the current value of the \ttindex{show_sorts} flag.
   748 
   749 Regardless of whether they act on terms or \AST{}s, translation
   750 functions called during the parsing process differ from those for
   751 printing more fundamentally in their overall behaviour:
   752 \begin{description}
   753 \item[Parse translations] are applied bottom-up.  The arguments are already in
   754   translated form.  The translations must not fail; exceptions trigger an
   755   error message.  There may never be more than one function associated with
   756   any syntactic name.
   757   
   758 \item[Print translations] are applied top-down.  They are supplied with
   759   arguments that are partly still in internal form.  The result again
   760   undergoes translation; therefore a print translation should not introduce as
   761   head the very constant that invoked it.  The function may raise exception
   762   \xdx{Match} to indicate failure; in this event it has no effect.  Multiple
   763   functions associated with some syntactic name are tried in an unspecified
   764   order.
   765 \end{description}
   766 
   767 Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
   768 \ttindex{Const} for terms --- can invoke translation functions.  This
   769 causes another difference between parsing and printing.
   770 
   771 Parsing starts with a string and the constants are not yet identified.
   772 Only parse tree heads create {\tt Constant}s in the resulting \AST, as
   773 described in \S\ref{sec:astofpt}.  Macros and parse \AST{} translations may
   774 introduce further {\tt Constant}s.  When the final \AST{} is converted to a
   775 term, all {\tt Constant}s become {\tt Const}s, as described in
   776 \S\ref{sec:termofast}.
   777 
   778 Printing starts with a well-typed term and all the constants are known.  So
   779 all logical constants and type constructors may invoke print translations.
   780 These, and macros, may introduce further constants.
   781 
   782 
   783 \subsection{Example: a print translation for dependent types}
   784 \index{examples!of translations}\indexbold{*dependent_tr'}
   785 
   786 Let us continue the dependent type example (page~\pageref{prod_trans}) by
   787 examining the parse translation for~\cdx{_K} and the print translation
   788 {\tt dependent_tr'}, which are both built-in.  By convention, parse
   789 translations have names ending with {\tt _tr} and print translations have
   790 names ending with {\tt _tr'}.  Search for such names in the Isabelle
   791 sources to locate more examples.
   792 
   793 Here is the parse translation for {\tt _K}:
   794 \begin{ttbox}
   795 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
   796   | k_tr ts = raise TERM ("k_tr", ts);
   797 \end{ttbox}
   798 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
   799 {\tt Abs} node with a body derived from $t$.  Since terms given to parse
   800 translations are not yet typed, the type of the bound variable in the new
   801 {\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
   802 nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
   803 a basic term manipulation function defined in {\tt Pure/term.ML}.
   804 
   805 Here is the print translation for dependent types:
   806 \begin{ttbox}
   807 fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
   808       if 0 mem (loose_bnos B) then
   809         let val (x', B') = Syntax.variant_abs' (x, dummyT, B) in
   810           list_comb
   811             (Const (q,dummyT) $
   812              Syntax.mark_boundT (x',{\thinspace}T) $ A $ B', ts)
   813         end
   814       else list_comb (Const (r, dummyT) $ A $ B, ts)
   815   | dependent_tr' _ _ = raise Match;
   816 \end{ttbox}
   817 The argument {\tt (q,{\thinspace}r)} is supplied to the curried function {\tt
   818   dependent_tr'} by a partial application during its installation.
   819 For example, we could set up print translations for both {\tt Pi} and
   820 {\tt Sigma} by including
   821 \begin{ttbox}\index{*ML section}
   822 val print_translation =
   823   [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
   824    ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
   825 \end{ttbox}
   826 within the {\tt ML} section.  The first of these transforms ${\tt
   827   Pi}(A, \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
   828 $\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not
   829 depend on~$x$.  It checks this using \ttindex{loose_bnos}, yet another
   830 function from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$
   831 renamed away from all names in $B$, and $B'$ is the body $B$ with {\tt
   832   Bound} nodes referring to the {\tt Abs} node replaced by
   833 $\ttfct{Free} (x', \mtt{dummyT})$ (but marked as representing a bound
   834 variable).
   835 
   836 We must be careful with types here.  While types of {\tt Const}s are
   837 ignored, type constraints may be printed for some {\tt Free}s and
   838 {\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
   839 \ttindex{dummyT} are never printed with constraint, though.  The line
   840 \begin{ttbox}
   841         let val (x', B') = Syntax.variant_abs' (x, dummyT, B);
   842 \end{ttbox}\index{*Syntax.variant_abs'}
   843 replaces bound variable occurrences in~$B$ by the free variable $x'$ with
   844 type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
   845 correct type~{\tt T}, so this is the only place where a type
   846 constraint might appear.
   847 
   848 Also note that we are responsible to mark free identifiers that
   849 actually represent bound variables.  This is achieved by
   850 \ttindex{Syntax.variant_abs'} and \ttindex{Syntax.mark_boundT} above.
   851 Failing to do so may cause these names to be printed in the wrong
   852 style.  \index{translations|)} \index{syntax!transformations|)}
   853 
   854 
   855 \section{Token translations} \label{sec:tok_tr}
   856 \index{token translations|(}
   857 %
   858 Isabelle's meta-logic features quite a lot of different kinds of
   859 identifiers, namely {\em class}, {\em tfree}, {\em tvar}, {\em free},
   860 {\em bound}, {\em var}.  One might want to have these printed in
   861 different styles, e.g.\ in bold or italic, or even transcribed into
   862 something more readable like $\alpha, \alpha', \beta$ instead of {\tt
   863   'a}, {\tt 'aa}, {\tt 'b} for type variables.  Token translations
   864 provide a means to such ends, enabling the user to install certain
   865 \ML{} functions associated with any logical \rmindex{token class} and
   866 depending on some \rmindex{print mode}.
   867 
   868 The logical class of identifiers can not necessarily be determined by
   869 its syntactic category, though.  For example, consider free vs.\ bound
   870 variables.  So Isabelle's pretty printing mechanism, starting from
   871 fully typed terms, has to be careful to preserve this additional
   872 information\footnote{This is done by marking atoms in abstract syntax
   873   trees appropriately.  The marks are actually visible by print
   874   translation functions -- they are just special constants applied to
   875   atomic asts, for example \texttt{("_bound" x)}.}.  In particular,
   876 user-supplied print translation functions operating on terms have to
   877 be well-behaved in this respect.  Free identifiers introduced to
   878 represent bound variables have to be marked appropriately (cf.\ the
   879 example at the end of \S\ref{sec:tr_funs}).
   880 
   881 \medskip Token translations may be installed by declaring the
   882 \ttindex{token_translation} value within the \texttt{ML} section of a theory
   883 definition file:
   884 \begin{ttbox}
   885 val token_translation: 
   886       (string * string * (string -> string * real)) list
   887 \end{ttbox}
   888 The elements of this list are of the form $(m, c, f)$, where $m$ is a print
   889 mode identifier, $c$ a token class, and $f\colon string \to string \times
   890 real$ the actual translation function.  Assuming that $x$ is of identifier
   891 class $c$, and print mode $m$ is the first (active) mode providing some
   892 translation for $c$, then $x$ is output according to $f(x) = (x', len)$.
   893 Thereby $x'$ is the modified identifier name and $len$ its visual length in
   894 terms of characters (e.g.\ length 1.0 would correspond to $1/2$\,em in
   895 \LaTeX).  Thus $x'$ may include non-printing parts like control sequences or
   896 markup information for typesetting systems.
   897 
   898 
   899 \index{token translations|)}
   900 
   901 
   902 %%% Local Variables: 
   903 %%% mode: latex
   904 %%% TeX-master: "ref"
   905 %%% End: