doc-src/Ref/syntax.tex
author lcp
Fri, 29 Jul 1994 13:30:48 +0200
changeset 499 5a54c796b808
parent 332 01b87a921967
child 864 d63b111b917a
permissions -rw-r--r--
deleted repeated "the" in "before the the .thy file"
lcp@323
     1
%% $Id$
lcp@323
     2
\chapter{Syntax Transformations} \label{chap:syntax}
lcp@323
     3
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
lcp@323
     4
\newcommand\mtt[1]{\mbox{\tt #1}}
lcp@323
     5
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
lcp@323
     6
\newcommand\Constant{\ttfct{Constant}}
lcp@323
     7
\newcommand\Variable{\ttfct{Variable}}
lcp@323
     8
\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
lcp@323
     9
\index{syntax!transformations|(}
lcp@323
    10
lcp@323
    11
This chapter is intended for experienced Isabelle users who need to define
lcp@323
    12
macros or code their own translation functions.  It describes the
lcp@323
    13
transformations between parse trees, abstract syntax trees and terms.
lcp@323
    14
lcp@323
    15
lcp@323
    16
\section{Abstract syntax trees} \label{sec:asts}
lcp@323
    17
\index{ASTs|(} 
lcp@323
    18
lcp@323
    19
The parser, given a token list from the lexer, applies productions to yield
lcp@323
    20
a parse tree\index{parse trees}.  By applying some internal transformations
lcp@323
    21
the parse tree becomes an abstract syntax tree, or \AST{}.  Macro
lcp@323
    22
expansion, further translations and finally type inference yields a
lcp@323
    23
well-typed term.  The printing process is the reverse, except for some
lcp@323
    24
subtleties to be discussed later.
lcp@323
    25
lcp@323
    26
Figure~\ref{fig:parse_print} outlines the parsing and printing process.
lcp@323
    27
Much of the complexity is due to the macro mechanism.  Using macros, you
lcp@323
    28
can specify most forms of concrete syntax without writing any \ML{} code.
lcp@323
    29
lcp@323
    30
\begin{figure}
lcp@323
    31
\begin{center}
lcp@323
    32
\begin{tabular}{cl}
lcp@323
    33
string          & \\
lcp@323
    34
$\downarrow$    & parser \\
lcp@323
    35
parse tree      & \\
lcp@323
    36
$\downarrow$    & parse \AST{} translation \\
lcp@323
    37
\AST{}             & \\
lcp@323
    38
$\downarrow$    & \AST{} rewriting (macros) \\
lcp@323
    39
\AST{}             & \\
lcp@323
    40
$\downarrow$    & parse translation, type inference \\
lcp@323
    41
--- well-typed term --- & \\
lcp@323
    42
$\downarrow$    & print translation \\
lcp@323
    43
\AST{}             & \\
lcp@323
    44
$\downarrow$    & \AST{} rewriting (macros) \\
lcp@323
    45
\AST{}             & \\
lcp@323
    46
$\downarrow$    & print \AST{} translation, printer \\
lcp@323
    47
string          &
lcp@323
    48
\end{tabular}
lcp@323
    49
lcp@323
    50
\end{center}
lcp@323
    51
\caption{Parsing and printing}\label{fig:parse_print}
lcp@323
    52
\end{figure}
lcp@323
    53
lcp@323
    54
Abstract syntax trees are an intermediate form between the raw parse trees
lcp@323
    55
and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or
lcp@323
    56
variable) or a list of {\em at least two\/} subtrees.  Internally, they
lcp@323
    57
have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
lcp@323
    58
\index{*Appl}
lcp@323
    59
\begin{ttbox}
lcp@323
    60
datatype ast = Constant of string
lcp@323
    61
             | Variable of string
lcp@323
    62
             | Appl of ast list
lcp@323
    63
\end{ttbox}
lcp@323
    64
%
lcp@323
    65
Isabelle uses an S-expression syntax for abstract syntax trees.  Constant
lcp@323
    66
atoms are shown as quoted strings, variable atoms as non-quoted strings and
lcp@332
    67
applications as a parenthesised list of subtrees.  For example, the \AST
lcp@323
    68
\begin{ttbox}
lcp@323
    69
Appl [Constant "_constrain",
lcp@323
    70
      Appl [Constant "_abs", Variable "x", Variable "t"],
lcp@323
    71
      Appl [Constant "fun", Variable "'a", Variable "'b"]]
lcp@323
    72
\end{ttbox}
lcp@323
    73
is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
lcp@323
    74
Both {\tt ()} and {\tt (f)} are illegal because they have too few
lcp@323
    75
subtrees. 
lcp@323
    76
lcp@323
    77
The resemblance to Lisp's S-expressions is intentional, but there are two
lcp@323
    78
kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the
lcp@323
    79
names {\tt Constant} and {\tt Variable} too literally; in the later
lcp@323
    80
translation to terms, $\Variable x$ may become a constant, free or bound
lcp@323
    81
variable, even a type constructor or class name; the actual outcome depends
lcp@323
    82
on the context.
lcp@323
    83
lcp@323
    84
Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
lcp@323
    85
application of~$f$ to the arguments $x@1, \ldots, x@n$.  But the kind of
lcp@323
    86
application is determined later by context; it could be a type constructor
lcp@323
    87
applied to types.
lcp@323
    88
lcp@323
    89
Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
lcp@323
    90
first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later
lcp@323
    91
at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
lcp@323
    92
occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
lcp@323
    93
constructor \ttindex{Bound}).
lcp@323
    94
lcp@323
    95
lcp@323
    96
\section{Transforming parse trees to \AST{}s}\label{sec:astofpt}
lcp@323
    97
\index{ASTs!made from parse trees}
lcp@323
    98
\newcommand\astofpt[1]{\lbrakk#1\rbrakk}
lcp@323
    99
lcp@323
   100
The parse tree is the raw output of the parser.  Translation functions,
lcp@323
   101
called {\bf parse AST translations}\indexbold{translations!parse AST},
lcp@323
   102
transform the parse tree into an abstract syntax tree.
lcp@323
   103
lcp@323
   104
The parse tree is constructed by nesting the right-hand sides of the
lcp@323
   105
productions used to recognize the input.  Such parse trees are simply lists
lcp@323
   106
of tokens and constituent parse trees, the latter representing the
lcp@323
   107
nonterminals of the productions.  Let us refer to the actual productions in
lcp@323
   108
the form displayed by {\tt Syntax.print_syntax}.
lcp@323
   109
lcp@323
   110
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
lcp@323
   111
by stripping out delimiters and copy productions.  More precisely, the
lcp@323
   112
mapping $\astofpt{-}$ is derived from the productions as follows:
lcp@323
   113
\begin{itemize}
lcp@323
   114
\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
lcp@323
   115
  \ndx{var}, \ndx{tid} or \ndx{tvar} token, and $s$ its associated string.
lcp@323
   116
lcp@323
   117
\item Copy productions:\index{productions!copy}
lcp@323
   118
  $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
lcp@323
   119
  strings of delimiters, which are discarded.  $P$ stands for the single
lcp@323
   120
  constituent that is not a delimiter; it is either a nonterminal symbol or
lcp@323
   121
  a name token.
lcp@323
   122
lcp@323
   123
  \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
lcp@323
   124
    Here there are no constituents other than delimiters, which are
lcp@323
   125
    discarded. 
lcp@323
   126
lcp@323
   127
  \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
lcp@323
   128
    the remaining constituents $P@1$, \ldots, $P@n$ are built into an
lcp@323
   129
    application whose head constant is~$c$:
lcp@323
   130
    \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = 
lcp@323
   131
       \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
lcp@323
   132
    \]
lcp@323
   133
\end{itemize}
lcp@323
   134
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
lcp@323
   135
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
lcp@323
   136
These examples illustrate the need for further translations to make \AST{}s
lcp@323
   137
closer to the typed $\lambda$-calculus.  The Pure syntax provides
lcp@323
   138
predefined parse \AST{} translations\index{translations!parse AST} for
lcp@323
   139
ordinary applications, type applications, nested abstractions, meta
lcp@323
   140
implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
lcp@323
   141
effect on some representative input strings.
lcp@323
   142
lcp@323
   143
lcp@323
   144
\begin{figure}
lcp@323
   145
\begin{center}
lcp@323
   146
\tt\begin{tabular}{ll}
lcp@323
   147
\rm input string    & \rm \AST \\\hline
lcp@323
   148
"f"                 & f \\
lcp@323
   149
"'a"                & 'a \\
lcp@323
   150
"t == u"            & ("==" t u) \\
lcp@323
   151
"f(x)"              & ("_appl" f x) \\
lcp@323
   152
"f(x, y)"           & ("_appl" f ("_args" x y)) \\
lcp@323
   153
"f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
lcp@323
   154
"\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
lcp@323
   155
\end{tabular}
lcp@323
   156
\end{center}
lcp@323
   157
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} 
lcp@323
   158
\end{figure}
lcp@323
   159
lcp@323
   160
\begin{figure}
lcp@323
   161
\begin{center}
lcp@323
   162
\tt\begin{tabular}{ll}
lcp@323
   163
\rm input string            & \rm \AST{} \\\hline
lcp@323
   164
"f(x, y, z)"                & (f x y z) \\
lcp@323
   165
"'a ty"                     & (ty 'a) \\
lcp@323
   166
"('a, 'b) ty"               & (ty 'a 'b) \\
lcp@323
   167
"\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
lcp@323
   168
"\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
lcp@323
   169
"[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
lcp@323
   170
"['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
lcp@323
   171
\end{tabular}
lcp@323
   172
\end{center}
lcp@323
   173
\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
lcp@323
   174
\end{figure}
lcp@323
   175
lcp@323
   176
The names of constant heads in the \AST{} control the translation process.
lcp@323
   177
The list of constants invoking parse \AST{} translations appears in the
lcp@323
   178
output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
lcp@323
   179
lcp@323
   180
lcp@323
   181
\section{Transforming \AST{}s to terms}\label{sec:termofast}
lcp@323
   182
\index{terms!made from ASTs}
lcp@323
   183
\newcommand\termofast[1]{\lbrakk#1\rbrakk}
lcp@323
   184
lcp@323
   185
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
lcp@323
   186
transformed into a term.  This term is probably ill-typed since type
lcp@323
   187
inference has not occurred yet.  The term may contain type constraints
lcp@323
   188
consisting of applications with head {\tt "_constrain"}; the second
lcp@323
   189
argument is a type encoded as a term.  Type inference later introduces
lcp@323
   190
correct types or rejects the input.
lcp@323
   191
lcp@323
   192
Another set of translation functions, namely parse
lcp@323
   193
translations\index{translations!parse}, may affect this process.  If we
lcp@323
   194
ignore parse translations for the time being, then \AST{}s are transformed
lcp@323
   195
to terms by mapping \AST{} constants to constants, \AST{} variables to
lcp@323
   196
schematic or free variables and \AST{} applications to applications.
lcp@323
   197
lcp@323
   198
More precisely, the mapping $\termofast{-}$ is defined by
lcp@323
   199
\begin{itemize}
lcp@323
   200
\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
lcp@323
   201
  \mtt{dummyT})$.
lcp@323
   202
lcp@323
   203
\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
lcp@323
   204
  \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
lcp@323
   205
  the index extracted from~$xi$.
lcp@323
   206
lcp@323
   207
\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
lcp@323
   208
  \mtt{dummyT})$.
lcp@323
   209
lcp@323
   210
\item Function applications with $n$ arguments:
lcp@323
   211
    \[ \termofast{\Appl{f, x@1, \ldots, x@n}} = 
lcp@323
   212
       \termofast{f} \ttapp
lcp@323
   213
         \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
lcp@323
   214
    \]
lcp@323
   215
\end{itemize}
lcp@323
   216
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
lcp@323
   217
\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
lcp@323
   218
while \ttindex{dummyT} stands for some dummy type that is ignored during
lcp@323
   219
type inference.
lcp@323
   220
lcp@323
   221
So far the outcome is still a first-order term.  Abstractions and bound
lcp@323
   222
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
lcp@323
   223
by parse translations.  Such translations are attached to {\tt "_abs"},
lcp@323
   224
{\tt "!!"} and user-defined binders.
lcp@323
   225
lcp@323
   226
lcp@323
   227
\section{Printing of terms}
lcp@323
   228
\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
lcp@323
   229
lcp@323
   230
The output phase is essentially the inverse of the input phase.  Terms are
lcp@323
   231
translated via abstract syntax trees into strings.  Finally the strings are
lcp@323
   232
pretty printed.
lcp@323
   233
lcp@323
   234
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
lcp@323
   235
terms into \AST{}s.  Ignoring those, the transformation maps
lcp@323
   236
term constants, variables and applications to the corresponding constructs
lcp@323
   237
on \AST{}s.  Abstractions are mapped to applications of the special
lcp@323
   238
constant {\tt _abs}.
lcp@323
   239
lcp@323
   240
More precisely, the mapping $\astofterm{-}$ is defined as follows:
lcp@323
   241
\begin{itemize}
lcp@323
   242
  \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
lcp@323
   243
lcp@323
   244
  \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
lcp@323
   245
    \tau)$.
lcp@323
   246
lcp@323
   247
  \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
lcp@323
   248
    \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
lcp@323
   249
    the {\tt indexname} $(x, i)$.
lcp@323
   250
lcp@323
   251
  \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
lcp@323
   252
    of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
lcp@323
   253
    be obtained from~$t$ by replacing all bound occurrences of~$x$ by
lcp@323
   254
    the free variable $x'$.  This replaces corresponding occurrences of the
lcp@323
   255
    constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
lcp@323
   256
    \mtt{dummyT})$:
lcp@323
   257
   \[ \astofterm{\ttfct{Abs} (x, \tau, t)} = 
lcp@323
   258
      \Appl{\Constant \mtt{"_abs"}, 
lcp@323
   259
        constrain(\Variable x', \tau), \astofterm{t'}}.
lcp@323
   260
    \]
lcp@323
   261
lcp@323
   262
  \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.  
lcp@323
   263
    The occurrence of constructor \ttindex{Bound} should never happen
lcp@323
   264
    when printing well-typed terms; it indicates a de Bruijn index with no
lcp@323
   265
    matching abstraction.
lcp@323
   266
lcp@323
   267
  \item Where $f$ is not an application,
lcp@323
   268
    \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = 
lcp@323
   269
       \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
lcp@323
   270
    \]
lcp@323
   271
\end{itemize}
lcp@323
   272
%
lcp@332
   273
Type constraints\index{type constraints} are inserted to allow the printing
lcp@332
   274
of types.  This is governed by the boolean variable \ttindex{show_types}:
lcp@323
   275
\begin{itemize}
lcp@323
   276
  \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
lcp@332
   277
    \ttindex{show_types} is set to {\tt false}.
lcp@323
   278
lcp@323
   279
  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, 
lcp@323
   280
         \astofterm{\tau}}$ \ otherwise.  
lcp@323
   281
lcp@323
   282
    Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
lcp@323
   283
    constructors go to {\tt Constant}s; type identifiers go to {\tt
lcp@323
   284
      Variable}s; type applications go to {\tt Appl}s with the type
lcp@323
   285
    constructor as the first element.  If \ttindex{show_sorts} is set to
lcp@323
   286
    {\tt true}, some type variables are decorated with an \AST{} encoding
lcp@323
   287
    of their sort.
lcp@323
   288
\end{itemize}
lcp@323
   289
%
lcp@323
   290
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
lcp@323
   291
transformed into the final output string.  The built-in {\bf print AST
lcp@332
   292
  translations}\indexbold{translations!print AST} reverse the
lcp@323
   293
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
lcp@323
   294
lcp@323
   295
For the actual printing process, the names attached to productions
lcp@323
   296
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
lcp@323
   297
a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
lcp@323
   298
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
lcp@323
   299
for~$c$.  Each argument~$x@i$ is converted to a string, and put in
lcp@323
   300
parentheses if its priority~$(p@i)$ requires this.  The resulting strings
lcp@323
   301
and their syntactic sugar (denoted by \dots{} above) are joined to make a
lcp@323
   302
single string.
lcp@323
   303
lcp@323
   304
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
lcp@323
   305
corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
lcp@323
   306
x@n) ~ x@{n+1} \ldots x@m)$.  Applications with too few arguments or with
lcp@323
   307
non-constant head or without a corresponding production are printed as
lcp@323
   308
$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$.  An occurrence of
lcp@323
   309
$\Variable x$ is simply printed as~$x$.
lcp@323
   310
lcp@323
   311
Blanks are {\em not\/} inserted automatically.  If blanks are required to
lcp@323
   312
separate tokens, specify them in the mixfix declaration, possibly preceded
lcp@323
   313
by a slash~({\tt/}) to allow a line break.
lcp@323
   314
\index{ASTs|)}
lcp@323
   315
lcp@323
   316
lcp@323
   317
lcp@323
   318
\section{Macros: Syntactic rewriting} \label{sec:macros}
lcp@323
   319
\index{macros|(}\index{rewriting!syntactic|(} 
lcp@323
   320
lcp@323
   321
Mixfix declarations alone can handle situations where there is a direct
lcp@323
   322
connection between the concrete syntax and the underlying term.  Sometimes
lcp@323
   323
we require a more elaborate concrete syntax, such as quantifiers and list
lcp@323
   324
notation.  Isabelle's {\bf macros} and {\bf translation functions} can
lcp@323
   325
perform translations such as
lcp@323
   326
\begin{center}\tt
lcp@323
   327
  \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
lcp@323
   328
    ALL x:A.P   & Ball(A, \%x.P)        \\ \relax
lcp@323
   329
    [x, y, z]   & Cons(x, Cons(y, Cons(z, Nil)))
lcp@323
   330
  \end{tabular}
lcp@323
   331
\end{center}
lcp@323
   332
Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
lcp@323
   333
are the most powerful translation mechanism but are difficult to read or
lcp@323
   334
write.  Macros are specified by first-order rewriting systems that operate
lcp@323
   335
on abstract syntax trees.  They are usually easy to read and write, and can
lcp@323
   336
express all but the most obscure translations.
lcp@323
   337
lcp@323
   338
Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
lcp@323
   339
theory.\footnote{This and the following theories are complete working
lcp@323
   340
  examples, though they specify only syntax, no axioms.  The file {\tt
lcp@332
   341
    ZF/ZF.thy} presents the full set theory definition, including many
lcp@323
   342
  macro rules.}  Theory {\tt SET} defines constants for set comprehension
lcp@323
   343
({\tt Collect}), replacement ({\tt Replace}) and bounded universal
lcp@323
   344
quantification ({\tt Ball}).  Each of these binds some variables.  Without
lcp@332
   345
additional syntax we should have to write $\forall x \in A.  P$ as {\tt
lcp@323
   346
  Ball(A,\%x.P)}, and similarly for the others.
lcp@323
   347
lcp@323
   348
\begin{figure}
lcp@323
   349
\begin{ttbox}
lcp@323
   350
SET = Pure +
lcp@323
   351
types
lcp@323
   352
  i, o
lcp@323
   353
arities
lcp@323
   354
  i, o :: logic
lcp@323
   355
consts
lcp@323
   356
  Trueprop      :: "o => prop"              ("_" 5)
lcp@323
   357
  Collect       :: "[i, i => o] => i"
lcp@323
   358
  "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
lcp@323
   359
  Replace       :: "[i, [i, i] => o] => i"
lcp@323
   360
  "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
lcp@323
   361
  Ball          :: "[i, i => o] => o"
lcp@323
   362
  "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
lcp@323
   363
translations
lcp@323
   364
  "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
lcp@323
   365
  "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
lcp@323
   366
  "ALL x:A. P"  == "Ball(A, \%x. P)"
lcp@323
   367
end
lcp@323
   368
\end{ttbox}
lcp@323
   369
\caption{Macro example: set theory}\label{fig:set_trans}
lcp@323
   370
\end{figure}
lcp@323
   371
lcp@323
   372
The theory specifies a variable-binding syntax through additional
lcp@323
   373
productions that have mixfix declarations.  Each non-copy production must
lcp@323
   374
specify some constant, which is used for building \AST{}s.
lcp@323
   375
\index{constants!syntactic} The additional constants are decorated with
lcp@323
   376
{\tt\at} to stress their purely syntactic purpose; they should never occur
lcp@323
   377
within the final well-typed terms.  Furthermore, they cannot be written in
lcp@323
   378
formulae because they are not legal identifiers.
lcp@323
   379
lcp@323
   380
The translations cause the replacement of external forms by internal forms
lcp@323
   381
after parsing, and vice versa before printing of terms.  As a specification
lcp@323
   382
of the set theory notation, they should be largely self-explanatory.  The
lcp@323
   383
syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
lcp@323
   384
appear implicitly in the macro rules via their mixfix forms.
lcp@323
   385
lcp@323
   386
Macros can define variable-binding syntax because they operate on \AST{}s,
lcp@323
   387
which have no inbuilt notion of bound variable.  The macro variables {\tt
lcp@323
   388
  x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
lcp@323
   389
in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
lcp@323
   390
range over formulae containing bound variable occurrences.
lcp@323
   391
lcp@323
   392
Other applications of the macro system can be less straightforward, and
lcp@323
   393
there are peculiarities.  The rest of this section will describe in detail
lcp@323
   394
how Isabelle macros are preprocessed and applied.
lcp@323
   395
lcp@323
   396
lcp@323
   397
\subsection{Specifying macros}
lcp@323
   398
Macros are basically rewrite rules on \AST{}s.  But unlike other macro
lcp@323
   399
systems found in programming languages, Isabelle's macros work in both
lcp@323
   400
directions.  Therefore a syntax contains two lists of rewrites: one for
lcp@323
   401
parsing and one for printing.
lcp@323
   402
lcp@323
   403
\index{*translations section}
lcp@323
   404
The {\tt translations} section specifies macros.  The syntax for a macro is
lcp@323
   405
\[ (root)\; string \quad
lcp@323
   406
   \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
lcp@323
   407
   \right\} \quad
lcp@323
   408
   (root)\; string 
lcp@323
   409
\]
lcp@323
   410
%
lcp@323
   411
This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
lcp@323
   412
({\tt ==}).  The two strings specify the left and right-hand sides of the
lcp@323
   413
macro rule.  The $(root)$ specification is optional; it specifies the
lcp@323
   414
nonterminal for parsing the $string$ and if omitted defaults to {\tt
lcp@323
   415
  logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
lcp@323
   416
\begin{itemize}
lcp@323
   417
\item Rules must be left linear: $l$ must not contain repeated variables.
lcp@323
   418
lcp@323
   419
\item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =
lcp@323
   420
  (\mtt"c\mtt" ~ x@1 \ldots x@n)$.
lcp@323
   421
lcp@323
   422
\item Every variable in~$r$ must also occur in~$l$.
lcp@323
   423
\end{itemize}
lcp@323
   424
lcp@323
   425
Macro rules may refer to any syntax from the parent theories.  They may
lcp@499
   426
also refer to anything defined before the {\tt .thy} file's {\tt
lcp@323
   427
  translations} section --- including any mixfix declarations.
lcp@323
   428
lcp@323
   429
Upon declaration, both sides of the macro rule undergo parsing and parse
lcp@323
   430
\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
lcp@323
   431
macro expansion.  The lexer runs in a different mode that additionally
lcp@323
   432
accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
lcp@323
   433
{\tt _K}).  Thus, a constant whose name starts with an underscore can
lcp@323
   434
appear in macro rules but not in ordinary terms.
lcp@323
   435
lcp@323
   436
Some atoms of the macro rule's \AST{} are designated as constants for
lcp@323
   437
matching.  These are all names that have been declared as classes, types or
lcp@323
   438
constants.
lcp@323
   439
lcp@323
   440
The result of this preprocessing is two lists of macro rules, each stored
lcp@323
   441
as a pair of \AST{}s.  They can be viewed using {\tt Syntax.print_syntax}
lcp@323
   442
(sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
lcp@323
   443
theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
lcp@323
   444
\begin{ttbox}
lcp@323
   445
parse_rules:
lcp@323
   446
  ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
lcp@323
   447
  ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
lcp@323
   448
  ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
lcp@323
   449
print_rules:
lcp@323
   450
  ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
lcp@323
   451
  ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
lcp@323
   452
  ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
lcp@323
   453
\end{ttbox}
lcp@323
   454
lcp@323
   455
\begin{warn}
lcp@323
   456
  Avoid choosing variable names that have previously been used as
lcp@323
   457
  constants, types or type classes; the {\tt consts} section in the output
lcp@323
   458
  of {\tt Syntax.print_syntax} lists all such names.  If a macro rule works
lcp@323
   459
  incorrectly, inspect its internal form as shown above, recalling that
lcp@323
   460
  constants appear as quoted strings and variables without quotes.
lcp@323
   461
\end{warn}
lcp@323
   462
lcp@323
   463
\begin{warn}
lcp@323
   464
If \ttindex{eta_contract} is set to {\tt true}, terms will be
lcp@323
   465
$\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
lcp@323
   466
abstraction nodes needed for print rules to match may vanish.  For example,
lcp@332
   467
\verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
lcp@323
   468
not apply and the output will be {\tt Ball(A, P)}.  This problem would not
lcp@323
   469
occur if \ML{} translation functions were used instead of macros (as is
lcp@323
   470
done for binder declarations).
lcp@323
   471
\end{warn}
lcp@323
   472
lcp@323
   473
lcp@323
   474
\begin{warn}
lcp@323
   475
Another trap concerns type constraints.  If \ttindex{show_types} is set to
lcp@323
   476
{\tt true}, bound variables will be decorated by their meta types at the
lcp@323
   477
binding place (but not at occurrences in the body).  Matching with
lcp@323
   478
\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
lcp@323
   479
"i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
lcp@323
   480
appear in the external form, say \verb|{y::i:A::i. P::o}|.  
lcp@323
   481
lcp@323
   482
To allow such constraints to be re-read, your syntax should specify bound
lcp@323
   483
variables using the nonterminal~\ndx{idt}.  This is the case in our
lcp@323
   484
example.  Choosing {\tt id} instead of {\tt idt} is a common error,
lcp@323
   485
especially since it appears in former versions of most of Isabelle's
lcp@323
   486
object-logics.
lcp@323
   487
\end{warn}
lcp@323
   488
lcp@323
   489
lcp@323
   490
lcp@323
   491
\subsection{Applying rules}
lcp@323
   492
As a term is being parsed or printed, an \AST{} is generated as an
lcp@323
   493
intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
lcp@332
   494
normalised by applying macro rules in the manner of a traditional term
lcp@323
   495
rewriting system.  We first examine how a single rule is applied.
lcp@323
   496
lcp@332
   497
Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some
lcp@323
   498
translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
lcp@323
   499
instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
lcp@323
   500
matched by $l$ may be replaced by the corresponding instance of~$r$, thus
lcp@323
   501
{\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
lcp@323
   502
  place-holders} that may occur in rule patterns but not in ordinary
lcp@323
   503
\AST{}s; {\tt Variable} atoms serve this purpose.
lcp@323
   504
lcp@323
   505
The matching of the object~$u$ by the pattern~$l$ is performed as follows:
lcp@323
   506
\begin{itemize}
lcp@323
   507
  \item Every constant matches itself.
lcp@323
   508
lcp@323
   509
  \item $\Variable x$ in the object matches $\Constant x$ in the pattern.
lcp@323
   510
    This point is discussed further below.
lcp@323
   511
lcp@323
   512
  \item Every \AST{} in the object matches $\Variable x$ in the pattern,
lcp@323
   513
    binding~$x$ to~$u$.
lcp@323
   514
lcp@323
   515
  \item One application matches another if they have the same number of
lcp@323
   516
    subtrees and corresponding subtrees match.
lcp@323
   517
lcp@323
   518
  \item In every other case, matching fails.  In particular, {\tt
lcp@323
   519
      Constant}~$x$ can only match itself.
lcp@323
   520
\end{itemize}
lcp@323
   521
A successful match yields a substitution that is applied to~$r$, generating
lcp@323
   522
the instance that replaces~$u$.
lcp@323
   523
lcp@323
   524
The second case above may look odd.  This is where {\tt Variable}s of
lcp@323
   525
non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
lcp@323
   526
far removed from parse trees; at this level it is not yet known which
lcp@323
   527
identifiers will become constants, bounds, frees, types or classes.  As
lcp@323
   528
\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
lcp@323
   529
{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid} and
lcp@323
   530
\ndx{tvar} become {\tt Variable}s.  On the other hand, when \AST{}s
lcp@323
   531
generated from terms for printing, all constants and type constructors
lcp@323
   532
become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may contain a
lcp@323
   533
messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
lcp@323
   534
insignificant at macro level because matching treats them alike.
lcp@323
   535
lcp@323
   536
Because of this behaviour, different kinds of atoms with the same name are
lcp@323
   537
indistinguishable, which may make some rules prone to misbehaviour.  Example:
lcp@323
   538
\begin{ttbox}
lcp@323
   539
types
lcp@323
   540
  Nil
lcp@323
   541
consts
lcp@323
   542
  Nil     :: "'a list"
lcp@323
   543
  "[]"    :: "'a list"    ("[]")
lcp@323
   544
translations
lcp@323
   545
  "[]"    == "Nil"
lcp@323
   546
\end{ttbox}
lcp@323
   547
The term {\tt Nil} will be printed as {\tt []}, just as expected.
lcp@323
   548
The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
lcp@332
   549
expected!  How is the type~{\tt Nil} printed?
lcp@323
   550
lcp@323
   551
Normalizing an \AST{} involves repeatedly applying macro rules until none
lcp@332
   552
are applicable.  Macro rules are chosen in the order that they appear in the
lcp@323
   553
{\tt translations} section.  You can watch the normalization of \AST{}s
lcp@323
   554
during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
lcp@323
   555
{\tt true}.\index{tracing!of macros} Alternatively, use
lcp@323
   556
\ttindex{Syntax.test_read}.  The information displayed when tracing
lcp@323
   557
includes the \AST{} before normalization ({\tt pre}), redexes with results
lcp@323
   558
({\tt rewrote}), the normal form finally reached ({\tt post}) and some
lcp@323
   559
statistics ({\tt normalize}).  If tracing is off,
lcp@323
   560
\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
lcp@323
   561
printing of the normal form and statistics only.
lcp@323
   562
lcp@323
   563
lcp@323
   564
\subsection{Example: the syntax of finite sets}
lcp@323
   565
\index{examples!of macros}
lcp@323
   566
lcp@323
   567
This example demonstrates the use of recursive macros to implement a
lcp@323
   568
convenient notation for finite sets.
lcp@323
   569
\index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}
lcp@323
   570
\index{"@Enum@{\tt\at Enum} constant}
lcp@323
   571
\index{"@Finset@{\tt\at Finset} constant}
lcp@323
   572
\begin{ttbox}
lcp@323
   573
FINSET = SET +
lcp@323
   574
types
lcp@323
   575
  is
lcp@323
   576
consts
lcp@323
   577
  ""            :: "i => is"                ("_")
lcp@323
   578
  "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
lcp@323
   579
  empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
lcp@323
   580
  insert        :: "[i, i] => i"
lcp@323
   581
  "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
lcp@323
   582
translations
lcp@323
   583
  "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
lcp@323
   584
  "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
lcp@323
   585
end
lcp@323
   586
\end{ttbox}
lcp@323
   587
Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
lcp@323
   588
declarations above specify \verb|{x, y, z}| as the external representation
lcp@323
   589
of
lcp@323
   590
\begin{ttbox}
lcp@323
   591
insert(x, insert(y, insert(z, empty)))
lcp@323
   592
\end{ttbox}
lcp@323
   593
The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
lcp@323
   594
  i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
lcp@323
   595
allows a line break after the comma for \rmindex{pretty printing}; if no
lcp@323
   596
line break is required then a space is printed instead.
lcp@323
   597
lcp@323
   598
The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
lcp@323
   599
declaration.  Hence {\tt is} is not a logical type and no default
lcp@323
   600
productions are added.  If we had needed enumerations of the nonterminal
lcp@323
   601
{\tt logic}, which would include all the logical types, we could have used
lcp@323
   602
the predefined nonterminal symbol \ndx{args} and skipped this part
lcp@323
   603
altogether.  The nonterminal~{\tt is} can later be reused for other
lcp@323
   604
enumerations of type~{\tt i} like lists or tuples.
lcp@323
   605
lcp@323
   606
\index{"@Finset@{\tt\at Finset} constant}
lcp@323
   607
Next follows {\tt empty}, which is already equipped with its syntax
lcp@323
   608
\verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
lcp@323
   609
constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
lcp@323
   610
  i} enclosed in curly braces.  Remember that a pair of parentheses, as in
lcp@323
   611
\verb|"{(_)}"|, specifies a block of indentation for pretty printing.
lcp@323
   612
lcp@323
   613
The translations may look strange at first.  Macro rules are best
lcp@323
   614
understood in their internal forms:
lcp@323
   615
\begin{ttbox}
lcp@323
   616
parse_rules:
lcp@323
   617
  ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
lcp@323
   618
  ("{\at}Finset" x)  ->  ("insert" x "empty")
lcp@323
   619
print_rules:
lcp@323
   620
  ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
lcp@323
   621
  ("insert" x "empty")  ->  ("{\at}Finset" x)
lcp@323
   622
\end{ttbox}
lcp@323
   623
This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
lcp@323
   624
two elements, binding the first to {\tt x} and the rest to {\tt xs}.
lcp@323
   625
Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.  
lcp@323
   626
The parse rules only work in the order given.
lcp@323
   627
lcp@323
   628
\begin{warn}
lcp@332
   629
  The \AST{} rewriter cannot distinguish constants from variables and looks
lcp@323
   630
  only for names of atoms.  Thus the names of {\tt Constant}s occurring in
lcp@323
   631
  the (internal) left-hand side of translation rules should be regarded as
lcp@323
   632
  \rmindex{reserved words}.  Choose non-identifiers like {\tt\at Finset} or
lcp@323
   633
  sufficiently long and strange names.  If a bound variable's name gets
lcp@323
   634
  rewritten, the result will be incorrect; for example, the term
lcp@323
   635
\begin{ttbox}
lcp@323
   636
\%empty insert. insert(x, empty)
lcp@323
   637
\end{ttbox}
lcp@332
   638
\par\noindent is printed as \verb|%empty insert. {x}|.
lcp@323
   639
\end{warn}
lcp@323
   640
lcp@323
   641
lcp@323
   642
\subsection{Example: a parse macro for dependent types}\label{prod_trans}
lcp@323
   643
\index{examples!of macros}
lcp@323
   644
lcp@323
   645
As stated earlier, a macro rule may not introduce new {\tt Variable}s on
lcp@323
   646
the right-hand side.  Something like \verb|"K(B)" => "%x. B"| is illegal;
lcp@323
   647
if allowed, it could cause variable capture.  In such cases you usually
lcp@323
   648
must fall back on translation functions.  But a trick can make things
lcp@323
   649
readable in some cases: {\em calling translation functions by parse
lcp@323
   650
  macros}:
lcp@323
   651
\begin{ttbox}
lcp@323
   652
PROD = FINSET +
lcp@323
   653
consts
lcp@323
   654
  Pi            :: "[i, i => i] => i"
lcp@323
   655
  "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
lcp@323
   656
  "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
lcp@323
   657
\ttbreak
lcp@323
   658
translations
lcp@323
   659
  "PROD x:A. B" => "Pi(A, \%x. B)"
lcp@323
   660
  "A -> B"      => "Pi(A, _K(B))"
lcp@323
   661
end
lcp@323
   662
ML
lcp@323
   663
  val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
lcp@323
   664
\end{ttbox}
lcp@323
   665
lcp@323
   666
Here {\tt Pi} is an internal constant for constructing general products.
lcp@323
   667
Two external forms exist: the general case {\tt PROD x:A.B} and the
lcp@323
   668
function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
lcp@323
   669
does not depend on~{\tt x}.
lcp@323
   670
lcp@323
   671
The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B|
lcp@323
   672
due to a parse translation associated with \cdx{_K}.  The order of the
lcp@323
   673
parse rules is critical.  Unfortunately there is no such trick for
lcp@323
   674
printing, so we have to add a {\tt ML} section for the print translation
lcp@323
   675
\ttindex{dependent_tr'}.
lcp@323
   676
lcp@323
   677
Recall that identifiers with a leading {\tt _} are allowed in translation
lcp@323
   678
rules, but not in ordinary terms.  Thus we can create \AST{}s containing
lcp@323
   679
names that are not directly expressible.
lcp@323
   680
lcp@323
   681
The parse translation for {\tt _K} is already installed in Pure, and {\tt
lcp@323
   682
dependent_tr'} is exported by the syntax module for public use.  See
lcp@323
   683
\S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
lcp@323
   684
\index{macros|)}\index{rewriting!syntactic|)}
lcp@323
   685
lcp@323
   686
lcp@323
   687
\section{Translation functions} \label{sec:tr_funs}
lcp@323
   688
\index{translations|(} 
lcp@323
   689
%
lcp@323
   690
This section describes the translation function mechanism.  By writing
lcp@323
   691
\ML{} functions, you can do almost everything with terms or \AST{}s during
lcp@323
   692
parsing and printing.  The logic \LK\ is a good example of sophisticated
lcp@332
   693
transformations between internal and external representations of sequents;
lcp@332
   694
here, macros would be useless. 
lcp@323
   695
lcp@323
   696
A full understanding of translations requires some familiarity
lcp@323
   697
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
lcp@323
   698
{\tt Syntax.ast} and the encodings of types and terms as such at the various
lcp@323
   699
stages of the parsing or printing process.  Most users should never need to
lcp@323
   700
use translation functions.
lcp@323
   701
lcp@323
   702
\subsection{Declaring translation functions}
lcp@323
   703
There are four kinds of translation functions.  Each such function is
lcp@323
   704
associated with a name, which triggers calls to it.  Such names can be
lcp@323
   705
constants (logical or syntactic) or type constructors.
lcp@323
   706
lcp@323
   707
{\tt Syntax.print_syntax} displays the sets of names associated with the
lcp@323
   708
translation functions of a {\tt Syntax.syntax} under
lcp@323
   709
\ttindex{parse_ast_translation}, \ttindex{parse_translation},
lcp@323
   710
\ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
lcp@323
   711
add new ones via the {\tt ML} section\index{*ML section} of
lcp@323
   712
a {\tt .thy} file.  There may never be more than one function of the same
lcp@323
   713
kind per name.  Conceptually, the {\tt ML} section should appear between
lcp@323
   714
{\tt consts} and {\tt translations}; newly installed translation functions
lcp@323
   715
are already effective when macros and logical rules are parsed.
lcp@323
   716
lcp@323
   717
The {\tt ML} section is copied verbatim into the \ML\ file generated from a
lcp@323
   718
{\tt .thy} file.  Definitions made here are accessible as components of an
lcp@323
   719
\ML\ structure; to make some definitions private, use an \ML{} {\tt local}
lcp@323
   720
declaration.  The {\tt ML} section may install translation functions by
lcp@323
   721
declaring any of the following identifiers:
lcp@323
   722
\begin{ttbox}
lcp@323
   723
val parse_ast_translation : (string * (ast list -> ast)) list
lcp@323
   724
val print_ast_translation : (string * (ast list -> ast)) list
lcp@323
   725
val parse_translation     : (string * (term list -> term)) list
lcp@323
   726
val print_translation     : (string * (term list -> term)) list
lcp@323
   727
\end{ttbox}
lcp@323
   728
lcp@323
   729
\subsection{The translation strategy}
lcp@323
   730
All four kinds of translation functions are treated similarly.  They are
lcp@323
   731
called during the transformations between parse trees, \AST{}s and terms
lcp@323
   732
(recall Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
lcp@323
   733
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function
lcp@323
   734
$f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
lcp@323
   735
function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
lcp@323
   736
lcp@323
   737
For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.  A
lcp@323
   738
combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
lcp@323
   739
  x@n}$.  For term translations, the arguments are terms and a combination
lcp@323
   740
has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
lcp@323
   741
x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more sophisticated
lcp@323
   742
transformations than \AST{}s do, typically involving abstractions and bound
lcp@323
   743
variables.
lcp@323
   744
lcp@323
   745
Regardless of whether they act on terms or \AST{}s,
lcp@323
   746
parse translations differ from print translations fundamentally:
lcp@323
   747
\begin{description}
lcp@323
   748
\item[Parse translations] are applied bottom-up.  The arguments are already
lcp@323
   749
  in translated form.  The translations must not fail; exceptions trigger
lcp@323
   750
  an error message.
lcp@323
   751
lcp@323
   752
\item[Print translations] are applied top-down.  They are supplied with
lcp@323
   753
  arguments that are partly still in internal form.  The result again
lcp@323
   754
  undergoes translation; therefore a print translation should not introduce
lcp@323
   755
  as head the very constant that invoked it.  The function may raise
lcp@323
   756
  exception \xdx{Match} to indicate failure; in this event it has no
lcp@323
   757
  effect.
lcp@323
   758
\end{description}
lcp@323
   759
lcp@323
   760
Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
lcp@323
   761
\ttindex{Const} for terms --- can invoke translation functions.  This
lcp@323
   762
causes another difference between parsing and printing.
lcp@323
   763
lcp@323
   764
Parsing starts with a string and the constants are not yet identified.
lcp@323
   765
Only parse tree heads create {\tt Constant}s in the resulting \AST, as
lcp@323
   766
described in \S\ref{sec:astofpt}.  Macros and parse \AST{} translations may
lcp@323
   767
introduce further {\tt Constant}s.  When the final \AST{} is converted to a
lcp@323
   768
term, all {\tt Constant}s become {\tt Const}s, as described in
lcp@323
   769
\S\ref{sec:termofast}.
lcp@323
   770
lcp@323
   771
Printing starts with a well-typed term and all the constants are known.  So
lcp@323
   772
all logical constants and type constructors may invoke print translations.
lcp@323
   773
These, and macros, may introduce further constants.
lcp@323
   774
lcp@323
   775
lcp@323
   776
\subsection{Example: a print translation for dependent types}
lcp@323
   777
\index{examples!of translations}\indexbold{*dependent_tr'}
lcp@323
   778
lcp@323
   779
Let us continue the dependent type example (page~\pageref{prod_trans}) by
lcp@323
   780
examining the parse translation for~\cdx{_K} and the print translation
lcp@323
   781
{\tt dependent_tr'}, which are both built-in.  By convention, parse
lcp@323
   782
translations have names ending with {\tt _tr} and print translations have
lcp@323
   783
names ending with {\tt _tr'}.  Search for such names in the Isabelle
lcp@323
   784
sources to locate more examples.
lcp@323
   785
lcp@323
   786
Here is the parse translation for {\tt _K}:
lcp@323
   787
\begin{ttbox}
lcp@323
   788
fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
lcp@323
   789
  | k_tr ts = raise TERM("k_tr",ts);
lcp@323
   790
\end{ttbox}
lcp@323
   791
If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
lcp@323
   792
{\tt Abs} node with a body derived from $t$.  Since terms given to parse
lcp@323
   793
translations are not yet typed, the type of the bound variable in the new
lcp@323
   794
{\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
lcp@323
   795
nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
lcp@323
   796
a basic term manipulation function defined in {\tt Pure/term.ML}.
lcp@323
   797
lcp@323
   798
Here is the print translation for dependent types:
lcp@323
   799
\begin{ttbox}
lcp@323
   800
fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =
lcp@323
   801
      if 0 mem (loose_bnos B) then
lcp@323
   802
        let val (x', B') = variant_abs (x, dummyT, B);
lcp@323
   803
        in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
lcp@323
   804
        end
lcp@323
   805
      else list_comb (Const (r, dummyT) $ A $ B, ts)
lcp@323
   806
  | dependent_tr' _ _ = raise Match;
lcp@323
   807
\end{ttbox}
lcp@332
   808
The argument {\tt (q,r)} is supplied to the curried function {\tt
lcp@332
   809
  dependent_tr'} by a partial application during its installation.  We
lcp@332
   810
can set up print translations for both {\tt Pi} and {\tt Sigma} by
lcp@332
   811
including
lcp@323
   812
\begin{ttbox}\index{*ML section}
lcp@323
   813
val print_translation =
lcp@323
   814
  [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
lcp@323
   815
   ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
lcp@323
   816
\end{ttbox}
lcp@323
   817
within the {\tt ML} section.  The first of these transforms ${\tt Pi}(A,
lcp@323
   818
\mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
lcp@332
   819
$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not depend
lcp@323
   820
on~$x$.  It checks this using \ttindex{loose_bnos}, yet another function
lcp@323
   821
from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$ renamed away
lcp@332
   822
from all names in $B$, and $B'$ is the body $B$ with {\tt Bound} nodes
lcp@332
   823
referring to the {\tt Abs} node replaced by $\ttfct{Free} (x',
lcp@323
   824
\mtt{dummyT})$.
lcp@323
   825
lcp@323
   826
We must be careful with types here.  While types of {\tt Const}s are
lcp@323
   827
ignored, type constraints may be printed for some {\tt Free}s and
lcp@323
   828
{\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
lcp@323
   829
\ttindex{dummyT} are never printed with constraint, though.  The line
lcp@323
   830
\begin{ttbox}
lcp@323
   831
        let val (x', B') = variant_abs (x, dummyT, B);
lcp@323
   832
\end{ttbox}\index{*variant_abs}
lcp@323
   833
replaces bound variable occurrences in~$B$ by the free variable $x'$ with
lcp@323
   834
type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
lcp@323
   835
correct type~{\tt T}, so this is the only place where a type
lcp@323
   836
constraint might appear. 
lcp@323
   837
\index{translations|)}
lcp@323
   838
\index{syntax!transformations|)}