doc-src/Ref/syntax.tex
author Walther Neuper <neuper@ist.tugraz.at>
Thu, 12 Aug 2010 15:03:34 +0200
branchisac-from-Isabelle2009-2
changeset 37913 20e3616b2d9c
parent 30184 37969710e61f
child 43760 e87888b4152f
permissions -rw-r--r--
prepare reactivation of isac-update-Isa09-2
wenzelm@30184
     1
lcp@323
     2
\chapter{Syntax Transformations} \label{chap:syntax}
lcp@323
     3
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
lcp@323
     4
\newcommand\mtt[1]{\mbox{\tt #1}}
lcp@323
     5
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
lcp@323
     6
\newcommand\Constant{\ttfct{Constant}}
lcp@323
     7
\newcommand\Variable{\ttfct{Variable}}
lcp@323
     8
\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
lcp@323
     9
\index{syntax!transformations|(}
lcp@323
    10
lcp@323
    11
This chapter is intended for experienced Isabelle users who need to define
lcp@323
    12
macros or code their own translation functions.  It describes the
lcp@323
    13
transformations between parse trees, abstract syntax trees and terms.
lcp@323
    14
lcp@323
    15
lcp@323
    16
\section{Abstract syntax trees} \label{sec:asts}
wenzelm@864
    17
\index{ASTs|(}
lcp@323
    18
lcp@323
    19
The parser, given a token list from the lexer, applies productions to yield
lcp@323
    20
a parse tree\index{parse trees}.  By applying some internal transformations
lcp@323
    21
the parse tree becomes an abstract syntax tree, or \AST{}.  Macro
lcp@323
    22
expansion, further translations and finally type inference yields a
lcp@323
    23
well-typed term.  The printing process is the reverse, except for some
lcp@323
    24
subtleties to be discussed later.
lcp@323
    25
lcp@323
    26
Figure~\ref{fig:parse_print} outlines the parsing and printing process.
lcp@323
    27
Much of the complexity is due to the macro mechanism.  Using macros, you
lcp@323
    28
can specify most forms of concrete syntax without writing any \ML{} code.
lcp@323
    29
lcp@323
    30
\begin{figure}
lcp@323
    31
\begin{center}
lcp@323
    32
\begin{tabular}{cl}
lcp@323
    33
string          & \\
wenzelm@3108
    34
$\downarrow$    & lexer, parser \\
lcp@323
    35
parse tree      & \\
lcp@323
    36
$\downarrow$    & parse \AST{} translation \\
lcp@323
    37
\AST{}             & \\
lcp@323
    38
$\downarrow$    & \AST{} rewriting (macros) \\
lcp@323
    39
\AST{}             & \\
lcp@323
    40
$\downarrow$    & parse translation, type inference \\
lcp@323
    41
--- well-typed term --- & \\
lcp@323
    42
$\downarrow$    & print translation \\
lcp@323
    43
\AST{}             & \\
lcp@323
    44
$\downarrow$    & \AST{} rewriting (macros) \\
lcp@323
    45
\AST{}             & \\
wenzelm@3108
    46
$\downarrow$    & print \AST{} translation, token translation \\
lcp@323
    47
string          &
lcp@323
    48
\end{tabular}
lcp@323
    49
lcp@323
    50
\end{center}
lcp@323
    51
\caption{Parsing and printing}\label{fig:parse_print}
lcp@323
    52
\end{figure}
lcp@323
    53
lcp@323
    54
Abstract syntax trees are an intermediate form between the raw parse trees
lcp@323
    55
and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or
lcp@323
    56
variable) or a list of {\em at least two\/} subtrees.  Internally, they
lcp@323
    57
have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
lcp@323
    58
\index{*Appl}
lcp@323
    59
\begin{ttbox}
lcp@323
    60
datatype ast = Constant of string
lcp@323
    61
             | Variable of string
lcp@323
    62
             | Appl of ast list
lcp@323
    63
\end{ttbox}
lcp@323
    64
%
lcp@323
    65
Isabelle uses an S-expression syntax for abstract syntax trees.  Constant
lcp@323
    66
atoms are shown as quoted strings, variable atoms as non-quoted strings and
lcp@332
    67
applications as a parenthesised list of subtrees.  For example, the \AST
lcp@323
    68
\begin{ttbox}
lcp@323
    69
Appl [Constant "_constrain",
lcp@323
    70
      Appl [Constant "_abs", Variable "x", Variable "t"],
lcp@323
    71
      Appl [Constant "fun", Variable "'a", Variable "'b"]]
lcp@323
    72
\end{ttbox}
lcp@323
    73
is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
lcp@323
    74
Both {\tt ()} and {\tt (f)} are illegal because they have too few
wenzelm@864
    75
subtrees.
lcp@323
    76
lcp@323
    77
The resemblance to Lisp's S-expressions is intentional, but there are two
lcp@323
    78
kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the
lcp@323
    79
names {\tt Constant} and {\tt Variable} too literally; in the later
lcp@323
    80
translation to terms, $\Variable x$ may become a constant, free or bound
lcp@323
    81
variable, even a type constructor or class name; the actual outcome depends
lcp@323
    82
on the context.
lcp@323
    83
lcp@323
    84
Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
lcp@323
    85
application of~$f$ to the arguments $x@1, \ldots, x@n$.  But the kind of
lcp@323
    86
application is determined later by context; it could be a type constructor
lcp@323
    87
applied to types.
lcp@323
    88
lcp@323
    89
Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
lcp@323
    90
first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later
lcp@323
    91
at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
lcp@323
    92
occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
lcp@323
    93
constructor \ttindex{Bound}).
lcp@323
    94
lcp@323
    95
wenzelm@6618
    96
\section{Transforming parse trees to ASTs}\label{sec:astofpt}
lcp@323
    97
\index{ASTs!made from parse trees}
lcp@323
    98
\newcommand\astofpt[1]{\lbrakk#1\rbrakk}
lcp@323
    99
lcp@323
   100
The parse tree is the raw output of the parser.  Translation functions,
lcp@323
   101
called {\bf parse AST translations}\indexbold{translations!parse AST},
lcp@323
   102
transform the parse tree into an abstract syntax tree.
lcp@323
   103
lcp@323
   104
The parse tree is constructed by nesting the right-hand sides of the
lcp@323
   105
productions used to recognize the input.  Such parse trees are simply lists
lcp@323
   106
of tokens and constituent parse trees, the latter representing the
lcp@323
   107
nonterminals of the productions.  Let us refer to the actual productions in
wenzelm@864
   108
the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
wenzelm@864
   109
example).
lcp@323
   110
lcp@323
   111
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
lcp@323
   112
by stripping out delimiters and copy productions.  More precisely, the
lcp@323
   113
mapping $\astofpt{-}$ is derived from the productions as follows:
lcp@323
   114
\begin{itemize}
lcp@323
   115
\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
wenzelm@14947
   116
  \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{num}, \ndx{xnum} or \ndx{xstr} token,
wenzelm@14947
   117
  and $s$ its associated string.  Note that for {\tt xstr} this does not
wenzelm@14947
   118
  include the quotes.
lcp@323
   119
lcp@323
   120
\item Copy productions:\index{productions!copy}
lcp@323
   121
  $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
lcp@323
   122
  strings of delimiters, which are discarded.  $P$ stands for the single
lcp@323
   123
  constituent that is not a delimiter; it is either a nonterminal symbol or
lcp@323
   124
  a name token.
lcp@323
   125
lcp@323
   126
  \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
lcp@323
   127
    Here there are no constituents other than delimiters, which are
wenzelm@864
   128
    discarded.
lcp@323
   129
lcp@323
   130
  \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
lcp@323
   131
    the remaining constituents $P@1$, \ldots, $P@n$ are built into an
lcp@323
   132
    application whose head constant is~$c$:
wenzelm@864
   133
    \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
lcp@323
   134
       \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
lcp@323
   135
    \]
lcp@323
   136
\end{itemize}
lcp@323
   137
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
lcp@323
   138
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
lcp@323
   139
These examples illustrate the need for further translations to make \AST{}s
lcp@323
   140
closer to the typed $\lambda$-calculus.  The Pure syntax provides
lcp@323
   141
predefined parse \AST{} translations\index{translations!parse AST} for
lcp@323
   142
ordinary applications, type applications, nested abstractions, meta
lcp@323
   143
implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
lcp@323
   144
effect on some representative input strings.
lcp@323
   145
lcp@323
   146
lcp@323
   147
\begin{figure}
lcp@323
   148
\begin{center}
lcp@323
   149
\tt\begin{tabular}{ll}
lcp@323
   150
\rm input string    & \rm \AST \\\hline
lcp@323
   151
"f"                 & f \\
lcp@323
   152
"'a"                & 'a \\
lcp@323
   153
"t == u"            & ("==" t u) \\
lcp@323
   154
"f(x)"              & ("_appl" f x) \\
lcp@323
   155
"f(x, y)"           & ("_appl" f ("_args" x y)) \\
lcp@323
   156
"f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
lcp@323
   157
"\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
lcp@323
   158
\end{tabular}
lcp@323
   159
\end{center}
wenzelm@864
   160
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
lcp@323
   161
\end{figure}
lcp@323
   162
lcp@323
   163
\begin{figure}
lcp@323
   164
\begin{center}
lcp@323
   165
\tt\begin{tabular}{ll}
lcp@323
   166
\rm input string            & \rm \AST{} \\\hline
lcp@323
   167
"f(x, y, z)"                & (f x y z) \\
lcp@323
   168
"'a ty"                     & (ty 'a) \\
lcp@323
   169
"('a, 'b) ty"               & (ty 'a 'b) \\
lcp@323
   170
"\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
lcp@323
   171
"\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
lcp@323
   172
"[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
lcp@323
   173
"['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
lcp@323
   174
\end{tabular}
lcp@323
   175
\end{center}
lcp@323
   176
\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
lcp@323
   177
\end{figure}
lcp@323
   178
lcp@323
   179
The names of constant heads in the \AST{} control the translation process.
lcp@323
   180
The list of constants invoking parse \AST{} translations appears in the
wenzelm@864
   181
output of {\tt print_syntax} under {\tt parse_ast_translation}.
lcp@323
   182
lcp@323
   183
wenzelm@6618
   184
\section{Transforming ASTs to terms}\label{sec:termofast}
lcp@323
   185
\index{terms!made from ASTs}
lcp@323
   186
\newcommand\termofast[1]{\lbrakk#1\rbrakk}
lcp@323
   187
lcp@323
   188
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
lcp@323
   189
transformed into a term.  This term is probably ill-typed since type
lcp@323
   190
inference has not occurred yet.  The term may contain type constraints
lcp@323
   191
consisting of applications with head {\tt "_constrain"}; the second
lcp@323
   192
argument is a type encoded as a term.  Type inference later introduces
lcp@323
   193
correct types or rejects the input.
lcp@323
   194
lcp@323
   195
Another set of translation functions, namely parse
lcp@323
   196
translations\index{translations!parse}, may affect this process.  If we
lcp@323
   197
ignore parse translations for the time being, then \AST{}s are transformed
lcp@323
   198
to terms by mapping \AST{} constants to constants, \AST{} variables to
lcp@323
   199
schematic or free variables and \AST{} applications to applications.
lcp@323
   200
lcp@323
   201
More precisely, the mapping $\termofast{-}$ is defined by
lcp@323
   202
\begin{itemize}
lcp@323
   203
\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
lcp@323
   204
  \mtt{dummyT})$.
lcp@323
   205
lcp@323
   206
\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
lcp@323
   207
  \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
lcp@323
   208
  the index extracted from~$xi$.
lcp@323
   209
lcp@323
   210
\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
lcp@323
   211
  \mtt{dummyT})$.
lcp@323
   212
lcp@323
   213
\item Function applications with $n$ arguments:
wenzelm@864
   214
    \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
lcp@323
   215
       \termofast{f} \ttapp
lcp@323
   216
         \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
lcp@323
   217
    \]
lcp@323
   218
\end{itemize}
lcp@323
   219
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
lcp@323
   220
\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
lcp@323
   221
while \ttindex{dummyT} stands for some dummy type that is ignored during
lcp@323
   222
type inference.
lcp@323
   223
lcp@323
   224
So far the outcome is still a first-order term.  Abstractions and bound
lcp@323
   225
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
lcp@323
   226
by parse translations.  Such translations are attached to {\tt "_abs"},
lcp@323
   227
{\tt "!!"} and user-defined binders.
lcp@323
   228
lcp@323
   229
lcp@323
   230
\section{Printing of terms}
lcp@323
   231
\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
lcp@323
   232
lcp@323
   233
The output phase is essentially the inverse of the input phase.  Terms are
lcp@323
   234
translated via abstract syntax trees into strings.  Finally the strings are
lcp@323
   235
pretty printed.
lcp@323
   236
lcp@323
   237
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
lcp@323
   238
terms into \AST{}s.  Ignoring those, the transformation maps
lcp@323
   239
term constants, variables and applications to the corresponding constructs
lcp@323
   240
on \AST{}s.  Abstractions are mapped to applications of the special
lcp@323
   241
constant {\tt _abs}.
lcp@323
   242
lcp@323
   243
More precisely, the mapping $\astofterm{-}$ is defined as follows:
lcp@323
   244
\begin{itemize}
lcp@323
   245
  \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
lcp@323
   246
lcp@323
   247
  \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
lcp@323
   248
    \tau)$.
lcp@323
   249
lcp@323
   250
  \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
lcp@323
   251
    \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
lcp@323
   252
    the {\tt indexname} $(x, i)$.
lcp@323
   253
lcp@323
   254
  \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
lcp@323
   255
    of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
lcp@323
   256
    be obtained from~$t$ by replacing all bound occurrences of~$x$ by
lcp@323
   257
    the free variable $x'$.  This replaces corresponding occurrences of the
lcp@323
   258
    constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
lcp@323
   259
    \mtt{dummyT})$:
wenzelm@864
   260
   \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
wenzelm@864
   261
      \Appl{\Constant \mtt{"_abs"},
paulson@8136
   262
        constrain(\Variable x', \tau), \astofterm{t'}}
lcp@323
   263
    \]
lcp@323
   264
wenzelm@864
   265
  \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
lcp@323
   266
    The occurrence of constructor \ttindex{Bound} should never happen
lcp@323
   267
    when printing well-typed terms; it indicates a de Bruijn index with no
lcp@323
   268
    matching abstraction.
lcp@323
   269
lcp@323
   270
  \item Where $f$ is not an application,
wenzelm@864
   271
    \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
lcp@323
   272
       \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
lcp@323
   273
    \]
lcp@323
   274
\end{itemize}
lcp@323
   275
%
lcp@332
   276
Type constraints\index{type constraints} are inserted to allow the printing
lcp@332
   277
of types.  This is governed by the boolean variable \ttindex{show_types}:
lcp@323
   278
\begin{itemize}
lcp@323
   279
  \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
lcp@332
   280
    \ttindex{show_types} is set to {\tt false}.
lcp@323
   281
wenzelm@864
   282
  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
wenzelm@864
   283
         \astofterm{\tau}}$ \ otherwise.
lcp@323
   284
lcp@323
   285
    Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
lcp@323
   286
    constructors go to {\tt Constant}s; type identifiers go to {\tt
lcp@323
   287
      Variable}s; type applications go to {\tt Appl}s with the type
lcp@323
   288
    constructor as the first element.  If \ttindex{show_sorts} is set to
lcp@323
   289
    {\tt true}, some type variables are decorated with an \AST{} encoding
lcp@323
   290
    of their sort.
lcp@323
   291
\end{itemize}
lcp@323
   292
%
lcp@323
   293
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
lcp@323
   294
transformed into the final output string.  The built-in {\bf print AST
lcp@332
   295
  translations}\indexbold{translations!print AST} reverse the
lcp@323
   296
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
lcp@323
   297
lcp@323
   298
For the actual printing process, the names attached to productions
lcp@323
   299
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
lcp@323
   300
a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
lcp@323
   301
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
lcp@323
   302
for~$c$.  Each argument~$x@i$ is converted to a string, and put in
lcp@323
   303
parentheses if its priority~$(p@i)$ requires this.  The resulting strings
lcp@323
   304
and their syntactic sugar (denoted by \dots{} above) are joined to make a
lcp@323
   305
single string.
lcp@323
   306
wenzelm@3108
   307
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments
wenzelm@3108
   308
than the corresponding production, it is first split into
wenzelm@3108
   309
$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$.  Applications
wenzelm@3108
   310
with too few arguments or with non-constant head or without a
wenzelm@3108
   311
corresponding production are printed as $f(x@1, \ldots, x@l)$ or
wenzelm@3108
   312
$(\alpha@1, \ldots, \alpha@l) ty$.  Multiple productions associated
wenzelm@3108
   313
with some name $c$ are tried in order of appearance.  An occurrence of
lcp@323
   314
$\Variable x$ is simply printed as~$x$.
lcp@323
   315
lcp@323
   316
Blanks are {\em not\/} inserted automatically.  If blanks are required to
lcp@323
   317
separate tokens, specify them in the mixfix declaration, possibly preceded
lcp@323
   318
by a slash~({\tt/}) to allow a line break.
lcp@323
   319
\index{ASTs|)}
lcp@323
   320
lcp@323
   321
lcp@323
   322
wenzelm@3108
   323
\section{Macros: syntactic rewriting} \label{sec:macros}
wenzelm@864
   324
\index{macros|(}\index{rewriting!syntactic|(}
lcp@323
   325
lcp@323
   326
Mixfix declarations alone can handle situations where there is a direct
lcp@323
   327
connection between the concrete syntax and the underlying term.  Sometimes
lcp@323
   328
we require a more elaborate concrete syntax, such as quantifiers and list
lcp@323
   329
notation.  Isabelle's {\bf macros} and {\bf translation functions} can
lcp@323
   330
perform translations such as
lcp@323
   331
\begin{center}\tt
lcp@323
   332
  \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
lcp@323
   333
    ALL x:A.P   & Ball(A, \%x.P)        \\ \relax
lcp@323
   334
    [x, y, z]   & Cons(x, Cons(y, Cons(z, Nil)))
lcp@323
   335
  \end{tabular}
lcp@323
   336
\end{center}
lcp@323
   337
Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
lcp@323
   338
are the most powerful translation mechanism but are difficult to read or
lcp@323
   339
write.  Macros are specified by first-order rewriting systems that operate
lcp@323
   340
on abstract syntax trees.  They are usually easy to read and write, and can
lcp@323
   341
express all but the most obscure translations.
lcp@323
   342
wenzelm@3108
   343
Figure~\ref{fig:set_trans} defines a fragment of first-order logic and
wenzelm@3108
   344
set theory.\footnote{This and the following theories are complete
wenzelm@3108
   345
  working examples, though they specify only syntax, no axioms.  The
wenzelm@3108
   346
  file {\tt ZF/ZF.thy} presents a full set theory definition,
wenzelm@3108
   347
  including many macro rules.} Theory {\tt SetSyntax} declares
wenzelm@3108
   348
constants for set comprehension ({\tt Collect}), replacement ({\tt
wenzelm@3108
   349
  Replace}) and bounded universal quantification ({\tt Ball}).  Each
wenzelm@3108
   350
of these binds some variables.  Without additional syntax we should
wenzelm@3108
   351
have to write $\forall x \in A.  P$ as {\tt Ball(A,\%x.P)}, and
wenzelm@3108
   352
similarly for the others.
lcp@323
   353
lcp@323
   354
\begin{figure}
lcp@323
   355
\begin{ttbox}
wenzelm@3108
   356
SetSyntax = Pure +
lcp@323
   357
types
wenzelm@864
   358
  i o
lcp@323
   359
arities
lcp@323
   360
  i, o :: logic
lcp@323
   361
consts
clasohm@1387
   362
  Trueprop      :: o => prop                ("_" 5)
clasohm@1387
   363
  Collect       :: [i, i => o] => i
clasohm@1387
   364
  Replace       :: [i, [i, i] => o] => i
clasohm@1387
   365
  Ball          :: [i, i => o] => o
wenzelm@864
   366
syntax
clasohm@1387
   367
  "{\at}Collect"    :: [idt, i, o] => i         ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
clasohm@1387
   368
  "{\at}Replace"    :: [idt, idt, i, o] => i    ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
clasohm@1387
   369
  "{\at}Ball"       :: [idt, i, o] => o         ("(3ALL _:_./ _)" 10)
lcp@323
   370
translations
lcp@323
   371
  "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
lcp@323
   372
  "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
lcp@323
   373
  "ALL x:A. P"  == "Ball(A, \%x. P)"
lcp@323
   374
end
lcp@323
   375
\end{ttbox}
lcp@323
   376
\caption{Macro example: set theory}\label{fig:set_trans}
lcp@323
   377
\end{figure}
lcp@323
   378
wenzelm@864
   379
The theory specifies a variable-binding syntax through additional productions
wenzelm@864
   380
that have mixfix declarations.  Each non-copy production must specify some
wenzelm@864
   381
constant, which is used for building \AST{}s. \index{constants!syntactic} The
wenzelm@864
   382
additional constants are decorated with {\tt\at} to stress their purely
wenzelm@864
   383
syntactic purpose; they may not occur within the final well-typed terms,
wenzelm@864
   384
being declared as {\tt syntax} rather than {\tt consts}.
lcp@323
   385
lcp@323
   386
The translations cause the replacement of external forms by internal forms
lcp@323
   387
after parsing, and vice versa before printing of terms.  As a specification
lcp@323
   388
of the set theory notation, they should be largely self-explanatory.  The
lcp@323
   389
syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
lcp@323
   390
appear implicitly in the macro rules via their mixfix forms.
lcp@323
   391
lcp@323
   392
Macros can define variable-binding syntax because they operate on \AST{}s,
lcp@323
   393
which have no inbuilt notion of bound variable.  The macro variables {\tt
lcp@323
   394
  x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
lcp@323
   395
in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
lcp@323
   396
range over formulae containing bound variable occurrences.
lcp@323
   397
lcp@323
   398
Other applications of the macro system can be less straightforward, and
lcp@323
   399
there are peculiarities.  The rest of this section will describe in detail
lcp@323
   400
how Isabelle macros are preprocessed and applied.
lcp@323
   401
lcp@323
   402
lcp@323
   403
\subsection{Specifying macros}
lcp@323
   404
Macros are basically rewrite rules on \AST{}s.  But unlike other macro
lcp@323
   405
systems found in programming languages, Isabelle's macros work in both
lcp@323
   406
directions.  Therefore a syntax contains two lists of rewrites: one for
lcp@323
   407
parsing and one for printing.
lcp@323
   408
lcp@323
   409
\index{*translations section}
lcp@323
   410
The {\tt translations} section specifies macros.  The syntax for a macro is
lcp@323
   411
\[ (root)\; string \quad
lcp@323
   412
   \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
lcp@323
   413
   \right\} \quad
wenzelm@864
   414
   (root)\; string
lcp@323
   415
\]
lcp@323
   416
%
lcp@323
   417
This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
lcp@323
   418
({\tt ==}).  The two strings specify the left and right-hand sides of the
lcp@323
   419
macro rule.  The $(root)$ specification is optional; it specifies the
lcp@323
   420
nonterminal for parsing the $string$ and if omitted defaults to {\tt
lcp@323
   421
  logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
lcp@323
   422
\begin{itemize}
lcp@323
   423
\item Rules must be left linear: $l$ must not contain repeated variables.
lcp@323
   424
lcp@323
   425
\item Every variable in~$r$ must also occur in~$l$.
lcp@323
   426
\end{itemize}
lcp@323
   427
wenzelm@3108
   428
Macro rules may refer to any syntax from the parent theories.  They
wenzelm@3108
   429
may also refer to anything defined before the current {\tt
lcp@323
   430
  translations} section --- including any mixfix declarations.
lcp@323
   431
lcp@323
   432
Upon declaration, both sides of the macro rule undergo parsing and parse
lcp@323
   433
\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
lcp@323
   434
macro expansion.  The lexer runs in a different mode that additionally
lcp@323
   435
accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
lcp@323
   436
{\tt _K}).  Thus, a constant whose name starts with an underscore can
lcp@323
   437
appear in macro rules but not in ordinary terms.
lcp@323
   438
lcp@323
   439
Some atoms of the macro rule's \AST{} are designated as constants for
lcp@323
   440
matching.  These are all names that have been declared as classes, types or
wenzelm@864
   441
constants (logical and syntactic).
lcp@323
   442
wenzelm@3108
   443
The result of this preprocessing is two lists of macro rules, each
wenzelm@3108
   444
stored as a pair of \AST{}s.  They can be viewed using {\tt
wenzelm@3108
   445
  print_syntax} (sections \ttindex{parse_rules} and
wenzelm@3108
   446
\ttindex{print_rules}).  For theory~{\tt SetSyntax} of
wenzelm@3108
   447
Fig.~\ref{fig:set_trans} these are
lcp@323
   448
\begin{ttbox}
lcp@323
   449
parse_rules:
lcp@323
   450
  ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
lcp@323
   451
  ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
lcp@323
   452
  ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
lcp@323
   453
print_rules:
lcp@323
   454
  ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
lcp@323
   455
  ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
lcp@323
   456
  ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
lcp@323
   457
\end{ttbox}
lcp@323
   458
lcp@323
   459
\begin{warn}
lcp@323
   460
  Avoid choosing variable names that have previously been used as
lcp@323
   461
  constants, types or type classes; the {\tt consts} section in the output
wenzelm@864
   462
  of {\tt print_syntax} lists all such names.  If a macro rule works
lcp@323
   463
  incorrectly, inspect its internal form as shown above, recalling that
lcp@323
   464
  constants appear as quoted strings and variables without quotes.
lcp@323
   465
\end{warn}
lcp@323
   466
lcp@323
   467
\begin{warn}
lcp@323
   468
If \ttindex{eta_contract} is set to {\tt true}, terms will be
lcp@323
   469
$\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
lcp@323
   470
abstraction nodes needed for print rules to match may vanish.  For example,
lcp@332
   471
\verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
lcp@323
   472
not apply and the output will be {\tt Ball(A, P)}.  This problem would not
lcp@323
   473
occur if \ML{} translation functions were used instead of macros (as is
lcp@323
   474
done for binder declarations).
lcp@323
   475
\end{warn}
lcp@323
   476
lcp@323
   477
lcp@323
   478
\begin{warn}
lcp@323
   479
Another trap concerns type constraints.  If \ttindex{show_types} is set to
lcp@323
   480
{\tt true}, bound variables will be decorated by their meta types at the
lcp@323
   481
binding place (but not at occurrences in the body).  Matching with
lcp@323
   482
\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
lcp@323
   483
"i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
wenzelm@864
   484
appear in the external form, say \verb|{y::i:A::i. P::o}|.
lcp@323
   485
lcp@323
   486
To allow such constraints to be re-read, your syntax should specify bound
lcp@323
   487
variables using the nonterminal~\ndx{idt}.  This is the case in our
wenzelm@3108
   488
example.  Choosing {\tt id} instead of {\tt idt} is a common error.
lcp@323
   489
\end{warn}
lcp@323
   490
lcp@323
   491
lcp@323
   492
lcp@323
   493
\subsection{Applying rules}
lcp@323
   494
As a term is being parsed or printed, an \AST{} is generated as an
lcp@323
   495
intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
lcp@332
   496
normalised by applying macro rules in the manner of a traditional term
lcp@323
   497
rewriting system.  We first examine how a single rule is applied.
lcp@323
   498
lcp@332
   499
Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some
lcp@323
   500
translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
lcp@323
   501
instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
lcp@323
   502
matched by $l$ may be replaced by the corresponding instance of~$r$, thus
lcp@323
   503
{\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
lcp@323
   504
  place-holders} that may occur in rule patterns but not in ordinary
lcp@323
   505
\AST{}s; {\tt Variable} atoms serve this purpose.
lcp@323
   506
lcp@323
   507
The matching of the object~$u$ by the pattern~$l$ is performed as follows:
lcp@323
   508
\begin{itemize}
lcp@323
   509
  \item Every constant matches itself.
lcp@323
   510
lcp@323
   511
  \item $\Variable x$ in the object matches $\Constant x$ in the pattern.
lcp@323
   512
    This point is discussed further below.
lcp@323
   513
lcp@323
   514
  \item Every \AST{} in the object matches $\Variable x$ in the pattern,
lcp@323
   515
    binding~$x$ to~$u$.
lcp@323
   516
lcp@323
   517
  \item One application matches another if they have the same number of
lcp@323
   518
    subtrees and corresponding subtrees match.
lcp@323
   519
lcp@323
   520
  \item In every other case, matching fails.  In particular, {\tt
lcp@323
   521
      Constant}~$x$ can only match itself.
lcp@323
   522
\end{itemize}
lcp@323
   523
A successful match yields a substitution that is applied to~$r$, generating
lcp@323
   524
the instance that replaces~$u$.
lcp@323
   525
lcp@323
   526
The second case above may look odd.  This is where {\tt Variable}s of
lcp@323
   527
non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
lcp@323
   528
far removed from parse trees; at this level it is not yet known which
lcp@323
   529
identifiers will become constants, bounds, frees, types or classes.  As
lcp@323
   530
\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
wenzelm@864
   531
{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},
wenzelm@14947
   532
\ndx{tvar}, \ndx{num}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s.  On the other
wenzelm@864
   533
hand, when \AST{}s generated from terms for printing, all constants and type
wenzelm@864
   534
constructors become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may
wenzelm@864
   535
contain a messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
lcp@323
   536
insignificant at macro level because matching treats them alike.
lcp@323
   537
lcp@323
   538
Because of this behaviour, different kinds of atoms with the same name are
lcp@323
   539
indistinguishable, which may make some rules prone to misbehaviour.  Example:
lcp@323
   540
\begin{ttbox}
lcp@323
   541
types
lcp@323
   542
  Nil
lcp@323
   543
consts
clasohm@1387
   544
  Nil     :: 'a list
wenzelm@864
   545
syntax
clasohm@1387
   546
  "[]"    :: 'a list    ("[]")
lcp@323
   547
translations
lcp@323
   548
  "[]"    == "Nil"
lcp@323
   549
\end{ttbox}
lcp@323
   550
The term {\tt Nil} will be printed as {\tt []}, just as expected.
lcp@323
   551
The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
wenzelm@864
   552
expected!  Guess how type~{\tt Nil} is printed?
lcp@323
   553
wenzelm@14893
   554
Normalizing an \AST{} involves repeatedly applying macro rules until none are
wenzelm@14893
   555
applicable.  Macro rules are chosen in order of appearance in the theory
wenzelm@14893
   556
definitions.  You can watch the normalization of \AST{}s during parsing and
wenzelm@14893
   557
printing by setting \ttindex{Syntax.trace_ast} to {\tt true}.\index{tracing!of
wenzelm@14893
   558
  macros} The information displayed when tracing includes the \AST{} before
wenzelm@14893
   559
normalization ({\tt pre}), redexes with results ({\tt rewrote}), the normal
wenzelm@14893
   560
form finally reached ({\tt post}) and some statistics ({\tt normalize}).
lcp@323
   561
lcp@323
   562
\subsection{Example: the syntax of finite sets}
lcp@323
   563
\index{examples!of macros}
lcp@323
   564
lcp@323
   565
This example demonstrates the use of recursive macros to implement a
lcp@323
   566
convenient notation for finite sets.
lcp@323
   567
\index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}
lcp@323
   568
\index{"@Enum@{\tt\at Enum} constant}
lcp@323
   569
\index{"@Finset@{\tt\at Finset} constant}
lcp@323
   570
\begin{ttbox}
wenzelm@3108
   571
FinSyntax = SetSyntax +
lcp@323
   572
types
lcp@323
   573
  is
wenzelm@864
   574
syntax
clasohm@1387
   575
  ""            :: i => is                  ("_")
clasohm@1387
   576
  "{\at}Enum"       :: [i, is] => is            ("_,/ _")
wenzelm@864
   577
consts
clasohm@1387
   578
  empty         :: i                        ("{\ttlbrace}{\ttrbrace}")
clasohm@1387
   579
  insert        :: [i, i] => i
wenzelm@864
   580
syntax
clasohm@1387
   581
  "{\at}Finset"     :: is => i                  ("{\ttlbrace}(_){\ttrbrace}")
lcp@323
   582
translations
lcp@323
   583
  "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
lcp@323
   584
  "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
lcp@323
   585
end
lcp@323
   586
\end{ttbox}
lcp@323
   587
Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
lcp@323
   588
declarations above specify \verb|{x, y, z}| as the external representation
lcp@323
   589
of
lcp@323
   590
\begin{ttbox}
lcp@323
   591
insert(x, insert(y, insert(z, empty)))
lcp@323
   592
\end{ttbox}
lcp@323
   593
The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
lcp@323
   594
  i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
lcp@323
   595
allows a line break after the comma for \rmindex{pretty printing}; if no
lcp@323
   596
line break is required then a space is printed instead.
lcp@323
   597
lcp@323
   598
The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
wenzelm@864
   599
declaration.  Hence {\tt is} is not a logical type and may be used safely as
wenzelm@864
   600
a new nonterminal for custom syntax.  The nonterminal~{\tt is} can later be
paulson@3485
   601
re-used for other enumerations of type~{\tt i} like lists or tuples.  If we
wenzelm@864
   602
had needed polymorphic enumerations, we could have used the predefined
wenzelm@864
   603
nonterminal symbol \ndx{args} and skipped this part altogether.
lcp@323
   604
lcp@323
   605
\index{"@Finset@{\tt\at Finset} constant}
lcp@323
   606
Next follows {\tt empty}, which is already equipped with its syntax
lcp@323
   607
\verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
lcp@323
   608
constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
lcp@323
   609
  i} enclosed in curly braces.  Remember that a pair of parentheses, as in
lcp@323
   610
\verb|"{(_)}"|, specifies a block of indentation for pretty printing.
lcp@323
   611
lcp@323
   612
The translations may look strange at first.  Macro rules are best
lcp@323
   613
understood in their internal forms:
lcp@323
   614
\begin{ttbox}
lcp@323
   615
parse_rules:
lcp@323
   616
  ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
lcp@323
   617
  ("{\at}Finset" x)  ->  ("insert" x "empty")
lcp@323
   618
print_rules:
lcp@323
   619
  ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
lcp@323
   620
  ("insert" x "empty")  ->  ("{\at}Finset" x)
lcp@323
   621
\end{ttbox}
wenzelm@864
   622
This shows that \verb|{x,xs}| indeed matches any set enumeration of at least
lcp@323
   623
two elements, binding the first to {\tt x} and the rest to {\tt xs}.
wenzelm@864
   624
Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.
lcp@323
   625
The parse rules only work in the order given.
lcp@323
   626
lcp@323
   627
\begin{warn}
lcp@332
   628
  The \AST{} rewriter cannot distinguish constants from variables and looks
lcp@323
   629
  only for names of atoms.  Thus the names of {\tt Constant}s occurring in
lcp@323
   630
  the (internal) left-hand side of translation rules should be regarded as
lcp@323
   631
  \rmindex{reserved words}.  Choose non-identifiers like {\tt\at Finset} or
lcp@323
   632
  sufficiently long and strange names.  If a bound variable's name gets
lcp@323
   633
  rewritten, the result will be incorrect; for example, the term
lcp@323
   634
\begin{ttbox}
lcp@323
   635
\%empty insert. insert(x, empty)
lcp@323
   636
\end{ttbox}
wenzelm@864
   637
\par\noindent is incorrectly printed as \verb|%empty insert. {x}|.
lcp@323
   638
\end{warn}
lcp@323
   639
lcp@323
   640
lcp@323
   641
\subsection{Example: a parse macro for dependent types}\label{prod_trans}
lcp@323
   642
\index{examples!of macros}
lcp@323
   643
lcp@323
   644
As stated earlier, a macro rule may not introduce new {\tt Variable}s on
wenzelm@864
   645
the right-hand side.  Something like \verb|"K(B)" => "%x.B"| is illegal;
lcp@323
   646
if allowed, it could cause variable capture.  In such cases you usually
lcp@323
   647
must fall back on translation functions.  But a trick can make things
wenzelm@864
   648
readable in some cases: {\em calling\/} translation functions by parse
wenzelm@864
   649
macros:
lcp@323
   650
\begin{ttbox}
wenzelm@3135
   651
ProdSyntax = SetSyntax +
lcp@323
   652
consts
clasohm@1387
   653
  Pi            :: [i, i => i] => i
wenzelm@864
   654
syntax
clasohm@1387
   655
  "{\at}PROD"       :: [idt, i, i] => i       ("(3PROD _:_./ _)" 10)
clasohm@1387
   656
  "{\at}->"         :: [i, i] => i            ("(_ ->/ _)" [51, 50] 50)
lcp@323
   657
\ttbreak
lcp@323
   658
translations
lcp@323
   659
  "PROD x:A. B" => "Pi(A, \%x. B)"
lcp@323
   660
  "A -> B"      => "Pi(A, _K(B))"
lcp@323
   661
end
lcp@323
   662
ML
lcp@323
   663
  val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
lcp@323
   664
\end{ttbox}
lcp@323
   665
wenzelm@864
   666
Here {\tt Pi} is a logical constant for constructing general products.
lcp@323
   667
Two external forms exist: the general case {\tt PROD x:A.B} and the
lcp@323
   668
function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
lcp@323
   669
does not depend on~{\tt x}.
lcp@323
   670
wenzelm@864
   671
The second parse macro introduces {\tt _K(B)}, which later becomes
wenzelm@864
   672
\verb|%x.B| due to a parse translation associated with \cdx{_K}.
wenzelm@864
   673
Unfortunately there is no such trick for printing, so we have to add a {\tt
wenzelm@864
   674
ML} section for the print translation \ttindex{dependent_tr'}.
lcp@323
   675
lcp@323
   676
Recall that identifiers with a leading {\tt _} are allowed in translation
lcp@323
   677
rules, but not in ordinary terms.  Thus we can create \AST{}s containing
lcp@323
   678
names that are not directly expressible.
lcp@323
   679
paulson@8136
   680
The parse translation for {\tt _K} is already installed in Pure, and the
paulson@8136
   681
function {\tt dependent_tr'} is exported by the syntax module for public use.
paulson@8136
   682
See \S\ref{sec:tr_funs} below for more of the arcane lore of translation
paulson@8136
   683
functions.  \index{macros|)}\index{rewriting!syntactic|)}
lcp@323
   684
lcp@323
   685
lcp@323
   686
\section{Translation functions} \label{sec:tr_funs}
wenzelm@864
   687
\index{translations|(}
lcp@323
   688
%
wenzelm@9695
   689
This section describes the translation function mechanism.  By writing \ML{}
wenzelm@9695
   690
functions, you can do almost everything to terms or \AST{}s during parsing and
wenzelm@9695
   691
printing.  The logic LK is a good example of sophisticated transformations
wenzelm@9695
   692
between internal and external representations of sequents; here, macros would
wenzelm@9695
   693
be useless.
lcp@323
   694
lcp@323
   695
A full understanding of translations requires some familiarity
lcp@323
   696
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
lcp@323
   697
{\tt Syntax.ast} and the encodings of types and terms as such at the various
lcp@323
   698
stages of the parsing or printing process.  Most users should never need to
lcp@323
   699
use translation functions.
lcp@323
   700
lcp@323
   701
\subsection{Declaring translation functions}
wenzelm@3108
   702
There are four kinds of translation functions, with one of these
wenzelm@3108
   703
coming in two variants.  Each such function is associated with a name,
wenzelm@3108
   704
which triggers calls to it.  Such names can be constants (logical or
wenzelm@3108
   705
syntactic) or type constructors.
lcp@323
   706
wenzelm@6343
   707
Function {\tt print_syntax} displays the sets of names associated with the
wenzelm@6343
   708
translation functions of a theory under \texttt{parse_ast_translation}, etc.
wenzelm@6343
   709
You can add new ones via the {\tt ML} section\index{*ML section} of a theory
wenzelm@6343
   710
definition file.  Even though the {\tt ML} section is the very last part of
wenzelm@6343
   711
the file, newly installed translation functions are already effective when
wenzelm@6343
   712
processing all of the preceding sections.
lcp@323
   713
wenzelm@3108
   714
The {\tt ML} section's contents are simply copied verbatim near the
wenzelm@3108
   715
beginning of the \ML\ file generated from a theory definition file.
wenzelm@3108
   716
Definitions made here are accessible as components of an \ML\ 
wenzelm@3108
   717
structure; to make some parts private, use an \ML{} {\tt local}
wenzelm@3108
   718
declaration.  The {\ML} code may install translation functions by
wenzelm@3108
   719
declaring any of the following identifiers:
lcp@323
   720
\begin{ttbox}
wenzelm@3108
   721
val parse_ast_translation   : (string * (ast list -> ast)) list
wenzelm@3108
   722
val print_ast_translation   : (string * (ast list -> ast)) list
wenzelm@3108
   723
val parse_translation       : (string * (term list -> term)) list
wenzelm@3108
   724
val print_translation       : (string * (term list -> term)) list
wenzelm@4375
   725
val typed_print_translation :
wenzelm@4375
   726
    (string * (bool -> typ -> term list -> term)) list
lcp@323
   727
\end{ttbox}
lcp@323
   728
lcp@323
   729
\subsection{The translation strategy}
wenzelm@3108
   730
The different kinds of translation functions are called during the
wenzelm@3108
   731
transformations between parse trees, \AST{}s and terms (recall
wenzelm@3108
   732
Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
wenzelm@3108
   733
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation
wenzelm@3108
   734
function $f$ of appropriate kind exists for $c$, the result is
wenzelm@3108
   735
computed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
lcp@323
   736
wenzelm@3108
   737
For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.
wenzelm@3108
   738
A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1,
wenzelm@3108
   739
  \ldots, x@n}$.  For term translations, the arguments are terms and a
wenzelm@3108
   740
combination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const}
wenzelm@3108
   741
(c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more
wenzelm@3108
   742
sophisticated transformations than \AST{}s do, typically involving
wenzelm@3108
   743
abstractions and bound variables. {\em Typed} print translations may
wenzelm@4375
   744
even peek at the type $\tau$ of the constant they are invoked on; they
wenzelm@4375
   745
are also passed the current value of the \ttindex{show_sorts} flag.
lcp@323
   746
wenzelm@3108
   747
Regardless of whether they act on terms or \AST{}s, translation
wenzelm@3108
   748
functions called during the parsing process differ from those for
wenzelm@3108
   749
printing more fundamentally in their overall behaviour:
lcp@323
   750
\begin{description}
wenzelm@6343
   751
\item[Parse translations] are applied bottom-up.  The arguments are already in
wenzelm@6343
   752
  translated form.  The translations must not fail; exceptions trigger an
wenzelm@6343
   753
  error message.  There may never be more than one function associated with
wenzelm@6343
   754
  any syntactic name.
wenzelm@6343
   755
  
lcp@323
   756
\item[Print translations] are applied top-down.  They are supplied with
lcp@323
   757
  arguments that are partly still in internal form.  The result again
wenzelm@6343
   758
  undergoes translation; therefore a print translation should not introduce as
wenzelm@6343
   759
  head the very constant that invoked it.  The function may raise exception
wenzelm@6343
   760
  \xdx{Match} to indicate failure; in this event it has no effect.  Multiple
wenzelm@6343
   761
  functions associated with some syntactic name are tried in an unspecified
wenzelm@6343
   762
  order.
lcp@323
   763
\end{description}
lcp@323
   764
lcp@323
   765
Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
lcp@323
   766
\ttindex{Const} for terms --- can invoke translation functions.  This
lcp@323
   767
causes another difference between parsing and printing.
lcp@323
   768
lcp@323
   769
Parsing starts with a string and the constants are not yet identified.
lcp@323
   770
Only parse tree heads create {\tt Constant}s in the resulting \AST, as
lcp@323
   771
described in \S\ref{sec:astofpt}.  Macros and parse \AST{} translations may
lcp@323
   772
introduce further {\tt Constant}s.  When the final \AST{} is converted to a
lcp@323
   773
term, all {\tt Constant}s become {\tt Const}s, as described in
lcp@323
   774
\S\ref{sec:termofast}.
lcp@323
   775
lcp@323
   776
Printing starts with a well-typed term and all the constants are known.  So
lcp@323
   777
all logical constants and type constructors may invoke print translations.
lcp@323
   778
These, and macros, may introduce further constants.
lcp@323
   779
lcp@323
   780
lcp@323
   781
\subsection{Example: a print translation for dependent types}
lcp@323
   782
\index{examples!of translations}\indexbold{*dependent_tr'}
lcp@323
   783
lcp@323
   784
Let us continue the dependent type example (page~\pageref{prod_trans}) by
lcp@323
   785
examining the parse translation for~\cdx{_K} and the print translation
lcp@323
   786
{\tt dependent_tr'}, which are both built-in.  By convention, parse
lcp@323
   787
translations have names ending with {\tt _tr} and print translations have
lcp@323
   788
names ending with {\tt _tr'}.  Search for such names in the Isabelle
lcp@323
   789
sources to locate more examples.
lcp@323
   790
lcp@323
   791
Here is the parse translation for {\tt _K}:
lcp@323
   792
\begin{ttbox}
lcp@323
   793
fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
wenzelm@864
   794
  | k_tr ts = raise TERM ("k_tr", ts);
lcp@323
   795
\end{ttbox}
lcp@323
   796
If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
lcp@323
   797
{\tt Abs} node with a body derived from $t$.  Since terms given to parse
lcp@323
   798
translations are not yet typed, the type of the bound variable in the new
lcp@323
   799
{\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
lcp@323
   800
nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
lcp@323
   801
a basic term manipulation function defined in {\tt Pure/term.ML}.
lcp@323
   802
lcp@323
   803
Here is the print translation for dependent types:
lcp@323
   804
\begin{ttbox}
wenzelm@864
   805
fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
lcp@323
   806
      if 0 mem (loose_bnos B) then
wenzelm@3108
   807
        let val (x', B') = Syntax.variant_abs' (x, dummyT, B) in
wenzelm@3108
   808
          list_comb
paulson@8136
   809
            (Const (q,dummyT) $
paulson@8136
   810
             Syntax.mark_boundT (x',{\thinspace}T) $ A $ B', ts)
lcp@323
   811
        end
lcp@323
   812
      else list_comb (Const (r, dummyT) $ A $ B, ts)
lcp@323
   813
  | dependent_tr' _ _ = raise Match;
lcp@323
   814
\end{ttbox}
wenzelm@3135
   815
The argument {\tt (q,{\thinspace}r)} is supplied to the curried function {\tt
wenzelm@3108
   816
  dependent_tr'} by a partial application during its installation.
wenzelm@3108
   817
For example, we could set up print translations for both {\tt Pi} and
wenzelm@3108
   818
{\tt Sigma} by including
lcp@323
   819
\begin{ttbox}\index{*ML section}
lcp@323
   820
val print_translation =
lcp@323
   821
  [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
lcp@323
   822
   ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
lcp@323
   823
\end{ttbox}
wenzelm@3108
   824
within the {\tt ML} section.  The first of these transforms ${\tt
wenzelm@3108
   825
  Pi}(A, \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
wenzelm@3108
   826
$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not
wenzelm@3108
   827
depend on~$x$.  It checks this using \ttindex{loose_bnos}, yet another
wenzelm@3108
   828
function from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$
wenzelm@3108
   829
renamed away from all names in $B$, and $B'$ is the body $B$ with {\tt
wenzelm@3108
   830
  Bound} nodes referring to the {\tt Abs} node replaced by
wenzelm@3108
   831
$\ttfct{Free} (x', \mtt{dummyT})$ (but marked as representing a bound
wenzelm@3108
   832
variable).
lcp@323
   833
lcp@323
   834
We must be careful with types here.  While types of {\tt Const}s are
lcp@323
   835
ignored, type constraints may be printed for some {\tt Free}s and
lcp@323
   836
{\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
lcp@323
   837
\ttindex{dummyT} are never printed with constraint, though.  The line
lcp@323
   838
\begin{ttbox}
wenzelm@3108
   839
        let val (x', B') = Syntax.variant_abs' (x, dummyT, B);
wenzelm@3108
   840
\end{ttbox}\index{*Syntax.variant_abs'}
lcp@323
   841
replaces bound variable occurrences in~$B$ by the free variable $x'$ with
lcp@323
   842
type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
lcp@323
   843
correct type~{\tt T}, so this is the only place where a type
wenzelm@864
   844
constraint might appear.
wenzelm@3108
   845
wenzelm@3108
   846
Also note that we are responsible to mark free identifiers that
paulson@3485
   847
actually represent bound variables.  This is achieved by
wenzelm@3108
   848
\ttindex{Syntax.variant_abs'} and \ttindex{Syntax.mark_boundT} above.
wenzelm@3108
   849
Failing to do so may cause these names to be printed in the wrong
wenzelm@3108
   850
style.  \index{translations|)} \index{syntax!transformations|)}
wenzelm@3108
   851
wenzelm@3108
   852
wenzelm@3108
   853
\section{Token translations} \label{sec:tok_tr}
wenzelm@3108
   854
\index{token translations|(}
wenzelm@3108
   855
%
wenzelm@3108
   856
Isabelle's meta-logic features quite a lot of different kinds of
wenzelm@3108
   857
identifiers, namely {\em class}, {\em tfree}, {\em tvar}, {\em free},
paulson@3485
   858
{\em bound}, {\em var}.  One might want to have these printed in
wenzelm@3108
   859
different styles, e.g.\ in bold or italic, or even transcribed into
wenzelm@3108
   860
something more readable like $\alpha, \alpha', \beta$ instead of {\tt
paulson@3485
   861
  'a}, {\tt 'aa}, {\tt 'b} for type variables.  Token translations
wenzelm@3108
   862
provide a means to such ends, enabling the user to install certain
wenzelm@3108
   863
\ML{} functions associated with any logical \rmindex{token class} and
wenzelm@3108
   864
depending on some \rmindex{print mode}.
wenzelm@3108
   865
wenzelm@3108
   866
The logical class of identifiers can not necessarily be determined by
paulson@3485
   867
its syntactic category, though.  For example, consider free vs.\ bound
paulson@3485
   868
variables.  So Isabelle's pretty printing mechanism, starting from
wenzelm@3108
   869
fully typed terms, has to be careful to preserve this additional
wenzelm@3108
   870
information\footnote{This is done by marking atoms in abstract syntax
paulson@3485
   871
  trees appropriately.  The marks are actually visible by print
wenzelm@3108
   872
  translation functions -- they are just special constants applied to
wenzelm@3108
   873
  atomic asts, for example \texttt{("_bound" x)}.}.  In particular,
wenzelm@3108
   874
user-supplied print translation functions operating on terms have to
paulson@3485
   875
be well-behaved in this respect.  Free identifiers introduced to
wenzelm@3108
   876
represent bound variables have to be marked appropriately (cf.\ the
wenzelm@3108
   877
example at the end of \S\ref{sec:tr_funs}).
wenzelm@3108
   878
wenzelm@3108
   879
\medskip Token translations may be installed by declaring the
wenzelm@6343
   880
\ttindex{token_translation} value within the \texttt{ML} section of a theory
wenzelm@6343
   881
definition file:
wenzelm@3108
   882
\begin{ttbox}
paulson@8136
   883
val token_translation: 
paulson@8136
   884
      (string * string * (string -> string * real)) list
wenzelm@8701
   885
\end{ttbox}
wenzelm@6343
   886
The elements of this list are of the form $(m, c, f)$, where $m$ is a print
wenzelm@6343
   887
mode identifier, $c$ a token class, and $f\colon string \to string \times
wenzelm@6343
   888
real$ the actual translation function.  Assuming that $x$ is of identifier
wenzelm@6343
   889
class $c$, and print mode $m$ is the first (active) mode providing some
wenzelm@6343
   890
translation for $c$, then $x$ is output according to $f(x) = (x', len)$.
wenzelm@6343
   891
Thereby $x'$ is the modified identifier name and $len$ its visual length in
wenzelm@6343
   892
terms of characters (e.g.\ length 1.0 would correspond to $1/2$\,em in
wenzelm@6343
   893
\LaTeX).  Thus $x'$ may include non-printing parts like control sequences or
wenzelm@6343
   894
markup information for typesetting systems.
wenzelm@3108
   895
wenzelm@3108
   896
wenzelm@3108
   897
\index{token translations|)}
wenzelm@5371
   898
wenzelm@5371
   899
wenzelm@5371
   900
%%% Local Variables: 
wenzelm@5371
   901
%%% mode: latex
wenzelm@5371
   902
%%% TeX-master: "ref"
wenzelm@5371
   903
%%% End: