wenzelm@30184
|
1 |
|
lcp@323
|
2 |
\chapter{Syntax Transformations} \label{chap:syntax}
|
lcp@323
|
3 |
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
|
lcp@323
|
4 |
\newcommand\mtt[1]{\mbox{\tt #1}}
|
lcp@323
|
5 |
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
|
lcp@323
|
6 |
\newcommand\Constant{\ttfct{Constant}}
|
lcp@323
|
7 |
\newcommand\Variable{\ttfct{Variable}}
|
lcp@323
|
8 |
\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
|
lcp@323
|
9 |
\index{syntax!transformations|(}
|
lcp@323
|
10 |
|
lcp@323
|
11 |
This chapter is intended for experienced Isabelle users who need to define
|
lcp@323
|
12 |
macros or code their own translation functions. It describes the
|
lcp@323
|
13 |
transformations between parse trees, abstract syntax trees and terms.
|
lcp@323
|
14 |
|
lcp@323
|
15 |
|
lcp@323
|
16 |
\section{Abstract syntax trees} \label{sec:asts}
|
wenzelm@864
|
17 |
\index{ASTs|(}
|
lcp@323
|
18 |
|
lcp@323
|
19 |
The parser, given a token list from the lexer, applies productions to yield
|
lcp@323
|
20 |
a parse tree\index{parse trees}. By applying some internal transformations
|
lcp@323
|
21 |
the parse tree becomes an abstract syntax tree, or \AST{}. Macro
|
lcp@323
|
22 |
expansion, further translations and finally type inference yields a
|
lcp@323
|
23 |
well-typed term. The printing process is the reverse, except for some
|
lcp@323
|
24 |
subtleties to be discussed later.
|
lcp@323
|
25 |
|
lcp@323
|
26 |
Figure~\ref{fig:parse_print} outlines the parsing and printing process.
|
lcp@323
|
27 |
Much of the complexity is due to the macro mechanism. Using macros, you
|
lcp@323
|
28 |
can specify most forms of concrete syntax without writing any \ML{} code.
|
lcp@323
|
29 |
|
lcp@323
|
30 |
\begin{figure}
|
lcp@323
|
31 |
\begin{center}
|
lcp@323
|
32 |
\begin{tabular}{cl}
|
lcp@323
|
33 |
string & \\
|
wenzelm@3108
|
34 |
$\downarrow$ & lexer, parser \\
|
lcp@323
|
35 |
parse tree & \\
|
lcp@323
|
36 |
$\downarrow$ & parse \AST{} translation \\
|
lcp@323
|
37 |
\AST{} & \\
|
lcp@323
|
38 |
$\downarrow$ & \AST{} rewriting (macros) \\
|
lcp@323
|
39 |
\AST{} & \\
|
lcp@323
|
40 |
$\downarrow$ & parse translation, type inference \\
|
lcp@323
|
41 |
--- well-typed term --- & \\
|
lcp@323
|
42 |
$\downarrow$ & print translation \\
|
lcp@323
|
43 |
\AST{} & \\
|
lcp@323
|
44 |
$\downarrow$ & \AST{} rewriting (macros) \\
|
lcp@323
|
45 |
\AST{} & \\
|
wenzelm@3108
|
46 |
$\downarrow$ & print \AST{} translation, token translation \\
|
lcp@323
|
47 |
string &
|
lcp@323
|
48 |
\end{tabular}
|
lcp@323
|
49 |
|
lcp@323
|
50 |
\end{center}
|
lcp@323
|
51 |
\caption{Parsing and printing}\label{fig:parse_print}
|
lcp@323
|
52 |
\end{figure}
|
lcp@323
|
53 |
|
lcp@323
|
54 |
Abstract syntax trees are an intermediate form between the raw parse trees
|
lcp@323
|
55 |
and the typed $\lambda$-terms. An \AST{} is either an atom (constant or
|
lcp@323
|
56 |
variable) or a list of {\em at least two\/} subtrees. Internally, they
|
lcp@323
|
57 |
have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
|
lcp@323
|
58 |
\index{*Appl}
|
lcp@323
|
59 |
\begin{ttbox}
|
lcp@323
|
60 |
datatype ast = Constant of string
|
lcp@323
|
61 |
| Variable of string
|
lcp@323
|
62 |
| Appl of ast list
|
lcp@323
|
63 |
\end{ttbox}
|
lcp@323
|
64 |
%
|
lcp@323
|
65 |
Isabelle uses an S-expression syntax for abstract syntax trees. Constant
|
lcp@323
|
66 |
atoms are shown as quoted strings, variable atoms as non-quoted strings and
|
lcp@332
|
67 |
applications as a parenthesised list of subtrees. For example, the \AST
|
lcp@323
|
68 |
\begin{ttbox}
|
lcp@323
|
69 |
Appl [Constant "_constrain",
|
lcp@323
|
70 |
Appl [Constant "_abs", Variable "x", Variable "t"],
|
lcp@323
|
71 |
Appl [Constant "fun", Variable "'a", Variable "'b"]]
|
lcp@323
|
72 |
\end{ttbox}
|
lcp@323
|
73 |
is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
|
lcp@323
|
74 |
Both {\tt ()} and {\tt (f)} are illegal because they have too few
|
wenzelm@864
|
75 |
subtrees.
|
lcp@323
|
76 |
|
lcp@323
|
77 |
The resemblance to Lisp's S-expressions is intentional, but there are two
|
lcp@323
|
78 |
kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the
|
lcp@323
|
79 |
names {\tt Constant} and {\tt Variable} too literally; in the later
|
lcp@323
|
80 |
translation to terms, $\Variable x$ may become a constant, free or bound
|
lcp@323
|
81 |
variable, even a type constructor or class name; the actual outcome depends
|
lcp@323
|
82 |
on the context.
|
lcp@323
|
83 |
|
lcp@323
|
84 |
Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
|
lcp@323
|
85 |
application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of
|
lcp@323
|
86 |
application is determined later by context; it could be a type constructor
|
lcp@323
|
87 |
applied to types.
|
lcp@323
|
88 |
|
lcp@323
|
89 |
Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
|
lcp@323
|
90 |
first-order: the {\tt "_abs"} does not bind the {\tt x} in any way. Later
|
lcp@323
|
91 |
at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
|
lcp@323
|
92 |
occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
|
lcp@323
|
93 |
constructor \ttindex{Bound}).
|
lcp@323
|
94 |
|
lcp@323
|
95 |
|
wenzelm@6618
|
96 |
\section{Transforming parse trees to ASTs}\label{sec:astofpt}
|
lcp@323
|
97 |
\index{ASTs!made from parse trees}
|
lcp@323
|
98 |
\newcommand\astofpt[1]{\lbrakk#1\rbrakk}
|
lcp@323
|
99 |
|
lcp@323
|
100 |
The parse tree is the raw output of the parser. Translation functions,
|
lcp@323
|
101 |
called {\bf parse AST translations}\indexbold{translations!parse AST},
|
lcp@323
|
102 |
transform the parse tree into an abstract syntax tree.
|
lcp@323
|
103 |
|
lcp@323
|
104 |
The parse tree is constructed by nesting the right-hand sides of the
|
lcp@323
|
105 |
productions used to recognize the input. Such parse trees are simply lists
|
lcp@323
|
106 |
of tokens and constituent parse trees, the latter representing the
|
lcp@323
|
107 |
nonterminals of the productions. Let us refer to the actual productions in
|
wenzelm@864
|
108 |
the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
|
wenzelm@864
|
109 |
example).
|
lcp@323
|
110 |
|
lcp@323
|
111 |
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
|
lcp@323
|
112 |
by stripping out delimiters and copy productions. More precisely, the
|
lcp@323
|
113 |
mapping $\astofpt{-}$ is derived from the productions as follows:
|
lcp@323
|
114 |
\begin{itemize}
|
lcp@323
|
115 |
\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
|
wenzelm@14947
|
116 |
\ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{num}, \ndx{xnum} or \ndx{xstr} token,
|
wenzelm@14947
|
117 |
and $s$ its associated string. Note that for {\tt xstr} this does not
|
wenzelm@14947
|
118 |
include the quotes.
|
lcp@323
|
119 |
|
lcp@323
|
120 |
\item Copy productions:\index{productions!copy}
|
lcp@323
|
121 |
$\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for
|
lcp@323
|
122 |
strings of delimiters, which are discarded. $P$ stands for the single
|
lcp@323
|
123 |
constituent that is not a delimiter; it is either a nonterminal symbol or
|
lcp@323
|
124 |
a name token.
|
lcp@323
|
125 |
|
lcp@323
|
126 |
\item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
|
lcp@323
|
127 |
Here there are no constituents other than delimiters, which are
|
wenzelm@864
|
128 |
discarded.
|
lcp@323
|
129 |
|
lcp@323
|
130 |
\item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
|
lcp@323
|
131 |
the remaining constituents $P@1$, \ldots, $P@n$ are built into an
|
lcp@323
|
132 |
application whose head constant is~$c$:
|
wenzelm@864
|
133 |
\[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
|
lcp@323
|
134 |
\Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
|
lcp@323
|
135 |
\]
|
lcp@323
|
136 |
\end{itemize}
|
lcp@323
|
137 |
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
|
lcp@323
|
138 |
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
|
lcp@323
|
139 |
These examples illustrate the need for further translations to make \AST{}s
|
lcp@323
|
140 |
closer to the typed $\lambda$-calculus. The Pure syntax provides
|
lcp@323
|
141 |
predefined parse \AST{} translations\index{translations!parse AST} for
|
lcp@323
|
142 |
ordinary applications, type applications, nested abstractions, meta
|
lcp@323
|
143 |
implications and function types. Figure~\ref{fig:parse_ast_tr} shows their
|
lcp@323
|
144 |
effect on some representative input strings.
|
lcp@323
|
145 |
|
lcp@323
|
146 |
|
lcp@323
|
147 |
\begin{figure}
|
lcp@323
|
148 |
\begin{center}
|
lcp@323
|
149 |
\tt\begin{tabular}{ll}
|
lcp@323
|
150 |
\rm input string & \rm \AST \\\hline
|
lcp@323
|
151 |
"f" & f \\
|
lcp@323
|
152 |
"'a" & 'a \\
|
lcp@323
|
153 |
"t == u" & ("==" t u) \\
|
lcp@323
|
154 |
"f(x)" & ("_appl" f x) \\
|
lcp@323
|
155 |
"f(x, y)" & ("_appl" f ("_args" x y)) \\
|
lcp@323
|
156 |
"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\
|
lcp@323
|
157 |
"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\
|
lcp@323
|
158 |
\end{tabular}
|
lcp@323
|
159 |
\end{center}
|
wenzelm@864
|
160 |
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
|
lcp@323
|
161 |
\end{figure}
|
lcp@323
|
162 |
|
lcp@323
|
163 |
\begin{figure}
|
lcp@323
|
164 |
\begin{center}
|
lcp@323
|
165 |
\tt\begin{tabular}{ll}
|
lcp@323
|
166 |
\rm input string & \rm \AST{} \\\hline
|
lcp@323
|
167 |
"f(x, y, z)" & (f x y z) \\
|
lcp@323
|
168 |
"'a ty" & (ty 'a) \\
|
lcp@323
|
169 |
"('a, 'b) ty" & (ty 'a 'b) \\
|
lcp@323
|
170 |
"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\
|
lcp@323
|
171 |
"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\
|
lcp@323
|
172 |
"[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\
|
lcp@323
|
173 |
"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
|
lcp@323
|
174 |
\end{tabular}
|
lcp@323
|
175 |
\end{center}
|
lcp@323
|
176 |
\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
|
lcp@323
|
177 |
\end{figure}
|
lcp@323
|
178 |
|
lcp@323
|
179 |
The names of constant heads in the \AST{} control the translation process.
|
lcp@323
|
180 |
The list of constants invoking parse \AST{} translations appears in the
|
wenzelm@864
|
181 |
output of {\tt print_syntax} under {\tt parse_ast_translation}.
|
lcp@323
|
182 |
|
lcp@323
|
183 |
|
wenzelm@6618
|
184 |
\section{Transforming ASTs to terms}\label{sec:termofast}
|
lcp@323
|
185 |
\index{terms!made from ASTs}
|
lcp@323
|
186 |
\newcommand\termofast[1]{\lbrakk#1\rbrakk}
|
lcp@323
|
187 |
|
lcp@323
|
188 |
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
|
lcp@323
|
189 |
transformed into a term. This term is probably ill-typed since type
|
lcp@323
|
190 |
inference has not occurred yet. The term may contain type constraints
|
lcp@323
|
191 |
consisting of applications with head {\tt "_constrain"}; the second
|
lcp@323
|
192 |
argument is a type encoded as a term. Type inference later introduces
|
lcp@323
|
193 |
correct types or rejects the input.
|
lcp@323
|
194 |
|
lcp@323
|
195 |
Another set of translation functions, namely parse
|
lcp@323
|
196 |
translations\index{translations!parse}, may affect this process. If we
|
lcp@323
|
197 |
ignore parse translations for the time being, then \AST{}s are transformed
|
lcp@323
|
198 |
to terms by mapping \AST{} constants to constants, \AST{} variables to
|
lcp@323
|
199 |
schematic or free variables and \AST{} applications to applications.
|
lcp@323
|
200 |
|
lcp@323
|
201 |
More precisely, the mapping $\termofast{-}$ is defined by
|
lcp@323
|
202 |
\begin{itemize}
|
lcp@323
|
203 |
\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
|
lcp@323
|
204 |
\mtt{dummyT})$.
|
lcp@323
|
205 |
|
lcp@323
|
206 |
\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
|
lcp@323
|
207 |
\ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
|
lcp@323
|
208 |
the index extracted from~$xi$.
|
lcp@323
|
209 |
|
lcp@323
|
210 |
\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
|
lcp@323
|
211 |
\mtt{dummyT})$.
|
lcp@323
|
212 |
|
lcp@323
|
213 |
\item Function applications with $n$ arguments:
|
wenzelm@864
|
214 |
\[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
|
lcp@323
|
215 |
\termofast{f} \ttapp
|
lcp@323
|
216 |
\termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
|
lcp@323
|
217 |
\]
|
lcp@323
|
218 |
\end{itemize}
|
lcp@323
|
219 |
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
|
lcp@323
|
220 |
\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
|
lcp@323
|
221 |
while \ttindex{dummyT} stands for some dummy type that is ignored during
|
lcp@323
|
222 |
type inference.
|
lcp@323
|
223 |
|
lcp@323
|
224 |
So far the outcome is still a first-order term. Abstractions and bound
|
lcp@323
|
225 |
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
|
lcp@323
|
226 |
by parse translations. Such translations are attached to {\tt "_abs"},
|
lcp@323
|
227 |
{\tt "!!"} and user-defined binders.
|
lcp@323
|
228 |
|
lcp@323
|
229 |
|
lcp@323
|
230 |
\section{Printing of terms}
|
lcp@323
|
231 |
\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
|
lcp@323
|
232 |
|
lcp@323
|
233 |
The output phase is essentially the inverse of the input phase. Terms are
|
lcp@323
|
234 |
translated via abstract syntax trees into strings. Finally the strings are
|
lcp@323
|
235 |
pretty printed.
|
lcp@323
|
236 |
|
lcp@323
|
237 |
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
|
lcp@323
|
238 |
terms into \AST{}s. Ignoring those, the transformation maps
|
lcp@323
|
239 |
term constants, variables and applications to the corresponding constructs
|
lcp@323
|
240 |
on \AST{}s. Abstractions are mapped to applications of the special
|
lcp@323
|
241 |
constant {\tt _abs}.
|
lcp@323
|
242 |
|
lcp@323
|
243 |
More precisely, the mapping $\astofterm{-}$ is defined as follows:
|
lcp@323
|
244 |
\begin{itemize}
|
lcp@323
|
245 |
\item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
|
lcp@323
|
246 |
|
lcp@323
|
247 |
\item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
|
lcp@323
|
248 |
\tau)$.
|
lcp@323
|
249 |
|
lcp@323
|
250 |
\item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
|
lcp@323
|
251 |
\mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
|
lcp@323
|
252 |
the {\tt indexname} $(x, i)$.
|
lcp@323
|
253 |
|
lcp@323
|
254 |
\item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
|
lcp@323
|
255 |
of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
|
lcp@323
|
256 |
be obtained from~$t$ by replacing all bound occurrences of~$x$ by
|
lcp@323
|
257 |
the free variable $x'$. This replaces corresponding occurrences of the
|
lcp@323
|
258 |
constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
|
lcp@323
|
259 |
\mtt{dummyT})$:
|
wenzelm@864
|
260 |
\[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
|
wenzelm@864
|
261 |
\Appl{\Constant \mtt{"_abs"},
|
paulson@8136
|
262 |
constrain(\Variable x', \tau), \astofterm{t'}}
|
lcp@323
|
263 |
\]
|
lcp@323
|
264 |
|
wenzelm@864
|
265 |
\item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
|
lcp@323
|
266 |
The occurrence of constructor \ttindex{Bound} should never happen
|
lcp@323
|
267 |
when printing well-typed terms; it indicates a de Bruijn index with no
|
lcp@323
|
268 |
matching abstraction.
|
lcp@323
|
269 |
|
lcp@323
|
270 |
\item Where $f$ is not an application,
|
wenzelm@864
|
271 |
\[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
|
lcp@323
|
272 |
\Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
|
lcp@323
|
273 |
\]
|
lcp@323
|
274 |
\end{itemize}
|
lcp@323
|
275 |
%
|
lcp@332
|
276 |
Type constraints\index{type constraints} are inserted to allow the printing
|
lcp@332
|
277 |
of types. This is governed by the boolean variable \ttindex{show_types}:
|
lcp@323
|
278 |
\begin{itemize}
|
lcp@323
|
279 |
\item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
|
lcp@332
|
280 |
\ttindex{show_types} is set to {\tt false}.
|
lcp@323
|
281 |
|
wenzelm@864
|
282 |
\item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
|
wenzelm@864
|
283 |
\astofterm{\tau}}$ \ otherwise.
|
lcp@323
|
284 |
|
lcp@323
|
285 |
Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
|
lcp@323
|
286 |
constructors go to {\tt Constant}s; type identifiers go to {\tt
|
lcp@323
|
287 |
Variable}s; type applications go to {\tt Appl}s with the type
|
lcp@323
|
288 |
constructor as the first element. If \ttindex{show_sorts} is set to
|
lcp@323
|
289 |
{\tt true}, some type variables are decorated with an \AST{} encoding
|
lcp@323
|
290 |
of their sort.
|
lcp@323
|
291 |
\end{itemize}
|
lcp@323
|
292 |
%
|
lcp@323
|
293 |
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
|
lcp@323
|
294 |
transformed into the final output string. The built-in {\bf print AST
|
lcp@332
|
295 |
translations}\indexbold{translations!print AST} reverse the
|
lcp@323
|
296 |
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
|
lcp@323
|
297 |
|
lcp@323
|
298 |
For the actual printing process, the names attached to productions
|
lcp@323
|
299 |
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
|
lcp@323
|
300 |
a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
|
lcp@323
|
301 |
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
|
lcp@323
|
302 |
for~$c$. Each argument~$x@i$ is converted to a string, and put in
|
lcp@323
|
303 |
parentheses if its priority~$(p@i)$ requires this. The resulting strings
|
lcp@323
|
304 |
and their syntactic sugar (denoted by \dots{} above) are joined to make a
|
lcp@323
|
305 |
single string.
|
lcp@323
|
306 |
|
wenzelm@3108
|
307 |
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments
|
wenzelm@3108
|
308 |
than the corresponding production, it is first split into
|
wenzelm@3108
|
309 |
$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$. Applications
|
wenzelm@3108
|
310 |
with too few arguments or with non-constant head or without a
|
wenzelm@3108
|
311 |
corresponding production are printed as $f(x@1, \ldots, x@l)$ or
|
wenzelm@3108
|
312 |
$(\alpha@1, \ldots, \alpha@l) ty$. Multiple productions associated
|
wenzelm@3108
|
313 |
with some name $c$ are tried in order of appearance. An occurrence of
|
lcp@323
|
314 |
$\Variable x$ is simply printed as~$x$.
|
lcp@323
|
315 |
|
lcp@323
|
316 |
Blanks are {\em not\/} inserted automatically. If blanks are required to
|
lcp@323
|
317 |
separate tokens, specify them in the mixfix declaration, possibly preceded
|
lcp@323
|
318 |
by a slash~({\tt/}) to allow a line break.
|
lcp@323
|
319 |
\index{ASTs|)}
|
lcp@323
|
320 |
|
lcp@323
|
321 |
|
lcp@323
|
322 |
|
wenzelm@3108
|
323 |
\section{Macros: syntactic rewriting} \label{sec:macros}
|
wenzelm@864
|
324 |
\index{macros|(}\index{rewriting!syntactic|(}
|
lcp@323
|
325 |
|
lcp@323
|
326 |
Mixfix declarations alone can handle situations where there is a direct
|
lcp@323
|
327 |
connection between the concrete syntax and the underlying term. Sometimes
|
lcp@323
|
328 |
we require a more elaborate concrete syntax, such as quantifiers and list
|
lcp@323
|
329 |
notation. Isabelle's {\bf macros} and {\bf translation functions} can
|
lcp@323
|
330 |
perform translations such as
|
lcp@323
|
331 |
\begin{center}\tt
|
lcp@323
|
332 |
\begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
|
lcp@323
|
333 |
ALL x:A.P & Ball(A, \%x.P) \\ \relax
|
lcp@323
|
334 |
[x, y, z] & Cons(x, Cons(y, Cons(z, Nil)))
|
lcp@323
|
335 |
\end{tabular}
|
lcp@323
|
336 |
\end{center}
|
lcp@323
|
337 |
Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
|
lcp@323
|
338 |
are the most powerful translation mechanism but are difficult to read or
|
lcp@323
|
339 |
write. Macros are specified by first-order rewriting systems that operate
|
lcp@323
|
340 |
on abstract syntax trees. They are usually easy to read and write, and can
|
lcp@323
|
341 |
express all but the most obscure translations.
|
lcp@323
|
342 |
|
wenzelm@3108
|
343 |
Figure~\ref{fig:set_trans} defines a fragment of first-order logic and
|
wenzelm@3108
|
344 |
set theory.\footnote{This and the following theories are complete
|
wenzelm@3108
|
345 |
working examples, though they specify only syntax, no axioms. The
|
wenzelm@3108
|
346 |
file {\tt ZF/ZF.thy} presents a full set theory definition,
|
wenzelm@3108
|
347 |
including many macro rules.} Theory {\tt SetSyntax} declares
|
wenzelm@3108
|
348 |
constants for set comprehension ({\tt Collect}), replacement ({\tt
|
wenzelm@3108
|
349 |
Replace}) and bounded universal quantification ({\tt Ball}). Each
|
wenzelm@3108
|
350 |
of these binds some variables. Without additional syntax we should
|
wenzelm@3108
|
351 |
have to write $\forall x \in A. P$ as {\tt Ball(A,\%x.P)}, and
|
wenzelm@3108
|
352 |
similarly for the others.
|
lcp@323
|
353 |
|
lcp@323
|
354 |
\begin{figure}
|
lcp@323
|
355 |
\begin{ttbox}
|
wenzelm@3108
|
356 |
SetSyntax = Pure +
|
lcp@323
|
357 |
types
|
wenzelm@864
|
358 |
i o
|
lcp@323
|
359 |
arities
|
lcp@323
|
360 |
i, o :: logic
|
lcp@323
|
361 |
consts
|
clasohm@1387
|
362 |
Trueprop :: o => prop ("_" 5)
|
clasohm@1387
|
363 |
Collect :: [i, i => o] => i
|
clasohm@1387
|
364 |
Replace :: [i, [i, i] => o] => i
|
clasohm@1387
|
365 |
Ball :: [i, i => o] => o
|
wenzelm@864
|
366 |
syntax
|
clasohm@1387
|
367 |
"{\at}Collect" :: [idt, i, o] => i ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
|
clasohm@1387
|
368 |
"{\at}Replace" :: [idt, idt, i, o] => i ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
|
clasohm@1387
|
369 |
"{\at}Ball" :: [idt, i, o] => o ("(3ALL _:_./ _)" 10)
|
lcp@323
|
370 |
translations
|
lcp@323
|
371 |
"{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)"
|
lcp@323
|
372 |
"{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
|
lcp@323
|
373 |
"ALL x:A. P" == "Ball(A, \%x. P)"
|
lcp@323
|
374 |
end
|
lcp@323
|
375 |
\end{ttbox}
|
lcp@323
|
376 |
\caption{Macro example: set theory}\label{fig:set_trans}
|
lcp@323
|
377 |
\end{figure}
|
lcp@323
|
378 |
|
wenzelm@864
|
379 |
The theory specifies a variable-binding syntax through additional productions
|
wenzelm@864
|
380 |
that have mixfix declarations. Each non-copy production must specify some
|
wenzelm@864
|
381 |
constant, which is used for building \AST{}s. \index{constants!syntactic} The
|
wenzelm@864
|
382 |
additional constants are decorated with {\tt\at} to stress their purely
|
wenzelm@864
|
383 |
syntactic purpose; they may not occur within the final well-typed terms,
|
wenzelm@864
|
384 |
being declared as {\tt syntax} rather than {\tt consts}.
|
lcp@323
|
385 |
|
lcp@323
|
386 |
The translations cause the replacement of external forms by internal forms
|
lcp@323
|
387 |
after parsing, and vice versa before printing of terms. As a specification
|
lcp@323
|
388 |
of the set theory notation, they should be largely self-explanatory. The
|
lcp@323
|
389 |
syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
|
lcp@323
|
390 |
appear implicitly in the macro rules via their mixfix forms.
|
lcp@323
|
391 |
|
lcp@323
|
392 |
Macros can define variable-binding syntax because they operate on \AST{}s,
|
lcp@323
|
393 |
which have no inbuilt notion of bound variable. The macro variables {\tt
|
lcp@323
|
394 |
x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
|
lcp@323
|
395 |
in this case bound variables. The macro variables {\tt P} and~{\tt Q}
|
lcp@323
|
396 |
range over formulae containing bound variable occurrences.
|
lcp@323
|
397 |
|
lcp@323
|
398 |
Other applications of the macro system can be less straightforward, and
|
lcp@323
|
399 |
there are peculiarities. The rest of this section will describe in detail
|
lcp@323
|
400 |
how Isabelle macros are preprocessed and applied.
|
lcp@323
|
401 |
|
lcp@323
|
402 |
|
lcp@323
|
403 |
\subsection{Specifying macros}
|
lcp@323
|
404 |
Macros are basically rewrite rules on \AST{}s. But unlike other macro
|
lcp@323
|
405 |
systems found in programming languages, Isabelle's macros work in both
|
lcp@323
|
406 |
directions. Therefore a syntax contains two lists of rewrites: one for
|
lcp@323
|
407 |
parsing and one for printing.
|
lcp@323
|
408 |
|
lcp@323
|
409 |
\index{*translations section}
|
lcp@323
|
410 |
The {\tt translations} section specifies macros. The syntax for a macro is
|
lcp@323
|
411 |
\[ (root)\; string \quad
|
lcp@323
|
412 |
\left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
|
lcp@323
|
413 |
\right\} \quad
|
wenzelm@864
|
414 |
(root)\; string
|
lcp@323
|
415 |
\]
|
lcp@323
|
416 |
%
|
lcp@323
|
417 |
This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
|
lcp@323
|
418 |
({\tt ==}). The two strings specify the left and right-hand sides of the
|
lcp@323
|
419 |
macro rule. The $(root)$ specification is optional; it specifies the
|
lcp@323
|
420 |
nonterminal for parsing the $string$ and if omitted defaults to {\tt
|
lcp@323
|
421 |
logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions:
|
lcp@323
|
422 |
\begin{itemize}
|
lcp@323
|
423 |
\item Rules must be left linear: $l$ must not contain repeated variables.
|
lcp@323
|
424 |
|
lcp@323
|
425 |
\item Every variable in~$r$ must also occur in~$l$.
|
lcp@323
|
426 |
\end{itemize}
|
lcp@323
|
427 |
|
wenzelm@3108
|
428 |
Macro rules may refer to any syntax from the parent theories. They
|
wenzelm@3108
|
429 |
may also refer to anything defined before the current {\tt
|
lcp@323
|
430 |
translations} section --- including any mixfix declarations.
|
lcp@323
|
431 |
|
lcp@323
|
432 |
Upon declaration, both sides of the macro rule undergo parsing and parse
|
lcp@323
|
433 |
\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
|
lcp@323
|
434 |
macro expansion. The lexer runs in a different mode that additionally
|
lcp@323
|
435 |
accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
|
lcp@323
|
436 |
{\tt _K}). Thus, a constant whose name starts with an underscore can
|
lcp@323
|
437 |
appear in macro rules but not in ordinary terms.
|
lcp@323
|
438 |
|
lcp@323
|
439 |
Some atoms of the macro rule's \AST{} are designated as constants for
|
lcp@323
|
440 |
matching. These are all names that have been declared as classes, types or
|
wenzelm@864
|
441 |
constants (logical and syntactic).
|
lcp@323
|
442 |
|
wenzelm@3108
|
443 |
The result of this preprocessing is two lists of macro rules, each
|
wenzelm@3108
|
444 |
stored as a pair of \AST{}s. They can be viewed using {\tt
|
wenzelm@3108
|
445 |
print_syntax} (sections \ttindex{parse_rules} and
|
wenzelm@3108
|
446 |
\ttindex{print_rules}). For theory~{\tt SetSyntax} of
|
wenzelm@3108
|
447 |
Fig.~\ref{fig:set_trans} these are
|
lcp@323
|
448 |
\begin{ttbox}
|
lcp@323
|
449 |
parse_rules:
|
lcp@323
|
450 |
("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P))
|
lcp@323
|
451 |
("{\at}Replace" y x A Q) -> ("Replace" A ("_abs" x ("_abs" y Q)))
|
lcp@323
|
452 |
("{\at}Ball" x A P) -> ("Ball" A ("_abs" x P))
|
lcp@323
|
453 |
print_rules:
|
lcp@323
|
454 |
("Collect" A ("_abs" x P)) -> ("{\at}Collect" x A P)
|
lcp@323
|
455 |
("Replace" A ("_abs" x ("_abs" y Q))) -> ("{\at}Replace" y x A Q)
|
lcp@323
|
456 |
("Ball" A ("_abs" x P)) -> ("{\at}Ball" x A P)
|
lcp@323
|
457 |
\end{ttbox}
|
lcp@323
|
458 |
|
lcp@323
|
459 |
\begin{warn}
|
lcp@323
|
460 |
Avoid choosing variable names that have previously been used as
|
lcp@323
|
461 |
constants, types or type classes; the {\tt consts} section in the output
|
wenzelm@864
|
462 |
of {\tt print_syntax} lists all such names. If a macro rule works
|
lcp@323
|
463 |
incorrectly, inspect its internal form as shown above, recalling that
|
lcp@323
|
464 |
constants appear as quoted strings and variables without quotes.
|
lcp@323
|
465 |
\end{warn}
|
lcp@323
|
466 |
|
lcp@323
|
467 |
\begin{warn}
|
lcp@323
|
468 |
If \ttindex{eta_contract} is set to {\tt true}, terms will be
|
lcp@323
|
469 |
$\eta$-contracted {\em before\/} the \AST{} rewriter sees them. Thus some
|
lcp@323
|
470 |
abstraction nodes needed for print rules to match may vanish. For example,
|
lcp@332
|
471 |
\verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
|
lcp@323
|
472 |
not apply and the output will be {\tt Ball(A, P)}. This problem would not
|
lcp@323
|
473 |
occur if \ML{} translation functions were used instead of macros (as is
|
lcp@323
|
474 |
done for binder declarations).
|
lcp@323
|
475 |
\end{warn}
|
lcp@323
|
476 |
|
lcp@323
|
477 |
|
lcp@323
|
478 |
\begin{warn}
|
lcp@323
|
479 |
Another trap concerns type constraints. If \ttindex{show_types} is set to
|
lcp@323
|
480 |
{\tt true}, bound variables will be decorated by their meta types at the
|
lcp@323
|
481 |
binding place (but not at occurrences in the body). Matching with
|
lcp@323
|
482 |
\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
|
lcp@323
|
483 |
"i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to
|
wenzelm@864
|
484 |
appear in the external form, say \verb|{y::i:A::i. P::o}|.
|
lcp@323
|
485 |
|
lcp@323
|
486 |
To allow such constraints to be re-read, your syntax should specify bound
|
lcp@323
|
487 |
variables using the nonterminal~\ndx{idt}. This is the case in our
|
wenzelm@3108
|
488 |
example. Choosing {\tt id} instead of {\tt idt} is a common error.
|
lcp@323
|
489 |
\end{warn}
|
lcp@323
|
490 |
|
lcp@323
|
491 |
|
lcp@323
|
492 |
|
lcp@323
|
493 |
\subsection{Applying rules}
|
lcp@323
|
494 |
As a term is being parsed or printed, an \AST{} is generated as an
|
lcp@323
|
495 |
intermediate form (recall Fig.\ts\ref{fig:parse_print}). The \AST{} is
|
lcp@332
|
496 |
normalised by applying macro rules in the manner of a traditional term
|
lcp@323
|
497 |
rewriting system. We first examine how a single rule is applied.
|
lcp@323
|
498 |
|
lcp@332
|
499 |
Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some
|
lcp@323
|
500 |
translation rule. A subtree~$u$ of $t$ is a {\bf redex} if it is an
|
lcp@323
|
501 |
instance of~$l$; in this case $l$ is said to {\bf match}~$u$. A redex
|
lcp@323
|
502 |
matched by $l$ may be replaced by the corresponding instance of~$r$, thus
|
lcp@323
|
503 |
{\bf rewriting} the \AST~$t$. Matching requires some notion of {\bf
|
lcp@323
|
504 |
place-holders} that may occur in rule patterns but not in ordinary
|
lcp@323
|
505 |
\AST{}s; {\tt Variable} atoms serve this purpose.
|
lcp@323
|
506 |
|
lcp@323
|
507 |
The matching of the object~$u$ by the pattern~$l$ is performed as follows:
|
lcp@323
|
508 |
\begin{itemize}
|
lcp@323
|
509 |
\item Every constant matches itself.
|
lcp@323
|
510 |
|
lcp@323
|
511 |
\item $\Variable x$ in the object matches $\Constant x$ in the pattern.
|
lcp@323
|
512 |
This point is discussed further below.
|
lcp@323
|
513 |
|
lcp@323
|
514 |
\item Every \AST{} in the object matches $\Variable x$ in the pattern,
|
lcp@323
|
515 |
binding~$x$ to~$u$.
|
lcp@323
|
516 |
|
lcp@323
|
517 |
\item One application matches another if they have the same number of
|
lcp@323
|
518 |
subtrees and corresponding subtrees match.
|
lcp@323
|
519 |
|
lcp@323
|
520 |
\item In every other case, matching fails. In particular, {\tt
|
lcp@323
|
521 |
Constant}~$x$ can only match itself.
|
lcp@323
|
522 |
\end{itemize}
|
lcp@323
|
523 |
A successful match yields a substitution that is applied to~$r$, generating
|
lcp@323
|
524 |
the instance that replaces~$u$.
|
lcp@323
|
525 |
|
lcp@323
|
526 |
The second case above may look odd. This is where {\tt Variable}s of
|
lcp@323
|
527 |
non-rule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not
|
lcp@323
|
528 |
far removed from parse trees; at this level it is not yet known which
|
lcp@323
|
529 |
identifiers will become constants, bounds, frees, types or classes. As
|
lcp@323
|
530 |
\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
|
wenzelm@864
|
531 |
{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},
|
wenzelm@14947
|
532 |
\ndx{tvar}, \ndx{num}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s. On the other
|
wenzelm@864
|
533 |
hand, when \AST{}s generated from terms for printing, all constants and type
|
wenzelm@864
|
534 |
constructors become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s may
|
wenzelm@864
|
535 |
contain a messy mixture of {\tt Variable}s and {\tt Constant}s. This is
|
lcp@323
|
536 |
insignificant at macro level because matching treats them alike.
|
lcp@323
|
537 |
|
lcp@323
|
538 |
Because of this behaviour, different kinds of atoms with the same name are
|
lcp@323
|
539 |
indistinguishable, which may make some rules prone to misbehaviour. Example:
|
lcp@323
|
540 |
\begin{ttbox}
|
lcp@323
|
541 |
types
|
lcp@323
|
542 |
Nil
|
lcp@323
|
543 |
consts
|
clasohm@1387
|
544 |
Nil :: 'a list
|
wenzelm@864
|
545 |
syntax
|
clasohm@1387
|
546 |
"[]" :: 'a list ("[]")
|
lcp@323
|
547 |
translations
|
lcp@323
|
548 |
"[]" == "Nil"
|
lcp@323
|
549 |
\end{ttbox}
|
lcp@323
|
550 |
The term {\tt Nil} will be printed as {\tt []}, just as expected.
|
lcp@323
|
551 |
The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
|
wenzelm@864
|
552 |
expected! Guess how type~{\tt Nil} is printed?
|
lcp@323
|
553 |
|
wenzelm@14893
|
554 |
Normalizing an \AST{} involves repeatedly applying macro rules until none are
|
wenzelm@14893
|
555 |
applicable. Macro rules are chosen in order of appearance in the theory
|
wenzelm@14893
|
556 |
definitions. You can watch the normalization of \AST{}s during parsing and
|
wenzelm@14893
|
557 |
printing by setting \ttindex{Syntax.trace_ast} to {\tt true}.\index{tracing!of
|
wenzelm@14893
|
558 |
macros} The information displayed when tracing includes the \AST{} before
|
wenzelm@14893
|
559 |
normalization ({\tt pre}), redexes with results ({\tt rewrote}), the normal
|
wenzelm@14893
|
560 |
form finally reached ({\tt post}) and some statistics ({\tt normalize}).
|
lcp@323
|
561 |
|
lcp@323
|
562 |
\subsection{Example: the syntax of finite sets}
|
lcp@323
|
563 |
\index{examples!of macros}
|
lcp@323
|
564 |
|
lcp@323
|
565 |
This example demonstrates the use of recursive macros to implement a
|
lcp@323
|
566 |
convenient notation for finite sets.
|
lcp@323
|
567 |
\index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}
|
lcp@323
|
568 |
\index{"@Enum@{\tt\at Enum} constant}
|
lcp@323
|
569 |
\index{"@Finset@{\tt\at Finset} constant}
|
lcp@323
|
570 |
\begin{ttbox}
|
wenzelm@3108
|
571 |
FinSyntax = SetSyntax +
|
lcp@323
|
572 |
types
|
lcp@323
|
573 |
is
|
wenzelm@864
|
574 |
syntax
|
clasohm@1387
|
575 |
"" :: i => is ("_")
|
clasohm@1387
|
576 |
"{\at}Enum" :: [i, is] => is ("_,/ _")
|
wenzelm@864
|
577 |
consts
|
clasohm@1387
|
578 |
empty :: i ("{\ttlbrace}{\ttrbrace}")
|
clasohm@1387
|
579 |
insert :: [i, i] => i
|
wenzelm@864
|
580 |
syntax
|
clasohm@1387
|
581 |
"{\at}Finset" :: is => i ("{\ttlbrace}(_){\ttrbrace}")
|
lcp@323
|
582 |
translations
|
lcp@323
|
583 |
"{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})"
|
lcp@323
|
584 |
"{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})"
|
lcp@323
|
585 |
end
|
lcp@323
|
586 |
\end{ttbox}
|
lcp@323
|
587 |
Finite sets are internally built up by {\tt empty} and {\tt insert}. The
|
lcp@323
|
588 |
declarations above specify \verb|{x, y, z}| as the external representation
|
lcp@323
|
589 |
of
|
lcp@323
|
590 |
\begin{ttbox}
|
lcp@323
|
591 |
insert(x, insert(y, insert(z, empty)))
|
lcp@323
|
592 |
\end{ttbox}
|
lcp@323
|
593 |
The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
|
lcp@323
|
594 |
i} separated by commas. The mixfix declaration \hbox{\verb|"_,/ _"|}
|
lcp@323
|
595 |
allows a line break after the comma for \rmindex{pretty printing}; if no
|
lcp@323
|
596 |
line break is required then a space is printed instead.
|
lcp@323
|
597 |
|
lcp@323
|
598 |
The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
|
wenzelm@864
|
599 |
declaration. Hence {\tt is} is not a logical type and may be used safely as
|
wenzelm@864
|
600 |
a new nonterminal for custom syntax. The nonterminal~{\tt is} can later be
|
paulson@3485
|
601 |
re-used for other enumerations of type~{\tt i} like lists or tuples. If we
|
wenzelm@864
|
602 |
had needed polymorphic enumerations, we could have used the predefined
|
wenzelm@864
|
603 |
nonterminal symbol \ndx{args} and skipped this part altogether.
|
lcp@323
|
604 |
|
lcp@323
|
605 |
\index{"@Finset@{\tt\at Finset} constant}
|
lcp@323
|
606 |
Next follows {\tt empty}, which is already equipped with its syntax
|
lcp@323
|
607 |
\verb|{}|, and {\tt insert} without concrete syntax. The syntactic
|
lcp@323
|
608 |
constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
|
lcp@323
|
609 |
i} enclosed in curly braces. Remember that a pair of parentheses, as in
|
lcp@323
|
610 |
\verb|"{(_)}"|, specifies a block of indentation for pretty printing.
|
lcp@323
|
611 |
|
lcp@323
|
612 |
The translations may look strange at first. Macro rules are best
|
lcp@323
|
613 |
understood in their internal forms:
|
lcp@323
|
614 |
\begin{ttbox}
|
lcp@323
|
615 |
parse_rules:
|
lcp@323
|
616 |
("{\at}Finset" ("{\at}Enum" x xs)) -> ("insert" x ("{\at}Finset" xs))
|
lcp@323
|
617 |
("{\at}Finset" x) -> ("insert" x "empty")
|
lcp@323
|
618 |
print_rules:
|
lcp@323
|
619 |
("insert" x ("{\at}Finset" xs)) -> ("{\at}Finset" ("{\at}Enum" x xs))
|
lcp@323
|
620 |
("insert" x "empty") -> ("{\at}Finset" x)
|
lcp@323
|
621 |
\end{ttbox}
|
wenzelm@864
|
622 |
This shows that \verb|{x,xs}| indeed matches any set enumeration of at least
|
lcp@323
|
623 |
two elements, binding the first to {\tt x} and the rest to {\tt xs}.
|
wenzelm@864
|
624 |
Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.
|
lcp@323
|
625 |
The parse rules only work in the order given.
|
lcp@323
|
626 |
|
lcp@323
|
627 |
\begin{warn}
|
lcp@332
|
628 |
The \AST{} rewriter cannot distinguish constants from variables and looks
|
lcp@323
|
629 |
only for names of atoms. Thus the names of {\tt Constant}s occurring in
|
lcp@323
|
630 |
the (internal) left-hand side of translation rules should be regarded as
|
lcp@323
|
631 |
\rmindex{reserved words}. Choose non-identifiers like {\tt\at Finset} or
|
lcp@323
|
632 |
sufficiently long and strange names. If a bound variable's name gets
|
lcp@323
|
633 |
rewritten, the result will be incorrect; for example, the term
|
lcp@323
|
634 |
\begin{ttbox}
|
lcp@323
|
635 |
\%empty insert. insert(x, empty)
|
lcp@323
|
636 |
\end{ttbox}
|
wenzelm@864
|
637 |
\par\noindent is incorrectly printed as \verb|%empty insert. {x}|.
|
lcp@323
|
638 |
\end{warn}
|
lcp@323
|
639 |
|
lcp@323
|
640 |
|
lcp@323
|
641 |
\subsection{Example: a parse macro for dependent types}\label{prod_trans}
|
lcp@323
|
642 |
\index{examples!of macros}
|
lcp@323
|
643 |
|
lcp@323
|
644 |
As stated earlier, a macro rule may not introduce new {\tt Variable}s on
|
wenzelm@864
|
645 |
the right-hand side. Something like \verb|"K(B)" => "%x.B"| is illegal;
|
lcp@323
|
646 |
if allowed, it could cause variable capture. In such cases you usually
|
lcp@323
|
647 |
must fall back on translation functions. But a trick can make things
|
wenzelm@864
|
648 |
readable in some cases: {\em calling\/} translation functions by parse
|
wenzelm@864
|
649 |
macros:
|
lcp@323
|
650 |
\begin{ttbox}
|
wenzelm@3135
|
651 |
ProdSyntax = SetSyntax +
|
lcp@323
|
652 |
consts
|
clasohm@1387
|
653 |
Pi :: [i, i => i] => i
|
wenzelm@864
|
654 |
syntax
|
clasohm@1387
|
655 |
"{\at}PROD" :: [idt, i, i] => i ("(3PROD _:_./ _)" 10)
|
clasohm@1387
|
656 |
"{\at}->" :: [i, i] => i ("(_ ->/ _)" [51, 50] 50)
|
lcp@323
|
657 |
\ttbreak
|
lcp@323
|
658 |
translations
|
lcp@323
|
659 |
"PROD x:A. B" => "Pi(A, \%x. B)"
|
lcp@323
|
660 |
"A -> B" => "Pi(A, _K(B))"
|
lcp@323
|
661 |
end
|
lcp@323
|
662 |
ML
|
lcp@323
|
663 |
val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
|
lcp@323
|
664 |
\end{ttbox}
|
lcp@323
|
665 |
|
wenzelm@864
|
666 |
Here {\tt Pi} is a logical constant for constructing general products.
|
lcp@323
|
667 |
Two external forms exist: the general case {\tt PROD x:A.B} and the
|
lcp@323
|
668 |
function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
|
lcp@323
|
669 |
does not depend on~{\tt x}.
|
lcp@323
|
670 |
|
wenzelm@864
|
671 |
The second parse macro introduces {\tt _K(B)}, which later becomes
|
wenzelm@864
|
672 |
\verb|%x.B| due to a parse translation associated with \cdx{_K}.
|
wenzelm@864
|
673 |
Unfortunately there is no such trick for printing, so we have to add a {\tt
|
wenzelm@864
|
674 |
ML} section for the print translation \ttindex{dependent_tr'}.
|
lcp@323
|
675 |
|
lcp@323
|
676 |
Recall that identifiers with a leading {\tt _} are allowed in translation
|
lcp@323
|
677 |
rules, but not in ordinary terms. Thus we can create \AST{}s containing
|
lcp@323
|
678 |
names that are not directly expressible.
|
lcp@323
|
679 |
|
paulson@8136
|
680 |
The parse translation for {\tt _K} is already installed in Pure, and the
|
paulson@8136
|
681 |
function {\tt dependent_tr'} is exported by the syntax module for public use.
|
paulson@8136
|
682 |
See \S\ref{sec:tr_funs} below for more of the arcane lore of translation
|
paulson@8136
|
683 |
functions. \index{macros|)}\index{rewriting!syntactic|)}
|
lcp@323
|
684 |
|
lcp@323
|
685 |
|
lcp@323
|
686 |
\section{Translation functions} \label{sec:tr_funs}
|
wenzelm@864
|
687 |
\index{translations|(}
|
lcp@323
|
688 |
%
|
wenzelm@9695
|
689 |
This section describes the translation function mechanism. By writing \ML{}
|
wenzelm@9695
|
690 |
functions, you can do almost everything to terms or \AST{}s during parsing and
|
wenzelm@9695
|
691 |
printing. The logic LK is a good example of sophisticated transformations
|
wenzelm@9695
|
692 |
between internal and external representations of sequents; here, macros would
|
wenzelm@9695
|
693 |
be useless.
|
lcp@323
|
694 |
|
lcp@323
|
695 |
A full understanding of translations requires some familiarity
|
lcp@323
|
696 |
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
|
lcp@323
|
697 |
{\tt Syntax.ast} and the encodings of types and terms as such at the various
|
lcp@323
|
698 |
stages of the parsing or printing process. Most users should never need to
|
lcp@323
|
699 |
use translation functions.
|
lcp@323
|
700 |
|
lcp@323
|
701 |
\subsection{Declaring translation functions}
|
wenzelm@3108
|
702 |
There are four kinds of translation functions, with one of these
|
wenzelm@3108
|
703 |
coming in two variants. Each such function is associated with a name,
|
wenzelm@3108
|
704 |
which triggers calls to it. Such names can be constants (logical or
|
wenzelm@3108
|
705 |
syntactic) or type constructors.
|
lcp@323
|
706 |
|
wenzelm@6343
|
707 |
Function {\tt print_syntax} displays the sets of names associated with the
|
wenzelm@6343
|
708 |
translation functions of a theory under \texttt{parse_ast_translation}, etc.
|
wenzelm@6343
|
709 |
You can add new ones via the {\tt ML} section\index{*ML section} of a theory
|
wenzelm@6343
|
710 |
definition file. Even though the {\tt ML} section is the very last part of
|
wenzelm@6343
|
711 |
the file, newly installed translation functions are already effective when
|
wenzelm@6343
|
712 |
processing all of the preceding sections.
|
lcp@323
|
713 |
|
wenzelm@3108
|
714 |
The {\tt ML} section's contents are simply copied verbatim near the
|
wenzelm@3108
|
715 |
beginning of the \ML\ file generated from a theory definition file.
|
wenzelm@3108
|
716 |
Definitions made here are accessible as components of an \ML\
|
wenzelm@3108
|
717 |
structure; to make some parts private, use an \ML{} {\tt local}
|
wenzelm@3108
|
718 |
declaration. The {\ML} code may install translation functions by
|
wenzelm@3108
|
719 |
declaring any of the following identifiers:
|
lcp@323
|
720 |
\begin{ttbox}
|
wenzelm@3108
|
721 |
val parse_ast_translation : (string * (ast list -> ast)) list
|
wenzelm@3108
|
722 |
val print_ast_translation : (string * (ast list -> ast)) list
|
wenzelm@3108
|
723 |
val parse_translation : (string * (term list -> term)) list
|
wenzelm@3108
|
724 |
val print_translation : (string * (term list -> term)) list
|
wenzelm@4375
|
725 |
val typed_print_translation :
|
wenzelm@4375
|
726 |
(string * (bool -> typ -> term list -> term)) list
|
lcp@323
|
727 |
\end{ttbox}
|
lcp@323
|
728 |
|
lcp@323
|
729 |
\subsection{The translation strategy}
|
wenzelm@3108
|
730 |
The different kinds of translation functions are called during the
|
wenzelm@3108
|
731 |
transformations between parse trees, \AST{}s and terms (recall
|
wenzelm@3108
|
732 |
Fig.\ts\ref{fig:parse_print}). Whenever a combination of the form
|
wenzelm@3108
|
733 |
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation
|
wenzelm@3108
|
734 |
function $f$ of appropriate kind exists for $c$, the result is
|
wenzelm@3108
|
735 |
computed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
|
lcp@323
|
736 |
|
wenzelm@3108
|
737 |
For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.
|
wenzelm@3108
|
738 |
A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1,
|
wenzelm@3108
|
739 |
\ldots, x@n}$. For term translations, the arguments are terms and a
|
wenzelm@3108
|
740 |
combination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const}
|
wenzelm@3108
|
741 |
(c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$. Terms allow more
|
wenzelm@3108
|
742 |
sophisticated transformations than \AST{}s do, typically involving
|
wenzelm@3108
|
743 |
abstractions and bound variables. {\em Typed} print translations may
|
wenzelm@4375
|
744 |
even peek at the type $\tau$ of the constant they are invoked on; they
|
wenzelm@4375
|
745 |
are also passed the current value of the \ttindex{show_sorts} flag.
|
lcp@323
|
746 |
|
wenzelm@3108
|
747 |
Regardless of whether they act on terms or \AST{}s, translation
|
wenzelm@3108
|
748 |
functions called during the parsing process differ from those for
|
wenzelm@3108
|
749 |
printing more fundamentally in their overall behaviour:
|
lcp@323
|
750 |
\begin{description}
|
wenzelm@6343
|
751 |
\item[Parse translations] are applied bottom-up. The arguments are already in
|
wenzelm@6343
|
752 |
translated form. The translations must not fail; exceptions trigger an
|
wenzelm@6343
|
753 |
error message. There may never be more than one function associated with
|
wenzelm@6343
|
754 |
any syntactic name.
|
wenzelm@6343
|
755 |
|
lcp@323
|
756 |
\item[Print translations] are applied top-down. They are supplied with
|
lcp@323
|
757 |
arguments that are partly still in internal form. The result again
|
wenzelm@6343
|
758 |
undergoes translation; therefore a print translation should not introduce as
|
wenzelm@6343
|
759 |
head the very constant that invoked it. The function may raise exception
|
wenzelm@6343
|
760 |
\xdx{Match} to indicate failure; in this event it has no effect. Multiple
|
wenzelm@6343
|
761 |
functions associated with some syntactic name are tried in an unspecified
|
wenzelm@6343
|
762 |
order.
|
lcp@323
|
763 |
\end{description}
|
lcp@323
|
764 |
|
lcp@323
|
765 |
Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
|
lcp@323
|
766 |
\ttindex{Const} for terms --- can invoke translation functions. This
|
lcp@323
|
767 |
causes another difference between parsing and printing.
|
lcp@323
|
768 |
|
lcp@323
|
769 |
Parsing starts with a string and the constants are not yet identified.
|
lcp@323
|
770 |
Only parse tree heads create {\tt Constant}s in the resulting \AST, as
|
lcp@323
|
771 |
described in \S\ref{sec:astofpt}. Macros and parse \AST{} translations may
|
lcp@323
|
772 |
introduce further {\tt Constant}s. When the final \AST{} is converted to a
|
lcp@323
|
773 |
term, all {\tt Constant}s become {\tt Const}s, as described in
|
lcp@323
|
774 |
\S\ref{sec:termofast}.
|
lcp@323
|
775 |
|
lcp@323
|
776 |
Printing starts with a well-typed term and all the constants are known. So
|
lcp@323
|
777 |
all logical constants and type constructors may invoke print translations.
|
lcp@323
|
778 |
These, and macros, may introduce further constants.
|
lcp@323
|
779 |
|
lcp@323
|
780 |
|
lcp@323
|
781 |
\subsection{Example: a print translation for dependent types}
|
lcp@323
|
782 |
\index{examples!of translations}\indexbold{*dependent_tr'}
|
lcp@323
|
783 |
|
lcp@323
|
784 |
Let us continue the dependent type example (page~\pageref{prod_trans}) by
|
lcp@323
|
785 |
examining the parse translation for~\cdx{_K} and the print translation
|
lcp@323
|
786 |
{\tt dependent_tr'}, which are both built-in. By convention, parse
|
lcp@323
|
787 |
translations have names ending with {\tt _tr} and print translations have
|
lcp@323
|
788 |
names ending with {\tt _tr'}. Search for such names in the Isabelle
|
lcp@323
|
789 |
sources to locate more examples.
|
lcp@323
|
790 |
|
lcp@323
|
791 |
Here is the parse translation for {\tt _K}:
|
lcp@323
|
792 |
\begin{ttbox}
|
lcp@323
|
793 |
fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
|
wenzelm@864
|
794 |
| k_tr ts = raise TERM ("k_tr", ts);
|
lcp@323
|
795 |
\end{ttbox}
|
lcp@323
|
796 |
If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
|
lcp@323
|
797 |
{\tt Abs} node with a body derived from $t$. Since terms given to parse
|
lcp@323
|
798 |
translations are not yet typed, the type of the bound variable in the new
|
lcp@323
|
799 |
{\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound}
|
lcp@323
|
800 |
nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
|
lcp@323
|
801 |
a basic term manipulation function defined in {\tt Pure/term.ML}.
|
lcp@323
|
802 |
|
lcp@323
|
803 |
Here is the print translation for dependent types:
|
lcp@323
|
804 |
\begin{ttbox}
|
wenzelm@864
|
805 |
fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
|
lcp@323
|
806 |
if 0 mem (loose_bnos B) then
|
wenzelm@3108
|
807 |
let val (x', B') = Syntax.variant_abs' (x, dummyT, B) in
|
wenzelm@3108
|
808 |
list_comb
|
paulson@8136
|
809 |
(Const (q,dummyT) $
|
paulson@8136
|
810 |
Syntax.mark_boundT (x',{\thinspace}T) $ A $ B', ts)
|
lcp@323
|
811 |
end
|
lcp@323
|
812 |
else list_comb (Const (r, dummyT) $ A $ B, ts)
|
lcp@323
|
813 |
| dependent_tr' _ _ = raise Match;
|
lcp@323
|
814 |
\end{ttbox}
|
wenzelm@3135
|
815 |
The argument {\tt (q,{\thinspace}r)} is supplied to the curried function {\tt
|
wenzelm@3108
|
816 |
dependent_tr'} by a partial application during its installation.
|
wenzelm@3108
|
817 |
For example, we could set up print translations for both {\tt Pi} and
|
wenzelm@3108
|
818 |
{\tt Sigma} by including
|
lcp@323
|
819 |
\begin{ttbox}\index{*ML section}
|
lcp@323
|
820 |
val print_translation =
|
lcp@323
|
821 |
[("Pi", dependent_tr' ("{\at}PROD", "{\at}->")),
|
lcp@323
|
822 |
("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
|
lcp@323
|
823 |
\end{ttbox}
|
wenzelm@3108
|
824 |
within the {\tt ML} section. The first of these transforms ${\tt
|
wenzelm@3108
|
825 |
Pi}(A, \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
|
wenzelm@3108
|
826 |
$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not
|
wenzelm@3108
|
827 |
depend on~$x$. It checks this using \ttindex{loose_bnos}, yet another
|
wenzelm@3108
|
828 |
function from {\tt Pure/term.ML}. Note that $x'$ is a version of $x$
|
wenzelm@3108
|
829 |
renamed away from all names in $B$, and $B'$ is the body $B$ with {\tt
|
wenzelm@3108
|
830 |
Bound} nodes referring to the {\tt Abs} node replaced by
|
wenzelm@3108
|
831 |
$\ttfct{Free} (x', \mtt{dummyT})$ (but marked as representing a bound
|
wenzelm@3108
|
832 |
variable).
|
lcp@323
|
833 |
|
lcp@323
|
834 |
We must be careful with types here. While types of {\tt Const}s are
|
lcp@323
|
835 |
ignored, type constraints may be printed for some {\tt Free}s and
|
lcp@323
|
836 |
{\tt Var}s if \ttindex{show_types} is set to {\tt true}. Variables of type
|
lcp@323
|
837 |
\ttindex{dummyT} are never printed with constraint, though. The line
|
lcp@323
|
838 |
\begin{ttbox}
|
wenzelm@3108
|
839 |
let val (x', B') = Syntax.variant_abs' (x, dummyT, B);
|
wenzelm@3108
|
840 |
\end{ttbox}\index{*Syntax.variant_abs'}
|
lcp@323
|
841 |
replaces bound variable occurrences in~$B$ by the free variable $x'$ with
|
lcp@323
|
842 |
type {\tt dummyT}. Only the binding occurrence of~$x'$ is given the
|
lcp@323
|
843 |
correct type~{\tt T}, so this is the only place where a type
|
wenzelm@864
|
844 |
constraint might appear.
|
wenzelm@3108
|
845 |
|
wenzelm@3108
|
846 |
Also note that we are responsible to mark free identifiers that
|
paulson@3485
|
847 |
actually represent bound variables. This is achieved by
|
wenzelm@3108
|
848 |
\ttindex{Syntax.variant_abs'} and \ttindex{Syntax.mark_boundT} above.
|
wenzelm@3108
|
849 |
Failing to do so may cause these names to be printed in the wrong
|
wenzelm@3108
|
850 |
style. \index{translations|)} \index{syntax!transformations|)}
|
wenzelm@3108
|
851 |
|
wenzelm@3108
|
852 |
|
wenzelm@3108
|
853 |
\section{Token translations} \label{sec:tok_tr}
|
wenzelm@3108
|
854 |
\index{token translations|(}
|
wenzelm@3108
|
855 |
%
|
wenzelm@3108
|
856 |
Isabelle's meta-logic features quite a lot of different kinds of
|
wenzelm@3108
|
857 |
identifiers, namely {\em class}, {\em tfree}, {\em tvar}, {\em free},
|
paulson@3485
|
858 |
{\em bound}, {\em var}. One might want to have these printed in
|
wenzelm@3108
|
859 |
different styles, e.g.\ in bold or italic, or even transcribed into
|
wenzelm@3108
|
860 |
something more readable like $\alpha, \alpha', \beta$ instead of {\tt
|
paulson@3485
|
861 |
'a}, {\tt 'aa}, {\tt 'b} for type variables. Token translations
|
wenzelm@3108
|
862 |
provide a means to such ends, enabling the user to install certain
|
wenzelm@3108
|
863 |
\ML{} functions associated with any logical \rmindex{token class} and
|
wenzelm@3108
|
864 |
depending on some \rmindex{print mode}.
|
wenzelm@3108
|
865 |
|
wenzelm@3108
|
866 |
The logical class of identifiers can not necessarily be determined by
|
paulson@3485
|
867 |
its syntactic category, though. For example, consider free vs.\ bound
|
paulson@3485
|
868 |
variables. So Isabelle's pretty printing mechanism, starting from
|
wenzelm@3108
|
869 |
fully typed terms, has to be careful to preserve this additional
|
wenzelm@3108
|
870 |
information\footnote{This is done by marking atoms in abstract syntax
|
paulson@3485
|
871 |
trees appropriately. The marks are actually visible by print
|
wenzelm@3108
|
872 |
translation functions -- they are just special constants applied to
|
wenzelm@3108
|
873 |
atomic asts, for example \texttt{("_bound" x)}.}. In particular,
|
wenzelm@3108
|
874 |
user-supplied print translation functions operating on terms have to
|
paulson@3485
|
875 |
be well-behaved in this respect. Free identifiers introduced to
|
wenzelm@3108
|
876 |
represent bound variables have to be marked appropriately (cf.\ the
|
wenzelm@3108
|
877 |
example at the end of \S\ref{sec:tr_funs}).
|
wenzelm@3108
|
878 |
|
wenzelm@3108
|
879 |
\medskip Token translations may be installed by declaring the
|
wenzelm@6343
|
880 |
\ttindex{token_translation} value within the \texttt{ML} section of a theory
|
wenzelm@6343
|
881 |
definition file:
|
wenzelm@3108
|
882 |
\begin{ttbox}
|
paulson@8136
|
883 |
val token_translation:
|
paulson@8136
|
884 |
(string * string * (string -> string * real)) list
|
wenzelm@8701
|
885 |
\end{ttbox}
|
wenzelm@6343
|
886 |
The elements of this list are of the form $(m, c, f)$, where $m$ is a print
|
wenzelm@6343
|
887 |
mode identifier, $c$ a token class, and $f\colon string \to string \times
|
wenzelm@6343
|
888 |
real$ the actual translation function. Assuming that $x$ is of identifier
|
wenzelm@6343
|
889 |
class $c$, and print mode $m$ is the first (active) mode providing some
|
wenzelm@6343
|
890 |
translation for $c$, then $x$ is output according to $f(x) = (x', len)$.
|
wenzelm@6343
|
891 |
Thereby $x'$ is the modified identifier name and $len$ its visual length in
|
wenzelm@6343
|
892 |
terms of characters (e.g.\ length 1.0 would correspond to $1/2$\,em in
|
wenzelm@6343
|
893 |
\LaTeX). Thus $x'$ may include non-printing parts like control sequences or
|
wenzelm@6343
|
894 |
markup information for typesetting systems.
|
wenzelm@3108
|
895 |
|
wenzelm@3108
|
896 |
|
wenzelm@3108
|
897 |
\index{token translations|)}
|
wenzelm@5371
|
898 |
|
wenzelm@5371
|
899 |
|
wenzelm@5371
|
900 |
%%% Local Variables:
|
wenzelm@5371
|
901 |
%%% mode: latex
|
wenzelm@5371
|
902 |
%%% TeX-master: "ref"
|
wenzelm@5371
|
903 |
%%% End:
|