wneuper/isa: doc-src/Ref/syntax.tex@5a54c796b808 (annotated)

lcp@323	1	%% $Id$
lcp@323	2	\chapter{Syntax Transformations} \label{chap:syntax}
lcp@323	3	\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
lcp@323	4	\newcommand\mtt[1]{\mbox{\tt #1}}
lcp@323	5	\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
lcp@323	6	\newcommand\Constant{\ttfct{Constant}}
lcp@323	7	\newcommand\Variable{\ttfct{Variable}}
lcp@323	8	\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
lcp@323	9	\index{syntax!transformations\|(}
lcp@323	10
lcp@323	11	This chapter is intended for experienced Isabelle users who need to define
lcp@323	12	macros or code their own translation functions. It describes the
lcp@323	13	transformations between parse trees, abstract syntax trees and terms.
lcp@323	14
lcp@323	15
lcp@323	16	\section{Abstract syntax trees} \label{sec:asts}
lcp@323	17	\index{ASTs\|(}
lcp@323	18
lcp@323	19	The parser, given a token list from the lexer, applies productions to yield
lcp@323	20	a parse tree\index{parse trees}. By applying some internal transformations
lcp@323	21	the parse tree becomes an abstract syntax tree, or \AST{}. Macro
lcp@323	22	expansion, further translations and finally type inference yields a
lcp@323	23	well-typed term. The printing process is the reverse, except for some
lcp@323	24	subtleties to be discussed later.
lcp@323	25
lcp@323	26	Figure~\ref{fig:parse_print} outlines the parsing and printing process.
lcp@323	27	Much of the complexity is due to the macro mechanism. Using macros, you
lcp@323	28	can specify most forms of concrete syntax without writing any \ML{} code.
lcp@323	29
lcp@323	30	\begin{figure}
lcp@323	31	\begin{center}
lcp@323	32	\begin{tabular}{cl}
lcp@323	33	string & \\
lcp@323	34	$\downarrow$ & parser \\
lcp@323	35	parse tree & \\
lcp@323	36	$\downarrow$ & parse \AST{} translation \\
lcp@323	37	\AST{} & \\
lcp@323	38	$\downarrow$ & \AST{} rewriting (macros) \\
lcp@323	39	\AST{} & \\
lcp@323	40	$\downarrow$ & parse translation, type inference \\
lcp@323	41	--- well-typed term --- & \\
lcp@323	42	$\downarrow$ & print translation \\
lcp@323	43	\AST{} & \\
lcp@323	44	$\downarrow$ & \AST{} rewriting (macros) \\
lcp@323	45	\AST{} & \\
lcp@323	46	$\downarrow$ & print \AST{} translation, printer \\
lcp@323	47	string &
lcp@323	48	\end{tabular}
lcp@323	49
lcp@323	50	\end{center}
lcp@323	51	\caption{Parsing and printing}\label{fig:parse_print}
lcp@323	52	\end{figure}
lcp@323	53
lcp@323	54	Abstract syntax trees are an intermediate form between the raw parse trees
lcp@323	55	and the typed $\lambda$-terms. An \AST{} is either an atom (constant or
lcp@323	56	variable) or a list of {\em at least two\/} subtrees. Internally, they
lcp@323	57	have type \mltydx{Syntax.ast}: \index{Constant} \index{Variable}
lcp@323	58	\index{*Appl}
lcp@323	59	\begin{ttbox}
lcp@323	60	datatype ast = Constant of string
lcp@323	61	\| Variable of string
lcp@323	62	\| Appl of ast list
lcp@323	63	\end{ttbox}
lcp@323	64	%
lcp@323	65	Isabelle uses an S-expression syntax for abstract syntax trees. Constant
lcp@323	66	atoms are shown as quoted strings, variable atoms as non-quoted strings and
lcp@332	67	applications as a parenthesised list of subtrees. For example, the \AST
lcp@323	68	\begin{ttbox}
lcp@323	69	Appl [Constant "_constrain",
lcp@323	70	Appl [Constant "_abs", Variable "x", Variable "t"],
lcp@323	71	Appl [Constant "fun", Variable "'a", Variable "'b"]]
lcp@323	72	\end{ttbox}
lcp@323	73	is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
lcp@323	74	Both {\tt ()} and {\tt (f)} are illegal because they have too few
lcp@323	75	subtrees.
lcp@323	76
lcp@323	77	The resemblance to Lisp's S-expressions is intentional, but there are two
lcp@323	78	kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the
lcp@323	79	names {\tt Constant} and {\tt Variable} too literally; in the later
lcp@323	80	translation to terms, $\Variable x$ may become a constant, free or bound
lcp@323	81	variable, even a type constructor or class name; the actual outcome depends
lcp@323	82	on the context.
lcp@323	83
lcp@323	84	Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
lcp@323	85	application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of
lcp@323	86	application is determined later by context; it could be a type constructor
lcp@323	87	applied to types.
lcp@323	88
lcp@323	89	Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
lcp@323	90	first-order: the {\tt "_abs"} does not bind the {\tt x} in any way. Later
lcp@323	91	at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
lcp@323	92	occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
lcp@323	93	constructor \ttindex{Bound}).
lcp@323	94
lcp@323	95
lcp@323	96	\section{Transforming parse trees to \AST{}s}\label{sec:astofpt}
lcp@323	97	\index{ASTs!made from parse trees}
lcp@323	98	\newcommand\astofpt[1]{\lbrakk#1\rbrakk}
lcp@323	99
lcp@323	100	The parse tree is the raw output of the parser. Translation functions,
lcp@323	101	called {\bf parse AST translations}\indexbold{translations!parse AST},
lcp@323	102	transform the parse tree into an abstract syntax tree.
lcp@323	103
lcp@323	104	The parse tree is constructed by nesting the right-hand sides of the
lcp@323	105	productions used to recognize the input. Such parse trees are simply lists
lcp@323	106	of tokens and constituent parse trees, the latter representing the
lcp@323	107	nonterminals of the productions. Let us refer to the actual productions in
lcp@323	108	the form displayed by {\tt Syntax.print_syntax}.
lcp@323	109
lcp@323	110	Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
lcp@323	111	by stripping out delimiters and copy productions. More precisely, the
lcp@323	112	mapping $\astofpt{-}$ is derived from the productions as follows:
lcp@323	113	\begin{itemize}
lcp@323	114	\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
lcp@323	115	\ndx{var}, \ndx{tid} or \ndx{tvar} token, and $s$ its associated string.
lcp@323	116
lcp@323	117	\item Copy productions:\index{productions!copy}
lcp@323	118	$\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for
lcp@323	119	strings of delimiters, which are discarded. $P$ stands for the single
lcp@323	120	constituent that is not a delimiter; it is either a nonterminal symbol or
lcp@323	121	a name token.
lcp@323	122
lcp@323	123	\item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
lcp@323	124	Here there are no constituents other than delimiters, which are
lcp@323	125	discarded.
lcp@323	126
lcp@323	127	\item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
lcp@323	128	the remaining constituents $P@1$, \ldots, $P@n$ are built into an
lcp@323	129	application whose head constant is~$c$:
lcp@323	130	\[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
lcp@323	131	\Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
lcp@323	132	\]
lcp@323	133	\end{itemize}
lcp@323	134	Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
lcp@323	135	{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
lcp@323	136	These examples illustrate the need for further translations to make \AST{}s
lcp@323	137	closer to the typed $\lambda$-calculus. The Pure syntax provides
lcp@323	138	predefined parse \AST{} translations\index{translations!parse AST} for
lcp@323	139	ordinary applications, type applications, nested abstractions, meta
lcp@323	140	implications and function types. Figure~\ref{fig:parse_ast_tr} shows their
lcp@323	141	effect on some representative input strings.
lcp@323	142
lcp@323	143
lcp@323	144	\begin{figure}
lcp@323	145	\begin{center}
lcp@323	146	\tt\begin{tabular}{ll}
lcp@323	147	\rm input string & \rm \AST \\\hline
lcp@323	148	"f" & f \\
lcp@323	149	"'a" & 'a \\
lcp@323	150	"t == u" & ("==" t u) \\
lcp@323	151	"f(x)" & ("_appl" f x) \\
lcp@323	152	"f(x, y)" & ("_appl" f ("_args" x y)) \\
lcp@323	153	"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\
lcp@323	154	"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\
lcp@323	155	\end{tabular}
lcp@323	156	\end{center}
lcp@323	157	\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
lcp@323	158	\end{figure}
lcp@323	159
lcp@323	160	\begin{figure}
lcp@323	161	\begin{center}
lcp@323	162	\tt\begin{tabular}{ll}
lcp@323	163	\rm input string & \rm \AST{} \\\hline
lcp@323	164	"f(x, y, z)" & (f x y z) \\
lcp@323	165	"'a ty" & (ty 'a) \\
lcp@323	166	"('a, 'b) ty" & (ty 'a 'b) \\
lcp@323	167	"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\
lcp@323	168	"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\
lcp@323	169	"[\| P; Q; R \|] => S" & ("==>" P ("==>" Q ("==>" R S))) \\
lcp@323	170	"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
lcp@323	171	\end{tabular}
lcp@323	172	\end{center}
lcp@323	173	\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
lcp@323	174	\end{figure}
lcp@323	175
lcp@323	176	The names of constant heads in the \AST{} control the translation process.
lcp@323	177	The list of constants invoking parse \AST{} translations appears in the
lcp@323	178	output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
lcp@323	179
lcp@323	180
lcp@323	181	\section{Transforming \AST{}s to terms}\label{sec:termofast}
lcp@323	182	\index{terms!made from ASTs}
lcp@323	183	\newcommand\termofast[1]{\lbrakk#1\rbrakk}
lcp@323	184
lcp@323	185	The \AST{}, after application of macros (see \S\ref{sec:macros}), is
lcp@323	186	transformed into a term. This term is probably ill-typed since type
lcp@323	187	inference has not occurred yet. The term may contain type constraints
lcp@323	188	consisting of applications with head {\tt "_constrain"}; the second
lcp@323	189	argument is a type encoded as a term. Type inference later introduces
lcp@323	190	correct types or rejects the input.
lcp@323	191
lcp@323	192	Another set of translation functions, namely parse
lcp@323	193	translations\index{translations!parse}, may affect this process. If we
lcp@323	194	ignore parse translations for the time being, then \AST{}s are transformed
lcp@323	195	to terms by mapping \AST{} constants to constants, \AST{} variables to
lcp@323	196	schematic or free variables and \AST{} applications to applications.
lcp@323	197
lcp@323	198	More precisely, the mapping $\termofast{-}$ is defined by
lcp@323	199	\begin{itemize}
lcp@323	200	\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
lcp@323	201	\mtt{dummyT})$.
lcp@323	202
lcp@323	203	\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
lcp@323	204	\ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
lcp@323	205	the index extracted from~$xi$.
lcp@323	206
lcp@323	207	\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
lcp@323	208	\mtt{dummyT})$.
lcp@323	209
lcp@323	210	\item Function applications with $n$ arguments:
lcp@323	211	\[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
lcp@323	212	\termofast{f} \ttapp
lcp@323	213	\termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
lcp@323	214	\]
lcp@323	215	\end{itemize}
lcp@323	216	Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
lcp@323	217	\verb\|$\|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
lcp@323	218	while \ttindex{dummyT} stands for some dummy type that is ignored during
lcp@323	219	type inference.
lcp@323	220
lcp@323	221	So far the outcome is still a first-order term. Abstractions and bound
lcp@323	222	variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
lcp@323	223	by parse translations. Such translations are attached to {\tt "_abs"},
lcp@323	224	{\tt "!!"} and user-defined binders.
lcp@323	225
lcp@323	226
lcp@323	227	\section{Printing of terms}
lcp@323	228	\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
lcp@323	229
lcp@323	230	The output phase is essentially the inverse of the input phase. Terms are
lcp@323	231	translated via abstract syntax trees into strings. Finally the strings are
lcp@323	232	pretty printed.
lcp@323	233
lcp@323	234	Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
lcp@323	235	terms into \AST{}s. Ignoring those, the transformation maps
lcp@323	236	term constants, variables and applications to the corresponding constructs
lcp@323	237	on \AST{}s. Abstractions are mapped to applications of the special
lcp@323	238	constant {\tt _abs}.
lcp@323	239
lcp@323	240	More precisely, the mapping $\astofterm{-}$ is defined as follows:
lcp@323	241	\begin{itemize}
lcp@323	242	\item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
lcp@323	243
lcp@323	244	\item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
lcp@323	245	\tau)$.
lcp@323	246
lcp@323	247	\item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
lcp@323	248	\mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
lcp@323	249	the {\tt indexname} $(x, i)$.
lcp@323	250
lcp@323	251	\item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
lcp@323	252	of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
lcp@323	253	be obtained from~$t$ by replacing all bound occurrences of~$x$ by
lcp@323	254	the free variable $x'$. This replaces corresponding occurrences of the
lcp@323	255	constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
lcp@323	256	\mtt{dummyT})$:
lcp@323	257	\[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
lcp@323	258	\Appl{\Constant \mtt{"_abs"},
lcp@323	259	constrain(\Variable x', \tau), \astofterm{t'}}.
lcp@323	260	\]
lcp@323	261
lcp@323	262	\item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
lcp@323	263	The occurrence of constructor \ttindex{Bound} should never happen
lcp@323	264	when printing well-typed terms; it indicates a de Bruijn index with no
lcp@323	265	matching abstraction.
lcp@323	266
lcp@323	267	\item Where $f$ is not an application,
lcp@323	268	\[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
lcp@323	269	\Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
lcp@323	270	\]
lcp@323	271	\end{itemize}
lcp@323	272	%
lcp@332	273	Type constraints\index{type constraints} are inserted to allow the printing
lcp@332	274	of types. This is governed by the boolean variable \ttindex{show_types}:
lcp@323	275	\begin{itemize}
lcp@323	276	\item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
lcp@332	277	\ttindex{show_types} is set to {\tt false}.
lcp@323	278
lcp@323	279	\item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
lcp@323	280	\astofterm{\tau}}$ \ otherwise.
lcp@323	281
lcp@323	282	Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
lcp@323	283	constructors go to {\tt Constant}s; type identifiers go to {\tt
lcp@323	284	Variable}s; type applications go to {\tt Appl}s with the type
lcp@323	285	constructor as the first element. If \ttindex{show_sorts} is set to
lcp@323	286	{\tt true}, some type variables are decorated with an \AST{} encoding
lcp@323	287	of their sort.
lcp@323	288	\end{itemize}
lcp@323	289	%
lcp@323	290	The \AST{}, after application of macros (see \S\ref{sec:macros}), is
lcp@323	291	transformed into the final output string. The built-in {\bf print AST
lcp@332	292	translations}\indexbold{translations!print AST} reverse the
lcp@323	293	parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
lcp@323	294
lcp@323	295	For the actual printing process, the names attached to productions
lcp@323	296	of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
lcp@323	297	a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
lcp@323	298	$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
lcp@323	299	for~$c$. Each argument~$x@i$ is converted to a string, and put in
lcp@323	300	parentheses if its priority~$(p@i)$ requires this. The resulting strings
lcp@323	301	and their syntactic sugar (denoted by \dots{} above) are joined to make a
lcp@323	302	single string.
lcp@323	303
lcp@323	304	If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
lcp@323	305	corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
lcp@323	306	x@n) ~ x@{n+1} \ldots x@m)$. Applications with too few arguments or with
lcp@323	307	non-constant head or without a corresponding production are printed as
lcp@323	308	$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$. An occurrence of
lcp@323	309	$\Variable x$ is simply printed as~$x$.
lcp@323	310
lcp@323	311	Blanks are {\em not\/} inserted automatically. If blanks are required to
lcp@323	312	separate tokens, specify them in the mixfix declaration, possibly preceded
lcp@323	313	by a slash~({\tt/}) to allow a line break.
lcp@323	314	\index{ASTs\|)}
lcp@323	315
lcp@323	316
lcp@323	317
lcp@323	318	\section{Macros: Syntactic rewriting} \label{sec:macros}
lcp@323	319	\index{macros\|(}\index{rewriting!syntactic\|(}
lcp@323	320
lcp@323	321	Mixfix declarations alone can handle situations where there is a direct
lcp@323	322	connection between the concrete syntax and the underlying term. Sometimes
lcp@323	323	we require a more elaborate concrete syntax, such as quantifiers and list
lcp@323	324	notation. Isabelle's {\bf macros} and {\bf translation functions} can
lcp@323	325	perform translations such as
lcp@323	326	\begin{center}\tt
lcp@323	327	\begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
lcp@323	328	ALL x:A.P & Ball(A, \%x.P) \\ \relax
lcp@323	329	[x, y, z] & Cons(x, Cons(y, Cons(z, Nil)))
lcp@323	330	\end{tabular}
lcp@323	331	\end{center}
lcp@323	332	Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
lcp@323	333	are the most powerful translation mechanism but are difficult to read or
lcp@323	334	write. Macros are specified by first-order rewriting systems that operate
lcp@323	335	on abstract syntax trees. They are usually easy to read and write, and can
lcp@323	336	express all but the most obscure translations.
lcp@323	337
lcp@323	338	Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
lcp@323	339	theory.\footnote{This and the following theories are complete working
lcp@323	340	examples, though they specify only syntax, no axioms. The file {\tt
lcp@332	341	ZF/ZF.thy} presents the full set theory definition, including many
lcp@323	342	macro rules.} Theory {\tt SET} defines constants for set comprehension
lcp@323	343	({\tt Collect}), replacement ({\tt Replace}) and bounded universal
lcp@323	344	quantification ({\tt Ball}). Each of these binds some variables. Without
lcp@332	345	additional syntax we should have to write $\forall x \in A. P$ as {\tt
lcp@323	346	Ball(A,\%x.P)}, and similarly for the others.
lcp@323	347
lcp@323	348	\begin{figure}
lcp@323	349	\begin{ttbox}
lcp@323	350	SET = Pure +
lcp@323	351	types
lcp@323	352	i, o
lcp@323	353	arities
lcp@323	354	i, o :: logic
lcp@323	355	consts
lcp@323	356	Trueprop :: "o => prop" ("_" 5)
lcp@323	357	Collect :: "[i, i => o] => i"
lcp@323	358	"{\at}Collect" :: "[idt, i, o] => i" ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
lcp@323	359	Replace :: "[i, [i, i] => o] => i"
lcp@323	360	"{\at}Replace" :: "[idt, idt, i, o] => i" ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
lcp@323	361	Ball :: "[i, i => o] => o"
lcp@323	362	"{\at}Ball" :: "[idt, i, o] => o" ("(3ALL _:_./ _)" 10)
lcp@323	363	translations
lcp@323	364	"{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)"
lcp@323	365	"{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
lcp@323	366	"ALL x:A. P" == "Ball(A, \%x. P)"
lcp@323	367	end
lcp@323	368	\end{ttbox}
lcp@323	369	\caption{Macro example: set theory}\label{fig:set_trans}
lcp@323	370	\end{figure}
lcp@323	371
lcp@323	372	The theory specifies a variable-binding syntax through additional
lcp@323	373	productions that have mixfix declarations. Each non-copy production must
lcp@323	374	specify some constant, which is used for building \AST{}s.
lcp@323	375	\index{constants!syntactic} The additional constants are decorated with
lcp@323	376	{\tt\at} to stress their purely syntactic purpose; they should never occur
lcp@323	377	within the final well-typed terms. Furthermore, they cannot be written in
lcp@323	378	formulae because they are not legal identifiers.
lcp@323	379
lcp@323	380	The translations cause the replacement of external forms by internal forms
lcp@323	381	after parsing, and vice versa before printing of terms. As a specification
lcp@323	382	of the set theory notation, they should be largely self-explanatory. The
lcp@323	383	syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
lcp@323	384	appear implicitly in the macro rules via their mixfix forms.
lcp@323	385
lcp@323	386	Macros can define variable-binding syntax because they operate on \AST{}s,
lcp@323	387	which have no inbuilt notion of bound variable. The macro variables {\tt
lcp@323	388	x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
lcp@323	389	in this case bound variables. The macro variables {\tt P} and~{\tt Q}
lcp@323	390	range over formulae containing bound variable occurrences.
lcp@323	391
lcp@323	392	Other applications of the macro system can be less straightforward, and
lcp@323	393	there are peculiarities. The rest of this section will describe in detail
lcp@323	394	how Isabelle macros are preprocessed and applied.
lcp@323	395
lcp@323	396
lcp@323	397	\subsection{Specifying macros}
lcp@323	398	Macros are basically rewrite rules on \AST{}s. But unlike other macro
lcp@323	399	systems found in programming languages, Isabelle's macros work in both
lcp@323	400	directions. Therefore a syntax contains two lists of rewrites: one for
lcp@323	401	parsing and one for printing.
lcp@323	402
lcp@323	403	\index{*translations section}
lcp@323	404	The {\tt translations} section specifies macros. The syntax for a macro is
lcp@323	405	\[ (root)\; string \quad
lcp@323	406	\left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
lcp@323	407	\right\} \quad
lcp@323	408	(root)\; string
lcp@323	409	\]
lcp@323	410	%
lcp@323	411	This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
lcp@323	412	({\tt ==}). The two strings specify the left and right-hand sides of the
lcp@323	413	macro rule. The $(root)$ specification is optional; it specifies the
lcp@323	414	nonterminal for parsing the $string$ and if omitted defaults to {\tt
lcp@323	415	logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions:
lcp@323	416	\begin{itemize}
lcp@323	417	\item Rules must be left linear: $l$ must not contain repeated variables.
lcp@323	418
lcp@323	419	\item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =
lcp@323	420	(\mtt"c\mtt" ~ x@1 \ldots x@n)$.
lcp@323	421
lcp@323	422	\item Every variable in~$r$ must also occur in~$l$.
lcp@323	423	\end{itemize}
lcp@323	424
lcp@323	425	Macro rules may refer to any syntax from the parent theories. They may
lcp@499	426	also refer to anything defined before the {\tt .thy} file's {\tt
lcp@323	427	translations} section --- including any mixfix declarations.
lcp@323	428
lcp@323	429	Upon declaration, both sides of the macro rule undergo parsing and parse
lcp@323	430	\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
lcp@323	431	macro expansion. The lexer runs in a different mode that additionally
lcp@323	432	accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
lcp@323	433	{\tt _K}). Thus, a constant whose name starts with an underscore can
lcp@323	434	appear in macro rules but not in ordinary terms.
lcp@323	435
lcp@323	436	Some atoms of the macro rule's \AST{} are designated as constants for
lcp@323	437	matching. These are all names that have been declared as classes, types or
lcp@323	438	constants.
lcp@323	439
lcp@323	440	The result of this preprocessing is two lists of macro rules, each stored
lcp@323	441	as a pair of \AST{}s. They can be viewed using {\tt Syntax.print_syntax}
lcp@323	442	(sections \ttindex{parse_rules} and \ttindex{print_rules}). For
lcp@323	443	theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
lcp@323	444	\begin{ttbox}
lcp@323	445	parse_rules:
lcp@323	446	("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P))
lcp@323	447	("{\at}Replace" y x A Q) -> ("Replace" A ("_abs" x ("_abs" y Q)))
lcp@323	448	("{\at}Ball" x A P) -> ("Ball" A ("_abs" x P))
lcp@323	449	print_rules:
lcp@323	450	("Collect" A ("_abs" x P)) -> ("{\at}Collect" x A P)
lcp@323	451	("Replace" A ("_abs" x ("_abs" y Q))) -> ("{\at}Replace" y x A Q)
lcp@323	452	("Ball" A ("_abs" x P)) -> ("{\at}Ball" x A P)
lcp@323	453	\end{ttbox}
lcp@323	454
lcp@323	455	\begin{warn}
lcp@323	456	Avoid choosing variable names that have previously been used as
lcp@323	457	constants, types or type classes; the {\tt consts} section in the output
lcp@323	458	of {\tt Syntax.print_syntax} lists all such names. If a macro rule works
lcp@323	459	incorrectly, inspect its internal form as shown above, recalling that
lcp@323	460	constants appear as quoted strings and variables without quotes.
lcp@323	461	\end{warn}
lcp@323	462
lcp@323	463	\begin{warn}
lcp@323	464	If \ttindex{eta_contract} is set to {\tt true}, terms will be
lcp@323	465	$\eta$-contracted {\em before\/} the \AST{} rewriter sees them. Thus some
lcp@323	466	abstraction nodes needed for print rules to match may vanish. For example,
lcp@332	467	\verb\|Ball(A, %x. P(x))\| contracts to {\tt Ball(A, P)}; the print rule does
lcp@323	468	not apply and the output will be {\tt Ball(A, P)}. This problem would not
lcp@323	469	occur if \ML{} translation functions were used instead of macros (as is
lcp@323	470	done for binder declarations).
lcp@323	471	\end{warn}
lcp@323	472
lcp@323	473
lcp@323	474	\begin{warn}
lcp@323	475	Another trap concerns type constraints. If \ttindex{show_types} is set to
lcp@323	476	{\tt true}, bound variables will be decorated by their meta types at the
lcp@323	477	binding place (but not at occurrences in the body). Matching with
lcp@323	478	\verb\|Collect(A, %x. P)\| binds {\tt x} to something like {\tt ("_constrain" y
lcp@323	479	"i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to
lcp@323	480	appear in the external form, say \verb\|{y::i:A::i. P::o}\|.
lcp@323	481
lcp@323	482	To allow such constraints to be re-read, your syntax should specify bound
lcp@323	483	variables using the nonterminal~\ndx{idt}. This is the case in our
lcp@323	484	example. Choosing {\tt id} instead of {\tt idt} is a common error,
lcp@323	485	especially since it appears in former versions of most of Isabelle's
lcp@323	486	object-logics.
lcp@323	487	\end{warn}
lcp@323	488
lcp@323	489
lcp@323	490
lcp@323	491	\subsection{Applying rules}
lcp@323	492	As a term is being parsed or printed, an \AST{} is generated as an
lcp@323	493	intermediate form (recall Fig.\ts\ref{fig:parse_print}). The \AST{} is
lcp@332	494	normalised by applying macro rules in the manner of a traditional term
lcp@323	495	rewriting system. We first examine how a single rule is applied.
lcp@323	496
lcp@332	497	Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some
lcp@323	498	translation rule. A subtree~$u$ of $t$ is a {\bf redex} if it is an
lcp@323	499	instance of~$l$; in this case $l$ is said to {\bf match}~$u$. A redex
lcp@323	500	matched by $l$ may be replaced by the corresponding instance of~$r$, thus
lcp@323	501	{\bf rewriting} the \AST~$t$. Matching requires some notion of {\bf
lcp@323	502	place-holders} that may occur in rule patterns but not in ordinary
lcp@323	503	\AST{}s; {\tt Variable} atoms serve this purpose.
lcp@323	504
lcp@323	505	The matching of the object~$u$ by the pattern~$l$ is performed as follows:
lcp@323	506	\begin{itemize}
lcp@323	507	\item Every constant matches itself.
lcp@323	508
lcp@323	509	\item $\Variable x$ in the object matches $\Constant x$ in the pattern.
lcp@323	510	This point is discussed further below.
lcp@323	511
lcp@323	512	\item Every \AST{} in the object matches $\Variable x$ in the pattern,
lcp@323	513	binding~$x$ to~$u$.
lcp@323	514
lcp@323	515	\item One application matches another if they have the same number of
lcp@323	516	subtrees and corresponding subtrees match.
lcp@323	517
lcp@323	518	\item In every other case, matching fails. In particular, {\tt
lcp@323	519	Constant}~$x$ can only match itself.
lcp@323	520	\end{itemize}
lcp@323	521	A successful match yields a substitution that is applied to~$r$, generating
lcp@323	522	the instance that replaces~$u$.
lcp@323	523
lcp@323	524	The second case above may look odd. This is where {\tt Variable}s of
lcp@323	525	non-rule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not
lcp@323	526	far removed from parse trees; at this level it is not yet known which
lcp@323	527	identifiers will become constants, bounds, frees, types or classes. As
lcp@323	528	\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
lcp@323	529	{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid} and
lcp@323	530	\ndx{tvar} become {\tt Variable}s. On the other hand, when \AST{}s
lcp@323	531	generated from terms for printing, all constants and type constructors
lcp@323	532	become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s may contain a
lcp@323	533	messy mixture of {\tt Variable}s and {\tt Constant}s. This is
lcp@323	534	insignificant at macro level because matching treats them alike.
lcp@323	535
lcp@323	536	Because of this behaviour, different kinds of atoms with the same name are
lcp@323	537	indistinguishable, which may make some rules prone to misbehaviour. Example:
lcp@323	538	\begin{ttbox}
lcp@323	539	types
lcp@323	540	Nil
lcp@323	541	consts
lcp@323	542	Nil :: "'a list"
lcp@323	543	"[]" :: "'a list" ("[]")
lcp@323	544	translations
lcp@323	545	"[]" == "Nil"
lcp@323	546	\end{ttbox}
lcp@323	547	The term {\tt Nil} will be printed as {\tt []}, just as expected.
lcp@323	548	The term \verb\|%Nil.t\| will be printed as \verb\|%[].t\|, which might not be
lcp@332	549	expected! How is the type~{\tt Nil} printed?
lcp@323	550
lcp@323	551	Normalizing an \AST{} involves repeatedly applying macro rules until none
lcp@332	552	are applicable. Macro rules are chosen in the order that they appear in the
lcp@323	553	{\tt translations} section. You can watch the normalization of \AST{}s
lcp@323	554	during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
lcp@323	555	{\tt true}.\index{tracing!of macros} Alternatively, use
lcp@323	556	\ttindex{Syntax.test_read}. The information displayed when tracing
lcp@323	557	includes the \AST{} before normalization ({\tt pre}), redexes with results
lcp@323	558	({\tt rewrote}), the normal form finally reached ({\tt post}) and some
lcp@323	559	statistics ({\tt normalize}). If tracing is off,
lcp@323	560	\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
lcp@323	561	printing of the normal form and statistics only.
lcp@323	562
lcp@323	563
lcp@323	564	\subsection{Example: the syntax of finite sets}
lcp@323	565	\index{examples!of macros}
lcp@323	566
lcp@323	567	This example demonstrates the use of recursive macros to implement a
lcp@323	568	convenient notation for finite sets.
lcp@323	569	\index{empty constant}\index{insert constant}\index{{}@\verb'{}' symbol}
lcp@323	570	\index{"@Enum@{\tt\at Enum} constant}
lcp@323	571	\index{"@Finset@{\tt\at Finset} constant}
lcp@323	572	\begin{ttbox}
lcp@323	573	FINSET = SET +
lcp@323	574	types
lcp@323	575	is
lcp@323	576	consts
lcp@323	577	"" :: "i => is" ("_")
lcp@323	578	"{\at}Enum" :: "[i, is] => is" ("_,/ _")
lcp@323	579	empty :: "i" ("{\ttlbrace}{\ttrbrace}")
lcp@323	580	insert :: "[i, i] => i"
lcp@323	581	"{\at}Finset" :: "is => i" ("{\ttlbrace}(_){\ttrbrace}")
lcp@323	582	translations
lcp@323	583	"{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})"
lcp@323	584	"{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})"
lcp@323	585	end
lcp@323	586	\end{ttbox}
lcp@323	587	Finite sets are internally built up by {\tt empty} and {\tt insert}. The
lcp@323	588	declarations above specify \verb\|{x, y, z}\| as the external representation
lcp@323	589	of
lcp@323	590	\begin{ttbox}
lcp@323	591	insert(x, insert(y, insert(z, empty)))
lcp@323	592	\end{ttbox}
lcp@323	593	The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
lcp@323	594	i} separated by commas. The mixfix declaration \hbox{\verb\|"_,/ _"\|}
lcp@323	595	allows a line break after the comma for \rmindex{pretty printing}; if no
lcp@323	596	line break is required then a space is printed instead.
lcp@323	597
lcp@323	598	The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
lcp@323	599	declaration. Hence {\tt is} is not a logical type and no default
lcp@323	600	productions are added. If we had needed enumerations of the nonterminal
lcp@323	601	{\tt logic}, which would include all the logical types, we could have used
lcp@323	602	the predefined nonterminal symbol \ndx{args} and skipped this part
lcp@323	603	altogether. The nonterminal~{\tt is} can later be reused for other
lcp@323	604	enumerations of type~{\tt i} like lists or tuples.
lcp@323	605
lcp@323	606	\index{"@Finset@{\tt\at Finset} constant}
lcp@323	607	Next follows {\tt empty}, which is already equipped with its syntax
lcp@323	608	\verb\|{}\|, and {\tt insert} without concrete syntax. The syntactic
lcp@323	609	constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
lcp@323	610	i} enclosed in curly braces. Remember that a pair of parentheses, as in
lcp@323	611	\verb\|"{(_)}"\|, specifies a block of indentation for pretty printing.
lcp@323	612
lcp@323	613	The translations may look strange at first. Macro rules are best
lcp@323	614	understood in their internal forms:
lcp@323	615	\begin{ttbox}
lcp@323	616	parse_rules:
lcp@323	617	("{\at}Finset" ("{\at}Enum" x xs)) -> ("insert" x ("{\at}Finset" xs))
lcp@323	618	("{\at}Finset" x) -> ("insert" x "empty")
lcp@323	619	print_rules:
lcp@323	620	("insert" x ("{\at}Finset" xs)) -> ("{\at}Finset" ("{\at}Enum" x xs))
lcp@323	621	("insert" x "empty") -> ("{\at}Finset" x)
lcp@323	622	\end{ttbox}
lcp@323	623	This shows that \verb\|{x, xs}\| indeed matches any set enumeration of at least
lcp@323	624	two elements, binding the first to {\tt x} and the rest to {\tt xs}.
lcp@323	625	Likewise, \verb\|{xs}\| and \verb\|{x}\| represent any set enumeration.
lcp@323	626	The parse rules only work in the order given.
lcp@323	627
lcp@323	628	\begin{warn}
lcp@332	629	The \AST{} rewriter cannot distinguish constants from variables and looks
lcp@323	630	only for names of atoms. Thus the names of {\tt Constant}s occurring in
lcp@323	631	the (internal) left-hand side of translation rules should be regarded as
lcp@323	632	\rmindex{reserved words}. Choose non-identifiers like {\tt\at Finset} or
lcp@323	633	sufficiently long and strange names. If a bound variable's name gets
lcp@323	634	rewritten, the result will be incorrect; for example, the term
lcp@323	635	\begin{ttbox}
lcp@323	636	\%empty insert. insert(x, empty)
lcp@323	637	\end{ttbox}
lcp@332	638	\par\noindent is printed as \verb\|%empty insert. {x}\|.
lcp@323	639	\end{warn}
lcp@323	640
lcp@323	641
lcp@323	642	\subsection{Example: a parse macro for dependent types}\label{prod_trans}
lcp@323	643	\index{examples!of macros}
lcp@323	644
lcp@323	645	As stated earlier, a macro rule may not introduce new {\tt Variable}s on
lcp@323	646	the right-hand side. Something like \verb\|"K(B)" => "%x. B"\| is illegal;
lcp@323	647	if allowed, it could cause variable capture. In such cases you usually
lcp@323	648	must fall back on translation functions. But a trick can make things
lcp@323	649	readable in some cases: {\em calling translation functions by parse
lcp@323	650	macros}:
lcp@323	651	\begin{ttbox}
lcp@323	652	PROD = FINSET +
lcp@323	653	consts
lcp@323	654	Pi :: "[i, i => i] => i"
lcp@323	655	"{\at}PROD" :: "[idt, i, i] => i" ("(3PROD _:_./ _)" 10)
lcp@323	656	"{\at}->" :: "[i, i] => i" ("(_ ->/ _)" [51, 50] 50)
lcp@323	657	\ttbreak
lcp@323	658	translations
lcp@323	659	"PROD x:A. B" => "Pi(A, \%x. B)"
lcp@323	660	"A -> B" => "Pi(A, _K(B))"
lcp@323	661	end
lcp@323	662	ML
lcp@323	663	val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
lcp@323	664	\end{ttbox}
lcp@323	665
lcp@323	666	Here {\tt Pi} is an internal constant for constructing general products.
lcp@323	667	Two external forms exist: the general case {\tt PROD x:A.B} and the
lcp@323	668	function space {\tt A -> B}, which abbreviates \verb\|Pi(A, %x.B)\| when {\tt B}
lcp@323	669	does not depend on~{\tt x}.
lcp@323	670
lcp@323	671	The second parse macro introduces {\tt _K(B)}, which later becomes \verb\|%x.B\|
lcp@323	672	due to a parse translation associated with \cdx{_K}. The order of the
lcp@323	673	parse rules is critical. Unfortunately there is no such trick for
lcp@323	674	printing, so we have to add a {\tt ML} section for the print translation
lcp@323	675	\ttindex{dependent_tr'}.
lcp@323	676
lcp@323	677	Recall that identifiers with a leading {\tt _} are allowed in translation
lcp@323	678	rules, but not in ordinary terms. Thus we can create \AST{}s containing
lcp@323	679	names that are not directly expressible.
lcp@323	680
lcp@323	681	The parse translation for {\tt _K} is already installed in Pure, and {\tt
lcp@323	682	dependent_tr'} is exported by the syntax module for public use. See
lcp@323	683	\S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
lcp@323	684	\index{macros\|)}\index{rewriting!syntactic\|)}
lcp@323	685
lcp@323	686
lcp@323	687	\section{Translation functions} \label{sec:tr_funs}
lcp@323	688	\index{translations\|(}
lcp@323	689	%
lcp@323	690	This section describes the translation function mechanism. By writing
lcp@323	691	\ML{} functions, you can do almost everything with terms or \AST{}s during
lcp@323	692	parsing and printing. The logic \LK\ is a good example of sophisticated
lcp@332	693	transformations between internal and external representations of sequents;
lcp@332	694	here, macros would be useless.
lcp@323	695
lcp@323	696	A full understanding of translations requires some familiarity
lcp@323	697	with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
lcp@323	698	{\tt Syntax.ast} and the encodings of types and terms as such at the various
lcp@323	699	stages of the parsing or printing process. Most users should never need to
lcp@323	700	use translation functions.
lcp@323	701
lcp@323	702	\subsection{Declaring translation functions}
lcp@323	703	There are four kinds of translation functions. Each such function is
lcp@323	704	associated with a name, which triggers calls to it. Such names can be
lcp@323	705	constants (logical or syntactic) or type constructors.
lcp@323	706
lcp@323	707	{\tt Syntax.print_syntax} displays the sets of names associated with the
lcp@323	708	translation functions of a {\tt Syntax.syntax} under
lcp@323	709	\ttindex{parse_ast_translation}, \ttindex{parse_translation},
lcp@323	710	\ttindex{print_translation} and \ttindex{print_ast_translation}. You can
lcp@323	711	add new ones via the {\tt ML} section\index{*ML section} of
lcp@323	712	a {\tt .thy} file. There may never be more than one function of the same
lcp@323	713	kind per name. Conceptually, the {\tt ML} section should appear between
lcp@323	714	{\tt consts} and {\tt translations}; newly installed translation functions
lcp@323	715	are already effective when macros and logical rules are parsed.
lcp@323	716
lcp@323	717	The {\tt ML} section is copied verbatim into the \ML\ file generated from a
lcp@323	718	{\tt .thy} file. Definitions made here are accessible as components of an
lcp@323	719	\ML\ structure; to make some definitions private, use an \ML{} {\tt local}
lcp@323	720	declaration. The {\tt ML} section may install translation functions by
lcp@323	721	declaring any of the following identifiers:
lcp@323	722	\begin{ttbox}
lcp@323	723	val parse_ast_translation : (string * (ast list -> ast)) list
lcp@323	724	val print_ast_translation : (string * (ast list -> ast)) list
lcp@323	725	val parse_translation : (string * (term list -> term)) list
lcp@323	726	val print_translation : (string * (term list -> term)) list
lcp@323	727	\end{ttbox}
lcp@323	728
lcp@323	729	\subsection{The translation strategy}
lcp@323	730	All four kinds of translation functions are treated similarly. They are
lcp@323	731	called during the transformations between parse trees, \AST{}s and terms
lcp@323	732	(recall Fig.\ts\ref{fig:parse_print}). Whenever a combination of the form
lcp@323	733	$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function
lcp@323	734	$f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
lcp@323	735	function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
lcp@323	736
lcp@323	737	For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s. A
lcp@323	738	combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
lcp@323	739	x@n}$. For term translations, the arguments are terms and a combination
lcp@323	740	has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
lcp@323	741	x@1 \ttapp \ldots \ttapp x@n$. Terms allow more sophisticated
lcp@323	742	transformations than \AST{}s do, typically involving abstractions and bound
lcp@323	743	variables.
lcp@323	744
lcp@323	745	Regardless of whether they act on terms or \AST{}s,
lcp@323	746	parse translations differ from print translations fundamentally:
lcp@323	747	\begin{description}
lcp@323	748	\item[Parse translations] are applied bottom-up. The arguments are already
lcp@323	749	in translated form. The translations must not fail; exceptions trigger
lcp@323	750	an error message.
lcp@323	751
lcp@323	752	\item[Print translations] are applied top-down. They are supplied with
lcp@323	753	arguments that are partly still in internal form. The result again
lcp@323	754	undergoes translation; therefore a print translation should not introduce
lcp@323	755	as head the very constant that invoked it. The function may raise
lcp@323	756	exception \xdx{Match} to indicate failure; in this event it has no
lcp@323	757	effect.
lcp@323	758	\end{description}
lcp@323	759
lcp@323	760	Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
lcp@323	761	\ttindex{Const} for terms --- can invoke translation functions. This
lcp@323	762	causes another difference between parsing and printing.
lcp@323	763
lcp@323	764	Parsing starts with a string and the constants are not yet identified.
lcp@323	765	Only parse tree heads create {\tt Constant}s in the resulting \AST, as
lcp@323	766	described in \S\ref{sec:astofpt}. Macros and parse \AST{} translations may
lcp@323	767	introduce further {\tt Constant}s. When the final \AST{} is converted to a
lcp@323	768	term, all {\tt Constant}s become {\tt Const}s, as described in
lcp@323	769	\S\ref{sec:termofast}.
lcp@323	770
lcp@323	771	Printing starts with a well-typed term and all the constants are known. So
lcp@323	772	all logical constants and type constructors may invoke print translations.
lcp@323	773	These, and macros, may introduce further constants.
lcp@323	774
lcp@323	775
lcp@323	776	\subsection{Example: a print translation for dependent types}
lcp@323	777	\index{examples!of translations}\indexbold{*dependent_tr'}
lcp@323	778
lcp@323	779	Let us continue the dependent type example (page~\pageref{prod_trans}) by
lcp@323	780	examining the parse translation for~\cdx{_K} and the print translation
lcp@323	781	{\tt dependent_tr'}, which are both built-in. By convention, parse
lcp@323	782	translations have names ending with {\tt _tr} and print translations have
lcp@323	783	names ending with {\tt _tr'}. Search for such names in the Isabelle
lcp@323	784	sources to locate more examples.
lcp@323	785
lcp@323	786	Here is the parse translation for {\tt _K}:
lcp@323	787	\begin{ttbox}
lcp@323	788	fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
lcp@323	789	\| k_tr ts = raise TERM("k_tr",ts);
lcp@323	790	\end{ttbox}
lcp@323	791	If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
lcp@323	792	{\tt Abs} node with a body derived from $t$. Since terms given to parse
lcp@323	793	translations are not yet typed, the type of the bound variable in the new
lcp@323	794	{\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound}
lcp@323	795	nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
lcp@323	796	a basic term manipulation function defined in {\tt Pure/term.ML}.
lcp@323	797
lcp@323	798	Here is the print translation for dependent types:
lcp@323	799	\begin{ttbox}
lcp@323	800	fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =
lcp@323	801	if 0 mem (loose_bnos B) then
lcp@323	802	let val (x', B') = variant_abs (x, dummyT, B);
lcp@323	803	in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
lcp@323	804	end
lcp@323	805	else list_comb (Const (r, dummyT) $ A $ B, ts)
lcp@323	806	\| dependent_tr' _ _ = raise Match;
lcp@323	807	\end{ttbox}
lcp@332	808	The argument {\tt (q,r)} is supplied to the curried function {\tt
lcp@332	809	dependent_tr'} by a partial application during its installation. We
lcp@332	810	can set up print translations for both {\tt Pi} and {\tt Sigma} by
lcp@332	811	including
lcp@323	812	\begin{ttbox}\index{*ML section}
lcp@323	813	val print_translation =
lcp@323	814	[("Pi", dependent_tr' ("{\at}PROD", "{\at}->")),
lcp@323	815	("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
lcp@323	816	\end{ttbox}
lcp@323	817	within the {\tt ML} section. The first of these transforms ${\tt Pi}(A,
lcp@323	818	\mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
lcp@332	819	$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not depend
lcp@323	820	on~$x$. It checks this using \ttindex{loose_bnos}, yet another function
lcp@323	821	from {\tt Pure/term.ML}. Note that $x'$ is a version of $x$ renamed away
lcp@332	822	from all names in $B$, and $B'$ is the body $B$ with {\tt Bound} nodes
lcp@332	823	referring to the {\tt Abs} node replaced by $\ttfct{Free} (x',
lcp@323	824	\mtt{dummyT})$.
lcp@323	825
lcp@323	826	We must be careful with types here. While types of {\tt Const}s are
lcp@323	827	ignored, type constraints may be printed for some {\tt Free}s and
lcp@323	828	{\tt Var}s if \ttindex{show_types} is set to {\tt true}. Variables of type
lcp@323	829	\ttindex{dummyT} are never printed with constraint, though. The line
lcp@323	830	\begin{ttbox}
lcp@323	831	let val (x', B') = variant_abs (x, dummyT, B);
lcp@323	832	\end{ttbox}\index{*variant_abs}
lcp@323	833	replaces bound variable occurrences in~$B$ by the free variable $x'$ with
lcp@323	834	type {\tt dummyT}. Only the binding occurrence of~$x'$ is given the
lcp@323	835	correct type~{\tt T}, so this is the only place where a type
lcp@323	836	constraint might appear.
lcp@323	837	\index{translations\|)}
lcp@323	838	\index{syntax!transformations\|)}

author	lcp
	Fri, 29 Jul 1994 13:30:48 +0200
changeset 499	5a54c796b808
parent 332	01b87a921967
child 864	d63b111b917a
permissions	-rw-r--r--