wneuper/isa: doc-src/Ref/syntax.tex@20e3616b2d9c (annotated)

wenzelm@30184	1
lcp@323	2	\chapter{Syntax Transformations} \label{chap:syntax}
lcp@323	3	\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
lcp@323	4	\newcommand\mtt[1]{\mbox{\tt #1}}
lcp@323	5	\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
lcp@323	6	\newcommand\Constant{\ttfct{Constant}}
lcp@323	7	\newcommand\Variable{\ttfct{Variable}}
lcp@323	8	\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
lcp@323	9	\index{syntax!transformations\|(}
lcp@323	10
lcp@323	11	This chapter is intended for experienced Isabelle users who need to define
lcp@323	12	macros or code their own translation functions. It describes the
lcp@323	13	transformations between parse trees, abstract syntax trees and terms.
lcp@323	14
lcp@323	15
lcp@323	16	\section{Abstract syntax trees} \label{sec:asts}
wenzelm@864	17	\index{ASTs\|(}
lcp@323	18
lcp@323	19	The parser, given a token list from the lexer, applies productions to yield
lcp@323	20	a parse tree\index{parse trees}. By applying some internal transformations
lcp@323	21	the parse tree becomes an abstract syntax tree, or \AST{}. Macro
lcp@323	22	expansion, further translations and finally type inference yields a
lcp@323	23	well-typed term. The printing process is the reverse, except for some
lcp@323	24	subtleties to be discussed later.
lcp@323	25
lcp@323	26	Figure~\ref{fig:parse_print} outlines the parsing and printing process.
lcp@323	27	Much of the complexity is due to the macro mechanism. Using macros, you
lcp@323	28	can specify most forms of concrete syntax without writing any \ML{} code.
lcp@323	29
lcp@323	30	\begin{figure}
lcp@323	31	\begin{center}
lcp@323	32	\begin{tabular}{cl}
lcp@323	33	string & \\
wenzelm@3108	34	$\downarrow$ & lexer, parser \\
lcp@323	35	parse tree & \\
lcp@323	36	$\downarrow$ & parse \AST{} translation \\
lcp@323	37	\AST{} & \\
lcp@323	38	$\downarrow$ & \AST{} rewriting (macros) \\
lcp@323	39	\AST{} & \\
lcp@323	40	$\downarrow$ & parse translation, type inference \\
lcp@323	41	--- well-typed term --- & \\
lcp@323	42	$\downarrow$ & print translation \\
lcp@323	43	\AST{} & \\
lcp@323	44	$\downarrow$ & \AST{} rewriting (macros) \\
lcp@323	45	\AST{} & \\
wenzelm@3108	46	$\downarrow$ & print \AST{} translation, token translation \\
lcp@323	47	string &
lcp@323	48	\end{tabular}
lcp@323	49
lcp@323	50	\end{center}
lcp@323	51	\caption{Parsing and printing}\label{fig:parse_print}
lcp@323	52	\end{figure}
lcp@323	53
lcp@323	54	Abstract syntax trees are an intermediate form between the raw parse trees
lcp@323	55	and the typed $\lambda$-terms. An \AST{} is either an atom (constant or
lcp@323	56	variable) or a list of {\em at least two\/} subtrees. Internally, they
lcp@323	57	have type \mltydx{Syntax.ast}: \index{Constant} \index{Variable}
lcp@323	58	\index{*Appl}
lcp@323	59	\begin{ttbox}
lcp@323	60	datatype ast = Constant of string
lcp@323	61	\| Variable of string
lcp@323	62	\| Appl of ast list
lcp@323	63	\end{ttbox}
lcp@323	64	%
lcp@323	65	Isabelle uses an S-expression syntax for abstract syntax trees. Constant
lcp@323	66	atoms are shown as quoted strings, variable atoms as non-quoted strings and
lcp@332	67	applications as a parenthesised list of subtrees. For example, the \AST
lcp@323	68	\begin{ttbox}
lcp@323	69	Appl [Constant "_constrain",
lcp@323	70	Appl [Constant "_abs", Variable "x", Variable "t"],
lcp@323	71	Appl [Constant "fun", Variable "'a", Variable "'b"]]
lcp@323	72	\end{ttbox}
lcp@323	73	is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
lcp@323	74	Both {\tt ()} and {\tt (f)} are illegal because they have too few
wenzelm@864	75	subtrees.
lcp@323	76
lcp@323	77	The resemblance to Lisp's S-expressions is intentional, but there are two
lcp@323	78	kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the
lcp@323	79	names {\tt Constant} and {\tt Variable} too literally; in the later
lcp@323	80	translation to terms, $\Variable x$ may become a constant, free or bound
lcp@323	81	variable, even a type constructor or class name; the actual outcome depends
lcp@323	82	on the context.
lcp@323	83
lcp@323	84	Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
lcp@323	85	application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of
lcp@323	86	application is determined later by context; it could be a type constructor
lcp@323	87	applied to types.
lcp@323	88
lcp@323	89	Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
lcp@323	90	first-order: the {\tt "_abs"} does not bind the {\tt x} in any way. Later
lcp@323	91	at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
lcp@323	92	occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
lcp@323	93	constructor \ttindex{Bound}).
lcp@323	94
lcp@323	95
wenzelm@6618	96	\section{Transforming parse trees to ASTs}\label{sec:astofpt}
lcp@323	97	\index{ASTs!made from parse trees}
lcp@323	98	\newcommand\astofpt[1]{\lbrakk#1\rbrakk}
lcp@323	99
lcp@323	100	The parse tree is the raw output of the parser. Translation functions,
lcp@323	101	called {\bf parse AST translations}\indexbold{translations!parse AST},
lcp@323	102	transform the parse tree into an abstract syntax tree.
lcp@323	103
lcp@323	104	The parse tree is constructed by nesting the right-hand sides of the
lcp@323	105	productions used to recognize the input. Such parse trees are simply lists
lcp@323	106	of tokens and constituent parse trees, the latter representing the
lcp@323	107	nonterminals of the productions. Let us refer to the actual productions in
wenzelm@864	108	the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
wenzelm@864	109	example).
lcp@323	110
lcp@323	111	Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
lcp@323	112	by stripping out delimiters and copy productions. More precisely, the
lcp@323	113	mapping $\astofpt{-}$ is derived from the productions as follows:
lcp@323	114	\begin{itemize}
lcp@323	115	\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
wenzelm@14947	116	\ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{num}, \ndx{xnum} or \ndx{xstr} token,
wenzelm@14947	117	and $s$ its associated string. Note that for {\tt xstr} this does not
wenzelm@14947	118	include the quotes.
lcp@323	119
lcp@323	120	\item Copy productions:\index{productions!copy}
lcp@323	121	$\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for
lcp@323	122	strings of delimiters, which are discarded. $P$ stands for the single
lcp@323	123	constituent that is not a delimiter; it is either a nonterminal symbol or
lcp@323	124	a name token.
lcp@323	125
lcp@323	126	\item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
lcp@323	127	Here there are no constituents other than delimiters, which are
wenzelm@864	128	discarded.
lcp@323	129
lcp@323	130	\item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
lcp@323	131	the remaining constituents $P@1$, \ldots, $P@n$ are built into an
lcp@323	132	application whose head constant is~$c$:
wenzelm@864	133	\[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
lcp@323	134	\Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
lcp@323	135	\]
lcp@323	136	\end{itemize}
lcp@323	137	Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
lcp@323	138	{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
lcp@323	139	These examples illustrate the need for further translations to make \AST{}s
lcp@323	140	closer to the typed $\lambda$-calculus. The Pure syntax provides
lcp@323	141	predefined parse \AST{} translations\index{translations!parse AST} for
lcp@323	142	ordinary applications, type applications, nested abstractions, meta
lcp@323	143	implications and function types. Figure~\ref{fig:parse_ast_tr} shows their
lcp@323	144	effect on some representative input strings.
lcp@323	145
lcp@323	146
lcp@323	147	\begin{figure}
lcp@323	148	\begin{center}
lcp@323	149	\tt\begin{tabular}{ll}
lcp@323	150	\rm input string & \rm \AST \\\hline
lcp@323	151	"f" & f \\
lcp@323	152	"'a" & 'a \\
lcp@323	153	"t == u" & ("==" t u) \\
lcp@323	154	"f(x)" & ("_appl" f x) \\
lcp@323	155	"f(x, y)" & ("_appl" f ("_args" x y)) \\
lcp@323	156	"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\
lcp@323	157	"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\
lcp@323	158	\end{tabular}
lcp@323	159	\end{center}
wenzelm@864	160	\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
lcp@323	161	\end{figure}
lcp@323	162
lcp@323	163	\begin{figure}
lcp@323	164	\begin{center}
lcp@323	165	\tt\begin{tabular}{ll}
lcp@323	166	\rm input string & \rm \AST{} \\\hline
lcp@323	167	"f(x, y, z)" & (f x y z) \\
lcp@323	168	"'a ty" & (ty 'a) \\
lcp@323	169	"('a, 'b) ty" & (ty 'a 'b) \\
lcp@323	170	"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\
lcp@323	171	"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\
lcp@323	172	"[\| P; Q; R \|] => S" & ("==>" P ("==>" Q ("==>" R S))) \\
lcp@323	173	"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
lcp@323	174	\end{tabular}
lcp@323	175	\end{center}
lcp@323	176	\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
lcp@323	177	\end{figure}
lcp@323	178
lcp@323	179	The names of constant heads in the \AST{} control the translation process.
lcp@323	180	The list of constants invoking parse \AST{} translations appears in the
wenzelm@864	181	output of {\tt print_syntax} under {\tt parse_ast_translation}.
lcp@323	182
lcp@323	183
wenzelm@6618	184	\section{Transforming ASTs to terms}\label{sec:termofast}
lcp@323	185	\index{terms!made from ASTs}
lcp@323	186	\newcommand\termofast[1]{\lbrakk#1\rbrakk}
lcp@323	187
lcp@323	188	The \AST{}, after application of macros (see \S\ref{sec:macros}), is
lcp@323	189	transformed into a term. This term is probably ill-typed since type
lcp@323	190	inference has not occurred yet. The term may contain type constraints
lcp@323	191	consisting of applications with head {\tt "_constrain"}; the second
lcp@323	192	argument is a type encoded as a term. Type inference later introduces
lcp@323	193	correct types or rejects the input.
lcp@323	194
lcp@323	195	Another set of translation functions, namely parse
lcp@323	196	translations\index{translations!parse}, may affect this process. If we
lcp@323	197	ignore parse translations for the time being, then \AST{}s are transformed
lcp@323	198	to terms by mapping \AST{} constants to constants, \AST{} variables to
lcp@323	199	schematic or free variables and \AST{} applications to applications.
lcp@323	200
lcp@323	201	More precisely, the mapping $\termofast{-}$ is defined by
lcp@323	202	\begin{itemize}
lcp@323	203	\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
lcp@323	204	\mtt{dummyT})$.
lcp@323	205
lcp@323	206	\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
lcp@323	207	\ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
lcp@323	208	the index extracted from~$xi$.
lcp@323	209
lcp@323	210	\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
lcp@323	211	\mtt{dummyT})$.
lcp@323	212
lcp@323	213	\item Function applications with $n$ arguments:
wenzelm@864	214	\[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
lcp@323	215	\termofast{f} \ttapp
lcp@323	216	\termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
lcp@323	217	\]
lcp@323	218	\end{itemize}
lcp@323	219	Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
lcp@323	220	\verb\|$\|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
lcp@323	221	while \ttindex{dummyT} stands for some dummy type that is ignored during
lcp@323	222	type inference.
lcp@323	223
lcp@323	224	So far the outcome is still a first-order term. Abstractions and bound
lcp@323	225	variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
lcp@323	226	by parse translations. Such translations are attached to {\tt "_abs"},
lcp@323	227	{\tt "!!"} and user-defined binders.
lcp@323	228
lcp@323	229
lcp@323	230	\section{Printing of terms}
lcp@323	231	\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
lcp@323	232
lcp@323	233	The output phase is essentially the inverse of the input phase. Terms are
lcp@323	234	translated via abstract syntax trees into strings. Finally the strings are
lcp@323	235	pretty printed.
lcp@323	236
lcp@323	237	Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
lcp@323	238	terms into \AST{}s. Ignoring those, the transformation maps
lcp@323	239	term constants, variables and applications to the corresponding constructs
lcp@323	240	on \AST{}s. Abstractions are mapped to applications of the special
lcp@323	241	constant {\tt _abs}.
lcp@323	242
lcp@323	243	More precisely, the mapping $\astofterm{-}$ is defined as follows:
lcp@323	244	\begin{itemize}
lcp@323	245	\item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
lcp@323	246
lcp@323	247	\item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
lcp@323	248	\tau)$.
lcp@323	249
lcp@323	250	\item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
lcp@323	251	\mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
lcp@323	252	the {\tt indexname} $(x, i)$.
lcp@323	253
lcp@323	254	\item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
lcp@323	255	of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
lcp@323	256	be obtained from~$t$ by replacing all bound occurrences of~$x$ by
lcp@323	257	the free variable $x'$. This replaces corresponding occurrences of the
lcp@323	258	constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
lcp@323	259	\mtt{dummyT})$:
wenzelm@864	260	\[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
wenzelm@864	261	\Appl{\Constant \mtt{"_abs"},
paulson@8136	262	constrain(\Variable x', \tau), \astofterm{t'}}
lcp@323	263	\]
lcp@323	264
wenzelm@864	265	\item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
lcp@323	266	The occurrence of constructor \ttindex{Bound} should never happen
lcp@323	267	when printing well-typed terms; it indicates a de Bruijn index with no
lcp@323	268	matching abstraction.
lcp@323	269
lcp@323	270	\item Where $f$ is not an application,
wenzelm@864	271	\[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
lcp@323	272	\Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
lcp@323	273	\]
lcp@323	274	\end{itemize}
lcp@323	275	%
lcp@332	276	Type constraints\index{type constraints} are inserted to allow the printing
lcp@332	277	of types. This is governed by the boolean variable \ttindex{show_types}:
lcp@323	278	\begin{itemize}
lcp@323	279	\item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
lcp@332	280	\ttindex{show_types} is set to {\tt false}.
lcp@323	281
wenzelm@864	282	\item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
wenzelm@864	283	\astofterm{\tau}}$ \ otherwise.
lcp@323	284
lcp@323	285	Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
lcp@323	286	constructors go to {\tt Constant}s; type identifiers go to {\tt
lcp@323	287	Variable}s; type applications go to {\tt Appl}s with the type
lcp@323	288	constructor as the first element. If \ttindex{show_sorts} is set to
lcp@323	289	{\tt true}, some type variables are decorated with an \AST{} encoding
lcp@323	290	of their sort.
lcp@323	291	\end{itemize}
lcp@323	292	%
lcp@323	293	The \AST{}, after application of macros (see \S\ref{sec:macros}), is
lcp@323	294	transformed into the final output string. The built-in {\bf print AST
lcp@332	295	translations}\indexbold{translations!print AST} reverse the
lcp@323	296	parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
lcp@323	297
lcp@323	298	For the actual printing process, the names attached to productions
lcp@323	299	of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
lcp@323	300	a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
lcp@323	301	$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
lcp@323	302	for~$c$. Each argument~$x@i$ is converted to a string, and put in
lcp@323	303	parentheses if its priority~$(p@i)$ requires this. The resulting strings
lcp@323	304	and their syntactic sugar (denoted by \dots{} above) are joined to make a
lcp@323	305	single string.
lcp@323	306
wenzelm@3108	307	If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments
wenzelm@3108	308	than the corresponding production, it is first split into
wenzelm@3108	309	$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$. Applications
wenzelm@3108	310	with too few arguments or with non-constant head or without a
wenzelm@3108	311	corresponding production are printed as $f(x@1, \ldots, x@l)$ or
wenzelm@3108	312	$(\alpha@1, \ldots, \alpha@l) ty$. Multiple productions associated
wenzelm@3108	313	with some name $c$ are tried in order of appearance. An occurrence of
lcp@323	314	$\Variable x$ is simply printed as~$x$.
lcp@323	315
lcp@323	316	Blanks are {\em not\/} inserted automatically. If blanks are required to
lcp@323	317	separate tokens, specify them in the mixfix declaration, possibly preceded
lcp@323	318	by a slash~({\tt/}) to allow a line break.
lcp@323	319	\index{ASTs\|)}
lcp@323	320
lcp@323	321
lcp@323	322
wenzelm@3108	323	\section{Macros: syntactic rewriting} \label{sec:macros}
wenzelm@864	324	\index{macros\|(}\index{rewriting!syntactic\|(}
lcp@323	325
lcp@323	326	Mixfix declarations alone can handle situations where there is a direct
lcp@323	327	connection between the concrete syntax and the underlying term. Sometimes
lcp@323	328	we require a more elaborate concrete syntax, such as quantifiers and list
lcp@323	329	notation. Isabelle's {\bf macros} and {\bf translation functions} can
lcp@323	330	perform translations such as
lcp@323	331	\begin{center}\tt
lcp@323	332	\begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
lcp@323	333	ALL x:A.P & Ball(A, \%x.P) \\ \relax
lcp@323	334	[x, y, z] & Cons(x, Cons(y, Cons(z, Nil)))
lcp@323	335	\end{tabular}
lcp@323	336	\end{center}
lcp@323	337	Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
lcp@323	338	are the most powerful translation mechanism but are difficult to read or
lcp@323	339	write. Macros are specified by first-order rewriting systems that operate
lcp@323	340	on abstract syntax trees. They are usually easy to read and write, and can
lcp@323	341	express all but the most obscure translations.
lcp@323	342
wenzelm@3108	343	Figure~\ref{fig:set_trans} defines a fragment of first-order logic and
wenzelm@3108	344	set theory.\footnote{This and the following theories are complete
wenzelm@3108	345	working examples, though they specify only syntax, no axioms. The
wenzelm@3108	346	file {\tt ZF/ZF.thy} presents a full set theory definition,
wenzelm@3108	347	including many macro rules.} Theory {\tt SetSyntax} declares
wenzelm@3108	348	constants for set comprehension ({\tt Collect}), replacement ({\tt
wenzelm@3108	349	Replace}) and bounded universal quantification ({\tt Ball}). Each
wenzelm@3108	350	of these binds some variables. Without additional syntax we should
wenzelm@3108	351	have to write $\forall x \in A. P$ as {\tt Ball(A,\%x.P)}, and
wenzelm@3108	352	similarly for the others.
lcp@323	353
lcp@323	354	\begin{figure}
lcp@323	355	\begin{ttbox}
wenzelm@3108	356	SetSyntax = Pure +
lcp@323	357	types
wenzelm@864	358	i o
lcp@323	359	arities
lcp@323	360	i, o :: logic
lcp@323	361	consts
clasohm@1387	362	Trueprop :: o => prop ("_" 5)
clasohm@1387	363	Collect :: [i, i => o] => i
clasohm@1387	364	Replace :: [i, [i, i] => o] => i
clasohm@1387	365	Ball :: [i, i => o] => o
wenzelm@864	366	syntax
clasohm@1387	367	"{\at}Collect" :: [idt, i, o] => i ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
clasohm@1387	368	"{\at}Replace" :: [idt, idt, i, o] => i ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
clasohm@1387	369	"{\at}Ball" :: [idt, i, o] => o ("(3ALL _:_./ _)" 10)
lcp@323	370	translations
lcp@323	371	"{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)"
lcp@323	372	"{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
lcp@323	373	"ALL x:A. P" == "Ball(A, \%x. P)"
lcp@323	374	end
lcp@323	375	\end{ttbox}
lcp@323	376	\caption{Macro example: set theory}\label{fig:set_trans}
lcp@323	377	\end{figure}
lcp@323	378
wenzelm@864	379	The theory specifies a variable-binding syntax through additional productions
wenzelm@864	380	that have mixfix declarations. Each non-copy production must specify some
wenzelm@864	381	constant, which is used for building \AST{}s. \index{constants!syntactic} The
wenzelm@864	382	additional constants are decorated with {\tt\at} to stress their purely
wenzelm@864	383	syntactic purpose; they may not occur within the final well-typed terms,
wenzelm@864	384	being declared as {\tt syntax} rather than {\tt consts}.
lcp@323	385
lcp@323	386	The translations cause the replacement of external forms by internal forms
lcp@323	387	after parsing, and vice versa before printing of terms. As a specification
lcp@323	388	of the set theory notation, they should be largely self-explanatory. The
lcp@323	389	syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
lcp@323	390	appear implicitly in the macro rules via their mixfix forms.
lcp@323	391
lcp@323	392	Macros can define variable-binding syntax because they operate on \AST{}s,
lcp@323	393	which have no inbuilt notion of bound variable. The macro variables {\tt
lcp@323	394	x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
lcp@323	395	in this case bound variables. The macro variables {\tt P} and~{\tt Q}
lcp@323	396	range over formulae containing bound variable occurrences.
lcp@323	397
lcp@323	398	Other applications of the macro system can be less straightforward, and
lcp@323	399	there are peculiarities. The rest of this section will describe in detail
lcp@323	400	how Isabelle macros are preprocessed and applied.
lcp@323	401
lcp@323	402
lcp@323	403	\subsection{Specifying macros}
lcp@323	404	Macros are basically rewrite rules on \AST{}s. But unlike other macro
lcp@323	405	systems found in programming languages, Isabelle's macros work in both
lcp@323	406	directions. Therefore a syntax contains two lists of rewrites: one for
lcp@323	407	parsing and one for printing.
lcp@323	408
lcp@323	409	\index{*translations section}
lcp@323	410	The {\tt translations} section specifies macros. The syntax for a macro is
lcp@323	411	\[ (root)\; string \quad
lcp@323	412	\left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
lcp@323	413	\right\} \quad
wenzelm@864	414	(root)\; string
lcp@323	415	\]
lcp@323	416	%
lcp@323	417	This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
lcp@323	418	({\tt ==}). The two strings specify the left and right-hand sides of the
lcp@323	419	macro rule. The $(root)$ specification is optional; it specifies the
lcp@323	420	nonterminal for parsing the $string$ and if omitted defaults to {\tt
lcp@323	421	logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions:
lcp@323	422	\begin{itemize}
lcp@323	423	\item Rules must be left linear: $l$ must not contain repeated variables.
lcp@323	424
lcp@323	425	\item Every variable in~$r$ must also occur in~$l$.
lcp@323	426	\end{itemize}
lcp@323	427
wenzelm@3108	428	Macro rules may refer to any syntax from the parent theories. They
wenzelm@3108	429	may also refer to anything defined before the current {\tt
lcp@323	430	translations} section --- including any mixfix declarations.
lcp@323	431
lcp@323	432	Upon declaration, both sides of the macro rule undergo parsing and parse
lcp@323	433	\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
lcp@323	434	macro expansion. The lexer runs in a different mode that additionally
lcp@323	435	accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
lcp@323	436	{\tt _K}). Thus, a constant whose name starts with an underscore can
lcp@323	437	appear in macro rules but not in ordinary terms.
lcp@323	438
lcp@323	439	Some atoms of the macro rule's \AST{} are designated as constants for
lcp@323	440	matching. These are all names that have been declared as classes, types or
wenzelm@864	441	constants (logical and syntactic).
lcp@323	442
wenzelm@3108	443	The result of this preprocessing is two lists of macro rules, each
wenzelm@3108	444	stored as a pair of \AST{}s. They can be viewed using {\tt
wenzelm@3108	445	print_syntax} (sections \ttindex{parse_rules} and
wenzelm@3108	446	\ttindex{print_rules}). For theory~{\tt SetSyntax} of
wenzelm@3108	447	Fig.~\ref{fig:set_trans} these are
lcp@323	448	\begin{ttbox}
lcp@323	449	parse_rules:
lcp@323	450	("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P))
lcp@323	451	("{\at}Replace" y x A Q) -> ("Replace" A ("_abs" x ("_abs" y Q)))
lcp@323	452	("{\at}Ball" x A P) -> ("Ball" A ("_abs" x P))
lcp@323	453	print_rules:
lcp@323	454	("Collect" A ("_abs" x P)) -> ("{\at}Collect" x A P)
lcp@323	455	("Replace" A ("_abs" x ("_abs" y Q))) -> ("{\at}Replace" y x A Q)
lcp@323	456	("Ball" A ("_abs" x P)) -> ("{\at}Ball" x A P)
lcp@323	457	\end{ttbox}
lcp@323	458
lcp@323	459	\begin{warn}
lcp@323	460	Avoid choosing variable names that have previously been used as
lcp@323	461	constants, types or type classes; the {\tt consts} section in the output
wenzelm@864	462	of {\tt print_syntax} lists all such names. If a macro rule works
lcp@323	463	incorrectly, inspect its internal form as shown above, recalling that
lcp@323	464	constants appear as quoted strings and variables without quotes.
lcp@323	465	\end{warn}
lcp@323	466
lcp@323	467	\begin{warn}
lcp@323	468	If \ttindex{eta_contract} is set to {\tt true}, terms will be
lcp@323	469	$\eta$-contracted {\em before\/} the \AST{} rewriter sees them. Thus some
lcp@323	470	abstraction nodes needed for print rules to match may vanish. For example,
lcp@332	471	\verb\|Ball(A, %x. P(x))\| contracts to {\tt Ball(A, P)}; the print rule does
lcp@323	472	not apply and the output will be {\tt Ball(A, P)}. This problem would not
lcp@323	473	occur if \ML{} translation functions were used instead of macros (as is
lcp@323	474	done for binder declarations).
lcp@323	475	\end{warn}
lcp@323	476
lcp@323	477
lcp@323	478	\begin{warn}
lcp@323	479	Another trap concerns type constraints. If \ttindex{show_types} is set to
lcp@323	480	{\tt true}, bound variables will be decorated by their meta types at the
lcp@323	481	binding place (but not at occurrences in the body). Matching with
lcp@323	482	\verb\|Collect(A, %x. P)\| binds {\tt x} to something like {\tt ("_constrain" y
lcp@323	483	"i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to
wenzelm@864	484	appear in the external form, say \verb\|{y::i:A::i. P::o}\|.
lcp@323	485
lcp@323	486	To allow such constraints to be re-read, your syntax should specify bound
lcp@323	487	variables using the nonterminal~\ndx{idt}. This is the case in our
wenzelm@3108	488	example. Choosing {\tt id} instead of {\tt idt} is a common error.
lcp@323	489	\end{warn}
lcp@323	490
lcp@323	491
lcp@323	492
lcp@323	493	\subsection{Applying rules}
lcp@323	494	As a term is being parsed or printed, an \AST{} is generated as an
lcp@323	495	intermediate form (recall Fig.\ts\ref{fig:parse_print}). The \AST{} is
lcp@332	496	normalised by applying macro rules in the manner of a traditional term
lcp@323	497	rewriting system. We first examine how a single rule is applied.
lcp@323	498
lcp@332	499	Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some
lcp@323	500	translation rule. A subtree~$u$ of $t$ is a {\bf redex} if it is an
lcp@323	501	instance of~$l$; in this case $l$ is said to {\bf match}~$u$. A redex
lcp@323	502	matched by $l$ may be replaced by the corresponding instance of~$r$, thus
lcp@323	503	{\bf rewriting} the \AST~$t$. Matching requires some notion of {\bf
lcp@323	504	place-holders} that may occur in rule patterns but not in ordinary
lcp@323	505	\AST{}s; {\tt Variable} atoms serve this purpose.
lcp@323	506
lcp@323	507	The matching of the object~$u$ by the pattern~$l$ is performed as follows:
lcp@323	508	\begin{itemize}
lcp@323	509	\item Every constant matches itself.
lcp@323	510
lcp@323	511	\item $\Variable x$ in the object matches $\Constant x$ in the pattern.
lcp@323	512	This point is discussed further below.
lcp@323	513
lcp@323	514	\item Every \AST{} in the object matches $\Variable x$ in the pattern,
lcp@323	515	binding~$x$ to~$u$.
lcp@323	516
lcp@323	517	\item One application matches another if they have the same number of
lcp@323	518	subtrees and corresponding subtrees match.
lcp@323	519
lcp@323	520	\item In every other case, matching fails. In particular, {\tt
lcp@323	521	Constant}~$x$ can only match itself.
lcp@323	522	\end{itemize}
lcp@323	523	A successful match yields a substitution that is applied to~$r$, generating
lcp@323	524	the instance that replaces~$u$.
lcp@323	525
lcp@323	526	The second case above may look odd. This is where {\tt Variable}s of
lcp@323	527	non-rule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not
lcp@323	528	far removed from parse trees; at this level it is not yet known which
lcp@323	529	identifiers will become constants, bounds, frees, types or classes. As
lcp@323	530	\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
wenzelm@864	531	{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},
wenzelm@14947	532	\ndx{tvar}, \ndx{num}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s. On the other
wenzelm@864	533	hand, when \AST{}s generated from terms for printing, all constants and type
wenzelm@864	534	constructors become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s may
wenzelm@864	535	contain a messy mixture of {\tt Variable}s and {\tt Constant}s. This is
lcp@323	536	insignificant at macro level because matching treats them alike.
lcp@323	537
lcp@323	538	Because of this behaviour, different kinds of atoms with the same name are
lcp@323	539	indistinguishable, which may make some rules prone to misbehaviour. Example:
lcp@323	540	\begin{ttbox}
lcp@323	541	types
lcp@323	542	Nil
lcp@323	543	consts
clasohm@1387	544	Nil :: 'a list
wenzelm@864	545	syntax
clasohm@1387	546	"[]" :: 'a list ("[]")
lcp@323	547	translations
lcp@323	548	"[]" == "Nil"
lcp@323	549	\end{ttbox}
lcp@323	550	The term {\tt Nil} will be printed as {\tt []}, just as expected.
lcp@323	551	The term \verb\|%Nil.t\| will be printed as \verb\|%[].t\|, which might not be
wenzelm@864	552	expected! Guess how type~{\tt Nil} is printed?
lcp@323	553
wenzelm@14893	554	Normalizing an \AST{} involves repeatedly applying macro rules until none are
wenzelm@14893	555	applicable. Macro rules are chosen in order of appearance in the theory
wenzelm@14893	556	definitions. You can watch the normalization of \AST{}s during parsing and
wenzelm@14893	557	printing by setting \ttindex{Syntax.trace_ast} to {\tt true}.\index{tracing!of
wenzelm@14893	558	macros} The information displayed when tracing includes the \AST{} before
wenzelm@14893	559	normalization ({\tt pre}), redexes with results ({\tt rewrote}), the normal
wenzelm@14893	560	form finally reached ({\tt post}) and some statistics ({\tt normalize}).
lcp@323	561
lcp@323	562	\subsection{Example: the syntax of finite sets}
lcp@323	563	\index{examples!of macros}
lcp@323	564
lcp@323	565	This example demonstrates the use of recursive macros to implement a
lcp@323	566	convenient notation for finite sets.
lcp@323	567	\index{empty constant}\index{insert constant}\index{{}@\verb'{}' symbol}
lcp@323	568	\index{"@Enum@{\tt\at Enum} constant}
lcp@323	569	\index{"@Finset@{\tt\at Finset} constant}
lcp@323	570	\begin{ttbox}
wenzelm@3108	571	FinSyntax = SetSyntax +
lcp@323	572	types
lcp@323	573	is
wenzelm@864	574	syntax
clasohm@1387	575	"" :: i => is ("_")
clasohm@1387	576	"{\at}Enum" :: [i, is] => is ("_,/ _")
wenzelm@864	577	consts
clasohm@1387	578	empty :: i ("{\ttlbrace}{\ttrbrace}")
clasohm@1387	579	insert :: [i, i] => i
wenzelm@864	580	syntax
clasohm@1387	581	"{\at}Finset" :: is => i ("{\ttlbrace}(_){\ttrbrace}")
lcp@323	582	translations
lcp@323	583	"{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})"
lcp@323	584	"{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})"
lcp@323	585	end
lcp@323	586	\end{ttbox}
lcp@323	587	Finite sets are internally built up by {\tt empty} and {\tt insert}. The
lcp@323	588	declarations above specify \verb\|{x, y, z}\| as the external representation
lcp@323	589	of
lcp@323	590	\begin{ttbox}
lcp@323	591	insert(x, insert(y, insert(z, empty)))
lcp@323	592	\end{ttbox}
lcp@323	593	The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
lcp@323	594	i} separated by commas. The mixfix declaration \hbox{\verb\|"_,/ _"\|}
lcp@323	595	allows a line break after the comma for \rmindex{pretty printing}; if no
lcp@323	596	line break is required then a space is printed instead.
lcp@323	597
lcp@323	598	The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
wenzelm@864	599	declaration. Hence {\tt is} is not a logical type and may be used safely as
wenzelm@864	600	a new nonterminal for custom syntax. The nonterminal~{\tt is} can later be
paulson@3485	601	re-used for other enumerations of type~{\tt i} like lists or tuples. If we
wenzelm@864	602	had needed polymorphic enumerations, we could have used the predefined
wenzelm@864	603	nonterminal symbol \ndx{args} and skipped this part altogether.
lcp@323	604
lcp@323	605	\index{"@Finset@{\tt\at Finset} constant}
lcp@323	606	Next follows {\tt empty}, which is already equipped with its syntax
lcp@323	607	\verb\|{}\|, and {\tt insert} without concrete syntax. The syntactic
lcp@323	608	constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
lcp@323	609	i} enclosed in curly braces. Remember that a pair of parentheses, as in
lcp@323	610	\verb\|"{(_)}"\|, specifies a block of indentation for pretty printing.
lcp@323	611
lcp@323	612	The translations may look strange at first. Macro rules are best
lcp@323	613	understood in their internal forms:
lcp@323	614	\begin{ttbox}
lcp@323	615	parse_rules:
lcp@323	616	("{\at}Finset" ("{\at}Enum" x xs)) -> ("insert" x ("{\at}Finset" xs))
lcp@323	617	("{\at}Finset" x) -> ("insert" x "empty")
lcp@323	618	print_rules:
lcp@323	619	("insert" x ("{\at}Finset" xs)) -> ("{\at}Finset" ("{\at}Enum" x xs))
lcp@323	620	("insert" x "empty") -> ("{\at}Finset" x)
lcp@323	621	\end{ttbox}
wenzelm@864	622	This shows that \verb\|{x,xs}\| indeed matches any set enumeration of at least
lcp@323	623	two elements, binding the first to {\tt x} and the rest to {\tt xs}.
wenzelm@864	624	Likewise, \verb\|{xs}\| and \verb\|{x}\| represent any set enumeration.
lcp@323	625	The parse rules only work in the order given.
lcp@323	626
lcp@323	627	\begin{warn}
lcp@332	628	The \AST{} rewriter cannot distinguish constants from variables and looks
lcp@323	629	only for names of atoms. Thus the names of {\tt Constant}s occurring in
lcp@323	630	the (internal) left-hand side of translation rules should be regarded as
lcp@323	631	\rmindex{reserved words}. Choose non-identifiers like {\tt\at Finset} or
lcp@323	632	sufficiently long and strange names. If a bound variable's name gets
lcp@323	633	rewritten, the result will be incorrect; for example, the term
lcp@323	634	\begin{ttbox}
lcp@323	635	\%empty insert. insert(x, empty)
lcp@323	636	\end{ttbox}
wenzelm@864	637	\par\noindent is incorrectly printed as \verb\|%empty insert. {x}\|.
lcp@323	638	\end{warn}
lcp@323	639
lcp@323	640
lcp@323	641	\subsection{Example: a parse macro for dependent types}\label{prod_trans}
lcp@323	642	\index{examples!of macros}
lcp@323	643
lcp@323	644	As stated earlier, a macro rule may not introduce new {\tt Variable}s on
wenzelm@864	645	the right-hand side. Something like \verb\|"K(B)" => "%x.B"\| is illegal;
lcp@323	646	if allowed, it could cause variable capture. In such cases you usually
lcp@323	647	must fall back on translation functions. But a trick can make things
wenzelm@864	648	readable in some cases: {\em calling\/} translation functions by parse
wenzelm@864	649	macros:
lcp@323	650	\begin{ttbox}
wenzelm@3135	651	ProdSyntax = SetSyntax +
lcp@323	652	consts
clasohm@1387	653	Pi :: [i, i => i] => i
wenzelm@864	654	syntax
clasohm@1387	655	"{\at}PROD" :: [idt, i, i] => i ("(3PROD _:_./ _)" 10)
clasohm@1387	656	"{\at}->" :: [i, i] => i ("(_ ->/ _)" [51, 50] 50)
lcp@323	657	\ttbreak
lcp@323	658	translations
lcp@323	659	"PROD x:A. B" => "Pi(A, \%x. B)"
lcp@323	660	"A -> B" => "Pi(A, _K(B))"
lcp@323	661	end
lcp@323	662	ML
lcp@323	663	val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
lcp@323	664	\end{ttbox}
lcp@323	665
wenzelm@864	666	Here {\tt Pi} is a logical constant for constructing general products.
lcp@323	667	Two external forms exist: the general case {\tt PROD x:A.B} and the
lcp@323	668	function space {\tt A -> B}, which abbreviates \verb\|Pi(A, %x.B)\| when {\tt B}
lcp@323	669	does not depend on~{\tt x}.
lcp@323	670
wenzelm@864	671	The second parse macro introduces {\tt _K(B)}, which later becomes
wenzelm@864	672	\verb\|%x.B\| due to a parse translation associated with \cdx{_K}.
wenzelm@864	673	Unfortunately there is no such trick for printing, so we have to add a {\tt
wenzelm@864	674	ML} section for the print translation \ttindex{dependent_tr'}.
lcp@323	675
lcp@323	676	Recall that identifiers with a leading {\tt _} are allowed in translation
lcp@323	677	rules, but not in ordinary terms. Thus we can create \AST{}s containing
lcp@323	678	names that are not directly expressible.
lcp@323	679
paulson@8136	680	The parse translation for {\tt _K} is already installed in Pure, and the
paulson@8136	681	function {\tt dependent_tr'} is exported by the syntax module for public use.
paulson@8136	682	See \S\ref{sec:tr_funs} below for more of the arcane lore of translation
paulson@8136	683	functions. \index{macros\|)}\index{rewriting!syntactic\|)}
lcp@323	684
lcp@323	685
lcp@323	686	\section{Translation functions} \label{sec:tr_funs}
wenzelm@864	687	\index{translations\|(}
lcp@323	688	%
wenzelm@9695	689	This section describes the translation function mechanism. By writing \ML{}
wenzelm@9695	690	functions, you can do almost everything to terms or \AST{}s during parsing and
wenzelm@9695	691	printing. The logic LK is a good example of sophisticated transformations
wenzelm@9695	692	between internal and external representations of sequents; here, macros would
wenzelm@9695	693	be useless.
lcp@323	694
lcp@323	695	A full understanding of translations requires some familiarity
lcp@323	696	with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
lcp@323	697	{\tt Syntax.ast} and the encodings of types and terms as such at the various
lcp@323	698	stages of the parsing or printing process. Most users should never need to
lcp@323	699	use translation functions.
lcp@323	700
lcp@323	701	\subsection{Declaring translation functions}
wenzelm@3108	702	There are four kinds of translation functions, with one of these
wenzelm@3108	703	coming in two variants. Each such function is associated with a name,
wenzelm@3108	704	which triggers calls to it. Such names can be constants (logical or
wenzelm@3108	705	syntactic) or type constructors.
lcp@323	706
wenzelm@6343	707	Function {\tt print_syntax} displays the sets of names associated with the
wenzelm@6343	708	translation functions of a theory under \texttt{parse_ast_translation}, etc.
wenzelm@6343	709	You can add new ones via the {\tt ML} section\index{*ML section} of a theory
wenzelm@6343	710	definition file. Even though the {\tt ML} section is the very last part of
wenzelm@6343	711	the file, newly installed translation functions are already effective when
wenzelm@6343	712	processing all of the preceding sections.
lcp@323	713
wenzelm@3108	714	The {\tt ML} section's contents are simply copied verbatim near the
wenzelm@3108	715	beginning of the \ML\ file generated from a theory definition file.
wenzelm@3108	716	Definitions made here are accessible as components of an \ML\
wenzelm@3108	717	structure; to make some parts private, use an \ML{} {\tt local}
wenzelm@3108	718	declaration. The {\ML} code may install translation functions by
wenzelm@3108	719	declaring any of the following identifiers:
lcp@323	720	\begin{ttbox}
wenzelm@3108	721	val parse_ast_translation : (string * (ast list -> ast)) list
wenzelm@3108	722	val print_ast_translation : (string * (ast list -> ast)) list
wenzelm@3108	723	val parse_translation : (string * (term list -> term)) list
wenzelm@3108	724	val print_translation : (string * (term list -> term)) list
wenzelm@4375	725	val typed_print_translation :
wenzelm@4375	726	(string * (bool -> typ -> term list -> term)) list
lcp@323	727	\end{ttbox}
lcp@323	728
lcp@323	729	\subsection{The translation strategy}
wenzelm@3108	730	The different kinds of translation functions are called during the
wenzelm@3108	731	transformations between parse trees, \AST{}s and terms (recall
wenzelm@3108	732	Fig.\ts\ref{fig:parse_print}). Whenever a combination of the form
wenzelm@3108	733	$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation
wenzelm@3108	734	function $f$ of appropriate kind exists for $c$, the result is
wenzelm@3108	735	computed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
lcp@323	736
wenzelm@3108	737	For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.
wenzelm@3108	738	A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1,
wenzelm@3108	739	\ldots, x@n}$. For term translations, the arguments are terms and a
wenzelm@3108	740	combination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const}
wenzelm@3108	741	(c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$. Terms allow more
wenzelm@3108	742	sophisticated transformations than \AST{}s do, typically involving
wenzelm@3108	743	abstractions and bound variables. {\em Typed} print translations may
wenzelm@4375	744	even peek at the type $\tau$ of the constant they are invoked on; they
wenzelm@4375	745	are also passed the current value of the \ttindex{show_sorts} flag.
lcp@323	746
wenzelm@3108	747	Regardless of whether they act on terms or \AST{}s, translation
wenzelm@3108	748	functions called during the parsing process differ from those for
wenzelm@3108	749	printing more fundamentally in their overall behaviour:
lcp@323	750	\begin{description}
wenzelm@6343	751	\item[Parse translations] are applied bottom-up. The arguments are already in
wenzelm@6343	752	translated form. The translations must not fail; exceptions trigger an
wenzelm@6343	753	error message. There may never be more than one function associated with
wenzelm@6343	754	any syntactic name.
wenzelm@6343	755
lcp@323	756	\item[Print translations] are applied top-down. They are supplied with
lcp@323	757	arguments that are partly still in internal form. The result again
wenzelm@6343	758	undergoes translation; therefore a print translation should not introduce as
wenzelm@6343	759	head the very constant that invoked it. The function may raise exception
wenzelm@6343	760	\xdx{Match} to indicate failure; in this event it has no effect. Multiple
wenzelm@6343	761	functions associated with some syntactic name are tried in an unspecified
wenzelm@6343	762	order.
lcp@323	763	\end{description}
lcp@323	764
lcp@323	765	Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
lcp@323	766	\ttindex{Const} for terms --- can invoke translation functions. This
lcp@323	767	causes another difference between parsing and printing.
lcp@323	768
lcp@323	769	Parsing starts with a string and the constants are not yet identified.
lcp@323	770	Only parse tree heads create {\tt Constant}s in the resulting \AST, as
lcp@323	771	described in \S\ref{sec:astofpt}. Macros and parse \AST{} translations may
lcp@323	772	introduce further {\tt Constant}s. When the final \AST{} is converted to a
lcp@323	773	term, all {\tt Constant}s become {\tt Const}s, as described in
lcp@323	774	\S\ref{sec:termofast}.
lcp@323	775
lcp@323	776	Printing starts with a well-typed term and all the constants are known. So
lcp@323	777	all logical constants and type constructors may invoke print translations.
lcp@323	778	These, and macros, may introduce further constants.
lcp@323	779
lcp@323	780
lcp@323	781	\subsection{Example: a print translation for dependent types}
lcp@323	782	\index{examples!of translations}\indexbold{*dependent_tr'}
lcp@323	783
lcp@323	784	Let us continue the dependent type example (page~\pageref{prod_trans}) by
lcp@323	785	examining the parse translation for~\cdx{_K} and the print translation
lcp@323	786	{\tt dependent_tr'}, which are both built-in. By convention, parse
lcp@323	787	translations have names ending with {\tt _tr} and print translations have
lcp@323	788	names ending with {\tt _tr'}. Search for such names in the Isabelle
lcp@323	789	sources to locate more examples.
lcp@323	790
lcp@323	791	Here is the parse translation for {\tt _K}:
lcp@323	792	\begin{ttbox}
lcp@323	793	fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
wenzelm@864	794	\| k_tr ts = raise TERM ("k_tr", ts);
lcp@323	795	\end{ttbox}
lcp@323	796	If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
lcp@323	797	{\tt Abs} node with a body derived from $t$. Since terms given to parse
lcp@323	798	translations are not yet typed, the type of the bound variable in the new
lcp@323	799	{\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound}
lcp@323	800	nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
lcp@323	801	a basic term manipulation function defined in {\tt Pure/term.ML}.
lcp@323	802
lcp@323	803	Here is the print translation for dependent types:
lcp@323	804	\begin{ttbox}
wenzelm@864	805	fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
lcp@323	806	if 0 mem (loose_bnos B) then
wenzelm@3108	807	let val (x', B') = Syntax.variant_abs' (x, dummyT, B) in
wenzelm@3108	808	list_comb
paulson@8136	809	(Const (q,dummyT) $
paulson@8136	810	Syntax.mark_boundT (x',{\thinspace}T) $ A $ B', ts)
lcp@323	811	end
lcp@323	812	else list_comb (Const (r, dummyT) $ A $ B, ts)
lcp@323	813	\| dependent_tr' _ _ = raise Match;
lcp@323	814	\end{ttbox}
wenzelm@3135	815	The argument {\tt (q,{\thinspace}r)} is supplied to the curried function {\tt
wenzelm@3108	816	dependent_tr'} by a partial application during its installation.
wenzelm@3108	817	For example, we could set up print translations for both {\tt Pi} and
wenzelm@3108	818	{\tt Sigma} by including
lcp@323	819	\begin{ttbox}\index{*ML section}
lcp@323	820	val print_translation =
lcp@323	821	[("Pi", dependent_tr' ("{\at}PROD", "{\at}->")),
lcp@323	822	("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
lcp@323	823	\end{ttbox}
wenzelm@3108	824	within the {\tt ML} section. The first of these transforms ${\tt
wenzelm@3108	825	Pi}(A, \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
wenzelm@3108	826	$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not
wenzelm@3108	827	depend on~$x$. It checks this using \ttindex{loose_bnos}, yet another
wenzelm@3108	828	function from {\tt Pure/term.ML}. Note that $x'$ is a version of $x$
wenzelm@3108	829	renamed away from all names in $B$, and $B'$ is the body $B$ with {\tt
wenzelm@3108	830	Bound} nodes referring to the {\tt Abs} node replaced by
wenzelm@3108	831	$\ttfct{Free} (x', \mtt{dummyT})$ (but marked as representing a bound
wenzelm@3108	832	variable).
lcp@323	833
lcp@323	834	We must be careful with types here. While types of {\tt Const}s are
lcp@323	835	ignored, type constraints may be printed for some {\tt Free}s and
lcp@323	836	{\tt Var}s if \ttindex{show_types} is set to {\tt true}. Variables of type
lcp@323	837	\ttindex{dummyT} are never printed with constraint, though. The line
lcp@323	838	\begin{ttbox}
wenzelm@3108	839	let val (x', B') = Syntax.variant_abs' (x, dummyT, B);
wenzelm@3108	840	\end{ttbox}\index{*Syntax.variant_abs'}
lcp@323	841	replaces bound variable occurrences in~$B$ by the free variable $x'$ with
lcp@323	842	type {\tt dummyT}. Only the binding occurrence of~$x'$ is given the
lcp@323	843	correct type~{\tt T}, so this is the only place where a type
wenzelm@864	844	constraint might appear.
wenzelm@3108	845
wenzelm@3108	846	Also note that we are responsible to mark free identifiers that
paulson@3485	847	actually represent bound variables. This is achieved by
wenzelm@3108	848	\ttindex{Syntax.variant_abs'} and \ttindex{Syntax.mark_boundT} above.
wenzelm@3108	849	Failing to do so may cause these names to be printed in the wrong
wenzelm@3108	850	style. \index{translations\|)} \index{syntax!transformations\|)}
wenzelm@3108	851
wenzelm@3108	852
wenzelm@3108	853	\section{Token translations} \label{sec:tok_tr}
wenzelm@3108	854	\index{token translations\|(}
wenzelm@3108	855	%
wenzelm@3108	856	Isabelle's meta-logic features quite a lot of different kinds of
wenzelm@3108	857	identifiers, namely {\em class}, {\em tfree}, {\em tvar}, {\em free},
paulson@3485	858	{\em bound}, {\em var}. One might want to have these printed in
wenzelm@3108	859	different styles, e.g.\ in bold or italic, or even transcribed into
wenzelm@3108	860	something more readable like $\alpha, \alpha', \beta$ instead of {\tt
paulson@3485	861	'a}, {\tt 'aa}, {\tt 'b} for type variables. Token translations
wenzelm@3108	862	provide a means to such ends, enabling the user to install certain
wenzelm@3108	863	\ML{} functions associated with any logical \rmindex{token class} and
wenzelm@3108	864	depending on some \rmindex{print mode}.
wenzelm@3108	865
wenzelm@3108	866	The logical class of identifiers can not necessarily be determined by
paulson@3485	867	its syntactic category, though. For example, consider free vs.\ bound
paulson@3485	868	variables. So Isabelle's pretty printing mechanism, starting from
wenzelm@3108	869	fully typed terms, has to be careful to preserve this additional
wenzelm@3108	870	information\footnote{This is done by marking atoms in abstract syntax
paulson@3485	871	trees appropriately. The marks are actually visible by print
wenzelm@3108	872	translation functions -- they are just special constants applied to
wenzelm@3108	873	atomic asts, for example \texttt{("_bound" x)}.}. In particular,
wenzelm@3108	874	user-supplied print translation functions operating on terms have to
paulson@3485	875	be well-behaved in this respect. Free identifiers introduced to
wenzelm@3108	876	represent bound variables have to be marked appropriately (cf.\ the
wenzelm@3108	877	example at the end of \S\ref{sec:tr_funs}).
wenzelm@3108	878
wenzelm@3108	879	\medskip Token translations may be installed by declaring the
wenzelm@6343	880	\ttindex{token_translation} value within the \texttt{ML} section of a theory
wenzelm@6343	881	definition file:
wenzelm@3108	882	\begin{ttbox}
paulson@8136	883	val token_translation:
paulson@8136	884	(string * string * (string -> string * real)) list
wenzelm@8701	885	\end{ttbox}
wenzelm@6343	886	The elements of this list are of the form $(m, c, f)$, where $m$ is a print
wenzelm@6343	887	mode identifier, $c$ a token class, and $f\colon string \to string \times
wenzelm@6343	888	real$ the actual translation function. Assuming that $x$ is of identifier
wenzelm@6343	889	class $c$, and print mode $m$ is the first (active) mode providing some
wenzelm@6343	890	translation for $c$, then $x$ is output according to $f(x) = (x', len)$.
wenzelm@6343	891	Thereby $x'$ is the modified identifier name and $len$ its visual length in
wenzelm@6343	892	terms of characters (e.g.\ length 1.0 would correspond to $1/2$\,em in
wenzelm@6343	893	\LaTeX). Thus $x'$ may include non-printing parts like control sequences or
wenzelm@6343	894	markup information for typesetting systems.
wenzelm@3108	895
wenzelm@3108	896
wenzelm@3108	897	\index{token translations\|)}
wenzelm@5371	898
wenzelm@5371	899
wenzelm@5371	900	%%% Local Variables:
wenzelm@5371	901	%%% mode: latex
wenzelm@5371	902	%%% TeX-master: "ref"
wenzelm@5371	903	%%% End:

author	Walther Neuper <neuper@ist.tugraz.at>
	Thu, 12 Aug 2010 15:03:34 +0200
branch	isac-from-Isabelle2009-2
changeset 37913	20e3616b2d9c
parent 30184	37969710e61f
child 43760	e87888b4152f
permissions	-rw-r--r--