wneuper/isa: doc-src/Ref/defining.tex@20e3616b2d9c (annotated)

wenzelm@30184	1
lcp@320	2	\chapter{Defining Logics} \label{Defining-Logics}
lcp@320	3
lcp@320	4	\section{Mixfix declarations} \label{sec:mixfix}
wenzelm@864	5	\index{mixfix declarations\|(}
lcp@320	6
lcp@320	7	When defining a theory, you declare new constants by giving their names,
lcp@320	8	their type, and an optional {\bf mixfix annotation}. Mixfix annotations
lcp@320	9	allow you to extend Isabelle's basic $\lambda$-calculus syntax with
lcp@320	10	readable notation. They can express any context-free priority grammar.
lcp@320	11	Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
wenzelm@864	12	general than the priority declarations of \ML\ and Prolog.
lcp@320	13
lcp@320	14	A mixfix annotation defines a production of the priority grammar. It
lcp@320	15	describes the concrete syntax, the translation to abstract syntax, and the
lcp@320	16	pretty printing. Special case annotations provide a simple means of
wenzelm@864	17	specifying infix operators and binders.
lcp@320	18
lcp@320	19	\subsection{The general mixfix form}
lcp@320	20	Here is a detailed account of mixfix declarations. Suppose the following
wenzelm@864	21	line occurs within a {\tt consts} or {\tt syntax} section of a {\tt .thy}
wenzelm@864	22	file:
lcp@320	23	\begin{center}
lcp@320	24	{\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}
lcp@320	25	\end{center}
lcp@332	26	This constant declaration and mixfix annotation are interpreted as follows:
lcp@320	27	\begin{itemize}\index{productions}
lcp@320	28	\item The string {\tt $c$} is the name of the constant associated with the
lcp@320	29	production; unless it is a valid identifier, it must be enclosed in
lcp@320	30	quotes. If $c$ is empty (given as~{\tt ""}) then this is a copy
lcp@320	31	production.\index{productions!copy} Otherwise, parsing an instance of the
lcp@320	32	phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
lcp@320	33	$a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
lcp@320	34	argument.
lcp@320	35
wenzelm@864	36	\item The constant $c$, if non-empty, is declared to have type $\sigma$
wenzelm@864	37	({\tt consts} section only).
lcp@320	38
lcp@320	39	\item The string $template$ specifies the right-hand side of
lcp@320	40	the production. It has the form
wenzelm@864	41	\[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \]
lcp@320	42	where each occurrence of {\tt_} denotes an argument position and
lcp@320	43	the~$w@i$ do not contain~{\tt _}. (If you want a literal~{\tt _} in
lcp@320	44	the concrete syntax, you must escape it as described below.) The $w@i$
wenzelm@864	45	may consist of \rmindex{delimiters}, spaces or
lcp@320	46	\rmindex{pretty printing} annotations (see below).
lcp@320	47
lcp@320	48	\item The type $\sigma$ specifies the production's nonterminal symbols
lcp@320	49	(or name tokens). If $template$ is of the form above then $\sigma$
lcp@320	50	must be a function type with at least~$n$ argument positions, say
lcp@320	51	$\sigma = [\tau@1, \dots, \tau@n] \To \tau$. Nonterminal symbols are
lcp@320	52	derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described
wenzelm@864	53	below. Any of these may be function types.
lcp@320	54
lcp@320	55	\item The optional list~$ps$ may contain at most $n$ integers, say {\tt
lcp@320	56	[$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
lcp@320	57	priority\indexbold{priorities} required of any phrase that may appear
lcp@320	58	as the $i$-th argument. Missing priorities default to~0.
wenzelm@4543	59
wenzelm@4543	60	\item The integer $p$ is the priority of this production. If
wenzelm@4543	61	omitted, it defaults to the maximal priority. Priorities range
wenzelm@4543	62	between 0 and \ttindexbold{max_pri} (= 1000).
lcp@320	63
lcp@320	64	\end{itemize}
lcp@320	65	%
wenzelm@864	66	The resulting production is \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\,
wenzelm@864	67	A@2^{(p@2)}\, \dots\, A@n^{(p@n)}\, w@n \] where $A$ and the $A@i$ are the
wenzelm@864	68	nonterminals corresponding to the types $\tau$ and $\tau@i$ respectively.
wenzelm@864	69	The nonterminal symbol associated with a type $(\ldots)ty$ is {\tt logic}, if
wenzelm@864	70	this is a logical type (namely one of class {\tt logic} excluding {\tt
wenzelm@864	71	prop}). Otherwise it is $ty$ (note that only the outermost type constructor
wenzelm@864	72	is taken into account). Finally, the nonterminal of a type variable is {\tt
wenzelm@864	73	any}.
wenzelm@864	74
wenzelm@911	75	\begin{warn}
wenzelm@864	76	Theories must sometimes declare types for purely syntactic purposes ---
paulson@3485	77	merely playing the role of nonterminals. One example is \tydx{type}, the
wenzelm@864	78	built-in type of types. This is a `type of all types' in the syntactic
wenzelm@864	79	sense only. Do not declare such types under {\tt arities} as belonging to
wenzelm@864	80	class {\tt logic}\index{*logic class}, for that would make them useless as
wenzelm@864	81	separate nonterminal symbols.
wenzelm@864	82	\end{warn}
wenzelm@864	83
wenzelm@864	84	Associating nonterminals with types allows a constant's type to specify
wenzelm@864	85	syntax as well. We can declare the function~$f$ to have type $[\tau@1,
wenzelm@864	86	\ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the layout
wenzelm@864	87	of the function's $n$ arguments. The constant's name, in this case~$f$, will
wenzelm@864	88	also serve as the label in the abstract syntax tree.
wenzelm@864	89
wenzelm@864	90	You may also declare mixfix syntax without adding constants to the theory's
wenzelm@864	91	signature, by using a {\tt syntax} section instead of {\tt consts}. Thus a
wenzelm@864	92	production need not map directly to a logical function (this typically
wenzelm@864	93	requires additional syntactic translations, see also
wenzelm@864	94	Chapter~\ref{chap:syntax}).
wenzelm@864	95
wenzelm@864	96
wenzelm@911	97	\medskip
wenzelm@911	98	As a special case of the general mixfix declaration, the form
wenzelm@864	99	\begin{center}
wenzelm@911	100	{\tt $c$ ::\ "$\sigma$" ("$template$")}
wenzelm@864	101	\end{center}
wenzelm@864	102	specifies no priorities. The resulting production puts no priority
wenzelm@864	103	constraints on any of its arguments and has maximal priority itself.
wenzelm@864	104	Omitting priorities in this manner is prone to syntactic ambiguities unless
berghofe@3098	105	the production's right-hand side is fully bracketed, as in
berghofe@3098	106	\verb\|"if _ then _ else _ fi"\|.
lcp@320	107
lcp@320	108	Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},
lcp@320	109	is sensible only if~$c$ is an identifier. Otherwise you will be unable to
lcp@320	110	write terms involving~$c$.
lcp@320	111
lcp@320	112
lcp@320	113	\subsection{Example: arithmetic expressions}
lcp@320	114	\index{examples!of mixfix declarations}
wenzelm@864	115	This theory specification contains a {\tt syntax} section with mixfix
lcp@320	116	declarations encoding the priority grammar from
lcp@320	117	\S\ref{sec:priority_grammars}:
lcp@320	118	\begin{ttbox}
wenzelm@3108	119	ExpSyntax = Pure +
lcp@320	120	types
lcp@320	121	exp
wenzelm@864	122	syntax
clasohm@1387	123	"0" :: exp ("0" 9)
clasohm@1387	124	"+" :: [exp, exp] => exp ("_ + _" [0, 1] 0)
clasohm@1387	125	"" :: [exp, exp] => exp ("_ _" [3, 2] 2)
clasohm@1387	126	"-" :: exp => exp ("- _" [3] 3)
lcp@320	127	end
lcp@320	128	\end{ttbox}
wenzelm@864	129	Executing {\tt Syntax.print_gram} reveals the productions derived from the
wenzelm@864	130	above mixfix declarations (lots of additional information deleted):
lcp@320	131	\begin{ttbox}
wenzelm@3108	132	Syntax.print_gram (syn_of ExpSyntax.thy);
lcp@320	133	{\out exp = "0" => "0" (9)}
lcp@320	134	{\out exp = exp[0] "+" exp[1] => "+" (0)}
lcp@320	135	{\out exp = exp[3] "" exp[2] => "" (2)}
lcp@320	136	{\out exp = "-" exp[3] => "-" (3)}
lcp@320	137	\end{ttbox}
lcp@320	138
wenzelm@3108	139	Note that because {\tt exp} is not of class {\tt logic}, it has been
paulson@3485	140	retained as a separate nonterminal. This also entails that the syntax
wenzelm@3108	141	does not provide for identifiers or paranthesized expressions.
wenzelm@3108	142	Normally you would also want to add the declaration {\tt arities
wenzelm@3108	143	exp::logic} after {\tt types} and use {\tt consts} instead of {\tt
paulson@3485	144	syntax}. Try this as an exercise and study the changes in the
nipkow@867	145	grammar.
lcp@320	146
lcp@320	147
lcp@320	148	\subsection{Infixes}
lcp@320	149	\indexbold{infixes}
lcp@320	150
wenzelm@3108	151	Infix operators associating to the left or right can be declared using
wenzelm@3108	152	{\tt infixl} or {\tt infixr}. Basically, the form {\tt $c$ ::\
wenzelm@3108	153	$\sigma$ (infixl $p$)} abbreviates the mixfix declarations
lcp@320	154	\begin{ttbox}
clasohm@1387	155	"op $c$" :: $\sigma$ ("(_ $c$/ _)" [$p$, $p+1$] $p$)
clasohm@1387	156	"op $c$" :: $\sigma$ ("op $c$")
lcp@320	157	\end{ttbox}
clasohm@1387	158	and {\tt $c$ ::\ $\sigma$ (infixr $p$)} abbreviates the mixfix declarations
lcp@320	159	\begin{ttbox}
clasohm@1387	160	"op $c$" :: $\sigma$ ("(_ $c$/ _)" [$p+1$, $p$] $p$)
clasohm@1387	161	"op $c$" :: $\sigma$ ("op $c$")
lcp@320	162	\end{ttbox}
lcp@320	163	The infix operator is declared as a constant with the prefix {\tt op}.
lcp@320	164	Thus, prefixing infixes with \sdx{op} makes them behave like ordinary
lcp@320	165	function symbols, as in \ML. Special characters occurring in~$c$ must be
lcp@320	166	escaped, as in delimiters, using a single quote.
lcp@320	167
wenzelm@3108	168	A slightly more general form of infix declarations allows constant
wenzelm@3108	169	names to be independent from their concrete syntax, namely \texttt{$c$
paulson@3485	170	::\ $\sigma$\ (infixl "$sy$" $p$)}, the same for \texttt{infixr}. As
wenzelm@3108	171	an example consider:
wenzelm@3108	172	\begin{ttbox}
wenzelm@3108	173	and :: [bool, bool] => bool (infixr "&" 35)
wenzelm@3108	174	\end{ttbox}
wenzelm@3108	175	The internal constant name will then be just \texttt{and}, without any
wenzelm@3108	176	\texttt{op} prefixed.
wenzelm@3108	177
lcp@320	178
lcp@320	179	\subsection{Binders}
lcp@320	180	\indexbold{binders}
lcp@320	181	\begingroup
lcp@320	182	\def\Q{{\cal Q}}
lcp@320	183	A {\bf binder} is a variable-binding construct such as a quantifier. The
lcp@320	184	constant declaration
lcp@320	185	\begin{ttbox}
clasohm@1387	186	$c$ :: $\sigma$ (binder "$\Q$" [$pb$] $p$)
lcp@320	187	\end{ttbox}
lcp@320	188	introduces a constant~$c$ of type~$\sigma$, which must have the form
lcp@320	189	$(\tau@1 \To \tau@2) \To \tau@3$. Its concrete syntax is $\Q~x.P$, where
lcp@320	190	$x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
paulson@3485	191	and the whole term has type~$\tau@3$. The optional integer $pb$
lcp@1060	192	specifies the body's priority, by default~$p$. Special characters
clasohm@877	193	in $\Q$ must be escaped using a single quote.
lcp@320	194
wenzelm@864	195	The declaration is expanded internally to something like
lcp@320	196	\begin{ttbox}
berghofe@3098	197	$c$\hskip3pt :: ($\tau@1$ => $\tau@2$) => $\tau@3$
berghofe@3098	198	"$\Q$" :: [idts, $\tau@2$] => $\tau@3$ ("(3$\Q$_./ _)" [0, $pb$] $p$)
lcp@320	199	\end{ttbox}
lcp@320	200	Here \ndx{idts} is the nonterminal symbol for a list of identifiers with
lcp@332	201	\index{type constraints}
lcp@320	202	optional type constraints (see Fig.\ts\ref{fig:pure_gram}). The
lcp@320	203	declaration also installs a parse translation\index{translations!parse}
lcp@320	204	for~$\Q$ and a print translation\index{translations!print} for~$c$ to
lcp@320	205	translate between the internal and external forms.
lcp@320	206
lcp@320	207	A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a
lcp@320	208	list of variables. The external form $\Q~x@1~x@2 \ldots x@n. P$
lcp@320	209	corresponds to the internal form
lcp@320	210	\[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]
lcp@320	211
lcp@320	212	\medskip
lcp@320	213	For example, let us declare the quantifier~$\forall$:\index{quantifiers}
lcp@320	214	\begin{ttbox}
clasohm@1387	215	All :: ('a => o) => o (binder "ALL " 10)
lcp@320	216	\end{ttbox}
lcp@320	217	This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
lcp@320	218	$x$.$P$}. When printing, Isabelle prefers the latter form, but must fall
lcp@320	219	back on ${\tt All}(P)$ if $P$ is not an abstraction. Both $P$ and {\tt ALL
lcp@320	220	$x$.$P$} have type~$o$, the type of formulae, while the bound variable
lcp@320	221	can be polymorphic.
lcp@320	222	\endgroup
lcp@320	223
lcp@320	224	\index{mixfix declarations\|)}
lcp@320	225
wenzelm@3108	226
wenzelm@3108	227	\section{*Alternative print modes} \label{sec:prmodes}
wenzelm@3108	228	\index{print modes\|(}
wenzelm@3108	229	%
paulson@3485	230	Isabelle's pretty printer supports alternative output syntaxes. These
paulson@3485	231	may be used independently or in cooperation. The currently active
wenzelm@3108	232	print modes (with precedence from left to right) are determined by a
wenzelm@3108	233	reference variable.
wenzelm@3108	234	\begin{ttbox}\index{*print_mode}
wenzelm@3108	235	print_mode: string list ref
wenzelm@3108	236	\end{ttbox}
wenzelm@3108	237	Initially this may already contain some print mode identifiers,
wenzelm@3108	238	depending on how Isabelle has been invoked (e.g.\ by some user
paulson@3485	239	interface). So changes should be incremental --- adding or deleting
wenzelm@3108	240	modes relative to the current value.
wenzelm@3108	241
wenzelm@12465	242	Any \ML{} string is a legal print mode identifier, without any predeclaration
wenzelm@12465	243	required. The following names should be considered reserved, though:
wenzelm@12465	244	\texttt{""} (the empty string), \texttt{symbols}, \texttt{xsymbols}, and
wenzelm@12465	245	\texttt{latex}.
wenzelm@3108	246
wenzelm@3108	247	There is a separate table of mixfix productions for pretty printing
paulson@3485	248	associated with each print mode. The currently active ones are
wenzelm@3108	249	conceptually just concatenated from left to right, with the standard
wenzelm@3108	250	syntax output table always coming last as default. Thus mixfix
wenzelm@3108	251	productions of preceding modes in the list may override those of later
wenzelm@3108	252	ones. Also note that token translations are always relative to some
wenzelm@3108	253	print mode (see \S\ref{sec:tok_tr}).
wenzelm@3108	254
wenzelm@3108	255	\medskip The canonical application of print modes is optional printing
wenzelm@3108	256	of mathematical symbols from a special screen font instead of {\sc
paulson@3485	257	ascii}. Another example is to re-use Isabelle's advanced
wenzelm@3108	258	$\lambda$-term printing mechanisms to generate completely different
wenzelm@3228	259	output, say for interfacing external tools like \rmindex{model
wenzelm@3228	260	checkers} (see also \texttt{HOL/Modelcheck}).
wenzelm@3108	261
wenzelm@3108	262	\index{print modes\|)}
wenzelm@3108	263
wenzelm@3108	264
clasohm@711	265	\section{Ambiguity of parsed expressions} \label{sec:ambiguity}
clasohm@711	266	\index{ambiguity!of parsed expressions}
clasohm@711	267
clasohm@711	268	To keep the grammar small and allow common productions to be shared
wenzelm@864	269	all logical types (except {\tt prop}) are internally represented
paulson@3485	270	by one nonterminal, namely {\tt logic}. This and omitted or too freely
clasohm@711	271	chosen priorities may lead to ways of parsing an expression that were
paulson@3485	272	not intended by the theory's maker. In most cases Isabelle is able to
wenzelm@864	273	select one of multiple parse trees that an expression has lead
paulson@3485	274	to by checking which of them can be typed correctly. But this may not
clasohm@711	275	work in every case and always slows down parsing.
wenzelm@864	276	The warning and error messages that can be produced during this process are
clasohm@711	277	as follows:
clasohm@711	278
clasohm@880	279	If an ambiguity can be resolved by type inference the following
clasohm@880	280	warning is shown to remind the user that parsing is (unnecessarily)
paulson@3485	281	slowed down. In cases where it's not easily possible to eliminate the
clasohm@880	282	ambiguity the frequency of the warning can be controlled by changing
clasohm@883	283	the value of {\tt Syntax.ambiguity_level} which has type {\tt int
paulson@3485	284	ref}. Its default value is 1 and by increasing it one can control how
clasohm@883	285	many parse trees are necessary to generate the warning.
clasohm@711	286
clasohm@711	287	\begin{ttbox}
wenzelm@3801	288	{\out Ambiguous input "\dots"}
clasohm@711	289	{\out produces the following parse trees:}
wenzelm@3801	290	{\out \dots}
clasohm@711	291	{\out Fortunately, only one parse tree is type correct.}
wenzelm@3801	292	{\out You may still want to disambiguate your grammar or your input.}
clasohm@711	293	\end{ttbox}
clasohm@711	294
clasohm@711	295	The following message is normally caused by using the same
clasohm@711	296	syntax in two different productions:
clasohm@711	297
clasohm@711	298	\begin{ttbox}
wenzelm@3802	299	{\out Ambiguous input "..."}
clasohm@711	300	{\out produces the following parse trees:}
wenzelm@3802	301	{\out \dots}
wenzelm@3802	302	{\out More than one term is type correct:}
wenzelm@3802	303	{\out \dots}
clasohm@711	304	\end{ttbox}
clasohm@711	305
clasohm@866	306	Ambiguities occuring in syntax translation rules cannot be resolved by
clasohm@866	307	type inference because it is not necessary for these rules to be type
paulson@3485	308	correct. Therefore Isabelle always generates an error message and the
clasohm@866	309	ambiguity should be eliminated by changing the grammar or the rule.
clasohm@711	310
lcp@320	311
wenzelm@5371	312	%%% Local Variables:
wenzelm@5371	313	%%% mode: latex
wenzelm@5371	314	%%% TeX-master: "ref"
wenzelm@5371	315	%%% End:

author	Walther Neuper <neuper@ist.tugraz.at>
	Thu, 12 Aug 2010 15:03:34 +0200
branch	isac-from-Isabelle2009-2
changeset 37913	20e3616b2d9c
parent 30184	37969710e61f
child 43760	e87888b4152f
permissions	-rw-r--r--