1.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000
1.2 +++ b/doc-src/Ref/defining.tex Fri Apr 15 16:37:59 1994 +0200
1.3 @@ -0,0 +1,772 @@
1.4 +%% $Id$
1.5 +\chapter{Defining Logics} \label{Defining-Logics}
1.6 +This chapter explains how to define new formal systems --- in particular,
1.7 +their concrete syntax. While Isabelle can be regarded as a theorem prover
1.8 +for set theory, higher-order logic or the sequent calculus, its
1.9 +distinguishing feature is support for the definition of new logics.
1.10 +
1.11 +Isabelle logics are hierarchies of theories, which are described and
1.12 +illustrated in
1.13 +\iflabelundefined{sec:defining-theories}{{\em Introduction to Isabelle}}%
1.14 +{\S\ref{sec:defining-theories}}. That material, together with the theory
1.15 +files provided in the examples directories, should suffice for all simple
1.16 +applications. The easiest way to define a new theory is by modifying a
1.17 +copy of an existing theory.
1.18 +
1.19 +This chapter documents the meta-logic syntax, mixfix declarations and
1.20 +pretty printing. The extended examples in \S\ref{sec:min_logics}
1.21 +demonstrate the logical aspects of the definition of theories.
1.22 +
1.23 +
1.24 +\section{Priority grammars} \label{sec:priority_grammars}
1.25 +\index{priority grammars|(}
1.26 +
1.27 +A context-free grammar contains a set of {\bf nonterminal symbols}, a set of
1.28 +{\bf terminal symbols} and a set of {\bf productions}\index{productions}.
1.29 +Productions have the form ${A=\gamma}$, where $A$ is a nonterminal and
1.30 +$\gamma$ is a string of terminals and nonterminals. One designated
1.31 +nonterminal is called the {\bf start symbol}. The language defined by the
1.32 +grammar consists of all strings of terminals that can be derived from the
1.33 +start symbol by applying productions as rewrite rules.
1.34 +
1.35 +The syntax of an Isabelle logic is specified by a {\bf priority
1.36 + grammar}.\index{priorities} Each nonterminal is decorated by an integer
1.37 +priority, as in~$A^{(p)}$. A nonterminal $A^{(p)}$ in a derivation may be
1.38 +rewritten using a production $A^{(q)} = \gamma$ only if~$p \le q$. Any
1.39 +priority grammar can be translated into a normal context free grammar by
1.40 +introducing new nonterminals and productions.
1.41 +
1.42 +Formally, a set of context free productions $G$ induces a derivation
1.43 +relation $\longrightarrow@G$. Let $\alpha$ and $\beta$ denote strings of
1.44 +terminal or nonterminal symbols. Then
1.45 +\[ \alpha\, A^{(p)}\, \beta ~\longrightarrow@G~ \alpha\,\gamma\,\beta \]
1.46 +if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$p \le q$.
1.47 +
1.48 +The following simple grammar for arithmetic expressions demonstrates how
1.49 +binding power and associativity of operators can be enforced by priorities.
1.50 +\begin{center}
1.51 +\begin{tabular}{rclr}
1.52 + $A^{(9)}$ & = & {\tt0} \\
1.53 + $A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\
1.54 + $A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\
1.55 + $A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\
1.56 + $A^{(3)}$ & = & {\tt-} $A^{(3)}$
1.57 +\end{tabular}
1.58 +\end{center}
1.59 +The choice of priorities determines that {\tt -} binds tighter than {\tt *},
1.60 +which binds tighter than {\tt +}. Furthermore {\tt +} associates to the
1.61 +left and {\tt *} to the right.
1.62 +
1.63 +For clarity, grammars obey these conventions:
1.64 +\begin{itemize}
1.65 +\item All priorities must lie between~0 and \ttindex{max_pri}, which is a
1.66 + some fixed integer. Sometimes {\tt max_pri} is written as $\infty$.
1.67 +\item Priority 0 on the right-hand side and priority \ttindex{max_pri} on
1.68 + the left-hand side may be omitted.
1.69 +\item The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$; the
1.70 + priority of the left-hand side actually appears in a column on the far
1.71 + right.
1.72 +\item Alternatives are separated by~$|$.
1.73 +\item Repetition is indicated by dots~(\dots) in an informal but obvious
1.74 + way.
1.75 +\end{itemize}
1.76 +
1.77 +Using these conventions and assuming $\infty=9$, the grammar
1.78 +takes the form
1.79 +\begin{center}
1.80 +\begin{tabular}{rclc}
1.81 +$A$ & = & {\tt0} & \hspace*{4em} \\
1.82 + & $|$ & {\tt(} $A$ {\tt)} \\
1.83 + & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\
1.84 + & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\
1.85 + & $|$ & {\tt-} $A^{(3)}$ & (3)
1.86 +\end{tabular}
1.87 +\end{center}
1.88 +\index{priority grammars|)}
1.89 +
1.90 +
1.91 +\begin{figure}
1.92 +\begin{center}
1.93 +\begin{tabular}{rclc}
1.94 +$prop$ &=& {\tt PROP} $aprop$ ~~$|$~~ {\tt(} $prop$ {\tt)} \\
1.95 + &$|$& $logic^{(3)}$ {\tt ==} $logic^{(2)}$ & (2) \\
1.96 + &$|$& $logic^{(3)}$ {\tt =?=} $logic^{(2)}$ & (2) \\
1.97 + &$|$& $prop^{(2)}$ {\tt ==>} $prop^{(1)}$ & (1) \\
1.98 + &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\
1.99 + &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\\\
1.100 +$logic$ &=& $prop$ ~~$|$~~ $fun$ \\\\
1.101 +$aprop$ &=& $id$ ~~$|$~~ $var$
1.102 + ~~$|$~~ $fun^{(\infty)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\\\
1.103 +$fun$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $fun$ {\tt)} \\
1.104 + &$|$& $fun^{(\infty)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\
1.105 + &$|$& $fun^{(\infty)}$ {\tt::} $type$ \\
1.106 + &$|$& {\tt \%} $idts$ {\tt.} $logic$ & (0) \\\\
1.107 +$idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\
1.108 +$idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
1.109 + &$|$& $id$ {\tt ::} $type$ & (0) \\\\
1.110 +$type$ &=& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$
1.111 + ~~$|$~~ $tvar$ {\tt::} $sort$ \\
1.112 + &$|$& $id$ ~~$|$~~ $type^{(\infty)}$ $id$
1.113 + ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
1.114 + &$|$& $type^{(1)}$ {\tt =>} $type$ & (0) \\
1.115 + &$|$& {\tt[} $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0)\\
1.116 + &$|$& {\tt(} $type$ {\tt)} \\\\
1.117 +$sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace}
1.118 + ~~$|$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace}
1.119 +\end{tabular}
1.120 +\index{*PROP symbol}
1.121 +\index{*== symbol}\index{*=?= symbol}\index{*==> symbol}
1.122 +\index{*:: symbol}\index{*=> symbol}
1.123 +%index command: percent is permitted, but braces must match!
1.124 +\index{%@{\tt\%} symbol}
1.125 +\index{{}@{\tt\ttlbrace} symbol}\index{{}@{\tt\ttrbrace} symbol}
1.126 +\index{*[ symbol}\index{*] symbol}
1.127 +\index{*"!"! symbol}
1.128 +\index{*"["| symbol}
1.129 +\index{*"|"] symbol}
1.130 +\end{center}
1.131 +\caption{Meta-logic syntax}\label{fig:pure_gram}
1.132 +\end{figure}
1.133 +
1.134 +
1.135 +\section{The Pure syntax} \label{sec:basic_syntax}
1.136 +\index{syntax!Pure|(}
1.137 +
1.138 +At the root of all object-logics lies the theory \thydx{Pure}. It
1.139 +contains, among many other things, the Pure syntax. An informal account of
1.140 +this basic syntax (types, terms and formulae) appears in
1.141 +\iflabelundefined{sec:forward}{{\em Introduction to Isabelle}}%
1.142 +{\S\ref{sec:forward}}. A more precise description using a priority grammar
1.143 +appears in Fig.\ts\ref{fig:pure_gram}. It defines the following
1.144 +nonterminals:
1.145 +\begin{ttdescription}
1.146 +\item[\ndxbold{prop}] denotes terms of type {\tt prop}. These are formulae
1.147 + of the meta-logic.
1.148 +
1.149 +\item[\ndxbold{aprop}] denotes atomic propositions. These typically
1.150 + include the judgement forms of the object-logic; its definition
1.151 + introduces a meta-level predicate for each judgement form.
1.152 +
1.153 +\item[\ndxbold{logic}] denotes terms whose type belongs to class
1.154 + \cldx{logic}. As the syntax is extended by new object-logics, more
1.155 + productions for {\tt logic} are added automatically (see below).
1.156 +
1.157 + \item[\ndxbold{fun}] denotes terms potentially of function type.
1.158 +
1.159 + \item[\ndxbold{type}] denotes types of the meta-logic.
1.160 +
1.161 + \item[\ndxbold{idts}] denotes a list of identifiers, possibly constrained
1.162 + by types.
1.163 +\end{ttdescription}
1.164 +
1.165 +\begin{warn}
1.166 + In {\tt idts}, note that \verb|x::nat y| is parsed as \verb|x::(nat y)|,
1.167 + treating {\tt y} like a type constructor applied to {\tt nat}. The
1.168 + likely result is an error message. To avoid this interpretation, use
1.169 + parentheses and write \verb|(x::nat) y|.
1.170 +
1.171 + Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and
1.172 + yields an error. The correct form is \verb|(x::nat) (y::nat)|.
1.173 +\end{warn}
1.174 +
1.175 +\subsection{Logical types and default syntax}\label{logical-types}
1.176 +\index{lambda calc@$\lambda$-calculus}
1.177 +
1.178 +Isabelle's representation of mathematical languages is based on the simply
1.179 +typed $\lambda$-calculus. All logical types, namely those of class
1.180 +\cldx{logic}, are automatically equipped with a basic syntax of types,
1.181 +identifiers, variables, parentheses, $\lambda$-abstractions and
1.182 +applications.
1.183 +
1.184 +More precisely, for each type constructor $ty$ with arity $(\vec{s})c$,
1.185 +where $c$ is a subclass of \cldx{logic}, several productions are added:
1.186 +\begin{center}
1.187 +\begin{tabular}{rclc}
1.188 +$ty$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $ty$ {\tt)} \\
1.189 + &$|$& $fun^{(\infty)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)}\\
1.190 + &$|$& $ty^{(\infty)}$ {\tt::} $type$\\\\
1.191 +$logic$ &=& $ty$
1.192 +\end{tabular}
1.193 +\end{center}
1.194 +
1.195 +
1.196 +\subsection{Lexical matters}
1.197 +The parser does not process input strings directly. It operates on token
1.198 +lists provided by Isabelle's \bfindex{lexer}. There are two kinds of
1.199 +tokens: \bfindex{delimiters} and \bfindex{name tokens}.
1.200 +
1.201 +\index{reserved words}
1.202 +Delimiters can be regarded as reserved words of the syntax. You can
1.203 +add new ones when extending theories. In Fig.\ts\ref{fig:pure_gram} they
1.204 +appear in typewriter font, for example {\tt ==}, {\tt =?=} and
1.205 +{\tt PROP}\@.
1.206 +
1.207 +Name tokens have a predefined syntax. The lexer distinguishes four
1.208 +disjoint classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type
1.209 +identifiers\index{type identifiers}, type unknowns\index{type unknowns}.
1.210 +They are denoted by \ndxbold{id}, \ndxbold{var}, \ndxbold{tid},
1.211 +\ndxbold{tvar}, respectively. Typical examples are {\tt x}, {\tt ?x7},
1.212 +{\tt 'a}, {\tt ?'a3}. Here is the precise syntax:
1.213 +\begin{eqnarray*}
1.214 +id & = & letter~quasiletter^* \\
1.215 +var & = & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
1.216 +tid & = & \mbox{\tt '}id \\
1.217 +tvar & = & \mbox{\tt ?}tid ~~|~~
1.218 + \mbox{\tt ?}tid\mbox{\tt .}nat \\[1ex]
1.219 +letter & = & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
1.220 +digit & = & \mbox{one of {\tt 0}\dots {\tt 9}} \\
1.221 +quasiletter & = & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
1.222 +nat & = & digit^+
1.223 +\end{eqnarray*}
1.224 +A \ndxbold{var} or \ndxbold{tvar} describes an unknown, which is internally
1.225 +a pair of base name and index (\ML\ type \mltydx{indexname}). These
1.226 +components are either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
1.227 +run together as in {\tt ?x1}. The latter form is possible if the base name
1.228 +does not end with digits. If the index is 0, it may be dropped altogether:
1.229 +{\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}.
1.230 +
1.231 +The lexer repeatedly takes the maximal prefix of the input string that
1.232 +forms a valid token. A maximal prefix that is both a delimiter and a name
1.233 +is treated as a delimiter. Spaces, tabs and newlines are separators; they
1.234 +never occur within tokens.
1.235 +
1.236 +Delimiters need not be separated by white space. For example, if {\tt -}
1.237 +is a delimiter but {\tt --} is not, then the string {\tt --} is treated as
1.238 +two consecutive occurrences of the token~{\tt -}. In contrast, \ML\
1.239 +treats {\tt --} as a single symbolic name. The consequence of Isabelle's
1.240 +more liberal scheme is that the same string may be parsed in different ways
1.241 +after extending the syntax: after adding {\tt --} as a delimiter, the input
1.242 +{\tt --} is treated as a single token.
1.243 +
1.244 +Although name tokens are returned from the lexer rather than the parser, it
1.245 +is more logical to regard them as nonterminals. Delimiters, however, are
1.246 +terminals; they are just syntactic sugar and contribute nothing to the
1.247 +abstract syntax tree.
1.248 +
1.249 +
1.250 +\subsection{*Inspecting the syntax}
1.251 +\begin{ttbox}
1.252 +syn_of : theory -> Syntax.syntax
1.253 +Syntax.print_syntax : Syntax.syntax -> unit
1.254 +Syntax.print_gram : Syntax.syntax -> unit
1.255 +Syntax.print_trans : Syntax.syntax -> unit
1.256 +\end{ttbox}
1.257 +The abstract type \mltydx{Syntax.syntax} allows manipulation of syntaxes
1.258 +in \ML. You can display values of this type by calling the following
1.259 +functions:
1.260 +\begin{ttdescription}
1.261 +\item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle
1.262 + theory~{\it thy} as an \ML\ value.
1.263 +
1.264 +\item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all
1.265 + information contained in the syntax {\it syn}. The displayed output can
1.266 + be large. The following two functions are more selective.
1.267 +
1.268 +\item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part
1.269 + of~{\it syn}, namely the lexicon, roots and productions. These are
1.270 + discussed below.
1.271 +
1.272 +\item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation
1.273 + part of~{\it syn}, namely the constants, parse/print macros and
1.274 + parse/print translations.
1.275 +\end{ttdescription}
1.276 +
1.277 +Let us demonstrate these functions by inspecting Pure's syntax. Even that
1.278 +is too verbose to display in full.
1.279 +\begin{ttbox}\index{*Pure theory}
1.280 +Syntax.print_syntax (syn_of Pure.thy);
1.281 +{\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
1.282 +{\out roots: logic type fun prop}
1.283 +{\out prods:}
1.284 +{\out type = tid (1000)}
1.285 +{\out type = tvar (1000)}
1.286 +{\out type = id (1000)}
1.287 +{\out type = tid "::" sort[0] => "_ofsort" (1000)}
1.288 +{\out type = tvar "::" sort[0] => "_ofsort" (1000)}
1.289 +{\out \vdots}
1.290 +\ttbreak
1.291 +{\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots}
1.292 +{\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"}
1.293 +{\out "_idtyp" "_lambda" "_tapp" "_tappl"}
1.294 +{\out parse_rules:}
1.295 +{\out parse_translation: "!!" "_K" "_abs" "_aprop"}
1.296 +{\out print_translation: "all"}
1.297 +{\out print_rules:}
1.298 +{\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
1.299 +\end{ttbox}
1.300 +
1.301 +As you can see, the output is divided into labeled sections. The grammar
1.302 +is represented by {\tt lexicon}, {\tt roots} and {\tt prods}. The rest
1.303 +refers to syntactic translations and macro expansion. Here is an
1.304 +explanation of the various sections.
1.305 +\begin{description}
1.306 + \item[{\tt lexicon}] lists the delimiters used for lexical
1.307 + analysis.\index{delimiters}
1.308 +
1.309 + \item[{\tt roots}] lists the grammar's nonterminal symbols. You must
1.310 + name the desired root when calling lower level functions or specifying
1.311 + macros. Higher level functions usually expect a type and derive the
1.312 + actual root as described in~\S\ref{sec:grammar}.
1.313 +
1.314 + \item[{\tt prods}] lists the \rmindex{productions} of the priority grammar.
1.315 + The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}.
1.316 + Each delimiter is quoted. Some productions are shown with {\tt =>} and
1.317 + an attached string. These strings later become the heads of parse
1.318 + trees; they also play a vital role when terms are printed (see
1.319 + \S\ref{sec:asts}).
1.320 +
1.321 + Productions with no strings attached are called {\bf copy
1.322 + productions}\indexbold{productions!copy}. Their right-hand side must
1.323 + have exactly one nonterminal symbol (or name token). The parser does
1.324 + not create a new parse tree node for copy productions, but simply
1.325 + returns the parse tree of the right-hand symbol.
1.326 +
1.327 + If the right-hand side consists of a single nonterminal with no
1.328 + delimiters, then the copy production is called a {\bf chain
1.329 + production}. Chain productions act as abbreviations:
1.330 + conceptually, they are removed from the grammar by adding new
1.331 + productions. Priority information attached to chain productions is
1.332 + ignored; only the dummy value $-1$ is displayed.
1.333 +
1.334 + \item[{\tt consts}, {\tt parse_rules}, {\tt print_rules}]
1.335 + relate to macros (see \S\ref{sec:macros}).
1.336 +
1.337 + \item[{\tt parse_ast_translation}, {\tt print_ast_translation}]
1.338 + list sets of constants that invoke translation functions for abstract
1.339 + syntax trees. Section \S\ref{sec:asts} below discusses this obscure
1.340 + matter.\index{constants!for translations}
1.341 +
1.342 + \item[{\tt parse_translation}, {\tt print_translation}] list sets
1.343 + of constants that invoke translation functions for terms (see
1.344 + \S\ref{sec:tr_funs}).
1.345 +\end{description}
1.346 +\index{syntax!Pure|)}
1.347 +
1.348 +
1.349 +\section{Mixfix declarations} \label{sec:mixfix}
1.350 +\index{mixfix declarations|(}
1.351 +
1.352 +When defining a theory, you declare new constants by giving their names,
1.353 +their type, and an optional {\bf mixfix annotation}. Mixfix annotations
1.354 +allow you to extend Isabelle's basic $\lambda$-calculus syntax with
1.355 +readable notation. They can express any context-free priority grammar.
1.356 +Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
1.357 +general than the priority declarations of \ML\ and Prolog.
1.358 +
1.359 +A mixfix annotation defines a production of the priority grammar. It
1.360 +describes the concrete syntax, the translation to abstract syntax, and the
1.361 +pretty printing. Special case annotations provide a simple means of
1.362 +specifying infix operators, binders and so forth.
1.363 +
1.364 +\subsection{Grammar productions}\label{sec:grammar}\index{productions}
1.365 +
1.366 +Let us examine the treatment of the production
1.367 +\[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\, A@2^{(p@2)}\, \ldots\,
1.368 + A@n^{(p@n)}\, w@n. \]
1.369 +Here $A@i^{(p@i)}$ is a nonterminal with priority~$p@i$ for $i=1$,
1.370 +\ldots,~$n$, while $w@0$, \ldots,~$w@n$ are strings of terminals.
1.371 +In the corresponding mixfix annotation, the priorities are given separately
1.372 +as $[p@1,\ldots,p@n]$ and~$p$. The nonterminal symbols are identified with
1.373 +types~$\tau$, $\tau@1$, \ldots,~$\tau@n$ respectively, and the production's
1.374 +effect on nonterminals is expressed as the function type
1.375 +\[ [\tau@1, \ldots, \tau@n]\To \tau. \]
1.376 +Finally, the template
1.377 +\[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n \]
1.378 +describes the strings of terminals.
1.379 +
1.380 +A simple type is typically declared for each nonterminal symbol. In
1.381 +first-order logic, type~$i$ stands for terms and~$o$ for formulae. Only
1.382 +the outermost type constructor is taken into account. For example, any
1.383 +type of the form $\sigma list$ stands for a list; productions may refer
1.384 +to the symbol {\tt list} and will apply lists of any type.
1.385 +
1.386 +The symbol associated with a type is called its {\bf root} since it may
1.387 +serve as the root of a parse tree. Precisely, the root of $(\tau@1, \dots,
1.388 +\tau@n)ty$ is $ty$, where $\tau@1$, \ldots, $\tau@n$ are types and $ty$ is
1.389 +a type constructor. Type infixes are a special case of this; in
1.390 +particular, the root of $\tau@1 \To \tau@2$ is {\tt fun}. Finally, the
1.391 +root of a type variable is {\tt logic}; general productions might
1.392 +refer to this nonterminal.
1.393 +
1.394 +Identifying nonterminals with types allows a constant's type to specify
1.395 +syntax as well. We can declare the function~$f$ to have type $[\tau@1,
1.396 +\ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the
1.397 +layout of the function's $n$ arguments. The constant's name, in this
1.398 +case~$f$, will also serve as the label in the abstract syntax tree. There
1.399 +are two exceptions to this treatment of constants:
1.400 +\begin{enumerate}\index{constants!syntactic}
1.401 + \item A production need not map directly to a logical function. In this
1.402 + case, you must declare a constant whose purpose is purely syntactic.
1.403 + By convention such constants begin with the symbol~{\tt\at},
1.404 + ensuring that they can never be written in formulae.
1.405 +
1.406 + \item A copy production has no associated constant.\index{productions!copy}
1.407 +\end{enumerate}
1.408 +There is something artificial about this representation of productions,
1.409 +but it is convenient, particularly for simple theory extensions.
1.410 +
1.411 +\subsection{The general mixfix form}
1.412 +Here is a detailed account of mixfix declarations. Suppose the following
1.413 +line occurs within the {\tt consts} section of a {\tt .thy} file:
1.414 +\begin{center}
1.415 + {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}
1.416 +\end{center}
1.417 +This constant declaration and mixfix annotation is interpreted as follows:
1.418 +\begin{itemize}\index{productions}
1.419 +\item The string {\tt $c$} is the name of the constant associated with the
1.420 + production; unless it is a valid identifier, it must be enclosed in
1.421 + quotes. If $c$ is empty (given as~{\tt ""}) then this is a copy
1.422 + production.\index{productions!copy} Otherwise, parsing an instance of the
1.423 + phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
1.424 + $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
1.425 + argument.
1.426 +
1.427 + \item The constant $c$, if non-empty, is declared to have type $\sigma$.
1.428 +
1.429 + \item The string $template$ specifies the right-hand side of
1.430 + the production. It has the form
1.431 + \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \]
1.432 + where each occurrence of {\tt_} denotes an argument position and
1.433 + the~$w@i$ do not contain~{\tt _}. (If you want a literal~{\tt _} in
1.434 + the concrete syntax, you must escape it as described below.) The $w@i$
1.435 + may consist of \rmindex{delimiters}, spaces or
1.436 + \rmindex{pretty printing} annotations (see below).
1.437 +
1.438 + \item The type $\sigma$ specifies the production's nonterminal symbols
1.439 + (or name tokens). If $template$ is of the form above then $\sigma$
1.440 + must be a function type with at least~$n$ argument positions, say
1.441 + $\sigma = [\tau@1, \dots, \tau@n] \To \tau$. Nonterminal symbols are
1.442 + derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described
1.443 + above. Any of these may be function types; the corresponding root is
1.444 + then \tydx{fun}.
1.445 +
1.446 + \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
1.447 + [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
1.448 + priority\indexbold{priorities} required of any phrase that may appear
1.449 + as the $i$-th argument. Missing priorities default to~0.
1.450 +
1.451 + \item The integer $p$ is the priority of this production. If omitted, it
1.452 + defaults to the maximal priority.
1.453 + Priorities range between 0 and \ttindexbold{max_pri} (= 1000).
1.454 +\end{itemize}
1.455 +%
1.456 +The declaration {\tt $c$ ::\ "$\sigma$" ("$template$")} specifies no
1.457 +priorities. The resulting production puts no priority constraints on any
1.458 +of its arguments and has maximal priority itself. Omitting priorities in
1.459 +this manner will introduce syntactic ambiguities unless the production's
1.460 +right-hand side is fully bracketed, as in \verb|"if _ then _ else _ fi"|.
1.461 +
1.462 +Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},
1.463 +is sensible only if~$c$ is an identifier. Otherwise you will be unable to
1.464 +write terms involving~$c$.
1.465 +
1.466 +\begin{warn}
1.467 + Theories must sometimes declare types for purely syntactic purposes. One
1.468 + example is \tydx{type}, the built-in type of types. This is a `type of
1.469 + all types' in the syntactic sense only. Do not declare such types under
1.470 + {\tt arities} as belonging to class {\tt logic}\index{*logic class}, for
1.471 + that would allow their use in arbitrary Isabelle
1.472 + expressions~(\S\ref{logical-types}).
1.473 +\end{warn}
1.474 +
1.475 +\subsection{Example: arithmetic expressions}
1.476 +\index{examples!of mixfix declarations}
1.477 +This theory specification contains a {\tt consts} section with mixfix
1.478 +declarations encoding the priority grammar from
1.479 +\S\ref{sec:priority_grammars}:
1.480 +\begin{ttbox}
1.481 +EXP = Pure +
1.482 +types
1.483 + exp
1.484 +arities
1.485 + exp :: logic
1.486 +consts
1.487 + "0" :: "exp" ("0" 9)
1.488 + "+" :: "[exp, exp] => exp" ("_ + _" [0, 1] 0)
1.489 + "*" :: "[exp, exp] => exp" ("_ * _" [3, 2] 2)
1.490 + "-" :: "exp => exp" ("- _" [3] 3)
1.491 +end
1.492 +\end{ttbox}
1.493 +The {\tt arities} declaration causes {\tt exp} to be added as a new root.
1.494 +If you put this into a file {\tt exp.thy} and load it via {\tt
1.495 + use_thy "EXP"}, you can run some tests:
1.496 +\begin{ttbox}
1.497 +val read_exp = Syntax.test_read (syn_of EXP.thy) "exp";
1.498 +{\out val it = fn : string -> unit}
1.499 +read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0";
1.500 +{\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}
1.501 +{\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}
1.502 +{\out \vdots}
1.503 +read_exp "0 + - 0 + 0";
1.504 +{\out tokens: "0" "+" "-" "0" "+" "0"}
1.505 +{\out raw: ("+" ("+" "0" ("-" "0")) "0")}
1.506 +{\out \vdots}
1.507 +\end{ttbox}
1.508 +The output of \ttindex{Syntax.test_read} includes the token list ({\tt
1.509 + tokens}) and the raw \AST{} directly derived from the parse tree,
1.510 +ignoring parse \AST{} translations. The rest is tracing information
1.511 +provided by the macro expander (see \S\ref{sec:macros}).
1.512 +
1.513 +Executing {\tt Syntax.print_gram} reveals the productions derived
1.514 +from our mixfix declarations (lots of additional information deleted):
1.515 +\begin{ttbox}
1.516 +Syntax.print_gram (syn_of EXP.thy);
1.517 +{\out exp = "0" => "0" (9)}
1.518 +{\out exp = exp[0] "+" exp[1] => "+" (0)}
1.519 +{\out exp = exp[3] "*" exp[2] => "*" (2)}
1.520 +{\out exp = "-" exp[3] => "-" (3)}
1.521 +\end{ttbox}
1.522 +
1.523 +
1.524 +\subsection{The mixfix template}
1.525 +Let us take a closer look at the string $template$ appearing in mixfix
1.526 +annotations. This string specifies a list of parsing and printing
1.527 +directives: delimiters\index{delimiters}, arguments, spaces, blocks of
1.528 +indentation and line breaks. These are encoded by the following character
1.529 +sequences:
1.530 +\index{pretty printing|(}
1.531 +\begin{description}
1.532 +\item[~$d$~] is a delimiter, namely a non-empty sequence of characters
1.533 + other than the special characters {\tt _}, {\tt(}, {\tt)} and~{\tt/}.
1.534 + Even these characters may appear if escaped; this means preceding it with
1.535 + a~{\tt '} (single quote). Thus you have to write {\tt ''} if you really
1.536 + want a single quote. Delimiters may never contain spaces.
1.537 +
1.538 +\item[~{\tt_}~] is an argument position, which stands for a nonterminal symbol
1.539 + or name token.
1.540 +
1.541 +\item[~$s$~] is a non-empty sequence of spaces for printing. This and the
1.542 + following specifications do not affect parsing at all.
1.543 +
1.544 +\item[~{\tt(}$n$~] opens a pretty printing block. The optional number $n$
1.545 + specifies how much indentation to add when a line break occurs within the
1.546 + block. If {\tt(} is not followed by digits, the indentation defaults
1.547 + to~0.
1.548 +
1.549 +\item[~{\tt)}~] closes a pretty printing block.
1.550 +
1.551 +\item[~{\tt//}~] forces a line break.
1.552 +
1.553 +\item[~{\tt/}$s$~] allows a line break. Here $s$ stands for the string of
1.554 + spaces (zero or more) right after the {\tt /} character. These spaces
1.555 + are printed if the break is not taken.
1.556 +\end{description}
1.557 +For example, the template {\tt"(_ +/ _)"} specifies an infix operator.
1.558 +There are two argument positions; the delimiter~{\tt+} is preceded by a
1.559 +space and followed by a space or line break; the entire phrase is a pretty
1.560 +printing block. Other examples appear in Fig.\ts\ref{fig:set_trans} below.
1.561 +Isabelle's pretty printer resembles the one described in
1.562 +Paulson~\cite{paulson91}.
1.563 +
1.564 +\index{pretty printing|)}
1.565 +
1.566 +
1.567 +\subsection{Infixes}
1.568 +\indexbold{infixes}
1.569 +
1.570 +Infix operators associating to the left or right can be declared
1.571 +using {\tt infixl} or {\tt infixr}.
1.572 +Roughly speaking, the form {\tt $c$ ::\ "$\sigma$" (infixl $p$)}
1.573 +abbreviates the constant declarations
1.574 +\begin{ttbox}
1.575 +"op \(c\)" :: "\(\sigma\)" ("op \(c\)")
1.576 +"op \(c\)" :: "\(\sigma\)" ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
1.577 +\end{ttbox}
1.578 +and {\tt $c$ ::\ "$\sigma$" (infixr $p$)} abbreviates the constant declarations
1.579 +\begin{ttbox}
1.580 +"op \(c\)" :: "\(\sigma\)" ("op \(c\)")
1.581 +"op \(c\)" :: "\(\sigma\)" ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
1.582 +\end{ttbox}
1.583 +The infix operator is declared as a constant with the prefix {\tt op}.
1.584 +Thus, prefixing infixes with \sdx{op} makes them behave like ordinary
1.585 +function symbols, as in \ML. Special characters occurring in~$c$ must be
1.586 +escaped, as in delimiters, using a single quote.
1.587 +
1.588 +The expanded forms above would be actually illegal in a {\tt .thy} file
1.589 +because they declare the constant \hbox{\tt"op \(c\)"} twice.
1.590 +
1.591 +
1.592 +\subsection{Binders}
1.593 +\indexbold{binders}
1.594 +\begingroup
1.595 +\def\Q{{\cal Q}}
1.596 +A {\bf binder} is a variable-binding construct such as a quantifier. The
1.597 +constant declaration
1.598 +\begin{ttbox}
1.599 +\(c\) :: "\(\sigma\)" (binder "\(\Q\)" \(p\))
1.600 +\end{ttbox}
1.601 +introduces a constant~$c$ of type~$\sigma$, which must have the form
1.602 +$(\tau@1 \To \tau@2) \To \tau@3$. Its concrete syntax is $\Q~x.P$, where
1.603 +$x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
1.604 +and the whole term has type~$\tau@3$. Special characters in $\Q$ must be
1.605 +escaped using a single quote.
1.606 +
1.607 +The declaration is expanded internally to
1.608 +\begin{ttbox}
1.609 +\(c\) :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"
1.610 +"\(\Q\)"\hskip-3pt :: "[idts, \(\tau@2\)] => \(\tau@3\)" ("(3\(\Q\)_./ _)" \(p\))
1.611 +\end{ttbox}
1.612 +Here \ndx{idts} is the nonterminal symbol for a list of identifiers with
1.613 +optional type constraints (see Fig.\ts\ref{fig:pure_gram}). The
1.614 +declaration also installs a parse translation\index{translations!parse}
1.615 +for~$\Q$ and a print translation\index{translations!print} for~$c$ to
1.616 +translate between the internal and external forms.
1.617 +
1.618 +A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a
1.619 +list of variables. The external form $\Q~x@1~x@2 \ldots x@n. P$
1.620 +corresponds to the internal form
1.621 +\[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]
1.622 +
1.623 +\medskip
1.624 +For example, let us declare the quantifier~$\forall$:\index{quantifiers}
1.625 +\begin{ttbox}
1.626 +All :: "('a => o) => o" (binder "ALL " 10)
1.627 +\end{ttbox}
1.628 +This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
1.629 + $x$.$P$}. When printing, Isabelle prefers the latter form, but must fall
1.630 +back on ${\tt All}(P)$ if $P$ is not an abstraction. Both $P$ and {\tt ALL
1.631 + $x$.$P$} have type~$o$, the type of formulae, while the bound variable
1.632 +can be polymorphic.
1.633 +\endgroup
1.634 +
1.635 +\index{mixfix declarations|)}
1.636 +
1.637 +
1.638 +\section{Example: some minimal logics} \label{sec:min_logics}
1.639 +\index{examples!of logic definitions}
1.640 +
1.641 +This section presents some examples that have a simple syntax. They
1.642 +demonstrate how to define new object-logics from scratch.
1.643 +
1.644 +First we must define how an object-logic syntax embedded into the
1.645 +meta-logic. Since all theorems must conform to the syntax for~\ndx{prop} (see
1.646 +Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the
1.647 +object-level syntax. Assume that the syntax of your object-logic defines a
1.648 +nonterminal symbol~\ndx{o} of formulae. These formulae can now appear in
1.649 +axioms and theorems wherever \ndx{prop} does if you add the production
1.650 +\[ prop ~=~ o. \]
1.651 +This is not a copy production but a coercion from formulae to propositions:
1.652 +\begin{ttbox}
1.653 +Base = Pure +
1.654 +types
1.655 + o
1.656 +arities
1.657 + o :: logic
1.658 +consts
1.659 + Trueprop :: "o => prop" ("_" 5)
1.660 +end
1.661 +\end{ttbox}
1.662 +The constant \cdx{Trueprop} (the name is arbitrary) acts as an invisible
1.663 +coercion function. Assuming this definition resides in a file {\tt base.thy},
1.664 +you have to load it with the command {\tt use_thy "Base"}.
1.665 +
1.666 +One of the simplest nontrivial logics is {\bf minimal logic} of
1.667 +implication. Its definition in Isabelle needs no advanced features but
1.668 +illustrates the overall mechanism nicely:
1.669 +\begin{ttbox}
1.670 +Hilbert = Base +
1.671 +consts
1.672 + "-->" :: "[o, o] => o" (infixr 10)
1.673 +rules
1.674 + K "P --> Q --> P"
1.675 + S "(P --> Q --> R) --> (P --> Q) --> P --> R"
1.676 + MP "[| P --> Q; P |] ==> Q"
1.677 +end
1.678 +\end{ttbox}
1.679 +After loading this definition from the file {\tt hilbert.thy}, you can
1.680 +start to prove theorems in the logic:
1.681 +\begin{ttbox}
1.682 +goal Hilbert.thy "P --> P";
1.683 +{\out Level 0}
1.684 +{\out P --> P}
1.685 +{\out 1. P --> P}
1.686 +\ttbreak
1.687 +by (resolve_tac [Hilbert.MP] 1);
1.688 +{\out Level 1}
1.689 +{\out P --> P}
1.690 +{\out 1. ?P --> P --> P}
1.691 +{\out 2. ?P}
1.692 +\ttbreak
1.693 +by (resolve_tac [Hilbert.MP] 1);
1.694 +{\out Level 2}
1.695 +{\out P --> P}
1.696 +{\out 1. ?P1 --> ?P --> P --> P}
1.697 +{\out 2. ?P1}
1.698 +{\out 3. ?P}
1.699 +\ttbreak
1.700 +by (resolve_tac [Hilbert.S] 1);
1.701 +{\out Level 3}
1.702 +{\out P --> P}
1.703 +{\out 1. P --> ?Q2 --> P}
1.704 +{\out 2. P --> ?Q2}
1.705 +\ttbreak
1.706 +by (resolve_tac [Hilbert.K] 1);
1.707 +{\out Level 4}
1.708 +{\out P --> P}
1.709 +{\out 1. P --> ?Q2}
1.710 +\ttbreak
1.711 +by (resolve_tac [Hilbert.K] 1);
1.712 +{\out Level 5}
1.713 +{\out P --> P}
1.714 +{\out No subgoals!}
1.715 +\end{ttbox}
1.716 +As we can see, this Hilbert-style formulation of minimal logic is easy to
1.717 +define but difficult to use. The following natural deduction formulation is
1.718 +better:
1.719 +\begin{ttbox}
1.720 +MinI = Base +
1.721 +consts
1.722 + "-->" :: "[o, o] => o" (infixr 10)
1.723 +rules
1.724 + impI "(P ==> Q) ==> P --> Q"
1.725 + impE "[| P --> Q; P |] ==> Q"
1.726 +end
1.727 +\end{ttbox}
1.728 +Note, however, that although the two systems are equivalent, this fact
1.729 +cannot be proved within Isabelle. Axioms {\tt S} and {\tt K} can be
1.730 +derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt
1.731 + Hilbert}. The reason is that {\tt impI} is only an {\bf admissible} rule
1.732 +in {\tt Hilbert}, something that can only be shown by induction over all
1.733 +possible proofs in {\tt Hilbert}.
1.734 +
1.735 +We may easily extend minimal logic with falsity:
1.736 +\begin{ttbox}
1.737 +MinIF = MinI +
1.738 +consts
1.739 + False :: "o"
1.740 +rules
1.741 + FalseE "False ==> P"
1.742 +end
1.743 +\end{ttbox}
1.744 +On the other hand, we may wish to introduce conjunction only:
1.745 +\begin{ttbox}
1.746 +MinC = Base +
1.747 +consts
1.748 + "&" :: "[o, o] => o" (infixr 30)
1.749 +\ttbreak
1.750 +rules
1.751 + conjI "[| P; Q |] ==> P & Q"
1.752 + conjE1 "P & Q ==> P"
1.753 + conjE2 "P & Q ==> Q"
1.754 +end
1.755 +\end{ttbox}
1.756 +And if we want to have all three connectives together, we create and load a
1.757 +theory file consisting of a single line:\footnote{We can combine the
1.758 + theories without creating a theory file using the ML declaration
1.759 +\begin{ttbox}
1.760 +val MinIFC_thy = merge_theories(MinIF,MinC)
1.761 +\end{ttbox}
1.762 +\index{*merge_theories|fnote}}
1.763 +\begin{ttbox}
1.764 +MinIFC = MinIF + MinC
1.765 +\end{ttbox}
1.766 +Now we can prove mixed theorems like
1.767 +\begin{ttbox}
1.768 +goal MinIFC.thy "P & False --> Q";
1.769 +by (resolve_tac [MinI.impI] 1);
1.770 +by (dresolve_tac [MinC.conjE2] 1);
1.771 +by (eresolve_tac [MinIF.FalseE] 1);
1.772 +\end{ttbox}
1.773 +Try this as an exercise!
1.774 +
1.775 +