author lcp

Fri, 15 Apr 1994 16:37:59 +0200

changeset 320 76ae17549558

parent 319 f143f7686cd6

child 321 998f1c5adafb

doc-src/Ref/defining.tex file | annotate | diff | comparison | revisions
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/doc-src/Ref/defining.tex	Fri Apr 15 16:37:59 1994 +0200
     1.3 @@ -0,0 +1,772 @@
     1.4 +%% $Id$
     1.5 +\chapter{Defining Logics} \label{Defining-Logics}
     1.6 +This chapter explains how to define new formal systems --- in particular,
     1.7 +their concrete syntax.  While Isabelle can be regarded as a theorem prover
     1.8 +for set theory, higher-order logic or the sequent calculus, its
     1.9 +distinguishing feature is support for the definition of new logics.
    1.10 +
    1.11 +Isabelle logics are hierarchies of theories, which are described and
    1.12 +illustrated in 
    1.13 +\iflabelundefined{sec:defining-theories}{{\em Introduction to Isabelle}}%
    1.14 +{\S\ref{sec:defining-theories}}.  That material, together with the theory
    1.15 +files provided in the examples directories, should suffice for all simple
    1.16 +applications.  The easiest way to define a new theory is by modifying a
    1.17 +copy of an existing theory.
    1.18 +
    1.19 +This chapter documents the meta-logic syntax, mixfix declarations and
    1.20 +pretty printing.  The extended examples in \S\ref{sec:min_logics}
    1.21 +demonstrate the logical aspects of the definition of theories.
    1.22 +
    1.23 +
    1.24 +\section{Priority grammars} \label{sec:priority_grammars}
    1.25 +\index{priority grammars|(} 
    1.26 +
    1.27 +A context-free grammar contains a set of {\bf nonterminal symbols}, a set of
    1.28 +{\bf terminal symbols} and a set of {\bf productions}\index{productions}.
    1.29 +Productions have the form ${A=\gamma}$, where $A$ is a nonterminal and
    1.30 +$\gamma$ is a string of terminals and nonterminals.  One designated
    1.31 +nonterminal is called the {\bf start symbol}.  The language defined by the
    1.32 +grammar consists of all strings of terminals that can be derived from the
    1.33 +start symbol by applying productions as rewrite rules.
    1.34 +
    1.35 +The syntax of an Isabelle logic is specified by a {\bf priority
    1.36 +  grammar}.\index{priorities} Each nonterminal is decorated by an integer
    1.37 +priority, as in~$A^{(p)}$.  A nonterminal $A^{(p)}$ in a derivation may be
    1.38 +rewritten using a production $A^{(q)} = \gamma$ only if~$p \le q$.  Any
    1.39 +priority grammar can be translated into a normal context free grammar by
    1.40 +introducing new nonterminals and productions.
    1.41 +
    1.42 +Formally, a set of context free productions $G$ induces a derivation
    1.43 +relation $\longrightarrow@G$.  Let $\alpha$ and $\beta$ denote strings of
    1.44 +terminal or nonterminal symbols.  Then
    1.45 +\[ \alpha\, A^{(p)}\, \beta ~\longrightarrow@G~ \alpha\,\gamma\,\beta \] 
    1.46 +if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$p \le q$.
    1.47 +
    1.48 +The following simple grammar for arithmetic expressions demonstrates how
    1.49 +binding power and associativity of operators can be enforced by priorities.
    1.50 +\begin{center}
    1.51 +\begin{tabular}{rclr}
    1.52 +  $A^{(9)}$ & = & {\tt0} \\
    1.53 +  $A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\
    1.54 +  $A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\
    1.55 +  $A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\
    1.56 +  $A^{(3)}$ & = & {\tt-} $A^{(3)}$
    1.57 +\end{tabular}
    1.58 +\end{center}
    1.59 +The choice of priorities determines that {\tt -} binds tighter than {\tt *},
    1.60 +which binds tighter than {\tt +}.  Furthermore {\tt +} associates to the
    1.61 +left and {\tt *} to the right.
    1.62 +
    1.63 +For clarity, grammars obey these conventions:
    1.64 +\begin{itemize}
    1.65 +\item All priorities must lie between~0 and \ttindex{max_pri}, which is a
    1.66 +  some fixed integer.  Sometimes {\tt max_pri} is written as $\infty$.
    1.67 +\item Priority 0 on the right-hand side and priority \ttindex{max_pri} on
    1.68 +  the left-hand side may be omitted.
    1.69 +\item The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$; the
    1.70 +  priority of the left-hand side actually appears in a column on the far
    1.71 +  right.  
    1.72 +\item Alternatives are separated by~$|$.  
    1.73 +\item Repetition is indicated by dots~(\dots) in an informal but obvious
    1.74 +  way.
    1.75 +\end{itemize}
    1.76 +
    1.77 +Using these conventions and assuming $\infty=9$, the grammar
    1.78 +takes the form
    1.79 +\begin{center}
    1.80 +\begin{tabular}{rclc}
    1.81 +$A$ & = & {\tt0} & \hspace*{4em} \\
    1.82 + & $|$ & {\tt(} $A$ {\tt)} \\
    1.83 + & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\
    1.84 + & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\
    1.85 + & $|$ & {\tt-} $A^{(3)}$ & (3)
    1.86 +\end{tabular}
    1.87 +\end{center}
    1.88 +\index{priority grammars|)}
    1.89 +
    1.90 +
    1.91 +\begin{figure}
    1.92 +\begin{center}
    1.93 +\begin{tabular}{rclc}
    1.94 +$prop$ &=& {\tt PROP} $aprop$ ~~$|$~~ {\tt(} $prop$ {\tt)} \\
    1.95 +     &$|$& $logic^{(3)}$ {\tt ==} $logic^{(2)}$ & (2) \\
    1.96 +     &$|$& $logic^{(3)}$ {\tt =?=} $logic^{(2)}$ & (2) \\
    1.97 +     &$|$& $prop^{(2)}$ {\tt ==>} $prop^{(1)}$ & (1) \\
    1.98 +     &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\
    1.99 +     &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\\\
   1.100 +$logic$ &=& $prop$ ~~$|$~~ $fun$ \\\\
   1.101 +$aprop$ &=& $id$ ~~$|$~~ $var$
   1.102 +    ~~$|$~~ $fun^{(\infty)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\\\
   1.103 +$fun$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $fun$ {\tt)} \\
   1.104 +    &$|$& $fun^{(\infty)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\
   1.105 +    &$|$& $fun^{(\infty)}$ {\tt::} $type$ \\
   1.106 +    &$|$& {\tt \%} $idts$ {\tt.} $logic$ & (0) \\\\
   1.107 +$idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\
   1.108 +$idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
   1.109 +    &$|$& $id$ {\tt ::} $type$ & (0) \\\\
   1.110 +$type$ &=& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$
   1.111 +  ~~$|$~~ $tvar$ {\tt::} $sort$ \\
   1.112 +     &$|$& $id$ ~~$|$~~ $type^{(\infty)}$ $id$
   1.113 +                ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
   1.114 +     &$|$& $type^{(1)}$ {\tt =>} $type$ & (0) \\
   1.115 +     &$|$& {\tt[}  $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0)\\
   1.116 +     &$|$& {\tt(} $type$ {\tt)} \\\\
   1.117 +$sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace}
   1.118 +                ~~$|$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace}
   1.119 +\end{tabular}
   1.120 +\index{*PROP symbol}
   1.121 +\index{*== symbol}\index{*=?= symbol}\index{*==> symbol}
   1.122 +\index{*:: symbol}\index{*=> symbol}
   1.123 +%index command: percent is permitted, but braces must match!
   1.124 +\index{%@{\tt\%} symbol}
   1.125 +\index{{}@{\tt\ttlbrace} symbol}\index{{}@{\tt\ttrbrace} symbol}
   1.126 +\index{*[ symbol}\index{*] symbol}
   1.127 +\index{*"!"! symbol}
   1.128 +\index{*"["| symbol}
   1.129 +\index{*"|"] symbol}
   1.130 +\end{center}
   1.131 +\caption{Meta-logic syntax}\label{fig:pure_gram}
   1.132 +\end{figure}
   1.133 +
   1.134 +
   1.135 +\section{The Pure syntax} \label{sec:basic_syntax}
   1.136 +\index{syntax!Pure|(}
   1.137 +
   1.138 +At the root of all object-logics lies the theory \thydx{Pure}.  It
   1.139 +contains, among many other things, the Pure syntax.  An informal account of
   1.140 +this basic syntax (types, terms and formulae) appears in 
   1.141 +\iflabelundefined{sec:forward}{{\em Introduction to Isabelle}}%
   1.142 +{\S\ref{sec:forward}}.  A more precise description using a priority grammar
   1.143 +appears in Fig.\ts\ref{fig:pure_gram}.  It defines the following
   1.144 +nonterminals:
   1.145 +\begin{ttdescription}
   1.146 +\item[\ndxbold{prop}] denotes terms of type {\tt prop}.  These are formulae
   1.147 +  of the meta-logic.
   1.148 +
   1.149 +\item[\ndxbold{aprop}] denotes atomic propositions.  These typically
   1.150 +  include the judgement forms of the object-logic; its definition
   1.151 +  introduces a meta-level predicate for each judgement form.
   1.152 +
   1.153 +\item[\ndxbold{logic}] denotes terms whose type belongs to class
   1.154 +  \cldx{logic}.  As the syntax is extended by new object-logics, more
   1.155 +  productions for {\tt logic} are added automatically (see below).
   1.156 +
   1.157 +  \item[\ndxbold{fun}] denotes terms potentially of function type.
   1.158 +
   1.159 +  \item[\ndxbold{type}] denotes types of the meta-logic.
   1.160 +
   1.161 +  \item[\ndxbold{idts}] denotes a list of identifiers, possibly constrained
   1.162 +    by types.
   1.163 +\end{ttdescription}
   1.164 +
   1.165 +\begin{warn}
   1.166 +  In {\tt idts}, note that \verb|x::nat y| is parsed as \verb|x::(nat y)|,
   1.167 +  treating {\tt y} like a type constructor applied to {\tt nat}.  The
   1.168 +  likely result is an error message.  To avoid this interpretation, use
   1.169 +  parentheses and write \verb|(x::nat) y|.
   1.170 +
   1.171 +  Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and
   1.172 +  yields an error.  The correct form is \verb|(x::nat) (y::nat)|.
   1.173 +\end{warn}
   1.174 +
   1.175 +\subsection{Logical types and default syntax}\label{logical-types}
   1.176 +\index{lambda calc@$\lambda$-calculus}
   1.177 +
   1.178 +Isabelle's representation of mathematical languages is based on the simply
   1.179 +typed $\lambda$-calculus.  All logical types, namely those of class
   1.180 +\cldx{logic}, are automatically equipped with a basic syntax of types,
   1.181 +identifiers, variables, parentheses, $\lambda$-abstractions and
   1.182 +applications.
   1.183 +
   1.184 +More precisely, for each type constructor $ty$ with arity $(\vec{s})c$,
   1.185 +where $c$ is a subclass of \cldx{logic}, several productions are added:
   1.186 +\begin{center}
   1.187 +\begin{tabular}{rclc}
   1.188 +$ty$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $ty$ {\tt)} \\
   1.189 +  &$|$& $fun^{(\infty)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)}\\
   1.190 +  &$|$& $ty^{(\infty)}$ {\tt::} $type$\\\\
   1.191 +$logic$ &=& $ty$
   1.192 +\end{tabular}
   1.193 +\end{center}
   1.194 +
   1.195 +
   1.196 +\subsection{Lexical matters}
   1.197 +The parser does not process input strings directly.  It operates on token
   1.198 +lists provided by Isabelle's \bfindex{lexer}.  There are two kinds of
   1.199 +tokens: \bfindex{delimiters} and \bfindex{name tokens}.
   1.200 +
   1.201 +\index{reserved words}
   1.202 +Delimiters can be regarded as reserved words of the syntax.  You can
   1.203 +add new ones when extending theories.  In Fig.\ts\ref{fig:pure_gram} they
   1.204 +appear in typewriter font, for example {\tt ==}, {\tt =?=} and
   1.205 +{\tt PROP}\@.
   1.206 +
   1.207 +Name tokens have a predefined syntax.  The lexer distinguishes four
   1.208 +disjoint classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type
   1.209 +identifiers\index{type identifiers}, type unknowns\index{type unknowns}.
   1.210 +They are denoted by \ndxbold{id}, \ndxbold{var}, \ndxbold{tid},
   1.211 +\ndxbold{tvar}, respectively.  Typical examples are {\tt x}, {\tt ?x7},
   1.212 +{\tt 'a}, {\tt ?'a3}.  Here is the precise syntax:
   1.213 +\begin{eqnarray*}
   1.214 +id        & =   & letter~quasiletter^* \\
   1.215 +var       & =   & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
   1.216 +tid       & =   & \mbox{\tt '}id \\
   1.217 +tvar      & =   & \mbox{\tt ?}tid ~~|~~
   1.218 +                  \mbox{\tt ?}tid\mbox{\tt .}nat \\[1ex]
   1.219 +letter    & =   & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
   1.220 +digit     & =   & \mbox{one of {\tt 0}\dots {\tt 9}} \\
   1.221 +quasiletter & =  & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
   1.222 +nat       & =   & digit^+
   1.223 +\end{eqnarray*}
   1.224 +A \ndxbold{var} or \ndxbold{tvar} describes an unknown, which is internally
   1.225 +a pair of base name and index (\ML\ type \mltydx{indexname}).  These
   1.226 +components are either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
   1.227 +run together as in {\tt ?x1}.  The latter form is possible if the base name
   1.228 +does not end with digits.  If the index is 0, it may be dropped altogether:
   1.229 +{\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}.
   1.230 +
   1.231 +The lexer repeatedly takes the maximal prefix of the input string that
   1.232 +forms a valid token.  A maximal prefix that is both a delimiter and a name
   1.233 +is treated as a delimiter.  Spaces, tabs and newlines are separators; they
   1.234 +never occur within tokens.
   1.235 +
   1.236 +Delimiters need not be separated by white space.  For example, if {\tt -}
   1.237 +is a delimiter but {\tt --} is not, then the string {\tt --} is treated as
   1.238 +two consecutive occurrences of the token~{\tt -}.  In contrast, \ML\ 
   1.239 +treats {\tt --} as a single symbolic name.  The consequence of Isabelle's
   1.240 +more liberal scheme is that the same string may be parsed in different ways
   1.241 +after extending the syntax: after adding {\tt --} as a delimiter, the input
   1.242 +{\tt --} is treated as a single token.
   1.243 +
   1.244 +Although name tokens are returned from the lexer rather than the parser, it
   1.245 +is more logical to regard them as nonterminals.  Delimiters, however, are
   1.246 +terminals; they are just syntactic sugar and contribute nothing to the
   1.247 +abstract syntax tree.
   1.248 +
   1.249 +
   1.250 +\subsection{*Inspecting the syntax}
   1.251 +\begin{ttbox}
   1.252 +syn_of              : theory -> Syntax.syntax
   1.253 +Syntax.print_syntax : Syntax.syntax -> unit
   1.254 +Syntax.print_gram   : Syntax.syntax -> unit
   1.255 +Syntax.print_trans  : Syntax.syntax -> unit
   1.256 +\end{ttbox}
   1.257 +The abstract type \mltydx{Syntax.syntax} allows manipulation of syntaxes
   1.258 +in \ML.  You can display values of this type by calling the following
   1.259 +functions:
   1.260 +\begin{ttdescription}
   1.261 +\item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle
   1.262 +  theory~{\it thy} as an \ML\ value.
   1.263 +
   1.264 +\item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all
   1.265 +  information contained in the syntax {\it syn}.  The displayed output can
   1.266 +  be large.  The following two functions are more selective.
   1.267 +
   1.268 +\item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part
   1.269 +  of~{\it syn}, namely the lexicon, roots and productions.  These are
   1.270 +  discussed below.
   1.271 +
   1.272 +\item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation
   1.273 +  part of~{\it syn}, namely the constants, parse/print macros and
   1.274 +  parse/print translations.
   1.275 +\end{ttdescription}
   1.276 +
   1.277 +Let us demonstrate these functions by inspecting Pure's syntax.  Even that
   1.278 +is too verbose to display in full.
   1.279 +\begin{ttbox}\index{*Pure theory}
   1.280 +Syntax.print_syntax (syn_of Pure.thy);
   1.281 +{\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
   1.282 +{\out roots: logic type fun prop}
   1.283 +{\out prods:}
   1.284 +{\out   type = tid  (1000)}
   1.285 +{\out   type = tvar  (1000)}
   1.286 +{\out   type = id  (1000)}
   1.287 +{\out   type = tid "::" sort[0]  => "_ofsort" (1000)}
   1.288 +{\out   type = tvar "::" sort[0]  => "_ofsort" (1000)}
   1.289 +{\out   \vdots}
   1.290 +\ttbreak
   1.291 +{\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots}
   1.292 +{\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"}
   1.293 +{\out   "_idtyp" "_lambda" "_tapp" "_tappl"}
   1.294 +{\out parse_rules:}
   1.295 +{\out parse_translation: "!!" "_K" "_abs" "_aprop"}
   1.296 +{\out print_translation: "all"}
   1.297 +{\out print_rules:}
   1.298 +{\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
   1.299 +\end{ttbox}
   1.300 +
   1.301 +As you can see, the output is divided into labeled sections.  The grammar
   1.302 +is represented by {\tt lexicon}, {\tt roots} and {\tt prods}.  The rest
   1.303 +refers to syntactic translations and macro expansion.  Here is an
   1.304 +explanation of the various sections.
   1.305 +\begin{description}
   1.306 +  \item[{\tt lexicon}] lists the delimiters used for lexical
   1.307 +    analysis.\index{delimiters} 
   1.308 +
   1.309 +  \item[{\tt roots}] lists the grammar's nonterminal symbols.  You must
   1.310 +    name the desired root when calling lower level functions or specifying
   1.311 +    macros.  Higher level functions usually expect a type and derive the
   1.312 +    actual root as described in~\S\ref{sec:grammar}.
   1.313 +
   1.314 +  \item[{\tt prods}] lists the \rmindex{productions} of the priority grammar.
   1.315 +    The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}.
   1.316 +    Each delimiter is quoted.  Some productions are shown with {\tt =>} and
   1.317 +    an attached string.  These strings later become the heads of parse
   1.318 +    trees; they also play a vital role when terms are printed (see
   1.319 +    \S\ref{sec:asts}).
   1.320 +
   1.321 +    Productions with no strings attached are called {\bf copy
   1.322 +      productions}\indexbold{productions!copy}.  Their right-hand side must
   1.323 +    have exactly one nonterminal symbol (or name token).  The parser does
   1.324 +    not create a new parse tree node for copy productions, but simply
   1.325 +    returns the parse tree of the right-hand symbol.
   1.326 +
   1.327 +    If the right-hand side consists of a single nonterminal with no
   1.328 +    delimiters, then the copy production is called a {\bf chain
   1.329 +      production}.  Chain productions act as abbreviations:
   1.330 +    conceptually, they are removed from the grammar by adding new
   1.331 +    productions.  Priority information attached to chain productions is
   1.332 +    ignored; only the dummy value $-1$ is displayed.
   1.333 +
   1.334 +  \item[{\tt consts}, {\tt parse_rules}, {\tt print_rules}]
   1.335 +    relate to macros (see \S\ref{sec:macros}).
   1.336 +
   1.337 +  \item[{\tt parse_ast_translation}, {\tt print_ast_translation}]
   1.338 +    list sets of constants that invoke translation functions for abstract
   1.339 +    syntax trees.  Section \S\ref{sec:asts} below discusses this obscure
   1.340 +    matter.\index{constants!for translations}
   1.341 +
   1.342 +  \item[{\tt parse_translation}, {\tt print_translation}] list sets
   1.343 +    of constants that invoke translation functions for terms (see
   1.344 +    \S\ref{sec:tr_funs}).
   1.345 +\end{description}
   1.346 +\index{syntax!Pure|)}
   1.347 +
   1.348 +
   1.349 +\section{Mixfix declarations} \label{sec:mixfix}
   1.350 +\index{mixfix declarations|(} 
   1.351 +
   1.352 +When defining a theory, you declare new constants by giving their names,
   1.353 +their type, and an optional {\bf mixfix annotation}.  Mixfix annotations
   1.354 +allow you to extend Isabelle's basic $\lambda$-calculus syntax with
   1.355 +readable notation.  They can express any context-free priority grammar.
   1.356 +Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
   1.357 +general than the priority declarations of \ML\ and Prolog.  
   1.358 +
   1.359 +A mixfix annotation defines a production of the priority grammar.  It
   1.360 +describes the concrete syntax, the translation to abstract syntax, and the
   1.361 +pretty printing.  Special case annotations provide a simple means of
   1.362 +specifying infix operators, binders and so forth.
   1.363 +
   1.364 +\subsection{Grammar productions}\label{sec:grammar}\index{productions}
   1.365 +
   1.366 +Let us examine the treatment of the production
   1.367 +\[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\, A@2^{(p@2)}\, \ldots\,  
   1.368 +                  A@n^{(p@n)}\, w@n. \]
   1.369 +Here $A@i^{(p@i)}$ is a nonterminal with priority~$p@i$ for $i=1$,
   1.370 +\ldots,~$n$, while $w@0$, \ldots,~$w@n$ are strings of terminals.
   1.371 +In the corresponding mixfix annotation, the priorities are given separately
   1.372 +as $[p@1,\ldots,p@n]$ and~$p$.  The nonterminal symbols are identified with
   1.373 +types~$\tau$, $\tau@1$, \ldots,~$\tau@n$ respectively, and the production's
   1.374 +effect on nonterminals is expressed as the function type
   1.375 +\[ [\tau@1, \ldots, \tau@n]\To \tau. \]
   1.376 +Finally, the template
   1.377 +\[ w@0  \;_\; w@1 \;_\; \ldots \;_\; w@n \]
   1.378 +describes the strings of terminals.
   1.379 +
   1.380 +A simple type is typically declared for each nonterminal symbol.  In
   1.381 +first-order logic, type~$i$ stands for terms and~$o$ for formulae.  Only
   1.382 +the outermost type constructor is taken into account.  For example, any
   1.383 +type of the form $\sigma list$ stands for a list;  productions may refer
   1.384 +to the symbol {\tt list} and will apply lists of any type.
   1.385 +
   1.386 +The symbol associated with a type is called its {\bf root} since it may
   1.387 +serve as the root of a parse tree.  Precisely, the root of $(\tau@1, \dots,
   1.388 +\tau@n)ty$ is $ty$, where $\tau@1$, \ldots, $\tau@n$ are types and $ty$ is
   1.389 +a type constructor.  Type infixes are a special case of this; in
   1.390 +particular, the root of $\tau@1 \To \tau@2$ is {\tt fun}.  Finally, the
   1.391 +root of a type variable is {\tt logic}; general productions might
   1.392 +refer to this nonterminal.
   1.393 +
   1.394 +Identifying nonterminals with types allows a constant's type to specify
   1.395 +syntax as well.  We can declare the function~$f$ to have type $[\tau@1,
   1.396 +\ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the
   1.397 +layout of the function's $n$ arguments.  The constant's name, in this
   1.398 +case~$f$, will also serve as the label in the abstract syntax tree.  There
   1.399 +are two exceptions to this treatment of constants:
   1.400 +\begin{enumerate}\index{constants!syntactic}
   1.401 +  \item A production need not map directly to a logical function.  In this
   1.402 +    case, you must declare a constant whose purpose is purely syntactic.
   1.403 +    By convention such constants begin with the symbol~{\tt\at}, 
   1.404 +    ensuring that they can never be written in formulae.
   1.405 +
   1.406 +  \item A copy production has no associated constant.\index{productions!copy}
   1.407 +\end{enumerate}
   1.408 +There is something artificial about this representation of productions,
   1.409 +but it is convenient, particularly for simple theory extensions.
   1.410 +
   1.411 +\subsection{The general mixfix form}
   1.412 +Here is a detailed account of mixfix declarations.  Suppose the following
   1.413 +line occurs within the {\tt consts} section of a {\tt .thy} file:
   1.414 +\begin{center}
   1.415 +  {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}
   1.416 +\end{center}
   1.417 +This constant declaration and mixfix annotation is interpreted as follows:
   1.418 +\begin{itemize}\index{productions}
   1.419 +\item The string {\tt $c$} is the name of the constant associated with the
   1.420 +  production; unless it is a valid identifier, it must be enclosed in
   1.421 +  quotes.  If $c$ is empty (given as~{\tt ""}) then this is a copy
   1.422 +  production.\index{productions!copy} Otherwise, parsing an instance of the
   1.423 +  phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
   1.424 +    $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
   1.425 +  argument.
   1.426 +
   1.427 +  \item The constant $c$, if non-empty, is declared to have type $\sigma$.
   1.428 +
   1.429 +  \item The string $template$ specifies the right-hand side of
   1.430 +    the production.  It has the form
   1.431 +    \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \] 
   1.432 +    where each occurrence of {\tt_} denotes an argument position and
   1.433 +    the~$w@i$ do not contain~{\tt _}.  (If you want a literal~{\tt _} in
   1.434 +    the concrete syntax, you must escape it as described below.)  The $w@i$
   1.435 +    may consist of \rmindex{delimiters}, spaces or 
   1.436 +    \rmindex{pretty printing} annotations (see below).
   1.437 +
   1.438 +  \item The type $\sigma$ specifies the production's nonterminal symbols
   1.439 +    (or name tokens).  If $template$ is of the form above then $\sigma$
   1.440 +    must be a function type with at least~$n$ argument positions, say
   1.441 +    $\sigma = [\tau@1, \dots, \tau@n] \To \tau$.  Nonterminal symbols are
   1.442 +    derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described
   1.443 +    above.  Any of these may be function types; the corresponding root is
   1.444 +    then \tydx{fun}.
   1.445 +
   1.446 +  \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
   1.447 +      [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
   1.448 +    priority\indexbold{priorities} required of any phrase that may appear
   1.449 +    as the $i$-th argument.  Missing priorities default to~0.
   1.450 +
   1.451 +  \item The integer $p$ is the priority of this production.  If omitted, it
   1.452 +    defaults to the maximal priority.
   1.453 +    Priorities range between 0 and \ttindexbold{max_pri} (= 1000).
   1.454 +\end{itemize}
   1.455 +%
   1.456 +The declaration {\tt $c$ ::\ "$\sigma$" ("$template$")} specifies no
   1.457 +priorities.  The resulting production puts no priority constraints on any
   1.458 +of its arguments and has maximal priority itself.  Omitting priorities in
   1.459 +this manner will introduce syntactic ambiguities unless the production's
   1.460 +right-hand side is fully bracketed, as in \verb|"if _ then _ else _ fi"|.
   1.461 +
   1.462 +Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},
   1.463 +is sensible only if~$c$ is an identifier.  Otherwise you will be unable to
   1.464 +write terms involving~$c$.
   1.465 +
   1.466 +\begin{warn}
   1.467 +  Theories must sometimes declare types for purely syntactic purposes.  One
   1.468 +  example is \tydx{type}, the built-in type of types.  This is a `type of
   1.469 +  all types' in the syntactic sense only.  Do not declare such types under
   1.470 +  {\tt arities} as belonging to class {\tt logic}\index{*logic class}, for
   1.471 +  that would allow their use in arbitrary Isabelle
   1.472 +  expressions~(\S\ref{logical-types}).
   1.473 +\end{warn}
   1.474 +
   1.475 +\subsection{Example: arithmetic expressions}
   1.476 +\index{examples!of mixfix declarations}
   1.477 +This theory specification contains a {\tt consts} section with mixfix
   1.478 +declarations encoding the priority grammar from
   1.479 +\S\ref{sec:priority_grammars}:
   1.480 +\begin{ttbox}
   1.481 +EXP = Pure +
   1.482 +types
   1.483 +  exp
   1.484 +arities
   1.485 +  exp :: logic
   1.486 +consts
   1.487 +  "0" :: "exp"                ("0"      9)
   1.488 +  "+" :: "[exp, exp] => exp"  ("_ + _"  [0, 1] 0)
   1.489 +  "*" :: "[exp, exp] => exp"  ("_ * _"  [3, 2] 2)
   1.490 +  "-" :: "exp => exp"         ("- _"    [3] 3)
   1.491 +end
   1.492 +\end{ttbox}
   1.493 +The {\tt arities} declaration causes {\tt exp} to be added as a new root.
   1.494 +If you put this into a file {\tt exp.thy} and load it via {\tt
   1.495 +  use_thy "EXP"}, you can run some tests:
   1.496 +\begin{ttbox}
   1.497 +val read_exp = Syntax.test_read (syn_of EXP.thy) "exp";
   1.498 +{\out val it = fn : string -> unit}
   1.499 +read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0";
   1.500 +{\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}
   1.501 +{\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}
   1.502 +{\out \vdots}
   1.503 +read_exp "0 + - 0 + 0";
   1.504 +{\out tokens: "0" "+" "-" "0" "+" "0"}
   1.505 +{\out raw: ("+" ("+" "0" ("-" "0")) "0")}
   1.506 +{\out \vdots}
   1.507 +\end{ttbox}
   1.508 +The output of \ttindex{Syntax.test_read} includes the token list ({\tt
   1.509 +  tokens}) and the raw \AST{} directly derived from the parse tree,
   1.510 +ignoring parse \AST{} translations.  The rest is tracing information
   1.511 +provided by the macro expander (see \S\ref{sec:macros}).
   1.512 +
   1.513 +Executing {\tt Syntax.print_gram} reveals the productions derived
   1.514 +from our mixfix declarations (lots of additional information deleted):
   1.515 +\begin{ttbox}
   1.516 +Syntax.print_gram (syn_of EXP.thy);
   1.517 +{\out exp = "0"  => "0" (9)}
   1.518 +{\out exp = exp[0] "+" exp[1]  => "+" (0)}
   1.519 +{\out exp = exp[3] "*" exp[2]  => "*" (2)}
   1.520 +{\out exp = "-" exp[3]  => "-" (3)}
   1.521 +\end{ttbox}
   1.522 +
   1.523 +
   1.524 +\subsection{The mixfix template}
   1.525 +Let us take a closer look at the string $template$ appearing in mixfix
   1.526 +annotations.  This string specifies a list of parsing and printing
   1.527 +directives: delimiters\index{delimiters}, arguments, spaces, blocks of
   1.528 +indentation and line breaks.  These are encoded by the following character
   1.529 +sequences:
   1.530 +\index{pretty printing|(}
   1.531 +\begin{description}
   1.532 +\item[~$d$~] is a delimiter, namely a non-empty sequence of characters
   1.533 +  other than the special characters {\tt _}, {\tt(}, {\tt)} and~{\tt/}.
   1.534 +  Even these characters may appear if escaped; this means preceding it with
   1.535 +  a~{\tt '} (single quote).  Thus you have to write {\tt ''} if you really
   1.536 +  want a single quote.  Delimiters may never contain spaces.
   1.537 +
   1.538 +\item[~{\tt_}~] is an argument position, which stands for a nonterminal symbol
   1.539 +  or name token.
   1.540 +
   1.541 +\item[~$s$~] is a non-empty sequence of spaces for printing.  This and the
   1.542 +  following specifications do not affect parsing at all.
   1.543 +
   1.544 +\item[~{\tt(}$n$~] opens a pretty printing block.  The optional number $n$
   1.545 +  specifies how much indentation to add when a line break occurs within the
   1.546 +  block.  If {\tt(} is not followed by digits, the indentation defaults
   1.547 +  to~0.
   1.548 +
   1.549 +\item[~{\tt)}~] closes a pretty printing block.
   1.550 +
   1.551 +\item[~{\tt//}~] forces a line break.
   1.552 +
   1.553 +\item[~{\tt/}$s$~] allows a line break.  Here $s$ stands for the string of
   1.554 +  spaces (zero or more) right after the {\tt /} character.  These spaces
   1.555 +  are printed if the break is not taken.
   1.556 +\end{description}
   1.557 +For example, the template {\tt"(_ +/ _)"} specifies an infix operator.
   1.558 +There are two argument positions; the delimiter~{\tt+} is preceded by a
   1.559 +space and followed by a space or line break; the entire phrase is a pretty
   1.560 +printing block.  Other examples appear in Fig.\ts\ref{fig:set_trans} below.
   1.561 +Isabelle's pretty printer resembles the one described in
   1.562 +Paulson~\cite{paulson91}.
   1.563 +
   1.564 +\index{pretty printing|)}
   1.565 +
   1.566 +
   1.567 +\subsection{Infixes}
   1.568 +\indexbold{infixes}
   1.569 +
   1.570 +Infix operators associating to the left or right can be declared
   1.571 +using {\tt infixl} or {\tt infixr}.
   1.572 +Roughly speaking, the form {\tt $c$ ::\ "$\sigma$" (infixl $p$)}
   1.573 +abbreviates the constant declarations
   1.574 +\begin{ttbox}
   1.575 +"op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
   1.576 +"op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
   1.577 +\end{ttbox}
   1.578 +and {\tt $c$ ::\ "$\sigma$" (infixr $p$)} abbreviates the constant declarations
   1.579 +\begin{ttbox}
   1.580 +"op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
   1.581 +"op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
   1.582 +\end{ttbox}
   1.583 +The infix operator is declared as a constant with the prefix {\tt op}.
   1.584 +Thus, prefixing infixes with \sdx{op} makes them behave like ordinary
   1.585 +function symbols, as in \ML.  Special characters occurring in~$c$ must be
   1.586 +escaped, as in delimiters, using a single quote.
   1.587 +
   1.588 +The expanded forms above would be actually illegal in a {\tt .thy} file
   1.589 +because they declare the constant \hbox{\tt"op \(c\)"} twice.
   1.590 +
   1.591 +
   1.592 +\subsection{Binders}
   1.593 +\indexbold{binders}
   1.594 +\begingroup
   1.595 +\def\Q{{\cal Q}}
   1.596 +A {\bf binder} is a variable-binding construct such as a quantifier.  The
   1.597 +constant declaration
   1.598 +\begin{ttbox}
   1.599 +\(c\) :: "\(\sigma\)"   (binder "\(\Q\)" \(p\))
   1.600 +\end{ttbox}
   1.601 +introduces a constant~$c$ of type~$\sigma$, which must have the form
   1.602 +$(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where
   1.603 +$x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
   1.604 +and the whole term has type~$\tau@3$.  Special characters in $\Q$ must be
   1.605 +escaped using a single quote.
   1.606 +
   1.607 +The declaration is expanded internally to
   1.608 +\begin{ttbox}
   1.609 +\(c\)    :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"
   1.610 +"\(\Q\)"\hskip-3pt  :: "[idts, \(\tau@2\)] => \(\tau@3\)"   ("(3\(\Q\)_./ _)" \(p\))
   1.611 +\end{ttbox}
   1.612 +Here \ndx{idts} is the nonterminal symbol for a list of identifiers with
   1.613 +optional type constraints (see Fig.\ts\ref{fig:pure_gram}).  The
   1.614 +declaration also installs a parse translation\index{translations!parse}
   1.615 +for~$\Q$ and a print translation\index{translations!print} for~$c$ to
   1.616 +translate between the internal and external forms.
   1.617 +
   1.618 +A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a
   1.619 +list of variables.  The external form $\Q~x@1~x@2 \ldots x@n. P$
   1.620 +corresponds to the internal form
   1.621 +\[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]
   1.622 +
   1.623 +\medskip
   1.624 +For example, let us declare the quantifier~$\forall$:\index{quantifiers}
   1.625 +\begin{ttbox}
   1.626 +All :: "('a => o) => o"   (binder "ALL " 10)
   1.627 +\end{ttbox}
   1.628 +This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
   1.629 +  $x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall
   1.630 +back on ${\tt All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL
   1.631 +  $x$.$P$} have type~$o$, the type of formulae, while the bound variable
   1.632 +can be polymorphic.
   1.633 +\endgroup
   1.634 +
   1.635 +\index{mixfix declarations|)}
   1.636 +
   1.637 +
   1.638 +\section{Example: some minimal logics} \label{sec:min_logics}
   1.639 +\index{examples!of logic definitions}
   1.640 +
   1.641 +This section presents some examples that have a simple syntax.  They
   1.642 +demonstrate how to define new object-logics from scratch.
   1.643 +
   1.644 +First we must define how an object-logic syntax embedded into the
   1.645 +meta-logic.  Since all theorems must conform to the syntax for~\ndx{prop} (see
   1.646 +Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the
   1.647 +object-level syntax.  Assume that the syntax of your object-logic defines a
   1.648 +nonterminal symbol~\ndx{o} of formulae.  These formulae can now appear in
   1.649 +axioms and theorems wherever \ndx{prop} does if you add the production
   1.650 +\[ prop ~=~ o. \]
   1.651 +This is not a copy production but a coercion from formulae to propositions:
   1.652 +\begin{ttbox}
   1.653 +Base = Pure +
   1.654 +types
   1.655 +  o
   1.656 +arities
   1.657 +  o :: logic
   1.658 +consts
   1.659 +  Trueprop :: "o => prop"   ("_" 5)
   1.660 +end
   1.661 +\end{ttbox}
   1.662 +The constant \cdx{Trueprop} (the name is arbitrary) acts as an invisible
   1.663 +coercion function.  Assuming this definition resides in a file {\tt base.thy},
   1.664 +you have to load it with the command {\tt use_thy "Base"}.
   1.665 +
   1.666 +One of the simplest nontrivial logics is {\bf minimal logic} of
   1.667 +implication.  Its definition in Isabelle needs no advanced features but
   1.668 +illustrates the overall mechanism nicely:
   1.669 +\begin{ttbox}
   1.670 +Hilbert = Base +
   1.671 +consts
   1.672 +  "-->" :: "[o, o] => o"   (infixr 10)
   1.673 +rules
   1.674 +  K     "P --> Q --> P"
   1.675 +  S     "(P --> Q --> R) --> (P --> Q) --> P --> R"
   1.676 +  MP    "[| P --> Q; P |] ==> Q"
   1.677 +end
   1.678 +\end{ttbox}
   1.679 +After loading this definition from the file {\tt hilbert.thy}, you can
   1.680 +start to prove theorems in the logic:
   1.681 +\begin{ttbox}
   1.682 +goal Hilbert.thy "P --> P";
   1.683 +{\out Level 0}
   1.684 +{\out P --> P}
   1.685 +{\out  1.  P --> P}
   1.686 +\ttbreak
   1.687 +by (resolve_tac [Hilbert.MP] 1);
   1.688 +{\out Level 1}
   1.689 +{\out P --> P}
   1.690 +{\out  1.  ?P --> P --> P}
   1.691 +{\out  2.  ?P}
   1.692 +\ttbreak
   1.693 +by (resolve_tac [Hilbert.MP] 1);
   1.694 +{\out Level 2}
   1.695 +{\out P --> P}
   1.696 +{\out  1.  ?P1 --> ?P --> P --> P}
   1.697 +{\out  2.  ?P1}
   1.698 +{\out  3.  ?P}
   1.699 +\ttbreak
   1.700 +by (resolve_tac [Hilbert.S] 1);
   1.701 +{\out Level 3}
   1.702 +{\out P --> P}
   1.703 +{\out  1.  P --> ?Q2 --> P}
   1.704 +{\out  2.  P --> ?Q2}
   1.705 +\ttbreak
   1.706 +by (resolve_tac [Hilbert.K] 1);
   1.707 +{\out Level 4}
   1.708 +{\out P --> P}
   1.709 +{\out  1.  P --> ?Q2}
   1.710 +\ttbreak
   1.711 +by (resolve_tac [Hilbert.K] 1);
   1.712 +{\out Level 5}
   1.713 +{\out P --> P}
   1.714 +{\out No subgoals!}
   1.715 +\end{ttbox}
   1.716 +As we can see, this Hilbert-style formulation of minimal logic is easy to
   1.717 +define but difficult to use.  The following natural deduction formulation is
   1.718 +better:
   1.719 +\begin{ttbox}
   1.720 +MinI = Base +
   1.721 +consts
   1.722 +  "-->" :: "[o, o] => o"   (infixr 10)
   1.723 +rules
   1.724 +  impI  "(P ==> Q) ==> P --> Q"
   1.725 +  impE  "[| P --> Q; P |] ==> Q"
   1.726 +end
   1.727 +\end{ttbox}
   1.728 +Note, however, that although the two systems are equivalent, this fact
   1.729 +cannot be proved within Isabelle.  Axioms {\tt S} and {\tt K} can be
   1.730 +derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt
   1.731 +  Hilbert}.  The reason is that {\tt impI} is only an {\bf admissible} rule
   1.732 +in {\tt Hilbert}, something that can only be shown by induction over all
   1.733 +possible proofs in {\tt Hilbert}.
   1.734 +
   1.735 +We may easily extend minimal logic with falsity:
   1.736 +\begin{ttbox}
   1.737 +MinIF = MinI +
   1.738 +consts
   1.739 +  False :: "o"
   1.740 +rules
   1.741 +  FalseE "False ==> P"
   1.742 +end
   1.743 +\end{ttbox}
   1.744 +On the other hand, we may wish to introduce conjunction only:
   1.745 +\begin{ttbox}
   1.746 +MinC = Base +
   1.747 +consts
   1.748 +  "&" :: "[o, o] => o"   (infixr 30)
   1.749 +\ttbreak
   1.750 +rules
   1.751 +  conjI  "[| P; Q |] ==> P & Q"
   1.752 +  conjE1 "P & Q ==> P"
   1.753 +  conjE2 "P & Q ==> Q"
   1.754 +end
   1.755 +\end{ttbox}
   1.756 +And if we want to have all three connectives together, we create and load a
   1.757 +theory file consisting of a single line:\footnote{We can combine the
   1.758 +  theories without creating a theory file using the ML declaration
   1.759 +\begin{ttbox}
   1.760 +val MinIFC_thy = merge_theories(MinIF,MinC)
   1.761 +\end{ttbox}
   1.762 +\index{*merge_theories|fnote}}
   1.763 +\begin{ttbox}
   1.764 +MinIFC = MinIF + MinC
   1.765 +\end{ttbox}
   1.766 +Now we can prove mixed theorems like
   1.767 +\begin{ttbox}
   1.768 +goal MinIFC.thy "P & False --> Q";
   1.769 +by (resolve_tac [MinI.impI] 1);
   1.770 +by (dresolve_tac [MinC.conjE2] 1);
   1.771 +by (eresolve_tac [MinIF.FalseE] 1);
   1.772 +\end{ttbox}
   1.773 +Try this as an exercise!
   1.774 +
   1.775 +
author	lcp
	Fri, 15 Apr 1994 16:37:59 +0200
changeset 320	76ae17549558
parent 319	f143f7686cd6
child 321	998f1c5adafb