doc-src/Ref/defining.tex
author clasohm
Thu, 19 Jan 1995 16:05:21 +0100
changeset 866 2d3d020eef11
parent 864 d63b111b917a
child 867 e1a654c29b05
permissions -rw-r--r--
added documentation of bind_thm, qed, qed_goal, get_thm, thms_of
and theory_of_thm
     1 %% $Id$
     2 \chapter{Defining Logics} \label{Defining-Logics}
     3 This chapter explains how to define new formal systems --- in particular,
     4 their concrete syntax.  While Isabelle can be regarded as a theorem prover
     5 for set theory, higher-order logic or the sequent calculus, its
     6 distinguishing feature is support for the definition of new logics.
     7 
     8 Isabelle logics are hierarchies of theories, which are described and
     9 illustrated in
    10 \iflabelundefined{sec:defining-theories}{{\em Introduction to Isabelle}}%
    11 {\S\ref{sec:defining-theories}}.  That material, together with the theory
    12 files provided in the examples directories, should suffice for all simple
    13 applications.  The easiest way to define a new theory is by modifying a
    14 copy of an existing theory.
    15 
    16 This chapter documents the meta-logic syntax, mixfix declarations and
    17 pretty printing.  The extended examples in \S\ref{sec:min_logics}
    18 demonstrate the logical aspects of the definition of theories.
    19 
    20 
    21 \section{Priority grammars} \label{sec:priority_grammars}
    22 \index{priority grammars|(}
    23 
    24 A context-free grammar contains a set of {\bf nonterminal symbols}, a set of
    25 {\bf terminal symbols} and a set of {\bf productions}\index{productions}.
    26 Productions have the form ${A=\gamma}$, where $A$ is a nonterminal and
    27 $\gamma$ is a string of terminals and nonterminals.  One designated
    28 nonterminal is called the {\bf start symbol}.  The language defined by the
    29 grammar consists of all strings of terminals that can be derived from the
    30 start symbol by applying productions as rewrite rules.
    31 
    32 The syntax of an Isabelle logic is specified by a {\bf priority
    33   grammar}.\index{priorities} Each nonterminal is decorated by an integer
    34 priority, as in~$A^{(p)}$.  A nonterminal $A^{(p)}$ in a derivation may be
    35 rewritten using a production $A^{(q)} = \gamma$ only if~$p \le q$.  Any
    36 priority grammar can be translated into a normal context free grammar by
    37 introducing new nonterminals and productions.
    38 
    39 Formally, a set of context free productions $G$ induces a derivation
    40 relation $\longrightarrow@G$.  Let $\alpha$ and $\beta$ denote strings of
    41 terminal or nonterminal symbols.  Then
    42 \[ \alpha\, A^{(p)}\, \beta ~\longrightarrow@G~ \alpha\,\gamma\,\beta \]
    43 if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$p \le q$.
    44 
    45 The following simple grammar for arithmetic expressions demonstrates how
    46 binding power and associativity of operators can be enforced by priorities.
    47 \begin{center}
    48 \begin{tabular}{rclr}
    49   $A^{(9)}$ & = & {\tt0} \\
    50   $A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\
    51   $A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\
    52   $A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\
    53   $A^{(3)}$ & = & {\tt-} $A^{(3)}$
    54 \end{tabular}
    55 \end{center}
    56 The choice of priorities determines that {\tt -} binds tighter than {\tt *},
    57 which binds tighter than {\tt +}.  Furthermore {\tt +} associates to the
    58 left and {\tt *} to the right.
    59 
    60 For clarity, grammars obey these conventions:
    61 \begin{itemize}
    62 \item All priorities must lie between~0 and \ttindex{max_pri}, which is a
    63   some fixed integer.  Sometimes {\tt max_pri} is written as $\infty$.
    64 \item Priority 0 on the right-hand side and priority \ttindex{max_pri} on
    65   the left-hand side may be omitted.
    66 \item The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$; the
    67   priority of the left-hand side actually appears in a column on the far
    68   right.
    69 \item Alternatives are separated by~$|$.
    70 \item Repetition is indicated by dots~(\dots) in an informal but obvious
    71   way.
    72 \end{itemize}
    73 
    74 Using these conventions and assuming $\infty=9$, the grammar
    75 takes the form
    76 \begin{center}
    77 \begin{tabular}{rclc}
    78 $A$ & = & {\tt0} & \hspace*{4em} \\
    79  & $|$ & {\tt(} $A$ {\tt)} \\
    80  & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\
    81  & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\
    82  & $|$ & {\tt-} $A^{(3)}$ & (3)
    83 \end{tabular}
    84 \end{center}
    85 \index{priority grammars|)}
    86 
    87 
    88 \begin{figure}
    89 \begin{center}
    90 \begin{tabular}{rclc}
    91 $any$ &=& $prop$ ~~$|$~~ $logic$ \\\\
    92 $prop$ &=& {\tt(} $prop$ {\tt)} \\
    93      &$|$& $prop^{(4)}$ {\tt::} $type$ & (3) \\
    94      &$|$& {\tt PROP} $aprop$ \\
    95      &$|$& $any^{(3)}$ {\tt ==} $any^{(2)}$ & (2) \\
    96      &$|$& $any^{(3)}$ {\tt =?=} $any^{(2)}$ & (2) \\
    97      &$|$& $prop^{(2)}$ {\tt ==>} $prop^{(1)}$ & (1) \\
    98      &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\
    99      &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\
   100      &$|$& {\tt OFCLASS} {\tt(} $type$ {\tt,} $logic$ {\tt)} \\\\
   101 $aprop$ &=& $id$ ~~$|$~~ $var$
   102     ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\\\
   103 $logic$ &=& {\tt(} $logic$ {\tt)} \\
   104       &$|$& $logic^{(4)}$ {\tt::} $type$ & (3) \\
   105       &$|$& $id$ ~~$|$~~ $var$
   106     ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\
   107       &$|$& {\tt \%} $idts$ {\tt.} $any$ & (0) \\\\
   108 $idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\
   109 $idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
   110     &$|$& $id$ {\tt ::} $type$ & (0) \\\\
   111 $type$ &=& {\tt(} $type$ {\tt)} \\
   112      &$|$& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$
   113        ~~$|$~~ $tvar$ {\tt::} $sort$ \\
   114      &$|$& $id$ ~~$|$~~ $type^{(\infty)}$ $id$
   115                 ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
   116      &$|$& $type^{(1)}$ {\tt =>} $type$ & (0) \\
   117      &$|$& {\tt[}  $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0) \\\\
   118 $sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace}
   119                 ~~$|$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace}
   120 \end{tabular}
   121 \index{*PROP symbol}
   122 \index{*== symbol}\index{*=?= symbol}\index{*==> symbol}
   123 \index{*:: symbol}\index{*=> symbol}
   124 \index{sort constraints}
   125 %the index command: a percent is permitted, but braces must match!
   126 \index{%@{\tt\%} symbol}
   127 \index{{}@{\tt\ttlbrace} symbol}\index{{}@{\tt\ttrbrace} symbol}
   128 \index{*[ symbol}\index{*] symbol}
   129 \index{*"!"! symbol}
   130 \index{*"["| symbol}
   131 \index{*"|"] symbol}
   132 \end{center}
   133 \caption{Meta-logic syntax}\label{fig:pure_gram}
   134 \end{figure}
   135 
   136 
   137 \section{The Pure syntax} \label{sec:basic_syntax}
   138 \index{syntax!Pure|(}
   139 
   140 At the root of all object-logics lies the theory \thydx{Pure}.  It
   141 contains, among many other things, the Pure syntax.  An informal account of
   142 this basic syntax (types, terms and formulae) appears in
   143 \iflabelundefined{sec:forward}{{\em Introduction to Isabelle}}%
   144 {\S\ref{sec:forward}}.  A more precise description using a priority grammar
   145 appears in Fig.\ts\ref{fig:pure_gram}.  It defines the following
   146 nonterminals:
   147 \begin{ttdescription}
   148   \item[\ndxbold{any}] denotes any term.
   149 
   150   \item[\ndxbold{prop}] denotes terms of type {\tt prop}.  These are formulae
   151     of the meta-logic.  Note that user constants of result type {\tt prop}
   152     (i.e.\ $c :: \ldots \To prop$) should always provide concrete syntax.
   153     Otherwise atomic propositions with head $c$ may be printed incorrectly.
   154 
   155   \item[\ndxbold{aprop}] denotes atomic propositions.
   156 
   157 %% FIXME huh!?
   158 %  These typically
   159 %  include the judgement forms of the object-logic; its definition
   160 %  introduces a meta-level predicate for each judgement form.
   161 
   162   \item[\ndxbold{logic}] denotes terms whose type belongs to class
   163     \cldx{logic}, excluding type \tydx{prop}.
   164 
   165   \item[\ndxbold{idts}] denotes a list of identifiers, possibly constrained
   166     by types.
   167 
   168   \item[\ndxbold{type}] denotes types of the meta-logic.
   169 
   170   \item[\ndxbold{sort}] denotes meta-level sorts.
   171 \end{ttdescription}
   172 
   173 \begin{warn}
   174   In {\tt idts}, note that \verb|x::nat y| is parsed as \verb|x::(nat y)|,
   175   treating {\tt y} like a type constructor applied to {\tt nat}.  The
   176   likely result is an error message.  To avoid this interpretation, use
   177   parentheses and write \verb|(x::nat) y|.
   178   \index{type constraints}\index{*:: symbol}
   179 
   180   Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and
   181   yields an error.  The correct form is \verb|(x::nat) (y::nat)|.
   182 \end{warn}
   183 
   184 \begin{warn}
   185   Type constraints bind very weakly. For example, \verb!x<y::nat! is normally
   186   parsed as \verb!(x<y)::nat!, unless \verb$<$ has priority of 3 or less, in
   187   which case the string is likely to be ambiguous. The correct form is
   188   \verb!x<(y::nat)!.
   189 \end{warn}
   190 
   191 Isabelle's representation of mathematical languages is based on the simply
   192 typed $\lambda$-calculus\index{lambda calc@$\lambda$-calculus}.  All user
   193 defined logical types, namely those of class \cldx{logic}, refer to the
   194 nonterminal {\tt logic}. Thus they are automatically equipped with a basic
   195 syntax of types, identifiers, variables, parentheses, $\lambda$-abstractions
   196 and applications.
   197 
   198 
   199 \subsection{Lexical matters}
   200 The parser does not process input strings directly.  It operates on token
   201 lists provided by Isabelle's \bfindex{lexer}.  There are two kinds of
   202 tokens: \bfindex{delimiters} and \bfindex{name tokens}.
   203 
   204 \index{reserved words}
   205 Delimiters can be regarded as reserved words of the syntax.  You can
   206 add new ones when extending theories.  In Fig.\ts\ref{fig:pure_gram} they
   207 appear in typewriter font, for example {\tt ==}, {\tt =?=} and
   208 {\tt PROP}\@.
   209 
   210 Name tokens have a predefined syntax.  The lexer distinguishes six disjoint
   211 classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type
   212 identifiers\index{type identifiers}, type unknowns\index{type unknowns},
   213 \rmindex{numerals}, \rmindex{strings}. They are denoted by \ndxbold{id},
   214 \ndxbold{var}, \ndxbold{tid}, \ndxbold{tvar}, \ndxbold{xnum}, \ndxbold{xstr},
   215 respectively.  Typical examples are {\tt x}, {\tt ?x7}, {\tt 'a}, {\tt ?'a3},
   216 {\tt \#42}, {\tt ''foo bar''}. Here is the precise syntax:
   217 \begin{eqnarray*}
   218 id        & =   & letter~quasiletter^* \\
   219 var       & =   & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
   220 tid       & =   & \mbox{\tt '}id \\
   221 tvar      & =   & \mbox{\tt ?}tid ~~|~~
   222                   \mbox{\tt ?}tid\mbox{\tt .}nat \\
   223 xnum      & =   & \mbox{\tt \#}nat ~~|~~ \mbox{\tt \#\char`\~}nat \\
   224 xstr      & =   & \mbox{\tt ''\rm text\tt ''} \\[1ex]
   225 letter    & =   & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
   226 digit     & =   & \mbox{one of {\tt 0}\dots {\tt 9}} \\
   227 quasiletter & =  & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
   228 nat       & =   & digit^+
   229 \end{eqnarray*}
   230 The lexer repeatedly takes the maximal prefix of the input string that forms
   231 a valid token.  A maximal prefix that is both a delimiter and a name is
   232 treated as a delimiter.  Spaces, tabs, newlines and formfeeds are separators;
   233 they never occur within tokens, except those of class $xstr$.
   234 
   235 \medskip
   236 Delimiters need not be separated by white space.  For example, if {\tt -}
   237 is a delimiter but {\tt --} is not, then the string {\tt --} is treated as
   238 two consecutive occurrences of the token~{\tt -}.  In contrast, \ML\
   239 treats {\tt --} as a single symbolic name.  The consequence of Isabelle's
   240 more liberal scheme is that the same string may be parsed in different ways
   241 after extending the syntax: after adding {\tt --} as a delimiter, the input
   242 {\tt --} is treated as a single token.
   243 
   244 A \ndxbold{var} or \ndxbold{tvar} describes an unknown, which is internally
   245 a pair of base name and index (\ML\ type \mltydx{indexname}).  These
   246 components are either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
   247 run together as in {\tt ?x1}.  The latter form is possible if the base name
   248 does not end with digits.  If the index is 0, it may be dropped altogether:
   249 {\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}.
   250 
   251 Tokens of class $xnum$ or $xstr$ are not used by the meta-logic.
   252 Object-logics may provide numerals and string constants by adding appropriate
   253 productions and translation functions.
   254 
   255 \medskip
   256 Although name tokens are returned from the lexer rather than the parser, it
   257 is more logical to regard them as nonterminals.  Delimiters, however, are
   258 terminals; they are just syntactic sugar and contribute nothing to the
   259 abstract syntax tree.
   260 
   261 
   262 \subsection{*Inspecting the syntax}
   263 \begin{ttbox}
   264 syn_of              : theory -> Syntax.syntax
   265 print_syntax        : theory -> unit
   266 Syntax.print_syntax : Syntax.syntax -> unit
   267 Syntax.print_gram   : Syntax.syntax -> unit
   268 Syntax.print_trans  : Syntax.syntax -> unit
   269 \end{ttbox}
   270 The abstract type \mltydx{Syntax.syntax} allows manipulation of syntaxes
   271 in \ML.  You can display values of this type by calling the following
   272 functions:
   273 \begin{ttdescription}
   274 \item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle
   275   theory~{\it thy} as an \ML\ value.
   276 
   277 \item[\ttindexbold{print_syntax} $thy$] displays the syntax part of $thy$
   278   using {\tt Syntax.print_syntax}.
   279 
   280 \item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all
   281   information contained in the syntax {\it syn}.  The displayed output can
   282   be large.  The following two functions are more selective.
   283 
   284 \item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part
   285   of~{\it syn}, namely the lexicon, logical types and productions.  These are
   286   discussed below.
   287 
   288 \item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation
   289   part of~{\it syn}, namely the constants, parse/print macros and
   290   parse/print translations.
   291 \end{ttdescription}
   292 
   293 Let us demonstrate these functions by inspecting Pure's syntax.  Even that
   294 is too verbose to display in full.
   295 \begin{ttbox}\index{*Pure theory}
   296 Syntax.print_syntax (syn_of Pure.thy);
   297 {\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
   298 {\out logtypes: fun itself}
   299 {\out prods:}
   300 {\out   type = tid  (1000)}
   301 {\out   type = tvar  (1000)}
   302 {\out   type = id  (1000)}
   303 {\out   type = tid "::" sort[0]  => "_ofsort" (1000)}
   304 {\out   type = tvar "::" sort[0]  => "_ofsort" (1000)}
   305 {\out   \vdots}
   306 \ttbreak
   307 {\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots}
   308 {\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"}
   309 {\out   "_idtyp" "_lambda" "_tapp" "_tappl"}
   310 {\out parse_rules:}
   311 {\out parse_translation: "!!" "_K" "_abs" "_aprop"}
   312 {\out print_translation: "all"}
   313 {\out print_rules:}
   314 {\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
   315 \end{ttbox}
   316 
   317 As you can see, the output is divided into labelled sections.  The grammar
   318 is represented by {\tt lexicon}, {\tt logtypes} and {\tt prods}.  The rest
   319 refers to syntactic translations and macro expansion.  Here is an
   320 explanation of the various sections.
   321 \begin{description}
   322   \item[{\tt lexicon}] lists the delimiters used for lexical
   323     analysis.\index{delimiters}
   324 
   325   \item[{\tt logtypes}] lists the types that are regarded the same as {\tt
   326     logic} syntactically. Thus types of object-logics (e.g.\ {\tt nat}, say)
   327     will be automatically equipped with the standard syntax of
   328     $\lambda$-calculus.
   329 
   330   \item[{\tt prods}] lists the \rmindex{productions} of the priority grammar.
   331     The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}.
   332     Each delimiter is quoted.  Some productions are shown with {\tt =>} and
   333     an attached string.  These strings later become the heads of parse
   334     trees; they also play a vital role when terms are printed (see
   335     \S\ref{sec:asts}).
   336 
   337     Productions with no strings attached are called {\bf copy
   338       productions}\indexbold{productions!copy}.  Their right-hand side must
   339     have exactly one nonterminal symbol (or name token).  The parser does
   340     not create a new parse tree node for copy productions, but simply
   341     returns the parse tree of the right-hand symbol.
   342 
   343     If the right-hand side consists of a single nonterminal with no
   344     delimiters, then the copy production is called a {\bf chain
   345       production}.  Chain productions act as abbreviations:
   346     conceptually, they are removed from the grammar by adding new
   347     productions.  Priority information attached to chain productions is
   348     ignored; only the dummy value $-1$ is displayed.
   349 
   350   \item[{\tt consts}, {\tt parse_rules}, {\tt print_rules}]
   351     relate to macros (see \S\ref{sec:macros}).
   352 
   353   \item[{\tt parse_ast_translation}, {\tt print_ast_translation}]
   354     list sets of constants that invoke translation functions for abstract
   355     syntax trees.  Section \S\ref{sec:asts} below discusses this obscure
   356     matter.\index{constants!for translations}
   357 
   358   \item[{\tt parse_translation}, {\tt print_translation}] list sets
   359     of constants that invoke translation functions for terms (see
   360     \S\ref{sec:tr_funs}).
   361 \end{description}
   362 \index{syntax!Pure|)}
   363 
   364 
   365 \section{Mixfix declarations} \label{sec:mixfix}
   366 \index{mixfix declarations|(}
   367 
   368 When defining a theory, you declare new constants by giving their names,
   369 their type, and an optional {\bf mixfix annotation}.  Mixfix annotations
   370 allow you to extend Isabelle's basic $\lambda$-calculus syntax with
   371 readable notation.  They can express any context-free priority grammar.
   372 Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
   373 general than the priority declarations of \ML\ and Prolog.
   374 
   375 A mixfix annotation defines a production of the priority grammar.  It
   376 describes the concrete syntax, the translation to abstract syntax, and the
   377 pretty printing.  Special case annotations provide a simple means of
   378 specifying infix operators and binders.
   379 
   380 
   381 %% FIXME remove
   382 %\subsection{Grammar productions}\label{sec:grammar}\index{productions}
   383 %
   384 %Let us examine the treatment of the production
   385 %\[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\, A@2^{(p@2)}\, \ldots\,
   386 %                  A@n^{(p@n)}\, w@n. \]
   387 %Here $A@i^{(p@i)}$ is a nonterminal with priority~$p@i$ for $i=1$,
   388 %\ldots,~$n$, while $w@0$, \ldots,~$w@n$ are strings of terminals.
   389 %In the corresponding mixfix annotation, the priorities are given separately
   390 %as $[p@1,\ldots,p@n]$ and~$p$.  The nonterminal symbols are derived from
   391 %types~$\tau$, $\tau@1$, \ldots,~$\tau@n$ respectively, and the production's
   392 %effect on nonterminals is expressed as the function type
   393 %\[ [\tau@1, \ldots, \tau@n]\To \tau. \]
   394 %Finally, the template
   395 %\[ w@0  \;_\; w@1 \;_\; \ldots \;_\; w@n \]
   396 %describes the strings of terminals.
   397 
   398 
   399 \subsection{The general mixfix form}
   400 Here is a detailed account of mixfix declarations.  Suppose the following
   401 line occurs within a {\tt consts} or {\tt syntax} section of a {\tt .thy}
   402 file:
   403 \begin{center}
   404   {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}
   405 \end{center}
   406 This constant declaration and mixfix annotation are interpreted as follows:
   407 \begin{itemize}\index{productions}
   408 \item The string {\tt $c$} is the name of the constant associated with the
   409   production; unless it is a valid identifier, it must be enclosed in
   410   quotes.  If $c$ is empty (given as~{\tt ""}) then this is a copy
   411   production.\index{productions!copy} Otherwise, parsing an instance of the
   412   phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
   413     $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
   414   argument.
   415 
   416   \item The constant $c$, if non-empty, is declared to have type $\sigma$
   417     ({\tt consts} section only).
   418 
   419   \item The string $template$ specifies the right-hand side of
   420     the production.  It has the form
   421     \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \]
   422     where each occurrence of {\tt_} denotes an argument position and
   423     the~$w@i$ do not contain~{\tt _}.  (If you want a literal~{\tt _} in
   424     the concrete syntax, you must escape it as described below.)  The $w@i$
   425     may consist of \rmindex{delimiters}, spaces or
   426     \rmindex{pretty printing} annotations (see below).
   427 
   428   \item The type $\sigma$ specifies the production's nonterminal symbols
   429     (or name tokens).  If $template$ is of the form above then $\sigma$
   430     must be a function type with at least~$n$ argument positions, say
   431     $\sigma = [\tau@1, \dots, \tau@n] \To \tau$.  Nonterminal symbols are
   432     derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described
   433     below.  Any of these may be function types.
   434 
   435   \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
   436       [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
   437     priority\indexbold{priorities} required of any phrase that may appear
   438     as the $i$-th argument.  Missing priorities default to~0.
   439 
   440   \item The integer $p$ is the priority of this production.  If omitted, it
   441     defaults to the maximal priority.
   442     Priorities range between 0 and \ttindexbold{max_pri} (= 1000).
   443 \end{itemize}
   444 %
   445 The resulting production is \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\,
   446 A@2^{(p@2)}\, \dots\, A@n^{(p@n)}\, w@n \] where $A$ and the $A@i$ are the
   447 nonterminals corresponding to the types $\tau$ and $\tau@i$ respectively.
   448 The nonterminal symbol associated with a type $(\ldots)ty$ is {\tt logic}, if
   449 this is a logical type (namely one of class {\tt logic} excluding {\tt
   450 prop}).  Otherwise it is $ty$ (note that only the outermost type constructor
   451 is taken into account).  Finally, the nonterminal of a type variable is {\tt
   452 any}.
   453 
   454 \begin{warn} 
   455   Theories must sometimes declare types for purely syntactic purposes ---
   456   merely playing the role of nonterminals. One example is \tydx{type}, the
   457   built-in type of types.  This is a `type of all types' in the syntactic
   458   sense only.  Do not declare such types under {\tt arities} as belonging to
   459   class {\tt logic}\index{*logic class}, for that would make them useless as
   460   separate nonterminal symbols.
   461 \end{warn}
   462 
   463 Associating nonterminals with types allows a constant's type to specify
   464 syntax as well.  We can declare the function~$f$ to have type $[\tau@1,
   465 \ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the layout
   466 of the function's $n$ arguments.  The constant's name, in this case~$f$, will
   467 also serve as the label in the abstract syntax tree.
   468 
   469 You may also declare mixfix syntax without adding constants to the theory's
   470 signature, by using a {\tt syntax} section instead of {\tt consts}.  Thus a
   471 production need not map directly to a logical function (this typically
   472 requires additional syntactic translations, see also
   473 Chapter~\ref{chap:syntax}).
   474 
   475 
   476 \medskip 
   477 As a special case of the general mixfix declaration, the form 
   478 \begin{center}
   479   {\tt $c$ ::\ "$\sigma$" ("$template$")} 
   480 \end{center}
   481 specifies no priorities.  The resulting production puts no priority
   482 constraints on any of its arguments and has maximal priority itself.
   483 Omitting priorities in this manner is prone to syntactic ambiguities unless
   484 the production's right-hand side is fully bracketed, as in \verb|"if _ then _
   485 else _ fi"|.
   486 
   487 Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},
   488 is sensible only if~$c$ is an identifier.  Otherwise you will be unable to
   489 write terms involving~$c$.
   490 
   491 \medskip 
   492 There is something artificial about the representation of productions as
   493 mixfix declarations, but it is convenient, particularly for simple theory
   494 extensions.
   495 
   496 
   497 \subsection{Example: arithmetic expressions}
   498 \index{examples!of mixfix declarations}
   499 This theory specification contains a {\tt syntax} section with mixfix
   500 declarations encoding the priority grammar from
   501 \S\ref{sec:priority_grammars}:
   502 \begin{ttbox}
   503 EXP = Pure +
   504 types
   505   exp
   506 syntax
   507   "0" :: "exp"                ("0"      9)
   508   "+" :: "[exp, exp] => exp"  ("_ + _"  [0, 1] 0)
   509   "*" :: "[exp, exp] => exp"  ("_ * _"  [3, 2] 2)
   510   "-" :: "exp => exp"         ("- _"    [3] 3)
   511 end
   512 \end{ttbox}
   513 If you put this into a file {\tt EXP.thy} and load it via {\tt use_thy"EXP"},
   514 you can run some tests:
   515 \begin{ttbox}
   516 val read_exp = Syntax.test_read (syn_of EXP.thy) "exp";
   517 {\out val it = fn : string -> unit}
   518 read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0";
   519 {\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}
   520 {\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}
   521 {\out \vdots}
   522 read_exp "0 + - 0 + 0";
   523 {\out tokens: "0" "+" "-" "0" "+" "0"}
   524 {\out raw: ("+" ("+" "0" ("-" "0")) "0")}
   525 {\out \vdots}
   526 \end{ttbox}
   527 The output of \ttindex{Syntax.test_read} includes the token list ({\tt
   528   tokens}) and the raw \AST{} directly derived from the parse tree,
   529 ignoring parse \AST{} translations.  The rest is tracing information
   530 provided by the macro expander (see \S\ref{sec:macros}).
   531 
   532 Executing {\tt Syntax.print_gram} reveals the productions derived from the
   533 above mixfix declarations (lots of additional information deleted):
   534 \begin{ttbox}
   535 Syntax.print_gram (syn_of EXP.thy);
   536 {\out exp = "0"  => "0" (9)}
   537 {\out exp = exp[0] "+" exp[1]  => "+" (0)}
   538 {\out exp = exp[3] "*" exp[2]  => "*" (2)}
   539 {\out exp = "-" exp[3]  => "-" (3)}
   540 \end{ttbox}
   541 
   542 
   543 \subsection{The mixfix template}
   544 Let us now take a closer look at the string $template$ appearing in mixfix
   545 annotations.  This string specifies a list of parsing and printing
   546 directives: delimiters\index{delimiters}, arguments, spaces, blocks of
   547 indentation and line breaks.  These are encoded by the following character
   548 sequences:
   549 \index{pretty printing|(}
   550 \begin{description}
   551 \item[~$d$~] is a delimiter, namely a non-empty sequence of characters
   552   other than the special characters {\tt _}, {\tt(}, {\tt)} and~{\tt/}.
   553   Even these characters may appear if escaped; this means preceding it with
   554   a~{\tt '} (single quote).  Thus you have to write {\tt ''} if you really
   555   want a single quote.  Delimiters may never contain spaces.
   556 
   557 \item[~{\tt_}~] is an argument position, which stands for a nonterminal symbol
   558   or name token.
   559 
   560 \item[~$s$~] is a non-empty sequence of spaces for printing.  This and the
   561   following specifications do not affect parsing at all.
   562 
   563 \item[~{\tt(}$n$~] opens a pretty printing block.  The optional number $n$
   564   specifies how much indentation to add when a line break occurs within the
   565   block.  If {\tt(} is not followed by digits, the indentation defaults
   566   to~0.
   567 
   568 \item[~{\tt)}~] closes a pretty printing block.
   569 
   570 \item[~{\tt//}~] forces a line break.
   571 
   572 \item[~{\tt/}$s$~] allows a line break.  Here $s$ stands for the string of
   573   spaces (zero or more) right after the {\tt /} character.  These spaces
   574   are printed if the break is not taken.
   575 \end{description}
   576 For example, the template {\tt"(_ +/ _)"} specifies an infix operator.
   577 There are two argument positions; the delimiter~{\tt+} is preceded by a
   578 space and followed by a space or line break; the entire phrase is a pretty
   579 printing block.  Other examples appear in Fig.\ts\ref{fig:set_trans} below.
   580 Isabelle's pretty printer resembles the one described in
   581 Paulson~\cite{paulson91}.
   582 
   583 \index{pretty printing|)}
   584 
   585 
   586 \subsection{Infixes}
   587 \indexbold{infixes}
   588 
   589 Infix operators associating to the left or right can be declared
   590 using {\tt infixl} or {\tt infixr}.
   591 Roughly speaking, the form {\tt $c$ ::\ "$\sigma$" (infixl $p$)}
   592 abbreviates the mixfix declarations
   593 \begin{ttbox}
   594 "op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
   595 "op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
   596 \end{ttbox}
   597 and {\tt $c$ ::\ "$\sigma$" (infixr $p$)} abbreviates the mixfix declarations
   598 \begin{ttbox}
   599 "op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
   600 "op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
   601 \end{ttbox}
   602 The infix operator is declared as a constant with the prefix {\tt op}.
   603 Thus, prefixing infixes with \sdx{op} makes them behave like ordinary
   604 function symbols, as in \ML.  Special characters occurring in~$c$ must be
   605 escaped, as in delimiters, using a single quote.
   606 
   607 
   608 \subsection{Binders}
   609 \indexbold{binders}
   610 \begingroup
   611 \def\Q{{\cal Q}}
   612 A {\bf binder} is a variable-binding construct such as a quantifier.  The
   613 constant declaration
   614 \begin{ttbox}
   615 \(c\) :: "\(\sigma\)"   (binder "\(\Q\)" \(p\))
   616 \end{ttbox}
   617 introduces a constant~$c$ of type~$\sigma$, which must have the form
   618 $(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where
   619 $x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
   620 and the whole term has type~$\tau@3$.  Special characters in $\Q$ must be
   621 escaped using a single quote.
   622 
   623 The declaration is expanded internally to something like
   624 \begin{ttbox}
   625 \(c\)    :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"
   626 "\(\Q\)"\hskip-3pt  :: "[idts, \(\tau@2\)] => \(\tau@3\)"   ("(3\(\Q\)_./ _)" \(p\))
   627 \end{ttbox}
   628 Here \ndx{idts} is the nonterminal symbol for a list of identifiers with
   629 \index{type constraints}
   630 optional type constraints (see Fig.\ts\ref{fig:pure_gram}).  The
   631 declaration also installs a parse translation\index{translations!parse}
   632 for~$\Q$ and a print translation\index{translations!print} for~$c$ to
   633 translate between the internal and external forms.
   634 
   635 A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a
   636 list of variables.  The external form $\Q~x@1~x@2 \ldots x@n. P$
   637 corresponds to the internal form
   638 \[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]
   639 
   640 \medskip
   641 For example, let us declare the quantifier~$\forall$:\index{quantifiers}
   642 \begin{ttbox}
   643 All :: "('a => o) => o"   (binder "ALL " 10)
   644 \end{ttbox}
   645 This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
   646   $x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall
   647 back on ${\tt All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL
   648   $x$.$P$} have type~$o$, the type of formulae, while the bound variable
   649 can be polymorphic.
   650 \endgroup
   651 
   652 \index{mixfix declarations|)}
   653 
   654 \section{Ambiguity of parsed expressions} \label{sec:ambiguity}
   655 \index{ambiguity!of parsed expressions}
   656 
   657 To keep the grammar small and allow common productions to be shared
   658 all logical types (except {\tt prop}) are internally represented
   659 by one nonterminal, namely {\tt logic}. This and omitted or too freely
   660 chosen priorities may lead to ways of parsing an expression that were
   661 not intended by the theory's maker. In most cases Isabelle is able to
   662 select one of multiple parse trees that an expression has lead
   663 to by checking which of them can be typed correctly. But this may not
   664 work in every case and always slows down parsing.
   665 The warning and error messages that can be produced during this process are
   666 as follows:
   667 
   668 If an ambiguity can be resolved by type inference this warning
   669 is shown to remind the user that parsing is (unnecessarily) slowed
   670 down:
   671 
   672 \begin{ttbox}
   673 {\out Warning: Ambiguous input "..."}
   674 {\out produces the following parse trees:}
   675 {\out ...}
   676 {\out Fortunately, only one parse tree is type correct.}
   677 {\out It helps (speed!) if you disambiguate your grammar or your input.}
   678 \end{ttbox}
   679 
   680 The following message is normally caused by using the same
   681 syntax in two different productions:
   682 
   683 \begin{ttbox}
   684 {\out Warning: Ambiguous input "..."}
   685 {\out produces the following parse trees:}
   686 {\out ...}
   687 {\out Error: More than one term is type correct:}
   688 {\out ...}
   689 \end{ttbox}
   690 
   691 Ambiguities occuring in syntax translation rules cannot be resolved by
   692 type inference because it is not necessary for these rules to be type
   693 correct. Therefore Isabelle always generates an error message and the
   694 ambiguity should be eliminated by changing the grammar or the rule.
   695 
   696 
   697 \section{Example: some minimal logics} \label{sec:min_logics}
   698 \index{examples!of logic definitions}
   699 
   700 This section presents some examples that have a simple syntax.  They
   701 demonstrate how to define new object-logics from scratch.
   702 
   703 First we must define how an object-logic syntax is embedded into the
   704 meta-logic.  Since all theorems must conform to the syntax for~\ndx{prop}
   705 (see Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the
   706 object-level syntax.  Assume that the syntax of your object-logic defines a
   707 meta-type~\tydx{o} of formulae which refers to the nonterminal {\tt logic}.
   708 These formulae can now appear in axioms and theorems wherever \ndx{prop} does
   709 if you add the production
   710 \[ prop ~=~ logic. \]
   711 This is not supposed to be a copy production but an implicit coercion from
   712 formulae to propositions:
   713 \begin{ttbox}
   714 Base = Pure +
   715 types
   716   o
   717 arities
   718   o :: logic
   719 consts
   720   Trueprop :: "o => prop"   ("_" 5)
   721 end
   722 \end{ttbox}
   723 The constant \cdx{Trueprop} (the name is arbitrary) acts as an invisible
   724 coercion function.  Assuming this definition resides in a file {\tt Base.thy},
   725 you have to load it with the command {\tt use_thy "Base"}.
   726 
   727 One of the simplest nontrivial logics is {\bf minimal logic} of
   728 implication.  Its definition in Isabelle needs no advanced features but
   729 illustrates the overall mechanism nicely:
   730 \begin{ttbox}
   731 Hilbert = Base +
   732 consts
   733   "-->" :: "[o, o] => o"   (infixr 10)
   734 rules
   735   K     "P --> Q --> P"
   736   S     "(P --> Q --> R) --> (P --> Q) --> P --> R"
   737   MP    "[| P --> Q; P |] ==> Q"
   738 end
   739 \end{ttbox}
   740 After loading this definition from the file {\tt Hilbert.thy}, you can
   741 start to prove theorems in the logic:
   742 \begin{ttbox}
   743 goal Hilbert.thy "P --> P";
   744 {\out Level 0}
   745 {\out P --> P}
   746 {\out  1.  P --> P}
   747 \ttbreak
   748 by (resolve_tac [Hilbert.MP] 1);
   749 {\out Level 1}
   750 {\out P --> P}
   751 {\out  1.  ?P --> P --> P}
   752 {\out  2.  ?P}
   753 \ttbreak
   754 by (resolve_tac [Hilbert.MP] 1);
   755 {\out Level 2}
   756 {\out P --> P}
   757 {\out  1.  ?P1 --> ?P --> P --> P}
   758 {\out  2.  ?P1}
   759 {\out  3.  ?P}
   760 \ttbreak
   761 by (resolve_tac [Hilbert.S] 1);
   762 {\out Level 3}
   763 {\out P --> P}
   764 {\out  1.  P --> ?Q2 --> P}
   765 {\out  2.  P --> ?Q2}
   766 \ttbreak
   767 by (resolve_tac [Hilbert.K] 1);
   768 {\out Level 4}
   769 {\out P --> P}
   770 {\out  1.  P --> ?Q2}
   771 \ttbreak
   772 by (resolve_tac [Hilbert.K] 1);
   773 {\out Level 5}
   774 {\out P --> P}
   775 {\out No subgoals!}
   776 \end{ttbox}
   777 As we can see, this Hilbert-style formulation of minimal logic is easy to
   778 define but difficult to use.  The following natural deduction formulation is
   779 better:
   780 \begin{ttbox}
   781 MinI = Base +
   782 consts
   783   "-->" :: "[o, o] => o"   (infixr 10)
   784 rules
   785   impI  "(P ==> Q) ==> P --> Q"
   786   impE  "[| P --> Q; P |] ==> Q"
   787 end
   788 \end{ttbox}
   789 Note, however, that although the two systems are equivalent, this fact
   790 cannot be proved within Isabelle.  Axioms {\tt S} and {\tt K} can be
   791 derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt
   792   Hilbert}.  The reason is that {\tt impI} is only an {\bf admissible} rule
   793 in {\tt Hilbert}, something that can only be shown by induction over all
   794 possible proofs in {\tt Hilbert}.
   795 
   796 We may easily extend minimal logic with falsity:
   797 \begin{ttbox}
   798 MinIF = MinI +
   799 consts
   800   False :: "o"
   801 rules
   802   FalseE "False ==> P"
   803 end
   804 \end{ttbox}
   805 On the other hand, we may wish to introduce conjunction only:
   806 \begin{ttbox}
   807 MinC = Base +
   808 consts
   809   "&" :: "[o, o] => o"   (infixr 30)
   810 \ttbreak
   811 rules
   812   conjI  "[| P; Q |] ==> P & Q"
   813   conjE1 "P & Q ==> P"
   814   conjE2 "P & Q ==> Q"
   815 end
   816 \end{ttbox}
   817 And if we want to have all three connectives together, we create and load a
   818 theory file consisting of a single line:\footnote{We can combine the
   819   theories without creating a theory file using the ML declaration
   820 \begin{ttbox}
   821 val MinIFC_thy = merge_theories(MinIF,MinC)
   822 \end{ttbox}
   823 \index{*merge_theories|fnote}}
   824 \begin{ttbox}
   825 MinIFC = MinIF + MinC
   826 \end{ttbox}
   827 Now we can prove mixed theorems like
   828 \begin{ttbox}
   829 goal MinIFC.thy "P & False --> Q";
   830 by (resolve_tac [MinI.impI] 1);
   831 by (dresolve_tac [MinC.conjE2] 1);
   832 by (eresolve_tac [MinIF.FalseE] 1);
   833 \end{ttbox}
   834 Try this as an exercise!