doc-src/Ref/defining.tex
author berghofe
Fri, 02 May 1997 16:41:35 +0200
changeset 3098 a31170b67367
parent 1387 9bcad9c22fd4
child 3108 335efc3f5632
permissions -rw-r--r--
Updated to LaTeX 2e
     1 %% $Id$
     2 \chapter{Defining Logics} \label{Defining-Logics}
     3 This chapter explains how to define new formal systems --- in particular,
     4 their concrete syntax.  While Isabelle can be regarded as a theorem prover
     5 for set theory, higher-order logic or the sequent calculus, its
     6 distinguishing feature is support for the definition of new logics.
     7 
     8 Isabelle logics are hierarchies of theories, which are described and
     9 illustrated in
    10 \iflabelundefined{sec:defining-theories}{{\em Introduction to Isabelle}}%
    11 {\S\ref{sec:defining-theories}}.  That material, together with the theory
    12 files provided in the examples directories, should suffice for all simple
    13 applications.  The easiest way to define a new theory is by modifying a
    14 copy of an existing theory.
    15 
    16 This chapter documents the meta-logic syntax, mixfix declarations and
    17 pretty printing.  The extended examples in \S\ref{sec:min_logics}
    18 demonstrate the logical aspects of the definition of theories.
    19 
    20 
    21 \section{Priority grammars} \label{sec:priority_grammars}
    22 \index{priority grammars|(}
    23 
    24 A context-free grammar contains a set of {\bf nonterminal symbols}, a set of
    25 {\bf terminal symbols} and a set of {\bf productions}\index{productions}.
    26 Productions have the form ${A=\gamma}$, where $A$ is a nonterminal and
    27 $\gamma$ is a string of terminals and nonterminals.  One designated
    28 nonterminal is called the {\bf start symbol}.  The language defined by the
    29 grammar consists of all strings of terminals that can be derived from the
    30 start symbol by applying productions as rewrite rules.
    31 
    32 The syntax of an Isabelle logic is specified by a {\bf priority
    33   grammar}.\index{priorities} Each nonterminal is decorated by an integer
    34 priority, as in~$A^{(p)}$.  A nonterminal $A^{(p)}$ in a derivation may be
    35 rewritten using a production $A^{(q)} = \gamma$ only if~$p \le q$.  Any
    36 priority grammar can be translated into a normal context free grammar by
    37 introducing new nonterminals and productions.
    38 
    39 Formally, a set of context free productions $G$ induces a derivation
    40 relation $\longrightarrow@G$.  Let $\alpha$ and $\beta$ denote strings of
    41 terminal or nonterminal symbols.  Then
    42 \[ \alpha\, A^{(p)}\, \beta ~\longrightarrow@G~ \alpha\,\gamma\,\beta \]
    43 if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$p \le q$.
    44 
    45 The following simple grammar for arithmetic expressions demonstrates how
    46 binding power and associativity of operators can be enforced by priorities.
    47 \begin{center}
    48 \begin{tabular}{rclr}
    49   $A^{(9)}$ & = & {\tt0} \\
    50   $A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\
    51   $A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\
    52   $A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\
    53   $A^{(3)}$ & = & {\tt-} $A^{(3)}$
    54 \end{tabular}
    55 \end{center}
    56 The choice of priorities determines that {\tt -} binds tighter than {\tt *},
    57 which binds tighter than {\tt +}.  Furthermore {\tt +} associates to the
    58 left and {\tt *} to the right.
    59 
    60 For clarity, grammars obey these conventions:
    61 \begin{itemize}
    62 \item All priorities must lie between~0 and \ttindex{max_pri}, which is a
    63   some fixed integer.  Sometimes {\tt max_pri} is written as $\infty$.
    64 \item Priority 0 on the right-hand side and priority \ttindex{max_pri} on
    65   the left-hand side may be omitted.
    66 \item The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$; the
    67   priority of the left-hand side actually appears in a column on the far
    68   right.
    69 \item Alternatives are separated by~$|$.
    70 \item Repetition is indicated by dots~(\dots) in an informal but obvious
    71   way.
    72 \end{itemize}
    73 
    74 Using these conventions and assuming $\infty=9$, the grammar
    75 takes the form
    76 \begin{center}
    77 \begin{tabular}{rclc}
    78 $A$ & = & {\tt0} & \hspace*{4em} \\
    79  & $|$ & {\tt(} $A$ {\tt)} \\
    80  & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\
    81  & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\
    82  & $|$ & {\tt-} $A^{(3)}$ & (3)
    83 \end{tabular}
    84 \end{center}
    85 \index{priority grammars|)}
    86 
    87 
    88 \begin{figure}
    89 \begin{center}
    90 \begin{tabular}{rclc}
    91 $any$ &=& $prop$ ~~$|$~~ $logic$ \\\\
    92 $prop$ &=& {\tt(} $prop$ {\tt)} \\
    93      &$|$& $prop^{(4)}$ {\tt::} $type$ & (3) \\
    94      &$|$& {\tt PROP} $aprop$ \\
    95      &$|$& $any^{(3)}$ {\tt ==} $any^{(2)}$ & (2) \\
    96      &$|$& $any^{(3)}$ {\tt =?=} $any^{(2)}$ & (2) \\
    97      &$|$& $prop^{(2)}$ {\tt ==>} $prop^{(1)}$ & (1) \\
    98      &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\
    99      &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\
   100      &$|$& {\tt OFCLASS} {\tt(} $type$ {\tt,} $logic$ {\tt)} \\\\
   101 $aprop$ &=& $id$ ~~$|$~~ $var$
   102     ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\\\
   103 $logic$ &=& {\tt(} $logic$ {\tt)} \\
   104       &$|$& $logic^{(4)}$ {\tt::} $type$ & (3) \\
   105       &$|$& $id$ ~~$|$~~ $var$
   106     ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\
   107       &$|$& {\tt \%} $idts$ {\tt.} $any$ & (0) \\\\
   108 $idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\
   109 $idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
   110     &$|$& $id$ {\tt ::} $type$ & (0) \\\\
   111 $type$ &=& {\tt(} $type$ {\tt)} \\
   112      &$|$& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$
   113        ~~$|$~~ $tvar$ {\tt::} $sort$ \\
   114      &$|$& $id$ ~~$|$~~ $type^{(\infty)}$ $id$
   115                 ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
   116      &$|$& $type^{(1)}$ {\tt =>} $type$ & (0) \\
   117      &$|$& {\tt[}  $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0) \\\\
   118 $sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace}
   119                 ~~$|$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace}
   120 \end{tabular}
   121 \index{*PROP symbol}
   122 \index{*== symbol}\index{*=?= symbol}\index{*==> symbol}
   123 \index{*:: symbol}\index{*=> symbol}
   124 \index{sort constraints}
   125 %the index command: a percent is permitted, but braces must match!
   126 \index{%@{\tt\%} symbol}
   127 \index{{}@{\tt\ttlbrace} symbol}\index{{}@{\tt\ttrbrace} symbol}
   128 \index{*[ symbol}\index{*] symbol}
   129 \index{*"!"! symbol}
   130 \index{*"["| symbol}
   131 \index{*"|"] symbol}
   132 \end{center}
   133 \caption{Meta-logic syntax}\label{fig:pure_gram}
   134 \end{figure}
   135 
   136 
   137 \section{The Pure syntax} \label{sec:basic_syntax}
   138 \index{syntax!Pure|(}
   139 
   140 At the root of all object-logics lies the theory \thydx{Pure}.  It
   141 contains, among many other things, the Pure syntax.  An informal account of
   142 this basic syntax (types, terms and formulae) appears in
   143 \iflabelundefined{sec:forward}{{\em Introduction to Isabelle}}%
   144 {\S\ref{sec:forward}}.  A more precise description using a priority grammar
   145 appears in Fig.\ts\ref{fig:pure_gram}.  It defines the following
   146 nonterminals:
   147 \begin{ttdescription}
   148   \item[\ndxbold{any}] denotes any term.
   149 
   150   \item[\ndxbold{prop}] denotes terms of type {\tt prop}.  These are formulae
   151     of the meta-logic.  Note that user constants of result type {\tt prop}
   152     (i.e.\ $c :: \ldots \To prop$) should always provide concrete syntax.
   153     Otherwise atomic propositions with head $c$ may be printed incorrectly.
   154 
   155   \item[\ndxbold{aprop}] denotes atomic propositions.
   156 
   157 %% FIXME huh!?
   158 %  These typically
   159 %  include the judgement forms of the object-logic; its definition
   160 %  introduces a meta-level predicate for each judgement form.
   161 
   162   \item[\ndxbold{logic}] denotes terms whose type belongs to class
   163     \cldx{logic}, excluding type \tydx{prop}.
   164 
   165   \item[\ndxbold{idts}] denotes a list of identifiers, possibly constrained
   166     by types.
   167 
   168   \item[\ndxbold{type}] denotes types of the meta-logic.
   169 
   170   \item[\ndxbold{sort}] denotes meta-level sorts.
   171 \end{ttdescription}
   172 
   173 \begin{warn}
   174   In {\tt idts}, note that \verb|x::nat y| is parsed as \verb|x::(nat y)|,
   175   treating {\tt y} like a type constructor applied to {\tt nat}.  The
   176   likely result is an error message.  To avoid this interpretation, use
   177   parentheses and write \verb|(x::nat) y|.
   178   \index{type constraints}\index{*:: symbol}
   179 
   180   Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and
   181   yields an error.  The correct form is \verb|(x::nat) (y::nat)|.
   182 \end{warn}
   183 
   184 \begin{warn}
   185   Type constraints bind very weakly. For example, \verb!x<y::nat! is normally
   186   parsed as \verb!(x<y)::nat!, unless \verb$<$ has priority of 3 or less, in
   187   which case the string is likely to be ambiguous. The correct form is
   188   \verb!x<(y::nat)!.
   189 \end{warn}
   190 
   191 \subsection{Logical types and default syntax}\label{logical-types}
   192 \index{lambda calc@$\lambda$-calculus}
   193 
   194 Isabelle's representation of mathematical languages is based on the
   195 simply typed $\lambda$-calculus.  All logical types, namely those of
   196 class \cldx{logic}, are automatically equipped with a basic syntax of
   197 types, identifiers, variables, parentheses, $\lambda$-abstraction and
   198 application.
   199 \begin{warn}
   200   Isabelle combines the syntaxes for all types of class \cldx{logic} by
   201   mapping all those types to the single nonterminal $logic$.  Thus all
   202   productions of $logic$, in particular $id$, $var$ etc, become available.
   203 \end{warn}
   204 
   205 
   206 \subsection{Lexical matters}
   207 The parser does not process input strings directly.  It operates on token
   208 lists provided by Isabelle's \bfindex{lexer}.  There are two kinds of
   209 tokens: \bfindex{delimiters} and \bfindex{name tokens}.
   210 
   211 \index{reserved words}
   212 Delimiters can be regarded as reserved words of the syntax.  You can
   213 add new ones when extending theories.  In Fig.\ts\ref{fig:pure_gram} they
   214 appear in typewriter font, for example {\tt ==}, {\tt =?=} and
   215 {\tt PROP}\@.
   216 
   217 Name tokens have a predefined syntax.  The lexer distinguishes six disjoint
   218 classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type
   219 identifiers\index{type identifiers}, type unknowns\index{type unknowns},
   220 \rmindex{numerals}, \rmindex{strings}. They are denoted by \ndxbold{id},
   221 \ndxbold{var}, \ndxbold{tid}, \ndxbold{tvar}, \ndxbold{xnum}, \ndxbold{xstr},
   222 respectively.  Typical examples are {\tt x}, {\tt ?x7}, {\tt 'a}, {\tt ?'a3},
   223 {\tt \#42}, {\tt ''foo bar''}. Here is the precise syntax:
   224 \begin{eqnarray*}
   225 id        & =   & letter~quasiletter^* \\
   226 var       & =   & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
   227 tid       & =   & \mbox{\tt '}id \\
   228 tvar      & =   & \mbox{\tt ?}tid ~~|~~
   229                   \mbox{\tt ?}tid\mbox{\tt .}nat \\
   230 xnum      & =   & \mbox{\tt \#}nat ~~|~~ \mbox{\tt \#\char`\~}nat \\
   231 xstr      & =   & \mbox{\tt ''\rm text\tt ''} \\[1ex]
   232 letter    & =   & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
   233 digit     & =   & \mbox{one of {\tt 0}\dots {\tt 9}} \\
   234 quasiletter & =  & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
   235 nat       & =   & digit^+
   236 \end{eqnarray*}
   237 The lexer repeatedly takes the maximal prefix of the input string that forms
   238 a valid token.  A maximal prefix that is both a delimiter and a name is
   239 treated as a delimiter.  Spaces, tabs, newlines and formfeeds are separators;
   240 they never occur within tokens, except those of class $xstr$.
   241 
   242 \medskip
   243 Delimiters need not be separated by white space.  For example, if {\tt -}
   244 is a delimiter but {\tt --} is not, then the string {\tt --} is treated as
   245 two consecutive occurrences of the token~{\tt -}.  In contrast, \ML\
   246 treats {\tt --} as a single symbolic name.  The consequence of Isabelle's
   247 more liberal scheme is that the same string may be parsed in different ways
   248 after extending the syntax: after adding {\tt --} as a delimiter, the input
   249 {\tt --} is treated as a single token.
   250 
   251 A \ndxbold{var} or \ndxbold{tvar} describes an unknown, which is internally
   252 a pair of base name and index (\ML\ type \mltydx{indexname}).  These
   253 components are either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
   254 run together as in {\tt ?x1}.  The latter form is possible if the base name
   255 does not end with digits.  If the index is 0, it may be dropped altogether:
   256 {\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}.
   257 
   258 Tokens of class $xnum$ or $xstr$ are not used by the meta-logic.
   259 Object-logics may provide numerals and string constants by adding appropriate
   260 productions and translation functions.
   261 
   262 \medskip
   263 Although name tokens are returned from the lexer rather than the parser, it
   264 is more logical to regard them as nonterminals.  Delimiters, however, are
   265 terminals; they are just syntactic sugar and contribute nothing to the
   266 abstract syntax tree.
   267 
   268 
   269 \subsection{*Inspecting the syntax}
   270 \begin{ttbox}
   271 syn_of              : theory -> Syntax.syntax
   272 print_syntax        : theory -> unit
   273 Syntax.print_syntax : Syntax.syntax -> unit
   274 Syntax.print_gram   : Syntax.syntax -> unit
   275 Syntax.print_trans  : Syntax.syntax -> unit
   276 \end{ttbox}
   277 The abstract type \mltydx{Syntax.syntax} allows manipulation of syntaxes
   278 in \ML.  You can display values of this type by calling the following
   279 functions:
   280 \begin{ttdescription}
   281 \item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle
   282   theory~{\it thy} as an \ML\ value.
   283 
   284 \item[\ttindexbold{print_syntax} $thy$] displays the syntax part of $thy$
   285   using {\tt Syntax.print_syntax}.
   286 
   287 \item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all
   288   information contained in the syntax {\it syn}.  The displayed output can
   289   be large.  The following two functions are more selective.
   290 
   291 \item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part
   292   of~{\it syn}, namely the lexicon, logical types and productions.  These are
   293   discussed below.
   294 
   295 \item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation
   296   part of~{\it syn}, namely the constants, parse/print macros and
   297   parse/print translations.
   298 \end{ttdescription}
   299 
   300 Let us demonstrate these functions by inspecting Pure's syntax.  Even that
   301 is too verbose to display in full.
   302 \begin{ttbox}\index{*Pure theory}
   303 Syntax.print_syntax (syn_of Pure.thy);
   304 {\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
   305 {\out logtypes: fun itself}
   306 {\out prods:}
   307 {\out   type = tid  (1000)}
   308 {\out   type = tvar  (1000)}
   309 {\out   type = id  (1000)}
   310 {\out   type = tid "::" sort[0]  => "_ofsort" (1000)}
   311 {\out   type = tvar "::" sort[0]  => "_ofsort" (1000)}
   312 {\out   \vdots}
   313 \ttbreak
   314 {\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots}
   315 {\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"}
   316 {\out   "_idtyp" "_lambda" "_tapp" "_tappl"}
   317 {\out parse_rules:}
   318 {\out parse_translation: "!!" "_K" "_abs" "_aprop"}
   319 {\out print_translation: "all"}
   320 {\out print_rules:}
   321 {\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
   322 \end{ttbox}
   323 
   324 As you can see, the output is divided into labelled sections.  The grammar
   325 is represented by {\tt lexicon}, {\tt logtypes} and {\tt prods}.  The rest
   326 refers to syntactic translations and macro expansion.  Here is an
   327 explanation of the various sections.
   328 \begin{description}
   329   \item[{\tt lexicon}] lists the delimiters used for lexical
   330     analysis.\index{delimiters}
   331 
   332   \item[{\tt logtypes}] lists the types that are regarded the same as {\tt
   333     logic} syntactically. Thus types of object-logics (e.g.\ {\tt nat}, say)
   334     will be automatically equipped with the standard syntax of
   335     $\lambda$-calculus.
   336 
   337   \item[{\tt prods}] lists the \rmindex{productions} of the priority grammar.
   338     The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}.
   339     Each delimiter is quoted.  Some productions are shown with {\tt =>} and
   340     an attached string.  These strings later become the heads of parse
   341     trees; they also play a vital role when terms are printed (see
   342     \S\ref{sec:asts}).
   343 
   344     Productions with no strings attached are called {\bf copy
   345       productions}\indexbold{productions!copy}.  Their right-hand side must
   346     have exactly one nonterminal symbol (or name token).  The parser does
   347     not create a new parse tree node for copy productions, but simply
   348     returns the parse tree of the right-hand symbol.
   349 
   350     If the right-hand side consists of a single nonterminal with no
   351     delimiters, then the copy production is called a {\bf chain
   352       production}.  Chain productions act as abbreviations:
   353     conceptually, they are removed from the grammar by adding new
   354     productions.  Priority information attached to chain productions is
   355     ignored; only the dummy value $-1$ is displayed.
   356 
   357   \item[{\tt consts}, {\tt parse_rules}, {\tt print_rules}]
   358     relate to macros (see \S\ref{sec:macros}).
   359 
   360   \item[{\tt parse_ast_translation}, {\tt print_ast_translation}]
   361     list sets of constants that invoke translation functions for abstract
   362     syntax trees.  Section \S\ref{sec:asts} below discusses this obscure
   363     matter.\index{constants!for translations}
   364 
   365   \item[{\tt parse_translation}, {\tt print_translation}] list sets
   366     of constants that invoke translation functions for terms (see
   367     \S\ref{sec:tr_funs}).
   368 \end{description}
   369 \index{syntax!Pure|)}
   370 
   371 
   372 \section{Mixfix declarations} \label{sec:mixfix}
   373 \index{mixfix declarations|(}
   374 
   375 When defining a theory, you declare new constants by giving their names,
   376 their type, and an optional {\bf mixfix annotation}.  Mixfix annotations
   377 allow you to extend Isabelle's basic $\lambda$-calculus syntax with
   378 readable notation.  They can express any context-free priority grammar.
   379 Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
   380 general than the priority declarations of \ML\ and Prolog.
   381 
   382 A mixfix annotation defines a production of the priority grammar.  It
   383 describes the concrete syntax, the translation to abstract syntax, and the
   384 pretty printing.  Special case annotations provide a simple means of
   385 specifying infix operators and binders.
   386 
   387 \subsection{The general mixfix form}
   388 Here is a detailed account of mixfix declarations.  Suppose the following
   389 line occurs within a {\tt consts} or {\tt syntax} section of a {\tt .thy}
   390 file:
   391 \begin{center}
   392   {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}
   393 \end{center}
   394 This constant declaration and mixfix annotation are interpreted as follows:
   395 \begin{itemize}\index{productions}
   396 \item The string {\tt $c$} is the name of the constant associated with the
   397   production; unless it is a valid identifier, it must be enclosed in
   398   quotes.  If $c$ is empty (given as~{\tt ""}) then this is a copy
   399   production.\index{productions!copy} Otherwise, parsing an instance of the
   400   phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
   401     $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
   402   argument.
   403 
   404   \item The constant $c$, if non-empty, is declared to have type $\sigma$
   405     ({\tt consts} section only).
   406 
   407   \item The string $template$ specifies the right-hand side of
   408     the production.  It has the form
   409     \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \]
   410     where each occurrence of {\tt_} denotes an argument position and
   411     the~$w@i$ do not contain~{\tt _}.  (If you want a literal~{\tt _} in
   412     the concrete syntax, you must escape it as described below.)  The $w@i$
   413     may consist of \rmindex{delimiters}, spaces or
   414     \rmindex{pretty printing} annotations (see below).
   415 
   416   \item The type $\sigma$ specifies the production's nonterminal symbols
   417     (or name tokens).  If $template$ is of the form above then $\sigma$
   418     must be a function type with at least~$n$ argument positions, say
   419     $\sigma = [\tau@1, \dots, \tau@n] \To \tau$.  Nonterminal symbols are
   420     derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described
   421     below.  Any of these may be function types.
   422 
   423   \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
   424       [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
   425     priority\indexbold{priorities} required of any phrase that may appear
   426     as the $i$-th argument.  Missing priorities default to~0.
   427 
   428   \item The integer $p$ is the priority of this production.  If omitted, it
   429     defaults to the maximal priority.
   430     Priorities range between 0 and \ttindexbold{max_pri} (= 1000).
   431 \end{itemize}
   432 %
   433 The resulting production is \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\,
   434 A@2^{(p@2)}\, \dots\, A@n^{(p@n)}\, w@n \] where $A$ and the $A@i$ are the
   435 nonterminals corresponding to the types $\tau$ and $\tau@i$ respectively.
   436 The nonterminal symbol associated with a type $(\ldots)ty$ is {\tt logic}, if
   437 this is a logical type (namely one of class {\tt logic} excluding {\tt
   438 prop}).  Otherwise it is $ty$ (note that only the outermost type constructor
   439 is taken into account).  Finally, the nonterminal of a type variable is {\tt
   440 any}.
   441 
   442 \begin{warn}
   443   Theories must sometimes declare types for purely syntactic purposes ---
   444   merely playing the role of nonterminals. One example is \tydx{type}, the
   445   built-in type of types.  This is a `type of all types' in the syntactic
   446   sense only.  Do not declare such types under {\tt arities} as belonging to
   447   class {\tt logic}\index{*logic class}, for that would make them useless as
   448   separate nonterminal symbols.
   449 \end{warn}
   450 
   451 Associating nonterminals with types allows a constant's type to specify
   452 syntax as well.  We can declare the function~$f$ to have type $[\tau@1,
   453 \ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the layout
   454 of the function's $n$ arguments.  The constant's name, in this case~$f$, will
   455 also serve as the label in the abstract syntax tree.
   456 
   457 You may also declare mixfix syntax without adding constants to the theory's
   458 signature, by using a {\tt syntax} section instead of {\tt consts}.  Thus a
   459 production need not map directly to a logical function (this typically
   460 requires additional syntactic translations, see also
   461 Chapter~\ref{chap:syntax}).
   462 
   463 
   464 \medskip
   465 As a special case of the general mixfix declaration, the form
   466 \begin{center}
   467   {\tt $c$ ::\ "$\sigma$" ("$template$")}
   468 \end{center}
   469 specifies no priorities.  The resulting production puts no priority
   470 constraints on any of its arguments and has maximal priority itself.
   471 Omitting priorities in this manner is prone to syntactic ambiguities unless
   472 the production's right-hand side is fully bracketed, as in
   473 \verb|"if _ then _ else _ fi"|.
   474 
   475 Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},
   476 is sensible only if~$c$ is an identifier.  Otherwise you will be unable to
   477 write terms involving~$c$.
   478 
   479 
   480 \subsection{Example: arithmetic expressions}
   481 \index{examples!of mixfix declarations}
   482 This theory specification contains a {\tt syntax} section with mixfix
   483 declarations encoding the priority grammar from
   484 \S\ref{sec:priority_grammars}:
   485 \begin{ttbox}
   486 EXP = Pure +
   487 types
   488   exp
   489 syntax
   490   "0" :: exp                 ("0"      9)
   491   "+" :: [exp, exp] => exp   ("_ + _"  [0, 1] 0)
   492   "*" :: [exp, exp] => exp   ("_ * _"  [3, 2] 2)
   493   "-" :: exp => exp          ("- _"    [3] 3)
   494 end
   495 \end{ttbox}
   496 If you put this into a file {\tt EXP.thy} and load it via {\tt use_thy"EXP"},
   497 you can run some tests:
   498 \begin{ttbox}
   499 val read_exp = Syntax.test_read (syn_of EXP.thy) "exp";
   500 {\out val it = fn : string -> unit}
   501 read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0";
   502 {\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}
   503 {\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}
   504 {\out \vdots}
   505 read_exp "0 + - 0 + 0";
   506 {\out tokens: "0" "+" "-" "0" "+" "0"}
   507 {\out raw: ("+" ("+" "0" ("-" "0")) "0")}
   508 {\out \vdots}
   509 \end{ttbox}
   510 The output of \ttindex{Syntax.test_read} includes the token list ({\tt
   511   tokens}) and the raw \AST{} directly derived from the parse tree,
   512 ignoring parse \AST{} translations.  The rest is tracing information
   513 provided by the macro expander (see \S\ref{sec:macros}).
   514 
   515 Executing {\tt Syntax.print_gram} reveals the productions derived from the
   516 above mixfix declarations (lots of additional information deleted):
   517 \begin{ttbox}
   518 Syntax.print_gram (syn_of EXP.thy);
   519 {\out exp = "0"  => "0" (9)}
   520 {\out exp = exp[0] "+" exp[1]  => "+" (0)}
   521 {\out exp = exp[3] "*" exp[2]  => "*" (2)}
   522 {\out exp = "-" exp[3]  => "-" (3)}
   523 \end{ttbox}
   524 
   525 Note that because {\tt exp} is not of class {\tt logic}, it has been retained
   526 as a separate nonterminal. This also entails that the syntax does not provide
   527 for identifiers or paranthesized expressions. Normally you would also want to
   528 add the declaration {\tt arities exp :: logic} and use {\tt consts} instead
   529 of {\tt syntax}. Try this as an exercise and study the changes in the
   530 grammar.
   531 
   532 \subsection{The mixfix template}
   533 Let us now take a closer look at the string $template$ appearing in mixfix
   534 annotations.  This string specifies a list of parsing and printing
   535 directives: delimiters\index{delimiters}, arguments, spaces, blocks of
   536 indentation and line breaks.  These are encoded by the following character
   537 sequences:
   538 \index{pretty printing|(}
   539 \begin{description}
   540 \item[~$d$~] is a delimiter, namely a non-empty sequence of characters
   541   other than the special characters {\tt _}, {\tt(}, {\tt)} and~{\tt/}.
   542   Even these characters may appear if escaped; this means preceding it with
   543   a~{\tt '} (single quote).  Thus you have to write {\tt ''} if you really
   544   want a single quote.  Furthermore, a~{\tt '} followed by a space separates
   545   delimiters without extra white space being added for printing.
   546 
   547 \item[~{\tt_}~] is an argument position, which stands for a nonterminal symbol
   548   or name token.
   549 
   550 \item[~$s$~] is a non-empty sequence of spaces for printing.  This and the
   551   following specifications do not affect parsing at all.
   552 
   553 \item[~{\tt(}$n$~] opens a pretty printing block.  The optional number $n$
   554   specifies how much indentation to add when a line break occurs within the
   555   block.  If {\tt(} is not followed by digits, the indentation defaults
   556   to~0.
   557 
   558 \item[~{\tt)}~] closes a pretty printing block.
   559 
   560 \item[~{\tt//}~] forces a line break.
   561 
   562 \item[~{\tt/}$s$~] allows a line break.  Here $s$ stands for the string of
   563   spaces (zero or more) right after the {\tt /} character.  These spaces
   564   are printed if the break is not taken.
   565 \end{description}
   566 For example, the template {\tt"(_ +/ _)"} specifies an infix operator.
   567 There are two argument positions; the delimiter~{\tt+} is preceded by a
   568 space and followed by a space or line break; the entire phrase is a pretty
   569 printing block.  Other examples appear in Fig.\ts\ref{fig:set_trans} below.
   570 Isabelle's pretty printer resembles the one described in
   571 Paulson~\cite{paulson91}.
   572 
   573 \index{pretty printing|)}
   574 
   575 
   576 \subsection{Infixes}
   577 \indexbold{infixes}
   578 
   579 Infix operators associating to the left or right can be declared
   580 using {\tt infixl} or {\tt infixr}.
   581 Roughly speaking, the form {\tt $c$ ::\ $\sigma$ (infixl $p$)}
   582 abbreviates the mixfix declarations
   583 \begin{ttbox}
   584 "op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
   585 "op \(c\)" :: \(\sigma\)   ("op \(c\)")
   586 \end{ttbox}
   587 and {\tt $c$ ::\ $\sigma$ (infixr $p$)} abbreviates the mixfix declarations
   588 \begin{ttbox}
   589 "op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
   590 "op \(c\)" :: \(\sigma\)   ("op \(c\)")
   591 \end{ttbox}
   592 The infix operator is declared as a constant with the prefix {\tt op}.
   593 Thus, prefixing infixes with \sdx{op} makes them behave like ordinary
   594 function symbols, as in \ML.  Special characters occurring in~$c$ must be
   595 escaped, as in delimiters, using a single quote.
   596 
   597 
   598 \subsection{Binders}
   599 \indexbold{binders}
   600 \begingroup
   601 \def\Q{{\cal Q}}
   602 A {\bf binder} is a variable-binding construct such as a quantifier.  The
   603 constant declaration
   604 \begin{ttbox}
   605 \(c\) :: \(\sigma\)   (binder "\(\Q\)" [\(pb\)] \(p\))
   606 \end{ttbox}
   607 introduces a constant~$c$ of type~$\sigma$, which must have the form
   608 $(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where
   609 $x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
   610 and the whole term has type~$\tau@3$. The optional integer $pb$
   611 specifies the body's priority, by default~$p$.  Special characters
   612 in $\Q$ must be escaped using a single quote.
   613 
   614 The declaration is expanded internally to something like
   615 \begin{ttbox}
   616 \(c\)\hskip3pt    :: (\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)
   617 "\(\Q\)"  :: [idts, \(\tau@2\)] => \(\tau@3\)   ("(3\(\Q\)_./ _)" [0, \(pb\)] \(p\))
   618 \end{ttbox}
   619 Here \ndx{idts} is the nonterminal symbol for a list of identifiers with
   620 \index{type constraints}
   621 optional type constraints (see Fig.\ts\ref{fig:pure_gram}).  The
   622 declaration also installs a parse translation\index{translations!parse}
   623 for~$\Q$ and a print translation\index{translations!print} for~$c$ to
   624 translate between the internal and external forms.
   625 
   626 A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a
   627 list of variables.  The external form $\Q~x@1~x@2 \ldots x@n. P$
   628 corresponds to the internal form
   629 \[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]
   630 
   631 \medskip
   632 For example, let us declare the quantifier~$\forall$:\index{quantifiers}
   633 \begin{ttbox}
   634 All :: ('a => o) => o   (binder "ALL " 10)
   635 \end{ttbox}
   636 This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
   637   $x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall
   638 back on ${\tt All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL
   639   $x$.$P$} have type~$o$, the type of formulae, while the bound variable
   640 can be polymorphic.
   641 \endgroup
   642 
   643 \index{mixfix declarations|)}
   644 
   645 \section{Ambiguity of parsed expressions} \label{sec:ambiguity}
   646 \index{ambiguity!of parsed expressions}
   647 
   648 To keep the grammar small and allow common productions to be shared
   649 all logical types (except {\tt prop}) are internally represented
   650 by one nonterminal, namely {\tt logic}. This and omitted or too freely
   651 chosen priorities may lead to ways of parsing an expression that were
   652 not intended by the theory's maker. In most cases Isabelle is able to
   653 select one of multiple parse trees that an expression has lead
   654 to by checking which of them can be typed correctly. But this may not
   655 work in every case and always slows down parsing.
   656 The warning and error messages that can be produced during this process are
   657 as follows:
   658 
   659 If an ambiguity can be resolved by type inference the following
   660 warning is shown to remind the user that parsing is (unnecessarily)
   661 slowed down. In cases where it's not easily possible to eliminate the
   662 ambiguity the frequency of the warning can be controlled by changing
   663 the value of {\tt Syntax.ambiguity_level} which has type {\tt int
   664 ref}. Its default value is 1 and by increasing it one can control how
   665 many parse trees are necessary to generate the warning.
   666 
   667 \begin{ttbox}
   668 {\out Warning: Ambiguous input "..."}
   669 {\out produces the following parse trees:}
   670 {\out ...}
   671 {\out Fortunately, only one parse tree is type correct.}
   672 {\out It helps (speed!) if you disambiguate your grammar or your input.}
   673 \end{ttbox}
   674 
   675 The following message is normally caused by using the same
   676 syntax in two different productions:
   677 
   678 \begin{ttbox}
   679 {\out Warning: Ambiguous input "..."}
   680 {\out produces the following parse trees:}
   681 {\out ...}
   682 {\out Error: More than one term is type correct:}
   683 {\out ...}
   684 \end{ttbox}
   685 
   686 Ambiguities occuring in syntax translation rules cannot be resolved by
   687 type inference because it is not necessary for these rules to be type
   688 correct. Therefore Isabelle always generates an error message and the
   689 ambiguity should be eliminated by changing the grammar or the rule.
   690 
   691 
   692 \section{Example: some minimal logics} \label{sec:min_logics}
   693 \index{examples!of logic definitions}
   694 
   695 This section presents some examples that have a simple syntax.  They
   696 demonstrate how to define new object-logics from scratch.
   697 
   698 First we must define how an object-logic syntax is embedded into the
   699 meta-logic.  Since all theorems must conform to the syntax for~\ndx{prop}
   700 (see Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the
   701 object-level syntax.  Assume that the syntax of your object-logic defines a
   702 meta-type~\tydx{o} of formulae which refers to the nonterminal {\tt logic}.
   703 These formulae can now appear in axioms and theorems wherever \ndx{prop} does
   704 if you add the production
   705 \[ prop ~=~ logic. \]
   706 This is not supposed to be a copy production but an implicit coercion from
   707 formulae to propositions:
   708 \begin{ttbox}
   709 Base = Pure +
   710 types
   711   o
   712 arities
   713   o :: logic
   714 consts
   715   Trueprop :: o => prop   ("_" 5)
   716 end
   717 \end{ttbox}
   718 The constant \cdx{Trueprop} (the name is arbitrary) acts as an invisible
   719 coercion function.  Assuming this definition resides in a file {\tt Base.thy},
   720 you have to load it with the command {\tt use_thy "Base"}.
   721 
   722 One of the simplest nontrivial logics is {\bf minimal logic} of
   723 implication.  Its definition in Isabelle needs no advanced features but
   724 illustrates the overall mechanism nicely:
   725 \begin{ttbox}
   726 Hilbert = Base +
   727 consts
   728   "-->" :: [o, o] => o   (infixr 10)
   729 rules
   730   K     "P --> Q --> P"
   731   S     "(P --> Q --> R) --> (P --> Q) --> P --> R"
   732   MP    "[| P --> Q; P |] ==> Q"
   733 end
   734 \end{ttbox}
   735 After loading this definition from the file {\tt Hilbert.thy}, you can
   736 start to prove theorems in the logic:
   737 \begin{ttbox}
   738 goal Hilbert.thy "P --> P";
   739 {\out Level 0}
   740 {\out P --> P}
   741 {\out  1.  P --> P}
   742 \ttbreak
   743 by (resolve_tac [Hilbert.MP] 1);
   744 {\out Level 1}
   745 {\out P --> P}
   746 {\out  1.  ?P --> P --> P}
   747 {\out  2.  ?P}
   748 \ttbreak
   749 by (resolve_tac [Hilbert.MP] 1);
   750 {\out Level 2}
   751 {\out P --> P}
   752 {\out  1.  ?P1 --> ?P --> P --> P}
   753 {\out  2.  ?P1}
   754 {\out  3.  ?P}
   755 \ttbreak
   756 by (resolve_tac [Hilbert.S] 1);
   757 {\out Level 3}
   758 {\out P --> P}
   759 {\out  1.  P --> ?Q2 --> P}
   760 {\out  2.  P --> ?Q2}
   761 \ttbreak
   762 by (resolve_tac [Hilbert.K] 1);
   763 {\out Level 4}
   764 {\out P --> P}
   765 {\out  1.  P --> ?Q2}
   766 \ttbreak
   767 by (resolve_tac [Hilbert.K] 1);
   768 {\out Level 5}
   769 {\out P --> P}
   770 {\out No subgoals!}
   771 \end{ttbox}
   772 As we can see, this Hilbert-style formulation of minimal logic is easy to
   773 define but difficult to use.  The following natural deduction formulation is
   774 better:
   775 \begin{ttbox}
   776 MinI = Base +
   777 consts
   778   "-->" :: [o, o] => o   (infixr 10)
   779 rules
   780   impI  "(P ==> Q) ==> P --> Q"
   781   impE  "[| P --> Q; P |] ==> Q"
   782 end
   783 \end{ttbox}
   784 Note, however, that although the two systems are equivalent, this fact
   785 cannot be proved within Isabelle.  Axioms {\tt S} and {\tt K} can be
   786 derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt
   787   Hilbert}.  The reason is that {\tt impI} is only an {\bf admissible} rule
   788 in {\tt Hilbert}, something that can only be shown by induction over all
   789 possible proofs in {\tt Hilbert}.
   790 
   791 We may easily extend minimal logic with falsity:
   792 \begin{ttbox}
   793 MinIF = MinI +
   794 consts
   795   False :: o
   796 rules
   797   FalseE "False ==> P"
   798 end
   799 \end{ttbox}
   800 On the other hand, we may wish to introduce conjunction only:
   801 \begin{ttbox}
   802 MinC = Base +
   803 consts
   804   "&" :: [o, o] => o   (infixr 30)
   805 \ttbreak
   806 rules
   807   conjI  "[| P; Q |] ==> P & Q"
   808   conjE1 "P & Q ==> P"
   809   conjE2 "P & Q ==> Q"
   810 end
   811 \end{ttbox}
   812 And if we want to have all three connectives together, we create and load a
   813 theory file consisting of a single line:\footnote{We can combine the
   814   theories without creating a theory file using the ML declaration
   815 \begin{ttbox}
   816 val MinIFC_thy = merge_theories(MinIF,MinC)
   817 \end{ttbox}
   818 \index{*merge_theories|fnote}}
   819 \begin{ttbox}
   820 MinIFC = MinIF + MinC
   821 \end{ttbox}
   822 Now we can prove mixed theorems like
   823 \begin{ttbox}
   824 goal MinIFC.thy "P & False --> Q";
   825 by (resolve_tac [MinI.impI] 1);
   826 by (dresolve_tac [MinC.conjE2] 1);
   827 by (eresolve_tac [MinIF.FalseE] 1);
   828 \end{ttbox}
   829 Try this as an exercise!