doc-src/Ref/defining.tex
author wenzelm
Wed, 09 Jun 2004 18:50:38 +0200
changeset 14893 55e83c32cdec
parent 14483 6eac487f9cfa
child 14948 aa6d54648b32
permissions -rw-r--r--
removed Syntax.test_read;
     1 %% $Id$
     2 \chapter{Defining Logics} \label{Defining-Logics}
     3 This chapter explains how to define new formal systems --- in particular,
     4 their concrete syntax.  While Isabelle can be regarded as a theorem prover
     5 for set theory, higher-order logic or the sequent calculus, its
     6 distinguishing feature is support for the definition of new logics.
     7 
     8 Isabelle logics are hierarchies of theories, which are described and
     9 illustrated in
    10 \iflabelundefined{sec:defining-theories}{{\em Introduction to Isabelle}}%
    11 {\S\ref{sec:defining-theories}}.  That material, together with the theory
    12 files provided in the examples directories, should suffice for all simple
    13 applications.  The easiest way to define a new theory is by modifying a
    14 copy of an existing theory.
    15 
    16 This chapter documents the meta-logic syntax, mixfix declarations and
    17 pretty printing.  The extended examples in \S\ref{sec:min_logics}
    18 demonstrate the logical aspects of the definition of theories.
    19 
    20 
    21 \section{Priority grammars} \label{sec:priority_grammars}
    22 \index{priority grammars|(}
    23 
    24 A context-free grammar contains a set of {\bf nonterminal symbols}, a set of
    25 {\bf terminal symbols} and a set of {\bf productions}\index{productions}.
    26 Productions have the form ${A=\gamma}$, where $A$ is a nonterminal and
    27 $\gamma$ is a string of terminals and nonterminals.  One designated
    28 nonterminal is called the {\bf start symbol}.  The language defined by the
    29 grammar consists of all strings of terminals that can be derived from the
    30 start symbol by applying productions as rewrite rules.
    31 
    32 The syntax of an Isabelle logic is specified by a {\bf priority
    33   grammar}.\index{priorities} Each nonterminal is decorated by an integer
    34 priority, as in~$A^{(p)}$.  A nonterminal $A^{(p)}$ in a derivation may be
    35 rewritten using a production $A^{(q)} = \gamma$ only if~$p \leq q$.  Any
    36 priority grammar can be translated into a normal context free grammar by
    37 introducing new nonterminals and productions.
    38 
    39 Formally, a set of context free productions $G$ induces a derivation
    40 relation $\longrightarrow@G$.  Let $\alpha$ and $\beta$ denote strings of
    41 terminal or nonterminal symbols.  Then
    42 \[ \alpha\, A^{(p)}\, \beta ~\longrightarrow@G~ \alpha\,\gamma\,\beta \]
    43 if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$p \leq q$.
    44 
    45 The following simple grammar for arithmetic expressions demonstrates how
    46 binding power and associativity of operators can be enforced by priorities.
    47 \begin{center}
    48 \begin{tabular}{rclr}
    49   $A^{(9)}$ & = & {\tt0} \\
    50   $A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\
    51   $A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\
    52   $A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\
    53   $A^{(3)}$ & = & {\tt-} $A^{(3)}$
    54 \end{tabular}
    55 \end{center}
    56 The choice of priorities determines that {\tt -} binds tighter than {\tt *},
    57 which binds tighter than {\tt +}.  Furthermore {\tt +} associates to the
    58 left and {\tt *} to the right.
    59 
    60 For clarity, grammars obey these conventions:
    61 \begin{itemize}
    62 \item All priorities must lie between~0 and \ttindex{max_pri}, which is a
    63   some fixed integer.  Sometimes {\tt max_pri} is written as $\infty$.
    64 \item Priority 0 on the right-hand side and priority \ttindex{max_pri} on
    65   the left-hand side may be omitted.
    66 \item The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$; the
    67   priority of the left-hand side actually appears in a column on the far
    68   right.
    69 \item Alternatives are separated by~$|$.
    70 \item Repetition is indicated by dots~(\dots) in an informal but obvious
    71   way.
    72 \end{itemize}
    73 
    74 Using these conventions and assuming $\infty=9$, the grammar
    75 takes the form
    76 \begin{center}
    77 \begin{tabular}{rclc}
    78 $A$ & = & {\tt0} & \hspace*{4em} \\
    79  & $|$ & {\tt(} $A$ {\tt)} \\
    80  & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\
    81  & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\
    82  & $|$ & {\tt-} $A^{(3)}$ & (3)
    83 \end{tabular}
    84 \end{center}
    85 \index{priority grammars|)}
    86 
    87 
    88 \begin{figure}\small
    89 \begin{center}
    90 \begin{tabular}{rclc}
    91 $any$ &=& $prop$ ~~$|$~~ $logic$ \\\\
    92 $prop$ &=& {\tt(} $prop$ {\tt)} \\
    93      &$|$& $prop^{(4)}$ {\tt::} $type$ & (3) \\
    94      &$|$& {\tt PROP} $aprop$ \\
    95      &$|$& $any^{(3)}$ {\tt ==} $any^{(2)}$ & (2) \\
    96      &$|$& $any^{(3)}$ {\tt =?=} $any^{(2)}$ & (2) \\
    97      &$|$& $prop^{(2)}$ {\tt ==>} $prop^{(1)}$ & (1) \\
    98      &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\
    99      &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\
   100      &$|$& {\tt OFCLASS} {\tt(} $type$ {\tt,} $logic$ {\tt)} \\\\
   101 $aprop$ &=& $id$ ~~$|$~~ $longid$ ~~$|$~~ $var$
   102     ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\\\
   103 $logic$ &=& {\tt(} $logic$ {\tt)} \\
   104       &$|$& $logic^{(4)}$ {\tt::} $type$ & (3) \\
   105       &$|$& $id$ ~~$|$~~ $longid$ ~~$|$~~ $var$
   106     ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\
   107       &$|$& {\tt \%} $pttrns$ {\tt.} $any^{(3)}$ & (3) \\
   108       &$|$& {\tt TYPE} {\tt(} $type$ {\tt)} \\\\
   109 $idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\
   110 $idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
   111     &$|$& $id$ {\tt ::} $type$ & (0) \\\\
   112 $pttrns$ &=& $pttrn$ ~~$|$~~ $pttrn^{(1)}$ $pttrns$ \\\\
   113 $pttrn$ &=& $idt$ \\\\
   114 $type$ &=& {\tt(} $type$ {\tt)} \\
   115      &$|$& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$
   116        ~~$|$~~ $tvar$ {\tt::} $sort$ \\
   117      &$|$& $id$ ~~$|$~~ $type^{(\infty)}$ $id$
   118                 ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
   119      &$|$& $longid$ ~~$|$~~ $type^{(\infty)}$ $longid$
   120                 ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $longid$ \\
   121      &$|$& $type^{(1)}$ {\tt =>} $type$ & (0) \\
   122      &$|$& {\tt[}  $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0) \\\\
   123 $sort$ &=& $id$ ~~$|$~~ $longid$ ~~$|$~~ {\tt\ttlbrace\ttrbrace} ~~$|$~~
   124   {\tt\ttlbrace} $id$ ~$|$~ $longid${\tt,}\dots{\tt,} $id$ ~$|$~$longid$ {\tt\ttrbrace}
   125 \end{tabular}
   126 \index{*PROP symbol}
   127 \index{*== symbol}\index{*=?= symbol}\index{*==> symbol}
   128 \index{*:: symbol}\index{*=> symbol}
   129 \index{sort constraints}
   130 %the index command: a percent is permitted, but braces must match!
   131 \index{%@{\tt\%} symbol}
   132 \index{{}@{\tt\ttlbrace} symbol}\index{{}@{\tt\ttrbrace} symbol}
   133 \index{*[ symbol}\index{*] symbol}
   134 \index{*"!"! symbol}
   135 \index{*"["| symbol}
   136 \index{*"|"] symbol}
   137 \end{center}
   138 \caption{Meta-logic syntax}\label{fig:pure_gram}
   139 \end{figure}
   140 
   141 
   142 \section{The Pure syntax} \label{sec:basic_syntax}
   143 \index{syntax!Pure|(}
   144 
   145 At the root of all object-logics lies the theory \thydx{Pure}.  It
   146 contains, among many other things, the Pure syntax.  An informal account of
   147 this basic syntax (types, terms and formulae) appears in
   148 \iflabelundefined{sec:forward}{{\em Introduction to Isabelle}}%
   149 {\S\ref{sec:forward}}.  A more precise description using a priority grammar
   150 appears in Fig.\ts\ref{fig:pure_gram}.  It defines the following
   151 nonterminals:
   152 \begin{ttdescription}
   153   \item[\ndxbold{any}] denotes any term.
   154 
   155   \item[\ndxbold{prop}] denotes terms of type {\tt prop}.  These are formulae
   156     of the meta-logic.  Note that user constants of result type {\tt prop}
   157     (i.e.\ $c :: \ldots \To prop$) should always provide concrete syntax.
   158     Otherwise atomic propositions with head $c$ may be printed incorrectly.
   159 
   160   \item[\ndxbold{aprop}] denotes atomic propositions.
   161 
   162 %% FIXME huh!?
   163 %  These typically
   164 %  include the judgement forms of the object-logic; its definition
   165 %  introduces a meta-level predicate for each judgement form.
   166 
   167   \item[\ndxbold{logic}] denotes terms whose type belongs to class
   168     \cldx{logic}, excluding type \tydx{prop}.
   169 
   170   \item[\ndxbold{idts}] denotes a list of identifiers, possibly constrained
   171     by types.
   172     
   173   \item[\ndxbold{pttrn}, \ndxbold{pttrns}] denote patterns for
   174     abstraction, cases etc.  Initially the same as $idt$ and $idts$,
   175     these are intended to be augmented by user extensions.
   176 
   177   \item[\ndxbold{type}] denotes types of the meta-logic.
   178 
   179   \item[\ndxbold{sort}] denotes meta-level sorts.
   180 \end{ttdescription}
   181 
   182 \begin{warn}
   183   In {\tt idts}, note that \verb|x::nat y| is parsed as \verb|x::(nat y)|,
   184   treating {\tt y} like a type constructor applied to {\tt nat}.  The
   185   likely result is an error message.  To avoid this interpretation, use
   186   parentheses and write \verb|(x::nat) y|.
   187   \index{type constraints}\index{*:: symbol}
   188 
   189   Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and
   190   yields an error.  The correct form is \verb|(x::nat) (y::nat)|.
   191 \end{warn}
   192 
   193 \begin{warn}
   194   Type constraints bind very weakly.  For example, \verb!x<y::nat! is normally
   195   parsed as \verb!(x<y)::nat!, unless \verb$<$ has priority of 3 or less, in
   196   which case the string is likely to be ambiguous.  The correct form is
   197   \verb!x<(y::nat)!.
   198 \end{warn}
   199 
   200 \subsection{Logical types and default syntax}\label{logical-types}
   201 \index{lambda calc@$\lambda$-calculus}
   202 
   203 Isabelle's representation of mathematical languages is based on the
   204 simply typed $\lambda$-calculus.  All logical types, namely those of
   205 class \cldx{logic}, are automatically equipped with a basic syntax of
   206 types, identifiers, variables, parentheses, $\lambda$-abstraction and
   207 application.
   208 \begin{warn}
   209   Isabelle combines the syntaxes for all types of class \cldx{logic} by
   210   mapping all those types to the single nonterminal $logic$.  Thus all
   211   productions of $logic$, in particular $id$, $var$ etc, become available.
   212 \end{warn}
   213 
   214 
   215 \subsection{Lexical matters}
   216 The parser does not process input strings directly.  It operates on token
   217 lists provided by Isabelle's \bfindex{lexer}.  There are two kinds of
   218 tokens: \bfindex{delimiters} and \bfindex{name tokens}.
   219 
   220 \index{reserved words}
   221 Delimiters can be regarded as reserved words of the syntax.  You can
   222 add new ones when extending theories.  In Fig.\ts\ref{fig:pure_gram} they
   223 appear in typewriter font, for example {\tt ==}, {\tt =?=} and
   224 {\tt PROP}\@.
   225 
   226 Name tokens have a predefined syntax.  The lexer distinguishes six disjoint
   227 classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type
   228 identifiers\index{type identifiers}, type unknowns\index{type unknowns},
   229 \rmindex{numerals}, \rmindex{strings}.  They are denoted by \ndxbold{id},
   230 \ndxbold{var}, \ndxbold{tid}, \ndxbold{tvar}, \ndxbold{xnum}, \ndxbold{xstr},
   231 respectively.  Typical examples are {\tt x}, {\tt ?x7}, {\tt 'a}, {\tt ?'a3},
   232 {\tt \#42}, {\tt ''foo bar''}.  Here is the precise syntax:
   233 \begin{eqnarray*}
   234 id        & =   & letter~quasiletter^* \\
   235 longid    & =   & id\mbox{\tt .}id~\dots~id \\
   236 var       & =   & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
   237 tid       & =   & \mbox{\tt '}id \\
   238 tvar      & =   & \mbox{\tt ?}tid ~~|~~
   239                   \mbox{\tt ?}tid\mbox{\tt .}nat \\
   240 xnum      & =   & \mbox{\tt \#}nat ~~|~~ \mbox{\tt \#-}nat \\
   241 xstr      & =   & \mbox{\tt ''\rm text\tt ''} \\[1ex]
   242 letter    & =   & sletter ~~|~~ xletter \\
   243 digit     & =   & \mbox{one of {\tt 0}\dots {\tt 9}} \\
   244 quasiletter & =  & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
   245 nat       & =   & digit^+\\[1ex]
   246 sletter & = & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
   247 xletter & = & {\tt \backslash<} ~ (sletter ~|~ dletter ~|~ gletter ~|~ cletter) ~ {\tt >}\\
   248 dletter & = & \mbox{one of {\tt aa}\dots {\tt zz} {\tt AA}\dots {\tt ZZ}} \\
   249 bletter & = & {\tt bool} ~|~ {\tt complex} ~|~ {\tt nat} ~|~ {\tt rat} ~|~ {\tt real} ~|~ {\tt int}\\
   250 gletter & = & {\tt alpha} ~|~ {\tt beta} ~|~ {\tt gamma} ~|~ {\tt delta} ~|~ {\tt epsilon} ~|~ {\tt zeta} ~|~ {\tt eta} ~|\\ 
   251         &   & {\tt theta} ~|~ {\tt iota} ~|~ {\tt kappa} ~|~ {\tt mu} ~|~ {\tt nu} ~|~ {\tt xi} ~|~ {\tt pi} ~|~ {\tt rho} ~|\\
   252         &   & {\tt sigma} ~|~ {\tt tau} ~|~ {\tt upsilon} ~|~ {\tt phi} ~|~ {\tt psi} ~|~ {\tt omega} ~|~ {\tt Gamma} ~|\\
   253         &   & {\tt Delta} ~|~ {\tt Theta} ~|~ {\tt Lambda} ~|~ {\tt Xi} ~|~ {\tt Pi} ~|~ {\tt Sigma} ~|~ {\tt Upsilon} ~|\\
   254         &   & {\tt Phi} ~|~ {\tt Psi} ~|~ {\tt Omega}\\
   255 cletter & = & {\tt \hat{}\, isup} ~~|~~ {\tt \hat{}\, isub}
   256 \end{eqnarray*}
   257 The lexer repeatedly takes the longest prefix of the input string that
   258 forms a valid token.  A maximal prefix that is both a delimiter and a
   259 name is treated as a delimiter.  Spaces, tabs, newlines and formfeeds
   260 are separators; they never occur within tokens, except those of class
   261 $xstr$.
   262 
   263 \medskip
   264 Delimiters need not be separated by white space.  For example, if {\tt -}
   265 is a delimiter but {\tt --} is not, then the string {\tt --} is treated as
   266 two consecutive occurrences of the token~{\tt -}.  In contrast, \ML\
   267 treats {\tt --} as a single symbolic name.  The consequence of Isabelle's
   268 more liberal scheme is that the same string may be parsed in different ways
   269 after extending the syntax: after adding {\tt --} as a delimiter, the input
   270 {\tt --} is treated as a single token.
   271 
   272 A \ndxbold{var} or \ndxbold{tvar} describes an unknown, which is internally
   273 a pair of base name and index (\ML\ type \mltydx{indexname}).  These
   274 components are either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
   275 run together as in {\tt ?x1}.  The latter form is possible if the base name
   276 does not end with digits.  If the index is 0, it may be dropped altogether:
   277 {\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}.
   278 
   279 Tokens of class $xnum$ or $xstr$ are not used by the meta-logic.
   280 Object-logics may provide numerals and string constants by adding appropriate
   281 productions and translation functions.
   282 
   283 \medskip
   284 Although name tokens are returned from the lexer rather than the parser, it
   285 is more logical to regard them as nonterminals.  Delimiters, however, are
   286 terminals; they are just syntactic sugar and contribute nothing to the
   287 abstract syntax tree.
   288 
   289 
   290 \subsection{*Inspecting the syntax} \label{pg:print_syn}
   291 \begin{ttbox}
   292 syn_of              : theory -> Syntax.syntax
   293 print_syntax        : theory -> unit
   294 Syntax.print_syntax : Syntax.syntax -> unit
   295 Syntax.print_gram   : Syntax.syntax -> unit
   296 Syntax.print_trans  : Syntax.syntax -> unit
   297 \end{ttbox}
   298 The abstract type \mltydx{Syntax.syntax} allows manipulation of syntaxes
   299 in \ML.  You can display values of this type by calling the following
   300 functions:
   301 \begin{ttdescription}
   302 \item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle
   303   theory~{\it thy} as an \ML\ value.
   304 
   305 \item[\ttindexbold{print_syntax} $thy$] uses {\tt Syntax.print_syntax}
   306  to display the syntax part of theory $thy$.
   307 
   308 \item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all
   309   information contained in the syntax {\it syn}.  The displayed output can
   310   be large.  The following two functions are more selective.
   311 
   312 \item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part
   313   of~{\it syn}, namely the lexicon, logical types and productions.  These are
   314   discussed below.
   315 
   316 \item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation
   317   part of~{\it syn}, namely the constants, parse/print macros and
   318   parse/print translations.
   319 \end{ttdescription}
   320 
   321 The output of the above print functions is divided into labelled sections.
   322 The grammar is represented by {\tt lexicon}, {\tt logtypes} and {\tt prods}.
   323 The rest refers to syntactic translations and macro expansion.  Here is an
   324 explanation of the various sections.
   325 \begin{description}
   326   \item[{\tt lexicon}] lists the delimiters used for lexical
   327     analysis.\index{delimiters}
   328 
   329   \item[{\tt logtypes}] lists the types that are regarded the same as {\tt
   330     logic} syntactically.  Thus types of object-logics (e.g.\ {\tt nat}, say)
   331     will be automatically equipped with the standard syntax of
   332     $\lambda$-calculus.
   333 
   334   \item[{\tt prods}] lists the \rmindex{productions} of the priority grammar.
   335     The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}.
   336     Each delimiter is quoted.  Some productions are shown with {\tt =>} and
   337     an attached string.  These strings later become the heads of parse
   338     trees; they also play a vital role when terms are printed (see
   339     \S\ref{sec:asts}).
   340 
   341     Productions with no strings attached are called {\bf copy
   342       productions}\indexbold{productions!copy}.  Their right-hand side must
   343     have exactly one nonterminal symbol (or name token).  The parser does
   344     not create a new parse tree node for copy productions, but simply
   345     returns the parse tree of the right-hand symbol.
   346 
   347     If the right-hand side consists of a single nonterminal with no
   348     delimiters, then the copy production is called a {\bf chain
   349       production}.  Chain productions act as abbreviations:
   350     conceptually, they are removed from the grammar by adding new
   351     productions.  Priority information attached to chain productions is
   352     ignored; only the dummy value $-1$ is displayed.
   353     
   354   \item[\ttindex{print_modes}] lists the alternative print modes
   355     provided by this syntax (see \S\ref{sec:prmodes}).
   356 
   357   \item[{\tt consts}, {\tt parse_rules}, {\tt print_rules}]
   358     relate to macros (see \S\ref{sec:macros}).
   359 
   360   \item[{\tt parse_ast_translation}, {\tt print_ast_translation}]
   361     list sets of constants that invoke translation functions for abstract
   362     syntax trees.  Section \S\ref{sec:asts} below discusses this obscure
   363     matter.\index{constants!for translations}
   364 
   365   \item[{\tt parse_translation}, {\tt print_translation}] list the sets
   366     of constants that invoke translation functions for terms (see
   367     \S\ref{sec:tr_funs}).
   368 \end{description}
   369 \index{syntax!Pure|)}
   370 
   371 
   372 \section{Mixfix declarations} \label{sec:mixfix}
   373 \index{mixfix declarations|(}
   374 
   375 When defining a theory, you declare new constants by giving their names,
   376 their type, and an optional {\bf mixfix annotation}.  Mixfix annotations
   377 allow you to extend Isabelle's basic $\lambda$-calculus syntax with
   378 readable notation.  They can express any context-free priority grammar.
   379 Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
   380 general than the priority declarations of \ML\ and Prolog.
   381 
   382 A mixfix annotation defines a production of the priority grammar.  It
   383 describes the concrete syntax, the translation to abstract syntax, and the
   384 pretty printing.  Special case annotations provide a simple means of
   385 specifying infix operators and binders.
   386 
   387 \subsection{The general mixfix form}
   388 Here is a detailed account of mixfix declarations.  Suppose the following
   389 line occurs within a {\tt consts} or {\tt syntax} section of a {\tt .thy}
   390 file:
   391 \begin{center}
   392   {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}
   393 \end{center}
   394 This constant declaration and mixfix annotation are interpreted as follows:
   395 \begin{itemize}\index{productions}
   396 \item The string {\tt $c$} is the name of the constant associated with the
   397   production; unless it is a valid identifier, it must be enclosed in
   398   quotes.  If $c$ is empty (given as~{\tt ""}) then this is a copy
   399   production.\index{productions!copy} Otherwise, parsing an instance of the
   400   phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
   401     $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
   402   argument.
   403 
   404   \item The constant $c$, if non-empty, is declared to have type $\sigma$
   405     ({\tt consts} section only).
   406 
   407   \item The string $template$ specifies the right-hand side of
   408     the production.  It has the form
   409     \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \]
   410     where each occurrence of {\tt_} denotes an argument position and
   411     the~$w@i$ do not contain~{\tt _}.  (If you want a literal~{\tt _} in
   412     the concrete syntax, you must escape it as described below.)  The $w@i$
   413     may consist of \rmindex{delimiters}, spaces or
   414     \rmindex{pretty printing} annotations (see below).
   415 
   416   \item The type $\sigma$ specifies the production's nonterminal symbols
   417     (or name tokens).  If $template$ is of the form above then $\sigma$
   418     must be a function type with at least~$n$ argument positions, say
   419     $\sigma = [\tau@1, \dots, \tau@n] \To \tau$.  Nonterminal symbols are
   420     derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described
   421     below.  Any of these may be function types.
   422 
   423   \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
   424       [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
   425     priority\indexbold{priorities} required of any phrase that may appear
   426     as the $i$-th argument.  Missing priorities default to~0.
   427     
   428   \item The integer $p$ is the priority of this production.  If
   429     omitted, it defaults to the maximal priority.  Priorities range
   430     between 0 and \ttindexbold{max_pri} (= 1000).
   431 
   432 \end{itemize}
   433 %
   434 The resulting production is \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\,
   435 A@2^{(p@2)}\, \dots\, A@n^{(p@n)}\, w@n \] where $A$ and the $A@i$ are the
   436 nonterminals corresponding to the types $\tau$ and $\tau@i$ respectively.
   437 The nonterminal symbol associated with a type $(\ldots)ty$ is {\tt logic}, if
   438 this is a logical type (namely one of class {\tt logic} excluding {\tt
   439 prop}).  Otherwise it is $ty$ (note that only the outermost type constructor
   440 is taken into account).  Finally, the nonterminal of a type variable is {\tt
   441 any}.
   442 
   443 \begin{warn}
   444   Theories must sometimes declare types for purely syntactic purposes ---
   445   merely playing the role of nonterminals.  One example is \tydx{type}, the
   446   built-in type of types.  This is a `type of all types' in the syntactic
   447   sense only.  Do not declare such types under {\tt arities} as belonging to
   448   class {\tt logic}\index{*logic class}, for that would make them useless as
   449   separate nonterminal symbols.
   450 \end{warn}
   451 
   452 Associating nonterminals with types allows a constant's type to specify
   453 syntax as well.  We can declare the function~$f$ to have type $[\tau@1,
   454 \ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the layout
   455 of the function's $n$ arguments.  The constant's name, in this case~$f$, will
   456 also serve as the label in the abstract syntax tree.
   457 
   458 You may also declare mixfix syntax without adding constants to the theory's
   459 signature, by using a {\tt syntax} section instead of {\tt consts}.  Thus a
   460 production need not map directly to a logical function (this typically
   461 requires additional syntactic translations, see also
   462 Chapter~\ref{chap:syntax}).
   463 
   464 
   465 \medskip
   466 As a special case of the general mixfix declaration, the form
   467 \begin{center}
   468   {\tt $c$ ::\ "$\sigma$" ("$template$")}
   469 \end{center}
   470 specifies no priorities.  The resulting production puts no priority
   471 constraints on any of its arguments and has maximal priority itself.
   472 Omitting priorities in this manner is prone to syntactic ambiguities unless
   473 the production's right-hand side is fully bracketed, as in
   474 \verb|"if _ then _ else _ fi"|.
   475 
   476 Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},
   477 is sensible only if~$c$ is an identifier.  Otherwise you will be unable to
   478 write terms involving~$c$.
   479 
   480 
   481 \subsection{Example: arithmetic expressions}
   482 \index{examples!of mixfix declarations}
   483 This theory specification contains a {\tt syntax} section with mixfix
   484 declarations encoding the priority grammar from
   485 \S\ref{sec:priority_grammars}:
   486 \begin{ttbox}
   487 ExpSyntax = Pure +
   488 types
   489   exp
   490 syntax
   491   "0" :: exp                 ("0"      9)
   492   "+" :: [exp, exp] => exp   ("_ + _"  [0, 1] 0)
   493   "*" :: [exp, exp] => exp   ("_ * _"  [3, 2] 2)
   494   "-" :: exp => exp          ("- _"    [3] 3)
   495 end
   496 \end{ttbox}
   497 Executing {\tt Syntax.print_gram} reveals the productions derived from the
   498 above mixfix declarations (lots of additional information deleted):
   499 \begin{ttbox}
   500 Syntax.print_gram (syn_of ExpSyntax.thy);
   501 {\out exp = "0"  => "0" (9)}
   502 {\out exp = exp[0] "+" exp[1]  => "+" (0)}
   503 {\out exp = exp[3] "*" exp[2]  => "*" (2)}
   504 {\out exp = "-" exp[3]  => "-" (3)}
   505 \end{ttbox}
   506 
   507 Note that because {\tt exp} is not of class {\tt logic}, it has been
   508 retained as a separate nonterminal.  This also entails that the syntax
   509 does not provide for identifiers or paranthesized expressions.
   510 Normally you would also want to add the declaration {\tt arities
   511   exp::logic} after {\tt types} and use {\tt consts} instead of {\tt
   512   syntax}.  Try this as an exercise and study the changes in the
   513 grammar.
   514 
   515 \subsection{The mixfix template}
   516 Let us now take a closer look at the string $template$ appearing in mixfix
   517 annotations.  This string specifies a list of parsing and printing
   518 directives: delimiters\index{delimiters}, arguments, spaces, blocks of
   519 indentation and line breaks.  These are encoded by the following character
   520 sequences:
   521 \index{pretty printing|(}
   522 \begin{description}
   523 \item[~$d$~] is a delimiter, namely a non-empty sequence of characters
   524   other than the special characters {\tt _}, {\tt(}, {\tt)} and~{\tt/}.
   525   Even these characters may appear if escaped; this means preceding it with
   526   a~{\tt '} (single quote).  Thus you have to write {\tt ''} if you really
   527   want a single quote.  Furthermore, a~{\tt '} followed by a space separates
   528   delimiters without extra white space being added for printing.
   529 
   530 \item[~{\tt_}~] is an argument position, which stands for a nonterminal symbol
   531   or name token.
   532 
   533 \item[~$s$~] is a non-empty sequence of spaces for printing.  This and the
   534   following specifications do not affect parsing at all.
   535 
   536 \item[~{\tt(}$n$~] opens a pretty printing block.  The optional number $n$
   537   specifies how much indentation to add when a line break occurs within the
   538   block.  If {\tt(} is not followed by digits, the indentation defaults
   539   to~0.
   540 
   541 \item[~{\tt)}~] closes a pretty printing block.
   542 
   543 \item[~{\tt//}~] forces a line break.
   544 
   545 \item[~{\tt/}$s$~] allows a line break.  Here $s$ stands for the string of
   546   spaces (zero or more) right after the {\tt /} character.  These spaces
   547   are printed if the break is not taken.
   548 \end{description}
   549 For example, the template {\tt"(_ +/ _)"} specifies an infix operator.
   550 There are two argument positions; the delimiter~{\tt+} is preceded by a
   551 space and followed by a space or line break; the entire phrase is a pretty
   552 printing block.  Other examples appear in Fig.\ts\ref{fig:set_trans} below.
   553 Isabelle's pretty printer resembles the one described in
   554 Paulson~\cite{paulson-ml2}.
   555 
   556 \index{pretty printing|)}
   557 
   558 
   559 \subsection{Infixes}
   560 \indexbold{infixes}
   561 
   562 Infix operators associating to the left or right can be declared using
   563 {\tt infixl} or {\tt infixr}.  Basically, the form {\tt $c$ ::\ 
   564   $\sigma$ (infixl $p$)} abbreviates the mixfix declarations
   565 \begin{ttbox}
   566 "op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
   567 "op \(c\)" :: \(\sigma\)   ("op \(c\)")
   568 \end{ttbox}
   569 and {\tt $c$ ::\ $\sigma$ (infixr $p$)} abbreviates the mixfix declarations
   570 \begin{ttbox}
   571 "op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
   572 "op \(c\)" :: \(\sigma\)   ("op \(c\)")
   573 \end{ttbox}
   574 The infix operator is declared as a constant with the prefix {\tt op}.
   575 Thus, prefixing infixes with \sdx{op} makes them behave like ordinary
   576 function symbols, as in \ML.  Special characters occurring in~$c$ must be
   577 escaped, as in delimiters, using a single quote.
   578 
   579 A slightly more general form of infix declarations allows constant
   580 names to be independent from their concrete syntax, namely \texttt{$c$
   581   ::\ $\sigma$\ (infixl "$sy$" $p$)}, the same for \texttt{infixr}.  As
   582 an example consider:
   583 \begin{ttbox}
   584 and :: [bool, bool] => bool  (infixr "&" 35)
   585 \end{ttbox}
   586 The internal constant name will then be just \texttt{and}, without any
   587 \texttt{op} prefixed.
   588 
   589 
   590 \subsection{Binders}
   591 \indexbold{binders}
   592 \begingroup
   593 \def\Q{{\cal Q}}
   594 A {\bf binder} is a variable-binding construct such as a quantifier.  The
   595 constant declaration
   596 \begin{ttbox}
   597 \(c\) :: \(\sigma\)   (binder "\(\Q\)" [\(pb\)] \(p\))
   598 \end{ttbox}
   599 introduces a constant~$c$ of type~$\sigma$, which must have the form
   600 $(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where
   601 $x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
   602 and the whole term has type~$\tau@3$.  The optional integer $pb$
   603 specifies the body's priority, by default~$p$.  Special characters
   604 in $\Q$ must be escaped using a single quote.
   605 
   606 The declaration is expanded internally to something like
   607 \begin{ttbox}
   608 \(c\)\hskip3pt    :: (\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)
   609 "\(\Q\)"  :: [idts, \(\tau@2\)] => \(\tau@3\)   ("(3\(\Q\)_./ _)" [0, \(pb\)] \(p\))
   610 \end{ttbox}
   611 Here \ndx{idts} is the nonterminal symbol for a list of identifiers with
   612 \index{type constraints}
   613 optional type constraints (see Fig.\ts\ref{fig:pure_gram}).  The
   614 declaration also installs a parse translation\index{translations!parse}
   615 for~$\Q$ and a print translation\index{translations!print} for~$c$ to
   616 translate between the internal and external forms.
   617 
   618 A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a
   619 list of variables.  The external form $\Q~x@1~x@2 \ldots x@n. P$
   620 corresponds to the internal form
   621 \[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]
   622 
   623 \medskip
   624 For example, let us declare the quantifier~$\forall$:\index{quantifiers}
   625 \begin{ttbox}
   626 All :: ('a => o) => o   (binder "ALL " 10)
   627 \end{ttbox}
   628 This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
   629   $x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall
   630 back on ${\tt All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL
   631   $x$.$P$} have type~$o$, the type of formulae, while the bound variable
   632 can be polymorphic.
   633 \endgroup
   634 
   635 \index{mixfix declarations|)}
   636 
   637 
   638 \section{*Alternative print modes} \label{sec:prmodes}
   639 \index{print modes|(}
   640 %
   641 Isabelle's pretty printer supports alternative output syntaxes.  These
   642 may be used independently or in cooperation.  The currently active
   643 print modes (with precedence from left to right) are determined by a
   644 reference variable.
   645 \begin{ttbox}\index{*print_mode}
   646 print_mode: string list ref
   647 \end{ttbox}
   648 Initially this may already contain some print mode identifiers,
   649 depending on how Isabelle has been invoked (e.g.\ by some user
   650 interface).  So changes should be incremental --- adding or deleting
   651 modes relative to the current value.
   652 
   653 Any \ML{} string is a legal print mode identifier, without any predeclaration
   654 required.  The following names should be considered reserved, though:
   655 \texttt{""} (the empty string), \texttt{symbols}, \texttt{xsymbols}, and
   656 \texttt{latex}.
   657 
   658 There is a separate table of mixfix productions for pretty printing
   659 associated with each print mode.  The currently active ones are
   660 conceptually just concatenated from left to right, with the standard
   661 syntax output table always coming last as default.  Thus mixfix
   662 productions of preceding modes in the list may override those of later
   663 ones.  Also note that token translations are always relative to some
   664 print mode (see \S\ref{sec:tok_tr}).
   665 
   666 \medskip The canonical application of print modes is optional printing
   667 of mathematical symbols from a special screen font instead of {\sc
   668   ascii}.  Another example is to re-use Isabelle's advanced
   669 $\lambda$-term printing mechanisms to generate completely different
   670 output, say for interfacing external tools like \rmindex{model
   671   checkers} (see also \texttt{HOL/Modelcheck}).
   672 
   673 \index{print modes|)}
   674 
   675 
   676 \section{Ambiguity of parsed expressions} \label{sec:ambiguity}
   677 \index{ambiguity!of parsed expressions}
   678 
   679 To keep the grammar small and allow common productions to be shared
   680 all logical types (except {\tt prop}) are internally represented
   681 by one nonterminal, namely {\tt logic}.  This and omitted or too freely
   682 chosen priorities may lead to ways of parsing an expression that were
   683 not intended by the theory's maker.  In most cases Isabelle is able to
   684 select one of multiple parse trees that an expression has lead
   685 to by checking which of them can be typed correctly.  But this may not
   686 work in every case and always slows down parsing.
   687 The warning and error messages that can be produced during this process are
   688 as follows:
   689 
   690 If an ambiguity can be resolved by type inference the following
   691 warning is shown to remind the user that parsing is (unnecessarily)
   692 slowed down.  In cases where it's not easily possible to eliminate the
   693 ambiguity the frequency of the warning can be controlled by changing
   694 the value of {\tt Syntax.ambiguity_level} which has type {\tt int
   695 ref}.  Its default value is 1 and by increasing it one can control how
   696 many parse trees are necessary to generate the warning.
   697 
   698 \begin{ttbox}
   699 {\out Ambiguous input "\dots"}
   700 {\out produces the following parse trees:}
   701 {\out \dots}
   702 {\out Fortunately, only one parse tree is type correct.}
   703 {\out You may still want to disambiguate your grammar or your input.}
   704 \end{ttbox}
   705 
   706 The following message is normally caused by using the same
   707 syntax in two different productions:
   708 
   709 \begin{ttbox}
   710 {\out Ambiguous input "..."}
   711 {\out produces the following parse trees:}
   712 {\out \dots}
   713 {\out More than one term is type correct:}
   714 {\out \dots}
   715 \end{ttbox}
   716 
   717 Ambiguities occuring in syntax translation rules cannot be resolved by
   718 type inference because it is not necessary for these rules to be type
   719 correct.  Therefore Isabelle always generates an error message and the
   720 ambiguity should be eliminated by changing the grammar or the rule.
   721 
   722 
   723 \section{Example: some minimal logics} \label{sec:min_logics}
   724 \index{examples!of logic definitions}
   725 
   726 This section presents some examples that have a simple syntax.  They
   727 demonstrate how to define new object-logics from scratch.
   728 
   729 First we must define how an object-logic syntax is embedded into the
   730 meta-logic.  Since all theorems must conform to the syntax for~\ndx{prop}
   731 (see Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the
   732 object-level syntax.  Assume that the syntax of your object-logic defines a
   733 meta-type~\tydx{o} of formulae which refers to the nonterminal {\tt logic}.
   734 These formulae can now appear in axioms and theorems wherever \ndx{prop} does
   735 if you add the production
   736 \[ prop ~=~ logic. \]
   737 This is not supposed to be a copy production but an implicit coercion from
   738 formulae to propositions:
   739 \begin{ttbox}
   740 Base = Pure +
   741 types
   742   o
   743 arities
   744   o :: logic
   745 consts
   746   Trueprop :: o => prop   ("_" 5)
   747 end
   748 \end{ttbox}
   749 The constant \cdx{Trueprop} (the name is arbitrary) acts as an invisible
   750 coercion function.  Assuming this definition resides in a file {\tt Base.thy},
   751 you have to load it with the command {\tt use_thy "Base"}.
   752 
   753 One of the simplest nontrivial logics is {\bf minimal logic} of
   754 implication.  Its definition in Isabelle needs no advanced features but
   755 illustrates the overall mechanism nicely:
   756 \begin{ttbox}
   757 Hilbert = Base +
   758 consts
   759   "-->" :: [o, o] => o   (infixr 10)
   760 rules
   761   K     "P --> Q --> P"
   762   S     "(P --> Q --> R) --> (P --> Q) --> P --> R"
   763   MP    "[| P --> Q; P |] ==> Q"
   764 end
   765 \end{ttbox}
   766 After loading this definition from the file {\tt Hilbert.thy}, you can
   767 start to prove theorems in the logic:
   768 \begin{ttbox}
   769 Goal "P --> P";
   770 {\out Level 0}
   771 {\out P --> P}
   772 {\out  1.  P --> P}
   773 \ttbreak
   774 by (resolve_tac [Hilbert.MP] 1);
   775 {\out Level 1}
   776 {\out P --> P}
   777 {\out  1.  ?P --> P --> P}
   778 {\out  2.  ?P}
   779 \ttbreak
   780 by (resolve_tac [Hilbert.MP] 1);
   781 {\out Level 2}
   782 {\out P --> P}
   783 {\out  1.  ?P1 --> ?P --> P --> P}
   784 {\out  2.  ?P1}
   785 {\out  3.  ?P}
   786 \ttbreak
   787 by (resolve_tac [Hilbert.S] 1);
   788 {\out Level 3}
   789 {\out P --> P}
   790 {\out  1.  P --> ?Q2 --> P}
   791 {\out  2.  P --> ?Q2}
   792 \ttbreak
   793 by (resolve_tac [Hilbert.K] 1);
   794 {\out Level 4}
   795 {\out P --> P}
   796 {\out  1.  P --> ?Q2}
   797 \ttbreak
   798 by (resolve_tac [Hilbert.K] 1);
   799 {\out Level 5}
   800 {\out P --> P}
   801 {\out No subgoals!}
   802 \end{ttbox}
   803 As we can see, this Hilbert-style formulation of minimal logic is easy to
   804 define but difficult to use.  The following natural deduction formulation is
   805 better:
   806 \begin{ttbox}
   807 MinI = Base +
   808 consts
   809   "-->" :: [o, o] => o   (infixr 10)
   810 rules
   811   impI  "(P ==> Q) ==> P --> Q"
   812   impE  "[| P --> Q; P |] ==> Q"
   813 end
   814 \end{ttbox}
   815 Note, however, that although the two systems are equivalent, this fact
   816 cannot be proved within Isabelle.  Axioms {\tt S} and {\tt K} can be
   817 derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt
   818   Hilbert}.  The reason is that {\tt impI} is only an {\bf admissible} rule
   819 in {\tt Hilbert}, something that can only be shown by induction over all
   820 possible proofs in {\tt Hilbert}.
   821 
   822 We may easily extend minimal logic with falsity:
   823 \begin{ttbox}
   824 MinIF = MinI +
   825 consts
   826   False :: o
   827 rules
   828   FalseE "False ==> P"
   829 end
   830 \end{ttbox}
   831 On the other hand, we may wish to introduce conjunction only:
   832 \begin{ttbox}
   833 MinC = Base +
   834 consts
   835   "&" :: [o, o] => o   (infixr 30)
   836 \ttbreak
   837 rules
   838   conjI  "[| P; Q |] ==> P & Q"
   839   conjE1 "P & Q ==> P"
   840   conjE2 "P & Q ==> Q"
   841 end
   842 \end{ttbox}
   843 And if we want to have all three connectives together, we create and load a
   844 theory file consisting of a single line:
   845 \begin{ttbox}
   846 MinIFC = MinIF + MinC
   847 \end{ttbox}
   848 Now we can prove mixed theorems like
   849 \begin{ttbox}
   850 Goal "P & False --> Q";
   851 by (resolve_tac [MinI.impI] 1);
   852 by (dresolve_tac [MinC.conjE2] 1);
   853 by (eresolve_tac [MinIF.FalseE] 1);
   854 \end{ttbox}
   855 Try this as an exercise!
   856 
   857 
   858 %%% Local Variables: 
   859 %%% mode: latex
   860 %%% TeX-master: "ref"
   861 %%% End: