% doc-src/Ref/defining.tex
% author clasohm, Mon, 14 Nov 1994 14:29:20 +0100
% changeset 711 bb868a30e66f (parent 452 395bbf6e55f9, child 864 d63b111b917a)
% updated remarks about grammar; added section about ambiguities
%% $Id$
\chapter{Defining Logics} \label{Defining-Logics}
This chapter explains how to define new formal systems, in particular
their concrete syntax.  While Isabelle can be regarded as a theorem prover
for set theory, higher-order logic or the sequent calculus, its
distinguishing feature is support for the definition of new logics.

Isabelle logics are hierarchies of theories, which are described and
illustrated in 
\iflabelundefined{sec:defining-theories}{{\em Introduction to Isabelle}}%
{\S\ref{sec:defining-theories}}.  That material, together with the theory
files provided in the examples directories, should suffice for all simple
applications.  The easiest way to define a new theory is by modifying a
copy of an existing theory.

This chapter documents the meta-logic syntax, mixfix declarations and
pretty printing.  The extended examples in \S\ref{sec:min_logics}
demonstrate the logical aspects of the definition of theories.


\section{Priority grammars} \label{sec:priority_grammars}
\index{priority grammars|(} 

A context-free grammar contains a set of {\bf nonterminal symbols}, a set of
{\bf terminal symbols} and a set of {\bf productions}\index{productions}.
Productions have the form ${A=\gamma}$, where $A$ is a nonterminal and
$\gamma$ is a string of terminals and nonterminals.  One designated
nonterminal is called the {\bf start symbol}.  The language defined by the
grammar consists of all strings of terminals that can be derived from the
start symbol by applying productions as rewrite rules.

The syntax of an Isabelle logic is specified by a {\bf priority
  grammar}.\index{priorities} Each nonterminal is decorated by an integer
priority, as in~$A^{(p)}$.  A nonterminal $A^{(p)}$ in a derivation may be
rewritten using a production $A^{(q)} = \gamma$ only if~$p \le q$.  Any
priority grammar can be translated into an ordinary context-free grammar by
introducing new nonterminals and productions.
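
One way to carry out this translation is sketched below in Python. The encoding is our own assumption: a nonterminal occurrence $A^{(p)}$ is the pair {\tt ("A", p)}, a terminal is a string, and a production $A^{(q)}=\gamma$ is the triple {\tt ("A", q, gamma)}. For every nonterminal~$A$ and every priority~$p$ at which it can occur, the sketch introduces a plain nonterminal (named {\tt A_0}, {\tt A_1}, \ldots) whose alternatives are exactly the productions of~$A$ with $p \le q$. It illustrates the construction only and is not meant to describe how Isabelle's own parser works. The example grammar is the arithmetic grammar given below.

```python
from typing import Dict, List, Set, Tuple, Union

# Assumed encoding: ("A", p) is the nonterminal A^(p); any str is a terminal.
Symbol = Union[str, Tuple[str, int]]
# A priority production A^(q) = gamma is written ("A", q, gamma).
Production = Tuple[str, int, List[Symbol]]


def plain_name(sym: Symbol) -> str:
    """Rename A^(p) to the ordinary nonterminal A_p; terminals are unchanged."""
    return f"{sym[0]}_{sym[1]}" if isinstance(sym, tuple) else sym


def to_plain_cfg(prods: List[Production]) -> Dict[str, List[List[str]]]:
    """Translate a priority grammar into an ordinary context-free grammar."""
    # Priority levels at which each nonterminal can occur (0 for start symbols).
    levels: Dict[str, Set[int]] = {}
    for lhs, _, rhs in prods:
        levels.setdefault(lhs, set()).add(0)
        for sym in rhs:
            if isinstance(sym, tuple):
                levels.setdefault(sym[0], set()).add(sym[1])
    cfg: Dict[str, List[List[str]]] = {}
    for name, ps in levels.items():
        for p in sorted(ps):
            # A_p may use exactly the productions A^(q) = gamma with p <= q.
            cfg[f"{name}_{p}"] = [
                [plain_name(s) for s in rhs]
                for lhs, q, rhs in prods
                if lhs == name and p <= q
            ]
    return cfg


# The arithmetic grammar of this section.
ARITH: List[Production] = [
    ("A", 9, ["0"]),
    ("A", 9, ["(", ("A", 0), ")"]),
    ("A", 0, [("A", 0), "+", ("A", 1)]),
    ("A", 2, [("A", 3), "*", ("A", 2)]),
    ("A", 3, ["-", ("A", 3)]),
]

for nt, alts in to_plain_cfg(ARITH).items():
    print(nt, "=", " | ".join(" ".join(a) for a in alts))
```

In the result, {\tt A_1} has no alternative for {\tt +}. The right operand of {\tt +} can therefore be a sum only if it is bracketed.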

Formally, a set~$G$ of priority grammar productions induces a derivation
relation $\longrightarrow@G$.  Let $\alpha$ and $\beta$ denote strings of
terminal or nonterminal symbols.  Then
\[ \alpha\, A^{(p)}\, \beta ~\longrightarrow@G~ \alpha\,\gamma\,\beta \] 
if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$p \le q$.
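
The derivation relation can be transcribed almost literally. The Python sketch below computes every string reachable in one step: each nonterminal occurrence $A^{(p)}$ may be replaced by the right-hand side~$\gamma$ of any production $A^{(q)}=\gamma$ with $p \le q$. The tuple encoding of symbols and productions and the function name {\tt derive_once} are our own assumptions. The example grammar is the arithmetic grammar given below.

```python
from typing import List, Tuple, Union

# Assumed encoding: ("A", p) is the nonterminal A^(p); any str is a terminal;
# ("A", q, gamma) is the production A^(q) = gamma.
Symbol = Union[str, Tuple[str, int]]
Production = Tuple[str, int, List[Symbol]]


def derive_once(form: List[Symbol], grammar: List[Production]) -> List[List[Symbol]]:
    """All strings beta such that form --> beta in one derivation step."""
    successors = []
    for i, sym in enumerate(form):
        if not isinstance(sym, tuple):
            continue  # terminals are never rewritten
        name, p = sym
        for lhs, q, gamma in grammar:
            # alpha A^(p) beta  -->  alpha gamma beta   provided p <= q
            if lhs == name and p <= q:
                successors.append(form[:i] + list(gamma) + form[i + 1:])
    return successors


ARITH: List[Production] = [
    ("A", 9, ["0"]),
    ("A", 9, ["(", ("A", 0), ")"]),
    ("A", 0, [("A", 0), "+", ("A", 1)]),
    ("A", 2, [("A", 3), "*", ("A", 2)]),
    ("A", 3, ["-", ("A", 3)]),
]

# A^(1) may not be rewritten with the production for + (priority 0).
for step in derive_once(["0", "+", ("A", 1)], ARITH):
    print(step)
```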

The following simple grammar for arithmetic expressions demonstrates how
binding power and associativity of operators can be enforced by priorities.
\begin{center}
\begin{tabular}{rclr}
  $A^{(9)}$ & = & {\tt0} \\
  $A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\
  $A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\
  $A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\
  $A^{(3)}$ & = & {\tt-} $A^{(3)}$
\end{tabular}
\end{center}
The choice of priorities determines that {\tt -} binds tighter than {\tt *},
which binds tighter than {\tt +}.  Furthermore {\tt +} associates to the
left and {\tt *} to the right.  The right operand of {\tt +} must have
priority at least~1, which rules out an unbracketed sum.  The left
operand of {\tt *} must have priority at least~3, which rules out an
unbracketed product.
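
These claims can be checked with a hand-written recursive-descent parser for this one grammar, sketched here in Python. The method {\tt parse(p)} accepts only phrases derivable from $A^{(p)}$: it applies a production of priority~$q$ only if $p \le q$. The class and function names are hypothetical, tokens are assumed to be separated by spaces, and the sketch is not meant to describe Isabelle's own parser. The helper {\tt raw} prints trees in the style of the raw \AST{}s that {\tt Syntax.test_read} displays in \S\ref{sec:mixfix}.

```python
from typing import List, Optional, Union

# Abstract syntax: "0", ("-", e), ("+", l, r) or ("*", l, r).
Ast = Union[str, tuple]


class ArithParser:
    """Recursive-descent parser for the arithmetic priority grammar."""

    def __init__(self, text: str):
        self.toks: List[str] = text.split()  # assumption: space-separated tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self, p: int = 0) -> Ast:
        """Parse a phrase derivable from A^(p); q is the priority of the last production used."""
        tok = self.take()
        if tok == "0":                             # A^(9) = 0
            left, q = "0", 9
        elif tok == "(":                           # A^(9) = ( A^(0) )
            left, q = self.parse(0), 9
            if self.take() != ")":
                raise SyntaxError("expected )")
        elif tok == "-" and p <= 3:                # A^(3) = - A^(3)
            left, q = ("-", self.parse(3)), 3
        else:
            raise SyntaxError(f"unexpected {tok!r}")
        while True:
            op = self.peek()
            if op == "+" and p <= 0:               # A^(0) = A^(0) + A^(1)
                self.take()
                left, q = ("+", left, self.parse(1)), 0
            elif op == "*" and p <= 2 and q >= 3:  # A^(2) = A^(3) * A^(2)
                self.take()
                left, q = ("*", left, self.parse(2)), 2
            else:
                return left


def read(text: str) -> Ast:
    parser = ArithParser(text)
    tree = parser.parse(0)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return tree


def raw(t: Ast) -> str:
    """Print a tree as a raw-AST-style s-expression."""
    if isinstance(t, str):
        return f'"{t}"'
    return "(" + " ".join([f'"{t[0]}"'] + [raw(a) for a in t[1:]]) + ")"


print(raw(read("0 + 0 + 0")))   # ("+" ("+" "0" "0") "0")
print(raw(read("0 * 0 * 0")))   # ("*" "0" ("*" "0" "0"))
```

The {\tt q >= 3} test on {\tt *} enforces the left operand's priority~3. The recursive call {\tt parse(1)} for the right operand of {\tt +} excludes a further unbracketed sum.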

For clarity, grammars in this manual obey these conventions:
\begin{itemize}
\item All priorities must lie between~0 and \ttindex{max_pri}, which is
  some fixed integer.  Sometimes {\tt max_pri} is written as $\infty$.
\item Priority 0 on the right-hand side and priority \ttindex{max_pri} on
  the left-hand side may be omitted.
\item The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$; in
  tables, the priority of the left-hand side appears in a column on the far
  right.  
\item Alternatives are separated by~$|$.  
\item Repetition is indicated by dots~(\dots) in an informal but obvious
  way.
\end{itemize}

Using these conventions and assuming $\infty=9$, the grammar
takes the form
\begin{center}
\begin{tabular}{rclc}
$A$ & = & {\tt0} & \hspace*{4em} \\
 & $|$ & {\tt(} $A$ {\tt)} \\
 & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\
 & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\
 & $|$ & {\tt-} $A^{(3)}$ & (3)
\end{tabular}
\end{center}
\index{priority grammars|)}


\begin{figure}
\begin{center}
\begin{tabular}{rclc}
$any$ &=& $prop$ ~~$|$~~ $logic$ \\\\
$prop$ &=& {\tt PROP} $aprop$ ~~$|$~~ {\tt(} $prop$ {\tt)} \\
     &$|$& $any^{(3)}$ {\tt ==} $any^{(2)}$ & (2) \\
     &$|$& $any^{(3)}$ {\tt =?=} $any^{(2)}$ & (2) \\
     &$|$& $prop^{(2)}$ {\tt ==>} $prop^{(1)}$ & (1) \\
     &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\
     &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\\\
$aprop$ &=& $id$ ~~$|$~~ $var$
    ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\\\
$logic$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $logic$ {\tt)} \\
    &$|$& $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\
    &$|$& $logic^{(4)}$ {\tt::} $type$ & (4) \\
    &$|$& {\tt \%} $idts$ {\tt.} $any$ & (0) \\\\
$idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\
$idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
    &$|$& $id$ {\tt ::} $type$ & (0) \\\\
$type$ &=& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$
  ~~$|$~~ $tvar$ {\tt::} $sort$ \\
     &$|$& $id$ ~~$|$~~ $type^{(\infty)}$ $id$
                ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
     &$|$& $type^{(1)}$ {\tt =>} $type$ & (0) \\
     &$|$& {\tt[}  $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0)\\
     &$|$& {\tt(} $type$ {\tt)} \\\\
$sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace}
                ~~$|$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace}
\end{tabular}
\index{*PROP symbol}
\index{*== symbol}\index{*=?= symbol}\index{*==> symbol}
\index{*:: symbol}\index{*=> symbol}
\index{sort constraints}
%the index command: a percent is permitted, but braces must match!
\index{%@{\tt\%} symbol}
\index{{}@{\tt\ttlbrace} symbol}\index{{}@{\tt\ttrbrace} symbol}
\index{*[ symbol}\index{*] symbol}
\index{*"!"! symbol}
\index{*"["| symbol}
\index{*"|"] symbol}
\end{center}
\caption{Meta-logic syntax}\label{fig:pure_gram}
\end{figure}


\section{The Pure syntax} \label{sec:basic_syntax}
\index{syntax!Pure|(}

At the root of all object-logics lies the theory \thydx{Pure}.  It
contains, among many other things, the Pure syntax.  An informal account of
this basic syntax (types, terms and formulae) appears in 
\iflabelundefined{sec:forward}{{\em Introduction to Isabelle}}%
{\S\ref{sec:forward}}.  A more precise description using a priority grammar
appears in Fig.\ts\ref{fig:pure_gram}.  It defines the following
nonterminals:
\begin{ttdescription}
  \item[\ndxbold{prop}] denotes terms of type {\tt prop}.  These are formulae
  of the meta-logic.

  \item[\ndxbold{aprop}] denotes atomic propositions.  These typically
  include the judgement forms of the object-logic; the object-logic's
  definition introduces a meta-level predicate for each judgement form.

  \item[\ndxbold{logic}] denotes terms whose type belongs to class
  \cldx{logic}.  As the syntax is extended by new object-logics, more
  productions for {\tt logic} are added automatically (see below).

  \item[\ndxbold{any}] denotes terms that belong to either {\tt prop}
    or {\tt logic}.

  \item[\ndxbold{type}] denotes types of the meta-logic.

  \item[\ndxbold{idts}] denotes a list of identifiers, possibly constrained
    by types.
\end{ttdescription}

\begin{warn}
  In {\tt idts}, note that \verb|x::nat y| is parsed as \verb|x::(nat y)|,
  treating {\tt y} like a type constructor applied to {\tt nat}.  The
  likely result is an error message.  To avoid this interpretation, use
  parentheses and write \verb|(x::nat) y|.
  \index{type constraints}\index{*:: symbol}

  Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and
  yields an error.  The correct form is \verb|(x::nat) (y::nat)|.
\end{warn}

\subsection{Logical types and default syntax}\label{logical-types}
\index{lambda calc@$\lambda$-calculus}

Isabelle's representation of mathematical languages is based on the
simply typed $\lambda$-calculus.  All logical types, namely those of
class \cldx{logic}, are automatically equipped with a basic syntax of
types, identifiers, variables, parentheses, $\lambda$-abstractions and
applications.

More precisely, Isabelle internally replaces every nonterminal whose type
belongs to a subclass of \cldx{logic} by $logic$.  Consequently, the
following productions (which are really productions of the nonterminal
$logic$) can be used for any such nonterminal~$ty$:

\begin{center}
\begin{tabular}{rclc}
$ty$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $ty$ {\tt)} \\
  &$|$& $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)}\\
  &$|$& $ty^{(4)}$ {\tt::} $type$ & (3) \\
\end{tabular}
\end{center}

\begin{warn}
  Type constraints bind very weakly.  For example, \verb!x<y::nat! is normally
  parsed as \verb!(x<y)::nat!, unless \verb$<$ has a priority of 3 or less, in
  which case the string is likely to be ambiguous.  To constrain only
  {\tt y}, write \verb!x<(y::nat)!.
\end{warn}

\subsection{Lexical matters}
The parser does not process input strings directly.  It operates on token
lists provided by Isabelle's \bfindex{lexer}.  There are two kinds of
tokens: \bfindex{delimiters} and \bfindex{name tokens}.

\index{reserved words}
Delimiters can be regarded as reserved words of the syntax.  You can
add new ones when extending theories.  In Fig.\ts\ref{fig:pure_gram} they
appear in typewriter font, for example {\tt ==}, {\tt =?=} and
{\tt PROP}\@.

Name tokens have a predefined syntax.  The lexer distinguishes four
disjoint classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type
identifiers\index{type identifiers} and type unknowns\index{type unknowns}.
They are denoted by \ndxbold{id}, \ndxbold{var}, \ndxbold{tid} and
\ndxbold{tvar}, respectively.  Typical examples are {\tt x}, {\tt ?x7},
{\tt 'a} and {\tt ?'a3}.  Here is the precise syntax:
\begin{eqnarray*}
id        & =   & letter~quasiletter^* \\
var       & =   & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
tid       & =   & \mbox{\tt '}id \\
tvar      & =   & \mbox{\tt ?}tid ~~|~~
                  \mbox{\tt ?}tid\mbox{\tt .}nat \\[1ex]
letter    & =   & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
digit     & =   & \mbox{one of {\tt 0}\dots {\tt 9}} \\
quasiletter & =  & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
nat       & =   & digit^+
\end{eqnarray*}
A \ndxbold{var} or \ndxbold{tvar} describes an unknown, which is internally
a pair of base name and index (\ML\ type \mltydx{indexname}).  These
components are either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
run together as in {\tt ?x1}.  The latter form is possible only if the base
name does not end with digits.  If the index is 0, it may be dropped altogether:
{\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}.
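
The lexical syntax of name tokens and the split of an unknown into base name and index can be sketched in Python with regular expressions. The function names {\tt classify} and {\tt indexname} are hypothetical. Reading the index of a run-together form such as {\tt ?x1} from its trailing digits is our interpretation of the rule stated above; Isabelle's lexer may treat edge cases (for instance leading zeros) differently.

```python
import re
from typing import Tuple

ID = r"[A-Za-z][A-Za-z0-9_']*"      # letter quasiletter*
NAT = r"[0-9]+"                      # digit+

# One full-match pattern per name-token class; the classes are disjoint.
CLASSES = {
    "id": re.compile(ID),
    "var": re.compile(rf"\?{ID}(?:\.{NAT})?"),
    "tid": re.compile(rf"'{ID}"),
    "tvar": re.compile(rf"\?'{ID}(?:\.{NAT})?"),
}


def classify(name: str) -> str:
    """Return "id", "var", "tid" or "tvar" for a name token."""
    for cls, pattern in CLASSES.items():
        if pattern.fullmatch(name):
            return cls
    raise ValueError(f"not a name token: {name!r}")


def indexname(unknown: str) -> Tuple[str, int]:
    """Split a var or tvar into (base name, index)."""
    if classify(unknown) not in ("var", "tvar"):
        raise ValueError(f"not an unknown: {unknown!r}")
    body = unknown[1:]                           # drop the leading "?"
    if "." in body:                              # ?x.1, ?x7.3
        base, index = body.rsplit(".", 1)
        return base, int(index)
    m = re.fullmatch(r"(.*?)([0-9]*)", body)     # ?x1 runs base and index together
    base, digits = m.group(1), m.group(2)
    return base, int(digits) if digits else 0    # ?x abbreviates ?x.0


print(classify("?'a3"), indexname("?'a3"))      # tvar ("'a", 3)
```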

The lexer repeatedly takes the maximal prefix of the input string that
forms a valid token.  A maximal prefix that is both a delimiter and a name
is treated as a delimiter.  Spaces, tabs and newlines are separators; they
never occur within tokens.

Delimiters need not be separated by white space.  For example, if {\tt -}
is a delimiter but {\tt --} is not, then the string {\tt --} is treated as
two consecutive occurrences of the token~{\tt -}.  In contrast, \ML\ 
treats {\tt --} as a single symbolic name.  The consequence of Isabelle's
more liberal scheme is that the same string may be parsed in different ways
after extending the syntax: after adding {\tt --} as a delimiter, the input
{\tt --} is treated as a single token.
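
A minimal sketch of the maximal-prefix rule in Python follows. The delimiters are passed as a plain list, and each token is returned as a pair {\tt ("delim", text)} or {\tt ("name", text)}. The function name, its argument and that token representation are our own choices, not Isabelle's lexer interface. On a tie in length, the delimiter wins, as described above.

```python
import re
from typing import Iterable, List, Tuple

ID = r"[A-Za-z][A-Za-z0-9_']*"
# Union of the four name-token classes: var/tvar first, then id/tid.
NAME = re.compile(rf"\?'?{ID}(?:\.[0-9]+)?|'?{ID}")


def tokenize(text: str, delimiters: Iterable[str]) -> List[Tuple[str, str]]:
    """Split text into ("delim", d) and ("name", n) tokens by maximal munch."""
    delims = sorted(set(delimiters), key=len, reverse=True)
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        if text[i].isspace():            # separators never occur within tokens
            i += 1
            continue
        # Longest delimiter and longest name token starting at position i.
        d = next((cand for cand in delims if text.startswith(cand, i)), "")
        m = NAME.match(text, i)
        n = m.group(0) if m else ""
        if not d and not n:
            raise ValueError(f"lexical error at position {i}")
        if len(d) >= len(n):             # on a tie the delimiter wins
            tokens.append(("delim", d))
            i += len(d)
        else:
            tokens.append(("name", n))
            i += len(n)
    return tokens


print(tokenize("--", ["-"]))             # two "-" tokens
print(tokenize("--", ["-", "--"]))       # one "--" token
```

The same input yields different token lists once {\tt --} becomes a delimiter. Likewise {\tt PROPx} is a single name, because it is longer than the delimiter {\tt PROP}.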

Although name tokens are returned from the lexer rather than the parser, it
is more logical to regard them as nonterminals.  Delimiters, however, are
terminals; they are just syntactic sugar and contribute nothing to the
abstract syntax tree.


\subsection{*Inspecting the syntax}
\begin{ttbox}
syn_of              : theory -> Syntax.syntax
Syntax.print_syntax : Syntax.syntax -> unit
Syntax.print_gram   : Syntax.syntax -> unit
Syntax.print_trans  : Syntax.syntax -> unit
\end{ttbox}
The abstract type \mltydx{Syntax.syntax} allows manipulation of syntaxes
in \ML.  You can display values of this type by calling the following
functions:
\begin{ttdescription}
\item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle
  theory~{\it thy} as an \ML\ value.

\item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all
  information contained in the syntax {\it syn}.  The displayed output can
  be large.  The following two functions are more selective.

\item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part
  of~{\it syn}, namely the lexicon, roots and productions.  These are
  discussed below.

\item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation
  part of~{\it syn}, namely the constants, parse/print macros and
  parse/print translations.
\end{ttdescription}

Let us demonstrate these functions by inspecting Pure's syntax.  Even that
is too verbose to display in full.
\begin{ttbox}\index{*Pure theory}
Syntax.print_syntax (syn_of Pure.thy);
{\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
{\out roots: logic type fun prop}
{\out prods:}
{\out   type = tid  (1000)}
{\out   type = tvar  (1000)}
{\out   type = id  (1000)}
{\out   type = tid "::" sort[0]  => "_ofsort" (1000)}
{\out   type = tvar "::" sort[0]  => "_ofsort" (1000)}
{\out   \vdots}
\ttbreak
{\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots}
{\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"}
{\out   "_idtyp" "_lambda" "_tapp" "_tappl"}
{\out parse_rules:}
{\out parse_translation: "!!" "_K" "_abs" "_aprop"}
{\out print_translation: "all"}
{\out print_rules:}
{\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
\end{ttbox}

As you can see, the output is divided into labelled sections.  The grammar
is represented by {\tt lexicon}, {\tt roots} and {\tt prods}.  The rest
refers to syntactic translations and macro expansion.  Here is an
explanation of the various sections.
\begin{description}
  \item[{\tt lexicon}] lists the delimiters used for lexical
    analysis.\index{delimiters} 

  \item[{\tt roots}] lists the grammar's nonterminal symbols.  You must
    name the desired root when calling lower-level functions or specifying
    macros.  Higher-level functions usually expect a type and derive the
    actual root as described in~\S\ref{sec:grammar}.

  \item[{\tt prods}] lists the \rmindex{productions} of the priority grammar.
    The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}.
    Each delimiter is quoted.  Some productions are shown with {\tt =>} and
    an attached string.  These strings later become the heads of parse
    trees; they also play a vital role when terms are printed (see
    \S\ref{sec:asts}).

    Productions with no strings attached are called {\bf copy
      productions}\indexbold{productions!copy}.  Their right-hand side must
    have exactly one nonterminal symbol (or name token).  The parser does
    not create a new parse tree node for copy productions, but simply
    returns the parse tree of the right-hand symbol.

    If the right-hand side consists of a single nonterminal with no
    delimiters, then the copy production is called a {\bf chain
      production}.  Chain productions act as abbreviations:
    conceptually, they are removed from the grammar by adding new
    productions.  Priority information attached to chain productions is
    ignored; only the dummy value $-1$ is displayed.

  \item[{\tt consts}, {\tt parse_rules}, {\tt print_rules}]
    relate to macros (see \S\ref{sec:macros}).

  \item[{\tt parse_ast_translation}, {\tt print_ast_translation}]
    list sets of constants that invoke translation functions for abstract
    syntax trees.  \S\ref{sec:asts} below discusses this obscure
    matter.\index{constants!for translations}

  \item[{\tt parse_translation}, {\tt print_translation}] list sets
    of constants that invoke translation functions for terms (see
    \S\ref{sec:tr_funs}).
\end{description}
\index{syntax!Pure|)}


\section{Mixfix declarations} \label{sec:mixfix}
\index{mixfix declarations|(} 

When defining a theory, you declare new constants by giving their names,
their type, and an optional {\bf mixfix annotation}.  Mixfix annotations
allow you to extend Isabelle's basic $\lambda$-calculus syntax with
readable notation.  They can express any context-free priority grammar.
Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
general than the priority declarations of \ML\ and Prolog.  

A mixfix annotation defines a production of the priority grammar.  It
describes the concrete syntax, the translation to abstract syntax, and the
pretty printing.  Special case annotations provide a simple means of
specifying infix operators, binders and so forth.

\subsection{Grammar productions}\label{sec:grammar}\index{productions}

Let us examine the treatment of the production
\[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\, A@2^{(p@2)}\, \ldots\,  
                  A@n^{(p@n)}\, w@n. \]
Here $A@i^{(p@i)}$ is a nonterminal with priority~$p@i$ for $i=1$,
\ldots,~$n$, while $w@0$, \ldots,~$w@n$ are strings of terminals.
In the corresponding mixfix annotation, the priorities are given separately
as $[p@1,\ldots,p@n]$ and~$p$.  The nonterminal symbols $A$, $A@1$,
\ldots,~$A@n$ are identified with types~$\tau$, $\tau@1$, \ldots,~$\tau@n$
respectively, and the production's
effect on nonterminals is expressed as the function type
\[ [\tau@1, \ldots, \tau@n]\To \tau. \]
Finally, the template
\[ w@0  \;_\; w@1 \;_\; \ldots \;_\; w@n \]
describes the strings of terminals.

A simple type is typically declared for each nonterminal symbol.  In
first-order logic, type~$i$ stands for terms and~$o$ for formulae.  Only
the outermost type constructor is taken into account.  For example, any
type of the form $\sigma\,list$ stands for a list; productions may refer
to the symbol {\tt list} and will apply to lists of any type.

The symbol associated with a type is called its {\bf root} since it may
serve as the root of a parse tree.  More precisely, the root of $(\tau@1, \dots,
\tau@n)ty$ is $ty$, where $\tau@1$, \ldots, $\tau@n$ are types and $ty$ is
a type constructor.  Type infixes are a special case of this; in
particular, the root of $\tau@1 \To \tau@2$ is {\tt fun}.  Finally, the
root of a type variable is {\tt logic}; general productions might
refer to this nonterminal.
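
The rule for computing roots is simple enough to state as code. In the Python sketch below, {\tt TVar}, {\tt TApp} and {\tt fun} are a hypothetical representation of types invented for this illustration, not Isabelle's \ML\ datatype. The function {\tt root} returns the nonterminal that a type stands for.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TVar:
    """A type variable such as 'a."""
    name: str


@dataclass(frozen=True)
class TApp:
    """A type constructor applied to argument types, e.g. (nat)list."""
    con: str
    args: tuple = ()


Typ = Union[TVar, TApp]


def fun(dom: Typ, ran: Typ) -> TApp:
    """The function type dom => ran: constructor fun applied to two types."""
    return TApp("fun", (dom, ran))


def root(ty: Typ) -> str:
    """The nonterminal associated with a type: only the outermost constructor counts."""
    if isinstance(ty, TVar):
        return "logic"
    return ty.con


nat = TApp("nat")
print(root(TApp("list", (nat,))))        # list
print(root(fun(nat, TApp("o"))))         # fun
print(root(TVar("'a")))                  # logic
```

Every list type has root {\tt list}, whatever its element type. This is why productions mentioning {\tt list} apply to all of them.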

Identifying nonterminals with types allows a constant's type to specify
syntax as well.  We can declare the function~$f$ to have type $[\tau@1,
\ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the
layout of the function's $n$ arguments.  The constant's name, in this
case~$f$, will also serve as the label in the abstract syntax tree.  There
are two exceptions to this treatment of constants:
\begin{enumerate}\index{constants!syntactic}
  \item A production need not map directly to a logical function.  In this
    case, you must declare a constant whose purpose is purely syntactic.
    By convention such constants begin with the symbol~{\tt\at}, 
    ensuring that they can never be written in formulae.

  \item A copy production has no associated constant.\index{productions!copy}
\end{enumerate}
There is something artificial about this representation of productions,
but it is convenient, particularly for simple theory extensions.

\subsection{The general mixfix form}
Here is a detailed account of mixfix declarations.  Suppose the following
line occurs within the {\tt consts} section of a {\tt .thy} file:
\begin{center}
  {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}
\end{center}
This constant declaration and mixfix annotation are interpreted as follows:
\begin{itemize}\index{productions}
\item The string {\tt $c$} is the name of the constant associated with the
  production; unless it is a valid identifier, it must be enclosed in
  quotes.  If $c$ is empty (given as~{\tt ""}), then this is a copy
  production.\index{productions!copy} Otherwise, parsing an instance of the
  phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
    $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
  argument.

  \item The constant $c$, if non-empty, is declared to have type $\sigma$.

  \item The string $template$ specifies the right-hand side of
    the production.  It has the form
    \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \] 
    where each occurrence of {\tt_} denotes an argument position and
    the~$w@i$ do not contain~{\tt _}.  (If you want a literal~{\tt _} in
    the concrete syntax, you must escape it as described below.)  The $w@i$
    may consist of \rmindex{delimiters}, spaces or 
    \rmindex{pretty printing} annotations (see below).

  \item The type $\sigma$ specifies the production's nonterminal symbols
    (or name tokens).  If $template$ is of the form above, then $\sigma$
    must be a function type with at least~$n$ argument positions, say
    $\sigma = [\tau@1, \dots, \tau@n] \To \tau$.  Nonterminal symbols are
    derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described
    above.  Any of these may be function types; the corresponding root is
    then \tydx{fun}.

  \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
      [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
    priority\indexbold{priorities} required of any phrase that may appear
    as the $i$-th argument.  If $m < n$, the priorities of the remaining
    arguments default to~0.

  \item The integer $p$ is the priority of this production.  If omitted, it
    defaults to the maximal priority.
    Priorities range from 0 to \ttindexbold{max_pri}, which is 1000.
\end{itemize}
%
The declaration {\tt $c$ ::\ "$\sigma$" ("$template$")} specifies no
priorities.  The resulting production puts no priority constraints on any
of its arguments and has maximal priority itself.  Omitting priorities in
this manner is likely to introduce syntactic ambiguities unless the
production's right-hand side is fully bracketed, as in
\verb|"if _ then _ else _ fi"|.  For example, with the template
{\tt "_ + _"} and no priorities, {\tt 0 + 0 + 0} can be read both as
{\tt (0 + 0) + 0} and as {\tt 0 + (0 + 0)}.

Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},
is sensible only if~$c$ is an identifier.  Otherwise you will be unable to
write terms involving~$c$.
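
The ambiguity can be quantified with a small Python sketch that counts parse trees. It covers a single operator with template {\tt "_ + _"} plus the constant {\tt 0}; this toy grammar is hard-wired, and the function name {\tt count_parses} is hypothetical. The count is a chart-style recursion over substrings. Without priorities, a sum of $n$ operands has one parse tree per binary bracketing (a Catalan number). With the priorities of {\tt +} from \S\ref{sec:priority_grammars} (arguments {\tt [0, 1]}, result~0), only the left-nested reading survives.

```python
from functools import lru_cache
from typing import Tuple

MAX_PRI = 1000


def count_parses(text: str, lhs_pri: int = MAX_PRI,
                 arg_pris: Tuple[int, int] = (0, 0)) -> int:
    """Count parse trees of text from A^(0) for the toy grammar

        A^(MAX_PRI) = 0        A^(lhs_pri) = A^(l) + A^(r),  (l, r) = arg_pris

    The defaults model the template "_ + _" declared without priorities.
    """
    toks = text.split()
    left_pri, right_pri = arg_pris

    @lru_cache(maxsize=None)
    def count(i: int, j: int, p: int) -> int:
        """Number of derivations of toks[i:j] from A^(p)."""
        total = 0
        if j - i == 1 and toks[i] == "0" and p <= MAX_PRI:
            total += 1
        if p <= lhs_pri:
            for k in range(i + 1, j - 1):   # k is the position of the "+"
                if toks[k] == "+":
                    total += count(i, k, left_pri) * count(k + 1, j, right_pri)
        return total

    return count(0, len(toks), 0)


print(count_parses("0 + 0 + 0"))                # 2: ambiguous
print(count_parses("0 + 0 + 0", 0, (0, 1)))     # 1: left-nested only
```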

\begin{warn}
  Theories must sometimes declare types for purely syntactic purposes.  One
  example is \tydx{type}, the built-in type of types.  This is a `type of
  all types' in the syntactic sense only.  Do not declare such types under
  {\tt arities} as belonging to class {\tt logic}\index{*logic class}, for
  that would allow their use in arbitrary Isabelle
  expressions~(\S\ref{logical-types}).
\end{warn}

\subsection{Example: arithmetic expressions}
\index{examples!of mixfix declarations}
This theory specification contains a {\tt consts} section with mixfix
declarations encoding the priority grammar from
\S\ref{sec:priority_grammars}:
\begin{ttbox}
EXP = Pure +
types
  exp
arities
  exp :: logic
consts
  "0" :: "exp"                ("0"      9)
  "+" :: "[exp, exp] => exp"  ("_ + _"  [0, 1] 0)
  "*" :: "[exp, exp] => exp"  ("_ * _"  [3, 2] 2)
  "-" :: "exp => exp"         ("- _"    [3] 3)
end
\end{ttbox}
The {\tt arities} declaration causes {\tt exp} to be added as a new root.
Parentheses need not be declared.  Because {\tt exp} belongs to class
{\tt logic}, it inherits the default syntax of \S\ref{logical-types},
which includes bracketing.
If you put this into a file {\tt EXP.thy} and load it via {\tt
  use_thy "EXP"}, you can run some tests:
lcp@320
   504
\begin{ttbox}
lcp@320
   505
val read_exp = Syntax.test_read (syn_of EXP.thy) "exp";
lcp@320
   506
{\out val it = fn : string -> unit}
lcp@320
   507
read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0";
lcp@320
   508
{\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}
lcp@320
   509
{\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}
lcp@320
   510
{\out \vdots}
lcp@320
   511
read_exp "0 + - 0 + 0";
lcp@320
   512
{\out tokens: "0" "+" "-" "0" "+" "0"}
lcp@320
   513
{\out raw: ("+" ("+" "0" ("-" "0")) "0")}
lcp@320
   514
{\out \vdots}
lcp@320
   515
\end{ttbox}
The output of \ttindex{Syntax.test_read} includes the token list ({\tt
  tokens}) and the raw \AST{} directly derived from the parse tree,
ignoring parse \AST{} translations.  The rest is tracing information
provided by the macro expander (see \S\ref{sec:macros}).

Executing {\tt Syntax.print_gram} reveals the productions derived
from our mixfix declarations (much additional output has been omitted):
\begin{ttbox}
Syntax.print_gram (syn_of EXP.thy);
{\out exp = "0"  => "0" (9)}
{\out exp = exp[0] "+" exp[1]  => "+" (0)}
{\out exp = exp[3] "*" exp[2]  => "*" (2)}
{\out exp = "-" exp[3]  => "-" (3)}
\end{ttbox}
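
How these priorities determine the raw \AST{}s shown earlier can be
reproduced outside Isabelle.  The following Standard~\ML{} code is a
minimal sketch, not Isabelle's parser: it assumes white-space separated
tokens and a greedy, priority-driven recursive descent, which happens to
suffice for this unambiguous grammar.  The names {\tt parse}, {\tt read}
and the three operator tables are hypothetical; the tables simply restate
the four productions listed above.
\begin{ttbox}
(* Hypothetical standalone model of the EXP grammar; not Isabelle code. *)
datatype ast = Atom of string | Appl of ast list

exception SyntaxError of string

(* productions, read off the mixfix declarations of EXP *)
val constants = [("0", 9)]               (* (delimiter, priority) *)
val unaryOps  = [("-", 3, 3)]            (* (delimiter, arg, result) *)
val binaryOps = [("+", 0, 1, 0),         (* (delimiter, left, right, result) *)
                 ("*", 3, 2, 2)]

fun findConst t  = List.find (fn (d, _) => d = t) constants
fun findUnary t  = List.find (fn (d, _, _) => d = t) unaryOps
fun findBinary t = List.find (fn (d, _, _, _) => d = t) binaryOps

(* parse q toks: read a phrase whose priority is at least q, returning
   (tree, priority of its topmost production, remaining tokens) *)
fun parse q toks = extend q (primary q toks)

and primary q (t :: rest) =
      (case findConst t of
         SOME (_, p) => (Atom t, p, rest)
       | NONE =>
           (case findUnary t of
              SOME (_, a, p) =>
                if p >= q then
                  let val (arg, _, rest') = parse a rest
                  in (Appl [Atom t, arg], p, rest') end
                else raise SyntaxError ("priority too low: " ^ t)
            | NONE => raise SyntaxError ("unexpected token: " ^ t)))
  | primary _ [] = raise SyntaxError "unexpected end of input"

(* extend q (lhs, lp, toks): while the next token is a binary operator
   whose left argument accepts priority lp and whose result has
   priority at least q, absorb it together with its right argument *)
and extend q (lhs, lp, toks as (t :: rest)) =
      (case findBinary t of
         SOME (_, l, r, p) =>
           if lp >= l andalso p >= q then
             let val (rhs, _, rest') = parse r rest
             in extend q (Appl [Atom t, lhs, rhs], p, rest') end
           else (lhs, lp, toks)
       | NONE => (lhs, lp, toks))
  | extend _ (lhs, lp, []) = (lhs, lp, [])

val dq = str (Char.chr 34)               (* a double-quote character *)

(* print a raw AST in the style of Syntax.test_read *)
fun show (Atom s) = dq ^ s ^ dq
  | show (Appl ts) = "(" ^ String.concatWith " " (map show ts) ^ ")"

fun read s =
  case parse 0 (String.tokens Char.isSpace s) of
    (t, _, []) => show t
  | (_, _, t :: _) => raise SyntaxError ("unexpected token: " ^ t)
\end{ttbox}
Calling {\tt read "0 + - 0 + 0"} returns {\tt ("+" ("+" "0" ("-" "0")) "0")},
the same raw \AST{} that {\tt Syntax.test_read} reported, while
{\tt read "- 0 * 0"} returns {\tt ("*" ("-" "0") "0")} because the
left argument of {\tt *} accepts priority~3.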


\subsection{The mixfix template}
Let us take a closer look at the string $template$ appearing in mixfix
annotations.  This string specifies a list of parsing and printing
directives: delimiters\index{delimiters}, arguments, spaces, blocks of
indentation and line breaks.  These are encoded by the following character
sequences:
\index{pretty printing|(}
\begin{description}
\item[~$d$~] is a delimiter, namely a non-empty sequence of characters
  other than the special characters {\tt _}, {\tt(}, {\tt)} and~{\tt/}.
  Even these characters may appear if escaped, that is, preceded by
  a~{\tt '} (single quote).  Thus you have to write {\tt ''} if you really
  want a single quote.  Delimiters may never contain spaces.

\item[~{\tt_}~] is an argument position, which stands for a nonterminal symbol
  or name token.

\item[~$s$~] is a non-empty sequence of spaces for printing.  This and the
  following specifications do not affect parsing at all.

\item[~{\tt(}$n$~] opens a pretty printing block.  The optional number $n$
  specifies how much indentation to add when a line break occurs within the
  block.  If {\tt(} is not followed by digits, the indentation defaults
  to~0.

\item[~{\tt)}~] closes a pretty printing block.

\item[~{\tt//}~] forces a line break.

\item[~{\tt/}$s$~] allows a line break.  Here $s$ stands for the string of
  spaces (zero or more) right after the {\tt /} character.  These spaces
  are printed if the break is not taken.
\end{description}
For example, the template {\tt"(_ +/ _)"} specifies an infix operator.
There are two argument positions; the delimiter~{\tt+} is preceded by a
space and followed by a space or line break; the entire phrase is a pretty
printing block.  Other examples appear in Fig.\ts\ref{fig:set_trans} below.
Isabelle's pretty printer resembles the one described in
Paulson~\cite{paulson91}.
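
The scanning rules in the table above can be made concrete with a small
Standard~\ML{} sketch.  It is not Isabelle's implementation: the datatype
{\tt directive} and the function {\tt directives} are hypothetical names,
and the sketch assumes that the number after {\tt(} is a run of decimal
digits and that runs of spaces are simply counted.
\begin{ttbox}
(* Hypothetical scanner for mixfix templates; not Isabelle code. *)
datatype directive =
    Delim of string       (* d  : delimiter *)
  | Arg                   (* _  : argument position *)
  | Space of int          (* s  : spaces, printing only *)
  | Open of int           (* open block with the given indentation *)
  | Close                 (* close block *)
  | ForcedBreak           (* // : forced line break *)
  | Break of int          (* /s : optional break, s spaces if not taken *)

fun isSpecial c =
  c = #"_" orelse c = #"(" orelse c = #")" orelse c = #"/"

(* count leading spaces, starting from n *)
fun spaces (#" " :: cs) n = spaces cs (n + 1)
  | spaces cs n = (n, cs)

(* read an optional decimal number; no digits means 0 *)
fun number (c :: cs) n =
      if Char.isDigit c then number cs (10 * n + ord c - ord #"0")
      else (n, c :: cs)
  | number [] n = (n, [])

(* read a delimiter; a single quote escapes the next character *)
fun delim (#"'" :: c :: cs) acc = delim cs (c :: acc)
  | delim (c :: cs) acc =
      if isSpecial c orelse c = #" " then (implode (rev acc), c :: cs)
      else delim cs (c :: acc)
  | delim [] acc = (implode (rev acc), [])

fun scan [] = []
  | scan (#"_" :: cs) = Arg :: scan cs
  | scan (#"(" :: cs) = let val (n, r) = number cs 0 in Open n :: scan r end
  | scan (#")" :: cs) = Close :: scan cs
  | scan (#"/" :: #"/" :: cs) = ForcedBreak :: scan cs
  | scan (#"/" :: cs) = let val (n, r) = spaces cs 0 in Break n :: scan r end
  | scan (#" " :: cs) = let val (n, r) = spaces cs 1 in Space n :: scan r end
  | scan cs = let val (d, r) = delim cs [] in Delim d :: scan r end

fun directives template = scan (explode template)
\end{ttbox}
Applied to the infix template above, {\tt directives "(_ +/ _)"} yields
{\tt [Open 0, Arg, Space 1, Delim "+", Break 1, Arg, Close]}.  Only the
{\tt Arg} and {\tt Delim} entries would matter to a parser; the others
are printing directives.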

\index{pretty printing|)}


\subsection{Infixes}
\indexbold{infixes}

Infix operators associating to the left or right can be declared
using {\tt infixl} or {\tt infixr}.
Roughly speaking, the form {\tt $c$ ::\ "$\sigma$" (infixl $p$)}
abbreviates the constant declarations
\begin{ttbox}
"op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
"op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
\end{ttbox}
and {\tt $c$ ::\ "$\sigma$" (infixr $p$)} abbreviates the constant declarations
\begin{ttbox}
"op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
"op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
\end{ttbox}
The infix operator is declared as a constant with the prefix {\tt op}.
Thus, prefixing infixes with \sdx{op} makes them behave like ordinary
function symbols, as in \ML.  Special characters occurring in~$c$ must be
escaped, as in delimiters, using a single quote.

The expanded forms above would actually be illegal in a {\tt .thy} file
because they declare the constant \hbox{\tt"op \(c\)"} twice.
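
The priority arithmetic of this abbreviation fits in a few lines of
Standard~\ML{}.  The function {\tt expandInfix} below is hypothetical and
not part of Isabelle; it merely builds, as strings, the two declarations
shown above, giving the left argument priority~$p$ and the right
priority~$p+1$ for {\tt infixl}, and the reverse for {\tt infixr}.
\begin{ttbox}
(* Hypothetical helper, not part of Isabelle: produce the two constant
   declarations that an infixl/infixr annotation roughly abbreviates. *)
datatype assoc = Left | Right

val dq = str (Char.chr 34)               (* a double-quote character *)
fun quote s = dq ^ s ^ dq

fun expandInfix assoc c sigma p =
  let
    (* only the argument on the associating side may itself have
       priority p, i.e. be another application of the same operator *)
    val (l, r) = (case assoc of Left => (p, p + 1) | Right => (p + 1, p))
    val head = quote ("op " ^ c) ^ " :: " ^ quote sigma ^ "   "
    val prios = "[" ^ Int.toString l ^ ", " ^ Int.toString r ^ "] "
                ^ Int.toString p
  in
    [head ^ "(" ^ quote ("op " ^ c) ^ ")",
     head ^ "(" ^ quote ("(_ " ^ c ^ "/ _)") ^ " " ^ prios ^ ")"]
  end
\end{ttbox}
For instance, {\tt expandInfix Right "-->" "[o, o] => o" 10} yields
declarations with argument priorities {\tt [11, 10]}, so that
{\tt P --> Q --> R} can only be read as {\tt P --> (Q --> R)}.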


\subsection{Binders}
\indexbold{binders}
\begingroup
\def\Q{{\cal Q}}
A {\bf binder} is a variable-binding construct such as a quantifier.  The
constant declaration
\begin{ttbox}
\(c\) :: "\(\sigma\)"   (binder "\(\Q\)" \(p\))
\end{ttbox}
introduces a constant~$c$ of type~$\sigma$, which must have the form
$(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where
$x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
and the whole term has type~$\tau@3$.  Special characters in $\Q$ must be
escaped using a single quote.

The declaration is expanded internally to
\begin{ttbox}
\(c\)    :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"
"\(\Q\)"\hskip-3pt  :: "[idts, \(\tau@2\)] => \(\tau@3\)"   ("(3\(\Q\)_./ _)" \(p\))
\end{ttbox}
Here \ndx{idts} is the nonterminal symbol for a list of identifiers with
\index{type constraints}
optional type constraints (see Fig.\ts\ref{fig:pure_gram}).  The
declaration also installs a parse translation\index{translations!parse}
for~$\Q$ and a print translation\index{translations!print} for~$c$ to
translate between the internal and external forms.

A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a
list of variables.  The external form $\Q~x@1~x@2 \ldots x@n. P$
corresponds to the internal form
\[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]
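
The correspondence between the nested internal form and the list of
variables in the external form is a simple fold.  The Standard~\ML{}
sketch below models it over a hypothetical term datatype, not Isabelle's
own term type; {\tt bindAll} plays the role of the parse translation and
{\tt stripBinder} that of the print translation, both simplified to
ignore type constraints.
\begin{ttbox}
(* Hypothetical term datatype for illustration; not Isabelle's. *)
datatype term =
    Const of string
  | Free of string
  | Abs of string * term                 (* lambda abstraction *)
  | App of term * term

(* parse direction: the variables x1 ... xn and body P become
   c(lambda x1. c(lambda x2. ... c(lambda xn. P) ...)) *)
fun bindAll c xs body =
  foldr (fn (x, t) => App (Const c, Abs (x, t))) body xs

(* print direction: collect the variables of directly nested
   applications of c to abstractions; an empty list means the
   printer has to fall back on the form c(P) *)
fun stripBinder c (t as App (Const c', Abs (x, u))) =
      if c = c' then
        let val (xs, body) = stripBinder c u in (x :: xs, body) end
      else ([], t)
  | stripBinder _ t = ([], t)
\end{ttbox}
For example, {\tt bindAll "All" ["x", "y"] (Free "P")} builds the
internal form of a two-variable quantification, and applying
{\tt stripBinder "All"} to the result recovers {\tt (["x", "y"], Free "P")}.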

\medskip
For example, let us declare the quantifier~$\forall$:\index{quantifiers}
\begin{ttbox}
All :: "('a => o) => o"   (binder "ALL " 10)
\end{ttbox}
This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
  $x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall
back on ${\tt All}(P)$ if $P$ is not an abstraction.  In {\tt ALL
  $x$.$P$}, both the body~$P$ and the whole formula have type~$o$, the type
of formulae, while the bound variable can be polymorphic.
\endgroup

\index{mixfix declarations|)}

\section{Ambiguity of parsed expressions} \label{sec:ambiguity}
\index{ambiguity!of parsed expressions}

To keep the grammar small and allow common productions to be shared,
all logical types are internally represented
by one nonterminal, namely {\tt logic}.  This, together with omitted or
too liberally chosen priorities, may allow an expression to be parsed in
ways the theory's author did not intend.  When an expression has several
parse trees, Isabelle can usually select one of them by checking which
is type-correct.  This does not work in every case, however, and it
always slows down parsing.
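
To see how omitted priorities multiply the parse trees, consider a
production such as {\tt exp = exp "+" exp} with no priorities at all.
The Standard~\ML{} sketch below, a standalone model rather than
Isabelle's parser, enumerates every parse tree of an operator chain
under such priority-free binary productions; the function {\tt trees}
is a hypothetical name.
\begin{ttbox}
(* Standalone model, not Isabelle's parser: all parse trees of
   a1 o1 a2 o2 ... an  under priority-free productions exp = exp o exp *)
datatype ast = Atom of string | Appl of ast list

fun trees [a] [] = [Atom a]
  | trees args ops =
      List.concat (List.tabulate (length ops, fn i =>
        let
          (* operator i becomes the root of the tree *)
          val root = List.nth (ops, i)
          val lefts = trees (List.take (args, i + 1)) (List.take (ops, i))
          val rights = trees (List.drop (args, i + 1)) (List.drop (ops, i + 1))
        in
          List.concat
            (map (fn l => map (fn r => Appl [Atom root, l, r]) rights) lefts)
        end))
\end{ttbox}
With three operands, {\tt trees ["0", "0", "0"] ["+", "*"]} returns two
trees, one for {\tt 0 + (0 * 0)} and one for {\tt (0 + 0) * 0}; the
priorities declared in {\tt EXP} admit only the first.  In general, $n$
operands yield the $(n-1)$st Catalan number of trees (1, 2, 5, 14,
\ldots{} for $n = 2, 3, 4, 5$), each of which is a candidate for type
checking.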

The warning and error messages that Isabelle produces while resolving
such ambiguities are as follows.

If an ambiguity can be resolved by type inference, the following warning
is shown to remind the user that parsing is (unnecessarily) slowed
down:

\begin{ttbox}
{\out Warning: Ambiguous input "..."}
{\out produces the following parse trees:}
{\out ...}
{\out Fortunately, only one parse tree is type correct.}
{\out It helps (speed!) if you disambiguate your grammar or your input.}
\end{ttbox}

The following message is normally caused by using the same
syntax in two different productions:

\begin{ttbox}
{\out Warning: Ambiguous input "..."}
{\out produces the following parse trees:}
{\out ...}
{\out Error: More than one term is type correct:}
{\out ...}
\end{ttbox}

Conversely, it is also possible that none of the parse trees is
type-correct even though the user made no mistake.  By default Isabelle
assumes that the type of a syntax translation rule is {\tt logic}, but it
does not look at the type unless parsing the rule produces more than one
parse tree.  In that case the following message is output if the rule's
type differs from {\tt logic}:

\begin{ttbox}
{\out Warning: Ambiguous input "..."}
{\out produces the following parse trees:}
{\out ...}
{\out This occured in syntax translation rule: "..."  ->  "..."}
{\out Type checking error: Term does not have expected type}
{\out ...}
\end{ttbox}

To avoid this, the rule's type has to be stated explicitly.


\section{Example: some minimal logics} \label{sec:min_logics}
\index{examples!of logic definitions}

This section presents some examples that have a simple syntax.  They
demonstrate how to define new object-logics from scratch.

First we must define how an object-logic syntax is embedded into the
meta-logic.  Since all theorems must conform to the syntax for~\ndx{prop} (see
Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the
object-level syntax.  Assume that the syntax of your object-logic defines a
nonterminal symbol~\ndx{o} of formulae.  These formulae can now appear in
axioms and theorems wherever \ndx{prop} does if you add the production
\[ prop ~=~ o. \]
This is not a copy production but a coercion from formulae to propositions:
\begin{ttbox}
Base = Pure +
types
  o
arities
  o :: logic
consts
  Trueprop :: "o => prop"   ("_" 5)
end
\end{ttbox}
The constant \cdx{Trueprop} (the name is arbitrary) acts as an invisible
coercion function.  Assuming this definition resides in a file {\tt Base.thy},
you have to load it with the command {\tt use_thy "Base"}.

One of the simplest nontrivial logics is {\bf minimal logic} of
implication.  Its definition in Isabelle needs no advanced features but
illustrates the overall mechanism nicely:
\begin{ttbox}
Hilbert = Base +
consts
  "-->" :: "[o, o] => o"   (infixr 10)
rules
  K     "P --> Q --> P"
  S     "(P --> Q --> R) --> (P --> Q) --> P --> R"
  MP    "[| P --> Q; P |] ==> Q"
end
\end{ttbox}
After loading this definition from the file {\tt Hilbert.thy}, you can
start to prove theorems in the logic:
\begin{ttbox}
goal Hilbert.thy "P --> P";
{\out Level 0}
{\out P --> P}
{\out  1.  P --> P}
\ttbreak
by (resolve_tac [Hilbert.MP] 1);
{\out Level 1}
{\out P --> P}
{\out  1.  ?P --> P --> P}
{\out  2.  ?P}
\ttbreak
by (resolve_tac [Hilbert.MP] 1);
{\out Level 2}
{\out P --> P}
{\out  1.  ?P1 --> ?P --> P --> P}
{\out  2.  ?P1}
{\out  3.  ?P}
\ttbreak
by (resolve_tac [Hilbert.S] 1);
{\out Level 3}
{\out P --> P}
{\out  1.  P --> ?Q2 --> P}
{\out  2.  P --> ?Q2}
\ttbreak
by (resolve_tac [Hilbert.K] 1);
{\out Level 4}
{\out P --> P}
{\out  1.  P --> ?Q2}
\ttbreak
by (resolve_tac [Hilbert.K] 1);
{\out Level 5}
{\out P --> P}
{\out No subgoals!}
\end{ttbox}
As we can see, this Hilbert-style formulation of minimal logic is easy to
define but difficult to use.  The following natural deduction formulation is
better:
\begin{ttbox}
MinI = Base +
consts
  "-->" :: "[o, o] => o"   (infixr 10)
rules
  impI  "(P ==> Q) ==> P --> Q"
  impE  "[| P --> Q; P |] ==> Q"
end
\end{ttbox}
Note, however, that although the two systems are equivalent, this fact
cannot be proved within Isabelle.  Axioms {\tt S} and {\tt K} can be
derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt
  Hilbert}.  The reason is that {\tt impI} is only an {\bf admissible} rule
in {\tt Hilbert}, something that can only be shown by induction over all
possible proofs in {\tt Hilbert}.
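
The induction in question is the classical deduction theorem.  As a rough
illustration outside Isabelle, the following Standard~\ML{} sketch
represents {\tt Hilbert} proofs by a hypothetical datatype and turns any
proof of~$Q$ that uses a hypothesis~$P$ into a proof of {\tt P --> Q}
that no longer uses~$P$, by recursion over the proof.  The names
{\tt proof}, {\tt concl}, {\tt identity} and {\tt discharge} are
illustrative only.
\begin{ttbox}
(* Hypothetical model of Hilbert-style proofs; not Isabelle code. *)
datatype form = Var of string | Imp of form * form

datatype proof =
    Hyp of form                          (* hypothesis *)
  | AxK of form * form                   (* instance of K *)
  | AxS of form * form * form            (* instance of S *)
  | MP of proof * proof                  (* modus ponens *)

exception BadProof

(* the formula that a proof proves *)
fun concl (Hyp a) = a
  | concl (AxK (p, q)) = Imp (p, Imp (q, p))
  | concl (AxS (p, q, r)) =
      Imp (Imp (p, Imp (q, r)), Imp (Imp (p, q), Imp (p, r)))
  | concl (MP (d, e)) =
      (case concl d of
         Imp (a, b) => if a = concl e then b else raise BadProof
       | _ => raise BadProof)

(* a standard five-step proof of a --> a from S and K *)
fun identity a =
  MP (MP (AxS (a, Imp (a, a), a), AxK (a, Imp (a, a))), AxK (a, a))

(* deduction theorem: from a proof of b that may use hypothesis h,
   build a proof of h --> b that does not use h *)
fun discharge h (d as Hyp a) =
      if a = h then identity a else MP (AxK (a, h), d)
  | discharge h (MP (d, e)) =
      (case concl d of
         Imp (a, b) => MP (MP (AxS (h, a, b), discharge h d), discharge h e)
       | _ => raise BadProof)
  | discharge h d = MP (AxK (concl d, h), d)   (* axiom instances *)
\end{ttbox}
For instance, if {\tt d} is {\tt MP (Hyp (Imp (p, q)), Hyp p)}, a proof
of~$Q$ from {\tt P --> Q} and~$P$, then {\tt concl (discharge p d)} is
{\tt Imp (p, q)}.  Note that {\tt discharge} transforms whole proofs
rather than supplying one fixed derivation; this is the sense in which
{\tt impI} is admissible but not derivable.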

We may easily extend minimal logic with falsity:
\begin{ttbox}
MinIF = MinI +
consts
  False :: "o"
rules
  FalseE "False ==> P"
end
\end{ttbox}
On the other hand, we may wish to introduce conjunction only:
\begin{ttbox}
MinC = Base +
consts
  "&" :: "[o, o] => o"   (infixr 30)
\ttbreak
rules
  conjI  "[| P; Q |] ==> P & Q"
  conjE1 "P & Q ==> P"
  conjE2 "P & Q ==> Q"
end
\end{ttbox}
And if we want to have all three connectives together, we create and load a
theory file consisting of a single line:\footnote{We can combine the
  theories without creating a theory file using the \ML{} declaration
\begin{ttbox}
val MinIFC_thy = merge_theories(MinIF.thy, MinC.thy)
\end{ttbox}
\index{*merge_theories|fnote}}
\begin{ttbox}
MinIFC = MinIF + MinC
\end{ttbox}
Now we can prove mixed theorems like
\begin{ttbox}
goal MinIFC.thy "P & False --> Q";
by (resolve_tac [MinI.impI] 1);
by (dresolve_tac [MinC.conjE2] 1);
by (eresolve_tac [MinIF.FalseE] 1);
\end{ttbox}
Try this as an exercise!