doc-src/Ref/defining.tex
author Walther Neuper <neuper@ist.tugraz.at>
Thu, 12 Aug 2010 15:03:34 +0200
branchisac-from-Isabelle2009-2
changeset 37913 20e3616b2d9c
parent 30184 37969710e61f
child 43760 e87888b4152f
permissions -rw-r--r--
prepare reactivation of isac-update-Isa09-2
wenzelm@30184
     1
lcp@320
     2
\chapter{Defining Logics} \label{Defining-Logics}
lcp@320
     3
lcp@320
     4
\section{Mixfix declarations} \label{sec:mixfix}
wenzelm@864
     5
\index{mixfix declarations|(}
lcp@320
     6
lcp@320
     7
When defining a theory, you declare new constants by giving their names,
lcp@320
     8
their type, and an optional {\bf mixfix annotation}.  Mixfix annotations
lcp@320
     9
allow you to extend Isabelle's basic $\lambda$-calculus syntax with
lcp@320
    10
readable notation.  They can express any context-free priority grammar.
lcp@320
    11
Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
wenzelm@864
    12
general than the priority declarations of \ML\ and Prolog.
lcp@320
    13
lcp@320
    14
A mixfix annotation defines a production of the priority grammar.  It
lcp@320
    15
describes the concrete syntax, the translation to abstract syntax, and the
lcp@320
    16
pretty printing.  Special case annotations provide a simple means of
wenzelm@864
    17
specifying infix operators and binders.
lcp@320
    18
lcp@320
    19
\subsection{The general mixfix form}
lcp@320
    20
Here is a detailed account of mixfix declarations.  Suppose the following
wenzelm@864
    21
line occurs within a {\tt consts} or {\tt syntax} section of a {\tt .thy}
wenzelm@864
    22
file:
lcp@320
    23
\begin{center}
lcp@320
    24
  {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}
lcp@320
    25
\end{center}
lcp@332
    26
This constant declaration and mixfix annotation are interpreted as follows:
lcp@320
    27
\begin{itemize}\index{productions}
lcp@320
    28
\item The string {\tt $c$} is the name of the constant associated with the
lcp@320
    29
  production; unless it is a valid identifier, it must be enclosed in
lcp@320
    30
  quotes.  If $c$ is empty (given as~{\tt ""}) then this is a copy
lcp@320
    31
  production.\index{productions!copy} Otherwise, parsing an instance of the
lcp@320
    32
  phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
lcp@320
    33
    $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
lcp@320
    34
  argument.
lcp@320
    35
wenzelm@864
    36
  \item The constant $c$, if non-empty, is declared to have type $\sigma$
wenzelm@864
    37
    ({\tt consts} section only).
lcp@320
    38
lcp@320
    39
  \item The string $template$ specifies the right-hand side of
lcp@320
    40
    the production.  It has the form
wenzelm@864
    41
    \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \]
lcp@320
    42
    where each occurrence of {\tt_} denotes an argument position and
lcp@320
    43
    the~$w@i$ do not contain~{\tt _}.  (If you want a literal~{\tt _} in
lcp@320
    44
    the concrete syntax, you must escape it as described below.)  The $w@i$
wenzelm@864
    45
    may consist of \rmindex{delimiters}, spaces or
lcp@320
    46
    \rmindex{pretty printing} annotations (see below).
lcp@320
    47
lcp@320
    48
  \item The type $\sigma$ specifies the production's nonterminal symbols
lcp@320
    49
    (or name tokens).  If $template$ is of the form above then $\sigma$
lcp@320
    50
    must be a function type with at least~$n$ argument positions, say
lcp@320
    51
    $\sigma = [\tau@1, \dots, \tau@n] \To \tau$.  Nonterminal symbols are
lcp@320
    52
    derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described
wenzelm@864
    53
    below.  Any of these may be function types.
lcp@320
    54
lcp@320
    55
  \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
lcp@320
    56
      [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
lcp@320
    57
    priority\indexbold{priorities} required of any phrase that may appear
lcp@320
    58
    as the $i$-th argument.  Missing priorities default to~0.
wenzelm@4543
    59
    
wenzelm@4543
    60
  \item The integer $p$ is the priority of this production.  If
wenzelm@4543
    61
    omitted, it defaults to the maximal priority.  Priorities range
wenzelm@4543
    62
    between 0 and \ttindexbold{max_pri} (= 1000).
lcp@320
    63
lcp@320
    64
\end{itemize}
lcp@320
    65
%
wenzelm@864
    66
The resulting production is \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\,
wenzelm@864
    67
A@2^{(p@2)}\, \dots\, A@n^{(p@n)}\, w@n \] where $A$ and the $A@i$ are the
wenzelm@864
    68
nonterminals corresponding to the types $\tau$ and $\tau@i$ respectively.
wenzelm@864
    69
The nonterminal symbol associated with a type $(\ldots)ty$ is {\tt logic}, if
wenzelm@864
    70
this is a logical type (namely one of class {\tt logic} excluding {\tt
wenzelm@864
    71
prop}).  Otherwise it is $ty$ (note that only the outermost type constructor
wenzelm@864
    72
is taken into account).  Finally, the nonterminal of a type variable is {\tt
wenzelm@864
    73
any}.
wenzelm@864
    74
wenzelm@911
    75
\begin{warn}
wenzelm@864
    76
  Theories must sometimes declare types for purely syntactic purposes ---
paulson@3485
    77
  merely playing the role of nonterminals.  One example is \tydx{type}, the
wenzelm@864
    78
  built-in type of types.  This is a `type of all types' in the syntactic
wenzelm@864
    79
  sense only.  Do not declare such types under {\tt arities} as belonging to
wenzelm@864
    80
  class {\tt logic}\index{*logic class}, for that would make them useless as
wenzelm@864
    81
  separate nonterminal symbols.
wenzelm@864
    82
\end{warn}
wenzelm@864
    83
wenzelm@864
    84
Associating nonterminals with types allows a constant's type to specify
wenzelm@864
    85
syntax as well.  We can declare the function~$f$ to have type $[\tau@1,
wenzelm@864
    86
\ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the layout
wenzelm@864
    87
of the function's $n$ arguments.  The constant's name, in this case~$f$, will
wenzelm@864
    88
also serve as the label in the abstract syntax tree.
wenzelm@864
    89
wenzelm@864
    90
You may also declare mixfix syntax without adding constants to the theory's
wenzelm@864
    91
signature, by using a {\tt syntax} section instead of {\tt consts}.  Thus a
wenzelm@864
    92
production need not map directly to a logical function (this typically
wenzelm@864
    93
requires additional syntactic translations, see also
wenzelm@864
    94
Chapter~\ref{chap:syntax}).
wenzelm@864
    95
wenzelm@864
    96
wenzelm@911
    97
\medskip
wenzelm@911
    98
As a special case of the general mixfix declaration, the form
wenzelm@864
    99
\begin{center}
wenzelm@911
   100
  {\tt $c$ ::\ "$\sigma$" ("$template$")}
wenzelm@864
   101
\end{center}
wenzelm@864
   102
specifies no priorities.  The resulting production puts no priority
wenzelm@864
   103
constraints on any of its arguments and has maximal priority itself.
wenzelm@864
   104
Omitting priorities in this manner is prone to syntactic ambiguities unless
berghofe@3098
   105
the production's right-hand side is fully bracketed, as in
berghofe@3098
   106
\verb|"if _ then _ else _ fi"|.
lcp@320
   107
lcp@320
   108
Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},
lcp@320
   109
is sensible only if~$c$ is an identifier.  Otherwise you will be unable to
lcp@320
   110
write terms involving~$c$.
lcp@320
   111
lcp@320
   112
lcp@320
   113
\subsection{Example: arithmetic expressions}
lcp@320
   114
\index{examples!of mixfix declarations}
wenzelm@864
   115
This theory specification contains a {\tt syntax} section with mixfix
lcp@320
   116
declarations encoding the priority grammar from
lcp@320
   117
\S\ref{sec:priority_grammars}:
lcp@320
   118
\begin{ttbox}
wenzelm@3108
   119
ExpSyntax = Pure +
lcp@320
   120
types
lcp@320
   121
  exp
wenzelm@864
   122
syntax
clasohm@1387
   123
  "0" :: exp                 ("0"      9)
clasohm@1387
   124
  "+" :: [exp, exp] => exp   ("_ + _"  [0, 1] 0)
clasohm@1387
   125
  "*" :: [exp, exp] => exp   ("_ * _"  [3, 2] 2)
clasohm@1387
   126
  "-" :: exp => exp          ("- _"    [3] 3)
lcp@320
   127
end
lcp@320
   128
\end{ttbox}
wenzelm@864
   129
Executing {\tt Syntax.print_gram} reveals the productions derived from the
wenzelm@864
   130
above mixfix declarations (lots of additional information deleted):
lcp@320
   131
\begin{ttbox}
wenzelm@3108
   132
Syntax.print_gram (syn_of ExpSyntax.thy);
lcp@320
   133
{\out exp = "0"  => "0" (9)}
lcp@320
   134
{\out exp = exp[0] "+" exp[1]  => "+" (0)}
lcp@320
   135
{\out exp = exp[3] "*" exp[2]  => "*" (2)}
lcp@320
   136
{\out exp = "-" exp[3]  => "-" (3)}
lcp@320
   137
\end{ttbox}
lcp@320
   138
wenzelm@3108
   139
Note that because {\tt exp} is not of class {\tt logic}, it has been
paulson@3485
   140
retained as a separate nonterminal.  This also entails that the syntax
wenzelm@3108
   141
does not provide for identifiers or paranthesized expressions.
wenzelm@3108
   142
Normally you would also want to add the declaration {\tt arities
wenzelm@3108
   143
  exp::logic} after {\tt types} and use {\tt consts} instead of {\tt
paulson@3485
   144
  syntax}.  Try this as an exercise and study the changes in the
nipkow@867
   145
grammar.
lcp@320
   146
lcp@320
   147
lcp@320
   148
\subsection{Infixes}
lcp@320
   149
\indexbold{infixes}
lcp@320
   150
wenzelm@3108
   151
Infix operators associating to the left or right can be declared using
wenzelm@3108
   152
{\tt infixl} or {\tt infixr}.  Basically, the form {\tt $c$ ::\ 
wenzelm@3108
   153
  $\sigma$ (infixl $p$)} abbreviates the mixfix declarations
lcp@320
   154
\begin{ttbox}
clasohm@1387
   155
"op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
clasohm@1387
   156
"op \(c\)" :: \(\sigma\)   ("op \(c\)")
lcp@320
   157
\end{ttbox}
clasohm@1387
   158
and {\tt $c$ ::\ $\sigma$ (infixr $p$)} abbreviates the mixfix declarations
lcp@320
   159
\begin{ttbox}
clasohm@1387
   160
"op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
clasohm@1387
   161
"op \(c\)" :: \(\sigma\)   ("op \(c\)")
lcp@320
   162
\end{ttbox}
lcp@320
   163
The infix operator is declared as a constant with the prefix {\tt op}.
lcp@320
   164
Thus, prefixing infixes with \sdx{op} makes them behave like ordinary
lcp@320
   165
function symbols, as in \ML.  Special characters occurring in~$c$ must be
lcp@320
   166
escaped, as in delimiters, using a single quote.
lcp@320
   167
wenzelm@3108
   168
A slightly more general form of infix declarations allows constant
wenzelm@3108
   169
names to be independent from their concrete syntax, namely \texttt{$c$
paulson@3485
   170
  ::\ $\sigma$\ (infixl "$sy$" $p$)}, the same for \texttt{infixr}.  As
wenzelm@3108
   171
an example consider:
wenzelm@3108
   172
\begin{ttbox}
wenzelm@3108
   173
and :: [bool, bool] => bool  (infixr "&" 35)
wenzelm@3108
   174
\end{ttbox}
wenzelm@3108
   175
The internal constant name will then be just \texttt{and}, without any
wenzelm@3108
   176
\texttt{op} prefixed.
wenzelm@3108
   177
lcp@320
   178
lcp@320
   179
\subsection{Binders}
lcp@320
   180
\indexbold{binders}
lcp@320
   181
\begingroup
lcp@320
   182
\def\Q{{\cal Q}}
lcp@320
   183
A {\bf binder} is a variable-binding construct such as a quantifier.  The
lcp@320
   184
constant declaration
lcp@320
   185
\begin{ttbox}
clasohm@1387
   186
\(c\) :: \(\sigma\)   (binder "\(\Q\)" [\(pb\)] \(p\))
lcp@320
   187
\end{ttbox}
lcp@320
   188
introduces a constant~$c$ of type~$\sigma$, which must have the form
lcp@320
   189
$(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where
lcp@320
   190
$x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
paulson@3485
   191
and the whole term has type~$\tau@3$.  The optional integer $pb$
lcp@1060
   192
specifies the body's priority, by default~$p$.  Special characters
clasohm@877
   193
in $\Q$ must be escaped using a single quote.
lcp@320
   194
wenzelm@864
   195
The declaration is expanded internally to something like
lcp@320
   196
\begin{ttbox}
berghofe@3098
   197
\(c\)\hskip3pt    :: (\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)
berghofe@3098
   198
"\(\Q\)"  :: [idts, \(\tau@2\)] => \(\tau@3\)   ("(3\(\Q\)_./ _)" [0, \(pb\)] \(p\))
lcp@320
   199
\end{ttbox}
lcp@320
   200
Here \ndx{idts} is the nonterminal symbol for a list of identifiers with
lcp@332
   201
\index{type constraints}
lcp@320
   202
optional type constraints (see Fig.\ts\ref{fig:pure_gram}).  The
lcp@320
   203
declaration also installs a parse translation\index{translations!parse}
lcp@320
   204
for~$\Q$ and a print translation\index{translations!print} for~$c$ to
lcp@320
   205
translate between the internal and external forms.
lcp@320
   206
lcp@320
   207
A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a
lcp@320
   208
list of variables.  The external form $\Q~x@1~x@2 \ldots x@n. P$
lcp@320
   209
corresponds to the internal form
lcp@320
   210
\[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]
lcp@320
   211
lcp@320
   212
\medskip
lcp@320
   213
For example, let us declare the quantifier~$\forall$:\index{quantifiers}
lcp@320
   214
\begin{ttbox}
clasohm@1387
   215
All :: ('a => o) => o   (binder "ALL " 10)
lcp@320
   216
\end{ttbox}
lcp@320
   217
This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
lcp@320
   218
  $x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall
lcp@320
   219
back on ${\tt All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL
lcp@320
   220
  $x$.$P$} have type~$o$, the type of formulae, while the bound variable
lcp@320
   221
can be polymorphic.
lcp@320
   222
\endgroup
lcp@320
   223
lcp@320
   224
\index{mixfix declarations|)}
lcp@320
   225
wenzelm@3108
   226
wenzelm@3108
   227
\section{*Alternative print modes} \label{sec:prmodes}
wenzelm@3108
   228
\index{print modes|(}
wenzelm@3108
   229
%
paulson@3485
   230
Isabelle's pretty printer supports alternative output syntaxes.  These
paulson@3485
   231
may be used independently or in cooperation.  The currently active
wenzelm@3108
   232
print modes (with precedence from left to right) are determined by a
wenzelm@3108
   233
reference variable.
wenzelm@3108
   234
\begin{ttbox}\index{*print_mode}
wenzelm@3108
   235
print_mode: string list ref
wenzelm@3108
   236
\end{ttbox}
wenzelm@3108
   237
Initially this may already contain some print mode identifiers,
wenzelm@3108
   238
depending on how Isabelle has been invoked (e.g.\ by some user
paulson@3485
   239
interface).  So changes should be incremental --- adding or deleting
wenzelm@3108
   240
modes relative to the current value.
wenzelm@3108
   241
wenzelm@12465
   242
Any \ML{} string is a legal print mode identifier, without any predeclaration
wenzelm@12465
   243
required.  The following names should be considered reserved, though:
wenzelm@12465
   244
\texttt{""} (the empty string), \texttt{symbols}, \texttt{xsymbols}, and
wenzelm@12465
   245
\texttt{latex}.
wenzelm@3108
   246
wenzelm@3108
   247
There is a separate table of mixfix productions for pretty printing
paulson@3485
   248
associated with each print mode.  The currently active ones are
wenzelm@3108
   249
conceptually just concatenated from left to right, with the standard
wenzelm@3108
   250
syntax output table always coming last as default.  Thus mixfix
wenzelm@3108
   251
productions of preceding modes in the list may override those of later
wenzelm@3108
   252
ones.  Also note that token translations are always relative to some
wenzelm@3108
   253
print mode (see \S\ref{sec:tok_tr}).
wenzelm@3108
   254
wenzelm@3108
   255
\medskip The canonical application of print modes is optional printing
wenzelm@3108
   256
of mathematical symbols from a special screen font instead of {\sc
paulson@3485
   257
  ascii}.  Another example is to re-use Isabelle's advanced
wenzelm@3108
   258
$\lambda$-term printing mechanisms to generate completely different
wenzelm@3228
   259
output, say for interfacing external tools like \rmindex{model
wenzelm@3228
   260
  checkers} (see also \texttt{HOL/Modelcheck}).
wenzelm@3108
   261
wenzelm@3108
   262
\index{print modes|)}
wenzelm@3108
   263
wenzelm@3108
   264
clasohm@711
   265
\section{Ambiguity of parsed expressions} \label{sec:ambiguity}
clasohm@711
   266
\index{ambiguity!of parsed expressions}
clasohm@711
   267
clasohm@711
   268
To keep the grammar small and allow common productions to be shared
wenzelm@864
   269
all logical types (except {\tt prop}) are internally represented
paulson@3485
   270
by one nonterminal, namely {\tt logic}.  This and omitted or too freely
clasohm@711
   271
chosen priorities may lead to ways of parsing an expression that were
paulson@3485
   272
not intended by the theory's maker.  In most cases Isabelle is able to
wenzelm@864
   273
select one of multiple parse trees that an expression has lead
paulson@3485
   274
to by checking which of them can be typed correctly.  But this may not
clasohm@711
   275
work in every case and always slows down parsing.
wenzelm@864
   276
The warning and error messages that can be produced during this process are
clasohm@711
   277
as follows:
clasohm@711
   278
clasohm@880
   279
If an ambiguity can be resolved by type inference the following
clasohm@880
   280
warning is shown to remind the user that parsing is (unnecessarily)
paulson@3485
   281
slowed down.  In cases where it's not easily possible to eliminate the
clasohm@880
   282
ambiguity the frequency of the warning can be controlled by changing
clasohm@883
   283
the value of {\tt Syntax.ambiguity_level} which has type {\tt int
paulson@3485
   284
ref}.  Its default value is 1 and by increasing it one can control how
clasohm@883
   285
many parse trees are necessary to generate the warning.
clasohm@711
   286
clasohm@711
   287
\begin{ttbox}
wenzelm@3801
   288
{\out Ambiguous input "\dots"}
clasohm@711
   289
{\out produces the following parse trees:}
wenzelm@3801
   290
{\out \dots}
clasohm@711
   291
{\out Fortunately, only one parse tree is type correct.}
wenzelm@3801
   292
{\out You may still want to disambiguate your grammar or your input.}
clasohm@711
   293
\end{ttbox}
clasohm@711
   294
clasohm@711
   295
The following message is normally caused by using the same
clasohm@711
   296
syntax in two different productions:
clasohm@711
   297
clasohm@711
   298
\begin{ttbox}
wenzelm@3802
   299
{\out Ambiguous input "..."}
clasohm@711
   300
{\out produces the following parse trees:}
wenzelm@3802
   301
{\out \dots}
wenzelm@3802
   302
{\out More than one term is type correct:}
wenzelm@3802
   303
{\out \dots}
clasohm@711
   304
\end{ttbox}
clasohm@711
   305
clasohm@866
   306
Ambiguities occuring in syntax translation rules cannot be resolved by
clasohm@866
   307
type inference because it is not necessary for these rules to be type
paulson@3485
   308
correct.  Therefore Isabelle always generates an error message and the
clasohm@866
   309
ambiguity should be eliminated by changing the grammar or the rule.
clasohm@711
   310
lcp@320
   311
wenzelm@5371
   312
%%% Local Variables: 
wenzelm@5371
   313
%%% mode: latex
wenzelm@5371
   314
%%% TeX-master: "ref"
wenzelm@5371
   315
%%% End: