%% $Id$
\chapter{Defining Logics} \label{Defining-Logics}
This chapter explains how to define new formal systems, in particular
their concrete syntax. While Isabelle can be regarded as a theorem prover
for set theory, higher-order logic or the sequent calculus, its
distinguishing feature is support for the definition of new logics.

Isabelle logics are hierarchies of theories, which are described and
illustrated in
\iflabelundefined{sec:defining-theories}{{\em Introduction to Isabelle}}%
{\S\ref{sec:defining-theories}}. That material, together with the theory
files provided in the examples directories, should suffice for all simple
applications. The easiest way to define a new theory is by modifying a
copy of an existing theory.

This chapter documents the meta-logic syntax, mixfix declarations and
pretty printing. The extended examples in \S\ref{sec:min_logics}
demonstrate the logical aspects of the definition of theories.


\section{Priority grammars} \label{sec:priority_grammars}
\index{priority grammars|(}

A context-free grammar contains a set of {\bf nonterminal symbols}, a set of
{\bf terminal symbols} and a set of {\bf productions}\index{productions}.
Productions have the form ${A=\gamma}$, where $A$ is a nonterminal and
$\gamma$ is a string of terminals and nonterminals. One designated
nonterminal is called the {\bf start symbol}. The language defined by the
grammar consists of all strings of terminals that can be derived from the
start symbol by applying productions as rewrite rules.

The syntax of an Isabelle logic is specified by a {\bf priority
grammar}.\index{priorities} Each nonterminal is decorated by an integer
priority, as in~$A^{(p)}$. A nonterminal $A^{(p)}$ in a derivation may be
rewritten using a production $A^{(q)} = \gamma$ only if~$p \le q$. Any
priority grammar can be translated into an ordinary context-free grammar by
introducing new nonterminals and productions.
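To make this translation concrete, here is one possible construction, sketched in Python and not taken from Isabelle's sources. For each nonterminal $A$ and each priority $p$ at which $A$ is required, it introduces a fresh nonterminal $A@p$ whose productions are copies of all productions $A^{(q)} = \gamma$ with $p \le q$. The encoding (terminals as strings, an occurrence of $A^{(p)}$ as the pair {\tt ("A", p)}) and the function name {\tt to_cfg} are assumptions made for this illustration.

```python
from typing import Dict, List, Set, Tuple, Union

# A terminal is a plain string; an occurrence of A^(p) is the pair ("A", p).
Symbol = Union[str, Tuple[str, int]]
# A priority production A^(q) = gamma is the triple ("A", q, gamma).
Production = Tuple[str, int, List[Symbol]]


def to_cfg(prods: List[Production], start: str) -> List[Tuple[str, List[str]]]:
    """Translate a priority grammar into an ordinary context-free grammar.

    The fresh nonterminal "A_p" derives exactly what A^(p) derives: it
    receives a copy of every production A^(q) = gamma with p <= q.
    """
    # Priorities at which each nonterminal is required (start symbol at 0).
    needed: Dict[str, Set[int]] = {start: {0}}
    for _, _, rhs in prods:
        for sym in rhs:
            if isinstance(sym, tuple):
                needed.setdefault(sym[0], set()).add(sym[1])

    cfg = []
    for a in sorted(needed):
        for p in sorted(needed[a]):
            for lhs, q, rhs in prods:
                if lhs == a and p <= q:
                    new_rhs = [s if isinstance(s, str) else f"{s[0]}_{s[1]}" for s in rhs]
                    cfg.append((f"{a}_{p}", new_rhs))
    return cfg


# The arithmetic grammar of this section, in the encoding above.
ARITH: List[Production] = [
    ("A", 9, ["0"]),
    ("A", 9, ["(", ("A", 0), ")"]),
    ("A", 0, [("A", 0), "+", ("A", 1)]),
    ("A", 2, [("A", 3), "*", ("A", 2)]),
    ("A", 3, ["-", ("A", 3)]),
]

for lhs, rhs in to_cfg(ARITH, "A"):
    print(lhs, "=", " ".join(rhs))
```

For the arithmetic grammar of this section, $A$ is required at priorities 0 to~3, so the translation yields sixteen ordinary productions. Note that $A@1$ receives no copy of the production for {\tt +}, whose priority is~0.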

Formally, a set of context free productions $G$ induces a derivation
relation $\longrightarrow@G$. Let $\alpha$ and $\beta$ denote strings of
terminal or nonterminal symbols. Then
\[ \alpha\, A^{(p)}\, \beta ~\longrightarrow@G~ \alpha\,\gamma\,\beta \]
if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$p \le q$.
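The side condition $p \le q$ is the only difference from an ordinary context-free derivation step. A minimal Python sketch of a single step of $\longrightarrow@G$ follows. As assumptions of this illustration, terminals are strings, an occurrence of $A^{(p)}$ is the pair {\tt ("A", p)}, and the hypothetical function {\tt derive_step} rewrites the symbol at a given position.

```python
from typing import List, Optional, Tuple, Union

# A terminal is a plain string; an occurrence of A^(p) is the pair ("A", p).
Symbol = Union[str, Tuple[str, int]]
# A priority production A^(q) = gamma is the triple ("A", q, gamma).
Production = Tuple[str, int, List[Symbol]]


def derive_step(form: List[Symbol], i: int, prod: Production) -> Optional[List[Symbol]]:
    """One step  alpha A^(p) beta  -->  alpha gamma beta  at position i.

    The step is allowed only if form[i] is A^(p) and prod is A^(q) = gamma
    with p <= q; otherwise None is returned.
    """
    sym = form[i]
    lhs, q, gamma = prod
    if isinstance(sym, tuple) and sym[0] == lhs and sym[1] <= q:
        return form[:i] + list(gamma) + form[i + 1:]
    return None


plus: Production = ("A", 0, [("A", 0), "+", ("A", 1)])
times: Production = ("A", 2, [("A", 3), "*", ("A", 2)])

step = derive_step([("A", 0)], 0, plus)    # A^(0) --> A^(0) + A^(1)
print(step)
print(derive_step(step, 2, plus))          # None: A^(1) may not use a priority-0 production
print(derive_step(step, 2, times))         # allowed, since 1 <= 2
```

With the priority-0 production for {\tt +} used in the example that follows, an occurrence of $A^{(1)}$ cannot be rewritten. This is exactly how that grammar prevents an unbracketed sum from serving as the right operand of another sum.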

The following simple grammar for arithmetic expressions demonstrates how
binding power and associativity of operators can be enforced by priorities.
\begin{center}
\begin{tabular}{rclr}
  $A^{(9)}$ & = & {\tt0} \\
  $A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\
  $A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\
  $A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\
  $A^{(3)}$ & = & {\tt-} $A^{(3)}$
\end{tabular}
\end{center}
The choice of priorities determines that {\tt -} binds tighter than {\tt *},
which binds tighter than {\tt +}. Furthermore {\tt +} associates to the
left and {\tt *} to the right.
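One way to see how the priorities achieve this is to turn the grammar into a recursive-descent parser in which {\tt parse(p)} accepts exactly the phrases derivable from $A^{(p)}$. The Python sketch below does so for this grammar only. It is an illustration, not Isabelle's parser, and the names {\tt PriorityParser}, {\tt parse_exp} and {\tt show} are hypothetical. A phrase built by a production of priority~$q$ is accepted where $A^{(p)}$ is expected only if $p \le q$. The loop handles the left-recursive production for {\tt +}, while recursion on the right operand makes {\tt *} associate to the right.

```python
from typing import List, Tuple, Union

Ast = Union[str, Tuple]  # a leaf such as "0", or (operator, argument, ...)


class PriorityParser:
    """Recursive-descent parser for the example priority grammar.

    parse(p) accepts exactly the phrases derivable from A^(p): a phrase built
    by a production of priority q may stand for A^(p) only if p <= q.
    """

    def __init__(self, text: str) -> None:
        # Every token of this grammar is one character; white space separates.
        self.toks: List[str] = [c for c in text if not c.isspace()]
        self.pos = 0

    def peek(self) -> str:
        return self.toks[self.pos] if self.pos < len(self.toks) else ""

    def take(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r} at token {self.pos}")
        self.pos += 1

    def parse(self, p: int) -> Ast:
        tok = self.peek()
        if tok == "0":                   # A^(9) = 0
            self.take("0")
            left, q = "0", 9
        elif tok == "(":                 # A^(9) = ( A^(0) )
            self.take("(")
            left = self.parse(0)
            self.take(")")
            q = 9
        elif tok == "-" and p <= 3:      # A^(3) = - A^(3)
            self.take("-")
            left, q = ("-", self.parse(3)), 3
        else:
            raise SyntaxError(f"unexpected {tok!r} at token {self.pos}")
        while True:
            tok = self.peek()
            if tok == "*" and q >= 3 and p <= 2:   # A^(2) = A^(3) * A^(2)
                self.take("*")
                left, q = ("*", left, self.parse(2)), 2
            elif tok == "+" and p <= 0:            # A^(0) = A^(0) + A^(1); any left phrase qualifies
                self.take("+")
                left, q = ("+", left, self.parse(1)), 0
            else:
                return left


def parse_exp(text: str) -> Ast:
    parser = PriorityParser(text)
    ast = parser.parse(0)
    if parser.peek():
        raise SyntaxError(f"unexpected {parser.peek()!r} at token {parser.pos}")
    return ast


def show(ast: Ast) -> str:
    """Print an AST in fully parenthesised prefix form."""
    if isinstance(ast, str):
        return f'"{ast}"'
    return "(" + " ".join(show(a) for a in ast) + ")"


print(show(parse_exp("0 + - 0 + 0")))   # ("+" ("+" "0" ("-" "0")) "0")
```

The printed form mimics the raw \AST{}s that {\tt Syntax.test_read} displays for the {\tt EXP} theory in \S\ref{sec:mixfix}. For instance, {\tt 0 * 0 * 0 * 0 + 0 + 0 + 0} comes out with {\tt *} nested to the right and {\tt +} nested to the left.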

For clarity, grammars obey these conventions:
\begin{itemize}
\item All priorities must lie between~0 and \ttindex{max_pri}, which is a
  fixed integer. Sometimes {\tt max_pri} is written as $\infty$.
\item Priority 0 on the right-hand side and priority \ttindex{max_pri} on
  the left-hand side may be omitted.
\item The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$; the
  priority of the left-hand side actually appears in a column on the far
  right.
\item Alternatives are separated by~$|$.
\item Repetition is indicated by dots~(\dots) in an informal but obvious
  way.
\end{itemize}

Using these conventions and assuming $\infty=9$, the grammar
takes the form
\begin{center}
\begin{tabular}{rclc}
$A$ & = & {\tt0} & \hspace*{4em} \\
 & $|$ & {\tt(} $A$ {\tt)} \\
 & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\
 & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\
 & $|$ & {\tt-} $A^{(3)}$ & (3)
\end{tabular}
\end{center}
\index{priority grammars|)}


\begin{figure}
\begin{center}
\begin{tabular}{rclc}
$any$ &=& $prop$ ~~$|$~~ $logic$ \\\\
$prop$ &=& {\tt(} $prop$ {\tt)} \\
     &$|$& $prop^{(4)}$ {\tt::} $type$ & (3) \\
     &$|$& {\tt PROP} $aprop$ \\
     &$|$& $any^{(3)}$ {\tt ==} $any^{(2)}$ & (2) \\
     &$|$& $any^{(3)}$ {\tt =?=} $any^{(2)}$ & (2) \\
     &$|$& $prop^{(2)}$ {\tt ==>} $prop^{(1)}$ & (1) \\
     &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\
     &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\
     &$|$& {\tt OFCLASS} {\tt(} $type$ {\tt,} $logic$ {\tt)} \\\\
$aprop$ &=& $id$ ~~$|$~~ $var$
    ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\\\
$logic$ &=& {\tt(} $logic$ {\tt)} \\
      &$|$& $logic^{(4)}$ {\tt::} $type$ & (3) \\
      &$|$& $id$ ~~$|$~~ $var$
    ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\
      &$|$& {\tt \%} $idts$ {\tt.} $any$ & (0) \\\\
$idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\
$idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
    &$|$& $id$ {\tt ::} $type$ & (0) \\\\
$type$ &=& {\tt(} $type$ {\tt)} \\
     &$|$& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$
  ~~$|$~~ $tvar$ {\tt::} $sort$ \\
     &$|$& $id$ ~~$|$~~ $type^{(\infty)}$ $id$
    ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
     &$|$& $type^{(1)}$ {\tt =>} $type$ & (0) \\
     &$|$& {\tt[} $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0) \\\\
$sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace}
    ~~$|$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace}
\end{tabular}
\index{*PROP symbol}
\index{*== symbol}\index{*=?= symbol}\index{*==> symbol}
\index{*:: symbol}\index{*=> symbol}
\index{sort constraints}
%the index command: a percent is permitted, but braces must match!
\index{%@{\tt\%} symbol}
\index{{}@{\tt\ttlbrace} symbol}\index{{}@{\tt\ttrbrace} symbol}
\index{*[ symbol}\index{*] symbol}
\index{*"!"! symbol}
\index{*"["| symbol}
\index{*"|"] symbol}
\end{center}
\caption{Meta-logic syntax}\label{fig:pure_gram}
\end{figure}


\section{The Pure syntax} \label{sec:basic_syntax}
\index{syntax!Pure|(}

At the root of all object-logics lies the theory \thydx{Pure}. It
contains, among many other things, the Pure syntax. An informal account of
this basic syntax (types, terms and formulae) appears in
\iflabelundefined{sec:forward}{{\em Introduction to Isabelle}}%
{\S\ref{sec:forward}}. A more precise description using a priority grammar
appears in Fig.\ts\ref{fig:pure_gram}. It defines the following
nonterminals:
\begin{ttdescription}
\item[\ndxbold{any}] denotes any term.

\item[\ndxbold{prop}] denotes terms of type {\tt prop}. These are formulae
  of the meta-logic. Note that user constants of result type {\tt prop}
  (i.e.\ $c :: \ldots \To prop$) should always provide concrete syntax.
  Otherwise atomic propositions with head $c$ may be printed incorrectly.

\item[\ndxbold{aprop}] denotes atomic propositions.

%% FIXME huh!?
%  These typically
%  include the judgement forms of the object-logic; its definition
%  introduces a meta-level predicate for each judgement form.

\item[\ndxbold{logic}] denotes terms whose type belongs to class
  \cldx{logic}, excluding type \tydx{prop}.

\item[\ndxbold{idts}] denotes a list of identifiers, possibly constrained
  by types.

\item[\ndxbold{type}] denotes types of the meta-logic.

\item[\ndxbold{sort}] denotes meta-level sorts.
\end{ttdescription}

\begin{warn}
  In {\tt idts}, note that \verb|x::nat y| is parsed as \verb|x::(nat y)|,
  treating {\tt y} like a type constructor applied to {\tt nat}. The
  likely result is an error message. To avoid this interpretation, use
  parentheses and write \verb|(x::nat) y|.
  \index{type constraints}\index{*:: symbol}

  Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and
  yields an error. The correct form is \verb|(x::nat) (y::nat)|.
\end{warn}

\begin{warn}
  Type constraints bind very weakly. For example, \verb!x<y::nat! is normally
  parsed as \verb!(x<y)::nat!, unless \verb$<$ has priority 3 or less, in
  which case the string is likely to be ambiguous. The correct form is
  \verb!x<(y::nat)!.
\end{warn}

\subsection{Logical types and default syntax}\label{logical-types}
\index{lambda calc@$\lambda$-calculus}

Isabelle's representation of mathematical languages is based on the
simply typed $\lambda$-calculus. All logical types, namely those of
class \cldx{logic}, are automatically equipped with a basic syntax of
types, identifiers, variables, parentheses, $\lambda$-abstraction and
application.
\begin{warn}
  Isabelle combines the syntaxes for all types of class \cldx{logic} by
  mapping all those types to the single nonterminal $logic$. Thus all
  productions of $logic$, in particular $id$, $var$, etc., become available.
\end{warn}


\subsection{Lexical matters}
The parser does not process input strings directly. It operates on token
lists provided by Isabelle's \bfindex{lexer}. There are two kinds of
tokens: \bfindex{delimiters} and \bfindex{name tokens}.

\index{reserved words}
Delimiters can be regarded as reserved words of the syntax. You can
add new ones when extending theories. In Fig.\ts\ref{fig:pure_gram} they
appear in typewriter font, for example {\tt ==}, {\tt =?=} and
{\tt PROP}\@.

Name tokens have a predefined syntax. The lexer distinguishes six disjoint
classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type
identifiers\index{type identifiers}, type unknowns\index{type unknowns},
\rmindex{numerals} and \rmindex{strings}. They are denoted by \ndxbold{id},
\ndxbold{var}, \ndxbold{tid}, \ndxbold{tvar}, \ndxbold{xnum} and \ndxbold{xstr},
respectively. Typical examples are {\tt x}, {\tt ?x7}, {\tt 'a}, {\tt ?'a3},
{\tt \#42} and {\tt ''foo bar''}. Here is the precise syntax:
\begin{eqnarray*}
id        & = & letter~quasiletter^* \\
var       & = & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
tid       & = & \mbox{\tt '}id \\
tvar      & = & \mbox{\tt ?}tid ~~|~~
                \mbox{\tt ?}tid\mbox{\tt .}nat \\
xnum      & = & \mbox{\tt \#}nat ~~|~~ \mbox{\tt \#\char`\~}nat \\
xstr      & = & \mbox{\tt ''\rm text\tt ''} \\[1ex]
letter    & = & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
digit     & = & \mbox{one of {\tt 0}\dots {\tt 9}} \\
quasiletter & = & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
nat       & = & digit^+
\end{eqnarray*}
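All of these definitions are regular, so each class corresponds to a regular expression. The following Python sketch, which illustrates the grammar and is not Isabelle's lexer, classifies a complete candidate token. The function name {\tt classify} is hypothetical, and because the grammar leaves $text$ informal, the sketch assumes that a string may contain any characters except two consecutive single quotes.

```python
import re
from typing import Optional

# Building blocks, transcribed from the grammar of name tokens.
ID = r"[A-Za-z][A-Za-z0-9_']*"   # letter quasiletter*
NAT = r"[0-9]+"                  # digit+

# The six name classes; they are disjoint, so the order of testing does not matter.
NAME_CLASSES = {
    "id": ID,
    "var": rf"\?{ID}(?:\.{NAT})?",
    "tid": rf"'{ID}",
    "tvar": rf"\?'{ID}(?:\.{NAT})?",
    "xnum": rf"#~?{NAT}",
    # Assumption: text is anything that does not contain two consecutive quotes.
    "xstr": r"''(?:(?!'')[\s\S])*''",
}


def classify(token: str) -> Optional[str]:
    """Return the class of a complete name token, or None if it is not one."""
    for name, pattern in NAME_CLASSES.items():
        if re.fullmatch(pattern, token):
            return name
    return None


for t in ["x", "?x7", "'a", "?'a3", "#42", "''foo bar''", "?x.1", "7x"]:
    print(t, classify(t))
```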
The lexer repeatedly takes the maximal prefix of the input string that forms
a valid token. A maximal prefix that is both a delimiter and a name is
treated as a delimiter. Spaces, tabs, newlines and formfeeds are separators;
they never occur within tokens, except those of class $xstr$.

\medskip
Delimiters need not be separated by white space. For example, if {\tt -}
is a delimiter but {\tt --} is not, then the string {\tt --} is treated as
two consecutive occurrences of the token~{\tt -}. In contrast, \ML\
treats {\tt --} as a single symbolic name. The consequence of Isabelle's
more liberal scheme is that the same string may be parsed in different ways
after extending the syntax: after adding {\tt --} as a delimiter, the input
{\tt --} is treated as a single token.
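The two rules above (take the longest prefix; on a tie, prefer the delimiter) can be sketched in Python as follows. The sketch is not Isabelle's lexer: for brevity it recognizes only the name classes $id$, $var$, $tid$, $tvar$ and $xnum$ and omits $xstr$. The function {\tt tokenize} and its ({\it kind}, {\it text}) result pairs are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

ID = r"[A-Za-z][A-Za-z0-9_']*"
NAT = r"[0-9]+"
# Name tokens of classes id, var, tid, tvar and xnum (xstr is omitted here).
NAME_PATTERNS = [
    re.compile(p)
    for p in (ID, rf"\?{ID}(?:\.{NAT})?", rf"'{ID}", rf"\?'{ID}(?:\.{NAT})?", rf"#~?{NAT}")
]


def tokenize(text: str, delimiters: Iterable[str]) -> List[Tuple[str, str]]:
    """Split text into ("delim", d) and ("name", n) tokens by maximal munch."""
    delims = set(delimiters)
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():            # separators never occur inside tokens
            i += 1
            continue
        # Longest delimiter and longest name token starting at position i.
        delim = max((d for d in delims if text.startswith(d, i)), key=len, default="")
        name = ""
        for pat in NAME_PATTERNS:
            m = pat.match(text, i)
            if m and len(m.group()) > len(name):
                name = m.group()
        if not delim and not name:
            raise ValueError(f"no token at position {i}")
        # The longer prefix wins; a tie goes to the delimiter.
        if len(delim) >= len(name):
            tokens.append(("delim", delim))
            i += len(delim)
        else:
            tokens.append(("name", name))
            i += len(name)
    return tokens


print(tokenize("--", ["-"]))          # two tokens "-"
print(tokenize("--", ["-", "--"]))    # a single token "--"
```

The last two lines reproduce the example in the text: the same input yields different tokens once {\tt --} is declared as a delimiter. The tie-breaking rule is what makes {\tt PROP} a delimiter rather than an identifier, while a longer name such as {\tt PROPx} is still read as a single identifier.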

A \ndxbold{var} or \ndxbold{tvar} describes an unknown, which is internally
a pair of base name and index (\ML\ type \mltydx{indexname}). These
components are either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
run together as in {\tt ?x1}. The latter form is possible if the base name
does not end with digits. If the index is 0, it may be dropped altogether:
{\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}.
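These conventions can be captured by a pair of small functions. The Python sketch below is illustrative only. The names {\tt read_indexname} and {\tt show_indexname} are hypothetical and do not refer to Isabelle's \ML\ functions. For printing, the sketch assumes the rule implied above: the dot is needed whenever the base name ends with a digit, and an index of~0 is otherwise dropped.

```python
import re
from typing import Tuple


def read_indexname(token: str) -> Tuple[str, int]:
    """Split an unknown such as ?x.1, ?x7.3, ?x1 or ?x into (base name, index)."""
    if not token.startswith("?"):
        raise ValueError(f"not an unknown: {token!r}")
    body = token[1:]
    dotted = re.fullmatch(r"(.+)\.([0-9]+)", body)
    if dotted:                                   # explicit form: base.index
        return dotted.group(1), int(dotted.group(2))
    # Run-together form: trailing digits, if any, are the index (default 0).
    plain = re.fullmatch(r"(.*?)([0-9]*)", body)
    base, digits = plain.group(1), plain.group(2)
    return base, (int(digits) if digits else 0)


def show_indexname(base: str, index: int) -> str:
    """Print (base, index) in a short form that reads back unchanged."""
    if base[-1].isdigit():
        return f"?{base}.{index}"    # trailing digits would merge with the index
    if index == 0:
        return f"?{base}"
    return f"?{base}{index}"


for tok in ["?x.1", "?x7.3", "?x1", "?x", "?'a3"]:
    print(tok, read_indexname(tok))
```

The same functions work for type unknowns such as {\tt ?'a3}, whose base name {\tt 'a} includes the quote.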

Tokens of class $xnum$ or $xstr$ are not used by the meta-logic.
Object-logics may provide numerals and string constants by adding appropriate
productions and translation functions.

\medskip
Although name tokens are returned from the lexer rather than the parser, it
is more logical to regard them as nonterminals. Delimiters, however, are
terminals; they are just syntactic sugar and contribute nothing to the
abstract syntax tree.


\subsection{*Inspecting the syntax}
\begin{ttbox}
syn_of              : theory -> Syntax.syntax
print_syntax        : theory -> unit
Syntax.print_syntax : Syntax.syntax -> unit
Syntax.print_gram   : Syntax.syntax -> unit
Syntax.print_trans  : Syntax.syntax -> unit
\end{ttbox}
The abstract type \mltydx{Syntax.syntax} allows manipulation of syntaxes
in \ML. You can display values of this type by calling the following
functions:
\begin{ttdescription}
\item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle
  theory~{\it thy} as an \ML\ value.

\item[\ttindexbold{print_syntax} $thy$] displays the syntax part of $thy$
  using {\tt Syntax.print_syntax}.

\item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all
  information contained in the syntax {\it syn}. The displayed output can
  be large. The following two functions are more selective.

\item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part
  of~{\it syn}, namely the lexicon, logical types and productions. These are
  discussed below.

\item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation
  part of~{\it syn}, namely the constants, parse/print macros and
  parse/print translations.
\end{ttdescription}

Let us demonstrate these functions by inspecting Pure's syntax. Even that
is too verbose to display in full.
\begin{ttbox}\index{*Pure theory}
Syntax.print_syntax (syn_of Pure.thy);
{\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
{\out logtypes: fun itself}
{\out prods:}
{\out   type = tid (1000)}
{\out   type = tvar (1000)}
{\out   type = id (1000)}
{\out   type = tid "::" sort[0] => "_ofsort" (1000)}
{\out   type = tvar "::" sort[0] => "_ofsort" (1000)}
{\out   \vdots}
\ttbreak
{\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots}
{\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"}
{\out   "_idtyp" "_lambda" "_tapp" "_tappl"}
{\out parse_rules:}
{\out parse_translation: "!!" "_K" "_abs" "_aprop"}
{\out print_translation: "all"}
{\out print_rules:}
{\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
\end{ttbox}

As you can see, the output is divided into labelled sections. The grammar
is represented by {\tt lexicon}, {\tt logtypes} and {\tt prods}. The rest
refers to syntactic translations and macro expansion. Here is an
explanation of the various sections.
\begin{description}
\item[{\tt lexicon}] lists the delimiters used for lexical
  analysis.\index{delimiters}

\item[{\tt logtypes}] lists the types that are syntactically treated the
  same as {\tt logic}. Thus types of object-logics (such as {\tt nat}) are
  automatically equipped with the standard syntax of
  $\lambda$-calculus.

\item[{\tt prods}] lists the \rmindex{productions} of the priority grammar.
  The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}.
  Each delimiter is quoted. Some productions are shown with {\tt =>} and
  an attached string. These strings later become the heads of parse
  trees; they also play a vital role when terms are printed (see
  \S\ref{sec:asts}).

  Productions with no strings attached are called {\bf copy
    productions}\indexbold{productions!copy}. Their right-hand side must
  have exactly one nonterminal symbol (or name token). The parser does
  not create a new parse tree node for copy productions, but simply
  returns the parse tree of the right-hand symbol.

  If the right-hand side consists of a single nonterminal with no
  delimiters, then the copy production is called a {\bf chain
    production}. Chain productions act as abbreviations:
  conceptually, they are removed from the grammar by adding new
  productions. Priority information attached to chain productions is
  ignored; only the dummy value $-1$ is displayed.

\item[{\tt consts}, {\tt parse_rules}, {\tt print_rules}]
  relate to macros (see \S\ref{sec:macros}).

\item[{\tt parse_ast_translation}, {\tt print_ast_translation}]
  list sets of constants that invoke translation functions for abstract
  syntax trees. \S\ref{sec:asts} below discusses this obscure
  matter.\index{constants!for translations}

\item[{\tt parse_translation}, {\tt print_translation}] list sets
  of constants that invoke translation functions for terms (see
  \S\ref{sec:tr_funs}).
\end{description}
\index{syntax!Pure|)}


\section{Mixfix declarations} \label{sec:mixfix}
\index{mixfix declarations|(}

When defining a theory, you declare new constants by giving their names,
their type, and an optional {\bf mixfix annotation}. Mixfix annotations
allow you to extend Isabelle's basic $\lambda$-calculus syntax with
readable notation. They can express any context-free priority grammar.
Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
general than the priority declarations of \ML\ and Prolog.

A mixfix annotation defines a production of the priority grammar. It
describes the concrete syntax, the translation to abstract syntax, and the
pretty printing. Special-case annotations provide a simple means of
specifying infix operators and binders.

\subsection{The general mixfix form}
Here is a detailed account of mixfix declarations. Suppose the following
line occurs within a {\tt consts} or {\tt syntax} section of a {\tt .thy}
file:
\begin{center}
  {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}
\end{center}
This constant declaration and mixfix annotation are interpreted as follows:
\begin{itemize}\index{productions}
\item The string {\tt $c$} is the name of the constant associated with the
  production; unless it is a valid identifier, it must be enclosed in
  quotes. If $c$ is empty (given as~{\tt ""}) then this is a copy
  production.\index{productions!copy} Otherwise, parsing an instance of the
  phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
    $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
  argument.

\item The constant $c$, if non-empty, is declared to have type $\sigma$
  ({\tt consts} section only).

\item The string $template$ specifies the right-hand side of
  the production. It has the form
  \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \]
  where each occurrence of {\tt_} denotes an argument position and
  the~$w@i$ do not contain~{\tt _}. (If you want a literal~{\tt _} in
  the concrete syntax, you must escape it as described below.) The $w@i$
  may consist of \rmindex{delimiters}, spaces or
  \rmindex{pretty printing} annotations (see below).

\item The type $\sigma$ specifies the production's nonterminal symbols
  (or name tokens). If $template$ is of the form above then $\sigma$
  must be a function type with at least~$n$ argument positions, say
  $\sigma = [\tau@1, \dots, \tau@n] \To \tau$. Nonterminal symbols are
  derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described
  below. Any of these may be function types.

\item The optional list~$ps$ may contain at most $n$ integers, say {\tt
    [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
  priority\indexbold{priorities} required of any phrase that may appear
  as the $i$-th argument. Missing priorities default to~0.

\item The integer $p$ is the priority of this production. If omitted, it
  defaults to the maximal priority.
  Priorities range between 0 and \ttindexbold{max_pri} (= 1000).
\end{itemize}
%
The resulting production is \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\,
A@2^{(p@2)}\, \dots\, A@n^{(p@n)}\, w@n \] where $A$ and the $A@i$ are the
nonterminals corresponding to the types $\tau$ and $\tau@i$ respectively.
The nonterminal symbol associated with a type $(\ldots)ty$ is {\tt logic} if
this is a logical type (namely one of class {\tt logic} excluding {\tt
  prop}). Otherwise it is $ty$ (note that only the outermost type constructor
is taken into account). Finally, the nonterminal of a type variable is {\tt
  any}.
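The recipe above is mechanical enough to state as a short program. The Python sketch below is not part of Isabelle; it turns a declaration into a production written in the style of {\tt Syntax.print_gram}. It assumes that the template contains only delimiters, spaces and argument positions (no escapes or pretty printing directives), that $\sigma$ has already been split into exactly $n$ argument types and a result type, that types are given as strings, and that the logical type constructors are passed in explicitly. All function names are hypothetical.

```python
from typing import List, Sequence

MAX_PRI = 1000


def _top_level_arrow(ty: str) -> bool:
    """Does ty contain => outside all parentheses and brackets?"""
    depth = 0
    for i, c in enumerate(ty):
        if c in "([":
            depth += 1
        elif c in ")]":
            depth -= 1
        elif depth == 0 and ty.startswith("=>", i):
            return True
    return False


def nonterminal(ty: str, logtypes: Sequence[str]) -> str:
    """Nonterminal for a type; only the outermost type constructor counts."""
    ty = ty.strip()
    if _top_level_arrow(ty):
        ctor = "fun"
    else:
        words = ty.replace(")", ") ").split()   # "(nat => nat) list" ends in "list"
        ctor = words[-1]
        if len(words) == 1 and ctor.startswith("'"):
            return "any"                        # a type variable
    return "logic" if ctor in logtypes and ctor != "prop" else ctor


def mixfix_production(const: str, arg_types: List[str], result_type: str,
                      template: str, ps: Sequence[int] = (), p: int = MAX_PRI,
                      logtypes: Sequence[str] = ("fun", "itself")) -> str:
    """Render the production A^(p) = w0 A1^(p1) w1 ... An^(pn) wn."""
    pieces = template.split("_")                # w0, w1, ..., wn
    n = len(pieces) - 1
    if n != len(arg_types) or len(ps) > n:
        raise ValueError("template, argument types and priorities disagree")
    prios = list(ps) + [0] * (n - len(ps))      # missing priorities default to 0
    rhs = []
    for i, piece in enumerate(pieces):
        rhs += [f'"{d}"' for d in piece.split()]     # delimiters are quoted
        if i < n:
            rhs.append(f"{nonterminal(arg_types[i], logtypes)}[{prios[i]}]")
    head = f' => "{const}"' if const else ""         # copy productions have no head
    return f"{nonterminal(result_type, logtypes)} = {' '.join(rhs)}{head} ({p})"


print(mixfix_production("+", ["exp", "exp"], "exp", "_ + _", [0, 1], 0))
```

For the four declarations of the {\tt EXP} example in this section, the function yields the same productions, up to spacing, as the listing that {\tt Syntax.print_gram} displays there. The default for {\tt logtypes} mirrors the {\tt logtypes} of Pure shown by {\tt Syntax.print_syntax}.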

\begin{warn}
  Theories must sometimes declare types for purely syntactic purposes,
  where they merely play the role of nonterminals. One example is \tydx{type}, the
  built-in type of types. This is a `type of all types' in the syntactic
  sense only. Do not declare such types under {\tt arities} as belonging to
  class {\tt logic}\index{*logic class}, for that would make them useless as
  separate nonterminal symbols.
\end{warn}

Associating nonterminals with types allows a constant's type to specify
syntax as well. We can declare the function~$f$ to have type $[\tau@1,
\ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the layout
of the function's $n$ arguments. The constant's name, in this case~$f$, will
also serve as the label in the abstract syntax tree.

You may also declare mixfix syntax without adding constants to the theory's
signature, by using a {\tt syntax} section instead of {\tt consts}. Thus a
production need not map directly to a logical function (this typically
requires additional syntactic translations; see
Chapter~\ref{chap:syntax}).


\medskip
As a special case of the general mixfix declaration, the form
\begin{center}
  {\tt $c$ ::\ "$\sigma$" ("$template$")}
\end{center}
specifies no priorities. The resulting production puts no priority
constraints on any of its arguments and has maximal priority itself.
Omitting priorities in this manner is prone to syntactic ambiguities unless
the production's right-hand side is fully bracketed, as in
\verb|"if _ then _ else _ fi"|.

Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},
is sensible only if~$c$ is an identifier. Otherwise you will be unable to
write terms involving~$c$.


\subsection{Example: arithmetic expressions}
\index{examples!of mixfix declarations}
This theory specification contains a {\tt syntax} section with mixfix
declarations encoding the priority grammar from
\S\ref{sec:priority_grammars}:
\begin{ttbox}
EXP = Pure +
types
  exp
syntax
  "0" :: exp                 ("0"      9)
  "+" :: [exp, exp] => exp   ("_ + _"  [0, 1] 0)
  "*" :: [exp, exp] => exp   ("_ * _"  [3, 2] 2)
  "-" :: exp => exp          ("- _"    [3] 3)
end
\end{ttbox}
|
wenzelm@864
|
496 |
If you put this into a file {\tt EXP.thy} and load it via {\tt use_thy"EXP"},
|
wenzelm@864
|
497 |
you can run some tests:
|
lcp@320
|
498 |
\begin{ttbox}
|
lcp@320
|
499 |
val read_exp = Syntax.test_read (syn_of EXP.thy) "exp";
|
lcp@320
|
500 |
{\out val it = fn : string -> unit}
|
lcp@320
|
501 |
read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0";
|
lcp@320
|
502 |
{\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}
|
lcp@320
|
503 |
{\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}
|
lcp@320
|
504 |
{\out \vdots}
|
lcp@320
|
505 |
read_exp "0 + - 0 + 0";
|
lcp@320
|
506 |
{\out tokens: "0" "+" "-" "0" "+" "0"}
|
lcp@320
|
507 |
{\out raw: ("+" ("+" "0" ("-" "0")) "0")}
|
lcp@320
|
508 |
{\out \vdots}
|
lcp@320
|
509 |
\end{ttbox}
|
lcp@320
|
510 |
The output of \ttindex{Syntax.test_read} includes the token list ({\tt
|
lcp@320
|
511 |
tokens}) and the raw \AST{} directly derived from the parse tree,
|
lcp@320
|
512 |
ignoring parse \AST{} translations. The rest is tracing information
|
lcp@320
|
513 |
provided by the macro expander (see \S\ref{sec:macros}).
|

Executing {\tt Syntax.print_gram} reveals the productions derived from the
above mixfix declarations (lots of additional information deleted):
\begin{ttbox}
Syntax.print_gram (syn_of EXP.thy);
{\out exp = "0" => "0" (9)}
{\out exp = exp[0] "+" exp[1] => "+" (0)}
{\out exp = exp[3] "*" exp[2] => "*" (2)}
{\out exp = "-" exp[3] => "-" (3)}
\end{ttbox}
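
In this simple case the correspondence between a mixfix declaration and its
production is mechanical.  The following Python sketch renders the
production, in the format printed above, for a declaration whose template
items are separated by spaces and contain no pretty printing directives.
The function {\tt production} and its argument order are hypothetical;
Isabelle's own grammar construction handles more cases than this.
\begin{ttbox}
def production(const, arg_types, res_type, template, arg_pris, pri):
    """Render the grammar production for a simple mixfix declaration."""
    items = template.split()
    if len(arg_types) != len(arg_pris) or items.count("_") != len(arg_types):
        raise ValueError("types, priorities and '_' positions must agree")
    args = list(zip(arg_types, arg_pris))
    rhs = []
    for item in items:
        if item == "_":
            ty, p = args.pop(0)
            rhs.append("%s[%d]" % (ty, p))   # nonterminal with priority
        else:
            rhs.append('"' + item + '"')     # delimiter token
    return '%s = %s => "%s" (%d)' % (res_type, " ".join(rhs), const, pri)
\end{ttbox}
For example, {\tt production("*", ["exp", "exp"], "exp", "_ * _", [3, 2], 2)}
yields {\tt exp = exp[3] "*" exp[2] => "*" (2)}.
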

Note that because {\tt exp} is not of class {\tt logic}, it has been retained
as a separate nonterminal.  This also means that the syntax provides neither
identifiers nor parenthesized expressions.  Normally you would also want to
add the declaration {\tt arities exp :: logic} and use {\tt consts} instead
of {\tt syntax}.  Try this as an exercise and study the changes in the
grammar.

\subsection{The mixfix template}
Let us now take a closer look at the string $template$ appearing in mixfix
annotations.  This string specifies a list of parsing and printing
directives: delimiters\index{delimiters}, arguments, spaces, blocks of
indentation and line breaks.  These are encoded by the following character
sequences:
\index{pretty printing|(}
\begin{description}
\item[~$d$~] is a delimiter, namely a non-empty sequence of characters
  other than the special characters {\tt _}, {\tt(}, {\tt)} and~{\tt/}.
  Even these characters may appear if escaped, that is, preceded by
  a~{\tt '} (single quote).  Thus you have to write {\tt ''} if you really
  want a single quote.  Furthermore, a~{\tt '} followed by a space separates
  delimiters without adding extra white space for printing.

\item[~{\tt_}~] is an argument position, which stands for a nonterminal symbol
  or name token.

\item[~$s$~] is a non-empty sequence of spaces for printing.  This and the
  following specifications do not affect parsing at all.

\item[~{\tt(}$n$~] opens a pretty printing block.  The optional number $n$
  specifies how much indentation to add when a line break occurs within the
  block.  If {\tt(} is not followed by digits, the indentation defaults
  to~0.

\item[~{\tt)}~] closes a pretty printing block.

\item[~{\tt//}~] forces a line break.

\item[~{\tt/}$s$~] allows a line break.  Here $s$ stands for the string of
  spaces (zero or more) right after the {\tt /} character.  These spaces
  are printed if the break is not taken.
\end{description}
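
Scanning a template into these directives can be sketched in Python as
follows.  This is a simplified model under stated assumptions, not
Isabelle's scanner: the function {\tt scan_template} and the tuple encoding
of directives are hypothetical, and a quote followed by any character other
than a space is assumed to escape that character.
\begin{ttbox}
def scan_template(t):
    """Split a mixfix template into a list of directive tuples."""
    out, delim, i = [], "", 0

    def flush():
        # Emit the delimiter collected so far, if any.
        nonlocal delim
        if delim:
            out.append(("delim", delim))
            delim = ""

    def run(j, ch):
        # Index just past a run of ch characters starting at j.
        while j < len(t) and t[j] == ch:
            j += 1
        return j

    while i < len(t):
        c = t[i]
        if c == "'" and i + 1 < len(t):
            if t[i + 1] == " ":        # quote-space: end the delimiter
                flush()
            else:                      # quote escapes the next character
                delim += t[i + 1]
            i += 2
        elif c in "_()/ ":
            flush()
            if c == "_":
                out.append(("arg",))
                i += 1
            elif c == "(":             # block with optional indentation
                j = i + 1
                while j < len(t) and t[j].isdigit():
                    j += 1
                out.append(("block", int(t[i + 1:j] or "0")))
                i = j
            elif c == ")":
                out.append(("end",))
                i += 1
            elif t[i + 1:i + 2] == "/":    # "//": forced break
                out.append(("fbreak",))
                i += 2
            elif c == "/":                 # optional break + its spaces
                j = run(i + 1, " ")
                out.append(("break", j - i - 1))
                i = j
            else:                          # printing spaces
                j = run(i, " ")
                out.append(("space", j - i))
                i = j
        else:
            delim += c
            i += 1
    flush()
    return out
\end{ttbox}
Under this model, {\tt scan_template("(_ +/ _)")} yields a block with
indentation~0, an argument, one space, the delimiter {\tt +}, a break
printed as one space, a second argument and the end of the block.
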
For example, the template {\tt"(_ +/ _)"} specifies an infix operator.
There are two argument positions; the delimiter~{\tt+} is preceded by a
space and followed by a space or line break; the entire phrase is a pretty
printing block.  Other examples appear in Fig.\ts\ref{fig:set_trans} below.
Isabelle's pretty printer resembles the one described in
Paulson~\cite{paulson91}.
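
How blocks and breaks interact can be illustrated by a small Python pretty
printer in the spirit of the one cited above: an optional break is taken
only if the text up to the next break (or the end of the enclosing block)
does not fit on the current line, and a taken break indents to the column
where the block started plus the block's indentation.  This is a simplified
sketch, not Isabelle's implementation; the layout encoding and the names
{\tt pretty}, {\tt breakdist} and {\tt plus} are hypothetical, and forced
breaks ({\tt//}) are not modelled.
\begin{ttbox}
NEWLINE = chr(10)


# Layout trees: ("str", text) | ("brk", n) | ("block", indent, items)
def length(e):
    if e[0] == "str":
        return len(e[1])
    if e[0] == "brk":
        return e[1]
    return sum(length(x) for x in e[2])


def breakdist(es, after):
    """Width of es up to its first break, plus after if none occurs."""
    total = 0
    for e in es:
        if e[0] == "brk":
            return total
        total += length(e)
    return total + after


def pretty(doc, margin):
    out = []
    space = margin                       # columns left on this line

    def printing(es, blockspace, after):
        # margin - blockspace is the indentation for new lines here.
        nonlocal space
        for i, e in enumerate(es):
            rest = es[i + 1:]
            if e[0] == "block":
                printing(e[2], space - e[1], breakdist(rest, after))
            elif e[0] == "str":
                out.append(e[1])
                space -= len(e[1])
            elif e[1] + breakdist(rest, after) <= space:
                out.append(" " * e[1])   # break not taken: print spaces
                space -= e[1]
            else:                        # break taken: newline and indent
                indent = margin - blockspace
                out.append(NEWLINE + " " * indent)
                space = margin - indent

    printing([doc], margin, 0)
    return "".join(out)


def plus(x, y):
    """Layout corresponding to the template "(_ +/ _)"."""
    return ("block", 0, [x, ("str", " +"), ("brk", 1), y])
\end{ttbox}
With {\tt t = plus(plus(("str", "aaaa"), ("str", "bbbb")), ("str", "cccc"))},
a margin of 40 yields the single line {\tt aaaa + bbbb + cccc}, whereas a
margin of 12 takes only the first break, giving {\tt aaaa +} on one line and
{\tt bbbb + cccc} on the next.
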

\index{pretty printing|)}


\subsection{Infixes}
\indexbold{infixes}

Infix operators associating to the left or right can be declared
using {\tt infixl} or {\tt infixr}.
Roughly speaking, the form {\tt $c$ ::\ $\sigma$ (infixl $p$)}
abbreviates the mixfix declarations
\begin{ttbox}
"op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
"op \(c\)" :: \(\sigma\)   ("op \(c\)")
\end{ttbox}
and {\tt $c$ ::\ $\sigma$ (infixr $p$)} abbreviates the mixfix declarations
\begin{ttbox}
"op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
"op \(c\)" :: \(\sigma\)   ("op \(c\)")
\end{ttbox}
The infix operator is declared as a constant with the prefix {\tt op}.
Thus, prefixing infixes with \sdx{op} makes them behave like ordinary
function symbols, as in \ML.  Special characters occurring in~$c$ must be
escaped, as in delimiters, using a single quote.
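
The expansion of {\tt infixl} and {\tt infixr} described above is simple
enough to write down directly.  The following Python sketch produces the two
declarations as strings.  The functions {\tt escape} and {\tt expand_infix}
are hypothetical, and escaping every occurrence of {\tt _}, {\tt(}, {\tt)},
{\tt/} and {\tt'} inside the templates is our reading of the rule above,
not a transcription of Isabelle's code.
\begin{ttbox}
def escape(c):
    """Quote the template special characters _ ( ) / and ' in c."""
    return "".join("'" + ch if ch in "_()/'" else ch for ch in c)


def expand_infix(c, sigma, assoc, p):
    """Return the mixfix declarations abbreviated by (infixl/infixr p)."""
    if assoc == "infixl":
        pris = [p, p + 1]          # left operand may have priority p
    elif assoc == "infixr":
        pris = [p + 1, p]          # right operand may have priority p
    else:
        raise ValueError("assoc must be 'infixl' or 'infixr'")
    name, d = '"op %s"' % c, escape(c)
    return ['%s :: %s ("(_ %s/ _)" %s %d)' % (name, sigma, d, pris, p),
            '%s :: %s ("op %s")' % (name, sigma, d)]
\end{ttbox}
For the implication {\tt -->} declared with {\tt infixr 10} in the minimal
logics of \S\ref{sec:min_logics}, the sketch produces the template
{\tt "(_ -->/ _)"} with argument priorities {\tt [11, 10]}, so that
{\tt P --> Q --> R} parses as {\tt P --> (Q --> R)}.
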


\subsection{Binders}
\indexbold{binders}
\begingroup
\def\Q{{\cal Q}}
A {\bf binder} is a variable-binding construct such as a quantifier.  The
constant declaration
\begin{ttbox}
\(c\) :: \(\sigma\)   (binder "\(\Q\)" [\(pb\)] \(p\))
\end{ttbox}
introduces a constant~$c$ of type~$\sigma$, which must have the form
$(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where
$x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
and the whole term has type~$\tau@3$.  The optional integer $pb$
specifies the body's priority, by default~$p$.  Special characters
in $\Q$ must be escaped using a single quote.

The declaration is expanded internally to something like
\begin{ttbox}
\(c\)\hskip3pt :: (\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)
"\(\Q\)" :: [idts, \(\tau@2\)] => \(\tau@3\)   ("(3\(\Q\)_./ _)" [0, \(pb\)] \(p\))
\end{ttbox}
Here \ndx{idts} is the nonterminal symbol for a list of identifiers with
\index{type constraints}
optional type constraints (see Fig.\ts\ref{fig:pure_gram}).  The
declaration also installs a parse translation\index{translations!parse}
for~$\Q$ and a print translation\index{translations!print} for~$c$ to
translate between the internal and external forms.
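
The internal expansion can be phrased as a small Python function, mainly to
make the default for the body priority explicit.  This is an illustrative
sketch; the function {\tt expand_binder} is hypothetical, and it assumes
that $\Q$ contains no characters that need escaping.
\begin{ttbox}
def expand_binder(c, tau1, tau2, tau3, q, p, pb=None):
    """Declarations behind  c :: (tau1 => tau2) => tau3 (binder q [pb] p)."""
    if pb is None:
        pb = p                         # body priority defaults to p
    return ["%s :: (%s => %s) => %s" % (c, tau1, tau2, tau3),
            '"%s" :: [idts, %s] => %s ("(3%s_./ _)" [0, %d] %d)'
            % (q, tau2, tau3, q, pb, p)]
\end{ttbox}
For a binder {\tt All} with notation {\tt "ALL "} and priority~10, the
sketch yields the template {\tt "(3ALL _./ _)"} with argument priorities
{\tt [0, 10]}.
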

A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a
list of variables.  The external form $\Q~x@1~x@2 \ldots x@n. P$
corresponds to the internal form
\[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]
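
The parse translation for nested binders amounts to a fold over the variable
list from the right.  The Python sketch below illustrates this on terms
represented as plain strings, writing {\tt \%$x$.$P$} for $\lambda x.P$; the
function {\tt binder_to_internal} is hypothetical and stands in for the
translation that Isabelle installs.
\begin{ttbox}
def binder_to_internal(c, variables, body):
    """Turn  Q x1 ... xn. P  into  c(%x1. c(%x2. ... c(%xn. P)...))."""
    term = body
    for x in reversed(variables):      # wrap the innermost binder first
        term = "%s(%%%s. %s)" % (c, x, term)
    return term
\end{ttbox}
Thus {\tt binder_to_internal("All", ["x", "y"], "P(x, y)")} gives
{\tt All(\%x. All(\%y. P(x, y)))}.
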

\medskip
For example, let us declare the quantifier~$\forall$:\index{quantifiers}
\begin{ttbox}
All :: ('a => o) => o   (binder "ALL " 10)
\end{ttbox}
This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
  $x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall
back on ${\tt All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL
  $x$.$P$} have type~$o$, the type of formulae, while the bound variable
can be polymorphic.
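
Conversely, the print translation must recognize the nested form and fall
back on ${\tt All}(P)$ when the argument is not an abstraction.  The
following Python sketch shows this behaviour on a toy term representation;
the tuples {\tt ("app", f, arg)} and {\tt ("abs", x, body)} and the
functions {\tt is_all_abs} and {\tt show_term} are hypothetical and only
stand in for Isabelle's terms and print translations.
\begin{ttbox}
def is_all_abs(t):
    """Does t have the form All(%x. body)?"""
    return (isinstance(t, tuple) and t[0] == "app" and t[1] == "All"
            and isinstance(t[2], tuple) and t[2][0] == "abs")


def show_term(t):
    """Print a toy term, folding nested All-abstractions into ALL x y. P."""
    if isinstance(t, str):
        return t
    if t[0] == "abs":
        return "%" + t[1] + ". " + show_term(t[2])
    xs, body = [], t
    while is_all_abs(body):
        xs.append(body[2][1])          # bound variable name
        body = body[2][2]              # body of the abstraction
    if xs:
        return "ALL " + " ".join(xs) + ". " + show_term(body)
    # Ordinary application, e.g. the fallback form All(P).
    return t[1] + "(" + show_term(t[2]) + ")"
\end{ttbox}
Applied to {\tt ("app", "All", "P")}, where the argument is not an
abstraction, the sketch prints {\tt All(P)}; applied to two nested
{\tt All}-abstractions over {\tt x} and {\tt y}, it prints {\tt ALL x y. P}.
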
\endgroup

\index{mixfix declarations|)}

\section{Ambiguity of parsed expressions} \label{sec:ambiguity}
\index{ambiguity!of parsed expressions}

To keep the grammar small and allow common productions to be shared,
all logical types (except {\tt prop}) are internally represented
by one nonterminal, namely {\tt logic}.  This, together with omitted or
too freely chosen priorities, may allow an expression to be parsed in ways
that the theory's author did not intend.  In most cases Isabelle can choose
among the resulting parse trees by checking which of them can be typed
correctly.  However, this may not work in every case, and it always slows
down parsing.  The warning and error messages that can be produced during
this process are as follows:

If an ambiguity can be resolved by type inference, the following
warning is shown to remind the user that parsing is (unnecessarily)
slowed down.  In cases where the ambiguity cannot easily be eliminated,
the frequency of the warning can be controlled by changing
the value of {\tt Syntax.ambiguity_level}, which has type {\tt int
  ref}.  Its default value is~1; by increasing it one can control how
many parse trees are necessary to generate the warning.

\begin{ttbox}
{\out Warning: Ambiguous input "..."}
{\out produces the following parse trees:}
{\out ...}
{\out Fortunately, only one parse tree is type correct.}
{\out It helps (speed!) if you disambiguate your grammar or your input.}
\end{ttbox}

The following message is normally caused by using the same
syntax in two different productions:

\begin{ttbox}
{\out Warning: Ambiguous input "..."}
{\out produces the following parse trees:}
{\out ...}
{\out Error: More than one term is type correct:}
{\out ...}
\end{ttbox}

Ambiguities occurring in syntax translation rules cannot be resolved by
type inference, because these rules need not be type correct.  Therefore
Isabelle always generates an error message, and the ambiguity should be
eliminated by changing the grammar or the rule.
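
The decision procedure described in this section can be summarised by a
small Python sketch.  It is a model under stated assumptions, not Isabelle's
code: the function {\tt disambiguate}, its parameters and the exception
types are hypothetical; {\tt ambiguity_level} is assumed to mean that the
warning appears only when the number of parse trees exceeds it; and the case
where no parse tree is type correct, which is not discussed above, is simply
reported as an error.
\begin{ttbox}
def disambiguate(trees, type_correct, ambiguity_level=1, in_rule=False):
    """Pick the parse tree to use; return (tree, list_of_warnings)."""
    if not trees:
        raise SyntaxError("no parse tree")
    if len(trees) == 1:
        return trees[0], []
    if in_rule:
        # Translation rules need not be type correct: always an error.
        raise SyntaxError("ambiguous translation rule")
    good = [t for t in trees if type_correct(t)]
    if len(good) > 1:
        raise TypeError("More than one term is type correct")
    if not good:
        raise TypeError("no parse tree is type correct")
    warnings = []
    if len(trees) > ambiguity_level:   # assumed meaning of the level
        warnings.append("Ambiguous input: only one parse tree is type correct")
    return good[0], warnings
\end{ttbox}
In this model, two parse trees of which only one is type correct produce a
warning at the default level~1 but none at level~2, while two type correct
trees always raise an error.
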


\section{Example: some minimal logics} \label{sec:min_logics}
\index{examples!of logic definitions}

This section presents some examples that have a simple syntax.  They
demonstrate how to define new object-logics from scratch.

First we must define how an object-logic syntax is embedded into the
meta-logic.  Since all theorems must conform to the syntax for~\ndx{prop}
(see Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the
object-level syntax.  Assume that the syntax of your object-logic defines a
meta-type~\tydx{o} of formulae which refers to the nonterminal {\tt logic}.
These formulae can now appear in axioms and theorems wherever \ndx{prop} does
if you add the production
\[ prop ~=~ logic. \]
This is not supposed to be a copy production but an implicit coercion from
formulae to propositions:
\begin{ttbox}
Base = Pure +
types
  o
arities
  o :: logic
consts
  Trueprop :: o => prop   ("_" 5)
end
\end{ttbox}
The constant \cdx{Trueprop} (the name is arbitrary) acts as an invisible
coercion function.  Assuming this definition resides in a file {\tt Base.thy},
you have to load it with the command {\tt use_thy "Base"}.

One of the simplest nontrivial logics is {\bf minimal logic} of
implication.  Its definition in Isabelle needs no advanced features but
illustrates the overall mechanism nicely:
\begin{ttbox}
Hilbert = Base +
consts
  "-->" :: [o, o] => o   (infixr 10)
rules
  K     "P --> Q --> P"
  S     "(P --> Q --> R) --> (P --> Q) --> P --> R"
  MP    "[| P --> Q; P |] ==> Q"
end
\end{ttbox}
After loading this definition from the file {\tt Hilbert.thy}, you can
start to prove theorems in the logic:
\begin{ttbox}
goal Hilbert.thy "P --> P";
{\out Level 0}
{\out P --> P}
{\out 1. P --> P}
\ttbreak
by (resolve_tac [Hilbert.MP] 1);
{\out Level 1}
{\out P --> P}
{\out 1. ?P --> P --> P}
{\out 2. ?P}
\ttbreak
by (resolve_tac [Hilbert.MP] 1);
{\out Level 2}
{\out P --> P}
{\out 1. ?P1 --> ?P --> P --> P}
{\out 2. ?P1}
{\out 3. ?P}
\ttbreak
by (resolve_tac [Hilbert.S] 1);
{\out Level 3}
{\out P --> P}
{\out 1. P --> ?Q2 --> P}
{\out 2. P --> ?Q2}
\ttbreak
by (resolve_tac [Hilbert.K] 1);
{\out Level 4}
{\out P --> P}
{\out 1. P --> ?Q2}
\ttbreak
by (resolve_tac [Hilbert.K] 1);
{\out Level 5}
{\out P --> P}
{\out No subgoals!}
\end{ttbox}
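
Read forwards, the proof above derives {\tt P --> P} from one instance of
{\tt S} and two instances of {\tt K} by two applications of {\tt MP}, much
like the combinator identity $I = S\,K\,K$; here we choose {\tt P --> P} for
{\tt ?Q2}.  The following Python sketch replays that forward derivation on
formulae represented as nested tuples.  The helpers {\tt imp}, {\tt K},
{\tt S}, {\tt mp} and {\tt show} are hypothetical and unrelated to
Isabelle's ML functions.
\begin{ttbox}
def imp(a, b):
    return ("-->", a, b)


def K(p, q):                  # instance of axiom K: P --> Q --> P
    return imp(p, imp(q, p))


def S(p, q, r):               # instance of axiom S
    return imp(imp(p, imp(q, r)), imp(imp(p, q), imp(p, r)))


def mp(major, minor):
    """Rule MP: from A --> B and A, infer B."""
    if major[0] != "-->" or major[1] != minor:
        raise ValueError("MP does not apply")
    return major[2]


def show(f):
    if isinstance(f, str):
        return f
    return "(" + show(f[1]) + " --> " + show(f[2]) + ")"


P = "P"
# (P --> (P --> P) --> P) --> (P --> P --> P) --> P --> P
step1 = S(P, imp(P, P), P)
step2 = mp(step1, K(P, imp(P, P)))    # (P --> P --> P) --> P --> P
step3 = mp(step2, K(P, P))            # P --> P
print(show(step3))
\end{ttbox}
Running it prints {\tt (P --> P)}.
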
As we can see, this Hilbert-style formulation of minimal logic is easy to
define but difficult to use.  The following natural deduction formulation is
better:
\begin{ttbox}
MinI = Base +
consts
  "-->" :: [o, o] => o   (infixr 10)
rules
  impI  "(P ==> Q) ==> P --> Q"
  impE  "[| P --> Q; P |] ==> Q"
end
\end{ttbox}
Note, however, that although the two systems are equivalent, this fact
cannot be proved within Isabelle.  Axioms {\tt S} and {\tt K} can be
derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt
  Hilbert}.  The reason is that {\tt impI} is only an {\bf admissible} rule
in {\tt Hilbert}, something that can only be shown by induction over all
possible proofs in {\tt Hilbert}.

We may easily extend minimal logic with falsity:
\begin{ttbox}
MinIF = MinI +
consts
  False :: o
rules
  FalseE "False ==> P"
end
\end{ttbox}
On the other hand, we may wish to introduce conjunction only:
\begin{ttbox}
MinC = Base +
consts
  "&" :: [o, o] => o   (infixr 30)
\ttbreak
rules
  conjI  "[| P; Q |] ==> P & Q"
  conjE1 "P & Q ==> P"
  conjE2 "P & Q ==> Q"
end
\end{ttbox}
And if we want to have all three connectives together, we create and load a
theory file consisting of a single line:\footnote{We can combine the
  theories without creating a theory file using the ML declaration
\begin{ttbox}
val MinIFC_thy = merge_theories(MinIF.thy, MinC.thy)
\end{ttbox}
\index{*merge_theories|fnote}}
\begin{ttbox}
MinIFC = MinIF + MinC
\end{ttbox}
Now we can prove mixed theorems like
\begin{ttbox}
goal MinIFC.thy "P & False --> Q";
by (resolve_tac [MinI.impI] 1);
by (dresolve_tac [MinC.conjE2] 1);
by (eresolve_tac [MinIF.FalseE] 1);
\end{ttbox}
Try this as an exercise!