wneuper/isa: doc-src/Ref/defining.tex@47f79ad602d9

     1 %% $Id$

     2 \chapter{Defining Logics} \label{Defining-Logics}

     3 This chapter explains how to define new formal systems --- in particular,

     4 their concrete syntax.  While Isabelle can be regarded as a theorem prover

     5 for set theory, higher-order logic or the sequent calculus, its

     6 distinguishing feature is support for the definition of new logics.

     8 Isabelle logics are hierarchies of theories, which are described and

     9 illustrated in

    10 \iflabelundefined{sec:defining-theories}{{\em Introduction to Isabelle}}%

    11 {\S\ref{sec:defining-theories}}.  That material, together with the theory

    12 files provided in the examples directories, should suffice for all simple

    13 applications.  The easiest way to define a new theory is by modifying a

    14 copy of an existing theory.

    16 This chapter documents the meta-logic syntax, mixfix declarations and

    17 pretty printing.  The extended examples in \S\ref{sec:min_logics}

    18 demonstrate the logical aspects of the definition of theories.

    21 \section{Priority grammars} \label{sec:priority_grammars}

    22 \index{priority grammars|(}

    24 A context-free grammar contains a set of {\bf nonterminal symbols}, a set of

    25 {\bf terminal symbols} and a set of {\bf productions}\index{productions}.

    26 Productions have the form ${A=\gamma}$, where $A$ is a nonterminal and

    27 $\gamma$ is a string of terminals and nonterminals.  One designated

    28 nonterminal is called the {\bf start symbol}.  The language defined by the

    29 grammar consists of all strings of terminals that can be derived from the

    30 start symbol by applying productions as rewrite rules.

    32 The syntax of an Isabelle logic is specified by a {\bf priority

    33   grammar}.\index{priorities} Each nonterminal is decorated by an integer

    34 priority, as in~$A^{(p)}$.  A nonterminal $A^{(p)}$ in a derivation may be

    35 rewritten using a production $A^{(q)} = \gamma$ only if~$p \le q$.  Any

    36 priority grammar can be translated into a normal context free grammar by

    37 introducing new nonterminals and productions.

    39 Formally, a set of context free productions $G$ induces a derivation

    40 relation $\longrightarrow@G$.  Let $\alpha$ and $\beta$ denote strings of

    41 terminal or nonterminal symbols.  Then

    42 \[ \alpha\, A^{(p)}\, \beta ~\longrightarrow@G~ \alpha\,\gamma\,\beta \]

    43 if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$p \le q$.

    45 The following simple grammar for arithmetic expressions demonstrates how

    46 binding power and associativity of operators can be enforced by priorities.

    47 \begin{center}

    48 \begin{tabular}{rclr}

    49   $A^{(9)}$ & = & {\tt0} \\

    50   $A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\

    51   $A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\

    52   $A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\

    53   $A^{(3)}$ & = & {\tt-} $A^{(3)}$

    54 \end{tabular}

    55 \end{center}

    56 The choice of priorities determines that {\tt -} binds tighter than {\tt *},

    57 which binds tighter than {\tt +}.  Furthermore {\tt +} associates to the

    58 left and {\tt *} to the right.

    60 For clarity, grammars obey these conventions:

    61 \begin{itemize}

    62 \item All priorities must lie between~0 and \ttindex{max_pri}, which is a

    63   some fixed integer.  Sometimes {\tt max_pri} is written as $\infty$.

    64 \item Priority 0 on the right-hand side and priority \ttindex{max_pri} on

    65   the left-hand side may be omitted.

    66 \item The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$; the

    67   priority of the left-hand side actually appears in a column on the far

    68   right.

    69 \item Alternatives are separated by~$|$.

    70 \item Repetition is indicated by dots~(\dots) in an informal but obvious

    71   way.

    72 \end{itemize}

    74 Using these conventions and assuming $\infty=9$, the grammar

    75 takes the form

    76 \begin{center}

    77 \begin{tabular}{rclc}

    78 $A$ & = & {\tt0} & \hspace*{4em} \\

    79  & $|$ & {\tt(} $A$ {\tt)} \\

    80  & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\

    81  & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\

    82  & $|$ & {\tt-} $A^{(3)}$ & (3)

    83 \end{tabular}

    84 \end{center}

    85 \index{priority grammars|)}

    88 \begin{figure}\small

    89 \begin{center}

    90 \begin{tabular}{rclc}

    91 $any$ &=& $prop$ ~~$|$~~ $logic$ \\\\

    92 $prop$ &=& {\tt(} $prop$ {\tt)} \\

    93      &$|$& $prop^{(4)}$ {\tt::} $type$ & (3) \\

    94      &$|$& {\tt PROP} $aprop$ \\

    95      &$|$& $any^{(3)}$ {\tt ==} $any^{(2)}$ & (2) \\

    96      &$|$& $any^{(3)}$ {\tt =?=} $any^{(2)}$ & (2) \\

    97      &$|$& $prop^{(2)}$ {\tt ==>} $prop^{(1)}$ & (1) \\

    98      &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\

    99      &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\

   100      &$|$& {\tt OFCLASS} {\tt(} $type$ {\tt,} $logic$ {\tt)} \\\\

   101 $aprop$ &=& $id$ ~~$|$~~ $longid$ ~~$|$~~ $var$

   102     ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\\\

   103 $logic$ &=& {\tt(} $logic$ {\tt)} \\

   104       &$|$& $logic^{(4)}$ {\tt::} $type$ & (3) \\

   105       &$|$& $id$ ~~$|$~~ $longid$ ~~$|$~~ $var$

   106     ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\

   107       &$|$& {\tt \%} $pttrns$ {\tt.} $any^{(3)}$ & (3) \\

   108       &$|$& {\tt TYPE} {\tt(} $type$ {\tt)} \\\\

   109 $idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\

   110 $idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\

   111     &$|$& $id$ {\tt ::} $type$ & (0) \\\\

   112 $pttrns$ &=& $pttrn$ ~~$|$~~ $pttrn^{(1)}$ $pttrns$ \\\\

   113 $pttrn$ &=& $idt$ \\\\

   114 $type$ &=& {\tt(} $type$ {\tt)} \\

   115      &$|$& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$

   116        ~~$|$~~ $tvar$ {\tt::} $sort$ \\

   117      &$|$& $id$ ~~$|$~~ $type^{(\infty)}$ $id$

   118                 ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\

   119      &$|$& $longid$ ~~$|$~~ $type^{(\infty)}$ $longid$

   120                 ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $longid$ \\

   121      &$|$& $type^{(1)}$ {\tt =>} $type$ & (0) \\

   122      &$|$& {\tt[}  $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0) \\\\

   123 $sort$ &=& $id$ ~~$|$~~ $longid$ ~~$|$~~ {\tt\ttlbrace\ttrbrace} ~~$|$~~

   124   {\tt\ttlbrace} $id$ ~$|$~ $longid${\tt,}\dots{\tt,} $id$ ~$|$~$longid$ {\tt\ttrbrace}

   125 \end{tabular}

   126 \index{*PROP symbol}

   127 \index{*== symbol}\index{*=?= symbol}\index{*==> symbol}

   128 \index{*:: symbol}\index{*=> symbol}

   129 \index{sort constraints}

   130 %the index command: a percent is permitted, but braces must match!

   131 \index{%@{\tt\%} symbol}

   132 \index{{}@{\tt\ttlbrace} symbol}\index{{}@{\tt\ttrbrace} symbol}

   133 \index{*[ symbol}\index{*] symbol}

   134 \index{*"!"! symbol}

   135 \index{*"["| symbol}

   136 \index{*"|"] symbol}

   137 \end{center}

   138 \caption{Meta-logic syntax}\label{fig:pure_gram}

   139 \end{figure}

   142 \section{The Pure syntax} \label{sec:basic_syntax}

   143 \index{syntax!Pure|(}

   145 At the root of all object-logics lies the theory \thydx{Pure}.  It

   146 contains, among many other things, the Pure syntax.  An informal account of

   147 this basic syntax (types, terms and formulae) appears in

   148 \iflabelundefined{sec:forward}{{\em Introduction to Isabelle}}%

   149 {\S\ref{sec:forward}}.  A more precise description using a priority grammar

   150 appears in Fig.\ts\ref{fig:pure_gram}.  It defines the following

   151 nonterminals:

   152 \begin{ttdescription}

   153   \item[\ndxbold{any}] denotes any term.

   155   \item[\ndxbold{prop}] denotes terms of type {\tt prop}.  These are formulae

   156     of the meta-logic.  Note that user constants of result type {\tt prop}

   157     (i.e.\ $c :: \ldots \To prop$) should always provide concrete syntax.

   158     Otherwise atomic propositions with head $c$ may be printed incorrectly.

   160   \item[\ndxbold{aprop}] denotes atomic propositions.

   162 %% FIXME huh!?

   163 %  These typically

   164 %  include the judgement forms of the object-logic; its definition

   165 %  introduces a meta-level predicate for each judgement form.

   167   \item[\ndxbold{logic}] denotes terms whose type belongs to class

   168     \cldx{logic}, excluding type \tydx{prop}.

   170   \item[\ndxbold{idts}] denotes a list of identifiers, possibly constrained

   171     by types.

   173   \item[\ndxbold{pttrn}, \ndxbold{pttrns}] denote patterns for

   174     abstraction, cases etc.  Initially the same as $idt$ and $idts$,

   175     these are indetended to be augmented by user extensions.

   177   \item[\ndxbold{type}] denotes types of the meta-logic.

   179   \item[\ndxbold{sort}] denotes meta-level sorts.

   180 \end{ttdescription}

   182 \begin{warn}

   183   In {\tt idts}, note that \verb|x::nat y| is parsed as \verb|x::(nat y)|,

   184   treating {\tt y} like a type constructor applied to {\tt nat}.  The

   185   likely result is an error message.  To avoid this interpretation, use

   186   parentheses and write \verb|(x::nat) y|.

   187   \index{type constraints}\index{*:: symbol}

   189   Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and

   190   yields an error.  The correct form is \verb|(x::nat) (y::nat)|.

   191 \end{warn}

   193 \begin{warn}

   194   Type constraints bind very weakly.  For example, \verb!x<y::nat! is normally

   195   parsed as \verb!(x<y)::nat!, unless \verb$<$ has priority of 3 or less, in

   196   which case the string is likely to be ambiguous.  The correct form is

   197   \verb!x<(y::nat)!.

   198 \end{warn}

   200 \subsection{Logical types and default syntax}\label{logical-types}

   201 \index{lambda calc@$\lambda$-calculus}

   203 Isabelle's representation of mathematical languages is based on the

   204 simply typed $\lambda$-calculus.  All logical types, namely those of

   205 class \cldx{logic}, are automatically equipped with a basic syntax of

   206 types, identifiers, variables, parentheses, $\lambda$-abstraction and

   207 application.

   208 \begin{warn}

   209   Isabelle combines the syntaxes for all types of class \cldx{logic} by

   210   mapping all those types to the single nonterminal $logic$.  Thus all

   211   productions of $logic$, in particular $id$, $var$ etc, become available.

   212 \end{warn}

   215 \subsection{Lexical matters}

   216 The parser does not process input strings directly.  It operates on token

   217 lists provided by Isabelle's \bfindex{lexer}.  There are two kinds of

   218 tokens: \bfindex{delimiters} and \bfindex{name tokens}.

   220 \index{reserved words}

   221 Delimiters can be regarded as reserved words of the syntax.  You can

   222 add new ones when extending theories.  In Fig.\ts\ref{fig:pure_gram} they

   223 appear in typewriter font, for example {\tt ==}, {\tt =?=} and

   224 {\tt PROP}\@.

   226 Name tokens have a predefined syntax.  The lexer distinguishes six disjoint

   227 classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type

   228 identifiers\index{type identifiers}, type unknowns\index{type unknowns},

   229 \rmindex{numerals}, \rmindex{strings}.  They are denoted by \ndxbold{id},

   230 \ndxbold{var}, \ndxbold{tid}, \ndxbold{tvar}, \ndxbold{xnum}, \ndxbold{xstr},

   231 respectively.  Typical examples are {\tt x}, {\tt ?x7}, {\tt 'a}, {\tt ?'a3},

   232 {\tt \#42}, {\tt ''foo bar''}.  Here is the precise syntax:

   233 \begin{eqnarray*}

   234 id        & =   & letter~quasiletter^* \\

   235 longid    & =   & id\mbox{\tt .}id~\dots~id \\

   236 var       & =   & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\

   237 tid       & =   & \mbox{\tt '}id \\

   238 tvar      & =   & \mbox{\tt ?}tid ~~|~~

   239                   \mbox{\tt ?}tid\mbox{\tt .}nat \\

   240 xnum      & =   & \mbox{\tt \#}nat ~~|~~ \mbox{\tt \#-}nat \\

   241 xstr      & =   & \mbox{\tt ''\rm text\tt ''} \\[1ex]

   242 letter    & =   & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\

   243 digit     & =   & \mbox{one of {\tt 0}\dots {\tt 9}} \\

   244 quasiletter & =  & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\

   245 nat       & =   & digit^+

   246 \end{eqnarray*}

   247 The lexer repeatedly takes the longest prefix of the input string that

   248 forms a valid token.  A maximal prefix that is both a delimiter and a

   249 name is treated as a delimiter.  Spaces, tabs, newlines and formfeeds

   250 are separators; they never occur within tokens, except those of class

   251 $xstr$.

   253 \medskip

   254 Delimiters need not be separated by white space.  For example, if {\tt -}

   255 is a delimiter but {\tt --} is not, then the string {\tt --} is treated as

   256 two consecutive occurrences of the token~{\tt -}.  In contrast, \ML\

   257 treats {\tt --} as a single symbolic name.  The consequence of Isabelle's

   258 more liberal scheme is that the same string may be parsed in different ways

   259 after extending the syntax: after adding {\tt --} as a delimiter, the input

   260 {\tt --} is treated as a single token.

   262 A \ndxbold{var} or \ndxbold{tvar} describes an unknown, which is internally

   263 a pair of base name and index (\ML\ type \mltydx{indexname}).  These

   264 components are either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or

   265 run together as in {\tt ?x1}.  The latter form is possible if the base name

   266 does not end with digits.  If the index is 0, it may be dropped altogether:

   267 {\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}.

   269 Tokens of class $xnum$ or $xstr$ are not used by the meta-logic.

   270 Object-logics may provide numerals and string constants by adding appropriate

   271 productions and translation functions.

   273 \medskip

   274 Although name tokens are returned from the lexer rather than the parser, it

   275 is more logical to regard them as nonterminals.  Delimiters, however, are

   276 terminals; they are just syntactic sugar and contribute nothing to the

   277 abstract syntax tree.

   280 \subsection{*Inspecting the syntax} \label{pg:print_syn}

   281 \begin{ttbox}

   282 syn_of              : theory -> Syntax.syntax

   283 print_syntax        : theory -> unit

   284 Syntax.print_syntax : Syntax.syntax -> unit

   285 Syntax.print_gram   : Syntax.syntax -> unit

   286 Syntax.print_trans  : Syntax.syntax -> unit

   287 \end{ttbox}

   288 The abstract type \mltydx{Syntax.syntax} allows manipulation of syntaxes

   289 in \ML.  You can display values of this type by calling the following

   290 functions:

   291 \begin{ttdescription}

   292 \item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle

   293   theory~{\it thy} as an \ML\ value.

   295 \item[\ttindexbold{print_syntax} $thy$] uses {\tt Syntax.print_syntax}

   296  to display the syntax part of theory $thy$.

   298 \item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all

   299   information contained in the syntax {\it syn}.  The displayed output can

   300   be large.  The following two functions are more selective.

   302 \item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part

   303   of~{\it syn}, namely the lexicon, logical types and productions.  These are

   304   discussed below.

   306 \item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation

   307   part of~{\it syn}, namely the constants, parse/print macros and

   308   parse/print translations.

   309 \end{ttdescription}

   311 The output of the above print functions is divided into labelled sections.

   312 The grammar is represented by {\tt lexicon}, {\tt logtypes} and {\tt prods}.

   313 The rest refers to syntactic translations and macro expansion.  Here is an

   314 explanation of the various sections.

   315 \begin{description}

   316   \item[{\tt lexicon}] lists the delimiters used for lexical

   317     analysis.\index{delimiters}

   319   \item[{\tt logtypes}] lists the types that are regarded the same as {\tt

   320     logic} syntactically.  Thus types of object-logics (e.g.\ {\tt nat}, say)

   321     will be automatically equipped with the standard syntax of

   322     $\lambda$-calculus.

   324   \item[{\tt prods}] lists the \rmindex{productions} of the priority grammar.

   325     The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}.

   326     Each delimiter is quoted.  Some productions are shown with {\tt =>} and

   327     an attached string.  These strings later become the heads of parse

   328     trees; they also play a vital role when terms are printed (see

   329     \S\ref{sec:asts}).

   331     Productions with no strings attached are called {\bf copy

   332       productions}\indexbold{productions!copy}.  Their right-hand side must

   333     have exactly one nonterminal symbol (or name token).  The parser does

   334     not create a new parse tree node for copy productions, but simply

   335     returns the parse tree of the right-hand symbol.

   337     If the right-hand side consists of a single nonterminal with no

   338     delimiters, then the copy production is called a {\bf chain

   339       production}.  Chain productions act as abbreviations:

   340     conceptually, they are removed from the grammar by adding new

   341     productions.  Priority information attached to chain productions is

   342     ignored; only the dummy value $-1$ is displayed.

   344   \item[\ttindex{print_modes}] lists the alternative print modes

   345     provided by this syntax (see \S\ref{sec:prmodes}).

   347   \item[{\tt consts}, {\tt parse_rules}, {\tt print_rules}]

   348     relate to macros (see \S\ref{sec:macros}).

   350   \item[{\tt parse_ast_translation}, {\tt print_ast_translation}]

   351     list sets of constants that invoke translation functions for abstract

   352     syntax trees.  Section \S\ref{sec:asts} below discusses this obscure

   353     matter.\index{constants!for translations}

   355   \item[{\tt parse_translation}, {\tt print_translation}] list the sets

   356     of constants that invoke translation functions for terms (see

   357     \S\ref{sec:tr_funs}).

   358 \end{description}

   359 \index{syntax!Pure|)}

   362 \section{Mixfix declarations} \label{sec:mixfix}

   363 \index{mixfix declarations|(}

   365 When defining a theory, you declare new constants by giving their names,

   366 their type, and an optional {\bf mixfix annotation}.  Mixfix annotations

   367 allow you to extend Isabelle's basic $\lambda$-calculus syntax with

   368 readable notation.  They can express any context-free priority grammar.

   369 Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more

   370 general than the priority declarations of \ML\ and Prolog.

   372 A mixfix annotation defines a production of the priority grammar.  It

   373 describes the concrete syntax, the translation to abstract syntax, and the

   374 pretty printing.  Special case annotations provide a simple means of

   375 specifying infix operators and binders.

   377 \subsection{The general mixfix form}

   378 Here is a detailed account of mixfix declarations.  Suppose the following

   379 line occurs within a {\tt consts} or {\tt syntax} section of a {\tt .thy}

   380 file:

   381 \begin{center}

   382   {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}

   383 \end{center}

   384 This constant declaration and mixfix annotation are interpreted as follows:

   385 \begin{itemize}\index{productions}

   386 \item The string {\tt $c$} is the name of the constant associated with the

   387   production; unless it is a valid identifier, it must be enclosed in

   388   quotes.  If $c$ is empty (given as~{\tt ""}) then this is a copy

   389   production.\index{productions!copy} Otherwise, parsing an instance of the

   390   phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$

   391     $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th

   392   argument.

   394   \item The constant $c$, if non-empty, is declared to have type $\sigma$

   395     ({\tt consts} section only).

   397   \item The string $template$ specifies the right-hand side of

   398     the production.  It has the form

   399     \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \]

   400     where each occurrence of {\tt_} denotes an argument position and

   401     the~$w@i$ do not contain~{\tt _}.  (If you want a literal~{\tt _} in

   402     the concrete syntax, you must escape it as described below.)  The $w@i$

   403     may consist of \rmindex{delimiters}, spaces or

   404     \rmindex{pretty printing} annotations (see below).

   406   \item The type $\sigma$ specifies the production's nonterminal symbols

   407     (or name tokens).  If $template$ is of the form above then $\sigma$

   408     must be a function type with at least~$n$ argument positions, say

   409     $\sigma = [\tau@1, \dots, \tau@n] \To \tau$.  Nonterminal symbols are

   410     derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described

   411     below.  Any of these may be function types.

   413   \item The optional list~$ps$ may contain at most $n$ integers, say {\tt

   414       [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal

   415     priority\indexbold{priorities} required of any phrase that may appear

   416     as the $i$-th argument.  Missing priorities default to~0.

   418   \item The integer $p$ is the priority of this production.  If

   419     omitted, it defaults to the maximal priority.  Priorities range

   420     between 0 and \ttindexbold{max_pri} (= 1000).

   422 \end{itemize}

   423 %

   424 The resulting production is \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\,

   425 A@2^{(p@2)}\, \dots\, A@n^{(p@n)}\, w@n \] where $A$ and the $A@i$ are the

   426 nonterminals corresponding to the types $\tau$ and $\tau@i$ respectively.

   427 The nonterminal symbol associated with a type $(\ldots)ty$ is {\tt logic}, if

   428 this is a logical type (namely one of class {\tt logic} excluding {\tt

   429 prop}).  Otherwise it is $ty$ (note that only the outermost type constructor

   430 is taken into account).  Finally, the nonterminal of a type variable is {\tt

   431 any}.

   433 \begin{warn}

   434   Theories must sometimes declare types for purely syntactic purposes ---

   435   merely playing the role of nonterminals.  One example is \tydx{type}, the

   436   built-in type of types.  This is a `type of all types' in the syntactic

   437   sense only.  Do not declare such types under {\tt arities} as belonging to

   438   class {\tt logic}\index{*logic class}, for that would make them useless as

   439   separate nonterminal symbols.

   440 \end{warn}

   442 Associating nonterminals with types allows a constant's type to specify

   443 syntax as well.  We can declare the function~$f$ to have type $[\tau@1,

   444 \ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the layout

   445 of the function's $n$ arguments.  The constant's name, in this case~$f$, will

   446 also serve as the label in the abstract syntax tree.

   448 You may also declare mixfix syntax without adding constants to the theory's

   449 signature, by using a {\tt syntax} section instead of {\tt consts}.  Thus a

   450 production need not map directly to a logical function (this typically

   451 requires additional syntactic translations, see also

   452 Chapter~\ref{chap:syntax}).

   455 \medskip

   456 As a special case of the general mixfix declaration, the form

   457 \begin{center}

   458   {\tt $c$ ::\ "$\sigma$" ("$template$")}

   459 \end{center}

   460 specifies no priorities.  The resulting production puts no priority

   461 constraints on any of its arguments and has maximal priority itself.

   462 Omitting priorities in this manner is prone to syntactic ambiguities unless

   463 the production's right-hand side is fully bracketed, as in

   464 \verb|"if _ then _ else _ fi"|.

   466 Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},

   467 is sensible only if~$c$ is an identifier.  Otherwise you will be unable to

   468 write terms involving~$c$.

   471 \subsection{Example: arithmetic expressions}

   472 \index{examples!of mixfix declarations}

   473 This theory specification contains a {\tt syntax} section with mixfix

   474 declarations encoding the priority grammar from

   475 \S\ref{sec:priority_grammars}:

   476 \begin{ttbox}

   477 ExpSyntax = Pure +

   478 types

   479   exp

   480 syntax

   481   "0" :: exp                 ("0"      9)

   482   "+" :: [exp, exp] => exp   ("_ + _"  [0, 1] 0)

   483   "*" :: [exp, exp] => exp   ("_ * _"  [3, 2] 2)

   484   "-" :: exp => exp          ("- _"    [3] 3)

   485 end

   486 \end{ttbox}

   487 If you put this into a file {\tt ExpSyntax.thy} and load it by {\tt

   488   use_thy"ExpSyntax"}, you can run some tests:

   489 \begin{ttbox}

   490 val read_exp = Syntax.test_read (syn_of ExpSyntax.thy) "exp";

   491 {\out val it = fn : string -> unit}

   492 read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0";

   493 {\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}

   494 {\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}

   495 {\out \vdots}

   496 read_exp "0 + - 0 + 0";

   497 {\out tokens: "0" "+" "-" "0" "+" "0"}

   498 {\out raw: ("+" ("+" "0" ("-" "0")) "0")}

   499 {\out \vdots}

   500 \end{ttbox}

   501 The output of \ttindex{Syntax.test_read} includes the token list ({\tt

   502   tokens}) and the raw \AST{} directly derived from the parse tree,

   503 ignoring parse \AST{} translations.  The rest is tracing information

   504 provided by the macro expander (see \S\ref{sec:macros}).

   506 Executing {\tt Syntax.print_gram} reveals the productions derived from the

   507 above mixfix declarations (lots of additional information deleted):

   508 \begin{ttbox}

   509 Syntax.print_gram (syn_of ExpSyntax.thy);

   510 {\out exp = "0"  => "0" (9)}

   511 {\out exp = exp[0] "+" exp[1]  => "+" (0)}

   512 {\out exp = exp[3] "*" exp[2]  => "*" (2)}

   513 {\out exp = "-" exp[3]  => "-" (3)}

   514 \end{ttbox}

   516 Note that because {\tt exp} is not of class {\tt logic}, it has been

   517 retained as a separate nonterminal.  This also entails that the syntax

   518 does not provide for identifiers or paranthesized expressions.

   519 Normally you would also want to add the declaration {\tt arities

   520   exp::logic} after {\tt types} and use {\tt consts} instead of {\tt

   521   syntax}.  Try this as an exercise and study the changes in the

   522 grammar.

   524 \subsection{The mixfix template}

   525 Let us now take a closer look at the string $template$ appearing in mixfix

   526 annotations.  This string specifies a list of parsing and printing

   527 directives: delimiters\index{delimiters}, arguments, spaces, blocks of

   528 indentation and line breaks.  These are encoded by the following character

   529 sequences:

   530 \index{pretty printing|(}

   531 \begin{description}

   532 \item[~$d$~] is a delimiter, namely a non-empty sequence of characters

   533   other than the special characters {\tt _}, {\tt(}, {\tt)} and~{\tt/}.

   534   Even these characters may appear if escaped; this means preceding it with

   535   a~{\tt '} (single quote).  Thus you have to write {\tt ''} if you really

   536   want a single quote.  Furthermore, a~{\tt '} followed by a space separates

   537   delimiters without extra white space being added for printing.

   539 \item[~{\tt_}~] is an argument position, which stands for a nonterminal symbol

   540   or name token.

   542 \item[~$s$~] is a non-empty sequence of spaces for printing.  This and the

   543   following specifications do not affect parsing at all.

   545 \item[~{\tt(}$n$~] opens a pretty printing block.  The optional number $n$

   546   specifies how much indentation to add when a line break occurs within the

   547   block.  If {\tt(} is not followed by digits, the indentation defaults

   548   to~0.

   550 \item[~{\tt)}~] closes a pretty printing block.

   552 \item[~{\tt//}~] forces a line break.

   554 \item[~{\tt/}$s$~] allows a line break.  Here $s$ stands for the string of

   555   spaces (zero or more) right after the {\tt /} character.  These spaces

   556   are printed if the break is not taken.

   557 \end{description}

   558 For example, the template {\tt"(_ +/ _)"} specifies an infix operator.

   559 There are two argument positions; the delimiter~{\tt+} is preceded by a

   560 space and followed by a space or line break; the entire phrase is a pretty

   561 printing block.  Other examples appear in Fig.\ts\ref{fig:set_trans} below.

   562 Isabelle's pretty printer resembles the one described in

   563 Paulson~\cite{paulson-ml2}.

   565 \index{pretty printing|)}

   568 \subsection{Infixes}

   569 \indexbold{infixes}

   571 Infix operators associating to the left or right can be declared using

   572 {\tt infixl} or {\tt infixr}.  Basically, the form {\tt $c$ ::\

   573   $\sigma$ (infixl $p$)} abbreviates the mixfix declarations

   574 \begin{ttbox}

   575 "op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))

   576 "op \(c\)" :: \(\sigma\)   ("op \(c\)")

   577 \end{ttbox}

   578 and {\tt $c$ ::\ $\sigma$ (infixr $p$)} abbreviates the mixfix declarations

   579 \begin{ttbox}

   580 "op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))

   581 "op \(c\)" :: \(\sigma\)   ("op \(c\)")

   582 \end{ttbox}

   583 The infix operator is declared as a constant with the prefix {\tt op}.

   584 Thus, prefixing infixes with \sdx{op} makes them behave like ordinary

   585 function symbols, as in \ML.  Special characters occurring in~$c$ must be

   586 escaped, as in delimiters, using a single quote.

   588 A slightly more general form of infix declarations allows constant

   589 names to be independent from their concrete syntax, namely \texttt{$c$

   590   ::\ $\sigma$\ (infixl "$sy$" $p$)}, the same for \texttt{infixr}.  As

   591 an example consider:

   592 \begin{ttbox}

   593 and :: [bool, bool] => bool  (infixr "&" 35)

   594 \end{ttbox}

   595 The internal constant name will then be just \texttt{and}, without any

   596 \texttt{op} prefixed.

   599 \subsection{Binders}

   600 \indexbold{binders}

   601 \begingroup

   602 \def\Q{{\cal Q}}

   603 A {\bf binder} is a variable-binding construct such as a quantifier.  The

   604 constant declaration

   605 \begin{ttbox}

   606 \(c\) :: \(\sigma\)   (binder "\(\Q\)" [\(pb\)] \(p\))

   607 \end{ttbox}

   608 introduces a constant~$c$ of type~$\sigma$, which must have the form

   609 $(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where

   610 $x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$

   611 and the whole term has type~$\tau@3$.  The optional integer $pb$

   612 specifies the body's priority, by default~$p$.  Special characters

   613 in $\Q$ must be escaped using a single quote.

   615 The declaration is expanded internally to something like

   616 \begin{ttbox}

   617 \(c\)\hskip3pt    :: (\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)

   618 "\(\Q\)"  :: [idts, \(\tau@2\)] => \(\tau@3\)   ("(3\(\Q\)_./ _)" [0, \(pb\)] \(p\))

   619 \end{ttbox}

   620 Here \ndx{idts} is the nonterminal symbol for a list of identifiers with

   621 \index{type constraints}

   622 optional type constraints (see Fig.\ts\ref{fig:pure_gram}).  The

   623 declaration also installs a parse translation\index{translations!parse}

   624 for~$\Q$ and a print translation\index{translations!print} for~$c$ to

   625 translate between the internal and external forms.

   627 A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a

   628 list of variables.  The external form $\Q~x@1~x@2 \ldots x@n. P$

   629 corresponds to the internal form

   630 \[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]

   632 \medskip

   633 For example, let us declare the quantifier~$\forall$:\index{quantifiers}

   634 \begin{ttbox}

   635 All :: ('a => o) => o   (binder "ALL " 10)

   636 \end{ttbox}

   637 This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL

   638   $x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall

   639 back on ${\tt All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL

   640   $x$.$P$} have type~$o$, the type of formulae, while the bound variable

   641 can be polymorphic.

   642 \endgroup

   644 \index{mixfix declarations|)}

   647 \section{*Alternative print modes} \label{sec:prmodes}

   648 \index{print modes|(}

   649 %

   650 Isabelle's pretty printer supports alternative output syntaxes.  These

   651 may be used independently or in cooperation.  The currently active

   652 print modes (with precedence from left to right) are determined by a

   653 reference variable.

   654 \begin{ttbox}\index{*print_mode}

   655 print_mode: string list ref

   656 \end{ttbox}

   657 Initially this may already contain some print mode identifiers,

   658 depending on how Isabelle has been invoked (e.g.\ by some user

   659 interface).  So changes should be incremental --- adding or deleting

   660 modes relative to the current value.

   662 Any \ML{} string is a legal print mode identifier, without any predeclaration

   663 required.  The following names should be considered reserved, though:

   664 \texttt{""} (the empty string), \texttt{symbols}, \texttt{xsymbols}, and

   665 \texttt{latex}.

   667 There is a separate table of mixfix productions for pretty printing

   668 associated with each print mode.  The currently active ones are

   669 conceptually just concatenated from left to right, with the standard

   670 syntax output table always coming last as default.  Thus mixfix

   671 productions of preceding modes in the list may override those of later

   672 ones.  Also note that token translations are always relative to some

   673 print mode (see \S\ref{sec:tok_tr}).

   675 \medskip The canonical application of print modes is optional printing

   676 of mathematical symbols from a special screen font instead of {\sc

   677   ascii}.  Another example is to re-use Isabelle's advanced

   678 $\lambda$-term printing mechanisms to generate completely different

   679 output, say for interfacing external tools like \rmindex{model

   680   checkers} (see also \texttt{HOL/Modelcheck}).

   682 \index{print modes|)}

   685 \section{Ambiguity of parsed expressions} \label{sec:ambiguity}

   686 \index{ambiguity!of parsed expressions}

   688 To keep the grammar small and allow common productions to be shared

   689 all logical types (except {\tt prop}) are internally represented

   690 by one nonterminal, namely {\tt logic}.  This and omitted or too freely

   691 chosen priorities may lead to ways of parsing an expression that were

   692 not intended by the theory's maker.  In most cases Isabelle is able to

   693 select one of multiple parse trees that an expression has lead

   694 to by checking which of them can be typed correctly.  But this may not

   695 work in every case and always slows down parsing.

   696 The warning and error messages that can be produced during this process are

   697 as follows:

   699 If an ambiguity can be resolved by type inference the following

   700 warning is shown to remind the user that parsing is (unnecessarily)

   701 slowed down.  In cases where it's not easily possible to eliminate the

   702 ambiguity the frequency of the warning can be controlled by changing

   703 the value of {\tt Syntax.ambiguity_level} which has type {\tt int

   704 ref}.  Its default value is 1 and by increasing it one can control how

   705 many parse trees are necessary to generate the warning.

   707 \begin{ttbox}

   708 {\out Ambiguous input "\dots"}

   709 {\out produces the following parse trees:}

   710 {\out \dots}

   711 {\out Fortunately, only one parse tree is type correct.}

   712 {\out You may still want to disambiguate your grammar or your input.}

   713 \end{ttbox}

   715 The following message is normally caused by using the same

   716 syntax in two different productions:

   718 \begin{ttbox}

   719 {\out Ambiguous input "..."}

   720 {\out produces the following parse trees:}

   721 {\out \dots}

   722 {\out More than one term is type correct:}

   723 {\out \dots}

   724 \end{ttbox}

   726 Ambiguities occuring in syntax translation rules cannot be resolved by

   727 type inference because it is not necessary for these rules to be type

   728 correct.  Therefore Isabelle always generates an error message and the

   729 ambiguity should be eliminated by changing the grammar or the rule.

   732 \section{Example: some minimal logics} \label{sec:min_logics}

   733 \index{examples!of logic definitions}

   735 This section presents some examples that have a simple syntax.  They

   736 demonstrate how to define new object-logics from scratch.

   738 First we must define how an object-logic syntax is embedded into the

   739 meta-logic.  Since all theorems must conform to the syntax for~\ndx{prop}

   740 (see Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the

   741 object-level syntax.  Assume that the syntax of your object-logic defines a

   742 meta-type~\tydx{o} of formulae which refers to the nonterminal {\tt logic}.

   743 These formulae can now appear in axioms and theorems wherever \ndx{prop} does

   744 if you add the production

   745 \[ prop ~=~ logic. \]

   746 This is not supposed to be a copy production but an implicit coercion from

   747 formulae to propositions:

   748 \begin{ttbox}

   749 Base = Pure +

   750 types

   751   o

   752 arities

   753   o :: logic

   754 consts

   755   Trueprop :: o => prop   ("_" 5)

   756 end

   757 \end{ttbox}

   758 The constant \cdx{Trueprop} (the name is arbitrary) acts as an invisible

   759 coercion function.  Assuming this definition resides in a file {\tt Base.thy},

   760 you have to load it with the command {\tt use_thy "Base"}.

   762 One of the simplest nontrivial logics is {\bf minimal logic} of

   763 implication.  Its definition in Isabelle needs no advanced features but

   764 illustrates the overall mechanism nicely:

   765 \begin{ttbox}

   766 Hilbert = Base +

   767 consts

   768   "-->" :: [o, o] => o   (infixr 10)

   769 rules

   770   K     "P --> Q --> P"

   771   S     "(P --> Q --> R) --> (P --> Q) --> P --> R"

   772   MP    "[| P --> Q; P |] ==> Q"

   773 end

   774 \end{ttbox}

   775 After loading this definition from the file {\tt Hilbert.thy}, you can

   776 start to prove theorems in the logic:

   777 \begin{ttbox}

   778 Goal "P --> P";

   779 {\out Level 0}

   780 {\out P --> P}

   781 {\out  1.  P --> P}

   782 \ttbreak

   783 by (resolve_tac [Hilbert.MP] 1);

   784 {\out Level 1}

   785 {\out P --> P}

   786 {\out  1.  ?P --> P --> P}

   787 {\out  2.  ?P}

   788 \ttbreak

   789 by (resolve_tac [Hilbert.MP] 1);

   790 {\out Level 2}

   791 {\out P --> P}

   792 {\out  1.  ?P1 --> ?P --> P --> P}

   793 {\out  2.  ?P1}

   794 {\out  3.  ?P}

   795 \ttbreak

   796 by (resolve_tac [Hilbert.S] 1);

   797 {\out Level 3}

   798 {\out P --> P}

   799 {\out  1.  P --> ?Q2 --> P}

   800 {\out  2.  P --> ?Q2}

   801 \ttbreak

   802 by (resolve_tac [Hilbert.K] 1);

   803 {\out Level 4}

   804 {\out P --> P}

   805 {\out  1.  P --> ?Q2}

   806 \ttbreak

   807 by (resolve_tac [Hilbert.K] 1);

   808 {\out Level 5}

   809 {\out P --> P}

   810 {\out No subgoals!}

   811 \end{ttbox}

   812 As we can see, this Hilbert-style formulation of minimal logic is easy to

   813 define but difficult to use.  The following natural deduction formulation is

   814 better:

   815 \begin{ttbox}

   816 MinI = Base +

   817 consts

   818   "-->" :: [o, o] => o   (infixr 10)

   819 rules

   820   impI  "(P ==> Q) ==> P --> Q"

   821   impE  "[| P --> Q; P |] ==> Q"

   822 end

   823 \end{ttbox}

   824 Note, however, that although the two systems are equivalent, this fact

   825 cannot be proved within Isabelle.  Axioms {\tt S} and {\tt K} can be

   826 derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt

   827   Hilbert}.  The reason is that {\tt impI} is only an {\bf admissible} rule

   828 in {\tt Hilbert}, something that can only be shown by induction over all

   829 possible proofs in {\tt Hilbert}.

   831 We may easily extend minimal logic with falsity:

   832 \begin{ttbox}

   833 MinIF = MinI +

   834 consts

   835   False :: o

   836 rules

   837   FalseE "False ==> P"

   838 end

   839 \end{ttbox}

   840 On the other hand, we may wish to introduce conjunction only:

   841 \begin{ttbox}

   842 MinC = Base +

   843 consts

   844   "&" :: [o, o] => o   (infixr 30)

   845 \ttbreak

   846 rules

   847   conjI  "[| P; Q |] ==> P & Q"

   848   conjE1 "P & Q ==> P"

   849   conjE2 "P & Q ==> Q"

   850 end

   851 \end{ttbox}

   852 And if we want to have all three connectives together, we create and load a

   853 theory file consisting of a single line:

   854 \begin{ttbox}

   855 MinIFC = MinIF + MinC

   856 \end{ttbox}

   857 Now we can prove mixed theorems like

   858 \begin{ttbox}

   859 Goal "P & False --> Q";

   860 by (resolve_tac [MinI.impI] 1);

   861 by (dresolve_tac [MinC.conjE2] 1);

   862 by (eresolve_tac [MinIF.FalseE] 1);

   863 \end{ttbox}

   864 Try this as an exercise!

   867 %%% Local Variables:

   868 %%% mode: latex

   869 %%% TeX-master: "ref"

   870 %%% End:

author	wenzelm
	Tue, 11 Dec 2001 15:58:32 +0100
changeset 12465	47f79ad602d9
parent 11621	a19bc891e4bf
child 14231	6d8b6eb8623b
permissions	-rw-r--r--