doc-src/Ref/defining.tex
author Walther Neuper <neuper@ist.tugraz.at>
Thu, 12 Aug 2010 15:03:34 +0200
branchisac-from-Isabelle2009-2
changeset 37913 20e3616b2d9c
parent 30184 37969710e61f
child 43760 e87888b4152f
permissions -rw-r--r--
prepare reactivation of isac-update-Isa09-2
     1 
     2 \chapter{Defining Logics} \label{Defining-Logics}
     3 
     4 \section{Mixfix declarations} \label{sec:mixfix}
     5 \index{mixfix declarations|(}
     6 
     7 When defining a theory, you declare new constants by giving their names,
     8 their type, and an optional {\bf mixfix annotation}.  Mixfix annotations
     9 allow you to extend Isabelle's basic $\lambda$-calculus syntax with
    10 readable notation.  They can express any context-free priority grammar.
    11 Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
    12 general than the priority declarations of \ML\ and Prolog.
    13 
    14 A mixfix annotation defines a production of the priority grammar.  It
    15 describes the concrete syntax, the translation to abstract syntax, and the
    16 pretty printing.  Special case annotations provide a simple means of
    17 specifying infix operators and binders.
    18 
    19 \subsection{The general mixfix form}
    20 Here is a detailed account of mixfix declarations.  Suppose the following
    21 line occurs within a {\tt consts} or {\tt syntax} section of a {\tt .thy}
    22 file:
    23 \begin{center}
    24   {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}
    25 \end{center}
    26 This constant declaration and mixfix annotation are interpreted as follows:
    27 \begin{itemize}\index{productions}
    28 \item The string {\tt $c$} is the name of the constant associated with the
    29   production; unless it is a valid identifier, it must be enclosed in
    30   quotes.  If $c$ is empty (given as~{\tt ""}) then this is a copy
    31   production.\index{productions!copy} Otherwise, parsing an instance of the
    32   phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
    33     $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
    34   argument.
    35 
    36   \item The constant $c$, if non-empty, is declared to have type $\sigma$
    37     ({\tt consts} section only).
    38 
    39   \item The string $template$ specifies the right-hand side of
    40     the production.  It has the form
    41     \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \]
    42     where each occurrence of {\tt_} denotes an argument position and
    43     the~$w@i$ do not contain~{\tt _}.  (If you want a literal~{\tt _} in
    44     the concrete syntax, you must escape it as described below.)  The $w@i$
    45     may consist of \rmindex{delimiters}, spaces or
    46     \rmindex{pretty printing} annotations (see below).
    47 
    48   \item The type $\sigma$ specifies the production's nonterminal symbols
    49     (or name tokens).  If $template$ is of the form above then $\sigma$
    50     must be a function type with at least~$n$ argument positions, say
    51     $\sigma = [\tau@1, \dots, \tau@n] \To \tau$.  Nonterminal symbols are
    52     derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described
    53     below.  Any of these may be function types.
    54 
    55   \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
    56       [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
    57     priority\indexbold{priorities} required of any phrase that may appear
    58     as the $i$-th argument.  Missing priorities default to~0.
    59     
    60   \item The integer $p$ is the priority of this production.  If
    61     omitted, it defaults to the maximal priority.  Priorities range
    62     between 0 and \ttindexbold{max_pri} (= 1000).
    63 
    64 \end{itemize}
    65 %
    66 The resulting production is \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\,
    67 A@2^{(p@2)}\, \dots\, A@n^{(p@n)}\, w@n \] where $A$ and the $A@i$ are the
    68 nonterminals corresponding to the types $\tau$ and $\tau@i$ respectively.
    69 The nonterminal symbol associated with a type $(\ldots)ty$ is {\tt logic}, if
    70 this is a logical type (namely one of class {\tt logic} excluding {\tt
    71 prop}).  Otherwise it is $ty$ (note that only the outermost type constructor
    72 is taken into account).  Finally, the nonterminal of a type variable is {\tt
    73 any}.
    74 
    75 \begin{warn}
    76   Theories must sometimes declare types for purely syntactic purposes ---
    77   merely playing the role of nonterminals.  One example is \tydx{type}, the
    78   built-in type of types.  This is a `type of all types' in the syntactic
    79   sense only.  Do not declare such types under {\tt arities} as belonging to
    80   class {\tt logic}\index{*logic class}, for that would make them useless as
    81   separate nonterminal symbols.
    82 \end{warn}
    83 
    84 Associating nonterminals with types allows a constant's type to specify
    85 syntax as well.  We can declare the function~$f$ to have type $[\tau@1,
    86 \ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the layout
    87 of the function's $n$ arguments.  The constant's name, in this case~$f$, will
    88 also serve as the label in the abstract syntax tree.
    89 
    90 You may also declare mixfix syntax without adding constants to the theory's
    91 signature, by using a {\tt syntax} section instead of {\tt consts}.  Thus a
    92 production need not map directly to a logical function (this typically
    93 requires additional syntactic translations, see also
    94 Chapter~\ref{chap:syntax}).
    95 
    96 
    97 \medskip
    98 As a special case of the general mixfix declaration, the form
    99 \begin{center}
   100   {\tt $c$ ::\ "$\sigma$" ("$template$")}
   101 \end{center}
   102 specifies no priorities.  The resulting production puts no priority
   103 constraints on any of its arguments and has maximal priority itself.
   104 Omitting priorities in this manner is prone to syntactic ambiguities unless
   105 the production's right-hand side is fully bracketed, as in
   106 \verb|"if _ then _ else _ fi"|.
   107 
   108 Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},
   109 is sensible only if~$c$ is an identifier.  Otherwise you will be unable to
   110 write terms involving~$c$.
   111 
   112 
   113 \subsection{Example: arithmetic expressions}
   114 \index{examples!of mixfix declarations}
   115 This theory specification contains a {\tt syntax} section with mixfix
   116 declarations encoding the priority grammar from
   117 \S\ref{sec:priority_grammars}:
   118 \begin{ttbox}
   119 ExpSyntax = Pure +
   120 types
   121   exp
   122 syntax
   123   "0" :: exp                 ("0"      9)
   124   "+" :: [exp, exp] => exp   ("_ + _"  [0, 1] 0)
   125   "*" :: [exp, exp] => exp   ("_ * _"  [3, 2] 2)
   126   "-" :: exp => exp          ("- _"    [3] 3)
   127 end
   128 \end{ttbox}
   129 Executing {\tt Syntax.print_gram} reveals the productions derived from the
   130 above mixfix declarations (lots of additional information deleted):
   131 \begin{ttbox}
   132 Syntax.print_gram (syn_of ExpSyntax.thy);
   133 {\out exp = "0"  => "0" (9)}
   134 {\out exp = exp[0] "+" exp[1]  => "+" (0)}
   135 {\out exp = exp[3] "*" exp[2]  => "*" (2)}
   136 {\out exp = "-" exp[3]  => "-" (3)}
   137 \end{ttbox}
   138 
   139 Note that because {\tt exp} is not of class {\tt logic}, it has been
   140 retained as a separate nonterminal.  This also entails that the syntax
   141 does not provide for identifiers or paranthesized expressions.
   142 Normally you would also want to add the declaration {\tt arities
   143   exp::logic} after {\tt types} and use {\tt consts} instead of {\tt
   144   syntax}.  Try this as an exercise and study the changes in the
   145 grammar.
   146 
   147 
   148 \subsection{Infixes}
   149 \indexbold{infixes}
   150 
   151 Infix operators associating to the left or right can be declared using
   152 {\tt infixl} or {\tt infixr}.  Basically, the form {\tt $c$ ::\ 
   153   $\sigma$ (infixl $p$)} abbreviates the mixfix declarations
   154 \begin{ttbox}
   155 "op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
   156 "op \(c\)" :: \(\sigma\)   ("op \(c\)")
   157 \end{ttbox}
   158 and {\tt $c$ ::\ $\sigma$ (infixr $p$)} abbreviates the mixfix declarations
   159 \begin{ttbox}
   160 "op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
   161 "op \(c\)" :: \(\sigma\)   ("op \(c\)")
   162 \end{ttbox}
   163 The infix operator is declared as a constant with the prefix {\tt op}.
   164 Thus, prefixing infixes with \sdx{op} makes them behave like ordinary
   165 function symbols, as in \ML.  Special characters occurring in~$c$ must be
   166 escaped, as in delimiters, using a single quote.
   167 
   168 A slightly more general form of infix declarations allows constant
   169 names to be independent from their concrete syntax, namely \texttt{$c$
   170   ::\ $\sigma$\ (infixl "$sy$" $p$)}, the same for \texttt{infixr}.  As
   171 an example consider:
   172 \begin{ttbox}
   173 and :: [bool, bool] => bool  (infixr "&" 35)
   174 \end{ttbox}
   175 The internal constant name will then be just \texttt{and}, without any
   176 \texttt{op} prefixed.
   177 
   178 
   179 \subsection{Binders}
   180 \indexbold{binders}
   181 \begingroup
   182 \def\Q{{\cal Q}}
   183 A {\bf binder} is a variable-binding construct such as a quantifier.  The
   184 constant declaration
   185 \begin{ttbox}
   186 \(c\) :: \(\sigma\)   (binder "\(\Q\)" [\(pb\)] \(p\))
   187 \end{ttbox}
   188 introduces a constant~$c$ of type~$\sigma$, which must have the form
   189 $(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where
   190 $x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
   191 and the whole term has type~$\tau@3$.  The optional integer $pb$
   192 specifies the body's priority, by default~$p$.  Special characters
   193 in $\Q$ must be escaped using a single quote.
   194 
   195 The declaration is expanded internally to something like
   196 \begin{ttbox}
   197 \(c\)\hskip3pt    :: (\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)
   198 "\(\Q\)"  :: [idts, \(\tau@2\)] => \(\tau@3\)   ("(3\(\Q\)_./ _)" [0, \(pb\)] \(p\))
   199 \end{ttbox}
   200 Here \ndx{idts} is the nonterminal symbol for a list of identifiers with
   201 \index{type constraints}
   202 optional type constraints (see Fig.\ts\ref{fig:pure_gram}).  The
   203 declaration also installs a parse translation\index{translations!parse}
   204 for~$\Q$ and a print translation\index{translations!print} for~$c$ to
   205 translate between the internal and external forms.
   206 
   207 A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a
   208 list of variables.  The external form $\Q~x@1~x@2 \ldots x@n. P$
   209 corresponds to the internal form
   210 \[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]
   211 
   212 \medskip
   213 For example, let us declare the quantifier~$\forall$:\index{quantifiers}
   214 \begin{ttbox}
   215 All :: ('a => o) => o   (binder "ALL " 10)
   216 \end{ttbox}
   217 This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
   218   $x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall
   219 back on ${\tt All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL
   220   $x$.$P$} have type~$o$, the type of formulae, while the bound variable
   221 can be polymorphic.
   222 \endgroup
   223 
   224 \index{mixfix declarations|)}
   225 
   226 
   227 \section{*Alternative print modes} \label{sec:prmodes}
   228 \index{print modes|(}
   229 %
   230 Isabelle's pretty printer supports alternative output syntaxes.  These
   231 may be used independently or in cooperation.  The currently active
   232 print modes (with precedence from left to right) are determined by a
   233 reference variable.
   234 \begin{ttbox}\index{*print_mode}
   235 print_mode: string list ref
   236 \end{ttbox}
   237 Initially this may already contain some print mode identifiers,
   238 depending on how Isabelle has been invoked (e.g.\ by some user
   239 interface).  So changes should be incremental --- adding or deleting
   240 modes relative to the current value.
   241 
   242 Any \ML{} string is a legal print mode identifier, without any predeclaration
   243 required.  The following names should be considered reserved, though:
   244 \texttt{""} (the empty string), \texttt{symbols}, \texttt{xsymbols}, and
   245 \texttt{latex}.
   246 
   247 There is a separate table of mixfix productions for pretty printing
   248 associated with each print mode.  The currently active ones are
   249 conceptually just concatenated from left to right, with the standard
   250 syntax output table always coming last as default.  Thus mixfix
   251 productions of preceding modes in the list may override those of later
   252 ones.  Also note that token translations are always relative to some
   253 print mode (see \S\ref{sec:tok_tr}).
   254 
   255 \medskip The canonical application of print modes is optional printing
   256 of mathematical symbols from a special screen font instead of {\sc
   257   ascii}.  Another example is to re-use Isabelle's advanced
   258 $\lambda$-term printing mechanisms to generate completely different
   259 output, say for interfacing external tools like \rmindex{model
   260   checkers} (see also \texttt{HOL/Modelcheck}).
   261 
   262 \index{print modes|)}
   263 
   264 
   265 \section{Ambiguity of parsed expressions} \label{sec:ambiguity}
   266 \index{ambiguity!of parsed expressions}
   267 
   268 To keep the grammar small and allow common productions to be shared
   269 all logical types (except {\tt prop}) are internally represented
   270 by one nonterminal, namely {\tt logic}.  This and omitted or too freely
   271 chosen priorities may lead to ways of parsing an expression that were
   272 not intended by the theory's maker.  In most cases Isabelle is able to
   273 select one of multiple parse trees that an expression has lead
   274 to by checking which of them can be typed correctly.  But this may not
   275 work in every case and always slows down parsing.
   276 The warning and error messages that can be produced during this process are
   277 as follows:
   278 
   279 If an ambiguity can be resolved by type inference the following
   280 warning is shown to remind the user that parsing is (unnecessarily)
   281 slowed down.  In cases where it's not easily possible to eliminate the
   282 ambiguity the frequency of the warning can be controlled by changing
   283 the value of {\tt Syntax.ambiguity_level} which has type {\tt int
   284 ref}.  Its default value is 1 and by increasing it one can control how
   285 many parse trees are necessary to generate the warning.
   286 
   287 \begin{ttbox}
   288 {\out Ambiguous input "\dots"}
   289 {\out produces the following parse trees:}
   290 {\out \dots}
   291 {\out Fortunately, only one parse tree is type correct.}
   292 {\out You may still want to disambiguate your grammar or your input.}
   293 \end{ttbox}
   294 
   295 The following message is normally caused by using the same
   296 syntax in two different productions:
   297 
   298 \begin{ttbox}
   299 {\out Ambiguous input "..."}
   300 {\out produces the following parse trees:}
   301 {\out \dots}
   302 {\out More than one term is type correct:}
   303 {\out \dots}
   304 \end{ttbox}
   305 
   306 Ambiguities occuring in syntax translation rules cannot be resolved by
   307 type inference because it is not necessary for these rules to be type
   308 correct.  Therefore Isabelle always generates an error message and the
   309 ambiguity should be eliminated by changing the grammar or the rule.
   310 
   311 
   312 %%% Local Variables: 
   313 %%% mode: latex
   314 %%% TeX-master: "ref"
   315 %%% End: