wneuper/isa: doc-src/Ref/defining.tex@20e3616b2d9c

     2 \chapter{Defining Logics} \label{Defining-Logics}

     4 \section{Mixfix declarations} \label{sec:mixfix}

     5 \index{mixfix declarations|(}

     7 When defining a theory, you declare new constants by giving their names,

     8 their type, and an optional {\bf mixfix annotation}.  Mixfix annotations

     9 allow you to extend Isabelle's basic $\lambda$-calculus syntax with

    10 readable notation.  They can express any context-free priority grammar.

    11 Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more

    12 general than the priority declarations of \ML\ and Prolog.

    14 A mixfix annotation defines a production of the priority grammar.  It

    15 describes the concrete syntax, the translation to abstract syntax, and the

    16 pretty printing.  Special case annotations provide a simple means of

    17 specifying infix operators and binders.

    19 \subsection{The general mixfix form}

    20 Here is a detailed account of mixfix declarations.  Suppose the following

    21 line occurs within a {\tt consts} or {\tt syntax} section of a {\tt .thy}

    22 file:

    23 \begin{center}

    24   {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}

    25 \end{center}

    26 This constant declaration and mixfix annotation are interpreted as follows:

    27 \begin{itemize}\index{productions}

    28 \item The string {\tt $c$} is the name of the constant associated with the

    29   production; unless it is a valid identifier, it must be enclosed in

    30   quotes.  If $c$ is empty (given as~{\tt ""}) then this is a copy

    31   production.\index{productions!copy} Otherwise, parsing an instance of the

    32   phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$

    33     $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th

    34   argument.

    36   \item The constant $c$, if non-empty, is declared to have type $\sigma$

    37     ({\tt consts} section only).

    39   \item The string $template$ specifies the right-hand side of

    40     the production.  It has the form

    41     \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \]

    42     where each occurrence of {\tt_} denotes an argument position and

    43     the~$w@i$ do not contain~{\tt _}.  (If you want a literal~{\tt _} in

    44     the concrete syntax, you must escape it as described below.)  The $w@i$

    45     may consist of \rmindex{delimiters}, spaces or

    46     \rmindex{pretty printing} annotations (see below).

    48   \item The type $\sigma$ specifies the production's nonterminal symbols

    49     (or name tokens).  If $template$ is of the form above then $\sigma$

    50     must be a function type with at least~$n$ argument positions, say

    51     $\sigma = [\tau@1, \dots, \tau@n] \To \tau$.  Nonterminal symbols are

    52     derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described

    53     below.  Any of these may be function types.

    55   \item The optional list~$ps$ may contain at most $n$ integers, say {\tt

    56       [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal

    57     priority\indexbold{priorities} required of any phrase that may appear

    58     as the $i$-th argument.  Missing priorities default to~0.

    60   \item The integer $p$ is the priority of this production.  If

    61     omitted, it defaults to the maximal priority.  Priorities range

    62     between 0 and \ttindexbold{max_pri} (= 1000).

    64 \end{itemize}

    65 %

    66 The resulting production is \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\,

    67 A@2^{(p@2)}\, \dots\, A@n^{(p@n)}\, w@n \] where $A$ and the $A@i$ are the

    68 nonterminals corresponding to the types $\tau$ and $\tau@i$ respectively.

    69 The nonterminal symbol associated with a type $(\ldots)ty$ is {\tt logic}, if

    70 this is a logical type (namely one of class {\tt logic} excluding {\tt

    71 prop}).  Otherwise it is $ty$ (note that only the outermost type constructor

    72 is taken into account).  Finally, the nonterminal of a type variable is {\tt

    73 any}.

    75 \begin{warn}

    76   Theories must sometimes declare types for purely syntactic purposes ---

    77   merely playing the role of nonterminals.  One example is \tydx{type}, the

    78   built-in type of types.  This is a `type of all types' in the syntactic

    79   sense only.  Do not declare such types under {\tt arities} as belonging to

    80   class {\tt logic}\index{*logic class}, for that would make them useless as

    81   separate nonterminal symbols.

    82 \end{warn}

    84 Associating nonterminals with types allows a constant's type to specify

    85 syntax as well.  We can declare the function~$f$ to have type $[\tau@1,

    86 \ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the layout

    87 of the function's $n$ arguments.  The constant's name, in this case~$f$, will

    88 also serve as the label in the abstract syntax tree.

    90 You may also declare mixfix syntax without adding constants to the theory's

    91 signature, by using a {\tt syntax} section instead of {\tt consts}.  Thus a

    92 production need not map directly to a logical function (this typically

    93 requires additional syntactic translations, see also

    94 Chapter~\ref{chap:syntax}).

    97 \medskip

    98 As a special case of the general mixfix declaration, the form

    99 \begin{center}

   100   {\tt $c$ ::\ "$\sigma$" ("$template$")}

   101 \end{center}

   102 specifies no priorities.  The resulting production puts no priority

   103 constraints on any of its arguments and has maximal priority itself.

   104 Omitting priorities in this manner is prone to syntactic ambiguities unless

   105 the production's right-hand side is fully bracketed, as in

   106 \verb|"if _ then _ else _ fi"|.

   108 Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},

   109 is sensible only if~$c$ is an identifier.  Otherwise you will be unable to

   110 write terms involving~$c$.

   113 \subsection{Example: arithmetic expressions}

   114 \index{examples!of mixfix declarations}

   115 This theory specification contains a {\tt syntax} section with mixfix

   116 declarations encoding the priority grammar from

   117 \S\ref{sec:priority_grammars}:

   118 \begin{ttbox}

   119 ExpSyntax = Pure +

   120 types

   121   exp

   122 syntax

   123   "0" :: exp                 ("0"      9)

   124   "+" :: [exp, exp] => exp   ("_ + _"  [0, 1] 0)

   125   "*" :: [exp, exp] => exp   ("_ * _"  [3, 2] 2)

   126   "-" :: exp => exp          ("- _"    [3] 3)

   127 end

   128 \end{ttbox}

   129 Executing {\tt Syntax.print_gram} reveals the productions derived from the

   130 above mixfix declarations (lots of additional information deleted):

   131 \begin{ttbox}

   132 Syntax.print_gram (syn_of ExpSyntax.thy);

   133 {\out exp = "0"  => "0" (9)}

   134 {\out exp = exp[0] "+" exp[1]  => "+" (0)}

   135 {\out exp = exp[3] "*" exp[2]  => "*" (2)}

   136 {\out exp = "-" exp[3]  => "-" (3)}

   137 \end{ttbox}

   139 Note that because {\tt exp} is not of class {\tt logic}, it has been

   140 retained as a separate nonterminal.  This also entails that the syntax

   141 does not provide for identifiers or paranthesized expressions.

   142 Normally you would also want to add the declaration {\tt arities

   143   exp::logic} after {\tt types} and use {\tt consts} instead of {\tt

   144   syntax}.  Try this as an exercise and study the changes in the

   145 grammar.

   148 \subsection{Infixes}

   149 \indexbold{infixes}

   151 Infix operators associating to the left or right can be declared using

   152 {\tt infixl} or {\tt infixr}.  Basically, the form {\tt $c$ ::\

   153   $\sigma$ (infixl $p$)} abbreviates the mixfix declarations

   154 \begin{ttbox}

   155 "op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))

   156 "op \(c\)" :: \(\sigma\)   ("op \(c\)")

   157 \end{ttbox}

   158 and {\tt $c$ ::\ $\sigma$ (infixr $p$)} abbreviates the mixfix declarations

   159 \begin{ttbox}

   160 "op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))

   161 "op \(c\)" :: \(\sigma\)   ("op \(c\)")

   162 \end{ttbox}

   163 The infix operator is declared as a constant with the prefix {\tt op}.

   164 Thus, prefixing infixes with \sdx{op} makes them behave like ordinary

   165 function symbols, as in \ML.  Special characters occurring in~$c$ must be

   166 escaped, as in delimiters, using a single quote.

   168 A slightly more general form of infix declarations allows constant

   169 names to be independent from their concrete syntax, namely \texttt{$c$

   170   ::\ $\sigma$\ (infixl "$sy$" $p$)}, the same for \texttt{infixr}.  As

   171 an example consider:

   172 \begin{ttbox}

   173 and :: [bool, bool] => bool  (infixr "&" 35)

   174 \end{ttbox}

   175 The internal constant name will then be just \texttt{and}, without any

   176 \texttt{op} prefixed.

   179 \subsection{Binders}

   180 \indexbold{binders}

   181 \begingroup

   182 \def\Q{{\cal Q}}

   183 A {\bf binder} is a variable-binding construct such as a quantifier.  The

   184 constant declaration

   185 \begin{ttbox}

   186 \(c\) :: \(\sigma\)   (binder "\(\Q\)" [\(pb\)] \(p\))

   187 \end{ttbox}

   188 introduces a constant~$c$ of type~$\sigma$, which must have the form

   189 $(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where

   190 $x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$

   191 and the whole term has type~$\tau@3$.  The optional integer $pb$

   192 specifies the body's priority, by default~$p$.  Special characters

   193 in $\Q$ must be escaped using a single quote.

   195 The declaration is expanded internally to something like

   196 \begin{ttbox}

   197 \(c\)\hskip3pt    :: (\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)

   198 "\(\Q\)"  :: [idts, \(\tau@2\)] => \(\tau@3\)   ("(3\(\Q\)_./ _)" [0, \(pb\)] \(p\))

   199 \end{ttbox}

   200 Here \ndx{idts} is the nonterminal symbol for a list of identifiers with

   201 \index{type constraints}

   202 optional type constraints (see Fig.\ts\ref{fig:pure_gram}).  The

   203 declaration also installs a parse translation\index{translations!parse}

   204 for~$\Q$ and a print translation\index{translations!print} for~$c$ to

   205 translate between the internal and external forms.

   207 A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a

   208 list of variables.  The external form $\Q~x@1~x@2 \ldots x@n. P$

   209 corresponds to the internal form

   210 \[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]

   212 \medskip

   213 For example, let us declare the quantifier~$\forall$:\index{quantifiers}

   214 \begin{ttbox}

   215 All :: ('a => o) => o   (binder "ALL " 10)

   216 \end{ttbox}

   217 This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL

   218   $x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall

   219 back on ${\tt All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL

   220   $x$.$P$} have type~$o$, the type of formulae, while the bound variable

   221 can be polymorphic.

   222 \endgroup

   224 \index{mixfix declarations|)}

   227 \section{*Alternative print modes} \label{sec:prmodes}

   228 \index{print modes|(}

   229 %

   230 Isabelle's pretty printer supports alternative output syntaxes.  These

   231 may be used independently or in cooperation.  The currently active

   232 print modes (with precedence from left to right) are determined by a

   233 reference variable.

   234 \begin{ttbox}\index{*print_mode}

   235 print_mode: string list ref

   236 \end{ttbox}

   237 Initially this may already contain some print mode identifiers,

   238 depending on how Isabelle has been invoked (e.g.\ by some user

   239 interface).  So changes should be incremental --- adding or deleting

   240 modes relative to the current value.

   242 Any \ML{} string is a legal print mode identifier, without any predeclaration

   243 required.  The following names should be considered reserved, though:

   244 \texttt{""} (the empty string), \texttt{symbols}, \texttt{xsymbols}, and

   245 \texttt{latex}.

   247 There is a separate table of mixfix productions for pretty printing

   248 associated with each print mode.  The currently active ones are

   249 conceptually just concatenated from left to right, with the standard

   250 syntax output table always coming last as default.  Thus mixfix

   251 productions of preceding modes in the list may override those of later

   252 ones.  Also note that token translations are always relative to some

   253 print mode (see \S\ref{sec:tok_tr}).

   255 \medskip The canonical application of print modes is optional printing

   256 of mathematical symbols from a special screen font instead of {\sc

   257   ascii}.  Another example is to re-use Isabelle's advanced

   258 $\lambda$-term printing mechanisms to generate completely different

   259 output, say for interfacing external tools like \rmindex{model

   260   checkers} (see also \texttt{HOL/Modelcheck}).

   262 \index{print modes|)}

   265 \section{Ambiguity of parsed expressions} \label{sec:ambiguity}

   266 \index{ambiguity!of parsed expressions}

   268 To keep the grammar small and allow common productions to be shared

   269 all logical types (except {\tt prop}) are internally represented

   270 by one nonterminal, namely {\tt logic}.  This and omitted or too freely

   271 chosen priorities may lead to ways of parsing an expression that were

   272 not intended by the theory's maker.  In most cases Isabelle is able to

   273 select one of multiple parse trees that an expression has lead

   274 to by checking which of them can be typed correctly.  But this may not

   275 work in every case and always slows down parsing.

   276 The warning and error messages that can be produced during this process are

   277 as follows:

   279 If an ambiguity can be resolved by type inference the following

   280 warning is shown to remind the user that parsing is (unnecessarily)

   281 slowed down.  In cases where it's not easily possible to eliminate the

   282 ambiguity the frequency of the warning can be controlled by changing

   283 the value of {\tt Syntax.ambiguity_level} which has type {\tt int

   284 ref}.  Its default value is 1 and by increasing it one can control how

   285 many parse trees are necessary to generate the warning.

   287 \begin{ttbox}

   288 {\out Ambiguous input "\dots"}

   289 {\out produces the following parse trees:}

   290 {\out \dots}

   291 {\out Fortunately, only one parse tree is type correct.}

   292 {\out You may still want to disambiguate your grammar or your input.}

   293 \end{ttbox}

   295 The following message is normally caused by using the same

   296 syntax in two different productions:

   298 \begin{ttbox}

   299 {\out Ambiguous input "..."}

   300 {\out produces the following parse trees:}

   301 {\out \dots}

   302 {\out More than one term is type correct:}

   303 {\out \dots}

   304 \end{ttbox}

   306 Ambiguities occuring in syntax translation rules cannot be resolved by

   307 type inference because it is not necessary for these rules to be type

   308 correct.  Therefore Isabelle always generates an error message and the

   309 ambiguity should be eliminated by changing the grammar or the rule.

   312 %%% Local Variables:

   313 %%% mode: latex

   314 %%% TeX-master: "ref"

   315 %%% End:

author	Walther Neuper <neuper@ist.tugraz.at>
	Thu, 12 Aug 2010 15:03:34 +0200
branch	isac-from-Isabelle2009-2
changeset 37913	20e3616b2d9c
parent 30184	37969710e61f
child 43760	e87888b4152f
permissions	-rw-r--r--