lcp@320: %% $Id$ lcp@320: \chapter{Defining Logics} \label{Defining-Logics} lcp@320: This chapter explains how to define new formal systems --- in particular, lcp@320: their concrete syntax. While Isabelle can be regarded as a theorem prover lcp@320: for set theory, higher-order logic or the sequent calculus, its lcp@320: distinguishing feature is support for the definition of new logics. lcp@320: lcp@320: Isabelle logics are hierarchies of theories, which are described and wenzelm@864: illustrated in lcp@320: \iflabelundefined{sec:defining-theories}{{\em Introduction to Isabelle}}% lcp@320: {\S\ref{sec:defining-theories}}. That material, together with the theory lcp@320: files provided in the examples directories, should suffice for all simple lcp@320: applications. The easiest way to define a new theory is by modifying a lcp@320: copy of an existing theory. lcp@320: lcp@320: This chapter documents the meta-logic syntax, mixfix declarations and lcp@320: pretty printing. The extended examples in \S\ref{sec:min_logics} lcp@320: demonstrate the logical aspects of the definition of theories. lcp@320: lcp@320: lcp@320: \section{Priority grammars} \label{sec:priority_grammars} wenzelm@864: \index{priority grammars|(} lcp@320: lcp@320: A context-free grammar contains a set of {\bf nonterminal symbols}, a set of lcp@320: {\bf terminal symbols} and a set of {\bf productions}\index{productions}. lcp@320: Productions have the form ${A=\gamma}$, where $A$ is a nonterminal and lcp@320: $\gamma$ is a string of terminals and nonterminals. One designated lcp@320: nonterminal is called the {\bf start symbol}. The language defined by the lcp@320: grammar consists of all strings of terminals that can be derived from the lcp@320: start symbol by applying productions as rewrite rules. lcp@320: lcp@320: The syntax of an Isabelle logic is specified by a {\bf priority lcp@320: grammar}.\index{priorities} Each nonterminal is decorated by an integer lcp@320: priority, as in~$A^{(p)}$. A nonterminal $A^{(p)}$ in a derivation may be lcp@320: rewritten using a production $A^{(q)} = \gamma$ only if~$p \le q$. Any lcp@320: priority grammar can be translated into a normal context free grammar by lcp@320: introducing new nonterminals and productions. lcp@320: lcp@320: Formally, a set of context free productions $G$ induces a derivation lcp@320: relation $\longrightarrow@G$. Let $\alpha$ and $\beta$ denote strings of lcp@320: terminal or nonterminal symbols. Then wenzelm@864: \[ \alpha\, A^{(p)}\, \beta ~\longrightarrow@G~ \alpha\,\gamma\,\beta \] lcp@320: if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$p \le q$. lcp@320: lcp@320: The following simple grammar for arithmetic expressions demonstrates how lcp@320: binding power and associativity of operators can be enforced by priorities. lcp@320: \begin{center} lcp@320: \begin{tabular}{rclr} lcp@320: $A^{(9)}$ & = & {\tt0} \\ lcp@320: $A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\ lcp@320: $A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\ lcp@320: $A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\ lcp@320: $A^{(3)}$ & = & {\tt-} $A^{(3)}$ lcp@320: \end{tabular} lcp@320: \end{center} lcp@320: The choice of priorities determines that {\tt -} binds tighter than {\tt *}, lcp@320: which binds tighter than {\tt +}. Furthermore {\tt +} associates to the lcp@320: left and {\tt *} to the right. lcp@320: lcp@320: For clarity, grammars obey these conventions: lcp@320: \begin{itemize} lcp@320: \item All priorities must lie between~0 and \ttindex{max_pri}, which is a lcp@320: some fixed integer. Sometimes {\tt max_pri} is written as $\infty$. lcp@320: \item Priority 0 on the right-hand side and priority \ttindex{max_pri} on lcp@320: the left-hand side may be omitted. lcp@320: \item The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$; the lcp@320: priority of the left-hand side actually appears in a column on the far wenzelm@864: right. wenzelm@864: \item Alternatives are separated by~$|$. lcp@320: \item Repetition is indicated by dots~(\dots) in an informal but obvious lcp@320: way. lcp@320: \end{itemize} lcp@320: lcp@320: Using these conventions and assuming $\infty=9$, the grammar lcp@320: takes the form lcp@320: \begin{center} lcp@320: \begin{tabular}{rclc} lcp@320: $A$ & = & {\tt0} & \hspace*{4em} \\ lcp@320: & $|$ & {\tt(} $A$ {\tt)} \\ lcp@320: & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\ lcp@320: & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\ lcp@320: & $|$ & {\tt-} $A^{(3)}$ & (3) lcp@320: \end{tabular} lcp@320: \end{center} lcp@320: \index{priority grammars|)} lcp@320: lcp@320: lcp@320: \begin{figure} lcp@320: \begin{center} lcp@320: \begin{tabular}{rclc} clasohm@711: $any$ &=& $prop$ ~~$|$~~ $logic$ \\\\ wenzelm@864: $prop$ &=& {\tt(} $prop$ {\tt)} \\ wenzelm@864: &$|$& $prop^{(4)}$ {\tt::} $type$ & (3) \\ wenzelm@864: &$|$& {\tt PROP} $aprop$ \\ clasohm@711: &$|$& $any^{(3)}$ {\tt ==} $any^{(2)}$ & (2) \\ clasohm@711: &$|$& $any^{(3)}$ {\tt =?=} $any^{(2)}$ & (2) \\ lcp@320: &$|$& $prop^{(2)}$ {\tt ==>} $prop^{(1)}$ & (1) \\ lcp@320: &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\ wenzelm@864: &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\ wenzelm@864: &$|$& {\tt OFCLASS} {\tt(} $type$ {\tt,} $logic$ {\tt)} \\\\ lcp@320: $aprop$ &=& $id$ ~~$|$~~ $var$ clasohm@711: ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\\\ wenzelm@864: $logic$ &=& {\tt(} $logic$ {\tt)} \\ wenzelm@864: &$|$& $logic^{(4)}$ {\tt::} $type$ & (3) \\ wenzelm@864: &$|$& $id$ ~~$|$~~ $var$ wenzelm@864: ~~$|$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\ wenzelm@3694: &$|$& {\tt \%} $pttrns$ {\tt.} $any^{(3)}$ & (3) \\\\ lcp@320: $idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\ lcp@320: $idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\ lcp@320: &$|$& $id$ {\tt ::} $type$ & (0) \\\\ wenzelm@3694: $pttrns$ &=& $pttrn$ ~~$|$~~ $pttrn^{(1)}$ $pttrns$ \\\\ wenzelm@3694: $pttrn$ &=& $idt$ \\\\ wenzelm@864: $type$ &=& {\tt(} $type$ {\tt)} \\ wenzelm@864: &$|$& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$ wenzelm@864: ~~$|$~~ $tvar$ {\tt::} $sort$ \\ lcp@320: &$|$& $id$ ~~$|$~~ $type^{(\infty)}$ $id$ lcp@320: ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\ lcp@320: &$|$& $type^{(1)}$ {\tt =>} $type$ & (0) \\ wenzelm@864: &$|$& {\tt[} $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0) \\\\ lcp@320: $sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace} lcp@320: ~~$|$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace} lcp@320: \end{tabular} lcp@320: \index{*PROP symbol} lcp@320: \index{*== symbol}\index{*=?= symbol}\index{*==> symbol} lcp@320: \index{*:: symbol}\index{*=> symbol} lcp@332: \index{sort constraints} lcp@332: %the index command: a percent is permitted, but braces must match! lcp@320: \index{%@{\tt\%} symbol} lcp@320: \index{{}@{\tt\ttlbrace} symbol}\index{{}@{\tt\ttrbrace} symbol} lcp@320: \index{*[ symbol}\index{*] symbol} lcp@320: \index{*"!"! symbol} lcp@320: \index{*"["| symbol} lcp@320: \index{*"|"] symbol} lcp@320: \end{center} lcp@320: \caption{Meta-logic syntax}\label{fig:pure_gram} lcp@320: \end{figure} lcp@320: lcp@320: lcp@320: \section{The Pure syntax} \label{sec:basic_syntax} lcp@320: \index{syntax!Pure|(} lcp@320: lcp@320: At the root of all object-logics lies the theory \thydx{Pure}. It lcp@320: contains, among many other things, the Pure syntax. An informal account of wenzelm@864: this basic syntax (types, terms and formulae) appears in lcp@320: \iflabelundefined{sec:forward}{{\em Introduction to Isabelle}}% lcp@320: {\S\ref{sec:forward}}. A more precise description using a priority grammar lcp@320: appears in Fig.\ts\ref{fig:pure_gram}. It defines the following lcp@320: nonterminals: lcp@320: \begin{ttdescription} wenzelm@864: \item[\ndxbold{any}] denotes any term. wenzelm@864: clasohm@711: \item[\ndxbold{prop}] denotes terms of type {\tt prop}. These are formulae wenzelm@864: of the meta-logic. Note that user constants of result type {\tt prop} wenzelm@864: (i.e.\ $c :: \ldots \To prop$) should always provide concrete syntax. wenzelm@864: Otherwise atomic propositions with head $c$ may be printed incorrectly. lcp@320: wenzelm@864: \item[\ndxbold{aprop}] denotes atomic propositions. wenzelm@864: wenzelm@864: %% FIXME huh!? wenzelm@864: % These typically wenzelm@864: % include the judgement forms of the object-logic; its definition wenzelm@864: % introduces a meta-level predicate for each judgement form. lcp@320: clasohm@711: \item[\ndxbold{logic}] denotes terms whose type belongs to class wenzelm@864: \cldx{logic}, excluding type \tydx{prop}. lcp@320: wenzelm@864: \item[\ndxbold{idts}] denotes a list of identifiers, possibly constrained wenzelm@864: by types. wenzelm@3694: wenzelm@3694: \item[\ndxbold{pttrn}, \ndxbold{pttrns}] denote patterns for wenzelm@3694: abstraction, cases etc. Initially the same as $idt$ and $idts$, wenzelm@3694: these are indetended to be augmented by user extensions. lcp@320: lcp@320: \item[\ndxbold{type}] denotes types of the meta-logic. lcp@320: wenzelm@864: \item[\ndxbold{sort}] denotes meta-level sorts. lcp@320: \end{ttdescription} lcp@320: lcp@320: \begin{warn} lcp@320: In {\tt idts}, note that \verb|x::nat y| is parsed as \verb|x::(nat y)|, lcp@320: treating {\tt y} like a type constructor applied to {\tt nat}. The lcp@320: likely result is an error message. To avoid this interpretation, use lcp@320: parentheses and write \verb|(x::nat) y|. lcp@332: \index{type constraints}\index{*:: symbol} lcp@320: lcp@320: Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and lcp@320: yields an error. The correct form is \verb|(x::nat) (y::nat)|. lcp@320: \end{warn} lcp@320: nipkow@452: \begin{warn} paulson@3485: Type constraints bind very weakly. For example, \verb!x Syntax.syntax wenzelm@864: print_syntax : theory -> unit lcp@320: Syntax.print_syntax : Syntax.syntax -> unit lcp@320: Syntax.print_gram : Syntax.syntax -> unit lcp@320: Syntax.print_trans : Syntax.syntax -> unit lcp@320: \end{ttbox} lcp@320: The abstract type \mltydx{Syntax.syntax} allows manipulation of syntaxes lcp@320: in \ML. You can display values of this type by calling the following lcp@320: functions: lcp@320: \begin{ttdescription} lcp@320: \item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle lcp@320: theory~{\it thy} as an \ML\ value. lcp@320: wenzelm@864: \item[\ttindexbold{print_syntax} $thy$] displays the syntax part of $thy$ wenzelm@864: using {\tt Syntax.print_syntax}. wenzelm@864: lcp@320: \item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all lcp@320: information contained in the syntax {\it syn}. The displayed output can lcp@320: be large. The following two functions are more selective. lcp@320: lcp@320: \item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part wenzelm@864: of~{\it syn}, namely the lexicon, logical types and productions. These are lcp@320: discussed below. lcp@320: lcp@320: \item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation lcp@320: part of~{\it syn}, namely the constants, parse/print macros and lcp@320: parse/print translations. lcp@320: \end{ttdescription} lcp@320: lcp@320: Let us demonstrate these functions by inspecting Pure's syntax. Even that lcp@320: is too verbose to display in full. lcp@320: \begin{ttbox}\index{*Pure theory} lcp@320: Syntax.print_syntax (syn_of Pure.thy); lcp@320: {\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots} wenzelm@864: {\out logtypes: fun itself} lcp@320: {\out prods:} lcp@320: {\out type = tid (1000)} lcp@320: {\out type = tvar (1000)} lcp@320: {\out type = id (1000)} lcp@320: {\out type = tid "::" sort[0] => "_ofsort" (1000)} lcp@320: {\out type = tvar "::" sort[0] => "_ofsort" (1000)} lcp@320: {\out \vdots} lcp@320: \ttbreak wenzelm@3108: {\out print modes: "symbols" "xterm"} lcp@320: {\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots} lcp@320: {\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"} lcp@320: {\out "_idtyp" "_lambda" "_tapp" "_tappl"} lcp@320: {\out parse_rules:} lcp@320: {\out parse_translation: "!!" "_K" "_abs" "_aprop"} lcp@320: {\out print_translation: "all"} lcp@320: {\out print_rules:} lcp@320: {\out print_ast_translation: "==>" "_abs" "_idts" "fun"} lcp@320: \end{ttbox} lcp@320: lcp@332: As you can see, the output is divided into labelled sections. The grammar wenzelm@864: is represented by {\tt lexicon}, {\tt logtypes} and {\tt prods}. The rest lcp@320: refers to syntactic translations and macro expansion. Here is an lcp@320: explanation of the various sections. lcp@320: \begin{description} lcp@320: \item[{\tt lexicon}] lists the delimiters used for lexical wenzelm@864: analysis.\index{delimiters} lcp@320: wenzelm@864: \item[{\tt logtypes}] lists the types that are regarded the same as {\tt paulson@3485: logic} syntactically. Thus types of object-logics (e.g.\ {\tt nat}, say) wenzelm@864: will be automatically equipped with the standard syntax of wenzelm@864: $\lambda$-calculus. lcp@320: lcp@320: \item[{\tt prods}] lists the \rmindex{productions} of the priority grammar. lcp@320: The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}. lcp@320: Each delimiter is quoted. Some productions are shown with {\tt =>} and lcp@320: an attached string. These strings later become the heads of parse lcp@320: trees; they also play a vital role when terms are printed (see lcp@320: \S\ref{sec:asts}). lcp@320: lcp@320: Productions with no strings attached are called {\bf copy lcp@320: productions}\indexbold{productions!copy}. Their right-hand side must lcp@320: have exactly one nonterminal symbol (or name token). The parser does lcp@320: not create a new parse tree node for copy productions, but simply lcp@320: returns the parse tree of the right-hand symbol. lcp@320: lcp@320: If the right-hand side consists of a single nonterminal with no lcp@320: delimiters, then the copy production is called a {\bf chain lcp@320: production}. Chain productions act as abbreviations: lcp@320: conceptually, they are removed from the grammar by adding new lcp@320: productions. Priority information attached to chain productions is lcp@320: ignored; only the dummy value $-1$ is displayed. wenzelm@3108: wenzelm@3108: \item[\ttindex{print_modes}] lists the alternative print modes wenzelm@3108: provided by this syntax (see \S\ref{sec:prmodes}). lcp@320: lcp@320: \item[{\tt consts}, {\tt parse_rules}, {\tt print_rules}] lcp@320: relate to macros (see \S\ref{sec:macros}). lcp@320: lcp@320: \item[{\tt parse_ast_translation}, {\tt print_ast_translation}] lcp@320: list sets of constants that invoke translation functions for abstract lcp@320: syntax trees. Section \S\ref{sec:asts} below discusses this obscure lcp@320: matter.\index{constants!for translations} lcp@320: lcp@320: \item[{\tt parse_translation}, {\tt print_translation}] list sets lcp@320: of constants that invoke translation functions for terms (see lcp@320: \S\ref{sec:tr_funs}). lcp@320: \end{description} lcp@320: \index{syntax!Pure|)} lcp@320: lcp@320: lcp@320: \section{Mixfix declarations} \label{sec:mixfix} wenzelm@864: \index{mixfix declarations|(} lcp@320: lcp@320: When defining a theory, you declare new constants by giving their names, lcp@320: their type, and an optional {\bf mixfix annotation}. Mixfix annotations lcp@320: allow you to extend Isabelle's basic $\lambda$-calculus syntax with lcp@320: readable notation. They can express any context-free priority grammar. lcp@320: Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more wenzelm@864: general than the priority declarations of \ML\ and Prolog. lcp@320: lcp@320: A mixfix annotation defines a production of the priority grammar. It lcp@320: describes the concrete syntax, the translation to abstract syntax, and the lcp@320: pretty printing. Special case annotations provide a simple means of wenzelm@864: specifying infix operators and binders. lcp@320: lcp@320: \subsection{The general mixfix form} lcp@320: Here is a detailed account of mixfix declarations. Suppose the following wenzelm@864: line occurs within a {\tt consts} or {\tt syntax} section of a {\tt .thy} wenzelm@864: file: lcp@320: \begin{center} lcp@320: {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)} lcp@320: \end{center} lcp@332: This constant declaration and mixfix annotation are interpreted as follows: lcp@320: \begin{itemize}\index{productions} lcp@320: \item The string {\tt $c$} is the name of the constant associated with the lcp@320: production; unless it is a valid identifier, it must be enclosed in lcp@320: quotes. If $c$ is empty (given as~{\tt ""}) then this is a copy lcp@320: production.\index{productions!copy} Otherwise, parsing an instance of the lcp@320: phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$ lcp@320: $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th lcp@320: argument. lcp@320: wenzelm@864: \item The constant $c$, if non-empty, is declared to have type $\sigma$ wenzelm@864: ({\tt consts} section only). lcp@320: lcp@320: \item The string $template$ specifies the right-hand side of lcp@320: the production. It has the form wenzelm@864: \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \] lcp@320: where each occurrence of {\tt_} denotes an argument position and lcp@320: the~$w@i$ do not contain~{\tt _}. (If you want a literal~{\tt _} in lcp@320: the concrete syntax, you must escape it as described below.) The $w@i$ wenzelm@864: may consist of \rmindex{delimiters}, spaces or lcp@320: \rmindex{pretty printing} annotations (see below). lcp@320: lcp@320: \item The type $\sigma$ specifies the production's nonterminal symbols lcp@320: (or name tokens). If $template$ is of the form above then $\sigma$ lcp@320: must be a function type with at least~$n$ argument positions, say lcp@320: $\sigma = [\tau@1, \dots, \tau@n] \To \tau$. Nonterminal symbols are lcp@320: derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described wenzelm@864: below. Any of these may be function types. lcp@320: lcp@320: \item The optional list~$ps$ may contain at most $n$ integers, say {\tt lcp@320: [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal lcp@320: priority\indexbold{priorities} required of any phrase that may appear lcp@320: as the $i$-th argument. Missing priorities default to~0. lcp@320: lcp@320: \item The integer $p$ is the priority of this production. If omitted, it lcp@320: defaults to the maximal priority. lcp@320: Priorities range between 0 and \ttindexbold{max_pri} (= 1000). lcp@320: \end{itemize} lcp@320: % wenzelm@864: The resulting production is \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\, wenzelm@864: A@2^{(p@2)}\, \dots\, A@n^{(p@n)}\, w@n \] where $A$ and the $A@i$ are the wenzelm@864: nonterminals corresponding to the types $\tau$ and $\tau@i$ respectively. wenzelm@864: The nonterminal symbol associated with a type $(\ldots)ty$ is {\tt logic}, if wenzelm@864: this is a logical type (namely one of class {\tt logic} excluding {\tt wenzelm@864: prop}). Otherwise it is $ty$ (note that only the outermost type constructor wenzelm@864: is taken into account). Finally, the nonterminal of a type variable is {\tt wenzelm@864: any}. wenzelm@864: wenzelm@911: \begin{warn} wenzelm@864: Theories must sometimes declare types for purely syntactic purposes --- paulson@3485: merely playing the role of nonterminals. One example is \tydx{type}, the wenzelm@864: built-in type of types. This is a `type of all types' in the syntactic wenzelm@864: sense only. Do not declare such types under {\tt arities} as belonging to wenzelm@864: class {\tt logic}\index{*logic class}, for that would make them useless as wenzelm@864: separate nonterminal symbols. wenzelm@864: \end{warn} wenzelm@864: wenzelm@864: Associating nonterminals with types allows a constant's type to specify wenzelm@864: syntax as well. We can declare the function~$f$ to have type $[\tau@1, wenzelm@864: \ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the layout wenzelm@864: of the function's $n$ arguments. The constant's name, in this case~$f$, will wenzelm@864: also serve as the label in the abstract syntax tree. wenzelm@864: wenzelm@864: You may also declare mixfix syntax without adding constants to the theory's wenzelm@864: signature, by using a {\tt syntax} section instead of {\tt consts}. Thus a wenzelm@864: production need not map directly to a logical function (this typically wenzelm@864: requires additional syntactic translations, see also wenzelm@864: Chapter~\ref{chap:syntax}). wenzelm@864: wenzelm@864: wenzelm@911: \medskip wenzelm@911: As a special case of the general mixfix declaration, the form wenzelm@864: \begin{center} wenzelm@911: {\tt $c$ ::\ "$\sigma$" ("$template$")} wenzelm@864: \end{center} wenzelm@864: specifies no priorities. The resulting production puts no priority wenzelm@864: constraints on any of its arguments and has maximal priority itself. wenzelm@864: Omitting priorities in this manner is prone to syntactic ambiguities unless berghofe@3098: the production's right-hand side is fully bracketed, as in berghofe@3098: \verb|"if _ then _ else _ fi"|. lcp@320: lcp@320: Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"}, lcp@320: is sensible only if~$c$ is an identifier. Otherwise you will be unable to lcp@320: write terms involving~$c$. lcp@320: lcp@320: lcp@320: \subsection{Example: arithmetic expressions} lcp@320: \index{examples!of mixfix declarations} wenzelm@864: This theory specification contains a {\tt syntax} section with mixfix lcp@320: declarations encoding the priority grammar from lcp@320: \S\ref{sec:priority_grammars}: lcp@320: \begin{ttbox} wenzelm@3108: ExpSyntax = Pure + lcp@320: types lcp@320: exp wenzelm@864: syntax clasohm@1387: "0" :: exp ("0" 9) clasohm@1387: "+" :: [exp, exp] => exp ("_ + _" [0, 1] 0) clasohm@1387: "*" :: [exp, exp] => exp ("_ * _" [3, 2] 2) clasohm@1387: "-" :: exp => exp ("- _" [3] 3) lcp@320: end lcp@320: \end{ttbox} wenzelm@3108: If you put this into a file {\tt ExpSyntax.thy} and load it via {\tt wenzelm@3108: use_thy"ExpSyntax"}, you can run some tests: lcp@320: \begin{ttbox} wenzelm@3108: val read_exp = Syntax.test_read (syn_of ExpSyntax.thy) "exp"; lcp@320: {\out val it = fn : string -> unit} lcp@320: read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0"; lcp@320: {\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"} lcp@320: {\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")} lcp@320: {\out \vdots} lcp@320: read_exp "0 + - 0 + 0"; lcp@320: {\out tokens: "0" "+" "-" "0" "+" "0"} lcp@320: {\out raw: ("+" ("+" "0" ("-" "0")) "0")} lcp@320: {\out \vdots} lcp@320: \end{ttbox} lcp@320: The output of \ttindex{Syntax.test_read} includes the token list ({\tt lcp@320: tokens}) and the raw \AST{} directly derived from the parse tree, lcp@320: ignoring parse \AST{} translations. The rest is tracing information lcp@320: provided by the macro expander (see \S\ref{sec:macros}). lcp@320: wenzelm@864: Executing {\tt Syntax.print_gram} reveals the productions derived from the wenzelm@864: above mixfix declarations (lots of additional information deleted): lcp@320: \begin{ttbox} wenzelm@3108: Syntax.print_gram (syn_of ExpSyntax.thy); lcp@320: {\out exp = "0" => "0" (9)} lcp@320: {\out exp = exp[0] "+" exp[1] => "+" (0)} lcp@320: {\out exp = exp[3] "*" exp[2] => "*" (2)} lcp@320: {\out exp = "-" exp[3] => "-" (3)} lcp@320: \end{ttbox} lcp@320: wenzelm@3108: Note that because {\tt exp} is not of class {\tt logic}, it has been paulson@3485: retained as a separate nonterminal. This also entails that the syntax wenzelm@3108: does not provide for identifiers or paranthesized expressions. wenzelm@3108: Normally you would also want to add the declaration {\tt arities wenzelm@3108: exp::logic} after {\tt types} and use {\tt consts} instead of {\tt paulson@3485: syntax}. Try this as an exercise and study the changes in the nipkow@867: grammar. lcp@320: lcp@320: \subsection{The mixfix template} wenzelm@864: Let us now take a closer look at the string $template$ appearing in mixfix lcp@320: annotations. This string specifies a list of parsing and printing lcp@320: directives: delimiters\index{delimiters}, arguments, spaces, blocks of lcp@320: indentation and line breaks. These are encoded by the following character lcp@320: sequences: lcp@320: \index{pretty printing|(} lcp@320: \begin{description} lcp@320: \item[~$d$~] is a delimiter, namely a non-empty sequence of characters lcp@320: other than the special characters {\tt _}, {\tt(}, {\tt)} and~{\tt/}. lcp@320: Even these characters may appear if escaped; this means preceding it with lcp@320: a~{\tt '} (single quote). Thus you have to write {\tt ''} if you really wenzelm@911: want a single quote. Furthermore, a~{\tt '} followed by a space separates wenzelm@911: delimiters without extra white space being added for printing. lcp@320: lcp@320: \item[~{\tt_}~] is an argument position, which stands for a nonterminal symbol lcp@320: or name token. lcp@320: lcp@320: \item[~$s$~] is a non-empty sequence of spaces for printing. This and the lcp@320: following specifications do not affect parsing at all. lcp@320: lcp@320: \item[~{\tt(}$n$~] opens a pretty printing block. The optional number $n$ lcp@320: specifies how much indentation to add when a line break occurs within the lcp@320: block. If {\tt(} is not followed by digits, the indentation defaults lcp@320: to~0. lcp@320: lcp@320: \item[~{\tt)}~] closes a pretty printing block. lcp@320: lcp@320: \item[~{\tt//}~] forces a line break. lcp@320: lcp@320: \item[~{\tt/}$s$~] allows a line break. Here $s$ stands for the string of lcp@320: spaces (zero or more) right after the {\tt /} character. These spaces lcp@320: are printed if the break is not taken. lcp@320: \end{description} lcp@320: For example, the template {\tt"(_ +/ _)"} specifies an infix operator. lcp@320: There are two argument positions; the delimiter~{\tt+} is preceded by a lcp@320: space and followed by a space or line break; the entire phrase is a pretty lcp@320: printing block. Other examples appear in Fig.\ts\ref{fig:set_trans} below. lcp@320: Isabelle's pretty printer resembles the one described in lcp@320: Paulson~\cite{paulson91}. lcp@320: lcp@320: \index{pretty printing|)} lcp@320: lcp@320: lcp@320: \subsection{Infixes} lcp@320: \indexbold{infixes} lcp@320: wenzelm@3108: Infix operators associating to the left or right can be declared using wenzelm@3108: {\tt infixl} or {\tt infixr}. Basically, the form {\tt $c$ ::\ wenzelm@3108: $\sigma$ (infixl $p$)} abbreviates the mixfix declarations lcp@320: \begin{ttbox} clasohm@1387: "op \(c\)" :: \(\sigma\) ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\)) clasohm@1387: "op \(c\)" :: \(\sigma\) ("op \(c\)") lcp@320: \end{ttbox} clasohm@1387: and {\tt $c$ ::\ $\sigma$ (infixr $p$)} abbreviates the mixfix declarations lcp@320: \begin{ttbox} clasohm@1387: "op \(c\)" :: \(\sigma\) ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\)) clasohm@1387: "op \(c\)" :: \(\sigma\) ("op \(c\)") lcp@320: \end{ttbox} lcp@320: The infix operator is declared as a constant with the prefix {\tt op}. lcp@320: Thus, prefixing infixes with \sdx{op} makes them behave like ordinary lcp@320: function symbols, as in \ML. Special characters occurring in~$c$ must be lcp@320: escaped, as in delimiters, using a single quote. lcp@320: wenzelm@3108: A slightly more general form of infix declarations allows constant wenzelm@3108: names to be independent from their concrete syntax, namely \texttt{$c$ paulson@3485: ::\ $\sigma$\ (infixl "$sy$" $p$)}, the same for \texttt{infixr}. As wenzelm@3108: an example consider: wenzelm@3108: \begin{ttbox} wenzelm@3108: and :: [bool, bool] => bool (infixr "&" 35) wenzelm@3108: \end{ttbox} wenzelm@3108: The internal constant name will then be just \texttt{and}, without any wenzelm@3108: \texttt{op} prefixed. wenzelm@3108: lcp@320: lcp@320: \subsection{Binders} lcp@320: \indexbold{binders} lcp@320: \begingroup lcp@320: \def\Q{{\cal Q}} lcp@320: A {\bf binder} is a variable-binding construct such as a quantifier. The lcp@320: constant declaration lcp@320: \begin{ttbox} clasohm@1387: \(c\) :: \(\sigma\) (binder "\(\Q\)" [\(pb\)] \(p\)) lcp@320: \end{ttbox} lcp@320: introduces a constant~$c$ of type~$\sigma$, which must have the form lcp@320: $(\tau@1 \To \tau@2) \To \tau@3$. Its concrete syntax is $\Q~x.P$, where lcp@320: $x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$ paulson@3485: and the whole term has type~$\tau@3$. The optional integer $pb$ lcp@1060: specifies the body's priority, by default~$p$. Special characters clasohm@877: in $\Q$ must be escaped using a single quote. lcp@320: wenzelm@864: The declaration is expanded internally to something like lcp@320: \begin{ttbox} berghofe@3098: \(c\)\hskip3pt :: (\(\tau@1\) => \(\tau@2\)) => \(\tau@3\) berghofe@3098: "\(\Q\)" :: [idts, \(\tau@2\)] => \(\tau@3\) ("(3\(\Q\)_./ _)" [0, \(pb\)] \(p\)) lcp@320: \end{ttbox} lcp@320: Here \ndx{idts} is the nonterminal symbol for a list of identifiers with lcp@332: \index{type constraints} lcp@320: optional type constraints (see Fig.\ts\ref{fig:pure_gram}). The lcp@320: declaration also installs a parse translation\index{translations!parse} lcp@320: for~$\Q$ and a print translation\index{translations!print} for~$c$ to lcp@320: translate between the internal and external forms. lcp@320: lcp@320: A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a lcp@320: list of variables. The external form $\Q~x@1~x@2 \ldots x@n. P$ lcp@320: corresponds to the internal form lcp@320: \[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \] lcp@320: lcp@320: \medskip lcp@320: For example, let us declare the quantifier~$\forall$:\index{quantifiers} lcp@320: \begin{ttbox} clasohm@1387: All :: ('a => o) => o (binder "ALL " 10) lcp@320: \end{ttbox} lcp@320: This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL lcp@320: $x$.$P$}. When printing, Isabelle prefers the latter form, but must fall lcp@320: back on ${\tt All}(P)$ if $P$ is not an abstraction. Both $P$ and {\tt ALL lcp@320: $x$.$P$} have type~$o$, the type of formulae, while the bound variable lcp@320: can be polymorphic. lcp@320: \endgroup lcp@320: lcp@320: \index{mixfix declarations|)} lcp@320: wenzelm@3108: wenzelm@3108: \section{*Alternative print modes} \label{sec:prmodes} wenzelm@3108: \index{print modes|(} wenzelm@3108: % paulson@3485: Isabelle's pretty printer supports alternative output syntaxes. These paulson@3485: may be used independently or in cooperation. The currently active wenzelm@3108: print modes (with precedence from left to right) are determined by a wenzelm@3108: reference variable. wenzelm@3108: \begin{ttbox}\index{*print_mode} wenzelm@3108: print_mode: string list ref wenzelm@3108: \end{ttbox} wenzelm@3108: Initially this may already contain some print mode identifiers, wenzelm@3108: depending on how Isabelle has been invoked (e.g.\ by some user paulson@3485: interface). So changes should be incremental --- adding or deleting wenzelm@3108: modes relative to the current value. wenzelm@3108: wenzelm@3108: Any \ML{} string is a legal print mode identifier, without any wenzelm@3108: predeclaration required. The following names should be considered wenzelm@3108: reserved, though: \texttt{""} (yes, the empty string), wenzelm@3108: \texttt{symbols}, \texttt{latex}, \texttt{xterm}. wenzelm@3108: wenzelm@3108: There is a separate table of mixfix productions for pretty printing paulson@3485: associated with each print mode. The currently active ones are wenzelm@3108: conceptually just concatenated from left to right, with the standard wenzelm@3108: syntax output table always coming last as default. Thus mixfix wenzelm@3108: productions of preceding modes in the list may override those of later wenzelm@3108: ones. Also note that token translations are always relative to some wenzelm@3108: print mode (see \S\ref{sec:tok_tr}). wenzelm@3108: wenzelm@3108: \medskip The canonical application of print modes is optional printing wenzelm@3108: of mathematical symbols from a special screen font instead of {\sc paulson@3485: ascii}. Another example is to re-use Isabelle's advanced wenzelm@3108: $\lambda$-term printing mechanisms to generate completely different wenzelm@3228: output, say for interfacing external tools like \rmindex{model wenzelm@3228: checkers} (see also \texttt{HOL/Modelcheck}). wenzelm@3108: wenzelm@3108: \index{print modes|)} wenzelm@3108: wenzelm@3108: clasohm@711: \section{Ambiguity of parsed expressions} \label{sec:ambiguity} clasohm@711: \index{ambiguity!of parsed expressions} clasohm@711: clasohm@711: To keep the grammar small and allow common productions to be shared wenzelm@864: all logical types (except {\tt prop}) are internally represented paulson@3485: by one nonterminal, namely {\tt logic}. This and omitted or too freely clasohm@711: chosen priorities may lead to ways of parsing an expression that were paulson@3485: not intended by the theory's maker. In most cases Isabelle is able to wenzelm@864: select one of multiple parse trees that an expression has lead paulson@3485: to by checking which of them can be typed correctly. But this may not clasohm@711: work in every case and always slows down parsing. wenzelm@864: The warning and error messages that can be produced during this process are clasohm@711: as follows: clasohm@711: clasohm@880: If an ambiguity can be resolved by type inference the following clasohm@880: warning is shown to remind the user that parsing is (unnecessarily) paulson@3485: slowed down. In cases where it's not easily possible to eliminate the clasohm@880: ambiguity the frequency of the warning can be controlled by changing clasohm@883: the value of {\tt Syntax.ambiguity_level} which has type {\tt int paulson@3485: ref}. Its default value is 1 and by increasing it one can control how clasohm@883: many parse trees are necessary to generate the warning. clasohm@711: clasohm@711: \begin{ttbox} clasohm@711: {\out Warning: Ambiguous input "..."} clasohm@711: {\out produces the following parse trees:} clasohm@711: {\out ...} clasohm@711: {\out Fortunately, only one parse tree is type correct.} clasohm@711: {\out It helps (speed!) if you disambiguate your grammar or your input.} clasohm@711: \end{ttbox} clasohm@711: clasohm@711: The following message is normally caused by using the same clasohm@711: syntax in two different productions: clasohm@711: clasohm@711: \begin{ttbox} clasohm@711: {\out Warning: Ambiguous input "..."} clasohm@711: {\out produces the following parse trees:} clasohm@711: {\out ...} clasohm@711: {\out Error: More than one term is type correct:} clasohm@711: {\out ...} clasohm@711: \end{ttbox} clasohm@711: clasohm@866: Ambiguities occuring in syntax translation rules cannot be resolved by clasohm@866: type inference because it is not necessary for these rules to be type paulson@3485: correct. Therefore Isabelle always generates an error message and the clasohm@866: ambiguity should be eliminated by changing the grammar or the rule. clasohm@711: lcp@320: lcp@320: \section{Example: some minimal logics} \label{sec:min_logics} lcp@320: \index{examples!of logic definitions} lcp@320: lcp@320: This section presents some examples that have a simple syntax. They lcp@320: demonstrate how to define new object-logics from scratch. lcp@320: clasohm@711: First we must define how an object-logic syntax is embedded into the wenzelm@864: meta-logic. Since all theorems must conform to the syntax for~\ndx{prop} wenzelm@864: (see Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the lcp@320: object-level syntax. Assume that the syntax of your object-logic defines a wenzelm@864: meta-type~\tydx{o} of formulae which refers to the nonterminal {\tt logic}. wenzelm@864: These formulae can now appear in axioms and theorems wherever \ndx{prop} does wenzelm@864: if you add the production wenzelm@864: \[ prop ~=~ logic. \] wenzelm@864: This is not supposed to be a copy production but an implicit coercion from wenzelm@864: formulae to propositions: lcp@320: \begin{ttbox} lcp@320: Base = Pure + lcp@320: types lcp@320: o lcp@320: arities lcp@320: o :: logic lcp@320: consts clasohm@1387: Trueprop :: o => prop ("_" 5) lcp@320: end lcp@320: \end{ttbox} lcp@320: The constant \cdx{Trueprop} (the name is arbitrary) acts as an invisible lcp@332: coercion function. Assuming this definition resides in a file {\tt Base.thy}, lcp@320: you have to load it with the command {\tt use_thy "Base"}. lcp@320: lcp@320: One of the simplest nontrivial logics is {\bf minimal logic} of lcp@320: implication. Its definition in Isabelle needs no advanced features but lcp@320: illustrates the overall mechanism nicely: lcp@320: \begin{ttbox} lcp@320: Hilbert = Base + lcp@320: consts clasohm@1387: "-->" :: [o, o] => o (infixr 10) lcp@320: rules lcp@320: K "P --> Q --> P" lcp@320: S "(P --> Q --> R) --> (P --> Q) --> P --> R" lcp@320: MP "[| P --> Q; P |] ==> Q" lcp@320: end lcp@320: \end{ttbox} lcp@332: After loading this definition from the file {\tt Hilbert.thy}, you can lcp@320: start to prove theorems in the logic: lcp@320: \begin{ttbox} lcp@320: goal Hilbert.thy "P --> P"; lcp@320: {\out Level 0} lcp@320: {\out P --> P} lcp@320: {\out 1. P --> P} lcp@320: \ttbreak lcp@320: by (resolve_tac [Hilbert.MP] 1); lcp@320: {\out Level 1} lcp@320: {\out P --> P} lcp@320: {\out 1. ?P --> P --> P} lcp@320: {\out 2. ?P} lcp@320: \ttbreak lcp@320: by (resolve_tac [Hilbert.MP] 1); lcp@320: {\out Level 2} lcp@320: {\out P --> P} lcp@320: {\out 1. ?P1 --> ?P --> P --> P} lcp@320: {\out 2. ?P1} lcp@320: {\out 3. ?P} lcp@320: \ttbreak lcp@320: by (resolve_tac [Hilbert.S] 1); lcp@320: {\out Level 3} lcp@320: {\out P --> P} lcp@320: {\out 1. P --> ?Q2 --> P} lcp@320: {\out 2. P --> ?Q2} lcp@320: \ttbreak lcp@320: by (resolve_tac [Hilbert.K] 1); lcp@320: {\out Level 4} lcp@320: {\out P --> P} lcp@320: {\out 1. P --> ?Q2} lcp@320: \ttbreak lcp@320: by (resolve_tac [Hilbert.K] 1); lcp@320: {\out Level 5} lcp@320: {\out P --> P} lcp@320: {\out No subgoals!} lcp@320: \end{ttbox} lcp@320: As we can see, this Hilbert-style formulation of minimal logic is easy to lcp@320: define but difficult to use. The following natural deduction formulation is lcp@320: better: lcp@320: \begin{ttbox} lcp@320: MinI = Base + lcp@320: consts clasohm@1387: "-->" :: [o, o] => o (infixr 10) lcp@320: rules lcp@320: impI "(P ==> Q) ==> P --> Q" lcp@320: impE "[| P --> Q; P |] ==> Q" lcp@320: end lcp@320: \end{ttbox} lcp@320: Note, however, that although the two systems are equivalent, this fact lcp@320: cannot be proved within Isabelle. Axioms {\tt S} and {\tt K} can be lcp@320: derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt lcp@320: Hilbert}. The reason is that {\tt impI} is only an {\bf admissible} rule lcp@320: in {\tt Hilbert}, something that can only be shown by induction over all lcp@320: possible proofs in {\tt Hilbert}. lcp@320: lcp@320: We may easily extend minimal logic with falsity: lcp@320: \begin{ttbox} lcp@320: MinIF = MinI + lcp@320: consts clasohm@1387: False :: o lcp@320: rules lcp@320: FalseE "False ==> P" lcp@320: end lcp@320: \end{ttbox} lcp@320: On the other hand, we may wish to introduce conjunction only: lcp@320: \begin{ttbox} lcp@320: MinC = Base + lcp@320: consts clasohm@1387: "&" :: [o, o] => o (infixr 30) lcp@320: \ttbreak lcp@320: rules lcp@320: conjI "[| P; Q |] ==> P & Q" lcp@320: conjE1 "P & Q ==> P" lcp@320: conjE2 "P & Q ==> Q" lcp@320: end lcp@320: \end{ttbox} lcp@320: And if we want to have all three connectives together, we create and load a wenzelm@3108: theory file consisting of a single line: lcp@320: \begin{ttbox} lcp@320: MinIFC = MinIF + MinC lcp@320: \end{ttbox} lcp@320: Now we can prove mixed theorems like lcp@320: \begin{ttbox} lcp@320: goal MinIFC.thy "P & False --> Q"; lcp@320: by (resolve_tac [MinI.impI] 1); lcp@320: by (dresolve_tac [MinC.conjE2] 1); lcp@320: by (eresolve_tac [MinIF.FalseE] 1); lcp@320: \end{ttbox} lcp@320: Try this as an exercise!