2 \chapter{Isar Syntax Primitives}
4 We give a complete reference of all basic syntactic entities underlying the
5 Isabelle/Isar document syntax. Actual theory and proof commands will be
10 In order to get started with writing well-formed Isabelle/Isar documents, the
11 most important aspect to be noted is the difference of \emph{inner} versus
12 \emph{outer} syntax. Inner syntax is that of Isabelle types and terms of the
13 logic, while outer syntax is that of Isabelle/Isar theories (including
14 proofs). As a general rule, inner syntax entities may occur only as
15 \emph{atomic entities} within outer syntax. For example, the string
16 \texttt{"x + y"} and identifier \texttt{z} are legal term specifications
17 within a theory, while \texttt{x + y} is not.
20 Note that classic Isabelle theories used to fake parts of the inner syntax
21 of types, with rather complicated rules when quotes may be omitted. Despite
22 the minor drawback of requiring quotes more often, the syntax of
23 Isabelle/Isar is much simpler and more robust in that respect.
28 Isabelle/Isar input may contain any number of input termination characters
29 ``\texttt{;}'' (semicolon) to separate commands explicitly. This may be
30 particularly useful in interactive shell sessions to make clear where the
31 current command is intended to end. Otherwise, the interactive loop will
32 continue until end-of-command in clearly indicated from the input syntax,
33 e.g.\ encounter of the next command keyword.
35 Advanced interfaces such as Proof~General \cite{proofgeneral} do not require
36 explicit semicolons, the amount of input text is determined automatically by
37 inspecting the Emacs text buffer. Also note that in the presentation of
38 Isabelle/Isar documents, semicolons are omitted in any case to gain
42 \section{Lexical matters}\label{sec:lex-syntax}
44 The Isabelle/Isar outer syntax provides token classes as presented below.
45 Note that some of these coincide (by full intention) with the inner lexical
46 syntax as presented in \cite{isabelle-ref}. These different levels of syntax
47 should not be confused, though.
49 %FIXME keyword, command
50 \indexoutertoken{ident}\indexoutertoken{longident}\indexoutertoken{symident}
51 \indexoutertoken{nat}\indexoutertoken{var}\indexoutertoken{typefree}
52 \indexoutertoken{typevar}\indexoutertoken{string}\indexoutertoken{verbatim}
53 \begin{matharray}{rcl}
54 ident & = & letter~quasiletter^* \\
55 longident & = & ident\verb,.,ident~\dots~ident \\
56 symident & = & sym^+ ~|~ symbol \\
58 var & = & \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\
59 typefree & = & \verb,',ident \\
60 typevar & = & \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\
61 string & = & \verb,", ~\dots~ \verb,", \\
62 verbatim & = & \verb,{*, ~\dots~ \verb,*}, \\
64 \begin{matharray}{rcl}
65 letter & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\
66 digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\
67 quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\
68 sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~ %$
69 \verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \verb,:, ~|~
70 \verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \texttt{\at} ~|~ \\
71 & & \verb,^, ~|~ \verb,_, ~|~ \verb,`, ~|~ \verb,|, ~|~ \verb,~, \\
72 symbol & = & {\forall} ~|~ {\exists} ~|~ \dots
75 The syntax of \texttt{string} admits any characters, including newlines;
76 ``\verb|"|'' (double-quote) and ``\verb|\|'' (backslash) have to be escaped by
77 a backslash; newlines need not be escaped. Note that ML-style control
78 characters are \emph{not} supported. The body of \texttt{verbatim} may
79 consist of any text not containing ``\verb|*}|''.
81 Comments take the form \texttt{(*~\dots~*)} and may be
82 nested\footnote{Proof~General may occasionally get confused by nested
83 comments.}, just as in ML. Note that these are \emph{source} comments only,
84 which are stripped after lexical analysis of the input. The Isar document
85 syntax also provides \emph{formal comments} that are actually part of the text
86 (see \S\ref{sec:comments}).
89 \section{Common syntax entities}
91 Subsequently, we introduce several basic syntactic entities, such as names,
92 terms, and theorem specifications, which have been factored out of the actual
93 Isar language elements to be described later.
95 Note that some of the basic syntactic entities introduced below (e.g.\
96 \railqtoken{name}) act much like tokens rather than plain nonterminals (e.g.\
97 \railnonterm{sort}), especially for the sake of error messages. E.g.\ syntax
98 elements such as $\CONSTS$ referring to \railqtoken{name} or \railqtoken{type}
99 would really report a missing name or type rather than any of the constituent
100 primitive tokens such as \railtoken{ident} or \railtoken{string}.
105 Entity \railqtoken{name} usually refers to any name of types, constants,
106 theorems etc.\ that are to be \emph{declared} or \emph{defined} (so qualified
107 identifiers are excluded here). Quoted strings provide an escape for
108 non-identifier names or those ruled out by outer syntax keywords (e.g.\
109 \verb|"let"|). Already existing objects are usually referenced by
110 \railqtoken{nameref}.
112 \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
113 \indexoutertoken{int}
115 name: ident | symident | string | nat
117 parname: '(' name ')'
119 nameref: name | longident
126 \subsection{Comments}\label{sec:comments}
128 Large chunks of plain \railqtoken{text} are usually given
129 \railtoken{verbatim}, i.e.\ enclosed in \verb|{*|~\dots~\verb|*}|. For
130 convenience, any of the smaller text units conforming to \railqtoken{nameref}
131 are admitted as well. Almost any of the Isar commands may be annotated by
132 marginal \railnonterm{comment} of the form \texttt{--} \railqtoken{text}.
133 Note that the latter kind of comment is actually part of the language, while
134 source level comments \verb|(*|~\dots~\verb|*)| are stripped at the lexical
135 level. A few commands such as $\PROOFNAME$ admit additional markup with a
136 ``level of interest'': \texttt{\%} followed by an optional number $n$ (default
137 $n = 1$) indicates that the respective part of the document becomes $n$ levels
138 more obscure; \texttt{\%\%} means that interest drops by $\infty$ --- abandon
139 every hope, who enter here.
141 \indexoutertoken{text}\indexouternonterm{comment}\indexouternonterm{interest}
143 text: verbatim | nameref
145 comment: ('--' text +)
147 interest: percent nat? | ppercent
152 \subsection{Type classes, sorts and arities}
154 Classes are specified by plain names. Sorts have a very simple inner syntax,
155 which is either a single class name $c$ or a list $\{c@1, \dots, c@n\}$
156 referring to the intersection of these classes. The syntax of type arities is
157 given directly at the outer level.
159 \indexouternonterm{sort}\indexouternonterm{arity}\indexouternonterm{simplearity}
160 \indexouternonterm{classdecl}
162 classdecl: name ('<' (nameref + ','))?
166 arity: ('(' (sort + ',') ')')? sort
168 simplearity: ('(' (sort + ',') ')')? nameref
173 \subsection{Types and terms}\label{sec:types-terms}
175 The actual inner Isabelle syntax, that of types and terms of the logic, is far
176 too sophisticated in order to be modelled explicitly at the outer theory
177 level. Basically, any such entity has to be quoted to turn it into a single
178 token (the parsing and type-checking is performed internally later). For
179 convenience, a slightly more liberal convention is adopted: quotes may be
180 omitted for any type or term that is already \emph{atomic} at the outer level.
181 For example, one may write just \texttt{x} instead of \texttt{"x"}. Note that
182 symbolic identifiers (e.g.\ \texttt{++} or $\forall$) are available as well,
183 provided these are not superseded by commands or keywords (e.g.\ \texttt{+}).
185 \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
187 type: nameref | typefree | typevar
195 Positional instantiations are indicated by giving a sequence of terms, or the
196 placeholder ``$\_$'' (underscore), which means to skip a position.
198 \indexoutertoken{inst}\indexoutertoken{insts}
200 inst: underscore | term
206 Type declarations and definitions usually refer to \railnonterm{typespec} on
207 the left-hand side. This models basic type constructor application at the
208 outer syntax level. Note that only plain postfix notation is available here,
211 \indexouternonterm{typespec}
213 typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
218 \subsection{Term patterns}\label{sec:term-pats}
220 Assumptions and goal statements usually admit casual binding of schematic term
221 variables by giving (optional) patterns of the form $\ISS{p@1\;\dots}{p@n}$.
222 There are separate versions available for \railqtoken{term}s and
223 \railqtoken{prop}s. The latter provides a $\CONCLNAME$ part with patterns
224 referring the (atomic) conclusion of a rule.
226 \indexouternonterm{termpat}\indexouternonterm{proppat}
228 termpat: '(' ('is' term +) ')'
230 proppat: '(' (('is' prop +) | 'concl' ('is' prop +) | ('is' prop +) 'concl' ('is' prop +)) ')'
235 \subsection{Mixfix annotations}
237 Mixfix annotations specify concrete \emph{inner} syntax of Isabelle types and
238 terms (see also \cite{isabelle-ref}). Some commands such as $\TYPES$ (see
239 \S\ref{sec:types-pure}) admit infixes only, while $\CONSTS$ (see
240 \S\ref{sec:consts}) and $\isarkeyword{syntax}$ (see \S\ref{sec:syn-trans})
241 support the full range of general mixfixes and binders.
243 \indexouternonterm{infix}\indexouternonterm{mixfix}
245 infix: '(' ('infixl' | 'infixr') string? nat ')'
247 mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'
250 prios: '[' (nat + ',') ']'
255 \subsection{Attributes and theorems}\label{sec:syn-att}
257 Attributes (and proof methods, see \S\ref{sec:syn-meth}) have their own
258 ``semi-inner'' syntax, in the sense that input conforming to
259 \railnonterm{args} below is parsed by the attribute a second time. The
260 attribute argument specifications may be any sequence of atomic entities
261 (identifiers, strings etc.), or properly bracketed argument lists. Below
262 \railqtoken{atom} refers to any atomic entity, including any
263 \railtoken{keyword} conforming to \railtoken{symident}.
265 \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
267 atom: nameref | typefree | typevar | var | nat | keyword
269 arg: atom | '(' args ')' | '[' args ']'
273 attributes: '[' (nameref args * ',') ']'
277 Theorem specifications come in several flavors: \railnonterm{axmdecl} and
278 \railnonterm{thmdecl} usually refer to axioms, assumptions or results of goal
279 statements, while \railnonterm{thmdef} collects lists of existing theorems.
280 Existing theorems are given by \railnonterm{thmref} and \railnonterm{thmrefs},
281 the former requires an actual singleton result. Any of these theorem
282 specifications may include lists of attributes both on the left and right hand
283 sides; attributes are applied to any immediately preceding theorem. If names
284 are omitted, the theorems are not stored within the theorem database of the
285 theory or proof context; any given attributes are still applied, though.
287 \indexouternonterm{thmdecl}\indexouternonterm{axmdecl}
288 \indexouternonterm{thmdef}\indexouternonterm{thmrefs}
290 axmdecl: name attributes? ':'
296 thmref: nameref attributes?
301 thmbind: name attributes | name | attributes
306 \subsection{Proof methods}\label{sec:syn-meth}
308 Proof methods are either basic ones, or expressions composed of methods via
309 ``\texttt{,}'' (sequential composition), ``\texttt{|}'' (alternative choices),
310 ``\texttt{?}'' (try), ``\texttt{+}'' (repeat at least once). In practice,
311 proof methods are usually just a comma separated list of
312 \railqtoken{nameref}~\railnonterm{args} specifications. Note that parentheses
313 may be dropped for single method specifications (with no arguments).
315 \indexouternonterm{method}
317 method: (nameref | '(' methods ')') (() | '?' | '+')
319 methods: (nameref args | method) + (',' | '|')
323 Proper use of Isar proof methods does \emph{not} involve goal addressing.
324 Nevertheless, specifying goal ranges may occasionally come in handy in
325 emulating tactic scripts. Note that $[n-]$ refers to all goals, starting from
326 $n$. All goals may be specified by $[!]$, which is the same as $[1-]$.
328 \indexouternonterm{goalspec}
330 goalspec: '[' (nat '-' nat | nat '-' | nat | '!' ) ']'
335 \subsection{Antiquotations}\label{sec:antiq}
337 The text body of formal comments (see also \S\ref{sec:comments}) may contain
338 antiquotations of logical entities, such as theorems, terms and types, which
339 are to be presented in the final output produced by the Isabelle document
340 preparation system (see also \S\ref{sec:document-prep}).
343 \texttt{{\at}{\ttlbrace}term[show_types]~"f(x)~=~a~+~x"{\ttrbrace}} within a
344 text block would cause
345 \isa{(f{\isasymColon}'a~{\isasymRightarrow}~'a)~(x{\isasymColon}'a)~=~(a{\isasymColon}'a)~+~x}
346 to appear in the final {\LaTeX} document.
348 Antiquotations do not only spare the author from tedious typing, but also
349 achieve some degree of consistency-checking of informal explanations with
350 formal developments, since well-formedness of terms and types with respect to
351 the current theory or proof context can be ensured. The $text$ antiquotation
352 is an exception to this rule: it prints an uninterpreted text argument that is
353 not checked in any way.
355 \indexisarant{thm}\indexisarant{prop}\indexisarant{term}
356 \indexisarant{typ}\indexisarant{name}
358 atsign lbrace antiquotation rbrace
362 'thm' options thmrefs |
363 'prop' options prop |
364 'term' options term |
368 options: '[' (option * ',') ']'
370 option: name | name '=' name
374 Note that the syntax of antiquotations may \emph{not} include source comments
375 \texttt{(*~\dots~*)} or verbatim text \verb|{*|~\dots~\verb|*}|.
379 The following options are available to tune the output. Note that some of
380 these coincide with ML flags of the same names (see also \cite{isabelle-ref}).
382 \item[$show_types = bool$ and $show_sorts = bool$] control printing of
383 explicit type and sort constraints.
384 \item[$long_names = bool$] forces names of types and constants etc.\ to be
385 printed in their fully qualified internal form.
386 \item[$eta_contract = bool$] prints terms in $\eta$-contracted form.
387 \item[$display = bool$] indicates if the text is to be output as multi-line
388 ``display material'', rather than a small piece of text without line breaks
389 (which is the default).
390 \item[$quotes = bool$] indicates if the output should be enclosed in double
392 \item[$mode = name$] adds $name$ to the print mode to be used for presentation
393 (see also \cite{isabelle-ref}). Note that the standard setup for {\LaTeX}
394 output is already present by default, including the modes ``$latex$'',
395 ``$xsymbols$'', ``$symbols$''.
396 \item[$margin = nat$ and $indent = nat$] change the margin or indentation for
397 pretty printing of display material.
398 \item[$source = bool$] prints the source text of the antiquotation arguments,
399 rather than the actual value. Note that this does not affect
400 well-formedness checks of $thm$, $term$, etc. (only the $text$ antiquotation
401 admits arbitrary output).
404 For boolean flags, ``$name = true$'' may be abbreviated as ``$name$''. All of
405 the above flags are disabled by default, unless changed from ML.
410 %%% TeX-master: "isar-ref"