5 chapter {* Outer syntax *}
8 The rather generic framework of Isabelle/Isar syntax emerges from
9 three main syntactic categories: \emph{commands} of the top-level
10 Isar engine (covering theory and proof elements), \emph{methods} for
11 general goal refinements (analogous to traditional ``tactics''), and
12 \emph{attributes} for operations on facts (within a certain
13 context). Subsequently we give a reference of basic syntactic
14 entities underlying Isabelle/Isar syntax in a bottom-up manner.
15 Concrete theory and proof language elements will be introduced later
18 \medskip In order to get started with writing well-formed
19 Isabelle/Isar documents, the most important aspect to be noted is
20 the difference of \emph{inner} versus \emph{outer} syntax. Inner
21 syntax is that of Isabelle types and terms of the logic, while outer
22 syntax is that of Isabelle/Isar theory sources (specifications and
23 proofs). As a general rule, inner syntax entities may occur only as
24 \emph{atomic entities} within outer syntax. For example, the string
25 @{verbatim "\"x + y\""} and identifier @{verbatim z} are legal term
26 specifications within a theory, while @{verbatim "x + y"} without
29 Printed theory documents usually omit quotes to gain readability
30 (this is a matter of {\LaTeX} macro setup, say via @{verbatim
31 "\\isabellestyle"}, see also \cite{isabelle-sys}). Experienced
32 users of Isabelle/Isar may easily reconstruct the lost technical
33 information, while mere readers need not care about quotes at all.
35 \medskip Isabelle/Isar input may contain any number of input
36 termination characters ``@{verbatim ";"}'' (semicolon) to separate
37 commands explicitly. This is particularly useful in interactive
38 shell sessions to make clear where the current command is intended
39 to end. Otherwise, the interpreter loop will continue to issue a
40 secondary prompt ``@{verbatim "#"}'' until an end-of-command is
41 clearly recognized from the input syntax, e.g.\ encounter of the
44 More advanced interfaces such as Proof~General \cite{proofgeneral}
45 do not require explicit semicolons, the amount of input text is
46 determined automatically by inspecting the present content of the
47 Emacs text buffer. In the printed presentation of Isabelle/Isar
48 documents semicolons are omitted altogether for readability.
51 Proof~General requires certain syntax classification tables in
52 order to achieve properly synchronized interaction with the
53 Isabelle/Isar process. These tables need to be consistent with
54 the Isabelle version and particular logic image to be used in a
55 running session (common object-logics may well change the outer
56 syntax). The standard setup should work correctly with any of the
57 ``official'' logic images derived from Isabelle/HOL (including
58 HOLCF etc.). Users of alternative logics may need to tell
59 Proof~General explicitly, e.g.\ by giving an option @{verbatim "-k ZF"}
60 (in conjunction with @{verbatim "-l ZF"}, to specify the default
61 logic image). Note that option @{verbatim "-L"} does both
62 of this at the same time.
67 section {* Lexical matters \label{sec:outer-lex} *}
69 text {* The outer lexical syntax consists of three main categories of
74 \item \emph{major keywords} --- the command names that are available
75 in the present logic session;
77 \item \emph{minor keywords} --- additional literal tokens required
78 by the syntax of commands;
80 \item \emph{named tokens} --- various categories of identifiers etc.
84 Major keywords and minor keywords are guaranteed to be disjoint.
85 This helps user-interfaces to determine the overall structure of a
86 theory text, without knowing the full details of command syntax.
87 Internally, there is some additional information about the kind of
88 major keywords, which approximates the command type (theory command,
91 Keywords override named tokens. For example, the presence of a
92 command called @{verbatim term} inhibits the identifier @{verbatim
93 term}, but the string @{verbatim "\"term\""} can be used instead.
94 By convention, the outer syntax always allows quoted strings in
95 addition to identifiers, wherever a named entity is expected.
97 When tokenizing a given input sequence, the lexer repeatedly takes
98 the longest prefix of the input that forms a valid token. Spaces,
99 tabs, newlines and formfeeds between tokens serve as explicit
102 \medskip The categories for named tokens are defined once and for
106 \begin{supertabular}{rcl}
107 @{syntax_def ident} & = & @{text "letter quasiletter\<^sup>*"} \\
108 @{syntax_def longident} & = & @{text "ident("}@{verbatim "."}@{text "ident)\<^sup>+"} \\
109 @{syntax_def symident} & = & @{text "sym\<^sup>+ | "}@{verbatim "\\"}@{verbatim "<"}@{text ident}@{verbatim ">"} \\
110 @{syntax_def nat} & = & @{text "digit\<^sup>+"} \\
111 @{syntax_def var} & = & @{verbatim "?"}@{text "ident | "}@{verbatim "?"}@{text ident}@{verbatim "."}@{text nat} \\
112 @{syntax_def typefree} & = & @{verbatim "'"}@{text ident} \\
113 @{syntax_def typevar} & = & @{verbatim "?"}@{text "typefree | "}@{verbatim "?"}@{text typefree}@{verbatim "."}@{text nat} \\
114 @{syntax_def string} & = & @{verbatim "\""} @{text "\<dots>"} @{verbatim "\""} \\
115 @{syntax_def altstring} & = & @{verbatim "`"} @{text "\<dots>"} @{verbatim "`"} \\
116 @{syntax_def verbatim} & = & @{verbatim "{*"} @{text "\<dots>"} @{verbatim "*"}@{verbatim "}"} \\[1ex]
118 @{text letter} & = & @{text "latin | "}@{verbatim "\\"}@{verbatim "<"}@{text latin}@{verbatim ">"}@{text " | "}@{verbatim "\\"}@{verbatim "<"}@{text "latin latin"}@{verbatim ">"}@{text " | greek |"} \\
119 & & @{verbatim "\<^isub>"}@{text " | "}@{verbatim "\<^isup>"} \\
120 @{text quasiletter} & = & @{text "letter | digit | "}@{verbatim "_"}@{text " | "}@{verbatim "'"} \\
121 @{text latin} & = & @{verbatim a}@{text " | \<dots> | "}@{verbatim z}@{text " | "}@{verbatim A}@{text " | \<dots> | "}@{verbatim Z} \\
122 @{text digit} & = & @{verbatim "0"}@{text " | \<dots> | "}@{verbatim "9"} \\
123 @{text sym} & = & @{verbatim "!"}@{text " | "}@{verbatim "#"}@{text " | "}@{verbatim "$"}@{text " | "}@{verbatim "%"}@{text " | "}@{verbatim "&"}@{text " | "}@{verbatim "*"}@{text " | "}@{verbatim "+"}@{text " | "}@{verbatim "-"}@{text " | "}@{verbatim "/"}@{text " |"} \\
124 & & @{verbatim "<"}@{text " | "}@{verbatim "="}@{text " | "}@{verbatim ">"}@{text " | "}@{verbatim "?"}@{text " | "}@{verbatim "@"}@{text " | "}@{verbatim "^"}@{text " | "}@{verbatim "_"}@{text " | "}@{verbatim "|"}@{text " | "}@{verbatim "~"} \\
125 @{text greek} & = & @{verbatim "\<alpha>"}@{text " | "}@{verbatim "\<beta>"}@{text " | "}@{verbatim "\<gamma>"}@{text " | "}@{verbatim "\<delta>"}@{text " |"} \\
126 & & @{verbatim "\<epsilon>"}@{text " | "}@{verbatim "\<zeta>"}@{text " | "}@{verbatim "\<eta>"}@{text " | "}@{verbatim "\<theta>"}@{text " |"} \\
127 & & @{verbatim "\<iota>"}@{text " | "}@{verbatim "\<kappa>"}@{text " | "}@{verbatim "\<mu>"}@{text " | "}@{verbatim "\<nu>"}@{text " |"} \\
128 & & @{verbatim "\<xi>"}@{text " | "}@{verbatim "\<pi>"}@{text " | "}@{verbatim "\<rho>"}@{text " | "}@{verbatim "\<sigma>"}@{text " | "}@{verbatim "\<tau>"}@{text " |"} \\
129 & & @{verbatim "\<upsilon>"}@{text " | "}@{verbatim "\<phi>"}@{text " | "}@{verbatim "\<chi>"}@{text " | "}@{verbatim "\<psi>"}@{text " |"} \\
130 & & @{verbatim "\<omega>"}@{text " | "}@{verbatim "\<Gamma>"}@{text " | "}@{verbatim "\<Delta>"}@{text " | "}@{verbatim "\<Theta>"}@{text " |"} \\
131 & & @{verbatim "\<Lambda>"}@{text " | "}@{verbatim "\<Xi>"}@{text " | "}@{verbatim "\<Pi>"}@{text " | "}@{verbatim "\<Sigma>"}@{text " |"} \\
132 & & @{verbatim "\<Upsilon>"}@{text " | "}@{verbatim "\<Phi>"}@{text " | "}@{verbatim "\<Psi>"}@{text " | "}@{verbatim "\<Omega>"} \\
136 A @{syntax_ref var} or @{syntax_ref typevar} describes an unknown,
137 which is internally a pair of base name and index (ML type @{ML_type
138 indexname}). These components are either separated by a dot as in
139 @{text "?x.1"} or @{text "?x7.3"} or run together as in @{text
140 "?x1"}. The latter form is possible if the base name does not end
141 with digits. If the index is 0, it may be dropped altogether:
142 @{text "?x"} and @{text "?x0"} and @{text "?x.0"} all refer to the
143 same unknown, with basename @{text "x"} and index 0.
145 The syntax of @{syntax_ref string} admits any characters, including
146 newlines; ``@{verbatim "\""}'' (double-quote) and ``@{verbatim
147 "\\"}'' (backslash) need to be escaped by a backslash; arbitrary
148 character codes may be specified as ``@{verbatim "\\"}@{text ddd}'',
149 with three decimal digits. Alternative strings according to
150 @{syntax_ref altstring} are analogous, using single back-quotes
153 The body of @{syntax_ref verbatim} may consist of any text not
154 containing ``@{verbatim "*"}@{verbatim "}"}''; this allows
155 convenient inclusion of quotes without further escapes. There is no
156 way to escape ``@{verbatim "*"}@{verbatim "}"}''. If the quoted
157 text is {\LaTeX} source, one may usually add some blank or comment
158 to avoid the critical character sequence.
160 Source comments take the form @{verbatim "(*"}~@{text
161 "\<dots>"}~@{verbatim "*)"} and may be nested, although the user-interface
162 might prevent this. Note that this form indicates source comments
163 only, which are stripped after lexical analysis of the input. The
164 Isar syntax also provides proper \emph{document comments} that are
165 considered as part of the text (see \secref{sec:comments}).
167 Common mathematical symbols such as @{text \<forall>} are represented in
168 Isabelle as @{verbatim \<forall>}. There are infinitely many Isabelle
169 symbols like this, although proper presentation is left to front-end
170 tools such as {\LaTeX} or Proof~General with the X-Symbol package.
171 A list of predefined Isabelle symbols that work well with these
172 tools is given in \appref{app:symbols}. Note that @{verbatim "\<lambda>"}
173 does not belong to the @{text letter} category, since it is already
174 used differently in the Pure term language.
178 section {* Common syntax entities *}
181 We now introduce several basic syntactic entities, such as names,
182 terms, and theorem specifications, which are factored out of the
183 actual Isar language elements to be described later.
187 subsection {* Names *}
190 Entity \railqtok{name} usually refers to any name of types,
191 constants, theorems etc.\ that are to be \emph{declared} or
192 \emph{defined} (so qualified identifiers are excluded here). Quoted
193 strings provide an escape for non-identifier names or those ruled
194 out by outer syntax keywords (e.g.\ quoted @{verbatim "\"let\""}).
195 Already existing objects are usually referenced by
198 \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
199 \indexoutertoken{int}
201 name: ident | symident | string | nat
203 parname: '(' name ')'
205 nameref: name | longident
213 subsection {* Comments \label{sec:comments} *}
216 Large chunks of plain \railqtok{text} are usually given
217 \railtok{verbatim}, i.e.\ enclosed in @{verbatim "{"}@{verbatim
218 "*"}~@{text "\<dots>"}~@{verbatim "*"}@{verbatim "}"}. For convenience,
219 any of the smaller text units conforming to \railqtok{nameref} are
220 admitted as well. A marginal \railnonterm{comment} is of the form
221 @{verbatim "--"} \railqtok{text}. Any number of these may occur
222 within Isabelle/Isar commands.
224 \indexoutertoken{text}\indexouternonterm{comment}
226 text: verbatim | nameref
234 subsection {* Type classes, sorts and arities *}
237 Classes are specified by plain names. Sorts have a very simple
238 inner syntax, which is either a single class name @{text c} or a
239 list @{text "{c\<^sub>1, \<dots>, c\<^sub>n}"} referring to the
240 intersection of these classes. The syntax of type arities is given
241 directly at the outer level.
243 \indexouternonterm{sort}\indexouternonterm{arity}
244 \indexouternonterm{classdecl}
246 classdecl: name (('<' | subseteq) (nameref + ','))?
250 arity: ('(' (sort + ',') ')')? sort
256 subsection {* Types and terms \label{sec:types-terms} *}
259 The actual inner Isabelle syntax, that of types and terms of the
260 logic, is far too sophisticated in order to be modelled explicitly
261 at the outer theory level. Basically, any such entity has to be
262 quoted to turn it into a single token (the parsing and type-checking
263 is performed internally later). For convenience, a slightly more
264 liberal convention is adopted: quotes may be omitted for any type or
265 term that is already atomic at the outer level. For example, one
266 may just write @{verbatim x} instead of quoted @{verbatim "\"x\""}.
267 Note that symbolic identifiers (e.g.\ @{verbatim "++"} or @{text
268 "\<forall>"} are available as well, provided these have not been superseded
269 by commands or other keywords already (such as @{verbatim "="} or
272 \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
274 type: nameref | typefree | typevar
282 Positional instantiations are indicated by giving a sequence of
283 terms, or the placeholder ``@{text _}'' (underscore), which means to
286 \indexoutertoken{inst}\indexoutertoken{insts}
288 inst: underscore | term
294 Type declarations and definitions usually refer to
295 \railnonterm{typespec} on the left-hand side. This models basic
296 type constructor application at the outer syntax level. Note that
297 only plain postfix notation is available here, but no infixes.
299 \indexouternonterm{typespec}
300 \indexouternonterm{typespecsorts}
302 typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
305 typespecsorts: (() | (typefree ('::' sort)?) | '(' ( (typefree ('::' sort)?) + ',' ) ')') name
311 subsection {* Term patterns and declarations \label{sec:term-decls} *}
314 Wherever explicit propositions (or term fragments) occur in a proof
315 text, casual binding of schematic term variables may be given
316 specified via patterns of the form ``@{text "(\<IS> p\<^sub>1 \<dots>
317 p\<^sub>n)"}''. This works both for \railqtok{term} and \railqtok{prop}.
319 \indexouternonterm{termpat}\indexouternonterm{proppat}
321 termpat: '(' ('is' term +) ')'
323 proppat: '(' ('is' prop +) ')'
327 \medskip Declarations of local variables @{text "x :: \<tau>"} and
328 logical propositions @{text "a : \<phi>"} represent different views on
329 the same principle of introducing a local scope. In practice, one
330 may usually omit the typing of \railnonterm{vars} (due to
331 type-inference), and the naming of propositions (due to implicit
332 references of current facts). In any case, Isar proof elements
333 usually admit to introduce multiple such items simultaneously.
335 \indexouternonterm{vars}\indexouternonterm{props}
337 vars: (name+) ('::' type)?
339 props: thmdecl? (prop proppat? +)
343 The treatment of multiple declarations corresponds to the
344 complementary focus of \railnonterm{vars} versus
345 \railnonterm{props}. In ``@{text "x\<^sub>1 \<dots> x\<^sub>n :: \<tau>"}''
346 the typing refers to all variables, while in @{text "a: \<phi>\<^sub>1 \<dots>
347 \<phi>\<^sub>n"} the naming refers to all propositions collectively.
348 Isar language elements that refer to \railnonterm{vars} or
349 \railnonterm{props} typically admit separate typings or namings via
350 another level of iteration, with explicit @{keyword_ref "and"}
351 separators; e.g.\ see @{command "fix"} and @{command "assume"} in
352 \secref{sec:proof-context}.
356 subsection {* Attributes and theorems \label{sec:syn-att} *}
358 text {* Attributes have their own ``semi-inner'' syntax, in the sense
359 that input conforming to \railnonterm{args} below is parsed by the
360 attribute a second time. The attribute argument specifications may
361 be any sequence of atomic entities (identifiers, strings etc.), or
362 properly bracketed argument lists. Below \railqtok{atom} refers to
363 any atomic entity, including any \railtok{keyword} conforming to
366 \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
368 atom: nameref | typefree | typevar | var | nat | keyword
370 arg: atom | '(' args ')' | '[' args ']'
374 attributes: '[' (nameref args * ',') ']'
378 Theorem specifications come in several flavors:
379 \railnonterm{axmdecl} and \railnonterm{thmdecl} usually refer to
380 axioms, assumptions or results of goal statements, while
381 \railnonterm{thmdef} collects lists of existing theorems. Existing
382 theorems are given by \railnonterm{thmref} and
383 \railnonterm{thmrefs}, the former requires an actual singleton
386 There are three forms of theorem references:
389 \item named facts @{text "a"},
391 \item selections from named facts @{text "a(i)"} or @{text "a(j - k)"},
393 \item literal fact propositions using @{syntax_ref altstring} syntax
394 @{verbatim "`"}@{text "\<phi>"}@{verbatim "`"} (see also method
399 Any kind of theorem specification may include lists of attributes
400 both on the left and right hand sides; attributes are applied to any
401 immediately preceding fact. If names are omitted, the theorems are
402 not stored within the theorem database of the theory or proof
403 context, but any given attributes are applied nonetheless.
405 An extra pair of brackets around attributes (like ``@{text
406 "[[simproc a]]"}'') abbreviates a theorem reference involving an
407 internal dummy fact, which will be ignored later on. So only the
408 effect of the attribute on the background context will persist.
409 This form of in-place declarations is particularly useful with
410 commands like @{command "declare"} and @{command "using"}.
412 \indexouternonterm{axmdecl}\indexouternonterm{thmdecl}
413 \indexouternonterm{thmdef}\indexouternonterm{thmref}
414 \indexouternonterm{thmrefs}\indexouternonterm{selection}
416 axmdecl: name attributes? ':'
422 thmref: (nameref selection? | altstring) attributes? | '[' attributes ']'
427 thmbind: name attributes | name | attributes
429 selection: '(' ((nat | nat '-' nat?) + ',') ')'