7 chapter {* Outer syntax *}
10 The rather generic framework of Isabelle/Isar syntax emerges from
11 three main syntactic categories: \emph{commands} of the top-level
12 Isar engine (covering theory and proof elements), \emph{methods} for
13 general goal refinements (analogous to traditional ``tactics''), and
14 \emph{attributes} for operations on facts (within a certain
15 context). Subsequently we give a reference of basic syntactic
16 entities underlying Isabelle/Isar syntax in a bottom-up manner.
17 Concrete theory and proof language elements will be introduced later
20 \medskip In order to get started with writing well-formed
21 Isabelle/Isar documents, the most important aspect to be noted is
22 the difference of \emph{inner} versus \emph{outer} syntax. Inner
23 syntax is that of Isabelle types and terms of the logic, while outer
24 syntax is that of Isabelle/Isar theory sources (specifications and
25 proofs). As a general rule, inner syntax entities may occur only as
26 \emph{atomic entities} within outer syntax. For example, the string
27 @{verbatim "\"x + y\""} and identifier @{verbatim z} are legal term
28 specifications within a theory, while @{verbatim "x + y"} without
31 Printed theory documents usually omit quotes to gain readability
32 (this is a matter of {\LaTeX} macro setup, say via @{verbatim
33 "\\isabellestyle"}, see also \cite{isabelle-sys}). Experienced
34 users of Isabelle/Isar may easily reconstruct the lost technical
35 information, while mere readers need not care about quotes at all.
37 \medskip Isabelle/Isar input may contain any number of input
38 termination characters ``@{verbatim ";"}'' (semicolon) to separate
39 commands explicitly. This is particularly useful in interactive
40 shell sessions to make clear where the current command is intended
41 to end. Otherwise, the interpreter loop will continue to issue a
42 secondary prompt ``@{verbatim "#"}'' until an end-of-command is
43 clearly recognized from the input syntax, e.g.\ encounter of the
46 More advanced interfaces such as Proof~General \cite{proofgeneral}
47 do not require explicit semicolons, the amount of input text is
48 determined automatically by inspecting the present content of the
49 Emacs text buffer. In the printed presentation of Isabelle/Isar
50 documents semicolons are omitted altogether for readability.
53 Proof~General requires certain syntax classification tables in
54 order to achieve properly synchronized interaction with the
55 Isabelle/Isar process. These tables need to be consistent with
56 the Isabelle version and particular logic image to be used in a
57 running session (common object-logics may well change the outer
58 syntax). The standard setup should work correctly with any of the
59 ``official'' logic images derived from Isabelle/HOL (including
60 HOLCF etc.). Users of alternative logics may need to tell
61 Proof~General explicitly, e.g.\ by giving an option @{verbatim "-k ZF"}
62 (in conjunction with @{verbatim "-l ZF"}, to specify the default
63 logic image). Note that option @{verbatim "-L"} does both
64 of this at the same time.
69 section {* Lexical matters \label{sec:outer-lex} *}
71 text {* The outer lexical syntax consists of three main categories of
76 \item \emph{major keywords} --- the command names that are available
77 in the present logic session;
79 \item \emph{minor keywords} --- additional literal tokens required
80 by the syntax of commands;
82 \item \emph{named tokens} --- various categories of identifiers etc.
86 Major keywords and minor keywords are guaranteed to be disjoint.
87 This helps user-interfaces to determine the overall structure of a
88 theory text, without knowing the full details of command syntax.
89 Internally, there is some additional information about the kind of
90 major keywords, which approximates the command type (theory command,
93 Keywords override named tokens. For example, the presence of a
94 command called @{verbatim term} inhibits the identifier @{verbatim
95 term}, but the string @{verbatim "\"term\""} can be used instead.
96 By convention, the outer syntax always allows quoted strings in
97 addition to identifiers, wherever a named entity is expected.
99 When tokenizing a given input sequence, the lexer repeatedly takes
100 the longest prefix of the input that forms a valid token. Spaces,
101 tabs, newlines and formfeeds between tokens serve as explicit
104 \medskip The categories for named tokens are defined once and for
108 \begin{supertabular}{rcl}
109 @{syntax_def ident} & = & @{text "letter quasiletter\<^sup>*"} \\
110 @{syntax_def longident} & = & @{text "ident("}@{verbatim "."}@{text "ident)\<^sup>+"} \\
111 @{syntax_def symident} & = & @{text "sym\<^sup>+ | "}@{verbatim "\\"}@{verbatim "<"}@{text ident}@{verbatim ">"} \\
112 @{syntax_def nat} & = & @{text "digit\<^sup>+"} \\
113 @{syntax_def var} & = & @{verbatim "?"}@{text "ident | "}@{verbatim "?"}@{text ident}@{verbatim "."}@{text nat} \\
114 @{syntax_def typefree} & = & @{verbatim "'"}@{text ident} \\
115 @{syntax_def typevar} & = & @{verbatim "?"}@{text "typefree | "}@{verbatim "?"}@{text typefree}@{verbatim "."}@{text nat} \\
116 @{syntax_def string} & = & @{verbatim "\""} @{text "\<dots>"} @{verbatim "\""} \\
117 @{syntax_def altstring} & = & @{verbatim "`"} @{text "\<dots>"} @{verbatim "`"} \\
118 @{syntax_def verbatim} & = & @{verbatim "{*"} @{text "\<dots>"} @{verbatim "*"}@{verbatim "}"} \\[1ex]
120 @{text letter} & = & @{text "latin | "}@{verbatim "\\"}@{verbatim "<"}@{text latin}@{verbatim ">"}@{text " | "}@{verbatim "\\"}@{verbatim "<"}@{text "latin latin"}@{verbatim ">"}@{text " | greek |"} \\
121 & & @{verbatim "\<^isub>"}@{text " | "}@{verbatim "\<^isup>"} \\
122 @{text quasiletter} & = & @{text "letter | digit | "}@{verbatim "_"}@{text " | "}@{verbatim "'"} \\
123 @{text latin} & = & @{verbatim a}@{text " | \<dots> | "}@{verbatim z}@{text " | "}@{verbatim A}@{text " | \<dots> | "}@{verbatim Z} \\
124 @{text digit} & = & @{verbatim "0"}@{text " | \<dots> | "}@{verbatim "9"} \\
125 @{text sym} & = & @{verbatim "!"}@{text " | "}@{verbatim "#"}@{text " | "}@{verbatim "$"}@{text " | "}@{verbatim "%"}@{text " | "}@{verbatim "&"}@{text " | "}@{verbatim "*"}@{text " | "}@{verbatim "+"}@{text " | "}@{verbatim "-"}@{text " | "}@{verbatim "/"}@{text " |"} \\
126 & & @{verbatim "<"}@{text " | "}@{verbatim "="}@{text " | "}@{verbatim ">"}@{text " | "}@{verbatim "?"}@{text " | "}@{verbatim "@"}@{text " | "}@{verbatim "^"}@{text " | "}@{verbatim "_"}@{text " | "}@{verbatim "|"}@{text " | "}@{verbatim "~"} \\
127 @{text greek} & = & @{verbatim "\<alpha>"}@{text " | "}@{verbatim "\<beta>"}@{text " | "}@{verbatim "\<gamma>"}@{text " | "}@{verbatim "\<delta>"}@{text " |"} \\
128 & & @{verbatim "\<epsilon>"}@{text " | "}@{verbatim "\<zeta>"}@{text " | "}@{verbatim "\<eta>"}@{text " | "}@{verbatim "\<theta>"}@{text " |"} \\
129 & & @{verbatim "\<iota>"}@{text " | "}@{verbatim "\<kappa>"}@{text " | "}@{verbatim "\<mu>"}@{text " | "}@{verbatim "\<nu>"}@{text " |"} \\
130 & & @{verbatim "\<xi>"}@{text " | "}@{verbatim "\<pi>"}@{text " | "}@{verbatim "\<rho>"}@{text " | "}@{verbatim "\<sigma>"}@{text " | "}@{verbatim "\<tau>"}@{text " |"} \\
131 & & @{verbatim "\<upsilon>"}@{text " | "}@{verbatim "\<phi>"}@{text " | "}@{verbatim "\<chi>"}@{text " | "}@{verbatim "\<psi>"}@{text " |"} \\
132 & & @{verbatim "\<omega>"}@{text " | "}@{verbatim "\<Gamma>"}@{text " | "}@{verbatim "\<Delta>"}@{text " | "}@{verbatim "\<Theta>"}@{text " |"} \\
133 & & @{verbatim "\<Lambda>"}@{text " | "}@{verbatim "\<Xi>"}@{text " | "}@{verbatim "\<Pi>"}@{text " | "}@{verbatim "\<Sigma>"}@{text " |"} \\
134 & & @{verbatim "\<Upsilon>"}@{text " | "}@{verbatim "\<Phi>"}@{text " | "}@{verbatim "\<Psi>"}@{text " | "}@{verbatim "\<Omega>"} \\
138 A @{syntax_ref var} or @{syntax_ref typevar} describes an unknown,
139 which is internally a pair of base name and index (ML type @{ML_type
140 indexname}). These components are either separated by a dot as in
141 @{text "?x.1"} or @{text "?x7.3"} or run together as in @{text
142 "?x1"}. The latter form is possible if the base name does not end
143 with digits. If the index is 0, it may be dropped altogether:
144 @{text "?x"} and @{text "?x0"} and @{text "?x.0"} all refer to the
145 same unknown, with basename @{text "x"} and index 0.
147 The syntax of @{syntax_ref string} admits any characters, including
148 newlines; ``@{verbatim "\""}'' (double-quote) and ``@{verbatim
149 "\\"}'' (backslash) need to be escaped by a backslash; arbitrary
150 character codes may be specified as ``@{verbatim "\\"}@{text ddd}'',
151 with three decimal digits. Alternative strings according to
152 @{syntax_ref altstring} are analogous, using single back-quotes
155 The body of @{syntax_ref verbatim} may consist of any text not
156 containing ``@{verbatim "*"}@{verbatim "}"}''; this allows
157 convenient inclusion of quotes without further escapes. There is no
158 way to escape ``@{verbatim "*"}@{verbatim "}"}''. If the quoted
159 text is {\LaTeX} source, one may usually add some blank or comment
160 to avoid the critical character sequence.
162 Source comments take the form @{verbatim "(*"}~@{text
163 "\<dots>"}~@{verbatim "*)"} and may be nested, although the user-interface
164 might prevent this. Note that this form indicates source comments
165 only, which are stripped after lexical analysis of the input. The
166 Isar syntax also provides proper \emph{document comments} that are
167 considered as part of the text (see \secref{sec:comments}).
169 Common mathematical symbols such as @{text \<forall>} are represented in
170 Isabelle as @{verbatim \<forall>}. There are infinitely many Isabelle
171 symbols like this, although proper presentation is left to front-end
172 tools such as {\LaTeX} or Proof~General with the X-Symbol package.
173 A list of standard Isabelle symbols that work well with these tools
174 is given in \appref{app:symbols}. Note that @{verbatim "\<lambda>"} does
175 not belong to the @{text letter} category, since it is already used
176 differently in the Pure term language.
180 section {* Common syntax entities *}
183 We now introduce several basic syntactic entities, such as names,
184 terms, and theorem specifications, which are factored out of the
185 actual Isar language elements to be described later.
189 subsection {* Names *}
192 Entity \railqtok{name} usually refers to any name of types,
193 constants, theorems etc.\ that are to be \emph{declared} or
194 \emph{defined} (so qualified identifiers are excluded here). Quoted
195 strings provide an escape for non-identifier names or those ruled
196 out by outer syntax keywords (e.g.\ quoted @{verbatim "\"let\""}).
197 Already existing objects are usually referenced by
200 \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
201 \indexoutertoken{int}
203 name: ident | symident | string | nat
205 parname: '(' name ')'
207 nameref: name | longident
215 subsection {* Comments \label{sec:comments} *}
218 Large chunks of plain \railqtok{text} are usually given
219 \railtok{verbatim}, i.e.\ enclosed in @{verbatim "{"}@{verbatim
220 "*"}~@{text "\<dots>"}~@{verbatim "*"}@{verbatim "}"}. For convenience,
221 any of the smaller text units conforming to \railqtok{nameref} are
222 admitted as well. A marginal \railnonterm{comment} is of the form
223 @{verbatim "--"} \railqtok{text}. Any number of these may occur
224 within Isabelle/Isar commands.
226 \indexoutertoken{text}\indexouternonterm{comment}
228 text: verbatim | nameref
236 subsection {* Type classes, sorts and arities *}
239 Classes are specified by plain names. Sorts have a very simple
240 inner syntax, which is either a single class name @{text c} or a
241 list @{text "{c\<^sub>1, \<dots>, c\<^sub>n}"} referring to the
242 intersection of these classes. The syntax of type arities is given
243 directly at the outer level.
245 \indexouternonterm{sort}\indexouternonterm{arity}
246 \indexouternonterm{classdecl}
248 classdecl: name (('<' | subseteq) (nameref + ','))?
252 arity: ('(' (sort + ',') ')')? sort
258 subsection {* Types and terms \label{sec:types-terms} *}
261 The actual inner Isabelle syntax, that of types and terms of the
262 logic, is far too sophisticated in order to be modelled explicitly
263 at the outer theory level. Basically, any such entity has to be
264 quoted to turn it into a single token (the parsing and type-checking
265 is performed internally later). For convenience, a slightly more
266 liberal convention is adopted: quotes may be omitted for any type or
267 term that is already atomic at the outer level. For example, one
268 may just write @{verbatim x} instead of quoted @{verbatim "\"x\""}.
269 Note that symbolic identifiers (e.g.\ @{verbatim "++"} or @{text
270 "\<forall>"} are available as well, provided these have not been superseded
271 by commands or other keywords already (such as @{verbatim "="} or
274 \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
276 type: nameref | typefree | typevar
284 Positional instantiations are indicated by giving a sequence of
285 terms, or the placeholder ``@{text _}'' (underscore), which means to
288 \indexoutertoken{inst}\indexoutertoken{insts}
290 inst: underscore | term
296 Type declarations and definitions usually refer to
297 \railnonterm{typespec} on the left-hand side. This models basic
298 type constructor application at the outer syntax level. Note that
299 only plain postfix notation is available here, but no infixes.
301 \indexouternonterm{typespec}
303 typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
309 subsection {* Term patterns and declarations \label{sec:term-decls} *}
312 Wherever explicit propositions (or term fragments) occur in a proof
313 text, casual binding of schematic term variables may be given
314 specified via patterns of the form ``@{text "(\<IS> p\<^sub>1 \<dots>
315 p\<^sub>n)"}''. This works both for \railqtok{term} and \railqtok{prop}.
317 \indexouternonterm{termpat}\indexouternonterm{proppat}
319 termpat: '(' ('is' term +) ')'
321 proppat: '(' ('is' prop +) ')'
325 \medskip Declarations of local variables @{text "x :: \<tau>"} and
326 logical propositions @{text "a : \<phi>"} represent different views on
327 the same principle of introducing a local scope. In practice, one
328 may usually omit the typing of \railnonterm{vars} (due to
329 type-inference), and the naming of propositions (due to implicit
330 references of current facts). In any case, Isar proof elements
331 usually admit to introduce multiple such items simultaneously.
333 \indexouternonterm{vars}\indexouternonterm{props}
335 vars: (name+) ('::' type)?
337 props: thmdecl? (prop proppat? +)
341 The treatment of multiple declarations corresponds to the
342 complementary focus of \railnonterm{vars} versus
343 \railnonterm{props}. In ``@{text "x\<^sub>1 \<dots> x\<^sub>n :: \<tau>"}''
344 the typing refers to all variables, while in @{text "a: \<phi>\<^sub>1 \<dots>
345 \<phi>\<^sub>n"} the naming refers to all propositions collectively.
346 Isar language elements that refer to \railnonterm{vars} or
347 \railnonterm{props} typically admit separate typings or namings via
348 another level of iteration, with explicit @{keyword_ref "and"}
349 separators; e.g.\ see @{command "fix"} and @{command "assume"} in
350 \secref{sec:proof-context}.
354 subsection {* Attributes and theorems \label{sec:syn-att} *}
356 text {* Attributes have their own ``semi-inner'' syntax, in the sense
357 that input conforming to \railnonterm{args} below is parsed by the
358 attribute a second time. The attribute argument specifications may
359 be any sequence of atomic entities (identifiers, strings etc.), or
360 properly bracketed argument lists. Below \railqtok{atom} refers to
361 any atomic entity, including any \railtok{keyword} conforming to
364 \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
366 atom: nameref | typefree | typevar | var | nat | keyword
368 arg: atom | '(' args ')' | '[' args ']'
372 attributes: '[' (nameref args * ',') ']'
376 Theorem specifications come in several flavors:
377 \railnonterm{axmdecl} and \railnonterm{thmdecl} usually refer to
378 axioms, assumptions or results of goal statements, while
379 \railnonterm{thmdef} collects lists of existing theorems. Existing
380 theorems are given by \railnonterm{thmref} and
381 \railnonterm{thmrefs}, the former requires an actual singleton
384 There are three forms of theorem references:
387 \item named facts @{text "a"},
389 \item selections from named facts @{text "a(i)"} or @{text "a(j - k)"},
391 \item literal fact propositions using @{syntax_ref altstring} syntax
392 @{verbatim "`"}@{text "\<phi>"}@{verbatim "`"} (see also method
397 Any kind of theorem specification may include lists of attributes
398 both on the left and right hand sides; attributes are applied to any
399 immediately preceding fact. If names are omitted, the theorems are
400 not stored within the theorem database of the theory or proof
401 context, but any given attributes are applied nonetheless.
403 An extra pair of brackets around attributes (like ``@{text
404 "[[simproc a]]"}'') abbreviates a theorem reference involving an
405 internal dummy fact, which will be ignored later on. So only the
406 effect of the attribute on the background context will persist.
407 This form of in-place declarations is particularly useful with
408 commands like @{command "declare"} and @{command "using"}.
410 \indexouternonterm{axmdecl}\indexouternonterm{thmdecl}
411 \indexouternonterm{thmdef}\indexouternonterm{thmref}
412 \indexouternonterm{thmrefs}\indexouternonterm{selection}
414 axmdecl: name attributes? ':'
420 thmref: (nameref selection? | altstring) attributes? | '[' attributes ']'
425 thmbind: name attributes | name | attributes
427 selection: '(' ((nat | nat '-' nat?) + ',') ')'