wneuper/isa: doc-src/IsarRef/syntax.tex@68619606c5d1 (annotated)

wenzelm@7046	1
wenzelm@7895	2	\chapter{Isar Syntax Primitives}
wenzelm@7046	3
wenzelm@7315	4	We give a complete reference of all basic syntactic entities underlying the
wenzelm@7335	5	Isabelle/Isar document syntax. Actual theory and proof commands will be
wenzelm@7335	6	introduced later on.
wenzelm@7134	7
wenzelm@7315	8	\medskip
wenzelm@7046	9
wenzelm@7315	10	In order to get started with writing well-formed Isabelle/Isar documents, the
wenzelm@7315	11	most important aspect to be noted is the difference between \emph{inner} and
wenzelm@7315	12	\emph{outer} syntax. Inner syntax is that of Isabelle types and terms of the
wenzelm@7895	13	logic, while outer syntax is that of Isabelle/Isar theories (including
wenzelm@7895	14	proofs). As a general rule, inner syntax entities may occur only as
wenzelm@7895	15	\emph{atomic entities} within outer syntax. For example, the string
wenzelm@7895	16	\texttt{"x + y"} and identifier \texttt{z} are legal term specifications
wenzelm@7895	17	within a theory, while \texttt{x + y} is not.
wenzelm@7315	18
wenzelm@7315	19	\begin{warn}
wenzelm@8378	20	Note that classic Isabelle theories used to fake parts of the inner syntax
wenzelm@8378	21	of types, with rather complicated rules when quotes may be omitted. Despite
wenzelm@7981	22	the minor drawback of requiring quotes more often, the syntax of
wenzelm@8548	23	Isabelle/Isar is much simpler and more robust in that respect.
wenzelm@7315	24	\end{warn}
wenzelm@7315	25
wenzelm@7466	26	\medskip
wenzelm@7466	27
wenzelm@7466	28	Another notable point is proper input termination. Proof~General requires every
wenzelm@7466	29	command to be terminated by ``\texttt{;}''
wenzelm@7466	30	(semicolon)\index{semicolon}\index{*;}. As far as plain Isabelle/Isar is
wenzelm@7981	31	concerned, commands may be directly run together, though. In the presentation
wenzelm@7981	32	of Isabelle/Isar documents, semicolons are omitted in order to gain
wenzelm@7981	33	readability.
wenzelm@7466	34
wenzelm@7315	35
wenzelm@7315	36	\section{Lexical matters}\label{sec:lex-syntax}
wenzelm@7315	37
wenzelm@7315	38	The Isabelle/Isar outer syntax provides token classes as presented below.
wenzelm@7895	39	Note that some of these coincide (by full intention) with the inner lexical
wenzelm@7895	40	syntax as presented in \cite{isabelle-ref}. These different levels of syntax
wenzelm@7895	41	should not be confused, though.
wenzelm@7315	42
wenzelm@7335	43	%FIXME keyword, command
wenzelm@7315	44	\begin{matharray}{rcl}
wenzelm@7315	45	ident & = & letter~quasiletter^* \\
wenzelm@7315	46	longident & = & ident\verb,.,ident~\dots~ident \\
wenzelm@8548	47	symident & = & sym^+ ~|~ symbol \\
wenzelm@7315	48	nat & = & digit^+ \\
wenzelm@7315	49	var & = & \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\
wenzelm@7315	50	typefree & = & \verb,',ident \\
wenzelm@7315	51	typevar & = & \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\
wenzelm@7315	52	string & = & \verb,", ~\dots~ \verb,", \\
wenzelm@7319	53	verbatim & = & \verb,{*, ~\dots~ \verb,*}, \\
wenzelm@7319	54	\end{matharray}
wenzelm@7319	55	\begin{matharray}{rcl}
wenzelm@7315	56	letter & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\
wenzelm@7315	57	digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\
wenzelm@7315	58	quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\
wenzelm@7315	59	sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~ %$
wenzelm@7319	60	\verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \verb,:, ~|~
wenzelm@7319	61	\verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \mathtt{\at} ~|~ \\
wenzelm@7319	62	& & \verb,^, ~|~ \verb,_, ~|~ \verb,`, ~|~ \verb,|, ~|~ \verb,~, \\
wenzelm@8548	63	symbol & = & {\forall} ~|~ {\exists} ~|~ \dots
wenzelm@7315	64	\end{matharray}
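
\medskip

For orientation, here are a few sample tokens for the classes above. They are
merely illustrative instances worked out from the grammar; the particular
names (e.g.\ \texttt{foo\_bar}, \texttt{Foo.bar}) are hypothetical and not
taken from any actual theory.

\begin{verbatim}
  ident       x    foo_bar    x'    A1
  longident   Foo.bar    Foo.bar.baz
  symident    ++    ==>    <=
  nat         0    42
  var         ?x    ?x.1
  typefree    'a    'foo
  typevar     ?'a    ?'a.3
  string      "x + y"
  verbatim    {* some text *}
\end{verbatim}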
wenzelm@7315	65
wenzelm@7315	66	The syntax of \texttt{string} admits any characters, including newlines;
wenzelm@7895	67	``\verb|"|'' (double-quote) and ``\verb|\|'' (backslash) have to be escaped by
wenzelm@7981	68	a backslash. Note that ML-style control characters are \emph{not} supported.
wenzelm@7981	69	The body of \texttt{verbatim} may consist of any text not containing
wenzelm@7981	70	``\verb|*}|''.
wenzelm@7315	71
wenzelm@7895	72	Comments take the form \texttt{(*~\dots~*)} and may be
wenzelm@8378	73	nested\footnote{Proof~General may occasionally get confused by nested
wenzelm@8378	74	comments.}, just as in ML. Note that these are \emph{source} comments only,
wenzelm@8378	75	which are stripped after lexical analysis of the input. The Isar document
wenzelm@8378	76	syntax also provides \emph{formal comments} that are actually part of the text
wenzelm@8378	77	(see \S\ref{sec:comments}).
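
\medskip

The conventions for strings, verbatim text and source comments may be
illustrated as follows. This is a sketch only; the text fragments are made up
for the purpose of the example.

\begin{verbatim}
  "a \"quoted\" word, and a backslash: \\"
  {* verbatim text may contain "quotes" and \ without escapes *}
  (* a source comment (* with a nested comment *) inside *)
\end{verbatim}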
wenzelm@7315	78
wenzelm@7046	79
wenzelm@7046	80	\section{Common syntax entities}
wenzelm@7046	81
wenzelm@7335	82	Subsequently, we introduce several basic syntactic entities, such as names,
wenzelm@7895	83	terms, and theorem specifications, which have been factored out of the actual
wenzelm@7895	84	Isar language elements to be described later.
wenzelm@7134	85
wenzelm@7981	86	Note that some of the basic syntactic entities introduced below (e.g.\
wenzelm@7895	87	\railqtoken{name}) act much like tokens rather than plain nonterminals (e.g.\
wenzelm@7895	88	\railnonterm{sort}), especially for the sake of error messages. E.g.\ syntax
wenzelm@7895	89	elements such as $\CONSTS$ referring to \railqtoken{name} or \railqtoken{type}
wenzelm@7895	90	would really report a missing name or type rather than any of the constituent
wenzelm@7895	91	primitive tokens such as \railtoken{ident} or \railtoken{string}.
wenzelm@7046	92
wenzelm@7050	93
wenzelm@7050	94	\subsection{Names}
wenzelm@7050	95
wenzelm@7134	96	Entity \railqtoken{name} usually refers to any name of types, constants,
wenzelm@7167	97	theorems etc.\ that are to be \emph{declared} or \emph{defined} (so qualified
wenzelm@8548	98	identifiers are excluded here). Quoted strings provide an escape for
wenzelm@7134	99	non-identifier names or those ruled out by outer syntax keywords (e.g.\
wenzelm@7134	100	\verb|"let"|). Already existing objects are usually referenced by
wenzelm@7134	101	\railqtoken{nameref}.
wenzelm@7050	102
wenzelm@7141	103	\indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
wenzelm@7046	104	\begin{rail}
wenzelm@8145	105	name: ident | symident | string | nat
wenzelm@7046	106	;
wenzelm@7167	107	parname: '(' name ')'
wenzelm@7141	108	;
wenzelm@7167	109	nameref: name | longident
wenzelm@7046	110	;
wenzelm@7046	111	\end{rail}
wenzelm@7046	112
wenzelm@7050	113
wenzelm@7315	114	\subsection{Comments}\label{sec:comments}
wenzelm@7046	115
wenzelm@7167	116	Large chunks of plain \railqtoken{text} are usually given
wenzelm@7895	117	\railtoken{verbatim}, i.e.\ enclosed in \verb|{*|~\dots~\verb|*}|. For
wenzelm@7175	118	convenience, any of the smaller text units conforming to \railqtoken{nameref}
wenzelm@8102	119	are admitted as well. Almost any of the Isar commands may be annotated by
wenzelm@7466	120	marginal \railnonterm{comment} of the form \texttt{--} \railqtoken{text}.
wenzelm@7466	121	Note that the latter kind of comment is actually part of the language, while
wenzelm@7895	122	source level comments \verb|(*|~\dots~\verb|*)| are stripped at the lexical
wenzelm@7466	123	level. A few commands such as $\PROOFNAME$ admit additional markup with a
wenzelm@7466	124	``level of interest'': \texttt{\%} followed by an optional number $n$ (default
wenzelm@7466	125	$n = 1$) indicates that the respective part of the document becomes $n$ levels
wenzelm@7466	126	more obscure; \texttt{\%\%} means that interest drops by $\infty$ --- abandon
wenzelm@7466	127	every hope, who enter here.
wenzelm@7050	128
wenzelm@7050	129	\indexoutertoken{text}\indexouternonterm{comment}\indexouternonterm{interest}
wenzelm@7046	130	\begin{rail}
wenzelm@7167	131	text: verbatim | nameref
wenzelm@7050	132	;
wenzelm@8102	133	comment: ('--' text +)
wenzelm@7046	134	;
wenzelm@7167	135	interest: percent nat? | ppercent
wenzelm@7046	136	;
wenzelm@7046	137	\end{rail}
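
\medskip

The following instances of \railnonterm{comment} and \railnonterm{interest}
are derived from the grammar above. They are illustrative only; the comment
texts are made up, and the exact position of such markup within a command is
not shown here.

\begin{verbatim}
  -- foo
  -- "a short remark"
  -- {* a longer remark, given as verbatim text *}

  %    % 2    %%
\end{verbatim}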
wenzelm@7046	138
wenzelm@7046	139
wenzelm@7335	140	\subsection{Type classes, sorts and arities}
wenzelm@7046	141
wenzelm@7050	142	The syntax of sorts and arities is given directly at the outer level. Note
wenzelm@7335	143	that this is in contrast to types and terms (see \S\ref{sec:types-terms}).
wenzelm@7050	144
wenzelm@7050	145	\indexouternonterm{sort}\indexouternonterm{arity}\indexouternonterm{simplearity}
wenzelm@7135	146	\indexouternonterm{classdecl}
wenzelm@7046	147	\begin{rail}
wenzelm@7321	148	classdecl: name ('<' (nameref + ','))?
wenzelm@7046	149	;
wenzelm@7167	150	sort: nameref | lbrace (nameref * ',') rbrace
wenzelm@7046	151	;
wenzelm@7167	152	arity: ('(' (sort + ',') ')')? sort
wenzelm@7046	153	;
wenzelm@7167	154	simplearity: ('(' (sort + ',') ')')? nameref
wenzelm@7167	155	;
wenzelm@7046	156	\end{rail}
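
\medskip

A few instances of the above categories, worked out from the grammar. The
class names \texttt{foo}, \texttt{bar}, \texttt{baz} are hypothetical
placeholders.

\begin{verbatim}
  classdecl     foo    foo < bar    foo < bar, baz
  sort          foo    {}    {foo, bar}
  arity         foo    (foo, {foo, bar}) baz    ({}) {foo}
  simplearity   foo    (foo, {foo, bar}) baz
\end{verbatim}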
wenzelm@7046	157
wenzelm@7046	158
wenzelm@7167	159	\subsection{Types and terms}\label{sec:types-terms}
wenzelm@7046	160
wenzelm@7167	161	The actual inner Isabelle syntax, that of types and terms of the logic, is far
wenzelm@7895	162	too sophisticated to be modelled explicitly at the outer theory
wenzelm@8548	163	level. Basically, any such entity has to be quoted to turn it into a single
wenzelm@8548	164	token (the parsing and type-checking is performed internally later). For
wenzelm@8548	165	convenience, a slightly more liberal convention is adopted: quotes may be
wenzelm@7895	166	omitted for any type or term that is already \emph{atomic} at the outer level.
wenzelm@7895	167	For example, one may write just \texttt{x} instead of \texttt{"x"}. Note that
wenzelm@8548	168	symbolic identifiers (e.g.\ \texttt{++} or $\forall$) are available as well,
wenzelm@8548	169	provided these are not superseded by commands or keywords (e.g.\ \texttt{+}).
wenzelm@7050	170
wenzelm@7050	171	\indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
wenzelm@7046	172	\begin{rail}
wenzelm@7167	173	type: nameref | typefree | typevar
wenzelm@7050	174	;
wenzelm@8593	175	term: nameref | var
wenzelm@7050	176	;
wenzelm@7167	177	prop: term
wenzelm@7050	178	;
wenzelm@7046	179	\end{rail}
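
\medskip

The quoting convention may be illustrated by the following sample
specifications. This is a sketch only: the quoted inner syntax fragments
assume HOL-style notation, and the identifiers are hypothetical. Note that
each entry is a single token at the outer level.

\begin{verbatim}
  type   nat    'a    ?'a    "nat => bool"    "'a list"
  term   x    ?x    ++    "x + y"
  prop   A    ?P    "A --> B"
\end{verbatim}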
wenzelm@7046	180
wenzelm@7167	181	Type declarations and definitions usually refer to \railnonterm{typespec} on
wenzelm@7167	182	the left-hand side. This models basic type constructor application at the
wenzelm@7167	183	outer syntax level. Note that only plain postfix notation is available here,
wenzelm@7167	184	but no infixes.
wenzelm@7050	185
wenzelm@7050	186	\indexouternonterm{typespec}
wenzelm@7050	187	\begin{rail}
wenzelm@7167	188	typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
wenzelm@7050	189	;
wenzelm@7050	190	\end{rail}
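
\medskip

The three forms of \railnonterm{typespec} admitted by the grammar look as
follows, for a hypothetical type constructor \texttt{foo} with zero, one, and
two arguments, respectively.

\begin{verbatim}
  foo
  'a foo
  ('a, 'b) foo
\end{verbatim}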
wenzelm@7050	191
wenzelm@7050	192
wenzelm@7315	193	\subsection{Term patterns}\label{sec:term-pats}
wenzelm@7050	194
wenzelm@7895	195	Assumptions and goal statements usually admit casual binding of schematic term
wenzelm@7981	196	variables by giving (optional) patterns of the form $\ISS{p@1\;\dots}{p@n}$.
wenzelm@7167	197	There are separate versions available for \railqtoken{term}s and
wenzelm@7167	198	\railqtoken{prop}s. The latter provides a $\CONCLNAME$ part with patterns
wenzelm@7167	199	referring to the (atomic) conclusion of a rule.
wenzelm@7050	200
wenzelm@7050	201	\indexouternonterm{termpat}\indexouternonterm{proppat}
wenzelm@7050	202	\begin{rail}
wenzelm@7167	203	termpat: '(' ('is' term +) ')'
wenzelm@7050	204	;
wenzelm@7167	205	proppat: '(' (('is' prop +) | 'concl' ('is' prop +) | ('is' prop +) 'concl' ('is' prop +)) ')'
wenzelm@7050	206	;
wenzelm@7050	207	\end{rail}
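
\medskip

Some pattern specifications conforming to the grammar above are given below.
They illustrate the syntax only; the schematic variables are hypothetical, and
the quoted fragments assume the usual Pure/HOL inner notation.

\begin{verbatim}
  termpat   (is "?f ?x")
            (is ?t is "?lhs = ?rhs")

  proppat   (is "?A ==> ?B")
            (concl is "?C")
            (is "?A ==> ?B" concl is "?C")
\end{verbatim}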
wenzelm@7050	208
wenzelm@7050	209
wenzelm@7046	210	\subsection{Mixfix annotations}
wenzelm@7046	211
wenzelm@7134	212	Mixfix annotations specify concrete \emph{inner} syntax of Isabelle types and
wenzelm@8548	213	terms (see also \cite{isabelle-ref}). Some commands such as $\TYPES$ (see
wenzelm@8548	214	\S\ref{sec:types-pure}) admit infixes only, while $\CONSTS$ (see
wenzelm@8548	215	\S\ref{sec:consts}) and $\isarkeyword{syntax}$ (see \S\ref{sec:syn-trans})
wenzelm@8548	216	support the full range of general mixfixes and binders.
wenzelm@7046	217
wenzelm@7050	218	\indexouternonterm{infix}\indexouternonterm{mixfix}
wenzelm@7050	219	\begin{rail}
wenzelm@7167	220	infix: '(' ('infixl' | 'infixr') string? nat ')'
wenzelm@7167	221	;
wenzelm@7175	222	mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'
wenzelm@7050	223	;
wenzelm@7046	224
wenzelm@7175	225	prios: '[' (nat + ',') ']'
wenzelm@7050	226	;
wenzelm@7050	227	\end{rail}
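
\medskip

The following annotations are instances of the grammar above. The templates
and priority values are chosen arbitrarily for the sake of illustration and do
not reproduce the declarations of any particular theory.

\begin{verbatim}
  infix    (infixl 65)
           (infixr "#" 65)

  mixfix   (infixl "+" 65)
           ("_ ++ _" [66, 65] 65)
           ("foo _")
           (binder "ALL " [0] 10)

  prios    [65, 66]
\end{verbatim}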
wenzelm@7046	228
wenzelm@7050	229
wenzelm@7134	230	\subsection{Attributes and theorems}\label{sec:syn-att}
wenzelm@7050	231
wenzelm@7050	232	Attributes (and proof methods, see \S\ref{sec:syn-meth}) have their own
wenzelm@7335	233	``semi-inner'' syntax, in the sense that input conforming to
wenzelm@7335	234	\railnonterm{args} below is parsed by the attribute a second time. The
wenzelm@7335	235	attribute argument specifications may be any sequence of atomic entities
wenzelm@7335	236	(identifiers, strings etc.), or properly bracketed argument lists. Below
wenzelm@7981	237	\railqtoken{atom} refers to any atomic entity, including any
wenzelm@7981	238	\railtoken{keyword} conforming to \railtoken{symident}.
wenzelm@7050	239
wenzelm@7050	240	\indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
wenzelm@7050	241	\begin{rail}
wenzelm@7466	242	atom: nameref | typefree | typevar | var | nat | keyword
wenzelm@7050	243	;
wenzelm@7167	244	arg: atom | '(' args ')' | '[' args ']' | lbrace args rbrace
wenzelm@7134	245	;
wenzelm@7167	246	args: arg *
wenzelm@7134	247	;
wenzelm@7167	248	attributes: '[' (nameref args * ',') ']'
wenzelm@7050	249	;
wenzelm@7050	250	\end{rail}
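
\medskip

A few instances of the above categories, worked out from the grammar. The
attribute names \texttt{foo}, \texttt{bar}, \texttt{baz} are hypothetical;
whether a given argument list is actually accepted depends on the second
parsing stage of the attribute in question.

\begin{verbatim}
  atom         foo    'a    ?x    3
  args         foo "x + y" (bar 1) [baz]
  attributes   []    [foo]    [foo, bar]    [foo 1 "x + y", bar (baz)]
\end{verbatim}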
wenzelm@7050	251
wenzelm@7895	252	Theorem specifications come in several flavors: \railnonterm{axmdecl} and
wenzelm@7175	253	\railnonterm{thmdecl} usually refer to axioms, assumptions or results of goal
wenzelm@7981	254	statements, while \railnonterm{thmdef} collects lists of existing theorems.
wenzelm@7981	255	Existing theorems are given by \railnonterm{thmref} and \railnonterm{thmrefs};
wenzelm@7981	256	the former requires an actual singleton result. Any of these theorem
wenzelm@7175	257	specifications may include lists of attributes both on the left and right hand
wenzelm@7466	258	sides; attributes are applied to any immediately preceding theorem. If names
wenzelm@7981	259	are omitted, the theorems are not stored within the theorem database of the
wenzelm@7981	260	theory or proof context; any given attributes are still applied, though.
wenzelm@7050	261
wenzelm@7135	262	\indexouternonterm{thmdecl}\indexouternonterm{axmdecl}
wenzelm@7135	263	\indexouternonterm{thmdef}\indexouternonterm{thmrefs}
wenzelm@7050	264	\begin{rail}
wenzelm@7167	265	axmdecl: name attributes? ':'
wenzelm@7050	266	;
wenzelm@7167	267	thmdecl: thmname ':'
wenzelm@7135	268	;
wenzelm@7167	269	thmdef: thmname '='
wenzelm@7050	270	;
wenzelm@7175	271	thmref: nameref attributes?
wenzelm@7175	272	;
wenzelm@7175	273	thmrefs: thmref +
wenzelm@7134	274	;
wenzelm@7167	275
wenzelm@7167	276	thmname: name attributes | name | attributes
wenzelm@7050	277	;
wenzelm@7050	278	\end{rail}
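
\medskip

The different flavors of theorem specifications may be illustrated as follows.
All theorem and attribute names are hypothetical placeholders; the instances
merely follow the grammar above.

\begin{verbatim}
  axmdecl   foo:    foo [bar]:
  thmdecl   foo:    foo [bar]:    [bar]:
  thmdef    foo =   foo [bar] =   [bar] =
  thmref    foo     foo [bar]     Foo.baz [bar 1]
  thmrefs   foo bar [baz] Foo.baz
\end{verbatim}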
wenzelm@7050	279
wenzelm@7050	280
wenzelm@7050	281	\subsection{Proof methods}\label{sec:syn-meth}
wenzelm@7050	282
wenzelm@7050	283	Proof methods are either basic ones, or expressions composed of methods via
wenzelm@7175	284	``\texttt{,}'' (sequential composition), ``\texttt{|}'' (alternative choices),
wenzelm@7981	285	``\texttt{?}'' (try), ``\texttt{+}'' (repeat at least once). In practice,
wenzelm@7981	286	proof methods are usually just a comma separated list of
wenzelm@7981	287	\railqtoken{nameref}~\railnonterm{args} specifications. Note that parentheses
wenzelm@7981	288	may be dropped for single method specifications (with no arguments).
wenzelm@7050	289
wenzelm@7050	290	\indexouternonterm{method}
wenzelm@7050	291	\begin{rail}
wenzelm@7430	292	method: (nameref | '(' methods ')') (() | '?' | '+')
wenzelm@7134	293	;
wenzelm@7167	294	methods: (nameref args | method) + (',' | '|')
wenzelm@7050	295	;
wenzelm@7050	296	\end{rail}
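
\medskip

Some method expressions conforming to the grammar above are shown below. This
is a sketch of the syntax only: the method names \texttt{rule},
\texttt{assumption}, \texttt{simp} and \texttt{blast} are assumed to be
available in the object-logic at hand, and \texttt{foo} is a hypothetical
theorem name.

\begin{verbatim}
  simp
  simp?
  (rule foo)
  (rule foo, assumption)+
  (simp | blast)
  ((rule foo, assumption) | simp)?
\end{verbatim}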
wenzelm@7046	297
wenzelm@8532	298	Proper use of Isar proof methods does \emph{not} involve goal addressing.
wenzelm@8532	299	Nevertheless, specifying goal ranges may occasionally come in handy in
wenzelm@8532	300	emulating tactic scripts. Note that $[n-]$ refers to all goals, starting from
wenzelm@8548	301	$n$. All goals may be specified by $[!]$, which is the same as $[1-]$.
wenzelm@8532	302
wenzelm@8532	303	\indexouternonterm{goalspec}
wenzelm@8532	304	\begin{rail}
wenzelm@8548	305	goalspec: '[' (nat '-' nat | nat '-' | nat | '!' ) ']'
wenzelm@8532	306	;
wenzelm@8532	307	\end{rail}
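
\medskip

The four forms of \railnonterm{goalspec} look as follows. The reading of the
last two is as stated above; that of the first two (a single goal and a closed
range) is merely the presumable interpretation and is given here as an
assumption.

\begin{verbatim}
  [2]      goal 2 only
  [1-3]    goals 1 to 3
  [2-]     all goals, starting from 2
  [!]      all goals, same as [1-]
\end{verbatim}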
wenzelm@8532	308
wenzelm@7046	309
wenzelm@7046	310	%%% Local Variables:
wenzelm@7046	311	%%% mode: latex
wenzelm@7046	312	%%% TeX-master: "isar-ref"
wenzelm@7046	313	%%% End:

author	wenzelm
	Mon, 27 Mar 2000 18:09:49 +0200
changeset 8593	68619606c5d1