%% doc-src/IsarRef/syntax.tex
%% author wenzelm, Mon, 27 Mar 2000 18:09:49 +0200
%% changeset 8593 68619606c5d1 (parent 8548 7c5fe9d17712, child 8690 48786b52c8d8)
%% permissions -rw-r--r--
%% fixed term syntax;
\chapter{Isar Syntax Primitives}

We give a complete reference of all basic syntactic entities underlying the
Isabelle/Isar document syntax.  Actual theory and proof commands will be
introduced later on.

\medskip

In order to get started with writing well-formed Isabelle/Isar documents, the
most important aspect to be noted is the difference between \emph{inner} and
\emph{outer} syntax.  Inner syntax is that of Isabelle types and terms of the
logic, while outer syntax is that of Isabelle/Isar theories (including
proofs).  As a general rule, inner syntax entities may occur only as
\emph{atomic entities} within outer syntax.  For example, the string
\texttt{"x + y"} and identifier \texttt{z} are legal term specifications
within a theory, while \texttt{x + y} is not.
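
As a small illustration of this rule (a sketch only, annotated with source
comments), compare the following three candidate term specifications:
\begin{verbatim}
  z          (* a single identifier: already atomic, no quotes needed *)
  "x + y"    (* inner syntax wrapped into one outer string token *)
  x + y      (* not accepted: three separate outer tokens *)
\end{verbatim}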

\begin{warn}
  Note that classic Isabelle theories used to fake parts of the inner syntax
  of types, with rather complicated rules for when quotes may be omitted.
  Despite the minor drawback of requiring quotes more often, the syntax of
  Isabelle/Isar is much simpler and more robust in that respect.
\end{warn}

\medskip

Another notable point is proper input termination.  Proof~General requires
every command to be terminated by ``\texttt{;}''
(semicolon)\index{semicolon}\index{*;}.  As far as plain Isabelle/Isar is
concerned, commands may be directly run together, though.  In the presentation
of Isabelle/Isar documents, semicolons are omitted in order to improve
readability.


\section{Lexical matters}\label{sec:lex-syntax}

The Isabelle/Isar outer syntax provides token classes as presented below.
Note that some of these coincide (quite intentionally) with the inner lexical
syntax as presented in \cite{isabelle-ref}.  These different levels of syntax
should not be confused, though.

%FIXME keyword, command
\begin{matharray}{rcl}
  ident & = & letter~quasiletter^* \\
  longident & = & ident\verb,.,ident~\dots~ident \\
  symident & = & sym^+ ~|~ symbol \\
  nat & = & digit^+ \\
  var & = & \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\
  typefree & = & \verb,',ident \\
  typevar & = & \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\
  string & = & \verb,", ~\dots~ \verb,", \\
  verbatim & = & \verb,{*, ~\dots~ \verb,*}, \\
\end{matharray}
\begin{matharray}{rcl}
  letter & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\
  digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\
  quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\
  sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~  %$
   \verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \verb,:, ~|~
   \verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \mathtt{\at} ~|~ \\
  & & \verb,^, ~|~ \verb,_, ~|~ \verb,`, ~|~ \verb,|, ~|~ \verb,~, \\
  symbol & = & {\forall} ~|~ {\exists} ~|~ \dots
\end{matharray}

The syntax of \texttt{string} admits any characters, including newlines;
``\verb|"|'' (double-quote) and ``\verb|\|'' (backslash) have to be escaped by
a backslash.  Note that ML-style control characters are \emph{not} supported.
The body of \texttt{verbatim} may consist of any text not containing
``\verb|*}|''.

Comments take the form \texttt{(*~\dots~*)} and may be
nested\footnote{Proof~General may occasionally get confused by nested
  comments.}, just as in ML.  Note that these are \emph{source} comments only,
which are stripped after lexical analysis of the input.  The Isar document
syntax also provides \emph{formal comments} that are actually part of the text
(see \S\ref{sec:comments}).
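
The following hand-written samples are meant to illustrate the token classes
above.  They are derived only from the grammar as stated here, and all names
are made up for the purpose of illustration:
\begin{verbatim}
  foo_bar'                      (* ident *)
  List.append                   (* longident *)
  ==>                           (* symident *)
  42                            (* nat *)
  ?x   ?x.1                     (* var *)
  'a                            (* typefree *)
  ?'a  ?'a.2                    (* typevar *)
  "a \"quoted\" word, a \\"     (* string with escapes *)
  {* arbitrary text *}          (* verbatim *)
  (* outer (* nested *) comment *)
\end{verbatim}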


\section{Common syntax entities}

In the following, we introduce several basic syntactic entities, such as
names, terms, and theorem specifications, which have been factored out of the
actual Isar language elements to be described later.

Note that some of the basic syntactic entities introduced below
(e.g.\ \railqtoken{name}) act much like tokens rather than plain nonterminals
(e.g.\ \railnonterm{sort}), especially for the sake of error messages.  For
example, syntax elements such as $\CONSTS$ referring to \railqtoken{name} or
\railqtoken{type} would really report a missing name or type rather than any
of the constituent primitive tokens such as \railtoken{ident} or
\railtoken{string}.


\subsection{Names}

Entity \railqtoken{name} usually refers to any name of types, constants,
theorems etc.\ that are to be \emph{declared} or \emph{defined} (so qualified
identifiers are excluded here).  Quoted strings provide an escape for
non-identifier names or those ruled out by outer syntax keywords
(e.g.\ \verb|"let"|).  Already existing objects are usually referenced by
\railqtoken{nameref}.

\indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
\begin{rail}
  name: ident | symident | string | nat
  ;
  parname: '(' name ')'
  ;
  nameref: name | longident
  ;
\end{rail}
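
For illustration, the following made-up inputs conform to the productions
above (a sketch derived from the diagram only; a symbolic identifier such as
\texttt{++} qualifies only if it is not already taken as a keyword):
\begin{verbatim}
  foo             (* name: ident *)
  ++              (* name: symident *)
  "let"           (* name: string, escaping a keyword *)
  1               (* name: nat *)
  (foo)           (* parname *)
  Foo.bar         (* nameref: longident, not admitted as name *)
\end{verbatim}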


\subsection{Comments}\label{sec:comments}

Large chunks of plain \railqtoken{text} are usually given as
\railtoken{verbatim}, i.e.\ enclosed in \verb|{*|~\dots~\verb|*}|.  For
convenience, any of the smaller text units conforming to \railqtoken{nameref}
are admitted as well.  Almost any of the Isar commands may be annotated by a
marginal \railnonterm{comment} of the form \texttt{--} \railqtoken{text}.
Note that the latter kind of comment is actually part of the language, while
source-level comments \verb|(*|~\dots~\verb|*)| are stripped at the lexical
level.  A few commands such as $\PROOFNAME$ admit additional markup with a
``level of interest'': \texttt{\%} followed by an optional number $n$ (default
$n = 1$) indicates that the respective part of the document becomes $n$ levels
more obscure; \texttt{\%\%} means that interest drops by $\infty$: abandon
every hope, who enter here.

\indexoutertoken{text}\indexouternonterm{comment}\indexouternonterm{interest}
\begin{rail}
  text: verbatim | nameref
  ;
  comment: ('--' text +)
  ;
  interest: percent nat? | ppercent
  ;
\end{rail}
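
A few sample inputs may help to read these productions.  The texts are
invented, and the sketch assumes that \railtoken{percent} and
\railtoken{ppercent} stand for the tokens \texttt{\%} and \texttt{\%\%}
mentioned above:
\begin{verbatim}
  -- {* a longer remark, given as verbatim text *}
  -- "a short remark"
  -- trivial
  %               (* interest: one level more obscure *)
  %3              (* interest: three levels more obscure *)
  %%              (* interest drops completely *)
\end{verbatim}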


\subsection{Type classes, sorts and arities}

The syntax of sorts and arities is given directly at the outer level.  Note
that this is in contrast to types and terms (see \S\ref{sec:types-terms}).

\indexouternonterm{sort}\indexouternonterm{arity}\indexouternonterm{simplearity}
\indexouternonterm{classdecl}
\begin{rail}
  classdecl: name ('<' (nameref + ','))?
  ;
  sort: nameref | lbrace (nameref * ',') rbrace
  ;
  arity: ('(' (sort + ',') ')')? sort
  ;
  simplearity: ('(' (sort + ',') ')')? nameref
  ;
\end{rail}
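
The following instances are a hand-derived sketch of what the diagram admits;
the class names are purely illustrative and need not exist in any particular
theory:
\begin{verbatim}
  order < term                  (* classdecl with one superclass *)
  linorder < order, term        (* classdecl with two superclasses *)
  {}                            (* sort: empty list of classes *)
  {order, term}                 (* sort *)
  (term, term) order            (* arity *)
  ({order, term}) term          (* arity with a compound argument sort *)
  (term) order                  (* simplearity: result is a single class *)
\end{verbatim}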


\subsection{Types and terms}\label{sec:types-terms}

The actual inner Isabelle syntax, that of types and terms of the logic, is far
too sophisticated to be modelled explicitly at the outer theory
level.  Basically, any such entity has to be quoted to turn it into a single
token (the parsing and type-checking is performed internally later).  For
convenience, a slightly more liberal convention is adopted: quotes may be
omitted for any type or term that is already \emph{atomic} at the outer level.
For example, one may write just \texttt{x} instead of \texttt{"x"}.  Note that
symbolic identifiers (e.g.\ \texttt{++} or $\forall$) are available as well,
provided these are not superseded by commands or keywords (e.g.\ \texttt{+}).

\indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
\begin{rail}
  type: nameref | typefree | typevar
  ;
  term: nameref | var
  ;
  prop: term
  ;
\end{rail}
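
As a sketch of the quoting convention, the following inputs would each count
as a single outer token.  The quoted contents are assumed to be well-formed
inner syntax of the object-logic at hand, which is not checked at this level:
\begin{verbatim}
  nat              (* type: nameref *)
  'a               (* type: typefree *)
  "'a => bool"     (* type: quoted inner syntax *)
  x                (* term: atomic, same as "x" *)
  ?x               (* term: var *)
  "x + y"          (* term or prop: quoted inner syntax *)
\end{verbatim}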

Type declarations and definitions usually refer to \railnonterm{typespec} on
the left-hand side.  This models basic type constructor application at the
outer syntax level.  Note that only plain postfix notation is available here,
but no infixes.

\indexouternonterm{typespec}
\begin{rail}
  typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
  ;
\end{rail}
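
The three alternatives of \railnonterm{typespec} can be sketched as follows,
with \texttt{foo} as a hypothetical type constructor name:
\begin{verbatim}
  foo                (* no arguments *)
  'a foo             (* one argument *)
  ('a, 'b) foo       (* several arguments, postfix application *)
\end{verbatim}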


\subsection{Term patterns}\label{sec:term-pats}

Assumptions and goal statements usually admit casual binding of schematic term
variables by giving (optional) patterns of the form $\ISS{p@1\;\dots}{p@n}$.
There are separate versions available for \railqtoken{term}s and
\railqtoken{prop}s.  The latter provides a $\CONCLNAME$ part with patterns
referring to the (atomic) conclusion of a rule.

\indexouternonterm{termpat}\indexouternonterm{proppat}
\begin{rail}
  termpat: '(' ('is' term +) ')'
  ;
  proppat: '(' (('is' prop +) | 'concl' ('is' prop +) | ('is' prop +) 'concl' ('is' prop +)) ')'
  ;
\end{rail}
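
Read literally, the diagram admits pattern annotations like the ones sketched
below.  The schematic variable names and the quoted inner syntax are invented
for illustration:
\begin{verbatim}
  (is "?lhs = ?rhs")                 (* termpat, one pattern *)
  (is "?P x" is "?Q")                (* termpat, two patterns *)
  (concl is "?C")                    (* proppat, conclusion only *)
  (is "?A ==> ?C" concl is "?C")     (* proppat, both parts *)
\end{verbatim}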


\subsection{Mixfix annotations}

Mixfix annotations specify concrete \emph{inner} syntax of Isabelle types and
terms (see also \cite{isabelle-ref}).  Some commands such as $\TYPES$ (see
\S\ref{sec:types-pure}) admit infixes only, while $\CONSTS$ (see
\S\ref{sec:consts}) and $\isarkeyword{syntax}$ (see \S\ref{sec:syn-trans})
support the full range of general mixfixes and binders.

\indexouternonterm{infix}\indexouternonterm{mixfix}
\begin{rail}
  infix: '(' ('infixl' | 'infixr') string? nat ')'
  ;
  mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'
  ;

  prios: '[' (nat + ',') ']'
  ;
\end{rail}
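
The annotations below are a sketch of inputs matching these productions.  The
template strings and priorities are made up, and the use of \texttt{\_} as an
argument position inside templates is assumed to follow the conventions of
\cite{isabelle-ref}:
\begin{verbatim}
  (infixl 65)                    (* infix, string omitted *)
  (infixr "++" 65)               (* infix with explicit string *)
  ("_ ++ _" [66, 65] 65)         (* mixfix: string, prios, nat *)
  ("<_>")                        (* mixfix: string only *)
  (binder "SOME " 10)            (* binder *)
\end{verbatim}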


\subsection{Attributes and theorems}\label{sec:syn-att}

Attributes (and proof methods, see \S\ref{sec:syn-meth}) have their own
``semi-inner'' syntax, in the sense that input conforming to
\railnonterm{args} below is parsed by the attribute a second time.  The
attribute argument specifications may be any sequence of atomic entities
(identifiers, strings etc.), or properly bracketed argument lists.  Below
\railqtoken{atom} refers to any atomic entity, including any
\railtoken{keyword} conforming to \railtoken{symident}.

\indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
\begin{rail}
  atom: nameref | typefree | typevar | var | nat | keyword
  ;
  arg: atom | '(' args ')' | '[' args ']' | lbrace args rbrace
  ;
  args: arg *
  ;
  attributes: '[' (nameref args * ',') ']'
  ;
\end{rail}
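
The following attribute lists are a sketch of what the outer syntax accepts.
The attribute names are placeholders, and whether a given attribute exists or
accepts these arguments is decided only when it parses its \railnonterm{args}
itself:
\begin{verbatim}
  []                             (* empty attribute list *)
  [simp]                         (* one attribute, no arguments *)
  [of x "y + z"]                 (* arguments: an ident and a string *)
  [foo (a b) [c] {d}, bar 1 ?x]  (* bracketed args, two attributes *)
\end{verbatim}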

Theorem specifications come in several flavors: \railnonterm{axmdecl} and
\railnonterm{thmdecl} usually refer to axioms, assumptions or results of goal
statements, while \railnonterm{thmdef} collects lists of existing theorems.
Existing theorems are given by \railnonterm{thmref} and \railnonterm{thmrefs};
the former requires an actual singleton result.  Any of these theorem
specifications may include lists of attributes both on the left-hand and
right-hand sides; attributes are applied to any immediately preceding theorem.
If names are omitted, the theorems are not stored within the theorem database
of the theory or proof context; any given attributes are still applied, though.

\indexouternonterm{thmdecl}\indexouternonterm{axmdecl}
\indexouternonterm{thmdef}\indexouternonterm{thmrefs}
\begin{rail}
  axmdecl: name attributes? ':'
  ;
  thmdecl: thmname ':'
  ;
  thmdef: thmname '='
  ;
  thmref: nameref attributes?
  ;
  thmrefs: thmref +
  ;

  thmname: name attributes | name | attributes
  ;
\end{rail}
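
To make the different flavors concrete, here is a sketch of inputs matching
each production.  All theorem and attribute names are hypothetical:
\begin{verbatim}
  refl [intro]:                  (* axmdecl *)
  foo:                           (* thmdecl: name only *)
  foo [simp]:                    (* thmdecl: name with attributes *)
  [simp]:                        (* thmdecl: unnamed, attribute applied *)
  foos =                         (* thmdef *)
  foo [symmetric]                (* thmref *)
  foo bar [symmetric] Baz.qux    (* thmrefs: three references *)
\end{verbatim}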


\subsection{Proof methods}\label{sec:syn-meth}

Proof methods are either basic ones, or expressions composed of methods via
``\texttt{,}'' (sequential composition), ``\texttt{|}'' (alternative choices),
``\texttt{?}'' (try), ``\texttt{+}'' (repeat at least once).  In practice,
proof methods are usually just a comma-separated list of
\railqtoken{nameref}~\railnonterm{args} specifications.  Note that parentheses
may be dropped for single method specifications (with no arguments).

\indexouternonterm{method}
\begin{rail}
  method: (nameref | '(' methods ')') (() | '?' | '+')
  ;
  methods: (nameref args | method) + (',' | '|')
  ;
\end{rail}
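
The method expressions below are a sketch derived from the diagram.  The
method names and the theorem name \texttt{foo} serve as placeholders only:
\begin{verbatim}
  simp                           (* single method, no parentheses *)
  simp+                          (* repeat at least once *)
  (rule foo)                     (* method with arguments *)
  (rule foo, simp)               (* sequential composition *)
  (simp | blast)?                (* alternative choices, tried *)
  ((rule foo, simp) | blast)+    (* nested expression, repeated *)
\end{verbatim}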

Proper use of Isar proof methods does \emph{not} involve goal addressing.
Nevertheless, specifying goal ranges may occasionally come in handy when
emulating tactic scripts.  Note that $[n-]$ refers to all goals, starting from
$n$.  All goals may be specified by $[!]$, which is the same as $[1-]$.

\indexouternonterm{goalspec}
\begin{rail}
  goalspec: '[' (nat '-' nat | nat '-' | nat | '!' ) ']'
  ;
\end{rail}
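
The four alternatives of \railnonterm{goalspec} can be sketched as follows,
with arbitrarily chosen goal numbers:
\begin{verbatim}
  [2]      (* a single goal *)
  [2-4]    (* a range of goals *)
  [3-]     (* all goals, starting from 3 *)
  [!]      (* all goals, same as [1-] *)
\end{verbatim}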


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "isar-ref"
%%% End: