doc-src/IsarRef/syntax.tex
author wenzelm
Sun, 21 May 2000 14:33:46 +0200
changeset 8896 c80aba8c1d5e
parent 8690 48786b52c8d8
child 9051 887a15590f0e
permissions -rw-r--r--
replaced {{ }} by { };
     1 
     2 \chapter{Isar Syntax Primitives}
     3 
     4 We give a complete reference of all basic syntactic entities underlying the
     5 Isabelle/Isar document syntax.  Actual theory and proof commands will be
     6 introduced later on.
     7 
     8 \medskip
     9 
    10 In order to get started with writing well-formed Isabelle/Isar documents, the
    11 most important aspect to be noted is the difference of \emph{inner} versus
    12 \emph{outer} syntax.  Inner syntax is that of Isabelle types and terms of the
    13 logic, while outer syntax is that of Isabelle/Isar theories (including
    14 proofs).  As a general rule, inner syntax entities may occur only as
    15 \emph{atomic entities} within outer syntax.  For example, the string
    16 \texttt{"x + y"} and identifier \texttt{z} are legal term specifications
    17 within a theory, while \texttt{x + y} is not.
    18 
    19 \begin{warn}
    20   Note that classic Isabelle theories used to fake parts of the inner syntax
    21   of types, with rather complicated rules when quotes may be omitted.  Despite
    22   the minor drawback of requiring quotes more often, the syntax of
    23   Isabelle/Isar is much simpler and more robust in that respect.
    24 \end{warn}
    25 
    26 \medskip
    27 
    28 Another notable point is proper input termination.  Proof~General demands any
    29 command to be terminated by ``\texttt{;}''
    30 (semicolon)\index{semicolon}\index{*;}.  As far as plain Isabelle/Isar is
    31 concerned, commands may be directly run together, though.  In the presentation
    32 of Isabelle/Isar documents, semicolons are omitted in order to gain
    33 readability.
    34 
    35 
    36 \section{Lexical matters}\label{sec:lex-syntax}
    37 
    38 The Isabelle/Isar outer syntax provides token classes as presented below.
    39 Note that some of these coincide (by full intention) with the inner lexical
    40 syntax as presented in \cite{isabelle-ref}.  These different levels of syntax
    41 should not be confused, though.
    42 
    43 %FIXME keyword, command
    44 \begin{matharray}{rcl}
    45   ident & = & letter~quasiletter^* \\
    46   longident & = & ident\verb,.,ident~\dots~ident \\
    47   symident & = & sym^+ ~|~ symbol \\
    48   nat & = & digit^+ \\
    49   var & = & \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\
    50   typefree & = & \verb,',ident \\
    51   typevar & = & \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\
    52   string & = & \verb,", ~\dots~ \verb,", \\
    53   verbatim & = & \verb,{*, ~\dots~ \verb,*}, \\
    54 \end{matharray}
    55 \begin{matharray}{rcl}
    56   letter & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\
    57   digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\
    58   quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\
    59   sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~  %$
    60    \verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \verb,:, ~|~
    61    \verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \mathtt{\at} ~|~ \\
    62   & & \verb,^, ~|~ \verb,_, ~|~ \verb,`, ~|~ \verb,|, ~|~ \verb,~, \\
    63   symbol & = & {\forall} ~|~ {\exists} ~|~ \dots
    64 \end{matharray}
    65 
    66 The syntax of \texttt{string} admits any characters, including newlines;
    67 ``\verb|"|'' (double-quote) and ``\verb|\|'' (backslash) have to be escaped by
    68 a backslash.  Note that ML-style control characters are \emph{not} supported.
    69 The body of \texttt{verbatim} may consist of any text not containing
    70 ``\verb|*}|''.
    71 
    72 Comments take the form \texttt{(*~\dots~*)} and may be
    73 nested\footnote{Proof~General may occasionally get confused by nested
    74   comments.}, just as in ML. Note that these are \emph{source} comments only,
    75 which are stripped after lexical analysis of the input.  The Isar document
    76 syntax also provides \emph{formal comments} that are actually part of the text
    77 (see \S\ref{sec:comments}).
    78 
    79 
    80 \section{Common syntax entities}
    81 
    82 Subsequently, we introduce several basic syntactic entities, such as names,
    83 terms, and theorem specifications, which have been factored out of the actual
    84 Isar language elements to be described later.
    85 
    86 Note that some of the basic syntactic entities introduced below (e.g.\ 
    87 \railqtoken{name}) act much like tokens rather than plain nonterminals (e.g.\ 
    88 \railnonterm{sort}), especially for the sake of error messages.  E.g.\ syntax
    89 elements such as $\CONSTS$ referring to \railqtoken{name} or \railqtoken{type}
    90 would really report a missing name or type rather than any of the constituent
    91 primitive tokens such as \railtoken{ident} or \railtoken{string}.
    92 
    93 
    94 \subsection{Names}
    95 
    96 Entity \railqtoken{name} usually refers to any name of types, constants,
    97 theorems etc.\ that are to be \emph{declared} or \emph{defined} (so qualified
    98 identifiers are excluded here).  Quoted strings provide an escape for
    99 non-identifier names or those ruled out by outer syntax keywords (e.g.\ 
   100 \verb|"let"|).  Already existing objects are usually referenced by
   101 \railqtoken{nameref}.
   102 
   103 \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
   104 \begin{rail}
   105   name: ident | symident | string | nat
   106   ;
   107   parname: '(' name ')'
   108   ;
   109   nameref: name | longident
   110   ;
   111 \end{rail}
   112 
   113 
   114 \subsection{Comments}\label{sec:comments}
   115 
   116 Large chunks of plain \railqtoken{text} are usually given
   117 \railtoken{verbatim}, i.e.\ enclosed in \verb|{*|~\dots~\verb|*}|.  For
   118 convenience, any of the smaller text units conforming to \railqtoken{nameref}
   119 are admitted as well.  Almost any of the Isar commands may be annotated by
   120 marginal \railnonterm{comment} of the form \texttt{--} \railqtoken{text}.
   121 Note that the latter kind of comment is actually part of the language, while
   122 source level comments \verb|(*|~\dots~\verb|*)| are stripped at the lexical
   123 level.  A few commands such as $\PROOFNAME$ admit additional markup with a
   124 ``level of interest'': \texttt{\%} followed by an optional number $n$ (default
   125 $n = 1$) indicates that the respective part of the document becomes $n$ levels
   126 more obscure; \texttt{\%\%} means that interest drops by $\infty$ --- abandon
   127 every hope, who enter here.
   128 
   129 \indexoutertoken{text}\indexouternonterm{comment}\indexouternonterm{interest}
   130 \begin{rail}
   131   text: verbatim | nameref
   132   ;
   133   comment: ('--' text +)
   134   ;
   135   interest: percent nat? | ppercent
   136   ;
   137 \end{rail}
   138 
   139 
   140 \subsection{Type classes, sorts and arities}
   141 
   142 Classes are specified by plain names.  Sorts have a very simple inner syntax,
   143 which is either a single class name $c$ or a list $\{c@1, \dots, c@n\}$
   144 referring to the intersection of these classes.  The syntax of type arities is
   145 given directly at the outer level.
   146 
   147 \indexouternonterm{sort}\indexouternonterm{arity}\indexouternonterm{simplearity}
   148 \indexouternonterm{classdecl}
   149 \begin{rail}
   150   classdecl: name ('<' (nameref + ','))?
   151   ;
   152   sort: nameref
   153   ;
   154   arity: ('(' (sort + ',') ')')? sort
   155   ;
   156   simplearity: ('(' (sort + ',') ')')? nameref
   157   ;
   158 \end{rail}
   159 
   160 
   161 \subsection{Types and terms}\label{sec:types-terms}
   162 
   163 The actual inner Isabelle syntax, that of types and terms of the logic, is far
   164 too sophisticated in order to be modelled explicitly at the outer theory
   165 level.  Basically, any such entity has to be quoted to turn it into a single
   166 token (the parsing and type-checking is performed internally later).  For
   167 convenience, a slightly more liberal convention is adopted: quotes may be
   168 omitted for any type or term that is already \emph{atomic} at the outer level.
   169 For example, one may write just \texttt{x} instead of \texttt{"x"}.  Note that
   170 symbolic identifiers (e.g.\ \texttt{++} or $\forall$) are available as well,
   171 provided these are not superseded by commands or keywords (e.g.\ \texttt{+}).
   172 
   173 \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
   174 \begin{rail}
   175   type: nameref | typefree | typevar
   176   ;
   177   term: nameref | var
   178   ;
   179   prop: term
   180   ;
   181 \end{rail}
   182 
   183 Positional instantiations are indicated by giving a sequence of terms, or the
   184 placeholder ``$\_$'' (underscore), which means to skip a position.
   185 
   186 \indexoutertoken{inst}\indexoutertoken{insts}
   187 \begin{rail}
   188   inst: underscore | term
   189   ;
   190   insts: (inst *)
   191   ;
   192 \end{rail}
   193 
   194 Type declarations and definitions usually refer to \railnonterm{typespec} on
   195 the left-hand side.  This models basic type constructor application at the
   196 outer syntax level.  Note that only plain postfix notation is available here,
   197 but no infixes.
   198 
   199 \indexouternonterm{typespec}
   200 \begin{rail}
   201   typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
   202   ;
   203 \end{rail}
   204 
   205 
   206 \subsection{Term patterns}\label{sec:term-pats}
   207 
   208 Assumptions and goal statements usually admit casual binding of schematic term
   209 variables by giving (optional) patterns of the form $\ISS{p@1\;\dots}{p@n}$.
   210 There are separate versions available for \railqtoken{term}s and
   211 \railqtoken{prop}s.  The latter provides a $\CONCLNAME$ part with patterns
   212 referring the (atomic) conclusion of a rule.
   213 
   214 \indexouternonterm{termpat}\indexouternonterm{proppat}
   215 \begin{rail}
   216   termpat: '(' ('is' term +) ')'
   217   ;
   218   proppat: '(' (('is' prop +) | 'concl' ('is' prop +) | ('is' prop +) 'concl' ('is' prop +)) ')'
   219   ;
   220 \end{rail}
   221 
   222 
   223 \subsection{Mixfix annotations}
   224 
   225 Mixfix annotations specify concrete \emph{inner} syntax of Isabelle types and
   226 terms (see also \cite{isabelle-ref}).  Some commands such as $\TYPES$ (see
   227 \S\ref{sec:types-pure}) admit infixes only, while $\CONSTS$ (see
   228 \S\ref{sec:consts}) and $\isarkeyword{syntax}$ (see \S\ref{sec:syn-trans})
   229 support the full range of general mixfixes and binders.
   230 
   231 \indexouternonterm{infix}\indexouternonterm{mixfix}
   232 \begin{rail}
   233   infix: '(' ('infixl' | 'infixr') string? nat ')'
   234   ;
   235   mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'
   236   ;
   237 
   238   prios: '[' (nat + ',') ']'
   239   ;
   240 \end{rail}
   241 
   242 
   243 \subsection{Attributes and theorems}\label{sec:syn-att}
   244 
   245 Attributes (and proof methods, see \S\ref{sec:syn-meth}) have their own
   246 ``semi-inner'' syntax, in the sense that input conforming to
   247 \railnonterm{args} below is parsed by the attribute a second time.  The
   248 attribute argument specifications may be any sequence of atomic entities
   249 (identifiers, strings etc.), or properly bracketed argument lists.  Below
   250 \railqtoken{atom} refers to any atomic entity, including any
   251 \railtoken{keyword} conforming to \railtoken{symident}.
   252 
   253 \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
   254 \begin{rail}
   255   atom: nameref | typefree | typevar | var | nat | keyword
   256   ;
   257   arg: atom | '(' args ')' | '[' args ']'
   258   ;
   259   args: arg *
   260   ;
   261   attributes: '[' (nameref args * ',') ']'
   262   ;
   263 \end{rail}
   264 
   265 Theorem specifications come in several flavors: \railnonterm{axmdecl} and
   266 \railnonterm{thmdecl} usually refer to axioms, assumptions or results of goal
   267 statements, while \railnonterm{thmdef} collects lists of existing theorems.
   268 Existing theorems are given by \railnonterm{thmref} and \railnonterm{thmrefs},
   269 the former requires an actual singleton result.  Any of these theorem
   270 specifications may include lists of attributes both on the left and right hand
   271 sides; attributes are applied to any immediately preceding theorem.  If names
   272 are omitted, the theorems are not stored within the theorem database of the
   273 theory or proof context; any given attributes are still applied, though.
   274 
   275 \indexouternonterm{thmdecl}\indexouternonterm{axmdecl}
   276 \indexouternonterm{thmdef}\indexouternonterm{thmrefs}
   277 \begin{rail}
   278   axmdecl: name attributes? ':'
   279   ;
   280   thmdecl: thmname ':'
   281   ;
   282   thmdef: thmname '='
   283   ;
   284   thmref: nameref attributes?
   285   ;
   286   thmrefs: thmref +
   287   ;
   288 
   289   thmname: name attributes | name | attributes
   290   ;
   291 \end{rail}
   292 
   293 
   294 \subsection{Proof methods}\label{sec:syn-meth}
   295 
   296 Proof methods are either basic ones, or expressions composed of methods via
   297 ``\texttt{,}'' (sequential composition), ``\texttt{|}'' (alternative choices),
   298 ``\texttt{?}'' (try), ``\texttt{+}'' (repeat at least once).  In practice,
   299 proof methods are usually just a comma separated list of
   300 \railqtoken{nameref}~\railnonterm{args} specifications.  Note that parentheses
   301 may be dropped for single method specifications (with no arguments).
   302 
   303 \indexouternonterm{method}
   304 \begin{rail}
   305   method: (nameref | '(' methods ')') (() | '?' | '+')
   306   ;
   307   methods: (nameref args | method) + (',' | '|')
   308   ;
   309 \end{rail}
   310 
   311 Proper use of Isar proof methods does \emph{not} involve goal addressing.
   312 Nevertheless, specifying goal ranges may occasionally come in handy in
   313 emulating tactic scripts.  Note that $[n-]$ refers to all goals, starting from
   314 $n$.  All goals may be specified by $[!]$, which is the same as $[1-]$.
   315 
   316 \indexouternonterm{goalspec}
   317 \begin{rail}
   318   goalspec: '[' (nat '-' nat | nat '-' | nat | '!' ) ']'
   319   ;
   320 \end{rail}
   321 
   322 
   323 %%% Local Variables: 
   324 %%% mode: latex
   325 %%% TeX-master: "isar-ref"
   326 %%% End: