doc-src/IsarRef/syntax.tex
author wenzelm
Wed, 04 Oct 2000 21:05:10 +0200
changeset 10152 473807a5a436
parent 9752 a09f4a7accea
child 10160 bb8f9412fec6
permissions -rw-r--r--
improved @;
     1 
     2 \chapter{Isar Syntax Primitives}
     3 
     4 We give a complete reference of all basic syntactic entities underlying the
     5 Isabelle/Isar document syntax.  Actual theory and proof commands will be
     6 introduced later on.
     7 
     8 \medskip
     9 
    10 In order to get started with writing well-formed Isabelle/Isar documents, the
    11 most important aspect to be noted is the difference of \emph{inner} versus
    12 \emph{outer} syntax.  Inner syntax is that of Isabelle types and terms of the
    13 logic, while outer syntax is that of Isabelle/Isar theories (including
    14 proofs).  As a general rule, inner syntax entities may occur only as
    15 \emph{atomic entities} within outer syntax.  For example, the string
    16 \texttt{"x + y"} and identifier \texttt{z} are legal term specifications
    17 within a theory, while \texttt{x + y} is not.
    18 
    19 \begin{warn}
    20   Note that classic Isabelle theories used to fake parts of the inner syntax
    21   of types, with rather complicated rules when quotes may be omitted.  Despite
    22   the minor drawback of requiring quotes more often, the syntax of
    23   Isabelle/Isar is much simpler and more robust in that respect.
    24 \end{warn}
    25 
    26 \medskip
    27 
    28 Isabelle/Isar input may contain any number of input termination characters
    29 ``\texttt{;}'' (semicolon) to separate commands explicitly.  This may be
    30 particularly useful in interactive shell sessions to make clear where the
    31 current command is intended to end.  Otherwise, the interactive loop will
    32 continue until end-of-command in clearly indicated from the input syntax,
    33 e.g.\ encounter of the next command keyword.
    34 
    35 Advanced interfaces such as Proof~General \cite{proofgeneral} do not require
    36 explicit semicolons, the amount of input text is determined automatically by
    37 inspecting the Emacs text buffer.  Also note that in the presentation of
    38 Isabelle/Isar documents, semicolons are omitted in any case to gain
    39 readability.
    40 
    41 
    42 \section{Lexical matters}\label{sec:lex-syntax}
    43 
    44 The Isabelle/Isar outer syntax provides token classes as presented below.
    45 Note that some of these coincide (by full intention) with the inner lexical
    46 syntax as presented in \cite{isabelle-ref}.  These different levels of syntax
    47 should not be confused, though.
    48 
    49 %FIXME keyword, command
    50 \indexoutertoken{ident}\indexoutertoken{longident}\indexoutertoken{symident}
    51 \indexoutertoken{nat}\indexoutertoken{var}\indexoutertoken{typefree}
    52 \indexoutertoken{typevar}\indexoutertoken{string}\indexoutertoken{verbatim}
    53 \begin{matharray}{rcl}
    54   ident & = & letter~quasiletter^* \\
    55   longident & = & ident\verb,.,ident~\dots~ident \\
    56   symident & = & sym^+ ~|~ symbol \\
    57   nat & = & digit^+ \\
    58   var & = & \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\
    59   typefree & = & \verb,',ident \\
    60   typevar & = & \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\
    61   string & = & \verb,", ~\dots~ \verb,", \\
    62   verbatim & = & \verb,{*, ~\dots~ \verb,*}, \\
    63 \end{matharray}
    64 \begin{matharray}{rcl}
    65   letter & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\
    66   digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\
    67   quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\
    68   sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~  %$
    69    \verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \verb,:, ~|~
    70    \verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \texttt{\at} ~|~ \\
    71   & & \verb,^, ~|~ \verb,_, ~|~ \verb,`, ~|~ \verb,|, ~|~ \verb,~, \\
    72   symbol & = & {\forall} ~|~ {\exists} ~|~ \dots
    73 \end{matharray}
    74 
    75 The syntax of \texttt{string} admits any characters, including newlines;
    76 ``\verb|"|'' (double-quote) and ``\verb|\|'' (backslash) have to be escaped by
    77 a backslash; newlines need not be escaped.  Note that ML-style control
    78 characters are \emph{not} supported.  The body of \texttt{verbatim} may
    79 consist of any text not containing ``\verb|*}|''.
    80 
    81 Comments take the form \texttt{(*~\dots~*)} and may be
    82 nested\footnote{Proof~General may occasionally get confused by nested
    83   comments.}, just as in ML. Note that these are \emph{source} comments only,
    84 which are stripped after lexical analysis of the input.  The Isar document
    85 syntax also provides \emph{formal comments} that are actually part of the text
    86 (see \S\ref{sec:comments}).
    87 
    88 
    89 \section{Common syntax entities}
    90 
    91 Subsequently, we introduce several basic syntactic entities, such as names,
    92 terms, and theorem specifications, which have been factored out of the actual
    93 Isar language elements to be described later.
    94 
    95 Note that some of the basic syntactic entities introduced below (e.g.\ 
    96 \railqtoken{name}) act much like tokens rather than plain nonterminals (e.g.\ 
    97 \railnonterm{sort}), especially for the sake of error messages.  E.g.\ syntax
    98 elements such as $\CONSTS$ referring to \railqtoken{name} or \railqtoken{type}
    99 would really report a missing name or type rather than any of the constituent
   100 primitive tokens such as \railtoken{ident} or \railtoken{string}.
   101 
   102 
   103 \subsection{Names}
   104 
   105 Entity \railqtoken{name} usually refers to any name of types, constants,
   106 theorems etc.\ that are to be \emph{declared} or \emph{defined} (so qualified
   107 identifiers are excluded here).  Quoted strings provide an escape for
   108 non-identifier names or those ruled out by outer syntax keywords (e.g.\ 
   109 \verb|"let"|).  Already existing objects are usually referenced by
   110 \railqtoken{nameref}.
   111 
   112 \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
   113 \indexoutertoken{int}
   114 \begin{rail}
   115   name: ident | symident | string | nat
   116   ;
   117   parname: '(' name ')'
   118   ;
   119   nameref: name | longident
   120   ;
   121   int: nat | '-' nat
   122   ;
   123 \end{rail}
   124 
   125 
   126 \subsection{Comments}\label{sec:comments}
   127 
   128 Large chunks of plain \railqtoken{text} are usually given
   129 \railtoken{verbatim}, i.e.\ enclosed in \verb|{*|~\dots~\verb|*}|.  For
   130 convenience, any of the smaller text units conforming to \railqtoken{nameref}
   131 are admitted as well.  Almost any of the Isar commands may be annotated by
   132 marginal \railnonterm{comment} of the form \texttt{--} \railqtoken{text}.
   133 Note that the latter kind of comment is actually part of the language, while
   134 source level comments \verb|(*|~\dots~\verb|*)| are stripped at the lexical
   135 level.  A few commands such as $\PROOFNAME$ admit additional markup with a
   136 ``level of interest'': \texttt{\%} followed by an optional number $n$ (default
   137 $n = 1$) indicates that the respective part of the document becomes $n$ levels
   138 more obscure; \texttt{\%\%} means that interest drops by $\infty$ --- abandon
   139 every hope, who enter here.
   140 
   141 \indexoutertoken{text}\indexouternonterm{comment}\indexouternonterm{interest}
   142 \begin{rail}
   143   text: verbatim | nameref
   144   ;
   145   comment: ('--' text +)
   146   ;
   147   interest: percent nat? | ppercent
   148   ;
   149 \end{rail}
   150 
   151 
   152 \subsection{Type classes, sorts and arities}
   153 
   154 Classes are specified by plain names.  Sorts have a very simple inner syntax,
   155 which is either a single class name $c$ or a list $\{c@1, \dots, c@n\}$
   156 referring to the intersection of these classes.  The syntax of type arities is
   157 given directly at the outer level.
   158 
   159 \indexouternonterm{sort}\indexouternonterm{arity}\indexouternonterm{simplearity}
   160 \indexouternonterm{classdecl}
   161 \begin{rail}
   162   classdecl: name ('<' (nameref + ','))?
   163   ;
   164   sort: nameref
   165   ;
   166   arity: ('(' (sort + ',') ')')? sort
   167   ;
   168   simplearity: ('(' (sort + ',') ')')? nameref
   169   ;
   170 \end{rail}
   171 
   172 
   173 \subsection{Types and terms}\label{sec:types-terms}
   174 
   175 The actual inner Isabelle syntax, that of types and terms of the logic, is far
   176 too sophisticated in order to be modelled explicitly at the outer theory
   177 level.  Basically, any such entity has to be quoted to turn it into a single
   178 token (the parsing and type-checking is performed internally later).  For
   179 convenience, a slightly more liberal convention is adopted: quotes may be
   180 omitted for any type or term that is already \emph{atomic} at the outer level.
   181 For example, one may write just \texttt{x} instead of \texttt{"x"}.  Note that
   182 symbolic identifiers (e.g.\ \texttt{++} or $\forall$) are available as well,
   183 provided these are not superseded by commands or keywords (e.g.\ \texttt{+}).
   184 
   185 \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
   186 \begin{rail}
   187   type: nameref | typefree | typevar
   188   ;
   189   term: nameref | var
   190   ;
   191   prop: term
   192   ;
   193 \end{rail}
   194 
   195 Positional instantiations are indicated by giving a sequence of terms, or the
   196 placeholder ``$\_$'' (underscore), which means to skip a position.
   197 
   198 \indexoutertoken{inst}\indexoutertoken{insts}
   199 \begin{rail}
   200   inst: underscore | term
   201   ;
   202   insts: (inst *)
   203   ;
   204 \end{rail}
   205 
   206 Type declarations and definitions usually refer to \railnonterm{typespec} on
   207 the left-hand side.  This models basic type constructor application at the
   208 outer syntax level.  Note that only plain postfix notation is available here,
   209 but no infixes.
   210 
   211 \indexouternonterm{typespec}
   212 \begin{rail}
   213   typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
   214   ;
   215 \end{rail}
   216 
   217 
   218 \subsection{Term patterns}\label{sec:term-pats}
   219 
   220 Assumptions and goal statements usually admit casual binding of schematic term
   221 variables by giving (optional) patterns of the form $\ISS{p@1\;\dots}{p@n}$.
   222 There are separate versions available for \railqtoken{term}s and
   223 \railqtoken{prop}s.  The latter provides a $\CONCLNAME$ part with patterns
   224 referring the (atomic) conclusion of a rule.
   225 
   226 \indexouternonterm{termpat}\indexouternonterm{proppat}
   227 \begin{rail}
   228   termpat: '(' ('is' term +) ')'
   229   ;
   230   proppat: '(' (('is' prop +) | 'concl' ('is' prop +) | ('is' prop +) 'concl' ('is' prop +)) ')'
   231   ;
   232 \end{rail}
   233 
   234 
   235 \subsection{Mixfix annotations}
   236 
   237 Mixfix annotations specify concrete \emph{inner} syntax of Isabelle types and
   238 terms (see also \cite{isabelle-ref}).  Some commands such as $\TYPES$ (see
   239 \S\ref{sec:types-pure}) admit infixes only, while $\CONSTS$ (see
   240 \S\ref{sec:consts}) and $\isarkeyword{syntax}$ (see \S\ref{sec:syn-trans})
   241 support the full range of general mixfixes and binders.
   242 
   243 \indexouternonterm{infix}\indexouternonterm{mixfix}
   244 \begin{rail}
   245   infix: '(' ('infixl' | 'infixr') string? nat ')'
   246   ;
   247   mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'
   248   ;
   249 
   250   prios: '[' (nat + ',') ']'
   251   ;
   252 \end{rail}
   253 
   254 
   255 \subsection{Attributes and theorems}\label{sec:syn-att}
   256 
   257 Attributes (and proof methods, see \S\ref{sec:syn-meth}) have their own
   258 ``semi-inner'' syntax, in the sense that input conforming to
   259 \railnonterm{args} below is parsed by the attribute a second time.  The
   260 attribute argument specifications may be any sequence of atomic entities
   261 (identifiers, strings etc.), or properly bracketed argument lists.  Below
   262 \railqtoken{atom} refers to any atomic entity, including any
   263 \railtoken{keyword} conforming to \railtoken{symident}.
   264 
   265 \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
   266 \begin{rail}
   267   atom: nameref | typefree | typevar | var | nat | keyword
   268   ;
   269   arg: atom | '(' args ')' | '[' args ']'
   270   ;
   271   args: arg *
   272   ;
   273   attributes: '[' (nameref args * ',') ']'
   274   ;
   275 \end{rail}
   276 
   277 Theorem specifications come in several flavors: \railnonterm{axmdecl} and
   278 \railnonterm{thmdecl} usually refer to axioms, assumptions or results of goal
   279 statements, while \railnonterm{thmdef} collects lists of existing theorems.
   280 Existing theorems are given by \railnonterm{thmref} and \railnonterm{thmrefs},
   281 the former requires an actual singleton result.  Any of these theorem
   282 specifications may include lists of attributes both on the left and right hand
   283 sides; attributes are applied to any immediately preceding theorem.  If names
   284 are omitted, the theorems are not stored within the theorem database of the
   285 theory or proof context; any given attributes are still applied, though.
   286 
   287 \indexouternonterm{thmdecl}\indexouternonterm{axmdecl}
   288 \indexouternonterm{thmdef}\indexouternonterm{thmrefs}
   289 \begin{rail}
   290   axmdecl: name attributes? ':'
   291   ;
   292   thmdecl: thmbind ':'
   293   ;
   294   thmdef: thmbind '='
   295   ;
   296   thmref: nameref attributes?
   297   ;
   298   thmrefs: thmref +
   299   ;
   300 
   301   thmbind: name attributes | name | attributes
   302   ;
   303 \end{rail}
   304 
   305 
   306 \subsection{Proof methods}\label{sec:syn-meth}
   307 
   308 Proof methods are either basic ones, or expressions composed of methods via
   309 ``\texttt{,}'' (sequential composition), ``\texttt{|}'' (alternative choices),
   310 ``\texttt{?}'' (try), ``\texttt{+}'' (repeat at least once).  In practice,
   311 proof methods are usually just a comma separated list of
   312 \railqtoken{nameref}~\railnonterm{args} specifications.  Note that parentheses
   313 may be dropped for single method specifications (with no arguments).
   314 
   315 \indexouternonterm{method}
   316 \begin{rail}
   317   method: (nameref | '(' methods ')') (() | '?' | '+')
   318   ;
   319   methods: (nameref args | method) + (',' | '|')
   320   ;
   321 \end{rail}
   322 
   323 Proper use of Isar proof methods does \emph{not} involve goal addressing.
   324 Nevertheless, specifying goal ranges may occasionally come in handy in
   325 emulating tactic scripts.  Note that $[n-]$ refers to all goals, starting from
   326 $n$.  All goals may be specified by $[!]$, which is the same as $[1-]$.
   327 
   328 \indexouternonterm{goalspec}
   329 \begin{rail}
   330   goalspec: '[' (nat '-' nat | nat '-' | nat | '!' ) ']'
   331   ;
   332 \end{rail}
   333 
   334 
   335 \subsection{Antiquotations}\label{sec:antiq}
   336 
   337 The text body of formal comments (see also \S\ref{sec:comments}) may contain
   338 antiquotations of logical entities, such as theorems, terms and types, which
   339 are to be presented in the final output produced by the Isabelle document
   340 preparation system (see also \S\ref{sec:document-prep}).
   341 
   342 Thus embedding of
   343 \texttt{{\at}{\ttlbrace}term[show_types]~"f(x)~=~a~+~x"{\ttrbrace}} within a
   344 text block would cause
   345 \isa{(f{\isasymColon}'a~{\isasymRightarrow}~'a)~(x{\isasymColon}'a)~=~(a{\isasymColon}'a)~+~x}
   346 to appear in the final {\LaTeX} document.
   347 
   348 Antiquotations do not only spare the author from tedious typing, but also
   349 achieve some degree of consistency-checking of informal explanations with
   350 formal developments, since well-formedness of terms and types with respect to
   351 the current theory or proof context can be ensured.  The $text$ antiquotation
   352 is an exception to this rule: it prints an uninterpreted text argument that is
   353 not checked in any way.
   354 
   355 \indexisarant{thm}\indexisarant{prop}\indexisarant{term}
   356 \indexisarant{typ}\indexisarant{name}
   357 \begin{rail}
   358   atsign lbrace antiquotation rbrace
   359   ;
   360 
   361   antiquotation:
   362     'thm' options thmrefs |
   363     'prop' options prop |
   364     'term' options term |
   365     'typ' options type |
   366     'text' options name
   367   ;
   368   options: '[' (option * ',') ']'
   369   ;
   370   option: name | name '=' name
   371   ;
   372 \end{rail}
   373 
   374 Note that the syntax of antiquotations may \emph{not} include source comments
   375 \texttt{(*~\dots~*)} or verbatim text \verb|{*|~\dots~\verb|*}|.
   376 
   377 \medskip
   378 
   379 The following options are available to tune the output.  Note that some of
   380 these coincide with ML flags of the same names (see also \cite{isabelle-ref}).
   381 \begin{descr}
   382 \item[$show_types = bool$ and $show_sorts = bool$] control printing of
   383   explicit type and sort constraints.
   384 \item[$long_names = bool$] forces names of types and constants etc.\ to be
   385   printed in their fully qualified internal form.
   386 \item[$eta_contract = bool$] prints terms in $\eta$-contracted form.
   387 \item[$display = bool$] indicates if the text is to be output as multi-line
   388   ``display material'', rather than a small piece of text without line breaks
   389   (which is the default).
   390 \item[$quotes = bool$] indicates if the output should be enclosed in double
   391   quotes.
   392 \item[$mode = name$] adds $name$ to the print mode to be used for presentation
   393   (see also \cite{isabelle-ref}).  Note that the standard setup for {\LaTeX}
   394   output is already present by default, including the modes ``$latex$'',
   395   ``$xsymbols$'', ``$symbols$''.
   396 \item[$margin = nat$ and $indent = nat$] change the margin or indentation for
   397   pretty printing of display material.
   398 \item[$source = bool$] prints the source text of the antiquotation arguments,
   399   rather than the actual value.  Note that this does not affect
   400   well-formedness checks of $thm$, $term$, etc. (only the $text$ antiquotation
   401   admits arbitrary output).
   402 \end{descr}
   403 
   404 For boolean flags, ``$name = true$'' may be abbreviated as ``$name$''.  All of
   405 the above flags are disabled by default, unless changed from ML.
   406 
   407 
   408 %%% Local Variables: 
   409 %%% mode: latex
   410 %%% TeX-master: "isar-ref"
   411 %%% End: