wneuper/isa: doc-src/IsarRef/syntax.tex@8b2eb3b78cc3

     2 \chapter{Syntax primitives}

     4 The rather generic framework of Isabelle/Isar syntax emerges from three main

     5 syntactic categories: \emph{commands} of the top-level Isar engine (covering

     6 theory and proof elements), \emph{methods} for general goal refinements

     7 (analogous to traditional ``tactics''), and \emph{attributes} for operations

     8 on facts (within a certain context).  Here we give a reference of basic

     9 syntactic entities underlying Isabelle/Isar syntax in a bottom-up manner.

    10 Concrete theory and proof language elements will be introduced later on.

    12 \medskip

    14 In order to get started with writing well-formed Isabelle/Isar documents, the

    15 most important aspect to be noted is the difference of \emph{inner} versus

    16 \emph{outer} syntax.  Inner syntax is that of Isabelle types and terms of the

    17 logic, while outer syntax is that of Isabelle/Isar theory sources (including

    18 proofs).  As a general rule, inner syntax entities may occur only as

    19 \emph{atomic entities} within outer syntax.  For example, the string

    20 \texttt{"x + y"} and identifier \texttt{z} are legal term specifications

    21 within a theory, while \texttt{x + y} is not.

    23 \begin{warn}

    24   Old-style Isabelle theories used to fake parts of the inner syntax of types,

    25   with rather complicated rules when quotes may be omitted.  Despite the minor

    26   drawback of requiring quotes more often, the syntax of Isabelle/Isar is

    27   somewhat simpler and more robust in that respect.

    28 \end{warn}

    30 Printed theory documents usually omit quotes to gain readability (this is a

    31 matter of {\LaTeX} macro setup, say via \verb,\isabellestyle,, see also

    32 \cite{isabelle-sys}).  Experienced users of Isabelle/Isar may easily

    33 reconstruct the lost technical information, while mere readers need not care

    34 about quotes at all.

    36 \medskip

    38 Isabelle/Isar input may contain any number of input termination characters

    39 ``\texttt{;}'' (semicolon) to separate commands explicitly.  This is

    40 particularly useful in interactive shell sessions to make clear where the

    41 current command is intended to end.  Otherwise, the interpreter loop will

    42 continue to issue a secondary prompt ``\verb,#,'' until an end-of-command is

    43 clearly recognized from the input syntax, e.g.\ encounter of the next command

    44 keyword.

    46 Advanced interfaces such as Proof~General \cite{proofgeneral} do not require

    47 explicit semicolons, the amount of input text is determined automatically by

    48 inspecting the present content of the Emacs text buffer.  In the printed

    49 presentation of Isabelle/Isar documents semicolons are omitted altogether for

    50 readability.

    52 \begin{warn}

    53   Proof~General requires certain syntax classification tables in order to

    54   achieve properly synchronized interaction with the Isabelle/Isar process.

    55   These tables need to be consistent with the Isabelle version and particular

    56   logic image to be used in a running session (common object-logics may well

    57   change the outer syntax).  The standard setup should work correctly with any

    58   of the ``official'' logic images derived from Isabelle/HOL (including HOLCF

    59   etc.).  Users of alternative logics may need to tell Proof~General

    60   explicitly, e.g.\ by giving an option \verb,-k ZF, (in conjunction with

    61   \verb,-l ZF, to specify the default logic image).

    62 \end{warn}

    64 \section{Lexical matters}\label{sec:lex-syntax}

    66 The Isabelle/Isar outer syntax provides token classes as presented below.

    67 Note that some of these coincide (by full intention) with the inner lexical

    68 syntax as presented in \cite{isabelle-ref}.

    70 \indexoutertoken{ident}\indexoutertoken{longident}\indexoutertoken{symident}

    71 \indexoutertoken{nat}\indexoutertoken{var}\indexoutertoken{typefree}

    72 \indexoutertoken{typevar}\indexoutertoken{string}\indexoutertoken{verbatim}

    73 \begin{matharray}{rcl}

    74   ident & = & letter~quasiletter^* \\

    75   longident & = & ident\verb,.,ident~\dots~ident \\

    76   symident & = & sym^+ ~|~ symbol \\

    77   nat & = & digit^+ \\

    78   var & = & \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\

    79   typefree & = & \verb,',ident \\

    80   typevar & = & \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\

    81   string & = & \verb,", ~\dots~ \verb,", \\

    82   verbatim & = & \verb,{*, ~\dots~ \verb,*}, \\

    83 \end{matharray}

    84 \begin{matharray}{rcl}

    85   letter & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\

    86   digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\

    87   quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\

    88   sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~  %$

    89    \verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \verb,:, ~|~ \\

    90   & & \verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \texttt{\at} ~|~

    91   \verb,^, ~|~ \verb,_, ~|~ \verb,`, ~|~ \verb,|, ~|~ \verb,~, \\

    92   symbol & = & {\forall} ~|~ {\exists} ~|~ {\land} ~|~ {\lor} ~|~ \dots

    93 \end{matharray}

    95 The syntax of $string$ admits any characters, including newlines; ``\verb|"|''

    96 (double-quote) and ``\verb|\|'' (backslash) need to be escaped by a backslash.

    97 Note that ML-style control characters are \emph{not} supported.  The body of

    98 $verbatim$ may consist of any text not containing ``\verb|*}|''; this allows

    99 convenient inclusion of quotes without further escapes.

   101 Comments take the form \texttt{(*~\dots~*)} and may in principle be nested,

   102 just as in ML.  Note that these are \emph{source} comments only, which are

   103 stripped after lexical analysis of the input.  The Isar document syntax also

   104 provides \emph{formal comments} that are considered as part of the text (see

   105 \S\ref{sec:comments}).

   107 \begin{warn}

   108   Proof~General does not handle nested comments properly; it is also unable to

   109   keep \verb,(*,\,/\,\verb,{*, and \verb,*),\,/\,\verb,*}, apart, despite

   110   their rather different meaning.  These are inherent problems of Emacs

   111   legacy.  Users should not be overly aggressive about nesting or alternating

   112   these delimiters.

   113 \end{warn}

   115 \medskip

   117 Mathematical symbols such as ``$\forall$'' are represented in plain ASCII as

   118 ``\verb,\<forall>,''.  Concerning Isabelle itself, any sequence of the form

   119 \verb,\<,$ident$\verb,>, (or \verb,\\<,$ident$\verb,>,) is a legal symbol.

   120 Display of appropriate glyphs is a matter of front-end tools, say the

   121 user-interface of Proof~General plus the X-Symbol package, or the {\LaTeX}

   122 macro setup of document output.  A list of predefined Isabelle symbols is

   123 given in \cite[appendix~A]{isabelle-sys}.

   126 \section{Common syntax entities}

   128 Subsequently, we introduce several basic syntactic entities, such as names,

   129 terms, and theorem specifications, which have been factored out of the actual

   130 Isar language elements to be described later.

   132 Note that some of the basic syntactic entities introduced below (e.g.\

   133 \railqtok{name}) act much like tokens rather than plain nonterminals (e.g.\

   134 \railnonterm{sort}), especially for the sake of error messages.  E.g.\ syntax

   135 elements like $\CONSTS$ referring to \railqtok{name} or \railqtok{type} would

   136 really report a missing name or type rather than any of the constituent

   137 primitive tokens such as \railtok{ident} or \railtok{string}.

   140 \subsection{Names}

   142 Entity \railqtok{name} usually refers to any name of types, constants,

   143 theorems etc.\ that are to be \emph{declared} or \emph{defined} (so qualified

   144 identifiers are excluded here).  Quoted strings provide an escape for

   145 non-identifier names or those ruled out by outer syntax keywords (e.g.\

   146 \verb|"let"|).  Already existing objects are usually referenced by

   147 \railqtok{nameref}.

   149 \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}

   150 \indexoutertoken{int}

   151 \begin{rail}

   152   name: ident | symident | string | nat

   153   ;

   154   parname: '(' name ')'

   155   ;

   156   nameref: name | longident

   157   ;

   158   int: nat | '-' nat

   159   ;

   160 \end{rail}

   163 \subsection{Comments}\label{sec:comments}

   165 Large chunks of plain \railqtok{text} are usually given \railtok{verbatim},

   166 i.e.\ enclosed in \verb|{*|~\dots~\verb|*}|.  For convenience, any of the

   167 smaller text units conforming to \railqtok{nameref} are admitted as well.  A

   168 marginal \railnonterm{comment} is of the form \texttt{--} \railqtok{text}.

   169 Any number of these may occur within Isabelle/Isar commands.

   171 \indexoutertoken{text}\indexouternonterm{comment}

   172 \begin{rail}

   173   text: verbatim | nameref

   174   ;

   175   comment: '--' text

   176   ;

   177 \end{rail}

   180 \subsection{Type classes, sorts and arities}

   182 Classes are specified by plain names.  Sorts have a very simple inner syntax,

   183 which is either a single class name $c$ or a list $\{c@1, \dots, c@n\}$

   184 referring to the intersection of these classes.  The syntax of type arities is

   185 given directly at the outer level.

   187 \railalias{subseteq}{\isasymsubseteq}

   188 \railterm{subseteq}

   190 \indexouternonterm{sort}\indexouternonterm{arity}\indexouternonterm{simplearity}

   191 \indexouternonterm{classdecl}

   192 \begin{rail}

   193   classdecl: name (('<' | subseteq) (nameref + ','))?

   194   ;

   195   sort: nameref

   196   ;

   197   arity: ('(' (sort + ',') ')')? sort

   198   ;

   199   simplearity: ('(' (sort + ',') ')')? nameref

   200   ;

   201 \end{rail}

   204 \subsection{Types and terms}\label{sec:types-terms}

   206 The actual inner Isabelle syntax, that of types and terms of the logic, is far

   207 too sophisticated in order to be modelled explicitly at the outer theory

   208 level.  Basically, any such entity has to be quoted to turn it into a single

   209 token (the parsing and type-checking is performed internally later).  For

   210 convenience, a slightly more liberal convention is adopted: quotes may be

   211 omitted for any type or term that is already atomic at the outer level.  For

   212 example, one may just write \texttt{x} instead of \texttt{"x"}.  Note that

   213 symbolic identifiers (e.g.\ \texttt{++} or $\forall$) are available as well,

   214 provided these have not been superseded by commands or other keywords already

   215 (e.g.\ \texttt{=} or \texttt{+}).

   217 \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}

   218 \begin{rail}

   219   type: nameref | typefree | typevar

   220   ;

   221   term: nameref | var

   222   ;

   223   prop: term

   224   ;

   225 \end{rail}

   227 Positional instantiations are indicated by giving a sequence of terms, or the

   228 placeholder ``$\_$'' (underscore), which means to skip a position.

   230 \indexoutertoken{inst}\indexoutertoken{insts}

   231 \begin{rail}

   232   inst: underscore | term

   233   ;

   234   insts: (inst *)

   235   ;

   236 \end{rail}

   238 Type declarations and definitions usually refer to \railnonterm{typespec} on

   239 the left-hand side.  This models basic type constructor application at the

   240 outer syntax level.  Note that only plain postfix notation is available here,

   241 but no infixes.

   243 \indexouternonterm{typespec}

   244 \begin{rail}

   245   typespec: (() | typefree | '(' ( typefree + ',' ) ')') name

   246   ;

   247 \end{rail}

   250 \subsection{Mixfix annotations}

   252 Mixfix annotations specify concrete \emph{inner} syntax of Isabelle types and

   253 terms.  Some commands such as $\TYPES$ (see \S\ref{sec:types-pure}) admit

   254 infixes only, while $\CONSTS$ (see \S\ref{sec:consts}) and

   255 $\isarkeyword{syntax}$ (see \S\ref{sec:syn-trans}) support the full range of

   256 general mixfixes and binders.

   258 \indexouternonterm{infix}\indexouternonterm{mixfix}\indexouternonterm{structmixfix}

   259 \begin{rail}

   260   infix: '(' ('infix' | 'infixl' | 'infixr') string? nat ')'

   261   ;

   262   mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'

   263   ;

   264   structmixfix: mixfix | '(' 'structure' ')'

   265   ;

   267   prios: '[' (nat + ',') ']'

   268   ;

   269 \end{rail}

   271 Here the \railtok{string} specifications refer to the actual mixfix template

   272 (see also \cite{isabelle-ref}), which may include literal text, spacing,

   273 blocks, and arguments (denoted by ``$_$''); the special symbol \verb,\<index>,

   274 (printed as ``\i'') represents an index argument that specifies an implicit

   275 structure reference (see also \S\ref{sec:locale}).  Infix and binder

   276 declarations provide common abbreviations for particular mixfix declarations.

   277 So in practice, mixfix templates mostly degenerate to literal text for

   278 concrete syntax, such as ``\verb,++,'' for an infix symbol, or ``\verb,++,\i''

   279 for an infix of an implicit structure.

   283 \subsection{Proof methods}\label{sec:syn-meth}

   285 Proof methods are either basic ones, or expressions composed of methods via

   286 ``\texttt{,}'' (sequential composition), ``\texttt{|}'' (alternative choices),

   287 ``\texttt{?}'' (try), ``\texttt{+}'' (repeat at least once).  In practice,

   288 proof methods are usually just a comma separated list of

   289 \railqtok{nameref}~\railnonterm{args} specifications.  Note that parentheses

   290 may be dropped for single method specifications (with no arguments).

   292 \indexouternonterm{method}

   293 \begin{rail}

   294   method: (nameref | '(' methods ')') (() | '?' | '+')

   295   ;

   296   methods: (nameref args | method) + (',' | '|')

   297   ;

   298 \end{rail}

   300 Proper use of Isar proof methods does \emph{not} involve goal addressing.

   301 Nevertheless, specifying goal ranges may occasionally come in handy in

   302 emulating tactic scripts.  Note that $[n-]$ refers to all goals, starting from

   303 $n$.  All goals may be specified by $[!]$, which is the same as $[1-]$.

   305 \indexouternonterm{goalspec}

   306 \begin{rail}

   307   goalspec: '[' (nat '-' nat | nat '-' | nat | '!' ) ']'

   308   ;

   309 \end{rail}

   312 \subsection{Attributes and theorems}\label{sec:syn-att}

   314 Attributes (and proof methods, see \S\ref{sec:syn-meth}) have their own

   315 ``semi-inner'' syntax, in the sense that input conforming to

   316 \railnonterm{args} below is parsed by the attribute a second time.  The

   317 attribute argument specifications may be any sequence of atomic entities

   318 (identifiers, strings etc.), or properly bracketed argument lists.  Below

   319 \railqtok{atom} refers to any atomic entity, including any \railtok{keyword}

   320 conforming to \railtok{symident}.

   322 \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}

   323 \begin{rail}

   324   atom: nameref | typefree | typevar | var | nat | keyword

   325   ;

   326   arg: atom | '(' args ')' | '[' args ']'

   327   ;

   328   args: arg *

   329   ;

   330   attributes: '[' (nameref args * ',') ']'

   331   ;

   332 \end{rail}

   334 Theorem specifications come in several flavors: \railnonterm{axmdecl} and

   335 \railnonterm{thmdecl} usually refer to axioms, assumptions or results of goal

   336 statements, while \railnonterm{thmdef} collects lists of existing theorems.

   337 Existing theorems are given by \railnonterm{thmref} and \railnonterm{thmrefs},

   338 the former requires an actual singleton result.  Any of these theorem

   339 specifications may include lists of attributes both on the left and right hand

   340 sides; attributes are applied to any immediately preceding fact.  If names are

   341 omitted, the theorems are not stored within the theorem database of the theory

   342 or proof context; any given attributes are still applied, though.

   344 \indexouternonterm{thmdecl}\indexouternonterm{axmdecl}

   345 \indexouternonterm{thmdef}\indexouternonterm{thmrefs}

   346 \begin{rail}

   347   axmdecl: name attributes? ':'

   348   ;

   349   thmdecl: thmbind ':'

   350   ;

   351   thmdef: thmbind '='

   352   ;

   353   thmref: nameref attributes?

   354   ;

   355   thmrefs: thmref +

   356   ;

   358   thmbind: name attributes | name | attributes

   359   ;

   360 \end{rail}

   363 \subsection{Term patterns and declarations}\label{sec:term-decls}

   365 Wherever explicit propositions (or term fragments) occur in a proof text,

   366 casual binding of schematic term variables may be given specified via patterns

   367 of the form ``$\ISS{p@1\;\dots}{p@n}$''.  There are separate versions

   368 available for \railqtok{term}s and \railqtok{prop}s.  The latter provides a

   369 $\CONCLNAME$ part with patterns referring the (atomic) conclusion of a rule.

   371 \indexouternonterm{termpat}\indexouternonterm{proppat}

   372 \begin{rail}

   373   termpat: '(' ('is' term +) ')'

   374   ;

   375   proppat: '(' (('is' prop +) | 'concl' ('is' prop +) | ('is' prop +) 'concl' ('is' prop +)) ')'

   376   ;

   377 \end{rail}

   379 Declarations of local variables $x :: \tau$ and logical propositions $a :

   380 \phi$ represent different views on the same principle of introducing a local

   381 scope.  In practice, one may usually omit the typing of $vars$ (due to

   382 type-inference), and the naming of propositions (due to implicit references of

   383 current facts).  In any case, Isar proof elements usually admit to introduce

   384 multiple such items simultaneously.

   386 \indexouternonterm{vars}\indexouternonterm{props}

   387 \begin{rail}

   388   vars: (name+) ('::' type)?

   389   ;

   390   props: thmdecl? (prop proppat? +)

   391   ;

   392 \end{rail}

   394 The treatment of multiple declarations corresponds to the complementary focus

   395 of $vars$ versus $props$: in ``$x@1~\dots~x@n :: \tau$'' the typing refers to

   396 all variables, while in $a\colon \phi@1~\dots~\phi@n$ the naming refers to all

   397 propositions collectively.  Isar language elements that refer to $vars$ or

   398 $props$ typically admit separate typings or namings via another level of

   399 iteration, with explicit $\AND$ separators; e.g.\ see $\FIXNAME$ and

   400 $\ASSUMENAME$ in \S\ref{sec:proof-context}.

   403 \subsection{Antiquotations}\label{sec:antiq}

   405 \begin{matharray}{rcl}

   406   thm & : & \isarantiq \\

   407   prop & : & \isarantiq \\

   408   term & : & \isarantiq \\

   409   typ & : & \isarantiq \\

   410   text & : & \isarantiq \\

   411   goals & : & \isarantiq \\

   412   subgoals & : & \isarantiq \\

   413 \end{matharray}

   415 The text body of formal comments (see also \S\ref{sec:comments}) may contain

   416 antiquotations of logical entities, such as theorems, terms and types, which

   417 are to be presented in the final output produced by the Isabelle document

   418 preparation system (see also \S\ref{sec:document-prep}).

   420 Thus embedding of

   421 ``\texttt{{\at}{\ttlbrace}term~[show_types]~"f(x)~=~a~+~x"{\ttrbrace}}''

   422 within a text block would cause

   423 \isa{(f{\isasymColon}'a~{\isasymRightarrow}~'a)~(x{\isasymColon}'a)~=~(a{\isasymColon}'a)~+~x}

   424 to appear in the final {\LaTeX} document.  Also note that theorem

   425 antiquotations may involve attributes as well.  For example,

   426 \texttt{{\at}{\ttlbrace}thm~sym~[no_vars]{\ttrbrace}} would print the

   427 statement where all schematic variables have been replaced by fixed ones,

   428 which are easier to read.

   430 \indexisarant{thm}\indexisarant{prop}\indexisarant{term}

   431 \indexisarant{typ}\indexisarant{text}\indexisarant{goals}\indexisarant{subgoals}

   432 \begin{rail}

   433   atsign lbrace antiquotation rbrace

   434   ;

   436   antiquotation:

   437     'thm' options thmrefs |

   438     'prop' options prop |

   439     'term' options term |

   440     'typ' options type |

   441     'text' options name |

   442     'goals' options |

   443     'subgoals' options

   444   ;

   445   options: '[' (option * ',') ']'

   446   ;

   447   option: name | name '=' name

   448   ;

   449 \end{rail}

   451 Note that the syntax of antiquotations may \emph{not} include source comments

   452 \texttt{(*~\dots~*)} or verbatim text \verb|{*|~\dots~\verb|*}|.

   454 \begin{descr}

   456 \item [$\at\{thm~\vec a\}$] prints theorems $\vec a$. Note that attribute

   457   specifications may be included as well (see also \S\ref{sec:syn-att}); the

   458   $no_vars$ operation (see \S\ref{sec:misc-meth-att}) would be particularly

   459   useful to suppress printing of schematic variables.

   461 \item [$\at\{prop~\phi\}$] prints a well-typed proposition $\phi$.

   463 \item [$\at\{term~t\}$] prints a well-typed term $t$.

   465 \item [$\at\{typ~\tau\}$] prints a well-formed type $\tau$.

   467 \item [$\at\{text~s\}$] prints uninterpreted source text $s$.  This is

   468   particularly useful to print portions of text according to the Isabelle

   469   {\LaTeX} output style, without demanding well-formedness (e.g.\ small pieces

   470   of terms that should not be parsed or type-checked yet).

   472 \item [$\at\{goals\}$] prints the current \emph{dynamic} goal state.  This is

   473   mainly for support of tactic-emulation scripts within Isar --- presentation

   474   of goal states does not conform to actual human-readable proof documents.

   475   Please do not include goal states into document output unless you really

   476   know what you are doing!

   478 \item [$\at\{subgoals\}$] behaves almost like $goals$, except that it does not

   479   print the main goal.

   481 \end{descr}

   483 \medskip

   485 The following options are available to tune the output.  Note that most of

   486 these coincide with ML flags of the same names (see also \cite{isabelle-ref}).

   487 \begin{descr}

   488 \item[$show_types = bool$ and $show_sorts = bool$] control printing of

   489   explicit type and sort constraints.

   490 \item[$long_names = bool$] forces names of types and constants etc.\ to be

   491   printed in their fully qualified internal form.

   492 \item[$eta_contract = bool$] prints terms in $\eta$-contracted form.

   493 \item[$display = bool$] indicates if the text is to be output as multi-line

   494   ``display material'', rather than a small piece of text without line breaks

   495   (which is the default).

   496 \item[$quotes = bool$] indicates if the output should be enclosed in double

   497   quotes.

   498 \item[$mode = name$] adds $name$ to the print mode to be used for presentation

   499   (see also \cite{isabelle-ref}).  Note that the standard setup for {\LaTeX}

   500   output is already present by default, including the modes ``$latex$'',

   501   ``$xsymbols$'', ``$symbols$''.

   502 \item[$margin = nat$ and $indent = nat$] change the margin or indentation for

   503   pretty printing of display material.

   504 \item[$source = bool$] prints the source text of the antiquotation arguments,

   505   rather than the actual value.  Note that this does not affect

   506   well-formedness checks of $thm$, $term$, etc. (only the $text$ antiquotation

   507   admits arbitrary output).

   508 \item[$goals_limit = nat$] determines the maximum number of goals to be

   509   printed.

   510 \end{descr}

   512 For boolean flags, ``$name = true$'' may be abbreviated as ``$name$''.  All of

   513 the above flags are disabled by default, unless changed from ML.

   515 \medskip Note that antiquotations do not only spare the author from tedious

   516 typing of logical entities, but also achieve some degree of

   517 consistency-checking of informal explanations with formal developments:

   518 well-formedness of terms and types with respect to the current theory or proof

   519 context is ensured here.

   521 %%% Local Variables:

   522 %%% mode: latex

   523 %%% TeX-master: "isar-ref"

   524 %%% End:

author	wenzelm
	Fri, 08 Mar 2002 15:53:15 +0100
changeset 13048	8b2eb3b78cc3
parent 13039	cfcc1f6f21df
child 13827	c690cb885db4
permissions	-rw-r--r--