wneuper/isa: doc-src/IsarRef/syntax.tex@8e1cae1de136

\chapter{Syntax primitives}

The rather generic framework of Isabelle/Isar syntax emerges from three main
syntactic categories: \emph{commands} of the top-level Isar engine (covering
theory and proof elements), \emph{methods} for general goal refinements
(analogous to traditional ``tactics''), and \emph{attributes} for operations
on facts (within a certain context).  Here we give a reference of basic
syntactic entities underlying Isabelle/Isar syntax in a bottom-up manner.
Concrete theory and proof language elements will be introduced later on.

\medskip

In order to get started with writing well-formed Isabelle/Isar documents, the
most important aspect to be noted is the difference between \emph{inner} and
\emph{outer} syntax.  Inner syntax is that of Isabelle types and terms of the
logic, while outer syntax is that of Isabelle/Isar theory sources (including
proofs).  As a general rule, inner syntax entities may occur only as
\emph{atomic entities} within outer syntax.  For example, the string
\texttt{"x + y"} and identifier \texttt{z} are legal term specifications
within a theory, while \texttt{x + y} is not.
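
As a minimal sketch of this rule, assuming an object-logic such as
Isabelle/HOL where \texttt{+} is available, the diagnostic command
\texttt{term} accepts the following two inputs, whereas \texttt{term x + y}
would be rejected by the outer syntax:

\begin{verbatim}
term z
term "x + y"
\end{verbatim}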

\begin{warn}
  Old-style Isabelle theories used to fake parts of the inner syntax of types,
  with rather complicated rules for when quotes may be omitted.  Despite the
  minor drawback of requiring quotes more often, the syntax of Isabelle/Isar
  is somewhat simpler and more robust in that respect.
\end{warn}

Printed theory documents usually omit quotes to gain readability (this is a
matter of {\LaTeX} macro setup, say via \verb,\isabellestyle,, see also
\cite{isabelle-sys}).  Experienced users of Isabelle/Isar may easily
reconstruct the lost technical information, while mere readers need not care
about quotes at all.

\medskip

Isabelle/Isar input may contain any number of input termination characters
``\texttt{;}'' (semicolon) to separate commands explicitly.  This is
particularly useful in interactive shell sessions to make clear where the
current command is intended to end.  Otherwise, the interpreter loop will
continue to issue a secondary prompt ``\verb,#,'' until an end-of-command is
clearly indicated from the input syntax, e.g.\ when the next command keyword
is encountered.
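
For illustration, a small proof with every command terminated explicitly might
be entered as follows.  This is only a sketch, assuming an object-logic such
as Isabelle/HOL where \texttt{-->} denotes implication:

\begin{verbatim}
lemma "A --> A";
proof;
  assume A;
  show A; by assumption;
qed;
\end{verbatim}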

Advanced interfaces such as Proof~General \cite{proofgeneral} do not require
explicit semicolons: the amount of input text is determined automatically by
inspecting the present content of the Emacs text buffer.  In the printed
presentation of Isabelle/Isar documents semicolons are omitted altogether for
readability.

\begin{warn}
  Proof~General requires certain syntax classification tables in order to
  achieve properly synchronized interaction with the Isabelle/Isar process.
  These tables need to be consistent with the Isabelle version and particular
  logic image to be used in a running session (common object-logics may well
  change the outer syntax).  The standard setup should work correctly with any
  of the ``official'' logic images derived from Isabelle/HOL (including HOLCF
  etc.).  Users of alternative logics may need to tell Proof~General
  explicitly, e.g.\ by giving an option \verb,-k ZF, (in conjunction with
  \verb,-l ZF, to specify the default logic image).
\end{warn}

\section{Lexical matters}\label{sec:lex-syntax}

The Isabelle/Isar outer syntax provides token classes as presented below.
Note that some of these coincide (by full intention) with the inner lexical
syntax as presented in \cite{isabelle-ref}.

\indexoutertoken{ident}\indexoutertoken{longident}\indexoutertoken{symident}
\indexoutertoken{nat}\indexoutertoken{var}\indexoutertoken{typefree}
\indexoutertoken{typevar}\indexoutertoken{string}\indexoutertoken{verbatim}
\begin{matharray}{rcl}
  ident & = & letter~quasiletter^* \\
  longident & = & ident\verb,.,ident~\dots~ident \\
  symident & = & sym^+ ~|~ symbol \\
  nat & = & digit^+ \\
  var & = & \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\
  typefree & = & \verb,',ident \\
  typevar & = & \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\
  string & = & \verb,", ~\dots~ \verb,", \\
  verbatim & = & \verb,{*, ~\dots~ \verb,*}, \\
\end{matharray}
\begin{matharray}{rcl}
  letter & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\
  digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\
  quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\
  sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~  %$
   \verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \verb,:, ~|~ \\
  & & \verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \texttt{\at} ~|~
  \verb,^, ~|~ \verb,_, ~|~ \verb,`, ~|~ \verb,|, ~|~ \verb,~, \\
  symbol & = & {\forall} ~|~ {\exists} ~|~ {\land} ~|~ {\lor} ~|~ \dots
\end{matharray}

The syntax of \railtoken{string} admits any characters, including newlines;
``\verb|"|'' (double-quote) and ``\verb|\|'' (backslash) need to be escaped by
a backslash.  Note that ML-style control characters are \emph{not} supported.
The body of \railtoken{verbatim} may consist of any text not containing
``\verb|*}|''; this allows convenient inclusion of quotes without further
escapes.
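
For illustration, the two token forms might look as follows (a sketch; the
text content itself is arbitrary):

\begin{verbatim}
"a string with an escaped \" quote and an escaped \\ backslash"
{* verbatim text may contain "quotes" and \ backslashes as they are *}
\end{verbatim}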

Comments take the form \texttt{(*~\dots~*)} and may in principle be nested,
just as in ML.  Note that these are \emph{source} comments only, which are
stripped after lexical analysis of the input.  The Isar document syntax also
provides \emph{formal comments} that are considered as part of the text (see
\S\ref{sec:comments}).
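
A nested source comment, given here merely as a sketch of the bracketing
rule, reads as follows; the whole line is stripped as a single comment:

\begin{verbatim}
(* an outer source comment (* with a nested comment *) that ends here *)
\end{verbatim}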

\begin{warn}
  Proof~General does not handle nested comments properly; it is also unable to
  keep \verb,(*,\,/\,\verb,{*, and \verb,*),\,/\,\verb,*}, apart, despite
  their rather different meaning.  These are inherent problems of Emacs
  legacy.
\end{warn}

\medskip

Mathematical symbols such as ``$\forall$'' are represented in plain ASCII as
``\verb,\<forall>,''.  Concerning Isabelle itself, any sequence of the form
\verb,\<,$ident$\verb,>, (or \verb,\\<,$ident$\verb,>,) is a legal symbol.
Display of appropriate glyphs is a matter of front-end tools, say the
user-interface of Proof~General plus the X-Symbol package, or the {\LaTeX}
macro setup of document output.  A list of predefined Isabelle symbols is
given in \cite[appendix~A]{isabelle-sys}.
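
For example, assuming a front-end with the usual symbol setup, the ASCII
source term below would be displayed as $\forall x.\; \exists y.\; x = y$:

\begin{verbatim}
"\<forall>x. \<exists>y. x = y"
\end{verbatim}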

\section{Common syntax entities}

Subsequently, we introduce several basic syntactic entities, such as names,
terms, and theorem specifications, which have been factored out of the actual
Isar language elements to be described later.

Note that some of the basic syntactic entities introduced below (e.g.\
\railqtoken{name}) act much like tokens rather than plain nonterminals (e.g.\
\railnonterm{sort}), especially for the sake of error messages.  For example,
syntax elements like $\CONSTS$ referring to \railqtoken{name} or
\railqtoken{type} would really report a missing name or type rather than any
of the constituent primitive tokens such as \railtoken{ident} or
\railtoken{string}.

\subsection{Names}

Entity \railqtoken{name} usually refers to any name of types, constants,
theorems etc.\ that are to be \emph{declared} or \emph{defined} (so qualified
identifiers are excluded here).  Quoted strings provide an escape for
non-identifier names or those ruled out by outer syntax keywords (e.g.\
\verb|"let"|).  Already existing objects are usually referenced by
\railqtoken{nameref}.

\indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
\indexoutertoken{int}
\begin{rail}
  name: ident | symident | string | nat
  ;
  parname: '(' name ')'
  ;
  nameref: name | longident
  ;
  int: nat | '-' nat
  ;
\end{rail}

\subsection{Comments}\label{sec:comments}

Large chunks of plain \railqtoken{text} are usually given
\railtoken{verbatim}, i.e.\ enclosed in \verb|{*|~\dots~\verb|*}|.  For
convenience, any of the smaller text units conforming to \railqtoken{nameref}
are admitted as well.  A marginal \railnonterm{comment} is of the form
\texttt{--} \railqtoken{text}.  Any number of these may occur within
Isabelle/Isar commands.

\indexoutertoken{text}\indexouternonterm{comment}
\begin{rail}
  text: verbatim | nameref
  ;
  comment: '--' text
  ;
\end{rail}
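
As a brief sketch, marginal comments attached to constant declarations could
look as follows; the constants \texttt{foo} and \texttt{bar} are hypothetical
names chosen for illustration only:

\begin{verbatim}
consts
  foo :: nat  -- "a marginal comment given as a string"
  bar :: nat  -- {* a marginal comment given as verbatim text *}
\end{verbatim}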

\subsection{Type classes, sorts and arities}

Classes are specified by plain names.  Sorts have a very simple inner syntax,
which is either a single class name $c$ or a list $\{c@1, \dots, c@n\}$
referring to the intersection of these classes.  The syntax of type arities is
given directly at the outer level.

\railalias{subseteq}{\isasymsubseteq}
\railterm{subseteq}

\indexouternonterm{sort}\indexouternonterm{arity}\indexouternonterm{simplearity}
\indexouternonterm{classdecl}
\begin{rail}
  classdecl: name (('<' | subseteq) (nameref + ','))?
  ;
  sort: nameref
  ;
  arity: ('(' (sort + ',') ')')? sort
  ;
  simplearity: ('(' (sort + ',') ')')? nameref
  ;
\end{rail}
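
The following sketch illustrates how these entities might be written in a
theory; the classes \texttt{c} and \texttt{d} and the type constructor
\texttt{t} are hypothetical.  Note that a sort consisting of a single class
name is atomic, while an intersection of classes belongs to the inner syntax
and therefore has to be quoted:

\begin{verbatim}
classes c  d < c
typedecl ('a, 'b) t
arities t :: (c, "{c, d}") c
\end{verbatim}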

\subsection{Types and terms}\label{sec:types-terms}

The actual inner Isabelle syntax, that of types and terms of the logic, is far
too sophisticated to be modelled explicitly at the outer theory
level.  Basically, any such entity has to be quoted to turn it into a single
token (the parsing and type-checking is performed internally later).  For
convenience, a slightly more liberal convention is adopted: quotes may be
omitted for any type or term that is already \emph{atomic} at the outer level.
For example, one may just write \texttt{x} instead of \texttt{"x"}.  Note that
symbolic identifiers (e.g.\ \texttt{++} or $\forall$) are available as well,
provided these have not been superseded by commands or other keywords already
(e.g.\ \texttt{=} or \texttt{+}).

\indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
\begin{rail}
  type: nameref | typefree | typevar
  ;
  term: nameref | var
  ;
  prop: term
  ;
\end{rail}

Positional instantiations are indicated by giving a sequence of terms, or the
placeholder ``$\_$'' (underscore), which means to skip a position.

\indexoutertoken{inst}\indexoutertoken{insts}
\begin{rail}
  inst: underscore | term
  ;
  insts: (inst *)
  ;
\end{rail}
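
As a sketch of positional instantiation, assume a theorem
\texttt{add_commute} with two schematic variables, as found in Isabelle/HOL.
The first command instantiates both positions, using an atomic term without
quotes and a compound term with quotes; the second command skips the first
position:

\begin{verbatim}
thm add_commute [of x "y + z"]
thm add_commute [of _ "y + z"]
\end{verbatim}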

Type declarations and definitions usually refer to \railnonterm{typespec} on
the left-hand side.  This models basic type constructor application at the
outer syntax level.  Note that only plain postfix notation is available here,
but no infixes.

\indexouternonterm{typespec}
\begin{rail}
  typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
  ;
\end{rail}
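
The three forms of \railnonterm{typespec} (no argument, a single argument, and
a parenthesized argument list) might be used as in the following sketch.  The
type names \texttt{elem}, \texttt{seq}, and \texttt{pair} are hypothetical,
and the product type in the last line assumes Isabelle/HOL:

\begin{verbatim}
typedecl elem
typedecl 'a seq
types ('a, 'b) pair = "'a * 'b"
\end{verbatim}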

\subsection{Mixfix annotations}

Mixfix annotations specify concrete \emph{inner} syntax of Isabelle types and
terms.  Some commands such as $\TYPES$ (see \S\ref{sec:types-pure}) admit
infixes only, while $\CONSTS$ (see \S\ref{sec:consts}) and
$\isarkeyword{syntax}$ (see \S\ref{sec:syn-trans}) support the full range of
general mixfixes and binders.

\indexouternonterm{infix}\indexouternonterm{mixfix}
\begin{rail}
  infix: '(' ('infix' | 'infixl' | 'infixr') string? nat ')'
  ;
  mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'
  ;

  prios: '[' (nat + ',') ']'
  ;
\end{rail}

Here the \railtoken{string} specifications refer to the actual mixfix template
(see also \cite{isabelle-ref}), which may include literal text, spacing,
blocks, and arguments (denoted by ``$\_$''); the special symbol \verb,\<index>,
(printed as ``\i'') represents an index argument that specifies an implicit
structure reference (see also \S\ref{sec:locale}).  Infix and binder
declarations provide common abbreviations for particular mixfix declarations.
So in practice, mixfix templates mostly degenerate to literal text for
concrete syntax, such as ``\verb,++,'' for an infix symbol, or ``\verb,++,\i''
for an infix of an implicit structure.
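
The three kinds of annotation (infix, general mixfix with priorities, and
binder) might be attached to constant declarations as in the following sketch.
The constants \texttt{app}, \texttt{neg}, and \texttt{forall}, their concrete
syntax, and the priorities are hypothetical choices for illustration; the list
type assumes Isabelle/HOL:

\begin{verbatim}
consts
  app :: "'a list => 'a list => 'a list"    (infixr "++" 65)
  neg :: "bool => bool"    ("NOT _" [40] 40)
  forall :: "('a => bool) => bool"    (binder "FORALL " 10)
\end{verbatim}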

\subsection{Proof methods}\label{sec:syn-meth}

Proof methods are either basic ones, or expressions composed of methods via
``\texttt{,}'' (sequential composition), ``\texttt{|}'' (alternative choices),
``\texttt{?}'' (try), and ``\texttt{+}'' (repeat at least once).  In practice,
proof methods are usually just a comma-separated list of
\railqtoken{nameref}~\railnonterm{args} specifications.  Note that parentheses
may be dropped for single method specifications (with no arguments).

\indexouternonterm{method}
\begin{rail}
  method: (nameref | '(' methods ')') (() | '?' | '+')
  ;
  methods: (nameref args | method) + (',' | '|')
  ;
\end{rail}
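
The following isolated commands sketch typical method expressions.  They do
not form a coherent proof, and the methods and the rule \texttt{conjI} are
assumed to be those of Isabelle/HOL:

\begin{verbatim}
by simp                      (* single method, parentheses dropped *)
by (rule conjI, simp+)       (* sequential composition, with repetition *)
apply (simp | blast)?        (* alternative choices, tried at most once *)
\end{verbatim}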

Proper use of Isar proof methods does \emph{not} involve goal addressing.
Nevertheless, specifying goal ranges may occasionally come in handy in
emulating tactic scripts.  Note that $[n-]$ refers to all goals, starting from
$n$.  All goals may be specified by $[!]$, which is the same as $[1-]$.

\indexouternonterm{goalspec}
\begin{rail}
  goalspec: '[' (nat '-' nat | nat '-' | nat | '!' ) ']'
  ;
\end{rail}
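
The four forms of \railnonterm{goalspec} might be used as in the following
sketch.  It assumes that the tactic emulation method \texttt{subgoal_tac}
accepts an optional goal specification before its proposition; the
proposition itself is arbitrary:

\begin{verbatim}
apply (subgoal_tac [2] "x = y")      (* second goal only *)
apply (subgoal_tac [2-3] "x = y")    (* goals 2 to 3 *)
apply (subgoal_tac [2-] "x = y")     (* all goals, starting from 2 *)
apply (subgoal_tac [!] "x = y")      (* all goals *)
\end{verbatim}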

\subsection{Attributes and theorems}\label{sec:syn-att}

Attributes (and proof methods, see \S\ref{sec:syn-meth}) have their own
``semi-inner'' syntax, in the sense that input conforming to
\railnonterm{args} below is parsed by the attribute a second time.  The
attribute argument specifications may be any sequence of atomic entities
(identifiers, strings etc.), or properly bracketed argument lists.  Below
\railqtoken{atom} refers to any atomic entity, including any
\railtoken{keyword} conforming to \railtoken{symident}.

\indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
\begin{rail}
  atom: nameref | typefree | typevar | var | nat | keyword
  ;
  arg: atom | '(' args ')' | '[' args ']'
  ;
  args: arg *
  ;
  attributes: '[' (nameref args * ',') ']'
  ;
\end{rail}

Theorem specifications come in several flavors: \railnonterm{axmdecl} and
\railnonterm{thmdecl} usually refer to axioms, assumptions or results of goal
statements, while \railnonterm{thmdef} collects lists of existing theorems.
Existing theorems are given by \railnonterm{thmref} and \railnonterm{thmrefs};
the former requires an actual singleton result.  Any of these theorem
specifications may include lists of attributes both on the left and right hand
sides; attributes are applied to any immediately preceding theorem.  If names
are omitted, the theorems are not stored within the theorem database of the
theory or proof context; any given attributes are still applied, though.

\indexouternonterm{thmdecl}\indexouternonterm{axmdecl}
\indexouternonterm{thmdef}\indexouternonterm{thmrefs}
\begin{rail}
  axmdecl: name attributes? ':'
  ;
  thmdecl: thmbind ':'
  ;
  thmdef: thmbind '='
  ;
  thmref: nameref attributes?
  ;
  thmrefs: thmref +
  ;

  thmbind: name attributes | name | attributes
  ;
\end{rail}
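
The following sketch shows the main flavors side by side, assuming natural
numbers as in Isabelle/HOL; the theorem names are hypothetical.  The first
lemma is named and declared as a simplification rule, the second one is
unnamed (so only the attribute takes effect), and the last command binds a
new name to an existing theorem with an attribute applied on the right-hand
side:

\begin{verbatim}
lemma my_add_zero [simp]: "x + 0 = (x::nat)"
  by simp

lemma [simp]: "0 + x = (x::nat)"
  by simp

lemmas my_add_zero_sym = my_add_zero [symmetric]
\end{verbatim}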

\subsection{Term patterns and declarations}\label{sec:term-decls}

Wherever explicit propositions (or term fragments) occur in a proof text,
casual binding of schematic term variables may be specified via patterns
of the form $\ISS{p@1\;\dots}{p@n}$.  There are separate versions available
for \railqtoken{term}s and \railqtoken{prop}s.  The latter provides a
$\CONCLNAME$ part with patterns referring to the (atomic) conclusion of a rule.

\indexouternonterm{termpat}\indexouternonterm{proppat}
\begin{rail}
  termpat: '(' ('is' term +) ')'
  ;
  proppat: '(' (('is' prop +) | 'concl' ('is' prop +) | ('is' prop +) 'concl' ('is' prop +)) ')'
  ;
\end{rail}
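
As a sketch, the following proof elements bind the schematic variables
\texttt{?L}, \texttt{?R}, \texttt{?lhs}, and \texttt{?rhs} to the
corresponding parts of the stated propositions.  The propositions are
arbitrary and assume Isabelle/HOL syntax:

\begin{verbatim}
assume "A & B" (is "?L & ?R")
show "x + 0 = (x::nat)" (is "?lhs = ?rhs")
\end{verbatim}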

Declarations of local variables $x :: \tau$ and logical propositions $a :
\phi$ represent different views on the same principle of introducing a local
scope.  In practice, one may usually omit the typing of $vars$ (due to
type-inference), and the naming of propositions (due to implicit chaining of
emerging facts).  In any case, Isar proof elements usually allow several
such items to be introduced simultaneously.

\indexouternonterm{vars}\indexouternonterm{props}
\begin{rail}
  vars: (name+) ('::' type)?
  ;
  props: thmdecl? (prop proppat? +)
  ;
\end{rail}

The treatment of multiple declarations corresponds to the complementary focus
of $vars$ versus $props$: in ``$x@1~\dots~x@n :: \tau$'' the typing refers to
all variables, while in $a\colon \phi@1~\dots~\phi@n$ the naming refers to all
propositions collectively.  Isar language elements that refer to $vars$ or
$props$ typically admit separate typings or namings via another level of
iteration, with explicit $\AND$ separators; e.g.\ see $\FIXNAME$ and
$\ASSUMENAME$ in \S\ref{sec:proof-context}.
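
For illustration, the following sketch fixes two variables sharing one typing
and a third with a separate typing, and then assumes two propositions named
collectively as \texttt{a} and a third one named \texttt{b}.  All names and
propositions are hypothetical and assume Isabelle/HOL:

\begin{verbatim}
fix x y :: nat and f :: "nat => nat"
assume a: "x = y" "f x = x" and b: "f y = y"
\end{verbatim}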

\subsection{Antiquotations}\label{sec:antiq}

\begin{matharray}{rcl}
  thm & : & \isarantiq \\
  prop & : & \isarantiq \\
  term & : & \isarantiq \\
  typ & : & \isarantiq \\
  text & : & \isarantiq \\
  goals & : & \isarantiq \\
  subgoals & : & \isarantiq \\
\end{matharray}

The text body of formal comments (see also \S\ref{sec:comments}) may contain
antiquotations of logical entities, such as theorems, terms and types, which
are to be presented in the final output produced by the Isabelle document
preparation system (see also \S\ref{sec:document-prep}).

Thus embedding of
\texttt{{\at}{\ttlbrace}term~[show_types]~"f(x)~=~a~+~x"{\ttrbrace}} within a
text block would cause
\isa{(f{\isasymColon}'a~{\isasymRightarrow}~'a)~(x{\isasymColon}'a)~=~(a{\isasymColon}'a)~+~x}
to appear in the final {\LaTeX} document.  Also note that theorem
antiquotations may involve attributes as well.  For example,
\texttt{{\at}{\ttlbrace}thm~sym~[no_vars]{\ttrbrace}} would print the
statement where all schematic variables have been replaced by fixed ones,
which are easier to read.

\indexisarant{thm}\indexisarant{prop}\indexisarant{term}
\indexisarant{typ}\indexisarant{text}\indexisarant{goals}\indexisarant{subgoals}
\begin{rail}
  atsign lbrace antiquotation rbrace
  ;

  antiquotation:
    'thm' options thmrefs |
    'prop' options prop |
    'term' options term |
    'typ' options type |
    'text' options name |
    'goals' options |
    'subgoals' options
  ;
  options: '[' (option * ',') ']'
  ;
  option: name | name '=' name
  ;
\end{rail}

Note that the syntax of antiquotations may \emph{not} include source comments
\texttt{(*~\dots~*)} or verbatim text \verb|{*|~\dots~\verb|*}|.

\begin{descr}
\item [$\at\{thm~\vec a\}$] prints theorems $\vec a$. Note that attribute
  specifications may be included as well (see also \S\ref{sec:syn-att}); the
  $no_vars$ operation (see \S\ref{sec:misc-meth-att}) would be particularly
  useful to suppress printing of schematic variables.
\item [$\at\{prop~\phi\}$] prints a well-typed proposition $\phi$.
\item [$\at\{term~t\}$] prints a well-typed term $t$.
\item [$\at\{typ~\tau\}$] prints a well-formed type $\tau$.
\item [$\at\{text~s\}$] prints uninterpreted source text $s$.  This is
  particularly useful to print portions of text according to the Isabelle
  {\LaTeX} output style, without demanding well-formedness (e.g.\ small pieces
  of terms that cannot be parsed or type-checked yet).
\item [$\at\{goals\}$] prints the current \emph{dynamic} goal state.  This is
  only for support of tactic-emulation scripts within Isar; presentation of
  goal states does not conform to actual human-readable proof documents.

  Please do not include goal states in document output unless you really
  know what you are doing!
\item [$\at\{subgoals\}$] behaves almost like $goals$, except that it does not
  print the main goal.
\end{descr}

\medskip

The following options are available to tune the output.  Note that most of
these coincide with ML flags of the same names (see also \cite{isabelle-ref}).
\begin{descr}
\item[$show_types = bool$ and $show_sorts = bool$] control printing of
  explicit type and sort constraints.
\item[$long_names = bool$] forces names of types and constants etc.\ to be
  printed in their fully qualified internal form.
\item[$eta_contract = bool$] prints terms in $\eta$-contracted form.
\item[$display = bool$] indicates if the text is to be output as multi-line
  ``display material'', rather than a small piece of text without line breaks
  (which is the default).
\item[$quotes = bool$] indicates if the output should be enclosed in double
  quotes.
\item[$mode = name$] adds $name$ to the print mode to be used for presentation
  (see also \cite{isabelle-ref}).  Note that the standard setup for {\LaTeX}
  output is already present by default, including the modes ``$latex$'',
  ``$xsymbols$'', ``$symbols$''.
\item[$margin = nat$ and $indent = nat$] change the margin or indentation for
  pretty printing of display material.
\item[$source = bool$] prints the source text of the antiquotation arguments,
  rather than the actual value.  Note that this does not affect
  well-formedness checks of $thm$, $term$, etc.\ (only the $text$ antiquotation
  admits arbitrary output).
\item[$goals_limit = nat$] determines the maximum number of goals to be
  printed.
\end{descr}

For boolean flags, ``$name = true$'' may be abbreviated as ``$name$''.  All of
the above flags are disabled by default, unless changed from ML.
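
As a sketch of how options combine with antiquotations, a text block might
read as follows.  It assumes a context such as Isabelle/HOL where the theorem
\texttt{sym} is available; the terms and the margin value are arbitrary:

\begin{verbatim}
text {*
  The term @{term [show_types] "f x"} is checked against the context, and
  the theorem @{thm [display, margin = 60] sym [no_vars]} is printed as
  display material.  The source text @{text "x +"} is not checked at all.
*}
\end{verbatim}

Here the boolean flags $show_types$ and $display$ are given in their
abbreviated form.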

\medskip Note that antiquotations not only spare the author from tedious
typing, but also achieve some degree of consistency-checking of informal
explanations with formal developments, since well-formedness of terms and
types with respect to the current theory or proof context can be ensured.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "isar-ref"
%%% End:

author	wenzelm
	Tue, 12 Feb 2002 20:33:03 +0100
changeset 12879	8e1cae1de136
parent 12637	4d43b06a81e1
child 12976	5cfe2941a5db