wneuper/isa: doc-src/IsarRef/Thy/Outer

     1 (* $Id$ *)

     3 theory Outer_Syntax

     4 imports Pure

     5 begin

     7 chapter {* Outer syntax *}

     9 text {*

    10   The rather generic framework of Isabelle/Isar syntax emerges from

    11   three main syntactic categories: \emph{commands} of the top-level

    12   Isar engine (covering theory and proof elements), \emph{methods} for

    13   general goal refinements (analogous to traditional ``tactics''), and

    14   \emph{attributes} for operations on facts (within a certain

    15   context).  Subsequently we give a reference of basic syntactic

    16   entities underlying Isabelle/Isar syntax in a bottom-up manner.

    17   Concrete theory and proof language elements will be introduced later

    18   on.

    20   \medskip In order to get started with writing well-formed

    21   Isabelle/Isar documents, the most important aspect to be noted is

    22   the difference of \emph{inner} versus \emph{outer} syntax.  Inner

    23   syntax is that of Isabelle types and terms of the logic, while outer

    24   syntax is that of Isabelle/Isar theory sources (specifications and

    25   proofs).  As a general rule, inner syntax entities may occur only as

    26   \emph{atomic entities} within outer syntax.  For example, the string

    27   @{verbatim "\"x + y\""} and identifier @{verbatim z} are legal term

    28   specifications within a theory, while @{verbatim "x + y"} without

    29   quotes is not.

    31   Printed theory documents usually omit quotes to gain readability

    32   (this is a matter of {\LaTeX} macro setup, say via @{verbatim

    33   "\\isabellestyle"}, see also \cite{isabelle-sys}).  Experienced

    34   users of Isabelle/Isar may easily reconstruct the lost technical

    35   information, while mere readers need not care about quotes at all.

    37   \medskip Isabelle/Isar input may contain any number of input

    38   termination characters ``@{verbatim ";"}'' (semicolon) to separate

    39   commands explicitly.  This is particularly useful in interactive

    40   shell sessions to make clear where the current command is intended

    41   to end.  Otherwise, the interpreter loop will continue to issue a

    42   secondary prompt ``@{verbatim "#"}'' until an end-of-command is

    43   clearly recognized from the input syntax, e.g.\ encounter of the

    44   next command keyword.

    46   More advanced interfaces such as Proof~General \cite{proofgeneral}

    47   do not require explicit semicolons, the amount of input text is

    48   determined automatically by inspecting the present content of the

    49   Emacs text buffer.  In the printed presentation of Isabelle/Isar

    50   documents semicolons are omitted altogether for readability.

    52   \begin{warn}

    53     Proof~General requires certain syntax classification tables in

    54     order to achieve properly synchronized interaction with the

    55     Isabelle/Isar process.  These tables need to be consistent with

    56     the Isabelle version and particular logic image to be used in a

    57     running session (common object-logics may well change the outer

    58     syntax).  The standard setup should work correctly with any of the

    59     ``official'' logic images derived from Isabelle/HOL (including

    60     HOLCF etc.).  Users of alternative logics may need to tell

    61     Proof~General explicitly, e.g.\ by giving an option @{verbatim "-k ZF"}

    62     (in conjunction with @{verbatim "-l ZF"}, to specify the default

    63     logic image).  Note that option @{verbatim "-L"} does both

    64     of this at the same time.

    65   \end{warn}

    66 *}

    69 section {* Lexical matters \label{sec:lex-syntax} *}

    71 text {*

    72   The Isabelle/Isar outer syntax provides token classes as presented

    73   below; most of these coincide with the inner lexical syntax as

    74   presented in \cite{isabelle-ref}.

    76   \begin{matharray}{rcl}

    77     @{syntax_def ident} & = & letter\,quasiletter^* \\

    78     @{syntax_def longident} & = & ident (\verb,.,ident)^+ \\

    79     @{syntax_def symident} & = & sym^+ ~|~ \verb,\,\verb,<,ident\verb,>, \\

    80     @{syntax_def nat} & = & digit^+ \\

    81     @{syntax_def var} & = & ident ~|~ \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\

    82     @{syntax_def typefree} & = & \verb,',ident \\

    83     @{syntax_def typevar} & = & typefree ~|~ \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\

    84     @{syntax_def string} & = & \verb,", ~\dots~ \verb,", \\

    85     @{syntax_def altstring} & = & \backquote ~\dots~ \backquote \\

    86     @{syntax_def verbatim} & = & \verb,{*, ~\dots~ \verb,*,\verb,}, \\[1ex]

    88     letter & = & latin ~|~ \verb,\,\verb,<,latin\verb,>, ~|~ \verb,\,\verb,<,latin\,latin\verb,>, ~|~ greek ~|~ \\

    89            &   & \verb,\<^isub>, ~|~ \verb,\<^isup>, \\

    90     quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\

    91     latin & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\

    92     digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\

    93     sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~

    94      \verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \\

    95     & & \verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \texttt{\at} ~|~

    96     \verb,^, ~|~ \verb,_, ~|~ \verb,|, ~|~ \verb,~, \\

    97     greek & = & \verb,\<alpha>, ~|~ \verb,\<beta>, ~|~ \verb,\<gamma>, ~|~ \verb,\<delta>, ~| \\

    98           &   & \verb,\<epsilon>, ~|~ \verb,\<zeta>, ~|~ \verb,\<eta>, ~|~ \verb,\<theta>, ~| \\

    99           &   & \verb,\<iota>, ~|~ \verb,\<kappa>, ~|~ \verb,\<mu>, ~|~ \verb,\<nu>, ~| \\

   100           &   & \verb,\<xi>, ~|~ \verb,\<pi>, ~|~ \verb,\<rho>, ~|~ \verb,\<sigma>, ~|~ \verb,\<tau>, ~| \\

   101           &   & \verb,\<upsilon>, ~|~ \verb,\<phi>, ~|~ \verb,\<chi>, ~|~ \verb,\<psi>, ~| \\

   102           &   & \verb,\<omega>, ~|~ \verb,\<Gamma>, ~|~ \verb,\<Delta>, ~|~ \verb,\<Theta>, ~| \\

   103           &   & \verb,\<Lambda>, ~|~ \verb,\<Xi>, ~|~ \verb,\<Pi>, ~|~ \verb,\<Sigma>, ~| \\

   104           &   & \verb,\<Upsilon>, ~|~ \verb,\<Phi>, ~|~ \verb,\<Psi>, ~|~ \verb,\<Omega>, \\

   105   \end{matharray}

   107   The syntax of @{syntax string} admits any characters, including

   108   newlines; ``@{verbatim "\""}'' (double-quote) and ``@{verbatim

   109   "\\"}'' (backslash) need to be escaped by a backslash; arbitrary

   110   character codes may be specified as ``@{verbatim "\\"}@{text ddd}'',

   111   with three decimal digits.  Alternative strings according to

   112   @{syntax altstring} are analogous, using single back-quotes instead.

   113   The body of @{syntax verbatim} may consist of any text not

   114   containing ``@{verbatim "*"}@{verbatim "}"}''; this allows

   115   convenient inclusion of quotes without further escapes.  The greek

   116   letters do \emph{not} include @{verbatim "\<lambda>"}, which is already used

   117   differently in the meta-logic.

   119   Common mathematical symbols such as @{text \<forall>} are represented in

   120   Isabelle as @{verbatim \<forall>}.  There are infinitely many Isabelle

   121   symbols like this, although proper presentation is left to front-end

   122   tools such as {\LaTeX} or Proof~General with the X-Symbol package.

   123   A list of standard Isabelle symbols that work well with these tools

   124   is given in \cite[appendix~A]{isabelle-sys}.

   126   Source comments take the form @{verbatim "(*"}~@{text

   127   "\<dots>"}~@{verbatim "*)"} and may be nested, although user-interface

   128   tools might prevent this.  Note that this form indicates source

   129   comments only, which are stripped after lexical analysis of the

   130   input.  The Isar document syntax also provides formal comments that

   131   are considered as part of the text (see \secref{sec:comments}).

   132 *}

   135 section {* Common syntax entities *}

   137 text {*

   138   We now introduce several basic syntactic entities, such as names,

   139   terms, and theorem specifications, which are factored out of the

   140   actual Isar language elements to be described later.

   141 *}

   144 subsection {* Names *}

   146 text {*

   147   Entity \railqtok{name} usually refers to any name of types,

   148   constants, theorems etc.\ that are to be \emph{declared} or

   149   \emph{defined} (so qualified identifiers are excluded here).  Quoted

   150   strings provide an escape for non-identifier names or those ruled

   151   out by outer syntax keywords (e.g.\ quoted @{verbatim "\"let\""}).

   152   Already existing objects are usually referenced by

   153   \railqtok{nameref}.

   155   \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}

   156   \indexoutertoken{int}

   157   \begin{rail}

   158     name: ident | symident | string | nat

   159     ;

   160     parname: '(' name ')'

   161     ;

   162     nameref: name | longident

   163     ;

   164     int: nat | '-' nat

   165     ;

   166   \end{rail}

   167 *}

   170 subsection {* Comments \label{sec:comments} *}

   172 text {*

   173   Large chunks of plain \railqtok{text} are usually given

   174   \railtok{verbatim}, i.e.\ enclosed in @{verbatim "{"}@{verbatim

   175   "*"}~@{text "\<dots>"}~@{verbatim "*"}@{verbatim "}"}.  For convenience,

   176   any of the smaller text units conforming to \railqtok{nameref} are

   177   admitted as well.  A marginal \railnonterm{comment} is of the form

   178   @{verbatim "--"} \railqtok{text}.  Any number of these may occur

   179   within Isabelle/Isar commands.

   181   \indexoutertoken{text}\indexouternonterm{comment}

   182   \begin{rail}

   183     text: verbatim | nameref

   184     ;

   185     comment: '--' text

   186     ;

   187   \end{rail}

   188 *}

   191 subsection {* Type classes, sorts and arities *}

   193 text {*

   194   Classes are specified by plain names.  Sorts have a very simple

   195   inner syntax, which is either a single class name @{text c} or a

   196   list @{text "{c\<^sub>1, \<dots>, c\<^sub>n}"} referring to the

   197   intersection of these classes.  The syntax of type arities is given

   198   directly at the outer level.

   200   \indexouternonterm{sort}\indexouternonterm{arity}

   201   \indexouternonterm{classdecl}

   202   \begin{rail}

   203     classdecl: name (('<' | subseteq) (nameref + ','))?

   204     ;

   205     sort: nameref

   206     ;

   207     arity: ('(' (sort + ',') ')')? sort

   208     ;

   209   \end{rail}

   210 *}

   213 subsection {* Types and terms \label{sec:types-terms} *}

   215 text {*

   216   The actual inner Isabelle syntax, that of types and terms of the

   217   logic, is far too sophisticated in order to be modelled explicitly

   218   at the outer theory level.  Basically, any such entity has to be

   219   quoted to turn it into a single token (the parsing and type-checking

   220   is performed internally later).  For convenience, a slightly more

   221   liberal convention is adopted: quotes may be omitted for any type or

   222   term that is already atomic at the outer level.  For example, one

   223   may just write @{verbatim x} instead of quoted @{verbatim "\"x\""}.

   224   Note that symbolic identifiers (e.g.\ @{verbatim "++"} or @{text

   225   "\<forall>"} are available as well, provided these have not been superseded

   226   by commands or other keywords already (such as @{verbatim "="} or

   227   @{verbatim "+"}).

   229   \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}

   230   \begin{rail}

   231     type: nameref | typefree | typevar

   232     ;

   233     term: nameref | var

   234     ;

   235     prop: term

   236     ;

   237   \end{rail}

   239   Positional instantiations are indicated by giving a sequence of

   240   terms, or the placeholder ``@{text _}'' (underscore), which means to

   241   skip a position.

   243   \indexoutertoken{inst}\indexoutertoken{insts}

   244   \begin{rail}

   245     inst: underscore | term

   246     ;

   247     insts: (inst *)

   248     ;

   249   \end{rail}

   251   Type declarations and definitions usually refer to

   252   \railnonterm{typespec} on the left-hand side.  This models basic

   253   type constructor application at the outer syntax level.  Note that

   254   only plain postfix notation is available here, but no infixes.

   256   \indexouternonterm{typespec}

   257   \begin{rail}

   258     typespec: (() | typefree | '(' ( typefree + ',' ) ')') name

   259     ;

   260   \end{rail}

   261 *}

   264 subsection {* Mixfix annotations *}

   266 text {*

   267   Mixfix annotations specify concrete \emph{inner} syntax of Isabelle

   268   types and terms.  Some commands such as @{command "types"} (see

   269   \secref{sec:types-pure}) admit infixes only, while @{command

   270   "consts"} (see \secref{sec:consts}) and @{command "syntax"} (see

   271   \secref{sec:syn-trans}) support the full range of general mixfixes

   272   and binders.

   274   \indexouternonterm{infix}\indexouternonterm{mixfix}\indexouternonterm{structmixfix}

   275   \begin{rail}

   276     infix: '(' ('infix' | 'infixl' | 'infixr') string? nat ')'

   277     ;

   278     mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'

   279     ;

   280     structmixfix: mixfix | '(' 'structure' ')'

   281     ;

   283     prios: '[' (nat + ',') ']'

   284     ;

   285   \end{rail}

   287   Here the \railtok{string} specifications refer to the actual mixfix

   288   template (see also \cite{isabelle-ref}), which may include literal

   289   text, spacing, blocks, and arguments (denoted by ``@{text _}''); the

   290   special symbol ``@{verbatim "\<index>"}'' (printed as ``@{text "\<index>"}'')

   291   represents an index argument that specifies an implicit structure

   292   reference (see also \secref{sec:locale}).  Infix and binder

   293   declarations provide common abbreviations for particular mixfix

   294   declarations.  So in practice, mixfix templates mostly degenerate to

   295   literal text for concrete syntax, such as ``@{verbatim "++"}'' for

   296   an infix symbol, or ``@{verbatim "++"}@{text "\<index>"}'' for an infix of

   297   an implicit structure.

   298 *}

   301 subsection {* Proof methods \label{sec:syn-meth} *}

   303 text {*

   304   Proof methods are either basic ones, or expressions composed of

   305   methods via ``@{verbatim ","}'' (sequential composition),

   306   ``@{verbatim "|"}'' (alternative choices), ``@{verbatim "?"}''

   307   (try), ``@{verbatim "+"}'' (repeat at least once), ``@{verbatim

   308   "["}@{text n}@{verbatim "]"}'' (restriction to first @{text n}

   309   sub-goals, with default @{text "n = 1"}).  In practice, proof

   310   methods are usually just a comma separated list of

   311   \railqtok{nameref}~\railnonterm{args} specifications.  Note that

   312   parentheses may be dropped for single method specifications (with no

   313   arguments).

   315   \indexouternonterm{method}

   316   \begin{rail}

   317     method: (nameref | '(' methods ')') (() | '?' | '+' | '[' nat? ']')

   318     ;

   319     methods: (nameref args | method) + (',' | '|')

   320     ;

   321   \end{rail}

   323   Proper Isar proof methods do \emph{not} admit arbitrary goal

   324   addressing, but refer either to the first sub-goal or all sub-goals

   325   uniformly.  The goal restriction operator ``@{text "[n]"}''

   326   evaluates a method expression within a sandbox consisting of the

   327   first @{text n} sub-goals (which need to exist).  For example, the

   328   method ``@{text "simp_all[3]"}'' simplifies the first three

   329   sub-goals, while ``@{text "(rule foo, simp_all)[]"}'' simplifies all

   330   new goals that emerge from applying rule @{text "foo"} to the

   331   originally first one.

   333   Improper methods, notably tactic emulations, offer a separate

   334   low-level goal addressing scheme as explicit argument to the

   335   individual tactic being involved.  Here ``@{text "[!]"}'' refers to

   336   all goals, and ``@{text "[n-]"}'' to all goals starting from @{text

   337   "n"}.

   339   \indexouternonterm{goalspec}

   340   \begin{rail}

   341     goalspec: '[' (nat '-' nat | nat '-' | nat | '!' ) ']'

   342     ;

   343   \end{rail}

   344 *}

   347 subsection {* Attributes and theorems \label{sec:syn-att} *}

   349 text {*

   350   Attributes (and proof methods, see \secref{sec:syn-meth}) have their

   351   own ``semi-inner'' syntax, in the sense that input conforming to

   352   \railnonterm{args} below is parsed by the attribute a second time.

   353   The attribute argument specifications may be any sequence of atomic

   354   entities (identifiers, strings etc.), or properly bracketed argument

   355   lists.  Below \railqtok{atom} refers to any atomic entity, including

   356   any \railtok{keyword} conforming to \railtok{symident}.

   358   \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}

   359   \begin{rail}

   360     atom: nameref | typefree | typevar | var | nat | keyword

   361     ;

   362     arg: atom | '(' args ')' | '[' args ']'

   363     ;

   364     args: arg *

   365     ;

   366     attributes: '[' (nameref args * ',') ']'

   367     ;

   368   \end{rail}

   370   Theorem specifications come in several flavors:

   371   \railnonterm{axmdecl} and \railnonterm{thmdecl} usually refer to

   372   axioms, assumptions or results of goal statements, while

   373   \railnonterm{thmdef} collects lists of existing theorems.  Existing

   374   theorems are given by \railnonterm{thmref} and

   375   \railnonterm{thmrefs}, the former requires an actual singleton

   376   result.

   378   There are three forms of theorem references:

   379   \begin{enumerate}

   381   \item named facts @{text "a"},

   383   \item selections from named facts @{text "a(i)"} or @{text "a(j - k)"},

   385   \item literal fact propositions using @{syntax_ref altstring} syntax

   386   @{verbatim "`"}@{text "\<phi>"}@{verbatim "`"} (see also method

   387   @{method_ref fact} in \secref{sec:pure-meth-att}).

   389   \end{enumerate}

   391   Any kind of theorem specification may include lists of attributes

   392   both on the left and right hand sides; attributes are applied to any

   393   immediately preceding fact.  If names are omitted, the theorems are

   394   not stored within the theorem database of the theory or proof

   395   context, but any given attributes are applied nonetheless.

   397   An extra pair of brackets around attributes (like ``@{text

   398   "[[simproc a]]"}'') abbreviates a theorem reference involving an

   399   internal dummy fact, which will be ignored later on.  So only the

   400   effect of the attribute on the background context will persist.

   401   This form of in-place declarations is particularly useful with

   402   commands like @{command "declare"} and @{command "using"}.

   404   \indexouternonterm{axmdecl}\indexouternonterm{thmdecl}

   405   \indexouternonterm{thmdef}\indexouternonterm{thmref}

   406   \indexouternonterm{thmrefs}\indexouternonterm{selection}

   407   \begin{rail}

   408     axmdecl: name attributes? ':'

   409     ;

   410     thmdecl: thmbind ':'

   411     ;

   412     thmdef: thmbind '='

   413     ;

   414     thmref: (nameref selection? | altstring) attributes? | '[' attributes ']'

   415     ;

   416     thmrefs: thmref +

   417     ;

   419     thmbind: name attributes | name | attributes

   420     ;

   421     selection: '(' ((nat | nat '-' nat?) + ',') ')'

   422     ;

   423   \end{rail}

   424 *}

   427 subsection {* Term patterns and declarations \label{sec:term-decls} *}

   429 text {*

   430   Wherever explicit propositions (or term fragments) occur in a proof

   431   text, casual binding of schematic term variables may be given

   432   specified via patterns of the form ``@{text "(\<IS> p\<^sub>1 \<dots>

   433   p\<^sub>n)"}''.  This works both for \railqtok{term} and \railqtok{prop}.

   435   \indexouternonterm{termpat}\indexouternonterm{proppat}

   436   \begin{rail}

   437     termpat: '(' ('is' term +) ')'

   438     ;

   439     proppat: '(' ('is' prop +) ')'

   440     ;

   441   \end{rail}

   443   \medskip Declarations of local variables @{text "x :: \<tau>"} and

   444   logical propositions @{text "a : \<phi>"} represent different views on

   445   the same principle of introducing a local scope.  In practice, one

   446   may usually omit the typing of \railnonterm{vars} (due to

   447   type-inference), and the naming of propositions (due to implicit

   448   references of current facts).  In any case, Isar proof elements

   449   usually admit to introduce multiple such items simultaneously.

   451   \indexouternonterm{vars}\indexouternonterm{props}

   452   \begin{rail}

   453     vars: (name+) ('::' type)?

   454     ;

   455     props: thmdecl? (prop proppat? +)

   456     ;

   457   \end{rail}

   459   The treatment of multiple declarations corresponds to the

   460   complementary focus of \railnonterm{vars} versus

   461   \railnonterm{props}.  In ``@{text "x\<^sub>1 \<dots> x\<^sub>n :: \<tau>"}''

   462   the typing refers to all variables, while in @{text "a: \<phi>\<^sub>1 \<dots>

   463   \<phi>\<^sub>n"} the naming refers to all propositions collectively.

   464   Isar language elements that refer to \railnonterm{vars} or

   465   \railnonterm{props} typically admit separate typings or namings via

   466   another level of iteration, with explicit @{keyword_ref "and"}

   467   separators; e.g.\ see @{command "fix"} and @{command "assume"} in

   468   \secref{sec:proof-context}.

   469 *}

   471 end

author	wenzelm
	Mon, 02 Jun 2008 22:50:23 +0200
changeset 27040	3d3e6e07b931
parent 27037	33d95687514e
child 27050	cd8d99b9ef09
permissions	-rw-r--r--