wneuper/isa: doc-src/IsarRef/Thy/Outer

     1 (* $Id$ *)

     3 theory Outer_Syntax

     4 imports Main

     5 begin

     7 chapter {* Outer syntax *}

     9 text {*

    10   The rather generic framework of Isabelle/Isar syntax emerges from

    11   three main syntactic categories: \emph{commands} of the top-level

    12   Isar engine (covering theory and proof elements), \emph{methods} for

    13   general goal refinements (analogous to traditional ``tactics''), and

    14   \emph{attributes} for operations on facts (within a certain

    15   context).  Subsequently we give a reference of basic syntactic

    16   entities underlying Isabelle/Isar syntax in a bottom-up manner.

    17   Concrete theory and proof language elements will be introduced later

    18   on.

    20   \medskip In order to get started with writing well-formed

    21   Isabelle/Isar documents, the most important aspect to be noted is

    22   the difference of \emph{inner} versus \emph{outer} syntax.  Inner

    23   syntax is that of Isabelle types and terms of the logic, while outer

    24   syntax is that of Isabelle/Isar theory sources (specifications and

    25   proofs).  As a general rule, inner syntax entities may occur only as

    26   \emph{atomic entities} within outer syntax.  For example, the string

    27   @{verbatim "\"x + y\""} and identifier @{verbatim z} are legal term

    28   specifications within a theory, while @{verbatim "x + y"} without

    29   quotes is not.

    31   Printed theory documents usually omit quotes to gain readability

    32   (this is a matter of {\LaTeX} macro setup, say via @{verbatim

    33   "\\isabellestyle"}, see also \cite{isabelle-sys}).  Experienced

    34   users of Isabelle/Isar may easily reconstruct the lost technical

    35   information, while mere readers need not care about quotes at all.

    37   \medskip Isabelle/Isar input may contain any number of input

    38   termination characters ``@{verbatim ";"}'' (semicolon) to separate

    39   commands explicitly.  This is particularly useful in interactive

    40   shell sessions to make clear where the current command is intended

    41   to end.  Otherwise, the interpreter loop will continue to issue a

    42   secondary prompt ``@{verbatim "#"}'' until an end-of-command is

    43   clearly recognized from the input syntax, e.g.\ encounter of the

    44   next command keyword.

    46   More advanced interfaces such as Proof~General \cite{proofgeneral}

    47   do not require explicit semicolons, the amount of input text is

    48   determined automatically by inspecting the present content of the

    49   Emacs text buffer.  In the printed presentation of Isabelle/Isar

    50   documents semicolons are omitted altogether for readability.

    52   \begin{warn}

    53     Proof~General requires certain syntax classification tables in

    54     order to achieve properly synchronized interaction with the

    55     Isabelle/Isar process.  These tables need to be consistent with

    56     the Isabelle version and particular logic image to be used in a

    57     running session (common object-logics may well change the outer

    58     syntax).  The standard setup should work correctly with any of the

    59     ``official'' logic images derived from Isabelle/HOL (including

    60     HOLCF etc.).  Users of alternative logics may need to tell

    61     Proof~General explicitly, e.g.\ by giving an option @{verbatim "-k ZF"}

    62     (in conjunction with @{verbatim "-l ZF"}, to specify the default

    63     logic image).  Note that option @{verbatim "-L"} does both

    64     of this at the same time.

    65   \end{warn}

    66 *}

    69 section {* Lexical matters \label{sec:outer-lex} *}

    71 text {* The outer lexical syntax consists of three main categories of

    72   syntax tokens:

    74   \begin{enumerate}

    76   \item \emph{major keywords} --- the command names that are available

    77   in the present logic session;

    79   \item \emph{minor keywords} --- additional literal tokens required

    80   by the syntax of commands;

    82   \item \emph{named tokens} --- various categories of identifiers etc.

    84   \end{enumerate}

    86   Major keywords and minor keywords are guaranteed to be disjoint.

    87   This helps user-interfaces to determine the overall structure of a

    88   theory text, without knowing the full details of command syntax.

    89   Internally, there is some additional information about the kind of

    90   major keywords, which approximates the command type (theory command,

    91   proof command etc.).

    93   Keywords override named tokens.  For example, the presence of a

    94   command called @{verbatim term} inhibits the identifier @{verbatim

    95   term}, but the string @{verbatim "\"term\""} can be used instead.

    96   By convention, the outer syntax always allows quoted strings in

    97   addition to identifiers, wherever a named entity is expected.

    99   When tokenizing a given input sequence, the lexer repeatedly takes

   100   the longest prefix of the input that forms a valid token.  Spaces,

   101   tabs, newlines and formfeeds between tokens serve as explicit

   102   separators.

   104   \medskip The categories for named tokens are defined once and for

   105   all as follows.

   107   \begin{center}

   108   \begin{supertabular}{rcl}

   109     @{syntax_def ident} & = & @{text "letter quasiletter\<^sup>*"} \\

   110     @{syntax_def longident} & = & @{text "ident("}@{verbatim "."}@{text "ident)\<^sup>+"} \\

   111     @{syntax_def symident} & = & @{text "sym\<^sup>+  |  "}@{verbatim "\\"}@{verbatim "<"}@{text ident}@{verbatim ">"} \\

   112     @{syntax_def nat} & = & @{text "digit\<^sup>+"} \\

   113     @{syntax_def var} & = & @{verbatim "?"}@{text "ident  |  "}@{verbatim "?"}@{text ident}@{verbatim "."}@{text nat} \\

   114     @{syntax_def typefree} & = & @{verbatim "'"}@{text ident} \\

   115     @{syntax_def typevar} & = & @{verbatim "?"}@{text "typefree  |  "}@{verbatim "?"}@{text typefree}@{verbatim "."}@{text nat} \\

   116     @{syntax_def string} & = & @{verbatim "\""} @{text "\<dots>"} @{verbatim "\""} \\

   117     @{syntax_def altstring} & = & @{verbatim "`"} @{text "\<dots>"} @{verbatim "`"} \\

   118     @{syntax_def verbatim} & = & @{verbatim "{*"} @{text "\<dots>"} @{verbatim "*"}@{verbatim "}"} \\[1ex]

   120     @{text letter} & = & @{text "latin  |  "}@{verbatim "\\"}@{verbatim "<"}@{text latin}@{verbatim ">"}@{text "  |  "}@{verbatim "\\"}@{verbatim "<"}@{text "latin latin"}@{verbatim ">"}@{text "  |  greek  |"} \\

   121           &   & @{verbatim "\<^isub>"}@{text "  |  "}@{verbatim "\<^isup>"} \\

   122     @{text quasiletter} & = & @{text "letter  |  digit  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "'"} \\

   123     @{text latin} & = & @{verbatim a}@{text "  | \<dots> |  "}@{verbatim z}@{text "  |  "}@{verbatim A}@{text "  |  \<dots> |  "}@{verbatim Z} \\

   124     @{text digit} & = & @{verbatim "0"}@{text "  |  \<dots> |  "}@{verbatim "9"} \\

   125     @{text sym} & = & @{verbatim "!"}@{text "  |  "}@{verbatim "#"}@{text "  |  "}@{verbatim "$"}@{text "  |  "}@{verbatim "%"}@{text "  |  "}@{verbatim "&"}@{text "  |  "}@{verbatim "*"}@{text "  |  "}@{verbatim "+"}@{text "  |  "}@{verbatim "-"}@{text "  |  "}@{verbatim "/"}@{text "  |"} \\

   126     & & @{verbatim "<"}@{text "  |  "}@{verbatim "="}@{text "  |  "}@{verbatim ">"}@{text "  |  "}@{verbatim "?"}@{text "  |  "}@{verbatim "@"}@{text "  |  "}@{verbatim "^"}@{text "  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "|"}@{text "  |  "}@{verbatim "~"} \\

   127     @{text greek} & = & @{verbatim "\<alpha>"}@{text "  |  "}@{verbatim "\<beta>"}@{text "  |  "}@{verbatim "\<gamma>"}@{text "  |  "}@{verbatim "\<delta>"}@{text "  |"} \\

   128           &   & @{verbatim "\<epsilon>"}@{text "  |  "}@{verbatim "\<zeta>"}@{text "  |  "}@{verbatim "\<eta>"}@{text "  |  "}@{verbatim "\<theta>"}@{text "  |"} \\

   129           &   & @{verbatim "\<iota>"}@{text "  |  "}@{verbatim "\<kappa>"}@{text "  |  "}@{verbatim "\<mu>"}@{text "  |  "}@{verbatim "\<nu>"}@{text "  |"} \\

   130           &   & @{verbatim "\<xi>"}@{text "  |  "}@{verbatim "\<pi>"}@{text "  |  "}@{verbatim "\<rho>"}@{text "  |  "}@{verbatim "\<sigma>"}@{text "  |  "}@{verbatim "\<tau>"}@{text "  |"} \\

   131           &   & @{verbatim "\<upsilon>"}@{text "  |  "}@{verbatim "\<phi>"}@{text "  |  "}@{verbatim "\<chi>"}@{text "  |  "}@{verbatim "\<psi>"}@{text "  |"} \\

   132           &   & @{verbatim "\<omega>"}@{text "  |  "}@{verbatim "\<Gamma>"}@{text "  |  "}@{verbatim "\<Delta>"}@{text "  |  "}@{verbatim "\<Theta>"}@{text "  |"} \\

   133           &   & @{verbatim "\<Lambda>"}@{text "  |  "}@{verbatim "\<Xi>"}@{text "  |  "}@{verbatim "\<Pi>"}@{text "  |  "}@{verbatim "\<Sigma>"}@{text "  |"} \\

   134           &   & @{verbatim "\<Upsilon>"}@{text "  |  "}@{verbatim "\<Phi>"}@{text "  |  "}@{verbatim "\<Psi>"}@{text "  |  "}@{verbatim "\<Omega>"} \\

   135   \end{supertabular}

   136   \end{center}

   138   A @{syntax_ref var} or @{syntax_ref typevar} describes an unknown,

   139   which is internally a pair of base name and index (ML type @{ML_type

   140   indexname}).  These components are either separated by a dot as in

   141   @{text "?x.1"} or @{text "?x7.3"} or run together as in @{text

   142   "?x1"}.  The latter form is possible if the base name does not end

   143   with digits.  If the index is 0, it may be dropped altogether:

   144   @{text "?x"} and @{text "?x0"} and @{text "?x.0"} all refer to the

   145   same unknown, with basename @{text "x"} and index 0.

   147   The syntax of @{syntax_ref string} admits any characters, including

   148   newlines; ``@{verbatim "\""}'' (double-quote) and ``@{verbatim

   149   "\\"}'' (backslash) need to be escaped by a backslash; arbitrary

   150   character codes may be specified as ``@{verbatim "\\"}@{text ddd}'',

   151   with three decimal digits.  Alternative strings according to

   152   @{syntax_ref altstring} are analogous, using single back-quotes

   153   instead.

   155   The body of @{syntax_ref verbatim} may consist of any text not

   156   containing ``@{verbatim "*"}@{verbatim "}"}''; this allows

   157   convenient inclusion of quotes without further escapes.  There is no

   158   way to escape ``@{verbatim "*"}@{verbatim "}"}''.  If the quoted

   159   text is {\LaTeX} source, one may usually add some blank or comment

   160   to avoid the critical character sequence.

   162   Source comments take the form @{verbatim "(*"}~@{text

   163   "\<dots>"}~@{verbatim "*)"} and may be nested, although the user-interface

   164   might prevent this.  Note that this form indicates source comments

   165   only, which are stripped after lexical analysis of the input.  The

   166   Isar syntax also provides proper \emph{document comments} that are

   167   considered as part of the text (see \secref{sec:comments}).

   169   Common mathematical symbols such as @{text \<forall>} are represented in

   170   Isabelle as @{verbatim \<forall>}.  There are infinitely many Isabelle

   171   symbols like this, although proper presentation is left to front-end

   172   tools such as {\LaTeX} or Proof~General with the X-Symbol package.

   173   A list of standard Isabelle symbols that work well with these tools

   174   is given in \appref{app:symbols}.  Note that @{verbatim "\<lambda>"} does

   175   not belong to the @{text letter} category, since it is already used

   176   differently in the Pure term language.

   177 *}

   180 section {* Common syntax entities *}

   182 text {*

   183   We now introduce several basic syntactic entities, such as names,

   184   terms, and theorem specifications, which are factored out of the

   185   actual Isar language elements to be described later.

   186 *}

   189 subsection {* Names *}

   191 text {*

   192   Entity \railqtok{name} usually refers to any name of types,

   193   constants, theorems etc.\ that are to be \emph{declared} or

   194   \emph{defined} (so qualified identifiers are excluded here).  Quoted

   195   strings provide an escape for non-identifier names or those ruled

   196   out by outer syntax keywords (e.g.\ quoted @{verbatim "\"let\""}).

   197   Already existing objects are usually referenced by

   198   \railqtok{nameref}.

   200   \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}

   201   \indexoutertoken{int}

   202   \begin{rail}

   203     name: ident | symident | string | nat

   204     ;

   205     parname: '(' name ')'

   206     ;

   207     nameref: name | longident

   208     ;

   209     int: nat | '-' nat

   210     ;

   211   \end{rail}

   212 *}

   215 subsection {* Comments \label{sec:comments} *}

   217 text {*

   218   Large chunks of plain \railqtok{text} are usually given

   219   \railtok{verbatim}, i.e.\ enclosed in @{verbatim "{"}@{verbatim

   220   "*"}~@{text "\<dots>"}~@{verbatim "*"}@{verbatim "}"}.  For convenience,

   221   any of the smaller text units conforming to \railqtok{nameref} are

   222   admitted as well.  A marginal \railnonterm{comment} is of the form

   223   @{verbatim "--"} \railqtok{text}.  Any number of these may occur

   224   within Isabelle/Isar commands.

   226   \indexoutertoken{text}\indexouternonterm{comment}

   227   \begin{rail}

   228     text: verbatim | nameref

   229     ;

   230     comment: '--' text

   231     ;

   232   \end{rail}

   233 *}

   236 subsection {* Type classes, sorts and arities *}

   238 text {*

   239   Classes are specified by plain names.  Sorts have a very simple

   240   inner syntax, which is either a single class name @{text c} or a

   241   list @{text "{c\<^sub>1, \<dots>, c\<^sub>n}"} referring to the

   242   intersection of these classes.  The syntax of type arities is given

   243   directly at the outer level.

   245   \indexouternonterm{sort}\indexouternonterm{arity}

   246   \indexouternonterm{classdecl}

   247   \begin{rail}

   248     classdecl: name (('<' | subseteq) (nameref + ','))?

   249     ;

   250     sort: nameref

   251     ;

   252     arity: ('(' (sort + ',') ')')? sort

   253     ;

   254   \end{rail}

   255 *}

   258 subsection {* Types and terms \label{sec:types-terms} *}

   260 text {*

   261   The actual inner Isabelle syntax, that of types and terms of the

   262   logic, is far too sophisticated in order to be modelled explicitly

   263   at the outer theory level.  Basically, any such entity has to be

   264   quoted to turn it into a single token (the parsing and type-checking

   265   is performed internally later).  For convenience, a slightly more

   266   liberal convention is adopted: quotes may be omitted for any type or

   267   term that is already atomic at the outer level.  For example, one

   268   may just write @{verbatim x} instead of quoted @{verbatim "\"x\""}.

   269   Note that symbolic identifiers (e.g.\ @{verbatim "++"} or @{text

   270   "\<forall>"} are available as well, provided these have not been superseded

   271   by commands or other keywords already (such as @{verbatim "="} or

   272   @{verbatim "+"}).

   274   \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}

   275   \begin{rail}

   276     type: nameref | typefree | typevar

   277     ;

   278     term: nameref | var

   279     ;

   280     prop: term

   281     ;

   282   \end{rail}

   284   Positional instantiations are indicated by giving a sequence of

   285   terms, or the placeholder ``@{text _}'' (underscore), which means to

   286   skip a position.

   288   \indexoutertoken{inst}\indexoutertoken{insts}

   289   \begin{rail}

   290     inst: underscore | term

   291     ;

   292     insts: (inst *)

   293     ;

   294   \end{rail}

   296   Type declarations and definitions usually refer to

   297   \railnonterm{typespec} on the left-hand side.  This models basic

   298   type constructor application at the outer syntax level.  Note that

   299   only plain postfix notation is available here, but no infixes.

   301   \indexouternonterm{typespec}

   302   \begin{rail}

   303     typespec: (() | typefree | '(' ( typefree + ',' ) ')') name

   304     ;

   305   \end{rail}

   306 *}

   309 subsection {* Term patterns and declarations \label{sec:term-decls} *}

   311 text {*

   312   Wherever explicit propositions (or term fragments) occur in a proof

   313   text, casual binding of schematic term variables may be given

   314   specified via patterns of the form ``@{text "(\<IS> p\<^sub>1 \<dots>

   315   p\<^sub>n)"}''.  This works both for \railqtok{term} and \railqtok{prop}.

   317   \indexouternonterm{termpat}\indexouternonterm{proppat}

   318   \begin{rail}

   319     termpat: '(' ('is' term +) ')'

   320     ;

   321     proppat: '(' ('is' prop +) ')'

   322     ;

   323   \end{rail}

   325   \medskip Declarations of local variables @{text "x :: \<tau>"} and

   326   logical propositions @{text "a : \<phi>"} represent different views on

   327   the same principle of introducing a local scope.  In practice, one

   328   may usually omit the typing of \railnonterm{vars} (due to

   329   type-inference), and the naming of propositions (due to implicit

   330   references of current facts).  In any case, Isar proof elements

   331   usually admit to introduce multiple such items simultaneously.

   333   \indexouternonterm{vars}\indexouternonterm{props}

   334   \begin{rail}

   335     vars: (name+) ('::' type)?

   336     ;

   337     props: thmdecl? (prop proppat? +)

   338     ;

   339   \end{rail}

   341   The treatment of multiple declarations corresponds to the

   342   complementary focus of \railnonterm{vars} versus

   343   \railnonterm{props}.  In ``@{text "x\<^sub>1 \<dots> x\<^sub>n :: \<tau>"}''

   344   the typing refers to all variables, while in @{text "a: \<phi>\<^sub>1 \<dots>

   345   \<phi>\<^sub>n"} the naming refers to all propositions collectively.

   346   Isar language elements that refer to \railnonterm{vars} or

   347   \railnonterm{props} typically admit separate typings or namings via

   348   another level of iteration, with explicit @{keyword_ref "and"}

   349   separators; e.g.\ see @{command "fix"} and @{command "assume"} in

   350   \secref{sec:proof-context}.

   351 *}

   354 subsection {* Attributes and theorems \label{sec:syn-att} *}

   356 text {* Attributes have their own ``semi-inner'' syntax, in the sense

   357   that input conforming to \railnonterm{args} below is parsed by the

   358   attribute a second time.  The attribute argument specifications may

   359   be any sequence of atomic entities (identifiers, strings etc.), or

   360   properly bracketed argument lists.  Below \railqtok{atom} refers to

   361   any atomic entity, including any \railtok{keyword} conforming to

   362   \railtok{symident}.

   364   \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}

   365   \begin{rail}

   366     atom: nameref | typefree | typevar | var | nat | keyword

   367     ;

   368     arg: atom | '(' args ')' | '[' args ']'

   369     ;

   370     args: arg *

   371     ;

   372     attributes: '[' (nameref args * ',') ']'

   373     ;

   374   \end{rail}

   376   Theorem specifications come in several flavors:

   377   \railnonterm{axmdecl} and \railnonterm{thmdecl} usually refer to

   378   axioms, assumptions or results of goal statements, while

   379   \railnonterm{thmdef} collects lists of existing theorems.  Existing

   380   theorems are given by \railnonterm{thmref} and

   381   \railnonterm{thmrefs}, the former requires an actual singleton

   382   result.

   384   There are three forms of theorem references:

   385   \begin{enumerate}

   387   \item named facts @{text "a"},

   389   \item selections from named facts @{text "a(i)"} or @{text "a(j - k)"},

   391   \item literal fact propositions using @{syntax_ref altstring} syntax

   392   @{verbatim "`"}@{text "\<phi>"}@{verbatim "`"} (see also method

   393   @{method_ref fact}).

   395   \end{enumerate}

   397   Any kind of theorem specification may include lists of attributes

   398   both on the left and right hand sides; attributes are applied to any

   399   immediately preceding fact.  If names are omitted, the theorems are

   400   not stored within the theorem database of the theory or proof

   401   context, but any given attributes are applied nonetheless.

   403   An extra pair of brackets around attributes (like ``@{text

   404   "[[simproc a]]"}'') abbreviates a theorem reference involving an

   405   internal dummy fact, which will be ignored later on.  So only the

   406   effect of the attribute on the background context will persist.

   407   This form of in-place declarations is particularly useful with

   408   commands like @{command "declare"} and @{command "using"}.

   410   \indexouternonterm{axmdecl}\indexouternonterm{thmdecl}

   411   \indexouternonterm{thmdef}\indexouternonterm{thmref}

   412   \indexouternonterm{thmrefs}\indexouternonterm{selection}

   413   \begin{rail}

   414     axmdecl: name attributes? ':'

   415     ;

   416     thmdecl: thmbind ':'

   417     ;

   418     thmdef: thmbind '='

   419     ;

   420     thmref: (nameref selection? | altstring) attributes? | '[' attributes ']'

   421     ;

   422     thmrefs: thmref +

   423     ;

   425     thmbind: name attributes | name | attributes

   426     ;

   427     selection: '(' ((nat | nat '-' nat?) + ',') ')'

   428     ;

   429   \end{rail}

   430 *}

   432 end

author	wenzelm
	Tue, 18 Nov 2008 18:25:10 +0100
changeset 28838	d5db6dfcb34a
parent 28778	a25630deacaf
child 30045	d2597c4f7e5c
child 30240	5b25fee0362c
permissions	-rw-r--r--