wneuper/isa: doc-src/IsarRef/Thy/Outer

     1 theory Outer_Syntax

     2 imports Main

     3 begin

     5 chapter {* Outer syntax *}

     7 text {*

     8   The rather generic framework of Isabelle/Isar syntax emerges from

     9   three main syntactic categories: \emph{commands} of the top-level

    10   Isar engine (covering theory and proof elements), \emph{methods} for

    11   general goal refinements (analogous to traditional ``tactics''), and

    12   \emph{attributes} for operations on facts (within a certain

    13   context).  Subsequently we give a reference of basic syntactic

    14   entities underlying Isabelle/Isar syntax in a bottom-up manner.

    15   Concrete theory and proof language elements will be introduced later

    16   on.

    18   \medskip In order to get started with writing well-formed

    19   Isabelle/Isar documents, the most important aspect to be noted is

    20   the difference of \emph{inner} versus \emph{outer} syntax.  Inner

    21   syntax is that of Isabelle types and terms of the logic, while outer

    22   syntax is that of Isabelle/Isar theory sources (specifications and

    23   proofs).  As a general rule, inner syntax entities may occur only as

    24   \emph{atomic entities} within outer syntax.  For example, the string

    25   @{verbatim "\"x + y\""} and identifier @{verbatim z} are legal term

    26   specifications within a theory, while @{verbatim "x + y"} without

    27   quotes is not.

    29   Printed theory documents usually omit quotes to gain readability

    30   (this is a matter of {\LaTeX} macro setup, say via @{verbatim

    31   "\\isabellestyle"}, see also \cite{isabelle-sys}).  Experienced

    32   users of Isabelle/Isar may easily reconstruct the lost technical

    33   information, while mere readers need not care about quotes at all.

    35   \medskip Isabelle/Isar input may contain any number of input

    36   termination characters ``@{verbatim ";"}'' (semicolon) to separate

    37   commands explicitly.  This is particularly useful in interactive

    38   shell sessions to make clear where the current command is intended

    39   to end.  Otherwise, the interpreter loop will continue to issue a

    40   secondary prompt ``@{verbatim "#"}'' until an end-of-command is

    41   clearly recognized from the input syntax, e.g.\ encounter of the

    42   next command keyword.

    44   More advanced interfaces such as Proof~General \cite{proofgeneral}

    45   do not require explicit semicolons, the amount of input text is

    46   determined automatically by inspecting the present content of the

    47   Emacs text buffer.  In the printed presentation of Isabelle/Isar

    48   documents semicolons are omitted altogether for readability.

    50   \begin{warn}

    51     Proof~General requires certain syntax classification tables in

    52     order to achieve properly synchronized interaction with the

    53     Isabelle/Isar process.  These tables need to be consistent with

    54     the Isabelle version and particular logic image to be used in a

    55     running session (common object-logics may well change the outer

    56     syntax).  The standard setup should work correctly with any of the

    57     ``official'' logic images derived from Isabelle/HOL (including

    58     HOLCF etc.).  Users of alternative logics may need to tell

    59     Proof~General explicitly, e.g.\ by giving an option @{verbatim "-k ZF"}

    60     (in conjunction with @{verbatim "-l ZF"}, to specify the default

    61     logic image).  Note that option @{verbatim "-L"} does both

    62     of this at the same time.

    63   \end{warn}

    64 *}

    67 section {* Lexical matters \label{sec:outer-lex} *}

    69 text {* The outer lexical syntax consists of three main categories of

    70   syntax tokens:

    72   \begin{enumerate}

    74   \item \emph{major keywords} --- the command names that are available

    75   in the present logic session;

    77   \item \emph{minor keywords} --- additional literal tokens required

    78   by the syntax of commands;

    80   \item \emph{named tokens} --- various categories of identifiers etc.

    82   \end{enumerate}

    84   Major keywords and minor keywords are guaranteed to be disjoint.

    85   This helps user-interfaces to determine the overall structure of a

    86   theory text, without knowing the full details of command syntax.

    87   Internally, there is some additional information about the kind of

    88   major keywords, which approximates the command type (theory command,

    89   proof command etc.).

    91   Keywords override named tokens.  For example, the presence of a

    92   command called @{verbatim term} inhibits the identifier @{verbatim

    93   term}, but the string @{verbatim "\"term\""} can be used instead.

    94   By convention, the outer syntax always allows quoted strings in

    95   addition to identifiers, wherever a named entity is expected.

    97   When tokenizing a given input sequence, the lexer repeatedly takes

    98   the longest prefix of the input that forms a valid token.  Spaces,

    99   tabs, newlines and formfeeds between tokens serve as explicit

   100   separators.

   102   \medskip The categories for named tokens are defined once and for

   103   all as follows.

   105   \begin{center}

   106   \begin{supertabular}{rcl}

   107     @{syntax_def ident} & = & @{text "letter quasiletter\<^sup>*"} \\

   108     @{syntax_def longident} & = & @{text "ident("}@{verbatim "."}@{text "ident)\<^sup>+"} \\

   109     @{syntax_def symident} & = & @{text "sym\<^sup>+  |  "}@{verbatim "\\"}@{verbatim "<"}@{text ident}@{verbatim ">"} \\

   110     @{syntax_def nat} & = & @{text "digit\<^sup>+"} \\

   111     @{syntax_def var} & = & @{verbatim "?"}@{text "ident  |  "}@{verbatim "?"}@{text ident}@{verbatim "."}@{text nat} \\

   112     @{syntax_def typefree} & = & @{verbatim "'"}@{text ident} \\

   113     @{syntax_def typevar} & = & @{verbatim "?"}@{text "typefree  |  "}@{verbatim "?"}@{text typefree}@{verbatim "."}@{text nat} \\

   114     @{syntax_def string} & = & @{verbatim "\""} @{text "\<dots>"} @{verbatim "\""} \\

   115     @{syntax_def altstring} & = & @{verbatim "`"} @{text "\<dots>"} @{verbatim "`"} \\

   116     @{syntax_def verbatim} & = & @{verbatim "{*"} @{text "\<dots>"} @{verbatim "*"}@{verbatim "}"} \\[1ex]

   118     @{text letter} & = & @{text "latin  |  "}@{verbatim "\\"}@{verbatim "<"}@{text latin}@{verbatim ">"}@{text "  |  "}@{verbatim "\\"}@{verbatim "<"}@{text "latin latin"}@{verbatim ">"}@{text "  |  greek  |"} \\

   119           &   & @{verbatim "\<^isub>"}@{text "  |  "}@{verbatim "\<^isup>"} \\

   120     @{text quasiletter} & = & @{text "letter  |  digit  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "'"} \\

   121     @{text latin} & = & @{verbatim a}@{text "  | \<dots> |  "}@{verbatim z}@{text "  |  "}@{verbatim A}@{text "  |  \<dots> |  "}@{verbatim Z} \\

   122     @{text digit} & = & @{verbatim "0"}@{text "  |  \<dots> |  "}@{verbatim "9"} \\

   123     @{text sym} & = & @{verbatim "!"}@{text "  |  "}@{verbatim "#"}@{text "  |  "}@{verbatim "$"}@{text "  |  "}@{verbatim "%"}@{text "  |  "}@{verbatim "&"}@{text "  |  "}@{verbatim "*"}@{text "  |  "}@{verbatim "+"}@{text "  |  "}@{verbatim "-"}@{text "  |  "}@{verbatim "/"}@{text "  |"} \\

   124     & & @{verbatim "<"}@{text "  |  "}@{verbatim "="}@{text "  |  "}@{verbatim ">"}@{text "  |  "}@{verbatim "?"}@{text "  |  "}@{verbatim "@"}@{text "  |  "}@{verbatim "^"}@{text "  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "|"}@{text "  |  "}@{verbatim "~"} \\

   125     @{text greek} & = & @{verbatim "\<alpha>"}@{text "  |  "}@{verbatim "\<beta>"}@{text "  |  "}@{verbatim "\<gamma>"}@{text "  |  "}@{verbatim "\<delta>"}@{text "  |"} \\

   126           &   & @{verbatim "\<epsilon>"}@{text "  |  "}@{verbatim "\<zeta>"}@{text "  |  "}@{verbatim "\<eta>"}@{text "  |  "}@{verbatim "\<theta>"}@{text "  |"} \\

   127           &   & @{verbatim "\<iota>"}@{text "  |  "}@{verbatim "\<kappa>"}@{text "  |  "}@{verbatim "\<mu>"}@{text "  |  "}@{verbatim "\<nu>"}@{text "  |"} \\

   128           &   & @{verbatim "\<xi>"}@{text "  |  "}@{verbatim "\<pi>"}@{text "  |  "}@{verbatim "\<rho>"}@{text "  |  "}@{verbatim "\<sigma>"}@{text "  |  "}@{verbatim "\<tau>"}@{text "  |"} \\

   129           &   & @{verbatim "\<upsilon>"}@{text "  |  "}@{verbatim "\<phi>"}@{text "  |  "}@{verbatim "\<chi>"}@{text "  |  "}@{verbatim "\<psi>"}@{text "  |"} \\

   130           &   & @{verbatim "\<omega>"}@{text "  |  "}@{verbatim "\<Gamma>"}@{text "  |  "}@{verbatim "\<Delta>"}@{text "  |  "}@{verbatim "\<Theta>"}@{text "  |"} \\

   131           &   & @{verbatim "\<Lambda>"}@{text "  |  "}@{verbatim "\<Xi>"}@{text "  |  "}@{verbatim "\<Pi>"}@{text "  |  "}@{verbatim "\<Sigma>"}@{text "  |"} \\

   132           &   & @{verbatim "\<Upsilon>"}@{text "  |  "}@{verbatim "\<Phi>"}@{text "  |  "}@{verbatim "\<Psi>"}@{text "  |  "}@{verbatim "\<Omega>"} \\

   133   \end{supertabular}

   134   \end{center}

   136   A @{syntax_ref var} or @{syntax_ref typevar} describes an unknown,

   137   which is internally a pair of base name and index (ML type @{ML_type

   138   indexname}).  These components are either separated by a dot as in

   139   @{text "?x.1"} or @{text "?x7.3"} or run together as in @{text

   140   "?x1"}.  The latter form is possible if the base name does not end

   141   with digits.  If the index is 0, it may be dropped altogether:

   142   @{text "?x"} and @{text "?x0"} and @{text "?x.0"} all refer to the

   143   same unknown, with basename @{text "x"} and index 0.

   145   The syntax of @{syntax_ref string} admits any characters, including

   146   newlines; ``@{verbatim "\""}'' (double-quote) and ``@{verbatim

   147   "\\"}'' (backslash) need to be escaped by a backslash; arbitrary

   148   character codes may be specified as ``@{verbatim "\\"}@{text ddd}'',

   149   with three decimal digits.  Alternative strings according to

   150   @{syntax_ref altstring} are analogous, using single back-quotes

   151   instead.

   153   The body of @{syntax_ref verbatim} may consist of any text not

   154   containing ``@{verbatim "*"}@{verbatim "}"}''; this allows

   155   convenient inclusion of quotes without further escapes.  There is no

   156   way to escape ``@{verbatim "*"}@{verbatim "}"}''.  If the quoted

   157   text is {\LaTeX} source, one may usually add some blank or comment

   158   to avoid the critical character sequence.

   160   Source comments take the form @{verbatim "(*"}~@{text

   161   "\<dots>"}~@{verbatim "*)"} and may be nested, although the user-interface

   162   might prevent this.  Note that this form indicates source comments

   163   only, which are stripped after lexical analysis of the input.  The

   164   Isar syntax also provides proper \emph{document comments} that are

   165   considered as part of the text (see \secref{sec:comments}).

   167   Common mathematical symbols such as @{text \<forall>} are represented in

   168   Isabelle as @{verbatim \<forall>}.  There are infinitely many Isabelle

   169   symbols like this, although proper presentation is left to front-end

   170   tools such as {\LaTeX} or Proof~General with the X-Symbol package.

   171   A list of predefined Isabelle symbols that work well with these

   172   tools is given in \appref{app:symbols}.  Note that @{verbatim "\<lambda>"}

   173   does not belong to the @{text letter} category, since it is already

   174   used differently in the Pure term language.

   175 *}

   178 section {* Common syntax entities *}

   180 text {*

   181   We now introduce several basic syntactic entities, such as names,

   182   terms, and theorem specifications, which are factored out of the

   183   actual Isar language elements to be described later.

   184 *}

   187 subsection {* Names *}

   189 text {*

   190   Entity \railqtok{name} usually refers to any name of types,

   191   constants, theorems etc.\ that are to be \emph{declared} or

   192   \emph{defined} (so qualified identifiers are excluded here).  Quoted

   193   strings provide an escape for non-identifier names or those ruled

   194   out by outer syntax keywords (e.g.\ quoted @{verbatim "\"let\""}).

   195   Already existing objects are usually referenced by

   196   \railqtok{nameref}.

   198   \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}

   199   \indexoutertoken{int}

   200   \begin{rail}

   201     name: ident | symident | string | nat

   202     ;

   203     parname: '(' name ')'

   204     ;

   205     nameref: name | longident

   206     ;

   207     int: nat | '-' nat

   208     ;

   209   \end{rail}

   210 *}

   213 subsection {* Comments \label{sec:comments} *}

   215 text {*

   216   Large chunks of plain \railqtok{text} are usually given

   217   \railtok{verbatim}, i.e.\ enclosed in @{verbatim "{"}@{verbatim

   218   "*"}~@{text "\<dots>"}~@{verbatim "*"}@{verbatim "}"}.  For convenience,

   219   any of the smaller text units conforming to \railqtok{nameref} are

   220   admitted as well.  A marginal \railnonterm{comment} is of the form

   221   @{verbatim "--"} \railqtok{text}.  Any number of these may occur

   222   within Isabelle/Isar commands.

   224   \indexoutertoken{text}\indexouternonterm{comment}

   225   \begin{rail}

   226     text: verbatim | nameref

   227     ;

   228     comment: '--' text

   229     ;

   230   \end{rail}

   231 *}

   234 subsection {* Type classes, sorts and arities *}

   236 text {*

   237   Classes are specified by plain names.  Sorts have a very simple

   238   inner syntax, which is either a single class name @{text c} or a

   239   list @{text "{c\<^sub>1, \<dots>, c\<^sub>n}"} referring to the

   240   intersection of these classes.  The syntax of type arities is given

   241   directly at the outer level.

   243   \indexouternonterm{sort}\indexouternonterm{arity}

   244   \indexouternonterm{classdecl}

   245   \begin{rail}

   246     classdecl: name (('<' | subseteq) (nameref + ','))?

   247     ;

   248     sort: nameref

   249     ;

   250     arity: ('(' (sort + ',') ')')? sort

   251     ;

   252   \end{rail}

   253 *}

   256 subsection {* Types and terms \label{sec:types-terms} *}

   258 text {*

   259   The actual inner Isabelle syntax, that of types and terms of the

   260   logic, is far too sophisticated in order to be modelled explicitly

   261   at the outer theory level.  Basically, any such entity has to be

   262   quoted to turn it into a single token (the parsing and type-checking

   263   is performed internally later).  For convenience, a slightly more

   264   liberal convention is adopted: quotes may be omitted for any type or

   265   term that is already atomic at the outer level.  For example, one

   266   may just write @{verbatim x} instead of quoted @{verbatim "\"x\""}.

   267   Note that symbolic identifiers (e.g.\ @{verbatim "++"} or @{text

   268   "\<forall>"} are available as well, provided these have not been superseded

   269   by commands or other keywords already (such as @{verbatim "="} or

   270   @{verbatim "+"}).

   272   \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}

   273   \begin{rail}

   274     type: nameref | typefree | typevar

   275     ;

   276     term: nameref | var

   277     ;

   278     prop: term

   279     ;

   280   \end{rail}

   282   Positional instantiations are indicated by giving a sequence of

   283   terms, or the placeholder ``@{text _}'' (underscore), which means to

   284   skip a position.

   286   \indexoutertoken{inst}\indexoutertoken{insts}

   287   \begin{rail}

   288     inst: underscore | term

   289     ;

   290     insts: (inst *)

   291     ;

   292   \end{rail}

   294   Type declarations and definitions usually refer to

   295   \railnonterm{typespec} on the left-hand side.  This models basic

   296   type constructor application at the outer syntax level.  Note that

   297   only plain postfix notation is available here, but no infixes.

   299   \indexouternonterm{typespec}

   300   \indexouternonterm{typespecsorts}

   301   \begin{rail}

   302     typespec: (() | typefree | '(' ( typefree + ',' ) ')') name

   303     ;

   305     typespecsorts: (() | (typefree ('::' sort)?) | '(' ( (typefree ('::' sort)?) + ',' ) ')') name

   306     ;

   307   \end{rail}

   308 *}

   311 subsection {* Term patterns and declarations \label{sec:term-decls} *}

   313 text {*

   314   Wherever explicit propositions (or term fragments) occur in a proof

   315   text, casual binding of schematic term variables may be given

   316   specified via patterns of the form ``@{text "(\<IS> p\<^sub>1 \<dots>

   317   p\<^sub>n)"}''.  This works both for \railqtok{term} and \railqtok{prop}.

   319   \indexouternonterm{termpat}\indexouternonterm{proppat}

   320   \begin{rail}

   321     termpat: '(' ('is' term +) ')'

   322     ;

   323     proppat: '(' ('is' prop +) ')'

   324     ;

   325   \end{rail}

   327   \medskip Declarations of local variables @{text "x :: \<tau>"} and

   328   logical propositions @{text "a : \<phi>"} represent different views on

   329   the same principle of introducing a local scope.  In practice, one

   330   may usually omit the typing of \railnonterm{vars} (due to

   331   type-inference), and the naming of propositions (due to implicit

   332   references of current facts).  In any case, Isar proof elements

   333   usually admit to introduce multiple such items simultaneously.

   335   \indexouternonterm{vars}\indexouternonterm{props}

   336   \begin{rail}

   337     vars: (name+) ('::' type)?

   338     ;

   339     props: thmdecl? (prop proppat? +)

   340     ;

   341   \end{rail}

   343   The treatment of multiple declarations corresponds to the

   344   complementary focus of \railnonterm{vars} versus

   345   \railnonterm{props}.  In ``@{text "x\<^sub>1 \<dots> x\<^sub>n :: \<tau>"}''

   346   the typing refers to all variables, while in @{text "a: \<phi>\<^sub>1 \<dots>

   347   \<phi>\<^sub>n"} the naming refers to all propositions collectively.

   348   Isar language elements that refer to \railnonterm{vars} or

   349   \railnonterm{props} typically admit separate typings or namings via

   350   another level of iteration, with explicit @{keyword_ref "and"}

   351   separators; e.g.\ see @{command "fix"} and @{command "assume"} in

   352   \secref{sec:proof-context}.

   353 *}

   356 subsection {* Attributes and theorems \label{sec:syn-att} *}

   358 text {* Attributes have their own ``semi-inner'' syntax, in the sense

   359   that input conforming to \railnonterm{args} below is parsed by the

   360   attribute a second time.  The attribute argument specifications may

   361   be any sequence of atomic entities (identifiers, strings etc.), or

   362   properly bracketed argument lists.  Below \railqtok{atom} refers to

   363   any atomic entity, including any \railtok{keyword} conforming to

   364   \railtok{symident}.

   366   \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}

   367   \begin{rail}

   368     atom: nameref | typefree | typevar | var | nat | keyword

   369     ;

   370     arg: atom | '(' args ')' | '[' args ']'

   371     ;

   372     args: arg *

   373     ;

   374     attributes: '[' (nameref args * ',') ']'

   375     ;

   376   \end{rail}

   378   Theorem specifications come in several flavors:

   379   \railnonterm{axmdecl} and \railnonterm{thmdecl} usually refer to

   380   axioms, assumptions or results of goal statements, while

   381   \railnonterm{thmdef} collects lists of existing theorems.  Existing

   382   theorems are given by \railnonterm{thmref} and

   383   \railnonterm{thmrefs}, the former requires an actual singleton

   384   result.

   386   There are three forms of theorem references:

   387   \begin{enumerate}

   389   \item named facts @{text "a"},

   391   \item selections from named facts @{text "a(i)"} or @{text "a(j - k)"},

   393   \item literal fact propositions using @{syntax_ref altstring} syntax

   394   @{verbatim "`"}@{text "\<phi>"}@{verbatim "`"} (see also method

   395   @{method_ref fact}).

   397   \end{enumerate}

   399   Any kind of theorem specification may include lists of attributes

   400   both on the left and right hand sides; attributes are applied to any

   401   immediately preceding fact.  If names are omitted, the theorems are

   402   not stored within the theorem database of the theory or proof

   403   context, but any given attributes are applied nonetheless.

   405   An extra pair of brackets around attributes (like ``@{text

   406   "[[simproc a]]"}'') abbreviates a theorem reference involving an

   407   internal dummy fact, which will be ignored later on.  So only the

   408   effect of the attribute on the background context will persist.

   409   This form of in-place declarations is particularly useful with

   410   commands like @{command "declare"} and @{command "using"}.

   412   \indexouternonterm{axmdecl}\indexouternonterm{thmdecl}

   413   \indexouternonterm{thmdef}\indexouternonterm{thmref}

   414   \indexouternonterm{thmrefs}\indexouternonterm{selection}

   415   \begin{rail}

   416     axmdecl: name attributes? ':'

   417     ;

   418     thmdecl: thmbind ':'

   419     ;

   420     thmdef: thmbind '='

   421     ;

   422     thmref: (nameref selection? | altstring) attributes? | '[' attributes ']'

   423     ;

   424     thmrefs: thmref +

   425     ;

   427     thmbind: name attributes | name | attributes

   428     ;

   429     selection: '(' ((nat | nat '-' nat?) + ',') ')'

   430     ;

   431   \end{rail}

   432 *}

   434 end

author	wenzelm
	Fri, 19 Mar 2010 00:46:08 +0100
changeset 35841	94f901e4969a
parent 30242	aea5d7fa7ef5
child 40536	47f572aff50a
permissions	-rw-r--r--