doc-src/IsarRef/Thy/Outer_Syntax.thy
author Walther Neuper <neuper@ist.tugraz.at>
Thu, 12 Aug 2010 15:03:34 +0200
branchisac-from-Isabelle2009-2
changeset 37913 20e3616b2d9c
parent 35841 94f901e4969a
child 40536 47f572aff50a
permissions -rw-r--r--
prepare reactivation of isac-update-Isa09-2
     1 theory Outer_Syntax
     2 imports Main
     3 begin
     4 
     5 chapter {* Outer syntax *}
     6 
     7 text {*
     8   The rather generic framework of Isabelle/Isar syntax emerges from
     9   three main syntactic categories: \emph{commands} of the top-level
    10   Isar engine (covering theory and proof elements), \emph{methods} for
    11   general goal refinements (analogous to traditional ``tactics''), and
    12   \emph{attributes} for operations on facts (within a certain
    13   context).  Subsequently we give a reference of basic syntactic
    14   entities underlying Isabelle/Isar syntax in a bottom-up manner.
    15   Concrete theory and proof language elements will be introduced later
    16   on.
    17 
    18   \medskip In order to get started with writing well-formed
    19   Isabelle/Isar documents, the most important aspect to be noted is
    20   the difference of \emph{inner} versus \emph{outer} syntax.  Inner
    21   syntax is that of Isabelle types and terms of the logic, while outer
    22   syntax is that of Isabelle/Isar theory sources (specifications and
    23   proofs).  As a general rule, inner syntax entities may occur only as
    24   \emph{atomic entities} within outer syntax.  For example, the string
    25   @{verbatim "\"x + y\""} and identifier @{verbatim z} are legal term
    26   specifications within a theory, while @{verbatim "x + y"} without
    27   quotes is not.
    28 
    29   Printed theory documents usually omit quotes to gain readability
    30   (this is a matter of {\LaTeX} macro setup, say via @{verbatim
    31   "\\isabellestyle"}, see also \cite{isabelle-sys}).  Experienced
    32   users of Isabelle/Isar may easily reconstruct the lost technical
    33   information, while mere readers need not care about quotes at all.
    34 
    35   \medskip Isabelle/Isar input may contain any number of input
    36   termination characters ``@{verbatim ";"}'' (semicolon) to separate
    37   commands explicitly.  This is particularly useful in interactive
    38   shell sessions to make clear where the current command is intended
    39   to end.  Otherwise, the interpreter loop will continue to issue a
    40   secondary prompt ``@{verbatim "#"}'' until an end-of-command is
    41   clearly recognized from the input syntax, e.g.\ encounter of the
    42   next command keyword.
    43 
    44   More advanced interfaces such as Proof~General \cite{proofgeneral}
    45   do not require explicit semicolons, the amount of input text is
    46   determined automatically by inspecting the present content of the
    47   Emacs text buffer.  In the printed presentation of Isabelle/Isar
    48   documents semicolons are omitted altogether for readability.
    49 
    50   \begin{warn}
    51     Proof~General requires certain syntax classification tables in
    52     order to achieve properly synchronized interaction with the
    53     Isabelle/Isar process.  These tables need to be consistent with
    54     the Isabelle version and particular logic image to be used in a
    55     running session (common object-logics may well change the outer
    56     syntax).  The standard setup should work correctly with any of the
    57     ``official'' logic images derived from Isabelle/HOL (including
    58     HOLCF etc.).  Users of alternative logics may need to tell
    59     Proof~General explicitly, e.g.\ by giving an option @{verbatim "-k ZF"}
    60     (in conjunction with @{verbatim "-l ZF"}, to specify the default
    61     logic image).  Note that option @{verbatim "-L"} does both
    62     of this at the same time.
    63   \end{warn}
    64 *}
    65 
    66 
    67 section {* Lexical matters \label{sec:outer-lex} *}
    68 
    69 text {* The outer lexical syntax consists of three main categories of
    70   syntax tokens:
    71 
    72   \begin{enumerate}
    73 
    74   \item \emph{major keywords} --- the command names that are available
    75   in the present logic session;
    76 
    77   \item \emph{minor keywords} --- additional literal tokens required
    78   by the syntax of commands;
    79 
    80   \item \emph{named tokens} --- various categories of identifiers etc.
    81 
    82   \end{enumerate}
    83 
    84   Major keywords and minor keywords are guaranteed to be disjoint.
    85   This helps user-interfaces to determine the overall structure of a
    86   theory text, without knowing the full details of command syntax.
    87   Internally, there is some additional information about the kind of
    88   major keywords, which approximates the command type (theory command,
    89   proof command etc.).
    90 
    91   Keywords override named tokens.  For example, the presence of a
    92   command called @{verbatim term} inhibits the identifier @{verbatim
    93   term}, but the string @{verbatim "\"term\""} can be used instead.
    94   By convention, the outer syntax always allows quoted strings in
    95   addition to identifiers, wherever a named entity is expected.
    96 
    97   When tokenizing a given input sequence, the lexer repeatedly takes
    98   the longest prefix of the input that forms a valid token.  Spaces,
    99   tabs, newlines and formfeeds between tokens serve as explicit
   100   separators.
   101 
   102   \medskip The categories for named tokens are defined once and for
   103   all as follows.
   104 
   105   \begin{center}
   106   \begin{supertabular}{rcl}
   107     @{syntax_def ident} & = & @{text "letter quasiletter\<^sup>*"} \\
   108     @{syntax_def longident} & = & @{text "ident("}@{verbatim "."}@{text "ident)\<^sup>+"} \\
   109     @{syntax_def symident} & = & @{text "sym\<^sup>+  |  "}@{verbatim "\\"}@{verbatim "<"}@{text ident}@{verbatim ">"} \\
   110     @{syntax_def nat} & = & @{text "digit\<^sup>+"} \\
   111     @{syntax_def var} & = & @{verbatim "?"}@{text "ident  |  "}@{verbatim "?"}@{text ident}@{verbatim "."}@{text nat} \\
   112     @{syntax_def typefree} & = & @{verbatim "'"}@{text ident} \\
   113     @{syntax_def typevar} & = & @{verbatim "?"}@{text "typefree  |  "}@{verbatim "?"}@{text typefree}@{verbatim "."}@{text nat} \\
   114     @{syntax_def string} & = & @{verbatim "\""} @{text "\<dots>"} @{verbatim "\""} \\
   115     @{syntax_def altstring} & = & @{verbatim "`"} @{text "\<dots>"} @{verbatim "`"} \\
   116     @{syntax_def verbatim} & = & @{verbatim "{*"} @{text "\<dots>"} @{verbatim "*"}@{verbatim "}"} \\[1ex]
   117 
   118     @{text letter} & = & @{text "latin  |  "}@{verbatim "\\"}@{verbatim "<"}@{text latin}@{verbatim ">"}@{text "  |  "}@{verbatim "\\"}@{verbatim "<"}@{text "latin latin"}@{verbatim ">"}@{text "  |  greek  |"} \\
   119           &   & @{verbatim "\<^isub>"}@{text "  |  "}@{verbatim "\<^isup>"} \\
   120     @{text quasiletter} & = & @{text "letter  |  digit  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "'"} \\
   121     @{text latin} & = & @{verbatim a}@{text "  | \<dots> |  "}@{verbatim z}@{text "  |  "}@{verbatim A}@{text "  |  \<dots> |  "}@{verbatim Z} \\
   122     @{text digit} & = & @{verbatim "0"}@{text "  |  \<dots> |  "}@{verbatim "9"} \\
   123     @{text sym} & = & @{verbatim "!"}@{text "  |  "}@{verbatim "#"}@{text "  |  "}@{verbatim "$"}@{text "  |  "}@{verbatim "%"}@{text "  |  "}@{verbatim "&"}@{text "  |  "}@{verbatim "*"}@{text "  |  "}@{verbatim "+"}@{text "  |  "}@{verbatim "-"}@{text "  |  "}@{verbatim "/"}@{text "  |"} \\
   124     & & @{verbatim "<"}@{text "  |  "}@{verbatim "="}@{text "  |  "}@{verbatim ">"}@{text "  |  "}@{verbatim "?"}@{text "  |  "}@{verbatim "@"}@{text "  |  "}@{verbatim "^"}@{text "  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "|"}@{text "  |  "}@{verbatim "~"} \\
   125     @{text greek} & = & @{verbatim "\<alpha>"}@{text "  |  "}@{verbatim "\<beta>"}@{text "  |  "}@{verbatim "\<gamma>"}@{text "  |  "}@{verbatim "\<delta>"}@{text "  |"} \\
   126           &   & @{verbatim "\<epsilon>"}@{text "  |  "}@{verbatim "\<zeta>"}@{text "  |  "}@{verbatim "\<eta>"}@{text "  |  "}@{verbatim "\<theta>"}@{text "  |"} \\
   127           &   & @{verbatim "\<iota>"}@{text "  |  "}@{verbatim "\<kappa>"}@{text "  |  "}@{verbatim "\<mu>"}@{text "  |  "}@{verbatim "\<nu>"}@{text "  |"} \\
   128           &   & @{verbatim "\<xi>"}@{text "  |  "}@{verbatim "\<pi>"}@{text "  |  "}@{verbatim "\<rho>"}@{text "  |  "}@{verbatim "\<sigma>"}@{text "  |  "}@{verbatim "\<tau>"}@{text "  |"} \\
   129           &   & @{verbatim "\<upsilon>"}@{text "  |  "}@{verbatim "\<phi>"}@{text "  |  "}@{verbatim "\<chi>"}@{text "  |  "}@{verbatim "\<psi>"}@{text "  |"} \\
   130           &   & @{verbatim "\<omega>"}@{text "  |  "}@{verbatim "\<Gamma>"}@{text "  |  "}@{verbatim "\<Delta>"}@{text "  |  "}@{verbatim "\<Theta>"}@{text "  |"} \\
   131           &   & @{verbatim "\<Lambda>"}@{text "  |  "}@{verbatim "\<Xi>"}@{text "  |  "}@{verbatim "\<Pi>"}@{text "  |  "}@{verbatim "\<Sigma>"}@{text "  |"} \\
   132           &   & @{verbatim "\<Upsilon>"}@{text "  |  "}@{verbatim "\<Phi>"}@{text "  |  "}@{verbatim "\<Psi>"}@{text "  |  "}@{verbatim "\<Omega>"} \\
   133   \end{supertabular}
   134   \end{center}
   135 
   136   A @{syntax_ref var} or @{syntax_ref typevar} describes an unknown,
   137   which is internally a pair of base name and index (ML type @{ML_type
   138   indexname}).  These components are either separated by a dot as in
   139   @{text "?x.1"} or @{text "?x7.3"} or run together as in @{text
   140   "?x1"}.  The latter form is possible if the base name does not end
   141   with digits.  If the index is 0, it may be dropped altogether:
   142   @{text "?x"} and @{text "?x0"} and @{text "?x.0"} all refer to the
   143   same unknown, with basename @{text "x"} and index 0.
   144 
   145   The syntax of @{syntax_ref string} admits any characters, including
   146   newlines; ``@{verbatim "\""}'' (double-quote) and ``@{verbatim
   147   "\\"}'' (backslash) need to be escaped by a backslash; arbitrary
   148   character codes may be specified as ``@{verbatim "\\"}@{text ddd}'',
   149   with three decimal digits.  Alternative strings according to
   150   @{syntax_ref altstring} are analogous, using single back-quotes
   151   instead.
   152 
   153   The body of @{syntax_ref verbatim} may consist of any text not
   154   containing ``@{verbatim "*"}@{verbatim "}"}''; this allows
   155   convenient inclusion of quotes without further escapes.  There is no
   156   way to escape ``@{verbatim "*"}@{verbatim "}"}''.  If the quoted
   157   text is {\LaTeX} source, one may usually add some blank or comment
   158   to avoid the critical character sequence.
   159 
   160   Source comments take the form @{verbatim "(*"}~@{text
   161   "\<dots>"}~@{verbatim "*)"} and may be nested, although the user-interface
   162   might prevent this.  Note that this form indicates source comments
   163   only, which are stripped after lexical analysis of the input.  The
   164   Isar syntax also provides proper \emph{document comments} that are
   165   considered as part of the text (see \secref{sec:comments}).
   166 
   167   Common mathematical symbols such as @{text \<forall>} are represented in
   168   Isabelle as @{verbatim \<forall>}.  There are infinitely many Isabelle
   169   symbols like this, although proper presentation is left to front-end
   170   tools such as {\LaTeX} or Proof~General with the X-Symbol package.
   171   A list of predefined Isabelle symbols that work well with these
   172   tools is given in \appref{app:symbols}.  Note that @{verbatim "\<lambda>"}
   173   does not belong to the @{text letter} category, since it is already
   174   used differently in the Pure term language.
   175 *}
   176 
   177 
   178 section {* Common syntax entities *}
   179 
   180 text {*
   181   We now introduce several basic syntactic entities, such as names,
   182   terms, and theorem specifications, which are factored out of the
   183   actual Isar language elements to be described later.
   184 *}
   185 
   186 
   187 subsection {* Names *}
   188 
   189 text {*
   190   Entity \railqtok{name} usually refers to any name of types,
   191   constants, theorems etc.\ that are to be \emph{declared} or
   192   \emph{defined} (so qualified identifiers are excluded here).  Quoted
   193   strings provide an escape for non-identifier names or those ruled
   194   out by outer syntax keywords (e.g.\ quoted @{verbatim "\"let\""}).
   195   Already existing objects are usually referenced by
   196   \railqtok{nameref}.
   197 
   198   \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
   199   \indexoutertoken{int}
   200   \begin{rail}
   201     name: ident | symident | string | nat
   202     ;
   203     parname: '(' name ')'
   204     ;
   205     nameref: name | longident
   206     ;
   207     int: nat | '-' nat
   208     ;
   209   \end{rail}
   210 *}
   211 
   212 
   213 subsection {* Comments \label{sec:comments} *}
   214 
   215 text {*
   216   Large chunks of plain \railqtok{text} are usually given
   217   \railtok{verbatim}, i.e.\ enclosed in @{verbatim "{"}@{verbatim
   218   "*"}~@{text "\<dots>"}~@{verbatim "*"}@{verbatim "}"}.  For convenience,
   219   any of the smaller text units conforming to \railqtok{nameref} are
   220   admitted as well.  A marginal \railnonterm{comment} is of the form
   221   @{verbatim "--"} \railqtok{text}.  Any number of these may occur
   222   within Isabelle/Isar commands.
   223 
   224   \indexoutertoken{text}\indexouternonterm{comment}
   225   \begin{rail}
   226     text: verbatim | nameref
   227     ;
   228     comment: '--' text
   229     ;
   230   \end{rail}
   231 *}
   232 
   233 
   234 subsection {* Type classes, sorts and arities *}
   235 
   236 text {*
   237   Classes are specified by plain names.  Sorts have a very simple
   238   inner syntax, which is either a single class name @{text c} or a
   239   list @{text "{c\<^sub>1, \<dots>, c\<^sub>n}"} referring to the
   240   intersection of these classes.  The syntax of type arities is given
   241   directly at the outer level.
   242 
   243   \indexouternonterm{sort}\indexouternonterm{arity}
   244   \indexouternonterm{classdecl}
   245   \begin{rail}
   246     classdecl: name (('<' | subseteq) (nameref + ','))?
   247     ;
   248     sort: nameref
   249     ;
   250     arity: ('(' (sort + ',') ')')? sort
   251     ;
   252   \end{rail}
   253 *}
   254 
   255 
   256 subsection {* Types and terms \label{sec:types-terms} *}
   257 
   258 text {*
   259   The actual inner Isabelle syntax, that of types and terms of the
   260   logic, is far too sophisticated in order to be modelled explicitly
   261   at the outer theory level.  Basically, any such entity has to be
   262   quoted to turn it into a single token (the parsing and type-checking
   263   is performed internally later).  For convenience, a slightly more
   264   liberal convention is adopted: quotes may be omitted for any type or
   265   term that is already atomic at the outer level.  For example, one
   266   may just write @{verbatim x} instead of quoted @{verbatim "\"x\""}.
   267   Note that symbolic identifiers (e.g.\ @{verbatim "++"} or @{text
   268   "\<forall>"} are available as well, provided these have not been superseded
   269   by commands or other keywords already (such as @{verbatim "="} or
   270   @{verbatim "+"}).
   271 
   272   \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
   273   \begin{rail}
   274     type: nameref | typefree | typevar
   275     ;
   276     term: nameref | var
   277     ;
   278     prop: term
   279     ;
   280   \end{rail}
   281 
   282   Positional instantiations are indicated by giving a sequence of
   283   terms, or the placeholder ``@{text _}'' (underscore), which means to
   284   skip a position.
   285 
   286   \indexoutertoken{inst}\indexoutertoken{insts}
   287   \begin{rail}
   288     inst: underscore | term
   289     ;
   290     insts: (inst *)
   291     ;
   292   \end{rail}
   293 
   294   Type declarations and definitions usually refer to
   295   \railnonterm{typespec} on the left-hand side.  This models basic
   296   type constructor application at the outer syntax level.  Note that
   297   only plain postfix notation is available here, but no infixes.
   298 
   299   \indexouternonterm{typespec}
   300   \indexouternonterm{typespecsorts}
   301   \begin{rail}
   302     typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
   303     ;
   304 
   305     typespecsorts: (() | (typefree ('::' sort)?) | '(' ( (typefree ('::' sort)?) + ',' ) ')') name
   306     ;
   307   \end{rail}
   308 *}
   309 
   310 
   311 subsection {* Term patterns and declarations \label{sec:term-decls} *}
   312 
   313 text {*
   314   Wherever explicit propositions (or term fragments) occur in a proof
   315   text, casual binding of schematic term variables may be given
   316   specified via patterns of the form ``@{text "(\<IS> p\<^sub>1 \<dots>
   317   p\<^sub>n)"}''.  This works both for \railqtok{term} and \railqtok{prop}.
   318 
   319   \indexouternonterm{termpat}\indexouternonterm{proppat}
   320   \begin{rail}
   321     termpat: '(' ('is' term +) ')'
   322     ;
   323     proppat: '(' ('is' prop +) ')'
   324     ;
   325   \end{rail}
   326 
   327   \medskip Declarations of local variables @{text "x :: \<tau>"} and
   328   logical propositions @{text "a : \<phi>"} represent different views on
   329   the same principle of introducing a local scope.  In practice, one
   330   may usually omit the typing of \railnonterm{vars} (due to
   331   type-inference), and the naming of propositions (due to implicit
   332   references of current facts).  In any case, Isar proof elements
   333   usually admit to introduce multiple such items simultaneously.
   334 
   335   \indexouternonterm{vars}\indexouternonterm{props}
   336   \begin{rail}
   337     vars: (name+) ('::' type)?
   338     ;
   339     props: thmdecl? (prop proppat? +)
   340     ;
   341   \end{rail}
   342 
   343   The treatment of multiple declarations corresponds to the
   344   complementary focus of \railnonterm{vars} versus
   345   \railnonterm{props}.  In ``@{text "x\<^sub>1 \<dots> x\<^sub>n :: \<tau>"}''
   346   the typing refers to all variables, while in @{text "a: \<phi>\<^sub>1 \<dots>
   347   \<phi>\<^sub>n"} the naming refers to all propositions collectively.
   348   Isar language elements that refer to \railnonterm{vars} or
   349   \railnonterm{props} typically admit separate typings or namings via
   350   another level of iteration, with explicit @{keyword_ref "and"}
   351   separators; e.g.\ see @{command "fix"} and @{command "assume"} in
   352   \secref{sec:proof-context}.
   353 *}
   354 
   355 
   356 subsection {* Attributes and theorems \label{sec:syn-att} *}
   357 
   358 text {* Attributes have their own ``semi-inner'' syntax, in the sense
   359   that input conforming to \railnonterm{args} below is parsed by the
   360   attribute a second time.  The attribute argument specifications may
   361   be any sequence of atomic entities (identifiers, strings etc.), or
   362   properly bracketed argument lists.  Below \railqtok{atom} refers to
   363   any atomic entity, including any \railtok{keyword} conforming to
   364   \railtok{symident}.
   365 
   366   \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
   367   \begin{rail}
   368     atom: nameref | typefree | typevar | var | nat | keyword
   369     ;
   370     arg: atom | '(' args ')' | '[' args ']'
   371     ;
   372     args: arg *
   373     ;
   374     attributes: '[' (nameref args * ',') ']'
   375     ;
   376   \end{rail}
   377 
   378   Theorem specifications come in several flavors:
   379   \railnonterm{axmdecl} and \railnonterm{thmdecl} usually refer to
   380   axioms, assumptions or results of goal statements, while
   381   \railnonterm{thmdef} collects lists of existing theorems.  Existing
   382   theorems are given by \railnonterm{thmref} and
   383   \railnonterm{thmrefs}, the former requires an actual singleton
   384   result.
   385 
   386   There are three forms of theorem references:
   387   \begin{enumerate}
   388   
   389   \item named facts @{text "a"},
   390 
   391   \item selections from named facts @{text "a(i)"} or @{text "a(j - k)"},
   392 
   393   \item literal fact propositions using @{syntax_ref altstring} syntax
   394   @{verbatim "`"}@{text "\<phi>"}@{verbatim "`"} (see also method
   395   @{method_ref fact}).
   396 
   397   \end{enumerate}
   398 
   399   Any kind of theorem specification may include lists of attributes
   400   both on the left and right hand sides; attributes are applied to any
   401   immediately preceding fact.  If names are omitted, the theorems are
   402   not stored within the theorem database of the theory or proof
   403   context, but any given attributes are applied nonetheless.
   404 
   405   An extra pair of brackets around attributes (like ``@{text
   406   "[[simproc a]]"}'') abbreviates a theorem reference involving an
   407   internal dummy fact, which will be ignored later on.  So only the
   408   effect of the attribute on the background context will persist.
   409   This form of in-place declarations is particularly useful with
   410   commands like @{command "declare"} and @{command "using"}.
   411 
   412   \indexouternonterm{axmdecl}\indexouternonterm{thmdecl}
   413   \indexouternonterm{thmdef}\indexouternonterm{thmref}
   414   \indexouternonterm{thmrefs}\indexouternonterm{selection}
   415   \begin{rail}
   416     axmdecl: name attributes? ':'
   417     ;
   418     thmdecl: thmbind ':'
   419     ;
   420     thmdef: thmbind '='
   421     ;
   422     thmref: (nameref selection? | altstring) attributes? | '[' attributes ']'
   423     ;
   424     thmrefs: thmref +
   425     ;
   426 
   427     thmbind: name attributes | name | attributes
   428     ;
   429     selection: '(' ((nat | nat '-' nat?) + ',') ')'
   430     ;
   431   \end{rail}
   432 *}
   433 
   434 end