doc-src/IsarRef/Thy/Outer_Syntax.thy
author wenzelm
Tue, 18 Nov 2008 18:25:10 +0100
changeset 28838 d5db6dfcb34a
parent 28778 a25630deacaf
child 30045 d2597c4f7e5c
child 30240 5b25fee0362c
permissions -rw-r--r--
moved table of standard Isabelle symbols to isar-ref manual;
     1 (* $Id$ *)
     2 
     3 theory Outer_Syntax
     4 imports Main
     5 begin
     6 
     7 chapter {* Outer syntax *}
     8 
     9 text {*
    10   The rather generic framework of Isabelle/Isar syntax emerges from
    11   three main syntactic categories: \emph{commands} of the top-level
    12   Isar engine (covering theory and proof elements), \emph{methods} for
    13   general goal refinements (analogous to traditional ``tactics''), and
    14   \emph{attributes} for operations on facts (within a certain
    15   context).  Subsequently we give a reference of basic syntactic
    16   entities underlying Isabelle/Isar syntax in a bottom-up manner.
    17   Concrete theory and proof language elements will be introduced later
    18   on.
    19 
    20   \medskip In order to get started with writing well-formed
    21   Isabelle/Isar documents, the most important aspect to be noted is
    22   the difference of \emph{inner} versus \emph{outer} syntax.  Inner
    23   syntax is that of Isabelle types and terms of the logic, while outer
    24   syntax is that of Isabelle/Isar theory sources (specifications and
    25   proofs).  As a general rule, inner syntax entities may occur only as
    26   \emph{atomic entities} within outer syntax.  For example, the string
    27   @{verbatim "\"x + y\""} and identifier @{verbatim z} are legal term
    28   specifications within a theory, while @{verbatim "x + y"} without
    29   quotes is not.
    30 
    31   Printed theory documents usually omit quotes to gain readability
    32   (this is a matter of {\LaTeX} macro setup, say via @{verbatim
    33   "\\isabellestyle"}, see also \cite{isabelle-sys}).  Experienced
    34   users of Isabelle/Isar may easily reconstruct the lost technical
    35   information, while mere readers need not care about quotes at all.
    36 
    37   \medskip Isabelle/Isar input may contain any number of input
    38   termination characters ``@{verbatim ";"}'' (semicolon) to separate
    39   commands explicitly.  This is particularly useful in interactive
    40   shell sessions to make clear where the current command is intended
    41   to end.  Otherwise, the interpreter loop will continue to issue a
    42   secondary prompt ``@{verbatim "#"}'' until an end-of-command is
    43   clearly recognized from the input syntax, e.g.\ encounter of the
    44   next command keyword.
    45 
    46   More advanced interfaces such as Proof~General \cite{proofgeneral}
    47   do not require explicit semicolons, the amount of input text is
    48   determined automatically by inspecting the present content of the
    49   Emacs text buffer.  In the printed presentation of Isabelle/Isar
    50   documents semicolons are omitted altogether for readability.
    51 
    52   \begin{warn}
    53     Proof~General requires certain syntax classification tables in
    54     order to achieve properly synchronized interaction with the
    55     Isabelle/Isar process.  These tables need to be consistent with
    56     the Isabelle version and particular logic image to be used in a
    57     running session (common object-logics may well change the outer
    58     syntax).  The standard setup should work correctly with any of the
    59     ``official'' logic images derived from Isabelle/HOL (including
    60     HOLCF etc.).  Users of alternative logics may need to tell
    61     Proof~General explicitly, e.g.\ by giving an option @{verbatim "-k ZF"}
    62     (in conjunction with @{verbatim "-l ZF"}, to specify the default
    63     logic image).  Note that option @{verbatim "-L"} does both
    64     of this at the same time.
    65   \end{warn}
    66 *}
    67 
    68 
    69 section {* Lexical matters \label{sec:outer-lex} *}
    70 
    71 text {* The outer lexical syntax consists of three main categories of
    72   syntax tokens:
    73 
    74   \begin{enumerate}
    75 
    76   \item \emph{major keywords} --- the command names that are available
    77   in the present logic session;
    78 
    79   \item \emph{minor keywords} --- additional literal tokens required
    80   by the syntax of commands;
    81 
    82   \item \emph{named tokens} --- various categories of identifiers etc.
    83 
    84   \end{enumerate}
    85 
    86   Major keywords and minor keywords are guaranteed to be disjoint.
    87   This helps user-interfaces to determine the overall structure of a
    88   theory text, without knowing the full details of command syntax.
    89   Internally, there is some additional information about the kind of
    90   major keywords, which approximates the command type (theory command,
    91   proof command etc.).
    92 
    93   Keywords override named tokens.  For example, the presence of a
    94   command called @{verbatim term} inhibits the identifier @{verbatim
    95   term}, but the string @{verbatim "\"term\""} can be used instead.
    96   By convention, the outer syntax always allows quoted strings in
    97   addition to identifiers, wherever a named entity is expected.
    98 
    99   When tokenizing a given input sequence, the lexer repeatedly takes
   100   the longest prefix of the input that forms a valid token.  Spaces,
   101   tabs, newlines and formfeeds between tokens serve as explicit
   102   separators.
   103 
   104   \medskip The categories for named tokens are defined once and for
   105   all as follows.
   106 
   107   \begin{center}
   108   \begin{supertabular}{rcl}
   109     @{syntax_def ident} & = & @{text "letter quasiletter\<^sup>*"} \\
   110     @{syntax_def longident} & = & @{text "ident("}@{verbatim "."}@{text "ident)\<^sup>+"} \\
   111     @{syntax_def symident} & = & @{text "sym\<^sup>+  |  "}@{verbatim "\\"}@{verbatim "<"}@{text ident}@{verbatim ">"} \\
   112     @{syntax_def nat} & = & @{text "digit\<^sup>+"} \\
   113     @{syntax_def var} & = & @{verbatim "?"}@{text "ident  |  "}@{verbatim "?"}@{text ident}@{verbatim "."}@{text nat} \\
   114     @{syntax_def typefree} & = & @{verbatim "'"}@{text ident} \\
   115     @{syntax_def typevar} & = & @{verbatim "?"}@{text "typefree  |  "}@{verbatim "?"}@{text typefree}@{verbatim "."}@{text nat} \\
   116     @{syntax_def string} & = & @{verbatim "\""} @{text "\<dots>"} @{verbatim "\""} \\
   117     @{syntax_def altstring} & = & @{verbatim "`"} @{text "\<dots>"} @{verbatim "`"} \\
   118     @{syntax_def verbatim} & = & @{verbatim "{*"} @{text "\<dots>"} @{verbatim "*"}@{verbatim "}"} \\[1ex]
   119 
   120     @{text letter} & = & @{text "latin  |  "}@{verbatim "\\"}@{verbatim "<"}@{text latin}@{verbatim ">"}@{text "  |  "}@{verbatim "\\"}@{verbatim "<"}@{text "latin latin"}@{verbatim ">"}@{text "  |  greek  |"} \\
   121           &   & @{verbatim "\<^isub>"}@{text "  |  "}@{verbatim "\<^isup>"} \\
   122     @{text quasiletter} & = & @{text "letter  |  digit  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "'"} \\
   123     @{text latin} & = & @{verbatim a}@{text "  | \<dots> |  "}@{verbatim z}@{text "  |  "}@{verbatim A}@{text "  |  \<dots> |  "}@{verbatim Z} \\
   124     @{text digit} & = & @{verbatim "0"}@{text "  |  \<dots> |  "}@{verbatim "9"} \\
   125     @{text sym} & = & @{verbatim "!"}@{text "  |  "}@{verbatim "#"}@{text "  |  "}@{verbatim "$"}@{text "  |  "}@{verbatim "%"}@{text "  |  "}@{verbatim "&"}@{text "  |  "}@{verbatim "*"}@{text "  |  "}@{verbatim "+"}@{text "  |  "}@{verbatim "-"}@{text "  |  "}@{verbatim "/"}@{text "  |"} \\
   126     & & @{verbatim "<"}@{text "  |  "}@{verbatim "="}@{text "  |  "}@{verbatim ">"}@{text "  |  "}@{verbatim "?"}@{text "  |  "}@{verbatim "@"}@{text "  |  "}@{verbatim "^"}@{text "  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "|"}@{text "  |  "}@{verbatim "~"} \\
   127     @{text greek} & = & @{verbatim "\<alpha>"}@{text "  |  "}@{verbatim "\<beta>"}@{text "  |  "}@{verbatim "\<gamma>"}@{text "  |  "}@{verbatim "\<delta>"}@{text "  |"} \\
   128           &   & @{verbatim "\<epsilon>"}@{text "  |  "}@{verbatim "\<zeta>"}@{text "  |  "}@{verbatim "\<eta>"}@{text "  |  "}@{verbatim "\<theta>"}@{text "  |"} \\
   129           &   & @{verbatim "\<iota>"}@{text "  |  "}@{verbatim "\<kappa>"}@{text "  |  "}@{verbatim "\<mu>"}@{text "  |  "}@{verbatim "\<nu>"}@{text "  |"} \\
   130           &   & @{verbatim "\<xi>"}@{text "  |  "}@{verbatim "\<pi>"}@{text "  |  "}@{verbatim "\<rho>"}@{text "  |  "}@{verbatim "\<sigma>"}@{text "  |  "}@{verbatim "\<tau>"}@{text "  |"} \\
   131           &   & @{verbatim "\<upsilon>"}@{text "  |  "}@{verbatim "\<phi>"}@{text "  |  "}@{verbatim "\<chi>"}@{text "  |  "}@{verbatim "\<psi>"}@{text "  |"} \\
   132           &   & @{verbatim "\<omega>"}@{text "  |  "}@{verbatim "\<Gamma>"}@{text "  |  "}@{verbatim "\<Delta>"}@{text "  |  "}@{verbatim "\<Theta>"}@{text "  |"} \\
   133           &   & @{verbatim "\<Lambda>"}@{text "  |  "}@{verbatim "\<Xi>"}@{text "  |  "}@{verbatim "\<Pi>"}@{text "  |  "}@{verbatim "\<Sigma>"}@{text "  |"} \\
   134           &   & @{verbatim "\<Upsilon>"}@{text "  |  "}@{verbatim "\<Phi>"}@{text "  |  "}@{verbatim "\<Psi>"}@{text "  |  "}@{verbatim "\<Omega>"} \\
   135   \end{supertabular}
   136   \end{center}
   137 
   138   A @{syntax_ref var} or @{syntax_ref typevar} describes an unknown,
   139   which is internally a pair of base name and index (ML type @{ML_type
   140   indexname}).  These components are either separated by a dot as in
   141   @{text "?x.1"} or @{text "?x7.3"} or run together as in @{text
   142   "?x1"}.  The latter form is possible if the base name does not end
   143   with digits.  If the index is 0, it may be dropped altogether:
   144   @{text "?x"} and @{text "?x0"} and @{text "?x.0"} all refer to the
   145   same unknown, with basename @{text "x"} and index 0.
   146 
   147   The syntax of @{syntax_ref string} admits any characters, including
   148   newlines; ``@{verbatim "\""}'' (double-quote) and ``@{verbatim
   149   "\\"}'' (backslash) need to be escaped by a backslash; arbitrary
   150   character codes may be specified as ``@{verbatim "\\"}@{text ddd}'',
   151   with three decimal digits.  Alternative strings according to
   152   @{syntax_ref altstring} are analogous, using single back-quotes
   153   instead.
   154 
   155   The body of @{syntax_ref verbatim} may consist of any text not
   156   containing ``@{verbatim "*"}@{verbatim "}"}''; this allows
   157   convenient inclusion of quotes without further escapes.  There is no
   158   way to escape ``@{verbatim "*"}@{verbatim "}"}''.  If the quoted
   159   text is {\LaTeX} source, one may usually add some blank or comment
   160   to avoid the critical character sequence.
   161 
   162   Source comments take the form @{verbatim "(*"}~@{text
   163   "\<dots>"}~@{verbatim "*)"} and may be nested, although the user-interface
   164   might prevent this.  Note that this form indicates source comments
   165   only, which are stripped after lexical analysis of the input.  The
   166   Isar syntax also provides proper \emph{document comments} that are
   167   considered as part of the text (see \secref{sec:comments}).
   168 
   169   Common mathematical symbols such as @{text \<forall>} are represented in
   170   Isabelle as @{verbatim \<forall>}.  There are infinitely many Isabelle
   171   symbols like this, although proper presentation is left to front-end
   172   tools such as {\LaTeX} or Proof~General with the X-Symbol package.
   173   A list of standard Isabelle symbols that work well with these tools
   174   is given in \appref{app:symbols}.  Note that @{verbatim "\<lambda>"} does
   175   not belong to the @{text letter} category, since it is already used
   176   differently in the Pure term language.
   177 *}
   178 
   179 
   180 section {* Common syntax entities *}
   181 
   182 text {*
   183   We now introduce several basic syntactic entities, such as names,
   184   terms, and theorem specifications, which are factored out of the
   185   actual Isar language elements to be described later.
   186 *}
   187 
   188 
   189 subsection {* Names *}
   190 
   191 text {*
   192   Entity \railqtok{name} usually refers to any name of types,
   193   constants, theorems etc.\ that are to be \emph{declared} or
   194   \emph{defined} (so qualified identifiers are excluded here).  Quoted
   195   strings provide an escape for non-identifier names or those ruled
   196   out by outer syntax keywords (e.g.\ quoted @{verbatim "\"let\""}).
   197   Already existing objects are usually referenced by
   198   \railqtok{nameref}.
   199 
   200   \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
   201   \indexoutertoken{int}
   202   \begin{rail}
   203     name: ident | symident | string | nat
   204     ;
   205     parname: '(' name ')'
   206     ;
   207     nameref: name | longident
   208     ;
   209     int: nat | '-' nat
   210     ;
   211   \end{rail}
   212 *}
   213 
   214 
   215 subsection {* Comments \label{sec:comments} *}
   216 
   217 text {*
   218   Large chunks of plain \railqtok{text} are usually given
   219   \railtok{verbatim}, i.e.\ enclosed in @{verbatim "{"}@{verbatim
   220   "*"}~@{text "\<dots>"}~@{verbatim "*"}@{verbatim "}"}.  For convenience,
   221   any of the smaller text units conforming to \railqtok{nameref} are
   222   admitted as well.  A marginal \railnonterm{comment} is of the form
   223   @{verbatim "--"} \railqtok{text}.  Any number of these may occur
   224   within Isabelle/Isar commands.
   225 
   226   \indexoutertoken{text}\indexouternonterm{comment}
   227   \begin{rail}
   228     text: verbatim | nameref
   229     ;
   230     comment: '--' text
   231     ;
   232   \end{rail}
   233 *}
   234 
   235 
   236 subsection {* Type classes, sorts and arities *}
   237 
   238 text {*
   239   Classes are specified by plain names.  Sorts have a very simple
   240   inner syntax, which is either a single class name @{text c} or a
   241   list @{text "{c\<^sub>1, \<dots>, c\<^sub>n}"} referring to the
   242   intersection of these classes.  The syntax of type arities is given
   243   directly at the outer level.
   244 
   245   \indexouternonterm{sort}\indexouternonterm{arity}
   246   \indexouternonterm{classdecl}
   247   \begin{rail}
   248     classdecl: name (('<' | subseteq) (nameref + ','))?
   249     ;
   250     sort: nameref
   251     ;
   252     arity: ('(' (sort + ',') ')')? sort
   253     ;
   254   \end{rail}
   255 *}
   256 
   257 
   258 subsection {* Types and terms \label{sec:types-terms} *}
   259 
   260 text {*
   261   The actual inner Isabelle syntax, that of types and terms of the
   262   logic, is far too sophisticated in order to be modelled explicitly
   263   at the outer theory level.  Basically, any such entity has to be
   264   quoted to turn it into a single token (the parsing and type-checking
   265   is performed internally later).  For convenience, a slightly more
   266   liberal convention is adopted: quotes may be omitted for any type or
   267   term that is already atomic at the outer level.  For example, one
   268   may just write @{verbatim x} instead of quoted @{verbatim "\"x\""}.
   269   Note that symbolic identifiers (e.g.\ @{verbatim "++"} or @{text
   270   "\<forall>"} are available as well, provided these have not been superseded
   271   by commands or other keywords already (such as @{verbatim "="} or
   272   @{verbatim "+"}).
   273 
   274   \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
   275   \begin{rail}
   276     type: nameref | typefree | typevar
   277     ;
   278     term: nameref | var
   279     ;
   280     prop: term
   281     ;
   282   \end{rail}
   283 
   284   Positional instantiations are indicated by giving a sequence of
   285   terms, or the placeholder ``@{text _}'' (underscore), which means to
   286   skip a position.
   287 
   288   \indexoutertoken{inst}\indexoutertoken{insts}
   289   \begin{rail}
   290     inst: underscore | term
   291     ;
   292     insts: (inst *)
   293     ;
   294   \end{rail}
   295 
   296   Type declarations and definitions usually refer to
   297   \railnonterm{typespec} on the left-hand side.  This models basic
   298   type constructor application at the outer syntax level.  Note that
   299   only plain postfix notation is available here, but no infixes.
   300 
   301   \indexouternonterm{typespec}
   302   \begin{rail}
   303     typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
   304     ;
   305   \end{rail}
   306 *}
   307 
   308 
   309 subsection {* Term patterns and declarations \label{sec:term-decls} *}
   310 
   311 text {*
   312   Wherever explicit propositions (or term fragments) occur in a proof
   313   text, casual binding of schematic term variables may be given
   314   specified via patterns of the form ``@{text "(\<IS> p\<^sub>1 \<dots>
   315   p\<^sub>n)"}''.  This works both for \railqtok{term} and \railqtok{prop}.
   316 
   317   \indexouternonterm{termpat}\indexouternonterm{proppat}
   318   \begin{rail}
   319     termpat: '(' ('is' term +) ')'
   320     ;
   321     proppat: '(' ('is' prop +) ')'
   322     ;
   323   \end{rail}
   324 
   325   \medskip Declarations of local variables @{text "x :: \<tau>"} and
   326   logical propositions @{text "a : \<phi>"} represent different views on
   327   the same principle of introducing a local scope.  In practice, one
   328   may usually omit the typing of \railnonterm{vars} (due to
   329   type-inference), and the naming of propositions (due to implicit
   330   references of current facts).  In any case, Isar proof elements
   331   usually admit to introduce multiple such items simultaneously.
   332 
   333   \indexouternonterm{vars}\indexouternonterm{props}
   334   \begin{rail}
   335     vars: (name+) ('::' type)?
   336     ;
   337     props: thmdecl? (prop proppat? +)
   338     ;
   339   \end{rail}
   340 
   341   The treatment of multiple declarations corresponds to the
   342   complementary focus of \railnonterm{vars} versus
   343   \railnonterm{props}.  In ``@{text "x\<^sub>1 \<dots> x\<^sub>n :: \<tau>"}''
   344   the typing refers to all variables, while in @{text "a: \<phi>\<^sub>1 \<dots>
   345   \<phi>\<^sub>n"} the naming refers to all propositions collectively.
   346   Isar language elements that refer to \railnonterm{vars} or
   347   \railnonterm{props} typically admit separate typings or namings via
   348   another level of iteration, with explicit @{keyword_ref "and"}
   349   separators; e.g.\ see @{command "fix"} and @{command "assume"} in
   350   \secref{sec:proof-context}.
   351 *}
   352 
   353 
   354 subsection {* Attributes and theorems \label{sec:syn-att} *}
   355 
   356 text {* Attributes have their own ``semi-inner'' syntax, in the sense
   357   that input conforming to \railnonterm{args} below is parsed by the
   358   attribute a second time.  The attribute argument specifications may
   359   be any sequence of atomic entities (identifiers, strings etc.), or
   360   properly bracketed argument lists.  Below \railqtok{atom} refers to
   361   any atomic entity, including any \railtok{keyword} conforming to
   362   \railtok{symident}.
   363 
   364   \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
   365   \begin{rail}
   366     atom: nameref | typefree | typevar | var | nat | keyword
   367     ;
   368     arg: atom | '(' args ')' | '[' args ']'
   369     ;
   370     args: arg *
   371     ;
   372     attributes: '[' (nameref args * ',') ']'
   373     ;
   374   \end{rail}
   375 
   376   Theorem specifications come in several flavors:
   377   \railnonterm{axmdecl} and \railnonterm{thmdecl} usually refer to
   378   axioms, assumptions or results of goal statements, while
   379   \railnonterm{thmdef} collects lists of existing theorems.  Existing
   380   theorems are given by \railnonterm{thmref} and
   381   \railnonterm{thmrefs}, the former requires an actual singleton
   382   result.
   383 
   384   There are three forms of theorem references:
   385   \begin{enumerate}
   386   
   387   \item named facts @{text "a"},
   388 
   389   \item selections from named facts @{text "a(i)"} or @{text "a(j - k)"},
   390 
   391   \item literal fact propositions using @{syntax_ref altstring} syntax
   392   @{verbatim "`"}@{text "\<phi>"}@{verbatim "`"} (see also method
   393   @{method_ref fact}).
   394 
   395   \end{enumerate}
   396 
   397   Any kind of theorem specification may include lists of attributes
   398   both on the left and right hand sides; attributes are applied to any
   399   immediately preceding fact.  If names are omitted, the theorems are
   400   not stored within the theorem database of the theory or proof
   401   context, but any given attributes are applied nonetheless.
   402 
   403   An extra pair of brackets around attributes (like ``@{text
   404   "[[simproc a]]"}'') abbreviates a theorem reference involving an
   405   internal dummy fact, which will be ignored later on.  So only the
   406   effect of the attribute on the background context will persist.
   407   This form of in-place declarations is particularly useful with
   408   commands like @{command "declare"} and @{command "using"}.
   409 
   410   \indexouternonterm{axmdecl}\indexouternonterm{thmdecl}
   411   \indexouternonterm{thmdef}\indexouternonterm{thmref}
   412   \indexouternonterm{thmrefs}\indexouternonterm{selection}
   413   \begin{rail}
   414     axmdecl: name attributes? ':'
   415     ;
   416     thmdecl: thmbind ':'
   417     ;
   418     thmdef: thmbind '='
   419     ;
   420     thmref: (nameref selection? | altstring) attributes? | '[' attributes ']'
   421     ;
   422     thmrefs: thmref +
   423     ;
   424 
   425     thmbind: name attributes | name | attributes
   426     ;
   427     selection: '(' ((nat | nat '-' nat?) + ',') ')'
   428     ;
   429   \end{rail}
   430 *}
   431 
   432 end