doc-src/IsarRef/Thy/Outer_Syntax.thy
author wenzelm
Thu, 13 Nov 2008 22:00:12 +0100
changeset 28776 e4090e51b8b9
parent 28775 d25fe9601dbd
child 28778 a25630deacaf
permissions -rw-r--r--
misc tuning;
(* $Id$ *)

theory Outer_Syntax
imports Main
begin

chapter {* Outer syntax *}

text {*
  The rather generic framework of Isabelle/Isar syntax emerges from
  three main syntactic categories: \emph{commands} of the top-level
  Isar engine (covering theory and proof elements), \emph{methods} for
  general goal refinements (analogous to traditional ``tactics''), and
  \emph{attributes} for operations on facts (within a certain
  context).  In the following, we give a bottom-up reference of the
  basic syntactic entities underlying Isabelle/Isar syntax.  Concrete
  theory and proof language elements will be introduced later on.

  \medskip In order to get started with writing well-formed
  Isabelle/Isar documents, the most important aspect to note is the
  difference between \emph{inner} and \emph{outer} syntax.  Inner
  syntax is that of Isabelle types and terms of the logic, while outer
  syntax is that of Isabelle/Isar theory sources (specifications and
  proofs).  As a general rule, inner syntax entities may occur only as
  \emph{atomic entities} within outer syntax.  For example, the string
  @{verbatim "\"x + y\""} and identifier @{verbatim z} are legal term
  specifications within a theory, while @{verbatim "x + y"} without
  quotes is not.
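
  For instance, in an Isabelle/HOL session the first two of the
  following diagnostic commands are accepted.  The third one (shown
  inside a source comment) would be rejected, because without quotes
  the outer parser sees the separate tokens @{verbatim x},
  @{verbatim "+"} and @{verbatim y}.  This is only an illustrative
  sketch: @{verbatim x}, @{verbatim y} and @{verbatim z} are
  arbitrary free variables.

  \begin{verbatim}
  term "x + y"
  term z
  (* term x + y *)
  \end{verbatim}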

  Printed theory documents usually omit quotes to improve readability
  (this is a matter of the {\LaTeX} macro setup, e.g.\ via @{verbatim
  "\\isabellestyle"}, see also \cite{isabelle-sys}).  Experienced
  users of Isabelle/Isar may easily reconstruct the lost technical
  information, while mere readers need not care about quotes at all.

  \medskip Isabelle/Isar input may contain any number of input
  termination characters ``@{verbatim ";"}'' (semicolon) to separate
  commands explicitly.  This is particularly useful in interactive
  shell sessions to make clear where the current command is intended
  to end.  Otherwise, the interpreter loop will continue to issue a
  secondary prompt ``@{verbatim "#"}'' until an end-of-command is
  clearly recognized from the input syntax, e.g.\ when the next
  command keyword is encountered.

  More advanced interfaces such as Proof~General \cite{proofgeneral}
  do not require explicit semicolons: the amount of input text is
  determined automatically by inspecting the present content of the
  Emacs text buffer.  In the printed presentation of Isabelle/Isar
  documents semicolons are omitted altogether for readability.

  \begin{warn}
    Proof~General requires certain syntax classification tables in
    order to achieve properly synchronized interaction with the
    Isabelle/Isar process.  These tables need to be consistent with
    the Isabelle version and particular logic image to be used in a
    running session (common object-logics may well change the outer
    syntax).  The standard setup should work correctly with any of the
    ``official'' logic images derived from Isabelle/HOL (including
    HOLCF etc.).  Users of alternative logics may need to tell
    Proof~General explicitly which tables to use, e.g.\ by giving an
    option @{verbatim "-k ZF"} (in conjunction with @{verbatim "-l ZF"},
    to specify the default logic image).  Note that option
    @{verbatim "-L"} does both at the same time.
  \end{warn}
*}


section {* Lexical matters \label{sec:outer-lex} *}

text {* The outer lexical syntax consists of three main categories of
  syntax tokens:

  \begin{enumerate}

  \item \emph{major keywords} --- the command names that are available
  in the present logic session;

  \item \emph{minor keywords} --- additional literal tokens required
  by the syntax of commands;

  \item \emph{named tokens} --- various categories of identifiers etc.

  \end{enumerate}

  Major keywords and minor keywords are guaranteed to be disjoint.
  This helps user interfaces to determine the overall structure of a
  theory text, without knowing the full details of command syntax.
  Internally, there is some additional information about the kind of
  major keywords, which approximates the command type (theory command,
  proof command etc.).

  Keywords override named tokens.  For example, the presence of a
  command called @{verbatim term} inhibits the identifier @{verbatim
  term}, but the string @{verbatim "\"term\""} can be used instead.
  By convention, the outer syntax always allows quoted strings in
  addition to identifiers, wherever a named entity is expected.

  When tokenizing a given input sequence, the lexer repeatedly takes
  the longest prefix of the input that forms a valid token.  Spaces,
  tabs, newlines and formfeeds between tokens serve as explicit
  separators.

  \medskip The categories for named tokens are defined once and for
  all as follows.

  \begin{center}
  \begin{supertabular}{rcl}
    @{syntax_def ident} & = & @{text "letter quasiletter\<^sup>*"} \\
    @{syntax_def longident} & = & @{text "ident("}@{verbatim "."}@{text "ident)\<^sup>+"} \\
    @{syntax_def symident} & = & @{text "sym\<^sup>+  |  "}@{verbatim "\\"}@{verbatim "<"}@{text ident}@{verbatim ">"} \\
    @{syntax_def nat} & = & @{text "digit\<^sup>+"} \\
    @{syntax_def var} & = & @{verbatim "?"}@{text "ident  |  "}@{verbatim "?"}@{text ident}@{verbatim "."}@{text nat} \\
    @{syntax_def typefree} & = & @{verbatim "'"}@{text ident} \\
    @{syntax_def typevar} & = & @{verbatim "?"}@{text "typefree  |  "}@{verbatim "?"}@{text typefree}@{verbatim "."}@{text nat} \\
    @{syntax_def string} & = & @{verbatim "\""} @{text "\<dots>"} @{verbatim "\""} \\
    @{syntax_def altstring} & = & @{verbatim "`"} @{text "\<dots>"} @{verbatim "`"} \\
    @{syntax_def verbatim} & = & @{verbatim "{*"} @{text "\<dots>"} @{verbatim "*"}@{verbatim "}"} \\[1ex]

    @{text letter} & = & @{text "latin  |  "}@{verbatim "\\"}@{verbatim "<"}@{text latin}@{verbatim ">"}@{text "  |  "}@{verbatim "\\"}@{verbatim "<"}@{text "latin latin"}@{verbatim ">"}@{text "  |  greek  |"} \\
          &   & @{verbatim "\<^isub>"}@{text "  |  "}@{verbatim "\<^isup>"} \\
    @{text quasiletter} & = & @{text "letter  |  digit  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "'"} \\
    @{text latin} & = & @{verbatim a}@{text "  | \<dots> |  "}@{verbatim z}@{text "  |  "}@{verbatim A}@{text "  |  \<dots> |  "}@{verbatim Z} \\
    @{text digit} & = & @{verbatim "0"}@{text "  |  \<dots> |  "}@{verbatim "9"} \\
    @{text sym} & = & @{verbatim "!"}@{text "  |  "}@{verbatim "#"}@{text "  |  "}@{verbatim "$"}@{text "  |  "}@{verbatim "%"}@{text "  |  "}@{verbatim "&"}@{text "  |  "}@{verbatim "*"}@{text "  |  "}@{verbatim "+"}@{text "  |  "}@{verbatim "-"}@{text "  |  "}@{verbatim "/"}@{text "  |"} \\
    & & @{verbatim "<"}@{text "  |  "}@{verbatim "="}@{text "  |  "}@{verbatim ">"}@{text "  |  "}@{verbatim "?"}@{text "  |  "}@{verbatim "@"}@{text "  |  "}@{verbatim "^"}@{text "  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "|"}@{text "  |  "}@{verbatim "~"} \\
    @{text greek} & = & @{verbatim "\<alpha>"}@{text "  |  "}@{verbatim "\<beta>"}@{text "  |  "}@{verbatim "\<gamma>"}@{text "  |  "}@{verbatim "\<delta>"}@{text "  |"} \\
          &   & @{verbatim "\<epsilon>"}@{text "  |  "}@{verbatim "\<zeta>"}@{text "  |  "}@{verbatim "\<eta>"}@{text "  |  "}@{verbatim "\<theta>"}@{text "  |"} \\
          &   & @{verbatim "\<iota>"}@{text "  |  "}@{verbatim "\<kappa>"}@{text "  |  "}@{verbatim "\<mu>"}@{text "  |  "}@{verbatim "\<nu>"}@{text "  |"} \\
          &   & @{verbatim "\<xi>"}@{text "  |  "}@{verbatim "\<pi>"}@{text "  |  "}@{verbatim "\<rho>"}@{text "  |  "}@{verbatim "\<sigma>"}@{text "  |  "}@{verbatim "\<tau>"}@{text "  |"} \\
          &   & @{verbatim "\<upsilon>"}@{text "  |  "}@{verbatim "\<phi>"}@{text "  |  "}@{verbatim "\<chi>"}@{text "  |  "}@{verbatim "\<psi>"}@{text "  |"} \\
          &   & @{verbatim "\<omega>"}@{text "  |  "}@{verbatim "\<Gamma>"}@{text "  |  "}@{verbatim "\<Delta>"}@{text "  |  "}@{verbatim "\<Theta>"}@{text "  |"} \\
          &   & @{verbatim "\<Lambda>"}@{text "  |  "}@{verbatim "\<Xi>"}@{text "  |  "}@{verbatim "\<Pi>"}@{text "  |  "}@{verbatim "\<Sigma>"}@{text "  |"} \\
          &   & @{verbatim "\<Upsilon>"}@{text "  |  "}@{verbatim "\<Phi>"}@{text "  |  "}@{verbatim "\<Psi>"}@{text "  |  "}@{verbatim "\<Omega>"} \\
  \end{supertabular}
  \end{center}
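
  To illustrate these categories, each line below lists a few sample
  inputs together with the kind of named token they form.  This is a
  hand-written overview, not system output.  It assumes that none of
  the samples is declared as a keyword in the current session.  By the
  longest-prefix rule, an input like @{verbatim "?x.3"} forms a single
  @{syntax var} token rather than several tokens.

  \begin{verbatim}
  foo   x1   a_b'        ident
  List.map   HOL.conjI   longident
  ++   ~~>               symident
  0   42                 nat
  ?x   ?x.3              var
  'a   'foo              typefree
  ?'a   ?'a.3            typevar
  \end{verbatim}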

  The syntax of @{syntax string} admits any characters, including
  newlines; ``@{verbatim "\""}'' (double-quote) and ``@{verbatim
  "\\"}'' (backslash) need to be escaped by a backslash; arbitrary
  character codes may be specified as ``@{verbatim "\\"}@{text ddd}'',
  with three decimal digits.  Alternative strings according to
  @{syntax altstring} are analogous, using single back-quotes instead.
  The body of @{syntax verbatim} may consist of any text not
  containing ``@{verbatim "*"}@{verbatim "}"}''; this allows
  convenient inclusion of quotes without further escapes.  The greek
  letters do \emph{not} include @{verbatim "\<lambda>"}, which is already used
  differently in the meta-logic.
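
  For example, the following are well-formed @{syntax string} tokens
  under these rules.  They are shown as sample tokens, not as complete
  commands.  The second one contains escaped double-quotes, and the
  third one an escaped backslash.  In the last one, the escape
  @{verbatim "\\065"} denotes the character with decimal code 65,
  i.e.\ @{verbatim A}.

  \begin{verbatim}
  "x + y"
  "f \"a\" b"
  "a \\ b"
  "\065BC"
  \end{verbatim}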

  Common mathematical symbols such as @{text \<forall>} are represented in
  Isabelle as @{verbatim \<forall>}.  There are infinitely many Isabelle
  symbols like this, although proper presentation is left to front-end
  tools such as {\LaTeX} or Proof~General with the X-Symbol package.
  A list of standard Isabelle symbols that work well with these tools
  is given in \cite[appendix~A]{isabelle-sys}.

  Source comments take the form @{verbatim "(*"}~@{text
  "\<dots>"}~@{verbatim "*)"} and may be nested, although the user
  interface might prevent this.  Note that this form indicates source
  comments only, which are stripped after lexical analysis of the
  input.  The Isar syntax also provides proper \emph{document
  comments} that are considered as part of the text (see
  \secref{sec:comments}).
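
  For example, the following input contains a nested source comment
  followed by a single @{command "term"} command.  The whole comment is
  stripped after lexical analysis, so only the command remains to be
  parsed.  This is an illustrative sketch; as noted above, some user
  interfaces may not handle nested comments well.

  \begin{verbatim}
  (* outer comment (* nested comment *) still a comment *)
  term "x + y"
  \end{verbatim}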
*}


section {* Common syntax entities *}

text {*
  We now introduce several basic syntactic entities, such as names,
  terms, and theorem specifications, which are factored out of the
  actual Isar language elements to be described later.
*}


subsection {* Names *}

text {*
  Entity \railqtok{name} usually refers to any name of types,
  constants, theorems etc.\ that are to be \emph{declared} or
  \emph{defined} (so qualified identifiers are excluded here).  Quoted
  strings provide an escape for non-identifier names or those ruled
  out by outer syntax keywords (e.g.\ quoted @{verbatim "\"let\""}).
  Already existing objects are usually referenced by
  \railqtok{nameref}.

  \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
  \indexoutertoken{int}
  \begin{rail}
    name: ident | symident | string | nat
    ;
    parname: '(' name ')'
    ;
    nameref: name | longident
    ;
    int: nat | '-' nat
    ;
  \end{rail}
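
  The following lines sketch typical uses in an Isabelle/HOL theory.
  The declared names @{verbatim foo} and @{verbatim "let"} are
  hypothetical.  A plain identifier serves as \railqtok{name}.  The
  command keyword @{verbatim "let"} has to be quoted to be used as a
  theorem name.  The existing fact @{verbatim conjI} is referenced via
  its qualified @{syntax longident}.

  \begin{verbatim}
  typedecl foo
  lemma "let": "x = x" by (rule refl)
  thm HOL.conjI
  \end{verbatim}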
*}


subsection {* Comments \label{sec:comments} *}

text {*
  Large chunks of plain \railqtok{text} are usually given
  \railtok{verbatim}, i.e.\ enclosed in @{verbatim "{"}@{verbatim
  "*"}~@{text "\<dots>"}~@{verbatim "*"}@{verbatim "}"}.  For convenience,
  any of the smaller text units conforming to \railqtok{nameref} are
  admitted as well.  A marginal \railnonterm{comment} is of the form
  @{verbatim "--"} \railqtok{text}.  Any number of these may occur
  within Isabelle/Isar commands.

  \indexoutertoken{text}\indexouternonterm{comment}
  \begin{rail}
    text: verbatim | nameref
    ;
    comment: '--' text
    ;
  \end{rail}
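
  For example, marginal comments may be attached to commands as
  follows.  This is a sketch in an Isabelle/HOL theory.  The first
  comment is given as a quoted string, the second one as a plain
  identifier conforming to \railqtok{nameref}.

  \begin{verbatim}
  lemma "A --> A"  -- "a trivial tautology"
    by blast  -- trivial
  \end{verbatim}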
*}


subsection {* Type classes, sorts and arities *}

text {*
  Classes are specified by plain names.  Sorts have a very simple
  inner syntax, which is either a single class name @{text c} or a
  list @{text "{c\<^sub>1, \<dots>, c\<^sub>n}"} referring to the
  intersection of these classes.  The syntax of type arities is given
  directly at the outer level.

  \indexouternonterm{sort}\indexouternonterm{arity}
  \indexouternonterm{classdecl}
  \begin{rail}
    classdecl: name (('<' | subseteq) (nameref + ','))?
    ;
    sort: nameref
    ;
    arity: ('(' (sort + ',') ')')? sort
    ;
  \end{rail}
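
  For instance, assuming the type classes @{verbatim order} and
  @{verbatim zero} of Isabelle/HOL, the following commands use a
  single class and a class intersection as sorts of a type variable.
  Since these sorts occur within the inner syntax of a type, the types
  are quoted.

  \begin{verbatim}
  typ "'a::order"
  typ "'a::{order, zero}"
  \end{verbatim}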
*}


subsection {* Types and terms \label{sec:types-terms} *}

text {*
  The actual inner Isabelle syntax, that of types and terms of the
  logic, is far too sophisticated to be modelled explicitly at the
  outer theory level.  Basically, any such entity has to be quoted to
  turn it into a single token (the parsing and type-checking is
  performed internally later).  For convenience, a slightly more
  liberal convention is adopted: quotes may be omitted for any type or
  term that is already atomic at the outer level.  For example, one
  may just write @{verbatim x} instead of quoted @{verbatim "\"x\""}.
  Note that symbolic identifiers (e.g.\ @{verbatim "++"} or @{text
  "\<forall>"}) are available as well, provided these have not been
  superseded by commands or other keywords already (such as
  @{verbatim "="} or @{verbatim "+"}).

  \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
  \begin{rail}
    type: nameref | typefree | typevar
    ;
    term: nameref | var
    ;
    prop: term
    ;
  \end{rail}
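
  For example, in an Isabelle/HOL session quotes may be omitted in the
  first line of each pair below, since the argument is atomic at the
  outer level.  In the second line of each pair, the type or term
  consists of several tokens and has to be quoted.

  \begin{verbatim}
  typ 'a
  typ "'a list"
  term x
  term "x + y"
  \end{verbatim}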

  Positional instantiations are indicated by giving a sequence of
  terms, or the placeholder ``@{text _}'' (underscore), which means to
  skip a position.

  \indexoutertoken{inst}\indexoutertoken{insts}
  \begin{rail}
    inst: underscore | term
    ;
    insts: (inst *)
    ;
  \end{rail}
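
  A typical client of positional instantiations is the attribute
  @{verbatim of}, which instantiates the schematic variables of a
  theorem in the order of their occurrence.  The following sketch
  assumes Isabelle/HOL, where @{verbatim conjI} states
  @{text "?P \<Longrightarrow> ?Q \<Longrightarrow> ?P \<and> ?Q"}.  In
  the first line the placeholder leaves @{text "?P"} untouched and only
  @{text "?Q"} is instantiated.  In the second line both variables are
  instantiated.

  \begin{verbatim}
  thm conjI [of _ "B"]
  thm conjI [of "A" "B"]
  \end{verbatim}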

  Type declarations and definitions usually refer to
  \railnonterm{typespec} on the left-hand side.  This models basic
  type constructor application at the outer syntax level.  Note that
  only plain postfix notation is available here, not infix notation.

  \indexouternonterm{typespec}
  \begin{rail}
    typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
    ;
  \end{rail}
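
  For example, the following declarations introduce a nullary, a unary
  and a binary type constructor.  The type names @{verbatim t0},
  @{verbatim t1} and @{verbatim t2} are hypothetical.  In each case the
  arguments precede the constructor name, as in postfix notation.

  \begin{verbatim}
  typedecl t0
  typedecl 'a t1
  typedecl ('a, 'b) t2
  \end{verbatim}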
*}


subsection {* Term patterns and declarations \label{sec:term-decls} *}

text {*
  Wherever explicit propositions (or term fragments) occur in a proof
  text, casual binding of schematic term variables may be specified
  via patterns of the form ``@{text "(\<IS> p\<^sub>1 \<dots>
  p\<^sub>n)"}''.  This works both for \railqtok{term} and \railqtok{prop}.

  \indexouternonterm{termpat}\indexouternonterm{proppat}
  \begin{rail}
    termpat: '(' ('is' term +) ')'
    ;
    proppat: '(' ('is' prop +) ')'
    ;
  \end{rail}
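
  For example, the following statement binds the schematic variables
  @{text "?lhs"} and @{text "?rhs"} to the two sides of the equation.
  These variables then serve as abbreviations within the subsequent
  proof.  This is a sketch assuming Isabelle/HOL and natural numbers.

  \begin{verbatim}
  lemma "(x::nat) + y = y + x" (is "?lhs = ?rhs")
  proof -
    show "?lhs = ?rhs" by arith
  qed
  \end{verbatim}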

  \medskip Declarations of local variables @{text "x :: \<tau>"} and
  logical propositions @{text "a : \<phi>"} represent different views on
  the same principle of introducing a local scope.  In practice, one
  may usually omit the typing of \railnonterm{vars} (due to type
  inference), and the naming of propositions (due to implicit
  references to current facts).  In any case, Isar proof elements
  usually admit introducing multiple such items simultaneously.

  \indexouternonterm{vars}\indexouternonterm{props}
  \begin{rail}
    vars: (name+) ('::' type)?
    ;
    props: thmdecl? (prop proppat? +)
    ;
  \end{rail}

  The treatment of multiple declarations corresponds to the
  complementary focus of \railnonterm{vars} versus
  \railnonterm{props}.  In ``@{text "x\<^sub>1 \<dots> x\<^sub>n :: \<tau>"}''
  the typing refers to all variables, while in ``@{text "a: \<phi>\<^sub>1 \<dots>
  \<phi>\<^sub>n"}'' the naming refers to all propositions collectively.
  Isar language elements that refer to \railnonterm{vars} or
  \railnonterm{props} typically admit separate typings or namings via
  another level of iteration, with explicit @{keyword_ref "and"}
  separators; e.g.\ see @{command "fix"} and @{command "assume"} in
  \secref{sec:proof-context}.
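
  For example, the following proof elements show both kinds of
  iteration.  This is a sketch of a fragment inside a proof body, and
  the names are arbitrary.  The first line declares three variables of
  type @{text nat} and, after @{keyword_ref "and"}, a function variable
  with a separate typing.  The second line names two groups of
  assumptions, where @{text a} refers to both propositions of the first
  group.

  \begin{verbatim}
  fix x y z :: nat and f :: "nat => nat"
  assume a: "x < y" "y < z" and b: "f x = z"
  \end{verbatim}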
*}


subsection {* Attributes and theorems \label{sec:syn-att} *}

text {* Attributes have their own ``semi-inner'' syntax, in the sense
  that input conforming to \railnonterm{args} below is parsed by the
  attribute a second time.  The attribute argument specifications may
  be any sequence of atomic entities (identifiers, strings etc.), or
  properly bracketed argument lists.  Below \railqtok{atom} refers to
  any atomic entity, including any \railtok{keyword} conforming to
  \railtok{symident}.

  \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
  \begin{rail}
    atom: nameref | typefree | typevar | var | nat | keyword
    ;
    arg: atom | '(' args ')' | '[' args ']'
    ;
    args: arg *
    ;
    attributes: '[' (nameref args * ',') ']'
    ;
  \end{rail}
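
  For example, in the following sketch the attribute @{verbatim OF}
  receives the single atomic argument @{verbatim TrueI}, which it then
  parses again as a theorem reference.  The attribute
  @{verbatim simp} is given without arguments.  The sketch assumes
  Isabelle/HOL, and the fact name @{verbatim foo} is hypothetical.

  \begin{verbatim}
  thm conjI [OF TrueI]
  declare foo [simp]
  \end{verbatim}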

  Theorem specifications come in several flavors:
  \railnonterm{axmdecl} and \railnonterm{thmdecl} usually refer to
  axioms, assumptions or results of goal statements, while
  \railnonterm{thmdef} collects lists of existing theorems.  Existing
  theorems are given by \railnonterm{thmref} and
  \railnonterm{thmrefs}, where the former requires an actual singleton
  result.

  There are three forms of theorem references:
  \begin{enumerate}

  \item named facts @{text "a"},

  \item selections from named facts @{text "a(i)"} or @{text "a(j - k)"},

  \item literal fact propositions using @{syntax_ref altstring} syntax
  @{verbatim "`"}@{text "\<phi>"}@{verbatim "`"} (see also method
  @{method_ref fact}).

  \end{enumerate}
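
  The three forms might be used as follows.  In this sketch,
  @{verbatim a} is a hypothetical fact consisting of at least five
  theorems, and the proposition of the literal fact is assumed to be
  available in the current proof context.  The selections refer to the
  second theorem, to the first three plus the fifth, and to all
  theorems from the fourth onwards, respectively.

  \begin{verbatim}
  thm a
  thm a(2)
  thm a(1-3, 5)
  thm a(4-)
  from `x < y` show "y > x" by simp
  \end{verbatim}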

  Any kind of theorem specification may include lists of attributes
  both on the left- and right-hand sides; attributes are applied to any
  immediately preceding fact.  If names are omitted, the theorems are
  not stored within the theorem database of the theory or proof
  context, but any given attributes are applied nonetheless.
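
  For example, the following sketch uses the hypothetical fact names
  @{verbatim foo}, @{verbatim bar} and @{verbatim baz}.  The attribute
  @{verbatim symmetric} is applied to @{verbatim bar} only, since it
  immediately follows that fact.  The resulting theorems are collected
  under the name @{verbatim foo}, and the left-hand side attribute
  @{verbatim simp} then declares all of them as simplification rules.

  \begin{verbatim}
  lemmas foo [simp] = bar [symmetric] baz
  \end{verbatim}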

  An extra pair of brackets around attributes (like ``@{text
  "[[simproc a]]"}'') abbreviates a theorem reference involving an
  internal dummy fact, which will be ignored later on.  So only the
  effect of the attribute on the background context will persist.
  This form of in-place declaration is particularly useful with
  commands like @{command "declare"} and @{command "using"}.
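
  For example, the following lines follow the @{text "[[simproc a]]"}
  example from the text, where the name @{verbatim a} is hypothetical.
  The first line modifies the background context of the theory.  The
  second one, occurring within a proof, affects only the local proof
  context.

  \begin{verbatim}
  declare [[simproc a]]
  using [[simproc a]] by simp
  \end{verbatim}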

  \indexouternonterm{axmdecl}\indexouternonterm{thmdecl}
  \indexouternonterm{thmdef}\indexouternonterm{thmref}
  \indexouternonterm{thmrefs}\indexouternonterm{selection}
  \begin{rail}
    axmdecl: name attributes? ':'
    ;
    thmdecl: thmbind ':'
    ;
    thmdef: thmbind '='
    ;
    thmref: (nameref selection? | altstring) attributes? | '[' attributes ']'
    ;
    thmrefs: thmref +
    ;

    thmbind: name attributes | name | attributes
    ;
    selection: '(' ((nat | nat '-' nat?) + ',') ')'
    ;
  \end{rail}
*}

end