doc-src/IsarRef/Thy/Outer_Syntax.thy
author wenzelm
Sat, 28 Feb 2009 16:39:46 +0100
changeset 30168 9a20be5be90b
parent 30045 d2597c4f7e5c
child 30242 aea5d7fa7ef5
permissions -rw-r--r--
removed Ids;
     1 theory Outer_Syntax
     2 imports Main
     3 begin
     4 
     5 chapter {* Outer syntax *}
     6 
     7 text {*
     8   The rather generic framework of Isabelle/Isar syntax emerges from
     9   three main syntactic categories: \emph{commands} of the top-level
    10   Isar engine (covering theory and proof elements), \emph{methods} for
    11   general goal refinements (analogous to traditional ``tactics''), and
    12   \emph{attributes} for operations on facts (within a certain
    13   context).  Subsequently we give a reference of basic syntactic
    14   entities underlying Isabelle/Isar syntax in a bottom-up manner.
    15   Concrete theory and proof language elements will be introduced later
    16   on.
    17 
    18   \medskip In order to get started with writing well-formed
    19   Isabelle/Isar documents, the most important aspect to be noted is
    20   the difference of \emph{inner} versus \emph{outer} syntax.  Inner
    21   syntax is that of Isabelle types and terms of the logic, while outer
    22   syntax is that of Isabelle/Isar theory sources (specifications and
    23   proofs).  As a general rule, inner syntax entities may occur only as
    24   \emph{atomic entities} within outer syntax.  For example, the string
    25   @{verbatim "\"x + y\""} and identifier @{verbatim z} are legal term
    26   specifications within a theory, while @{verbatim "x + y"} without
    27   quotes is not.
    28 
    29   Printed theory documents usually omit quotes to gain readability
    30   (this is a matter of {\LaTeX} macro setup, say via @{verbatim
    31   "\\isabellestyle"}, see also \cite{isabelle-sys}).  Experienced
    32   users of Isabelle/Isar may easily reconstruct the lost technical
    33   information, while mere readers need not care about quotes at all.
    34 
    35   \medskip Isabelle/Isar input may contain any number of input
    36   termination characters ``@{verbatim ";"}'' (semicolon) to separate
    37   commands explicitly.  This is particularly useful in interactive
    38   shell sessions to make clear where the current command is intended
    39   to end.  Otherwise, the interpreter loop will continue to issue a
    40   secondary prompt ``@{verbatim "#"}'' until an end-of-command is
    41   clearly recognized from the input syntax, e.g.\ encounter of the
    42   next command keyword.
    43 
    44   More advanced interfaces such as Proof~General \cite{proofgeneral}
    45   do not require explicit semicolons, the amount of input text is
    46   determined automatically by inspecting the present content of the
    47   Emacs text buffer.  In the printed presentation of Isabelle/Isar
    48   documents semicolons are omitted altogether for readability.
    49 
    50   \begin{warn}
    51     Proof~General requires certain syntax classification tables in
    52     order to achieve properly synchronized interaction with the
    53     Isabelle/Isar process.  These tables need to be consistent with
    54     the Isabelle version and particular logic image to be used in a
    55     running session (common object-logics may well change the outer
    56     syntax).  The standard setup should work correctly with any of the
    57     ``official'' logic images derived from Isabelle/HOL (including
    58     HOLCF etc.).  Users of alternative logics may need to tell
    59     Proof~General explicitly, e.g.\ by giving an option @{verbatim "-k ZF"}
    60     (in conjunction with @{verbatim "-l ZF"}, to specify the default
    61     logic image).  Note that option @{verbatim "-L"} does both
    62     of this at the same time.
    63   \end{warn}
    64 *}
    65 
    66 
    67 section {* Lexical matters \label{sec:outer-lex} *}
    68 
    69 text {* The outer lexical syntax consists of three main categories of
    70   syntax tokens:
    71 
    72   \begin{enumerate}
    73 
    74   \item \emph{major keywords} --- the command names that are available
    75   in the present logic session;
    76 
    77   \item \emph{minor keywords} --- additional literal tokens required
    78   by the syntax of commands;
    79 
    80   \item \emph{named tokens} --- various categories of identifiers etc.
    81 
    82   \end{enumerate}
    83 
    84   Major keywords and minor keywords are guaranteed to be disjoint.
    85   This helps user-interfaces to determine the overall structure of a
    86   theory text, without knowing the full details of command syntax.
    87   Internally, there is some additional information about the kind of
    88   major keywords, which approximates the command type (theory command,
    89   proof command etc.).
    90 
    91   Keywords override named tokens.  For example, the presence of a
    92   command called @{verbatim term} inhibits the identifier @{verbatim
    93   term}, but the string @{verbatim "\"term\""} can be used instead.
    94   By convention, the outer syntax always allows quoted strings in
    95   addition to identifiers, wherever a named entity is expected.
    96 
    97   When tokenizing a given input sequence, the lexer repeatedly takes
    98   the longest prefix of the input that forms a valid token.  Spaces,
    99   tabs, newlines and formfeeds between tokens serve as explicit
   100   separators.
   101 
   102   \medskip The categories for named tokens are defined once and for
   103   all as follows.
   104 
   105   \begin{center}
   106   \begin{supertabular}{rcl}
   107     @{syntax_def ident} & = & @{text "letter quasiletter\<^sup>*"} \\
   108     @{syntax_def longident} & = & @{text "ident("}@{verbatim "."}@{text "ident)\<^sup>+"} \\
   109     @{syntax_def symident} & = & @{text "sym\<^sup>+  |  "}@{verbatim "\\"}@{verbatim "<"}@{text ident}@{verbatim ">"} \\
   110     @{syntax_def nat} & = & @{text "digit\<^sup>+"} \\
   111     @{syntax_def var} & = & @{verbatim "?"}@{text "ident  |  "}@{verbatim "?"}@{text ident}@{verbatim "."}@{text nat} \\
   112     @{syntax_def typefree} & = & @{verbatim "'"}@{text ident} \\
   113     @{syntax_def typevar} & = & @{verbatim "?"}@{text "typefree  |  "}@{verbatim "?"}@{text typefree}@{verbatim "."}@{text nat} \\
   114     @{syntax_def string} & = & @{verbatim "\""} @{text "\<dots>"} @{verbatim "\""} \\
   115     @{syntax_def altstring} & = & @{verbatim "`"} @{text "\<dots>"} @{verbatim "`"} \\
   116     @{syntax_def verbatim} & = & @{verbatim "{*"} @{text "\<dots>"} @{verbatim "*"}@{verbatim "}"} \\[1ex]
   117 
   118     @{text letter} & = & @{text "latin  |  "}@{verbatim "\\"}@{verbatim "<"}@{text latin}@{verbatim ">"}@{text "  |  "}@{verbatim "\\"}@{verbatim "<"}@{text "latin latin"}@{verbatim ">"}@{text "  |  greek  |"} \\
   119           &   & @{verbatim "\<^isub>"}@{text "  |  "}@{verbatim "\<^isup>"} \\
   120     @{text quasiletter} & = & @{text "letter  |  digit  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "'"} \\
   121     @{text latin} & = & @{verbatim a}@{text "  | \<dots> |  "}@{verbatim z}@{text "  |  "}@{verbatim A}@{text "  |  \<dots> |  "}@{verbatim Z} \\
   122     @{text digit} & = & @{verbatim "0"}@{text "  |  \<dots> |  "}@{verbatim "9"} \\
   123     @{text sym} & = & @{verbatim "!"}@{text "  |  "}@{verbatim "#"}@{text "  |  "}@{verbatim "$"}@{text "  |  "}@{verbatim "%"}@{text "  |  "}@{verbatim "&"}@{text "  |  "}@{verbatim "*"}@{text "  |  "}@{verbatim "+"}@{text "  |  "}@{verbatim "-"}@{text "  |  "}@{verbatim "/"}@{text "  |"} \\
   124     & & @{verbatim "<"}@{text "  |  "}@{verbatim "="}@{text "  |  "}@{verbatim ">"}@{text "  |  "}@{verbatim "?"}@{text "  |  "}@{verbatim "@"}@{text "  |  "}@{verbatim "^"}@{text "  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "|"}@{text "  |  "}@{verbatim "~"} \\
   125     @{text greek} & = & @{verbatim "\<alpha>"}@{text "  |  "}@{verbatim "\<beta>"}@{text "  |  "}@{verbatim "\<gamma>"}@{text "  |  "}@{verbatim "\<delta>"}@{text "  |"} \\
   126           &   & @{verbatim "\<epsilon>"}@{text "  |  "}@{verbatim "\<zeta>"}@{text "  |  "}@{verbatim "\<eta>"}@{text "  |  "}@{verbatim "\<theta>"}@{text "  |"} \\
   127           &   & @{verbatim "\<iota>"}@{text "  |  "}@{verbatim "\<kappa>"}@{text "  |  "}@{verbatim "\<mu>"}@{text "  |  "}@{verbatim "\<nu>"}@{text "  |"} \\
   128           &   & @{verbatim "\<xi>"}@{text "  |  "}@{verbatim "\<pi>"}@{text "  |  "}@{verbatim "\<rho>"}@{text "  |  "}@{verbatim "\<sigma>"}@{text "  |  "}@{verbatim "\<tau>"}@{text "  |"} \\
   129           &   & @{verbatim "\<upsilon>"}@{text "  |  "}@{verbatim "\<phi>"}@{text "  |  "}@{verbatim "\<chi>"}@{text "  |  "}@{verbatim "\<psi>"}@{text "  |"} \\
   130           &   & @{verbatim "\<omega>"}@{text "  |  "}@{verbatim "\<Gamma>"}@{text "  |  "}@{verbatim "\<Delta>"}@{text "  |  "}@{verbatim "\<Theta>"}@{text "  |"} \\
   131           &   & @{verbatim "\<Lambda>"}@{text "  |  "}@{verbatim "\<Xi>"}@{text "  |  "}@{verbatim "\<Pi>"}@{text "  |  "}@{verbatim "\<Sigma>"}@{text "  |"} \\
   132           &   & @{verbatim "\<Upsilon>"}@{text "  |  "}@{verbatim "\<Phi>"}@{text "  |  "}@{verbatim "\<Psi>"}@{text "  |  "}@{verbatim "\<Omega>"} \\
   133   \end{supertabular}
   134   \end{center}
   135 
   136   A @{syntax_ref var} or @{syntax_ref typevar} describes an unknown,
   137   which is internally a pair of base name and index (ML type @{ML_type
   138   indexname}).  These components are either separated by a dot as in
   139   @{text "?x.1"} or @{text "?x7.3"} or run together as in @{text
   140   "?x1"}.  The latter form is possible if the base name does not end
   141   with digits.  If the index is 0, it may be dropped altogether:
   142   @{text "?x"} and @{text "?x0"} and @{text "?x.0"} all refer to the
   143   same unknown, with basename @{text "x"} and index 0.
   144 
   145   The syntax of @{syntax_ref string} admits any characters, including
   146   newlines; ``@{verbatim "\""}'' (double-quote) and ``@{verbatim
   147   "\\"}'' (backslash) need to be escaped by a backslash; arbitrary
   148   character codes may be specified as ``@{verbatim "\\"}@{text ddd}'',
   149   with three decimal digits.  Alternative strings according to
   150   @{syntax_ref altstring} are analogous, using single back-quotes
   151   instead.
   152 
   153   The body of @{syntax_ref verbatim} may consist of any text not
   154   containing ``@{verbatim "*"}@{verbatim "}"}''; this allows
   155   convenient inclusion of quotes without further escapes.  There is no
   156   way to escape ``@{verbatim "*"}@{verbatim "}"}''.  If the quoted
   157   text is {\LaTeX} source, one may usually add some blank or comment
   158   to avoid the critical character sequence.
   159 
   160   Source comments take the form @{verbatim "(*"}~@{text
   161   "\<dots>"}~@{verbatim "*)"} and may be nested, although the user-interface
   162   might prevent this.  Note that this form indicates source comments
   163   only, which are stripped after lexical analysis of the input.  The
   164   Isar syntax also provides proper \emph{document comments} that are
   165   considered as part of the text (see \secref{sec:comments}).
   166 
   167   Common mathematical symbols such as @{text \<forall>} are represented in
   168   Isabelle as @{verbatim \<forall>}.  There are infinitely many Isabelle
   169   symbols like this, although proper presentation is left to front-end
   170   tools such as {\LaTeX} or Proof~General with the X-Symbol package.
   171   A list of predefined Isabelle symbols that work well with these
   172   tools is given in \appref{app:symbols}.  Note that @{verbatim "\<lambda>"}
   173   does not belong to the @{text letter} category, since it is already
   174   used differently in the Pure term language.
   175 *}
   176 
   177 
   178 section {* Common syntax entities *}
   179 
   180 text {*
   181   We now introduce several basic syntactic entities, such as names,
   182   terms, and theorem specifications, which are factored out of the
   183   actual Isar language elements to be described later.
   184 *}
   185 
   186 
   187 subsection {* Names *}
   188 
   189 text {*
   190   Entity \railqtok{name} usually refers to any name of types,
   191   constants, theorems etc.\ that are to be \emph{declared} or
   192   \emph{defined} (so qualified identifiers are excluded here).  Quoted
   193   strings provide an escape for non-identifier names or those ruled
   194   out by outer syntax keywords (e.g.\ quoted @{verbatim "\"let\""}).
   195   Already existing objects are usually referenced by
   196   \railqtok{nameref}.
   197 
   198   \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
   199   \indexoutertoken{int}
   200   \begin{rail}
   201     name: ident | symident | string | nat
   202     ;
   203     parname: '(' name ')'
   204     ;
   205     nameref: name | longident
   206     ;
   207     int: nat | '-' nat
   208     ;
   209   \end{rail}
   210 *}
   211 
   212 
   213 subsection {* Comments \label{sec:comments} *}
   214 
   215 text {*
   216   Large chunks of plain \railqtok{text} are usually given
   217   \railtok{verbatim}, i.e.\ enclosed in @{verbatim "{"}@{verbatim
   218   "*"}~@{text "\<dots>"}~@{verbatim "*"}@{verbatim "}"}.  For convenience,
   219   any of the smaller text units conforming to \railqtok{nameref} are
   220   admitted as well.  A marginal \railnonterm{comment} is of the form
   221   @{verbatim "--"} \railqtok{text}.  Any number of these may occur
   222   within Isabelle/Isar commands.
   223 
   224   \indexoutertoken{text}\indexouternonterm{comment}
   225   \begin{rail}
   226     text: verbatim | nameref
   227     ;
   228     comment: '--' text
   229     ;
   230   \end{rail}
   231 *}
   232 
   233 
   234 subsection {* Type classes, sorts and arities *}
   235 
   236 text {*
   237   Classes are specified by plain names.  Sorts have a very simple
   238   inner syntax, which is either a single class name @{text c} or a
   239   list @{text "{c\<^sub>1, \<dots>, c\<^sub>n}"} referring to the
   240   intersection of these classes.  The syntax of type arities is given
   241   directly at the outer level.
   242 
   243   \indexouternonterm{sort}\indexouternonterm{arity}
   244   \indexouternonterm{classdecl}
   245   \begin{rail}
   246     classdecl: name (('<' | subseteq) (nameref + ','))?
   247     ;
   248     sort: nameref
   249     ;
   250     arity: ('(' (sort + ',') ')')? sort
   251     ;
   252   \end{rail}
   253 *}
   254 
   255 
   256 subsection {* Types and terms \label{sec:types-terms} *}
   257 
   258 text {*
   259   The actual inner Isabelle syntax, that of types and terms of the
   260   logic, is far too sophisticated in order to be modelled explicitly
   261   at the outer theory level.  Basically, any such entity has to be
   262   quoted to turn it into a single token (the parsing and type-checking
   263   is performed internally later).  For convenience, a slightly more
   264   liberal convention is adopted: quotes may be omitted for any type or
   265   term that is already atomic at the outer level.  For example, one
   266   may just write @{verbatim x} instead of quoted @{verbatim "\"x\""}.
   267   Note that symbolic identifiers (e.g.\ @{verbatim "++"} or @{text
   268   "\<forall>"} are available as well, provided these have not been superseded
   269   by commands or other keywords already (such as @{verbatim "="} or
   270   @{verbatim "+"}).
   271 
   272   \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
   273   \begin{rail}
   274     type: nameref | typefree | typevar
   275     ;
   276     term: nameref | var
   277     ;
   278     prop: term
   279     ;
   280   \end{rail}
   281 
   282   Positional instantiations are indicated by giving a sequence of
   283   terms, or the placeholder ``@{text _}'' (underscore), which means to
   284   skip a position.
   285 
   286   \indexoutertoken{inst}\indexoutertoken{insts}
   287   \begin{rail}
   288     inst: underscore | term
   289     ;
   290     insts: (inst *)
   291     ;
   292   \end{rail}
   293 
   294   Type declarations and definitions usually refer to
   295   \railnonterm{typespec} on the left-hand side.  This models basic
   296   type constructor application at the outer syntax level.  Note that
   297   only plain postfix notation is available here, but no infixes.
   298 
   299   \indexouternonterm{typespec}
   300   \begin{rail}
   301     typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
   302     ;
   303   \end{rail}
   304 *}
   305 
   306 
   307 subsection {* Term patterns and declarations \label{sec:term-decls} *}
   308 
   309 text {*
   310   Wherever explicit propositions (or term fragments) occur in a proof
   311   text, casual binding of schematic term variables may be given
   312   specified via patterns of the form ``@{text "(\<IS> p\<^sub>1 \<dots>
   313   p\<^sub>n)"}''.  This works both for \railqtok{term} and \railqtok{prop}.
   314 
   315   \indexouternonterm{termpat}\indexouternonterm{proppat}
   316   \begin{rail}
   317     termpat: '(' ('is' term +) ')'
   318     ;
   319     proppat: '(' ('is' prop +) ')'
   320     ;
   321   \end{rail}
   322 
   323   \medskip Declarations of local variables @{text "x :: \<tau>"} and
   324   logical propositions @{text "a : \<phi>"} represent different views on
   325   the same principle of introducing a local scope.  In practice, one
   326   may usually omit the typing of \railnonterm{vars} (due to
   327   type-inference), and the naming of propositions (due to implicit
   328   references of current facts).  In any case, Isar proof elements
   329   usually admit to introduce multiple such items simultaneously.
   330 
   331   \indexouternonterm{vars}\indexouternonterm{props}
   332   \begin{rail}
   333     vars: (name+) ('::' type)?
   334     ;
   335     props: thmdecl? (prop proppat? +)
   336     ;
   337   \end{rail}
   338 
   339   The treatment of multiple declarations corresponds to the
   340   complementary focus of \railnonterm{vars} versus
   341   \railnonterm{props}.  In ``@{text "x\<^sub>1 \<dots> x\<^sub>n :: \<tau>"}''
   342   the typing refers to all variables, while in @{text "a: \<phi>\<^sub>1 \<dots>
   343   \<phi>\<^sub>n"} the naming refers to all propositions collectively.
   344   Isar language elements that refer to \railnonterm{vars} or
   345   \railnonterm{props} typically admit separate typings or namings via
   346   another level of iteration, with explicit @{keyword_ref "and"}
   347   separators; e.g.\ see @{command "fix"} and @{command "assume"} in
   348   \secref{sec:proof-context}.
   349 *}
   350 
   351 
   352 subsection {* Attributes and theorems \label{sec:syn-att} *}
   353 
   354 text {* Attributes have their own ``semi-inner'' syntax, in the sense
   355   that input conforming to \railnonterm{args} below is parsed by the
   356   attribute a second time.  The attribute argument specifications may
   357   be any sequence of atomic entities (identifiers, strings etc.), or
   358   properly bracketed argument lists.  Below \railqtok{atom} refers to
   359   any atomic entity, including any \railtok{keyword} conforming to
   360   \railtok{symident}.
   361 
   362   \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
   363   \begin{rail}
   364     atom: nameref | typefree | typevar | var | nat | keyword
   365     ;
   366     arg: atom | '(' args ')' | '[' args ']'
   367     ;
   368     args: arg *
   369     ;
   370     attributes: '[' (nameref args * ',') ']'
   371     ;
   372   \end{rail}
   373 
   374   Theorem specifications come in several flavors:
   375   \railnonterm{axmdecl} and \railnonterm{thmdecl} usually refer to
   376   axioms, assumptions or results of goal statements, while
   377   \railnonterm{thmdef} collects lists of existing theorems.  Existing
   378   theorems are given by \railnonterm{thmref} and
   379   \railnonterm{thmrefs}, the former requires an actual singleton
   380   result.
   381 
   382   There are three forms of theorem references:
   383   \begin{enumerate}
   384   
   385   \item named facts @{text "a"},
   386 
   387   \item selections from named facts @{text "a(i)"} or @{text "a(j - k)"},
   388 
   389   \item literal fact propositions using @{syntax_ref altstring} syntax
   390   @{verbatim "`"}@{text "\<phi>"}@{verbatim "`"} (see also method
   391   @{method_ref fact}).
   392 
   393   \end{enumerate}
   394 
   395   Any kind of theorem specification may include lists of attributes
   396   both on the left and right hand sides; attributes are applied to any
   397   immediately preceding fact.  If names are omitted, the theorems are
   398   not stored within the theorem database of the theory or proof
   399   context, but any given attributes are applied nonetheless.
   400 
   401   An extra pair of brackets around attributes (like ``@{text
   402   "[[simproc a]]"}'') abbreviates a theorem reference involving an
   403   internal dummy fact, which will be ignored later on.  So only the
   404   effect of the attribute on the background context will persist.
   405   This form of in-place declarations is particularly useful with
   406   commands like @{command "declare"} and @{command "using"}.
   407 
   408   \indexouternonterm{axmdecl}\indexouternonterm{thmdecl}
   409   \indexouternonterm{thmdef}\indexouternonterm{thmref}
   410   \indexouternonterm{thmrefs}\indexouternonterm{selection}
   411   \begin{rail}
   412     axmdecl: name attributes? ':'
   413     ;
   414     thmdecl: thmbind ':'
   415     ;
   416     thmdef: thmbind '='
   417     ;
   418     thmref: (nameref selection? | altstring) attributes? | '[' attributes ']'
   419     ;
   420     thmrefs: thmref +
   421     ;
   422 
   423     thmbind: name attributes | name | attributes
   424     ;
   425     selection: '(' ((nat | nat '-' nat?) + ',') ')'
   426     ;
   427   \end{rail}
   428 *}
   429 
   430 end