doc-src/IsarRef/Thy/Outer_Syntax.thy
author wenzelm
Mon, 02 Jun 2008 22:50:23 +0200
changeset 27040 3d3e6e07b931
parent 27037 33d95687514e
child 27050 cd8d99b9ef09
permissions -rw-r--r--
major reorganization of document structure;
     1 (* $Id$ *)
     2 
     3 theory Outer_Syntax
     4 imports Pure
     5 begin
     6 
     7 chapter {* Outer syntax *}
     8 
     9 text {*
    10   The rather generic framework of Isabelle/Isar syntax emerges from
    11   three main syntactic categories: \emph{commands} of the top-level
    12   Isar engine (covering theory and proof elements), \emph{methods} for
    13   general goal refinements (analogous to traditional ``tactics''), and
    14   \emph{attributes} for operations on facts (within a certain
    15   context).  Subsequently we give a reference of basic syntactic
    16   entities underlying Isabelle/Isar syntax in a bottom-up manner.
    17   Concrete theory and proof language elements will be introduced later
    18   on.
    19 
    20   \medskip In order to get started with writing well-formed
    21   Isabelle/Isar documents, the most important aspect to be noted is
    22   the difference of \emph{inner} versus \emph{outer} syntax.  Inner
    23   syntax is that of Isabelle types and terms of the logic, while outer
    24   syntax is that of Isabelle/Isar theory sources (specifications and
    25   proofs).  As a general rule, inner syntax entities may occur only as
    26   \emph{atomic entities} within outer syntax.  For example, the string
    27   @{verbatim "\"x + y\""} and identifier @{verbatim z} are legal term
    28   specifications within a theory, while @{verbatim "x + y"} without
    29   quotes is not.
    30 
    31   Printed theory documents usually omit quotes to gain readability
    32   (this is a matter of {\LaTeX} macro setup, say via @{verbatim
    33   "\\isabellestyle"}, see also \cite{isabelle-sys}).  Experienced
    34   users of Isabelle/Isar may easily reconstruct the lost technical
    35   information, while mere readers need not care about quotes at all.
    36 
    37   \medskip Isabelle/Isar input may contain any number of input
    38   termination characters ``@{verbatim ";"}'' (semicolon) to separate
    39   commands explicitly.  This is particularly useful in interactive
    40   shell sessions to make clear where the current command is intended
    41   to end.  Otherwise, the interpreter loop will continue to issue a
    42   secondary prompt ``@{verbatim "#"}'' until an end-of-command is
    43   clearly recognized from the input syntax, e.g.\ encounter of the
    44   next command keyword.
    45 
    46   More advanced interfaces such as Proof~General \cite{proofgeneral}
    47   do not require explicit semicolons, the amount of input text is
    48   determined automatically by inspecting the present content of the
    49   Emacs text buffer.  In the printed presentation of Isabelle/Isar
    50   documents semicolons are omitted altogether for readability.
    51 
    52   \begin{warn}
    53     Proof~General requires certain syntax classification tables in
    54     order to achieve properly synchronized interaction with the
    55     Isabelle/Isar process.  These tables need to be consistent with
    56     the Isabelle version and particular logic image to be used in a
    57     running session (common object-logics may well change the outer
    58     syntax).  The standard setup should work correctly with any of the
    59     ``official'' logic images derived from Isabelle/HOL (including
    60     HOLCF etc.).  Users of alternative logics may need to tell
    61     Proof~General explicitly, e.g.\ by giving an option @{verbatim "-k ZF"}
    62     (in conjunction with @{verbatim "-l ZF"}, to specify the default
    63     logic image).  Note that option @{verbatim "-L"} does both
    64     of this at the same time.
    65   \end{warn}
    66 *}
    67 
    68 
    69 section {* Lexical matters \label{sec:lex-syntax} *}
    70 
    71 text {*
    72   The Isabelle/Isar outer syntax provides token classes as presented
    73   below; most of these coincide with the inner lexical syntax as
    74   presented in \cite{isabelle-ref}.
    75 
    76   \begin{matharray}{rcl}
    77     @{syntax_def ident} & = & letter\,quasiletter^* \\
    78     @{syntax_def longident} & = & ident (\verb,.,ident)^+ \\
    79     @{syntax_def symident} & = & sym^+ ~|~ \verb,\,\verb,<,ident\verb,>, \\
    80     @{syntax_def nat} & = & digit^+ \\
    81     @{syntax_def var} & = & ident ~|~ \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\
    82     @{syntax_def typefree} & = & \verb,',ident \\
    83     @{syntax_def typevar} & = & typefree ~|~ \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\
    84     @{syntax_def string} & = & \verb,", ~\dots~ \verb,", \\
    85     @{syntax_def altstring} & = & \backquote ~\dots~ \backquote \\
    86     @{syntax_def verbatim} & = & \verb,{*, ~\dots~ \verb,*,\verb,}, \\[1ex]
    87 
    88     letter & = & latin ~|~ \verb,\,\verb,<,latin\verb,>, ~|~ \verb,\,\verb,<,latin\,latin\verb,>, ~|~ greek ~|~ \\
    89            &   & \verb,\<^isub>, ~|~ \verb,\<^isup>, \\
    90     quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\
    91     latin & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\
    92     digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\
    93     sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~
    94      \verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \\
    95     & & \verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \texttt{\at} ~|~
    96     \verb,^, ~|~ \verb,_, ~|~ \verb,|, ~|~ \verb,~, \\
    97     greek & = & \verb,\<alpha>, ~|~ \verb,\<beta>, ~|~ \verb,\<gamma>, ~|~ \verb,\<delta>, ~| \\
    98           &   & \verb,\<epsilon>, ~|~ \verb,\<zeta>, ~|~ \verb,\<eta>, ~|~ \verb,\<theta>, ~| \\
    99           &   & \verb,\<iota>, ~|~ \verb,\<kappa>, ~|~ \verb,\<mu>, ~|~ \verb,\<nu>, ~| \\
   100           &   & \verb,\<xi>, ~|~ \verb,\<pi>, ~|~ \verb,\<rho>, ~|~ \verb,\<sigma>, ~|~ \verb,\<tau>, ~| \\
   101           &   & \verb,\<upsilon>, ~|~ \verb,\<phi>, ~|~ \verb,\<chi>, ~|~ \verb,\<psi>, ~| \\
   102           &   & \verb,\<omega>, ~|~ \verb,\<Gamma>, ~|~ \verb,\<Delta>, ~|~ \verb,\<Theta>, ~| \\
   103           &   & \verb,\<Lambda>, ~|~ \verb,\<Xi>, ~|~ \verb,\<Pi>, ~|~ \verb,\<Sigma>, ~| \\
   104           &   & \verb,\<Upsilon>, ~|~ \verb,\<Phi>, ~|~ \verb,\<Psi>, ~|~ \verb,\<Omega>, \\
   105   \end{matharray}
   106 
   107   The syntax of @{syntax string} admits any characters, including
   108   newlines; ``@{verbatim "\""}'' (double-quote) and ``@{verbatim
   109   "\\"}'' (backslash) need to be escaped by a backslash; arbitrary
   110   character codes may be specified as ``@{verbatim "\\"}@{text ddd}'',
   111   with three decimal digits.  Alternative strings according to
   112   @{syntax altstring} are analogous, using single back-quotes instead.
   113   The body of @{syntax verbatim} may consist of any text not
   114   containing ``@{verbatim "*"}@{verbatim "}"}''; this allows
   115   convenient inclusion of quotes without further escapes.  The greek
   116   letters do \emph{not} include @{verbatim "\<lambda>"}, which is already used
   117   differently in the meta-logic.
   118 
   119   Common mathematical symbols such as @{text \<forall>} are represented in
   120   Isabelle as @{verbatim \<forall>}.  There are infinitely many Isabelle
   121   symbols like this, although proper presentation is left to front-end
   122   tools such as {\LaTeX} or Proof~General with the X-Symbol package.
   123   A list of standard Isabelle symbols that work well with these tools
   124   is given in \cite[appendix~A]{isabelle-sys}.
   125   
   126   Source comments take the form @{verbatim "(*"}~@{text
   127   "\<dots>"}~@{verbatim "*)"} and may be nested, although user-interface
   128   tools might prevent this.  Note that this form indicates source
   129   comments only, which are stripped after lexical analysis of the
   130   input.  The Isar document syntax also provides formal comments that
   131   are considered as part of the text (see \secref{sec:comments}).
   132 *}
   133 
   134 
   135 section {* Common syntax entities *}
   136 
   137 text {*
   138   We now introduce several basic syntactic entities, such as names,
   139   terms, and theorem specifications, which are factored out of the
   140   actual Isar language elements to be described later.
   141 *}
   142 
   143 
   144 subsection {* Names *}
   145 
   146 text {*
   147   Entity \railqtok{name} usually refers to any name of types,
   148   constants, theorems etc.\ that are to be \emph{declared} or
   149   \emph{defined} (so qualified identifiers are excluded here).  Quoted
   150   strings provide an escape for non-identifier names or those ruled
   151   out by outer syntax keywords (e.g.\ quoted @{verbatim "\"let\""}).
   152   Already existing objects are usually referenced by
   153   \railqtok{nameref}.
   154 
   155   \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
   156   \indexoutertoken{int}
   157   \begin{rail}
   158     name: ident | symident | string | nat
   159     ;
   160     parname: '(' name ')'
   161     ;
   162     nameref: name | longident
   163     ;
   164     int: nat | '-' nat
   165     ;
   166   \end{rail}
   167 *}
   168 
   169 
   170 subsection {* Comments \label{sec:comments} *}
   171 
   172 text {*
   173   Large chunks of plain \railqtok{text} are usually given
   174   \railtok{verbatim}, i.e.\ enclosed in @{verbatim "{"}@{verbatim
   175   "*"}~@{text "\<dots>"}~@{verbatim "*"}@{verbatim "}"}.  For convenience,
   176   any of the smaller text units conforming to \railqtok{nameref} are
   177   admitted as well.  A marginal \railnonterm{comment} is of the form
   178   @{verbatim "--"} \railqtok{text}.  Any number of these may occur
   179   within Isabelle/Isar commands.
   180 
   181   \indexoutertoken{text}\indexouternonterm{comment}
   182   \begin{rail}
   183     text: verbatim | nameref
   184     ;
   185     comment: '--' text
   186     ;
   187   \end{rail}
   188 *}
   189 
   190 
   191 subsection {* Type classes, sorts and arities *}
   192 
   193 text {*
   194   Classes are specified by plain names.  Sorts have a very simple
   195   inner syntax, which is either a single class name @{text c} or a
   196   list @{text "{c\<^sub>1, \<dots>, c\<^sub>n}"} referring to the
   197   intersection of these classes.  The syntax of type arities is given
   198   directly at the outer level.
   199 
   200   \indexouternonterm{sort}\indexouternonterm{arity}
   201   \indexouternonterm{classdecl}
   202   \begin{rail}
   203     classdecl: name (('<' | subseteq) (nameref + ','))?
   204     ;
   205     sort: nameref
   206     ;
   207     arity: ('(' (sort + ',') ')')? sort
   208     ;
   209   \end{rail}
   210 *}
   211 
   212 
   213 subsection {* Types and terms \label{sec:types-terms} *}
   214 
   215 text {*
   216   The actual inner Isabelle syntax, that of types and terms of the
   217   logic, is far too sophisticated in order to be modelled explicitly
   218   at the outer theory level.  Basically, any such entity has to be
   219   quoted to turn it into a single token (the parsing and type-checking
   220   is performed internally later).  For convenience, a slightly more
   221   liberal convention is adopted: quotes may be omitted for any type or
   222   term that is already atomic at the outer level.  For example, one
   223   may just write @{verbatim x} instead of quoted @{verbatim "\"x\""}.
   224   Note that symbolic identifiers (e.g.\ @{verbatim "++"} or @{text
   225   "\<forall>"} are available as well, provided these have not been superseded
   226   by commands or other keywords already (such as @{verbatim "="} or
   227   @{verbatim "+"}).
   228 
   229   \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
   230   \begin{rail}
   231     type: nameref | typefree | typevar
   232     ;
   233     term: nameref | var
   234     ;
   235     prop: term
   236     ;
   237   \end{rail}
   238 
   239   Positional instantiations are indicated by giving a sequence of
   240   terms, or the placeholder ``@{text _}'' (underscore), which means to
   241   skip a position.
   242 
   243   \indexoutertoken{inst}\indexoutertoken{insts}
   244   \begin{rail}
   245     inst: underscore | term
   246     ;
   247     insts: (inst *)
   248     ;
   249   \end{rail}
   250 
   251   Type declarations and definitions usually refer to
   252   \railnonterm{typespec} on the left-hand side.  This models basic
   253   type constructor application at the outer syntax level.  Note that
   254   only plain postfix notation is available here, but no infixes.
   255 
   256   \indexouternonterm{typespec}
   257   \begin{rail}
   258     typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
   259     ;
   260   \end{rail}
   261 *}
   262 
   263 
   264 subsection {* Mixfix annotations *}
   265 
   266 text {*
   267   Mixfix annotations specify concrete \emph{inner} syntax of Isabelle
   268   types and terms.  Some commands such as @{command "types"} (see
   269   \secref{sec:types-pure}) admit infixes only, while @{command
   270   "consts"} (see \secref{sec:consts}) and @{command "syntax"} (see
   271   \secref{sec:syn-trans}) support the full range of general mixfixes
   272   and binders.
   273 
   274   \indexouternonterm{infix}\indexouternonterm{mixfix}\indexouternonterm{structmixfix}
   275   \begin{rail}
   276     infix: '(' ('infix' | 'infixl' | 'infixr') string? nat ')'
   277     ;
   278     mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'
   279     ;
   280     structmixfix: mixfix | '(' 'structure' ')'
   281     ;
   282 
   283     prios: '[' (nat + ',') ']'
   284     ;
   285   \end{rail}
   286 
   287   Here the \railtok{string} specifications refer to the actual mixfix
   288   template (see also \cite{isabelle-ref}), which may include literal
   289   text, spacing, blocks, and arguments (denoted by ``@{text _}''); the
   290   special symbol ``@{verbatim "\<index>"}'' (printed as ``@{text "\<index>"}'')
   291   represents an index argument that specifies an implicit structure
   292   reference (see also \secref{sec:locale}).  Infix and binder
   293   declarations provide common abbreviations for particular mixfix
   294   declarations.  So in practice, mixfix templates mostly degenerate to
   295   literal text for concrete syntax, such as ``@{verbatim "++"}'' for
   296   an infix symbol, or ``@{verbatim "++"}@{text "\<index>"}'' for an infix of
   297   an implicit structure.
   298 *}
   299 
   300 
   301 subsection {* Proof methods \label{sec:syn-meth} *}
   302 
   303 text {*
   304   Proof methods are either basic ones, or expressions composed of
   305   methods via ``@{verbatim ","}'' (sequential composition),
   306   ``@{verbatim "|"}'' (alternative choices), ``@{verbatim "?"}'' 
   307   (try), ``@{verbatim "+"}'' (repeat at least once), ``@{verbatim
   308   "["}@{text n}@{verbatim "]"}'' (restriction to first @{text n}
   309   sub-goals, with default @{text "n = 1"}).  In practice, proof
   310   methods are usually just a comma separated list of
   311   \railqtok{nameref}~\railnonterm{args} specifications.  Note that
   312   parentheses may be dropped for single method specifications (with no
   313   arguments).
   314 
   315   \indexouternonterm{method}
   316   \begin{rail}
   317     method: (nameref | '(' methods ')') (() | '?' | '+' | '[' nat? ']')
   318     ;
   319     methods: (nameref args | method) + (',' | '|')
   320     ;
   321   \end{rail}
   322 
   323   Proper Isar proof methods do \emph{not} admit arbitrary goal
   324   addressing, but refer either to the first sub-goal or all sub-goals
   325   uniformly.  The goal restriction operator ``@{text "[n]"}''
   326   evaluates a method expression within a sandbox consisting of the
   327   first @{text n} sub-goals (which need to exist).  For example, the
   328   method ``@{text "simp_all[3]"}'' simplifies the first three
   329   sub-goals, while ``@{text "(rule foo, simp_all)[]"}'' simplifies all
   330   new goals that emerge from applying rule @{text "foo"} to the
   331   originally first one.
   332 
   333   Improper methods, notably tactic emulations, offer a separate
   334   low-level goal addressing scheme as explicit argument to the
   335   individual tactic being involved.  Here ``@{text "[!]"}'' refers to
   336   all goals, and ``@{text "[n-]"}'' to all goals starting from @{text
   337   "n"}.
   338 
   339   \indexouternonterm{goalspec}
   340   \begin{rail}
   341     goalspec: '[' (nat '-' nat | nat '-' | nat | '!' ) ']'
   342     ;
   343   \end{rail}
   344 *}
   345 
   346 
   347 subsection {* Attributes and theorems \label{sec:syn-att} *}
   348 
   349 text {*
   350   Attributes (and proof methods, see \secref{sec:syn-meth}) have their
   351   own ``semi-inner'' syntax, in the sense that input conforming to
   352   \railnonterm{args} below is parsed by the attribute a second time.
   353   The attribute argument specifications may be any sequence of atomic
   354   entities (identifiers, strings etc.), or properly bracketed argument
   355   lists.  Below \railqtok{atom} refers to any atomic entity, including
   356   any \railtok{keyword} conforming to \railtok{symident}.
   357 
   358   \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
   359   \begin{rail}
   360     atom: nameref | typefree | typevar | var | nat | keyword
   361     ;
   362     arg: atom | '(' args ')' | '[' args ']'
   363     ;
   364     args: arg *
   365     ;
   366     attributes: '[' (nameref args * ',') ']'
   367     ;
   368   \end{rail}
   369 
   370   Theorem specifications come in several flavors:
   371   \railnonterm{axmdecl} and \railnonterm{thmdecl} usually refer to
   372   axioms, assumptions or results of goal statements, while
   373   \railnonterm{thmdef} collects lists of existing theorems.  Existing
   374   theorems are given by \railnonterm{thmref} and
   375   \railnonterm{thmrefs}, the former requires an actual singleton
   376   result.
   377 
   378   There are three forms of theorem references:
   379   \begin{enumerate}
   380   
   381   \item named facts @{text "a"},
   382 
   383   \item selections from named facts @{text "a(i)"} or @{text "a(j - k)"},
   384 
   385   \item literal fact propositions using @{syntax_ref altstring} syntax
   386   @{verbatim "`"}@{text "\<phi>"}@{verbatim "`"} (see also method
   387   @{method_ref fact} in \secref{sec:pure-meth-att}).
   388 
   389   \end{enumerate}
   390 
   391   Any kind of theorem specification may include lists of attributes
   392   both on the left and right hand sides; attributes are applied to any
   393   immediately preceding fact.  If names are omitted, the theorems are
   394   not stored within the theorem database of the theory or proof
   395   context, but any given attributes are applied nonetheless.
   396 
   397   An extra pair of brackets around attributes (like ``@{text
   398   "[[simproc a]]"}'') abbreviates a theorem reference involving an
   399   internal dummy fact, which will be ignored later on.  So only the
   400   effect of the attribute on the background context will persist.
   401   This form of in-place declarations is particularly useful with
   402   commands like @{command "declare"} and @{command "using"}.
   403 
   404   \indexouternonterm{axmdecl}\indexouternonterm{thmdecl}
   405   \indexouternonterm{thmdef}\indexouternonterm{thmref}
   406   \indexouternonterm{thmrefs}\indexouternonterm{selection}
   407   \begin{rail}
   408     axmdecl: name attributes? ':'
   409     ;
   410     thmdecl: thmbind ':'
   411     ;
   412     thmdef: thmbind '='
   413     ;
   414     thmref: (nameref selection? | altstring) attributes? | '[' attributes ']'
   415     ;
   416     thmrefs: thmref +
   417     ;
   418 
   419     thmbind: name attributes | name | attributes
   420     ;
   421     selection: '(' ((nat | nat '-' nat?) + ',') ')'
   422     ;
   423   \end{rail}
   424 *}
   425 
   426 
   427 subsection {* Term patterns and declarations \label{sec:term-decls} *}
   428 
   429 text {*
   430   Wherever explicit propositions (or term fragments) occur in a proof
   431   text, casual binding of schematic term variables may be given
   432   specified via patterns of the form ``@{text "(\<IS> p\<^sub>1 \<dots>
   433   p\<^sub>n)"}''.  This works both for \railqtok{term} and \railqtok{prop}.
   434 
   435   \indexouternonterm{termpat}\indexouternonterm{proppat}
   436   \begin{rail}
   437     termpat: '(' ('is' term +) ')'
   438     ;
   439     proppat: '(' ('is' prop +) ')'
   440     ;
   441   \end{rail}
   442 
   443   \medskip Declarations of local variables @{text "x :: \<tau>"} and
   444   logical propositions @{text "a : \<phi>"} represent different views on
   445   the same principle of introducing a local scope.  In practice, one
   446   may usually omit the typing of \railnonterm{vars} (due to
   447   type-inference), and the naming of propositions (due to implicit
   448   references of current facts).  In any case, Isar proof elements
   449   usually admit to introduce multiple such items simultaneously.
   450 
   451   \indexouternonterm{vars}\indexouternonterm{props}
   452   \begin{rail}
   453     vars: (name+) ('::' type)?
   454     ;
   455     props: thmdecl? (prop proppat? +)
   456     ;
   457   \end{rail}
   458 
   459   The treatment of multiple declarations corresponds to the
   460   complementary focus of \railnonterm{vars} versus
   461   \railnonterm{props}.  In ``@{text "x\<^sub>1 \<dots> x\<^sub>n :: \<tau>"}''
   462   the typing refers to all variables, while in @{text "a: \<phi>\<^sub>1 \<dots>
   463   \<phi>\<^sub>n"} the naming refers to all propositions collectively.
   464   Isar language elements that refer to \railnonterm{vars} or
   465   \railnonterm{props} typically admit separate typings or namings via
   466   another level of iteration, with explicit @{keyword_ref "and"}
   467   separators; e.g.\ see @{command "fix"} and @{command "assume"} in
   468   \secref{sec:proof-context}.
   469 *}
   470 
   471 end