src/Doc/Implementation/Syntax.thy
author wenzelm
Thu, 03 Jul 2014 12:17:55 +0200
changeset 58838 94596c573b38
parent 58688 1d6d44a0583f
child 59062 7cbb28332896
permissions -rw-r--r--
misc tuning;
     1 (*:wrap=hard:maxLineLen=78:*)
     2 
     3 theory Syntax
     4 imports Base
     5 begin
     6 
     7 chapter {* Concrete syntax and type-checking *}
     8 
     9 text {* Pure @{text "\<lambda>"}-calculus as introduced in \chref{ch:logic} is
    10   an adequate foundation for logical languages --- in the tradition of
    11   \emph{higher-order abstract syntax} --- but end-users require
    12   additional means for reading and printing of terms and types.  This
    13   important add-on outside the logical core is called \emph{inner
    14   syntax} in Isabelle jargon, as opposed to the \emph{outer syntax} of
    15   the theory and proof language \cite{isabelle-isar-ref}.
    16 
    17   For example, according to \cite{church40} quantifiers are represented as
    18   higher-order constants @{text "All :: ('a \<Rightarrow> bool) \<Rightarrow> bool"} such that @{text
    19   "All (\<lambda>x::'a. B x)"} faithfully represents the idea that is displayed in
    20   Isabelle as @{text "\<forall>x::'a. B x"} via @{keyword "binder"} notation.
    21   Moreover, type-inference in the style of Hindley-Milner \cite{hindleymilner}
    22   (and extensions) enables users to write @{text "\<forall>x. B x"} concisely, when
    23   the type @{text "'a"} is already clear from the
    24   context.\footnote{Type-inference taken to the extreme can easily confuse
    25   users. Beginners often stumble over unexpectedly general types inferred by
    26   the system.}
    27 
    28   \medskip The main inner syntax operations are \emph{read} for
    29   parsing together with type-checking, and \emph{pretty} for formatted
    30   output.  See also \secref{sec:read-print}.
    31 
    32   Furthermore, the input and output syntax layers are sub-divided into
    33   separate phases for \emph{concrete syntax} versus \emph{abstract
    34   syntax}, see also \secref{sec:parse-unparse} and
    35   \secref{sec:term-check}, respectively.  This results in the
    36   following decomposition of the main operations:
    37 
    38   \begin{itemize}
    39 
    40   \item @{text "read = parse; check"}
    41 
    42   \item @{text "pretty = uncheck; unparse"}
    43 
    44   \end{itemize}
    45 
    46   For example, some specification package might thus intercept syntax
    47   processing at a well-defined stage after @{text "parse"}, to a augment the
    48   resulting pre-term before full type-reconstruction is performed by @{text
    49   "check"}. Note that the formal status of bound variables, versus free
    50   variables, versus constants must not be changed between these phases.
    51 
    52   \medskip In general, @{text check} and @{text uncheck} operate
    53   simultaneously on a list of terms. This is particular important for
    54   type-checking, to reconstruct types for several terms of the same context
    55   and scope. In contrast, @{text parse} and @{text unparse} operate separately
    56   on single terms.
    57 
    58   There are analogous operations to read and print types, with the same
    59   sub-division into phases.
    60 *}
    61 
    62 
    63 section {* Reading and pretty printing \label{sec:read-print} *}
    64 
    65 text {*
    66   Read and print operations are roughly dual to each other, such that for the
    67   user @{text "s' = pretty (read s)"} looks similar to the original source
    68   text @{text "s"}, but the details depend on many side-conditions. There are
    69   also explicit options to control the removal of type information in the
    70   output. The default configuration routinely looses information, so @{text
    71   "t' = read (pretty t)"} might fail, or produce a differently typed term, or
    72   a completely different term in the face of syntactic overloading.
    73 *}
    74 
    75 text %mlref {*
    76   \begin{mldecls}
    77   @{index_ML Syntax.read_typs: "Proof.context -> string list -> typ list"} \\
    78   @{index_ML Syntax.read_terms: "Proof.context -> string list -> term list"} \\
    79   @{index_ML Syntax.read_props: "Proof.context -> string list -> term list"} \\[0.5ex]
    80   @{index_ML Syntax.read_typ: "Proof.context -> string -> typ"} \\
    81   @{index_ML Syntax.read_term: "Proof.context -> string -> term"} \\
    82   @{index_ML Syntax.read_prop: "Proof.context -> string -> term"} \\[0.5ex]
    83   @{index_ML Syntax.pretty_typ: "Proof.context -> typ -> Pretty.T"} \\
    84   @{index_ML Syntax.pretty_term: "Proof.context -> term -> Pretty.T"} \\
    85   @{index_ML Syntax.string_of_typ: "Proof.context -> typ -> string"} \\
    86   @{index_ML Syntax.string_of_term: "Proof.context -> term -> string"} \\
    87   \end{mldecls}
    88 
    89   \begin{description}
    90 
    91   \item @{ML Syntax.read_typs}~@{text "ctxt strs"} parses and checks a
    92   simultaneous list of source strings as types of the logic.
    93 
    94   \item @{ML Syntax.read_terms}~@{text "ctxt strs"} parses and checks a
    95   simultaneous list of source strings as terms of the logic.
    96   Type-reconstruction puts all parsed terms into the same scope: types of
    97   free variables ultimately need to coincide.
    98 
    99   If particular type-constraints are required for some of the arguments, the
   100   read operations needs to be split into its parse and check phases. Then it
   101   is possible to use @{ML Type.constraint} on the intermediate pre-terms
   102   \secref{sec:term-check}.
   103 
   104   \item @{ML Syntax.read_props}~@{text "ctxt strs"} parses and checks a
   105   simultaneous list of source strings as terms of the logic, with an implicit
   106   type-constraint for each argument to enforce type @{typ prop}; this also
   107   affects the inner syntax for parsing. The remaining type-reconstruction
   108   works as for @{ML Syntax.read_terms}.
   109 
   110   \item @{ML Syntax.read_typ}, @{ML Syntax.read_term}, @{ML Syntax.read_prop}
   111   are like the simultaneous versions, but operate on a single argument only.
   112   This convenient shorthand is adequate in situations where a single item in
   113   its own scope is processed. Do not use @{ML "map o Syntax.read_term"} where
   114   @{ML Syntax.read_terms} is actually intended!
   115 
   116   \item @{ML Syntax.pretty_typ}~@{text "ctxt T"} and @{ML
   117   Syntax.pretty_term}~@{text "ctxt t"} uncheck and pretty-print the given type
   118   or term, respectively. Although the uncheck phase acts on a simultaneous
   119   list as well, this is rarely used in practice, so only the singleton case is
   120   provided as combined pretty operation. There is no distinction of term vs.\
   121   proposition.
   122 
   123   \item @{ML Syntax.string_of_typ} and @{ML Syntax.string_of_term} are
   124   convenient compositions of @{ML Syntax.pretty_typ} and @{ML
   125   Syntax.pretty_term} with @{ML Pretty.string_of} for output. The result may
   126   be concatenated with other strings, as long as there is no further
   127   formatting and line-breaking involved.
   128 
   129   \end{description}
   130 
   131   @{ML Syntax.read_term}, @{ML Syntax.read_prop}, and @{ML
   132   Syntax.string_of_term} are the most important operations in practice.
   133 
   134   \medskip Note that the string values that are passed in and out are
   135   annotated by the system, to carry further markup that is relevant for the
   136   Prover IDE \cite{isabelle-jedit}. User code should neither compose its own
   137   input strings, nor try to analyze the output strings. Conceptually this is
   138   an abstract datatype, encoded as concrete string for historical reasons.
   139 
   140   The standard way to provide the required position markup for input works via
   141   the outer syntax parser wrapper @{ML Parse.inner_syntax}, which is already
   142   part of @{ML Parse.typ}, @{ML Parse.term}, @{ML Parse.prop}. So a string
   143   obtained from one of the latter may be directly passed to the corresponding
   144   read operation: this yields PIDE markup of the input and precise positions
   145   for warning and error messages.
   146 *}
   147 
   148 
   149 section {* Parsing and unparsing \label{sec:parse-unparse} *}
   150 
   151 text {*
   152   Parsing and unparsing converts between actual source text and a certain
   153   \emph{pre-term} format, where all bindings and scopes are already resolved
   154   faithfully. Thus the names of free variables or constants are determined in
   155   the sense of the logical context, but type information might be still
   156   missing. Pre-terms support an explicit language of \emph{type constraints}
   157   that may be augmented by user code to guide the later \emph{check} phase.
   158 
   159   Actual parsing is based on traditional lexical analysis and Earley parsing
   160   for arbitrary context-free grammars. The user can specify the grammar
   161   declaratively via mixfix annotations. Moreover, there are \emph{syntax
   162   translations} that can be augmented by the user, either declaratively via
   163   @{command translations} or programmatically via @{command
   164   parse_translation}, @{command print_translation} \cite{isabelle-isar-ref}.
   165   The final scope-resolution is performed by the system, according to name
   166   spaces for types, term variables and constants determined by the context.
   167 *}
   168 
   169 text %mlref {*
   170   \begin{mldecls}
   171   @{index_ML Syntax.parse_typ: "Proof.context -> string -> typ"} \\
   172   @{index_ML Syntax.parse_term: "Proof.context -> string -> term"} \\
   173   @{index_ML Syntax.parse_prop: "Proof.context -> string -> term"} \\[0.5ex]
   174   @{index_ML Syntax.unparse_typ: "Proof.context -> typ -> Pretty.T"} \\
   175   @{index_ML Syntax.unparse_term: "Proof.context -> term -> Pretty.T"} \\
   176   \end{mldecls}
   177 
   178   \begin{description}
   179 
   180   \item @{ML Syntax.parse_typ}~@{text "ctxt str"} parses a source string as
   181   pre-type that is ready to be used with subsequent check operations.
   182 
   183   \item @{ML Syntax.parse_term}~@{text "ctxt str"} parses a source string as
   184   pre-term that is ready to be used with subsequent check operations.
   185 
   186   \item @{ML Syntax.parse_prop}~@{text "ctxt str"} parses a source string as
   187   pre-term that is ready to be used with subsequent check operations. The
   188   inner syntax category is @{typ prop} and a suitable type-constraint is
   189   included to ensure that this information is observed in subsequent type
   190   reconstruction.
   191 
   192   \item @{ML Syntax.unparse_typ}~@{text "ctxt T"} unparses a type after
   193   uncheck operations, to turn it into a pretty tree.
   194 
   195   \item @{ML Syntax.unparse_term}~@{text "ctxt T"} unparses a term after
   196   uncheck operations, to turn it into a pretty tree. There is no distinction
   197   for propositions here.
   198 
   199   \end{description}
   200 
   201   These operations always operate on a single item; use the combinator @{ML
   202   map} to apply them to a list.
   203 *}
   204 
   205 
   206 section {* Checking and unchecking \label{sec:term-check} *}
   207 
   208 text {* These operations define the transition from pre-terms and
   209   fully-annotated terms in the sense of the logical core
   210   (\chref{ch:logic}).
   211 
   212   The \emph{check} phase is meant to subsume a variety of mechanisms
   213   in the manner of ``type-inference'' or ``type-reconstruction'' or
   214   ``type-improvement'', not just type-checking in the narrow sense.
   215   The \emph{uncheck} phase is roughly dual, it prunes type-information
   216   before pretty printing.
   217 
   218   A typical add-on for the check/uncheck syntax layer is the @{command
   219   abbreviation} mechanism \cite{isabelle-isar-ref}. Here the user specifies
   220   syntactic definitions that are managed by the system as polymorphic @{text
   221   "let"} bindings. These are expanded during the @{text "check"} phase, and
   222   contracted during the @{text "uncheck"} phase, without affecting the
   223   type-assignment of the given terms.
   224 
   225   \medskip The precise meaning of type checking depends on the context ---
   226   additional check/uncheck modules might be defined in user space.
   227 
   228   For example, the @{command class} command defines a context where
   229   @{text "check"} treats certain type instances of overloaded
   230   constants according to the ``dictionary construction'' of its
   231   logical foundation.  This involves ``type improvement''
   232   (specialization of slightly too general types) and replacement by
   233   certain locale parameters.  See also \cite{Haftmann-Wenzel:2009}.
   234 *}
   235 
   236 text %mlref {*
   237   \begin{mldecls}
   238   @{index_ML Syntax.check_typs: "Proof.context -> typ list -> typ list"} \\
   239   @{index_ML Syntax.check_terms: "Proof.context -> term list -> term list"} \\
   240   @{index_ML Syntax.check_props: "Proof.context -> term list -> term list"} \\[0.5ex]
   241   @{index_ML Syntax.uncheck_typs: "Proof.context -> typ list -> typ list"} \\
   242   @{index_ML Syntax.uncheck_terms: "Proof.context -> term list -> term list"} \\
   243   \end{mldecls}
   244 
   245   \begin{description}
   246 
   247   \item @{ML Syntax.check_typs}~@{text "ctxt Ts"} checks a simultaneous list
   248   of pre-types as types of the logic.  Typically, this involves normalization
   249   of type synonyms.
   250 
   251   \item @{ML Syntax.check_terms}~@{text "ctxt ts"} checks a simultaneous list
   252   of pre-terms as terms of the logic. Typically, this involves type-inference
   253   and normalization term abbreviations. The types within the given terms are
   254   treated in the same way as for @{ML Syntax.check_typs}.
   255 
   256   Applications sometimes need to check several types and terms together. The
   257   standard approach uses @{ML Logic.mk_type} to embed the language of types
   258   into that of terms; all arguments are appended into one list of terms that
   259   is checked; afterwards the type arguments are recovered with @{ML
   260   Logic.dest_type}.
   261 
   262   \item @{ML Syntax.check_props}~@{text "ctxt ts"} checks a simultaneous list
   263   of pre-terms as terms of the logic, such that all terms are constrained by
   264   type @{typ prop}. The remaining check operation works as @{ML
   265   Syntax.check_terms} above.
   266 
   267   \item @{ML Syntax.uncheck_typs}~@{text "ctxt Ts"} unchecks a simultaneous
   268   list of types of the logic, in preparation of pretty printing.
   269 
   270   \item @{ML Syntax.uncheck_terms}~@{text "ctxt ts"} unchecks a simultaneous
   271   list of terms of the logic, in preparation of pretty printing. There is no
   272   distinction for propositions here.
   273 
   274   \end{description}
   275 
   276   These operations always operate simultaneously on a list; use the combinator
   277   @{ML singleton} to apply them to a single item.
   278 *}
   279 
   280 end