wneuper/isa: src/Doc/Implementation/Syntax.thy@94596c573b38

     1 (*:wrap=hard:maxLineLen=78:*)

     3 theory Syntax

     4 imports Base

     5 begin

     7 chapter {* Concrete syntax and type-checking *}

     9 text {* Pure @{text "\<lambda>"}-calculus as introduced in \chref{ch:logic} is

    10   an adequate foundation for logical languages --- in the tradition of

    11   \emph{higher-order abstract syntax} --- but end-users require

    12   additional means for reading and printing of terms and types.  This

    13   important add-on outside the logical core is called \emph{inner

    14   syntax} in Isabelle jargon, as opposed to the \emph{outer syntax} of

    15   the theory and proof language \cite{isabelle-isar-ref}.

    17   For example, according to \cite{church40} quantifiers are represented as

    18   higher-order constants @{text "All :: ('a \<Rightarrow> bool) \<Rightarrow> bool"} such that @{text

    19   "All (\<lambda>x::'a. B x)"} faithfully represents the idea that is displayed in

    20   Isabelle as @{text "\<forall>x::'a. B x"} via @{keyword "binder"} notation.

    21   Moreover, type-inference in the style of Hindley-Milner \cite{hindleymilner}

    22   (and extensions) enables users to write @{text "\<forall>x. B x"} concisely, when

    23   the type @{text "'a"} is already clear from the

    24   context.\footnote{Type-inference taken to the extreme can easily confuse

    25   users. Beginners often stumble over unexpectedly general types inferred by

    26   the system.}

    28   \medskip The main inner syntax operations are \emph{read} for

    29   parsing together with type-checking, and \emph{pretty} for formatted

    30   output.  See also \secref{sec:read-print}.

    32   Furthermore, the input and output syntax layers are sub-divided into

    33   separate phases for \emph{concrete syntax} versus \emph{abstract

    34   syntax}, see also \secref{sec:parse-unparse} and

    35   \secref{sec:term-check}, respectively.  This results in the

    36   following decomposition of the main operations:

    38   \begin{itemize}

    40   \item @{text "read = parse; check"}

    42   \item @{text "pretty = uncheck; unparse"}

    44   \end{itemize}

    46   For example, some specification package might thus intercept syntax

    47   processing at a well-defined stage after @{text "parse"}, to a augment the

    48   resulting pre-term before full type-reconstruction is performed by @{text

    49   "check"}. Note that the formal status of bound variables, versus free

    50   variables, versus constants must not be changed between these phases.

    52   \medskip In general, @{text check} and @{text uncheck} operate

    53   simultaneously on a list of terms. This is particular important for

    54   type-checking, to reconstruct types for several terms of the same context

    55   and scope. In contrast, @{text parse} and @{text unparse} operate separately

    56   on single terms.

    58   There are analogous operations to read and print types, with the same

    59   sub-division into phases.

    60 *}

    63 section {* Reading and pretty printing \label{sec:read-print} *}

    65 text {*

    66   Read and print operations are roughly dual to each other, such that for the

    67   user @{text "s' = pretty (read s)"} looks similar to the original source

    68   text @{text "s"}, but the details depend on many side-conditions. There are

    69   also explicit options to control the removal of type information in the

    70   output. The default configuration routinely looses information, so @{text

    71   "t' = read (pretty t)"} might fail, or produce a differently typed term, or

    72   a completely different term in the face of syntactic overloading.

    73 *}

    75 text %mlref {*

    76   \begin{mldecls}

    77   @{index_ML Syntax.read_typs: "Proof.context -> string list -> typ list"} \\

    78   @{index_ML Syntax.read_terms: "Proof.context -> string list -> term list"} \\

    79   @{index_ML Syntax.read_props: "Proof.context -> string list -> term list"} \\[0.5ex]

    80   @{index_ML Syntax.read_typ: "Proof.context -> string -> typ"} \\

    81   @{index_ML Syntax.read_term: "Proof.context -> string -> term"} \\

    82   @{index_ML Syntax.read_prop: "Proof.context -> string -> term"} \\[0.5ex]

    83   @{index_ML Syntax.pretty_typ: "Proof.context -> typ -> Pretty.T"} \\

    84   @{index_ML Syntax.pretty_term: "Proof.context -> term -> Pretty.T"} \\

    85   @{index_ML Syntax.string_of_typ: "Proof.context -> typ -> string"} \\

    86   @{index_ML Syntax.string_of_term: "Proof.context -> term -> string"} \\

    87   \end{mldecls}

    89   \begin{description}

    91   \item @{ML Syntax.read_typs}~@{text "ctxt strs"} parses and checks a

    92   simultaneous list of source strings as types of the logic.

    94   \item @{ML Syntax.read_terms}~@{text "ctxt strs"} parses and checks a

    95   simultaneous list of source strings as terms of the logic.

    96   Type-reconstruction puts all parsed terms into the same scope: types of

    97   free variables ultimately need to coincide.

    99   If particular type-constraints are required for some of the arguments, the

   100   read operations needs to be split into its parse and check phases. Then it

   101   is possible to use @{ML Type.constraint} on the intermediate pre-terms

   102   \secref{sec:term-check}.

   104   \item @{ML Syntax.read_props}~@{text "ctxt strs"} parses and checks a

   105   simultaneous list of source strings as terms of the logic, with an implicit

   106   type-constraint for each argument to enforce type @{typ prop}; this also

   107   affects the inner syntax for parsing. The remaining type-reconstruction

   108   works as for @{ML Syntax.read_terms}.

   110   \item @{ML Syntax.read_typ}, @{ML Syntax.read_term}, @{ML Syntax.read_prop}

   111   are like the simultaneous versions, but operate on a single argument only.

   112   This convenient shorthand is adequate in situations where a single item in

   113   its own scope is processed. Do not use @{ML "map o Syntax.read_term"} where

   114   @{ML Syntax.read_terms} is actually intended!

   116   \item @{ML Syntax.pretty_typ}~@{text "ctxt T"} and @{ML

   117   Syntax.pretty_term}~@{text "ctxt t"} uncheck and pretty-print the given type

   118   or term, respectively. Although the uncheck phase acts on a simultaneous

   119   list as well, this is rarely used in practice, so only the singleton case is

   120   provided as combined pretty operation. There is no distinction of term vs.\

   121   proposition.

   123   \item @{ML Syntax.string_of_typ} and @{ML Syntax.string_of_term} are

   124   convenient compositions of @{ML Syntax.pretty_typ} and @{ML

   125   Syntax.pretty_term} with @{ML Pretty.string_of} for output. The result may

   126   be concatenated with other strings, as long as there is no further

   127   formatting and line-breaking involved.

   129   \end{description}

   131   @{ML Syntax.read_term}, @{ML Syntax.read_prop}, and @{ML

   132   Syntax.string_of_term} are the most important operations in practice.

   134   \medskip Note that the string values that are passed in and out are

   135   annotated by the system, to carry further markup that is relevant for the

   136   Prover IDE \cite{isabelle-jedit}. User code should neither compose its own

   137   input strings, nor try to analyze the output strings. Conceptually this is

   138   an abstract datatype, encoded as concrete string for historical reasons.

   140   The standard way to provide the required position markup for input works via

   141   the outer syntax parser wrapper @{ML Parse.inner_syntax}, which is already

   142   part of @{ML Parse.typ}, @{ML Parse.term}, @{ML Parse.prop}. So a string

   143   obtained from one of the latter may be directly passed to the corresponding

   144   read operation: this yields PIDE markup of the input and precise positions

   145   for warning and error messages.

   146 *}

   149 section {* Parsing and unparsing \label{sec:parse-unparse} *}

   151 text {*

   152   Parsing and unparsing converts between actual source text and a certain

   153   \emph{pre-term} format, where all bindings and scopes are already resolved

   154   faithfully. Thus the names of free variables or constants are determined in

   155   the sense of the logical context, but type information might be still

   156   missing. Pre-terms support an explicit language of \emph{type constraints}

   157   that may be augmented by user code to guide the later \emph{check} phase.

   159   Actual parsing is based on traditional lexical analysis and Earley parsing

   160   for arbitrary context-free grammars. The user can specify the grammar

   161   declaratively via mixfix annotations. Moreover, there are \emph{syntax

   162   translations} that can be augmented by the user, either declaratively via

   163   @{command translations} or programmatically via @{command

   164   parse_translation}, @{command print_translation} \cite{isabelle-isar-ref}.

   165   The final scope-resolution is performed by the system, according to name

   166   spaces for types, term variables and constants determined by the context.

   167 *}

   169 text %mlref {*

   170   \begin{mldecls}

   171   @{index_ML Syntax.parse_typ: "Proof.context -> string -> typ"} \\

   172   @{index_ML Syntax.parse_term: "Proof.context -> string -> term"} \\

   173   @{index_ML Syntax.parse_prop: "Proof.context -> string -> term"} \\[0.5ex]

   174   @{index_ML Syntax.unparse_typ: "Proof.context -> typ -> Pretty.T"} \\

   175   @{index_ML Syntax.unparse_term: "Proof.context -> term -> Pretty.T"} \\

   176   \end{mldecls}

   178   \begin{description}

   180   \item @{ML Syntax.parse_typ}~@{text "ctxt str"} parses a source string as

   181   pre-type that is ready to be used with subsequent check operations.

   183   \item @{ML Syntax.parse_term}~@{text "ctxt str"} parses a source string as

   184   pre-term that is ready to be used with subsequent check operations.

   186   \item @{ML Syntax.parse_prop}~@{text "ctxt str"} parses a source string as

   187   pre-term that is ready to be used with subsequent check operations. The

   188   inner syntax category is @{typ prop} and a suitable type-constraint is

   189   included to ensure that this information is observed in subsequent type

   190   reconstruction.

   192   \item @{ML Syntax.unparse_typ}~@{text "ctxt T"} unparses a type after

   193   uncheck operations, to turn it into a pretty tree.

   195   \item @{ML Syntax.unparse_term}~@{text "ctxt T"} unparses a term after

   196   uncheck operations, to turn it into a pretty tree. There is no distinction

   197   for propositions here.

   199   \end{description}

   201   These operations always operate on a single item; use the combinator @{ML

   202   map} to apply them to a list.

   203 *}

   206 section {* Checking and unchecking \label{sec:term-check} *}

   208 text {* These operations define the transition from pre-terms and

   209   fully-annotated terms in the sense of the logical core

   210   (\chref{ch:logic}).

   212   The \emph{check} phase is meant to subsume a variety of mechanisms

   213   in the manner of ``type-inference'' or ``type-reconstruction'' or

   214   ``type-improvement'', not just type-checking in the narrow sense.

   215   The \emph{uncheck} phase is roughly dual, it prunes type-information

   216   before pretty printing.

   218   A typical add-on for the check/uncheck syntax layer is the @{command

   219   abbreviation} mechanism \cite{isabelle-isar-ref}. Here the user specifies

   220   syntactic definitions that are managed by the system as polymorphic @{text

   221   "let"} bindings. These are expanded during the @{text "check"} phase, and

   222   contracted during the @{text "uncheck"} phase, without affecting the

   223   type-assignment of the given terms.

   225   \medskip The precise meaning of type checking depends on the context ---

   226   additional check/uncheck modules might be defined in user space.

   228   For example, the @{command class} command defines a context where

   229   @{text "check"} treats certain type instances of overloaded

   230   constants according to the ``dictionary construction'' of its

   231   logical foundation.  This involves ``type improvement''

   232   (specialization of slightly too general types) and replacement by

   233   certain locale parameters.  See also \cite{Haftmann-Wenzel:2009}.

   234 *}

   236 text %mlref {*

   237   \begin{mldecls}

   238   @{index_ML Syntax.check_typs: "Proof.context -> typ list -> typ list"} \\

   239   @{index_ML Syntax.check_terms: "Proof.context -> term list -> term list"} \\

   240   @{index_ML Syntax.check_props: "Proof.context -> term list -> term list"} \\[0.5ex]

   241   @{index_ML Syntax.uncheck_typs: "Proof.context -> typ list -> typ list"} \\

   242   @{index_ML Syntax.uncheck_terms: "Proof.context -> term list -> term list"} \\

   243   \end{mldecls}

   245   \begin{description}

   247   \item @{ML Syntax.check_typs}~@{text "ctxt Ts"} checks a simultaneous list

   248   of pre-types as types of the logic.  Typically, this involves normalization

   249   of type synonyms.

   251   \item @{ML Syntax.check_terms}~@{text "ctxt ts"} checks a simultaneous list

   252   of pre-terms as terms of the logic. Typically, this involves type-inference

   253   and normalization term abbreviations. The types within the given terms are

   254   treated in the same way as for @{ML Syntax.check_typs}.

   256   Applications sometimes need to check several types and terms together. The

   257   standard approach uses @{ML Logic.mk_type} to embed the language of types

   258   into that of terms; all arguments are appended into one list of terms that

   259   is checked; afterwards the type arguments are recovered with @{ML

   260   Logic.dest_type}.

   262   \item @{ML Syntax.check_props}~@{text "ctxt ts"} checks a simultaneous list

   263   of pre-terms as terms of the logic, such that all terms are constrained by

   264   type @{typ prop}. The remaining check operation works as @{ML

   265   Syntax.check_terms} above.

   267   \item @{ML Syntax.uncheck_typs}~@{text "ctxt Ts"} unchecks a simultaneous

   268   list of types of the logic, in preparation of pretty printing.

   270   \item @{ML Syntax.uncheck_terms}~@{text "ctxt ts"} unchecks a simultaneous

   271   list of terms of the logic, in preparation of pretty printing. There is no

   272   distinction for propositions here.

   274   \end{description}

   276   These operations always operate simultaneously on a list; use the combinator

   277   @{ML singleton} to apply them to a single item.

   278 *}

   280 end

author	wenzelm
	Thu, 03 Jul 2014 12:17:55 +0200
changeset 58838	94596c573b38
parent 58688	1d6d44a0583f
child 59062	7cbb28332896
permissions	-rw-r--r--