1 (*:wrap=hard:maxLineLen=78:*)
7 chapter {* Concrete syntax and type-checking *}
9 text {* Pure @{text "\<lambda>"}-calculus as introduced in \chref{ch:logic} is
10 an adequate foundation for logical languages --- in the tradition of
11 \emph{higher-order abstract syntax} --- but end-users require
12 additional means for reading and printing of terms and types. This
13 important add-on outside the logical core is called \emph{inner
14 syntax} in Isabelle jargon, as opposed to the \emph{outer syntax} of
15 the theory and proof language \cite{isabelle-isar-ref}.
17 For example, according to \cite{church40} quantifiers are represented as
18 higher-order constants @{text "All :: ('a \<Rightarrow> bool) \<Rightarrow> bool"} such that @{text
19 "All (\<lambda>x::'a. B x)"} faithfully represents the idea that is displayed in
20 Isabelle as @{text "\<forall>x::'a. B x"} via @{keyword "binder"} notation.
21 Moreover, type-inference in the style of Hindley-Milner \cite{hindleymilner}
22 (and extensions) enables users to write @{text "\<forall>x. B x"} concisely, when
23 the type @{text "'a"} is already clear from the
24 context.\footnote{Type-inference taken to the extreme can easily confuse
25 users. Beginners often stumble over unexpectedly general types inferred by
28 \medskip The main inner syntax operations are \emph{read} for
29 parsing together with type-checking, and \emph{pretty} for formatted
30 output. See also \secref{sec:read-print}.
32 Furthermore, the input and output syntax layers are sub-divided into
33 separate phases for \emph{concrete syntax} versus \emph{abstract
34 syntax}, see also \secref{sec:parse-unparse} and
35 \secref{sec:term-check}, respectively. This results in the
36 following decomposition of the main operations:
40 \item @{text "read = parse; check"}
42 \item @{text "pretty = uncheck; unparse"}
46 For example, some specification package might thus intercept syntax
47 processing at a well-defined stage after @{text "parse"}, to a augment the
48 resulting pre-term before full type-reconstruction is performed by @{text
49 "check"}. Note that the formal status of bound variables, versus free
50 variables, versus constants must not be changed between these phases.
52 \medskip In general, @{text check} and @{text uncheck} operate
53 simultaneously on a list of terms. This is particular important for
54 type-checking, to reconstruct types for several terms of the same context
55 and scope. In contrast, @{text parse} and @{text unparse} operate separately
58 There are analogous operations to read and print types, with the same
59 sub-division into phases.
63 section {* Reading and pretty printing \label{sec:read-print} *}
66 Read and print operations are roughly dual to each other, such that for the
67 user @{text "s' = pretty (read s)"} looks similar to the original source
68 text @{text "s"}, but the details depend on many side-conditions. There are
69 also explicit options to control the removal of type information in the
70 output. The default configuration routinely looses information, so @{text
71 "t' = read (pretty t)"} might fail, or produce a differently typed term, or
72 a completely different term in the face of syntactic overloading.
77 @{index_ML Syntax.read_typs: "Proof.context -> string list -> typ list"} \\
78 @{index_ML Syntax.read_terms: "Proof.context -> string list -> term list"} \\
79 @{index_ML Syntax.read_props: "Proof.context -> string list -> term list"} \\[0.5ex]
80 @{index_ML Syntax.read_typ: "Proof.context -> string -> typ"} \\
81 @{index_ML Syntax.read_term: "Proof.context -> string -> term"} \\
82 @{index_ML Syntax.read_prop: "Proof.context -> string -> term"} \\[0.5ex]
83 @{index_ML Syntax.pretty_typ: "Proof.context -> typ -> Pretty.T"} \\
84 @{index_ML Syntax.pretty_term: "Proof.context -> term -> Pretty.T"} \\
85 @{index_ML Syntax.string_of_typ: "Proof.context -> typ -> string"} \\
86 @{index_ML Syntax.string_of_term: "Proof.context -> term -> string"} \\
91 \item @{ML Syntax.read_typs}~@{text "ctxt strs"} parses and checks a
92 simultaneous list of source strings as types of the logic.
94 \item @{ML Syntax.read_terms}~@{text "ctxt strs"} parses and checks a
95 simultaneous list of source strings as terms of the logic.
96 Type-reconstruction puts all parsed terms into the same scope: types of
97 free variables ultimately need to coincide.
99 If particular type-constraints are required for some of the arguments, the
100 read operations needs to be split into its parse and check phases. Then it
101 is possible to use @{ML Type.constraint} on the intermediate pre-terms
102 \secref{sec:term-check}.
104 \item @{ML Syntax.read_props}~@{text "ctxt strs"} parses and checks a
105 simultaneous list of source strings as terms of the logic, with an implicit
106 type-constraint for each argument to enforce type @{typ prop}; this also
107 affects the inner syntax for parsing. The remaining type-reconstruction
108 works as for @{ML Syntax.read_terms}.
110 \item @{ML Syntax.read_typ}, @{ML Syntax.read_term}, @{ML Syntax.read_prop}
111 are like the simultaneous versions, but operate on a single argument only.
112 This convenient shorthand is adequate in situations where a single item in
113 its own scope is processed. Do not use @{ML "map o Syntax.read_term"} where
114 @{ML Syntax.read_terms} is actually intended!
116 \item @{ML Syntax.pretty_typ}~@{text "ctxt T"} and @{ML
117 Syntax.pretty_term}~@{text "ctxt t"} uncheck and pretty-print the given type
118 or term, respectively. Although the uncheck phase acts on a simultaneous
119 list as well, this is rarely used in practice, so only the singleton case is
120 provided as combined pretty operation. There is no distinction of term vs.\
123 \item @{ML Syntax.string_of_typ} and @{ML Syntax.string_of_term} are
124 convenient compositions of @{ML Syntax.pretty_typ} and @{ML
125 Syntax.pretty_term} with @{ML Pretty.string_of} for output. The result may
126 be concatenated with other strings, as long as there is no further
127 formatting and line-breaking involved.
131 @{ML Syntax.read_term}, @{ML Syntax.read_prop}, and @{ML
132 Syntax.string_of_term} are the most important operations in practice.
134 \medskip Note that the string values that are passed in and out are
135 annotated by the system, to carry further markup that is relevant for the
136 Prover IDE \cite{isabelle-jedit}. User code should neither compose its own
137 input strings, nor try to analyze the output strings. Conceptually this is
138 an abstract datatype, encoded as concrete string for historical reasons.
140 The standard way to provide the required position markup for input works via
141 the outer syntax parser wrapper @{ML Parse.inner_syntax}, which is already
142 part of @{ML Parse.typ}, @{ML Parse.term}, @{ML Parse.prop}. So a string
143 obtained from one of the latter may be directly passed to the corresponding
144 read operation: this yields PIDE markup of the input and precise positions
145 for warning and error messages.
149 section {* Parsing and unparsing \label{sec:parse-unparse} *}
152 Parsing and unparsing converts between actual source text and a certain
153 \emph{pre-term} format, where all bindings and scopes are already resolved
154 faithfully. Thus the names of free variables or constants are determined in
155 the sense of the logical context, but type information might be still
156 missing. Pre-terms support an explicit language of \emph{type constraints}
157 that may be augmented by user code to guide the later \emph{check} phase.
159 Actual parsing is based on traditional lexical analysis and Earley parsing
160 for arbitrary context-free grammars. The user can specify the grammar
161 declaratively via mixfix annotations. Moreover, there are \emph{syntax
162 translations} that can be augmented by the user, either declaratively via
163 @{command translations} or programmatically via @{command
164 parse_translation}, @{command print_translation} \cite{isabelle-isar-ref}.
165 The final scope-resolution is performed by the system, according to name
166 spaces for types, term variables and constants determined by the context.
171 @{index_ML Syntax.parse_typ: "Proof.context -> string -> typ"} \\
172 @{index_ML Syntax.parse_term: "Proof.context -> string -> term"} \\
173 @{index_ML Syntax.parse_prop: "Proof.context -> string -> term"} \\[0.5ex]
174 @{index_ML Syntax.unparse_typ: "Proof.context -> typ -> Pretty.T"} \\
175 @{index_ML Syntax.unparse_term: "Proof.context -> term -> Pretty.T"} \\
180 \item @{ML Syntax.parse_typ}~@{text "ctxt str"} parses a source string as
181 pre-type that is ready to be used with subsequent check operations.
183 \item @{ML Syntax.parse_term}~@{text "ctxt str"} parses a source string as
184 pre-term that is ready to be used with subsequent check operations.
186 \item @{ML Syntax.parse_prop}~@{text "ctxt str"} parses a source string as
187 pre-term that is ready to be used with subsequent check operations. The
188 inner syntax category is @{typ prop} and a suitable type-constraint is
189 included to ensure that this information is observed in subsequent type
192 \item @{ML Syntax.unparse_typ}~@{text "ctxt T"} unparses a type after
193 uncheck operations, to turn it into a pretty tree.
195 \item @{ML Syntax.unparse_term}~@{text "ctxt T"} unparses a term after
196 uncheck operations, to turn it into a pretty tree. There is no distinction
197 for propositions here.
201 These operations always operate on a single item; use the combinator @{ML
202 map} to apply them to a list.
206 section {* Checking and unchecking \label{sec:term-check} *}
208 text {* These operations define the transition from pre-terms and
209 fully-annotated terms in the sense of the logical core
212 The \emph{check} phase is meant to subsume a variety of mechanisms
213 in the manner of ``type-inference'' or ``type-reconstruction'' or
214 ``type-improvement'', not just type-checking in the narrow sense.
215 The \emph{uncheck} phase is roughly dual, it prunes type-information
216 before pretty printing.
218 A typical add-on for the check/uncheck syntax layer is the @{command
219 abbreviation} mechanism \cite{isabelle-isar-ref}. Here the user specifies
220 syntactic definitions that are managed by the system as polymorphic @{text
221 "let"} bindings. These are expanded during the @{text "check"} phase, and
222 contracted during the @{text "uncheck"} phase, without affecting the
223 type-assignment of the given terms.
225 \medskip The precise meaning of type checking depends on the context ---
226 additional check/uncheck modules might be defined in user space.
228 For example, the @{command class} command defines a context where
229 @{text "check"} treats certain type instances of overloaded
230 constants according to the ``dictionary construction'' of its
231 logical foundation. This involves ``type improvement''
232 (specialization of slightly too general types) and replacement by
233 certain locale parameters. See also \cite{Haftmann-Wenzel:2009}.
238 @{index_ML Syntax.check_typs: "Proof.context -> typ list -> typ list"} \\
239 @{index_ML Syntax.check_terms: "Proof.context -> term list -> term list"} \\
240 @{index_ML Syntax.check_props: "Proof.context -> term list -> term list"} \\[0.5ex]
241 @{index_ML Syntax.uncheck_typs: "Proof.context -> typ list -> typ list"} \\
242 @{index_ML Syntax.uncheck_terms: "Proof.context -> term list -> term list"} \\
247 \item @{ML Syntax.check_typs}~@{text "ctxt Ts"} checks a simultaneous list
248 of pre-types as types of the logic. Typically, this involves normalization
251 \item @{ML Syntax.check_terms}~@{text "ctxt ts"} checks a simultaneous list
252 of pre-terms as terms of the logic. Typically, this involves type-inference
253 and normalization term abbreviations. The types within the given terms are
254 treated in the same way as for @{ML Syntax.check_typs}.
256 Applications sometimes need to check several types and terms together. The
257 standard approach uses @{ML Logic.mk_type} to embed the language of types
258 into that of terms; all arguments are appended into one list of terms that
259 is checked; afterwards the type arguments are recovered with @{ML
262 \item @{ML Syntax.check_props}~@{text "ctxt ts"} checks a simultaneous list
263 of pre-terms as terms of the logic, such that all terms are constrained by
264 type @{typ prop}. The remaining check operation works as @{ML
265 Syntax.check_terms} above.
267 \item @{ML Syntax.uncheck_typs}~@{text "ctxt Ts"} unchecks a simultaneous
268 list of types of the logic, in preparation of pretty printing.
270 \item @{ML Syntax.uncheck_terms}~@{text "ctxt ts"} unchecks a simultaneous
271 list of terms of the logic, in preparation of pretty printing. There is no
272 distinction for propositions here.
276 These operations always operate simultaneously on a list; use the combinator
277 @{ML singleton} to apply them to a single item.