5 section {* Turning Theories into Programs \label{sec:program} *}
7 subsection {* The @{text "Isabelle/HOL"} default setup *}
10 We have already seen how by default equations stemming from
11 @{command definition}/@{command primrec}/@{command fun}
12 statements are used for code generation. This default behaviour
13 can be changed, e.g. by providing different code equations.
14 All kinds of customisation shown in this section is \emph{safe}
15 in the sense that the user does not have to worry about
16 correctness -- all programs generatable that way are partially
20 subsection {* Selecting code equations *}
23 Coming back to our introductory example, we
24 could provide an alternative code equations for @{const dequeue}
29 "dequeue (AQueue xs []) =
30 (if xs = [] then (None, AQueue [] [])
31 else dequeue (AQueue [] (rev xs)))"
32 "dequeue (AQueue xs (y # ys)) =
33 (Some y, AQueue xs ys)"
34 by (cases xs, simp_all) (cases "rev xs", simp_all)
37 \noindent The annotation @{text "[code]"} is an @{text Isar}
38 @{text attribute} which states that the given theorems should be
39 considered as code equations for a @{text fun} statement --
40 the corresponding constant is determined syntactically. The resulting code:
43 text %quote {*@{code_stmts dequeue (consts) dequeue (Haskell)}*}
46 \noindent You may note that the equality test @{term "xs = []"} has been
47 replaced by the predicate @{term "null xs"}. This is due to the default
48 setup in the \qn{preprocessor} to be discussed further below (\secref{sec:preproc}).
50 Changing the default constructor set of datatypes is also
51 possible. See \secref{sec:datatypes} for an example.
53 As told in \secref{sec:concept}, code generation is based
54 on a structured collection of code theorems.
55 For explorative purpose, this collection
56 may be inspected using the @{command code_thms} command:
59 code_thms %quote dequeue
62 \noindent prints a table with \emph{all} code equations
63 for @{const dequeue}, including
64 \emph{all} code equations those equations depend
67 Similarly, the @{command code_deps} command shows a graph
68 visualising dependencies between code equations.
71 subsection {* @{text class} and @{text instantiation} *}
74 Concerning type classes and code generation, let us examine an example
75 from abstract algebra:
78 class %quote semigroup =
79 fixes mult :: "'a \<Rightarrow> 'a \<Rightarrow> 'a" (infixl "\<otimes>" 70)
80 assumes assoc: "(x \<otimes> y) \<otimes> z = x \<otimes> (y \<otimes> z)"
82 class %quote monoid = semigroup +
83 fixes neutral :: 'a ("\<one>")
84 assumes neutl: "\<one> \<otimes> x = x"
85 and neutr: "x \<otimes> \<one> = x"
87 instantiation %quote nat :: monoid
90 primrec %quote mult_nat where
91 "0 \<otimes> n = (0\<Colon>nat)"
92 | "Suc m \<otimes> n = n + m \<otimes> n"
94 definition %quote neutral_nat where
97 lemma %quote add_mult_distrib:
99 shows "(n + m) \<otimes> q = n \<otimes> q + m \<otimes> q"
100 by (induct n) simp_all
102 instance %quote proof
104 show "m \<otimes> n \<otimes> q = m \<otimes> (n \<otimes> q)"
105 by (induct m) (simp_all add: add_mult_distrib)
106 show "\<one> \<otimes> n = n"
107 by (simp add: neutral_nat_def)
108 show "m \<otimes> \<one> = m"
109 by (induct m) (simp_all add: neutral_nat_def)
115 \noindent We define the natural operation of the natural numbers
119 primrec %quote (in monoid) pow :: "nat \<Rightarrow> 'a \<Rightarrow> 'a" where
121 | "pow (Suc n) a = a \<otimes> pow n a"
124 \noindent This we use to define the discrete exponentiation function:
127 definition %quote bexp :: "nat \<Rightarrow> nat" where
128 "bexp n = pow n (Suc (Suc 0))"
131 \noindent The corresponding code:
134 text %quote {*@{code_stmts bexp (Haskell)}*}
137 \noindent This is a convenient place to show how explicit dictionary construction
138 manifests in generated code (here, the same example in @{text SML}):
141 text %quote {*@{code_stmts bexp (SML)}*}
144 \noindent Note the parameters with trailing underscore (@{verbatim "A_"})
145 which are the dictionary parameters.
148 subsection {* The preprocessor \label{sec:preproc} *}
151 Before selected function theorems are turned into abstract
152 code, a chain of definitional transformation steps is carried
153 out: \emph{preprocessing}. In essence, the preprocessor
154 consists of two components: a \emph{simpset} and \emph{function transformers}.
156 The \emph{simpset} allows to employ the full generality of the Isabelle
157 simplifier. Due to the interpretation of theorems
158 as code equations, rewrites are applied to the right
159 hand side and the arguments of the left hand side of an
160 equation, but never to the constant heading the left hand side.
161 An important special case are \emph{inline theorems} which may be
162 declared and undeclared using the
163 \emph{code inline} or \emph{code inline del} attribute respectively.
165 Some common applications:
173 \item replacing non-executable constructs by executable ones:
176 lemma %quote [code inline]:
177 "x \<in> set xs \<longleftrightarrow> x mem xs" by (induct xs) simp_all
180 \item eliminating superfluous constants:
183 lemma %quote [code inline]:
187 \item replacing executable but inconvenient constructs:
190 lemma %quote [code inline]:
191 "xs = [] \<longleftrightarrow> List.null xs" by (induct xs) simp_all
198 \noindent \emph{Function transformers} provide a very general interface,
199 transforming a list of function theorems to another
200 list of function theorems, provided that neither the heading
201 constant nor its type change. The @{term "0\<Colon>nat"} / @{const Suc}
202 pattern elimination implemented in
203 theory @{text Efficient_Nat} (see \secref{eff_nat}) uses this
206 \noindent The current setup of the preprocessor may be inspected using
207 the @{command print_codesetup} command.
208 @{command code_thms} provides a convenient
209 mechanism to inspect the impact of a preprocessor setup
213 The attribute \emph{code unfold}
214 associated with the @{text "SML code generator"} also applies to
215 the @{text "generic code generator"}:
216 \emph{code unfold} implies \emph{code inline}.
220 subsection {* Datatypes \label{sec:datatypes} *}
223 Conceptually, any datatype is spanned by a set of
224 \emph{constructors} of type @{text "\<tau> = \<dots> \<Rightarrow> \<kappa> \<alpha>\<^isub>1 \<dots> \<alpha>\<^isub>n"} where @{text
225 "{\<alpha>\<^isub>1, \<dots>, \<alpha>\<^isub>n}"} is exactly the set of \emph{all} type variables in
226 @{text "\<tau>"}. The HOL datatype package by default registers any new
227 datatype in the table of datatypes, which may be inspected using the
228 @{command print_codesetup} command.
230 In some cases, it is appropriate to alter or extend this table. As
231 an example, we will develop an alternative representation of the
232 queue example given in \secref{sec:intro}. The amortised
233 representation is convenient for generating code but exposes its
234 \qt{implementation} details, which may be cumbersome when proving
235 theorems about it. Therefore, here a simple, straightforward
236 representation of queues:
239 datatype %quote 'a queue = Queue "'a list"
241 definition %quote empty :: "'a queue" where
244 primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where
245 "enqueue x (Queue xs) = Queue (xs @ [x])"
247 fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
248 "dequeue (Queue []) = (None, Queue [])"
249 | "dequeue (Queue (x # xs)) = (Some x, Queue xs)"
252 \noindent This we can use directly for proving; for executing,
253 we provide an alternative characterisation:
256 definition %quote AQueue :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a queue" where
257 "AQueue xs ys = Queue (ys @ rev xs)"
259 code_datatype %quote AQueue
262 \noindent Here we define a \qt{constructor} @{const "AQueue"} which
263 is defined in terms of @{text "Queue"} and interprets its arguments
264 according to what the \emph{content} of an amortised queue is supposed
265 to be. Equipped with this, we are able to prove the following equations
266 for our primitive queue operations which \qt{implement} the simple
267 queues in an amortised fashion:
270 lemma %quote empty_AQueue [code]:
271 "empty = AQueue [] []"
272 unfolding AQueue_def empty_def by simp
274 lemma %quote enqueue_AQueue [code]:
275 "enqueue x (AQueue xs ys) = AQueue (x # xs) ys"
276 unfolding AQueue_def by simp
278 lemma %quote dequeue_AQueue [code]:
279 "dequeue (AQueue xs []) =
280 (if xs = [] then (None, AQueue [] [])
281 else dequeue (AQueue [] (rev xs)))"
282 "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
283 unfolding AQueue_def by simp_all
286 \noindent For completeness, we provide a substitute for the
287 @{text case} combinator on queues:
290 lemma %quote queue_case_AQueue [code]:
291 "queue_case f (AQueue xs ys) = f (ys @ rev xs)"
292 unfolding AQueue_def by simp
295 \noindent The resulting code looks as expected:
298 text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}
301 \noindent From this example, it can be glimpsed that using own
302 constructor sets is a little delicate since it changes the set of
303 valid patterns for values of that type. Without going into much
304 detail, here some practical hints:
308 \item When changing the constructor set for datatypes, take care
309 to provide alternative equations for the @{text case} combinator.
311 \item Values in the target language need not to be normalised --
312 different values in the target language may represent the same
315 \item Usually, a good methodology to deal with the subtleties of
316 pattern matching is to see the type as an abstract type: provide
317 a set of operations which operate on the concrete representation
318 of the type, and derive further operations by combinations of
319 these primitive ones, without relying on a particular
326 subsection {* Equality and wellsortedness *}
329 Surely you have already noticed how equality is treated
330 by the code generator:
333 primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where
334 "collect_duplicates xs ys [] = xs"
335 | "collect_duplicates xs ys (z#zs) = (if z \<in> set xs
336 then if z \<in> set ys
337 then collect_duplicates xs ys zs
338 else collect_duplicates xs (z#ys) zs
339 else collect_duplicates (z#xs) (z#ys) zs)"
342 \noindent The membership test during preprocessing is rewritten,
343 resulting in @{const List.member}, which itself
344 performs an explicit equality check.
347 text %quote {*@{code_stmts collect_duplicates (SML)}*}
350 \noindent Obviously, polymorphic equality is implemented the Haskell
351 way using a type class. How is this achieved? HOL introduces
352 an explicit class @{class eq} with a corresponding operation
353 @{const eq_class.eq} such that @{thm eq [no_vars]}.
354 The preprocessing framework does the rest by propagating the
355 @{class eq} constraints through all dependent code equations.
356 For datatypes, instances of @{class eq} are implicitly derived
357 when possible. For other types, you may instantiate @{text eq}
358 manually like any other type class.
360 Though this @{text eq} class is designed to get rarely in
362 enters the stage when definitions of overloaded constants
363 are dependent on operational equality. For example, let
364 us define a lexicographic ordering on tuples
365 (also see theory @{theory Product_ord}):
368 instantiation %quote "*" :: (order, order) order
371 definition %quote [code del]:
372 "x \<le> y \<longleftrightarrow> fst x < fst y \<or> fst x = fst y \<and> snd x \<le> snd y"
374 definition %quote [code del]:
375 "x < y \<longleftrightarrow> fst x < fst y \<or> fst x = fst y \<and> snd x < snd y"
377 instance %quote proof
378 qed (auto simp: less_eq_prod_def less_prod_def intro: order_less_trans)
382 lemma %quote order_prod [code]:
383 "(x1 \<Colon> 'a\<Colon>order, y1 \<Colon> 'b\<Colon>order) < (x2, y2) \<longleftrightarrow>
384 x1 < x2 \<or> x1 = x2 \<and> y1 < y2"
385 "(x1 \<Colon> 'a\<Colon>order, y1 \<Colon> 'b\<Colon>order) \<le> (x2, y2) \<longleftrightarrow>
386 x1 < x2 \<or> x1 = x2 \<and> y1 \<le> y2"
387 by (simp_all add: less_prod_def less_eq_prod_def)
390 \noindent Then code generation will fail. Why? The definition
391 of @{term "op \<le>"} depends on equality on both arguments,
392 which are polymorphic and impose an additional @{class eq}
393 class constraint, which the preprocessor does not propagate
394 (for technical reasons).
396 The solution is to add @{class eq} explicitly to the first sort arguments in the
400 lemma %quote order_prod_code [code]:
401 "(x1 \<Colon> 'a\<Colon>{order, eq}, y1 \<Colon> 'b\<Colon>order) < (x2, y2) \<longleftrightarrow>
402 x1 < x2 \<or> x1 = x2 \<and> y1 < y2"
403 "(x1 \<Colon> 'a\<Colon>{order, eq}, y1 \<Colon> 'b\<Colon>order) \<le> (x2, y2) \<longleftrightarrow>
404 x1 < x2 \<or> x1 = x2 \<and> y1 \<le> y2"
405 by (simp_all add: less_prod_def less_eq_prod_def)
408 \noindent Then code generation succeeds:
411 text %quote {*@{code_stmts "op \<le> \<Colon> _ \<times> _ \<Rightarrow> _ \<times> _ \<Rightarrow> bool" (SML)}*}
414 In some cases, the automatically derived code equations
415 for equality on a particular type may not be appropriate.
416 As example, watch the following datatype representing
417 monomorphic parametric types (where type constructors
418 are referred to by natural numbers):
421 datatype %quote monotype = Mono nat "monotype list"
424 "eq_class.eq (Mono tyco1 typargs1) (Mono tyco2 typargs2) \<equiv>
425 eq_class.eq tyco1 tyco2 \<and> eq_class.eq typargs1 typargs2" by (simp add: eq)
429 \noindent Then code generation for SML would fail with a message
430 that the generated code contains illegal mutual dependencies:
431 the theorem @{thm monotype_eq [no_vars]} already requires the
432 instance @{text "monotype \<Colon> eq"}, which itself requires
433 @{thm monotype_eq [no_vars]}; Haskell has no problem with mutually
434 recursive @{text instance} and @{text function} definitions,
435 but the SML serialiser does not support this.
437 In such cases, you have to provide your own equality equations
438 involving auxiliary constants. In our case,
439 @{const [show_types] list_all2} can do the job:
442 lemma %quote monotype_eq_list_all2 [code]:
443 "eq_class.eq (Mono tyco1 typargs1) (Mono tyco2 typargs2) \<longleftrightarrow>
444 eq_class.eq tyco1 tyco2 \<and> list_all2 eq_class.eq typargs1 typargs2"
445 by (simp add: eq list_all2_eq [symmetric])
448 \noindent does not depend on instance @{text "monotype \<Colon> eq"}:
451 text %quote {*@{code_stmts "eq_class.eq :: monotype \<Rightarrow> monotype \<Rightarrow> bool" (SML)}*}
454 subsection {* Explicit partiality *}
457 Partiality usually enters the game by partial patterns, as
458 in the following example, again for amortised queues:
461 definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where
462 "strict_dequeue q = (case dequeue q
463 of (Some x, q') \<Rightarrow> (x, q'))"
465 lemma %quote strict_dequeue_AQueue [code]:
466 "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)"
467 "strict_dequeue (AQueue xs []) =
468 (case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))"
469 by (simp_all add: strict_dequeue_def dequeue_AQueue split: list.splits)
472 \noindent In the corresponding code, there is no equation
473 for the pattern @{term "AQueue [] []"}:
476 text %quote {*@{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)}*}
479 \noindent In some cases it is desirable to have this
480 pseudo-\qt{partiality} more explicitly, e.g.~as follows:
483 axiomatization %quote empty_queue :: 'a
485 definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where
486 "strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)"
488 lemma %quote strict_dequeue'_AQueue [code]:
489 "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue
490 else strict_dequeue' (AQueue [] (rev xs)))"
491 "strict_dequeue' (AQueue xs (y # ys)) =
493 by (simp_all add: strict_dequeue'_def dequeue_AQueue split: list.splits)
496 Observe that on the right hand side of the definition of @{const
497 "strict_dequeue'"} the constant @{const empty_queue} occurs
498 which is unspecified.
500 Normally, if constants without any code equations occur in a
501 program, the code generator complains (since in most cases this is
502 not what the user expects). But such constants can also be thought
503 of as function definitions with no equations which always fail,
504 since there is never a successful pattern match on the left hand
505 side. In order to categorise a constant into that category
506 explicitly, use @{command "code_abort"}:
509 code_abort %quote empty_queue
512 \noindent Then the code generator will just insert an error or
513 exception at the appropriate position:
516 text %quote {*@{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)}*}
519 \noindent This feature however is rarely needed in practice.
520 Note also that the @{text HOL} default setup already declares
521 @{const undefined} as @{command "code_abort"}, which is most
522 likely to be used in such situations.