doc-src/IsarAdvanced/Codegen/Thy/Program.thy
author haftmann
Wed, 04 Feb 2009 10:59:31 +0100
changeset 29735 6df726203e39
parent 29733 a342da8ddf39
permissions -rw-r--r--
proper datatype abstraction example
     1 theory Program
     2 imports Introduction
     3 begin
     4 
     5 section {* Turning Theories into Programs \label{sec:program} *}
     6 
     7 subsection {* The @{text "Isabelle/HOL"} default setup *}
     8 
     9 text {*
    10   We have already seen how by default equations stemming from
    11   @{command definition}/@{command primrec}/@{command fun}
    12   statements are used for code generation.  This default behaviour
    13   can be changed, e.g. by providing different code equations.
    14   All kinds of customisation shown in this section is \emph{safe}
    15   in the sense that the user does not have to worry about
    16   correctness -- all programs generatable that way are partially
    17   correct.
    18 *}
    19 
    20 subsection {* Selecting code equations *}
    21 
    22 text {*
    23   Coming back to our introductory example, we
    24   could provide an alternative code equations for @{const dequeue}
    25   explicitly:
    26 *}
    27 
    28 lemma %quote [code]:
    29   "dequeue (AQueue xs []) =
    30      (if xs = [] then (None, AQueue [] [])
    31        else dequeue (AQueue [] (rev xs)))"
    32   "dequeue (AQueue xs (y # ys)) =
    33      (Some y, AQueue xs ys)"
    34   by (cases xs, simp_all) (cases "rev xs", simp_all)
    35 
    36 text {*
    37   \noindent The annotation @{text "[code]"} is an @{text Isar}
    38   @{text attribute} which states that the given theorems should be
    39   considered as code equations for a @{text fun} statement --
    40   the corresponding constant is determined syntactically.  The resulting code:
    41 *}
    42 
    43 text %quote {*@{code_stmts dequeue (consts) dequeue (Haskell)}*}
    44 
    45 text {*
    46   \noindent You may note that the equality test @{term "xs = []"} has been
    47   replaced by the predicate @{term "null xs"}.  This is due to the default
    48   setup in the \qn{preprocessor} to be discussed further below (\secref{sec:preproc}).
    49 
    50   Changing the default constructor set of datatypes is also
    51   possible.  See \secref{sec:datatypes} for an example.
    52 
    53   As told in \secref{sec:concept}, code generation is based
    54   on a structured collection of code theorems.
    55   For explorative purpose, this collection
    56   may be inspected using the @{command code_thms} command:
    57 *}
    58 
    59 code_thms %quote dequeue
    60 
    61 text {*
    62   \noindent prints a table with \emph{all} code equations
    63   for @{const dequeue}, including
    64   \emph{all} code equations those equations depend
    65   on recursively.
    66   
    67   Similarly, the @{command code_deps} command shows a graph
    68   visualising dependencies between code equations.
    69 *}
    70 
    71 subsection {* @{text class} and @{text instantiation} *}
    72 
    73 text {*
    74   Concerning type classes and code generation, let us examine an example
    75   from abstract algebra:
    76 *}
    77 
    78 class %quote semigroup =
    79   fixes mult :: "'a \<Rightarrow> 'a \<Rightarrow> 'a" (infixl "\<otimes>" 70)
    80   assumes assoc: "(x \<otimes> y) \<otimes> z = x \<otimes> (y \<otimes> z)"
    81 
    82 class %quote monoid = semigroup +
    83   fixes neutral :: 'a ("\<one>")
    84   assumes neutl: "\<one> \<otimes> x = x"
    85     and neutr: "x \<otimes> \<one> = x"
    86 
    87 instantiation %quote nat :: monoid
    88 begin
    89 
    90 primrec %quote mult_nat where
    91     "0 \<otimes> n = (0\<Colon>nat)"
    92   | "Suc m \<otimes> n = n + m \<otimes> n"
    93 
    94 definition %quote neutral_nat where
    95   "\<one> = Suc 0"
    96 
    97 lemma %quote add_mult_distrib:
    98   fixes n m q :: nat
    99   shows "(n + m) \<otimes> q = n \<otimes> q + m \<otimes> q"
   100   by (induct n) simp_all
   101 
   102 instance %quote proof
   103   fix m n q :: nat
   104   show "m \<otimes> n \<otimes> q = m \<otimes> (n \<otimes> q)"
   105     by (induct m) (simp_all add: add_mult_distrib)
   106   show "\<one> \<otimes> n = n"
   107     by (simp add: neutral_nat_def)
   108   show "m \<otimes> \<one> = m"
   109     by (induct m) (simp_all add: neutral_nat_def)
   110 qed
   111 
   112 end %quote
   113 
   114 text {*
   115   \noindent We define the natural operation of the natural numbers
   116   on monoids:
   117 *}
   118 
   119 primrec %quote (in monoid) pow :: "nat \<Rightarrow> 'a \<Rightarrow> 'a" where
   120     "pow 0 a = \<one>"
   121   | "pow (Suc n) a = a \<otimes> pow n a"
   122 
   123 text {*
   124   \noindent This we use to define the discrete exponentiation function:
   125 *}
   126 
   127 definition %quote bexp :: "nat \<Rightarrow> nat" where
   128   "bexp n = pow n (Suc (Suc 0))"
   129 
   130 text {*
   131   \noindent The corresponding code:
   132 *}
   133 
   134 text %quote {*@{code_stmts bexp (Haskell)}*}
   135 
   136 text {*
   137   \noindent This is a convenient place to show how explicit dictionary construction
   138   manifests in generated code (here, the same example in @{text SML}):
   139 *}
   140 
   141 text %quote {*@{code_stmts bexp (SML)}*}
   142 
   143 text {*
   144   \noindent Note the parameters with trailing underscore (@{verbatim "A_"})
   145     which are the dictionary parameters.
   146 *}
   147 
   148 subsection {* The preprocessor \label{sec:preproc} *}
   149 
   150 text {*
   151   Before selected function theorems are turned into abstract
   152   code, a chain of definitional transformation steps is carried
   153   out: \emph{preprocessing}.  In essence, the preprocessor
   154   consists of two components: a \emph{simpset} and \emph{function transformers}.
   155 
   156   The \emph{simpset} allows to employ the full generality of the Isabelle
   157   simplifier.  Due to the interpretation of theorems
   158   as code equations, rewrites are applied to the right
   159   hand side and the arguments of the left hand side of an
   160   equation, but never to the constant heading the left hand side.
   161   An important special case are \emph{inline theorems} which may be
   162   declared and undeclared using the
   163   \emph{code inline} or \emph{code inline del} attribute respectively.
   164 
   165   Some common applications:
   166 *}
   167 
   168 text_raw {*
   169   \begin{itemize}
   170 *}
   171 
   172 text {*
   173      \item replacing non-executable constructs by executable ones:
   174 *}     
   175 
   176 lemma %quote [code inline]:
   177   "x \<in> set xs \<longleftrightarrow> x mem xs" by (induct xs) simp_all
   178 
   179 text {*
   180      \item eliminating superfluous constants:
   181 *}
   182 
   183 lemma %quote [code inline]:
   184   "1 = Suc 0" by simp
   185 
   186 text {*
   187      \item replacing executable but inconvenient constructs:
   188 *}
   189 
   190 lemma %quote [code inline]:
   191   "xs = [] \<longleftrightarrow> List.null xs" by (induct xs) simp_all
   192 
   193 text_raw {*
   194   \end{itemize}
   195 *}
   196 
   197 text {*
   198   \noindent \emph{Function transformers} provide a very general interface,
   199   transforming a list of function theorems to another
   200   list of function theorems, provided that neither the heading
   201   constant nor its type change.  The @{term "0\<Colon>nat"} / @{const Suc}
   202   pattern elimination implemented in
   203   theory @{text Efficient_Nat} (see \secref{eff_nat}) uses this
   204   interface.
   205 
   206   \noindent The current setup of the preprocessor may be inspected using
   207   the @{command print_codesetup} command.
   208   @{command code_thms} provides a convenient
   209   mechanism to inspect the impact of a preprocessor setup
   210   on code equations.
   211 
   212   \begin{warn}
   213     The attribute \emph{code unfold}
   214     associated with the @{text "SML code generator"} also applies to
   215     the @{text "generic code generator"}:
   216     \emph{code unfold} implies \emph{code inline}.
   217   \end{warn}
   218 *}
   219 
   220 subsection {* Datatypes \label{sec:datatypes} *}
   221 
   222 text {*
   223   Conceptually, any datatype is spanned by a set of
   224   \emph{constructors} of type @{text "\<tau> = \<dots> \<Rightarrow> \<kappa> \<alpha>\<^isub>1 \<dots> \<alpha>\<^isub>n"} where @{text
   225   "{\<alpha>\<^isub>1, \<dots>, \<alpha>\<^isub>n}"} is exactly the set of \emph{all} type variables in
   226   @{text "\<tau>"}.  The HOL datatype package by default registers any new
   227   datatype in the table of datatypes, which may be inspected using the
   228   @{command print_codesetup} command.
   229 
   230   In some cases, it is appropriate to alter or extend this table.  As
   231   an example, we will develop an alternative representation of the
   232   queue example given in \secref{sec:intro}.  The amortised
   233   representation is convenient for generating code but exposes its
   234   \qt{implementation} details, which may be cumbersome when proving
   235   theorems about it.  Therefore, here a simple, straightforward
   236   representation of queues:
   237 *}
   238 
   239 datatype %quote 'a queue = Queue "'a list"
   240 
   241 definition %quote empty :: "'a queue" where
   242   "empty = Queue []"
   243 
   244 primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where
   245   "enqueue x (Queue xs) = Queue (xs @ [x])"
   246 
   247 fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
   248     "dequeue (Queue []) = (None, Queue [])"
   249   | "dequeue (Queue (x # xs)) = (Some x, Queue xs)"
   250 
   251 text {*
   252   \noindent This we can use directly for proving;  for executing,
   253   we provide an alternative characterisation:
   254 *}
   255 
   256 definition %quote AQueue :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a queue" where
   257   "AQueue xs ys = Queue (ys @ rev xs)"
   258 
   259 code_datatype %quote AQueue
   260 
   261 text {*
   262   \noindent Here we define a \qt{constructor} @{const "AQueue"} which
   263   is defined in terms of @{text "Queue"} and interprets its arguments
   264   according to what the \emph{content} of an amortised queue is supposed
   265   to be.  Equipped with this, we are able to prove the following equations
   266   for our primitive queue operations which \qt{implement} the simple
   267   queues in an amortised fashion:
   268 *}
   269 
   270 lemma %quote empty_AQueue [code]:
   271   "empty = AQueue [] []"
   272   unfolding AQueue_def empty_def by simp
   273 
   274 lemma %quote enqueue_AQueue [code]:
   275   "enqueue x (AQueue xs ys) = AQueue (x # xs) ys"
   276   unfolding AQueue_def by simp
   277 
   278 lemma %quote dequeue_AQueue [code]:
   279   "dequeue (AQueue xs []) =
   280     (if xs = [] then (None, AQueue [] [])
   281     else dequeue (AQueue [] (rev xs)))"
   282   "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
   283   unfolding AQueue_def by simp_all
   284 
   285 text {*
   286   \noindent For completeness, we provide a substitute for the
   287   @{text case} combinator on queues:
   288 *}
   289 
   290 definition %quote
   291   aqueue_case_def: "aqueue_case = queue_case"
   292 
   293 lemma %quote aqueue_case [code, code inline]:
   294   "queue_case = aqueue_case"
   295   unfolding aqueue_case_def ..
   296 
   297 lemma %quote case_AQueue [code]:
   298   "aqueue_case f (AQueue xs ys) = f (ys @ rev xs)"
   299   unfolding aqueue_case_def AQueue_def by simp
   300 
   301 text {*
   302   \noindent The resulting code looks as expected:
   303 *}
   304 
   305 text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}
   306 
   307 text {*
   308   \noindent From this example, it can be glimpsed that using own
   309   constructor sets is a little delicate since it changes the set of
   310   valid patterns for values of that type.  Without going into much
   311   detail, here some practical hints:
   312 
   313   \begin{itemize}
   314 
   315     \item When changing the constructor set for datatypes, take care
   316       to provide an alternative for the @{text case} combinator
   317       (e.g.~by replacing it using the preprocessor).
   318 
   319     \item Values in the target language need not to be normalised --
   320       different values in the target language may represent the same
   321       value in the logic.
   322 
   323     \item Usually, a good methodology to deal with the subtleties of
   324       pattern matching is to see the type as an abstract type: provide
   325       a set of operations which operate on the concrete representation
   326       of the type, and derive further operations by combinations of
   327       these primitive ones, without relying on a particular
   328       representation.
   329 
   330   \end{itemize}
   331 *}
   332 
   333 
   334 subsection {* Equality and wellsortedness *}
   335 
   336 text {*
   337   Surely you have already noticed how equality is treated
   338   by the code generator:
   339 *}
   340 
   341 primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where
   342   "collect_duplicates xs ys [] = xs"
   343   | "collect_duplicates xs ys (z#zs) = (if z \<in> set xs
   344       then if z \<in> set ys
   345         then collect_duplicates xs ys zs
   346         else collect_duplicates xs (z#ys) zs
   347       else collect_duplicates (z#xs) (z#ys) zs)"
   348 
   349 text {*
   350   \noindent The membership test during preprocessing is rewritten,
   351   resulting in @{const List.member}, which itself
   352   performs an explicit equality check.
   353 *}
   354 
   355 text %quote {*@{code_stmts collect_duplicates (SML)}*}
   356 
   357 text {*
   358   \noindent Obviously, polymorphic equality is implemented the Haskell
   359   way using a type class.  How is this achieved?  HOL introduces
   360   an explicit class @{class eq} with a corresponding operation
   361   @{const eq_class.eq} such that @{thm eq [no_vars]}.
   362   The preprocessing framework does the rest by propagating the
   363   @{class eq} constraints through all dependent code equations.
   364   For datatypes, instances of @{class eq} are implicitly derived
   365   when possible.  For other types, you may instantiate @{text eq}
   366   manually like any other type class.
   367 
   368   Though this @{text eq} class is designed to get rarely in
   369   the way, a subtlety
   370   enters the stage when definitions of overloaded constants
   371   are dependent on operational equality.  For example, let
   372   us define a lexicographic ordering on tuples
   373   (also see theory @{theory Product_ord}):
   374 *}
   375 
   376 instantiation %quote "*" :: (order, order) order
   377 begin
   378 
   379 definition %quote [code del]:
   380   "x \<le> y \<longleftrightarrow> fst x < fst y \<or> fst x = fst y \<and> snd x \<le> snd y"
   381 
   382 definition %quote [code del]:
   383   "x < y \<longleftrightarrow> fst x < fst y \<or> fst x = fst y \<and> snd x < snd y"
   384 
   385 instance %quote proof
   386 qed (auto simp: less_eq_prod_def less_prod_def intro: order_less_trans)
   387 
   388 end %quote
   389 
   390 lemma %quote order_prod [code]:
   391   "(x1 \<Colon> 'a\<Colon>order, y1 \<Colon> 'b\<Colon>order) < (x2, y2) \<longleftrightarrow>
   392      x1 < x2 \<or> x1 = x2 \<and> y1 < y2"
   393   "(x1 \<Colon> 'a\<Colon>order, y1 \<Colon> 'b\<Colon>order) \<le> (x2, y2) \<longleftrightarrow>
   394      x1 < x2 \<or> x1 = x2 \<and> y1 \<le> y2"
   395   by (simp_all add: less_prod_def less_eq_prod_def)
   396 
   397 text {*
   398   \noindent Then code generation will fail.  Why?  The definition
   399   of @{term "op \<le>"} depends on equality on both arguments,
   400   which are polymorphic and impose an additional @{class eq}
   401   class constraint, which the preprocessor does not propagate
   402   (for technical reasons).
   403 
   404   The solution is to add @{class eq} explicitly to the first sort arguments in the
   405   code theorems:
   406 *}
   407 
   408 lemma %quote order_prod_code [code]:
   409   "(x1 \<Colon> 'a\<Colon>{order, eq}, y1 \<Colon> 'b\<Colon>order) < (x2, y2) \<longleftrightarrow>
   410      x1 < x2 \<or> x1 = x2 \<and> y1 < y2"
   411   "(x1 \<Colon> 'a\<Colon>{order, eq}, y1 \<Colon> 'b\<Colon>order) \<le> (x2, y2) \<longleftrightarrow>
   412      x1 < x2 \<or> x1 = x2 \<and> y1 \<le> y2"
   413   by (simp_all add: less_prod_def less_eq_prod_def)
   414 
   415 text {*
   416   \noindent Then code generation succeeds:
   417 *}
   418 
   419 text %quote {*@{code_stmts "op \<le> \<Colon> _ \<times> _ \<Rightarrow> _ \<times> _ \<Rightarrow> bool" (SML)}*}
   420 
   421 text {*
   422   In some cases, the automatically derived code equations
   423   for equality on a particular type may not be appropriate.
   424   As example, watch the following datatype representing
   425   monomorphic parametric types (where type constructors
   426   are referred to by natural numbers):
   427 *}
   428 
   429 datatype %quote monotype = Mono nat "monotype list"
   430 (*<*)
   431 lemma monotype_eq:
   432   "eq_class.eq (Mono tyco1 typargs1) (Mono tyco2 typargs2) \<equiv> 
   433      eq_class.eq tyco1 tyco2 \<and> eq_class.eq typargs1 typargs2" by (simp add: eq)
   434 (*>*)
   435 
   436 text {*
   437   \noindent Then code generation for SML would fail with a message
   438   that the generated code contains illegal mutual dependencies:
   439   the theorem @{thm monotype_eq [no_vars]} already requires the
   440   instance @{text "monotype \<Colon> eq"}, which itself requires
   441   @{thm monotype_eq [no_vars]};  Haskell has no problem with mutually
   442   recursive @{text instance} and @{text function} definitions,
   443   but the SML serialiser does not support this.
   444 
   445   In such cases, you have to provide your own equality equations
   446   involving auxiliary constants.  In our case,
   447   @{const [show_types] list_all2} can do the job:
   448 *}
   449 
   450 lemma %quote monotype_eq_list_all2 [code]:
   451   "eq_class.eq (Mono tyco1 typargs1) (Mono tyco2 typargs2) \<longleftrightarrow>
   452      eq_class.eq tyco1 tyco2 \<and> list_all2 eq_class.eq typargs1 typargs2"
   453   by (simp add: eq list_all2_eq [symmetric])
   454 
   455 text {*
   456   \noindent does not depend on instance @{text "monotype \<Colon> eq"}:
   457 *}
   458 
   459 text %quote {*@{code_stmts "eq_class.eq :: monotype \<Rightarrow> monotype \<Rightarrow> bool" (SML)}*}
   460 
   461 
   462 subsection {* Explicit partiality *}
   463 
   464 text {*
   465   Partiality usually enters the game by partial patterns, as
   466   in the following example, again for amortised queues:
   467 *}
   468 
   469 definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where
   470   "strict_dequeue q = (case dequeue q
   471     of (Some x, q') \<Rightarrow> (x, q'))"
   472 
   473 lemma %quote strict_dequeue_AQueue [code]:
   474   "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)"
   475   "strict_dequeue (AQueue xs []) =
   476     (case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))"
   477   by (simp_all add: strict_dequeue_def dequeue_AQueue split: list.splits)
   478 
   479 text {*
   480   \noindent In the corresponding code, there is no equation
   481   for the pattern @{term "AQueue [] []"}:
   482 *}
   483 
   484 text %quote {*@{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)}*}
   485 
   486 text {*
   487   \noindent In some cases it is desirable to have this
   488   pseudo-\qt{partiality} more explicitly, e.g.~as follows:
   489 *}
   490 
   491 axiomatization %quote empty_queue :: 'a
   492 
   493 definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where
   494   "strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)"
   495 
   496 lemma %quote strict_dequeue'_AQueue [code]:
   497   "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue
   498      else strict_dequeue' (AQueue [] (rev xs)))"
   499   "strict_dequeue' (AQueue xs (y # ys)) =
   500      (y, AQueue xs ys)"
   501   by (simp_all add: strict_dequeue'_def dequeue_AQueue split: list.splits)
   502 
   503 text {*
   504   Observe that on the right hand side of the definition of @{const
   505   "strict_dequeue'"} the constant @{const empty_queue} occurs
   506   which is unspecified.
   507 
   508   Normally, if constants without any code equations occur in a
   509   program, the code generator complains (since in most cases this is
   510   not what the user expects).  But such constants can also be thought
   511   of as function definitions with no equations which always fail,
   512   since there is never a successful pattern match on the left hand
   513   side.  In order to categorise a constant into that category
   514   explicitly, use @{command "code_abort"}:
   515 *}
   516 
   517 code_abort %quote empty_queue
   518 
   519 text {*
   520   \noindent Then the code generator will just insert an error or
   521   exception at the appropriate position:
   522 *}
   523 
   524 text %quote {*@{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)}*}
   525 
   526 text {*
   527   \noindent This feature however is rarely needed in practice.
   528   Note also that the @{text HOL} default setup already declares
   529   @{const undefined} as @{command "code_abort"}, which is most
   530   likely to be used in such situations.
   531 *}
   532 
   533 end
   534