wneuper/isa: doc-src/Codegen/Thy/Program.thy@853abb4853cc

     1 theory Program

     2 imports Introduction

     3 begin

     5 section {* Turning Theories into Programs \label{sec:program} *}

     7 subsection {* The @{text "Isabelle/HOL"} default setup *}

     9 text {*

    10   We have already seen how by default equations stemming from

    11   @{command definition}/@{command primrec}/@{command fun}

    12   statements are used for code generation.  This default behaviour

    13   can be changed, e.g. by providing different code equations.

    14   All kinds of customisation shown in this section is \emph{safe}

    15   in the sense that the user does not have to worry about

    16   correctness -- all programs generatable that way are partially

    17   correct.

    18 *}

    20 subsection {* Selecting code equations *}

    22 text {*

    23   Coming back to our introductory example, we

    24   could provide an alternative code equations for @{const dequeue}

    25   explicitly:

    26 *}

    28 lemma %quote [code]:

    29   "dequeue (AQueue xs []) =

    30      (if xs = [] then (None, AQueue [] [])

    31        else dequeue (AQueue [] (rev xs)))"

    32   "dequeue (AQueue xs (y # ys)) =

    33      (Some y, AQueue xs ys)"

    34   by (cases xs, simp_all) (cases "rev xs", simp_all)

    36 text {*

    37   \noindent The annotation @{text "[code]"} is an @{text Isar}

    38   @{text attribute} which states that the given theorems should be

    39   considered as code equations for a @{text fun} statement --

    40   the corresponding constant is determined syntactically.  The resulting code:

    41 *}

    43 text %quote {*@{code_stmts dequeue (consts) dequeue (Haskell)}*}

    45 text {*

    46   \noindent You may note that the equality test @{term "xs = []"} has been

    47   replaced by the predicate @{term "null xs"}.  This is due to the default

    48   setup in the \qn{preprocessor} to be discussed further below (\secref{sec:preproc}).

    50   Changing the default constructor set of datatypes is also

    51   possible.  See \secref{sec:datatypes} for an example.

    53   As told in \secref{sec:concept}, code generation is based

    54   on a structured collection of code theorems.

    55   For explorative purpose, this collection

    56   may be inspected using the @{command code_thms} command:

    57 *}

    59 code_thms %quote dequeue

    61 text {*

    62   \noindent prints a table with \emph{all} code equations

    63   for @{const dequeue}, including

    64   \emph{all} code equations those equations depend

    65   on recursively.

    67   Similarly, the @{command code_deps} command shows a graph

    68   visualising dependencies between code equations.

    69 *}

    71 subsection {* @{text class} and @{text instantiation} *}

    73 text {*

    74   Concerning type classes and code generation, let us examine an example

    75   from abstract algebra:

    76 *}

    78 class %quote semigroup =

    79   fixes mult :: "'a \<Rightarrow> 'a \<Rightarrow> 'a" (infixl "\<otimes>" 70)

    80   assumes assoc: "(x \<otimes> y) \<otimes> z = x \<otimes> (y \<otimes> z)"

    82 class %quote monoid = semigroup +

    83   fixes neutral :: 'a ("\<one>")

    84   assumes neutl: "\<one> \<otimes> x = x"

    85     and neutr: "x \<otimes> \<one> = x"

    87 instantiation %quote nat :: monoid

    88 begin

    90 primrec %quote mult_nat where

    91     "0 \<otimes> n = (0\<Colon>nat)"

    92   | "Suc m \<otimes> n = n + m \<otimes> n"

    94 definition %quote neutral_nat where

    95   "\<one> = Suc 0"

    97 lemma %quote add_mult_distrib:

    98   fixes n m q :: nat

    99   shows "(n + m) \<otimes> q = n \<otimes> q + m \<otimes> q"

   100   by (induct n) simp_all

   102 instance %quote proof

   103   fix m n q :: nat

   104   show "m \<otimes> n \<otimes> q = m \<otimes> (n \<otimes> q)"

   105     by (induct m) (simp_all add: add_mult_distrib)

   106   show "\<one> \<otimes> n = n"

   107     by (simp add: neutral_nat_def)

   108   show "m \<otimes> \<one> = m"

   109     by (induct m) (simp_all add: neutral_nat_def)

   110 qed

   112 end %quote

   114 text {*

   115   \noindent We define the natural operation of the natural numbers

   116   on monoids:

   117 *}

   119 primrec %quote (in monoid) pow :: "nat \<Rightarrow> 'a \<Rightarrow> 'a" where

   120     "pow 0 a = \<one>"

   121   | "pow (Suc n) a = a \<otimes> pow n a"

   123 text {*

   124   \noindent This we use to define the discrete exponentiation function:

   125 *}

   127 definition %quote bexp :: "nat \<Rightarrow> nat" where

   128   "bexp n = pow n (Suc (Suc 0))"

   130 text {*

   131   \noindent The corresponding code:

   132 *}

   134 text %quote {*@{code_stmts bexp (Haskell)}*}

   136 text {*

   137   \noindent This is a convenient place to show how explicit dictionary construction

   138   manifests in generated code (here, the same example in @{text SML}):

   139 *}

   141 text %quote {*@{code_stmts bexp (SML)}*}

   143 text {*

   144   \noindent Note the parameters with trailing underscore (@{verbatim "A_"})

   145     which are the dictionary parameters.

   146 *}

   148 subsection {* The preprocessor \label{sec:preproc} *}

   150 text {*

   151   Before selected function theorems are turned into abstract

   152   code, a chain of definitional transformation steps is carried

   153   out: \emph{preprocessing}.  In essence, the preprocessor

   154   consists of two components: a \emph{simpset} and \emph{function transformers}.

   156   The \emph{simpset} allows to employ the full generality of the Isabelle

   157   simplifier.  Due to the interpretation of theorems

   158   as code equations, rewrites are applied to the right

   159   hand side and the arguments of the left hand side of an

   160   equation, but never to the constant heading the left hand side.

   161   An important special case are \emph{inline theorems} which may be

   162   declared and undeclared using the

   163   \emph{code inline} or \emph{code inline del} attribute respectively.

   165   Some common applications:

   166 *}

   168 text_raw {*

   169   \begin{itemize}

   170 *}

   172 text {*

   173      \item replacing non-executable constructs by executable ones:

   174 *}

   176 lemma %quote [code inline]:

   177   "x \<in> set xs \<longleftrightarrow> x mem xs" by (induct xs) simp_all

   179 text {*

   180      \item eliminating superfluous constants:

   181 *}

   183 lemma %quote [code inline]:

   184   "1 = Suc 0" by simp

   186 text {*

   187      \item replacing executable but inconvenient constructs:

   188 *}

   190 lemma %quote [code inline]:

   191   "xs = [] \<longleftrightarrow> List.null xs" by (induct xs) simp_all

   193 text_raw {*

   194   \end{itemize}

   195 *}

   197 text {*

   198   \noindent \emph{Function transformers} provide a very general interface,

   199   transforming a list of function theorems to another

   200   list of function theorems, provided that neither the heading

   201   constant nor its type change.  The @{term "0\<Colon>nat"} / @{const Suc}

   202   pattern elimination implemented in

   203   theory @{text Efficient_Nat} (see \secref{eff_nat}) uses this

   204   interface.

   206   \noindent The current setup of the preprocessor may be inspected using

   207   the @{command print_codesetup} command.

   208   @{command code_thms} provides a convenient

   209   mechanism to inspect the impact of a preprocessor setup

   210   on code equations.

   212   \begin{warn}

   213     The attribute \emph{code unfold}

   214     associated with the @{text "SML code generator"} also applies to

   215     the @{text "generic code generator"}:

   216     \emph{code unfold} implies \emph{code inline}.

   217   \end{warn}

   218 *}

   220 subsection {* Datatypes \label{sec:datatypes} *}

   222 text {*

   223   Conceptually, any datatype is spanned by a set of

   224   \emph{constructors} of type @{text "\<tau> = \<dots> \<Rightarrow> \<kappa> \<alpha>\<^isub>1 \<dots> \<alpha>\<^isub>n"} where @{text

   225   "{\<alpha>\<^isub>1, \<dots>, \<alpha>\<^isub>n}"} is exactly the set of \emph{all} type variables in

   226   @{text "\<tau>"}.  The HOL datatype package by default registers any new

   227   datatype in the table of datatypes, which may be inspected using the

   228   @{command print_codesetup} command.

   230   In some cases, it is appropriate to alter or extend this table.  As

   231   an example, we will develop an alternative representation of the

   232   queue example given in \secref{sec:intro}.  The amortised

   233   representation is convenient for generating code but exposes its

   234   \qt{implementation} details, which may be cumbersome when proving

   235   theorems about it.  Therefore, here a simple, straightforward

   236   representation of queues:

   237 *}

   239 datatype %quote 'a queue = Queue "'a list"

   241 definition %quote empty :: "'a queue" where

   242   "empty = Queue []"

   244 primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where

   245   "enqueue x (Queue xs) = Queue (xs @ [x])"

   247 fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where

   248     "dequeue (Queue []) = (None, Queue [])"

   249   | "dequeue (Queue (x # xs)) = (Some x, Queue xs)"

   251 text {*

   252   \noindent This we can use directly for proving;  for executing,

   253   we provide an alternative characterisation:

   254 *}

   256 definition %quote AQueue :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a queue" where

   257   "AQueue xs ys = Queue (ys @ rev xs)"

   259 code_datatype %quote AQueue

   261 text {*

   262   \noindent Here we define a \qt{constructor} @{const "AQueue"} which

   263   is defined in terms of @{text "Queue"} and interprets its arguments

   264   according to what the \emph{content} of an amortised queue is supposed

   265   to be.  Equipped with this, we are able to prove the following equations

   266   for our primitive queue operations which \qt{implement} the simple

   267   queues in an amortised fashion:

   268 *}

   270 lemma %quote empty_AQueue [code]:

   271   "empty = AQueue [] []"

   272   unfolding AQueue_def empty_def by simp

   274 lemma %quote enqueue_AQueue [code]:

   275   "enqueue x (AQueue xs ys) = AQueue (x # xs) ys"

   276   unfolding AQueue_def by simp

   278 lemma %quote dequeue_AQueue [code]:

   279   "dequeue (AQueue xs []) =

   280     (if xs = [] then (None, AQueue [] [])

   281     else dequeue (AQueue [] (rev xs)))"

   282   "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"

   283   unfolding AQueue_def by simp_all

   285 text {*

   286   \noindent For completeness, we provide a substitute for the

   287   @{text case} combinator on queues:

   288 *}

   290 lemma %quote queue_case_AQueue [code]:

   291   "queue_case f (AQueue xs ys) = f (ys @ rev xs)"

   292   unfolding AQueue_def by simp

   294 text {*

   295   \noindent The resulting code looks as expected:

   296 *}

   298 text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}

   300 text {*

   301   \noindent From this example, it can be glimpsed that using own

   302   constructor sets is a little delicate since it changes the set of

   303   valid patterns for values of that type.  Without going into much

   304   detail, here some practical hints:

   306   \begin{itemize}

   308     \item When changing the constructor set for datatypes, take care

   309       to provide alternative equations for the @{text case} combinator.

   311     \item Values in the target language need not to be normalised --

   312       different values in the target language may represent the same

   313       value in the logic.

   315     \item Usually, a good methodology to deal with the subtleties of

   316       pattern matching is to see the type as an abstract type: provide

   317       a set of operations which operate on the concrete representation

   318       of the type, and derive further operations by combinations of

   319       these primitive ones, without relying on a particular

   320       representation.

   322   \end{itemize}

   323 *}

   326 subsection {* Equality and wellsortedness *}

   328 text {*

   329   Surely you have already noticed how equality is treated

   330   by the code generator:

   331 *}

   333 primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where

   334   "collect_duplicates xs ys [] = xs"

   335   | "collect_duplicates xs ys (z#zs) = (if z \<in> set xs

   336       then if z \<in> set ys

   337         then collect_duplicates xs ys zs

   338         else collect_duplicates xs (z#ys) zs

   339       else collect_duplicates (z#xs) (z#ys) zs)"

   341 text {*

   342   \noindent The membership test during preprocessing is rewritten,

   343   resulting in @{const List.member}, which itself

   344   performs an explicit equality check.

   345 *}

   347 text %quote {*@{code_stmts collect_duplicates (SML)}*}

   349 text {*

   350   \noindent Obviously, polymorphic equality is implemented the Haskell

   351   way using a type class.  How is this achieved?  HOL introduces

   352   an explicit class @{class eq} with a corresponding operation

   353   @{const eq_class.eq} such that @{thm eq [no_vars]}.

   354   The preprocessing framework does the rest by propagating the

   355   @{class eq} constraints through all dependent code equations.

   356   For datatypes, instances of @{class eq} are implicitly derived

   357   when possible.  For other types, you may instantiate @{text eq}

   358   manually like any other type class.

   360   Though this @{text eq} class is designed to get rarely in

   361   the way, a subtlety

   362   enters the stage when definitions of overloaded constants

   363   are dependent on operational equality.  For example, let

   364   us define a lexicographic ordering on tuples

   365   (also see theory @{theory Product_ord}):

   366 *}

   368 instantiation %quote "*" :: (order, order) order

   369 begin

   371 definition %quote [code del]:

   372   "x \<le> y \<longleftrightarrow> fst x < fst y \<or> fst x = fst y \<and> snd x \<le> snd y"

   374 definition %quote [code del]:

   375   "x < y \<longleftrightarrow> fst x < fst y \<or> fst x = fst y \<and> snd x < snd y"

   377 instance %quote proof

   378 qed (auto simp: less_eq_prod_def less_prod_def intro: order_less_trans)

   380 end %quote

   382 lemma %quote order_prod [code]:

   383   "(x1 \<Colon> 'a\<Colon>order, y1 \<Colon> 'b\<Colon>order) < (x2, y2) \<longleftrightarrow>

   384      x1 < x2 \<or> x1 = x2 \<and> y1 < y2"

   385   "(x1 \<Colon> 'a\<Colon>order, y1 \<Colon> 'b\<Colon>order) \<le> (x2, y2) \<longleftrightarrow>

   386      x1 < x2 \<or> x1 = x2 \<and> y1 \<le> y2"

   387   by (simp_all add: less_prod_def less_eq_prod_def)

   389 text {*

   390   \noindent Then code generation will fail.  Why?  The definition

   391   of @{term "op \<le>"} depends on equality on both arguments,

   392   which are polymorphic and impose an additional @{class eq}

   393   class constraint, which the preprocessor does not propagate

   394   (for technical reasons).

   396   The solution is to add @{class eq} explicitly to the first sort arguments in the

   397   code theorems:

   398 *}

   400 lemma %quote order_prod_code [code]:

   401   "(x1 \<Colon> 'a\<Colon>{order, eq}, y1 \<Colon> 'b\<Colon>order) < (x2, y2) \<longleftrightarrow>

   402      x1 < x2 \<or> x1 = x2 \<and> y1 < y2"

   403   "(x1 \<Colon> 'a\<Colon>{order, eq}, y1 \<Colon> 'b\<Colon>order) \<le> (x2, y2) \<longleftrightarrow>

   404      x1 < x2 \<or> x1 = x2 \<and> y1 \<le> y2"

   405   by (simp_all add: less_prod_def less_eq_prod_def)

   407 text {*

   408   \noindent Then code generation succeeds:

   409 *}

   411 text %quote {*@{code_stmts "op \<le> \<Colon> _ \<times> _ \<Rightarrow> _ \<times> _ \<Rightarrow> bool" (SML)}*}

   413 text {*

   414   In some cases, the automatically derived code equations

   415   for equality on a particular type may not be appropriate.

   416   As example, watch the following datatype representing

   417   monomorphic parametric types (where type constructors

   418   are referred to by natural numbers):

   419 *}

   421 datatype %quote monotype = Mono nat "monotype list"

   422 (*<*)

   423 lemma monotype_eq:

   424   "eq_class.eq (Mono tyco1 typargs1) (Mono tyco2 typargs2) \<equiv>

   425      eq_class.eq tyco1 tyco2 \<and> eq_class.eq typargs1 typargs2" by (simp add: eq)

   426 (*>*)

   428 text {*

   429   \noindent Then code generation for SML would fail with a message

   430   that the generated code contains illegal mutual dependencies:

   431   the theorem @{thm monotype_eq [no_vars]} already requires the

   432   instance @{text "monotype \<Colon> eq"}, which itself requires

   433   @{thm monotype_eq [no_vars]};  Haskell has no problem with mutually

   434   recursive @{text instance} and @{text function} definitions,

   435   but the SML serialiser does not support this.

   437   In such cases, you have to provide your own equality equations

   438   involving auxiliary constants.  In our case,

   439   @{const [show_types] list_all2} can do the job:

   440 *}

   442 lemma %quote monotype_eq_list_all2 [code]:

   443   "eq_class.eq (Mono tyco1 typargs1) (Mono tyco2 typargs2) \<longleftrightarrow>

   444      eq_class.eq tyco1 tyco2 \<and> list_all2 eq_class.eq typargs1 typargs2"

   445   by (simp add: eq list_all2_eq [symmetric])

   447 text {*

   448   \noindent does not depend on instance @{text "monotype \<Colon> eq"}:

   449 *}

   451 text %quote {*@{code_stmts "eq_class.eq :: monotype \<Rightarrow> monotype \<Rightarrow> bool" (SML)}*}

   454 subsection {* Explicit partiality *}

   456 text {*

   457   Partiality usually enters the game by partial patterns, as

   458   in the following example, again for amortised queues:

   459 *}

   461 definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where

   462   "strict_dequeue q = (case dequeue q

   463     of (Some x, q') \<Rightarrow> (x, q'))"

   465 lemma %quote strict_dequeue_AQueue [code]:

   466   "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)"

   467   "strict_dequeue (AQueue xs []) =

   468     (case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))"

   469   by (simp_all add: strict_dequeue_def dequeue_AQueue split: list.splits)

   471 text {*

   472   \noindent In the corresponding code, there is no equation

   473   for the pattern @{term "AQueue [] []"}:

   474 *}

   476 text %quote {*@{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)}*}

   478 text {*

   479   \noindent In some cases it is desirable to have this

   480   pseudo-\qt{partiality} more explicitly, e.g.~as follows:

   481 *}

   483 axiomatization %quote empty_queue :: 'a

   485 definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where

   486   "strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)"

   488 lemma %quote strict_dequeue'_AQueue [code]:

   489   "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue

   490      else strict_dequeue' (AQueue [] (rev xs)))"

   491   "strict_dequeue' (AQueue xs (y # ys)) =

   492      (y, AQueue xs ys)"

   493   by (simp_all add: strict_dequeue'_def dequeue_AQueue split: list.splits)

   495 text {*

   496   Observe that on the right hand side of the definition of @{const

   497   "strict_dequeue'"} the constant @{const empty_queue} occurs

   498   which is unspecified.

   500   Normally, if constants without any code equations occur in a

   501   program, the code generator complains (since in most cases this is

   502   not what the user expects).  But such constants can also be thought

   503   of as function definitions with no equations which always fail,

   504   since there is never a successful pattern match on the left hand

   505   side.  In order to categorise a constant into that category

   506   explicitly, use @{command "code_abort"}:

   507 *}

   509 code_abort %quote empty_queue

   511 text {*

   512   \noindent Then the code generator will just insert an error or

   513   exception at the appropriate position:

   514 *}

   516 text %quote {*@{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)}*}

   518 text {*

   519   \noindent This feature however is rarely needed in practice.

   520   Note also that the @{text HOL} default setup already declares

   521   @{const undefined} as @{command "code_abort"}, which is most

   522   likely to be used in such situations.

   523 *}

   525 end

author	haftmann
	Tue, 03 Mar 2009 13:20:53 +0100
changeset 30210	853abb4853cc
parent 30209	2f4684e2ea95
child 30938	c6c9359e474c
permissions	-rw-r--r--