5 chapter {* Code generation from @{text "Isabelle/HOL"} theories *}
7 section {* Introduction and Overview *}
10 This tutorial introduces a generic code generator for the
11 @{text Isabelle} system.
12 Generic in the sense that the
13 \qn{target language} for which code shall ultimately be
14 generated is not fixed but may be an arbitrary state-of-the-art
15 functional programming language (currently, the implementation
16 supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell}
17 \cite{haskell-revised-report}).
19 Conceptually the code generator framework is part
20 of Isabelle's @{theory Pure} meta logic framework; the logic
21 @{theory HOL} which is an extension of @{theory Pure}
22 already comes with a reasonable framework setup and thus provides
23 a good working horse for raising code-generation-driven
24 applications. So, we assume some familiarity and experience
25 with the ingredients of the @{theory HOL} distribution theories.
26 (see also \cite{isa-tutorial}).
28 The code generator aims to be usable with no further ado
29 in most cases while allowing for detailed customisation.
30 This manifests in the structure of this tutorial: after a short
31 conceptual introduction with an example (\secref{sec:intro}),
32 we discuss the generic customisation facilities (\secref{sec:program}).
33 A further section (\secref{sec:adaption}) is dedicated to the matter of
34 \qn{adaption} to specific target language environments. After some
35 further issues (\secref{sec:further}) we conclude with an overview
36 of some ML programming interfaces (\secref{sec:ml}).
39 Ultimately, the code generator which this tutorial deals with
40 is supposed to replace the existing code generator
41 by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
42 So, for the moment, there are two distinct code generators
43 in Isabelle. In case of ambiguity, we will refer to the framework
44 described here as @{text "generic code generator"}, to the
45 other as @{text "SML code generator"}.
46 Also note that while the framework itself is
47 object-logic independent, only @{theory HOL} provides a reasonable
53 subsection {* Code generation via shallow embedding \label{sec:intro} *}
56 The key concept for understanding @{text Isabelle}'s code generation is
57 \emph{shallow embedding}, i.e.~logical entities like constants, types and
58 classes are identified with corresponding concepts in the target language.
60 Inside @{theory HOL}, the @{command datatype} and
61 @{command definition}/@{command primrec}/@{command fun} declarations form
62 the core of a functional programming language. The default code generator setup
63 allows to turn those into functional programs immediately.
64 This means that \qt{naive} code generation can proceed without further ado.
65 For example, here a simple \qt{implementation} of amortised queues:
68 datatype %quote 'a queue = AQueue "'a list" "'a list"
70 definition %quote empty :: "'a queue" where
71 "empty = AQueue [] []"
73 primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where
74 "enqueue x (AQueue xs ys) = AQueue (x # xs) ys"
76 fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
77 "dequeue (AQueue [] []) = (None, AQueue [] [])"
78 | "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
79 | "dequeue (AQueue xs []) =
80 (case rev xs of y # ys \<Rightarrow> (Some y, AQueue [] ys))"
82 text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *}
84 export_code %quote empty dequeue enqueue in SML
85 module_name Example file "examples/example.ML"
87 text {* \noindent resulting in the following code: *}
89 text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}
92 \noindent The @{command export_code} command takes a space-separated list of
93 constants for which code shall be generated; anything else needed for those
94 is added implicitly. Then follows a target language identifier
95 (@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name.
96 A file name denotes the destination to store the generated code. Note that
97 the semantics of the destination depends on the target language: for
98 @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell}
99 it denotes a \emph{directory} where a file named as the module name
100 (with extension @{text ".hs"}) is written:
103 export_code %quote empty dequeue enqueue in Haskell
104 module_name Example file "examples/"
107 \noindent This is how the corresponding code in @{text Haskell} looks like:
110 text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*}
113 \noindent This demonstrates the basic usage of the @{command export_code} command;
114 for more details see \secref{sec:further}.
117 subsection {* Code generator architecture \label{sec:concept} *}
120 What you have seen so far should be already enough in a lot of cases. If you
121 are content with this, you can quit reading here. Anyway, in order to customise
122 and adapt the code generator, it is inevitable to gain some understanding
126 \begin{tikzpicture}[x = 4.2cm, y = 1cm]
127 \tikzstyle entity=[rounded corners, draw, thick, color = black, fill = white];
128 \tikzstyle process=[ellipse, draw, thick, color = green, fill = white];
129 \tikzstyle process_arrow=[->, semithick, color = green];
130 \node (HOL) at (0, 4) [style=entity] {@{text "Isabelle/HOL"} theory};
131 \node (eqn) at (2, 2) [style=entity] {code equations};
132 \node (iml) at (2, 0) [style=entity] {intermediate language};
133 \node (seri) at (1, 0) [style=process] {serialisation};
134 \node (SML) at (0, 3) [style=entity] {@{text SML}};
135 \node (OCaml) at (0, 2) [style=entity] {@{text OCaml}};
136 \node (further) at (0, 1) [style=entity] {@{text "\<dots>"}};
137 \node (Haskell) at (0, 0) [style=entity] {@{text Haskell}};
138 \draw [style=process_arrow] (HOL) .. controls (2, 4) ..
139 node [style=process, near start] {selection}
140 node [style=process, near end] {preprocessing}
142 \draw [style=process_arrow] (eqn) -- node (transl) [style=process] {translation} (iml);
143 \draw [style=process_arrow] (iml) -- (seri);
144 \draw [style=process_arrow] (seri) -- (SML);
145 \draw [style=process_arrow] (seri) -- (OCaml);
146 \draw [style=process_arrow, dashed] (seri) -- (further);
147 \draw [style=process_arrow] (seri) -- (Haskell);
149 \caption{Code generator architecture}
153 The code generator employs a notion of executability
154 for three foundational executable ingredients known
155 from functional programming:
156 \emph{code equations}, \emph{datatypes}, and
157 \emph{type classes}. A code equation as a first approximation
158 is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
159 (an equation headed by a constant @{text f} with arguments
160 @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
161 Code generation aims to turn code equations
162 into a functional program. This is achieved by three major
163 components which operate sequentially, i.e. the result of one is
165 of the next in the chain, see diagram \ref{fig:arch}:
169 \item Out of the vast collection of theorems proven in a
170 \qn{theory}, a reasonable subset modelling
171 code equations is \qn{selected}.
173 \item On those selected theorems, certain
174 transformations are carried out
175 (\qn{preprocessing}). Their purpose is to turn theorems
176 representing non- or badly executable
177 specifications into equivalent but executable counterparts.
178 The result is a structured collection of \qn{code theorems}.
180 \item Before the selected code equations are continued with,
181 they can be \qn{preprocessed}, i.e. subjected to theorem
182 transformations. This \qn{preprocessor} is an interface which
184 the full expressiveness of ML-based theorem transformations
185 to code generation; motivating examples are shown below, see
186 \secref{sec:preproc}.
187 The result of the preprocessing step is a structured collection
190 \item These code equations are \qn{translated} to a program
191 in an abstract intermediate language. Think of it as a kind
192 of \qt{Mini-Haskell} with four \qn{statements}: @{text data}
193 (for datatypes), @{text fun} (stemming from code equations),
194 also @{text class} and @{text inst} (for type classes).
196 \item Finally, the abstract program is \qn{serialised} into concrete
197 source code of a target language.
201 \noindent From these steps, only the two last are carried out outside the logic; by
202 keeping this layer as thin as possible, the amount of code to trust is