5 section {* Introduction and Overview *}
8 This tutorial introduces a generic code generator for the
9 @{text Isabelle} system.
11 \qn{target language} for which code is
12 generated is not fixed, but may be one of several
13 functional programming languages (currently, the implementation
14 supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell}
15 \cite{haskell-revised-report}).
17 Conceptually the code generator framework is part
18 of Isabelle's @{theory Pure} meta logic framework; the logic
19 @{theory HOL} \cite{isa-tutorial}, which is an extension of @{theory Pure},
20 already comes with a reasonable framework setup and thus provides
21 a good basis for creating code-generation-driven
22 applications. So, we assume some familiarity and experience
23 with the ingredients of the @{theory HOL} distribution theories.
25 The code generator aims to be usable with no further ado
26 in most cases, while allowing for detailed customisation.
27 This can be seen in the structure of this tutorial: after a short
28 conceptual introduction with an example (\secref{sec:intro}),
29 we discuss the generic customisation facilities (\secref{sec:program}).
30 A further section (\secref{sec:adaptation}) is dedicated to the matter of
31 \qn{adaptation} to specific target language environments. After some
32 further issues (\secref{sec:further}) we conclude with an overview
33 of some ML programming interfaces (\secref{sec:ml}).
36 Ultimately, the code generator which this tutorial deals with
37 is supposed to replace the existing code generator
38 by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
39 So, for the moment, there are two distinct code generators
40 in Isabelle. In case of ambiguity, we will refer to the framework
41 described here as @{text "generic code generator"}, to the
42 other as @{text "SML code generator"}.
43 Also note that while the framework itself is
44 object-logic independent, only @{theory HOL} provides a reasonable
50 subsection {* Code generation via shallow embedding \label{sec:intro} *}
53 The key concept for understanding @{text Isabelle}'s code generation is
54 \emph{shallow embedding}, i.e.~logical entities like constants, types and
55 classes are identified with corresponding concepts in the target language.
57 Inside @{theory HOL}, the @{command datatype} and
58 @{command definition}/@{command primrec}/@{command fun} declarations form
59 the core of a functional programming language. The default code generator setup
60 transforms those into functional programs immediately.
61 This means that \qt{naive} code generation can proceed without further ado.
62 For example, here a simple \qt{implementation} of amortised queues:
65 datatype %quote 'a queue = AQueue "'a list" "'a list"
67 definition %quote empty :: "'a queue" where
68 "empty = AQueue [] []"
70 primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where
71 "enqueue x (AQueue xs ys) = AQueue (x # xs) ys"
73 fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
74 "dequeue (AQueue [] []) = (None, AQueue [] [])"
75 | "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
76 | "dequeue (AQueue xs []) =
77 (case rev xs of y # ys \<Rightarrow> (Some y, AQueue [] ys))"
79 text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *}
81 export_code %quote empty dequeue enqueue in SML
82 module_name Example file "examples/example.ML"
84 text {* \noindent resulting in the following code: *}
86 text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}
89 \noindent The @{command export_code} command takes a space-separated list of
90 constants for which code shall be generated; anything else needed for those
91 is added implicitly. Then follows a target language identifier
92 (@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name.
93 A file name denotes the destination to store the generated code. Note that
94 the semantics of the destination depends on the target language: for
95 @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell}
96 it denotes a \emph{directory} where a file named as the module name
97 (with extension @{text ".hs"}) is written:
100 export_code %quote empty dequeue enqueue in Haskell
101 module_name Example file "examples/"
104 \noindent This is the corresponding code in @{text Haskell}:
107 text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*}
110 \noindent This demonstrates the basic usage of the @{command export_code} command;
111 for more details see \secref{sec:further}.
114 subsection {* If something utterly fails *}
117 Under certain circumstances, the code generator fails to produce
122 \ditem{generate only one module}
124 \ditem{check with a different target language}
126 \ditem{inspect code equations}
128 \ditem{inspect preprocessor setup}
130 \ditem{generate exceptions}
132 \ditem{remove offending code equations}
137 subsection {* Code generator architecture \label{sec:concept} *}
140 What you have seen so far should be already enough in a lot of cases. If you
141 are content with this, you can quit reading here. Anyway, in order to customise
142 and adapt the code generator, it is necessary to gain some understanding
146 \includegraphics{architecture}
147 \caption{Code generator architecture}
151 The code generator employs a notion of executability
152 for three foundational executable ingredients known
153 from functional programming:
154 \emph{code equations}, \emph{datatypes}, and
155 \emph{type classes}. A code equation as a first approximation
156 is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
157 (an equation headed by a constant @{text f} with arguments
158 @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
159 Code generation aims to turn code equations
160 into a functional program. This is achieved by three major
161 components which operate sequentially, i.e. the result of one is
163 of the next in the chain, see figure \ref{fig:arch}:
167 \item The starting point is a collection of raw code equations in a
168 theory. It is not relevant where they
169 stem from, but typically they were either produced by specification
170 tools or proved explicitly by the user.
172 \item These raw code equations can be subjected to theorem transformations. This
173 \qn{preprocessor} can apply the full
174 expressiveness of ML-based theorem transformations to code
175 generation. The result of preprocessing is a
176 structured collection of code equations.
178 \item These code equations are \qn{translated} to a program in an
179 abstract intermediate language. Think of it as a kind
180 of \qt{Mini-Haskell} with four \qn{statements}: @{text data}
181 (for datatypes), @{text fun} (stemming from code equations),
182 also @{text class} and @{text inst} (for type classes).
184 \item Finally, the abstract program is \qn{serialised} into concrete
185 source code of a target language.
186 This step only produces concrete syntax but does not change the
187 program in essence; all conceptual transformations occur in the
192 \noindent From these steps, only the two last are carried out outside the logic; by
193 keeping this layer as thin as possible, the amount of code to trust is