doc-src/Codegen/Thy/Introduction.thy
author haftmann
Tue, 03 Mar 2009 11:00:51 +0100
changeset 30209 2f4684e2ea95
parent 29735 doc-src/IsarAdvanced/Codegen/Thy/Introduction.thy@6df726203e39
child 30210 853abb4853cc
permissions -rw-r--r--
more canonical directory structure of manuals
     1 theory Introduction
     2 imports Setup
     3 begin
     4 
     5 chapter {* Code generation from @{text "Isabelle/HOL"} theories *}
     6 
     7 section {* Introduction and Overview *}
     8 
     9 text {*
    10   This tutorial introduces a generic code generator for the
    11   @{text Isabelle} system.
    12   Generic in the sense that the
    13   \qn{target language} for which code shall ultimately be
    14   generated is not fixed but may be an arbitrary state-of-the-art
    15   functional programming language (currently, the implementation
    16   supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell}
    17   \cite{haskell-revised-report}).
    18 
    19   Conceptually the code generator framework is part
    20   of Isabelle's @{theory Pure} meta logic framework; the logic
    21   @{theory HOL} which is an extension of @{theory Pure}
    22   already comes with a reasonable framework setup and thus provides
    23   a good working horse for raising code-generation-driven
    24   applications.  So, we assume some familiarity and experience
    25   with the ingredients of the @{theory HOL} distribution theories.
    26   (see also \cite{isa-tutorial}).
    27 
    28   The code generator aims to be usable with no further ado
    29   in most cases while allowing for detailed customisation.
    30   This manifests in the structure of this tutorial: after a short
    31   conceptual introduction with an example (\secref{sec:intro}),
    32   we discuss the generic customisation facilities (\secref{sec:program}).
    33   A further section (\secref{sec:adaption}) is dedicated to the matter of
    34   \qn{adaption} to specific target language environments.  After some
    35   further issues (\secref{sec:further}) we conclude with an overview
    36   of some ML programming interfaces (\secref{sec:ml}).
    37 
    38   \begin{warn}
    39     Ultimately, the code generator which this tutorial deals with
    40     is supposed to replace the existing code generator
    41     by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
    42     So, for the moment, there are two distinct code generators
    43     in Isabelle.  In case of ambiguity, we will refer to the framework
    44     described here as @{text "generic code generator"}, to the
    45     other as @{text "SML code generator"}.
    46     Also note that while the framework itself is
    47     object-logic independent, only @{theory HOL} provides a reasonable
    48     framework setup.    
    49   \end{warn}
    50 
    51 *}
    52 
    53 subsection {* Code generation via shallow embedding \label{sec:intro} *}
    54 
    55 text {*
    56   The key concept for understanding @{text Isabelle}'s code generation is
    57   \emph{shallow embedding}, i.e.~logical entities like constants, types and
    58   classes are identified with corresponding concepts in the target language.
    59 
    60   Inside @{theory HOL}, the @{command datatype} and
    61   @{command definition}/@{command primrec}/@{command fun} declarations form
    62   the core of a functional programming language.  The default code generator setup
    63   allows to turn those into functional programs immediately.
    64   This means that \qt{naive} code generation can proceed without further ado.
    65   For example, here a simple \qt{implementation} of amortised queues:
    66 *}
    67 
    68 datatype %quote 'a queue = AQueue "'a list" "'a list"
    69 
    70 definition %quote empty :: "'a queue" where
    71   "empty = AQueue [] []"
    72 
    73 primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where
    74   "enqueue x (AQueue xs ys) = AQueue (x # xs) ys"
    75 
    76 fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
    77     "dequeue (AQueue [] []) = (None, AQueue [] [])"
    78   | "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
    79   | "dequeue (AQueue xs []) =
    80       (case rev xs of y # ys \<Rightarrow> (Some y, AQueue [] ys))"
    81 
    82 text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *}
    83 
    84 export_code %quote empty dequeue enqueue in SML
    85   module_name Example file "examples/example.ML"
    86 
    87 text {* \noindent resulting in the following code: *}
    88 
    89 text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}
    90 
    91 text {*
    92   \noindent The @{command export_code} command takes a space-separated list of
    93   constants for which code shall be generated;  anything else needed for those
    94   is added implicitly.  Then follows a target language identifier
    95   (@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name.
    96   A file name denotes the destination to store the generated code.  Note that
    97   the semantics of the destination depends on the target language:  for
    98   @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell}
    99   it denotes a \emph{directory} where a file named as the module name
   100   (with extension @{text ".hs"}) is written:
   101 *}
   102 
   103 export_code %quote empty dequeue enqueue in Haskell
   104   module_name Example file "examples/"
   105 
   106 text {*
   107   \noindent This is how the corresponding code in @{text Haskell} looks like:
   108 *}
   109 
   110 text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*}
   111 
   112 text {*
   113   \noindent This demonstrates the basic usage of the @{command export_code} command;
   114   for more details see \secref{sec:further}.
   115 *}
   116 
   117 subsection {* Code generator architecture \label{sec:concept} *}
   118 
   119 text {*
   120   What you have seen so far should be already enough in a lot of cases.  If you
   121   are content with this, you can quit reading here.  Anyway, in order to customise
   122   and adapt the code generator, it is inevitable to gain some understanding
   123   how it works.
   124 
   125   \begin{figure}[h]
   126     \begin{tikzpicture}[x = 4.2cm, y = 1cm]
   127       \tikzstyle entity=[rounded corners, draw, thick, color = black, fill = white];
   128       \tikzstyle process=[ellipse, draw, thick, color = green, fill = white];
   129       \tikzstyle process_arrow=[->, semithick, color = green];
   130       \node (HOL) at (0, 4) [style=entity] {@{text "Isabelle/HOL"} theory};
   131       \node (eqn) at (2, 2) [style=entity] {code equations};
   132       \node (iml) at (2, 0) [style=entity] {intermediate language};
   133       \node (seri) at (1, 0) [style=process] {serialisation};
   134       \node (SML) at (0, 3) [style=entity] {@{text SML}};
   135       \node (OCaml) at (0, 2) [style=entity] {@{text OCaml}};
   136       \node (further) at (0, 1) [style=entity] {@{text "\<dots>"}};
   137       \node (Haskell) at (0, 0) [style=entity] {@{text Haskell}};
   138       \draw [style=process_arrow] (HOL) .. controls (2, 4) ..
   139         node [style=process, near start] {selection}
   140         node [style=process, near end] {preprocessing}
   141         (eqn);
   142       \draw [style=process_arrow] (eqn) -- node (transl) [style=process] {translation} (iml);
   143       \draw [style=process_arrow] (iml) -- (seri);
   144       \draw [style=process_arrow] (seri) -- (SML);
   145       \draw [style=process_arrow] (seri) -- (OCaml);
   146       \draw [style=process_arrow, dashed] (seri) -- (further);
   147       \draw [style=process_arrow] (seri) -- (Haskell);
   148     \end{tikzpicture}
   149     \caption{Code generator architecture}
   150     \label{fig:arch}
   151   \end{figure}
   152 
   153   The code generator employs a notion of executability
   154   for three foundational executable ingredients known
   155   from functional programming:
   156   \emph{code equations}, \emph{datatypes}, and
   157   \emph{type classes}.  A code equation as a first approximation
   158   is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
   159   (an equation headed by a constant @{text f} with arguments
   160   @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
   161   Code generation aims to turn code equations
   162   into a functional program.  This is achieved by three major
   163   components which operate sequentially, i.e. the result of one is
   164   the input
   165   of the next in the chain,  see diagram \ref{fig:arch}:
   166 
   167   \begin{itemize}
   168 
   169     \item Out of the vast collection of theorems proven in a
   170       \qn{theory}, a reasonable subset modelling
   171       code equations is \qn{selected}.
   172 
   173     \item On those selected theorems, certain
   174       transformations are carried out
   175       (\qn{preprocessing}).  Their purpose is to turn theorems
   176       representing non- or badly executable
   177       specifications into equivalent but executable counterparts.
   178       The result is a structured collection of \qn{code theorems}.
   179 
   180     \item Before the selected code equations are continued with,
   181       they can be \qn{preprocessed}, i.e. subjected to theorem
   182       transformations.  This \qn{preprocessor} is an interface which
   183       allows to apply
   184       the full expressiveness of ML-based theorem transformations
   185       to code generation;  motivating examples are shown below, see
   186       \secref{sec:preproc}.
   187       The result of the preprocessing step is a structured collection
   188       of code equations.
   189 
   190     \item These code equations are \qn{translated} to a program
   191       in an abstract intermediate language.  Think of it as a kind
   192       of \qt{Mini-Haskell} with four \qn{statements}: @{text data}
   193       (for datatypes), @{text fun} (stemming from code equations),
   194       also @{text class} and @{text inst} (for type classes).
   195 
   196     \item Finally, the abstract program is \qn{serialised} into concrete
   197       source code of a target language.
   198 
   199   \end{itemize}
   200 
   201   \noindent From these steps, only the two last are carried out outside the logic;  by
   202   keeping this layer as thin as possible, the amount of code to trust is
   203   kept to a minimum.
   204 *}
   205 
   206 end