doc-src/Codegen/Thy/Introduction.thy
author haftmann
Fri, 13 Aug 2010 13:43:54 +0200
changeset 38628 58fc3a3af71f
parent 34155 14aaccb399b3
child 38631 7935b334893e
permissions -rw-r--r--
added stub "If something utterly fails"
     1 theory Introduction
     2 imports Setup
     3 begin
     4 
     5 section {* Introduction and Overview *}
     6 
     7 text {*
     8   This tutorial introduces a generic code generator for the
     9   @{text Isabelle} system.
    10   The
    11   \qn{target language} for which code is
    12   generated is not fixed, but may be one of several
    13   functional programming languages (currently, the implementation
    14   supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell}
    15   \cite{haskell-revised-report}).
    16 
    17   Conceptually the code generator framework is part
    18   of Isabelle's @{theory Pure} meta logic framework; the logic
    19   @{theory HOL} \cite{isa-tutorial}, which is an extension of @{theory Pure},
    20   already comes with a reasonable framework setup and thus provides
    21   a good basis for creating code-generation-driven
    22   applications.  So, we assume some familiarity and experience
    23   with the ingredients of the @{theory HOL} distribution theories.
    24 
    25   The code generator aims to be usable with no further ado
    26   in most cases, while allowing for detailed customisation.
    27   This can be seen in the structure of this tutorial: after a short
    28   conceptual introduction with an example (\secref{sec:intro}),
    29   we discuss the generic customisation facilities (\secref{sec:program}).
    30   A further section (\secref{sec:adaptation}) is dedicated to the matter of
    31   \qn{adaptation} to specific target language environments.  After some
    32   further issues (\secref{sec:further}) we conclude with an overview
    33   of some ML programming interfaces (\secref{sec:ml}).
    34 
    35   \begin{warn}
    36     Ultimately, the code generator which this tutorial deals with
    37     is supposed to replace the existing code generator
    38     by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
    39     So, for the moment, there are two distinct code generators
    40     in Isabelle.  In case of ambiguity, we will refer to the framework
    41     described here as @{text "generic code generator"}, to the
    42     other as @{text "SML code generator"}.
    43     Also note that while the framework itself is
    44     object-logic independent, only @{theory HOL} provides a reasonable
    45     framework setup.    
    46   \end{warn}
    47 
    48 *}
    49 
    50 subsection {* Code generation via shallow embedding \label{sec:intro} *}
    51 
    52 text {*
    53   The key concept for understanding @{text Isabelle}'s code generation is
    54   \emph{shallow embedding}, i.e.~logical entities like constants, types and
    55   classes are identified with corresponding concepts in the target language.
    56 
    57   Inside @{theory HOL}, the @{command datatype} and
    58   @{command definition}/@{command primrec}/@{command fun} declarations form
    59   the core of a functional programming language.  The default code generator setup
    60    transforms those into functional programs immediately.
    61   This means that \qt{naive} code generation can proceed without further ado.
    62   For example, here a simple \qt{implementation} of amortised queues:
    63 *}
    64 
    65 datatype %quote 'a queue = AQueue "'a list" "'a list"
    66 
    67 definition %quote empty :: "'a queue" where
    68   "empty = AQueue [] []"
    69 
    70 primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where
    71   "enqueue x (AQueue xs ys) = AQueue (x # xs) ys"
    72 
    73 fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
    74     "dequeue (AQueue [] []) = (None, AQueue [] [])"
    75   | "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
    76   | "dequeue (AQueue xs []) =
    77       (case rev xs of y # ys \<Rightarrow> (Some y, AQueue [] ys))"
    78 
    79 text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *}
    80 
    81 export_code %quote empty dequeue enqueue in SML
    82   module_name Example file "examples/example.ML"
    83 
    84 text {* \noindent resulting in the following code: *}
    85 
    86 text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}
    87 
    88 text {*
    89   \noindent The @{command export_code} command takes a space-separated list of
    90   constants for which code shall be generated;  anything else needed for those
    91   is added implicitly.  Then follows a target language identifier
    92   (@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name.
    93   A file name denotes the destination to store the generated code.  Note that
    94   the semantics of the destination depends on the target language:  for
    95   @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell}
    96   it denotes a \emph{directory} where a file named as the module name
    97   (with extension @{text ".hs"}) is written:
    98 *}
    99 
   100 export_code %quote empty dequeue enqueue in Haskell
   101   module_name Example file "examples/"
   102 
   103 text {*
   104   \noindent This is the corresponding code in @{text Haskell}:
   105 *}
   106 
   107 text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*}
   108 
   109 text {*
   110   \noindent This demonstrates the basic usage of the @{command export_code} command;
   111   for more details see \secref{sec:further}.
   112 *}
   113 
   114 subsection {* If something utterly fails *}
   115 
   116 text {*
   117   Under certain circumstances, the code generator fails to produce
   118   code entirely.  
   119 
   120   \begin{description}
   121 
   122     \ditem{generate only one module}
   123 
   124     \ditem{check with a different target language}
   125 
   126     \ditem{inspect code equations}
   127 
   128     \ditem{inspect preprocessor setup}
   129 
   130     \ditem{generate exceptions}
   131 
   132     \ditem{remove offending code equations}
   133 
   134   \end{description}
   135 *}
   136 
   137 subsection {* Code generator architecture \label{sec:concept} *}
   138 
   139 text {*
   140   What you have seen so far should be already enough in a lot of cases.  If you
   141   are content with this, you can quit reading here.  Anyway, in order to customise
   142   and adapt the code generator, it is necessary to gain some understanding
   143   how it works.
   144 
   145   \begin{figure}[h]
   146     \includegraphics{architecture}
   147     \caption{Code generator architecture}
   148     \label{fig:arch}
   149   \end{figure}
   150 
   151   The code generator employs a notion of executability
   152   for three foundational executable ingredients known
   153   from functional programming:
   154   \emph{code equations}, \emph{datatypes}, and
   155   \emph{type classes}.  A code equation as a first approximation
   156   is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
   157   (an equation headed by a constant @{text f} with arguments
   158   @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
   159   Code generation aims to turn code equations
   160   into a functional program.  This is achieved by three major
   161   components which operate sequentially, i.e. the result of one is
   162   the input
   163   of the next in the chain,  see figure \ref{fig:arch}:
   164 
   165   \begin{itemize}
   166 
   167     \item The starting point is a collection of raw code equations in a
   168       theory. It is not relevant where they
   169       stem from, but typically they were either produced by specification
   170       tools or proved explicitly by the user.
   171       
   172     \item These raw code equations can be subjected to theorem transformations.  This
   173       \qn{preprocessor} can apply the full
   174       expressiveness of ML-based theorem transformations to code
   175       generation.  The result of preprocessing is a
   176       structured collection of code equations.
   177 
   178     \item These code equations are \qn{translated} to a program in an
   179       abstract intermediate language.  Think of it as a kind
   180       of \qt{Mini-Haskell} with four \qn{statements}: @{text data}
   181       (for datatypes), @{text fun} (stemming from code equations),
   182       also @{text class} and @{text inst} (for type classes).
   183 
   184     \item Finally, the abstract program is \qn{serialised} into concrete
   185       source code of a target language.
   186       This step only produces concrete syntax but does not change the
   187       program in essence; all conceptual transformations occur in the
   188       translation step.
   189 
   190   \end{itemize}
   191 
   192   \noindent From these steps, only the two last are carried out outside the logic;  by
   193   keeping this layer as thin as possible, the amount of code to trust is
   194   kept to a minimum.
   195 *}
   196 
   197 end