doc-src/Codegen/Thy/Introduction.thy
author wenzelm
Tue, 07 Apr 2009 21:24:39 +0200
changeset 30881 d15725e84091
parent 30880 257cbe43faa8
child 31050 555b56b66fcf
permissions -rw-r--r--
moved generated eps/pdf to main directory, for proper display in dvi;
haftmann@28213
     1
theory Introduction
haftmann@28213
     2
imports Setup
haftmann@28213
     3
begin
haftmann@28213
     4
haftmann@28213
     5
section {* Introduction and Overview *}
haftmann@28213
     6
haftmann@28213
     7
text {*
haftmann@28213
     8
  This tutorial introduces a generic code generator for the
haftmann@28419
     9
  @{text Isabelle} system.
haftmann@28419
    10
  Generic in the sense that the
haftmann@28419
    11
  \qn{target language} for which code shall ultimately be
haftmann@28419
    12
  generated is not fixed but may be an arbitrary state-of-the-art
haftmann@28419
    13
  functional programming language (currently, the implementation
haftmann@28419
    14
  supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell}
haftmann@28419
    15
  \cite{haskell-revised-report}).
haftmann@28419
    16
haftmann@28419
    17
  Conceptually the code generator framework is part
haftmann@28428
    18
  of Isabelle's @{theory Pure} meta logic framework; the logic
haftmann@28428
    19
  @{theory HOL} which is an extension of @{theory Pure}
haftmann@28419
    20
  already comes with a reasonable framework setup and thus provides
haftmann@28419
    21
  a good working horse for raising code-generation-driven
haftmann@28419
    22
  applications.  So, we assume some familiarity and experience
haftmann@28428
    23
  with the ingredients of the @{theory HOL} distribution theories.
haftmann@28419
    24
  (see also \cite{isa-tutorial}).
haftmann@28419
    25
haftmann@28419
    26
  The code generator aims to be usable with no further ado
haftmann@28419
    27
  in most cases while allowing for detailed customisation.
haftmann@28419
    28
  This manifests in the structure of this tutorial: after a short
haftmann@28447
    29
  conceptual introduction with an example (\secref{sec:intro}),
haftmann@28447
    30
  we discuss the generic customisation facilities (\secref{sec:program}).
haftmann@28447
    31
  A further section (\secref{sec:adaption}) is dedicated to the matter of
haftmann@28419
    32
  \qn{adaption} to specific target language environments.  After some
haftmann@28447
    33
  further issues (\secref{sec:further}) we conclude with an overview
haftmann@28447
    34
  of some ML programming interfaces (\secref{sec:ml}).
haftmann@28419
    35
haftmann@28419
    36
  \begin{warn}
haftmann@28419
    37
    Ultimately, the code generator which this tutorial deals with
haftmann@28447
    38
    is supposed to replace the existing code generator
haftmann@28419
    39
    by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
haftmann@28419
    40
    So, for the moment, there are two distinct code generators
haftmann@28447
    41
    in Isabelle.  In case of ambiguity, we will refer to the framework
haftmann@28447
    42
    described here as @{text "generic code generator"}, to the
haftmann@28447
    43
    other as @{text "SML code generator"}.
haftmann@28419
    44
    Also note that while the framework itself is
haftmann@28428
    45
    object-logic independent, only @{theory HOL} provides a reasonable
haftmann@28419
    46
    framework setup.    
haftmann@28419
    47
  \end{warn}
haftmann@28419
    48
haftmann@28213
    49
*}
haftmann@28213
    50
haftmann@28419
    51
subsection {* Code generation via shallow embedding \label{sec:intro} *}
haftmann@28213
    52
haftmann@28419
    53
text {*
haftmann@28419
    54
  The key concept for understanding @{text Isabelle}'s code generation is
haftmann@28419
    55
  \emph{shallow embedding}, i.e.~logical entities like constants, types and
haftmann@28419
    56
  classes are identified with corresponding concepts in the target language.
haftmann@28419
    57
haftmann@28428
    58
  Inside @{theory HOL}, the @{command datatype} and
haftmann@28419
    59
  @{command definition}/@{command primrec}/@{command fun} declarations form
haftmann@28419
    60
  the core of a functional programming language.  The default code generator setup
haftmann@28419
    61
  allows to turn those into functional programs immediately.
haftmann@28419
    62
  This means that \qt{naive} code generation can proceed without further ado.
haftmann@28419
    63
  For example, here a simple \qt{implementation} of amortised queues:
haftmann@28419
    64
*}
haftmann@28419
    65
haftmann@29735
    66
datatype %quote 'a queue = AQueue "'a list" "'a list"
haftmann@28419
    67
haftmann@28564
    68
definition %quote empty :: "'a queue" where
haftmann@29735
    69
  "empty = AQueue [] []"
haftmann@28419
    70
haftmann@28564
    71
primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where
haftmann@29735
    72
  "enqueue x (AQueue xs ys) = AQueue (x # xs) ys"
haftmann@28419
    73
haftmann@28564
    74
fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
haftmann@29735
    75
    "dequeue (AQueue [] []) = (None, AQueue [] [])"
haftmann@29735
    76
  | "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
haftmann@29735
    77
  | "dequeue (AQueue xs []) =
haftmann@29735
    78
      (case rev xs of y # ys \<Rightarrow> (Some y, AQueue [] ys))"
haftmann@28419
    79
haftmann@28419
    80
text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *}
haftmann@28419
    81
haftmann@28564
    82
export_code %quote empty dequeue enqueue in SML
haftmann@28447
    83
  module_name Example file "examples/example.ML"
haftmann@28419
    84
haftmann@28419
    85
text {* \noindent resulting in the following code: *}
haftmann@28419
    86
haftmann@28564
    87
text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}
haftmann@28419
    88
haftmann@28419
    89
text {*
haftmann@28419
    90
  \noindent The @{command export_code} command takes a space-separated list of
haftmann@28419
    91
  constants for which code shall be generated;  anything else needed for those
haftmann@28447
    92
  is added implicitly.  Then follows a target language identifier
haftmann@28419
    93
  (@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name.
haftmann@28419
    94
  A file name denotes the destination to store the generated code.  Note that
haftmann@28419
    95
  the semantics of the destination depends on the target language:  for
haftmann@28419
    96
  @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell}
haftmann@28419
    97
  it denotes a \emph{directory} where a file named as the module name
haftmann@28419
    98
  (with extension @{text ".hs"}) is written:
haftmann@28419
    99
*}
haftmann@28419
   100
haftmann@28564
   101
export_code %quote empty dequeue enqueue in Haskell
haftmann@28447
   102
  module_name Example file "examples/"
haftmann@28419
   103
haftmann@28419
   104
text {*
haftmann@28419
   105
  \noindent This is how the corresponding code in @{text Haskell} looks like:
haftmann@28419
   106
*}
haftmann@28419
   107
haftmann@28564
   108
text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*}
haftmann@28419
   109
haftmann@28419
   110
text {*
haftmann@28419
   111
  \noindent This demonstrates the basic usage of the @{command export_code} command;
haftmann@28447
   112
  for more details see \secref{sec:further}.
haftmann@28419
   113
*}
haftmann@28213
   114
haftmann@28447
   115
subsection {* Code generator architecture \label{sec:concept} *}
haftmann@28213
   116
haftmann@28419
   117
text {*
haftmann@28419
   118
  What you have seen so far should be already enough in a lot of cases.  If you
haftmann@28419
   119
  are content with this, you can quit reading here.  Anyway, in order to customise
haftmann@28419
   120
  and adapt the code generator, it is inevitable to gain some understanding
haftmann@28419
   121
  how it works.
haftmann@28419
   122
haftmann@28419
   123
  \begin{figure}[h]
wenzelm@30881
   124
    \includegraphics{architecture}
haftmann@28419
   125
    \caption{Code generator architecture}
haftmann@28419
   126
    \label{fig:arch}
haftmann@28419
   127
  \end{figure}
haftmann@28419
   128
haftmann@28419
   129
  The code generator employs a notion of executability
haftmann@28419
   130
  for three foundational executable ingredients known
haftmann@28419
   131
  from functional programming:
haftmann@29560
   132
  \emph{code equations}, \emph{datatypes}, and
haftmann@29560
   133
  \emph{type classes}.  A code equation as a first approximation
haftmann@28419
   134
  is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
haftmann@28419
   135
  (an equation headed by a constant @{text f} with arguments
haftmann@28419
   136
  @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
haftmann@29560
   137
  Code generation aims to turn code equations
haftmann@28447
   138
  into a functional program.  This is achieved by three major
haftmann@28447
   139
  components which operate sequentially, i.e. the result of one is
haftmann@28447
   140
  the input
haftmann@30880
   141
  of the next in the chain,  see figure \ref{fig:arch}:
haftmann@28419
   142
haftmann@28419
   143
  \begin{itemize}
haftmann@28419
   144
haftmann@30836
   145
    \item Starting point is a collection of raw code equations in a
haftmann@30836
   146
      theory; due to proof irrelevance it is not relevant where they
haftmann@30836
   147
      stem from but typically they are either descendant of specification
haftmann@30836
   148
      tools or explicit proofs by the user.
haftmann@30836
   149
      
haftmann@30836
   150
    \item Before these raw code equations are continued
haftmann@30836
   151
      with, they can be subjected to theorem transformations.  This
haftmann@30836
   152
      \qn{preprocessor} is an interface which allows to apply the full
haftmann@30836
   153
      expressiveness of ML-based theorem transformations to code
haftmann@30836
   154
      generation.  The result of the preprocessing step is a
haftmann@30836
   155
      structured collection of code equations.
haftmann@28419
   156
haftmann@30836
   157
    \item These code equations are \qn{translated} to a program in an
haftmann@30836
   158
      abstract intermediate language.  Think of it as a kind
haftmann@28447
   159
      of \qt{Mini-Haskell} with four \qn{statements}: @{text data}
haftmann@29560
   160
      (for datatypes), @{text fun} (stemming from code equations),
haftmann@28447
   161
      also @{text class} and @{text inst} (for type classes).
haftmann@28419
   162
haftmann@28419
   163
    \item Finally, the abstract program is \qn{serialised} into concrete
haftmann@28419
   164
      source code of a target language.
haftmann@30836
   165
      This step only produces concrete syntax but does not change the
haftmann@30836
   166
      program in essence; all conceptual transformations occur in the
haftmann@30836
   167
      translation step.
haftmann@28419
   168
haftmann@28419
   169
  \end{itemize}
haftmann@28419
   170
haftmann@28419
   171
  \noindent From these steps, only the two last are carried out outside the logic;  by
haftmann@28419
   172
  keeping this layer as thin as possible, the amount of code to trust is
haftmann@28419
   173
  kept to a minimum.
haftmann@28419
   174
*}
haftmann@28213
   175
haftmann@28213
   176
end