doc-src/TutorialI/fp.tex
author paulson
Fri, 03 Aug 2001 18:04:55 +0200
changeset 11458 09a6c44a48ea
parent 11457 279da0358aa9
child 11646 6a7d80a139c6
permissions -rw-r--r--
numerous stylistic changes and indexing
     1 \chapter{Functional Programming in HOL}
     2 
     3 This chapter describes how to write
     4 functional programs in HOL and how to verify them.  However, 
     5 most of the constructs and
     6 proof procedures introduced are general and recur in any specification
     7 or verification task.  We really should speak of functional
     8 \emph{modelling} rather than functional \emph{programming}: 
     9 our primary aim is not
    10 to write programs but to design abstract models of systems.  HOL is
    11 a specification language that goes well beyond what can be expressed as a
    12 program. However, for the time being we concentrate on the computable.
    13 
    14 If you are a purist functional programmer, please note that all functions
    15 in HOL must be total:
    16 they must terminate for all inputs.  Lazy data structures are not
    17 directly available.
    18 
    19 \section{An Introductory Theory}
    20 \label{sec:intro-theory}
    21 
    22 Functional programming needs datatypes and functions. Both of them can be
    23 defined in a theory with a syntax reminiscent of languages like ML or
    24 Haskell. As an example consider the theory in figure~\ref{fig:ToyList}.
    25 We will now examine it line by line.
    26 
    27 \begin{figure}[htbp]
    28 \begin{ttbox}\makeatother
    29 \input{ToyList2/ToyList1}\end{ttbox}
    30 \caption{A Theory of Lists}
    31 \label{fig:ToyList}
    32 \end{figure}
    33 
    34 \index{*ToyList example|(}
    35 {\makeatother\input{ToyList/document/ToyList.tex}}
    36 
    37 The complete proof script is shown in Fig.\ts\ref{fig:ToyList-proofs}. The
    38 concatenation of Figs.\ts\ref{fig:ToyList} and~\ref{fig:ToyList-proofs}
    39 constitutes the complete theory \texttt{ToyList} and should reside in file
    40 \texttt{ToyList.thy}. It is good practice to present all declarations and
    41 definitions at the beginning of a theory to facilitate browsing.%
    42 \index{*ToyList example|)}
    43 
    44 \begin{figure}[htbp]
    45 \begin{ttbox}\makeatother
    46 \input{ToyList2/ToyList2}\end{ttbox}
    47 \caption{Proofs about Lists}
    48 \label{fig:ToyList-proofs}
    49 \end{figure}
    50 
    51 \subsubsection*{Review}
    52 
    53 This is the end of our toy proof. It should have familiarized you with
    54 \begin{itemize}
    55 \item the standard theorem proving procedure:
    56 state a goal (lemma or theorem); proceed with proof until a separate lemma is
    57 required; prove that lemma; come back to the original goal.
    58 \item a specific procedure that works well for functional programs:
    59 induction followed by all-out simplification via \isa{auto}.
    60 \item a basic repertoire of proof commands.
    61 \end{itemize}
    62 
    63 
    64 \section{Some Helpful Commands}
    65 \label{sec:commands-and-hints}
    66 
    67 This section discusses a few basic commands for manipulating the proof state
    68 and can be skipped by casual readers.
    69 
    70 There are two kinds of commands used during a proof: the actual proof
    71 commands and auxiliary commands for examining the proof state and controlling
    72 the display. Simple proof commands are of the form
    73 \commdx{apply}\isa{(\textit{method})}, where \textit{method} is typically 
    74 \isa{induct_tac} or \isa{auto}.  All such theorem proving operations
    75 are referred to as \bfindex{methods}, and further ones are
    76 introduced throughout the tutorial.  Unless stated otherwise, you may
    77 assume that a method attacks merely the first subgoal. An exception is
    78 \isa{auto}, which tries to solve all subgoals.
    79 
    80 The most useful auxiliary commands are as follows:
    81 \begin{description}
    82 \item[Undoing:] \commdx{undo} undoes the effect of
    83 the
    84   last command; \isacommand{undo} can be undone by
    85   \commdx{redo}.  Both are only needed at the shell
    86   level and should not occur in the final theory.
    87 \item[Printing the current state:] \commdx{pr}
    88 redisplays
    89   the current proof state, for example when it has scrolled past the top of
    90   the screen.
    91 \item[Limiting the number of subgoals:] \isacommand{pr}~$n$ tells Isabelle to
    92   print only the first $n$ subgoals from now on and redisplays the current
    93   proof state. This is helpful when there are many subgoals.
    94 \item[Modifying the order of subgoals:]
    95 \commdx{defer} moves the first subgoal to the end and
    96 \commdx{prefer}~$n$ moves subgoal $n$ to the front.
    97 \item[Printing theorems:]
    98   \commdx{thm}~\textit{name}$@1$~\dots~\textit{name}$@n$
    99   prints the named theorems.
   100 \item[Displaying types:] We have already mentioned the flag
   101   \texttt{show_types} above.\index{*show_types (flag)}
   102   It can also be useful for detecting misspellings in
   103   formulae. For example, if \texttt{show_types} is set and the goal
   104   \isa{rev(rev xs) = xs} is started, Isabelle prints the additional output
   105 \par\noindent
   106 \begin{isabelle}%
   107 Variables:\isanewline
   108 ~~xs~::~'a~list
   109 \end{isabelle}%
   110 \par\noindent
   111 which tells us that Isabelle has correctly inferred that
   112 \isa{xs} is a variable of list type. On the other hand, had we
   113 made a typo as in \isa{rev(re xs) = xs}, the response
   114 \par\noindent
   115 \begin{isabelle}%
   116 Variables:\isanewline
   117 ~~re~::~'a~list~{\isasymRightarrow}~'a~list\isanewline
   118 ~~xs~::~'a~list%
   119 \end{isabelle}%
   120 \par\noindent
   121 would have alerted us because of the unexpected variable \isa{re}.
   122 \item[Reading terms and types:] \commdx{term}
   123   \textit{string} reads, type-checks and prints the given string as a term in
   124   the current context; the inferred type is output as well.
   125   \commdx{typ} \textit{string} reads and prints the given
   126   string as a type in the current context.
   127 \item[(Re)loading theories:] When you start your interaction you must open a
   128   named theory with the line \commdx{theory}~\isa{T~=~\dots~:}. Isabelle
   129   automatically loads all the required parent theories from their respective
   130   files (which may take a moment, unless the theories are already loaded and
   131   the files have not been modified).
   132   
   133   If you suddenly discover that you need to modify a parent theory of your
   134   current theory, you must first abandon your current theory%
   135   \indexbold{abandoning a theory}\indexbold{theories!abandoning} 
   136   (at the shell
   137   level type \commdx{kill}). After the parent theory has
   138   been modified, you go back to your original theory. When its first line
   139   \isa{\isacommand{theory}~T~=~\dots~:} is processed, the
   140   modified parent is reloaded automatically.
   141   
   142 %  The only time when you need to load a theory by hand is when you simply
   143 %  want to check if it loads successfully without wanting to make use of the
   144 %  theory itself. This you can do by typing
   145 %  \isa{\commdx{use\_thy}~"T"}.
   146 \end{description}
   147 Further commands are found in the Isabelle/Isar Reference Manual.
   148 
   149 We now examine Isabelle's functional programming constructs systematically,
   150 starting with inductive datatypes.
   151 
   152 
   153 \section{Datatypes}
   154 \label{sec:datatype}
   155 
   156 \index{datatypes|(}%
   157 Inductive datatypes are part of almost every non-trivial application of HOL.
   158 First we take another look at an important example, the datatype of
   159 lists, before we turn to datatypes in general. The section closes with a
   160 case study.
   161 
   162 
   163 \subsection{Lists}
   164 
   165 \index{*list (type)}%
   166 Lists are one of the essential datatypes in computing.  We expect that you
   167 are already familiar with their basic operations.
   168 Theory \isa{ToyList} is only a small fragment of HOL's predefined theory
   169 \thydx{List}\footnote{\url{http://isabelle.in.tum.de/library/HOL/List.html}}.
   170 The latter contains many further operations. For example, the functions
   171 \cdx{hd} (``head'') and \cdx{tl} (``tail'') return the first
   172 element and the remainder of a list. (However, pattern-matching is usually
   173 preferable to \isa{hd} and \isa{tl}.)  
   174 Also available are higher-order functions like \isa{map} and \isa{filter}.
   175 Theory \isa{List} also contains
   176 more syntactic sugar: \isa{[}$x@1$\isa{,}\dots\isa{,}$x@n$\isa{]} abbreviates
   177 $x@1$\isa{\#}\dots\isa{\#}$x@n$\isa{\#[]}.  In the rest of the tutorial we
   178 always use HOL's predefined lists.
   179 
   180 
   181 \subsection{The General Format}
   182 \label{sec:general-datatype}
   183 
   184 The general HOL \isacommand{datatype} definition is of the form
   185 \[
   186 \isacommand{datatype}~(\alpha@1, \dots, \alpha@n) \, t ~=~
   187 C@1~\tau@{11}~\dots~\tau@{1k@1} ~\mid~ \dots ~\mid~
   188 C@m~\tau@{m1}~\dots~\tau@{mk@m}
   189 \]
   190 where $\alpha@i$ are distinct type variables (the parameters), $C@i$ are distinct
   191 constructor names and $\tau@{ij}$ are types; it is customary to capitalize
   192 the first letter in constructor names. There are a number of
   193 restrictions (such as that the type should not be empty) detailed
   194 elsewhere~\cite{isabelle-HOL}. Isabelle notifies you if you violate them.
   195 
   196 Laws about datatypes, such as \isa{[] \isasymnoteq~x\#xs} and
   197 \isa{(x\#xs = y\#ys) = (x=y \isasymand~xs=ys)}, are used automatically
   198 during proofs by simplification.  The same is true for the equations in
   199 primitive recursive function definitions.
   200 
   201 Every datatype $t$ comes equipped with a \isa{size} function from $t$ into
   202 the natural numbers (see~{\S}\ref{sec:nat} below). For lists, \isa{size} is
   203 just the length, i.e.\ \isa{size [] = 0} and \isa{size(x \# xs) = size xs +
   204   1}.  In general, \cdx{size} returns 
   205 \begin{itemize}
   206 \item zero for all constructors
   207 that do not have an argument of type $t$\\
   208 \item one plus the sum of the sizes of all arguments of type~$t$,
   209 for all other constructors
   210 \end{itemize}
   211 Note that because
   212 \isa{size} is defined on every datatype, it is overloaded; on lists
   213 \isa{size} is also called \sdx{length}, which is not overloaded.
   214 Isabelle will always show \isa{size} on lists as \isa{length}.
   215 
   216 
   217 \subsection{Primitive Recursion}
   218 
   219 \index{recursion!primitive}%
   220 Functions on datatypes are usually defined by recursion. In fact, most of the
   221 time they are defined by what is called \textbf{primitive recursion}.
   222 The keyword \commdx{primrec} is followed by a list of
   223 equations
   224 \[ f \, x@1 \, \dots \, (C \, y@1 \, \dots \, y@k)\, \dots \, x@n = r \]
   225 such that $C$ is a constructor of the datatype $t$ and all recursive calls of
   226 $f$ in $r$ are of the form $f \, \dots \, y@i \, \dots$ for some $i$. Thus
   227 Isabelle immediately sees that $f$ terminates because one (fixed!) argument
   228 becomes smaller with every recursive call. There must be at most one equation
   229 for each constructor.  Their order is immaterial.
   230 A more general method for defining total recursive functions is introduced in
   231 {\S}\ref{sec:recdef}.
   232 
   233 \begin{exercise}\label{ex:Tree}
   234 \input{Misc/document/Tree.tex}%
   235 \end{exercise}
   236 
   237 \input{Misc/document/case_exprs.tex}
   238 
   239 \input{Ifexpr/document/Ifexpr.tex}
   240 \index{datatypes|)}
   241 
   242 
   243 \section{Some Basic Types}
   244 
   245 This section introduces the types of natural numbers and ordered pairs.  Also
   246 described is type \isa{option}, which is useful for modelling exceptional
   247 cases. 
   248 
   249 \subsection{Natural Numbers}
   250 \label{sec:nat}\index{natural numbers}%
   251 \index{linear arithmetic|(}
   252 
   253 \input{Misc/document/fakenat.tex}
   254 \input{Misc/document/natsum.tex}
   255 
   256 \index{linear arithmetic|)}
   257 
   258 
   259 \subsection{Pairs}
   260 \input{Misc/document/pairs.tex}
   261 
   262 \subsection{Datatype {\tt\slshape option}}
   263 \label{sec:option}
   264 \input{Misc/document/Option2.tex}
   265 
   266 \section{Definitions}
   267 \label{sec:Definitions}
   268 
   269 A definition is simply an abbreviation, i.e.\ a new name for an existing
   270 construction. In particular, definitions cannot be recursive. Isabelle offers
   271 definitions on the level of types and terms. Those on the type level are
   272 called \textbf{type synonyms}; those on the term level are simply called 
   273 definitions.
   274 
   275 
   276 \subsection{Type Synonyms}
   277 
   278 \index{type synonyms|(}%
   279 Type synonyms are similar to those found in ML\@. They are created by a 
   280 \commdx{types} command:
   281 
   282 \input{Misc/document/types.tex}%
   283 \index{type synonyms|)}
   284 
   285 \input{Misc/document/prime_def.tex}
   286 
   287 \input{Misc/document/Translations.tex}
   288 
   289 
   290 \section{The Definitional Approach}
   291 \label{sec:definitional}
   292 
   293 \index{Definitional Approach}%
   294 As we pointed out at the beginning of the chapter, asserting arbitrary
   295 axioms such as $f(n) = f(n) + 1$ can easily lead to contradictions. In order
   296 to avoid this danger, we advocate the definitional rather than
   297 the axiomatic approach: introduce new concepts by definitions. However,  Isabelle/HOL seems to
   298 support many richer definitional constructs, such as
   299 \isacommand{primrec}. The point is that Isabelle reduces such constructs to first principles. For example, each
   300 \isacommand{primrec} function definition is turned into a proper
   301 (nonrecursive!) definition from which the user-supplied recursion equations are
   302 automatically proved.  This process is
   303 hidden from the user, who does not have to understand the details.  Other commands described
   304 later, like \isacommand{recdef} and \isacommand{inductive}, work similarly.  
   305 This strict adherence to the definitional approach reduces the risk of 
   306 soundness errors.
   307 
   308 \chapter{More Functional Programming}
   309 
   310 The purpose of this chapter is to deepen your understanding of the
   311 concepts encountered so far and to introduce advanced forms of datatypes and
   312 recursive functions. The first two sections give a structured presentation of
   313 theorem proving by simplification ({\S}\ref{sec:Simplification}) and discuss
   314 important heuristics for induction ({\S}\ref{sec:InductionHeuristics}).  You can
   315 skip them if you are not planning to perform proofs yourself.
   316 We then present a case
   317 study: a compiler for expressions ({\S}\ref{sec:ExprCompiler}). Advanced
   318 datatypes, including those involving function spaces, are covered in
   319 {\S}\ref{sec:advanced-datatypes}; it closes with another case study, search
   320 trees (``tries'').  Finally we introduce \isacommand{recdef}, a general
   321 form of recursive function definition that goes well beyond 
   322 \isacommand{primrec} ({\S}\ref{sec:recdef}).
   323 
   324 
   325 \section{Simplification}
   326 \label{sec:Simplification}
   327 \index{simplification|(}
   328 
   329 So far we have proved our theorems by \isa{auto}, which simplifies
   330 all subgoals. In fact, \isa{auto} can do much more than that. 
   331 To go beyond toy examples, you
   332 need to understand the ingredients of \isa{auto}.  This section covers the
   333 method that \isa{auto} always applies first, simplification.
   334 
   335 Simplification is one of the central theorem proving tools in Isabelle and
   336 many other systems. The tool itself is called the \textbf{simplifier}. 
   337 This section introduces the many features of the simplifier
   338 and is required reading if you intend to perform proofs.  Later on,
   339 {\S}\ref{sec:simplification-II} explains some more advanced features and a
   340 little bit of how the simplifier works. The serious student should read that
   341 section as well, in particular to understand why the simplifier did
   342 something unexpected.
   343 
   344 \subsection{What is Simplification?}
   345 
   346 In its most basic form, simplification means repeated application of
   347 equations from left to right. For example, taking the rules for \isa{\at}
   348 and applying them to the term \isa{[0,1] \at\ []} results in a sequence of
   349 simplification steps:
   350 \begin{ttbox}\makeatother
   351 (0#1#[]) @ []  \(\leadsto\)  0#((1#[]) @ [])  \(\leadsto\)  0#(1#([] @ []))  \(\leadsto\)  0#1#[]
   352 \end{ttbox}
   353 This is also known as \bfindex{term rewriting}\indexbold{rewriting} and the
   354 equations are referred to as \bfindex{rewrite rules}.
   355 ``Rewriting'' is more honest than ``simplification'' because the terms do not
   356 necessarily become simpler in the process.
   357 
   358 The simplifier proves arithmetic goals as described in
   359 {\S}\ref{sec:nat} above.  Arithmetic expressions are simplified using built-in
   360 procedures that go beyond mere rewrite rules.  New simplification procedures
   361 can be coded and installed, but they are definitely not a matter for this
   362 tutorial. 
   363 
   364 \input{Misc/document/simp.tex}
   365 
   366 \index{simplification|)}
   367 
   368 \input{Misc/document/Itrev.tex}
   369 
   370 \begin{exercise}
   371 \input{Misc/document/Tree2.tex}%
   372 \end{exercise}
   373 
   374 \input{CodeGen/document/CodeGen.tex}
   375 
   376 
   377 \section{Advanced Datatypes}
   378 \label{sec:advanced-datatypes}
   379 \index{datatype@\isacommand {datatype} (command)|(}
   380 \index{primrec@\isacommand {primrec} (command)|(}
   381 %|)
   382 
   383 This section presents advanced forms of datatypes: mutual and nested
   384 recursion.  A series of examples will culminate in a treatment of the trie
   385 data structure.
   386 
   387 
   388 \subsection{Mutual Recursion}
   389 \label{sec:datatype-mut-rec}
   390 
   391 \input{Datatype/document/ABexpr.tex}
   392 
   393 \subsection{Nested Recursion}
   394 \label{sec:nested-datatype}
   395 
   396 {\makeatother\input{Datatype/document/Nested.tex}}
   397 
   398 
   399 \subsection{The Limits of Nested Recursion}
   400 
   401 How far can we push nested recursion? By the unfolding argument above, we can
   402 reduce nested to mutual recursion provided the nested recursion only involves
   403 previously defined datatypes. This does not include functions:
   404 \begin{isabelle}
   405 \isacommand{datatype} t = C "t \isasymRightarrow\ bool"
   406 \end{isabelle}
   407 This declaration is a real can of worms.
   408 In HOL it must be ruled out because it requires a type
   409 \isa{t} such that \isa{t} and its power set \isa{t \isasymFun\ bool} have the
   410 same cardinality --- an impossibility. For the same reason it is not possible
   411 to allow recursion involving the type \isa{set}, which is isomorphic to
   412 \isa{t \isasymFun\ bool}.
   413 
   414 Fortunately, a limited form of recursion
   415 involving function spaces is permitted: the recursive type may occur on the
   416 right of a function arrow, but never on the left. Hence the above can of worms
   417 is ruled out but the following example of a potentially 
   418 \index{infinitely branching trees}%
   419 infinitely branching tree is accepted:
   420 \smallskip
   421 
   422 \input{Datatype/document/Fundata.tex}
   423 
   424 If you need nested recursion on the left of a function arrow, there are
   425 alternatives to pure HOL\@.  In the Logic for Computable Functions 
   426 (\rmindex{LCF}), types like
   427 \begin{isabelle}
   428 \isacommand{datatype} lam = C "lam \isasymrightarrow\ lam"
   429 \end{isabelle}
   430 do indeed make sense~\cite{paulson87}.  Note the different arrow,
   431 \isa{\isasymrightarrow} instead of \isa{\isasymRightarrow},
   432 expressing the type of \emph{continuous} functions. 
   433 There is even a version of LCF on top of HOL,
   434 called \rmindex{HOLCF}~\cite{MuellerNvOS99}.
   435 \index{datatype@\isacommand {datatype} (command)|)}
   436 \index{primrec@\isacommand {primrec} (command)|)}
   437 
   438 
   439 \subsection{Case Study: Tries}
   440 \label{sec:Trie}
   441 
   442 \index{tries|(}%
   443 Tries are a classic search tree data structure~\cite{Knuth3-75} for fast
   444 indexing with strings. Figure~\ref{fig:trie} gives a graphical example of a
   445 trie containing the words ``all'', ``an'', ``ape'', ``can'', ``car'' and
   446 ``cat''.  When searching a string in a trie, the letters of the string are
   447 examined sequentially. Each letter determines which subtrie to search next.
   448 In this case study we model tries as a datatype, define a lookup and an
   449 update function, and prove that they behave as expected.
   450 
   451 \begin{figure}[htbp]
   452 \begin{center}
   453 \unitlength1mm
   454 \begin{picture}(60,30)
   455 \put( 5, 0){\makebox(0,0)[b]{l}}
   456 \put(25, 0){\makebox(0,0)[b]{e}}
   457 \put(35, 0){\makebox(0,0)[b]{n}}
   458 \put(45, 0){\makebox(0,0)[b]{r}}
   459 \put(55, 0){\makebox(0,0)[b]{t}}
   460 %
   461 \put( 5, 9){\line(0,-1){5}}
   462 \put(25, 9){\line(0,-1){5}}
   463 \put(44, 9){\line(-3,-2){9}}
   464 \put(45, 9){\line(0,-1){5}}
   465 \put(46, 9){\line(3,-2){9}}
   466 %
   467 \put( 5,10){\makebox(0,0)[b]{l}}
   468 \put(15,10){\makebox(0,0)[b]{n}}
   469 \put(25,10){\makebox(0,0)[b]{p}}
   470 \put(45,10){\makebox(0,0)[b]{a}}
   471 %
   472 \put(14,19){\line(-3,-2){9}}
   473 \put(15,19){\line(0,-1){5}}
   474 \put(16,19){\line(3,-2){9}}
   475 \put(45,19){\line(0,-1){5}}
   476 %
   477 \put(15,20){\makebox(0,0)[b]{a}}
   478 \put(45,20){\makebox(0,0)[b]{c}}
   479 %
   480 \put(30,30){\line(-3,-2){13}}
   481 \put(30,30){\line(3,-2){13}}
   482 \end{picture}
   483 \end{center}
   484 \caption{A Sample Trie}
   485 \label{fig:trie}
   486 \end{figure}
   487 
   488 Proper tries associate some value with each string. Since the
   489 information is stored only in the final node associated with the string, many
   490 nodes do not carry any value. This distinction is modeled with the help
   491 of the predefined datatype \isa{option} (see {\S}\ref{sec:option}).
   492 \input{Trie/document/Trie.tex}
   493 \index{tries|)}
   494 
   495 \section{Total Recursive Functions}
   496 \label{sec:recdef}
   497 \index{recdef@\isacommand {recdef} (command)|(}\index{functions!total|(}
   498 
   499 Although many total functions have a natural primitive recursive definition,
   500 this is not always the case. Arbitrary total recursive functions can be
   501 defined by means of \isacommand{recdef}: you can use full pattern-matching,
   502 recursion need not involve datatypes, and termination is proved by showing
   503 that the arguments of all recursive calls are smaller in a suitable (user
   504 supplied) sense. In this section we restrict ourselves to measure functions;
   505 more advanced termination proofs are discussed in {\S}\ref{sec:beyond-measure}.
   506 
   507 \subsection{Defining Recursive Functions}
   508 \label{sec:recdef-examples}
   509 \input{Recdef/document/examples.tex}
   510 
   511 \subsection{Proving Termination}
   512 
   513 \input{Recdef/document/termination.tex}
   514 
   515 \subsection{Simplification and Recursive Functions}
   516 \label{sec:recdef-simplification}
   517 
   518 \input{Recdef/document/simplification.tex}
   519 
   520 \subsection{Induction and Recursive Functions}
   521 \index{induction!recursion|(}
   522 \index{recursion induction|(}
   523 
   524 \input{Recdef/document/Induction.tex}
   525 \label{sec:recdef-induction}
   526 
   527 \index{induction!recursion|)}
   528 \index{recursion induction|)}
   529 \index{recdef@\isacommand {recdef} (command)|)}\index{functions!total|)}