doc-src/TutorialI/fp.tex
author wenzelm
Mon, 29 Aug 2005 16:18:02 +0200
changeset 17182 ae84287f44e3
parent 16523 f8a734dc0fbc
child 25258 22d16596c306
permissions -rw-r--r--
tune spacing where a generated theory text is included directly;
     1 \chapter{Functional Programming in HOL}
     2 
     3 This chapter describes how to write
     4 functional programs in HOL and how to verify them.  However, 
     5 most of the constructs and
     6 proof procedures introduced are general and recur in any specification
     7 or verification task.  We really should speak of functional
     8 \emph{modelling} rather than functional \emph{programming}: 
     9 our primary aim is not
    10 to write programs but to design abstract models of systems.  HOL is
    11 a specification language that goes well beyond what can be expressed as a
    12 program. However, for the time being we concentrate on the computable.
    13 
    14 If you are a purist functional programmer, please note that all functions
    15 in HOL must be total:
    16 they must terminate for all inputs.  Lazy data structures are not
    17 directly available.
    18 
    19 \section{An Introductory Theory}
    20 \label{sec:intro-theory}
    21 
    22 Functional programming needs datatypes and functions. Both of them can be
    23 defined in a theory with a syntax reminiscent of languages like ML or
    24 Haskell. As an example consider the theory in figure~\ref{fig:ToyList}.
    25 We will now examine it line by line.
    26 
    27 \begin{figure}[htbp]
    28 \begin{ttbox}\makeatother
    29 \input{ToyList2/ToyList1}\end{ttbox}
    30 \caption{A Theory of Lists}
    31 \label{fig:ToyList}
    32 \end{figure}
    33 
    34 \index{*ToyList example|(}
    35 {\makeatother\medskip\input{ToyList/document/ToyList.tex}}
    36 
    37 The complete proof script is shown in Fig.\ts\ref{fig:ToyList-proofs}. The
    38 concatenation of Figs.\ts\ref{fig:ToyList} and~\ref{fig:ToyList-proofs}
    39 constitutes the complete theory \texttt{ToyList} and should reside in file
    40 \texttt{ToyList.thy}.
    41 % It is good practice to present all declarations and
    42 %definitions at the beginning of a theory to facilitate browsing.%
    43 \index{*ToyList example|)}
    44 
    45 \begin{figure}[htbp]
    46 \begin{ttbox}\makeatother
    47 \input{ToyList2/ToyList2}\end{ttbox}
    48 \caption{Proofs about Lists}
    49 \label{fig:ToyList-proofs}
    50 \end{figure}
    51 
    52 \subsubsection*{Review}
    53 
    54 This is the end of our toy proof. It should have familiarized you with
    55 \begin{itemize}
    56 \item the standard theorem proving procedure:
    57 state a goal (lemma or theorem); proceed with proof until a separate lemma is
    58 required; prove that lemma; come back to the original goal.
    59 \item a specific procedure that works well for functional programs:
    60 induction followed by all-out simplification via \isa{auto}.
    61 \item a basic repertoire of proof commands.
    62 \end{itemize}
    63 
    64 \begin{warn}
    65 It is tempting to think that all lemmas should have the \isa{simp} attribute
    66 just because this was the case in the example above. However, in that example
    67 all lemmas were equations, and the right-hand side was simpler than the
    68 left-hand side --- an ideal situation for simplification purposes. Unless
    69 this is clearly the case, novices should refrain from awarding a lemma the
    70 \isa{simp} attribute, which has a global effect. Instead, lemmas can be
    71 applied locally where they are needed, which is discussed in the following
    72 chapter.
    73 \end{warn}
    74 
    75 \section{Some Helpful Commands}
    76 \label{sec:commands-and-hints}
    77 
    78 This section discusses a few basic commands for manipulating the proof state
    79 and can be skipped by casual readers.
    80 
    81 There are two kinds of commands used during a proof: the actual proof
    82 commands and auxiliary commands for examining the proof state and controlling
    83 the display. Simple proof commands are of the form
    84 \commdx{apply}(\textit{method}), where \textit{method} is typically 
    85 \isa{induct_tac} or \isa{auto}.  All such theorem proving operations
    86 are referred to as \bfindex{methods}, and further ones are
    87 introduced throughout the tutorial.  Unless stated otherwise, you may
    88 assume that a method attacks merely the first subgoal. An exception is
    89 \isa{auto}, which tries to solve all subgoals.
    90 
    91 The most useful auxiliary commands are as follows:
    92 \begin{description}
    93 \item[Modifying the order of subgoals:]
    94 \commdx{defer} moves the first subgoal to the end and
    95 \commdx{prefer}~$n$ moves subgoal $n$ to the front.
    96 \item[Printing theorems:]
    97   \commdx{thm}~\textit{name}$@1$~\dots~\textit{name}$@n$
    98   prints the named theorems.
    99 \item[Reading terms and types:] \commdx{term}
   100   \textit{string} reads, type-checks and prints the given string as a term in
   101   the current context; the inferred type is output as well.
   102   \commdx{typ} \textit{string} reads and prints the given
   103   string as a type in the current context.
   104 \end{description}
   105 Further commands are found in the Isabelle/Isar Reference
   106 Manual~\cite{isabelle-isar-ref}.
   107 
   108 \begin{pgnote}
   109 Clicking on the \pgmenu{State} button redisplays the current proof state.
   110 This is helpful in case commands like \isacommand{thm} have overwritten it.
   111 \end{pgnote}
   112 
   113 We now examine Isabelle's functional programming constructs systematically,
   114 starting with inductive datatypes.
   115 
   116 
   117 \section{Datatypes}
   118 \label{sec:datatype}
   119 
   120 \index{datatypes|(}%
   121 Inductive datatypes are part of almost every non-trivial application of HOL.
   122 First we take another look at an important example, the datatype of
   123 lists, before we turn to datatypes in general. The section closes with a
   124 case study.
   125 
   126 
   127 \subsection{Lists}
   128 
   129 \index{*list (type)}%
   130 Lists are one of the essential datatypes in computing.  We expect that you
   131 are already familiar with their basic operations.
   132 Theory \isa{ToyList} is only a small fragment of HOL's predefined theory
   133 \thydx{List}\footnote{\url{http://isabelle.in.tum.de/library/HOL/List.html}}.
   134 The latter contains many further operations. For example, the functions
   135 \cdx{hd} (``head'') and \cdx{tl} (``tail'') return the first
   136 element and the remainder of a list. (However, pattern-matching is usually
   137 preferable to \isa{hd} and \isa{tl}.)  
   138 Also available are higher-order functions like \isa{map} and \isa{filter}.
   139 Theory \isa{List} also contains
   140 more syntactic sugar: \isa{[}$x@1$\isa{,}\dots\isa{,}$x@n$\isa{]} abbreviates
   141 $x@1$\isa{\#}\dots\isa{\#}$x@n$\isa{\#[]}.  In the rest of the tutorial we
   142 always use HOL's predefined lists by building on theory \isa{Main}.
   143 
   144 
   145 \subsection{The General Format}
   146 \label{sec:general-datatype}
   147 
   148 The general HOL \isacommand{datatype} definition is of the form
   149 \[
   150 \isacommand{datatype}~(\alpha@1, \dots, \alpha@n) \, t ~=~
   151 C@1~\tau@{11}~\dots~\tau@{1k@1} ~\mid~ \dots ~\mid~
   152 C@m~\tau@{m1}~\dots~\tau@{mk@m}
   153 \]
   154 where $\alpha@i$ are distinct type variables (the parameters), $C@i$ are distinct
   155 constructor names and $\tau@{ij}$ are types; it is customary to capitalize
   156 the first letter in constructor names. There are a number of
   157 restrictions (such as that the type should not be empty) detailed
   158 elsewhere~\cite{isabelle-HOL}. Isabelle notifies you if you violate them.
   159 
   160 Laws about datatypes, such as \isa{[] \isasymnoteq~x\#xs} and
   161 \isa{(x\#xs = y\#ys) = (x=y \isasymand~xs=ys)}, are used automatically
   162 during proofs by simplification.  The same is true for the equations in
   163 primitive recursive function definitions.
   164 
   165 Every\footnote{Except for advanced datatypes where the recursion involves
   166 ``\isasymRightarrow'' as in {\S}\ref{sec:nested-fun-datatype}.} datatype $t$
   167 comes equipped with a \isa{size} function from $t$ into the natural numbers
   168 (see~{\S}\ref{sec:nat} below). For lists, \isa{size} is just the length, i.e.\
   169 \isa{size [] = 0} and \isa{size(x \# xs) = size xs + 1}.  In general,
   170 \cdx{size} returns
   171 \begin{itemize}
   172 \item zero for all constructors
   173 that do not have an argument of type $t$\\
   174 \item one plus the sum of the sizes of all arguments of type~$t$,
   175 for all other constructors
   176 \end{itemize}
   177 Note that because
   178 \isa{size} is defined on every datatype, it is overloaded; on lists
   179 \isa{size} is also called \sdx{length}, which is not overloaded.
   180 Isabelle will always show \isa{size} on lists as \isa{length}.
   181 
   182 
   183 \subsection{Primitive Recursion}
   184 
   185 \index{recursion!primitive}%
   186 Functions on datatypes are usually defined by recursion. In fact, most of the
   187 time they are defined by what is called \textbf{primitive recursion}.
   188 The keyword \commdx{primrec} is followed by a list of
   189 equations
   190 \[ f \, x@1 \, \dots \, (C \, y@1 \, \dots \, y@k)\, \dots \, x@n = r \]
   191 such that $C$ is a constructor of the datatype $t$ and all recursive calls of
   192 $f$ in $r$ are of the form $f \, \dots \, y@i \, \dots$ for some $i$. Thus
   193 Isabelle immediately sees that $f$ terminates because one (fixed!) argument
   194 becomes smaller with every recursive call. There must be at most one equation
   195 for each constructor.  Their order is immaterial.
   196 A more general method for defining total recursive functions is introduced in
   197 {\S}\ref{sec:recdef}.
   198 
   199 \begin{exercise}\label{ex:Tree}
   200 \input{Misc/document/Tree.tex}%
   201 \end{exercise}
   202 
   203 \input{Misc/document/case_exprs.tex}
   204 
   205 \input{Ifexpr/document/Ifexpr.tex}
   206 \index{datatypes|)}
   207 
   208 
   209 \section{Some Basic Types}
   210 
   211 This section introduces the types of natural numbers and ordered pairs.  Also
   212 described is type \isa{option}, which is useful for modelling exceptional
   213 cases. 
   214 
   215 \subsection{Natural Numbers}
   216 \label{sec:nat}\index{natural numbers}%
   217 \index{linear arithmetic|(}
   218 
   219 \input{Misc/document/fakenat.tex}\medskip
   220 \input{Misc/document/natsum.tex}
   221 
   222 \index{linear arithmetic|)}
   223 
   224 
   225 \subsection{Pairs}
   226 \input{Misc/document/pairs.tex}
   227 
   228 \subsection{Datatype {\tt\slshape option}}
   229 \label{sec:option}
   230 \input{Misc/document/Option2.tex}
   231 
   232 \section{Definitions}
   233 \label{sec:Definitions}
   234 
   235 A definition is simply an abbreviation, i.e.\ a new name for an existing
   236 construction. In particular, definitions cannot be recursive. Isabelle offers
   237 definitions on the level of types and terms. Those on the type level are
   238 called \textbf{type synonyms}; those on the term level are simply called 
   239 definitions.
   240 
   241 
   242 \subsection{Type Synonyms}
   243 
   244 \index{type synonyms}%
   245 Type synonyms are similar to those found in ML\@. They are created by a 
   246 \commdx{types} command:
   247 
   248 \medskip
   249 \input{Misc/document/types.tex}
   250 
   251 \input{Misc/document/prime_def.tex}
   252 
   253 
   254 \section{The Definitional Approach}
   255 \label{sec:definitional}
   256 
   257 \index{Definitional Approach}%
   258 As we pointed out at the beginning of the chapter, asserting arbitrary
   259 axioms such as $f(n) = f(n) + 1$ can easily lead to contradictions. In order
   260 to avoid this danger, we advocate the definitional rather than
   261 the axiomatic approach: introduce new concepts by definitions. However,  Isabelle/HOL seems to
   262 support many richer definitional constructs, such as
   263 \isacommand{primrec}. The point is that Isabelle reduces such constructs to first principles. For example, each
   264 \isacommand{primrec} function definition is turned into a proper
   265 (nonrecursive!) definition from which the user-supplied recursion equations are
   266 automatically proved.  This process is
   267 hidden from the user, who does not have to understand the details.  Other commands described
   268 later, like \isacommand{recdef} and \isacommand{inductive}, work similarly.  
   269 This strict adherence to the definitional approach reduces the risk of 
   270 soundness errors.
   271 
   272 \chapter{More Functional Programming}
   273 
   274 The purpose of this chapter is to deepen your understanding of the
   275 concepts encountered so far and to introduce advanced forms of datatypes and
   276 recursive functions. The first two sections give a structured presentation of
   277 theorem proving by simplification ({\S}\ref{sec:Simplification}) and discuss
   278 important heuristics for induction ({\S}\ref{sec:InductionHeuristics}).  You can
   279 skip them if you are not planning to perform proofs yourself.
   280 We then present a case
   281 study: a compiler for expressions ({\S}\ref{sec:ExprCompiler}). Advanced
   282 datatypes, including those involving function spaces, are covered in
   283 {\S}\ref{sec:advanced-datatypes}; it closes with another case study, search
   284 trees (``tries'').  Finally we introduce \isacommand{recdef}, a general
   285 form of recursive function definition that goes well beyond 
   286 \isacommand{primrec} ({\S}\ref{sec:recdef}).
   287 
   288 
   289 \section{Simplification}
   290 \label{sec:Simplification}
   291 \index{simplification|(}
   292 
   293 So far we have proved our theorems by \isa{auto}, which simplifies
   294 all subgoals. In fact, \isa{auto} can do much more than that. 
   295 To go beyond toy examples, you
   296 need to understand the ingredients of \isa{auto}.  This section covers the
   297 method that \isa{auto} always applies first, simplification.
   298 
   299 Simplification is one of the central theorem proving tools in Isabelle and
   300 many other systems. The tool itself is called the \textbf{simplifier}. 
   301 This section introduces the many features of the simplifier
   302 and is required reading if you intend to perform proofs.  Later on,
   303 {\S}\ref{sec:simplification-II} explains some more advanced features and a
   304 little bit of how the simplifier works. The serious student should read that
   305 section as well, in particular to understand why the simplifier did
   306 something unexpected.
   307 
   308 \subsection{What is Simplification?}
   309 
   310 In its most basic form, simplification means repeated application of
   311 equations from left to right. For example, taking the rules for \isa{\at}
   312 and applying them to the term \isa{[0,1] \at\ []} results in a sequence of
   313 simplification steps:
   314 \begin{ttbox}\makeatother
   315 (0#1#[]) @ []  \(\leadsto\)  0#((1#[]) @ [])  \(\leadsto\)  0#(1#([] @ []))  \(\leadsto\)  0#1#[]
   316 \end{ttbox}
   317 This is also known as \bfindex{term rewriting}\indexbold{rewriting} and the
   318 equations are referred to as \bfindex{rewrite rules}.
   319 ``Rewriting'' is more honest than ``simplification'' because the terms do not
   320 necessarily become simpler in the process.
   321 
   322 The simplifier proves arithmetic goals as described in
   323 {\S}\ref{sec:nat} above.  Arithmetic expressions are simplified using built-in
   324 procedures that go beyond mere rewrite rules.  New simplification procedures
   325 can be coded and installed, but they are definitely not a matter for this
   326 tutorial. 
   327 
   328 \input{Misc/document/simp.tex}
   329 
   330 \index{simplification|)}
   331 
   332 \input{Misc/document/Itrev.tex}
   333 \begin{exercise}
   334 \input{Misc/document/Plus.tex}%
   335 \end{exercise}
   336 \begin{exercise}
   337 \input{Misc/document/Tree2.tex}%
   338 \end{exercise}
   339 
   340 \input{CodeGen/document/CodeGen.tex}
   341 
   342 
   343 \section{Advanced Datatypes}
   344 \label{sec:advanced-datatypes}
   345 \index{datatype@\isacommand {datatype} (command)|(}
   346 \index{primrec@\isacommand {primrec} (command)|(}
   347 %|)
   348 
   349 This section presents advanced forms of datatypes: mutual and nested
   350 recursion.  A series of examples will culminate in a treatment of the trie
   351 data structure.
   352 
   353 
   354 \subsection{Mutual Recursion}
   355 \label{sec:datatype-mut-rec}
   356 
   357 \input{Datatype/document/ABexpr.tex}
   358 
   359 \subsection{Nested Recursion}
   360 \label{sec:nested-datatype}
   361 
   362 {\makeatother\input{Datatype/document/Nested.tex}}
   363 
   364 
   365 \subsection{The Limits of Nested Recursion}
   366 \label{sec:nested-fun-datatype}
   367 
   368 How far can we push nested recursion? By the unfolding argument above, we can
   369 reduce nested to mutual recursion provided the nested recursion only involves
   370 previously defined datatypes. This does not include functions:
   371 \begin{isabelle}
   372 \isacommand{datatype} t = C "t \isasymRightarrow\ bool"
   373 \end{isabelle}
   374 This declaration is a real can of worms.
   375 In HOL it must be ruled out because it requires a type
   376 \isa{t} such that \isa{t} and its power set \isa{t \isasymFun\ bool} have the
   377 same cardinality --- an impossibility. For the same reason it is not possible
   378 to allow recursion involving the type \isa{t set}, which is isomorphic to
   379 \isa{t \isasymFun\ bool}.
   380 
   381 Fortunately, a limited form of recursion
   382 involving function spaces is permitted: the recursive type may occur on the
   383 right of a function arrow, but never on the left. Hence the above can of worms
   384 is ruled out but the following example of a potentially 
   385 \index{infinitely branching trees}%
   386 infinitely branching tree is accepted:
   387 \smallskip
   388 
   389 \input{Datatype/document/Fundata.tex}
   390 
   391 If you need nested recursion on the left of a function arrow, there are
   392 alternatives to pure HOL\@.  In the Logic for Computable Functions 
   393 (\rmindex{LCF}), types like
   394 \begin{isabelle}
   395 \isacommand{datatype} lam = C "lam \isasymrightarrow\ lam"
   396 \end{isabelle}
   397 do indeed make sense~\cite{paulson87}.  Note the different arrow,
   398 \isa{\isasymrightarrow} instead of \isa{\isasymRightarrow},
   399 expressing the type of \emph{continuous} functions. 
   400 There is even a version of LCF on top of HOL,
   401 called \rmindex{HOLCF}~\cite{MuellerNvOS99}.
   402 \index{datatype@\isacommand {datatype} (command)|)}
   403 \index{primrec@\isacommand {primrec} (command)|)}
   404 
   405 
   406 \subsection{Case Study: Tries}
   407 \label{sec:Trie}
   408 
   409 \index{tries|(}%
   410 Tries are a classic search tree data structure~\cite{Knuth3-75} for fast
   411 indexing with strings. Figure~\ref{fig:trie} gives a graphical example of a
   412 trie containing the words ``all'', ``an'', ``ape'', ``can'', ``car'' and
   413 ``cat''.  When searching a string in a trie, the letters of the string are
   414 examined sequentially. Each letter determines which subtrie to search next.
   415 In this case study we model tries as a datatype, define a lookup and an
   416 update function, and prove that they behave as expected.
   417 
   418 \begin{figure}[htbp]
   419 \begin{center}
   420 \unitlength1mm
   421 \begin{picture}(60,30)
   422 \put( 5, 0){\makebox(0,0)[b]{l}}
   423 \put(25, 0){\makebox(0,0)[b]{e}}
   424 \put(35, 0){\makebox(0,0)[b]{n}}
   425 \put(45, 0){\makebox(0,0)[b]{r}}
   426 \put(55, 0){\makebox(0,0)[b]{t}}
   427 %
   428 \put( 5, 9){\line(0,-1){5}}
   429 \put(25, 9){\line(0,-1){5}}
   430 \put(44, 9){\line(-3,-2){9}}
   431 \put(45, 9){\line(0,-1){5}}
   432 \put(46, 9){\line(3,-2){9}}
   433 %
   434 \put( 5,10){\makebox(0,0)[b]{l}}
   435 \put(15,10){\makebox(0,0)[b]{n}}
   436 \put(25,10){\makebox(0,0)[b]{p}}
   437 \put(45,10){\makebox(0,0)[b]{a}}
   438 %
   439 \put(14,19){\line(-3,-2){9}}
   440 \put(15,19){\line(0,-1){5}}
   441 \put(16,19){\line(3,-2){9}}
   442 \put(45,19){\line(0,-1){5}}
   443 %
   444 \put(15,20){\makebox(0,0)[b]{a}}
   445 \put(45,20){\makebox(0,0)[b]{c}}
   446 %
   447 \put(30,30){\line(-3,-2){13}}
   448 \put(30,30){\line(3,-2){13}}
   449 \end{picture}
   450 \end{center}
   451 \caption{A Sample Trie}
   452 \label{fig:trie}
   453 \end{figure}
   454 
   455 Proper tries associate some value with each string. Since the
   456 information is stored only in the final node associated with the string, many
   457 nodes do not carry any value. This distinction is modeled with the help
   458 of the predefined datatype \isa{option} (see {\S}\ref{sec:option}).
   459 \input{Trie/document/Trie.tex}
   460 \index{tries|)}
   461 
   462 \section{Total Recursive Functions}
   463 \label{sec:recdef}
   464 \index{recdef@\isacommand {recdef} (command)|(}\index{functions!total|(}
   465 
   466 Although many total functions have a natural primitive recursive definition,
   467 this is not always the case. Arbitrary total recursive functions can be
   468 defined by means of \isacommand{recdef}: you can use full pattern-matching,
   469 recursion need not involve datatypes, and termination is proved by showing
   470 that the arguments of all recursive calls are smaller in a suitable (user
   471 supplied) sense. In this section we restrict ourselves to measure functions;
   472 more advanced termination proofs are discussed in {\S}\ref{sec:beyond-measure}.
   473 
   474 \subsection{Defining Recursive Functions}
   475 \label{sec:recdef-examples}
   476 \input{Recdef/document/examples.tex}
   477 
   478 \subsection{Proving Termination}
   479 \input{Recdef/document/termination.tex}
   480 
   481 \subsection{Simplification and Recursive Functions}
   482 \label{sec:recdef-simplification}
   483 \input{Recdef/document/simplification.tex}
   484 
   485 \subsection{Induction and Recursive Functions}
   486 \index{induction!recursion|(}
   487 \index{recursion induction|(}
   488 
   489 \input{Recdef/document/Induction.tex}
   490 \label{sec:recdef-induction}
   491 
   492 \index{induction!recursion|)}
   493 \index{recursion induction|)}
   494 \index{recdef@\isacommand {recdef} (command)|)}\index{functions!total|)}