doc-src/TutorialI/fp.tex
author nipkow
Mon, 19 Mar 2001 13:28:06 +0100
changeset 11215 b44ad7e4c4d2
parent 11214 3b3cc0cf533f
child 11294 16481a4cc9f3
permissions -rw-r--r--
*** empty log message ***
     1 \chapter{Functional Programming in HOL}
     2 
     3 Although on the surface this chapter is mainly concerned with how to write
     4 functional programs in HOL and how to verify them, most of the constructs and
     5 proof procedures introduced are general purpose and recur in any specification
     6 or verification task. In fact, it we should really speak of functional
     7 \emph{modelling} rather than \emph{programming} because our aim is not
     8 primarily to write programs but to design abstract models of systems.  HOL is
     9 a specification language that goes well beyond what can be expressed as a
    10 program. However, for the time being we concentrate on the computable.
    11 
    12 The dedicated functional programmer should be warned: HOL offers only
    13 \emph{total functional programming} --- all functions in HOL must be total,
    14 i.e.\ they must terminate for all inputs; lazy data structures are not
    15 directly available.
    16 
    17 \section{An Introductory Theory}
    18 \label{sec:intro-theory}
    19 
    20 Functional programming needs datatypes and functions. Both of them can be
    21 defined in a theory with a syntax reminiscent of languages like ML or
    22 Haskell. As an example consider the theory in figure~\ref{fig:ToyList}.
    23 We will now examine it line by line.
    24 
    25 \begin{figure}[htbp]
    26 \begin{ttbox}\makeatother
    27 \input{ToyList2/ToyList1}\end{ttbox}
    28 \caption{A theory of lists}
    29 \label{fig:ToyList}
    30 \end{figure}
    31 
    32 {\makeatother\input{ToyList/document/ToyList.tex}}
    33 
    34 The complete proof script is shown in Fig.\ts\ref{fig:ToyList-proofs}. The
    35 concatenation of Figs.\ts\ref{fig:ToyList} and~\ref{fig:ToyList-proofs}
    36 constitutes the complete theory \texttt{ToyList} and should reside in file
    37 \texttt{ToyList.thy}. It is good practice to present all declarations and
    38 definitions at the beginning of a theory to facilitate browsing.
    39 
    40 \begin{figure}[htbp]
    41 \begin{ttbox}\makeatother
    42 \input{ToyList2/ToyList2}\end{ttbox}
    43 \caption{Proofs about lists}
    44 \label{fig:ToyList-proofs}
    45 \end{figure}
    46 
    47 \subsubsection*{Review}
    48 
    49 This is the end of our toy proof. It should have familiarized you with
    50 \begin{itemize}
    51 \item the standard theorem proving procedure:
    52 state a goal (lemma or theorem); proceed with proof until a separate lemma is
    53 required; prove that lemma; come back to the original goal.
    54 \item a specific procedure that works well for functional programs:
    55 induction followed by all-out simplification via \isa{auto}.
    56 \item a basic repertoire of proof commands.
    57 \end{itemize}
    58 
    59 
    60 \section{Some Helpful Commands}
    61 \label{sec:commands-and-hints}
    62 
    63 This section discusses a few basic commands for manipulating the proof state
    64 and can be skipped by casual readers.
    65 
    66 There are two kinds of commands used during a proof: the actual proof
    67 commands and auxiliary commands for examining the proof state and controlling
    68 the display. Simple proof commands are of the form
    69 \isacommand{apply}\isa{(method)}\indexbold{apply} where \bfindex{method} is a
    70 synonym for ``theorem proving function''. Typical examples are
    71 \isa{induct_tac} and \isa{auto}. Further methods are introduced throughout
    72 the tutorial.  Unless stated otherwise you may assume that a method attacks
    73 merely the first subgoal. An exception is \isa{auto} which tries to solve all
    74 subgoals.
    75 
    76 The most useful auxiliary commands are:
    77 \begin{description}
    78 \item[Undoing:] \isacommand{undo}\indexbold{*undo} undoes the effect of the
    79   last command; \isacommand{undo} can be undone by
    80   \isacommand{redo}\indexbold{*redo}.  Both are only needed at the shell
    81   level and should not occur in the final theory.
    82 \item[Printing the current state:] \isacommand{pr}\indexbold{*pr} redisplays
    83   the current proof state, for example when it has disappeared off the
    84   screen.
    85 \item[Limiting the number of subgoals:] \isacommand{pr}~$n$ tells Isabelle to
    86   print only the first $n$ subgoals from now on and redisplays the current
    87   proof state. This is helpful when there are many subgoals.
    88 \item[Modifying the order of subgoals:]
    89 \isacommand{defer}\indexbold{*defer} moves the first subgoal to the end and
    90 \isacommand{prefer}\indexbold{*prefer}~$n$ moves subgoal $n$ to the front.
    91 \item[Printing theorems:]
    92   \isacommand{thm}\indexbold{*thm}~\textit{name}$@1$~\dots~\textit{name}$@n$
    93   prints the named theorems.
    94 \item[Displaying types:] We have already mentioned the flag
    95   \ttindex{show_types} above. It can also be useful for detecting typos in
    96   formulae early on. For example, if \texttt{show_types} is set and the goal
    97   \isa{rev(rev xs) = xs} is started, Isabelle prints the additional output
    98 \par\noindent
    99 \begin{isabelle}%
   100 Variables:\isanewline
   101 ~~xs~::~'a~list
   102 \end{isabelle}%
   103 \par\noindent
   104 which tells us that Isabelle has correctly inferred that
   105 \isa{xs} is a variable of list type. On the other hand, had we
   106 made a typo as in \isa{rev(re xs) = xs}, the response
   107 \par\noindent
   108 \begin{isabelle}%
   109 Variables:\isanewline
   110 ~~re~::~'a~list~{\isasymRightarrow}~'a~list\isanewline
   111 ~~xs~::~'a~list%
   112 \end{isabelle}%
   113 \par\noindent
   114 would have alerted us because of the unexpected variable \isa{re}.
   115 \item[Reading terms and types:] \isacommand{term}\indexbold{*term}
   116   \textit{string} reads, type-checks and prints the given string as a term in
   117   the current context; the inferred type is output as well.
   118   \isacommand{typ}\indexbold{*typ} \textit{string} reads and prints the given
   119   string as a type in the current context.
   120 \item[(Re)loading theories:] When you start your interaction you must open a
   121   named theory with the line \isa{\isacommand{theory}~T~=~\dots~:}. Isabelle
   122   automatically loads all the required parent theories from their respective
   123   files (which may take a moment, unless the theories are already loaded and
   124   the files have not been modified).
   125   
   126   If you suddenly discover that you need to modify a parent theory of your
   127   current theory, you must first abandon your current theory\indexbold{abandon
   128   theory}\indexbold{theory!abandon} (at the shell
   129   level type \isacommand{kill}\indexbold{*kill}). After the parent theory has
   130   been modified, you go back to your original theory. When its first line
   131   \isa{\isacommand{theory}~T~=~\dots~:} is processed, the
   132   modified parent is reloaded automatically.
   133   
   134 %  The only time when you need to load a theory by hand is when you simply
   135 %  want to check if it loads successfully without wanting to make use of the
   136 %  theory itself. This you can do by typing
   137 %  \isa{\isacommand{use\_thy}\indexbold{*use_thy}~"T"}.
   138 \end{description}
   139 Further commands are found in the Isabelle/Isar Reference Manual.
   140 
   141 We now examine Isabelle's functional programming constructs systematically,
   142 starting with inductive datatypes.
   143 
   144 
   145 \section{Datatypes}
   146 \label{sec:datatype}
   147 
   148 Inductive datatypes are part of almost every non-trivial application of HOL.
   149 First we take another look at a very important example, the datatype of
   150 lists, before we turn to datatypes in general. The section closes with a
   151 case study.
   152 
   153 
   154 \subsection{Lists}
   155 \indexbold{*list}
   156 
   157 Lists are one of the essential datatypes in computing. Readers of this
   158 tutorial and users of HOL need to be familiar with their basic operations.
   159 Theory \isa{ToyList} is only a small fragment of HOL's predefined theory
   160 \isa{List}\footnote{\url{http://isabelle.in.tum.de/library/HOL/List.html}}.
   161 The latter contains many further operations. For example, the functions
   162 \isaindexbold{hd} (``head'') and \isaindexbold{tl} (``tail'') return the first
   163 element and the remainder of a list. (However, pattern-matching is usually
   164 preferable to \isa{hd} and \isa{tl}.)  
   165 Also available are higher-order functions like \isa{map} and \isa{filter}.
   166 Theory \isa{List} also contains
   167 more syntactic sugar: \isa{[}$x@1$\isa{,}\dots\isa{,}$x@n$\isa{]} abbreviates
   168 $x@1$\isa{\#}\dots\isa{\#}$x@n$\isa{\#[]}.  In the rest of the tutorial we
   169 always use HOL's predefined lists.
   170 
   171 
   172 \subsection{The General Format}
   173 \label{sec:general-datatype}
   174 
   175 The general HOL \isacommand{datatype} definition is of the form
   176 \[
   177 \isacommand{datatype}~(\alpha@1, \dots, \alpha@n) \, t ~=~
   178 C@1~\tau@{11}~\dots~\tau@{1k@1} ~\mid~ \dots ~\mid~
   179 C@m~\tau@{m1}~\dots~\tau@{mk@m}
   180 \]
   181 where $\alpha@i$ are distinct type variables (the parameters), $C@i$ are distinct
   182 constructor names and $\tau@{ij}$ are types; it is customary to capitalize
   183 the first letter in constructor names. There are a number of
   184 restrictions (such as that the type should not be empty) detailed
   185 elsewhere~\cite{isabelle-HOL}. Isabelle notifies you if you violate them.
   186 
   187 Laws about datatypes, such as \isa{[] \isasymnoteq~x\#xs} and
   188 \isa{(x\#xs = y\#ys) = (x=y \isasymand~xs=ys)}, are used automatically
   189 during proofs by simplification.  The same is true for the equations in
   190 primitive recursive function definitions.
   191 
   192 Every datatype $t$ comes equipped with a \isa{size} function from $t$ into
   193 the natural numbers (see~{\S}\ref{sec:nat} below). For lists, \isa{size} is
   194 just the length, i.e.\ \isa{size [] = 0} and \isa{size(x \# xs) = size xs +
   195   1}.  In general, \isaindexbold{size} returns \isa{0} for all constructors
   196 that do not have an argument of type $t$, and for all other constructors
   197 \isa{1 +} the sum of the sizes of all arguments of type $t$. Note that because
   198 \isa{size} is defined on every datatype, it is overloaded; on lists
   199 \isa{size} is also called \isaindexbold{length}, which is not overloaded.
   200 Isabelle will always show \isa{size} on lists as \isa{length}.
   201 
   202 
   203 \subsection{Primitive Recursion}
   204 
   205 Functions on datatypes are usually defined by recursion. In fact, most of the
   206 time they are defined by what is called \bfindex{primitive recursion}.
   207 The keyword \isacommand{primrec}\indexbold{*primrec} is followed by a list of
   208 equations
   209 \[ f \, x@1 \, \dots \, (C \, y@1 \, \dots \, y@k)\, \dots \, x@n = r \]
   210 such that $C$ is a constructor of the datatype $t$ and all recursive calls of
   211 $f$ in $r$ are of the form $f \, \dots \, y@i \, \dots$ for some $i$. Thus
   212 Isabelle immediately sees that $f$ terminates because one (fixed!) argument
   213 becomes smaller with every recursive call. There must be at most one equation
   214 for each constructor.  Their order is immaterial.
   215 A more general method for defining total recursive functions is introduced in
   216 {\S}\ref{sec:recdef}.
   217 
   218 \begin{exercise}\label{ex:Tree}
   219 \input{Misc/document/Tree.tex}%
   220 \end{exercise}
   221 
   222 \input{Misc/document/case_exprs.tex}
   223 
   224 \input{Ifexpr/document/Ifexpr.tex}
   225 
   226 \section{Some Basic Types}
   227 
   228 
   229 \subsection{Natural Numbers}
   230 \label{sec:nat}
   231 \index{arithmetic|(}
   232 
   233 \input{Misc/document/fakenat.tex}
   234 \input{Misc/document/natsum.tex}
   235 
   236 \index{arithmetic|)}
   237 
   238 
   239 \subsection{Pairs}
   240 \input{Misc/document/pairs.tex}
   241 
   242 \subsection{Datatype {\tt\slshape option}}
   243 \label{sec:option}
   244 \input{Misc/document/Option2.tex}
   245 
   246 \section{Definitions}
   247 \label{sec:Definitions}
   248 
   249 A definition is simply an abbreviation, i.e.\ a new name for an existing
   250 construction. In particular, definitions cannot be recursive. Isabelle offers
   251 definitions on the level of types and terms. Those on the type level are
   252 called type synonyms, those on the term level are called (constant)
   253 definitions.
   254 
   255 
   256 \subsection{Type Synonyms}
   257 \indexbold{type synonym}
   258 
   259 Type synonyms are similar to those found in ML\@. Their syntax is fairly self
   260 explanatory:
   261 
   262 \input{Misc/document/types.tex}%
   263 
   264 Note that pattern-matching is not allowed, i.e.\ each definition must be of
   265 the form $f\,x@1\,\dots\,x@n~\isasymequiv~t$.
   266 Section~{\S}\ref{sec:Simp-with-Defs} explains how definitions are used
   267 in proofs.
   268 
   269 \input{Misc/document/prime_def.tex}
   270 
   271 \input{Misc/document/Translations.tex}
   272 
   273 
   274 \section{The Definitional Approach}
   275 \label{sec:definitional}
   276 
   277 As we pointed out at the beginning of the chapter, asserting arbitrary
   278 axioms, e.g. $f(n) = f(n) + 1$, may easily lead to contradictions. In order
   279 to avoid this danger, this tutorial advocates the definitional rather than
   280 the axiomatic approach: introduce new concepts by definitions, thus
   281 preserving consistency. However, on the face of it, Isabelle/HOL seems to
   282 support many more constructs than just definitions, for example
   283 \isacommand{primrec}. The crucial point is that internally, everything
   284 (except real axioms) is reduced to a definition. For example, given some
   285 \isacommand{primrec} function definition, this is turned into a proper
   286 (nonrecursive!) definition, and the user-supplied recursion equations are
   287 derived as theorems from that definition. This tricky process is completely
   288 hidden from the user and it is not necessary to understand the details. The
   289 result of the definitional approach is that \isacommand{primrec} is as safe
   290 as pure \isacommand{defs} are, but more convenient. And this is not just the
   291 case for \isacommand{primrec} but also for the other commands described
   292 later, like \isacommand{recdef} and \isacommand{inductive}.
   293 This strict adherence to the definitional approach reduces the risk of 
   294 soundness errors inside Isabelle/HOL.
   295 
   296 \chapter{More Functional Programming}
   297 
   298 The purpose of this chapter is to deepen the reader's understanding of the
   299 concepts encountered so far and to introduce advanced forms of datatypes and
   300 recursive functions. The first two sections give a structured presentation of
   301 theorem proving by simplification ({\S}\ref{sec:Simplification}) and discuss
   302 important heuristics for induction ({\S}\ref{sec:InductionHeuristics}). They can
   303 be skipped by readers less interested in proofs. They are followed by a case
   304 study, a compiler for expressions ({\S}\ref{sec:ExprCompiler}). Advanced
   305 datatypes, including those involving function spaces, are covered in
   306 {\S}\ref{sec:advanced-datatypes}, which closes with another case study, search
   307 trees (``tries'').  Finally we introduce \isacommand{recdef}, a very general
   308 form of recursive function definition which goes well beyond what
   309 \isacommand{primrec} allows ({\S}\ref{sec:recdef}).
   310 
   311 
   312 \section{Simplification}
   313 \label{sec:Simplification}
   314 \index{simplification|(}
   315 
   316 So far we have proved our theorems by \isa{auto}, which simplifies
   317 \emph{all} subgoals. In fact, \isa{auto} can do much more than that, except
   318 that it did not need to so far. However, when you go beyond toy examples, you
   319 need to understand the ingredients of \isa{auto}.  This section covers the
   320 method that \isa{auto} always applies first, simplification.
   321 
   322 Simplification is one of the central theorem proving tools in Isabelle and
   323 many other systems. The tool itself is called the \bfindex{simplifier}. The
   324 purpose of this section is to introduce the many features of the simplifier.
   325 Anybody intending to perform proofs in HOL should read this section. Later on
   326 ({\S}\ref{sec:simplification-II}) we explain some more advanced features and a
   327 little bit of how the simplifier works. The serious student should read that
   328 section as well, in particular in order to understand what happened if things
   329 do not simplify as expected.
   330 
   331 \subsection{What is Simplification}
   332 
   333 In its most basic form, simplification means repeated application of
   334 equations from left to right. For example, taking the rules for \isa{\at}
   335 and applying them to the term \isa{[0,1] \at\ []} results in a sequence of
   336 simplification steps:
   337 \begin{ttbox}\makeatother
   338 (0#1#[]) @ []  \(\leadsto\)  0#((1#[]) @ [])  \(\leadsto\)  0#(1#([] @ []))  \(\leadsto\)  0#1#[]
   339 \end{ttbox}
   340 This is also known as \bfindex{term rewriting}\indexbold{rewriting} and the
   341 equations are referred to as \textbf{rewrite rules}\indexbold{rewrite rule}.
   342 ``Rewriting'' is more honest than ``simplification'' because the terms do not
   343 necessarily become simpler in the process.
   344 
   345 \input{Misc/document/simp.tex}
   346 
   347 \index{simplification|)}
   348 
   349 \input{Misc/document/Itrev.tex}
   350 
   351 \begin{exercise}
   352 \input{Misc/document/Tree2.tex}%
   353 \end{exercise}
   354 
   355 \input{CodeGen/document/CodeGen.tex}
   356 
   357 
   358 \section{Advanced Datatypes}
   359 \label{sec:advanced-datatypes}
   360 \index{*datatype|(}
   361 \index{*primrec|(}
   362 %|)
   363 
   364 This section presents advanced forms of \isacommand{datatype}s.
   365 
   366 \subsection{Mutual Recursion}
   367 \label{sec:datatype-mut-rec}
   368 
   369 \input{Datatype/document/ABexpr.tex}
   370 
   371 \subsection{Nested Recursion}
   372 \label{sec:nested-datatype}
   373 
   374 {\makeatother\input{Datatype/document/Nested.tex}}
   375 
   376 
   377 \subsection{The Limits of Nested Recursion}
   378 
   379 How far can we push nested recursion? By the unfolding argument above, we can
   380 reduce nested to mutual recursion provided the nested recursion only involves
   381 previously defined datatypes. This does not include functions:
   382 \begin{isabelle}
   383 \isacommand{datatype} t = C "t \isasymRightarrow\ bool"
   384 \end{isabelle}
   385 This declaration is a real can of worms.
   386 In HOL it must be ruled out because it requires a type
   387 \isa{t} such that \isa{t} and its power set \isa{t \isasymFun\ bool} have the
   388 same cardinality --- an impossibility. For the same reason it is not possible
   389 to allow recursion involving the type \isa{set}, which is isomorphic to
   390 \isa{t \isasymFun\ bool}.
   391 
   392 Fortunately, a limited form of recursion
   393 involving function spaces is permitted: the recursive type may occur on the
   394 right of a function arrow, but never on the left. Hence the above can of worms
   395 is ruled out but the following example of a potentially infinitely branching tree is
   396 accepted:
   397 \smallskip
   398 
   399 \input{Datatype/document/Fundata.tex}
   400 \bigskip
   401 
   402 If you need nested recursion on the left of a function arrow, there are
   403 alternatives to pure HOL: LCF~\cite{paulson87} is a logic where types like
   404 \begin{isabelle}
   405 \isacommand{datatype} lam = C "lam \isasymrightarrow\ lam"
   406 \end{isabelle}
   407 do indeed make sense.  Note the different arrow,
   408 \isa{\isasymrightarrow} instead of \isa{\isasymRightarrow},
   409 expressing the type of \emph{continuous} functions. 
   410 There is even a version of LCF on top of HOL,
   411 called HOLCF~\cite{MuellerNvOS99}.
   412 
   413 \index{*primrec|)}
   414 \index{*datatype|)}
   415 
   416 \subsection{Case Study: Tries}
   417 \label{sec:Trie}
   418 
   419 Tries are a classic search tree data structure~\cite{Knuth3-75} for fast
   420 indexing with strings. Figure~\ref{fig:trie} gives a graphical example of a
   421 trie containing the words ``all'', ``an'', ``ape'', ``can'', ``car'' and
   422 ``cat''.  When searching a string in a trie, the letters of the string are
   423 examined sequentially. Each letter determines which subtrie to search next.
   424 In this case study we model tries as a datatype, define a lookup and an
   425 update function, and prove that they behave as expected.
   426 
   427 \begin{figure}[htbp]
   428 \begin{center}
   429 \unitlength1mm
   430 \begin{picture}(60,30)
   431 \put( 5, 0){\makebox(0,0)[b]{l}}
   432 \put(25, 0){\makebox(0,0)[b]{e}}
   433 \put(35, 0){\makebox(0,0)[b]{n}}
   434 \put(45, 0){\makebox(0,0)[b]{r}}
   435 \put(55, 0){\makebox(0,0)[b]{t}}
   436 %
   437 \put( 5, 9){\line(0,-1){5}}
   438 \put(25, 9){\line(0,-1){5}}
   439 \put(44, 9){\line(-3,-2){9}}
   440 \put(45, 9){\line(0,-1){5}}
   441 \put(46, 9){\line(3,-2){9}}
   442 %
   443 \put( 5,10){\makebox(0,0)[b]{l}}
   444 \put(15,10){\makebox(0,0)[b]{n}}
   445 \put(25,10){\makebox(0,0)[b]{p}}
   446 \put(45,10){\makebox(0,0)[b]{a}}
   447 %
   448 \put(14,19){\line(-3,-2){9}}
   449 \put(15,19){\line(0,-1){5}}
   450 \put(16,19){\line(3,-2){9}}
   451 \put(45,19){\line(0,-1){5}}
   452 %
   453 \put(15,20){\makebox(0,0)[b]{a}}
   454 \put(45,20){\makebox(0,0)[b]{c}}
   455 %
   456 \put(30,30){\line(-3,-2){13}}
   457 \put(30,30){\line(3,-2){13}}
   458 \end{picture}
   459 \end{center}
   460 \caption{A sample trie}
   461 \label{fig:trie}
   462 \end{figure}
   463 
   464 Proper tries associate some value with each string. Since the
   465 information is stored only in the final node associated with the string, many
   466 nodes do not carry any value. This distinction is modeled with the help
   467 of the predefined datatype \isa{option} (see {\S}\ref{sec:option}).
   468 \input{Trie/document/Trie.tex}
   469 
   470 \begin{exercise}
   471   Write an improved version of \isa{update} that does not suffer from the
   472   space leak in the version above. Prove the main theorem for your improved
   473   \isa{update}.
   474 \end{exercise}
   475 
   476 \begin{exercise}
   477   Write a function to \emph{delete} entries from a trie. An easy solution is
   478   a clever modification of \isa{update} which allows both insertion and
   479   deletion with a single function.  Prove (a modified version of) the main
   480   theorem above. Optimize you function such that it shrinks tries after
   481   deletion, if possible.
   482 \end{exercise}
   483 
   484 \section{Total Recursive Functions}
   485 \label{sec:recdef}
   486 \index{*recdef|(}
   487 
   488 Although many total functions have a natural primitive recursive definition,
   489 this is not always the case. Arbitrary total recursive functions can be
   490 defined by means of \isacommand{recdef}: you can use full pattern-matching,
   491 recursion need not involve datatypes, and termination is proved by showing
   492 that the arguments of all recursive calls are smaller in a suitable (user
   493 supplied) sense. In this section we restrict ourselves to measure functions;
   494 more advanced termination proofs are discussed in {\S}\ref{sec:beyond-measure}.
   495 
   496 \subsection{Defining Recursive Functions}
   497 \label{sec:recdef-examples}
   498 \input{Recdef/document/examples.tex}
   499 
   500 \subsection{Proving Termination}
   501 
   502 \input{Recdef/document/termination.tex}
   503 
   504 \subsection{Simplification with Recdef}
   505 \label{sec:recdef-simplification}
   506 
   507 \input{Recdef/document/simplification.tex}
   508 
   509 \subsection{Induction}
   510 \index{induction!recursion|(}
   511 \index{recursion induction|(}
   512 
   513 \input{Recdef/document/Induction.tex}
   514 \label{sec:recdef-induction}
   515 
   516 \index{induction!recursion|)}
   517 \index{recursion induction|)}
   518 \index{*recdef|)}