doc-src/TutorialI/fp.tex
author paulson
Fri, 05 Jan 2001 18:32:57 +0100
changeset 10795 9e888d60d3e5
parent 10654 458068404143
child 10800 2d4c058749a7
permissions -rw-r--r--
minor edits to Chapters 1-3
     1 \chapter{Functional Programming in HOL}
     2 
     3 Although on the surface this chapter is mainly concerned with how to write
     4 functional programs in HOL and how to verify them, most of the
     5 constructs and proof procedures introduced are general purpose and recur in
     6 any specification or verification task.
     7 
     8 The dedicated functional programmer should be warned: HOL offers only
     9 \emph{total functional programming} --- all functions in HOL must be total;
    10 lazy data structures are not directly available. On the positive side,
    11 functions in HOL need not be computable: HOL is a specification language that
    12 goes well beyond what can be expressed as a program. However, for the time
    13 being we concentrate on the computable.
    14 
    15 \section{An introductory theory}
    16 \label{sec:intro-theory}
    17 
    18 Functional programming needs datatypes and functions. Both of them can be
    19 defined in a theory with a syntax reminiscent of languages like ML or
    20 Haskell. As an example consider the theory in figure~\ref{fig:ToyList}.
    21 We will now examine it line by line.
    22 
    23 \begin{figure}[htbp]
    24 \begin{ttbox}\makeatother
    25 \input{ToyList2/ToyList1}\end{ttbox}
    26 \caption{A theory of lists}
    27 \label{fig:ToyList}
    28 \end{figure}
    29 
    30 {\makeatother\input{ToyList/document/ToyList.tex}}
    31 
    32 The complete proof script is shown in Fig.\ts\ref{fig:ToyList-proofs}. The
    33 concatenation of Figs.\ts\ref{fig:ToyList} and~\ref{fig:ToyList-proofs}
    34 constitutes the complete theory \texttt{ToyList} and should reside in file
    35 \texttt{ToyList.thy}. It is good practice to present all declarations and
    36 definitions at the beginning of a theory to facilitate browsing.
    37 
    38 \begin{figure}[htbp]
    39 \begin{ttbox}\makeatother
    40 \input{ToyList2/ToyList2}\end{ttbox}
    41 \caption{Proofs about lists}
    42 \label{fig:ToyList-proofs}
    43 \end{figure}
    44 
    45 \subsubsection*{Review}
    46 
    47 This is the end of our toy proof. It should have familiarized you with
    48 \begin{itemize}
    49 \item the standard theorem proving procedure:
    50 state a goal (lemma or theorem); proceed with proof until a separate lemma is
    51 required; prove that lemma; come back to the original goal.
    52 \item a specific procedure that works well for functional programs:
    53 induction followed by all-out simplification via \isa{auto}.
    54 \item a basic repertoire of proof commands.
    55 \end{itemize}
    56 
    57 
    58 \section{Some helpful commands}
    59 \label{sec:commands-and-hints}
    60 
    61 This section discusses a few basic commands for manipulating the proof state
    62 and can be skipped by casual readers.
    63 
    64 There are two kinds of commands used during a proof: the actual proof
    65 commands and auxiliary commands for examining the proof state and controlling
    66 the display. Simple proof commands are of the form
    67 \isacommand{apply}\isa{(method)}\indexbold{apply} where \bfindex{method} is a
    68 synonym for ``theorem proving function''. Typical examples are
    69 \isa{induct_tac} and \isa{auto}. Further methods are introduced throughout
    70 the tutorial.  Unless stated otherwise you may assume that a method attacks
    71 merely the first subgoal. An exception is \isa{auto} which tries to solve all
    72 subgoals.
    73 
    74 The most useful auxiliary commands are:
    75 \begin{description}
    76 \item[Undoing:] \isacommand{undo}\indexbold{*undo} undoes the effect of the
    77   last command; \isacommand{undo} can be undone by
    78   \isacommand{redo}\indexbold{*redo}.  Both are only needed at the shell
    79   level and should not occur in the final theory.
    80 \item[Printing the current state:] \isacommand{pr}\indexbold{*pr} redisplays
    81   the current proof state, for example when it has disappeared off the
    82   screen.
    83 \item[Limiting the number of subgoals:] \isacommand{pr}~$n$ tells Isabelle to
    84   print only the first $n$ subgoals from now on and redisplays the current
    85   proof state. This is helpful when there are many subgoals.
    86 \item[Modifying the order of subgoals:]
    87 \isacommand{defer}\indexbold{*defer} moves the first subgoal to the end and
    88 \isacommand{prefer}\indexbold{*prefer}~$n$ moves subgoal $n$ to the front.
    89 \item[Printing theorems:]
    90   \isacommand{thm}\indexbold{*thm}~\textit{name}$@1$~\dots~\textit{name}$@n$
    91   prints the named theorems.
    92 \item[Displaying types:] We have already mentioned the flag
    93   \ttindex{show_types} above. It can also be useful for detecting typos in
    94   formulae early on. For example, if \texttt{show_types} is set and the goal
    95   \isa{rev(rev xs) = xs} is started, Isabelle prints the additional output
    96 \par\noindent
    97 \begin{isabelle}%
    98 Variables:\isanewline
    99 ~~xs~::~'a~list
   100 \end{isabelle}%
   101 \par\noindent
   102 which tells us that Isabelle has correctly inferred that
   103 \isa{xs} is a variable of list type. On the other hand, had we
   104 made a typo as in \isa{rev(re xs) = xs}, the response
   105 \par\noindent
   106 \begin{isabelle}%
   107 Variables:\isanewline
   108 ~~re~::~'a~list~{\isasymRightarrow}~'a~list\isanewline
   109 ~~xs~::~'a~list%
   110 \end{isabelle}%
   111 \par\noindent
   112 would have alerted us because of the unexpected variable \isa{re}.
   113 \item[Reading terms and types:] \isacommand{term}\indexbold{*term}
   114   \textit{string} reads, type-checks and prints the given string as a term in
   115   the current context; the inferred type is output as well.
   116   \isacommand{typ}\indexbold{*typ} \textit{string} reads and prints the given
   117   string as a type in the current context.
   118 \item[(Re)loading theories:] When you start your interaction you must open a
   119   named theory with the line \isa{\isacommand{theory}~T~=~\dots~:}. Isabelle
   120   automatically loads all the required parent theories from their respective
   121   files (which may take a moment, unless the theories are already loaded and
   122   the files have not been modified).
   123   
   124   If you suddenly discover that you need to modify a parent theory of your
   125   current theory you must first abandon your current theory\indexbold{abandon
   126   theory}\indexbold{theory!abandon} (at the shell
   127   level type \isacommand{kill}\indexbold{*kill}). After the parent theory has
   128   been modified you go back to your original theory. When its first line
   129   \isacommand{theory}\texttt{~T~=}~\dots~\texttt{:} is processed, the
   130   modified parent is reloaded automatically.
   131   
   132   The only time when you need to load a theory by hand is when you simply
   133   want to check if it loads successfully without wanting to make use of the
   134   theory itself. This you can do by typing
   135   \isa{\isacommand{use\_thy}\indexbold{*use_thy}~"T"}.
   136 \end{description}
   137 Further commands are found in the Isabelle/Isar Reference Manual.
   138 
   139 We now examine Isabelle's functional programming constructs systematically,
   140 starting with inductive datatypes.
   141 
   142 
   143 \section{Datatypes}
   144 \label{sec:datatype}
   145 
   146 Inductive datatypes are part of almost every non-trivial application of HOL.
   147 First we take another look at a very important example, the datatype of
   148 lists, before we turn to datatypes in general. The section closes with a
   149 case study.
   150 
   151 
   152 \subsection{Lists}
   153 \indexbold{*list}
   154 
   155 Lists are one of the essential datatypes in computing. Readers of this
   156 tutorial and users of HOL need to be familiar with their basic operations.
   157 Theory \isa{ToyList} is only a small fragment of HOL's predefined theory
   158 \isa{List}\footnote{\url{http://isabelle.in.tum.de/library/HOL/List.html}}.
   159 The latter contains many further operations. For example, the functions
   160 \isaindexbold{hd} (``head'') and \isaindexbold{tl} (``tail'') return the first
   161 element and the remainder of a list. (However, pattern-matching is usually
   162 preferable to \isa{hd} and \isa{tl}.)  
   163 Also available are the higher-order
   164 functions \isa{map}, \isa{filter}, \isa{foldl} and \isa{foldr}.
   165 Theory \isa{List} also contains
   166 more syntactic sugar: \isa{[}$x@1$\isa{,}\dots\isa{,}$x@n$\isa{]} abbreviates
   167 $x@1$\isa{\#}\dots\isa{\#}$x@n$\isa{\#[]}.  In the rest of the tutorial we
   168 always use HOL's predefined lists.
   169 
   170 
   171 \subsection{The general format}
   172 \label{sec:general-datatype}
   173 
   174 The general HOL \isacommand{datatype} definition is of the form
   175 \[
   176 \isacommand{datatype}~(\alpha@1, \dots, \alpha@n) \, t ~=~
   177 C@1~\tau@{11}~\dots~\tau@{1k@1} ~\mid~ \dots ~\mid~
   178 C@m~\tau@{m1}~\dots~\tau@{mk@m}
   179 \]
   180 where $\alpha@i$ are distinct type variables (the parameters), $C@i$ are distinct
   181 constructor names and $\tau@{ij}$ are types; it is customary to capitalize
   182 the first letter in constructor names. There are a number of
   183 restrictions (such as that the type should not be empty) detailed
   184 elsewhere~\cite{isabelle-HOL}. Isabelle notifies you if you violate them.
   185 
   186 Laws about datatypes, such as \isa{[] \isasymnoteq~x\#xs} and
   187 \isa{(x\#xs = y\#ys) = (x=y \isasymand~xs=ys)}, are used automatically
   188 during proofs by simplification.  The same is true for the equations in
   189 primitive recursive function definitions.
   190 
   191 Every datatype $t$ comes equipped with a \isa{size} function from $t$ into
   192 the natural numbers (see~{\S}\ref{sec:nat} below). For lists, \isa{size} is
   193 just the length, i.e.\ \isa{size [] = 0} and \isa{size(x \# xs) = size xs +
   194   1}.  In general, \isaindexbold{size} returns \isa{0} for all constructors
   195 that do not have an argument of type $t$, and for all other constructors
   196 \isa{1 +} the sum of the sizes of all arguments of type $t$. Note that because
   197 \isa{size} is defined on every datatype, it is overloaded; on lists
   198 \isa{size} is also called \isaindexbold{length}, which is not overloaded.
   199 Isabelle will always show \isa{size} on lists as \isa{length}.
   200 
   201 
   202 \subsection{Primitive recursion}
   203 
   204 Functions on datatypes are usually defined by recursion. In fact, most of the
   205 time they are defined by what is called \bfindex{primitive recursion}.
   206 The keyword \isacommand{primrec}\indexbold{*primrec} is followed by a list of
   207 equations
   208 \[ f \, x@1 \, \dots \, (C \, y@1 \, \dots \, y@k)\, \dots \, x@n = r \]
   209 such that $C$ is a constructor of the datatype $t$ and all recursive calls of
   210 $f$ in $r$ are of the form $f \, \dots \, y@i \, \dots$ for some $i$. Thus
   211 Isabelle immediately sees that $f$ terminates because one (fixed!) argument
   212 becomes smaller with every recursive call. There must be at most one equation
   213 for each constructor.  Their order is immaterial.
   214 A more general method for defining total recursive functions is introduced in
   215 {\S}\ref{sec:recdef}.
   216 
   217 \begin{exercise}\label{ex:Tree}
   218 \input{Misc/document/Tree.tex}%
   219 \end{exercise}
   220 
   221 \input{Misc/document/case_exprs.tex}
   222 
   223 \begin{warn}
   224   Induction is only allowed on free (or \isasymAnd-bound) variables that
   225   should not occur among the assumptions of the subgoal; see
   226   {\S}\ref{sec:ind-var-in-prems} for details. Case distinction
   227   (\isa{case_tac}) works for arbitrary terms, which need to be
   228   quoted if they are non-atomic.
   229 \end{warn}
   230 
   231 
   232 \input{Ifexpr/document/Ifexpr.tex}
   233 
   234 \section{Some basic types}
   235 
   236 
   237 \subsection{Natural numbers}
   238 \label{sec:nat}
   239 \index{arithmetic|(}
   240 
   241 \input{Misc/document/fakenat.tex}
   242 \input{Misc/document/natsum.tex}
   243 
   244 \index{arithmetic|)}
   245 
   246 
   247 \subsection{Pairs}
   248 \input{Misc/document/pairs.tex}
   249 
   250 \subsection{Datatype {\tt\slshape option}}
   251 \label{sec:option}
   252 \input{Misc/document/Option2.tex}
   253 
   254 \section{Definitions}
   255 \label{sec:Definitions}
   256 
   257 A definition is simply an abbreviation, i.e.\ a new name for an existing
   258 construction. In particular, definitions cannot be recursive. Isabelle offers
   259 definitions on the level of types and terms. Those on the type level are
   260 called type synonyms, those on the term level are called (constant)
   261 definitions.
   262 
   263 
   264 \subsection{Type synonyms}
   265 \indexbold{type synonym}
   266 
   267 Type synonyms are similar to those found in ML\@. Their syntax is fairly self
   268 explanatory:
   269 
   270 \input{Misc/document/types.tex}%
   271 
   272 Note that pattern-matching is not allowed, i.e.\ each definition must be of
   273 the form $f\,x@1\,\dots\,x@n~\isasymequiv~t$.
   274 Section~{\S}\ref{sec:Simplification} explains how definitions are used
   275 in proofs.
   276 
   277 \input{Misc/document/prime_def.tex}
   278 
   279 
   280 \chapter{More Functional Programming}
   281 
   282 The purpose of this chapter is to deepen the reader's understanding of the
   283 concepts encountered so far and to introduce advanced forms of datatypes and
   284 recursive functions. The first two sections give a structured presentation of
   285 theorem proving by simplification ({\S}\ref{sec:Simplification}) and discuss
   286 important heuristics for induction ({\S}\ref{sec:InductionHeuristics}). They can
   287 be skipped by readers less interested in proofs. They are followed by a case
   288 study, a compiler for expressions ({\S}\ref{sec:ExprCompiler}). Advanced
   289 datatypes, including those involving function spaces, are covered in
   290 {\S}\ref{sec:advanced-datatypes}, which closes with another case study, search
   291 trees (``tries'').  Finally we introduce \isacommand{recdef}, a very general
   292 form of recursive function definition which goes well beyond what
   293 \isacommand{primrec} allows ({\S}\ref{sec:recdef}).
   294 
   295 
   296 \section{Simplification}
   297 \label{sec:Simplification}
   298 \index{simplification|(}
   299 
   300 So far we have proved our theorems by \isa{auto}, which simplifies
   301 \emph{all} subgoals. In fact, \isa{auto} can do much more than that, except
   302 that it did not need to so far. However, when you go beyond toy examples, you
   303 need to understand the ingredients of \isa{auto}.  This section covers the
   304 method that \isa{auto} always applies first, namely simplification.
   305 
   306 Simplification is one of the central theorem proving tools in Isabelle and
   307 many other systems. The tool itself is called the \bfindex{simplifier}. The
   308 purpose of this section is introduce the many features of the simplifier.
   309 Anybody intending to use HOL should read this section. Later on
   310 ({\S}\ref{sec:simplification-II}) we explain some more advanced features and a
   311 little bit of how the simplifier works. The serious student should read that
   312 section as well, in particular in order to understand what happened if things
   313 do not simplify as expected.
   314 
   315 \subsubsection{What is simplification}
   316 
   317 In its most basic form, simplification means repeated application of
   318 equations from left to right. For example, taking the rules for \isa{\at}
   319 and applying them to the term \isa{[0,1] \at\ []} results in a sequence of
   320 simplification steps:
   321 \begin{ttbox}\makeatother
   322 (0#1#[]) @ []  \(\leadsto\)  0#((1#[]) @ [])  \(\leadsto\)  0#(1#([] @ []))  \(\leadsto\)  0#1#[]
   323 \end{ttbox}
   324 This is also known as \bfindex{term rewriting}\indexbold{rewriting} and the
   325 equations are referred to as \textbf{rewrite rules}\indexbold{rewrite rule}.
   326 ``Rewriting'' is more honest than ``simplification'' because the terms do not
   327 necessarily become simpler in the process.
   328 
   329 \input{Misc/document/simp.tex}
   330 
   331 \index{simplification|)}
   332 
   333 \input{Misc/document/Itrev.tex}
   334 
   335 \begin{exercise}
   336 \input{Misc/document/Tree2.tex}%
   337 \end{exercise}
   338 
   339 \input{CodeGen/document/CodeGen.tex}
   340 
   341 
   342 \section{Advanced datatypes}
   343 \label{sec:advanced-datatypes}
   344 \index{*datatype|(}
   345 \index{*primrec|(}
   346 %|)
   347 
   348 This section presents advanced forms of \isacommand{datatype}s.
   349 
   350 \subsection{Mutual recursion}
   351 \label{sec:datatype-mut-rec}
   352 
   353 \input{Datatype/document/ABexpr.tex}
   354 
   355 \subsection{Nested recursion}
   356 \label{sec:nested-datatype}
   357 
   358 {\makeatother\input{Datatype/document/Nested.tex}}
   359 
   360 
   361 \subsection{The limits of nested recursion}
   362 
   363 How far can we push nested recursion? By the unfolding argument above, we can
   364 reduce nested to mutual recursion provided the nested recursion only involves
   365 previously defined datatypes. This does not include functions:
   366 \begin{isabelle}
   367 \isacommand{datatype} t = C "t \isasymRightarrow\ bool"
   368 \end{isabelle}
   369 This declaration is a real can of worms.
   370 In HOL it must be ruled out because it requires a type
   371 \isa{t} such that \isa{t} and its power set \isa{t \isasymFun\ bool} have the
   372 same cardinality---an impossibility. For the same reason it is not possible
   373 to allow recursion involving the type \isa{set}, which is isomorphic to
   374 \isa{t \isasymFun\ bool}.
   375 
   376 Fortunately, a limited form of recursion
   377 involving function spaces is permitted: the recursive type may occur on the
   378 right of a function arrow, but never on the left. Hence the above can of worms
   379 is ruled out but the following example of a potentially infinitely branching tree is
   380 accepted:
   381 \smallskip
   382 
   383 \input{Datatype/document/Fundata.tex}
   384 \bigskip
   385 
   386 If you need nested recursion on the left of a function arrow, there are
   387 alternatives to pure HOL: LCF~\cite{paulson87} is a logic where types like
   388 \begin{isabelle}
   389 \isacommand{datatype} lam = C "lam \isasymrightarrow\ lam"
   390 \end{isabelle}
   391 do indeed make sense.  Note the different arrow,
   392 \isa{\isasymrightarrow} instead of \isa{\isasymRightarrow},
   393 expressing the type of \textbf{continuous} functions. 
   394 There is even a version of LCF on top of HOL,
   395 called HOLCF~\cite{MuellerNvOS99}.
   396 
   397 \index{*primrec|)}
   398 \index{*datatype|)}
   399 
   400 \subsection{Case study: Tries}
   401 \label{sec:Trie}
   402 
   403 Tries are a classic search tree data structure~\cite{Knuth3-75} for fast
   404 indexing with strings. Figure~\ref{fig:trie} gives a graphical example of a
   405 trie containing the words ``all'', ``an'', ``ape'', ``can'', ``car'' and
   406 ``cat''.  When searching a string in a trie, the letters of the string are
   407 examined sequentially. Each letter determines which subtrie to search next.
   408 In this case study we model tries as a datatype, define a lookup and an
   409 update function, and prove that they behave as expected.
   410 
   411 \begin{figure}[htbp]
   412 \begin{center}
   413 \unitlength1mm
   414 \begin{picture}(60,30)
   415 \put( 5, 0){\makebox(0,0)[b]{l}}
   416 \put(25, 0){\makebox(0,0)[b]{e}}
   417 \put(35, 0){\makebox(0,0)[b]{n}}
   418 \put(45, 0){\makebox(0,0)[b]{r}}
   419 \put(55, 0){\makebox(0,0)[b]{t}}
   420 %
   421 \put( 5, 9){\line(0,-1){5}}
   422 \put(25, 9){\line(0,-1){5}}
   423 \put(44, 9){\line(-3,-2){9}}
   424 \put(45, 9){\line(0,-1){5}}
   425 \put(46, 9){\line(3,-2){9}}
   426 %
   427 \put( 5,10){\makebox(0,0)[b]{l}}
   428 \put(15,10){\makebox(0,0)[b]{n}}
   429 \put(25,10){\makebox(0,0)[b]{p}}
   430 \put(45,10){\makebox(0,0)[b]{a}}
   431 %
   432 \put(14,19){\line(-3,-2){9}}
   433 \put(15,19){\line(0,-1){5}}
   434 \put(16,19){\line(3,-2){9}}
   435 \put(45,19){\line(0,-1){5}}
   436 %
   437 \put(15,20){\makebox(0,0)[b]{a}}
   438 \put(45,20){\makebox(0,0)[b]{c}}
   439 %
   440 \put(30,30){\line(-3,-2){13}}
   441 \put(30,30){\line(3,-2){13}}
   442 \end{picture}
   443 \end{center}
   444 \caption{A sample trie}
   445 \label{fig:trie}
   446 \end{figure}
   447 
   448 Proper tries associate some value with each string. Since the
   449 information is stored only in the final node associated with the string, many
   450 nodes do not carry any value. This distinction is modeled with the help
   451 of the predefined datatype \isa{option} (see {\S}\ref{sec:option}).
   452 \input{Trie/document/Trie.tex}
   453 
   454 \begin{exercise}
   455   Write an improved version of \isa{update} that does not suffer from the
   456   space leak in the version above. Prove the main theorem for your improved
   457   \isa{update}.
   458 \end{exercise}
   459 
   460 \begin{exercise}
   461   Write a function to \emph{delete} entries from a trie. An easy solution is
   462   a clever modification of \isa{update} which allows both insertion and
   463   deletion with a single function.  Prove (a modified version of) the main
   464   theorem above. Optimize you function such that it shrinks tries after
   465   deletion, if possible.
   466 \end{exercise}
   467 
   468 \section{Total recursive functions}
   469 \label{sec:recdef}
   470 \index{*recdef|(}
   471 
   472 Although many total functions have a natural primitive recursive definition,
   473 this is not always the case. Arbitrary total recursive functions can be
   474 defined by means of \isacommand{recdef}: you can use full pattern-matching,
   475 recursion need not involve datatypes, and termination is proved by showing
   476 that the arguments of all recursive calls are smaller in a suitable (user
   477 supplied) sense. In this section we ristrict ourselves to measure functions;
   478 more advanced termination proofs are discussed in {\S}\ref{sec:beyond-measure}.
   479 
   480 \subsection{Defining recursive functions}
   481 \label{sec:recdef-examples}
   482 \input{Recdef/document/examples.tex}
   483 
   484 \subsection{Proving termination}
   485 
   486 \input{Recdef/document/termination.tex}
   487 
   488 \subsection{Simplification with recdef}
   489 \label{sec:recdef-simplification}
   490 
   491 \input{Recdef/document/simplification.tex}
   492 
   493 \subsection{Induction}
   494 \index{induction!recursion|(}
   495 \index{recursion induction|(}
   496 
   497 \input{Recdef/document/Induction.tex}
   498 \label{sec:recdef-induction}
   499 
   500 \index{induction!recursion|)}
   501 \index{recursion induction|)}
   502 \index{*recdef|)}