doc-src/Tutorial/fp.tex
author nipkow
Wed, 26 Aug 1998 16:57:49 +0200
changeset 5375 1463e182c533
child 5850 9712294e60b9
permissions -rw-r--r--
The HOL tutorial.
\chapter{Functional Programming in HOL}

Although on the surface this chapter is mainly concerned with how to write
functional programs in HOL and how to verify them, most of the
constructs and proof procedures introduced are general purpose and recur in
any specification or verification task.

The dedicated functional programmer should be warned: HOL offers only what
could be called {\em total functional programming} --- all functions in HOL
must be total; lazy data structures are not directly available. On the
positive side, functions in HOL need not be computable: HOL is a
specification language that goes well beyond what can be expressed as a
program. However, for the time being we concentrate on the computable.

\section{An introductory theory}
\label{sec:intro-theory}

Functional programming needs datatypes and functions. Both of them can be
defined in a theory with a syntax reminiscent of languages like ML or
Haskell. As an example consider the theory in Fig.~\ref{fig:ToyList}.

\begin{figure}[htbp]
\begin{ttbox}\makeatother
\input{ToyList/ToyList.thy}\end{ttbox}
\caption{A theory of lists}
\label{fig:ToyList}
\end{figure}

HOL already has a predefined theory of lists called \texttt{List} ---
\texttt{ToyList} is merely a small fragment of it chosen as an example. In
contrast to what is recommended in \S\ref{sec:Basic:Theories},
\texttt{ToyList} is not based on \texttt{Main} but on \texttt{Datatype}, a
theory that contains everything required for datatype definitions but does
not have \texttt{List} as a parent, thus avoiding ambiguities caused by
defining lists twice.

The \ttindexbold{datatype} \texttt{list} introduces two constructors,
\texttt{Nil} and \texttt{Cons}: the empty list and the operator that adds an
element to the front of a list. For example, the term \texttt{Cons True (Cons
  False Nil)} is a value of type \texttt{bool~list}, namely the list with the
elements \texttt{True} and \texttt{False}. Because this notation becomes
unwieldy very quickly, the datatype declaration is annotated with an
alternative syntax: instead of \texttt{Nil} and \texttt{Cons}~$x$~$xs$ we can
write \index{#@{\tt[]}|bold}\texttt{[]} and
\texttt{$x$~\#~$xs$}\index{#@{\tt\#}|bold}. In fact, this alternative syntax
is the standard syntax. Thus the list \texttt{Cons True (Cons False Nil)}
becomes \texttt{True \# False \# []}. The annotation \ttindexbold{infixr}
means that \texttt{\#} associates to the right, i.e.\ the term \texttt{$x$ \#
  $y$ \# $z$} is read as \texttt{$x$ \# ($y$ \# $z$)} and not as \texttt{($x$
  \# $y$) \# $z$}.

\begin{warn}
  Syntax annotations are a powerful but completely optional feature. You
  could drop them from theory \texttt{ToyList} and go back to the identifiers
  \texttt{Nil} and \texttt{Cons}. However, lists are such a central datatype
  that their syntax is highly customized. We recommend that novices not use
  syntax annotations in their own theories.
\end{warn}

Next, the functions \texttt{app} and \texttt{rev} are declared. In contrast
to ML, Isabelle insists on explicit declarations of all functions (keyword
\ttindexbold{consts}).  (Apart from the declaration-before-use restriction,
the order of items in a theory file is unconstrained.) Function \texttt{app}
is annotated with concrete syntax too. Instead of the prefix syntax
\texttt{app}~$xs$~$ys$ the infix $xs$~\texttt{\at}~$ys$ becomes the preferred
form.

Both functions are defined recursively. The equations for \texttt{app} and
\texttt{rev} hardly need comments: \texttt{app} appends two lists and
\texttt{rev} reverses a list.  The keyword \texttt{primrec} indicates that
the recursion is of a particularly primitive kind where each recursive call
peels off a datatype constructor from one of the arguments (see
\S\ref{sec:datatype}).  Thus the recursion always terminates, i.e.\ the
function is \bfindex{total}.

The termination requirement is absolutely essential in HOL, a logic of total
functions. If we were to drop it, inconsistencies could quickly arise: the
``definition'' $f(n) = f(n)+1$ immediately leads to $0 = 1$ by subtracting
$f(n)$ on both sides.
% However, this is a subtle issue that we cannot discuss here further.

\begin{warn}
  As we have indicated, the desire for total functions is not a gratuitously
  imposed restriction but an essential characteristic of HOL. It is only
  because of totality that reasoning in HOL is comparatively easy.  More
  generally, the philosophy in HOL is not to allow arbitrary axioms (such as
  function definitions whose totality has not been proved) because they
  quickly lead to inconsistencies. Instead, fixed constructs for introducing
  types and functions are offered (such as \texttt{datatype} and
  \texttt{primrec}) which are guaranteed to preserve consistency.
\end{warn}

A remark about syntax.  The textual definition of a theory follows a fixed
syntax with keywords like \texttt{datatype} and \texttt{end} (see
Fig.~\ref{fig:keywords} in Appendix~\ref{sec:Appendix} for a full list).
Embedded in this syntax are the types and formulae of HOL, whose syntax is
extensible, e.g.\ by new user-defined infix operators
(see~\S\ref{sec:infix-syntax}). To distinguish the two levels, everything
HOL-specific should be enclosed in \texttt{"}\dots\texttt{"}. The same holds
for identifiers that happen to be keywords, as in
\begin{ttbox}
consts "end" :: 'a list => 'a
\end{ttbox}
To lessen this burden, quotation marks around types can be dropped,
provided their syntax does not go beyond what is described in
\S\ref{sec:TypesTermsForms}. Types containing further operators, e.g.\
\texttt{*} for Cartesian products, need quotation marks.

When Isabelle prints a syntax error message, it refers to the HOL syntax as
the \bfindex{inner syntax}.

\section{An introductory proof}
\label{sec:intro-proof}

Having defined \texttt{ToyList}, we load it with the ML command
\begin{ttbox}
use_thy "ToyList";
\end{ttbox}
and are ready to prove a few simple theorems. This will illustrate not just
the basic proof commands but also the typical proof process.

\subsubsection*{Main goal: \texttt{rev(rev xs) = xs}}

Our goal is to show that reversing a list twice produces the original
list. Typing
\begin{ttbox}
\input{ToyList/thm}\end{ttbox}
establishes a new goal to be proved in the context of the current theory,
which is the one we just loaded. Isabelle's response is to print the current
proof state:
\begin{ttbox}
{\out Level 0}
{\out rev (rev xs) = xs}
{\out  1. rev (rev xs) = xs}
\end{ttbox}
Until we have finished a proof, the proof state always looks like this:
\begin{ttbox}
{\out Level \(i\)}
{\out \(G\)}
{\out  1. \(G@1\)}
{\out  \(\vdots\)}
{\out  \(n\). \(G@n\)}
\end{ttbox}
where \texttt{Level}~$i$ indicates that we are $i$ steps into the proof, $G$
is the overall goal that we are trying to prove, and the numbered lines
contain the subgoals $G@1$, \dots, $G@n$ that we need to prove to establish
$G$. At \texttt{Level 0} there is only one subgoal, which is identical with
the overall goal.  Normally $G$ is constant and only serves as a reminder.
Hence we rarely show it in this tutorial.

Let us now get back to \texttt{rev(rev xs) = xs}. Properties of recursively
defined functions are best established by induction. In this case there is
not much choice except to induct on \texttt{xs}:
\begin{ttbox}
\input{ToyList/inductxs}\end{ttbox}
This tells Isabelle to perform induction on variable \texttt{xs} in subgoal
1. The new proof state contains two subgoals, namely the base case
(\texttt{Nil}) and the induction step (\texttt{Cons}):
\begin{ttbox}
{\out 1. rev (rev []) = []}
{\out 2. !!a list. rev (rev list) = list ==> rev (rev (a # list)) = a # list}
\end{ttbox}
The induction step is an example of the general format of a subgoal:
\begin{ttbox}
{\out  \(i\). !!\(x@1 \dots x@n\). {\it assumptions} ==> {\it conclusion}}
\end{ttbox}\index{==>@{\tt==>}|bold}
The prefix of bound variables \texttt{!!\(x@1 \dots x@n\)} can be ignored
most of the time, or simply treated as a list of variables local to this
subgoal. Their deeper significance is explained in \S\ref{sec:PCproofs}.  The
{\it assumptions} are the local assumptions for this subgoal and {\it
  conclusion} is the actual proposition to be proved. Typical proof steps
that add new assumptions are induction or case distinction. In our example
the only assumption is the induction hypothesis \texttt{rev (rev list) =
  list}, where \texttt{list} is a variable name chosen by Isabelle. If there
are multiple assumptions, they are enclosed in the bracket pair
\texttt{[|}\index{==>@\ttlbr|bold} and \texttt{|]}\index{==>@\ttrbr|bold}
and separated by semicolons.
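Schematically, a subgoal with two local assumptions would thus be displayed as follows (an illustrative display with hypothetical propositions \texttt{P}, \texttt{Q} and \texttt{R}, not output of this particular proof):
\begin{ttbox}
{\out  1. !!x. [| P x; Q x |] ==> R x}
\end{ttbox}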

Let us try to solve both goals automatically:
\begin{ttbox}
\input{ToyList/autotac}\end{ttbox}
This command tells Isabelle to apply a proof strategy called
\texttt{Auto_tac} to all subgoals. Essentially, \texttt{Auto_tac} tries to
`simplify' the subgoals.  In our case, subgoal~1 is solved completely (thanks
to the equation \texttt{rev [] = []}) and disappears; the simplified version
of subgoal~2 becomes the new subgoal~1:
\begin{ttbox}\makeatother
{\out 1. !!a list. rev(rev list) = list ==> rev(rev list @ a # []) = a # list}
\end{ttbox}
In order to simplify this subgoal further, a lemma suggests itself.

\subsubsection*{First lemma: \texttt{rev(xs \at~ys) = (rev ys) \at~(rev xs)}}

We start the proof as usual:
\begin{ttbox}\makeatother
\input{ToyList/lemma1}\end{ttbox}
There are two variables that we could induct on: \texttt{xs} and
\texttt{ys}. Because \texttt{\at} is defined by recursion on
the first argument, \texttt{xs} is the correct one:
\begin{ttbox}
\input{ToyList/inductxs}\end{ttbox}
This time not even the base case is solved automatically:
\begin{ttbox}\makeatother
by(Auto_tac);
{\out 1. rev ys = rev ys @ []}
{\out 2. \dots}
\end{ttbox}
We need another lemma.

\subsubsection*{Second lemma: \texttt{xs \at~[] = xs}}

This time the canonical proof procedure
\begin{ttbox}\makeatother
\input{ToyList/lemma2}\input{ToyList/inductxs}\input{ToyList/autotac}\end{ttbox}
leads to the desired message \texttt{No subgoals!}:
\begin{ttbox}\makeatother
{\out Level 2}
{\out xs @ [] = xs}
{\out No subgoals!}
\end{ttbox}
Now we can give the lemma just proved a suitable name
\begin{ttbox}
\input{ToyList/qed2}\end{ttbox}
and tell Isabelle to use this lemma in all future proofs by simplification:
\begin{ttbox}
\input{ToyList/addsimps2}\end{ttbox}
Note that in the theorem \texttt{app_Nil2} the free variable \texttt{xs} has
been replaced by the unknown \texttt{?xs}, just as explained in
\S\ref{sec:variables}.

Going back to the proof of the first lemma
\begin{ttbox}\makeatother
\input{ToyList/lemma1}\input{ToyList/inductxs}\input{ToyList/autotac}\end{ttbox}
we find that this time \texttt{Auto_tac} solves the base case, but the
induction step merely simplifies to
\begin{ttbox}\makeatother
{\out 1. !!a list.}
{\out       rev (list @ ys) = rev ys @ rev list}
{\out       ==> (rev ys @ rev list) @ a # [] = rev ys @ rev list @ a # []}
\end{ttbox}
Now we need to remember that \texttt{\at} associates to the right, and that
\texttt{\#} and \texttt{\at} have the same priority (namely the \texttt{65}
in the definition of \texttt{ToyList}). Thus the conclusion really is
\begin{ttbox}\makeatother
{\out     ==> (rev ys @ rev list) @ (a # []) = rev ys @ (rev list @ (a # []))}
\end{ttbox}
and the missing lemma is associativity of \texttt{\at}.

\subsubsection*{Third lemma: \texttt{(xs \at~ys) \at~zs = xs \at~(ys \at~zs)}}

This time the canonical proof procedure
\begin{ttbox}\makeatother
\input{ToyList/lemma3}\end{ttbox}
succeeds without further ado. Again we name the lemma and add it to
the set of lemmas used during simplification:
\begin{ttbox}
\input{ToyList/qed3}\end{ttbox}
Now we can go back and prove the first lemma
\begin{ttbox}\makeatother
\input{ToyList/lemma1}\input{ToyList/inductxs}\input{ToyList/autotac}\end{ttbox}
add it to the simplification lemmas
\begin{ttbox}
\input{ToyList/qed1}\end{ttbox}
and then solve our main theorem:
\begin{ttbox}\makeatother
\input{ToyList/thm}\input{ToyList/inductxs}\input{ToyList/autotac}\end{ttbox}

\subsubsection*{Review}

This is the end of our toy proof. It should have familiarized you with
\begin{itemize}
\item the standard theorem proving procedure:
state a goal; proceed with the proof until a new lemma is required; prove that
lemma; come back to the original goal.
\item a specific procedure that works well for functional programs:
induction followed by all-out simplification via \texttt{Auto_tac}.
\item a basic repertoire of proof commands.
\end{itemize}


\section{Some helpful commands}
\label{sec:commands-and-hints}

This section discusses a few basic commands for manipulating the proof state
and can be skipped by casual readers.

There are two kinds of commands used during a proof: the actual proof
commands and auxiliary commands for examining the proof state and controlling
the display. Proof commands are always of the form
\texttt{by(\textit{tactic});}\indexbold{tactic} where \textit{tactic} is a
synonym for ``theorem proving function''. Typical examples are
\texttt{induct_tac} and \texttt{Auto_tac} --- the suffix \texttt{_tac} is
merely a mnemonic. Further tactics are introduced throughout the tutorial.
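In the proof of \S\ref{sec:intro-proof}, for instance, the proof commands were roughly of the following shape, where \texttt{induct_tac} takes the name of the induction variable and the number of the subgoal it is applied to:
\begin{ttbox}
by(induct_tac "xs" 1);
by(Auto_tac);
\end{ttbox}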

%Tactics can also be modified. For example,
%\begin{ttbox}
%by(ALLGOALS Asm_simp_tac);
%\end{ttbox}
%tells Isabelle to apply \texttt{Asm_simp_tac} to all subgoals. For more on
%tactics and how to combine them see~\S\ref{sec:Tactics}.
   301
The most useful auxiliary commands are:
nipkow@5375
   302
\begin{description}
nipkow@5375
   303
\item[Printing the current state]
nipkow@5375
   304
Type \texttt{pr();} to redisplay the current proof state, for example when it
nipkow@5375
   305
has disappeared off the screen.
nipkow@5375
   306
\item[Limiting the number of subgoals]
nipkow@5375
   307
Typing \texttt{prlim $k$;} tells Isabelle to print only the first $k$
nipkow@5375
   308
subgoals from now on and redisplays the current proof state. This is helpful
nipkow@5375
   309
when there are many subgoals.
nipkow@5375
   310
\item[Undoing] Typing \texttt{undo();} undoes the effect of the last
nipkow@5375
   311
tactic.
nipkow@5375
   312
\item[Context switch] Every proof happens in the context of a
nipkow@5375
   313
  \bfindex{current theory}. By default, this is the last theory loaded. If
nipkow@5375
   314
  you want to prove a theorem in the context of a different theory
nipkow@5375
   315
  \texttt{T}, you need to type \texttt{context T.thy;}\index{*context|bold}
nipkow@5375
   316
  first. Of course you need to change the context again if you want to go
nipkow@5375
   317
  back to your original theory.
nipkow@5375
   318
\item[Displaying types] We have already mentioned the flag
nipkow@5375
   319
  \ttindex{show_types} above. It can also be useful for detecting typos in
nipkow@5375
   320
  formulae early on. For example, if \texttt{show_types} is set and the goal
nipkow@5375
   321
  \texttt{rev(rev xs) = xs} is started, Isabelle prints the additional output
nipkow@5375
   322
\begin{ttbox}
nipkow@5375
   323
{\out Variables:}
nipkow@5375
   324
{\out   xs :: 'a list}
nipkow@5375
   325
\end{ttbox}
nipkow@5375
   326
which tells us that Isabelle has correctly inferred that
nipkow@5375
   327
\texttt{xs} is a variable of list type. On the other hand, had we
nipkow@5375
   328
made a typo as in \texttt{rev(re xs) = xs}, the response
nipkow@5375
   329
\begin{ttbox}
nipkow@5375
   330
Variables:
nipkow@5375
   331
  re :: 'a list => 'a list
nipkow@5375
   332
  xs :: 'a list
nipkow@5375
   333
\end{ttbox}
nipkow@5375
   334
would have alerted us because of the unexpected variable \texttt{re}.
nipkow@5375
   335
\item[(Re)loading theories]\index{loading theories}\index{reloading theories}
nipkow@5375
   336
Initially you load theory \texttt{T} by typing \ttindex{use_thy}~\texttt{"T";},
nipkow@5375
   337
which loads all parent theories of \texttt{T} automatically, if they are not
nipkow@5375
   338
loaded already. If you modify \texttt{T.thy} or \texttt{T.ML}, you can
nipkow@5375
   339
reload it by typing \texttt{use_thy~"T";} again. This time, however, only
nipkow@5375
   340
\texttt{T} is reloaded. If some of \texttt{T}'s parents have changed as well,
nipkow@5375
   341
type \ttindexbold{update}\texttt{();} to reload all theories that have
nipkow@5375
   342
changed.
nipkow@5375
   343
\end{description}
nipkow@5375
   344
Further commands are found in the Reference Manual.


\section{Datatypes}
\label{sec:datatype}

Inductive datatypes are part of almost every non-trivial application of HOL.
First we take another look at a very important example, the datatype of
lists, before we turn to datatypes in general. The section closes with a
case study.


\subsection{Lists}

Lists are one of the essential datatypes in computing. Readers of this tutorial
and users of HOL need to be familiar with their basic operations. Theory
\texttt{ToyList} is only a small fragment of HOL's predefined theory
\texttt{List}\footnote{\texttt{http://www.in.tum.de/\~\relax
    isabelle/library/HOL/List.html}}.
The latter contains many further operations. For example, the functions
\ttindexbold{hd} (`head') and \ttindexbold{tl} (`tail') return the first
element and the remainder of a list. (However, pattern-matching is usually
preferable to \texttt{hd} and \texttt{tl}.)
Theory \texttt{List} also contains more syntactic sugar:
\texttt{[}$x@1$\texttt{,}\dots\texttt{,}$x@n$\texttt{]} abbreviates
$x@1$\texttt{\#}\dots\texttt{\#}$x@n$\texttt{\#[]}.
In the rest of the tutorial we always use HOL's predefined lists.
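For instance, instantiating this abbreviation with three elements yields
\begin{ttbox}
[x, y, z]  =  x # y # z # []
\end{ttbox}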


\subsection{The general format}
\label{sec:general-datatype}

The general HOL \texttt{datatype} definition is of the form
\[
\mathtt{datatype}~(\alpha@1, \dots, \alpha@n) \, t ~=~
C@1~\tau@{11}~\dots~\tau@{1k@1} ~\mid~ \dots ~\mid~
C@m~\tau@{m1}~\dots~\tau@{mk@m}
\]
where the $\alpha@i$ are type variables (the parameters), the $C@i$ are
distinct constructor names and the $\tau@{ij}$ are types; it is customary to
capitalize the first letter of constructor names. There are a number of
restrictions (such as that the type should not be empty) detailed
elsewhere~\cite{Isa-Logics-Man}. Isabelle notifies you if you violate them.
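For instance, with one type parameter ($n=1$) and two constructors ($m=2$) the
scheme is instantiated by a datatype of optional values (a hypothetical
declaration for illustration; HOL provides a similar type predefined):
\begin{ttbox}
datatype 'a opt = None | Some 'a
\end{ttbox}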

Laws about datatypes, such as \verb$[] ~= x#xs$ and \texttt{(x\#xs = y\#ys) =
  (x=y \& xs=ys)}, are used automatically during proofs by simplification.
The same is true for the equations in primitive recursive function
definitions.

\subsection{Primitive recursion}

Functions on datatypes are usually defined by recursion. In fact, most of the
time they are defined by what is called \bfindex{primitive recursion}.
The keyword \texttt{primrec} is followed by a list of equations
\[ f \, x@1 \, \dots \, (C \, y@1 \, \dots \, y@k)\, \dots \, x@n = r \]
such that $C$ is a constructor of the datatype $t$ and all recursive calls of
$f$ in $r$ are of the form $f \, \dots \, y@i \, \dots$ for some $i$. Thus
Isabelle immediately sees that $f$ terminates because one (fixed!) argument
becomes smaller with every recursive call. There must be exactly one equation
for each constructor.  Their order is immaterial.
A more general method for defining total recursive functions is explained in
\S\ref{sec:recdef}.
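As a hypothetical example in the style of theory \texttt{ToyList}, a length
function on lists can be defined by primitive recursion (\texttt{len} is our
own name, chosen to avoid the predefined \texttt{length}):
\begin{ttbox}
consts len :: 'a list => nat
primrec
"len [] = 0"
"len (x # xs) = Suc(len xs)"
\end{ttbox}
Each recursive call peels the constructor \texttt{\#} off the argument, so
Isabelle accepts the definition as terminating.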

\begin{exercise}
Given the datatype of binary trees
\begin{ttbox}
\input{Misc/tree}\end{ttbox}
define a function \texttt{mirror} that mirrors the structure of a binary tree
by swapping subtrees (recursively). Prove \texttt{mirror(mirror(t)) = t}.
\end{exercise}

\subsection{\texttt{case}-expressions}
\label{sec:case-expressions}

HOL also features \ttindexbold{case}-expressions for analyzing elements of a
datatype. For example,
\begin{ttbox}
case xs of [] => 0 | y#ys => y
\end{ttbox}
evaluates to \texttt{0} if \texttt{xs} is \texttt{[]} and to \texttt{y} if
\texttt{xs} is \texttt{y\#ys}. (Since the result in both branches must be of
the same type, it follows that \texttt{y::nat} and hence
\texttt{xs::(nat)list}.)

In general, if $e$ is a term of the datatype $t$ defined in
\S\ref{sec:general-datatype} above, the corresponding
\texttt{case}-expression analyzing $e$ is
\[
\begin{array}{rrcl}
\mbox{\tt case}~e~\mbox{\tt of} & C@1~x@{11}~\dots~x@{1k@1} & \To & e@1 \\
                           \vdots \\
                           \mid & C@m~x@{m1}~\dots~x@{mk@m} & \To & e@m
\end{array}
\]

\begin{warn}
{\em All} constructors must be present, their order is fixed, and nested
patterns are not supported.  Violating these restrictions results in strange
error messages.
\end{warn}
\noindent
Nested patterns can be simulated by nested \texttt{case}-expressions: instead
of
\begin{ttbox}
case xs of [] => 0 | [x] => x | x#(y#zs) => y
\end{ttbox}
write
\begin{ttbox}
case xs of [] => 0 | x#ys => (case ys of [] => x | y#zs => y)
\end{ttbox}
Note that \texttt{case}-expressions should be enclosed in parentheses to
indicate their scope.

\subsection{Structural induction}

Almost all the basic laws about a datatype are applied automatically during
simplification. Only induction is invoked by hand via \texttt{induct_tac},
which works for any datatype. In some cases, induction is overkill and a case
distinction over all constructors of the datatype suffices. This is performed
by \ttindexbold{exhaust_tac}. A trivial example:
\begin{ttbox}
\input{Misc/exhaust.ML}{\out1. xs = [] ==> (case xs of [] => [] | y # ys => xs) = xs}
{\out2. !!a list. xs = a # list ==> (case xs of [] => [] | y # ys => xs) = xs}
\input{Misc/autotac.ML}\end{ttbox}
Note that this particular case distinction could have been automated
completely. See~\S\ref{sec:SimpFeatures}.

\begin{warn}
  Induction is only allowed on a free variable that should not occur among
  the assumptions of the subgoal.  Exhaustion works for arbitrary terms.
\end{warn}

\subsection{Case study: boolean expressions}
\label{sec:boolex}

The aim of this case study is twofold: it shows how to model boolean
expressions and some algorithms for manipulating them, and it demonstrates
the constructs introduced above.

\subsubsection{How can we model boolean expressions?}

We want to represent boolean expressions built up from variables and
constants by negation and conjunction. The following datatype serves exactly
that purpose:
\begin{ttbox}
\input{Ifexpr/boolex}\end{ttbox}
The two constants are represented by the terms \texttt{Const~True} and
\texttt{Const~False}. Variables are represented by terms of the form
\texttt{Var}~$n$, where $n$ is a natural number (type \texttt{nat}).
For example, the formula $P@0 \land \neg P@1$ is represented by the term
\texttt{And~(Var~0)~(Neg(Var~1))}.

\subsubsection{What is the value of a boolean expression?}

The value of a boolean expression depends on the values of its variables.
Hence the function \texttt{value} takes an additional parameter, an {\em
  environment} of type \texttt{nat~=>~bool}, which maps variables to their
values:
\begin{ttbox}
\input{Ifexpr/value}\end{ttbox}

\subsubsection{If-expressions}

An alternative and often more efficient (because in a certain sense
canonical) representation is given by so-called \textit{If-expressions\/}
built up from constants (\texttt{CIF}), variables (\texttt{VIF}) and
conditionals (\texttt{IF}):
\begin{ttbox}
\input{Ifexpr/ifex}\end{ttbox}
The evaluation of If-expressions proceeds as for \texttt{boolex}:
\begin{ttbox}
\input{Ifexpr/valif}\end{ttbox}
nipkow@5375
   516
nipkow@5375
   517
\subsubsection{Transformation into and of If-expressions}
nipkow@5375
   518
nipkow@5375
   519
The type \texttt{boolex} is close to the customary representation of logical
nipkow@5375
   520
formulae, whereas \texttt{ifex} is designed for efficiency. Thus we need to
nipkow@5375
   521
translate from \texttt{boolex} into \texttt{ifex}:
nipkow@5375
   522
\begin{ttbox}
nipkow@5375
   523
\input{Ifexpr/bool2if}\end{ttbox}
nipkow@5375
   524
At last, we have something we can verify: that \texttt{bool2if} preserves the
nipkow@5375
   525
value of its argument.
nipkow@5375
   526
\begin{ttbox}
nipkow@5375
   527
\input{Ifexpr/bool2if.ML}\end{ttbox}
nipkow@5375
   528
The proof is canonical:
nipkow@5375
   529
\begin{ttbox}
nipkow@5375
   530
\input{Ifexpr/proof.ML}\end{ttbox}
nipkow@5375
   531
In fact, all proofs in this case study look exactly like this. Hence we do
nipkow@5375
   532
not show them below.
nipkow@5375
   533
nipkow@5375
   534
More interesting is the transformation of If-expressions into a normal form
nipkow@5375
   535
where the first argument of \texttt{IF} cannot be another \texttt{IF} but
nipkow@5375
   536
must be a constant or variable. Such a normal form can be computed by
nipkow@5375
   537
repeatedly replacing a subterm of the form \texttt{IF~(IF~b~x~y)~z~u} by
nipkow@5375
   538
\texttt{IF b (IF x z u) (IF y z u)}, which has the same value. The following
nipkow@5375
   539
primitive recursive functions perform this task:
nipkow@5375
   540
\begin{ttbox}
nipkow@5375
   541
\input{Ifexpr/normif}
nipkow@5375
   542
\input{Ifexpr/norm}\end{ttbox}
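To build the intuition the text asks for, the normalization can be sketched
executably. The definitions below are an illustration under the assumption
that \texttt{normif~b~t~e} builds a normal-form conditional with condition
\texttt{b} and branches \texttt{t}, \texttt{e}, pushing nested conditions
outward as described above; they are not the contents of the \texttt{\input}'d
theory files, and the \texttt{normal} checker anticipates the predicate
introduced later in this section.

```python
# Executable sketch of ifex, valif, normif and norm (illustration only).

def CIF(b):      return ('CIF', b)
def VIF(n):      return ('VIF', n)
def IF(b, t, e): return ('IF', b, t, e)

def valif(e, env):
    tag = e[0]
    if tag == 'CIF': return e[1]
    if tag == 'VIF': return env(e[1])
    # IF b t e: the value of t if b holds, otherwise the value of e
    return valif(e[2], env) if valif(e[1], env) else valif(e[3], env)

def normif(b, t, e):
    """Build a normal conditional from condition b and normal branches t, e."""
    if b[0] in ('CIF', 'VIF'):
        return IF(b, t, e)
    # IF (IF b' x y) z u  ~>  IF b' (IF x z u) (IF y z u), applied recursively
    return normif(b[1], normif(b[2], t, e), normif(b[3], t, e))

def norm(e):
    if e[0] in ('CIF', 'VIF'):
        return e
    return normif(e[1], norm(e[2]), norm(e[3]))

def normal(e):
    """No IF may have another IF as its first argument."""
    if e[0] in ('CIF', 'VIF'):
        return True
    return e[1][0] != 'IF' and normal(e[2]) and normal(e[3])

nested = IF(IF(VIF(0), CIF(True), VIF(1)), VIF(2), CIF(False))
env = lambda n: n % 2 == 0
assert normal(norm(nested))                          # result is in normal form
assert valif(norm(nested), env) == valif(nested, env)  # value is preserved
```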

Their interplay is a bit tricky, and we leave it to the reader to develop an
intuitive understanding. Fortunately, Isabelle can help us to verify that the
transformation preserves the value of the expression:
\begin{ttbox}
\input{Ifexpr/norm.ML}\end{ttbox}
The proof is canonical, provided we first show the following lemma (which
also helps to understand what \texttt{normif} does) and make it available
for simplification via \texttt{Addsimps}:
\begin{ttbox}
\input{Ifexpr/normif.ML}\end{ttbox}

But how can we be sure that \texttt{norm} really produces a normal form in
the above sense? We have to prove
\begin{ttbox}
\input{Ifexpr/normal_norm.ML}\end{ttbox}
where \texttt{normal} expresses that an If-expression is in normal form:
\begin{ttbox}
\input{Ifexpr/normal}\end{ttbox}
Of course, this requires a lemma about normality of \texttt{normif}
\begin{ttbox}
\input{Ifexpr/normal_normif.ML}\end{ttbox}
that has to be made available for simplification via \texttt{Addsimps}.

How does one come up with the required lemmas? Try to prove the main theorems
without them and study carefully what \texttt{Auto_tac} leaves unproved. That
should provide the clue.
The necessity of universal quantification (\texttt{!t e}) in the two lemmas
is explained in \S\ref{sec:InductionHeuristics}.

\begin{exercise}
  We strengthen the definition of a {\em normal\/} If-expression as follows:
  the first argument of all \texttt{IF}s must be a variable. Adapt the above
  development to this changed requirement. (Hint: you may need to formulate
  some of the goals as implications (\texttt{-->}) rather than equalities
  (\texttt{=}).)
\end{exercise}


\section{Some basic types}

\subsection{Natural numbers}

The type \ttindexbold{nat} of natural numbers is predefined and behaves like
\begin{ttbox}
datatype nat = 0 | Suc nat
\end{ttbox}
In particular, there are \texttt{case}-expressions, for example
\begin{ttbox}
case n of 0 => 0 | Suc m => m
\end{ttbox}
primitive recursion, for example
\begin{ttbox}
\input{Misc/natsum}\end{ttbox}
and induction, for example
\begin{ttbox}
\input{Misc/NatSum.ML}\ttbreak
{\out sum n + sum n = n * Suc n}
{\out No subgoals!}
\end{ttbox}
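The statement proved by induction above can also be spot-checked on small
instances. The sketch below assumes, as the name and the theorem suggest,
that \texttt{sum~n} adds up the numbers $0$ to $n$; the actual primrec
clauses live in the \texttt{\input}'d file.

```python
# Spot-check of the induction statement  sum n + sum n = n * Suc n
# (assuming sum n = 0 + 1 + ... + n; Suc n corresponds to n + 1).

def nat_sum(n):
    # sketch of:  sum 0 = 0;  sum (Suc m) = Suc m + sum m
    return 0 if n == 0 else n + nat_sum(n - 1)

for n in range(100):
    assert nat_sum(n) + nat_sum(n) == n * (n + 1)
print("holds for all n < 100")
```

A finite check is no proof, of course; that is what the induction is for.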


The usual arithmetic operations \ttindexbold{+}, \ttindexbold{-},
\ttindexbold{*}, \ttindexbold{div} and \ttindexbold{mod} are predefined, as
are the relations \ttindexbold{<=} and \ttindexbold{<}. There is even a least
number operation \ttindexbold{LEAST}. For example, \texttt{(LEAST n.$\,$1 <
  n) = 2} (HOL does not prove this completely automatically).

\begin{warn}
  The operations \ttindexbold{+}, \ttindexbold{-}, \ttindexbold{*} are
  overloaded, i.e.\ they are available not just for natural numbers but at
  other types as well (see \S\ref{sec:TypeClasses}). For example, given
  the goal \texttt{x+y = y+x}, there is nothing to indicate that you are
  talking about natural numbers. Hence Isabelle can only infer that
  \texttt{x} and \texttt{y} are of some arbitrary type where \texttt{+} is
  declared. As a consequence, you will be unable to prove the goal (although
  it may take you some time to realize what has happened if
  \texttt{show_types} is not set).  In this particular example, you need to
  include an explicit type constraint, for example \texttt{x+y = y+(x::nat)}.
  If there is enough contextual information this may not be necessary:
  \texttt{x+0 = x} automatically implies \texttt{x::nat}.
\end{warn}


\subsection{Products}

HOL also has pairs: \texttt{($a@1$,$a@2$)} is of type \texttt{$\tau@1$ *
$\tau@2$} provided each $a@i$ is of type $\tau@i$. The components of a pair
are extracted by \texttt{fst} and \texttt{snd}:
\texttt{fst($x$,$y$) = $x$} and \texttt{snd($x$,$y$) = $y$}. Tuples
are simulated by pairs nested to the right:
\texttt{($a@1$,$a@2$,$a@3$)} and \texttt{$\tau@1$ * $\tau@2$ * $\tau@3$}
stand for \texttt{($a@1$,($a@2$,$a@3$))} and \texttt{$\tau@1$ * ($\tau@2$ *
$\tau@3$)}. Therefore \texttt{fst(snd($a@1$,$a@2$,$a@3$)) = $a@2$}.
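The nesting convention is easy to experiment with; in this sketch a HOL
triple becomes a literally nested pair, and \texttt{fst}/\texttt{snd} are
plain projections (illustration only, not HOL's definitions).

```python
# Tuples as pairs nested to the right: (a1, a2, a3) stands for (a1, (a2, a3)).

def fst(p): return p[0]
def snd(p): return p[1]

triple = (1, (2, 3))          # the HOL tuple (a1, a2, a3)
assert fst(snd(triple)) == 2  # fst(snd(a1, a2, a3)) = a2
assert snd(snd(triple)) == 3  # snd(snd(a1, a2, a3)) = a3
```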

It is possible to use (nested) tuples as patterns in abstractions, for
example \texttt{\%(x,y,z).x+y+z} and \texttt{\%((x,y),z).x+y+z}.

In addition to explicit $\lambda$-abstractions, tuple patterns can be used in
most variable binding constructs. Typical examples are
\begin{ttbox}
let (x,y) = f z in (y,x)

case xs of [] => 0 | (x,y)\#zs => x+y
\end{ttbox}
Further important examples are quantifiers and sets.

\begin{warn}
Abstraction over pairs and tuples is merely a convenient shorthand for a more
complex internal representation.  Thus the internal and external form of a
term may differ, which can affect proofs. If you want to avoid this
complication, use \texttt{fst} and \texttt{snd}, i.e.\ write
\texttt{\%p.~fst p + snd p} instead of \texttt{\%(x,y).~x + y}.
See~\S\ref{} for theorem proving with tuple patterns.
\end{warn}


\section{Definitions}
\label{sec:Definitions}

A definition is simply an abbreviation, i.e.\ a new name for an existing
construction. In particular, definitions cannot be recursive. Isabelle offers
definitions on the level of types and terms. Those on the type level are
called type synonyms, those on the term level are called (constant)
definitions.


\subsection{Type synonyms}
\indexbold{type synonyms}

Type synonyms are similar to those found in ML. Their syntax is fairly
self-explanatory:
\begin{ttbox}
\input{Misc/types}\end{ttbox}\indexbold{*types}
The synonym \texttt{alist} shows that in general the type on the right-hand
side needs to be enclosed in double quotation marks
(see the end of~\S\ref{sec:intro-theory}).

Internally all synonyms are fully expanded.  As a consequence Isabelle's
output never contains synonyms.  Their main purpose is to improve the
readability of theory definitions.  Synonyms can be used just like any other
type:
\begin{ttbox}
\input{Misc/consts}\end{ttbox}

\subsection{Constant definitions}
\label{sec:ConstDefinitions}

The above constants \texttt{nand} and \texttt{exor} are non-recursive and can
therefore be defined directly by
\begin{ttbox}
\input{Misc/defs}\end{ttbox}\indexbold{*defs}
where \texttt{defs} is a keyword and \texttt{nand_def} and \texttt{exor_def}
are arbitrary user-supplied names.
The symbol \texttt{==}\index{==>@{\tt==}|bold} is a special form of equality
that should only be used in constant definitions.
Declarations and definitions can also be merged
\begin{ttbox}
\input{Misc/constdefs}\end{ttbox}\indexbold{*constdefs}
in which case the default name of each definition is $f$\texttt{_def}, where
$f$ is the name of the defined constant.

Note that pattern-matching is not allowed, i.e.\ each definition must be of
the form $f\,x@1\,\dots\,x@n$\texttt{~==~}$t$.

Section~\ref{sec:Simplification} explains how definitions are used
in proofs.

\begin{warn}
A common mistake when writing definitions is to introduce extra free variables
on the right-hand side as in the following fictitious definition:
\begin{ttbox}
defs  prime_def "prime(p) == (m divides p) --> (m=1 | m=p)"
\end{ttbox}
Isabelle rejects this `definition' because of the extra {\tt m} on the
right-hand side, which would introduce an inconsistency.  What you should have
written is
\begin{ttbox}
defs  prime_def "prime(p) == !m. (m divides p) --> (m=1 | m=p)"
\end{ttbox}
\end{warn}



\chapter{More Functional Programming}

The purpose of this chapter is to deepen the reader's understanding of the
concepts encountered so far and to introduce an advanced method for defining
recursive functions. The first two sections give a structured presentation of
theorem proving by simplification (\S\ref{sec:Simplification}) and
discuss important heuristics for induction (\S\ref{sec:InductionHeuristics}). They
can be skipped by readers less interested in proofs. They are followed by a
case study, a compiler for expressions (\S\ref{sec:ExprCompiler}).
Finally we present a very general method for defining recursive functions
that goes well beyond what \texttt{primrec} allows (\S\ref{sec:recdef}).


\section{Simplification}
\label{sec:Simplification}

So far we have proved our theorems by \texttt{Auto_tac}, which
`simplifies' all subgoals. In fact, \texttt{Auto_tac} can do much more than
that; it simply has not needed to so far. However, when you go beyond toy
examples, you need to understand the ingredients of \texttt{Auto_tac}.
This section covers the tactic that \texttt{Auto_tac} always applies first,
namely simplification.

Simplification is one of the central theorem proving tools in Isabelle and
many other systems. The tool itself is called the \bfindex{simplifier}. The
purpose of this section is twofold: to introduce the many features of the
simplifier (\S\ref{sec:SimpFeatures}) and to explain a little bit how the
simplifier works (\S\ref{sec:SimpHow}).  Anybody intending to use HOL should
read \S\ref{sec:SimpFeatures}, and the serious student should read
\S\ref{sec:SimpHow} as well in order to understand what is happening when
things do not simplify as expected.


\subsection{Using the simplifier}
\label{sec:SimpFeatures}

In its most basic form, simplification means repeated application of
equations from left to right. For example, taking the rules for \texttt{\at}
and applying them to the term \texttt{[0,1] \at\ []} results in a sequence of
simplification steps:
\begin{ttbox}\makeatother
(0#1#[]) @ []  \(\leadsto\)  0#((1#[]) @ [])  \(\leadsto\)  0#(1#([] @ []))  \(\leadsto\)  0#1#[]
\end{ttbox}
This is also known as {\em term rewriting} and the equations are referred
to as {\em rewrite rules}. This is more honest than `simplification' because
the terms do not necessarily become simpler in the process.
   771
nipkow@5375
   772
\subsubsection{Simpsets}
nipkow@5375
   773
nipkow@5375
   774
To facilitate simplification, each theory has an associated set of
nipkow@5375
   775
simplification rules, known as a \bfindex{simpset}. Within a theory,
nipkow@5375
   776
proofs by simplification refer to the associated simpset by default.
nipkow@5375
   777
The simpset of a theory is built up as follows: starting with the union of
nipkow@5375
   778
the simpsets of the parent theories, each occurrence of a \texttt{datatype}
nipkow@5375
   779
or \texttt{primrec} construct augments the simpset. Explicit definitions are
nipkow@5375
   780
not added automatically. Users can add new theorems via \texttt{Addsimps} and
nipkow@5375
   781
delete them again later by \texttt{Delsimps}.
nipkow@5375
   782
nipkow@5375
   783
You may augment a simpset not just by equations but by pretty much any
nipkow@5375
   784
theorem. The simplifier will try to make sense of it.  For example, a theorem
nipkow@5375
   785
\verb$~$$P$ is automatically turned into \texttt{$P$ = False}. The details are
nipkow@5375
   786
explained in \S\ref{sec:SimpHow}.
nipkow@5375
   787
nipkow@5375
   788
As a rule of thumb, rewrite rules that really simplify a term (like
nipkow@5375
   789
\texttt{xs \at\ [] = xs} and \texttt{rev(rev xs) = xs}) should be added to the
nipkow@5375
   790
current simpset right after they have been proved.  Those of a more specific
nipkow@5375
   791
nature (e.g.\ distributivity laws, which alter the structure of terms
nipkow@5375
   792
considerably) should only be added for specific proofs and deleted again
nipkow@5375
   793
afterwards.  Conversely, it may also happen that a generally useful rule
nipkow@5375
   794
needs to be removed for a certain proof and is added again afterwards.  The
nipkow@5375
   795
need of frequent temporary additions or deletions may indicate a badly
nipkow@5375
   796
designed simpset.
nipkow@5375
   797
\begin{warn}
nipkow@5375
   798
  Simplification may not terminate, for example if both $f(x) = g(x)$ and
nipkow@5375
   799
  $g(x) = f(x)$ are in the simpset. It is the user's responsibility not to
nipkow@5375
   800
  include rules that can lead to nontermination, either on their own or in
nipkow@5375
   801
  combination with other rules.
nipkow@5375
   802
\end{warn}
nipkow@5375
   803
nipkow@5375
   804
\subsubsection{Simplification tactics}
nipkow@5375
   805
nipkow@5375
   806
There are four main simplification tactics:
nipkow@5375
   807
\begin{ttdescription}
nipkow@5375
   808
\item[\ttindexbold{Simp_tac} $i$] simplifies the conclusion of subgoal~$i$
nipkow@5375
   809
  using the theory's simpset.  It may solve the subgoal completely if it has
nipkow@5375
   810
  become trivial. For example:
nipkow@5375
   811
\begin{ttbox}\makeatother
nipkow@5375
   812
{\out 1. [] @ [] = []}
nipkow@5375
   813
by(Simp_tac 1);
nipkow@5375
   814
{\out No subgoals!}
nipkow@5375
   815
\end{ttbox}
nipkow@5375
   816
nipkow@5375
   817
\item[\ttindexbold{Asm_simp_tac}]
nipkow@5375
   818
  is like \verb$Simp_tac$, but extracts additional rewrite rules from
nipkow@5375
   819
  the assumptions of the subgoal. For example, it solves
nipkow@5375
   820
\begin{ttbox}\makeatother
nipkow@5375
   821
{\out 1. xs = [] ==> xs @ xs = xs}
nipkow@5375
   822
\end{ttbox}
nipkow@5375
   823
which \texttt{Simp_tac} does not do.
nipkow@5375
   824
  
nipkow@5375
   825
\item[\ttindexbold{Full_simp_tac}] is like \verb$Simp_tac$, but also
nipkow@5375
   826
  simplifies the assumptions (without using the assumptions to
nipkow@5375
   827
  simplify each other or the actual goal).
nipkow@5375
   828
nipkow@5375
   829
\item[\ttindexbold{Asm_full_simp_tac}] is like \verb$Asm_simp_tac$,
nipkow@5375
   830
  but also simplifies the assumptions. In particular, assumptions can
nipkow@5375
   831
  simplify each other. For example:
nipkow@5375
   832
\begin{ttbox}\makeatother
nipkow@5375
   833
\out{ 1. [| xs @ zs = ys @ xs; [] @ xs = [] @ [] |] ==> ys = zs}
nipkow@5375
   834
by(Asm_full_simp_tac 1);
nipkow@5375
   835
{\out No subgoals!}
nipkow@5375
   836
\end{ttbox}
nipkow@5375
   837
The second assumption simplifies to \texttt{xs = []}, which in turn
nipkow@5375
   838
simplifies the first assumption to \texttt{zs = ys}, thus reducing the
nipkow@5375
   839
conclusion to \texttt{ys = ys} and hence to \texttt{True}.
nipkow@5375
   840
(See also the paragraph on tracing below.)
nipkow@5375
   841
\end{ttdescription}
\texttt{Asm_full_simp_tac} is the most powerful of this quartet of
tactics. In fact, \texttt{Auto_tac} starts by applying
\texttt{Asm_full_simp_tac} to all subgoals. The only reason for the existence
of the other three tactics is that sometimes one wants to limit the amount of
simplification, for example to avoid nontermination:
\begin{ttbox}\makeatother
{\out  1. ! x. f x = g (f (g x)) ==> f [] = f [] @ []}
\end{ttbox}
is solved by \texttt{Simp_tac}, but \texttt{Asm_simp_tac} and
\texttt{Asm_full_simp_tac} loop because the rewrite rule \texttt{f x = g(f(g
x))} extracted from the assumption does not terminate.  Isabelle notices
certain simple forms of nontermination, but not this one.

\subsubsection{Modifying simpsets locally}

If a certain theorem is merely needed in one proof by simplification, the
pattern
\begin{ttbox}
Addsimps [\(rare_theorem\)];
by(Simp_tac 1);
Delsimps [\(rare_theorem\)];
\end{ttbox}
is awkward. Therefore there are lower-case versions of the simplification
tactics (\ttindexbold{simp_tac}, \ttindexbold{asm_simp_tac},
\ttindexbold{full_simp_tac}, \ttindexbold{asm_full_simp_tac}) and of the
simpset modifiers (\ttindexbold{addsimps}, \ttindexbold{delsimps})
that do not access or modify the implicit simpset but explicitly take a
simpset as an argument. For example, the above three lines become
\begin{ttbox}
by(simp_tac (simpset() addsimps [\(rare_theorem\)]) 1);
\end{ttbox}
where the result of the function call \texttt{simpset()} is the simpset of
the current theory and \texttt{addsimps} is an infix function. The implicit
simpset is read once but not modified.
This is far preferable to pairs of \texttt{Addsimps} and \texttt{Delsimps}.
Local modifications can be stacked as in
\begin{ttbox}
by(simp_tac (simpset() addsimps [\(rare_theorem\)] delsimps [\(some_thm\)]) 1);
\end{ttbox}

\subsubsection{Rewriting with definitions}

Constant definitions (\S\ref{sec:ConstDefinitions}) are not automatically
included in the simpset of a theory. Hence such definitions are not expanded
automatically either, just as it should be: definitions are introduced for
the purpose of abbreviating complex concepts. Of course we need to expand the
definitions initially to derive enough lemmas that characterize the concept
sufficiently for us to forget the original definition completely. For
example, given the theory
\begin{ttbox}
\input{Misc/Exor.thy}\end{ttbox}
we may want to prove \verb$exor A (~A)$. Instead of \texttt{Goal} we use
\begin{ttbox}
\input{Misc/exorgoal.ML}\end{ttbox}
which tells Isabelle to expand the definition of \texttt{exor}---the first
argument of \texttt{Goalw} can be a list of definitions---in the initial goal:
\begin{ttbox}
{\out exor A (~ A)}
{\out  1. A & ~ ~ A | ~ A & ~ A}
\end{ttbox}
In this simple example, the goal is proved by \texttt{Simp_tac}.
Of course the resulting theorem is insufficient to characterize \texttt{exor}
completely.

In case we want to expand a definition in the middle of a proof, we can
simply add the definition locally to the simpset:
\begin{ttbox}
\input{Misc/exorproof.ML}\end{ttbox}
You should normally not add the definition permanently to the simpset
using \texttt{Addsimps} because this defeats the whole purpose of an
abbreviation.

\begin{warn}
If you have defined $f\,x\,y$\texttt{~==~}$t$ then you can only expand
occurrences of $f$ with at least two arguments. Thus it is safer to define
$f$\texttt{~==~\%$x\,y$.}$\;t$.
\end{warn}

\subsubsection{Simplifying \texttt{let}-expressions}

Proving a goal containing \ttindex{let}-expressions invariably requires the
\texttt{let}-constructs to be expanded at some point. Since
\texttt{let}-\texttt{in} is just syntactic sugar for a defined constant
(called \texttt{Let}), expanding \texttt{let}-constructs means rewriting with
\texttt{Let_def}:
%context List.thy;
%Goal "(let xs = [] in xs@xs) = ys";
\begin{ttbox}\makeatother
{\out  1. (let xs = [] in xs @ xs) = ys}
by(simp_tac (simpset() addsimps [Let_def]) 1);
{\out  1. [] = ys}
\end{ttbox}
If, in a particular context, there is no danger of a combinatorial explosion
of nested \texttt{let}s, one could even add \texttt{Let_def} permanently via
\texttt{Addsimps}.

\subsubsection{Conditional equations}

So far all examples of rewrite rules were equations. The simplifier also
accepts {\em conditional\/} equations, for example
\begin{ttbox}
xs ~= []  ==>  hd xs # tl xs = xs \hfill \((*)\)
\end{ttbox}
(which is proved by \texttt{exhaust_tac} on \texttt{xs} followed by
\texttt{Asm_full_simp_tac} twice). Assuming that this theorem together with
%\begin{ttbox}\makeatother
\texttt{(rev xs = []) = (xs = [])}
%\end{ttbox}
are part of the simpset, the subgoal
\begin{ttbox}\makeatother
{\out 1. xs ~= [] ==> hd(rev xs) # tl(rev xs) = rev xs}
\end{ttbox}
is proved by simplification:
the conditional equation $(*)$ above
can simplify \texttt{hd(rev~xs)~\#~tl(rev~xs)} to \texttt{rev xs}
because the corresponding precondition \verb$rev xs ~= []$ simplifies to
\verb$xs ~= []$, which is exactly the local assumption of the subgoal.


\subsubsection{Automatic case splits}

Goals containing \ttindex{if}-expressions are usually proved by case
distinction on the condition of the \texttt{if}. For example, the goal
\begin{ttbox}
{\out 1. ! xs. if xs = [] then rev xs = [] else rev xs ~= []}
\end{ttbox}
can be split into
\begin{ttbox}
{\out 1. ! xs. (xs = [] --> rev xs = []) \& (xs ~= [] --> rev xs ~= [])}
\end{ttbox}
by typing
\begin{ttbox}
\input{Misc/splitif.ML}\end{ttbox}
Because this is almost always the right proof strategy, the simplifier
performs case-splitting on \texttt{if}s automatically. Try \texttt{Simp_tac}
on the initial goal above.
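The split itself is a propositional equivalence and can be checked by brute
force: a goal $P(\textit{if b then x else y})$ is equivalent to
$(b \rightarrow P\,x) \land (\neg b \rightarrow P\,y)$. The sketch below
verifies this schema over all truth-value combinations (illustration only,
with $P\,x$ and $P\,y$ abstracted to arbitrary booleans).

```python
# Truth-table check of the if-split schema:
#   P(if b then x else y)  <->  (b --> P x) & (~b --> P y)
from itertools import product

def implies(a, b):
    return (not a) or b

for b, px, py in product([False, True], repeat=3):
    lhs = px if b else py                        # P applied to the if-expression
    rhs = implies(b, px) and implies(not b, py)  # the split form
    assert lhs == rhs

print("split_if schema holds for all cases")
```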

This splitting idea generalizes from \texttt{if} to \ttindex{case}:
\begin{ttbox}\makeatother
{\out 1. (case xs of [] => zs | y#ys => y#(ys@zs)) = xs@zs}
\end{ttbox}
becomes
\begin{ttbox}\makeatother
{\out 1. (xs = [] --> zs = xs @ zs) &}
{\out    (! a list. xs = a # list --> a # list @ zs = xs @ zs)}
\end{ttbox}
by typing
\begin{ttbox}
\input{Misc/splitlist.ML}\end{ttbox}
In contrast to \texttt{if}-expressions, the simplifier does not split
\texttt{case}-expressions by default because this can lead to nontermination
in the case of recursive datatypes.
Nevertheless the simplifier can be instructed to perform \texttt{case}-splits
by adding the appropriate rule to the simpset:
\begin{ttbox}
by(simp_tac (simpset() addsplits [split_list_case]) 1);
\end{ttbox}\indexbold{*addsplits}
solves the initial goal outright, which \texttt{Simp_tac} alone will not do.

In general, every datatype $t$ comes with a rule
\texttt{$t$.split} that can be added to the simpset either
locally via \texttt{addsplits} (see above), or permanently via
\begin{ttbox}
Addsplits [\(t\).split];
\end{ttbox}\indexbold{*Addsplits}
Split-rules can be removed globally via \ttindexbold{Delsplits} and locally
via \ttindexbold{delsplits} as, for example, in
\begin{ttbox}
by(simp_tac (simpset() addsimps [\(\dots\)] delsplits [split_if]) 1);
\end{ttbox}


\subsubsection{Permutative rewrite rules}

A rewrite rule is {\bf permutative} if the left-hand side and right-hand side
are the same up to renaming of variables.  The most common permutative rule
is commutativity: $x+y = y+x$.  Another example is $(x-y)-z = (x-z)-y$.  Such
rules are problematic because once they apply, they can be used forever.
The simplifier is aware of this danger and treats permutative rules
separately. For details see~\cite{Isa-Ref-Man}.
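The standard way to tame a permutative rule is ordered rewriting: apply
$x+y = y+x$ only when the result is smaller in some fixed term order, so the
rule can fire at most finitely often. The sketch below illustrates this idea
with string comparison as the (arbitrary) term order; it is a toy model of
the technique, not Isabelle's actual implementation.

```python
# Ordered rewriting for the permutative rule  x + y ~> y + x:
# the rule only fires when it decreases the term in a fixed total order
# (here: comparison of string representations), so it cannot loop.

def commute_ordered(t):
    """Swap the arguments of '+' only if the swap is smaller in the order."""
    if isinstance(t, tuple) and t[0] == '+':
        x, y = commute_ordered(t[1]), commute_ordered(t[2])
        return ('+', y, x) if str(y) < str(x) else ('+', x, y)
    return t

t = ('+', 'y', 'x')
t2 = commute_ordered(t)            # fires once: 'x' precedes 'y' in the order
assert t2 == ('+', 'x', 'y')
assert commute_ordered(t2) == t2   # a fixpoint: the rule never fires again
```

An unrestricted swap, by contrast, would flip the arguments back and forth
forever, which is exactly the "can be used forever" problem described above.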

\subsubsection{Tracing}
\indexbold{tracing the simplifier}

Using the simplifier effectively may take a bit of experimentation.  Set the
\verb$trace_simp$ flag to get a better idea of what is going on:
\begin{ttbox}\makeatother
{\out  1. rev [x] = []}
\ttbreak
set trace_simp;
by(Simp_tac 1);
\ttbreak\makeatother
{\out Applying instance of rewrite rule:}
{\out rev (?x # ?xs) == rev ?xs @ [?x]}
{\out Rewriting:}
{\out rev [x] == rev [] @ [x]}
\ttbreak
{\out Applying instance of rewrite rule:}
{\out rev [] == []}
{\out Rewriting:}
{\out rev [] == []}
\ttbreak\makeatother
{\out Applying instance of rewrite rule:}
{\out [] @ ?y == ?y}
{\out Rewriting:}
{\out [] @ [x] == [x]}
\ttbreak
{\out Applying instance of rewrite rule:}
{\out ?x # ?t = ?t == False}
{\out Rewriting:}
{\out [x] = [] == False}
\ttbreak
{\out Level 1}
{\out rev [x] = []}
{\out  1. False}
\end{ttbox}
In more complicated cases, the trace can be enormous, especially since
invocations of the simplifier are often nested (e.g.\ when solving conditions
of rewrite rules).

\subsection{How it works}
\label{sec:SimpHow}

\subsubsection{Higher-order patterns}

\subsubsection{Local assumptions}

\subsubsection{The preprocessor}

\section{Induction heuristics}
\label{sec:InductionHeuristics}

The purpose of this section is to illustrate some simple heuristics for
inductive proofs. The first one we have already mentioned in our initial
example:
\begin{quote}
{\em 1. Theorems about recursive functions are proved by induction.}
\end{quote}
In case the function has more than one argument
\begin{quote}
{\em 2. Do induction on argument number $i$ if the function is defined by
recursion in argument number $i$.}
\end{quote}
When we look at the proof of
\begin{ttbox}\makeatother
(xs @ ys) @ zs = xs @ (ys @ zs)
\end{ttbox}
in \S\ref{sec:intro-proof} we find (a) \texttt{\at} is recursive in the first
argument, (b) \texttt{xs} occurs only as the first argument of \texttt{\at},
and (c) both \texttt{ys} and \texttt{zs} occur at least once as the second
argument of \texttt{\at}. Hence it is natural to perform induction on
\texttt{xs}.

The key heuristic, and the main point of this section, is to
generalize the goal before induction. The reason is simple: if the goal is
too specific, the induction hypothesis is too weak to allow the induction
step to go through. Let us now illustrate the idea with an example.

We define a tail-recursive version of list-reversal,
i.e.\ one that can be compiled into a loop:
\begin{ttbox}
\input{Misc/Itrev.thy}\end{ttbox}
The behaviour of \texttt{itrev} is simple: it reverses its first argument by
stacking its elements onto the second argument, and returning that second
argument when the first one becomes empty.
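For readers who prefer to experiment, this behaviour can be mirrored in an executable language. The following Python sketch (the names merely echo the theory; the encoding is my own) shows the stacking behaviour and that \texttt{itrev xs []} reverses \texttt{xs}:

```python
def itrev(xs, ys):
    """Tail-recursive reversal: itrev [] ys = ys,
    itrev (x#xs) ys = itrev xs (x#ys) -- written as the loop it
    compiles to."""
    while xs:
        x, xs = xs[0], xs[1:]
        ys = [x] + ys          # stack x onto the second argument
    return ys

def rev(xs):
    """Naive reversal, as in the theory: rev (x#xs) = rev xs @ [x]."""
    if not xs:
        return []
    return rev(xs[1:]) + [xs[0]]
```

With an empty second argument, \texttt{itrev([1,2,3], [])} yields \texttt{[3,2,1]}, the same result as \texttt{rev}.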
We need to show that \texttt{itrev} does indeed reverse its first argument
provided the second one is empty:
\begin{ttbox}
\input{Misc/itrev1.ML}\end{ttbox}
There is no choice as to the induction variable, and we immediately simplify:
\begin{ttbox}
\input{Misc/induct_auto.ML}\ttbreak\makeatother
{\out1. !!a list. itrev list [] = rev list\(\;\)==> itrev list [a] = rev list @ [a]}
\end{ttbox}
Just as predicted above, the overall goal, and hence the induction
hypothesis, is too weak to solve the induction step because of the fixed
\texttt{[]}. The corresponding heuristic:
\begin{quote}
{\em 3. Generalize goals for induction by replacing constants by variables.}
\end{quote}
Of course one cannot do this na\"{\i}vely: \texttt{itrev xs ys = rev xs} is
just not true --- the correct generalization is
\begin{ttbox}\makeatother
\input{Misc/itrev2.ML}\end{ttbox}
If \texttt{ys} is replaced by \texttt{[]}, the right-hand side simplifies to
\texttt{rev xs}, just as required.

In this particular instance it is easy to guess the right generalization,
but in more complex situations a good deal of creativity is needed. This is
the main source of complications in inductive proofs.

Although we now have two variables, only \texttt{xs} is suitable for
induction, and we repeat our above proof attempt. Unfortunately, we are still
not there:
\begin{ttbox}\makeatother
{\out 1. !!a list.}
{\out       itrev list ys = rev list @ ys}
{\out       ==> itrev list (a # ys) = rev list @ a # ys}
\end{ttbox}
The induction hypothesis is still too weak, but this time it takes no
intuition to generalize: the problem is that \texttt{ys} is fixed throughout
the subgoal, but the induction hypothesis needs to be applied with
\texttt{a \# ys} instead of \texttt{ys}. Hence we prove the theorem
for all \texttt{ys} instead of a fixed one:
\begin{ttbox}\makeatother
\input{Misc/itrev3.ML}\end{ttbox}
This time induction on \texttt{xs} followed by simplification succeeds. This
leads to another heuristic for generalization:
\begin{quote}
{\em 4. Generalize goals for induction by universally quantifying all free
variables {\em(except the induction variable itself!)}.}
\end{quote}
This prevents trivial failures like the above and does not change the
provability of the goal. Because it is not always required, and may even
complicate matters in some cases, this heuristic is often not
applied blindly.

A final point worth mentioning is the orientation of the equation we just
proved: the more complex notion (\texttt{itrev}) is on the left-hand
side, the simpler \texttt{rev} on the right-hand side. This constitutes
another, albeit weak, heuristic that is not restricted to induction:
\begin{quote}
  {\em 5. The right-hand side of an equation should (in some sense) be
    simpler than the left-hand side.}
\end{quote}
The heuristic is tricky to apply because it is not obvious that
\texttt{rev xs \at\ ys} is simpler than \texttt{itrev xs ys}. But see what
happens if you try to prove the symmetric equation!

\section{Case study: compiling expressions}
\label{sec:ExprCompiler}

The task is to develop a compiler from a generic type of expressions (built
up from variables, constants and binary operations) to a stack machine.  This
generic type of expressions is a generalization of the boolean expressions in
\S\ref{sec:boolex}.  This time we do not commit ourselves to a particular
type of variables or values but make them type parameters.  Neither is there
a fixed set of binary operations: instead the expression contains the
appropriate function itself.
\begin{ttbox}
\input{CodeGen/expr}\end{ttbox}
The three constructors represent constants, variables and the combination of
two subexpressions with a binary operation.

The value of an expression w.r.t.\ an environment that maps variables to
values is easily defined:
\begin{ttbox}
\input{CodeGen/value}\end{ttbox}

The stack machine has three instructions: load a constant value onto the
stack, load the contents of a certain address onto the stack, and apply a
binary operation to the two topmost elements of the stack, replacing them by
the result. As for \texttt{expr}, addresses and values are type parameters:
\begin{ttbox}
\input{CodeGen/instr}\end{ttbox}

The execution of the stack machine is modelled by a function \texttt{exec}
that takes a store (modelled as a function from addresses to values, just
like the environment for evaluating expressions), a stack (modelled as a
list) of values and a list of instructions and returns the stack at the end
of the execution --- the store remains unchanged:
\begin{ttbox}
\input{CodeGen/exec}\end{ttbox}
Recall that \texttt{hd} and \texttt{tl}
return the first element and the remainder of a list.

Because all functions are total, \texttt{hd} is defined even for the empty
list, although we do not know what the result is. Thus our model of the
machine always terminates properly, although the above definition does not
tell us much about the result in situations where \texttt{Apply} was executed
with fewer than two elements on the stack.
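As an illustration, here is one possible Python rendering of the machine (the tagged-tuple encoding of instructions is my own, not the theory's). Note that where HOL's total \texttt{hd} merely leaves the result unspecified, Python raises an error when \texttt{Apply} meets a stack with fewer than two elements:

```python
# Hypothetical rendering of the stack machine. Instructions are tagged
# tuples: ("Const", v), ("Load", a), ("Apply", f).
def exec_instrs(s, vs, instrs):
    """s: store (address -> value), vs: stack (list, top at the head),
    instrs: instruction list. Returns the final stack; s is unchanged."""
    for instr in instrs:
        tag = instr[0]
        if tag == "Const":
            vs = [instr[1]] + vs             # push a constant
        elif tag == "Load":
            vs = [s(instr[1])] + vs          # push the store contents
        else:  # "Apply": combine the two topmost elements
            vs = [instr[1](vs[0], vs[1])] + vs[2:]
    return vs
```

For example, running \texttt{[Const 2, Load x, Apply +]} in a store where \texttt{x} holds 7 leaves the single value 9 on the stack.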

The compiler is a function from expressions to a list of instructions. Its
definition is pretty much obvious:
\begin{ttbox}\makeatother
\input{CodeGen/comp}\end{ttbox}

Now we have to prove the correctness of the compiler, i.e.\ that the
execution of a compiled expression results in the value of the expression:
\begin{ttbox}
exec s [] (comp e) = [value s e]
\end{ttbox}
This is generalized to
\begin{ttbox}
\input{CodeGen/goal2.ML}\end{ttbox}
and proved by induction on \texttt{e} followed by simplification, once we
have the following lemma about executing the concatenation of two instruction
sequences:
\begin{ttbox}\makeatother
\input{CodeGen/goal2.ML}\end{ttbox}
This requires induction on \texttt{xs} and ordinary simplification for the
base cases. In the induction step, simplification leaves us with a formula
that contains two \texttt{case}-expressions over instructions. Thus we add
automatic case splitting as well, which finishes the proof:
\begin{ttbox}
\input{CodeGen/simpsplit.ML}\end{ttbox}

We could now go back and prove \texttt{exec s [] (comp e) = [value s e]}
merely by simplification with the generalized version we just proved.
However, this is unnecessary because the generalized version fully subsumes
its instance.
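To see the correctness statement in action, here is a self-contained Python sketch of the whole development (all encodings are my own, not the theory's), which can be used to test \texttt{exec s [] (comp e) = [value s e]} on sample expressions:

```python
# Expressions are tagged tuples: ("Cex", v), ("Vex", a),
# ("Bex", f, e1, e2); instructions as in the sketch above.

def value(env, e):
    """Evaluate an expression w.r.t. an environment env."""
    tag = e[0]
    if tag == "Cex":
        return e[1]
    if tag == "Vex":
        return env(e[1])
    f, e1, e2 = e[1], e[2], e[3]
    return f(value(env, e1), value(env, e2))

def exec_instrs(s, vs, instrs):
    """Run the stack machine (top of stack at the head of vs)."""
    for instr in instrs:
        tag = instr[0]
        if tag == "Const":
            vs = [instr[1]] + vs
        elif tag == "Load":
            vs = [s(instr[1])] + vs
        else:  # "Apply"
            vs = [instr[1](vs[0], vs[1])] + vs[2:]
    return vs

def comp(e):
    """Compile an expression to an instruction list."""
    tag = e[0]
    if tag == "Cex":
        return [("Const", e[1])]
    if tag == "Vex":
        return [("Load", e[1])]
    # compile e2 first so that e1's value ends up on top of the stack
    return comp(e[3]) + comp(e[2]) + [("Apply", e[1])]
```

Evaluating \texttt{exec\_instrs(env, [], comp(e))} for a non-commutative operation such as subtraction confirms that the operand order on the stack matches the order used by \texttt{value}.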

\section{Total recursive functions}
\label{sec:recdef}
\index{*recdef|(}

Although many total functions have a natural primitive recursive definition,
this is not always the case. Arbitrary total recursive functions can be
defined by means of \texttt{recdef}: you can use full pattern-matching,
recursion need not involve datatypes, and termination is proved by showing
that each recursive call makes the argument smaller in a suitable (user
supplied) sense.

\subsection{Defining recursive functions}

Here is a simple example, the Fibonacci function:
\begin{ttbox}
consts fib  :: nat => nat
recdef fib "measure(\%n. n)"
    "fib 0 = 0"
    "fib 1 = 1"
    "fib (Suc(Suc x)) = fib x + fib (Suc x)"
\end{ttbox}
The definition of \texttt{fib} is accompanied by a \bfindex{measure function}
\texttt{\%n.$\;$n} that maps the argument of \texttt{fib} to a natural
number. The requirement is that in each equation the measure of the argument
on the left-hand side is strictly greater than the measure of the argument of
each recursive call. In the case of \texttt{fib} this is obviously true
because the measure function is the identity and \texttt{Suc(Suc~x)} is
strictly greater than both \texttt{x} and \texttt{Suc~x}.
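For comparison, the same three equations transcribed into Python (a sketch; the \texttt{Suc(Suc~x)} pattern becomes a test for \texttt{n >= 2}):

```python
def fib(n):
    """fib 0 = 0, fib 1 = 1, fib (Suc(Suc x)) = fib x + fib (Suc x).
    Both recursive calls are on strictly smaller arguments, which is
    exactly what the measure function %n. n certifies."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib(n - 2) + fib(n - 1)
```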

Slightly more interesting is the insertion of a fixed element
between any two elements of a list:
\begin{ttbox}
consts sep :: "'a * 'a list => 'a list"
recdef sep "measure (\%(a,xs). length xs)"
    "sep(a, [])     = []"
    "sep(a, [x])    = [x]"
    "sep(a, x#y#zs) = x # a # sep(a,y#zs)"
\end{ttbox}
This time the measure is the length of the list, which decreases with the
recursive call; the first component of the argument tuple is irrelevant.
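In Python the same function might be sketched as follows (my own transcription; the list patterns become length tests):

```python
def sep(a, xs):
    """sep(a, []) = [], sep(a, [x]) = [x],
    sep(a, x#y#zs) = x # a # sep(a, y#zs).
    The recursive call is on y#zs, a strictly shorter list."""
    if len(xs) <= 1:
        return list(xs)
    return [xs[0], a] + sep(a, xs[1:])
```

For example, \texttt{sep(0, [1,2,3])} yields \texttt{[1,0,2,0,3]}.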

Pattern matching need not be exhaustive:
\begin{ttbox}
consts last :: "'a list => 'a"
recdef last "measure (\%xs. length xs)"
    "last [x]      = x"
    "last (x#y#zs) = last (y#zs)"
\end{ttbox}
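A Python transcription (again my own) makes the non-exhaustiveness tangible; unlike HOL, where \texttt{last~[]} denotes some unspecified value, Python simply raises an error:

```python
def last(xs):
    """last [x] = x, last (x#y#zs) = last (y#zs).
    The pattern match does not cover the empty list."""
    if len(xs) == 1:
        return xs[0]
    if len(xs) >= 2:
        return last(xs[1:])
    raise ValueError("last [] is unspecified")
```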

Overlapping patterns are disambiguated by taking the order of equations into
account, just as in functional programming:
\begin{ttbox}
recdef sep "measure (\%(a,xs). length xs)"
    "sep(a, x#y#zs) = x # a # sep(a,y#zs)"
    "sep(a, xs)     = xs"
\end{ttbox}
This defines exactly the same function \texttt{sep} as further above.

\begin{warn}
Currently \texttt{recdef} only accepts functions with a single argument,
possibly of tuple type.
\end{warn}

When loading a theory containing a \texttt{recdef} of a function $f$,
Isabelle proves the recursion equations and stores the result as a list of
theorems $f$.\texttt{rules}. It can be viewed by typing
\begin{ttbox}
prths \(f\).rules;
\end{ttbox}
All of the above examples are simple enough that Isabelle can determine
automatically that the measure of the argument goes down in each recursive
call. In that case $f$.\texttt{rules} contains precisely the defining
equations.

In general, Isabelle may not be able to prove all termination conditions
automatically. For example, termination of
\begin{ttbox}
consts gcd :: "nat*nat => nat"
recdef gcd "measure ((\%(m,n).n))"
    "gcd (m, n) = (if n=0 then m else gcd(n, m mod n))"
\end{ttbox}
relies on the lemma \texttt{mod_less_divisor}
\begin{ttbox}
0 < n ==> m mod n < n
\end{ttbox}
that is not part of the default simpset. As a result, Isabelle prints a
warning and \texttt{gcd.rules} contains a precondition:
\begin{ttbox}
(! m n. 0 < n --> m mod n < n) ==> gcd (m, n) = (if n=0 \dots)
\end{ttbox}
We need to instruct \texttt{recdef} to use an extended simpset to prove the
termination condition:
\begin{ttbox}
recdef gcd "measure ((\%(m,n).n))"
  simpset "simpset() addsimps [mod_less_divisor]"
    "gcd (m, n) = (if n=0 then m else gcd(n, m mod n))"
\end{ttbox}
This time everything works fine and \texttt{gcd.rules} contains precisely the
stated recursion equation for \texttt{gcd}.
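The recursion equation translates directly into Python (my own sketch); the comment records why \texttt{mod_less_divisor} is exactly the fact needed for termination:

```python
def gcd(m, n):
    """gcd(m, n) = (if n = 0 then m else gcd(n, m mod n)).
    Termination rests on mod_less_divisor, 0 < n ==> m mod n < n:
    the measure %(m, n). n strictly decreases at every recursive call."""
    if n == 0:
        return m
    return gcd(n, m % n)
```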

When defining some nontrivial total recursive function, the first attempt
will usually generate a number of termination conditions, some of which may
require new lemmas to be proved in some of the parent theories. Those lemmas
can then be added to the simpset used by \texttt{recdef} for its
proofs, as shown for \texttt{gcd}.

Although all the above examples employ measure functions, \texttt{recdef}
allows arbitrary wellfounded relations. For example, termination of
Ackermann's function requires the lexicographic product \texttt{**}:
\begin{ttbox}
recdef ack "measure(\%m. m) ** measure(\%n. n)"
    "ack(0,n)         = Suc n"
    "ack(Suc m,0)     = ack(m, 1)"
    "ack(Suc m,Suc n) = ack(m,ack(Suc m,n))"
\end{ttbox}
For details see~\cite{Isa-Logics-Man} and the examples in the library.
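For comparison, here is Ackermann's function in Python (a sketch of my own), with the lexicographic termination argument recorded as a comment:

```python
def ack(m, n):
    """ack(0, n) = Suc n; ack(Suc m, 0) = ack(m, 1);
    ack(Suc m, Suc n) = ack(m, ack(Suc m, n)).
    In every recursive call the pair (m, n) decreases in the
    lexicographic product of two measures: either m decreases, or
    m stays equal and n decreases -- no single measure on (m, n) works."""
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))
```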

\subsection{Deriving simplification rules}

Once we have succeeded in proving all termination conditions, we can start to
derive some theorems. In contrast to \texttt{primrec} definitions, which are
automatically added to the simpset, \texttt{recdef} rules must be included
explicitly, for example as in
\begin{ttbox}
Addsimps fib.rules;
\end{ttbox}
However, some care is necessary now, in contrast to \texttt{primrec}.
Although \texttt{gcd} is a total function, its defining equation leads to
nontermination of the simplifier, because the subterm \texttt{gcd(n, m mod
  n)} on the right-hand side can again be simplified by the same equation,
and so on. The reason: the simplifier rewrites the \texttt{then} and
\texttt{else} branches of a conditional if the condition simplifies to
neither \texttt{True} nor \texttt{False}.  Therefore it is recommended to
derive an alternative formulation that replaces case distinctions on the
right-hand side by conditional equations. For \texttt{gcd} this means we have
to prove
\begin{ttbox}
           gcd (m, 0) = m
n ~= 0 ==> gcd (m, n) = gcd(n, m mod n)
\end{ttbox}
To avoid nontermination during those proofs, we have to resort to some
low-level tactics:
\begin{ttbox}
Goal "gcd(m,0) = m";
by(resolve_tac [trans] 1);
by(resolve_tac gcd.rules 1);
by(Simp_tac 1);
\end{ttbox}
At this point it is not necessary to understand what exactly
\texttt{resolve_tac} is doing. The main point is that the above proof works
not just for this one example but in general (except that we have to use
\texttt{Asm_simp_tac} and $f$\texttt{.rules} in general). Try the second
\texttt{gcd}-equation above!

\subsection{Induction}

Assuming we have added the recursion equations (or some suitable derived
equations) to the simpset, we might like to prove something about our
function. Since the function is recursive, the natural proof principle is
again induction. But this time the structural form of induction that comes
with datatypes is unlikely to work well---otherwise we could have defined the
function by \texttt{primrec}. Therefore \texttt{recdef} automatically proves
a suitable induction rule $f$\texttt{.induct} that follows the recursion
pattern of the particular function $f$. Roughly speaking, it requires you to
prove for each \texttt{recdef} equation that the property you are trying to
establish holds for the left-hand side provided it holds for all recursive
calls on the right-hand side. Applying $f$\texttt{.induct} requires its
explicit instantiation. See \S\ref{sec:explicit-inst} for details.

\index{*recdef|)}