1.1 --- a/doc-src/Sledgehammer/sledgehammer.tex Fri May 20 12:47:59 2011 +0200
1.2 +++ b/doc-src/Sledgehammer/sledgehammer.tex Fri May 20 12:47:59 2011 +0200
1.3 @@ -118,7 +118,7 @@
1.4 \textit{Main}. Examples of Sledgehammer use can be found in Isabelle's
1.5 \texttt{src/HOL/Metis\_Examples} directory.
1.6 Comments and bug reports concerning Sledgehammer or this manual should be
1.7 -directed to \authoremail.
1.8 +directed to the author at \authoremail.
1.9
1.10 \vskip2.5\smallskipamount
1.11
1.12 @@ -283,7 +283,7 @@
1.13 the proof found by E looks perfect, so click it to finish the proof.
1.14
1.15 You can ask Sledgehammer for an Isar text proof by passing the
1.16 -\textit{isar\_proof} option:
1.17 +\textit{isar\_proof} option (\S\ref{output-format}):
1.18
1.19 \prew
1.20 \textbf{sledgehammer} [\textit{isar\_proof}]
1.21 @@ -356,14 +356,14 @@
1.22
1.23 \point{Why does Metis fail to reconstruct the proof?}
1.24
1.25 -There can be many reasons. If Metis runs seemingly forever, that's a sign that
1.26 -the proof is too difficult for it. Metis is complete, so it should eventually
1.27 -find it, but that's little consolation. There are several possible solutions:
1.28 +There are many reasons. If Metis runs seemingly forever, that is a sign that the
1.29 +proof is too difficult for it. Metis is complete, so it should eventually find
1.30 +it, but that's little consolation. There are several possible solutions:
1.31
1.32 \begin{enum}
1.33 -\item[$\bullet$] Try the \textit{isar\_proof} option to obtain a step-by-step
1.34 -Isar proof where each step is justified by Metis. Since the steps are fairly
1.35 -small, Metis is more likely to be able to replay them.
1.36 +\item[$\bullet$] Try the \textit{isar\_proof} option (\S\ref{output-format}) to
1.37 +obtain a step-by-step Isar proof where each step is justified by Metis. Since
1.38 +the steps are fairly small, Metis is more likely to be able to replay them.
1.39
1.40 \item[$\bullet$] Try the \textit{smt} proof method instead of \textit{metis}. It
1.41 is usually stronger, but you need to have Z3 available to replay the proofs,
1.42 @@ -375,75 +375,118 @@
1.43 \textit{elim}{:}, \textit{dest}{:}, or \textit{simp}{:}, as appropriate.
1.44 \end{enum}
1.45
1.46 -% * sometimes Metis runs into some error, e.g. a type error. then it tries
1.47 -% again with metisFT, where FT stands for ``full type information'
1.48 -% * metisFT is much slower, but its proof search is fully typed, and it also
1.49 -% includes more powerful rules such as the axiom ``$x = \mathit{True}
1.50 -% \mathrel{\lor} x = \mathit{False}$'' for reasoning in higher-order places
1.51 -% (e.g., in set comprehensions)
1.52 -%
1.53 -% * finally, in some cases the ATP proof is simply type-incorrect.
1.54 -% Sledgehammer drops some type information to speed up the search. Try
1.55 -% Sledgehammer again with full type information: \textit{full\_types}
1.56 -% (\S\ref{problem-encoding}), or choose a specific type encoding with
1.57 -% \textit{type\_sys} (\S\ref{problem-encoding}). Older versions of
1.58 -% Sledgehammer were frequent victims of this problem. Now this should very
1.59 -% seldom be an issue, but if you notice too many unsound proofs, contact
1.60 -%
1.61 -%\point{How can I easily tell whether a Sledgehammer proof is sound?}
1.62 -%
1.63 -%Easiest way: Once it's found: ... by (metis facts)
1.64 -%try
1.65 -%sledgehammer [full\_types] (facts)
1.66 -%
1.67 -%should usually give unprovable or refind the proof fairly quickly
1.68 -%
1.69 -%Same trick if you believe that there exists a proof with certain facts.
1.70 -%
1.71 -%\point{Which facts does Sledgehammer select?}
1.72 -%
1.73 -% * heuristic
1.74 -% * and several hundreds
1.75 -% * show them: debug
1.76 -% * influence it with sledgehammer (add: xxx)
1.77 -%
1.78 -% * S/h good at finding short proofs combining a handful of existing lemmas
1.79 -% * for deeper proofs, you must restrict the number of facts, e.g.
1.80 -% max\_relevant = 50
1.81 -% * but then proof reconstruction is an issue
1.82 -%
1.83 -%\point{Why are the Isar proofs generated by Sledgehammer so ugly?}
1.84 -%
1.85 -% * experimental
1.86 -% * working on this
1.87 -% * there is a large body of research into transforming resolution proofs into
1.88 -% natural deduction proofs (e.g., Isar proofs)
1.89 -% * meantime: isar\_shrink\_factor
1.90 -%
1.91 -%
1.92 -%\point{Should I let Sledgehammer minimize the number of lemmas?}
1.93 -%
1.94 -% * in general, yes
1.95 -% * proofs involving fewer lemmas tend to be shorter as well, and hence easier
1.96 -% to re-find by Metis
1.97 -% * but the opposite is sometimes the case
1.98 +In some rare cases, Metis fails fairly quickly. This usually indicates that
1.99 +Sledgehammer found a type-incorrect proof. Sledgehammer erases some type
1.100 +information to speed up the search. Try Sledgehammer again with full type
1.101 +information: \textit{full\_types} (\S\ref{problem-encoding}), or choose a
1.102 +specific type encoding with \textit{type\_sys} (\S\ref{problem-encoding}). Older
1.103 +versions of Sledgehammer were frequent victims of this problem. Now this should
1.104 +very seldom be an issue, but if you notice many unsound proofs, contact the
1.105 +author at \authoremail.
1.106
1.107 -% Why is Sledgehammer automatically minimizing sometimes?
1.108 -% * some provers (e.g. CVC3 and Yices)
1.109 -% * also, sometimes E finds a proof but doesn't give a proof object
1.110 +\point{How can I tell whether a Sledgehammer proof is sound?}
1.111 +
1.112 +First, if \emph{metis} (or \emph{metisFT}) can reconstruct it, the proof is
1.113 +sound (modulo soundness of Isabelle's inference kernel). If it fails or runs
1.114 +seemingly forever, you can try
1.115 +
1.116 +\prew
1.117 +\textbf{apply}~\textbf{--} \\
1.118 +\textbf{sledgehammer} [\textit{type\_sys} = \textit{poly\_tags}] (\textit{metis\_facts})
1.119 +\postw
1.120 +
1.121 +where \textit{metis\_facts} is the list of facts appearing in the suggested
1.122 +Metis call. The automatic provers should be able to refind the proof very
1.123 +quickly if it is sound, and the \textit{type\_sys} $=$ \textit{poly\_tags}
1.124 +option (\S\ref{problem-encoding}) ensures that no unsound proofs are found.
1.125 +
1.126 +The \textit{full\_types} option (\S\ref{problem-encoding}) can also be used
1.127 +here, but it is unsound in extremely rare degenerate cases such as the
1.128 +following:
1.129 +
1.130 +\prew
1.131 +\textbf{lemma} ``$\forall x\> y\Colon{'}a.\ x = y \,\Longrightarrow \exists f\> g\Colon\mathit{nat} \Rightarrow {'}a.\ f \not= g$'' \\
1.132 +\textbf{sledgehammer} [\textit{full\_types}] (\textit{nat.distinct\/}(1))
1.133 +\postw
1.134 +
1.135 +\point{How does Sledgehammer select the facts that should be passed to the
1.136 +automatic provers?}
1.137 +
1.138 +Briefly, the relevance filter assigns a score to every available fact (lemma,
1.139 +theorem, definition, or axiom)\ based upon how many constants that fact shares
1.140 +with the conjecture; this process iterates to include facts relevant to those
1.141 +just accepted, but with a decay factor to ensure termination. The constants are
1.142 +weighted to give unusual ones greater significance. The relevance filter copes
1.143 +best when the conjecture contains some unusual constants; if all the constants
1.144 +are common, it is unable to discriminate among the hundreds of facts that are
1.145 +picked up. The relevance filter is also memoryless: It has no information about
1.146 +how many times a particular fact has been used in a proof, and it cannot learn.
1.147 +
1.148 +The number of facts included in a problem varies from prover to prover, since
1.149 +some provers get overwhelmed quicker than others. You can show the number of
1.150 +facts given using the \textit{verbose} option (\S\ref{output-format}) and the
1.151 +actual facts using \textit{debug} (\S\ref{output-format}).
1.152 +
1.153 +Sledgehammer is good at finding short proofs combining a handful of existing
1.154 +lemmas. If you are looking for longer proofs, you must typically restrict the
1.155 +number of facts, by setting \textit{max\_relevant} to, say, 50 or 100.
1.156 +
1.157 +\point{Why are the Isar proofs generated by Sledgehammer so ugly?}
1.158 +
1.159 +The current implementation is experimental and explodes exponentially in the
1.160 +worst case. Work on a new implementation has begun. There is a large body of
1.161 +research into transforming resolution proofs into natural deduction proofs (such
1.162 +as Isar proofs), which we hope to leverage. In the meantime, a workaround is to
1.163 +set the \textit{isar\_shrink\_factor} option (\S\ref{output-format}) to a larger
1.164 +value or to try several provers and keep the nicest-looking proof.
1.165 +
1.166 +\point{Should I let Sledgehammer minimize the number of lemmas?}
1.167 +
1.168 +In general, minimization is a good idea, because proofs involving fewer lemmas
1.169 +tend to be shorter as well, and hence easier to re-find by Metis. But the
1.170 +opposite is sometimes the case.
1.171 +
1.172 +\point{Why does the minimizer sometimes starts of its own?}
1.173 +
1.174 +There are two scenarios in which this can happen. First, some provers (e.g.,
1.175 +CVC3 and Yices) do not provide proofs or provide incomplete proofs. The
1.176 +minimizer is then invoked to find out which facts are actually needed from the
1.177 +(large) set of facts that was initinally given to the prover. Second, if a
1.178 +prover returns a proof with lots of facts, the minimizer is invoked
1.179 +automatically since Metis is unlikely to refind the proof.
1.180 +
1.181 +\point{What is metisFT?}
1.182 +
1.183 +The \textit{metisFT} proof method is the fully-typed version of Metis. It is
1.184 +much slower than \textit{metis}, but the proof search is fully typed, and it
1.185 +also includes more powerful rules such as the axiom ``$x = \mathit{True}
1.186 +\mathrel{\lor} x = \mathit{False}$'' for reasoning in higher-order places (e.g.,
1.187 +in set comprehensions). The method kicks in automatically as a fallback when
1.188 +\textit{metis} fails, and it is sometimes generated by Sledgehammer instead of
1.189 +\textit{metis} if the proof obviously requires type information.
1.190 +
1.191 +If you see the warning
1.192 +
1.193 +\prew
1.194 +\textsl
1.195 +Metis: Falling back on ``\textit{metisFT}''.
1.196 +\postw
1.197 +
1.198 +in a successful Metis proof, you can advantageously replace the \textit{metis}
1.199 +call with \textit{metisFT}.
1.200
1.201 \point{I got a strange error from Sledgehammer---what should I do?}
1.202
1.203 Sledgehammer tries to give informative error messages. Please report any strange
1.204 -error to \authoremail. This applies double if you get the message
1.205 +error to the author at \authoremail. This applies double if you get the message
1.206
1.207 -\begin{quote}
1.208 +\prew
1.209 \slshape
1.210 The prover found a type-unsound proof involving ``\textit{foo}'',
1.211 ``\textit{bar}'', ``\textit{baz}'' even though a supposedly type-sound encoding
1.212 was used (or, less likely, your axioms are inconsistent). You might want to
1.213 report this to the Isabelle developers.
1.214 -\end{quote}
1.215 +\postw
1.216
1.217 \point{Auto can solve it---why not Sledgehammer?}
1.218
1.219 @@ -452,6 +495,13 @@
1.220 Because the system refers to all theorems known to Isabelle, it is particularly
1.221 suitable when your goal has a short proof from lemmas that you don't know about.
1.222
1.223 +\point{Why are there so many options?}
1.224 +
1.225 +Sledgehammer's philosophy should work out of the box, without user guidance.
1.226 +Many of the options are meant to be used mostly by the Sledgehammer developers
1.227 +for experimentation purposes. Of course, feel free to experiment with them if
1.228 +you are so inclined.
1.229 +
1.230 \section{Command Syntax}
1.231 \label{command-syntax}
1.232