doc-src/Sledgehammer/sledgehammer.tex
changeset 43752 ec1ea24d49bc
parent 43746 d7447b8c4265
child 43753 75c94e3319ae
     1.1 --- a/doc-src/Sledgehammer/sledgehammer.tex	Fri May 20 12:47:59 2011 +0200
     1.2 +++ b/doc-src/Sledgehammer/sledgehammer.tex	Fri May 20 12:47:59 2011 +0200
     1.3 @@ -118,7 +118,7 @@
     1.4  \textit{Main}. Examples of Sledgehammer use can be found in Isabelle's
     1.5  \texttt{src/HOL/Metis\_Examples} directory.
     1.6  Comments and bug reports concerning Sledgehammer or this manual should be
     1.7 -directed to \authoremail.
     1.8 +directed to the author at \authoremail.
     1.9  
    1.10  \vskip2.5\smallskipamount
    1.11  
    1.12 @@ -283,7 +283,7 @@
    1.13  the proof found by E looks perfect, so click it to finish the proof.
    1.14  
    1.15  You can ask Sledgehammer for an Isar text proof by passing the
    1.16 -\textit{isar\_proof} option:
    1.17 +\textit{isar\_proof} option (\S\ref{output-format}):
    1.18  
    1.19  \prew
    1.20  \textbf{sledgehammer} [\textit{isar\_proof}]
    1.21 @@ -356,14 +356,14 @@
    1.22  
    1.23  \point{Why does Metis fail to reconstruct the proof?}
    1.24  
    1.25 -There can be many reasons. If Metis runs seemingly forever, that's a sign that
    1.26 -the proof is too difficult for it. Metis is complete, so it should eventually
    1.27 -find it, but that's little consolation. There are several possible solutions:
    1.28 +There are many reasons. If Metis runs seemingly forever, that is a sign that the
    1.29 +proof is too difficult for it. Metis is complete, so it should eventually find
    1.30 +it, but that's little consolation. There are several possible solutions:
    1.31  
    1.32  \begin{enum}
    1.33 -\item[$\bullet$] Try the \textit{isar\_proof} option to obtain a step-by-step
    1.34 -Isar proof where each step is justified by Metis. Since the steps are fairly
    1.35 -small, Metis is more likely to be able to replay them.
    1.36 +\item[$\bullet$] Try the \textit{isar\_proof} option (\S\ref{output-format}) to
    1.37 +obtain a step-by-step Isar proof where each step is justified by Metis. Since
    1.38 +the steps are fairly small, Metis is more likely to be able to replay them.
    1.39  
    1.40  \item[$\bullet$] Try the \textit{smt} proof method instead of \textit{metis}. It
    1.41  is usually stronger, but you need to have Z3 available to replay the proofs,
    1.42 @@ -375,75 +375,118 @@
    1.43  \textit{elim}{:}, \textit{dest}{:}, or \textit{simp}{:}, as appropriate.
    1.44  \end{enum}
    1.45  
    1.46 -%    * sometimes Metis runs into some error, e.g. a type error. then it tries
    1.47 -%      again with metisFT, where FT stands for ``full type information'
    1.48 -%    * metisFT is much slower, but its proof search is fully typed, and it also
    1.49 -%      includes more powerful rules such as the axiom ``$x = \mathit{True}
    1.50 -%      \mathrel{\lor} x = \mathit{False}$'' for reasoning in higher-order places
    1.51 -%      (e.g., in set comprehensions)
    1.52 -%
    1.53 -%    * finally, in some cases the ATP proof is simply type-incorrect.
    1.54 -%      Sledgehammer drops some type information to speed up the search. Try
    1.55 -%      Sledgehammer again with full type information: \textit{full\_types}
    1.56 -%      (\S\ref{problem-encoding}), or choose a specific type encoding with
    1.57 -%      \textit{type\_sys} (\S\ref{problem-encoding}). Older versions of
    1.58 -%      Sledgehammer were frequent victims of this problem. Now this should very
    1.59 -%      seldom be an issue, but if you notice too many unsound proofs, contact
    1.60 -%
    1.61 -%\point{How can I easily tell whether a Sledgehammer proof is sound?}
    1.62 -%
    1.63 -%Easiest way: Once it's found: ... by (metis facts)
    1.64 -%try
    1.65 -%sledgehammer [full\_types] (facts)
    1.66 -%
    1.67 -%should usually give unprovable or refind the proof fairly quickly
    1.68 -%
    1.69 -%Same trick if you believe that there exists a proof with certain facts.
    1.70 -%
    1.71 -%\point{Which facts does Sledgehammer select?}
    1.72 -%
    1.73 -%    * heuristic
    1.74 -%    * and several hundreds
    1.75 -%    * show them: debug
    1.76 -%    * influence it with sledgehammer (add: xxx)
    1.77 -%
    1.78 -%    * S/h good at finding short proofs combining a handful of existing lemmas
    1.79 -%    * for deeper proofs, you must restrict the number of facts, e.g.
    1.80 -%      max\_relevant = 50
    1.81 -%    * but then proof reconstruction is an issue
    1.82 -%
    1.83 -%\point{Why are the Isar proofs generated by Sledgehammer so ugly?}
    1.84 -%
    1.85 -%    * experimental
    1.86 -%    * working on this
    1.87 -%    * there is a large body of research into transforming resolution proofs into
    1.88 -%      natural deduction proofs (e.g., Isar proofs)
    1.89 -%    * meantime: isar\_shrink\_factor
    1.90 -%
    1.91 -%
    1.92 -%\point{Should I let Sledgehammer minimize the number of lemmas?}
    1.93 -%
    1.94 -%    * in general, yes
    1.95 -%    * proofs involving fewer lemmas tend to be shorter as well, and hence easier
    1.96 -%      to re-find by Metis
    1.97 -%    * but the opposite is sometimes the case
    1.98 +In some rare cases, Metis fails fairly quickly. This usually indicates that
    1.99 +Sledgehammer found a type-incorrect proof. Sledgehammer erases some type
   1.100 +information to speed up the search. Try Sledgehammer again with full type
   1.101 +information: \textit{full\_types} (\S\ref{problem-encoding}), or choose a
   1.102 +specific type encoding with \textit{type\_sys} (\S\ref{problem-encoding}). Older
   1.103 +versions of Sledgehammer were frequent victims of this problem. Now this should
   1.104 +very seldom be an issue, but if you notice many unsound proofs, contact the
   1.105 +author at \authoremail.
   1.106  
   1.107 -% Why is Sledgehammer automatically minimizing sometimes?
   1.108 -%       * some provers (e.g. CVC3 and Yices)
   1.109 -%       * also, sometimes E finds a proof but doesn't give a proof object
   1.110 +\point{How can I tell whether a Sledgehammer proof is sound?}
   1.111 +
   1.112 +First, if \emph{metis} (or \emph{metisFT}) can reconstruct it, the proof is
   1.113 +sound (modulo soundness of Isabelle's inference kernel). If it fails or runs
   1.114 +seemingly forever, you can try
   1.115 +
   1.116 +\prew
   1.117 +\textbf{apply}~\textbf{--} \\
   1.118 +\textbf{sledgehammer} [\textit{type\_sys} = \textit{poly\_tags}] (\textit{metis\_facts})
   1.119 +\postw
   1.120 +
   1.121 +where \textit{metis\_facts} is the list of facts appearing in the suggested
   1.122 +Metis call. The automatic provers should be able to refind the proof very
   1.123 +quickly if it is sound, and the \textit{type\_sys} $=$ \textit{poly\_tags}
   1.124 +option (\S\ref{problem-encoding}) ensures that no unsound proofs are found.
   1.125 +
   1.126 +The \textit{full\_types} option (\S\ref{problem-encoding}) can also be used
   1.127 +here, but it is unsound in extremely rare degenerate cases such as the
   1.128 +following:
   1.129 +
   1.130 +\prew
   1.131 +\textbf{lemma} ``$\forall x\> y\Colon{'}a.\ x = y \,\Longrightarrow \exists f\> g\Colon\mathit{nat} \Rightarrow {'}a.\ f \not= g$'' \\
   1.132 +\textbf{sledgehammer} [\textit{full\_types}] (\textit{nat.distinct\/}(1))
   1.133 +\postw
   1.134 +
   1.135 +\point{How does Sledgehammer select the facts that should be passed to the
   1.136 +automatic provers?}
   1.137 +
   1.138 +Briefly, the relevance filter assigns a score to every available fact (lemma,
   1.139 +theorem, definition, or axiom)\ based upon how many constants that fact shares
   1.140 +with the conjecture; this process iterates to include facts relevant to those
   1.141 +just accepted, but with a decay factor to ensure termination. The constants are
   1.142 +weighted to give unusual ones greater significance. The relevance filter copes
   1.143 +best when the conjecture contains some unusual constants; if all the constants
   1.144 +are common, it is unable to discriminate among the hundreds of facts that are
   1.145 +picked up. The relevance filter is also memoryless: It has no information about
   1.146 +how many times a particular fact has been used in a proof, and it cannot learn.
   1.147 +
   1.148 +The number of facts included in a problem varies from prover to prover, since
   1.149 +some provers get overwhelmed quicker than others. You can show the number of
   1.150 +facts given using the \textit{verbose} option (\S\ref{output-format}) and the
   1.151 +actual facts using \textit{debug} (\S\ref{output-format}).
   1.152 +
   1.153 +Sledgehammer is good at finding short proofs combining a handful of existing
   1.154 +lemmas. If you are looking for longer proofs, you must typically restrict the
   1.155 +number of facts, by setting \textit{max\_relevant} to, say, 50 or 100.
   1.156 +
   1.157 +\point{Why are the Isar proofs generated by Sledgehammer so ugly?}
   1.158 +
   1.159 +The current implementation is experimental and explodes exponentially in the
   1.160 +worst case. Work on a new implementation has begun. There is a large body of
   1.161 +research into transforming resolution proofs into natural deduction proofs (such
   1.162 +as Isar proofs), which we hope to leverage. In the meantime, a workaround is to
   1.163 +set the \textit{isar\_shrink\_factor} option (\S\ref{output-format}) to a larger
   1.164 +value or to try several provers and keep the nicest-looking proof.
   1.165 +
   1.166 +\point{Should I let Sledgehammer minimize the number of lemmas?}
   1.167 +
   1.168 +In general, minimization is a good idea, because proofs involving fewer lemmas
   1.169 +tend to be shorter as well, and hence easier to re-find by Metis. But the
   1.170 +opposite is sometimes the case.
   1.171 +
   1.172 +\point{Why does the minimizer sometimes starts of its own?}
   1.173 +
   1.174 +There are two scenarios in which this can happen. First, some provers (e.g.,
   1.175 +CVC3 and Yices) do not provide proofs or provide incomplete proofs. The
   1.176 +minimizer is then invoked to find out which facts are actually needed from the
   1.177 +(large) set of facts that was initinally given to the prover. Second, if a
   1.178 +prover returns a proof with lots of facts, the minimizer is invoked
   1.179 +automatically since Metis is unlikely to refind the proof.
   1.180 +
   1.181 +\point{What is metisFT?}
   1.182 +
   1.183 +The \textit{metisFT} proof method is the fully-typed version of Metis. It is
   1.184 +much slower than \textit{metis}, but the proof search is fully typed, and it
   1.185 +also includes more powerful rules such as the axiom ``$x = \mathit{True}
   1.186 +\mathrel{\lor} x = \mathit{False}$'' for reasoning in higher-order places (e.g.,
   1.187 +in set comprehensions). The method kicks in automatically as a fallback when
   1.188 +\textit{metis} fails, and it is sometimes generated by Sledgehammer instead of
   1.189 +\textit{metis} if the proof obviously requires type information.
   1.190 +
   1.191 +If you see the warning
   1.192 +
   1.193 +\prew
   1.194 +\textsl
   1.195 +Metis: Falling back on ``\textit{metisFT}''.
   1.196 +\postw
   1.197 +
   1.198 +in a successful Metis proof, you can advantageously replace the \textit{metis}
   1.199 +call with \textit{metisFT}.
   1.200  
   1.201  \point{I got a strange error from Sledgehammer---what should I do?}
   1.202  
   1.203  Sledgehammer tries to give informative error messages. Please report any strange
   1.204 -error to \authoremail. This applies double if you get the message
   1.205 +error to the author at \authoremail. This applies double if you get the message
   1.206  
   1.207 -\begin{quote}
   1.208 +\prew
   1.209  \slshape
   1.210  The prover found a type-unsound proof involving ``\textit{foo}'',
   1.211  ``\textit{bar}'', ``\textit{baz}'' even though a supposedly type-sound encoding
   1.212  was used (or, less likely, your axioms are inconsistent). You might want to
   1.213  report this to the Isabelle developers.
   1.214 -\end{quote}
   1.215 +\postw
   1.216  
   1.217  \point{Auto can solve it---why not Sledgehammer?}
   1.218  
   1.219 @@ -452,6 +495,13 @@
   1.220  Because the system refers to all theorems known to Isabelle, it is particularly
   1.221  suitable when your goal has a short proof from lemmas that you don't know about.
   1.222  
   1.223 +\point{Why are there so many options?}
   1.224 +
   1.225 +Sledgehammer's philosophy should work out of the box, without user guidance.
   1.226 +Many of the options are meant to be used mostly by the Sledgehammer developers
   1.227 +for experimentation purposes. Of course, feel free to experiment with them if
   1.228 +you are so inclined.
   1.229 +
   1.230  \section{Command Syntax}
   1.231  \label{command-syntax}
   1.232