adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
1 \documentclass[a4paper,12pt]{article}
2 \usepackage[T1]{fontenc}
5 \usepackage[english,french]{babel}
12 %\usepackage[scaled=.85]{beramono}
13 \usepackage{isabelle,iman,pdfsetup}
15 \newcommand\download{\url{http://www21.in.tum.de/~blanchet/\#software}}
17 \def\qty#1{\ensuremath{\left<\mathit{#1\/}\right>}}
18 \def\qtybf#1{$\mathbf{\left<\textbf{\textit{#1\/}}\right>}$}
20 \newcommand\const[1]{\textsf{#1}}
23 %\evensidemargin=4.6mm
30 \def\Colon{\mathord{:\mkern-1.5mu:}}
31 %\def\lbrakk{\mathopen{\lbrack\mkern-3.25mu\lbrack}}
32 %\def\rbrakk{\mathclose{\rbrack\mkern-3.255mu\rbrack}}
33 \def\lparr{\mathopen{(\mkern-4mu\mid}}
34 \def\rparr{\mathclose{\mid\mkern-4mu)}}
37 \def\undef{(\lambda x.\; \unk)}
38 %\def\unr{\textit{others}}
40 \def\Abs#1{\hbox{\rm{\flqq}}{\,#1\,}\hbox{\rm{\frqq}}}
41 \def\Q{{\smash{\lower.2ex\hbox{$\scriptstyle?$}}}}
48 %\renewcommand\labelitemi{$\bullet$}
49 \renewcommand\labelitemi{\raise.065ex\hbox{\small\textbullet}}
51 \selectlanguage{english}
53 \title{\includegraphics[scale=0.5]{isabelle_sledgehammer} \\[4ex]
54 Hammering Away \\[\smallskipamount]
55 \Large A User's Guide to Sledgehammer for Isabelle/HOL}
57 Jasmin Christian Blanchette \\
58 {\normalsize Institut f\"ur Informatik, Technische Universit\"at M\"unchen} \\[4\smallskipamount]
59 {\normalsize with contributions from} \\[4\smallskipamount]
60 Lawrence C. Paulson \\
61 {\normalsize Computer Laboratory, University of Cambridge} \\
68 \setlength{\parskip}{.7em plus .2em minus .1em}
69 \setlength{\parindent}{0pt}
70 \setlength{\abovedisplayskip}{\parskip}
71 \setlength{\abovedisplayshortskip}{.9\parskip}
72 \setlength{\belowdisplayskip}{\parskip}
73 \setlength{\belowdisplayshortskip}{.9\parskip}
75 % General-purpose enum environment with correct spacing
76 \newenvironment{enum}%
78 \setlength{\topsep}{.1\parskip}%
79 \setlength{\partopsep}{.1\parskip}%
80 \setlength{\itemsep}{\parskip}%
81 \advance\itemsep by-\parsep}}
84 \def\pre{\begingroup\vskip0pt plus1ex\advance\leftskip by\leftmargin
85 \advance\rightskip by\leftmargin}
86 \def\post{\vskip0pt plus1ex\endgroup}
88 \def\prew{\pre\advance\rightskip by-\leftmargin}
91 \section{Introduction}
94 Sledgehammer is a tool that applies automatic theorem provers (ATPs)
95 and satisfiability-modulo-theories (SMT) solvers on the current goal.%
96 \footnote{The distinction between ATPs and SMT solvers is convenient but mostly
97 historical. The two communities are converging, with more and more ATPs
98 supporting typical SMT features such as arithmetic and sorts, and a few SMT
99 solvers parsing ATP syntaxes. There is also a strong technological connection
100 between instantiation-based ATPs (such as iProver and iProver-Eq) and SMT
103 The supported ATPs are E \cite{schulz-2002}, E-SInE \cite{sine}, E-ToFoF
104 \cite{tofof}, iProver \cite{korovin-2009}, iProver-Eq
105 \cite{korovin-sticksel-2010}, LEO-II \cite{leo2}, Satallax \cite{satallax},
106 SNARK \cite{snark}, SPASS \cite{weidenbach-et-al-2009}, Vampire
107 \cite{riazanov-voronkov-2002}, and Waldmeister \cite{waldmeister}. The ATPs are
108 run either locally or remotely via the System\-On\-TPTP web service
109 \cite{sutcliffe-2000}. In addition to the ATPs, a selection of the SMT solvers
110 CVC3 \cite{cvc3}, Yices \cite{yices}, and Z3 \cite{z3} are run by default, and
111 you can tell Sledgehammer to try Alt-Ergo \cite{alt-ergo} as well; these are run
112 either locally or (for CVC3 and Z3) on a server at the TU M\"unchen.
114 The problem passed to the automatic provers consists of your current goal
115 together with a heuristic selection of hundreds of facts (theorems) from the
116 current theory context, filtered by relevance. Because jobs are run in the
117 background, you can continue to work on your proof by other means. Provers can
118 be run in parallel. Any reply (which may arrive half a minute later) will appear
119 in the Proof General response buffer.
121 The result of a successful proof search is some source text that usually (but
122 not always) reconstructs the proof within Isabelle. For ATPs, the reconstructed
123 proof relies on the general-purpose \textit{metis} proof method, which
124 integrates the Metis ATP in Isabelle/HOL with explicit inferences going through
125 the kernel. Thus its results are correct by construction.
127 In this manual, we will explicitly invoke the \textbf{sledgehammer} command.
128 Sledgehammer also provides an automatic mode that can be enabled via the ``Auto
129 Sledgehammer'' option in Proof General's ``Isabelle'' menu. In this mode,
130 Sledgehammer is run on every newly entered theorem. The time limit for Auto
131 Sledgehammer and other automatic tools can be set using the ``Auto Tools Time
135 \setbox\boxA=\hbox{\texttt{NOSPAM}}
137 \newcommand\authoremail{\texttt{blan{\color{white}NOSPAM}\kern-\wd\boxA{}chette@\allowbreak
138 in.\allowbreak tum.\allowbreak de}}
140 To run Sledgehammer, you must make sure that the theory \textit{Sledgehammer} is
141 imported---this is rarely a problem in practice since it is part of
142 \textit{Main}. Examples of Sledgehammer use can be found in Isabelle's
143 \texttt{src/HOL/Metis\_Examples} directory.
144 Comments and bug reports concerning Sledgehammer or this manual should be
145 directed to the author at \authoremail.
147 \vskip2.5\smallskipamount
149 %\textbf{Acknowledgment.} The author would like to thank Mark Summerfield for
150 %suggesting several textual improvements.
152 \section{Installation}
155 Sledgehammer is part of Isabelle, so you do not need to install it. However, it
156 relies on third-party automatic provers (ATPs and SMT solvers).
158 Among the ATPs, E, LEO-II, Satallax, SPASS, and Vampire can be run locally; in
159 addition, E, E-SInE, E-ToFoF, iProver, iProver-Eq, LEO-II, Satallax, SNARK,
160 Vampire, and Waldmeister are available remotely via System\-On\-TPTP
161 \cite{sutcliffe-2000}. If you want better performance, you should at least
162 install E and SPASS locally.
164 The SMT solvers Alt-Ergo, CVC3, Yices, and Z3 can be run locally, and CVC3 and
165 Z3 can be run remotely on a TU M\"unchen server. If you want better performance
166 and get the ability to replay proofs that rely on the \emph{smt} proof method
167 without an Internet connection, you should at least have Z3 locally installed.
169 There are three main ways to install automatic provers on your machine:
173 \item[\labelitemi] If you installed an official Isabelle package, it should
174 already include properly setup executables for CVC3, E, SPASS, and Z3, ready to use.%
175 \footnote{Vampire's and Yices's licenses prevent us from doing the same for
176 these otherwise remarkable tools.}
177 For Z3, you must additionally set the variable
178 \texttt{Z3\_NON\_COMMERCIAL} to ``yes'' to confirm that you are a
179 noncommercial user, either in the environment in which Isabelle is
181 \texttt{\$ISABELLE\_HOME\_USER/etc/settings} file.
183 \item[\labelitemi] Alternatively, you can download the Isabelle-aware CVC3, E,
184 SPASS, and Z3 binary packages from \download. Extract the archives, then add a
185 line to your \texttt{\$ISABELLE\_HOME\_USER\slash etc\slash components}%
186 \footnote{The variable \texttt{\$ISABELLE\_HOME\_USER} is set by Isabelle at
187 startup. Its value can be retrieved by executing \texttt{isabelle}
188 \texttt{getenv} \texttt{ISABELLE\_HOME\_USER} on the command line.}
189 file with the absolute path to CVC3, E, SPASS, or Z3. For example, if the
190 \texttt{components} file does not exist yet and you extracted SPASS to
191 \texttt{/usr/local/spass-3.8ds}, create it with the single line
194 \texttt{/usr/local/spass-3.8ds}
199 \item[\labelitemi] If you prefer to build E, LEO-II, Satallax, or SPASS
200 manually, or found a Vampire executable somewhere (e.g.,
201 \url{http://www.vprover.org/}), set the environment variable \texttt{E\_HOME},
202 \texttt{LEO2\_HOME}, \texttt{SATALLAX\_HOME}, \texttt{SPASS\_HOME}, or
203 \texttt{VAMPIRE\_HOME} to the directory that contains the \texttt{eproof},
204 \texttt{leo}, \texttt{satallax}, \texttt{SPASS}, or \texttt{vampire} executable.
205 Sledgehammer has been tested with E 1.0 to 1.4, LEO-II 1.3.4, Satallax 2.2, 2.3,
206 and 2.4, SPASS 3.8ds, and Vampire 0.6 to 2.6.%
207 \footnote{Following the rewrite of Vampire, the counter for version numbers was
208 reset to 0; hence the (new) Vampire versions 0.6, 1.0, 1.8, and 2.6 are more
209 recent than 9.0 or 11.5.}%
210 Since the ATPs' output formats are neither documented nor stable, other
211 versions might not work well with Sledgehammer. Ideally,
212 you should also set \texttt{E\_VERSION}, \texttt{LEO2\_VERSION},
213 \texttt{SATALLAX\_VERSION}, \texttt{SPASS\_VERSION}, or
214 \texttt{VAMPIRE\_VERSION} to the prover's version number (e.g., ``1.4'').
216 Similarly, if you want to build Alt-Ergo or CVC3, or found a
217 Yices or Z3 executable somewhere (e.g.,
218 \url{http://yices.csl.sri.com/download.shtml} or
219 \url{http://research.microsoft.com/en-us/um/redmond/projects/z3/download.html}),
220 set the environment variable \texttt{CVC3\_\allowbreak SOLVER},
221 \texttt{YICES\_SOLVER}, or \texttt{Z3\_SOLVER} to the complete path of
222 the executable, \emph{including the file name};
223 for Alt-Ergo, set the
224 environment variable \texttt{WHY3\_HOME} to the directory that contains the
225 \texttt{why3} executable.
226 Sledgehammer has been tested with Alt-Ergo 0.93 and 0.94, CVC3 2.2 and 2.4.1,
227 Yices 1.0.28 and 1.0.33, and Z3 3.0 to 4.0. Since the SMT solvers' output
228 formats are somewhat unstable, other versions of the solvers might not work well
229 with Sledgehammer. Ideally, also set \texttt{CVC3\_VERSION},
230 \texttt{YICES\_VERSION}, or \texttt{Z3\_VERSION} to the solver's version number
235 To check whether E, SPASS, Vampire, and/or Z3 are successfully installed, try
236 out the example in \S\ref{first-steps}. If the remote versions of any of these
237 provers is used (identified by the prefix ``\emph{remote\_\/}''), or if the
238 local versions fail to solve the easy goal presented there, something must be
239 wrong with the installation.
241 Remote prover invocation requires Perl with the World Wide Web Library
242 (\texttt{libwww-perl}) installed. If you must use a proxy server to access the
243 Internet, set the \texttt{http\_proxy} environment variable to the proxy, either
244 in the environment in which Isabelle is launched or in your
245 \texttt{\$ISABELLE\_HOME\_USER/etc/settings} file. Here are a few
249 \texttt{http\_proxy=http://proxy.example.org} \\
250 \texttt{http\_proxy=http://proxy.example.org:8080} \\
251 \texttt{http\_proxy=http://joeblow:pAsSwRd@proxy.example.org}
254 \section{First Steps}
257 To illustrate Sledgehammer in context, let us start a theory file and
258 attempt to prove a simple lemma:
261 \textbf{theory}~\textit{Scratch} \\
262 \textbf{imports}~\textit{Main} \\
263 \textbf{begin} \\[2\smallskipamount]
265 \textbf{lemma} ``$[a] = [b] \,\Longrightarrow\, a = b$'' \\
266 \textbf{sledgehammer}
269 Instead of issuing the \textbf{sledgehammer} command, you can also find
270 Sledgehammer in the ``Commands'' submenu of the ``Isabelle'' menu in Proof
271 General or press the Emacs key sequence C-c C-a C-s.
272 Either way, Sledgehammer produces the following output after a few seconds:
276 Sledgehammer: ``\textit{e\/}'' on goal \\
277 $[a] = [b] \,\Longrightarrow\, a = b$ \\
278 Try this: \textbf{by} (\textit{metis last\_ConsL}) (64 ms). \\[3\smallskipamount]
280 Sledgehammer: ``\textit{z3\/}'' on goal \\
281 $[a] = [b] \,\Longrightarrow\, a = b$ \\
282 Try this: \textbf{by} (\textit{metis list.inject}) (20 ms). \\[3\smallskipamount]
284 Sledgehammer: ``\textit{vampire\/}'' on goal \\
285 $[a] = [b] \,\Longrightarrow\, a = b$ \\
286 Try this: \textbf{by} (\textit{metis hd.simps}) (14 ms). \\[3\smallskipamount]
288 Sledgehammer: ``\textit{spass\/}'' on goal \\
289 $[a] = [b] \,\Longrightarrow\, a = b$ \\
290 Try this: \textbf{by} (\textit{metis list.inject}) (17 ms). \\[3\smallskipamount]
292 Sledgehammer: ``\textit{remote\_waldmeister\/}'' on goal \\
293 $[a] = [b] \,\Longrightarrow\, a = b$ \\
294 Try this: \textbf{by} (\textit{metis hd.simps}) (15 ms). \\[3\smallskipamount]
296 Sledgehammer: ``\textit{remote\_e\_sine\/}'' on goal \\
297 $[a] = [b] \,\Longrightarrow\, a = b$ \\
298 Try this: \textbf{by} (\textit{metis hd.simps}) (18 ms).
301 Sledgehammer ran E, E-SInE, SPASS, Vampire, Waldmeister, and Z3 in parallel.
302 Depending on which provers are installed and how many processor cores are
303 available, some of the provers might be missing or present with a
304 \textit{remote\_} prefix. Waldmeister is run only for unit equational problems,
305 where the goal's conclusion is a (universally quantified) equation.
307 For each successful prover, Sledgehammer gives a one-liner \textit{metis} or
308 \textit{smt} method call. Rough timings are shown in parentheses, indicating how
309 fast the call is. You can click the proof to insert it into the theory text.
311 In addition, you can ask Sledgehammer for an Isar text proof by passing the
312 \textit{isar\_proofs} option (\S\ref{output-format}):
315 \textbf{sledgehammer} [\textit{isar\_proofs}]
318 When Isar proof construction is successful, it can yield proofs that are more
319 readable and also faster than the \textit{metis} or \textit{smt} one-liners.
320 This feature is experimental and is only available for ATPs.
325 This section presents a few hints that should help you get the most out of
326 Sledgehammer. Frequently asked questions are answered in
327 \S\ref{frequently-asked-questions}.
329 %\newcommand\point[1]{\medskip\par{\sl\bfseries#1}\par\nopagebreak}
330 \newcommand\point[1]{\subsection{\emph{#1}}}
332 \point{Presimplify the goal}
334 For best results, first simplify your problem by calling \textit{auto} or at
335 least \textit{safe} followed by \textit{simp\_all}. The SMT solvers provide
336 arithmetic decision procedures, but the ATPs typically do not (or if they do,
337 Sledgehammer does not use it yet). Apart from Waldmeister, they are not
338 especially good at heavy rewriting, but because they regard equations as
339 undirected, they often prove theorems that require the reverse orientation of a
340 \textit{simp} rule. Higher-order problems can be tackled, but the success rate
341 is better for first-order problems. Hence, you may get better results if you
342 first simplify the problem to remove higher-order features.
344 \point{Make sure E, SPASS, Vampire, and Z3 are locally installed}
346 Locally installed provers are faster and more reliable than those running on
347 servers. See \S\ref{installation} for details on how to install them.
349 \point{Familiarize yourself with the most important options}
351 Sledgehammer's options are fully documented in \S\ref{command-syntax}. Many of
352 the options are very specialized, but serious users of the tool should at least
353 familiarize themselves with the following options:
356 \item[\labelitemi] \textbf{\textit{provers}} (\S\ref{mode-of-operation}) specifies
357 the automatic provers (ATPs and SMT solvers) that should be run whenever
358 Sledgehammer is invoked (e.g., ``\textit{provers}~= \textit{e spass
359 remote\_vampire\/}''). For convenience, you can omit ``\textit{provers}~=''
360 and simply write the prover names as a space-separated list (e.g., ``\textit{e
361 spass remote\_vampire\/}'').
363 \item[\labelitemi] \textbf{\textit{max\_facts}} (\S\ref{relevance-filter})
364 specifies the maximum number of facts that should be passed to the provers. By
365 default, the value is prover-dependent but varies between about 50 and 1000. If
366 the provers time out, you can try lowering this value to, say, 25 or 50 and see
369 \item[\labelitemi] \textbf{\textit{isar\_proofs}} (\S\ref{output-format}) specifies
370 that Isar proofs should be generated, instead of one-liner \textit{metis} or
371 \textit{smt} proofs. The length of the Isar proofs can be controlled by setting
372 \textit{isar\_shrink} (\S\ref{output-format}).
374 \item[\labelitemi] \textbf{\textit{timeout}} (\S\ref{timeouts}) controls the
375 provers' time limit. It is set to 30 seconds, but since Sledgehammer runs
376 asynchronously you should not hesitate to raise this limit to 60 or 120 seconds
377 if you are the kind of user who can think clearly while ATPs are active.
380 Options can be set globally using \textbf{sledgehammer\_params}
381 (\S\ref{command-syntax}). The command also prints the list of all available
382 options with their current value. Fact selection can be influenced by specifying
383 ``$(\textit{add}{:}~\textit{my\_facts})$'' after the \textbf{sledgehammer} call
384 to ensure that certain facts are included, or simply ``$(\textit{my\_facts})$''
385 to force Sledgehammer to run only with $\textit{my\_facts}$.
387 \section{Frequently Asked Questions}
388 \label{frequently-asked-questions}
390 This sections answers frequently (and infrequently) asked questions about
391 Sledgehammer. It is a good idea to skim over it now even if you do not have any
392 questions at this stage. And if you have any further questions not listed here,
393 send them to the author at \authoremail.
395 \point{Which facts are passed to the automatic provers?}
397 Sledgehammer heuristically selects a few hundred relevant lemmas from the
398 currently loaded libraries. The component that performs this selection is
399 called \emph{relevance filter}.
403 The traditional relevance filter, called \emph{MePo}
404 (\underline{Me}ng--\underline{Pau}lson), assigns a score to every available fact
405 (lemma, theorem, definition, or axiom) based upon how many constants that fact
406 shares with the conjecture. This process iterates to include facts relevant to
407 those just accepted. The constants are weighted to give unusual ones greater
408 significance. MePo copes best when the conjecture contains some unusual
409 constants; if all the constants are common, it is unable to discriminate among
410 the hundreds of facts that are picked up. The filter is also memoryless: It has
411 no information about how many times a particular fact has been used in a proof,
415 An experimental, memoryful alternative to MePo is \emph{MaSh}
416 (\underline{Ma}chine Learner for \underline{S}ledge\underline{h}ammer). It
417 relies on an external Python tool that applies machine learning to
418 the problem of finding relevant facts.
420 \item[\labelitemi] The \emph{MeSh} filter combines MePo and MaSh.
423 The default is either MePo or MeSh, depending on whether the environment
424 variable \texttt{MASH} is set and what class of provers the target prover
425 belongs to (\S\ref{relevance-filter}).
427 The number of facts included in a problem varies from prover to prover, since
428 some provers get overwhelmed more easily than others. You can show the number of
429 facts given using the \textit{verbose} option (\S\ref{output-format}) and the
430 actual facts using \textit{debug} (\S\ref{output-format}).
432 Sledgehammer is good at finding short proofs combining a handful of existing
433 lemmas. If you are looking for longer proofs, you must typically restrict the
434 number of facts, by setting the \textit{max\_facts} option
435 (\S\ref{relevance-filter}) to, say, 25 or 50.
437 You can also influence which facts are actually selected in a number of ways. If
438 you simply want to ensure that a fact is included, you can specify it using the
439 ``$(\textit{add}{:}~\textit{my\_facts})$'' syntax. For example:
442 \textbf{sledgehammer} (\textit{add}: \textit{hd.simps} \textit{tl.simps})
445 The specified facts then replace the least relevant facts that would otherwise be
446 included; the other selected facts remain the same.
447 If you want to direct the selection in a particular direction, you can specify
448 the facts via \textbf{using}:
451 \textbf{using} \textit{hd.simps} \textit{tl.simps} \\
452 \textbf{sledgehammer}
455 The facts are then more likely to be selected than otherwise, and if they are
456 selected at iteration $j$ they also influence which facts are selected at
457 iterations $j + 1$, $j + 2$, etc. To give them even more weight, try
460 \textbf{using} \textit{hd.simps} \textit{tl.simps} \\
461 \textbf{apply}~\textbf{--} \\
462 \textbf{sledgehammer}
465 \point{Why does Metis fail to reconstruct the proof?}
467 There are many reasons. If Metis runs seemingly forever, that is a sign that the
468 proof is too difficult for it. Metis's search is complete, so it should
469 eventually find it, but that's little consolation. There are several possible
473 \item[\labelitemi] Try the \textit{isar\_proofs} option (\S\ref{output-format}) to
474 obtain a step-by-step Isar proof where each step is justified by \textit{metis}.
475 Since the steps are fairly small, \textit{metis} is more likely to be able to
478 \item[\labelitemi] Try the \textit{smt} proof method instead of \textit{metis}.
479 It is usually stronger, but you need to either have Z3 available to replay the
480 proofs, trust the SMT solver, or use certificates. See the documentation in the
481 \emph{SMT} theory (\texttt{\$ISABELLE\_HOME/src/HOL/SMT.thy}) for details.
483 \item[\labelitemi] Try the \textit{blast} or \textit{auto} proof methods, passing
484 the necessary facts via \textbf{unfolding}, \textbf{using}, \textit{intro}{:},
485 \textit{elim}{:}, \textit{dest}{:}, or \textit{simp}{:}, as appropriate.
488 In some rare cases, \textit{metis} fails fairly quickly, and you get the error
493 One-line proof reconstruction failed.
496 This message indicates that Sledgehammer determined that the goal is provable,
497 but the proof is, for technical reasons, beyond \textit{metis}'s power. You can
498 then try again with the \textit{strict} option (\S\ref{problem-encoding}).
500 If the goal is actually unprovable and you did not specify an unsound encoding
501 using \textit{type\_enc} (\S\ref{problem-encoding}), this is a bug, and you are
502 strongly encouraged to report this to the author at \authoremail.
504 \point{Why are the generated Isar proofs so ugly/broken?}
506 The current implementation of the Isar proof feature,
507 enabled by the \textit{isar\_proofs} option (\S\ref{output-format}),
508 is highly experimental. Work on a new implementation has begun. There is a large body of
509 research into transforming resolution proofs into natural deduction proofs (such
510 as Isar proofs), which we hope to leverage. In the meantime, a workaround is to
511 set the \textit{isar\_shrink} option (\S\ref{output-format}) to a larger
512 value or to try several provers and keep the nicest-looking proof.
514 \point{How can I tell whether a suggested proof is sound?}
516 Earlier versions of Sledgehammer often suggested unsound proofs---either proofs
517 of nontheorems or simply proofs that rely on type-unsound inferences. This
518 is a thing of the past, unless you explicitly specify an unsound encoding
519 using \textit{type\_enc} (\S\ref{problem-encoding}).
521 Officially, the only form of ``unsoundness'' that lurks in the sound
522 encodings is related to missing characteristic theorems of datatypes. For
526 \textbf{lemma}~``$\exists \mathit{xs}.\; \mathit{xs} \neq []$'' \\
527 \textbf{sledgehammer} ()
530 suggests an argumentless \textit{metis} call that fails. However, the conjecture
531 does actually hold, and the \textit{metis} call can be repaired by adding
532 \textit{list.distinct}.
534 We hope to address this problem in a future version of Isabelle. In the
535 meantime, you can avoid it by passing the \textit{strict} option
536 (\S\ref{problem-encoding}).
538 \point{What are the \textit{full\_types}, \textit{no\_types}, and
539 \textit{mono\_tags} arguments to Metis?}
541 The \textit{metis}~(\textit{full\_types}) proof method
542 and its cousin \textit{metis}~(\textit{mono\_tags}) are fully-typed
543 version of Metis. It is somewhat slower than \textit{metis}, but the proof
544 search is fully typed, and it also includes more powerful rules such as the
545 axiom ``$x = \const{True} \mathrel{\lor} x = \const{False}$'' for reasoning in
546 higher-order places (e.g., in set comprehensions). The method kicks in
547 automatically as a fallback when \textit{metis} fails, and it is sometimes
548 generated by Sledgehammer instead of \textit{metis} if the proof obviously
549 requires type information or if \textit{metis} failed when Sledgehammer
550 preplayed the proof. (By default, Sledgehammer tries to run \textit{metis} with
551 various options for up to 3 seconds each time to ensure that the generated
552 one-line proofs actually work and to display timing information. This can be
553 configured using the \textit{preplay\_timeout} and \textit{dont\_preplay}
554 options (\S\ref{timeouts}).)
556 At the other end of the soundness spectrum, \textit{metis} (\textit{no\_types})
557 uses no type information at all during the proof search, which is more efficient
558 but often fails. Calls to \textit{metis} (\textit{no\_types}) are occasionally
559 generated by Sledgehammer.
561 See the \textit{type\_enc} option (\S\ref{problem-encoding}) for details.
563 Incidentally, if you ever see warnings such as
567 Metis: Falling back on ``\textit{metis} (\textit{full\_types})''.
570 for a successful \textit{metis} proof, you can advantageously pass the
571 \textit{full\_types} option to \textit{metis} directly.
573 \point{And what are the \textit{lifting} and \textit{hide\_lams} arguments
576 Orthogonally to the encoding of types, it is important to choose an appropriate
577 translation of $\lambda$-abstractions. Metis supports three translation schemes,
578 in decreasing order of power: Curry combinators (the default),
579 $\lambda$-lifting, and a ``hiding'' scheme that disables all reasoning under
580 $\lambda$-abstractions. The more powerful schemes also give the automatic
581 provers more rope to hang themselves. See the \textit{lam\_trans} option (\S\ref{problem-encoding}) for details.
583 \point{Are generated proofs minimal?}
585 Automatic provers frequently use many more facts than are necessary.
586 Sledgehammer inclues a minimization tool that takes a set of facts returned by a
587 given prover and repeatedly calls the same prover, \textit{metis}, or
588 \textit{smt} with subsets of those axioms in order to find a minimal set.
589 Reducing the number of axioms typically improves Metis's speed and success rate,
590 while also removing superfluous clutter from the proof scripts.
592 In earlier versions of Sledgehammer, generated proofs were systematically
593 accompanied by a suggestion to invoke the minimization tool. This step is now
594 performed implicitly if it can be done in a reasonable amount of time (something
595 that can be guessed from the number of facts in the original proof and the time
596 it took to find or preplay it).
598 In addition, some provers (e.g., Yices) do not provide proofs or sometimes
599 produce incomplete proofs. The minimizer is then invoked to find out which facts
600 are actually needed from the (large) set of facts that was initially given to
601 the prover. Finally, if a prover returns a proof with lots of facts, the
602 minimizer is invoked automatically since Metis would be unlikely to re-find the
605 Automatic minimization can be forced or disabled using the \textit{minimize}
606 option (\S\ref{mode-of-operation}).
608 \point{A strange error occurred---what should I do?}
610 Sledgehammer tries to give informative error messages. Please report any strange
611 error to the author at \authoremail. This applies double if you get the message
615 The prover found a type-unsound proof involving ``\textit{foo\/}'',
616 ``\textit{bar\/}'', and ``\textit{baz\/}'' even though a supposedly type-sound
617 encoding was used (or, less likely, your axioms are inconsistent). You might
618 want to report this to the Isabelle developers.
621 \point{Auto can solve it---why not Sledgehammer?}
623 Problems can be easy for \textit{auto} and difficult for automatic provers, but
624 the reverse is also true, so do not be discouraged if your first attempts fail.
625 Because the system refers to all theorems known to Isabelle, it is particularly
626 suitable when your goal has a short proof from lemmas that you do not know
629 \point{Why are there so many options?}
631 Sledgehammer's philosophy should work out of the box, without user guidance.
632 Many of the options are meant to be used mostly by the Sledgehammer developers
633 for experimentation purposes. Of course, feel free to experiment with them if
636 \section{Command Syntax}
637 \label{command-syntax}
639 \subsection{Sledgehammer}
641 Sledgehammer can be invoked at any point when there is an open goal by entering
642 the \textbf{sledgehammer} command in the theory file. Its general syntax is as
646 \textbf{sledgehammer} \qty{subcommand}$^?$ \qty{options}$^?$ \qty{facts\_override}$^?$ \qty{num}$^?$
649 For convenience, Sledgehammer is also available in the ``Commands'' submenu of
650 the ``Isabelle'' menu in Proof General or by pressing the Emacs key sequence C-c
651 C-a C-s. This is equivalent to entering the \textbf{sledgehammer} command with
652 no arguments in the theory text.
654 In the general syntax, the \qty{subcommand} may be any of the following:
657 \item[\labelitemi] \textbf{\textit{run} (the default):} Runs Sledgehammer on
658 subgoal number \qty{num} (1 by default), with the given options and facts.
660 \item[\labelitemi] \textbf{\textit{min}:} Attempts to minimize the facts
661 specified in the \qty{facts\_override} argument to obtain a simpler proof
662 involving fewer facts. The options and goal number are as for \textit{run}.
664 \item[\labelitemi] \textbf{\textit{messages}:} Redisplays recent messages issued
665 by Sledgehammer. This allows you to examine results that might have been lost
666 due to Sledgehammer's asynchronous nature. The \qty{num} argument specifies a
667 limit on the number of messages to display (10 by default).
669 \item[\labelitemi] \textbf{\textit{supported\_provers}:} Prints the list of
670 automatic provers supported by Sledgehammer. See \S\ref{installation} and
671 \S\ref{mode-of-operation} for more information on how to install automatic
674 \item[\labelitemi] \textbf{\textit{running\_provers}:} Prints information about
675 currently running automatic provers, including elapsed runtime and remaining
678 \item[\labelitemi] \textbf{\textit{kill}:} Terminates all running
679 threads (automatic provers and machine learners).
681 \item[\labelitemi] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote
682 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}.
685 In addition, the following subcommands provide finer control over machine
689 \item[\labelitemi] \textbf{\textit{unlearn}:} Resets MaSh, erasing any
692 \item[\labelitemi] \textbf{\textit{learn\_isar}:} Invokes MaSh on the current
693 theory to process all the available facts, learning from their Isabelle/Isar
694 proofs. This happens automatically at Sledgehammer invocations if the
695 \textit{learn} option (\S\ref{relevance-filter}) is enabled.
697 \item[\labelitemi] \textbf{\textit{learn\_prover}:} Invokes MaSh on the current
698 theory to process all the available facts, learning from proofs generated by
699 automatic provers. The prover to use and its timeout can be set using the
700 \textit{prover} (\S\ref{mode-of-operation}) and \textit{timeout}
701 (\S\ref{timeouts}) options. It is recommended to perform learning using an
702 efficient first-order ATP (such as E, SPASS, and Vampire) as opposed to a
703 higher-order ATP or an SMT solver.
705 \item[\labelitemi] \textbf{\textit{relearn\_isar}:} Same as \textit{unlearn}
706 followed by \textit{learn\_isar}.
708 \item[\labelitemi] \textbf{\textit{relearn\_prover}:} Same as \textit{unlearn}
709 followed by \textit{learn\_prover}.
711 \item[\labelitemi] \textbf{\textit{running\_learners}:} Prints information about
712 currently running machine learners, including elapsed runtime and remaining
715 \item[\labelitemi] \textbf{\textit{kill\_learners}:} Terminates all running
719 Sledgehammer's behavior can be influenced by various \qty{options}, which can be
720 specified in brackets after the \textbf{sledgehammer} command. The
721 \qty{options} are a list of key--value pairs of the form ``[$k_1 = v_1,
722 \ldots, k_n = v_n$]''. For Boolean options, ``= \textit{true\/}'' is optional.
726 \textbf{sledgehammer} [\textit{isar\_proofs}, \,\textit{timeout} = 120]
729 Default values can be set using \textbf{sledgehammer\_\allowbreak params}:
732 \textbf{sledgehammer\_params} \qty{options}
735 The supported options are described in \S\ref{option-reference}.
737 The \qty{facts\_override} argument lets you alter the set of facts that go
738 through the relevance filter. It may be of the form ``(\qty{facts})'', where
739 \qty{facts} is a space-separated list of Isabelle facts (theorems, local
740 assumptions, etc.), in which case the relevance filter is bypassed and the given
741 facts are used. It may also be of the form ``(\textit{add}:\ \qty{facts\/_{\mathrm{1}}})'',
742 ``(\textit{del}:\ \qty{facts\/_{\mathrm{2}}})'', or ``(\textit{add}:\ \qty{facts\/_{\mathrm{1}}}\
743 \textit{del}:\ \qty{facts\/_{\mathrm{2}}})'', where the relevance filter is instructed to
744 proceed as usual except that it should consider \qty{facts\/_{\mathrm{1}}}
745 highly-relevant and \qty{facts\/_{\mathrm{2}}} fully irrelevant.
747 You can instruct Sledgehammer to run automatically on newly entered theorems by
748 enabling the ``Auto Sledgehammer'' option in Proof General's ``Isabelle'' menu.
749 For automatic runs, only the first prover set using \textit{provers}
750 (\S\ref{mode-of-operation}) is considered, fewer facts are passed to the prover,
751 \textit{slice} (\S\ref{mode-of-operation}) is disabled, \textit{strict}
752 (\S\ref{problem-encoding}) is enabled, \textit{verbose} (\S\ref{output-format})
753 and \textit{debug} (\S\ref{output-format}) are disabled, and \textit{timeout}
754 (\S\ref{timeouts}) is superseded by the ``Auto Tools Time Limit'' in Proof
755 General's ``Isabelle'' menu. Sledgehammer's output is also more concise.
759 The \textit{metis} proof method has the syntax
762 \textbf{\textit{metis}}~(\qty{options})${}^?$~\qty{facts}${}^?$
765 where \qty{facts} is a list of arbitrary facts and \qty{options} is a
766 comma-separated list consisting of at most one $\lambda$ translation scheme
767 specification with the same semantics as Sledgehammer's \textit{lam\_trans}
768 option (\S\ref{problem-encoding}) and at most one type encoding specification
769 with the same semantics as Sledgehammer's \textit{type\_enc} option
770 (\S\ref{problem-encoding}).
772 The supported $\lambda$ translation schemes are \textit{hide\_lams},
773 \textit{lifting}, and \textit{combs} (the default).
775 All the untyped type encodings listed in \S\ref{problem-encoding} are supported.
776 For convenience, the following aliases are provided:
778 \item[\labelitemi] \textbf{\textit{full\_types}:} Alias for \textit{poly\_guards\_query}.
779 \item[\labelitemi] \textbf{\textit{partial\_types}:} Alias for \textit{poly\_args}.
780 \item[\labelitemi] \textbf{\textit{no\_types}:} Alias for \textit{erased}.
783 \section{Option Reference}
784 \label{option-reference}
789 \def\flushitem#1{\item[]\noindent\kern-\leftmargin \textbf{#1}}
790 \def\optrueonly#1{\flushitem{\textit{#1} $\bigl[$= \textit{true}$\bigr]$\enskip}\nopagebreak\\[\parskip]}
791 \def\optrue#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\enskip \defl\textit{true}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
792 \def\opfalse#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\enskip \defl\textit{false}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
793 \def\opsmart#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\enskip \defl\textit{smart}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
794 \def\opsmartx#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\enskip \defl\textit{smart}\defr\\\hbox{}\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
795 \def\opnodefault#1#2{\flushitem{\textit{#1} = \qtybf{#2}} \nopagebreak\\[\parskip]}
796 \def\opnodefaultbrk#1#2{\flushitem{$\bigl[$\textit{#1} =$\bigr]$ \qtybf{#2}} \nopagebreak\\[\parskip]}
797 \def\opdefault#1#2#3{\flushitem{\textit{#1} = \qtybf{#2}\enskip \defl\textit{#3}\defr} \nopagebreak\\[\parskip]}
798 \def\oparg#1#2#3{\flushitem{\textit{#1} \qtybf{#2} = \qtybf{#3}} \nopagebreak\\[\parskip]}
799 \def\opargbool#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}
800 \def\opargboolorsmart#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}
802 Sledgehammer's options are categorized as follows:\ mode of operation
803 (\S\ref{mode-of-operation}), problem encoding (\S\ref{problem-encoding}),
804 relevance filter (\S\ref{relevance-filter}), output format
805 (\S\ref{output-format}), authentication (\S\ref{authentication}), and timeouts
808 The descriptions below refer to the following syntactic quantities:
811 \item[\labelitemi] \qtybf{string}: A string.
812 \item[\labelitemi] \qtybf{bool\/}: \textit{true} or \textit{false}.
813 \item[\labelitemi] \qtybf{smart\_bool\/}: \textit{true}, \textit{false}, or
815 \item[\labelitemi] \qtybf{int\/}: An integer.
816 %\item[\labelitemi] \qtybf{float\/}: A floating-point number (e.g., 2.5).
817 \item[\labelitemi] \qtybf{float\_pair\/}: A pair of floating-point numbers
819 \item[\labelitemi] \qtybf{smart\_int\/}: An integer or \textit{smart}.
820 \item[\labelitemi] \qtybf{float\_or\_none\/}: A floating-point number (e.g., 60 or
821 0.5) expressing a number of seconds, or the keyword \textit{none} ($\infty$
825 Default values are indicated in curly brackets (\textrm{\{\}}). Boolean options
826 have a negative counterpart (e.g., \textit{blocking} vs.\
827 \textit{non\_blocking}). When setting Boolean options or their negative
828 counterparts, ``= \textit{true\/}'' may be omitted.
830 \subsection{Mode of Operation}
831 \label{mode-of-operation}
834 \opnodefaultbrk{provers}{string}
835 Specifies the automatic provers to use as a space-separated list (e.g.,
836 ``\textit{e}~\textit{spass}~\textit{remote\_vampire\/}'').
837 Provers can be run locally or remotely; see \S\ref{installation} for
838 installation instructions.
840 The following local provers are supported:
844 \item[\labelitemi] \textbf{\textit{alt\_ergo}:} Alt-Ergo is a polymorphic
845 SMT solver developed by Bobot et al.\ \cite{alt-ergo}.
846 It supports the TPTP polymorphic typed first-order format (TFF1) via Why3
847 \cite{why3}. It is included for experimental purposes. To use Alt-Ergo, set the
848 environment variable \texttt{WHY3\_HOME} to the directory that contains the
849 \texttt{why3} executable. Sledgehammer has been tested with Alt-Ergo 0.93 and an
850 unidentified development version of Why3.
852 \item[\labelitemi] \textbf{\textit{cvc3}:} CVC3 is an SMT solver developed by
853 Clark Barrett, Cesare Tinelli, and their colleagues \cite{cvc3}. To use CVC3,
854 set the environment variable \texttt{CVC3\_SOLVER} to the complete path of the
855 executable, including the file name, or install the prebuilt CVC3 package from
856 \download. Sledgehammer has been tested with version 2.2 and 2.4.1.
858 \item[\labelitemi] \textbf{\textit{e}:} E is a first-order resolution prover
859 developed by Stephan Schulz \cite{schulz-2002}. To use E, set the environment
860 variable \texttt{E\_HOME} to the directory that contains the \texttt{eproof}
861 executable and \texttt{E\_VERSION} to the version number (e.g., ``1.4''), or
862 install the prebuilt E package from \download. Sledgehammer has been tested with
865 \item[\labelitemi] \textbf{\textit{e\_males}:} E-MaLeS is a metaprover developed
866 by Daniel K\"uhlwein that implements strategy scheduling on top of E. To use
867 E-MaLeS, set the environment variable \texttt{E\_MALES\_HOME} to the directory
868 that contains the \texttt{emales.py} script. Sledgehammer has been tested with
871 \item[\labelitemi] \textbf{\textit{iprover}:} iProver is a pure
872 instantiation-based prover developed by Konstantin Korovin \cite{korovin-2009}.
873 To use iProver, set the environment variable \texttt{IPROVER\_HOME} to the
874 directory that contains the \texttt{iprover} and \texttt{vclausify\_rel}
875 executables. Sledgehammer has been tested with version 0.99.
877 \item[\labelitemi] \textbf{\textit{iprover\_eq}:} iProver-Eq is an
878 instantiation-based prover with native support for equality developed by
879 Konstantin Korovin and Christoph Sticksel \cite{korovin-sticksel-2010}. To use
880 iProver-Eq, set the environment variable \texttt{IPROVER\_EQ\_HOME} to the
881 directory that contains the \texttt{iprover-eq} and \texttt{vclausify\_rel}
882 executables. Sledgehammer has been tested with version 0.8.
884 \item[\labelitemi] \textbf{\textit{leo2}:} LEO-II is an automatic
885 higher-order prover developed by Christoph Benzm\"uller et al.\ \cite{leo2},
886 with support for the TPTP typed higher-order syntax (THF0). To use LEO-II, set
887 the environment variable \texttt{LEO2\_HOME} to the directory that contains the
888 \texttt{leo} executable. Sledgehammer requires version 1.2.9 or above.
890 \item[\labelitemi] \textbf{\textit{metis}:} Although it is less powerful than
891 the external provers, Metis itself can be used for proof search.
893 \item[\labelitemi] \textbf{\textit{satallax}:} Satallax is an automatic
894 higher-order prover developed by Chad Brown et al.\ \cite{satallax}, with
895 support for the TPTP typed higher-order syntax (THF0). To use Satallax, set the
896 environment variable \texttt{SATALLAX\_HOME} to the directory that contains the
897 \texttt{satallax} executable. Sledgehammer requires version 2.2 or above.
899 \item[\labelitemi] \textbf{\textit{smt}:} The \textit{smt} proof method with the
900 current settings (usually:\ Z3 with proof reconstruction) can be used for proof
903 \item[\labelitemi] \textbf{\textit{spass}:} SPASS is a first-order resolution
904 prover developed by Christoph Weidenbach et al.\ \cite{weidenbach-et-al-2009}.
905 To use SPASS, set the environment variable \texttt{SPASS\_HOME} to the directory
906 that contains the \texttt{SPASS} executable and \texttt{SPASS\_VERSION} to the
907 version number (e.g., ``3.8ds''), or install the prebuilt SPASS package from
908 \download. Sledgehammer requires version 3.8ds or above.
910 \item[\labelitemi] \textbf{\textit{vampire}:} Vampire is a first-order
911 resolution prover developed by Andrei Voronkov and his colleagues
912 \cite{riazanov-voronkov-2002}. To use Vampire, set the environment variable
913 \texttt{VAMPIRE\_HOME} to the directory that contains the \texttt{vampire}
914 executable and \texttt{VAMPIRE\_VERSION} to the version number (e.g.,
915 ``1.8rev1435'', ``2.6''). Sledgehammer has been tested with versions 0.6, 1.0,
916 and 1.8. Versions strictly above 1.8 (e.g., ``1.8rev1435'') support the TPTP
917 typed first-order format (TFF0).
919 \item[\labelitemi] \textbf{\textit{yices}:} Yices is an SMT solver developed at
920 SRI \cite{yices}. To use Yices, set the environment variable
921 \texttt{YICES\_SOLVER} to the complete path of the executable, including the
922 file name. Sledgehammer has been tested with version 1.0.28.
924 \item[\labelitemi] \textbf{\textit{z3}:} Z3 is an SMT solver developed at
925 Microsoft Research \cite{z3}. To use Z3, set the environment variable
926 \texttt{Z3\_SOLVER} to the complete path of the executable, including the file
927 name, and set \texttt{Z3\_NON\_COMMERCIAL} to ``yes'' to confirm that you are a
928 noncommercial user. Sledgehammer has been tested with versions 3.0, 3.1, 3.2,
933 The following remote provers are supported:
936 \item[\labelitemi] \textbf{\textit{remote\_cvc3}:} The remote version of CVC3 runs
937 on servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to
940 \item[\labelitemi] \textbf{\textit{remote\_e}:} The remote version of E runs
941 on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.
943 \item[\labelitemi] \textbf{\textit{remote\_e\_sine}:} E-SInE is a metaprover
944 developed by Kry\v stof Hoder \cite{sine} based on E. It runs on Geoff
945 Sutcliffe's Miami servers.
947 \item[\labelitemi] \textbf{\textit{remote\_e\_tofof}:} E-ToFoF is a metaprover
948 developed by Geoff Sutcliffe \cite{tofof} based on E running on his Miami
949 servers. This ATP supports the TPTP typed first-order format (TFF0). The
950 remote version of E-ToFoF runs on Geoff Sutcliffe's Miami servers.
952 \item[\labelitemi] \textbf{\textit{remote\_iprover}:} The
953 remote version of iProver runs on Geoff Sutcliffe's Miami servers
954 \cite{sutcliffe-2000}.
956 \item[\labelitemi] \textbf{\textit{remote\_iprover\_eq}:} The
957 remote version of iProver-Eq runs on Geoff Sutcliffe's Miami servers
958 \cite{sutcliffe-2000}.
960 \item[\labelitemi] \textbf{\textit{remote\_leo2}:} The remote version of LEO-II
961 runs on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.
963 \item[\labelitemi] \textbf{\textit{remote\_satallax}:} The remote version of
964 Satallax runs on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.
966 \item[\labelitemi] \textbf{\textit{remote\_snark}:} SNARK is a first-order
967 resolution prover developed by Stickel et al.\ \cite{snark}. It supports the
968 TPTP typed first-order format (TFF0). The remote version of SNARK runs on
969 Geoff Sutcliffe's Miami servers.
971 \item[\labelitemi] \textbf{\textit{remote\_vampire}:} The remote version of
972 Vampire runs on Geoff Sutcliffe's Miami servers.
974 \item[\labelitemi] \textbf{\textit{remote\_waldmeister}:} Waldmeister is a unit
975 equality prover developed by Hillenbrand et al.\ \cite{waldmeister}. It can be
976 used to prove universally quantified equations using unconditional equations,
977 corresponding to the TPTP CNF UEQ division. The remote version of Waldmeister
978 runs on Geoff Sutcliffe's Miami servers.
980 \item[\labelitemi] \textbf{\textit{remote\_z3}:} The remote version of Z3 runs on
981 servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to
985 By default, Sledgehammer runs a selection of CVC3, E, E-SInE, SPASS, Vampire,
986 Yices, Z3, and (if appropriate) Waldmeister in parallel---either locally or
987 remotely, depending on the number of processor cores available. For historical
988 reasons, the default value of this option can be overridden using the option
989 ``Sledgehammer: Provers'' in Proof General's ``Isabelle'' menu.
991 It is generally a good idea to run several provers in parallel. Running E,
992 SPASS, and Vampire for 5~seconds yields a similar success rate to running the
993 most effective of these for 120~seconds \cite{boehme-nipkow-2010}.
995 For the \textit{min} subcommand, the default prover is \textit{metis}. If
996 several provers are set, the first one is used.
998 \opnodefault{prover}{string}
999 Alias for \textit{provers}.
1001 \opfalse{blocking}{non\_blocking}
1002 Specifies whether the \textbf{sledgehammer} command should operate
1003 synchronously. The asynchronous (non-blocking) mode lets the user start proving
1004 the putative theorem manually while Sledgehammer looks for a proof, but it can
1005 also be more confusing. Irrespective of the value of this option, Sledgehammer
1006 is always run synchronously for the new jEdit-based user interface or if
1007 \textit{debug} (\S\ref{output-format}) is enabled.
1009 \optrue{slice}{dont\_slice}
1010 Specifies whether the time allocated to a prover should be sliced into several
1011 segments, each of which has its own set of possibly prover-dependent options.
1012 For SPASS and Vampire, the first slice tries the fast but incomplete
1013 set-of-support (SOS) strategy, whereas the second slice runs without it. For E,
1014 up to three slices are tried, with different weighted search strategies and
1015 number of facts. For SMT solvers, several slices are tried with the same options
1016 each time but fewer and fewer facts. According to benchmarks with a timeout of
1017 30 seconds, slicing is a valuable optimization, and you should probably leave it
1018 enabled unless you are conducting experiments. This option is implicitly
1019 disabled for (short) automatic runs.
1022 {\small See also \textit{verbose} (\S\ref{output-format}).}
1024 \opsmart{minimize}{dont\_minimize}
1025 Specifies whether the minimization tool should be invoked automatically after
1026 proof search. By default, automatic minimization takes place only if
1027 it can be done in a reasonable amount of time (as determined by
1028 the number of facts in the original proof and the time it took to find or
1029 preplay it) or the proof involves an unreasonably large number of facts.
1032 {\small See also \textit{preplay\_timeout} (\S\ref{timeouts})
1033 and \textit{dont\_preplay} (\S\ref{timeouts}).}
1035 \opfalse{overlord}{no\_overlord}
1036 Specifies whether Sledgehammer should put its temporary files in
1037 \texttt{\$ISA\-BELLE\_\allowbreak HOME\_\allowbreak USER}, which is useful for
1038 debugging Sledgehammer but also unsafe if several instances of the tool are run
1039 simultaneously. The files are identified by the prefixes \texttt{prob\_} and
1040 \texttt{mash\_}; you may safely remove them after Sledgehammer has run.
1043 {\small See also \textit{debug} (\S\ref{output-format}).}
1046 \subsection{Relevance Filter}
1047 \label{relevance-filter}
1050 \opdefault{fact\_filter}{string}{smart}
1051 Specifies the relevance filter to use. The following filters are available:
1054 \item[\labelitemi] \textbf{\textit{mepo}:}
1055 The traditional memoryless MePo relevance filter.
1057 \item[\labelitemi] \textbf{\textit{mash}:}
1058 The memoryful MaSh machine learner. MaSh relies on the external Python program
1059 \texttt{mash.py}, which is part of Isabelle. To enable MaSh, set the environment
1060 variable \texttt{MASH} to \texttt{yes}. Persistent data is stored in the
1061 directory \texttt{\$ISABELLE\_HOME\_USER/mash}.
1063 \item[\labelitemi] \textbf{\textit{mesh}:} A combination of MePo and MaSh.
1065 \item[\labelitemi] \textbf{\textit{smart}:} Use MeSh if MaSh is
1066 enabled and the target prover is an ATP; otherwise, use MePo.
1069 \opdefault{max\_facts}{smart\_int}{smart}
1070 Specifies the maximum number of facts that may be returned by the relevance
1071 filter. If the option is set to \textit{smart}, it is set to a value that was
1072 empirically found to be appropriate for the prover. Typical values range between
1075 \opdefault{fact\_thresholds}{float\_pair}{\upshape 0.45~0.85}
1076 Specifies the thresholds above which facts are considered relevant by the
1077 relevance filter. The first threshold is used for the first iteration of the
1078 relevance filter and the second threshold is used for the last iteration (if it
1079 is reached). The effective threshold is quadratically interpolated for the other
1080 iterations. Each threshold ranges from 0 to 1, where 0 means that all theorems
1081 are relevant and 1 only theorems that refer to previously seen constants.
1083 \optrue{learn}{dont\_learn}
1084 Specifies whether MaSh should be run automatically by Sledgehammer to learn the
1085 available theories (and hence provide more accurate results). Learning only
1086 takes place if MaSh is enabled.
1088 \opdefault{max\_new\_mono\_instances}{int}{smart}
1089 Specifies the maximum number of monomorphic instances to generate beyond
1090 \textit{max\_facts}. The higher this limit is, the more monomorphic instances
1091 are potentially generated. Whether monomorphization takes place depends on the
1092 type encoding used. If the option is set to \textit{smart}, it is set to a value
1093 that was empirically found to be appropriate for the prover. For most provers,
1097 {\small See also \textit{type\_enc} (\S\ref{problem-encoding}).}
1099 \opdefault{max\_mono\_iters}{int}{smart}
1100 Specifies the maximum number of iterations for the monomorphization fixpoint
1101 construction. The higher this limit is, the more monomorphic instances are
1102 potentially generated. Whether monomorphization takes place depends on the
1103 type encoding used. If the option is set to \textit{smart}, it is set to a value
1104 that was empirically found to be appropriate for the prover. For most provers,
1108 {\small See also \textit{type\_enc} (\S\ref{problem-encoding}).}
1111 \subsection{Problem Encoding}
1112 \label{problem-encoding}
1114 \newcommand\comb[1]{\const{#1}}
1117 \opdefault{lam\_trans}{string}{smart}
1118 Specifies the $\lambda$ translation scheme to use in ATP problems. The supported
1119 translation schemes are listed below:
1122 \item[\labelitemi] \textbf{\textit{hide\_lams}:} Hide the $\lambda$-abstractions
1123 by replacing them by unspecified fresh constants, effectively disabling all
1124 reasoning under $\lambda$-abstractions.
1126 \item[\labelitemi] \textbf{\textit{lifting}:} Introduce a new
1127 supercombinator \const{c} for each cluster of $n$~$\lambda$-abstractions,
1128 defined using an equation $\const{c}~x_1~\ldots~x_n = t$ ($\lambda$-lifting).
1130 \item[\labelitemi] \textbf{\textit{combs}:} Rewrite lambdas to the Curry
1131 combinators (\comb{I}, \comb{K}, \comb{S}, \comb{B}, \comb{C}). Combinators
1132 enable the ATPs to synthesize $\lambda$-terms but tend to yield bulkier formulas
1133 than $\lambda$-lifting: The translation is quadratic in the worst case, and the
1134 equational definitions of the combinators are very prolific in the context of
1137 \item[\labelitemi] \textbf{\textit{combs\_and\_lifting}:} Introduce a new
1138 supercombinator \const{c} for each cluster of $\lambda$-abstractions and characterize it both using a
1139 lifted equation $\const{c}~x_1~\ldots~x_n = t$ and via Curry combinators.
1141 \item[\labelitemi] \textbf{\textit{combs\_or\_lifting}:} For each cluster of
1142 $\lambda$-abstractions, heuristically choose between $\lambda$-lifting and Curry
1145 \item[\labelitemi] \textbf{\textit{keep\_lams}:}
1146 Keep the $\lambda$-abstractions in the generated problems. This is available
1147 only with provers that support the THF0 syntax.
1149 \item[\labelitemi] \textbf{\textit{smart}:} The actual translation scheme used
1150 depends on the ATP and should be the most efficient scheme for that ATP.
1153 For SMT solvers, the $\lambda$ translation scheme is always \textit{lifting},
1154 irrespective of the value of this option.
1156 \opsmartx{uncurried\_aliases}{no\_uncurried\_aliases}
1157 Specifies whether fresh function symbols should be generated as aliases for
1158 applications of curried functions in ATP problems.
1160 \opdefault{type\_enc}{string}{smart}
1161 Specifies the type encoding to use in ATP problems. Some of the type encodings
1162 are unsound, meaning that they can give rise to spurious proofs
1163 (unreconstructible using \textit{metis}). The type encodings are
1164 listed below, with an indication of their soundness in parentheses.
1165 An asterisk (*) indicates that the encoding is slightly incomplete for
1166 reconstruction with \textit{metis}, unless the \emph{strict} option (described
1170 \item[\labelitemi] \textbf{\textit{erased} (unsound):} No type information is
1171 supplied to the ATP, not even to resolve overloading. Types are simply erased.
1173 \item[\labelitemi] \textbf{\textit{poly\_guards} (sound):} Types are encoded using
1174 a predicate \const{g}$(\tau, t)$ that guards bound
1175 variables. Constants are annotated with their types, supplied as extra
1176 arguments, to resolve overloading.
1178 \item[\labelitemi] \textbf{\textit{poly\_tags} (sound):} Each term and subterm is
1179 tagged with its type using a function $\const{t\/}(\tau, t)$.
1181 \item[\labelitemi] \textbf{\textit{poly\_args} (unsound):}
1182 Like for \textit{poly\_guards} constants are annotated with their types to
1183 resolve overloading, but otherwise no type information is encoded. This
1184 is the default encoding used by the \textit{metis} command.
1188 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags} (sound); \\
1189 \textit{raw\_mono\_args} (unsound):} \\
1190 Similar to \textit{poly\_guards}, \textit{poly\_tags}, and \textit{poly\_args},
1191 respectively, but the problem is additionally monomorphized, meaning that type
1192 variables are instantiated with heuristically chosen ground types.
1193 Monomorphization can simplify reasoning but also leads to larger fact bases,
1194 which can slow down the ATPs.
1198 \textit{mono\_guards}, \textit{mono\_tags} (sound);
1199 \textit{mono\_args} (unsound):} \\
1201 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, and
1202 \textit{raw\_mono\_args}, respectively but types are mangled in constant names
1203 instead of being supplied as ground term arguments. The binary predicate
1204 $\const{g}(\tau, t)$ becomes a unary predicate
1205 $\const{g\_}\tau(t)$, and the binary function
1206 $\const{t}(\tau, t)$ becomes a unary function
1207 $\const{t\_}\tau(t)$.
1209 \item[\labelitemi] \textbf{\textit{mono\_native} (sound):} Exploits native
1210 first-order types if the prover supports the TFF0, TFF1, or THF0 syntax;
1211 otherwise, falls back on \textit{mono\_guards}. The problem is monomorphized.
1213 \item[\labelitemi] \textbf{\textit{mono\_native\_higher} (sound):} Exploits
1214 native higher-order types if the prover supports the THF0 syntax; otherwise,
1215 falls back on \textit{mono\_native} or \textit{mono\_guards}. The problem is
1218 \item[\labelitemi] \textbf{\textit{poly\_native} (sound):} Exploits native
1219 first-order polymorphic types if the prover supports the TFF1 syntax; otherwise,
1220 falls back on \textit{mono\_native}.
1224 \textit{poly\_guards}?, \textit{poly\_tags}?, \textit{raw\_mono\_guards}?, \\
1225 \textit{raw\_mono\_tags}?, \textit{mono\_guards}?, \textit{mono\_tags}?, \\
1226 \textit{mono\_native}? (sound*):} \\
1227 The type encodings \textit{poly\_guards}, \textit{poly\_tags},
1228 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, \textit{mono\_guards},
1229 \textit{mono\_tags}, and \textit{mono\_native} are fully typed and sound. For
1230 each of these, Sledgehammer also provides a lighter variant identified by a
1231 question mark (`\hbox{?}')\ that detects and erases monotonic types, notably
1232 infinite types. (For \textit{mono\_native}, the types are not actually erased
1233 but rather replaced by a shared uniform type of individuals.) As argument to the
1234 \textit{metis} proof method, the question mark is replaced by a
1235 \hbox{``\textit{\_query\/}''} suffix.
1239 \textit{poly\_guards}??, \textit{poly\_tags}??, \textit{raw\_mono\_guards}??, \\
1240 \textit{raw\_mono\_tags}??, \textit{mono\_guards}??, \textit{mono\_tags}?? \\
1242 Even lighter versions of the `\hbox{?}' encodings. As argument to the
1243 \textit{metis} proof method, the `\hbox{??}' suffix is replaced by
1244 \hbox{``\textit{\_query\_query\/}''}.
1248 \textit{poly\_guards}@, \textit{poly\_tags}@, \textit{raw\_mono\_guards}@, \\
1249 \textit{raw\_mono\_tags}@ (sound*):} \\
1250 Alternative versions of the `\hbox{??}' encodings. As argument to the
1251 \textit{metis} proof method, the `\hbox{@}' suffix is replaced by
1252 \hbox{``\textit{\_at\/}''}.
1254 \item[\labelitemi] \textbf{\textit{poly\_args}?, \textit{raw\_mono\_args}? (unsound):} \\
1255 Lighter versions of \textit{poly\_args} and \textit{raw\_mono\_args}.
1257 \item[\labelitemi] \textbf{\textit{smart}:} The actual encoding used depends on
1258 the ATP and should be the most efficient sound encoding for that ATP.
1261 For SMT solvers, the type encoding is always \textit{mono\_native}, irrespective
1262 of the value of this option.
1265 {\small See also \textit{max\_new\_mono\_instances} (\S\ref{relevance-filter})
1266 and \textit{max\_mono\_iters} (\S\ref{relevance-filter}).}
1268 \opfalse{strict}{non\_strict}
1269 Specifies whether Sledgehammer should run in its strict mode. In that mode,
1270 sound type encodings marked with an asterisk (*) above are made complete
1271 for reconstruction with \textit{metis}, at the cost of some clutter in the
1272 generated problems. This option has no effect if \textit{type\_enc} is
1273 deliberately set to an unsound encoding.
1276 \subsection{Output Format}
1277 \label{output-format}
1281 \opfalse{verbose}{quiet}
1282 Specifies whether the \textbf{sledgehammer} command should explain what it does.
1283 This option is implicitly disabled for automatic runs.
1285 \opfalse{debug}{no\_debug}
1286 Specifies whether Sledgehammer should display additional debugging information
1287 beyond what \textit{verbose} already displays. Enabling \textit{debug} also
1288 enables \textit{verbose} and \textit{blocking} (\S\ref{mode-of-operation})
1289 behind the scenes. The \textit{debug} option is implicitly disabled for
1293 {\small See also \textit{overlord} (\S\ref{mode-of-operation}).}
1295 \opfalse{isar\_proofs}{no\_isar\_proofs}
1296 Specifies whether Isar proofs should be output in addition to one-liner
1297 \textit{metis} proofs. Isar proof construction is still experimental and often
1298 fails; however, they are usually faster and sometimes more robust than
1299 \textit{metis} proofs.
1301 \opdefault{isar\_shrink}{int}{\upshape 10}
1302 Specifies the granularity of the generated Isar proofs if \textit{isar\_proofs}
1303 is enabled. A value of $n$ indicates that each Isar proof step should correspond
1304 to a group of up to $n$ consecutive proof steps in the ATP proof.
1307 \subsection{Authentication}
1308 \label{authentication}
1311 \opnodefault{expect}{string}
1312 Specifies the expected outcome, which must be one of the following:
1315 \item[\labelitemi] \textbf{\textit{some}:} Sledgehammer found a proof.
1316 \item[\labelitemi] \textbf{\textit{none}:} Sledgehammer found no proof.
1317 \item[\labelitemi] \textbf{\textit{timeout}:} Sledgehammer timed out.
1318 \item[\labelitemi] \textbf{\textit{unknown}:} Sledgehammer encountered some
1322 Sledgehammer emits an error (if \textit{blocking} is enabled) or a warning
1323 (otherwise) if the actual outcome differs from the expected outcome. This option
1324 is useful for regression testing.
1327 {\small See also \textit{blocking} (\S\ref{mode-of-operation}) and
1328 \textit{timeout} (\S\ref{timeouts}).}
1331 \subsection{Timeouts}
1335 \opdefault{timeout}{float\_or\_none}{\upshape 30}
1336 Specifies the maximum number of seconds that the automatic provers should spend
1337 searching for a proof. This excludes problem preparation and is a soft limit.
1338 For historical reasons, the default value of this option can be overridden using
1339 the option ``Sledgehammer: Time Limit'' in Proof General's ``Isabelle'' menu.
1341 \opdefault{preplay\_timeout}{float\_or\_none}{\upshape 3}
1342 Specifies the maximum number of seconds that \textit{metis} or \textit{smt}
1343 should spend trying to ``preplay'' the found proof. If this option is set to 0,
1344 no preplaying takes place, and no timing information is displayed next to the
1345 suggested \textit{metis} calls.
1348 {\small See also \textit{minimize} (\S\ref{mode-of-operation}).}
1350 \optrueonly{dont\_preplay}
1351 Alias for ``\textit{preplay\_timeout} = 0''.
1356 \bibliography{manual}{}
1357 \bibliographystyle{abbrv}