1 \documentclass[a4paper,12pt]{article}
2 \usepackage[T1]{fontenc}
5 \usepackage[english,french]{babel}
12 %\usepackage[scaled=.85]{beramono}
13 \usepackage{../../lib/texinputs/isabelle,../iman,../pdfsetup}
16 %\evensidemargin=4.6mm
23 \def\Colon{\mathord{:\mkern-1.5mu:}}
24 %\def\lbrakk{\mathopen{\lbrack\mkern-3.25mu\lbrack}}
25 %\def\rbrakk{\mathclose{\rbrack\mkern-3.255mu\rbrack}}
26 \def\lparr{\mathopen{(\mkern-4mu\mid}}
27 \def\rparr{\mathclose{\mid\mkern-4mu)}}
30 \def\undef{(\lambda x.\; \unk)}
31 %\def\unr{\textit{others}}
33 \def\Abs#1{\hbox{\rm{\flqq}}{\,#1\,}\hbox{\rm{\frqq}}}
34 \def\Q{{\smash{\lower.2ex\hbox{$\scriptstyle?$}}}}
40 \selectlanguage{english}
42 \title{\includegraphics[scale=0.5]{isabelle_sledgehammer} \\[4ex]
43 Hammering Away \\[\smallskipamount]
44 \Large A User's Guide to Sledgehammer for Isabelle/HOL}
46 Jasmin Christian Blanchette \\
47 {\normalsize Institut f\"ur Informatik, Technische Universit\"at M\"unchen} \\
54 \setlength{\parskip}{.7em plus .2em minus .1em}
55 \setlength{\parindent}{0pt}
56 \setlength{\abovedisplayskip}{\parskip}
57 \setlength{\abovedisplayshortskip}{.9\parskip}
58 \setlength{\belowdisplayskip}{\parskip}
59 \setlength{\belowdisplayshortskip}{.9\parskip}
61 % General-purpose enum environment with correct spacing
62 \newenvironment{enum}%
64 \setlength{\topsep}{.1\parskip}%
65 \setlength{\partopsep}{.1\parskip}%
66 \setlength{\itemsep}{\parskip}%
67 \advance\itemsep by-\parsep}}
70 \def\pre{\begingroup\vskip0pt plus1ex\advance\leftskip by\leftmargin
71 \advance\rightskip by\leftmargin}
72 \def\post{\vskip0pt plus1ex\endgroup}
74 \def\prew{\pre\advance\rightskip by-\leftmargin}
77 \section{Introduction}
80 Sledgehammer is a tool that applies first-order automatic theorem provers (ATPs)
81 and satisfiability-modulo-theories (SMT) solvers on the current goal. The
82 supported ATPs are E \cite{schulz-2002}, SPASS \cite{weidenbach-et-al-2009},
83 Vampire \cite{riazanov-voronkov-2002}, SInE-E \cite{sine}, SNARK
84 \cite{snark}, and ToFoF-E \cite{tofof}. The ATPs are run either locally or
85 remotely via the System\-On\-TPTP web service \cite{sutcliffe-2000}. In addition
86 to the ATPs, the SMT solvers Z3 \cite{z3} is used by default, and you can tell
87 Sledgehammer to try Yices \cite{yices} and CVC3 \cite{cvc3} as well; these
88 are run either locally or on a server in Munich.
90 The problem passed to the automatic provers consists of your current goal
91 together with a heuristic selection of hundreds of facts (theorems) from the
92 current theory context, filtered by relevance. Because jobs are run in the
93 background, you can continue to work on your proof by other means. Provers can
94 be run in parallel. Any reply (which may arrive half a minute later) will appear
95 in the Proof General response buffer.
97 The result of a successful proof search is some source text that usually (but
98 not always) reconstructs the proof within Isabelle. For ATPs, the reconstructed
99 proof relies on the general-purpose Metis prover \cite{metis}, which is fully
100 integrated into Isabelle/HOL, with explicit inferences going through the kernel.
101 Thus its results are correct by construction.
103 In this manual, we will explicitly invoke the \textbf{sledgehammer} command.
104 Sledgehammer also provides an automatic mode that can be enabled via the
105 ``Auto Sledgehammer'' option from the ``Isabelle'' menu in Proof General. In
106 this mode, Sledgehammer is run on every newly entered theorem. The time limit
107 for Auto Sledgehammer and other automatic tools can be set using the ``Auto
108 Tools Time Limit'' option.
111 \setbox\boxA=\hbox{\texttt{nospam}}
113 \newcommand\authoremail{\texttt{blan{\color{white}nospam}\kern-\wd\boxA{}chette@\allowbreak
114 in.\allowbreak tum.\allowbreak de}}
116 To run Sledgehammer, you must make sure that the theory \textit{Sledgehammer} is
117 imported---this is rarely a problem in practice since it is part of
118 \textit{Main}. Examples of Sledgehammer use can be found in Isabelle's
119 \texttt{src/HOL/Metis\_Examples} directory.
120 Comments and bug reports concerning Sledgehammer or this manual should be
121 directed to the author at \authoremail.
123 \vskip2.5\smallskipamount
125 %\textbf{Acknowledgment.} The author would like to thank Mark Summerfield for
126 %suggesting several textual improvements.
128 \section{Installation}
131 Sledgehammer is part of Isabelle, so you don't need to install it. However, it
132 relies on third-party automatic theorem provers (ATPs) and SMT solvers.
134 \subsection{Installing ATPs}
136 Currently, E, SPASS, and Vampire can be run locally; in addition, E, Vampire,
137 SInE-E, SNARK, and ToFoF-E are available remotely via System\-On\-TPTP
138 \cite{sutcliffe-2000}. If you want better performance, you should at least
139 install E and SPASS locally.
141 There are three main ways to install ATPs on your machine:
144 \item[$\bullet$] If you installed an official Isabelle package with everything
145 inside, it should already include properly setup executables for E and SPASS,
147 \footnote{Vampire's license prevents us from doing the same for this otherwise
150 \item[$\bullet$] Alternatively, you can download the Isabelle-aware E and SPASS
151 binary packages from Isabelle's download page. Extract the archives, then add a
152 line to your \texttt{\$ISABELLE\_HOME\_USER/etc/components}%
153 \footnote{The variable \texttt{\$ISABELLE\_HOME\_USER} is set by Isabelle at
154 startup. Its value can be retrieved by invoking \texttt{isabelle}
155 \texttt{getenv} \texttt{ISABELLE\_HOME\_USER} on the command line.}
156 file with the absolute
157 path to E or SPASS. For example, if the \texttt{components} does not exist yet
158 and you extracted SPASS to \texttt{/usr/local/spass-3.7}, create the
159 \texttt{components} file with the single line
162 \texttt{/usr/local/spass-3.7}
167 \item[$\bullet$] If you prefer to build E or SPASS yourself, or obtained a
168 Vampire executable from somewhere (e.g., \url{http://www.vprover.org/}),
169 set the environment variable \texttt{E\_HOME}, \texttt{SPASS\_HOME}, or
170 \texttt{VAMPIRE\_HOME} to the directory that contains the \texttt{eproof},
171 \texttt{SPASS}, or \texttt{vampire} executable. Sledgehammer has been tested
172 with E 1.0 and 1.2, SPASS 3.5 and 3.7, and Vampire 0.6 and 1.0%
173 \footnote{Following the rewrite of Vampire, the counter for version numbers was
174 reset to 0; hence the (new) Vampire versions 0.6 and 1.0 are more recent than,
176 . Since the ATPs' output formats are neither documented nor stable, other
177 versions of the ATPs might or might not work well with Sledgehammer. Ideally,
178 also set \texttt{E\_VERSION}, \texttt{SPASS\_VERSION}, or
179 \texttt{VAMPIRE\_VERSION} to the ATP's version number (e.g., ``1.2'').
182 To check whether E and SPASS are successfully installed, follow the example in
183 \S\ref{first-steps}. If the remote versions of E and SPASS are used (identified
184 by the prefix ``\emph{remote\_}''), or if the local versions fail to solve the
185 easy goal presented there, this is a sign that something is wrong with your
188 Remote ATP invocation via the SystemOnTPTP web service requires Perl with the
189 World Wide Web Library (\texttt{libwww-perl}) installed. If you must use a proxy
190 server to access the Internet, set the \texttt{http\_proxy} environment variable
191 to the proxy, either in the environment in which Isabelle is launched or in your
192 \texttt{\char`\~/\$ISABELLE\_HOME\_USER/etc/settings} file. Here are a few examples:
195 \texttt{http\_proxy=http://proxy.example.org} \\
196 \texttt{http\_proxy=http://proxy.example.org:8080} \\
197 \texttt{http\_proxy=http://joeblow:pAsSwRd@proxy.example.org}
200 \subsection{Installing SMT Solvers}
202 CVC3, Yices, and Z3 can be run locally or remotely on a Munich server. If you
203 want better performance and get the ability to replay proofs that rely on the
204 \emph{smt} proof method, you should at least install Z3 locally.
206 There are two main ways of installing SMT solvers locally.
209 \item[$\bullet$] If you installed an official Isabelle package with everything
210 inside, it should already include properly setup executables for CVC3 and Z3,
212 \footnote{Yices's license prevents us from doing the same for this otherwise
214 For Z3, you additionally need to set the environment variable
215 \texttt{Z3\_NON\_COMMERCIAL} to ``yes'' to confirm that you are a noncommercial
218 \item[$\bullet$] Otherwise, follow the instructions documented in the \emph{SMT}
219 theory (\texttt{\$ISABELLE\_HOME/src/HOL/SMT.thy}).
222 \section{First Steps}
225 To illustrate Sledgehammer in context, let us start a theory file and
226 attempt to prove a simple lemma:
229 \textbf{theory}~\textit{Scratch} \\
230 \textbf{imports}~\textit{Main} \\
231 \textbf{begin} \\[2\smallskipamount]
233 \textbf{lemma} ``$[a] = [b] \,\longleftrightarrow\, a = b$'' \\
234 \textbf{sledgehammer}
237 Instead of issuing the \textbf{sledgehammer} command, you can also find
238 Sledgehammer in the ``Commands'' submenu of the ``Isabelle'' menu in Proof
239 General or press the Emacs key sequence C-c C-a C-s.
240 Either way, Sledgehammer produces the following output after a few seconds:
244 Sledgehammer: ``\textit{e}'' for subgoal 1: \\
245 $([a] = [b]) = (a = b)$ \\
246 Try this command: \textbf{by} (\textit{metis hd.simps}). \\
247 To minimize the number of lemmas, try this: \\
248 \textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{e}] (\textit{hd.simps}). \\[3\smallskipamount]
250 Sledgehammer: ``\textit{spass}'' for subgoal 1: \\
251 $([a] = [b]) = (a = b)$ \\
252 Try this command: \textbf{by} (\textit{metis insert\_Nil last\_ConsL}). \\
253 To minimize the number of lemmas, try this: \\
254 \textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{spass}] (\textit{insert\_Nil last\_ConsL}). \\[3\smallskipamount]
256 Sledgehammer: ``\textit{vampire}'' for subgoal 1: \\
257 $([a] = [b]) = (a = b)$ \\
258 Try this command: \textbf{by} (\textit{metis eq\_commute last\_snoc}). \\
259 To minimize the number of lemmas, try this: \\
260 \textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{vampire}]~(\textit{eq\_commute last\_snoc}). \\[3\smallskipamount]
262 Sledgehammer: ``\textit{remote\_sine\_e}'' for subgoal 1: \\
263 $([a] = [b]) = (a = b)$ \\
264 Try this command: \textbf{by} (\textit{metis hd.simps}). \\
265 To minimize the number of lemmas, try this: \\
266 \textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{remote\_sine\_e}]~(\textit{hd.simps}). \\[3\smallskipamount]
268 Sledgehammer: ``\textit{remote\_z3}'' for subgoal 1: \\
269 $([a] = [b]) = (a = b)$ \\
270 Try this command: \textbf{by} (\textit{metis hd.simps}). \\
271 To minimize the number of lemmas, try this: \\
272 \textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{remote\_sine\_e}]~(\textit{hd.simps}).
275 Sledgehammer ran E, SPASS, Vampire, SInE-E, and Z3 in parallel. Depending on
276 which provers are installed and how many processor cores are available, some of
277 the provers might be missing or present with a \textit{remote\_} prefix.
279 For each successful prover, Sledgehammer gives a one-liner proof that uses the
280 \textit{metis} or \textit{smt} method. You can click the proof to insert it into
281 the theory text. You can click the ``\textbf{sledgehammer} \textit{minimize}''
282 command if you want to look for a shorter (and probably faster) proof. But here
283 the proof found by E looks perfect, so click it to finish the proof.
285 You can ask Sledgehammer for an Isar text proof by passing the
286 \textit{isar\_proof} option (\S\ref{output-format}):
289 \textbf{sledgehammer} [\textit{isar\_proof}]
292 When Isar proof construction is successful, it can yield proofs that are more
293 readable and also faster than the \textit{metis} one-liners. This feature is
294 experimental and is only available for ATPs.
299 This section presents a few hints that should help you get the most out of
300 Sledgehammer and Metis. Frequently (and infrequently) asked questions are
301 answered in \S\ref{frequently-asked-questions}.
303 \newcommand\point[1]{{\sl\bfseries#1}\par\nopagebreak}
305 \point{Presimplify the goal}
307 For best results, first simplify your problem by calling \textit{auto} or at
308 least \textit{safe} followed by \textit{simp\_all}. None of the ATPs contain
309 arithmetic decision procedures. They are not especially good at heavy rewriting,
310 but because they regard equations as undirected, they often prove theorems that
311 require the reverse orientation of a \textit{simp} rule. Higher-order problems
312 can be tackled, but the success rate is better for first-order problems. Hence,
313 you may get better results if you first simplify the problem to remove
314 higher-order features.
316 \point{Make sure at least E, SPASS, Vampire, and Z3 are installed}
318 Locally installed provers are faster and more reliable than those running on
319 servers. See \S\ref{installation} for details on how to install them.
321 \point{Familiarize yourself with the most important options}
323 Sledgehammer's options are fully documented in \S\ref{command-syntax}. Many of
324 the options are very specialized, but serious users of the tool should at least
325 familiarize themselves with the following options:
328 \item[$\bullet$] \textbf{\textit{provers}} (\S\ref{mode-of-operation}) specifies
329 the automatic provers (ATPs and SMT solvers) that should be run whenever
330 Sledgehammer is invoked (e.g., ``\textit{provers}~= \textit{e spass
333 \item[$\bullet$] \textbf{\textit{timeout}} (\S\ref{mode-of-operation}) controls
334 the provers' time limit. It is set to 30 seconds, but since Sledgehammer runs
335 asynchronously you should not hesitate to raise this limit to 60 or 120 seconds
336 if you are the kind of user who can think clearly while ATPs are active.
338 \item[$\bullet$] \textbf{\textit{full\_types}} (\S\ref{problem-encoding})
339 specifies whether type-sound encodings should be used. By default, Sledgehammer
340 employs a mixture of type-sound and type-unsound encodings, occasionally
341 yielding unsound ATP proofs. (SMT solver proofs should always be sound, although
342 we occasionally find soundness bugs in the solvers.)
344 \item[$\bullet$] \textbf{\textit{max\_relevant}} (\S\ref{relevance-filter})
345 specifies the maximum number of facts that should be passed to the provers. By
346 default, the value is prover-dependent but varies between about 150 and 1000. If
347 the provers time out, you can try lowering this value to, say, 100 or 50 and see
350 \item[$\bullet$] \textbf{\textit{isar\_proof}} (\S\ref{output-format}) specifies
351 that Isar proofs should be generated, instead of one-liner Metis proofs. The
352 length of the Isar proofs can be controlled by setting
353 \textit{isar\_shrink\_factor} (\S\ref{output-format}).
356 Options can be set globally using \textbf{sledgehammer\_params}
357 (\S\ref{command-syntax}). Fact selection can be influenced by specifying
358 ``$(\textit{add}{:}~\textit{my\_facts})$'' after the \textbf{sledgehammer}
359 call to ensure that certain facts are included, or simply
360 ``$(\textit{my\_facts})$'' to force Sledgehammer to run only with
361 $\textit{my\_facts}$.
363 \section{Frequently Asked Questions}
364 \label{frequently-asked-questions}
366 \point{Why does Metis fail to reconstruct the proof?}
368 There are many reasons. If Metis runs seemingly forever, that is a sign that the
369 proof is too difficult for it. Metis is complete, so it should eventually find
370 it, but that's little consolation. There are several possible solutions:
373 \item[$\bullet$] Try the \textit{isar\_proof} option (\S\ref{output-format}) to
374 obtain a step-by-step Isar proof where each step is justified by Metis. Since
375 the steps are fairly small, Metis is more likely to be able to replay them.
377 \item[$\bullet$] Try the \textit{smt} proof method instead of \textit{metis}. It
378 is usually stronger, but you need to have Z3 available to replay the proofs,
379 trust the SMT solver, or use certificates. See the documentation in the
380 \emph{SMT} theory (\texttt{\$ISABELLE\_HOME/src/HOL/SMT.thy}) for details.
382 \item[$\bullet$] Try the \textit{blast} or \textit{auto} proof methods, passing
383 facts via \textbf{unfolding}, \textbf{using}, \textit{intro}{:},
384 \textit{elim}{:}, \textit{dest}{:}, or \textit{simp}{:}, as appropriate.
387 In some rare cases, Metis fails fairly quickly. This usually indicates that
388 Sledgehammer found a type-incorrect proof. Sledgehammer erases some type
389 information to speed up the search. Try Sledgehammer again with full type
390 information: \textit{full\_types} (\S\ref{problem-encoding}), or choose a
391 specific type encoding with \textit{type\_sys} (\S\ref{problem-encoding}). Older
392 versions of Sledgehammer were frequent victims of this problem. Now this should
393 very seldom be an issue, but if you notice many unsound proofs, contact the
394 author at \authoremail.
396 \point{How can I tell whether a Sledgehammer proof is sound?}
398 First, if \emph{metis} (or \emph{metisFT}) can reconstruct it, the proof is
399 sound (modulo soundness of Isabelle's inference kernel). If it fails or runs
400 seemingly forever, you can try
403 \textbf{apply}~\textbf{--} \\
404 \textbf{sledgehammer} [\textit{type\_sys} = \textit{poly\_tags}] (\textit{metis\_facts})
407 where \textit{metis\_facts} is the list of facts appearing in the suggested
408 Metis call. The automatic provers should be able to refind the proof very
409 quickly if it is sound, and the \textit{type\_sys} $=$ \textit{poly\_tags}
410 option (\S\ref{problem-encoding}) ensures that no unsound proofs are found.
412 The \textit{full\_types} option (\S\ref{problem-encoding}) can also be used
413 here, but it is unsound in extremely rare degenerate cases such as the
417 \textbf{lemma} ``$\forall x\> y\Colon{'}a.\ x = y \,\Longrightarrow \exists f\> g\Colon\mathit{nat} \Rightarrow {'}a.\ f \not= g$'' \\
418 \textbf{sledgehammer} [\textit{full\_types}] (\textit{nat.distinct\/}(1))
421 \point{How does Sledgehammer select the facts that should be passed to the
424 Briefly, the relevance filter assigns a score to every available fact (lemma,
425 theorem, definition, or axiom)\ based upon how many constants that fact shares
426 with the conjecture; this process iterates to include facts relevant to those
427 just accepted, but with a decay factor to ensure termination. The constants are
428 weighted to give unusual ones greater significance. The relevance filter copes
429 best when the conjecture contains some unusual constants; if all the constants
430 are common, it is unable to discriminate among the hundreds of facts that are
431 picked up. The relevance filter is also memoryless: It has no information about
432 how many times a particular fact has been used in a proof, and it cannot learn.
434 The number of facts included in a problem varies from prover to prover, since
435 some provers get overwhelmed quicker than others. You can show the number of
436 facts given using the \textit{verbose} option (\S\ref{output-format}) and the
437 actual facts using \textit{debug} (\S\ref{output-format}).
439 Sledgehammer is good at finding short proofs combining a handful of existing
440 lemmas. If you are looking for longer proofs, you must typically restrict the
441 number of facts, by setting the \textit{max\_relevant} option
442 (\S\ref{relevance-filter}) to, say, 50 or 100.
444 \point{Why are the Isar proofs generated by Sledgehammer so ugly?}
446 The current implementation is experimental and explodes exponentially in the
447 worst case. Work on a new implementation has begun. There is a large body of
448 research into transforming resolution proofs into natural deduction proofs (such
449 as Isar proofs), which we hope to leverage. In the meantime, a workaround is to
450 set the \textit{isar\_shrink\_factor} option (\S\ref{output-format}) to a larger
451 value or to try several provers and keep the nicest-looking proof.
453 \point{Should I let Sledgehammer minimize the number of lemmas?}
455 In general, minimization is a good idea, because proofs involving fewer lemmas
456 tend to be shorter as well, and hence easier to re-find by Metis. But the
457 opposite is sometimes the case.
459 \point{Why does the minimizer sometimes starts of its own?}
461 There are two scenarios in which this can happen. First, some provers (e.g.,
462 CVC3 and Yices) do not provide proofs or provide incomplete proofs. The
463 minimizer is then invoked to find out which facts are actually needed from the
464 (large) set of facts that was initinally given to the prover. Second, if a
465 prover returns a proof with lots of facts, the minimizer is invoked
466 automatically since Metis is unlikely to refind the proof.
468 \point{What is metisFT?}
470 The \textit{metisFT} proof method is the fully-typed version of Metis. It is
471 much slower than \textit{metis}, but the proof search is fully typed, and it
472 also includes more powerful rules such as the axiom ``$x = \mathit{True}
473 \mathrel{\lor} x = \mathit{False}$'' for reasoning in higher-order places (e.g.,
474 in set comprehensions). The method kicks in automatically as a fallback when
475 \textit{metis} fails, and it is sometimes generated by Sledgehammer instead of
476 \textit{metis} if the proof obviously requires type information.
478 If you see the warning
482 Metis: Falling back on ``\textit{metisFT}''.
485 in a successful Metis proof, you can advantageously replace the \textit{metis}
486 call with \textit{metisFT}.
488 \point{I got a strange error from Sledgehammer---what should I do?}
490 Sledgehammer tries to give informative error messages. Please report any strange
491 error to the author at \authoremail. This applies double if you get the message
495 The prover found a type-unsound proof involving ``\textit{foo}'',
496 ``\textit{bar}'', ``\textit{baz}'' even though a supposedly type-sound encoding
497 was used (or, less likely, your axioms are inconsistent). You might want to
498 report this to the Isabelle developers.
501 \point{Auto can solve it---why not Sledgehammer?}
503 Problems can be easy for \textit{auto} and difficult for automatic provers, but
504 the reverse is also true, so don't be discouraged if your first attempts fail.
505 Because the system refers to all theorems known to Isabelle, it is particularly
506 suitable when your goal has a short proof from lemmas that you don't know about.
508 \point{Why are there so many options?}
510 Sledgehammer's philosophy should work out of the box, without user guidance.
511 Many of the options are meant to be used mostly by the Sledgehammer developers
512 for experimentation purposes. Of course, feel free to experiment with them if
515 \section{Command Syntax}
516 \label{command-syntax}
518 Sledgehammer can be invoked at any point when there is an open goal by entering
519 the \textbf{sledgehammer} command in the theory file. Its general syntax is as
523 \textbf{sledgehammer} \textit{subcommand\/$^?$ options\/$^?$ facts\_override\/$^?$ num\/$^?$}
526 For convenience, Sledgehammer is also available in the ``Commands'' submenu of
527 the ``Isabelle'' menu in Proof General or by pressing the Emacs key sequence C-c
528 C-a C-s. This is equivalent to entering the \textbf{sledgehammer} command with
529 no arguments in the theory text.
531 In the general syntax, the \textit{subcommand} may be any of the following:
534 \item[$\bullet$] \textbf{\textit{run} (the default):} Runs Sledgehammer on
535 subgoal number \textit{num} (1 by default), with the given options and facts.
537 \item[$\bullet$] \textbf{\textit{minimize}:} Attempts to minimize the provided facts
538 (specified in the \textit{facts\_override} argument) to obtain a simpler proof
539 involving fewer facts. The options and goal number are as for \textit{run}.
541 \item[$\bullet$] \textbf{\textit{messages}:} Redisplays recent messages issued
542 by Sledgehammer. This allows you to examine results that might have been lost
543 due to Sledgehammer's asynchronous nature. The \textit{num} argument specifies a
544 limit on the number of messages to display (5 by default).
546 \item[$\bullet$] \textbf{\textit{supported\_provers}:} Prints the list of
547 automatic provers supported by Sledgehammer. See \S\ref{installation} and
548 \S\ref{mode-of-operation} for more information on how to install automatic
551 \item[$\bullet$] \textbf{\textit{running\_provers}:} Prints information about
552 currently running automatic provers, including elapsed runtime and remaining
555 \item[$\bullet$] \textbf{\textit{kill\_provers}:} Terminates all running
558 \item[$\bullet$] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote
559 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}.
562 Sledgehammer's behavior can be influenced by various \textit{options}, which can
563 be specified in brackets after the \textbf{sledgehammer} command. The
564 \textit{options} are a list of key--value pairs of the form ``[$k_1 = v_1,
565 \ldots, k_n = v_n$]''. For Boolean options, ``= \textit{true}'' is optional. For
569 \textbf{sledgehammer} [\textit{isar\_proof}, \,\textit{timeout} = 120$\,s$]
572 Default values can be set using \textbf{sledgehammer\_\allowbreak params}:
575 \textbf{sledgehammer\_params} \textit{options}
578 The supported options are described in \S\ref{option-reference}.
580 The \textit{facts\_override} argument lets you alter the set of facts that go
581 through the relevance filter. It may be of the form ``(\textit{facts})'', where
582 \textit{facts} is a space-separated list of Isabelle facts (theorems, local
583 assumptions, etc.), in which case the relevance filter is bypassed and the given
584 facts are used. It may also be of the form ``(\textit{add}:\ \textit{facts}$_1$)'',
585 ``(\textit{del}:\ \textit{facts}$_2$)'', or ``(\textit{add}:\ \textit{facts}$_1$\
586 \textit{del}:\ \textit{facts}$_2$)'', where the relevance filter is instructed to
587 proceed as usual except that it should consider \textit{facts}$_1$
588 highly-relevant and \textit{facts}$_2$ fully irrelevant.
590 You can instruct Sledgehammer to run automatically on newly entered theorems by
591 enabling the ``Auto Sledgehammer'' option from the ``Isabelle'' menu in Proof
592 General. For automatic runs, only the first prover set using \textit{provers}
593 (\S\ref{mode-of-operation}) is considered, fewer facts are passed to the prover,
594 \textit{slicing} (\S\ref{mode-of-operation}) is disabled, \textit{timeout}
595 (\S\ref{mode-of-operation}) is superseded by the ``Auto Tools Time Limit'' in
596 Proof General's ``Isabelle'' menu, \textit{full\_types}
597 (\S\ref{problem-encoding}) is enabled, and \textit{verbose}
598 (\S\ref{output-format}) and \textit{debug} (\S\ref{output-format}) are disabled.
599 Sledgehammer's output is also more concise.
601 \section{Option Reference}
602 \label{option-reference}
604 \def\flushitem#1{\item[]\noindent\kern-\leftmargin \textbf{#1}}
605 \def\qty#1{$\left<\textit{#1}\right>$}
606 \def\qtybf#1{$\mathbf{\left<\textbf{\textit{#1}}\right>}$}
607 \def\optrue#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\quad [\textit{true}]\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
608 \def\opfalse#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\quad [\textit{false}]\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
609 \def\opsmart#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool\_or\_smart}$\bigr]$\quad [\textit{smart}]\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
610 \def\opsmartx#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool\_or\_smart}$\bigr]$\quad [\textit{smart}]\hfill\\\hbox{}\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
611 \def\opnodefault#1#2{\flushitem{\textit{#1} = \qtybf{#2}} \nopagebreak\\[\parskip]}
612 \def\opdefault#1#2#3{\flushitem{\textit{#1} = \qtybf{#2}\quad [\textit{#3}]} \nopagebreak\\[\parskip]}
613 \def\oparg#1#2#3{\flushitem{\textit{#1} \qtybf{#2} = \qtybf{#3}} \nopagebreak\\[\parskip]}
614 \def\opargbool#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}
615 \def\opargboolorsmart#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool\_or\_smart}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}
617 Sledgehammer's options are categorized as follows:\ mode of operation
618 (\S\ref{mode-of-operation}), problem encoding (\S\ref{problem-encoding}),
619 relevance filter (\S\ref{relevance-filter}), output format
620 (\S\ref{output-format}), and authentication (\S\ref{authentication}).
622 The descriptions below refer to the following syntactic quantities:
625 \item[$\bullet$] \qtybf{string}: A string.
626 \item[$\bullet$] \qtybf{bool\/}: \textit{true} or \textit{false}.
627 \item[$\bullet$] \qtybf{bool\_or\_smart\/}: \textit{true}, \textit{false}, or
629 \item[$\bullet$] \qtybf{int\/}: An integer.
630 %\item[$\bullet$] \qtybf{float\/}: A floating-point number (e.g., 2.5).
631 \item[$\bullet$] \qtybf{float\_pair\/}: A pair of floating-point numbers
633 \item[$\bullet$] \qtybf{int\_or\_smart\/}: An integer or \textit{smart}.
634 \item[$\bullet$] \qtybf{float\_or\_none\/}: An integer (e.g., 60) or
635 floating-point number (e.g., 0.5) expressing a number of seconds, or the keyword
636 \textit{none} ($\infty$ seconds).
639 Default values are indicated in square brackets. Boolean options have a negated
640 counterpart (e.g., \textit{blocking} vs.\ \textit{non\_blocking}). When setting
641 Boolean options, ``= \textit{true}'' may be omitted.
643 \subsection{Mode of Operation}
644 \label{mode-of-operation}
647 \opnodefault{provers}{string}
648 Specifies the automatic provers to use as a space-separated list (e.g.,
649 ``\textit{e}~\textit{spass}''). The following provers are supported:
652 \item[$\bullet$] \textbf{\textit{e}:} E is an ATP developed by Stephan Schulz
653 \cite{schulz-2002}. To use E, set the environment variable
654 \texttt{E\_HOME} to the directory that contains the \texttt{eproof} executable,
655 or install the prebuilt E package from Isabelle's download page. See
656 \S\ref{installation} for details.
658 \item[$\bullet$] \textbf{\textit{spass}:} SPASS is an ATP developed by Christoph
659 Weidenbach et al.\ \cite{weidenbach-et-al-2009}. To use SPASS, set the
660 environment variable \texttt{SPASS\_HOME} to the directory that contains the
661 \texttt{SPASS} executable, or install the prebuilt SPASS package from Isabelle's
662 download page. Sledgehammer requires version 3.5 or above. See
663 \S\ref{installation} for details.
665 \item[$\bullet$] \textbf{\textit{vampire}:} Vampire is an ATP developed by
666 Andrei Voronkov and his colleagues \cite{riazanov-voronkov-2002}. To use
667 Vampire, set the environment variable \texttt{VAMPIRE\_HOME} to the directory
668 that contains the \texttt{vampire} executable. Sledgehammer has been tested with
669 versions 11, 0.6, and 1.0.
671 \item[$\bullet$] \textbf{\textit{cvc3}:} CVC3 is an SMT solver developed by
672 Clark Barrett, Cesare Tinelli, and their colleagues \cite{cvc3}. To use CVC3,
673 set the environment variable \texttt{CVC3\_SOLVER} to the complete path of the
674 executable, including the file name. Sledgehammer has been tested with version
677 \item[$\bullet$] \textbf{\textit{yices}:} Yices is an SMT solver developed at
678 SRI \cite{yices}. To use Yices, set the environment variable
679 \texttt{YICES\_SOLVER} to the complete path of the executable, including the
680 file name. Sledgehammer has been tested with version 1.0.
682 \item[$\bullet$] \textbf{\textit{z3}:} Z3 is an SMT solver developed at
683 Microsoft Research \cite{z3}. To use Z3, set the environment variable
684 \texttt{Z3\_SOLVER} to the complete path of the executable, including the file
685 name. Sledgehammer has been tested with versions 2.7 to 2.18.
687 \item[$\bullet$] \textbf{\textit{z3\_atp}:} This version of Z3 pretends to be an
688 ATP, exploiting Z3's undocumented support for the TPTP format. It is included
689 for experimental purposes. It requires version 2.18 or above.
691 \item[$\bullet$] \textbf{\textit{remote\_e}:} The remote version of E runs
692 on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.
694 \item[$\bullet$] \textbf{\textit{remote\_vampire}:} The remote version of
695 Vampire runs on Geoff Sutcliffe's Miami servers. Version 9 is used.
697 \item[$\bullet$] \textbf{\textit{remote\_tofof\_e}:} ToFoF-E is a metaprover
698 developed by Geoff Sutcliffe \cite{tofof} based on E running on his Miami
699 servers. This ATP supports a fragment of the TPTP many-typed first-order format
700 (TFF). It is supported primarily for experimenting with the
701 \textit{type\_sys} $=$ \textit{simple} option (\S\ref{problem-encoding}).
703 \item[$\bullet$] \textbf{\textit{remote\_sine\_e}:} SInE-E is a metaprover
704 developed by Kry\v stof Hoder \cite{sine} based on E. The remote version of
705 SInE runs on Geoff Sutcliffe's Miami servers.
707 \item[$\bullet$] \textbf{\textit{remote\_snark}:} SNARK is a prover
708 developed by Stickel et al.\ \cite{snark}. The remote version of
709 SNARK runs on Geoff Sutcliffe's Miami servers.
711 \item[$\bullet$] \textbf{\textit{remote\_cvc3}:} The remote version of CVC3 runs
712 on servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to
715 \item[$\bullet$] \textbf{\textit{remote\_z3}:} The remote version of Z3 runs on
716 servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to
719 \item[$\bullet$] \textbf{\textit{remote\_z3\_atp}:} The remote version of ``Z3
720 as an ATP'' runs on Geoff Sutcliffe's Miami servers.
723 By default, Sledgehammer will run E, SPASS, Vampire, SInE-E, and Z3 (or whatever
724 the SMT module's \textit{smt\_solver} configuration option is set to) in
725 parallel---either locally or remotely, depending on the number of processor
726 cores available. For historical reasons, the default value of this option can be
727 overridden using the option ``Sledgehammer: Provers'' from the ``Isabelle'' menu
730 It is a good idea to run several provers in parallel, although it could slow
731 down your machine. Running E, SPASS, Vampire, and SInE-E together for 5 seconds
732 yields a better success rate than running the most effective of these (Vampire)
733 for 120 seconds \cite{boehme-nipkow-2010}.
735 \opnodefault{prover}{string}
736 Alias for \textit{provers}.
738 %\opnodefault{atps}{string}
739 %Legacy alias for \textit{provers}.
741 %\opnodefault{atp}{string}
742 %Legacy alias for \textit{provers}.
744 \opdefault{timeout}{float\_or\_none}{\upshape 30}
745 Specifies the maximum number of seconds that the automatic provers should spend
746 searching for a proof. This excludes problem preparation and is a soft limit.
747 For historical reasons, the default value of this option can be overridden using
748 the option ``Sledgehammer: Time Limit'' from the ``Isabelle'' menu in Proof
751 \opfalse{blocking}{non\_blocking}
752 Specifies whether the \textbf{sledgehammer} command should operate
753 synchronously. The asynchronous (non-blocking) mode lets the user start proving
754 the putative theorem manually while Sledgehammer looks for a proof, but it can
755 also be more confusing.
757 \optrue{slicing}{no\_slicing}
758 Specifies whether the time allocated to a prover should be sliced into several
759 segments, each of which has its own set of possibly prover-dependent options.
760 For SPASS and Vampire, the first slice tries the fast but incomplete
761 set-of-support (SOS) strategy, whereas the second slice runs without it. For E,
762 up to three slices are tried, with different weighted search strategies and
763 number of facts. For SMT solvers, several slices are tried with the same options
764 each time but fewer and fewer facts. According to benchmarks with a timeout of
765 30 seconds, slicing is a valuable optimization, and you should probably leave it
766 enabled unless you are conducting experiments. This option is implicitly
767 disabled for (short) automatic runs.
770 {\small See also \textit{verbose} (\S\ref{output-format}).}
772 \opfalse{overlord}{no\_overlord}
773 Specifies whether Sledgehammer should put its temporary files in
774 \texttt{\$ISA\-BELLE\_\allowbreak HOME\_\allowbreak USER}, which is useful for
775 debugging Sledgehammer but also unsafe if several instances of the tool are run
776 simultaneously. The files are identified by the prefix \texttt{prob\_}; you may
777 safely remove them after Sledgehammer has run.
780 {\small See also \textit{debug} (\S\ref{output-format}).}
783 \subsection{Problem Encoding}
784 \label{problem-encoding}
787 \opfalse{explicit\_apply}{implicit\_apply}
788 Specifies whether function application should be encoded as an explicit
789 ``apply'' operator in ATP problems. If the option is set to \textit{false}, each
790 function will be directly applied to as many arguments as possible. Enabling
791 this option can sometimes help discover higher-order proofs that otherwise would
794 \opfalse{full\_types}{partial\_types}
795 Specifies whether full type information is encoded in ATP problems. Enabling
796 this option prevents the discovery of type-incorrect proofs, but it can slow
797 down the ATP slightly. This option is implicitly enabled for automatic runs. For
798 historical reasons, the default value of this option can be overridden using the
799 option ``Sledgehammer: Full Types'' from the ``Isabelle'' menu in Proof General.
801 \opdefault{type\_sys}{string}{smart}
802 Specifies the type system to use in ATP problems. The option can take the
806 \item[$\bullet$] \textbf{\textit{poly\_preds}:} Types are encoded using a predicate
807 $\mathit{has\_type\/}(\tau, t)$ that restricts the range of bound variables.
808 Constants are annotated with their types, supplied as extra arguments, to
811 \item[$\bullet$] \textbf{\textit{poly\_tags}:} Each term and subterm is tagged with
812 its type using a function $\mathit{type\_info\/}(\tau, t)$.
814 \item[$\bullet$] \textbf{\textit{poly\_args}:}
815 Like for the other sound encodings, constants are annotated with their types to
816 resolve overloading, but otherwise no type information is encoded.
818 \item[$\bullet$] \textbf{\textit{erased}:} No type information is supplied to
819 the ATP. Types are simply erased.
823 \textit{mono\_preds},
825 \textit{mono\_args}:} \\
826 Similar to \textit{poly\_preds}, \textit{poly\_tags}, and \textit{poly\_args},
827 respectively, but the problem is additionally monomorphized, meaning that type
828 variables are instantiated with heuristically chosen ground types.
829 Monomorphization can simplify reasoning but also leads to larger fact bases,
830 which can slow down the ATPs.
832 \item[$\bullet$] \textbf{\textit{simple}:} Use the prover's support for simply
833 typed first-order logic if available; otherwise, fall back on
834 \textit{mangled\_preds}. The problem is monomorphized.
838 \textit{mangled\_preds},
839 \textit{mangled\_tags},
840 \textit{mangled\_args}:} \\
842 \textit{mono\_preds}, \textit{mono\_tags}, and \textit{mono\_args},
843 respectively but types are mangled in constant names instead of being supplied
844 as ground term arguments. The binary predicate $\mathit{has\_type\/}(\tau, t)$
845 becomes a unary predicate $\mathit{has\_type\_}\tau(t)$, and the binary function
846 $\mathit{type\_info\/}(\tau, t)$ becomes a unary function
847 $\mathit{type\_info\_}\tau(t)$.
851 \textit{mono\_preds}?, \textit{mono\_tags}?, \textit{simple}?, \\
852 \textit{mangled\_preds}?, \textit{mangled\_tags}?:} \\
853 The type systems \textit{mono\_preds}, \textit{mono\_tags}, \textit{simple},
854 \textit{mangled\_preds}, and \textit{mangled\_tags} are fully typed and
855 virtually sound---except for pathological cases, all found proofs are
856 type-correct. For each of these, Sledgehammer also provides a lighter (but
857 virtually sound) variant identified by a question mark (`{?}')\ that detects and
858 erases monotonic types, notably infinite types. (For \textit{simple}, the types
859 are not actually erased but rather replaced by a shared uniform type of
864 \textit{poly\_tags}!, \textit{mono\_preds}!, \textit{mono\_tags}!, \\
865 \textit{simple}!, \textit{mangled\_preds}!, \textit{mangled\_tags}!:} \\
866 The type systems \textit{poly\_preds}, \textit{poly\_tags},
867 \textit{mono\_preds}, \textit{mono\_tags}, \textit{simple},
868 \textit{mangled\_preds}, and \textit{mangled\_tags} also admit a somewhat
869 unsound (but very efficient) variant identified by an exclamation mark (`{!}')
870 that detects and erases erases all types except those that are clearly finite
871 (e.g., \textit{bool}). (For \textit{simple}, the types are not actually erased
872 but rather replaced by a shared uniform type of individuals.)
874 \item[$\bullet$] \textbf{\textit{smart}:} If \textit{full\_types} is enabled,
875 uses a fully typed, virtually sound encoding; otherwise, uses any encoding. The
876 actual encoding used depends on the ATP and should be the most efficient for
880 In addition, all the \textit{preds} and \textit{tags} type systems are available
881 in two variants, a lightweight and a heavyweight variant. The lightweight
882 variants are generally more efficient and are the default; the heavyweight
883 variants are identified by a \textit{\_heavy} suffix (e.g.,
884 \textit{mangled\_preds\_heavy}{?}).
886 For SMT solvers and ToFoF-E, the type system is always \textit{simple},
887 irrespective of the value of this option.
890 \subsection{Relevance Filter}
891 \label{relevance-filter}
894 \opdefault{relevance\_thresholds}{float\_pair}{\upshape 0.45~0.85}
895 Specifies the thresholds above which facts are considered relevant by the
896 relevance filter. The first threshold is used for the first iteration of the
897 relevance filter and the second threshold is used for the last iteration (if it
898 is reached). The effective threshold is quadratically interpolated for the other
899 iterations. Each threshold ranges from 0 to 1, where 0 means that all theorems
900 are relevant and 1 only theorems that refer to previously seen constants.
902 \opsmart{max\_relevant}{int\_or\_smart}
903 Specifies the maximum number of facts that may be returned by the relevance
904 filter. If the option is set to \textit{smart}, it is set to a value that was
905 empirically found to be appropriate for the prover. A typical value would be
908 \opdefault{max\_new\_mono\_instances}{int}{\upshape 400}
909 Specifies the maximum number of monomorphic instances to generate beyond
910 \textit{max\_relevant}. The higher this limit is, the more monomorphic instances
911 are potentially generated. Whether monomorphization takes place depends on the
915 {\small See also \textit{type\_sys} (\S\ref{problem-encoding}).}
917 \opdefault{max\_mono\_iters}{int}{\upshape 3}
918 Specifies the maximum number of iterations for the monomorphization fixpoint
919 construction. The higher this limit is, the more monomorphic instances are
920 potentially generated. Whether monomorphization takes place depends on the
924 {\small See also \textit{type\_sys} (\S\ref{problem-encoding}).}
927 \subsection{Output Format}
928 \label{output-format}
932 \opfalse{verbose}{quiet}
933 Specifies whether the \textbf{sledgehammer} command should explain what it does.
934 This option is implicitly disabled for automatic runs.
936 \opfalse{debug}{no\_debug}
937 Specifies whether Sledgehammer should display additional debugging information
938 beyond what \textit{verbose} already displays. Enabling \textit{debug} also
939 enables \textit{verbose} and \textit{blocking} (\S\ref{mode-of-operation})
940 behind the scenes. The \textit{debug} option is implicitly disabled for
944 {\small See also \textit{overlord} (\S\ref{mode-of-operation}).}
946 \opfalse{isar\_proof}{no\_isar\_proof}
947 Specifies whether Isar proofs should be output in addition to one-liner
948 \textit{metis} proofs. Isar proof construction is still experimental and often
949 fails; however, they are usually faster and sometimes more robust than
950 \textit{metis} proofs.
952 \opdefault{isar\_shrink\_factor}{int}{\upshape 1}
953 Specifies the granularity of the Isar proof. A value of $n$ indicates that each
954 Isar proof step should correspond to a group of up to $n$ consecutive proof
955 steps in the ATP proof.
959 \subsection{Authentication}
960 \label{authentication}
963 \opnodefault{expect}{string}
964 Specifies the expected outcome, which must be one of the following:
967 \item[$\bullet$] \textbf{\textit{some}:} Sledgehammer found a (potentially
969 \item[$\bullet$] \textbf{\textit{none}:} Sledgehammer found no proof.
970 \item[$\bullet$] \textbf{\textit{unknown}:} Sledgehammer encountered some
974 Sledgehammer emits an error (if \textit{blocking} is enabled) or a warning
975 (otherwise) if the actual outcome differs from the expected outcome. This option
976 is useful for regression testing.
979 {\small See also \textit{blocking} (\S\ref{mode-of-operation}).}
983 \bibliography{../manual}{}
984 \bibliographystyle{abbrv}