1 \documentclass[a4paper,12pt]{article}
2 \usepackage[T1]{fontenc}
5 \usepackage[english,french]{babel}
12 %\usepackage[scaled=.85]{beramono}
13 \usepackage{../iman,../pdfsetup}
16 %\evensidemargin=4.6mm
23 \def\Colon{\mathord{:\mkern-1.5mu:}}
24 %\def\lbrakk{\mathopen{\lbrack\mkern-3.25mu\lbrack}}
25 %\def\rbrakk{\mathclose{\rbrack\mkern-3.255mu\rbrack}}
26 \def\lparr{\mathopen{(\mkern-4mu\mid}}
27 \def\rparr{\mathclose{\mid\mkern-4mu)}}
30 \def\undef{(\lambda x.\; \unk)}
31 %\def\unr{\textit{others}}
33 \def\Abs#1{\hbox{\rm{\flqq}}{\,#1\,}\hbox{\rm{\frqq}}}
34 \def\Q{{\smash{\lower.2ex\hbox{$\scriptstyle?$}}}}
40 \selectlanguage{english}
42 \title{\includegraphics[scale=0.5]{isabelle_sledgehammer} \\[4ex]
43 Hammering Away \\[\smallskipamount]
44 \Large A User's Guide to Sledgehammer for Isabelle/HOL}
46 Jasmin Christian Blanchette \\
47 {\normalsize Institut f\"ur Informatik, Technische Universit\"at M\"unchen} \\
54 \setlength{\parskip}{.7em plus .2em minus .1em}
55 \setlength{\parindent}{0pt}
56 \setlength{\abovedisplayskip}{\parskip}
57 \setlength{\abovedisplayshortskip}{.9\parskip}
58 \setlength{\belowdisplayskip}{\parskip}
59 \setlength{\belowdisplayshortskip}{.9\parskip}
61 % General-purpose enum environment with correct spacing
62 \newenvironment{enum}%
64 \setlength{\topsep}{.1\parskip}%
65 \setlength{\partopsep}{.1\parskip}%
66 \setlength{\itemsep}{\parskip}%
67 \advance\itemsep by-\parsep}}
70 \def\pre{\begingroup\vskip0pt plus1ex\advance\leftskip by\leftmargin
71 \advance\rightskip by\leftmargin}
72 \def\post{\vskip0pt plus1ex\endgroup}
74 \def\prew{\pre\advance\rightskip by-\leftmargin}
77 \section{Introduction}
80 Sledgehammer is a tool that applies first-order automatic theorem provers (ATPs)
81 on the current goal. The supported ATPs are E \cite{schulz-2002}, SPASS
82 \cite{weidenbach-et-al-2009}, and Vampire \cite{riazanov-voronkov-2002}, which
83 can be run locally or remotely via the SystemOnTPTP web service
84 \cite{sutcliffe-2000}.
86 The problem passed to ATPs consists of the current goal together with a
87 heuristic selection of facts (theorems) from the current theory context,
88 filtered by relevance. The result of a successful ATP proof search is some
89 source text that usually (but not always) reconstructs the proof within
90 Isabelle, without requiring the ATPs again. The reconstructed proof relies on
91 the general-purpose Metis prover \cite{metis}, which is fully integrated into
92 Isabelle/HOL, with explicit inferences going through the kernel. Thus its
93 results are correct by construction.
96 \setbox\boxA=\hbox{\texttt{nospam}}
98 Examples of Sledgehammer use can be found in Isabelle's
99 \texttt{src/HOL/Metis\_Examples} directory.
100 Comments and bug reports concerning Sledgehammer or this manual should be
102 \texttt{blan{\color{white}nospam}\kern-\wd\boxA{}chette@\allowbreak
103 in.\allowbreak tum.\allowbreak de}.
105 \vskip2.5\smallskipamount
107 %\textbf{Acknowledgment.} The author would like to thank Mark Summerfield for
108 %suggesting several textual improvements.
110 \section{Installation}
113 Sledgehammer is part of Isabelle, so you don't need to install it. However, it
114 relies on third-party automatic theorem provers (ATPs). Currently, E, SPASS, and
115 Vampire are supported. All of these are available remotely via SystemOnTPTP
116 \cite{sutcliffe-2000}, but if you want better performance you will need to
117 install at least E and SPASS locally.
119 There are three main ways to install E and SPASS:
122 \item[$\bullet$] If you installed an official Isabelle package with everything
123 inside, it should already include properly setup executables for E and SPASS,
126 \item[$\bullet$] Otherwise, you can download the Isabelle-aware E and SPASS
127 binary packages from Isabelle's download page. Extract the archives, then add a
128 line to your \texttt{\char`\~/.isabelle/etc/components} file with the absolute path to
129 E or SPASS. For example, if \texttt{\char`\~/.isabelle/etc/components} does not exist
130 yet and you extracted SPASS to \texttt{/usr/local/spass-3.7}, create
131 the file \texttt{\char`\~/.isabelle/etc/components} with the single line
134 \texttt{/usr/local/spass-3.7}
137 \item[$\bullet$] If you prefer to build E or SPASS yourself, feel free to do so
138 and set the environment variable \texttt{E\_HOME} or \texttt{SPASS\_HOME} to the
139 directory that contains the \texttt{eproof} or \texttt{SPASS} executable,
143 To check whether E and SPASS are installed, follow the example in
146 \section{First Steps}
149 To illustrate Sledgehammer in context, let us start a theory file and
150 attempt to prove a simple lemma:
153 \textbf{theory}~\textit{Scratch} \\
154 \textbf{imports}~\textit{Main} \\
155 \textbf{begin} \\[2\smallskipamount]
157 \textbf{lemma} ``$[a] = [b] \,\longleftrightarrow\, a = b$'' \\
158 \textbf{sledgehammer}
161 After a few seconds, Sledgehammer produces the following output:
165 Sledgehammer: ATP ``\textit{e}'' for subgoal 1: \\
166 $([a] = [b]) = (a = b)$ \\
167 Try this command: \textbf{by} (\textit{metis hd.simps}). \\
168 To minimize the number of lemmas, try this command: \\
169 \textbf{sledgehammer} \textit{minimize} [\textit{atp} = \textit{e}] (\textit{hd.simps}). \\[3\smallskipamount]
171 Sledgehammer: ATP ``\textit{spass}'' for subgoal 1: \\
172 $([a] = [b]) = (a = b)$ \\
173 Try this command: \textbf{by} (\textit{metis insert\_Nil last\_ConsL}). \\
174 To minimize the number of lemmas, try this command: \\
175 \textbf{sledgehammer} \textit{minimize} [\textit{atp} = \textit{spass}] (\textit{insert\_Nil last\_ConsL}). \\[3\smallskipamount]
177 Sledgehammer: ATP ``\textit{remote\_vampire}'' for subgoal 1: \\
178 $([a] = [b]) = (a = b)$ \\
179 Try this command: \textbf{by} (\textit{metis One\_nat\_def\_raw empty\_replicate} \\
180 \phantom{Try this command: \textbf{by} (\textit{metis~}}\textit{insert\_Nil last\_ConsL replicate\_Suc}). \\
181 To minimize the number of lemmas, try this command: \\
182 \textbf{sledgehammer} \textit{minimize} [\textit{atp} = \textit{remote\_vampire}] \\
183 \phantom{\textbf{sledgehammer}~}(\textit{One\_nat\_def\_raw empty\_replicate insert\_Nil} \\
184 \phantom{\textbf{sledgehammer}~(}\textit{last\_ConsL replicate\_Suc}).
187 Sledgehammer ran E, SPASS, and the remote version of Vampire in parallel. If E
188 and SPASS are not installed (\S\ref{installation}), you will see references to
189 their remote American cousins \textit{remote\_e} and \textit{remote\_spass}
190 instead of \textit{e} and \textit{spass}.
192 Based on each ATP proof, Sledgehammer gives a one-liner proof that uses the
193 \textit{metis} method. You can click them and insert them into the theory text.
194 You can click the ``\textbf{sledgehammer} \textit{minimize}'' command if you
195 want to look for a shorter (and faster) proof. But here the proof found by E
196 looks perfect, so click it to finish the proof.
198 You can ask Sledgehammer for an Isar text proof by passing the
199 \textit{isar\_proof} option:
202 \textbf{sledgehammer} [\textit{isar\_proof}]
205 When Isar proof construction is successful, it can yield proofs that are more
206 readable and also faster than the \textit{metis} one-liners. This feature is
209 \section{Command Syntax}
210 \label{command-syntax}
212 Sledgehammer can be invoked at any point when there is an open goal by entering
213 the \textbf{sledgehammer} command in the theory file. Its general syntax is as
217 \textbf{sledgehammer} \textit{subcommand\/$^?$ options\/$^?$ facts\_override\/$^?$ num\/$^?$}
220 For convenience, Sledgehammer is also available in the ``Commands'' submenu of
221 the ``Isabelle'' menu in Proof General or by pressing the Emacs key sequence C-c
222 C-a C-s. This is equivalent to entering the \textbf{sledgehammer} command with
223 no arguments in the theory text.
225 In the general syntax, the \textit{subcommand} may be any of the following:
228 \item[$\bullet$] \textbf{\textit{run} (the default):} Runs Sledgehammer on subgoal number
229 \textit{num} (1 by default), with the given options and facts.
231 \item[$\bullet$] \textbf{\textit{minimize}:} Attempts to minimize the provided facts
232 (specified in the \textit{facts\_override} argument) to obtain a simpler proof
233 involving fewer facts. The options and goal number are as for \textit{run}.
235 \item[$\bullet$] \textbf{\textit{messages}:} Redisplays recent messages issued by
236 Sledgehammer. This allows you to examine results that might have been lost due
237 to Sledgehammer's asynchronous nature. The \textit{num} argument specifies a
238 limit on the number of messages to display (5 by default).
240 \item[$\bullet$] \textbf{\textit{available\_atps}:} Prints the list of installed ATPs.
241 See \S\ref{installation} and \S\ref{mode-of-operation} for more information on
244 \item[$\bullet$] \textbf{\textit{running\_atps}:} Prints information about currently
245 running ATPs, including elapsed runtime and remaining time until timeout.
247 \item[$\bullet$] \textbf{\textit{kill\_atps}:} Terminates all running ATPs.
249 \item[$\bullet$] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote
250 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}.
253 Sledgehammer's behavior can be influenced by various \textit{options}, which can
254 be specified in brackets after the \textbf{sledgehammer} command. The
255 \textit{options} are a list of key--value pairs of the form ``[$k_1 = v_1,
256 \ldots, k_n = v_n$]''. For Boolean options, ``= \textit{true}'' is optional. For
260 \textbf{sledgehammer} [\textit{isar\_proof}, \,\textit{timeout} = 120$\,s$]
263 Default values can be set using \textbf{sledgehammer\_\allowbreak params}:
266 \textbf{sledgehammer\_params} \textit{options}
269 The supported options are described in \S\ref{option-reference}.
271 The \textit{facts\_override} argument lets you alter the set of facts that go
272 through the relevance filter. It may be of the form ``(\textit{facts})'', where
273 \textit{facts} is a space-separated list of Isabelle facts (theorems, local
274 assumptions, etc.), in which case the relevance filter is bypassed and the given
275 facts are used. It may also be of the form (\textit{add}:\ \textit{facts}$_1$),
276 (\textit{del}:\ \textit{facts}$_2$), or (\textit{add}:\ \textit{facts}$_1$\
277 \textit{del}:\ \textit{facts}$_2$), where the relevance filter is instructed to
278 proceed as usual except that it should consider \textit{facts}$_1$
279 highly-relevant and \textit{facts}$_2$ fully irrelevant.
281 \section{Option Reference}
282 \label{option-reference}
284 \def\flushitem#1{\item[]\noindent\kern-\leftmargin \textbf{#1}}
285 \def\qty#1{$\left<\textit{#1}\right>$}
286 \def\qtybf#1{$\mathbf{\left<\textbf{\textit{#1}}\right>}$}
287 \def\optrue#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\quad [\textit{true}]\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
288 \def\opfalse#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\quad [\textit{false}]\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
289 \def\opsmart#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool\_or\_smart}$\bigr]$\quad [\textit{smart}]\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
290 \def\opsmartx#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool\_or\_smart}$\bigr]$\quad [\textit{smart}]\hfill\\\hbox{}\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
291 \def\opnodefault#1#2{\flushitem{\textit{#1} = \qtybf{#2}} \nopagebreak\\[\parskip]}
292 \def\opdefault#1#2#3{\flushitem{\textit{#1} = \qtybf{#2}\quad [\textit{#3}]} \nopagebreak\\[\parskip]}
293 \def\oparg#1#2#3{\flushitem{\textit{#1} \qtybf{#2} = \qtybf{#3}} \nopagebreak\\[\parskip]}
294 \def\opargbool#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}
295 \def\opargboolorsmart#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool\_or\_smart}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}
297 Sledgehammer's options are categorized as follows:\ mode of operation
298 (\S\ref{mode-of-operation}), problem encoding (\S\ref{problem-encoding}), output
299 format (\S\ref{output-format}), and timeouts (\S\ref{timeouts}).
301 The descriptions below refer to the following syntactic quantities:
304 \item[$\bullet$] \qtybf{string}: A string.
305 \item[$\bullet$] \qtybf{bool\/}: \textit{true} or \textit{false}.
306 \item[$\bullet$] \qtybf{bool\_or\_smart\/}: \textit{true}, \textit{false}, or \textit{smart}.
307 \item[$\bullet$] \qtybf{int\/}: An integer.
308 \item[$\bullet$] \qtybf{time}: An integer followed by $\textit{min}$ (minutes), $s$ (seconds), or \textit{ms}
309 (milliseconds), or the keyword \textit{none} ($\infty$ years).
312 Default values are indicated in square brackets. Boolean options have a negated
313 counterpart (e.g., \textit{debug} vs.\ \textit{no\_debug}). When setting
314 Boolean options, ``= \textit{true}'' may be omitted.
316 \subsection{Mode of Operation}
317 \label{mode-of-operation}
320 %\optrue{blocking}{non\_blocking}
321 %Specifies whether the \textbf{sledgehammer} command should operate synchronously.
322 %The asynchronous (non-blocking) mode lets the user start proving the putative
323 %theorem while Sledgehammer looks for a counterexample, but it can also be more
324 %confusing. For technical reasons, automatic runs currently always block.
326 \opnodefault{atps}{string}
327 Specifies the ATPs (automated theorem provers) to use as a space-separated list
328 (e.g., ``\textit{e}~\textit{spass}''). The following ATPs are supported:
331 \item[$\bullet$] \textbf{\textit{e}:} E is an ATP developed by Stephan Schulz
332 \cite{schulz-2002}. To use E, set the environment variable
333 \texttt{E\_HOME} to the directory that contains the \texttt{eproof} executable,
334 or install the prebuilt E package from Isabelle's download page. See
335 \S\ref{installation} for details.
337 \item[$\bullet$] \textbf{\textit{spass}:} SPASS is an ATP developed by Christoph
338 Weidenbach et al.\ \cite{weidenbach-et-al-2009}. To use SPASS, set the
339 environment variable \texttt{SPASS\_HOME} to the directory that contains the
340 \texttt{SPASS} executable, or install the prebuilt SPASS package from Isabelle's
341 download page. Sledgehammer requires version 3.5 or above. See
342 \S\ref{installation} for details.
344 \item[$\bullet$] \textbf{\textit{spass\_dfg}:} Same as the above, except that
345 Sledgehammer communicates with SPASS using the native DFG syntax rather than the
346 TPTP syntax. Sledgehammer requires version 3.0 or above. This ATP is provided
347 for compatibility reasons.
349 \item[$\bullet$] \textbf{\textit{vampire}:} Vampire is an ATP developed by
350 Andrei Voronkov and his colleagues \cite{riazanov-voronkov-2002}. To use
351 Vampire, set the environment variable \texttt{VAMPIRE\_HOME} to the directory
352 that contains the \texttt{vampire} executable.
354 \item[$\bullet$] \textbf{\textit{remote\_e}:} The remote version of E executes
355 on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.
357 \item[$\bullet$] \textbf{\textit{remote\_spass}:} The remote version of SPASS
358 executes on Geoff Sutcliffe's Miami servers.
360 \item[$\bullet$] \textbf{\textit{remote\_vampire}:} The remote version of
361 Vampire executes on Geoff Sutcliffe's Miami servers. Version 9 is used.
365 By default, Sledgehammer will run E, SPASS, and Vampire in parallel. For E and
366 SPASS, it will use any locally installed version if available, falling back
367 on the remote versions if necessary. For historical reasons, the default value
368 of this option can be overridden using the option ``Sledgehammer: ATPs'' from
369 the ``Isabelle'' menu in Proof General.
371 It is a good idea to run several ATPs in parallel, although it could slow down
372 your machine. Tobias Nipkow observed that running E, SPASS, and Vampire together
373 for 5 seconds yields the same success rate as running the most effective of
374 these (Vampire) for 120 seconds \cite{boehme-nipkow-2010}.
376 \opnodefault{atp}{string}
377 Alias for \textit{atps}.
379 \opfalse{overlord}{no\_overlord}
380 Specifies whether Sledgehammer should put its temporary files in
381 \texttt{\$ISA\-BELLE\_\allowbreak HOME\_\allowbreak USER}, which is useful for
382 debugging Sledgehammer but also unsafe if several instances of the tool are run
383 simultaneously. The files are identified by the prefix \texttt{prob\_}; you may
384 safely remove them after Sledgehammer has run.
387 {\small See also \textit{debug} (\S\ref{output-format}).}
390 \subsection{Problem Encoding}
391 \label{problem-encoding}
394 \opfalse{explicit\_apply}{implicit\_apply}
395 Specifies whether function application should be encoded as an explicit
396 ``apply'' operator. If the option is set to \textit{false}, each function will
397 be directly applied to as many arguments as possible. Enabling this option can
398 sometimes help discover higher-order proofs that otherwise would not be found.
400 \opfalse{full\_types}{partial\_types}
401 Specifies whether full-type information is exported. Enabling this option can
402 prevent the discovery of type-incorrect proofs, but it also tends to slow down
403 the ATPs significantly. For historical reasons, the default value of this option
404 can be overridden using the option ``Sledgehammer: ATPs'' from the ``Isabelle''
405 menu in Proof General.
407 \opdefault{relevance\_threshold}{int}{50}
408 Specifies the threshold above which facts are considered relevant by the
409 relevance filter. The option ranges from 0 to 100, where 0 means that all
410 theorems are relevant.
412 \opdefault{relevance\_convergence}{int}{320}
413 Specifies the convergence quotient, multiplied by 100, used by the relevance
414 filter. This quotient is used by the relevance filter to scale down the
415 relevance of facts at each iteration of the filter.
417 \opsmartx{theory\_relevant}{theory\_irrelevant}
418 Specifies whether the theory from which a fact comes should be taken into
419 consideration by the relevance filter. If the option is set to \textit{smart},
420 it is taken to be \textit{true} for SPASS and \textit{false} for E and Vampire,
421 because empirical results suggest that these are the best settings.
423 \opfalse{defs\_relevant}{defs\_irrelevant}
424 Specifies whether the definition of constants occurring in the formula to prove
425 should be considered particularly relevant. Enabling this option tends to lead
426 to larger problems and typically slows down the ATPs.
428 \optrue{respect\_no\_atp}{ignore\_no\_atp}
429 Specifies whether Sledgehammer should honor the \textit{no\_atp} attributes. The
430 \textit{no\_atp} attributes marks theorems that tend to confuse ATPs, typically
431 because they can lead to unsound ATP proofs \cite{boehme-nipkow-2010}. It is
432 normally a good idea to leave this option enabled, unless you are debugging
437 \subsection{Output Format}
438 \label{output-format}
442 \opfalse{verbose}{quiet}
443 Specifies whether the \textbf{sledgehammer} command should explain what it does.
445 \opfalse{debug}{no\_debug}
446 Specifies whether Nitpick should display additional debugging information beyond
447 what \textit{verbose} already displays. Enabling \textit{debug} also enables
448 \textit{verbose} behind the scenes.
451 {\small See also \textit{overlord} (\S\ref{mode-of-operation}).}
453 \opfalse{isar\_proof}{no\_isar\_proof}
454 Specifies whether Isar proofs should be output in addition to one-liner
455 \textit{metis} proofs. Isar proof construction is still experimental and often
456 fails; however, they are usually faster and sometimes more robust than
457 \textit{metis} proofs.
459 \opdefault{isar\_shrink\_factor}{int}{1}
460 Specifies the granularity of the Isar proof. A value of $n$ indicates that each
461 Isar proof step should correspond to a group of up to $n$ consecutive proof
462 steps in the ATP proof.
466 \subsection{Timeouts}
470 \opdefault{timeout}{time}{$\mathbf{60}$ s}
471 Specifies the maximum amount of time that the ATPs should spend looking for a
472 proof. For historical reasons, the default value of this option can be
473 overridden using the option ``Sledgehammer: Time Limit'' from the ``Isabelle''
474 menu in Proof General.
476 \opdefault{minimize\_timeout}{time}{$\mathbf{5}$\,s}
477 Specifies the maximum amount of time that the ATPs should spend looking for a
478 proof for \textbf{sledgehammer}~\textit{minimize}.
482 \bibliography{../manual}{}
483 \bibliographystyle{abbrv}