1 \documentclass[a4paper,12pt]{article}
2 \usepackage[T1]{fontenc}
5 \usepackage[english,french]{babel}
12 %\usepackage[scaled=.85]{beramono}
13 \usepackage{../iman,../pdfsetup}
16 %\evensidemargin=4.6mm
23 \def\Colon{\mathord{:\mkern-1.5mu:}}
24 %\def\lbrakk{\mathopen{\lbrack\mkern-3.25mu\lbrack}}
25 %\def\rbrakk{\mathclose{\rbrack\mkern-3.255mu\rbrack}}
26 \def\lparr{\mathopen{(\mkern-4mu\mid}}
27 \def\rparr{\mathclose{\mid\mkern-4mu)}}
30 \def\undef{(\lambda x.\; \unk)}
31 %\def\unr{\textit{others}}
33 \def\Abs#1{\hbox{\rm{\flqq}}{\,#1\,}\hbox{\rm{\frqq}}}
34 \def\Q{{\smash{\lower.2ex\hbox{$\scriptstyle?$}}}}
40 \selectlanguage{english}
42 \title{\includegraphics[scale=0.5]{isabelle_sledgehammer} \\[4ex]
43 Hammering Away \\[\smallskipamount]
44 \Large A User's Guide to Sledgehammer for Isabelle/HOL}
46 Jasmin Christian Blanchette \\
47 {\normalsize Institut f\"ur Informatik, Technische Universit\"at M\"unchen} \\
54 \setlength{\parskip}{.7em plus .2em minus .1em}
55 \setlength{\parindent}{0pt}
56 \setlength{\abovedisplayskip}{\parskip}
57 \setlength{\abovedisplayshortskip}{.9\parskip}
58 \setlength{\belowdisplayskip}{\parskip}
59 \setlength{\belowdisplayshortskip}{.9\parskip}
61 % General-purpose enum environment with correct spacing
62 \newenvironment{enum}%
64 \setlength{\topsep}{.1\parskip}%
65 \setlength{\partopsep}{.1\parskip}%
66 \setlength{\itemsep}{\parskip}%
67 \advance\itemsep by-\parsep}}
70 \def\pre{\begingroup\vskip0pt plus1ex\advance\leftskip by\leftmargin
71 \advance\rightskip by\leftmargin}
72 \def\post{\vskip0pt plus1ex\endgroup}
74 \def\prew{\pre\advance\rightskip by-\leftmargin}
77 \section{Introduction}
80 Sledgehammer is a tool that applies first-order automatic theorem provers (ATPs)
81 on the current goal. The supported ATPs are E \cite{schulz-2002}, SPASS
82 \cite{weidenbach-et-al-2009}, and Vampire \cite{riazanov-voronkov-2002}, which
83 can be run locally or remotely via the SystemOnTPTP web service
84 \cite{sutcliffe-2000}.
86 The problem passed to ATPs consists of the current goal together with a
87 heuristic selection of facts (theorems) from the current theory context,
88 filtered by relevance. The result of a successful ATP proof search is some
89 source text that usually (but not always) reconstructs the proof within
90 Isabelle, without requiring the ATPs again. The reconstructed proof relies on
91 the general-purpose Metis prover \cite{metis}, which is fully integrated into
92 Isabelle/HOL, with explicit inferences going through the kernel. Thus its
93 results are correct by construction.
96 \setbox\boxA=\hbox{\texttt{nospam}}
98 Examples of Sledgehammer use can be found in Isabelle's
99 \texttt{src/HOL/Metis\_Examples} directory.
100 Comments and bug reports concerning Sledgehammer or this manual should be
102 \texttt{blan{\color{white}nospam}\kern-\wd\boxA{}chette@\allowbreak
103 in.\allowbreak tum.\allowbreak de}.
105 \vskip2.5\smallskipamount
107 %\textbf{Acknowledgment.} The author would like to thank Mark Summerfield for
108 %suggesting several textual improvements.
110 \section{Installation}
113 Sledgehammer is part of Isabelle, so you don't need to install it. However, it
114 relies on third-party automatic theorem provers (ATPs). Currently, E, SPASS, and
115 Vampire are supported. All of these are available remotely via SystemOnTPTP
116 \cite{sutcliffe-2000}, but if you want better performance you will need to
117 install at least E and SPASS locally.
119 There are three main ways to install E and SPASS:
122 \item[$\bullet$] If you installed an official Isabelle package with everything
123 inside, it should already include properly setup executables for E and SPASS,
126 \item[$\bullet$] Otherwise, you can download the Isabelle-aware E and SPASS
127 binary packages from Isabelle's download page. Extract the archives, then add a
128 line to your \texttt{\char`\~/.isabelle/etc/components} file with the absolute path to
129 E or SPASS. For example, if \texttt{\char`\~/.isabelle/etc/components} does not exist
130 yet and you extracted SPASS to \texttt{/usr/local/spass-3.7}, create
131 the file \texttt{\char`\~/.isabelle/etc/components} with the single line
134 \texttt{/usr/local/spass-3.7}
137 \item[$\bullet$] If you prefer to build E or SPASS yourself, feel free to do so
138 and set the environment variable \texttt{E\_HOME} or \texttt{SPASS\_HOME} to the
139 directory that contains the \texttt{eproof} or \texttt{SPASS} executable,
143 To check whether E and SPASS are installed, follow the example in
146 \section{First Steps}
149 To illustrate Sledgehammer in context, let us start a theory file and
150 attempt to prove a simple lemma:
153 \textbf{theory}~\textit{Scratch} \\
154 \textbf{imports}~\textit{Main} \\
155 \textbf{begin} \\[2\smallskipamount]
157 \textbf{lemma} ``$[a] = [b] \,\longleftrightarrow\, a = b$'' \\
158 \textbf{sledgehammer}
161 After a few seconds, Sledgehammer produces the following output:
165 Sledgehammer: ATP ``\textit{e}'' for subgoal 1: \\
166 $([a] = [b]) = (a = b)$ \\
167 Try this command: \textbf{by} (\textit{metis hd.simps}). \\
168 To minimize the number of lemmas, try this command: \\
169 \textbf{sledgehammer} \textit{minimize} [\textit{atp} = \textit{e}] (\textit{hd.simps}). \\[3\smallskipamount]
171 Sledgehammer: ATP ``\textit{spass}'' for subgoal 1: \\
172 $([a] = [b]) = (a = b)$ \\
173 Try this command: \textbf{by} (\textit{metis insert\_Nil last\_ConsL}). \\
174 To minimize the number of lemmas, try this command: \\
175 \textbf{sledgehammer} \textit{minimize} [\textit{atp} = \textit{spass}] (\textit{insert\_Nil last\_ConsL}). \\[3\smallskipamount]
177 Sledgehammer: ATP ``\textit{remote\_vampire}'' for subgoal 1: \\
178 $([a] = [b]) = (a = b)$ \\
179 Try this command: \textbf{by} (\textit{metis One\_nat\_def\_raw empty\_replicate} \\
180 \phantom{Try this command: \textbf{by} (\textit{metis~}}\textit{insert\_Nil last\_ConsL replicate\_Suc}). \\
181 To minimize the number of lemmas, try this command: \\
182 \textbf{sledgehammer} \textit{minimize} [\textit{atp} = \textit{remote\_vampire}] \\
183 \phantom{\textbf{sledgehammer}~}(\textit{One\_nat\_def\_raw empty\_replicate insert\_Nil} \\
184 \phantom{\textbf{sledgehammer}~(}\textit{last\_ConsL replicate\_Suc}).
187 Sledgehammer ran E, SPASS, and the remote version of Vampire in parallel. If E
188 and SPASS are not installed (\S\ref{installation}), you will see references to
189 their remote American cousins \textit{remote\_e} and \textit{remote\_spass}
190 instead of \textit{e} and \textit{spass}.
192 Based on each ATP proof, Sledgehammer gives a one-liner proof that uses the
193 \textit{metis} method. You can click them and insert them into the theory text.
194 You can click the ``\textbf{sledgehammer} \textit{minimize}'' command if you
195 want to look for a shorter (and faster) proof. But here the proof found by E
196 looks perfect, so click it to finish the proof.
198 You can ask Sledgehammer for an Isar text proof by passing the
199 \textit{isar\_proof} option:
202 \textbf{sledgehammer} [\textit{isar\_proof}]
205 When Isar proof construction is successful, it can yield proofs that are more
206 readable and also faster than the \textit{metis} one-liners. This feature is
209 \section{Command Syntax}
210 \label{command-syntax}
212 Sledgehammer can be invoked at any point when there is an open goal by entering
213 the \textbf{sledgehammer} command in the theory file. Its general syntax is as
217 \textbf{sledgehammer} \textit{subcommand\/$^?$ options\/$^?$ facts\_override\/$^?$ num\/$^?$}
220 For convenience, Sledgehammer is also available in the ``Commands'' submenu of
221 the ``Isabelle'' menu in Proof General or by pressing the Emacs key sequence C-c
222 C-a C-s. This is equivalent to entering the \textbf{sledgehammer} command with
223 no arguments in the theory text.
225 In the general syntax, the \textit{subcommand} may be any of the following:
228 \item[$\bullet$] \textbf{\textit{run} (the default):} Runs Sledgehammer on subgoal number
229 \textit{num} (1 by default), with the given options and facts.
231 \item[$\bullet$] \textbf{\textit{minimize}:} Attempts to minimize the provided facts
232 (specified in the \textit{facts\_override} argument) to obtain a simpler proof
233 involving fewer facts. The options and goal number are as for \textit{run}.
235 \item[$\bullet$] \textbf{\textit{messages}:} Redisplays recent messages issued by
236 Sledgehammer. This allows you to examine results that might have been lost due
237 to Sledgehammer's asynchronous nature. The \textit{num} argument specifies a
238 limit on the number of messages to display (5 by default).
240 \item[$\bullet$] \textbf{\textit{available\_atps}:} Prints the list of installed ATPs.
241 See \S\ref{installation} and \S\ref{mode-of-operation} for more information on
244 \item[$\bullet$] \textbf{\textit{running\_atps}:} Prints information about currently
245 running ATPs, including elapsed runtime and remaining time until timeout.
247 \item[$\bullet$] \textbf{\textit{kill\_atps}:} Terminates all running ATPs.
249 \item[$\bullet$] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote
250 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}.
253 Sledgehammer's behavior can be influenced by various \textit{options}, which can
254 be specified in brackets after the \textbf{sledgehammer} command. The
255 \textit{options} are a list of key--value pairs of the form ``[$k_1 = v_1,
256 \ldots, k_n = v_n$]''. For Boolean options, ``= \textit{true}'' is optional. For
260 \textbf{sledgehammer} [\textit{isar\_proof}, \,\textit{timeout} = 120$\,s$]
263 Default values can be set using \textbf{sledgehammer\_\allowbreak params}:
266 \textbf{sledgehammer\_params} \textit{options}
269 The supported options are described in \S\ref{option-reference}.
271 The \textit{facts\_override} argument lets you alter the set of facts that go
272 through the relevance filter. It may be of the form ``(\textit{facts})'', where
273 \textit{facts} is a space-separated list of Isabelle facts (theorems, local
274 assumptions, etc.), in which case the relevance filter is bypassed and the given
275 facts are used. It may also be of the form (\textit{add}:\ \textit{facts}$_1$),
276 (\textit{del}:\ \textit{facts}$_2$), or (\textit{add}:\ \textit{facts}$_1$\
277 \textit{del}:\ \textit{facts}$_2$), where the relevance filter is instructed to
278 proceed as usual except that it should consider \textit{facts}$_1$
279 highly-relevant and \textit{facts}$_2$ fully irrelevant.
281 \section{Option Reference}
282 \label{option-reference}
284 \def\flushitem#1{\item[]\noindent\kern-\leftmargin \textbf{#1}}
285 \def\qty#1{$\left<\textit{#1}\right>$}
286 \def\qtybf#1{$\mathbf{\left<\textbf{\textit{#1}}\right>}$}
287 \def\optrue#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\quad [\textit{true}]\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
288 \def\opfalse#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\quad [\textit{false}]\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
289 \def\opsmart#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool\_or\_smart}$\bigr]$\quad [\textit{smart}]\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
290 \def\opsmartx#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool\_or\_smart}$\bigr]$\quad [\textit{smart}]\hfill\\\hbox{}\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
291 \def\opnodefault#1#2{\flushitem{\textit{#1} = \qtybf{#2}} \nopagebreak\\[\parskip]}
292 \def\opdefault#1#2#3{\flushitem{\textit{#1} = \qtybf{#2}\quad [\textit{#3}]} \nopagebreak\\[\parskip]}
293 \def\oparg#1#2#3{\flushitem{\textit{#1} \qtybf{#2} = \qtybf{#3}} \nopagebreak\\[\parskip]}
294 \def\opargbool#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}
295 \def\opargboolorsmart#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool\_or\_smart}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}
297 Sledgehammer's options are categorized as follows:\ mode of operation
298 (\S\ref{mode-of-operation}), problem encoding (\S\ref{problem-encoding}), output
299 format (\S\ref{output-format}), and timeouts (\S\ref{timeouts}).
301 The descriptions below refer to the following syntactic quantities:
304 \item[$\bullet$] \qtybf{string}: A string.
305 \item[$\bullet$] \qtybf{bool\/}: \textit{true} or \textit{false}.
306 \item[$\bullet$] \qtybf{bool\_or\_smart\/}: \textit{true}, \textit{false}, or \textit{smart}.
307 \item[$\bullet$] \qtybf{int\/}: An integer.
308 \item[$\bullet$] \qtybf{time}: An integer followed by $\textit{min}$ (minutes), $s$ (seconds), or \textit{ms}
309 (milliseconds), or the keyword \textit{none} ($\infty$ years).
312 Default values are indicated in square brackets. Boolean options have a negated
313 counterpart (e.g., \textit{debug} vs.\ \textit{no\_debug}). When setting
314 Boolean options, ``= \textit{true}'' may be omitted.
316 \subsection{Mode of Operation}
317 \label{mode-of-operation}
320 %\optrue{blocking}{non\_blocking}
321 %Specifies whether the \textbf{sledgehammer} command should operate synchronously.
322 %The asynchronous (non-blocking) mode lets the user start proving the putative
323 %theorem while Sledgehammer looks for a counterexample, but it can also be more
324 %confusing. For technical reasons, automatic runs currently always block.
326 \opnodefault{atps}{string}
327 Specifies the ATPs (automated theorem provers) to use as a space-separated list
328 (e.g., ``\textit{e}~\textit{spass}''). The following ATPs are supported:
331 \item[$\bullet$] \textbf{\textit{e}:} E is an ATP developed by Stephan Schulz
332 \cite{schulz-2002}. To use E, set the environment variable
333 \texttt{E\_HOME} to the directory that contains the \texttt{eproof} executable,
334 or install the prebuilt E package from Isabelle's download page. See
335 \S\ref{installation} for details.
337 \item[$\bullet$] \textbf{\textit{spass}:} SPASS is an ATP developed by Christoph
338 Weidenbach et al.\ \cite{weidenbach-et-al-2009}. To use SPASS, set the
339 environment variable \texttt{SPASS\_HOME} to the directory that contains the
340 \texttt{SPASS} executable, or install the prebuilt SPASS package from Isabelle's
341 download page. See \S\ref{installation} for details.
343 \item[$\bullet$] \textbf{\textit{spass\_tptp}:} Same as the above, except that
344 Sledgehammer communicates with SPASS using the TPTP syntax rather than the
345 native DFG syntax. This ATP is provided for experimental purposes.
347 \item[$\bullet$] \textbf{\textit{vampire}:} Vampire is an ATP developed by
348 Andrei Voronkov and his colleagues \cite{riazanov-voronkov-2002}. To use
349 Vampire, set the environment variable \texttt{VAMPIRE\_HOME} to the directory
350 that contains the \texttt{vampire} executable.
352 \item[$\bullet$] \textbf{\textit{remote\_e}:} The remote version of E executes
353 on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.
355 \item[$\bullet$] \textbf{\textit{remote\_spass}:} The remote version of SPASS
356 executes on Geoff Sutcliffe's Miami servers.
358 \item[$\bullet$] \textbf{\textit{remote\_vampire}:} The remote version of
359 Vampire executes on Geoff Sutcliffe's Miami servers. Version 9 is used.
363 By default, Sledgehammer will run E, SPASS, and Vampire in parallel. For E and
364 SPASS, it will use any locally installed version if available, falling back
365 on the remote versions if necessary. For historical reasons, the default value
366 of this option can be overridden using the option ``Sledgehammer: ATPs'' from
367 the ``Isabelle'' menu in Proof General.
369 It is a good idea to run several ATPs in parallel, although it could slow down
370 your machine. Tobias Nipkow observed that running E, SPASS, and Vampire together
371 for 5 seconds yields the same success rate as running the most effective of
372 these (Vampire) for 120 seconds \cite{boehme-nipkow-2010}.
374 \opnodefault{atp}{string}
375 Alias for \textit{atps}.
377 \opfalse{overlord}{no\_overlord}
378 Specifies whether Sledgehammer should put its temporary files in
379 \texttt{\$ISA\-BELLE\_\allowbreak HOME\_\allowbreak USER}, which is useful for
380 debugging Sledgehammer but also unsafe if several instances of the tool are run
381 simultaneously. The files are identified by the prefix \texttt{prob\_}; you may
382 safely remove them after Sledgehammer has run.
385 {\small See also \textit{debug} (\S\ref{output-format}).}
388 \subsection{Problem Encoding}
389 \label{problem-encoding}
392 \opfalse{explicit\_apply}{implicit\_apply}
393 Specifies whether function application should be encoded as an explicit
394 ``apply'' operator. If the option is set to \textit{false}, each function will
395 be directly applied to as many arguments as possible. Enabling this option can
396 sometimes help discover higher-order proofs that otherwise would not be found.
398 \opfalse{full\_types}{partial\_types}
399 Specifies whether full-type information is exported. Enabling this option can
400 prevent the discovery of type-incorrect proofs, but it also tends to slow down
401 the ATPs significantly. For historical reasons, the default value of this option
402 can be overridden using the option ``Sledgehammer: ATPs'' from the ``Isabelle''
403 menu in Proof General.
405 \opdefault{relevance\_threshold}{int}{50}
406 Specifies the threshold above which facts are considered relevant by the
407 relevance filter. The option ranges from 0 to 100, where 0 means that all
408 theorems are relevant.
410 \opdefault{relevance\_convergence}{int}{320}
411 Specifies the convergence quotient, multiplied by 100, used by the relevance
412 filter. This quotient is used by the relevance filter to scale down the
413 relevance of facts at each iteration of the filter.
415 \opsmartx{theory\_relevant}{theory\_irrelevant}
416 Specifies whether the theory from which a fact comes should be taken into
417 consideration by the relevance filter. If the option is set to \textit{smart},
418 it is taken to be \textit{true} for SPASS and \textit{false} for E and Vampire,
419 because empirical results suggest that these are the best settings.
421 \opfalse{defs\_relevant}{defs\_irrelevant}
422 Specifies whether the definition of constants occurring in the formula to prove
423 should be considered particularly relevant. Enabling this option tends to lead
424 to larger problems and typically slows down the ATPs.
426 \optrue{respect\_no\_atp}{ignore\_no\_atp}
427 Specifies whether Sledgehammer should honor the \textit{no\_atp} attributes. The
428 \textit{no\_atp} attributes marks theorems that tend to confuse ATPs, typically
429 because they can lead to unsound ATP proofs \cite{boehme-nipkow-2010}. It is
430 normally a good idea to leave this option enabled, unless you are debugging
435 \subsection{Output Format}
436 \label{output-format}
440 \opfalse{verbose}{quiet}
441 Specifies whether the \textbf{sledgehammer} command should explain what it does.
443 \opfalse{debug}{no\_debug}
444 Specifies whether Nitpick should display additional debugging information beyond
445 what \textit{verbose} already displays. Enabling \textit{debug} also enables
446 \textit{verbose} behind the scenes.
449 {\small See also \textit{overlord} (\S\ref{mode-of-operation}).}
451 \opfalse{isar\_proof}{no\_isar\_proof}
452 Specifies whether Isar proofs should be output in addition to one-liner
453 \textit{metis} proofs. Isar proof construction is still experimental and often
454 fails; however, they are usually faster and sometimes more robust than
455 \textit{metis} proofs.
457 \opdefault{isar\_shrink\_factor}{int}{1}
458 Specifies the granularity of the Isar proof. A value of $n$ indicates that each
459 Isar proof step should correspond to a group of up to $n$ consecutive proof
460 steps in the ATP proof.
464 \subsection{Timeouts}
468 \opdefault{timeout}{time}{$\mathbf{60}$ s}
469 Specifies the maximum amount of time that the ATPs should spend looking for a
470 proof. For historical reasons, the default value of this option can be
471 overridden using the option ``Sledgehammer: Time Limit'' from the ``Isabelle''
472 menu in Proof General.
474 \opdefault{minimize\_timeout}{time}{$\mathbf{5}$\,s}
475 Specifies the maximum amount of time that the ATPs should spend looking for a
476 proof for \textbf{sledgehammer}~\textit{minimize}.
480 \bibliography{../manual}{}
481 \bibliographystyle{abbrv}