doc-src/Sledgehammer/sledgehammer.tex
author blanchet
Sat, 04 Feb 2012 12:08:18 +0100
changeset 47237 d4754183ccce
parent 47194 2ded91a6cbd5
child 47239 cafa325419f6
permissions -rw-r--r--
made option available to users (mostly for experiments)
     1 \documentclass[a4paper,12pt]{article}
     2 \usepackage[T1]{fontenc}
     3 \usepackage{amsmath}
     4 \usepackage{amssymb}
     5 \usepackage[english,french]{babel}
     6 \usepackage{color}
     7 \usepackage{footmisc}
     8 \usepackage{graphicx}
     9 %\usepackage{mathpazo}
    10 \usepackage{multicol}
    11 \usepackage{stmaryrd}
    12 %\usepackage[scaled=.85]{beramono}
    13 \usepackage{../../lib/texinputs/isabelle,../iman,../pdfsetup}
    14 
    15 \newcommand\download{\url{http://www21.in.tum.de/~blanchet/\#software}}
    16 
    17 \def\qty#1{\ensuremath{\left<\mathit{#1\/}\right>}}
    18 \def\qtybf#1{$\mathbf{\left<\textbf{\textit{#1\/}}\right>}$}
    19 
    20 \newcommand\const[1]{\textsf{#1}}
    21 
    22 %\oddsidemargin=4.6mm
    23 %\evensidemargin=4.6mm
    24 %\textwidth=150mm
    25 %\topmargin=4.6mm
    26 %\headheight=0mm
    27 %\headsep=0mm
    28 %\textheight=234mm
    29 
    30 \def\Colon{\mathord{:\mkern-1.5mu:}}
    31 %\def\lbrakk{\mathopen{\lbrack\mkern-3.25mu\lbrack}}
    32 %\def\rbrakk{\mathclose{\rbrack\mkern-3.255mu\rbrack}}
    33 \def\lparr{\mathopen{(\mkern-4mu\mid}}
    34 \def\rparr{\mathclose{\mid\mkern-4mu)}}
    35 
    36 \def\unk{{?}}
    37 \def\undef{(\lambda x.\; \unk)}
    38 %\def\unr{\textit{others}}
    39 \def\unr{\ldots}
    40 \def\Abs#1{\hbox{\rm{\flqq}}{\,#1\,}\hbox{\rm{\frqq}}}
    41 \def\Q{{\smash{\lower.2ex\hbox{$\scriptstyle?$}}}}
    42 
    43 \urlstyle{tt}
    44 
    45 \begin{document}
    46 
    47 %%% TYPESETTING
    48 %\renewcommand\labelitemi{$\bullet$}
    49 \renewcommand\labelitemi{\raise.065ex\hbox{\small\textbullet}}
    50 
    51 \selectlanguage{english}
    52 
    53 \title{\includegraphics[scale=0.5]{isabelle_sledgehammer} \\[4ex]
    54 Hammering Away \\[\smallskipamount]
    55 \Large A User's Guide to Sledgehammer for Isabelle/HOL}
    56 \author{\hbox{} \\
    57 Jasmin Christian Blanchette \\
    58 {\normalsize Institut f\"ur Informatik, Technische Universit\"at M\"unchen} \\[4\smallskipamount]
    59 {\normalsize with contributions from} \\[4\smallskipamount]
    60 Lawrence C. Paulson \\
    61 {\normalsize Computer Laboratory, University of Cambridge} \\
    62 \hbox{}}
    63 
    64 \maketitle
    65 
    66 \tableofcontents
    67 
    68 \setlength{\parskip}{.7em plus .2em minus .1em}
    69 \setlength{\parindent}{0pt}
    70 \setlength{\abovedisplayskip}{\parskip}
    71 \setlength{\abovedisplayshortskip}{.9\parskip}
    72 \setlength{\belowdisplayskip}{\parskip}
    73 \setlength{\belowdisplayshortskip}{.9\parskip}
    74 
    75 % General-purpose enum environment with correct spacing
    76 \newenvironment{enum}%
    77     {\begin{list}{}{%
    78         \setlength{\topsep}{.1\parskip}%
    79         \setlength{\partopsep}{.1\parskip}%
    80         \setlength{\itemsep}{\parskip}%
    81         \advance\itemsep by-\parsep}}
    82     {\end{list}}
    83 
    84 \def\pre{\begingroup\vskip0pt plus1ex\advance\leftskip by\leftmargin
    85 \advance\rightskip by\leftmargin}
    86 \def\post{\vskip0pt plus1ex\endgroup}
    87 
    88 \def\prew{\pre\advance\rightskip by-\leftmargin}
    89 \def\postw{\post}
    90 
    91 \section{Introduction}
    92 \label{introduction}
    93 
    94 Sledgehammer is a tool that applies automatic theorem provers (ATPs)
    95 and satisfiability-modulo-theories (SMT) solvers on the current goal. The
    96 supported ATPs are E \cite{schulz-2002}, E-SInE \cite{sine}, E-ToFoF
    97 \cite{tofof}, LEO-II \cite{leo2}, Satallax \cite{satallax}, SNARK \cite{snark},
    98 SPASS \cite{weidenbach-et-al-2009}, Vampire \cite{riazanov-voronkov-2002}, and
    99 Waldmeister \cite{waldmeister}. The ATPs are run either locally or remotely via
   100 the System\-On\-TPTP web service \cite{sutcliffe-2000}. In addition to the ATPs,
   101 the SMT solvers Z3 \cite{z3} is used by default, and you can tell Sledgehammer
   102 to try CVC3 \cite{cvc3} and Yices \cite{yices} as well; these are run either
   103 locally or on a server at the TU M\"unchen.
   104 
   105 The problem passed to the automatic provers consists of your current goal
   106 together with a heuristic selection of hundreds of facts (theorems) from the
   107 current theory context, filtered by relevance. Because jobs are run in the
   108 background, you can continue to work on your proof by other means. Provers can
   109 be run in parallel. Any reply (which may arrive half a minute later) will appear
   110 in the Proof General response buffer.
   111 
   112 The result of a successful proof search is some source text that usually (but
   113 not always) reconstructs the proof within Isabelle. For ATPs, the reconstructed
   114 proof relies on the general-purpose \textit{metis} proof method, which
   115 integrates the Metis ATP in Isabelle/HOL with explicit inferences going through
   116 the kernel. Thus its results are correct by construction.
   117 
   118 In this manual, we will explicitly invoke the \textbf{sledgehammer} command.
   119 Sledgehammer also provides an automatic mode that can be enabled via the ``Auto
   120 Sledgehammer'' option in Proof General's ``Isabelle'' menu. In this mode,
   121 Sledgehammer is run on every newly entered theorem. The time limit for Auto
   122 Sledgehammer and other automatic tools can be set using the ``Auto Tools Time
   123 Limit'' option.
   124 
   125 \newbox\boxA
   126 \setbox\boxA=\hbox{\texttt{NOSPAM}}
   127 
   128 \newcommand\authoremail{\texttt{blan{\color{white}NOSPAM}\kern-\wd\boxA{}chette@\allowbreak
   129 in.\allowbreak tum.\allowbreak de}}
   130 
   131 To run Sledgehammer, you must make sure that the theory \textit{Sledgehammer} is
   132 imported---this is rarely a problem in practice since it is part of
   133 \textit{Main}. Examples of Sledgehammer use can be found in Isabelle's
   134 \texttt{src/HOL/Metis\_Examples} directory.
   135 Comments and bug reports concerning Sledgehammer or this manual should be
   136 directed to the author at \authoremail.
   137 
   138 \vskip2.5\smallskipamount
   139 
   140 %\textbf{Acknowledgment.} The author would like to thank Mark Summerfield for
   141 %suggesting several textual improvements.
   142 
   143 \section{Installation}
   144 \label{installation}
   145 
   146 Sledgehammer is part of Isabelle, so you don't need to install it. However, it
   147 relies on third-party automatic provers (ATPs and SMT solvers).
   148 
   149 Among the ATPs, E, LEO-II, Satallax, SPASS, and Vampire can be run locally; in
   150 addition, E, E-SInE, E-ToFoF, iProver, iProver-Eq, LEO-II, Satallax, SNARK,
   151 Waldmeister, and Vampire are available remotely via System\-On\-TPTP
   152 \cite{sutcliffe-2000}. If you want better performance, you should at least
   153 install E and SPASS locally.
   154 
   155 Among the SMT solvers, CVC3, Yices, and Z3 can be run locally, and CVC3 and Z3
   156 can be run remotely on a TU M\"unchen server. If you want better performance and
   157 get the ability to replay proofs that rely on the \emph{smt} proof method
   158 without an Internet connection, you should at least install Z3 locally.
   159 
   160 There are three main ways to install automatic provers on your machine:
   161 
   162 \begin{sloppy}
   163 \begin{enum}
   164 \item[\labelitemi] If you installed an official Isabelle package, it should
   165 already include properly setup executables for CVC3, E, SPASS, and Z3, ready to use.%
   166 \footnote{Vampire's and Yices's licenses prevent us from doing the same for
   167 these otherwise remarkable tools.}
   168 For Z3, you must additionally set the variable
   169 \texttt{Z3\_NON\_COMMERCIAL} to ``yes'' to confirm that you are a
   170 noncommercial user, either in the environment in which Isabelle is
   171 launched or in your
   172 \texttt{\char`\~/\$ISABELLE\_HOME\_USER/etc/settings} file.
   173 
   174 \item[\labelitemi] Alternatively, you can download the Isabelle-aware CVC3, E,
   175 SPASS, and Z3 binary packages from \download. Extract the archives, then add a
   176 line to your \texttt{\$ISABELLE\_HOME\_USER\slash etc\slash components}%
   177 \footnote{The variable \texttt{\$ISABELLE\_HOME\_USER} is set by Isabelle at
   178 startup. Its value can be retrieved by executing \texttt{isabelle}
   179 \texttt{getenv} \texttt{ISABELLE\_HOME\_USER} on the command line.}
   180 file with the absolute path to CVC3, E, SPASS, or Z3. For example, if the
   181 \texttt{components} file does not exist yet and you extracted SPASS to
   182 \texttt{/usr/local/spass-3.7}, create it with the single line
   183 
   184 \prew
   185 \texttt{/usr/local/spass-3.7}
   186 \postw
   187 
   188 (including an invisible newline character) in it.
   189 
   190 \item[\labelitemi] If you prefer to build E, LEO-II, Satallax, or SPASS
   191 manually, or found a Vampire executable somewhere (e.g.,
   192 \url{http://www.vprover.org/}), set the environment variable \texttt{E\_HOME},
   193 \texttt{LEO2\_HOME}, \texttt{SATALLAX\_HOME}, \texttt{SPASS\_HOME}, or
   194 \texttt{VAMPIRE\_HOME} to the directory that contains the \texttt{eproof},
   195 \texttt{leo}, \texttt{satallax}, \texttt{SPASS}, or \texttt{vampire} executable.
   196 Sledgehammer has been tested with E 1.0 to 1.4, LEO-II 1.2.9, Satallax 2.2,
   197 SPASS 3.5 and 3.7, and Vampire 0.6, 1.0, and 1.8%
   198 \footnote{Following the rewrite of Vampire, the counter for version numbers was
   199 reset to 0; hence the (new) Vampire versions 0.6, 1.0, and 1.8 are more recent
   200 than 9.0 or 11.5.}%
   201 . Since the ATPs' output formats are neither documented nor stable, other
   202 versions of the ATPs might not work well with Sledgehammer. Ideally,
   203 also set \texttt{E\_VERSION}, \texttt{LEO2\_VERSION},
   204 \texttt{SATALLAX\_VERSION}, \texttt{SPASS\_VERSION}, or
   205 \texttt{VAMPIRE\_VERSION} to the prover's version number (e.g., ``1.4'').
   206 
   207 Similarly, if you want to build CVC3, or found a
   208 Yices or Z3 executable somewhere (e.g.,
   209 \url{http://yices.csl.sri.com/download.shtml} or
   210 \url{http://research.microsoft.com/en-us/um/redmond/projects/z3/download.html}),
   211 set the environment variable \texttt{CVC3\_\allowbreak SOLVER},
   212 \texttt{YICES\_SOLVER}, or \texttt{Z3\_SOLVER} to the complete path of
   213 the executable, \emph{including the file name}. Sledgehammer has been tested
   214 with CVC3 2.2, Yices 1.0.28, and Z3 3.0 to 3.2.
   215 Since the SMT solvers' output formats are somewhat unstable, other
   216 versions of the solvers might not work well with Sledgehammer. Ideally,
   217 also set \texttt{CVC3\_VERSION}, \texttt{YICES\_VERSION}, or
   218 \texttt{Z3\_VERSION} to the solver's version number (e.g., ``3.2'').
   219 \end{enum}
   220 \end{sloppy}
   221 
   222 To check whether E, SPASS, Vampire, and/or Z3 are successfully installed, try
   223 out the example in \S\ref{first-steps}. If the remote versions of any of these
   224 provers is used (identified by the prefix ``\emph{remote\_\/}''), or if the
   225 local versions fail to solve the easy goal presented there, something must be
   226 wrong with the installation.
   227 
   228 Remote prover invocation requires Perl with the World Wide Web Library
   229 (\texttt{libwww-perl}) installed. If you must use a proxy server to access the
   230 Internet, set the \texttt{http\_proxy} environment variable to the proxy, either
   231 in the environment in which Isabelle is launched or in your
   232 \texttt{\char`\~/\$ISABELLE\_HOME\_USER/etc/settings} file. Here are a few
   233 examples:
   234 
   235 \prew
   236 \texttt{http\_proxy=http://proxy.example.org} \\
   237 \texttt{http\_proxy=http://proxy.example.org:8080} \\
   238 \texttt{http\_proxy=http://joeblow:pAsSwRd@proxy.example.org}
   239 \postw
   240 
   241 \section{First Steps}
   242 \label{first-steps}
   243 
   244 To illustrate Sledgehammer in context, let us start a theory file and
   245 attempt to prove a simple lemma:
   246 
   247 \prew
   248 \textbf{theory}~\textit{Scratch} \\
   249 \textbf{imports}~\textit{Main} \\
   250 \textbf{begin} \\[2\smallskipamount]
   251 %
   252 \textbf{lemma} ``$[a] = [b] \,\Longrightarrow\, a = b$'' \\
   253 \textbf{sledgehammer}
   254 \postw
   255 
   256 Instead of issuing the \textbf{sledgehammer} command, you can also find
   257 Sledgehammer in the ``Commands'' submenu of the ``Isabelle'' menu in Proof
   258 General or press the Emacs key sequence C-c C-a C-s.
   259 Either way, Sledgehammer produces the following output after a few seconds:
   260 
   261 \prew
   262 \slshape
   263 Sledgehammer: ``\textit{e\/}'' on goal \\
   264 $[a] = [b] \,\Longrightarrow\, a = b$ \\
   265 Try this: \textbf{by} (\textit{metis last\_ConsL}) (64 ms). \\[3\smallskipamount]
   266 %
   267 Sledgehammer: ``\textit{z3\/}'' on goal \\
   268 $[a] = [b] \,\Longrightarrow\, a = b$ \\
   269 Try this: \textbf{by} (\textit{metis list.inject}) (20 ms). \\[3\smallskipamount]
   270 %
   271 Sledgehammer: ``\textit{vampire\/}'' on goal \\
   272 $[a] = [b] \,\Longrightarrow\, a = b$ \\
   273 Try this: \textbf{by} (\textit{metis hd.simps}) (14 ms). \\[3\smallskipamount]
   274 %
   275 Sledgehammer: ``\textit{spass\/}'' on goal \\
   276 $[a] = [b] \,\Longrightarrow\, a = b$ \\
   277 Try this: \textbf{by} (\textit{metis list.inject}) (17 ms). \\[3\smallskipamount]
   278 %
   279 Sledgehammer: ``\textit{remote\_waldmeister\/}'' on goal \\
   280 $[a] = [b] \,\Longrightarrow\, a = b$ \\
   281 Try this: \textbf{by} (\textit{metis hd.simps}) (15 ms). \\[3\smallskipamount]
   282 %
   283 Sledgehammer: ``\textit{remote\_e\_sine\/}'' on goal \\
   284 $[a] = [b] \,\Longrightarrow\, a = b$ \\
   285 Try this: \textbf{by} (\textit{metis hd.simps}) (18 ms).
   286 \postw
   287 
   288 Sledgehammer ran E, E-SInE, SPASS, Vampire, Waldmeister, and Z3 in parallel.
   289 Depending on which provers are installed and how many processor cores are
   290 available, some of the provers might be missing or present with a
   291 \textit{remote\_} prefix. Waldmeister is run only for unit equational problems,
   292 where the goal's conclusion is a (universally quantified) equation.
   293 
   294 For each successful prover, Sledgehammer gives a one-liner proof that uses
   295 the \textit{metis} or \textit{smt} proof method. Approximate timings are shown
   296 in parentheses, indicating how fast the call is. You can click the proof to
   297 insert it into the theory text.
   298 
   299 In addition, you can ask Sledgehammer for an Isar text proof by passing the
   300 \textit{isar\_proof} option (\S\ref{output-format}):
   301 
   302 \prew
   303 \textbf{sledgehammer} [\textit{isar\_proof}]
   304 \postw
   305 
   306 When Isar proof construction is successful, it can yield proofs that are more
   307 readable and also faster than the \textit{metis} or \textit{smt} one-liners.
   308 This feature is experimental and is only available for ATPs.
   309 
   310 \section{Hints}
   311 \label{hints}
   312 
   313 This section presents a few hints that should help you get the most out of
   314 Sledgehammer. Frequently (and infrequently) asked questions are answered in
   315 \S\ref{frequently-asked-questions}.
   316 
   317 %\newcommand\point[1]{\medskip\par{\sl\bfseries#1}\par\nopagebreak}
   318 \newcommand\point[1]{\subsection{\emph{#1}}}
   319 
   320 \point{Presimplify the goal}
   321 
   322 For best results, first simplify your problem by calling \textit{auto} or at
   323 least \textit{safe} followed by \textit{simp\_all}. The SMT solvers provide
   324 arithmetic decision procedures, but the ATPs typically do not (or if they do,
   325 Sledgehammer does not use it yet). Apart from Waldmeister, they are not
   326 especially good at heavy rewriting, but because they regard equations as
   327 undirected, they often prove theorems that require the reverse orientation of a
   328 \textit{simp} rule. Higher-order problems can be tackled, but the success rate
   329 is better for first-order problems. Hence, you may get better results if you
   330 first simplify the problem to remove higher-order features.
   331 
   332 \point{Make sure E, SPASS, Vampire, and Z3 are locally installed}
   333 
   334 Locally installed provers are faster and more reliable than those running on
   335 servers. See \S\ref{installation} for details on how to install them.
   336 
   337 \point{Familiarize yourself with the most important options}
   338 
   339 Sledgehammer's options are fully documented in \S\ref{command-syntax}. Many of
   340 the options are very specialized, but serious users of the tool should at least
   341 familiarize themselves with the following options:
   342 
   343 \begin{enum}
   344 \item[\labelitemi] \textbf{\textit{provers}} (\S\ref{mode-of-operation}) specifies
   345 the automatic provers (ATPs and SMT solvers) that should be run whenever
   346 Sledgehammer is invoked (e.g., ``\textit{provers}~= \textit{e spass
   347 remote\_vampire\/}''). For convenience, you can omit ``\textit{provers}~=''
   348 and simply write the prover names as a space-separated list (e.g., ``\textit{e
   349 spass remote\_vampire\/}'').
   350 
   351 \item[\labelitemi] \textbf{\textit{max\_relevant}} (\S\ref{relevance-filter})
   352 specifies the maximum number of facts that should be passed to the provers. By
   353 default, the value is prover-dependent but varies between about 150 and 1000. If
   354 the provers time out, you can try lowering this value to, say, 100 or 50 and see
   355 if that helps.
   356 
   357 \item[\labelitemi] \textbf{\textit{isar\_proof}} (\S\ref{output-format}) specifies
   358 that Isar proofs should be generated, instead of one-liner \textit{metis} or
   359 \textit{smt} proofs. The length of the Isar proofs can be controlled by setting
   360 \textit{isar\_shrink\_factor} (\S\ref{output-format}).
   361 
   362 \item[\labelitemi] \textbf{\textit{timeout}} (\S\ref{timeouts}) controls the
   363 provers' time limit. It is set to 30 seconds, but since Sledgehammer runs
   364 asynchronously you should not hesitate to raise this limit to 60 or 120 seconds
   365 if you are the kind of user who can think clearly while ATPs are active.
   366 \end{enum}
   367 
   368 Options can be set globally using \textbf{sledgehammer\_params}
   369 (\S\ref{command-syntax}). The command also prints the list of all available
   370 options with their current value. Fact selection can be influenced by specifying
   371 ``$(\textit{add}{:}~\textit{my\_facts})$'' after the \textbf{sledgehammer} call
   372 to ensure that certain facts are included, or simply ``$(\textit{my\_facts})$''
   373 to force Sledgehammer to run only with $\textit{my\_facts}$.
   374 
   375 \section{Frequently Asked Questions}
   376 \label{frequently-asked-questions}
   377 
   378 This sections answers frequently (and infrequently) asked questions about
   379 Sledgehammer. It is a good idea to skim over it now even if you don't have any
   380 questions at this stage. And if you have any further questions not listed here,
   381 send them to the author at \authoremail.
   382 
   383 \point{Which facts are passed to the automatic provers?}
   384 
   385 The relevance filter assigns a score to every available fact (lemma, theorem,
   386 definition, or axiom) based upon how many constants that fact shares with the
   387 conjecture. This process iterates to include facts relevant to those just
   388 accepted, but with a decay factor to ensure termination. The constants are
   389 weighted to give unusual ones greater significance. The relevance filter copes
   390 best when the conjecture contains some unusual constants; if all the constants
   391 are common, it is unable to discriminate among the hundreds of facts that are
   392 picked up. The relevance filter is also memoryless: It has no information about
   393 how many times a particular fact has been used in a proof, and it cannot learn.
   394 
   395 The number of facts included in a problem varies from prover to prover, since
   396 some provers get overwhelmed more easily than others. You can show the number of
   397 facts given using the \textit{verbose} option (\S\ref{output-format}) and the
   398 actual facts using \textit{debug} (\S\ref{output-format}).
   399 
   400 Sledgehammer is good at finding short proofs combining a handful of existing
   401 lemmas. If you are looking for longer proofs, you must typically restrict the
   402 number of facts, by setting the \textit{max\_relevant} option
   403 (\S\ref{relevance-filter}) to, say, 25 or 50.
   404 
   405 You can also influence which facts are actually selected in a number of ways. If
   406 you simply want to ensure that a fact is included, you can specify it using the
   407 ``$(\textit{add}{:}~\textit{my\_facts})$'' syntax. For example:
   408 %
   409 \prew
   410 \textbf{sledgehammer} (\textit{add}: \textit{hd.simps} \textit{tl.simps})
   411 \postw
   412 %
   413 The specified facts then replace the least relevant facts that would otherwise be
   414 included; the other selected facts remain the same.
   415 If you want to direct the selection in a particular direction, you can specify
   416 the facts via \textbf{using}:
   417 %
   418 \prew
   419 \textbf{using} \textit{hd.simps} \textit{tl.simps} \\
   420 \textbf{sledgehammer}
   421 \postw
   422 %
   423 The facts are then more likely to be selected than otherwise, and if they are
   424 selected at iteration $j$ they also influence which facts are selected at
   425 iterations $j + 1$, $j + 2$, etc. To give them even more weight, try
   426 %
   427 \prew
   428 \textbf{using} \textit{hd.simps} \textit{tl.simps} \\
   429 \textbf{apply}~\textbf{--} \\
   430 \textbf{sledgehammer}
   431 \postw
   432 
   433 \point{Why does Metis fail to reconstruct the proof?}
   434 
   435 There are many reasons. If Metis runs seemingly forever, that is a sign that the
   436 proof is too difficult for it. Metis's search is complete, so it should
   437 eventually find it, but that's little consolation. There are several possible
   438 solutions:
   439 
   440 \begin{enum}
   441 \item[\labelitemi] Try the \textit{isar\_proof} option (\S\ref{output-format}) to
   442 obtain a step-by-step Isar proof where each step is justified by \textit{metis}.
   443 Since the steps are fairly small, \textit{metis} is more likely to be able to
   444 replay them.
   445 
   446 \item[\labelitemi] Try the \textit{smt} proof method instead of \textit{metis}. It
   447 is usually stronger, but you need to either have Z3 available to replay the
   448 proofs, trust the SMT solver, or use certificates. See the documentation in the
   449 \emph{SMT} theory (\texttt{\$ISABELLE\_HOME/src/HOL/SMT.thy}) for details.
   450 
   451 \item[\labelitemi] Try the \textit{blast} or \textit{auto} proof methods, passing
   452 the necessary facts via \textbf{unfolding}, \textbf{using}, \textit{intro}{:},
   453 \textit{elim}{:}, \textit{dest}{:}, or \textit{simp}{:}, as appropriate.
   454 \end{enum}
   455 
   456 In some rare cases, \textit{metis} fails fairly quickly, and you get the error
   457 message
   458 
   459 \prew
   460 \slshape
   461 One-line proof reconstruction failed.
   462 \postw
   463 
   464 This message indicates that Sledgehammer determined that the goal is provable,
   465 but the proof is, for technical reasons, beyond \textit{metis}'s power. You can
   466 then try again with the \textit{strict} option (\S\ref{problem-encoding}).
   467 
   468 If the goal is actually unprovable are you did not specify an unsound encoding
   469 using \textit{type\_enc} (\S\ref{problem-encoding}), this is a bug, and you are
   470 strongly encouraged to report this to the author at \authoremail.
   471 
   472 \point{Why are the generated Isar proofs so ugly/broken?}
   473 
   474 The current implementation of the Isar proof feature,
   475 enabled by the \textit{isar\_proof} option (\S\ref{output-format}),
   476 is highly experimental. Work on a new implementation has begun. There is a large body of
   477 research into transforming resolution proofs into natural deduction proofs (such
   478 as Isar proofs), which we hope to leverage. In the meantime, a workaround is to
   479 set the \textit{isar\_shrink\_factor} option (\S\ref{output-format}) to a larger
   480 value or to try several provers and keep the nicest-looking proof.
   481 
   482 \point{How can I tell whether a suggested proof is sound?}
   483 
   484 Earlier versions of Sledgehammer often suggested unsound proofs---either proofs
   485 of nontheorems or simply proofs that rely on type-unsound inferences. This
   486 is a thing of the pass, unless you explicitly specify an unsound encoding
   487 using \textit{type\_enc} (\S\ref{problem-encoding}).
   488 %
   489 Officially, the only form of ``unsoundness'' that lurks in the sound
   490 encodings is related to missing characteristic theorems of datatypes. For
   491 example,
   492 
   493 \prew
   494 \textbf{lemma}~``$\exists \mathit{xs}.\; \mathit{xs} \neq []$'' \\
   495 \textbf{sledgehammer} ()
   496 \postw
   497 
   498 suggests an argumentless \textit{metis} call that fails. However, the conjecture
   499 does actually hold, and the \textit{metis} call can be repaired by adding
   500 \textit{list.distinct}.
   501 %
   502 We hope to address this problem in a future version of Isabelle. In the
   503 meantime, you can avoid it by passing the \textit{strict} option
   504 (\S\ref{problem-encoding}).
   505 
   506 \point{What are the \textit{full\_types}, \textit{no\_types}, and
   507 \textit{mono\_tags} arguments to Metis?}
   508 
   509 The \textit{metis}~(\textit{full\_types}) proof method
   510 and its cousin \textit{metis}~(\textit{mono\_tags}) are fully-typed
   511 version of Metis. It is somewhat slower than \textit{metis}, but the proof
   512 search is fully typed, and it also includes more powerful rules such as the
   513 axiom ``$x = \const{True} \mathrel{\lor} x = \const{False}$'' for reasoning in
   514 higher-order places (e.g., in set comprehensions). The method kicks in
   515 automatically as a fallback when \textit{metis} fails, and it is sometimes
   516 generated by Sledgehammer instead of \textit{metis} if the proof obviously
   517 requires type information or if \textit{metis} failed when Sledgehammer
   518 preplayed the proof. (By default, Sledgehammer tries to run \textit{metis} with
   519 various options for up to 3 seconds each time to ensure that the generated
   520 one-line proofs actually work and to display timing information. This can be
   521 configured using the \textit{preplay\_timeout} option (\S\ref{timeouts}).)
   522 %
   523 At the other end of the soundness spectrum, \textit{metis} (\textit{no\_types})
   524 uses no type information at all during the proof search, which is more efficient
   525 but often fails. Calls to \textit{metis} (\textit{no\_types}) are occasionally
   526 generated by Sledgehammer.
   527 %
   528 See the \textit{type\_enc} option (\S\ref{problem-encoding}) for details.
   529 
   530 Incidentally, if you ever see warnings such as
   531 
   532 \prew
   533 \slshape
   534 Metis: Falling back on ``\textit{metis} (\textit{full\_types})''.
   535 \postw
   536 
   537 for a successful \textit{metis} proof, you can advantageously pass the
   538 \textit{full\_types} option to \textit{metis} directly.
   539 
   540 \point{And what are the \textit{lifting} and \textit{hide\_lams} arguments
   541 to Metis?}
   542 
   543 Orthogonally to the encoding of types, it is important to choose an appropriate
   544 translation of $\lambda$-abstractions. Metis supports three translation schemes,
   545 in decreasing order of power: Curry combinators (the default),
   546 $\lambda$-lifting, and a ``hiding'' scheme that disables all reasoning under
   547 $\lambda$-abstractions. The more powerful schemes also give the automatic
   548 provers more rope to hang themselves. See the \textit{lam\_trans} option (\S\ref{problem-encoding}) for details.
   549 
   550 \point{Are generated proofs minimal?}
   551 
   552 Automatic provers frequently use many more facts than are necessary.
   553 Sledgehammer inclues a minimization tool that takes a set of facts returned by a
   554 given prover and repeatedly calls the same prover, \textit{metis}, or
   555 \textit{smt} with subsets of those axioms in order to find a minimal set.
   556 Reducing the number of axioms typically improves Metis's speed and success rate,
   557 while also removing superfluous clutter from the proof scripts.
   558 
   559 In earlier versions of Sledgehammer, generated proofs were systematically
   560 accompanied by a suggestion to invoke the minimization tool. This step is now
   561 performed implicitly if it can be done in a reasonable amount of time (something
   562 that can be guessed from the number of facts in the original proof and the time
   563 it took to find or preplay it).
   564 
   565 In addition, some provers (e.g., Yices) do not provide proofs or sometimes
   566 produce incomplete proofs. The minimizer is then invoked to find out which facts
   567 are actually needed from the (large) set of facts that was initinally given to
   568 the prover. Finally, if a prover returns a proof with lots of facts, the
   569 minimizer is invoked automatically since Metis would be unlikely to re-find the
   570 proof.
   571 %
   572 Automatic minimization can be forced or disabled using the \textit{minimize}
   573 option (\S\ref{mode-of-operation}).
   574 
   575 \point{A strange error occurred---what should I do?}
   576 
   577 Sledgehammer tries to give informative error messages. Please report any strange
   578 error to the author at \authoremail. This applies double if you get the message
   579 
   580 \prew
   581 \slshape
   582 The prover found a type-unsound proof involving ``\textit{foo\/}'',
   583 ``\textit{bar\/}'', and ``\textit{baz\/}'' even though a supposedly type-sound
   584 encoding was used (or, less likely, your axioms are inconsistent). You might
   585 want to report this to the Isabelle developers.
   586 \postw
   587 
   588 \point{Auto can solve it---why not Sledgehammer?}
   589 
   590 Problems can be easy for \textit{auto} and difficult for automatic provers, but
   591 the reverse is also true, so don't be discouraged if your first attempts fail.
   592 Because the system refers to all theorems known to Isabelle, it is particularly
   593 suitable when your goal has a short proof from lemmas that you don't know about.
   594 
   595 \point{Why are there so many options?}
   596 
   597 Sledgehammer's philosophy should work out of the box, without user guidance.
   598 Many of the options are meant to be used mostly by the Sledgehammer developers
   599 for experimentation purposes. Of course, feel free to experiment with them if
   600 you are so inclined.
   601 
   602 \section{Command Syntax}
   603 \label{command-syntax}
   604 
   605 \subsection{Sledgehammer}
   606 
   607 Sledgehammer can be invoked at any point when there is an open goal by entering
   608 the \textbf{sledgehammer} command in the theory file. Its general syntax is as
   609 follows:
   610 
   611 \prew
   612 \textbf{sledgehammer} \qty{subcommand}$^?$ \qty{options}$^?$ \qty{facts\_override}$^?$ \qty{num}$^?$
   613 \postw
   614 
   615 For convenience, Sledgehammer is also available in the ``Commands'' submenu of
   616 the ``Isabelle'' menu in Proof General or by pressing the Emacs key sequence C-c
   617 C-a C-s. This is equivalent to entering the \textbf{sledgehammer} command with
   618 no arguments in the theory text.
   619 
   620 In the general syntax, the \qty{subcommand} may be any of the following:
   621 
   622 \begin{enum}
   623 \item[\labelitemi] \textbf{\textit{run} (the default):} Runs Sledgehammer on
   624 subgoal number \qty{num} (1 by default), with the given options and facts.
   625 
   626 \item[\labelitemi] \textbf{\textit{min}:} Attempts to minimize the facts
   627 specified in the \qty{facts\_override} argument to obtain a simpler proof
   628 involving fewer facts. The options and goal number are as for \textit{run}.
   629 
   630 \item[\labelitemi] \textbf{\textit{messages}:} Redisplays recent messages issued
   631 by Sledgehammer. This allows you to examine results that might have been lost
   632 due to Sledgehammer's asynchronous nature. The \qty{num} argument specifies a
   633 limit on the number of messages to display (5 by default).
   634 
   635 \item[\labelitemi] \textbf{\textit{supported\_provers}:} Prints the list of
   636 automatic provers supported by Sledgehammer. See \S\ref{installation} and
   637 \S\ref{mode-of-operation} for more information on how to install automatic
   638 provers.
   639 
   640 \item[\labelitemi] \textbf{\textit{running\_provers}:} Prints information about
   641 currently running automatic provers, including elapsed runtime and remaining
   642 time until timeout.
   643 
   644 \item[\labelitemi] \textbf{\textit{kill\_provers}:} Terminates all running
   645 automatic provers.
   646 
   647 \item[\labelitemi] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote
   648 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}.
   649 \end{enum}
   650 
   651 Sledgehammer's behavior can be influenced by various \qty{options}, which can be
   652 specified in brackets after the \textbf{sledgehammer} command. The
   653 \qty{options} are a list of key--value pairs of the form ``[$k_1 = v_1,
   654 \ldots, k_n = v_n$]''. For Boolean options, ``= \textit{true\/}'' is optional. For
   655 example:
   656 
   657 \prew
   658 \textbf{sledgehammer} [\textit{isar\_proof}, \,\textit{timeout} = 120]
   659 \postw
   660 
   661 Default values can be set using \textbf{sledgehammer\_\allowbreak params}:
   662 
   663 \prew
   664 \textbf{sledgehammer\_params} \qty{options}
   665 \postw
   666 
   667 The supported options are described in \S\ref{option-reference}.
   668 
   669 The \qty{facts\_override} argument lets you alter the set of facts that go
   670 through the relevance filter. It may be of the form ``(\qty{facts})'', where
   671 \qty{facts} is a space-separated list of Isabelle facts (theorems, local
   672 assumptions, etc.), in which case the relevance filter is bypassed and the given
   673 facts are used. It may also be of the form ``(\textit{add}:\ \qty{facts\/_{\mathrm{1}}})'',
   674 ``(\textit{del}:\ \qty{facts\/_{\mathrm{2}}})'', or ``(\textit{add}:\ \qty{facts\/_{\mathrm{1}}}\
   675 \textit{del}:\ \qty{facts\/_{\mathrm{2}}})'', where the relevance filter is instructed to
   676 proceed as usual except that it should consider \qty{facts\/_{\mathrm{1}}}
   677 highly-relevant and \qty{facts\/_{\mathrm{2}}} fully irrelevant.
   678 
   679 You can instruct Sledgehammer to run automatically on newly entered theorems by
   680 enabling the ``Auto Sledgehammer'' option in Proof General's ``Isabelle'' menu.
   681 For automatic runs, only the first prover set using \textit{provers}
   682 (\S\ref{mode-of-operation}) is considered, fewer facts are passed to the prover,
   683 \textit{slice} (\S\ref{mode-of-operation}) is disabled, \textit{strict}
   684 (\S\ref{problem-encoding}) is enabled, \textit{verbose} (\S\ref{output-format})
   685 and \textit{debug} (\S\ref{output-format}) are disabled, and \textit{timeout}
   686 (\S\ref{timeouts}) is superseded by the ``Auto Tools Time Limit'' in Proof
   687 General's ``Isabelle'' menu. Sledgehammer's output is also more concise.
   688 
   689 \subsection{Metis}
   690 
   691 The \textit{metis} proof method has the syntax
   692 
   693 \prew
   694 \textbf{\textit{metis}}~(\qty{options})${}^?$~\qty{facts}${}^?$
   695 \postw
   696 
   697 where \qty{facts} is a list of arbitrary facts and \qty{options} is a
   698 comma-separated list consisting of at most one $\lambda$ translation scheme
   699 specification with the same semantics as Sledgehammer's \textit{lam\_trans}
   700 option (\S\ref{problem-encoding}) and at most one type encoding specification
   701 with the same semantics as Sledgehammer's \textit{type\_enc} option
   702 (\S\ref{problem-encoding}).
   703 %
   704 The supported $\lambda$ translation schemes are \textit{hide\_lams},
   705 \textit{lifting}, and \textit{combs} (the default).
   706 %
   707 All the untyped type encodings listed in \S\ref{problem-encoding} are supported.
   708 For convenience, the following aliases are provided:
   709 \begin{enum}
   710 \item[\labelitemi] \textbf{\textit{full\_types}:} Synonym for \textit{poly\_guards\_query}.
   711 \item[\labelitemi] \textbf{\textit{partial\_types}:} Synonym for \textit{poly\_args}.
   712 \item[\labelitemi] \textbf{\textit{no\_types}:} Synonym for \textit{erased}.
   713 \end{enum}
   714 
   715 \section{Option Reference}
   716 \label{option-reference}
   717 
   718 \def\defl{\{}
   719 \def\defr{\}}
   720 
   721 \def\flushitem#1{\item[]\noindent\kern-\leftmargin \textbf{#1}}
   722 \def\optrue#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\enskip \defl\textit{true}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
   723 \def\opfalse#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\enskip \defl\textit{false}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
   724 \def\opsmart#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\enskip \defl\textit{smart}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
   725 \def\opsmartx#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\enskip \defl\textit{smart}\defr\\\hbox{}\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
   726 \def\opnodefault#1#2{\flushitem{\textit{#1} = \qtybf{#2}} \nopagebreak\\[\parskip]}
   727 \def\opnodefaultbrk#1#2{\flushitem{$\bigl[$\textit{#1} =$\bigr]$ \qtybf{#2}} \nopagebreak\\[\parskip]}
   728 \def\opdefault#1#2#3{\flushitem{\textit{#1} = \qtybf{#2}\enskip \defl\textit{#3}\defr} \nopagebreak\\[\parskip]}
   729 \def\oparg#1#2#3{\flushitem{\textit{#1} \qtybf{#2} = \qtybf{#3}} \nopagebreak\\[\parskip]}
   730 \def\opargbool#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}
   731 \def\opargboolorsmart#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}
   732 
   733 Sledgehammer's options are categorized as follows:\ mode of operation
   734 (\S\ref{mode-of-operation}), problem encoding (\S\ref{problem-encoding}),
   735 relevance filter (\S\ref{relevance-filter}), output format
   736 (\S\ref{output-format}), authentication (\S\ref{authentication}), and timeouts
   737 (\S\ref{timeouts}).
   738 
   739 The descriptions below refer to the following syntactic quantities:
   740 
   741 \begin{enum}
   742 \item[\labelitemi] \qtybf{string}: A string.
   743 \item[\labelitemi] \qtybf{bool\/}: \textit{true} or \textit{false}.
   744 \item[\labelitemi] \qtybf{smart\_bool\/}: \textit{true}, \textit{false}, or
   745 \textit{smart}.
   746 \item[\labelitemi] \qtybf{int\/}: An integer.
   747 %\item[\labelitemi] \qtybf{float\/}: A floating-point number (e.g., 2.5).
   748 \item[\labelitemi] \qtybf{float\_pair\/}: A pair of floating-point numbers
   749 (e.g., 0.6 0.95).
   750 \item[\labelitemi] \qtybf{smart\_int\/}: An integer or \textit{smart}.
   751 \item[\labelitemi] \qtybf{float\_or\_none\/}: A floating-point number (e.g., 60 or
   752 0.5) expressing a number of seconds, or the keyword \textit{none} ($\infty$
   753 seconds).
   754 \end{enum}
   755 
   756 Default values are indicated in curly brackets (\textrm{\{\}}). Boolean options
   757 have a negated counterpart (e.g., \textit{blocking} vs.\
   758 \textit{non\_blocking}). When setting them, ``= \textit{true\/}'' may be omitted.
   759 
   760 \subsection{Mode of Operation}
   761 \label{mode-of-operation}
   762 
   763 \begin{enum}
   764 \opnodefaultbrk{provers}{string}
   765 Specifies the automatic provers to use as a space-separated list (e.g.,
   766 ``\textit{e}~\textit{spass}~\textit{remote\_vampire\/}'').
   767 Provers can be run locally or remotely; see \S\ref{installation} for
   768 installation instructions.
   769 
   770 The following local provers are supported:
   771 
   772 \begin{enum}
   773 \item[\labelitemi] \textbf{\textit{cvc3}:} CVC3 is an SMT solver developed by
   774 Clark Barrett, Cesare Tinelli, and their colleagues \cite{cvc3}. To use CVC3,
   775 set the environment variable \texttt{CVC3\_SOLVER} to the complete path of the
   776 executable, including the file name, or install the prebuilt CVC3 package from
   777 \download. Sledgehammer has been tested with version 2.2.
   778 
   779 \item[\labelitemi] \textbf{\textit{e}:} E is a first-order resolution prover
   780 developed by Stephan Schulz \cite{schulz-2002}. To use E, set the environment
   781 variable \texttt{E\_HOME} to the directory that contains the \texttt{eproof}
   782 executable, or install the prebuilt E package from \download. Sledgehammer has
   783 been tested with versions 1.0 to 1.4.
   784 
   785 \item[\labelitemi] \textbf{\textit{leo2}:} LEO-II is an automatic
   786 higher-order prover developed by Christoph Benzm\"uller et al.\ \cite{leo2},
   787 with support for the TPTP typed higher-order syntax (THF0). To use LEO-II, set
   788 the environment variable \texttt{LEO2\_HOME} to the directory that contains the
   789 \texttt{leo} executable. Sledgehammer requires version 1.2.9 or above.
   790 
   791 \item[\labelitemi] \textbf{\textit{metis}:} Although it is much less powerful than
   792 the external provers, Metis itself can be used for proof search.
   793 
   794 \item[\labelitemi] \textbf{\textit{satallax}:} Satallax is an automatic
   795 higher-order prover developed by Chad Brown et al.\ \cite{satallax}, with
   796 support for the TPTP typed higher-order syntax (THF0). To use Satallax, set the
   797 environment variable \texttt{SATALLAX\_HOME} to the directory that contains the
   798 \texttt{satallax} executable. Sledgehammer requires version 2.2 or above.
   799 
   800 \item[\labelitemi] \textbf{\textit{smt}:} The \textit{smt} proof method with the
   801 current settings (usually:\ Z3 with proof reconstruction).
   802 
   803 \item[\labelitemi] \textbf{\textit{spass}:} SPASS is a first-order resolution
   804 prover developed by Christoph Weidenbach et al.\ \cite{weidenbach-et-al-2009}.
   805 To use SPASS, set the environment variable \texttt{SPASS\_HOME} to the directory
   806 that contains the \texttt{SPASS} executable, or install the prebuilt SPASS
   807 package from \download. Sledgehammer requires version 3.5 or above.
   808 
   809 \item[\labelitemi] \textbf{\textit{vampire}:} Vampire is a first-order resolution
   810 prover developed by Andrei Voronkov and his colleagues
   811 \cite{riazanov-voronkov-2002}. To use Vampire, set the environment variable
   812 \texttt{VAMPIRE\_HOME} to the directory that contains the \texttt{vampire}
   813 executable and \texttt{VAMPIRE\_VERSION} to the version number (e.g., ``1.8'').
   814 Sledgehammer has been tested with versions 0.6, 1.0, and 1.8.
   815 %Vampire 1.8 supports the TPTP typed first-order format (TFF0).
   816 
   817 \item[\labelitemi] \textbf{\textit{yices}:} Yices is an SMT solver developed at
   818 SRI \cite{yices}. To use Yices, set the environment variable
   819 \texttt{YICES\_SOLVER} to the complete path of the executable, including the
   820 file name. Sledgehammer has been tested with version 1.0.28.
   821 
   822 \item[\labelitemi] \textbf{\textit{z3}:} Z3 is an SMT solver developed at
   823 Microsoft Research \cite{z3}. To use Z3, set the environment variable
   824 \texttt{Z3\_SOLVER} to the complete path of the executable, including the file
   825 name, and set \texttt{Z3\_NON\_COMMERCIAL} to ``yes'' to confirm that you are a
   826 noncommercial user. Sledgehammer has been tested with versions 3.0 to 3.2.
   827 
   828 \item[\labelitemi] \textbf{\textit{z3\_tptp}:} This version of Z3 pretends to be
   829 an ATP, exploiting Z3's support for the TPTP untyped and typed first-order
   830 formats (FOF and TFF0). It is included for experimental purposes. It
   831 requires version 3.0 or above. To use it, set the environment variable
   832 \texttt{Z3\_HOME} to the directory that contains the \texttt{z3}
   833 executable.
   834 \end{enum}
   835 
   836 The following remote provers are supported:
   837 
   838 \begin{enum}
   839 \item[\labelitemi] \textbf{\textit{remote\_cvc3}:} The remote version of CVC3 runs
   840 on servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to
   841 point).
   842 
   843 \item[\labelitemi] \textbf{\textit{remote\_e}:} The remote version of E runs
   844 on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.
   845 
   846 \item[\labelitemi] \textbf{\textit{remote\_e\_sine}:} E-SInE is a metaprover
   847 developed by Kry\v stof Hoder \cite{sine} based on E. The remote version of
   848 SInE runs on Geoff Sutcliffe's Miami servers.
   849 
   850 \item[\labelitemi] \textbf{\textit{remote\_e\_tofof}:} E-ToFoF is a metaprover
   851 developed by Geoff Sutcliffe \cite{tofof} based on E running on his Miami
   852 servers. This ATP supports the TPTP typed first-order format (TFF0). The
   853 remote version of E-ToFoF runs on Geoff Sutcliffe's Miami servers.
   854 
   855 \item[\labelitemi] \textbf{\textit{remote\_iprover}:} iProver is a pure
   856 instantiation-based prover developed by Konstantin Korovin \cite{korovin-2009}. The
   857 remote version of iProver runs on Geoff Sutcliffe's Miami servers
   858 \cite{sutcliffe-2000}.
   859 
   860 \item[\labelitemi] \textbf{\textit{remote\_iprover\_eq}:} iProver-Eq is an
   861 instantiation-based prover with native support for equality developed by
   862 Konstantin Korovin and Christoph Sticksel \cite{korovin-sticksel-2010}. The
   863 remote version of iProver-Eq runs on Geoff Sutcliffe's Miami servers
   864 \cite{sutcliffe-2000}.
   865 
   866 The remote version of LEO-II
   867 runs on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.
   868 
   869 \item[\labelitemi] \textbf{\textit{remote\_leo2}:} The remote version of LEO-II
   870 runs on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.
   871 
   872 \item[\labelitemi] \textbf{\textit{remote\_satallax}:} The remote version of
   873 Satallax runs on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.
   874 
   875 \item[\labelitemi] \textbf{\textit{remote\_snark}:} SNARK is a first-order
   876 resolution prover developed by Stickel et al.\ \cite{snark}. It supports the
   877 TPTP typed first-order format (TFF0). The remote version of SNARK runs on
   878 Geoff Sutcliffe's Miami servers.
   879 
   880 \item[\labelitemi] \textbf{\textit{remote\_vampire}:} The remote version of
   881 Vampire runs on Geoff Sutcliffe's Miami servers. Version 1.8 is used.
   882 
   883 \item[\labelitemi] \textbf{\textit{remote\_waldmeister}:} Waldmeister is a unit
   884 equality prover developed by Hillenbrand et al.\ \cite{waldmeister}. It can be
   885 used to prove universally quantified equations using unconditional equations,
   886 corresponding to the TPTP CNF UEQ division. The remote version of Waldmeister
   887 runs on Geoff Sutcliffe's Miami servers.
   888 
   889 \item[\labelitemi] \textbf{\textit{remote\_z3}:} The remote version of Z3 runs on
   890 servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to
   891 point).
   892 
   893 \item[\labelitemi] \textbf{\textit{remote\_z3\_tptp}:} The remote version of ``Z3
   894 with TPTP syntax'' runs on Geoff Sutcliffe's Miami servers.
   895 \end{enum}
   896 
   897 By default, Sledgehammer runs E, E-SInE, SPASS, Vampire, Z3 (or whatever
   898 the SMT module's \textit{smt\_solver} configuration option is set to), and (if
   899 appropriate) Waldmeister in parallel---either locally or remotely, depending on
   900 the number of processor cores available. For historical reasons, the default
   901 value of this option can be overridden using the option ``Sledgehammer:
   902 Provers'' in Proof General's ``Isabelle'' menu.
   903 
   904 It is generally a good idea to run several provers in parallel. Running E,
   905 SPASS, and Vampire for 5~seconds yields a similar success rate to running the
   906 most effective of these for 120~seconds \cite{boehme-nipkow-2010}.
   907 
   908 For the \textit{min} subcommand, the default prover is \textit{metis}. If
   909 several provers are set, the first one is used.
   910 
   911 \opnodefault{prover}{string}
   912 Alias for \textit{provers}.
   913 
   914 %\opnodefault{atps}{string}
   915 %Legacy alias for \textit{provers}.
   916 
   917 %\opnodefault{atp}{string}
   918 %Legacy alias for \textit{provers}.
   919 
   920 \opfalse{blocking}{non\_blocking}
   921 Specifies whether the \textbf{sledgehammer} command should operate
   922 synchronously. The asynchronous (non-blocking) mode lets the user start proving
   923 the putative theorem manually while Sledgehammer looks for a proof, but it can
   924 also be more confusing. Irrespective of the value of this option, Sledgehammer
   925 is always run synchronously for the new jEdit-based user interface or if
   926 \textit{debug} (\S\ref{output-format}) is enabled.
   927 
   928 \optrue{slice}{dont\_slice}
   929 Specifies whether the time allocated to a prover should be sliced into several
   930 segments, each of which has its own set of possibly prover-dependent options.
   931 For SPASS and Vampire, the first slice tries the fast but incomplete
   932 set-of-support (SOS) strategy, whereas the second slice runs without it. For E,
   933 up to three slices are tried, with different weighted search strategies and
   934 number of facts. For SMT solvers, several slices are tried with the same options
   935 each time but fewer and fewer facts. According to benchmarks with a timeout of
   936 30 seconds, slicing is a valuable optimization, and you should probably leave it
   937 enabled unless you are conducting experiments. This option is implicitly
   938 disabled for (short) automatic runs.
   939 
   940 \nopagebreak
   941 {\small See also \textit{verbose} (\S\ref{output-format}).}
   942 
   943 \opsmart{minimize}{dont\_minimize}
   944 Specifies whether the minimization tool should be invoked automatically after
   945 proof search. By default, automatic minimization takes place only if
   946 it can be done in a reasonable amount of time (as determined by
   947 the number of facts in the original proof and the time it took to find or
   948 preplay it) or the proof involves an unreasonably large number of facts.
   949 
   950 \nopagebreak
   951 {\small See also \textit{preplay\_timeout} (\S\ref{timeouts}).}
   952 
   953 \opfalse{overlord}{no\_overlord}
   954 Specifies whether Sledgehammer should put its temporary files in
   955 \texttt{\$ISA\-BELLE\_\allowbreak HOME\_\allowbreak USER}, which is useful for
   956 debugging Sledgehammer but also unsafe if several instances of the tool are run
   957 simultaneously. The files are identified by the prefix \texttt{prob\_}; you may
   958 safely remove them after Sledgehammer has run.
   959 
   960 \nopagebreak
   961 {\small See also \textit{debug} (\S\ref{output-format}).}
   962 \end{enum}
   963 
   964 \subsection{Problem Encoding}
   965 \label{problem-encoding}
   966 
   967 \newcommand\comb[1]{\const{#1}}
   968 
   969 \begin{enum}
   970 \opdefault{lam\_trans}{string}{smart}
   971 Specifies the $\lambda$ translation scheme to use in ATP problems. The supported
   972 translation schemes are listed below:
   973 
   974 \begin{enum}
   975 \item[\labelitemi] \textbf{\textit{hide\_lams}:} Hide the $\lambda$-abstractions
   976 by replacing them by unspecified fresh constants, effectively disabling all
   977 reasoning under $\lambda$-abstractions.
   978 
   979 \item[\labelitemi] \textbf{\textit{lifting}:} Introduce a new
   980 supercombinator \const{c} for each cluster of $n$~$\lambda$-abstractions,
   981 defined using an equation $\const{c}~x_1~\ldots~x_n = t$ ($\lambda$-lifting).
   982 
   983 \item[\labelitemi] \textbf{\textit{combs}:} Rewrite lambdas to the Curry
   984 combinators (\comb{I}, \comb{K}, \comb{S}, \comb{B}, \comb{C}). Combinators
   985 enable the ATPs to synthesize $\lambda$-terms but tend to yield bulkier formulas
   986 than $\lambda$-lifting: The translation is quadratic in the worst case, and the
   987 equational definitions of the combinators are very prolific in the context of
   988 resolution.
   989 
   990 \item[\labelitemi] \textbf{\textit{combs\_and\_lifting}:} Introduce a new
   991 supercombinator \const{c} for each cluster of $\lambda$-abstractions and characterize it both using a
   992 lifted equation $\const{c}~x_1~\ldots~x_n = t$ and via Curry combinators.
   993 
   994 \item[\labelitemi] \textbf{\textit{combs\_or\_lifting}:} For each cluster of
   995 $\lambda$-abstractions, heuristically choose between $\lambda$-lifting and Curry
   996 combinators.
   997 
   998 \item[\labelitemi] \textbf{\textit{keep\_lams}:}
   999 Keep the $\lambda$-abstractions in the generated problems. This is available
  1000 only with provers that support the THF0 syntax.
  1001 
  1002 \item[\labelitemi] \textbf{\textit{smart}:} The actual translation scheme used
  1003 depends on the ATP and should be the most efficient scheme for that ATP.
  1004 \end{enum}
  1005 
  1006 For SMT solvers, the $\lambda$ translation scheme is always \textit{lifting},
  1007 irrespective of the value of this option.
  1008 
  1009 \opsmartx{uncurried\_aliases}{no\_uncurried\_aliases}
  1010 Specifies the $\lambda$ translation scheme to use in ATP problems. The supported
  1011 translation schemes are listed below:
  1012 
  1013 \opdefault{type\_enc}{string}{smart}
  1014 Specifies the type encoding to use in ATP problems. Some of the type encodings
  1015 are unsound, meaning that they can give rise to spurious proofs
  1016 (unreconstructible using \textit{metis}). The supported type encodings are
  1017 listed below, with an indication of their soundness in parentheses.
  1018 An asterisk (*) means that the encoding is slightly incomplete for
  1019 reconstruction with \textit{metis}, unless the \emph{strict} option (described
  1020 below) is enabled.
  1021 
  1022 \begin{enum}
  1023 \item[\labelitemi] \textbf{\textit{erased} (very unsound):} No type information is
  1024 supplied to the ATP, not even to resolve overloading. Types are simply erased.
  1025 
  1026 \item[\labelitemi] \textbf{\textit{poly\_guards} (sound):} Types are encoded using
  1027 a predicate \const{g}$(\tau, t)$ that guards bound
  1028 variables. Constants are annotated with their types, supplied as additional
  1029 arguments, to resolve overloading.
  1030 
  1031 \item[\labelitemi] \textbf{\textit{poly\_tags} (sound):} Each term and subterm is
  1032 tagged with its type using a function $\const{t\/}(\tau, t)$.
  1033 
  1034 \item[\labelitemi] \textbf{\textit{poly\_args} (unsound):}
  1035 Like for \textit{poly\_guards} constants are annotated with their types to
  1036 resolve overloading, but otherwise no type information is encoded. This
  1037 coincides with the default encoding used by the \textit{metis} command.
  1038 
  1039 \item[\labelitemi]
  1040 \textbf{%
  1041 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags} (sound); \\
  1042 \textit{raw\_mono\_args} (unsound):} \\
  1043 Similar to \textit{poly\_guards}, \textit{poly\_tags}, and \textit{poly\_args},
  1044 respectively, but the problem is additionally monomorphized, meaning that type
  1045 variables are instantiated with heuristically chosen ground types.
  1046 Monomorphization can simplify reasoning but also leads to larger fact bases,
  1047 which can slow down the ATPs.
  1048 
  1049 \item[\labelitemi]
  1050 \textbf{%
  1051 \textit{mono\_guards}, \textit{mono\_tags} (sound);
  1052 \textit{mono\_args} (unsound):} \\
  1053 Similar to
  1054 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, and
  1055 \textit{raw\_mono\_args}, respectively but types are mangled in constant names
  1056 instead of being supplied as ground term arguments. The binary predicate
  1057 $\const{g}(\tau, t)$ becomes a unary predicate
  1058 $\const{g\_}\tau(t)$, and the binary function
  1059 $\const{t}(\tau, t)$ becomes a unary function
  1060 $\const{t\_}\tau(t)$.
  1061 
  1062 \item[\labelitemi] \textbf{\textit{mono\_simple} (sound):} Exploits simple
  1063 first-order types if the prover supports the TFF0 or THF0 syntax; otherwise,
  1064 falls back on \textit{mono\_guards}. The problem is monomorphized.
  1065 
  1066 \item[\labelitemi] \textbf{\textit{mono\_simple\_higher} (sound):} Exploits simple
  1067 higher-order types if the prover supports the THF0 syntax; otherwise, falls back
  1068 on \textit{mono\_simple} or \textit{mono\_guards}. The problem is monomorphized.
  1069 
  1070 \item[\labelitemi]
  1071 \textbf{%
  1072 \textit{poly\_guards}?, \textit{poly\_tags}?, \textit{raw\_mono\_guards}?, \\
  1073 \textit{raw\_mono\_tags}?, \textit{mono\_guards}?, \textit{mono\_tags}?, \\
  1074 \textit{mono\_simple}? (sound*):} \\
  1075 The type encodings \textit{poly\_guards}, \textit{poly\_tags},
  1076 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, \textit{mono\_guards},
  1077 \textit{mono\_tags}, and \textit{mono\_simple} are fully
  1078 typed and sound. For each of these, Sledgehammer also provides a lighter,
  1079 virtually sound variant identified by a question mark (`\hbox{?}')\ that detects
  1080 and erases monotonic types, notably infinite types. (For \textit{mono\_simple},
  1081 the types are not actually erased but rather replaced by a shared uniform type
  1082 of individuals.) As argument to the \textit{metis} proof method, the question
  1083 mark is replaced by a \hbox{``\textit{\_query\/}''} suffix.
  1084 
  1085 \item[\labelitemi]
  1086 \textbf{%
  1087 \textit{poly\_guards}??, \textit{poly\_tags}??, \textit{raw\_mono\_guards}??, \\
  1088 \textit{raw\_mono\_tags}??, \textit{mono\_guards}??, \textit{mono\_tags}?? \\
  1089 (sound*):} \\
  1090 Even lighter versions of the `\hbox{?}' encodings. As argument to the
  1091 \textit{metis} proof method, the `\hbox{??}' suffix is replaced by
  1092 \hbox{``\textit{\_query\_query\/}''}.
  1093 
  1094 \item[\labelitemi]
  1095 \textbf{%
  1096 \textit{poly\_guards}@?, \textit{raw\_mono\_guards}@? (sound*):} \\
  1097 Alternative versions of the `\hbox{??}' encodings. As argument to the
  1098 \textit{metis} proof method, the `\hbox{@?}' suffix is replaced by
  1099 \hbox{``\textit{\_at\_query\/}''}.
  1100 
  1101 \item[\labelitemi]
  1102 \textbf{%
  1103 \textit{poly\_guards}!, \textit{poly\_tags}!, \textit{raw\_mono\_guards}!, \\
  1104 \textit{raw\_mono\_tags}!, \textit{mono\_guards}!, \textit{mono\_tags}!, \\
  1105 \textit{mono\_simple}!, \textit{mono\_simple\_higher}! (mildly unsound):} \\
  1106 The type encodings \textit{poly\_guards}, \textit{poly\_tags},
  1107 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, \textit{mono\_guards},
  1108 \textit{mono\_tags}, \textit{mono\_simple}, and \textit{mono\_simple\_higher}
  1109 also admit a mildly unsound (but very efficient) variant identified by an
  1110 exclamation mark (`\hbox{!}') that detects and erases erases all types except
  1111 those that are clearly finite (e.g., \textit{bool}). (For \textit{mono\_simple}
  1112 and \textit{mono\_simple\_higher}, the types are not actually erased but rather
  1113 replaced by a shared uniform type of individuals.) As argument to the
  1114 \textit{metis} proof method, the exclamation mark is replaced by the suffix
  1115 \hbox{``\textit{\_bang\/}''}.
  1116 
  1117 \item[\labelitemi]
  1118 \textbf{%
  1119 \textit{poly\_guards}!!, \textit{poly\_tags}!!, \textit{raw\_mono\_guards}!!, \\
  1120 \textit{raw\_mono\_tags}!!, \textit{mono\_guards}!!, \textit{mono\_tags}!! \\
  1121 (mildly unsound):} \\
  1122 Even lighter versions of the `\hbox{!}' encodings. As argument to the
  1123 \textit{metis} proof method, the `\hbox{!!}' suffix is replaced by
  1124 \hbox{``\textit{\_bang\_bang\/}''}.
  1125 
  1126 \item[\labelitemi]
  1127 \textbf{%
  1128 \textit{poly\_guards}@!, \textit{raw\_mono\_guards}@! (mildly unsound):} \\
  1129 Alternative versions of the `\hbox{!!}' encodings. As argument to the
  1130 \textit{metis} proof method, the `\hbox{@!}' suffix is replaced by
  1131 \hbox{``\textit{\_at\_bang\/}''}.
  1132 
  1133 \item[\labelitemi] \textbf{\textit{smart}:} The actual encoding used depends on
  1134 the ATP and should be the most efficient virtually sound encoding for that ATP.
  1135 \end{enum}
  1136 
  1137 For SMT solvers, the type encoding is always \textit{mono\_simple}, irrespective
  1138 of the value of this option.
  1139 
  1140 \nopagebreak
  1141 {\small See also \textit{max\_new\_mono\_instances} (\S\ref{relevance-filter})
  1142 and \textit{max\_mono\_iters} (\S\ref{relevance-filter}).}
  1143 
  1144 \opfalse{strict}{non\_strict}
  1145 Specifies whether Sledgehammer should run in its strict mode. In that mode,
  1146 sound type encodings marked with an asterisk (*) above are made complete
  1147 for reconstruction with \textit{metis}, at the cost of some clutter in the
  1148 generated problems. This option has no effect if \textit{type\_enc} is
  1149 deliberately set to an unsound encoding.
  1150 \end{enum}
  1151 
  1152 \subsection{Relevance Filter}
  1153 \label{relevance-filter}
  1154 
  1155 \begin{enum}
  1156 \opdefault{relevance\_thresholds}{float\_pair}{\upshape 0.45~0.85}
  1157 Specifies the thresholds above which facts are considered relevant by the
  1158 relevance filter. The first threshold is used for the first iteration of the
  1159 relevance filter and the second threshold is used for the last iteration (if it
  1160 is reached). The effective threshold is quadratically interpolated for the other
  1161 iterations. Each threshold ranges from 0 to 1, where 0 means that all theorems
  1162 are relevant and 1 only theorems that refer to previously seen constants.
  1163 
  1164 \opdefault{max\_relevant}{smart\_int}{smart}
  1165 Specifies the maximum number of facts that may be returned by the relevance
  1166 filter. If the option is set to \textit{smart}, it is set to a value that was
  1167 empirically found to be appropriate for the prover. A typical value would be
  1168 250.
  1169 
  1170 \opdefault{max\_new\_mono\_instances}{int}{\upshape 200}
  1171 Specifies the maximum number of monomorphic instances to generate beyond
  1172 \textit{max\_relevant}. The higher this limit is, the more monomorphic instances
  1173 are potentially generated. Whether monomorphization takes place depends on the
  1174 type encoding used.
  1175 
  1176 \nopagebreak
  1177 {\small See also \textit{type\_enc} (\S\ref{problem-encoding}).}
  1178 
  1179 \opdefault{max\_mono\_iters}{int}{\upshape 3}
  1180 Specifies the maximum number of iterations for the monomorphization fixpoint
  1181 construction. The higher this limit is, the more monomorphic instances are
  1182 potentially generated. Whether monomorphization takes place depends on the
  1183 type encoding used.
  1184 
  1185 \nopagebreak
  1186 {\small See also \textit{type\_enc} (\S\ref{problem-encoding}).}
  1187 \end{enum}
  1188 
  1189 \subsection{Output Format}
  1190 \label{output-format}
  1191 
  1192 \begin{enum}
  1193 
  1194 \opfalse{verbose}{quiet}
  1195 Specifies whether the \textbf{sledgehammer} command should explain what it does.
  1196 This option is implicitly disabled for automatic runs.
  1197 
  1198 \opfalse{debug}{no\_debug}
  1199 Specifies whether Sledgehammer should display additional debugging information
  1200 beyond what \textit{verbose} already displays. Enabling \textit{debug} also
  1201 enables \textit{verbose} and \textit{blocking} (\S\ref{mode-of-operation})
  1202 behind the scenes. The \textit{debug} option is implicitly disabled for
  1203 automatic runs.
  1204 
  1205 \nopagebreak
  1206 {\small See also \textit{overlord} (\S\ref{mode-of-operation}).}
  1207 
  1208 \opfalse{isar\_proof}{no\_isar\_proof}
  1209 Specifies whether Isar proofs should be output in addition to one-liner
  1210 \textit{metis} proofs. Isar proof construction is still experimental and often
  1211 fails; however, they are usually faster and sometimes more robust than
  1212 \textit{metis} proofs.
  1213 
  1214 \opdefault{isar\_shrink\_factor}{int}{\upshape 1}
  1215 Specifies the granularity of the Isar proof. A value of $n$ indicates that each
  1216 Isar proof step should correspond to a group of up to $n$ consecutive proof
  1217 steps in the ATP proof.
  1218 \end{enum}
  1219 
  1220 \subsection{Authentication}
  1221 \label{authentication}
  1222 
  1223 \begin{enum}
  1224 \opnodefault{expect}{string}
  1225 Specifies the expected outcome, which must be one of the following:
  1226 
  1227 \begin{enum}
  1228 \item[\labelitemi] \textbf{\textit{some}:} Sledgehammer found a proof.
  1229 \item[\labelitemi] \textbf{\textit{none}:} Sledgehammer found no proof.
  1230 \item[\labelitemi] \textbf{\textit{timeout}:} Sledgehammer timed out.
  1231 \item[\labelitemi] \textbf{\textit{unknown}:} Sledgehammer encountered some
  1232 problem.
  1233 \end{enum}
  1234 
  1235 Sledgehammer emits an error (if \textit{blocking} is enabled) or a warning
  1236 (otherwise) if the actual outcome differs from the expected outcome. This option
  1237 is useful for regression testing.
  1238 
  1239 \nopagebreak
  1240 {\small See also \textit{blocking} (\S\ref{mode-of-operation}) and
  1241 \textit{timeout} (\S\ref{timeouts}).}
  1242 \end{enum}
  1243 
  1244 \subsection{Timeouts}
  1245 \label{timeouts}
  1246 
  1247 \begin{enum}
  1248 \opdefault{timeout}{float\_or\_none}{\upshape 30}
  1249 Specifies the maximum number of seconds that the automatic provers should spend
  1250 searching for a proof. This excludes problem preparation and is a soft limit.
  1251 For historical reasons, the default value of this option can be overridden using
  1252 the option ``Sledgehammer: Time Limit'' in Proof General's ``Isabelle'' menu.
  1253 
  1254 \opdefault{preplay\_timeout}{float\_or\_none}{\upshape 3}
  1255 Specifies the maximum number of seconds that \textit{metis} or \textit{smt}
  1256 should spend trying to ``preplay'' the found proof. If this option is set to 0,
  1257 no preplaying takes place, and no timing information is displayed next to the
  1258 suggested \textit{metis} calls.
  1259 
  1260 \nopagebreak
  1261 {\small See also \textit{minimize} (\S\ref{mode-of-operation}).}
  1262 \end{enum}
  1263 
  1264 \let\em=\sl
  1265 \bibliography{../manual}{}
  1266 \bibliographystyle{abbrv}
  1267 
  1268 \end{document}