doc-src/Sledgehammer/sledgehammer.tex
author blanchet
Fri, 27 May 2011 10:30:08 +0200
changeset 43879 07ebc2398731
parent 43877 0ef380310863
child 43894 da6f459a7021
new timeout section (cf. Nitpick manual)
     1 \documentclass[a4paper,12pt]{article}
     2 \usepackage[T1]{fontenc}
     3 \usepackage{amsmath}
     4 \usepackage{amssymb}
     5 \usepackage[english,french]{babel}
     6 \usepackage{color}
     7 \usepackage{footmisc}
     8 \usepackage{graphicx}
     9 %\usepackage{mathpazo}
    10 \usepackage{multicol}
    11 \usepackage{stmaryrd}
    12 %\usepackage[scaled=.85]{beramono}
    13 \usepackage{../../lib/texinputs/isabelle,../iman,../pdfsetup}
    14 
    15 %\oddsidemargin=4.6mm
    16 %\evensidemargin=4.6mm
    17 %\textwidth=150mm
    18 %\topmargin=4.6mm
    19 %\headheight=0mm
    20 %\headsep=0mm
    21 %\textheight=234mm
    22 
    23 \def\Colon{\mathord{:\mkern-1.5mu:}}
    24 %\def\lbrakk{\mathopen{\lbrack\mkern-3.25mu\lbrack}}
    25 %\def\rbrakk{\mathclose{\rbrack\mkern-3.255mu\rbrack}}
    26 \def\lparr{\mathopen{(\mkern-4mu\mid}}
    27 \def\rparr{\mathclose{\mid\mkern-4mu)}}
    28 
    29 \def\unk{{?}}
    30 \def\undef{(\lambda x.\; \unk)}
    31 %\def\unr{\textit{others}}
    32 \def\unr{\ldots}
    33 \def\Abs#1{\hbox{\rm{\flqq}}{\,#1\,}\hbox{\rm{\frqq}}}
    34 \def\Q{{\smash{\lower.2ex\hbox{$\scriptstyle?$}}}}
    35 
    36 \urlstyle{tt}
    37 
    38 \begin{document}
    39 
    40 \selectlanguage{english}
    41 
    42 \title{\includegraphics[scale=0.5]{isabelle_sledgehammer} \\[4ex]
    43 Hammering Away \\[\smallskipamount]
    44 \Large A User's Guide to Sledgehammer for Isabelle/HOL}
    45 \author{\hbox{} \\
    46 Jasmin Christian Blanchette \\
    47 {\normalsize Institut f\"ur Informatik, Technische Universit\"at M\"unchen} \\[4\smallskipamount]
    48 {\normalsize with contributions from} \\[4\smallskipamount]
    49 Lawrence C. Paulson \\
    50 {\normalsize Computer Laboratory, University of Cambridge} \\
    51 \hbox{}}
    52 
    53 \maketitle
    54 
    55 \tableofcontents
    56 
    57 \setlength{\parskip}{.7em plus .2em minus .1em}
    58 \setlength{\parindent}{0pt}
    59 \setlength{\abovedisplayskip}{\parskip}
    60 \setlength{\abovedisplayshortskip}{.9\parskip}
    61 \setlength{\belowdisplayskip}{\parskip}
    62 \setlength{\belowdisplayshortskip}{.9\parskip}
    63 
    64 % General-purpose enum environment with correct spacing
    65 \newenvironment{enum}%
    66     {\begin{list}{}{%
    67         \setlength{\topsep}{.1\parskip}%
    68         \setlength{\partopsep}{.1\parskip}%
    69         \setlength{\itemsep}{\parskip}%
    70         \advance\itemsep by-\parsep}}
    71     {\end{list}}
    72 
    73 \def\pre{\begingroup\vskip0pt plus1ex\advance\leftskip by\leftmargin
    74 \advance\rightskip by\leftmargin}
    75 \def\post{\vskip0pt plus1ex\endgroup}
    76 
    77 \def\prew{\pre\advance\rightskip by-\leftmargin}
    78 \def\postw{\post}
    79 
    80 \section{Introduction}
    81 \label{introduction}
    82 
    83 Sledgehammer is a tool that applies automatic theorem provers (ATPs)
and satisfiability-modulo-theories (SMT) solvers to the current goal. The
    85 supported ATPs are E \cite{schulz-2002}, LEO-II \cite{leo2}, Satallax
    86 \cite{satallax}, SInE-E \cite{sine}, SNARK \cite{snark}, SPASS
    87 \cite{weidenbach-et-al-2009}, ToFoF-E \cite{tofof}, Vampire
    88 \cite{riazanov-voronkov-2002}, and Waldmeister \cite{waldmeister}. The ATPs are
    89 run either locally or remotely via the System\-On\-TPTP web service
\cite{sutcliffe-2000}. In addition to the ATPs, the SMT solver Z3 \cite{z3} is
    91 used by default, and you can tell Sledgehammer to try CVC3 \cite{cvc3} and Yices
    92 \cite{yices} as well; these are run either locally or on a server at the TU
    93 M\"unchen.
    94 
    95 The problem passed to the automatic provers consists of your current goal
    96 together with a heuristic selection of hundreds of facts (theorems) from the
    97 current theory context, filtered by relevance. Because jobs are run in the
    98 background, you can continue to work on your proof by other means. Provers can
    99 be run in parallel. Any reply (which may arrive half a minute later) will appear
   100 in the Proof General response buffer.
   101 
   102 The result of a successful proof search is some source text that usually (but
   103 not always) reconstructs the proof within Isabelle. For ATPs, the reconstructed
   104 proof relies on the general-purpose Metis prover, which is fully integrated into
   105 Isabelle/HOL, with explicit inferences going through the kernel. Thus its
   106 results are correct by construction.
   107 
   108 In this manual, we will explicitly invoke the \textbf{sledgehammer} command.
   109 Sledgehammer also provides an automatic mode that can be enabled via the
   110 ``Auto Sledgehammer'' option from the ``Isabelle'' menu in Proof General. In
   111 this mode, Sledgehammer is run on every newly entered theorem. The time limit
   112 for Auto Sledgehammer and other automatic tools can be set using the ``Auto
   113 Tools Time Limit'' option.
   114 
   115 \newbox\boxA
   116 \setbox\boxA=\hbox{\texttt{nospam}}
   117 
   118 \newcommand\authoremail{\texttt{blan{\color{white}nospam}\kern-\wd\boxA{}chette@\allowbreak
   119 in.\allowbreak tum.\allowbreak de}}
   120 
   121 To run Sledgehammer, you must make sure that the theory \textit{Sledgehammer} is
   122 imported---this is rarely a problem in practice since it is part of
   123 \textit{Main}. Examples of Sledgehammer use can be found in Isabelle's
   124 \texttt{src/HOL/Metis\_Examples} directory.
   125 Comments and bug reports concerning Sledgehammer or this manual should be
   126 directed to the author at \authoremail.
   127 
   128 \vskip2.5\smallskipamount
   129 
   130 %\textbf{Acknowledgment.} The author would like to thank Mark Summerfield for
   131 %suggesting several textual improvements.
   132 
   133 \section{Installation}
   134 \label{installation}
   135 
   136 Sledgehammer is part of Isabelle, so you don't need to install it. However, it
   137 relies on third-party automatic theorem provers (ATPs) and SMT solvers.
   138 
   139 \subsection{Installing ATPs}
   140 
   141 Currently, E, SPASS, and Vampire can be run locally; in addition, E, Vampire,
   142 LEO-II, Satallax, SInE-E, SNARK, ToFoF-E, and Waldmeister are available remotely
   143 via System\-On\-TPTP \cite{sutcliffe-2000}. If you want better performance, you
   144 should at least install E and SPASS locally.
   145 
   146 There are three main ways to install ATPs on your machine:
   147 
   148 \begin{enum}
   149 \item[$\bullet$] If you installed an official Isabelle package with everything
inside, it should already include properly configured executables for E and SPASS,
   151 ready to use.%
   152 \footnote{Vampire's license prevents us from doing the same for this otherwise
   153 wonderful tool.}
   154 
   155 \item[$\bullet$] Alternatively, you can download the Isabelle-aware E and SPASS
   156 binary packages from Isabelle's download page. Extract the archives, then add a
   157 line to your \texttt{\$ISABELLE\_HOME\_USER/etc/components}%
   158 \footnote{The variable \texttt{\$ISABELLE\_HOME\_USER} is set by Isabelle at
   159 startup. Its value can be retrieved by invoking \texttt{isabelle}
   160 \texttt{getenv} \texttt{ISABELLE\_HOME\_USER} on the command line.}
   161 file with the absolute
path to E or SPASS. For example, if the \texttt{components} file does not exist yet
   163 and you extracted SPASS to \texttt{/usr/local/spass-3.7}, create the
   164 \texttt{components} file with the single line
   165 
   166 \prew
   167 \texttt{/usr/local/spass-3.7}
   168 \postw
   169 
   170 in it.
   171 
   172 \item[$\bullet$] If you prefer to build E or SPASS yourself, or obtained a
   173 Vampire executable from somewhere (e.g., \url{http://www.vprover.org/}),
   174 set the environment variable \texttt{E\_HOME}, \texttt{SPASS\_HOME}, or
   175 \texttt{VAMPIRE\_HOME} to the directory that contains the \texttt{eproof},
   176 \texttt{SPASS}, or \texttt{vampire} executable. Sledgehammer has been tested
   177 with E 1.0 and 1.2, SPASS 3.5 and 3.7, and Vampire 0.6 and 1.0%
   178 \footnote{Following the rewrite of Vampire, the counter for version numbers was
   179 reset to 0; hence the (new) Vampire versions 0.6 and 1.0 are more recent than,
   180 say, Vampire 11.5.}%
   181 . Since the ATPs' output formats are neither documented nor stable, other
   182 versions of the ATPs might or might not work well with Sledgehammer. Ideally,
   183 also set \texttt{E\_VERSION}, \texttt{SPASS\_VERSION}, or
   184 \texttt{VAMPIRE\_VERSION} to the ATP's version number (e.g., ``1.2'').
   185 \end{enum}
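
As an illustration of the last approach, suppose (purely hypothetically) that you built E 1.2 yourself and that its \texttt{eproof} executable resides in \texttt{/usr/local/e/bin}. The following lines, set in the environment in which Isabelle is launched or, as for \texttt{http\_proxy} below, in your \texttt{\$ISABELLE\_HOME\_USER/etc/settings} file, should then make it known to Sledgehammer. The directory is only an example and must be adapted to your machine:

\prew
\texttt{E\_HOME=/usr/local/e/bin} \\
\texttt{E\_VERSION=1.2}
\postw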
   186 
   187 To check whether E and SPASS are successfully installed, follow the example in
   188 \S\ref{first-steps}. If the remote versions of E and SPASS are used (identified
   189 by the prefix ``\emph{remote\_}''), or if the local versions fail to solve the
   190 easy goal presented there, this is a sign that something is wrong with your
   191 installation.
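
To see which prover names Sledgehammer accepts (e.g., in the \textit{provers} option), you can also use the \textit{supported\_provers} subcommand (\S\ref{command-syntax}). Note that it lists the provers Sledgehammer supports, which is not necessarily the same as the provers correctly installed on your machine, so the example in \S\ref{first-steps} remains the better installation test:

\prew
\textbf{sledgehammer} \textit{supported\_provers}
\postw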
   192 
   193 Remote ATP invocation via the SystemOnTPTP web service requires Perl with the
   194 World Wide Web Library (\texttt{libwww-perl}) installed. If you must use a proxy
   195 server to access the Internet, set the \texttt{http\_proxy} environment variable
   196 to the proxy, either in the environment in which Isabelle is launched or in your
\texttt{\$ISABELLE\_HOME\_USER/etc/settings} file. Here are a few examples:
   198 
   199 \prew
   200 \texttt{http\_proxy=http://proxy.example.org} \\
   201 \texttt{http\_proxy=http://proxy.example.org:8080} \\
   202 \texttt{http\_proxy=http://joeblow:pAsSwRd@proxy.example.org}
   203 \postw
   204 
   205 \subsection{Installing SMT Solvers}
   206 
   207 CVC3, Yices, and Z3 can be run locally or (for CVC3 and Z3) remotely on a TU
M\"unchen server. If you want better performance and the ability to replay
   209 proofs that rely on the \emph{smt} proof method, you should at least install Z3
   210 locally.
   211 
   212 There are two main ways of installing SMT solvers locally.
   213 
   214 \begin{enum}
   215 \item[$\bullet$] If you installed an official Isabelle package with everything
inside, it should already include properly configured executables for CVC3 and Z3,
   217 ready to use.%
   218 \footnote{Yices's license prevents us from doing the same for this otherwise
   219 wonderful tool.}
   220 For Z3, you additionally need to set the environment variable
   221 \texttt{Z3\_NON\_COMMERCIAL} to ``yes'' to confirm that you are a noncommercial
   222 user.
   223 
   224 \item[$\bullet$] Otherwise, follow the instructions documented in the \emph{SMT}
   225 theory (\texttt{\$ISABELLE\_HOME/src/HOL/SMT.thy}).
   226 \end{enum}
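
For instance, assuming you use the same \texttt{\$ISABELLE\_HOME\_USER/etc/settings} file that can hold the \texttt{http\_proxy} setting, the noncommercial confirmation for Z3 could be given by the following line (setting the variable in the environment in which Isabelle is launched should work equally well):

\prew
\texttt{Z3\_NON\_COMMERCIAL=yes}
\postw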
   227 
   228 \section{First Steps}
   229 \label{first-steps}
   230 
   231 To illustrate Sledgehammer in context, let us start a theory file and
   232 attempt to prove a simple lemma:
   233 
   234 \prew
   235 \textbf{theory}~\textit{Scratch} \\
   236 \textbf{imports}~\textit{Main} \\
   237 \textbf{begin} \\[2\smallskipamount]
   238 %
   239 \textbf{lemma} ``$[a] = [b] \,\Longrightarrow\, a = b$'' \\
   240 \textbf{sledgehammer}
   241 \postw
   242 
   243 Instead of issuing the \textbf{sledgehammer} command, you can also find
   244 Sledgehammer in the ``Commands'' submenu of the ``Isabelle'' menu in Proof
   245 General or press the Emacs key sequence C-c C-a C-s.
   246 Either way, Sledgehammer produces the following output after a few seconds:
   247 
   248 \prew
   249 \slshape
   250 Sledgehammer: ``\textit{e}'' on goal \\
   251 $[a] = [b] \,\Longrightarrow\, a = b$ \\
   252 Try this: \textbf{by} (\textit{metis last\_ConsL}) (46 ms). \\
   253 To minimize: \textbf{sledgehammer} \textit{min} [\textit{e}] (\textit{last\_ConsL}). \\[3\smallskipamount]
   254 %
   255 Sledgehammer: ``\textit{vampire}'' on goal \\
   256 $[a] = [b] \,\Longrightarrow\, a = b$ \\
   257 Try this: \textbf{by} (\textit{metis hd.simps}) (17 ms). \\
   258 To minimize: \textbf{sledgehammer} \textit{min} [\textit{vampire}] (\textit{hd.simps}). \\[3\smallskipamount]
   259 %
   260 Sledgehammer: ``\textit{spass}'' on goal \\
   261 $[a] = [b] \,\Longrightarrow\, a = b$ \\
   262 Try this: \textbf{by} (\textit{metis list.inject}) (20 ms). \\
   263 To minimize: \textbf{sledgehammer} \textit{min} [\textit{spass}]~(\textit{list.inject}). \\[3\smallskipamount]
   264 %
   265 Sledgehammer: ``\textit{remote\_waldmeister}'' on goal \\
   266 $[a] = [b] \,\Longrightarrow\, a = b$ \\
   267 Try this: \textbf{by} (\textit{metis hd.simps insert\_Nil}) (25 ms). \\
   268 To minimize: \textbf{sledgehammer} \textit{min} [\textit{remote\_waldmeister}] \\
   269 \phantom{To minimize: \textbf{sledgehammer}~}(\textit{hd.simps insert\_Nil}). \\[3\smallskipamount]
   270 %
   271 Sledgehammer: ``\textit{remote\_sine\_e}'' on goal \\
   272 $[a] = [b] \,\Longrightarrow\, a = b$ \\
   273 Try this: \textbf{by} (\textit{metis hd.simps}) (17 ms). \\
   274 To minimize: \textbf{sledgehammer} \textit{min} [\textit{remote\_sine\_e}]~(\textit{hd.simps}). \\[3\smallskipamount]
   275 %
   276 Sledgehammer: ``\textit{remote\_z3}'' on goal \\
   277 $[a] = [b] \,\Longrightarrow\, a = b$ \\
   278 Try this: \textbf{by} (\textit{metis hd.simps}) (17 ms). \\
   279 To minimize: \textbf{sledgehammer} \textit{min} [\textit{remote\_z3}]~(\textit{hd.simps}).
   280 \postw
   281 
   282 Sledgehammer ran E, SInE-E, SPASS, Vampire, Waldmeister, and Z3 in parallel.
   283 Depending on which provers are installed and how many processor cores are
   284 available, some of the provers might be missing or present with a
   285 \textit{remote\_} prefix. Waldmeister is run only for unit equational problems,
   286 where the goal's conclusion is a (universally quantified) equation.
   287 
   288 For each successful prover, Sledgehammer gives a one-liner proof that uses Metis
   289 or the \textit{smt} proof method. For Metis, timings are shown in parentheses,
   290 indicating how fast the call is. You can click the proof to insert it into the
theory text. You can click the ``\textbf{sledgehammer} \textit{min}''
   292 command if you want to look for a shorter (and probably faster) proof. But here
   293 the proof found by Vampire is both short and fast already.
   294 
   295 You can ask Sledgehammer for an Isar text proof by passing the
   296 \textit{isar\_proof} option (\S\ref{output-format}):
   297 
   298 \prew
   299 \textbf{sledgehammer} [\textit{isar\_proof}]
   300 \postw
   301 
   302 When Isar proof construction is successful, it can yield proofs that are more
   303 readable and also faster than the Metis one-liners. This feature is experimental
   304 and is only available for ATPs.
   305 
   306 \section{Hints}
   307 \label{hints}
   308 
   309 This section presents a few hints that should help you get the most out of
   310 Sledgehammer and Metis. Frequently (and infrequently) asked questions are
   311 answered in \S\ref{frequently-asked-questions}.
   312 
   313 \newcommand\point[1]{\medskip\par{\sl\bfseries#1}\par\nopagebreak}
   314 
   315 \point{Presimplify the goal}
   316 
   317 For best results, first simplify your problem by calling \textit{auto} or at
   318 least \textit{safe} followed by \textit{simp\_all}. The SMT solvers provide
   319 arithmetic decision procedures, but the ATPs typically do not (or if they do,
Sledgehammer does not use them yet). Apart from Waldmeister, they are not
   321 especially good at heavy rewriting, but because they regard equations as
   322 undirected, they often prove theorems that require the reverse orientation of a
   323 \textit{simp} rule. Higher-order problems can be tackled, but the success rate
   324 is better for first-order problems. Hence, you may get better results if you
   325 first simplify the problem to remove higher-order features.
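
For example, a presimplification step before calling Sledgehammer might look as follows, where the lemma statement is a placeholder. If \textit{simp\_all} leaves several subgoals, Sledgehammer works on the first one unless you specify another subgoal number (\S\ref{command-syntax}):

\prew
\textbf{lemma} ``\ldots'' \\
\textbf{apply} \textit{safe} \\
\textbf{apply} \textit{simp\_all} \\
\textbf{sledgehammer}
\postw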
   326 
   327 \point{Make sure at least E, SPASS, Vampire, and Z3 are installed}
   328 
   329 Locally installed provers are faster and more reliable than those running on
   330 servers. See \S\ref{installation} for details on how to install them.
   331 
   332 \point{Familiarize yourself with the most important options}
   333 
   334 Sledgehammer's options are fully documented in \S\ref{command-syntax}. Many of
   335 the options are very specialized, but serious users of the tool should at least
   336 familiarize themselves with the following options:
   337 
   338 \begin{enum}
   339 \item[$\bullet$] \textbf{\textit{provers}} (\S\ref{mode-of-operation}) specifies
   340 the automatic provers (ATPs and SMT solvers) that should be run whenever
   341 Sledgehammer is invoked (e.g., ``\textit{provers}~= \textit{e spass
   342 remote\_vampire}''). For convenience, you can omit ``\textit{provers}~=''
   343 and simply write the prover names as a space-separated list (e.g., ``\textit{e
   344 spass remote\_vampire}'').
   345 
   346 \item[$\bullet$] \textbf{\textit{full\_types}} (\S\ref{problem-encoding})
   347 specifies whether type-sound encodings should be used. By default, Sledgehammer
   348 employs a mixture of type-sound and type-unsound encodings, occasionally
   349 yielding unsound ATP proofs. In contrast, SMT solver proofs should always be
   350 sound.
   351 
   352 \item[$\bullet$] \textbf{\textit{max\_relevant}} (\S\ref{relevance-filter})
   353 specifies the maximum number of facts that should be passed to the provers. By
   354 default, the value is prover-dependent but varies between about 150 and 1000. If
   355 the provers time out, you can try lowering this value to, say, 100 or 50 and see
   356 if that helps.
   357 
   358 \item[$\bullet$] \textbf{\textit{isar\_proof}} (\S\ref{output-format}) specifies
   359 that Isar proofs should be generated, instead of one-liner Metis proofs. The
   360 length of the Isar proofs can be controlled by setting
   361 \textit{isar\_shrink\_factor} (\S\ref{output-format}).
   362 
   363 \item[$\bullet$] \textbf{\textit{timeout}} (\S\ref{timeouts}) controls the
provers' time limit. It defaults to 30 seconds, but since Sledgehammer runs
   365 asynchronously you should not hesitate to raise this limit to 60 or 120 seconds
   366 if you are the kind of user who can think clearly while ATPs are active.
   367 \end{enum}
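
As a sketch of how these options combine, the following call runs only E, SPASS, and the remote Vampire with a 60-second time limit; the particular provers and limit are arbitrary choices for illustration:

\prew
\textbf{sledgehammer} [\textit{provers} = \textit{e spass remote\_vampire}, \,\textit{timeout} = 60$\,s$]
\postw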
   368 
   369 Options can be set globally using \textbf{sledgehammer\_params}
   370 (\S\ref{command-syntax}). The command also prints the list of all available
   371 options with their current value. Fact selection can be influenced by specifying
   372 ``$(\textit{add}{:}~\textit{my\_facts})$'' after the \textbf{sledgehammer} call
   373 to ensure that certain facts are included, or simply ``$(\textit{my\_facts})$''
   374 to force Sledgehammer to run only with $\textit{my\_facts}$.
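
For example, the first line below sets the default time limit for subsequent \textbf{sledgehammer} calls, and the second runs Sledgehammer with exactly the facts \textit{hd.simps} and \textit{tl.simps}, bypassing the relevance filter. The option value and fact names are merely illustrative:

\prew
\textbf{sledgehammer\_params} [\textit{timeout} = 60$\,s$] \\
\textbf{sledgehammer} (\textit{hd.simps} \textit{tl.simps})
\postw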
   375 
   376 \section{Frequently Asked Questions}
   377 \label{frequently-asked-questions}
   378 
This section answers frequently (and infrequently) asked questions about
   380 Sledgehammer. It is a good idea to skim over it now even if you don't have any
   381 questions at this stage. And if you have any further questions not listed here,
   382 send them to the author at \authoremail.
   383 
   384 \point{Why does Metis fail to reconstruct the proof?}
   385 
   386 There are many reasons. If Metis runs seemingly forever, that is a sign that the
   387 proof is too difficult for it. Metis's search is complete, so it should
   388 eventually find it, but that's little consolation. There are several possible
   389 solutions:
   390 
   391 \begin{enum}
   392 \item[$\bullet$] Try the \textit{isar\_proof} option (\S\ref{output-format}) to
   393 obtain a step-by-step Isar proof where each step is justified by Metis. Since
   394 the steps are fairly small, Metis is more likely to be able to replay them.
   395 
   396 \item[$\bullet$] Try the \textit{smt} proof method instead of Metis. It is
   397 usually stronger, but you need to have Z3 available to replay the proofs, trust
   398 the SMT solver, or use certificates. See the documentation in the \emph{SMT}
   399 theory (\texttt{\$ISABELLE\_HOME/src/HOL/SMT.thy}) for details.
   400 
   401 \item[$\bullet$] Try the \textit{blast} or \textit{auto} proof methods, passing
   402 the necessary facts via \textbf{unfolding}, \textbf{using}, \textit{intro}{:},
   403 \textit{elim}{:}, \textit{dest}{:}, or \textit{simp}{:}, as appropriate.
   404 \end{enum}
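
Taking the goal from \S\ref{first-steps} and the fact \textit{list.inject} suggested there by SPASS as an example, the last two alternatives might look as follows (each line is a separate alternative). Whether either call succeeds depends on the goal, and the \textit{smt} variant requires Z3 as explained above:

\prew
\textbf{by} (\textit{smt} \textit{list.inject}) \\
\textbf{by} (\textit{auto} \textit{simp}{:}~\textit{list.inject})
\postw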
   405 
   406 In some rare cases, Metis fails fairly quickly, and you get the error message
   407 
   408 \prew
   409 \slshape
   410 Proof reconstruction failed.
   411 \postw
   412 
   413 This usually indicates that Sledgehammer found a type-incorrect proof.
   414 Sledgehammer erases some type information to speed up the search. Try
   415 Sledgehammer again with full type information: \textit{full\_types}
   416 (\S\ref{problem-encoding}), or choose a specific type encoding with
   417 \textit{type\_sys} (\S\ref{problem-encoding}). Older versions of Sledgehammer
   418 were frequent victims of this problem. Now this should very seldom be an issue,
   419 but if you notice many unsound proofs, contact the author at \authoremail.
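
Concretely, the two retries suggested above could be written as follows (each line is a separate attempt), where \textit{poly\_tags} is the type encoding also used in the next question:

\prew
\textbf{sledgehammer} [\textit{full\_types}] \\
\textbf{sledgehammer} [\textit{type\_sys} = \textit{poly\_tags}]
\postw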
   420 
   421 \point{How can I tell whether a generated proof is sound?}
   422 
   423 First, if Metis can reconstruct it, the proof is sound (modulo soundness of
   424 Isabelle's inference kernel). If it fails or runs seemingly forever, you can try
   425 
   426 \prew
   427 \textbf{apply}~\textbf{--} \\
   428 \textbf{sledgehammer} [\textit{type\_sys} = \textit{poly\_tags}] (\textit{metis\_facts})
   429 \postw
   430 
   431 where \textit{metis\_facts} is the list of facts appearing in the suggested
   432 Metis call. The automatic provers should be able to re-find the proof very
   433 quickly if it is sound, and the \textit{type\_sys} $=$ \textit{poly\_tags}
   434 option (\S\ref{problem-encoding}) ensures that no unsound proofs are found.
   435 
   436 The \textit{full\_types} option (\S\ref{problem-encoding}) can also be used
   437 here, but it is unsound in extremely rare degenerate cases such as the
   438 following:
   439 
   440 \prew
   441 \textbf{lemma} ``$\forall x\> y\Colon{'}\!a.\ x = y \,\Longrightarrow \exists f\> g\Colon\mathit{nat} \Rightarrow {'}\!a.\ f \not= g$'' \\
   442 \textbf{sledgehammer} [\textit{full\_types}] (\textit{nat.distinct\/}(1))
   443 \postw
   444 
   445 \point{Which facts are passed to the automatic provers?}
   446 
   447 The relevance filter assigns a score to every available fact (lemma, theorem,
definition, or axiom) based on how many constants that fact shares with the
   449 conjecture. This process iterates to include facts relevant to those just
   450 accepted, but with a decay factor to ensure termination. The constants are
   451 weighted to give unusual ones greater significance. The relevance filter copes
   452 best when the conjecture contains some unusual constants; if all the constants
   453 are common, it is unable to discriminate among the hundreds of facts that are
   454 picked up. The relevance filter is also memoryless: It has no information about
   455 how many times a particular fact has been used in a proof, and it cannot learn.
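
The following formulas give a simplified model of such an iterative filter. They only illustrate the idea described above and are not Sledgehammer's actual formulas; the weight function $w$, the thresholds $p_i$, and the decay constant $d$ are assumptions introduced here. Let $C(\varphi)$ be the (nonempty) set of constants occurring in a fact $\varphi$, let $R_i$ be the set of constants considered relevant at iteration $i$ (initially those of the conjecture), and let $n(c)$ be the number of facts in which the constant $c$ occurs. Rare constants receive a higher weight, and $\varphi$ is accepted at iteration $i$ if $\mathit{score}_i(\varphi) \ge p_i$:
\begin{align*}
w(c) &= \frac{1}{1 + \log n(c)}, \\
\mathit{score}_i(\varphi) &= \frac{\sum_{c \in C(\varphi) \cap R_i} w(c)}{\sum_{c \in C(\varphi)} w(c)}, \\
p_{i+1} &= p_i + d\,(1 - p_i),
\end{align*}
with $0 < p_0 < 1$ and $0 < d < 1$. The constants of newly accepted facts are added to $R_{i+1}$, while the rising threshold $p_i$ plays the role of the decay factor: facts reached in later iterations need an ever closer match to be accepted.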
   456 
   457 The number of facts included in a problem varies from prover to prover, since
   458 some provers get overwhelmed more easily than others. You can show the number of
   459 facts given using the \textit{verbose} option (\S\ref{output-format}) and the
   460 actual facts using \textit{debug} (\S\ref{output-format}).
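
For example, each of the following calls runs Sledgehammer with one of these diagnostic options enabled:

\prew
\textbf{sledgehammer} [\textit{verbose}] \\
\textbf{sledgehammer} [\textit{debug}]
\postw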
   461 
   462 Sledgehammer is good at finding short proofs combining a handful of existing
   463 lemmas. If you are looking for longer proofs, you must typically restrict the
   464 number of facts, by setting the \textit{max\_relevant} option
   465 (\S\ref{relevance-filter}) to, say, 50 or 100.
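
For instance, the following call (with the arbitrary limit 50) passes at most 50 facts to each prover:

\prew
\textbf{sledgehammer} [\textit{max\_relevant} = 50]
\postw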
   466 
   467 You can also influence which facts are actually selected in a number of ways. If
   468 you simply want to ensure that a fact is included, you can specify it using the
   469 ``$(\textit{add}{:}~\textit{my\_facts})$'' syntax. For example:
   470 %
   471 \prew
   472 \textbf{sledgehammer} (\textit{add}: \textit{hd.simps} \textit{tl.simps})
   473 \postw
   474 %
   475 The specified facts then replace the least relevant facts that would otherwise be
   476 included; the other selected facts remain the same.
   477 If you want to direct the selection in a particular direction, you can specify
   478 the facts via \textbf{using}:
   479 %
   480 \prew
   481 \textbf{using} \textit{hd.simps} \textit{tl.simps} \\
   482 \textbf{sledgehammer}
   483 \postw
   484 %
   485 The facts are then more likely to be selected than otherwise, and if they are
   486 selected at iteration $j$ they also influence which facts are selected at
   487 iterations $j + 1$, $j + 2$, etc. To give them even more weight, try
   488 %
   489 \prew
   490 \textbf{using} \textit{hd.simps} \textit{tl.simps} \\
   491 \textbf{apply}~\textbf{--} \\
   492 \textbf{sledgehammer}
   493 \postw
   494 
   495 \point{Why are the generated Isar proofs so ugly/detailed/broken?}
   496 
   497 The current implementation is experimental and explodes exponentially in the
   498 worst case. Work on a new implementation has begun. There is a large body of
   499 research into transforming resolution proofs into natural deduction proofs (such
   500 as Isar proofs), which we hope to leverage. In the meantime, a workaround is to
   501 set the \textit{isar\_shrink\_factor} option (\S\ref{output-format}) to a larger
   502 value or to try several provers and keep the nicest-looking proof.
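
For example, the following call requests an Isar proof with a shrink factor of 2. The value is an arbitrary illustration; in the spirit of the workaround above, larger values presumably yield shorter proofs:

\prew
\textbf{sledgehammer} [\textit{isar\_proof}, \,\textit{isar\_shrink\_factor} = 2]
\postw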
   503 
   504 \point{What is metisFT?}
   505 
   506 The \textit{metisFT} proof method is the fully-typed version of Metis. It is
   507 much slower than \textit{metis}, but the proof search is fully typed, and it
   508 also includes more powerful rules such as the axiom ``$x = \mathit{True}
   509 \mathrel{\lor} x = \mathit{False}$'' for reasoning in higher-order places (e.g.,
   510 in set comprehensions). The method kicks in automatically as a fallback when
   511 \textit{metis} fails, and it is sometimes generated by Sledgehammer instead of
   512 \textit{metis} if the proof obviously requires type information or if
   513 \textit{metis} failed when Sledgehammer preplayed the proof. (By default,
   514 Sledgehammer tries to run \textit{metis} and/or \textit{metisFT} for 4 seconds
   515 to ensure that the generated one-line proofs actually work and to display timing
   516 information. This can be configured using the \textit{preplay\_timeout} option
   517 (\S\ref{timeouts}).)
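
For example, to give preplaying more time than the default 4 seconds, you could write the following; the value of 10 seconds is an arbitrary choice:

\prew
\textbf{sledgehammer} [\textit{preplay\_timeout} = 10$\,s$]
\postw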
   518 
   519 If you see the warning
   520 
   521 \prew
   522 \slshape
   523 Metis: Falling back on ``\textit{metisFT\/}''.
   524 \postw
   525 
   526 in a successful Metis proof, you can advantageously replace the \textit{metis}
   527 call with \textit{metisFT}.
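
Using the Vampire proof from \S\ref{first-steps} purely as an illustration of the syntax, the replacement would turn the first line below into the second:

\prew
\textbf{by} (\textit{metis hd.simps}) \\
\textbf{by} (\textit{metisFT hd.simps})
\postw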
   528 
   529 \point{Should I minimize the number of lemmas?}
   530 
   531 In general, minimization is a good idea, because proofs involving fewer lemmas
tend to be shorter as well, and hence easier for Metis to re-find. But the
   533 opposite is sometimes the case. Keep an eye on the timing information displayed
   534 next to the suggested Metis calls.
   535 
\point{Why does the minimizer sometimes start on its own?}
   537 
   538 There are two scenarios in which this can happen. First, some provers (notably
   539 CVC3, Satallax, and Yices) do not provide proofs or sometimes provide incomplete
   540 proofs. The minimizer is then invoked to find out which facts are actually
needed from the (large) set of facts that was initially given to the prover.
   542 Second, if a prover returns a proof with lots of facts, the minimizer is invoked
   543 automatically since Metis would be unlikely to re-find the proof.
   544 
   545 \point{A strange error occurred---what should I do?}
   546 
   547 Sledgehammer tries to give informative error messages. Please report any strange
   548 error to the author at \authoremail. This applies double if you get the message
   549 
   550 \prew
   551 \slshape
   552 The prover found a type-unsound proof involving ``\textit{foo}'',
   553 ``\textit{bar}'', and ``\textit{baz}'' even though a supposedly type-sound
   554 encoding was used (or, less likely, your axioms are inconsistent). You might
   555 want to report this to the Isabelle developers.
   556 \postw
   557 
   558 \point{Auto can solve it---why not Sledgehammer?}
   559 
   560 Problems can be easy for \textit{auto} and difficult for automatic provers, but
   561 the reverse is also true, so don't be discouraged if your first attempts fail.
   562 Because the system refers to all theorems known to Isabelle, it is particularly
   563 suitable when your goal has a short proof from lemmas that you don't know about.
   564 
   565 \point{Why are there so many options?}
   566 
Sledgehammer's philosophy is that it should work out of the box, without user guidance.
   568 Many of the options are meant to be used mostly by the Sledgehammer developers
   569 for experimentation purposes. Of course, feel free to experiment with them if
   570 you are so inclined.
   571 
   572 \section{Command Syntax}
   573 \label{command-syntax}
   574 
   575 Sledgehammer can be invoked at any point when there is an open goal by entering
   576 the \textbf{sledgehammer} command in the theory file. Its general syntax is as
   577 follows:
   578 
   579 \prew
   580 \textbf{sledgehammer} \textit{subcommand\/$^?$ options\/$^?$ facts\_override\/$^?$ num\/$^?$}
   581 \postw
   582 
   583 For convenience, Sledgehammer is also available in the ``Commands'' submenu of
   584 the ``Isabelle'' menu in Proof General or by pressing the Emacs key sequence C-c
   585 C-a C-s. This is equivalent to entering the \textbf{sledgehammer} command with
   586 no arguments in the theory text.
   587 
   588 In the general syntax, the \textit{subcommand} may be any of the following:
   589 
   590 \begin{enum}
   591 \item[$\bullet$] \textbf{\textit{run} (the default):} Runs Sledgehammer on
   592 subgoal number \textit{num} (1 by default), with the given options and facts.
   593 
   594 \item[$\bullet$] \textbf{\textit{min}:} Attempts to minimize the provided facts
   595 (specified in the \textit{facts\_override} argument) to obtain a simpler proof
   596 involving fewer facts. The options and goal number are as for \textit{run}.
   597 
   598 \item[$\bullet$] \textbf{\textit{messages}:} Redisplays recent messages issued
   599 by Sledgehammer. This allows you to examine results that might have been lost
   600 due to Sledgehammer's asynchronous nature. The \textit{num} argument specifies a
   601 limit on the number of messages to display (5 by default).
   602 
   603 \item[$\bullet$] \textbf{\textit{supported\_provers}:} Prints the list of
   604 automatic provers supported by Sledgehammer. See \S\ref{installation} and
   605 \S\ref{mode-of-operation} for more information on how to install automatic
   606 provers.
   607 
   608 \item[$\bullet$] \textbf{\textit{running\_provers}:} Prints information about
   609 currently running automatic provers, including elapsed runtime and remaining
   610 time until timeout.
   611 
   612 \item[$\bullet$] \textbf{\textit{kill\_provers}:} Terminates all running
   613 automatic provers.
   614 
   615 \item[$\bullet$] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote
   616 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}.
   617 \end{enum}
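
For illustration, here are a few invocations that follow the general syntax above; the numbers are arbitrary. The first runs Sledgehammer with a 60-second time limit on subgoal 2, the second redisplays up to 10 recent messages, and the last two inspect and terminate the running provers:

\prew
\textbf{sledgehammer} \textit{run} [\textit{timeout} = 60$\,s$] 2 \\
\textbf{sledgehammer} \textit{messages} 10 \\
\textbf{sledgehammer} \textit{running\_provers} \\
\textbf{sledgehammer} \textit{kill\_provers}
\postw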

Sledgehammer's behavior can be influenced by various \textit{options}, which can
be specified in brackets after the \textbf{sledgehammer} command. The
\textit{options} are a list of key--value pairs of the form ``[$k_1 = v_1,
\ldots, k_n = v_n$]''. For Boolean options, ``= \textit{true}'' is optional. For
example:

\prew
\textbf{sledgehammer} [\textit{isar\_proof}, \,\textit{timeout} = 120]
\postw

Default values can be set using \textbf{sledgehammer\_\allowbreak params}:

\prew
\textbf{sledgehammer\_params} \textit{options}
\postw

The supported options are described in \S\ref{option-reference}.
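
For example, a theory might fix its preferred defaults once and override them
for individual calls. In this sketch, the provers and values are arbitrary
choices. Writing the multi-word \textit{provers} value without quotation marks
is an assumption.

\prew
(* defaults for subsequent calls; values chosen arbitrarily *) \\
\textbf{sledgehammer\_params} [\textit{provers} = \textit{e}~\textit{spass}, \,\textit{timeout} = 60, \,\textit{verbose}] \\
(* a single call that overrides the default timeout *) \\
\textbf{sledgehammer} [\textit{timeout} = 120]
\postw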

The \textit{facts\_override} argument lets you alter the set of facts that go
through the relevance filter. It may be of the form ``(\textit{facts})'', where
\textit{facts} is a space-separated list of Isabelle facts (theorems, local
assumptions, etc.), in which case the relevance filter is bypassed and the given
facts are used. It may also be of the form ``(\textit{add}:\ \textit{facts}$_1$)'',
``(\textit{del}:\ \textit{facts}$_2$)'', or ``(\textit{add}:\ \textit{facts}$_1$\
\textit{del}:\ \textit{facts}$_2$)'', where the relevance filter is instructed to
proceed as usual except that it should consider \textit{facts}$_1$
highly relevant and \textit{facts}$_2$ fully irrelevant.
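
The plain and the \textit{add}/\textit{del} forms might be used as in the
following sketch. Here \textit{foo\_def}, \textit{bar}, and \textit{baz} are
hypothetical fact names standing for facts of your own theory:

\prew
(* hypothetical facts; bypass the relevance filter and use exactly these *) \\
\textbf{sledgehammer} (\textit{foo\_def}~\textit{bar}) \\
(* filter as usual, favoring \textit{foo\_def} and excluding \textit{baz} *) \\
\textbf{sledgehammer} (\textit{add}:\ \textit{foo\_def}\ \textit{del}:\ \textit{baz})
\postw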

You can instruct Sledgehammer to run automatically on newly entered theorems by
enabling the ``Auto Sledgehammer'' option from the ``Isabelle'' menu in Proof
General. For automatic runs, only the first prover set using \textit{provers}
(\S\ref{mode-of-operation}) is considered, fewer facts are passed to the prover,
\textit{slicing} (\S\ref{mode-of-operation}) is disabled, \textit{full\_types}
(\S\ref{problem-encoding}) is enabled, \textit{verbose} (\S\ref{output-format})
and \textit{debug} (\S\ref{output-format}) are disabled, and \textit{timeout}
(\S\ref{timeouts}) is superseded by the ``Auto Tools Time Limit'' in Proof
General's ``Isabelle'' menu. Sledgehammer's output is also more concise.

\section{Option Reference}
\label{option-reference}

\def\defl{\{}
\def\defr{\}}

\def\flushitem#1{\item[]\noindent\kern-\leftmargin \textbf{#1}}
\def\qty#1{$\left<\textit{#1}\right>$}
\def\qtybf#1{$\mathbf{\left<\textbf{\textit{#1}}\right>}$}
\def\optrue#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\enskip \defl\textit{true}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
\def\opfalse#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\enskip \defl\textit{false}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
\def\opsmart#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\enskip \defl\textit{smart}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
\def\opsmartx#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\enskip \defl\textit{smart}\defr\hfill\\\hbox{}\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
\def\opnodefault#1#2{\flushitem{\textit{#1} = \qtybf{#2}} \nopagebreak\\[\parskip]}
\def\opnodefaultbrk#1#2{\flushitem{$\bigl[$\textit{#1} =$\bigr]$ \qtybf{#2}} \nopagebreak\\[\parskip]}
\def\opdefault#1#2#3{\flushitem{\textit{#1} = \qtybf{#2}\enskip \defl\textit{#3}\defr} \nopagebreak\\[\parskip]}
\def\oparg#1#2#3{\flushitem{\textit{#1} \qtybf{#2} = \qtybf{#3}} \nopagebreak\\[\parskip]}
\def\opargbool#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}
\def\opargboolorsmart#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}

Sledgehammer's options are categorized as follows:\ mode of operation
(\S\ref{mode-of-operation}), problem encoding (\S\ref{problem-encoding}),
relevance filter (\S\ref{relevance-filter}), output format
(\S\ref{output-format}), authentication (\S\ref{authentication}), and timeouts
(\S\ref{timeouts}).

The descriptions below refer to the following syntactic quantities:

\begin{enum}
\item[$\bullet$] \qtybf{string}: A string.
\item[$\bullet$] \qtybf{bool\/}: \textit{true} or \textit{false}.
\item[$\bullet$] \qtybf{smart\_bool\/}: \textit{true}, \textit{false}, or
\textit{smart}.
\item[$\bullet$] \qtybf{int\/}: An integer.
%\item[$\bullet$] \qtybf{float\/}: A floating-point number (e.g., 2.5).
\item[$\bullet$] \qtybf{float\_pair\/}: A pair of floating-point numbers
(e.g., 0.6 0.95).
\item[$\bullet$] \qtybf{smart\_int\/}: An integer or \textit{smart}.
\item[$\bullet$] \qtybf{float\_or\_none\/}: A floating-point number (e.g., 60 or
0.5) expressing a number of seconds, or the keyword \textit{none} ($\infty$
seconds).
\end{enum}

Default values are indicated in braces. Boolean options have a negated
counterpart (e.g., \textit{blocking} vs.\ \textit{non\_blocking}). When setting
Boolean options, ``= \textit{true}'' may be omitted.
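
The following sketch writes out a few of these quantities, including the
negated form of a Boolean option. The values only illustrate the syntax and are
not recommendations. Here \textit{non\_blocking} has the same effect as
\textit{blocking} $=$ \textit{false}.

\prew
\textbf{sledgehammer} [\textit{non\_blocking}, \,\textit{timeout} = \textit{none}] \\
\textbf{sledgehammer} [\textit{max\_relevant} = \textit{smart}, \,\textit{relevance\_thresholds} = 0.6~0.95]
\postw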

\subsection{Mode of Operation}
\label{mode-of-operation}

\begin{enum}
\opnodefaultbrk{provers}{string}
Specifies the automatic provers to use as a space-separated list (e.g.,
``\textit{e}~\textit{spass}~\textit{remote\_vampire}''). The following local
provers are supported:

\begin{enum}
\item[$\bullet$] \textbf{\textit{cvc3}:} CVC3 is an SMT solver developed by
Clark Barrett, Cesare Tinelli, and their colleagues \cite{cvc3}. To use CVC3,
set the environment variable \texttt{CVC3\_SOLVER} to the complete path of the
executable, including the file name. Sledgehammer has been tested with version
2.2.

\item[$\bullet$] \textbf{\textit{e}:} E is a first-order resolution prover
developed by Stephan Schulz \cite{schulz-2002}. To use E, set the environment
variable \texttt{E\_HOME} to the directory that contains the \texttt{eproof}
executable, or install the prebuilt E package from Isabelle's download page. See
\S\ref{installation} for details.

\item[$\bullet$] \textbf{\textit{spass}:} SPASS is a first-order resolution
prover developed by Christoph Weidenbach et al.\ \cite{weidenbach-et-al-2009}.
To use SPASS, set the environment variable \texttt{SPASS\_HOME} to the directory
that contains the \texttt{SPASS} executable, or install the prebuilt SPASS
package from Isabelle's download page. Sledgehammer requires version 3.5 or
above. See \S\ref{installation} for details.

\item[$\bullet$] \textbf{\textit{yices}:} Yices is an SMT solver developed at
SRI \cite{yices}. To use Yices, set the environment variable
\texttt{YICES\_SOLVER} to the complete path of the executable, including the
file name. Sledgehammer has been tested with version 1.0.

\item[$\bullet$] \textbf{\textit{vampire}:} Vampire is a first-order resolution
prover developed by Andrei Voronkov and his colleagues
\cite{riazanov-voronkov-2002}. To use Vampire, set the environment variable
\texttt{VAMPIRE\_HOME} to the directory that contains the \texttt{vampire}
executable. Sledgehammer has been tested with versions 11, 0.6, and 1.0.

\item[$\bullet$] \textbf{\textit{z3}:} Z3 is an SMT solver developed at
Microsoft Research \cite{z3}. To use Z3, set the environment variable
\texttt{Z3\_SOLVER} to the complete path of the executable, including the file
name, and set \texttt{Z3\_NON\_COMMERCIAL=yes} to confirm that you are a
noncommercial user. Sledgehammer has been tested with versions 2.7 to 2.18.

\item[$\bullet$] \textbf{\textit{z3\_atp}:} This version of Z3 pretends to be an
ATP, exploiting Z3's undocumented support for the TPTP format. It is included
for experimental purposes. It requires version 2.18 or above.
\end{enum}

In addition, the following remote provers are supported:

\begin{enum}
\item[$\bullet$] \textbf{\textit{remote\_cvc3}:} The remote version of CVC3 runs
on servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to
point).

\item[$\bullet$] \textbf{\textit{remote\_e}:} The remote version of E runs
on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.

\item[$\bullet$] \textbf{\textit{remote\_leo2}:} LEO-II is an automatic
higher-order prover developed by Christoph Benzm\"uller et al.\ \cite{leo2}. The
remote version of LEO-II runs on Geoff Sutcliffe's Miami servers. In the current
setup, the problems given to LEO-II are only mildly higher-order.

\item[$\bullet$] \textbf{\textit{remote\_satallax}:} Satallax is an automatic
higher-order prover developed by Chad Brown et al.\ \cite{satallax}. The remote
version of Satallax runs on Geoff Sutcliffe's Miami servers. In the current
setup, the problems given to Satallax are only mildly higher-order.

\item[$\bullet$] \textbf{\textit{remote\_sine\_e}:} SInE-E is a metaprover based
on E, developed by Kry\v stof Hoder \cite{sine}. The remote version of SInE-E
runs on Geoff Sutcliffe's Miami servers.

\item[$\bullet$] \textbf{\textit{remote\_snark}:} SNARK is a first-order
resolution prover developed by Stickel et al.\ \cite{snark}. The remote version
of SNARK runs on Geoff Sutcliffe's Miami servers.

\item[$\bullet$] \textbf{\textit{remote\_tofof\_e}:} ToFoF-E is a metaprover
based on E, developed by Geoff Sutcliffe \cite{tofof}; it runs on his Miami
servers. This ATP supports a fragment of the TPTP many-typed first-order format
(TFF). It is supported primarily for experimenting with the
\textit{type\_sys} $=$ \textit{simple} option (\S\ref{problem-encoding}).

\item[$\bullet$] \textbf{\textit{remote\_vampire}:} The remote version of
Vampire runs on Geoff Sutcliffe's Miami servers. Version 9 is used.

\item[$\bullet$] \textbf{\textit{remote\_waldmeister}:} Waldmeister is a unit
equality prover developed by Hillenbrand et al.\ \cite{waldmeister}. It can be
used to prove universally quantified equations using unconditional equations.
The remote version of Waldmeister runs on Geoff Sutcliffe's Miami servers.

\item[$\bullet$] \textbf{\textit{remote\_z3}:} The remote version of Z3 runs on
servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to
point).

\item[$\bullet$] \textbf{\textit{remote\_z3\_atp}:} The remote version of ``Z3
as an ATP'' runs on Geoff Sutcliffe's Miami servers.
\end{enum}

By default, Sledgehammer will run E, SPASS, Vampire, SInE-E, and Z3 (or whatever
the SMT module's \textit{smt\_solver} configuration option is set to) in
parallel, either locally or remotely depending on the number of processor
cores available. For historical reasons, the default value of this option can be
overridden using the option ``Sledgehammer: Provers'' from the ``Isabelle'' menu
in Proof General.

It is a good idea to run several provers in parallel, although it could slow
down your machine. Running E, SPASS, and Vampire for 5~seconds yields a similar
success rate to running the most effective of these for 120~seconds
\cite{boehme-nipkow-2010}.

\opnodefault{prover}{string}
Alias for \textit{provers}.

%\opnodefault{atps}{string}
%Legacy alias for \textit{provers}.

%\opnodefault{atp}{string}
%Legacy alias for \textit{provers}.

\opfalse{blocking}{non\_blocking}
Specifies whether the \textbf{sledgehammer} command should operate
synchronously. The asynchronous (non-blocking) mode lets the user start proving
the putative theorem manually while Sledgehammer looks for a proof, but it can
also be more confusing. Irrespective of the value of this option, Sledgehammer
is always run synchronously for the new jEdit-based user interface or if
\textit{debug} (\S\ref{output-format}) is enabled.

\optrue{slicing}{no\_slicing}
Specifies whether the time allocated to a prover should be sliced into several
segments, each of which has its own set of possibly prover-dependent options.
For SPASS and Vampire, the first slice tries the fast but incomplete
set-of-support (SOS) strategy, whereas the second slice runs without it. For E,
up to three slices are tried, with different weighted search strategies and
numbers of facts. For SMT solvers, several slices are tried with the same options
each time but fewer and fewer facts. According to benchmarks with a timeout of
30 seconds, slicing is a valuable optimization, and you should probably leave it
enabled unless you are conducting experiments. This option is implicitly
disabled for (short) automatic runs.

\nopagebreak
{\small See also \textit{verbose} (\S\ref{output-format}).}

\opfalse{overlord}{no\_overlord}
Specifies whether Sledgehammer should put its temporary files in
\texttt{\$ISA\-BELLE\_\allowbreak HOME\_\allowbreak USER}, which is useful for
debugging Sledgehammer but also unsafe if several instances of the tool are run
simultaneously. The files are identified by the prefix \texttt{prob\_}; you may
safely remove them after Sledgehammer has run.

\nopagebreak
{\small See also \textit{debug} (\S\ref{output-format}).}
\end{enum}
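
To illustrate the options in this subsection, the first call below runs E,
SPASS, and Vampire in parallel for 5~seconds, the configuration mentioned under
\textit{provers}. The second call runs E alone, synchronously and without
slicing, and keeps its problem files, as one might do when debugging. Both
calls are sketches that assume the provers are installed locally. The choice of
provers and the timeout are arbitrary.

\prew
\textbf{sledgehammer} [\textit{provers} = \textit{e}~\textit{spass}~\textit{vampire}, \,\textit{timeout} = 5] \\
\textbf{sledgehammer} [\textit{provers} = \textit{e}, \,\textit{blocking}, \,\textit{no\_slicing}, \,\textit{overlord}]
\postw

Because \textit{overlord} places the files in a fixed directory, the second
call should not be run by several Sledgehammer instances at once.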

\subsection{Problem Encoding}
\label{problem-encoding}

\begin{enum}
\opfalse{explicit\_apply}{implicit\_apply}
Specifies whether function application should be encoded as an explicit
``apply'' operator in ATP problems. If the option is set to \textit{false}, each
function will be directly applied to as many arguments as possible. Enabling
this option can sometimes help discover higher-order proofs that otherwise would
not be found.

\opfalse{full\_types}{partial\_types}
Specifies whether full type information is encoded in ATP problems. Enabling
this option prevents the discovery of type-incorrect proofs, but it can slow
down the ATP slightly. This option is implicitly enabled for automatic runs. For
historical reasons, the default value of this option can be overridden using the
option ``Sledgehammer: Full Types'' from the ``Isabelle'' menu in Proof General.

\opdefault{type\_sys}{string}{smart}
Specifies the type system to use in ATP problems. Some of the type systems are
unsound, meaning that they can give rise to spurious proofs (unreconstructible
using Metis). The supported type systems are listed below, with an indication of
their soundness in parentheses:

\begin{enum}
\item[$\bullet$] \textbf{\textit{erased} (very unsound):} No type information is
supplied to the ATP. Types are simply erased.

\item[$\bullet$] \textbf{\textit{poly\_preds} (sound):} Types are encoded using
a predicate \textit{has\_\allowbreak type\/}$(\tau, t)$ that restricts the range
of bound variables. Constants are annotated with their types, supplied as extra
arguments, to resolve overloading.

\item[$\bullet$] \textbf{\textit{poly\_tags} (sound):} Each term and subterm is
tagged with its type using a function $\mathit{type\_info\/}(\tau, t)$. This
coincides with the encoding used by the \textit{metisFT} command.

\item[$\bullet$] \textbf{\textit{poly\_args} (unsound):}
As with \textit{poly\_preds}, constants are annotated with their types to
resolve overloading, but otherwise no type information is encoded. This
coincides with the encoding used by the \textit{metis} command (before it falls
back on \textit{metisFT}).

\item[$\bullet$]
\textbf{%
\textit{mono\_preds}, \textit{mono\_tags} (sound);
\textit{mono\_args} (unsound):} \\
Similar to \textit{poly\_preds}, \textit{poly\_tags}, and \textit{poly\_args},
respectively, but the problem is additionally monomorphized, meaning that type
variables are instantiated with heuristically chosen ground types.
Monomorphization can simplify reasoning but also leads to larger fact bases,
which can slow down the ATPs.

\item[$\bullet$]
\textbf{%
\textit{mangled\_preds},
\textit{mangled\_tags} (sound); \\
\textit{mangled\_args} (unsound):} \\
Similar to
\textit{mono\_preds}, \textit{mono\_tags}, and \textit{mono\_args},
respectively, but types are mangled in constant names instead of being supplied
as ground term arguments. The binary predicate $\mathit{has\_type\/}(\tau, t)$
becomes a unary predicate $\mathit{has\_type\_}\tau(t)$, and the binary function
$\mathit{type\_info\/}(\tau, t)$ becomes a unary function
$\mathit{type\_info\_}\tau(t)$.

\item[$\bullet$] \textbf{\textit{simple} (sound):} Use the prover's support for
simple types if available; otherwise, fall back on \textit{mangled\_preds}. The
problem is monomorphized.

\item[$\bullet$]
\textbf{%
\textit{poly\_preds}?, \textit{poly\_tags}?, \textit{mono\_preds}?, \textit{mono\_tags}?, \\
\textit{mangled\_preds}?, \textit{mangled\_tags}?, \textit{simple}? (quasi-sound):} \\
The type systems \textit{poly\_preds}, \textit{poly\_tags},
\textit{mono\_preds}, \textit{mono\_tags}, \textit{mangled\_preds},
\textit{mangled\_tags}, and \textit{simple} are fully typed and sound. For each
of these, Sledgehammer also provides a lighter, virtually sound variant
identified by a question mark (`{?}')\ that detects and erases monotonic types,
notably infinite types. (For \textit{simple}, the types are not actually erased
but rather replaced by a shared uniform type of individuals.)

\item[$\bullet$]
\textbf{%
\textit{poly\_preds}!, \textit{poly\_tags}!, \textit{mono\_preds}!, \textit{mono\_tags}!, \\
\textit{mangled\_preds}!, \textit{mangled\_tags}!, \textit{simple}! \\
(mildly unsound):} \\
The type systems \textit{poly\_preds}, \textit{poly\_tags},
\textit{mono\_preds}, \textit{mono\_tags}, \textit{mangled\_preds},
\textit{mangled\_tags}, and \textit{simple} also admit a mildly unsound (but
very efficient) variant identified by an exclamation mark (`{!}') that detects
and erases all types except those that are clearly finite (e.g.,
\textit{bool}). (For \textit{simple}, the types are not actually erased but
rather replaced by a shared uniform type of individuals.)

\item[$\bullet$] \textbf{\textit{smart}:} If \textit{full\_types} is enabled,
uses a sound or virtually sound encoding; otherwise, uses any encoding. The actual
encoding used depends on the ATP and should be the most efficient for that ATP.
\end{enum}

In addition, all the \textit{preds} and \textit{tags} type systems are available
in two variants, a lightweight and a heavyweight variant. The lightweight
variants are generally more efficient and are the default; the heavyweight
variants are identified by a \textit{\_heavy} suffix (e.g.,
\textit{mangled\_preds\_heavy}{?}).

For SMT solvers and ToFoF-E, the type system is always \textit{simple},
irrespective of the value of this option.

\nopagebreak
{\small See also \textit{max\_new\_mono\_instances} (\S\ref{relevance-filter})
and \textit{max\_mono\_iters} (\S\ref{relevance-filter}).}
\end{enum}
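
For example, the first call below enables \textit{full\_types}, so that
\textit{smart} picks a sound or virtually sound encoding suited to the ATP. The
second call names the sound \textit{poly\_tags} type system explicitly and also
enables \textit{explicit\_apply}. This combination is arbitrary and serves only
as an illustration.

\prew
\textbf{sledgehammer} [\textit{full\_types}] \\
\textbf{sledgehammer} [\textit{type\_sys} = \textit{poly\_tags}, \,\textit{explicit\_apply}]
\postw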

\subsection{Relevance Filter}
\label{relevance-filter}

\begin{enum}
\opdefault{relevance\_thresholds}{float\_pair}{\upshape 0.45~0.85}
Specifies the thresholds above which facts are considered relevant by the
relevance filter. The first threshold is used for the first iteration of the
relevance filter and the second threshold is used for the last iteration (if it
is reached). The effective threshold is quadratically interpolated for the other
iterations. Each threshold ranges from 0 to 1, where 0 means that all theorems
are relevant and 1 means that only theorems that refer to previously seen
constants are relevant.

\opdefault{max\_relevant}{smart\_int}{smart}
Specifies the maximum number of facts that may be returned by the relevance
filter. If the option is set to \textit{smart}, it is set to a value that was
empirically found to be appropriate for the prover. A typical value would be
300.

\opdefault{max\_new\_mono\_instances}{int}{\upshape 400}
Specifies the maximum number of monomorphic instances to generate beyond
\textit{max\_relevant}. The higher this limit is, the more monomorphic instances
are potentially generated. Whether monomorphization takes place depends on the
type system used.

\nopagebreak
{\small See also \textit{type\_sys} (\S\ref{problem-encoding}).}

\opdefault{max\_mono\_iters}{int}{\upshape 3}
Specifies the maximum number of iterations for the monomorphization fixpoint
construction. The higher this limit is, the more monomorphic instances are
potentially generated. Whether monomorphization takes place depends on the
type system used.

\nopagebreak
{\small See also \textit{type\_sys} (\S\ref{problem-encoding}).}
\end{enum}
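
As a sketch, a call that lets more facts through the filter might lower both
thresholds and raise the cap on the number of facts at the same time. Lowering
the thresholds alone may have little effect once \textit{max\_relevant} is
reached. The values are arbitrary and would need tuning for a given prover.

\prew
\textbf{sledgehammer} [\textit{relevance\_thresholds} = 0.3~0.8, \,\textit{max\_relevant} = 500]
\postw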

\subsection{Output Format}
\label{output-format}

\begin{enum}

\opfalse{verbose}{quiet}
Specifies whether the \textbf{sledgehammer} command should explain what it does.
This option is implicitly disabled for automatic runs.

\opfalse{debug}{no\_debug}
Specifies whether Sledgehammer should display additional debugging information
beyond what \textit{verbose} already displays. Enabling \textit{debug} also
enables \textit{verbose} and \textit{blocking} (\S\ref{mode-of-operation})
behind the scenes. The \textit{debug} option is implicitly disabled for
automatic runs.

\nopagebreak
{\small See also \textit{overlord} (\S\ref{mode-of-operation}).}

\opfalse{isar\_proof}{no\_isar\_proof}
Specifies whether Isar proofs should be output in addition to one-liner
\textit{metis} proofs. Isar proof construction is still experimental and often
fails. When it succeeds, however, the resulting Isar proofs are usually faster
and sometimes more robust than \textit{metis} proofs.

\opdefault{isar\_shrink\_factor}{int}{\upshape 1}
Specifies the granularity of the Isar proof. A value of $n$ indicates that each
Isar proof step should correspond to a group of up to $n$ consecutive proof
steps in the ATP proof.
\end{enum}
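
For instance, you might want to follow what Sledgehammer is doing and request
an Isar proof in which each step covers up to two consecutive ATP proof steps.
In that case you could write the following (the shrink factor of 2 is an
arbitrary choice):

\prew
\textbf{sledgehammer} [\textit{verbose}, \,\textit{isar\_proof}, \,\textit{isar\_shrink\_factor} = 2]
\postw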

\subsection{Authentication}
\label{authentication}

\begin{enum}
\opnodefault{expect}{string}
Specifies the expected outcome, which must be one of the following:

\begin{enum}
\item[$\bullet$] \textbf{\textit{some}:} Sledgehammer found a (potentially
unsound) proof.
\item[$\bullet$] \textbf{\textit{none}:} Sledgehammer found no proof.
\item[$\bullet$] \textbf{\textit{timeout}:} Sledgehammer timed out.
\item[$\bullet$] \textbf{\textit{unknown}:} Sledgehammer encountered some
problem.
\end{enum}

Sledgehammer emits an error (if \textit{blocking} is enabled) or a warning
(otherwise) if the actual outcome differs from the expected outcome. This option
is useful for regression testing.

\nopagebreak
{\small See also \textit{blocking} (\S\ref{mode-of-operation}) and
\textit{timeout} (\S\ref{timeouts}).}
\end{enum}
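
In a regression test, \textit{expect} can be combined with \textit{blocking} so
that an unexpected outcome is reported as an error rather than as a warning. A
minimal sketch, with an arbitrarily chosen prover and timeout:

\prew
\textbf{sledgehammer} [\textit{provers} = \textit{e}, \,\textit{blocking}, \,\textit{timeout} = 60, \,\textit{expect} = \textit{some}]
\postw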

\subsection{Timeouts}
\label{timeouts}

\begin{enum}
\opdefault{timeout}{float\_or\_none}{\upshape 30}
Specifies the maximum number of seconds that the automatic provers should spend
searching for a proof. This excludes problem preparation and is a soft limit.
For historical reasons, the default value of this option can be overridden using
the option ``Sledgehammer: Time Limit'' from the ``Isabelle'' menu in Proof
General.

\opdefault{preplay\_timeout}{float\_or\_none}{\upshape 4}
Specifies the maximum number of seconds that Metis should spend trying to
``preplay'' the found proof. If this option is set to 0, no preplaying takes
place, and no timing information is displayed next to the suggested Metis calls.
\end{enum}
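
The two limits are set independently. The sketch below gives the provers one
minute and disables preplaying, so no timing information appears next to the
suggested Metis calls. The values are arbitrary.

\prew
\textbf{sledgehammer} [\textit{timeout} = 60, \,\textit{preplay\_timeout} = 0]
\postw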

\let\em=\sl
\bibliography{../manual}{}
\bibliographystyle{abbrv}

\end{document}