wneuper/isa: doc-src/Sledgehammer/sledgehammer.tex@d4754183ccce

     1 \documentclass[a4paper,12pt]{article}

     2 \usepackage[T1]{fontenc}

     3 \usepackage{amsmath}

     4 \usepackage{amssymb}

     5 \usepackage[english,french]{babel}

     6 \usepackage{color}

     7 \usepackage{footmisc}

     8 \usepackage{graphicx}

     9 %\usepackage{mathpazo}

    10 \usepackage{multicol}

    11 \usepackage{stmaryrd}

    12 %\usepackage[scaled=.85]{beramono}

    13 \usepackage{../../lib/texinputs/isabelle,../iman,../pdfsetup}

    15 \newcommand\download{\url{http://www21.in.tum.de/~blanchet/\#software}}

    17 \def\qty#1{\ensuremath{\left<\mathit{#1\/}\right>}}

    18 \def\qtybf#1{$\mathbf{\left<\textbf{\textit{#1\/}}\right>}$}

    20 \newcommand\const[1]{\textsf{#1}}

    22 %\oddsidemargin=4.6mm

    23 %\evensidemargin=4.6mm

    24 %\textwidth=150mm

    25 %\topmargin=4.6mm

    26 %\headheight=0mm

    27 %\headsep=0mm

    28 %\textheight=234mm

    30 \def\Colon{\mathord{:\mkern-1.5mu:}}

    31 %\def\lbrakk{\mathopen{\lbrack\mkern-3.25mu\lbrack}}

    32 %\def\rbrakk{\mathclose{\rbrack\mkern-3.255mu\rbrack}}

    33 \def\lparr{\mathopen{(\mkern-4mu\mid}}

    34 \def\rparr{\mathclose{\mid\mkern-4mu)}}

    36 \def\unk{{?}}

    37 \def\undef{(\lambda x.\; \unk)}

    38 %\def\unr{\textit{others}}

    39 \def\unr{\ldots}

    40 \def\Abs#1{\hbox{\rm{\flqq}}{\,#1\,}\hbox{\rm{\frqq}}}

    41 \def\Q{{\smash{\lower.2ex\hbox{$\scriptstyle?$}}}}

    43 \urlstyle{tt}

    45 \begin{document}

    47 %%% TYPESETTING

    48 %\renewcommand\labelitemi{$\bullet$}

    49 \renewcommand\labelitemi{\raise.065ex\hbox{\small\textbullet}}

    51 \selectlanguage{english}

    53 \title{\includegraphics[scale=0.5]{isabelle_sledgehammer} \\[4ex]

    54 Hammering Away \\[\smallskipamount]

    55 \Large A User's Guide to Sledgehammer for Isabelle/HOL}

    56 \author{\hbox{} \\

    57 Jasmin Christian Blanchette \\

    58 {\normalsize Institut f\"ur Informatik, Technische Universit\"at M\"unchen} \\[4\smallskipamount]

    59 {\normalsize with contributions from} \\[4\smallskipamount]

    60 Lawrence C. Paulson \\

    61 {\normalsize Computer Laboratory, University of Cambridge} \\

    62 \hbox{}}

    64 \maketitle

    66 \tableofcontents

    68 \setlength{\parskip}{.7em plus .2em minus .1em}

    69 \setlength{\parindent}{0pt}

    70 \setlength{\abovedisplayskip}{\parskip}

    71 \setlength{\abovedisplayshortskip}{.9\parskip}

    72 \setlength{\belowdisplayskip}{\parskip}

    73 \setlength{\belowdisplayshortskip}{.9\parskip}

    75 % General-purpose enum environment with correct spacing

    76 \newenvironment{enum}%

    77     {\begin{list}{}{%

    78         \setlength{\topsep}{.1\parskip}%

    79         \setlength{\partopsep}{.1\parskip}%

    80         \setlength{\itemsep}{\parskip}%

    81         \advance\itemsep by-\parsep}}

    82     {\end{list}}

    84 \def\pre{\begingroup\vskip0pt plus1ex\advance\leftskip by\leftmargin

    85 \advance\rightskip by\leftmargin}

    86 \def\post{\vskip0pt plus1ex\endgroup}

    88 \def\prew{\pre\advance\rightskip by-\leftmargin}

    89 \def\postw{\post}

    91 \section{Introduction}

    92 \label{introduction}

    94 Sledgehammer is a tool that applies automatic theorem provers (ATPs)

    95 and satisfiability-modulo-theories (SMT) solvers on the current goal. The

    96 supported ATPs are E \cite{schulz-2002}, E-SInE \cite{sine}, E-ToFoF

    97 \cite{tofof}, LEO-II \cite{leo2}, Satallax \cite{satallax}, SNARK \cite{snark},

    98 SPASS \cite{weidenbach-et-al-2009}, Vampire \cite{riazanov-voronkov-2002}, and

    99 Waldmeister \cite{waldmeister}. The ATPs are run either locally or remotely via

   100 the System\-On\-TPTP web service \cite{sutcliffe-2000}. In addition to the ATPs,

   101 the SMT solvers Z3 \cite{z3} is used by default, and you can tell Sledgehammer

   102 to try CVC3 \cite{cvc3} and Yices \cite{yices} as well; these are run either

   103 locally or on a server at the TU M\"unchen.

   105 The problem passed to the automatic provers consists of your current goal

   106 together with a heuristic selection of hundreds of facts (theorems) from the

   107 current theory context, filtered by relevance. Because jobs are run in the

   108 background, you can continue to work on your proof by other means. Provers can

   109 be run in parallel. Any reply (which may arrive half a minute later) will appear

   110 in the Proof General response buffer.

   112 The result of a successful proof search is some source text that usually (but

   113 not always) reconstructs the proof within Isabelle. For ATPs, the reconstructed

   114 proof relies on the general-purpose \textit{metis} proof method, which

   115 integrates the Metis ATP in Isabelle/HOL with explicit inferences going through

   116 the kernel. Thus its results are correct by construction.

   118 In this manual, we will explicitly invoke the \textbf{sledgehammer} command.

   119 Sledgehammer also provides an automatic mode that can be enabled via the ``Auto

   120 Sledgehammer'' option in Proof General's ``Isabelle'' menu. In this mode,

   121 Sledgehammer is run on every newly entered theorem. The time limit for Auto

   122 Sledgehammer and other automatic tools can be set using the ``Auto Tools Time

   123 Limit'' option.

   125 \newbox\boxA

   126 \setbox\boxA=\hbox{\texttt{NOSPAM}}

   128 \newcommand\authoremail{\texttt{blan{\color{white}NOSPAM}\kern-\wd\boxA{}chette@\allowbreak

   129 in.\allowbreak tum.\allowbreak de}}

   131 To run Sledgehammer, you must make sure that the theory \textit{Sledgehammer} is

   132 imported---this is rarely a problem in practice since it is part of

   133 \textit{Main}. Examples of Sledgehammer use can be found in Isabelle's

   134 \texttt{src/HOL/Metis\_Examples} directory.

   135 Comments and bug reports concerning Sledgehammer or this manual should be

   136 directed to the author at \authoremail.

   138 \vskip2.5\smallskipamount

   140 %\textbf{Acknowledgment.} The author would like to thank Mark Summerfield for

   141 %suggesting several textual improvements.

   143 \section{Installation}

   144 \label{installation}

   146 Sledgehammer is part of Isabelle, so you don't need to install it. However, it

   147 relies on third-party automatic provers (ATPs and SMT solvers).

   149 Among the ATPs, E, LEO-II, Satallax, SPASS, and Vampire can be run locally; in

   150 addition, E, E-SInE, E-ToFoF, iProver, iProver-Eq, LEO-II, Satallax, SNARK,

   151 Waldmeister, and Vampire are available remotely via System\-On\-TPTP

   152 \cite{sutcliffe-2000}. If you want better performance, you should at least

   153 install E and SPASS locally.

   155 Among the SMT solvers, CVC3, Yices, and Z3 can be run locally, and CVC3 and Z3

   156 can be run remotely on a TU M\"unchen server. If you want better performance and

   157 get the ability to replay proofs that rely on the \emph{smt} proof method

   158 without an Internet connection, you should at least install Z3 locally.

   160 There are three main ways to install automatic provers on your machine:

   162 \begin{sloppy}

   163 \begin{enum}

   164 \item[\labelitemi] If you installed an official Isabelle package, it should

   165 already include properly setup executables for CVC3, E, SPASS, and Z3, ready to use.%

   166 \footnote{Vampire's and Yices's licenses prevent us from doing the same for

   167 these otherwise remarkable tools.}

   168 For Z3, you must additionally set the variable

   169 \texttt{Z3\_NON\_COMMERCIAL} to ``yes'' to confirm that you are a

   170 noncommercial user, either in the environment in which Isabelle is

   171 launched or in your

   172 \texttt{\char`\~/\$ISABELLE\_HOME\_USER/etc/settings} file.

   174 \item[\labelitemi] Alternatively, you can download the Isabelle-aware CVC3, E,

   175 SPASS, and Z3 binary packages from \download. Extract the archives, then add a

   176 line to your \texttt{\$ISABELLE\_HOME\_USER\slash etc\slash components}%

   177 \footnote{The variable \texttt{\$ISABELLE\_HOME\_USER} is set by Isabelle at

   178 startup. Its value can be retrieved by executing \texttt{isabelle}

   179 \texttt{getenv} \texttt{ISABELLE\_HOME\_USER} on the command line.}

   180 file with the absolute path to CVC3, E, SPASS, or Z3. For example, if the

   181 \texttt{components} file does not exist yet and you extracted SPASS to

   182 \texttt{/usr/local/spass-3.7}, create it with the single line

   184 \prew

   185 \texttt{/usr/local/spass-3.7}

   186 \postw

   188 (including an invisible newline character) in it.

   190 \item[\labelitemi] If you prefer to build E, LEO-II, Satallax, or SPASS

   191 manually, or found a Vampire executable somewhere (e.g.,

   192 \url{http://www.vprover.org/}), set the environment variable \texttt{E\_HOME},

   193 \texttt{LEO2\_HOME}, \texttt{SATALLAX\_HOME}, \texttt{SPASS\_HOME}, or

   194 \texttt{VAMPIRE\_HOME} to the directory that contains the \texttt{eproof},

   195 \texttt{leo}, \texttt{satallax}, \texttt{SPASS}, or \texttt{vampire} executable.

   196 Sledgehammer has been tested with E 1.0 to 1.4, LEO-II 1.2.9, Satallax 2.2,

   197 SPASS 3.5 and 3.7, and Vampire 0.6, 1.0, and 1.8%

   198 \footnote{Following the rewrite of Vampire, the counter for version numbers was

   199 reset to 0; hence the (new) Vampire versions 0.6, 1.0, and 1.8 are more recent

   200 than 9.0 or 11.5.}%

   201 . Since the ATPs' output formats are neither documented nor stable, other

   202 versions of the ATPs might not work well with Sledgehammer. Ideally,

   203 also set \texttt{E\_VERSION}, \texttt{LEO2\_VERSION},

   204 \texttt{SATALLAX\_VERSION}, \texttt{SPASS\_VERSION}, or

   205 \texttt{VAMPIRE\_VERSION} to the prover's version number (e.g., ``1.4'').

   207 Similarly, if you want to build CVC3, or found a

   208 Yices or Z3 executable somewhere (e.g.,

   209 \url{http://yices.csl.sri.com/download.shtml} or

   210 \url{http://research.microsoft.com/en-us/um/redmond/projects/z3/download.html}),

   211 set the environment variable \texttt{CVC3\_\allowbreak SOLVER},

   212 \texttt{YICES\_SOLVER}, or \texttt{Z3\_SOLVER} to the complete path of

   213 the executable, \emph{including the file name}. Sledgehammer has been tested

   214 with CVC3 2.2, Yices 1.0.28, and Z3 3.0 to 3.2.

   215 Since the SMT solvers' output formats are somewhat unstable, other

   216 versions of the solvers might not work well with Sledgehammer. Ideally,

   217 also set \texttt{CVC3\_VERSION}, \texttt{YICES\_VERSION}, or

   218 \texttt{Z3\_VERSION} to the solver's version number (e.g., ``3.2'').

   219 \end{enum}

   220 \end{sloppy}

   222 To check whether E, SPASS, Vampire, and/or Z3 are successfully installed, try

   223 out the example in \S\ref{first-steps}. If the remote versions of any of these

   224 provers is used (identified by the prefix ``\emph{remote\_\/}''), or if the

   225 local versions fail to solve the easy goal presented there, something must be

   226 wrong with the installation.

   228 Remote prover invocation requires Perl with the World Wide Web Library

   229 (\texttt{libwww-perl}) installed. If you must use a proxy server to access the

   230 Internet, set the \texttt{http\_proxy} environment variable to the proxy, either

   231 in the environment in which Isabelle is launched or in your

   232 \texttt{\char`\~/\$ISABELLE\_HOME\_USER/etc/settings} file. Here are a few

   233 examples:

   235 \prew

   236 \texttt{http\_proxy=http://proxy.example.org} \\

   237 \texttt{http\_proxy=http://proxy.example.org:8080} \\

   238 \texttt{http\_proxy=http://joeblow:pAsSwRd@proxy.example.org}

   239 \postw

   241 \section{First Steps}

   242 \label{first-steps}

   244 To illustrate Sledgehammer in context, let us start a theory file and

   245 attempt to prove a simple lemma:

   247 \prew

   248 \textbf{theory}~\textit{Scratch} \\

   249 \textbf{imports}~\textit{Main} \\

   250 \textbf{begin} \\[2\smallskipamount]

   251 %

   252 \textbf{lemma} ``$[a] = [b] \,\Longrightarrow\, a = b$'' \\

   253 \textbf{sledgehammer}

   254 \postw

   256 Instead of issuing the \textbf{sledgehammer} command, you can also find

   257 Sledgehammer in the ``Commands'' submenu of the ``Isabelle'' menu in Proof

   258 General or press the Emacs key sequence C-c C-a C-s.

   259 Either way, Sledgehammer produces the following output after a few seconds:

   261 \prew

   262 \slshape

   263 Sledgehammer: ``\textit{e\/}'' on goal \\

   264 $[a] = [b] \,\Longrightarrow\, a = b$ \\

   265 Try this: \textbf{by} (\textit{metis last\_ConsL}) (64 ms). \\[3\smallskipamount]

   266 %

   267 Sledgehammer: ``\textit{z3\/}'' on goal \\

   268 $[a] = [b] \,\Longrightarrow\, a = b$ \\

   269 Try this: \textbf{by} (\textit{metis list.inject}) (20 ms). \\[3\smallskipamount]

   270 %

   271 Sledgehammer: ``\textit{vampire\/}'' on goal \\

   272 $[a] = [b] \,\Longrightarrow\, a = b$ \\

   273 Try this: \textbf{by} (\textit{metis hd.simps}) (14 ms). \\[3\smallskipamount]

   274 %

   275 Sledgehammer: ``\textit{spass\/}'' on goal \\

   276 $[a] = [b] \,\Longrightarrow\, a = b$ \\

   277 Try this: \textbf{by} (\textit{metis list.inject}) (17 ms). \\[3\smallskipamount]

   278 %

   279 Sledgehammer: ``\textit{remote\_waldmeister\/}'' on goal \\

   280 $[a] = [b] \,\Longrightarrow\, a = b$ \\

   281 Try this: \textbf{by} (\textit{metis hd.simps}) (15 ms). \\[3\smallskipamount]

   282 %

   283 Sledgehammer: ``\textit{remote\_e\_sine\/}'' on goal \\

   284 $[a] = [b] \,\Longrightarrow\, a = b$ \\

   285 Try this: \textbf{by} (\textit{metis hd.simps}) (18 ms).

   286 \postw

   288 Sledgehammer ran E, E-SInE, SPASS, Vampire, Waldmeister, and Z3 in parallel.

   289 Depending on which provers are installed and how many processor cores are

   290 available, some of the provers might be missing or present with a

   291 \textit{remote\_} prefix. Waldmeister is run only for unit equational problems,

   292 where the goal's conclusion is a (universally quantified) equation.

   294 For each successful prover, Sledgehammer gives a one-liner proof that uses

   295 the \textit{metis} or \textit{smt} proof method. Approximate timings are shown

   296 in parentheses, indicating how fast the call is. You can click the proof to

   297 insert it into the theory text.

   299 In addition, you can ask Sledgehammer for an Isar text proof by passing the

   300 \textit{isar\_proof} option (\S\ref{output-format}):

   302 \prew

   303 \textbf{sledgehammer} [\textit{isar\_proof}]

   304 \postw

   306 When Isar proof construction is successful, it can yield proofs that are more

   307 readable and also faster than the \textit{metis} or \textit{smt} one-liners.

   308 This feature is experimental and is only available for ATPs.

   310 \section{Hints}

   311 \label{hints}

   313 This section presents a few hints that should help you get the most out of

   314 Sledgehammer. Frequently (and infrequently) asked questions are answered in

   315 \S\ref{frequently-asked-questions}.

   317 %\newcommand\point[1]{\medskip\par{\sl\bfseries#1}\par\nopagebreak}

   318 \newcommand\point[1]{\subsection{\emph{#1}}}

   320 \point{Presimplify the goal}

   322 For best results, first simplify your problem by calling \textit{auto} or at

   323 least \textit{safe} followed by \textit{simp\_all}. The SMT solvers provide

   324 arithmetic decision procedures, but the ATPs typically do not (or if they do,

   325 Sledgehammer does not use it yet). Apart from Waldmeister, they are not

   326 especially good at heavy rewriting, but because they regard equations as

   327 undirected, they often prove theorems that require the reverse orientation of a

   328 \textit{simp} rule. Higher-order problems can be tackled, but the success rate

   329 is better for first-order problems. Hence, you may get better results if you

   330 first simplify the problem to remove higher-order features.

   332 \point{Make sure E, SPASS, Vampire, and Z3 are locally installed}

   334 Locally installed provers are faster and more reliable than those running on

   335 servers. See \S\ref{installation} for details on how to install them.

   337 \point{Familiarize yourself with the most important options}

   339 Sledgehammer's options are fully documented in \S\ref{command-syntax}. Many of

   340 the options are very specialized, but serious users of the tool should at least

   341 familiarize themselves with the following options:

   343 \begin{enum}

   344 \item[\labelitemi] \textbf{\textit{provers}} (\S\ref{mode-of-operation}) specifies

   345 the automatic provers (ATPs and SMT solvers) that should be run whenever

   346 Sledgehammer is invoked (e.g., ``\textit{provers}~= \textit{e spass

   347 remote\_vampire\/}''). For convenience, you can omit ``\textit{provers}~=''

   348 and simply write the prover names as a space-separated list (e.g., ``\textit{e

   349 spass remote\_vampire\/}'').

   351 \item[\labelitemi] \textbf{\textit{max\_relevant}} (\S\ref{relevance-filter})

   352 specifies the maximum number of facts that should be passed to the provers. By

   353 default, the value is prover-dependent but varies between about 150 and 1000. If

   354 the provers time out, you can try lowering this value to, say, 100 or 50 and see

   355 if that helps.

   357 \item[\labelitemi] \textbf{\textit{isar\_proof}} (\S\ref{output-format}) specifies

   358 that Isar proofs should be generated, instead of one-liner \textit{metis} or

   359 \textit{smt} proofs. The length of the Isar proofs can be controlled by setting

   360 \textit{isar\_shrink\_factor} (\S\ref{output-format}).

   362 \item[\labelitemi] \textbf{\textit{timeout}} (\S\ref{timeouts}) controls the

   363 provers' time limit. It is set to 30 seconds, but since Sledgehammer runs

   364 asynchronously you should not hesitate to raise this limit to 60 or 120 seconds

   365 if you are the kind of user who can think clearly while ATPs are active.

   366 \end{enum}

   368 Options can be set globally using \textbf{sledgehammer\_params}

   369 (\S\ref{command-syntax}). The command also prints the list of all available

   370 options with their current value. Fact selection can be influenced by specifying

   371 ``$(\textit{add}{:}~\textit{my\_facts})$'' after the \textbf{sledgehammer} call

   372 to ensure that certain facts are included, or simply ``$(\textit{my\_facts})$''

   373 to force Sledgehammer to run only with $\textit{my\_facts}$.

   375 \section{Frequently Asked Questions}

   376 \label{frequently-asked-questions}

   378 This sections answers frequently (and infrequently) asked questions about

   379 Sledgehammer. It is a good idea to skim over it now even if you don't have any

   380 questions at this stage. And if you have any further questions not listed here,

   381 send them to the author at \authoremail.

   383 \point{Which facts are passed to the automatic provers?}

   385 The relevance filter assigns a score to every available fact (lemma, theorem,

   386 definition, or axiom) based upon how many constants that fact shares with the

   387 conjecture. This process iterates to include facts relevant to those just

   388 accepted, but with a decay factor to ensure termination. The constants are

   389 weighted to give unusual ones greater significance. The relevance filter copes

   390 best when the conjecture contains some unusual constants; if all the constants

   391 are common, it is unable to discriminate among the hundreds of facts that are

   392 picked up. The relevance filter is also memoryless: It has no information about

   393 how many times a particular fact has been used in a proof, and it cannot learn.

   395 The number of facts included in a problem varies from prover to prover, since

   396 some provers get overwhelmed more easily than others. You can show the number of

   397 facts given using the \textit{verbose} option (\S\ref{output-format}) and the

   398 actual facts using \textit{debug} (\S\ref{output-format}).

   400 Sledgehammer is good at finding short proofs combining a handful of existing

   401 lemmas. If you are looking for longer proofs, you must typically restrict the

   402 number of facts, by setting the \textit{max\_relevant} option

   403 (\S\ref{relevance-filter}) to, say, 25 or 50.

   405 You can also influence which facts are actually selected in a number of ways. If

   406 you simply want to ensure that a fact is included, you can specify it using the

   407 ``$(\textit{add}{:}~\textit{my\_facts})$'' syntax. For example:

   408 %

   409 \prew

   410 \textbf{sledgehammer} (\textit{add}: \textit{hd.simps} \textit{tl.simps})

   411 \postw

   412 %

   413 The specified facts then replace the least relevant facts that would otherwise be

   414 included; the other selected facts remain the same.

   415 If you want to direct the selection in a particular direction, you can specify

   416 the facts via \textbf{using}:

   417 %

   418 \prew

   419 \textbf{using} \textit{hd.simps} \textit{tl.simps} \\

   420 \textbf{sledgehammer}

   421 \postw

   422 %

   423 The facts are then more likely to be selected than otherwise, and if they are

   424 selected at iteration $j$ they also influence which facts are selected at

   425 iterations $j + 1$, $j + 2$, etc. To give them even more weight, try

   426 %

   427 \prew

   428 \textbf{using} \textit{hd.simps} \textit{tl.simps} \\

   429 \textbf{apply}~\textbf{--} \\

   430 \textbf{sledgehammer}

   431 \postw

   433 \point{Why does Metis fail to reconstruct the proof?}

   435 There are many reasons. If Metis runs seemingly forever, that is a sign that the

   436 proof is too difficult for it. Metis's search is complete, so it should

   437 eventually find it, but that's little consolation. There are several possible

   438 solutions:

   440 \begin{enum}

   441 \item[\labelitemi] Try the \textit{isar\_proof} option (\S\ref{output-format}) to

   442 obtain a step-by-step Isar proof where each step is justified by \textit{metis}.

   443 Since the steps are fairly small, \textit{metis} is more likely to be able to

   444 replay them.

   446 \item[\labelitemi] Try the \textit{smt} proof method instead of \textit{metis}. It

   447 is usually stronger, but you need to either have Z3 available to replay the

   448 proofs, trust the SMT solver, or use certificates. See the documentation in the

   449 \emph{SMT} theory (\texttt{\$ISABELLE\_HOME/src/HOL/SMT.thy}) for details.

   451 \item[\labelitemi] Try the \textit{blast} or \textit{auto} proof methods, passing

   452 the necessary facts via \textbf{unfolding}, \textbf{using}, \textit{intro}{:},

   453 \textit{elim}{:}, \textit{dest}{:}, or \textit{simp}{:}, as appropriate.

   454 \end{enum}

   456 In some rare cases, \textit{metis} fails fairly quickly, and you get the error

   457 message

   459 \prew

   460 \slshape

   461 One-line proof reconstruction failed.

   462 \postw

   464 This message indicates that Sledgehammer determined that the goal is provable,

   465 but the proof is, for technical reasons, beyond \textit{metis}'s power. You can

   466 then try again with the \textit{strict} option (\S\ref{problem-encoding}).

   468 If the goal is actually unprovable are you did not specify an unsound encoding

   469 using \textit{type\_enc} (\S\ref{problem-encoding}), this is a bug, and you are

   470 strongly encouraged to report this to the author at \authoremail.

   472 \point{Why are the generated Isar proofs so ugly/broken?}

   474 The current implementation of the Isar proof feature,

   475 enabled by the \textit{isar\_proof} option (\S\ref{output-format}),

   476 is highly experimental. Work on a new implementation has begun. There is a large body of

   477 research into transforming resolution proofs into natural deduction proofs (such

   478 as Isar proofs), which we hope to leverage. In the meantime, a workaround is to

   479 set the \textit{isar\_shrink\_factor} option (\S\ref{output-format}) to a larger

   480 value or to try several provers and keep the nicest-looking proof.

   482 \point{How can I tell whether a suggested proof is sound?}

   484 Earlier versions of Sledgehammer often suggested unsound proofs---either proofs

   485 of nontheorems or simply proofs that rely on type-unsound inferences. This

   486 is a thing of the pass, unless you explicitly specify an unsound encoding

   487 using \textit{type\_enc} (\S\ref{problem-encoding}).

   488 %

   489 Officially, the only form of ``unsoundness'' that lurks in the sound

   490 encodings is related to missing characteristic theorems of datatypes. For

   491 example,

   493 \prew

   494 \textbf{lemma}~``$\exists \mathit{xs}.\; \mathit{xs} \neq []$'' \\

   495 \textbf{sledgehammer} ()

   496 \postw

   498 suggests an argumentless \textit{metis} call that fails. However, the conjecture

   499 does actually hold, and the \textit{metis} call can be repaired by adding

   500 \textit{list.distinct}.

   501 %

   502 We hope to address this problem in a future version of Isabelle. In the

   503 meantime, you can avoid it by passing the \textit{strict} option

   504 (\S\ref{problem-encoding}).

   506 \point{What are the \textit{full\_types}, \textit{no\_types}, and

   507 \textit{mono\_tags} arguments to Metis?}

   509 The \textit{metis}~(\textit{full\_types}) proof method

   510 and its cousin \textit{metis}~(\textit{mono\_tags}) are fully-typed

   511 version of Metis. It is somewhat slower than \textit{metis}, but the proof

   512 search is fully typed, and it also includes more powerful rules such as the

   513 axiom ``$x = \const{True} \mathrel{\lor} x = \const{False}$'' for reasoning in

   514 higher-order places (e.g., in set comprehensions). The method kicks in

   515 automatically as a fallback when \textit{metis} fails, and it is sometimes

   516 generated by Sledgehammer instead of \textit{metis} if the proof obviously

   517 requires type information or if \textit{metis} failed when Sledgehammer

   518 preplayed the proof. (By default, Sledgehammer tries to run \textit{metis} with

   519 various options for up to 3 seconds each time to ensure that the generated

   520 one-line proofs actually work and to display timing information. This can be

   521 configured using the \textit{preplay\_timeout} option (\S\ref{timeouts}).)

   522 %

   523 At the other end of the soundness spectrum, \textit{metis} (\textit{no\_types})

   524 uses no type information at all during the proof search, which is more efficient

   525 but often fails. Calls to \textit{metis} (\textit{no\_types}) are occasionally

   526 generated by Sledgehammer.

   527 %

   528 See the \textit{type\_enc} option (\S\ref{problem-encoding}) for details.

   530 Incidentally, if you ever see warnings such as

   532 \prew

   533 \slshape

   534 Metis: Falling back on ``\textit{metis} (\textit{full\_types})''.

   535 \postw

   537 for a successful \textit{metis} proof, you can advantageously pass the

   538 \textit{full\_types} option to \textit{metis} directly.

   540 \point{And what are the \textit{lifting} and \textit{hide\_lams} arguments

   541 to Metis?}

   543 Orthogonally to the encoding of types, it is important to choose an appropriate

   544 translation of $\lambda$-abstractions. Metis supports three translation schemes,

   545 in decreasing order of power: Curry combinators (the default),

   546 $\lambda$-lifting, and a ``hiding'' scheme that disables all reasoning under

   547 $\lambda$-abstractions. The more powerful schemes also give the automatic

   548 provers more rope to hang themselves. See the \textit{lam\_trans} option (\S\ref{problem-encoding}) for details.

   550 \point{Are generated proofs minimal?}

   552 Automatic provers frequently use many more facts than are necessary.

   553 Sledgehammer inclues a minimization tool that takes a set of facts returned by a

   554 given prover and repeatedly calls the same prover, \textit{metis}, or

   555 \textit{smt} with subsets of those axioms in order to find a minimal set.

   556 Reducing the number of axioms typically improves Metis's speed and success rate,

   557 while also removing superfluous clutter from the proof scripts.

   559 In earlier versions of Sledgehammer, generated proofs were systematically

   560 accompanied by a suggestion to invoke the minimization tool. This step is now

   561 performed implicitly if it can be done in a reasonable amount of time (something

   562 that can be guessed from the number of facts in the original proof and the time

   563 it took to find or preplay it).

   565 In addition, some provers (e.g., Yices) do not provide proofs or sometimes

   566 produce incomplete proofs. The minimizer is then invoked to find out which facts

   567 are actually needed from the (large) set of facts that was initinally given to

   568 the prover. Finally, if a prover returns a proof with lots of facts, the

   569 minimizer is invoked automatically since Metis would be unlikely to re-find the

   570 proof.

   571 %

   572 Automatic minimization can be forced or disabled using the \textit{minimize}

   573 option (\S\ref{mode-of-operation}).

   575 \point{A strange error occurred---what should I do?}

   577 Sledgehammer tries to give informative error messages. Please report any strange

   578 error to the author at \authoremail. This applies double if you get the message

   580 \prew

   581 \slshape

   582 The prover found a type-unsound proof involving ``\textit{foo\/}'',

   583 ``\textit{bar\/}'', and ``\textit{baz\/}'' even though a supposedly type-sound

   584 encoding was used (or, less likely, your axioms are inconsistent). You might

   585 want to report this to the Isabelle developers.

   586 \postw

   588 \point{Auto can solve it---why not Sledgehammer?}

   590 Problems can be easy for \textit{auto} and difficult for automatic provers, but

   591 the reverse is also true, so don't be discouraged if your first attempts fail.

   592 Because the system refers to all theorems known to Isabelle, it is particularly

   593 suitable when your goal has a short proof from lemmas that you don't know about.

   595 \point{Why are there so many options?}

   597 Sledgehammer's philosophy should work out of the box, without user guidance.

   598 Many of the options are meant to be used mostly by the Sledgehammer developers

   599 for experimentation purposes. Of course, feel free to experiment with them if

   600 you are so inclined.

   602 \section{Command Syntax}

   603 \label{command-syntax}

   605 \subsection{Sledgehammer}

   607 Sledgehammer can be invoked at any point when there is an open goal by entering

   608 the \textbf{sledgehammer} command in the theory file. Its general syntax is as

   609 follows:

   611 \prew

   612 \textbf{sledgehammer} \qty{subcommand}$^?$ \qty{options}$^?$ \qty{facts\_override}$^?$ \qty{num}$^?$

   613 \postw

   615 For convenience, Sledgehammer is also available in the ``Commands'' submenu of

   616 the ``Isabelle'' menu in Proof General or by pressing the Emacs key sequence C-c

   617 C-a C-s. This is equivalent to entering the \textbf{sledgehammer} command with

   618 no arguments in the theory text.

   620 In the general syntax, the \qty{subcommand} may be any of the following:

   622 \begin{enum}

   623 \item[\labelitemi] \textbf{\textit{run} (the default):} Runs Sledgehammer on

   624 subgoal number \qty{num} (1 by default), with the given options and facts.

   626 \item[\labelitemi] \textbf{\textit{min}:} Attempts to minimize the facts

   627 specified in the \qty{facts\_override} argument to obtain a simpler proof

   628 involving fewer facts. The options and goal number are as for \textit{run}.

   630 \item[\labelitemi] \textbf{\textit{messages}:} Redisplays recent messages issued

   631 by Sledgehammer. This allows you to examine results that might have been lost

   632 due to Sledgehammer's asynchronous nature. The \qty{num} argument specifies a

   633 limit on the number of messages to display (5 by default).

   635 \item[\labelitemi] \textbf{\textit{supported\_provers}:} Prints the list of

   636 automatic provers supported by Sledgehammer. See \S\ref{installation} and

   637 \S\ref{mode-of-operation} for more information on how to install automatic

   638 provers.

   640 \item[\labelitemi] \textbf{\textit{running\_provers}:} Prints information about

   641 currently running automatic provers, including elapsed runtime and remaining

   642 time until timeout.

   644 \item[\labelitemi] \textbf{\textit{kill\_provers}:} Terminates all running

   645 automatic provers.

   647 \item[\labelitemi] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote

   648 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}.

   649 \end{enum}

   651 Sledgehammer's behavior can be influenced by various \qty{options}, which can be

   652 specified in brackets after the \textbf{sledgehammer} command. The

   653 \qty{options} are a list of key--value pairs of the form ``[$k_1 = v_1,

   654 \ldots, k_n = v_n$]''. For Boolean options, ``= \textit{true\/}'' is optional. For

   655 example:

   657 \prew

   658 \textbf{sledgehammer} [\textit{isar\_proof}, \,\textit{timeout} = 120]

   659 \postw

   661 Default values can be set using \textbf{sledgehammer\_\allowbreak params}:

   663 \prew

   664 \textbf{sledgehammer\_params} \qty{options}

   665 \postw

   667 The supported options are described in \S\ref{option-reference}.

   669 The \qty{facts\_override} argument lets you alter the set of facts that go

   670 through the relevance filter. It may be of the form ``(\qty{facts})'', where

   671 \qty{facts} is a space-separated list of Isabelle facts (theorems, local

   672 assumptions, etc.), in which case the relevance filter is bypassed and the given

   673 facts are used. It may also be of the form ``(\textit{add}:\ \qty{facts\/_{\mathrm{1}}})'',

   674 ``(\textit{del}:\ \qty{facts\/_{\mathrm{2}}})'', or ``(\textit{add}:\ \qty{facts\/_{\mathrm{1}}}\

   675 \textit{del}:\ \qty{facts\/_{\mathrm{2}}})'', where the relevance filter is instructed to

   676 proceed as usual except that it should consider \qty{facts\/_{\mathrm{1}}}

   677 highly-relevant and \qty{facts\/_{\mathrm{2}}} fully irrelevant.

   679 You can instruct Sledgehammer to run automatically on newly entered theorems by

   680 enabling the ``Auto Sledgehammer'' option in Proof General's ``Isabelle'' menu.

   681 For automatic runs, only the first prover set using \textit{provers}

   682 (\S\ref{mode-of-operation}) is considered, fewer facts are passed to the prover,

   683 \textit{slice} (\S\ref{mode-of-operation}) is disabled, \textit{strict}

   684 (\S\ref{problem-encoding}) is enabled, \textit{verbose} (\S\ref{output-format})

   685 and \textit{debug} (\S\ref{output-format}) are disabled, and \textit{timeout}

   686 (\S\ref{timeouts}) is superseded by the ``Auto Tools Time Limit'' in Proof

   687 General's ``Isabelle'' menu. Sledgehammer's output is also more concise.

   689 \subsection{Metis}

   691 The \textit{metis} proof method has the syntax

   693 \prew

   694 \textbf{\textit{metis}}~(\qty{options})${}^?$~\qty{facts}${}^?$

   695 \postw

   697 where \qty{facts} is a list of arbitrary facts and \qty{options} is a

   698 comma-separated list consisting of at most one $\lambda$ translation scheme

   699 specification with the same semantics as Sledgehammer's \textit{lam\_trans}

   700 option (\S\ref{problem-encoding}) and at most one type encoding specification

   701 with the same semantics as Sledgehammer's \textit{type\_enc} option

   702 (\S\ref{problem-encoding}).

   703 %

   704 The supported $\lambda$ translation schemes are \textit{hide\_lams},

   705 \textit{lifting}, and \textit{combs} (the default).

   706 %

   707 All the untyped type encodings listed in \S\ref{problem-encoding} are supported.

   708 For convenience, the following aliases are provided:

   709 \begin{enum}

   710 \item[\labelitemi] \textbf{\textit{full\_types}:} Synonym for \textit{poly\_guards\_query}.

   711 \item[\labelitemi] \textbf{\textit{partial\_types}:} Synonym for \textit{poly\_args}.

   712 \item[\labelitemi] \textbf{\textit{no\_types}:} Synonym for \textit{erased}.

   713 \end{enum}

   715 \section{Option Reference}

   716 \label{option-reference}

   718 \def\defl{\{}

   719 \def\defr{\}}

   721 \def\flushitem#1{\item[]\noindent\kern-\leftmargin \textbf{#1}}

   722 \def\optrue#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\enskip \defl\textit{true}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}

   723 \def\opfalse#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\enskip \defl\textit{false}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}

   724 \def\opsmart#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\enskip \defl\textit{smart}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}

   725 \def\opsmartx#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\enskip \defl\textit{smart}\defr\\\hbox{}\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}

   726 \def\opnodefault#1#2{\flushitem{\textit{#1} = \qtybf{#2}} \nopagebreak\\[\parskip]}

   727 \def\opnodefaultbrk#1#2{\flushitem{$\bigl[$\textit{#1} =$\bigr]$ \qtybf{#2}} \nopagebreak\\[\parskip]}

   728 \def\opdefault#1#2#3{\flushitem{\textit{#1} = \qtybf{#2}\enskip \defl\textit{#3}\defr} \nopagebreak\\[\parskip]}

   729 \def\oparg#1#2#3{\flushitem{\textit{#1} \qtybf{#2} = \qtybf{#3}} \nopagebreak\\[\parskip]}

   730 \def\opargbool#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}

   731 \def\opargboolorsmart#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}

   733 Sledgehammer's options are categorized as follows:\ mode of operation

   734 (\S\ref{mode-of-operation}), problem encoding (\S\ref{problem-encoding}),

   735 relevance filter (\S\ref{relevance-filter}), output format

   736 (\S\ref{output-format}), authentication (\S\ref{authentication}), and timeouts

   737 (\S\ref{timeouts}).

   739 The descriptions below refer to the following syntactic quantities:

   741 \begin{enum}

   742 \item[\labelitemi] \qtybf{string}: A string.

   743 \item[\labelitemi] \qtybf{bool\/}: \textit{true} or \textit{false}.

   744 \item[\labelitemi] \qtybf{smart\_bool\/}: \textit{true}, \textit{false}, or

   745 \textit{smart}.

   746 \item[\labelitemi] \qtybf{int\/}: An integer.

   747 %\item[\labelitemi] \qtybf{float\/}: A floating-point number (e.g., 2.5).

   748 \item[\labelitemi] \qtybf{float\_pair\/}: A pair of floating-point numbers

   749 (e.g., 0.6 0.95).

   750 \item[\labelitemi] \qtybf{smart\_int\/}: An integer or \textit{smart}.

   751 \item[\labelitemi] \qtybf{float\_or\_none\/}: A floating-point number (e.g., 60 or

   752 0.5) expressing a number of seconds, or the keyword \textit{none} ($\infty$

   753 seconds).

   754 \end{enum}

   756 Default values are indicated in curly brackets (\textrm{\{\}}). Boolean options

   757 have a negated counterpart (e.g., \textit{blocking} vs.\

   758 \textit{non\_blocking}). When setting them, ``= \textit{true\/}'' may be omitted.

   760 \subsection{Mode of Operation}

   761 \label{mode-of-operation}

   763 \begin{enum}

   764 \opnodefaultbrk{provers}{string}

   765 Specifies the automatic provers to use as a space-separated list (e.g.,

   766 ``\textit{e}~\textit{spass}~\textit{remote\_vampire\/}'').

   767 Provers can be run locally or remotely; see \S\ref{installation} for

   768 installation instructions.

   770 The following local provers are supported:

   772 \begin{enum}

   773 \item[\labelitemi] \textbf{\textit{cvc3}:} CVC3 is an SMT solver developed by

   774 Clark Barrett, Cesare Tinelli, and their colleagues \cite{cvc3}. To use CVC3,

   775 set the environment variable \texttt{CVC3\_SOLVER} to the complete path of the

   776 executable, including the file name, or install the prebuilt CVC3 package from

   777 \download. Sledgehammer has been tested with version 2.2.

   779 \item[\labelitemi] \textbf{\textit{e}:} E is a first-order resolution prover

   780 developed by Stephan Schulz \cite{schulz-2002}. To use E, set the environment

   781 variable \texttt{E\_HOME} to the directory that contains the \texttt{eproof}

   782 executable, or install the prebuilt E package from \download. Sledgehammer has

   783 been tested with versions 1.0 to 1.4.

   785 \item[\labelitemi] \textbf{\textit{leo2}:} LEO-II is an automatic

   786 higher-order prover developed by Christoph Benzm\"uller et al.\ \cite{leo2},

   787 with support for the TPTP typed higher-order syntax (THF0). To use LEO-II, set

   788 the environment variable \texttt{LEO2\_HOME} to the directory that contains the

   789 \texttt{leo} executable. Sledgehammer requires version 1.2.9 or above.

   791 \item[\labelitemi] \textbf{\textit{metis}:} Although it is much less powerful than

   792 the external provers, Metis itself can be used for proof search.

   794 \item[\labelitemi] \textbf{\textit{satallax}:} Satallax is an automatic

   795 higher-order prover developed by Chad Brown et al.\ \cite{satallax}, with

   796 support for the TPTP typed higher-order syntax (THF0). To use Satallax, set the

   797 environment variable \texttt{SATALLAX\_HOME} to the directory that contains the

   798 \texttt{satallax} executable. Sledgehammer requires version 2.2 or above.

   800 \item[\labelitemi] \textbf{\textit{smt}:} The \textit{smt} proof method with the

   801 current settings (usually:\ Z3 with proof reconstruction).

   803 \item[\labelitemi] \textbf{\textit{spass}:} SPASS is a first-order resolution

   804 prover developed by Christoph Weidenbach et al.\ \cite{weidenbach-et-al-2009}.

   805 To use SPASS, set the environment variable \texttt{SPASS\_HOME} to the directory

   806 that contains the \texttt{SPASS} executable, or install the prebuilt SPASS

   807 package from \download. Sledgehammer requires version 3.5 or above.

   809 \item[\labelitemi] \textbf{\textit{vampire}:} Vampire is a first-order resolution

   810 prover developed by Andrei Voronkov and his colleagues

   811 \cite{riazanov-voronkov-2002}. To use Vampire, set the environment variable

   812 \texttt{VAMPIRE\_HOME} to the directory that contains the \texttt{vampire}

   813 executable and \texttt{VAMPIRE\_VERSION} to the version number (e.g., ``1.8'').

   814 Sledgehammer has been tested with versions 0.6, 1.0, and 1.8.

   815 %Vampire 1.8 supports the TPTP typed first-order format (TFF0).

   817 \item[\labelitemi] \textbf{\textit{yices}:} Yices is an SMT solver developed at

   818 SRI \cite{yices}. To use Yices, set the environment variable

   819 \texttt{YICES\_SOLVER} to the complete path of the executable, including the

   820 file name. Sledgehammer has been tested with version 1.0.28.

   822 \item[\labelitemi] \textbf{\textit{z3}:} Z3 is an SMT solver developed at

   823 Microsoft Research \cite{z3}. To use Z3, set the environment variable

   824 \texttt{Z3\_SOLVER} to the complete path of the executable, including the file

   825 name, and set \texttt{Z3\_NON\_COMMERCIAL} to ``yes'' to confirm that you are a

   826 noncommercial user. Sledgehammer has been tested with versions 3.0 to 3.2.

   828 \item[\labelitemi] \textbf{\textit{z3\_tptp}:} This version of Z3 pretends to be

   829 an ATP, exploiting Z3's support for the TPTP untyped and typed first-order

   830 formats (FOF and TFF0). It is included for experimental purposes. It

   831 requires version 3.0 or above. To use it, set the environment variable

   832 \texttt{Z3\_HOME} to the directory that contains the \texttt{z3}

   833 executable.

   834 \end{enum}

   836 The following remote provers are supported:

   838 \begin{enum}

   839 \item[\labelitemi] \textbf{\textit{remote\_cvc3}:} The remote version of CVC3 runs

   840 on servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to

   841 point).

   843 \item[\labelitemi] \textbf{\textit{remote\_e}:} The remote version of E runs

   844 on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.

   846 \item[\labelitemi] \textbf{\textit{remote\_e\_sine}:} E-SInE is a metaprover

   847 developed by Kry\v stof Hoder \cite{sine} based on E. The remote version of

   848 SInE runs on Geoff Sutcliffe's Miami servers.

   850 \item[\labelitemi] \textbf{\textit{remote\_e\_tofof}:} E-ToFoF is a metaprover

   851 developed by Geoff Sutcliffe \cite{tofof} based on E running on his Miami

   852 servers. This ATP supports the TPTP typed first-order format (TFF0). The

   853 remote version of E-ToFoF runs on Geoff Sutcliffe's Miami servers.

   855 \item[\labelitemi] \textbf{\textit{remote\_iprover}:} iProver is a pure

   856 instantiation-based prover developed by Konstantin Korovin \cite{korovin-2009}. The

   857 remote version of iProver runs on Geoff Sutcliffe's Miami servers

   858 \cite{sutcliffe-2000}.

   860 \item[\labelitemi] \textbf{\textit{remote\_iprover\_eq}:} iProver-Eq is an

   861 instantiation-based prover with native support for equality developed by

   862 Konstantin Korovin and Christoph Sticksel \cite{korovin-sticksel-2010}. The

   863 remote version of iProver-Eq runs on Geoff Sutcliffe's Miami servers

   864 \cite{sutcliffe-2000}.

   866 The remote version of LEO-II

   867 runs on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.

   869 \item[\labelitemi] \textbf{\textit{remote\_leo2}:} The remote version of LEO-II

   870 runs on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.

   872 \item[\labelitemi] \textbf{\textit{remote\_satallax}:} The remote version of

   873 Satallax runs on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.

   875 \item[\labelitemi] \textbf{\textit{remote\_snark}:} SNARK is a first-order

   876 resolution prover developed by Stickel et al.\ \cite{snark}. It supports the

   877 TPTP typed first-order format (TFF0). The remote version of SNARK runs on

   878 Geoff Sutcliffe's Miami servers.

   880 \item[\labelitemi] \textbf{\textit{remote\_vampire}:} The remote version of

   881 Vampire runs on Geoff Sutcliffe's Miami servers. Version 1.8 is used.

   883 \item[\labelitemi] \textbf{\textit{remote\_waldmeister}:} Waldmeister is a unit

   884 equality prover developed by Hillenbrand et al.\ \cite{waldmeister}. It can be

   885 used to prove universally quantified equations using unconditional equations,

   886 corresponding to the TPTP CNF UEQ division. The remote version of Waldmeister

   887 runs on Geoff Sutcliffe's Miami servers.

   889 \item[\labelitemi] \textbf{\textit{remote\_z3}:} The remote version of Z3 runs on

   890 servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to

   891 point).

   893 \item[\labelitemi] \textbf{\textit{remote\_z3\_tptp}:} The remote version of ``Z3

   894 with TPTP syntax'' runs on Geoff Sutcliffe's Miami servers.

   895 \end{enum}

   897 By default, Sledgehammer runs E, E-SInE, SPASS, Vampire, Z3 (or whatever

   898 the SMT module's \textit{smt\_solver} configuration option is set to), and (if

   899 appropriate) Waldmeister in parallel---either locally or remotely, depending on

   900 the number of processor cores available. For historical reasons, the default

   901 value of this option can be overridden using the option ``Sledgehammer:

   902 Provers'' in Proof General's ``Isabelle'' menu.

   904 It is generally a good idea to run several provers in parallel. Running E,

   905 SPASS, and Vampire for 5~seconds yields a similar success rate to running the

   906 most effective of these for 120~seconds \cite{boehme-nipkow-2010}.

   908 For the \textit{min} subcommand, the default prover is \textit{metis}. If

   909 several provers are set, the first one is used.

   911 \opnodefault{prover}{string}

   912 Alias for \textit{provers}.

   914 %\opnodefault{atps}{string}

   915 %Legacy alias for \textit{provers}.

   917 %\opnodefault{atp}{string}

   918 %Legacy alias for \textit{provers}.

   920 \opfalse{blocking}{non\_blocking}

   921 Specifies whether the \textbf{sledgehammer} command should operate

   922 synchronously. The asynchronous (non-blocking) mode lets the user start proving

   923 the putative theorem manually while Sledgehammer looks for a proof, but it can

   924 also be more confusing. Irrespective of the value of this option, Sledgehammer

   925 is always run synchronously for the new jEdit-based user interface or if

   926 \textit{debug} (\S\ref{output-format}) is enabled.

   928 \optrue{slice}{dont\_slice}

   929 Specifies whether the time allocated to a prover should be sliced into several

   930 segments, each of which has its own set of possibly prover-dependent options.

   931 For SPASS and Vampire, the first slice tries the fast but incomplete

   932 set-of-support (SOS) strategy, whereas the second slice runs without it. For E,

   933 up to three slices are tried, with different weighted search strategies and

   934 number of facts. For SMT solvers, several slices are tried with the same options

   935 each time but fewer and fewer facts. According to benchmarks with a timeout of

   936 30 seconds, slicing is a valuable optimization, and you should probably leave it

   937 enabled unless you are conducting experiments. This option is implicitly

   938 disabled for (short) automatic runs.

   940 \nopagebreak

   941 {\small See also \textit{verbose} (\S\ref{output-format}).}

   943 \opsmart{minimize}{dont\_minimize}

   944 Specifies whether the minimization tool should be invoked automatically after

   945 proof search. By default, automatic minimization takes place only if

   946 it can be done in a reasonable amount of time (as determined by

   947 the number of facts in the original proof and the time it took to find or

   948 preplay it) or the proof involves an unreasonably large number of facts.

   950 \nopagebreak

   951 {\small See also \textit{preplay\_timeout} (\S\ref{timeouts}).}

   953 \opfalse{overlord}{no\_overlord}

   954 Specifies whether Sledgehammer should put its temporary files in

   955 \texttt{\$ISA\-BELLE\_\allowbreak HOME\_\allowbreak USER}, which is useful for

   956 debugging Sledgehammer but also unsafe if several instances of the tool are run

   957 simultaneously. The files are identified by the prefix \texttt{prob\_}; you may

   958 safely remove them after Sledgehammer has run.

   960 \nopagebreak

   961 {\small See also \textit{debug} (\S\ref{output-format}).}

   962 \end{enum}

   964 \subsection{Problem Encoding}

   965 \label{problem-encoding}

   967 \newcommand\comb[1]{\const{#1}}

   969 \begin{enum}

   970 \opdefault{lam\_trans}{string}{smart}

   971 Specifies the $\lambda$ translation scheme to use in ATP problems. The supported

   972 translation schemes are listed below:

   974 \begin{enum}

   975 \item[\labelitemi] \textbf{\textit{hide\_lams}:} Hide the $\lambda$-abstractions

   976 by replacing them by unspecified fresh constants, effectively disabling all

   977 reasoning under $\lambda$-abstractions.

   979 \item[\labelitemi] \textbf{\textit{lifting}:} Introduce a new

   980 supercombinator \const{c} for each cluster of $n$~$\lambda$-abstractions,

   981 defined using an equation $\const{c}~x_1~\ldots~x_n = t$ ($\lambda$-lifting).

   983 \item[\labelitemi] \textbf{\textit{combs}:} Rewrite lambdas to the Curry

   984 combinators (\comb{I}, \comb{K}, \comb{S}, \comb{B}, \comb{C}). Combinators

   985 enable the ATPs to synthesize $\lambda$-terms but tend to yield bulkier formulas

   986 than $\lambda$-lifting: The translation is quadratic in the worst case, and the

   987 equational definitions of the combinators are very prolific in the context of

   988 resolution.

   990 \item[\labelitemi] \textbf{\textit{combs\_and\_lifting}:} Introduce a new

   991 supercombinator \const{c} for each cluster of $\lambda$-abstractions and characterize it both using a

   992 lifted equation $\const{c}~x_1~\ldots~x_n = t$ and via Curry combinators.

   994 \item[\labelitemi] \textbf{\textit{combs\_or\_lifting}:} For each cluster of

   995 $\lambda$-abstractions, heuristically choose between $\lambda$-lifting and Curry

   996 combinators.

   998 \item[\labelitemi] \textbf{\textit{keep\_lams}:}

   999 Keep the $\lambda$-abstractions in the generated problems. This is available

  1000 only with provers that support the THF0 syntax.

  1002 \item[\labelitemi] \textbf{\textit{smart}:} The actual translation scheme used

  1003 depends on the ATP and should be the most efficient scheme for that ATP.

  1004 \end{enum}

  1006 For SMT solvers, the $\lambda$ translation scheme is always \textit{lifting},

  1007 irrespective of the value of this option.

  1009 \opsmartx{uncurried\_aliases}{no\_uncurried\_aliases}

  1010 Specifies the $\lambda$ translation scheme to use in ATP problems. The supported

  1011 translation schemes are listed below:

  1013 \opdefault{type\_enc}{string}{smart}

  1014 Specifies the type encoding to use in ATP problems. Some of the type encodings

  1015 are unsound, meaning that they can give rise to spurious proofs

  1016 (unreconstructible using \textit{metis}). The supported type encodings are

  1017 listed below, with an indication of their soundness in parentheses.

  1018 An asterisk (*) means that the encoding is slightly incomplete for

  1019 reconstruction with \textit{metis}, unless the \emph{strict} option (described

  1020 below) is enabled.

  1022 \begin{enum}

  1023 \item[\labelitemi] \textbf{\textit{erased} (very unsound):} No type information is

  1024 supplied to the ATP, not even to resolve overloading. Types are simply erased.

  1026 \item[\labelitemi] \textbf{\textit{poly\_guards} (sound):} Types are encoded using

  1027 a predicate \const{g}$(\tau, t)$ that guards bound

  1028 variables. Constants are annotated with their types, supplied as additional

  1029 arguments, to resolve overloading.

  1031 \item[\labelitemi] \textbf{\textit{poly\_tags} (sound):} Each term and subterm is

  1032 tagged with its type using a function $\const{t\/}(\tau, t)$.

  1034 \item[\labelitemi] \textbf{\textit{poly\_args} (unsound):}

  1035 Like for \textit{poly\_guards} constants are annotated with their types to

  1036 resolve overloading, but otherwise no type information is encoded. This

  1037 coincides with the default encoding used by the \textit{metis} command.

  1039 \item[\labelitemi]

  1040 \textbf{%

  1041 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags} (sound); \\

  1042 \textit{raw\_mono\_args} (unsound):} \\

  1043 Similar to \textit{poly\_guards}, \textit{poly\_tags}, and \textit{poly\_args},

  1044 respectively, but the problem is additionally monomorphized, meaning that type

  1045 variables are instantiated with heuristically chosen ground types.

  1046 Monomorphization can simplify reasoning but also leads to larger fact bases,

  1047 which can slow down the ATPs.

  1049 \item[\labelitemi]

  1050 \textbf{%

  1051 \textit{mono\_guards}, \textit{mono\_tags} (sound);

  1052 \textit{mono\_args} (unsound):} \\

  1053 Similar to

  1054 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, and

  1055 \textit{raw\_mono\_args}, respectively but types are mangled in constant names

  1056 instead of being supplied as ground term arguments. The binary predicate

  1057 $\const{g}(\tau, t)$ becomes a unary predicate

  1058 $\const{g\_}\tau(t)$, and the binary function

  1059 $\const{t}(\tau, t)$ becomes a unary function

  1060 $\const{t\_}\tau(t)$.

  1062 \item[\labelitemi] \textbf{\textit{mono\_simple} (sound):} Exploits simple

  1063 first-order types if the prover supports the TFF0 or THF0 syntax; otherwise,

  1064 falls back on \textit{mono\_guards}. The problem is monomorphized.

  1066 \item[\labelitemi] \textbf{\textit{mono\_simple\_higher} (sound):} Exploits simple

  1067 higher-order types if the prover supports the THF0 syntax; otherwise, falls back

  1068 on \textit{mono\_simple} or \textit{mono\_guards}. The problem is monomorphized.

  1070 \item[\labelitemi]

  1071 \textbf{%

  1072 \textit{poly\_guards}?, \textit{poly\_tags}?, \textit{raw\_mono\_guards}?, \\

  1073 \textit{raw\_mono\_tags}?, \textit{mono\_guards}?, \textit{mono\_tags}?, \\

  1074 \textit{mono\_simple}? (sound*):} \\

  1075 The type encodings \textit{poly\_guards}, \textit{poly\_tags},

  1076 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, \textit{mono\_guards},

  1077 \textit{mono\_tags}, and \textit{mono\_simple} are fully

  1078 typed and sound. For each of these, Sledgehammer also provides a lighter,

  1079 virtually sound variant identified by a question mark (`\hbox{?}')\ that detects

  1080 and erases monotonic types, notably infinite types. (For \textit{mono\_simple},

  1081 the types are not actually erased but rather replaced by a shared uniform type

  1082 of individuals.) As argument to the \textit{metis} proof method, the question

  1083 mark is replaced by a \hbox{``\textit{\_query\/}''} suffix.

  1085 \item[\labelitemi]

  1086 \textbf{%

  1087 \textit{poly\_guards}??, \textit{poly\_tags}??, \textit{raw\_mono\_guards}??, \\

  1088 \textit{raw\_mono\_tags}??, \textit{mono\_guards}??, \textit{mono\_tags}?? \\

  1089 (sound*):} \\

  1090 Even lighter versions of the `\hbox{?}' encodings. As argument to the

  1091 \textit{metis} proof method, the `\hbox{??}' suffix is replaced by

  1092 \hbox{``\textit{\_query\_query\/}''}.

  1094 \item[\labelitemi]

  1095 \textbf{%

  1096 \textit{poly\_guards}@?, \textit{raw\_mono\_guards}@? (sound*):} \\

  1097 Alternative versions of the `\hbox{??}' encodings. As argument to the

  1098 \textit{metis} proof method, the `\hbox{@?}' suffix is replaced by

  1099 \hbox{``\textit{\_at\_query\/}''}.

  1101 \item[\labelitemi]

  1102 \textbf{%

  1103 \textit{poly\_guards}!, \textit{poly\_tags}!, \textit{raw\_mono\_guards}!, \\

  1104 \textit{raw\_mono\_tags}!, \textit{mono\_guards}!, \textit{mono\_tags}!, \\

  1105 \textit{mono\_simple}!, \textit{mono\_simple\_higher}! (mildly unsound):} \\

  1106 The type encodings \textit{poly\_guards}, \textit{poly\_tags},

  1107 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, \textit{mono\_guards},

  1108 \textit{mono\_tags}, \textit{mono\_simple}, and \textit{mono\_simple\_higher}

  1109 also admit a mildly unsound (but very efficient) variant identified by an

  1110 exclamation mark (`\hbox{!}') that detects and erases erases all types except

  1111 those that are clearly finite (e.g., \textit{bool}). (For \textit{mono\_simple}

  1112 and \textit{mono\_simple\_higher}, the types are not actually erased but rather

  1113 replaced by a shared uniform type of individuals.) As argument to the

  1114 \textit{metis} proof method, the exclamation mark is replaced by the suffix

  1115 \hbox{``\textit{\_bang\/}''}.

  1117 \item[\labelitemi]

  1118 \textbf{%

  1119 \textit{poly\_guards}!!, \textit{poly\_tags}!!, \textit{raw\_mono\_guards}!!, \\

  1120 \textit{raw\_mono\_tags}!!, \textit{mono\_guards}!!, \textit{mono\_tags}!! \\

  1121 (mildly unsound):} \\

  1122 Even lighter versions of the `\hbox{!}' encodings. As argument to the

  1123 \textit{metis} proof method, the `\hbox{!!}' suffix is replaced by

  1124 \hbox{``\textit{\_bang\_bang\/}''}.

  1126 \item[\labelitemi]

  1127 \textbf{%

  1128 \textit{poly\_guards}@!, \textit{raw\_mono\_guards}@! (mildly unsound):} \\

  1129 Alternative versions of the `\hbox{!!}' encodings. As argument to the

  1130 \textit{metis} proof method, the `\hbox{@!}' suffix is replaced by

  1131 \hbox{``\textit{\_at\_bang\/}''}.

  1133 \item[\labelitemi] \textbf{\textit{smart}:} The actual encoding used depends on

  1134 the ATP and should be the most efficient virtually sound encoding for that ATP.

  1135 \end{enum}

  1137 For SMT solvers, the type encoding is always \textit{mono\_simple}, irrespective

  1138 of the value of this option.

  1140 \nopagebreak

  1141 {\small See also \textit{max\_new\_mono\_instances} (\S\ref{relevance-filter})

  1142 and \textit{max\_mono\_iters} (\S\ref{relevance-filter}).}

  1144 \opfalse{strict}{non\_strict}

  1145 Specifies whether Sledgehammer should run in its strict mode. In that mode,

  1146 sound type encodings marked with an asterisk (*) above are made complete

  1147 for reconstruction with \textit{metis}, at the cost of some clutter in the

  1148 generated problems. This option has no effect if \textit{type\_enc} is

  1149 deliberately set to an unsound encoding.

  1150 \end{enum}

  1152 \subsection{Relevance Filter}

  1153 \label{relevance-filter}

  1155 \begin{enum}

  1156 \opdefault{relevance\_thresholds}{float\_pair}{\upshape 0.45~0.85}

  1157 Specifies the thresholds above which facts are considered relevant by the

  1158 relevance filter. The first threshold is used for the first iteration of the

  1159 relevance filter and the second threshold is used for the last iteration (if it

  1160 is reached). The effective threshold is quadratically interpolated for the other

  1161 iterations. Each threshold ranges from 0 to 1, where 0 means that all theorems

  1162 are relevant and 1 only theorems that refer to previously seen constants.

  1164 \opdefault{max\_relevant}{smart\_int}{smart}

  1165 Specifies the maximum number of facts that may be returned by the relevance

  1166 filter. If the option is set to \textit{smart}, it is set to a value that was

  1167 empirically found to be appropriate for the prover. A typical value would be

  1168 250.

  1170 \opdefault{max\_new\_mono\_instances}{int}{\upshape 200}

  1171 Specifies the maximum number of monomorphic instances to generate beyond

  1172 \textit{max\_relevant}. The higher this limit is, the more monomorphic instances

  1173 are potentially generated. Whether monomorphization takes place depends on the

  1174 type encoding used.

  1176 \nopagebreak

  1177 {\small See also \textit{type\_enc} (\S\ref{problem-encoding}).}

  1179 \opdefault{max\_mono\_iters}{int}{\upshape 3}

  1180 Specifies the maximum number of iterations for the monomorphization fixpoint

  1181 construction. The higher this limit is, the more monomorphic instances are

  1182 potentially generated. Whether monomorphization takes place depends on the

  1183 type encoding used.

  1185 \nopagebreak

  1186 {\small See also \textit{type\_enc} (\S\ref{problem-encoding}).}

  1187 \end{enum}

  1189 \subsection{Output Format}

  1190 \label{output-format}

  1192 \begin{enum}

  1194 \opfalse{verbose}{quiet}

  1195 Specifies whether the \textbf{sledgehammer} command should explain what it does.

  1196 This option is implicitly disabled for automatic runs.

  1198 \opfalse{debug}{no\_debug}

  1199 Specifies whether Sledgehammer should display additional debugging information

  1200 beyond what \textit{verbose} already displays. Enabling \textit{debug} also

  1201 enables \textit{verbose} and \textit{blocking} (\S\ref{mode-of-operation})

  1202 behind the scenes. The \textit{debug} option is implicitly disabled for

  1203 automatic runs.

  1205 \nopagebreak

  1206 {\small See also \textit{overlord} (\S\ref{mode-of-operation}).}

  1208 \opfalse{isar\_proof}{no\_isar\_proof}

  1209 Specifies whether Isar proofs should be output in addition to one-liner

  1210 \textit{metis} proofs. Isar proof construction is still experimental and often

  1211 fails; however, they are usually faster and sometimes more robust than

  1212 \textit{metis} proofs.

  1214 \opdefault{isar\_shrink\_factor}{int}{\upshape 1}

  1215 Specifies the granularity of the Isar proof. A value of $n$ indicates that each

  1216 Isar proof step should correspond to a group of up to $n$ consecutive proof

  1217 steps in the ATP proof.

  1218 \end{enum}

  1220 \subsection{Authentication}

  1221 \label{authentication}

  1223 \begin{enum}

  1224 \opnodefault{expect}{string}

  1225 Specifies the expected outcome, which must be one of the following:

  1227 \begin{enum}

  1228 \item[\labelitemi] \textbf{\textit{some}:} Sledgehammer found a proof.

  1229 \item[\labelitemi] \textbf{\textit{none}:} Sledgehammer found no proof.

  1230 \item[\labelitemi] \textbf{\textit{timeout}:} Sledgehammer timed out.

  1231 \item[\labelitemi] \textbf{\textit{unknown}:} Sledgehammer encountered some

  1232 problem.

  1233 \end{enum}

  1235 Sledgehammer emits an error (if \textit{blocking} is enabled) or a warning

  1236 (otherwise) if the actual outcome differs from the expected outcome. This option

  1237 is useful for regression testing.

  1239 \nopagebreak

  1240 {\small See also \textit{blocking} (\S\ref{mode-of-operation}) and

  1241 \textit{timeout} (\S\ref{timeouts}).}

  1242 \end{enum}

  1244 \subsection{Timeouts}

  1245 \label{timeouts}

  1247 \begin{enum}

  1248 \opdefault{timeout}{float\_or\_none}{\upshape 30}

  1249 Specifies the maximum number of seconds that the automatic provers should spend

  1250 searching for a proof. This excludes problem preparation and is a soft limit.

  1251 For historical reasons, the default value of this option can be overridden using

  1252 the option ``Sledgehammer: Time Limit'' in Proof General's ``Isabelle'' menu.

  1254 \opdefault{preplay\_timeout}{float\_or\_none}{\upshape 3}

  1255 Specifies the maximum number of seconds that \textit{metis} or \textit{smt}

  1256 should spend trying to ``preplay'' the found proof. If this option is set to 0,

  1257 no preplaying takes place, and no timing information is displayed next to the

  1258 suggested \textit{metis} calls.

  1260 \nopagebreak

  1261 {\small See also \textit{minimize} (\S\ref{mode-of-operation}).}

  1262 \end{enum}

  1264 \let\em=\sl

  1265 \bibliography{../manual}{}

  1266 \bibliographystyle{abbrv}

  1268 \end{document}

author	blanchet
	Sat, 04 Feb 2012 12:08:18 +0100
changeset 47237	d4754183ccce
parent 47194	2ded91a6cbd5
child 47239	cafa325419f6
permissions	-rw-r--r--