150 %suggesting several textual improvements. |
150 %suggesting several textual improvements. |
151 |
151 |
152 \section{Installation} |
152 \section{Installation} |
153 \label{installation} |
153 \label{installation} |
154 |
154 |
155 Sledgehammer is part of Isabelle, so you don't need to install it. However, it |
155 Sledgehammer is part of Isabelle, so you do not need to install it. However, it |
156 relies on third-party automatic provers (ATPs and SMT solvers). |
156 relies on third-party automatic provers (ATPs and SMT solvers). |
157 |
157 |
158 Among the ATPs, E, LEO-II, Satallax, SPASS, and Vampire can be run locally; in |
158 Among the ATPs, E, LEO-II, Satallax, SPASS, and Vampire can be run locally; in |
159 addition, E, E-SInE, E-ToFoF, iProver, iProver-Eq, LEO-II, Satallax, SNARK, |
159 addition, E, E-SInE, E-ToFoF, iProver, iProver-Eq, LEO-II, Satallax, SNARK, |
160 Vampire, and Waldmeister are available remotely via System\-On\-TPTP |
160 Vampire, and Waldmeister are available remotely via System\-On\-TPTP |
299 Depending on which provers are installed and how many processor cores are |
299 Depending on which provers are installed and how many processor cores are |
300 available, some of the provers might be missing or present with a |
300 available, some of the provers might be missing or present with a |
301 \textit{remote\_} prefix. Waldmeister is run only for unit equational problems, |
301 \textit{remote\_} prefix. Waldmeister is run only for unit equational problems, |
302 where the goal's conclusion is a (universally quantified) equation. |
302 where the goal's conclusion is a (universally quantified) equation. |
303 |
303 |
304 For each successful prover, Sledgehammer gives a one-liner proof that uses |
304 For each successful prover, Sledgehammer gives a one-liner \textit{metis} or |
305 the \textit{metis} or \textit{smt} proof method. Approximate timings are shown |
305 \textit{smt} method call. Rough timings are shown in parentheses, indicating how |
306 in parentheses, indicating how fast the call is. You can click the proof to |
306 fast the call is. You can click the proof to insert it into the theory text. |
307 insert it into the theory text. |
|
308 |
307 |
309 In addition, you can ask Sledgehammer for an Isar text proof by passing the |
308 In addition, you can ask Sledgehammer for an Isar text proof by passing the |
310 \textit{isar\_proof} option (\S\ref{output-format}): |
309 \textit{isar\_proof} option (\S\ref{output-format}): |
311 |
310 |
312 \prew |
311 \prew |
384 |
383 |
385 \section{Frequently Asked Questions} |
384 \section{Frequently Asked Questions} |
386 \label{frequently-asked-questions} |
385 \label{frequently-asked-questions} |
387 |
386 |
388 This sections answers frequently (and infrequently) asked questions about |
387 This sections answers frequently (and infrequently) asked questions about |
389 Sledgehammer. It is a good idea to skim over it now even if you don't have any |
388 Sledgehammer. It is a good idea to skim over it now even if you do not have any |
390 questions at this stage. And if you have any further questions not listed here, |
389 questions at this stage. And if you have any further questions not listed here, |
391 send them to the author at \authoremail. |
390 send them to the author at \authoremail. |
392 |
391 |
393 \point{Which facts are passed to the automatic provers?} |
392 \point{Which facts are passed to the automatic provers?} |
394 |
393 |
395 The relevance filter assigns a score to every available fact (lemma, theorem, |
394 Sledgehammer heuristically selects a few hundred relevant lemmas from the |
396 definition, or axiom) based upon how many constants that fact shares with the |
395 currently loaded libraries. The component that performs this selection is |
397 conjecture. This process iterates to include facts relevant to those just |
396 called \emph{relevance filter}. |
398 accepted, but with a decay factor to ensure termination. The constants are |
397 |
399 weighted to give unusual ones greater significance. The relevance filter copes |
398 \begin{enum} |
400 best when the conjecture contains some unusual constants; if all the constants |
399 \item[\labelitemi] |
401 are common, it is unable to discriminate among the hundreds of facts that are |
400 The traditional relevance filter, called \emph{MePo}, assigns a score to every |
402 picked up. The relevance filter is also memoryless: It has no information about |
401 available fact (lemma, theorem, definition, or axiom) based upon how many |
403 how many times a particular fact has been used in a proof, and it cannot learn. |
402 constants that fact shares with the conjecture. This process iterates to include |
|
403 facts relevant to those just accepted. The constants are weighted to give |
|
404 unusual ones greater significance. MePo copes best when the conjecture contains |
|
405 some unusual constants; if all the constants are common, it is unable to |
|
406 discriminate among the hundreds of facts that are picked up. The filter is also |
|
407 memoryless: It has no information about how many times a particular fact has |
|
408 been used in a proof, and it cannot learn. |
|
409 |
|
410 \item[\labelitemi] |
|
411 An experimental, memoryful alternative to MePo is \emph{MaSh} |
|
412 (\underline{Ma}chine Learner for \underline{S}ledge\underline{h}ammer). It |
|
413 relies on an external tool called \texttt{mash} that applies machine learning to |
|
414 the problem of finding relevant facts. |
|
415 |
|
416 \item[\labelitemi] The \emph{Mesh} filter combines MePo and MaSh. |
|
417 \end{enum} |
|
418 |
|
419 The default is either MePo or Mesh, depending on whether \texttt{mash} is |
|
420 installed and what class of provers the target prover belongs to |
|
421 (\S\ref{relevance-filter}). |
404 |
422 |
405 The number of facts included in a problem varies from prover to prover, since |
423 The number of facts included in a problem varies from prover to prover, since |
406 some provers get overwhelmed more easily than others. You can show the number of |
424 some provers get overwhelmed more easily than others. You can show the number of |
407 facts given using the \textit{verbose} option (\S\ref{output-format}) and the |
425 facts given using the \textit{verbose} option (\S\ref{output-format}) and the |
408 actual facts using \textit{debug} (\S\ref{output-format}). |
426 actual facts using \textit{debug} (\S\ref{output-format}). |
597 \postw |
615 \postw |
598 |
616 |
599 \point{Auto can solve it---why not Sledgehammer?} |
617 \point{Auto can solve it---why not Sledgehammer?} |
600 |
618 |
601 Problems can be easy for \textit{auto} and difficult for automatic provers, but |
619 Problems can be easy for \textit{auto} and difficult for automatic provers, but |
602 the reverse is also true, so don't be discouraged if your first attempts fail. |
620 the reverse is also true, so do not be discouraged if your first attempts fail. |
603 Because the system refers to all theorems known to Isabelle, it is particularly |
621 Because the system refers to all theorems known to Isabelle, it is particularly |
604 suitable when your goal has a short proof from lemmas that you don't know about. |
622 suitable when your goal has a short proof from lemmas that you do not know |
|
623 about. |
605 |
624 |
606 \point{Why are there so many options?} |
625 \point{Why are there so many options?} |
607 |
626 |
608 Sledgehammer's philosophy should work out of the box, without user guidance. |
627 Sledgehammer's philosophy should work out of the box, without user guidance. |
609 Many of the options are meant to be used mostly by the Sledgehammer developers |
628 Many of the options are meant to be used mostly by the Sledgehammer developers |
652 currently running automatic provers, including elapsed runtime and remaining |
671 currently running automatic provers, including elapsed runtime and remaining |
653 time until timeout. |
672 time until timeout. |
654 |
673 |
655 \item[\labelitemi] \textbf{\textit{kill\_provers}:} Terminates all running |
674 \item[\labelitemi] \textbf{\textit{kill\_provers}:} Terminates all running |
656 automatic provers. |
675 automatic provers. |
|
676 |
|
677 \item[\labelitemi] \textbf{\textit{unlearn}:} Resets the MaSh machine learner, |
|
678 erasing any persistent state. |
|
679 |
|
680 \item[\labelitemi] \textbf{\textit{learn}:} Invokes the MaSh machine learner on |
|
681 the current theory to process all the available facts. This happens |
|
682 automatically at Sledgehammer invocations if the \textit{learn} option |
|
683 (\S\ref{relevance-filter}) is enabled. |
|
684 |
|
685 \item[\labelitemi] \textbf{\textit{relearn}:} Same as \textit{unlearn} followed |
|
686 by \textit{learn}. |
|
687 |
|
688 \item[\labelitemi] \textbf{\textit{running\_learners}:} Prints information about |
|
689 currently running machine learners, including elapsed runtime and remaining |
|
690 time until timeout. |
|
691 |
|
692 \item[\labelitemi] \textbf{\textit{kill\_learners}:} Terminates all running |
|
693 machine learners. |
657 |
694 |
658 \item[\labelitemi] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote |
695 \item[\labelitemi] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote |
659 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}. |
696 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}. |
660 \end{enum} |
697 \end{enum} |
661 |
698 |
974 simultaneously. The files are identified by the prefix \texttt{prob\_}; you may |
1011 simultaneously. The files are identified by the prefix \texttt{prob\_}; you may |
975 safely remove them after Sledgehammer has run. |
1012 safely remove them after Sledgehammer has run. |
976 |
1013 |
977 \nopagebreak |
1014 \nopagebreak |
978 {\small See also \textit{debug} (\S\ref{output-format}).} |
1015 {\small See also \textit{debug} (\S\ref{output-format}).} |
|
1016 \end{enum} |
|
1017 |
|
1018 \subsection{Relevance Filter} |
|
1019 \label{relevance-filter} |
|
1020 |
|
1021 \begin{enum} |
|
1022 \opdefault{max\_facts}{smart\_int}{smart} |
|
1023 Specifies the maximum number of facts that may be returned by the relevance |
|
1024 filter. If the option is set to \textit{smart}, it is set to a value that was |
|
1025 empirically found to be appropriate for the prover. Typical values range between |
|
1026 50 and 1000. |
|
1027 |
|
1028 \opdefault{fact\_thresholds}{float\_pair}{\upshape 0.45~0.85} |
|
1029 Specifies the thresholds above which facts are considered relevant by the |
|
1030 relevance filter. The first threshold is used for the first iteration of the |
|
1031 relevance filter and the second threshold is used for the last iteration (if it |
|
1032 is reached). The effective threshold is quadratically interpolated for the other |
|
1033 iterations. Each threshold ranges from 0 to 1, where 0 means that all theorems |
|
1034 are relevant and 1 only theorems that refer to previously seen constants. |
|
1035 |
|
1036 \opdefault{max\_new\_mono\_instances}{int}{smart} |
|
1037 Specifies the maximum number of monomorphic instances to generate beyond |
|
1038 \textit{max\_facts}. The higher this limit is, the more monomorphic instances |
|
1039 are potentially generated. Whether monomorphization takes place depends on the |
|
1040 type encoding used. If the option is set to \textit{smart}, it is set to a value |
|
1041 that was empirically found to be appropriate for the prover. For most provers, |
|
1042 this value is 200. |
|
1043 |
|
1044 \nopagebreak |
|
1045 {\small See also \textit{type\_enc} (\S\ref{problem-encoding}).} |
|
1046 |
|
1047 \opdefault{max\_mono\_iters}{int}{smart} |
|
1048 Specifies the maximum number of iterations for the monomorphization fixpoint |
|
1049 construction. The higher this limit is, the more monomorphic instances are |
|
1050 potentially generated. Whether monomorphization takes place depends on the |
|
1051 type encoding used. If the option is set to \textit{smart}, it is set to a value |
|
1052 that was empirically found to be appropriate for the prover. For most provers, |
|
1053 this value is 3. |
|
1054 |
|
1055 \nopagebreak |
|
1056 {\small See also \textit{type\_enc} (\S\ref{problem-encoding}).} |
979 \end{enum} |
1057 \end{enum} |
980 |
1058 |
981 \subsection{Problem Encoding} |
1059 \subsection{Problem Encoding} |
982 \label{problem-encoding} |
1060 \label{problem-encoding} |
983 |
1061 |
1141 for reconstruction with \textit{metis}, at the cost of some clutter in the |
1219 for reconstruction with \textit{metis}, at the cost of some clutter in the |
1142 generated problems. This option has no effect if \textit{type\_enc} is |
1220 generated problems. This option has no effect if \textit{type\_enc} is |
1143 deliberately set to an unsound encoding. |
1221 deliberately set to an unsound encoding. |
1144 \end{enum} |
1222 \end{enum} |
1145 |
1223 |
1146 \subsection{Relevance Filter} |
|
1147 \label{relevance-filter} |
|
1148 |
|
1149 \begin{enum} |
|
1150 \opdefault{max\_facts}{smart\_int}{smart} |
|
1151 Specifies the maximum number of facts that may be returned by the relevance |
|
1152 filter. If the option is set to \textit{smart}, it is set to a value that was |
|
1153 empirically found to be appropriate for the prover. Typical values range between |
|
1154 50 and 1000. |
|
1155 |
|
1156 \opdefault{fact\_thresholds}{float\_pair}{\upshape 0.45~0.85} |
|
1157 Specifies the thresholds above which facts are considered relevant by the |
|
1158 relevance filter. The first threshold is used for the first iteration of the |
|
1159 relevance filter and the second threshold is used for the last iteration (if it |
|
1160 is reached). The effective threshold is quadratically interpolated for the other |
|
1161 iterations. Each threshold ranges from 0 to 1, where 0 means that all theorems |
|
1162 are relevant and 1 only theorems that refer to previously seen constants. |
|
1163 |
|
1164 \opdefault{max\_new\_mono\_instances}{int}{smart} |
|
1165 Specifies the maximum number of monomorphic instances to generate beyond |
|
1166 \textit{max\_facts}. The higher this limit is, the more monomorphic instances |
|
1167 are potentially generated. Whether monomorphization takes place depends on the |
|
1168 type encoding used. If the option is set to \textit{smart}, it is set to a value |
|
1169 that was empirically found to be appropriate for the prover. For most provers, |
|
1170 this value is 200. |
|
1171 |
|
1172 \nopagebreak |
|
1173 {\small See also \textit{type\_enc} (\S\ref{problem-encoding}).} |
|
1174 |
|
1175 \opdefault{max\_mono\_iters}{int}{smart} |
|
1176 Specifies the maximum number of iterations for the monomorphization fixpoint |
|
1177 construction. The higher this limit is, the more monomorphic instances are |
|
1178 potentially generated. Whether monomorphization takes place depends on the |
|
1179 type encoding used. If the option is set to \textit{smart}, it is set to a value |
|
1180 that was empirically found to be appropriate for the prover. For most provers, |
|
1181 this value is 3. |
|
1182 |
|
1183 \nopagebreak |
|
1184 {\small See also \textit{type\_enc} (\S\ref{problem-encoding}).} |
|
1185 \end{enum} |
|
1186 |
|
1187 \subsection{Output Format} |
1224 \subsection{Output Format} |
1188 \label{output-format} |
1225 \label{output-format} |
1189 |
1226 |
1190 \begin{enum} |
1227 \begin{enum} |
1191 |
1228 |