The SAT2002 solver competition, involving more than 30 solvers and
2300 benchmarks, took place in Cincinnati a decade after the first
SAT solver competition, held in Paderborn in 1991/1992
[BB93]. Two other SAT competitions have been organized since
then: the Second DIMACS Challenge, held in 1992/1993
[JT96], and the Beijing competition, in 1996. In the last
few years, the need for such a competition has become more and more
obvious, reflecting the recent and important progress in the field. A
Many papers have been published on ``heuristic''
algorithms for the NP-complete satisfiability problem
[Coo71], and software now exists that people use on
real-world applications. Many techniques are currently available, and
it is difficult to compare them. Such a comparison can hardly be made
at the theoretical level alone, because theoretical results often say
little from a practical viewpoint. A competition provides an empirical
evaluation of current algorithms (as good as possible) and can thus
be viewed as a snapshot of state-of-the-art solvers at a given
time. The data collected during this competition will probably help
to identify classes of hard instances and solver limitations, and
should lead to appropriate answers in the next few years. Moreover, we
think that such a competition fits into a more general effort toward
the empirical evaluation of algorithms. In many fields of computer
science, we need more and more knowledge about the behavior of the
algorithms we design and about the characteristics of
benchmarks. This competition is a step toward a better - and crucial -
empirical knowledge of SAT algorithms and benchmarks
[Hoo94,Hoo96]. The aim of this paper is to report
what the organizers learned during this competition (about solvers,
benchmarks and the competition itself), and to publish enough data to
allow readers to form their own opinion about the results.
As mentioned above, several competitions were organized in the first half of the last decade to compare solvers. However, one major - and recent - step in this area was the creation of the SAT-Ex web site [SC01], an online collection of results obtained by various SAT solvers on classical benchmarks. Among the advantages of this kind of publication, SAT-Ex makes it possible to check every solver's output, to generate syntheses dynamically, and to add new solvers and benchmarks continually.
More and more researchers would like to see how their solver compares with others on the current benchmark set. This is not really a problem, because only a few CPU days are needed to update the SAT-Ex database with a new solver: usually, the new solver will outperform older ones on some series of benchmarks. A problem arises with the introduction of new benchmarks: they have to be run on every solver, and they are likely to give the solvers a hard time. Since all the results available on SAT-Ex were obtained on the same computer (the only way to provide a fair comparison based on CPU time), it would take ages before results on new benchmarks become available. Several solutions exist to solve that problem:
This last point is one of our major technical choices: using the SAT-Ex architecture for the competition, and providing a SAT-Ex style online publication. In order to shed light on some of the other choices we made during the competition, let us first recall a few facts about SAT. Currently, approaches to solving SAT can be divided into two categories: complete and incomplete ones. A complete solver can prove satisfiability as well as unsatisfiability of a Boolean formula. In contrast, an incomplete solver can only prove that a formula is satisfiable, usually by providing a model of the formula (a certificate of satisfiability).
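To make this distinction concrete, consider a toy formula of our own (it is not one of the competition benchmarks): $(x_1 \lor \neg x_2) \land (x_2 \lor x_3)$ is satisfiable, and the assignment setting $x_1$ and $x_3$ to true and $x_2$ to false is a model; checking it against each clause takes linear time, whereas no comparably concise certificate is known for unsatisfiability in general, which is why incomplete solvers cannot disprove satisfiability.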
Most complete solvers descend from the backtrack search version of
the initial resolution-based Davis and Putnam algorithm
[DP60,DLL62], often referred to as the DPLL algorithm. It
can be viewed as an enumeration of all the truth assignments of a
formula, hence if no model is found, the formula is
unsatisfiable. The last decade has brought many improvements to that
algorithm in various aspects, both in theory (exponential worst-case
upper bounds; the most recent are [DGH$^+$02,Hir00a]) and in practice
(heuristics,
especially for $k$-SAT formulas:
[DABC96,Fre95,LA97,DD01], data structures
[ZS96] and local processing). Forward local processing is
used to reduce the search space in collaboration with heuristics
(unit propagation lookahead [Li99], binary clause reasoning
[VT96], equivalence reasoning [Li00], etc.).
Backward local processing tries to correct mistakes made by the
heuristics: learning, intelligent backtracking, backjumping, etc.
[BS97,Zha97,MSS96]. Randomization is
also used to correct wrong heuristic choices: random tie
breaking and rapid restart strategies have proven successful for
solving some structured instances (planning [GSK98],
Bounded Model Checking (BMC) [BMS00]).
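To fix ideas, the following minimal recursive DPLL sketch (in Python, with naming of our own; it corresponds to no particular competing solver) combines unit propagation with chronological backtracking. The branching rule is deliberately naive: it is exactly the place where the heuristics, data structures and local processing techniques cited above differ.

\begin{verbatim}
# Minimal recursive DPLL sketch (illustration only).  A formula is a
# list of clauses, each clause a list of DIMACS-style literals:
# v means "variable v is true", -v means "variable v is false".

def unit_propagate(clauses, assignment):
    """Apply unit propagation; return (remaining clauses, assignment),
    or (None, assignment) if some clause becomes empty (a conflict)."""
    clauses = [set(c) for c in clauses]
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue                      # clause already satisfied
            free = [lit for lit in clause if -lit not in assignment]
            if not free:
                return None, assignment       # all literals falsified
            if len(free) == 1:                # unit clause: forced literal
                assignment = assignment | {free[0]}
                changed = True
    remaining = [c for c in clauses
                 if not any(lit in assignment for lit in c)]
    return remaining, assignment

def dpll(clauses, assignment=frozenset()):
    """Return a model (a set of literals) or None if unsatisfiable."""
    clauses, assignment = unit_propagate(clauses, assignment)
    if clauses is None:
        return None                           # conflict: backtrack
    if not clauses:
        return assignment                     # all clauses satisfied
    # Branching: pick any unassigned literal of the first open clause.
    # Real solvers plug their branching heuristics in here.
    lit = next(l for l in clauses[0] if -l not in assignment)
    for choice in (lit, -lit):
        model = dpll(clauses, assignment | {choice})
        if model is not None:
            return model
    return None
\end{verbatim}

For instance, dpll([[1, 2], [-1, 2], [-2, 3]]) returns a model such as frozenset({1, 2, 3}), while dpll([[1], [-1]]) returns None, i.e. unsatisfiable.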
Another use of randomization is the design of incomplete solvers,
where randomness is inherent. Interest in
their experimental study increased after the papers on greedy algorithms and,
later, WalkSAT [SLM92,SKC94a]. Encouraging average-case time
complexity results are known for this type of algorithm (see, e.g.,
[KP92]). In theory, incomplete solvers could perform (much)
better than complete ones just because they belong to a wider
computational model. Indeed, there are benchmarks (especially coming
from various random generators) on which incomplete solvers perform
much better. Worst-case time bounds are also better for incomplete
algorithms [SSWH02].
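As a counterpart to the DPLL sketch above, here is a minimal WalkSAT-style local search procedure (again a simplified illustration under naming of our own, not the original WalkSAT implementation nor any competing solver). It repeatedly picks a falsified clause and flips one of its variables, either at random or greedily; note that it can only ever return a model, never a proof of unsatisfiability.

\begin{verbatim}
import random

def walksat(clauses, n_vars, max_flips=100000, noise=0.5, seed=None):
    """WalkSAT-style local search sketch.  Returns a model as a dict
    {variable: bool}, or None if no model was found within max_flips
    (which proves nothing about unsatisfiability)."""
    rng = random.Random(seed)
    # Start from a random complete assignment.
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}

    def satisfied(lit):
        return assign[abs(lit)] == (lit > 0)

    for _ in range(max_flips):
        falsified = [c for c in clauses if not any(satisfied(l) for l in c)]
        if not falsified:
            return assign                     # model found: certificate of SAT
        clause = rng.choice(falsified)        # focus on one violated clause
        if rng.random() < noise:
            var = abs(rng.choice(clause))     # random-walk move
        else:
            # Greedy move: flip the variable whose flip leaves the
            # fewest clauses falsified (a simplified scoring rule).
            def falsified_after_flip(v):
                assign[v] = not assign[v]
                count = sum(1 for c in clauses
                            if not any(satisfied(l) for l in c))
                assign[v] = not assign[v]
                return count
            var = min((abs(l) for l in clause), key=falsified_after_flip)
        assign[var] = not assign[var]         # flip the chosen variable

    return None  # gave up: the formula may still be satisfiable
\end{verbatim}

For example, walksat([[1, 2], [-1, 2], [-2, 3]], n_vars=3) quickly returns a satisfying assignment, whereas on an unsatisfiable input it simply exhausts its flip budget and returns None.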