The SAT2002 solver competition, involving more than 30 solvers and
2300 benchmarks, took place in Cincinnati a decade after the first
SAT solver competition, held in Paderborn in 1991/1992
[BB93]. Two other SAT competitions have been organized since
then: the Second DIMACS Challenge, held in 1992/1993
[JT96], and the Beijing competition, in 1996. In the last
few years, the need for such a competition has become more and more
obvious, reflecting the recent and important progress in the field. A
Many papers have been published on ``heuristic''
algorithms for the NP-complete satisfiability problem
[Coo71], and software now exists that people use on
real-world applications. Many techniques are currently available, and
it is difficult to compare them. Such a comparison can hardly be made
at the theoretical level alone, because theoretical results often say
little from a practical viewpoint. A competition provides an empirical
evaluation of current algorithms (as good as possible) and can thus
be viewed as a snapshot of state-of-the-art solvers at a given
time. The data collected during this competition will probably help
to identify classes of hard instances and solver limitations, and
should lead to appropriate answers in the next few years. Moreover, we
think that such a competition fits into a more general effort toward
the empirical evaluation of algorithms. In many fields of computer
science, we need more and more knowledge about the behavior of the
algorithms we design and about the characteristics of
benchmarks. This competition is a step toward a better - and crucial -
empirical knowledge of SAT algorithms and benchmarks
[Hoo94,Hoo96]. The aim of this paper is to report
what the organizers learned during this competition (about solvers,
benchmarks and the competition itself), and to publish enough data to
allow readers to form their own opinion about the results.
As mentioned above, several competitions were organized in the first half of the last decade to compare solvers. However, one major - and recent - step in this area was the creation of the SAT-Ex web site [SC01], an online collection of results obtained by various SAT solvers on classical benchmarks. Among the advantages of this kind of publication, SAT-Ex makes it possible to check every solver's output, to generate syntheses dynamically, and to add new solvers and benchmarks continually.
More and more researchers would like to see how their solver compares with others on the current benchmark set. This is not really a problem, because only a few CPU days are needed to update the SAT-Ex database with a new solver: usually, the new solver will outperform older ones on some series of benchmarks. A problem arises with the introduction of new benchmarks: they have to be run on every solver, and they are likely to give the solvers a hard time. Since all the results available on SAT-Ex were obtained on the same computer (the only way to provide a fair comparison based on CPU time), it would take ages before results on new benchmarks become available. Several solutions exist to solve that problem:
This last point is one of our major technical choices: using the SAT-Ex architecture for the competition, and providing a SAT-Ex style online publication. In order to shed light on some of the other choices we made during the competition, let us first recall a few facts about SAT. Currently, approaches to solving SAT can be divided into two categories: complete and incomplete ones. A complete solver can prove satisfiability as well as unsatisfiability of a Boolean formula. In contrast, an incomplete solver can only prove that a formula is satisfiable, usually by providing a model of the formula (a certificate of satisfiability).
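To make this distinction concrete, consider a toy formula of our own (it is not one of the competition benchmarks): $(x_1 \lor \neg x_2) \land (x_2 \lor x_3)$ is satisfiable, and the assignment setting $x_1$ and $x_3$ to true and $x_2$ to false is a model; checking it against each clause takes linear time, whereas no comparably concise certificate is known for unsatisfiability in general, which is why incomplete solvers cannot disprove satisfiability.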
Most complete solvers descend from the backtrack search version of
the initial resolution-based Davis and Putnam algorithm
[DP60,DLL62], often referred to as the DPLL algorithm. It
can be viewed as an enumeration of all the truth assignments of a
formula, hence if no model is found, the formula is
unsatisfiable. The last decade has brought many improvements to that
algorithm in various aspects, both in theory (exponential worst-case
upper bounds; the most recent are [DGH$^+$02,Hir00a]) and in practice
(heuristics,
especially for $k$-SAT formulas:
[DABC96,Fre95,LA97,DD01], data structures
[ZS96] and local processing). Forward local processing is
used to reduce the search space in collaboration with heuristics
(unit propagation lookahead [Li99], binary clause reasoning
[VT96], equivalence reasoning [Li00], etc.).
Backward local processing tries to correct mistakes made by the
heuristics: learning, intelligent backtracking, backjumping, etc.
[BS97,Zha97,MSS96]. Randomization is
also used to correct wrong heuristic choices: random tie
breaking and rapid restart strategies have proven successful for
solving some structured instances (planning [GSK98],
Bounded Model Checking (BMC) [BMS00]).
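To fix ideas, the following minimal recursive DPLL sketch (in Python, with naming of our own; it corresponds to no particular competing solver) combines unit propagation with chronological backtracking. The branching rule is deliberately naive: it is exactly the place where the heuristics, data structures and local processing techniques cited above differ.

\begin{verbatim}
# Minimal recursive DPLL sketch (illustration only).  A formula is a
# list of clauses, each clause a list of DIMACS-style literals:
# v means "variable v is true", -v means "variable v is false".

def unit_propagate(clauses, assignment):
    """Apply unit propagation; return (remaining clauses, assignment),
    or (None, assignment) if some clause becomes empty (a conflict)."""
    clauses = [set(c) for c in clauses]
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue                      # clause already satisfied
            free = [lit for lit in clause if -lit not in assignment]
            if not free:
                return None, assignment       # all literals falsified
            if len(free) == 1:                # unit clause: forced literal
                assignment = assignment | {free[0]}
                changed = True
    remaining = [c for c in clauses
                 if not any(lit in assignment for lit in c)]
    return remaining, assignment

def dpll(clauses, assignment=frozenset()):
    """Return a model (a set of literals) or None if unsatisfiable."""
    clauses, assignment = unit_propagate(clauses, assignment)
    if clauses is None:
        return None                           # conflict: backtrack
    if not clauses:
        return assignment                     # all clauses satisfied
    # Branching: pick any unassigned literal of the first open clause.
    # Real solvers plug their branching heuristics in here.
    lit = next(l for l in clauses[0] if -l not in assignment)
    for choice in (lit, -lit):
        model = dpll(clauses, assignment | {choice})
        if model is not None:
            return model
    return None
\end{verbatim}

For instance, dpll([[1, 2], [-1, 2], [-2, 3]]) returns a model such as frozenset({1, 2, 3}), while dpll([[1], [-1]]) returns None, i.e. unsatisfiable.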
Another use of randomization is the design of incomplete solvers,
where randomness is inherent. Interest in
their experimental study increased after the papers on greedy algorithms and,
later, WalkSAT [SLM92,SKC94a]. Encouraging average-case time
complexity results are known for this type of algorithm (see, e.g.,
[KP92]). In theory, incomplete solvers could perform (much)
better than complete ones just because they belong to a wider
computational model. Indeed, there are benchmarks (especially coming
from various random generators) on which incomplete solvers perform
much better. Worst-case time bounds are also better for incomplete
algorithms [SSWH02].
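As a counterpart to the DPLL sketch above, here is a minimal WalkSAT-style local search procedure (again a simplified illustration under naming of our own, not the original WalkSAT implementation nor any competing solver). It repeatedly picks a falsified clause and flips one of its variables, either at random or greedily; note that it can only ever return a model, never a proof of unsatisfiability.

\begin{verbatim}
import random

def walksat(clauses, n_vars, max_flips=100000, noise=0.5, seed=None):
    """WalkSAT-style local search sketch.  Returns a model as a dict
    {variable: bool}, or None if no model was found within max_flips
    (which proves nothing about unsatisfiability)."""
    rng = random.Random(seed)
    # Start from a random complete assignment.
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}

    def satisfied(lit):
        return assign[abs(lit)] == (lit > 0)

    for _ in range(max_flips):
        falsified = [c for c in clauses if not any(satisfied(l) for l in c)]
        if not falsified:
            return assign                     # model found: certificate of SAT
        clause = rng.choice(falsified)        # focus on one violated clause
        if rng.random() < noise:
            var = abs(rng.choice(clause))     # random-walk move
        else:
            # Greedy move: flip the variable whose flip leaves the
            # fewest clauses falsified (a simplified scoring rule).
            def falsified_after_flip(v):
                assign[v] = not assign[v]
                count = sum(1 for c in clauses
                            if not any(satisfied(l) for l in c))
                assign[v] = not assign[v]
                return count
            var = min((abs(l) for l in clause), key=falsified_after_flip)
        assign[var] = not assign[var]         # flip the chosen variable

    return None  # gave up: the formula may still be satisfiable
\end{verbatim}

For example, walksat([[1, 2], [-1, 2], [-2, 3]], n_vars=3) quickly returns a satisfying assignment, whereas on an unsatisfiable input it simply exhausts its flip budget and returns None.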