Conclusions

The competition revealed many expected results as well as a few
surprising ones. First of all, **incomplete solvers**
appeared to be much weaker than complete ones. No
incomplete solver won any category of satisfiable formulas, even though
these categories were intended for them rather than for complete
solvers (recall that, in theory, randomized one-sided-error algorithms
accept potentially more languages than deterministic ones). In
the Industrial and Handmade categories only one incomplete solver
(UnitWalk) made the top five. This may be due to the need for
instance-specific tuning of local search algorithms (noise, length of walk, etc.)
and, probably, to the
lack of new ideas for using randomization (beyond local
search). If automatic tuning (for similar benchmarks) of incomplete solvers is possible, how
should it be incorporated into the competition?
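To make the tuning burden concrete, the sketch below shows a generic WalkSAT-style loop (no competitor's actual code): the `noise` parameter and flip budget are exactly the kinds of knobs whose values dominate local-search performance. All names and defaults here are illustrative assumptions.

```python
import random

def walksat(clauses, n_vars, noise=0.5, max_flips=100_000):
    """Generic WalkSAT-style sketch.  `clauses` holds DIMACS-style
    literals (non-zero ints).  Returns a model as a list of bools,
    or None if the flip budget is exhausted (incomplete: it can
    never report "unsatisfiable")."""
    assign = [False] + [random.random() < 0.5 for _ in range(n_vars)]
    sat = lambda lit: assign[abs(lit)] == (lit > 0)

    def n_unsat_after_flip(v):
        # Simplified greedy criterion: clauses left unsatisfied if v
        # flips (classic WalkSAT minimizes the "break count" instead).
        assign[v] = not assign[v]
        n = sum(1 for c in clauses if not any(sat(l) for l in c))
        assign[v] = not assign[v]
        return n

    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign[1:]
        clause = random.choice(unsat)
        if random.random() < noise:
            var = abs(random.choice(clause))          # random-walk move
        else:
            var = min({abs(l) for l in clause}, key=n_unsat_after_flip)
        assign[var] = not assign[var]
    return None
```

Even at this toy scale, performance is sensitive to `noise`: too low and the search stalls in local minima, too high and it degenerates into a pure random walk, which is precisely the tuning problem mentioned above.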

On industrial and handmade benchmarks, **zchaff and
algorithms close to it** (Berkmin and limmat) dominated. On the
other hand, the situation on **randomly generated instances**
was quite different: these algorithms were not even in the top five!
Moreover, randomly generated satisfiable instances were the only category
where incomplete algorithms were competitive (four of them,
dlmsat1, dlmsat2, dlmsat3, and UnitWalk, made the top five). For the
unsatisfiable instances, the top-five list also looks quite
different from those of the other categories.

In fact, only two solvers appeared in all three top-five lists
for the Industrial/Handmade/Random categories:
a *non*-zchaff-like complete solver, 2clseq, for SAT+UNSAT, and
an incomplete solver, UnitWalk, for SAT. However, neither won anything.
The (unsurprising) conclusion is that specialized solvers indeed
perform better on the classes of benchmarks they are specialized for;
it also confirms that our choice of categories was right.
Still, perhaps an award should be given to algorithms performing uniformly well on all
kinds of instances (though part of the community was against an ``overall
winner'' for the present competition).

Another conclusion of the competition is that almost all benchmarks that remained
**unsolved within 40 minutes** on a P-III-450
(or a comparable number of CPU cycles on a faster machine)
were not solved in 6 hours either. This may be partly
because few people have studied the behaviour of their
solvers over such long runs. Note that the greatest number of second-stage benchmarks
was solved in the Industrial SAT+UNSAT category, probably the one
on which solver authors experiment the most.
Also, many solvers crashed on huge
formulas (probably from running out of memory).

It is no surprise that the **smallest unsolved unsatisfiable** benchmark
(an xor-chain instance by L. Zhang) belongs to the Handmade category. In fact, many of
the unsatisfiable benchmarks in this category are also very small. However,
it seems that all these benchmarks are hard only for resolution
(and hence for DP- and DLL-like algorithms),
for which exponential lower bounds have been known for decades (see, e.g.,
[Tse68,Urq87]). Therefore, if non-resolution-based complete algorithms appear,
these benchmarks will probably be easy for them.
For example, LSAT (fixed version) and eqsatz (which did not participate),
both of which employ equality reasoning, can easily solve the parity32 instances
that remained unsolved in the competition.
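The xor-chain structure can be illustrated with a small generator. This is a generic cyclic XOR-chain encoding, not necessarily L. Zhang's exact construction, and the actual benchmarks are of course more intricate:

```python
def xor_chain_dimacs(n):
    """DIMACS CNF for the cyclic chain of XOR equations
        x1+x2 = 0, x2+x3 = 0, ..., x_{n-1}+x_n = 0, x_n+x1 = 1  (mod 2).
    Summing all equations gives the contradiction 0 = 1, so the
    formula is unsatisfiable.  Each 2-variable XOR expands into two
    binary clauses."""
    clauses = []
    for i in range(1, n):
        # x_i XOR x_{i+1} = 0  <=>  (-x_i v x_{i+1}) and (x_i v -x_{i+1})
        clauses += [(-i, i + 1), (i, -(i + 1))]
    # closing constraint x_n XOR x_1 = 1  <=>  (x_n v x_1) and (-x_n v -x_1)
    clauses += [(n, 1), (-n, -1)]
    lines = [f"p cnf {n} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)

print(xor_chain_dimacs(8))
```

This tiny chain is trivially refutable, but it shows why equality reasoning pays off: one pass of XOR (Gaussian) elimination sums the equations to 0 = 1, a global argument that clause-by-clause resolution does not see directly.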

On the other hand, the **smallest unsolved** (provably)
**satisfiable** benchmark (an hgen2 instance by Edward A. Hirsch)
was produced by a random generator. Other small hard satisfiable benchmarks
also belong to the Random category. These benchmarks are much larger
than the hard unsatisfiable ones (5250 vs 844 literal occurrences).
This is partly because no
exponential lower bounds are known for DPLL-like algorithms on
*satisfiable* formulas (in fact, the only such lower bounds we know of
for other SAT algorithms are those of [Hir00b]). In contrast, no random generator
was submitted for (provably) unsatisfiable instances.
(Of course, some of the handmade unsatisfiable instances can be
generated from random structures behind them; however, this does not
yield a language not known to be in coRP (or even in ZPP).)
Note that the existence of an efficient generator
of a *coNP-complete* language
would imply NP=coNP: the random bits used to produce a formula
would form a short certificate of its membership.
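The general idea behind a generator of provably *satisfiable* instances can be sketched as follows. This is a generic ``planted assignment'' scheme, not hgen2's actual construction; the clause width and parameters are illustrative assumptions:

```python
import random

def planted_random_cnf(n_vars, n_clauses, k=3, seed=None):
    """Generic planted-assignment generator (NOT hgen2 itself):
    fix a hidden assignment, then emit random k-clauses, rejecting
    any clause the hidden assignment falsifies.  The output is
    provably satisfiable by construction, yet the certificate (the
    hidden assignment) is known only to the generator."""
    rng = random.Random(seed)
    hidden = {v: rng.choice((True, False)) for v in range(1, n_vars + 1)}
    clauses = []
    while len(clauses) < n_clauses:
        vs = rng.sample(range(1, n_vars + 1), k)
        clause = [v if rng.random() < 0.5 else -v for v in vs]
        # keep the clause only if the hidden assignment satisfies it
        if any((lit > 0) == hidden[abs(lit)] for lit in clause):
            clauses.append(clause)
    return clauses
```

A caveat: such naive rejection sampling correlates the clauses with the planted assignment and often makes the instances easy; a generator designed for hardness must work to hide the planted solution.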

Perhaps, in the end, the main achievement of the competition is that it attracted completely new solvers (e.g., 2clseq and limmat) and a lot of new benchmarks.

Some challenging questions, drawn from these conclusions, are:

- Construct a *random* generator for a ``hard'' language of provably *unsatisfiable* formulas.
- Design an incomplete algorithm that would outperform complete algorithms (on satisfiable formulas), or explain why it is not possible.
- Construct *satisfiable* formulas giving exponential lower bounds for the worst-case running time of *DPLL-like* algorithms.
- For the competition: how to request/represent a certificate for ``unsatisfiable'' answers without violating the rights of non-DPLL-like algorithms? (See the sketch after this list.)
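One baseline certificate format, a resolution refutation trace, also illustrates why the last question is delicate: such traces are natural for DPLL-like solvers but not for, say, algorithms based on equality reasoning. A minimal checker sketch follows; the trace format `(i, j, pivot)` is an illustrative assumption, not a competition format:

```python
def check_resolution_trace(clauses, trace):
    """Check a resolution refutation.  `clauses` are the input
    clauses (iterables of non-zero ints); each trace step
    (i, j, pivot) must resolve two earlier clauses on `pivot`.
    The refutation is valid iff the empty clause is derived."""
    derived = [frozenset(c) for c in clauses]
    for i, j, pivot in trace:
        a, b = derived[i], derived[j]
        if pivot not in a or -pivot not in b:
            return False                     # not a valid resolution step
        resolvent = (a - {pivot}) | (b - {-pivot})
        if any(-l in resolvent for l in resolvent):
            return False                     # disallow tautological resolvents
        derived.append(resolvent)
    return frozenset() in derived

# e.g. resolving (x1) with (-x1) on pivot 1 yields the empty clause:
assert check_resolution_trace([[1], [-1]], [(0, 1, 1)])
```

A format like this is easy to emit from a DPLL-style search but would force a non-resolution-based solver to reconstruct a resolution proof it never built, which is exactly the tension the question points at.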