Geoff Sutcliffe and Christian Suttner have been running the CASC^{9}
competition for many years now, and they provide in [SS01]
some clues about what constitutes a fair way to evaluate automated theorem
provers. One of the key ideas of their work is the notion of the State Of The Art (SOTA) solver. It is based on a subsumption
relationship between the sets of instances solved by the competing
solvers: ``a solver A is better than a solver B iff the instances solved
by solver B form a strict subset of the instances solved by solver A''. The underlying
idea is that any solver that is the only one to solve some instance is
meaningful. The subsumption relationship provides a partial order
between the competing solvers, and a virtual solver representing
the advances of the whole community, the SOTA solver. This solver
can solve every problem solved by any of the competing solvers (so it
would be the unique maximal element for the subsumption
relationship).
There is little chance that the SOTA solver is an actual competing solver,
so the authors introduce the notion of SOTA
contributors: the maximal elements of the subsumption
relationship. The SOTA solver is equivalent to the set of SOTA
contributors running in parallel.
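The definitions above translate directly into set operations. The following minimal Python sketch (with made-up solver names and instance labels, not competition data) computes the SOTA contributors as the maximal elements of the strict-subset order, and the virtual SOTA solver as the union of all solved sets:

```python
# Sketch, not competition code: solver names and instance labels below
# are hypothetical, chosen only to illustrate the definitions.

def sota_contributors(results):
    """Solvers whose solved set is not a strict subset of any other
    solver's solved set (maximal elements of the subsumption order)."""
    return {
        a for a, solved_a in results.items()
        if not any(solved_a < solved_b  # '<' is the strict-subset test on sets
                   for b, solved_b in results.items() if b != a)
    }

def sota_solver(results):
    """The virtual SOTA solver solves the union of all solved sets."""
    return set().union(*results.values())

results = {
    "solverA": {"i1", "i2", "i3"},
    "solverB": {"i1", "i2"},        # subsumed by solverA
    "solverC": {"i2", "i4"},        # uniquely solves i4, so it contributes
}
print(sorted(sota_contributors(results)))  # ['solverA', 'solverC']
print(sorted(sota_solver(results)))        # ['i1', 'i2', 'i3', 'i4']
```

Note that solverB drops out even though it solves two instances: everything it solves, solverA solves as well.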
Tables 10, 11 and
12 show the problems solved by exactly one solver during
the first stage of the competition, for the three categories of
benchmarks (hors concours solvers are discarded). From each table, we
can identify the SOTA contributors:

- Table 10 (industrial): the SOTA contributors for industrial instances
  are 2clseq, berkmin, limmat, modoc and zchaff. Note that 2clseq worked
  very well on rand_net benchmarks, as did zchaff on pipe instances.
- Table 11 (handmade): the SOTA contributors for handmade benchmarks
  are berkmin, limmat, rb2cl and zchaff.
- Table 12 (random): the SOTA contributors for randomly generated
  benchmarks are 2clseq, dlmsat1, dlmsat3, OKsolver, unitwalk, usat05
  and usat10.
Table 10: Uniquely solved industrial benchmarks during the first stage

  Solver    Benchmark (short name)    CPU (s)
  2clseq    bmc2/cnt09.cnf             198.00
  2clseq    rand_net/506010             96.33
  2clseq    rand_net/604010             16.03
  2clseq    rand_net/606010            180.45
  2clseq    rand_net/60605             172.22
  2clseq    rand_net/704010             35.02
  2clseq    rand_net/70405              52.15
  2clseq    rand_net/706010            188.11
  2clseq    rand_net/70605             611.71
  2clseq    satexc/c6288s                0.73
  2clseq    satexc/c6288                 0.77
  berkmin   dinphil/dp10u09            321.67
  berkmin   fpgar/rcs_w9              2054.50
  berkmin   satexc/cnfr4b2k1.1        1402.24
  limmat    Homer/homer16             2315.05
  limmat    dinphil/dp12s12              5.21
  modoc     Homer/homer15              843.87
  zchaff    fifo8/fifo8_200           1417.68
  zchaff    w10/w10_70.cnf             661.74
  zchaff    satex/9vliw_bp_mc         1274.94
  zchaff    fvpu.2.0/5pipe             186.58
  zchaff    fvpu.2.0/5pipe_3_oo        533.11
  zchaff    fvpu.2.0/5pipe_4_ooo      1667.30
  zchaff    fvpu.2.0/5pipe_5_ooo       814.92
  zchaff    fvpu.2.0/7pipe_bug         452.14

Table 11: Uniquely solved handmade benchmarks during the first stage

  Solver    Benchmark                CPU (s)
  berkmin   hanoi6_on                 295.04
  berkmin   rope_1000                2058.62
  berkmin   x1.1_32                    57.13
  berkmin   x1_32                      82.18
  limmat    pyhalabraunsat40401      1295.77
  rb2cl     lisa21_1_a               1955.62
  zchaff    lisa21_2_a                287.74
  zchaff    pyhalabraunsat40402       971.59
  zchaff    pyhalabraununsat35401    1474.30
  zchaff    pyhalabraununsat35402    1653.80
  zchaff    Urquharts3b5             1507.11
  zchaff    x1.1_36                   786.60
  zchaff    x2_36                    1828.81

Table 12: Uniquely solved randomly generated benchmarks during the first stage

  Solver     Benchmark                   CPU (s)
  2clseq     5col100_15_1                1148.08
  2clseq     5col100_15_4                1097.78
  2clseq     5col100_15_5                1018.13
  2clseq     5col100_15_6                1353.14
  2clseq     5col100_15_7                 748.21
  2clseq     5col100_15_8                1160.66
  2clseq     5col100_15_9                 821.42
  dlmsat1    hgen5v300s1895562135         391.94
  dlmsat1    hgen5v300s528975468          512.81
  dlmsat1    hgen2v300s1807441418         651.35
  dlmsat1    hgen3v450s646636548          141.16
  dlmsat1    hgen4v200s2074278220          16.79
  dlmsat1    hgen4v200s812807056         1158.20
  dlmsat3    hgen3v450s356974048          821.42
  dlmsat3    4col280_9_2                 1766.60
  dlmsat3    4col280_9_4                    6.60
  dlmsat3    4col280_9_6                 1792.59
  dlmsat3    4col280_9_7                  108.43
  OKsolver   glassyv399s1993893641       1162.07
  OKsolver   glassyv399s524425695         814.44
  OKsolver   glassybpv399s1499943388      273.31
  OKsolver   glassybpv399s944837607       615.27
  OKsolver   glassybv399s1267848873      1001.71
  OKsolver   5cnf_3900_3900_160           243.24
  OKsolver   5cnf_3900_3900_170           911.32
  OKsolver   5cnf_3900_3900_180          1790.46
  OKsolver   5cnf_3900_3900_190          1778.96
  OKsolver   5cnf_4300_4300_090          1390.55
  OKsolver   5cnf_4300_4300_100          1311.92
  unitwalk   hgen2v650s2139597266         536.09
  unitwalk   hgen2v700s543738649           32.38
  unitwalk   hgen2v700s548148704            0.62
  unitwalk   hgen3v500s1754870155         157.03
  usat05     okgenc2550v600s552691850       5.57
  usat10     hgen3v450s1400022686           0.39

In order to obtain a total order among the solvers, one must first
classify the benchmarks themselves. For this purpose, the notion of SOTA
contributors can be used [SS01]: benchmarks solved by
all SOTA contributors are said to be easy, while those solved by at
least one SOTA contributor (but not all of them) are called difficult^{10}. Note that benchmarks not solved by any
solver are not considered here.
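This classification amounts to an intersection and a union over the contributors' solved sets. A minimal sketch, again with made-up data (not the competition results):

```python
# Sketch: classify solved benchmarks as easy (solved by all SOTA
# contributors) or difficult (solved by some, but not all).
# Solver names and instance labels are hypothetical.

def classify(results, contributors):
    solved_by_all = set.intersection(*(results[c] for c in contributors))
    solved_by_some = set.union(*(results[c] for c in contributors))
    easy = solved_by_all
    difficult = solved_by_some - solved_by_all
    return easy, difficult

results = {
    "solverA": {"i1", "i2", "i3"},
    "solverC": {"i2", "i4"},
}
easy, difficult = classify(results, ["solverA", "solverC"])
print(sorted(easy))       # ['i2']
print(sorted(difficult))  # ['i1', 'i3', 'i4']
```

Benchmarks solved by no solver never appear in the union, so they are excluded automatically, as the definition requires.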
It is now easy to rank the solvers according to the number of
difficult instances they can solve. Tables
13 and 14 provide such a
summary for the SAT2002 competition: Table 13 counts the difficult
SAT+UNSAT benchmarks solved by the complete solvers, while Table 14
counts the difficult satisfiable benchmarks solved by each solver.
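The ranking itself is a simple count per solver. A sketch under the same made-up data as before, given a precomputed set of difficult instances:

```python
# Sketch: rank solvers by how many difficult benchmarks they solve.
# Solver names, instance labels and the difficult set are hypothetical.

def rank(results, difficult):
    counts = {s: len(solved & difficult) for s, solved in results.items()}
    return sorted(counts.items(), key=lambda kv: -kv[1])

results = {
    "solverA": {"i1", "i2", "i3"},
    "solverB": {"i2"},
    "solverC": {"i2", "i4"},
}
difficult = {"i1", "i3", "i4"}
print(rank(results, difficult))
# [('solverA', 2), ('solverC', 1), ('solverB', 0)]
```

Counting only difficult instances means that solvers get no credit for benchmarks everyone solves, which is what makes the resulting total order discriminating.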
Table 13: Number of difficult SAT+UNSAT benchmarks solved by each solver during the first stage

  Industrial           Handmade             Randomly generated
  solver     #solved   solver     #solved   solver     #solved
  berkmin    121       berkmin    68        OKsolver   548
  zchaff     104       zchaff     68        marchII    543
  2clseq     93        limmat     47        marchIIse  543
  limmat     87        simo       43        2clseq     369
  simo       57        2clseq     34        rb2cl      338
  rb2cl      36        OKsolver   33        zchaff     305
  OKsolver   35        jquest     16        simo       298
  jquest     31        marchIIse  13        berkmin    263
  marchIIse  29        rb2cl      10        limmat     221
  modoc      26        marchII    9         modoc      219
  blindsat   20        modoc      7         jquest     122
  marchII    16        blindsat   0         blindsat   3

Table 14: Number of difficult satisfiable benchmarks solved by each solver during the first stage

  Industrial           Handmade             Randomly generated
  solver     #solved   solver     #solved   solver     #solved
  berkmin    68        zchaff     35        OKsolver   548
  zchaff     64        berkmin    33        marchII    543
  limmat     54        limmat     29        marchIIse  543
  2clseq     46        simo       27        2clseq     369
  unitwalk   41        OKsolver   23        rb2cl      338
  simo       39        2clseq     21        zchaff     305
  dlmsat2    37        unitwalk   11        simo       298
  dlmsat1    36        dlmsat1    11        berkmin    263
  rb2cl      35        marchIIse  10        dlmsat1    255
  dlmsat3    35        dlmsat3    10        dlmsat2    234
  saturn     34        saturn     9         dlmsat3    233
  usat10     33        usat05     9         unitwalk   227
  usat05     31        usat10     9         limmat     221
  OKsolver   30        dlmsat2    9         modoc      219
  marchIIse  26        marchII    7         saturn     208
  jquest     24        rb2cl      6         usat10     206
  blindsat   20        modoc      5         usat05     205
  modoc      20        jquest     4         jquest     122
  ga         18        blindsat   0         blindsat   3
  marchII    13        ga         0         ga         3

What can we conclude? First of all, we awarded the SOTA
contributors. Looking at the SOTA ranking per category, berkmin,
zchaff and OKsolver would be awarded. Limmat, our fourth awardee,
ranks third most of the time, after zchaff and berkmin, in the
industrial and handmade categories. So the results of the SAT2002
competition look reasonable. Berkmin certainly deserves special
attention from the community: despite the bug that made it crash on
more than 100 benchmarks, it is ranked first in the industrial
category using the SOTA system. The small impact of these crashes is
probably due to the fact that only the second SAT engine of Berkmin
crashed; since that engine is called only on the hardest instances,
most of the benchmarks on which Berkmin crashed are hard for all
solvers anyway.
Furthermore, we now have difficult and unsolved instances for the
next competition. The degree of difficulty provided by the SOTA
system can be used to tune solvers for the next competition: first
try to solve the instances of medium difficulty, then the really hard
ones. All this information will be included in the instances archive.
LE BERRE Daniel
20020916