next up previous
Next: Graphical analysis of SOTA Up: Other views of the Previous: Other views of the

SOTA view

Geoff Sutcliffe and Christian Suttner are running the CASC9 competition for many years now, and provide in [SS01] some clues about what is a fair way to evaluate automated theorem provers. One of the key ideas of their work is the notion of State Of The Art (SOTA) solver. It is based on a subsumption relationship between the set of instances solved by the competing solvers: ``a solver A is better than a solver B iff solver A solves a strict subset of the instances solved by solver B''. The underlying idea is that any solver being the only one to solve an instance is meaningful. The subsumption relationship provides a partial order between the competing solvers, and a virtual solver representing advances of the whole community, the SOTA solver. This solver can solve every problem solved by any of the competing solvers (so would be the unique maximal element for the subsumption relationship). There is a little chance that the SOTA solver is a real solver, and thus the notion of SOTA contributors is introduced by the authors. They correspond to the maximal elements of the subsumption relationship. The SOTA solver is equivalent to the set of SOTA contributors running in parallel.

Tables 10, 11 and 12 show the problems solved by only one solver during the first stage of the competition, for the three categories of benchmarks (hors-concours solvers are discarded). We can now find SOTA contributors from each table:

Table 10 (industrial)
SOTA contributors for industrial instances are 2clseq, berkmin, limmat, modoc and zchaff. Note that 2clseq worked very well on rand_net benchmarks, same thing for zchaff and pipe instances.

Table 11 (hand-made)
SOTA contributors for hand-made benchmarks are berkmin, limmat, rb2cl and zchaff.

Table 12 (random)
SOTA contributors for randomly generated benchmarks are 2clseq, dlmsat1, dlmsat3, OKsolver, unitwalk, usat05, usat10.

Table 10: Uniquely solved industrial benchmarks during the first stage
Solver Bench (shortname) cpu (s)
2clseq bmc2/cnt09.cnf 198.00
2clseq rand_net/50-60-10 96.33
2clseq rand_net/60-40-10 16.03
2clseq rand_net/60-60-10 180.45
2clseq rand_net/60-60-5 172.22
2clseq rand_net/70-40-10 35.02
2clseq rand_net/70-40-5 52.15
2clseq rand_net/70-60-10 188.11
2clseq rand_net/70-60-5 611.71
2clseq satex-c/c6288-s 0.73
2clseq satex-c/c6288 0.77
berkmin dinphil/dp10u09 321.67
berkmin fpga-r/rcs_w9 2 054.50
berkmin satex-c/cnf-r4-b2-k1.1 1 402.24
limmat Homer/homer16 2 315.05
limmat dinphil/dp12s12 5.21
modoc Homer/homer15 843.87
zchaff fifo8/fifo8_200 1 417.68
zchaff w10/w10_70.cnf 661.74
zchaff satex-/9vliw_bp_mc 1 274.94
zchaff fvp-u.2.0/5pipe 186.58
zchaff fvp-u.2.0/5pipe_3_oo 533.11
zchaff fvp-u.2.0/5pipe_4_ooo 1 667.30
zchaff fvp-u.2.0/5pipe_5_ooo 814.92
zchaff fvp-u.2.0/7pipe_bug 452.14

Table 11: Uniquely solved hand-made benchmarks during the first stage
Solver Benchmark CPU (s)
berkmin hanoi6_on 295.04
berkmin rope_1000 2 058.62
berkmin x1.1_32 57.13
berkmin x1_32 82.18
limmat pyhala-braun-sat-40-4-01 1 295.77
rb2cl lisa21_1_a 1 955.62
zchaff lisa21_2_a 287.74
zchaff pyhala-braun-sat-40-4-02 971.59
zchaff pyhala-braun-unsat-35-4-01 1 474.30
zchaff pyhala-braun-unsat-35-4-02 1 653.80
zchaff Urquhart-s3-b5 1 507.11
zchaff x1.1_36 786.60
zchaff x2_36 1 828.81

Table 12: Uniquely solved randomly generated benchmarks during the first stage
Solver Benchmark CPU (s)
2clseq 5col100_15_1 1 148.08
2clseq 5col100_15_4 1 097.78
2clseq 5col100_15_5 1 018.13
2clseq 5col100_15_6 1 353.14
2clseq 5col100_15_7 748.21
2clseq 5col100_15_8 1 160.66
2clseq 5col100_15_9 821.42
dlmsat1 hgen5-v300-s1895562135 391.94
dlmsat1 hgen5-v300-s528975468 512.81
dlmsat1 hgen2-v300-s1807441418 651.35
dlmsat1 hgen3-v450-s646636548 141.16
dlmsat1 hgen4-v200-s2074278220 16.79
dlmsat1 hgen4-v200-s812807056 1 158.20
dlmsat3 hgen3-v450-s356974048 821.42
dlmsat3 4col280_9_2 1 766.60
dlmsat3 4col280_9_4 6.60
dlmsat3 4col280_9_6 1 792.59
dlmsat3 4col280_9_7 108.43
OKsolver glassy-v399-s1993893641 1 162.07
OKsolver glassy-v399-s524425695 814.44
OKsolver glassybp-v399-s1499943388 273.31
OKsolver glassybp-v399-s944837607 615.27
OKsolver glassyb-v399-s1267848873 1 001.71
OKsolver 5cnf_3900_3900_160 243.24
OKsolver 5cnf_3900_3900_170 911.32
OKsolver 5cnf_3900_3900_180 1 790.46
OKsolver 5cnf_3900_3900_190 1 778.96
OKsolver 5cnf_4300_4300_090 1 390.55
OKsolver 5cnf_4300_4300_100 1 311.92
unitwalk hgen2-v650-s2139597266 536.09
unitwalk hgen2-v700-s543738649 32.38
unitwalk hgen2-v700-s548148704 0.62
unitwalk hgen3-v500-s1754870155 157.03
usat05 okgen-c2550-v600-s552691850 5.57
usat10 hgen3-v450-s1400022686 0.39

In order to obtain a total order among the solvers, one must first classify benchmarks themselves. For this, the notion of SOTA contributors can be used [SS01]: benchmarks solved by all SOTA contributors are said easy, those solved by at least one SOTA contributor (but not all) are called difficult10. Note that the benchmarks not solved by any solver are not considered here.

Now, it is easy to rank the solvers accordingly to the number of difficult instances they can solve. Tables 13 and 14 provide a summary for SAT2002 competition, for respectively complete solvers and satisfiable benchmarks.

Table 13: Number of difficult SAT+UNSAT benchmarks solved by each solver during the first stage
Industrial Handmade Randomly generated
solver #solved solver #solved solver #solved
berkmin 121 berkmin 68 OKsolver 548
zchaff 104 zchaff 68 marchII 543
2clseq 93 limmat 47 marchIIse 543
limmat 87 simo 43 2clseq 369
simo 57 2clseq 34 rb2cl 338
rb2cl 36 OKsolver 33 zchaff 305
OKsolver 35 jquest 16 simo 298
jquest 31 marchIIse 13 berkmin 263
marchIIse 29 rb2cl 10 limmat 221
modoc 26 marchII 9 modoc 219
blindsat 20 modoc 7 jquest 122
marchII 16 blindsat 0 blindsat 3

Table 14: Number of difficult satisfiable benchmarks solved by each solver during the first stage
Industrial Handmade Randomly generated
solver #solved solver #solved solver #solved
berkmin 68 zchaff 35 OKsolver 548
zchaff 64 berkmin 33 marchII 543
limmat 54 limmat 29 marchIIse 543
2clseq 46 simo 27 2clseq 369
unitwalk 41 OKsolver 23 rb2cl 338
simo 39 2clseq 21 zchaff 305
dlmsat2 37 unitwalk 11 simo 298
dlmsat1 36 dlmsat1 11 berkmin 263
rb2cl 35 marchIIse 10 dlmsat1 255
dlmsat3 35 dlmsat3 10 dlmsat2 234
saturn 34 saturn 9 dlmsat3 233
usat10 33 usat05 9 unitwalk 227
usat05 31 usat10 9 limmat 221
OKsolver 30 dlmsat2 9 modoc 219
marchIIse 26 marchII 7 saturn 208
jquest 24 rb2cl 6 usat10 206
blindsat 20 modoc 5 usat05 205
modoc 20 jquest 4 jquest 122
ga 18 blindsat 0 blindsat 3
marchII 13 ga 0 ga 3

What can we conclude? First of all, we awarded SOTA contributors. Looking at the SOTA ranking per category, berkmin, zchaff and OKsolver would be awarded. Limmat, our fourth awarded, is most of the time third after zchaff and berkmin in industrial and hand-made categories. So the result of the SAT2002 competition looks reasonable. Berkmin certainly deserves a special attention from the community, since despite the bug that made it crashed on more than 100 benchmarks, it is ranked first in the industrial category using the SOTA system. This little impact of crashes can certainly be due to the fact that only the second SAT-engine of Berkmin crashed, which means that most instances on which Berkmin crashed are hard for all solvers anyway (this engine is called for hardest instances only).

Furthermore, we now have difficult and unsolved instances for the next competition. The degree of difficulty provided by the SOTA system can be used to tune solvers for the next competition: first try to solve instances of medium difficulty, then try the really hard ones. All that information will be included in the instances archive.

next up previous
Next: Graphical analysis of SOTA Up: Other views of the Previous: Other views of the
LE BERRE Daniel 2002-09-16