Shortcut: direct links to the selection of Application benchmarks and Crafted benchmarks. You can also reproduce the selection yourself!

Random Category

The random benchmarks have been generated using a uniform random k-SAT instance generator, following the guidelines published by Oliver Kullmann for the SAT 2005 competition.

The instances are generated in two sizes, medium and large. Medium benchmarks target complete solvers while large benchmarks target incomplete solvers. For each medium tuple of parameters (clause length, number of variables, clause-to-variable ratio), 100 instances have been generated randomly. The 2007 winner of the random SAT category, gNovelty+, was used to discriminate between SAT and UNSAT benchmarks: the benchmarks found SAT by gNovelty+ (usually in a few seconds) were classified as SAT, while the others were classified as UNKNOWN (probably UNSAT). Once classified that way, 10 SAT and 10 UNKNOWN benchmarks were selected per tuple of parameters. The large benchmarks have been generated slightly below the threshold, so they are probably SAT: 100 of them were generated, 10 of which were randomly selected per tuple of parameters.

K | Medium (SAT+UNKNOWN): ratio | Medium: number of variables | Large (probably SAT): ratio | Large: number of variables
3 | 4.26 | 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560 | 4.2 | 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000, 18000, 20000*, 22000*, 24000*, 26000*
5 | 21.3 | 90, 100, 110, 120 | 20 | 700, 800, 900, 1000, 1100
7 | 89 | 60, 65, 70, 75 | 81 | 140, 160, 180, 200, 220

Total SAT: 190 (medium), 190+40* (large)
Total UNSAT: 190 (medium), NA (large)
Total: 380 (medium), 230 (large)
Grand total: 610
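
For illustration, here is a minimal Python sketch of a uniform random k-SAT generator in the spirit of the procedure described above: each clause picks k distinct variables uniformly at random, negates each one with probability 1/2, and the instance is written in DIMACS CNF format. The exact sampling details of the competition generator (for instance whether duplicate clauses are rejected) follow Kullmann's guidelines and may differ from this sketch.

    import random

    def random_ksat(k, num_vars, ratio, seed=None):
        """Draw a random k-SAT instance with round(ratio * num_vars) clauses.

        Each clause uses k distinct variables chosen uniformly at random, each
        negated with probability 1/2.  Duplicate clauses are not rejected here;
        the competition generator may handle such details differently.
        """
        rng = random.Random(seed)
        clauses = []
        for _ in range(round(ratio * num_vars)):
            variables = rng.sample(range(1, num_vars + 1), k)
            clauses.append([v if rng.random() < 0.5 else -v for v in variables])
        return clauses

    def to_dimacs(clauses, num_vars):
        """Serialize a clause list in DIMACS CNF format."""
        lines = ["p cnf %d %d" % (num_vars, len(clauses))]
        lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
        return "\n".join(lines)

    # Example: one medium 3-SAT instance at ratio 4.26 with 360 variables
    print(to_dimacs(random_ksat(3, 360, 4.26, seed=1), 360))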

The rationale was to have as many SAT as UNKNOWN benchmarks for complete solvers, and as many UNKNOWN benchmarks as large, probably SAT ones, i.e. specific benchmarks for complete and incomplete solvers.

After the first stage of the random category, it appeared that the incomplete solvers were able to solve basically all the large instances for K=3. So, in agreement with the judges, it was decided to add larger benchmarks (marked with a star above) to challenge the solvers during the second stage.

General principles adopted by the judges

No more than 10% of the benchmarks should come from the same source.

It was decided to build a set of 300 instances. As a consequence, no more than 30 benchmarks could be accepted per source.

The final selection of benchmarks should contain 45% existing benchmarks and 55% submitted benchmarks.

As a consequence, the final set of 300 benchmarks should contain 165 submitted benchmarks and 135 existing benchmarks.

The final selection should contain 10% easy, 40% medium and 50% hard benchmarks.

As a consequence, among the 135 existing benchmarks we should have exactly 13 easy, 54 medium and 68 hard benchmarks, while among the 165 submitted benchmarks we should have exactly 17 easy, 66 medium and 82 hard benchmarks.

Duplicate benchmarks found after the selection was done will simply be removed from the selection. No other benchmarks will be added to the selection.

As a consequence, the final set of benchmarks will contain fewer than 300 benchmarks.
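
The quota arithmetic implied by these rules can be checked in a few lines of Python; the quota values below are the ones stated above, and reading each quota as the stated share rounded to a whole benchmark is our interpretation.

    # Quotas adopted by the judges (numbers from the text above)
    TOTAL = 300
    EXISTING, SUBMITTED = 135, 165          # 45% / 55% of 300
    QUOTAS = {
        "existing":  {"easy": 13, "medium": 54, "hard": 68},
        "submitted": {"easy": 17, "medium": 66, "hard": 82},
    }
    SHARES = {"easy": 0.10, "medium": 0.40, "hard": 0.50}

    # Sanity checks: the splits add up and match the stated percentages
    assert EXISTING + SUBMITTED == TOTAL
    for pool, size in (("existing", EXISTING), ("submitted", SUBMITTED)):
        assert sum(QUOTAS[pool].values()) == size
        for level, share in SHARES.items():
            # each quota is the stated share rounded to a whole benchmark
            assert abs(QUOTAS[pool][level] - size * share) < 1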

Application Category

In the case of the application category, the benchmarks come from several sources:

Existing Benchmarks
come from the two SAT Races (2006 and 2008) plus the SAT 2007 competition, industrial division.
Submitted Benchmarks
are the benchmarks submitted for the SAT 2009 competition.

Note that the IBM benchmarks disappeared from the existing benchmarks because they are no longer available from the IBM web site.

The benchmarks have been classified as EASY/MEDIUM/HARD according to the results of the two winners of the SAT 2007 competition, industrial category (RSAT and PICOSAT), plus the widely used Minisat. Since those three solvers participated in the three competitive events, we simply used the publicly available results to classify the benchmarks.

easy
Solved within 30s by all the solvers.
hard
Not solved by any of the solvers within the timeout of the event (first stage for the SAT competition 07).
medium
The remaining instances.

Note that this classification is relative to the best solvers of 2007 in that category. As a consequence, some SAT solvers might struggle on benchmarks classified as easy, while others may be able to solve some hard benchmarks in reasonable time.
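
As an illustration, here is a minimal sketch of this classification rule, assuming that for each benchmark we have the runtime (in seconds) of every reference solver, with None when the solver did not finish within the timeout. The function and data layout are ours, not the competition scripts.

    def classify(runtimes, timeout):
        """Classify a benchmark as 'easy', 'medium' or 'hard' from reference solver runtimes.

        runtimes: runtimes in seconds of the reference solvers (RSAT, PICOSAT,
                  Minisat) on this benchmark, None when a solver did not finish.
        timeout:  timeout of the event the results were taken from.
        """
        finished = [t for t in runtimes if t is not None and t <= timeout]
        if len(finished) == len(runtimes) and all(t <= 30 for t in finished):
            return "easy"      # solved within 30s by all the reference solvers
        if not finished:
            return "hard"      # solved by none of them within the timeout
        return "medium"        # everything else

    # Example: solved by only two of the three solvers, one of them slowly
    print(classify([25.0, 410.2, None], timeout=1200))   # -> 'medium'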

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | 11 / 29 / 45 | 66 / 87 / 147 | 8 / 29 / 45 / 82 | 295
Submitted | 60 / 38 / 98 | 38 / 60 / 98 | 8 / 12 / 82 / 102 | 298
Total | 71 / 67 / 143 | 104 / 147 / 245 | 16 / 41 / 127 / 184 | 593
Details

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total

Submitted
aprove09 | 21 / 0 / 21 | 4 / 0 / 4 | - / - / - / - | 25
bio I | 3 / 0 / 3 | 6 / 11 / 17 | - / - / - / - | 20
bio II | 9 / 0 / 9 | 4 / 3 / 7 | - / - / 24 / 24 | 40
c32sat | 0 / 1 / 1 | 1 / 3 / 4 | - / 3 / 2 / 5 | 10
bitverif | - / 14 / 14 | - / 22 / 22 | - / 6 / 23 / 29 | 65
crypto | 5 / - / 5 | 7 / 6 / 13 | 4 / - / 20 / 24 | 42
diagnosis | 22 / 23 / 45 | 16 / 15 / 31 | 4 / 3 / 13 / 20 | 96

Existing
grieu | - / - / - | 7 / - / 7 | 3 / - / - / 3 | 10
palacios | - / - / - | 7 / 12 / 19 | 1 / 2 / 5 / 8 | 27
jarvisalo | - / - / - | - / 4 / 4 | - / 1 / 2 / 3 | 7
aprove07 | - / - / - | 1 / 8 / 9 | - / 2 / 5 / 7 | 16
post | - / 1 / 1 | 2 / 3 / 5 | - / 4 / - / 4 | 10
narain | 1 / - / 1 | 2 / 1 / 3 | - / - / 1 / 1 | 5
mizh | - / - / - | 13 / - / 13 | - / - / - / - | 13
goldb | - / - / - | - / 5 / 5 | - / 5 / - / 5 | 10
manolios | - / 4 / 4 | - / 21 / 21 | - / 5 / - / 5 | 26
babic | 5 / 15 / 20 | - / 10 / 10 | - / - / - / - | 30
misc | 5 / 8 / 13 | 7 / 11 / 18 | 1 / 1 / - / 2 | 33
diagnosis07 | - / - / - | 10 / 6 / 16 | 3 / 1 / 27 / 30 | 46
velev | - / 1 / 1 | 17 / 6 / 23 | - / 8 / 5 / 13 | 37

The number of instances submitted by a given source ranged from 10 to 96, so it was not possible to accept all the submissions as is. Limiting the number of instances to at most 30 per submitter changed the landscape a bit:

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | 11 / 29 / 40 | 52 / 84 / 136 | 6 / 29 / 34 / 69 | 245
Submitted | 26 / 3 / 29 | 25 / 40 / 65 | 8 / 10 / 63 / 81 | 175
Total | 37 / 32 / 69 | 77 / 124 / 201 | 14 / 39 / 97 / 150 | 420
Details

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total

Submitted
aprove09 | 21 / 0 / 21 | 4 / 0 / 4 | - / - / - / - | 25
bio I | 3 / 0 / 3 | 6 / 11 / 17 | - / - / - / - | 20
bio II | 1 / 0 / 1 | 4 / 3 / 7 | - / - / 22 / 22 | 30
c32sat | 0 / 1 / 1 | 1 / 3 / 4 | - / 3 / 2 / 5 | 10
bitverif | - / 1 / 1 | - / 11 / 11 | - / 4 / 14 / 18 | 30
crypto | 1 / - / 1 | 6 / 5 / 11 | 4 / - / 14 / 18 | 30
diagnosis | - / 1 / 1 | 4 / 7 / 11 | 4 / 3 / 11 / 18 | 30

Existing
grieu | - / - / - | 7 / - / 7 | 3 / - / - / 3 | 10
palacios | - / - / - | 7 / 12 / 19 | 1 / 2 / 5 / 8 | 27
jarvisalo | - / - / - | - / 4 / 4 | - / 1 / 2 / 3 | 7
aprove07 | - / - / - | 1 / 8 / 9 | - / 2 / 5 / 7 | 16
post | - / 1 / 1 | 2 / 3 / 5 | - / 4 / - / 4 | 10
narain | 1 / - / 1 | 2 / 1 / 3 | - / - / 1 / 1 | 5
mizh | - / - / - | 13 / - / 13 | - / - / - / - | 13
goldb | - / - / - | - / 5 / 5 | - / 5 / - / 5 | 10
manolios | - / 4 / 4 | - / 21 / 21 | - / 5 / - / 5 | 26
babic | 5 / 15 / 20 | - / 10 / 10 | - / - / - / - | 30
misc | 5 / 8 / 13 | 7 / 11 / 18 | 1 / 1 / - / 2 | 33
diagnosis07 | - / - / - | 6 / 6 / 12 | 1 / 1 / 16 / 18 | 30
velev | - / 1 / 1 | 7 / 4 / 11 | - / 8 / 5 / 13 | 25
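
A minimal sketch of the 30-per-source cap applied above: when a source submitted more than 30 instances, a subset of 30 is kept. The random draw (and the hypothetical file names in the example) are assumptions; the text does not specify how the dropped instances were chosen.

    import random

    def cap_per_source(benchmarks_by_source, cap=30, seed=0):
        """Keep at most `cap` benchmarks per source, dropping the extras at random."""
        rng = random.Random(seed)
        capped = {}
        for source, benchmarks in benchmarks_by_source.items():
            if len(benchmarks) <= cap:
                capped[source] = list(benchmarks)
            else:
                capped[source] = rng.sample(benchmarks, cap)
        return capped

    # Example (hypothetical file names): a source with 96 submissions is cut to 30
    pools = {"diagnosis": ["diagnosis-%03d.cnf" % i for i in range(96)]}
    print(len(cap_per_source(pools)["diagnosis"]))   # -> 30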

Finally, once the percentages of existing/submitted and easy/medium/hard benchmarks are applied, we end up with the following selection of benchmarks for the competition:

(List)
Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | 1 / 10 / 11 | 22 / 33 / 55 | 6 / 29 / 34 / 69 | 135
Submitted | 18 / 1 / 19 | 25 / 40 / 65 | 8 / 10 / 63 / 81 | 165
Total | 19 / 11 / 30 | 47 / 73 / 120 | 14 / 39 / 97 / 150 | 300

The initial set of benchmarks contained several duplicated benchmarks. This was discovered once the benchmarks were added to the competition system, because several statistics regarding the benchmarks are automatically computed: manol-pipe-f9b.cnf and manol-pipe-g10nid.cnf appeared in both the HARD and MEDIUM categories, and the post* benchmarks submitted to the SAT 2009 competition were exactly the ones used during SAT Race 2008 (we simply removed holes in the variable numbering). It was decided to remove those duplicate benchmarks from the selection, without adding new ones.
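
The duplicates were spotted through the statistics that the competition system computes automatically for every benchmark. A rough sketch of how such a check could be scripted is given below; grouping files by cheap statistics (number of variables, clauses and literals) is our own choice of fingerprint, and it only flags candidate duplicates for manual inspection.

    from collections import defaultdict

    def cnf_stats(path):
        """Cheap statistics of a DIMACS CNF file (assumes one clause per line)."""
        variables, clauses, literals = set(), 0, 0
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith(("c", "p")):
                    continue
                lits = [int(tok) for tok in line.split() if tok != "0"]
                clauses += 1
                literals += len(lits)
                variables.update(abs(l) for l in lits)
        return (len(variables), clauses, literals)

    def candidate_duplicates(paths):
        """Group files sharing identical statistics; such groups deserve a closer look."""
        groups = defaultdict(list)
        for path in paths:
            groups[cnf_stats(path)].append(path)
        return [group for group in groups.values() if len(group) > 1]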

(Corrected List)
Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | 1 / 9 / 10 | 21 / 33 / 54 | 6 / 23 / 34 / 63 | 127
Submitted | 18 / 1 / 19 | 25 / 40 / 65 | 8 / 10 / 63 / 81 | 165
Total | 19 / 10 / 29 | 46 / 73 / 119 | 14 / 33 / 97 / 144 | 292
Details

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total

Submitted
aprove09 | 14 / - / 14 | 4 / - / 4 | - / - / - / - | 18
bio I | 3 / - / 3 | 6 / 11 / 17 | - / - / - / - | 20
bio II | 1 / - / 1 | 4 / 3 / 7 | - / - / 22 / 22 | 30
c32sat | - / - / - | 1 / 3 / 4 | - / 3 / 2 / 5 | 9
bitverif | - / 1 / 1 | - / 11 / 11 | - / 4 / 14 / 18 | 30
crypto | - / - / - | 6 / 5 / 11 | 4 / - / 14 / 18 | 29
diagnosis | - / - / - | 4 / 7 / 11 | 4 / 3 / 11 / 18 | 29

Existing
grieu | - / - / - | 5 / - / 5 | 3 / - / - / 3 | 8
palacios | - / - / - | 1 / 6 / 7 | 1 / 2 / 5 / 8 | 15
jarvisalo | - / - / - | - / 3 / 3 | - / 1 / 2 / 3 | 6
aprove07 | - / - / - | - / - / - | - / 2 / 5 / 7 | 7
narain | - / - / - | - / - / - | - / - / 1 / 1 | 1
mizh | - / - / - | 7 / - / 7 | - / - / - / - | 7
goldb | - / - / - | - / 1 / 1 | - / 5 / - / 5 | 6
manolios | - / 2 / 2 | - / 11 / 11 | - / 3 / - / 3 | 16
babic | - / 3 / 3 | - / - / - | - / - / - / - | 3
misc | 1 / 4 / 5 | 2 / 6 / 8 | 1 / 1 / - / 2 | 15
diagnosis07 | - / - / - | 2 / 3 / 5 | 1 / 1 / 16 / 18 | 23
velev | - / - / - | 4 / 3 / 7 | - / 8 / 5 / 13 | 20

Crafted Category

For the crafted category, the selection of benchmarks was made from the SAT 2007 benchmarks, crafted category (which contained a selection of benchmarks from all the previous competitions).

We followed the same principle as for the application category. The SAT solvers chosen to evaluate the difficulty of the benchmarks are the winners of the 2007 competition (March-KS and Satzilla-Crafted), plus Minisat 2007, which would have been the winner of that category using this year's scoring scheme.

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | - / 4 / 4 | 28 / 56 / 84 | 5 / 18 / 90 / 113 | 201
Submitted | 118 / 10 / 128 | 75 / 9 / 84 | 6 / - / 355 / 371 | 583
Total | 118 / 14 / 132 | 103 / 65 / 168 | 11 / 18 / 445 / 484 | 784

Following the rule applied for the application category, no single source should provide more than 30 benchmarks. The updated numbers follow.

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | - / 4 / 4 | 24 / 51 / 75 | 6 / 18 / 85 / 109 | 188
Submitted | 34 / 10 / 44 | 50 / 9 / 59 | 6 / - / 70 / 76 | 179
Total | 34 / 14 / 48 | 74 / 60 / 134 | 12 / 18 / 155 / 185 | 367
Details

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total

Submitted
parity games | 6 / 8 / 14 | 7 / 2 / 9 | - / - / 7 / 7 | 30
sgen09 | 5 / 1 / 6 | 4 / 2 / 6 | - / - / 18 / 18 | 30
modcircuits | - / 1 / 1 | 4 / 1 / 5 | - / - / 13 / 13 | 19
ramseycube | 1 / - / 1 | 5 / 3 / 8 | - / - / 1 / 1 | 10
rbsat | - / - / - | 11 / 1 / 12 | - / - / 18 / 18 | 30
edge matching | - / - / - | 18 / - / 18 | 6 / - / 6 / 12 | 30
sgi | 22 / - / 22 | 1 / - / 1 | - / - / 7 / 7 | 30

Existing
contest 02 | - / - / - | - / 3 / 3 | 4 / - / 7 / 11 | 14
contest 03 | - / - / - | - / 1 / 1 | - / 6 / 19 / 25 | 26
contest 04 | - / - / - | 1 / 7 / 8 | 1 / 2 / 17 / 20 | 28
jarvisalo | - / - / - | 14 / 4 / 18 | - / 2 / - / 2 | 20
QG | - / - / - | 4 / 16 / 20 | - / 1 / 7 / 8 | 28
phnf | - / - / - | - / 1 / 1 | - / 1 / 8 / 9 | 10
misc | - / - / - | 3 / 3 / 6 | - / - / 10 / 10 | 16
sgen07 | - / 4 / 4 | - / 6 / 6 | - / 2 / 4 / 6 | 16
sabharwal | - / - / - | 2 / 10 / 12 | 1 / 4 / 13 / 18 | 30

Among those benchmarks, 300 were randomly selected to match the ratios of easy, medium and hard benchmarks decided by the judges.

(List)
Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | - / 4 / 4 | 19 / 42 / 61 | 4 / 12 / 58 / 74 | 139
Submitted | 19 / 7 / 26 | 50 / 9 / 59 | 6 / - / 70 / 76 | 161
Total | 19 / 11 / 30 | 69 / 51 / 120 | 10 / 12 / 128 / 150 | 300
Details

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total

Submitted
parity games | 5 / 5 / 10 | 7 / 2 / 9 | - / - / 7 / 7 | 26
sgen09 | 1 / 1 / 2 | 4 / 2 / 6 | - / - / 18 / 18 | 26
modcircuits | - / 1 / 1 | 4 / 1 / 5 | - / - / 13 / 13 | 19
ramseycube | 1 / - / 1 | 5 / 3 / 8 | - / - / 1 / 1 | 10
rbsat | - / - / - | 11 / 1 / 12 | - / - / 18 / 18 | 30
edge matching | - / - / - | 18 / - / 18 | 6 / - / 6 / 12 | 30
sgi | 12 / - / 12 | 1 / - / 1 | - / - / 7 / 7 | 20

Existing
contest 02 | - / - / - | - / 2 / 2 | 2 / - / 5 / 7 | 9
contest 03 | - / - / - | - / 1 / 1 | - / 3 / 11 / 14 | 15
contest 04 | - / - / - | - / 6 / 6 | 1 / 1 / 14 / 16 | 22
jarvisalo | - / - / - | 11 / 3 / 14 | - / 1 / - / 1 | 15
QG | - / - / - | 3 / 14 / 17 | - / 1 / 4 / 5 | 22
phnf | - / - / - | - / 1 / 1 | - / 1 / 6 / 7 | 8
misc | - / - / - | 3 / 2 / 5 | - / - / 9 / 9 | 14
sgen07 | - / 4 / 4 | - / 4 / 4 | - / 2 / 4 / 6 | 14
sabharwal | - / - / - | 2 / 9 / 11 | 1 / 3 / 5 / 9 | 20

Errata: 3 benchmarks were wrongly added twice to the selection, because they matched two different regular expressions used to classify the benchmarks. Those benchmarks are contest04-connm-ue-csp-sat-n800-d0.04-s362703357.sat05-548.reshuffled-07.cnf, contest04-connm-ue-csp-sat-n600-d0.02-s1676244754.sat05-535.reshuffled-07.cnf and contest03-SGI_30_50_30_20_3-dir.sat05-440.reshuffled-07.cnf.
Furthermore, the parity-games generator produces exactly the same benchmark for n=2, i=2 with or without the ce and ci parameters. As such, the instance_n2_i2_pp_ci_ce.cnf benchmark has also been removed from the selection.
The Q32inK08.cnf benchmark from the RamseyCube family wrongly encoded the very same problem as Q3inK09.cnf, so it has been removed as well.
Finally, the sgen generator submitted to the 2009 competition in fact contains two generators: one for satisfiable benchmarks and one for unsatisfiable benchmarks. Only the satisfiable generator was new; the unsat generator is exactly the same as the one used in the SAT 2007 competition. Therefore, the SAT07 and SAT09 unsat sgen benchmarks should be considered to come from the same source, which means that all 14 SAT07 benchmarks from Ivor Spence had to be removed to follow the rule that no more than 30 benchmarks should come from the same source.
The final selection without duplicates is thus a set of 281 benchmarks.