Shortcut: direct links to the selection of Application benchmarks and Crafted benchmarks. You can also reproduce the selection yourself!

Random Category

The random benchmarks have been generated using a uniform random k-SAT instance generator, following the guidelines published by Oliver Kullmann for the SAT 2005 competition.

The instances are generated in two sizes, medium and large. Medium benchmarks target complete solvers while large benchmarks target incomplete solvers. For each medium tuple of parameters (clause length, number of variables, clause-to-variable ratio), 100 instances have been generated randomly. The 2007 winner of the random SAT category, gNovelty+, was used to discriminate between SAT and UNSAT benchmarks: the benchmarks found SAT by gNovelty+ (usually in a few seconds) were classified as SAT, while the others were classified as UNKNOWN (probably UNSAT). Once classified that way, 10 SAT and 10 UNKNOWN benchmarks were selected per tuple of parameters. The large benchmarks have been generated slightly below the threshold, so they are probably SAT: 100 of them were generated, 10 of which were randomly selected per tuple of parameters.

K | Medium (SAT+UNKNOWN): ratio | Medium: number of variables | Large (probably SAT): ratio | Large: number of variables
3 | 4.26 | 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560 | 4.2 | 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000, 18000, 20000*, 22000*, 24000*, 26000*
5 | 21.3 | 90, 100, 110, 120 | 20 | 700, 800, 900, 1000, 1100
7 | 89 | 60, 65, 70, 75 | 81 | 140, 160, 180, 200, 220

Total SAT: 190 (medium), 190+40* (large)
Total UNSAT: 190 (medium), NA (large)
Total: 380 (medium), 230 (large)
Grand total: 610
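
For illustration, here is a minimal Python sketch of a uniform random k-SAT generator in the spirit of the procedure described above: each clause picks k distinct variables uniformly at random, negates each one with probability 1/2, and the instance is written in DIMACS CNF format. The exact sampling details of the competition generator (for instance whether duplicate clauses are rejected) follow Kullmann's guidelines and may differ from this sketch.

    import random

    def random_ksat(k, num_vars, ratio, seed=None):
        """Draw a random k-SAT instance with round(ratio * num_vars) clauses.

        Each clause uses k distinct variables chosen uniformly at random, each
        negated with probability 1/2.  Duplicate clauses are not rejected here;
        the competition generator may handle such details differently.
        """
        rng = random.Random(seed)
        clauses = []
        for _ in range(round(ratio * num_vars)):
            variables = rng.sample(range(1, num_vars + 1), k)
            clauses.append([v if rng.random() < 0.5 else -v for v in variables])
        return clauses

    def to_dimacs(clauses, num_vars):
        """Serialize a clause list in DIMACS CNF format."""
        lines = ["p cnf %d %d" % (num_vars, len(clauses))]
        lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
        return "\n".join(lines)

    # Example: one medium 3-SAT instance at ratio 4.26 with 360 variables
    print(to_dimacs(random_ksat(3, 360, 4.26, seed=1), 360))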

The rationale was to have as many SAT as UNKNOWN benchmarks for complete solvers, and as many UNKNOWN benchmarks as large, probably SAT ones, i.e. specific benchmarks for complete and incomplete solvers.

After the first stage of the random category, it appeared that the incomplete solvers were able to solve basically all the large instances for K=3. So, in agreement with the judges, it was decided to add larger benchmarks (marked with a star above) to challenge the solvers during the second stage.

General principles adopted by the judges

No more than 10% of the benchmarks should come from the same source.

It was decided to build a set of 300 instances. As a consequence, no more than 30 benchmarks could be accepted per source.

The final selection of benchmarks should contain 45% existing benchmarks and 55% submitted benchmarks.

As a consequence, the final set of 300 benchmarks should contain 165 submitted benchmarks and 135 existing benchmarks.

The final selection should contain 10% easy, 40% medium and 50% hard benchmarks.

As a consequence, among the 135 existing benchmarks we should have exactly 13 easy, 54 medium and 68 hard benchmarks, while among the 165 submitted benchmarks we should have exactly 17 easy, 66 medium and 82 hard benchmarks.

Duplicate benchmarks found after the selection was done will simply be removed from the selection. No other benchmarks will be added to the selection.

As a consequence, the final set of benchmarks will contain fewer than 300 benchmarks.
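
The quota arithmetic implied by these rules can be checked in a few lines of Python; the quota values below are the ones stated above, and reading each quota as the stated share rounded to a whole benchmark is our interpretation.

    # Quotas adopted by the judges (numbers from the text above)
    TOTAL = 300
    EXISTING, SUBMITTED = 135, 165          # 45% / 55% of 300
    QUOTAS = {
        "existing":  {"easy": 13, "medium": 54, "hard": 68},
        "submitted": {"easy": 17, "medium": 66, "hard": 82},
    }
    SHARES = {"easy": 0.10, "medium": 0.40, "hard": 0.50}

    # Sanity checks: the splits add up and match the stated percentages
    assert EXISTING + SUBMITTED == TOTAL
    for pool, size in (("existing", EXISTING), ("submitted", SUBMITTED)):
        assert sum(QUOTAS[pool].values()) == size
        for level, share in SHARES.items():
            # each quota is the stated share rounded to a whole benchmark
            assert abs(QUOTAS[pool][level] - size * share) < 1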

Application Category

In the case of the application category, the benchmarks come from several sources:

Existing Benchmarks
come from the two SAT Races (2006 and 2008) plus the SAT 2007 competition, industrial division.
Submitted Benchmarks
are the benchmarks submitted for the SAT 2009 competition.

Note that the IBM benchmarks disappeared from the existing benchmarks because they are no longer available from the IBM web site.

The benchmarks have been classified as EASY/MEDIUM/HARD according to the results of the two winners of the SAT 2007 competition, industrial category (RSAT and PICOSAT), plus the widely used Minisat. Since those three solvers participated in the three competitive events, we simply used the publicly available results to classify the benchmarks.

easy
Solved within 30s by all the solvers.
hard
Not solved by any of the solvers within the timeout of the event (first stage for the SAT competition 07).
medium
The remaining instances.

Note that this classification is relative to the best solvers of 2007 in that category. As a consequence, some SAT solvers might struggle on benchmarks classified as easy, while others may be able to solve some hard benchmarks in reasonable time.
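
As an illustration, here is a minimal sketch of this classification rule, assuming that for each benchmark we have the runtime (in seconds) of every reference solver, with None when the solver did not finish within the timeout. The function and data layout are ours, not the competition scripts.

    def classify(runtimes, timeout):
        """Classify a benchmark as 'easy', 'medium' or 'hard' from reference solver runtimes.

        runtimes: runtimes in seconds of the reference solvers (RSAT, PICOSAT,
                  Minisat) on this benchmark, None when a solver did not finish.
        timeout:  timeout of the event the results were taken from.
        """
        finished = [t for t in runtimes if t is not None and t <= timeout]
        if len(finished) == len(runtimes) and all(t <= 30 for t in finished):
            return "easy"      # solved within 30s by all the reference solvers
        if not finished:
            return "hard"      # solved by none of them within the timeout
        return "medium"        # everything else

    # Example: solved by only two of the three solvers, one of them slowly
    print(classify([25.0, 410.2, None], timeout=1200))   # -> 'medium'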

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | 11 / 29 / 45 | 66 / 87 / 147 | 8 / 29 / 45 / 82 | 295
Submitted | 60 / 38 / 98 | 38 / 60 / 98 | 8 / 12 / 82 / 102 | 298
Total | 71 / 67 / 143 | 104 / 147 / 245 | 16 / 41 / 127 / 184 | 593
Details

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total

Submitted
aprove09 | 21 / 0 / 21 | 4 / 0 / 4 | - / - / - / - | 25
bio I | 3 / 0 / 3 | 6 / 11 / 17 | - / - / - / - | 20
bio II | 9 / 0 / 9 | 4 / 3 / 7 | - / - / 24 / 24 | 40
c32sat | 0 / 1 / 1 | 1 / 3 / 4 | - / 3 / 2 / 5 | 10
bitverif | - / 14 / 14 | - / 22 / 22 | - / 6 / 23 / 29 | 65
crypto | 5 / - / 5 | 7 / 6 / 13 | 4 / - / 20 / 24 | 42
diagnosis | 22 / 23 / 45 | 16 / 15 / 31 | 4 / 3 / 13 / 20 | 96

Existing
grieu | - / - / - | 7 / - / 7 | 3 / - / - / 3 | 10
palacios | - / - / - | 7 / 12 / 19 | 1 / 2 / 5 / 8 | 27
jarvisalo | - / - / - | - / 4 / 4 | - / 1 / 2 / 3 | 7
aprove07 | - / - / - | 1 / 8 / 9 | - / 2 / 5 / 7 | 16
post | - / 1 / 1 | 2 / 3 / 5 | - / 4 / - / 4 | 10
narain | 1 / - / 1 | 2 / 1 / 3 | - / - / 1 / 1 | 5
mizh | - / - / - | 13 / - / 13 | - / - / - / - | 13
goldb | - / - / - | - / 5 / 5 | - / 5 / - / 5 | 10
manolios | - / 4 / 4 | - / 21 / 21 | - / 5 / - / 5 | 26
babic | 5 / 15 / 20 | - / 10 / 10 | - / - / - / - | 30
misc | 5 / 8 / 13 | 7 / 11 / 18 | 1 / 1 / - / 2 | 33
diagnosis07 | - / - / - | 10 / 6 / 16 | 3 / 1 / 27 / 30 | 46
velev | - / 1 / 1 | 17 / 6 / 23 | - / 8 / 5 / 13 | 37

The number of instances submitted by a given source ranged from 10 to 96, so it was not possible to accept all the submissions as is. Limiting the number of instances to at most 30 per submitter changed the landscape a bit:

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | 11 / 29 / 40 | 52 / 84 / 136 | 6 / 29 / 34 / 69 | 245
Submitted | 26 / 3 / 29 | 25 / 40 / 65 | 8 / 10 / 63 / 81 | 175
Total | 37 / 32 / 69 | 77 / 124 / 201 | 14 / 39 / 97 / 150 | 420
Details

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total

Submitted
aprove09 | 21 / 0 / 21 | 4 / 0 / 4 | - / - / - / - | 25
bio I | 3 / 0 / 3 | 6 / 11 / 17 | - / - / - / - | 20
bio II | 1 / 0 / 1 | 4 / 3 / 7 | - / - / 22 / 22 | 30
c32sat | 0 / 1 / 1 | 1 / 3 / 4 | - / 3 / 2 / 5 | 10
bitverif | - / 1 / 1 | - / 11 / 11 | - / 4 / 14 / 18 | 30
crypto | 1 / - / 1 | 6 / 5 / 11 | 4 / - / 14 / 18 | 30
diagnosis | - / 1 / 1 | 4 / 7 / 11 | 4 / 3 / 11 / 18 | 30

Existing
grieu | - / - / - | 7 / - / 7 | 3 / - / - / 3 | 10
palacios | - / - / - | 7 / 12 / 19 | 1 / 2 / 5 / 8 | 27
jarvisalo | - / - / - | - / 4 / 4 | - / 1 / 2 / 3 | 7
aprove07 | - / - / - | 1 / 8 / 9 | - / 2 / 5 / 7 | 16
post | - / 1 / 1 | 2 / 3 / 5 | - / 4 / - / 4 | 10
narain | 1 / - / 1 | 2 / 1 / 3 | - / - / 1 / 1 | 5
mizh | - / - / - | 13 / - / 13 | - / - / - / - | 13
goldb | - / - / - | - / 5 / 5 | - / 5 / - / 5 | 10
manolios | - / 4 / 4 | - / 21 / 21 | - / 5 / - / 5 | 26
babic | 5 / 15 / 20 | - / 10 / 10 | - / - / - / - | 30
misc | 5 / 8 / 13 | 7 / 11 / 18 | 1 / 1 / - / 2 | 33
diagnosis07 | - / - / - | 6 / 6 / 12 | 1 / 1 / 16 / 18 | 30
velev | - / 1 / 1 | 7 / 4 / 11 | - / 8 / 5 / 13 | 25
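
A minimal sketch of the 30-per-source cap applied above: when a source submitted more than 30 instances, a subset of 30 is kept. The random draw (and the hypothetical file names in the example) are assumptions; the text does not specify how the dropped instances were chosen.

    import random

    def cap_per_source(benchmarks_by_source, cap=30, seed=0):
        """Keep at most `cap` benchmarks per source, dropping the extras at random."""
        rng = random.Random(seed)
        capped = {}
        for source, benchmarks in benchmarks_by_source.items():
            if len(benchmarks) <= cap:
                capped[source] = list(benchmarks)
            else:
                capped[source] = rng.sample(benchmarks, cap)
        return capped

    # Example (hypothetical file names): a source with 96 submissions is cut to 30
    pools = {"diagnosis": ["diagnosis-%03d.cnf" % i for i in range(96)]}
    print(len(cap_per_source(pools)["diagnosis"]))   # -> 30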

Finally, once the percentages of existing/submitted and easy/medium/hard benchmarks are applied, we end up with the following selection of benchmarks for the competition:

(List)
Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | 1 / 10 / 11 | 22 / 33 / 55 | 6 / 29 / 34 / 69 | 135
Submitted | 18 / 1 / 19 | 25 / 40 / 65 | 8 / 10 / 63 / 81 | 165
Total | 19 / 11 / 30 | 47 / 73 / 120 | 14 / 39 / 97 / 150 | 300

The initial set of benchmarks contained several duplicated benchmarks. This was discovered once the benchmarks were added to the competition system, because several statistics regarding the benchmarks are automatically computed: manol-pipe-f9b.cnf and manol-pipe-g10nid.cnf appeared in both the HARD and MEDIUM categories, and the post* benchmarks submitted to the SAT 2009 competition were exactly the ones used during SAT Race 2008 (we simply removed holes in the variable numbering). It was decided to remove those duplicate benchmarks from the selection, without adding new ones.
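
The duplicates were spotted through the statistics that the competition system computes automatically for every benchmark. A rough sketch of how such a check could be scripted is given below; grouping files by cheap statistics (number of variables, clauses and literals) is our own choice of fingerprint, and it only flags candidate duplicates for manual inspection.

    from collections import defaultdict

    def cnf_stats(path):
        """Cheap statistics of a DIMACS CNF file (assumes one clause per line)."""
        variables, clauses, literals = set(), 0, 0
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith(("c", "p")):
                    continue
                lits = [int(tok) for tok in line.split() if tok != "0"]
                clauses += 1
                literals += len(lits)
                variables.update(abs(l) for l in lits)
        return (len(variables), clauses, literals)

    def candidate_duplicates(paths):
        """Group files sharing identical statistics; such groups deserve a closer look."""
        groups = defaultdict(list)
        for path in paths:
            groups[cnf_stats(path)].append(path)
        return [group for group in groups.values() if len(group) > 1]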

(Corrected List)
Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | 1 / 9 / 10 | 21 / 33 / 54 | 6 / 23 / 34 / 63 | 127
Submitted | 18 / 1 / 19 | 25 / 40 / 65 | 8 / 10 / 63 / 81 | 165
Total | 19 / 10 / 29 | 46 / 73 / 119 | 14 / 33 / 97 / 144 | 292
Details

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total

Submitted
aprove09 | 14 / - / 14 | 4 / - / 4 | - / - / - / - | 18
bio I | 3 / - / 3 | 6 / 11 / 17 | - / - / - / - | 20
bio II | 1 / - / 1 | 4 / 3 / 7 | - / - / 22 / 22 | 30
c32sat | - / - / - | 1 / 3 / 4 | - / 3 / 2 / 5 | 9
bitverif | - / 1 / 1 | - / 11 / 11 | - / 4 / 14 / 18 | 30
crypto | - / - / - | 6 / 5 / 11 | 4 / - / 14 / 18 | 29
diagnosis | - / - / - | 4 / 7 / 11 | 4 / 3 / 11 / 18 | 29

Existing
grieu | - / - / - | 5 / - / 5 | 3 / - / - / 3 | 8
palacios | - / - / - | 1 / 6 / 7 | 1 / 2 / 5 / 8 | 15
jarvisalo | - / - / - | - / 3 / 3 | - / 1 / 2 / 3 | 6
aprove07 | - / - / - | - / - / - | - / 2 / 5 / 7 | 7
narain | - / - / - | - / - / - | - / - / 1 / 1 | 1
mizh | - / - / - | 7 / - / 7 | - / - / - / - | 7
goldb | - / - / - | - / 1 / 1 | - / 5 / - / 5 | 6
manolios | - / 2 / 2 | - / 11 / 11 | - / 3 / - / 3 | 16
babic | - / 3 / 3 | - / - / - | - / - / - / - | 3
misc | 1 / 4 / 5 | 2 / 6 / 8 | 1 / 1 / - / 2 | 15
diagnosis07 | - / - / - | 2 / 3 / 5 | 1 / 1 / 16 / 18 | 23
velev | - / - / - | 4 / 3 / 7 | - / 8 / 5 / 13 | 20

Crafted Category

For the crafted category, the selection of benchmarks was made from the SAT 2007 benchmarks, crafted category (which contained a selection of benchmarks from all the previous competitions).

We followed the same principle as for the application category. The SAT solvers chosen to evaluate the difficulty of the benchmarks are the winners of the 2007 competition (March-KS and Satzilla-Crafted), plus Minisat 2007, which would have been the winner of that category using this year's scoring scheme.

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | - / 4 / 4 | 28 / 56 / 84 | 5 / 18 / 90 / 113 | 201
Submitted | 118 / 10 / 128 | 75 / 9 / 84 | 6 / - / 355 / 371 | 583
Total | 118 / 14 / 132 | 103 / 65 / 168 | 11 / 18 / 445 / 484 | 784

Following the rule applied for the application category, no single source should provide more than 30 benchmarks. The updated numbers follow.

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | - / 4 / 4 | 24 / 51 / 75 | 6 / 18 / 85 / 109 | 188
Submitted | 34 / 10 / 44 | 50 / 9 / 59 | 6 / - / 70 / 76 | 179
Total | 34 / 14 / 48 | 74 / 60 / 134 | 12 / 18 / 155 / 185 | 367
Details

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total

Submitted
parity games | 6 / 8 / 14 | 7 / 2 / 9 | - / - / 7 / 7 | 30
sgen09 | 5 / 1 / 6 | 4 / 2 / 6 | - / - / 18 / 18 | 30
modcircuits | - / 1 / 1 | 4 / 1 / 5 | - / - / 13 / 13 | 19
ramseycube | 1 / - / 1 | 5 / 3 / 8 | - / - / 1 / 1 | 10
rbsat | - / - / - | 11 / 1 / 12 | - / - / 18 / 18 | 30
edge matching | - / - / - | 18 / - / 18 | 6 / - / 6 / 12 | 30
sgi | 22 / - / 22 | 1 / - / 1 | - / - / 7 / 7 | 30

Existing
contest 02 | - / - / - | - / 3 / 3 | 4 / - / 7 / 11 | 14
contest 03 | - / - / - | - / 1 / 1 | - / 6 / 19 / 25 | 26
contest 04 | - / - / - | 1 / 7 / 8 | 1 / 2 / 17 / 20 | 28
jarvisalo | - / - / - | 14 / 4 / 18 | - / 2 / - / 2 | 20
QG | - / - / - | 4 / 16 / 20 | - / 1 / 7 / 8 | 28
phnf | - / - / - | - / 1 / 1 | - / 1 / 8 / 9 | 10
misc | - / - / - | 3 / 3 / 6 | - / - / 10 / 10 | 16
sgen07 | - / 4 / 4 | - / 6 / 6 | - / 2 / 4 / 6 | 16
sabharwal | - / - / - | 2 / 10 / 12 | 1 / 4 / 13 / 18 | 30

Among those benchmarks, 300 were randomly selected to match the ratios of easy, medium and hard benchmarks decided by the judges.

(List)
Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total
Existing | - / 4 / 4 | 19 / 42 / 61 | 4 / 12 / 58 / 74 | 139
Submitted | 19 / 7 / 26 | 50 / 9 / 59 | 6 / - / 70 / 76 | 161
Total | 19 / 11 / 30 | 69 / 51 / 120 | 10 / 12 / 128 / 150 | 300
Details

Origin | EASY (SAT/UNSAT/ALL) | MEDIUM (SAT/UNSAT/ALL) | HARD (SAT/UNSAT/UNKNOWN/ALL) | Total

Submitted
parity games | 5 / 5 / 10 | 7 / 2 / 9 | - / - / 7 / 7 | 26
sgen09 | 1 / 1 / 2 | 4 / 2 / 6 | - / - / 18 / 18 | 26
modcircuits | - / 1 / 1 | 4 / 1 / 5 | - / - / 13 / 13 | 19
ramseycube | 1 / - / 1 | 5 / 3 / 8 | - / - / 1 / 1 | 10
rbsat | - / - / - | 11 / 1 / 12 | - / - / 18 / 18 | 30
edge matching | - / - / - | 18 / - / 18 | 6 / - / 6 / 12 | 30
sgi | 12 / - / 12 | 1 / - / 1 | - / - / 7 / 7 | 20

Existing
contest 02 | - / - / - | - / 2 / 2 | 2 / - / 5 / 7 | 9
contest 03 | - / - / - | - / 1 / 1 | - / 3 / 11 / 14 | 15
contest 04 | - / - / - | - / 6 / 6 | 1 / 1 / 14 / 16 | 22
jarvisalo | - / - / - | 11 / 3 / 14 | - / 1 / - / 1 | 15
QG | - / - / - | 3 / 14 / 17 | - / 1 / 4 / 5 | 22
phnf | - / - / - | - / 1 / 1 | - / 1 / 6 / 7 | 8
misc | - / - / - | 3 / 2 / 5 | - / - / 9 / 9 | 14
sgen07 | - / 4 / 4 | - / 4 / 4 | - / 2 / 4 / 6 | 14
sabharwal | - / - / - | 2 / 9 / 11 | 1 / 3 / 5 / 9 | 20

Errata: 3 benchmarks were wrongly added twice to the selection, because they matched two different regular expressions used to classify the benchmarks. Those benchmarks are contest04-connm-ue-csp-sat-n800-d0.04-s362703357.sat05-548.reshuffled-07.cnf, contest04-connm-ue-csp-sat-n600-d0.02-s1676244754.sat05-535.reshuffled-07.cnf and contest03-SGI_30_50_30_20_3-dir.sat05-440.reshuffled-07.cnf.
Furthermore, the parity-games generator produces exactly the same benchmark for n=2, i=2 with or without the ce and ci parameters. As such, the instance_n2_i2_pp_ci_ce.cnf benchmark has also been removed from the selection.
The Q32inK08.cnf benchmark from the RamseyCube family wrongly encoded the very same problem as Q3inK09.cnf, so it has been removed as well.
Finally, the sgen generator submitted to the 2009 competition in fact contains two generators: one for satisfiable benchmarks and one for unsatisfiable benchmarks. Only the satisfiable generator was new; the unsat generator is exactly the same as the one used in the SAT 2007 competition. Therefore, the SAT07 and SAT09 unsat sgen benchmarks should be considered to come from the same source, which means that all 14 SAT07 benchmarks from Ivor Spence had to be removed to follow the rule that no more than 30 benchmarks should come from the same source.
The final selection without duplicates is thus a set of 281 benchmarks.