The general idea was to reward the most ``generic'' solver, i.e., the one able to solve the widest range of problems. However, it seemed nonsensical to compare a solver tailored for random 3-SAT instances with one tailored for Electronic Design Automation (EDA) instances, and the same remark applies to complete and incomplete solvers. So we divided the space of SAT experiments into six categories: industrial, handmade, and random benchmarks, each for either complete solvers (which had to solve both satisfiable and unsatisfiable benchmarks) or all solvers (in the latter case, only satisfiable and ``unknown'' benchmarks were used, and only satisfiable ones were counted). Submitters were asked to stamp their benchmarks with the correct category.

To rank the solvers in each category, we decided to use the notion
of series: a series of benchmarks is a set of closely related
benchmarks (for instance, the pigeon-hole series consists of the `hole-2`, `hole-3`, etc.
instances). We considered that a series was solved if and only if at
least one of the instances of that series was solved. Thus the idea
was to award a solver solving a maximum number of series in a given
category. To break ties, we decided to count the total number of
instances solved. We planned to use CPU time as a last resort, but we
did not have to use it (note that, apart from its interaction with the
CPU cut-off limit, pure CPU-time performance does not play a crucial
role in the results: two solvers have the same performance if they
solve a benchmark, whatever the exact CPU time each takes). Benchmarks
were grouped into series by us; authors were only allowed to submit
*families* of benchmarks (a series was made of one or more families of
benchmarks).
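The ranking rule above (series solved first, total instances solved as a tie-breaker, CPU time as a last resort) can be sketched as follows. This is a minimal illustration, not the competition's actual scoring code; the solver names and result data are invented for the example.

```python
# Hypothetical results: for each solver, the set of instances it solved,
# grouped by series. Names are illustrative only.
results = {
    "solver-A": {"hole": {"hole-2", "hole-3"}, "urq": {"urq-1"}},
    "solver-B": {"hole": {"hole-2"}, "urq": set()},
}

def rank_key(solved_by_series, cpu_time=0.0):
    """Build a sort key: series solved, then instances solved, then CPU time.

    A series counts as solved if at least one of its instances was solved.
    More series and more instances are better; less CPU time is better,
    hence the negation so that a single descending sort works.
    """
    series_solved = sum(1 for insts in solved_by_series.values() if insts)
    instances_solved = sum(len(insts) for insts in solved_by_series.values())
    return (series_solved, instances_solved, -cpu_time)

# Rank solvers from best to worst in a given category.
ranking = sorted(results, key=lambda s: rank_key(results[s]), reverse=True)
```

Note that the exact CPU time only matters at the very last tie-breaking level, matching the observation that it did not end up being needed.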

Furthermore, if there was a scaling factor between the instances of
a series (`hole-2`, `hole-3`, `hole-4`, etc.), then we did not launch
a solver on the biggest instances if it failed to solve any smaller
one. The initial idea of this ``heuristic'' (well founded in practice
on the *pigeon hole* example) was to save CPU time (allowing us to
discard weak solvers quickly). Later, it turned out that this choice
had an important impact on the results and was not well founded in
general (we will discuss this later in the paper). The scaling
information for families of benchmarks was given only by the benchmark
authors.
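The scaling heuristic amounts to running a solver on the instances of a series in increasing order of size and abandoning the series at the first failure. A minimal sketch, where `solve` is a stand-in for running a solver on one instance (the function and instance representation are assumptions for illustration):

```python
def run_scaling_series(solve, instances):
    """Run `solve` on instances ordered smallest-first; skip all larger
    instances after the first failure (the scaling ``heuristic'').

    `solve(instance)` returns True if the solver solved the instance
    within the cut-off, False otherwise.
    """
    solved = []
    for inst in instances:  # assumed sorted by increasing size
        if solve(inst):
            solved.append(inst)
        else:
            break  # presume the larger instances would also time out
    return solved
```

The `break` encodes the assumption criticized in the text: that failing on a small instance implies failure on all larger ones, which holds for pigeon-hole formulas but not in general.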