Graphical analysis of SOTA cpu performances

The SOTA ranking also allows to focus on subsets of solvers/benchmarks. We can for instance represent the above results in a graphical way, mixing this time complete and incomplete solvers. Doing this, we can take CPU time results into account to give a better picture of SOTA contributors performance. Figures 1, 2 and 3 show for all respective SOTA contributors how much CPU time is needed to solve an increasing number of instances.

Each curve shows exactly how many difficult benchmarks were solved by a particular SOTA contributor within a given time limit for each benchmark. At ordinate 500, one can find how many difficult instances can be solved by each solver within 500 seconds each. So it is possible to see the impact of the time limit on the SOTA contributors.

Take for instance figure 1. Modoc solved very quickly (a few seconds) around 20 difficult instances, nothing more within 500s, 26 in 2400s. Zchaff solved very quickly 40 instances, 80 within 500s, 104 within 2400s. etc.

**Figure 1:** Number of instances solved vs. CPU time for SOTA contributors: Industrial benchmarks
$\begin{figure}\centerline{\epsfig{figure=sorted-color-Industrial-Sota-AllBenchs.eps, width=12cm}}\end{figure}$

**Figure 2:** Number of instances solved vs. CPU time for SOTA contributors: Handmade benchmarks
$\begin{figure}\centerline{\epsfig{figure=sorted-color-Hand-Made-Sota-AllBenchs.eps, width=12cm}}\end{figure}$

**Figure 3:** Number of instances solved vs. CPU time for SOTA contributors: Randomly generated benchmarks (note that the cpu-time slice was smaller for these benchmarks). For each complete solvers (2clseq and OKsolver), we plot two curves. (1) a curve for SAT results only (in order to compare their performances to uncomplete solvers) and (2) for all benchmarks (SAT+UNSAT).
$\begin{figure}\centerline{\epsfig{figure=sorted-color-new-Random-Sota-AllBenchs.eps, width=12cm}}\end{figure}$

Such representation is important for the validation of the cpu-time slice parameter. For this competition, it was arbitrarily chosen¹¹ and one can ask whether this value can play a crucial role in the results.

The first observation from all the figures is that, in general, the order of the respective curves does not change after a certain cpu-time (there is mostly no crossing of lines after 100s). There are however two exceptions. First, on figure 1, for 2clseq and zchaff: 2clseq did not solve any other problem after 1000s, which allowed zchaff to solve more problems in the given cpu-time. 2clseq would probably have been considered in a better way if the cut-off had been fixed to less than 1000s. Secondly, on Handmade benchmarks, the Berkmin curve crosses the one of zchaff just after 2000s (this can be due to the internal bug of Berkmin). One can ask the question of what happened to the results if the cpu-time slice was less than 2000s. At last, as expected, one can notice on figure 3 that uncomplete solvers obtain the best performances on SAT instances (the rightmost curve is OKsolver on SAT+UNSAT benchmarks). Obviously - OKsolver is complete -, the picture is quite different as soon as we take into account all (SAT+UNSAT) answers for OKsolver. It may be at last interesting to notice that all uncomplete solvers obtain more or less the same curve (observe how close are the curves for dlmsat1, dlmsat2, unitwalk, usat05 and usat10).

Two conclusions may be drawn. First, the growth of the curves on industrial benchmarks clearly shows that 2500s is enough. Moreover, we can guess that a cpu-time of 1500s would have been sufficient. On hand-made instances, the picture is not so clear. May be this is due to the scalability of generated instances, which sometimes allows a smooth growth of the needed cpu-time (a smooth growth may also be observed on random instances).