To explore the extent to which agreement exists, we perform rank correlation tests for agreement in multiple judgements [KanjiKanji1999] (we refer to this test as an MRC). In our experiment, the judges are the planners and the subjects are the problem instances. We perform a distinct MRC for each domain/level combination, showing in each case how the planners ranked the instances in that domain and level. We therefore perform 25 MRCs for the fully-automated planners (there were 25 distinct domain/level pairs in which the fully-automated planners competed), 23 for the hand-coded planners on the small problems (the hand-coded planners did not compete in the Freecell STRIPS or Settlers NUMERIC domains), and 22 for the hand-coded planners on the large problems (among which there were no Satellite HARDNUMERIC instances). The results of these tests are shown in Figure 26. In each test the planners rank the
problem instances in order of time taken to solve. Unsolved problems create no difficulties, since they are pushed to the top end of the ranking. The MRC determines whether the independent rankings made by the
planners agree. The test statistic follows an F-distribution, and the
degrees of freedom determine the critical value that the statistic must exceed.
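
As an illustration of the kind of computation involved, the following Python sketch computes Kendall's coefficient of concordance W and its usual F approximation for a single domain/level combination. It assumes that the MRC corresponds to this standard formulation, with F = (k-1)W/(1-W) on (n - 1 - 2/k, (k-1)(n - 1 - 2/k)) degrees of freedom for k planners and n instances; the exact variant given in [KanjiKanji1999], including any continuity or tie corrections, may differ. The function names, and the use of an infinite time to represent an unsolved instance, are hypothetical choices made for the example.

```python
import math


def average_ranks(times):
    """Rank times in ascending order (1 = fastest); ties share their mean rank."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    ranks = [0.0] * len(times)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and times[order[end + 1]] == times[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def mrc_statistic(times_by_planner):
    """Return (W, F, df1, df2) for k planners (rows) ranking n instances (columns).

    Assumption: unsolved instances are encoded as math.inf, which places them
    at the top end of each planner's ranking. No tie correction is applied.
    """
    k = len(times_by_planner)
    n = len(times_by_planner[0])
    rankings = [average_ranks(row) for row in times_by_planner]
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = k * (n + 1) / 2
    s = sum((t - mean_sum) ** 2 for t in rank_sums)
    w = 12 * s / (k ** 2 * (n ** 3 - n))
    f = (k - 1) * w / (1 - w) if w < 1 else math.inf
    df1 = n - 1 - 2 / k
    df2 = (k - 1) * df1
    return w, f, df1, df2


# Hypothetical timings: three planners, four instances, one unsolved instance.
times = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 5.0, math.inf],
    [0.5, 1.0, 2.0, 3.0],
]
w, f, df1, df2 = mrc_statistic(times)
```

Under these assumptions, agreement among the planners would be judged significant when the returned F value exceeds the tabulated critical value for (df1, df2) degrees of freedom at the chosen significance level.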