We consider three different ways in which TILDE can be run in its ILPROLOG implementation: (1) without query packs, each candidate query being executed separately through meta-calls, as in the original implementation; (2) with query packs, but with the queries in a pack executed in the disjoint way, i.e., one by one; and (3) with query packs, using the pack execution mechanism.
The most interesting information is obtained by comparing (a) the actual query evaluation times in settings 2 and 3: this shows the efficiency gain obtained by the removal of redundant computation itself (abbreviated as exec in the tables); and (b) the total execution times in settings 1 and 3: this indicates how much is gained by implementing packs in an ILP system when all other effects are taken into account (the re-implementation of the computation of heuristics via a bit matrix, the use of compiled queries instead of meta-calls, etc.), in other words, the net effect of the whole re-implementation (indicated as net in the tables).
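In other words, writing $T^{\mathit{exec}}_i$ for the query evaluation time and $T^{\mathit{tot}}_i$ for the total induction time in setting $i$ (this notation is introduced here for illustration only and does not occur in the tables), the two speedup figures can be sketched as
\[
\mathit{exec} = \frac{T^{\mathit{exec}}_{2}}{T^{\mathit{exec}}_{3}},
\qquad
\mathit{net} = \frac{T^{\mathit{tot}}_{1}}{T^{\mathit{tot}}_{3}} .
\]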
In a first experiment we used Bongard problems, varying (1) the size of the data sets; (2) the complexity of the target hypothesis; and (3) TILDE's lookahead parameter. The complexity of the target hypothesis can be small, medium, or none. In the latter case the examples are random, which causes TILDE to grow ever larger trees in an attempt to find a good hypothesis; the size of the final tree then typically depends on the size of the data set. The lookahead parameter is used to control the number of levels the packs contain: with lookahead $l$, packs of depth $l+1$ are generated.
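To give a rough idea of how quickly packs grow with the lookahead, assume (only for this illustration) a uniform branching factor $b$ at every level of the pack; a pack of depth $l+1$ then contains on the order of
\[
b + b^{2} + \cdots + b^{\,l+1} \;=\; \frac{b\,(b^{\,l+1}-1)}{b-1}
\]
nodes and roughly $b^{\,l+1}$ maximal conjunctions, so pack size grows exponentially with the lookahead.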
Table 1 gives an overview of results for the Bongard problems. The total induction time is reported, as well as (for pack-based execution mechanisms) the time needed for pack compilation and pack execution. Note that the total time includes not only pack compilation and execution, but also all other computations not directly related to packs (e.g., the computation of heuristics from the bit matrix). The results can be interpreted as follows.
First of all, the table shows that significant speedups can be obtained by using the pack mechanism: net speedups of more than a factor of 5.5 are obtained, while query execution itself is up to 75 times faster than with disjoint execution.
A further observation is that greater speedups are obtained for more complex target hypotheses. This can be explained by the broom-like form of the packs in TILDE. Complex target hypotheses correspond to deep trees, and refinement of a node at a lower level of such a tree yields a pack with a long clause before the branching, which in accordance with our previous analysis should yield a speedup closer to the branching factor $b$ in the case of lookahead 0 (and, more generally, closer to $b^{\,l+1}$ for lookahead $l$, although the latter is much harder to achieve). Note that the maximum branching factor occurring in each pack is included in the table in the column labelled $b$.
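The intuition behind these limits can be made explicit with a simplified cost model (the symbols $T_p$ and $T_r$ are introduced here for illustration only): let $T_p$ be the cost of evaluating the long shared prefix of the pack on one example and $T_r$ the cost of one of the $b^{\,l+1}$ branches below it. Disjoint execution pays $T_p$ once per query, pack execution only once per pack, so
\[
\mathit{speedup} \;=\; \frac{b^{\,l+1}\,(T_p + T_r)}{T_p + b^{\,l+1}\,T_r} \;\longrightarrow\; b^{\,l+1} \quad \text{as } T_r/T_p \to 0 ,
\]
which reduces to the branching factor $b$ for lookahead 0.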
Finally, deeper packs also yield higher speedups, and this effect is larger for more complex theories. This is understandable considering the following. Let us call the clause that is being refined $c$. With lookahead $l$, conjunctions of $l+1$ literals are added to the clause. In some cases the first of these $l+1$ literals may fail immediately, which causes this branch of the pack to have almost no execution time while cutting away $b^{\,l}$ queries. Remember that according to our analysis the speedup can in the limit approximate $b^{\,l+1}$ if the complexity of clause $c$ dominates the complexity of the rest of the pack; such ``early failing branches'' in the pack cause the actual situation to approximate this ideal case more closely.
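As a concrete illustration with purely hypothetical numbers: with $b = 4$ and lookahead $l = 1$, a pack holds $4^{2} = 16$ maximal conjunctions; if one of the four first-level literals fails immediately on an example, the $4^{1} = 4$ conjunctions below it are discarded at negligible cost, so a larger fraction of the work for that example is spent in the shared clause $c$, pushing the observed speedup closer to the $b^{\,l+1}$ limit.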
We have also run experiments on the Mutagenesis data set (Table 2), both in a regression and a classification setting. Here, the query packs are much larger than for the Bongard data set (there is a higher branching factor); with a lookahead of 2 the largest packs contained over 20000 queries. For these large packs a significant amount of time is spent compiling the pack, but even then clear net speedups are obtained. A comparison of execution times turned out to be infeasible because in the disjoint execution setting the pack structures consumed too much memory.