Next: Explicit Solution in the
Up: Calculating Rule Vectors using
Previous: Optimizing in the Setting
As previously argued in Theorem 4, minimizing
in the setting of Schapire
Singer (1998) can be done optimally, but at the expense of complex optimization procedures, with large complexities. One can wonder whether such procedures, to optimize only the computation of
(a small part of WIDC), are really well worth the adaptation to our setting, in which more values are authorized. We are now going to show that a much simpler combinatorial procedure, with comparatively very low complexity, can bring optimal results in fairly general situations. The most simple way to describe most of these situations is to make the following assumption on the examples:
- (
)
- Each example used to compute
has only one ``1'' in its class vector.
A careful reading of assumption (
) reveals that it implies that each example belongs to exactly one class, but it does not prevent an observation to be element of more than one class, as long as different examples sharing the same observation have different classes (the ``1'' of the class vectors is in different positions among these examples). Therefore, even if it does not integrate the most general features of the ranking loss setting, our assumption still authorizes to consider problems with non zero Bayes optimum. This is really interesting, as many commonly used datasets fall into the category of our assumption, as for example many datasets of the UCI repository of Machine Learning database [Blake et al.1998]. Finally, even if the assumption does not hold, we show that in many of the remaining (interesting) cases, our approximation algorithm is asymptotically optimal, that is, finds solutions closer to the minimal value of
as
increases.
Suppose for now that (
) holds. Our objective is to calculate the vector
of some monomial
. We use the shorthands
to denote the sum of weights of the examples satisfying
and belonging respectively to classes
. We want to minimize
as proposed in eq. (3). Suppose without loss of generality that
otherwise, reorder the classes so that they verify this assertion. Given only three possible values for each component of
, the testing of all
possibilities for
is exponential and time-consuming. But we can propose a very fast approach. We have indeed
Proof: See the Appendix.
Thus, the optimal
does not belong to a set of cardinality
, but to a set of cardinality
. Our algorithm is then straightforward: simply explore this set of
elements, and keep the vector having the lowest value of
. Note that this combinatorial algorithm has the advantage to be adaptable to more general settings in which
particular values are authorized for the components of
, for any fixed
not necessarily equal to 3. In that case, the complexity is larger, but limited to
.
There are slightly more general settings in which our algorithm remains optimal, in particular when we can certify
,
. Here,
denotes the sum of weights of the examples belonging at least to class
, and
denotes the sum of weights of the examples belonging at least to class
, and not belonging at least to class
. This shows that even for some particular multilabel cases, our approximation algorithm can remain optimal. One can wonder if the optimality is preserved in the unrestricted multilabel framework. We now show that, if optimality is not preserved, we can still prove the quality of our algorithm for general multilabel cases, showing asymptotic optimality as
increases.
Our approximation algorithm is run in the multilabel case by transforming the examples as follows: each example
for which
is transformed into
examples, having the same description
, and only one ``1'' in their vector, in such a way that we span the
``1'' of the original example. Their weight is the one of the original example, divided by
. We then run our algorithm on this new set of examples satisfying assumption (
).
Now, suppose that for any example
, we have
for some
. There are two interesting vectors we use. The first one is
, the optimal vector (or an optimal vector) minimizing
over the original set of examples, the second one is
, the vector we find minimizing
over the transformed set of examples. What we want is to estimate the quality of
with respect to the optimal value of
over the original set of examples,
using our notation. The following theorem gives an answer to this problem, by quantifying its convergence towards
.
Proof: See the Appendix.
Therefore, in the set of all problems for which for some
,
, we obtain
, and our bound converges to the optimum as
increases in this class of problems. By means of words, our simple approximation algorithm is quite efficient for problems with large number of classes. Note that using a slightly more involved proof, we could have reduced the constant ``
'' factor in Theorem 6 to the slightly smaller ``
''. Now, to fix the ideas, the following subsection displays the explicit (and simple) solution when there are only two classes.
Next: Explicit Solution in the
Up: Calculating Rule Vectors using
Previous: Optimizing in the Setting
�2002 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.