Description |
The notion of boosting subsumes methods for improving learning
algorithms. If a learning algorithm fails to provide arbitrarily
good results for a specific concept learning task, i.e. it fails
to output arbitrarily good hypotheses with arbitrarily high
probability, it can often be "boosted" to perform much better.
In the PAC-learning scenario the following is required of a learning
algorithm A:
- For arbitrary δ > 0 and ε > 0, and
- for each target concept c of a given hypothesis language LH,
A has to be able to choose a hypothesis,
- approximating c ε-well, i.e. with error at most ε,
- with a probability of at least (1 - δ),
- after reading a random training sample of size polynomial in
  (1 / δ) and (1 / ε).
Please follow the link to "PAC-Learning" for more details.
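Stated compactly (a sketch, using notation introduced only here:
h is the hypothesis output by A, err(h) its error with respect to c
under the example distribution, and |E| the size of the training
sample):

    \Pr\left[\,\mathrm{err}(h) \le \varepsilon\,\right] \;\ge\; 1 - \delta,
    \qquad |E| \le \mathrm{poly}\!\left(\frac{1}{\varepsilon}, \frac{1}{\delta}\right).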
The requirements of PAC-learning are strong and usually do not hold
in practice for common learning algorithms. Instead one has
so-called weak learners, which perform just slightly better than
random guessing.
A weak learning algorithm manages,
- for each target concept of a given hypothesis language,
to
- output a hypothesis with error below 1/2 - 1/p(|E|),
- with a probability greater than 0,
- when provided with a random training sample E,
where p is a polynomial (see the restatement below).
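Written out as a formula (again a sketch, with err(h) the error of
the output hypothesis h and |E| the size of the training sample):

    \mathrm{err}(h) \;\le\; \frac{1}{2} \;-\; \frac{1}{p(|E|)}.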
The aim of boosting is to improve the accuracy (1 - ε above)
reached by a weak learner, and to reduce the probability of outliers
(δ above), i.e. the probability that random samples lead to bad
hypotheses.
Therefore boosting algorithms ...
- run a given weak learner several times, where in each round
  either the set of examples is slightly altered, e.g. by
  shifting the weight of the examples, or the algorithm is allowed
  to read new examples each time, and
- combine the resulting set of hypotheses into one final hypothesis,
  e.g. by using a weighted majority vote (a sketch of this scheme
  follows below).
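The following is a minimal Python sketch of this scheme in the style
of AdaBoost, one concrete boosting algorithm. The interface
weak_learner(X, y, w) returning a callable hypothesis, the label
encoding {-1, +1}, and all other names are illustrative assumptions,
not part of this entry:

    import numpy as np

    def boost(X, y, weak_learner, rounds=10):
        # y is assumed to hold labels -1/+1; weak_learner(X, y, w) is
        # assumed to fit on the weighted sample and return a callable
        # hypothesis h with h(X) -> predicted labels.
        y = np.asarray(y)
        n = len(y)
        w = np.full(n, 1.0 / n)          # example weights, initially uniform
        hypotheses, alphas = [], []
        for _ in range(rounds):
            h = weak_learner(X, y, w)    # run the weak learner on the reweighted sample
            pred = h(X)
            err = max(np.sum(w[pred != y]), 1e-12)   # weighted training error
            if err >= 0.5:               # not better than random guessing: stop
                break
            alpha = 0.5 * np.log((1.0 - err) / err)  # vote weight of this hypothesis
            w *= np.exp(-alpha * y * pred)           # shift weight towards misclassified examples
            w /= w.sum()
            hypotheses.append(h)
            alphas.append(alpha)

        def combined(X_new):             # weighted majority vote of all hypotheses
            votes = sum(a * h(X_new) for a, h in zip(alphas, hypotheses))
            return np.sign(votes)

        return combined

A decision stump (a one-level decision tree) is a typical choice of
weak learner for such a loop.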
|