Description
AdaBoost is a boosting algorithm: it runs a given weak learner several
times on slightly altered training data and combines the resulting hypotheses
into one final hypothesis, in order to achieve higher accuracy than a
single hypothesis of the weak learner would have.
The main idea of AdaBoost is to assign a weight to each example of the
given training set.
At the beginning all weights are equal, but in every round the
weak learner returns a hypothesis, and the weights of all examples
misclassified by that hypothesis are increased.
That way the weak learner is forced to focus on the difficult
examples of the training set.
The final hypothesis is a combination of the hypotheses of all rounds,
namely a weighted majority vote, where hypotheses with lower
classification error have higher weight.
In detail:
Given
- a set E = { (x1, y1), ..., (xn, yn) } of classified examples, where
xi ∈ X and yi ∈ Y for i = 1, ..., n. Here we assume Y = {-1, +1},
e.g. instances that are not covered by a concept to be learned have
label -1 and the ones covered have label +1.
- a weak learning algorithm that can deal with weighted example sets.
Such a learning algorithm reads an example set E and a distribution D.
In the simplest case, where all hypotheses that can be output are
functions X → {-1, +1}, the algorithm tries to find a hypothesis h with
minimal probability of misclassification, given that an example is
drawn from X with respect to D. The case of other possible hypotheses
can be addressed by using more complex error measures. A sketch of such
a weak learner is given after this list.
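To make "a weak learning algorithm that can deal with weighted example sets" concrete, here is a minimal sketch of such a learner, assuming one-dimensional real-valued inputs: a decision stump that picks the threshold and polarity minimizing the weighted misclassification error under D. The function names and the NumPy-based setup are illustrative assumptions, not part of the description above.

import numpy as np

def train_stump(x, y, d):
    """Weak learner for weighted examples: a decision stump on 1-D inputs.

    x: (n,) feature values; y: (n,) labels in {-1, +1};
    d: (n,) weights forming a distribution (non-negative, summing to 1).
    Returns the (threshold, polarity, error) minimizing the weighted error,
    i.e. the sum of d[i] over misclassified examples i.
    """
    best = (None, None, np.inf)
    for threshold in np.unique(x):
        for polarity in (+1, -1):
            pred = np.where(x >= threshold, polarity, -polarity)
            err = np.sum(d[pred != y])        # weighted misclassification error
            if err < best[2]:
                best = (threshold, polarity, err)
    return best

def stump_predict(x, threshold, polarity):
    """Hypothesis h: X → {-1, +1} represented by a single threshold."""
    return np.where(x >= threshold, polarity, -polarity)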
The algorithm:
Let Dt(i) denote the weight of example i in round t.
- Initialization: Assign each example (xi, yi) ∈ E the weight D1(i) := 1/n.
- For t = 1 to T:
  - Call the weak learning algorithm with example set E and weights given by Dt.
  - Get a weak hypothesis ht : X → ℝ.
  - Update the weights of all examples.
- Output the final hypothesis, generated from the hypotheses of rounds 1 to T.
Updating the weights in round t:
Dt+1(i) := Dt(i) * exp(-αt yi ht(xi)) / Zt , where
- Zt is chosen such that Dt+1 is a distribution.
- αt is chosen according to the importance of hypothesis ht.
For ht: X → {-1, +1}, αt is usually chosen as
αt := 1/2 * ln( (1 - εt) / εt ),
where εt denotes the classification error of hypothesis ht, measured
with respect to Dt. A small numerical example of the update is given below.
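As a small numerical illustration (the numbers are invented for the example): with n = 5 equally weighted examples and a hypothesis ht that misclassifies exactly one of them, εt = 0.2, so αt = 1/2 * ln(0.8 / 0.2) ≈ 0.693. The snippet below applies the update rule under exactly these assumptions and renormalizes with Zt.

import numpy as np

# Hypothetical round: 5 examples with uniform weights, one misclassified.
d_t   = np.full(5, 0.2)                      # D_t(i) = 1/5
y     = np.array([+1, +1, -1, -1, +1])       # true labels
h_t_x = np.array([+1, +1, -1, -1, -1])       # predictions; last one is wrong

eps_t   = np.sum(d_t[h_t_x != y])            # weighted error: 0.2
alpha_t = 0.5 * np.log((1 - eps_t) / eps_t)  # ≈ 0.693

unnormalized = d_t * np.exp(-alpha_t * y * h_t_x)
z_t    = unnormalized.sum()                  # normalization constant Z_t
d_next = unnormalized / z_t                  # D_{t+1}

print(alpha_t)  # ≈ 0.693
print(d_next)   # correctly classified: 0.125 each, misclassified: 0.5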
The final hypothesis H: X → {-1, +1} is chosen as
H(x) = sign( ∑t=1..T αt ht(x) ).
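Putting the pieces together, the following sketch runs the complete procedure: initialize the weights, call the weak learner in every round, compute εt and αt, update and renormalize the weights, and combine the rounds into the weighted majority vote H. It reuses the hypothetical train_stump and stump_predict from the earlier sketch; the dataset and the number of rounds are likewise illustrative assumptions.

import numpy as np

def adaboost(x, y, rounds=10):
    """Illustrative AdaBoost loop with decision stumps on 1-D inputs."""
    n = len(x)
    d = np.full(n, 1.0 / n)              # initialization: D_1(i) = 1/n
    ensemble = []                        # list of (alpha_t, threshold, polarity)
    for _ in range(rounds):
        threshold, polarity, eps = train_stump(x, y, d)
        eps = max(eps, 1e-12)            # guard against division by zero
        if eps >= 0.5:                   # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        pred = stump_predict(x, threshold, polarity)
        d = d * np.exp(-alpha * y * pred)   # weight update
        d = d / d.sum()                     # normalize by Z_t
        ensemble.append((alpha, threshold, polarity))
    return ensemble

def predict(ensemble, x):
    """Final hypothesis H(x) = sign( sum over t of alpha_t * h_t(x) )."""
    scores = sum(alpha * stump_predict(x, thr, pol)
                 for alpha, thr, pol in ensemble)
    return np.sign(scores)

# Tiny usage example on a made-up dataset:
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([+1, +1, +1, -1, -1, -1])
ensemble = adaboost(x, y, rounds=5)
print(predict(ensemble, x))              # reproduces y on this toy set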
For more details, please refer to the publication link above.