AdaBoost

Publication Schapire/99a: Theoretical Views of Boosting and Applications
Name AdaBoost
Description

AdaBoost is a boosting algorithm: it runs a given weak learner several times on differently weighted versions of the training data and combines the resulting hypotheses into one final hypothesis, in order to achieve higher accuracy than a single hypothesis of the weak learner would.

The main idea of AdaBoost is to assign a weight to each example of the given training set. At the beginning all weights are equal, but in every round the weak learner returns a hypothesis, and the weights of all examples misclassified by that hypothesis are increased. That way the weak learner is forced to focus on the difficult examples of the training set. The final hypothesis is a combination of the hypotheses of all rounds, namely a weighted majority vote, where hypotheses with lower classification error have higher weight.

In detail:

Given

  • a set E = { (x1, y1), ..., (xn, yn) } of classified examples, where xi ∈ X and yi ∈ Y, for i = 1, ..., n. Here we assume Y = {-1, +1}, e.g. instances that are not covered by a concept to be learned have label -1 and the ones covered have label +1.
  • a weak learning algorithm that can deal with weighted example sets. Such a learning algorithm reads an example set E and a distribution D. In the simplest case, where all hypotheses that can be output are functions X → {-1, +1}, the algorithm tries to find a hypothesis h with minimal probability of misclassification, given that an example is drawn from E according to D. The case of other possible hypotheses can be addressed by using more complex error measures. (A minimal sketch of such a weak learner is given after this list.)
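
The following is a minimal sketch (not taken from the publication) of such a weak learner: a one-dimensional decision stump that reads a weight distribution D over the examples and selects the threshold and orientation with the smallest weighted classification error. The function name train_stump and its interface are illustrative assumptions.

    import numpy as np

    def train_stump(x, y, d):
        """x: (n,) feature values, y: (n,) labels in {-1, +1},
        d: (n,) weights summing to 1 (the distribution D)."""
        best = None
        for thr in np.unique(x):                      # candidate thresholds
            for sign in (+1, -1):                     # candidate orientations
                pred = np.where(x >= thr, sign, -sign)
                err = np.sum(d[pred != y])            # weighted misclassification error under D
                if best is None or err < best[0]:
                    best = (err, thr, sign)
        err, thr, sign = best
        def h(x_new):                                 # weak hypothesis h: X -> {-1, +1}
            return np.where(np.asarray(x_new) >= thr, sign, -sign)
        return h, err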
The algorithm:

Let Dt(i) denote the weight of example i in round t.

  • Initialization: Assign each example (xi, yi) ∈ E the weight D1(i) := 1/n.
  • For t = 1 to T:
    • Call the weak learning algorithm with example set E and the weights given by Dt.
    • Get a weak hypothesis ht : X → ℝ.
    • Update the weights of all examples.
  • Output the final hypothesis, generated from the hypotheses of rounds 1 to T.
Updating the weights in round t:

Dt+1(i) := Dt(i) * exp(-αt yi ht(xi)) / Zt , where

  • Zt is a normalization factor, chosen such that Dt+1 is a distribution.
  • αt is chosen according to the importance of hypothesis ht. For ht: X → {-1, +1}, αt is usually chosen as
    αt := 1/2 * ln( (1 - εt) / εt ),
    where εt denotes the weighted classification error of hypothesis ht under Dt.
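
A minimal sketch of the corresponding training loop, assuming weak hypotheses ht: X → {-1, +1} and a weak learner with the interface of train_stump above; the names adaboost and weak_learner are illustrative, not from the publication.

    import numpy as np

    def adaboost(x, y, T, weak_learner):
        n = len(y)
        d = np.full(n, 1.0 / n)                       # D_1(i) = 1/n
        hypotheses, alphas = [], []
        for t in range(T):
            h, eps = weak_learner(x, y, d)            # weak hypothesis h_t with weighted error eps_t
            eps = max(eps, 1e-12)                     # guard against division by zero for a perfect h_t
            if eps >= 0.5:                            # h_t no better than chance: stop boosting
                break
            alpha = 0.5 * np.log((1 - eps) / eps)     # alpha_t = 1/2 * ln((1 - eps_t) / eps_t)
            d = d * np.exp(-alpha * y * h(x))         # D_{t+1}(i) proportional to D_t(i) * exp(-alpha_t yi h_t(xi))
            d = d / d.sum()                           # dividing by Z_t makes D_{t+1} a distribution
            hypotheses.append(h)
            alphas.append(alpha)
        return hypotheses, alphas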

The final hypothesis H: X → {-1, +1} is chosen as

H(x) = sign( Σt=1..T αt ht(x) ).
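
Reusing the hypotheses and weights αt returned by the training sketch above, the final hypothesis can be evaluated as follows; the toy data at the end is purely hypothetical.

    import numpy as np

    def predict(x_new, hypotheses, alphas):
        # H(x) = sign( sum over t of alpha_t * h_t(x) )
        score = sum(a * h(x_new) for a, h in zip(alphas, hypotheses))
        return np.sign(score)

    # Hypothetical toy data: labels follow a simple threshold, so a few rounds suffice.
    x = np.array([0.10, 0.35, 0.40, 0.70, 0.80, 0.90])
    y = np.array([-1, -1, -1, +1, +1, +1])
    hypotheses, alphas = adaboost(x, y, T=5, weak_learner=train_stump)
    print(predict(x, hypotheses, alphas))             # expected to reproduce y on this separable toy set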

For more details, please refer to the publication link above.
Generalization Boosting
Method Type Algorithm