Description
AdaBoost is a boosting algorithm: it runs a given weak learner several
times on slightly altered training data and combines the resulting hypotheses
into one final hypothesis, in order to achieve higher accuracy than a
single hypothesis of the weak learner would have.
The main idea of AdaBoost is to assign a weight to each example of the
given training set.
At the beginning all weights are equal, but in every round the
weak learner returns a hypothesis, and the weights of all examples
misclassified by that hypothesis are increased.
That way the weak learner is forced to focus on the difficult
examples of the training set.
The final hypothesis is a combination of the hypotheses of all rounds,
namely a weighted majority vote, where hypotheses with lower
classification error have higher weight.
In detail:
Given
- a set E = { (x1, y1), ..., (xn, yn) } of classified examples, where
xi ∈ X and yi ∈ Y for i = 1, ..., n. Here we assume Y = {-1, +1},
e.g. instances that are not covered by a concept to be learned have
label -1 and the ones covered have label +1.
- a weak learning algorithm that can deal with weighted example sets.
Such a learning algorithm reads an example set E and a distribution D.
In the simplest case, where all hypotheses that can be output are
functions X → {-1, +1}, the algorithm tries to find a hypothesis h with
minimal probability of misclassification, given that an example is
drawn from X with respect to D. The case of other possible hypotheses
can be addressed by using more complex error measures. A sketch of such
a weak learner is given after this list.
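To make "a weak learning algorithm that can deal with weighted example sets" concrete, here is a minimal sketch of such a learner, assuming one-dimensional real-valued inputs: a decision stump that picks the threshold and polarity minimizing the weighted misclassification error under D. The function names and the NumPy-based setup are illustrative assumptions, not part of the description above.

import numpy as np

def train_stump(x, y, d):
    """Weak learner for weighted examples: a decision stump on 1-D inputs.

    x: (n,) feature values; y: (n,) labels in {-1, +1};
    d: (n,) weights forming a distribution (non-negative, summing to 1).
    Returns the (threshold, polarity, error) minimizing the weighted error,
    i.e. the sum of d[i] over misclassified examples i.
    """
    best = (None, None, np.inf)
    for threshold in np.unique(x):
        for polarity in (+1, -1):
            pred = np.where(x >= threshold, polarity, -polarity)
            err = np.sum(d[pred != y])        # weighted misclassification error
            if err < best[2]:
                best = (threshold, polarity, err)
    return best

def stump_predict(x, threshold, polarity):
    """Hypothesis h: X → {-1, +1} represented by a single threshold."""
    return np.where(x >= threshold, polarity, -polarity)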
The algorithm:
Let Dt(i) denote the weight of example i in round t.
- Initialization: Assign each example (xi, yi) ∈ E the weight D1(i) := 1/n.
- For t = 1 to T:
  - Call the weak learning algorithm with example set E and weights given by Dt.
  - Get a weak hypothesis ht : X → ℝ.
  - Update the weights of all examples.
- Output the final hypothesis, generated from the hypotheses of rounds 1 to T.
Updating the weights in round t:
Dt+1(i) := Dt(i) * exp(-αt yi ht(xi)) / Zt , where
- Zt is chosen such that Dt+1 is a distribution.
- αt is chosen according to the importance of hypothesis ht.
For ht: X → {-1, +1}, αt is usually chosen as
αt := 1/2 * ln( (1 - εt) / εt ),
where εt denotes the classification error of hypothesis ht, measured
with respect to Dt. A small numerical example of the update is given below.
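As a small numerical illustration (the numbers are invented for the example): with n = 5 equally weighted examples and a hypothesis ht that misclassifies exactly one of them, εt = 0.2, so αt = 1/2 * ln(0.8 / 0.2) ≈ 0.693. The snippet below applies the update rule under exactly these assumptions and renormalizes with Zt.

import numpy as np

# Hypothetical round: 5 examples with uniform weights, one misclassified.
d_t   = np.full(5, 0.2)                      # D_t(i) = 1/5
y     = np.array([+1, +1, -1, -1, +1])       # true labels
h_t_x = np.array([+1, +1, -1, -1, -1])       # predictions; last one is wrong

eps_t   = np.sum(d_t[h_t_x != y])            # weighted error: 0.2
alpha_t = 0.5 * np.log((1 - eps_t) / eps_t)  # ≈ 0.693

unnormalized = d_t * np.exp(-alpha_t * y * h_t_x)
z_t    = unnormalized.sum()                  # normalization constant Z_t
d_next = unnormalized / z_t                  # D_{t+1}

print(alpha_t)  # ≈ 0.693
print(d_next)   # correctly classified: 0.125 each, misclassified: 0.5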
The final hypothesis H: X → {-1, +1} is chosen as
H(x) = sign( ∑t=1..T αt ht(x) ).
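Putting the pieces together, the following sketch runs the complete procedure: initialize the weights, call the weak learner in every round, compute εt and αt, update and renormalize the weights, and combine the rounds into the weighted majority vote H. It reuses the hypothetical train_stump and stump_predict from the earlier sketch; the dataset and the number of rounds are likewise illustrative assumptions.

import numpy as np

def adaboost(x, y, rounds=10):
    """Illustrative AdaBoost loop with decision stumps on 1-D inputs."""
    n = len(x)
    d = np.full(n, 1.0 / n)              # initialization: D_1(i) = 1/n
    ensemble = []                        # list of (alpha_t, threshold, polarity)
    for _ in range(rounds):
        threshold, polarity, eps = train_stump(x, y, d)
        eps = max(eps, 1e-12)            # guard against division by zero
        if eps >= 0.5:                   # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        pred = stump_predict(x, threshold, polarity)
        d = d * np.exp(-alpha * y * pred)   # weight update
        d = d / d.sum()                     # normalize by Z_t
        ensemble.append((alpha, threshold, polarity))
    return ensemble

def predict(ensemble, x):
    """Final hypothesis H(x) = sign( sum over t of alpha_t * h_t(x) )."""
    scores = sum(alpha * stump_predict(x, thr, pol)
                 for alpha, thr, pol in ensemble)
    return np.sign(scores)

# Tiny usage example on a made-up dataset:
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([+1, +1, +1, -1, -1, -1])
ensemble = adaboost(x, y, rounds=5)
print(predict(ensemble, x))              # reproduces y on this toy set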
For more details, please refer to the publication link above.