CANDIDATE-ELIMINATION

Publication Mitchell/97b: Machine Learning
Name CANDIDATE-ELIMINATION
Description

The versions space consists of all actual generalizations, all actual specializations and the hypotheses "in between". Intuitively spoken the versions space is the set of the remaining hypotheses still consistent with the given examples and is described by the set of most general and by the set of most specific hypothesis. As soon as both sets intersect, a target concept is within the intersection. Supplying more examples will lead to identical sets. All concepts in this set are possible solutions. To continue supplying further examples can still delete wrong hypothesis, which could not be detected as such, so far.

For the algorithm it is assumed that the most general concepts in the set of all hypotheses cover all of the examples, the most special concepts exclude all of them, the hypothesis language LE is partially ordered with respect to generality. Normally, LH is an attribute-value representation, with taxonomies describing the general-to-specific ordering of attribute values.

Often there are more solutions to a learning task. Nevertheless the shown algorithm continues until a unique solution was found, which does not mean any disadvantage. At each time the actual version space can be considered as a partial solution, helping to classify yet unclassified instances.

The Algorithm

Initialize the set G with the most general concepts, the set S with the most special ones.
Until G and S are equal and contain exactly one hypothesis, read an example e and

if e is a negative example but covered by G
  • delete all concepts s in S, which cover the example e.
  • specialize G until e is no longer covered.
  • delete all g in G, which are strictly more specific than any other concept in G.
if e is a positive example that is not covered by S
  • delete all concepts g in G, which do not cover e.
  • generalize S until e is covered.
  • delete all s in S, which are strictly more general than any other concept in S.
    Then delete all concepts in G, which are not more general than any concept in S and delete all concepts in S which are not more specific than any concept in G.
As soon as G and S are equal, and each consists of exactly one hypothesis, the hypothesis is the searched concept.

Generalization Searching Version Spaces
Example Languages Attribute-Value
Method Type Algorithm
Demos Version Space Demo Applet
Theories Learning as Search