Description |
The versions space consists of all actual generalizations,
all actual specializations and the hypotheses "in between".
Intuitively spoken the versions space is the set of the remaining
hypotheses still consistent with the given examples and is described
by the set of most general and by the set of most specific
hypothesis. As soon as both sets intersect, a target concept is within
the intersection. Supplying more examples will lead to identical
sets. All concepts in this set are possible solutions. To continue
supplying further examples can still delete wrong hypothesis, which
could not be detected as such, so far.
For the algorithm it is assumed that the most general concepts in the
set of all hypotheses cover all of the examples, the
most special concepts exclude all of them, the hypothesis language
LE is partially ordered with respect to generality.
Normally, LH is an attribute-value representation, with taxonomies
describing the general-to-specific ordering of attribute values.
Often there are more solutions to a learning task. Nevertheless the shown
algorithm continues until a unique solution was found, which does not
mean any disadvantage. At each time the actual version space can
be considered as a partial solution, helping to classify yet
unclassified instances.
The Algorithm
Initialize the set G with the most general concepts, the set S with
the most special ones.
Until G and S are equal and contain exactly one hypothesis, read an
example e and
if e is a negative example but covered by G
- delete all concepts s in S, which cover the example e.
- specialize G until e is no longer covered.
- delete all g in G, which are strictly more specific than any other
concept in G.
if e is a positive example that is not covered by S
- delete all concepts g in G, which do not cover e.
- generalize S until e is covered.
- delete all s in S, which are strictly more general than any other
concept in S.
Then delete all concepts in G, which are not more general than any concept in S and
delete all concepts in S which are not more specific than any concept in G.
As soon as G and S are equal, and each consists of exactly one hypothesis,
the hypothesis is the searched concept.
|