|
k-NEAREST NEIGHBOR
Publication |
Mitchell/97b: Machine Learning
|
Name |
k-NEAREST NEIGHBOR |
Description |
k-NEAREST NEIGHBOR is a simple algorithm that stores all available examples and classifies
new instances of the example language based on a similarity measure. A variant of the algorithm
addresses the task of function approximation.
In detail (for the task of concept learning):
- Examples are described by numerical attribute-values.
- The complete example set is simply stored in the "training phase".
- Calculations are delayed until queries occur; no hypothesis in the usual sense is returned.
Hypotheses are implicitly defined by the stored example set and by the rules according to which new instances are classified.
- If there are n attributes, every example vector can be interpreted as a point in R^n.
The distance d(x1, x2) of two example vectors
x1 and x2 is defined as their usual (Euclidean) vector distance.
- The distance between two example vectors is regarded as a measure of their similarity: the smaller the distance, the more similar the examples.
- To classify a new instance e, the k stored examples most similar
to e are determined. The new instance is assigned the class most of these k examples
belong to.
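The classification procedure above can be sketched as follows (a hypothetical minimal implementation, assuming numerical attribute vectors and Euclidean distance; the function names are illustrative):

```python
import math
from collections import Counter

def euclidean(x1, x2):
    # The usual vector distance between two attribute vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_classify(examples, e, k):
    # examples: list of (attribute vector, class) pairs stored in the
    # "training phase"; e: the new instance to be classified.
    # Determine the k stored examples most similar (closest) to e ...
    neighbors = sorted(examples, key=lambda ex: euclidean(ex[0], e))[:k]
    # ... and assign e the majority class among these k examples.
    return Counter(cls for _, cls in neighbors).most_common(1)[0][0]
```

Note that all work happens at query time: `knn_classify` re-examines the whole stored example set for every new instance, which is the lazy-learning trade-off mentioned below.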
This approach is suited for function approximation as well. Instead of assigning the most frequent
classification among the k examples most similar to an instance e, the average of the function values of these k
examples is returned as the prediction for the function value of e.
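For function approximation, only the final step changes: the majority vote is replaced by an average. A hypothetical sketch, reusing the same Euclidean distance as above:

```python
import math

def euclidean(x1, x2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_regress(examples, e, k):
    # examples: list of (attribute vector, function value) pairs.
    # Determine the k stored examples closest to e ...
    neighbors = sorted(examples, key=lambda ex: euclidean(ex[0], e))[:k]
    # ... and predict the average of their function values.
    return sum(v for _, v in neighbors) / k
```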
A variant of this approach calculates a weighted average of the nearest neighbors. Given
a specific instance e that is to be classified, the weight of an example increases with
its similarity to e, so that closer examples contribute more to the prediction.
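The distance-weighted variant can be sketched as follows (a hypothetical implementation; the choice of 1/d^2 as the weight is one common option and is an assumption here, not prescribed by the source):

```python
import math

def euclidean(x1, x2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_weighted_regress(examples, e, k):
    # examples: list of (attribute vector, function value) pairs.
    neighbors = sorted(examples, key=lambda ex: euclidean(ex[0], e))[:k]
    num = den = 0.0
    for x, v in neighbors:
        d = euclidean(x, e)
        if d == 0.0:
            return v  # e coincides with a stored example: return its value
        w = 1.0 / d ** 2  # weight grows as the distance shrinks
        num += w * v
        den += w
    return num / den
```

With this weighting, a neighbor half as far away counts four times as much, which dampens the influence of the more distant members of the k-neighborhood.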
A major problem of this simple approach to k-NEAREST NEIGHBOR is that the vector distance is not
necessarily suited for finding intuitively similar examples, especially if irrelevant attributes are present: attributes that carry no information about the class still contribute to the distance.
|
Generalization |
Lazy Learning
|
Example Languages |
Numerical Values
|
Method Type |
Algorithm
|