k-NEAREST NEIGHBOR

Publication Mitchell/97b: Machine Learning
Name k-NEAREST NEIGHBOR
Description

k-NEAREST NEIGHBOR is a simple algorithm that stores all available examples and classifies new instances of the example language based on a similarity measure. A variant of this algorithm addresses the task of function approximation.

In detail (for the task of concept learning):

  • Examples are described by numerical attribute-values.
  • The complete example set is simply stored in the "training phase".
  • Calculations are delayed until a query occurs; no hypothesis in the usual sense is returned. Hypotheses are defined implicitly by the stored example set and by the rule according to which new instances are classified.
  • If there are n attributes, all example vectors can be interpreted as instances of ℝ^n. The distance d(x1, x2) of two example vectors x1 and x2 is defined as their usual (Euclidean) vector distance, d(x1, x2) = sqrt( sum over i of (x1_i - x2_i)^2 ).
  • The distance between two example vectors is regarded as a measure of their similarity: the smaller the distance, the more similar the examples.
  • To classify a new instance e, the k stored examples most similar to e are determined. The new instance is assigned the class most of these k examples belong to (majority vote; see the sketch following this list).
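
A minimal sketch of this classification procedure is given below. The function names, the use of plain Python lists for the stored example set, and the choice of Euclidean distance are illustrative assumptions, not part of the original description.

    import math
    from collections import Counter

    def euclidean_distance(x1, x2):
        # Usual vector distance between two numerical attribute vectors
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

    def knn_classify(stored_examples, e, k):
        # stored_examples: list of (attribute_vector, class_label) pairs
        #                  kept unchanged from the "training phase"
        # e: the new instance to be classified
        # Determine the k stored examples most similar (closest) to e
        neighbors = sorted(stored_examples,
                           key=lambda ex: euclidean_distance(ex[0], e))[:k]
        # Assign the class most of these k examples belong to (majority vote)
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]

    # Example usage with a tiny, made-up example set:
    # examples = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B")]
    # knn_classify(examples, (1.1, 0.9), k=3)   # -> "A"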

This approach is suited for function approximation as well. Instead of assigning the most frequent classification among the k examples most similar to an instance e, the average of the function values of these k examples is calculated as the prediction for the function value at e.
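A minimal sketch of the function-approximation variant, under the same assumptions as above (Euclidean distance, illustrative function names, real-valued targets):

    def knn_predict(stored_examples, e, k):
        # stored_examples: list of (attribute_vector, function_value) pairs
        # Average the function values of the k examples most similar to e
        neighbors = sorted(stored_examples,
                           key=lambda ex: euclidean_distance(ex[0], e))[:k]
        return sum(value for _, value in neighbors) / len(neighbors)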

A variant of this approach calculates a weighted average of the nearest neighbors. Given a specific instance e that is to be classified, the weight of an example increases with its similarity to e.
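One way to realize this weighted variant is sketched below; weighting each neighbor by the inverse of its squared distance to e is an assumption chosen only for illustration, and other weighting schemes that grow with similarity work equally well.

    def knn_predict_weighted(stored_examples, e, k):
        # The weight of an example increases as its distance to e decreases
        neighbors = sorted(stored_examples,
                           key=lambda ex: euclidean_distance(ex[0], e))[:k]
        weighted = []
        for x, value in neighbors:
            d = euclidean_distance(x, e)
            if d == 0.0:
                return value          # exact match: use its value directly
            weighted.append((1.0 / d ** 2, value))
        total = sum(w for w, _ in weighted)
        return sum(w * v for w, v in weighted) / total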

A major problem of the simple k-NEAREST NEIGHBOR approach is that the vector distance is not necessarily suited to finding intuitively similar examples, especially if irrelevant attributes are present.

Generalization Lazy Learning
Example Languages Numerical Values
Method Type Algorithm