Cross Validation

Name: Cross Validation
Description: Cross validation is a method for estimating the error of a hypothesis generated by a concept learning algorithm (a classifier).

Given a set of training data and a concept learner, cross validation estimates the accuracy of the hypothesis obtained by running the learning algorithm on the data set as follows (a sketch in code follows the list):

  • Randomly divide the training data into M sub-samples.
  • For each sub-sample i, do:
    • Let the concept learner build a hypothesis from the training data without sub-sample i.
    • Determine the accuracy a_i of this hypothesis on sub-sample i, which was not used for learning.
  • The estimated accuracy is (1/M) * Σ_{i=1..M} a_i, i.e. the average of the accuracies a_i over the M sub-samples.
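
A minimal Python sketch of this procedure; train_fn and classify_fn are hypothetical stand-ins for an arbitrary concept learner, not part of any particular library:

    import random

    def cross_validation_accuracy(data, train_fn, classify_fn, M=10, seed=None):
        # data        : list of (example, label) pairs, with len(data) >= M
        # train_fn    : builds a hypothesis from a list of (example, label) pairs
        # classify_fn : classify_fn(hypothesis, example) -> predicted label

        # Randomly divide the training data into M sub-samples.
        shuffled = list(data)
        random.Random(seed).shuffle(shuffled)
        folds = [shuffled[i::M] for i in range(M)]

        accuracies = []
        for i in range(M):
            # Build a hypothesis from the training data without sub-sample i.
            training = [pair for j in range(M) if j != i for pair in folds[j]]
            hypothesis = train_fn(training)
            # Determine the accuracy a_i on sub-sample i (not used for learning).
            correct = sum(1 for x, y in folds[i] if classify_fn(hypothesis, x) == y)
            accuracies.append(correct / len(folds[i]))

        # The estimated accuracy is the average of a_1, ..., a_M.
        return sum(accuracies) / M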

A special case of cross validation is the so-called leave-one-out method, where M is chosen as the cardinality of the training set. In other words, for each example a separate learning run is performed in which all training data except that example are used for training, and the correctness of the classification of the single held-out example is checked.
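
With the hypothetical cross_validation_accuracy sketched above, leave one out is simply the special case M = |data|:

    # Leave one out: each sub-sample holds exactly one example, so every
    # hypothesis is trained on all data except that single example.
    loo_accuracy = cross_validation_accuracy(data, train_fn, classify_fn, M=len(data))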

For efficiency reasons, leave one out is rarely used in practice, although in general it yields the estimate closest to the true error. Instead, M = 10 is a frequently chosen compromise between computational effort and the quality of the estimate.
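
In practice this need not be coded by hand; for example, scikit-learn's cross_val_score performs M-fold cross validation directly. A sketch with M = 10, assuming scikit-learn is installed:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    # cv=10 gives M = 10 sub-samples; scoring defaults to accuracy for classifiers.
    scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
    print(scores.mean())  # the estimated accuracy, averaged over the 10 folds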