Cross Validation
Name |
Cross Validation |
Description |
Cross validation is a method for estimating the error of
a hypothesis generated by a concept learning algorithm (classifier).
Given a set of training data and a concept learner, cross validation
estimates the accuracy of the hypothesis obtained by running the
learning algorithm on the data set as follows:
- Randomly divide the training data into M sub-samples.
- For each sub-sample i, do:
  - Let the concept learner build a hypothesis from the training data
    without sub-sample i.
  - Determine the accuracy a_i of this hypothesis on
    sub-sample i, which was not used for learning.
The estimated accuracy is (1/M) * Σ_{i=1,..,M} a_i,
the average accuracy over the M sub-samples; the estimated error is
one minus this value.
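The procedure above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the names `cross_validate` and `majority_learner` are hypothetical, examples are assumed to be `(x, label)` pairs, and a learner is assumed to be a function that takes training examples and returns a hypothesis callable on `x`.

```python
import random
from collections import Counter


def majority_learner(train):
    """Hypothetical toy learner: always predicts the most common training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common


def cross_validate(data, learn, M, seed=0):
    """Estimate the accuracy of `learn` on `data` by M-fold cross validation."""
    if not 2 <= M <= len(data):
        raise ValueError("M must be between 2 and the number of examples")

    # Randomly divide the training data into M sub-samples.
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::M] for i in range(M)]

    accuracies = []
    for i, held_out in enumerate(folds):
        # Build a hypothesis from all data except sub-sample i.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        hypothesis = learn(train)
        # Accuracy a_i of this hypothesis on the held-out sub-sample i.
        correct = sum(hypothesis(x) == label for x, label in held_out)
        accuracies.append(correct / len(held_out))

    # Estimated accuracy: (1/M) * sum of the a_i.
    return sum(accuracies) / M
```

For example, `cross_validate(examples, majority_learner, M=10)` would return the estimated accuracy of the majority-class baseline on a list `examples`; any other learner with the same assumed interface can be passed in its place.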
A special case of cross validation is the so-called
leave one out method, where M is chosen as the cardinality of
the training set. In other words, for each given example a separate run of
learning is performed in which all training data except this example
are used for training, and the correctness of the classification of
the single held-out example is checked.
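As a rough sketch of leave one out, the loop below performs one learning run per example. The names `leave_one_out` and `nearest_neighbour_learner` are hypothetical, and the toy learner assumes each example is a `(x, label)` pair with a single numeric feature `x`.

```python
def nearest_neighbour_learner(train):
    """Hypothetical toy learner: 1-nearest-neighbour on one numeric feature."""
    def hypothesis(x):
        _, label = min(train, key=lambda ex: abs(ex[0] - x))
        return label
    return hypothesis


def leave_one_out(data, learn):
    """Estimate accuracy with M equal to the number of examples."""
    data = list(data)
    if len(data) < 2:
        raise ValueError("need at least two examples")

    correct = 0
    for i, (x, label) in enumerate(data):
        # Train on everything except example i ...
        train = data[:i] + data[i + 1:]
        hypothesis = learn(train)
        # ... and check the classification of the single held-out example.
        if hypothesis(x) == label:
            correct += 1
    return correct / len(data)
```

Note that the number of learning runs equals the number of examples, which is where the computational cost of the method comes from.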
For efficiency reasons "leave one out" is unusual in practice,
although in general its estimate will be closest to the real error,
since each hypothesis is learned from almost the entire training set.
Instead, M=10 is a frequently chosen compromise between computational
effort and quality of the estimate. |