Train and Test

Description:

Holding out a test set in addition to the training set is a general method for estimating the accuracy of a concept learning algorithm (classifier). Given a sample of classified instances and a concept learning algorithm, the method works as follows:

  • Split the data into two parts, a training set and a test set.
  • Run the learner on the training set only; the test set must not be shown to it. Let h denote the hypothesis output by the learner.
  • Use h to classify all instances of the test set. The fraction of correctly classified instances is the estimated accuracy.
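
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function names (split_train_test, majority_learner, estimate_accuracy), the toy data, and the 25% test fraction are all assumptions made for the example. The majority-class learner is a stand-in for whatever concept learning algorithm is actually being evaluated.

```python
import random
from collections import Counter


def split_train_test(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def majority_learner(train):
    """Placeholder learner (hypothetical): the returned hypothesis h
    always predicts the most frequent label seen in the training set."""
    most_common_label = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda instance: most_common_label


def estimate_accuracy(h, test):
    """Fraction of test instances that h classifies correctly."""
    correct = sum(1 for instance, label in test if h(instance) == label)
    return correct / len(test)


# Toy sample of classified instances: (instance, label) pairs.
data = [(x, x >= 30) for x in range(100)]

# Step 1: split the data into a training set and a test set.
train, test = split_train_test(data, test_fraction=0.25, seed=0)

# Step 2: run the learner on the training set only.
h = majority_learner(train)

# Step 3: classify the test set with h and measure the fraction correct.
accuracy = estimate_accuracy(h, test)
print(f"estimated accuracy: {accuracy:.2f}")
```

Shuffling before the split matters when the data is stored in some systematic order, as it is in this toy sample; fixing the seed only makes the example reproducible.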

Usually 20-30% of the available data is set aside as the test set. This is a reasonable choice if the resulting test set contains more than about 1000 instances.
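
One way to see why the size of the test set matters is to look at how much the estimate can fluctuate. The sketch below rests on an assumption not stated above: the test instances are drawn independently, so the number of correct classifications follows a binomial distribution and the standard error of the estimated accuracy is sqrt(p(1-p)/n). The function name and the example accuracy of 0.9 are chosen purely for illustration.

```python
import math


def accuracy_standard_error(accuracy, n_test):
    """Binomial standard error of an accuracy estimate from n_test
    independently drawn test instances (an assumption of this sketch)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_test)


# Compare the uncertainty of the same estimated accuracy (0.9, illustrative)
# for test sets of different sizes.
for n_test in (100, 1000, 10000):
    se = accuracy_standard_error(0.9, n_test)
    print(f"n_test={n_test:>5}  standard error = {se:.4f}")
```

Under these assumptions the standard error falls with the square root of the test set size, which is why an estimate from a small test set is much less reliable than one from a thousand or more instances.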