Probably Approximately Correct (PAC-) Learning

Publication Blumer/etal/90a: Learnability and the Vapnik–Chervonenkis Dimension
Haussler/99a: Convolution Kernels on Discrete Structures
Mitchell/97b: Machine Learning
Natarajan/91a: Machine Learning. A Theoretical Approach
Name Probably Approximately Correct (PAC-) Learning
Description

PAC theory studies the learnability of

  • concept classes C
  • over instance spaces LE
  • by certain hypothesis spaces H,
  • under an unknown but fixed probability distribution D over LE, which models the probability of each instance in LE occurring.

All concepts in C and all hypotheses in H are subsets of LE.
An example e ∈ LE is classified according to a concept c by stating whether e ∈ c or not.
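This set-based view of classification can be sketched in a few lines. The instance space, the concept, and the function name below are illustrative assumptions, not part of the formal definition:

```python
# Hypothetical toy setup: the instance space LE is a small finite set,
# and a concept c is simply a subset of LE.
LE = set(range(10))        # instance space (illustrative)
c = {0, 2, 4, 6, 8}        # target concept, here "even numbers"

def classify(e, concept):
    """Classify an example e according to a concept: True iff e ∈ concept."""
    return e in concept

# A classified example is a pair (e, label), as read by a learning algorithm.
labeled_examples = [(e, classify(e, c)) for e in sorted(LE)]
```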

Probably Approximately Correct learnability means that there is an algorithm A that is able to
  • select an ε-good hypothesis from H
  • with a probability of at least 1 - δ
  • for each δ, ε with 0 < δ < 1/2 and 0 < ε < 1/2.
To find such a hypothesis, the algorithm is allowed to read a set X of classified examples, drawn with respect to D.
A hypothesis h is called ε-good if the probability errorD(h, c) that an instance e ∈ LE is drawn and classified wrongly by h is at most ε:
errorD(h, c) := D((c \ h) ∪ (h \ c)) ≤ ε
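For a finite instance space, the error of a hypothesis is just the probability mass of the symmetric difference between the target concept and the hypothesis. A minimal sketch, where LE, D, c, and h are illustrative choices:

```python
# error_D(h, c): probability that an instance drawn from the fixed
# distribution D is classified wrongly by h, i.e. the mass of
# (c \ h) ∪ (h \ c).
LE = [0, 1, 2, 3]
D = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}   # fixed distribution over LE
c = {0, 1}                              # target concept
h = {0, 2}                              # hypothesis

def error(D, h, c):
    """Sum the probability of all instances in the symmetric difference."""
    return sum(p for x, p in D.items() if x in c.symmetric_difference(h))
```

Here h disagrees with c exactly on the instances 1 and 2, so error(D, h, c) = 0.3 + 0.2 = 0.5, and h is ε-good only for ε ≥ 0.5.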

Two main questions can be distinguished in PAC-theory:
  1. How many examples are needed to learn Probably Approximately Correct?
    Is there a polynomial p for a specific concept class C such that there is an algorithm A which can find an ε-good hypothesis for each concept c ∈ C with a probability of at least 1 - δ, for each 0 < δ < 1/2 and 0 < ε < 1/2, after having read at most p(1/δ, 1/ε) classified examples from LE, drawn with respect to D?
  2. How complex is the learning task?
    If point 1 is true, is there an algorithm A′ and a polynomial p′ such that A′ can find an ε-good hypothesis in time p′(1/δ, 1/ε, |c|), where |c| denotes the representation length of the actual concept c to be learned?
If question 2 can be answered yes, so can question 1, because an algorithm running in time p′(1/δ, 1/ε, |c|) can read at most that many examples, so the number of examples read is bounded by a polynomial in 1/δ, 1/ε, and |c|.
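A concrete answer to question 1 exists for finite hypothesis spaces: a consistent learner needs at most (1/ε)(ln|H| + ln(1/δ)) examples (see Mitchell, Machine Learning, ch. 7), which is polynomial in 1/ε and 1/δ. A short sketch of this bound; the function name and the example values are illustrative:

```python
import math

# Sample-size bound for a consistent learner over a finite hypothesis
# space H: m >= (1/eps) * (ln|H| + ln(1/delta)) classified examples
# suffice to find an eps-good hypothesis with probability >= 1 - delta.
def sample_bound(h_size, eps, delta):
    """Smallest integer m satisfying the finite-|H| PAC bound."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# e.g. |H| = 2**10 hypotheses, eps = delta = 0.05:
m = sample_bound(2 ** 10, 0.05, 0.05)
```

Note how the bound grows only logarithmically in |H| but linearly in 1/ε, matching the polynomial p(1/δ, 1/ε) asked for in question 1.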
Dm Step Concept Learning
Function Approximation