Concept Learning

Name Concept Learning
Description The task of concept learning is to acquire general concepts from specific training examples. Training examples are instances that either belong to the target concept, and are therefore positive examples, or do not belong to the concept, and are therefore negative examples.

Given

  • a description language LE for examples,
  • a hypothesis language LH for possible hypotheses (i.e. possible learning results),
  • a set of positive examples (instances of a certain concept),
  • a set of negative examples (instances that do not belong to the concept),
  • a predicate "covers", which indicates whether a given concept/hypothesis covers a given example,
  • an acceptance criterion (measure) which evaluates hypotheses,
Find hypotheses in the language LH fulfilling the acceptance criterion.
The partial order (generality order) on LH can be used to prune the search.
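The following is a minimal sketch of this generic task for attribute-value examples and conjunctive hypotheses, in the spirit of Find-S / version-space learning; the functions covers and more_general_or_equal, and the toy data, are purely illustrative assumptions, with more_general_or_equal playing the role of the partial order on LH mentioned above.

```python
# Minimal sketch: concept learning with attribute-value examples and
# conjunctive hypotheses.  All names and data here are illustrative.

POSITIVE, NEGATIVE = True, False

def covers(hypothesis, example):
    """A hypothesis is a tuple of constraints, one per attribute:
    either a concrete value or '?' (any value allowed)."""
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

def more_general_or_equal(h1, h2):
    """Partial order on LH: h1 is at least as general as h2 if every
    constraint of h1 is '?' or equals the corresponding constraint of h2.
    This order is what allows a learner to prune the search."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def find_s(examples):
    """Return the most specific conjunctive hypothesis covering all
    positive examples; negative examples act as the acceptance criterion."""
    positives = [e for e, label in examples if label == POSITIVE]
    negatives = [e for e, label in examples if label == NEGATIVE]
    # Start with the first positive example as the most specific hypothesis.
    h = list(positives[0])
    for e in positives[1:]:
        # Minimally generalize h so that it also covers e.
        h = [a if a == b else '?' for a, b in zip(h, e)]
    h = tuple(h)
    # Acceptance criterion: consistency with the negative examples.
    if any(covers(h, e) for e in negatives):
        return None  # no consistent conjunctive hypothesis exists
    return h

if __name__ == "__main__":
    # Toy data: (sky, temperature, humidity) -> positive/negative example
    examples = [
        (("sunny", "warm", "normal"), POSITIVE),
        (("sunny", "warm", "high"),   POSITIVE),
        (("rainy", "cold", "high"),   NEGATIVE),
    ]
    h = find_s(examples)
    print(h)                                            # ('sunny', 'warm', '?')
    print(more_general_or_equal(('?', '?', '?'), h))    # True
```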

Definition One can distinguish two more specific concept learning tasks, each related to a different theory. Function approximation may be regarded as a third specialization.

1. The concept learning task in the predictive setting of Inductive Logic Programming is:

Given
  • positive (E+) and negative (E-) examples E in an example language LE,
  • background knowledge B in a language LB,
where B ∪ E is consistent (B ∪ E ⊭ ⊥) and B alone does not already imply the positive examples (B ⊭ E+).

The task is to find a hypothesis h in LH,
  • which is consistent with B and E: B, h, E ⊭ ⊥,
  • which together with B implies all of the positive examples: B, h ⊨ E+,
  • which together with B does not imply any of the negative examples: ∀e ∈ E- : B, h ⊭ e.
Usually LB and LE are chosen as sets of ground facts, LH is a restricted first-order language, and logical implication is modeled by an inference operator.
A preference operator is often used to select among the correct and complete hypotheses.
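A minimal sketch of the coverage test in this setting, assuming B and E are ground facts and LH contains single, function-free Horn clauses; the family example (parent/grandparent), the naive grounding of the clause body as inference operator, and all function names are illustrative assumptions, not a fixed ILP implementation.

```python
# Minimal sketch of the ILP coverage test: B and E are ground facts,
# the hypothesis is one function-free Horn clause, and implication is
# modelled by naively grounding the clause body over the constants in B.

from itertools import product

background = {                                   # B: ground facts
    ("parent", "ann", "bob"),
    ("parent", "bob", "carl"),
    ("parent", "bob", "dora"),
}
positives = {("grandparent", "ann", "carl"),     # E+
             ("grandparent", "ann", "dora")}
negatives = {("grandparent", "bob", "ann")}      # E-

# Candidate hypothesis h:  grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
head = ("grandparent", "X", "Z")
body = [("parent", "X", "Y"), ("parent", "Y", "Z")]

def substitute(literal, theta):
    """Apply a variable substitution theta to a literal."""
    return (literal[0],) + tuple(theta.get(arg, arg) for arg in literal[1:])

def derived_facts(head, body, facts):
    """Ground facts implied by the facts together with the clause
    (one step of naive forward chaining over all constants)."""
    constants = {c for fact in facts for c in fact[1:]}
    variables = sorted({arg for lit in body + [head]
                        for arg in lit[1:] if arg.isupper()})
    implied = set()
    for values in product(constants, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if all(substitute(lit, theta) in facts for lit in body):
            implied.add(substitute(head, theta))
    return implied

covered = derived_facts(head, body, background)
print(positives <= covered)         # complete:   B, h |= E+        -> True
print(not (negatives & covered))    # consistent: no e in E- implied -> True
```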

2. The scenario of Concept Learning in the PAC-paradigm:

A learning algorithm for concepts c from a representation class C reads a set of examples X from an example language LE, drawn with respect to an unknown, "correct" probability distribution D over LE. The given examples are classified, that is, for every example e in X it is known whether e belongs to the concept c or not.
Additionally, the learning algorithm receives two parameters, δ and ε, which weaken the requirements on the learning result.
  • ε defines the maximum error up to which a hypothesis is regarded as classifying the given concept approximately correctly.
    The error of a hypothesis h with respect to a concept c is the probability that an example drawn from LE according to the distribution D is classified incorrectly by h, i.e. error_D(h) = Pr_{x∼D}[h(x) ≠ c(x)].
  • δ is an upper bound on the probability that the algorithm fails to learn the given concept approximately correctly.
A concept class C is probably approximately correct (PAC-) learnable by a hypothesis class H from a set of (positively and negatively classified) examples X, if there is an algorithm L(δ, ε) which outputs a hypothesis h from H with an error of at most ε,
  • for every fixed probability distribution D over LE,
  • for all concepts c in C,
  • in time bounded by a polynomial in 1/ε, 1/δ and |c|,
  • with a probability of at least 1 - δ,
where
  • |c| is the (variously defined) representation length, for example based on the length of the most efficient representation of c and
  • 0 < δ < 1/2 and 0 < ε < 1/2.
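A minimal sketch of this scenario for the concept class of closed intervals [a, b] on the real line under a uniform distribution D on [0, 1]; the "tightest fitting interval" learner and the sample-size bound m ≥ (2/ε) ln(2/δ) are standard for this particular class, but the function names and data are illustrative assumptions, not part of the general definition above.

```python
# Minimal sketch of PAC learning for the concept class of closed intervals
# [a, b] on the real line, with D uniform on [0, 1].

import math
import random

def pac_learn_interval(target, eps, delta, rng):
    a, b = target
    # Sample-size bound for this class: m >= (2/eps) * ln(2/delta).
    m = math.ceil((2 / eps) * math.log(2 / delta))
    # Draw m classified examples x ~ D.
    sample = [(x, a <= x <= b) for x in (rng.random() for _ in range(m))]
    pos = [x for x, label in sample if label]
    # Hypothesis: tightest interval around the positive examples
    # (an empty interval if no positive example was drawn).
    return (min(pos), max(pos)) if pos else (0.0, -1.0)

def true_error(target, hypothesis):
    """error_D(h) = Pr_{x~D}[h(x) != c(x)], i.e. the measure of the
    symmetric difference of the two intervals under the uniform D."""
    (a, b), (c, d) = target, hypothesis
    inter = max(0.0, min(b, d) - max(a, c))
    return (b - a) + max(0.0, d - c) - 2 * inter

if __name__ == "__main__":
    rng = random.Random(0)
    eps, delta, target = 0.05, 0.05, (0.3, 0.7)
    failures = sum(
        true_error(target, pac_learn_interval(target, eps, delta, rng)) > eps
        for _ in range(1000)
    )
    # With probability >= 1 - delta the error stays below eps,
    # so the observed failure rate should be well below delta.
    print(failures / 1000)
```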
Methods
Boosting
Bottom-Up Induction of Horn Clauses
GOLEM
Incremental decision tree learning
Lazy Learning
Logistic Regression
Naive Bayes
PROGOL
Searching Version Spaces
Star
Support Vector Machine (SVM)
Top-Down Induction of Decision Trees (TDIDT)
Top-Down Induction of Horn Clauses
Theories
Bayesian Learning
Inductive Logic Programming (ILP)
Learning as Search
Probably Approximately Correct (PAC-) Learning
Statistical Learning