Function Approximation

Publication Draper/Smith/73a: Applied Regression Analysis
Name Function Approximation
Description

Many general learning tasks, especially concept learning, may be regarded as function approximation.

Examples of the function to be approximated are given. The aim is to find a hypothesis (also a function) that can be used to predict the function values of yet unseen instances, e.g. to predict future events.

Good hypotheses are those that frequently predict an accurate function value. The quality of a hypothesis for approximating a specific function is measured by a so-called loss function, which increases as the difference between predicted and true function values increases. It is also taken into account that it is more important to predict well on frequently observed instances.

Definition Given
  • a target function t: X → ℝ, mapping instances of a domain X to real values.
  • a probability distribution P over the domain X, indicating the probability with which each instance x ∈ X is drawn.
  • an oracle, producing a finite set E of examples, where E ⊂ X × ℝ. Each example e ∈ E is generated as follows:
    1. An instance x ∈ X is drawn with respect to P.
    2. The value of y is chosen as t(x), the target function value of x.
    3. At this point noisy examples may be modelled by allowing y ≠ t(x), e.g. y ≈ t(x), where y deviates randomly from t(x).
    4. Output e := (x, y).
  • a language LH for hypotheses. Each single hypothesis h is a mapping h: X → ℝ.
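The four generation steps above can be sketched in Python; a minimal illustration, where all names (oracle, sample, noise) are chosen here and not taken from the text:

```python
import random

def oracle(t, sample, noise=0.0, n=5):
    """Generate n examples (x, y) following the four steps above; a sketch."""
    examples = []
    for _ in range(n):
        x = sample()                  # 1. draw x with respect to P
        y = t(x)                      # 2. y := t(x)
        y += random.gauss(0, noise)   # 3. optional random deviation from t(x)
        examples.append((x, y))       # 4. output e := (x, y)
    return examples

# Noise-free examples of the target t(x) = 2x, with x drawn uniformly from [0, 1].
E = oracle(t=lambda x: 2 * x, sample=lambda: random.uniform(0, 1))
```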
Then the aim of function approximation is:

Find a hypothesis h ∈ LH minimizing the expected error (or expected risk), defined as

  |E|
  Σ  P(xi) · Q(xi, h) ,
  i=1

where xi denotes the instance of example number i, and P(xi) the probability for xi to be drawn. So it is more important to make fewer mistakes on more probable examples than on less probable ones.

Q(xi, h) is a loss function, describing the quality of the prediction of hypothesis h on example xi.
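Given a finite sample, the expected error can be estimated by averaging the loss over the examples (the sample frequencies then stand in for P(xi)); a small sketch, with all names hypothetical:

```python
def empirical_risk(examples, h, loss):
    # Average loss of hypothesis h over the sample; since the examples are
    # drawn according to P, frequent instances automatically weigh more.
    return sum(loss(y, h(x)) for x, y in examples) / len(examples)

# Squared loss, one common choice of Q for regression.
def squared_loss(y, y_hat):
    return (y - y_hat) ** 2

examples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # noisy samples of t(x) = 2x
h = lambda x: 2 * x                              # candidate hypothesis
risk = empirical_risk(examples, h, squared_loss)  # small, since h fits well
```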

There are two subtasks, depending on the structure of Q:

  1. Classification: The examples are divided into a given set of classes. Usually the following loss function is applied:
     Q(x, h) := 1 if h(x) ≠ t(x)
                0 if h(x) = t(x)
     Note that, applying this measure, we arrive at the PAC version of the task of concept learning!
  2. Regression: A real-valued function shall be approximated. Q is often chosen as the squared difference between the predicted value h(x) and the correct value t(x):
     Q(x, h) := (t(x) − h(x))²
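Minimizing the estimated risk over a hypothesis language can be illustrated with a tiny, entirely hypothetical LH of linear functions; the hypothesis with the smallest average squared loss on the sample is selected:

```python
# Hypothetical finite hypothesis language LH: functions h(x) = a * x.
LH = [lambda x, a=a: a * x for a in (0.5, 1.0, 1.5, 2.0)]

examples = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1)]  # noisy samples of t(x) = 2x

def avg_squared_loss(h, examples):
    # Empirical counterpart of the expected risk with squared loss Q.
    return sum((y - h(x)) ** 2 for x, y in examples) / len(examples)

# Pick the hypothesis in LH minimizing the estimated risk.
best = min(LH, key=lambda h: avg_squared_loss(h, examples))
```

Here the slope a = 2.0 should win, since the examples were generated from t(x) = 2x.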

Methods Lazy Learning
Neural Nets
Support Vector Machine (SVM)
Top-Down Induction of Regression Trees
Theories Probably Approximately Correct (PAC-) Learning
Statistical Learning