Interesting Subgroups
Name |
Interesting Subgroups |
Description |
The task of finding interesting subgroups is related to the task of characterization.
We are looking for subsets of an instance space that have interesting properties.
This differs from the tasks of concept learning and function approximation:
we are not trying to find a hypothesis that globally describes the data
and allows us to predict unseen instances; instead, we focus on subsets of the data.
One possible application field is marketing, where finding favourable market segments
is a specific case of this task.
|
Definition |
Given
- an instance space X,
- a probability distribution D,
- a hypothesis space LH,
- an extension function ext: LH → 2^X, assigning each
hypothesis a set of instances,
- a sample S ⊆ X of the instance space, drawn according to D,
- and a quality function q : LH → ℝ.
We can define the learning task of Interesting Subgroups in two ways:
- Given X, S, LH, q and a threshold qmin ∈ ℝ, find all
hypotheses h ∈ LH with q(h) ≥ qmin.
- Given X, S, LH, q and k ∈ ℕ, find a set of
hypotheses H ⊆ LH with |H| = k such that there is no h ∈ H and
h' ∈ LH \ H with q(h') ≥ q(h).
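The two formulations can be sketched in Python as follows. This is only a minimal illustration that assumes the hypothesis space is finite and can be enumerated; the function names `subgroups_above_threshold` and `top_k_subgroups`, the hypothesis labels and the toy quality values are hypothetical and not part of the definition. If several hypotheses tie at the k-th position, the sketch simply keeps the ones encountered first.

```python
import heapq


def subgroups_above_threshold(hypotheses, q, q_min):
    """First formulation: every hypothesis h with q(h) >= q_min."""
    return [h for h in hypotheses if q(h) >= q_min]


def top_k_subgroups(hypotheses, q, k):
    """Second formulation: k hypotheses of highest quality, best first."""
    return heapq.nlargest(k, hypotheses, key=q)


# Toy quality values for four hypothetical hypotheses (illustrative only).
toy_quality = {"h1": 0.10, "h2": 0.35, "h3": 0.02, "h4": 0.20}
candidates = list(toy_quality)


def q(h):
    # Look up the precomputed toy quality of hypothesis h.
    return toy_quality[h]


above = subgroups_above_threshold(candidates, q, q_min=0.15)
best_two = top_k_subgroups(candidates, q, k=2)
```

With these toy values both calls select h2 and h4. In general the two formulations differ: the first may return any number of hypotheses, the second always returns exactly k (or fewer if the hypothesis space is smaller).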
So we either try to find all subgroups that are interesting enough, or the k most
interesting subgroups. The quality function q aims at measuring the
degree of interest assigned to a particular subgroup.
However, estimating a user's degree of interest is a complex task, so quality
functions are usually restricted to detecting statistical properties.
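As an illustration of a purely statistical quality function, the sketch below uses weighted relative accuracy, one common choice in subgroup discovery. The text above does not prescribe any particular measure, so this choice is an assumption, as are the boolean target attribute `buys`, the representation of a hypothesis as a dictionary of attribute = value conditions, and the toy sample. The extension is evaluated on the sample S only, not on the whole instance space X.

```python
def ext(h, sample):
    """Instances of the sample that satisfy every condition of hypothesis h."""
    return [x for x in sample if all(x[a] == v for a, v in h.items())]


def wracc(h, sample, target="buys"):
    """Weighted relative accuracy: coverage * (target share in subgroup - overall share)."""
    covered = ext(h, sample)
    if not covered:
        return 0.0  # an empty subgroup carries no statistical evidence
    coverage = len(covered) / len(sample)
    p_subgroup = sum(x[target] for x in covered) / len(covered)
    p_overall = sum(x[target] for x in sample) / len(sample)
    return coverage * (p_subgroup - p_overall)


# Hypothetical marketing sample: does a customer buy the product?
sample = [
    {"age": "young", "region": "north", "buys": True},
    {"age": "young", "region": "south", "buys": True},
    {"age": "old", "region": "north", "buys": False},
    {"age": "old", "region": "south", "buys": False},
]

q_young = wracc({"age": "young"}, sample)
q_north = wracc({"region": "north"}, sample)
```

In this toy sample the subgroup of young customers buys more often than the sample as a whole and therefore receives a positive quality, whereas the subgroup of northern customers has the same share of buyers as the whole sample and receives a quality of zero.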