Interesting Subgroups

Name Interesting Subgroups

Description
The task of finding interesting subgroups is related to the task of characterization. We are looking for subsets of an instance space that have interesting properties. This differs from the tasks of concept learning and function approximation: we are not trying to find a hypothesis that describes the data globally and can be used to predict unseen instances, but instead focus on subsets of the data.

One possible application field is marketing: finding favourable market segments is a specific case of this task.

Definition
Given

an instance space X,

a probability distribution D over X,

a hypothesis space L_H,

an extension function ext: L_H → 2^X, assigning each hypothesis a set of instances,

a sample S ⊆ X of the instance space, drawn according to D,

and a quality function q : L_H → ℝ.

We can define the learning task of Interesting Subgroups in two ways:

Given X, S, L_H, q and a threshold q_min ∈ ℝ, find all hypotheses h ∈ L_H with q(h) ≥ q_min.

Given X, S, L_H, q and k ∈ ℕ, find a set of hypotheses H ⊆ L_H with |H| = k such that there are no h ∈ H and h' ∈ L_H \ H with q(h') > q(h).

So we either try to find all subgroups that are interesting enough, or the k most interesting subgroups. The quality function q aims to measure the degree of interest assigned to a particular subgroup. However, estimating a user's degree of interest is a complex task, so quality functions are usually restricted to detecting statistical properties.
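
The two variants of the task can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not part of the definition above: the toy sample, the choice of L_H as conjunctions of attribute=value conditions, the use of weighted relative accuracy as the quality function q, and all function names. The extension is also computed on the sample S only, not on the whole instance space X.

```python
from itertools import combinations
import heapq

# Toy sample S (made-up data): each instance is (attribute dict, target flag).
S = [
    ({"age": "young", "region": "north"}, True),
    ({"age": "young", "region": "north"}, True),
    ({"age": "young", "region": "south"}, True),
    ({"age": "young", "region": "south"}, False),
    ({"age": "old", "region": "north"}, False),
    ({"age": "old", "region": "north"}, False),
    ({"age": "old", "region": "south"}, False),
    ({"age": "old", "region": "south"}, True),
]


def hypothesis_space(sample, max_conditions=2):
    """Assumed L_H: conjunctions of attribute=value conditions seen in the sample."""
    conditions = sorted({(a, v) for attrs, _ in sample for a, v in attrs.items()})
    space = []
    for size in range(1, max_conditions + 1):
        for combo in combinations(conditions, size):
            # Skip conjunctions that constrain the same attribute twice.
            if len({a for a, _ in combo}) == size:
                space.append(frozenset(combo))
    return space


def ext(h, sample):
    """Extension of h restricted to the sample: instances satisfying every condition."""
    return [(attrs, y) for attrs, y in sample if all(attrs.get(a) == v for a, v in h)]


def q(h, sample):
    """Assumed quality function: weighted relative accuracy,
    i.e. coverage times the gain in target share over the whole sample."""
    covered = ext(h, sample)
    if not covered:
        return 0.0  # an empty subgroup is treated as uninteresting
    p_subgroup = sum(y for _, y in covered) / len(covered)
    p_overall = sum(y for _, y in sample) / len(sample)
    return len(covered) / len(sample) * (p_subgroup - p_overall)


def subgroups_above(space, sample, q_min):
    """Variant 1: all hypotheses h with q(h) >= q_min."""
    return [h for h in space if q(h, sample) >= q_min]


def top_k_subgroups(space, sample, k):
    """Variant 2: k hypotheses of highest quality (ties broken arbitrarily)."""
    return heapq.nlargest(k, space, key=lambda h: q(h, sample))


L_H = hypothesis_space(S)
above = subgroups_above(L_H, S, q_min=0.1)
best = top_k_subgroups(L_H, S, k=2)
```

On this toy sample both variants select the same two subgroups, age=young and age=young with region=north, each with quality 0.125. The sketch enumerates L_H exhaustively, which is only feasible for very small hypothesis spaces.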