Description |
The probabilistic categorization tree, used e.g. by COBWEB is a special data structure, a tree where ...
- each node defines a cluster, which is a set of (unclassified) examples.
- each leaf represents exactly one example, described by an attribute-value tuple.
- each inner node represents a cluster, consisting of the union of the clusters of its successors.
The other way around: The set of successors partition the set of examples covered by
a node.
- the root node represents the cluster containing all examples stored in the tree.
- for every node n (except for the root node) the following probability is calculated and attached to the node:
Given, that an example from the cluster of n's successor has been drawn under a uniform distribution,
how probable is it, that this example is in n's cluster?
- for each attribute a, each possible value v of a, and each node n of the tree, the fraction
(again interpreted as a probability) of those examples in n's cluster is calculated, where attribute a has value v.
Such a tree is suited for describing a hierarchical clustering probabilistically.
This hypothesis language is highly related to the task to achieve
- high predictability of variable values, given a cluster, and
- high predictiveness of a cluster, given variable values,
by choosing an appropriate tree for a given set of examples.
|