Probabilistic Categorization Tree

Name Probabilistic Categorization Tree

Description
The probabilistic categorization tree, used e.g. by COBWEB is a special data structure, a tree where ...
each node defines a cluster, which is a set of (unclassified) examples.

each leaf represents exactly one example, described by an attribute-value tuple.

each inner node represents a cluster, consisting of the union of the clusters of its successors. The other way around: The set of successors partition the set of examples covered by a node.

the root node represents the cluster containing all examples stored in the tree.

for every node n (except for the root node) the following probability is calculated and attached to the node:
Given, that an example from the cluster of n's successor has been drawn under a uniform distribution, how probable is it, that this example is in n's cluster?

for each attribute a, each possible value v of a, and each node n of the tree, the fraction (again interpreted as a probability) of those examples in n's cluster is calculated, where attribute a has value v.

Such a tree is suited for describing a hierarchical clustering probabilistically. This hypothesis language is highly related to the task to achieve

high predictability of variable values, given a cluster, and

high predictiveness of a cluster, given variable values,

by choosing an appropriate tree for a given set of examples.

Methods COBWEB