Probabilistic Categorization Tree

Name Probabilistic Categorization Tree
Description

The probabilistic categorization tree, used e.g. by COBWEB is a special data structure, a tree where ...

  • each node defines a cluster, which is a set of (unclassified) examples.
  • each leaf represents exactly one example, described by an attribute-value tuple.
  • each inner node represents a cluster, consisting of the union of the clusters of its successors. The other way around: The set of successors partition the set of examples covered by a node.
  • the root node represents the cluster containing all examples stored in the tree.
  • for every node n (except for the root node) the following probability is calculated and attached to the node:
    Given, that an example from the cluster of n's successor has been drawn under a uniform distribution, how probable is it, that this example is in n's cluster?
  • for each attribute a, each possible value v of a, and each node n of the tree, the fraction (again interpreted as a probability) of those examples in n's cluster is calculated, where attribute a has value v.

Such a tree is suited for describing a hierarchical clustering probabilistically. This hypothesis language is highly related to the task to achieve

  • high predictability of variable values, given a cluster, and
  • high predictiveness of a cluster, given variable values,
by choosing an appropriate tree for a given set of examples.

Methods COBWEB