As in COBWEB, AUTOCLASS [Cheeseman et al., 1988], and other systems [Anderson & Matessa, 1991], we will assume that clusters, $C_k$, are described probabilistically: each variable value has an associated conditional probability, $P(A_i = V_{ij} \mid C_k)$, which reflects the proportion of observations in $C_k$ that exhibit the value, $V_{ij}$, along variable $A_i$.
In fact, each variable value, $V_{ij}$, is associated with the number of observations in the cluster having that value; probabilities are computed `on demand' for purposes of evaluation.
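As a minimal sketch of this counts-first representation (the class and method names here are our own illustration, not part of any of the cited systems), a cluster can store per-variable value counts and derive conditional probabilities only when asked:

```python
from collections import defaultdict

class Cluster:
    """A probabilistically described cluster: stores, for each variable,
    the count of covered observations exhibiting each value; conditional
    probabilities are derived from these counts on demand."""

    def __init__(self):
        self.n = 0                                           # observations covered
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[variable][value]

    def add(self, observation):
        """Incorporate one observation given as a {variable: value} dict."""
        self.n += 1
        for var, value in observation.items():
            self.counts[var][value] += 1

    def p(self, var, value):
        """P(var = value | this cluster), computed on demand."""
        return self.counts[var][value] / self.n if self.n else 0.0

# Two hypothetical observations over the Figure 1 variables:
c = Cluster()
c.add({"Size": "small", "Color": "red", "Shape": "round"})
c.add({"Size": "small", "Color": "blue", "Shape": "square"})
```

Storing counts rather than probabilities makes incremental updates cheap: incorporating an observation is a handful of increments, and no stored probability ever goes stale.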
Figure 1: A probabilistic categorization tree.
Probabilistically-described clusters arranged in a tree
form a hierarchical
clustering known as a probabilistic categorization tree. Each set of
sibling clusters partitions the observations covered by the common parent.
There is a single root cluster, identical in structure to other clusters, but covering all observations and containing the frequency information necessary to compute $P(A_i = V_{ij} \mid C_k)$'s as required by category utility. Figure 1 gives an example of a probabilistic categorization tree (i.e., a hierarchical clustering) in which each node is a cluster of observations summarized probabilistically.
Observations are at leaves and are described by three variables:
Size, Color, and Shape.
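The tree structure above can be sketched as follows (a rough illustration only: the `Node` class and the explicit child-index path used to place each observation are our own devices, not the insertion procedure of any cited system). Every cluster on a root-to-leaf path covers the incorporated observation, so counts are updated at each level; sibling children partition the observations covered by their parent, and the root covers all observations:

```python
class Node:
    """One cluster in a probabilistic categorization tree: per-variable
    value counts plus children that partition the covered observations."""

    def __init__(self):
        self.n = 0          # observations this cluster covers
        self.counts = {}    # counts[variable][value]
        self.children = []  # sibling clusters partitioning this cluster

    def incorporate(self, obs, path):
        """Add an observation along a root-to-leaf path; `path` lists the
        child index to descend to at each internal node."""
        self.n += 1
        for var, value in obs.items():
            vals = self.counts.setdefault(var, {})
            vals[value] = vals.get(value, 0) + 1
        if path:
            i, rest = path[0], path[1:]
            while len(self.children) <= i:
                self.children.append(Node())
            self.children[i].incorporate(obs, rest)

    def p(self, var, value):
        """P(var = value | this cluster), computed on demand."""
        return self.counts.get(var, {}).get(value, 0) / self.n if self.n else 0.0

# Three hypothetical observations over Size, Color, and Shape,
# routed to two leaf clusters under the root:
root = Node()
root.incorporate({"Size": "small", "Color": "red", "Shape": "round"}, [0])
root.incorporate({"Size": "large", "Color": "red", "Shape": "square"}, [1])
root.incorporate({"Size": "small", "Color": "blue", "Shape": "round"}, [0])
```

Note the partition invariant: the children's coverage counts sum to the parent's, and the root, being structurally identical to any other cluster, can answer the same on-demand probability queries over all observations.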