Clustering (Unsupervised Learning)

Name: Clustering (Unsupervised Learning)
Description:

The task of clustering is to structure a given set of unclassified instances of an example language by creating concepts, based on similarities found in the training data. The main difference from supervised learning is that there is neither a target predicate nor an oracle dividing the instances of the training set into categories. The categories are formed by the learner itself.


Given: A set of (unclassified) instances of an example language LE.

Find: A set of concepts that covers all given examples, such that
  • the similarity between examples of the same concept is maximized,
  • the similarity between examples of different concepts is minimized.
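
The two criteria above can be made concrete with a small sketch. It assumes that instances are numeric tuples and that similarity decreases with Euclidean distance; the names similarity, mean_within and mean_between are hypothetical helpers chosen for illustration, and any other similarity measure could be substituted.

```python
from itertools import combinations
from math import dist


def similarity(a, b):
    # Assumed similarity measure: 1 for identical points, approaching 0
    # as the Euclidean distance grows.
    return 1.0 / (1.0 + dist(a, b))


def mean_within(clusters):
    # Average similarity over all pairs of instances sharing a cluster.
    pairs = [(a, b) for c in clusters for a, b in combinations(c, 2)]
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def mean_between(clusters):
    # Average similarity over all pairs of instances from different clusters.
    pairs = [(a, b)
             for c1, c2 in combinations(clusters, 2)
             for a in c1 for b in c2]
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Two candidate partitions of the same four instances.
good = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
bad = [[(0, 0), (5, 5)], [(0, 1), (5, 6)]]

for name, clusters in (("good", good), ("bad", bad)):
    print(name, mean_within(clusters), mean_between(clusters))
```

Under these assumptions the partition named good scores higher on within-cluster similarity and lower on between-cluster similarity than the one named bad, which is the ordering the task definition asks a learner to prefer.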

Conceptual Clustering

The setting above just aims at finding subsets of similar examples. Conceptual Clustering extends this task to finding intensional descriptions of these subsets. This can be seen as a second learning step, although it will not necessarily be split from the first one:
  1. Partition the example set, optimizing a measure based on similarity of the instances within the same subsets.
  2. Perform concept learning (supervised learning) for each of the found subsets, to turn the extensional description into an intensional one.
Note that the second step allows for prediction of yet unseen instances of the example language.
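
The two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the Star method or COBWEB: instances are assumed to be attribute-value dictionaries, step 1 uses a simple greedy grouping that compares each instance with the first member of every existing cluster, and step 2 uses the conjunction of attribute values shared by all members as the intensional description. The function names and the min_shared threshold are hypothetical.

```python
def shared(a, b):
    # Number of attributes on which two instances agree.
    return sum(1 for k in a if a[k] == b.get(k))


def partition(instances, min_shared):
    # Step 1 (assumed greedy scheme): put each instance into the first
    # cluster whose first member agrees on at least min_shared attributes,
    # otherwise open a new cluster.
    clusters = []
    for inst in instances:
        for cluster in clusters:
            if shared(cluster[0], inst) >= min_shared:
                cluster.append(inst)
                break
        else:
            clusters.append([inst])
    return clusters


def describe(cluster):
    # Step 2: intensional description as the attribute values common to
    # every member of the cluster.
    first = cluster[0]
    return {k: v for k, v in first.items()
            if all(m.get(k) == v for m in cluster)}


def covers(description, instance):
    # An instance belongs to a concept if it satisfies every condition.
    return all(instance.get(k) == v for k, v in description.items())


animals = [
    {"cover": "feathers", "legs": 2, "flies": True},
    {"cover": "feathers", "legs": 2, "flies": False},
    {"cover": "fur", "legs": 4, "flies": False},
    {"cover": "fur", "legs": 4, "flies": False},
]

clusters = partition(animals, min_shared=2)
descriptions = [describe(c) for c in clusters]

# Prediction for an instance that was not part of the example set.
unseen = {"cover": "feathers", "legs": 2, "flies": True, "size": "small"}
matches = [covers(d, unseen) for d in descriptions]
print(descriptions, matches)
```

The extensional result of step 1 is just a list of subsets; only the descriptions produced in step 2 can be applied to the unseen instance, which is why the second step is what enables prediction.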

One method addressing the task of Conceptual Clustering is the Star method. COBWEB is an example of a clustering algorithm that does not induce an intensional description of the found clusters, but organizes them in a tree structure.

Methods: COBWEB, Star