
Information-theoretic k-means clustering

Title Information-theoretic k-means clustering
Description

The objective of this thesis is to reimplement the text clustering algorithms from the following two papers

Banerjee, A., Dhillon, I. S., Ghosh, J., & Sra, S. (2005). Clustering on the Unit Hypersphere using von Mises-Fisher Distributions. Journal of Machine Learning Research, 6(9).

Wu, J. (2012). Information-Theoretic K-means for Text Clustering. In Advances in K-means Clustering (pp. 69-98). Springer, Berlin, Heidelberg.

and to compare them with existing algorithms such as spherical k-means.
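To illustrate the kind of assignment step an information-theoretic k-means variant needs, here is a minimal sketch. It assumes that documents and centroids are represented as term distributions and compared by Kullback-Leibler divergence. The class InfoKMeansSketch and its methods are hypothetical illustrations, not code from the papers and not part of the ELKI API.

```java
/** Hypothetical sketch (not ELKI API): KL-divergence-based cluster assignment. */
final class InfoKMeansSketch {

  /** Turn non-negative term counts into a probability distribution. */
  static double[] normalize(double[] counts) {
    double sum = 0;
    for (double c : counts) {
      sum += c;
    }
    double[] p = new double[counts.length];
    if (sum <= 0) {
      return p; // empty document: all-zero vector
    }
    for (int i = 0; i < counts.length; i++) {
      p[i] = counts[i] / sum;
    }
    return p;
  }

  /**
   * KL(p || q) using the natural logarithm. Terms with p[i] == 0 contribute 0;
   * the result is infinite if p[i] > 0 while q[i] == 0.
   */
  static double klDivergence(double[] p, double[] q) {
    double kl = 0;
    for (int i = 0; i < p.length; i++) {
      if (p[i] > 0) {
        if (q[i] <= 0) {
          return Double.POSITIVE_INFINITY;
        }
        kl += p[i] * Math.log(p[i] / q[i]);
      }
    }
    return kl;
  }

  /** Index of the centroid with the smallest KL divergence from p (first one wins on ties). */
  static int assign(double[] p, double[][] centroids) {
    int best = 0;
    double bestKl = Double.POSITIVE_INFINITY;
    for (int c = 0; c < centroids.length; c++) {
      double kl = klDivergence(p, centroids[c]);
      if (kl < bestKl) {
        bestKl = kl;
        best = c;
      }
    }
    return best;
  }
}
```

Note that the divergence becomes infinite whenever a centroid has a zero entry for a term that occurs in the document, which happens easily with sparse text data; a real implementation has to deal with this case.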

The implementation is to be done in the ELKI data mining framework (Java), which already provides spherical k-means as a baseline for comparison, as well as evaluation measures.
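For orientation, the baseline can be sketched in plain Java as follows. This is a simplified, self-contained illustration of spherical k-means (unit-length vectors, assignment by largest cosine similarity, centroids re-normalized after each update). The class SphericalKMeansSketch is hypothetical and does not reflect how ELKI structures its implementation.

```java
import java.util.Arrays;

/** Hypothetical sketch (not ELKI API): basic spherical k-means on dense vectors. */
final class SphericalKMeansSketch {

  /** Return a copy of v scaled to unit Euclidean length (zero vectors stay zero). */
  static double[] unit(double[] v) {
    double norm = Math.sqrt(dot(v, v));
    double[] u = v.clone();
    if (norm > 0) {
      for (int i = 0; i < u.length; i++) {
        u[i] /= norm;
      }
    }
    return u;
  }

  /** Dot product; for unit vectors this equals the cosine similarity. */
  static double dot(double[] a, double[] b) {
    double s = 0;
    for (int i = 0; i < a.length; i++) {
      s += a[i] * b[i];
    }
    return s;
  }

  /** Run spherical k-means from the given initial centroids; returns cluster labels. */
  static int[] cluster(double[][] data, double[][] initialCentroids, int maxIter) {
    int n = data.length, k = initialCentroids.length, d = data[0].length;
    double[][] x = new double[n][];
    for (int i = 0; i < n; i++) {
      x[i] = unit(data[i]);
    }
    double[][] c = new double[k][];
    for (int j = 0; j < k; j++) {
      c[j] = unit(initialCentroids[j]);
    }
    int[] labels = new int[n];
    Arrays.fill(labels, -1);
    for (int iter = 0; iter < maxIter; iter++) {
      // Assignment step: pick the centroid with the largest cosine similarity.
      boolean changed = false;
      for (int i = 0; i < n; i++) {
        int best = 0;
        for (int j = 1; j < k; j++) {
          if (dot(x[i], c[j]) > dot(x[i], c[best])) {
            best = j;
          }
        }
        if (best != labels[i]) {
          labels[i] = best;
          changed = true;
        }
      }
      if (!changed) {
        break; // converged
      }
      // Update step: sum the members of each cluster and project back to the unit sphere.
      double[][] sums = new double[k][d];
      int[] counts = new int[k];
      for (int i = 0; i < n; i++) {
        counts[labels[i]]++;
        for (int t = 0; t < d; t++) {
          sums[labels[i]][t] += x[i][t];
        }
      }
      for (int j = 0; j < k; j++) {
        if (counts[j] > 0) {
          c[j] = unit(sums[j]); // empty clusters keep their previous centroid
        }
      }
    }
    return labels;
  }
}
```

The sketch uses dense arrays for readability; text data is typically sparse, so an actual implementation would iterate only over the nonzero entries.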

A careful experimental evaluation is expected that studies the strengths and weaknesses of the algorithms and verifies the claims made in the papers above.

Qualification

This is only suitable as a Bachelor thesis topic.

Good Java programming skills

Good understanding of data mining and machine learning

Good statistical knowledge

Thesis type Bachelor thesis
Second Tutor Schubert, Erich
Professor Schubert, Erich
Status Reserved