Title | Information-theoretic k-means clustering |
---|---|
Description |
The objective of this thesis is to reimplement the text clustering algorithms from two papers, Banerjee, A., Dhillon, I. S., Ghosh, J., Sra, S., & Ridgeway, G. (2005). Clustering on the Unit Hypersphere using von Mises-Fisher Distributions. Journal of Machine Learning Research, 6(9); and Wu, J. (2012). Information-Theoretic K-means for Text Clustering. In Advances in K-means Clustering (pp. 69-98). Springer, Berlin, Heidelberg. These algorithms are then to be compared with existing algorithms such as spherical k-means. The implementation is to be done in the ELKI data mining framework (Java), which already provides spherical k-means as a baseline for comparison, as well as evaluation measures. A careful experimental evaluation is expected to study the strengths and weaknesses of the algorithms and to verify the claims made in the above papers. |
Qualification |
This topic is only suitable for a Bachelor thesis. Requirements: good Java programming skills; a good understanding of data mining and machine learning; good statistical knowledge. |
Thesis type | Bachelor thesis |
Second Tutor | Schubert, Erich |
Professor | Schubert, Erich |
Status | Reserved |