Data Preprocessing

Understanding and transforming data so that it can serve as input to a machine learning algorithm is part of the KDD (knowledge discovery in databases) process. Intelligent methods can already be applied in this phase, for example by learning how to replace missing values. Since the quality of the learning result depends crucially on an adequate representation of the data, this phase is as important as the mining step itself.
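
As an illustration of learning how to replace missing values, the following minimal sketch fits a least-squares line on the complete records and uses it to predict the missing entries. It is not taken from MiningMart or RapidMiner. The function names `fit_line` and `impute`, the attributes (age, income), and the toy data are all hypothetical, and the sketch assumes a single numeric predictor that takes at least two distinct values among the complete records.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b on complete (x, y) pairs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx  # assumes the x values are not all identical
    return a, my - a * mx


def impute(rows):
    """Replace missing y values (None) with predictions learned from complete rows."""
    complete = [(x, y) for x, y in rows if y is not None]
    a, b = fit_line([x for x, _ in complete], [y for _, y in complete])
    return [(x, y if y is not None else a * x + b) for x, y in rows]


# Hypothetical toy data: (age, income) records, one income is missing.
rows = [(20, 2000.0), (30, 3000.0), (40, None), (50, 5000.0)]
filled = impute(rows)
print(filled)
```

Because the complete records in this toy data lie on a straight line, the missing income is filled with a value of roughly 4000. On real data the prediction would only be as good as the learned model, which is why the choice of imputation method is itself part of the preprocessing design.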

Related Topics

Feature Extraction
Feature Set Transformations


SFB 531 Computational Intelligence


MiningMart system
RapidMiner (YALE)
RapidMiner HDF5 Extension


Euler, Timm
Klinkenberg, Ralf
Köpcke, Hanna
Mierswa, Ingo
Scholz, Martin

Past Master Thesis


Euler/2006a Timm Euler. Data Mining mit MiningMart. In Programmieren unter Linux, No. 1, pages 56--60, 2006.
Euler/2006b Timm Euler. Modeling Preparation for Data Mining Processes. In Journal of Telecommunications and Information Technology, No. 4, pages 81--87, 2006.
Euler/2005a Timm Euler. Publishing Operational Models of Data Mining Case Studies. In Proceedings of the Workshop on Data Mining Case Studies at the 5th IEEE International Conference on Data Mining (ICDM), pages 99--106, Houston, Texas, USA, 2005.
Euler/2005d Timm Euler. Modelling Data Mining Processes on a Conceptual Level. In Proceedings of the 5th International Conference on Decision Support for Telecommunications and Information Society, Warsaw, Poland, 2005.
Morik/Koepcke/2004a Morik, Katharina and Köpcke, Hanna. Analysing Customer Churn in Insurance Data - A Case Study. In Jean-François Boulicaut and Floriana Esposito and Fosca Giannotti and Dino Pedreschi (editors), PKDD '04: Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Vol. 3202, pages 325--336, New York, NY, USA, Springer, 2004.
Morik/Scholz/2004a Morik, Katharina and Scholz, Martin. The MiningMart Approach to Knowledge Discovery in Databases. In Ning Zhong and Jiming Liu (editors), Intelligent Technologies for Information Analysis, pages 47--65, Springer, 2004.
Morik/2000a Morik, Katharina. The Representation Race - Preprocessing for Handling Time Phenomena. In Ramon López de Mántaras and Enric Plaza (editors), Proceedings of the 11th European Conference on Machine Learning (ECML), Vol. 1810, pages 4--19, Berlin, Heidelberg, New York, Springer, 2000.