Data Cleansing

Description:

To improve the quality of the results that machine learning techniques yield on large datasets, the data should first be inspected, and any corrupt or misleading parts removed or corrected.

Typical problems are contradictory or incomplete information. Such noise confuses learning algorithms: learning from noisy data is known to be much harder than learning from correct data. Please refer to the case studies for a more detailed discussion of data cleansing.
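
The two problems named above can be illustrated with a minimal sketch. It is not a definitive procedure. It assumes that each record is a dictionary with the same keys and hashable values, that a missing value appears as None or an empty string, and that the target is stored under a hypothetical "label" key. The function name cleanse is likewise made up for illustration. The sketch removes offending records; correcting them instead would require domain knowledge.

```python
from collections import defaultdict


def cleanse(records, label_key="label"):
    """Drop incomplete records, then drop records whose labels contradict.

    Assumptions (illustrative only): every record is a dict with the same
    keys, values are hashable, and a missing value is None or "".
    """
    # Step 1: remove incomplete records (any field missing).
    complete = [
        r for r in records
        if all(v is not None and v != "" for v in r.values())
    ]

    def features(record):
        # Everything except the label, in a hashable, order-independent form.
        return tuple(sorted((k, v) for k, v in record.items() if k != label_key))

    # Step 2: collect the set of labels seen for each feature combination.
    labels_seen = defaultdict(set)
    for r in complete:
        labels_seen[features(r)].add(r[label_key])

    # Step 3: keep only records whose feature combination has a single label.
    return [r for r in complete if len(labels_seen[features(r)]) == 1]


if __name__ == "__main__":
    readings = [
        {"temp": 20, "humidity": 30, "label": "ok"},
        {"temp": 20, "humidity": 30, "label": "fail"},   # contradicts the first
        {"temp": 25, "humidity": None, "label": "ok"},   # incomplete
        {"temp": 30, "humidity": 40, "label": "fail"},
    ]
    print(cleanse(readings))
```

Dropping every record in a contradictory group is a conservative choice. Depending on the dataset, keeping the majority label or flagging the group for manual review may be more appropriate.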