Feature Selection

Description:

Feature selection aims to focus on those attributes of a dataset that are relevant for the learning task. Irrelevant attributes always carry the risk of confusing the learner with random correlations, while providing no useful information.

The degree to which irrelevant and redundant attributes are harmful depends on the learning method selected. While algorithms like k-Nearest Neighbor are known to perform significantly worse in the presence of such attributes, ID3 automatically performs a kind of feature selection by choosing, at each node, the test with the highest information gain.
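
The information-gain criterion mentioned above can be illustrated with a minimal sketch. It is not the ID3 implementation itself, only the attribute-scoring step, and it assumes purely nominal attributes. The function names (entropy, information_gain) and the toy data are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attr_index):
    """Entropy reduction obtained by splitting on one nominal attribute."""
    # Group the class labels by the value the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


# Hypothetical toy data: attribute 0 determines the class,
# attribute 1 is irrelevant to it.
rows = [("sunny", "a"), ("sunny", "b"), ("rainy", "a"), ("rainy", "b")]
labels = ["yes", "yes", "no", "no"]

gains = [information_gain(rows, labels, i) for i in range(2)]
best = max(range(2), key=lambda i: gains[i])
```

In this toy data the relevant attribute receives the full gain and the irrelevant one receives none, so a learner that splits on the highest-gain test would never use the irrelevant attribute. On real data, irrelevant attributes can still obtain a small positive gain by chance, which is the random-correlation risk described above.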

For a more detailed discussion of feature selection, please refer to our case studies, especially those on data design and data cleansing.

Publications:

Kohavi/John/97a: Wrappers for feature subset selection
Liu/Motoda/98b: Feature Selection for Knowledge Discovery and Data Mining