The Caravan Case: Method Selection
Description |
Running a learning technique
Many different learning techniques, such as logistic regression, neural networks and
decision trees, are candidates for a specific task
(e.g. classification). Some techniques deliver a binary
classification (class label: prospective caravan policy owner or not; e.g. decision trees); others
come up with continuous predictions (a continuous number indicating the chances of caravan policy
ownership; e.g. logistic regression).
Neural networks can be designed for either type of result. Here, we designed the network to
deliver binary results.
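The difference between the two types of result can be illustrated with a minimal sketch. It assumes the standard logistic function for the continuous prediction and a cut-off of 0.5 for turning it into a class label; the cut-off and the example scores are illustrative assumptions, not values from the case.

```python
import math


def logistic(z):
    """Map a linear score z to a continuous prediction in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def to_label(probability, threshold=0.5):
    """Turn a continuous prediction into a binary class label.

    The threshold of 0.5 is an assumption made for this sketch.
    """
    return 1 if probability >= threshold else 0


# Hypothetical linear scores for three customers (illustrative values only).
linear_scores = [-2.0, 0.0, 1.5]
probabilities = [logistic(z) for z in linear_scores]
labels = [to_label(p) for p in probabilities]
```

A continuous prediction allows cases to be ranked, whereas a class label only separates them into two groups.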
To get a better picture of the suitability of various techniques,
a number of models were actually
built: logistic regression (with the logit link function and
forward feature selection), a neural network, and decision trees. The experiments
were organized as follows:
- the original data set was split into train and test
sets (with a 6:4 ratio and the same response rate),
- models developed on the training set were
applied to the test set, and the top 20% of cases were selected,
- the percentage of correctly predicted
"positive" cases (within the selection) was determined
(from now on we will refer to this as model accuracy),
- for each technique at least 10 experimental repeats were
performed; the mean accuracy and standard deviation were calculated
and presented as the result.
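The experimental setup above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used in the case. The function names and the `fit` interface (a hypothetical callable that takes training features and labels and returns a scoring function) are assumptions, and the data in the usage part is synthetic.

```python
import random
import statistics


def stratified_split(labels, train_fraction=0.6, rng=None):
    """Split case indices 6:4 while keeping the response rate equal in both sets."""
    rng = rng or random.Random()
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return train, test


def top_fraction_accuracy(scores, labels, fraction=0.2):
    """Share of true positives among the highest-scoring `fraction` of cases."""
    n_selected = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:n_selected]) / n_selected


def run_experiments(features, labels, fit, repeats=10, seed=0):
    """Repeat split, fit, score and evaluate; return mean and standard deviation."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        train, test = stratified_split(labels, 0.6, rng)
        score = fit([features[i] for i in train], [labels[i] for i in train])
        test_scores = [score(features[i]) for i in test]
        test_labels = [labels[i] for i in test]
        results.append(top_fraction_accuracy(test_scores, test_labels, 0.2))
    return statistics.mean(results), statistics.stdev(results)


def fit_first_feature(train_features, train_labels):
    """Hypothetical stand-in learner: scores a case by its first feature."""
    return lambda row: row[0]


if __name__ == "__main__":
    # Synthetic toy data (not the caravan data): 10 positives among 100 cases.
    toy_labels = [1] * 10 + [0] * 90
    toy_features = [[float(y)] for y in toy_labels]
    print(run_experiments(toy_features, toy_labels, fit_first_feature))
```

Any learner that returns a ranking score can be plugged in as `fit`. Note that "model accuracy" in this sense is the share of true positives within the selected 20%, not the overall classification accuracy.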
The results of most methods were similar: 11% to 13% accuracy. Logistic regression
performed better than the neural network, which in turn performed better than decision trees.
Classification
First of all, the detailed knowledge representation for the classification
task is subordinate to predictive accuracy;
performance is what counts here. As the data inspection has not given support
for preferring one representation over another, and because one goal of the caravan
policy pilot is to get acquainted with data mining, it is decided to follow an
experimental approach.
Several learning techniques have been used for these tasks. The following techniques will
be discussed:
Characterization
In the case of characterization, the representation is of major importance. A marketeer
must understand the outcome, and must also be able to use the outcome to define
the necessary actions. Therefore, a (semi-)linguistic representation in the
form of rules is preferred.
The following learning techniques have been used for these tasks:
Case Study |
The Caravan Policy Case