The Caravan Case: Method Selection

Name The Caravan Case: Method Selection

Description

Running a learning technique

Many different learning techniques, such as logistic regression, neural networks and decision trees, are candidates for a specific task (e.g. classification). Some techniques deliver a binary classification (a class label: prospective caravan policy owner or not; e.g. decision trees), while others produce continuous predictions (a number indicating the chance of caravan policy ownership; e.g. logistic regression). Neural networks can be designed for either type of result. Here, we designed the network to deliver binary results.
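The difference between the two kinds of output can be illustrated with a minimal sketch. It assumes a logistic model with made-up weights and a 0.5 cut-off; the function names, weights and threshold are hypothetical and are not taken from the caravan pilot.

```python
import math


def ownership_probability(weights, bias, features):
    """Continuous prediction: logistic function applied to a linear score."""
    linear_score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-linear_score))


def to_class_label(probability, threshold=0.5):
    """Binary prediction: 1 = prospective caravan policy owner, 0 = not."""
    return 1 if probability >= threshold else 0


# Hypothetical customer with two numeric features (illustration only).
p = ownership_probability(weights=[0.8, -0.3], bias=-1.0, features=[2.0, 1.0])
print(p, to_class_label(p))
```

A continuous prediction can always be turned into a class label by choosing a threshold, but not the other way round, which matters when cases have to be ranked.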

To get a better picture of the suitability of the various techniques, a number of models were actually built: logistic regression (with the logit link function and forward feature selection), a neural network, and decision trees. The experiments were organized as follows:

the original data set was split into train and test sets (with a 6:4 ratio and the same response rate),

models developed on the training set were applied to the test set, and the top 20% of cases were selected,

the percentage of correctly predicted "positive" cases within the selection was determined (from now on referred to as model accuracy).

For each technique, at least 10 experimental repeats were performed. The mean accuracy and its standard deviation were calculated and presented as the result.
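The experimental procedure described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code used in the pilot: the helper names (stratified_split, top_fraction_accuracy, run_experiment), the fit interface and the toy data are all hypothetical, and any learning technique that returns a scoring function could be plugged in.

```python
import random
import statistics


def stratified_split(labels, train_fraction, rng):
    """Split indices into train/test parts with the same response rate."""
    train_idx, test_idx = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return train_idx, test_idx


def top_fraction_accuracy(scores, labels, fraction=0.2):
    """Share of true positives among the highest-scored `fraction` of cases."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


def run_experiment(features, labels, fit, repeats=10, seed=0):
    """Repeat the 6:4 split and return (mean accuracy, standard deviation).

    `fit(train_features, train_labels)` is assumed to return a function
    that maps one feature row to a score (higher = more likely positive).
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        train_idx, test_idx = stratified_split(labels, 0.6, rng)
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        scores = [model(features[i]) for i in test_idx]
        test_labels = [labels[i] for i in test_idx]
        accuracies.append(top_fraction_accuracy(scores, test_labels))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Toy data (made up): 10 positives, 90 negatives, one noisy feature.
toy_rng = random.Random(1)
toy_labels = [1] * 10 + [0] * 90
toy_features = [[y + toy_rng.gauss(0.0, 1.0)] for y in toy_labels]


def fit_first_feature(train_features, train_labels):
    """Placeholder 'model' that scores a case by its first feature."""
    return lambda row: row[0]


print(run_experiment(toy_features, toy_labels, fit_first_feature))
```

Note that the top 20% selection needs a ranking of the test cases, so a technique that only outputs class labels has to supply some ordering (for example a confidence value) before this measure can be computed.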

The results of most methods were similar: 11% to 13% accuracy. Logistic regression performed better than the neural network, which in turn performed better than decision trees.

Classification
First of all, the detailed knowledge representation for the classification task is subordinate to predictive accuracy; performance is what counts here. As the data inspection has not given support for preferring one representation over another, and because one goal of the caravan policy pilot is to get acquainted with data mining, it is decided to follow an experimental approach.

Several learning techniques have been used for these tasks. The following techniques will be discussed:

Logistic Regression

Neural Networks

Decision Trees

Characterization
In the case of characterization, the representation is of major importance. A marketeer must understand the outcome, and must also be able to use the outcome to define the necessary actions. Therefore, a (semi-)linguistic representation in the form of rules is preferred.

The following learning techniques have been used for these tasks:

Decision Trees

Case Study The Caravan Policy Case