Simple dataset
First, the decision tree learner is run on the binary dataset obtained from the data cleansing. Data transformation is not necessary, because the decision tree learner automatically constructs intervals that can be used for tests in the decision tree. The first attempt yields a rather poor result. A typical predictive result for this dataset is shown below:
             predicted   predicted
             caravan     no_caravan   total
caravan            3         240        243
no_caravan         3        3754       3757
total              6        3994       4000
Due to the under-representation of policy holders in the original data set, and the low density of policy
holders, the decision tree learner can hardly improve on the default accuracy of 0.94.
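The class proportions above make this concrete. A minimal sketch (not part of the original experiment) of why plain accuracy is a poor target here:

```python
# With 243 policy holders out of 4000 records, always predicting the
# majority class "no_caravan" already scores about 0.94 accuracy, so a
# tree optimising accuracy has little incentive to find the holders.
caravan, no_caravan = 243, 3757
total = caravan + no_caravan
default_accuracy = no_caravan / total
print(f"default accuracy: {default_accuracy:.3f}")  # -> 0.939
```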
Balancing the dataset
This problem is no different from the classification task solved before. In principle, "boosted" decision tree learning would provide a satisfactory solution for it. However, as our aim is to deliver a comprehensible solution that is accessible to a marketeer, boosting does not suffice. Instead, an effect similar to boosting is achieved artificially, by including multiple copies of policy holder records in the data set. This, by the way, is equivalent to penalising the learning algorithm more for misclassifying a policy holder than for misclassifying a non-policy holder. In this case, we copy the policy holder records so that policy holders appear as frequently in the data set as the non-policy holders.
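The balancing step can be sketched as follows; the function name and the toy records are illustrative, not taken from the original toolkit:

```python
import random

def balance_by_oversampling(records, is_positive, seed=0):
    """Duplicate minority-class records until both classes are equally
    frequent -- a simple stand-in for the balancing step described above."""
    rng = random.Random(seed)
    pos = [r for r in records if is_positive(r)]
    neg = [r for r in records if not is_positive(r)]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Add randomly chosen copies of minority records until the counts match.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return records + extra

# Toy usage: 2 holders vs 6 non-holders -> 6 vs 6 after balancing.
data = [("caravan", i) for i in range(2)] + [("no_caravan", i) for i in range(6)]
balanced = balance_by_oversampling(data, lambda r: r[0] == "caravan")
print(len(balanced))  # -> 12
```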
With this new dataset, the decision tree learner
finds a better decision tree.
             predicted   predicted
             caravan     no_caravan   total
caravan          176          67        243
no_caravan      1461        2296       3757
total           1637        2363       4000
From this matrix we can see that approximately 3/4 of the policy holders (176 of 243) are correctly classified by the decision tree as generated on the balanced dataset. Simultaneously, about 40% of the non-holders (1461 of 3757) are classified incorrectly. As our target was to recognise policy holders, and to infer potential holders within the non-holders group, this seems an appropriate result.
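These per-class rates follow directly from the matrix; a small check, using the counts from the balanced-data result above:

```python
# Cells of the balanced-data confusion matrix (actual class per row).
tp, fn = 176, 67      # actual caravan holders: predicted caravan / no_caravan
fp, tn = 1461, 2296   # actual non-holders: predicted caravan / no_caravan

recall = tp / (tp + fn)          # fraction of holders correctly recognised
false_pos_rate = fp / (fp + tn)  # fraction of non-holders flagged as prospects
print(f"recall: {recall:.2f}, false positive rate: {false_pos_rate:.2f}")
# -> recall: 0.72, false positive rate: 0.39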
Results
The resulting decision tree is almost trivial (see figure below). It consists of a single decision that distinguishes predicted policy holdership from non-holdership: attribute A5 (does the client have a car policy or not). In other words: to determine whether a client has a good chance of buying a caravan policy, the only relevant predictor is whether he has a car policy. If so, he is a prospect for a caravan policy. If not, offering a caravan policy may not be a good idea.
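As a sketch, the entire tree reduces to a one-line classifier. The encoding of A5 as f5 with a split at 0.5 is taken from the association rules listed later in this report; treat the exact threshold as illustrative:

```python
# The single-split decision tree described above: the only test is
# whether the client holds a car policy (f5 > 0.5).
def predict_caravan_prospect(f5):
    """Return True if the client is a prospect for a caravan policy."""
    return f5 > 0.5

print(predict_caravan_prospect(1.0))  # -> True
print(predict_caravan_prospect(0.0))  # -> False
```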
This result is obtained after boosting the dataset to double the original size. When boosting the dataset iteratively, generating decision trees after each boosting cycle, one may find more structured rules, with more decision criteria. A glimpse of how this may look is obtained by looking at the association rules as generated from the balanced data. Association rules predict the occurrence of one class. For the extended data set, focussing on the caravan policy holders, the following rules are generated.
Association rules
Association rules are a means to describe the common features of a group of concepts. Instead of distinguishing classes, they find commonalities within classes. As illustrated in the thumbnail, a single association rule describes part of a class in the form of a hypercube. For the caravan policy owners, several association rules were found:
- to be filled in
- rule: 2 predictive-value: 97.2% coverage: 63.0%
(f5 <= 0.5)
- rule: 3 predictive-value: 89.3% coverage: 37.0%
(f5 > 0.5)
- rule: 4 predictive-value: 91.5% coverage: 31.4%
(f4 <= 0.5) & (f5 > 0.5)
- rule: 6 predictive-value: 92.3% coverage: 28.9%
(f4 <= 0.5) & (f5 > 0.5) & (f9 <= 0.5)
- rule: 8 predictive-value: 92.6% coverage: 28.7%
(f4 <= 0.5) & (f5 > 0.5) & (f7 <= 0.5) & (f9 <= 0.5)
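One common reading of the two statistics attached to each rule is: predictive value is the fraction of records matching the rule that belong to the target class, and coverage is the fraction of the target class that the rule matches. A sketch, with illustrative field names and toy records rather than the original data:

```python
# Compute predictive value and coverage for a rule such as (f5 > 0.5),
# under the assumed definitions stated in the text above.
def rule_stats(records, condition, target_class):
    matched = [r for r in records if condition(r)]
    hits = [r for r in matched if r["class"] == target_class]
    in_class = [r for r in records if r["class"] == target_class]
    predictive_value = len(hits) / len(matched)  # precision of the rule
    coverage = len(hits) / len(in_class)         # fraction of class matched
    return predictive_value, coverage

data = [
    {"f5": 1, "class": "caravan"},
    {"f5": 1, "class": "caravan"},
    {"f5": 1, "class": "no_caravan"},
    {"f5": 0, "class": "caravan"},
]
pv, cov = rule_stats(data, lambda r: r["f5"] > 0.5, "caravan")
print(f"predictive value: {pv:.2f}, coverage: {cov:.2f}")
# -> predictive value: 0.67, coverage: 0.67
```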
Results achieved with the decision tree and association rule modules of the Data Mining Software Kit (see: S.M. Weiss and N. Indurkhya, Predictive Data Mining, Morgan Kaufmann).