Simple dataset
First, the decision tree learner is run on the binary dataset obtained from the data cleansing. Data transformation is not necessary, because the decision tree learner automatically constructs intervals that can be used for tests in the decision tree. The first attempt yields a rather poor result. A typical predictive result for this dataset is shown below:
             predicted   predicted
             caravan     no_caravan   total
caravan            3         240        243
no_caravan         3        3754       3757
total              6        3994       4000
Due to the under-representation of policy holders in the original data set, and the low density of policy
holders, the decision tree learner can hardly improve on the default accuracy of 0.94.
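The class proportions above make this concrete. A minimal sketch (not part of the original experiment) of why plain accuracy is a poor target here:

```python
# With 243 policy holders out of 4000 records, always predicting the
# majority class "no_caravan" already scores about 0.94 accuracy, so a
# tree optimising accuracy has little incentive to find the holders.
caravan, no_caravan = 243, 3757
total = caravan + no_caravan
default_accuracy = no_caravan / total
print(f"default accuracy: {default_accuracy:.3f}")  # -> 0.939
```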
Balancing the dataset
This problem is no different from the classification task solved before. In principle, "boosted" decision tree learning would provide a satisfactory solution for it. However, as our aim is to deliver a comprehensible solution that is accessible to a marketeer, boosting does not suffice. Instead, an effect similar to boosting is achieved artificially, by including multiple copies of policy holder records in the data set. This, by the way, is equivalent to penalising the learning algorithm more for misclassifying a policy holder than for misclassifying a non-policy holder. In this case, we copy the policy holder records so that policy holders appear as frequently in the data set as the non-policy holders.
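The balancing step can be sketched as follows; the function name and the toy records are illustrative, not taken from the original toolkit:

```python
import random

def balance_by_oversampling(records, is_positive, seed=0):
    """Duplicate minority-class records until both classes are equally
    frequent -- a simple stand-in for the balancing step described above."""
    rng = random.Random(seed)
    pos = [r for r in records if is_positive(r)]
    neg = [r for r in records if not is_positive(r)]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Add randomly chosen copies of minority records until the counts match.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return records + extra

# Toy usage: 2 holders vs 6 non-holders -> 6 vs 6 after balancing.
data = [("caravan", i) for i in range(2)] + [("no_caravan", i) for i in range(6)]
balanced = balance_by_oversampling(data, lambda r: r[0] == "caravan")
print(len(balanced))  # -> 12
```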
With this new dataset, the decision tree learner
finds a better decision tree.
             predicted   predicted
             caravan     no_caravan   total
caravan          176          67        243
no_caravan      1461        2296       3757
total           1637        2363       4000
From this matrix we can see that approximately 3/4 of the policy holders (176 of 243) are correctly classified by the decision tree as generated on the balanced dataset. Simultaneously, about 40% of the non-holders (1461 of 3757) are classified incorrectly. As our target was to recognise policy holders, and to infer potential holders within the non-holders group, this seems an appropriate result.
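These per-class rates follow directly from the matrix; a small check, using the counts from the balanced-data result above:

```python
# Cells of the balanced-data confusion matrix (actual class per row).
tp, fn = 176, 67      # actual caravan holders: predicted caravan / no_caravan
fp, tn = 1461, 2296   # actual non-holders: predicted caravan / no_caravan

recall = tp / (tp + fn)          # fraction of holders correctly recognised
false_pos_rate = fp / (fp + tn)  # fraction of non-holders flagged as prospects
print(f"recall: {recall:.2f}, false positive rate: {false_pos_rate:.2f}")
# -> recall: 0.72, false positive rate: 0.39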
Results
The resulting decision tree is almost trivial (see figure below). It consists of a single decision that distinguishes predicted policy holdership from non-holdership: attribute A5 (does the client have a car policy or not). In other words: to determine whether a client has a good chance of buying a caravan policy, the only relevant predictor is whether he has a car policy. If so, he is a prospect for a caravan policy. If not, offering a caravan policy may not be a good idea.
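As a sketch, the entire tree reduces to a one-line classifier. The encoding of A5 as f5 with a split at 0.5 is taken from the association rules listed later in this report; treat the exact threshold as illustrative:

```python
# The single-split decision tree described above: the only test is
# whether the client holds a car policy (f5 > 0.5).
def predict_caravan_prospect(f5):
    """Return True if the client is a prospect for a caravan policy."""
    return f5 > 0.5

print(predict_caravan_prospect(1.0))  # -> True
print(predict_caravan_prospect(0.0))  # -> False
```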
This result is obtained after boosting the dataset to double the original size. When boosting the dataset iteratively, generating decision trees after each boosting cycle, one may find more structured rules, with more decision criteria. A glimpse of how this may look is obtained by looking at the association rules as generated from the balanced data. Association rules predict the occurrence of one class. For the extended data set, focussing on the caravan policy holders, the following rules are generated.
Association rules
Association rules are a means to describe the common features of a group of concepts. Instead of distinguishing classes, they find commonalities within classes. As illustrated in the thumbnail, a single association rule describes part of a class in the form of a hypercube. For the caravan policy owners, several association rules were found:
- to be filled in
- rule: 2 predictive-value: 97.2% coverage: 63.0%
(f5 <= 0.5)
- rule: 3 predictive-value: 89.3% coverage: 37.0%
(f5 > 0.5)
- rule: 4 predictive-value: 91.5% coverage: 31.4%
(f4 <= 0.5) & (f5 > 0.5)
- rule: 6 predictive-value: 92.3% coverage: 28.9%
(f4 <= 0.5) & (f5 > 0.5) & (f9 <= 0.5)
- rule: 8 predictive-value: 92.6% coverage: 28.7%
(f4 <= 0.5) & (f5 > 0.5) & (f7 <= 0.5) & (f9 <= 0.5)
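One common reading of the two statistics attached to each rule is: predictive value is the fraction of records matching the rule that belong to the target class, and coverage is the fraction of the target class that the rule matches. A sketch, with illustrative field names and toy records rather than the original data:

```python
# Compute predictive value and coverage for a rule such as (f5 > 0.5),
# under the assumed definitions stated in the text above.
def rule_stats(records, condition, target_class):
    matched = [r for r in records if condition(r)]
    hits = [r for r in matched if r["class"] == target_class]
    in_class = [r for r in records if r["class"] == target_class]
    predictive_value = len(hits) / len(matched)  # precision of the rule
    coverage = len(hits) / len(in_class)         # fraction of class matched
    return predictive_value, coverage

data = [
    {"f5": 1, "class": "caravan"},
    {"f5": 1, "class": "caravan"},
    {"f5": 1, "class": "no_caravan"},
    {"f5": 0, "class": "caravan"},
]
pv, cov = rule_stats(data, lambda r: r["f5"] > 0.5, "caravan")
print(f"predictive value: {pv:.2f}, coverage: {cov:.2f}")
# -> predictive value: 0.67, coverage: 0.67
```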
Results achieved with the decision tree and association rule modules of the Data Mining Software Kit (see: S.M. Weiss and N. Indurkhya, Predictive Data Mining, Morgan Kaufmann).