Caravan Policy Case
The caravan policy case was kindly made available by Sentient Machine Research. The data stems from a real-world insurance company. The examples of techniques applied to the data come from the Benelearn competition on machine learning.


Characterization
This task involves describing the subspace of the total customer space in which customers have an increased chance of purchasing a caravan policy. The description should preferably be interpretable by human beings. More (Task: Interesting Subgroups) ...
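As an illustration of what "increased chance" can mean for a candidate subgroup, the following minimal sketch compares the purchase rate inside a subgroup with the overall purchase rate (often called lift). The records, the field names `high_income` and `caravan`, and the helper `subgroup_lift` are hypothetical and are not taken from the Caravan Policy Database.

```python
def subgroup_lift(records, rule, target="caravan"):
    """Return (rate inside subgroup) / (overall rate) for a boolean target.

    `rule` is a human-readable predicate on a record, e.g. "high income".
    """
    overall_rate = sum(1 for r in records if r[target]) / len(records)
    subgroup = [r for r in records if rule(r)]
    subgroup_rate = sum(1 for r in subgroup if r[target]) / len(subgroup)
    return subgroup_rate / overall_rate


# Made-up toy data: 8 customers, 2 of whom hold a caravan policy.
records = [
    {"high_income": True, "caravan": True},
    {"high_income": True, "caravan": True},
    {"high_income": True, "caravan": False},
    {"high_income": True, "caravan": False},
    {"high_income": False, "caravan": False},
    {"high_income": False, "caravan": False},
    {"high_income": False, "caravan": False},
    {"high_income": False, "caravan": False},
]

# Subgroup described by one interpretable condition.
lift = subgroup_lift(records, lambda r: r["high_income"])
```

A lift above 1 indicates that customers matching the description purchase more often than the customer base as a whole, which is the kind of subgroup this task looks for.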


Classification
This task involves predicting, from other data about the customer, whether a customer will have a caravan insurance policy. The training set contains 6000 descriptions of customers, including whether they have a caravan insurance policy.
The task is motivated by the decision to include customers in a mailing. Mail will be sent only to customers with a high probability of becoming caravan policy holders. More (Task: Concept Learning) ...
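The mailing decision can be sketched as a simple threshold on predicted probabilities. The sketch below assumes that some classifier has already produced a probability per client. The client identifiers, the scores, the threshold value, and the helper `select_for_mailing` are all hypothetical.

```python
def select_for_mailing(scores, threshold):
    """Return client ids with predicted probability >= threshold,
    ordered from most to least likely caravan policy holder."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [client_id for client_id, prob in ranked if prob >= threshold]


# Made-up predicted probabilities of holding a caravan policy.
scores = {"c1": 0.02, "c2": 0.31, "c3": 0.12, "c4": 0.27}

# Only clients at or above the (assumed) threshold receive the mailing.
mailing = select_for_mailing(scores, threshold=0.25)
```

In practice the threshold would be chosen from the cost of a mailing and the expected value of a new policy holder, neither of which is given here.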


Mosaic region and customer typing
Ideally, direct marketeers would like to have a detailed description of each individual household, in order to assess whether someone is potentially interested in a product. Privacy legislation prevents the acquisition and maintenance of such data. Mosaic is a system to gather information on groups of individuals. The basis for the system is the postal code. Data on households is aggregated at postal code level and expressed in terms of averages and chances. For example, attribute no. 6, MGODRK, gives the fraction of Roman Catholic inhabitants in the region; for the region of client 31, this is between 37 and 49%. On the basis of this aggregated data, regions are clustered into prototypical groups: the Mosaic customer types. The description of the customer type successful hedonists (the type of client 10) includes, among others:
  • living in the better situated suburban areas
  • middle-aged to older
  • high education level
  • high income
  • driving expensive and new cars
  • reading quality papers
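The aggregation of household data to postal code level described above can be sketched as follows. This is only an illustration of the idea and not the actual Mosaic procedure. The household records, the postal codes, the field name `roman_catholic`, and the helper `fraction_by_postal_code` are hypothetical.

```python
from collections import defaultdict


def fraction_by_postal_code(households, attribute):
    """Aggregate a boolean household attribute to a fraction per postal code."""
    counts = defaultdict(lambda: [0, 0])  # postal code -> [matching, total]
    for household in households:
        entry = counts[household["postal_code"]]
        if household[attribute]:
            entry[0] += 1
        entry[1] += 1
    return {code: matching / total for code, (matching, total) in counts.items()}


# Made-up household data for two postal code regions.
households = [
    {"postal_code": "1000", "roman_catholic": True},
    {"postal_code": "1000", "roman_catholic": False},
    {"postal_code": "1000", "roman_catholic": True},
    {"postal_code": "1000", "roman_catholic": False},
    {"postal_code": "2000", "roman_catholic": False},
    {"postal_code": "2000", "roman_catholic": True},
    {"postal_code": "2000", "roman_catholic": False},
    {"postal_code": "2000", "roman_catholic": False},
]

fractions = fraction_by_postal_code(households, "roman_catholic")
```

Only the per-region fractions leave this step, so no individual household can be identified from the result. In the database such fractions are reported as ranges (for example 37 to 49%) rather than exact values.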


Personal data
CTRCLNTNR Client number
CTRStrt Street of the insurance client
CTRHsnr House number of the insurance client
CTRZIP Postal code of the insurance client
CTRStart Start date of contact
CTRLast Last contact date
etc. etc.


Retired & religious
Record 8 of the Caravan Policy Database. This record represents a small religious family, from middle management or unskilled labor background, with lower income, and one third party insurance for a total premium contribution of Dfl 1-49.


Sales data
PPERSAUT Contribution car policies
PBESAUT Contribution delivery van policies
PMOTSCO Contribution motorcycle/scooter policies
PVRAAUT Contribution lorry policies
PAANHANG Contribution trailer policies
PTRACTOR Contribution tractor policies
etc. etc.


Socio-demographic information
Data about clients, linked through their postal codes, on aspects such as age, social class, etc.


Successful hedonist
Record 10 of the Caravan Policy Database. This record represents a presumably single-person household of a person aged 50-60, with a chance of ca. 50% of being religious, medium to highly educated, without children. The person will be a middle manager, with high to medium education and a fairly high income. He has a MIC fire and third party insurance, with premium contributions of Dfl 200-499 and Dfl 50-99 respectively.