End user's view on the MiningMart project


Why is KDD important?

Almost every company has been through a customer profiling exercise. To better target your markets and your best customer prospects, you need to be able to answer these proverbial questions:

  • Who is my best customer / my worst customer?
  • Who are my one-time buyers?
  • What do they buy?
  • When do they buy it?
  • Why don't they buy?

The customer profile describes customers in marketing terms: spending patterns, payment histories, repeat-business opportunities, cross-selling potential, and similar categories that are considered significant to ongoing business development.

Important topics in analysing data

Data Mining is the process of finding new and potentially useful knowledge in data. According to a recent study by Gartner Group, worldwide spending on Data Mining licenses and services is expected to reach $76.3 billion in 2005, more than tripling the $23.3 billion spent in 2000. The most important business tasks in Data Mining are:

  • Customer Relationship Management (CRM) is a strategy used to learn more about customers' needs and behaviors in order to develop stronger relationships with them. After all, good customer relationships are at the heart of business success. To be effective, CRM requires using information about customers and prospects at all stages of their relationship with a company. From the company's point of view, the stages are acquiring customers, increasing the value of customers, and retaining good customers.
    Read more about Customer Relationship Management.

  • Direct mailing is a method companies commonly choose as part of their direct marketing strategies. Of course, every company wants its mailings to be as effective as possible. The effectiveness of a mailing campaign can be measured by its response rate. A high response rate means that the marketing goals have been achieved and therefore that the mailing costs were justified. A company that regularly sends mailings for marketing purposes can reduce its mailing costs considerably by using data mining techniques to address the customers who are most likely to respond.
    One example of a direct mailing action is described here.

  • Fraud detection systems enable an operator to respond to fraud by denying services to fraudulent users, or by detecting them and preparing prosecutions against them. In telecommunications, the huge volume of call activity in a network makes fraud detection and analysis a challenging problem.

  • Other tasks are the prediction of sales in order to minimize stocks, and the prediction of electricity consumption or of the use of telecommunication services at particular times of day in order to minimize the use of external services or to optimize network routing, respectively. The health sector requires several analysis tasks for resource management, quality control, and decision making.
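
As a small illustration of the response rate mentioned for direct mailing above, the following sketch compares a mailing sent to everyone with one sent to a selected subset. All figures, the cost per letter, and the function names are hypothetical and chosen only for illustration; they are not taken from a MiningMart case.

```python
def response_rate(responses, mailed):
    """Fraction of mailed letters that produced a response."""
    if mailed <= 0:
        raise ValueError("mailed must be positive")
    return responses / mailed


def cost_per_response(mailed, responses, cost_per_letter):
    """Total mailing cost divided by the number of responses."""
    if responses == 0:
        return float("inf")
    return mailed * cost_per_letter / responses


# Hypothetical campaigns: mail everyone vs. mail a selected subset.
COST_PER_LETTER = 0.5  # assumed cost, arbitrary currency unit

untargeted_rate = response_rate(responses=200, mailed=10_000)
targeted_rate = response_rate(responses=150, mailed=3_000)

untargeted_cost = cost_per_response(10_000, 200, COST_PER_LETTER)
targeted_cost = cost_per_response(3_000, 150, COST_PER_LETTER)

print(f"untargeted: rate={untargeted_rate:.2%}, cost/response={untargeted_cost:.2f}")
print(f"targeted:   rate={targeted_rate:.2%}, cost/response={targeted_cost:.2f}")
```

In this made-up example the targeted mailing reaches fewer respondents in absolute terms but at a much lower cost per response, which is the kind of trade-off a data mining model for selecting recipients is meant to improve.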

Data analysis

On-line Analytical Processing (OLAP) offers interactive data analysis by aggregating data and counting the frequencies. This already answers questions like the following:

  • What are the attributes of my most frequent customers?
  • Which products are sold most frequently?
  • How many unpaid bills should I expect per year?
  • How many returns did I receive after my last direct mailing action?
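
The kind of aggregation and frequency counting described above can be sketched in a few lines. This is a minimal illustration on a handful of hypothetical in-memory sales records, not an OLAP system; the record layout and all names are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical sales records: (customer_id, product, amount)
sales = [
    ("c1", "phone", 120.0),
    ("c2", "phone", 120.0),
    ("c1", "cable", 10.0),
    ("c3", "router", 80.0),
    ("c1", "phone", 120.0),
]

# Frequency counting: how often was each product sold?
product_counts = Counter(product for _, product, _ in sales)

# Aggregation: total revenue per customer.
revenue_by_customer = defaultdict(float)
for customer, _, amount in sales:
    revenue_by_customer[customer] += amount

# Most frequently sold product and customer with the highest revenue.
most_sold = product_counts.most_common(1)[0]
best_customer = max(revenue_by_customer, key=revenue_by_customer.get)

print("most sold product:", most_sold)
print("best customer:", best_customer, revenue_by_customer[best_customer])
```

In a real setting the same counts and sums would typically be computed by the database or OLAP engine itself rather than in application code.
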
Reports that support decision making need more detailed information and must answer more specific questions, for instance:

  • Which customers are most likely to sell their insurance contract back to the insurance company before it ends?
  • How many sales of a certain item should I expect, so that I can avoid empty shelves while keeping my stock to a minimum?

Knowledge Discovery in Databases, or KDD for short, refers to the broad process of finding knowledge in data, and emphasizes the "high-level" application of particular data mining methods. The unifying goal of the KDD process is to extract knowledge from data in the context of large databases. KDD can be regarded as a high-level query language for relational databases that aims at generating sensible reports with which a company can enhance its performance. KDD enables analysts to model virtually any customer activity and to find previously hidden patterns relevant to current business problems, or to business evolution and growth.

But data mining is a difficult process that requires many iterations and adaptations of the data and of the parameter settings until a satisfactory result is achieved. Within the data mining process, considerable time is spent on pre-processing the data (data cleaning and handling of null values) and on feature generation and selection (in databases this means constructing additional columns and selecting the relevant attributes). Practical experience has shown that preprocessing can take from 50% up to 80% of the time of the entire data mining process when traditional attribute-value learners are used. That's why preprocessing is the key issue in data analysis.
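
The three preprocessing steps named above (handling null values, constructing an additional column, and selecting relevant attributes) can be sketched as follows. This is a minimal illustration on hypothetical in-memory rows; the column names, the mean-imputation strategy, and the helper functions are assumptions and do not represent MiningMart operators.

```python
from statistics import mean

# Hypothetical customer rows; None marks a missing (null) value.
rows = [
    {"id": 1, "calls": 40, "minutes": 200.0, "region": "north"},
    {"id": 2, "calls": 10, "minutes": None, "region": "south"},
    {"id": 3, "calls": 25, "minutes": 100.0, "region": "north"},
]


def impute_mean(rows, column):
    """Data cleaning: replace None in `column` with the mean of known values."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = mean(known)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def add_feature(rows, name, fn):
    """Feature generation: construct an additional column from each row."""
    return [{**r, name: fn(r)} for r in rows]


def select(rows, columns):
    """Feature selection: keep only the listed attributes."""
    return [{c: r[c] for c in columns} for r in rows]


cleaned = impute_mean(rows, "minutes")
enriched = add_feature(cleaned, "minutes_per_call", lambda r: r["minutes"] / r["calls"])
prepared = select(enriched, ["id", "minutes_per_call"])

for row in prepared:
    print(row)
```

Even in this toy form, each step requires a decision (which fill value, which derived column, which attributes to keep), which hints at why preprocessing takes up so much of the overall effort.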

The MiningMart Approach


MiningMart can help to reduce this time. The MiningMart project aims at new techniques that give decision-makers direct access to information stored in databases, data warehouses, and knowledge bases. The main goal is to support users in making intelligent choices by pursuing the following objectives:

  • Operators for preprocessing with direct database access
  • Use of machine learning for the preprocessing
  • Detailed documentation of successful cases
  • High quality discovery results
  • Scalability to very large databases
  • Techniques that automatically select or change representations

What is MiningMart’s path to reaching the goal?

Examples of successfully applied Data Mining Cases with the MiningMart System

The MiningMart System was successfully applied in two telecommunications companies, the National Institute of Telecommunications in Warsaw, Poland, and the Telecom Italia Lab in Alessandria, Italy. The details of these cases are published in the internet case base that MiningMart provides (see next paragraph).

Case base of successful cases on the internet

One of the project’s objectives is to set up a case base of successful cases on the internet. The shared knowledge allows all internet users to benefit from a new case. Submitting a new case of best practice is a safe advertisement for KDD specialists or service providers, since the relational data model is kept private. Only the conceptual model and the case model are published. The case base can be found here.

A detailed description of the case base is available here.