The first component
consists of operators that perform data transformations such as discretization,
handling null values, aggregation of attributes into a new one, or collecting
sequences from time-stamped data. The operators directly access the database
and are capable of handling large volumes of data. Machine learning
is not restricted to a data mining step, but is also applicable in preprocessing.
This view opens up a variety of learning tasks that are less well investigated
than the learning of classifiers. For instance, an important task is to derive
events and their durations (i.e., time intervals) from time
series (i.e., measurements at time points).
See
the list of available operators with technical descriptions.
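As an illustration of the time-series task mentioned above, the following minimal Python sketch discretizes each measurement and merges consecutive measurements with the same level into one event with a start and an end time. It is not a MiningMart operator: the names `Event`, `discretize`, and `series_to_events`, the two levels "normal" and "high", and the single threshold are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """A labelled time interval derived from point measurements (hypothetical type)."""
    label: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def discretize(value: float, threshold: float) -> str:
    """Map a numeric measurement to one of two illustrative levels."""
    return "high" if value >= threshold else "normal"


def series_to_events(points: Iterable[Tuple[float, float]], threshold: float) -> List[Event]:
    """Merge consecutive (time, value) measurements with the same level into events."""
    events: List[Event] = []
    for t, value in sorted(points):  # process measurements in time order
        label = discretize(value, threshold)
        if events and events[-1].label == label:
            # Same level as the previous measurement: extend the current event.
            events[-1] = Event(label, events[-1].start, t)
        else:
            # Level changed (or first measurement): open a new event.
            events.append(Event(label, t, t))
    return events
```

For example, `series_to_events([(0, 1.0), (1, 1.2), (2, 5.0), (3, 5.5)], threshold=3.0)` yields one "normal" event followed by one "high" event. An event built from a single measurement has duration zero in this sketch; a real operator would need a rule for where one interval ends and the next begins.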
The second component consists of
successful cases of knowledge discovery. Since most of the time is
used to find chains of operator applications that lead to good answers
to complex questions, it is cumbersome to develop such chains over and
over again for very similar discovery tasks and data. Currently, even
the same task on data of the same format is implemented anew every time
new data are to be analysed. Therefore, the re-use of successful cases
would speed up the process considerably. Cases of successful
preprocessing are therefore stored for re-use.
The meta-data of a case can be adapted to similar cases. A library of best-practice
cases in the form of their meta-data is currently being collected. MiningMart
presents cases from areas ranging from on-line monitoring in intensive
care to direct mailing actions.
The particular approach of the
MiningMart project is to allow the re-use of cases by means of meta-data,
also called ontologies. Meta-data describe the data as well as the operator
chains. A compiler generates the SQL code according to the meta-data.
Read more about the advantages of meta-data driven software generation.
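The idea of generating SQL from meta-data can be sketched in a few lines of Python. The meta-data layout below (a dictionary with `source_table`, `columns`, and `steps`), the operator names `fill_null` and `discretize`, and the function `compile_case` are all hypothetical; they only illustrate the principle and do not reproduce the MiningMart meta-data model or its compiler.

```python
import sqlite3

# Hypothetical meta-data for one preprocessing case: it describes the data
# (table and columns) as well as the operator chain (steps).
case = {
    "source_table": "patients",
    "target_view": "patients_prepared",
    "columns": ["id", "heart_rate"],
    "steps": [
        {"op": "fill_null", "column": "heart_rate", "default": 70},
        {"op": "discretize", "column": "heart_rate", "as": "hr_level",
         "bins": [(60, "low"), (100, "normal")], "else": "high"},
    ],
}


def compile_case(case: dict) -> str:
    """Translate the operator chain into a single CREATE VIEW statement.

    Identifiers and values are interpolated directly into the SQL text, so
    this sketch must only be used with trusted meta-data.
    """
    # Start with every column mapped to itself; each step rewrites or adds one.
    exprs = {col: col for col in case["columns"]}
    for step in case["steps"]:
        src = exprs[step["column"]]
        if step["op"] == "fill_null":
            exprs[step["column"]] = f"COALESCE({src}, {step['default']})"
        elif step["op"] == "discretize":
            whens = " ".join(
                f"WHEN {src} < {bound} THEN '{label}'"
                for bound, label in step["bins"]
            )
            exprs[step["as"]] = f"CASE {whens} ELSE '{step['else']}' END"
        else:
            raise ValueError(f"unknown operator: {step['op']}")
    select_list = ", ".join(f"{expr} AS {name}" for name, expr in exprs.items())
    return (f"CREATE VIEW {case['target_view']} AS "
            f"SELECT {select_list} FROM {case['source_table']}")
```

Under these assumptions, applying the same case to another table of the same format only requires changing `source_table` and `target_view` in the meta-data; the operator chain itself is left untouched. The generated statement can be executed with any SQL database, for instance through `sqlite3.connect(":memory:").execute(compile_case(case))` once the source table exists.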
MiningMart Architecture
The MiningMart project has developed a model for meta-data together with
its compiler and implements human-computer interfaces that allow database
managers and case designers to fill in their application-specific meta-data.
The system will support preprocessing and can be used stand-alone or in
combination with a toolbox for the data mining step.