Curriculum for KDD

Since the analysis of very large data sets with many variables has become a hot topic in Computer Science, both from a scientific and from a business perspective, Knowledge Discovery (Data Mining) is taught at least in part at most European universities. Due to the interdisciplinary nature of the field, courses are embedded into database research, statistics, or machine learning (artificial intelligence). Although some of the topics are taught at most universities, many universities still lack a comprehensive course, and expertise in this new field cannot be presupposed everywhere. Therefore, we offer a guideline for such a comprehensive course, intended to make it easy to teach the course at any Computer Science department. Our generic curriculum offers:
How to read the curriculum

Knowledge Discovery (data mining), KDD for short, has become part of the teaching activities in computer science and statistics. Being interdisciplinary by nature, its background knowledge stems from databases, statistics, and machine learning, including computational learning theory. Depending on the faculty teaching the course and the overall curriculum students are following at the particular university, some parts of the KDD curriculum can be dropped and others strengthened. Students attending the KDD lecture are usually graduate students. However, their backgrounds can differ considerably. For instance, they may or may not already be acquainted with databases, with statistical measures, or with complexity theory. Additionally, universities have a profile and hence focus on some aspects. This choice may influence the outline of a KDD course, too.

Hence, some flexibility applies to the proposed KDD curriculum. The proposed KDD curriculum is structured into chapters, each containing some topics. There are three levels of flexibility:
ECTS

The European Credit Transfer System (ECTS) is intended to ease studies across European universities. ECTS credit points express the workload of students who successfully pass a course. The assumed number of working hours per week is 40 to 45 in a year of 40 working weeks. Hence, the number of hours per year is 1600 to 1800. 60 credit points correspond to a full year of studies. In addition to attending the lecture and the exercise session, the work for preparing the material of a session and solving the exercises is taken into account. The workload for preparing for an examination on the lecture is also included in the ECTS.

The KDD Curriculum

The KDD curriculum is based on experience of teaching KDD to both computer science and statistics students at Dortmund University. The course has been accredited as a module for the study of Data Management and Data Mining towards a Bachelor/Master degree. Discussions with European lecturers of KDD have been taken into account.

Short Course

The lecture with exercises gives an overview of Knowledge Discovery in Databases (KDD), also known as Data Mining. Starting from the cross-industry standard process model of knowledge discovery and building upon database theory, methods for preprocessing and analysing very large data collections are presented. Analysis tasks are classification, regression, clustering, and frequent set mining.

Goals: After attending the short course, students will know what KDD is and where it can be applied. In the exercises they will have used tools to solve some KDD tasks, and they will know the principles underlying the tools. Hence, students are capable of performing standard applications.

Prerequisites: If the students do not know the prerequisites, the main ideas are taught (session number in brackets) and the number of sessions devoted to the obligatory topics is reduced by 2 (e.g., the overall KDD process and regression are handled in only 1 session each).
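The ECTS workload figures given earlier in this section can be checked with a short calculation. The sketch below uses only the numbers stated there (40 to 45 hours per week, 40 working weeks, 60 credit points per year). The hours-per-credit value is derived from those numbers and is not stated in the text, and `course_workload` is a hypothetical helper added purely for illustration.

```python
# Figures quoted in the ECTS paragraph of this section.
HOURS_PER_WEEK_RANGE = (40, 45)
WORKING_WEEKS_PER_YEAR = 40
CREDITS_PER_YEAR = 60


def hours_per_year(hours_per_week: float) -> float:
    """Annual workload in hours for a given weekly workload."""
    return hours_per_week * WORKING_WEEKS_PER_YEAR


def hours_per_credit(hours_per_week: float) -> float:
    """Derived value (not stated in the text): hours behind one credit point."""
    return hours_per_year(hours_per_week) / CREDITS_PER_YEAR


def course_workload(credits: float) -> tuple[float, float]:
    """Hypothetical helper: workload range in hours for a course worth `credits`."""
    low, high = (hours_per_credit(h) * credits for h in HOURS_PER_WEEK_RANGE)
    return low, high


if __name__ == "__main__":
    for weekly in HOURS_PER_WEEK_RANGE:
        print(weekly, hours_per_year(weekly), round(hours_per_credit(weekly), 1))
```

Under these assumptions one credit point corresponds to roughly 27 to 30 working hours, so a course worth, say, 6 credit points (an illustrative value, not one taken from the curriculum) would amount to roughly 160 to 180 hours including lectures, exercises, preparation, and examination.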
Long Course

The lecture with exercises gives an overview of Knowledge Discovery in Databases (KDD), also known as Data Mining. Starting from the cross-industry standard process model of knowledge discovery and building upon database theory, methods for preprocessing and analysing very large data collections are presented. Analysis tasks are classification, regression, clustering, and frequent set mining. Learning from temporal data or exploiting spatial relationships is handled as spatio-temporal analysis. Analysis methods range from statistical learning and optimization methods to logical (multi-relational) approaches. Application areas are the analysis of very large databases or multimedia collections like the World Wide Web (texts, images, audio and video databases).

Goals: After the long course, students will know what KDD is, where it can be applied, and how to develop an application. Students know the challenges of the research field and are ready to start their own developments, for instance in the form of a diploma or master thesis.

Prerequisites: If the students are familiar with neither of the two prerequisites, only 2 of the facultative chapters can be selected. If the students are knowledgeable in one of the prerequisites, 3 of the facultative chapters can be chosen. If both prerequisites are known, all facultative chapters can be taught in the lecture.

Course Materials:

Books

The overall course can be based on the following books:

Modules