Selected Projects

SFB 876

  • Funding period: since 01/2011 (DFG)
  • Speaker: Prof. Dr. Katharina Morik
  • URL: SFB 876

The collaborative research center SFB 876 brings together data mining and embedded systems. On the one hand, embedded systems can be further improved using machine learning. On the other hand, data mining algorithms can be realized in hardware, e.g. FPGAs, or run on GPGPUs. The restrictions of ubiquitous systems in computing power, memory, and energy demand new algorithms for known learning tasks. These resource-bounded learning algorithms may also be applied to extremely large databases on servers.

Selected Publications in High-Impact Journals and Conferences

Lang/Schubert/2021a Lang, Andreas and Schubert, Erich. BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees. In Information Systems, 2021.
Munteanu/etal/2021a Munteanu, Alexander and Omlor, Simon and Woodruff, David P.. Oblivious Sketching for Logistic Regression. In Proceedings of the 38th International Conference on Machine Learning (to appear), 2021.
Schubert/Rousseeuw/2021a Erich Schubert and Peter J. Rousseeuw. Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms. In Information Systems, 2021.
Buschjaeger/etal/2020 Buschjäger, Sebastian and Pfahler, Lukas and Buss, Jens and Morik, Katharina and Rhode, Wolfgang. On-Site Gamma-Hadron Separation with Deep Learning on FPGAs. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2020.
Pfahler/Morik/2020a Pfahler, Lukas and Morik, Katharina. Semantic Search in Millions of Equations. In KDD ‘20- Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2020.
Hess/etal/2019a Hess, Sibylle and Duivesteijn, Wouter and Honysz, Philipp-Jan and Morik, Katharina. The SpectACl of Nonconvex Clustering: a Spectral Approach to Density-Based Clustering. In AAAI, 2019.
kotthaus/etal/2019a Tözün, Pinar and Kotthaus, Helena. Scheduling Data-Intensive Tasks on Heterogeneous Many Cores. In IEEE Data Engineering Bulletin, Vol. 42, No. 1, pages 61-72, 2019.
Meintrup/etal/2019a Stefan Meintrup and Alexander Munteanu and Dennis Rohde. Random projections and sampling algorithms for clustering of high-dimensional polygonal curves. In Advances in Neural Information Processing Systems 32 (NeurIPS), pages 12807--12817, 2019.
Pfahler/etal/2019b Pfahler, Lukas and Schill, Jonathan and Morik, Katharina. The Search for Equations - Learning to Identify Similarities between Mathematical Expressions. In Procs. ECML PKDD2019, Springer, 2019.
Buschjaeger/Morik/2017b Buschjäger, Sebastian and Morik, Katharina. Decision Tree and Random Forest Implementations for Fast Filtering of Sensor Data. In IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 65-I, No. 1, pages 209--222, 2018.
Falkenberg/etal/2018b Robert Falkenberg and Benjamin Sliwa and Nico Piatkowski and Christian Wietfeld. Machine Learning Based Uplink Transmission Power Prediction for LTE and Upcoming 5G Networks using Passive Downlink Indicators. In 2018 IEEE 88th IEEE Vehicular Technology Conference (VTC-Fall), Chicago, USA, 2018.
Hess/etal/2018a Hess, Sibylle and Piatkowski, Nico and Morik, Katharina. The Trustworthy Pal: Controlling the False Discovery Rate in Boolean Matrix Factorization. In Proceedings of the 2018 SIAM International Conference on Data Mining, SDM 2018, May 3-5, 2018, San Diego Marriott Mission Valley, San Diego, CA, USA., pages 405--413, SIAM, 2018.
Molina/etal/2018a Molina, Alejandro and Munteanu, Alexander and Kersting, Kristian. Core Dependency Networks. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018.
Piatkowski/Morik/2018a Piatkowski, Nico and Morik, Katharina. Fast Stochastic Quadrature for Approximate Maximum-Likelihood Estimation. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018, California, USA, August 6-10, 2018, 2018.
Saadallah/etal/2018a Saadallah, Amal and Finkeldey, Felix and Morik, Katharina and Wiederkehr, Petra. Stability prediction in milling processes using a simulation-based machine learning approach. In 51st CIRP conference on Manufacturing Systems, Elsevier, 2018.
VonDerBrueggen/etal/2018a von der Brüggen, Georg and Piatkowski, Nico and Chen, Kuan-Hsun and Chen, Jian-Jia and Morik, Katharina. Efficiently Approximating the Probability of Deadline Misses in Real-Time Systems. In 30th Euromicro Conference on Real-Time Systems, ECRTS 2018, July 3-6, 2018, Barcelona, Spain, LIPIcs, 2018.
Hess/etal/2017a Hess, Sibylle and Morik, Katharina and Piatkowski, Nico. The PRIMPING routine---Tiling through proximal alternating linearized minimization. In Data Mining and Knowledge Discovery, Vol. 31, No. 4, pages 1090--1131, 2017.
Hess/Morik/2017a Hess, Sibylle and Morik, Katharina. C-SALT: Mining Class-Specific ALTerations in Boolean Matrix Factorization. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2017, Springer, 2017.
Liebig/etal/2017b Liebig, Thomas and Piatkowski, Nico and Bockermann, Christian and Morik, Katharina. Dynamic Route Planning with Real-Time Traffic Predictions. In Information Systems, Vol. 64, pages 258--265, Elsevier, 2017.
Molina/etal/2017a Molina, Alejandro and Natarajan, Sriraam and Kersting, Kristian. Poisson Sum-Product Networks: A Deep Architecture for Tractable Multivariate Poisson Distributions. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI), pages 2357--2363, 2017.
Das/etal/2016a Das, Mayukh and Wu, Yunqing and Khot, Tushar and Kersting, Kristian and Natarajan, Sriraam. Scaling Lifted Probabilistic Inference and Learning Via Graph Databases. In Proceedings of the SIAM International Conference on Data Mining (SDM), 2016.
Lee/etal/2016a Lee, Sangkyun and Brzyski, Damian and Bogdan, Malgorzata. Fast Saddle-Point Algorithm for Generalized Dantzig Selector and FDR Control with the Ordered l1-Norm. In Arthur Gretton and Christian C. Robert (editors), Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 780--789, JMLR W&CP, 2016.
Morris/etal/2016a Morris, Christopher and Kriege, Nils and Kersting, Kristian and Mutzel, Petra. Faster Kernels for Graphs with Continuous Attributes via Hashing. In IEEE International Conference on Data Mining (ICDM), pages 1095--1100, 2016.
Piatkowski/etal/2016a Piatkowski, Nico and Lee, Sangkyun and Morik, Katharina. Integer undirected graphical models for resource-constrained systems. In Neurocomputing, Vol. 173, No. 1, pages 9--23, Elsevier, 2016.
Piatkowski/Morik/2016a Piatkowski, Nico and Morik, Katharina. Stochastic Discrete Clenshaw-Curtis Quadrature. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York, USA, 19-24 June 2016, JMLR.org, 2016.
Poelitz/etal/2016a Poelitz, Christian and Duivesteijn, Wouter and Morik, Katharina. Interpretable Domain Adaptation via Optimization over the Stiefel Manifold. In Machine Learning, Vol. 104, No. 2-3, pages 315-336, 2016.
Stolpe/2016a Marco Stolpe. The Internet of Things: Opportunities and Challenges for Distributed Data Analysis. In SIGKDD Explorations, Vol. 18, No. 1, pages 15-34, 2016.
Taylor/etal/2016a Taylor, Joseph and Sharmanska, and Kersting, Kristian and Weir, David and Quadrianto, Novi. Learning using Unselected Features (LUFe). In S. Kambhampati (editors), Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), 2016.
Yang/etal/2016a Yang, Shuo and Khot, Tushar and Kersting, Kristian and Natarajan, Sriraam. Learning Continuous-Time Bayesian Networks in Relational Domains: A Non-Parametric Approach. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, 2016.
Bauckhage/etal/2015a Bauckhage, Christian and Kersting, Kristian and Hadiji, Fabian. Parameterizing the Distance Distribution of Undirected Networks. In Tom Heskes and Marina Meila (editors), Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI), AUAI, 2015.
Bauckhage/etal/2015b Bauckhage, Christian and Kersting, Kristian and Hadiji, Fabian. How Viral are Viral Movies?. In Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM), 2015.
Bockermann/etal/2015a Bockermann, Christian and Brügge, Kai and Buß, Jens and Egorov, Alexey and Morik, Katharina and Rhode, Wolfgang and Ruhe, Tim. Online Analysis of High-Volume Data Streams in Astroparticle Physics. In European Conference on Machine Learning (ECML PKDD 2015), Industrial Track, Springer, 2015.
Boerner/etal/2015a Börner, Mathis and Rhode, Wolfgang and Ruhe, Tim and Morik, Katharina. Discovering Neutrinos through Data Analytics. In European Conference on Machine Learning (ECML PKDD 2015), Springer, 2015.
Downar/Duivesteijn/2015a Downar, Lennart and Duivesteijn, Wouter. Exceptionally Monotone Models - the Rank Correlation Model Class for Exceptional Model Mining. In Data Mining (ICDM), 2015 IEEE International Conference on, pages 111-120, IEEE, IEEE Computer Society, 2015.
Hadiji/etal/2015b Hadiji, Fabian and Molina, Alejandro and Natarajan, Sriraam and Kersting, Kristian. Poisson Dependency Networks: Gradient Boosted Models for Multivariate Count Data. In Machine Learning Journal (MLJ), Vol. 100, No. 2, pages 477-507, 2015.
Schramm/etal/2015a Schramm, Alexander and Köster, Johannes and Assenov, Yassen and Althoff, Kristina and Peifer, Martin and Mahlow, Ellen and Odersky, Andrea and Beisser, Daniela and Ernst, Corinna and Henssen, Anton G. and Stephan, Harald and Schröder, Christopher and Heukamp, Lukas and Engesser, Anne and Kahlert, Yvonne and Theissen, Jessica and Hero, Barbara and Roels, Frederik and Altmüller, Janine and Nürnberg, Peter and Astrahantseff, Kathy and Gloeckner, Christian and De Preter, Katleen and Plass, Christoph and Lee, Sangkyun and Lode, Holger N. and Henrich, Kai-Oliver and Gartlgruber, Moritz and Speleman, Frank and Schmezer, Peter and Westermann, Frank and Rahmann, Sven and Fischer, Matthias and Eggert, Angelika and Schulte, Johannes H.. Mutational dynamics between primary and relapse neuroblastomas. In Nature Genetics, Vol. 47, No. 8, pages 872--877, 2015.
Artikis/etal/2014a Alexander Artikis and Matthias Weidlich and Francois Schnitzler and Ioannis Boutsis and Thomas Liebig and Nico Piatkowski and Christian Bockermann and Katharina Morik and Vana Kalogeraki and Jakub Marecek and Avigdor Gal and Shie Mannor and Dimitrios Gunopulos and Dermot Kinane. Heterogeneous Stream Processing and Crowdsourcing for Urban Traffic Management. In Proceedings of the 17th International Conference on Extending Database Technology, 2014.
Kriege/etal/2014a Kriege, Nils and Neumann, Marion and Kersting, Kristian and Mutzel, Petra. Explicit versus Implicit Graph Feature Maps: A Computational Phase Transition for Walk Kernels. In Kumar, Ravi and Toivonen, Hannu (editors), Proceedings of the IEEE International Conference on Data Mining (ICDM), pages 881--886, IEEE, 2014.
Lee/etal/2014a Sangkyun Lee and Jörg Rahnenführer and Michel Lang and Katleen de Preter and Pieter Mestdagh and Jan Koster and Rogier Versteeg and Raymond Stallings and Luigi Varesio and Shahab Asgharzadeh and Johannes Schulte and Kathrin Fielitz and Melanie Heilmann and Katharina Morik and Alexander Schramm. Robust Selection of Cancer Survival Signatures from High-Throughput Genomic Data Using Two-Fold Subsampling. In PLoS ONE, Vol. 9, pages e108818, 2014.
Lee/Poelitz/2014a Lee, Sangkyun and Pölitz, Christian. Kernel Completion for Learning Consensus Support Vector Machines in Bandwidth-Limited Sensor Networks. In International Conference on Pattern Recognition Applications and Methods, 2014.
Schnitzler/etal/2014b Schnitzler, Francois and Artikis, Alexander and Weidlich, Matthias and Boutsis, Ioannis and Liebig, Thomas and Piatkowski, Nico and Bockermann, Christian and Morik, Katharina and Kalogeraki, Vana and Marecek, Jakub and Gal, Avigdor and Mannor, Shie and Kinane, Dermot and Gunopulos, Dimitrios. Heterogeneous Stream Processing and Crowdsourcing for Traffic Monitoring: Highlights. In Proceedings of the European Conference on Machine Learning (ECML), Nectar Track, pages 520-523, Springer, 2014.
Bauckhage/etal/2013b Bauckhage, Christian and Kersting, Kristian and Rastegarpanah, Bashir. The Weibull as a Model of Shortest Path Distributions in Random Networks. In L. Adamic and L. Getoor and B. Huang and J. Leskovec and J. McAuley (editors), Working Notes of the International Workshop on Mining and Learning with Graphs, Chicago, IL, USA, 2013.
Lieber/etal/2013a Lieber, Daniel and Stolpe, Marco and Konrad, Benedikt and Deuse, Jochen and Morik, Katharina. Quality Prediction in Interlinked Manufacturing Processes based on Supervised & Unsupervised Machine Learning. In Procedia CIRP - 46th CIRP Conf. on Manufacturing Systems, Vol. 7, pages 193-198, Elsevier, 2013.
Neumann/etal/2013b Neumann, Marion and Moreno, Plinio and Antanas, Laura and Garnett, Roman and Kersting, Kristian. Graph Kernels for Object Category Prediction in Task-Dependent Robot Grasping. In Adamic,L. and Getoor, L. and Huang, B. and Leskovec, J. and McAuley, J. (editors), Working Notes of the International Workshop on Mining and Learning with Graphs, Chicago, IL, USA, 2013.
Ruhe/etal/2013b Ruhe, Tim and Schmitz, Martin and Voigt, Tobias and Wornowizki, Max. DSEA: A Data Mining Approach to Unfolding. In International Cosmic Ray Conference (ICRC 2013), 2013.
Stolpe/etal/2013a Stolpe, M. and Bhaduri, K. and Das, K. and Morik, K.. Anomaly Detection in Vertically Partitioned Data by Distributed Core Vector Machines. In Blockeel, Hendrik and Kersting, Kristian and Nijssen, Siegfried and Železný, Filip (editors), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, 2013, Proceedings, Part III, pages 321--336, Springer, 2013.
Lee/Wright/2012b Lee, Sangkyun and Wright, Stephen J.. Manifold Identification in Dual Averaging Methods for Regularized Stochastic Online Learning. In Journal of Machine Learning Research, Vol. 13, pages 1705--1744, 2012.
Morik/etal/2011a Morik, Katharina and Kaspari, Andreas and Wurst, Michael and Skirzynski, Marcin. Multi-Objective Frequent Termset Clustering. In Knowledge and Information Systems, Vol. 30, No. 3, pages 715-738, 2012.
Morik/etal/2012a Morik, Katharina and Bhaduri, Kanishka and Kargupta, Hillol. Introduction to data mining for sustainability. In Data Mining and Knowledge Discovery, Vol. 24, No. 2, pages 311 -- 324, 2012.
Natarajan/etal/2012c Natarajan, Sriraam and Khot, Tushar and Kersting, Kristian and Gutmann, Bernd and Shavlik, Jude. Gradient-based boosting for statistical relational learning: The relational dependency network case. In Machine Learning Journal, Vol. 86, No. 1, 2012.
Lee/Wright/2011a Lee, Sangkyun and Wright, Stephen J.. Manifold Identification of Dual Averaging Methods for Regularized Stochastic Online Learning. In the 28th International Conference on Machine Learning, 2011.
Piatkowski/2011c Piatkowski, Nico. Parallel Algorithms for GPU Accelerated Probabilistic Inference. In International Workshop on Big Learning, Neural Information Processing Systems (NIPS), 2011.
Stolpe/Morik/2011a Stolpe, M. and Morik, K.. Learning from Label Proportions by Optimizing Cluster Model Selection. In Gunopulos, Dimitrios and Hofmann, Thomas and Malerba, Donato and Vazirgiannis, Michalis (editors), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011, Proceedings, Part III, pages 349--364, Springer, 2011.

EU H2020: VaVeL: Variety, Veracity, VaLue

  • Duration: since 03/2016
  • Partner: National and Kapodistrian University of Athens, TU Dortmund University, IBM, Technion - Israel Institute of Technology, Fraunhofer IAIS, Dublin City Council, WUT - Warsaw University of Technology, City of Warsaw, OPL, AGT International
  • URL: https://cordis.europa.eu/project/id/688380

Urban environments are awash with data from fixed and mobile sensors and monitoring infrastructures from public, private, or industry sources. Making such data useful would enable developing novel big data applications to benefit the citizens of Europe in areas such as transportation, infrastructures, and crime prevention. Urban data is heterogeneous, noisy, and unlabeled, which severely reduces its usability. Succinctly stated, urban data is difficult to understand. The goal of the VaVeL project is to radically advance our ability to use urban data in applications that can identify and address citizen needs and improve urban life. Our motivation comes from problems in urban transportation. This project will develop a general-purpose framework for managing and mining multiple heterogeneous urban data streams so that cities become more efficient, productive, and resilient. The framework will be able to solve major issues that arise with urban transportation-related data and are currently not dealt with by existing stream management technologies. The project brings together two European cities that provide diverse large-scale data of cross-country origin and real application needs, three major European companies in this space, and a strong group of researchers with uniquely strong expertise in analyzing real-life urban data. VaVeL aims at making fundamental advances in addressing the most critical inefficiencies of current (big) data management and stream frameworks in coping with emerging urban sensor data, thus making European urban data more accessible and easy to use and enhancing European industries that use big data management and analytics. The consortium develops concrete, end-user-driven scenarios that address real, important problems with the potential for enormous impact and a large spectrum of technology requirements, thus enabling the realization of the fundamental capabilities required and a realistic evaluation of the success of our methods.

Selected Publications

Buschjaeger/etal/2019a Buschjäger, Sebastian and Liebig, Thomas and Morik, Katharina. Gaussian Model Trees for Traffic Imputation. In Proceedings of the International Conference on Pattern Recognition Applications and Methods (ICPRAM), pages 243 - 254, SciTePress, 2019.
Sliwa/etal/2018a Sliwa, Benjamin and Liebig, Thomas and Falkenberg, Robert and Pillmann, Johannes and Wietfeld, Christian. Efficient machine-type communication using multi-metric context-awareness for cars used as mobile sensors in upcoming 5G networks. In Proceedings of the 87th Vehicular Technology Conference: VTC2018-Spring, IEEE, 2018.
Tomaras/etal/2018a Dimitrios Tomaras and Vana Kalogeraki and Thomas Liebig and Dimitrios Gunopulos. Crowd-based ecofriendly trip planning. In Proceedings of the 19th IEEE International Conference on Mobile Data Management, Aalborg, pages (accepted), IEEE Press, 2018.
Heppe/2017a Heppe, Lukas and Liebig, Thomas. Real-Time Public Transport Delay Prediction for Situation-Aware Routing. In Kern-Isberner, Gabriele and Fürnkranz, Johannes and Thimm, Matthias (editors), KI 2017: Advances in Artificial Intelligence: 40th Annual German Conference on AI, Dortmund, Germany, September 25--29, 2017, Proceedings, pages 128--141, Cham, Springer, 2017.
Liebig/2017a Liebig, Thomas. Smart navigation - chances, risk and challenges. In M. Jankowska and M. Pawelczyk and S. Augustyn and M. Kulawiak (editors), Navigation and Earth Observation - Law & Technology, pages (accepted), Warsaw, IUS PUBLICUM, 2017.
Liebig/2017b Liebig, Thomas. Report on Data Privacy. No. H2020-688380 D4.1, VAVEL Consortium, Dortmund, Germany, 2017.
Liebig/etal/2017b Liebig, Thomas and Piatkowski, Nico and Bockermann, Christian and Morik, Katharina. Dynamic Route Planning with Real-Time Traffic Predictions. In Information Systems, Vol. 64, pages 258--265, Elsevier, 2017.
Liebig/Sotzny/2017a Liebig, Thomas and Sotzny, Maurice. On Avoiding Traffic Jams with Dynamic Self-Organizing Trip Planning. In Clementini, Eliseo and Donnelly, Maureen and Yuan, May and Kray, Christian and Fogliaroni, Paolo and Ballatore, Andrea (editors), 13th International Conference on Spatial Information Theory (COSIT 2017), Vol. 86, pages 17:1--17:12, Dagstuhl, Germany, Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, 2017.
Souto/Liebig/2016a Gustavo Souto and Thomas Liebig. On Event Detection from Spatial Time Series for Urban Traffic Applications. In Stefan Michaelis and Nico Piatkowski and Marco Stolpe (editors), Solving Large Scale Learning Tasks: Challenges and Algorithms, Vol. 9580, pages 221--233, Springer, 2016.
Liebig/2015b Liebig, Thomas. Analysis Methods and Privacy Aspects in Spatio-Temporal Data Mining. In Marlena Jankowska and Miroslaw Pawelczyk and Sylvie Allouche and Marcin Kulawiak (editors), AI: Philosophy, Geoinformatics & Law, pages (to appear), Warsaw, IUS PUBLICUM, 2015.

ViSTA-TV


  • Start: 06/2012
  • Partners: University of Zurich (Coordinator), TU Dortmund University, Rapid-I GmbH, Zattoo Europa AG, Vrije Universiteit Amsterdam, BBC
  • URL: Vista-TV.eu

Live video content is increasingly consumed over IP networks in addition to traditional broadcasting. The move to IP provides a huge opportunity to discover what people are watching in much greater breadth and depth than currently possible through interviews or set-top box based data gathering by rating organizations, because it allows direct analysis of consumer behavior via the logs they produce. The ViSTA-TV project proposes to gather consumers’ anonymized viewing behavior and the actual video streams from broadcasters/IPTV-transmitters, to combine them with enhanced electronic program guide information as the input for a holistic live-stream data mining analysis.
ViSTA-TV will employ the gathered information via a stream-analytics process to generate a high-quality linked open dataset (LOD) describing live TV programming. Combining the LOD with the behavioral information gathered, ViSTA-TV will be in the position to provide highly accurate market research information about viewing behavior that can be used for a variety of analyses of high interest to all participants in the TV-industry. ViSTA-TV will employ the information gathered to build a recommendation service that exploits both usage information and personalized feature extraction in conjunction with existing metadata to provide real-time viewing recommendations.
These results will be made possible by scientific progress in data-stream mining consisting of advances in data mining for tagging, recommendations, and behavioral analyses and temporal/probabilistic RDF-triple stream processing.

ViSTA-TV is a European Union-funded research project, beginning on 1 June 2012, and lasting for two years.

KobRA - Korpus-basierte linguistische Recherche und Analyse mit Hilfe von Data-Mining (Corpus-based linguistic research and analysis using data mining)

  • Duration: 09/2012 - 08/2015
  • Participants: Prof. Dr. Angelika Storrer, Prof. Dr. Katharina Morik, Prof. Dr. Erhard Hinrichs, Dr. Alexander Geyken, Dr. Marc Kupietz, Dr. Andreas Witt
  • URL: KoBRA

Corpus-based linguistics has developed into an important field of language research in recent years. Infrastructure projects such as CLARIN provide extensive, structured language resources (text corpora, treebanks, lexical wordnets) that offer novel and attractive ways to investigate linguistic questions on authentic language-use data and to evaluate them quantitatively.

The goal of the project is to improve the possibilities of empirical linguistic work with structured language resources by applying innovative data mining methods (in particular, machine learning methods).

DDMD Data Driven Material Development

This project aims to advance the systematic design of new materials through interdisciplinary collaboration between materials science and computer science. The new branch of science is called "Data Driven Materials Development" ("Datengetriebene Materialentwicklung"). In this field, the aim is both to gain new discoveries and insights, e.g. about previously unknown phases or special physical properties of materials, and to accelerate the development of new materials. To this end, two materials research chairs at RUB, which work on the synergistic use of experimental high-throughput methods and analytical modeling, collaborate with two computer science chairs at TU Dortmund and the University of Duisburg-Essen, which work on data mining and high-throughput analysis, respectively. This is necessary because systematic materials research, especially in the areas of thin-film materials libraries, property screenings, and "Advanced Materials Simulation", produces very large and high-dimensional data sets that can only be analyzed efficiently with novel data analysis methods and corresponding computing resources.

SFB 475 - Project A4

  • Duration: since 07/1997 (DFG)
  • Project Leader: Prof. Dr. Katharina Morik, Prof. Dr. Claus Weihs
  • Staff: Thorsten Joachims, Stefan Rüping, Ralf Klinkenberg, Ingo Mierswa, Martin Scholz, Michael Wurst
  • URL: SFB 475 - A4

The aim of project A4 is to combine statistical methods and machine learning methods in order to improve Knowledge Discovery in Databases (KDD). After the knowledge discovery process was examined as a whole in the last period, the current period focuses on two important problems that often occur in the practice of knowledge discovery: the analysis of temporal phenomena in the form of events, and the application of experimental design. Research on these problems promises a particular synergy from combining statistical and machine learning methods. Additionally, the project emphasizes the applied analysis of real databases.

Selected Publications

Mierswa, Ingo and Morik, Katharina. Automatic Feature Extraction for Classifying Audio Data. Machine Learning Journal, 58, 127-149, 2005.
Mierswa, Ingo and Wurst, Michael. Efficient Case Based Feature Construction for Heterogeneous Learning Tasks. In Proceedings of the European Conference on Machine Learning (ECML), Springer-Verlag, Berlin, 641-648, 2005.
Morik, Katharina and Siebes, Arno and Boulicault, Jean-François (editors). Detecting Local Patterns, Springer Lecture Notes in Artificial Intelligence, Volume 3539, Springer-Verlag, Berlin, 2005.
Rüping, Stefan and Scheffer, Tobias (editors). Proceedings of the ICML 2005 Workshop on Learning with Multiple Views, 2005.
Scholz, Martin. Sampling-Based Sequential Subgroup Mining. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Databases (KDD), 265-274, 2005.
Klinkenberg, Ralf and Rüping, Stefan. Concept Drift and the Importance of Examples. In Franke, Jürgen and Nakhaeizadeh, Gholamreza and Renz, Ingrid (editors), Text Mining - Theoretical Aspects and Applications, pages 55--77, Physica-Verlag, Berlin, 2003.
Morik, Katharina and Rüping, Stefan. A Multistrategy Approach to the Classification of Phases in Business Cycles. In Proceedings of the European Conference on Machine Learning (ECML), Springer-Verlag, 307-318, 2002.
Joachims, Thorsten. Estimating the Generalization Performance of a SVM Efficiently. In Proceedings of the International Conference on Machine Learning (ICML), Morgan Kaufmann, 431-438, 2000.
Joachims, Thorsten. Making large-Scale SVM Learning Practical. In: Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999.
Joachims, Thorsten. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the European Conference on Machine Learning (ECML), Springer-Verlag, 137-142, 1998.

KDUbiq


  • Duration: since 01/2006 (EU)
  • Project Leader: Fraunhofer Institute for Intelligent Autonomous Systems
  • Staff: Katharina Morik, Sebastian Land
  • URL: http://www.kdubiq.org

KDUbiq brings together newly emerging research in ubiquitous knowledge discovery. This multi-disciplinary approach constitutes a paradigm shift for the field of knowledge discovery since the idea of standalone analysis tools is abandoned in favour of process integrated, distributed and autonomous analysis systems.

SFB 531 - Project B5

  • Duration: 01/2000 - 12/2002 (DFG)
  • Project Leader: Prof. Dr. Katharina Morik
  • Staff: Oliver Ritthoff, Ralf Klinkenberg, Ingo Mierswa
  • URL: SFB 531 - B5

The goal of this project is the identification and formalization of practically relevant learning tasks on the basis of applications in the C-projects. It considers learning tasks that deviate from the standard scenarios of classification or optimization, e.g., learning with non-factual knowledge, repeated learning of similar concepts, learning of temporally varying concepts, and feature selection/construction. Feature selection/construction is a central aspect of the investigations.

Selected Publications

Klinkenberg, Ralf. Learning Drifting Concepts: Example Selection vs. Example Weighting. In Intelligent Data Analysis (IDA), Special Issue on Incremental Learning Systems Capable of Dealing with Concept Drift, Vol. 8, No. 3, 2004.
Klinkenberg, Ralf and Rüping, Stefan. Concept Drift and the Importance of Examples. In Franke, Jürgen and Nakhaeizadeh, Gholamreza and Renz, Ingrid (editors), Text Mining -- Theoretical Aspects and Applications, pages 55-77, Berlin, Germany, Physica-Verlag, 2003.
Ritthoff, Oliver and Klinkenberg, Ralf. Evolutionary Feature Space Transformation using Type-Restricted Generators. In Cantu-Paz, E. et al. (editors), Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2003) - Part II, pages 1606-1607, Springer, 2003.
Ritthoff, Oliver and Klinkenberg, Ralf and Fischer, Simon and Mierswa, Ingo. A Hybrid Approach to Feature Selection and Generation Using an Evolutionary Algorithm. In Bullinaria, John A. (editors), Proceedings of the 2002 U.K. Workshop on Computational Intelligence (UKCI-02), pages 147-154, Birmingham, UK, University of Birmingham, 2002.
Klinkenberg, Ralf and Joachims, Thorsten. Detecting concept drift with support vector machines. In P. Langley (editor), Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 487-494. Morgan Kaufmann, San Francisco, CA, USA, 2000.

SFB 531 - Project C11

  • Duration: 01/2003 - 12/2005 (DFG)
  • Project Leader: Prof. Dr. Katharina Morik, Prof. Dr. Henner Schmidt-Traub
  • Staff: Dipl.-Ing. Bernd Hicking, Dipl.-Inform. Hanna Köpcke, Dipl.-Inform. Ingo Mierswa, Dipl.-Inform. Oliver Ritthoff
  • URL: SFB 531 - C11

The goal of this project is to find optimal positions for given chemical equipment with methods from the field of Computational Intelligence. We compare and evaluate several knowledge-based and numerical approaches to optimize a plant layout under given constraints. So far, prior knowledge has not been used for sub-symbolic optimization, so ideas from knowledge-based optimization are to be transferred to Computational Intelligence. This knowledge is extracted from plans provided by engineers.

Selected Publications

Morik, Katharina and Schmidt-Traub, Henner and Hicking, Bernd and Köpcke, Hanna and Mierswa, Ingo. Layout optimization for chemical plants. In Industriemanagement, 2005.
Mierswa, Ingo. Incorporating Fuzzy Knowledge into Fitness: Multiobjective Evolutionary 3D Design of Process Plants. In Proceedings of the Genetic and Evolutionary Computation Conference GECCO 2005, Washington D.C., USA, 2005.

Awake

  • Duration: 04/2001 - 12/2003 (BMBF)
  • Project Leader: Fraunhofer Institute for Media Communication
  • Staff: Michael Wurst, Katharina Morik
  • URL: http://awake.imk.fhg.de

The aim of the Awake project is to explore how implicit knowledge structures in different communities of experts can be discovered, visualised and employed for semantic navigation of information spaces and construction of new knowledge. The developed methods combine semantic text analysis with Machine Learning and interfaces for visualising relationships and creating new knowledge structures. Application scenarios include automatic generation of personalised knowledge portals, collaborative semantic exploration of complex information spaces and construction of shared ontology networks for the Semantic Web. The real-world testbed and context of development is the Internet platform netzspannung.org, which aims at establishing a knowledge portal connecting digital art, culture and information technology.

Selected Publications

Novak, Jasminko and Wurst, Michael. Supporting Knowledge Creation and Sharing in Communities Based on Mapping Implicit Knowledge. In Journal of Universal Computer Science, Vol. 10, No. 3, pages 235--251, 2004.
Wurst, Michael and Novak, Jasminko. Knowledge Sharing in Heterogeneous Expert Communities based on Personal Taxonomies. In ECAI Workshop on Agent Mediated Knowledge Management, 2004.
Novak, Jasminko and Wurst, Michael. Discovering, Visualizing and Sharing Knowledge through Personalized Learning Knowledge Maps. In Agent Mediated Knowledge Management, 2003.
Novak, Jasminko and Wurst, Michael. Supporting Communities of Practice Through Personalisation and Collaborative Structuring based on Capturing Implicit Knowledge. In Proceedings of the International Conference on Knowledge Management, 2003.
Morik, Katharina and Wurst, Michael. Knowledge Discovery and Knowledge Visualization, Perspektiven vernetzter Wissensraeume, Workshop 2002. 2002.

Mining Mart

  • Duration: 01/2000 - 02/2003 (EU)
  • Project Leader: Katharina Morik
  • Staff: Katharina Morik, Martin Scholz, Timm Euler, Harald Liedtke
  • URL: http://mmart.cs.uni-dortmund.de

Within the data mining process, considerable time is spent on preprocessing the data. Practical experience has shown that preprocessing can take from 50% up to 80% of the entire data mining process when traditional attribute-value learners are used. That is why preprocessing is the key issue in data analysis. The time is spent on:

  • Choosing the learning task
  • Sampling
  • Feature generation, extraction, and selection
  • Data cleaning
  • Model selection or tuning the hypothesis space
  • Defining appropriate evaluation criteria

Experienced users can apply any learning system successfully to any application, since they prepare the data well. The representation of examples and the choice of a sample determine the applicability of learning methods. A chain of data transformations (learning steps or manual preprocessing) delivers the desired result. Experienced users remember prototypical successful transformation/learning chains.
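The idea of a stored, reusable transformation chain can be illustrated with a minimal Python sketch. It does not reproduce the MiningMart system: the step names (`drop_incomplete`, `add_ratio_feature`, `select_features`) and the record fields `calls` and `revenue` are hypothetical and serve only to show how a sequence of preprocessing steps can be kept as one unit and re-applied to new data.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]
Step = Callable[[List[Record]], List[Record]]


def drop_incomplete(rows: List[Record]) -> List[Record]:
    """Data cleaning: remove records that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_ratio_feature(rows: List[Record]) -> List[Record]:
    """Feature generation: derive revenue per call (hypothetical fields).

    Assumes 'calls' is non-zero in every record that reaches this step.
    """
    return [{**r, "revenue_per_call": r["revenue"] / r["calls"]} for r in rows]


def select_features(keep: List[str]) -> Step:
    """Feature selection: build a step that keeps only the named fields."""
    def step(rows: List[Record]) -> List[Record]:
        return [{k: r[k] for k in keep} for r in rows]
    return step


def run_chain(steps: List[Step], rows: List[Record]) -> List[Record]:
    """Apply the steps in order, feeding each output into the next step."""
    return reduce(lambda data, step: step(data), steps, rows)


# A chain is just an ordered list of steps, so it can be stored and reused.
chain: List[Step] = [
    drop_incomplete,
    add_ratio_feature,
    select_features(["revenue_per_call"]),
]

raw: List[Record] = [
    {"calls": 10.0, "revenue": 25.0},
    {"calls": None, "revenue": 7.0},   # dropped by the cleaning step
    {"calls": 4.0, "revenue": 2.0},
]
prepared = run_chain(chain, raw)
```

Here `prepared` holds two records, each with the single derived field `revenue_per_call`. Keeping the chain as data rather than as ad hoc code is what allows a successful chain to be re-applied to a similar task.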

Selected Publications

Euler, Timm. Publishing Operational Models of Data Mining Case Studies. In Proceedings of the Workshop on Data Mining Case Studies at the 5th IEEE International Conference on Data Mining (ICDM), pages 99--106, Houston, Texas, USA, 2005.
Euler, Timm. Modelling Data Mining Processes on a Conceptual Level. In Proceedings of the 5th International Conference on Decision Support for Telecommunications and Information Society, Warsaw, Poland, 2005.
Morik, Katharina and Scholz, Martin. The MiningMart Approach to Knowledge Discovery in Databases. In Ning Zhong and Jiming Liu (editors), Intelligent Technologies for Information Analysis, pages 47--65, Springer, 2004.
Kietz, Jörg-Uwe and Vaduva, Anca and Zücker, Regina. MiningMart: Metadata-Driven Preprocessing. In Proceedings of the ECML/PKDD Workshop on Database Support for KDD, 2001.
Kietz, Jörg-Uwe and Vaduva, Anca and Zücker, Regina. Mining Mart: Combining Case-Based-Reasoning and Multi-Strategy Learning into a Framework to reuse KDD-Application. In Proceedings of the 5th International Workshop on Multistrategy Learning, R.S. Michalski and P. Brazdil (editors), 2000.
Morik, Katharina. The Representation Race - Preprocessing for Handling Time Phenomena. In Proceedings of the European Conference on Machine Learning, Barcelona, Spain, Springer, 2000.

COMRIS

The COMRIS project aims to develop, demonstrate and experimentally evaluate a scalable approach to integrating the Inhabited Information Spaces schema with a concept of software agents. The COMRIS vision of co-habited mixed-reality information spaces emphasizes the co-habitation of software and human agents in a pair of closely coupled spaces, a virtual and a real one. However, this project does not pursue the perceptual integration of real and virtual space into an augmented reality. Instead, the coupling aims at focusing the large potential for useful social interactions in each of the spaces, so that they become more manageable, goal-directed and effective.

Selected Publications

Cranefield, Stephen and Haustein, Stefan and Purvis, Martin. UML-Based Ontology Modelling for Software Agents. In Proceedings of the Autonomous Agents 2001 Workshop on Ontologies in Agent Systems, 2001.
Haustein, Stefan. Semantic Web Languages: RDF vs. SOAP Serialization. In Proceedings of the Second International Workshop on the Semantic Web at WWW10, 2001.
Haustein, Stefan. Utilising an Ontology Based Repository to Connect Web Miners and Application Agents. In Proceedings of the ECML/PKDD Workshop on Semantic Web Mining, 2001.
Haustein, Stefan and Lüdecke, Sascha and Schwering, Christian. The Knowledge Agency. In Proceedings of the Fourth International Conference on Autonomous Agents, pages 205--206, ACM SIGART, Barcelona, Spain, ACM Press, New York, 2000.
Haustein, Stefan and Lüdecke, Sascha. Towards Information Agent Interoperability. In Cooperative Information Agents IV -- The Future of Information Agents in Cyberspace, Vol. 1860, pages 208--219, Boston, USA, Springer, 2000.
Morik, Katharina and Haustein, Stefan. The Challenge of Discovering Meta-Data. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, American Association for Artificial Intelligence (AAAI), AAAI press, 2000.

BLearn II

  • Duration: 9/1992 - 8/1995 (EU)
  • Project Leader: University of Karlsruhe
  • Staff: Volker Klingspor, Katharina Morik, Anke Rieger

Within the project BLearn II, machine learning methods are applied to robotics in order to reduce the time for setting up and modifying robot applications and to make the operation of robots more user-friendly. The task of chair VIII within this project is to integrate logic-based learning into navigation. The goal is to allow a human user to give abstract commands, such as "Pass through the doorway, turn left and stop". In order to execute these commands, the robot has to be able to recognize, for example, a door or a cupboard. In addition, the robot has to be able to find a door and to execute a left turn in a flexible way, adjusting itself to the different spatial conditions. A hierarchy of learning steps has been developed that starts from sensor data and robot moves and leads to operational concepts. These concepts integrate information about perceptions and actions, such that object recognition and action are coupled directly.

Selected Publications

Morik, Katharina and Klingspor, Volker and Kaiser, Michael (editors). Making Robots Smarter -- Combining Sensing and Action through Robot Learning. Kluwer Academic Press, 1999.
Klingspor, Volker and Morik, Katharina and Rieger, Anke. Learning Concepts from Sensor Data of a Mobile Robot. In Machine Learning, Vol. 23, No. 2/3, pages 305-332, 1996.
Klingspor, Volker and Demiris, J. and Kaiser, Michael. Human-Robot-Communication and Machine Learning. In Applied Artificial Intelligence, Vol. 11, No. 7/8, pages 719--746, 1997.
Klingspor, Volker and Morik, Katharina. Towards Concept Formation Grounded on Perception and Action of a Mobile Robot. In U. Rembold and R. Dillmann and L.O. Hertzberger and T. Kanade (editors), IAS--4, Proc. of the 4th Intern. Conference on Intelligent Autonomous Systems, pages 271--278, Amsterdam, IOS Press, 1995.