Selected Projects

SFB 876

Funding period: since 01/2011 (DFG)
Spokesperson: Prof. Dr. Katharina Morik
URL: SFB 876

The collaborative research center SFB 876 brings together data mining and embedded systems. On the one hand, embedded systems can be further improved using machine learning. On the other hand, data mining algorithms can be realized in hardware, e.g. on FPGAs, or run on GPGPUs. The restrictions of ubiquitous systems in computing power, memory, and energy demand new algorithms for known learning tasks. These resource-bounded learning algorithms may also be applied to extremely large databases on servers.

Selected Publications in high-impact journals and conferences

Fischer/etal/2023a	Fischer, Raphael and Jakobs, Matthias and Morik, Katharina. Energy efficiency considerations for popular AI benchmarks. In AAAI-2 Workshop AI for Innovation, 2023.
Bause/etal/2022a	Franka Bause and Erich Schubert and Nils M. Kriege. EmbAssi: embedding assignment costs for similarity search in large graph databases. In Data Mining and Knowledge Discovery, Springer, 2022.
Buschjaeger/etal/2021d	Buschjäger, Sebastian and Hess, Sibylle and Morik, Katharina J. Shrub Ensembles for Online Classification. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22), Vol. 36, No. 6, pages 6123-6131, AAAI Press, 2022.
Lang/Schubert/2021a	Andreas Lang and Erich Schubert. BETULA: Fast Clustering of Large Data with Improved BIRCH CF-Trees. In Information Systems, Vol. 108, pages 101918, 2022.
Morik/etal/2021a	Morik, Katharina and Kotthaus, Helena and Fischer, Raphael and Mücke, Sascha and Jakobs, Matthias and Piatkowski, Nico and Pauly, Andreas and Heppe, Lukas and Heinrich, Danny. Yes We Care! - Certification for Machine Learning Methods through the Care Label Framework. In Elisa Fromont (editor), Frontiers in Artificial Intelligence, Frontiers, 2022.
Saadallah/etal/2022a	Saadallah, Amal and Büscher, Jan and Abdulaaty, Omar and Panusch, Thorben and Deuse, Jochen and Morik, Katharina. Explainable Predictive Quality Inspection using Deep Learning in Electronics Manufacturing. In 55th CIRP conference on Manufacturing Systems, Elsevier, 2022.
Saadallah/etal/2022c	Saadallah, Amal and Abdulaaty, Omar and Büscher, Jan and Panusch, Thorben and Morik, Katharina and Deuse, Jochen. Early Quality Prediction using Deep Learning on Time Series Sensor Data. In 55th CIRP conference on Manufacturing Systems, Elsevier, 2022.
Thordsen/Schubert/2022a	Erik Thordsen and Erich Schubert. ABID: Angle Based Intrinsic Dimensionality -- Theory and Analysis. In Information Systems, Vol. 108, pages 101989, 2022.
Munteanu/etal/2021a	Munteanu, Alexander and Omlor, Simon and Woodruff, David P. Oblivious Sketching for Logistic Regression. In Proceedings of the 38th International Conference on Machine Learning (ICML), 2021.
Schubert/Rousseeuw/2021a	Erich Schubert and Peter J. Rousseeuw. Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms. In Information Systems, Vol. 101, pages 101804, 2021.
Buschjaeger/etal/2020a	Buschjäger, Sebastian and Pfahler, Lukas and Buss, Jens and Morik, Katharina and Rhode, Wolfgang. On-Site Gamma-Hadron Separation with Deep Learning on FPGAs. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2020.
Buschjaeger/Honysz/2020a	Buschjäger, Sebastian and Honysz, Philipp-Jan and Morik, Katharina. Generalized Isolation Forest: Some Theory and More Applications -- Extended Abstract. In Proceedings 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA 2020), IEEE, 2020.
Pfahler/Morik/2020a	Pfahler, Lukas and Morik, Katharina. Semantic Search in Millions of Equations. In KDD '20- Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2020.
Saadallah/Morik/2020g	Saadallah, Amal and Morik, Katharina. Active Sampling for Learning Interpretable Surrogate Machine Learning Models. In IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2020.
Shao/etal/2020a	Xiaoting Shao and Alejandro Molina and Antonio Vergari and Karl Stelzner and Robert Peharz and Thomas Liebig and Kristian Kersting. Conditional Sum-Product Networks: Composing Neural Networks into Probabilistic Tractable Models. In Proceedings of the 10th International Conference on Probabilistic Graphical Models, 2020.
Hess/etal/2019a	Hess, Sibylle and Duivesteijn, Wouter and Honysz, Philipp-Jan and Morik, Katharina. The SpectACl of Nonconvex Clustering: a Spectral Approach to Density-Based Clustering. In AAAI, 2019.
Kotthaus/etal/2019a	Tözün, Pinar and Kotthaus, Helena. Scheduling Data-Intensive Tasks on Heterogeneous Many Cores. In IEEE Data Engineering Bulletin, Vol. 42, No. 1, pages 61-72, 2019.
Meintrup/etal/2019a	Meintrup, Stefan and Munteanu, Alexander and Rohde, Dennis. Random projections and sampling algorithms for clustering of high-dimensional polygonal curves. In Advances in Neural Information Processing Systems 32 (NeurIPS), pages 12807--12817, 2019.
Pfahler/etal/2019b	Pfahler, Lukas and Schill, Jonathan and Morik, Katharina. The Search for Equations - Learning to Identify Similarities between Mathematical Expressions. In Proceedings of ECML PKDD 2019, Springer, 2019.
Buschjaeger/Morik/2017b	Buschjäger, Sebastian and Morik, Katharina. Decision Tree and Random Forest Implementations for Fast Filtering of Sensor Data. In IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 65-I, No. 1, pages 209--222, 2018.
Falkenberg/etal/2018b	Robert Falkenberg and Benjamin Sliwa and Nico Piatkowski and Christian Wietfeld. Machine Learning Based Uplink Transmission Power Prediction for LTE and Upcoming 5G Networks using Passive Downlink Indicators. In 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Chicago, USA, 2018.
Hess/etal/2018a	Hess, Sibylle and Piatkowski, Nico and Morik, Katharina. The Trustworthy Pal: Controlling the False Discovery Rate in Boolean Matrix Factorization. In Proceedings of the 2018 SIAM International Conference on Data Mining, SDM 2018, May 3-5, 2018, San Diego Marriott Mission Valley, San Diego, CA, USA, pages 405--413, SIAM, 2018.
Molina/etal/2018a	Molina, Alejandro and Munteanu, Alexander and Kersting, Kristian. Core Dependency Networks. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018.
Piatkowski/Morik/2018a	Piatkowski, Nico and Morik, Katharina. Fast Stochastic Quadrature for Approximate Maximum-Likelihood Estimation. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018, California, USA, August 6-10, 2018, 2018.
Saadallah/etal/2018a	Saadallah, Amal and Finkeldey, Felix and Morik, Katharina and Wiederkehr, Petra. Stability prediction in milling processes using a simulation-based machine learning approach. In 51st CIRP conference on Manufacturing Systems, Elsevier, 2018.
VonDerBrueggen/etal/2018a	von der Brüggen, Georg and Piatkowski, Nico and Chen, Kuan-Hsun and Chen, Jian-Jia and Morik, Katharina. Efficiently Approximating the Probability of Deadline Misses in Real-Time Systems. In 30th Euromicro Conference on Real-Time Systems, ECRTS 2018, July 3-6, 2018, Barcelona, Spain, LIPIcs, 2018.
Hess/etal/2017a	Hess, Sibylle and Morik, Katharina and Piatkowski, Nico. The PRIMPING routine -- Tiling through proximal alternating linearized minimization. In Data Mining and Knowledge Discovery, Vol. 31, No. 4, pages 1090--1131, 2017.
Hess/Morik/2017a	Hess, Sibylle and Morik, Katharina. C-SALT: Mining Class-Specific ALTerations in Boolean Matrix Factorization. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2017, Springer, 2017.
Liebig/etal/2017b	Liebig, Thomas and Piatkowski, Nico and Bockermann, Christian and Morik, Katharina. Dynamic Route Planning with Real-Time Traffic Predictions. In Information Systems, Vol. 64, pages 258--265, Elsevier, 2017.
Molina/etal/2017a	Molina, Alejandro and Natarajan, Sriraam and Kersting, Kristian. Poisson Sum-Product Networks: A Deep Architecture for Tractable Multivariate Poisson Distributions. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI), pages 2357--2363, 2017.
Das/etal/2016a	Das, Mayukh and Wu, Yunqing and Khot, Tushar and Kersting, Kristian and Natarajan, Sriraam. Scaling Lifted Probabilistic Inference and Learning Via Graph Databases. In Proceedings of the SIAM International Conference on Data Mining (SDM), 2016.
Lee/etal/2016a	Lee, Sangkyun and Brzyski, Damian and Bogdan, Malgorzata. Fast Saddle-Point Algorithm for Generalized Dantzig Selector and FDR Control with the Ordered l1-Norm. In Arthur Gretton and Christian C. Robert (editors), Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 780--789, JMLR W&CP, 2016.
Morris/etal/2016a	Morris, Christopher and Kriege, Nils and Kersting, Kristian and Mutzel, Petra. Faster Kernels for Graphs with Continuous Attributes via Hashing. In IEEE International Conference on Data Mining (ICDM), pages 1095--1100, 2016.
Piatkowski/etal/2016a	Piatkowski, Nico and Lee, Sangkyun and Morik, Katharina. Integer undirected graphical models for resource-constrained systems. In Neurocomputing, Vol. 173, No. 1, pages 9--23, Elsevier, 2016.
Piatkowski/Morik/2016a	Piatkowski, Nico and Morik, Katharina. Stochastic Discrete Clenshaw-Curtis Quadrature. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York, USA, 19-24 June 2016, JMLR.org, 2016.
Poelitz/etal/2016a	Poelitz, Christian and Duivesteijn, Wouter and Morik, Katharina. Interpretable Domain Adaptation via Optimization over the Stiefel Manifold. In Machine Learning, Vol. 104, No. 2-3, pages 315-336, 2016.
Stolpe/2016a	Marco Stolpe. The Internet of Things: Opportunities and Challenges for Distributed Data Analysis. In SIGKDD Explorations, Vol. 18, No. 1, pages 15-34, 2016.
Taylor/etal/2016a	Taylor, Joseph and Sharmanska and Kersting, Kristian and Weir, David and Quadrianto, Novi. Learning using Unselected Features (LUFe). In S. Kambhampati (editor), Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), 2016.
Yang/etal/2016a	Yang, Shuo and Khot, Tushar and Kersting, Kristian and Natarajan, Sriraam. Learning Continuous-Time Bayesian Networks in Relational Domains: A Non-Parametric Approach. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, 2016.
Bauckhage/etal/2015a	Bauckhage, Christian and Kersting, Kristian and Hadiji, Fabian. Parameterizing the Distance Distribution of Undirected Networks. In Tom Heskes and Marina Meila (editors), Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI), AUAI, 2015.
Bauckhage/etal/2015b	Bauckhage, Christian and Kersting, Kristian and Hadiji, Fabian. How Viral are Viral Movies? In Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM), 2015.
Bockermann/etal/2015a	Bockermann, Christian and Brügge, Kai and Buß, Jens and Egorov, Alexey and Morik, Katharina and Rhode, Wolfgang and Ruhe, Tim. Online Analysis of High-Volume Data Streams in Astroparticle Physics. In European Conference on Machine Learning (ECML PKDD 2015), Industrial Track, Springer, 2015.
Boerner/etal/2015a	Börner, Mathis and Rhode, Wolfgang and Ruhe, Tim and Morik, Katharina. Discovering Neutrinos through Data Analytics. In European Conference on Machine Learning (ECML PKDD 2015), Springer, 2015.
Downar/Duivesteijn/2015a	Downar, Lennart and Duivesteijn, Wouter. Exceptionally Monotone Models - the Rank Correlation Model Class for Exceptional Model Mining. In 2015 IEEE International Conference on Data Mining (ICDM), pages 111-120, IEEE Computer Society, 2015.
Hadiji/etal/2015b	Hadiji, Fabian and Molina, Alejandro and Natarajan, Sriraam and Kersting, Kristian. Poisson Dependency Networks: Gradient Boosted Models for Multivariate Count Data. In Machine Learning Journal (MLJ), Vol. 100, No. 2, pages 477-507, 2015.
Schramm/etal/2015a	Schramm, Alexander and Köster, Johannes and Assenov, Yassen and Althoff, Kristina and Peifer, Martin and Mahlow, Ellen and Odersky, Andrea and Beisser, Daniela and Ernst, Corinna and Henssen, Anton G. and Stephan, Harald and Schröder, Christopher and Heukamp, Lukas and Engesser, Anne and Kahlert, Yvonne and Theissen, Jessica and Hero, Barbara and Roels, Frederik and Altmüller, Janine and Nürnberg, Peter and Astrahantseff, Kathy and Gloeckner, Christian and De Preter, Katleen and Plass, Christoph and Lee, Sangkyun and Lode, Holger N. and Henrich, Kai-Oliver and Gartlgruber, Moritz and Speleman, Frank and Schmezer, Peter and Westermann, Frank and Rahmann, Sven and Fischer, Matthias and Eggert, Angelika and Schulte, Johannes H. Mutational dynamics between primary and relapse neuroblastomas. In Nature Genetics, Vol. 47, No. 8, pages 872--877, 2015.
Artikis/etal/2014a	Alexander Artikis and Matthias Weidlich and Francois Schnitzler and Ioannis Boutsis and Thomas Liebig and Nico Piatkowski and Christian Bockermann and Katharina Morik and Vana Kalogeraki and Jakub Marecek and Avigdor Gal and Shie Mannor and Dimitrios Gunopulos and Dermot Kinane. Heterogeneous Stream Processing and Crowdsourcing for Urban Traffic Management. In Proceedings of the 17th International Conference on Extending Database Technology, 2014.
Kriege/etal/2014a	Kriege, Nils and Neumann, Marion and Kersting, Kristian and Mutzel, Petra. Explicit versus Implicit Graph Feature Maps: A Computational Phase Transition for Walk Kernels. In Kumar, Ravi and Toivonen, Hannu (editors), Proceedings of the IEEE International Conference on Data Mining (ICDM), pages 881--886, IEEE, 2014.
Lee/etal/2014a	Sangkyun Lee and Jörg Rahnenführer and Michel Lang and Katleen de Preter and Pieter Mestdagh and Jan Koster and Rogier Versteeg and Raymond Stallings and Luigi Varesio and Shahab Asgharzadeh and Johannes Schulte and Kathrin Fielitz and Melanie Heilmann and Katharina Morik and Alexander Schramm. Robust Selection of Cancer Survival Signatures from High-Throughput Genomic Data Using Two-Fold Subsampling. In PLoS ONE, Vol. 9, pages e108818, 2014.
Lee/Poelitz/2014a	Lee, Sangkyun and Pölitz, Christian. Kernel Completion for Learning Consensus Support Vector Machines in Bandwidth-Limited Sensor Networks. In International Conference on Pattern Recognition Applications and Methods, 2014.
Schnitzler/etal/2014b	Schnitzler, Francois and Artikis, Alexander and Weidlich, Matthias and Boutsis, Ioannis and Liebig, Thomas and Piatkowski, Nico and Bockermann, Christian and Morik, Katharina and Kalogeraki, Vana and Marecek, Jakub and Gal, Avigdor and Mannor, Shie and Kinane, Dermot and Gunopulos, Dimitrios. Heterogeneous Stream Processing and Crowdsourcing for Traffic Monitoring: Highlights. In Proceedings of the European Conference on Machine Learning (ECML), Nectar Track, pages 520-523, Springer, 2014.
Bauckhage/etal/2013b	Bauckhage, Christian and Kersting, Kristian and Rastegarpanah, Bashir. The Weibull as a Model of Shortest Path Distributions in Random Networks. In L. Adamic and L. Getoor and B. Huang and J. Leskovec and J. McAuley (editors), Working Notes of the International Workshop on Mining and Learning with Graphs, Chicago, IL, USA, 2013.
Lieber/etal/2013a	Lieber, Daniel and Stolpe, Marco and Konrad, Benedikt and Deuse, Jochen and Morik, Katharina. Quality Prediction in Interlinked Manufacturing Processes based on Supervised & Unsupervised Machine Learning. In Procedia CIRP - 46th CIRP Conf. on Manufacturing Systems, Vol. 7, pages 193-198, Elsevier, 2013.
Neumann/etal/2013b	Neumann, Marion and Moreno, Plinio and Antanas, Laura and Garnett, Roman and Kersting, Kristian. Graph Kernels for Object Category Prediction in Task-Dependent Robot Grasping. In Adamic, L. and Getoor, L. and Huang, B. and Leskovec, J. and McAuley, J. (editors), Working Notes of the International Workshop on Mining and Learning with Graphs, Chicago, IL, USA, 2013.
Ruhe/etal/2013b	Ruhe, Tim and Schmitz, Martin and Voigt, Tobias and Wornowizki, Max. DSEA: A Data Mining Approach to Unfolding. In International Cosmic Ray Conference (ICRC 2013), 2013.
Stolpe/etal/2013a	Stolpe, M. and Bhaduri, K. and Das, K. and Morik, K. Anomaly Detection in Vertically Partitioned Data by Distributed Core Vector Machines. In Blockeel, Hendrik and Kersting, Kristian and Nijssen, Siegfried and Železný, Filip (editors), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, 2013, Proceedings, Part III, pages 321--336, Springer, 2013.
Lee/Wright/2012b	Lee, Sangkyun and Wright, Stephen J. Manifold Identification in Dual Averaging Methods for Regularized Stochastic Online Learning. In Journal of Machine Learning Research, Vol. 13, pages 1705--1744, 2012.
Morik/etal/2011a	Morik, Katharina and Kaspari, Andreas and Wurst, Michael and Skirzynski, Marcin. Multi-Objective Frequent Termset Clustering. In Knowledge and Information Systems, Vol. 30, No. 3, pages 715-738, 2012.
Morik/etal/2012a	Morik, Katharina and Bhaduri, Kanishka and Kargupta, Hillol. Introduction to data mining for sustainability. In Data Mining and Knowledge Discovery, Vol. 24, No. 2, pages 311--324, 2012.
Natarajan/etal/2012c	Natarajan, Sriraam and Khot, Tushar and Kersting, Kristian and Gutmann, Bernd and Shavlik, Jude. Gradient-based boosting for statistical relational learning: The relational dependency network case. In Machine Learning Journal, Vol. 86, No. 1, 2012.
Lee/Wright/2011a	Lee, Sangkyun and Wright, Stephen J. Manifold Identification of Dual Averaging Methods for Regularized Stochastic Online Learning. In Proceedings of the 28th International Conference on Machine Learning (ICML), 2011.
Piatkowski/2011c	Piatkowski, Nico. Parallel Algorithms for GPU Accelerated Probabilistic Inference. In International Workshop on Big Learning, Neural Information Processing Systems (NIPS), 2011.
Stolpe/Morik/2011a	Stolpe, M. and Morik, K. Learning from Label Proportions by Optimizing Cluster Model Selection. In Gunopulos, Dimitrios and Hofmann, Thomas and Malerba, Donato and Vazirgiannis, Michalis (editors), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011, Proceedings, Part III, pages 349--364, Springer, 2011.

EU H2020: VaVeL: Variety, Veracity, VaLue

Duration: since 03/2016
Partner: National and Kapodistrian University of Athens, TU Dortmund University, IBM, Technion - Israel Institute of Technology, Fraunhofer IAIS, Dublin City Council, WUT - Warsaw University of Technology, City of Warsaw, OPL, AGT International
URL: https://cordis.europa.eu/project/id/688380

Urban environments are awash with data from fixed and mobile sensors and monitoring infrastructures from public, private, or industry sources. Making such data useful would enable novel big data applications that benefit the citizens of Europe in areas such as transportation, infrastructure, and crime prevention. Urban data are heterogeneous, noisy, and unlabeled, which severely reduces their usability. Succinctly stated, urban data are difficult to understand. The goal of the VaVeL project is to radically advance our ability to use urban data in applications that can identify and address citizen needs and improve urban life. Our motivation comes from problems in urban transportation. This project will develop a general-purpose framework for managing and mining multiple heterogeneous urban data streams to help cities become more efficient, productive, and resilient. The framework will be able to solve major issues that arise with urban transportation data and that existing stream management technologies currently do not address. The project brings together two European cities that provide diverse, large-scale data of cross-country origin and real application needs, three major European companies in this space, and a strong group of researchers with unique expertise in analyzing real-life urban data. VaVeL aims to make fundamental advances in addressing the most critical inefficiencies of current (big) data management and stream frameworks in coping with emerging urban sensor data, thus making European urban data more accessible and easier to use and strengthening European industries that use big data management and analytics. The consortium develops concrete, end-user-driven scenarios that address real, important problems with potentially enormous impact and a broad spectrum of technology requirements, thus enabling both the realization of the required fundamental capabilities and a realistic evaluation of the success of our methods.

Selected Publications

Buschjaeger/etal/2019a	Buschjäger, Sebastian and Liebig, Thomas and Morik, Katharina. Gaussian Model Trees for Traffic Imputation. In Proceedings of the International Conference on Pattern Recognition Applications and Methods (ICPRAM), pages 243--254, SciTePress, 2019.
Sliwa/etal/2018a	Sliwa, Benjamin and Liebig, Thomas and Falkenberg, Robert and Pillmann, Johannes and Wietfeld, Christian. Efficient machine-type communication using multi-metric context-awareness for cars used as mobile sensors in upcoming 5G networks. In Proceedings of the 87th Vehicular Technology Conference: VTC2018-Spring, IEEE, 2018.
Tomaras/etal/2018a	Dimitrios Tomaras and Vana Kalogeraki and Thomas Liebig and Dimitrios Gunopulos. Crowd-based ecofriendly trip planning. In Proceedings of the 19th IEEE International Conference on Mobile Data Management, Aalborg, pages (accepted), IEEE Press, 2018.
Heppe/2017a	Heppe, Lukas and Liebig, Thomas. Real-Time Public Transport Delay Prediction for Situation-Aware Routing. In Kern-Isberner, Gabriele and Fürnkranz, Johannes and Thimm, Matthias (editors), KI 2017: Advances in Artificial Intelligence: 40th Annual German Conference on AI, Dortmund, Germany, September 25--29, 2017, Proceedings, pages 128--141, Cham, Springer, 2017.
Liebig/2017a	Liebig, Thomas. Smart navigation - chances, risk and challenges. In M. Jankowska and M. Pawelczyk and S. Augustyn and M. Kulawiak (editors), Navigation and Earth Observation - Law & Technology, pages (accepted), Warsaw, IUS PUBLICUM, 2017.
Liebig/2017b	Liebig, Thomas. Report on Data Privacy. No. H2020-688380 D4.1, VAVEL Consortium, Dortmund, Germany, 2017.
Liebig/etal/2017b	Liebig, Thomas and Piatkowski, Nico and Bockermann, Christian and Morik, Katharina. Dynamic Route Planning with Real-Time Traffic Predictions. In Information Systems, Vol. 64, pages 258--265, Elsevier, 2017.
Liebig/Sotzny/2017a	Liebig, Thomas and Sotzny, Maurice. On Avoiding Traffic Jams with Dynamic Self-Organizing Trip Planning. In Clementini, Eliseo and Donnelly, Maureen and Yuan, May and Kray, Christian and Fogliaroni, Paolo and Ballatore, Andrea (editors), 13th International Conference on Spatial Information Theory (COSIT 2017), Vol. 86, pages 17:1--17:12, Dagstuhl, Germany, Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, 2017.
Souto/Liebig/2016a	Gustavo Souto and Thomas Liebig. On Event Detection from Spatial Time Series for Urban Traffic Applications. In Stefan Michaelis and Nico Piatkowski and Marco Stolpe (editors), Solving Large Scale Learning Tasks: Challenges and Algorithms, Vol. 9580, pages 221--233, Springer, 2016.
Liebig/2015b	Liebig, Thomas. Analysis Methods and Privacy Aspects in Spatio-Temporal Data Mining. In Marlena Jankowska and Miroslaw Pawelczyk and Sylvie Allouche and Marcin Kulawiak (editors), AI: Philosophy, Geoinformatics & Law, pages (to appear), Warsaw, IUS PUBLICUM, 2015.

Vista-TV

Start: 06/2012
Partners: University of Zurich (Coordinator), TU Dortmund University, Rapid-I GmbH, Zattoo Europa AG, Vrije Universiteit Amsterdam, BBC
URL: Vista-TV.eu

Live video content is increasingly consumed over IP networks in addition to traditional broadcasting. The move to IP provides a huge opportunity to discover what people are watching in much greater breadth and depth than is currently possible through interviews or set-top-box-based data gathering by rating organizations, because it allows direct analysis of consumer behavior via the logs they produce. The ViSTA-TV project proposes to gather consumers’ anonymized viewing behavior and the actual video streams from broadcasters/IPTV transmitters, and to combine them with enhanced electronic program guide information as the input for a holistic live-stream data mining analysis.
ViSTA-TV will process the gathered information in a stream-analytics process to generate a high-quality linked open dataset (LOD) describing live TV programming. By combining the LOD with the gathered behavioral information, ViSTA-TV will be in a position to provide highly accurate market research information about viewing behavior that can be used for a variety of analyses of high interest to all participants in the TV industry. ViSTA-TV will also use this information to build a recommendation service that exploits both usage information and personalized feature extraction, in conjunction with existing metadata, to provide real-time viewing recommendations.
These results will be made possible by scientific progress in data-stream mining, namely advances in data mining for tagging, recommendations, and behavioral analysis, as well as in temporal/probabilistic RDF-triple stream processing.

ViSTA-TV is a European Union-funded research project that began on 1 June 2012 and has a duration of two years.

KobRA - Corpus-Based Linguistic Research and Analysis Using Data Mining

Duration: 09/2012 - 08/2015
Participants: Prof. Dr. Angelika Storrer, Prof. Dr. Katharina Morik, Prof. Dr. Erhard Hinrichs, Dr. Alexander Geyken, Dr. Marc Kupietz, Dr. Andreas Witt
URL: KoBRA

In recent years, corpus-based linguistics has developed into an important field of language research. Infrastructure projects such as CLARIN provide extensive, structured language resources (text corpora, treebanks, lexical wordnets) that offer novel and attractive opportunities to investigate linguistic questions on authentic language-usage data and to evaluate them quantitatively.

The aim of the project is to improve the possibilities for empirical linguistic work with structured language resources through the use of innovative data mining methods (in particular, machine learning methods).

DDMD Data Driven Material Development

Duration: 01.10.2012 - 30.09.2014
Contact Person: Prof. Dr. Ralf Drautz
URL: DDMD Data Driven Material Development

This project aims to advance the systematic design of new materials through interdisciplinary collaboration between materials science and computer science. The new scientific field is called "Data Driven Materials Development" (in German, "Datengetriebene Materialentwicklung"). In this field, the goal is both to gain new discoveries and insights, e.g. about previously unknown phases or special physical properties of materials, and to accelerate the development of new materials. To this end, two materials research chairs at RUB, working on the synergistic use of experimental high-throughput methods and analytical modeling, collaborate with two computer science chairs at TU Dortmund and the University of Duisburg-Essen, working on data mining and high-throughput analysis, respectively. This collaboration is necessary because systematic materials research, particularly in the areas of thin-film materials libraries, property screening, and "Advanced Materials Simulation", produces very large, high-dimensional data sets that can only be analyzed efficiently with novel data analysis methods and corresponding computing resources.

SFB 475 - Project A4

Duration: since 07/1997 (DFG)
Project Leader: Prof. Dr. Katharina Morik, Prof. Dr. Claus Weihs
Staff: Thorsten Joachims, Stefan Rüping, Ralf Klinkenberg, Ingo Mierswa, Martin Scholz, Michael Wurst
URL: SFB 475 - A4

The aim of project A4 is to combine statistical methods and methods of machine learning in order to improve Knowledge Discovery in Databases (KDD). Having examined the knowledge discovery process as a whole in the previous funding period, we focus in the current period on two important problems that frequently arise in knowledge discovery practice and for which combining statistical and machine learning methods promises a particular synergy: the analysis of temporal phenomena in the form of events, and the application of experimental design. In addition, the project places emphasis on the applied analysis of real databases.

Selected Publications

Mierswa, Ingo and Morik, Katharina. Automatic Feature Extraction for Classifying Audio Data. Machine Learning Journal, 58, 127-149, 2005.

Mierswa, Ingo and Wurst, Michael. Efficient Case Based Feature Construction for Heterogeneous Learning Tasks. In Proceedings of the European Conference on Machine Learning (ECML), Springer-Verlag, Berlin, 641-648, 2005.

Morik, Katharina and Siebes, Arno and Boulicaut, Jean-François (editors). Detecting Local Patterns, Springer Lecture Notes in Artificial Intelligence, Volume 3539, Springer-Verlag, Berlin, 2005.

Rüping, Stefan and Scheffer, Tobias (editors). Proceedings of the ICML 2005 Workshop on Learning with Multiple Views, 2005.

Scholz, Martin. Sampling-Based Sequential Subgroup Mining. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Databases (KDD), 265-274, 2005.

Klinkenberg, Ralf and Rüping, Stefan. Concept Drift and the Importance of Examples. In Franke, Jürgen and Nakhaeizadeh, Gholamreza and Renz, Ingrid (editors), Text Mining - Theoretical Aspects and Applications, pages 55--77, Physica-Verlag, Berlin, 2003.

Morik, Katharina and Rüping, Stefan. A Multistrategy Approach to the Classification of Phases in Business Cycles. In Proceedings of the European Conference on Machine Learning (ECML), Springer-Verlag, 307-318, 2002.

Joachims, Thorsten. Estimating the Generalization Performance of a SVM Efficiently. In Proceedings of the International Conference on Machine Learning (ICML), Morgan Kaufmann, 431-438, 2000.

Joachims, Thorsten. Making Large-Scale SVM Learning Practical. In Advances in Kernel Methods - Support Vector Learning, MIT Press, 1999.

Joachims, Thorsten. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the European Conference on Machine Learning (ECML), Springer-Verlag, 137-142, 1998.

KDUbiq

Duration: since 01/2006 (EU)
Project Leader: Fraunhofer Institute for Intelligent Autonomous Systems
Staff: Katharina Morik, Sebastian Land
URL: http://www.kdubiq.org

KDUbiq brings together newly emerging research in ubiquitous knowledge discovery. This multi-disciplinary approach constitutes a paradigm shift for the field of knowledge discovery, since the idea of standalone analysis tools is abandoned in favour of process-integrated, distributed, and autonomous analysis systems.

Selected Publications

SFB 531 - Project B5

Duration: 01/2000 - 12/2002 (DFG)
Project Leader: Prof. Dr. Katharina Morik
Staff: Oliver Ritthoff, Ralf Klinkenberg, Ingo Mierswa
URL: SFB 531 - B5

The goal of this project is the identification and formalization of practically relevant learning tasks on the basis of applications in the C projects. We will consider particular learning tasks that deviate from the standard scenarios of classification or optimization, e.g. learning with non-factual knowledge, repeated learning of similar concepts, learning of temporally varying concepts, and feature selection/construction. Feature selection/construction will be a central focus of the investigations.

Selected Publications

Klinkenberg, Ralf. Learning Drifting Concepts: Example Selection vs. Example Weighting. In Intelligent Data Analysis (IDA), Special Issue on Incremental Learning Systems Capable of Dealing with Concept Drift, Vol. 8, No. 3, 2004.

Klinkenberg, Ralf and Rüping, Stefan. Concept Drift and the Importance of Examples. In Franke, Jürgen and Nakhaeizadeh, Gholamreza and Renz, Ingrid (editors), Text Mining - Theoretical Aspects and Applications, pages 55-77, Berlin, Germany, Physica-Verlag, 2003.

Ritthoff, Oliver and Klinkenberg, Ralf. Evolutionary Feature Space Transformation using Type-Restricted Generators. In Cantu-Paz, E. et al. (editors), Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2003) - Part II, pages 1606-1607, Springer, 2003.

Ritthoff, Oliver and Klinkenberg, Ralf and Fischer, Simon and Mierswa, Ingo. A Hybrid Approach to Feature Selection and Generation Using an Evolutionary Algorithm. In Bullinaria, John A. (editors), Proceedings of the 2002 U.K. Workshop on Computational Intelligence (UKCI-02), pages 147-154, Birmingham, UK, University of Birmingham, 2002.

Klinkenberg, Ralf and Joachims, Thorsten. Detecting concept drift with support vector machines. In P. Langley (editors), Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 487-494, Morgan Kaufmann, San Francisco, CA, USA, 2000.

SFB 531 - Project C11

Duration: 01/2003 - 12/2005 (DFG)
Project Leaders: Prof. Dr. Katharina Morik, Prof. Dr. Henner Schmidt-Traub
Staff: Dipl.-Ing. Bernd Hicking, Dipl.-Inform. Hanna Köpcke, Dipl.-Inform. Ingo Mierswa, Dipl.-Inform. Oliver Ritthoff
URL: SFB 531 - C11

The goal of this project is to find optimal placements for given chemical equipment using methods from the field of Computational Intelligence. We compare and evaluate several knowledge-based and numerical approaches to optimizing a plant layout under given constraints. So far, prior knowledge has not been used in sub-symbolic optimization, so ideas from knowledge-based optimization should be transferred to Computational Intelligence. This knowledge is extracted from plans provided by engineers.
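As an illustration of a purely numerical approach to the layout problem, the sketch below places pieces of equipment on a square grid with a simple (1+1) evolutionary algorithm: one unit is moved to a random cell, and the move is kept if the cost does not increase. The cost function (Manhattan pipe length between connected units plus a fixed penalty for each unit placed on an occupied cell), the grid size, the `layout_cost`/`optimize_layout` names and the unit names are assumptions made for this example; the sketch uses no engineering knowledge and is not the method developed in the project.

```python
import random

def layout_cost(positions, connections, penalty=100.0):
    """Manhattan pipe length over all connections plus a penalty per unit on an occupied cell."""
    length = sum(
        abs(positions[a][0] - positions[b][0]) + abs(positions[a][1] - positions[b][1])
        for a, b in connections
    )
    cells = list(positions.values())
    overlaps = len(cells) - len(set(cells))  # units placed on an already occupied cell
    return length + penalty * overlaps

def optimize_layout(units, connections, grid=5, steps=2000, seed=0):
    """(1+1) evolutionary algorithm: relocate one random unit, keep the move if the cost does not increase."""
    rng = random.Random(seed)
    positions = {u: (rng.randrange(grid), rng.randrange(grid)) for u in units}
    cost = layout_cost(positions, connections)
    for _ in range(steps):
        candidate = dict(positions)
        unit = rng.choice(units)
        candidate[unit] = (rng.randrange(grid), rng.randrange(grid))
        candidate_cost = layout_cost(candidate, connections)
        if candidate_cost <= cost:
            positions, cost = candidate, candidate_cost
    return positions, cost

# Hypothetical plant: four units connected in a chain by pipes.
units = ["pump", "reactor", "column", "tank"]
connections = [("pump", "reactor"), ("reactor", "column"), ("column", "tank")]
layout, cost = optimize_layout(units, connections)
print(layout, cost)
```

Constraints are handled here as a penalty term; engineering rules could be added as further cost terms in the same way, which this sketch leaves out for brevity.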

Selected Publications

Morik, Katharina and Schmidt-Traub, Henner and Hicking, Bernd and Köpcke, Hanna and Mierswa, Ingo. Layout optimization for chemical plants. In Industriemanagement, 2005.

Mierswa, Ingo. Incorporating Fuzzy Knowledge into Fitness: Multiobjective Evolutionary 3D Design of Process Plants. In Proceedings of the Genetic and Evolutionary Computation Conference GECCO 2005, Washington D.C., USA, 2005.

AWAKE

Duration: 04/2001 - 12/2003 (BMBF)
Project Leader: Fraunhofer Institute for Media Communication
Staff: Michael Wurst, Katharina Morik
URL: http://awake.imk.fhg.de

The aim of the AWAKE project is to explore how implicit knowledge structures in different communities of experts can be discovered, visualised and employed for semantic navigation of information spaces and the construction of new knowledge. The methods developed combine semantic text analysis with machine learning and interfaces for visualising relationships and creating new knowledge structures. Application scenarios include the automatic generation of personalised knowledge portals, collaborative semantic exploration of complex information spaces and the construction of shared ontology networks for the Semantic Web. The real-world testbed and development context is the Internet platform netzspannung.org, which aims to establish a knowledge portal connecting digital art, culture and information technology.

Selected Publications

Novak, Jasminko and Wurst, Michael. Supporting Knowledge Creation and Sharing in Communities Based on Mapping Implicit Knowledge. In Journal of Universal Computer Science (J.UCS), Vol. 10, No. 3, pages 235-251, 2004.

Wurst, Michael and Novak, Jasminko. Knowledge Sharing in Heterogeneous Expert Communities based on Personal Taxonomies. In ECAI Workshop on Agent Mediated Knowledge Management, 2004.

Novak, Jasminko and Wurst, Michael. Discovering, Visualizing and Sharing Knowledge through Personalized Learning Knowledge Maps. In Agent Mediated Knowledge Management, 2003.

Novak, Jasminko and Wurst, Michael. Supporting Communities of Practice Through Personalisation and Collaborative Structuring based on Capturing Implicit Knowledge. In Proceedings of the International Conference on Knowledge Management, 2003.

Morik, Katharina and Wurst, Michael. Knowledge Discovery and Knowledge Visualization. In Perspektiven vernetzter Wissensräume, Workshop 2002, 2002.

Mining Mart

Duration: 01/2000 - 02/2003 (EU)
Project Leader: Katharina Morik
Staff: Katharina Morik, Martin Scholz, Timm Euler, Harald Liedtke
URL: http://mmart.cs.uni-dortmund.de

Within the data mining process, considerable time is spent on preprocessing the data. Practical experience has shown that preprocessing can take 50% to 80% of the time of the entire data mining process when traditional attribute-value learners are used. That is why preprocessing is the key issue in data analysis. The time is spent on:

Choosing the learning task
Sampling
Feature generation, extraction, and selection
Data cleaning
Model selection or tuning the hypothesis space
Defining appropriate evaluation criteria

Experienced users can apply any learning system successfully to any application, since they prepare the data well. The representation of examples and the choice of a sample determine the applicability of learning methods. A chain of data transformations (learning steps or manual preprocessing) delivers the desired result. Experienced users remember prototypical successful transformation/learning chains.
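The idea of a reusable chain of transformations can be sketched in a few lines of Python: each step maps a list of records to a new list, and a chain is an ordinary list of steps that can be kept and re-applied to new data. The step functions (`sample`, `drop_missing`, `add_feature`, `select_features`, `run_chain`), the record format (plain dictionaries) and the example attributes are hypothetical and chosen for illustration; this is not the representation used by MiningMart.

```python
import random
from typing import Callable

Step = Callable[[list[dict]], list[dict]]

def sample(fraction: float, seed: int = 0) -> Step:
    """Sampling: keep each record with probability `fraction` (reproducible via `seed`)."""
    def step(rows):
        rng = random.Random(seed)
        return [r for r in rows if rng.random() < fraction]
    return step

def drop_missing(rows):
    """Data cleaning: discard records that contain a missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def add_feature(name: str, fn: Callable[[dict], object]) -> Step:
    """Feature generation: derive a new attribute from the existing ones."""
    def step(rows):
        return [{**r, name: fn(r)} for r in rows]
    return step

def select_features(keep: list[str]) -> Step:
    """Feature selection: keep only the listed attributes."""
    def step(rows):
        return [{k: r[k] for k in keep} for r in rows]
    return step

def run_chain(rows, chain):
    """Apply the steps of a chain in order."""
    for step in chain:
        rows = step(rows)
    return rows

# A chain that worked once can be kept and re-applied to new data.
margin_chain = [
    drop_missing,
    add_feature("margin", lambda r: r["revenue"] - r["cost"]),
    select_features(["margin"]),
]
rows = [
    {"revenue": 120.0, "cost": 100.0},
    {"revenue": None, "cost": 80.0},
    {"revenue": 90.0, "cost": 95.0},
]
print(run_chain(rows, margin_chain))  # [{'margin': 20.0}, {'margin': -5.0}]
```

A sampling step such as `sample(0.5)` could be prepended when the full data set is too large; with `fraction=1.0` it keeps every record.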

Selected Publications

Euler, Timm. Publishing Operational Models of Data Mining Case Studies. In Proceedings of the Workshop on Data Mining Case Studies at the 5th IEEE International Conference on Data Mining (ICDM), pages 99-106, Houston, Texas, USA, 2005.

Euler, Timm. Modelling Data Mining Processes on a Conceptual Level. In Proceedings of the 5th International Conference on Decision Support for Telecommunications and Information Society, Warsaw, Poland, 2005.

Morik, Katharina and Scholz, Martin. The MiningMart Approach to Knowledge Discovery in Databases. In Ning Zhong and Jiming Liu (editors), Intelligent Technologies for Information Analysis, pages 47-65, Springer, 2004.

Kietz, Jörg-Uwe and Vaduva, Anca and Zücker, Regina. MiningMart: Metadata-Driven Preprocessing. In Proceedings of the ECML/PKDD Workshop on Database Support for KDD, 2001.

Kietz, Jörg-Uwe and Vaduva, Anca and Zücker, Regina. Mining Mart: Combining Case-Based-Reasoning and Multi-Strategy Learning into a Framework to reuse KDD-Application. In R.S. Michalski and P. Brazdil (editors), Proceedings of the 5th International Workshop on Multistrategy Learning, 2000.

Morik, Katharina. The Representation Race - Preprocessing for Handling Time Phenomena. In Proceedings of the European Conference on Machine Learning, Barcelona, Spain, Springer, 2000.

COMRIS

Duration: 10/1997 - 12/2000 (EU)
Project Leader: Vrije Universiteit Brussel
Staff: Stefan Haustein, Katharina Morik
URL: http://arti.vub.ac.be/~comris/

The COMRIS project aims to develop, demonstrate and experimentally evaluate a scalable approach to integrating the Inhabited Information Spaces schema with a concept of software agents. The COMRIS vision of co-habited mixed-reality information spaces emphasizes the co-habitation of software and human agents in a pair of closely coupled spaces, a virtual one and a real one. However, the project does not pursue the perceptual integration of real and virtual space into an augmented reality. Instead, the coupling aims to focus the large potential for useful social interactions in each of the spaces, so that these interactions become more manageable, goal-directed and effective.

Selected Publications

Cranefield, Stephen and Haustein, Stefan and Purvis, Martin. UML-Based Ontology Modelling for Software Agents. In Proceedings of the Autonomous Agents 2001 Workshop on Ontologies in Agent Systems, 2001.

Haustein, Stefan. Semantic Web Languages: RDF vs. SOAP Serialization. In Proceedings of the Second International Workshop on the Semantic Web at WWW10, 2001.

Haustein, Stefan. Utilising an Ontology Based Repository to Connect Web Miners and Application Agents. In Proceedings of the ECML/PKDD Workshop on Semantic Web Mining, 2001.

Haustein, Stefan and Lüdecke, Sascha and Schwering, Christian. The Knowledge Agency. In Proceedings of the Fourth International Conference on Autonomous Agents, pages 205-206, ACM SIGART, Barcelona, Spain, ACM Press, New York, 2000.

Haustein, Stefan and Lüdecke, Sascha. Towards Information Agent Interoperability. In Cooperative Information Agents IV - The Future of Information Agents in Cyberspace, Vol. 1860, pages 208-219, Boston, USA, Springer, 2000.

Morik, Katharina and Haustein, Stefan. The Challenge of Discovering Meta-Data. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, American Association for Artificial Intelligence (AAAI), AAAI Press, 2000.

BLearn

Duration: 09/1992 - 08/1995 (EU)
Project Leader: University of Karlsruhe
Staff: Volker Klingspor, Katharina Morik, Anke Rieger
URL:

Within the project BLearn II, machine learning methods are applied to robotics in order to reduce the time needed for setting up and modifying robot applications and to make the operation of robots more user-friendly. The task of Chair VIII within this project is to integrate logic-based learning into navigation. The goal is to allow a human user to give abstract commands, such as "Pass through the doorway, turn left and stop". To execute these commands, the robot has to be able to recognize, for example, a door or a cupboard. In addition, the robot has to be able to find a door and to execute a left turn in a flexible way, adjusting itself to the different spatial conditions. A hierarchy of learning steps has been developed that starts from sensor data and robot moves and leads to operational concepts. These concepts integrate information about perceptions and actions, so that object recognition and action are coupled directly.

Selected Publications

Morik, Katharina and Klingspor, Volker and Kaiser, Michael (editors). Making Robots Smarter - Combining Sensing and Action through Robot Learning. Kluwer Academic Publishers, 1999.

Klingspor, Volker and Morik, Katharina and Rieger, Anke. Learning Concepts from Sensor Data of a Mobile Robot. In Machine Learning, Vol. 23, No. 2/3, pages 305-332, 1996.

Klingspor, Volker and Demiris, J. and Kaiser, Michael. Human-Robot-Communication and Machine Learning. In Applied Artificial Intelligence, Vol. 11, No. 7/8, pages 719-746, 1997.

Klingspor, Volker and Morik, Katharina. Towards Concept Formation Grounded on Perception and Action of a Mobile Robot. In U. Rembold and R. Dillmann and L.O. Hertzberger and T. Kanade (editors), IAS-4, Proc. of the 4th Intern. Conference on Intelligent Autonomous Systems, pages 271-278, Amsterdam, IOS Press, 1995.
