Proseminar, Summer Semester 2020

Machine Learning

Prof. Dr. Katharina Morik
Informatik LS8

Trustworthy AI -

Trustworthy Machine Learning

Machine learning (ML) is a driving force behind many successful applications in Artificial Intelligence. Trustworthy ML pipelines must provide guarantees on the system as a whole (i.e., horizontal certification) as well as on each single component (i.e., vertical certification). Horizontal certification covers the full pipeline from data acquisition to data visualization or model deployment. These pipelines start with data acquisition: while scientific experiments are designed for analysis, companies normally store their data for purposes other than analysis, and production and communication generate streaming data from distributed sensors that need to be synchronized and combined. ML and database theory both investigate methods of data description, data compression, feature extraction and selection, and sampling. Data impurities may travel through an ML pipeline from data acquisition into subsequent components and degrade the quality of the pipeline downstream. Some approaches optimize the overall process of data analytics. Where errors have diffused through an ML pipeline, data-driven debugging and explanation techniques (i.e., data provenance) are required to trace where the errors originate.

Vertical certification exploits the theory of ML to guarantee error bounds, sample complexity, energy consumption, execution time, and memory and communication demands. Many approaches are based on statistical theory. Since methods are implemented in a particular programming paradigm on a particular hardware architecture, which testing procedures are readily available for certifying a particular implementation of a method, and which have yet to be developed?
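As an illustration of the kind of statistical guarantee meant here, the following sketch computes the classical Hoeffding bound relating sample size, confidence, and the deviation between empirical and true error of a fixed hypothesis. The function names are illustrative, not from any cited work.

```python
import math

def hoeffding_bound(n, delta):
    """Hoeffding's inequality: with probability at least 1 - delta, the true
    error of a fixed hypothesis deviates from its empirical error on n i.i.d.
    samples by at most the returned epsilon."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def samples_needed(epsilon, delta):
    """Sample complexity: smallest n guaranteeing deviation at most epsilon
    with probability at least 1 - delta (inverting the bound above)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
```

For example, guaranteeing a deviation of at most 0.05 with 95% confidence requires `samples_needed(0.05, 0.05)` samples; such closed-form bounds are one building block of vertical certification.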

The robustness of an algorithm refers to the relationship between changes in the data and changes in the learning outcome. How can this be measured and tested efficiently?
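One simple empirical measure of this relationship is perturbation sensitivity: apply small random changes to the inputs and count how often the model's prediction flips. A minimal sketch, with a hypothetical toy classifier standing in for a learned model:

```python
import random

def perturbation_sensitivity(model, points, noise=0.01, trials=100, seed=0):
    """Fraction of small random input perturbations that flip the model's
    prediction, averaged over the given points. Lower means more robust."""
    rng = random.Random(seed)
    flips, total = 0, 0
    for x in points:
        y = model(x)
        for _ in range(trials):
            x_pert = [xi + rng.uniform(-noise, noise) for xi in x]
            flips += (model(x_pert) != y)
            total += 1
    return flips / total

# Hypothetical stand-in for a learned model: a linear threshold classifier.
clf = lambda x: int(sum(x) > 1.0)
```

Points far from the decision boundary yield a sensitivity of 0, while points near the boundary flip under tiny perturbations; certified robustness methods aim to bound this behavior analytically rather than by sampling.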

Fairness relates not only to properties of the data and of the learned model, but also to our knowledge of what is possible (e.g., women being leaders), even when it is counterfactual with respect to the data.
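One widely used way to quantify (one notion of) fairness is demographic parity: the rate of positive predictions should be the same across protected groups. A minimal sketch for the two-group case, with illustrative names:

```python
def demographic_parity_gap(predictions, groups):
    """Absolute difference in positive-prediction rates between exactly two
    groups (binary predictions, 0/1). A gap of 0 means the classifier
    satisfies demographic parity."""
    rates = {}
    for g in set(groups):
        preds_g = [p for p, gg in zip(predictions, groups) if gg == g]
        rates[g] = sum(preds_g) / len(preds_g)
    a, b = rates.values()
    return abs(a - b)
```

A gap of 1.0 indicates maximal disparity (one group always receives the positive outcome, the other never); the papers listed below study such measures in ranking and recommendation settings.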

Explainability of an ML process can be regarded from the user and the system perspective. From the user perspective, we ask what can be done to help users comprehend learned models and inspect their applications. From the system perspective, we ask how learned models can be characterized and, finally, certified. This is the core of the ML theory underlying vertical certification.
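A simple model-agnostic explanation technique in this spirit is permutation importance: shuffle one feature column, breaking its association with the label, and measure how much accuracy drops. A minimal sketch (pure-Python, illustrative names):

```python
import random

def permutation_importance(model, X, y, feature, seed=0):
    """Accuracy drop when the given feature column of X is randomly
    shuffled. A large drop means the model relies on that feature."""
    rng = random.Random(seed)
    acc = sum(model(x) == yi for x, yi in zip(X, y)) / len(y)
    col = [x[feature] for x in X]
    rng.shuffle(col)
    X_perm = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, col)]
    acc_perm = sum(model(x) == yi for x, yi in zip(X_perm, y)) / len(y)
    return acc - acc_perm
```

Features the model ignores score 0; the survey papers listed below cover this family of black-box explanation methods alongside model-specific ones.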

The issue of responsibility for data and for the services built upon it concerns the overall pipeline. Companies need a clear policy governing the entire ML pipeline. The policy introduces quality measures together with their testing routines, and it also governs data rights. What are best-practice procedures for companies, and how can they be made easy to follow? Under digital privacy legislation (e.g., the GDPR in Europe), the donors of data have the right to be informed about how their data are stored and used. The High-Level Expert Group on AI has delivered its Ethics Guidelines for Trustworthy AI and its Policy and Investment Recommendations. An approach to the horizontal certification of AI applications is under development at KI.NRW by Fraunhofer IAIS.

Date: Tuesdays, 16:15 - 17:45 h, online


Moodle Workspace

Literature (excluding books):

Horizontal Certification
- An intermediate representation for optimizing machine learning pipelines
- A survey on provenance: What for? What form? What from?

Vertical Certification
- Sample complexity of composite likelihood
- Loopy belief propagation: Convergence and effects of message errors
- Realization of random forest for real-time evaluation through tree framing
- A comprehensive study of real-world numerical bug characteristics
- The (black) art of runtime evaluation: Are we comparing algorithms or implementations?

Robustness
- Exploring change: A new dimension of data analytics
- DBChEx: Interactive exploration of data and schema change
- Metrics for explainable AI: Challenges and prospects

Fairness
- Fairness and transparency in crowdsourcing
- Fairness and discrimination in retrieval and recommendation
- Unbiased learning-to-rank with biased feedback
- Bias in OLAP queries: Detection, explanation, and removal
- FA*IR: A fair top-k ranking algorithm

Explainability
- When people and algorithms meet: User-reported problems in intelligent everyday applications
- A survey of methods for explaining black box models
- Methods for interpreting and understanding deep neural networks
- "Why should I trust you?": Explaining the predictions of any classifier
- Evaluating the visualization of what a deep neural network has learned
- Integer undirected graphical models for resource-constrained systems
- Interpreting classifiers by multiple views
- Learning interpretable models
- The strength of weak learnability
- Boosting the margin: A new explanation for the effectiveness of voting methods
- Democratizing data science through interactive curation of ML pipelines

Responsibility
- The responsibility challenge for data