| Course | Weekday | Time | Location |
| --- | --- | --- | --- |
| 042531 (Übung: 042532) | Monday (Übung: Wednesday) | 10:15 - 12:00 | Otto-Hahn-Str. 12, R 1.056, Campus Nord |

- Module description: pdf
- Language: English
- Lecturer: Dr. Sangkyun Lee
- Attendance at the practice sessions (Übung) is REQUIRED.
- Office hour (lecturer): Thursday 13:00-14:00, LS8, OH 12, Raum 4.023 (or by appointment)

This course focuses on optimization techniques for finding solutions of the large-scale problems that typically appear in statistical learning and data analysis tasks with big data. Topics include methods widely adopted in the modern research literature, such as projected gradient methods, accelerated first-order algorithms, conjugate gradient methods, quasi-Newton methods, block coordinate descent, proximal point methods, stochastic subgradient algorithms, the alternating direction method of multipliers, and semi-definite programming. The efficiency of these methods, in terms of convergence rate and overall time complexity, will be discussed, so that one can see how to choose a suitable method for their own research problems. We will also discuss separable (approximate) reformulations of optimization problems that are better suited for parallel computation than their original formulations. Discussions in the course will be problem-oriented, so that methods are presented in the context of specific problem instances whenever possible. Homework assignments will be given to provide background knowledge for the course or to check understanding of the techniques.
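To give a flavor of the first topic mentioned above, here is a minimal sketch of projected gradient descent on a toy problem. The objective, the feasible interval, and the step size are all made up for illustration; they are not taken from the course materials.

```python
# Illustrative sketch (not course code): projected gradient descent on
# a toy problem -- minimize f(x) = (x - 3)^2 subject to x in [0, 2].
# The Euclidean projection onto an interval is a simple clip.

def grad(x):
    """Gradient of f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

def project(x, lo=0.0, hi=2.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, x))

x = 0.0      # starting point
step = 0.1   # fixed step size; valid here since grad is Lipschitz
for _ in range(50):
    x = project(x - step * grad(x))

# The unconstrained minimizer is x = 3; the projection pins the
# iterates to the boundary x = 2 of the feasible set.
```

The same two-step pattern (gradient step, then projection) carries over to high-dimensional constrained problems whenever the projection is cheap to compute.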

The aim of this course is to provide students with an understanding of modern optimization techniques suitable for large-scale/big-data problems, so that they can choose, apply, and/or modify appropriate methods for their own research problems.

- Lectures: for each method, we focus on the structure of its target problem, the algorithm itself, and its convergence properties in theory.
- Practices: we will see real-world examples where the optimization methods discussed in the lectures are used.

This lecture will be based on recent publications that will be assigned as reading homework. No textbook is required. Another lecture of mine will be helpful if you need background knowledge.

Some of my lecture materials are based on:

- EE236C (UCLA) by L. Vandenberghe, link
- 10-725 (CMU) by G. Gordon and R. Tibshirani, link
- EE364b (Stanford) by S. Boyd, link

Methods:
- Lecture 1 (08,13.04): Introduction to large-scale optimization
- Lecture 2 (15.04): Background for convex optimization
- Lecture 3 (20.04): Gradient descent
- Lecture 4 (22.04): Subgradient method
- Lecture 5 (27.04): Stochastic subgradient descent 1
- Lecture 6 (29.04): Stochastic subgradient descent 2
- Lecture 7-8 (4-6.05): SVM and KKT conditions
- Lecture 9 (11.05): Pegasos SVM solver
- Lecture 10 (13.05): Proximal gradient descent
- Lecture 11 (18.05): Separable optimization / Compressed sensing
- Lecture 12 (20.05): Compressed sensing 2
- Lecture 13 (27.05): Accelerated First-Order Method
- Lecture 14 (01-08.06): Matrix Factorization
- Lecture 15 (15.06): ADMM 1
- Lecture 16 (17.06): ADMM 2
- Lecture 17 (22.06): ADMM 3
- Lecture 18 (24.06): Optimization modeling languages
- Lecture 19 (29.06): Block coordinate descent 1
- Lecture 20 (01.07): Block coordinate descent 2
- Lecture 21 (06.07): Duality 1
- Lecture 22 (08.07): Duality 2
- Lecture 23 (13.07): Duality 3
- Lecture 24 (15.07): Duality 4
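One building block behind Lectures 10 and 13 (proximal gradient descent and its accelerated variant) is the soft-thresholding operator, the proximal map of the scaled absolute value. The sketch below applies it to a scalar lasso toy problem; the data and parameter values are invented for illustration.

```python
# Illustrative sketch: the proximal gradient (ISTA) iteration on the
# scalar lasso toy problem
#   minimize 0.5 * (a*x - b)^2 + lam * |x|

def soft(v, t):
    """Soft-thresholding: the prox of t*|.| evaluated at v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

a, b, lam = 2.0, 2.0, 0.5
step = 1.0 / (a * a)   # 1/L, where L = a^2 is the Lipschitz constant of the gradient

x = 0.0
for _ in range(100):
    g = a * (a * x - b)                 # gradient of the smooth part
    x = soft(x - step * g, step * lam)  # prox step on the nonsmooth part

# Closed-form check for this toy instance: the minimizer is
# soft(b/a, lam/a^2) = soft(1.0, 0.125) = 0.875.
```

Accelerated variants such as FISTA keep the same prox step but add a momentum term built from the two most recent iterates.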

Read the following papers as homework.

- (discussion begins on 22.04) Nemirovski et al., Robust Stochastic Approximation Approach to Stochastic Programming, SIOPT 2009, Sections 1-2.2 and 4-5 (inclusive)
- (4-6.05) Shalev-Shwartz, Singer, and Srebro, Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, ICML 2007
- (13.05, 27.05-01.06) Beck and Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems (FISTA), SIIMS 2009
- (18-20.05, optional) L1-L2 Optimization in Signal and Image Processing (a nice overview of some of the applications/methods we discuss; the PDF is accessible within TU Dortmund)
- (08-17.06) Boyd et al., Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, FnT ML 2010, Chapters 2 and 3 (ADMM), 7 (distributed optimization), and 8 (ML examples; optional)
- (29.06-08.07) Beck and Tetruashvili, On the Convergence of Block Coordinate Descent Type Methods, SIOPT 2013
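The core of the Pegasos reading is a single stochastic subgradient update. The sketch below runs it on a made-up 1-D toy dataset (no bias term, no optional projection step); the data and the regularization parameter are chosen for illustration only.

```python
import random

# Illustrative sketch of the Pegasos-style stochastic subgradient
# update for the (bias-free, 1-D) SVM objective
#   lam/2 * w^2 + (1/n) * sum_i max(0, 1 - y_i * w * x_i)
# on a made-up toy dataset of (x_i, y_i) pairs.

data = [(2.0, 1), (-2.0, -1), (1.5, 1), (-1.0, -1)]
lam = 0.1
random.seed(0)

w = 0.0
for t in range(1, 1001):
    x, y = random.choice(data)   # sample one training example
    eta = 1.0 / (lam * t)        # decreasing step size from the paper
    if y * w * x < 1:            # hinge term active: use its subgradient
        w = (1.0 - eta * lam) * w + eta * y * x
    else:                        # only the regularizer contributes
        w = (1.0 - eta * lam) * w
```

After training, w is positive, so every point in this linearly separable toy set is classified correctly; the full algorithm in the paper additionally projects w onto a ball of radius 1/sqrt(lam).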

- Proposal (due by 17.07): a successful proposal is required to qualify for the final exam. Check the requirements discussed on 14.01: pdf
- Presentation (18.09): oral presentation of a mini-project (20 min for a 1-person group, 30 min for a 2-person group)
- Final report (due by 24.09): up to 7 pages (extra pages allowed only for large tables and plots)
- The final grade will be based on the presentation (50%) and the final report (50%).
- LaTeX template for the proposal and the final report: link

There is no restriction on the choice of topic, but it should be motivating for the student and fit the time limits above. You can focus on either a method or an application, but you are not restricted to these options.

- Method focus: e.g., modify a method to improve its speed on a certain problem. You need to implement the method; adapting existing code is allowed. Benchmarking several algorithms on one problem setting is also possible.
- Application focus: e.g., apply an optimization method to your own research problem. You do not need to implement the method yourself if you can find off-the-shelf software.

These data sets can be useful for your project.

- Collections of "standard" data sets:
- Computer vision and image datasets:
- Movie rating data sets:
- Bioinformatics:
- ECML/PKDD 2015 Discovery Challenge: link

- Motivation: why is this an interesting problem for you and/or for the audience?
- Understanding: the method(s) of choice should be explained in a compact but precise manner.
- Creativity: is there a good reason behind the choice of method(s), the choice of problem(s), etc.?
- Evaluation: how well is the hypothesis (e.g., that a modification makes an algorithm faster) validated by theoretical analysis or experiments?
- Presentation: how well is the material organized and delivered in the presentation?
- Repeatability: does the project provide scripts and executables to reproduce (a part of) the experiments and results?