| Course (Veranstaltung) | Weekday | Time | Location |
| --- | --- | --- | --- |
| 042531 (Übung: 042532) | Monday (Übung: Wednesday) | 10:15 - 12:00 | Otto-Hahn-Str. 12, R 1.055 (Mon), R 1.056 (Wed), Campus Nord |

- Module description: pdf
- Language: English
- Lecturer: Dr. Sangkyun Lee
- Attendance at the practice sessions (Übung) is REQUIRED.
- Office hour (lecturer): Thursday 13:00-14:00, LS8, OH 12, Room 4.023 (or by appointment)

This course focuses on optimization techniques for solving the large-scale problems that typically appear in statistical learning and data analysis tasks with big data. Topics include widely adopted methods from the modern research literature, such as projected gradient methods, accelerated first-order algorithms, conjugate gradient methods, quasi-Newton methods, block coordinate descent, proximal point methods, stochastic subgradient algorithms, the alternating direction method of multipliers (ADMM), and semidefinite programming. The efficiency of these methods in terms of convergence rate and overall time complexity will be discussed, so that one can see how to choose a suitable method for one's own research problems. Separable (approximate) reformulations of optimization problems, which are better suited for parallel computation than the original formulations, will also be discussed. Discussions in the course will be problem-oriented, so methods will be presented in the context of specific problem instances whenever possible. Homework assignments will be given to provide background knowledge for the course or to check understanding of the techniques.
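To give a flavor of the first-order methods listed above, here is a minimal sketch of projected gradient descent applied to a nonnegative least-squares problem. This is illustrative only, not course material; the step size 1/L and the toy data are my own choices.

```python
import numpy as np

def projected_gradient_nnls(A, b, n_iters=500):
    """Minimize ||Ax - b||^2 / 2 subject to x >= 0 by projected gradient descent."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)           # gradient of the smooth objective
        x = np.maximum(x - grad / L, 0.0)  # gradient step, then projection onto x >= 0
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
x_true = np.array([1.0, 0.0, 2.0, 0.0, 0.5])
b = A @ x_true                             # noiseless measurements, so x_true is optimal
x = projected_gradient_nnls(A, b)
```

Since the constraint set is the nonnegative orthant, the projection step is just a componentwise clipping; this structure is what makes projected gradient methods attractive at scale.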

The aim of this course is to give students an understanding of modern optimization techniques suitable for large-scale/big-data problems, so that they can choose, apply, and/or modify appropriate methods for their own research problems.

- Lectures: we focus on the methods themselves, i.e., understanding the structure of their target problems, the algorithms, and their theoretical convergence properties.
- Practices: we will see real-world examples in which the optimization methods discussed in the lectures are used.

- EE236C (UCLA) by L. Vandenberghe, link
- 10-725 (CMU) by G. Gordon and R. Tibshirani, link
- EE364b (Stanford) by S. Boyd, link

- Lecture 1 (06.10.14): Introduction to large-scale optimization
- Lecture 2 (08.10): Background for convex optimization
- Lecture 3 (13.10): Stochastic subgradient descent (1: gradient descent)
- Lecture 4 (15.10): Stochastic subgradient descent (2: subgradient method)
- Lecture 5 (20.10): Stochastic subgradient descent (3: stochastic subgradient method)
- Lecture 6 (22.10): Stochastic subgradient descent (4: stochastic subgradient method)
- > Reading: Nemirovski et al., Robust Stochastic Approximation Approach to Stochastic Programming, SIOPT 2009, Sections 1-2.2 and 4-5 (inclusive)
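The stochastic subgradient method of Lectures 5-6 fits in a few lines. The following is a Pegasos-style illustration for a linear SVM on synthetic data; the 1/(λt) step-size schedule, the regularization value, and the toy data are illustrative assumptions, not code from the course.

```python
import numpy as np

def stochastic_subgradient_svm(X, y, lam=0.1, n_iters=2000, seed=0):
    """Stochastic subgradient descent for the SVM objective
    (lam/2)*||w||^2 + (1/n) * sum_i max(0, 1 - y_i <w, x_i>)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)              # sample one training example
        step = 1.0 / (lam * t)           # classical O(1/t) step size
        g = lam * w                      # gradient of the regularizer
        if y[i] * (X[i] @ w) < 1.0:      # hinge loss is active at example i
            g = g - y[i] * X[i]          # add a subgradient of the hinge term
        w = w - step * g
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
X[:, 0] += 0.5 * y                       # create a margin so the classes are separable
w = stochastic_subgradient_svm(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
```

Each iteration touches a single example, which is the point of the method: the per-iteration cost is independent of the number of training examples.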

- Lectures 7-8 (27.10 and 29.10): Stochastic subgradient descent for SVMs
- Lecture 9 (3.11): Generalized gradient descent
- Lecture 10 (5.11): SpaRSA, Compressed Sensing
- Lecture 11 (10.11): Random projection, projected gradient descent, matrix completion
- > Reading: SpaRSA: Sparse Reconstruction by Separable Approximation
- > Reading (optional): Astronomical image representation by the curvelet transform (Sections 5.3 and 5.4)
- > Reading (optional): the ISTA part (Section 3) of the FISTA paper below
- > Reading (optional): High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality, D. Donoho, 2000
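The ISTA iteration from the SpaRSA/FISTA readings amounts to a gradient step followed by componentwise soft-thresholding. A minimal sketch for the lasso follows; the regularization value, iteration count, and synthetic data are my own illustrative choices.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, lam, n_iters=1000):
    """ISTA for the lasso: minimize ||Ax - b||^2 / 2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2         # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)  # gradient step, then prox step
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[[0, 3]] = [3.0, -2.0]              # a sparse ground truth
b = A @ x_true
x = ista(A, b, lam=0.5)
```

FISTA adds a momentum term on top of exactly this iteration, improving the convergence rate from O(1/k) to O(1/k^2).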

- Lecture 12 (12.11): Accelerated first-order methods
- Lecture 13 (17.11): Matrix factorizations
- Lecture 14 (19.11): Optimization systems: CVXPY, examples
- Lecture 15 (24.11): Estimation of extreme singular values
- Lecture 16 (26.11): Block coordinate descent (BCGD)
- Lecture 17 (1.12): Block coordinate descent (Accelerated, Alternating optimization)
- Lecture 18 (3.12): Alternating Direction Method of Multipliers (ADMM) 1 (Dual ascent, augmented Lagrangian, ADMM)
- Lecture 19 (8.12): Alternating Direction Method of Multipliers (ADMM) 2 (Consensus and sharing forms)
- Lecture 20 (10.12): Alternating Direction Method of Multipliers (ADMM) 3 (Learning with distributed examples or features)
- > Reading: Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, FnT ML, 2010 Chapters: 2 and 3 (ADMM), 7 (distributed optimization), and 8 (ML examples: optional)
- -- after Christmas --
- Lecture 21 (7.1): Linear Program 1: Simplex Algorithm
- Lecture 22 (12.1): Linear Program 2: Interior-Point Method
- (14.1): Presentation of mini-project ideas (pdf)
- Lecture 23 (19.1): Duality 1: LP and Lagrangian duality
- Lecture 24 (21.1): Duality 2: Duality gap and strong duality
- Lecture 25 (26.1): Duality 3: Conjugation and dual problems
- Lecture 26 (28.1): Duality 4: Dual of Lasso and Fused Lasso
- Lecture 27 (2.2): Conic Program 1
- Lecture 28 (4.2): Conic Program 2
- Intro to Julia
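As an illustration of the ADMM lectures (18-20), the standard lasso splitting min ||Ax-b||^2/2 + λ||z||_1 subject to x = z can be sketched as below. This is a sketch under my own assumptions (penalty parameter ρ, iteration count, toy data), not the course implementation.

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, n_iters=500):
    """ADMM for the lasso, split as min ||Ax-b||^2/2 + lam*||z||_1 s.t. x = z."""
    m, n = A.shape
    AtA, Atb = A.T @ A, A.T @ b
    # Factor once: the x-update solves the same linear system every iteration.
    C = np.linalg.cholesky(AtA + rho * np.eye(n))
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    for _ in range(n_iters):
        # x-update: ridge-like linear system via the cached Cholesky factor
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(C.T, np.linalg.solve(C, rhs))
        # z-update: soft-thresholding (prox of the l1 term)
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # dual update on the scaled multiplier
        u = u + x - z
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [2.0, -1.5]
b = A @ x_true
z = admm_lasso(A, b, lam=0.5)
```

The same split generalizes to the consensus and sharing forms covered in Lecture 19, where the x-update decomposes across machines holding distributed examples or features.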

- Optimality conditions for nonsmooth convex optimization [link]
- Google[x], Project Wing [youtube]
- Class projects from Stanford EE364b 2014 [link]
- L1-L2 Optimization in Signal and Image Processing (This provides a nice overview of some applications/methods we discuss. PDF is accessible within TU Dortmund)
- CVXPY: Python-based modeling language for convex optimization problems link

- > Proposal (due by 05.02): a successful proposal is required to qualify for the final exam. Check the requirements discussed on 14.01 (pdf)
- > Presentation (19-20.03): oral presentation of a mini-project (20 min for a one-person group, 30 min for a two-person group)
- > Final report (due by 25.03): up to 7 pages (extra pages allowed only for large tables and plots)
- The final grade will be based on the presentation and the final report.
- LaTeX template for the proposal and the final report: link

There is no restriction on the choice of topic, but it should be motivating for the student and fit the time limit of the presentation. You can focus on either a method or an application, but you are not restricted to these two options.

- Method focus: e.g., modify a method for better speed on a certain problem. You need to implement the method; adapting existing code is allowed. Benchmarking several algorithms on one problem setting is also possible.
- Application focus: e.g., apply an optimization method to your own research problem. You do not need to implement the method yourself if you can find off-the-shelf software.

These data sets can be useful for your project.

- Collections of "standard" data sets:
- Computer vision and image datasets:
- Movie rating data sets:
- Bioinformatics:

- Motivation: why is this an interesting problem for you and/or the audience?
- Understanding: the method(s) of choice should be explained in a compact but precise manner.
- Creativity: is there a good reason behind the choice of method(s), the choice of problem(s), etc.?
- Evaluation: how well is a hypothesis (e.g., that an algorithm becomes faster with a modification) validated by theoretical analysis or experiments?
- Presentation: how well are things organized and delivered in the presentation?
- Repeatability: does the project provide scripts and executables to reproduce (a part of) the experiments and results?

Some suggestions for mini-projects will be added here.