EECS 545 - Machine Learning

Lecture 3: Convex Optimization

Date: January 13, 2016

Instructor: Jacob Abernethy

In [1]:
from IPython.core.display import HTML, Image
from IPython.display import YouTubeVideo
from sympy import init_printing, Matrix, symbols, Rational
import sympy as sym
from warnings import filterwarnings
init_printing(use_latex = 'mathjax')
filterwarnings('ignore')

%pylab inline

import numpy as np
Populating the interactive namespace from numpy and matplotlib

Some important notes

  • HW1 is out! Due January 25th at 11pm
  • Homework will be submitted via Gradescope. Please see Piazza for precise instructions. Do it soon, not at the last minute!!
  • There is an optional tutorial this evening, 5pm, in Dow 1013. Come see Daniel go over some tough problems.
  • No class on Monday January 18, MLK day!

Python: We recommend Anaconda


  • Anaconda is a standalone Python distribution that includes all the most important scientific packages: numpy, scipy, matplotlib, sympy, sklearn, etc.
  • Easy to install, available for OS X, Windows, Linux.
  • Small warning: it's kind of large (250MB)

Some notes on using Python

  • HW1 has only a very simple programming exercise, just as a warmup. We don't expect you to submit code this time
  • This is a good time to start learning Python basics
  • There are a ton of good places on the web to learn Python; we'll post some
  • Highly recommended: IPython; it's a much more user-friendly terminal interface to Python
  • Even better: Jupyter notebook, a web-based interface. This is how I'm making these slides!

Checking if all is installed, and HelloWorld

If you got everything installed, this should run:

# numpy is crucial for vectors, matrices, etc.
import numpy as np              
# Lots of cool plotting tools with matplotlib
import matplotlib.pyplot as plt
# For later: scipy has a ton of stats tools
import scipy as sp
# For later: sklearn has many standard ML algs
import sklearn
# Here we go!
print("Hello World!")

More on learning python

  • We will have one tutorial devoted to this
  • If you're new to Python, go slow!
    • First learn the basics (lists, dicts, for loops, etc.)
    • Then spend a couple days playing with numpy
    • Then explore matplotlib
    • etc.
  • Piazza = your friend. We have a designated Python instructor (IA Ben Bray) who has lots of answers.

Outline for this Lecture

  • Convexity
    • Convex Set
    • Convex Function
  • Introduction to Optimization
  • Introduction to Lagrange Duality

In this lecture, we will first introduce convex sets, convex functions, and optimization problems. One approach to solving an optimization problem is to solve its dual problem. We will briefly cover the basics of duality in this lecture. More about optimization and duality will come when we study support vector machines (SVMs).

Convexity

Convex Sets

  • $C \subseteq \mathbb{R}^n$ is convex if $$t x + (1-t)y \in C$$ for any $x, y \in C$ and $0 \leq t \leq 1$
  • that is, a set is convex if the line segment connecting any two points in the set lies entirely inside the set (see the numeric check below)
(Left: Convex Set; Right: Non-convex Set)
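For concreteness, here is a small numeric sanity check of the definition on a toy example (mine, not from the lecture): we sample points along the segment between two points of the unit disk, a convex set, and confirm that every one of them stays inside.

import numpy as np

# Two points inside the unit disk
x = np.array([0.6, 0.0])
y = np.array([-0.3, 0.5])
# Sample t in [0, 1] and form the convex combinations t*x + (1-t)*y
t = np.linspace(0, 1, 100)[:, None]
segment = t * x + (1 - t) * y
# Every point on the segment should still lie in the disk
print(np.all(np.linalg.norm(segment, axis=1) <= 1.0))   # True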

Convex Functions

  • We say that a function $f$ is convex if, for any pair of points $x_1, x_2$, we have $$f(tx_1+(1-t)x_2) \leq tf(x_1)+(1-t)f(x_2) \quad \forall t \in[0,1]$$ (a quick numeric check follows this list)
  • A function $f$ is said to be concave if $-f$ is convex
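Here is a quick numeric check of the defining inequality for the convex function $f(x) = x^2$, at randomly drawn pairs of points and random values of $t$ (a toy sketch, not from the lecture):

import numpy as np

f = lambda x: x**2
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 1000))    # random pairs of points
t = rng.uniform(size=1000)             # random t in [0, 1]

lhs = f(t * x1 + (1 - t) * x2)
rhs = t * f(x1) + (1 - t) * f(x2)
# The convexity inequality should hold at every sample
print(np.all(lhs <= rhs + 1e-12))      # True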

Fun Facts About Convex Functions

  • If $f$ is differentiable, then $f$ is convex iff $f$ "lies above its linear approximation", i.e.: $$ f(x + y) \geq f(x) + \nabla_x f(x) \cdot y \quad \forall x,y$$
  • If $f$ is twice-differentiable, then $f$ is convex iff its Hessian $\nabla^2 f(x)$ is positive semi-definite for all $x$!
  • This last one you will show on your homework :-) (a numeric sketch of the Hessian check follows)
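As a minimal numeric illustration of the Hessian criterion (a toy example, not part of the homework): for the convex quadratic $f(x) = x^\top Q x$ with symmetric $Q$, the Hessian is $2Q$, and convexity shows up as nonnegative eigenvalues.

import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
H = 2 * Q                              # Hessian of f(x) = x^T Q x
eigvals = np.linalg.eigvalsh(H)        # eigenvalues of the symmetric H
print(eigvals, np.all(eigvals >= 0))   # all nonnegative => PSD => f convex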

Introduction to Optimization

The Most General Optimization Problem

Assume $f$ is some function, and $C \subset \mathbb{R}^n$ is some set. The following is an optimization problem: $$ \begin{array}{ll} \mbox{minimize} & f(x) \\ \mbox{subject to} & x \in C \end{array} $$

  • How hard is it to find a solution that is (near-) optimal? This is one of the fundamental problems in Computer Science and Operations Research.
  • A huge portion of ML relies on this task

A Rough Optimization Hierarchy

$$ \mbox{minimize } \ f(x) \quad \mbox{subject to } x \in C $$
  • [Really Easy] $C = \mathbb{R}^n$ (i.e. the problem is unconstrained), $f$ differentiable and strictly convex, with "slowly-changing" gradients
  • [Easyish] $C = \mathbb{R}^n$, $f$ is convex
  • [Medium] $C$ is a convex set, $f$ is convex
  • [Hard] $C$ is a convex set, $f$ is non-convex
  • [REALLY Hard] $C$ is an arbitrary set, $f$ is non-convex

Optimization Without Constraints

$$ \begin{array}{ll} \mbox{minimize} & f(x) \\ \mbox{subject to} & x \in \mathbb{R}^n \end{array} $$
  • This problem tends to be easier than constrained optimization
  • We just need to find an $x$ such that $\nabla f(x) = \vec{0}$
  • Techniques like gradient descent or Newton's method work in this setting; a small sketch follows. (More on this later)
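Here is a toy gradient-descent sketch (mine, not from the lecture) for an unconstrained convex problem: minimize $f(x) = \|Ax - b\|^2$, whose gradient is $2A^\top(Ax - b)$, with a small fixed step size.

import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, 1.0])

def grad(x):
    # Gradient of f(x) = ||Ax - b||^2
    return 2 * A.T @ (A @ x - b)

x = np.zeros(2)
step = 0.1                  # fixed step size, small enough for this A
for _ in range(200):
    x = x - step * grad(x)

print(x)                    # approaches the minimizer [0.5, 1.0]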

Optimization With Constraints

$$ \begin{aligned} & {\text{minimize}} & & f(\mathbf{x})\\ & \text{subject to} & & g_i(\mathbf{x}) \leq 0, \quad i = 1, ..., m\\ & & & h_j(x) = 0, \quad j = 1, ..., n \end{aligned} $$
  • Here $C = \{ x : g_i(x) \leq 0,\ h_j(x) = 0, \ i=1, \ldots, m,\ j = 1, ..., n \}$
  • $C$ is convex as long as all $g_i(x)$ convex and all $h_j(x)$ affine
  • The solution of this optimization may occur in the interior of $C$, in which case the optimal $x$ will have $\nabla f(x) = 0$
  • But what if the solution occurs on the boundary of $C$? (The sketch below shows such a case.)
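Here is a small sketch (a toy problem, not from the lecture) using scipy's general-purpose constrained solver, on a case where the optimum lands on the boundary. Note scipy's 'ineq' convention is fun(x) >= 0, so a constraint $g(x) \leq 0$ is passed as $-g(x) \geq 0$.

import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2)**2 + (x[1] - 2)**2   # convex objective
g = lambda x: x[0] + x[1] - 1                 # constraint g(x) <= 0

res = minimize(f, x0=np.zeros(2), method='SLSQP',
               constraints=[{'type': 'ineq', 'fun': lambda x: -g(x)}])
# The unconstrained minimizer (2, 2) is infeasible, so the solution
# lies on the boundary x0 + x1 = 1, near (0.5, 0.5)
print(res.x)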

Introduction to Lagrange Duality

  • In some cases the original (primal) optimization problem can be hard to solve; solving a proxy problem can sometimes be easier
  • The proxy can be the dual problem, which is obtained by transforming the primal problem
  • Here is how to transform from primal to dual. For the primal problem $$ \begin{aligned} & {\text{minimize}} & & f(\mathbf{x})\\ & \text{subject to} & & g_i(\mathbf{x}) \leq 0, \quad i = 1, ..., m\\ & & & h_j(x) = 0, \quad j = 1, ..., n \end{aligned} $$ its Lagrangian is $$L(x,\boldsymbol{\lambda}, \boldsymbol{\nu}) := f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^n \nu_j h_j(x)$$ where $\boldsymbol{\lambda} \in \mathbb{R}^m$, $\boldsymbol{\nu} \in \mathbb{R}^n$ are the dual variables

  • The Lagrangian dual function is $$L_D(\boldsymbol{\lambda}, \boldsymbol{\nu}) \triangleq \underset{x}{\inf}L(x,\boldsymbol{\lambda}, \boldsymbol{\nu}) = \underset{x}{\inf} \ \left[ f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^n \nu_j h_j(x) \right] $$ The minimization is usually done by finding the stationary point of $L(x,\boldsymbol{\lambda}, \boldsymbol{\nu})$ with respect to $x$ (a worked toy example follows)
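As a worked toy example (mine, not from the lecture): minimize $f(x) = x^2$ subject to $g(x) = 1 - x \leq 0$, so $p^* = 1$ at $x = 1$. sympy can carry out the stationary-point computation symbolically:

import sympy as sym

x, lam = sym.symbols('x lambda', real=True)
f = x**2
g = 1 - x                                  # constraint g(x) <= 0, i.e. x >= 1

L = f + lam * g                            # Lagrangian L(x, lambda)
x_star = sym.solve(sym.diff(L, x), x)[0]   # stationary point: x = lambda/2
L_D = sym.simplify(L.subs(x, x_star))      # dual function
print(L_D)                                 # -> -lambda**2/4 + lambda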

The Lagrange Dual Problem

  • Then the dual problem is $$ \begin{aligned} & {\text{maximize}} & & L_D(\boldsymbol{\lambda}, \boldsymbol{\nu})\\ & \text{subject to} & & \lambda_i \geq 0, \quad i = 1, \ldots, m\\ \end{aligned} $$ (the $\nu_j$, which multiply the equality constraints, are unconstrained). Instead of solving the primal problem with respect to $x$, we now solve the dual problem with respect to $\boldsymbol{\lambda}$ and $\boldsymbol{\nu}$
  • $L_D(\boldsymbol{\lambda}, \boldsymbol{\nu})$ is concave even if the primal problem is not convex
  • Let $p^*$ and $d^*$ denote the optimal values of the primal and dual problems; we always have weak duality: $p^* \geq d^*$
  • Under nice conditions (e.g., Slater's condition for a convex primal), we get strong duality: $p^* = d^*$
  • Many details are omitted here; they will come when we study support vector machines (SVMs). (A quick strong-duality check on the toy example follows.)
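Continuing the toy example above, we can maximize the dual $L_D(\lambda) = \lambda - \lambda^2/4$ over $\lambda \geq 0$ and check that $d^*$ matches $p^* = 1$:

import sympy as sym

lam = sym.symbols('lambda', real=True)
L_D = lam - lam**2 / 4                             # dual function from above
lam_star = sym.solve(sym.diff(L_D, lam), lam)[0]   # lambda* = 2, feasible (>= 0)
d_star = L_D.subs(lam, lam_star)
print(lam_star, d_star)                            # d* = 1 = p*: strong duality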
Recommended reading: Boyd and Vandenberghe, Convex Optimization

  • Free online!
  • Chapter 5 covers duality