%matplotlib inline
from IPython.display import HTML, Image, YouTubeVideo
from sympy import init_printing, Matrix, symbols, Rational
import sympy as sym
from warnings import filterwarnings
init_printing(use_latex = 'mathjax')
import numpy as np
from notebook.services.config import ConfigManager
cm = ConfigManager()
cm.update('livereveal', {
'theme': 'simple',
'start_slideshow_at': 'selected',
})
import warnings
warnings.simplefilter("ignore")
# numpy is crucial for vectors, matrices, etc.
import numpy as np
# Lots of cool plotting tools with matplotlib
import matplotlib.pyplot as plt
# For later: scipy has a ton of stats tools
import scipy as sp
# For later: sklearn has many standard ML algs
import sklearn
# Here we go!
print("Hello World!")
In this lecture, we first introduce convex sets, convex functions, and optimization problems. One approach to solving an optimization problem is to solve its dual problem instead. We briefly cover the basics of duality here; more on optimization and duality will come when we study support vector machines (SVMs).
In some cases the original (primal) optimization problem is hard to solve, and solving a proxy problem can be easier.
One such proxy is the dual problem, which is derived from the primal problem.
Here is how to transform the primal problem into its dual. For the primal problem $$ \begin{aligned} & \text{minimize} & & f(\mathbf{x})\\ & \text{subject to} & & g_i(\mathbf{x}) \leq 0, \quad i = 1, \dots, m\\ & & & h_j(\mathbf{x}) = 0, \quad j = 1, \dots, n \end{aligned} $$ the Lagrangian is $$ L(\mathbf{x},\boldsymbol{\lambda}, \boldsymbol{\nu}) := f(\mathbf{x}) + \sum_{i=1}^m \lambda_i g_i(\mathbf{x}) + \sum_{j=1}^n \nu_j h_j(\mathbf{x}) $$ where $\boldsymbol{\lambda} \in \mathbb{R}^m$ and $\boldsymbol{\nu} \in \mathbb{R}^n$ are the dual variables (Lagrange multipliers).
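To make the definition concrete, the following is a minimal sketch that assembles a Lagrangian symbolically with sympy. The helper `build_lagrangian` and the toy problem (one inequality, one equality constraint) are illustrative assumptions, not part of the lecture.

```python
import sympy as sym

def build_lagrangian(f, ineqs, eqs, lams, nus):
    """Return f + sum_i lam_i * g_i + sum_j nu_j * h_j as a sympy expression.

    Hypothetical helper for illustration only.
    """
    return (f
            + sum(lam * g for lam, g in zip(lams, ineqs))
            + sum(nu * h for nu, h in zip(nus, eqs)))

# Assumed toy problem:
#   minimize   x1^2 + x2^2
#   subject to 1 - x1 - x2 <= 0   (inequality g)
#              x1 - x2      = 0   (equality h)
x1, x2, lam, nu = sym.symbols("x1 x2 lambda nu", real=True)
f = x1**2 + x2**2
g = 1 - x1 - x2
h = x1 - x2

lagrangian = build_lagrangian(f, [g], [h], [lam], [nu])
print(lagrangian)

# At a feasible point with lam >= 0, the Lagrangian does not exceed f,
# because lam * g <= 0 and nu * h = 0 there.
point = {x1: 1, x2: 1, lam: 2, nu: 5}
lagrangian_value = lagrangian.subs(point)
objective_value = f.subs(point)
print(lagrangian_value, "<=", objective_value)
```

The helper mirrors the two sums in the formula one-to-one, so adding more constraints only means extending the lists passed in.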
The Lagrangian dual function is $$ L_D(\boldsymbol{\lambda}, \boldsymbol{\nu}) := \inf_{\mathbf{x}} L(\mathbf{x},\boldsymbol{\lambda}, \boldsymbol{\nu}) = \inf_{\mathbf{x}} \ \left[ f(\mathbf{x}) + \sum_{i=1}^m \lambda_i g_i(\mathbf{x}) + \sum_{j=1}^n \nu_j h_j(\mathbf{x}) \right] $$
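As a rough numerical illustration of the dual function, the sketch below evaluates the infimum over $\mathbf{x}$ with `scipy.optimize.minimize` for a few fixed multipliers. The toy problem (minimize $x_1^2 + x_2^2$ subject to $1 - x_1 - x_2 \leq 0$) and the function names are assumptions made for this example. Its primal optimum is $0.5$ at $\mathbf{x} = (0.5, 0.5)$, and for $\lambda \geq 0$ the dual function should never exceed that value.

```python
import numpy as np
from scipy.optimize import minimize

def lagrangian(x, lam):
    """Lagrangian of the assumed toy problem for a fixed multiplier lam."""
    f = x[0]**2 + x[1]**2      # objective
    g = 1.0 - x[0] - x[1]      # inequality constraint g(x) <= 0
    return f + lam * g

def dual_function(lam):
    """Approximate inf_x L(x, lam) by unconstrained numerical minimization."""
    result = minimize(lagrangian, x0=np.zeros(2), args=(lam,))
    return result.fun

primal_optimum = 0.5  # attained at x = (0.5, 0.5) for this toy problem

lams = np.linspace(0.0, 2.0, 5)
dual_values = np.array([dual_function(lam) for lam in lams])

for lam, value in zip(lams, dual_values):
    print(f"lambda = {lam:.1f}   L_D(lambda) = {value:.4f}")
```

Numerical minimization only approximates the infimum, and it works here because the Lagrangian is a smooth convex function of $\mathbf{x}$; for other problems the infimum may be $-\infty$ or the solver may stop at a local minimum.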
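The infimum is usually computed by finding a stationary point of $L(\mathbf{x},\boldsymbol{\lambda}, \boldsymbol{\nu})$ with respect to $\mathbf{x}$, that is, by solving $\nabla_{\mathbf{x}} L = 0$; this yields the minimizer when $L$ is convex and differentiable in $\mathbf{x}$.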
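The stationary-point recipe can be sketched symbolically with sympy. The toy problem below (minimize $x_1^2 + x_2^2$ subject to $1 - x_1 - x_2 \leq 0$) is an assumption chosen for illustration; because its Lagrangian is convex in $\mathbf{x}$, the stationary point is the minimizer.

```python
import sympy as sym

x1, x2 = sym.symbols("x1 x2", real=True)
lam = sym.symbols("lambda", nonnegative=True)

# Assumed toy problem: minimize x1^2 + x2^2 subject to 1 - x1 - x2 <= 0
f = x1**2 + x2**2
g = 1 - x1 - x2
L = f + lam * g

# Stationarity: set the gradient of L with respect to x to zero and solve for x
gradient = [sym.diff(L, var) for var in (x1, x2)]
stationary = sym.solve(gradient, [x1, x2], dict=True)[0]
print(stationary)

# Substituting the stationary point back into L gives the dual function of lambda
dual = sym.expand(L.subs(stationary))
print(dual)
```

Here the stationary point depends on $\lambda$, so substituting it back eliminates $\mathbf{x}$ and leaves a function of the dual variable alone.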