This tour explores the use of the gradient descent method for unconstrained and constrained optimization of a smooth function.
Important: Please read the installation page for details about how to install the toolboxes. $\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$ $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\qandq}{\quad\text{and}\quad}$ $\newcommand{\qwhereq}{\quad\text{where}\quad}$ $\newcommand{\qifq}{ \quad \text{if} \quad }$ $\newcommand{\ZZ}{\mathbb{Z}}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\pa}[1]{\left(#1\right)}$ $\newcommand{\si}{\sigma}$ $\newcommand{\Nn}{\mathcal{N}}$ $\newcommand{\Hh}{\mathcal{H}}$ $\newcommand{\Bb}{\mathcal{B}}$ $\newcommand{\EE}{\mathbb{E}}$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$ $\newcommand{\al}{\alpha}$ $\newcommand{\la}{\lambda}$ $\newcommand{\ga}{\gamma}$ $\newcommand{\Ga}{\Gamma}$ $\newcommand{\La}{\Lambda}$ $\newcommand{\si}{\sigma}$ $\newcommand{\Si}{\Sigma}$ $\newcommand{\be}{\beta}$ $\newcommand{\de}{\delta}$ $\newcommand{\De}{\Delta}$ $\newcommand{\phi}{\varphi}$ $\newcommand{\th}{\theta}$ $\newcommand{\om}{\omega}$ $\newcommand{\Om}{\Omega}$
using PyPlot
using NtToolBox
We consider the problem of finding a minimum of a function $f$, hence solving $$\umin{x \in \RR^d} f(x)$$ where $f : \RR^d \rightarrow \RR$ is a smooth function.
Note that the minimum is not necessarily unique. In the general case, $f$ might exhibit local minima, in which case the proposed algorithm is not expected to find a global minimizer of the problem. In this tour, we restrict our attention to convex functions, so that the method will converge to a global minimizer.
The simplest method is the gradient descent, that computes $$ x^{(k+1)} = x^{(k)} - \tau_k \nabla f(x^{(k)}), $$ where $\tau_k>0$ is a step size, and $\nabla f(x) \in \RR^d$ is the gradient of $f$ at the point $x$, and $x^{(0)} \in \RR^d$ is any initial point.
In the convex case, if $f$ is of class $C^2$, in order to ensure convergence, the step size should satisfy $$ 0 < \tau_k < \frac{2}{ \sup_x \norm{Hf(x)} } $$ where $Hf(x) \in \RR^{d \times d}$ is the Hessian of $f$ at $x$ and $\norm{\cdot}$ is the spectral operator norm (largest eigenvalue).
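As a minimal sketch of this iteration (the helper name |gradient_descent| and the fixed number of iterations are illustrative assumptions, not part of NtToolBox), the update rule can be written as a simple loop:

# Minimal sketch of the gradient descent iteration (hypothetical helper, not part of NtToolBox):
# repeat x <- x - tau*GradF(x) for a fixed number of steps with constant step size tau.
function gradient_descent(GradF, x0, tau, niter)
    x = copy(x0)
    for k in 1:niter
        x = x - tau*GradF(x)   # gradient step
    end
    return x
end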
We consider a simple problem, corresponding to the minimization of a 2-D quadratic form $$ f(x) = \frac{1}{2} \pa{ x_1^2 + \eta x_2^2 } ,$$ where $\eta>0$ controls the anisotropy, and hence the difficulty, of the problem.
Anisotropy parameter $\eta$.
eta = 10;
Function $f$.
f = x -> ( x[1,1]^2 + eta*x[2, 1]^2 ) / 2;
Background image of the function.
include("NtToolBox/src/ndgrid.jl")
t = linspace(-.7,.7,101)
(u, v) = meshgrid(t,t)
F = ( u .^ 2 + eta .* v .^ 2 ) ./ 2;
Display the function as a 2-D image.
contourf(t, t, F, 35)
Gradient.
GradF = x -> [x[1, 1]; eta*x[2, 1]];
The step size should satisfy $\tau_k < 2/\eta$. We use here a constant step size.
tau = 1.8/eta;
Exercise 1: Perform the gradient descent using a fixed step size $\tau_k=\tau$. Display the decay of the energy $f(x^{(k)})$ through the iterations. Save the iterates so that |X[:,k]| corresponds to $x^{(k)}$.
## Insert your code here.
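A possible solution sketch is given below (the initial point, the number of iterations |niter|, and the display calls are illustrative choices, not prescribed by the tour):

# Sketch of a solution to Exercise 1 (illustrative initialization and iteration count).
niter = 50
x = [.6; .6]                   # illustrative initial point x^(0)
X = zeros(2, niter)            # X[:,k] stores the iterate x^(k)
E = zeros(niter)               # E[k] stores the energy f(x^(k))
for k in 1:niter
    X[:, k] = x
    E[k] = f(x)
    x = x - tau*GradF(x)       # gradient step with fixed step size tau
end
plot(E, "b.-")
xlabel("k")
ylabel(L"f(x^{(k)})")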
Display the iterations.
# Uncomment once X from Exercise 1 has been computed.
#contourf(t, t, F, 35)
#plot(X[1, :], X[2, :], "k.-")
Display the iterations for several different step sizes.
## Insert your code here.
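A possible sketch, overlaying the trajectories obtained with a few illustrative step sizes on the contour plot (the chosen values are only examples):

# Sketch: compare trajectories for several (illustrative) step sizes.
niter = 50
contourf(t, t, F, 35)
for tau_test in [.3/eta, 1/eta, 1.7/eta]   # illustrative step sizes, all below 2/eta
    x = [.6; .6]
    X = zeros(2, niter)
    for k in 1:niter
        X[:, k] = x
        x = x - tau_test*GradF(x)          # gradient step with step size tau_test
    end
    plot(X[1, :], X[2, :], ".-", label = string("tau = ", tau_test))
end
legend()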
Local differential operators like the gradient, divergence and Laplacian are the building blocks of variational image processing.
Load an image $x_0 \in \RR^N$ of $N=n \times n$ pixels.
n = 256
name = "NtToolBox/src/data/lena.png"
x0 = load_image(name, n);
Display it.
imageplot(x0)
For a continuous function $g$, the gradient reads $$\nabla g(s) = \pa{ \pd{g(s)}{s_1}, \pd{g(s)}{s_2} } \in \RR^2.$$ (Note that here, the variable $s$ denotes the 2-D spatial position.)
We discretize this differential operator on a discrete image $x \in \RR^N$ using first-order finite differences: $$(\nabla x)_i = ( x_{i_1,i_2}-x_{i_1-1,i_2}, x_{i_1,i_2}-x_{i_1,i_2-1} ) \in \RR^2.$$ Note that for simplicity we use periodic boundary conditions.
Compute its gradient, using finite differences.
grad = x -> cat(3, x - [x[end, :]'; x[1:end-1, :]], x - [x[:, end] x[:,1:end-1]]);
One thus has $\nabla : \RR^N \rightarrow \RR^{N \times 2}.$
v = grad(x0);
One can display each of its components.
imageplot(v[:,:,1], L"\frac{d}{dx}", (1,2,1))
imageplot(v[:,:,2], L"\frac{d}{dy}", (1,2,2))
One can display its magnitude $\norm{(\nabla x)_i}$, which is large near edges.
imageplot(sqrt(sum(v .* v, 3))[:, :])