from __future__ import division
%pylab inline
%load_ext autoreload
%autoreload 2
Populating the interactive namespace from numpy and matplotlib
$\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\pa}[1]{\left(#1\right)}$
We consider the problem of finding a minimizer of a convex smooth function $\;f : \RR^d \rightarrow \RR$; that is, we want to solve $$\umin{x \in \RR^d} f(x)$$
Note that the minimizer is not necessarily unique.
The simplest method is gradient descent, which iteratively computes $$ x^{(k+1)} = x^{(k)} - \tau \nabla f(x^{(k)}), $$ where $\nabla f(x) \in \RR^d$ is the gradient of $f$ at $x$, $x^{(0)} \in \RR^d$ is an arbitrary initial point, and the stepsize $\tau$ must satisfy $$0<\tau<2/\beta$$ to guarantee convergence, where $\beta$ is a Lipschitz constant of $\nabla f$; that is, $$ \forall (x,x') \in \RR^d \times \RR^d, \quad \norm{\nabla f(x) - \nabla f(x')} \leq \beta \norm{x-x'}. $$
For instance, if $f$ is of class $C^2$, $$\beta= \sup_x \norm{Hf(x)},$$ where $Hf(x) \in \RR^{d \times d}$ is the Hessian of $f$ at $x$ and $\norm{\cdot}$ is the spectral operator norm (largest eigenvalue).
We consider a simple problem, corresponding to the minimization of a 2-D ($d=2$) quadratic form $$ f(x) = \frac{1}{2} \pa{ x_1^2 + \eta x_2^2 } ,$$ where $\eta>0$ controls the anisotropy, and hence the difficulty, of the problem.
We can note that minimizing a strongly convex quadratic function is equivalent to solving a positive definite linear system. Gradient descent is then equivalent to the Richardson iteration applied to this linear system.
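As a small illustration of this equivalence (a sketch using the pylab namespace loaded above, with a hypothetical matrix A and right-hand side b not used elsewhere in this tour): for $f(x) = \frac{1}{2} x^\top A x - \dotp{b}{x}$ with $A$ symmetric positive definite, $\nabla f(x) = Ax - b$, so the gradient step $x^{(k+1)} = x^{(k)} - \tau(Ax^{(k)} - b)$ is exactly the Richardson iteration for the linear system $Ax = b$.
A = array([[1., 0.], [0., 8.]])     # hypothetical symmetric positive definite matrix
b = array([1., 2.])                 # hypothetical right-hand side
tau_A = 1.8 / norm(A, 2)            # 0 < tau_A < 2/beta, with beta = ||A|| the spectral norm
xk = zeros(2)
for k in range(200):
    xk = xk - tau_A * (A.dot(xk) - b)   # gradient step = Richardson iteration
print(xk)                           # close to the solution of A x = b
print(solve(A, b))                  # exact solution, here [1, 0.25]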
We define the anisotropy parameter $\eta$:
eta = 8
We define the function $f$:
f = lambda x : (x[0]**2 + eta*x[1]**2) / 2
Note: a 'lambda' function is a one-line function definition;
in Matlab, this would be f=@(x)(x(1)^2+eta*x(2)^2)/2;
An example of function evaluation:
f([2,3])
38.0
We display the function using a contourplot:
figsize(6,6)
t = linspace(-.7,.7,101)
[u,v] = meshgrid(t,t)
F = (u**2 + eta*v**2) / 2
contourf(t,t,F,35)
<matplotlib.contour.QuadContourSet instance at 0x10853c9e0>
We define the gradient of $f$:
Grad_f = lambda x : array([x[0], eta*x[1]])
An example of evaluation:
Grad_f([1,2])
array([ 1, 16])
Since $f$ is quadratic, its Hessian is the constant matrix $\left(\begin{array}{cc}1&0\\0&\eta\end{array}\right)$. Its spectral norm, which is the Lipschitz constant of Grad_f, is $\beta=\max(1,\eta)$.
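As a quick numerical check (a small sketch using the pylab namespace), we can compute this spectral norm directly:
H = array([[1., 0.], [0., eta]])   # constant Hessian of f
norm(H, 2)                         # spectral norm, equal to max(1, eta) = 8.0 here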
The stepsize $\tau$ must satisfy $0< \tau < 2/\beta$.
tau = 1.8/max(eta,1); tau
0.225
Now we implement the gradient descent method: given the initial estimate $x^{(0)}$ of the solution, the stepsize $\tau$ and the number $k$ of iterations, we compute $x^{(k)}$:
nbiter = 10
x = [1,1] # initial estimate of the solution
for iter in range(nbiter): # iter goes from 0 to nbiter-1
    x = x - tau*Grad_f(x)
x # displays x, like in Matlab. Only the last expression of a cell is displayed automatically; use print(x) otherwise
array([ 0.07816584, 0.10737418])
Note: there is no 'end' keyword for loops in Python; instead, the body of the loop is delimited by indentation: here, four blank spaces before x = ...
f(_) # _ holds the last displayed output, like Matlab's ans
0.049171809817584886
We encapsulate the code in a function called GradDescent.
def GradDescent(Grad_f, x0, nbiter, tau):
    x = x0
    for iter in range(nbiter):  # iter goes from 0 to nbiter-1
        x = x - tau*Grad_f(x)   # x has type 'array'. Like in C, one can also write x -= tau*Grad_f(x)
    return x
GradDescent(Grad_f,[1,1],10,tau)
array([ 0.07816584, 0.10737418])
We define a function called GradDescentArray which stores the iterates in a 'matrix' (a 2-D array in fact); they are first put in a list and the list is converted to an array at the end (the + operation on lists concatenates them, whereas on arrays it is the classical elementwise addition; see the small illustration after the definition below).
def GradDescentArray(Grad_f, x0, nbiter, tau):
    x = x0
    xlist = [list(x0)]  # the cast to list is just in case x0 is an array
    for iter in range(nbiter):
        x = x - tau*Grad_f(x)
        xlist = xlist + [list(x)]
    return array(xlist)
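A tiny illustration of the difference between + on lists and on arrays mentioned above:
print([1, 2] + [3, 4])                   # lists: concatenation, gives [1, 2, 3, 4]
print(array([1, 2]) + array([3, 4]))     # arrays: elementwise addition, gives [4 6]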
xarray=GradDescentArray(Grad_f,[0.6,0.6],10,0.225)
xarray
array([[ 0.6       ,  0.6       ],
       [ 0.465     , -0.48      ],
       [ 0.360375  ,  0.384     ],
       [ 0.27929063, -0.3072    ],
       [ 0.21645023,  0.24576   ],
       [ 0.16774893, -0.196608  ],
       [ 0.13000542,  0.1572864 ],
       [ 0.1007542 , -0.12582912],
       [ 0.07808451,  0.1006633 ],
       [ 0.06051549, -0.08053064],
       [ 0.04689951,  0.06442451]])
We can access the elements of an array like in Matlab. Be careful: like in C, the first element has index 0! Some examples:
xarray[1,0]
0.46499999999999997
xarray[1,:]
array([ 0.465, -0.48 ])
xarray[:,0]
array([ 0.6 , 0.465 , 0.360375 , 0.27929063, 0.21645023, 0.16774893, 0.13000542, 0.1007542 , 0.07808451, 0.06051549, 0.04689951])
We can apply the function $f$ to every iterate $x^{(k)}$ stored in the array xarray, using the command map:
map(f,xarray)
[1.6199999999999999, 1.0297125000000003, 0.65475907031250036, 0.41648898660644562, 0.26501726238049639, 0.16868867468928564, 0.10740675137733219, 0.06840757437694138, 0.043580991731946392, 0.027771796276949725, 0.017701851534330564]
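Note: this notebook runs under Python 2, where map returns a list; under Python 3, map returns an iterator, so one would wrap it, for instance:
list(map(f, xarray))   # works in both Python 2 and Python 3
# or, equivalently, use the list comprehension [f(x) for x in xarray]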
We plot the cost function $f(x^{(k)})$ as a function of $k$, in log-scale:
plot(range(len(xarray)),log10(map(f,xarray)),'o-')
[<matplotlib.lines.Line2D at 0x1086b7b50>]
We remark that because the function $f$ is strongly convex, the convergence is linear. Also, we can note that gradient descent is monotonic: $f(x^{(k)})$ is decreased at every iteration.
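We can check the linear rate numerically: for this quadratic, the ratio $f(x^{(k+1)})/f(x^{(k)})$ tends to $\max(|1-\tau|,|1-\tau\eta|)^2 = 0.8^2 = 0.64$. A quick sketch, reusing xarray and f defined above:
fvals = array(map(f, xarray))   # in Python 3, use array(list(map(f, xarray)))
print(fvals[1:] / fvals[:-1])   # successive ratios, approaching max(|1-tau|, |1-tau*eta|)**2 = 0.64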
We plot the iterates above the contourplot of $f$:
figsize(8,8)
contourf(t,t,F,35)
xarray=GradDescentArray(Grad_f,[0.6,0.6],30,0.225)
plot(xarray[:,0], xarray[:,1], 'w.-')
[<matplotlib.lines.Line2D at 0x1062a1b90>]
Local differential operators like the discrete gradient are fundamental for variational image processing.
We load the image Lena into the array $x^\sharp$ (we choose the symbol $\sharp$, 'sharp', because the image is clean, not degraded).
from scipy import misc
xsharp = misc.lena()
print(xsharp.shape) # like Matlab's size(xsharp). Given as a tuple.
print("The size of the image is %s x %s." % (xsharp.shape[0],xsharp.shape[1]))
print("The range of the pixel values is [%s,%s]." % (xsharp.min(),xsharp.max()))
xsharp = xsharp.astype(float32) # like Matlab's xsharp=double(xsharp) so that the pixel values are floating point numbers
(512, 512)
The size of the image is 512 x 512.
The range of the pixel values is [25,245].
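Note: misc.lena has been removed from recent SciPy releases (in the newest versions the built-in test images live in scipy.datasets). If the call above fails, a possible fallback, depending on your SciPy version, is the 512 x 512 grayscale test image ascent:
# xsharp = misc.ascent().astype(float32)   # fallback test image if misc.lena is unavailable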
We display the image.
figsize(11,11)
imshow(xsharp, interpolation='nearest', cmap=cm.gray, vmin=0, vmax=255)
# Without specifying vmin and vmax, imshow auto-adjusts its range so that black and white are
# the min and max of the data, respectively, like Matlab's imagesc.
colorbar() # displays the color bar close to the image
#axis('off') # uncomment to remove the axes
subplots_adjust(top=0.75)
title('This is Lena')
<matplotlib.text.Text at 0x10958b1d0>