This Jupyter notebook is part of a collection of notebooks on various topics of Data-Driven Signal Processing. Please direct questions and suggestions to Sascha.Spors@uni-rostock.de.
This notebook illustrates the gradient descent algorithm using the example of finding the global minimum of a convex bivariate function.
import numpy as np
import matplotlib.pyplot as plt
The Jupyter interactive widgets framework ipympl
provides means for interactive visualization of the two-dimensional function. This may require installing the ipympl
package, e.g. using conda or pip. Uncomment the following magic command if the package is installed and you want to use the interactive plotting features.
# %matplotlib widget
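If ipympl is missing, it can typically be installed from within the notebook itself. The command below is only a sketch assuming a pip-based environment; adapt it to conda if needed.

# uncomment to install ipympl (assumes a pip-based environment)
# !pip install ipympl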
Let's define a convex bivariate polynomial function and its gradient.
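Concretely, the function implemented below and its gradient read

$$f(\mathbf{\theta}) = (\theta_0 - 2)^2 + 2 \theta_1^2 + 1, \qquad \nabla f(\mathbf{\theta}) = \begin{pmatrix} 2 (\theta_0 - 2) \\ 4 \theta_1 \end{pmatrix},$$

so the unique global minimum is located at $\mathbf{\theta} = (2, 0)^\mathrm{T}$, where $f(\mathbf{\theta}) = 1$.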
def function(theta):
    # evaluate f(theta); indexing along the first axis keeps this working
    # for single points as well as for stacked grids of parameter values
    return (theta[0] - 2)**2 + 2 * theta[1]**2 + 1


def gradient(theta):
    # gradient of f with respect to theta
    return np.array((2*theta[0] - 4, 4*theta[1]))
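As a quick sanity check (not part of the original example), both functions can be evaluated at the analytic minimum:

# the gradient should vanish at theta = (2, 0), where f attains its minimum of 1
theta_star = np.array([2.0, 0.0])
print(function(theta_star), gradient(theta_star))  # -> 1.0 [0. 0.]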
For the sake of illustration, the function is plotted.
def plot_function():
    theta1 = np.linspace(0, 4, 100)
    theta2 = np.linspace(-2, 2, 100)
    T1, T2 = np.meshgrid(theta1, theta2)
    Z = function(np.array((T1, T2)))

    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1, projection='3d')
    ax.plot_surface(T1, T2, Z, antialiased=True,
                    alpha=.5, zorder=-10, color='C0')
    ax.plot_wireframe(T1, T2, Z, rstride=5, cstride=5,
                      color='black', linewidth=.2)
    ax.set_xlabel(r'$\theta_0$')
    ax.set_ylabel(r'$\theta_1$')
    ax.set_zlabel(r'$f(\mathbf{\theta})$')
    ax.view_init(elev=51, azim=-57)
    return ax, T1, T2
plot_function();
First, a function is defined which implements the gradient descent iteration.
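Starting from an initial guess $\mathbf{\theta}_0$, the algorithm repeatedly applies the update rule

$$\mathbf{\theta}_{k+1} = \mathbf{\theta}_k - \eta \, \nabla f(\mathbf{\theta}_k),$$

where $\eta$ denotes the learning rate (`lr` in the code). The iteration stops when the norm of the update falls below the tolerance `tol`, or after at most `max_iter` iterations.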
def gradient_descent(gradient, theta_0, lr=0.01, max_iter=100, tol=1e-6):
    theta_min = theta_0.copy()
    history = [theta_0.copy()]
    for _ in range(max_iter):
        # step in the direction of steepest descent, scaled by the learning rate
        delta = -lr * gradient(theta_min)
        # stop once the update becomes negligible
        if np.linalg.norm(delta) <= tol:
            break
        theta_min += delta
        history.append(theta_min.copy())
    return np.array(history)
history = gradient_descent(gradient, np.array([0.5, 2.0]), lr=0.2, max_iter=1000, tol=1e-6)
print('Estimated (local) minimum is at {}'.format(history[-1]))
Estimated (local) minimum is at [1.99999846e+00 2.68435456e-19]
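The learning rate trades off speed against stability. For this quadratic function, the Hessian is constant with eigenvalues 2 and 4, so by the standard step-size bound for quadratic objectives, the iteration contracts only for learning rates below $2/4 = 0.5$. The following sketch (reusing the functions defined above) illustrates this behavior:

# compare convergence for different learning rates; at lr = 0.5 the
# theta_1 component no longer contracts but oscillates indefinitely
for lr in (0.05, 0.2, 0.45, 0.5):
    h = gradient_descent(gradient, np.array([0.5, 2.0]), lr=lr, max_iter=1000)
    print(f'lr = {lr}: {len(h) - 1} iterations, final theta = {h[-1]}')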
Let's illustrate the convergence of the algorithm by plotting the results of the individual iterations in parameter space, together with the function values.
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
theta1 = np.linspace(0, 4, 100)
theta2 = np.linspace(-2, 2, 100)
T1, T2 = np.meshgrid(theta1, theta2)
Z = function(np.array((T1, T2)))
cf_ = ax.contourf(T1, T2, Z, 20)
fig.colorbar(cf_, fraction=0.046, pad=0.04)
ax.scatter(history[:, 0], history[:, 1], c='C3', marker='o')
ax.plot(history[:, 0], history[:, 1], color='C3')
ax.set_xlabel(r'$\theta_0$')
ax.set_ylabel(r'$\theta_1$')
ax.set_aspect('equal')
As an alternative visualization of the convergence, the results of the individual iterations are plotted onto the function itself.
ax, T1, T2 = plot_function()
ax.plot(history[:, 0], history[:, 1], function(history.T),
        color='C3', zorder=0, alpha=1)
ax.scatter(history[:, 0], history[:, 1], function(history.T),
           c='C3', marker='o', alpha=1, zorder=0)
ax.set_xlabel(r'$\theta_0$')
ax.set_ylabel(r'$\theta_1$')
ax.set_zlabel(r'$f(\mathbf{\theta})$')
ax.view_init(elev=51, azim=-57)
Copyright
This notebook is provided as an Open Educational Resource. The text is licensed under Creative Commons Attribution 4.0, the code of the IPython examples under the MIT license. Please attribute the work as follows: Sascha Spors, Data driven audio signal processing - Lecture supplementals.