We consider the following function.
\begin{align} F(x_1, x_2) = 3(1-x_1)^2\exp(-(x_1^2) - (x_2+1)^2)\\ - 10(x_1/5 - x_1^3 - x_2^5)\exp(-x_1^2-x_2^2)\\ - (1/3)\exp(-(x_1+1)^2 - x_2^2) \end{align}The surface plot of this function is given below together with its contour plot. The function has a single global minimum located near $(0.23, -1.62)$ and shown in red in the contour plot.
We want to implement gradient descent iterations on that function. Starting from a random initial point $(x_1, x_2)$, code the following updates
\begin{align} x_1^{(k+1)} = x_1^{(k)} - \eta * \text{grad}_{x_1} F(x_1, x_2)\\ x_2^{(k+1)} = x_2^{(k)} - \eta * \text{grad}_{x_2} F(x_1, x_2) \end{align}where $\text{grad}_{x_i}$ represents the gradient of $F(x_1, x_2)$ with respect to $x_i$. Choose a sufficiently small learning rate and plot the iterates (in white) on the contour plot. Repeat your experiments for various initial iterates.
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import numpy as np
fig = plt.figure()
ax = fig.gca(projection='3d')
# Make data.
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
x1, x2 = np.meshgrid(x, y)
F = 3*(1-x1)**2 * np.exp(-(x1**2) - (x2+1)**2)\
- 10*(np.true_divide(x1,5) - x1**3 - x2**5)*np.exp(-x1**2 - x2**2)\
- np.true_divide(1,3)*np.exp(-(x1+1)**2 - x2**2)
# Plot the surface.
surf = ax.plot_surface(x1, x2, F, linewidth=0, alpha=1, cmap = 'viridis')
plt.show()
fig1, ax = plt.subplots(constrained_layout=True)
contour = ax.contourf(x1, x2, F,cmap = 'viridis')
plt.scatter(0.23, -1.62,c='r',marker='X')
plt.show()
# put your solution here
We consider the simple regression problem below, similar to the one discussed in class. Find the model that minimizes the sum of squares loss
\begin{align} \ell(\boldsymbol \beta) = \frac{1}{2N}\sum_{i=1}^N (t_{\text{noisy}}^{(i)} - (\beta_0 +\beta_1 x^{(i)}))^2 \end{align}using the Normal Equations. To do this:
Then compute the matrix $\tilde{\boldsymbol X}^T\tilde{\boldsymbol X}$ and the vector $\tilde{\boldsymbol X}^T\boldsymbol t$ where $\boldsymbol t = [t_{\text{noisy}}^{(1)}, \ldots , t^{(N)}_{\text{noisy}}]^T$
Finally solve the equations through
using the function np.linalg.inv from the linear algebra package. Plot the result in green on top of the plot below and compare with the true (blue) (unknown) model.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0,5,10)
noise = np.random.normal(0,.3,len(x))
beta_true = np.random.normal(0,1,2)
t = beta_true[0] + beta_true[1]*x
tnoisy = t+noise
plt.scatter(x, tnoisy, c='r')
plt.plot(x, t)
plt.show()
# put your code here