Total: 25 pts
Start date: Friday Sept. 13
Due date: Tuesday Sept. 24
In this homework, you will learn to code one of the building blocks of many learning algorithms. When one wants to fit a model $f_{\beta}$ (with parameters $\beta$) to a training dataset $\mathcal{D}_T$, as we saw in regression, we usually solve an optimization problem of the form
\begin{align} \min_{\beta} \sum_{i\in \mathcal{D}_T} \left|f_{\beta}(x_i) - t_i\right|^2 \end{align}When the learning model is relatively simple, such as in linear regression, an explicit solution can be computed in closed form, as we saw this week. However, for most models (and particularly for the more interesting ones), such an explicit expression does not exist or is not straightforward, and we thus need an alternative.
The gradient descent algorithm takes any loss function $\ell(\beta; x, t)$ for which one wants to find a local minimum, i.e.
\begin{align} \min_{\beta} \ell(\beta; x, t) \end{align}and iterates the following step
\begin{align} \beta^{t+1} \leftarrow \beta^{t} - \alpha\, \nabla_{\beta}\, \ell(\beta^t; x, t) \end{align}for a learning rate $\alpha$.
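As a concrete illustration of the update rule (not part of the assignment), here is a minimal sketch of gradient descent on the one-dimensional quadratic $\ell(\beta) = (\beta - 3)^2$, whose gradient $2(\beta - 3)$ is known in closed form:

```python
# Gradient descent on l(beta) = (beta - 3)^2, whose gradient is
# 2 * (beta - 3). The unique minimum is at beta = 3.
def grad(beta):
    return 2.0 * (beta - 3.0)

beta = 0.0    # initial guess
alpha = 0.1   # learning rate
for _ in range(100):
    beta = beta - alpha * grad(beta)  # the update beta <- beta - alpha * grad

print(beta)  # converges very close to 3
```

Each step shrinks the distance to the minimizer by a factor $1 - 2\alpha$, so after 100 iterations the iterate is essentially at $\beta = 3$.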
As a starting point, consider the following function, whose plot is given below. You can uncomment the first line of the cell to activate interactive plotting in Jupyter, and use the "plt.ioff() %matplotlib inline" cell below to turn interactive mode off.
$$f(x,y) = 3(1-x)^2 e^{-x^2 - (y+1)^2} - 10\left(\frac{x}{5} - x^3 - y^5\right)e^{-x^2-y^2} - \frac{1}{3} e^{-(x+1)^2 - y^2}$$
#%matplotlib notebook  # uncomment to activate interactive plotting
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-2,2,100)
y = np.linspace(-3,3,100)
xx, yy = np.meshgrid(x, y, indexing='xy')
z = 3*((1-xx)**2) * np.exp(-(xx**2) - (yy+1)**2) \
- 10*(xx/5 - xx**3 - yy**5) * np.exp(-xx**2 - yy**2)- (1/3)* np.exp(-(xx+1)**2 - yy**2)
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(xx, yy, z, cmap=cm.viridis, alpha=.4,
                linewidth=0, antialiased=False)
plt.show()
1a. (10 pts) Code a function 'gradient_descent' that takes as arguments a function 'f(x,y)' of two variables, a learning rate $\alpha$, and a maximum number of iterations $N$, and returns the location $(x,y)$ of a local minimum as well as the value of the function and of the gradient after the $N$ iterations.
(Hint: note that you don't need to know the explicit expression for the derivative of $f(x,y)$; you can simply use the definition of the derivative $$\frac{\partial f}{\partial y} \approx \frac{f(x, y+h) - f(x,y)}{h}$$ for a sufficiently small step size $h$, and similarly for $x$.)
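For instance, the hint's finite-difference approximation of both partial derivatives can be sketched as follows (the helper name `numerical_grad` and the step size `h=1e-5` are illustrative choices, not prescribed by the assignment):

```python
import numpy as np

def numerical_grad(f, x, y, h=1e-5):
    """Forward-difference approximation of the gradient of f at (x, y)."""
    df_dx = (f(x + h, y) - f(x, y)) / h
    df_dy = (f(x, y + h) - f(x, y)) / h
    return np.array([df_dx, df_dy])

# Sanity check on f(x, y) = x^2 + y^2, whose exact gradient is (2x, 2y)
g = numerical_grad(lambda x, y: x**2 + y**2, 1.0, 2.0)
print(g)  # close to [2., 4.]
```

The forward difference has error of order $h$, which is accurate enough here; a centered difference $(f(x+h) - f(x-h))/2h$ would reduce the error to order $h^2$ at the cost of one extra function evaluation per coordinate.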
# First define the function
def my_function(x, y):
    z = 3*((1 - x)**2) * np.exp(-(x**2) - (y + 1)**2) \
        - 10*(x/5 - x**3 - y**5) * np.exp(-x**2 - y**2) \
        - (1/3) * np.exp(-(x + 1)**2 - y**2)
    return z
def gradient_descent(my_function, x0, learning_rate, epsilon, N_iter):
    # put your code here
    pass
1b. (5 pts) Extend your algorithm to a function of $D$ variables, $f(x_1, x_2, \ldots, x_D)$.
# put the code here
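One possible generalization (a sketch only; the names `numerical_grad_nd` and `gradient_descent_nd` are illustrative) represents the argument as a NumPy array of length $D$ and perturbs one coordinate at a time:

```python
import numpy as np

def numerical_grad_nd(f, x, h=1e-5):
    """Forward-difference gradient of f: R^D -> R at the point x (shape (D,))."""
    grad = np.zeros_like(x, dtype=float)
    fx = f(x)
    for d in range(len(x)):
        x_step = x.copy()
        x_step[d] += h          # perturb only coordinate d
        grad[d] = (f(x_step) - fx) / h
    return grad

def gradient_descent_nd(f, x0, learning_rate=0.1, N_iter=100):
    """Plain gradient descent in D dimensions; returns the final iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(N_iter):
        x = x - learning_rate * numerical_grad_nd(f, x)
    return x

# Sanity check: minimize f(x) = sum(x**2), whose minimum is at the origin
x_min = gradient_descent_nd(lambda x: np.sum(x**2), np.array([1.0, -2.0, 3.0]))
print(x_min)  # close to [0, 0, 0]
```

Each iteration costs $D + 1$ function evaluations with the forward difference, which is fine for small $D$ like the problems in this homework.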
1c. (10 pts) Now that you have a gradient descent algorithm, we want to use it on a simple regression problem. Apply your GD iterations to the regression problem below.
Generate and plot the line $t(x) = \beta_0 + \beta_1 x$ for $\beta_0 = 1$ and $\beta_1 = 2$. (use linspace with 100 points between $-5$ and $5$)
Generate $20$ samples along the line (use linspace with 20 points). For each of those samples, generate the data/feature vector $(t_i, x_i, x_i^2)$, where $t_i$ is defined as
$$t_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
and $\varepsilon_i$ is some random Gaussian noise with mean $0$ and variance $3$.
Minimize the regularized least squares loss
$$\text{err}(\lambda) = \sum_{i} \left(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 - t_i\right)^2 + \lambda \left\|\mathbf{\beta}\right\|^2,$$
where $\mathbf{\beta} = (\beta_0, \beta_1, \beta_2)$, with your gradient descent algorithm.
Then plot the values $\text{err}(\lambda)$ as a function of $\lambda$.
import numpy as np
# put your code here