In this short notebook, we will see how to use the gradient obtained with Autograd to perform optimization of an objective function.
Then we will also present some off-the-shelf PyTorch optimizers and learning rate schedulers.
As a bit of eye candy, we will finish with some live optimization visualizations.
import sys
import torch
if 'google.colab' in sys.modules:  # Execute if you're using Google Colab
    !wget -q https://raw.githubusercontent.com/theevann/amld-pytorch-workshop/master/live_plot.py -O live_plot.py
    !pip install -q ipympl
%matplotlib ipympl
torch.set_printoptions(precision=3)
We will start with a simple example: minimizing the square function.
def f(x):
    return x ** 2
We will minimize the function $f$ "by hand" using the gradient descent algorithm.
As a reminder, the update step of the algorithm is: $$x_{t+1} = x_{t} - \lambda \nabla_x f (x_t)$$
Note:
- The gradient is stored in `x.grad` once we run the `backward` function.
- Don't forget to reset `x.grad` after each iteration.
- Use the `with torch.no_grad():` context for the update step, since we want to change `x` in place but don't want autograd to track this change.
x0 = 8
lr = 0.01
iterations = 10
x = torch.Tensor([x0]).requires_grad_()
y = f(x)
for i in range(iterations):
    # < YOUR CODE HERE >
    print(y.data)
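If you get stuck, one possible way to complete the loop is sketched below (an illustrative solution using the `x0`, `lr`, `iterations` and `f` defined above; the workshop solution may differ):
x = torch.Tensor([x0]).requires_grad_()
for i in range(iterations):
    y = f(x)                 # forward pass: y = x ** 2
    y.backward()             # compute dy/dx and store it in x.grad
    with torch.no_grad():
        x -= lr * x.grad     # gradient descent update step
    x.grad.zero_()           # reset the gradient before the next iteration
    print(y.data)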
Why do we need `with torch.no_grad()`?
Because `x` "requires grad", any operation we apply to `x` is recorded for automatic differentiation. As we don't want to track the update step of the parameters, we need to "tell" autograd not to track this change. This is done by using the `torch.no_grad()` context.
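As a small standalone illustration of this behaviour (a sketch; the values are arbitrary):
x = torch.tensor([8.0], requires_grad=True)
(x ** 2).backward()      # populate x.grad
# x -= 0.01 * x.grad     # would raise: a leaf Variable that requires grad
#                        # is being used in an in-place operation
with torch.no_grad():
    x -= 0.01 * x.grad   # same update, but not recorded by autograd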
PyTorch provides the most common optimization algorithms encapsulated into "optimizer classes".
An optimizer is an object that automatically loops through all the parameters of your model and performs the (potentially complex) update step for you.
You first need to import `torch.optim`.
import torch.optim as optim
Below are the most commonly used optimizers. Each of them has its own specific parameters, which you can check in the PyTorch documentation.
parameters = [x] # This should be the list of model parameters
optimizer = optim.SGD(parameters, lr=0.01, momentum=0.9)
optimizer = optim.Adam(parameters, lr=0.01)
optimizer = optim.Adadelta(parameters, lr=0.01)
optimizer = optim.Adagrad(parameters, lr=0.01)
optimizer = optim.RMSprop(parameters, lr=0.01)
optimizer = optim.LBFGS(parameters, lr=0.01)
# and there is more ...
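The parameters and hyper-parameters you pass in are stored in `optimizer.param_groups`, which we will also use later to read the current learning rate. A quick sketch (re-using the `x` defined above):
optimizer = optim.SGD([x], lr=0.01, momentum=0.9)
print(optimizer.param_groups[0]['lr'])        # 0.01
print(optimizer.param_groups[0]['momentum'])  # 0.9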
Now, let's use an optimizer to do the optimization!
You will need 2 new functions:
- `optimizer.zero_grad()`: This function sets the gradient of the parameters (`x` here) to 0 (otherwise it would get accumulated)
- `optimizer.step()`: This function applies an update step
x0 = 8
lr = 0.01
iterations = 10
x = torch.Tensor([x0]).requires_grad_()
y = f(x)
# Define your optimizer
optimizer = # < YOUR CODE HERE >
for i in range(iterations):
    # < YOUR CODE HERE >
    print(y.data)
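Again, if you want to compare with a reference, one possible completion of the cell above is sketched below (using `optim.SGD`; any of the optimizers listed earlier would work):
x = torch.Tensor([x0]).requires_grad_()
optimizer = optim.SGD([x], lr=lr)
for i in range(iterations):
    optimizer.zero_grad()    # reset x.grad to 0
    y = f(x)                 # forward pass
    y.backward()             # compute the gradient
    optimizer.step()         # let the optimizer apply the update
    print(y.data)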
In addition to an optimizer, a learning rate scheduler can be used to adjust the learning rate during training by reducing it according to a pre-defined schedule.
Below are some of the schedulers available in PyTorch.
optim.lr_scheduler.LambdaLR
optim.lr_scheduler.ExponentialLR
optim.lr_scheduler.MultiStepLR
optim.lr_scheduler.StepLR
# and some more ...
Let's try `optim.lr_scheduler.ExponentialLR`:
def f(x):
    return x.abs() * 5
x0 = 8
lr = 0.5
iterations = 150
x = torch.Tensor([x0]).requires_grad_()
optimizer = optim.SGD([x], lr=lr)
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, 0.8)
for i in range(iterations):
    optimizer.zero_grad()
    y = f(x)
    y.backward()
    optimizer.step()
    scheduler.step()
    print(y.data, " | lr : ", optimizer.param_groups[0]['lr'])
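For comparison, `optim.lr_scheduler.MultiStepLR` multiplies the learning rate by `gamma` at chosen milestones. The sketch below uses arbitrary milestones and `gamma`, not values from the workshop:
x = torch.Tensor([x0]).requires_grad_()
optimizer = optim.SGD([x], lr=lr)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 100], gamma=0.1)
for i in range(iterations):
    optimizer.zero_grad()
    f(x).backward()
    optimizer.step()
    scheduler.step()         # lr is multiplied by 0.1 after iterations 50 and 100
print("final lr:", scheduler.get_last_lr()[0])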
Below are some live plots to see what actually happens when you optimize a function.
You can play with learning rates, optimizers, and also define new functions to optimize!
Note: These are not strictly speaking live plots, as that is not possible in Colab. We actually create a video of the optimization process instead.
from live_plot import anim_2d
def function_2d(x):
    return x ** 2 / 20 + x.sin().tanh()
x0 = 8
lr = 2
iterations = 15
points = []
x_range = torch.arange(-10, 10, 0.1)
x = torch.Tensor([x0]).requires_grad_()
optimizer = torch.optim.Adam([x], lr=lr)
for i in range(iterations):
    optimizer.zero_grad()
    f = function_2d(x)
    f.backward()
    points += [(x.item(), f.item())]
    optimizer.step()
anim_2d(x_range, function_2d, points, 400)
from live_plot import anim_3d
Choose a function below and run the cell
elev, azim = 40, 250
x0, y0 = 6, -0.01
x_range = torch.arange(-10, 10, 1).float()
y_range = torch.arange(-15, 10, 2).float()
def function_3d(x, y):
    return x ** 2 - y ** 2
elev, azim = 30, 130
x0, y0 = 10, -4
x_range = torch.arange(-10, 15, 1).float()
y_range = torch.arange(-15, 10, 2).float()
def function_3d(x, y):
    return x ** 3 - y ** 3
elev, azim = 80, 130
x0, y0 = 4, -5
x_range = torch.arange(-10, 10, .5).float()
y_range = torch.arange(-10, 10, 1).float()
def function_3d(x, y):
    return (x ** 2 + y ** 2).sqrt().sin()
elev, azim = 37, 120
x0, y0 = 6, -15
x_range = torch.arange(-10, 12, 1).float()
y_range = torch.arange(-25, 5, 1).float()
# lr 0.15 momentum 0.5
def function_3d(x, y):
    return (x ** 2 / 20 + x.sin().tanh()) * (y.abs()) ** 1.2 + 5 * x.abs() + (y + 7) ** 2 / 10
Optimize the function
lr = .1
iterations = 15
x = torch.Tensor([x0]).requires_grad_()
y = torch.Tensor([y0]).requires_grad_()
optimizer = torch.optim.SGD([x, y], lr=lr)
points = []
for i in range(iterations):
    optimizer.zero_grad()
    f = function_3d(x, y)
    f.backward()
    points += [(x.item(), y.item(), f.item())]
    optimizer.step()
anim_3d(x_range, y_range, elev, azim, function_3d, points, 100)