%matplotlib inline
from preamble import *
import torch
import torch.nn as nn
SEED = 10
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
np.random.seed(SEED)
In this notebook we will train a neural network to fit an arbitrary function.

As a reminder, the input vector $\pmb{X}$ and the weight matrix $\pmb{\theta}^j$ connecting layer $j$ (with $u^j$ units) to layer $j+1$ (with $u^{j+1}$ units) have the form

$$ \pmb{X} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_p \end{bmatrix} \ \ ; \ \ \pmb{\theta}^j = \begin{bmatrix} \theta_{10} & \dots & \theta_{1(u^j + 1)} \\ \theta_{20} & \ddots & \\ \vdots & & \\ \theta_{(u^{j+1})0} & & \theta_{(u^{j+1})(u^j + 1)} \end{bmatrix} $$

where the $+1$ accounts for the bias unit. Check slide 13 of the lecture notes for a visualization of the dimensions.
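For reference, an nn.Linear layer in PyTorch stores the same information, except that the bias terms are kept in a separate vector rather than as an extra column of the weight matrix. A minimal sketch, with arbitrary layer sizes chosen only for illustration:

layer = nn.Linear(4, 3)    # hypothetical sizes: u^j = 4 inputs, u^(j+1) = 3 outputs
print(layer.weight.shape)  # torch.Size([3, 4]) -- one row per output unit
print(layer.bias.shape)    # torch.Size([3])    -- the bias terms (the "+1" above)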
Let's create a simple non-linear function to fit with our neural network:
sample_points = 3e3
x_lim = 100
x = np.linspace(0, x_lim, int(sample_points))
y = np.sin(x * x_lim * 1e-4) * np.cos(x * x_lim * 1e-3) * 3
plt.plot(x, y)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Function to be fitted')
n_input = 1
n_out = 1
In order for the model to take the data points one by one, we need some additional re-shaping: each entry gets an extra dimension, so that the arrays have shape (number of samples, number of features):
x_reshape = x.reshape((int(len(x) / n_input), n_input))
y_reshape = y.reshape((int(len(y) / n_out), n_out))
# # Uncomment to check the shape change
# print(x.shape, y.shape)
# print(x_reshape.shape, y_reshape.shape)
# print(x[10], x_reshape[10])
The data that we will input to the model needs to be of the type torch.float32.

Side remark: the default dtype of torch tensors (and also of the layer parameters) is torch.float32, which is related to GPU performance optimization. If one wants to use torch.float64/torch.double instead, one can convert the tensors to double precision via v = v.double(), or set the global default via torch.set_default_dtype(torch.float64). Just keep in mind that the NN parameters and the input tensors must have the same precision.
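For example, a small sketch of the conversions mentioned above:

v = torch.zeros(3)
print(v.dtype)           # torch.float32, the default
print(v.double().dtype)  # torch.float64, per-tensor conversion
# torch.set_default_dtype(torch.float64)  # would change the global default instead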
Before starting, let's convert our data numpy arrays to torch tensors:
x_torch = torch.from_numpy(x_reshape)
y_torch = torch.from_numpy(y_reshape)
# Type checking:
print(x.dtype, y.dtype)
print(x_torch.dtype, y_torch.dtype)
float64 float64
torch.float64 torch.float64
The type is still not correct, but we can easily convert it:
x_torch = x_torch.to(dtype=torch.float32)
y_torch = y_torch.to(dtype=torch.float32)
# Type checking:
print(x.dtype, y.dtype)
print(x_torch.dtype, y_torch.dtype)
float64 float64
torch.float32 torch.float32
We will also need to normalize the data to make sure we are in the non-linear region of the activation functions:
x_norm = torch.nn.functional.normalize(x_torch, p=5, dim=0)
y_norm = torch.nn.functional.normalize(y_torch, p=5, dim=0)
The torch.nn.functional.normalize function performs $L_p$ normalization along the chosen dimension, i.e. it divides each element by the $L_p$ norm

$$ \lVert v \rVert_p = \left( \sum_i \lvert v_i \rvert^p \right)^{1/p} $$
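As a minimal sanity check, we can reproduce the result by dividing by the $L_5$ norm computed along the first dimension ourselves (the eps safeguard inside normalize plays no role here):

x_manual = x_torch / x_torch.norm(p=5, dim=0, keepdim=True)
print(torch.allclose(x_manual, x_norm))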
plt.plot(x_norm.detach().numpy(), y_norm.detach().numpy())
plt.title('Normalized function')
A Sequential module is a sequential container: modules are added to it in order and connected in a cascading way, so that the output of each module is forwarded as input to the next one.
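For illustration only, the same cascading behaviour can be written out explicitly as an nn.Module subclass; the class name and structure below are just a sketch, not part of the exercise:

class TinyNet(nn.Module):
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_hidden)
        self.out = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        # each module's output is passed on to the next, as in nn.Sequential
        x = torch.tanh(self.hidden(x))
        return torch.tanh(self.out(x))

# TinyNet(1, 5, 1) would be equivalent to the first Sequential model below.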
n_hidden_01 = 5
model0 = nn.Sequential(nn.Linear(n_input, n_hidden_01),
nn.Tanh(),
nn.Linear(n_hidden_01, n_out),
nn.Tanh()
)
print(model0)
Sequential(
  (0): Linear(in_features=1, out_features=5, bias=True)
  (1): Tanh()
  (2): Linear(in_features=5, out_features=1, bias=True)
  (3): Tanh()
)
n_hidden_11 = 10
model1 = nn.Sequential(nn.Linear(n_input, n_hidden_11),
nn.Tanh(),
nn.Linear(n_hidden_11, n_out),
nn.Tanh()
)
print(model1)
Sequential(
  (0): Linear(in_features=1, out_features=10, bias=True)
  (1): Tanh()
  (2): Linear(in_features=10, out_features=1, bias=True)
  (3): Tanh()
)
n_hidden_21 = 5
n_hidden_22 = 5
model2 = nn.Sequential(nn.Linear(n_input, n_hidden_21),
nn.Tanh(),
nn.Linear(n_hidden_21, n_hidden_22),
nn.Tanh(),
nn.Linear(n_hidden_22, n_out),
nn.Tanh()
)
print(model2)
Sequential(
  (0): Linear(in_features=1, out_features=5, bias=True)
  (1): Tanh()
  (2): Linear(in_features=5, out_features=5, bias=True)
  (3): Tanh()
  (4): Linear(in_features=5, out_features=1, bias=True)
  (5): Tanh()
)
You can uncomment and execute the next line to explore the methods of the model objects you created:
# dir(model0)
Try the parameters method (note that it is a method, so it needs to be called with parentheses).
model0.parameters()
<generator object Module.parameters at 0x16923d350>
The parameters method gives back a generator, which means it needs to be iterated over to produce its output:
for element in model0.parameters():
print(element.shape)
torch.Size([5, 1])
torch.Size([5])
torch.Size([1, 5])
torch.Size([1])
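These are the weights and biases of the two Linear layers. A quick way to count the total number of trainable parameters (a small sketch):

n_params0 = sum(p.numel() for p in model0.parameters())
print(n_params0)  # 5*1 + 5 + 1*5 + 1 = 16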
Let's have a look at the contents of those tensors:
for element in model0.parameters():
print(element)
Parameter containing:
tensor([[-0.0838],
        [-0.0343],
        [-0.3750],
        [ 0.2300],
        [-0.5721]], requires_grad=True)
Parameter containing:
tensor([-0.1763,  0.3876,  0.9386,  0.2356, -0.3393], requires_grad=True)
Parameter containing:
tensor([[ 0.0429, -0.0501,  0.1825,  0.0512,  0.1752]], requires_grad=True)
Parameter containing:
tensor([0.4337], requires_grad=True)
torch.nn provides many different types of loss functions. One of the most popular ones is the Mean Squared Error (MSE), since it can be applied to a wide variety of cases.

loss_function = nn.MSELoss()
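As a quick check, MSELoss is simply the mean of the squared differences; a tiny sketch with made-up numbers:

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([1.5, 2.0, 2.0])
print(loss_function(a, b), ((a - b) ** 2).mean())  # both give the same value (~0.4167)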
torch.optim provides implementations of various optimization algorithms. The optimizer object holds the current state and updates the parameters of the model based on the computed gradients. It takes as input an iterable containing the model parameters, which we explored before.
batch_size = 200 # how many points to pass to the model at a time (not used in the full-batch training below)
learning_rate = 0.015
optimizer0 = torch.optim.Adam(model0.parameters(), lr=learning_rate)
optimizer1 = torch.optim.Adam(model1.parameters(), lr=learning_rate)
optimizer2 = torch.optim.Adam(model2.parameters(), lr=learning_rate)
# optimizer0 = torch.optim.SGD(model0.parameters(), lr=learning_rate)
# optimizer1 = torch.optim.SGD(model1.parameters(), lr=learning_rate)
# optimizer2 = torch.optim.SGD(model2.parameters(), lr=learning_rate)
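The optimizer keeps its hyperparameters (and references to the parameters it updates) in param_groups; for instance, a small sketch:

print(optimizer0.param_groups[0]['lr'])  # 0.015, the learning rate set above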
The model learns iteratively over a given number of epochs. Each iteration of the loop consists of:
- a forward pass of the input data through the model to obtain the predictions,
- resetting the gradients stored in the optimizer (optimizer.zero_grad()),
- computing the loss between the predictions and the targets,
- backpropagating the loss to obtain the gradients (loss.backward()),
- updating the model parameters (optimizer.step()).
epochs = 1000
losses0 = []
for epoch in range(epochs):
pred_y0 = model0(x_norm)
optimizer0.zero_grad()
loss0 = loss_function(pred_y0, y_norm)
losses0.append(loss0.item())
loss0.backward()
optimizer0.step()
losses1 = []
for epoch in range(epochs):
pred_y1 = model1(x_norm)
optimizer1.zero_grad()
loss1 = loss_function(pred_y1, y_norm)
losses1.append(loss1.item())
loss1.backward()
optimizer1.step()
losses2 = []
for epoch in range(epochs):
pred_y2 = model2(x_norm)
optimizer2.zero_grad()
loss2 = loss_function(pred_y2, y_norm)
losses2.append(loss2.item())
loss2.backward()
optimizer2.step()
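Note that batch_size defined above is not actually used: every epoch passes the full dataset through the model at once. For reference, a mini-batch variant could look like the following sketch (left commented out so it does not further train the models; the shuffling strategy is just one possible choice):

# for epoch in range(epochs):
#     perm = torch.randperm(x_norm.shape[0])
#     for start in range(0, x_norm.shape[0], batch_size):
#         idx = perm[start:start + batch_size]
#         optimizer0.zero_grad()
#         loss = loss_function(model0(x_norm[idx]), y_norm[idx])
#         loss.backward()
#         optimizer0.step()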
plt.plot(losses0, label='Model 0')
plt.plot(losses1, label='Model 1')
plt.plot(losses2, label='Model 2')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.title("Learning rate %f"%(learning_rate))
plt.legend()
plt.show()
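To compare the three models numerically as well, we can look at the final loss values (a small check):

print(losses0[-1], losses1[-1], losses2[-1])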
test_points = 50
x_test = np.random.uniform(0, np.max(x_norm.detach().numpy()), test_points)
x_test_reshape = x_test.reshape((int(len(x_test) / n_input), n_input))
x_test_torch = torch.from_numpy(x_test_reshape)
x_test_torch = x_test_torch.to(dtype=torch.float32)
Now we predict the y-values with our models:
y0_test_torch = model0(x_test_torch)
y1_test_torch = model1(x_test_torch)
y2_test_torch = model2(x_test_torch)
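Side note: since we only run inference here, the forward passes could also be wrapped in torch.no_grad(), which skips building the autograd graph and makes the later .detach() calls unnecessary. Uncomment to try:

# with torch.no_grad():
#     y0_test_torch = model0(x_test_torch)
#     y1_test_torch = model1(x_test_torch)
#     y2_test_torch = model2(x_test_torch)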
plt.plot(x_norm.detach().numpy(), y_norm.detach().numpy())
plt.scatter(x_test_torch.detach().numpy(), y0_test_torch.detach().numpy(), color='red', label='Model 0')
plt.scatter(x_test_torch.detach().numpy(), y1_test_torch.detach().numpy(), color='orange', label='Model 1')
plt.scatter(x_test_torch.detach().numpy(), y2_test_torch.detach().numpy(), color='green', label='Model 2')
plt.legend()
Some ideas:
- Change the number of epochs in Section 5 to 5000 and re-train the models. What happens?
- Change the SEED in the Reproducibility cell at the very top. How do the results change?
- Change the optimizer in Section 4 from Adam to SGD and re-train the models. What happens?
- Change the number of epochs in Section 5 to 1000000. What happens?
- Change the learning rate in Section 4 to 0.05. How do the results change? What does it tell us about our previous value?

# %reset -f