#!/usr/bin/env python
# coding: utf-8

# # Introduction to PyTorch

# `PyTorch` is an open source machine learning framework that allows you to write your own neural networks and optimize them efficiently. We choose to teach PyTorch because it is well established, has a huge developer community (it was originally developed by Facebook), is very flexible, and is especially popular in research. Many current papers publish their code in PyTorch, so it is good to be familiar with it as well.

# In[43]:


## Standard libraries
import os
import math
import time

import numpy as np

## Imports for plotting
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')


# ## The Basics of PyTorch
#
# We will start by reviewing the very basic concepts of PyTorch. As a prerequisite, we recommend being familiar with the `numpy` package, as most machine learning frameworks are based on very similar concepts. If you are not familiar with numpy yet, don't worry: here is a [tutorial](https://numpy.org/devdocs/user/quickstart.html) to go through.

# In[44]:


# Install PyTorch


# So, let's start with importing PyTorch. The package is called `torch`, based on its original framework [Torch](http://torch.ch/). As a first step, we can check its version:

# In[45]:


import torch
print("Using torch", torch.__version__)


# In general, it is recommended to keep the PyTorch version up to date. A slightly older version is usually fine as well: the interface between PyTorch versions does not change much, and hence the code here should be runnable with both older and newer releases.
#
# As in every machine learning framework, PyTorch provides stochastic functions, e.g. for generating random numbers. However, a very good practice is to set up your code to be reproducible with the exact same random numbers. This is why we set a seed below.

# In[46]:


torch.manual_seed(42)  # Setting the seed


# ### Tensors
#
# Tensors are the PyTorch equivalent of numpy arrays, with the addition of support for GPU acceleration (more on that later). The name "tensor" is a generalization of concepts you already know: for instance, a vector is a 1-D tensor, and a matrix is a 2-D tensor. When working with neural networks, we will use tensors of various shapes and with various numbers of dimensions.
#
# #### Initialization
#
# Let's first look at different ways of creating a tensor. There are many possible options; the simplest one is to call `torch.Tensor` with the desired shape as input argument:

# In[47]:


x = torch.Tensor(2, 3, 4)
print(x)


# The function `torch.Tensor` allocates memory for the desired tensor, but reuses whatever values happen to be in that memory already; in other words, the tensor is uninitialized.
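# As a minimal sketch of what "uninitialized" means here (the shape below is arbitrary and chosen only for illustration): `torch.empty` makes the uninitialized allocation explicit and behaves like `torch.Tensor(shape)` in this respect, so its printed contents are not meaningful.

# In[ ]:


x = torch.empty(2, 3, 4)  # uninitialized, like torch.Tensor(2, 3, 4); contents are arbitrary
print(x.shape, x.dtype)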
# To directly assign values to the tensor during initialization, there are many alternatives, including:
#
# * `torch.zeros`: Creates a tensor filled with zeros
# * `torch.ones`: Creates a tensor filled with ones
# * `torch.rand`: Creates a tensor with random values uniformly sampled between 0 and 1
# * `torch.randn`: Creates a tensor with random values sampled from a normal distribution with mean 0 and variance 1
# * `torch.arange`: Creates a tensor containing the values $N, N+1, N+2, ..., M-1$ (the end value is exclusive, as in Python's `range`)
# * `torch.Tensor` (input list): Creates a tensor from the list elements you provide

# In[48]:


# Create a tensor from a (nested) list
x = torch.Tensor([[1, 2], [3, 4]])
print(x)


# In[49]:


# Create a tensor with random values between 0 and 1 with the shape [2, 3, 4]
x = torch.rand(2, 3, 4)
print(x)


# You can obtain the shape of a tensor in the same way as in numpy (`x.shape`), or using the `.size()` method:

# In[50]:


shape = x.shape
print("Shape:", x.shape)

size = x.size()
print("Size:", size)

dim1, dim2, dim3 = x.size()
print("Size:", dim1, dim2, dim3)


# Most common functions you know from numpy can be used on tensors as well. In fact, since numpy arrays are so similar to tensors, we can convert most tensors to numpy arrays (and back), although we rarely need to.

# In[51]:


# Convert a numpy array to a tensor and vice versa
np_data = np.arange(6).reshape((2, 3))
torch_data = torch.from_numpy(np_data)
tensor2array = torch_data.numpy()
print(
    '\nnumpy array:', np_data,          # [[0 1 2], [3 4 5]]
    '\ntorch tensor:', torch_data,      # tensor([[0, 1, 2], [3, 4, 5]]), a 2x3 integer tensor
    '\ntensor to array:', tensor2array, # [[0 1 2], [3 4 5]]
)


# #### Operations
#
# Most operations that exist in numpy also exist in PyTorch. A full list of operations can be found in the [PyTorch documentation](https://pytorch.org/docs/stable/tensors.html#), but we will review the most important ones here.
#
# The simplest operation is to add two tensors:

# In[52]:


x1 = torch.rand(2, 3)
x2 = torch.rand(2, 3)
y = x1 + x2
print("X1", x1)
print("X2", x2)
print("Y", y)


# Calling `x2.add(x1)` returns a new tensor and leaves `x2` unchanged, while the underscore version `x2.add_(x1)` adds `x1` to `x2` in place:

# In[54]:


x1 = torch.rand(2, 3)
x2 = torch.rand(2, 3)
print("X1 (before)", x1)
print("X2 (before)", x2)

x2.add(x1)   # returns a new tensor; since the result is not assigned, x2 stays unchanged
# x2 = torch.add(x1, x2)  # equivalent out-of-place addition
print("X1 (after)", x1)
print("X2 (after)", x2)

x2.add_(x1)  # in-place version of add(): modifies x2 directly
print("X1 (in place)", x1)
print("X2 (in place)", x2)


# Another common operation aims at changing the shape of a tensor. A tensor of size (2, 3) can be re-organized to any other shape with the same number of elements (e.g. a tensor of size (6), or (3, 2), ...). In PyTorch, this operation is called `view`:

# In[55]:


x = torch.arange(6)
print("X", x)


# In[58]:


x = x.view(2, -1)  # -1 lets PyTorch infer the remaining dimension
print("X", x)


# Other commonly used operations include matrix multiplications, which are essential for neural networks.
#
# `torch.matmul`: Performs the matrix product over two tensors, where the specific behavior depends on the dimensions. If both inputs are matrices (2-dimensional tensors), it performs the standard matrix product. For higher-dimensional inputs, the function supports broadcasting.
#
# `torch.mm`: Performs the matrix product over two matrices, but doesn't support broadcasting.
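# As a small, hedged sketch of the broadcasting difference described above (the shapes here are chosen purely for illustration): `torch.matmul` broadcasts a single 2-D matrix over a batch of matrices, while `torch.mm` only accepts 2-D inputs.

# In[ ]:


batch = torch.rand(4, 2, 3)        # a batch of four 2x3 matrices
weight = torch.rand(3, 5)          # a single 3x5 matrix
out = torch.matmul(batch, weight)  # weight is broadcast over the batch dimension
print(out.shape)                   # torch.Size([4, 2, 5])
# torch.mm(batch, weight) would raise an error, since mm only accepts 2-D tensors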
# In[13]:


# Matrix multiplication
data = [[1, 2], [3, 4]]
tensor = torch.FloatTensor(data)  # 32-bit floating point

print(
    '\nmatrix multiplication (matmul)',
    '\nnumpy:        ', np.matmul(data, data),        # [[7, 10], [15, 22]]
    '\ntorch.matmul: ', torch.matmul(tensor, tensor), # [[7., 10.], [15., 22.]]
    '\ntorch.mm:     ', torch.mm(tensor, tensor),     # [[7., 10.], [15., 22.]]
)


# #### Indexing
#
# We often have the situation where we need to select a part of a tensor. Indexing works just like in numpy, so let's try it:

# In[14]:


x = torch.arange(12).view(3, 4)
print("X", x)


# In[15]:


print(x[:, 1])  # Second column


# In[16]:


print(x[0])  # First row


# In[17]:


print(x[:2, -1])  # First two rows, last column


# In[18]:


print(x[1:3, :])  # Middle two rows


# ### Dynamic Computation Graph and Backpropagation
#
# One of the main reasons for using PyTorch in deep learning projects is that we can automatically get **gradients/derivatives** of functions that we define. We will mainly use PyTorch for implementing neural networks, and they are just fancy functions. If we use weight matrices in our function that we want to learn, then those are called the **parameters** or simply the **weights**.
#
# If our neural network outputs a single scalar value, we would talk about taking the **derivative**, but you will see that quite often we have **multiple** output variables ("values"); in that case we talk about **gradients**. It is the more general term.
#
# Given an input $\mathbf{x}$, we define our function by **manipulating** that input, usually by matrix multiplications with weight matrices and additions of so-called bias vectors. As we manipulate our input, we are automatically creating a **computational graph**. This graph shows how to arrive at our output from our input. PyTorch is a **define-by-run** framework; this means that we can simply perform our manipulations, and PyTorch will keep track of the graph for us. Thus, we create a dynamic computation graph along the way.
#
# So, to recap: the only thing we have to do is to compute the **output**, and then we can ask PyTorch to automatically get the **gradients**.
#
# > **Note: Why do we want gradients?** Consider that we have defined a function, a neural net, that is supposed to compute a certain output $y$ for an input vector $\mathbf{x}$. We then define an **error measure** that tells us how wrong our network is, i.e. how bad it is at predicting output $y$ from input $\mathbf{x}$. Based on this error measure, we can use the gradients to **update** the weights $\mathbf{W}$ that were responsible for the output, so that the next time we present input $\mathbf{x}$ to our network, the output will be closer to what we want.
#
# The first thing we have to do is to specify which tensors require gradients. By default, when we create a tensor, it does not require gradients.

# In[19]:


x = torch.ones((3,))
print(x.requires_grad)


# In[20]:


x.requires_grad_(True)
print(x.requires_grad)


# In order to get familiar with the concept of a computational graph, we will create one for the following function:
#
# $$y = \frac{1}{|x|}\sum_i \left[(x_i + 2)^2 + 3\right]$$
#
# You could imagine that $x$ contains our parameters, and we want to optimize (either maximize or minimize) the output $y$. For this, we want to obtain the gradients $\partial y / \partial \mathbf{x}$. For our example, we'll use $\mathbf{x}=[0,1,2]$ as our input.

# In[21]:


x = torch.arange(3, dtype=torch.float32, requires_grad=True)  # Only float tensors can have gradients
print("X", x)
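# As a quick sanity check of the formula above for $\mathbf{x}=[0,1,2]$, worked out by hand: the summands are $(0+2)^2+3=7$, $(1+2)^2+3=12$, and $(2+2)^2+3=19$, so $y = (7+12+19)/3 = 38/3 \approx 12.667$. We should see this value when we evaluate the graph below.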
# Now let's build the computational graph step by step.

# In[22]:


a = x + 2
b = a ** 2
c = b + 3
y = c.mean()
print("Y", y)
print("b", b)


# Using the statements above, we have created a computational graph that looks similar to the figure below:
#
# *(Figure: computational graph from the input $x$ over the intermediate results $a$, $b$, and $c$ to the output $y$.)*
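# To see what the graph buys us, here is a minimal sketch of the natural next step, using the standard autograd calls `backward()` and `.grad`: calling `y.backward()` traverses the graph in reverse and fills `x.grad` with $\partial y / \partial \mathbf{x}$. Analytically, $\partial y / \partial x_i = \frac{2}{3}(x_i + 2)$, which for $\mathbf{x}=[0,1,2]$ gives roughly $[1.333, 2.000, 2.667]$.

# In[ ]:


y.backward()   # backpropagate through the graph built above
print(x.grad)  # expected: tensor([1.3333, 2.0000, 2.6667])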