Before we dive deep into the world of PyTorch development, it’s important to familiarize yourself with the fundamental data structure in PyTorch: the torch.Tensor. Understanding the tensor will show you how PyTorch handles and stores data, and since deep learning is fundamentally the collection and manipulation of floating-point numbers, it will also help you understand how PyTorch implements more advanced functions for deep learning. In addition, you may find yourself using tensor operations frequently when preprocessing input data or manipulating output data during model development.
In PyTorch, a tensor is a data structure used to store and manipulate data. Like a NumPy array, a tensor is a multidimensional array containing elements of a single data type. Tensors can be used to represent scalars, vectors, matrices, and n-dimensional arrays, and they are derived from the torch.Tensor class. However, tensors are more than just arrays of numbers. Creating or instantiating a tensor object from the torch.Tensor class gives us access to a set of built-in class attributes, as well as operations or class methods, that provide a robust set of capabilities. This guide describes these attributes and operations in detail.
Tensors also offer benefits that make them more suitable than NumPy arrays for deep learning calculations. First, tensor operations can be performed significantly faster using GPU acceleration. Second, tensors can be stored and manipulated at scale using distributed processing on multiple CPUs and GPUs and across multiple servers. And third, tensors keep track of the graph of computations that created them, which is very important for implementing a deep learning library.
Simple example
First, we import the PyTorch library, then we create two tensors, x and y, from two-dimensional lists. Next, we add the two tensors and store the result in z. We can just use the + operator here because the torch.Tensor class supports operator overloading. Finally, we print the new tensor, z, which we can see is the matrix sum of x and y, and we print the size of z. Notice that z is a tensor object itself and the size() method is used to return its matrix dimensions, namely 2 × 3:
import torch
x = torch.tensor([[1,2,3],[4,5,6]])
y = torch.tensor([[7,8,9],[10,11,12]])
z = x + y
print(z)
tensor([[ 8, 10, 12],
        [14, 16, 18]])
print(z.size())
torch.Size([2, 3])
You can list a tensor's full set of attributes and methods with Python's built-in dir() function:
', '.join(dir(z))
'T, __abs__, __add__, __and__, __array__, __array_priority__, __array_wrap__, __bool__, __class__, __complex__, __contains__, __deepcopy__, __delattr__, __delitem__, __dict__, __dir__, __div__, __doc__, __eq__, __float__, __floordiv__, __format__, __ge__, __getattribute__, __getitem__, __gt__, __hash__, __iadd__, __iand__, __idiv__, __ifloordiv__, __ilshift__, __imod__, __imul__, __index__, __init__, __init_subclass__, __int__, __invert__, __ior__, __ipow__, __irshift__, __isub__, __iter__, __itruediv__, __ixor__, __le__, __len__, __long__, __lshift__, __lt__, __matmul__, __mod__, __module__, __mul__, __ne__, __neg__, __new__, __nonzero__, __or__, __pos__, __pow__, __radd__, __rdiv__, __reduce__, __reduce_ex__, __repr__, __reversed__, __rfloordiv__, __rmul__, __rpow__, __rshift__, __rsub__, __rtruediv__, __setattr__, __setitem__, __setstate__, __sizeof__, __str__, __sub__, __subclasshook__, __torch_function__, __truediv__, __weakref__, __xor__, _backward_hooks, _base, _cdata, _coalesced_, _dimI, _dimV, _grad, _grad_fn, _indices, _is_view, _make_subclass, _nnz, _reduce_ex_internal, _to_sparse_csr, _update_names, _values, _version, abs, abs_, absolute, absolute_, acos, acos_, acosh, acosh_, add, add_, addbmm, addbmm_, addcdiv, addcdiv_, addcmul, addcmul_, addmm, addmm_, addmv, addmv_, addr, addr_, align_as, align_to, all, allclose, amax, amin, angle, any, apply_, arccos, arccos_, arccosh, arccosh_, arcsin, arcsin_, arcsinh, arcsinh_, arctan, arctan_, arctanh, arctanh_, argmax, argmin, argsort, as_strided, as_strided_, as_subclass, asin, asin_, asinh, asinh_, atan, atan2, atan2_, atan_, atanh, atanh_, backward, baddbmm, baddbmm_, bernoulli, bernoulli_, bfloat16, bincount, bitwise_and, bitwise_and_, bitwise_not, bitwise_not_, bitwise_or, bitwise_or_, bitwise_xor, bitwise_xor_, bmm, bool, broadcast_to, byte, cauchy_, cdouble, ceil, ceil_, cfloat, char, cholesky, cholesky_inverse, cholesky_solve, chunk, clamp, clamp_, clamp_max, clamp_max_, clamp_min, clamp_min_, clip, 
clip_, clone, coalesce, col_indices, conj, contiguous, copy_, copysign, copysign_, cos, cos_, cosh, cosh_, count_nonzero, cpu, cross, crow_indices, cuda, cummax, cummin, cumprod, cumprod_, cumsum, cumsum_, data, data_ptr, deg2rad, deg2rad_, dense_dim, dequantize, det, detach, detach_, device, diag, diag_embed, diagflat, diagonal, diff, digamma, digamma_, dim, dist, div, div_, divide, divide_, dot, double, dsplit, dtype, eig, element_size, eq, eq_, equal, erf, erf_, erfc, erfc_, erfinv, erfinv_, exp, exp2, exp2_, exp_, expand, expand_as, expm1, expm1_, exponential_, fill_, fill_diagonal_, fix, fix_, flatten, flip, fliplr, flipud, float, float_power, float_power_, floor, floor_, floor_divide, floor_divide_, fmax, fmin, fmod, fmod_, frac, frac_, frexp, gather, gcd, gcd_, ge, ge_, geometric_, geqrf, ger, get_device, grad, grad_fn, greater, greater_, greater_equal, greater_equal_, gt, gt_, half, hardshrink, has_names, heaviside, heaviside_, histc, hsplit, hypot, hypot_, i0, i0_, igamma, igamma_, igammac, igammac_, imag, index_add, index_add_, index_copy, index_copy_, index_fill, index_fill_, index_put, index_put_, index_select, indices, inner, int, int_repr, inverse, is_coalesced, is_complex, is_contiguous, is_cuda, is_distributed, is_floating_point, is_leaf, is_meta, is_mkldnn, is_mlc, is_nonzero, is_pinned, is_quantized, is_same_size, is_set_to, is_shared, is_signed, is_sparse, is_sparse_csr, is_vulkan, is_xpu, isclose, isfinite, isinf, isnan, isneginf, isposinf, isreal, istft, item, kron, kthvalue, layout, lcm, lcm_, ldexp, ldexp_, le, le_, lerp, lerp_, less, less_, less_equal, less_equal_, lgamma, lgamma_, log, log10, log10_, log1p, log1p_, log2, log2_, log_, log_normal_, log_softmax, logaddexp, logaddexp2, logcumsumexp, logdet, logical_and, logical_and_, logical_not, logical_not_, logical_or, logical_or_, logical_xor, logical_xor_, logit, logit_, logsumexp, long, lstsq, lt, lt_, lu, lu_solve, map2_, map_, masked_fill, masked_fill_, masked_scatter, masked_scatter_, 
masked_select, matmul, matrix_exp, matrix_power, max, maximum, mean, median, min, minimum, mm, mode, moveaxis, movedim, msort, mul, mul_, multinomial, multiply, multiply_, mv, mvlgamma, mvlgamma_, name, names, nan_to_num, nan_to_num_, nanmedian, nanquantile, nansum, narrow, narrow_copy, ndim, ndimension, ne, ne_, neg, neg_, negative, negative_, nelement, new, new_empty, new_empty_strided, new_full, new_ones, new_tensor, new_zeros, nextafter, nextafter_, nonzero, norm, normal_, not_equal, not_equal_, numel, numpy, orgqr, ormqr, outer, output_nr, permute, pin_memory, pinverse, polygamma, polygamma_, positive, pow, pow_, prelu, prod, put, put_, q_per_channel_axis, q_per_channel_scales, q_per_channel_zero_points, q_scale, q_zero_point, qr, qscheme, quantile, rad2deg, rad2deg_, random_, ravel, real, reciprocal, reciprocal_, record_stream, refine_names, register_hook, reinforce, relu, relu_, remainder, remainder_, rename, rename_, renorm, renorm_, repeat, repeat_interleave, requires_grad, requires_grad_, reshape, reshape_as, resize, resize_, resize_as, resize_as_, retain_grad, roll, rot90, round, round_, rsqrt, rsqrt_, scatter, scatter_, scatter_add, scatter_add_, select, set_, sgn, sgn_, shape, share_memory_, short, sigmoid, sigmoid_, sign, sign_, signbit, sin, sin_, sinc, sinc_, sinh, sinh_, size, slogdet, smm, softmax, solve, sort, sparse_dim, sparse_mask, sparse_resize_, sparse_resize_and_clear_, split, split_with_sizes, sqrt, sqrt_, square, square_, squeeze, squeeze_, sspaddmm, std, stft, storage, storage_offset, storage_type, stride, sub, sub_, subtract, subtract_, sum, sum_to_size, svd, swapaxes, swapaxes_, swapdims, swapdims_, symeig, t, t_, take, take_along_dim, tan, tan_, tanh, tanh_, tensor_split, tile, to, to_dense, to_mkldnn, to_sparse, tolist, topk, trace, transpose, transpose_, triangular_solve, tril, tril_, triu, triu_, true_divide, true_divide_, trunc, trunc_, type, type_as, unbind, unflatten, unfold, uniform_, unique, unique_consecutive, unsafe_chunk, 
unsafe_split, unsafe_split_with_sizes, unsqueeze, unsqueeze_, values, var, vdot, view, view_as, vsplit, where, xlogy, xlogy_, xpu, zero_'
Running it on a GPU (if available)
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([[1,2,3],[4,5,6]],
                 device=device)
y = torch.tensor([[7,8,9],[10,11,12]],
                 device=device)
z = x + y
print(z)
tensor([[ 8, 10, 12],
        [14, 16, 18]])
print(z.size())
torch.Size([2, 3])
print(z.device)
cpu
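An existing tensor can also be moved between devices after creation, using the to(), cuda(), and cpu() methods. A minimal sketch:

```python
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6]])

# to() returns a tensor on the target device; the tensor is
# unchanged if it already lives there
if torch.cuda.is_available():
    x = x.to("cuda")   # same as x.cuda()

# Move back to the CPU, e.g., before calling numpy()
x = x.cpu()
print(x.device)        # cpu
```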
The previous section showed a simple way to create tensors; however, there are many other ways to do it. You can create tensors from preexisting numeric data or create random samplings. Tensors can be created from preexisting data stored in array-like structures such as lists, tuples, scalars, or serialized data files, as well as in NumPy arrays.
The following code illustrates some common ways to create tensors. First, it shows how to create a tensor from a list using torch.tensor(). This method can also be used to create tensors from other data structures, such as tuples or NumPy arrays:
import numpy
# Created from pre-existing arrays
w = torch.tensor([1,2,3]) # <1>
w = torch.tensor((1,2,3)) # <2>
w = torch.tensor(numpy.array([1,2,3])) # <3>
# Initialized by size
w = torch.empty(100,200) # <4>
w = torch.zeros(100,200) # <5>
w = torch.ones(100,200) # <6>
# Initialized by size with random values
w = torch.rand(100,200) # <7>
w = torch.randn(100,200) # <8>
w = torch.randint(5,10,(100,200)) # <9>
# Initialized with specified data type or device
w = torch.empty((100,200), dtype=torch.float64,
                device="cpu")
# Initialized to have same size, data type,
# and device as another tensor
x = torch.empty_like(w)
x = torch.tensor([[1,2,3],[4,5,6]])
print(torch.empty_like(x))
print(torch.empty_like(x))
print(torch.zeros_like(x))
print(torch.ones_like(x))
print(torch.full_like(x, fill_value=5))
tensor([[94291637941632, 2, 3],
        [ 4, 5, 6]])
tensor([[94291637941632, 2, 3],
        [ 4, 5, 6]])
tensor([[0, 0, 0],
        [0, 0, 0]])
tensor([[1, 1, 1],
        [1, 1, 1]])
tensor([[5, 5, 5],
        [5, 5, 5]])
The following table lists PyTorch functions used to create tensors. Use each one with the torch namespace, e.g., torch.empty().
Function | Description |
---|---|
torch.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False) | Creates a tensor from an existing data structure |
torch.empty(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a tensor from uninitialized elements based on the random state of values in memory |
torch.zeros(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a tensor with all elements initialized to 0.0 |
torch.ones(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a tensor with all elements initialized to 1.0 |
torch.arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a 1D tensor of values over a range with a common step value |
torch.linspace(start, end, steps=100, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a 1D tensor of linearly spaced points between start and end |
torch.logspace(start, end, steps=100, base=10.0, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a 1D tensor of logarithmically spaced points between start and end |
torch.eye(n, m=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a 2D tensor with ones on the diagonal and zeros everywhere else |
torch.full(size, fill_value, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a tensor filled with fill_value |
torch.load(f) | Loads a tensor from a serialized pickle file |
torch.save(obj, f) | Saves a tensor to a serialized pickle file |
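As a quick illustration of a few of the range- and value-based creation functions in the table, consider the following sketch:

```python
import torch

print(torch.arange(0, 10, 2))         # tensor([0, 2, 4, 6, 8])
print(torch.linspace(0, 1, steps=5))  # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
print(torch.eye(3))                   # 3 x 3 identity matrix
print(torch.full((2, 3), 7))          # 2 x 3 tensor filled with 7s
```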
During deep learning development, it’s important to be aware of the data type used by your data and its calculations. So when you create tensors, you should control what data types are being used. As mentioned previously, all tensor elements have the same data type. You can specify the data type when creating the tensor by using the dtype parameter, or you can cast a tensor to a new dtype using the appropriate casting method or the to() method, as shown in the following code:
# Specify data type at creation using dtype
w = torch.tensor([1,2,3], dtype=torch.float32)
# Use casting method to cast to a new data type
w.int() # w remains a float32 after cast
w = w.int() # w changes to int32 after cast
# Use to() method to cast to a new type
w = w.to(torch.float64) # <1>
w = w.to(dtype=torch.float64) # <2>
# PyTorch automatically promotes data types during operations
x = torch.tensor([1,2,3], dtype=torch.int32)
y = torch.tensor([1,2,3], dtype=torch.float32)
z = x + y # <3>
print(z.dtype)
torch.float32
The table below lists all the available data types in PyTorch. Each data type results in a different tensor class depending on the tensor’s device. The corresponding tensor classes are shown in the two rightmost columns for CPUs and GPUs, respectively.
Data type | dtype | CPU tensor | GPU tensor |
---|---|---|---|
32-bit floating point (default) | torch.float32 or torch.float | torch.FloatTensor | torch.cuda.FloatTensor |
64-bit floating point | torch.float64 or torch.double | torch.DoubleTensor | torch.cuda.DoubleTensor |
16-bit floating point | torch.float16 or torch.half | torch.HalfTensor | torch.cuda.HalfTensor |
8-bit integer (unsigned) | torch.uint8 | torch.ByteTensor | torch.cuda.ByteTensor |
8-bit integer (signed) | torch.int8 | torch.CharTensor | torch.cuda.CharTensor |
16-bit integer (signed) | torch.int16 or torch.short | torch.ShortTensor | torch.cuda.ShortTensor |
32-bit integer (signed) | torch.int32 or torch.int | torch.IntTensor | torch.cuda.IntTensor |
64-bit integer (signed) | torch.int64 or torch.long | torch.LongTensor | torch.cuda.LongTensor |
Boolean | torch.bool | torch.BoolTensor | torch.cuda.BoolTensor |
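You can verify the class correspondence in the table with the tensor's type() method, which returns the class name as a string (shown here on a CPU):

```python
import torch

t = torch.tensor([1.0, 2.0])  # 32-bit float is the default
print(t.dtype)   # torch.float32
print(t.type())  # torch.FloatTensor

d = t.to(torch.float64)
print(d.type())  # torch.DoubleTensor

b = torch.tensor([True, False])
print(b.type())  # torch.BoolTensor
```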
Indexing, Slicing, Combining, and Splitting Tensors
Once you have created tensors, you may want to access portions of the data and combine or split tensors to form new tensors. The following code demonstrates how to perform these types of operations. You can slice and index tensors in the same way you would slice and index NumPy arrays.
x = torch.tensor([[1,2],[3,4],[5,6],[7,8]])
x
tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])
# Indexing, returns a tensor
print(x[1,1])
tensor(4)
# Indexing, returns a value as a Python number
print(x[1,1].item())
4
# Slicing
print(x[:2,1])
tensor([2, 4])
# Boolean indexing
# Only keep elements less than 5
print(x[x<5])
tensor([1, 2, 3, 4])
# Transpose array; x.t() or x.T can be used
print(x.t())
tensor([[1, 3, 5, 7],
        [2, 4, 6, 8]])
# Change shape; usually view() is preferred over
# reshape()
print(x.view((2,4)))
tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])
You can also combine or split tensors by using functions like torch.stack() and torch.unbind(), respectively, as shown in the following code:
# Combining tensors
y = torch.stack((x, x))
print(y)
tensor([[[1, 2],
         [3, 4],
         [5, 6],
         [7, 8]],

        [[1, 2],
         [3, 4],
         [5, 6],
         [7, 8]]])
x
tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])
# Splitting tensors
a,b = x.unbind(dim=1)
print(a,b)
tensor([1, 3, 5, 7]) tensor([2, 4, 6, 8])
# Splitting tensors
a,b,c,d = x.unbind(dim=0)
print(a,b,c,d)
tensor([1, 2]) tensor([3, 4]) tensor([5, 6]) tensor([7, 8])
PyTorch provides a robust set of built-in functions that can be used to access, split, and combine tensors in different ways. The table below lists some commonly used functions for manipulating tensor elements.
Function | Description |
---|---|
torch.cat() | Concatenates the given sequence of tensors in the given dimension. |
torch.chunk() | Splits a tensor into a specific number of chunks. Each chunk is a view of the input tensor. |
torch.gather() | Gathers values along an axis specified by the dimension. |
torch.index_select() | Returns a new tensor that indexes the input tensor along a dimension using the entries in the index, which is a LongTensor. |
torch.masked_select() | Returns a new 1D tensor that indexes the input tensor according to the Boolean mask, which is a BoolTensor. |
torch.narrow() | Returns a tensor that is a narrow version of the input tensor. |
torch.nonzero() | Returns the indices of nonzero elements. |
torch.reshape() | Returns a tensor with the same data and number of elements as the input tensor, but a different shape. Use view() instead to ensure the tensor is not copied. |
torch.split() | Splits the tensor into chunks. Each chunk is a view or subdivision of the original tensor. |
torch.squeeze() | Returns a tensor with all the dimensions of the input tensor of size 1 removed. |
torch.stack() | Concatenates a sequence of tensors along a new dimension. |
torch.t() | Expects the input to be a 2D tensor and transposes dimensions 0 and 1. |
torch.take() | Returns a tensor at specified indices when slicing is not contiguous. |
torch.transpose() | Transposes only the specified dimensions. |
torch.unbind() | Removes a tensor dimension by returning a tuple of the removed dimension. |
torch.unsqueeze() | Returns a new tensor with a dimension of size 1 inserted at the specified position. |
torch.where() | Returns a tensor of selected elements from either one of two tensors, depending on the specified condition. |
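Beyond torch.stack() and torch.unbind(), here is a brief sketch of a few more functions from the table:

```python
import torch

x = torch.tensor([[1, 2], [3, 4], [5, 6], [7, 8]])

# cat() joins along an existing dimension (stack() adds a new one)
print(torch.cat((x, x), dim=0).size())  # torch.Size([8, 2])

# index_select() picks rows 0 and 2 using a LongTensor index
print(torch.index_select(x, 0, torch.tensor([0, 2])))

# unsqueeze() inserts a dimension of size 1; squeeze() removes it
y = x.unsqueeze(0)          # shape [1, 4, 2]
print(y.squeeze(0).size())  # torch.Size([4, 2])
```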
Deep learning development is strongly based on mathematical computations, so PyTorch supports a very robust set of built-in math functions. Whether you are creating new data transforms, customizing loss functions, or building your own optimization algorithms, you can speed up your research and development with the math functions provided by PyTorch.
PyTorch supports many different types of math functions, including pointwise operations, reduction functions, comparison calculations, and linear algebra operations, as well as spectral and other math computations. The first category of useful math operations we’ll look at are pointwise operations. Pointwise operations perform an operation on each point in the tensor individually and return a new tensor.
They are useful for rounding and truncation as well as trigonometric and logical operations. By default, these functions create a new tensor or use one passed in via the out parameter. To perform an in-place operation instead, append an underscore to the function name.
The table below lists some commonly used pointwise operations.
Operation type | Sample functions |
---|---|
Basic math | add(), div(), mul(), neg(), reciprocal(), true_divide() |
Truncation | ceil(), clamp(), floor(), floor_divide(), fmod(), frac(), lerp(), remainder(), round(), sigmoid(), trunc() |
Complex numbers | abs(), angle(), conj(), imag(), real() |
Trigonometry | acos(), asin(), atan(), cos(), cosh(), deg2rad(), rad2deg(), sin(), sinh(), tan(), tanh() |
Exponents and logarithms | exp(), expm1(), log(), log10(), log1p(), log2(), logaddexp(), pow(), rsqrt(), sqrt(), square() |
Logical | logical_and(), logical_not(), logical_or(), logical_xor() |
Cumulative math | addcdiv(), addcmul() |
Bitwise operators | bitwise_not(), bitwise_and(), bitwise_or(), bitwise_xor() |
Error functions | erf(), erfc(), erfinv() |
Gamma functions | digamma(), lgamma(), mvlgamma(), polygamma() |
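The following sketch shows a few pointwise operations, including the in-place underscore convention mentioned above:

```python
import torch

t = torch.tensor([-1.5, 0.5, 2.5])

print(t.abs())                    # tensor([1.5000, 0.5000, 2.5000])
print(t.clamp(min=0.0, max=1.0))  # tensor([0.0000, 0.5000, 1.0000])

# The trailing underscore performs the operation in place, modifying t
t.add_(10)
print(t)                          # tensor([ 8.5000, 10.5000, 12.5000])
```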
The second category of math functions we’ll look at are reduction operations. Reduction operations reduce a set of numbers down to a single number or a smaller set of numbers; that is, they reduce the dimensionality or rank of the tensor. Reduction operations include functions for finding maximum or minimum values, as well as many statistical calculations like finding the mean or standard deviation.
These operations are frequently used in deep learning. For example, deep learning classification often uses the argmax() function to reduce softmax outputs to a dominant class.
Function | Description |
---|---|
torch.argmax(input, dim, keepdim=False, out=None) | Returns the index(es) of the maximum value across all elements, or just a dimension if it’s specified |
torch.argmin(input, dim, keepdim=False, out=None) | Returns the index(es) of the minimum value across all elements, or just a dimension if it’s specified |
torch.dist(input, other, p=2) | Computes the p-norm of the difference between two tensors |
torch.logsumexp(input, dim, keepdim=False, out=None) | Computes the log of summed exponentials of each row of the input tensor in the given dimension |
torch.mean(input, dim, keepdim=False, out=None) | Computes the mean or average across all elements, or just a dimension if it’s specified |
torch.median(input, dim, keepdim=False, out=None) | Computes the median or middle value across all elements, or just a dimension if it’s specified |
torch.mode(input, dim, keepdim=False, out=None) | Computes the mode or most frequent value across all elements, or just a dimension if it’s specified |
torch.norm(input, p='fro', dim=None, keepdim=False, out=None, dtype=None) | Computes the matrix or vector norm across all elements, or just a dimension if it’s specified |
torch.prod(input, dim, keepdim=False, dtype=None) | Computes the product of all elements, or of each row of the input tensor if it’s specified |
torch.std(input, dim, keepdim=False, out=None) | Computes the standard deviation across all elements, or just a dimension if it’s specified |
torch.std_mean(input, dim=None, unbiased=True, keepdim=False) | Computes the standard deviation and mean across all elements, or just a dimension if it’s specified |
torch.sum(input, dim, keepdim=False, out=None) | Computes the sum of all elements, or just a dimension if it’s specified |
torch.unique(input, sorted=True, return_inverse=False, return_counts=False, dim=None) | Removes duplicates across the entire tensor, or just a dimension if it’s specified |
torch.unique_consecutive(input, return_inverse=False, return_counts=False, dim=None) | Similar to torch.unique() but only removes consecutive duplicates |
torch.var(input, dim, keepdim=False, out=None) | Computes the variance across all elements, or just a dimension if it’s specified |
torch.var_mean(input, dim, keepdim=False, out=None) | Computes the mean and variance across all elements, or just a dimension if it’s specified |
Note that many of these functions accept the dim parameter, which specifies the dimension of reduction for multidimensional tensors. This is similar to the axis parameter in NumPy. By default, when dim is not specified, the reduction occurs across all dimensions. Specifying dim = 1 will compute the operation across each row. For example, torch.mean(x,1) will compute the mean for each row in tensor x.
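For example, on a 2 × 3 tensor, reducing with and without dim looks like this:

```python
import torch

x = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

print(torch.mean(x))     # tensor(3.5000) -- mean across all elements
print(torch.mean(x, 0))  # tensor([2.5000, 3.5000, 4.5000]) -- mean of each column
print(torch.mean(x, 1))  # tensor([2., 5.]) -- mean of each row
```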
Tip: It’s common to chain methods together. For example, torch.rand(2,2).max().item() creates a 2 × 2 tensor of random floats, finds the maximum value, and returns the value itself from the resulting tensor.
Next, we’ll look at PyTorch’s comparison functions. Comparison functions usually compare all the values within a tensor, or compare one tensor’s values to another’s. They can return a tensor full of Booleans based on each element’s value such as torch.eq() or torch.is_boolean(). There are also functions to find the maximum or minimum value, sort tensor values, return the top subset of tensor elements, and more.
Table below lists some commonly used comparison functions for your reference.
Operation type | Sample functions |
---|---|
Compare a tensor to other tensors | eq(), ge(), gt(), le(), lt(), ne() or ==, >, >=, <, <=, !=, respectively |
Test tensor status or conditions | isclose(), isfinite(), isinf(), isnan() |
Return a single Boolean for the entire tensor | allclose(), equal() |
Find value(s) over the entire tensor or along a given dimension | argsort(), kthvalue(), max(), min(), sort(), topk() |
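A brief sketch of these comparison functions in action:

```python
import torch

x = torch.tensor([1, 2, 3, 4])
y = torch.tensor([4, 3, 2, 1])

print(x.eq(y))            # tensor([False, False, False, False])
print(x > 2)              # tensor([False, False,  True,  True])
print(torch.equal(x, x))  # True -- a single Boolean for the whole tensor

# Two largest values and their indices
values, indices = torch.topk(x, 2)
print(values, indices)    # tensor([4, 3]) tensor([3, 2])
```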
The next type of mathematical functions we’ll look at are linear algebra functions. Linear algebra functions facilitate matrix operations and are important for deep learning computations.
Many computations, including gradient descent and optimization algorithms, use linear algebra to implement their calculations. PyTorch supports a robust set of built-in linear algebra operations, many of which are based on the Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) standardized libraries.
Function | Description |
---|---|
torch.matmul() | Computes a matrix product of two tensors; supports broadcasting |
torch.chain_matmul() | Computes a matrix product of _N_ tensors |
torch.mm() | Computes a matrix product of two tensors (if broadcasting is required, use matmul()) |
torch.addmm() | Computes a matrix product of two tensors and adds it to the input |
torch.bmm() | Computes a batch of matrix products |
torch.addbmm() | Computes a batch of matrix products and adds it to the input |
torch.baddbmm() | Computes a batch of matrix products and adds it to the input batch |
torch.mv() | Computes the product of the matrix and vector |
torch.addmv() | Computes the product of the matrix and vector and adds it to the input |
torch.matrix_power() | Returns a tensor raised to the power of _n_ (for square tensors) |
torch.eig() | Finds the eigenvalues and eigenvectors of a real square tensor |
torch.inverse() | Computes the inverse of a square tensor |
torch.det() | Computes the determinant of a matrix or batch of matrices |
torch.logdet() | Computes the log determinant of a matrix or batch of matrices |
torch.dot() | Computes the inner product of two tensors |
torch.addr() | Computes the outer product of two tensors and adds it to the input |
torch.solve() | Returns the solution to a system of linear equations |
torch.svd() | Performs a singular value decomposition |
torch.pca_lowrank() | Performs a linear principal component analysis |
torch.cholesky() | Computes a Cholesky decomposition |
torch.cholesky_inverse() | Computes the inverse of a symmetric positive-definite matrix, given its Cholesky factor |
torch.cholesky_solve() | Solves a system of linear equations using the Cholesky factor |
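The following sketch contrasts a few of these products:

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
v = torch.tensor([1., 1.])

print(torch.dot(v, v))     # tensor(2.) -- vector inner product
print(torch.mv(A, v))      # tensor([3., 7.]) -- matrix-vector product
print(torch.mm(A, A))      # [[7., 10.], [15., 22.]] -- matrix product
print(torch.matmul(A, v))  # same result as mv() here; matmul() also broadcasts
```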
The final type of mathematical operations we’ll consider are spectral and other math operations. Depending on the domain of interest, these functions may be useful for data transforms or analysis. For example, spectral operations like the fast Fourier transform (FFT) can play an important role in computer vision or digital signal processing applications.
Operation type | Sample functions |
---|---|
Fast, inverse, and short-time Fourier transforms | fft(), ifft(), stft() |
Real-to-complex FFT and complex-to-real inverse FFT (IFFT) | rfft(), irfft() |
Windowing algorithms | bartlett_window(), blackman_window(), hamming_window(), hann_window() |
Histogram and bin counts | histc(), bincount() |
Cumulative operations | cummax(), cummin(), cumprod(), cumsum(), trace() (sum of the diagonal), einsum() (sum of products using Einstein summation) |
Normalization functions | cdist(), renorm() |
Cross product, dot product, and Cartesian product | cross(), tensordot(), cartesian_prod() |
Functions that create a diagonal tensor with elements of the input tensor | diag(), diag_embed(), diagflat(), diagonal() |
Einstein summation | einsum() |
Matrix reduction and restructuring functions | flatten(), flip(), rot90(), repeat_interleave(), meshgrid(), roll(), combinations() |
Functions that return the lower or upper triangles and their indices | tril(), tril_indices(), triu(), triu_indices() |
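A couple of these operations in action:

```python
import torch

t = torch.tensor([1, 2, 3, 4])
print(torch.cumsum(t, dim=0))  # tensor([ 1,  3,  6, 10])

m = torch.tensor([[1., 2.], [3., 4.]])
print(torch.trace(m))          # tensor(5.) -- sum of the diagonal
print(torch.tril(m))           # lower triangle: [[1., 0.], [3., 4.]]
```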
One function, backward(), is worth calling out in its own subsection because it’s what makes PyTorch so powerful for deep learning development. The backward() function uses PyTorch’s automatic differentiation package, torch.autograd, to differentiate and compute gradients of tensors based on the chain rule.
x = torch.tensor([[1,2,3],[4,5,6]],
                 dtype=torch.float, requires_grad=True)
print(x)
tensor([[1., 2., 3.],
        [4., 5., 6.]], requires_grad=True)
f = x.pow(2).sum()
print(f)
tensor(91., grad_fn=<SumBackward0>)
f.backward()
print(x.grad) # df/dx = 2x
tensor([[ 2.,  4.,  6.],
        [ 8., 10., 12.]])
Next, we'll implement the basic functions of the gradient descent algorithm to find the decision boundary in a small dataset. First, we'll start with some functions that will help us plot and visualize the data.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Some helper functions for plotting and drawing lines
def plot_points(X, y):
    admitted = X[np.argwhere(y==1)]
    rejected = X[np.argwhere(y==0)]
    plt.scatter([s[0][0] for s in rejected], [s[0][1] for s in rejected],
                s=25, color='blue', edgecolor='k')
    plt.scatter([s[0][0] for s in admitted], [s[0][1] for s in admitted],
                s=25, color='red', edgecolor='k')

def display(m, b, color='g--'):
    plt.xlim(-0.05, 1.05)
    plt.ylim(-0.05, 1.05)
    x = np.arange(-10, 10, 0.1)
    plt.plot(x, m*x + b, color)

data = pd.read_csv('https://raw.githubusercontent.com/udacity/deep-learning-v2-pytorch/master/intro-neural-networks/gradient-descent/data.csv', header=None)
X = np.array(data[[0,1]])
y = np.array(data[2])
plot_points(X, y)
plt.show()

# Activation (sigmoid) function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Output (prediction) formula
def output_formula(features, weights, bias):
    return sigmoid(np.dot(features, weights) + bias)

# Error (log-loss) formula
def error_formula(y, output):
    return -y*np.log(output) - (1 - y)*np.log(1 - output)

# Gradient descent step
def update_weights(x, y, weights, bias, learnrate):
    output = output_formula(x, weights, bias)
    d_error = y - output
    weights += learnrate * d_error * x
    bias += learnrate * d_error
    return weights, bias
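As a quick sanity check, a single update_weights() step should reduce the log-loss error on the sample it was computed from. The snippet below repeats the helper definitions so it is self-contained; the sample values are made up for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def output_formula(features, weights, bias):
    return sigmoid(np.dot(features, weights) + bias)

def error_formula(y, output):
    return -y*np.log(output) - (1 - y)*np.log(1 - output)

def update_weights(x, y, weights, bias, learnrate):
    output = output_formula(x, weights, bias)
    d_error = y - output
    weights += learnrate * d_error * x
    bias += learnrate * d_error
    return weights, bias

x = np.array([0.5, 0.5])   # one made-up sample
y = 1                      # its label
w = np.array([0.0, 0.0])
b = 0.0

before = error_formula(y, output_formula(x, w, b))
w, b = update_weights(x, y, w, b, learnrate=0.5)
after = error_formula(y, output_formula(x, w, b))
print(before > after)      # True -- the error decreased
```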
The following training function will help us iterate the gradient descent algorithm through all the data, for a number of epochs. It will also plot the data, and some of the boundary lines obtained as we run the algorithm.
np.random.seed(44)
epochs = 100
learnrate = 0.01
def train(features, targets, epochs, learnrate, graph_lines=False):
    errors = []
    n_records, n_features = features.shape
    last_loss = None
    weights = np.random.normal(scale=1 / n_features**.5, size=n_features)
    bias = 0
    for e in range(epochs):
        del_w = np.zeros(weights.shape)
        for x, y in zip(features, targets):
            weights, bias = update_weights(x, y, weights, bias, learnrate)

        # Printing out the log-loss error on the training set
        out = output_formula(features, weights, bias)
        loss = np.mean(error_formula(targets, out))
        errors.append(loss)
        if e % (epochs / 10) == 0:
            print("\n========== Epoch", e, "==========")
            if last_loss and last_loss < loss:
                print("Train loss: ", loss, " WARNING - Loss Increasing")
            else:
                print("Train loss: ", loss)
            last_loss = loss
            # Converting the output (float) to boolean as it is a binary classification
            # e.g. 0.95 --> True (= 1), 0.31 --> False (= 0)
            predictions = out > 0.5
            accuracy = np.mean(predictions == targets)
            print("Accuracy: ", accuracy)
        if graph_lines and e % (epochs / 100) == 0:
            display(-weights[0]/weights[1], -bias/weights[1])

    # Plotting the solution boundary
    plt.title("Solution boundary")
    display(-weights[0]/weights[1], -bias/weights[1], 'black')

    # Plotting the data
    plot_points(features, targets)
    plt.show()

    # Plotting the error
    plt.title("Error Plot")
    plt.xlabel('Number of epochs')
    plt.ylabel('Error')
    plt.plot(errors)
    plt.show()
When we run the function, we'll obtain the following:
train(X, y, epochs, learnrate, True)
========== Epoch 0 ==========
Train loss:  0.7135845195381634
Accuracy:  0.4

========== Epoch 10 ==========
Train loss:  0.6225835210454962
Accuracy:  0.59

========== Epoch 20 ==========
Train loss:  0.5548744083669508
Accuracy:  0.74

========== Epoch 30 ==========
Train loss:  0.501606141872473
Accuracy:  0.84

========== Epoch 40 ==========
Train loss:  0.4593334641861401
Accuracy:  0.86

========== Epoch 50 ==========
Train loss:  0.42525543433469987
Accuracy:  0.93

========== Epoch 60 ==========
Train loss:  0.3973461571671399
Accuracy:  0.93

========== Epoch 70 ==========
Train loss:  0.3741469765239074
Accuracy:  0.93

========== Epoch 80 ==========
Train loss:  0.35459973368161973
Accuracy:  0.94

========== Epoch 90 ==========
Train loss:  0.3379273658879921
Accuracy:  0.94
In this section, we predict student admissions to graduate school at UCLA based on three pieces of data: GRE scores (test), GPA scores (grades), and the rank of the applicant's undergraduate school (1-4).
The dataset originally came from here: http://www.ats.ucla.edu/
# Importing pandas and numpy
import pandas as pd
import numpy as np
# Reading the csv file into a pandas DataFrame
data = pd.read_csv('https://raw.githubusercontent.com/udacity/deep-learning-v2-pytorch/master/intro-neural-networks/student-admissions/student_data.csv')
# Printing out the first 5 rows of our data
data.head()
| | admit | gre | gpa | rank |
|---|---|---|---|---|
| 0 | 0 | 380 | 3.61 | 3 |
| 1 | 1 | 660 | 3.67 | 3 |
| 2 | 1 | 800 | 4.00 | 1 |
| 3 | 1 | 640 | 3.19 | 4 |
| 4 | 0 | 520 | 2.93 | 4 |
First let's make a plot of our data to see what it looks like. In order to have a 2D plot, let's ignore the rank.
# %matplotlib inline
import matplotlib.pyplot as plt
# Function to help us plot
def plot_points(data):
    X = np.array(data[["gre","gpa"]])
    y = np.array(data["admit"])
    admitted = X[np.argwhere(y==1)]
    rejected = X[np.argwhere(y==0)]
    plt.scatter([s[0][0] for s in rejected], [s[0][1] for s in rejected],
                s=25, color='red', edgecolor='k')
    plt.scatter([s[0][0] for s in admitted], [s[0][1] for s in admitted],
                s=25, color='cyan', edgecolor='k')
    plt.xlabel('Test (GRE)')
    plt.ylabel('Grades (GPA)')
# Plotting the points
plot_points(data)
plt.show()
Roughly, it looks like the students with high test and grade scores were admitted, while those with low scores weren't, but the data is not as nicely separable as we hoped it would be. Maybe it would help to take the rank into account? Let's make 4 plots, one for each rank.
# Separating the ranks
data_rank1 = data[data["rank"]==1]
data_rank2 = data[data["rank"]==2]
data_rank3 = data[data["rank"]==3]
data_rank4 = data[data["rank"]==4]
# Plotting the graphs
plot_points(data_rank1)
plt.title("Rank 1")
plt.show()
plot_points(data_rank2)
plt.title("Rank 2")
plt.show()
plot_points(data_rank3)
plt.title("Rank 3")
plt.show()
plot_points(data_rank4)
plt.title("Rank 4")
plt.show()
This looks more promising, as it seems that the lower the rank, the higher the acceptance rate. Let's use the rank as one of our inputs. In order to do this, we should one-hot encode it.
Use the get_dummies function in pandas to one-hot encode the data. Hint: to drop a column, use one_hot_data.drop().
# Make dummy variables for rank
one_hot_data = pd.concat([data, pd.get_dummies(data['rank'], prefix='rank')], axis=1)
# Drop the previous rank column
one_hot_data = one_hot_data.drop('rank', axis=1)
# Print the first 10 rows of our data
one_hot_data[:10]
| | admit | gre | gpa | rank_1 | rank_2 | rank_3 | rank_4 |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 380 | 3.61 | 0 | 0 | 1 | 0 |
| 1 | 1 | 660 | 3.67 | 0 | 0 | 1 | 0 |
| 2 | 1 | 800 | 4.00 | 1 | 0 | 0 | 0 |
| 3 | 1 | 640 | 3.19 | 0 | 0 | 0 | 1 |
| 4 | 0 | 520 | 2.93 | 0 | 0 | 0 | 1 |
| 5 | 1 | 760 | 3.00 | 0 | 1 | 0 | 0 |
| 6 | 1 | 560 | 2.98 | 1 | 0 | 0 | 0 |
| 7 | 0 | 400 | 3.08 | 0 | 1 | 0 | 0 |
| 8 | 1 | 540 | 3.39 | 0 | 0 | 1 | 0 |
| 9 | 0 | 700 | 3.92 | 0 | 1 | 0 | 0 |
The next step is to scale the data. We notice that the range for grades is 1.0-4.0, whereas the range for test scores is roughly 200-800, which is much larger. This means our features are on very different scales, which makes it hard for a neural network to handle. Let's fit our two features into a range of 0-1, by dividing the grades by 4.0 and the test scores by 800.
# Copying our data
processed_data = one_hot_data.copy()
# Scaling the columns
processed_data['gre'] = processed_data['gre']/800
processed_data['gpa'] = processed_data['gpa']/4.0
processed_data[:10]
| | admit | gre | gpa | rank_1 | rank_2 | rank_3 | rank_4 |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.475 | 0.9025 | 0 | 0 | 1 | 0 |
| 1 | 1 | 0.825 | 0.9175 | 0 | 0 | 1 | 0 |
| 2 | 1 | 1.000 | 1.0000 | 1 | 0 | 0 | 0 |
| 3 | 1 | 0.800 | 0.7975 | 0 | 0 | 0 | 1 |
| 4 | 0 | 0.650 | 0.7325 | 0 | 0 | 0 | 1 |
| 5 | 1 | 0.950 | 0.7500 | 0 | 1 | 0 | 0 |
| 6 | 1 | 0.700 | 0.7450 | 1 | 0 | 0 | 0 |
| 7 | 0 | 0.500 | 0.7700 | 0 | 1 | 0 | 0 |
| 8 | 1 | 0.675 | 0.8475 | 0 | 0 | 1 | 0 |
| 9 | 0 | 0.875 | 0.9800 | 0 | 1 | 0 | 0 |
In order to test our algorithm, we'll split the data into a training set and a testing set. The size of the testing set will be 10% of the total data.
sample = np.random.choice(processed_data.index, size=int(len(processed_data)*0.9), replace=False)
train_data, test_data = processed_data.iloc[sample], processed_data.drop(sample)
print("Number of training samples is", len(train_data))
print("Number of testing samples is", len(test_data))
print(train_data[:10])
print(test_data[:10])
Number of training samples is 360
Number of testing samples is 40
     admit    gre     gpa  rank_1  rank_2  rank_3  rank_4
99       0  0.500  0.8275       0       0       1       0
194      1  0.750  0.8675       0       1       0       0
61       0  0.700  0.8300       0       0       0       1
302      1  0.500  0.7875       0       1       0       0
394      1  0.575  0.9975       0       0       1       0
62       0  0.800  0.9175       0       0       1       0
345      0  0.625  0.7575       0       0       1       0
344      0  0.650  0.8375       0       0       1       0
68       0  0.725  0.9225       1       0       0       0
314      0  0.675  0.8650       0       0       0       1
    admit    gre     gpa  rank_1  rank_2  rank_3  rank_4
5       1  0.950  0.7500       0       1       0       0
13      0  0.875  0.7700       0       1       0       0
23      0  0.850  0.7975       0       0       0       1
26      1  0.775  0.9025       1       0       0       0
30      0  0.675  0.9450       0       0       0       1
58      0  0.500  0.9125       0       1       0       0
72      0  0.600  0.8475       0       0       0       1
74      0  0.900  0.8625       0       0       0       1
75      0  0.900  1.0000       0       0       1       0
80      0  0.875  0.7250       0       0       0       1
Now, as a final step before the training, we'll split the data into features (X) and targets (y).
features = train_data.drop('admit', axis=1)
targets = train_data['admit']
features_test = test_data.drop('admit', axis=1)
targets_test = test_data['admit']
print(features[:10])
print(targets[:10])
       gre     gpa  rank_1  rank_2  rank_3  rank_4
99   0.500  0.8275       0       0       1       0
194  0.750  0.8675       0       1       0       0
61   0.700  0.8300       0       0       0       1
302  0.500  0.7875       0       1       0       0
394  0.575  0.9975       0       0       1       0
62   0.800  0.9175       0       0       1       0
345  0.625  0.7575       0       0       1       0
344  0.650  0.8375       0       0       1       0
68   0.725  0.9225       1       0       0       0
314  0.675  0.8650       0       0       0       1
99     0
194    1
61     0
302    1
394    1
62     0
345    0
344    0
68     0
314    0
Name: admit, dtype: int64
The following function trains the 1-layer neural network.
First, we'll write some helper functions.
# Activation (sigmoid) function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))

def error_formula(y, output):
    return -y*np.log(output) - (1 - y)*np.log(1 - output)
Now it's your turn to shine. Write the error term. Remember that this is given by the equation $$ (y-\hat{y})x $$ for binary cross entropy loss function and $$ (y-\hat{y})\sigma'(x)x $$ for mean square error.
def error_term_formula(x, y, output):
    # for binary cross entropy loss
    return (y - output) * x
    # for mean square error
    # return (y - output) * sigmoid_prime(x) * x
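For completeness, the binary cross-entropy error term follows from the chain rule. With $\hat{y} = \sigma(h)$ and $h = wx + b$:

$$ \frac{\partial E}{\partial w} = \left(-\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}}\right)\hat{y}(1-\hat{y})\,x = (\hat{y} - y)\,x $$

so the negative-gradient direction is $(y - \hat{y})x$, which is exactly what error_term_formula returns.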
# Neural Network hyperparameters
epochs = 1000
learnrate = 0.0001
# Training function
def train_nn(features, targets, epochs, learnrate):

    # Use the same seed to make debugging easier
    np.random.seed(42)

    n_records, n_features = features.shape
    last_loss = None

    # Initialize weights
    weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

    for e in range(epochs):
        del_w = np.zeros(weights.shape)
        for x, y in zip(features.values, targets):
            # Loop through all records, x is the input, y is the target

            # Activation of the output unit
            # Notice we multiply the inputs and the weights here
            # rather than storing h as a separate variable
            output = sigmoid(np.dot(x, weights))

            # The error term
            error_term = error_term_formula(x, y, output)

            # The gradient descent step, the error times the gradient times the inputs
            del_w += error_term

        # Update the weights here. The learning rate times the change in weights.
        # We don't have to divide by n_records since it is compensated by the learning rate.
        weights += learnrate * del_w  # / n_records

        # Printing out the log-loss error on the training set
        if e % (epochs / 10) == 0:
            out = sigmoid(np.dot(features, weights))
            loss = np.mean(error_formula(targets, out))
            print("Epoch:", e)
            if last_loss and last_loss < loss:
                print("Train loss: ", loss, " WARNING - Loss Increasing")
            else:
                print("Train loss: ", loss)
            last_loss = loss
            print("=========")
    print("Finished training!")
    return weights
weights = train_nn(features, targets, epochs, learnrate)
Epoch: 0
Train loss:  0.7554101695287188
=========
Epoch: 100
Train loss:  0.6257618907012589
=========
Epoch: 200
Train loss:  0.6138277106962134
=========
Epoch: 300
Train loss:  0.6106428180453167
=========
Epoch: 400
Train loss:  0.6086718721854811
=========
Epoch: 500
Train loss:  0.607187860874402
=========
Epoch: 600
Train loss:  0.606032020277616
=========
Epoch: 700
Train loss:  0.6051189345648322
=========
Epoch: 800
Train loss:  0.6043881919205627
=========
Epoch: 900
Train loss:  0.6037952479383182
=========
Finished training!
# Calculate accuracy on test data
test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
Prediction accuracy: 0.625
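The accuracy line works because comparing the boolean predictions against the 0/1 targets yields an array of True/False values, and np.mean treats True as 1 and False as 0, giving the fraction of correct predictions. A tiny standalone example:

```python
import numpy as np

# Boolean predictions compared against 0/1 targets:
# True == 1 and False == 0, so the mean is the fraction correct.
predictions = np.array([True, False, True, True])
targets = np.array([1, 0, 0, 1])
print(np.mean(predictions == targets))  # 0.75
```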
You’ll build a deep learning model and train the model using a common training loop structure. Then, you’ll test your model’s performance and tweak hyperparameters to improve your results and training speed. Finally, we’ll explore ways to deploy your model to prototype systems or production.
First, we load this data and convert it to numeric values in the form of tensors. The tensors will act as inputs during the model training stage; however, before they are passed in, the tensors are usually preprocessed via transforms and grouped into batches for better training performance. Thus, the data preparation stage takes generic data and converts it to batches of tensors that can be passed into your NN model.
Next, in the model experimentation and development stage, we will design an NN model, train the model with our training data, test its performance, and optimize our hyperparameters to improve performance to a desired level. To do so, we will separate our dataset into three parts: one for training, one for validation, and one for testing. We’ll design an NN model and train its parameters with our training data. PyTorch provides elegantly designed modules and classes in the torch.nn module to help you create and train your NNs. We will define a loss function and optimizer from a selection of the many built-in PyTorch functions. Then we’ll perform backpropagation and update the model parameters in our training loop.
Within each epoch, we’ll also validate our model by passing in validation data, measuring performance, and potentially tuning hyperparameters. Finally, we’ll test our model by passing in test data and measuring the model’s performance against unseen data. In practice, validation and test loops may be optional, but we show them here for completeness.
The last stage of deep learning model development is the model deployment stage. In this stage, we have a fully trained model—so what do we do with it? If you are a deep learning research scientist conducting experiments, you may want to simply save the model to a file and load it for further research and experimentation, or you may want to provide access to it via a repository like PyTorch Hub. You may also want to deploy it to an edge device or local server to demonstrate a prototype or a proof of concept.
On the other hand, if you are a software developer or systems engineer, you may want to deploy your model to a product or service. In this case, you can deploy your model to a production environment on a cloud server or deploy it to an edge device or mobile phone. When deploying trained models, the model often requires additional postprocessing. For example, you may classify a batch of images, but you only want to report the most confident result. The model deployment stage also handles any postprocessing that is needed to go from your model’s output values to the final solution.
PyTorch provides powerful built-in classes and utilities, such as the Dataset, DataLoader, and Sampler classes, for loading various types of data. The Dataset class defines how to access and preprocess data from a file or data sources. The Sampler class defines how to sample data from a dataset in order to create batches, while the DataLoader class combines a dataset with a sampler and allows you to iterate over a set of batches.
import torch
import torchvision
from torchvision.datasets import CIFAR10
train_data = CIFAR10(root="./train/",
train=True,
download=True)
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./train/cifar-10-python.tar.gz
Extracting ./train/cifar-10-python.tar.gz to ./train/
print(train_data)
print(len(train_data))
print(train_data.data.shape)
print(len(train_data.targets))
print(train_data.classes)
print(train_data.class_to_idx)
print(type(train_data[0]))
print(len(train_data[0]))
data, label = train_data[0]
print(type(data))
print(data)
print(type(label))
print(label)
print(train_data.classes[label])
Dataset CIFAR10
    Number of datapoints: 50000
    Root location: ./train/
    Split: Train
50000
(50000, 32, 32, 3)
50000
['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
{'airplane': 0, 'automobile': 1, 'bird': 2, 'cat': 3, 'deer': 4, 'dog': 5, 'frog': 6, 'horse': 7, 'ship': 8, 'truck': 9}
<class 'tuple'>
2
<class 'PIL.Image.Image'>
<PIL.Image.Image image mode=RGB size=32x32 at 0x7F2B561C1810>
<class 'int'>
6
frog
test_data = CIFAR10(root="./test/",
train=False,
download=True)
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./test/cifar-10-python.tar.gz
Extracting ./test/cifar-10-python.tar.gz to ./test/
print(test_data)
print(len(test_data))
print(test_data.data.shape)
Dataset CIFAR10
    Number of datapoints: 10000
    Root location: ./test/
    Split: Test
10000
(10000, 32, 32, 3)
from torchvision import transforms
train_transforms = transforms.Compose([
transforms.RandomCrop(32, padding=4),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize(
(0.4914, 0.4822, 0.4465),
(0.2023, 0.1994, 0.2010))])
train_data = CIFAR10(root="./train/",
train=True,
download=True,
transform=train_transforms)
print(train_data)
print(train_data.transforms)
data, label = train_data[0]
print(type(data))
print(data.size())
print(data)
Files already downloaded and verified
Dataset CIFAR10
    Number of datapoints: 50000
    Root location: ./train/
    Split: Train
    StandardTransform
Transform: Compose(
    RandomCrop(size=(32, 32), padding=4)
    RandomHorizontalFlip(p=0.5)
    ToTensor()
    Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.201))
)
StandardTransform
Transform: Compose(
    RandomCrop(size=(32, 32), padding=4)
    RandomHorizontalFlip(p=0.5)
    ToTensor()
    Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.201))
)
<class 'torch.Tensor'>
torch.Size([3, 32, 32])
tensor([[[-2.4291, -2.4291, -2.4291,  ..., -2.4291, -2.4291, -2.4291],
         [-2.4291, -2.4291, -2.4291,  ..., -2.4291, -2.4291, -2.4291],
         [-2.4291, -2.4291, -2.4291,  ..., -2.4291, -2.4291, -2.4291],
         ...,
         [-2.4291, -2.4291, -2.4291,  ...,  0.2073,  0.0328, -0.0835],
         [-2.4291, -2.4291, -2.4291,  ...,  0.3430,  0.0910,  0.1297],
         [-2.4291, -2.4291, -2.4291,  ...,  0.4593,  0.2267,  0.3430]],

        [[-2.4183, -2.4183, -2.4183,  ..., -2.4183, -2.4183, -2.4183],
         [-2.4183, -2.4183, -2.4183,  ..., -2.4183, -2.4183, -2.4183],
         [-2.4183, -2.4183, -2.4183,  ..., -2.4183, -2.4183, -2.4183],
         ...,
         [-2.4183, -2.4183, -2.4183,  ..., -0.2156, -0.3532, -0.6089],
         [-2.4183, -2.4183, -2.4183,  ..., -0.3532, -0.5892, -0.6482],
         [-2.4183, -2.4183, -2.4183,  ..., -0.2746, -0.4319, -0.3139]],

        [[-2.2214, -2.2214, -2.2214,  ..., -2.2214, -2.2214, -2.2214],
         [-2.2214, -2.2214, -2.2214,  ..., -2.2214, -2.2214, -2.2214],
         [-2.2214, -2.2214, -2.2214,  ..., -2.2214, -2.2214, -2.2214],
         ...,
         [-2.2214, -2.2214, -2.2214,  ..., -1.2069, -1.4605, -1.4605],
         [-2.2214, -2.2214, -2.2214,  ..., -1.1678, -1.3629, -1.3239],
         [-2.2214, -2.2214, -2.2214,  ..., -0.9922, -1.1678, -1.1093]]])
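The repeated −2.4291, −2.4183, and −2.2214 values along the tensor's borders are easy to verify by hand: RandomCrop pads the image with zeros, and Normalize maps each value v to (v − mean)/std per channel, so a zero-valued padded pixel lands at −mean/std:

```python
# Verify the padded-border values above: Normalize maps v -> (v - mean) / std,
# and RandomCrop's zero padding puts v = 0 at the borders.
means = (0.4914, 0.4822, 0.4465)
stds = (0.2023, 0.1994, 0.2010)
for m, s in zip(means, stds):
    print(round((0.0 - m) / s, 4))
# -2.4291
# -2.4183
# -2.2214
```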
test_transforms = transforms.Compose([
transforms.ToTensor(),
transforms.Normalize(
(0.4914, 0.4822, 0.4465),
(0.2023, 0.1994, 0.2010))])
test_data = torchvision.datasets.CIFAR10(
root="./test/",
train=False,
transform=test_transforms)
print(test_data)
Dataset CIFAR10
    Number of datapoints: 10000
    Root location: ./test/
    Split: Test
    StandardTransform
Transform: Compose(
    ToTensor()
    Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.201))
)
Now that we have defined the transforms and created the datasets, we can access data samples one at a time. However, when you train your model, you will want to pass in small batches of data at each iteration. Sending data in batches not only allows more efficient training but also takes advantage of the parallel nature of GPUs to accelerate training.
Batch processing can easily be implemented using the torch.utils.data.DataLoader class. Let’s start with an example of how Torchvision uses this class, and then we’ll cover it in more detail.
trainloader = torch.utils.data.DataLoader(
train_data,
batch_size=16,
shuffle=True)
testloader = torch.utils.data.DataLoader(
test_data,
batch_size=16,
shuffle=False)
The dataloader object combines a dataset and a sampler, and provides an iterable over the given dataset. In other words, your training loop can use this object to sample your dataset and apply transforms one batch at a time instead of applying them for the complete dataset at once. This considerably improves efficiency and speed when training and testing models.
The following code shows how to retrieve a batch of samples from the trainloader:
data_batch, labels_batch = next(iter(trainloader))
print(data_batch.size())
print(labels_batch.size())
torch.Size([16, 3, 32, 32])
torch.Size([16])
We use iter() to turn the trainloader into an iterator and next() to retrieve a single batch. This is only necessary when accessing one batch manually; as we'll see later, our training loops iterate over the dataloader directly, without iter() and next(). Checking the sizes of the data and labels confirms that the dataloader returns batches of size 16.
So far, I’ve shown you how to load, transform, and batch image data using Torchvision. However, you can use PyTorch to prepare other types of data as well. PyTorch libraries such as Torchtext and Torchaudio provide dataset and dataloader classes for text and audio data, and new external libraries are being developed all the time.
PyTorch also provides a submodule called torch.utils.data that you can use to create your own dataset and dataloader classes like the ones you saw in Torchvision. It consists of Dataset, Sampler, and DataLoader classes.
PyTorch supports map-style and iterable-style dataset classes. A map-style dataset is derived from the abstract class torch.utils.data.Dataset. It implements the __getitem__() and __len__() methods and represents a map from (possibly nonintegral) indices/keys to data samples. For example, such a dataset, when accessed with dataset[idx], could read the idx-th image and its corresponding label from a folder on disk. Map-style datasets are more common than iterable-style datasets, and all datasets that represent a map from keys to data samples should use this subclass.
All subclasses should override __getitem__(), which fetches a data sample for a given key. Subclasses can also optionally override __len__(), which returns the size of the dataset and is used by many Sampler implementations and the default options of DataLoader.
An iterable-style dataset, on the other hand, is derived from the torch.utils.data.IterableDataset abstract class. It implements the __iter__() protocol and represents an iterable over data samples. This type of dataset is typically used when reading data from a database or a remote server, as well as data generated in real time. Iterable datasets are useful when random reads are expensive or uncertain, and when the batch size depends on fetched data.
In addition to dataset classes, PyTorch also provides sampler classes, which offer a way to iterate over the indices of dataset samples. Samplers are derived from the torch.utils.data.Sampler base class.
Every Sampler subclass needs to implement an __iter__() method, to provide a way to iterate over indices of dataset elements, and a __len__() method that returns the length of the returned iterator.
The dataset and sampler objects are not iterables, meaning you cannot run a for loop directly over them. The dataloader object solves this problem. The Dataset class returns a dataset object that includes data and information about the data. The Sampler class provides the indices used to draw samples from the dataset, in a specified or random order. The DataLoader class combines a dataset with a sampler and returns an iterable over batches.
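To make the map-style contract concrete, here is a minimal sketch of a custom Dataset (the class name and the random data are hypothetical, chosen only for illustration):

```python
import torch
from torch.utils.data import Dataset, DataLoader

# A minimal map-style dataset: wraps an in-memory tensor of
# samples and a matching tensor of labels.
class TensorPairDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        # Used by samplers and DataLoader's default options
        return len(self.data)

    def __getitem__(self, idx):
        # Fetch one (sample, label) pair for a given index
        return self.data[idx], self.labels[idx]

dataset = TensorPairDataset(torch.randn(100, 8),
                            torch.zeros(100, dtype=torch.long))
loader = DataLoader(dataset, batch_size=10, shuffle=True)
batch_data, batch_labels = next(iter(loader))
print(batch_data.size())    # torch.Size([10, 8])
print(batch_labels.size())  # torch.Size([10])
```

The DataLoader takes care of sampling indices, calling __getitem__() for each one, and collating the results into batch tensors.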
One of the most powerful features of PyTorch is its Python module torch.nn, which makes it easy to design and experiment with new models. The following code illustrates how you can create a simple model with torch.nn. In this example, we will create a fully connected model called SimpleNet. It consists of an input layer, a hidden layer, and an output layer that takes in 2,048 input values and returns 2 output values for classification:
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(2048, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 2)

    def forward(self, x):
        x = x.view(-1, 2048)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.softmax(self.fc3(x), dim=1)
        return x
simplenet = SimpleNet()
print(simplenet)
SimpleNet(
  (fc1): Linear(in_features=2048, out_features=256, bias=True)
  (fc2): Linear(in_features=256, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=2, bias=True)
)
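Because forward() ends in a softmax over dim=1, each output row is a probability distribution over the two classes. A standalone smoke test (reproducing the class above so it runs on its own, with a hypothetical random input batch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# SimpleNet as defined above, reproduced so this check is standalone
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(2048, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 2)

    def forward(self, x):
        x = x.view(-1, 2048)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return F.softmax(self.fc3(x), dim=1)

net = SimpleNet()
out = net(torch.randn(4, 2048))              # a batch of 4 random inputs
print(out.size())                            # torch.Size([4, 2])
print(torch.allclose(out.sum(dim=1),
                     torch.ones(4)))         # True: softmax rows sum to 1
```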
This simple model demonstrates some of the decisions you need to make during model design: how many layers to use, how large each layer should be, and which activation functions to apply.
The next step in model development is to train your model with your training data. Training a model involves nothing more than estimating the model’s parameters, passing in data, and adjusting the parameters to achieve a more accurate representation of how the data is generally modeled.
In other words, you set the parameters to some values, pass through data, and then compare the model’s outputs with true outputs to measure the error. The goal is to change the parameters and repeat the process until the error is minimized and the model’s outputs are the same as the true outputs.
In this example, we will train the LeNet5 model with the CIFAR-10 dataset that we used earlier in this chapter. The LeNet5 model is a simple convolutional NN developed by Yann LeCun and his team at Bell Labs in the 1990s to classify hand-written digits. (Unbeknownst to me at the time, I actually worked for Bell Labs in the same building in Holmdel, NJ, while this work was being performed.)
from torch import nn
import torch.nn.functional as F

class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # <1>
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, int(x.nelement() / x.shape[0]))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
device = "cuda" if torch.cuda.is_available() else "cpu"
model = LeNet5().to(device=device)
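The in_features=16 * 5 * 5 of fc1 comes from tracing shapes through the convolutional layers: a 32×32 CIFAR-10 image shrinks to 28×28 after the 5×5 conv1, to 14×14 after pooling, to 10×10 after conv2, and to 5×5 after the second pooling, with 16 channels. A standalone trace of just the convolutional stem:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Trace the feature-map shapes through LeNet5's convolutional stem
conv1 = nn.Conv2d(3, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)
x = torch.zeros(1, 3, 32, 32)            # one CIFAR-10-sized image
x = F.max_pool2d(F.relu(conv1(x)), 2)    # -> (1, 6, 14, 14)
x = F.max_pool2d(F.relu(conv2(x)), 2)    # -> (1, 16, 5, 5)
print(x.size())       # torch.Size([1, 16, 5, 5])
print(x.nelement())   # 400 == 16 * 5 * 5
```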
Next, we need to define the loss function (which is also called the criterion) and the optimizer algorithm. The loss function determines how we measure the performance of our model and computes the loss or error between predictions and truth. We’ll attempt to minimize the loss by adjusting the model parameters during training. The optimizer defines how we update our model’s parameters during training.
To define the loss function and the optimizer, we use the torch.optim and torch.nn packages as shown in the following code:
from torch import optim
from torch import nn
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(),
lr=0.001,
momentum=0.9)
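Note that nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, which is why LeNet5's forward() returns fc3's output without a softmax. A small standalone sketch (the logit values are made up for illustration):

```python
import torch
from torch import nn

criterion = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, -1.0]])  # raw scores for 3 classes
target = torch.tensor([0])                 # true class index

loss = criterion(logits, target)
# Equivalent by hand: negative log-probability of the true class
manual = -torch.log_softmax(logits, dim=1)[0, 0]
print(torch.allclose(loss, manual))  # True
```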
N_EPOCHS = 10
for epoch in range(N_EPOCHS):  # <1>
    epoch_loss = 0.0
    for inputs, labels in trainloader:
        inputs = inputs.to(device)  # <2>
        labels = labels.to(device)

        optimizer.zero_grad()  # <3>

        outputs = model(inputs)  # <4>
        loss = criterion(outputs, labels)  # <5>
        loss.backward()  # <6>
        optimizer.step()  # <7>
        epoch_loss += loss.item()  # <8>
    print("Epoch: {} Loss: {}".format(epoch,
                                      epoch_loss/len(trainloader)))
Epoch: 0 Loss: 1.9107245240402222
Epoch: 1 Loss: 1.600426623764038
Epoch: 2 Loss: 1.4878545830726624
Epoch: 3 Loss: 1.3998275267791749
Epoch: 4 Loss: 1.339401881046295
Epoch: 5 Loss: 1.2818300464820862
Epoch: 6 Loss: 1.2469202939224242
Epoch: 7 Loss: 1.2177171779727936
Epoch: 8 Loss: 1.194690007867813
Epoch: 9 Loss: 1.1706352946281433
The training loop consists of two loops. In the outer loop, we will process the entire set of training data during every iteration or epoch. However, instead of waiting to process the entire dataset before updating the model’s parameters, we process smaller batches of data, one batch at a time. The inner loop loops over each batch.
Warning: By default, PyTorch accumulates the gradients during each call to loss.backward() (i.e., the backward pass). This is convenient while training some types of NNs, such as RNNs; however, it is not desired for convolutional neural networks (CNNs). In most cases, you will need to call optimizer.zero_grad() to zero the gradients before doing backpropagation so the optimizer updates the model parameters correctly.
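The accumulation behavior is easy to see on a single parameter (a standalone sketch): each backward() call adds to .grad rather than replacing it, until the gradients are explicitly zeroed:

```python
import torch

w = torch.ones(1, requires_grad=True)
(2 * w).backward()   # d(2w)/dw = 2
(2 * w).backward()   # gradients accumulate: 2 + 2
print(w.grad)        # tensor([4.])

w.grad.zero_()       # what optimizer.zero_grad() does for each parameter
(2 * w).backward()
print(w.grad)        # tensor([2.])
```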
Now that we have trained our model and attempted to minimize the loss, how can we evaluate its performance? How do we know that our model will generalize and work with data it has never seen before?
Model development often includes validation and testing loops to ensure that overfitting does not occur and that the model will perform well against unseen data. Let’s address validation first. Here, I’ll provide you with a quick reference for how you can add validation to your training loops with PyTorch.
Typically, we will reserve a portion of the training data for validation. The validation data will not be used to train the NN; instead, we’ll use it to test the performance of the model at the end of each epoch.
Validation is good practice when training your models. It’s commonly performed when adjusting hyperparameters. For example, maybe we want to slow down the learning rate after five epochs.
Before we perform validation, we need to split our training dataset into a training dataset and a validation dataset. We use the random_split() function from torch.utils.data to reserve 10,000 of our 50,000 training images for validation. Once we create our train_set and val_set, we create our dataloaders for each one.
from torch.utils.data import random_split
train_set, val_set = random_split(
train_data,
[40000, 10000])
trainloader = torch.utils.data.DataLoader(
train_set,
batch_size=16,
shuffle=True)
valloader = torch.utils.data.DataLoader(
val_set,
batch_size=16,
shuffle=True)
print(len(trainloader))
print(len(valloader))
2500
625
If the loss decreases for validation data, then the model is doing well. However, if the training loss decreases but the validation loss does not, then there’s a good chance the model is overfitting.
from torch import optim
from torch import nn
model = LeNet5().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(),
lr=0.001,
momentum=0.9)
N_EPOCHS = 10
for epoch in range(N_EPOCHS):

    # Training
    train_loss = 0.0
    model.train()  # <1>
    for inputs, labels in trainloader:
        inputs = inputs.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    # Validation
    val_loss = 0.0
    model.eval()  # <2>
    for inputs, labels in valloader:
        inputs = inputs.to(device)
        labels = labels.to(device)

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        val_loss += loss.item()

    print("Epoch: {} Train Loss: {} Val Loss: {}".format(
        epoch,
        train_loss/len(trainloader),
        val_loss/len(valloader)))
Epoch: 0 Train Loss: 1.9745660563468934 Val Loss: 1.7492280321121216
Epoch: 1 Train Loss: 1.6637929407119751 Val Loss: 1.5639272161483764
Epoch: 2 Train Loss: 1.5348106513500213 Val Loss: 1.4912440963745117
Epoch: 3 Train Loss: 1.4464851764440536 Val Loss: 1.385581601524353
Epoch: 4 Train Loss: 1.374079407954216 Val Loss: 1.4018443069458009
Epoch: 5 Train Loss: 1.316621362066269 Val Loss: 1.2531775268554688
Epoch: 6 Train Loss: 1.2859153034687043 Val Loss: 1.2561434190750123
Epoch: 7 Train Loss: 1.2512328678131104 Val Loss: 1.2327665576934814
Epoch: 8 Train Loss: 1.2264495978951455 Val Loss: 1.2291773901939391
Epoch: 9 Train Loss: 1.195324891924858 Val Loss: 1.1875609773635865
Note: Running the .train() or .eval() method on your model object puts the model in training or testing mode, respectively. Calling these methods is only necessary if your model operates differently for training and evaluation. For example, dropout and batch normalization are used in training but not in validation or testing. It’s good practice to call .train() and .eval() in your loops.
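Dropout makes the difference between the two modes concrete (a standalone sketch): in training mode it randomly zeroes elements and rescales the survivors, while in eval mode it is the identity:

```python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()     # training mode: each element zeroed with p=0.5,
out = drop(x)    # survivors scaled by 1/(1-p) = 2
print(out)       # mix of 0.0 and 2.0 values

drop.eval()      # eval mode: dropout is a no-op
print(drop(x))   # tensor([1., 1., 1., 1., 1., 1., 1., 1.])
```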
As you can see, our model is training well and does not seem to be overfitting, since both the training loss and the validation loss are decreasing. If we train the model for more epochs, we may get even better results.
We’re not quite finished, though. Our model may still be overfitting. We might have just gotten lucky with our choice of hyperparameters, leading to good validation results. As a further test against overfitting, we will run some test data through our model.
The model has never seen the test data during training, nor has the test data had any influence on the hyperparameters. Let’s see how we perform against the test dataset.
num_correct = 0.0
model.eval()
for x_test_batch, y_test_batch in testloader:
    x_test_batch = x_test_batch.to(device)
    y_test_batch = y_test_batch.to(device)
    y_pred_batch = model(x_test_batch)
    _, predicted = torch.max(y_pred_batch, 1)
    num_correct += (predicted == y_test_batch).float().sum()

accuracy = num_correct/(len(testloader)*testloader.batch_size)

print(len(testloader), testloader.batch_size)
print("Test Accuracy: {}".format(accuracy))
625 16
Test Accuracy: 0.6157000064849854
Tip: You now know how to create training, validation, and test loops using PyTorch. Feel free to use this code as a reference when creating your own loops.
Now that you have a fully trained model, let’s explore what you can do with it in the model deployment stage. One of the simplest things you can do is save your trained model for future use. When you want to run your model against new inputs, you can simply load it and call the model with the new values.
The following code illustrates the recommended way to save and load a trained model. It uses the state_dict() method, which creates a dictionary object that maps each layer to its parameter tensor. In other words, we only need to save the model’s learned parameters. We already have the model’s design defined in our model class, so we don’t need to save the architecture. When we load the model, we use the constructor to create a “blank model,” and then we use load_state_dict() to set the parameters for each layer:
torch.save(model.state_dict(), "./lenet5_model.pt")
model = LeNet5().to(device)
model.load_state_dict(torch.load("./lenet5_model.pt"))
<All keys matched successfully>
Note: A common PyTorch convention is to save models using either a .pt or .pth file extension.