Basic PyTorch classification tutorial with links and references to useful materials to get started. This tutorial was presented on the 6th of August 2018 as part of the weekly meetings of the IVUL-KAUST research group.
There exist high-level APIs for PyTorch, analogous to Keras for TensorFlow, such as:
Also, it is not that hard to use TensorBoard with PyTorch:
You can use this official ~100-line example of MNIST classification as a reference.
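For a taste of what that looks like, the summary-writing pattern we use later in this tutorial is only a few lines of the TensorFlow 1.x API; a minimal sketch (assuming TensorFlow 1.x is installed and ./log is a writable directory):

import tensorflow as tf

writer = tf.summary.FileWriter('./log')  # creates the directory if it doesn't exist
for step in range(3):
    value = tf.Summary.Value(tag='example/loss', simple_value=1.0 / (step + 1))
    writer.add_summary(tf.Summary(value=[value]), global_step=step)
writer.close()
# then run `tensorboard --logdir ./log` and open the printed URL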
In this example, we will only use torch and torchvision, and we won't be using any high-level API because we want to demonstrate the power of PyTorch at its core. In fact, we will be building something similar to a high-level API ourselves.
import importlib.util
import os                  # used later by train() to manage checkpoint files
from shutil import rmtree  # used later by train() to remove old log directories
import tensorflow as tf    # to visualize training summaries with tensorboard
import torch
from torch import nn
import torch.nn.functional as F
from torchvision import transforms, datasets

# for reproducibility
torch.manual_seed(0)

# a utility function to print the progress of a for-loop
# don't worry about this bit because it is not part of the tutorial
# it is recommended that you install the `tqdm` package
def _verbosify(iterable):
    # shows only the iteration number and how many iterations are left
    try:
        len_iterable = len(iterable)
    except Exception:
        len_iterable = None
    for i, element in enumerate(iterable, 1):
        if len_iterable is None:
            print('\rIteration #{}'.format(i), end='')
        else:
            print('\rIteration #{} out of {} iterations [Done {:.2f}%]'.format(
                i, len_iterable, 100 * i / len_iterable), end='')
        yield element
    print('\r', end='', flush=True)

def verbosify(iterable, **kwargs):
    # try to use tqdm (shows the speed and the remaining time left)
    if importlib.util.find_spec('tqdm') is not None:
        tqdm = importlib.import_module('tqdm').tqdm
        if 'file' not in kwargs:
            kwargs['file'] = importlib.import_module('sys').stdout
        if 'leave' not in kwargs:
            kwargs['leave'] = False
        return tqdm(iterable, **kwargs)
    else:
        return iter(_verbosify(iterable))

# try out this example (uncomment to test):
# for i in verbosify(range(10000000)):
#     pass
The data loaders [train_loader and valid_loader]

I strongly recommend that you follow this tutorial for a more comprehensive understanding of how to deal with data.
You can find the Dogs vs. Cats dataset here. You can also use your own dataset, but to use ImageFolder you need to make sure that the dataset is organized into a folder where each subfolder is a class label containing all the images of that class. There are also utility functions in torchvision that download common datasets like MNIST and CIFAR10 under torchvision.datasets (e.g., torchvision.datasets.MNIST).
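For example, downloading MNIST takes a single call; a small sketch (assuming a ./data download directory and the imports above):

mnist_train = datasets.MNIST(root='./data', train=True, download=True,
                             transform=transforms.ToTensor())
print(len(mnist_train))        # 60000 training images
image, label = mnist_train[0]  # `image` is a 1x28x28 tensor, `label` is an int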
After you download the Dogs vs. Cats dataset you will get an all.zip file; unzip it using:
sudo apt-get install unzip
unzip all.zip -d all
cd all
unzip train.zip
cd train
All the images are named [dog|cat].<index>.jpg, but we need to put them in separate folders {cat, dog} as follows:
mkdir dog cat
mv dog.* dog
mv cat.* cat
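If you would rather do this from Python (e.g., on Windows, where mv is unavailable), a rough pathlib equivalent, assuming it is run from inside the train directory, would be:

from pathlib import Path

train_dir = Path('.')
for class_name in ('dog', 'cat'):
    (train_dir / class_name).mkdir(exist_ok=True)
    for image_path in train_dir.glob('{}.*.jpg'.format(class_name)):
        image_path.rename(train_dir / class_name / image_path.name)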
In PyTorch, we call the input to the model data and the output target. When we read images from a folder, they are read as PIL images, but in order to feed them as data to our model, they need to be transformed into PyTorch tensors with the correct size and normalization. This is why we will create a list of transformation functions for the images: each function operates on the output of the previous one, while the first function operates on a single PIL image. You can also create transformation functions for the output labels, called target_transform, if needed.
train_transform = transforms.Compose([
    transforms.Resize(224),             # takes PIL image as input and outputs PIL image
    transforms.RandomResizedCrop(224),  # takes PIL image as input and outputs PIL image
    transforms.RandomHorizontalFlip(),  # takes PIL image as input and outputs PIL image
    transforms.ToTensor(),              # takes PIL image as input and outputs torch.tensor
    transforms.Normalize(mean=[0.4280, 0.4106, 0.3589],  # takes tensor and outputs tensor
                         std=[0.2737, 0.2631, 0.2601]),  # see next step for mean and std
])

valid_transform = transforms.Compose([  # for validation we don't randomize or augment
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.4280, 0.4106, 0.3589],
                         std=[0.2737, 0.2631, 0.2601]),
])
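As a quick sanity check, you can push a single PIL image through the pipeline and confirm that the output is a 3x224x224 float tensor; a small sketch using a synthetic image:

from PIL import Image

dummy_image = Image.new('RGB', (500, 375))     # any size; the transforms resize and crop it
tensor_image = train_transform(dummy_image)
print(tensor_image.shape, tensor_image.dtype)  # torch.Size([3, 224, 224]) torch.float32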
# Just implement __getitem__ and __len__
class DummyDataset(torch.utils.data.Dataset):
    def __init__(self, size=(3, 224, 224), num_samples=1000, num_classes=3):
        self.images = torch.randn(num_samples, *size)
        self.labels = torch.randint(0, num_classes, (num_samples,))
        # this dataset doesn't need transforms
        # because it is already in the correct size and format

    def __getitem__(self, index):
        return self.images[index, ...], self.labels[index]

    def __len__(self):
        return self.images.size(0)

# Or, we can use `torchvision.datasets.ImageFolder`
dataset = datasets.ImageFolder(root='./all/train/',
                               transform=train_transform)

# here you should split this dataset into training and validation
def random_split(dataset, split_frac):
    dataset_length = len(dataset)
    train_length = int(dataset_length * (1 - split_frac))
    valid_length = dataset_length - train_length
    train_set, valid_set = torch.utils.data.random_split(dataset, [train_length, valid_length])
    return train_set, valid_set

split_frac = 0.1  # the ratio of images in the validation set
train_set, valid_set = random_split(dataset, split_frac)

# getting the mean and std of the images (assuming that you have enough memory)
# pixels_list = [img.view(3, -1) for img, label in  # this will take a while
#                datasets.ImageFolder(root='all/train', transform=valid_transform)]
# pixels = torch.cat(pixels_list, dim=-1)
# pixels_mean = pixels.mean(dim=-1)
# pixels_std = pixels.std(dim=-1)
# print(pixels_mean)  # Out: tensor([0.4280, 0.4106, 0.3589])
# print(pixels_std)   # Out: tensor([0.2737, 0.2631, 0.2601])
# if you don't have sufficient memory, you can compute the mean as a running average
# and the std as described here: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
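To make that concrete, here is a rough sketch of the streaming version: accumulate per-channel sums and squared sums one image at a time, then derive the mean and the (biased) std at the end. For pixel values in [0, 1] this naive formula is accurate enough; Welford's algorithm from the linked article is the more numerically robust variant.

def streaming_mean_std(dataset):
    # accumulate per-channel statistics one image at a time
    channel_sum = torch.zeros(3)
    channel_squared_sum = torch.zeros(3)
    pixel_count = 0
    for image, _ in verbosify(dataset):
        pixels = image.view(3, -1)
        channel_sum += pixels.sum(dim=-1)
        channel_squared_sum += (pixels ** 2).sum(dim=-1)
        pixel_count += pixels.shape[-1]
    mean = channel_sum / pixel_count
    std = (channel_squared_sum / pixel_count - mean ** 2).sqrt()
    return mean, std

# mean, std = streaming_mean_std(datasets.ImageFolder(root='all/train', transform=valid_transform))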
# create the torch.utils.data.DataLoader which will do the loading
def data_loader(dataset, batch_size, train, cuda):
    return torch.utils.data.DataLoader(dataset,
                                       batch_size=batch_size,
                                       pin_memory=cuda,
                                       num_workers=4 if cuda else 0,
                                       shuffle=train,    # shuffle only the training set
                                       drop_last=train)  # drop the last incomplete training batch

train_loader = data_loader(train_set, 128, train=True, cuda=True)
valid_loader = data_loader(valid_set, 256, train=False, cuda=True)

# Note: look up the rest of the parameters of DataLoader
# some of the interesting ones are `sampler` and `collate_fn`.
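Before pointing the loaders at the real images, you can smoke-test the whole loading path with the DummyDataset defined above (pass cuda=False on a CPU-only machine); a small sketch:

dummy_loader = data_loader(DummyDataset(num_samples=32), batch_size=8, train=True, cuda=False)
data, target = next(iter(dummy_loader))
print(data.shape, target.shape)  # torch.Size([8, 3, 224, 224]) torch.Size([8])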
The model [Net]

Please refer to the official implementation of AlexNet here for a nicer style of defining an nn.Module that uses nn.Sequential, which itself is a subclass of nn.Module. It will introduce you to nn.Sequential and the concept of defining a module using submodules.
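To make the nn.Sequential idea concrete before we define our model, here is a small hypothetical module in that style (TinyNet is not the network we train below, only an illustration of grouping layers into submodules):

class TinyNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(  # a submodule made of layers applied in order
            nn.Conv2d(3, 16, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 110 * 110, num_classes)  # assumes 224x224 inputs

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.view(x.shape[0], -1))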
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # define all the parameters of the model here
        # Note: All the layers and modules have to be direct
        # attributes of Net to be included in training (e.g. self.conv1).
        # To add them manually: `self.add_module(name, module)`.
        self.conv1 = nn.Conv2d(3, 32, kernel_size=5)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.conv3 = nn.Conv2d(64, 64, kernel_size=5)
        self.conv4 = nn.Conv2d(64, 20, kernel_size=5)
        self.fc1 = nn.Linear(2000, 1024)  # assumes the input is 224x224
        self.fc2 = nn.Linear(1024, 10)

    def forward(self, x):
        # define the forward pass here
        # Note: PyTorch will complain if you try to do operations between tensors that don't
        # have the same dtype and/or device, but it will allow scalar-tensor operations.
        # Be wary of operations using scalars because they are a big source of errors.
        # Native Python scalars are mostly fine but NumPy scalars are problematic:
        # E.g., `np.array([1.])[0] * torch.tensor(2, device='cuda')` will be on 'cpu'.
        # Always put the scalars in `torch.tensor` to know when you are mixing stuff.
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = F.relu(F.max_pool2d(self.conv3(x), 2))
        x = F.relu(F.max_pool2d(self.conv4(x), 2))
        x = x.view(x.shape[0], -1)  # flatten to (batch_size, 2000)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return x
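A quick way to verify the hard-coded 2000 in fc1 is to push a dummy batch through the network; a small sanity check:

net = Net()
dummy_batch = torch.randn(4, 3, 224, 224)  # a batch of 4 fake 224x224 RGB images
print(net(dummy_batch).shape)              # torch.Size([4, 10]): one logit vector per image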
A single epoch [full_epoch()]

We need to implement a function, called full_epoch(), that performs a single complete epoch on a given dataset and returns some computed metrics (e.g., loss and accuracy). It takes as input the model, the data loader, the device to do the operations on, and optionally an optimizer. If the optimizer is provided, it will do a training epoch; otherwise it will do a validation epoch. Here, we will also implement a helper function called process(), for modularity purposes only; it processes a single input batch at a time and is called by full_epoch() at each iteration. Usually, you would only need to modify process().
def softmax_cross_entropy(output, target):
    # numerically stable softmax + cross-entropy on logits
    # (equivalent to `F.cross_entropy(output, target)`)
    return F.nll_loss(F.log_softmax(output, dim=1), target)

def accuracy(output, target):
    # number of correct predictions in the batch
    predictions = output.max(1, keepdim=True)[1]
    return predictions.eq(target.view_as(predictions)).sum()

def process(model, data, target, optimizer=None):
    '''Perform the forward and backward passes on the given data.

    Args:
        model: An `nn.Module` or a function to process `data`.
        data: The desired input to `model` (e.g., a batch of images).
        target: The desired output of `model` (e.g., ground truth labels).
        optimizer: To perform the backward pass.

    Returns:
        A `dict` of collected metrics.
    '''
    # compute the loss
    output = model(data)  # logits
    loss = softmax_cross_entropy(output, target)
    # if training, update the weights
    if optimizer is not None:
        # you need to zero out the gradients of all the parameters
        optimizer.zero_grad()
        # accumulate the gradients with a backward pass
        loss.backward()
        # update the parameters with the gradients
        optimizer.step()
    # save the metrics
    metrics = {
        'loss': loss.item() * len(data),
        'accuracy': accuracy(output, target).item(),
    }
    return metrics

def full_epoch(model, data_loader, device, optimizer=None):
    '''Perform a single epoch.

    Args:
        model: An `nn.Module` or a function to process `data`.
        data_loader: A `torch.utils.data.DataLoader`.
        device: On which device to perform the epoch.
        optimizer: If provided, a training epoch is performed; otherwise a validation epoch.

    Returns:
        A `dict` of collected metrics.
    '''
    # change model.training to True and False accordingly
    if optimizer is None:
        model.eval()
    else:
        model.train()
    model.to(device)
    total_count = 0
    accumulated_metrics = {}
    for data, target in verbosify(data_loader):
        # process the batch [data (images) and target (labels)]
        metrics = process(model, data.to(device), target.to(device), optimizer)
        # accumulate the metrics
        total_count += len(data)
        for metric, value in metrics.items():
            if metric not in accumulated_metrics:
                accumulated_metrics[metric] = 0
            accumulated_metrics[metric] += value
    # compute the averaged metrics
    for metric in accumulated_metrics:
        accumulated_metrics[metric] /= total_count
    return accumulated_metrics
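For example, a standalone validation pass is just a call without an optimizer; a tiny sketch in the same uncomment-to-test style as above (wrapping it in torch.no_grad() is optional but saves memory, since no backward pass is needed):

# try it out (uncomment to test):
# with torch.no_grad():
#     print(full_epoch(Net(), valid_loader, torch.device('cuda:0')))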
Training [train() and running it]

But first, we will implement a generic train() function.
def train(model, device, num_epochs, optimizer, train_loader, valid_loader,
          scheduler=None, patience=10, load=None, save=None, log_dir=None, restart=False):
    '''Train a model for a certain number of epochs.

    Args:
        model: An `nn.Module` or a function to process the batches from the loaders.
        device: On which device to do the training.
        num_epochs: Number of epochs to train.
        optimizer: A `torch.optim.Optimizer` (e.g. SGD).
        train_loader: The `torch.utils.data.DataLoader` for the training dataset.
        valid_loader: The `torch.utils.data.DataLoader` for the validation dataset.
        scheduler: The learning rate scheduler.
        patience: The number of bad epochs to wait before early termination.
        load: Reinitialize the model and its hyper-parameters from this `*.pt` checkpoint file.
        save: The `*.pt` checkpoint file to save all the parameters of the trained model.
        log_dir: The directory to which we want to save tensorboard summaries.
        restart: Whether to remove `log_dir` and `load` before starting the function.

    Returns:
        The best state of the model during training (at the minimum validation loss).
    '''
    # restart if desired by removing old files
    if restart:
        if log_dir is not None and os.path.exists(log_dir):
            rmtree(log_dir)
        if load is not None and os.path.exists(load):
            os.remove(load)
    # try to resume from a checkpoint file if `load` was provided
    if load is not None:
        try:
            best_state = torch.load(load)
            model.load_state_dict(best_state['model'])
            optimizer.load_state_dict(best_state['optimizer'])
            if scheduler is not None:
                scheduler.load_state_dict(best_state['scheduler'])
        except FileNotFoundError:
            msg = 'Couldn\'t find checkpoint file! {} (training with random initialization)'
            print(msg.format(load))
            load = None
    # otherwise, start from the current initialization
    if load is None:
        best_state = {
            'epoch': -1,
            'model': model.state_dict(),
            'optimizer': optimizer.state_dict(),
            'scheduler': scheduler.state_dict() if scheduler is not None else None,
            'loss': float('inf'),
        }
    model.to(device)
    num_bad_epochs = 0
    for epoch in range(best_state['epoch'] + 1, num_epochs):
        # train and validate
        train_metrics = full_epoch(model, train_loader, device, optimizer)
        valid_metrics = full_epoch(model, valid_loader, device)  # will not do a backward pass
        # get the current learning rate
        learning_rate = optimizer.param_groups[0]['lr']
        # Note: an optimizer can have multiple param_groups,
        # each of which can be assigned a different learning rate,
        # but by default we have a single param_group.
        # reduce the learning rate according to the `scheduler` policy
        if scheduler is not None:
            scheduler.step(valid_metrics['loss'])
        # print the progress
        print('Epoch #{}: [train: {:.2e} > {:.2f}%][valid: {:.2e} > {:.2f}%] @ {:.2e}'.format(
            epoch, train_metrics['loss'], 100 * train_metrics['accuracy'],
            valid_metrics['loss'], 100 * valid_metrics['accuracy'], learning_rate,
        ))
        # save tensorboard summaries
        if log_dir is not None:
            # create the summary writer only the first time
            if not hasattr(log_dir, 'add_summary'):
                log_dir = tf.summary.FileWriter(log_dir)
            summaries = {
                'learning_rate': learning_rate,
            }
            summaries.update({'train/' + name: value for name, value in train_metrics.items()})
            summaries.update({'valid/' + name: value for name, value in valid_metrics.items()})
            values = [tf.Summary.Value(tag=k, simple_value=v) for k, v in summaries.items()]
            log_dir.add_summary(tf.Summary(value=values), epoch)
            log_dir.flush()
        # save the model to disk if it has improved
        if best_state['loss'] < valid_metrics['loss']:
            num_bad_epochs += 1
        else:
            num_bad_epochs = 0
            best_state = {
                'epoch': epoch,
                'model': model.state_dict(),
                'optimizer': optimizer.state_dict(),
                'scheduler': scheduler.state_dict() if scheduler is not None else None,
                'loss': valid_metrics['loss'],
            }
            if save is not None:
                torch.save(best_state, save)
        # do early stopping
        if num_bad_epochs >= patience:
            print('Validation loss didn\'t improve for {} iterations!'.format(patience))
            print('[Early stopping]')
            break
    # close the summary writer if created
    if log_dir is not None:
        if hasattr(log_dir, 'close'):
            log_dir.close()
    return best_state
Then, we will initialize a model with some hyper-parameters and start the training.
# initialize the model and the hyper-parameters
model = Net()
num_epochs = 10
logs = './all/log/model'
checkpoint = './all/model.pt'
device = torch.device('cuda:0')  # e.g., {'cpu', 'cuda:0', 'cuda:1', ...}
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-4, betas=(0.9, 0.999), weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, 'min', factor=0.2, patience=3, verbose=True)

# train the model
best_state = train(model, device, num_epochs, optimizer, train_loader, valid_loader,
                   scheduler, patience=10, load=checkpoint, save=checkpoint,
                   log_dir=logs, restart=False)
model.load_state_dict(best_state['model'])  # loads the best model acquired during training
Couldn't find checkpoint file! ./all/model.pt (training without reinitialization)
Epoch #0: [train: 8.01e-01 > 51.02%][valid: 6.90e-01 > 54.20%] @ 1.00e-04
Epoch #1: [train: 6.95e-01 > 54.94%][valid: 6.65e-01 > 61.28%] @ 1.00e-04
Epoch #2: [train: 6.80e-01 > 57.67%][valid: 6.53e-01 > 61.56%] @ 1.00e-04
Epoch #3: [train: 6.76e-01 > 58.54%][valid: 6.58e-01 > 60.24%] @ 1.00e-04
Epoch #4: [train: 6.60e-01 > 60.82%][valid: 6.56e-01 > 60.12%] @ 1.00e-04
Epoch #5: [train: 6.51e-01 > 62.10%][valid: 6.34e-01 > 64.76%] @ 1.00e-04
Epoch #6: [train: 6.43e-01 > 63.32%][valid: 6.21e-01 > 65.52%] @ 1.00e-04
Epoch #7: [train: 6.36e-01 > 64.58%][valid: 6.04e-01 > 68.04%] @ 1.00e-04
Epoch #8: [train: 6.29e-01 > 65.19%][valid: 6.06e-01 > 68.40%] @ 1.00e-04
Epoch #9: [train: 6.25e-01 > 65.35%][valid: 6.03e-01 > 68.16%] @ 1.00e-04

Tensorboard snapshot:
We implemented a standard deep learning workflow in PyTorch. The steps are (in order):
1. created train_loader and valid_loader, which load and transform our dataset in batches asynchronously
2. defined our model Net
3. implemented full_epoch(), which does a single training or validation epoch depending on whether it was fed an optimizer
4. implemented a generic training function train(), which uses full_epoch(), with early stopping and nice learning rate scheduling
5. trained Net with hand-picked hyper-parameters
6. we could also implement a testing function (test()), but I will leave that to you because now you know how to do it yourself

Final notes and remarks:
- Modify train() to handle the case when we don't have validation data (i.e., valid_loader is None). You will certainly need to do a decent number of changes to incorporate this.
- Extend train() with support for callbacks that you call before and after training. Then, put all the early stopping and the learning rate scheduling business outside train() by implementing them through these callbacks.
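To make the callback idea concrete, one possible (hypothetical) shape for such a hook is an object with an on_epoch_end() method that train() would call after every epoch; a rough sketch of early stopping expressed that way:

class EarlyStopping:
    # a hypothetical callback; train() would call on_epoch_end() after every epoch
    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float('inf')
        self.num_bad_epochs = 0

    def on_epoch_end(self, epoch, valid_metrics):
        # returns True when training should stop
        if valid_metrics['loss'] < self.best_loss:
            self.best_loss = valid_metrics['loss']
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        return self.num_bad_epochs >= self.patience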