Now it's time to tackle a more challenging task: classification of the original Oxford-IIIT Pet Dataset. Let's start by loading and visualizing the dataset.
!wget https://mslearntensorflowlp.blob.core.windows.net/data/oxpets_images.tar.gz
!tar xfz oxpets_images.tar.gz
!rm oxpets_images.tar.gz
We will define a generic function to display a series of images from a list:
import matplotlib.pyplot as plt
import os
from PIL import Image
import numpy as np
def display_images(l,titles=None,fontsize=12):
    n=len(l)
    fig,ax = plt.subplots(1,n)
    for i,im in enumerate(l):
        ax[i].imshow(im)
        ax[i].axis('off')
        if titles is not None:
            ax[i].set_title(titles[i],fontsize=fontsize)
    fig.set_size_inches(fig.get_size_inches()*n)
    plt.tight_layout()
    plt.show()
You can see that all images are located in one directory called images, and each file name contains the name of the class (breed):
fnames = os.listdir('images')[:5]
display_images([Image.open(os.path.join('images',x)) for x in fnames],titles=fnames,fontsize=30)
To simplify classification and use the same approach to loading images as in the previous part, let's sort all images into corresponding directories:
for fn in os.listdir('images'):
    cls = fn[:fn.rfind('_')].lower()
    os.makedirs(os.path.join('images',cls),exist_ok=True)
    os.replace(os.path.join('images',fn),os.path.join('images',cls,fn))
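To see what the slicing in the loop above does, here is a small standalone sketch of the class-name rule. The file names are illustrative and only assumed to follow the dataset's naming pattern, and class_from_filename is a hypothetical helper name that does not appear in the original code.

```python
def class_from_filename(fn):
    # Everything before the last underscore is the breed name;
    # the part after it is the image index and the extension.
    return fn[:fn.rfind('_')].lower()

# Illustrative file names (assumed to follow the dataset's naming pattern)
examples = ['Abyssinian_100.jpg', 'american_pit_bull_terrier_12.jpg']
for fn in examples:
    print(fn, '->', class_from_filename(fn))
```

Using rfind rather than find matters here, because some breed names contain underscores themselves.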
Let's also define the number of classes in our dataset:
num_classes = len(os.listdir('images'))
num_classes
37
To start training our neural network, we need to convert all images to tensors, and also create tensors corresponding to labels (class numbers). Most neural network frameworks contain simple tools for dealing with images:
In TensorFlow: tf.keras.preprocessing.image_dataset_from_directory
In PyTorch: torchvision.datasets.ImageFolder
As you have seen from the pictures above, all of them have a close to square aspect ratio, so we will resize all images to a square size. We will also organize the images in minibatches.
import torch
import torchvision
import torchvision.transforms as tr
image_size = 224
batch_size = 32
dataset = torchvision.datasets.ImageFolder(
    'images',
    transform = tr.Compose([
        tr.Resize(image_size),
        tr.CenterCrop(image_size),
        tr.ToTensor()
    ]))
classnames = dataset.classes
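The Resize and CenterCrop pair can be understood through the image sizes alone. The sketch below is plain Python and only approximates what the transforms do: resizing with a single integer scales the shorter side to that size while keeping the aspect ratio, and the center crop then cuts a square from the middle. The function names and the 500x375 photo are assumptions made for illustration, and torchvision's exact rounding may differ by a pixel.

```python
def resize_shorter_side(height, width, size):
    # Scale so that the shorter side equals `size`, keeping the aspect ratio
    # (the longer side is truncated to an integer).
    if height <= width:
        return size, int(size * width / height)
    return int(size * height / width), size

def center_crop_box(height, width, size):
    # Top-left corner and extent of a centered size x size crop.
    top = (height - size) // 2
    left = (width - size) // 2
    return top, left, size, size

# A hypothetical landscape photo, 375 pixels high and 500 pixels wide
h, w = resize_shorter_side(375, 500, 224)
print((h, w), center_crop_box(h, w, 224))
```

Because the pet photos are close to square, the crop discards only a narrow strip on each side.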
Now we need to split the dataset into train and test portions:
n = len(dataset)
n_train = int(0.8*n)
train_data,test_data = torch.utils.data.random_split(dataset,[n_train,n-n_train])
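Conceptually, a random split shuffles the sample indices and cuts the shuffled list at the train/test boundary. The following sketch shows that idea in plain Python; split_indices, the seed, and the dataset size of 100 are assumptions for illustration, not the internals of random_split.

```python
import random

def split_indices(n, train_fraction=0.8, seed=0):
    # Shuffle all sample indices, then cut the list at the train/test boundary.
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(train_fraction * n)
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_indices(100)   # 100 is a made-up dataset size
print(len(train_idx), len(test_idx))
```

A fixed seed makes the split repeatable. With torch.utils.data.random_split, the same effect can be obtained by passing a seeded torch.Generator through its generator argument.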
Now define data loaders:
train_loader = torch.utils.data.DataLoader(train_data,batch_size=64,shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data,batch_size=64,shuffle=True)
for _,(i,l) in zip(range(3),train_loader):
    display_images(i[:7].permute(0,2,3,1),titles=[dataset.classes[x] for x in l[:7]],fontsize=50)
For image classification, you should probably define a convolutional neural network with several layers. One important thing to keep an eye on is getting the combination of the activation function on the last layer and the loss function right:
In TensorFlow/Keras, use softmax as the activation, and sparse_categorical_crossentropy as the loss. The difference between sparse categorical cross-entropy and the non-sparse one is that the former expects the target as a class number, and not as a one-hot vector.
In PyTorch, leave the last layer without an activation and use the CrossEntropyLoss loss function. This function applies softmax automatically, so the network should output raw scores (logits).
Hint: In PyTorch, you can use the LazyLinear layer instead of Linear, in order to avoid computing the number of inputs. It only requires one parameter, out_features, which is the number of neurons in the layer; the dimension of the input data is picked up automatically upon the first forward pass.
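The relationship between softmax, the sparse loss, and the one-hot loss can be checked on a tiny example. This is a minimal sketch in plain Python, not the implementation used by either framework; the function names and the logits are made up for illustration.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(logits, class_index):
    # The target is given as a class number.
    return -math.log(softmax(logits)[class_index])

def onehot_cross_entropy(logits, onehot):
    # The target is given as a one-hot vector.
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(onehot, probs))

logits = [2.0, 0.5, -1.0]   # made-up raw outputs for 3 classes
print(sparse_cross_entropy(logits, 0))
print(onehot_cross_entropy(logits, [1, 0, 0]))
```

Both calls print the same value: the two losses differ only in how the target is encoded. Applying softmax inside the loss is also why the network itself should not end with a softmax layer when a loss of this kind is used.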
model = torch.nn.Sequential(
    torch.nn.Conv2d(3,16,(3,3)),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Conv2d(16,32,(3,3)),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Conv2d(32,64,(3,3)),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Flatten(),
    torch.nn.LazyLinear(2000),
    torch.nn.ReLU(),
    torch.nn.LazyLinear(num_classes)
)
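To appreciate what LazyLinear saves, here is the calculation you would otherwise do by hand to find the input size of the first fully connected layer. The sketch assumes 224x224 inputs, 3x3 convolutions without padding, and 2x2 max pooling, as in the model above; conv_out and pool_out are hypothetical helper names.

```python
def conv_out(size, kernel=3):
    # A convolution without padding and with stride 1 trims kernel-1 pixels.
    return size - (kernel - 1)

def pool_out(size, window=2):
    # Max pooling with window 2 and stride 2 halves the size, rounding down.
    return size // window

size = 224                        # image_size used for the dataset
for _ in range(3):                # three Conv2d + MaxPool2d stages
    size = pool_out(conv_out(size))
flat_features = 64 * size * size  # 64 channels after the last convolution
print(size, flat_features)
```

With a plain Linear layer, flat_features would have to be passed explicitly as the number of inputs and recomputed whenever the image size or the architecture changes.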
Now we are ready to train the neural network. During training, please collect accuracy on train and test data on each epoch, and then plot the accuracy to see if there is overfitting.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
def train_epoch(net,dataloader,lr=0.01,optimizer=None,loss_fn = torch.nn.NLLLoss()):
    # The default NLLLoss expects log-probabilities; pass CrossEntropyLoss for raw logits
    optimizer = optimizer or torch.optim.Adam(net.parameters(),lr=lr)
    net.train()
    total_loss,acc,count = 0,0,0
    for features,labels in dataloader:
        f = features.to(device)
        l = labels.to(device)
        optimizer.zero_grad()
        out = net(f)
        loss = loss_fn(out,l)
        loss.backward()
        optimizer.step()
        # .item() converts to a Python number, so no autograd graph is kept alive
        total_loss+=loss.item()
        _,predicted = torch.max(out,1)
        acc+=(predicted==l).sum().item()
        count+=len(labels)
    return total_loss/count, acc/count

def validate(net, dataloader,loss_fn=torch.nn.NLLLoss()):
    net.eval()
    count,acc,loss = 0,0,0
    with torch.no_grad():
        for features,labels in dataloader:
            f = features.to(device)
            l = labels.to(device)
            out = net(f)
            loss += loss_fn(out,l).item()
            pred = torch.max(out,1)[1]
            acc += (pred==l).sum().item()
            count += len(labels)
    return loss/count, acc/count

def train(net,train_loader,test_loader,optimizer=None,lr=0.01,epochs=10,loss_fn=torch.nn.NLLLoss()):
    optimizer = optimizer or torch.optim.Adam(net.parameters(),lr=lr)
    res = { 'train_loss' : [], 'train_acc': [], 'val_loss': [], 'val_acc': []}
    for ep in range(epochs):
        tl,ta = train_epoch(net,train_loader,optimizer=optimizer,lr=lr,loss_fn=loss_fn)
        vl,va = validate(net,test_loader,loss_fn=loss_fn)
        print(f"Epoch {ep:2}, Train acc={ta:.3f}, Val acc={va:.3f}, Train loss={tl:.3f}, Val loss={vl:.3f}")
        res['train_loss'].append(tl)
        res['train_acc'].append(ta)
        res['val_loss'].append(vl)
        res['val_acc'].append(va)
    return res
hist = train(model,train_loader,test_loader,epochs=10,lr=0.001,loss_fn=torch.nn.CrossEntropyLoss())
Epoch 0, Train acc=0.060, Val acc=0.084, Train loss=0.055, Val loss=0.054
Epoch 1, Train acc=0.129, Val acc=0.118, Train loss=0.050, Val loss=0.052
Epoch 2, Train acc=0.226, Val acc=0.156, Train loss=0.044, Val loss=0.051
Epoch 3, Train acc=0.409, Val acc=0.154, Train loss=0.033, Val loss=0.056
Epoch 4, Train acc=0.695, Val acc=0.147, Train loss=0.017, Val loss=0.072
Epoch 5, Train acc=0.923, Val acc=0.144, Train loss=0.005, Val loss=0.094
Epoch 6, Train acc=0.981, Val acc=0.164, Train loss=0.001, Val loss=0.105
Epoch 7, Train acc=0.992, Val acc=0.145, Train loss=0.001, Val loss=0.126
Epoch 8, Train acc=0.991, Val acc=0.145, Train loss=0.001, Val loss=0.118
Epoch 9, Train acc=0.994, Val acc=0.144, Train loss=0.000, Val loss=0.132
plt.plot(hist['train_acc'],label='Training')
plt.plot(hist['val_acc'],label='Test')
plt.grid()
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
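Looks like the accuracy is far from great! Training accuracy climbs to 0.994 while validation accuracy stays at around 0.15, which is a clear sign of overfitting.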
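The same conclusion can be read off the recorded history programmatically. The sketch below uses the accuracy values from the training log of this run, copied into plain lists so that it runs on its own; in the notebook you would read them from hist['train_acc'] and hist['val_acc'] instead.

```python
# Accuracy values copied from the training log of the run shown earlier
train_acc = [0.060, 0.129, 0.226, 0.409, 0.695, 0.923, 0.981, 0.992, 0.991, 0.994]
val_acc   = [0.084, 0.118, 0.156, 0.154, 0.147, 0.144, 0.164, 0.145, 0.145, 0.144]

# Epoch with the highest validation accuracy
best_epoch = max(range(len(val_acc)), key=lambda ep: val_acc[ep])
# Gap between training and validation accuracy for every epoch
gaps = [t - v for t, v in zip(train_acc, val_acc)]

print(f"best validation accuracy {val_acc[best_epoch]:.3f} at epoch {best_epoch}")
print(f"train/validation gap at the last epoch: {gaps[-1]:.3f}")
```

A gap that keeps growing while validation accuracy stalls suggests that further epochs only memorize the training set.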
To improve the accuracy, let's use a pre-trained neural network as a feature extractor. Feel free to experiment with VGG-16/VGG-19 models, ResNet50, etc.
Since this training is slower, you may start by training the model for a small number of epochs, e.g. 3. You can always resume training to further improve accuracy if needed.
We need to normalize our data differently for transfer learning, so we will reload the dataset using a different set of transforms:
std_normalize = tr.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
trans = tr.Compose([
    tr.Resize(256),
    tr.CenterCrop(224),
    tr.ToTensor(),
    std_normalize])
dataset = torchvision.datasets.ImageFolder('images',transform=trans)
n = len(dataset)
n_train = int(0.8*n)
train_data,test_data = torch.utils.data.random_split(dataset,[n_train,n-n_train])
train_loader = torch.utils.data.DataLoader(train_data,batch_size=64,shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data,batch_size=64,shuffle=True)
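The normalization step subtracts a per-channel mean and divides by a per-channel standard deviation. The values used here are the commonly used ImageNet channel statistics, which match the data the pre-trained network was trained on. The sketch below applies the same formula to single pixels in plain Python; normalize_pixel is a hypothetical helper name used only for illustration.

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    # rgb holds channel values in [0, 1], the range produced by ToTensor.
    return [(x - m) / s for x, m, s in zip(rgb, mean, std)]

print(normalize_pixel([0.485, 0.456, 0.406]))  # a pixel equal to the channel means
print(normalize_pixel([1.0, 1.0, 1.0]))        # a pure white pixel
```

After normalization the pixel values are no longer confined to [0, 1], so images from these loaders would need to be de-normalized before being displayed with display_images.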
Let's load the pre-trained network:
vgg = torchvision.models.vgg16(pretrained=True)
vgg
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
There is a slot called classifier, which you can replace with your own classifier for the desired number of classes. We will also move the model to the GPU device (if one is available) once the feature extractor is frozen:
vgg.classifier = torch.nn.LazyLinear(num_classes)
Make sure to set all parameters of the VGG feature extractor to be non-trainable using the requires_grad property:
for x in vgg.features.parameters():
    x.requires_grad = False
vgg = vgg.to(device)
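With the feature extractor frozen, only the new classifier layer is trained. A rough count of its parameters, assuming the 25088 input features shown in the model printout and the 37 classes found earlier, can be done by hand:

```python
in_features = 25088   # input size of the classifier, taken from the model printout
num_classes = 37      # number of breeds found earlier

weights = in_features * num_classes   # one weight per input/output pair
biases = num_classes                  # one bias per output neuron
trainable = weights + biases
print(trainable)
```

This is small compared to the original three-layer classifier, whose first layer alone has 25088 x 4096 weights. In PyTorch, the same number could be checked with sum(p.numel() for p in vgg.parameters() if p.requires_grad), once the lazy layer has been initialized by a first forward pass.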
Now we can start the training. Be very patient, as training takes a long time, and our train function is not designed to print anything before the end of the epoch.
hist = train(vgg,train_loader,test_loader,epochs=3,lr=0.001,loss_fn=torch.nn.CrossEntropyLoss())
Epoch 0, Train acc=0.709, Val acc=0.809, Train loss=0.028, Val loss=0.020
Epoch 1, Train acc=0.952, Val acc=0.848, Train loss=0.003, Val loss=0.016
Epoch 2, Train acc=0.978, Val acc=0.851, Train loss=0.001, Val loss=0.018
It seems much better now!
We can also compute top-k accuracy using the same code as in the previous exercise. Note that the code below calls topk with k=5, so it reports top-5 accuracy:
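correct = 0
total = 0
vgg.eval()
with torch.no_grad():  # no gradients are needed for evaluation
    for t,l in test_loader:
        out = vgg(t.to(device))
        _,r = out.topk(5,1,True,True)  # indices of the 5 highest-scoring classes
        r = r.t()
        correct += r.eq(l.to(device).view(1,-1).expand_as(r)).sum()
        total += len(l)
print(correct/total)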
tensor(0.9878, device='cuda:0')
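The tensor operations in the loop above count a prediction as correct when the true label appears among the k highest-scoring classes. The plain-Python sketch below shows the same idea on a made-up example; topk_accuracy and the scores are assumptions for illustration and are not part of the original code.

```python
def topk_accuracy(scores, labels, k):
    # scores: one list of per-class scores per sample; labels: true class indices.
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from the highest score to the lowest
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

scores = [[0.1, 0.7, 0.2],   # made-up scores for 3 samples and 3 classes
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(topk_accuracy(scores, labels, 1), topk_accuracy(scores, labels, 2))
```

Top-k accuracy can only grow with k, which is why the value reported above is much higher than the validation accuracy of about 0.85.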