Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch
Author: Sebastian Raschka

Python implementation: CPython
Python version       : 3.8.12
IPython version      : 8.0.1

torch: 1.10.1
AlexNet [1][2] trained on CIFAR-10 [3].
This implementation uses grouped convolutions as in the original AlexNet paper [2]. There, the network is essentially split into two halves so that it could be trained on two GPUs with only 3 GB of memory each. This was done purely for computational reasons (the limited video RAM available back then). However, there are certain benefits to using grouped convolutions ...
Taking a step back, how do grouped convolutions work?
In a nutshell, you can think of grouped convolutions as convolutional layers that process part of the input independently and merge the results. So, for example, if you consider grouped convolutions with two filter groups, each filter group would process half of the channels.
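To make this concrete, here is a small sketch (my own addition, not part of the original notebook) showing that a convolution with two filter groups produces the same result as running two independent convolutions on the two channel halves and concatenating the outputs:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 6, 32, 32)  # batch of 1, 6 input channels

# one grouped convolution with two filter groups
grouped = nn.Conv2d(6, 12, kernel_size=3, padding=1, groups=2, bias=False)

# two independent "half" convolutions that reuse the grouped layer's weights
conv_a = nn.Conv2d(3, 6, kernel_size=3, padding=1, bias=False)
conv_b = nn.Conv2d(3, 6, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    conv_a.weight.copy_(grouped.weight[:6])  # filters of the first group
    conv_b.weight.copy_(grouped.weight[6:])  # filters of the second group

out_grouped = grouped(x)
out_split = torch.cat([conv_a(x[:, :3]), conv_b(x[:, 3:])], dim=1)

print(torch.allclose(out_grouped, out_split, atol=1e-6))  # should print: True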
One of the benefits of grouped convolutions, as noted by Yani Ioannou [4], is that AlexNet achieves slightly higher accuracy when using two filter groups.
Another benefit is the reduced parameter size.
Say we have kernels with height 3 and width 3, the input has 6 channels, and the number of output channels is set to 12. With a regular convolution, each kernel then has 3x3x6 weight parameters, and since there are 12 output channels, that's 3x3x6x12 = 648 parameters in total.
Now, let's assume we use a grouped convolution with two filter groups. The kernel height and width are still 3x3, but the input channels are split between the two groups, so each kernel is 3x3x3. The first group produces the first 6 output channels, which accounts for 3x3x3x6 parameters; the second group has the same size, so in total we have (3x3x3x6)x2 = 3x3x3x12 = 324 parameters, a 2x reduction compared to the regular convolution.
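We can quickly verify these numbers by counting the weight parameters of the two layer variants (a small sanity check of mine, not part of the original notebook):

import torch.nn as nn

regular = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=3, bias=False)
grouped = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=3, groups=2, bias=False)

print(regular.weight.numel())  # 648  (12 x 6 x 3 x 3)
print(grouped.weight.numel())  # 324  (12 x 3 x 3 x 3)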
And how do we do this in PyTorch?
Implementing grouped convolutions in PyTorch is straightforward: we just use the groups parameter. For example, to implement a grouped convolution with two filter groups, we use
torch.nn.Conv2d(..., groups=2)
Note that this requires both the number of input channels and the number of output channels to be divisible by groups (here: 2).
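A quick illustration of this requirement (my own example, not from the original notebook):

import torch.nn as nn

nn.Conv2d(6, 12, kernel_size=3, groups=2)    # fine: 6 and 12 are both divisible by 2
# nn.Conv2d(6, 12, kernel_size=3, groups=4)  # raises a ValueError because 6 is not divisible by 4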
import os
import time
import random
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torch.utils.data.dataset import Subset
from torchvision import datasets
from torchvision import transforms
import matplotlib.pyplot as plt
from PIL import Image
if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
If you want the data to be shuffled in the same way and the model to receive the same initial random weights when rerunning this notebook, I recommend calling a function like the following one before setting up the dataset loaders and initializing the model:
def set_all_seeds(seed):
    os.environ["PL_GLOBAL_SEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
Similar to the set_all_seeds function above, I recommend making the behavior of PyTorch and cuDNN deterministic (this is particularly relevant when using GPUs). We can also define a function for that:
def set_deterministic():
    if torch.cuda.is_available():
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
    # Note: in more recent PyTorch versions, torch.use_deterministic_algorithms(True)
    # replaces the deprecated torch.set_deterministic(True).
    torch.set_deterministic(True)
##########################
### SETTINGS
##########################
# Hyperparameters
RANDOM_SEED = 1
LEARNING_RATE = 0.0001
BATCH_SIZE = 256
NUM_EPOCHS = 40
# Architecture
NUM_CLASSES = 10
# Other
DEVICE = "cuda:0"
set_all_seeds(RANDOM_SEED)
# Deterministic behavior not yet supported by AdaptiveAvgPool2d
#set_deterministic()
import sys
sys.path.insert(0, "..") # to include ../helper_evaluate.py etc.
from helper_evaluate import compute_accuracy
from helper_data import get_dataloaders_cifar10
from helper_train import train_classifier_simple_v1
### Set random seed ###
set_all_seeds(RANDOM_SEED)
##########################
### Dataset
##########################
train_transforms = transforms.Compose([transforms.Resize((70, 70)),
                                       transforms.RandomCrop((64, 64)),
                                       transforms.ToTensor()])

test_transforms = transforms.Compose([transforms.Resize((70, 70)),
                                      transforms.CenterCrop((64, 64)),
                                      transforms.ToTensor()])
train_loader, valid_loader, test_loader = get_dataloaders_cifar10(
    batch_size=BATCH_SIZE,
    num_workers=2,
    train_transforms=train_transforms,
    test_transforms=test_transforms,
    validation_fraction=0.1)
Files already downloaded and verified
# Checking the dataset
print('Training Set:\n')
for images, labels in train_loader:
    print('Image batch dimensions:', images.size())
    print('Image label dimensions:', labels.size())
    print(labels[:10])
    break
# Checking the dataset
print('\nValidation Set:')
for images, labels in valid_loader:
    print('Image batch dimensions:', images.size())
    print('Image label dimensions:', labels.size())
    print(labels[:10])
    break
# Checking the dataset
print('\nTesting Set:')
for images, labels in test_loader:
    print('Image batch dimensions:', images.size())
    print('Image label dimensions:', labels.size())
    print(labels[:10])
    break
Training Set:

Image batch dimensions: torch.Size([256, 3, 64, 64])
Image label dimensions: torch.Size([256])
tensor([0, 2, 3, 5, 4, 8, 9, 6, 9, 7])

Validation Set:
Image batch dimensions: torch.Size([256, 3, 64, 64])
Image label dimensions: torch.Size([256])
tensor([6, 9, 3, 5, 7, 3, 4, 1, 8, 0])

Testing Set:
Image batch dimensions: torch.Size([256, 3, 64, 64])
Image label dimensions: torch.Size([256])
tensor([2, 6, 3, 1, 1, 1, 1, 2, 4, 8])
##########################
### MODEL
##########################
class AlexNet(nn.Module):

    def __init__(self, num_classes):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2, groups=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1, groups=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1, groups=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        logits = self.classifier(x)
        return logits
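Before training, a quick shape check can confirm that the grouped feature extractor and the classifier fit together for the 64x64 CIFAR-10 crops used here. This is a small sketch of mine, not part of the original notebook; it uses a throwaway model instance (sanity_model) on the CPU:

sanity_model = AlexNet(num_classes=NUM_CLASSES)
dummy_batch = torch.randn(4, 3, 64, 64)  # same size as the resized/cropped CIFAR-10 images
with torch.no_grad():
    logits = sanity_model(dummy_batch)
print(logits.shape)  # expected: torch.Size([4, 10])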
torch.manual_seed(RANDOM_SEED)
model = AlexNet(NUM_CLASSES)
model.to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
log_dict = train_classifier_simple_v1(num_epochs=NUM_EPOCHS, model=model,
optimizer=optimizer, device=DEVICE,
train_loader=train_loader, valid_loader=valid_loader,
logging_interval=50)
Epoch: 001/040 | Batch 0000/0175 | Loss: 2.3021 Epoch: 001/040 | Batch 0050/0175 | Loss: 2.0457 Epoch: 001/040 | Batch 0100/0175 | Loss: 1.8939 Epoch: 001/040 | Batch 0150/0175 | Loss: 1.8882 ***Epoch: 001/040 | Train. Acc.: 32.408% | Loss: 1.743 ***Epoch: 001/040 | Valid. Acc.: 33.340% | Loss: 1.714 Time elapsed: 1.02 min Epoch: 002/040 | Batch 0000/0175 | Loss: 1.7017 Epoch: 002/040 | Batch 0050/0175 | Loss: 1.7092 Epoch: 002/040 | Batch 0100/0175 | Loss: 1.6315 Epoch: 002/040 | Batch 0150/0175 | Loss: 1.5158 ***Epoch: 002/040 | Train. Acc.: 41.415% | Loss: 1.555 ***Epoch: 002/040 | Valid. Acc.: 42.140% | Loss: 1.538 Time elapsed: 2.06 min Epoch: 003/040 | Batch 0000/0175 | Loss: 1.5034 Epoch: 003/040 | Batch 0050/0175 | Loss: 1.5797 Epoch: 003/040 | Batch 0100/0175 | Loss: 1.4733 Epoch: 003/040 | Batch 0150/0175 | Loss: 1.2810 ***Epoch: 003/040 | Train. Acc.: 48.246% | Loss: 1.380 ***Epoch: 003/040 | Valid. Acc.: 49.020% | Loss: 1.373 Time elapsed: 3.09 min Epoch: 004/040 | Batch 0000/0175 | Loss: 1.3568 Epoch: 004/040 | Batch 0050/0175 | Loss: 1.3999 Epoch: 004/040 | Batch 0100/0175 | Loss: 1.4197 Epoch: 004/040 | Batch 0150/0175 | Loss: 1.3423 ***Epoch: 004/040 | Train. Acc.: 52.406% | Loss: 1.289 ***Epoch: 004/040 | Valid. Acc.: 53.300% | Loss: 1.280 Time elapsed: 4.13 min Epoch: 005/040 | Batch 0000/0175 | Loss: 1.2611 Epoch: 005/040 | Batch 0050/0175 | Loss: 1.2486 Epoch: 005/040 | Batch 0100/0175 | Loss: 1.2251 Epoch: 005/040 | Batch 0150/0175 | Loss: 1.2385 ***Epoch: 005/040 | Train. Acc.: 52.710% | Loss: 1.277 ***Epoch: 005/040 | Valid. Acc.: 54.220% | Loss: 1.255 Time elapsed: 5.17 min Epoch: 006/040 | Batch 0000/0175 | Loss: 1.3014 Epoch: 006/040 | Batch 0050/0175 | Loss: 1.2493 Epoch: 006/040 | Batch 0100/0175 | Loss: 1.1972 Epoch: 006/040 | Batch 0150/0175 | Loss: 1.1217 ***Epoch: 006/040 | Train. Acc.: 58.699% | Loss: 1.130 ***Epoch: 006/040 | Valid. Acc.: 58.900% | Loss: 1.133 Time elapsed: 6.20 min Epoch: 007/040 | Batch 0000/0175 | Loss: 1.1142 Epoch: 007/040 | Batch 0050/0175 | Loss: 1.1751 Epoch: 007/040 | Batch 0100/0175 | Loss: 1.1640 Epoch: 007/040 | Batch 0150/0175 | Loss: 1.3103 ***Epoch: 007/040 | Train. Acc.: 59.344% | Loss: 1.130 ***Epoch: 007/040 | Valid. Acc.: 58.880% | Loss: 1.139 Time elapsed: 7.23 min Epoch: 008/040 | Batch 0000/0175 | Loss: 1.0524 Epoch: 008/040 | Batch 0050/0175 | Loss: 1.1116 Epoch: 008/040 | Batch 0100/0175 | Loss: 1.0474 Epoch: 008/040 | Batch 0150/0175 | Loss: 1.0964 ***Epoch: 008/040 | Train. Acc.: 60.248% | Loss: 1.114 ***Epoch: 008/040 | Valid. Acc.: 59.220% | Loss: 1.136 Time elapsed: 8.28 min Epoch: 009/040 | Batch 0000/0175 | Loss: 1.1965 Epoch: 009/040 | Batch 0050/0175 | Loss: 1.0852 Epoch: 009/040 | Batch 0100/0175 | Loss: 1.1306 Epoch: 009/040 | Batch 0150/0175 | Loss: 1.0086 ***Epoch: 009/040 | Train. Acc.: 61.824% | Loss: 1.058 ***Epoch: 009/040 | Valid. Acc.: 60.620% | Loss: 1.076 Time elapsed: 9.31 min Epoch: 010/040 | Batch 0000/0175 | Loss: 1.0530 Epoch: 010/040 | Batch 0050/0175 | Loss: 1.0641 Epoch: 010/040 | Batch 0100/0175 | Loss: 0.9715 Epoch: 010/040 | Batch 0150/0175 | Loss: 1.1926 ***Epoch: 010/040 | Train. Acc.: 64.944% | Loss: 0.977 ***Epoch: 010/040 | Valid. Acc.: 62.820% | Loss: 1.028 Time elapsed: 10.34 min Epoch: 011/040 | Batch 0000/0175 | Loss: 1.0312 Epoch: 011/040 | Batch 0050/0175 | Loss: 1.0072 Epoch: 011/040 | Batch 0100/0175 | Loss: 0.9125 Epoch: 011/040 | Batch 0150/0175 | Loss: 1.0450 ***Epoch: 011/040 | Train. Acc.: 65.922% | Loss: 0.945 ***Epoch: 011/040 | Valid. 
Acc.: 64.360% | Loss: 0.990 Time elapsed: 11.37 min Epoch: 012/040 | Batch 0000/0175 | Loss: 1.0078 Epoch: 012/040 | Batch 0050/0175 | Loss: 1.0750 Epoch: 012/040 | Batch 0100/0175 | Loss: 0.8935 Epoch: 012/040 | Batch 0150/0175 | Loss: 0.9567 ***Epoch: 012/040 | Train. Acc.: 67.286% | Loss: 0.907 ***Epoch: 012/040 | Valid. Acc.: 64.140% | Loss: 0.983 Time elapsed: 12.40 min Epoch: 013/040 | Batch 0000/0175 | Loss: 0.9305 Epoch: 013/040 | Batch 0050/0175 | Loss: 0.8980 Epoch: 013/040 | Batch 0100/0175 | Loss: 1.0309 Epoch: 013/040 | Batch 0150/0175 | Loss: 0.9596 ***Epoch: 013/040 | Train. Acc.: 69.614% | Loss: 0.853 ***Epoch: 013/040 | Valid. Acc.: 66.480% | Loss: 0.926 Time elapsed: 13.43 min Epoch: 014/040 | Batch 0000/0175 | Loss: 0.8409 Epoch: 014/040 | Batch 0050/0175 | Loss: 0.9902 Epoch: 014/040 | Batch 0100/0175 | Loss: 0.8945 Epoch: 014/040 | Batch 0150/0175 | Loss: 0.9077 ***Epoch: 014/040 | Train. Acc.: 71.190% | Loss: 0.822 ***Epoch: 014/040 | Valid. Acc.: 67.460% | Loss: 0.915 Time elapsed: 14.45 min Epoch: 015/040 | Batch 0000/0175 | Loss: 0.9422 Epoch: 015/040 | Batch 0050/0175 | Loss: 0.8516 Epoch: 015/040 | Batch 0100/0175 | Loss: 0.7501 Epoch: 015/040 | Batch 0150/0175 | Loss: 0.8076 ***Epoch: 015/040 | Train. Acc.: 70.270% | Loss: 0.832 ***Epoch: 015/040 | Valid. Acc.: 66.460% | Loss: 0.936 Time elapsed: 15.49 min Epoch: 016/040 | Batch 0000/0175 | Loss: 0.8759 Epoch: 016/040 | Batch 0050/0175 | Loss: 0.9273 Epoch: 016/040 | Batch 0100/0175 | Loss: 0.8645 Epoch: 016/040 | Batch 0150/0175 | Loss: 0.8602 ***Epoch: 016/040 | Train. Acc.: 69.623% | Loss: 0.859 ***Epoch: 016/040 | Valid. Acc.: 65.780% | Loss: 0.967 Time elapsed: 16.53 min Epoch: 017/040 | Batch 0000/0175 | Loss: 0.9171 Epoch: 017/040 | Batch 0050/0175 | Loss: 0.8242 Epoch: 017/040 | Batch 0100/0175 | Loss: 0.7830 Epoch: 017/040 | Batch 0150/0175 | Loss: 0.9317 ***Epoch: 017/040 | Train. Acc.: 72.188% | Loss: 0.794 ***Epoch: 017/040 | Valid. Acc.: 67.400% | Loss: 0.948 Time elapsed: 17.56 min Epoch: 018/040 | Batch 0000/0175 | Loss: 0.7613 Epoch: 018/040 | Batch 0050/0175 | Loss: 0.8159 Epoch: 018/040 | Batch 0100/0175 | Loss: 0.8606 Epoch: 018/040 | Batch 0150/0175 | Loss: 0.8943 ***Epoch: 018/040 | Train. Acc.: 74.721% | Loss: 0.720 ***Epoch: 018/040 | Valid. Acc.: 69.640% | Loss: 0.866 Time elapsed: 18.60 min Epoch: 019/040 | Batch 0000/0175 | Loss: 0.7614 Epoch: 019/040 | Batch 0050/0175 | Loss: 0.7849 Epoch: 019/040 | Batch 0100/0175 | Loss: 0.8485 Epoch: 019/040 | Batch 0150/0175 | Loss: 0.8462 ***Epoch: 019/040 | Train. Acc.: 75.212% | Loss: 0.704 ***Epoch: 019/040 | Valid. Acc.: 69.440% | Loss: 0.872 Time elapsed: 19.64 min Epoch: 020/040 | Batch 0000/0175 | Loss: 0.6625 Epoch: 020/040 | Batch 0050/0175 | Loss: 0.7826 Epoch: 020/040 | Batch 0100/0175 | Loss: 0.7387 Epoch: 020/040 | Batch 0150/0175 | Loss: 0.7622 ***Epoch: 020/040 | Train. Acc.: 76.560% | Loss: 0.663 ***Epoch: 020/040 | Valid. Acc.: 69.820% | Loss: 0.859 Time elapsed: 20.68 min Epoch: 021/040 | Batch 0000/0175 | Loss: 0.7006 Epoch: 021/040 | Batch 0050/0175 | Loss: 0.6893 Epoch: 021/040 | Batch 0100/0175 | Loss: 0.6352 Epoch: 021/040 | Batch 0150/0175 | Loss: 0.7598 ***Epoch: 021/040 | Train. Acc.: 76.339% | Loss: 0.674 ***Epoch: 021/040 | Valid. Acc.: 70.540% | Loss: 0.859 Time elapsed: 21.70 min Epoch: 022/040 | Batch 0000/0175 | Loss: 0.7093 Epoch: 022/040 | Batch 0050/0175 | Loss: 0.5893 Epoch: 022/040 | Batch 0100/0175 | Loss: 0.6019 Epoch: 022/040 | Batch 0150/0175 | Loss: 0.7325 ***Epoch: 022/040 | Train. 
Acc.: 78.268% | Loss: 0.623 ***Epoch: 022/040 | Valid. Acc.: 70.520% | Loss: 0.851 Time elapsed: 22.73 min Epoch: 023/040 | Batch 0000/0175 | Loss: 0.6316 Epoch: 023/040 | Batch 0050/0175 | Loss: 0.5694 Epoch: 023/040 | Batch 0100/0175 | Loss: 0.7315 Epoch: 023/040 | Batch 0150/0175 | Loss: 0.6656 ***Epoch: 023/040 | Train. Acc.: 79.455% | Loss: 0.589 ***Epoch: 023/040 | Valid. Acc.: 71.440% | Loss: 0.828 Time elapsed: 23.77 min Epoch: 024/040 | Batch 0000/0175 | Loss: 0.5285 Epoch: 024/040 | Batch 0050/0175 | Loss: 0.6959 Epoch: 024/040 | Batch 0100/0175 | Loss: 0.5504 Epoch: 024/040 | Batch 0150/0175 | Loss: 0.6831 ***Epoch: 024/040 | Train. Acc.: 80.174% | Loss: 0.570 ***Epoch: 024/040 | Valid. Acc.: 70.540% | Loss: 0.830 Time elapsed: 24.82 min Epoch: 025/040 | Batch 0000/0175 | Loss: 0.6270 Epoch: 025/040 | Batch 0050/0175 | Loss: 0.6128 Epoch: 025/040 | Batch 0100/0175 | Loss: 0.5769 Epoch: 025/040 | Batch 0150/0175 | Loss: 0.6409 ***Epoch: 025/040 | Train. Acc.: 80.933% | Loss: 0.547 ***Epoch: 025/040 | Valid. Acc.: 72.340% | Loss: 0.822 Time elapsed: 25.85 min Epoch: 026/040 | Batch 0000/0175 | Loss: 0.5859 Epoch: 026/040 | Batch 0050/0175 | Loss: 0.5577 Epoch: 026/040 | Batch 0100/0175 | Loss: 0.6651 Epoch: 026/040 | Batch 0150/0175 | Loss: 0.5483 ***Epoch: 026/040 | Train. Acc.: 79.150% | Loss: 0.589 ***Epoch: 026/040 | Valid. Acc.: 70.240% | Loss: 0.907 Time elapsed: 26.90 min Epoch: 027/040 | Batch 0000/0175 | Loss: 0.6005 Epoch: 027/040 | Batch 0050/0175 | Loss: 0.5660 Epoch: 027/040 | Batch 0100/0175 | Loss: 0.6606 Epoch: 027/040 | Batch 0150/0175 | Loss: 0.5047 ***Epoch: 027/040 | Train. Acc.: 81.045% | Loss: 0.539 ***Epoch: 027/040 | Valid. Acc.: 71.120% | Loss: 0.864 Time elapsed: 27.94 min Epoch: 028/040 | Batch 0000/0175 | Loss: 0.5897 Epoch: 028/040 | Batch 0050/0175 | Loss: 0.5210 Epoch: 028/040 | Batch 0100/0175 | Loss: 0.5563 Epoch: 028/040 | Batch 0150/0175 | Loss: 0.5192 ***Epoch: 028/040 | Train. Acc.: 83.891% | Loss: 0.464 ***Epoch: 028/040 | Valid. Acc.: 72.400% | Loss: 0.815 Time elapsed: 28.98 min Epoch: 029/040 | Batch 0000/0175 | Loss: 0.5087 Epoch: 029/040 | Batch 0050/0175 | Loss: 0.6121 Epoch: 029/040 | Batch 0100/0175 | Loss: 0.5465 Epoch: 029/040 | Batch 0150/0175 | Loss: 0.4414 ***Epoch: 029/040 | Train. Acc.: 82.757% | Loss: 0.493 ***Epoch: 029/040 | Valid. Acc.: 71.080% | Loss: 0.851 Time elapsed: 30.01 min Epoch: 030/040 | Batch 0000/0175 | Loss: 0.5460 Epoch: 030/040 | Batch 0050/0175 | Loss: 0.5083 Epoch: 030/040 | Batch 0100/0175 | Loss: 0.4999 Epoch: 030/040 | Batch 0150/0175 | Loss: 0.5453 ***Epoch: 030/040 | Train. Acc.: 83.397% | Loss: 0.469 ***Epoch: 030/040 | Valid. Acc.: 71.660% | Loss: 0.869 Time elapsed: 31.03 min Epoch: 031/040 | Batch 0000/0175 | Loss: 0.4998 Epoch: 031/040 | Batch 0050/0175 | Loss: 0.4808 Epoch: 031/040 | Batch 0100/0175 | Loss: 0.4958 Epoch: 031/040 | Batch 0150/0175 | Loss: 0.5201 ***Epoch: 031/040 | Train. Acc.: 84.167% | Loss: 0.447 ***Epoch: 031/040 | Valid. Acc.: 71.980% | Loss: 0.873 Time elapsed: 32.07 min Epoch: 032/040 | Batch 0000/0175 | Loss: 0.3548 Epoch: 032/040 | Batch 0050/0175 | Loss: 0.4062 Epoch: 032/040 | Batch 0100/0175 | Loss: 0.4292 Epoch: 032/040 | Batch 0150/0175 | Loss: 0.4786 ***Epoch: 032/040 | Train. Acc.: 85.027% | Loss: 0.430 ***Epoch: 032/040 | Valid. 
Acc.: 71.920% | Loss: 0.899 Time elapsed: 33.10 min Epoch: 033/040 | Batch 0000/0175 | Loss: 0.4133 Epoch: 033/040 | Batch 0050/0175 | Loss: 0.3402 Epoch: 033/040 | Batch 0100/0175 | Loss: 0.3988 Epoch: 033/040 | Batch 0150/0175 | Loss: 0.4555 ***Epoch: 033/040 | Train. Acc.: 87.281% | Loss: 0.373 ***Epoch: 033/040 | Valid. Acc.: 72.960% | Loss: 0.846 Time elapsed: 34.13 min Epoch: 034/040 | Batch 0000/0175 | Loss: 0.4540 Epoch: 034/040 | Batch 0050/0175 | Loss: 0.5704 Epoch: 034/040 | Batch 0100/0175 | Loss: 0.5121 Epoch: 034/040 | Batch 0150/0175 | Loss: 0.3992 ***Epoch: 034/040 | Train. Acc.: 88.328% | Loss: 0.340 ***Epoch: 034/040 | Valid. Acc.: 73.600% | Loss: 0.847 Time elapsed: 35.17 min Epoch: 035/040 | Batch 0000/0175 | Loss: 0.3568 Epoch: 035/040 | Batch 0050/0175 | Loss: 0.4348 Epoch: 035/040 | Batch 0100/0175 | Loss: 0.3955 Epoch: 035/040 | Batch 0150/0175 | Loss: 0.4242 ***Epoch: 035/040 | Train. Acc.: 88.502% | Loss: 0.338 ***Epoch: 035/040 | Valid. Acc.: 72.520% | Loss: 0.874 Time elapsed: 36.20 min Epoch: 036/040 | Batch 0000/0175 | Loss: 0.2860 Epoch: 036/040 | Batch 0050/0175 | Loss: 0.3446 Epoch: 036/040 | Batch 0100/0175 | Loss: 0.4947 Epoch: 036/040 | Batch 0150/0175 | Loss: 0.3412 ***Epoch: 036/040 | Train. Acc.: 88.482% | Loss: 0.325 ***Epoch: 036/040 | Valid. Acc.: 72.760% | Loss: 0.905 Time elapsed: 37.24 min Epoch: 037/040 | Batch 0000/0175 | Loss: 0.3963 Epoch: 037/040 | Batch 0050/0175 | Loss: 0.3388 Epoch: 037/040 | Batch 0100/0175 | Loss: 0.3279 Epoch: 037/040 | Batch 0150/0175 | Loss: 0.3965 ***Epoch: 037/040 | Train. Acc.: 90.065% | Loss: 0.294 ***Epoch: 037/040 | Valid. Acc.: 71.740% | Loss: 0.904 Time elapsed: 38.27 min Epoch: 038/040 | Batch 0000/0175 | Loss: 0.3269 Epoch: 038/040 | Batch 0050/0175 | Loss: 0.3193 Epoch: 038/040 | Batch 0100/0175 | Loss: 0.3548 Epoch: 038/040 | Batch 0150/0175 | Loss: 0.3414 ***Epoch: 038/040 | Train. Acc.: 90.460% | Loss: 0.280 ***Epoch: 038/040 | Valid. Acc.: 73.400% | Loss: 0.904 Time elapsed: 39.30 min Epoch: 039/040 | Batch 0000/0175 | Loss: 0.2794 Epoch: 039/040 | Batch 0050/0175 | Loss: 0.2914 Epoch: 039/040 | Batch 0100/0175 | Loss: 0.3197 Epoch: 039/040 | Batch 0150/0175 | Loss: 0.2898 ***Epoch: 039/040 | Train. Acc.: 90.545% | Loss: 0.276 ***Epoch: 039/040 | Valid. Acc.: 73.240% | Loss: 0.922 Time elapsed: 40.33 min Epoch: 040/040 | Batch 0000/0175 | Loss: 0.3442 Epoch: 040/040 | Batch 0050/0175 | Loss: 0.3222 Epoch: 040/040 | Batch 0100/0175 | Loss: 0.4111 Epoch: 040/040 | Batch 0150/0175 | Loss: 0.3112 ***Epoch: 040/040 | Train. Acc.: 89.846% | Loss: 0.292 ***Epoch: 040/040 | Valid. Acc.: 71.940% | Loss: 0.997 Time elapsed: 41.36 min Total Training Time: 41.36 min
import matplotlib.pyplot as plt
%matplotlib inline
loss_list = log_dict['train_loss_per_batch']
plt.plot(loss_list, label='Minibatch loss')
plt.plot(np.convolve(loss_list, np.ones(200) / 200, mode='valid'),
         label='Running average')
plt.ylabel('Cross Entropy')
plt.xlabel('Iteration')
plt.legend()
plt.show()
plt.plot(np.arange(1, NUM_EPOCHS+1), log_dict['train_acc_per_epoch'], label='Training')
plt.plot(np.arange(1, NUM_EPOCHS+1), log_dict['valid_acc_per_epoch'], label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
with torch.set_grad_enabled(False):  # disable gradient tracking for evaluation

    train_acc = compute_accuracy(model=model,
                                 data_loader=train_loader,
                                 device=DEVICE)

    valid_acc = compute_accuracy(model=model,
                                 data_loader=valid_loader,
                                 device=DEVICE)

    test_acc = compute_accuracy(model=model,
                                data_loader=test_loader,
                                device=DEVICE)

print(f'Train ACC: {train_acc:.2f}%')
print(f'Validation ACC: {valid_acc:.2f}%')
print(f'Test ACC: {test_acc:.2f}%')
Train ACC: 71.94%
Validation ACC: 71.94%
Test ACC: 70.93%
%watermark -iv
torch      : 1.10.1
torchvision: 0.11.2
numpy      : 1.22.0
pandas     : 1.4.1
sys        : 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) [GCC 9.4.0]
matplotlib : 3.3.4
PIL        : 9.0.1