Simplest Neural Network - linear layer with activation function
In the previous post I've explained what is the most important concept in neural networks - technique that allows us to incrementally find minimum of a function. This is called Gradient Descent algorithm!
In this post I will build on this concept and show you how to create a basic linear model to predict what is on a medical image!
As in the first post showing how to build medical image recognition with pure statistics, we need to download data first. For some basic description of how the data looks like, see that first post first ;)
! git clone https://github.com/apolanco3225/Medical-MNIST-Classification.git
! rm -rf ./medical_mnist
! mv Medical-MNIST-Classification/resized/ ./medical_mnist
! rm -rf Medical-MNIST-Classification
Cloning into 'Medical-MNIST-Classification'... remote: Enumerating objects: 58532, done. remote: Total 58532 (delta 0), reused 0 (delta 0), pack-reused 58532 Receiving objects: 100% (58532/58532), 77.86 MiB | 17.62 MiB/s, done. Resolving deltas: 100% (506/506), done. Checking connectivity... done. Checking out files: 100% (58959/58959), done.
from pathlib import Path
PATH = Path("medical_mnist/")
We have much more powerful tools now, so we will deal with all 6 classes now, but first need to prepare data:
classes = [cls.name for cls in PATH.iterdir()]
classes
['AbdomenCT', 'BreastMRI', 'CXR', 'ChestCT', 'Hand', 'HeadCT']
The plan is to
images = {}
for cls in classes:
images[cls] = list((PATH/cls).iterdir())
from PIL import Image
import torch
from torchvision.transforms import ToTensor
image_tensors = {}
for cls in classes:
image_tensors[cls] = torch.stack( # converts iterable of tensors into higher dimention single tensor
[
ToTensor()( # converts images to tensors
Image.open(path)
).view(-1, 64 * 64).squeeze().float()/255 # reshape tensor from 64x64 to vector tensor of size 4096 and convert values
for path in (PATH/cls).iterdir()]
)
so let's see what we got there
for cls in classes:
class_shape = image_tensors[cls].shape
print(f"{cls} has {class_shape[0]} images of a size {class_shape[1:]}")
AbdomenCT has 10000 images of a size torch.Size([4096]) BreastMRI has 8954 images of a size torch.Size([4096]) CXR has 10000 images of a size torch.Size([4096]) ChestCT has 10000 images of a size torch.Size([4096]) Hand has 10000 images of a size torch.Size([4096]) HeadCT has 10000 images of a size torch.Size([4096])
x_train = torch.cat([image_tensors[cls] for cls in classes], dim=0)
y_train = torch.cat([torch.tensor([index] * image_tensors[cls].shape[0]) for index, cls in enumerate(classes)])
shuffle the dataset. This is important as if we don't do this, images from the classes that we train first will be effectively not as "fresh" in the memmory of the network
permutations = torch.randperm(x_train.shape[0])
x_train = x_train[permutations]
y_train = y_train[permutations]
create validation set that is 20% of the training set - this is important to asses the performance of the model. Validation set doesn't take part in training, so model is not biased towards those images (it cannot remember those exact images from training). This is essential to see how well model generalizes on examples it has not seen.
valid_pct = 0.2
valid_index = int(x_train.shape[0] * valid_pct)
valid_index
11790
# we take out first 20% of examples from the training set
x_valid = x_train[:valid_index]
y_valid = y_train[:valid_index]
x_train = x_train[valid_index:]
y_train = y_train[valid_index:]
x_train.shape, y_train.shape, x_valid.shape, y_valid.shape
(torch.Size([47164, 4096]), torch.Size([47164]), torch.Size([11790, 4096]), torch.Size([11790]))
Now we need to build the model. Let's use the most basic building blocks of the neural network - linear layer (linear_layer
function) and nonlinearity (softmax
function).
# it normalizes all 10 classes so we can treat each class prediction as probability that add up to 1.0
def softmax(x):
return x - x.exp().sum(-1).unsqueeze(-1)
def linear_layer(x):
return x @ weights + bias
the model is a simple function composition (results of linear layer and fed into nonlinear layer (softmax
)
def model(x):
return softmax(linear_layer(x))
For the Gradient Descent to work we also need to specify the loss function - this is crucial, as this is the function on which we compute gradients for our parameters. Just a quick recap: Gradient Descent algorithm finds out the values to change function parameters so the function values decreese.
In your case we minimize loss_func
. Parameters of this function (passed in a form of preds
) are in the model
: wegiths
and bias
. Gradient Descent will give us values to change each of those parameters so we minimize the loss_func
def loss_func(preds, targets):
return -preds[range(targets.shape[0]), targets].mean()
def accuracy(preds, targets):
return (torch.argmax(preds, dim=-1) == targets).float().mean()
And here is the Gradient Descent loop - see comments in the code for details
%%time
# number of training examples
n = x_train.shape[0]
# batch size - this is necessary as we won't be able to fit all
# the examples into the memmory, so we need to do the computations in batches
bs = 64
# how many epochs to train for
epochs = 15
weights = torch.zeros((64 * 64, 10), requires_grad=True) # define weights matrix
bias = torch.zeros(10, requires_grad=True) # and bias term
# in each of those epochs algorithm sees all the images. So in this case
# we see all the images 15 times
for epoch in range(epochs):
# here is the loop for batches: in each batch we:
# - see 64 images
# - compute predictions based on the model
# - compute the loss
# - compute gradients and update parameters (wegiths and bias)
for i in range((n - 1) // bs + 1):
# select images for this batch
start_i = i * bs
end_i = start_i + bs
xb = x_train[start_i:end_i]
yb = y_train[start_i:end_i]
# compute predictions
preds = model(xb)
# compute loss
loss = loss_func(preds, yb)
# compute gradients (this is done for us by PyTorch with this backwards function!)
loss.backward()
# this block is necessary, so computations we do below, are not taken into account when
# computing next gradients
with torch.no_grad():
# update parameters
weights -= weights.grad
bias -= bias.grad
# zero out the gradients so they are ready for the next batch (otherwise they accumulate values)
weights.grad.zero_()
bias.grad.zero_()
# eventually after each epoch (seeing all the images) we print out how we did
print(f"Epoch {epoch} accuracy: {accuracy(model(x_valid),y_valid)}%, loss: {loss_func(model(x_valid), y_valid)}")
Epoch 0 accuracy: 0.7695504426956177%, loss: 2.5434305667877197 Epoch 1 accuracy: 0.787277340888977%, loss: 2.3623108863830566 Epoch 2 accuracy: 0.7912637591362%, loss: 2.232290267944336 Epoch 3 accuracy: 0.8318066000938416%, loss: 2.134671926498413 Epoch 4 accuracy: 0.9314673542976379%, loss: 2.058687448501587 Epoch 5 accuracy: 0.9561492800712585%, loss: 1.9977604150772095 Epoch 6 accuracy: 0.9601356983184814%, loss: 1.9477065801620483 Epoch 7 accuracy: 0.9603053331375122%, loss: 1.9057585000991821 Epoch 8 accuracy: 0.9602205157279968%, loss: 1.8700224161148071 Epoch 9 accuracy: 0.9598812460899353%, loss: 1.8391577005386353 Epoch 10 accuracy: 0.9594571590423584%, loss: 1.8121910095214844 Epoch 11 accuracy: 0.9597116112709045%, loss: 1.788395881652832 Epoch 12 accuracy: 0.9596267938613892%, loss: 1.7672215700149536 Epoch 13 accuracy: 0.9598812460899353%, loss: 1.7482411861419678 Epoch 14 accuracy: 0.9598812460899353%, loss: 1.7311174869537354 CPU times: user 1min 5s, sys: 173 ms, total: 1min 5s Wall time: 9.22 s
With this simplest neural network we got to almost 96% accuracy - this is pretty good.