Simplest Neural Network - linear layer with activation function

- toc: false
- badges: true
- comments: true
- categories: [ML, medical]

In the previous post I explained the most important concept in neural networks: a technique that lets us incrementally find the minimum of a function. It is called the Gradient Descent algorithm!

In this post I will build on this concept and show you how to create a basic linear model to predict what is on a medical image!

As in the first post, which showed how to build medical image recognition with pure statistics, we need to download the data first. For a basic description of what the data looks like, see that first post ;)

In [1]:

```
! git clone https://github.com/apolanco3225/Medical-MNIST-Classification.git
! rm -rf ./medical_mnist
! mv Medical-MNIST-Classification/resized/ ./medical_mnist
! rm -rf Medical-MNIST-Classification
```

In [2]:

```
from pathlib import Path
PATH = Path("medical_mnist/")
```

We have much more powerful tools now, so we will deal with all 6 classes, but first we need to prepare the data:

- all data needs to be numerical
- it needs to be in arrays
- it needs to be labeled
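
As a rough illustration of those three requirements, here is a small sketch on made-up data. The random "images" and the names `fake_images`, `ClassA` and `ClassB` are assumptions for this example only; the real data is prepared in the cells that follow.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for real images: 3 random 64x64 "images" per class.
fake_images = {
    "ClassA": [torch.rand(64, 64) for _ in range(3)],
    "ClassB": [torch.rand(64, 64) for _ in range(3)],
}
class_names = sorted(fake_images)

# 1. numerical and 2. in arrays: flatten each image to a vector and stack the vectors.
x = torch.stack([img.reshape(-1) for name in class_names for img in fake_images[name]])

# 3. labeled: one integer per image, the index of its class.
y = torch.tensor([idx for idx, name in enumerate(class_names) for _ in fake_images[name]])

print(x.shape, y.tolist())
```

Each row of `x` is one flattened image and the entry of `y` at the same position is its class index.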

In [3]:

```
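# sorted so that the class order (and therefore the label indices) is the same on every run
classes = sorted(cls.name for cls in PATH.iterdir() if cls.is_dir())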
classes
```

Out[3]:

['AbdomenCT', 'BreastMRI', 'CXR', 'ChestCT', 'Hand', 'HeadCT']

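The plan is to list the image files of each class, load every image and convert it to a tensor, and finally combine everything into one labeled dataset: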

In [4]:

```
images = {}
for cls in classes:
    images[cls] = list((PATH/cls).iterdir())
```

In [5]:

```
from PIL import Image
```

In [6]:

```
import torch
from torchvision.transforms import ToTensor

to_tensor = ToTensor()  # converts PIL images to float tensors
image_tensors = {}
for cls in classes:
    # torch.stack converts an iterable of tensors into a single higher-dimensional tensor
    image_tensors[cls] = torch.stack(
        [
            # reshape each 64x64 image into a vector of 4096 values; note that for 8-bit
            # images ToTensor already scales pixels to [0, 1], so dividing by 255 shrinks them further
            to_tensor(Image.open(path)).view(-1, 64 * 64).squeeze().float() / 255
            for path in (PATH/cls).iterdir()
        ]
    )
```

Let's see what we got:

In [7]:

```
for cls in classes:
    class_shape = image_tensors[cls].shape
    print(f"{cls} has {class_shape[0]} images of size {class_shape[1:]}")
```

In [8]:

```
x_train = torch.cat([image_tensors[cls] for cls in classes], dim=0)
y_train = torch.cat([torch.tensor([index] * image_tensors[cls].shape[0]) for index, cls in enumerate(classes)])
```

In [9]:

```
permutations = torch.randperm(x_train.shape[0])
```

In [10]:

```
x_train = x_train[permutations]
y_train = y_train[permutations]
```
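
The important detail in the shuffling step is that the same permutation is applied to both the images and the labels, so every image keeps its own label. A tiny sketch on toy data (the tensors here are made up so the alignment is easy to check):

```python
import torch

torch.manual_seed(0)

# Toy data: row i of x is filled with the value i, and its label is i too.
x = torch.arange(5).float().unsqueeze(1).repeat(1, 3)  # shape (5, 3)
y = torch.arange(5)

perm = torch.randperm(x.shape[0])  # a random ordering of the row indices
x_shuffled, y_shuffled = x[perm], y[perm]

# Every row still sits next to its own label after shuffling.
print(torch.equal(x_shuffled[:, 0].long(), y_shuffled))
```

If `x` and `y` were shuffled with two different permutations, this check would almost always fail.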

In [11]:

```
valid_pct = 0.2
valid_index = int(x_train.shape[0] * valid_pct)
valid_index
```

Out[11]:

11790

In [12]:

```
# we take out first 20% of examples from the training set
x_valid = x_train[:valid_index]
y_valid = y_train[:valid_index]
x_train = x_train[valid_index:]
y_train = y_train[valid_index:]
```

In [13]:

```
x_train.shape, y_train.shape, x_valid.shape, y_valid.shape
```

Out[13]:

(torch.Size([47164, 4096]), torch.Size([47164]), torch.Size([11790, 4096]), torch.Size([11790]))
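
To double check these numbers, here is a small sketch of the split arithmetic. The helper name `split_point` is made up for this example; the totals are taken from the shapes printed above.

```python
def split_point(n_examples, valid_pct):
    """Index separating the validation part (before it) from the training part (after it)."""
    return int(n_examples * valid_pct)

n_total = 47164 + 11790  # all images: training rows plus validation rows
valid_index = split_point(n_total, 0.2)
n_valid, n_train = valid_index, n_total - valid_index

print(n_valid, n_train)
```

Because the data was shuffled before slicing, taking the first 20% of rows gives a random sample of all classes rather than images of a single class.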

The model consists of a linear part (the `linear_layer` function) and a nonlinearity (the `softmax` function).

In [14]:

```
# it normalizes the scores of all classes so that, after exponentiating, each row adds up to 1.0;
# it returns log-probabilities (log-softmax), which is the form loss_func expects
def softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)
```
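
To make the normalization concrete, here is a minimal sketch of a log-softmax checked against PyTorch's built-in `torch.log_softmax`. The function name `log_softmax_sketch` and the example scores are assumptions for this illustration, not part of the notebook.

```python
import torch

def log_softmax_sketch(x):
    # log(softmax(x)) = x - log(sum(exp(x))), computed separately for each row
    return x - x.logsumexp(dim=-1, keepdim=True)

scores = torch.tensor([[2.0, 1.0, 0.1],
                       [0.0, 0.0, 0.0]])
log_probs = log_softmax_sketch(scores)
probs = log_probs.exp()

print(probs.sum(dim=-1))  # each row of probabilities sums to 1
print(torch.allclose(log_probs, torch.log_softmax(scores, dim=-1)))
```

Working with log-probabilities rather than probabilities is a common choice because the loss then only needs to pick out one value per example.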

In [15]:

```
def linear_layer(x):
    return x @ weights + bias
```
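
It may help to look at the shapes involved in the linear layer. The sketch below uses random stand-in data; the names `w_demo`, `b_demo` and the batch of 4 are assumptions for illustration only.

```python
import torch

torch.manual_seed(0)

n_pixels, n_classes, batch_size = 64 * 64, 6, 4  # 6 classes as in this dataset

w_demo = torch.randn(n_pixels, n_classes) * 0.01  # one column of weights per class
b_demo = torch.zeros(n_classes)                   # one bias per class

xb = torch.rand(batch_size, n_pixels)  # stand-in for a batch of flattened images
scores = xb @ w_demo + b_demo          # the bias is broadcast over the batch dimension

print(scores.shape)

# Row 0 of the result is the dot product of image 0 with each weight column, plus the bias.
manual_row = (xb[0].unsqueeze(1) * w_demo).sum(0) + b_demo
print(torch.allclose(scores[0], manual_row, atol=1e-4))
```

So every image in the batch gets one score per class, and the class with the highest score is the prediction.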

The model is just the two chained together: `linear_layer` followed by `softmax`.

In [16]:

```
def model(x):
    return softmax(linear_layer(x))
```

For Gradient Descent to work we also need to specify a loss function. This is crucial, as it is the function whose gradients we compute with respect to our parameters. A quick recap: the Gradient Descent algorithm tells us how to change the parameters of a function so that the function's value decreases.

In our case we minimize `loss_func`. The parameters of this function (which reach it in the form of `preds`) live in the `model`: `weights` and `bias`. Gradient Descent will tell us how to change each of those parameters so that `loss_func` gets smaller.
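
As a quick reminder of the mechanics, here is a minimal sketch of Gradient Descent on a one-parameter toy function. The function `(w - 3) ** 2` and the step size `lr = 0.1` are assumptions chosen for this example, not something used in the model.

```python
import torch

# Hypothetical one-parameter "loss": f(w) = (w - 3)^2, which is smallest at w = 3.
w = torch.tensor(0.0, requires_grad=True)
lr = 0.1  # step size (an assumption for this toy example)

for step in range(100):
    loss = (w - 3) ** 2
    loss.backward()        # fills w.grad with d(loss)/dw
    with torch.no_grad():
        w -= lr * w.grad   # move against the gradient
        w.grad.zero_()     # reset, otherwise gradients accumulate

print(w.item())
```

After enough steps `w` ends up very close to 3, the minimum of the toy function. The training loop for the image model follows the same pattern, just with many more parameters.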

In [17]:

```
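# average of minus the model output for the correct class
# (a negative log-likelihood when preds are log-probabilities)
def loss_func(preds, targets):
    return -preds[range(targets.shape[0]), targets].mean()

# fraction of examples whose highest-scoring class matches the label
def accuracy(preds, targets):
    return (torch.argmax(preds, dim=-1) == targets).float().mean()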
```
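
Here is a small worked example of what these two functions compute, on hand-made numbers. The probabilities and targets below are made up for illustration.

```python
import torch

# Log-probabilities for 3 examples and 3 classes (each row is the log of a valid distribution).
log_preds = torch.tensor([[0.7, 0.2, 0.1],
                          [0.1, 0.8, 0.1],
                          [0.3, 0.3, 0.4]]).log()
targets = torch.tensor([0, 1, 2])

# The indexing picks, for each row i, the entry in column targets[i].
picked = log_preds[range(targets.shape[0]), targets]
nll = -picked.mean()

predicted = log_preds.argmax(dim=-1)
acc = (predicted == targets).float().mean()

print(picked.exp())  # probabilities given to the correct classes: 0.7, 0.8, 0.4
print(nll.item(), acc.item())
```

The loss is small when the correct classes get high probability, and it keeps rewarding more confident correct predictions even when the accuracy is already perfect, as it is here.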

And here is the Gradient Descent loop (see the comments in the code for details):

In [18]:

```
%%time
# number of training examples
n = x_train.shape[0]
# batch size - this is necessary as we won't be able to fit all
# the examples into memory, so we need to do the computations in batches
bs = 64
# how many epochs to train for
epochs = 15
n_classes = len(classes)  # one output per class (6 in this dataset)
weights = torch.zeros((64 * 64, n_classes), requires_grad=True)  # define weights matrix
bias = torch.zeros(n_classes, requires_grad=True)  # and bias term
# in each epoch the algorithm sees all the images, so in this case
# we see all the images 15 times
for epoch in range(epochs):
    # here is the loop for batches: in each batch we:
    # - see 64 images
    # - compute predictions based on the model
    # - compute the loss
    # - compute gradients and update parameters (weights and bias)
    for i in range((n - 1) // bs + 1):
        # select images for this batch
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        # compute predictions
        preds = model(xb)
        # compute loss
        loss = loss_func(preds, yb)
        # compute gradients (PyTorch does this for us with the backward function!)
        loss.backward()
        # this block is necessary so that the computations below are not taken
        # into account when computing the next gradients
        with torch.no_grad():
            # update parameters (a plain step, i.e. an implicit learning rate of 1)
            weights -= weights.grad
            bias -= bias.grad
            # zero out the gradients so they are ready for the next batch (otherwise they accumulate)
            weights.grad.zero_()
            bias.grad.zero_()
    # after each epoch (seeing all the images) we print out how we did on the validation set
    with torch.no_grad():
        valid_preds = model(x_valid)
    print(f"Epoch {epoch} accuracy: {accuracy(valid_preds, y_valid).item():.2%}, loss: {loss_func(valid_preds, y_valid).item():.4f}")
```
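
The expression `(n - 1) // bs + 1` in the batch loop is a ceiling division: it makes sure the last, smaller batch is not dropped. A short sketch with the numbers from this post (the helper name `batch_bounds` is made up for this example):

```python
def batch_bounds(n, bs):
    """Yield (start, end) slice bounds covering n examples in batches of at most bs."""
    for i in range((n - 1) // bs + 1):   # ceiling division: number of batches
        start = i * bs
        yield start, min(start + bs, n)  # tensor slicing clips the end in the same way

bounds = list(batch_bounds(47164, 64))  # 47164 training examples, batch size 64

print(len(bounds))                    # number of parameter updates per epoch
print(bounds[-1][1] - bounds[-1][0])  # size of the final, smaller batch
```

With these numbers there are 737 batches per epoch, and the last one holds only 60 images.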

With this simplest neural network we got to almost 96% accuracy - this is pretty good.