*Source: 🤖Homemade Machine Learning repository*

**Logistic regression** is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.

Logistic Regression is used when the dependent variable (target) is categorical.

For example:

- To predict whether an email is spam (`1`) or not (`0`).
- Whether an online transaction is fraudulent (`1`) or not (`0`).
- Whether the tumor is malignant (`1`) or not (`0`).
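As a rough illustration of the binary case, the sketch below squashes a linear score through the sigmoid function and thresholds the result at 0.5. The names `sigmoid` and `predict_binary`, the toy data, and the hand-picked `theta` are assumptions made for this example; they are not taken from the `logistic_regression` module used later.

```python
import numpy as np


def sigmoid(z):
    """Map any real number (or array) to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_binary(features, theta, threshold=0.5):
    """Return 1 where the estimated probability reaches the threshold, else 0."""
    probabilities = sigmoid(features @ theta)
    return (probabilities >= threshold).astype(int)


# Hypothetical toy data: a bias column of ones plus a single feature.
features = np.array([[1.0, -2.0], [1.0, 0.5], [1.0, 3.0]])
theta = np.array([0.0, 1.0])  # Hand-picked weights, for illustration only.

print(sigmoid(0.0))                     # 0.5
print(predict_binary(features, theta))  # [0 1 1]
```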

Demo Project: In this example we will train a clothes classifier that will recognize clothes types (10 categories) from `28x28` pixel images.

In [1]:

```
# To make debugging of logistic_regression module easier we enable imported modules autoreloading feature.
# By doing this you may change the code of logistic_regression library and all these changes will be available here.
%load_ext autoreload
%autoreload 2
# Add project root folder to module loading paths.
import sys
sys.path.append('../..')
```

- pandas - library that we will use for loading and displaying the data in a table
- numpy - library that we will use for linear algebra operations
- matplotlib - library that we will use for plotting the data
- math - math library that we will use to calculate square roots etc.
- logistic_regression - custom implementation of logistic regression

In [2]:

```
# Import 3rd party dependencies.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import math
# Import custom logistic regression implementation.
from homemade.logistic_regression import LogisticRegression
```

In this demo we will use a sample of the Fashion MNIST dataset in CSV format.

Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. Zalando intends Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.

Instead of using the full dataset with 60000 training examples we will use a reduced dataset of just 5000 examples, which we will also split into training and testing sets.

Each row in the dataset consists of 785 values: the first value is the label (a category from 0 to 9) and the remaining 784 values (a 28x28 pixel image) are the pixel values (each a number from 0 to 255).
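To make the row layout concrete, here is a small sketch that builds one made-up row and splits it into a label and a 28x28 image. The row content is random and purely illustrative, and the reshape assumes the pixels are stored row by row.

```python
import numpy as np

# Hypothetical row in the layout described above: 1 label + 784 pixel values.
rng = np.random.default_rng(0)
row = np.concatenate(([7], rng.integers(0, 256, size=784)))

label = int(row[0])               # First value: the category (0 to 9).
pixels = row[1:]                  # Remaining 784 values: pixel intensities.
image = pixels.reshape((28, 28))  # Back to a 28x28 grid (assumes row-major order).

print(row.shape, label, image.shape)  # (785,) 7 (28, 28)
```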

Each training and test example is assigned to one of the following labels:

- 0 T-shirt/top
- 1 Trouser
- 2 Pullover
- 3 Dress
- 4 Coat
- 5 Sandal
- 6 Shirt
- 7 Sneaker
- 8 Bag
- 9 Ankle boot

In [3]:

```
# Load the data.
data = pd.read_csv('../../data/fashion-mnist-demo.csv')
# Let's create the mapping between numeric category and category name.
label_map = {
0: 'T-shirt/top',
1: 'Trouser',
2: 'Pullover',
3: 'Dress',
4: 'Coat',
5: 'Sandal',
6: 'Shirt',
7: 'Sneaker',
8: 'Bag',
9: 'Ankle boot',
}
# Print the data table.
data.head(10)
```

Out[3]:

label | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | ... | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | pixel784 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

1 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

2 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | ... | 0 | 0 | 0 | 30 | 43 | 0 | 0 | 0 | 0 | 0 |

3 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |

4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

5 | 4 | 0 | 0 | 0 | 5 | 4 | 5 | 5 | 3 | 5 | ... | 7 | 8 | 7 | 4 | 3 | 7 | 5 | 0 | 0 | 0 |

6 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

7 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

8 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 2 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

9 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 203 | 214 | 166 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

10 rows × 785 columns

Let's peek at the first 25 rows of the dataset and display them as images to get an idea of the clothes we will be working with.

In [4]:

```
# How many images to display.
numbers_to_display = 25
# Calculate the number of cells that will hold all the images.
num_cells = math.ceil(math.sqrt(numbers_to_display))
# Make the plot a little bit bigger than default one.
plt.figure(figsize=(10, 10))
# Go through the first images in a training set and plot them.
for plot_index in range(numbers_to_display):
    # Extract image data.
    digit = data[plot_index:plot_index + 1].values
    digit_label = digit[0][0]
    digit_pixels = digit[0][1:]
    # Calculate image size (remember that each picture has square proportions).
    image_size = int(math.sqrt(digit_pixels.shape[0]))
    # Convert image vector into the matrix of pixels.
    frame = digit_pixels.reshape((image_size, image_size))
    # Plot the image matrix.
    plt.subplot(num_cells, num_cells, plot_index + 1)
    plt.imshow(frame, cmap='Greys')
    plt.title(label_map[digit_label])
    plt.tick_params(axis='both', which='both', bottom=False, left=False, labelbottom=False, labelleft=False)
# Plot all subplots.
plt.subplots_adjust(hspace=0.5, wspace=0.5)
plt.show()
```

In this step we will split our dataset into *training* and *testing* subsets (in an 80/20 proportion).

The training dataset will be used for training our model. The testing dataset will be used for validating the model. All data from the testing dataset will be new to the model, so we can check how accurate the model predictions are.

In [5]:

```
# Split the data set into training and test sets in an 80/20 proportion.
# Function sample() returns a random sample of items.
pd_train_data = data.sample(frac=0.8)
pd_test_data = data.drop(pd_train_data.index)
# Convert training and testing data from Pandas to NumPy format.
train_data = pd_train_data.values
test_data = pd_test_data.values
# Extract training/test labels and features.
# Only the first num_training_examples rows of the training subset are used.
num_training_examples = 3000
x_train = train_data[:num_training_examples, 1:]
y_train = train_data[:num_training_examples, [0]]
x_test = test_data[:, 1:]
y_test = test_data[:, [0]]
```
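Because `sample()` draws rows at random, the split above will differ from run to run. If repeatable results are wanted, one option is to pass a fixed `random_state`, as in the sketch below. The tiny DataFrame and the seed value `42` are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataset: 10 rows with a label and two pixel columns.
data = pd.DataFrame({
    'label': np.arange(10) % 3,
    'pixel1': np.arange(10),
    'pixel2': np.arange(10) * 2,
})

# random_state is an addition here; it makes the random split repeatable.
train = data.sample(frac=0.8, random_state=42)
test = data.drop(train.index)

print(len(train), len(test))  # 8 2
```

Dropping the sampled index from the original frame guarantees that no row ends up in both subsets.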

☝🏻 This is the place where you might want to play with model configuration.

- `max_iterations` - the maximum number of iterations that the gradient descent algorithm will use to find the minimum of the cost function. Low numbers may prevent gradient descent from reaching the minimum. High numbers will make the algorithm work longer without improving its accuracy.
- `regularization_param` - parameter that will fight overfitting. The higher the parameter, the simpler the model will be.
- `polynomial_degree` - the degree of additional polynomial features (`x1^2 * x2, x1^2 * x2^2, ...`). This will allow you to curve the predictions: the more features, the more curved the line will be.
- `sinusoid_degree` - the degree of sinusoid parameter multipliers of additional features (`sin(x), sin(2*x), ...`). This will allow you to curve the predictions by adding a sinusoidal component to the prediction curve.
- `normalize_data` - boolean flag that indicates whether data normalization is needed or not.

In [6]:

```
# Set up logistic regression parameters.
max_iterations = 10000 # Max number of gradient descent iterations.
regularization_param = 25 # Helps to fight model overfitting.
polynomial_degree = 0 # The degree of additional polynomial features.
sinusoid_degree = 0 # The degree of sinusoid parameter multipliers of additional features.
normalize_data = True # Whether we need to normalize data to make it more uniform or not.
# Init logistic regression instance.
logistic_regression = LogisticRegression(x_train, y_train, polynomial_degree, sinusoid_degree, normalize_data)
# Train logistic regression.
(thetas, costs) = logistic_regression.train(regularization_param, max_iterations)
```
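To give a feel for why a larger `regularization_param` favors simpler models, the sketch below computes a cross-entropy cost with an L2 penalty on the non-bias parameters. This is one common textbook formulation, written here as an assumption; the `regularized_cost` helper and the toy data are hypothetical, and the custom `LogisticRegression` class may compute its cost differently.

```python
import numpy as np


def regularized_cost(theta, features, labels, regularization_param):
    """Cross-entropy cost plus an L2 penalty that skips the bias term theta[0]."""
    num_examples = features.shape[0]
    predictions = 1.0 / (1.0 + np.exp(-(features @ theta)))
    cross_entropy = -np.mean(
        labels * np.log(predictions) + (1 - labels) * np.log(1 - predictions)
    )
    penalty = regularization_param / (2 * num_examples) * np.sum(theta[1:] ** 2)
    return cross_entropy + penalty


# Hypothetical toy data: a bias column plus one feature, with binary labels.
features = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 2.0]])
labels = np.array([0.0, 0.0, 1.0])
theta = np.array([-0.5, 1.5])

print(regularized_cost(theta, features, labels, regularization_param=0))
print(regularized_cost(theta, features, labels, regularization_param=25))
```

With the penalty switched on, large parameter values become expensive, so the optimizer is nudged towards smaller thetas.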

Let's see what the model parameters (thetas) look like. For each clothes class (from 0 to 9) we've just trained a set of 785 parameters: a bias term (column 0) followed by one theta for each of the 784 image pixels. These parameters represent the importance of every pixel for recognizing a specific class.

In [7]:

```
# Print thetas table.
pd.DataFrame(thetas)
```

Out[7]:

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 775 | 776 | 777 | 778 | 779 | 780 | 781 | 782 | 783 | 784 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -6.188828 | -0.098811 | -0.087328 | -0.015862 | 0.024045 | 0.121492 | 0.058710 | -0.042991 | 0.026648 | -0.019745 | ... | -0.051891 | -0.039698 | 0.264672 | 0.098661 | 0.124377 | 0.036166 | -0.053816 | -0.046484 | -0.026023 | -0.013350 |
| 1 | -6.474232 | -0.002895 | -0.002513 | -0.009509 | -0.011743 | 0.117668 | -0.048227 | 0.062439 | 0.022960 | 0.149320 | ... | 0.153883 | 0.079365 | -0.033011 | -0.019675 | 0.017837 | -0.025347 | -0.026005 | -0.003478 | -0.002177 | -0.009066 |
| 2 | -5.334694 | -0.003845 | -0.037747 | -0.110560 | -0.115551 | -0.028090 | -0.106846 | -0.041750 | -0.220657 | -0.049796 | ... | -0.002940 | 0.159790 | 0.163778 | 0.086334 | 0.027404 | -0.098383 | -0.014745 | -0.087777 | 0.167204 | 0.146124 |
| 3 | -6.609060 | -0.010289 | -0.009757 | -0.029398 | -0.002841 | -0.013853 | 0.065444 | 0.152862 | 0.039919 | -0.109553 | ... | 0.070574 | -0.061689 | 0.028969 | -0.204451 | -0.200882 | -0.112483 | -0.078020 | -0.024253 | -0.008831 | -0.001139 |
| 4 | -5.965323 | 0.000036 | -0.007320 | 0.010527 | 0.044382 | -0.019542 | -0.009529 | -0.023622 | 0.027513 | -0.055903 | ... | 0.013763 | 0.171588 | -0.147751 | -0.045985 | 0.086505 | 0.058849 | 0.084423 | -0.070254 | -0.055217 | -0.002108 |
| 5 | -11.772900 | -0.000465 | -0.001262 | -0.037732 | -0.001665 | -0.004291 | -0.001232 | -0.010755 | -0.021838 | -0.059554 | ... | -0.061065 | -0.070279 | -0.038202 | -0.073953 | -0.083174 | -0.073595 | -0.013555 | -0.012098 | -0.091695 | -0.005742 |
| 6 | -4.922826 | 0.142705 | -0.033088 | -0.002464 | -0.113538 | -0.093484 | -0.035009 | -0.045016 | 0.047347 | 0.026366 | ... | -0.096220 | 0.025678 | -0.108178 | -0.043293 | -0.179027 | -0.050755 | 0.083261 | 0.115971 | -0.025531 | -0.170215 |
| 7 | -9.278711 | -0.000367 | -0.000285 | -0.000301 | -0.000410 | -0.000456 | -0.000550 | -0.001178 | -0.003780 | -0.007860 | ... | -0.050218 | -0.051179 | -0.013316 | -0.013148 | -0.005308 | -0.005978 | -0.005408 | -0.006201 | -0.003152 | -0.000013 |
| 8 | -5.977974 | -0.000269 | -0.040986 | -0.061915 | -0.019404 | -0.065937 | 0.122967 | 0.033192 | 0.107072 | 0.096812 | ... | 0.065897 | 0.037445 | -0.030971 | 0.014127 | -0.048779 | -0.025388 | -0.033395 | -0.028944 | -0.036300 | 0.002200 |
| 9 | -7.507192 | -0.000174 | -0.000325 | -0.001354 | -0.000743 | -0.001619 | -0.001899 | -0.005294 | -0.011630 | -0.013474 | ... | -0.033911 | -0.029371 | -0.015877 | -0.003029 | -0.004274 | 0.012986 | -0.001100 | 0.016288 | 0.117543 | -0.000238 |

10 rows × 785 columns
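A table like the one above, with one row of parameters per class, could be turned into predictions with a one-vs-all scheme: score every example against every row and keep the class with the highest probability. The sketch below assumes that column 0 is a bias term and that features are used as-is; `predict_one_vs_all` and the toy numbers are hypothetical, and the library's own `predict` method may apply extra preprocessing such as normalization.

```python
import numpy as np


def predict_one_vs_all(features, thetas):
    """Pick, for every example, the class whose classifier is most confident."""
    num_examples = features.shape[0]
    # Prepend a column of ones so that thetas[:, 0] acts as the bias.
    with_bias = np.hstack((np.ones((num_examples, 1)), features))
    probabilities = 1.0 / (1.0 + np.exp(-(with_bias @ thetas.T)))
    return np.argmax(probabilities, axis=1)


# Hypothetical toy setup: 3 classes, 2 features, hand-picked parameters.
thetas = np.array([
    [0.0, 1.0, 0.0],    # Class 0 responds to the first feature.
    [0.0, 0.0, 1.0],    # Class 1 responds to the second feature.
    [0.5, -1.0, -1.0],  # Class 2 responds when both features are small.
])
features = np.array([[3.0, 0.0], [0.0, 3.0], [-2.0, -2.0]])

print(predict_one_vs_all(features, thetas))  # [0 1 2]
```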

Each per-class classifier learned something from the training process. What it learned is represented by its theta parameters. Besides the bias term, each classifier has 28x28 thetas (one for each input image pixel). Each theta represents how valuable a pixel is for this particular classifier. So let's plot how valuable each pixel of the input image is for each class based on its theta values.

In [8]:

```
# How many images to display.
numbers_to_display = 9
# Calculate the number of cells that will hold all the images.
num_cells = math.ceil(math.sqrt(numbers_to_display))
# Make the plot a little bit bigger than default one.
plt.figure(figsize=(10, 10))
# Go through the thetas and print them.
for plot_index in range(numbers_to_display):
    # Extract thetas data (skip the first value, which is not a pixel weight).
    digit_pixels = thetas[plot_index][1:]
    # Calculate image size (remember that each picture has square proportions).
    image_size = int(math.sqrt(digit_pixels.shape[0]))
    # Convert image vector into the matrix of pixels.
    frame = digit_pixels.reshape((image_size, image_size))
    # Plot the thetas matrix.
    plt.subplot(num_cells, num_cells, plot_index + 1)
    plt.imshow(frame, cmap='Greys')
    plt.title(plot_index)
    plt.tick_params(axis='both', which='both', bottom=False, left=False, labelbottom=False, labelleft=False)
# Plot all subplots.
plt.subplots_adjust(hspace=0.5, wspace=0.5)
plt.show()
```

The plot below illustrates how the cost function value changes with each iteration. You should see it decreasing.

If the cost function value increases, it may mean that gradient descent overshot the cost function minimum and is moving further away from it with each step.

From this plot you may also get an understanding of how many iterations you need to reach an optimal value of the cost function.

In [9]:

```
# Draw gradient descent progress for each label.
labels = logistic_regression.unique_labels
for index, label in enumerate(labels):
    plt.plot(range(len(costs[index])), costs[index], label=label_map[labels[index]])
plt.xlabel('Gradient Steps')
plt.ylabel('Cost')
plt.legend()
plt.show()
```
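For intuition about where such a cost curve comes from, here is a minimal sketch of fixed-step batch gradient descent on a toy binary problem that records the cost at every step. The `gradient_descent` helper, the `learning_rate` value, and the data are assumptions for this example; the optimizer inside the custom `LogisticRegression` class may work differently.

```python
import numpy as np


def gradient_descent(features, labels, learning_rate, num_steps):
    """Plain batch gradient descent for binary logistic regression."""
    theta = np.zeros(features.shape[1])
    cost_history = []
    for _ in range(num_steps):
        predictions = 1.0 / (1.0 + np.exp(-(features @ theta)))
        cost = -np.mean(
            labels * np.log(predictions) + (1 - labels) * np.log(1 - predictions)
        )
        cost_history.append(cost)
        gradient = features.T @ (predictions - labels) / len(labels)
        theta -= learning_rate * gradient
    return theta, cost_history


# Hypothetical toy data: a bias column plus one feature.
features = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
labels = np.array([0.0, 0.0, 1.0, 1.0])

_, cost_history = gradient_descent(features, labels, learning_rate=0.1, num_steps=50)
print(cost_history[0] > cost_history[-1])  # True
```

With a small enough step the recorded costs shrink at every iteration, which is the shape the plot above is expected to show.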

Calculate how many of the training and test examples have been classified correctly (the percentage computed below is what is usually called accuracy). Normally we want test precision to be as high as possible. If training precision is high and test precision is low, it may mean that our model is overfitted (it works really well with the training data set but it is not good at classifying new unknown data from the test dataset). In this case you may want to play with the `regularization_param` parameter to fight the overfitting.

In [10]:

```
# Make training set predictions.
y_train_predictions = logistic_regression.predict(x_train)
y_test_predictions = logistic_regression.predict(x_test)
# Check what percentage of them are actually correct.
train_precision = np.sum(y_train_predictions == y_train) / y_train.shape[0] * 100
test_precision = np.sum(y_test_predictions == y_test) / y_test.shape[0] * 100
print('Training Precision: {:5.4f}%'.format(train_precision))
print('Test Precision: {:5.4f}%'.format(test_precision))
```

Training Precision: 93.8000% Test Precision: 83.8000%
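The percentage calculation relies on the predictions and the labels having the same shape. The sketch below repeats it on made-up column vectors and shows one pitfall worth checking for: comparing an `(n, 1)` array with an `(n,)` array broadcasts to an `(n, n)` matrix, which would silently distort the count. The arrays here are hypothetical.

```python
import numpy as np

# Hypothetical labels and predictions stored as column vectors of shape (n, 1).
y_true = np.array([[0], [1], [2], [2]])
y_pred = np.array([[0], [1], [1], [2]])

# Share of matching positions, expressed as a percentage.
accuracy = np.sum(y_pred == y_true) / y_true.shape[0] * 100
print(accuracy)  # 75.0

# Pitfall: comparing (n, 1) with (n,) broadcasts to an (n, n) matrix.
mismatched = (y_pred == y_true.ravel())
print(mismatched.shape)  # (4, 4)
```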

In order to illustrate how our model classifies unknown examples, let's plot the first 64 predictions for the testing dataset. All green clothes on the plot below have been recognized correctly, while all red clothes have not. On top of each image you may see the clothes class (type) that the classifier predicted for it.

In [11]:

```
# How many numbers to display.
numbers_to_display = 64
# Calculate the number of cells that will hold all the numbers.
num_cells = math.ceil(math.sqrt(numbers_to_display))
# Make the plot a little bit bigger than default one.
plt.figure(figsize=(15, 15))
# Go through the first numbers in a test set and plot them.
for plot_index in range(numbers_to_display):
    # Extract image data.
    digit_label = y_test[plot_index, 0]
    digit_pixels = x_test[plot_index, :]
    # Predicted label.
    predicted_label = y_test_predictions[plot_index][0]
    # Calculate image size (remember that each picture has square proportions).
    image_size = int(math.sqrt(digit_pixels.shape[0]))
    # Convert image vector into the matrix of pixels.
    frame = digit_pixels.reshape((image_size, image_size))
    # Plot the image matrix: green if predicted correctly, red otherwise.
    color_map = 'Greens' if predicted_label == digit_label else 'Reds'
    plt.subplot(num_cells, num_cells, plot_index + 1)
    plt.imshow(frame, cmap=color_map)
    plt.title(label_map[predicted_label])
    plt.tick_params(axis='both', which='both', bottom=False, left=False, labelbottom=False, labelleft=False)
# Plot all subplots.
plt.subplots_adjust(hspace=0.5, wspace=0.5)
plt.show()
```