In this problem, we will develop a simple handwritten image classifier that tells which digit a given image shows. Cool... This problem will hopefully guide you step-by-step through the development and training procedure for a simple classifier.
The following cell downloads a data set that contains 10,000 handwritten images ($n=10000$) with the correct label for each image. Each image consists of 28x28 grayscale pixels ($m=28$) and the label is an integer from 0 to 9, so we represent the $n$-th data pair by $X^{(n)}\in\R^{m\times m}$ (image) and $y^{(n)} \in \{0,1,\dots,9\}$ (label).
We will use some of these data pairs to train our model. The model is simply a Python function that receives an $m\times m$ grayscale image as input and returns which of the integers from 0 to 9 the input image looks closest to. By "to train" we mean finding appropriate parameters of the model function based on the data pairs used for training.
Note that $X^{(n)}\in\R^{m\times m}$ and $y^{(n)}$ are accessible as X[:,:,n] and y[n] below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('https://jonghank.github.io/ase1302/files/numbers.csv', \
                   header=None).values

m = 28      # 28x28: image size
K = 10      # 10: each image represents one of 10 digits
N = 10000   # 10000: number of images in the train set

y = data[:,0]
X = np.zeros((m,m,N))
for n in range(N):
    X[:,:,n] = data[n,1:].reshape((m,m))
print(X.shape, y.shape)
(28, 28, 10000) (10000,)
For example, the first 12 images and their labels from the dataset are shown below. You can check whether the images and the labels coincide.
n_examples = 12

print(f'First {n_examples} images:\n')
plt.figure(figsize=(12,4), dpi=100)
for n in range(n_examples):
    plt.subplot(1,n_examples,n+1)
    plt.imshow(X[:,:,n], cmap='gray')
    plt.axis('off')
plt.show()

print(f'First {n_examples} labels:\n{y[:n_examples]}')
First 12 images:
First 12 labels: [7 2 1 0 4 1 4 9 5 9 0 6]
(Problem 1a) We will first split the data into a "train set" and a "validation set". The train set is what we use for training the classifier, and the validation set is what we use for evaluating the accuracy of the classifier.
Simply let the first 7,000 data pairs ($n_\text{train}=7000$) be the train set ($X_\text{train}$ and $y_\text{train}$) and the remaining 3,000 ($n_\text{valid}=3000$) be the validation set ($X_\text{valid}$ and $y_\text{valid}$).
Display the first 12 images from the validation set.
# your code here
First 12 images in the validation set:
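A minimal sketch of one way to perform the split, using plain slicing on the X and y arrays defined above (the names n_train and n_valid are our own choice):

n_train, n_valid = 7000, 3000

# first 7,000 pairs for training, remaining 3,000 for validation
X_train, y_train = X[:,:,:n_train], y[:n_train]
X_valid, y_valid = X[:,:,n_train:], y[n_train:]

# display the first 12 validation images, mirroring the example above
print('First 12 images in the validation set:\n')
plt.figure(figsize=(12,4), dpi=100)
for n in range(12):
    plt.subplot(1,12,n+1)
    plt.imshow(X_valid[:,:,n], cmap='gray')
    plt.axis('off')
plt.show()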
(Problem 1b) Your train set should now contain 7,000 images, $X_\text{train}$, with correct labels, $y_\text{train}$.
For each integer $k$ from 0 to 9, how many images in the train set display the integer $k$? Let $z=\left( z^{[0]}, z^{[1]},\dots, z^{[9]}\right)\in\Z^{10}$ be a zero-based vector of nonnegative integers, such that $z^{[k]}$ represents the number of train images showing the digit $k$.
For your information, those ten numbers should sum to 7,000 and should look well balanced.
# your code here
z: [672 795 729 702 700 633 656 712 682 719]
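One possible way to obtain $z$, assuming the y_train array from Problem 1a; summing a boolean mask counts the matching labels:

# z[k] = number of train images labeled k
z = np.array([np.sum(y_train == k) for k in range(K)])
print('z:', z)
print('sum of z:', z.sum())   # should equal 7000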
(Problem 1c) The training process using the train set $X_\text{train}$ and $y_\text{train}$ is as follows: for each digit $k$ from 0 to 9, compute the representative image $X^{[k]}_{\text{rep}}\in\R^{m\times m}$ as the pixel-wise average of all train images labeled $k$,

$$ X^{[k]}_{\text{rep}} = \frac{1}{z^{[k]}}\sum_{n:\, y_\text{train}^{(n)}=k} X_\text{train}^{(n)}. $$

Compute and display all ten representative images, $X^{[0]}_{\text{rep}}, X^{[1]}_{\text{rep}}, \dots, X^{[9]}_{\text{rep}}$. How do they look?
# your code here
Representative images:
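A sketch of the averaging step described above, assuming X_train and y_train from Problem 1a; boolean indexing along the third axis selects all train images of a given digit:

# X_rep[:,:,k] = pixel-wise average of all train images labeled k
X_rep = np.zeros((m,m,K))
for k in range(K):
    X_rep[:,:,k] = X_train[:,:,y_train==k].mean(axis=2)

print('Representative images:\n')
plt.figure(figsize=(10,4), dpi=100)
for k in range(K):
    plt.subplot(1,K,k+1)
    plt.imshow(X_rep[:,:,k], cmap='gray')
    plt.axis('off')
plt.show()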
(Problem 1d) Now we will use the representative images obtained above to build the classifier:

$$ y_\text{pred} = \underset{k\in\{0,1,\dots,9\}}{\arg\min}\ \left\| X_\text{in}-X^{[k]}_\text{rep} \right\|^2_F $$

where $y_\text{pred}\in\{0,1,\dots,9\}$ is the prediction (which digit your input image looks like) from your classifier, and the squared Frobenius norm $\|A\|_F^2$ of a matrix $A$ is the sum of the squares of all of its elements, so a small $\|X_\text{in}-X^{[3]}_\text{rep}\|^2_F$ implies that $X_\text{in}$ and $X^{[3]}_\text{rep}$ are close to each other. The $\arg\min$ function returns the index that achieves the smallest value.
Build the classifier function digit_classifier() that receives an $m\times m$ image X_in and returns the prediction label y_pred obtained from the above rule. The Frobenius norm of A can be computed by np.linalg.norm(A, 'fro'), and you may find the np.argmin() function useful.
# your code here
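A minimal implementation of the rule above, assuming the X_rep array from Problem 1c:

def digit_classifier(X_in):
    # squared Frobenius distance from X_in to each representative image
    dists = [np.linalg.norm(X_in - X_rep[:,:,k], 'fro')**2 for k in range(K)]
    # predicted label: index of the closest representative
    return np.argmin(dists)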
(Problem 1e) Run your classifier function on the first 12 images from the validation set, and display the images along with the predictions, $y_\text{pred}$, from your classifier. Do they all look correct? For which of the twelve samples did your classifier give a wrong answer?
# your code here
First 12 images in the validation set:
Results from the digit_classifier: [1 2 2 5 8 1 3 2 9 4 8 8]
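One way to produce this check, assuming digit_classifier and the validation arrays from the earlier parts (the exact predictions depend on your implementation):

y_pred = np.array([digit_classifier(X_valid[:,:,n]) for n in range(12)])

print('First 12 images in the validation set:\n')
plt.figure(figsize=(12,4), dpi=100)
for n in range(12):
    plt.subplot(1,12,n+1)
    plt.imshow(X_valid[:,:,n], cmap='gray')
    plt.axis('off')
plt.show()
print('Results from the digit_classifier:', y_pred)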
(Problem 1f) A more sophisticated way to analyze the accuracy of your classifier is to build and check the confusion matrix on the validation set. Note that the validation set contains the correct labels $y_\text{valid}$, with which we can compare $y_\text{pred}$ to check whether your classifier works properly.
For our problem, the confusion matrix $C$ is a 10x10 matrix whose elements (with zero-based indices) are defined as follows:

$$ C_{ij} = \text{# of occurrences for which the prediction was $i$ while the true label was $j$} $$

so the elements of $C$ sum to 3,000 ($n_\text{valid}$).
Compute and print the confusion matrix.
What is the accuracy (in percent) of your classifier? For which input digit does your classifier achieve the lowest accuracy?
# your code here
Confusion matrix:
[[285   0  12   1   0   4   5   0   5   0]
 [  0 291   3   2   5  41   3   1  11   8]
 [  4  43 239  19   1   3   4   7   4   0]
 [  0   0   1 234   0   2   0   0   7   0]
 [  3   0  21   0 249  12   2   3   2  31]
 [ 10   0   0  12   0 170   5   0  11   3]
 [  4   0   3   0   5   8 282   0   4   0]
 [  0   0   5   6   0  12   0 296   3  33]
 [  2   6  15  33   8   6   1   7 242  12]
 [  0   0   4   1  14   1   0   2   3 203]]
Classification accuracy: 83.03 percent
Percentage accuracy for each digit:
[92.53 85.59 78.88 75.97 88.3  65.64 93.38 93.67 82.88 70.  ]
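A sketch of the confusion-matrix computation, assuming digit_classifier and the validation arrays from the earlier parts; per-digit accuracy divides each diagonal entry by its column sum (the number of validation images of that digit):

C = np.zeros((K,K), dtype=int)
for n in range(3000):
    i = digit_classifier(X_valid[:,:,n])   # predicted label
    j = y_valid[n]                         # true label
    C[i,j] += 1

print('Confusion matrix:\n', C)
# overall accuracy: correct predictions lie on the diagonal
print(f'Classification accuracy: {100*np.trace(C)/C.sum():.2f} percent')
# per-digit accuracy: diagonal entry over the corresponding column sum
print('Percentage accuracy for each digit:\n',
      np.round(100*np.diag(C)/C.sum(axis=0), 2))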