In this problem, we will develop a simple handwritten image classifier that tells which digit a given image shows. Cool... This problem will hopefully guide you step-by-step through the development and training procedure for a simple classifier.
The following cell downloads a data set that contains 10,000 handwritten images ($n=10000$) with the correct label for each image. Each image consists of 28x28 grayscale pixels ($m=28$) and the label is an integer from 0 to 9, so we represent the $n$-th data pair by $X^{(n)}\in\R^{m\times m}$ (image) and $y^{(n)} \in \{0,1,\dots,9\}$ (label).
We will use some of these data pairs to train our model. The model is simply a Python function that receives an $m\times m$ grayscale image as input and returns which of the integers from 0 to 9 the input image looks closest to. By "to train" we mean finding appropriate parameters of the model function based on the data pairs used for training.
Note that $X^{(n)}\in\R^{m\times m}$ and $y^{(n)}$ are accessible as X[:,:,n] and y[n] below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('https://jonghank.github.io/ase1302/files/numbers.csv', \
                   header=None).values

m = 28      # 28x28: image size
K = 10      # 10: each image represents one of 10 digits
N = 10000   # 10000: number of images in the train set

y = data[:,0]
X = np.zeros((m,m,N))
for n in range(N):
    X[:,:,n] = data[n,1:].reshape((m,m))
print(X.shape, y.shape)
(28, 28, 10000) (10000,)
For example, the first 12 images and their labels from the dataset are shown below. You can check whether the images and the labels coincide.
n_examples = 12

print(f'First {n_examples} images:\n')
plt.figure(figsize=(12,4), dpi=100)
for n in range(n_examples):
    plt.subplot(1,n_examples,n+1)
    plt.imshow(X[:,:,n], cmap='gray')
    plt.axis('off')
plt.show()

print(f'First {n_examples} labels:\n{y[:n_examples]}')
First 12 images:
First 12 labels: [7 2 1 0 4 1 4 9 5 9 0 6]
(Problem 1a) We will first split the data into a "train set" and a "validation set". The train set is what we use for training the classifier, and the validation set is what we use for evaluating the accuracy of the classifier.
Simply let the first 7,000 data pairs ($n_\text{train}=7000$) be the train set ($X_\text{train}$ and $y_\text{train}$) and the remaining 3,000 ($n_\text{valid}=3000$) be the validation set ($X_\text{valid}$ and $y_\text{valid}$).
Display the first 12 images from the validation set.
# your code here
First 12 images in the validation set:
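A minimal sketch of one way to perform the split, using plain slicing on the X and y arrays defined above (the names n_train and n_valid are our own choice):

n_train, n_valid = 7000, 3000

# first 7,000 pairs for training, remaining 3,000 for validation
X_train, y_train = X[:,:,:n_train], y[:n_train]
X_valid, y_valid = X[:,:,n_train:], y[n_train:]

# display the first 12 validation images, mirroring the example above
print('First 12 images in the validation set:\n')
plt.figure(figsize=(12,4), dpi=100)
for n in range(12):
    plt.subplot(1,12,n+1)
    plt.imshow(X_valid[:,:,n], cmap='gray')
    plt.axis('off')
plt.show()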
(Problem 1b) Your train set should now contain 7,000 images, $X_\text{train}$, with correct labels, $y_\text{train}$.
For each integer $k$ from 0 to 9, how many images in the train set display the integer $k$? Let $z=\left( z^{[0]}, z^{[1]},\dots, z^{[9]}\right)\in\Z^{10}$ be a zero-based vector of nonnegative integers, such that $z^{[k]}$ represents the number of train images showing the digit $k$.
For your information, those ten numbers should sum to 7,000 and should look well balanced.
# your code here
z: [672 795 729 702 700 633 656 712 682 719]
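One possible way to obtain $z$, assuming the y_train array from Problem 1a; summing a boolean mask counts the matching labels:

# z[k] = number of train images labeled k
z = np.array([np.sum(y_train == k) for k in range(K)])
print('z:', z)
print('sum of z:', z.sum())   # should equal 7000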
(Problem 1c) The training process using the train set $X_\text{train}$ and $y_\text{train}$ is as follows: for each digit $k$ from 0 to 9, compute the representative image $X^{[k]}_{\text{rep}}\in\R^{m\times m}$ as the pixel-wise average of all train images labeled $k$,

$$ X^{[k]}_{\text{rep}} = \frac{1}{z^{[k]}}\sum_{n:\, y_\text{train}^{(n)}=k} X_\text{train}^{(n)}. $$

Compute and display all ten representative images, $X^{[0]}_{\text{rep}}, X^{[1]}_{\text{rep}}, \dots, X^{[9]}_{\text{rep}}$. How do they look?
# your code here
Representative images:
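A sketch of the averaging step described above, assuming X_train and y_train from Problem 1a; boolean indexing along the third axis selects all train images of a given digit:

# X_rep[:,:,k] = pixel-wise average of all train images labeled k
X_rep = np.zeros((m,m,K))
for k in range(K):
    X_rep[:,:,k] = X_train[:,:,y_train==k].mean(axis=2)

print('Representative images:\n')
plt.figure(figsize=(10,4), dpi=100)
for k in range(K):
    plt.subplot(1,K,k+1)
    plt.imshow(X_rep[:,:,k], cmap='gray')
    plt.axis('off')
plt.show()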
(Problem 1d) Now we will use the representative images obtained above to build the classifier:

$$ y_\text{pred} = \underset{k\in\{0,1,\dots,9\}}{\arg\min}\ \left\| X_\text{in}-X^{[k]}_\text{rep} \right\|^2_F $$

where $y_\text{pred}\in\{0,1,\dots,9\}$ is the prediction (which digit your input image looks like) from your classifier, and the squared Frobenius norm $\|A\|_F^2$ of a matrix $A$ is the sum of the squares of all of its elements, so a small $\|X_\text{in}-X^{[3]}_\text{rep}\|^2_F$ implies that $X_\text{in}$ and $X^{[3]}_\text{rep}$ are close to each other. The $\arg\min$ function returns the index that achieves the smallest value.
Build the classifier function digit_classifier() that receives an $m\times m$ image X_in and returns the prediction label y_pred obtained from the above rule. The Frobenius norm of A can be computed by np.linalg.norm(A, 'fro'), and you may find the np.argmin() function useful.
# your code here
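A minimal implementation of the rule above, assuming the X_rep array from Problem 1c:

def digit_classifier(X_in):
    # squared Frobenius distance from X_in to each representative image
    dists = [np.linalg.norm(X_in - X_rep[:,:,k], 'fro')**2 for k in range(K)]
    # predicted label: index of the closest representative
    return np.argmin(dists)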
(Problem 1e) Run your classifier function on the first 12 images from the validation set, and display the images along with the predictions, $y_\text{pred}$, from your classifier. Do they all look correct? For which of the twelve samples did your classifier give a wrong answer?
# your code here
First 12 images in the validation set:
Results from the digit_classifier: [1 2 2 5 8 1 3 2 9 4 8 8]
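One way to produce this check, assuming digit_classifier and the validation arrays from the earlier parts (the exact predictions depend on your implementation):

y_pred = np.array([digit_classifier(X_valid[:,:,n]) for n in range(12)])

print('First 12 images in the validation set:\n')
plt.figure(figsize=(12,4), dpi=100)
for n in range(12):
    plt.subplot(1,12,n+1)
    plt.imshow(X_valid[:,:,n], cmap='gray')
    plt.axis('off')
plt.show()
print('Results from the digit_classifier:', y_pred)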
(Problem 1f) A more sophisticated way to analyze the accuracy of your classifier is to build and check the confusion matrix on the validation set. Note that the validation set contains the correct labels $y_\text{valid}$, with which we can compare $y_\text{pred}$ to check whether your classifier works properly.
For our problem, the confusion matrix $C$ is a 10x10 matrix whose elements (with zero-based indices) are defined as follows:

$$ C_{ij} = \text{# of occurrences for which the prediction was $i$ while the true label was $j$} $$

so the elements of $C$ sum to 3,000 ($n_\text{valid}$).
Compute and print the confusion matrix.
What is the accuracy (in percent) of your classifier? For which input digit does your classifier achieve the lowest accuracy?
# your code here
Confusion matrix:
[[285   0  12   1   0   4   5   0   5   0]
 [  0 291   3   2   5  41   3   1  11   8]
 [  4  43 239  19   1   3   4   7   4   0]
 [  0   0   1 234   0   2   0   0   7   0]
 [  3   0  21   0 249  12   2   3   2  31]
 [ 10   0   0  12   0 170   5   0  11   3]
 [  4   0   3   0   5   8 282   0   4   0]
 [  0   0   5   6   0  12   0 296   3  33]
 [  2   6  15  33   8   6   1   7 242  12]
 [  0   0   4   1  14   1   0   2   3 203]]
Classification accuracy: 83.03 percent
Percentage accuracy for each digit:
[92.53 85.59 78.88 75.97 88.3  65.64 93.38 93.67 82.88 70.  ]
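A sketch of the confusion-matrix computation, assuming digit_classifier and the validation arrays from the earlier parts; per-digit accuracy divides each diagonal entry by its column sum (the number of validation images of that digit):

C = np.zeros((K,K), dtype=int)
for n in range(3000):
    i = digit_classifier(X_valid[:,:,n])   # predicted label
    j = y_valid[n]                         # true label
    C[i,j] += 1

print('Confusion matrix:\n', C)
# overall accuracy: correct predictions lie on the diagonal
print(f'Classification accuracy: {100*np.trace(C)/C.sum():.2f} percent')
# per-digit accuracy: diagonal entry over the corresponding column sum
print('Percentage accuracy for each digit:\n',
      np.round(100*np.diag(C)/C.sum(axis=0), 2))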