The MNIST data poses a supervised learning problem, and more precisely a classification problem. The objective is to build a model that can recognize handwritten digits from images. The training set contains 60,000 example images, each a 28×28 array, which gives 784 features per image. The test set contains 10,000 examples. The evaluation criterion is the test error rate (i.e. the rate of misclassified images).
So far, the best performance reported for a fully connected neural network is a test error rate of 0.35%. Convolutional neural networks have reached better results: 0.23%, which is the best performance reported so far.
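The test error rate described above can be illustrated with a minimal sketch in plain Python. The function name `error_rate` and the five example labels are made up for illustration; they are not part of the notebook or of Keras.

```python
def error_rate(true_labels, predicted_labels):
    """Fraction of examples whose predicted label differs from the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return wrong / len(true_labels)


# Hypothetical labels for five images; only the last prediction is wrong.
true_labels = [5, 0, 4, 1, 9]
predicted_labels = [5, 0, 4, 1, 4]

rate = error_rate(true_labels, predicted_labels)
print(rate)  # 0.2, i.e. 1 misclassified image out of 5
```

On the real test set the same ratio is computed over all 10,000 images, so an error rate of 0.35% corresponds to 35 misclassified images.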
from keras.datasets import mnist
import numpy as np
# Load data through keras library
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
print(X_train.shape)
print(X_test.shape)
(60000, 28, 28)
(10000, 28, 28)
Here is an overview of the dataset (first 20 images)
import matplotlib.pyplot as plt
# Show the first 20 training images in a grid of 5 rows by 4 columns
fig = plt.figure(figsize=(8, 8))
columns = 4
rows = 5
for i in range(columns * rows):
    ax = fig.add_subplot(rows, columns, i + 1)
    ax.imshow(X_train[i])
plt.show()
[Figure: the first 20 training images, 5 rows by 4 columns]
Here we can check the corresponding labels
Y_train[:20]
array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8, 6, 9], dtype=uint8)
# Reshape training and testing data
X_train = X_train.reshape(X_train.shape[0], 28*28)
X_test = X_test.reshape(X_test.shape[0], 28*28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# One-hot encode the labels (10 classes, one per digit)
from keras.utils import to_categorical
Y_train = to_categorical(Y_train, 10)
Y_test = to_categorical(Y_test, 10)
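To make the two preprocessing steps above concrete, here is a small sketch in plain Python of what they do to a single label and to a few pixel values. The helper names `one_hot` and `scale_pixels` are hypothetical and only mimic the effect of the Keras and NumPy calls; they are not the library implementations.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def scale_pixels(pixels):
    """Map 8-bit pixel intensities (0..255) to floats in [0, 1]."""
    return [p / 255 for p in pixels]


encoded = one_hot(5)
print(encoded)  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]

scaled = scale_pixels([0, 51, 255])
print(scaled)  # [0.0, 0.2, 1.0]
```

The one-hot form matches the 10 output units of the network, and scaling the inputs to [0, 1] keeps all 784 features in the same range.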
Now the dataset is ready for processing. We'll use the Keras library to train and test neural networks with different sets of parameters.
# Import modules
from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Input
Here are the parameters chosen
# Parameters
activation = 'relu' # activation function
optimizer = 'adam' # optimization
loss = 'categorical_crossentropy' # Cost function
batch_size = 30 # Batch size
epochs = 5 # Epochs
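The 'categorical_crossentropy' loss chosen above compares a predicted probability vector with a one-hot target. The following is a minimal sketch of that formula for a single example in plain Python. The function name and the clipping constant `eps` are illustrative assumptions, not Keras internals.

```python
import math


def categorical_crossentropy(one_hot_target, probabilities, eps=1e-12):
    """Cross-entropy -sum(t * log(p)) for one example.

    `eps` (an assumed value) guards against log(0).
    """
    return -sum(
        t * math.log(max(p, eps))
        for t, p in zip(one_hot_target, probabilities)
    )


# Target: the digit 3, one-hot encoded over 10 classes.
target = [0.0] * 10
target[3] = 1.0

# A model that has learned nothing predicts 1/10 for every class.
uniform = [0.1] * 10
loss_uniform = categorical_crossentropy(target, uniform)
print(loss_uniform)  # about 2.3026, i.e. ln(10)

# A confident, correct prediction gives a loss close to 0.
confident = [0.001] * 10
confident[3] = 0.991
loss_confident = categorical_crossentropy(target, confident)
print(loss_confident)
```

The value ln(10) ≈ 2.30 is a useful reference point: a training loss well below it means the network is doing better than uniform guessing.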
# Define NN structure with the functional API
inputs = Input(shape=(784,))
x = Dense(200, activation=activation)(inputs) # First hidden layer with 200 nodes
x = Dense(200, activation=activation)(x) # Second hidden layer with 200 nodes
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])  # optimization parameters + cost function
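As a sanity check on the architecture defined above (784 inputs, two hidden layers of 200 nodes, 10 outputs), the number of trainable parameters can be worked out by hand. This sketch assumes each Dense layer has one weight per input-output pair plus one bias per output; the helper name `dense_params` is hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out


# Layer widths from input to output, as in the network defined above.
layer_sizes = [784, 200, 200, 10]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [157000, 40200, 2010]
print(total)      # 199210
```

Calling `model.summary()` in Keras prints a per-layer parameter table that can be compared against these figures.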
import time
t = time.time()
# Train model
model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size)
t = time.time() - t
print("Training time : " + str(t) + " sec")
Epoch 1/5
60000/60000 [==============================] - 19s 324us/step - loss: 0.2083 - acc: 0.9375
Epoch 2/5
60000/60000 [==============================] - 19s 311us/step - loss: 0.0897 - acc: 0.9720
Epoch 3/5
60000/60000 [==============================] - 19s 320us/step - loss: 0.0610 - acc: 0.9801
Epoch 4/5
60000/60000 [==============================] - 19s 309us/step - loss: 0.0459 - acc: 0.9852
Epoch 5/5
60000/60000 [==============================] - 18s 298us/step - loss: 0.0368 - acc: 0.9880
Training time : 94.06952619552612 sec
# Testing the model
score = model.evaluate(X_test, Y_test, batch_size=10)
print(score)
print("Accuracy : " + str(score[1]))
print("Test error rate : " + str(1-score[1]))
10000/10000 [==============================] - 3s 321us/step
[0.07905915418734595, 0.978799996137619]
Accuracy : 0.978799996137619
Test error rate : 0.021200003862380967
With this set of parameters, the test error rate is about 2.1%, which is better than the 3% required by the assignment. Objective completed!