NOTE: To prevent errors from the autograder, you are not allowed to edit or delete non-graded cells in this notebook. Please also refrain from adding any new cells. Once you have passed this assignment and want to experiment with any of the non-graded code, you may follow the instructions at the bottom of this notebook.
First, let's run the cell below to import all the packages that you will need during this assignment.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.activations import linear, relu, sigmoid
%matplotlib widget
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')
import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)
tf.autograph.set_verbosity(0)
from public_tests import *
from autils import *
from lab_utils_softmax import plt_softmax
np.set_printoptions(precision=2)
This week, a new activation was introduced, the Rectified Linear Unit (ReLU). $$ a = \max(0,z) \quad\quad\text{ReLU function} $$
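For reference, ReLU is easy to express with NumPy. The short sketch below is purely illustrative (my_relu is a name chosen here and is not used elsewhere in this lab); it simply clips negative values to zero:
def my_relu(z):
    # ReLU: element-wise max(0, z)
    return np.maximum(0, z)

print(my_relu(np.array([-2., -0.5, 0., 1.5, 3.])))   # [0.  0.  0.  1.5 3. ]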
plt_act_trio()
A multiclass neural network generates N outputs. One output is selected as the predicted answer. In the output layer, a vector $\mathbf{z}$ is generated by a linear function which is fed into a softmax function. The softmax function converts $\mathbf{z}$ into a probability distribution as described below. After applying softmax, each output will be between 0 and 1 and the outputs will sum to 1. They can be interpreted as probabilities. The larger inputs to the softmax will correspond to larger output probabilities.
The softmax function can be written: $$a_j = \frac{e^{z_j}}{ \sum_{k=0}^{N-1}{e^{z_k} }} \tag{1}$$
where $z_j = \mathbf{w}_j \cdot \mathbf{x} + b_j$ and $N$ is the number of features/categories in the output layer.
# UNQ_C1
# GRADED CELL: my_softmax
def my_softmax(z):
    """ Softmax converts a vector of values to a probability distribution.
    Args:
      z (ndarray (N,))  : input data, N features
    Returns:
      a (ndarray (N,))  : softmax of z
    """
    ### START CODE HERE ###
    a = np.zeros(z.size)
    z_sum = 0
    for i in range(z.size):
        z_sum += np.exp(z[i])
    for i in range(z.size):
        a[i] = np.exp(z[i]) / z_sum
    ### END CODE HERE ###
    return a
z = np.array([1., 2., 3., 4.])
a = my_softmax(z)
atf = tf.nn.softmax(z)
print(f"my_softmax(z): {a}")
print(f"tensorflow softmax(z): {atf}")
# BEGIN UNIT TEST
test_my_softmax(my_softmax)
# END UNIT TEST
my_softmax(z): [0.03 0.09 0.24 0.64]
tensorflow softmax(z): [0.03 0.09 0.24 0.64]
All tests passed.
def my_softmax(z):
    N = len(z)
    a =                     # initialize a to zeros
    ez_sum =                # initialize sum to zero
    for k in range(N):      # loop over number of outputs
        ez_sum +=           # sum exp(z[k]) to build the shared denominator
    for j in range(N):      # loop over number of outputs again
        a[j] =              # divide the exp of each output by the denominator
    return(a)
def my_softmax(z):
    N = len(z)
    a = np.zeros(N)
    ez_sum = 0
    for k in range(N):
        ez_sum += np.exp(z[k])
    for j in range(N):
        a[j] = np.exp(z[j]) / ez_sum
    return(a)
Or, a vector implementation:
def my_softmax(z):
    ez = np.exp(z)
    a = ez / np.sum(ez)
    return(a)
Below, vary the values of the z inputs. Note in particular how the exponential in the numerator magnifies small differences in the values. Note as well that the output values sum to one.
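Before exploring the interactive plot, here is a quick numeric illustration (not part of the graded code) of both points, using the my_softmax implementation from above:
z_demo = np.array([1.0, 1.1, 2.0])   # small differences in the inputs
a_demo = my_softmax(z_demo)
print(a_demo)                        # the largest z receives most of the probability
print(np.sum(a_demo))                # the outputs sum to 1.0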
plt.close("all")
plt_softmax(my_softmax)
In last week's assignment, you implemented a neural network to do binary classification. This week you will extend that to multiclass classification, which will utilize the softmax activation.
In this exercise, you will use a neural network to recognize ten handwritten digits, 0-9. This is a multiclass classification task where one of n choices is selected. Automated handwritten digit recognition is widely used today - from recognizing zip codes (postal codes) on mail envelopes to recognizing amounts written on bank checks.
You will start by loading the dataset for this task.
The load_data() function shown below loads the data into variables X and y.
The data set contains 5000 training examples of handwritten digits $^1$. Each example is a 20 pixel by 20 pixel grayscale image that has been unrolled into a 400-element row, so every row of X is a training example of a handwritten digit image. The training set also includes a vector y that contains the labels for the training set: y = 0 if the image is of the digit 0, y = 4 if the image is of the digit 4, and so on.
$^1$ This is a subset of the MNIST handwritten digit dataset (http://yann.lecun.com/exdb/mnist/)
# load dataset
X, y = load_data()
Let's get more familiar with your dataset.
The code below prints the first element in the variables X and y.
print ('The first element of X is: ', X[0])
The first element of X is: [ 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 8.56e-06 1.94e-06 -7.37e-04 -8.13e-03 -1.86e-02 -1.87e-02 -1.88e-02 -1.91e-02 -1.64e-02 -3.78e-03 3.30e-04 1.28e-05 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 1.16e-04 1.20e-04 -1.40e-02 -2.85e-02 8.04e-02 2.67e-01 2.74e-01 2.79e-01 2.74e-01 2.25e-01 2.78e-02 -7.06e-03 2.35e-04 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 1.28e-17 -3.26e-04 -1.39e-02 8.16e-02 3.83e-01 8.58e-01 1.00e+00 9.70e-01 9.31e-01 1.00e+00 9.64e-01 4.49e-01 -5.60e-03 -3.78e-03 0.00e+00 0.00e+00 0.00e+00 0.00e+00 5.11e-06 4.36e-04 -3.96e-03 -2.69e-02 1.01e-01 6.42e-01 1.03e+00 8.51e-01 5.43e-01 3.43e-01 2.69e-01 6.68e-01 1.01e+00 9.04e-01 1.04e-01 -1.66e-02 0.00e+00 0.00e+00 0.00e+00 0.00e+00 2.60e-05 -3.11e-03 7.52e-03 1.78e-01 7.93e-01 9.66e-01 4.63e-01 6.92e-02 -3.64e-03 -4.12e-02 -5.02e-02 1.56e-01 9.02e-01 1.05e+00 1.51e-01 -2.16e-02 0.00e+00 0.00e+00 0.00e+00 5.87e-05 -6.41e-04 -3.23e-02 2.78e-01 9.37e-01 1.04e+00 5.98e-01 -3.59e-03 -2.17e-02 -4.81e-03 6.17e-05 -1.24e-02 1.55e-01 9.15e-01 9.20e-01 1.09e-01 -1.71e-02 0.00e+00 0.00e+00 1.56e-04 -4.28e-04 -2.51e-02 1.31e-01 7.82e-01 1.03e+00 7.57e-01 2.85e-01 4.87e-03 -3.19e-03 0.00e+00 8.36e-04 -3.71e-02 4.53e-01 1.03e+00 5.39e-01 -2.44e-03 -4.80e-03 0.00e+00 0.00e+00 -7.04e-04 -1.27e-02 1.62e-01 7.80e-01 1.04e+00 8.04e-01 1.61e-01 -1.38e-02 2.15e-03 -2.13e-04 2.04e-04 -6.86e-03 4.32e-04 7.21e-01 8.48e-01 1.51e-01 -2.28e-02 1.99e-04 0.00e+00 0.00e+00 -9.40e-03 3.75e-02 6.94e-01 1.03e+00 1.02e+00 8.80e-01 3.92e-01 -1.74e-02 -1.20e-04 5.55e-05 -2.24e-03 -2.76e-02 3.69e-01 9.36e-01 4.59e-01 -4.25e-02 1.17e-03 1.89e-05 0.00e+00 0.00e+00 -1.94e-02 1.30e-01 9.80e-01 9.42e-01 7.75e-01 8.74e-01 2.13e-01 -1.72e-02 0.00e+00 1.10e-03 -2.62e-02 1.23e-01 8.31e-01 7.27e-01 5.24e-02 -6.19e-03 0.00e+00 0.00e+00 0.00e+00 0.00e+00 -9.37e-03 3.68e-02 6.99e-01 1.00e+00 6.06e-01 3.27e-01 -3.22e-02 -4.83e-02 -4.34e-02 -5.75e-02 9.56e-02 7.27e-01 6.95e-01 1.47e-01 -1.20e-02 -3.03e-04 0.00e+00 0.00e+00 0.00e+00 0.00e+00 -6.77e-04 -6.51e-03 1.17e-01 4.22e-01 9.93e-01 8.82e-01 7.46e-01 7.24e-01 7.23e-01 7.20e-01 8.45e-01 8.32e-01 6.89e-02 -2.78e-02 3.59e-04 7.15e-05 0.00e+00 0.00e+00 0.00e+00 0.00e+00 1.53e-04 3.17e-04 -2.29e-02 -4.14e-03 3.87e-01 5.05e-01 7.75e-01 9.90e-01 1.01e+00 1.01e+00 7.38e-01 2.15e-01 -2.70e-02 1.33e-03 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 2.36e-04 -2.26e-03 -2.52e-02 -3.74e-02 6.62e-02 2.91e-01 3.23e-01 3.06e-01 8.76e-02 -2.51e-02 2.37e-04 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 6.21e-18 6.73e-04 -1.13e-02 -3.55e-02 -3.88e-02 -3.71e-02 -1.34e-02 9.91e-04 4.89e-05 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 
0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00]
print ('The first element of y is: ', y[0,0])
print ('The last element of y is: ', y[-1,0])
The first element of y is:  0
The last element of y is:  9
Another way to get familiar with your data is to view its dimensions. Please print the shape of X and y and see how many training examples you have in your dataset.
print ('The shape of X is: ' + str(X.shape))
print ('The shape of y is: ' + str(y.shape))
The shape of X is: (5000, 400)
The shape of y is: (5000, 1)
You will begin by visualizing a subset of the training set. The cell below randomly selects 64 rows from X, maps each row back to a 20 pixel by 20 pixel grayscale image, and displays the images together.
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# You do not need to modify anything in this cell
m, n = X.shape
fig, axes = plt.subplots(8,8, figsize=(5,5))
fig.tight_layout(pad=0.13,rect=[0, 0.03, 1, 0.91]) #[left, bottom, right, top]
#fig.tight_layout(pad=0.5)
widgvis(fig)
for i,ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)
    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20,20)).T
    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')
    # Display the label above the image
    ax.set_title(y[random_index,0])
    ax.set_axis_off()
fig.suptitle("Label, image", fontsize=14)
The neural network you will use in this assignment is shown in the figure below.
The parameters have dimensions that are sized for a neural network with $25$ units in layer 1, $15$ units in layer 2 and $10$ output units in layer 3, one for each digit.
Recall that the dimensions of these parameters are determined as follows: if the network has $s_{in}$ units in a layer and $s_{out}$ units in the following layer, then W will be of dimension $s_{in} \times s_{out}$ and b will be a vector with $s_{out}$ elements.
Therefore, the shapes of W and b are:
the shape of W1 is (400, 25) and the shape of b1 is (25,)
the shape of W2 is (25, 15) and the shape of b2 is (15,)
the shape of W3 is (15, 10) and the shape of b3 is (10,)
Note: The bias vector b could be represented as a 1-D (n,) or 2-D (n,1) array. Tensorflow utilizes a 1-D representation and this lab will maintain that convention.
Tensorflow models are built layer by layer. A layer's input dimensions ($s_{in}$ above) are calculated for you. You specify a layer's output dimensions and this determines the next layer's input dimension. The input dimension of the first layer is derived from the size of the input data specified in the model.fit statement below.
Note: It is also possible to add an input layer that specifies the input dimension of the first layer. For example:
tf.keras.Input(shape=(400,)), #specify input shape
We will include that here to illuminate some model sizing.
As described in the lecture and the optional softmax lab, numerical stability is improved if the softmax is grouped with the loss function rather than the output layer during training. This has implications when building the model and using the model.
Building: the final Dense layer uses a 'linear' activation (effectively no activation), and the model.compile statement will indicate that the softmax is grouped with the loss by including from_logits=True:
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
Using the model: the model then outputs logits rather than probabilities. If probabilities are needed, apply a softmax to the model outputs, as demonstrated later in this notebook.
Below, use the Keras Sequential model and Dense layers with ReLU activations to construct the three-layer network described above.
# UNQ_C2
# GRADED CELL: Sequential model
tf.random.set_seed(1234) # for consistent results
model = Sequential(
    [
        ### START CODE HERE ###
        tf.keras.Input(shape=(400,)),
        Dense(units=25, activation='relu'),
        Dense(units=15, activation='relu'),
        Dense(units=10, activation='linear'),
        ### END CODE HERE ###
    ], name = "my_model"
)
model.summary()
Model: "my_model" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense_3 (Dense) (None, 25) 10025 dense_4 (Dense) (None, 15) 390 dense_5 (Dense) (None, 10) 160 ================================================================= Total params: 10,575 Trainable params: 10,575 Non-trainable params: 0 _________________________________________________________________
Model: "my_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
L1 (Dense) (None, 25) 10025
_________________________________________________________________
L2 (Dense) (None, 15) 390
_________________________________________________________________
L3 (Dense) (None, 10) 160
=================================================================
Total params: 10,575
Trainable params: 10,575
Non-trainable params: 0
_________________________________________________________________
tf.random.set_seed(1234)
model = Sequential(
    [
        ### START CODE HERE ###
        tf.keras.Input(shape=(400,)),                 # @REPLACE
        Dense(25, activation='relu', name = "L1"),    # @REPLACE
        Dense(15, activation='relu', name = "L2"),    # @REPLACE
        Dense(10, activation='linear', name = "L3"),  # @REPLACE
        ### END CODE HERE ###
    ], name = "my_model"
)
# BEGIN UNIT TEST
test_model(model, 10, 400)
# END UNIT TEST
All tests passed!
The parameter counts shown in the summary correspond to the number of elements in the weight and bias arrays as shown below.
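For example, a Dense layer with $s_{in}$ inputs and $s_{out}$ units has $s_{in} \times s_{out}$ weights plus $s_{out}$ biases. A quick check (illustrative only, not part of the assignment) reproduces the counts in the summary:
# Parameter count per layer: weights (s_in * s_out) plus one bias per unit (s_out)
print(400 * 25 + 25)   # layer 1 -> 10025
print( 25 * 15 + 15)   # layer 2 -> 390
print( 15 * 10 + 10)   # layer 3 -> 160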
Let's further examine the weights to verify that tensorflow produced the same dimensions as we calculated above.
[layer1, layer2, layer3] = model.layers
#### Examine Weights shapes
W1,b1 = layer1.get_weights()
W2,b2 = layer2.get_weights()
W3,b3 = layer3.get_weights()
print(f"W1 shape = {W1.shape}, b1 shape = {b1.shape}")
print(f"W2 shape = {W2.shape}, b2 shape = {b2.shape}")
print(f"W3 shape = {W3.shape}, b3 shape = {b3.shape}")
W1 shape = (400, 25), b1 shape = (25,)
W2 shape = (25, 15), b2 shape = (15,)
W3 shape = (15, 10), b3 shape = (10,)
Expected Output
W1 shape = (400, 25), b1 shape = (25,)
W2 shape = (25, 15), b2 shape = (15,)
W3 shape = (15, 10), b3 shape = (10,)
The following code compiles and trains the model: it defines a loss function, SparseCategoricalCrossentropy, and indicates the softmax should be included with the loss calculation by adding from_logits=True. It also selects the Adam optimizer with a learning rate of 0.001. The call to model.fit then runs training for 40 epochs.
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)

history = model.fit(
    X,y,
    epochs=40
)
Epoch 1/40 157/157 [==============================] - 1s 2ms/step - loss: 1.7094 Epoch 2/40 157/157 [==============================] - 0s 2ms/step - loss: 0.7480 Epoch 3/40 157/157 [==============================] - 0s 2ms/step - loss: 0.4428 Epoch 4/40 157/157 [==============================] - 0s 2ms/step - loss: 0.3463 Epoch 5/40 157/157 [==============================] - 0s 2ms/step - loss: 0.2977 Epoch 6/40 157/157 [==============================] - 0s 2ms/step - loss: 0.2630 Epoch 7/40 157/157 [==============================] - 0s 2ms/step - loss: 0.2361 Epoch 8/40 157/157 [==============================] - 0s 2ms/step - loss: 0.2131 Epoch 9/40 157/157 [==============================] - 0s 2ms/step - loss: 0.2004 Epoch 10/40 157/157 [==============================] - 0s 2ms/step - loss: 0.1805 Epoch 11/40 157/157 [==============================] - 0s 2ms/step - loss: 0.1692 Epoch 12/40 157/157 [==============================] - 0s 2ms/step - loss: 0.1580 Epoch 13/40 157/157 [==============================] - 0s 2ms/step - loss: 0.1507 Epoch 14/40 157/157 [==============================] - 0s 2ms/step - loss: 0.1396 Epoch 15/40 157/157 [==============================] - 0s 2ms/step - loss: 0.1289 Epoch 16/40 157/157 [==============================] - 0s 2ms/step - loss: 0.1255 Epoch 17/40 157/157 [==============================] - 0s 2ms/step - loss: 0.1154 Epoch 18/40 157/157 [==============================] - 0s 2ms/step - loss: 0.1102 Epoch 19/40 157/157 [==============================] - 0s 2ms/step - loss: 0.1016 Epoch 20/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0970 Epoch 21/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0926 Epoch 22/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0891 Epoch 23/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0828 Epoch 24/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0785 Epoch 25/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0755 Epoch 26/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0713 Epoch 27/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0701 Epoch 28/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0617 Epoch 29/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0578 Epoch 30/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0550 Epoch 31/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0511 Epoch 32/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0499 Epoch 33/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0462 Epoch 34/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0437 Epoch 35/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0422 Epoch 36/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0396 Epoch 37/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0366 Epoch 38/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0344 Epoch 39/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0312 Epoch 40/40 157/157 [==============================] - 0s 2ms/step - loss: 0.0294
In the fit statement above, the number of epochs was set to 40. This specifies that the entire data set should be applied during training 40 times. During training, you see output describing the progress of training that looks like this:
Epoch 1/40
157/157 [==============================] - 0s 1ms/step - loss: 2.2770
The first line, Epoch 1/40, describes which epoch the model is currently running. For efficiency, the training data set is broken into 'batches'. The default size of a batch in Tensorflow is 32. There are 5000 examples in our data set, or roughly 157 batches. The notation on the 2nd line, 157/157 [====, describes which batch has been executed.
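As a quick arithmetic check of the batch count (illustrative only):
import math
print(math.ceil(5000 / 32))   # 5000 examples / batch size of 32 -> 157 batches per epoch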
In course 1, we learned to track the progress of gradient descent by monitoring the cost. Ideally, the cost will decrease as the number of iterations of the algorithm increases. Tensorflow refers to the cost as loss. Above, you saw the loss displayed each epoch as model.fit was executing. The .fit method returns a variety of metrics including the loss. This is captured in the history variable above and can be used to examine the loss in a plot, as shown below.
plot_loss_tf(history)
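plot_loss_tf is a helper provided with this lab. For reference, a minimal sketch of doing the same thing directly with matplotlib (assuming the history object returned by model.fit above) could look like this:
# Minimal sketch: plot the per-epoch loss stored by Keras in history.history['loss']
fig_loss, ax_loss = plt.subplots()
ax_loss.plot(history.history['loss'])
ax_loss.set_xlabel('epoch')
ax_loss.set_ylabel('loss')
ax_loss.set_title('training loss')
plt.show()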
To make a prediction, use Keras predict. Below, X[1015] contains an image of a two.
image_of_two = X[1015]
display_digit(image_of_two)
prediction = model.predict(image_of_two.reshape(1,400)) # prediction
print(f" predicting a Two: \n{prediction}")
print(f" Largest Prediction index: {np.argmax(prediction)}")
predicting a Two: 
[[ -7.99  -2.23   0.77  -2.41 -11.66 -11.15  -9.53  -3.36  -4.42  -7.17]]
 Largest Prediction index: 2
The largest output is prediction[2], indicating the predicted digit is a '2'. If the problem only requires a selection, that is sufficient. Use NumPy argmax to select it. If the problem requires a probability, a softmax is required:
prediction_p = tf.nn.softmax(prediction)
print(f" predicting a Two. Probability vector: \n{prediction_p}")
print(f"Total of predictions: {np.sum(prediction_p):0.3f}")
predicting a Two. Probability vector: 
[[1.42e-04 4.49e-02 8.98e-01 3.76e-02 3.61e-06 5.97e-06 3.03e-05 1.44e-02 5.03e-03 3.22e-04]]
Total of predictions: 1.000
To return an integer representing the predicted target, you want the index of the largest probability. This is accomplished with the Numpy argmax function.
yhat = np.argmax(prediction_p)
print(f"np.argmax(prediction_p): {yhat}")
np.argmax(prediction_p): 2
Let's compare the predictions vs the labels for a random sample of 64 digits. This takes a moment to run.
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# You do not need to modify anything in this cell
m, n = X.shape
fig, axes = plt.subplots(8,8, figsize=(5,5))
fig.tight_layout(pad=0.13,rect=[0, 0.03, 1, 0.91]) #[left, bottom, right, top]
widgvis(fig)
for i,ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)
    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20,20)).T
    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')
    # Predict using the Neural Network
    prediction = model.predict(X[random_index].reshape(1,400))
    prediction_p = tf.nn.softmax(prediction)
    yhat = np.argmax(prediction_p)
    # Display the label above the image
    ax.set_title(f"{y[random_index,0]},{yhat}",fontsize=10)
    ax.set_axis_off()
fig.suptitle("Label, yhat", fontsize=14)
plt.show()
Let's look at some of the errors.
Note: increasing the number of training epochs can eliminate the errors on this data set.
print( f"{display_errors(model,X,y)} errors out of {len(X)} images")
15 errors out of 5000 images
You have successfully built and utilized a neural network to do multiclass classification.
Important Note: Please only do this when you've already passed the assignment to avoid problems with the autograder.
Here's a short demo of how to do the steps above: