Exercise 12.1 - Solution

Weights and activations of a convolutional network

This task consists of three parts: set up, train, and evaluate a convolutional neural network, then inspect its filter weights and activations.

  1. Set up and train a convolutional network on the classification of images (data set CIFAR-10). Train your network to at least 70% test accuracy.
    • Plot the confusion matrix. What do you observe?
    • Plot several falsely classified images along with the predicted class scores. What kinds of misclassification do you observe, why do they occur?
  2. Plot the filter weights in the first layer of your network and see if you can make any sense of it.
  3. Visualize the activations in the first two layers of your network for two input images of choice and describe what you see.
    • Does it meet your expectations?
In [2]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

layers = keras.layers
print("keras", keras.__version__)
keras 2.4.0

Download CIFAR-10 data

In [2]:
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
In [3]:
x_train = (x_train - 128.) / 128.
x_test = (x_test - 128.) / 128.
In [4]:
y_train_one_hot = tf.keras.utils.to_categorical(y_train)
y_test_one_hot = tf.keras.utils.to_categorical(y_test)
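As a quick sanity check, the normalization `(x - 128) / 128` maps uint8 pixels from [0, 255] into roughly [-1, 1), and the one-hot encoding can be reproduced without Keras (a minimal sketch using synthetic arrays in place of the actual CIFAR-10 data, to avoid the download):

```python
import numpy as np

# synthetic stand-in for a batch of CIFAR-10 images (uint8 range 0..255)
x = np.random.randint(0, 256, size=(4, 32, 32, 3)).astype(np.float32)
x_norm = (x - 128.) / 128.
print(x_norm.min() >= -1.0, x_norm.max() < 1.0)  # True True: range is [-1, 127/128]

# one-hot encoding by indexing an identity matrix: labels 0..9 -> shape (N, 10)
y = np.array([3, 1, 9, 0])
one_hot = np.eye(10)[y]
print(one_hot.shape)  # (4, 10)
```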

Design, train, and evaluate a simple CNN

In [5]:
model = keras.models.Sequential([
    layers.Convolution2D(16, kernel_size=(5, 5), padding="same", activation='elu', input_shape=(32, 32, 3)),
    layers.Convolution2D(16, kernel_size=(3, 3), padding="same", activation='elu'),
    layers.Convolution2D(32, kernel_size=(3, 3), padding="same", strides=(2, 2), activation='elu'),
    layers.Convolution2D(32, kernel_size=(3, 3), padding="same", activation='elu'),
    layers.Convolution2D(64, kernel_size=(3, 3), padding="same", strides=(2, 2), activation='elu'),
    layers.Convolution2D(64, kernel_size=(3, 3), padding="same", activation='elu'),
    layers.GlobalMaxPooling2D(),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
    ])

print(model.summary())
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 32, 32, 16)        1216      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 16)        2320      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 16, 32)        4640      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 32)        9248      
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 64)          18496     
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 8, 8, 64)          36928     
_________________________________________________________________
global_max_pooling2d (Global (None, 64)                0         
_________________________________________________________________
dropout (Dropout)            (None, 64)                0         
_________________________________________________________________
dense (Dense)                (None, 10)                650       
=================================================================
Total params: 73,498
Trainable params: 73,498
Non-trainable params: 0
_________________________________________________________________
None
In [6]:
model.compile(
    loss='categorical_crossentropy',
    optimizer=keras.optimizers.Adam(1e-3),
    metrics=['accuracy'])

results = model.fit(x_train, y_train_one_hot,
                    batch_size=32,
                    epochs=30,
                    verbose=1,
                    validation_split=0.1,
                    callbacks = [keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=3, verbose=1, min_lr=1e-5)],
                    )
Epoch 1/30
1407/1407 [==============================] - 12s 6ms/step - loss: 1.8360 - accuracy: 0.3233 - val_loss: 1.5533 - val_accuracy: 0.4298
Epoch 2/30
1407/1407 [==============================] - 9s 6ms/step - loss: 1.6050 - accuracy: 0.4152 - val_loss: 1.4613 - val_accuracy: 0.4790
Epoch 3/30
1407/1407 [==============================] - 8s 6ms/step - loss: 1.5268 - accuracy: 0.4452 - val_loss: 1.3447 - val_accuracy: 0.5208
Epoch 4/30
1407/1407 [==============================] - 9s 6ms/step - loss: 1.4726 - accuracy: 0.4673 - val_loss: 1.2960 - val_accuracy: 0.5354
Epoch 5/30
1407/1407 [==============================] - 8s 6ms/step - loss: 1.4288 - accuracy: 0.4818 - val_loss: 1.2554 - val_accuracy: 0.5398
Epoch 6/30
1407/1407 [==============================] - 8s 6ms/step - loss: 1.3708 - accuracy: 0.5032 - val_loss: 1.2058 - val_accuracy: 0.5622
Epoch 7/30
1407/1407 [==============================] - 8s 6ms/step - loss: 1.3235 - accuracy: 0.5223 - val_loss: 1.1836 - val_accuracy: 0.5664
Epoch 8/30
1407/1407 [==============================] - 9s 6ms/step - loss: 1.2728 - accuracy: 0.5407 - val_loss: 1.1328 - val_accuracy: 0.6022
Epoch 9/30
1407/1407 [==============================] - 9s 6ms/step - loss: 1.2249 - accuracy: 0.5609 - val_loss: 1.0718 - val_accuracy: 0.6208
Epoch 10/30
1407/1407 [==============================] - 9s 6ms/step - loss: 1.1826 - accuracy: 0.5768 - val_loss: 1.0285 - val_accuracy: 0.6420
Epoch 11/30
1407/1407 [==============================] - 9s 6ms/step - loss: 1.1417 - accuracy: 0.5897 - val_loss: 1.0126 - val_accuracy: 0.6446
Epoch 12/30
1407/1407 [==============================] - 9s 6ms/step - loss: 1.1136 - accuracy: 0.6022 - val_loss: 0.9534 - val_accuracy: 0.6706
Epoch 13/30
1407/1407 [==============================] - 9s 6ms/step - loss: 1.0864 - accuracy: 0.6125 - val_loss: 0.9470 - val_accuracy: 0.6700
Epoch 14/30
1407/1407 [==============================] - 9s 6ms/step - loss: 1.0531 - accuracy: 0.6255 - val_loss: 0.9240 - val_accuracy: 0.6852
Epoch 15/30
1407/1407 [==============================] - 9s 6ms/step - loss: 1.0365 - accuracy: 0.6314 - val_loss: 0.9002 - val_accuracy: 0.6838
Epoch 16/30
1407/1407 [==============================] - 9s 6ms/step - loss: 1.0116 - accuracy: 0.6396 - val_loss: 0.9108 - val_accuracy: 0.6872
Epoch 17/30
1407/1407 [==============================] - 9s 6ms/step - loss: 0.9872 - accuracy: 0.6477 - val_loss: 0.8913 - val_accuracy: 0.6906
Epoch 18/30
1407/1407 [==============================] - 9s 6ms/step - loss: 0.9750 - accuracy: 0.6532 - val_loss: 0.9053 - val_accuracy: 0.6940
Epoch 19/30
1407/1407 [==============================] - 9s 6ms/step - loss: 0.9594 - accuracy: 0.6614 - val_loss: 0.8879 - val_accuracy: 0.6870
Epoch 20/30
1407/1407 [==============================] - 9s 6ms/step - loss: 0.9404 - accuracy: 0.6662 - val_loss: 0.8753 - val_accuracy: 0.6926
Epoch 21/30
1407/1407 [==============================] - 9s 6ms/step - loss: 0.9293 - accuracy: 0.6694 - val_loss: 0.8855 - val_accuracy: 0.6928
Epoch 22/30
1407/1407 [==============================] - 9s 6ms/step - loss: 0.9138 - accuracy: 0.6742 - val_loss: 0.8447 - val_accuracy: 0.7040
Epoch 23/30
1407/1407 [==============================] - 9s 6ms/step - loss: 0.9012 - accuracy: 0.6796 - val_loss: 0.8478 - val_accuracy: 0.7104
Epoch 24/30
1407/1407 [==============================] - 8s 6ms/step - loss: 0.8835 - accuracy: 0.6830 - val_loss: 0.8616 - val_accuracy: 0.7012
Epoch 25/30
1407/1407 [==============================] - 8s 6ms/step - loss: 0.8798 - accuracy: 0.6858 - val_loss: 0.8647 - val_accuracy: 0.6962

Epoch 00025: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 26/30
1407/1407 [==============================] - 9s 6ms/step - loss: 0.7754 - accuracy: 0.7240 - val_loss: 0.7802 - val_accuracy: 0.7330
Epoch 27/30
1407/1407 [==============================] - 8s 6ms/step - loss: 0.7374 - accuracy: 0.7356 - val_loss: 0.7827 - val_accuracy: 0.7248
Epoch 28/30
1407/1407 [==============================] - 9s 6ms/step - loss: 0.7168 - accuracy: 0.7403 - val_loss: 0.7784 - val_accuracy: 0.7374
Epoch 29/30
1407/1407 [==============================] - 9s 6ms/step - loss: 0.6996 - accuracy: 0.7496 - val_loss: 0.7995 - val_accuracy: 0.7272
Epoch 30/30
1407/1407 [==============================] - 8s 6ms/step - loss: 0.6914 - accuracy: 0.7500 - val_loss: 0.7921 - val_accuracy: 0.7320
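Note that the 70% target in the task refers to the held-out test set, while the log above reports validation accuracy on a 10% split of the training data; a call like `model.evaluate(x_test, y_test_one_hot)` confirms it. Equivalently, accuracy is the fraction of samples whose arg-max class score matches the truth (a minimal sketch with synthetic arrays standing in for the real predictions):

```python
import numpy as np

# synthetic stand-in: 5 samples, 10 classes; one predicted class per row
y_true = np.array([3, 1, 9, 0, 3])
scores = np.zeros((5, 10))
scores[np.arange(5), [3, 1, 9, 2, 3]] = 1.0

# accuracy = fraction of samples where the arg-max score hits the true label
accuracy = np.mean(np.argmax(scores, axis=1) == y_true)
print(accuracy)  # 0.8 (4 of 5 correct)
```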
In [7]:
plt.figure(1, (12, 4))
plt.subplot(1, 2, 1)
plt.plot(results.history['loss'])
plt.plot(results.history['val_loss'])
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper right')

plt.subplot(1, 2, 2)
plt.plot(results.history['accuracy'])
plt.plot(results.history['val_accuracy'])
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.tight_layout()

Plot predictions

Task

Investigate the predictions of the trained model.

In [8]:
def plot_prediction(X, Y, Y_predict, fname=False):
    """
    Plot image X along with predicted probabilities Y_predict.
    X: CIFAR image, shape = (32, 32, 3)
    Y: CIFAR label, one-hot encoded, shape = (10)
    Y_predict: predicted probabilities, shape = (10)
    """
    X = 128 * X + 128
    labels = np.array(['airplane', 'automobile', 'bird', 'cat', 'deer',
                       'dog', 'frog', 'horse', 'ship', 'truck'])

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))

    # plot image
    ax1.imshow(X.astype('uint8'), origin='upper')
    ax1.set(xticks=[], yticks=[])

    # plot probabilities
    ax2.barh(np.arange(10), Y_predict, align='center')
    ax2.set(xlim=(0, 1), xlabel='Score', yticks=[])

    for i in range(10):
        c = 'red' if (i == np.argmax(Y)) else 'black'
        ax2.text(0.05, i, labels[i].capitalize(), ha='left', va='center', color=c)
In [9]:
y_pred = model.predict(x_test, verbose=1).squeeze()
313/313 [==============================] - 1s 3ms/step
In [10]:
idx_1 = np.random.choice(len(x_test), 1)[0]
plot_prediction(x_test[idx_1], y_test_one_hot[idx_1], y_pred[idx_1])
In [11]:
idx_2 = np.random.choice(len(x_test), 1)[0]
plot_prediction(x_test[idx_2], y_test_one_hot[idx_2], y_pred[idx_2])

Plot confusion matrix

Task

Plot the confusion matrix and comment on your findings

In [12]:
def plot_confusion(Y_true, Y_predict, fname=False):
    """
    Plot confusion matrix
    Y_true:    array of true classifications (0-9), shape = (N)
    Y_predict: array of predicted classifications (0-9), shape = (N)
    """
    labels = np.array(['airplane', 'automobile', 'bird', 'cat', 'deer',
                       'dog', 'frog', 'horse', 'ship', 'truck'])
    C = np.histogram2d(Y_true, Y_predict, bins=np.linspace(-0.5, 9.5, 11))[0]
    Cn = C / np.sum(C, axis=1, keepdims=True)  # row-normalize: each true class sums to 1

    fig = plt.figure(figsize=(8, 8))
    plt.imshow(Cn, interpolation='nearest', vmin=0, vmax=1, cmap=plt.cm.YlGnBu)
    plt.colorbar()
    plt.xlabel('prediction')
    plt.ylabel('truth')
    plt.xticks(range(10), labels, rotation='vertical')
    plt.yticks(range(10), labels)

    for x in range(10):
        for y in range(10):
            plt.annotate('%i' % C[x, y], xy=(y, x), ha='center', va='center')

    plt.tight_layout()
In [13]:
y_predict_cl = np.argmax(y_pred, axis=1)
y_test_cl = np.argmax(y_test_one_hot, axis=1)

plot_confusion(y_test_cl, y_predict_cl)

Most misclassifications occur between classes whose samples share similar shapes, colors, and structures, and which therefore produce similar features in the CNN: for example, automobiles and trucks, birds and airplanes (wings), cats and dogs, or airplanes and ships (similar backgrounds).
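These confusions can be quantified directly from the confusion matrix: per-class accuracy is the diagonal of the row-normalized matrix, and the largest off-diagonal entry names the most frequent confusion (a sketch on a small synthetic 3-class matrix; the values are placeholders, not the actual results):

```python
import numpy as np

labels = ['airplane', 'automobile', 'bird']
# synthetic 3-class confusion matrix, rows = truth, columns = prediction
C = np.array([[80.,  5., 15.],
              [ 4., 90.,  6.],
              [20.,  3., 77.]])

Cn = C / C.sum(axis=1, keepdims=True)  # row-normalize so each row sums to 1
per_class_acc = np.diag(Cn)            # fraction correct per true class
print(per_class_acc)

# most frequent confusion: largest off-diagonal entry
off = C.copy()
np.fill_diagonal(off, 0)
i, j = np.unravel_index(np.argmax(off), off.shape)
print('%s -> %s (%i samples)' % (labels[i], labels[j], off[i, j]))  # bird -> airplane (20 samples)
```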

Plot filter weights of the first layer

In [14]:
# ----------------------------------------------------------
# Plot the filters in the first convolution layer
# ----------------------------------------------------------
conv1 = model.layers[0]

W1, b1 = conv1.get_weights()
W1 = (W1 * 128 + 128).astype(int)

nx, ny, nc, nf = W1.shape
n = np.ceil(nf**.5).astype(int)
fig, axes = plt.subplots(n, n, figsize=(7, 7))
fig.subplots_adjust(left=0.05, bottom=0.05, right=0.95, top=0.95, hspace=0, wspace=0)


for i in range(n**2):
    ax = axes.flat[i]

    if i < nf:
        ax.imshow(W1[..., i], origin='upper')
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
    else:
        ax.axis('Off')

fig.tight_layout()
fig.suptitle('16 convolutional filters', va='bottom')
Out[14]:
Text(0.5, 0.98, '16 convolutional filters')
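The fixed affine map `W1 * 128 + 128` assumes the weights lie roughly in [-1, 1]; since trained first-layer weights are usually much smaller, the plots come out low-contrast. Scaling each filter independently to the full [0, 255] range often reveals the structure better (a sketch with synthetic weights standing in for the trained ones):

```python
import numpy as np

# synthetic stand-in for first-layer weights, shape (5, 5, 3, 16)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.2, size=(5, 5, 3, 16))

# per-filter min-max scaling: each filter's smallest weight -> 0, largest -> 255
Wmin = W.min(axis=(0, 1, 2), keepdims=True)
Wmax = W.max(axis=(0, 1, 2), keepdims=True)
W_img = (255 * (W - Wmin) / (Wmax - Wmin)).astype(np.uint8)
print(W_img.min(), W_img.max())  # 0 255
```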

Plot activations in convolutional layers

Task

Visualize the activations in the convolutional layers of your CNN.

In [15]:
def visualize_activation(A, name='conv'):
    nx, ny, nf = A.shape
    n = np.ceil(nf**.5).astype(int)
    fig, axes = plt.subplots(n, n, figsize=(8, 8))
    fig.subplots_adjust(left=0.05, bottom=0.05, right=0.95, top=0.95, hspace=0, wspace=0)

    for i in range(n**2):
        ax = axes.flat[i]

        if i < nf:
            ax.imshow(A[..., i], origin='upper', cmap=plt.cm.Greys)
            ax.xaxis.set_visible(False)
            ax.yaxis.set_visible(False)
        else:
            ax.axis('Off')

    plt.tight_layout()
    fig.suptitle(name, va='bottom')
    plt.show()

Task

Visualize the activations in the first two layers of your network for two input images of choice and describe what you see.

In [16]:
import tensorflow as tf
tf.get_logger().setLevel('ERROR')  # remove annoying TF warnings

conv_layers = [l for l in model.layers if isinstance(l, layers.Conv2D)]

for conv in conv_layers:
    conv_model = keras.models.Model(model.inputs, [conv.output])
    Xin = x_test[idx_1][np.newaxis]
    Xout1 = conv_model.predict(Xin)[0]
    visualize_activation(Xout1, 'image:%i  -  layer: %s' % (idx_1, conv.name))
In [17]:
for conv in conv_layers:
    conv_model = keras.models.Model(model.inputs, [conv.output])
    Xin = x_test[idx_2][np.newaxis]
    Xout1 = conv_model.predict(Xin)[0]
    visualize_activation(Xout1, 'image:%i  -  layer: %s' % (idx_2, conv.name))

Whereas in the first few layers the workings of the CNN can, to some extent, be interpreted as an application of simple filters from classical image processing, this interpretation breaks down for deeper layers. To gain insight into the working principles of a CNN, more sophisticated methods are needed; see, e.g., Exercise 12.2.