import keras
keras.__version__
Using TensorFlow backend.
'2.0.8'
This notebook contains the code sample found in Chapter 5, Section 2 of Deep Learning with Python. Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.
Having to train an image classification model using very little data is a common situation, one you will likely encounter in practice if you ever do computer vision in a professional context.
Having "few" samples can mean anywhere from a few hundreds to a few tens of thousands of images. As a practical example, we will focus on classifying images as "dogs" or "cats", in a dataset containing 4000 pictures of cats and dogs (2000 cats, 2000 dogs). We will use 2000 pictures for training, 1000 for validation, and finally 1000 for testing.
In this section, we will review one basic strategy to tackle this problem: training a new model from scratch on what little data we have. We will start by naively training a small convnet on our 2000 training samples, without any regularization, to set a baseline for what can be achieved. This will get us to a classification accuracy of 71%. At that point, our main issue will be overfitting. Then we will introduce data augmentation, a powerful technique for mitigating overfitting in computer vision. By leveraging data augmentation, we will improve our network to reach an accuracy of 82%.
In the next section, we will review two more essential techniques for applying deep learning to small datasets: doing feature extraction with a pre-trained network (this will get us to an accuracy of 90% to 93%), and fine-tuning a pre-trained network (this will get us to our final accuracy of 95%). Together, these three strategies -- training a small model from scratch, doing feature extraction using a pre-trained model, and fine-tuning a pre-trained model -- will constitute your future toolbox for tackling the problem of doing computer vision with small datasets.
You will sometimes hear that deep learning only works when lots of data is available. This is in part a valid point: one fundamental characteristic of deep learning is that it is able to find interesting features in the training data on its own, without any need for manual feature engineering, and this can only be achieved when lots of training examples are available. This is especially true for problems where the input samples are very high-dimensional, like images.
However, what constitutes "lots" of samples is relative -- relative to the size and depth of the network you are trying to train, for starters. It isn't possible to train a convnet to solve a complex problem with just a few tens of samples, but a few hundreds can potentially suffice if the model is small and well-regularized and if the task is simple. Because convnets learn local, translation-invariant features, they are very data-efficient on perceptual problems. Training a convnet from scratch on a very small image dataset will still yield reasonable results despite a relative lack of data, without the need for any custom feature engineering. You will see this in action in this section.
But what's more, deep learning models are by nature highly repurposable: you can take, say, an image classification or speech-to-text model trained on a large-scale dataset then reuse it on a significantly different problem with only minor changes. Specifically, in the case of computer vision, many pre-trained models (usually trained on the ImageNet dataset) are now publicly available for download and can be used to bootstrap powerful vision models out of very little data. That's what we will do in the next section.
For now, let's get started by getting our hands on the data.
The cats vs. dogs dataset that we will use isn't packaged with Keras. It was made available by Kaggle.com as part of a computer vision competition in late 2013, back when convnets weren't quite mainstream. You can download the original dataset at:
https://www.kaggle.com/c/dogs-vs-cats/data
(you will need to create a Kaggle account if you don't already have one -- don't worry, the process is painless).
The pictures are medium-resolution color JPEGs. They look like this:
Unsurprisingly, the cats vs. dogs Kaggle competition in 2013 was won by entrants who used convnets. The best entries could achieve up to 95% accuracy. In our own example, we will get fairly close to this accuracy (in the next section), even though we will be training our models on less than 10% of the data that was available to the competitors. This original dataset contains 25,000 images of dogs and cats (12,500 from each class) and is 543MB large (compressed). After downloading and uncompressing it, we will create a new dataset containing three subsets: a training set with 1000 samples of each class, a validation set with 500 samples of each class, and finally a test set with 500 samples of each class.
Here are a few lines of code to do this:
import os, shutil
# The path to the directory where the original
# dataset was uncompressed
original_dataset_dir = '/Users/fchollet/Downloads/kaggle_original_data'
# The directory where we will
# store our smaller dataset
base_dir = '/Users/fchollet/Downloads/cats_and_dogs_small'
os.mkdir(base_dir)
# Directories for our training,
# validation and test splits
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
validation_dir = os.path.join(base_dir, 'validation')
os.mkdir(validation_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)
# Directory with our training cat pictures
train_cats_dir = os.path.join(train_dir, 'cats')
os.mkdir(train_cats_dir)
# Directory with our training dog pictures
train_dogs_dir = os.path.join(train_dir, 'dogs')
os.mkdir(train_dogs_dir)
# Directory with our validation cat pictures
validation_cats_dir = os.path.join(validation_dir, 'cats')
os.mkdir(validation_cats_dir)
# Directory with our validation dog pictures
validation_dogs_dir = os.path.join(validation_dir, 'dogs')
os.mkdir(validation_dogs_dir)
# Directory with our test cat pictures
test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir)
# Directory with our test dog pictures
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)
# Copy first 1000 cat images to train_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)

# Copy next 500 cat images to validation_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_cats_dir, fname)
    shutil.copyfile(src, dst)

# Copy next 500 cat images to test_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)

# Copy first 1000 dog images to train_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)

# Copy next 500 dog images to validation_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_dogs_dir, fname)
    shutil.copyfile(src, dst)

# Copy next 500 dog images to test_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)
As a sanity check, let's count how many pictures we have in each split (train/validation/test):
print('total training cat images:', len(os.listdir(train_cats_dir)))
total training cat images: 1000
print('total training dog images:', len(os.listdir(train_dogs_dir)))
total training dog images: 1000
print('total validation cat images:', len(os.listdir(validation_cats_dir)))
total validation cat images: 500
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))
total validation dog images: 500
print('total test cat images:', len(os.listdir(test_cats_dir)))
total test cat images: 500
print('total test dog images:', len(os.listdir(test_dogs_dir)))
total test dog images: 500
So we have indeed 2000 training images, and then 1000 validation images and 1000 test images. In each split, there is the same number of samples from each class: this is a balanced binary classification problem, which means that classification accuracy will be an appropriate measure of success.
We've already built a small convnet for MNIST in the previous example, so you should be familiar with them. We will reuse the same general structure: our convnet will be a stack of alternated Conv2D (with relu activation) and MaxPooling2D layers.
However, since we are dealing with bigger images and a more complex problem, we will make our network accordingly larger: it will have one more Conv2D + MaxPooling2D stage. This serves both to augment the capacity of the network, and to further reduce the size of the feature maps, so that they aren't overly large when we reach the Flatten layer. Here, since we start from inputs of size 150x150 (a somewhat arbitrary choice), we end up with feature maps of size 7x7 right before the Flatten layer.
Note that the depth of the feature maps is progressively increasing in the network (from 32 to 128), while the size of the feature maps is decreasing (from 148x148 to 7x7). This is a pattern that you will see in almost all convnets.
Since we are attacking a binary classification problem, we are ending the network with a single unit (a Dense layer of size 1) and a sigmoid activation. This unit will encode the probability that the network is looking at one class or the other.
from keras import layers
from keras import models
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Let's take a look at how the dimensions of the feature maps change with every successive layer:
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 148, 148, 32)      896
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 74, 74, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 72, 72, 64)        18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 36, 36, 64)        0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 34, 34, 128)       73856
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 17, 17, 128)       0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 15, 15, 128)       147584
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 128)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 6272)              0
_________________________________________________________________
dense_1 (Dense)              (None, 512)               3211776
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 513
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________
For our compilation step, we'll go with the RMSprop optimizer as usual. Since we ended our network with a single sigmoid unit, we will use binary crossentropy as our loss (as a reminder, check out the table in Chapter 4, section 5 for a cheatsheet on what loss function to use in various situations).
from keras import optimizers
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])
As you already know by now, data should be formatted into appropriately pre-processed floating point tensors before being fed into our network. Currently, our data sits on a drive as JPEG files, so the steps for getting it into our network are roughly:

- Read the picture files.
- Decode the JPEG content to RGB grids of pixels.
- Convert these into floating point tensors.
- Rescale the pixel values (between 0 and 255) to the [0, 1] interval (neural networks prefer to deal with small input values).
It may seem a bit daunting, but thankfully Keras has utilities to take care of these steps automatically. Keras has a module with image processing helper tools, located at keras.preprocessing.image. In particular, it contains the class ImageDataGenerator which allows us to quickly set up Python generators that can automatically turn image files on disk into batches of pre-processed tensors. This is what we will use here.
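To make those steps concrete, here is a minimal sketch of how one could perform them by hand for a single image, using the load_img and img_to_array helpers from keras.preprocessing.image (the specific file name is just an illustrative choice); ImageDataGenerator simply automates this over whole directories and batches:

import os
import numpy as np
from keras.preprocessing import image

# One of the training cat pictures copied above (file name is illustrative)
img_path = os.path.join(train_cats_dir, 'cat.0.jpg')

# Read the file and decode the JPEG into an image, resized to 150x150
img = image.load_img(img_path, target_size=(150, 150))

# Convert it to a float32 Numpy array of shape (150, 150, 3)
x = image.img_to_array(img)

# Rescale pixel values from [0, 255] to [0, 1]
x /= 255.

print(x.shape, x.min(), x.max())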
from keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        # This is the target directory
        train_dir,
        # All images will be resized to 150x150
        target_size=(150, 150),
        batch_size=20,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')
Found 2000 images belonging to 2 classes. Found 1000 images belonging to 2 classes.
Let's take a look at the output of one of these generators: it yields batches of 150x150 RGB images (shape (20, 150, 150, 3)) and binary labels (shape (20,)). 20 is the number of samples in each batch (the batch size). Note that the generator yields these batches indefinitely: it just loops endlessly over the images present in the target folder. For this reason, we need to break the iteration loop at some point.
for data_batch, labels_batch in train_generator:
    print('data batch shape:', data_batch.shape)
    print('labels batch shape:', labels_batch.shape)
    break
data batch shape: (20, 150, 150, 3) labels batch shape: (20,)
Let's fit our model to the data using the generator. We do it using the fit_generator method, the equivalent of fit for data generators like ours. It expects as its first argument a Python generator that will yield batches of inputs and targets indefinitely, like ours does. Because the data is being generated endlessly, the generator needs to know how many samples to draw before declaring an epoch over. This is the role of the steps_per_epoch argument: after having drawn steps_per_epoch batches from the generator, i.e. after having run for steps_per_epoch gradient descent steps, the fitting process will go to the next epoch. In our case, batches contain 20 samples each, so it will take 100 batches until we see our target of 2000 samples.
When using fit_generator, one may pass a validation_data argument, much like with the fit method. Importantly, this argument is allowed to be a data generator itself, but it could be a tuple of Numpy arrays as well. If you pass a generator as validation_data, then this generator is expected to yield batches of validation data endlessly, and thus you should also specify the validation_steps argument, which tells the process how many batches to draw from the validation generator for evaluation.
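As an illustration of the Numpy-tuple variant (we won't use it in this notebook), here is a minimal sketch of how one could materialize the validation set as arrays and pass it directly; the 50-batch count assumes the 20-sample batches configured above:

import numpy as np

# Collect the full validation set (50 batches of 20 = 1000 images) into arrays
x_val_batches, y_val_batches = [], []
for i, (x_batch, y_batch) in enumerate(validation_generator):
    x_val_batches.append(x_batch)
    y_val_batches.append(y_batch)
    if i + 1 >= 50:
        break
x_val = np.concatenate(x_val_batches)
y_val = np.concatenate(y_val_batches)

# With arrays, no validation_steps argument is needed:
# history = model.fit_generator(train_generator, steps_per_epoch=100,
#                               epochs=30, validation_data=(x_val, y_val))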
history = model.fit_generator(
      train_generator,
      steps_per_epoch=100,
      epochs=30,
      validation_data=validation_generator,
      validation_steps=50)
Epoch 1/30 100/100 [==============================] - 9s - loss: 0.6898 - acc: 0.5285 - val_loss: 0.6724 - val_acc: 0.5950 Epoch 2/30 100/100 [==============================] - 8s - loss: 0.6543 - acc: 0.6340 - val_loss: 0.6565 - val_acc: 0.5950 Epoch 3/30 100/100 [==============================] - 8s - loss: 0.6143 - acc: 0.6690 - val_loss: 0.6116 - val_acc: 0.6650 Epoch 4/30 100/100 [==============================] - 8s - loss: 0.5626 - acc: 0.7125 - val_loss: 0.5774 - val_acc: 0.6970 Epoch 5/30 100/100 [==============================] - 8s - loss: 0.5266 - acc: 0.7335 - val_loss: 0.5726 - val_acc: 0.6960 Epoch 6/30 100/100 [==============================] - 8s - loss: 0.5007 - acc: 0.7550 - val_loss: 0.6075 - val_acc: 0.6580 Epoch 7/30 100/100 [==============================] - 8s - loss: 0.4723 - acc: 0.7840 - val_loss: 0.5516 - val_acc: 0.7060 Epoch 8/30 100/100 [==============================] - 8s - loss: 0.4521 - acc: 0.7875 - val_loss: 0.5724 - val_acc: 0.6980 Epoch 9/30 100/100 [==============================] - 8s - loss: 0.4163 - acc: 0.8095 - val_loss: 0.5653 - val_acc: 0.7140 Epoch 10/30 100/100 [==============================] - 8s - loss: 0.3988 - acc: 0.8185 - val_loss: 0.5508 - val_acc: 0.7180 Epoch 11/30 100/100 [==============================] - 8s - loss: 0.3694 - acc: 0.8385 - val_loss: 0.5712 - val_acc: 0.7300 Epoch 12/30 100/100 [==============================] - 8s - loss: 0.3385 - acc: 0.8465 - val_loss: 0.6097 - val_acc: 0.7110 Epoch 13/30 100/100 [==============================] - 8s - loss: 0.3229 - acc: 0.8565 - val_loss: 0.5827 - val_acc: 0.7150 Epoch 14/30 100/100 [==============================] - 8s - loss: 0.2962 - acc: 0.8720 - val_loss: 0.5928 - val_acc: 0.7190 Epoch 15/30 100/100 [==============================] - 8s - loss: 0.2684 - acc: 0.9005 - val_loss: 0.5921 - val_acc: 0.7190 Epoch 16/30 100/100 [==============================] - 8s - loss: 0.2509 - acc: 0.8980 - val_loss: 0.6148 - val_acc: 0.7250 Epoch 17/30 100/100 [==============================] - 8s - loss: 0.2221 - acc: 0.9110 - val_loss: 0.6487 - val_acc: 0.7010 Epoch 18/30 100/100 [==============================] - 8s - loss: 0.2021 - acc: 0.9250 - val_loss: 0.6185 - val_acc: 0.7300 Epoch 19/30 100/100 [==============================] - 8s - loss: 0.1824 - acc: 0.9310 - val_loss: 0.7713 - val_acc: 0.7020 Epoch 20/30 100/100 [==============================] - 8s - loss: 0.1579 - acc: 0.9425 - val_loss: 0.6657 - val_acc: 0.7260 Epoch 21/30 100/100 [==============================] - 8s - loss: 0.1355 - acc: 0.9550 - val_loss: 0.8077 - val_acc: 0.7040 Epoch 22/30 100/100 [==============================] - 8s - loss: 0.1247 - acc: 0.9545 - val_loss: 0.7726 - val_acc: 0.7080 Epoch 23/30 100/100 [==============================] - 8s - loss: 0.1111 - acc: 0.9585 - val_loss: 0.7387 - val_acc: 0.7220 Epoch 24/30 100/100 [==============================] - 8s - loss: 0.0932 - acc: 0.9710 - val_loss: 0.8196 - val_acc: 0.7050 Epoch 25/30 100/100 [==============================] - 8s - loss: 0.0707 - acc: 0.9790 - val_loss: 0.9012 - val_acc: 0.7190 Epoch 26/30 100/100 [==============================] - 8s - loss: 0.0625 - acc: 0.9855 - val_loss: 1.0437 - val_acc: 0.6970 Epoch 27/30 100/100 [==============================] - 8s - loss: 0.0611 - acc: 0.9820 - val_loss: 0.9831 - val_acc: 0.7060 Epoch 28/30 100/100 [==============================] - 8s - loss: 0.0488 - acc: 0.9865 - val_loss: 0.9721 - val_acc: 0.7310 Epoch 29/30 100/100 [==============================] - 8s - loss: 0.0375 - acc: 0.9915 - 
val_loss: 0.9987 - val_acc: 0.7100 Epoch 30/30 100/100 [==============================] - 8s - loss: 0.0387 - acc: 0.9895 - val_loss: 1.0139 - val_acc: 0.7240
It is good practice to always save your models after training:
model.save('cats_and_dogs_small_1.h5')
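The saved HDF5 file contains the architecture, the weights, and the optimizer state, so the model can be reloaded later in a single call -- a minimal sketch:

from keras.models import load_model

# Reload the exact model saved above (architecture + weights + optimizer state)
reloaded_model = load_model('cats_and_dogs_small_1.h5')
reloaded_model.summary()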
Let's plot the loss and accuracy of the model over the training and validation data during training:
import matplotlib.pyplot as plt
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
These plots are characteristic of overfitting. Our training accuracy increases linearly over time, until it reaches nearly 100%, while our validation accuracy stalls at 70-72%. Our validation loss reaches its minimum after only five epochs then stalls, while the training loss keeps decreasing linearly until it reaches nearly 0.
Because we only have relatively few training samples (2000), overfitting is going to be our number one concern. You already know about a number of techniques that can help mitigate overfitting, such as dropout and weight decay (L2 regularization). We are now going to introduce a new one, specific to computer vision, and used almost universally when processing images with deep learning models: data augmentation.
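As a quick reminder of what weight decay looks like in Keras (we won't use it in this section -- we'll rely on augmentation and dropout instead), an L2 penalty can be attached to a layer through its kernel_regularizer argument; a minimal, hypothetical sketch for the densely-connected classifier:

from keras import regularizers

# Hypothetical variant of the Dense layer with L2 weight decay: every coefficient
# in the layer's weight matrix adds 0.001 * weight_value ** 2 to the total loss.
model.add(layers.Dense(512, activation='relu',
                       kernel_regularizer=regularizers.l2(0.001)))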
Overfitting is caused by having too few samples to learn from, rendering us unable to train a model able to generalize to new data. Given infinite data, our model would be exposed to every possible aspect of the data distribution at hand: we would never overfit. Data augmentation takes the approach of generating more training data from existing training samples, by "augmenting" the samples via a number of random transformations that yield believable-looking images. The goal is that at training time, our model would never see the exact same picture twice. This helps the model get exposed to more aspects of the data and generalize better.
In Keras, this can be done by configuring a number of random transformations to be performed on the images read by our ImageDataGenerator instance. Let's get started with an example:
datagen = ImageDataGenerator(
      rotation_range=40,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.2,
      zoom_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest')
These are just a few of the options available (for more, see the Keras documentation). Let's quickly go over what we just wrote:
- rotation_range is a value in degrees (0-180), a range within which to randomly rotate pictures.
- width_shift and height_shift are ranges (as a fraction of total width or height) within which to randomly translate pictures vertically or horizontally.
- shear_range is for randomly applying shearing transformations.
- zoom_range is for randomly zooming inside pictures.
- horizontal_flip is for randomly flipping half of the images horizontally -- relevant when there are no assumptions of horizontal asymmetry (e.g. real-world pictures).
- fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.

Let's take a look at our augmented images:
# This is a module with image preprocessing utilities
from keras.preprocessing import image
fnames = [os.path.join(train_cats_dir, fname) for fname in os.listdir(train_cats_dir)]
# We pick one image to "augment"
img_path = fnames[3]
# Read the image and resize it
img = image.load_img(img_path, target_size=(150, 150))
# Convert it to a Numpy array with shape (150, 150, 3)
x = image.img_to_array(img)
# Reshape it to (1, 150, 150, 3)
x = x.reshape((1,) + x.shape)
# The .flow() command below generates batches of randomly transformed images.
# It will loop indefinitely, so we need to `break` the loop at some point!
i = 0
for batch in datagen.flow(x, batch_size=1):
    plt.figure(i)
    imgplot = plt.imshow(image.array_to_img(batch[0]))
    i += 1
    if i % 4 == 0:
        break
plt.show()
If we train a new network using this data augmentation configuration, our network will never see the same input twice. However, the inputs it sees are still heavily intercorrelated, since they come from a small number of original images -- we cannot produce new information, we can only remix existing information. As such, this might not be quite enough to completely get rid of overfitting. To further fight overfitting, we will also add a Dropout layer to our model, right before the densely-connected classifier:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])
Let's train our network using data augmentation and dropout:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,)
# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
        # This is the target directory
        train_dir,
        # All images will be resized to 150x150
        target_size=(150, 150),
        batch_size=32,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
history = model.fit_generator(
      train_generator,
      steps_per_epoch=100,
      epochs=100,
      validation_data=validation_generator,
      validation_steps=50)
Found 2000 images belonging to 2 classes. Found 1000 images belonging to 2 classes. Epoch 1/100 100/100 [==============================] - 24s - loss: 0.6857 - acc: 0.5447 - val_loss: 0.6620 - val_acc: 0.5888 Epoch 2/100 100/100 [==============================] - 23s - loss: 0.6710 - acc: 0.5675 - val_loss: 0.6606 - val_acc: 0.5825 Epoch 3/100 100/100 [==============================] - 22s - loss: 0.6609 - acc: 0.5913 - val_loss: 0.6663 - val_acc: 0.5711.594 - ETA: 7s - loss: 0.6655 - ETA: 5s - los - ETA: 1s - loss: 0.6620 - acc: Epoch 4/100 100/100 [==============================] - 22s - loss: 0.6446 - acc: 0.6178 - val_loss: 0.6200 - val_acc: 0.6379 Epoch 5/100 100/100 [==============================] - 22s - loss: 0.6267 - acc: 0.6325 - val_loss: 0.6280 - val_acc: 0.5996 Epoch 6/100 100/100 [==============================] - 22s - loss: 0.6080 - acc: 0.6631 - val_loss: 0.6841 - val_acc: 0.5490 Epoch 7/100 100/100 [==============================] - 22s - loss: 0.5992 - acc: 0.6700 - val_loss: 0.5717 - val_acc: 0.6946 Epoch 8/100 100/100 [==============================] - 22s - loss: 0.5908 - acc: 0.6819 - val_loss: 0.5858 - val_acc: 0.6764 Epoch 9/100 100/100 [==============================] - 22s - loss: 0.5869 - acc: 0.6856 - val_loss: 0.5658 - val_acc: 0.6785 Epoch 10/100 100/100 [==============================] - 23s - loss: 0.5692 - acc: 0.6934 - val_loss: 0.5409 - val_acc: 0.7170 Epoch 11/100 100/100 [==============================] - 22s - loss: 0.5708 - acc: 0.6897 - val_loss: 0.5325 - val_acc: 0.7274 Epoch 12/100 100/100 [==============================] - 23s - loss: 0.5583 - acc: 0.7047 - val_loss: 0.5683 - val_acc: 0.7126 Epoch 13/100 100/100 [==============================] - 22s - loss: 0.5602 - acc: 0.7069 - val_loss: 0.6010 - val_acc: 0.6593 Epoch 14/100 100/100 [==============================] - 22s - loss: 0.5510 - acc: 0.7231 - val_loss: 0.5387 - val_acc: 0.7229 Epoch 15/100 100/100 [==============================] - 23s - loss: 0.5527 - acc: 0.7175 - val_loss: 0.5204 - val_acc: 0.7322 Epoch 16/100 100/100 [==============================] - 23s - loss: 0.5426 - acc: 0.7181 - val_loss: 0.5083 - val_acc: 0.7410 Epoch 17/100 100/100 [==============================] - 23s - loss: 0.5399 - acc: 0.7344 - val_loss: 0.5103 - val_acc: 0.7468 Epoch 18/100 100/100 [==============================] - 23s - loss: 0.5375 - acc: 0.7312 - val_loss: 0.5133 - val_acc: 0.7430 Epoch 19/100 100/100 [==============================] - 22s - loss: 0.5308 - acc: 0.7338 - val_loss: 0.4936 - val_acc: 0.7610 Epoch 20/100 100/100 [==============================] - 22s - loss: 0.5225 - acc: 0.7387 - val_loss: 0.4952 - val_acc: 0.7563 Epoch 21/100 100/100 [==============================] - 22s - loss: 0.5180 - acc: 0.7491 - val_loss: 0.4999 - val_acc: 0.7481 Epoch 22/100 100/100 [==============================] - 23s - loss: 0.5118 - acc: 0.7538 - val_loss: 0.4770 - val_acc: 0.7764 Epoch 23/100 100/100 [==============================] - 22s - loss: 0.5245 - acc: 0.7378 - val_loss: 0.4929 - val_acc: 0.7671 Epoch 24/100 100/100 [==============================] - 22s - loss: 0.5136 - acc: 0.7503 - val_loss: 0.4709 - val_acc: 0.7732 Epoch 25/100 100/100 [==============================] - 22s - loss: 0.4980 - acc: 0.7512 - val_loss: 0.4775 - val_acc: 0.7684 Epoch 26/100 100/100 [==============================] - 22s - loss: 0.4875 - acc: 0.7622 - val_loss: 0.4745 - val_acc: 0.7790 Epoch 27/100 100/100 [==============================] - 22s - loss: 0.5044 - acc: 0.7578 - val_loss: 0.5000 - val_acc: 0.7403 
Epoch 28/100 100/100 [==============================] - 22s - loss: 0.4948 - acc: 0.7603 - val_loss: 0.4619 - val_acc: 0.7754 Epoch 29/100 100/100 [==============================] - 22s - loss: 0.4898 - acc: 0.7578 - val_loss: 0.4730 - val_acc: 0.7726 Epoch 30/100 100/100 [==============================] - 22s - loss: 0.4808 - acc: 0.7691 - val_loss: 0.4599 - val_acc: 0.7716 Epoch 31/100 100/100 [==============================] - 22s - loss: 0.4792 - acc: 0.7678 - val_loss: 0.4671 - val_acc: 0.7790 Epoch 32/100 100/100 [==============================] - 22s - loss: 0.4723 - acc: 0.7716 - val_loss: 0.4451 - val_acc: 0.7849 Epoch 33/100 100/100 [==============================] - 22s - loss: 0.4750 - acc: 0.7694 - val_loss: 0.4827 - val_acc: 0.7665 Epoch 34/100 100/100 [==============================] - 22s - loss: 0.4816 - acc: 0.7647 - val_loss: 0.4953 - val_acc: 0.7513 Epoch 35/100 100/100 [==============================] - 22s - loss: 0.4598 - acc: 0.7813 - val_loss: 0.4426 - val_acc: 0.7843 Epoch 36/100 100/100 [==============================] - 23s - loss: 0.4643 - acc: 0.7781 - val_loss: 0.4692 - val_acc: 0.7680 Epoch 37/100 100/100 [==============================] - 22s - loss: 0.4675 - acc: 0.7778 - val_loss: 0.4849 - val_acc: 0.7633 Epoch 38/100 100/100 [==============================] - 22s - loss: 0.4658 - acc: 0.7737 - val_loss: 0.4632 - val_acc: 0.7760 Epoch 39/100 100/100 [==============================] - 22s - loss: 0.4581 - acc: 0.7866 - val_loss: 0.4489 - val_acc: 0.7880 Epoch 40/100 100/100 [==============================] - 23s - loss: 0.4485 - acc: 0.7856 - val_loss: 0.4479 - val_acc: 0.7931 Epoch 41/100 100/100 [==============================] - 22s - loss: 0.4637 - acc: 0.7759 - val_loss: 0.4453 - val_acc: 0.7990 Epoch 42/100 100/100 [==============================] - 22s - loss: 0.4528 - acc: 0.7841 - val_loss: 0.4758 - val_acc: 0.7868 Epoch 43/100 100/100 [==============================] - 22s - loss: 0.4481 - acc: 0.7856 - val_loss: 0.4472 - val_acc: 0.7893 Epoch 44/100 100/100 [==============================] - 22s - loss: 0.4540 - acc: 0.7953 - val_loss: 0.4366 - val_acc: 0.7867A: 6s - loss: 0.4523 - acc: - ETA: Epoch 45/100 100/100 [==============================] - 22s - loss: 0.4411 - acc: 0.7919 - val_loss: 0.4708 - val_acc: 0.7697 Epoch 46/100 100/100 [==============================] - 22s - loss: 0.4493 - acc: 0.7869 - val_loss: 0.4366 - val_acc: 0.7829 Epoch 47/100 100/100 [==============================] - 22s - loss: 0.4436 - acc: 0.7916 - val_loss: 0.4307 - val_acc: 0.8090 Epoch 48/100 100/100 [==============================] - 22s - loss: 0.4391 - acc: 0.7928 - val_loss: 0.4203 - val_acc: 0.8065 Epoch 49/100 100/100 [==============================] - 23s - loss: 0.4284 - acc: 0.8053 - val_loss: 0.4422 - val_acc: 0.8041 Epoch 50/100 100/100 [==============================] - 22s - loss: 0.4492 - acc: 0.7906 - val_loss: 0.5422 - val_acc: 0.7437 Epoch 51/100 100/100 [==============================] - 22s - loss: 0.4292 - acc: 0.7953 - val_loss: 0.4446 - val_acc: 0.7932 Epoch 52/100 100/100 [==============================] - 22s - loss: 0.4275 - acc: 0.8037 - val_loss: 0.4287 - val_acc: 0.7989 Epoch 53/100 100/100 [==============================] - 22s - loss: 0.4297 - acc: 0.7975 - val_loss: 0.4091 - val_acc: 0.8046 Epoch 54/100 100/100 [==============================] - 23s - loss: 0.4198 - acc: 0.7978 - val_loss: 0.4413 - val_acc: 0.7964 Epoch 55/100 100/100 [==============================] - 23s - loss: 0.4195 - acc: 0.8019 - val_loss: 0.4265 - val_acc: 
0.8001 Epoch 56/100 100/100 [==============================] - 22s - loss: 0.4081 - acc: 0.8056 - val_loss: 0.4374 - val_acc: 0.7957 Epoch 57/100 100/100 [==============================] - 22s - loss: 0.4214 - acc: 0.8006 - val_loss: 0.4228 - val_acc: 0.8020 Epoch 58/100 100/100 [==============================] - 22s - loss: 0.4050 - acc: 0.8097 - val_loss: 0.4332 - val_acc: 0.7900 Epoch 59/100 100/100 [==============================] - 22s - loss: 0.4162 - acc: 0.8134 - val_loss: 0.4088 - val_acc: 0.8099 Epoch 60/100 100/100 [==============================] - 22s - loss: 0.4042 - acc: 0.8141 - val_loss: 0.4436 - val_acc: 0.7957 Epoch 61/100 100/100 [==============================] - 23s - loss: 0.4016 - acc: 0.8212 - val_loss: 0.4082 - val_acc: 0.8189 Epoch 62/100 100/100 [==============================] - 22s - loss: 0.4167 - acc: 0.8097 - val_loss: 0.3935 - val_acc: 0.8236 Epoch 63/100 100/100 [==============================] - 23s - loss: 0.4052 - acc: 0.8138 - val_loss: 0.4509 - val_acc: 0.7824 Epoch 64/100 100/100 [==============================] - 22s - loss: 0.4011 - acc: 0.8209 - val_loss: 0.3874 - val_acc: 0.8299 Epoch 65/100 100/100 [==============================] - 22s - loss: 0.3966 - acc: 0.8131 - val_loss: 0.4328 - val_acc: 0.7970 Epoch 66/100 100/100 [==============================] - 23s - loss: 0.3889 - acc: 0.8163 - val_loss: 0.4766 - val_acc: 0.7719 Epoch 67/100 100/100 [==============================] - 22s - loss: 0.3960 - acc: 0.8163 - val_loss: 0.3859 - val_acc: 0.8325 Epoch 68/100 100/100 [==============================] - 22s - loss: 0.3893 - acc: 0.8231 - val_loss: 0.4172 - val_acc: 0.8128 Epoch 69/100 100/100 [==============================] - 23s - loss: 0.3828 - acc: 0.8219 - val_loss: 0.4023 - val_acc: 0.8215 loss: 0.3881 - acc: Epoch 70/100 100/100 [==============================] - 22s - loss: 0.3909 - acc: 0.8275 - val_loss: 0.4275 - val_acc: 0.8008 Epoch 71/100 100/100 [==============================] - 22s - loss: 0.3826 - acc: 0.8244 - val_loss: 0.3815 - val_acc: 0.8177 Epoch 72/100 100/100 [==============================] - 22s - loss: 0.3837 - acc: 0.8272 - val_loss: 0.4040 - val_acc: 0.8287 Epoch 73/100 100/100 [==============================] - 23s - loss: 0.3812 - acc: 0.8222 - val_loss: 0.4039 - val_acc: 0.8058 Epoch 74/100 100/100 [==============================] - 22s - loss: 0.3829 - acc: 0.8281 - val_loss: 0.4204 - val_acc: 0.8015 Epoch 75/100 100/100 [==============================] - 22s - loss: 0.3708 - acc: 0.8350 - val_loss: 0.4083 - val_acc: 0.8204 Epoch 76/100 100/100 [==============================] - 22s - loss: 0.3831 - acc: 0.8216 - val_loss: 0.3899 - val_acc: 0.8215 Epoch 77/100 100/100 [==============================] - 22s - loss: 0.3695 - acc: 0.8375 - val_loss: 0.3963 - val_acc: 0.8293 Epoch 78/100 100/100 [==============================] - 22s - loss: 0.3809 - acc: 0.8234 - val_loss: 0.4046 - val_acc: 0.8236 Epoch 79/100 100/100 [==============================] - 22s - loss: 0.3637 - acc: 0.8362 - val_loss: 0.3990 - val_acc: 0.8325 Epoch 80/100 100/100 [==============================] - 22s - loss: 0.3596 - acc: 0.8400 - val_loss: 0.3925 - val_acc: 0.8350 Epoch 81/100 100/100 [==============================] - 22s - loss: 0.3762 - acc: 0.8303 - val_loss: 0.3813 - val_acc: 0.8331 Epoch 82/100 100/100 [==============================] - 23s - loss: 0.3672 - acc: 0.8347 - val_loss: 0.4539 - val_acc: 0.7931 Epoch 83/100 100/100 [==============================] - 22s - loss: 0.3636 - acc: 0.8353 - val_loss: 0.3988 - val_acc: 0.8261 
Epoch 84/100 100/100 [==============================] - 22s - loss: 0.3503 - acc: 0.8453 - val_loss: 0.3987 - val_acc: 0.8325 Epoch 85/100 100/100 [==============================] - 22s - loss: 0.3586 - acc: 0.8437 - val_loss: 0.3842 - val_acc: 0.8306 Epoch 86/100 100/100 [==============================] - 22s - loss: 0.3624 - acc: 0.8353 - val_loss: 0.4100 - val_acc: 0.8196.834 Epoch 87/100 100/100 [==============================] - 22s - loss: 0.3596 - acc: 0.8422 - val_loss: 0.3814 - val_acc: 0.8331 Epoch 88/100 100/100 [==============================] - 22s - loss: 0.3487 - acc: 0.8494 - val_loss: 0.4266 - val_acc: 0.8109 Epoch 89/100 100/100 [==============================] - 22s - loss: 0.3598 - acc: 0.8400 - val_loss: 0.4076 - val_acc: 0.8325 Epoch 90/100 100/100 [==============================] - 22s - loss: 0.3510 - acc: 0.8450 - val_loss: 0.3762 - val_acc: 0.8388 Epoch 91/100 100/100 [==============================] - 22s - loss: 0.3458 - acc: 0.8450 - val_loss: 0.4684 - val_acc: 0.8015 Epoch 92/100 100/100 [==============================] - 22s - loss: 0.3454 - acc: 0.8441 - val_loss: 0.4017 - val_acc: 0.8204 Epoch 93/100 100/100 [==============================] - 22s - loss: 0.3402 - acc: 0.8487 - val_loss: 0.3928 - val_acc: 0.8204 Epoch 94/100 100/100 [==============================] - 22s - loss: 0.3569 - acc: 0.8394 - val_loss: 0.4005 - val_acc: 0.8338 Epoch 95/100 100/100 [==============================] - 22s - loss: 0.3425 - acc: 0.8494 - val_loss: 0.3641 - val_acc: 0.8439 Epoch 96/100 100/100 [==============================] - 22s - loss: 0.3335 - acc: 0.8531 - val_loss: 0.3811 - val_acc: 0.8363 Epoch 97/100 100/100 [==============================] - 22s - loss: 0.3204 - acc: 0.8581 - val_loss: 0.3786 - val_acc: 0.8331 Epoch 98/100 100/100 [==============================] - 22s - loss: 0.3250 - acc: 0.8606 - val_loss: 0.4205 - val_acc: 0.8236 Epoch 99/100 100/100 [==============================] - 22s - loss: 0.3255 - acc: 0.8581 - val_loss: 0.3518 - val_acc: 0.8460 Epoch 100/100 100/100 [==============================] - 22s - loss: 0.3280 - acc: 0.8491 - val_loss: 0.3776 - val_acc: 0.8439
Let's save our model -- we will be using it in the section on convnet visualization.
model.save('cats_and_dogs_small_2.h5')
Let's plot our results again:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
Thanks to data augmentation and dropout, we are no longer overfitting: the training curves are rather closely tracking the validation curves. We are now able to reach an accuracy of 82%, a 15% relative improvement over the non-regularized model.
By leveraging regularization techniques even further and by tuning the network's parameters (such as the number of filters per convolution layer, or the number of layers in the network), we may be able to get an even better accuracy, likely up to 86-87%. However, it would prove very difficult to go any higher just by training our own convnet from scratch, simply because we have so little data to work with. As a next step to improve our accuracy on this problem, we will have to leverage a pre-trained model, which will be the focus of the next two sections.
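Note that we haven't yet touched the held-out test split we created earlier. Here is a minimal sketch of how we could measure test accuracy with the same generator machinery, using evaluate_generator (the generator counterpart of evaluate):

test_generator = test_datagen.flow_from_directory(
        test_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')

# 50 batches of 20 = the full 1000 test images
test_loss, test_acc = model.evaluate_generator(test_generator, steps=50)
print('test acc:', test_acc)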