In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID";
os.environ["CUDA_VISIBLE_DEVICES"]="0" 

Image Classification Example

We will begin our image classification example by importing some required modules.

In [2]:
import ktrain
from ktrain import vision as vis

Next, we will load and preprocess the image data for training and validation. ktrain can load images and associated labels from a variety of sources:

  • images_from_folder: labels are represented as subfolders containing images [example notebook]
  • images_from_csv: labels are mapped to images in a CSV file [example notebook]
  • images_from_fname: labels are included as part of the filename and must be extracted using a regular expression [example notebook]
  • images_from_array: images and labels are stored in arrays [example notebook]
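
To illustrate the filename-based approach, label extraction with images_from_fname typically relies on a regular expression with a capture group for the label. A minimal sketch of that idea in plain Python (the pattern shown is a hypothetical example for filenames like cat.10016.jpg, not ktrain's default):

```python
import re

# Hypothetical pattern: capture the label portion of filenames like 'cat.10016.jpg'
pattern = re.compile(r'([^/]+)\.\d+\.jpg$')

def label_from_fname(fname):
    """Extract the class label embedded in an image filename."""
    match = pattern.search(fname)
    if match is None:
        raise ValueError('filename does not match pattern: %s' % fname)
    return match.group(1)

print(label_from_fname('train/cat.10016.jpg'))  # -> cat
print(label_from_fname('train/dog.10001.jpg'))  # -> dog
```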

Here, we use the images_from_folder function to load the data as a generator (i.e., DirectoryIterator object). This function assumes the following directory structure:

  ├── datadir
  │   ├── train
  │   │   ├── class0       # folder containing images of class 0
  │   │   ├── class1       # folder containing images of class 1
  │   │   ├── class2       # folder containing images of class 2
  │   │   └── classN       # folder containing images of class N
  │   └── test
  │       ├── class0       # folder containing images of class 0
  │       ├── class1       # folder containing images of class 1
  │       ├── class2       # folder containing images of class 2
  │       └── classN       # folder containing images of class N

The train_test_names argument can be used if the train and test subfolders are named differently (e.g., the test folder is called valid). Here, we load a dataset of cat and dog images, which can be obtained from here. The DATADIR variable should be set to the path of the extracted folder. The data_aug parameter can be used to employ data augmentation. We set this parameter using the get_data_aug function, which returns a default data augmentation with horizontal_flip=True as the only change to the defaults. See the Keras documentation for the full set of augmentation parameters. Finally, we pass the requested target size (224,224) and color mode (rgb, which is a 3-channel image). The images will be resized or converted appropriately based on the values supplied. A target size of 224 by 224 is typically used with networks pretrained on ImageNet, which we use next. The images_from_folder function returns generators for both the training and validation data, in addition to an instance of ktrain.vision.ImagePreprocessor, which can be used to preprocess raw data when making predictions on new examples. This will be demonstrated later.

In [3]:
DATADIR = 'data/dogscats'
(train_data, val_data, preproc) = vis.images_from_folder(datadir=DATADIR,
                                              # use a default data augmentation with horizontal_flip=True
                                              data_aug=vis.get_data_aug(horizontal_flip=True), 
                                              train_test_names=['train', 'valid'],
                                               target_size=(224,224), color_mode='rgb')
Found 23000 images belonging to 2 classes.
Found 23000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.

Let's examine some sample cat and dog images from the training set:

In [4]:
print('sample cat images:')
vis.show_random_images(DATADIR+'/train/cats/') 
sample cat images:
In [5]:
print('sample dog images:')
vis.show_random_images(DATADIR+'/train/dogs/')
sample dog images:

Next, we use the image_classifier function to load a ResNet50 model pre-trained on ImageNet. For more information on using pretrained networks, see this blog post. By default, all layers except the randomly initialized custom Dense layers on top are frozen (i.e., not trainable). We then wrap the model and data in a Learner object. We specify 8 CPU workers to load batches during training, disable multiprocessing, and use a batch size of 64. You can change these values based on your system specifications to see what yields the best performance.

In [6]:
# let's print the available precanned image classification models in ktrain
vis.print_image_classifiers()
pretrained_resnet50: 50-layer Residual Network (pretrained on ImageNet)
resnet50: 50-layer Residual Network (randomly initialized)
pretrained_mobilenet: MobileNet Neural Network (pretrained on ImageNet - TF only)
mobilenet: MobileNet Neural Network (randomly initialized - TF only)
pretrained_inception: Inception Version 3  (pretrained on ImageNet)
inception: Inception Version 3 (randomly initialized)
wrn22: 22-layer Wide Residual Network (randomly initialized)
default_cnn: a default LeNet-like Convolutional Neural Network
In [5]:
model = vis.image_classifier('pretrained_resnet50', train_data, val_data)
learner = ktrain.get_learner(model=model, train_data=train_data, val_data=val_data, 
                             workers=8, use_multiprocessing=False, batch_size=64)
The normalization scheme has been changed for use with a pretrained_resnet50 model. If you decide to use a different model, please reload your dataset with a ktrain.vision.data.images_from* function.

Is Multi-Label? False
pretrained_resnet50 model created.

Next, we freeze the first 15 layers, as the ImageNet pre-trained weights of these early layers are typically applicable as is. All other layers are unfrozen and trainable. You can use the learner.freeze and learner.unfreeze methods to selectively freeze and unfreeze layers, if necessary. learner.freeze(freeze_range=15) and learner.unfreeze(exclude_range=15) are equivalent. The number of layers you freeze will depend on how similar your dataset is to ImageNet and other particulars of the dataset. For instance, classifying satellite images or subcellular protein patterns may require fewer frozen layers than classifying pictures of dogs and cats. You can also begin training for a few epochs with many frozen layers and gradually unfreeze layers for later epochs.
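
The semantics of freeze_range can be illustrated with plain Python over a list of layer-like objects (a sketch of the behavior using mock layers, not ktrain's implementation):

```python
class MockLayer:
    """Stand-in for a Keras layer with a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

def freeze(layers, freeze_range):
    # layers before index freeze_range become untrainable; the rest stay trainable
    for i, layer in enumerate(layers):
        layer.trainable = i >= freeze_range
    return layers

layers = freeze([MockLayer('layer%d' % i) for i in range(20)], freeze_range=15)
print([l.trainable for l in layers[:16]])  # first 15 are False, the 16th is True
```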

In [6]:
learner.freeze(freeze_range=15)

You can use the print_layers method to examine the layers of the created network.

In [7]:
learner.print_layers()
0 (trainable=False) : <keras.engine.input_layer.InputLayer object at 0x7fe1a3af6160>
1 (trainable=False) : <keras.layers.convolutional.ZeroPadding2D object at 0x7fe2502275f8>
2 (trainable=False) : <keras.layers.convolutional.Conv2D object at 0x7fe250227668>
3 (trainable=False) : <keras.layers.normalization.BatchNormalization object at 0x7fe1a2d2f470>
4 (trainable=False) : <keras.layers.core.Activation object at 0x7fe1a2d2fdd8>
5 (trainable=False) : <keras.layers.convolutional.ZeroPadding2D object at 0x7fe1a2d4c978>
6 (trainable=False) : <keras.layers.pooling.MaxPooling2D object at 0x7fe1a04cb828>
7 (trainable=False) : <keras.layers.convolutional.Conv2D object at 0x7fe1a2d3eba8>
8 (trainable=False) : <keras.layers.normalization.BatchNormalization object at 0x7fe20c7f9048>
9 (trainable=False) : <keras.layers.core.Activation object at 0x7fe20c7f9f98>
10 (trainable=False) : <keras.layers.convolutional.Conv2D object at 0x7fe20c791470>
11 (trainable=False) : <keras.layers.normalization.BatchNormalization object at 0x7fe20c7ab160>
12 (trainable=False) : <keras.layers.core.Activation object at 0x7fe20c771320>
13 (trainable=False) : <keras.layers.convolutional.Conv2D object at 0x7fe20c6e9be0>
14 (trainable=False) : <keras.layers.convolutional.Conv2D object at 0x7fe1a2d3e208>
15 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe20c68b748>
16 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1a04a5f28>
17 (trainable=True) : <keras.layers.merge.Add object at 0x7fe1a2d2f908>
18 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1a02d6390>
19 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1a02d64e0>
20 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe20b308c18>
21 (trainable=True) : <keras.layers.core.Activation object at 0x7fe20b3399e8>
22 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe20b2f2c88>
23 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe20b2ad048>
24 (trainable=True) : <keras.layers.core.Activation object at 0x7fe20b27d518>
25 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1a0288e10>
26 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1a00ad710>
27 (trainable=True) : <keras.layers.merge.Add object at 0x7fe1787d1eb8>
28 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1787802b0>
29 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1787801d0>
30 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe178700080>
31 (trainable=True) : <keras.layers.core.Activation object at 0x7fe17872a6d8>
32 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1786e4fd0>
33 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe178685cf8>
34 (trainable=True) : <keras.layers.core.Activation object at 0x7fe178648390>
35 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1785e9710>
36 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe17858f400>
37 (trainable=True) : <keras.layers.merge.Add object at 0x7fe178554080>
38 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1784f5128>
39 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe178491048>
40 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe178408080>
41 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1784283c8>
42 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1783d9dd8>
43 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe17837deb8>
44 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1782bdf60>
45 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1782ea400>
46 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe17824c1d0>
47 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1782830f0>
48 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1781edc18>
49 (trainable=True) : <keras.layers.merge.Add object at 0x7fe17812dfd0>
50 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1780dc358>
51 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1780dc278>
52 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe178040390>
53 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1587c8ac8>
54 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe15877ee80>
55 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1586e25c0>
56 (trainable=True) : <keras.layers.core.Activation object at 0x7fe15870c5f8>
57 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1586887b8>
58 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1586ad4a8>
59 (trainable=True) : <keras.layers.merge.Add object at 0x7fe158672160>
60 (trainable=True) : <keras.layers.core.Activation object at 0x7fe158596048>
61 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1585ae0f0>
62 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe158519080>
63 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1584c6470>
64 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1584f89e8>
65 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe15849dfd0>
66 (trainable=True) : <keras.layers.core.Activation object at 0x7fe158466400>
67 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1583864a8>
68 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1583a3198>
69 (trainable=True) : <keras.layers.merge.Add object at 0x7fe15836c278>
70 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1582eb6a0>
71 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1582eb908>
72 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe15821e7b8>
73 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1581cc4e0>
74 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1581f4860>
75 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe158199550>
76 (trainable=True) : <keras.layers.core.Activation object at 0x7fe15815c2b0>
77 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1580ddcf8>
78 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe158097358>
79 (trainable=True) : <keras.layers.merge.Add object at 0x7fe158069160>
80 (trainable=True) : <keras.layers.core.Activation object at 0x7fe13879f630>
81 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe13879f390>
82 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe13872c208>
83 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1386ddef0>
84 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1386b2550>
85 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe13864e240>
86 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1386184e0>
87 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe13859ccc0>
88 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1384fe9b0>
89 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe138539e48>
90 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe138443c18>
91 (trainable=True) : <keras.layers.merge.Add object at 0x7fe13837fd30>
92 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1383ae198>
93 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1383ae0b8>
94 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1383102b0>
95 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1382d9c88>
96 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe138291be0>
97 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe13824f080>
98 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1381c5080>
99 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe138199d68>
100 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe138139668>
101 (trainable=True) : <keras.layers.merge.Add object at 0x7fe13807ce10>
102 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1380a8208>
103 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe1380a8128>
104 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1187e5518>
105 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1187949b0>
106 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe118752c50>
107 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe11870a4e0>
108 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1186de4e0>
109 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe118657588>
110 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1186786d8>
111 (trainable=True) : <keras.layers.merge.Add object at 0x7fe1185b4dd8>
112 (trainable=True) : <keras.layers.core.Activation object at 0x7fe11855e1d0>
113 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe11855e0f0>
114 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1184c0240>
115 (trainable=True) : <keras.layers.core.Activation object at 0x7fe118487cc0>
116 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe118440c18>
117 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1183fc470>
118 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1183ce4e0>
119 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe118344da0>
120 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1183666a0>
121 (trainable=True) : <keras.layers.merge.Add object at 0x7fe1182a7e48>
122 (trainable=True) : <keras.layers.core.Activation object at 0x7fe118252240>
123 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe118252160>
124 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe1182372b0>
125 (trainable=True) : <keras.layers.core.Activation object at 0x7fe11817bd30>
126 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe118139c88>
127 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe118171048>
128 (trainable=True) : <keras.layers.core.Activation object at 0x7fe1180c3550>
129 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe118039e10>
130 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe11805c710>
131 (trainable=True) : <keras.layers.merge.Add object at 0x7fe0fc75eeb8>
132 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0fc70b2b0>
133 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0fc70b1d0>
134 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0fc6ef320>
135 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0fc6bada0>
136 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0fc673cf8>
137 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0fc62c358>
138 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0fc57f240>
139 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0fc577da0>
140 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0fc51a780>
141 (trainable=True) : <keras.layers.merge.Add object at 0x7fe0fc45af28>
142 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0fc406320>
143 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0fc406240>
144 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0fc3db358>
145 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0fc35bd68>
146 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0fc2fe208>
147 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0fc3162e8>
148 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0fc2e02b0>
149 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0fc25eef0>
150 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0fc146f98>
151 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0fc204ef0>
152 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0fc10b898>
153 (trainable=True) : <keras.layers.merge.Add object at 0x7fe0fc0d6860>
154 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0fc057278>
155 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0fc0579b0>
156 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0f073d080>
157 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0f07695c0>
158 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0f071fcc0>
159 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0f06c17b8>
160 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0f0686550>
161 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0f06305f8>
162 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0f05ca2e8>
163 (trainable=True) : <keras.layers.merge.Add object at 0x7fe0f0592400>
164 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0f0514ac8>
165 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0f04cfd30>
166 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0f04b5080>
167 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0f0462630>
168 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0f0419d30>
169 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0f043b828>
170 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0f0381898>
171 (trainable=True) : <keras.layers.convolutional.Conv2D object at 0x7fe0f0311668>
172 (trainable=True) : <keras.layers.normalization.BatchNormalization object at 0x7fe0f02c7358>
173 (trainable=True) : <keras.layers.merge.Add object at 0x7fe0f0287470>
174 (trainable=True) : <keras.layers.core.Activation object at 0x7fe0f020eb38>
175 (trainable=True) : <keras.layers.core.Flatten object at 0x7fe1a3adf4e0>
176 (trainable=True) : <keras.layers.core.Dropout object at 0x7fe0f022dcc0>
177 (trainable=True) : <keras.layers.core.Dense object at 0x7fe0f01ef438>

As before, we use the learning rate finder in ktrain to find a good initial learning rate.
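
Conceptually, the learning rate finder trains briefly while increasing the learning rate exponentially between two bounds, recording the loss at each step; the plot of loss versus learning rate then reveals where the loss falls fastest. A simplified sketch of the exponential schedule (the bounds and step count are illustrative):

```python
def lr_schedule(start_lr, end_lr, num_steps):
    """Exponentially spaced learning rates from start_lr to end_lr."""
    factor = (end_lr / start_lr) ** (1.0 / (num_steps - 1))
    return [start_lr * factor ** i for i in range(num_steps)]

# sweep from a tiny rate up to 1.0 over 100 steps
lrs = lr_schedule(1e-7, 1.0, num_steps=100)
print(lrs[0], lrs[-1])  # endpoints span the full range
```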

In [8]:
learner.lr_find()
simulating training for different learning rates... this may take a few moments...
Epoch 1/5
359/359 [==============================] - 125s 349ms/step - loss: 1.4328 - acc: 0.5916
Epoch 2/5
359/359 [==============================] - 113s 315ms/step - loss: 0.3350 - acc: 0.8991
Epoch 3/5
231/359 [==================>...........] - ETA: 40s - loss: 0.2776 - acc: 0.9272

done.
Please invoke the Learner.lr_plot() method to visually inspect the loss plot to help identify the maximal learning rate associated with falling loss.
In [9]:
learner.lr_plot()

Finally, we will use the autofit method to train our model using a triangular learning rate policy. Since we have not specified the number of epochs, the maximum learning rate will be periodically reduced when the validation loss fails to decrease, and training will eventually stop automatically.
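
The triangular policy cycles the learning rate linearly between a base rate and the maximum. A minimal sketch of one such cycle (the base rate, maximum, and cycle length here are illustrative):

```python
def triangular_lr(step, max_lr, cycle_len, base_lr=0.0):
    """Linearly ramp up to max_lr for half a cycle, then back down."""
    half = cycle_len / 2.0
    pos = step % cycle_len
    if pos < half:
        frac = pos / half                # rising half of the cycle
    else:
        frac = (cycle_len - pos) / half  # falling half of the cycle
    return base_lr + (max_lr - base_lr) * frac

# the rate peaks mid-cycle and returns to base_lr at the cycle boundary
print([round(triangular_lr(s, max_lr=1e-4, cycle_len=10), 6) for s in range(11)])
```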

Our final validation accuracy is 99.55%, first occurring at the 8th epoch during this run.

In [10]:
learner.autofit(1e-4)
early_stopping automatically enabled at patience=5
reduce_on_plateau automatically enabled at patience=2


begin training using triangular learning rate policy with max lr of 0.0001...
Epoch 1/1024
359/359 [==============================] - 118s 330ms/step - loss: 0.2445 - acc: 0.9304 - val_loss: 0.0401 - val_acc: 0.9870
Epoch 2/1024
359/359 [==============================] - 117s 325ms/step - loss: 0.0778 - acc: 0.9790 - val_loss: 0.0339 - val_acc: 0.9895
Epoch 3/1024
359/359 [==============================] - 117s 325ms/step - loss: 0.0670 - acc: 0.9803 - val_loss: 0.0318 - val_acc: 0.9920
Epoch 4/1024
359/359 [==============================] - 116s 323ms/step - loss: 0.0518 - acc: 0.9846 - val_loss: 0.0334 - val_acc: 0.9895
Epoch 5/1024
359/359 [==============================] - 116s 324ms/step - loss: 0.0530 - acc: 0.9834 - val_loss: 0.0438 - val_acc: 0.9890

Epoch 00005: Reducing Max LR on Plateau: new max lr will be 5e-05 (if not early_stopping).
Epoch 6/1024
359/359 [==============================] - 117s 325ms/step - loss: 0.0345 - acc: 0.9882 - val_loss: 0.0324 - val_acc: 0.9910
Epoch 7/1024
359/359 [==============================] - 117s 325ms/step - loss: 0.0302 - acc: 0.9902 - val_loss: 0.0255 - val_acc: 0.9930
Epoch 8/1024
359/359 [==============================] - 117s 327ms/step - loss: 0.0219 - acc: 0.9928 - val_loss: 0.0222 - val_acc: 0.9955
Epoch 9/1024
359/359 [==============================] - 116s 324ms/step - loss: 0.0265 - acc: 0.9914 - val_loss: 0.0361 - val_acc: 0.9910
Epoch 10/1024
359/359 [==============================] - 117s 325ms/step - loss: 0.0216 - acc: 0.9931 - val_loss: 0.0249 - val_acc: 0.9935

Epoch 00010: Reducing Max LR on Plateau: new max lr will be 2.5e-05 (if not early_stopping).
Epoch 11/1024
359/359 [==============================] - 117s 325ms/step - loss: 0.0179 - acc: 0.9937 - val_loss: 0.0271 - val_acc: 0.9940
Epoch 12/1024
359/359 [==============================] - 116s 324ms/step - loss: 0.0171 - acc: 0.9950 - val_loss: 0.0229 - val_acc: 0.9930

Epoch 00012: Reducing Max LR on Plateau: new max lr will be 1.25e-05 (if not early_stopping).
Epoch 13/1024
359/359 [==============================] - 117s 325ms/step - loss: 0.0142 - acc: 0.9953 - val_loss: 0.0250 - val_acc: 0.9945
Restoring model weights from the end of the best epoch
Epoch 00013: early stopping
Weights from best epoch have been loaded into model.
Out[10]:
<keras.callbacks.History at 0x7fdd440e7e48>
In [11]:
loss, acc = learner.model.evaluate_generator(learner.val_data, 
                                             steps=len(learner.val_data))
In [12]:
print('final loss:%s, final accuracy:%s' % (loss, acc))
final loss:0.022161634005376185, final accuracy:0.9955

As can be seen, the final validation accuracy of our model is 99.55%.

Using Our Model to Make Predictions

Finally, let's use our model to make predictions for some images.

Here are sample images of a cat and a dog from the validation set.

In [13]:
!!ls {DATADIR}/valid/cats |head -n3
Out[13]:
['cat.10016.jpg', 'cat.1001.jpg', 'cat.10026.jpg']
In [14]:
!!ls {DATADIR}/valid/dogs |head -n3
Out[14]:
['dog.10001.jpg', 'dog.10005.jpg', 'dog.10010.jpg']
In [15]:
vis.show_image(DATADIR+'/valid/cats/cat.10016.jpg')
Out[15]:
<matplotlib.image.AxesImage at 0x7fdd2d3fa4e0>
In [16]:
vis.show_image(DATADIR+'/valid/dogs/dog.10001.jpg')
Out[16]:
<matplotlib.image.AxesImage at 0x7fdd2cd9c5f8>

Now, let's create a predictor object to make predictions for the above images.

In [17]:
predictor = ktrain.get_predictor(learner.model, preproc)

Let's see if we predict the selected cat and dog images correctly.

In [18]:
predictor.predict_filename(DATADIR+'/valid/cats/cat.10016.jpg')
Out[18]:
['cats']
In [19]:
predictor.predict_filename(DATADIR+'/valid/dogs/dog.10001.jpg')
Out[19]:
['dogs']

Our predictor is working well. We can save our predictor to disk for later use in an application.

In [20]:
predictor.save('/tmp/cat_vs_dog_detector')

Let's load our predictor from disk to show that it still works as expected.

In [21]:
predictor = ktrain.load_predictor('/tmp/cat_vs_dog_detector')
In [22]:
predictor.predict_filename(DATADIR+'/valid/cats/cat.10016.jpg')
Out[22]:
['cats']
In [23]:
predictor.predict_filename(DATADIR+'/valid/dogs/dog.10001.jpg')
Out[23]:
['dogs']

Finally, let's make predictions for all the cat pictures in our validation set:

In [24]:
predictor.predict_folder(DATADIR+'/valid/cats/')[:10]
Found 1000 images belonging to 1 classes.
Out[24]:
[('cats/cat.1001.jpg', 'cats'),
 ('cats/cat.10016.jpg', 'cats'),
 ('cats/cat.10026.jpg', 'cats'),
 ('cats/cat.10048.jpg', 'cats'),
 ('cats/cat.10050.jpg', 'cats'),
 ('cats/cat.10064.jpg', 'cats'),
 ('cats/cat.10071.jpg', 'cats'),
 ('cats/cat.10091.jpg', 'cats'),
 ('cats/cat.10103.jpg', 'cats'),
 ('cats/cat.10104.jpg', 'cats')]

By default, predict* methods in ktrain return the predicted class labels. To view the predicted probabilities for each class, supply return_proba=True as an extra argument:

predictor.predict_filename(filename, return_proba=True)
predictor.predict_folder(foldername, return_proba=True)

Multi-Label Image Classification

In the previous example, the classes were mutually exclusive. That is, images contained either a dog or a cat, but not both. Some problems are multi-label classification problems in that each image can belong to multiple classes (or categories). One example is the Kaggle Planet Competition. In this competition, we are given a collection of satellite images of the Amazon rainforest. The objective is to identify locations of deforestation and human encroachment on forests by classifying images into up to 17 different categories. Categories include agriculture, habitation, selective_logging, and slash_burn. A given satellite image can belong to more than one category. The dataset can be downloaded from the competition page. The satellite images are located in a zipped folder called train-jpg.zip. The labels for each image are in the form of a CSV file (i.e., train_v2.csv) mapping file names to their labels. Let us first examine the CSV file for this dataset. Be sure to set the DATADIR variable to the path of the extracted dataset.
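
In the multi-label setting, the model typically ends in independent per-class sigmoid outputs (rather than a softmax), and every class whose probability clears a threshold is predicted. A small sketch of that decision rule (the logits, labels, and threshold of 0.5 are illustrative):

```python
import math

def sigmoid(x):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_tags(logits, labels, threshold=0.5):
    """Return every label whose sigmoid probability clears the threshold."""
    return [label for logit, label in zip(logits, labels)
            if sigmoid(logit) >= threshold]

labels = ['agriculture', 'clear', 'primary', 'water']
# unlike softmax, several labels can fire for one image
print(predict_tags([2.1, -1.3, 0.7, 1.5], labels))
```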

In [3]:
DATADIR = 'data/planet'
!!head {DATADIR}/train_v2.csv
Out[3]:
['image_name,tags',
 'train_0,haze primary',
 'train_1,agriculture clear primary water',
 'train_2,clear primary',
 'train_3,clear primary',
 'train_4,agriculture clear habitation primary road',
 'train_5,haze primary water',
 'train_6,agriculture clear cultivation primary water',
 'train_7,haze primary',
 'train_8,agriculture clear cultivation primary']

We make three observations.

  • The image_name field is the file name of the satellite image.
  • The file names are missing the .jpg file extension.
  • The labels are simply a space-delimited list of tags, rather than a one-hot-encoded vector.

Let us first convert this CSV into a new CSV that includes one-hot-encoded representations of the tags and appends the file extension to the file names. Since this dataset format is somewhat common (especially for multi-label image classification problems), ktrain contains a convenience function to automatically perform the conversion.
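
The conversion performed by preprocess_csv can be sketched in plain Python (a simplified version of the idea, not ktrain's implementation):

```python
def one_hot_rows(rows, suffix='.jpg'):
    """Convert (image_name, space-delimited tags) rows to one-hot rows."""
    # collect the full label vocabulary, sorted for a stable column order
    labels = sorted({tag for _, tags in rows for tag in tags.split()})
    header = ['image_name'] + labels
    out = [header]
    for name, tags in rows:
        present = set(tags.split())
        # append the file extension and one-hot-encode the tags
        out.append([name + suffix] + [1 if l in present else 0 for l in labels])
    return out

rows = [('train_0', 'haze primary'), ('train_1', 'agriculture clear primary water')]
for row in one_hot_rows(rows):
    print(row)
```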

In [4]:
ORIGINAL_DATA = DATADIR+'/train_v2.csv'
CONVERTED_DATA = DATADIR+'/train_v2-CONVERTED.csv'
labels = vis.preprocess_csv(ORIGINAL_DATA, 
                           CONVERTED_DATA, 
                           x_col='image_name', y_col='tags', suffix='.jpg')
In [5]:
!!head {DATADIR}/train_v2-CONVERTED.csv
Out[5]:
['image_name,agriculture,artisinal_mine,bare_ground,blooming,blow_down,clear,cloudy,conventional_mine,cultivation,habitation,haze,partly_cloudy,primary,road,selective_logging,slash_burn,water',
 'train_0.jpg,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0',
 'train_1.jpg,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1',
 'train_2.jpg,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0',
 'train_3.jpg,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0',
 'train_4.jpg,1,0,0,0,0,1,0,0,0,1,0,0,1,1,0,0,0',
 'train_5.jpg,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1',
 'train_6.jpg,1,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,1',
 'train_7.jpg,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0',
 'train_8.jpg,1,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0']

We can use the images_from_csv function to load the data as generators. Remember to specify preprocess_for='resnet50', as we will be using a ResNet50 architecture again.

In [6]:
train_data, val_data, preproc = vis.images_from_csv(
                          CONVERTED_DATA,
                         'image_name',
                          directory=DATADIR+'/train-jpg',
                          val_filepath = None,
                          label_columns = labels,
                          data_aug=vis.get_data_aug(horizontal_flip=True, vertical_flip=True))
Found 40479 images belonging to 1 classes.
Found 36293 images.
Found 4186 images.

As before, we load a pre-trained ResNet50 model (the default) and wrap the model and data in a Learner object. Here, we will freeze only the first two layers, as the satellite images are comparatively more dissimilar to ImageNet. Thus, the weights in earlier layers will need more updating.

In [8]:
model = vis.image_classifier('pretrained_resnet50', train_data, val_data=val_data)
learner = ktrain.get_learner(model, train_data=train_data, val_data=val_data, 
                             batch_size=64, workers=8, use_multiprocessing=False)
The normalization scheme has been changed for use with a pretrained_resnet50 model. If you decide to use a different model, please reload your dataset with a ktrain.vision.data.images_from* function.

Is Multi-Label? True
pretrained_resnet50 model created.
In [9]:
learner.freeze(2)

The learning-rate finder indicates that a learning rate of 1e-4 would be a good choice.

In [35]:
learner.lr_find()
simulating training for different learning rates... this may take a few moments...
Epoch 1/5
567/567 [==============================] - 213s 375ms/step - loss: 0.7902 - acc: 0.6997
Epoch 2/5
567/567 [==============================] - 199s 350ms/step - loss: 0.2325 - acc: 0.9208
Epoch 3/5
567/567 [==============================] - 199s 351ms/step - loss: 0.1963 - acc: 0.9371
Epoch 4/5
567/567 [==============================] - 199s 351ms/step - loss: 0.2701 - acc: 0.9098
Epoch 5/5
135/567 [======>.......................] - ETA: 2:35 - loss: 0.4646 - acc: 0.8918

done.
Please invoke the Learner.lr_plot() method to visually inspect the loss plot to help identify the maximal learning rate associated with falling loss.
In [36]:
learner.lr_plot()

For this dataset, instead of using autofit, we will use the fit_onecycle method that utilizes the 1cycle learning rate policy. The final model achieves an F2-score of 0.928, as shown below.

In [10]:
learner.fit_onecycle(1e-4, 20)
begin training using onecycle policy with max lr of 0.0001...
Epoch 1/20
567/567 [==============================] - 219s 386ms/step - loss: 0.2452 - acc: 0.9160 - val_loss: 0.1311 - val_acc: 0.9523
Epoch 2/20
567/567 [==============================] - 206s 363ms/step - loss: 0.1429 - acc: 0.9483 - val_loss: 0.1067 - val_acc: 0.9608
Epoch 3/20
567/567 [==============================] - 206s 364ms/step - loss: 0.1241 - acc: 0.9549 - val_loss: 0.1006 - val_acc: 0.9630
Epoch 4/20
567/567 [==============================] - 206s 364ms/step - loss: 0.1148 - acc: 0.9579 - val_loss: 0.0958 - val_acc: 0.9643
Epoch 5/20
567/567 [==============================] - 205s 362ms/step - loss: 0.1080 - acc: 0.9602 - val_loss: 0.0919 - val_acc: 0.9655
Epoch 6/20
567/567 [==============================] - 205s 361ms/step - loss: 0.1056 - acc: 0.9608 - val_loss: 0.0936 - val_acc: 0.9648
Epoch 7/20
567/567 [==============================] - 205s 362ms/step - loss: 0.1035 - acc: 0.9612 - val_loss: 0.0897 - val_acc: 0.9662
Epoch 8/20
567/567 [==============================] - 205s 362ms/step - loss: 0.1021 - acc: 0.9618 - val_loss: 0.0891 - val_acc: 0.9667
Epoch 9/20
567/567 [==============================] - 205s 362ms/step - loss: 0.1012 - acc: 0.9621 - val_loss: 0.0908 - val_acc: 0.9665
Epoch 10/20
567/567 [==============================] - 206s 363ms/step - loss: 0.0991 - acc: 0.9628 - val_loss: 0.0954 - val_acc: 0.9654
Epoch 11/20
567/567 [==============================] - 206s 363ms/step - loss: 0.0978 - acc: 0.9636 - val_loss: 0.0943 - val_acc: 0.9656
Epoch 12/20
567/567 [==============================] - 206s 363ms/step - loss: 0.0974 - acc: 0.9634 - val_loss: 0.0862 - val_acc: 0.9676
Epoch 13/20
567/567 [==============================] - 207s 364ms/step - loss: 0.0960 - acc: 0.9641 - val_loss: 0.0855 - val_acc: 0.9679
Epoch 14/20
567/567 [==============================] - 206s 364ms/step - loss: 0.0928 - acc: 0.9648 - val_loss: 0.0837 - val_acc: 0.9685
Epoch 15/20
567/567 [==============================] - 207s 364ms/step - loss: 0.0911 - acc: 0.9654 - val_loss: 0.0865 - val_acc: 0.9675
Epoch 16/20
567/567 [==============================] - 207s 365ms/step - loss: 0.0890 - acc: 0.9663 - val_loss: 0.0840 - val_acc: 0.9686
Epoch 17/20
567/567 [==============================] - 205s 361ms/step - loss: 0.0872 - acc: 0.9667 - val_loss: 0.0840 - val_acc: 0.9692
Epoch 18/20
567/567 [==============================] - 206s 364ms/step - loss: 0.0857 - acc: 0.9673 - val_loss: 0.0835 - val_acc: 0.9689
Epoch 19/20
567/567 [==============================] - 206s 363ms/step - loss: 0.0844 - acc: 0.9680 - val_loss: 0.0836 - val_acc: 0.9687
Epoch 20/20
567/567 [==============================] - 206s 363ms/step - loss: 0.0821 - acc: 0.9686 - val_loss: 0.0842 - val_acc: 0.9691
Out[10]:
<keras.callbacks.History at 0x7f0ab06896a0>

If there is not yet evidence of overfitting, it can sometimes be beneficial to train further after early stopping. Since the validation loss still appears to be decreasing, we will train a little more using a lower learning rate. We train for only one additional epoch here for illustration purposes. Prior to training, the current model is saved using the learner.save_model method in case we end up overfitting. If we do overfit, the original model can be restored using the learner.load_model method.

In [15]:
learner.save_model('/tmp/planet_model')
In [16]:
learner.fit_onecycle(1e-4/10,1)
begin training using onecycle policy with max lr of 1e-05...
Epoch 1/1
567/567 [==============================] - 206s 363ms/step - loss: 0.0821 - acc: 0.9684 - val_loss: 0.0835 - val_acc: 0.9693
Out[16]:
<keras.callbacks.History at 0x7f0adc675550>

Evaluation

The evaluation metric for the Kaggle Planet competition was the F2-score.

As shown below, this model achieves an F2-score of 0.928.

In [17]:
from sklearn.metrics import fbeta_score
import numpy as np
import warnings
def f2(preds, targs, start=0.17, end=0.24, step=0.01):
    # Compute the sample-averaged F2-score, searching over a small range of
    # probability thresholds and returning the best score found.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return max([fbeta_score(targs, (preds > th), 2, average='samples')
                    for th in np.arange(start, end, step)])
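The F2-score is simply the general F-beta formula, F_β = (1 + β²)·P·R / (β²·P + R), with β=2, which weights recall more heavily than precision. A tiny pure-Python check with made-up precision/recall values (not taken from this model) illustrates the effect:

```python
def f_beta(precision, recall, beta=2.0):
    # General F-beta score: beta > 1 weights recall more heavily than precision.
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical single sample: every predicted label is correct (precision = 1.0),
# but only half of the true labels were found (recall = 0.5).
print(round(f_beta(1.0, 0.5), 4))            # F2 sits closer to recall
print(round(f_beta(1.0, 0.5, beta=1.0), 4))  # ordinary F1 for comparison
```

With β=2, the score is pulled toward the lower recall value, which is why the Planet competition metric rewards models that find as many of the true labels as possible.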
In [18]:
y_pred = learner.model.predict_generator(val_data, steps=len(val_data))
y_true = val_data.labels
In [19]:
f2(y_pred, y_true)
Out[19]:
0.9284542715011629
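The thresholding step inside f2 is what turns the model's per-label probabilities into hard multi-label predictions. A minimal NumPy sketch with made-up probabilities (the real y_pred has one column per label):

```python
import numpy as np

# Hypothetical probabilities for 2 images over 3 labels (a stand-in for y_pred)
probs = np.array([[0.92, 0.10, 0.55],
                  [0.05, 0.81, 0.30]])

# f2 searches thresholds near 0.17-0.24; any label whose probability
# clears the threshold is predicted for that image.
threshold = 0.2
binary_preds = (probs > threshold).astype(int)
print(binary_preds)
```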

Making Predictions

Let's make some predictions using our model and examine the results. As before, we first create a Predictor instance.

In [22]:
predictor = ktrain.get_predictor(learner.model, preproc)

Let's examine the folder of images and select a couple to analyze.

In [23]:
!!ls {DATADIR}/train-jpg/ |head
Out[23]:
['train_0.jpg',
 'train_10000.jpg',
 'train_10001.jpg',
 'train_10002.jpg',
 'train_10003.jpg',
 'train_10004.jpg',
 'train_10005.jpg',
 'train_10006.jpg',
 'train_10007.jpg',
 'train_10008.jpg']

Image train_10008.jpg is categorized into the following classes:

  • artisinal_mine (i.e., small-scale mining operations - sometimes illegal in lands designated for conservation)
  • clear
  • primary (rainforest)
  • water
In [40]:
vis.show_image(DATADIR+'/train-jpg/train_10008.jpg')
Out[40]:
<matplotlib.image.AxesImage at 0x7f06b44d23c8>
In [43]:
!!head -n 1 {CONVERTED_DATA}
Out[43]:
['image_name,agriculture,artisinal_mine,bare_ground,blooming,blow_down,clear,cloudy,conventional_mine,cultivation,habitation,haze,partly_cloudy,primary,road,selective_logging,slash_burn,water']
In [44]:
!!grep train_10008.jpg {CONVERTED_DATA}
Out[44]:
['train_10008.jpg,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1']

Our predictions are consistent with these ground-truth labels, as shown below:

In [45]:
predictor.predict_filename(DATADIR+'/train-jpg/train_10008.jpg')
Out[45]:
[[('agriculture', 0.0040258057),
  ('artisinal_mine', 0.99988484),
  ('bare_ground', 0.041586075),
  ('blooming', 3.7402046e-07),
  ('blow_down', 3.65358e-08),
  ('clear', 0.99841964),
  ('cloudy', 2.9288345e-05),
  ('conventional_mine', 0.0185931),
  ('cultivation', 0.0033639816),
  ('habitation', 0.011448876),
  ('haze', 0.00052912283),
  ('partly_cloudy', 0.0012023835),
  ('primary', 0.9873427),
  ('road', 0.21895583),
  ('selective_logging', 3.939015e-05),
  ('slash_burn', 2.703976e-05),
  ('water', 0.9984894)]]
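Note that predict_filename returns (label, probability) pairs rather than hard labels; to recover the predicted classes, filter by a threshold. A small sketch using a few made-up pairs in the same shape:

```python
# Hypothetical (label, probability) pairs in the shape predict_filename returns
preds = [('artisinal_mine', 0.9999), ('clear', 0.9984),
         ('road', 0.2190), ('primary', 0.9873), ('water', 0.9985)]

threshold = 0.5  # illustrative cutoff; the f2 threshold search above favored ~0.2
predicted_labels = [label for label, p in preds if p > threshold]
print(predicted_labels)
```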

Here is another example showing water, clear, and primary.

In [46]:
vis.show_image(DATADIR+'/train-jpg/train_10010.jpg')
Out[46]:
<matplotlib.image.AxesImage at 0x7f06b45dbe10>
In [47]:
!!head -n 1 {CONVERTED_DATA}
Out[47]:
['image_name,agriculture,artisinal_mine,bare_ground,blooming,blow_down,clear,cloudy,conventional_mine,cultivation,habitation,haze,partly_cloudy,primary,road,selective_logging,slash_burn,water']
In [48]:
!!grep train_10010.jpg {CONVERTED_DATA}
Out[48]:
['train_10010.jpg,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1']
In [49]:
predictor.predict_filename(DATADIR+'/train-jpg/train_10010.jpg')
Out[49]:
[[('agriculture', 0.26275295),
  ('artisinal_mine', 0.0002662742),
  ('bare_ground', 0.00402921),
  ('blooming', 0.00014191697),
  ('blow_down', 7.4397904e-06),
  ('clear', 0.998609),
  ('cloudy', 1.0647154e-06),
  ('conventional_mine', 4.7428235e-05),
  ('cultivation', 0.08935747),
  ('habitation', 0.010819469),
  ('haze', 3.7207883e-05),
  ('partly_cloudy', 0.0010312625),
  ('primary', 0.99998605),
  ('road', 0.06903103),
  ('selective_logging', 0.0035000525),
  ('slash_burn', 0.0003681789),
  ('water', 0.9974711)]]