%reload_ext autoreload
%autoreload 2
%matplotlib inline
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID";
os.environ["CUDA_VISIBLE_DEVICES"]="0"
We will begin our image classification example by importing some required modules.
import ktrain
from ktrain import vision as vis
Next, we will load and preprocess the image data for training and validation. ktrain can load images and associated labels from a variety of sources:
images_from_folder: labels are represented as subfolders containing images [ example notebook ]
images_from_csv: labels are mapped to images in a CSV file [ example notebook ]
images_from_fname: labels are included as part of the filename and must be extracted using a regular expression [ example notebook ]
images_from_array: images and labels are stored in arrays [ example notebook ]
Here, we use the images_from_folder function to load the data as a generator (i.e., DirectoryIterator object). This function assumes the following directory structure:
├── datadir
│   ├── train
│   │   ├── class0       # folder containing images of class 0
│   │   ├── class1       # folder containing images of class 1
│   │   ├── class2       # folder containing images of class 2
│   │   └── classN       # folder containing images of class N
│   └── test
│       ├── class0       # folder containing images of class 0
│       ├── class1       # folder containing images of class 1
│       ├── class2       # folder containing images of class 2
│       └── classN       # folder containing images of class N
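Before loading the data, it can help to confirm that a dataset folder actually matches this layout. The following is a minimal sketch using only the standard library. The helper names list_classes and check_layout are hypothetical and are not part of ktrain.

```python
from pathlib import Path

def list_classes(datadir, split="train"):
    """Return the sorted class-subfolder names under datadir/split."""
    split_dir = Path(datadir) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing folder: {split_dir}")
    # each immediate subfolder is treated as one class label
    return sorted(p.name for p in split_dir.iterdir() if p.is_dir())

def check_layout(datadir, train_test_names=("train", "test")):
    """Check that the train and test folders contain the same class subfolders."""
    train_classes = list_classes(datadir, train_test_names[0])
    test_classes = list_classes(datadir, train_test_names[1])
    if train_classes != test_classes:
        raise ValueError(f"class folders differ: {train_classes} vs {test_classes}")
    return train_classes
```

For the cats and dogs dataset used here, a call such as check_layout('data/dogscats', ('train', 'valid')) would be expected to return the two class folder names, assuming the archive was extracted to that path.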
The train_test_names argument can be used if the train and test subfolders are named differently (e.g., the test folder is called valid). Here, we load a dataset of cat and dog images, which can be obtained from here. The DATADIR variable should be set to the path to the extracted folder. The data_aug parameter can be used to employ data augmentation. We set this parameter using the get_data_aug function, which returns a default data augmentation with horizontal_flip=True as the only change to the defaults. See the Keras documentation for a full set of augmentation parameters.
Finally, we pass the requested target size (224,224) and color mode (rgb, which is a 3-channel image). The images will be resized or converted appropriately based on the values supplied. A target size of 224 by 224 is typically used with a network pretrained on ImageNet, which we do next. The images_from_folder function returns generators for both the training and validation data in addition to an instance of ktrain.vision.ImagePreprocessor, which can be used to preprocess raw data when making predictions for new examples. This will be demonstrated later.
DATADIR = 'data/dogscats'
(train_data, val_data, preproc) = vis.images_from_folder(
    datadir=DATADIR,
    # use a default data augmentation with horizontal_flip=True
    data_aug=vis.get_data_aug(horizontal_flip=True),
    train_test_names=['train', 'valid'],
    target_size=(224,224), color_mode='rgb')
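To make the horizontal_flip=True setting concrete, here is a small NumPy sketch of what a horizontal flip does to an image array. It is an illustration only and does not reflect how Keras or ktrain implement augmentation internally. The function name and the toy 2x3 "image" are made up for this example.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror an image array of shape (height, width, channels) left to right."""
    return image[:, ::-1, :]

# toy 2x3 single-channel "image": pixel values increase from left to right
image = np.arange(6).reshape(2, 3, 1)
flipped = horizontal_flip(image)

print(image[:, :, 0])    # original pixel grid
print(flipped[:, :, 0])  # same rows, with the column order reversed
```

A cat facing left is still a cat when mirrored, which is why this flip is a safe augmentation for this dataset.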
Let's examine some sample cat and dog images from the training set:
print('sample cat images:')
vis.show_random_images(DATADIR+'/train/cats/')
print('sample dog images:')
vis.show_random_images(DATADIR+'/train/dogs/')
Next, we use the image_classifier function to load a ResNet50 model pre-trained on ImageNet. For more information on using pretrained networks, see this blog post. By default, all layers except the randomly initialized custom Dense layers on top are frozen (i.e., not trainable). We then wrap the model and data in a Learner object. We specify 8 CPU workers to load batches during training, disable multiprocessing, and use a batch size of 64. You can change these values based on your system specification to see what yields the best performance.
# let's print the available precanned image classification models in ktrain
vis.print_image_classifiers()
model = vis.image_classifier('pretrained_resnet50', train_data, val_data)
learner = ktrain.get_learner(model=model, train_data=train_data, val_data=val_data,
workers=8, use_multiprocessing=False, batch_size=64)
Next, we freeze the first 15 layers, as the ImageNet pre-trained weights of these early layers are typically applicable as is. All other layers are unfrozen and trainable. You can use the learner.freeze and learner.unfreeze methods to selectively freeze and unfreeze layers, if necessary. learner.freeze(freeze_range=15) and learner.unfreeze(exclude_range=15) are equivalent. The number of layers you freeze will depend on how similar your dataset is to ImageNet and other particulars of the dataset. For instance, classifying satellite images or subcellular protein patterns may require fewer frozen layers than classifying pictures of dogs and cats. You can also begin training for a few epochs with many frozen layers and gradually unfreeze layers for later epochs.
learner.freeze(freeze_range=15)
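The equivalence between freezing the first 15 layers and unfreezing everything except the first 15 can be illustrated with a toy model. This is only a sketch of the semantics described above, not ktrain's implementation: the Layer class and both functions are hypothetical stand-ins for real Keras layers and the Learner methods.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """Hypothetical stand-in for a network layer with a trainable flag."""
    name: str
    trainable: bool = True

def freeze(layers, freeze_range):
    """Freeze the first `freeze_range` layers and make the rest trainable."""
    for i, layer in enumerate(layers):
        layer.trainable = i >= freeze_range

def unfreeze(layers, exclude_range):
    """Make every layer trainable, then re-freeze the first `exclude_range`."""
    for layer in layers:
        layer.trainable = True
    for layer in layers[:exclude_range]:
        layer.trainable = False

layers_a = [Layer(f"layer_{i}") for i in range(20)]
layers_b = [Layer(f"layer_{i}") for i in range(20)]
freeze(layers_a, freeze_range=15)
unfreeze(layers_b, exclude_range=15)

# both calls leave the same layers frozen
print([l.trainable for l in layers_a] == [l.trainable for l in layers_b])
```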
You can use the print_layers method to examine the layers of the created network.
learner.print_layers()
As shown before, we use the Learning Rate Finder in ktrain to find a good initial learning rate.
learner.lr_find()
learner.lr_plot()
Finally, we will use the autofit method to train our model using a triangular learning rate policy. Since we have not specified the number of epochs, the maximum learning rate will be periodically reduced when validation loss fails to decrease, and training will eventually stop automatically.
Our final validation accuracy is 99.55%, first occurring at the 8th epoch during this run.
learner.autofit(1e-4)
loss, acc = learner.model.evaluate_generator(learner.val_data,
steps=len(learner.val_data))
print('final loss:%s, final accuracy:%s' % (loss, acc))
As can be seen, the final validation accuracy of our model is 99.55%.
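For readers unfamiliar with the triangular learning rate policy used by autofit, the general idea can be sketched as follows. This is a generic cyclical schedule, not ktrain's actual code: the function name, the cycle length, and the lower bound base_lr are assumptions chosen for illustration.

```python
def triangular_lr(step, steps_per_cycle, base_lr, max_lr):
    """Learning rate at `step` under a generic triangular cyclical policy.

    The rate rises linearly from base_lr to max_lr over the first half of
    each cycle and falls back to base_lr over the second half.
    """
    half = steps_per_cycle / 2
    pos = step % steps_per_cycle       # position within the current cycle
    frac = 1 - abs(pos - half) / half  # goes 0 -> 1 -> 0 across the cycle
    return base_lr + (max_lr - base_lr) * frac

# assumed values: a 10-step cycle peaking at the 1e-4 rate passed to autofit
for step in range(0, 11, 5):
    print(step, triangular_lr(step, steps_per_cycle=10, base_lr=1e-5, max_lr=1e-4))
```

The rate printed for step 5 is the peak of the cycle, while steps 0 and 10 sit at the lower bound.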
Finally, let's use our model to make predictions for some images.
Here is a sample image of both a cat and a dog from the validation set.
!!ls {DATADIR}/valid/cats |head -n3
!!ls {DATADIR}/valid/dogs |head -n3
vis.show_image(DATADIR+'/valid/cats/cat.10016.jpg')
vis.show_image(DATADIR+'/valid/dogs/dog.10001.jpg')
Now, let's create a predictor object to make predictions for the above images.
predictor = ktrain.get_predictor(learner.model, preproc)
Let's see if we predict the selected cat and dog images correctly.
predictor.predict_filename(DATADIR+'/valid/cats/cat.10016.jpg')
predictor.predict_filename(DATADIR+'/valid/dogs/dog.10001.jpg')
Our predictor is working well. We can save our predictor to disk for later use in an application.
predictor.save('/tmp/cat_vs_dog_detector')
Let's load our predictor from disk to show that it still works as expected.
predictor = ktrain.load_predictor('/tmp/cat_vs_dog_detector')
predictor.predict_filename(DATADIR+'/valid/cats/cat.10016.jpg')
predictor.predict_filename(DATADIR+'/valid/dogs/dog.10001.jpg')
Finally, let's make predictions for all the cat pictures in our validation set:
predictor.predict_folder(DATADIR+'/valid/cats/')[:10]
By default, predict* methods in ktrain return the predicted class labels. To view the predicted probabilities for each class, supply return_proba=True as an extra argument:
predictor.predict_filename(filename, return_proba=True)
predictor.predict_folder(foldername, return_proba=True)
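The relationship between class probabilities and a predicted label can be sketched in a few lines. The example below assumes the probabilities arrive as one value per class, in the same order as the class names. The helper name, the class names, and the probability values are all made up for illustration and are not output from the model above.

```python
def proba_to_label(probs, class_names):
    """Return the class name with the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best]

class_names = ["cats", "dogs"]  # assumed to match the class subfolder names
probs = [0.03, 0.97]            # made-up probabilities for a single image
print(proba_to_label(probs, class_names))
```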
In the previous example, the classes were mutually exclusive. That is, images contained either a dog or a cat, but not both. Some problems are multi-label classification problems in that each image can belong to multiple classes (or categories). One such instance of this is the Kaggle Planet Competition. In this competition, we are given a collection of satellite images of the Amazon rainforest. The objective here is to identify locations of deforestation and human encroachment on forests by classifying images into up to 17 different categories. Categories include agriculture, habitation, selective_logging, and slash_burn. A given satellite image can belong to more than one category.
The dataset can be downloaded from the competition page. The satellite images are located in a zipped folder called train-jpg.zip. The labels for each image are in the form of a CSV (i.e., train_v2.csv) with file names and their labels. Let us first examine the CSV file for this dataset. Be sure to set the DATADIR variable to the path of the extracted dataset.
DATADIR = 'data/planet'
!!head {DATADIR}/train_v2.csv
We make three observations.
Let us first convert this CSV into a new CSV that includes one-hot-encoded representations of the tags and appends the file extension to the file names. Since this dataset format is somewhat common (especially for multi-label image classification problems), ktrain contains a convenience function to automatically perform the conversion.
ORIGINAL_DATA = DATADIR+'/train_v2.csv'
CONVERTED_DATA = DATADIR+'/train_v2-CONVERTED.csv'
labels = vis.preprocess_csv(ORIGINAL_DATA,
CONVERTED_DATA,
x_col='image_name', y_col='tags', suffix='.jpg')
!!head {DATADIR}/train_v2-CONVERTED.csv
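To show what this kind of conversion involves, here is a standard-library sketch that one-hot-encodes a tags column and appends a file extension. It is not the implementation of vis.preprocess_csv. It assumes the tags are space-delimited, the helper name one_hot_rows is hypothetical, and the two sample rows are made up to mimic the input format.

```python
import csv
import io

def one_hot_rows(rows, x_col="image_name", y_col="tags", suffix=".jpg"):
    """Convert rows with space-delimited tags into one-hot-encoded rows.

    Returns (labels, converted_rows), where labels is the sorted tag list.
    """
    rows = list(rows)
    labels = sorted({tag for row in rows for tag in row[y_col].split()})
    converted = []
    for row in rows:
        tags = set(row[y_col].split())
        new_row = {x_col: row[x_col] + suffix}  # append the file extension
        new_row.update({label: int(label in tags) for label in labels})
        converted.append(new_row)
    return labels, converted

# made-up sample rows in the assumed input format
sample = io.StringIO(
    "image_name,tags\n"
    "train_0,haze primary\n"
    "train_1,agriculture clear primary water\n"
)
sample_labels, converted = one_hot_rows(csv.DictReader(sample))
print(sample_labels)
print(converted[0])
```

Each output row carries the full file name plus one 0/1 column per tag, which is the shape a multi-label classifier needs for its targets.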
We can use the images_from_csv function to load the data as generators. Remember to specify preprocess_for='resnet50', as we will be using a ResNet50 architecture again.
train_data, val_data, preproc = vis.images_from_csv(
CONVERTED_DATA,
'image_name',
directory=DATADIR+'/train-jpg',
val_filepath = None,
label_columns = labels,
data_aug=vis.get_data_aug(horizontal_flip=True, vertical_flip=True))
As before, we load a pre-trained ResNet50 model (the default) and wrap this model and the data in a Learner object. Here, we freeze only the first two layers, as the satellite images are comparatively more dissimilar to ImageNet. Thus, the weights in earlier layers will need more updating.
model = vis.image_classifier('pretrained_resnet50', train_data, val_data=val_data)
learner = ktrain.get_learner(model, train_data=train_data, val_data=val_data,
batch_size=64, workers=8, use_multiprocessing=False)
learner.freeze(2)
The learning-rate-finder indicates a learning rate of 1e-4 would be a good choice.
learner.lr_find()
learner.lr_plot()
For this dataset, instead of using autofit, we will use the fit_onecycle method that utilizes the 1cycle learning rate policy. The final model achieves an F2-score of 0.928, as shown below.
learner.fit_onecycle(1e-4, 20)
If there is not yet evidence of overfitting, it can sometimes be beneficial to train further after early stopping. Since the validation loss still appears to be decreasing, we will train a little longer using a lower learning rate. We only train for one additional epoch here for illustration purposes. Prior to training, the current model is saved using the learner.save_model method in case we end up overfitting. If the model overfits, the original model can be restored using the learner.load_model method.
learner.save_model('/tmp/planet_model')
learner.fit_onecycle(1e-4/10,1)
The evaluation metric for the Kaggle Planet competition was the F2-score.
As shown below, this model achieves an F2-score of 0.928.
from sklearn.metrics import fbeta_score
import numpy as np
import warnings
def f2(preds, targs, start=0.17, end=0.24, step=0.01):
    # report the best sample-averaged F2 over a small range of decision thresholds
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return max([fbeta_score(targs, (preds>th), beta=2, average='samples')
                    for th in np.arange(start,end,step)])
y_pred = learner.model.predict_generator(val_data, steps=len(val_data))
y_true = val_data.labels
f2(y_pred, y_true)
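To clarify what the sample-averaged F2-score measures, here is a NumPy sketch that computes it directly from the definition F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP) with beta = 2. It is an illustration rather than a replacement for fbeta_score. The function name, the toy labels, and the probabilities are made up, and a sample with no true and no predicted labels is scored as 0 here by assumption.

```python
import numpy as np

def f2_samples(y_true, y_pred):
    """Sample-averaged F2 for binary indicator arrays of shape (n_samples, n_labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=1)   # true positives per sample
    fp = (~y_true & y_pred).sum(axis=1)  # false positives per sample
    fn = (y_true & ~y_pred).sum(axis=1)  # false negatives per sample
    beta2 = 4                            # beta = 2, so beta squared is 4
    denom = (1 + beta2) * tp + beta2 * fn + fp
    # guard against division by zero; such samples are scored as 0 (assumption)
    scores = np.where(denom > 0, (1 + beta2) * tp / np.maximum(denom, 1), 0.0)
    return scores.mean()

# toy data: 2 images, 3 labels, with made-up predicted probabilities
toy_true = np.array([[1, 1, 0], [0, 1, 0]])
toy_probs = np.array([[0.9, 0.1, 0.3], [0.2, 0.8, 0.05]])
toy_score = f2_samples(toy_true, toy_probs > 0.2)  # threshold probabilities at 0.2
print(toy_score)
```

Because beta is 2, a missed label (false negative) is penalized four times as heavily as a spurious one (false positive), which is why relatively low thresholds such as those between 0.17 and 0.24 tend to do well on this metric.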
Let's make some predictions using our model and examine results. As before, we first create a Predictor instance.
predictor = ktrain.get_predictor(learner.model, preproc)
Let's examine the folder of images and select a couple to analyze.
!!ls {DATADIR}/train-jpg/ |head
Image train_10008.jpg is categorized into the following classes:
vis.show_image(DATADIR+'/train-jpg/train_10008.jpg')
!!head -n 1 {CONVERTED_DATA}
!!grep train_10008.jpg {CONVERTED_DATA}
Our predictions are consistent as shown below:
predictor.predict_filename(DATADIR+'/train-jpg/train_10008.jpg')
Here is another example showing water, clear, and primary.
vis.show_image(DATADIR+'/train-jpg/train_10010.jpg')
!!head -n 1 {CONVERTED_DATA}
!!grep train_10010.jpg {CONVERTED_DATA}
predictor.predict_filename(DATADIR+'/train-jpg/train_10010.jpg')