In this post, we will implement image classification with a Convolutional Neural Network (CNN) using TensorFlow, working with the Cats vs. Dogs dataset from Kaggle.
Previously, we built an Artificial Neural Network (ANN) for the Fashion MNIST classifier. It contains several Dense (or Fully Connected) layers, where each node carries its own weights. An ANN works for image classification, but it requires an enormous number of parameters, since every input pixel is connected to every node in the first layer. Huge datasets like ImageNet, containing hundreds of thousands of images, are impractical to train with a plain ANN.
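To get a feel for the scale, here is a rough back-of-the-envelope comparison (illustrative numbers, not from the original post): a single Dense layer of 512 nodes on a flattened 150x150x3 image already needs tens of millions of weights, while one 3x3 convolution layer with 32 filters needs fewer than a thousand.

dense_params = 150 * 150 * 3 * 512 + 512   # weights + biases, about 34.6M parameters
conv_params = 3 * 3 * 3 * 32 + 32          # weights + biases, 896 parameters
print(dense_params, conv_params)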
But in 2012, the CNN-based AlexNet (Krizhevsky et al., 2012) outperformed the classic techniques on the ImageNet classification task. From that moment on, CNNs have been widely used in computer vision, and they became one of the main roots of Deep Neural Network architectures.
In this post, we will implement a CNN model that classifies images of cats and dogs. The dataset was published on Kaggle in 2013. As you know, cats and dogs each have a distinctive appearance that can be extracted as features. Of course, there are many breeds of cats and dogs, and it is hard to define them all, but in this case we just want to classify an image as either cat or dog.
In a previous post we introduced how to build a CNN model and how to use data augmentation for data transformation; we will apply both here. And one more thing: we will try transfer learning with a well-known pre-trained model. Transfer learning is a method in which a model is first trained on a huge dataset, and then its output layer is replaced to fit our own task. For instance, MobileNet is well trained on the ImageNet dataset, but our goal is to classify just two classes, cats and dogs. So we modify the MobileNet model for our case. We'll see it later in the post.
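As a preview, here is a minimal sketch of that idea (the full version comes later in the post; the input size and head layers here are illustrative choices, not the post's final architecture):

import tensorflow as tf

# Load MobileNetV2 pre-trained on ImageNet, without its top (output) layer
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(150, 150, 3), include_top=False, weights='imagenet')
base_model.trainable = False  # freeze the pre-trained weights

# Replace the output layer with a small head for our binary (cat/dog) task
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])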
First, we need to import some packages for the implementation.
import os
import time
import re
from glob import glob
import shutil
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from PIL import Image
print("Tensorflow: v{}".format(tf.__version__))
%matplotlib inline
Tensorflow: v2.2.0
As mentioned earlier, the dataset was released on Kaggle. The original dataset has 12,500 images of dogs and 12,500 images of cats, 25,000 images in total. That is a huge amount for training a model, but in our case we will use only 1,000 images for training, 500 images for validation, and 1,000 images for testing.
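The post does not show this step, but such a subset can be carved out of the original Kaggle archive with the packages imported above. A minimal sketch, assuming the downloaded images sit in ./dataset/train (as cat.0.jpg, dog.0.jpg, and so on), that the counts are per class, and that the split folder names match the paths used later in this post:

splits = {'train': (0, 1000), 'validation': (1000, 1500), 'test': (1500, 2500)}

for animal in ['cat', 'dog']:
    # all original images of this class, e.g. cat.0.jpg ... cat.12499.jpg
    files = sorted(glob('./dataset/train/{}.*.jpg'.format(animal)))
    for split, (start, end) in splits.items():
        dst_dir = os.path.join('./dataset/my_cat_dog', split, animal)
        os.makedirs(dst_dir, exist_ok=True)
        for f in files[start:end]:
            shutil.copy(f, dst_dir)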
Actually, 1,000 images are not enough data for training. But as mentioned earlier, we already learned how to enlarge a dataset with transformations: data augmentation. There are several techniques for transforming an image; in this case, we will use random crop and random rotation (with a central crop for the validation and test data).
Of course, the model input must have a fixed size, so after data augmentation we need to resize the transformed image. In this case, we choose 150x150 for the input image.
First, we implement the image loading function.
def load(f, label):
    # Read the file into a tensor of raw bytes
    image = tf.io.read_file(f)
    # Decode the JPEG-encoded bytes into an image tensor
    image = tf.image.decode_jpeg(image)
    # Convert it to tf.float32
    image = tf.cast(image, tf.float32)
    return image, label
Let's test it to check that it works.
image, label = load('./dataset/my_cat_dog/train/cat/cat.11.jpg', 0)
fig, ax = plt.subplots()
ax.imshow(image / 255.)
ax.set_title(label)
plt.show()
The plot shows a sample image of a cat, which has label 0 (cat). We also need to fix the input size; for this purpose, we implement a resize function. TensorFlow provides the tf.image module for image processing, so we can use it. Note that the size argument must be ordered as [height, width].
def resize(input_image, size):
    return tf.image.resize(input_image, size)
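As a quick sanity check of the argument order (height first, then width), using the sample image loaded above:

resized = resize(image, [176, 176])
print(resized.shape)  # (176, 176, 3)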
Next, we implement the random crop function. TensorFlow already provides a random_crop API for convenience.
def random_crop(input_image):
    return tf.image.random_crop(input_image, size=[150, 150, 3])
fig, ax = plt.subplots(1, 2)
ax[0].imshow(image / 255.)
ax[0].set_title("Original Image")
ax[1].imshow(random_crop(image) / 255.)
ax[1].set_title("Random Crop Image")
plt.show()
Now think about what happens when validation or test data is used. Can random_crop help the model's performance there? Validation and testing do not affect the model's training; they just measure its accuracy. So we do not need data augmentation for the validation and test data. Moreover, a random crop may sometimes cut out a useless section of the image that cannot be classified correctly. In that case, a central crop is required instead of a random one, so we implement it with TensorFlow's central_crop function.
From the documentation, central_crop takes a central_fraction argument: the API crops around the center point, keeping that fraction of the image. Our goal is an input of 150x150x3, but remember that each image may have a different size, so we resize it first. Resizing to 176x176 and keeping a fraction of 150/176 gives exactly 150x150.
def central_crop(input_image):
    image = resize(input_image, [176, 176])
    # 176 * (150 / 176) = 150, so this keeps exactly the central 150x150 region
    return tf.image.central_crop(image, central_fraction=150 / 176)
fig, ax = plt.subplots(1, 2)
ax[0].imshow(image / 255.)
ax[0].set_title("Original Image")
ax[1].imshow(central_crop(image) / 255.)
ax[1].set_title("Central Crop Image")
plt.show()
Then we implement a random_rotation function on top of TensorFlow's rot90 API.
def random_rotation(input_image):
    # Pick a random number of 90-degree rotations: 0, 1, 2, or 3
    k = np.random.randint(0, 4)
    return tf.image.rot90(input_image, k=k)
fig, ax = plt.subplots(1, 2)
ax[0].imshow(image / 255.)
ax[0].set_title("Original Image")
ax[1].imshow(random_rotation(image) / 255.)
ax[1].set_title("Rotate Image")
plt.show()
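With these pieces in place, one natural way to tie them together (a sketch under the assumptions above, including normalizing pixels to [0, 1]; not code from the original post) is a pair of per-split preprocessing functions: random transformations for training, and the deterministic central crop for validation and test.

def normalize(input_image):
    # scale pixel values from [0, 255] to [0, 1]
    return input_image / 255.

def load_image_train(f, label):
    # random augmentation for the training split
    image, label = load(f, label)
    image = resize(image, [176, 176])   # enlarge a bit so the crop has room
    image = random_rotation(image)
    image = random_crop(image)          # down to 150x150x3
    return normalize(image), label

def load_image_test(f, label):
    # deterministic preprocessing for the validation/test splits
    image, label = load(f, label)
    image = central_crop(image)         # resizes, then keeps the central 150x150x3
    return normalize(image), label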