To run any of Eden's notebooks, please check the guides on our Wiki page.
There you will find instructions on how to deploy the notebooks on your local system, on Google Colab, or on MyBinder, as well as other useful links, troubleshooting tips, and more.
Note: If you run into any issues while executing the notebook, don't hesitate to open an issue on GitHub. We will try to reply as soon as possible.
In this notebook, we are going to cover a technique called Transfer Learning, which generally refers to training a machine learning model on one problem and then reusing it in some way on a second, related problem (Bengio, 2012). In deep learning specifically, this is usually done by retraining only some layers of a pre-trained network. The promise is that training becomes more efficient and, in the best case, performance surpasses that of a model trained from scratch.
Although it is common to freeze some low-level layers and fine-tune only the top part of the network, there is a more flexible alternative: fine-tuning different parts of the network with different learning rates. This way, we could, for example, fine-tune the first layers with a small learning rate such as 1e-4, and the fully-connected part of the architecture with a larger one, such as 1e-2. This technique offers endless possibilities; you could even update every layer with its own learning rate.
In this notebook, we are going to use TensorFlow Addons to train the different layers of an architecture with different learning rates.
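As a quick preview of the mechanism, tfa.optimizers.MultiOptimizer takes a list of (optimizer, layers) pairs and routes each layer's gradients to its assigned optimizer. The toy two-layer model below is only a minimal sketch of the API; the full MobileNetV3 pipeline follows later in this notebook.

import tensorflow as tf
import tensorflow_addons as tfa

# Toy model: a "body" layer and a classifier "head".
toy_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(16,), name="body"),
    tf.keras.layers.Dense(2, activation="softmax", name="head"),
])

# Pair each optimizer with the layer(s) it should update.
toy_pairs = [
    (tf.keras.optimizers.Adam(learning_rate=1e-4), toy_model.layers[0]),  # small LR for the body
    (tf.keras.optimizers.Adam(learning_rate=1e-2), toy_model.layers[1]),  # larger LR for the head
]
toy_model.compile(
    optimizer=tfa.optimizers.MultiOptimizer(toy_pairs),
    loss="categorical_crossentropy",
)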
This notebook is an extension of the previous Eden notebooks:
!pip install tensorflow-addons
Requirement already satisfied: tensorflow-addons in /home/beast/anaconda3/envs/image_classification_v2_bak/lib/python3.7/site-packages (0.15.0)
Requirement already satisfied: typeguard>=2.7 in /home/beast/anaconda3/envs/image_classification_v2_bak/lib/python3.7/site-packages (from tensorflow-addons) (2.13.2)
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import cv2
import os
import csv
import gc
import random
import matplotlib.pyplot as plt
from tqdm import tqdm
from glob import glob
from pathlib import Path
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.applications import *
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.callbacks import ReduceLROnPlateau
import tensorflow.keras.backend as K
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from random import sample
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow.keras.models import Sequential
from tensorflow.keras.applications.mobilenet_v3 import preprocess_input
Check the docstrings for more information.
# Function for plotting images.
def plot_sample(X):
    """
    Given the array of images <X>, it plots a random subsample of 9 images.

    Parameters:
        X (ndarray): The array with all the images.
    """
    # Plot 9 randomly sampled images in a 3x3 grid.
    sample_images = sample(list(X), 9)
    plt.figure(figsize=(8, 8))
    for ix, image in enumerate(sample_images):
        plt.subplot(3, 3, ix + 1)
        plt.imshow(image)
        plt.axis("off")
    plt.show()
def read_data(path_list, im_size=(224, 224)):
    """
    Given the list of paths where the images are stored <path_list>,
    and the target size for resizing <im_size>, it returns 2 NumPy arrays
    with the images and their one-hot-encoded labels. The mapping between
    classes and folders is derived from the folder names.

    Parameters:
        path_list (List[String]): The list of paths to the images.
        im_size (Tuple): The height and width values.

    Returns:
        X (ndarray): Images
        y (ndarray): One-hot-encoded labels
    """
    X = []
    y = []
    # Extract the folder names of the datasets we read and create a label dictionary.
    tag2idx = {tag.split(os.path.sep)[-1]: i for i, tag in enumerate(path_list)}
    for path in path_list:
        for im_file in tqdm(glob(path + "*/*")):  # Read all files in path.
            try:
                # os.path.sep is OS agnostic (either '/' or '\'); [-2] grabs the folder name.
                label = im_file.split(os.path.sep)[-2]
                im = cv2.imread(im_file, cv2.IMREAD_COLOR)
                # By default OpenCV reads in BGR format; convert back to RGB.
                im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
                # Resize to the target dimensions. You can try different interpolation methods.
                # im = quantize_image(im)
                im = cv2.resize(im, im_size, interpolation=cv2.INTER_AREA)
                X.append(im)
                y.append(tag2idx[label])  # Append the integer label to y.
            except Exception:
                # Skip annotation or metadata files that cannot be decoded as images.
                print("Not a picture")
    X = np.array(X)  # Convert list to NumPy array.
    # One-hot encode the integer labels: row i of the identity matrix encodes class i.
    y = np.eye(len(np.unique(y)))[y].astype(np.uint8)
    return X, y
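As a quick illustration (with hypothetical folder names; the real call happens below, once the dataset paths are resolved), read_data returns arrays shaped as follows:

# Hypothetical layout: one folder per class, each containing images.
X_demo, y_demo = read_data(["datasets/weed_a", "datasets/weed_b"], im_size=(224, 224))
print(X_demo.shape)  # (n_images, 224, 224, 3), dtype uint8
print(y_demo.shape)  # (n_images, 2): one-hot labels, one column per class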
# Callbacks are used for saving the best weights and
# early stopping.
def get_callbacks(weights_file, patience):
    """
    Callbacks are used for saving the best weights and early stopping.
    Given some configuration parameters, it creates the callbacks that
    will be used by Keras after each epoch.

    Parameters:
        weights_file (String): File name for saving the best model weights.
        patience (Integer): Number of epochs without improvement to wait.

    Returns:
        callbacks (List[Callback]): Configured callbacks ready to use.
    """
    return [
        # Only save the weights that correspond to the maximum validation accuracy.
        ModelCheckpoint(
            filepath=weights_file,
            monitor="val_accuracy",
            mode="max",
            save_best_only=True,
            save_weights_only=True,
        ),
        # If val_loss doesn't improve for 'patience' epochs,
        # training will stop to avoid overfitting.
        EarlyStopping(monitor="val_loss", mode="min", patience=patience, verbose=1),
    ]
def get_architecture(y, mobilenet_size):
    """
    Given the one-hot-encoded labels <y> and the model size <mobilenet_size>
    ("small" or "large"), it returns a MobileNetV3 architecture ready to be
    compiled with different learning rates.
    """
    inputs = layers.Input(shape=INPUT_SHAPE)
    input_aug = img_augmentation(inputs)
    # MobileNetV3's preprocess_input is a pass-through placeholder:
    # input scaling is already included inside the model itself.
    input_norm = layers.Lambda(preprocess_input)(input_aug)
    if mobilenet_size == "small":
        feature_extractor = MobileNetV3Small(
            weights="imagenet", include_top=False, input_tensor=input_norm
        )
    elif mobilenet_size == "large":
        feature_extractor = MobileNetV3Large(
            weights="imagenet", include_top=False, input_tensor=input_norm
        )
    # Create the new classification head on top.
    features = layers.GlobalAveragePooling2D(name="pool")(
        feature_extractor.output
    )  # Flattening layer.
    fully = layers.Dense(units=64, activation="relu")(features)  # Fully connected layer.
    fully = layers.Dropout(0.3)(fully)  # Regularize with dropout.
    # Classifier with one unit per training class.
    out = layers.Dense(units=y.shape[1], activation="softmax")(fully)
    # This is the final model.
    model = Model(inputs, out)
    # model.summary()
    return model
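For contrast, the classic strategy mentioned in the introduction, freezing the pre-trained encoder and training only the new head, would look like the following sketch (not used in this notebook):

# Sketch: freeze everything except the 4-layer head (pool, dense, dropout, classifier).
frozen_model = get_architecture(y, mobilenet_size="small")
for layer in frozen_model.layers[:-4]:
    layer.trainable = False  # Keep the ImageNet weights fixed.
frozen_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)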
# Plot the validation-accuracy learning curve for each training method.
def plot_performances(performances):
    """
    Given the list of (method name, validation accuracies) tuples <performances>,
    it plots how the validation accuracy progressed during the
    training/validation process.

    Parameters:
        performances (List[Tuple]): The list of method-performance tuples.
    """
    plt.figure(figsize=(14, 8))
    plt.title("Validation Accuracy vs. Number of Training Epochs")
    plt.xlabel("Training Epochs")
    plt.ylabel("Validation Accuracy")
    for performance in performances:
        plt.plot(
            range(1, len(performance[1]) + 1), performance[1], label=performance[0]
        )
    plt.ylim((0.25, 1.05))
    plt.xticks(np.arange(1, NUM_EPOCHS + 1, 1.0))
    plt.legend()
    plt.show()
INPUT_SHAPE = (224, 224, 3)
IM_SIZE = (224, 224)
NUM_EPOCHS = 30
BATCH_SIZE = 4
TEST_SPLIT = 0.2
VAL_SPLIT = 0.2
RANDOM_STATE = 2021
WEIGHTS_FILE = "weights.h5" # File that stores updated weights
# Datasets' paths we want to work on.
PATH_LIST = [
"eden_library_datasets/Black nightsade-220519-Weed-zz-V1-20210225102034",
"eden_library_datasets/Velvet leaf-220519-Weed-zz-V1-20210225104123",
]
tf.random.set_seed(RANDOM_STATE)
np.random.seed(RANDOM_STATE)
In this notebook, we are going to use two different learning rates: a smaller one for the pre-trained encoder, so the ImageNet weights are updated more cautiously, and a larger one for the new classifier head.
ENCODER_LR = 1e-4
CLASSIFIER_LR = 1e-3
# Define paths in an OS-agnostic way.
for i, path in enumerate(PATH_LIST):
    PATH_LIST[i] = str(Path(Path.cwd()).parents[0].joinpath(path))
X, y = read_data(PATH_LIST, IM_SIZE)
[tqdm progress output: 124/124 and 121/121 files processed; two non-image files skipped ("Not a picture"); one warning: "Corrupt JPEG data: 40 extraneous bytes before marker 0xd9"]
plot_sample(X)
img_augmentation = Sequential(
    [
        preprocessing.RandomRotation(factor=0.15),
        preprocessing.RandomTranslation(height_factor=0.1, width_factor=0.1),
        preprocessing.RandomFlip(),
        preprocessing.RandomContrast(factor=0.1),
    ],
    name="img_augmentation",
)
[TensorFlow startup logs: oneDNN CPU optimizations (AVX2, FMA) enabled; created GPU device /job:localhost/replica:0/task:0/device:GPU:0 with 22040 MB memory (GeForce RTX 3090, compute capability 8.6)]
IMAGE_IX = 10
image = tf.expand_dims(X[IMAGE_IX], axis=0)
plt.figure(figsize=(8, 8))
for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    aug_img = img_augmentation(image)
    plt.imshow(aug_img[0].numpy().astype("uint8"))
    plt.axis("off")
plt.show()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_SPLIT, shuffle=True, stratify=y, random_state=RANDOM_STATE
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train,
    y_train,
    test_size=VAL_SPLIT,
    shuffle=True,
    stratify=y_train,
    random_state=RANDOM_STATE,
)
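Since the validation fraction is carved out of what remains after the test split, the effective proportions are roughly 64% train / 16% validation / 20% test (0.8 × 0.8, 0.8 × 0.2, and 0.2 of the data, respectively). A quick sanity check:

for name, arr in [("train", X_train), ("val", X_val), ("test", X_test)]:
    print(f"{name}: {len(arr)} images")
# With the two datasets above (~243 readable images), this gives roughly 155 / 39 / 49.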
optimizers = [
    tf.keras.optimizers.Adam(learning_rate=ENCODER_LR),
    tf.keras.optimizers.Adam(learning_rate=CLASSIFIER_LR),
]
model = get_architecture(y, mobilenet_size="small")
for layer in model.layers[-4:]:
    print(layer.name)
pool
dense
dropout
dense_1
# Pair the small learning rate with the encoder layers and the larger one with the 4-layer head.
optimizers_and_layers = [(optimizers[0], model.layers[0:-4]), (optimizers[1], model.layers[-4:])]
optimizer = tfa.optimizers.MultiOptimizer(optimizers_and_layers)
model.compile(
    optimizer=optimizer,
    loss=keras.losses.CategoricalCrossentropy(),
    metrics=keras.metrics.CategoricalAccuracy(name="accuracy"),
)
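To double-check which parameters each optimizer will update, we can count the trainable weight tensors in each group; a quick sketch using standard Keras layer attributes:

# Count the trainable weight tensors assigned to each optimizer.
encoder_weights = sum(len(layer.trainable_weights) for layer in model.layers[0:-4])
head_weights = sum(len(layer.trainable_weights) for layer in model.layers[-4:])
print(f"Encoder: {encoder_weights} weight tensors @ lr={ENCODER_LR}")
print(f"Head: {head_weights} weight tensors @ lr={CLASSIFIER_LR}")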
%%time
history_v3Small = model.fit(
    X_train,  # train data
    y_train,  # labels
    batch_size=BATCH_SIZE,
    epochs=NUM_EPOCHS,
    validation_data=(X_val, y_val),
    callbacks=get_callbacks(WEIGHTS_FILE, NUM_EPOCHS // 2),
)
Epoch 1/30
[cuDNN 8101 loaded; TensorFloat-32 enabled for matrix multiplications]
[Training log truncated: 39 steps/epoch; val_accuracy rose from 0.5128 at epoch 1 to 1.0000, first reached at epoch 10, and ended at 0.9231 after epoch 30]
CPU times: user 53.9 s, sys: 4.49 s, total: 58.4 s
Wall time: 51.4 s
model.load_weights(WEIGHTS_FILE)
final_accuracy_adam_lr2 = model.evaluate(X_test, y_test, batch_size=1, verbose=0)[1]
print("*" * 50)
print(f"Final MobileNetV3-Small. Accuracy: {final_accuracy_adam_lr2}")
print("*" * 50)
print()
**************************************************
Final MobileNetV3-Small. Accuracy: 0.9591836929321289
**************************************************
This technique is worth exploring during fine-tuning, as it strikes a balance between freezing the ImageNet weights entirely and modifying them at the same rate as the fully-connected part of the network.
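As noted in the introduction, the same mechanism extends to any granularity, even one optimizer per layer. Below is a sketch (reusing the model defined above) that assigns geometrically increasing learning rates from the input layers towards the head; treat it as a starting point rather than a tuned recipe:

# Sketch: one Adam optimizer per layer, with learning rates growing towards the head.
layer_lrs = np.geomspace(1e-5, 1e-2, num=len(model.layers))
per_layer_pairs = [
    (tf.keras.optimizers.Adam(learning_rate=lr), layer)
    for lr, layer in zip(layer_lrs, model.layers)
]
per_layer_optimizer = tfa.optimizers.MultiOptimizer(per_layer_pairs)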
Bengio, Y. (2012). Deep Learning of Representations for Unsupervised and Transfer Learning. Journal of Machine Learning Research, 17–37.
Wang, G., Sun, Y., & Wang, J. (2017). Automatic Image-Based Plant Disease Severity Estimation Using Deep Learning. Computational Intelligence and Neuroscience, 2017:8.
Mehdipour-Ghazi, M., Yanikoglu, B.A., & Aptoula, E. (2017). Plant identification using deep neural networks via optimization of transfer learning parameters. Neurocomputing, 235, 228–235.
Suh, H.K., IJsselmuiden, J., Hofstee, J.W., & van Henten, E.J. (2018). Transfer learning for the classification of sugar beet and volunteer potato under field conditions. Biosystems Engineering, 174, 50–65.
Kounalakis, T., Triantafyllidis, G.A., & Nalpantidis, L. (2019). Deep learning-based visual recognition of rumex for robotic precision farming. Computers and Electronics in Agriculture.
Too, E.C., Yujian, L., Njuki, S., & Ying-chun, L. (2019). A comparative study of fine-tuning deep learning models for plant disease identification. Computers and Electronics in Agriculture, 161, 272–279.
Espejo-Garcia, B., Mylonas, N., Athanasakos, L., & Fountas, S. (2020). Improving Weeds Identification with a Repository of Agricultural Pre-trained Deep Neural Networks. Computers and Electronics in Agriculture, 175.
Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., & Chen, L. (2018). MobileNetV2: Inverted Residuals and Linear Bottlenecks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4510–4520.
Howard, A.G., Sandler, M., Chu, G., Chen, L., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., Le, Q.V., & Adam, H. (2019). Searching for MobileNetV3. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 1314–1324.
https://medium.com/geekculture/a-2021-guide-to-improving-cnns-optimizers-adam-vs-sgd-495848ac6008