Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
In this notebook, we'll cover how to test different hyperparameters for a particular dataset and how to benchmark different parameters across a group of datasets. Note that this re-uses functionality which was already introduced and described in the classification/notebooks/11_exploring_hyperparameters.ipynb notebook. Please refer to that notebook for all explanations, which this notebook will not repeat.
For an example of how to scale up with remote GPU clusters on Azure Machine Learning, please view 24_exploring_hyperparameters_on_azureml.ipynb.
Ensure edits to libraries are loaded and plotting is shown in the notebook.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
We start by importing the utilities we need.
import sys
import numpy as np
import scrapbook as sb
import torch
import fastai
from fastai.vision import DatasetType
sys.path.append("../../")
from utils_cv.classification.data import Urls
from utils_cv.common.data import unzip_url
from utils_cv.classification.parameter_sweeper import ParameterSweeper, clean_sweeper_df, plot_sweeper_df
from utils_cv.similarity.data import comparative_set_builder
from utils_cv.similarity.metrics import positive_image_ranks
from utils_cv.similarity.model import compute_features_learner
fastai.__version__
'1.0.48'
Define the datasets and parameters we will use in this notebook.
DATA_PATHS = [unzip_url(Urls.fridge_objects_path, exist_ok=True), unzip_url(Urls.fridge_objects_watermark_path, exist_ok=True)]
REPS = 3
LEARNING_RATES = [1e-3, 1e-4, 1e-5]
IM_SIZES = [300, 500]
EPOCHS = [16]
DROPOUTS = [0]  # Leave dropout at zero; higher values tend to perform significantly worse
For image classification, we used the percentage of correctly labeled images to measure accuracy. For image retrieval, our measure is the rank of the positive example among a large number of negatives. This was described in the 01_training_and_evaluation_introduction.ipynb notebook, and we will re-use some of the code from that notebook in the definition of the retrieval_rank() function below.
def retrieval_rank(learn):
    data = learn.data

    # Build multiple sets of comparative images from the validation images
    comparative_sets = comparative_set_builder(
        data.valid_ds, num_sets=1000, num_negatives=99
    )

    # Use penultimate layer as image representation
    embedding_layer = learn.model[1][-2]

    # Compute DNN features for all validation images
    valid_features = compute_features_learner(
        data, DatasetType.Valid, learn, embedding_layer
    )
    assert len(list(valid_features.values())[0]) == 512

    # For each comparative set compute the distances between the query image and all reference images
    for cs in comparative_sets:
        cs.compute_distances(valid_features)

    # Compute the median rank of the positive example over all comparative sets
    ranks = positive_image_ranks(comparative_sets)
    median_rank = np.median(ranks)
    return median_rank
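To make the metric concrete, here is a minimal, standard-library-only sketch of how a rank can be derived from distances. The positive_rank helper and the toy distances are hypothetical and for illustration only; the actual positive_image_ranks implementation in utils_cv may differ, for example in how it breaks ties. With one positive and 99 negatives per comparative set, a rank of 1 is the best possible value and 100 the worst, so lower is better.

```python
import statistics


def positive_rank(distances, positive_index=0):
    """Return the 1-based rank of the positive example when all
    reference images are sorted by increasing distance to the query.

    Hypothetical helper for illustration; not part of utils_cv.
    """
    positive_distance = distances[positive_index]
    # Count the references that are strictly closer to the query than the positive
    closer = sum(1 for d in distances if d < positive_distance)
    return closer + 1


# Three toy comparative sets: the first entry is the distance to the positive,
# the remaining entries are distances to the negatives.
toy_sets = [
    [0.30, 0.90, 0.10, 0.75, 0.60],  # one negative is closer than the positive
    [0.05, 0.40, 0.55, 0.20, 0.80],  # the positive is the closest reference
    [0.70, 0.10, 0.20, 0.90, 0.30],  # three negatives are closer
]

ranks = [positive_rank(distances) for distances in toy_sets]
median_rank = statistics.median(ranks)
print(ranks, median_rank)
```

Taking the median rather than the mean keeps a few badly ranked comparative sets from dominating the score.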
We start by creating the ParameterSweeper object. Before we start testing, it's a good idea to see what the default parameters are. We can use the parameters property to see those default values.
sweeper = ParameterSweeper(metric_name="rank")
sweeper.parameters
OrderedDict([('learning_rate', [0.0001]), ('epochs', [15]), ('batch_size', [16]), ('im_size', [299]), ('architecture', [<Architecture.resnet18: functools.partial(<function resnet18 at 0x7f443ed99598>)>]), ('transform', [True]), ('dropout', [0.5]), ('weight_decay', [0.01]), ('training_schedule', [<TrainingSchedule.head_first_then_body: 'head_first_then_body'>]), ('discriminative_lr', [False]), ('one_cycle_policy', [True])])
Now that we know the defaults, we can pass it the parameters we want to test, and run the parameter sweep.
sweeper.update_parameters(learning_rate=LEARNING_RATES, im_size=IM_SIZES, epochs=EPOCHS, dropout=DROPOUTS)
df = sweeper.run(datasets=DATA_PATHS, reps=REPS, metric_fct=retrieval_rank);
df
this Learner object self-destroyed - it still exists, but no longer usable
| | | | duration | rank |
|---|---|---|---|---|
| 0 | PARAMETERS [learning_rate: 0.0001]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 300]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 27.527227 | 3.0 |
| | | fridgeObjectsWatermark | 29.327206 | 7.0 |
| | PARAMETERS [learning_rate: 0.0001]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 500]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 46.494100 | 7.0 |
| | | fridgeObjectsWatermark | 40.436677 | 9.0 |
| | PARAMETERS [learning_rate: 0.001]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 300]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 29.745073 | 2.0 |
| | | fridgeObjectsWatermark | 28.454158 | 1.0 |
| | PARAMETERS [learning_rate: 0.001]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 500]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 44.277393 | 1.0 |
| | | fridgeObjectsWatermark | 40.866518 | 1.0 |
| | PARAMETERS [learning_rate: 1e-05]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 300]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 28.009722 | 20.0 |
| | | fridgeObjectsWatermark | 29.721222 | 22.0 |
| | PARAMETERS [learning_rate: 1e-05]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 500]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 40.376158 | 25.0 |
| | | fridgeObjectsWatermark | 42.627545 | 34.0 |
| 1 | PARAMETERS [learning_rate: 0.0001]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 300]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 30.931857 | 4.0 |
| | | fridgeObjectsWatermark | 26.125927 | 8.5 |
| | PARAMETERS [learning_rate: 0.0001]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 500]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 46.117437 | 11.0 |
| | | fridgeObjectsWatermark | 40.555442 | 10.0 |
| | PARAMETERS [learning_rate: 0.001]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 300]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 29.870988 | 1.0 |
| | | fridgeObjectsWatermark | 25.864497 | 1.0 |
| | PARAMETERS [learning_rate: 0.001]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 500]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 46.807896 | 1.0 |
| | | fridgeObjectsWatermark | 41.351353 | 1.0 |
| | PARAMETERS [learning_rate: 1e-05]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 300]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 25.873023 | 26.0 |
| | | fridgeObjectsWatermark | 25.889981 | 26.0 |
| | PARAMETERS [learning_rate: 1e-05]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 500]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 41.558083 | 23.0 |
| | | fridgeObjectsWatermark | 41.196609 | 30.0 |
| 2 | PARAMETERS [learning_rate: 0.0001]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 300]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 25.954923 | 3.0 |
| | | fridgeObjectsWatermark | 25.766089 | 4.0 |
| | PARAMETERS [learning_rate: 0.0001]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 500]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 40.162561 | 9.0 |
| | | fridgeObjectsWatermark | 41.274331 | 9.0 |
| | PARAMETERS [learning_rate: 0.001]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 300]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 26.026493 | 3.0 |
| | | fridgeObjectsWatermark | 25.616917 | 1.0 |
| | PARAMETERS [learning_rate: 0.001]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 500]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 40.691592 | 1.0 |
| | | fridgeObjectsWatermark | 40.468641 | 1.0 |
| | PARAMETERS [learning_rate: 1e-05]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 300]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 26.239308 | 19.0 |
| | | fridgeObjectsWatermark | 26.347744 | 23.0 |
| | PARAMETERS [learning_rate: 1e-05]\|[epochs: 16]\|[batch_size: 16]\|[im_size: 500]\|[arch: resnet18]\|[transforms: True]\|[dropout: 0]\|[weight_decay: 0.01]\|[training_schedule: head_first_then_body]\|[discriminative_lr: False]\|[one_cycle_policy: True] | fridgeObjects | 41.121185 | 33.0 |
| | | fridgeObjectsWatermark | 40.931316 | 30.0 |
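The number of rows in the output follows directly from the sweep configuration. The sketch below is not the ParameterSweeper implementation; it assumes the sweeper enumerates the Cartesian product of the parameter lists and trains each permutation on every dataset for every repetition, which matches the rows shown. The DATASETS list and the runs variable are illustrative names.

```python
from itertools import product

# Values copied from the configuration cell of this notebook
LEARNING_RATES = [1e-3, 1e-4, 1e-5]
IM_SIZES = [300, 500]
EPOCHS = [16]
DROPOUTS = [0]
REPS = 3

# Illustrative stand-in for DATA_PATHS (dataset names only)
DATASETS = ["fridgeObjects", "fridgeObjectsWatermark"]

# Every combination of the swept parameter values
permutations = list(product(LEARNING_RATES, IM_SIZES, EPOCHS, DROPOUTS))

# One training run per (repetition, permutation, dataset) triple
runs = [
    (rep, params, dataset)
    for rep in range(REPS)
    for params in permutations
    for dataset in DATASETS
]

print(len(permutations), len(runs))
```

Because the run count grows multiplicatively, adding a second value to any one parameter list doubles the total training time.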
In the resulting multi-index dataframe, index level 0 represents the run number, level 1 represents a single permutation of parameters, and level 2 represents the dataset. To view the results, clean up the df using the clean_sweeper_df helper function. This displays the hyperparameters in a readable way.
df = clean_sweeper_df(df)
Since we've run our benchmarking over 3 repetitions, we may want to just look at the averages across the different run numbers.
df.mean(level=(1,2)).T
| | P: [learning_rate: 0.0001] [im_size: 300] | | P: [learning_rate: 0.0001] [im_size: 500] | | P: [learning_rate: 0.001] [im_size: 300] | | P: [learning_rate: 0.001] [im_size: 500] | | P: [learning_rate: 1e-05] [im_size: 300] | | P: [learning_rate: 1e-05] [im_size: 500] | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | fridgeObjects | fridgeObjectsWatermark | fridgeObjects | fridgeObjectsWatermark | fridgeObjects | fridgeObjectsWatermark | fridgeObjects | fridgeObjectsWatermark | fridgeObjects | fridgeObjectsWatermark | fridgeObjects | fridgeObjectsWatermark |
| duration | 28.138002 | 27.073074 | 44.258033 | 40.755483 | 28.547518 | 26.645191 | 43.925627 | 40.895504 | 26.707351 | 27.319649 | 41.018475 | 41.585157 |
| rank | 3.333333 | 6.500000 | 9.000000 | 9.333333 | 2.000000 | 1.000000 | 1.000000 | 1.000000 | 21.666667 | 23.666667 | 27.000000 | 31.333333 |
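The df.mean(level=...) form used in this notebook relies on the pandas version current at the time; the level argument of DataFrame.mean was deprecated in pandas 1.3 and removed in pandas 2.0. As a hedged sketch, the same averaging can be expressed with groupby(level=...). The toy frame below reuses the rank values of two permutations from the sweep output above; the index level names and the shortened parameter labels are illustrative, not what ParameterSweeper produces.

```python
import pandas as pd

# Three-level index: run number, parameter permutation, dataset
index = pd.MultiIndex.from_product(
    [
        [0, 1, 2],
        ["lr=1e-4, im=300", "lr=1e-3, im=300"],
        ["fridgeObjects", "fridgeObjectsWatermark"],
    ],
    names=["run", "params", "dataset"],  # illustrative names
)

# Rank values taken from the sweep output, ordered to match the index
ranks = [
    3.0, 7.0, 2.0, 1.0,  # run 0
    4.0, 8.5, 1.0, 1.0,  # run 1
    3.0, 4.0, 3.0, 1.0,  # run 2
]
toy = pd.DataFrame({"rank": ranks}, index=index)

# Average over runs, keeping permutation and dataset (levels 1 and 2)
per_dataset = toy.groupby(level=[1, 2]).mean()

# Average over runs and datasets, keeping only the permutation (level 1)
per_params = toy.groupby(level=1).mean()

print(per_dataset)
print(per_params)
```

Grouping by the levels to keep, rather than naming the levels to drop, means the same call works regardless of how many repetitions were run.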
Plot the average rank over the different runs for each dataset independently.
ax = df.mean(level=(1,2))["rank"].unstack().plot(kind='bar', figsize=(12, 6))
Additionally, we may simply want to see which set of hyperparameters performs best across the different datasets. We can do that by averaging the results of the different datasets.
df.mean(level=(1)).T
| | P: [learning_rate: 0.0001] [im_size: 300] | P: [learning_rate: 0.0001] [im_size: 500] | P: [learning_rate: 0.001] [im_size: 300] | P: [learning_rate: 0.001] [im_size: 500] | P: [learning_rate: 1e-05] [im_size: 300] | P: [learning_rate: 1e-05] [im_size: 500] |
|---|---|---|---|---|---|---|
| duration | 27.605538 | 42.506758 | 27.596354 | 42.410565 | 27.013500 | 41.301816 |
| rank | 4.916667 | 9.166667 | 1.500000 | 1.000000 | 22.666667 | 29.166667 |
To make it easier to see which permutation did the best, we can plot the results using the plot_sweeper_df helper function. This plot makes it easy to see which parameters yield the best (lowest) ranks.
plot_sweeper_df(df.mean(level=(1)), sort_by="rank")
# Preserve some of the notebook outputs
sb.glue("nr_elements", len(df))
sb.glue("ranks", list(df.mean(level=(1))["rank"]))
sb.glue("max_duration", df.max().duration)
sb.glue("min_duration", df.min().duration)