TL;DR: We introduce a dataset, Noisy Imagenette, which is a version of the Imagenette dataset with noisy labels. We hope this dataset is useful for rapid experimentation and testing of methods to address noisy label training.
Deep learning has led to impressive results on datasets of all types, but its success is most apparent when models are trained on large datasets with human-annotated labels (extreme example: GPT-3 and more recently CLIP/ALIGN/DALL-E). A major challenge when constructing these datasets is obtaining enough labels to train a neural network model. There is an inherent tradeoff between the quality of the annotations and the cost of annotation (in the form of time or money). For example, while sources like Amazon Mechanical Turk provide cheap labeling, these non-expert labeling services often produce unreliable labels. Such labels are referred to as noisy labels, since they do not necessarily match the ground truth. Unfortunately, neural networks are known to be susceptible to overfitting to noisy labels (see here), which means alternative approaches are needed to achieve good generalization in the presence of noisy labels.
Recently, many techniques have been proposed to address label noise. These include novel loss functions like Bi-Tempered Logistic Loss, Taylor Cross Entropy Loss, and Symmetric Cross Entropy. Additionally, many novel training techniques have recently been developed, such as MentorMix, DivideMix, Early-Learning Regularization, and Noise-Robust Contrastive Learning.
Most of these papers use MNIST, SVHN, CIFAR10, or related datasets with synthetically added noise. Other common datasets are WebVision and Clothing1M, which are large-scale, real-world noisy datasets with millions of images. There is therefore an opportunity for a mid-scale dataset that allows rapid prototyping but is complex enough to give useful results for noisy label training.
The idea of mid-scale datasets for rapid prototyping has been explored in the past. For example, in 2019, fast.ai released the Imagenette and Imagewoof datasets (subsequently updated in 2020), subsets of ImageNet for rapid experimentation and prototyping. Imagenette can serve as a small proxy for ImageNet, or as a dataset with more complexity than MNIST or CIFAR10 that is still small and simple enough for benchmarking and rapid experimentation. It has been used to test and establish new training techniques like the Mish activation function and the Ranger optimizer (see here). The dataset has also been used in various papers (see here, here, here, here, here, and here). Clearly, it has been quite useful to machine learning researchers and practitioners for testing and comparing new methods. We believe that an analogous dataset could be useful to researchers with modest compute for testing and comparing new methods for addressing label noise.
We introduce Noisy Imagenette, a version of Imagenette (and Imagewoof) with synthetically noisy labels at different levels: 1%, 5%, 25%, and 50% incorrect labels. The noisy labels already ship with the Imagenette dataset, so downloading Imagenette is all that is needed:
```python
from fastai.vision.all import *
source = untar_data(URLs.IMAGENETTE)
```
While the regular labels for the Imagenette dataset are given by the names of the image folders, the noisy labels are provided in a separate CSV file, with one column for the image path and one label column for each noise level:
```python
csv_file = pd.read_csv(source/'noisy_imagenette.csv')
csv_file.head()
```
| | path | noisy_labels_0 | noisy_labels_1 | noisy_labels_5 | noisy_labels_25 | noisy_labels_50 | is_valid |
|---|---|---|---|---|---|---|---|
| 0 | train/n02979186/n02979186_9036.JPEG | n02979186 | n02979186 | n02979186 | n02979186 | n02979186 | False |
| 1 | train/n02979186/n02979186_11957.JPEG | n02979186 | n02979186 | n02979186 | n02979186 | n03000684 | False |
| 2 | train/n02979186/n02979186_9715.JPEG | n02979186 | n02979186 | n02979186 | n03417042 | n03000684 | False |
| 3 | train/n02979186/n02979186_21736.JPEG | n02979186 | n02979186 | n02979186 | n02979186 | n03417042 | False |
| 4 | train/n02979186/ILSVRC2012_val_00046953.JPEG | n02979186 | n02979186 | n02979186 | n02979186 | n03394916 | False |
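As a quick sanity check on this layout, the observed noise rate of each column can be estimated by comparing it against `noisy_labels_0`, which appears to hold the clean labels in the rows shown above. The following is a minimal sketch using only the standard library. The helper name `observed_noise_rates` is hypothetical, and it assumes that `is_valid` is stored as the strings `True`/`False` in the CSV. The demo runs only on the five rows printed above, so the rates it prints say nothing about the full dataset.

```python
import csv
import io

NOISE_LEVELS = (1, 5, 25, 50)

def observed_noise_rates(rows, levels=NOISE_LEVELS):
    """Fraction of training rows whose noisy label differs from noisy_labels_0.

    `rows` is an iterable of dicts keyed by the CSV column names.
    Assumption: is_valid is the string "True" or "False".
    """
    train_rows = [r for r in rows if r["is_valid"] == "False"]
    if not train_rows:
        return {level: 0.0 for level in levels}
    rates = {}
    for level in levels:
        column = f"noisy_labels_{level}"
        flipped = sum(r[column] != r["noisy_labels_0"] for r in train_rows)
        rates[level] = flipped / len(train_rows)
    return rates

# The five rows shown in the table above, written out as CSV text.
SAMPLE_CSV = """path,noisy_labels_0,noisy_labels_1,noisy_labels_5,noisy_labels_25,noisy_labels_50,is_valid
train/n02979186/n02979186_9036.JPEG,n02979186,n02979186,n02979186,n02979186,n02979186,False
train/n02979186/n02979186_11957.JPEG,n02979186,n02979186,n02979186,n02979186,n03000684,False
train/n02979186/n02979186_9715.JPEG,n02979186,n02979186,n02979186,n03417042,n03000684,False
train/n02979186/n02979186_21736.JPEG,n02979186,n02979186,n02979186,n02979186,n03417042,False
train/n02979186/ILSVRC2012_val_00046953.JPEG,n02979186,n02979186,n02979186,n02979186,n03394916,False
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
sample_rates = observed_noise_rates(sample_rows)
print(sample_rates)
```

To run the same check on the real file, the rows can be read from the downloaded `noisy_imagenette.csv` with `csv.DictReader` instead of from `SAMPLE_CSV`.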
The code that generates these noisy labels is provided in this Jupyter notebook. We have also updated fastai's train_imagenette.py to use the new noisy labels. If you want to train on the Noisy Imagenette dataset using this script, simply pass the --pct-noise argument with the desired noise level.
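For intuition about how synthetic label noise of this kind can be produced, here is a minimal sketch of symmetric (uniform) label flipping. It is an illustration only and not the code from the notebook: the function name `corrupt_labels`, the fixed seed, and the choice to flip an exact fraction of labels to a different class are all assumptions made for this example. The class IDs are taken from the table above.

```python
import random

def corrupt_labels(labels, classes, pct_noise, seed=42):
    """Return a copy of `labels` with `pct_noise` percent replaced by a wrong class.

    Hypothetical helper: flips an exact number of randomly chosen labels,
    each to a class drawn uniformly from the other classes.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(len(noisy) * pct_noise / 100)
    for i in rng.sample(range(len(noisy)), n_flip):
        # Exclude the current label so every flipped label is actually incorrect.
        wrong_classes = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(wrong_classes)
    return noisy

classes = ["n02979186", "n03000684", "n03417042", "n03394916"]
clean = [classes[i % len(classes)] for i in range(200)]
noisy = corrupt_labels(clean, classes, pct_noise=25)

n_changed = sum(a != b for a, b in zip(clean, noisy))
print(n_changed / len(clean))  # fraction of labels that were changed
```

Because the replacement class always differs from the original one, the fraction of changed labels matches the requested noise level up to rounding. A scheme that samples the new label from all classes, including the original, would produce a lower effective noise rate than the nominal one.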
:::{.callout-note}
The validation set remains clean: its labels are not changed. While the accuracy metric is technically robust to noise, I believe it's simpler to use a clean validation set to clearly see whether a model is learning appropriate decision boundaries on the ground truth.
:::
For the original Imagenette dataset, there are technically (3 image sizes)*(4 epoch settings) leaderboards for each of Imagenette and Imagewoof, giving a total of 24 leaderboards. If we kept each of these 24 leaderboards for all 4 noise levels mentioned above (1%, 5%, 25%, 50%), that would give us 96 leaderboards! Instead, the Imagenette repository only maintains leaderboards for Noisy Imagenette (and not Imagewoof) at 5% and 50% noise (24 leaderboards). Just like with the regular Imagenette leaderboards, feel free to send a pull request to the Imagenette repository with your results if they beat the current top score. I have provided a baseline, which is currently on the leaderboards, as well as a CSV file with the baseline accuracy for all 96 leaderboards.
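The leaderboard counts above are simple products, which can be spelled out as a short check. The category counts come straight from the paragraph above, and the variable names are only for illustration.

```python
n_sizes = 3         # image sizes
n_epochs = 4        # epoch settings
n_datasets = 2      # Imagenette and Imagewoof
n_noise_levels = 4  # 1%, 5%, 25%, 50%

# Leaderboards for the original (clean) datasets.
original = n_sizes * n_epochs * n_datasets

# Leaderboards if every noise level were tracked for both datasets.
all_noisy = original * n_noise_levels

# Leaderboards actually maintained: Imagenette only, at 5% and 50% noise.
maintained = n_sizes * n_epochs * 1 * 2

print(original, all_noisy, maintained)  # 24 96 24
```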
For some background, I started looking into training with label noise because of the recent Cassava Leaf Disease Kaggle Competition (my team was able to win a silver medal, see here), which had a really noisy dataset. One of the recent techniques I heard about was SAM, which recently achieved a state-of-the-art score on ImageNet (only to be beaten within a few weeks by techniques/models like Meta Pseudo Labels and NFNets). However, the paper also reported improvements for noisy label training. I had a fastai implementation of the SAM optimizer in progress (I will probably describe it in an upcoming blog post) and I wanted to test its noisy label training capabilities on a dataset. I thought about corrupting the Imagenette labels and using that as my dataset for testing SAM. Jeremy Howard suggested adding it to the main Imagenette dataset, and here we are!
In conclusion, I hope that the Noisy Imagenette dataset serves as a useful benchmark for the machine learning community when it comes to testing and comparing techniques for training on noisy labels. I hope to experiment with some of these techniques, like SAM and the different loss functions, and record the results on this blog, so be sure to keep an eye on it, or follow me on Twitter to get the latest updates!
I'd like to thank Jeremy Howard and especially Hamel Husain for adding the Noisy Imagenette dataset. I also would like to thank Hamel Husain for reviewing my blog post and providing feedback. I'd like to thank Isaac Flath for pointing out an error I originally had when generating the dataset.