This notebook is Part 1 of the dataset enrichment notebook series where we utilize various zero-shot models to enrich datasets.
This notebook shows how you can enrich your image dataset using labels generated with open-source zero-shot image classification (or image tagging) models such as Recognize Anything (RAM) and Tag2Text.
By the end of the notebook, you'll know how to generate tags and captions for your images with RAM and Tag2Text, both in bulk through a DataFrame and on single images.
First, let's install the necessary packages:
🗒 Note - We highly recommend running this notebook in a CUDA-enabled environment to reduce the run time.
!pip install -Uq fastdup git+https://github.com/xinyu1205/recognize-anything.git@119a7ae42fb2ce75459cd9107b353bc508460023 gdown
Now, test the installation. If there's no error message, we are ready to go.
import fastdup
fastdup.__version__
'1.53'
Download the coco-minitrain dataset - a curated mini training set consisting of about 20% of the COCO 2017 training dataset. coco-minitrain consists of 25,000 images and their annotations.
!gdown --fuzzy https://drive.google.com/file/d/1iSXVTlkV1_DhdYpVDqsjlT4NJFQ7OkyK/view
!unzip -qq coco_minitrain_25k.zip
Within fastdup you can readily use the zero-shot image tagging models such as Recognize Anything Model (RAM) and Tag2Text.
Both Tag2Text and RAM exhibit strong recognition ability.
To run inference on the downloaded dataset, you first need to load the image paths into a DataFrame.
import pandas as pd
from fastdup.utils import get_images_from_path
fd = fastdup.create(input_dir='./coco_minitrain_25k')
filenames = get_images_from_path(fd.input_dir)
df = pd.DataFrame(filenames, columns=["filename"])
df
Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.
 | filename |
---|---|
0 | coco_minitrain_25k/images/val2017/000000382734.jpg |
1 | coco_minitrain_25k/images/val2017/000000508730.jpg |
2 | coco_minitrain_25k/images/val2017/000000202339.jpg |
3 | coco_minitrain_25k/images/val2017/000000460929.jpg |
4 | coco_minitrain_25k/images/val2017/000000181796.jpg |
... | ... |
29995 | coco_minitrain_25k/images/train2017/000000065630.jpg |
29996 | coco_minitrain_25k/images/train2017/000000062839.jpg |
29997 | coco_minitrain_25k/images/train2017/000000221911.jpg |
29998 | coco_minitrain_25k/images/train2017/000000292451.jpg |
29999 | coco_minitrain_25k/images/train2017/000000402948.jpg |
30000 rows × 1 columns
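As a quick sanity check, you can count how many images came from each COCO split by parsing the parent directory out of each path. This is a small sketch (not a fastdup API) using a few stand-in paths that mimic the coco_minitrain_25k layout shown above:

```python
import pandas as pd

# Stand-in paths mimicking the coco_minitrain_25k layout shown above
df_paths = pd.DataFrame({
    "filename": [
        "coco_minitrain_25k/images/val2017/000000382734.jpg",
        "coco_minitrain_25k/images/train2017/000000065630.jpg",
        "coco_minitrain_25k/images/train2017/000000062839.jpg",
    ]
})

# The split name (val2017 / train2017) is the second-to-last path component
df_paths["split"] = df_paths["filename"].str.split("/").str[-2]
counts = df_paths["split"].value_counts()
print(counts.to_dict())  # {'train2017': 2, 'val2017': 1}
```

On the real DataFrame above, the same two lines report the train2017/val2017 breakdown of all 30,000 images.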
fastdup provides a convenient API, fd.enrich, to enrich the metadata of the images loaded into a DataFrame.
NUM_ROWS_TO_ENRICH = 10 # for demonstration, only run on 10 rows.
df = fd.enrich(task='zero-shot-classification',
model='recognize-anything-model', # specify model
input_df=df, # the DataFrame of image files to enrich.
input_col='filename', # the name of the filename column.
num_rows=NUM_ROWS_TO_ENRICH # number of rows in the DataFrame to enrich. Optional.
)
INFO:fastdup.models.ram:Loading model checkpoint from - /home/dnth/ram_swin_large_14m.pth INFO:fastdup.models.ram:Model loaded to device - cuda
The above code loads the RAM model, runs inference on the images in the filename column, and creates a new column, ram_tags, that contains the generated labels.
df
 | filename | ram_tags |
---|---|---|
0 | coco_minitrain_25k/images/val2017/000000382734.jpg | bath . bathroom . doorway . drain . floor . glass door . room . screen door . shower . white |
1 | coco_minitrain_25k/images/val2017/000000508730.jpg | baby . bathroom . bathroom accessory . bin . boy . brush . chair . child . comb . diaper . hair . hairbrush . play . potty . sit . stool . tile wall . toddler . toilet bowl . toilet seat . toy |
2 | coco_minitrain_25k/images/val2017/000000202339.jpg | bus . bus station . business suit . carry . catch . city bus . pillar . man . shopping bag . sign . suit . tie . tour bus . walk |
3 | coco_minitrain_25k/images/val2017/000000460929.jpg | beer . beer bottle . beverage . blanket . bottle . roll . can . car . chili dog . condiment . table . dog . drink . foil . hot . hot dog . mustard . picnic table . sit . soda . tinfoil . tomato sauce . wrap |
4 | coco_minitrain_25k/images/val2017/000000181796.jpg | bean . cup . table . dinning table . plate . food . fork . fruit . wine . meal . meat . peak . platter . potato . silverware . utensil . vegetable . white . wine glass |
5 | coco_minitrain_25k/images/val2017/000000052565.jpg | beach . black . cattle . coast . cow . sea . photo . sand . shore . shoreline . stand . stare . water . white |
6 | coco_minitrain_25k/images/val2017/000000503755.jpg | catch . court . hand . goggles . necklace . play . racket . stand . sunglasses . tennis court . tennis player . tennis racket . wear . woman |
7 | coco_minitrain_25k/images/val2017/000000477955.jpg | attach . beach . catch . coast . fly . person . kite . man . sea . parachute . parasail . sand . sky . stand . string . surfboard . surfer . wetsuit |
8 | coco_minitrain_25k/images/val2017/000000562229.jpg | boy . child . ride . road . skateboard . skateboarder . stand . trick |
9 | coco_minitrain_25k/images/val2017/000000528862.jpg | animal . area . dirt field . enclosure . fence . field . giraffe . habitat . herd . lush . pen . savanna . stand . tree . walk . zoo |
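Since RAM returns the tags as a single " . "-separated string, it can be handy to split them into Python lists for downstream filtering. Here is a minimal sketch (split_tags is a hypothetical helper, not part of fastdup):

```python
def split_tags(tag_string):
    """Turn a ' . '-separated tag string into a clean list of tags."""
    return [t.strip() for t in tag_string.split(".") if t.strip()]

tags = split_tags("bath . bathroom . doorway . drain . floor")
print(tags)  # ['bath', 'bathroom', 'doorway', 'drain', 'floor']

# Applied to the enriched DataFrame above:
# df["ram_tag_list"] = df["ram_tags"].apply(split_tags)
```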
Similarly, we can run the fd.enrich API using the Tag2Text model.
df = fd.enrich(task='zero-shot-classification',
model='tag2text',
input_df=df,
input_col='filename'
)
INFO:fastdup.model.tag2text:Loading model checkpoint from - /home/dnth/tag2text_swin_14m.pth INFO:fastdup.model.tag2text:Model loaded to device - cuda
df
 | filename | ram_tags | tag2text_tags | tag2text_caption |
---|---|---|---|---|
0 | coco_minitrain_25k/images/val2017/000000382734.jpg | bath . bathroom . doorway . drain . floor . glass door . room . screen door . shower . white | room . floor . bathroom . shower . wall . toilet . green . white | a bathroom with green walls and a white toilet |
1 | coco_minitrain_25k/images/val2017/000000508730.jpg | baby . bathroom . bathroom accessory . bin . boy . brush . chair . child . comb . diaper . hair . hairbrush . play . potty . sit . stool . tile wall . toddler . toilet bowl . toilet seat . toy | hair . bathroom . brush . girl . boy . child . toddler . kid . couple . toilet . sit . play . sit in . sit on . small . young . little | a couple of small kids sitting on a toilet in a bathroom with a little girl playing with her brush and hair |
2 | coco_minitrain_25k/images/val2017/000000202339.jpg | bus . bus station . business suit . carry . catch . city bus . pillar . man . shopping bag . sign . suit . tie . tour bus . walk | bag . bus . suit . luggage . man . hold . carry . walk | a man in a suit carrying a bag of luggage |
3 | coco_minitrain_25k/images/val2017/000000460929.jpg | beer . beer bottle . beverage . blanket . bottle . roll . can . car . chili dog . condiment . table . dog . drink . foil . hot . hot dog . mustard . picnic table . sit . soda . tinfoil . tomato sauce . wrap | bun . table . mustard . dog . beer . bottle . hotdog . sit . wrap . hot | a hot dog sitting on top of a bun wrapped in foil next to a bottle of beer |
4 | coco_minitrain_25k/images/val2017/000000181796.jpg | bean . cup . table . dinning table . plate . food . fork . fruit . wine . meal . meat . peak . platter . potato . silverware . utensil . vegetable . white . wine glass | meal . bean . table . wine . vegetable . meat . plate . food . wine glass . glass . sit on . wooden . white | a white plate of food sitting on a wooden table next to glasses of wine |
5 | coco_minitrain_25k/images/val2017/000000052565.jpg | beach . black . cattle . coast . cow . sea . photo . sand . shore . shoreline . stand . stare . water . white | ocean . body . cow . water . sand . beach . shore . stand . black . sandy | a black and white cow standing on a sandy shore near a body of water |
6 | coco_minitrain_25k/images/val2017/000000503755.jpg | catch . court . hand . goggles . necklace . play . racket . stand . sunglasses . tennis court . tennis player . tennis racket . wear . woman | tennis . tennis racquet . sunglass . hand . woman . ball . racket . racquet . tennis player . tennis court . court . tennis racket . play . hold . wear . stand . white . female | a woman in sunglasses playing tennis on a court with a tennis racket in her hand and a tennis racquet in her hand |
7 | coco_minitrain_25k/images/val2017/000000477955.jpg | attach . beach . catch . coast . fly . person . kite . man . sea . parachute . parasail . sand . sky . stand . string . surfboard . surfer . wetsuit | surfboard . ocean . board . water . kite . beach . person . man . hold . fly . stand | a man holding a surfboard standing on a beach with a kite flying above the water |
8 | coco_minitrain_25k/images/val2017/000000562229.jpg | boy . child . ride . road . skateboard . skateboarder . stand . trick | boy . child . parking lot . street . skateboard . helmet . kid . ride . skateboard . stand . small . young . little | a young boy in a helmet stands on a skateboard in a parking lot while a small child rides in the background |
9 | coco_minitrain_25k/images/val2017/000000528862.jpg | animal . area . dirt field . enclosure . fence . field . giraffe . habitat . herd . lush . pen . savanna . stand . tree . walk . zoo | enclosure . zoo . field . giraffe . herd . fence . grass . area . tree . animal . stand . grassy . many . several | several giraffes stand in a grassy area with many trees and other animals |
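With the tag columns in place, you can filter the DataFrame by tag, for example to keep only images that RAM tagged as "beach". A small sketch, using a toy stand-in for the enriched DataFrame above:

```python
import pandas as pd

# Toy stand-in for the enriched DataFrame above
df_tagged = pd.DataFrame({
    "filename": ["a.jpg", "b.jpg", "c.jpg"],
    "ram_tags": [
        "beach . black . cattle . coast",
        "boy . child . ride . road",
        "attach . beach . catch . coast",
    ],
})

# Match 'beach' as a whole tag, not as a substring of another tag
mask = df_tagged["ram_tags"].str.split(" . ", regex=False).apply(lambda tags: "beach" in tags)
beach_images = df_tagged[mask]
print(beach_images["filename"].tolist())  # ['a.jpg', 'c.jpg']
```

Note the regex=False: the " . " separator contains a regex metacharacter, so the pattern must be treated as a literal string.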
Let's plot the images and labels to verify the label quality.
import matplotlib.pyplot as plt
from PIL import Image

for index, row in df.iterrows():
    filename = row['filename']
    ram_labels = row['ram_tags']
    tag2text_labels = row['tag2text_tags']
    tag2text_caption = row['tag2text_caption']

    # Read the image using PIL
    image = Image.open(filename)

    # Plot the image with the generated labels as the title
    plt.imshow(image)
    plt.title(f"Filename: {filename}\n\nRAM Tags - [{ram_labels}]\n\nTag2Text Tags - [{tag2text_labels}]\n\nTag2Text Caption - [{tag2text_caption}]\n", wrap=True)
    plt.axis('off')
    plt.show()
    plt.close()
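Once you're happy with the label quality, it's worth persisting the enriched DataFrame so you don't have to rerun inference later. A minimal sketch (the file name is arbitrary, and the toy DataFrame stands in for the enriched one above):

```python
import pandas as pd

# Toy stand-in for the enriched DataFrame
df_out = pd.DataFrame({
    "filename": ["a.jpg"],
    "ram_tags": ["bath . bathroom"],
})

df_out.to_csv("enriched_metadata.csv", index=False)

# Reload later without rerunning the models
restored = pd.read_csv("enriched_metadata.csv")
print(restored.equals(df_out))  # True
```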
If you'd like more granular control, you can run inference on a single image.
Let's suppose you'd like to run inference on the following image.
from IPython.display import Image
Image("coco_minitrain_25k/images/val2017/000000181796.jpg")
You can just import RecognizeAnythingModel and invoke the run_inference method.
from fastdup.models_ram import RecognizeAnythingModel
model = RecognizeAnythingModel()
result = model.run_inference("coco_minitrain_25k/images/val2017/000000181796.jpg")
INFO:fastdup.models.ram:Loading model checkpoint from - /home/dnth/ram_swin_large_14m.pth INFO:fastdup.models.ram:Model loaded to device - cuda
result
'bean . cup . table . dinning table . plate . food . fork . fruit . wine . meal . meat . peak . platter . potato . silverware . utensil . vegetable . white . wine glass'
Similarly, we can do the same with Tag2TextModel, which returns tags and a caption.
from fastdup.models_tag2text import Tag2TextModel
model = Tag2TextModel()
result = model.run_inference("coco_minitrain_25k/images/val2017/000000181796.jpg")
INFO:fastdup.model.tag2text:Loading model checkpoint from - /home/dnth/tag2text_swin_14m.pth INFO:fastdup.model.tag2text:Model loaded to device - cuda
result
('meal | bean | table | wine | vegetable | meat | plate | food | wine glass | glass | sit on | wooden | white', None, 'a white plate of food sitting on a wooden table next to glasses of wine')
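As the output above shows, Tag2Text returns a 3-tuple of (tags, None, caption), so you can unpack it directly and split the "|"-separated tags into a list. A small sketch using the result shown above:

```python
# Example output shape from Tag2TextModel.run_inference, as shown above
result = (
    "meal | bean | table | wine",
    None,
    "a white plate of food sitting on a wooden table next to glasses of wine",
)

tags, _, caption = result
tag_list = [t.strip() for t in tags.split("|")]
print(tag_list)  # ['meal', 'bean', 'table', 'wine']
print(caption)
```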
In this tutorial, we showed how you can run zero-shot image classification (or image tagging) models to enrich your dataset.
Please check out Part 2 of the series where we explore how to generate bounding boxes from the tags using zero-shot detection models like Grounding DINO. See you there!
Next, feel free to check out our other tutorials.
If you prefer a no-code platform to inspect and visualize your dataset, try our free cloud product, VL Profiler - our first no-code commercial product that lets you visualize and inspect your dataset in your browser. VL Profiler is free to get started - upload up to 1,000,000 images for analysis at zero cost! Sign up now.
As usual, feedback is welcome! Questions? Drop by our Slack channel or open an issue on GitHub.