This notebook is Part 1 of the dataset enrichment notebook series where we utilize various zero-shot models to enrich datasets.
This notebook shows how you can enrich your image dataset using labels generated with open-source zero-shot image classification (or image tagging) models such as Recognize Anything (RAM) and Tag2Text.
By the end of the notebook, you'll know how to generate tags and captions for your images with RAM and Tag2Text, both in bulk through a DataFrame and on single images.
First, let's install the necessary packages:
🗒 Note - We highly recommend running this notebook in a CUDA-enabled environment to reduce the run time.
!pip install -Uq fastdup git+https://github.com/xinyu1205/recognize-anything.git@119a7ae42fb2ce75459cd9107b353bc508460023 gdown
Now, test the installation. If there's no error message, we are ready to go.
import fastdup
fastdup.__version__
'1.53'
Download the coco-minitrain dataset - a curated mini training set consisting of about 20% of the COCO 2017 training dataset. coco-minitrain consists of 25,000 images and their annotations.
!gdown --fuzzy https://drive.google.com/file/d/1iSXVTlkV1_DhdYpVDqsjlT4NJFQ7OkyK/view
!unzip -qq coco_minitrain_25k.zip
Within fastdup you can readily use the zero-shot image tagging models such as Recognize Anything Model (RAM) and Tag2Text.
Both Tag2Text and RAM exhibit strong recognition ability.
To run inference on the downloaded dataset, you first need to load the image paths into a DataFrame.
import pandas as pd
from fastdup.utils import get_images_from_path
fd = fastdup.create(input_dir='./coco_minitrain_25k')
filenames = get_images_from_path(fd.input_dir)
df = pd.DataFrame(filenames, columns=["filename"])
df
Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.
 | filename |
---|---|
0 | coco_minitrain_25k/images/val2017/000000382734.jpg |
1 | coco_minitrain_25k/images/val2017/000000508730.jpg |
2 | coco_minitrain_25k/images/val2017/000000202339.jpg |
3 | coco_minitrain_25k/images/val2017/000000460929.jpg |
4 | coco_minitrain_25k/images/val2017/000000181796.jpg |
... | ... |
29995 | coco_minitrain_25k/images/train2017/000000065630.jpg |
29996 | coco_minitrain_25k/images/train2017/000000062839.jpg |
29997 | coco_minitrain_25k/images/train2017/000000221911.jpg |
29998 | coco_minitrain_25k/images/train2017/000000292451.jpg |
29999 | coco_minitrain_25k/images/train2017/000000402948.jpg |
30000 rows × 1 columns
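As a quick sanity check, you can count how many images came from each COCO split by parsing the parent directory out of each path. This is a small sketch (not a fastdup API) using a few stand-in paths that mimic the coco_minitrain_25k layout shown above:

```python
import pandas as pd

# Stand-in paths mimicking the coco_minitrain_25k layout shown above
df_paths = pd.DataFrame({
    "filename": [
        "coco_minitrain_25k/images/val2017/000000382734.jpg",
        "coco_minitrain_25k/images/train2017/000000065630.jpg",
        "coco_minitrain_25k/images/train2017/000000062839.jpg",
    ]
})

# The split name (val2017 / train2017) is the second-to-last path component
df_paths["split"] = df_paths["filename"].str.split("/").str[-2]
counts = df_paths["split"].value_counts()
print(counts.to_dict())  # {'train2017': 2, 'val2017': 1}
```

On the real DataFrame above, the same two lines report the train2017/val2017 breakdown of all 30,000 images.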
fastdup provides a convenient API, fd.enrich, to enrich the metadata of the images loaded into a DataFrame.
NUM_ROWS_TO_ENRICH = 10 # for demonstration, only run on 10 rows.
df = fd.enrich(task='zero-shot-classification',
model='recognize-anything-model', # specify model
input_df=df, # the DataFrame of image files to enrich.
input_col='filename', # the name of the filename column.
num_rows=NUM_ROWS_TO_ENRICH # number of rows in the DataFrame to enrich. Optional.
)
INFO:fastdup.models.ram:Loading model checkpoint from - /home/dnth/ram_swin_large_14m.pth INFO:fastdup.models.ram:Model loaded to device - cuda
The above code loads the RAM model, runs inference on the images in the filename column, and creates a new column, ram_tags, that contains the generated labels.
df
 | filename | ram_tags |
---|---|---|
0 | coco_minitrain_25k/images/val2017/000000382734.jpg | bath . bathroom . doorway . drain . floor . glass door . room . screen door . shower . white |
1 | coco_minitrain_25k/images/val2017/000000508730.jpg | baby . bathroom . bathroom accessory . bin . boy . brush . chair . child . comb . diaper . hair . hairbrush . play . potty . sit . stool . tile wall . toddler . toilet bowl . toilet seat . toy |
2 | coco_minitrain_25k/images/val2017/000000202339.jpg | bus . bus station . business suit . carry . catch . city bus . pillar . man . shopping bag . sign . suit . tie . tour bus . walk |
3 | coco_minitrain_25k/images/val2017/000000460929.jpg | beer . beer bottle . beverage . blanket . bottle . roll . can . car . chili dog . condiment . table . dog . drink . foil . hot . hot dog . mustard . picnic table . sit . soda . tinfoil . tomato sauce . wrap |
4 | coco_minitrain_25k/images/val2017/000000181796.jpg | bean . cup . table . dinning table . plate . food . fork . fruit . wine . meal . meat . peak . platter . potato . silverware . utensil . vegetable . white . wine glass |
5 | coco_minitrain_25k/images/val2017/000000052565.jpg | beach . black . cattle . coast . cow . sea . photo . sand . shore . shoreline . stand . stare . water . white |
6 | coco_minitrain_25k/images/val2017/000000503755.jpg | catch . court . hand . goggles . necklace . play . racket . stand . sunglasses . tennis court . tennis player . tennis racket . wear . woman |
7 | coco_minitrain_25k/images/val2017/000000477955.jpg | attach . beach . catch . coast . fly . person . kite . man . sea . parachute . parasail . sand . sky . stand . string . surfboard . surfer . wetsuit |
8 | coco_minitrain_25k/images/val2017/000000562229.jpg | boy . child . ride . road . skateboard . skateboarder . stand . trick |
9 | coco_minitrain_25k/images/val2017/000000528862.jpg | animal . area . dirt field . enclosure . fence . field . giraffe . habitat . herd . lush . pen . savanna . stand . tree . walk . zoo |
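Since RAM returns the tags as a single " . "-separated string, it can be handy to split them into Python lists for downstream filtering. Here is a minimal sketch (split_tags is a hypothetical helper, not part of fastdup):

```python
def split_tags(tag_string):
    """Turn a ' . '-separated tag string into a clean list of tags."""
    return [t.strip() for t in tag_string.split(".") if t.strip()]

tags = split_tags("bath . bathroom . doorway . drain . floor")
print(tags)  # ['bath', 'bathroom', 'doorway', 'drain', 'floor']

# Applied to the enriched DataFrame above:
# df["ram_tag_list"] = df["ram_tags"].apply(split_tags)
```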
Similarly, we can run the fd.enrich API using the Tag2Text model.
df = fd.enrich(task='zero-shot-classification',
model='tag2text',
input_df=df,
input_col='filename'
)
INFO:fastdup.model.tag2text:Loading model checkpoint from - /home/dnth/tag2text_swin_14m.pth INFO:fastdup.model.tag2text:Model loaded to device - cuda
df
 | filename | ram_tags | tag2text_tags | tag2text_caption |
---|---|---|---|---|
0 | coco_minitrain_25k/images/val2017/000000382734.jpg | bath . bathroom . doorway . drain . floor . glass door . room . screen door . shower . white | room . floor . bathroom . shower . wall . toilet . green . white | a bathroom with green walls and a white toilet |
1 | coco_minitrain_25k/images/val2017/000000508730.jpg | baby . bathroom . bathroom accessory . bin . boy . brush . chair . child . comb . diaper . hair . hairbrush . play . potty . sit . stool . tile wall . toddler . toilet bowl . toilet seat . toy | hair . bathroom . brush . girl . boy . child . toddler . kid . couple . toilet . sit . play . sit in . sit on . small . young . little | a couple of small kids sitting on a toilet in a bathroom with a little girl playing with her brush and hair |
2 | coco_minitrain_25k/images/val2017/000000202339.jpg | bus . bus station . business suit . carry . catch . city bus . pillar . man . shopping bag . sign . suit . tie . tour bus . walk | bag . bus . suit . luggage . man . hold . carry . walk | a man in a suit carrying a bag of luggage |
3 | coco_minitrain_25k/images/val2017/000000460929.jpg | beer . beer bottle . beverage . blanket . bottle . roll . can . car . chili dog . condiment . table . dog . drink . foil . hot . hot dog . mustard . picnic table . sit . soda . tinfoil . tomato sauce . wrap | bun . table . mustard . dog . beer . bottle . hotdog . sit . wrap . hot | a hot dog sitting on top of a bun wrapped in foil next to a bottle of beer |
4 | coco_minitrain_25k/images/val2017/000000181796.jpg | bean . cup . table . dinning table . plate . food . fork . fruit . wine . meal . meat . peak . platter . potato . silverware . utensil . vegetable . white . wine glass | meal . bean . table . wine . vegetable . meat . plate . food . wine glass . glass . sit on . wooden . white | a white plate of food sitting on a wooden table next to glasses of wine |
5 | coco_minitrain_25k/images/val2017/000000052565.jpg | beach . black . cattle . coast . cow . sea . photo . sand . shore . shoreline . stand . stare . water . white | ocean . body . cow . water . sand . beach . shore . stand . black . sandy | a black and white cow standing on a sandy shore near a body of water |
6 | coco_minitrain_25k/images/val2017/000000503755.jpg | catch . court . hand . goggles . necklace . play . racket . stand . sunglasses . tennis court . tennis player . tennis racket . wear . woman | tennis . tennis racquet . sunglass . hand . woman . ball . racket . racquet . tennis player . tennis court . court . tennis racket . play . hold . wear . stand . white . female | a woman in sunglasses playing tennis on a court with a tennis racket in her hand and a tennis racquet in her hand |
7 | coco_minitrain_25k/images/val2017/000000477955.jpg | attach . beach . catch . coast . fly . person . kite . man . sea . parachute . parasail . sand . sky . stand . string . surfboard . surfer . wetsuit | surfboard . ocean . board . water . kite . beach . person . man . hold . fly . stand | a man holding a surfboard standing on a beach with a kite flying above the water |
8 | coco_minitrain_25k/images/val2017/000000562229.jpg | boy . child . ride . road . skateboard . skateboarder . stand . trick | boy . child . parking lot . street . skateboard . helmet . kid . ride . skateboard . stand . small . young . little | a young boy in a helmet stands on a skateboard in a parking lot while a small child rides in the background |
9 | coco_minitrain_25k/images/val2017/000000528862.jpg | animal . area . dirt field . enclosure . fence . field . giraffe . habitat . herd . lush . pen . savanna . stand . tree . walk . zoo | enclosure . zoo . field . giraffe . herd . fence . grass . area . tree . animal . stand . grassy . many . several | several giraffes stand in a grassy area with many trees and other animals |
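With the tag columns in place, you can filter the DataFrame by tag, for example to keep only images that RAM tagged as "beach". A small sketch, using a toy stand-in for the enriched DataFrame above:

```python
import pandas as pd

# Toy stand-in for the enriched DataFrame above
df_tagged = pd.DataFrame({
    "filename": ["a.jpg", "b.jpg", "c.jpg"],
    "ram_tags": [
        "beach . black . cattle . coast",
        "boy . child . ride . road",
        "attach . beach . catch . coast",
    ],
})

# Match 'beach' as a whole tag, not as a substring of another tag
mask = df_tagged["ram_tags"].str.split(" . ", regex=False).apply(lambda tags: "beach" in tags)
beach_images = df_tagged[mask]
print(beach_images["filename"].tolist())  # ['a.jpg', 'c.jpg']
```

Note the regex=False: the " . " separator contains a regex metacharacter, so the pattern must be treated as a literal string.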
Let's plot the images and labels to verify the label quality.
import matplotlib.pyplot as plt
from PIL import Image

for index, row in df.iterrows():
    filename = row['filename']
    ram_labels = row['ram_tags']
    tag2text_labels = row['tag2text_tags']
    tag2text_caption = row['tag2text_caption']

    # Read the image using PIL
    image = Image.open(filename)

    # Plot the image with the generated labels as the title
    plt.imshow(image)
    plt.title(f"Filename: {filename}\n\nRAM Tags - [{ram_labels}]\n\nTag2Text Tags - [{tag2text_labels}]\n\nTag2Text Caption - [{tag2text_caption}]\n", wrap=True)
    plt.axis('off')
    plt.show()
    plt.close()
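Once you're happy with the label quality, it's worth persisting the enriched DataFrame so you don't have to rerun inference later. A minimal sketch (the file name is arbitrary, and the toy DataFrame stands in for the enriched one above):

```python
import pandas as pd

# Toy stand-in for the enriched DataFrame
df_out = pd.DataFrame({
    "filename": ["a.jpg"],
    "ram_tags": ["bath . bathroom"],
})

df_out.to_csv("enriched_metadata.csv", index=False)

# Reload later without rerunning the models
restored = pd.read_csv("enriched_metadata.csv")
print(restored.equals(df_out))  # True
```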
If you'd like more granular control, you can run inference on a single image.
Let's suppose you'd like to run inference on the following image.
from IPython.display import Image
Image("coco_minitrain_25k/images/val2017/000000181796.jpg")
You can just import RecognizeAnythingModel and invoke the run_inference method.
from fastdup.models_ram import RecognizeAnythingModel
model = RecognizeAnythingModel()
result = model.run_inference("coco_minitrain_25k/images/val2017/000000181796.jpg")
INFO:fastdup.models.ram:Loading model checkpoint from - /home/dnth/ram_swin_large_14m.pth INFO:fastdup.models.ram:Model loaded to device - cuda
result
'bean . cup . table . dinning table . plate . food . fork . fruit . wine . meal . meat . peak . platter . potato . silverware . utensil . vegetable . white . wine glass'
Similarly, we can do the same with Tag2TextModel, which returns tags and a caption.
from fastdup.models_tag2text import Tag2TextModel
model = Tag2TextModel()
result = model.run_inference("coco_minitrain_25k/images/val2017/000000181796.jpg")
INFO:fastdup.model.tag2text:Loading model checkpoint from - /home/dnth/tag2text_swin_14m.pth INFO:fastdup.model.tag2text:Model loaded to device - cuda
result
('meal | bean | table | wine | vegetable | meat | plate | food | wine glass | glass | sit on | wooden | white', None, 'a white plate of food sitting on a wooden table next to glasses of wine')
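As the output above shows, Tag2Text returns a 3-tuple of (tags, None, caption), so you can unpack it directly and split the "|"-separated tags into a list. A small sketch using the result shown above:

```python
# Example output shape from Tag2TextModel.run_inference, as shown above
result = (
    "meal | bean | table | wine",
    None,
    "a white plate of food sitting on a wooden table next to glasses of wine",
)

tags, _, caption = result
tag_list = [t.strip() for t in tags.split("|")]
print(tag_list)  # ['meal', 'bean', 'table', 'wine']
print(caption)
```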
In this tutorial, we showed how you can run zero-shot image classification (or image tagging) models to enrich your dataset.
Please check out Part 2 of the series where we explore how to generate bounding boxes from the tags using zero-shot detection models like Grounding DINO. See you there!
Next, feel free to check out our other tutorials.
If you prefer a no-code platform to inspect and visualize your dataset, try our free cloud product, VL Profiler - our first no-code commercial product that lets you visualize and inspect your dataset in your browser. VL Profiler is free to get started - upload up to 1,000,000 images for analysis at zero cost! Sign up now.
As usual, feedback is welcome! Questions? Drop by our Slack channel or open an issue on GitHub.