#!/usr/bin/env python
# coding: utf-8
# # Dataset Enrichment with Zero-Shot Detection Models
# 
# [![Open in Colab](https://img.shields.io/badge/Open%20in%20Colab-blue?style=for-the-badge&logo=&labelColor=gray)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/enrichment-zero-shot-detection.ipynb)
# [![Kaggle](https://img.shields.io/badge/Open%20in%20Kaggle-blue?style=for-the-badge&logo=&labelColor=gray)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/enrichment-zero-shot-detection.ipynb)
# [![Explore the Docs](https://img.shields.io/badge/Explore%20the%20Docs-blue?style=for-the-badge&labelColor=gray&logo=read-the-docs)](https://visual-layer.readme.io/docs/enrichment-zero-shot-detection)
# 
# This notebook is Part 2 of the enrichment notebook series, where we use various zero-shot models to enrich the metadata of an existing dataset.
# 
# + [Part 1](https://visual-layer.readme.io/docs/enrichment-zero-shot-classification) - Dataset Enrichment with Zero-Shot Classification Models
# + [Part 2](https://visual-layer.readme.io/docs/enrichment-zero-shot-detection) - Dataset Enrichment with Zero-Shot Detection Models
# + [Part 3](https://visual-layer.readme.io/docs/enrichment-zero-shot-segmentation) - Dataset Enrichment with Zero-Shot Segmentation Models
# 
# If you haven't checked out [Part 1](https://github.com/visual-layer/fastdup/blob/main/examples/enrichment-zero-shot-classification.ipynb) yet, we highly encourage you to go through it before proceeding with this notebook.
# 
# In this notebook, we show an end-to-end example of how to enrich the metadata of your visual dataset using open-source zero-shot models such as [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO), building on the output we obtained in Part 1.
# 
# By the end of this notebook, you'll learn how to:
# 
# + Install and load Grounding DINO in fastdup.
# + Enrich your dataset using bounding boxes and labels generated by the Grounding DINO model.
# + Run inference using Grounding DINO on a single image.
# + Specify custom prompts to search for objects of interest in your dataset.
# + Export the enriched dataset into COCO `.json` format.

# ## Installation
# 
# First, let's install the necessary packages:
# 
# - [fastdup](https://github.com/visual-layer/fastdup) - To analyze issues in the dataset.
# - [MMEngine](https://github.com/open-mmlab/mmengine), [MMDetection](https://github.com/open-mmlab/mmdetection), [groundingdino-py](https://github.com/IDEA-Research/GroundingDINO) - To use the Grounding DINO and MMDetection models.
# - [gdown](https://github.com/wkentaro/gdown) - To download demo data hosted on Google Drive.
# 
# > 🗒 **Note** - We highly recommend running this notebook in a CUDA-enabled environment to reduce the run time.

# In[ ]:


get_ipython().system('pip install -Uq fastdup mmengine mmdet groundingdino-py gdown')


# Now, test the installation. If there's no error message, we are ready to go.

# In[1]:


import fastdup
fastdup.__version__


# ## Download Dataset

# Download the [coco-minitrain](https://github.com/giddyyupp/coco-minitrain) dataset - a curated mini training set consisting of 20% of the COCO 2017 training dataset. The coco-minitrain dataset consists of 25,000 images and annotations.

# In[ ]:


get_ipython().system('gdown --fuzzy https://drive.google.com/file/d/1iSXVTlkV1_DhdYpVDqsjlT4NJFQ7OkyK/view')
get_ipython().system('unzip -qq coco_minitrain_25k.zip')


# ## Zero-Shot Detection with Grounding DINO
# 
# Apart from zero-shot recognition models, fastdup also supports zero-shot detection models like [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO) (and more to come).
# 
# Grounding DINO is a powerful zero-shot detection model. It accepts an image-text pair as input and outputs bounding boxes, each with a matching text label and a confidence score.

# ### 1. Inference on a bulk of images
# 
# In [Part 1](https://github.com/visual-layer/fastdup/blob/main/examples/enrichment-zero-shot-classification.ipynb) of the enrichment notebook series, we used zero-shot image tagging models such as the Recognize Anything Model (RAM) and ran inference over the images in our dataset.
# 
# We ended up with a DataFrame consisting of the `filename` and `ram_tags` columns, as follows.

# In[2]:


import pandas as pd

# DataFrame we got from Part 1
data = {
    'filename': [
        'coco_minitrain_25k/images/val2017/000000382734.jpg',
        'coco_minitrain_25k/images/val2017/000000508730.jpg',
        'coco_minitrain_25k/images/val2017/000000202339.jpg',
        'coco_minitrain_25k/images/val2017/000000460929.jpg',
        'coco_minitrain_25k/images/val2017/000000181796.jpg',
        'coco_minitrain_25k/images/val2017/000000052565.jpg',
        'coco_minitrain_25k/images/val2017/000000503755.jpg',
        'coco_minitrain_25k/images/val2017/000000477955.jpg',
        'coco_minitrain_25k/images/val2017/000000562229.jpg',
        'coco_minitrain_25k/images/val2017/000000528862.jpg',
    ],
    'ram_tags': [
        'bath . bathroom . doorway . drain . floor . glass door . room . screen door . shower . white',
        'baby . bathroom . bathroom accessory . bin . boy . brush . chair . child . comb . diaper . hair . hairbrush . play . potty . sit . stool . tile wall . toddler . toilet bowl . toilet seat . toy',
        'bus . bus station . business suit . carry . catch . city bus . pillar . man . shopping bag . sign . suit . tie . tour bus . walk',
        'beer . beer bottle . beverage . blanket . bottle . roll . can . car . chili dog . condiment . table . dog . drink . foil . hot . hot dog . mustard . picnic table . sit . soda . tinfoil . tomato sauce . wrap',
        'bean . cup . table . dinning table . plate . food . fork . fruit . wine . meal . meat . peak . platter . potato . silverware . utensil . vegetable . white . wine glass',
        'beach . black . cattle . coast . cow . sea . photo . sand . shore . shoreline . stand . stare . water . white',
        'catch . court . hand . goggles . necklace . play . racket . stand . sunglasses . tennis court . tennis player . tennis racket . wear . woman',
        'attach . beach . catch . coast . fly . person . kite . man . sea . parachute . parasail . sand . sky . stand . string . surfboard . surfer . wetsuit',
        'boy . child . ride . road . skateboard . skateboarder . stand . trick',
        'animal . area . dirt field . enclosure . fence . field . giraffe . habitat . herd . lush . pen . savanna . stand . tree . walk . zoo'
    ]
}

df = pd.DataFrame(data)
df


# If you'd like to reproduce the above DataFrame, the [Part 1](https://github.com/visual-layer/fastdup/blob/main/examples/enrichment-zero-shot-classification.ipynb) notebook details the code you need to run.

# We can now use the image tags from the above DataFrame in combination with [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO) to further enrich the dataset with bounding boxes.
# 
# To run the enrichment on a DataFrame, use the `fd.enrich` method with `task='zero-shot-detection'` and `model='grounding-dino'`. By default, fastdup loads the smaller variant of Grounding DINO (Swin-T backbone) for enrichment.
# 
# Also specify the DataFrame to run the enrichment on (`input_df`) and the name of the column to use as the text input to the Grounding DINO model (`input_col`). In this example, we take the text prompt from the `ram_tags` column, which we computed earlier.

# In[3]:


fd = fastdup.create(input_dir='./coco_minitrain_25k')

df = fd.enrich(task='zero-shot-detection',
               model='grounding-dino',
               input_df=df,
               input_col='ram_tags'
              )


# Once done, you'll notice that three new columns are appended to the DataFrame: `grounding_dino_bboxes`, `grounding_dino_scores`, and `grounding_dino_labels`.

# In[4]:


df


# Now let's plot the results of the enrichment using the `plot_annotations` function.
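# Before plotting, it can help to see how the three new columns line up. The sketch below is a minimal illustration, assuming each `grounding_dino_*` cell holds a per-image list with one entry per detection, aligned by position (the i-th score and label belong to the i-th box). The helper `flatten_detections`, its `min_score` parameter, and the sample values are hypothetical and not part of fastdup.
# 
# ```python
from typing import Iterable, List, Sequence, Tuple

# One flattened detection: (filename, bbox, score, label).
Detection = Tuple[str, Sequence[float], float, str]


def flatten_detections(
    filenames: Iterable[str],
    bboxes: Iterable[Sequence[Sequence[float]]],
    scores: Iterable[Sequence[float]],
    labels: Iterable[Sequence[str]],
    min_score: float = 0.0,
) -> List[Detection]:
    """Turn per-image parallel lists into one record per detection.

    ASSUMPTION (not confirmed by fastdup docs): for each image, the i-th score
    and the i-th label belong to the i-th bounding box.
    """
    records: List[Detection] = []
    for fname, img_boxes, img_scores, img_labels in zip(filenames, bboxes, scores, labels):
        if not (len(img_boxes) == len(img_scores) == len(img_labels)):
            raise ValueError(f"misaligned detections for {fname}")
        for box, score, label in zip(img_boxes, img_scores, img_labels):
            if score >= min_score:  # drop low-confidence detections
                records.append((fname, box, float(score), label))
    return records


# Made-up values for illustration only; these are not real model outputs.
example = flatten_detections(
    filenames=["img_a.jpg", "img_b.jpg", "img_c.jpg"],
    bboxes=[[[10, 20, 110, 220], [5, 5, 50, 60]], [[0, 0, 30, 40]], []],
    scores=[[0.81, 0.35], [0.52], []],
    labels=[["toilet", "sink"], ["bus"], []],
    min_score=0.4,
)
for record in example:
    print(record)
# ```
# 
# If the columns do hold aligned per-image lists, the same helper could be applied to the enriched DataFrame with `flatten_detections(df['filename'], df['grounding_dino_bboxes'], df['grounding_dino_scores'], df['grounding_dino_labels'], min_score=0.4)`.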
# In[5]:


from fastdup.models_utils import plot_annotations

plot_annotations(df,
                 image_col='filename',                 # column specifying image filenames
                 tags_col='ram_tags',                  # column specifying the image tags (text prompts)
                 bbox_col='grounding_dino_bboxes',     # column specifying bounding boxes
                 scores_col='grounding_dino_scores',   # column specifying detection scores
                 labels_col='grounding_dino_labels',   # column specifying label text
                 num_rows=10                           # number of DataFrame rows to plot
                )


# ### Search for Specific Objects with Custom Text Prompt
# 
# Suppose you'd like to search for specific objects in your dataset. You can create a column in the DataFrame specifying the objects of interest and run the `.enrich` method on it.
# 
# Let's create a column in our DataFrame and name it `custom_prompt`.

# In[6]:


df["custom_prompt"] = "face . eye . hair . "


# In[7]:


df


# Now let's run the enrichment with the `custom_prompt` column.

# In[9]:


df = fd.enrich(task='zero-shot-detection',
               model='grounding-dino',
               input_df=df,
               input_col='custom_prompt'
              )


# In[10]:


df


# Not all images contain a "face", "eye", or "hair". Let's remove the rows with no detections and plot only the rows that have detections.

# In[11]:


# Keep only rows whose list of detected labels is non-empty
df = df[df['grounding_dino_labels'].astype(bool)]


# In[12]:


plot_annotations(df,
                 image_col='filename',
                 tags_col='custom_prompt',
                 bbox_col='grounding_dino_bboxes',
                 scores_col='grounding_dino_scores',
                 labels_col='grounding_dino_labels',
                 num_rows=10
                )


# ### 2. Inference on a single image

# fastdup provides an easy way to load the Grounding DINO model and run inference.
# 
# Suppose we have the following image and would like to run inference on it with the Grounding DINO model.

# In[13]:


from IPython.display import Image
Image("coco_minitrain_25k/images/val2017/000000449996.jpg")


# Import the module and provide it with an image-text input pair.
# 
# Note: Text prompts must be separated with `" . "`.
# 
# By default, fastdup uses the smaller variant of Grounding DINO (Swin-T backbone).
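# Because prompts are plain strings, it can be convenient to build them from a list of object names rather than typing the separators by hand. A minimal sketch; `build_prompt` and `split_prompt` are hypothetical helpers written for this illustration, not fastdup functions.
# 
# ```python
from typing import Iterable, List

SEPARATOR = " . "  # the separator this notebook uses between prompt phrases


def build_prompt(objects: Iterable[str]) -> str:
    """Join object names into a " . "-separated text prompt.

    Strips surrounding whitespace and drops empty or duplicate names,
    keeping the first occurrence of each.
    """
    seen = set()
    cleaned: List[str] = []
    for obj in objects:
        name = obj.strip()
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    return SEPARATOR.join(cleaned)


def split_prompt(prompt: str) -> List[str]:
    """Recover the individual names from a " . "-separated prompt.

    Note: splitting on "." would also break a name that itself contains a period.
    """
    return [part.strip() for part in prompt.split(".") if part.strip()]


print(build_prompt(["plane", " airport ", "jet", "plane", ""]))  # plane . airport . jet
print(split_prompt("face . eye . hair . "))  # ['face', 'eye', 'hair']
# ```
# 
# For example, `df["custom_prompt"] = build_prompt(["face", "eye", "hair"])` would assign one prompt to every row, as in the `custom_prompt` example above, only without the trailing separator.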
# In[14]:


from fastdup.models_grounding_dino import GroundingDINO

model = GroundingDINO()

results = model.run_inference(image_path="coco_minitrain_25k/images/val2017/000000449996.jpg",
                              text_prompt="air field . airliner . plane . airport . airport runway . airport terminal . jet . land . park . raceway . sky . tarmac . terminal",
                              box_threshold=0.3,
                              text_threshold=0.25
                             )


# In[15]:


results


# Fine-tune the detection output by varying the `box_threshold` and `text_threshold` values. Raising `box_threshold` keeps fewer, higher-confidence boxes.
# 
# The outputs are stored in a Python `dict`.

# Let's plot the image and results using the `annotate_image` convenience function.

# In[16]:


from fastdup.models_utils import annotate_image

annotate_image("coco_minitrain_25k/images/val2017/000000449996.jpg", results)


# You can optionally load another variant of Grounding DINO (Swin-B backbone) from the [official Grounding DINO repo](https://github.com/IDEA-Research/GroundingDINO).
# 
# Download the [weights](https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swinb_cogcoor.pth) and [config](https://github.com/IDEA-Research/GroundingDINO/blob/main/groundingdino/config/GroundingDINO_SwinB_cfg.py) into your local directory and pass them as arguments to the `GroundingDINO` constructor.
# 
# ```python
# model = GroundingDINO(model_config="GroundingDINO_SwinB_cfg.py",
#                       model_weights="groundingdino_swinb_cogcoor.pth")
# 
# results = model.run_inference(image_path="coco_minitrain_25k/images/val2017/000000449996.jpg",
#                               text_prompt="air field . airliner . plane . airport . airport runway . airport terminal . jet . land . park . raceway . sky . tarmac . terminal",
#                               box_threshold=0.3,
#                               text_threshold=0.25)
# ```

# ## Convert Annotations to COCO Format
# 
# Once the enrichment is complete, you can also conveniently export the DataFrame into the COCO `.json` annotation format. For now, only the bounding boxes and labels are exported. Masks will be added in a future release.
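# To make the target format concrete, the sketch below builds a minimal COCO-style detection dictionary from per-image lists. It is not fastdup's `export_to_coco` implementation: `to_coco`, `xyxy_to_xywh`, and the `image_sizes` mapping are hypothetical, and the sketch assumes boxes are absolute-pixel `[x1, y1, x2, y2]` corners, which may not match fastdup's internal box format. COCO itself stores each box as `[x, y, width, height]`.
# 
# ```python
import json
from typing import Dict, List, Sequence, Tuple


def xyxy_to_xywh(box: Sequence[float]) -> List[float]:
    """Convert [x1, y1, x2, y2] corners to COCO's [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def to_coco(
    filenames: Sequence[str],
    bboxes: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[Sequence[str]],
    image_sizes: Dict[str, Tuple[int, int]],
) -> dict:
    """Build a minimal COCO-style detection dict.

    ASSUMPTIONS: boxes are absolute-pixel [x1, y1, x2, y2]; image_sizes maps
    filename -> (width, height). Category ids are assigned in first-seen order.
    """
    categories: Dict[str, int] = {}
    images, annotations = [], []
    ann_id = 1
    for img_id, (fname, img_boxes, img_labels) in enumerate(zip(filenames, bboxes, labels), start=1):
        width, height = image_sizes[fname]
        images.append({"id": img_id, "file_name": fname, "width": width, "height": height})
        for box, label in zip(img_boxes, img_labels):
            # New labels get the next free id; known labels reuse theirs.
            cat_id = categories.setdefault(label, len(categories) + 1)
            x, y, w, h = xyxy_to_xywh(box)
            annotations.append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_id,
                "bbox": [x, y, w, h],
                "area": w * h,  # box area, since no masks are exported
                "iscrowd": 0,
            })
            ann_id += 1
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": cid, "name": name} for name, cid in categories.items()],
    }


# Made-up inputs for illustration only.
coco = to_coco(
    filenames=["img_a.jpg", "img_b.jpg"],
    bboxes=[[[10, 20, 110, 220], [5, 5, 50, 60]], [[0, 0, 30, 40]]],
    labels=[["toilet", "sink"], ["toilet"]],
    image_sizes={"img_a.jpg": (640, 480), "img_b.jpg": (320, 240)},
)
print(json.dumps(coco["annotations"][0]))
# ```
# 
# Writing the result with `json.dump(coco, f)` yields a file with the standard `images`, `annotations`, and `categories` sections.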
# In[17]:


from fastdup.models_utils import export_to_coco

export_to_coco(df,
               bbox_col='grounding_dino_bboxes',
               label_col='grounding_dino_labels',
               json_filename='grounding_dino_annot_coco_format.json'
              )


# ## Wrap Up

# In this tutorial, we showed how you can run zero-shot object detection models to enrich your dataset.
# 
# This notebook is Part 2 of the dataset enrichment notebook series, where we use various zero-shot models to enrich datasets.
# 
# + [Part 1](https://visual-layer.readme.io/docs/enrichment-zero-shot-classification) - Dataset Enrichment with Zero-Shot Classification Models
# + [Part 2](https://visual-layer.readme.io/docs/enrichment-zero-shot-detection) - Dataset Enrichment with Zero-Shot Detection Models
# + [Part 3](https://visual-layer.readme.io/docs/enrichment-zero-shot-segmentation) - Dataset Enrichment with Zero-Shot Segmentation Models
# 
# Please check out [Part 3](https://visual-layer.readme.io/docs/enrichment-zero-shot-segmentation) of the series, where we explore how to generate masks from the bounding boxes using zero-shot segmentation models like Segment Anything. See you there!
# 
# Questions about this tutorial? Reach out to us on our [Slack channel](https://visuallayer.slack.com/)!
# 
# Next, feel free to check out other tutorials:
# 
# + ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset, and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, and dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!
# + 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze and clean a folder of images from potential issues and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.
# + 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze it for potential issues. If you have a labeled ImageNet-style folder structure, have a go!
# + 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze them for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try.
# 
# ## VL Profiler - A faster and easier way to diagnose and visualize dataset issues
# 
# If you prefer a no-code platform to inspect and visualize your dataset, [**try our free cloud product VL Profiler**](https://app.visual-layer.com). VL Profiler is our first no-code commercial product that lets you visualize and inspect your dataset in your browser.
# 
# VL Profiler is free to get started. Upload up to 1,000,000 images for analysis at zero cost!
# 
# [Sign up](https://app.visual-layer.com) now.
# 
# [![image](https://raw.githubusercontent.com/visual-layer/fastdup/main/gallery/github_banner_profiler.gif)](https://app.visual-layer.com)
# 
# As usual, feedback is welcome! Questions? Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues).