Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
Open-source annotation tools for object detection and image segmentation exist, but tools for image classification are less common. When there is only one object per image, labeling can be done by manually moving images into a separate folder for each image class. This strategy, however, is tedious and does not work when a single image can contain multiple different objects. For such cases, either this notebook can be used, or e.g. a cloud-based labeling tool.
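To make the folder-based strategy concrete, here is a minimal sketch that derives each image's label from the name of its parent folder. The function name and the example paths are hypothetical and only serve as illustration. Because a file lives in exactly one folder, each image can carry only one label, which is the limitation described above.

```python
from pathlib import PurePosixPath


def labels_from_folders(image_paths):
    """Map each image path to the name of its parent folder, used as the class label."""
    return {p: PurePosixPath(p).parent.name for p in image_paths}


# Hypothetical paths following a one-folder-per-class layout.
paths = ["data/can/1.jpg", "data/carton/2.jpg"]
folder_labels = labels_from_folders(paths)
print(folder_labels)
```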
This notebook provides a simple UI to assist in labeling images. Each image can be annotated with one or more classes or be marked as "Exclude" to indicate that the image should not be used for model training or evaluation.
# Ensure edits to libraries are loaded and plotting is shown in the notebook.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import os
import sys
import scrapbook as sb
sys.path.append("../../")
from utils_cv.classification.widget import AnnotationWidget
from utils_cv.classification.data import Urls
from utils_cv.common.data import unzip_url
Set the location of the images to annotate and the path where the annotations are saved. Here, unzip_url is used to download the example data if it is not already present, and to set the path.
See FAQ.md for a brief discussion of how to scrape images from the internet.
IM_DIR = os.path.join((unzip_url(Urls.fridge_objects_tiny_path, exist_ok=True)), 'can')
ANNO_PATH = "cvbp_ic_annotation.txt"
print(f"Using images in directory: {IM_DIR}.")
Using images in directory: /data/home/pabuehle/Desktop/ComputerVision/data/fridgeObjectsTiny/can.
Start the UI. Check the "Allow multi-class labeling" box to allow images to be annotated with multiple classes. When in doubt about what the annotation for an image should be, or for any other reason (e.g. blur or over-exposure), mark the image as "EXCLUDE". All annotations are saved to (and loaded from) a pandas dataframe at the path specified in anno_path. Note that the toy dataset in this notebook only contains images of cans.
w_anno_ui = AnnotationWidget(
    labels=["can", "carton", "milk_bottle", "water_bottle"],
    im_dir=IM_DIR,
    anno_path=ANNO_PATH,
    im_filenames=None,  # Set to None to annotate all images in IM_DIR
)
display(w_anno_ui.show())
Tab(children=(VBox(children=(HBox(children=(Button(description='Previous', layout=Layout(width='80px'), style=…
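After annotating, it can be useful to check how many labels were assigned per class and how many images were excluded. The following is a minimal sketch, not part of the widget. It assumes the saved annotation file uses the same tab-separated layout as the toy file in this notebook (columns IM_FILENAME, EXCLUDE, LABELS, with labels separated by commas). The function name and the sample content are hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_annotations(lines):
    """Count labels per class and excluded images in a tab-separated annotation file."""
    label_counts = Counter()
    num_excluded = 0
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["EXCLUDE"] == "True":
            num_excluded += 1
            continue
        # The LABELS field may be missing or empty for unlabeled images.
        labels = (row.get("LABELS") or "").split(",")
        label_counts.update(label for label in labels if label)
    return label_counts, num_excluded


# Hypothetical file content; for a real file, pass open(ANNO_PATH, newline="") instead.
sample = io.StringIO(
    "IM_FILENAME\tEXCLUDE\tLABELS\n"
    "10.jpg\tFalse\tcan\n"
    "12.jpg\tFalse\tcan,carton\n"
    "13.jpg\tTrue\n"
)
label_counts, num_excluded = summarize_annotations(sample)
print(dict(label_counts), num_excluded)
```

A strongly skewed count can indicate that more images of the rare classes are needed before training.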
Below is an example of how to create a fast.ai ImageList object using the ground truth annotations generated by the AnnotationWidget. Fast.ai does not support the Exclude flag, so we handle this by removing these images before calling the from_df() and label_from_df() functions.
For this example, we create a toy annotation file at example_annotation.csv rather than using ANNO_PATH.
%%writefile example_annotation.csv
IM_FILENAME EXCLUDE LABELS
10.jpg False can
12.jpg False can,carton
13.jpg True
14.jpg False carton
15.jpg False carton,milk_bottle
18.jpg False can
19.jpg True
20.jpg False can
Overwriting example_annotation.csv
import pandas as pd
from fastai.vision import ImageList, ImageDataBunch
# Load annotation, discard excluded images, and convert to format fast.ai expects
data = []
with open("example_annotation.csv", "r") as f:
    for line in f.readlines()[1:]:  # Skip the header row
        vec = line.strip().split("\t")
        exclude = vec[1] == "True"
        # Keep only images that are not excluded and have at least one label
        if not exclude and len(vec) > 2:
            data.append((vec[0], vec[2]))
df = pd.DataFrame(data, columns=["name", "label"])
display(df)
data = (
    ImageList.from_df(path=IM_DIR, df=df)
    .split_by_rand_pct(valid_pct=0.5)
    .label_from_df(cols="label", label_delim=",")
)
print(data)
| | name | label |
|---|---|---|
| 0 | 10.jpg | can |
| 1 | 12.jpg | can,carton |
| 2 | 14.jpg | carton |
| 3 | 15.jpg | carton,milk_bottle |
| 4 | 18.jpg | can |
| 5 | 20.jpg | can |
LabelLists; Train: LabelList (3 items) x: ImageList Image (3, 665, 499),Image (3, 665, 499),Image (3, 665, 499) y: MultiCategoryList carton,carton;milk_bottle,can Path: /data/home/pabuehle/Desktop/ComputerVision/data/fridgeObjectsTiny/can; Valid: LabelList (3 items) x: ImageList Image (3, 665, 499),Image (3, 665, 499),Image (3, 665, 499) y: MultiCategoryList can,can,can;carton Path: /data/home/pabuehle/Desktop/ComputerVision/data/fridgeObjectsTiny/can; Test: None
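The label_delim="," argument above tells fast.ai to split each label string into several classes per image. As a rough illustration of what multi-label targets look like, the sketch below converts comma-delimited label strings into multi-hot vectors in plain Python. This is not fast.ai's implementation, and the multi_hot helper is a hypothetical name used only for this example.

```python
def multi_hot(label_strings, delim=","):
    """Turn delimited label strings into multi-hot vectors over the sorted class list."""
    classes = sorted({c for s in label_strings for c in s.split(delim)})
    index = {c: i for i, c in enumerate(classes)}
    vectors = []
    for s in label_strings:
        vec = [0] * len(classes)
        for c in s.split(delim):
            vec[index[c]] = 1  # Mark every class present in this image
        vectors.append(vec)
    return classes, vectors


classes, vectors = multi_hot(["can", "can,carton", "carton,milk_bottle"])
print(classes)
print(vectors)
```

Each position in a vector corresponds to one class, so an image with two objects simply has two entries set to 1.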
# Preserve some of the notebook outputs
num_images = len(data.valid) + len(data.train)
sb.glue("num_images", num_images)