Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
Image classification is the canonical computer vision task of determining whether an image contains a specific object, feature, or activity. Currently, the best models are based on convolutional neural networks (CNNs) and are typically pre-trained on the ImageNet dataset. The CNN weights are trained on over a million images spanning 1,000 object classes. Implementations are available in many deep neural network frameworks, including CNTK, fast.ai, Keras, PyTorch, and TensorFlow.
This notebook shows a simple example of loading a pretrained model to score a webcam stream. Here, we use a ResNet model from the fastai.vision package.
For more details about image classification tasks, including transfer learning, please see our training introduction notebook.
This notebook assumes you have a webcam connected to your machine. If you want to run the model and code on a remote VM while using a local machine for the webcam stream, you can set up an SSH tunnel:
$ ssh -L 8888:localhost:8888 <user-id@url-to-your-vm>
Then open localhost:8888 from the browser on the webcam-connected local machine to access the Jupyter notebook running on the VM.
We use the ipywebrtc module to show the webcam widget in the notebook. Currently, the widget works on Chrome and Firefox. For more details about the widget, please visit the ipywebrtc GitHub repository or its documentation.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import sys
sys.path.append("../../")
import io
import os
import time
import urllib.request
import fastai
from fastai.vision import models, open_image
from ipywebrtc import CameraStream, ImageRecorder
from ipywidgets import HBox, Label, Layout, Widget
import scrapbook as sb
from utils_cv.common.data import data_path
from utils_cv.common.gpu import which_processor
from utils_cv.classification.data import imagenet_labels
from utils_cv.classification.model import IMAGENET_IM_SIZE, model_to_learner
print(f"Fast.ai: {fastai.__version__}")
which_processor()
Fast.ai: 1.0.57
Torch is using GPU: Tesla V100-PCIE-16GB
To introduce the computer vision approach with CNNs, we will start with a pretrained ResNet18 CNN, a relatively small and fast architecture. The reported error rate of the model on ImageNet is 30.24% for top-1 and 10.92% for top-5, where top-n means the correct label is among the n labels the model considers most probable.
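To get a sense of the model's size, the backbone can be instantiated directly and its parameters counted (a quick sketch; ResNet18 has roughly 11.7 million parameters):
from fastai.vision import models
# Instantiate the pretrained backbone and count its parameters
resnet = models.resnet18(pretrained=True)
n_params = sum(p.numel() for p in resnet.parameters())
print(f"ResNet18 parameters: {n_params/1e6:.1f}M")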
This model expects RGB images as input. To take advantage of the ResNet model's ImageNet pretraining, we need to normalize our image data in the same way. This means scaling the RGB values from the range 0-255 to 0-1 and normalizing with the ImageNet mean and standard deviation, which can be found in fastai.vision.imagenet_stats.
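As a quick sanity check, these statistics can be inspected directly; fastai ships them as a (mean, std) tuple:
from fastai.vision import imagenet_stats
# Channel-wise ImageNet mean and standard deviation used for normalization
mean, std = imagenet_stats
print(f"mean: {mean}, std: {std}")  # mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]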
The output of the model is a probability distribution over the 1,000 labeled classes in ImageNet. To convert these into human-readable labels, we utilize the label JSON file from Keras.
Note: You can load a different model using learn = load_learner(path). To learn more about model export and loading, see the fastai documentation.
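For example, a fine-tuned Learner can be exported and restored later for inference (a minimal sketch using the fastai 1.x API; my_model.pkl is a placeholder file name):
from fastai.vision import load_learner
# Save the Learner (model plus metadata) to learn.path/'my_model.pkl'
learn.export('my_model.pkl')
# ...later, restore it for inference
learn = load_learner(learn.path, 'my_model.pkl')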
labels = imagenet_labels()
print(f"Num labels = {len(labels)}")
print(f"{', '.join(labels[:5])}, ...")
Num labels = 1000
tench, goldfish, great_white_shark, tiger_shark, hammerhead, ...
Following fast.ai's naming conventions: a Learner is a Trainer for model using data to minimize loss_func with optimizer opt_func.
# Convert a pretrained imagenet model into Learner for prediction.
# You can load an exported model by learn = load_learner(path) as well.
learn = model_to_learner(models.resnet18(pretrained=True), IMAGENET_IM_SIZE)
# Download an example image
IM_URL = "https://cvbp.blob.core.windows.net/public/images/cvbp_cup.jpg"
urllib.request.urlretrieve(IM_URL, os.path.join(data_path(), "example.jpg"))
im = open_image(os.path.join(data_path(), "example.jpg"), convert_mode='RGB')
im
Using the learn.predict method, we ask the model to classify this image. The model returns the label with the highest probability score, in this case "coffee_mug".
start_time = time.time()
# Use the model to predict the class label
_, ind, prob = learn.predict(im)
predicted_label = labels[ind]
predicted_confidence = prob[ind]
print(f"Predicted label: {predicted_label} (confidence = {predicted_confidence:.2f})")
# Show prediction time. Note the first prediction usually takes longer because of the model loading
print(f"Took {time.time()-start_time} sec")
Predicted label: coffee_mug (confidence = 0.68)
Took 0.02240896224975586 sec
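Beyond the single best guess, the returned probability tensor can also be queried for the top-5 predictions (a small added sketch, reusing the prob tensor from the cell above):
# Inspect the five most probable labels for the same image
top_probs, top_inds = prob.topk(5)
for p, i in zip(top_probs.tolist(), top_inds.tolist()):
    print(f"{labels[i]}: {p:.2f}")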
Now, we use a webcam stream for image classification. We use ipywebrtc to start a webcam and send its video stream to the notebook's widget. Note that Jupyter widgets can be unstable; if the widget below does not show, see the "Troubleshooting" section in this FAQ for possible fixes.
# Webcam
w_cam = CameraStream(
    constraints={
        'facing_mode': 'user',
        'audio': False,
        'video': {'width': IMAGENET_IM_SIZE, 'height': IMAGENET_IM_SIZE}
    },
    layout=Layout(width=f'{IMAGENET_IM_SIZE}px')
)
# Image recorder for taking a snapshot
w_imrecorder = ImageRecorder(stream=w_cam, layout=Layout(padding='0 0 0 50px'))
# Label widget to show our classification results
w_label = Label(layout=Layout(padding='0 0 0 50px'))

def classify_frame(_):
    """Classify an image snapshot by using a pretrained model."""
    # Once capturing has started, remove the capture widget since we don't need it anymore
    if w_imrecorder.layout.display != 'none':
        w_imrecorder.layout.display = 'none'
    try:
        im = open_image(io.BytesIO(w_imrecorder.image.value), convert_mode='RGB')
        _, ind, prob = learn.predict(im)
        # Show result label and confidence
        w_label.value = f"{labels[ind]} ({prob[ind]:.2f})"
    except OSError:
        # If im_recorder doesn't have valid image data yet, skip it
        pass
    # Take the next snapshot programmatically
    w_imrecorder.recording = True

# Register classify_frame as a callback. Will be called whenever image.value changes.
w_imrecorder.image.observe(classify_frame, 'value')
# Show widgets
HBox([w_cam, w_imrecorder, w_label])
HBox(children=(CameraStream(constraints={'facing_mode': 'user', 'audio': False, 'video': {'width': 224, 'heigh…
Now, click the capture button in the widget to start classification. For each snapshot, the label widget displays the most probable class along with the model's confidence.
In this notebook, we used a simple example to demonstrate how to use a pretrained model to classify images. The model is limited to predicting only the object labels present in the ImageNet training samples. See our training introduction notebook to learn how to fine-tune a model to detect other objects you may be interested in.
# Stop the model and webcam
Widget.close_all()
# Preserve some of the notebook outputs
sb.glue("predicted_label", predicted_label)
sb.glue("predicted_confidence", float(predicted_confidence))