- 🤖 See full list of Machine Learning Experiments on GitHub
- ▶️ Interactive Demo: try this model and other machine learning experiments in action
In this experiment we will use the pre-trained ssdlite_mobilenet_v2_coco model from the TensorFlow detection model zoo to do object detection on photos.
This notebook is inspired by the Objects Detection API Demo.
# Selecting TensorFlow version 2.x (the command is relevant for Colab only).
%tensorflow_version 2.x
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import pathlib
import cv2
import math
from PIL import Image
from google.protobuf import text_format
import platform
print('Python version:', platform.python_version())
print('Tensorflow version:', tf.__version__)
print('Keras version:', tf.keras.__version__)
Tensorflow version: 2.1.0
Keras version: 2.2.4-tf
To do object detection we're going to use the ssdlite_mobilenet_v2_coco model from the TensorFlow detection model zoo.
The full name of the model is ssdlite_mobilenet_v2_coco_2018_05_09.
# Create cache folder.
!mkdir .tmp
mkdir: .tmp: File exists
# Loads the model from the internet, unpacks it and initializes a TensorFlow saved model.
def load_model(model_name):
    model_url = 'http://download.tensorflow.org/models/object_detection/' + model_name + '.tar.gz'
    model_dir = tf.keras.utils.get_file(
        fname=model_name,
        origin=model_url,
        untar=True,
        cache_dir=pathlib.Path('.tmp').absolute()
    )
    model = tf.saved_model.load(model_dir + '/saved_model')
    return model
MODEL_NAME = 'ssdlite_mobilenet_v2_coco_2018_05_09'
saved_model = load_model(MODEL_NAME)
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
# Exploring model signatures.
saved_model.signatures
_SignatureMap({'serving_default': <tensorflow.python.eager.wrap_function.WrappedFunction object at 0x144f98d10>})
# Loading default model signature.
model = saved_model.signatures['serving_default']
Depending on which dataset was used to train the model, we need to download the proper set of labels from the tensorflow/models repository.
The ssdlite_mobilenet_v2_coco model has been trained on the COCO dataset, which uses 90 object category ids. We're going to download and explore this list of categories. The label file we need is called mscoco_label_map.pbtxt.
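For reference, each entry in that label map file is a small record in protobuf text format, roughly of this shape (the name values are internal identifiers that we won't need here):

item {
  name: "/m/01g317"
  id: 1
  display_name: "person"
}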
The label object structure itself is defined in the string_int_label_map.proto file in protobuf format.
In order to convert the mscoco_label_map.pbtxt file to a Python dictionary we need to load the string_int_label_map.proto file and compile it using protoc. Before doing that we need to install protoc.
One of the ways to install protoc is to download it manually:
PROTOC_ZIP=protoc-3.7.1-osx-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d .tmp/protoc
rm -f $PROTOC_ZIP
After that we may compile the proto files by running:
.tmp/protoc/bin/protoc ./protos/*.proto --python_out=.
☝🏻 For simplicity reasons we already have string_int_label_map.proto and its compiled version string_int_label_map_pb2.py in the protos directory. So let's just import this compiled package.
from protos import string_int_label_map_pb2
def load_labels(labels_name):
    labels_url = 'https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/data/' + labels_name
    labels_path = tf.keras.utils.get_file(
        fname=labels_name,
        origin=labels_url,
        cache_dir=pathlib.Path('.tmp').absolute()
    )
    labels_file = open(labels_path, 'r')
    labels_string = labels_file.read()
    labels_map = string_int_label_map_pb2.StringIntLabelMap()
    try:
        text_format.Merge(labels_string, labels_map)
    except text_format.ParseError:
        labels_map.ParseFromString(labels_string)
    labels_dict = {}
    for item in labels_map.item:
        labels_dict[item.id] = item.display_name
    return labels_dict
LABELS_NAME = 'mscoco_label_map.pbtxt'
labels = load_labels(LABELS_NAME)
labels
{1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorcycle', 5: 'airplane', 6: 'bus', 7: 'train', 8: 'truck', 9: 'boat', 10: 'traffic light', 11: 'fire hydrant', 13: 'stop sign', 14: 'parking meter', 15: 'bench', 16: 'bird', 17: 'cat', 18: 'dog', 19: 'horse', 20: 'sheep', 21: 'cow', 22: 'elephant', 23: 'bear', 24: 'zebra', 25: 'giraffe', 27: 'backpack', 28: 'umbrella', 31: 'handbag', 32: 'tie', 33: 'suitcase', 34: 'frisbee', 35: 'skis', 36: 'snowboard', 37: 'sports ball', 38: 'kite', 39: 'baseball bat', 40: 'baseball glove', 41: 'skateboard', 42: 'surfboard', 43: 'tennis racket', 44: 'bottle', 46: 'wine glass', 47: 'cup', 48: 'fork', 49: 'knife', 50: 'spoon', 51: 'bowl', 52: 'banana', 53: 'apple', 54: 'sandwich', 55: 'orange', 56: 'broccoli', 57: 'carrot', 58: 'hot dog', 59: 'pizza', 60: 'donut', 61: 'cake', 62: 'chair', 63: 'couch', 64: 'potted plant', 65: 'bed', 67: 'dining table', 70: 'toilet', 72: 'tv', 73: 'laptop', 74: 'mouse', 75: 'remote', 76: 'keyboard', 77: 'cell phone', 78: 'microwave', 79: 'oven', 80: 'toaster', 81: 'sink', 82: 'refrigerator', 84: 'book', 85: 'clock', 86: 'vase', 87: 'scissors', 88: 'teddy bear', 89: 'hair drier', 90: 'toothbrush'}
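Notice that some ids (12, 26, 29, 30, etc.) are missing from the dictionary. The COCO annotation scheme reserves 90 category ids, but only 80 of them are actually used, which is why the label map above has gaps.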
# List model files
!ls -la .tmp/datasets/ssdlite_mobilenet_v2_coco_2018_05_09
total 81680
drwxr-x---  9 trekhleb  staff        288 May 10  2018 .
drwxr-xr-x  5 trekhleb  staff        160 Jan 23 07:24 ..
-rw-r-----  1 trekhleb  staff         77 May 10  2018 checkpoint
-rw-r-----  1 trekhleb  staff   19911343 May 10  2018 frozen_inference_graph.pb
-rw-r-----  1 trekhleb  staff   18205188 May 10  2018 model.ckpt.data-00000-of-00001
-rw-r-----  1 trekhleb  staff      17703 May 10  2018 model.ckpt.index
-rw-r-----  1 trekhleb  staff    3665866 May 10  2018 model.ckpt.meta
-rw-r-----  1 trekhleb  staff       4199 May 10  2018 pipeline.config
drwxr-x---  4 trekhleb  staff        128 May 10  2018 saved_model
# Check model pipeline.
!cat .tmp/datasets/ssdlite_mobilenet_v2_coco_2018_05_09/pipeline.config
model {
  ssd {
    num_classes: 90
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v2"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.99999989895e-05
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.0299999993294
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.999700009823
          center: true
          scale: true
          epsilon: 0.0010000000475
          train: true
        }
      }
      use_depthwise: true
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.99999989895e-05
            }
          }
          initializer {
            truncated_normal_initializer {
              mean: 0.0
              stddev: 0.0299999993294
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.999700009823
            center: true
            scale: true
            epsilon: 0.0010000000475
            train: true
          }
        }
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.800000011921
        kernel_size: 3
        box_code_size: 4
        apply_sigmoid_to_scores: false
        use_depthwise: true
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.20000000298
        max_scale: 0.949999988079
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.333299994469
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 0.300000011921
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.990000009537
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 3
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
  }
}
train_config {
  batch_size: 24
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    rms_prop_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.00400000018999
          decay_steps: 800720
          decay_factor: 0.949999988079
        }
      }
      momentum_optimizer_value: 0.899999976158
      decay: 0.899999976158
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  num_steps: 200000
  fine_tune_checkpoint_type: "detection"
}
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
  }
}
eval_config {
  num_examples: 8000
  max_evals: 10
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
  }
}
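A few details from this config are worth keeping in mind: the model resizes every input image to 300×300 internally (fixed_shape_resizer), and its post-processing step runs non-max suppression with a score threshold of 0.3 and an IoU threshold of 0.6, keeping at most 100 detections. That is why num_detections in the results below comes back as a handful of reasonably confident detections rather than one per anchor.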
model.inputs
[<tf.Tensor 'image_tensor:0' shape=(None, None, None, 3) dtype=uint8>]
model.outputs
[<tf.Tensor 'detection_boxes:0' shape=(None, 100, 4) dtype=float32>, <tf.Tensor 'detection_classes:0' shape=(None, 100) dtype=float32>, <tf.Tensor 'detection_scores:0' shape=(None, 100) dtype=float32>, <tf.Tensor 'num_detections:0' shape=(None,) dtype=float32>]
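Before wrapping the detection logic into helper functions, here is a minimal sketch of calling the loaded signature directly (it reuses the imports from the top of the notebook and one of the test images listed below):

# A quick sanity check: run the raw signature on a single image.
image_np = np.array(Image.open('data/dog.jpg'))                  # RGB uint8 array of shape (H, W, 3).
input_tensor = tf.convert_to_tensor(image_np)[tf.newaxis, ...]   # The model expects a batch of images.
raw_output = model(input_tensor)
print(raw_output['num_detections'])         # Float tensor with the number of valid detections.
print(raw_output['detection_boxes'].shape)  # (1, 100, 4): normalized [ymin, xmin, ymax, xmax] boxes.
print(raw_output['detection_classes'][0])   # Float class ids that map to the labels dictionary above.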
def display_image(image_np):
    plt.figure()
    plt.imshow(image_np)
TEST_IMAGES_DIR_PATH = pathlib.Path('data')
TEST_IMAGE_PATHS = sorted(list(TEST_IMAGES_DIR_PATH.glob('*.jpg')))
TEST_IMAGE_PATHS
[PosixPath('data/appartment.jpg'), PosixPath('data/bicycle.jpg'), PosixPath('data/dog.jpg'), PosixPath('data/food.jpg'), PosixPath('data/football.jpg'), PosixPath('data/pedestrians.jpg'), PosixPath('data/street.jpg')]
for image_path in TEST_IMAGE_PATHS:
    image_np = mpimg.imread(image_path)
    display_image(image_np)
def detect_objects_on_image(image, model):
    image = np.asarray(image)
    input_tensor = tf.convert_to_tensor(image)
    # Adding one more dimension since the model expects a batch of images.
    input_tensor = input_tensor[tf.newaxis, ...]
    output_dict = model(input_tensor)
    num_detections = int(output_dict['num_detections'])
    output_dict = {
        key: value[0, :num_detections].numpy()
        for key, value in output_dict.items()
        if key != 'num_detections'
    }
    output_dict['num_detections'] = num_detections
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)
    return output_dict
def draw_detections_on_image(image, detections, labels):
    # Draw on a copy so that the original image stays untouched.
    image_with_detections = image.copy()
    height, width, channels = image_with_detections.shape
    font = cv2.FONT_HERSHEY_SIMPLEX
    color = (0, 255, 0)
    label_padding = 5

    num_detections = detections['num_detections']
    if num_detections > 0:
        for detection_index in range(num_detections):
            detection_score = detections['detection_scores'][detection_index]
            detection_box = detections['detection_boxes'][detection_index]
            detection_class = detections['detection_classes'][detection_index]
            detection_label = labels[detection_class]
            detection_label_full = detection_label + ' ' + str(math.floor(100 * detection_score)) + '%'

            # Detection boxes hold normalized [ymin, xmin, ymax, xmax] values.
            y1 = int(height * detection_box[0])
            x1 = int(width * detection_box[1])
            y2 = int(height * detection_box[2])
            x2 = int(width * detection_box[3])

            # Detection rectangle.
            image_with_detections = cv2.rectangle(
                image_with_detections,
                (x1, y1),
                (x2, y2),
                color,
                3
            )

            # Label background.
            label_size = cv2.getTextSize(
                detection_label_full,
                font,
                0.7,
                2
            )
            image_with_detections = cv2.rectangle(
                image_with_detections,
                (x1, y1 - label_size[0][1] - 2 * label_padding),
                (x1 + label_size[0][0] + 2 * label_padding, y1),
                color,
                -1
            )

            # Label text.
            cv2.putText(
                image_with_detections,
                detection_label_full,
                (x1 + label_padding, y1 - label_padding),
                font,
                0.7,
                (0, 0, 0),
                1,
                cv2.LINE_AA
            )

    return image_with_detections
# Example of what the detections dictionary looks like.
image_np = np.array(Image.open(TEST_IMAGE_PATHS[1]))
detections = detect_objects_on_image(image_np, model)
detections
{'detection_scores': array([0.9872208 , 0.97772163, 0.97706723, 0.33399296], dtype=float32), 'detection_classes': array([3, 1, 2, 3]), 'detection_boxes': array([[0.31191874, 0.14469081, 0.5815682 , 0.5780536 ], [0.16846958, 0.47724158, 0.87531245, 0.75766236], [0.48911124, 0.37843052, 0.9204666 , 0.90651405], [0.30284303, 0. , 0.53111345, 0.10418391]], dtype=float32), 'num_detections': 4}
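The detection classes here map back to the labels dictionary above (3 → car, 1 → person, 2 → bicycle), and each detection box contains normalized [ymin, xmin, ymax, xmax] coordinates in the [0, 1] range.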
for image_path in TEST_IMAGE_PATHS:
    image_np = np.array(Image.open(image_path))
    detections = detect_objects_on_image(image_np, model)
    image_with_detections = draw_detections_on_image(image_np, detections, labels)
    plt.figure(figsize=(8, 6))
    plt.imshow(image_with_detections)
To use the ssdlite_mobilenet_v2_coco_2018_05_09 model on the web we need to convert it into a format that TensorFlow.js understands. To do so we may use the tfjs-converter as follows:
tensorflowjs_converter \
--input_format=tf_saved_model \
--output_format=tfjs_graph_model \
./experiments/objects_detection_ssdlite_mobilenet_v2/.tmp/datasets/ssdlite_mobilenet_v2_coco_2018_05_09/saved_model \
./demos/public/models/objects_detection_ssdlite_mobilenet_v2
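The converter produces a model.json file plus binary weight shard files in the output folder, which TensorFlow.js can then fetch and load in the browser.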
An alternative and easier way would be to use the @tensorflow-models/coco-ssd npm package. But just for exploration purposes let's go one level deeper and use the model directly, without wrapper modules.
You may find this experiment in the Demo app and play around with it right in your browser to see how the model performs in real life.