Mask-RCNN Tensorflow v1 image examples

Use image segmentation creatively

  • toc: true
  • badges: true
  • comments: true
  • categories: [tensorflow, vision, segmentation]

Project Setup

Make sure we're running TensorFlow v1

In [1]:
try:
    %tensorflow_version 1.x
except Exception:
    pass
TensorFlow 1.x selected.

Install Mask-RCNN model

In [0]:
!pip install -U git+

Download weights of pretrained Mask-RCNN

In [3]:
!curl -L -o mask_rcnn_balloon.h5
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   611  100   611    0     0   2246      0 --:--:-- --:--:-- --:--:--  2254
100  244M  100  244M    0     0  40.0M      0  0:00:06  0:00:06 --:--:-- 47.2M
In [0]:
import cv2
import math
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
from mrcnn import utils
from mrcnn import model as modellib
from mrcnn.config import Config
from PIL import Image

plt.rcParams["figure.figsize"]= (10,10)

Mask-RCNN setup

In [5]:
# Load the pre-trained model data
ROOT_DIR = os.getcwd()
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
Downloading pretrained model to /content/mask_rcnn_coco.h5 ...
... done downloading pretrained model!
In [0]:
class InferenceConfig(Config):
    """Configuration for running inference on MS COCO.
    Derives from the base Config class and overrides values specific
    to the COCO dataset.
    """
    # Give the configuration a recognizable name
    NAME = "coco"

    # Number of images to process on each GPU. A 12GB GPU can typically
    # handle 2 images of 1024x1024px; one at a time is enough for inference.
    IMAGES_PER_GPU = 1

    # Uncomment to train on 8 GPUs (default is 1)
    GPU_COUNT = 1

    # Number of classes (including background)
    NUM_CLASSES = 1 + 80  # COCO has 80 classes
In [0]:
# COCO dataset object names
model = modellib.MaskRCNN(
    mode="inference", model_dir=MODEL_DIR, config=InferenceConfig()
)
model.load_weights(COCO_MODEL_PATH, by_name=True)
class_names = [
    'BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
    'bus', 'train', 'truck', 'boat', 'traffic light',
    'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
    'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
    'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
    'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard',
    'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
    'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
    'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
    'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
    'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
    'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
    'teddy bear', 'hair drier', 'toothbrush',
]

The following function applies the mask to the original image: wherever the mask is 0, the pixel is taken from the gray image; otherwise the pixel from the original picture is kept.

In [0]:
# This function changes the colorful background to grayscale.
# image[:, :, 0], image[:, :, 1], and image[:, :, 2] are the three color channels.
# mask == 0 means the pixel does not belong to the object.
# np.where replaces each background pixel with the corresponding gray_image pixel.
# Since the gray image is 2D, all 3 channels of a background pixel are set to the
# same value to keep it grayscale.

def apply_mask(image, mask_image, mask):
    """Helper function to apply a mask to an image."""
    image[:, :, 0] = np.where(
        mask == 0,
        mask_image[:, :, 0],
        image[:, :, 0]
    )
    image[:, :, 1] = np.where(
        mask == 0,
        mask_image[:, :, 1],
        image[:, :, 1]
    )
    image[:, :, 2] = np.where(
        mask == 0,
        mask_image[:, :, 2],
        image[:, :, 2]
    )
    return image
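To see the per-channel `np.where` logic in isolation, here is a minimal sketch with toy arrays (a 2x2 "image" and a hand-made mask; none of these values come from the notebook's real data):

```python
import numpy as np

# Toy 2x2 color image, an all-gray stand-in, and a mask that marks
# only the top-left pixel as foreground (1 = object, 0 = background).
image = np.array([[[10, 20, 30], [40, 50, 60]],
                  [[70, 80, 90], [100, 110, 120]]], dtype=np.uint8)
gray = np.full_like(image, 128)
mask = np.array([[1, 0],
                 [0, 0]])

# Same per-channel replacement as apply_mask above:
out = image.copy()
for c in range(3):
    out[:, :, c] = np.where(mask == 0, gray[:, :, c], out[:, :, c])

print(out[0, 0])  # foreground pixel keeps its color: [10 20 30]
print(out[1, 1])  # background pixel becomes gray: [128 128 128]
```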
In [0]:
def process_image(image, mask_image, boxes, masks, ids, names, scores, target_label):
    """Find the object with the biggest bounding box and apply its mask."""
    # max_area tracks the largest matching object among all detections
    max_area = 0
    mask = None
    # n_instances is the number of detected objects
    n_instances = boxes.shape[0]

    if not n_instances:
        print('NO INSTANCES TO DISPLAY')
    else:
        assert boxes.shape[0] == masks.shape[-1] == ids.shape[0]

    for i in range(n_instances):
        if not np.any(boxes[i]):
            continue

        # compute the area of each object's bounding box
        y1, x1, y2, x2 = boxes[i]
        square = (y2 - y1) * (x2 - x1)

        # use the label to select objects of the given class
        # from the 80 classes in the COCO dataset
        current_label = names[ids[i]]
        if target_label is None or current_label == target_label:
            # keep the largest matching object as the main character;
            # everything else will be treated as background
            if square > max_area:
                max_area = square
                mask = masks[:, :, i]

    # apply the winning mask after the loop, so only the largest object is kept
    if mask is not None:
        image = apply_mask(image, mask_image, mask)
    return image
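The largest-box selection inside `process_image` can be checked on its own with hypothetical detections (toy boxes, ids, and masks invented for this sketch; real model output has the same shapes):

```python
import numpy as np

# Two hypothetical 'cat' detections: boxes are (y1, x1, y2, x2),
# so the second box (3x3 = area 9) should beat the first (2x2 = area 4).
boxes = np.array([[0, 0, 2, 2],
                  [0, 0, 3, 3]])
names = ['BG', 'cat']
ids = np.array([1, 1])
masks = np.zeros((4, 4, 2), dtype=bool)
masks[0:2, 0:2, 0] = True
masks[0:3, 0:3, 1] = True

# Same selection logic as the loop in process_image:
max_area, mask = 0, None
for i in range(boxes.shape[0]):
    y1, x1, y2, x2 = boxes[i]
    square = (y2 - y1) * (x2 - x1)
    if names[ids[i]] == 'cat' and square > max_area:
        max_area, mask = square, masks[:, :, i]

print(max_area)        # 9
print(int(mask.sum())) # 9 pixels belong to the winning mask
```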

Now the model is ready to use

In [10]:
!curl -L -o cat_input.jpg
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   232    0   232    0     0    666      0 --:--:-- --:--:-- --:--:--   666
100 5442k  100 5442k    0     0  10.5M      0 --:--:-- --:--:-- --:--:-- 10.5M
In [69]:
# Credit for the image:
image = cv2.imread('./cat_input.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)
<matplotlib.image.AxesImage at 0x7f4274e4b710>

Application 1: Grayscale the background

Recognize the main character, keep it colorful while grayscaling the background of the image.

In [70]:
# Use cvtColor to transform the RGB image into a gray image
mask_image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
mask_image = np.stack([mask_image, mask_image, mask_image], axis=2)
plt.imshow(mask_image)
<matplotlib.image.AxesImage at 0x7f4274e21a90>
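The stacking step matters because `cvtColor` returns a 2-D array, while `apply_mask` indexes three channels. A quick shape check with a zero array standing in for a real gray image:

```python
import numpy as np

gray = np.zeros((4, 6), dtype=np.uint8)  # stand-in for the 2-D cvtColor output
mask_image = np.stack([gray, gray, gray], axis=2)
print(mask_image.shape)  # (4, 6, 3): three identical channels
```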
In [0]:
results = model.detect([image], verbose=0)
output_dict = results[0]
rois = output_dict['rois']
class_ids = output_dict['class_ids']
scores = output_dict['scores']
masks = output_dict['masks']
In [73]:
result = process_image(
    image.copy(), mask_image, rois, masks, class_ids, class_names, scores, 'cat'
)
plt.imshow(result)
<matplotlib.image.AxesImage at 0x7f427458c860>