In [1]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2
In [2]:
from fastai.conv_learner import *
from fastai.dataset import *

from pathlib import Path
import json
from PIL import ImageDraw, ImageFont
from matplotlib import patches, patheffects
# torch.cuda.set_device(0)

Pascal VOC

We will be looking at the Pascal VOC dataset. It's quite slow, so you may prefer to download from this mirror. There are two different competition/research datasets, from 2007 and 2012. We'll be using the 2007 version. You can use the larger 2012 for better results, or even combine them (but be careful to avoid data leakage between the validation sets if you do this).

Unlike previous lessons, we are using the Python 3 standard library pathlib for our paths and file access. Note that it returns an OS-specific class (on Linux, PosixPath), so your output may look a little different. Most libraries that take paths as input can accept a pathlib object, although some (like cv2) can't; in those cases you can use str() to convert it to a string.
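As a minimal sketch of the pathlib behaviour described above (the path itself is hypothetical, just for illustration):

```python
from pathlib import Path

PATH = Path('data/pascal')                  # hypothetical path, for illustration only
fname = PATH / 'pascal_train2007.json'      # the / operator joins path components
print(type(fname).__name__)                 # PosixPath on Linux, WindowsPath on Windows
print(str(fname))                           # plain string for libraries (like cv2) that need one
```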

Download the dataset

In [5]:
!pwd
/home/ubuntu/fastai/courses/dl2
In [9]:
!ln -s ~/data data
In [12]:
!ls -la data/
total 837144
drwxrwxr-x  4 ubuntu ubuntu      4096 May 20 05:28 .
drwxr-xr-x 24 ubuntu ubuntu      4096 May 20 15:01 ..
drwxrwxr-x  8 ubuntu ubuntu      4096 May 13 13:09 dogscats
-rw-rw-r--  1 ubuntu ubuntu 857214334 Apr  1  2017 dogscats.zip
drwxrwxr-x  2 ubuntu ubuntu      4096 May 20 06:02 spellbee
In [18]:
%cd data
/home/ubuntu/data
In [22]:
%mkdir pascal
In [23]:
%cd pascal/
/home/ubuntu/data/pascal
In [20]:
!aria2c --file-allocation=none -c -x 5 -s 5 http://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar
[#8be4bd 431MiB/438MiB(98%) CN:1 DL:34MiB]                        
06/15 21:53:04 [NOTICE] Download complete: /home/ubuntu/data/VOCtrainval_06-Nov-2007.tar

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
8be4bd|OK  |    33MiB/s|/home/ubuntu/data/VOCtrainval_06-Nov-2007.tar

Status Legend:
(OK):download completed.
In [21]:
!aria2c --file-allocation=none -c -x 5 -s 5 https://storage.googleapis.com/coco-dataset/external/PASCAL_VOC.zip
06/15 21:54:27 [NOTICE] Download complete: /home/ubuntu/data/PASCAL_VOC.zip

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
1350c8|OK  |   8.4MiB/s|/home/ubuntu/data/PASCAL_VOC.zip

Status Legend:
(OK):download completed.
In [33]:
!tar -xf VOCtrainval_06-Nov-2007.tar
In [35]:
!unzip PASCAL_VOC.zip
Archive:  PASCAL_VOC.zip
   creating: PASCAL_VOC/
  inflating: PASCAL_VOC/pascal_test2007.json  
  inflating: PASCAL_VOC/pascal_train2007.json  
  inflating: PASCAL_VOC/pascal_train2012.json  
  inflating: PASCAL_VOC/pascal_val2007.json  
  inflating: PASCAL_VOC/pascal_val2012.json  
In [36]:
%mv PASCAL_VOC/*.json .
In [38]:
%rmdir PASCAL_VOC
In [39]:
%ls -la
total 462072
drwxrwxr-x 3 ubuntu ubuntu      4096 Jun 15 22:01 ./
drwxrwxr-x 5 ubuntu ubuntu      4096 Jun 15 21:56 ../
-rw-r--r-- 1 ubuntu ubuntu   2584743 Jul  7  2015 pascal_test2007.json
-rw-r--r-- 1 ubuntu ubuntu   1346236 Aug 19  2015 pascal_train2007.json
-rw-r--r-- 1 ubuntu ubuntu   2912167 Aug 19  2015 pascal_train2012.json
-rw-r--r-- 1 ubuntu ubuntu   1342257 Jul  7  2015 pascal_val2007.json
-rw-r--r-- 1 ubuntu ubuntu   2922699 Aug 19  2015 pascal_val2012.json
-rw-rw-r-- 1 ubuntu ubuntu   1998182 Jun 15 21:54 PASCAL_VOC.zip
drwxrwxr-x 3 ubuntu ubuntu      4096 Nov  6  2007 VOCdevkit/
-rw-rw-r-- 1 ubuntu ubuntu 460032000 Jun 15 21:53 VOCtrainval_06-Nov-2007.tar
In [41]:
%cd ~/fastai/courses/dl2
/home/ubuntu/fastai/courses/dl2
In [4]:
PATH = Path('data/pascal')
list(PATH.iterdir())
Out[4]:
[PosixPath('data/pascal/pascal_train2012.json'),
 PosixPath('data/pascal/VOCtrainval_06-Nov-2007.tar'),
 PosixPath('data/pascal/pascal_train2007.json'),
 PosixPath('data/pascal/models'),
 PosixPath('data/pascal/VOCdevkit'),
 PosixPath('data/pascal/pascal_val2007.json'),
 PosixPath('data/pascal/pascal_test2007.json'),
 PosixPath('data/pascal/pascal_val2012.json'),
 PosixPath('data/pascal/PASCAL_VOC.zip'),
 PosixPath('data/pascal/tmp')]

As well as the images, there are also annotations - bounding boxes showing where each object is. These were hand labeled. The original version was in XML, which is a little hard to work with nowadays, so we use the more recent JSON version, which you can download from this link.

You can see here how pathlib includes the ability to open files (amongst many other capabilities).

In [5]:
trn_j = json.load( (PATH / 'pascal_train2007.json').open() )
trn_j.keys()
Out[5]:
dict_keys(['images', 'type', 'annotations', 'categories'])
In [6]:
IMAGES, ANNOTATIONS, CATEGORIES = ['images', 'annotations', 'categories']
trn_j[IMAGES][:5]
Out[6]:
[{'file_name': '000012.jpg', 'height': 333, 'width': 500, 'id': 12},
 {'file_name': '000017.jpg', 'height': 364, 'width': 480, 'id': 17},
 {'file_name': '000023.jpg', 'height': 500, 'width': 334, 'id': 23},
 {'file_name': '000026.jpg', 'height': 333, 'width': 500, 'id': 26},
 {'file_name': '000032.jpg', 'height': 281, 'width': 500, 'id': 32}]
In [66]:
trn_j[ANNOTATIONS][:2]
Out[66]:
[{'segmentation': [[155, 96, 155, 270, 351, 270, 351, 96]],
  'area': 34104,
  'iscrowd': 0,
  'image_id': 12,
  'bbox': [155, 96, 196, 174],
  'category_id': 7,
  'id': 1,
  'ignore': 0},
 {'segmentation': [[184, 61, 184, 199, 279, 199, 279, 61]],
  'area': 13110,
  'iscrowd': 0,
  'image_id': 17,
  'bbox': [184, 61, 95, 138],
  'category_id': 15,
  'id': 2,
  'ignore': 0}]
In [68]:
trn_j[CATEGORIES][:8]
Out[68]:
[{'supercategory': 'none', 'id': 1, 'name': 'aeroplane'},
 {'supercategory': 'none', 'id': 2, 'name': 'bicycle'},
 {'supercategory': 'none', 'id': 3, 'name': 'bird'},
 {'supercategory': 'none', 'id': 4, 'name': 'boat'},
 {'supercategory': 'none', 'id': 5, 'name': 'bottle'},
 {'supercategory': 'none', 'id': 6, 'name': 'bus'},
 {'supercategory': 'none', 'id': 7, 'name': 'car'},
 {'supercategory': 'none', 'id': 8, 'name': 'cat'}]

It's helpful to use constants instead of strings, since we get tab-completion and avoid typos.

In [7]:
FILE_NAME, ID, IMG_ID, CAT_ID, BBOX = 'file_name', 'id', 'image_id', 'category_id', 'bbox'
In [75]:
cats = { o[ID]:o["name"] for o in trn_j[CATEGORIES] }
trn_fns = { o[ID]:o[FILE_NAME] for o in trn_j[IMAGES] }
trn_ids = { o[ID] for o in trn_j[IMAGES] }
In [81]:
list( (PATH / 'VOCdevkit/VOC2007').iterdir() )
Out[81]:
[PosixPath('data/pascal/VOCdevkit/VOC2007/JPEGImages'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/SegmentationClass'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/Annotations'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/SegmentationObject'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/ImageSets')]
In [82]:
JPEGS = 'VOCdevkit/VOC2007/JPEGImages'
In [83]:
IMG_PATH = PATH / JPEGS
list( IMG_PATH.iterdir() )[:5]
Out[83]:
[PosixPath('data/pascal/VOCdevkit/VOC2007/JPEGImages/001688.jpg'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/JPEGImages/007189.jpg'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/JPEGImages/003408.jpg'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/JPEGImages/001604.jpg'),
 PosixPath('data/pascal/VOCdevkit/VOC2007/JPEGImages/000729.jpg')]

Each image has a unique ID.

In [84]:
im0_d = trn_j[IMAGES][0]
im0_d
Out[84]:
{'file_name': '000012.jpg', 'height': 333, 'width': 500, 'id': 12}
In [85]:
im0_d[FILE_NAME], im0_d[ID]
Out[85]:
('000012.jpg', 12)

A defaultdict is useful any time you want a default value for missing dictionary keys. Here we create a dict from image IDs to a list of annotations (each a tuple of bounding box and class id).

We convert VOC's height/width into top-left/bottom-right, and switch x/y coords to be consistent with numpy.
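As a standalone sketch of the defaultdict behaviour mentioned above (the keys and values here are made up):

```python
import collections

anno = collections.defaultdict(lambda: [])  # missing keys default to an empty list
anno[12].append(('bbox_a', 7))              # no KeyError — the list is created on first access
anno[12].append(('bbox_b', 3))
print(len(anno[12]))                        # 2
print(anno[99])                             # [] — note that merely reading a key inserts it
print(len(anno))                            # 2 — keys 12 and 99
```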

In [86]:
def hw_bb(bb):
    # VOC/COCO-style bb = [x, y, width, height], e.g. [155, 96, 196, 174]
    # Returns fastai-style [top, left, bottom, right] (row/column order, inclusive)
    return np.array([ bb[1], bb[0], bb[3] + bb[1] - 1, bb[2] + bb[0] - 1 ])
In [106]:
# VOC's bbox: column (x coord of top left), row (y coord of top left), width, height
#ix   0    1   2    3
bb = [155, 96, 196, 174]
bb[1], bb[0], bb[3] + bb[1] - 1, bb[2] + bb[0] - 1
Out[106]:
(96, 155, 269, 350)
In [107]:
trn_anno = collections.defaultdict(lambda:[])

for o in trn_j[ANNOTATIONS]:
    if not o['ignore']:
        bb = o[BBOX] # one bbox. looks like '[155, 96, 196, 174]'.
        bb = hw_bb(bb)
        trn_anno[o[IMG_ID]].append( (bb, o[CAT_ID]) )

len(trn_anno)
Out[107]:
2501
In [115]:
# Test getting the first element from dict_values
list(trn_anno.values())[0]
Out[115]:
[(array([ 96, 155, 269, 350]), 7)]
In [117]:
print(im0_d[ID])

im_a = trn_anno[im0_d[ID]]
im_a
12
Out[117]:
[(array([ 96, 155, 269, 350]), 7)]
In [120]:
im0_a = im_a[0] # get first item (first bbox) from list. note: possible to have more than one bbox per image.
im0_a
Out[120]:
(array([ 96, 155, 269, 350]), 7)
In [121]:
cats[7]
Out[121]:
'car'
In [122]:
trn_anno[17]
Out[122]:
[(array([ 61, 184, 198, 278]), 15), (array([ 77,  89, 335, 402]), 13)]
In [123]:
cats[15], cats[13]
Out[123]:
('person', 'horse')

Some libs take VOC format bounding boxes, so this lets us convert back when required:

In [124]:
bb_voc = [155, 96, 196, 174]
bb_fastai = hw_bb(bb_voc)
bb_fastai
Out[124]:
array([ 96, 155, 269, 350])
In [125]:
def bb_hw(a):
    return np.array( [ a[1], a[0], a[3] - a[1] + 1, a[2] - a[0] + 1 ] )
In [126]:
f'expected: {bb_voc}, actual: {bb_hw(bb_fastai)}'
Out[126]:
'expected: [155, 96, 196, 174], actual: [155  96 196 174]'

You can use Visual Studio Code (vscode, an open source editor that comes with recent versions of Anaconda, or can be installed separately), or most other editors and IDEs, to find out all about the open_image function. Useful vscode features to know:

  • Command palette (Ctrl-shift-p)
  • Select interpreter (for fastai env)
  • Select terminal shell
  • Go to symbol (Ctrl-t)
  • Find references (Shift-F12)
  • Go to definition (F12)
  • Go back (alt-left)
  • View documentation
  • Hide sidebar (Ctrl-b)
  • Zen mode (Ctrl-k,z)
In [127]:
im = open_image(IMG_PATH / im0_d[FILE_NAME])

Matplotlib's plt.subplots is a really useful wrapper for creating plots, regardless of whether you have more than one subplot. Note that Matplotlib has an optional object-oriented API, which I think is much easier to understand and use (although few examples online use it!).

In [135]:
def show_img(im, figsize=None, ax=None):
    if not ax:
        fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(im)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    return ax
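As a sketch of the object-oriented API with more than one subplot (using random arrays in place of real images, and the Agg backend so it runs headless):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')                       # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

# plt.subplots returns the Figure plus an array of Axes objects,
# so each panel can be drawn on independently via its own ax.
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax in axes.flat:
    ax.imshow(np.random.rand(16, 16, 3))    # stand-in for a real image
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
print(axes.shape)                           # (2,)
```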

A simple but rarely used trick for making text visible regardless of background is to use white text with a black outline, or vice versa. Here's how to do it in matplotlib.

In [140]:
def draw_outline(o, lw):
    o.set_path_effects( [patheffects.Stroke(linewidth=lw, foreground='black'),
                          patheffects.Normal()] )

Note that * in argument lists is the splat operator. In this case it's a little shortcut compared to writing out b[-2],b[-1].
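A tiny standalone illustration of the splat shortcut, with a made-up bbox:

```python
b = [96, 155, 269, 350]             # hypothetical bbox, for illustration

def corner_and_size(x, y, w, h):
    return (x, y), (w, h)

# These two calls are identical — * expands a sequence into positional arguments:
print(corner_and_size(b[0], b[1], b[-2], b[-1]))
print(corner_and_size(*b[:2], *b[-2:]))     # → ((96, 155), (269, 350)) both times
```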

In [130]:
def draw_rect(ax, b):
    patch = ax.add_patch(patches.Rectangle(b[:2], *b[-2:], fill=False, edgecolor='white', lw=2))
    draw_outline(patch, 4)
In [330]:
def draw_text(ax, xy, txt, sz=14):
    text = ax.text(*xy, txt, verticalalignment='top', color='white', fontsize=sz, weight='bold')
    draw_outline(text, 1)
In [142]:
ax = show_img(im)
b = bb_hw(im0_a[0]) # convert bbox back to VOC format
draw_rect(ax, b)
draw_text(ax, b[:2], cats[im0_a[1]])

Packaging it all up

In [143]:
def draw_im(im, ann):
    # im is image, ann is annotations
    ax = show_img(im, figsize=(16, 8))
    for b, c in ann:
        # b is bbox, c is class id
        b = bb_hw(b)
        draw_rect(ax, b)
        draw_text(ax, b[:2], cats[c], sz=16)
In [146]:
def draw_idx(i):
    # i is image id
    im_a = trn_anno[i] # training annotations
    im = open_image(IMG_PATH / trn_fns[i]) # trn_fns is training image file names
    print(im.shape)
    draw_im(im, im_a) # im_a is an element of annotation
In [147]:
draw_idx(17) # image id is 17
(364, 480, 3)

Largest item classifier

A lambda function is simply a way to define an anonymous function inline. Here we use it to describe how to sort the annotation for each image - by bounding box size (descending).

In [244]:
def get_lrg(b):
    if not b:
        raise Exception()
    # Each x is a tuple, e.g.: (array([ 96, 155, 269, 350]), 16)
    # x[0] is the bbox array; x[0][:2] is the top-left corner, x[0][-2:] the bottom-right.
    # np.product(x[0][-2:] - x[0][:2]) is therefore the bbox area, e.g. 33735.
    b = sorted(b, key=lambda x: np.product(x[0][-2:] - x[0][:2]), reverse=True)
    return b[0] # the first element is now the largest bbox for this image
In [240]:
# Sanity check: compute the area two ways
np_prod = np.product(np.array([269, 350]) - np.array([96, 155]))
minus_mul = (269 - 96) * (350 - 155) # bbox area: (bottom - top) * (right - left)
print(np_prod)
assert np_prod == minus_mul
33735
In [ ]:
# for k, v in trn_anno.items():
#     print(f"k: {k}, v: {v}")
In [242]:
# a is image id (int), b is tuple of bbox (numpy array) & class id (int)
trn_lrg_anno = { a: get_lrg(b) for a, b in trn_anno.items() if (a != 0 and a != 1) }
In [263]:
trn_lrg_anno[23]
Out[263]:
(array([  1,   2, 461, 242]), 15)

Now we have a dictionary from image id to a single bounding box - the largest for that image.

In [259]:
def draw_largest_bbox(img_id):
    b, c = trn_lrg_anno[img_id] # trn_lrg_anno[img_id] is a (bbox, class id) tuple — tuple unpacking
    print(f'### DEBUG ### bbox: {b.tolist()}, class id: {c}') # print numpy.ndarray using tolist method.

    b = bb_hw(b) # convert the fastai bbox back to VOC format
    ax = show_img(open_image(IMG_PATH / trn_fns[img_id]), figsize=(5, 10))
    draw_rect(ax, b)
    draw_text(ax, b[:2], cats[c], sz=16)
In [283]:
img_id = 695
draw_largest_bbox(img_id)
### DEBUG ### bbox: [125, 108, 365, 414], class id: 13
In [273]:
(PATH / 'tmp').mkdir(exist_ok=True)
CSV = PATH / 'tmp/lrg.csv'

Often it's easiest to simply create a CSV of the data you want to model, rather than trying to create a custom dataset. Here we use Pandas to help us create a CSV of the image filename and class.

In [274]:
df = pd.DataFrame({ 'fn': [trn_fns[o] for o in trn_ids],
                    'cat': [cats[trn_lrg_anno[o][1]] for o in trn_ids] }, columns=['fn', 'cat'])
df.to_csv(CSV, index=False)
In [284]:
f_model = resnet34
sz = 224
bs = 64

From here it's just like Dogs vs Cats!

In [285]:
tfms = tfms_from_model(f_model, sz, aug_tfms=transforms_side_on, crop_type=CropType.NO)
md = ImageClassifierData.from_csv(PATH, JPEGS, CSV, tfms=tfms, bs=bs)
In [288]:
x, y = next(iter(md.val_dl))
In [291]:
show_img(md.val_ds.denorm(to_np(x))[0])
Out[291]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f64b9d0ba58>