Porting the Sports1M C3D Network to Keras

This notebook explains in detail the steps needed to port network parameters trained with a fork of Caffe to Keras. The network was trained on the Sports1M dataset using a Caffe fork (C3D) that implements 3D convolutions. More details can be found in the paper.

The first step is to obtain the network's parameters file, which is stored as a caffemodel. To read it, we need to generate the Python output of the proto definition from the C3D fork of Caffe.

To compile the proto file and read the caffemodel from Python, the protobuf library must be recompiled with a larger size limit. Protobuf is not designed to handle large amounts of data and by default refuses to parse serialized messages larger than 64MB. We can raise that limit with the following steps (a quick size check after the list illustrates why this is necessary):

  1. Clone the protobuf repository.
    git clone https://github.com/google/protobuf
    
  2. Change the following line of code.
diff --git a/src/google/protobuf/io/coded_stream.h b/src/google/protobuf/io/coded_stream.h
index c81a33a..eeb8863 100644
--- a/src/google/protobuf/io/coded_stream.h
+++ b/src/google/protobuf/io/coded_stream.h
@@ -609,7 +609,7 @@ class LIBPROTOBUF_EXPORT CodedInputStream {
   // Return the size of the buffer.
   int BufferSize() const;

-  static const int kDefaultTotalBytesLimit = 64 << 20;  // 64MB
+  static const int kDefaultTotalBytesLimit = 256 << 20;  // 256MB

   static const int kDefaultTotalBytesWarningThreshold = 32 << 20;  // 32MB
  3. Follow the protobuf build instructions to recompile the protoc compiler.

  4. Follow the Python protobuf installation instructions so the package is built with the increased read limit.

  5. Compile the caffe.proto file (found in the C3D fork under src/caffe/proto) for Python.

    protoc --python_out=. caffe.proto
    

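Once the limit has been raised and caffe.proto compiled, it helps to see why the default was a problem. A minimal check (assuming the caffemodel has already been downloaded to model/conv3d_deepnetA_sport1m_iter_1900000, the path used later in this notebook):

    import os
    # The serialized NetParameter is far larger than protobuf's default 64MB limit
    size_mb = os.path.getsize('model/conv3d_deepnetA_sport1m_iter_1900000') / float(1 << 20)
    print('caffemodel size: {:.0f} MB'.format(size_mb))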
Once protobuf is built, all the dependencies of Keras and Theano must also be installed (Theano is used as the Keras backend because it is the only one that supports 3D convolutions).
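Since Theano has to be the active backend, it can be selected before the first Keras import (a minimal sketch; alternatively set "backend": "theano" in ~/.keras/keras.json):

    import os
    # Select the Theano backend; this must happen before importing Keras
    os.environ['KERAS_BACKEND'] = 'theano'
    import keras  # prints: Using Theano backend.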

C3D model pre-trained on Sports1M

Following the Caffe prototxt definition of the C3D network trained on Sports1M, together with the paper, we define the same model in Keras.

In [1]:
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten
from keras.layers.convolutional import Convolution3D, MaxPooling3D, ZeroPadding3D
from keras.optimizers import SGD

def get_model(summary=False):
    """ Return the Keras model of the network
    """
    model = Sequential()
    # 1st layer group
    model.add(Convolution3D(64, 3, 3, 3, activation='relu', 
                            border_mode='same', name='conv1',
                            subsample=(1, 1, 1), 
                            input_shape=(3, 16, 112, 112)))
    model.add(MaxPooling3D(pool_size=(1, 2, 2), strides=(1, 2, 2), 
                           border_mode='valid', name='pool1'))
    # 2nd layer group
    model.add(Convolution3D(128, 3, 3, 3, activation='relu', 
                            border_mode='same', name='conv2',
                            subsample=(1, 1, 1)))
    model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=(2, 2, 2), 
                           border_mode='valid', name='pool2'))
    # 3rd layer group
    model.add(Convolution3D(256, 3, 3, 3, activation='relu', 
                            border_mode='same', name='conv3a',
                            subsample=(1, 1, 1)))
    model.add(Convolution3D(256, 3, 3, 3, activation='relu', 
                            border_mode='same', name='conv3b',
                            subsample=(1, 1, 1)))
    model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=(2, 2, 2), 
                           border_mode='valid', name='pool3'))
    # 4th layer group
    model.add(Convolution3D(512, 3, 3, 3, activation='relu', 
                            border_mode='same', name='conv4a',
                            subsample=(1, 1, 1)))
    model.add(Convolution3D(512, 3, 3, 3, activation='relu', 
                            border_mode='same', name='conv4b',
                            subsample=(1, 1, 1)))
    model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=(2, 2, 2), 
                           border_mode='valid', name='pool4'))
    # 5th layer group
    model.add(Convolution3D(512, 3, 3, 3, activation='relu', 
                            border_mode='same', name='conv5a',
                            subsample=(1, 1, 1)))
    model.add(Convolution3D(512, 3, 3, 3, activation='relu', 
                            border_mode='same', name='conv5b',
                            subsample=(1, 1, 1)))
    model.add(ZeroPadding3D(padding=(0, 1, 1)))
    model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=(2, 2, 2), 
                           border_mode='valid', name='pool5'))
    model.add(Flatten())
    # FC layers group
    model.add(Dense(4096, activation='relu', name='fc6'))
    model.add(Dropout(.5))
    model.add(Dense(4096, activation='relu', name='fc7'))
    model.add(Dropout(.5))
    model.add(Dense(487, activation='softmax', name='fc8'))
    if summary:
        print(model.summary())
    return model

model = get_model(summary=True)
Using Theano backend.
--------------------------------------------------------------------------------
Initial input shape: (None, 3, 16, 112, 112)
--------------------------------------------------------------------------------
Layer (name)                  Output Shape                  Param #             
--------------------------------------------------------------------------------
Convolution3D (conv1)         (None, 64, 16, 112, 112)      5248                
MaxPooling3D (pool1)          (None, 64, 16, 56, 56)        0                   
Convolution3D (conv2)         (None, 128, 16, 56, 56)       221312              
MaxPooling3D (pool2)          (None, 128, 8, 28, 28)        0                   
Convolution3D (conv3a)        (None, 256, 8, 28, 28)        884992              
Convolution3D (conv3b)        (None, 256, 8, 28, 28)        1769728             
MaxPooling3D (pool3)          (None, 256, 4, 14, 14)        0                   
Convolution3D (conv4a)        (None, 512, 4, 14, 14)        3539456             
Convolution3D (conv4b)        (None, 512, 4, 14, 14)        7078400             
MaxPooling3D (pool4)          (None, 512, 2, 7, 7)          0                   
Convolution3D (conv5a)        (None, 512, 2, 7, 7)          7078400             
Convolution3D (conv5b)        (None, 512, 2, 7, 7)          7078400             
ZeroPadding3D (zeropadding3d) (None, 512, 2, 9, 9)          0                   
MaxPooling3D (pool5)          (None, 512, 1, 4, 4)          0                   
Flatten (flatten)             (None, 8192)                  0                   
Dense (fc6)                   (None, 4096)                  33558528            
Dropout (dropout)             (None, 4096)                  0                   
Dense (fc7)                   (None, 4096)                  16781312            
Dropout (dropout)             (None, 4096)                  0                   
Dense (fc8)                   (None, 487)                   1995239             
--------------------------------------------------------------------------------
Total params: 79991015
--------------------------------------------------------------------------------
None

Loading parameters

Now let's load all the parameters. Keep in mind that this operation consumes a lot of memory (around 3GB of RAM) due to protobuf's inefficiency with large serialized objects.

Because Caffe stores the parameters in a different layout, some transformations must be applied to the weight matrices: each 2D kernel slice is rotated 180 degrees (Caffe computes cross-correlation, while Theano's convolution flips the kernels), and the fully connected weights are transposed (Caffe stores them as output x input, Keras expects input x output).
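For instance, a minimal NumPy check showing that the 180-degree rotation is just a flip along both spatial axes:

    import numpy as np
    k = np.arange(9).reshape(3, 3)
    # Rotating twice by 90 degrees equals reversing both axes
    assert (np.rot90(k, 2) == k[::-1, ::-1]).all()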

In [2]:
import caffe_pb2 as caffe
import numpy as np

p = caffe.NetParameter()
p.ParseFromString(
    open('model/conv3d_deepnetA_sport1m_iter_1900000', 'rb').read()
)

def rot90(W):
    """Rotate each 2D kernel slice by 180 degrees.

    Caffe stores kernels for cross-correlation, while Theano's
    convolution flips the kernels, so they must be pre-rotated.
    """
    for i in range(W.shape[0]):          # output channels
        for j in range(W.shape[1]):      # input channels
            for k in range(W.shape[2]):  # temporal length
                W[i, j, k] = np.rot90(W[i, j, k], 2)
    return W

params = []
# Indices of the conv and fully connected layers inside the
# caffemodel's layer list (the remaining entries are data, ReLU,
# pooling and dropout layers, which carry no weights).
conv_layers_indx = [1, 4, 7, 9, 12, 14, 17, 19]
fc_layers_indx = [22, 25, 28]

for i in conv_layers_indx:
    layer = p.layers[i]
    weights_b = np.array(layer.blobs[1].data, dtype=np.float32)
    weights_p = np.array(layer.blobs[0].data, dtype=np.float32).reshape(
        layer.blobs[0].num, layer.blobs[0].channels, layer.blobs[0].length,
        layer.blobs[0].height, layer.blobs[0].width
    )
    weights_p = rot90(weights_p)
    params.append([weights_p, weights_b])
for i in fc_layers_indx:
    layer = p.layers[i]
    weights_b = np.array(layer.blobs[1].data, dtype=np.float32)
    weights_p = np.array(layer.blobs[0].data, dtype=np.float32).reshape(
        layer.blobs[0].num, layer.blobs[0].channels, layer.blobs[0].length,
        layer.blobs[0].height, layer.blobs[0].width)[0,0,0,:,:].T
    params.append([weights_p, weights_b])
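As a sanity check (a minimal sketch, assuming the model defined above is still in memory), the kernel extracted for conv1 should match the shape Keras expects; both lines should print (64, 3, 3, 3, 3):

    print(params[0][0].shape)
    print(model.layers[0].get_weights()[0].shape)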

Now that all the parameters are loaded, let's assign them to the model.

In [3]:
model_layers_indx = [0, 2, 4, 5, 7, 8, 10, 11] + [15, 17, 19] #conv + fc
for i, j in zip(model_layers_indx, range(11)):
    model.layers[i].set_weights(params[j])

For future use, we save the model definition as JSON and its weights as HDF5.

You can also download them from here: model and weights.

In [4]:
import h5py  # Keras saves weights in HDF5 format, which requires h5py

model.save_weights('sports1M_weights.h5', overwrite=True)
json_string = model.to_json()
with open('sports1M_model.json', 'w') as f:
    f.write(json_string)

Testing

From now on, it is highly recommended to restart the kernel and reload the weights from the saved file; doing so requires only around 300MB of memory instead of 3GB. We also compile the model with a loss and the SGD optimizer. The particular choice does not affect testing (loss and optimizer only matter for training), but compiling is necessary so that Theano builds the tensor operations for the forward pass.

In [1]:
from keras.models import model_from_json

model = model_from_json(open('sports1M_model.json', 'r').read())
model.load_weights('sports1M_weights.h5')
model.compile(loss='mean_squared_error', optimizer='sgd')
Using Theano backend.

For testing we are going to load all the labels of the Sports1M dataset.

In [2]:
with open('dataset/labels.txt', 'r') as f:
    labels = [line.strip() for line in f.readlines()]
print('Total labels: {}'.format(len(labels)))
Total labels: 487

We are also going to load a video from the Sports1M dataset and pass it through the model. This requires OpenCV 3 built with ffmpeg support in order to read videos.

In [3]:
import cv2
import numpy as np

cap = cv2.VideoCapture('dM06AMFLsrc.mp4')

vid = []
while True:
    ret, img = cap.read()
    if not ret:
        break
    vid.append(cv2.resize(img, (171, 128)))  # resize to 171x128 (width x height)
vid = np.array(vid, dtype=np.float32)
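A quick shape check (the exact frame count depends on the video):

    # (num_frames, 128, 171, 3): frames, height, width, BGR channels
    print(vid.shape)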

Let's plot a frame of the video. As can be seen, the video corresponds to the label basketball.

In [4]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.imshow(vid[2000]/256)  # note: OpenCV loads frames as BGR, so colors may appear swapped
Out[4]:
<matplotlib.image.AxesImage at 0x122b014a8>

Now we extract a 16-frame clip from the video and crop its center to obtain a 3x16x112x112 input.

In [5]:
X = vid[2000:2016, 8:120, 30:142, :].transpose((3, 0, 1, 2))  # shape (3, 16, 112, 112)
output = model.predict_on_batch(np.array([X]))
In [6]:
plt.plot(output[0][0])
Out[6]:
[<matplotlib.lines.Line2D at 0x110e579e8>]
In [7]:
print('Position of maximum probability: {}'.format(output[0].argmax()))
print('Maximum probability: {:.5f}'.format(max(output[0][0])))
print('Corresponding label: {}'.format(labels[output[0].argmax()]))

# sort top five predictions from softmax output
top_inds = output[0][0].argsort()[::-1][:5]  # reverse sort and take five largest items
print('\nTop 5 probabilities and labels:')
_ = [print('{:.5f} {}'.format(output[0][0][i], labels[i])) for i in top_inds]
Position of maximum probability: 367
Maximum probability: 0.45910
Corresponding label: basketball

Top 5 probabilities and labels:
0.45910 basketball
0.39566 streetball
0.02090 greco-roman wrestling
0.01479 freestyle wrestling
0.01391 slamball

As the previous results show, the video has been classified correctly, with basketball as the predicted category.