For this project, I wanted a simple example that would give me more experience working with Convolutional Neural Networks. I decided to work on a project from scratch while keeping it fairly simple: start with nothing, create my own image dataset, and use it in a model. I settled on a model for detecting my hand using my webcam, with the goal of drawing a bounding box around my hand in videos captured from the webcam.
I began by creating a small dataset of images and a basic model. As I iterated on improving my results, I added more images and tweaked the model. The following notebook is the final version of the dataset and model that I used. The full directory of previous attempts and models can be found on my GitHub page. This specific notebook can be found here.
As can be seen in the two result videos below, the project was a success. Neither video was part of the training/validation/test sets.
from IPython.display import YouTubeVideo
display(YouTubeVideo('KGgFYSOsrSk', width=360, height=360))
display(YouTubeVideo('3mq0tE7Ft7Q', width=360, height=360))
The dataset was created using a Logitech C615 webcam. The video was taken at a resolution of 960x720, which was simply the default resolution of the webcam program I used; it wasn't chosen for any specific reason.
Over the course of the project, several videos were taken using the webcam. The video frames were then converted to images using ffmpeg. The dataset was constructed so that each image contained either one hand or no hand. Using labelImg, I then created bounding boxes around the hand in the images that contained one.
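The extraction step itself isn't part of this notebook; the command was something along these lines (the file names and frame rate here are assumptions, not the exact values used). The %05d pattern produces the zero-padded names (00001.jpg, ...) used throughout the notebook.

!ffmpeg -i hand_video.mp4 -vf fps=5 ./frames/%05d.jpg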
The videos were of two types: ones where my face/body were included in the frame, and ones showing just a hand and/or partial arm. The dataset also included negative examples: frames taken from the videos that contained no hand.
Data created from the videos was added in four phases. The first phase contained 786 positive examples and 155 negative examples. After the model performed poorly, I added a second phase of 275 positive examples and 22 negative examples. The model still performed poorly, so in the third phase I more than doubled the data by adding 1638 positive examples and 123 negative examples. The model was performing better at this point, but it still had trouble detecting the hand when it was near my face, so I added a fourth phase of 281 positive examples and 24 negative examples focusing on hands both near and far from my face. In total, the dataset contains 2980 positive examples and 324 negative examples.
The images were labeled by hand. Each image has five target values. The first value is a confidence value: 1 if the image contains a hand and 0 if it doesn't. The second through fifth values are two x,y coordinate pairs that make up the bounding box for the hand. These coordinates are all 0 if no hand is contained in the image.
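For illustration (the pixel values here are made up), the per-image target vector used later in the notebook looks like this:

[1.0, 152, 140, 402, 346]   #hand present: confidence, then y1, x1, y2, x2
[0, 0, 0, 0, 0]             #no hand: confidence 0 and an all-zero box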
After labeling the images, I flipped the entire dataset along the horizontal axis, doubling it to a total of 5960 positive examples and 648 negative examples. The bounding box annotations were updated to reflect the flip.
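The flip itself isn't shown in this notebook; below is a minimal sketch of the idea, assuming a left/right mirror and the center-based x, y, width, height annotation format used throughout (the names here are illustrative):

import cv2

IMAGE_WIDTH = 960

def flip_example(image, coords):
    #coords: {'x': center_x, 'y': center_y, 'width': w, 'height': h}
    flipped = cv2.flip(image, 1)  #flipCode=1 mirrors the image around the vertical axis
    flipped_coords = dict(coords)
    if coords['width'] > 0 and coords['height'] > 0:
        #mirror the box center; negative examples (all zeros) are left untouched
        flipped_coords['x'] = IMAGE_WIDTH - coords['x']
    return flipped, flipped_coords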
The video below displays a sample of the data along with targets from the training set. The target bounding box is drawn around the hand in red. If there is no hand in the image (0 confidence), no bounding box is drawn.
import os
#Turn off tensorflow messages
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
#Set second GPU as visible
os.environ["CUDA_VISIBLE_DEVICES"] = '1'
import keras
from keras.callbacks import EarlyStopping
import keras.backend as K
import tensorflow as tf
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import cv2
from IPython.display import HTML
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage
from imgaug import augmenters as iaa
from shutil import copyfile
def play_images_from_directory(directory, json_file, num_to_iter=1, model=None, prediction_threshold=0.5):
    #num_to_iter: number of images to skip when iterating to create video
    assert json_file is not None
    assert os.path.isdir(directory)
    assert os.path.isfile(directory + json_file)
    fig, ax = plt.subplots(figsize=(5,5))
    if json_file is not None:
        json_file = open(directory + json_file)
        json_data = json.load(json_file)
        annotation_dict = {}
        for entry in json_data:
            width = entry['annotations'][0]['coordinates']['width']
            height = entry['annotations'][0]['coordinates']['height']
            x1 = entry['annotations'][0]['coordinates']['x'] - width / 2
            y1 = entry['annotations'][0]['coordinates']['y'] - height / 2
            x2 = x1 + width
            y2 = y1 + height
            file_name = entry['image']
            confidence = 0 if width == 0 and height == 0 and x1 == 0 and y1 == 0 else 1.0
            annotation_dict[file_name] = [confidence, y1, x1, y2, x2]
    animation_images = []
    iter_count = 0
    for file_name in os.listdir(directory):
        if iter_count % num_to_iter != 0:
            iter_count += 1
            continue
        if file_name.endswith('.jpg'):
            image = cv2.imread(directory + file_name)
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            if json_file is not None:
                annotation = annotation_dict[file_name]
                if annotation[0] == 1.0:
                    y1 = annotation[1]
                    x1 = annotation[2]
                    y2 = annotation[3]
                    x2 = annotation[4]
                    cv2.rectangle(image, (int(x1), int(y1)), (int(x2), int(y2)), (255, 0, 0), 2)
            if model is not None:
                #If a model is passed in, do prediction and display some metric data
                image_copy = image.copy() / 255.0
                image_copy = cv2.resize(image_copy, (224, 224))
                prediction = model.predict(np.reshape(image_copy, (1, 224, 224, 3)))
                conf_pred = prediction[0][0][0]
                iou_result = 0.0
                if conf_pred > prediction_threshold:
                    pred_y1 = prediction[1][0][0] * 720
                    pred_x1 = prediction[1][0][1] * 960
                    pred_y2 = prediction[1][0][2] * 720
                    pred_x2 = prediction[1][0][3] * 960
                    cv2.rectangle(image, (int(pred_x1), int(pred_y1)), (int(pred_x2), int(pred_y2)), (0, 255, 0), 2)
                    iou_metric = IOU_Metric()
                    y_true = tf.convert_to_tensor([annotation[1:]], dtype=tf.float32)
                    y_pred = tf.convert_to_tensor([[pred_y1, pred_x1, pred_y2, pred_x2]], dtype=tf.float32)
                    iou_metric.update_state(y_true, y_pred)
                    iou_result = K.get_value(iou_metric.result())
                cv2.putText(image, 'Conf: {:.2f}'.format(conf_pred), (2, 36), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (255, 255, 0), 2, cv2.LINE_AA)
                cv2.putText(image, 'IoU: {:.2f}'.format(iou_result), (2, 80), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (255, 255, 0), 2, cv2.LINE_AA)
            animation_images.append([ax.imshow(image)])
        iter_count += 1
    ani = animation.ArtistAnimation(fig, animation_images, interval=250, blit=True)
    plt.close()
    return HTML(ani.to_html5_video())
display(play_images_from_directory('./data/train/positive/', 'annotations.json', num_to_iter=150))
display(play_images_from_directory('./data/train/negative/', 'annotations.json', num_to_iter=15))
The initial training set by itself wasn't enough to produce a good model. To increase the dataset size, I augmented the training images using imgaug. For each image in the training set, a random augmentation was applied and the new image saved. This was repeated over the entire training set multiple times until the set was large enough.
Augmentations included:

- scaling (75% to 95% of the original size)
- translation (-20% to +20%)
- rotation (-10 to +10 degrees)
- shearing (-10 to +10 degrees)

A random perspective transform is applied as well. Each augmented image applies a random value from each of the four augmentations, so even though the same starting image may be augmented multiple times, each outcome should be different.
Some augmentations leave sections along the edge of the image with no information. To fill these blank spaces, the last pixel value at the image edge is extended out over the missing area.
It is also possible for an augmentation to move the hand out of the image. To correct for this, only images where the augmented bounding box is at least 75% contained within the image are kept. For example, if a hand and its bounding box are in the top-left corner and the augmentation rotates the image left, the hand could be rotated out of the new image. If more than 25% of the augmented bounding box falls outside the image, that augmentation is discarded and a new one is generated.
Below, a video shows what the augmented images look like.
def create_augmented_images(directory_list, json_list, save_directory, num_augmented, bounding_percentage=0.25):
    #bounding_percentage: fraction of the augmented bounding box allowed outside the image before it's thrown out
    def generate_augmented_image(image, annotation):
        success = False
        while not success:
            bounding_box = BoundingBoxesOnImage([BoundingBox(x1=annotation[2],
                                                             x2=annotation[4],
                                                             y1=annotation[1],
                                                             y2=annotation[3],
                                                             label='hand' if annotation[0] == 1.0 else 0)],
                                                shape=image.shape)
            seq = iaa.Sequential([iaa.Affine(scale=(0.75, 0.95),
                                             translate_percent=(-0.2, 0.2),
                                             rotate=(-10, 10),
                                             shear=(-10, 10),
                                             mode='edge'),
                                  iaa.PerspectiveTransform(scale=(0.01, 0.15), mode='replicate')])
            image_aug, bb_aug = seq(images=[image], bounding_boxes=bounding_box)
            fraction = bb_aug.bounding_boxes[0].compute_out_of_image_fraction(image_aug[0])
            if fraction <= bounding_percentage:
                #bounding box is sufficiently inside the image
                success = True
        bb = bb_aug.bounding_boxes[0].clip_out_of_image(image_aug[0])
        y1 = bb.y1_int
        x1 = bb.x1_int
        y2 = bb.y2_int
        x2 = bb.x2_int
        augmented_image = image_aug[0]
        if bb_aug.bounding_boxes[0].label == 'hand':
            augmented_annotation = [1.0, y1, x1, y2, x2]
        else:
            augmented_annotation = [0, 0, 0, 0, 0]
        return augmented_image, augmented_annotation
    def get_annotation(entry):
        width = entry['annotations'][0]['coordinates']['width']
        height = entry['annotations'][0]['coordinates']['height']
        x = entry['annotations'][0]['coordinates']['x'] - width / 2
        y = entry['annotations'][0]['coordinates']['y'] - height / 2
        if width == 0 and height == 0:
            confidence = 0
        else:
            confidence = 1
        return [confidence, y, x, y + height, x + width]
    def get_json_entry(json_file_name, json_annotation):
        width = json_annotation[4] - json_annotation[2]
        height = json_annotation[3] - json_annotation[1]
        x = json_annotation[2] + width / 2
        y = json_annotation[1] + height / 2
        return {'image' : json_file_name,
                'annotations' : [{'label' : 'hand',
                                  'coordinates' : {'x' : float(x),
                                                   'y' : float(y),
                                                   'width' : float(width),
                                                   'height' : float(height)}}]}
    json_annotation_file = []
    augment_count = 0
    while augment_count < num_augmented:
        current_count = 0
        for i, directory in enumerate(directory_list):
            print('Augmenting directory: ', directory)
            if augment_count >= num_augmented:
                print(' Number augmented: ', current_count)
                break
            image = None
            annotation = None
            if json_list[i] is not None:
                #files are positive examples with a json file containing annotations
                json_file = open(directory + json_list[i])
                json_data = json.load(json_file)
                for entry in json_data:
                    file_name = entry['image']
                    image = cv2.imread(directory + file_name)
                    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                    annotation = get_annotation(entry)
                    augmented_image, augmented_annotation = generate_augmented_image(image, annotation)
                    augmented_image = cv2.cvtColor(augmented_image, cv2.COLOR_RGB2BGR)
                    save_name = str(augment_count + 1).zfill(5) + '.jpg'
                    cv2.imwrite(save_directory + save_name, augmented_image)
                    json_annotation_file.append(get_json_entry(save_name, augmented_annotation))
                    augment_count += 1
                    current_count += 1
                    if augment_count == num_augmented:
                        break
            else:
                #files are negative examples with no json file
                for file_name in os.listdir(directory):
                    if file_name.endswith('.jpg'):
                        image = cv2.imread(directory + file_name)
                        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                        annotation = [0, 0, 0, 0, 0]
                        augmented_image, augmented_annotation = generate_augmented_image(image, annotation)
                        augmented_image = cv2.cvtColor(augmented_image, cv2.COLOR_RGB2BGR)
                        save_name = str(augment_count + 1).zfill(5) + '.jpg'
                        cv2.imwrite(save_directory + save_name, augmented_image)
                        json_annotation_file.append(get_json_entry(save_name, augmented_annotation))
                        augment_count += 1
                        current_count += 1
                        if augment_count == num_augmented:
                            break
            print(' Number augmented in {}: {}'.format(directory, current_count))
            current_count = 0
    print('Total augmented: {}'.format(len(json_annotation_file)))
    assert len(json_annotation_file) > 0
    json_dict = json.dumps(json_annotation_file)
    f = open(save_directory + 'annotations.json', 'w')
    f.write(json_dict)
    f.close()
!mkdir -p ./data/augmented_tmp/positive
create_augmented_images(['./data/train/positive/'], ['annotations.json'], './data/augmented_tmp/positive/', 30000)
!mkdir -p ./data/augmented_tmp/negative
create_augmented_images(['./data/train/negative/'], [None], './data/augmented_tmp/negative/', 10000)
display(play_images_from_directory('./data/augmented_tmp/positive/', 'annotations.json', num_to_iter=750))
display(play_images_from_directory('./data/augmented_tmp/negative/', 'annotations.json', num_to_iter=200))
Augmenting directory: ./data/train/positive/ Number augmented in ./data/train/positive/: 5960 Augmenting directory: ./data/train/positive/ Number augmented in ./data/train/positive/: 5960 Augmenting directory: ./data/train/positive/ Number augmented in ./data/train/positive/: 5960 Augmenting directory: ./data/train/positive/ Number augmented in ./data/train/positive/: 5960 Augmenting directory: ./data/train/positive/ Number augmented in ./data/train/positive/: 5960 Augmenting directory: ./data/train/positive/ Number augmented in ./data/train/positive/: 200 Total augmented: 30000 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 648 Augmenting directory: ./data/train/negative/ Number augmented in ./data/train/negative/: 280 Total augmented: 10000
The section below combines the original data with the augmented data into a separate directory to be used for training.
def copy_image_directories(image_directory_list, json_list, copy_directory):
    assert len(image_directory_list) == len(json_list)
    assert os.path.isdir(copy_directory)
    for i in range(len(image_directory_list)):
        assert os.path.isdir(image_directory_list[i])
        if json_list[i] is not None:
            assert os.path.isfile(image_directory_list[i] + json_list[i])
    file_count = 0
    file_names = []
    file_in_paths = []
    file_out_paths = []
    target_values = []
    for i in range(len(image_directory_list)):
        #set of positive examples or negative examples with json file
        if json_list[i] is not None:
            json_file = open(image_directory_list[i] + json_list[i])
            json_data = json.load(json_file)
            for entry in json_data:
                file_name = entry['image']
                width = entry['annotations'][0]['coordinates']['width']
                height = entry['annotations'][0]['coordinates']['height']
                x = entry['annotations'][0]['coordinates']['x']
                y = entry['annotations'][0]['coordinates']['y']
                file_out_name = str(file_count+1).zfill(5) + '.jpg'
                file_names.append(file_out_name)
                file_count += 1
                file_in_paths.append(image_directory_list[i] + file_name)
                file_out_paths.append(copy_directory + file_out_name)
                target_values.append([x, y, width, height])
        #set of negative examples with no json file
        else:
            for file_name in os.listdir(image_directory_list[i]):
                if file_name.endswith('.jpg'):
                    file_out_name = str(file_count+1).zfill(5) + '.jpg'
                    file_names.append(file_out_name)
                    file_count += 1
                    file_in_paths.append(image_directory_list[i] + file_name)
                    file_out_paths.append(copy_directory + file_out_name)
                    target_values.append([0, 0, 0, 0])
    assert len(file_in_paths) == len(file_out_paths)
    assert len(file_in_paths) == len(target_values)
    json_data = []
    for i in range(len(target_values)):
        copyfile(file_in_paths[i], file_out_paths[i])
        x = target_values[i][0]
        y = target_values[i][1]
        width = target_values[i][2]
        height = target_values[i][3]
        json_data.append({
            'image' : file_names[i],
            'annotations' : [
                {
                    'label' : 'hand',
                    'coordinates' : {
                        'x' : x,
                        'y' : y,
                        'width' : width,
                        'height' : height}}]})
    assert len(json_data) > 0
    json_dict = json.dumps(json_data)
    f = open(copy_directory + 'annotations.json', 'w')
    f.write(json_dict)
    f.close()
!mkdir -p ./data/train_augmented
copy_image_directories(['./data/train/positive/', './data/train/negative/', './data/augmented_tmp/positive/', './data/augmented_tmp/negative/'], ['annotations.json'] * 4, './data/train_augmented/')
!rm -rf ./data/augmented_tmp
A dataset made up of image files can be quite large, and loading it all at once often causes memory issues. To prevent that in this project, I extend the Keras Sequence class to create a data generator object. The generator loads only the images needed for the current batch rather than storing the entire dataset in memory, and at the end of each epoch it shuffles the dataset for the next one. For each image that is loaded, the pixel values are divided by 255 so all values are between 0 and 1.
The data generator also requires a Pandas DataFrame containing the file path and target variables for each image. Below, I define the DataGenerator class and a dataframe each for the training, validation, and test data. At the end, a preview of each dataframe is shown with the filename, confidence, and scaled x,y coordinates.
IMAGE_HEIGHT = 720
IMAGE_WIDTH = 960
class DataGenerator(keras.utils.Sequence):
    def __init__(self, df, y_cols, directory=None, batch_size=32, dim=(224, 224), n_channels=3, shuffle=True):
        self.df = df.reset_index(drop=True)
        self.y_cols = y_cols
        self.directory = directory
        self.batch_size = batch_size
        self.dim = dim
        self.n_channels = n_channels
        self.shuffle = shuffle
        self.on_epoch_end()
    def __len__(self):
        return int(np.floor(self.df.shape[0] / self.batch_size))
    def __getitem__(self, index):
        indexes = self.df.index.tolist()[index*self.batch_size:(index+1)*self.batch_size]
        X, y = self.__data_generation(indexes)
        return X, y
    def on_epoch_end(self):
        if self.shuffle:
            self.df = self.df.sample(frac=1.0)
    def __data_generation(self, indexes):
        X = np.empty((self.batch_size, *self.dim, self.n_channels), dtype=np.float32)
        for i, index in enumerate(indexes):
            if self.directory is not None:
                file_path = self.directory + self.df.iloc[index]['id']
            else:
                file_path = self.df.iloc[index]['id']
            image = cv2.imread(file_path)
            image = cv2.resize(image, self.dim)
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            image = image.astype(np.float32) / 255.0
            X[i,] = image
        y = self.df.iloc[indexes,:][self.y_cols].to_numpy()
        return X, [y[:,0], y[:,1:]]
    def get_total_length(self):
        return self.df.shape[0]
    def get_entry(self, index):
        #returns the image and targets for a single index
        if index >= self.df.shape[0]:
            return None, None, None
        X, y = self.__data_generation([index])
        return X[0].copy(), y[0][0].copy(), y[1][0].copy()
    def get_positive_count(self):
        #return the number of positive training examples
        if 1 in self.df['confidence'].value_counts():
            return int(self.df['confidence'].value_counts()[1])
        else:
            return 0
    def get_negative_count(self):
        #return the number of negative training examples
        if 0 in self.df['confidence'].value_counts():
            return int(self.df['confidence'].value_counts()[0])
        else:
            return 0
def load_dataframe(directory, json_file):
    assert json_file is not None
    assert os.path.isdir(directory)
    assert os.path.isfile(directory + json_file)
    file_names = []
    confidence_values = []
    y1_values, x1_values, y2_values, x2_values = [], [], [], []
    json_file = open(directory + json_file)
    json_data = json.load(json_file)
    for entry in json_data:
        #annotations store the center of the box; convert to the top-left corner, then compute the bottom-right
        width = entry['annotations'][0]['coordinates']['width']
        height = entry['annotations'][0]['coordinates']['height']
        x = entry['annotations'][0]['coordinates']['x'] - width / 2
        y = entry['annotations'][0]['coordinates']['y'] - height / 2
        file_name = entry['image']
        confidence_value = 1.0
        if width == 0 and height == 0 and x == 0 and y == 0:
            confidence_value = float(0)
        confidence_values.append(confidence_value)
        y1_values.append(y)
        x1_values.append(x)
        y2_values.append(y + height)
        x2_values.append(x + width)
        file_names.append(file_name)
    df = pd.DataFrame(data={'id' : file_names,
                            'confidence' : confidence_values,
                            'y1' : y1_values,
                            'x1' : x1_values,
                            'y2' : y2_values,
                            'x2' : x2_values})
    df['y1'] = df['y1'].apply(np.round)
    df['x1'] = df['x1'].apply(np.round)
    df['y2'] = df['y2'].apply(np.round)
    df['x2'] = df['x2'].apply(np.round)
    #sanity check that all target values fall within the correct range
    for index, row in df.iterrows():
        assert row['confidence'] == 1.0 or row['confidence'] == 0
        assert row['y1'] >= 0 and row['y1'] <= 720
        assert row['x1'] >= 0 and row['x1'] <= 960
        assert row['y2'] >= 0 and row['y2'] <= 720
        assert row['x2'] >= 0 and row['x2'] <= 960
    return df
#load dataframes with filenames and target data, then scale x,y coordinate data to be between 0 and 1
df_train = load_dataframe('./data/train_augmented/', 'annotations.json')
df_train['y1'] = df_train['y1'] / IMAGE_HEIGHT
df_train['x1'] = df_train['x1'] / IMAGE_WIDTH
df_train['y2'] = df_train['y2'] / IMAGE_HEIGHT
df_train['x2'] = df_train['x2'] / IMAGE_WIDTH
df_validation = load_dataframe('./data/validation/', 'annotations.json')
df_validation['y1'] = df_validation['y1'] / IMAGE_HEIGHT
df_validation['x1'] = df_validation['x1'] / IMAGE_WIDTH
df_validation['y2'] = df_validation['y2'] / IMAGE_HEIGHT
df_validation['x2'] = df_validation['x2'] / IMAGE_WIDTH
df_test = load_dataframe('./data/test/', 'annotations.json')
df_test['y1'] = df_test['y1'] / IMAGE_HEIGHT
df_test['x1'] = df_test['x1'] / IMAGE_WIDTH
df_test['y2'] = df_test['y2'] / IMAGE_HEIGHT
df_test['x2'] = df_test['x2'] / IMAGE_WIDTH
y_cols = ['confidence', 'y1', 'x1', 'y2', 'x2']
train_datagen = DataGenerator(df_train, y_cols, directory='./data/train_augmented/')
validation_datagen = DataGenerator(df_validation, y_cols, directory='./data/validation/')
test_datagen = DataGenerator(df_test, y_cols, directory='./data/test/')
display(df_train, df_validation, df_test)
 | id | confidence | y1 | x1 | y2 | x2 |
---|---|---|---|---|---|---|
0 | 00001.jpg | 1.0 | 0.211111 | 0.145833 | 0.551389 | 0.360417 |
1 | 00002.jpg | 1.0 | 0.200000 | 0.145833 | 0.559722 | 0.359375 |
2 | 00003.jpg | 1.0 | 0.197222 | 0.143750 | 0.558333 | 0.360417 |
3 | 00004.jpg | 1.0 | 0.200000 | 0.145833 | 0.562500 | 0.358333 |
4 | 00005.jpg | 1.0 | 0.195833 | 0.145833 | 0.565278 | 0.358333 |
... | ... | ... | ... | ... | ... | ... |
46603 | 46604.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
46604 | 46605.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
46605 | 46606.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
46606 | 46607.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
46607 | 46608.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
46608 rows × 6 columns
 | id | confidence | y1 | x1 | y2 | x2 |
---|---|---|---|---|---|---|
0 | 00001.jpg | 1.0 | 0.329167 | 0.312500 | 0.686111 | 0.535417 |
1 | 00002.jpg | 1.0 | 0.354167 | 0.359375 | 0.705556 | 0.538542 |
2 | 00003.jpg | 1.0 | 0.352778 | 0.369792 | 0.713889 | 0.547917 |
3 | 00004.jpg | 1.0 | 0.358333 | 0.400000 | 0.731944 | 0.562500 |
4 | 00005.jpg | 1.0 | 0.381944 | 0.415625 | 0.708333 | 0.569792 |
... | ... | ... | ... | ... | ... | ... |
443 | 00444.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
444 | 00445.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
445 | 00446.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
446 | 00447.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
447 | 00448.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
448 rows × 6 columns
 | id | confidence | y1 | x1 | y2 | x2 |
---|---|---|---|---|---|---|
0 | 00001.jpg | 1.0 | 0.609722 | 0.625000 | 0.875000 | 0.837500 |
1 | 00002.jpg | 1.0 | 0.388889 | 0.485417 | 0.648611 | 0.677083 |
2 | 00003.jpg | 1.0 | 0.283333 | 0.473958 | 0.568056 | 0.662500 |
3 | 00004.jpg | 1.0 | 0.302778 | 0.395833 | 0.572222 | 0.560417 |
4 | 00005.jpg | 1.0 | 0.305556 | 0.421875 | 0.587500 | 0.591667 |
... | ... | ... | ... | ... | ... | ... |
305 | 00306.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
306 | 00307.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
307 | 00308.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
308 | 00309.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
309 | 00310.jpg | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
310 rows × 6 columns
The model in this project makes use of transfer learning. The first several layers come from a pretrained VGG19 model from the keras package. The weights for the VGG19 layers are frozen and can't be changed. Attached to the VGG19 layers are several Dense layers, which are trainable; during training, only their weights will be updated.
The input to the model is a 224x224, 3-channel image. The outputs are five values between 0 and 1: the first is the confidence that the image contains a hand, and the remaining four are the x,y pairs that mark the bounding box.
The model has two separate outputs: the confidence and the bounding box. For the confidence value, a simple binary accuracy metric will be used. Since the bounding box is a rectangle drawn from the four remaining outputs, a standard metric such as mean squared error doesn't say much about how well the predicted box matches the true one.
A good measure for comparing bounding boxes is Intersection Over Union, or IOU, which divides the area of the intersection of the two boxes by the area of their union.
$$ IOU = \frac{|A \cap B|}{|A \cup B| + \epsilon} $$

When the two bounding boxes perfectly overlap, the value is very close to 1. When they do not overlap at all, the intersection is zero and the IOU is 0. For negative examples, both boxes are all zeros, so the union is also zero; adding a small epsilon to the denominator prevents a divide-by-zero error, and the value returned is 0.
class IOU_Metric(keras.metrics.Metric):
    def __init__(self, name='iou', **kwargs):
        super().__init__(**kwargs)
        self.iou_values = self.add_weight(name='iou_values', initializer='zeros', dtype=tf.float32)
        self.total = tf.Variable(0, dtype=tf.float32)
    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true_area = K.abs(K.transpose(y_true)[2] - K.transpose(y_true)[0]) * K.abs(K.transpose(y_true)[3] - K.transpose(y_true)[1])
        y_pred_area = K.abs(K.transpose(y_pred)[2] - K.transpose(y_pred)[0]) * K.abs(K.transpose(y_pred)[3] - K.transpose(y_pred)[1])
        x1 = K.maximum(K.transpose(y_true)[1], K.transpose(y_pred)[1])
        y1 = K.maximum(K.transpose(y_true)[0], K.transpose(y_pred)[0])
        x2 = K.minimum(K.transpose(y_true)[3], K.transpose(y_pred)[3])
        y2 = K.minimum(K.transpose(y_true)[2], K.transpose(y_pred)[2])
        intersection = K.maximum(0.0, x2 - x1) * K.maximum(0.0, y2 - y1)
        union = y_true_area + y_pred_area - intersection
        iou = intersection / (union + K.epsilon())
        iou = K.clip(iou, 0, 1.0)
        self.iou_values.assign_add(K.sum(iou))
        self.total.assign_add(K.sum(K.cast(K.cast(iou, dtype=tf.bool), dtype=tf.float32)))
    def reset_states(self):
        super().reset_states()
        self.total.assign(0)
    def result(self):
        if self.total == 0:
            return tf.constant(0, dtype=tf.float32)
        return self.iou_values / self.total
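As a quick sanity check of the metric (the boxes below are made up, using the same [y1, x1, y2, x2] layout as the targets):

iou_check = IOU_Metric()
y_true_check = tf.constant([[0.2, 0.2, 0.6, 0.6]], dtype=tf.float32)
y_pred_check = tf.constant([[0.3, 0.3, 0.7, 0.7]], dtype=tf.float32)  #true box shifted by 0.1
iou_check.update_state(y_true_check, y_pred_check)
print(K.get_value(iou_check.result()))  #intersection 0.09 / union 0.23, roughly 0.39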
The model trains with a batch size of 32 and a learning rate of 0.00005, for a maximum of 100 epochs. Training uses the keras EarlyStopping callback, which monitors a metric and can terminate training before the maximum number of epochs is reached. Here the validation IOU metric is monitored: when it fails to improve for 10 epochs in a row, training stops and the best previous weights are restored.
The layers attached to the frozen VGG19 model will be as follows:

- Flatten
- Dense, 1024 units, ReLU activation + Dropout (0.1)
- Dense, 512 units, ReLU activation + Dropout (0.1)
- Dense, 256 units, ReLU activation + Dropout (0.1)
- Two parallel Dense output layers: 1 unit with sigmoid activation for the confidence, and 4 units with sigmoid activation for the bounding box

The final layer is made up of two side-by-side layers. I split the final layer to be able to use separate metrics for monitoring the confidence and the bounding box.
BATCH_SIZE = 32
ES_PATIENCE = 10
ES_RESTORE_WEIGHTS = True
EPOCHS = 100
LEARNING_RATE = 0.00005
vgg = keras.applications.VGG19(weights='imagenet', include_top=False, input_tensor=keras.layers.Input(shape=(224, 224, 3)))
vgg.trainable = False
flatten = vgg.output
flatten = keras.layers.Flatten()(flatten)
hidden = keras.layers.Dense(1024, activation='relu')(flatten)
hidden = keras.layers.Dropout(0.1)(hidden)
hidden = keras.layers.Dense(512, activation='relu')(hidden)
hidden = keras.layers.Dropout(0.1)(hidden)
hidden = keras.layers.Dense(256, activation='relu')(hidden)
hidden = keras.layers.Dropout(0.1)(hidden)
confHead = keras.layers.Dense(1, activation='sigmoid', name='confidence_output')(hidden)
bbHead = keras.layers.Dense(4, activation='sigmoid', name='bounding_output')(hidden)
model = keras.models.Model(inputs=vgg.input, outputs=[confHead, bbHead])
opt = keras.optimizers.Adam(learning_rate=LEARNING_RATE)
model.compile(loss='mse', optimizer=opt, metrics=[[keras.metrics.BinaryAccuracy()], [IOU_Metric()]])
es = EarlyStopping(monitor='val_bounding_output_iou__metric', mode='max', verbose=1, patience=ES_PATIENCE, restore_best_weights=ES_RESTORE_WEIGHTS)
history = model.fit(train_datagen,
                    steps_per_epoch=len(train_datagen),
                    validation_data=validation_datagen,
                    validation_steps=len(validation_datagen),
                    epochs=EPOCHS,
                    callbacks=[es])
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5 80142336/80134624 [==============================] - 1s 0us/step Epoch 1/100 1456/1456 [==============================] - 309s 202ms/step - loss: 0.0668 - confidence_output_loss: 0.0376 - bounding_output_loss: 0.0292 - confidence_output_binary_accuracy: 0.9491 - bounding_output_iou__metric: 0.3633 - val_loss: 0.0054 - val_confidence_output_loss: 0.0010 - val_bounding_output_loss: 0.0044 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.5699 Epoch 2/100 1456/1456 [==============================] - 295s 203ms/step - loss: 0.0100 - confidence_output_loss: 0.0038 - bounding_output_loss: 0.0062 - confidence_output_binary_accuracy: 0.9952 - bounding_output_iou__metric: 0.5178 - val_loss: 0.0057 - val_confidence_output_loss: 0.0020 - val_bounding_output_loss: 0.0037 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.6030 Epoch 3/100 1456/1456 [==============================] - 286s 196ms/step - loss: 0.0060 - confidence_output_loss: 0.0018 - bounding_output_loss: 0.0042 - confidence_output_binary_accuracy: 0.9980 - bounding_output_iou__metric: 0.5669 - val_loss: 0.0085 - val_confidence_output_loss: 0.0035 - val_bounding_output_loss: 0.0050 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.5443 Epoch 4/100 1456/1456 [==============================] - 285s 196ms/step - loss: 0.0055 - confidence_output_loss: 0.0017 - bounding_output_loss: 0.0037 - confidence_output_binary_accuracy: 0.9980 - bounding_output_iou__metric: 0.5845 - val_loss: 0.0060 - val_confidence_output_loss: 0.0023 - val_bounding_output_loss: 0.0037 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.6033 Epoch 5/100 1456/1456 [==============================] - 283s 194ms/step - loss: 0.0045 - confidence_output_loss: 0.0013 - bounding_output_loss: 0.0032 - confidence_output_binary_accuracy: 0.9986 - bounding_output_iou__metric: 0.6035 - val_loss: 0.0070 - val_confidence_output_loss: 0.0035 - val_bounding_output_loss: 0.0035 - val_confidence_output_binary_accuracy: 0.9955 - val_bounding_output_iou__metric: 0.6477 Epoch 6/100 1456/1456 [==============================] - 282s 194ms/step - loss: 0.0037 - confidence_output_loss: 9.2979e-04 - bounding_output_loss: 0.0028 - confidence_output_binary_accuracy: 0.9989 - bounding_output_iou__metric: 0.6194 - val_loss: 0.0056 - val_confidence_output_loss: 0.0019 - val_bounding_output_loss: 0.0037 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.5713 Epoch 7/100 1456/1456 [==============================] - 282s 194ms/step - loss: 0.0033 - confidence_output_loss: 7.6830e-04 - bounding_output_loss: 0.0026 - confidence_output_binary_accuracy: 0.9992 - bounding_output_iou__metric: 0.6306 - val_loss: 0.0136 - val_confidence_output_loss: 0.0092 - val_bounding_output_loss: 0.0044 - val_confidence_output_binary_accuracy: 0.9888 - val_bounding_output_iou__metric: 0.6373 Epoch 8/100 1456/1456 [==============================] - 282s 194ms/step - loss: 0.0028 - confidence_output_loss: 6.5037e-04 - bounding_output_loss: 0.0021 - confidence_output_binary_accuracy: 0.9992 - bounding_output_iou__metric: 0.6542 - val_loss: 0.0038 - val_confidence_output_loss: 2.2703e-04 - val_bounding_output_loss: 0.0036 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.5980 Epoch 9/100 
1456/1456 [==============================] - 280s 192ms/step - loss: 0.0037 - confidence_output_loss: 0.0011 - bounding_output_loss: 0.0026 - confidence_output_binary_accuracy: 0.9984 - bounding_output_iou__metric: 0.6376 - val_loss: 0.0096 - val_confidence_output_loss: 0.0054 - val_bounding_output_loss: 0.0042 - val_confidence_output_binary_accuracy: 0.9955 - val_bounding_output_iou__metric: 0.6249 Epoch 10/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0030 - confidence_output_loss: 7.1961e-04 - bounding_output_loss: 0.0022 - confidence_output_binary_accuracy: 0.9991 - bounding_output_iou__metric: 0.6506 - val_loss: 0.0036 - val_confidence_output_loss: 6.1950e-04 - val_bounding_output_loss: 0.0030 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6558 Epoch 11/100 1456/1456 [==============================] - 279s 192ms/step - loss: 0.0017 - confidence_output_loss: 1.6699e-04 - bounding_output_loss: 0.0015 - confidence_output_binary_accuracy: 0.9998 - bounding_output_iou__metric: 0.6869 - val_loss: 0.0029 - val_confidence_output_loss: 8.8711e-05 - val_bounding_output_loss: 0.0028 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6564 Epoch 12/100 1456/1456 [==============================] - 281s 193ms/step - loss: 0.0021 - confidence_output_loss: 3.8162e-04 - bounding_output_loss: 0.0017 - confidence_output_binary_accuracy: 0.9996 - bounding_output_iou__metric: 0.6762 - val_loss: 0.0163 - val_confidence_output_loss: 0.0096 - val_bounding_output_loss: 0.0066 - val_confidence_output_binary_accuracy: 0.9799 - val_bounding_output_iou__metric: 0.6289 Epoch 13/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0029 - confidence_output_loss: 8.1272e-04 - bounding_output_loss: 0.0021 - confidence_output_binary_accuracy: 0.9987 - bounding_output_iou__metric: 0.6635 - val_loss: 0.0031 - val_confidence_output_loss: 5.0265e-04 - val_bounding_output_loss: 0.0026 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6652 Epoch 14/100 1456/1456 [==============================] - 279s 192ms/step - loss: 0.0013 - confidence_output_loss: 5.9859e-05 - bounding_output_loss: 0.0013 - confidence_output_binary_accuracy: 0.9999 - bounding_output_iou__metric: 0.7043 - val_loss: 0.0033 - val_confidence_output_loss: 4.7408e-04 - val_bounding_output_loss: 0.0029 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6461 Epoch 15/100 1456/1456 [==============================] - 279s 192ms/step - loss: 0.0028 - confidence_output_loss: 7.8699e-04 - bounding_output_loss: 0.0020 - confidence_output_binary_accuracy: 0.9989 - bounding_output_iou__metric: 0.6704 - val_loss: 0.0049 - val_confidence_output_loss: 0.0018 - val_bounding_output_loss: 0.0031 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.6567 Epoch 16/100 1456/1456 [==============================] - 279s 192ms/step - loss: 0.0018 - confidence_output_loss: 3.2669e-04 - bounding_output_loss: 0.0015 - confidence_output_binary_accuracy: 0.9996 - bounding_output_iou__metric: 0.6981 - val_loss: 0.0038 - val_confidence_output_loss: 0.0012 - val_bounding_output_loss: 0.0027 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.6597 Epoch 17/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0021 - confidence_output_loss: 4.8505e-04 - bounding_output_loss: 0.0016 - confidence_output_binary_accuracy: 
0.9994 - bounding_output_iou__metric: 0.6939 - val_loss: 0.0031 - val_confidence_output_loss: 2.4019e-04 - val_bounding_output_loss: 0.0029 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6729 Epoch 18/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0016 - confidence_output_loss: 2.2262e-04 - bounding_output_loss: 0.0014 - confidence_output_binary_accuracy: 0.9997 - bounding_output_iou__metric: 0.7033 - val_loss: 0.0027 - val_confidence_output_loss: 4.3774e-05 - val_bounding_output_loss: 0.0026 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6784 Epoch 19/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0016 - confidence_output_loss: 2.6039e-04 - bounding_output_loss: 0.0013 - confidence_output_binary_accuracy: 0.9997 - bounding_output_iou__metric: 0.7097 - val_loss: 0.0025 - val_confidence_output_loss: 1.5120e-05 - val_bounding_output_loss: 0.0025 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6930 Epoch 20/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0018 - confidence_output_loss: 3.6368e-04 - bounding_output_loss: 0.0014 - confidence_output_binary_accuracy: 0.9996 - bounding_output_iou__metric: 0.7101 - val_loss: 0.0073 - val_confidence_output_loss: 0.0026 - val_bounding_output_loss: 0.0048 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6532 Epoch 21/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0016 - confidence_output_loss: 2.9388e-04 - bounding_output_loss: 0.0013 - confidence_output_binary_accuracy: 0.9996 - bounding_output_iou__metric: 0.7104 - val_loss: 0.0025 - val_confidence_output_loss: 4.4857e-05 - val_bounding_output_loss: 0.0025 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6683 Epoch 22/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0017 - confidence_output_loss: 3.4689e-04 - bounding_output_loss: 0.0013 - confidence_output_binary_accuracy: 0.9995 - bounding_output_iou__metric: 0.7111 - val_loss: 0.0027 - val_confidence_output_loss: 3.3875e-05 - val_bounding_output_loss: 0.0026 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6921 Epoch 23/100 1456/1456 [==============================] - 280s 192ms/step - loss: 9.2208e-04 - confidence_output_loss: 2.0792e-05 - bounding_output_loss: 9.0128e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7401 - val_loss: 0.0028 - val_confidence_output_loss: 1.6841e-05 - val_bounding_output_loss: 0.0028 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6889 Epoch 24/100 1456/1456 [==============================] - 279s 191ms/step - loss: 9.1563e-04 - confidence_output_loss: 3.2558e-05 - bounding_output_loss: 8.8308e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7432 - val_loss: 0.0054 - val_confidence_output_loss: 1.9324e-05 - val_bounding_output_loss: 0.0054 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.5203 Epoch 25/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0032 - confidence_output_loss: 0.0011 - bounding_output_loss: 0.0021 - confidence_output_binary_accuracy: 0.9988 - bounding_output_iou__metric: 0.6964 - val_loss: 0.0024 - val_confidence_output_loss: 4.1746e-05 - val_bounding_output_loss: 0.0023 - 
val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6977 Epoch 26/100 1456/1456 [==============================] - 279s 192ms/step - loss: 8.4636e-04 - confidence_output_loss: 1.7427e-05 - bounding_output_loss: 8.2893e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7501 - val_loss: 0.0024 - val_confidence_output_loss: 3.5244e-06 - val_bounding_output_loss: 0.0024 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7030 Epoch 27/100 1456/1456 [==============================] - 280s 192ms/step - loss: 8.5640e-04 - confidence_output_loss: 4.7071e-05 - bounding_output_loss: 8.0933e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7550 - val_loss: 0.0026 - val_confidence_output_loss: 2.6094e-05 - val_bounding_output_loss: 0.0026 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7029 Epoch 28/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0035 - confidence_output_loss: 0.0013 - bounding_output_loss: 0.0022 - confidence_output_binary_accuracy: 0.9983 - bounding_output_iou__metric: 0.6885 - val_loss: 0.0032 - val_confidence_output_loss: 3.3891e-04 - val_bounding_output_loss: 0.0029 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6875 Epoch 29/100 1456/1456 [==============================] - 280s 192ms/step - loss: 8.6850e-04 - confidence_output_loss: 1.0323e-05 - bounding_output_loss: 8.5818e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7451 - val_loss: 0.0028 - val_confidence_output_loss: 3.6383e-06 - val_bounding_output_loss: 0.0028 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7051 Epoch 30/100 1456/1456 [==============================] - 280s 192ms/step - loss: 7.9276e-04 - confidence_output_loss: 3.2359e-05 - bounding_output_loss: 7.6040e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7584 - val_loss: 0.0031 - val_confidence_output_loss: 1.3877e-04 - val_bounding_output_loss: 0.0029 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7030 Epoch 31/100 1456/1456 [==============================] - 280s 192ms/step - loss: 7.6786e-04 - confidence_output_loss: 3.8267e-05 - bounding_output_loss: 7.2959e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7638 - val_loss: 0.0069 - val_confidence_output_loss: 0.0036 - val_bounding_output_loss: 0.0033 - val_confidence_output_binary_accuracy: 0.9955 - val_bounding_output_iou__metric: 0.5976 Epoch 32/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0021 - confidence_output_loss: 5.6692e-04 - bounding_output_loss: 0.0015 - confidence_output_binary_accuracy: 0.9993 - bounding_output_iou__metric: 0.7025 - val_loss: 0.0026 - val_confidence_output_loss: 5.2256e-06 - val_bounding_output_loss: 0.0026 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7024 Epoch 33/100 1456/1456 [==============================] - 280s 192ms/step - loss: 8.3601e-04 - confidence_output_loss: 7.6521e-05 - bounding_output_loss: 7.5949e-04 - confidence_output_binary_accuracy: 0.9999 - bounding_output_iou__metric: 0.7633 - val_loss: 0.0165 - val_confidence_output_loss: 0.0091 - val_bounding_output_loss: 0.0074 - val_confidence_output_binary_accuracy: 0.9844 - val_bounding_output_iou__metric: 0.6684 Epoch 34/100 1456/1456 
[==============================] - 280s 192ms/step - loss: 0.0019 - confidence_output_loss: 5.2410e-04 - bounding_output_loss: 0.0013 - confidence_output_binary_accuracy: 0.9993 - bounding_output_iou__metric: 0.7218 - val_loss: 0.0077 - val_confidence_output_loss: 0.0045 - val_bounding_output_loss: 0.0033 - val_confidence_output_binary_accuracy: 0.9933 - val_bounding_output_iou__metric: 0.7003 Epoch 35/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0015 - confidence_output_loss: 2.8138e-04 - bounding_output_loss: 0.0012 - confidence_output_binary_accuracy: 0.9996 - bounding_output_iou__metric: 0.7254 - val_loss: 0.0025 - val_confidence_output_loss: 9.2699e-06 - val_bounding_output_loss: 0.0025 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6919 Epoch 36/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0014 - confidence_output_loss: 3.7840e-04 - bounding_output_loss: 0.0010 - confidence_output_binary_accuracy: 0.9995 - bounding_output_iou__metric: 0.7408 - val_loss: 0.0027 - val_confidence_output_loss: 3.7225e-05 - val_bounding_output_loss: 0.0026 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7013 Epoch 37/100 1456/1456 [==============================] - 281s 193ms/step - loss: 0.0010 - confidence_output_loss: 1.7202e-04 - bounding_output_loss: 8.3136e-04 - confidence_output_binary_accuracy: 0.9998 - bounding_output_iou__metric: 0.7613 - val_loss: 0.0024 - val_confidence_output_loss: 9.4565e-06 - val_bounding_output_loss: 0.0023 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7023 Epoch 38/100 1456/1456 [==============================] - 280s 192ms/step - loss: 7.1512e-04 - confidence_output_loss: 2.8536e-05 - bounding_output_loss: 6.8658e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7694 - val_loss: 0.0024 - val_confidence_output_loss: 5.1885e-06 - val_bounding_output_loss: 0.0024 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7114 Epoch 39/100 1456/1456 [==============================] - 279s 191ms/step - loss: 6.2268e-04 - confidence_output_loss: 7.1210e-06 - bounding_output_loss: 6.1556e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7768 - val_loss: 0.0022 - val_confidence_output_loss: 1.6166e-05 - val_bounding_output_loss: 0.0022 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7031 Epoch 40/100 1456/1456 [==============================] - 280s 192ms/step - loss: 6.1861e-04 - confidence_output_loss: 7.1405e-06 - bounding_output_loss: 6.1147e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7769 - val_loss: 0.0023 - val_confidence_output_loss: 3.9605e-05 - val_bounding_output_loss: 0.0023 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7077 Epoch 41/100 1456/1456 [==============================] - 280s 192ms/step - loss: 6.1704e-04 - confidence_output_loss: 1.1931e-05 - bounding_output_loss: 6.0511e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7795 - val_loss: 0.0024 - val_confidence_output_loss: 1.0411e-05 - val_bounding_output_loss: 0.0024 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7090 Epoch 42/100 1456/1456 [==============================] - 280s 192ms/step - loss: 8.8678e-04 - confidence_output_loss: 1.4023e-04 - bounding_output_loss: 
7.4655e-04 - confidence_output_binary_accuracy: 0.9998 - bounding_output_iou__metric: 0.7721 - val_loss: 0.0025 - val_confidence_output_loss: 3.6133e-05 - val_bounding_output_loss: 0.0024 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6972 Epoch 43/100 1456/1456 [==============================] - 280s 192ms/step - loss: 6.6203e-04 - confidence_output_loss: 2.0622e-05 - bounding_output_loss: 6.4141e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7753 - val_loss: 0.0024 - val_confidence_output_loss: 5.9215e-06 - val_bounding_output_loss: 0.0024 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6995 Epoch 44/100 1456/1456 [==============================] - 280s 192ms/step - loss: 6.1612e-04 - confidence_output_loss: 4.7371e-05 - bounding_output_loss: 5.6875e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7880 - val_loss: 0.0024 - val_confidence_output_loss: 3.3524e-06 - val_bounding_output_loss: 0.0024 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7167 Epoch 45/100 1456/1456 [==============================] - 279s 191ms/step - loss: 8.0987e-04 - confidence_output_loss: 1.6070e-04 - bounding_output_loss: 6.4916e-04 - confidence_output_binary_accuracy: 0.9998 - bounding_output_iou__metric: 0.7872 - val_loss: 0.0026 - val_confidence_output_loss: 4.1473e-07 - val_bounding_output_loss: 0.0026 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6924 Epoch 46/100 1456/1456 [==============================] - 280s 192ms/step - loss: 7.4928e-04 - confidence_output_loss: 6.9209e-06 - bounding_output_loss: 7.4236e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7596 - val_loss: 0.0022 - val_confidence_output_loss: 6.5748e-07 - val_bounding_output_loss: 0.0022 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7136 Epoch 47/100 1456/1456 [==============================] - 280s 192ms/step - loss: 5.8337e-04 - confidence_output_loss: 2.0569e-05 - bounding_output_loss: 5.6280e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7874 - val_loss: 0.0024 - val_confidence_output_loss: 8.6544e-06 - val_bounding_output_loss: 0.0024 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7105 Epoch 48/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0018 - confidence_output_loss: 5.8494e-04 - bounding_output_loss: 0.0012 - confidence_output_binary_accuracy: 0.9993 - bounding_output_iou__metric: 0.7412 - val_loss: 0.0023 - val_confidence_output_loss: 4.1940e-07 - val_bounding_output_loss: 0.0023 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7099 Epoch 49/100 1456/1456 [==============================] - 280s 192ms/step - loss: 5.9649e-04 - confidence_output_loss: 5.8496e-06 - bounding_output_loss: 5.9064e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7814 - val_loss: 0.0024 - val_confidence_output_loss: 9.5834e-07 - val_bounding_output_loss: 0.0024 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7137 Epoch 50/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0012 - confidence_output_loss: 3.6371e-04 - bounding_output_loss: 8.7019e-04 - confidence_output_binary_accuracy: 0.9995 - bounding_output_iou__metric: 0.7639 - val_loss: 
0.0376 - val_confidence_output_loss: 0.0281 - val_bounding_output_loss: 0.0095 - val_confidence_output_binary_accuracy: 0.9688 - val_bounding_output_iou__metric: 0.6627 Epoch 51/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0012 - confidence_output_loss: 3.1242e-04 - bounding_output_loss: 8.9328e-04 - confidence_output_binary_accuracy: 0.9997 - bounding_output_iou__metric: 0.7627 - val_loss: 0.0023 - val_confidence_output_loss: 4.1472e-07 - val_bounding_output_loss: 0.0023 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7128 Epoch 52/100 1456/1456 [==============================] - 281s 193ms/step - loss: 6.5195e-04 - confidence_output_loss: 5.0876e-05 - bounding_output_loss: 6.0107e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7855 - val_loss: 0.0022 - val_confidence_output_loss: 1.2503e-06 - val_bounding_output_loss: 0.0022 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7125 Epoch 53/100 1456/1456 [==============================] - 280s 192ms/step - loss: 5.1412e-04 - confidence_output_loss: 1.1481e-05 - bounding_output_loss: 5.0264e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7965 - val_loss: 0.0024 - val_confidence_output_loss: 4.1434e-06 - val_bounding_output_loss: 0.0024 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7134 Epoch 54/100 1456/1456 [==============================] - 280s 192ms/step - loss: 5.5016e-04 - confidence_output_loss: 4.0779e-05 - bounding_output_loss: 5.0938e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7989 - val_loss: 0.0021 - val_confidence_output_loss: 2.9381e-06 - val_bounding_output_loss: 0.0021 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7197 Epoch 55/100 1456/1456 [==============================] - 281s 193ms/step - loss: 4.9212e-04 - confidence_output_loss: 9.9280e-06 - bounding_output_loss: 4.8219e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7983 - val_loss: 0.0025 - val_confidence_output_loss: 1.7593e-06 - val_bounding_output_loss: 0.0025 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7111 Epoch 56/100 1456/1456 [==============================] - 281s 193ms/step - loss: 5.8628e-04 - confidence_output_loss: 6.5236e-05 - bounding_output_loss: 5.2104e-04 - confidence_output_binary_accuracy: 0.9999 - bounding_output_iou__metric: 0.7999 - val_loss: 0.0022 - val_confidence_output_loss: 1.5982e-06 - val_bounding_output_loss: 0.0022 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7212 Epoch 57/100 1456/1456 [==============================] - 279s 192ms/step - loss: 4.8190e-04 - confidence_output_loss: 1.4405e-05 - bounding_output_loss: 4.6749e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8005 - val_loss: 0.0026 - val_confidence_output_loss: 3.5556e-06 - val_bounding_output_loss: 0.0026 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6962 Epoch 58/100 1456/1456 [==============================] - 280s 192ms/step - loss: 7.7905e-04 - confidence_output_loss: 1.2063e-04 - bounding_output_loss: 6.5842e-04 - confidence_output_binary_accuracy: 0.9999 - bounding_output_iou__metric: 0.7814 - val_loss: 0.0026 - val_confidence_output_loss: 7.7436e-07 - val_bounding_output_loss: 0.0026 - 
val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7028 Epoch 59/100 1456/1456 [==============================] - 280s 192ms/step - loss: 6.0205e-04 - confidence_output_loss: 8.0017e-05 - bounding_output_loss: 5.2203e-04 - confidence_output_binary_accuracy: 0.9999 - bounding_output_iou__metric: 0.8007 - val_loss: 0.0030 - val_confidence_output_loss: 5.9681e-04 - val_bounding_output_loss: 0.0024 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.6825 Epoch 60/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0017 - confidence_output_loss: 6.1916e-04 - bounding_output_loss: 0.0011 - confidence_output_binary_accuracy: 0.9993 - bounding_output_iou__metric: 0.7524 - val_loss: 0.0032 - val_confidence_output_loss: 5.4093e-04 - val_bounding_output_loss: 0.0027 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6834 Epoch 61/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0014 - confidence_output_loss: 4.3357e-04 - bounding_output_loss: 9.9016e-04 - confidence_output_binary_accuracy: 0.9994 - bounding_output_iou__metric: 0.7604 - val_loss: 0.0027 - val_confidence_output_loss: 4.8970e-04 - val_bounding_output_loss: 0.0022 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7158 Epoch 62/100 1456/1456 [==============================] - 280s 192ms/step - loss: 5.2087e-04 - confidence_output_loss: 1.0574e-05 - bounding_output_loss: 5.1029e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.7944 - val_loss: 0.0038 - val_confidence_output_loss: 0.0013 - val_bounding_output_loss: 0.0026 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7226 Epoch 63/100 1456/1456 [==============================] - 281s 193ms/step - loss: 4.6670e-04 - confidence_output_loss: 1.8438e-05 - bounding_output_loss: 4.4826e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8059 - val_loss: 0.0037 - val_confidence_output_loss: 0.0010 - val_bounding_output_loss: 0.0027 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7127 Epoch 64/100 1456/1456 [==============================] - 281s 193ms/step - loss: 4.3334e-04 - confidence_output_loss: 7.4976e-06 - bounding_output_loss: 4.2585e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8086 - val_loss: 0.0023 - val_confidence_output_loss: 2.3599e-05 - val_bounding_output_loss: 0.0023 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7226 Epoch 65/100 1456/1456 [==============================] - 279s 192ms/step - loss: 0.0012 - confidence_output_loss: 4.4950e-04 - bounding_output_loss: 7.8313e-04 - confidence_output_binary_accuracy: 0.9994 - bounding_output_iou__metric: 0.7881 - val_loss: 0.0047 - val_confidence_output_loss: 0.0021 - val_bounding_output_loss: 0.0026 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7158 Epoch 66/100 1456/1456 [==============================] - 279s 192ms/step - loss: 0.0015 - confidence_output_loss: 4.6395e-04 - bounding_output_loss: 0.0010 - confidence_output_binary_accuracy: 0.9993 - bounding_output_iou__metric: 0.7517 - val_loss: 0.0082 - val_confidence_output_loss: 0.0049 - val_bounding_output_loss: 0.0033 - val_confidence_output_binary_accuracy: 0.9955 - val_bounding_output_iou__metric: 0.6866 Epoch 67/100 1456/1456 
[==============================] - 279s 192ms/step - loss: 8.4241e-04 - confidence_output_loss: 1.7183e-04 - bounding_output_loss: 6.7058e-04 - confidence_output_binary_accuracy: 0.9998 - bounding_output_iou__metric: 0.7772 - val_loss: 0.0053 - val_confidence_output_loss: 0.0024 - val_bounding_output_loss: 0.0030 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7242 Epoch 68/100 1456/1456 [==============================] - 279s 192ms/step - loss: 5.7759e-04 - confidence_output_loss: 8.1143e-05 - bounding_output_loss: 4.9645e-04 - confidence_output_binary_accuracy: 0.9999 - bounding_output_iou__metric: 0.8070 - val_loss: 0.0053 - val_confidence_output_loss: 0.0023 - val_bounding_output_loss: 0.0031 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7226 Epoch 69/100 1456/1456 [==============================] - 280s 192ms/step - loss: 4.1533e-04 - confidence_output_loss: 6.6868e-06 - bounding_output_loss: 4.0864e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8119 - val_loss: 0.0052 - val_confidence_output_loss: 0.0023 - val_bounding_output_loss: 0.0029 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7165 Epoch 70/100 1456/1456 [==============================] - 280s 192ms/step - loss: 4.0363e-04 - confidence_output_loss: 8.3885e-07 - bounding_output_loss: 4.0279e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8127 - val_loss: 0.0053 - val_confidence_output_loss: 0.0023 - val_bounding_output_loss: 0.0029 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7230 Epoch 71/100 1456/1456 [==============================] - 280s 192ms/step - loss: 4.5926e-04 - confidence_output_loss: 3.5185e-05 - bounding_output_loss: 4.2407e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8128 - val_loss: 0.0052 - val_confidence_output_loss: 0.0022 - val_bounding_output_loss: 0.0030 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7285 Epoch 72/100 1456/1456 [==============================] - 280s 192ms/step - loss: 3.9465e-04 - confidence_output_loss: 9.3609e-07 - bounding_output_loss: 3.9371e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8132 - val_loss: 0.0031 - val_confidence_output_loss: 4.3677e-04 - val_bounding_output_loss: 0.0027 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7255 Epoch 73/100 1456/1456 [==============================] - 280s 192ms/step - loss: 4.0044e-04 - confidence_output_loss: 6.4073e-06 - bounding_output_loss: 3.9403e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8142 - val_loss: 0.0043 - val_confidence_output_loss: 0.0016 - val_bounding_output_loss: 0.0027 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7260 Epoch 74/100 1456/1456 [==============================] - 280s 192ms/step - loss: 3.9376e-04 - confidence_output_loss: 7.6865e-06 - bounding_output_loss: 3.8607e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8160 - val_loss: 0.0031 - val_confidence_output_loss: 4.6467e-04 - val_bounding_output_loss: 0.0027 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7074 Epoch 75/100 1456/1456 [==============================] - 280s 192ms/step - loss: 8.9383e-04 - confidence_output_loss: 2.3824e-04 - 
bounding_output_loss: 6.5560e-04 - confidence_output_binary_accuracy: 0.9997 - bounding_output_iou__metric: 0.7956 - val_loss: 0.0037 - val_confidence_output_loss: 0.0012 - val_bounding_output_loss: 0.0025 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7195 Epoch 76/100 1456/1456 [==============================] - 280s 192ms/step - loss: 8.9412e-04 - confidence_output_loss: 1.7390e-04 - bounding_output_loss: 7.2022e-04 - confidence_output_binary_accuracy: 0.9997 - bounding_output_iou__metric: 0.7779 - val_loss: 0.0044 - val_confidence_output_loss: 0.0017 - val_bounding_output_loss: 0.0027 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7211 Epoch 77/100 1456/1456 [==============================] - 280s 192ms/step - loss: 4.1282e-04 - confidence_output_loss: 2.3406e-06 - bounding_output_loss: 4.1048e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8113 - val_loss: 0.0038 - val_confidence_output_loss: 0.0011 - val_bounding_output_loss: 0.0027 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7218 Epoch 78/100 1456/1456 [==============================] - 280s 192ms/step - loss: 4.5750e-04 - confidence_output_loss: 5.1726e-05 - bounding_output_loss: 4.0577e-04 - confidence_output_binary_accuracy: 0.9999 - bounding_output_iou__metric: 0.8202 - val_loss: 0.0048 - val_confidence_output_loss: 0.0019 - val_bounding_output_loss: 0.0029 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7315 Epoch 79/100 1456/1456 [==============================] - 280s 192ms/step - loss: 8.9932e-04 - confidence_output_loss: 2.6538e-04 - bounding_output_loss: 6.3394e-04 - confidence_output_binary_accuracy: 0.9996 - bounding_output_iou__metric: 0.8027 - val_loss: 0.0030 - val_confidence_output_loss: 5.1038e-04 - val_bounding_output_loss: 0.0025 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7071 Epoch 80/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0015 - confidence_output_loss: 5.7655e-04 - bounding_output_loss: 8.9495e-04 - confidence_output_binary_accuracy: 0.9993 - bounding_output_iou__metric: 0.7742 - val_loss: 0.0024 - val_confidence_output_loss: 1.8626e-05 - val_bounding_output_loss: 0.0024 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7296 Epoch 81/100 1456/1456 [==============================] - 280s 192ms/step - loss: 4.1789e-04 - confidence_output_loss: 2.0217e-05 - bounding_output_loss: 3.9767e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8158 - val_loss: 0.0025 - val_confidence_output_loss: 3.1160e-06 - val_bounding_output_loss: 0.0025 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7236 Epoch 82/100 1456/1456 [==============================] - 280s 192ms/step - loss: 3.5804e-04 - confidence_output_loss: 4.2058e-06 - bounding_output_loss: 3.5384e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8231 - val_loss: 0.0025 - val_confidence_output_loss: 1.5082e-05 - val_bounding_output_loss: 0.0025 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7317 Epoch 83/100 1456/1456 [==============================] - 279s 192ms/step - loss: 3.7161e-04 - confidence_output_loss: 1.5536e-05 - bounding_output_loss: 3.5607e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8247 - 
val_loss: 0.0024 - val_confidence_output_loss: 5.1652e-06 - val_bounding_output_loss: 0.0024 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7126 Epoch 84/100 1456/1456 [==============================] - 280s 192ms/step - loss: 3.6530e-04 - confidence_output_loss: 1.4659e-05 - bounding_output_loss: 3.5064e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8247 - val_loss: 0.0025 - val_confidence_output_loss: 5.9989e-05 - val_bounding_output_loss: 0.0025 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7291 Epoch 85/100 1456/1456 [==============================] - 281s 193ms/step - loss: 4.8724e-04 - confidence_output_loss: 8.1223e-05 - bounding_output_loss: 4.0601e-04 - confidence_output_binary_accuracy: 0.9999 - bounding_output_iou__metric: 0.8237 - val_loss: 0.0046 - val_confidence_output_loss: 0.0020 - val_bounding_output_loss: 0.0027 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7222 Epoch 86/100 1456/1456 [==============================] - 280s 192ms/step - loss: 3.5560e-04 - confidence_output_loss: 4.7847e-06 - bounding_output_loss: 3.5082e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8228 - val_loss: 0.0045 - val_confidence_output_loss: 0.0019 - val_bounding_output_loss: 0.0026 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7306 Epoch 87/100 1456/1456 [==============================] - 281s 193ms/step - loss: 7.0935e-04 - confidence_output_loss: 1.8440e-04 - bounding_output_loss: 5.2495e-04 - confidence_output_binary_accuracy: 0.9998 - bounding_output_iou__metric: 0.8114 - val_loss: 0.0032 - val_confidence_output_loss: 7.0380e-04 - val_bounding_output_loss: 0.0025 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7118 Epoch 88/100 1456/1456 [==============================] - 281s 193ms/step - loss: 6.7844e-04 - confidence_output_loss: 1.0599e-04 - bounding_output_loss: 5.7245e-04 - confidence_output_binary_accuracy: 0.9999 - bounding_output_iou__metric: 0.7937 - val_loss: 0.0048 - val_confidence_output_loss: 0.0018 - val_bounding_output_loss: 0.0029 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7201 Epoch 89/100 1456/1456 [==============================] - 280s 192ms/step - loss: 3.8352e-04 - confidence_output_loss: 4.1284e-06 - bounding_output_loss: 3.7939e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8192 - val_loss: 0.0053 - val_confidence_output_loss: 0.0020 - val_bounding_output_loss: 0.0033 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7268 Epoch 90/100 1456/1456 [==============================] - 280s 192ms/step - loss: 3.5968e-04 - confidence_output_loss: 1.4517e-05 - bounding_output_loss: 3.4517e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8273 - val_loss: 0.0051 - val_confidence_output_loss: 0.0021 - val_bounding_output_loss: 0.0029 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7387 Epoch 91/100 1456/1456 [==============================] - 280s 192ms/step - loss: 9.1978e-04 - confidence_output_loss: 2.4960e-04 - bounding_output_loss: 6.7018e-04 - confidence_output_binary_accuracy: 0.9997 - bounding_output_iou__metric: 0.8031 - val_loss: 0.0023 - val_confidence_output_loss: 1.0879e-05 - val_bounding_output_loss: 0.0023 - 
val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.6666 Epoch 92/100 1456/1456 [==============================] - 279s 191ms/step - loss: 8.3546e-04 - confidence_output_loss: 1.3682e-04 - bounding_output_loss: 6.9864e-04 - confidence_output_binary_accuracy: 0.9998 - bounding_output_iou__metric: 0.7723 - val_loss: 0.0029 - val_confidence_output_loss: 9.3264e-06 - val_bounding_output_loss: 0.0029 - val_confidence_output_binary_accuracy: 1.0000 - val_bounding_output_iou__metric: 0.7029 Epoch 93/100 1456/1456 [==============================] - 280s 192ms/step - loss: 5.4708e-04 - confidence_output_loss: 7.1167e-05 - bounding_output_loss: 4.7591e-04 - confidence_output_binary_accuracy: 0.9999 - bounding_output_iou__metric: 0.8055 - val_loss: 0.0069 - val_confidence_output_loss: 0.0037 - val_bounding_output_loss: 0.0032 - val_confidence_output_binary_accuracy: 0.9955 - val_bounding_output_iou__metric: 0.6731 Epoch 94/100 1456/1456 [==============================] - 281s 193ms/step - loss: 8.8987e-04 - confidence_output_loss: 1.8461e-04 - bounding_output_loss: 7.0526e-04 - confidence_output_binary_accuracy: 0.9998 - bounding_output_iou__metric: 0.7809 - val_loss: 0.0253 - val_confidence_output_loss: 0.0158 - val_bounding_output_loss: 0.0095 - val_confidence_output_binary_accuracy: 0.9821 - val_bounding_output_iou__metric: 0.5715 Epoch 95/100 1456/1456 [==============================] - 280s 192ms/step - loss: 0.0016 - confidence_output_loss: 6.7312e-04 - bounding_output_loss: 9.6001e-04 - confidence_output_binary_accuracy: 0.9993 - bounding_output_iou__metric: 0.7658 - val_loss: 0.0053 - val_confidence_output_loss: 0.0022 - val_bounding_output_loss: 0.0030 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7189 Epoch 96/100 1456/1456 [==============================] - 280s 192ms/step - loss: 3.7952e-04 - confidence_output_loss: 1.3572e-05 - bounding_output_loss: 3.6595e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8222 - val_loss: 0.0053 - val_confidence_output_loss: 0.0022 - val_bounding_output_loss: 0.0031 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7202 Epoch 97/100 1456/1456 [==============================] - 280s 192ms/step - loss: 3.4467e-04 - confidence_output_loss: 7.3468e-06 - bounding_output_loss: 3.3732e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8274 - val_loss: 0.0065 - val_confidence_output_loss: 0.0032 - val_bounding_output_loss: 0.0033 - val_confidence_output_binary_accuracy: 0.9955 - val_bounding_output_iou__metric: 0.7142 Epoch 98/100 1456/1456 [==============================] - 280s 192ms/step - loss: 3.8457e-04 - confidence_output_loss: 3.5978e-05 - bounding_output_loss: 3.4859e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8290 - val_loss: 0.0056 - val_confidence_output_loss: 0.0023 - val_bounding_output_loss: 0.0033 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7299 Epoch 99/100 1456/1456 [==============================] - 280s 192ms/step - loss: 3.4888e-04 - confidence_output_loss: 1.6752e-05 - bounding_output_loss: 3.3213e-04 - confidence_output_binary_accuracy: 1.0000 - bounding_output_iou__metric: 0.8295 - val_loss: 0.0056 - val_confidence_output_loss: 0.0022 - val_bounding_output_loss: 0.0034 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.6999 Epoch 100/100 1456/1456 
[==============================] - 280s 192ms/step - loss: 4.0968e-04 - confidence_output_loss: 5.1145e-05 - bounding_output_loss: 3.5853e-04 - confidence_output_binary_accuracy: 0.9999 - bounding_output_iou__metric: 0.8286 - val_loss: 0.0060 - val_confidence_output_loss: 0.0024 - val_bounding_output_loss: 0.0036 - val_confidence_output_binary_accuracy: 0.9978 - val_bounding_output_iou__metric: 0.7182 Restoring model weights from the end of the best epoch. Epoch 00100: early stopping
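The log above ends with the early-stopping callback restoring the weights from the best epoch. For reference, here is a minimal sketch of how such a callback is typically configured in Keras; the monitored quantity and patience below are my assumptions for illustration, not values taken from this notebook.
# A hedged sketch of an early-stopping setup consistent with the log above;
# monitor and patience are assumptions, not this notebook's actual values.
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    monitor='val_loss',         # assumed: watch the validation loss
    patience=50,                # assumed patience, in epochs
    verbose=1,                  # prints the 'Restoring model weights...' message
    restore_best_weights=True,  # roll back to the best epoch when stopping
)
# The callback is then passed to training, e.g.:
# history = model.fit(..., epochs=100, callbacks=[early_stopping])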
Below are five graphs. The first shows the overall loss for the model. The second and third show the losses for the confidence and bounding box outputs. The fourth shows the IOU metric for the bounding box output. The fifth shows the binary accuracy for the confidence output.
def plot_history(history):
    # 2x3 grid of plots; the sixth axis is unused and turned off
    fig, ax = plt.subplots(2, 3, figsize=(14, 10))

    # Total model loss
    ax[0, 0].set_title('Overall Loss')
    ax[0, 0].set_xlabel('Epoch')
    ax[0, 0].set_ylabel('Loss')
    ax[0, 0].plot(history.history['loss'], label='Training')
    ax[0, 0].plot(history.history['val_loss'], label='Validation')
    ax[0, 0].legend()

    # Loss for the confidence output
    ax[0, 1].set_title('Confidence Loss')
    ax[0, 1].set_xlabel('Epoch')
    ax[0, 1].set_ylabel('Loss')
    ax[0, 1].plot(history.history['confidence_output_loss'], label='Training')
    ax[0, 1].plot(history.history['val_confidence_output_loss'], label='Validation')
    ax[0, 1].legend()

    # Loss for the bounding box output
    ax[0, 2].set_title('Bounding Box Loss')
    ax[0, 2].set_xlabel('Epoch')
    ax[0, 2].set_ylabel('Loss')
    ax[0, 2].plot(history.history['bounding_output_loss'], label='Training')
    ax[0, 2].plot(history.history['val_bounding_output_loss'], label='Validation')
    ax[0, 2].legend()

    # IOU metric for the bounding box output
    ax[1, 0].set_title('IOU Metric')
    ax[1, 0].set_xlabel('Epoch')
    ax[1, 0].set_ylabel('IOU')
    ax[1, 0].set_ylim([0, 1])
    ax[1, 0].plot(history.history['bounding_output_iou__metric'], label='Training')
    ax[1, 0].plot(history.history['val_bounding_output_iou__metric'], label='Validation')
    ax[1, 0].legend()

    # Binary accuracy for the confidence output
    ax[1, 1].set_title('Confidence Accuracy')
    ax[1, 1].set_xlabel('Epoch')
    ax[1, 1].set_ylabel('Accuracy')
    ax[1, 1].set_ylim([0, 1])
    ax[1, 1].plot(history.history['confidence_output_binary_accuracy'], label='Training')
    ax[1, 1].plot(history.history['val_confidence_output_binary_accuracy'], label='Validation')
    ax[1, 1].legend()

    ax[1, 2].axis('off')
    plt.tight_layout()
    plt.show()
plot_history(history)
Below are the final results on the test data: overall loss, confidence loss, bounding box loss, confidence accuracy, and IOU.
test_results = model.evaluate(test_datagen)  # [loss, confidence loss, bounding loss, confidence accuracy, IOU]
print('Test Data Loss: {}'.format(test_results[0]))
print('Test Data Confidence Loss: {}'.format(test_results[1]))
print('Test Data Bounding Loss: {}'.format(test_results[2]))
print('Test Data Binary Accuracy: {}'.format(test_results[3]))
print('Test Data IOU: {}'.format(test_results[4]))
9/9 [==============================] - 2s 179ms/step - loss: 0.0107 - confidence_output_loss: 0.0069 - bounding_output_loss: 0.0038 - confidence_output_binary_accuracy: 0.9931 - bounding_output_iou__metric: 0.6657
Test Data Loss: 0.010659802705049515
Test Data Confidence Loss: 0.006907379254698753
Test Data Bounding Loss: 0.0037524241488426924
Test Data Binary Accuracy: 0.9930555820465088
Test Data IOU: 0.6656837463378906
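For clarity, the IOU (intersection over union) value measures how much the predicted box overlaps the target box, ranging from 0 (no overlap) to 1 (an exact match), so the test IOU of roughly 0.67 means the predicted and target boxes share about two-thirds of their combined area. Below is a plain-Python sketch of the computation, assuming boxes in the normalized (y1, x1, y2, x2) layout the model outputs; it is an illustration, not this notebook's custom iou__metric.
# Illustrative IoU between two boxes in (y1, x1, y2, x2) form -- a sketch,
# not the custom Keras metric used during training.
def iou(box_a, box_b):
    # Corners of the intersection rectangle
    y1 = max(box_a[0], box_b[0])
    x1 = max(box_a[1], box_b[1])
    y2 = min(box_a[2], box_b[2])
    x2 = min(box_a[3], box_b[3])
    intersection = max(0.0, y2 - y1) * max(0.0, x2 - x1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0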
Below, sample videos of the training, validation, and test data are shown. The target bounding box is drawn in red and the predicted bounding box in green. The confidence and IOU values are displayed at the top left.
display(play_images_from_directory('./data/train/positive/', 'annotations.json', num_to_iter=50, model=model))
display(play_images_from_directory('./data/validation/', 'annotations.json', num_to_iter=1, model=model))
display(play_images_from_directory('./data/test/', 'annotations.json', num_to_iter=1, model=model))
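The drawing convention in these clips (red target box, green predicted box) comes down to a couple of OpenCV calls. The helper below is a hypothetical sketch, not this notebook's play_images_from_directory, and assumes normalized (y1, x1, y2, x2) boxes on an RGB frame.
# Hypothetical helper: overlay a target box (red) and a predicted box (green)
# on an RGB frame, given normalized (y1, x1, y2, x2) coordinates.
import cv2

def draw_boxes(frame, target_box=None, predicted_box=None):
    h, w = frame.shape[:2]

    def to_pixels(box):
        y1, x1, y2, x2 = box
        return (int(x1 * w), int(y1 * h)), (int(x2 * w), int(y2 * h))

    if target_box is not None:
        p1, p2 = to_pixels(target_box)
        cv2.rectangle(frame, p1, p2, (255, 0, 0), 2)  # red in RGB
    if predicted_box is not None:
        p1, p2 = to_pixels(predicted_box)
        cv2.rectangle(frame, p1, p2, (0, 255, 0), 2)  # green in RGB
    return frame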
Below, the model is run on several unlabeled videos. Since there is no labeled target data, only the predicted green boxes are shown, and no IOU can be computed because there are no target boxes to measure overlap against.
def play_images_from_video(video_path, model, interval=100, prediction_threshold=0.5):
    fig, ax = plt.subplots(figsize=(5, 5))
    cap = cv2.VideoCapture(video_path)
    animation_images = []
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # OpenCV reads frames as BGR; convert to RGB for the model and matplotlib
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # Resize and normalize a copy of the frame to the model's 224x224 input
        image_copy = cv2.resize(frame, (224, 224))
        image_copy = np.array(image_copy, dtype='float32') / 255.0
        prediction = model.predict(np.reshape(image_copy, (1, 224, 224, 3)))
        conf_pred = prediction[0][0][0]
        if conf_pred > prediction_threshold:
            # Scale the normalized (y1, x1, y2, x2) prediction back to the
            # original 960x720 frame and draw the predicted box in green
            pred_y1 = prediction[1][0][0] * 720
            pred_x1 = prediction[1][0][1] * 960
            pred_y2 = prediction[1][0][2] * 720
            pred_x2 = prediction[1][0][3] * 960
            cv2.rectangle(frame, (int(pred_x1), int(pred_y1)), (int(pred_x2), int(pred_y2)), (0, 255, 0), 2)
        cv2.putText(frame, 'Conf: {:.2f}'.format(conf_pred), (2, 36), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (255, 255, 0), 2, cv2.LINE_AA)
        animation_images.append([ax.imshow(frame.astype('int'))])
    cap.release()
    ani = animation.ArtistAnimation(fig, animation_images, interval=interval, blit=True)
    plt.close()
    # Rendering to HTML5 video requires ffmpeg to be available
    return HTML(ani.to_html5_video())
display(play_images_from_video('../../data/original/videos/video_1.mp4', model))
display(play_images_from_video('../../data/original/videos/video_2.mp4', model))
display(play_images_from_video('../../data/original/videos/video_3.mp4', model))
display(play_images_from_video('../../data/original/videos/video_4.mp4', model))
display(play_images_from_video('../../data/original/videos/video_5.mp4', model))
In summary, this project was done for practice. I wanted to go through the process of starting with nothing and ending with a working model. I created and labeled my own dataset, built the model, and iteratively improved it until it performed at an acceptable level.
While the model works on my dataset and webcam videos of me, it doesn't generalize well to videos from other sources. Making it more robust would require labeled images from other camera types, with different people, different angles, and other sources of variation. Since this project was just for practice, I don't plan to improve it any further.