InferenceUsingTFHubCenterNetObjDetect - Using TensorFlow to perform object detection with a pre-trained model
The following tutorial is based on the TensorFlow Hub Object Detection Colab notebook. It uses a pre-trained model based on the CenterNet architecture trained on the COCO 2017 dataset. Running the code requires an Internet connection to download the model (from Google servers) and testing data (from GitHub servers).
Some of this code is identical to that of the InferenceUsingTFHubMobileNetV2Model notebook. Please look there for an explanation of that code. As stated there, this will later be wrapped up into a high-level library to hide the details behind an API.
The following document is either a POD file which can additionally be run as a Perl script or a Jupyter Notebook which can be run in IPerl (viewable online at nbviewer). If you are reading this as POD, there should be a generated list of Perl dependencies in the CPANFILE section. Furthermore, PDL::Graphics::Gnuplot requires gnuplot. If you are running the code, you may optionally install the tensorflow Python package in order to access the saved_model_cli command, but this is only used for informational purposes.
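For reference, here is a cpanfile sketch covering the modules used below; it is illustrative only, derived from the use statements in this document, and the generated CPANFILE section remains authoritative.

requires 'AI::TensorFlow::Libtensorflow';
requires 'Archive::Extract';
requires 'Data::Printer';
requires 'Data::Printer::Filter::PDL';
requires 'FFI::Platypus';
requires 'File::Which';
requires 'Filesys::DiskUsage';
requires 'HTML::Tiny';
requires 'HTTP::Tiny';
requires 'Imager';
requires 'List::Util', '1.56';        # for mesh()
requires 'PDL';
requires 'PDL::Graphics::Gnuplot';    # also needs the gnuplot binary
requires 'Path::Tiny';
requires 'Syntax::Construct';
requires 'Text::Table::Tiny';
requires 'URI';
requires 'lib::projectroot';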
use strict;
use warnings;
use utf8;
use constant IN_IPERL => !! $ENV{PERL_IPERL_RUNNING};
no if IN_IPERL, warnings => 'redefine'; # fewer messages when re-running cells
use feature qw(say state postderef);
use Syntax::Construct qw(each-array);
use lib::projectroot qw(lib);
BEGIN {
    if( IN_IPERL ) {
        $ENV{TF_CPP_MIN_LOG_LEVEL} = 3;
    }
    require AI::TensorFlow::Libtensorflow;
}
use URI ();
use HTTP::Tiny ();
use Path::Tiny qw(path);
use File::Which ();
use List::Util 1.56 qw(mesh);
use Data::Printer ( output => 'stderr', return_value => 'void', filters => ['PDL'] );
use Data::Printer::Filter::PDL ();
use Text::Table::Tiny qw(generate_table);
use Imager;
my $s = AI::TensorFlow::Libtensorflow::Status->New;
sub AssertOK {
    die "Status $_[0]: " . $_[0]->Message
        unless $_[0]->GetCode == AI::TensorFlow::Libtensorflow::Status::OK;
    return;
}
AssertOK($s);
And we create helpers for converting between PDL ndarrays and TFTensors.
use PDL;
use AI::TensorFlow::Libtensorflow::DataType qw(FLOAT UINT8);
use FFI::Platypus::Memory qw(memcpy);
use FFI::Platypus::Buffer qw(scalar_to_pointer);
sub FloatPDLTOTFTensor {
    my ($p) = @_;
    # Note that the TFTensor dims are the reverse of the PDL dims.
    return AI::TensorFlow::Libtensorflow::Tensor->New(
        FLOAT, [ reverse $p->dims ], $p->get_dataref, sub { undef $p }
    );
}

sub FloatTFTensorToPDL {
    my ($t) = @_;

    my $pdl = zeros(float, reverse( map $t->Dim($_), 0 .. $t->NumDims-1 ) );

    # Copy the TFTensor's data directly into the ndarray's buffer.
    memcpy scalar_to_pointer( ${$pdl->get_dataref} ),
        scalar_to_pointer( ${$t->Data} ),
        $t->ByteSize;
    $pdl->upd_data;

    $pdl;
}

sub Uint8PDLTOTFTensor {
    my ($p) = @_;
    return AI::TensorFlow::Libtensorflow::Tensor->New(
        UINT8, [ reverse $p->dims ], $p->get_dataref, sub { undef $p }
    );
}

sub Uint8TFTensorToPDL {
    my ($t) = @_;

    my $pdl = zeros(byte, reverse( map $t->Dim($_), 0 .. $t->NumDims-1 ) );

    memcpy scalar_to_pointer( ${$pdl->get_dataref} ),
        scalar_to_pointer( ${$t->Data} ),
        $t->ByteSize;
    $pdl->upd_data;

    $pdl;
}
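As a quick sanity check of these helpers, here is a minimal round-trip sketch; the 2x3 ndarray and the $check variables are illustrative, not part of the tutorial proper.

my $check   = sequence(float, 2, 3);   # PDL dims [2,3]
my $check_t = FloatPDLTOTFTensor($check);
# TFTensor dims come out reversed: 3x2
say "TFTensor dims: ", join "x", map $check_t->Dim($_), 0 .. $check_t->NumDims-1;
my $check_back = FloatTFTensorToPDL($check_t);
die "Round trip failed" unless all( $check_back == sequence(float, 2, 3) );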
We are going to use an object detection model from TensorFlow Hub based on the CenterNet architecture. We download both the model and COCO 2017 labels.
# image_size => [width, height] (but usually square images)
my %model_name_to_params = (
    centernet_hourglass_512x512 => {
        handle => 'https://tfhub.dev/tensorflow/centernet/hourglass_512x512/1',
        image_size => [ 512, 512 ],
    },
);
my $model_name = 'centernet_hourglass_512x512';
say "Selected model: $model_name : $model_name_to_params{$model_name}{handle}";
We download the model to the current directory and then extract the model to a folder with the name given in $model_base.
my $model_uri = URI->new( $model_name_to_params{$model_name}{handle} );
$model_uri->query_form( 'tf-hub-format' => 'compressed' );
my $model_base = substr( $model_uri->path, 1 ) =~ s,/,_,gr;
my $model_archive_path = "${model_base}.tar.gz";
my $http = HTTP::Tiny->new;
for my $download ( [ $model_uri => $model_archive_path ] ) {
    my ($uri, $path) = @$download;
    # Skip the download if the archive already exists locally.
    next if -e $path;
    say "Downloading $uri to $path";
    $http->mirror( $uri, $path );
}
use Archive::Extract;
my $ae = Archive::Extract->new( archive => $model_archive_path );
die "Could not extract archive" unless $ae->extract( to => $model_base );
my $saved_model = path($model_base)->child('saved_model.pb');
say "Saved model is in $saved_model" if -f $saved_model;
We need to download the COCO 2017 labels and parse out the mapping from the numeric index to the textual description.
# Get the labels
my $response = $http->get('https://raw.githubusercontent.com/tensorflow/models/a4944a57ad2811e1f6a7a87589a9fc8a776e8d3c/object_detection/data/mscoco_label_map.pbtxt');
my %labels_map = $response->{content} =~ m<
    (?:item \s+ \{ \s+
        \Qname:\E \s+ "[^"]+" \s+
        \Qid:\E \s+ (\d+) \s+
        \Qdisplay_name:\E \s+ "([^"]+)" \s+
    })+
>sgx;
my $label_count = List::Util::max keys %labels_map;
say "We have a label count of $label_count. These labels include: ",
join ", ", List::Util::head( 5, @labels_map{ sort keys %labels_map } );
We define the tag set [ 'serve' ] which we will use to load the model.
my @tags = ( 'serve' );
We can examine the computations contained in the graph, in terms of the names of the inputs and outputs of the operations found in the graph, by running saved_model_cli.
if( File::Which::which('saved_model_cli') ) {
    local $ENV{TF_CPP_MIN_LOG_LEVEL} = 3; # quiet the TensorFlow logger for the following command
    system(qw(saved_model_cli show),
        qw(--dir)           => $model_base,
        qw(--tag_set)       => join(',', @tags),
        qw(--signature_def) => 'serving_default'
    ) == 0 or die "Could not run saved_model_cli";
} else {
    say "Install the tensorflow Python package to get the `saved_model_cli` command.";
}
The above saved_model_cli output shows that the model input is at serving_default_input_tensor:0, meaning output index 0 of the operation named serving_default_input_tensor, and that there are multiple outputs with different shapes.
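As an aside, the operation:index notation can be split mechanically. Here is a minimal sketch; split_output_name is a hypothetical helper for illustration, not part of the library.

# Hypothetical helper: split TensorFlow's "operation:index" notation.
sub split_output_name {
    my ($name) = @_;
    my ($op_name, $index) = split /:/, $name;
    return ( $op_name, $index // 0 );  # index defaults to 0 when omitted
}
my ($op_name, $index) = split_output_name('serving_default_input_tensor:0');
# $op_name eq 'serving_default_input_tensor', $index == 0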
Per the model description on TensorFlow Hub:

Inputs

A three-channel image of variable size - the model does NOT support batching. The input tensor is a tf.uint8 tensor with shape [1, height, width, 3] with values in [0, 255].

Outputs

The output dictionary contains:

num_detections: a tf.int tensor with only one value, the number of detections [N].
detection_boxes: a tf.float32 tensor of shape [N, 4] containing bounding box coordinates in the following order: [ymin, xmin, ymax, xmax].
detection_classes: a tf.int tensor of shape [N] containing detection class index from the label file.
detection_scores: a tf.float32 tensor of shape [N] containing detection scores.
Note that the above documentation has two errors: both num_detections and detection_classes are not of type tf.int, but are actually tf.float32.
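Since the class IDs arrive as floats, one option is to round them to integers before the label lookup. A minimal sketch follows; $float_classes is a hypothetical stand-in for the detection_classes ndarray we extract later.

# Hypothetical sketch: round float class IDs to integer keys.
my @class_ids   = map { int($_ + 0.5) } $float_classes->list;
my @class_names = @labels_map{ @class_ids };
# The code below instead relies on whole-number floats stringifying
# exactly (e.g. 18 becomes "18"), which also works as a hash key.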
Now we can load the model from that folder with the tag set [ 'serve' ] by using the LoadFromSavedModel constructor to create a ::Graph and a ::Session for that graph.
my $opt = AI::TensorFlow::Libtensorflow::SessionOptions->New;
my $graph = AI::TensorFlow::Libtensorflow::Graph->New;
my $session = AI::TensorFlow::Libtensorflow::Session->LoadFromSavedModel(
$opt, undef, $model_base, \@tags, $graph, undef, $s
);
AssertOK($s);
So let's use the names from the saved_model_cli output to create our ::Output ArrayRefs.
my %ops = (
    in  => {
        op   => $graph->OperationByName('serving_default_input_tensor'),
        dict => {
            input_tensor => 0,
        }
    },
    out => {
        op   => $graph->OperationByName('StatefulPartitionedCall'),
        dict => {
            detection_boxes   => 0,
            detection_classes => 1,
            detection_scores  => 2,
            num_detections    => 3,
        }
    },
);

my %outputs;
%outputs = map {
    my $put_type  = $_;
    my $op        = $ops{$put_type}{op};
    my $port_dict = $ops{$put_type}{dict};

    $put_type => +{
        map {
            my $dict_key = $_;
            my $index    = $port_dict->{$_};
            $dict_key => AI::TensorFlow::Libtensorflow::Output->New( {
                oper  => $op,
                index => $index,
            });
        } keys %$port_dict
    }
} keys %ops;
p %outputs;
Now we can fetch the following test image from GitHub.
use HTML::Tiny;

my %images_for_test_to_uri = (
    "beach_scene" => 'https://github.com/tensorflow/models/blob/master/research/object_detection/test_images/image2.jpg?raw=true',
);
my @image_names = sort keys %images_for_test_to_uri;
my $h = HTML::Tiny->new;

my $image_name = 'beach_scene';
if( IN_IPERL ) {
    IPerl->html(
        $h->a( { href => $images_for_test_to_uri{$image_name} },
            $h->img({
                src   => $images_for_test_to_uri{$image_name},
                alt   => $image_name,
                width => '100%',
            })
        ),
    );
}
We now fetch the image and prepare it to be in the needed format by using Imager. Note that this model does not need the input image to be of a certain size, so no resizing or padding is required.

Then we turn the Imager data into a PDL ndarray. Since we just need the 3 channels of the image as they are, they can be stored directly in a PDL ndarray of type byte.

The reason we need to concatenate the PDL ndarrays here, despite the model only taking a single image at a time, is to get an ndarray with four (4) dimensions where the last PDL dimension has size one (1).
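To make the dimension bookkeeping concrete, here is a minimal sketch with dummy sizes; the 64x48 image is illustrative, not the real test image.

# Illustrative only: dummy sizes, not the real image.
my $dummy_image = zeros(byte, 3, 64, 48);   # PDL dims [channel, width, height]
my $dummy_batch = cat($dummy_image);        # PDL dims [channel, width, height, 1]
# Reversed for the TFTensor this becomes shape [1, 48, 64, 3],
# i.e. [batch, height, width, channels], which is what the model expects.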
sub load_image_to_pdl {
    my ($uri, $image_size) = @_;

    my $http = HTTP::Tiny->new;
    my $response = $http->get( $uri );
    die "Could not fetch image from $uri" unless $response->{success};
    say "Downloaded $uri";

    my $img = Imager->new;
    $img->read( data => $response->{content} )
        or die "Could not read image: " . $img->errstr;

    # Create PDL ndarray from Imager data in-memory.
    my $data;
    $img->write( data => \$data, type => 'raw' )
        or die "could not write ". $img->errstr;

    die "Image does not have 3 channels, it has @{[ $img->getchannels ]} channels"
        if $img->getchannels != 3;

    # $data is packed as PDL->dims == [w,h] with RGB pixels
    my $pdl_raw = zeros(byte, $img->getchannels, $img->getwidth, $img->getheight);
    ${ $pdl_raw->get_dataref } = $data;
    $pdl_raw->upd_data;

    $pdl_raw;
}
my @pdl_images = map {
    load_image_to_pdl(
        $images_for_test_to_uri{$_},
        $model_name_to_params{$model_name}{image_size}
    );
} ($image_names[0]);
my $pdl_image_batched = cat(@pdl_images);
my $t = Uint8PDLTOTFTensor($pdl_image_batched);
die "There should be 4 dimensions" unless $pdl_image_batched->ndims == 4;
die "With the final dimension of length 1" unless $pdl_image_batched->dim(3) == 1;
p $pdl_image_batched;
p $t;
We can use the Run method to run the session and get the multiple output TFTensors. The following uses the names in the %outputs mapping to help process the multiple outputs more easily.
my $RunSession = sub {
    my ($session, $t) = @_;
    my @outputs_t;

    my @keys   = keys %{ $outputs{out} };
    my @values = $outputs{out}->@{ @keys };
    $session->Run(
        undef,
        [ values %{ $outputs{in} } ], [ $t ],
        \@values, \@outputs_t,
        undef,
        undef,
        $s
    );
    AssertOK($s);

    return { mesh \@keys, \@outputs_t };
};
undef;
my $tftensor_output_by_name = $RunSession->($session, $t);
my %pdl_output_by_name = map {
    $_ => FloatTFTensorToPDL( $tftensor_output_by_name->{$_} )
} keys $tftensor_output_by_name->%*;
undef;
Then we use a score threshold to select the objects of interest.
my $min_score_thresh = 0.30;
my $which_detect = which( $pdl_output_by_name{detection_scores} > $min_score_thresh );
my %subset;
$subset{detection_boxes} = $pdl_output_by_name{detection_boxes}->dice('X', $which_detect);
$subset{detection_classes} = $pdl_output_by_name{detection_classes}->dice($which_detect);
$subset{detection_scores} = $pdl_output_by_name{detection_scores}->dice($which_detect);
$subset{detection_class_labels}->@* = map { $labels_map{$_} } $subset{detection_classes}->list;
p %subset;
The following uses the bounding boxes and class label information to draw boxes and labels on top of the image using Gnuplot.
use PDL::Graphics::Gnuplot;
my $plot_output_path = 'objects-detected.png';
my $gp = gpwin('pngcairo', font => ",12", output => $plot_output_path, aa => 2, size => [10] );
my @qual_cmap = ('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6');
$gp->options(
    map {
        my $idx = $_;
        my $lc_rgb = $qual_cmap[ $subset{detection_classes}->slice("($idx)")->squeeze % @qual_cmap ];

        my $box_corners_yx_norm = $subset{detection_boxes}->slice([],$idx,[0,0,0]);
        $box_corners_yx_norm->reshape(2,2);

        my $box_corners_yx_img = $box_corners_yx_norm * $pdl_images[0]->shape->slice('-1:-2');

        my $from_xy  = join ",", $box_corners_yx_img->slice('-1:0,(0)')->list;
        my $to_xy    = join ",", $box_corners_yx_img->slice('-1:0,(1)')->list;
        my $label_xy = join ",", $box_corners_yx_img->at(1,1), $box_corners_yx_img->at(0,1);

        (
            [ object => [ "rect" =>
                from => $from_xy, to => $to_xy,
                qq{front fs empty border lc rgb "$lc_rgb" lw 5} ], ],
            [ label  => [
                sprintf("%s: %.1f",
                    $subset{detection_class_labels}[$idx],
                    100*$subset{detection_scores}->at($idx,0) ) =>
                at => $label_xy, 'left',
                offset => 'character 0,-0.25',
                qq{font ",12" boxed front tc rgb "#ffffff"} ], ],
        )
    } 0 .. $subset{detection_boxes}->dim(1)-1
);

$gp->plot(
    topcmds => q{set style textbox opaque fc "#505050f0" noborder},
    square  => 1,
    yrange  => [ $pdl_images[0]->dim(2), 0 ],
    with    => 'image', $pdl_images[0],
);
$gp->close;
IPerl->png( bytestream => path($plot_output_path)->slurp_raw ) if IN_IPERL;
use Filesys::DiskUsage qw/du/;

my $total = du( { 'human-readable' => 1, dereference => 1 },
    $model_archive_path, $model_base );

say "Disk space usage: $total"; undef;