In this notebook you will serve your first TensorFlow model with TensorFlow Serving. We will start by building a very simple model that learns the relationship
$$ y = 2x - 1 $$
from a few pairs of numbers. After training the model, we will serve it with TensorFlow Serving, and then we will make inference requests.
Note: This notebook is designed to be run in Google Colab. If you want to run it locally or in a Jupyter notebook, you will need to make minor changes and remove the Colab-specific code.
We will start by importing TensorFlow along with the standard libraries we will need.
try:
  # The %tensorflow_version magic only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import os
import json
import tempfile
import requests
import numpy as np
import tensorflow as tf
print("\u2022 Using TensorFlow Version:", tf.__version__)
• Using TensorFlow Version: 2.2.0
We will install TensorFlow Serving using APT (the Debian package manager), since Google Colab runs in a Debian environment.
Before we can install TensorFlow Serving, we need to add the `tensorflow-model-server`
package to the list of packages that APT knows about. Note that we're running as root, so no `sudo` is needed.
Note: This notebook is running TensorFlow Serving natively, but you can also run it in a Docker container, which is one of the easiest ways to get started using TensorFlow Serving. The Docker Engine is available for a variety of Linux platforms, Windows, and Mac.
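For reference, the Docker route on your own machine would look roughly like the lines below. This is only a sketch: it will not run inside Colab, and `/path/to/saved_models` is a placeholder for a directory containing the numbered version sub-directories of your SavedModel.
# docker pull tensorflow/serving
# docker run -p 8501:8501 \
#   --mount type=bind,source=/path/to/saved_models,target=/models/test \
#   -e MODEL_NAME=test -t tensorflow/serving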
# This is the same as what you would do from your command line, but without [arch=amd64]
# and without sudo, since Colab already runs as root. On your own machine you would instead do:
# echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
# curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
!apt update
deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 2943 100 2943 0 0 5181 0 --:--:-- --:--:-- --:--:-- 5181 OK Hit:1 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease Hit:2 http://archive.ubuntu.com/ubuntu bionic InRelease Get:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB] Get:4 http://ppa.launchpad.net/marutter/c2d4u3.5/ubuntu bionic InRelease [15.4 kB] Get:5 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease [3,626 B] Get:6 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB] Get:7 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease [3,012 B] Get:8 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB] Ign:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease Ign:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease Hit:11 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 Release Hit:12 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release Get:13 http://ppa.launchpad.net/marutter/c2d4u3.5/ubuntu bionic/main Sources [1,816 kB] Get:14 http://ppa.launchpad.net/marutter/c2d4u3.5/ubuntu bionic/main amd64 Packages [876 kB] Get:15 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [1,376 kB] Get:16 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [1,207 kB] Get:17 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server amd64 Packages [354 B] Get:19 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server-universal amd64 Packages [361 B] Get:21 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [912 kB] Get:22 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [846 kB] Fetched 7,307 kB in 2s (2,975 kB/s) Reading package lists... Done Building dependency tree Reading state information... Done 39 packages can be upgraded. Run 'apt list --upgradable' to see them.
Now that the package lists have been updated, we can use the `apt-get`
command to install the TensorFlow model server.
!apt-get install tensorflow-model-server
Reading package lists... Done Building dependency tree Reading state information... Done The following NEW packages will be installed: tensorflow-model-server 0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded. Need to get 175 MB of archives. After this operation, 0 B of additional disk space will be used. Get:1 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server amd64 tensorflow-model-server all 2.1.0 [175 MB] Fetched 175 MB in 3s (55.7 MB/s) Selecting previously unselected package tensorflow-model-server. (Reading database ... 144433 files and directories currently installed.) Preparing to unpack .../tensorflow-model-server_2.1.0_all.deb ... Unpacking tensorflow-model-server (2.1.0) ... Setting up tensorflow-model-server (2.1.0) ...
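As an optional check, we can verify that the `tensorflow_model_server` binary is now available on the `PATH`; this should print the location of the installed binary.
!which tensorflow_model_server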
Now, we will create a simple dataset that expresses the relationship
$$ y = 2x - 1 $$
between inputs (`xs`) and outputs (`ys`).
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)
We'll use the simplest possible model for this example: a single Dense layer with one unit. Since we are going to train our model for 500 epochs, we will pass the argument `verbose=0` to the `fit` method to avoid cluttering the screen. The verbosity mode can be:

- `0`: silent.
- `1`: progress bar.
- `2`: one line per epoch.

As a side note, since the progress bar is not particularly useful when logged to a file, `verbose=2` is recommended when not running interactively (e.g., in a production environment).
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd',
loss='mean_squared_error')
history = model.fit(xs, ys, epochs=500, verbose=0)
print("Finished training the model")
Finished training the model
Now that the model is trained, we can test it. If we give it the value `5`, we should get a value very close to `9`.
print(model.predict([[5.0]]))
[[8.994236]]
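Since the network is just a single Dense unit, we can also sanity-check what it has learned by inspecting its kernel and bias directly; they should be close to 2 and -1, respectively. A minimal check using the standard Keras API:
# The single Dense layer holds one weight (kernel) and one bias.
weights, bias = model.get_weights()
print("learned weight: {:.4f}, learned bias: {:.4f}".format(weights[0][0], bias[0]))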
To load the trained model into TensorFlow Serving, we first need to save it in the SavedModel format. This will create a protobuf file in a well-defined directory hierarchy, and will include a version number. TensorFlow Serving allows us to select which version of a model, or "servable", we want to use when we make inference requests. Each version will be exported to a different sub-directory under the given path.
MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
if os.path.isdir(export_path):
print('\nAlready saved a model, cleaning up\n')
!rm -r {export_path}
model.save(export_path, save_format="tf")
print('\nexport_path = {}'.format(export_path))
!ls -l {export_path}
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version. Instructions for updating: If using Keras pass *_constraint arguments to layers. INFO:tensorflow:Assets written to: /tmp/1/assets export_path = /tmp/1 total 48 drwxr-xr-x 2 root root 4096 May 16 05:12 assets -rw-r--r-- 1 root root 39128 May 16 05:12 saved_model.pb drwxr-xr-x 2 root root 4096 May 16 05:12 variables
We'll use the command-line utility `saved_model_cli` to look at the `MetaGraphDefs` and `SignatureDefs` in our SavedModel. The signature definition is defined by the input and output tensors, and is stored with the default serving key.
!saved_model_cli show --dir {export_path} --all
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs: signature_def['__saved_model_init_op']: The given SavedModel SignatureDef contains the following input(s): The given SavedModel SignatureDef contains the following output(s): outputs['__saved_model_init_op'] tensor_info: dtype: DT_INVALID shape: unknown_rank name: NoOp Method name is: signature_def['serving_default']: The given SavedModel SignatureDef contains the following input(s): inputs['dense_input'] tensor_info: dtype: DT_FLOAT shape: (-1, 1) name: serving_default_dense_input:0 The given SavedModel SignatureDef contains the following output(s): outputs['dense'] tensor_info: dtype: DT_FLOAT shape: (-1, 1) name: StatefulPartitionedCall:0 Method name is: tensorflow/serving/predict WARNING: Logging before flag parsing goes to stderr. W0516 05:12:34.321393 140293225809792 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling __init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version. Instructions for updating: If using Keras pass *_constraint arguments to layers. Defined Functions: Function Name: '__call__' Option #1 Callable with: Argument #1 dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'dense_input') Argument #2 DType: bool Value: True Argument #3 DType: NoneType Value: None Option #2 Callable with: Argument #1 inputs: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'inputs') Argument #2 DType: bool Value: True Argument #3 DType: NoneType Value: None Option #3 Callable with: Argument #1 inputs: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'inputs') Argument #2 DType: bool Value: False Argument #3 DType: NoneType Value: None Option #4 Callable with: Argument #1 dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'dense_input') Argument #2 DType: bool Value: False Argument #3 DType: NoneType Value: None Function Name: '_default_save_signature' Option #1 Callable with: Argument #1 dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'dense_input') Function Name: 'call_and_return_all_conditional_losses' Option #1 Callable with: Argument #1 inputs: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'inputs') Argument #2 DType: bool Value: False Argument #3 DType: NoneType Value: None Option #2 Callable with: Argument #1 inputs: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'inputs') Argument #2 DType: bool Value: True Argument #3 DType: NoneType Value: None Option #3 Callable with: Argument #1 dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'dense_input') Argument #2 DType: bool Value: False Argument #3 DType: NoneType Value: None Option #4 Callable with: Argument #1 dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'dense_input') Argument #2 DType: bool Value: True Argument #3 DType: NoneType Value: None
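If you prefer to stay in Python, you can also load the SavedModel back with `tf.saved_model.load` and inspect its serving signature directly. A minimal sketch of the same check:
# Load the SavedModel and list its available signatures.
loaded = tf.saved_model.load(export_path)
print(list(loaded.signatures.keys()))

# Inspect the input and output specs of the default serving signature.
infer = loaded.signatures['serving_default']
print(infer.structured_input_signature)
print(infer.structured_outputs)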
We will now launch the TensorFlow model server with a bash script. We will use the `--bg` option of the `%%bash` magic to run the script in the background.

Our script will start TensorFlow Serving and load our model. Here are the parameters we will use:

- `rest_api_port`: the port that you'll use for REST requests.
- `model_name`: you'll use this in the URL of your requests. It can be anything.
- `model_base_path`: the path to the directory where you've saved your model.

Also, because the variable that points to the directory containing the model is defined in Python, we need a way to tell the bash script where to find the model. To do this, we will write the value of the Python variable to an environment variable using the `os.environ` dictionary.
os.environ["MODEL_DIR"] = MODEL_DIR
%%bash --bg
nohup tensorflow_model_server \
--rest_api_port=8501 \
--model_name=test \
--model_base_path="${MODEL_DIR}" >server.log 2>&1
Starting job # 0 in a separate thread.
Now we can take a look at the server log.
!tail server.log
2020-05-16 05:12:35.568881: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA 2020-05-16 05:12:35.582063: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle. 2020-05-16 05:12:35.592041: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path: /tmp/1 2020-05-16 05:12:35.594441: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 26368 microseconds. 2020-05-16 05:12:35.594760: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /tmp/1/assets.extra/tf_serving_warmup_requests 2020-05-16 05:12:35.594866: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: test version: 1} 2020-05-16 05:12:35.595857: I tensorflow_serving/model_servers/server.cc:358] Running gRPC ModelServer at 0.0.0.0:8500 ... [warn] getaddrinfo: address family for nodename not supported 2020-05-16 05:12:35.596419: I tensorflow_serving/model_servers/server.cc:378] Exporting HTTP/REST API at:localhost:8501 ... [evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
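Before making any inference requests, we can also confirm that the servable loaded correctly by querying the REST API's model status endpoint; the response should report the version state as AVAILABLE.
# Query the model status endpoint for the 'test' model.
status_response = requests.get('http://localhost:8501/v1/models/test')
print(status_response.text)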
We are now ready to construct a JSON object with some data so that we can make a couple of inferences. We will use $x=9$ and $x=10$ as our test data.
xs = np.array([[9.0], [10.0]])
data = json.dumps({"signature_name": "serving_default", "instances": xs.tolist()})
print(data)
{"signature_name": "serving_default", "instances": [[9.0], [10.0]]}
Finally, we can make the inference request and get the inferences back. We'll send a predict request as a POST to our server's REST endpoint, and pass it our test data. We'll ask our server to give us the latest version of our model by not specifying a particular version. The response will be a JSON payload containing the predictions.
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/test:predict', data=data, headers=headers)
print(json_response.text)
{ "predictions": [[16.9821], [18.9790649] ] }
We can also look at the predictions directly by extracting the value of the `predictions` key.
predictions = json.loads(json_response.text)['predictions']
print(predictions)
[[16.9821], [18.9790649]]
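As a final sanity check, the values returned by the server should agree with what the in-process model predicts for the same inputs. A small comparison (the tolerance of 1e-3 is an arbitrary choice):
# Compare the server's predictions with the local model's output for xs = [[9.0], [10.0]].
local_predictions = model.predict(xs)
print(local_predictions)
print(np.allclose(local_predictions, np.array(predictions), atol=1e-3))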
You just saw how you can serve a dummy model with TensorFlow Model Server. Let's now see a real model in action.