In this notebook you will serve your first TensorFlow model with TensorFlow Serving. We will start by building a very simple model that learns the relationship
$$ y = 2x - 1 $$
from a few pairs of numbers. After training the model, we will serve it with TensorFlow Serving, and then we will make inference requests.
Note: This notebook is designed to be run in Google Colab. If you want to run it locally or in a Jupyter notebook, you will need to make minor changes and remove the Colab-specific code.
We will start by importing TensorFlow along with the standard libraries we will need.
try:
  # The %tensorflow_version magic only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import os
import json
import tempfile
import requests
import numpy as np
import tensorflow as tf
print("\u2022 Using TensorFlow Version:", tf.__version__)
• Using TensorFlow Version: 2.2.0
We will install TensorFlow Serving using APT (the Debian package manager), since Google Colab runs in a Debian environment.
Before we can install TensorFlow Serving, we need to add the `tensorflow-model-server`
package to the list of packages that APT knows about. Note that we're running as root, so no `sudo` is needed.
Note: This notebook is running TensorFlow Serving natively, but you can also run it in a Docker container, which is one of the easiest ways to get started using TensorFlow Serving. The Docker Engine is available for a variety of Linux platforms, Windows, and Mac.
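For reference, the Docker route on your own machine would look roughly like the lines below. This is only a sketch: it will not run inside Colab, and `/path/to/saved_models` is a placeholder for a directory containing the numbered version sub-directories of your SavedModel.
# docker pull tensorflow/serving
# docker run -p 8501:8501 \
#   --mount type=bind,source=/path/to/saved_models,target=/models/test \
#   -e MODEL_NAME=test -t tensorflow/serving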
# This is the same as what you would do from your command line, but without [arch=amd64]
# and without sudo, since Colab already runs as root. On your own machine you would instead do:
# echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
# curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
!apt update
deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 2943 100 2943 0 0 5181 0 --:--:-- --:--:-- --:--:-- 5181 OK Hit:1 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease Hit:2 http://archive.ubuntu.com/ubuntu bionic InRelease Get:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB] Get:4 http://ppa.launchpad.net/marutter/c2d4u3.5/ubuntu bionic InRelease [15.4 kB] Get:5 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease [3,626 B] Get:6 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB] Get:7 http://storage.googleapis.com/tensorflow-serving-apt stable InRelease [3,012 B] Get:8 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB] Ign:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease Ign:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease Hit:11 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 Release Hit:12 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release Get:13 http://ppa.launchpad.net/marutter/c2d4u3.5/ubuntu bionic/main Sources [1,816 kB] Get:14 http://ppa.launchpad.net/marutter/c2d4u3.5/ubuntu bionic/main amd64 Packages [876 kB] Get:15 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [1,376 kB] Get:16 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [1,207 kB] Get:17 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server amd64 Packages [354 B] Get:19 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server-universal amd64 Packages [361 B] Get:21 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [912 kB] Get:22 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [846 kB] Fetched 7,307 kB in 2s (2,975 kB/s) Reading package lists... Done Building dependency tree Reading state information... Done 39 packages can be upgraded. Run 'apt list --upgradable' to see them.
Now that the package lists have been updated, we can use the `apt-get`
command to install the TensorFlow model server.
!apt-get install tensorflow-model-server
Reading package lists... Done Building dependency tree Reading state information... Done The following NEW packages will be installed: tensorflow-model-server 0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded. Need to get 175 MB of archives. After this operation, 0 B of additional disk space will be used. Get:1 http://storage.googleapis.com/tensorflow-serving-apt stable/tensorflow-model-server amd64 tensorflow-model-server all 2.1.0 [175 MB] Fetched 175 MB in 3s (55.7 MB/s) Selecting previously unselected package tensorflow-model-server. (Reading database ... 144433 files and directories currently installed.) Preparing to unpack .../tensorflow-model-server_2.1.0_all.deb ... Unpacking tensorflow-model-server (2.1.0) ... Setting up tensorflow-model-server (2.1.0) ...
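As an optional check, we can verify that the `tensorflow_model_server` binary is now available on the `PATH`; this should print the location of the installed binary.
!which tensorflow_model_server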
Now, we will create a simple dataset that expresses the relationship
$$ y = 2x - 1 $$
between inputs (`xs`) and outputs (`ys`).
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)
We'll use the simplest possible model for this example: a single Dense layer with one unit. Since we are going to train our model for 500 epochs, we will pass the argument `verbose=0` to the `fit` method to avoid cluttering the screen. The verbosity mode can be:

- `0`: silent.
- `1`: progress bar.
- `2`: one line per epoch.

As a side note, since the progress bar is not particularly useful when logged to a file, `verbose=2` is recommended when not running interactively (e.g., in a production environment).
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd',
loss='mean_squared_error')
history = model.fit(xs, ys, epochs=500, verbose=0)
print("Finished training the model")
Finished training the model
Now that the model is trained, we can test it. If we give it the value `5`, we should get a value very close to `9`.
print(model.predict([[5.0]]))
[[8.994236]]
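Since the network is just a single Dense unit, we can also sanity-check what it has learned by inspecting its kernel and bias directly; they should be close to 2 and -1, respectively. A minimal check using the standard Keras API:
# The single Dense layer holds one weight (kernel) and one bias.
weights, bias = model.get_weights()
print("learned weight: {:.4f}, learned bias: {:.4f}".format(weights[0][0], bias[0]))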
To load the trained model into TensorFlow Serving, we first need to save it in the SavedModel format. This will create a protobuf file in a well-defined directory hierarchy, and will include a version number. TensorFlow Serving allows us to select which version of a model, or "servable", we want to use when we make inference requests. Each version will be exported to a different sub-directory under the given path.
MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
if os.path.isdir(export_path):
print('\nAlready saved a model, cleaning up\n')
!rm -r {export_path}
model.save(export_path, save_format="tf")
print('\nexport_path = {}'.format(export_path))
!ls -l {export_path}
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version. Instructions for updating: If using Keras pass *_constraint arguments to layers. INFO:tensorflow:Assets written to: /tmp/1/assets export_path = /tmp/1 total 48 drwxr-xr-x 2 root root 4096 May 16 05:12 assets -rw-r--r-- 1 root root 39128 May 16 05:12 saved_model.pb drwxr-xr-x 2 root root 4096 May 16 05:12 variables
We'll use the command-line utility `saved_model_cli` to look at the `MetaGraphDefs` and `SignatureDefs` in our SavedModel. The signature definition is defined by the input and output tensors, and is stored with the default serving key.
!saved_model_cli show --dir {export_path} --all
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs: signature_def['__saved_model_init_op']: The given SavedModel SignatureDef contains the following input(s): The given SavedModel SignatureDef contains the following output(s): outputs['__saved_model_init_op'] tensor_info: dtype: DT_INVALID shape: unknown_rank name: NoOp Method name is: signature_def['serving_default']: The given SavedModel SignatureDef contains the following input(s): inputs['dense_input'] tensor_info: dtype: DT_FLOAT shape: (-1, 1) name: serving_default_dense_input:0 The given SavedModel SignatureDef contains the following output(s): outputs['dense'] tensor_info: dtype: DT_FLOAT shape: (-1, 1) name: StatefulPartitionedCall:0 Method name is: tensorflow/serving/predict WARNING: Logging before flag parsing goes to stderr. W0516 05:12:34.321393 140293225809792 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling __init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version. Instructions for updating: If using Keras pass *_constraint arguments to layers. Defined Functions: Function Name: '__call__' Option #1 Callable with: Argument #1 dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'dense_input') Argument #2 DType: bool Value: True Argument #3 DType: NoneType Value: None Option #2 Callable with: Argument #1 inputs: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'inputs') Argument #2 DType: bool Value: True Argument #3 DType: NoneType Value: None Option #3 Callable with: Argument #1 inputs: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'inputs') Argument #2 DType: bool Value: False Argument #3 DType: NoneType Value: None Option #4 Callable with: Argument #1 dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'dense_input') Argument #2 DType: bool Value: False Argument #3 DType: NoneType Value: None Function Name: '_default_save_signature' Option #1 Callable with: Argument #1 dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'dense_input') Function Name: 'call_and_return_all_conditional_losses' Option #1 Callable with: Argument #1 inputs: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'inputs') Argument #2 DType: bool Value: False Argument #3 DType: NoneType Value: None Option #2 Callable with: Argument #1 inputs: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'inputs') Argument #2 DType: bool Value: True Argument #3 DType: NoneType Value: None Option #3 Callable with: Argument #1 dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'dense_input') Argument #2 DType: bool Value: False Argument #3 DType: NoneType Value: None Option #4 Callable with: Argument #1 dense_input: TensorSpec(shape=(None, 1), dtype=tf.float32, name=u'dense_input') Argument #2 DType: bool Value: True Argument #3 DType: NoneType Value: None
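If you prefer to stay in Python, you can also load the SavedModel back with `tf.saved_model.load` and inspect its serving signature directly. A minimal sketch of the same check:
# Load the SavedModel and list its available signatures.
loaded = tf.saved_model.load(export_path)
print(list(loaded.signatures.keys()))

# Inspect the input and output specs of the default serving signature.
infer = loaded.signatures['serving_default']
print(infer.structured_input_signature)
print(infer.structured_outputs)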
We will now launch the TensorFlow model server with a bash script. We will use the `--bg` option of the `%%bash` magic to run the script in the background.

Our script will start TensorFlow Serving and load our model. Here are the parameters we will use:

- `rest_api_port`: the port that you'll use for REST requests.
- `model_name`: you'll use this in the URL of your requests. It can be anything.
- `model_base_path`: the path to the directory where you've saved your model.

Also, because the variable that points to the directory containing the model is defined in Python, we need a way to tell the bash script where to find the model. To do this, we will write the value of the Python variable to an environment variable using the `os.environ` dictionary.
os.environ["MODEL_DIR"] = MODEL_DIR
%%bash --bg
nohup tensorflow_model_server \
--rest_api_port=8501 \
--model_name=test \
--model_base_path="${MODEL_DIR}" >server.log 2>&1
Starting job # 0 in a separate thread.
Now we can take a look at the server log.
!tail server.log
2020-05-16 05:12:35.568881: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA 2020-05-16 05:12:35.582063: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle. 2020-05-16 05:12:35.592041: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:152] Running initialization op on SavedModel bundle at path: /tmp/1 2020-05-16 05:12:35.594441: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 26368 microseconds. 2020-05-16 05:12:35.594760: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /tmp/1/assets.extra/tf_serving_warmup_requests 2020-05-16 05:12:35.594866: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: test version: 1} 2020-05-16 05:12:35.595857: I tensorflow_serving/model_servers/server.cc:358] Running gRPC ModelServer at 0.0.0.0:8500 ... [warn] getaddrinfo: address family for nodename not supported 2020-05-16 05:12:35.596419: I tensorflow_serving/model_servers/server.cc:378] Exporting HTTP/REST API at:localhost:8501 ... [evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
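Before making any inference requests, we can also confirm that the servable loaded correctly by querying the REST API's model status endpoint; the response should report the version state as AVAILABLE.
# Query the model status endpoint for the 'test' model.
status_response = requests.get('http://localhost:8501/v1/models/test')
print(status_response.text)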
We are now ready to construct a JSON object with some data so that we can make a couple of inferences. We will use $x=9$ and $x=10$ as our test data.
xs = np.array([[9.0], [10.0]])
data = json.dumps({"signature_name": "serving_default", "instances": xs.tolist()})
print(data)
{"signature_name": "serving_default", "instances": [[9.0], [10.0]]}
Finally, we can make the inference request and get the inferences back. We'll send a predict request as a POST to our server's REST endpoint, and pass it our test data. We'll ask our server to give us the latest version of our model by not specifying a particular version. The response will be a JSON payload containing the predictions.
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/test:predict', data=data, headers=headers)
print(json_response.text)
{ "predictions": [[16.9821], [18.9790649] ] }
We can also look at the predictions directly by extracting the value of the `predictions` key.
predictions = json.loads(json_response.text)['predictions']
print(predictions)
[[16.9821], [18.9790649]]
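As a final sanity check, the values returned by the server should agree with what the in-process model predicts for the same inputs. A small comparison (the tolerance of 1e-3 is an arbitrary choice):
# Compare the server's predictions with the local model's output for xs = [[9.0], [10.0]].
local_predictions = model.predict(xs)
print(local_predictions)
print(np.allclose(local_predictions, np.array(predictions), atol=1e-3))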
You just saw how you can serve a dummy model with TensorFlow Model Server. Let's now see a real model in action.