GPflow is built on top of TensorFlow, so it is useful to have some understanding of how TensorFlow works. In particular, TensorFlow's two-stage concept of first building a static compute graph and then executing it for specific input values can cause problems. This notebook aims to help with the most common issues.
import gpflow
import tensorflow as tf
import numpy as np
from gpflow import settings
The following example shows a typical situation in which computation time grows with the number of GPflow objects that have been created.
for n in range(2, 4):
    kernel = gpflow.kernels.RBF(input_dim=1)  # This is a gpflow object with tf.Variables inside
    x = np.random.randn(n, 1)  # gpflow expects rank-2 input matrices, even for D=1
    kxx = kernel.K(x)  # This is a tensor!
Remember, we operate on a TensorFlow graph!
Every time we create (build and compile) a new GPflow object, more tensors are added to the graph. Overwriting the Python variable (in this case, kernel) only changes which object the name refers to; the tensors created earlier remain in the graph.
So, unnecessary expansion of the graph slows down your computation!
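To make the point about rebinding concrete, here is a toy, pure-Python analogy. It is not GPflow or TensorFlow code: ToyGraph, ToyKernel and DEFAULT_GRAPH are hypothetical names that only mimic a global default graph which keeps every node ever registered in it.

```python
# Toy analogy only: none of these names exist in GPflow or TensorFlow.


class ToyGraph:
    """Stands in for a default graph that owns every node added to it."""

    def __init__(self):
        self.nodes = []

    def add(self, name):
        self.nodes.append(name)


DEFAULT_GRAPH = ToyGraph()


class ToyKernel:
    """Registers two 'variables' in the default graph when it is built."""

    def __init__(self):
        DEFAULT_GRAPH.add("variance")
        DEFAULT_GRAPH.add("lengthscales")


sizes = []
for _ in range(3):
    kernel = ToyKernel()  # rebinding `kernel` drops only the Python reference
    sizes.append(len(DEFAULT_GRAPH.nodes))

print(sizes)  # prints [2, 4, 6]
```

The graph size keeps growing even though only one kernel object is reachable from Python at any time, which is the behaviour the loop above runs into.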
The following examples show how to fix the issue (imagine running this code snippet in IPython repeatedly):
for n in range(2, 4):
    gpflow.reset_default_graph_and_session()
    kernel = gpflow.kernels.RBF(1)
    x = np.random.randn(n, 1)
    kxx = kernel.K(x)
Here we simply reset the default graph and session using GPflow's reset_default_graph_and_session() function. In the next example we explicitly build new tf.Graph() and tf.Session() objects:
for n in range(2, 4):
    with tf.Graph().as_default() as graph:
        with tf.Session(graph=graph).as_default():
            kernel = gpflow.kernels.RBF(1)
            x = np.random.randn(n, 1)
            kxx = kernel.K(x)
In the Custom mean functions notebook we show a real-world example of this idea.
np.random.seed(1)
x = np.random.randn(2, 1)
y = np.random.randn(2, 1)
kernel = gpflow.kernels.RBF(1)
model = gpflow.models.GPR(x, y, kernel)
print(model.compute_log_likelihood())
x_new = np.random.randn(100, 1)
y_new = np.random.randn(100, 1)
-2.8766930392437593
We can compute the log-likelihood of the model on different data. Note that we didn't change the original model!
x_tensor = model.X.parameter_tensor
y_tensor = model.Y.parameter_tensor
model.compute_log_likelihood(feed_dict={x_tensor: x_new, y_tensor: y_new})  # the new data is fed for this call only; the model still holds the old data
-140.83017563045192
We can do the same by permanently updating the values of the dataholders.
model.X = x_new
model.Y = y_new
model.compute_log_likelihood()
-140.83017563045192
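The difference between feeding data for a single call and permanently replacing it can be sketched with a toy, pure-Python stand-in. ToyModel and its feed argument are hypothetical names for illustration; they are not part of GPflow and only imitate the behaviour described above.

```python
# Toy analogy only: ToyModel is not a GPflow class.


class ToyModel:
    def __init__(self, x):
        self.x = list(x)  # plays the role of a data holder

    def compute(self, feed=None):
        # A temporary feed overrides the stored data for this call only.
        data = self.x if feed is None else feed
        return sum(data)


toy = ToyModel([1.0, 2.0])

stored_before = toy.compute()          # uses the stored data
fed = toy.compute(feed=[10.0, 20.0])   # temporary override
stored_after = toy.compute()           # stored data is unchanged

toy.x = [10.0, 20.0]                   # permanent update
updated = toy.compute()
```

Feeding leaves the stored data untouched, so stored_before and stored_after agree, while the permanent update makes later calls match the fed result.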
You can pass TensorFlow tensors for any non-trainable parameters of GPflow objects, such as DataHolders.
np.random.seed(1)
kernel = gpflow.kernels.RBF(1)
likelihood = gpflow.likelihoods.Gaussian()
x_tensor = tf.random_normal((100, 1), dtype=settings.float_type)
y_tensor = tf.random_normal((100, 1), dtype=settings.float_type)
z = np.random.randn(10, 1)
model = gpflow.models.SVGP(x_tensor, y_tensor, kern=kernel, likelihood=likelihood, Z=z)
model.compute_log_likelihood()
-196.46001677717464
You can also use TensorFlow variables for trainable objects:
z = tf.Variable(np.random.randn(10, 1))
model = gpflow.models.SVGP(x_tensor, y_tensor, kern=kernel, likelihood=likelihood, Z=z)
However, in this case you have to initialise them manually, before interacting with a model:
session = gpflow.get_default_session()
session.run(z.initializer)
model.compute_log_likelihood()
-193.12198293763578
Sometimes we want to impose a hard-coded structure on the model (for example, if we have a multi-output model where some output dimensions share the same kernel and others don't). Unfortunately we cannot do this after the kernel object is compiled. We have to do it at build time and then manually compile the object.
with gpflow.decors.defer_build():
    kernels = [gpflow.kernels.RBF(1) for _ in range(3)]
    mo_kernels = gpflow.multioutput.kernels.SeparateMixedMok(kernels, W=np.random.randn(3, 4))
    mo_kernels.kernels[0].lengthscales = mo_kernels.kernels[1].lengthscales

mo_kernels.compile()
assert mo_kernels.kernels[0].lengthscales is mo_kernels.kernels[1].lengthscales
The following is an example of bad practice:
x = np.random.randn(100, 1)
y = np.random.randn(100, 1)
model = gpflow.models.GPR(x, y, kernel)
optimizer = gpflow.training.AdamOptimizer()
optimizer.minimize(model, maxiter=2)
# Do something with the model
optimizer.minimize(model, maxiter=2)
The minimize() call creates a number of optimisation tensors. Calling minimize() again adds another set of them to the graph, which is the same graph-growth issue discussed in the first example above.
The correct way of optimising your model without polluting your graph is as follows:
kernel = gpflow.kernels.RBF(1)
x = np.random.randn(100, 1)
y = np.random.randn(100, 1)
model = gpflow.models.GPR(x, y, kernel)
optimizer = gpflow.training.AdamOptimizer()
optimizer_tensor = optimizer.make_optimize_tensor(model)
session = gpflow.get_default_session()
for _ in range(2):
    session.run(optimizer_tensor)
Don't forget to anchor your model to the session after optimisation. Then you can continue working with your model.
model.anchor(session)
Now, if you need to optimise it again, you can reuse the same optimiser tensor.
for _ in range(2):
    session.run(optimizer_tensor)
model.anchor(session)
np.random.seed(1)
x = np.random.randn(100, 1)
y = np.random.randn(100, 1)
kernel = gpflow.kernels.RBF(1)
model = gpflow.models.GPR(x, y, kernel)
optimizer = gpflow.training.AdamOptimizer()
optimizer_tensor = optimizer.make_optimize_tensor(model)
The initial value before optimisation is:
model.kern.lengthscales.value
array(1.)
Let's call one step of the optimisation and check the new value of the parameter.
gpflow.get_default_session().run(optimizer_tensor)
model.kern.lengthscales.value
array(1.)
After optimisation you would expect that the parameters were updated, but they weren't. The trick is that the value property returns a cached NumPy value of a parameter.
You can get the value of the optimised parameter by using the read_value() method, specifying the correct session.
model.kern.lengthscales.read_value(session)
1.0006322362558255
Alternatively, you can anchor(session) your model to the session after the optimisation step. The anchor() method updates the parameters' cache.
NOTE: The anchor(session) method is significantly more time-consuming than read_value(session). Do not call it more often than you need to.
model.anchor(session)
model.kern.lengthscales.value
array(1.00063224)
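The relationship between the cached value, read_value() and anchor() can be sketched with a toy, pure-Python stand-in. ToySession and ToyParameter are hypothetical names; they only imitate the behaviour described above and say nothing about how GPflow implements it.

```python
# Toy analogy only: these classes are not part of GPflow or TensorFlow.


class ToySession:
    """Holds the live values that an optimiser would update."""

    def __init__(self):
        self.values = {}


class ToyParameter:
    def __init__(self, name, value, session):
        self.name = name
        self.value = value            # cached copy, returned by `.value`
        session.values[name] = value  # live copy, owned by the session

    def read_value(self, session):
        # Look up the live value without touching the cache.
        return session.values[self.name]

    def anchor(self, session):
        # Refresh the cache from the live value.
        self.value = session.values[self.name]


toy_session = ToySession()
param = ToyParameter("lengthscales", 1.0, toy_session)

toy_session.values["lengthscales"] = 1.5  # stand-in for one optimisation step

stale = param.value                   # still the cached 1.0
live = param.read_value(toy_session)  # the updated 1.5
param.anchor(toy_session)
refreshed = param.value               # the cache now holds 1.5
```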
kernel = gpflow.kernels.RBF(1)
x = np.random.randn(100, 1)
y = np.random.randn(100, 1)
model = gpflow.models.GPR(x, y, kernel)
from pathlib import Path
filename = "/tmp/gpr.gpflow"
path = Path(filename)
if path.exists():
    path.unlink()
saver = gpflow.saver.Saver()
saver.save(filename, model)
You can load the model into a different graph:
with tf.Graph().as_default() as graph, tf.Session().as_default():
    model_copy = saver.load(filename)
Alternatively, you can load the model into the same session:
ctx_for_loading = gpflow.saver.SaverContext(autocompile=False)
model_copy = saver.load(filename, context=ctx_for_loading)
model_copy.clear()
model_copy.compile()
The difference between the former approach and the latter lies in the TensorFlow name scopes which are used for naming variables. The former approach replicates the instance of the TensorFlow objects (which already exist in the original graph), so we need to load the model into a new graph. The latter approach uses different name scopes for the variables, so you can load the model into the same graph.