This notebook examines why we prefer the Variational Free Energy (VFE) objective to the Fully Independent Training Conditional (FITC) approximation for our sparse approximations.
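For reference, the two objectives differ only in how they handle the residual \(\mathbf{K}_{ff} - \mathbf{Q}_{ff}\), where \(\mathbf{Q}_{ff} = \mathbf{K}_{fu}\mathbf{K}_{uu}^{-1}\mathbf{K}_{uf}\) is the Nyström approximation to the kernel matrix built from the inducing points. VFE subtracts a trace penalty from a Gaussian log-density, while FITC folds the residual's diagonal into the effective noise:

```latex
% VFE: a lower bound on the exact log marginal likelihood (Titsias, 2009)
\mathcal{F}_{\text{VFE}}
  = \log \mathcal{N}\!\bigl(\mathbf{y} \mid \mathbf{0},\, \mathbf{Q}_{ff} + \sigma^2 \mathbf{I}\bigr)
  - \frac{1}{2\sigma^2}\,\mathrm{tr}\bigl(\mathbf{K}_{ff} - \mathbf{Q}_{ff}\bigr)

% FITC: an approximate log marginal likelihood with a heteroscedastic
% effective noise term (Snelson & Ghahramani, 2006)
\mathcal{F}_{\text{FITC}}
  = \log \mathcal{N}\!\bigl(\mathbf{y} \mid \mathbf{0},\,
      \mathbf{Q}_{ff} + \mathrm{diag}\bigl[\mathbf{K}_{ff} - \mathbf{Q}_{ff}\bigr] + \sigma^2 \mathbf{I}\bigr)
```

The diagonal correction in FITC acts as input-dependent noise, which is what allows it to trade off the homoscedastic likelihood variance against the placement of the inducing points.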
import gpflow
from gpflow.test_util import notebook_niter # to speed up automated testing of this notebook
import matplotlib.pyplot as plt
%matplotlib inline
from FITCvsVFE import (getTrainingTestData, printModelParameters, plotPredictions,
repeatMinimization, stretch, plotComparisonFigure)
import logging
logging.disable(logging.WARN) # do not clutter up the notebook with optimisation warnings
First, we load the training data and plot it together with the exact GP solution (using the GPR model):
# Load the training data:
xtrain, ytrain, xtest, ytest = getTrainingTestData()
def getKernel():
    return gpflow.kernels.SquaredExponential(input_dim=1)
# Run exact inference on training data:
exact_model = gpflow.models.GPR(xtrain, ytrain, kern=getKernel())
opt = gpflow.train.ScipyOptimizer(tol=1e-11)
opt.minimize(exact_model, maxiter=notebook_niter(2000000))
print("Exact model parameters:")
printModelParameters(exact_model)
figA, ax = plt.subplots(1,1)
ax.plot(xtrain, ytrain, 'ro')
plotPredictions(ax, exact_model, color='g')
Exact model parameters:
Likelihood variance = 0.074285
Kernel variance = 0.90049
Kernel lengthscale = 0.5825
def initializeHyperparametersFromExactSolution(sparse_model):
    sparse_model.likelihood.variance = exact_model.likelihood.variance.read_value().copy()
    sparse_model.kern.variance = exact_model.kern.variance.read_value().copy()
    sparse_model.kern.lengthscales = exact_model.kern.lengthscales.read_value().copy()
We now construct two sparse models using the VFE (SGPR model) and FITC (GPRFITC model) optimisation objectives, with the inducing points initialised on top of the training inputs, and the model hyperparameters (kernel variance and lengthscale, and likelihood variance) initialised to the values obtained by optimising the exact GPR model:
# Train VFE model initialized from the perfect solution.
VFEmodel = gpflow.models.SGPR(xtrain, ytrain, kern=getKernel(), Z=xtrain.copy())
initializeHyperparametersFromExactSolution(VFEmodel)
VFEcb = repeatMinimization(VFEmodel, xtest, ytest) # optimise with several restarts
print("Sparse model parameters after VFE optimization:")
printModelParameters(VFEmodel)
Sparse model parameters after VFE optimization:
Likelihood variance = 0.074286
Kernel variance = 0.90049
Kernel lengthscale = 0.5825
# Train FITC model initialized from the perfect solution.
FITCmodel = gpflow.models.GPRFITC(xtrain, ytrain, kern=getKernel(), Z=xtrain.copy())
initializeHyperparametersFromExactSolution(FITCmodel)
FITCcb = repeatMinimization(FITCmodel, xtest, ytest) # optimise with several restarts
print("Sparse model parameters after FITC optimization:")
printModelParameters(FITCmodel)
Sparse model parameters after FITC optimization:
Likelihood variance = 0.019311
Kernel variance = 1.3256
Kernel lengthscale = 0.61747
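Note that both models were started from the exact solution, yet only FITC drifted away from it, shrinking the likelihood variance well below the exact value. A minimal NumPy sketch (independent of GPflow; the kernel hyperparameters, inputs, and noise value below are illustrative, not the notebook's training set) shows why initialising Z on the training inputs makes both objectives agree with exact GPR at the start: with Z = X, Q_ff ≈ K_ff, so the VFE trace penalty and FITC's diagonal correction both vanish.

```python
import numpy as np

def rbf(a, b, variance=0.9, lengthscale=0.58):
    # Squared-exponential kernel on 1-D inputs (hyperparameter values
    # loosely echo the exact model's, purely for illustration).
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 6.0, 20))
Z = X.copy()  # inducing points on top of the training inputs

Kff = rbf(X, X)
Kuu = rbf(Z, Z) + 1e-6 * np.eye(len(Z))  # jitter for numerical stability
Kfu = rbf(X, Z)
Qff = Kfu @ np.linalg.solve(Kuu, Kfu.T)  # Nystrom approximation of Kff

noise = 0.074  # roughly the exact model's likelihood variance
vfe_trace_penalty = np.trace(Kff - Qff) / (2.0 * noise)
fitc_diag_correction = np.diag(Kff - Qff)

# Both quantities are ~0 when Z = X, so both objectives start at the
# exact log marginal likelihood; they only diverge once optimisation
# moves the inducing points and hyperparameters.
print("VFE trace penalty:", vfe_trace_penalty)
print("max FITC diagonal correction:", fitc_diag_correction.max())
```

Once the optimiser moves Z away from X, the residual K_ff − Q_ff grows; VFE pays for it through the trace penalty, while FITC can absorb it as extra input-dependent noise, which is the mechanism behind the lower likelihood variance seen above.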
Plotting a comparison of the two algorithms, we see that VFE stays at the optimum of exact GPR, whereas the FITC approximation eventually ends up with several inducing points on top of one another and a worse fit:
figB, axes = plt.subplots(3, 2, figsize=(20, 16))
# VFE optimisation finishes after 10 iterations, so we stretch out the training and test
# log-likelihood traces to make them comparable against FITC:
VFEiters = FITCcb.n_iters
VFElog_likelihoods = stretch(len(VFEiters), VFEcb.log_likelihoods)
VFEhold_out_likelihood = stretch(len(VFEiters), VFEcb.hold_out_likelihood)
axes[0,0].set_title('VFE', loc='center', fontdict = {'fontsize': 22})
plotComparisonFigure(xtrain, VFEmodel, exact_model, axes[0,0], axes[1,0], axes[2,0],
VFEiters, VFElog_likelihoods, VFEhold_out_likelihood)
axes[0,1].set_title('FITC', loc='center', fontdict = {'fontsize': 22})
plotComparisonFigure(xtrain, FITCmodel, exact_model, axes[0,1], axes[1,1], axes[2,1],
FITCcb.n_iters, FITCcb.log_likelihoods, FITCcb.hold_out_likelihood)
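As an aside, the stretch used above can be thought of as resampling a short optimisation trace to a given length so it can be plotted against a longer one. A minimal sketch of that idea (a hypothetical stand-in using linear interpolation; the actual FITCvsVFE.stretch helper may be implemented differently):

```python
import numpy as np

def stretch_sketch(target_len, trace):
    """Resample `trace` to `target_len` points by index scaling.

    A hypothetical stand-in for FITCvsVFE.stretch: maps the trace's
    index range onto `target_len` evenly spaced points and linearly
    interpolates the values, preserving the endpoints.
    """
    trace = np.asarray(trace, dtype=float)
    idx = np.linspace(0.0, len(trace) - 1, target_len)
    return np.interp(idx, np.arange(len(trace)), trace)

short = [0.0, 1.0, 3.0]
stretched = stretch_sketch(5, short)  # endpoints preserved, values interpolated
print(stretched)
```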