In this notebook, we introduce Maximum a Posteriori (MAP) estimation, which extends Maximum Likelihood Estimation (MLE) by including a prior $P(\theta)$ in the cost function. To include this prior information, we construct a Bayesian posterior using Bayes' Theorem, given as,
$$ P(\theta|D) = \frac{P(D|\theta)P(\theta)}{P(D)} $$
where,
$P(\theta|D)$ represents the posterior and can be read as "the probability of the parameters $(\theta)$ given the data $(D)$",
$P(D|\theta)$ is the probability of the data given the parameters, commonly called the likelihood,
$P(\theta)$ represents the probability of the parameters, commonly called the prior,
$P(D)$ is the probability of the data and is commonly called the marginal probability.
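The MAP estimate is then the parameter vector that maximises this posterior,
$$ \hat{\theta}_{\mathrm{MAP}} = \underset{\theta}{\arg\max}\ P(\theta|D) $$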
However, as the marginal probability is often difficult to compute and acts only as a normalisation constant, in the case of MAP this term is forgone and the proportional posterior is optimised instead. This is given as,
$$ P(\theta|D) \propto P(D|\theta)P(\theta) $$
Before we begin, we need to ensure that we have all the necessary tools. We will install and import PyBOP as well as upgrade dependencies. We also fix the random seed in order to generate consistent output during development, although this does not need to be done in practice.
%pip install --upgrade pip ipywidgets -q
%pip install pybop -q
import time
import numpy as np
import pybop
pybop.PlotlyManager().pio.renderers.default = "notebook_connected"
np.random.seed(8)
Note: you may need to restart the kernel to use updated packages.
To demonstrate the MAP process, we will first need a forward model and data to run parameter inference on. As we are introducing this as a simple example, we will use the PyBOP forward model with white noise as the reference. This requires defining a parameter set and the model itself.
parameter_set = pybop.ParameterSet.pybamm("Chen2020")
model = pybop.lithium_ion.SPM(parameter_set=parameter_set)
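The single particle model (SPM) is used here for speed, but other PyBOP forward models can be constructed in the same way. As a minimal sketch (not used in the remainder of this notebook, and assuming the SPMe class is available in your installed PyBOP version):
# Illustrative only: an alternative forward model constructed in the same way
alternative_model = pybop.lithium_ion.SPMe(parameter_set=parameter_set)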
We can now simulate the model using the model.predict method. This method is a light wrapper around the pybamm.Simulation class and can be used as such. For this example, we use the default current function for the Chen2020 parameter set (5 A) to generate the voltage data. As the goal is to investigate the MAP method, we will generate a range of observations from the forward model.
n = 8 # Number of time-series trajectories
observations = [
2**j for j in range(1, n + 1)
] # Number of observations in each trajectory
values = []
for i in observations:
    t_eval = np.linspace(0, 10, i)
    values.append(model.predict(t_eval=t_eval))  # predict and store
print(f"Number of observations in each trajectory: {observations}")
Number of observations in each trajectory: [2, 4, 8, 16, 32, 64, 128, 256]
To make the parameter inference more realistic, we add Gaussian noise with zero mean to the data. While this doesn't truly represent the challenge of parameter inference with experimental data, it does ensure the cost landscape curvature isn't perfect. For a more realistic representation of experimental data, a different noise function could be used.
sigma = 0.005
corrupt_values = values[1]["Voltage [V]"].data + np.random.normal(
0, sigma, len(values[1]["Voltage [V]"].data)
)
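As a minimal sketch of the alternative noise functions mentioned above, the white noise could be combined with, for example, a slow linear drift. The drift magnitude is arbitrary and the result is not used in the remainder of the notebook; a separate random generator is used so the global random state set above is unaffected.
# Illustrative only: white noise plus a slow linear drift (arbitrary magnitude)
rng = np.random.default_rng(1)
n_points = len(values[1]["Voltage [V]"].data)
drift = np.linspace(0, 0.002, n_points)  # 2 mV drift across the trajectory
corrupt_values_drift = (
    values[1]["Voltage [V]"].data + rng.normal(0, sigma, n_points) + drift
)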
The reference trajectory needs to be included in the optimisation task, which is handled within the Dataset class. In this situation, this class is composed of the time, current, and the noisy voltage data; however, if we were performing parameter inference from a different measured signal, such as 'Cell Temperature', that signal would need to be included instead.
dataset = pybop.Dataset(
{
"Time [s]": values[1]["Time [s]"].data,
"Current function [A]": values[1]["Current [A]"].data,
"Voltage [V]": corrupt_values,
}
)
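For illustration, an additional measured signal would be added as another entry in the same dictionary. The sketch below uses a synthetic constant temperature and a hypothetical key name, which would need to match the signal requested by the fitting problem; it is not used in the remainder of the notebook.
# Illustrative only: a dataset with an additional (synthetic) measured signal
temperature_data = 298.15 * np.ones_like(values[1]["Time [s]"].data)  # hypothetical measurement
dataset_with_temperature = pybop.Dataset(
    {
        "Time [s]": values[1]["Time [s]"].data,
        "Current function [A]": values[1]["Current [A]"].data,
        "Voltage [V]": corrupt_values,
        "Cell temperature [K]": temperature_data,  # hypothetical key name
    }
)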
Next, we need to select the forward model parameters for inference. The PyBOP Parameters class composes as many individual PyBOP Parameter objects as the user wants (whether these parameters can be identified is left out of this example). This class requires the parameter name, which must resolve to a parameter within the ParameterSet defined above. Additionally, this class can accept an initial_value, which will be used by the optimiser, as well as bounds. For this example, we will provide a prior to each parameter, which will be used later by the MAP process.
parameters = pybop.Parameters(
pybop.Parameter(
"Negative particle radius [m]",
prior=pybop.Gaussian(4e-6, 1e-6),
true_value=parameter_set["Negative particle radius [m]"],
),
pybop.Parameter(
"Positive particle radius [m]",
prior=pybop.Gaussian(5e-6, 1e-6),
true_value=parameter_set["Positive particle radius [m]"],
),
)
Default bounds applied based on prior distribution.
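For reference, the initial_value and bounds arguments mentioned above can be supplied in the same way. A minimal sketch follows; the values are illustrative and this parameter is not used in the remainder of the notebook.
# Illustrative only: a parameter with an explicit starting point and bounds
example_parameter = pybop.Parameter(
    "Negative particle radius [m]",
    prior=pybop.Gaussian(4e-6, 1e-6),
    initial_value=4e-6,
    bounds=[1e-6, 9e-6],
)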
With the dataset and parameters defined, we can now construct the FittingProblem, which composes the model, parameters, and dataset, providing a single class with the required information for simulating and assessing the forward model.
As described in the introduction to this notebook, the MAP method uses the non-normalised posterior for optimisation. This is defined in PyBOP as the LogPosterior class, which has arguments for the likelihood and prior functions. If a prior is not provided, the parameter priors are used by default. In this example, we will use a GaussianLogLikelihoodKnownSigma likelihood function and the default priors set above. For numerical reasons, we optimise the log of the posterior; however, this does not affect the final results.
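In log form, this corresponds to maximising the sum of the log-likelihood and the log-prior, with the marginal probability contributing only an additive constant,
$$ \log P(\theta|D) = \log P(D|\theta) + \log P(\theta) + \text{const.} $$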
problem = pybop.FittingProblem(model, parameters, dataset)
likelihood = pybop.GaussianLogLikelihoodKnownSigma(problem, sigma0=sigma)
posterior = pybop.LogPosterior(likelihood)
Next, we investigate the individual components of the posterior. The LogPosterior class provides the prior and likelihood as attributes. To investigate the contributions of each to the posterior, we plot their landscapes across a selected parameter range.
steps = 10 # Number of discretisation points
bounds = np.asarray([[1e-6, 9e-6], [1e-6, 9e-6]])
pybop.plot2d(posterior.prior, bounds=bounds, steps=steps, title="Log Prior")
pybop.plot2d(posterior.likelihood, bounds=bounds, steps=steps, title="Log Likelihood")
pybop.plot2d(posterior, bounds=bounds, steps=steps, title="Log Posterior");
As expected, the prior represents a two-dimensional Gaussian distribution with a mode at $[4\times10^{-6}, 5\times10^{-6}]$. The likelihood has a banded shape, with a ridge of optimal points traversing the higher parameter values. Finally, the posterior forms the combination of the two. This is the benefit of the MAP process, as it allows previous information to be included in the parameter inference task. Prior knowledge is encapsulated within the prior function and influences the posterior, depending on the magnitude of the likelihood function.
To show how this is used within a PyBOP optimisation task, we select the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) optimiser and run the optimisation. We can then plot the parameter trajectories to investigate how the optimiser performed.
optim = pybop.CMAES(
posterior,
min_iterations=20,
max_iterations=100,
)
start_time = time.time()
x, final_cost = optim.run()
print(f"Inferred Parameters: {x} in {time.time() - start_time} seconds")
print(f"True Parameters: {parameters.true_value()}")
pybop.plot_parameters(optim);
Inferred Parameters: [5.35425483e-06 5.58415228e-06] in 8.549633979797363 seconds
True Parameters: [5.86e-06 5.22e-06]
As expected, the optimisation process returns parameter estimates close to the true values. In this case, as the synthetic data is drawn from the forward model (with additive noise), the underlying structure is near-identical. This gives the optimisation process a very well-posed landscape, and as such it finds the correct parameter values.
This is not always the case, especially in inference tasks with low-quality data, sloppy parameters, or poor excitation. In these cases, the prior influence can help 'steer' the optimisation process towards the combination of the likelihood and the user's prior knowledge of the parameters.
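CMA-ES is only one choice; the posterior can be passed to other PyBOP optimisers through the same interface. A minimal sketch, assuming the XNES optimiser is available in your installed PyBOP version:
# Illustrative only: the same posterior optimised with a different algorithm
optim_alt = pybop.XNES(posterior, max_iterations=100)
x_alt, final_cost_alt = optim_alt.run()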
We've seen above that the proportional posterior can be represented from its components, the log-likelihood and log-prior. Next, to better understand when each of these terms can become dominant within the parameter inference problem, we vary the number of measurement observations (i.e. the number of samples in the dataset) and inspect the constructed posterior.
This is completed below with an increasing series of $2^n$, where $n$ is set above in our original creation of the trajectories. This will give us a visual representation of how the posterior changes with increasing observations.
for i, val in enumerate(values):  # Loop through trajectories
    dataset = pybop.Dataset(
        {
            "Time [s]": val["Time [s]"].data,
            "Current function [A]": val["Current [A]"].data,
            "Voltage [V]": val["Voltage [V]"].data
            + np.random.normal(0, sigma, len(val["Voltage [V]"].data)),
        }
    )
    problem = pybop.FittingProblem(model, parameters, dataset)
    likelihood = pybop.GaussianLogLikelihoodKnownSigma(problem, sigma0=sigma)
    posterior = pybop.LogPosterior(likelihood)
    pybop.plot2d(
        posterior,
        bounds=bounds,
        steps=steps,
        title=f"Posterior with {observations[i]} observations",
    )
Fixed Sigma for output 1: 0.005
Fixed Sigma for output 1: 0.005
Fixed Sigma for output 1: 0.005
Fixed Sigma for output 1: 0.005
Fixed Sigma for output 1: 0.005
Fixed Sigma for output 1: 0.005
Fixed Sigma for output 1: 0.005
Fixed Sigma for output 1: 0.005
The above contour plots showcase the decaying influence of the prior on the posterior as the number of observations increases. This is expected, as the effect of prior knowledge on the posterior should be reduced as additional high-fidelity observations are obtained. In the case of lower-quality or noisy data, the prior can maintain its influence.
The influence of the prior and likelihood on the posterior should be investigated during a parameter inference process, as presented above. This provides insight into how much influence prior knowledge has on the optimisation task, whether the likelihood function is well posed with smooth curvature, and the overall scale of the posterior.
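As a quick sketch of such a check, the components can also be evaluated numerically at a single candidate parameter vector, assuming they accept a parameter vector directly as they do when plotted above (the point chosen here is illustrative):
# Illustrative only: evaluating the posterior components at one candidate point
x_check = np.asarray([5e-6, 5e-6])
print(f"Log prior: {posterior.prior(x_check)}")
print(f"Log likelihood: {posterior.likelihood(x_check)}")
print(f"Log posterior: {posterior(x_check)}")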
This notebook illustrates the process of parameter inference with the Maximum a Posteriori method. This process enables the encapsulation of prior knowledge within the optimisation process, with the prior's influence decaying as observations of the system increase. This influence decay has been presented above across observations drawn from the set $\{2^n \mid n \in \mathbb{N}\}$.