Partial Differential Equation and Gaussian Process¶

1 The Spatial Temporal Model¶

A model of post-transcriptional processing is formulated to describe the spatio-temporal Drosophila protein expression data(Becker 2012). Protein production is considered to be linearly dependent on the concentration of mRNA at an earlier time point. The model also allows for diffusion of protein between nuclei and linear protein decay. These processes are dependent on the diffusion parameter and the degradation rate of protein respectively.

\begin{equation} a \frac{\partial ^2 y_{x,t}}{\partial x^2} + b \frac{\partial y_{x,t}}{\partial t} + c y_{x,t}= f_{x,t} \end{equation}

The coefficients a, b and c are unknown. In this study, we use Gaussian process with RBF kernel as a prior over $y_{x,t}$. The multi-output Gaussian process are developed by applying the partial differential operator on the spatial-temporal RBF kernel.

In [2]:

# Copyright (c) 2012, GPy authors (see AUTHORS.txt).
# Licensed under the BSD 3-clause license (see LICENSE.txt)
# model : a*d^2y/dx^2  + b * dy/dt + c * y = f
#  lengthscale for Yt : 
#  lengthscale for Yx : 
# variance for Yt  :  
# variance for Yx  : 

%pylab inline
%config InlineBackend.figure_format = 'svg'
import numpy as np
import pylab as pb
import GPy

Populating the interactive namespace from numpy and matplotlib

The spatiol-temporal multi-output partial differential equation kernel are developed with kernel name "GPy.kern.ODE_st" in GPy. The inputs are one dimension spatial data, one dimension temporal data and one dimension index which is used to indicate f and Y.

In [2]:

KY = GPy.kern.ODE_st(input_dim=3,a=1.,b=1,c=1,variance_Yx=1.,variance_Yt=1.,
    lengthscale_Yx=1.,lengthscale_Yt=1.) 

nd =10
t1 = 10*np.random.rand(nd)[:,None]
x1 = 10*np.random.rand(nd)[:,None]
inx = np.zeros(nd**2)[:,None]

T = np.kron(t1,np.ones(nd)[:,None])
S = np.kron(np.ones(nd)[:,None],x1)

inx[nd**2/2:nd**2]=1
X = np.hstack([T,S,inx])


#Y=np.sin(X[:,0:1])
#Y=np.sin(X[:,0:1])
Y = np.sin(X[:,0:1]) + np.cos(X[:,1:2])#*(1-X[:,1:2]) #+ np.random.randn(16,1)*0.1

Y[nd**2/2:nd**2] =  np.sin(X[nd**2/2:nd**2,0:1]) + np.cos(X[nd**2/2:nd**2,0:1]) + 2*np.cos(X[nd**2/2:nd**2,1:2])

In [3]:

m = GPy.models.GPRegression(X,Y,KY) 

2 Model Output¶

In order to test the kernel, we defined

$y_{x,t} = cos(x) + sin(t)$

$f_{x,t} = sin(t) + cos(t) + 2cos(x)$

and a, b and c are equal to 1. With some arbitray choices of kernel parameters we plot the random field of f and y separatly. The spatial variable can be fixed to certain value. It become easier to compare the true curve and our estimation.

In [4]:

fixX = 5.
xplot = np.linspace(-2,12,200)
m.plot(fixed_inputs=[(2,0)], which_data_rows = slice(0,nd**2/2))
m.plot(fixed_inputs=[(1,fixX),(2,0)], which_data_rows = slice(0,nd**2/2))
pb.plot(xplot,np.cos(fixX) + np.sin(xplot))

m.plot(fixed_inputs=[(2,1)], which_data_rows = slice(nd**2/2,nd**2))
m.plot(fixed_inputs=[(1,fixX),(2,1)], which_data_rows = slice(nd**2/2,nd**2))
pb.plot(xplot,np.cos(xplot) + np.sin(xplot) + 2*np.cos(fixX))

Out[4]:

[<matplotlib.lines.Line2D at 0x48d6f10>]

In [5]:

m.optimize()

Warning: adding jitter of 2.3116375706e+39
Warning: adding jitter of 8.3803679219e+35
Warning: adding jitter of 3.4977999999e-03
Warning: adding jitter of 1.1815217911e+32

In [7]:

print m

  GP_regression.           |        Value        |  Constraint  |  Prior  |  Tied to
  ode_st.a                 |      1.00220487586  |     +ve      |         |         
  ode_st.b                 |      1.00058344853  |     +ve      |         |         
  ode_st.c                 |       0.9977523428  |     +ve      |         |         
  ode_st.variance_Yt       |      1.80961000578  |     +ve      |         |         
  ode_st.variance_Yx       |      1.80961000578  |     +ve      |         |         
  ode_st.lengthscale_Yt    |      7.30175140468  |     +ve      |         |         
  ode_st.lengthscale_Yx    |      7.92072957612  |     +ve      |         |         
  Gaussian_noise.variance  |  1.12525055876e-09  |     +ve      |         |

After optimization, the estimated value of a, b and c are printed. It is quite close to the true parameters we used to generate the testing data. The plot of the random fields are plotted below.

In [6]:

fixX = 5.
xplot = np.linspace(-2,12,200)
m.plot(fixed_inputs=[(2,0)], which_data_rows = slice(0,nd**2/2))
m.plot(fixed_inputs=[(1,fixX),(2,0)], which_data_rows = slice(0,nd**2/2))
pb.plot(xplot,np.cos(fixX) + np.sin(xplot))

m.plot(fixed_inputs=[(2,1)], which_data_rows = slice(nd**2/2,nd**2))
m.plot(fixed_inputs=[(1,fixX),(2,1)], which_data_rows = slice(nd**2/2,nd**2))
pb.plot(xplot,np.cos(xplot) + np.sin(xplot) + 2*np.cos(fixX))

/home/mu/GPy/GPy/likelihoods/gaussian.py:96: RuntimeWarning: invalid value encountered in sqrt
  return  [stats.norm.ppf(q/100.)*np.sqrt(var) + mu for q in quantiles]

Out[6]:

[<matplotlib.lines.Line2D at 0x471aa50>]

The true random field is plotted below

In [3]:

xplot = np.linspace(-1,10,200)
#pb.plot(xplot,np.sin(xplot))
PP = np.zeros((xplot.shape[0], xplot.shape[0]))
QQ = np.zeros((xplot.shape[0], xplot.shape[0]))

for i in range(0,200):
    for j in range(0,200):
        PP[i,j] = np.sin(xplot[i]) + np.cos(xplot[j])
        QQ[i,j] = np.sin(xplot[i]) + np.cos(xplot[i]) + 2*np.cos(xplot[j])

pb.figure()
pb.imshow(PP)

Out[3]:

<matplotlib.image.AxesImage at 0x4ea5550>

In [ ]: