This notebook illustrates automatic differentiation, with an application to Hamilton's equations and geodesic shooting. More precisely, we use automatic differentiation with the following combination of properties:
Known bugs and incompatibilities. Our implementation of automatic differentiation is based on subclassing the numpy array class. While simple and powerful, this approach suffers from a few pitfalls, described in the notebook ADBugs.
Summary of volume Algorithmic tools, this series of notebooks.
Main summary of the Adaptive Grid Discretizations book of notebooks, including the other volumes.
Acknowledgement. Some of the experiments presented in these notebooks are part of ongoing research with Ludovic Métivier and Da Chen.
Copyright Jean-Marie Mirebeau, Centre Borelli, ENS Paris-Saclay, CNRS, University Paris-Saclay
import sys; sys.path.insert(0,"..") # Allow importing agd from parent directory
#from Miscellaneous import TocTools; print(TocTools.displayTOC('Dense','Algo'))
import numpy as np
from matplotlib import pyplot as plt
from agd import AutomaticDifferentiation as ad
from agd.ODE.hamiltonian import GenericHamiltonian,SeparableHamiltonian,fixedpoint
from agd import LinearParallel as lp
norm_infinity = ad.Optimization.norm_infinity
def norm_infinity_ad(x):
    """Return the largest magnitude among the zeroth, first and (possibly) second order content of x"""
    if isinstance(x,ad.Dense.denseAD):
        return max(norm_infinity(x.value),norm_infinity(x.coef))
    if isinstance(x,ad.Dense.denseAD2):
        return max(norm_infinity(x.value),norm_infinity(x.coef1),norm_infinity(x.coef2))
    raise ValueError("Not a Dense AD variable")
def reload_packages():
    from Miscellaneous.rreload import rreload
    global ad,lp,GenericHamiltonian,SeparableHamiltonian
    [ad,lp,GenericHamiltonian,SeparableHamiltonian] = rreload([ad,lp,GenericHamiltonian,SeparableHamiltonian],rootdir="..")
We choose to illustrate the automatic differentiation concept using Hamilton's equations of geodesics, which read $$ \dot q = \partial_p H(q,p), \qquad \dot p = -\partial_q H(q,p), $$ where $H$ is the Hamiltonian of the medium. These equations are (extremely) common in optics and mechanics. They naturally involve first-order differentiation. In addition, higher order differentiation will also be necessary in the context of geodesic shooting (adjusting the initial conditions to meet a prescribed endpoint), or implicit schemes.
Hamilton's equations characterize extremal paths, known as geodesics, arising in optics. Following the classical convention we denote by
Riemannian Hamiltonian. For a domain $\Omega \subset R^d$ equipped with a Riemannian metric $M : \Omega \to S_d^{++}$, the Hamiltonian is defined as $$ H(q,p) := \frac 1 2 \|p\|_{D(q)}^2, \quad \text{where } D(q) := M(q)^{-1}. $$
Poincaré's half plane model. We illustrate this ODE in the case of Poincaré's half plane model of the hyperbolic space, defined by the Riemannian metric $$ M(q) := \frac{\mathrm{Id}}{q_1^2}, $$ where $q = (q_0,q_1) \in \Omega = R \times R^{++}$. Thus $H(q,p) = (q_1^2/2) \|p\|^2$.
Exponential map. We also consider the Riemannian metric on $R^2 \setminus \{0\}$ defined by $$ M(q) := \frac{\mathrm{Id}}{\|q\|^2}. $$ It can be regarded as the metric induced by the logarithmic (multivalued) map $\ln : C \setminus \{0\} \to C$, where the target space $C$ is equipped with the usual norm, and it possesses periodic trajectories.
def SquaredNorm(p):
    return (p**2).sum(axis=0)
def H_HalfPlane(q,p):
    return 0.5*q[1]**2*SquaredNorm(p)
def H_Log(q,p):
    return 0.5*SquaredNorm(q)*SquaredNorm(p)
In this case, the Hamiltonian is the sum of the kinetic energy and of the potential energy, here associated with a mass at the origin.
def H_Celestial(q,p):
    return 0.5*SquaredNorm(p) - SquaredNorm(q)**-0.5
Some models apparently unrelated to mechanics or optics also exhibit a Hamiltonian structure. This is the case of the Lotka-Volterra predator-prey model, depending on parameters $\alpha,\beta,\gamma,\delta \in R$: $$ \dot x = x (\alpha-\beta y), \qquad \dot y = y(\delta x-\gamma). $$ Using a logarithmic change of variables, $q=\ln x$ and $p=\ln y$, this model can be put in Hamiltonian form, with $$ H(q,p) = \alpha p - \beta e^p + \gamma q - \delta e^q. $$
LotkaVolterra_params = (1.,1.,1.,1.)
def H_LotkaVolterra(q,p,params=LotkaVolterra_params):
    alpha,beta,gamma,delta = params
    return alpha*p - beta*np.exp(p) + gamma*q - delta*np.exp(q)
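As a sanity check of this Hamiltonian structure (a standalone numpy sketch, not part of the notebook's toolchain), one may integrate the transformed system $\dot q = \alpha-\beta e^p$, $\dot p = \delta e^q-\gamma$ with a Runge-Kutta scheme and verify that $H$ stays constant along the flow:

```python
import numpy as np

alpha, beta, gamma, delta = 1., 1., 1., 1.

def field(z):
    """Hamiltonian vector field of the Lotka-Volterra model, z = (q,p)."""
    q, p = z
    return np.array([alpha - beta*np.exp(p), delta*np.exp(q) - gamma])

def H(z):
    q, p = z
    return alpha*p - beta*np.exp(p) + gamma*q - delta*np.exp(q)

z = np.array([0.5, -0.5]); dt = 1e-3
H0 = H(z)
for _ in range(2000):  # RK4 integration up to time t=2
    k1 = field(z); k2 = field(z + 0.5*dt*k1)
    k3 = field(z + 0.5*dt*k2); k4 = field(z + dt*k3)
    z = z + dt*(k1 + 2*k2 + 2*k3 + k4)/6
assert abs(H(z) - H0) < 1e-9  # H is conserved up to the scheme's accuracy
```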
Since Hamiltonian structures are quite common in mathematics, a Hamiltonian class is provided, which implements most of the operations presented in this notebook. It does rely on automatic differentiation as well, but hides these details internally. We check here that it reproduces the same results.
# Non-separable Hamiltonians
HalfPlane = GenericHamiltonian(H_HalfPlane)
Log = GenericHamiltonian(H_Log)
# Separable Hamiltonian
alpha,beta,gamma,delta=LotkaVolterra_params
LotkaVolterra = SeparableHamiltonian( lambda q:gamma*q - delta*np.exp(q), lambda p:alpha*p - beta*np.exp(p) )
LotkaVolterra.vdim=None # Uses scalar variables
# Separable Hamiltonian with a quadratic structure in the second variable
Celestial = SeparableHamiltonian( lambda q:-SquaredNorm(q)**-0.5, lambda p:0.5*SquaredNorm(p) )
# Collect the Hamiltonians defined as functions or classes, for comparison purposes
HClass = [HalfPlane,Log,Celestial]
HFun = [H_HalfPlane,H_Log,H_Celestial]
We illustrate the automatic differentiation mechanism by differentiating a Hamiltonian w.r.t. the position $q$ and momentum $p$, which is a prerequisite for implementing Hamilton's ODE of motion.
Hamiltonian = H_HalfPlane
q = np.array([0.,1.])
p = np.array([1.,1.])
Hamiltonian(q,p)
1.0
In order to access the Hamiltonian's derivatives, say with respect to position, we evaluate it on a variable featuring first order information. Said otherwise, we create a variable representing the first order Taylor expansion $$ q_\mathrm{ad} = (q_0+\delta_0,q_1+\delta_1) + O(\|\delta\|^2), $$ where $\delta = (\delta_0,\delta_1)$ is a symbolic, infinitesimal perturbation.
q_ad = ad.Dense.identity(constant=q)
q_ad # Contains an additional array representing first order information.
denseAD(array([0., 1.]), array([[1., 0.], [0., 1.]]))
print("Zero-th order : ",q_ad.value)
print("First order : \n",q_ad.coef)
Zero-th order : [0. 1.] First order : [[1. 0.] [0. 1.]]
The first order information is propagated through the computations, and eventually we obtain the result $$ H(q_\mathrm{ad},p) = H(q,p) + <\nabla_q H(q,p),\delta> + O(\|\delta\|^2). $$ Note that, for Poincaré's half plane model, one has $\nabla_q H(q,p) = (0,q_1 \|p\|^2)$.
Hamiltonian(q_ad,p)
denseAD(array(1.),array([0., 2.]))
print("Gradient of the Hamiltonian w.r.t. the variable q : ",Hamiltonian(q_ad,p).gradient())
Gradient of the Hamiltonian w.r.t. the variable q : [0. 2.]
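As a quick cross-check (a standalone sketch, independent of the agd library), the same gradient can be approximated by centered finite differences of the half plane Hamiltonian:

```python
import numpy as np

def H_HalfPlane(q, p):  # Poincare half plane Hamiltonian
    return 0.5 * q[1]**2 * (p**2).sum()

q = np.array([0., 1.]); p = np.array([1., 1.])
h = 1e-6
grad_q = np.array([(H_HalfPlane(q + h*e, p) - H_HalfPlane(q - h*e, p)) / (2*h)
                   for e in np.eye(2)])
# Analytic formula: grad_q H = (0, q1*|p|^2) = (0., 2.), as obtained by AD above
assert np.allclose(grad_q, [0., 2.], atol=1e-8)
```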
We can similarly differentiate w.r.t. the momentum $p$. Poincaré's half plane model obeys $\nabla_p H(q,p) = q_1^2 p$.
p_ad = ad.Dense.identity(constant=p)
Hamiltonian(q,p_ad)
denseAD(array(1.),array([1., 1.]))
Comparing with the Hamiltonian class
for Cls,Fun in zip(HClass,HFun):
    assert norm_infinity( Cls.H(q,p) - Fun(q,p) ) < 1e-14
    assert norm_infinity( Cls.DqH(q,p) - Fun(q_ad,p).gradient() ) < 1e-14
    assert norm_infinity( Cls.DpH(q,p) - Fun(q,p_ad).gradient() ) < 1e-14
Evaluating $H(q_\mathrm{ad},p_\mathrm{ad})$ yields a result that may seem unexpected: we obtain the sum $\nabla_q H(q,p)+\nabla_p H(q,p)$ of the gradients, instead of their independent values. Indeed, this follows from the first order Taylor expansion $$ H(q+\delta,p+\delta) = H(q,p) + <\nabla_q H(q,p),\delta> + <\nabla_p H(q,p),\delta> + O(\|\delta\|^2), $$ where $\delta = (\delta_0,\delta_1)$ is an infinitesimal symbolic perturbation.
Hamiltonian(q_ad,p_ad)
denseAD(array(1.),array([1., 3.]))
In order to obtain the gradient of $H$ w.r.t. the independent variables $q$ and $p$, we need to specify that the corresponding infinitesimal perturbations are independent. Namely we set $$ q_\mathrm{ad} = q+(\delta_0,\delta_1), \quad p_\mathrm{ad} = p+(\delta_2,\delta_3), $$ where $(\delta_0,\cdots,\delta_3)$ is a four dimensional infinitesimal perturbation.
q_ad = ad.Dense.identity(constant=q,shift=(0,p.size))
p_ad = ad.Dense.identity(constant=p,shift=(q.size,0))
As desired, we obtain the full gradient of $H$, which is the concatenation of $\nabla_q H$ and $\nabla_p H$.
Hamiltonian(q_ad,p_ad)
denseAD(array(1.),array([0., 2., 1., 1.]))
q_ad
denseAD(array([0., 1.]), array([[1., 0., 0., 0.], [0., 1., 0., 0.]]))
p_ad
denseAD(array([1., 1.]), array([[0., 0., 1., 0.], [0., 0., 0., 1.]]))
The register function applies independent perturbations to a sequence of variables, taking care of shifting the perturbations adequately.
q_ad_,p_ad_ = ad.Dense.register((q,p))
for a,b in zip( (q_ad_,p_ad_), (q_ad,p_ad)):
    assert norm_infinity_ad(a-b)==0
A common programming style in Python applications to numerics is vectorized computation using numpy tensors. This avoids explicit loops when some operation has to be repeated a large number of times, with improvements in clarity, flexibility, and speed.
As an example, we compute $H(q_0,p_0)$, $H(q_1,p_1)$, $H(q_2,p_2)$, where $q_0,q_1,q_2$ and $p_0,p_1,p_2$ are different positions and impulsions. We also differentiate these values w.r.t. the impulsion.
q0 = np.array([0.,1.]); q1 = np.array([1.,1.]); q2 = np.array([-2.,1.]);
p0 = np.array([1.,1.]); p1 = np.array([1.,0.]); p2 = np.array([-1.,-1.]);
Hamiltonian(q0,p0), Hamiltonian(q1,p1), Hamiltonian(q2,p2)
(1.0, 0.5, 1.0)
Ham_seq = [Hamiltonian(q,ad.Dense.identity(constant=p)) for (q,p) in ((q0,p0), (q1,p1), (q2,p2))]
Ham_seq
[denseAD(array(1.),array([1., 1.])), denseAD(array(0.5),array([1., 0.])), denseAD(array(1.),array([-1., -1.]))]
The vectorized form of these computations is as follows.
q_vec = np.stack( (q0,q1,q2),axis=1)
p_vec = np.stack( (p0,p1,p2),axis=1)
Hamiltonian(q_vec,p_vec)
array([1. , 0.5, 1. ])
p_vec_ad = ad.Dense.identity(constant=p_vec)
However, the automatic differentiation result may not be what is desired. Indeed, the entries of p_vec
are all considered independent, and we obtain a six dimensional gradient vector for each component. This is both impractical and numerically expensive.
Ham_vec = Hamiltonian(q_vec,p_vec_ad)
Ham_vec
denseAD(array([1. , 0.5, 1. ]), array([[ 1., 0., 0., 1., 0., 0.], [ 0., 1., 0., 0., 0., 0.], [ 0., 0., -1., 0., 0., -1.]]))
print("Sequential result : ",Ham_seq[0])
print("Vectorized result : ",Ham_vec[0])
Sequential result : denseAD(1.0,[1. 1.]) Vectorized result : denseAD(1.0,[1. 0. 0. 1. 0. 0.])
In this situation, and in contrast with the previous section, we would like the perturbations to be regarded as dependent. This is achieved with the following parameters:
- shape_free : apply free (independent) symbolic perturbations w.r.t. these dimensions.
- shape_bound : apply bound (correlated) symbolic perturbations w.r.t. these dimensions.

It is assumed that the free dimensions come first, and the bound dimensions come last. Only one of the two parameters shape_free and shape_bound needs to be specified, since they are redundant.
p_vec.shape
(2, 3)
The first dimension is free, and the last one is bound since it is only introduced for vectorization purposes.
shape_free,shape_bound = p_vec.shape[:1], p_vec.shape[1:]
p_vec_ad = ad.Dense.identity(constant=p_vec, shape_bound=shape_bound)
#p_vec_ad = ad.Dense.identity(constant=p_vec, shape_free=shape_free) # Equivalently
Ham_vec = Hamiltonian(q_vec,p_vec_ad)
print("Vectorized result (with shape_bound): \n", Ham_vec)
Vectorized result (with shape_bound): denseAD([1. 0.5 1. ], [[ 1. 1.] [ 1. 0.] [-1. -1.]])
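For this Hamiltonian, the bound-dimension gradients can be checked against the closed form $\nabla_p H(q,p) = q_1^2\, p$, applied columnwise (a plain numpy sketch, independent of the library):

```python
import numpy as np

q_vec = np.stack(([0.,1.], [1.,1.], [-2.,1.]), axis=1)  # shape (2,3)
p_vec = np.stack(([1.,1.], [1.,0.], [-1.,-1.]), axis=1)
# grad_p H(q,p) = q1^2 * p, broadcast over the bound (batch) dimension
grad_p = q_vec[1]**2 * p_vec
# One 2D gradient per column, matching the vectorized AD output
assert np.allclose(grad_p.T, [[1.,1.], [1.,0.], [-1.,-1.]])
```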
This technique is compatible with differentiation w.r.t. multiple independent variables.
q_vec_ad = ad.Dense.identity(constant=q_vec, shape_free=shape_free, shift=(0,len(p_vec)) )
p_vec_ad = ad.Dense.identity(constant=p_vec, shape_free=shape_free, shift=(len(q_vec),0) )
Ham_vec = Hamiltonian(q_vec_ad,p_vec_ad)
Ham_vec
denseAD(array([1. , 0.5, 1. ]), array([[ 0., 2., 1., 1.], [ 0., 1., 1., 0.], [ 0., 2., -1., -1.]]))
print("Full gradient w.r.t q0,p0 : ", Ham_vec[0])
Full gradient w.r.t q0,p0 : denseAD(1.0,[0. 2. 1. 1.])
Again, the register function takes care of the shifts.
q_vec_ad_,p_vec_ad_ = ad.Dense.register((q_vec,p_vec), shape_bound=shape_bound)
for a,b in zip( (q_vec_ad_,p_vec_ad_), (q_vec_ad,p_vec_ad) ):
    assert norm_infinity_ad(a-b) == 0
Alternative implementation using arrays of arrays. Another way to achieve vectorization is to use 'arrays of arrays'. This approach is not recommended here, since it is more cumbersome and less efficient, but it is still shown for completeness and testing.
q_diss = ad.disassociate(q_vec, shape_free=(2,), # Creates an array of arrays
expand_free_dims=None, expand_bound_dims=None)
print("An array of arrays :", q_diss)
print("First element :", q_diss[0])
An array of arrays : [array([ 0., 1., -2.]) array([1., 1., 1.])] First element : [ 0. 1. -2.]
It is safer to avoid scalar arrays in the context of 'arrays of arrays'. Indeed, confusion arises otherwise between 'array scalars with array components' and 'standard arrays'. The default for ad.disassociate
is to introduce singleton dimensions to ensure this behavior is avoided.
q_diss = ad.disassociate(q_vec, shape_free=(2,))
p_diss = ad.disassociate(p_vec, shape_free=(2,))
print("Disassociated array ", q_diss)
print("with type ", type(q_diss), "with data type ", q_diss.dtype,
"and shape ", q_diss.shape, "including a trailing singleton.")
print("Reconstructed array", ad.associate(q_diss)) # Inverse operation to disassociate
Disassociated array [[array([[ 0.], [ 1.], [-2.]])] [array([[1.], [1.], [1.]])]] with type <class 'numpy.ndarray'> with data type object and shape (2, 1) including a trailing singleton. Reconstructed array [[ 0. 1. -2.] [ 1. 1. 1.]]
We can now manipulate q_diss
and p_diss
as standard low-dimensional arrays, abstracting the underlying vectorization, and in particular use the AD tools.
Hamiltonian(q_diss,p_diss)
array([array([[1. ], [0.5], [1. ]])], dtype=object)
q_diss_ad = ad.Dense.identity(constant=q_diss,shift=(0,2))
p_diss_ad = ad.Dense.identity(constant=p_diss,shift=(2,0))
Ham_diss_ad = Hamiltonian(q_diss_ad,p_diss_ad)
ad.associate(Ham_diss_ad)
denseAD(array([1. , 0.5, 1. ]), array([[ 0., 2., 1., 1.], [ 0., 1., 1., 0.], [ 0., 2., -1., -1.]]))
Comparison with the Hamiltonian class
The Hamiltonian class needs to know about the number of independent components, in order to deal with multi-dimensional arrays appropriately.
HalfPlane.shape_free = (2,)
assert norm_infinity_ad(Ham_vec - ad.associate(Ham_diss_ad)) < 1e-14
assert norm_infinity(HalfPlane.H(q_vec,p_vec) - Ham_vec.value) < 1e-14
assert norm_infinity(HalfPlane.DqH(q_vec,p_vec) - Ham_vec.gradient()[:2]) < 1e-14
assert norm_infinity(HalfPlane.DpH(q_vec,p_vec) - Ham_vec.gradient()[2:]) < 1e-14
There are situations where second order automatic differentiation is needed. It could be achieved, in principle, by recursive calls to first order differentiation. However, for reasons of performance and convenience, a dedicated class is implemented.
q_ad = ad.Dense2.identity(constant=q)
The hessian of Poincaré's half plane model Hamiltonian, w.r.t. position, is $$ \nabla^2_q H(q,p) = \begin{pmatrix} 0 & 0 \\ 0 & \|p\|^2 \end{pmatrix} $$ and w.r.t. impulsion $$ \nabla^2_p H(q,p) = q_1^2 \mathrm{Id}. $$
Ham = Hamiltonian(q_ad,p)
Ham
denseAD2(array(1.),array([0., 2.]), array([[0., 0.], [0., 2.]]))
print("Function value : ", Ham.value)
print("Gradient : ", Ham.coef1)
print("Hessian : \n", Ham.coef2)
Function value : 1.0 Gradient : [0. 2.] Hessian : [[0. 0.] [0. 2.]]
p_ad = ad.Dense2.identity(constant=p)
Hamiltonian(q,p_ad)
denseAD2(array(1.),array([1., 1.]), array([[1., 0.], [0., 1.]]))
A joint hessian, of size $4\times 4$, can be computed.
q_ad = ad.Dense2.identity(constant=q, shift=(0,p.size))
p_ad = ad.Dense2.identity(constant=p, shift=(q.size,0))
p_ad.coef2.shape
(2, 4, 4)
Hamiltonian(q_ad,p_ad)
denseAD2(array(1.),array([0., 2., 1., 1.]), array([[0., 0., 0., 0.], [0., 2., 2., 2.], [0., 2., 1., 0.], [0., 2., 0., 1.]]))
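This joint hessian can be cross-checked against second order finite differences of the function on the joint variable $x=(q,p)$ (a standalone sketch, independent of the library):

```python
import numpy as np

def H(x):  # half plane Hamiltonian on the joint variable x = (q0,q1,p0,p1)
    q, p = x[:2], x[2:]
    return 0.5 * q[1]**2 * (p**2).sum()

x = np.array([0., 1., 1., 1.]); h = 1e-4
E = np.eye(4)
# Centered finite difference approximation of the 4x4 hessian
hess = np.array([[(H(x + h*(E[i]+E[j])) - H(x + h*(E[i]-E[j]))
                 - H(x - h*(E[i]-E[j])) + H(x - h*(E[i]+E[j]))) / (4*h*h)
                  for j in range(4)] for i in range(4)])
expected = np.array([[0.,0.,0.,0.], [0.,2.,2.,2.], [0.,2.,1.,0.], [0.,2.,0.,1.]])
assert np.allclose(hess, expected, atol=1e-6)
```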
Vectorization works as in the first order case.
q_vec_ad = ad.Dense2.identity(constant=q_vec, shape_free=shape_free)
Ham = Hamiltonian(q_vec_ad,p_vec)
Ham[0]
denseAD2(array(1.),array([0., 2.]), array([[0., 0.], [0., 2.]]))
q_vec_ad = ad.Dense2.identity(constant=q_vec,shape_bound=shape_bound, shift=(0,len(p_vec)) )
p_vec_ad = ad.Dense2.identity(constant=p_vec,shape_bound=shape_bound, shift=(len(q_vec),0) )
Ham = Hamiltonian(q_vec_ad,p_vec_ad)
Ham[0]
denseAD2(array(1.),array([0., 2., 1., 1.]), array([[0., 0., 0., 0.], [0., 2., 2., 2.], [0., 2., 1., 0.], [0., 2., 0., 1.]]))
Alternative approach using recursive automatic differentiation. We present an alternative approach to second order automatic differentiation, based on recursive first order differentiation. It is not recommended, since it is more cumbersome and less efficient, but it is still shown for completeness and testing.
def ADRecIdentity(constant,shape_free=None,shift=(0,0)):
    # First order differentiation
    q_ad = ad.Dense.identity(constant=constant, shape_free=shape_free, shift=shift)
    # Hide automatic differentiation info in the component type
    q_diss = ad.disassociate(q_ad, shape_free=shape_free)
    # Recursive first order differentiation
    q_ad2 = ad.Dense.identity(constant=q_diss, shape_free=shape_free, shift=shift)
    return q_ad2
def ADRecToAD2(q):
    q01 = ad.associate(q.value) # Zero-th and first order
    q12 = ad.associate(q.coef,squeeze_free_dims=-2) # First and second order
    q12 = np.moveaxis(q12,q.ndim-1,-1) # Put ad information last
    return ad.Dense2.denseAD2(q01.value,q01.coef,q12.coef)
q_rec = ADRecIdentity(q,shape_free=(2,), shift=(0,2))
p_rec = ADRecIdentity(p,shape_free=(2,), shift=(2,0))
Ham_rec = Hamiltonian(q_rec,p_rec)
ADRecToAD2(Ham_rec)
denseAD2(array(1.),array([0., 2., 1., 1.]), array([[0., 0., 0., 0.], [0., 2., 2., 2.], [0., 2., 1., 0.], [0., 2., 0., 1.]]))
q_vec_rec = ADRecIdentity(q_vec, shape_free=(2,), shift=(0,2))
p_vec_rec = ADRecIdentity(p_vec, shape_free=(2,), shift=(2,0))
Ham_vec_rec = Hamiltonian(q_vec_rec,p_vec_rec)
ADRecToAD2(Ham_vec_rec)[0]
denseAD2(array(1.),array([0., 2., 1., 1.]), array([[0., 0., 0., 0.], [0., 2., 2., 2.], [0., 2., 1., 0.], [0., 2., 0., 1.]]))
Automatic differentiation makes it possible to compute the gradient and hessian of a function.
print(f"An array of shape {Ham.shape}, and AD information w.r.t {Ham.size_ad} independent variables")
An array of shape (3,), and AD information w.r.t 4 independent variables
Internally, the coefficients corresponding to differentiation are stored last.
print("Internal storage has derivatives last.")
print(f"first order: {Ham.coef1.shape}, second order: {Ham.coef2.shape}")
Internal storage has derivatives last. first order: (3, 4), second order: (3, 4, 4)
However, for practical purposes it is often desirable to move them first. This is the role of the gradient
and hessian
functions.
print("Interface moves derivatives first.")
print(f"gradient: {Ham.gradient().shape}, hessian: {Ham.hessian().shape}")
Interface moves derivatives first. gradient: (4, 3), hessian: (4, 4, 3)
We implement Hamilton's equations of motion $$ \dot q = \partial_p H(q,p), \qquad \dot p = -\partial_q H(q,p), $$ using several numerical schemes. In terms of automatic differentiation, the most interesting case is the semi-implicit Symplectic Euler scheme.
Conserved quantity. By construction, the Hamiltonian value is conserved along the Hamiltonian flow: $$ H(q(t),p(t)) = \mathrm{cst}, $$ if $t \mapsto (q(t),p(t))$ is a solution to Hamilton's equations of motion.
Explicit solution for the half plane model. The geodesics for the Poincare half plane model are known explicitly. They are half circles whose center lies on the horizontal axis.
The simplest of all ODE solvers reads $$ \frac{q_{n+1}-q_n} {\Delta t} = \partial_p H(q_n,p_n), \qquad \frac{p_{n+1}-p_n} {\Delta t} = -\partial_q H(q_n,p_n), $$ where $(q_0,p_0)$ is a given initial condition, $n \geq 0$ is the iteration number, and $\Delta t$ is the time step.
def EulerStep(q,p,H,dt):
    d=len(q)
    q_ad = ad.Dense.identity(constant=q,shape_free=(d,),shift=(0,d))
    p_ad = ad.Dense.identity(constant=p,shape_free=(d,),shift=(d,0))
    grad = H(q_ad,p_ad).gradient()
    return q+dt*grad[d:],p-dt*grad[:d]
def Geodesic(q0,p0,H,t,n=100,step=EulerStep):
    dt=t/n
    q,p=q0.copy(),p0.copy()
    Q,P=[q],[p]
    for i in range(n):
        q,p=step(q,p,H,dt)
        Q.append(q)
        P.append(p)
    # For convenience, we put time as a second coordinate.
    return np.stack(Q,axis=1),np.stack(P,axis=1),np.linspace(0,t,n+1)
Euler's explicit scheme is only first order accurate. While the circular shape of the geodesic is more or less observed, the Hamiltonian conservation is far from perfect.
Q,P,T = Geodesic(q_vec,p_vec,Hamiltonian,4.)
plt.axis('equal'); plt.title("Poincare geodesic (Explicit euler)")
plt.plot(Q[0],Q[1]); plt.axhline(0);
plt.title("Poincare Hamiltonian (Explicit euler)")
plt.plot(T,Hamiltonian(Q,P));
A natural approach to improve the ODE solver is to increase its consistency order. This can be achieved using Runge-Kutta methods, here of second and fourth order.
def SymplecticGradient(q,p,H):
    d=len(q)
    q_ad = ad.Dense.identity(constant=q,shape_free=(d,),shift=(0,d))
    p_ad = ad.Dense.identity(constant=p,shape_free=(d,),shift=(d,0))
    grad = H(q_ad,p_ad).gradient()
    return grad[d:],-grad[:d]
def RK2Step(q,p,H,dt):
    k1 = SymplecticGradient(q,p,H)
    k2 = SymplecticGradient(q+0.5*dt*k1[0],p+0.5*dt*k1[1],H)
    return q+dt*k2[0],p+dt*k2[1]
def RK4Step(q,p,H,dt):
    k1 = SymplecticGradient(q,p,H)
    k2 = SymplecticGradient(q+0.5*dt*k1[0],p+0.5*dt*k1[1],H)
    k3 = SymplecticGradient(q+0.5*dt*k2[0],p+0.5*dt*k2[1],H)
    k4 = SymplecticGradient(q+dt*k3[0],p+dt*k3[1],H)
    return q+dt*(k1[0]+2*k2[0]+2*k3[0]+k4[0])/6.,p+dt*(k1[1]+2*k2[1]+2*k3[1]+k4[1])/6.
Q2,P2,T = Geodesic(q_vec,p_vec,Hamiltonian,4.,step=RK2Step)
Q4,P4,T = Geodesic(q_vec,p_vec,Hamiltonian,4.,step=RK4Step)
plt.axis('equal'); plt.title("Poincare geodesic (Explicit solvers)")
plt.plot(Q[0],Q[1], Q2[0],Q2[1], Q4[0],Q4[1]); plt.axhline(0);
In this example, the Hamiltonian is almost exactly conserved by the second and fourth order schemes.
plt.title("Poincare Hamiltonian (Runge-Kutta)")
plt.plot(T,Hamiltonian(Q2,P2), T,Hamiltonian(Q4,P4)); #T,Hamiltonian(Q,P),
Cone model. However, if the trajectory is long enough, and the time step is large enough, then even the high order schemes fail to conserve the Hamiltonian. This is evidenced below with the cone model.
Hamiltonian = H_Log
q,p = np.array([1,0]),np.array([0,1])
Q,P,T = Geodesic(q,p,Hamiltonian,6.,step=EulerStep)
Q2,P2,T2 = Geodesic(q,p,Hamiltonian,25.,step=RK2Step)
Q4,P4,T4 = Geodesic(q,p,Hamiltonian,75.,step=RK4Step)
plt.axis('equal'); plt.title("Cone model (Explicit solvers)")
plt.plot(Q[0],Q[1], Q2[0],Q2[1], Q4[0],Q4[1]);
plt.title("Cone Hamiltonian (Explicit solvers)")
plt.plot(T,Hamiltonian(Q,P), T2,Hamiltonian(Q2,P2), T4,Hamiltonian(Q4,P4));
Likewise, in the celestial mechanics case, the Hamiltonian drifts with time using the Euler explicit and Runge-Kutta schemes.
Hamiltonian = H_Celestial
q,p = np.array([1.,0.]),np.array([0.3,0.7])
t = 10.
Q,P,T = Geodesic(q,p,Hamiltonian,t,step=EulerStep,n=int(100*t))
Q2,P2,T2 = Geodesic(q,p,Hamiltonian,t,step=RK2Step,n=int(20*t))
Q4,P4,T4 = Geodesic(q,p,Hamiltonian,t,step=RK4Step,n=int(8*t))
plt.axis('equal'); plt.title("Celestial mechanics (Explicit solvers)")
plt.plot(Q[0],Q[1], Q2[0],Q2[1], Q4[0],Q4[1]);
plt.plot(T,Hamiltonian(Q,P), T2,Hamiltonian(Q2,P2), T4,Hamiltonian(Q4,P4));
Comparison with the Hamiltonian class
Q_,P_,T_ = Celestial.integrate(q,p,scheme="Euler",niter=int(100*t),T=t,path=True)
Q2_,P2_,T2_ = Celestial.integrate(q,p,scheme="Runge-Kutta-2", niter=int(20*t), T=t,path=True)
Q4_,P4_,T4_ = Celestial.integrate(q,p,scheme="Runge-Kutta-4", niter=int(8*t), T=t,path=True)
for a,b in zip( (Q_,P_,T_,Q2_,P2_,T2_,Q4_,P4_,T4_), (Q,P,T,Q2,P2,T2,Q4,P4,T4) ):
    assert norm_infinity(a-b)<1e-14
Symplectic schemes guarantee that a quantity, closely related to the Hamiltonian, is conserved along the trajectory.
The Euler symplectic scheme is defined as follows: $$ \frac{p_{n+1}-p_n}{\Delta t} = -\partial_q H(q_n,p_{n+1}), \quad \frac{q_{n+1}-q_n}{\Delta t} = \partial_p H(q_n,p_{n+1}). $$ Note that the scheme is semi-implicit, in view of the definition of $p_{n+1}$. Two cases must be distinguished for the implementation.
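The scheme's qualitative behavior can be previewed on the harmonic oscillator $H(q,p)=(q^2+p^2)/2$, a toy standalone example (independent of the library) for which both updates are explicit, and the energy error stays bounded without drift:

```python
import numpy as np

def symplectic_euler(q, p, dt, n):
    """Symplectic Euler for H = (q^2+p^2)/2."""
    for _ in range(n):
        p = p - dt * q   # p_{n+1} = p_n - dt * dH/dq(q_n)
        q = q + dt * p   # q_{n+1} = q_n + dt * dH/dp(p_{n+1})
    return q, p

q, p = 1.0, 0.0
H0 = 0.5 * (q*q + p*p)
q, p = symplectic_euler(q, p, dt=0.01, n=10000)  # integrate up to t = 100
H1 = 0.5 * (q*q + p*p)
assert abs(H1 - H0) < 0.01  # energy error remains O(dt), with no drift in time
```

In fact, for this linear example the scheme exactly conserves the modified quantity $q^2+p^2-\Delta t\, q p$, which explains the bounded oscillation of the energy.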
def SymplecticEulerStep_Separable(q,p,H,dt):
    d=len(q)
    # Differentiate w.r.t. q and advance p
    q_ad = ad.Dense.identity(constant=q,shape_free=(d,))
    q_grad = H(q_ad,p).gradient() # We can use p thanks to separability
    p1 = p-dt*q_grad
    # Differentiate w.r.t. p and advance q
    p1_ad = ad.Dense.identity(constant=p1,shape_free=(d,))
    p1_grad = H(q,p1_ad).gradient()
    q1 = q+dt*p1_grad
    return q1,p1
Hamiltonian = H_Celestial
q,p = np.array([1,0]),np.array([0.3,0.7])
t = 20.
QS,PS,TS = Geodesic(q,p,Hamiltonian,t,step=SymplecticEulerStep_Separable,n=int(30*t))
The symplectic Euler scheme is only first order accurate, but it has the property that the Hamiltonian remains bounded above and below during the evolution.
plt.axis('equal'); plt.title("Celestial mechanics (Euler symplectic)")
plt.plot(QS[0],QS[1]);
plt.title("Hamiltonian conservation (Euler symplectic)")
plt.plot(TS,Hamiltonian(QS,PS));
The Verlet method is a second order symplectic integration scheme. It is explicit if the Hamiltonian is separable.
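The accuracy gain can again be previewed on the harmonic oscillator $H=(q^2+p^2)/2$ (a toy standalone sketch, independent of the library): the Verlet energy error scales as $O(\Delta t^2)$ instead of $O(\Delta t)$:

```python
import numpy as np

def verlet(q, p, dt, n):
    """Verlet scheme for H = (q^2+p^2)/2: a full step in q between two half steps in p."""
    for _ in range(n):
        p = p - 0.5 * dt * q   # half step in p
        q = q + dt * p         # full step in q
        p = p - 0.5 * dt * q   # half step in p
    return q, p

q, p = 1.0, 0.0
H0 = 0.5 * (q*q + p*p)
q, p = verlet(q, p, dt=0.01, n=10000)  # integrate up to t = 100
H1 = 0.5 * (q*q + p*p)
assert abs(H1 - H0) < 1e-3  # second order: energy error O(dt^2), without drift
```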
def SymplecticVerletStep_Separable(q,p,H,dt):
    d=len(q)
    # Differentiate w.r.t. q and advance p
    q_ad = ad.Dense.identity(constant=q,shape_free=(d,))
    q_grad = H(q_ad,p).gradient()
    p1 = p-0.5*dt*q_grad
    # Differentiate w.r.t. p and advance q
    p1_ad = ad.Dense.identity(constant=p1,shape_free=(d,))
    p1_grad = H(q,p1_ad).gradient()
    q1 = q+dt*p1_grad
    # Differentiate w.r.t. q and advance p
    q1_ad = ad.Dense.identity(constant=q1,shape_free=(d,))
    q1_grad = H(q1_ad,p1).gradient()
    p2 = p1-0.5*dt*q1_grad
    return q1,p2
Hamiltonian = H_Celestial
q,p = np.array([1.,0.]),np.array([0.3,0.7])
t = 20.
QV,PV,TV = Geodesic(q,p,Hamiltonian,t,step=SymplecticVerletStep_Separable,n=int(30*t))
It can be shown that the Verlet scheme, in the separable case, amounts to the symplectic Euler scheme shifted by half a time step. Visually, their output is therefore quite similar.
plt.axis('equal'); plt.title("Celestial mechanics (Verlet symplectic)")
plt.plot(QV[0],QV[1]);
The improved accuracy is more obvious on the Hamiltonian conservation than on the trajectory.
plt.title("Hamiltonian conservation (Verlet symplectic vs Euler symplectic)")
plt.plot(TV,Hamiltonian(QV,PV), TS,Hamiltonian(QS,PS) );
Comparison with the Hamiltonian class
QS_,PS_,TS_ = Celestial.integrate(q,p,scheme="Euler-p", niter=int(30*t),T=t,path=True)
QV_,PV_,TV_ = Celestial.integrate(q,p,scheme="Verlet-p",niter=int(30*t),T=t,path=True)
for a,b in zip( (QS_,PS_,TS_,QV_,PV_,TV_), (QS,PS,TS,QV,PV,TV) ): assert norm_infinity(a-b)<1e-14
We implement the general semi-implicit Symplectic Euler scheme using a Newton method for the implicit part. Our first step is to implement the Newton method.
Note on Newton versus fixed point. In the context of implicit schemes for geometric integration, fixed point iterations are often more computationally efficient than Newton iterations. We consider Newton iterations here to illustrate the AD library. The Hamiltonian class uses by default a fixed point solver for the implicit schemes.
Generic Newton solver. A Newton solver is implemented in the ad.Optimization package, based on a similar approach using automatic differentiation, but with some additional flexibility.
def NewtonSolve(F,p0,niter=5,*args):
    p_ad = ad.Dense.identity(constant=p0.copy(),shape_free=(len(p0),))
    for i in range(niter):
        p_ad = p_ad+F(p_ad,*args).solve()
    return p_ad.value
We test it by solving $x+x^2=0$, with initial guess $x=0.5$. The solution $x=0$ is found.
def FTest(x):
    return x+x**2
NewtonSolve(FTest,np.array([0.5]))
array([5.39659527e-16])
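The same iteration can be mirrored without AD by supplying the jacobian by hand (a plain numpy sketch, for comparison only; the names `newton` and `J` are hypothetical, not part of the library):

```python
import numpy as np

def newton(F, J, x, niter=5):
    """Plain Newton iteration: x <- x - J(x)^{-1} F(x)."""
    for _ in range(niter):
        x = x - np.linalg.solve(J(x), F(x))
    return x

F = lambda x: x + x**2          # the test function above
J = lambda x: np.diag(1 + 2*x)  # its jacobian (diagonal in this 1D case)
root = newton(F, J, np.array([0.5]))
assert abs(root[0]) < 1e-10  # quadratic convergence to the root x = 0
```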
However, compatibility issues may arise if the function $F$ itself uses automatic differentiation in its definition. This is the case for our numerical schemes, which involve the partial derivatives of the Hamiltonian. We circumvent this potential problem by hiding the AD information in the element type.
def NewtonSolve_AD(F,p0,niter=5,*args):
    d=len(p0)
    def disac(x): return ad.disassociate(x,shape_free=(d,))
    def assoc(x): return ad.associate(x)
    p_ad = ad.Dense.identity(constant=p0.copy(),shape_free=(d,))
    for i in range(niter):
        p_ad = p_ad+assoc(F(disac(p_ad),*args)).solve()
    return p_ad.value
def SymplecticEulerStep(q0,p0,H,dt):
    d=len(q0)
    def F(p1):
        # A potential issue is that p1 may have an additional singleton dimension,
        # which is introduced in recursive AD techniques. So we must reshape q0 and p0 accordingly.
        q0_,p0_ = (a.reshape(p1.shape) for a in (q0,p0))
        q0_ad = ad.Dense.identity(constant=q0_,shape_free=(d,))
        q0_grad = H(q0_ad,p1).gradient()
        return p1-p0_+dt*q0_grad
    p1 = NewtonSolve_AD(F,p0) # Implicit update of momentum
    p1_ad = ad.Dense.identity(constant=p1,shape_free=(d,))
    p1_grad = H(q0,p1_ad).gradient()
    q1 = q0+dt*p1_grad # Explicit update of position
    return q1,p1
If the Hamiltonian is separable, then the semi-implicit scheme coincides with the explicit scheme.
Hamiltonian = H_Celestial
q,p = np.array([1.,0.]),np.array([0.3,0.7])
qI,pI = SymplecticEulerStep(q,p,Hamiltonian,0.1) # Semi-implicit
qE,pE = SymplecticEulerStep_Separable(q,p,Hamiltonian,0.1)
for a,b in zip( (qI,pI), (qE,pE) ):
    assert norm_infinity(a-b)<1e-14
However, only the semi-implicit implementation is truly symplectic for general Hamiltonians.
Hamiltonian = H_Log
q,p = np.array([1.,0.]),np.array([0.,1.])
print("Semi-implicit :", SymplecticEulerStep(q,p,Hamiltonian,0.1))
print("Explicit :", SymplecticEulerStep_Separable(q,p,Hamiltonian,0.1))
Semi-implicit : (array([0.98989795, 0.1 ]), array([-0.10102051, 1. ])) Explicit : (array([0.99, 0.1 ]), array([-0.1, 1. ]))
t = 20.
QI,PI,TI = Geodesic(q,p,Hamiltonian,t,step=SymplecticEulerStep,n=int(10*t))
QS,PS,TS = Geodesic(q,p,Hamiltonian,t,step=SymplecticEulerStep_Separable,n=int(10*t))
Neither of the two schemes is very accurate, since they are only first order. (In this specific example, one expects a closed geodesic.)
plt.axis('equal'); plt.title("Cone model (Euler symplectic, separable or non-separable)")
plt.plot(QI[0],QI[1], QS[0],QS[1] );
The semi-implicit scheme almost perfectly conserves the Hamiltonian, in contrast with the explicit scheme, which here makes the invalid assumption of separability. The goal is thus achieved.
plt.title("Cone Hamiltonian, Euler alternating scheme")
plt.plot(TI,Hamiltonian(QI,PI), label="semi-implicit (symplectic)")
plt.plot(TS,Hamiltonian(QS,PS), label="explicit (non-symplectic)")
plt.legend();
Comparison with the Hamiltonian class
The default implementation of implicit schemes in the Hamiltonian class uses a fixed point solver. A small discrepancy is observed with the Newton method, due to the convergence tolerance.
QI_,PI_,TI_ = Log.integrate(q,p,scheme="Euler-p", niter=int(10*t),T=t,path=True)
for a,b in zip( (QI_,PI_,TI_), (QI,PI,TI) ): assert norm_infinity(a-b) < 1e-7
Tightening the stopping criterion of the fixed point solver improves the agreement.
Log.impl_solver = lambda f,x : fixedpoint(f,x,tol=1e-14)
QI_,PI_,TI_ = Log.integrate(q,p,scheme="Euler-p", niter=int(10*t),T=t,path=True)
for a,b in zip( (QI_,PI_,TI_), (QI,PI,TI) ): assert norm_infinity(a-b) < 1e-12
scheme = Log.symplectic_schemes()["Euler-p"]
The symplectic form is a bilinear form on the space of positions and impulsions, defined as $$ B( (\delta q,\delta p), (\delta q',\delta p') ) = <\delta q,\delta p'> - <\delta p, \delta q'>. $$ It is often denoted $B = \delta q \wedge \delta p$.
def SymplecticForm(q_ad,p_ad):
    """
    Returns the matrix of the symplectic 2-form,
    in the considered first order perturbations.
    """
    dp = p_ad.gradient()
    dq = q_ad.gradient()
    size_ad = p_ad.size_ad
    return ad.array([[lp.dot_VV(dq[i],dp[j]) - lp.dot_VV(dp[i],dq[j])
        for i in range(size_ad)] for j in range(size_ad)])
Consider canonical independent perturbations in the position and impulsion variables. The matrix of the symplectic form is a $2d\times 2d$ antisymmetric matrix, with block form $$ \begin{pmatrix} 0 & -I\\ I & 0 \end{pmatrix}. $$
q,p = np.array([1.,0.]),np.array([0.,1.])
q_ad,p_ad = ad.Dense.register( (q,p) )
mSymp = SymplecticForm(q_ad,p_ad)
print(f"Matrix of the symplectic form:\n{mSymp}")
Matrix of the symplectic form: [[ 0. 0. -1. 0.] [ 0. 0. 0. -1.] [ 1. 0. 0. 0.] [ 0. 1. 0. 0.]]
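Equivalently, a map is symplectic when its jacobian $M$ preserves this matrix $\Omega$, i.e. $M^T \Omega M = \Omega$. For a single symplectic Euler step of the harmonic oscillator $H=(q^2+p^2)/2$ (a toy standalone example, independent of the library), the jacobian is explicit and the identity can be checked directly:

```python
import numpy as np

dt = 0.1
# One symplectic Euler step for H = (q^2+p^2)/2:
#   p1 = p - dt*q,  q1 = q + dt*p1 = (1-dt^2)*q + dt*p
# Its jacobian in the (q,p) variables:
M = np.array([[1 - dt**2, dt],
              [-dt,       1.]])
# Matrix of the symplectic form, in the same (q,p) ordering
Omega = np.array([[0., -1.],
                  [1.,  0.]])
assert np.allclose(M.T @ Omega @ M, Omega)  # the step preserves the symplectic form
```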
We check that the form is preserved by our numerical schemes, by applying them to the q_ad
and p_ad
variables. Since their definition already involves automatic differentiation, we need to hide it using the associate and disassociate functions.
d=len(q)
def disac(x): return ad.disassociate(x,shape_free=(d,))
def assoc(x): return ad.associate(x)
qE_ad,pE_ad = (assoc(x) for x in SymplecticEulerStep_Separable(disac(q_ad),disac(p_ad),H_Celestial,0.1))
assert norm_infinity(SymplecticForm(qE_ad,pE_ad)-mSymp) < 1e-14
scheme(disac(q_ad),disac(p_ad),0.1)
(array([[denseAD(array([0.98989795]),array([[ 0.96948553, 0.00206207, 0.10206207, -0.02041241]]))], [denseAD(array([0.1]),array([[0.2 , 0.98989795, 0. , 0.1 ]]))]], dtype=object), array([[denseAD(array([-0.10102051]),array([[-0.10310363, 0.02062073, 1.02062073, -0.20412415]]))], [denseAD(array([1.]),array([[ 0. , -0.10102051, 0. , 1. ]]))]], dtype=object))
# Semi-implicit scheme for the H_Log hamiltonian
qI_ad,pI_ad = (assoc(x) for x in scheme(disac(q_ad),disac(p_ad),0.1))
assert norm_infinity(SymplecticForm(qI_ad,pI_ad) - mSymp) < 1e-14
Geodesic shooting is the process of adjusting the initial velocity or momentum of a geodesic, so that the endpoint at time $t=1$ meets a prescribed position. A natural strategy is to optimize the initial velocity using a Newton method, taking advantage of automatic differentiation to compute the jacobian matrix of the exponential map. Note that AD is also used, independently, to solve Hamilton's ODE, which involves the derivatives of the Hamiltonian.
We use the Poincare Half-Plane model for illustration. It is particularly simple since, due to its hyperbolic nature, there exists a unique geodesic between any two given points.
Riemannian exponential map. In Riemannian geometry, the exponential map associates to a position $q$ and a velocity $v$ the endpoint $\exp_q(v) := \gamma_q^v(1)$ of the corresponding geodesic $\gamma_q^v$. The inverse map is known as the logarithmic map. This terminology does not perfectly apply here since Hamilton's equations are written in terms of the Hamiltonian momentum, which is dual to the velocity.
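In miniature, and independently of the library, the same shooting principle applies to the flow of $\dot q = p$, $\dot p = -q$: adjust $p_0$ by a Newton iteration on the time-one map, differentiated here by finite differences (a hypothetical standalone sketch; the names `flow` and `shoot` are illustrative):

```python
import numpy as np

def flow(q, p, t=1.0, n=1000):
    """Endpoint q(t) of the flow of q' = p, p' = -q, by symplectic Euler."""
    dt = t / n
    for _ in range(n):
        p = p - dt * q
        q = q + dt * p
    return q

def shoot(target, q0, p=0.0, niter=10, h=1e-6):
    """Newton iteration on p0, so that the endpoint meets the target."""
    for _ in range(niter):
        r = flow(q0, p) - target                            # residual
        dr = (flow(q0, p + h) - flow(q0, p - h)) / (2 * h)  # d endpoint / d p0
        p = p - r / dr
    return p

p0 = shoot(target=0.5, q0=0.0)
assert abs(flow(0.0, p0) - 0.5) < 1e-9  # the endpoint meets the target
```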
def Shoot(q,p,*args,**kwargs):
    Q,P,T = Geodesic(q,p,*args,**kwargs)
    return np.take(Q,-1,axis=1)
q0,p0=np.array([0,1]),np.array([1,1])
Shoot(q0,p0,H_HalfPlane,1.,step=EulerStep,n=10)
array([1.7200162 , 1.27931064])
def Aim(q1,q0,pGuess,*args,**kwargs):
    if pGuess is None: pGuess=np.zeros(q0.shape)
    def F(p):
        nonlocal args,kwargs,q0,q1
        # Reshape needed due to introduced trailing singleton
        q0_,q1_ = (np.reshape(a,p.shape) for a in (q0,q1))
        return Shoot(q0_,p,*args,**kwargs)-q1_
    return NewtonSolve_AD(F,pGuess)
q1 = np.array([1.,1.])
p01 = Aim(q1,q0,p0,H_HalfPlane,1.,step=EulerStep,n=10)
assert norm_infinity(Shoot(q0,p01,H_HalfPlane,1.,step=EulerStep,n=10) -q1) < 1e-15