This notebook illustrates automatic differentiation, with an application to Hamilton's equations and geodesic shooting. More precisely, we use automatic differentiation with the following combination of properties:
Known bugs and incompatibilities. Our implementation of automatic differentiation is based on subclassing the numpy array class. While simple and powerful, this approach suffers from a few pitfalls, described in the notebook ADBugs.
Summary of volume Algorithmic tools, this series of notebooks.
Main summary of the Adaptive Grid Discretizations book of notebooks, including the other volumes.
Acknowledgement. Some of the experiments presented in these notebooks are part of ongoing research with Ludovic Métivier and Da Chen.
Copyright Jean-Marie Mirebeau, Centre Borelli, ENS Paris-Saclay, CNRS, University Paris-Saclay
import sys; sys.path.insert(0,"..") # Allow importing agd from parent directory
#from Miscellaneous import TocTools; print(TocTools.displayTOC('Dense','Algo'))
import numpy as np
from matplotlib import pyplot as plt
from agd import AutomaticDifferentiation as ad
from agd.ODE.hamiltonian import GenericHamiltonian,SeparableHamiltonian,fixedpoint
from agd import LinearParallel as lp
norm_infinity = ad.Optimization.norm_infinity
def norm_infinity_ad(x):
    """Return the largest magnitude among the zeroth, first and (possibly) second order content of x"""
    if isinstance(x,ad.Dense.denseAD):
        return max(norm_infinity(x.value),norm_infinity(x.coef))
    if isinstance(x,ad.Dense.denseAD2):
        return max(norm_infinity(x.value),norm_infinity(x.coef1),norm_infinity(x.coef2))
    raise ValueError("Not a Dense AD variable")
def reload_packages():
    from Miscellaneous.rreload import rreload
    global ad,lp,GenericHamiltonian,SeparableHamiltonian
    [ad,lp,GenericHamiltonian,SeparableHamiltonian] = rreload([ad,lp,GenericHamiltonian,SeparableHamiltonian],rootdir="..")
We choose to illustrate the automatic differentiation concept using Hamilton's equations of geodesics, which read $$ \dot q = \partial_p H(q,p), \qquad \dot p = -\partial_q H(q,p), $$ where $H$ is the Hamiltonian of the medium. These equations are (extremely) common in optics and mechanics. They naturally involve first-order differentiation. In addition, higher order differentiation will also be necessary in the context of geodesic shooting (adjusting the initial conditions to meet a prescribed endpoint), or implicit schemes.
Hamilton's equations characterize extremal paths, known as geodesics, arising in optics. Following the classical convention we denote by
Riemannian Hamiltonian. For a domain $\Omega \subset R^d$ equipped with a Riemannian metric $M : \Omega \to S_d^{++}$, the Hamiltonian is defined as $$ H(q,p) := \frac 1 2 \|p\|_{D(q)}^2, \quad \text{where } D(q) := M(q)^{-1}. $$
Poincaré's half plane model. We illustrate this ODE in the case of Poincaré's half plane model of the hyperbolic space, defined by the Riemannian metric $$ M(q) := \frac{\mathrm{Id}}{q_1^2}, $$ where $q = (q_0,q_1) \in \Omega = R \times R^{++}$. Thus $H(q,p) = (q_1^2/2) \|p\|^2$.
Exponential map. We also consider the Riemannian metric on $R^2 \setminus \{0\}$ defined by $$ M(q) := \frac{\mathrm{Id}}{\|q\|^2}. $$ It can be regarded as the metric induced by the logarithmic (multivalued) map $\ln : C \setminus \{0\} \to C$, where the target space $C$ is equipped with the usual norm, and it possesses periodic trajectories.
def SquaredNorm(p):
    return (p**2).sum(axis=0)
def H_HalfPlane(q,p):
    return 0.5*q[1]**2*SquaredNorm(p)
def H_Log(q,p):
    return 0.5*SquaredNorm(q)*SquaredNorm(p)
In this case, the Hamiltonian is the sum of the kinetic energy and of the potential energy, here associated with a mass at the origin.
def H_Celestial(q,p):
    return 0.5*SquaredNorm(p) - SquaredNorm(q)**-0.5
Some models apparently unrelated to mechanics or optics also exhibit a Hamiltonian structure. This is the case of the Lotka-Volterra predator-prey model, depending on parameters $\alpha,\beta,\gamma,\delta \in R$: $$ \dot x = x (\alpha-\beta y), \qquad \dot y = y(\delta x-\gamma). $$ Using a logarithmic change of variables, $q=\ln x$ and $p=\ln y$, this model can be put in Hamiltonian form, with $$ H(q,p) = \alpha p - \beta e^p + \gamma q - \delta e^q. $$
LotkaVolterra_params = (1.,1.,1.,1.)
def H_LotkaVolterra(q,p,params=LotkaVolterra_params):
    alpha,beta,gamma,delta = params
    return alpha*p - beta*np.exp(p) + gamma*q - delta*np.exp(q)
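As a sanity check of this Hamiltonian structure (a standalone numpy sketch, not part of the notebook's toolchain), one may integrate the transformed system $\dot q = \alpha-\beta e^p$, $\dot p = \delta e^q-\gamma$ with a Runge-Kutta scheme and verify that $H$ stays constant along the flow:

```python
import numpy as np

alpha, beta, gamma, delta = 1., 1., 1., 1.

def field(z):
    """Hamiltonian vector field of the Lotka-Volterra model, z = (q,p)."""
    q, p = z
    return np.array([alpha - beta*np.exp(p), delta*np.exp(q) - gamma])

def H(z):
    q, p = z
    return alpha*p - beta*np.exp(p) + gamma*q - delta*np.exp(q)

z = np.array([0.5, -0.5]); dt = 1e-3
H0 = H(z)
for _ in range(2000):  # RK4 integration up to time t=2
    k1 = field(z); k2 = field(z + 0.5*dt*k1)
    k3 = field(z + 0.5*dt*k2); k4 = field(z + dt*k3)
    z = z + dt*(k1 + 2*k2 + 2*k3 + k4)/6
assert abs(H(z) - H0) < 1e-9  # H is conserved up to the scheme's accuracy
```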
Since Hamiltonian structures are quite common in mathematics, a Hamiltonian class is provided, which implements most of the operations presented in this notebook. It does rely on automatic differentiation as well, but hides these details internally. We check here that it reproduces the same results.
# Non-separable Hamiltonians
HalfPlane = GenericHamiltonian(H_HalfPlane)
Log = GenericHamiltonian(H_Log)
# Separable Hamiltonian
alpha,beta,gamma,delta=LotkaVolterra_params
LotkaVolterra = SeparableHamiltonian( lambda q:gamma*q - delta*np.exp(q), lambda p:alpha*p - beta*np.exp(p) )
LotkaVolterra.vdim=None # Uses scalar variables
# Separable Hamiltonian with a quadratic structure in the second variable
Celestial = SeparableHamiltonian( lambda q:-SquaredNorm(q)**-0.5, lambda p:0.5*SquaredNorm(p) )
# Collect the Hamiltonians defined as functions or classes, for comparison purposes
HClass = [HalfPlane,Log,Celestial]
HFun = [H_HalfPlane,H_Log,H_Celestial]
We illustrate the automatic differentiation mechanism by differentiating a Hamiltonian w.r.t. the position $q$ and momentum $p$, which is a prerequisite for implementing Hamilton's ODE of motion.
Hamiltonian = H_HalfPlane
q = np.array([0.,1.])
p = np.array([1.,1.])
Hamiltonian(q,p)
1.0
In order to access the Hamiltonian's derivatives, say with respect to position, we evaluate it on a variable featuring first order information. Said otherwise, we create a variable representing the first order Taylor expansion $$ q_\mathrm{ad} = (q_0+\delta_0,q_1+\delta_1) + O(\|\delta\|^2), $$ where $\delta = (\delta_0,\delta_1)$ is a symbolic, infinitesimal perturbation.
q_ad = ad.Dense.identity(constant=q)
q_ad # Contains an additional array representing first order information.
denseAD(array([0., 1.]), array([[1., 0.], [0., 1.]]))
print("Zero-th order : ",q_ad.value)
print("First order : \n",q_ad.coef)
Zero-th order : [0. 1.] First order : [[1. 0.] [0. 1.]]
The first order information is propagated through the computations, and eventually we obtain the result $$ H(q_\mathrm{ad},p) = H(q,p) + <\nabla_q H(q,p),\delta> + O(\|\delta\|^2). $$ Note that, for Poincaré's half plane model, one has $\nabla_q H(q,p) = (0,q_1 \|p\|^2)$.
Hamiltonian(q_ad,p)
denseAD(array(1.),array([0., 2.]))
print("Gradient of the Hamiltonian w.r.t. the variable q : ",Hamiltonian(q_ad,p).gradient())
Gradient of the Hamiltonian w.r.t. the variable q : [0. 2.]
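As a quick cross-check (a standalone sketch, independent of the agd library), the same gradient can be approximated by centered finite differences of the half plane Hamiltonian:

```python
import numpy as np

def H_HalfPlane(q, p):  # Poincare half plane Hamiltonian
    return 0.5 * q[1]**2 * (p**2).sum()

q = np.array([0., 1.]); p = np.array([1., 1.])
h = 1e-6
grad_q = np.array([(H_HalfPlane(q + h*e, p) - H_HalfPlane(q - h*e, p)) / (2*h)
                   for e in np.eye(2)])
# Analytic formula: grad_q H = (0, q1*|p|^2) = (0., 2.), as obtained by AD above
assert np.allclose(grad_q, [0., 2.], atol=1e-8)
```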
We can similarly differentiate w.r.t. the momentum $p$. Poincaré's half plane model obeys $\nabla_p H(q,p) = q_1^2 p$.
p_ad = ad.Dense.identity(constant=p)
Hamiltonian(q,p_ad)
denseAD(array(1.),array([1., 1.]))
Comparing with the Hamiltonian class
for Cls,Fun in zip(HClass,HFun):
    assert norm_infinity( Cls.H(q,p) - Fun(q,p) ) < 1e-14
    assert norm_infinity( Cls.DqH(q,p) - Fun(q_ad,p).gradient() ) < 1e-14
    assert norm_infinity( Cls.DpH(q,p) - Fun(q,p_ad).gradient() ) < 1e-14
Evaluating $H(q_\mathrm{ad},p_\mathrm{ad})$ yields a result that may seem unexpected: we obtain the sum $\nabla_q H(q,p)+\nabla_p H(q,p)$ of the gradients, instead of their independent values. Indeed, this follows from the first order Taylor expansion $$ H(q+\delta,p+\delta) = H(q,p) + <\nabla_q H(q,p),\delta> + <\nabla_p H(q,p),\delta> + O(\|\delta\|^2), $$ where $\delta = (\delta_0,\delta_1)$ is an infinitesimal symbolic perturbation.
Hamiltonian(q_ad,p_ad)
denseAD(array(1.),array([1., 3.]))
In order to obtain the gradient of $H$ w.r.t. the independent variables $q$ and $p$, we need to specify that the corresponding infinitesimal perturbations are independent. Namely we set $$ q_\mathrm{ad} = q+(\delta_0,\delta_1), \quad p_\mathrm{ad} = p+(\delta_2,\delta_3), $$ where $(\delta_0,\cdots,\delta_3)$ is a four dimensional infinitesimal perturbation.
q_ad = ad.Dense.identity(constant=q,shift=(0,p.size))
p_ad = ad.Dense.identity(constant=p,shift=(q.size,0))
As desired, we obtain the full gradient of $H$, which is the concatenation of $\nabla_q H$ and $\nabla_p H$.
Hamiltonian(q_ad,p_ad)
denseAD(array(1.),array([0., 2., 1., 1.]))
q_ad
denseAD(array([0., 1.]), array([[1., 0., 0., 0.], [0., 1., 0., 0.]]))
p_ad
denseAD(array([1., 1.]), array([[0., 0., 1., 0.], [0., 0., 0., 1.]]))
The register function applies independent perturbations to a sequence of variables, taking care of shifting the perturbations adequately.
q_ad_,p_ad_ = ad.Dense.register((q,p))
for a,b in zip( (q_ad_,p_ad_), (q_ad,p_ad)):
    assert norm_infinity_ad(a-b)==0
A common programming style in Python applications to numerics is vectorized computation using numpy tensors. This avoids explicit loops when some operation has to be repeated a large number of times, with improvements in clarity, flexibility, and speed.
As an example, we compute $H(q_0,p_0)$, $H(q_1,p_1)$, $H(q_2,p_2)$, where $q_0,q_1,q_2$ and $p_0,p_1,p_2$ are different positions and impulsions. We also differentiate these values w.r.t. the impulsion.
q0 = np.array([0.,1.]); q1 = np.array([1.,1.]); q2 = np.array([-2.,1.]);
p0 = np.array([1.,1.]); p1 = np.array([1.,0.]); p2 = np.array([-1.,-1.]);
Hamiltonian(q0,p0), Hamiltonian(q1,p1), Hamiltonian(q2,p2)
(1.0, 0.5, 1.0)
Ham_seq = [Hamiltonian(q,ad.Dense.identity(constant=p)) for (q,p) in ((q0,p0), (q1,p1), (q2,p2))]
Ham_seq
[denseAD(array(1.),array([1., 1.])), denseAD(array(0.5),array([1., 0.])), denseAD(array(1.),array([-1., -1.]))]
The vectorized form of these computations is as follows.
q_vec = np.stack( (q0,q1,q2),axis=1)
p_vec = np.stack( (p0,p1,p2),axis=1)
Hamiltonian(q_vec,p_vec)
array([1. , 0.5, 1. ])
p_vec_ad = ad.Dense.identity(constant=p_vec)
However, the automatic differentiation result may not be what is desired. Indeed, the entries of p_vec
are all considered independent, and we obtain a six dimensional gradient vector for each component. This is both impractical and numerically expensive.
Ham_vec = Hamiltonian(q_vec,p_vec_ad)
Ham_vec
denseAD(array([1. , 0.5, 1. ]), array([[ 1., 0., 0., 1., 0., 0.], [ 0., 1., 0., 0., 0., 0.], [ 0., 0., -1., 0., 0., -1.]]))
print("Sequential result : ",Ham_seq[0])
print("Vectorized result : ",Ham_vec[0])
Sequential result : denseAD(1.0,[1. 1.]) Vectorized result : denseAD(1.0,[1. 0. 0. 1. 0. 0.])
In this situation, and in contrast with the previous section, we would like the perturbations to be regarded as dependent. This is achieved with the following parameters:
- shape_free : apply free (independent) symbolic perturbations w.r.t. these dimensions.
- shape_bound : apply bound (correlated) symbolic perturbations w.r.t. these dimensions.

It is assumed that the free dimensions come first, and the bound dimensions come last. Only one of the two parameters shape_free and shape_bound needs to be specified, since they are redundant.
p_vec.shape
(2, 3)
The first dimension is free, and the last one is bound since it is only introduced for vectorization purposes.
shape_free,shape_bound = p_vec.shape[:1], p_vec.shape[1:]
p_vec_ad = ad.Dense.identity(constant=p_vec, shape_bound=shape_bound)
#p_vec_ad = ad.Dense.identity(constant=p_vec, shape_free=shape_free) # Equivalently
Ham_vec = Hamiltonian(q_vec,p_vec_ad)
print("Vectorized result (with shape_bound): \n", Ham_vec)
Vectorized result (with shape_bound): denseAD([1. 0.5 1. ], [[ 1. 1.] [ 1. 0.] [-1. -1.]])
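For this Hamiltonian, the bound-dimension gradients can be checked against the closed form $\nabla_p H(q,p) = q_1^2\, p$, applied columnwise (a plain numpy sketch, independent of the library):

```python
import numpy as np

q_vec = np.stack(([0.,1.], [1.,1.], [-2.,1.]), axis=1)  # shape (2,3)
p_vec = np.stack(([1.,1.], [1.,0.], [-1.,-1.]), axis=1)
# grad_p H(q,p) = q1^2 * p, broadcast over the bound (batch) dimension
grad_p = q_vec[1]**2 * p_vec
# One 2D gradient per column, matching the vectorized AD output
assert np.allclose(grad_p.T, [[1.,1.], [1.,0.], [-1.,-1.]])
```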
This technique is compatible with differentiation w.r.t. multiple independent variables.
q_vec_ad = ad.Dense.identity(constant=q_vec, shape_free=shape_free, shift=(0,len(p_vec)) )
p_vec_ad = ad.Dense.identity(constant=p_vec, shape_free=shape_free, shift=(len(q_vec),0) )
Ham_vec = Hamiltonian(q_vec_ad,p_vec_ad)
Ham_vec
denseAD(array([1. , 0.5, 1. ]), array([[ 0., 2., 1., 1.], [ 0., 1., 1., 0.], [ 0., 2., -1., -1.]]))
print("Full gradient w.r.t q0,p0 : ", Ham_vec[0])
Full gradient w.r.t q0,p0 : denseAD(1.0,[0. 2. 1. 1.])
Again, the register function takes care of the shifts.
q_vec_ad_,p_vec_ad_ = ad.Dense.register((q_vec,p_vec), shape_bound=shape_bound)
for a,b in zip( (q_vec_ad_,p_vec_ad_), (q_vec_ad,p_vec_ad) ):
    assert norm_infinity_ad(a-b) == 0
Alternative implementation using arrays of arrays. Another way to achieve vectorization is to use 'arrays of arrays'. This approach is not recommended here, since it is more cumbersome and less efficient, but it is still shown for completeness and testing.
q_diss = ad.disassociate(q_vec, shape_free=(2,), # Creates an array of arrays
expand_free_dims=None, expand_bound_dims=None)
print("An array of arrays :", q_diss)
print("First element :", q_diss[0])
An array of arrays : [array([ 0., 1., -2.]) array([1., 1., 1.])] First element : [ 0. 1. -2.]
It is safer to avoid scalar arrays in the context of 'arrays of arrays'. Indeed, confusion arises otherwise between 'array scalars with array components' and 'standard arrays'. The default for ad.disassociate
is to introduce singleton dimensions to ensure this behavior is avoided.
q_diss = ad.disassociate(q_vec, shape_free=(2,))
p_diss = ad.disassociate(p_vec, shape_free=(2,))
print("Disassociated array ", q_diss)
print("with type ", type(q_diss), "with data type ", q_diss.dtype,
"and shape ", q_diss.shape, "including a trailing singleton.")
print("Reconstructed array", ad.associate(q_diss)) # Inverse operation to disassociate
Disassociated array [[array([[ 0.], [ 1.], [-2.]])] [array([[1.], [1.], [1.]])]] with type <class 'numpy.ndarray'> with data type object and shape (2, 1) including a trailing singleton. Reconstructed array [[ 0. 1. -2.] [ 1. 1. 1.]]
We can now manipulate q_diss
and p_diss
as standard low-dimensional arrays, abstracting the underlying vectorization, and in particular use the AD tools.
Hamiltonian(q_diss,p_diss)
array([array([[1. ], [0.5], [1. ]])], dtype=object)
q_diss_ad = ad.Dense.identity(constant=q_diss,shift=(0,2))
p_diss_ad = ad.Dense.identity(constant=p_diss,shift=(2,0))
Ham_diss_ad = Hamiltonian(q_diss_ad,p_diss_ad)
ad.associate(Ham_diss_ad)
denseAD(array([1. , 0.5, 1. ]), array([[ 0., 2., 1., 1.], [ 0., 1., 1., 0.], [ 0., 2., -1., -1.]]))
Comparison with the Hamiltonian class
The Hamiltonian class needs to know about the number of independent components, in order to deal with multi-dimensional arrays appropriately.
HalfPlane.shape_free = (2,)
assert norm_infinity_ad(Ham_vec - ad.associate(Ham_diss_ad)) < 1e-14
assert norm_infinity(HalfPlane.H(q_vec,p_vec) - Ham_vec.value) < 1e-14
assert norm_infinity(HalfPlane.DqH(q_vec,p_vec) - Ham_vec.gradient()[:2]) < 1e-14
assert norm_infinity(HalfPlane.DpH(q_vec,p_vec) - Ham_vec.gradient()[2:]) < 1e-14
There are situations where second order automatic differentiation is needed. It could be achieved, in principle, by recursive calls to first order differentiation. However, for reasons of performance and convenience, a dedicated class is implemented.
q_ad = ad.Dense2.identity(constant=q)
The hessian of Poincaré's half plane model Hamiltonian, w.r.t. position, is $$ \nabla^2_q H(q,p) = \begin{pmatrix} 0 & 0 \\ 0 & \|p\|^2 \end{pmatrix} $$ and w.r.t. impulsion $$ \nabla^2_p H(q,p) = q_1^2 \mathrm{Id}. $$
Ham = Hamiltonian(q_ad,p)
Ham
denseAD2(array(1.),array([0., 2.]), array([[0., 0.], [0., 2.]]))
print("Function value : ", Ham.value)
print("Gradient : ", Ham.coef1)
print("Hessian : \n", Ham.coef2)
Function value : 1.0 Gradient : [0. 2.] Hessian : [[0. 0.] [0. 2.]]
p_ad = ad.Dense2.identity(constant=p)
Hamiltonian(q,p_ad)
denseAD2(array(1.),array([1., 1.]), array([[1., 0.], [0., 1.]]))
A joint hessian, of size $4\times 4$, can be computed.
q_ad = ad.Dense2.identity(constant=q, shift=(0,p.size))
p_ad = ad.Dense2.identity(constant=p, shift=(q.size,0))
p_ad.coef2.shape
(2, 4, 4)
Hamiltonian(q_ad,p_ad)
denseAD2(array(1.),array([0., 2., 1., 1.]), array([[0., 0., 0., 0.], [0., 2., 2., 2.], [0., 2., 1., 0.], [0., 2., 0., 1.]]))
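This joint hessian can be cross-checked against second order finite differences of the function on the joint variable $x=(q,p)$ (a standalone sketch, independent of the library):

```python
import numpy as np

def H(x):  # half plane Hamiltonian on the joint variable x = (q0,q1,p0,p1)
    q, p = x[:2], x[2:]
    return 0.5 * q[1]**2 * (p**2).sum()

x = np.array([0., 1., 1., 1.]); h = 1e-4
E = np.eye(4)
# Centered finite difference approximation of the 4x4 hessian
hess = np.array([[(H(x + h*(E[i]+E[j])) - H(x + h*(E[i]-E[j]))
                 - H(x - h*(E[i]-E[j])) + H(x - h*(E[i]+E[j]))) / (4*h*h)
                  for j in range(4)] for i in range(4)])
expected = np.array([[0.,0.,0.,0.], [0.,2.,2.,2.], [0.,2.,1.,0.], [0.,2.,0.,1.]])
assert np.allclose(hess, expected, atol=1e-6)
```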
Vectorization works as in the first order case.
q_vec_ad = ad.Dense2.identity(constant=q_vec, shape_free=shape_free)
Ham = Hamiltonian(q_vec_ad,p_vec)
Ham[0]
denseAD2(array(1.),array([0., 2.]), array([[0., 0.], [0., 2.]]))
q_vec_ad = ad.Dense2.identity(constant=q_vec,shape_bound=shape_bound, shift=(0,len(p_vec)) )
p_vec_ad = ad.Dense2.identity(constant=p_vec,shape_bound=shape_bound, shift=(len(q_vec),0) )
Ham = Hamiltonian(q_vec_ad,p_vec_ad)
Ham[0]
denseAD2(array(1.),array([0., 2., 1., 1.]), array([[0., 0., 0., 0.], [0., 2., 2., 2.], [0., 2., 1., 0.], [0., 2., 0., 1.]]))
Alternative approach using recursive automatic differentiation. We present an alternative approach to second order automatic differentiation, based on recursive first order differentiation. It is not recommended, since it is more cumbersome and less efficient, but it is still shown for completeness and testing.
def ADRecIdentity(constant,shape_free=None,shift=(0,0)):
    # First order differentiation
    q_ad = ad.Dense.identity(constant=constant, shape_free=shape_free, shift=shift)
    # Hide automatic differentiation info in the component type
    q_diss = ad.disassociate(q_ad, shape_free=shape_free)
    # Recursive first order differentiation
    q_ad2 = ad.Dense.identity(constant=q_diss, shape_free=shape_free, shift=shift)
    return q_ad2
def ADRecToAD2(q):
    q01 = ad.associate(q.value) # Zero-th and first order
    q12 = ad.associate(q.coef,squeeze_free_dims=-2) # First and second order
    q12 = np.moveaxis(q12,q.ndim-1,-1) # Put ad information last
    return ad.Dense2.denseAD2(q01.value,q01.coef,q12.coef)
q_rec = ADRecIdentity(q,shape_free=(2,), shift=(0,2))
p_rec = ADRecIdentity(p,shape_free=(2,), shift=(2,0))
Ham_rec = Hamiltonian(q_rec,p_rec)
ADRecToAD2(Ham_rec)
denseAD2(array(1.),array([0., 2., 1., 1.]), array([[0., 0., 0., 0.], [0., 2., 2., 2.], [0., 2., 1., 0.], [0., 2., 0., 1.]]))
q_vec_rec = ADRecIdentity(q_vec, shape_free=(2,), shift=(0,2))
p_vec_rec = ADRecIdentity(p_vec, shape_free=(2,), shift=(2,0))
Ham_vec_rec = Hamiltonian(q_vec_rec,p_vec_rec)
ADRecToAD2(Ham_vec_rec)[0]
denseAD2(array(1.),array([0., 2., 1., 1.]), array([[0., 0., 0., 0.], [0., 2., 2., 2.], [0., 2., 1., 0.], [0., 2., 0., 1.]]))
Automatic differentiation makes it possible to compute the gradient and hessian of a function.
print(f"An array of shape {Ham.shape}, and AD information w.r.t {Ham.size_ad} independent variables")
An array of shape (3,), and AD information w.r.t 4 independent variables
Internally, the coefficients corresponding to differentiation are stored last.
print("Internal storage has derivatives last.")
print(f"first order: {Ham.coef1.shape}, second order: {Ham.coef2.shape}")
Internal storage has derivatives last. first order: (3, 4), second order: (3, 4, 4)
However, for practical purposes it is often desirable to move them first. This is the role of the gradient
and hessian
functions.
print("Interface moves derivatives first.")
print(f"gradient: {Ham.gradient().shape}, hessian: {Ham.hessian().shape}")
Interface moves derivatives first. gradient: (4, 3), hessian: (4, 4, 3)
We implement Hamilton's equations of motion $$ \dot q = \partial_p H(q,p), \qquad \dot p = -\partial_q H(q,p), $$ using several numerical schemes. In terms of automatic differentiation, the most interesting case is the semi-implicit Symplectic Euler scheme.
Conserved quantity. By construction, the Hamiltonian value is conserved along the Hamiltonian flow: $$ H(q(t),p(t)) = \mathrm{cst}, $$ if $t \mapsto (q(t),p(t))$ is a solution to Hamilton's equations of motion.
Explicit solution for the half plane model. The geodesics for the Poincare half plane model are known explicitly. They are half circles whose center lies on the horizontal axis.
The simplest of all ODE solvers reads $$ \frac{q_{n+1}-q_n} {\Delta t} = \partial_p H(q_n,p_n), \qquad \frac{p_{n+1}-p_n} {\Delta t} = -\partial_q H(q_n,p_n), $$ where $(q_0,p_0)$ is a given initial condition, $n \geq 0$ is the iteration number, and $\Delta t$ is the time step.
def EulerStep(q,p,H,dt):
    d=len(q)
    q_ad = ad.Dense.identity(constant=q,shape_free=(d,),shift=(0,d))
    p_ad = ad.Dense.identity(constant=p,shape_free=(d,),shift=(d,0))
    grad = H(q_ad,p_ad).gradient()
    return q+dt*grad[d:],p-dt*grad[:d]
def Geodesic(q0,p0,H,t,n=100,step=EulerStep):
    dt=t/n
    q,p=q0.copy(),p0.copy()
    Q,P=[q],[p]
    for i in range(n):
        q,p=step(q,p,H,dt)
        Q.append(q)
        P.append(p)
    # For convenience, we put time as a second coordinate.
    return np.stack(Q,axis=1),np.stack(P,axis=1),np.linspace(0,t,n+1)
Euler's explicit scheme is only first order accurate. While the circular shape of the geodesic is more or less observed, the Hamiltonian conservation is far from perfect.
Q,P,T = Geodesic(q_vec,p_vec,Hamiltonian,4.)
plt.axis('equal'); plt.title("Poincare geodesic (Explicit euler)")
plt.plot(Q[0],Q[1]); plt.axhline(0);
plt.title("Poincare Hamiltonian (Explicit euler)")
plt.plot(T,Hamiltonian(Q,P));
A natural approach to improve the ODE solver is to increase its consistency order. This can be achieved using Runge-Kutta methods, here of second and fourth order.
def SymplecticGradient(q,p,H):
    d=len(q)
    q_ad = ad.Dense.identity(constant=q,shape_free=(d,),shift=(0,d))
    p_ad = ad.Dense.identity(constant=p,shape_free=(d,),shift=(d,0))
    grad = H(q_ad,p_ad).gradient()
    return grad[d:],-grad[:d]
def RK2Step(q,p,H,dt):
    k1 = SymplecticGradient(q,p,H)
    k2 = SymplecticGradient(q+0.5*dt*k1[0],p+0.5*dt*k1[1],H)
    return q+dt*k2[0],p+dt*k2[1]
def RK4Step(q,p,H,dt):
    k1 = SymplecticGradient(q,p,H)
    k2 = SymplecticGradient(q+0.5*dt*k1[0],p+0.5*dt*k1[1],H)
    k3 = SymplecticGradient(q+0.5*dt*k2[0],p+0.5*dt*k2[1],H)
    k4 = SymplecticGradient(q+dt*k3[0],p+dt*k3[1],H)
    return q+dt*(k1[0]+2*k2[0]+2*k3[0]+k4[0])/6.,p+dt*(k1[1]+2*k2[1]+2*k3[1]+k4[1])/6.
Q2,P2,T = Geodesic(q_vec,p_vec,Hamiltonian,4.,step=RK2Step)
Q4,P4,T = Geodesic(q_vec,p_vec,Hamiltonian,4.,step=RK4Step)
plt.axis('equal'); plt.title("Poincare geodesic (Explicit solvers)")
plt.plot(Q[0],Q[1], Q2[0],Q2[1], Q4[0],Q4[1]); plt.axhline(0);
In this example, the Hamiltonian is almost exactly conserved by the second and fourth order schemes.
plt.title("Poincare Hamiltonian (Runge-Kutta)")
plt.plot(T,Hamiltonian(Q2,P2), T,Hamiltonian(Q4,P4)); #T,Hamiltonian(Q,P),
Cone model. However, if the trajectory is long enough, and the time step is large enough, then even the high order schemes fail to conserve the Hamiltonian. This is evidenced below with the cone model.
Hamiltonian = H_Log
q,p = np.array([1,0]),np.array([0,1])
Q,P,T = Geodesic(q,p,Hamiltonian,6.,step=EulerStep)
Q2,P2,T2 = Geodesic(q,p,Hamiltonian,25.,step=RK2Step)
Q4,P4,T4 = Geodesic(q,p,Hamiltonian,75.,step=RK4Step)
plt.axis('equal'); plt.title("Cone model (Explicit solvers)")
plt.plot(Q[0],Q[1], Q2[0],Q2[1], Q4[0],Q4[1]);
plt.title("Cone Hamiltonian (Explicit solvers)")
plt.plot(T,Hamiltonian(Q,P), T2,Hamiltonian(Q2,P2), T4,Hamiltonian(Q4,P4));
Likewise, in the celestial mechanics case, the Hamiltonian drifts with time using the Euler explicit and Runge-Kutta schemes.
Hamiltonian = H_Celestial
q,p = np.array([1.,0.]),np.array([0.3,0.7])
t = 10.
Q,P,T = Geodesic(q,p,Hamiltonian,t,step=EulerStep,n=int(100*t))
Q2,P2,T2 = Geodesic(q,p,Hamiltonian,t,step=RK2Step,n=int(20*t))
Q4,P4,T4 = Geodesic(q,p,Hamiltonian,t,step=RK4Step,n=int(8*t))
plt.axis('equal'); plt.title("Celestial mechanics (Explicit solvers)")
plt.plot(Q[0],Q[1], Q2[0],Q2[1], Q4[0],Q4[1]);
plt.plot(T,Hamiltonian(Q,P), T2,Hamiltonian(Q2,P2), T4,Hamiltonian(Q4,P4));
Comparison with the Hamiltonian class
Q_,P_,T_ = Celestial.integrate(q,p,scheme="Euler",niter=int(100*t),T=t,path=True)
Q2_,P2_,T2_ = Celestial.integrate(q,p,scheme="Runge-Kutta-2", niter=int(20*t), T=t,path=True)
Q4_,P4_,T4_ = Celestial.integrate(q,p,scheme="Runge-Kutta-4", niter=int(8*t), T=t,path=True)
for a,b in zip( (Q_,P_,T_,Q2_,P2_,T2_,Q4_,P4_,T4_), (Q,P,T,Q2,P2,T2,Q4,P4,T4) ):
    assert norm_infinity(a-b)<1e-14
Symplectic schemes guarantee that a quantity, closely related to the Hamiltonian, is conserved along the trajectory.
The Euler symplectic scheme is defined as follows: $$ \frac{p_{n+1}-p_n}{\Delta t} = -\partial_q H(q_n,p_{n+1}), \quad \frac{q_{n+1}-q_n}{\Delta t} = \partial_p H(q_n,p_{n+1}). $$ Note that the scheme is semi-implicit, in view of the definition of $p_{n+1}$. Two cases must be distinguished for the implementation.
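The scheme's qualitative behavior can be previewed on the harmonic oscillator $H(q,p)=(q^2+p^2)/2$, a toy standalone example (independent of the library) for which both updates are explicit, and the energy error stays bounded without drift:

```python
import numpy as np

def symplectic_euler(q, p, dt, n):
    """Symplectic Euler for H = (q^2+p^2)/2."""
    for _ in range(n):
        p = p - dt * q   # p_{n+1} = p_n - dt * dH/dq(q_n)
        q = q + dt * p   # q_{n+1} = q_n + dt * dH/dp(p_{n+1})
    return q, p

q, p = 1.0, 0.0
H0 = 0.5 * (q*q + p*p)
q, p = symplectic_euler(q, p, dt=0.01, n=10000)  # integrate up to t = 100
H1 = 0.5 * (q*q + p*p)
assert abs(H1 - H0) < 0.01  # energy error remains O(dt), with no drift in time
```

In fact, for this linear example the scheme exactly conserves the modified quantity $q^2+p^2-\Delta t\, q p$, which explains the bounded oscillation of the energy.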
def SymplecticEulerStep_Separable(q,p,H,dt):
    d=len(q)
    # Differentiate w.r.t. q and advance p
    q_ad = ad.Dense.identity(constant=q,shape_free=(d,))
    q_grad = H(q_ad,p).gradient() # We can use p thanks to separability
    p1 = p-dt*q_grad
    # Differentiate w.r.t. p and advance q
    p1_ad = ad.Dense.identity(constant=p1,shape_free=(d,))
    p1_grad = H(q,p1_ad).gradient()
    q1 = q+dt*p1_grad
    return q1,p1
Hamiltonian = H_Celestial
q,p = np.array([1,0]),np.array([0.3,0.7])
t = 20.
QS,PS,TS = Geodesic(q,p,Hamiltonian,t,step=SymplecticEulerStep_Separable,n=int(30*t))
The symplectic Euler scheme is only first order accurate, but it has the property that the Hamiltonian remains bounded above and below during the evolution.
plt.axis('equal'); plt.title("Celestial mechanics (Euler symplectic)")
plt.plot(QS[0],QS[1]);
plt.title("Hamiltonian conservation (Euler symplectic)")
plt.plot(TS,Hamiltonian(QS,PS));
The Verlet method is a second order symplectic integration scheme. It is explicit if the Hamiltonian is separable.
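The accuracy gain can again be previewed on the harmonic oscillator $H=(q^2+p^2)/2$ (a toy standalone sketch, independent of the library): the Verlet energy error scales as $O(\Delta t^2)$ instead of $O(\Delta t)$:

```python
import numpy as np

def verlet(q, p, dt, n):
    """Verlet scheme for H = (q^2+p^2)/2: a full step in q between two half steps in p."""
    for _ in range(n):
        p = p - 0.5 * dt * q   # half step in p
        q = q + dt * p         # full step in q
        p = p - 0.5 * dt * q   # half step in p
    return q, p

q, p = 1.0, 0.0
H0 = 0.5 * (q*q + p*p)
q, p = verlet(q, p, dt=0.01, n=10000)  # integrate up to t = 100
H1 = 0.5 * (q*q + p*p)
assert abs(H1 - H0) < 1e-3  # second order: energy error O(dt^2), without drift
```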
def SymplecticVerletStep_Separable(q,p,H,dt):
    d=len(q)
    # Differentiate w.r.t. q and advance p
    q_ad = ad.Dense.identity(constant=q,shape_free=(d,))
    q_grad = H(q_ad,p).gradient()
    p1 = p-0.5*dt*q_grad
    # Differentiate w.r.t. p and advance q
    p1_ad = ad.Dense.identity(constant=p1,shape_free=(d,))
    p1_grad = H(q,p1_ad).gradient()
    q1 = q+dt*p1_grad
    # Differentiate w.r.t. q and advance p
    q1_ad = ad.Dense.identity(constant=q1,shape_free=(d,))
    q1_grad = H(q1_ad,p1).gradient()
    p2 = p1-0.5*dt*q1_grad
    return q1,p2
Hamiltonian = H_Celestial
q,p = np.array([1.,0.]),np.array([0.3,0.7])
t = 20.
QV,PV,TV = Geodesic(q,p,Hamiltonian,t,step=SymplecticVerletStep_Separable,n=int(30*t))
It can be shown that the Verlet scheme, in the separable case, amounts to the symplectic Euler scheme shifted by half a time step. Visually, their output is therefore quite similar.
plt.axis('equal'); plt.title("Celestial mechanics (Verlet symplectic)")
plt.plot(QV[0],QV[1]);
The improved accuracy is more obvious on the Hamiltonian conservation than on the trajectory.
plt.title("Hamiltonian conservation (Verlet symplectic vs Euler symplectic)")
plt.plot(TV,Hamiltonian(QV,PV), TS,Hamiltonian(QS,PS) );
Comparison with the Hamiltonian class
QS_,PS_,TS_ = Celestial.integrate(q,p,scheme="Euler-p", niter=int(30*t),T=t,path=True)
QV_,PV_,TV_ = Celestial.integrate(q,p,scheme="Verlet-p",niter=int(30*t),T=t,path=True)
for a,b in zip( (QS_,PS_,TS_,QV_,PV_,TV_), (QS,PS,TS,QV,PV,TV) ): assert norm_infinity(a-b)<1e-14
We implement the general semi-implicit Symplectic Euler scheme using a Newton method for the implicit part. Our first step is to implement the Newton method.
Note on Newton versus fixed point. In the context of implicit schemes for geometric integration, fixed point iterations are often more computationally efficient than Newton iterations. We consider Newton iterations here to illustrate the AD library. The Hamiltonian class uses by default a fixed point solver for the implicit schemes.
Generic Newton solver. A Newton solver is implemented in the ad.Optimization package, based on a similar approach using automatic differentiation, but with some additional flexibility.
def NewtonSolve(F,p0,niter=5,*args):
    p_ad = ad.Dense.identity(constant=p0.copy(),shape_free=(len(p0),))
    for i in range(niter):
        p_ad = p_ad+F(p_ad,*args).solve()
    return p_ad.value
We test it by solving $x+x^2=0$, with initial guess $x=0.5$. The solution $x=0$ is found.
def FTest(x):
    return x+x**2
NewtonSolve(FTest,np.array([0.5]))
array([5.39659527e-16])
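The same iteration can be mirrored without AD by supplying the jacobian by hand (a plain numpy sketch, for comparison only; the names `newton` and `J` are hypothetical, not part of the library):

```python
import numpy as np

def newton(F, J, x, niter=5):
    """Plain Newton iteration: x <- x - J(x)^{-1} F(x)."""
    for _ in range(niter):
        x = x - np.linalg.solve(J(x), F(x))
    return x

F = lambda x: x + x**2          # the test function above
J = lambda x: np.diag(1 + 2*x)  # its jacobian (diagonal in this 1D case)
root = newton(F, J, np.array([0.5]))
assert abs(root[0]) < 1e-10  # quadratic convergence to the root x = 0
```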
However, compatibility issues may arise if the function $F$ itself uses automatic differentiation in its definition. This is the case for our numerical schemes, which involve the partial derivatives of the Hamiltonian. We circumvent this potential problem by hiding the AD information in the element type.
def NewtonSolve_AD(F,p0,niter=5,*args):
    d=len(p0)
    def disac(x): return ad.disassociate(x,shape_free=(d,))
    def assoc(x): return ad.associate(x)
    p_ad = ad.Dense.identity(constant=p0.copy(),shape_free=(d,))
    for i in range(niter):
        p_ad = p_ad+assoc(F(disac(p_ad),*args)).solve()
    return p_ad.value
def SymplecticEulerStep(q0,p0,H,dt):
    d=len(q0)
    def F(p1):
        # A potential issue is that p1 may have an additional singleton dimension,
        # which is introduced in recursive AD techniques. So we must reshape q0 and p0 accordingly.
        q0_,p0_ = (a.reshape(p1.shape) for a in (q0,p0))
        q0_ad = ad.Dense.identity(constant=q0_,shape_free=(d,))
        q0_grad = H(q0_ad,p1).gradient()
        return p1-p0_+dt*q0_grad
    p1 = NewtonSolve_AD(F,p0) # Implicit update of momentum
    p1_ad = ad.Dense.identity(constant=p1,shape_free=(d,))
    p1_grad = H(q0,p1_ad).gradient()
    q1 = q0+dt*p1_grad # Explicit update of position
    return q1,p1
If the Hamiltonian is separable, then the semi-implicit scheme coincides with the explicit scheme.
Hamiltonian = H_Celestial
q,p = np.array([1.,0.]),np.array([0.3,0.7])
qI,pI = SymplecticEulerStep(q,p,Hamiltonian,0.1) # Semi-implicit
qE,pE = SymplecticEulerStep_Separable(q,p,Hamiltonian,0.1)
for a,b in zip( (qI,pI), (qE,pE) ):
    assert norm_infinity(a-b)<1e-14
However, only the semi-implicit implementation is truly symplectic for general Hamiltonians.
Hamiltonian = H_Log
q,p = np.array([1.,0.]),np.array([0.,1.])
print("Semi-implicit :", SymplecticEulerStep(q,p,Hamiltonian,0.1))
print("Explicit :", SymplecticEulerStep_Separable(q,p,Hamiltonian,0.1))
Semi-implicit : (array([0.98989795, 0.1 ]), array([-0.10102051, 1. ])) Explicit : (array([0.99, 0.1 ]), array([-0.1, 1. ]))
t = 20.
QI,PI,TI = Geodesic(q,p,Hamiltonian,t,step=SymplecticEulerStep,n=int(10*t))
QS,PS,TS = Geodesic(q,p,Hamiltonian,t,step=SymplecticEulerStep_Separable,n=int(10*t))
Neither of the two schemes is very accurate, since they are only first order. (In this specific example, one expects a closed geodesic.)
plt.axis('equal'); plt.title("Cone model (Euler symplectic, separable or non-separable)")
plt.plot(QI[0],QI[1], QS[0],QS[1] );
The semi-implicit scheme almost perfectly conserves the Hamiltonian, in contrast with the explicit scheme, which here makes the invalid assumption of separability. The goal is thus achieved.
plt.title("Cone Hamiltonian, Euler alternating scheme")
plt.plot(TI,Hamiltonian(QI,PI), label="semi-implicit (symplectic)")
plt.plot(TS,Hamiltonian(QS,PS), label="explicit (non-symplectic)")
plt.legend();
Comparison with the Hamiltonian class
The default implementation of implicit schemes in the Hamiltonian class uses a fixed point solver. A small discrepancy is observed with the Newton method, due to the convergence tolerance.
QI_,PI_,TI_ = Log.integrate(q,p,scheme="Euler-p", niter=int(10*t),T=t,path=True)
for a,b in zip( (QI_,PI_,TI_), (QI,PI,TI) ): assert norm_infinity(a-b) < 1e-7
Tightening the stopping criterion of the fixed point solver improves the agreement.
Log.impl_solver = lambda f,x : fixedpoint(f,x,tol=1e-14)
QI_,PI_,TI_ = Log.integrate(q,p,scheme="Euler-p", niter=int(10*t),T=t,path=True)
for a,b in zip( (QI_,PI_,TI_), (QI,PI,TI) ): assert norm_infinity(a-b) < 1e-12
scheme = Log.symplectic_schemes()["Euler-p"]
The symplectic form is a bilinear form on the space of positions and impulsions, defined as $$ B( (\delta q,\delta p), (\delta q',\delta p') ) = <\delta q,\delta p'> - <\delta p, \delta q'>. $$ It is often denoted $B = \delta q \wedge \delta p$.
def SymplecticForm(q_ad,p_ad):
    """
    Returns the matrix of the symplectic 2-form,
    in the considered first order perturbations.
    """
    dp = p_ad.gradient()
    dq = q_ad.gradient()
    size_ad = p_ad.size_ad
    return ad.array([[lp.dot_VV(dq[i],dp[j]) - lp.dot_VV(dp[i],dq[j])
        for i in range(size_ad)] for j in range(size_ad)])
Consider canonical independent perturbations in the position and impulsion variables. The matrix of the symplectic form is a $2d\times 2d$ antisymmetric matrix, with block form $$ \begin{pmatrix} 0 & -I\\ I & 0 \end{pmatrix}. $$
q,p = np.array([1.,0.]),np.array([0.,1.])
q_ad,p_ad = ad.Dense.register( (q,p) )
mSymp = SymplecticForm(q_ad,p_ad)
print(f"Matrix of the symplectic form:\n{mSymp}")
Matrix of the symplectic form: [[ 0. 0. -1. 0.] [ 0. 0. 0. -1.] [ 1. 0. 0. 0.] [ 0. 1. 0. 0.]]
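Equivalently, a map is symplectic when its jacobian $M$ preserves this matrix $\Omega$, i.e. $M^T \Omega M = \Omega$. For a single symplectic Euler step of the harmonic oscillator $H=(q^2+p^2)/2$ (a toy standalone example, independent of the library), the jacobian is explicit and the identity can be checked directly:

```python
import numpy as np

dt = 0.1
# One symplectic Euler step for H = (q^2+p^2)/2:
#   p1 = p - dt*q,  q1 = q + dt*p1 = (1-dt^2)*q + dt*p
# Its jacobian in the (q,p) variables:
M = np.array([[1 - dt**2, dt],
              [-dt,       1.]])
# Matrix of the symplectic form, in the same (q,p) ordering
Omega = np.array([[0., -1.],
                  [1.,  0.]])
assert np.allclose(M.T @ Omega @ M, Omega)  # the step preserves the symplectic form
```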
We check that the form is preserved by our numerical schemes, by applying them to the q_ad
and p_ad
variables. Since their definition already involves automatic differentiation, we need to hide it using the associate and disassociate functions.
d=len(q)
def disac(x): return ad.disassociate(x,shape_free=(d,))
def assoc(x): return ad.associate(x)
qE_ad,pE_ad = (assoc(x) for x in SymplecticEulerStep_Separable(disac(q_ad),disac(p_ad),H_Celestial,0.1))
assert norm_infinity(SymplecticForm(qE_ad,pE_ad)-mSymp) < 1e-14
scheme(disac(q_ad),disac(p_ad),0.1)
(array([[denseAD(array([0.98989795]),array([[ 0.96948553, 0.00206207, 0.10206207, -0.02041241]]))], [denseAD(array([0.1]),array([[0.2 , 0.98989795, 0. , 0.1 ]]))]], dtype=object), array([[denseAD(array([-0.10102051]),array([[-0.10310363, 0.02062073, 1.02062073, -0.20412415]]))], [denseAD(array([1.]),array([[ 0. , -0.10102051, 0. , 1. ]]))]], dtype=object))
# Semi-implicit scheme for the H_Log hamiltonian
qI_ad,pI_ad = (assoc(x) for x in scheme(disac(q_ad),disac(p_ad),0.1))
assert norm_infinity(SymplecticForm(qI_ad,pI_ad) - mSymp) < 1e-14
Geodesic shooting is the process of adjusting the initial velocity or momentum of a geodesic, so that the endpoint at time $t=1$ meets a prescribed position. A natural strategy is to optimize the initial velocity using a Newton method, taking advantage of automatic differentiation to compute the jacobian matrix of the exponential map. Note that AD is also used, independently, to solve Hamilton's ODE, which involves the derivatives of the Hamiltonian.
We use the Poincare Half-Plane model for illustration. It is particularly simple since, due to its hyperbolic nature, there exists a unique geodesic between any two given points.
Riemannian exponential map. In Riemannian geometry, the exponential map associates to a position $q$ and a velocity $v$ the endpoint $\exp_q(v) := \gamma_q^v(1)$ of the corresponding geodesic $\gamma_q^v$. The inverse map is known as the logarithmic map. This terminology does not perfectly apply here since Hamilton's equations are written in terms of the Hamiltonian momentum, which is dual to the velocity.
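In miniature, and independently of the library, the same shooting principle applies to the flow of $\dot q = p$, $\dot p = -q$: adjust $p_0$ by a Newton iteration on the time-one map, differentiated here by finite differences (a hypothetical standalone sketch; the names `flow` and `shoot` are illustrative):

```python
import numpy as np

def flow(q, p, t=1.0, n=1000):
    """Endpoint q(t) of the flow of q' = p, p' = -q, by symplectic Euler."""
    dt = t / n
    for _ in range(n):
        p = p - dt * q
        q = q + dt * p
    return q

def shoot(target, q0, p=0.0, niter=10, h=1e-6):
    """Newton iteration on p0, so that the endpoint meets the target."""
    for _ in range(niter):
        r = flow(q0, p) - target                            # residual
        dr = (flow(q0, p + h) - flow(q0, p - h)) / (2 * h)  # d endpoint / d p0
        p = p - r / dr
    return p

p0 = shoot(target=0.5, q0=0.0)
assert abs(flow(0.0, p0) - 0.5) < 1e-9  # the endpoint meets the target
```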
def Shoot(q,p,*args,**kwargs):
    Q,P,T = Geodesic(q,p,*args,**kwargs)
    return np.take(Q,-1,axis=1)
q0,p0=np.array([0,1]),np.array([1,1])
Shoot(q0,p0,H_HalfPlane,1.,step=EulerStep,n=10)
array([1.7200162 , 1.27931064])
def Aim(q1,q0,pGuess,*args,**kwargs):
    if pGuess is None: pGuess=np.zeros(q0.shape)
    def F(p):
        nonlocal args,kwargs,q0,q1
        # Reshape needed due to introduced trailing singleton
        q0_,q1_ = (np.reshape(a,p.shape) for a in (q0,q1))
        return Shoot(q0_,p,*args,**kwargs)-q1_
    return NewtonSolve_AD(F,pGuess)
q1 = np.array([1.,1.])
p01 = Aim(q1,q0,p0,H_HalfPlane,1.,step=EulerStep,n=10)
assert norm_infinity(Shoot(q0,p01,H_HalfPlane,1.,step=EulerStep,n=10) -q1) < 1e-15