Adaptive PDE discretizations on Cartesian grids¶

Volume : Algorithmic tools¶

Part : Automatic differentiation¶

Chapter : Reverse automatic differentiation¶

This notebook illustrates reverse automatic differentiation, of first and second order. Recall that this approach is recommended for functions from a large dimensional space, to a small dimensional space, and whose jacobian does not have a sparse structure. It is typically appropriate for large scale optimization problems.

Disclaimer. First order reverse automatic differentiation is a classical feature, found in many packages better optimized and better maintained than this one. If you only need this specific feature, then it could be wise to use another implementation.

Summary of volume Algorithmic tools, this series of notebooks.

Main summary of the Adaptive Grid Discretizations book of notebooks, including the other volumes.

Table of contents¶

1. Generating variables and requesting an expression's derivatives
2. Linear operators and their adjoints
- 2.1 Linear mapping
- 2.2 Linear inversion
3. Non-linear operators
4. Format of variables
- 4.1 Scalars and multi-dimensional arrays
- 4.2 Lists and dictionnaries

Acknowledgement. Some of the experiments presented in these notebooks are part of ongoing research with Ludovic Métivier and Da Chen.

Copyright Jean-Marie Mirebeau, Centre Borelli, ENS Paris-Saclay, CNRS, University Paris-Saclay

Known bugs and incompatibilities. Our implementation of automatic differentiation is based on subclassing the numpy array class. While simple and powerful, this approach suffers from a few pitfalls, described in notebook ADBugs.

0. Importing the required libraries¶

In [1]:

import sys; sys.path.insert(0,"..") # Allow importing agd from parent directory
#from Miscellaneous import TocTools; print(TocTools.displayTOC('Reverse','Algo'))

In [2]:

import numpy as np
import scipy.sparse; import scipy.sparse.linalg

In [3]:

import agd.AutomaticDifferentiation as ad
import agd.FiniteDifferences as fd
import agd.LinearParallel as lp

In [4]:

def LInfNorm(a): return np.max(np.abs(a))

1. Generating variables and requesting an expression's derivatives¶

Reverse automatic differentiation works by keeping a history of the computations. The sensitivity of the result w.r.t some components is obtained by propagating the sensitivities backward in the computation queue and using the operator adjoints.

Our implementation of the reverseAD is differs from the denseAD or sparseAD classes: here we do not define an data type overloading the arithmetic operators and the basic special functions. Instead, the reverseAD class is only meant to keep a history of user specified computations.

Note to reader. This section only shows how to generate variables, and request an expression's derivatives. See the next section for an actual use of automatic differentiation.

1.1 Gradient¶

We first create some basic numpy arrays.

In [5]:

x0=np.linspace(0,np.pi,4)
gridScale=x0[1]-x0[0]
u0=np.sin(x0)
v0=np.cos(x0)

Then we generate a reverseAD history, and register the variables of interest.

In [6]:

# Create an empty history
rev = ad.Reverse.empty()
# Create AD variables w.r.t which the gradient will be required
u1 = rev.identity(constant=u0)
v1 = rev.identity(constant=v0)

# Equivalently : 
#rev,(u1,v1) = ad.Reverse.empty(inputs=(u0,v0))

In our implementation, the generated AD variables are of sparse AD type.

In [7]:

print("u1:",u1)
print("v1:",v1)

u1: spAD(array([0.00000000e+00, 8.66025404e-01, 8.66025404e-01, 1.22464680e-16]), array([[1.],
       [1.],
       [1.],
       [1.]]), array([[0],
       [1],
       [2],
       [3]]))
v1: spAD(array([ 1. ,  0.5, -0.5, -1. ]), array([[1.],
       [1.],
       [1.],
       [1.]]), array([[4],
       [5],
       [6],
       [7]]))

If we make some computation using the variables registered in the reverseAD class, then we can request the gradient of the final expression.

In [8]:

result = (u1**2+v1**2).sum()

In [9]:

rev.gradient(result)

Out[9]:

array([ 0.00000000e+00,  1.73205081e+00,  1.73205081e+00,  2.44929360e-16,
        2.00000000e+00,  1.00000000e+00, -1.00000000e+00, -2.00000000e+00])

If needed, this expression can be reshaped similar to the input variables.

In [10]:

u_grad,v_grad = rev.to_inputshapes(rev.gradient(result))

In [11]:

assert LInfNorm(v_grad-2*v0) < 1e-6

1.2 Hessian operator¶

The hessian operator may be accessed as well, using the Reverse2 submodule. Note that the hessian matrix itself is never assembled, since in applications it would typically be dense and of large size.

In [12]:

rev,(u1,v1) = ad.Reverse2.empty(inputs=(u0,v0))
result = (u1**2+v1**2).sum()

In [13]:

grad = rev.gradient(result) # Gradient
hess = rev.hessian(result) # Hessian operator

In [14]:

grad

Out[14]:

array([ 0.00000000e+00,  1.73205081e+00,  1.73205081e+00,  2.44929360e-16,
        2.00000000e+00,  1.00000000e+00, -1.00000000e+00, -2.00000000e+00])

In this specific example, the hessian operator is twice the identity.

In [15]:

hess(grad)

Out[15]:

array([ 0.00000000e+00,  3.46410162e+00,  3.46410162e+00,  4.89858720e-16,
        4.00000000e+00,  2.00000000e+00, -2.00000000e+00, -4.00000000e+00])

1.3 Simplification of the AD information¶

Our implementation of reverse AD does rely on sparse AD, for the elementary operations, such as :

arithmetic operators $+,-,*,/$.
special functions $\cos$, $\sin$, $\ln$, $\exp$.
variable reshaping.

One may recall that sparseness is easily lost, as in the example below, which may result in complexity issues.

In [16]:

n=4
rev = ad.Reverse2.empty()
u1 = rev.identity(shape=(n,n))
u2 = (1.+u1+u1**2).sum(axis=0)
result = (u2**2).sum(axis=0)

grad,hess = rev.gradient(result),rev.hessian(result)

In [17]:

print("Result AD size : ",result.size_ad2)

Result AD size :  272

Sparse AD simplification may limit sparsity loss, but it is a costly process (involving sorting), which is not always very efficient.

In [18]:

u2.simplify_ad()
result = (u2**2).sum(axis=0)
print("Result AD size with sparse AD simplification : ",result.size_ad2)

Result AD size with sparse AD simplification :  80

We can take advantage of reverse AD to reintroduce some sparsity.

Important. Simplification based on reverse AD should not be confused with simplification based on sparse AD. Indeed, it does not involve sorting, and it is always optimally efficient.

In [19]:

rev = ad.Reverse2.empty()
u1 = rev.identity(shape=(n,n))
u2 = (1.+u1+u1**2).sum(axis=0)
u2_simpl = rev.simplify(u2) # Intermediate simplification
result = (u2_simpl**2).sum(axis=0)

grad_simpl,hess_simpl = rev.gradient(result),rev.hessian(result)

In [20]:

print("Result AD size with reverse AD simplification : ",result.size_ad2)

Result AD size with reverse AD simplification :  4

Reverse AD simplification proceeds by replacing the the complicated symbolic perturbations in $u_2$ with placeholders, in $u_2\_$simpl, with negative indices.

In [21]:

u2_simpl

Out[21]:

spAD2(array([4., 4., 4., 4.]), array([[1.],
       [1.],
       [1.],
       [1.]]), array([[-1],
       [-2],
       [-3],
       [-4]]), array([], shape=(4, 0), dtype=float64), array([], shape=(4, 0), dtype=int64), array([], shape=(4, 0), dtype=int64))

The relative dependency of the old and new symbolic perturbations is registered, which allows computing derivatives.

In [22]:

assert LInfNorm(grad-grad_simpl) < 1e-13
dir_hessian = grad**2+1.
assert LInfNorm(hess(dir_hessian)-hess_simpl(dir_hessian)) < 1e-13

2. Linear operators and their adjoints¶

We illustrate Reverse automatic differentiation, in its intended use case involving operators operators whose jacobians are expected to be both high-dimensional and non-sparse. Note that linear operators often fit this desciption. For instance:

A linear mapping, given in sparse form, but iterated many times.
The inverse of a linear mapping, given in sparse form.
The fast fourier transform, etc

2.1 Linear mapping¶

We first construct some sparse matrix, for the exposition, based on a transport scheme implemented using upwind finite differences.

In [23]:

def TransportScheme(u,h,speed,dt):
    return u+dt*speed*fd.DiffUpwind(u,(1,),h,padding=0.)

In order to construct the matrix, we evaluate the scheme on a sparse AD variable.

In [24]:

speed = 1.; T=1.5; nsteps=5; dt = T/nsteps;
transport_ad = TransportScheme(ad.Sparse.identity(u0.shape),gridScale,speed,dt)

In [25]:

transport_matrix = scipy.sparse.coo_matrix(transport_ad.triplets()).tocsr()

We can now use this matrix in the course of reverse AD computations, mixing both linear and non-linear steps.

In [26]:

rev,(u1,v1) = ad.Reverse.empty(inputs=(u0,v0)) # Create the reverse AD environnement
u2 = rev.apply_linear_mapping(transport_matrix,u1**2,niter=nsteps) #Implement the desired method
result_Reverse = (u2*v1).sum()

Due to the multiple iterations, the variable $u_2$ does not depend in a sparse manner on $u_1$. In our implementation, the variable $u_2$ is of sparseAD type but features negative placeholder indices. Likewise for the final computation result.

In [27]:

u2

Out[27]:

spAD(array([5.02050076e-01, 4.17158585e-01, 1.38706040e-01, 2.77367654e-33]), array([[1.],
       [1.],
       [1.],
       [1.]]), array([[-1],
       [-2],
       [-3],
       [-4]]))

In [28]:

result_Reverse

Out[28]:

spAD(array(0.64127635), array([ 1.00000000e+00,  5.00000000e-01, -5.00000000e-01, -1.00000000e+00,
        5.02050076e-01,  4.17158585e-01,  1.38706040e-01,  2.77367654e-33]), array([-1, -2, -3, -4,  4,  5,  6,  7]))

The negative indices are newly introduced symbolic perturbations, whose depedence on the original ones is known thanks to the registered adjoint. This is exploited for computing the gradient of the result.

In [29]:

rev.gradient(result_Reverse)

Out[29]:

array([ 0.00000000e+00,  8.03222547e-01,  6.77741743e-01, -2.49367755e-17,
        5.02050076e-01,  4.17158585e-01,  1.38706040e-01,  2.77367654e-33])

We can check the validity of this implementation using dense automatic differentiation, since this specific instance is small.

In [30]:

n=x0.size
u1 = ad.Dense2.identity(constant=u0,shift=(0,n))
v1 = ad.Dense2.identity(constant=v0,shift=(n,0))
u2 = ad.apply_linear_mapping(transport_matrix,u1**2,niter=nsteps)
result_Dense2 = (u2*v1).sum()

In [31]:

assert LInfNorm(rev.gradient(result_Reverse) - result_Dense2.coef1) < 1e-13

Second order reverse automatic differentiation is also implemented.

In [32]:

rev,(u1,v1) = ad.Reverse2.empty(inputs=(u0,v0))
u2 = rev.apply_linear_mapping(transport_matrix,u1**2,niter=nsteps) 
result_Reverse2 = (u2*v1).sum()

In [33]:

assert LInfNorm(rev.gradient(result_Reverse2) - result_Dense2.coef1) < 1e-13

In [34]:

grad = rev.gradient(result_Reverse2)
hess = rev.hessian(result_Reverse2)

Note that the hessian is provided as an operator, that can be applied to arbitrary vectors, and not in a matrix form. This is enough to run e.g. the conjugate gradient method.

In [35]:

hess

Out[35]:

<function agd.AutomaticDifferentiation.Reverse2.reverseAD2.hessian.<locals>.hess_operator(dir_hessian, coef2_init=None, with_grad=False)>

In [36]:

dir_hessian = grad**2+1.
assert LInfNorm(hess(dir_hessian) - np.dot(result_Dense2.coef2,dir_hessian)) < 1e-13

2.2 Linear inversion¶

Linear inversion is almost always a non-local procedure, but with a simple and explicit adjoint. It is again a good fit for reverse AD. The syntax is almost identical, except for the choice of a linear solver.

In [37]:

rev,(u1,v1) = ad.Reverse2.empty(inputs=(u0,v0))

# Choose a solver, and request the inversion.
solver = scipy.sparse.linalg.spsolve
u2 = rev.apply_linear_inverse(solver,transport_matrix,u1**2) 

result_Reverse2 = (u2*v1).sum()

grad = rev.gradient(result_Reverse2)
hess = rev.hessian(result_Reverse2)

Again, we control the results in this simple instance using dense AD.

In [38]:

n=x0.size
u1 = ad.Dense2.identity(constant=u0,shift=(0,n))
v1 = ad.Dense2.identity(constant=v0,shift=(n,0))
u2 = ad.apply_linear_inverse(solver,transport_matrix,u1**2)
result_Dense2 = (u2*v1).sum()

In [39]:

assert LInfNorm(grad - result_Dense2.coef1) < 1e-13
dir_hessian = grad**2+1.
assert LInfNorm(hess(dir_hessian) - np.dot(result_Dense2.coef2,dir_hessian)) < 1e-13

3. Non-linear operators¶

In the previous section, we have shown how to apply reverse AD to operators which are either:

linear
non-linear but local
compositions of the above

We discuss here the general case of arbitrary non-linear operators. These techniques are typically needed in the context of non-linear evolution equations.

3.1 Defining an operator and its adjoint¶

For an operator to be used in a reverse AD context, two execution paths must be implemented:

Standard program flow, the output is computed in a forward manner from the inputs.
Adjoint jacobian, if a co_output is specified. The operator should return a list of pairs, each consisting of an input argument and the relative co_input.

Note on the differentiation order. This specific subsection only applies to first order reverse AD. A slightly more complex implementation is required for second order reverse AD.

In [40]:

def my_operator(u,co_output=None):
    if co_output is None: 
        return transport_matrix*u # Standard program flow
    else:
        co_output_value,co_arg_request = co_output
        assert len(co_arg_request)==1 and co_arg_request[0] is u
        return [(u, transport_matrix.T*co_output_value)] # Adjoint jacobian 

This operator is functionally equivalent to the method rev.apply_linear_mapping(transport_matrix,...)

In [41]:

rev,(u1,v1) = ad.Reverse.empty(inputs=(u0,v0)) # Create the reverse AD environnement
u2 = rev.apply(my_operator,u1**2) #Implement the desired method
result_Reverse = (u2*v1).sum()

grad = rev.gradient(result_Reverse)

In [42]:

grad

Out[42]:

array([ 0.00000000e+00,  1.11412341e+00, -3.69829398e-01, -2.09845813e-16,
        2.14859173e-01,  7.50000000e-01,  5.35140827e-01,  1.07011025e-32])

In [43]:

rev,(u1,v1) = ad.Reverse.empty(inputs=(u0,v0))
u2 = rev.apply_linear_mapping(transport_matrix,u1**2) #Implement the desired method
result_Reverse = (u2*v1).sum()

assert LInfNorm(grad-rev.gradient(result_Reverse))<1e-13

3.2 Defining an operator from a block of instructions¶

For the purposes of illustration, we define an operator which:

has multiple inputs, each multi-dimensional.
has a multi-dimensional output.
is nonlocal (because of the inner loop involving a linear mapping).

This operator is defined using a block of elementary instructions.

In [44]:

def my_operator(u,v,co_output=None):    
    # Generate an operator-like AD environnement
    rev,(u1,v1) = ad.Reverse.operator_like(inputs=(u,v),co_output=co_output)
    
    # Do some computations
    u2 = rev.apply_linear_mapping(transport_matrix,u1**2,niter=nsteps)
    result = u2*v1
    
    # Return the suitably converted result
    return rev.output(result)

The non-linear operator defined above can be applied to ordinary variables without AD information, or dense AD variables. (The multiplication with transport_matrix forbids the use of sparse AD variables.)

In [45]:

result_float = (my_operator(u0,u0*v0)*v0).sum()

In [46]:

n=x0.size
u1 = ad.Dense2.identity(constant=u0,shift=(0,n))
v1 = ad.Dense2.identity(constant=v0,shift=(n,0))

result_Dense2 = (my_operator(u1,u1*v1)*v1).sum()

It can also be used in a reverse AD context. (Note that this is recursive reverse AD.)

In [47]:

rev,(u1,v1) = ad.Reverse.empty(inputs=(u0,v0))

result_Reverse = (rev.apply(my_operator,u1,u1*v1)*v1).sum()

grad = rev.gradient(result_Reverse)

In [48]:

assert LInfNorm(grad - result_Dense2.coef1) < 1e-13

In [49]:

rev,(u1,v1) = ad.Reverse2.empty(inputs=(u0,v0))

result_Reverse2 = (rev.apply(my_operator,u1,u1*v1)*v1).sum()

grad = rev.gradient(result_Reverse2)
hess = rev.hessian(result_Reverse2)

In [50]:

assert LInfNorm(grad - result_Dense2.coef1) < 1e-13
dir_hessian = grad**2+1.
assert LInfNorm(hess(dir_hessian)-np.dot(result_Dense2.coef2,dir_hessian)) < 1e-13

3.3 Loops¶

We illustrate recursive reverse AD in nested loops. This approach allows to limit memory usage, at the cost of some recomputation.

Note on complexity Reverse automatic differentiation requires saving program state at each intermediate computation steps. If a program contains a loop, this may result in severe memory usage. Therefore, it is common to only store a small proportion of the intermediate states, referred to as keypoints, and to use recomputations for reconstructing the other steps. For best efficiency, this procedure must be made recursive.

Our first method makes $n$=niter iterations of the arithmetico-geometric algorithm.

Side note : the chosen example is not perfect, because this algorithm converges excessively fast, hence we only use a very small number of iterations, which may seem pointless.

In [51]:

counter=0 # Iteration counter
def my_loop(a,b,niter,co_output=None):
    global counter
    rev,(a,b) = ad.Reverse.operator_like(inputs=(a,b),co_output=co_output)
    for i in range(niter):
        a,b = (a+b)/2., np.sqrt(a*b)
        a,b = rev.simplify(a),rev.simplify(b)
        counter+=1
    result = (a,b)
    return rev.output(result)

In [52]:

a0=np.array([1.,2.,4.,8.])
b0=1/a0

In [53]:

niter=2; counter=0
my_loop(a0,b0,niter),counter

Out[53]:

((array([1.     , 1.125  , 1.5625 , 2.53125]),
  array([1.        , 1.11803399, 1.45773797, 2.01556444])),
 2)

After the selected number of iterations, we are quite far from convergence. We do note want to increase $n$, which would augment the number of intermediate steps saved. Instead, we achieve $n^2$ iterations while saving only $2n$ states, using an outer loop. This comes at the cost of recomputing $2$ times each step in the gradient evaluation.

With a similar basic recursion, one can achieve $n^k$ iterations, while saving only $k n$ steps, using $k$ nested loops. This comes at the cost of recomputing $k$ times each step in the gradient evaluation.

With the convention below, $k=$nrec$+1$

In [54]:

def my_outer_loop(a,b,niter,nrec,operator,co_output=None):
    rev,(a,b) = ad.Reverse.operator_like(inputs=(a,b),co_output=co_output)
    for i in range(niter):
        if nrec==1: a,b = rev.apply(operator,a,b,niter)
        else: a,b = rev.apply(my_outer_loop,a,b,niter,nrec-1,operator)
    result = (a,b)
    return rev.output(result)

In [55]:

niter=2; nrec=1; counter=0
my_outer_loop(a0,b0,niter,nrec,my_loop), counter

Out[55]:

((array([1.        , 1.12151429, 1.50966462, 2.26607262]),
  array([1.        , 1.12151429, 1.50966455, 2.26606075])),
 4)

In [56]:

niter=2; nrec=2; counter=0
my_outer_loop(a0,b0,niter,nrec,my_loop), counter

Out[56]:

((array([1.        , 1.12151429, 1.50966459, 2.26606669]),
  array([1.        , 1.12151429, 1.50966459, 2.26606669])),
 8)

In [57]:

niter=2; nrec=2; counter=0; 
rev,(a1,b1) = ad.Reverse.empty(inputs=(a0,b0))

a2,b2 = rev.apply(my_outer_loop,a1,b1,niter,nrec,my_loop)
result = (a2*b2).sum()
print("Iteration counter for forward evaluation", counter)

counter=0;
grad_rec = rev.gradient(result)
print("Iteration counter for backward evaluation", counter)

Iteration counter for forward evaluation 8
Iteration counter for backward evaluation 24

For verification purposes, we compare our results non-nested iteration.

In [58]:

niter2 = niter**(1+nrec); counter=0
rev,(a1,b1) = ad.Reverse.empty(inputs=(a0,b0))

a2,b2 = rev.apply(my_loop,a1,b1,niter2)
result = (a2*b2).sum()
print("Iteration counter for forward evaluation", counter)

counter=0;
grad_iter = rev.gradient(result)
print("Iteration counter for backward evaluation", counter)

assert LInfNorm(grad_rec-grad_iter)<1e-13

Iteration counter for forward evaluation 8
Iteration counter for backward evaluation 8

The hessian evaluation in second order reverse AD requires even more iterations, because it combines a forward and a backward pass.

In [59]:

niter=2; nrec=2; counter=0; 
rev,(a1,b1) = ad.Reverse2.empty(inputs=(a0,b0))

a2,b2 = rev.apply(my_outer_loop,a1,b1,niter,nrec,my_loop)
result = (a2*b2).sum()
print("Iteration counter for forward evaluation", counter)

counter=0;
grad_rec2 = rev.gradient(result)
print("Iteration counter for gradient evaluation", counter)
assert LInfNorm(grad_rec-grad_rec2)<1e-13

counter=0
hess = rev.hessian(result)
dir_hessian = grad_rec**2+1.
hess_rec2 = hess(dir_hessian)
print("Iteration counter for hessian evaluation", counter)

Iteration counter for forward evaluation 8
Iteration counter for gradient evaluation 24
Iteration counter for hessian evaluation 48

Eventually, we check our results using dense AD, and an unrolled loop.

In [60]:

niter2 = niter**(1+nrec); counter=0
a1 = ad.Dense2.identity(constant=a0,shift=(0,4))
b1 = ad.Dense2.identity(constant=b0,shift=(4,0))

a2,b2 = my_loop(a1,b1,niter2)
result = (a2*b2).sum()
print("Iteration counter for forward evaluation", counter)

grad_dense2 = result.coef1
hess_dense2 = np.dot(result.coef2,dir_hessian)

assert np.allclose(grad_rec2,grad_dense2) 
assert np.allclose(hess_rec2,hess_dense2) 

Iteration counter for forward evaluation 8

4. Format of variables¶

Variables used in reverseAD mode may be scalars, arrays, multidimensional arrays, as well as tuples, lists and dicts. The latter two cases, lists and dicts, must be explicitly requested as they are not considered by default.

4.1 Scalars and multi-dimensional arrays¶

Arrays of arbitrary depth are supported, including array scalars. As usual, please be careful when manipulating array scalars and numpy.float64 values, see notebook ADBugs.

In [61]:

x0=1.;x1=np.array([1.,2.]);x2=np.array([[3.,4.],[5.,6.]]);x3=np.zeros((2,2,2))

In [62]:

rev,(y0,y1,y2,y3)=ad.Reverse.empty(inputs=(x0,x1,x2,x3))
z0 = (y1[1]+y2*y3).sum()
t0 = rev.simplify(z0)
grad_rev = rev.gradient(y0+t0**2)
rev.to_inputshapes(grad_rev)

Out[62]:

(array(1.),
 array([  0., 256.]),
 array([[0., 0.],
        [0., 0.]]),
 array([[[ 96., 128.],
         [160., 192.]],
 
        [[ 96., 128.],
         [160., 192.]]]))

In [63]:

rev,(y0,y1,y2,y3)=ad.Reverse2.empty(inputs=(x0,x1,x2,x3))
z0 = (y1[1]+y2*y3).sum()
t0 = rev.simplify(z0)
obj = y0+t0**2 
grad_rev2 = rev.gradient(obj)
hess_rev2 = rev.hessian(obj)(grad_rev2)

We next check the result using dense automatic differentiation.

In [64]:

y0,y1,y2,y3 = ad.Dense2.register(inputs=(x0,x1,x2,x3))
z0=(y1[1]+y2*y3).sum()
obj=y0+z0**2
grad_dens = obj.gradient()
hess_dens = lp.dot_AV(obj.hessian(),grad_dens)

In [65]:

y0,y1,y2,y3 = ad.Sparse2.register(inputs=(x0,x1,x2,x3))
z0=(y1[1]+y2*y3).sum()
obj=(y0+z0**2).to_dense()
grad_sp = obj.gradient()
hess_sp = lp.dot_AV(obj.hessian(),grad_dens)

In [66]:

assert(all(grad_dens==grad_rev))
assert(all(grad_dens==grad_rev2))
assert(all(grad_dens==grad_sp))
assert(all(hess_dens==hess_rev2))
assert(all(hess_dens==hess_sp))

4.2 Lists and dictionnaries¶

By default, the provided reverseAD library does not look into input lists and dictionaries for AD information : only tuples are explored. Lists and dictionnaries are regarded as opaque objects, whose contents are not differentiable. This behavior can be changed, as described below.

In [67]:

def myfun(mylist,mydict,revType=ad.Reverse,co_output=None):
    rev,(l,dx,dy) = revType.operator_like(
        inputs=(mylist,mydict['x'],mydict['y']),co_output=co_output,
        input_iterables=(tuple,list),
        output_iterables=(list,))
    return rev.output( [l[0]*dx+dy,l[1]*dy] )

In [68]:

a0=[1.,np.array([2.,3.])]
b0={'x':np.array([4.,5.]),'y':6.}
rev,[a1,b1] = ad.Reverse.empty(inputs=[a0,b0], # list of list and dict
                                input_iterables=(list,dict),
                                output_iterables=(list,)) 
u,v= rev.apply(myfun,a1,b1)
result = lp.dot_VV(u,v)
grad_rev1 = rev.gradient(result)

In [69]:

revType=ad.Reverse2
rev,[a1,b1] = ad.Reverse2.empty(inputs=[a0,b0], # list of list and dict
                                input_iterables=(list,dict),
                                output_iterables=(list,)) 
u,v= rev.apply(myfun,a1,b1,ad.Reverse2)
result = lp.dot_VV(u,v)
grad_rev2 = rev.gradient(result)
hess_rev2 = rev.hessian(result)(grad_rev2)

Checking with denseAD

In [70]:

a1,b1 = ad.Dense2.register([a0,b0],iterables=(list,dict))
u,v= myfun(a1,b1)
result = lp.dot_VV(u,v)
grad_dense2 = result.gradient()
hess_dense2 = lp.dot_AV(result.hessian(),grad_dense2)

In [71]:

assert(all(grad_dense2==grad_rev1))
assert(all(grad_dense2==grad_rev2))
assert(all(hess_dense2==hess_rev2))

In [ ]: