This notebook illustrates automatic differentiation, with an application to a simple one-dimensional denoising problem. More precisely, we use first and second order sparse automatic differentiation, as well as dense automatic differentiation and the composition of the two.
Summary of volume Algorithmic tools, this series of notebooks.
Main summary of the Adaptive Grid Discretizations book of notebooks, including the other volumes.
Acknowledgement. Some of the experiments presented in these notebooks are part of ongoing research with Ludovic Métivier and Da Chen.
Copyright Jean-Marie Mirebeau, Centre Borelli, ENS Paris-Saclay, CNRS, University Paris-Saclay
Known bugs and incompatibilities. Our implementation of automatic differentiation is based on subclassing the numpy array class. While simple and powerful, this approach suffers from a few pitfalls, described in notebook ADBugs.
import sys; sys.path.insert(0,"..") # Allow import of agd from parent directory (useless if conda package installed)
#from Miscellaneous import TocTools; print(TocTools.displayTOC('Sparse','Algo'))
import numpy as np
import agd.AutomaticDifferentiation as ad
import agd.FiniteDifferences as fd
def LInfNorm(a): return np.max(np.abs(a))
def LInfNorm_AD(a): return np.max((LInfNorm(a.value),LInfNorm(a.coef)))
def LInfNorm_AD2(a): return np.max((LInfNorm(a.value),LInfNorm(a.coef1),LInfNorm(a.coef2)))
def reload_packages():
    from Miscellaneous.rreload import rreload
    global ad,fd
    [ad,fd] = rreload([ad,fd],rootdir='..')
We illustrate sparse automatic differentiation with an optimization problem of the form $$ \min_u \int_\Omega F(x,\nabla u(x)) + G(x,u(x)) \, dx, $$ where the unknown $u$ is a function defined over a given domain $\Omega$, with arbitrary boundary conditions.
The cost functions $F$ and $G$ are given as well. Classically, $F$ is a regularization term, whereas $G$ is a data fidelity term. The simplest case would be $$ F_0(x,v) := \frac 1 2 \|v\|^2, \qquad G_0(x,y) := \frac 1 2 \|y-g_0(x)\|^2, $$ where $g_0(x)$ is some given data. However, one may also think of numerous variants, involving for instance:
Costs involving a kernel. Applications such as deblurring often involve an additional cost of the form $$ H(x, (\rho\star u)(x)), $$ where $\rho$ is a convolution kernel. Because convolution is a non-local operator with a large support, sparse automatic differentiation is not a good fit. Backward automatic differentiation would be appropriate here, but it is not supported at this point.
Problem parameters. In order to emphasize automatic differentiation tools, rather than a particular application, we consider as a start one of the most basic problem instantiations. See the other notebooks for more advanced examples.
The domain $\Omega$ is defined as the (one dimensional) segment $[0,1]$ equipped with periodic boundary conditions. We let $F_0$ and $G_0$ be the simple instances defined above, with a discontinuous datum $g_0$.
def F0(x,v):
    return 0.5*v**2

def g0(x):
    return (np.abs(x-0.5)<0.3) + np.sin(2.*np.pi*x)

def G0(x,y):
    return 0.5*(y-g0(x))**2
def Objective(u,x,dx,F,G):
    du = fd.DiffUpwind(u,(1,),dx,padding=None) # padding=None is for periodic bc
    return (F(x+0.5*dx,du)+G(x,u)).sum(axis=0) * dx # Integrate over the domain
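For reference, in this one-dimensional periodic setting, the upwind finite difference used above can be sketched in plain numpy as a rolled difference. This mimics, but is not, the fd.DiffUpwind implementation:

```python
import numpy as np

def diff_upwind_periodic(u, dx):
    # (u_{i+1} - u_i)/dx, with wrap-around mimicking periodic boundary conditions
    return (np.roll(u, -1) - u) / dx

u = np.array([0., 1., 3., 0.])
du = diff_upwind_periodic(u, 0.25)  # [4., 8., -12., 0.]
```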
x,dx=np.linspace(0,1,10,endpoint=False,retstep=True)
u=np.zeros(x.shape)
Objective(u,x,dx,F0,G0)
0.5
The optimality condition of the variational problem, for the above specific choices of $F_0$ and $G_0$, is $$ -\Delta u + u-g_0 = 0. $$ We can therefore, alternatively, solve this PDE directly instead of going through the minimization problem.
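For completeness, this optimality condition follows from the classical first-variation computation, using integration by parts with no boundary term thanks to the periodic boundary conditions:

```latex
\frac{d}{dt}\Big|_{t=0} \int_\Omega \frac 1 2 |\nabla (u+tv)|^2 + \frac 1 2 (u+tv-g_0)^2 \, dx
= \int_\Omega \langle \nabla u, \nabla v\rangle + (u-g_0)\, v \, dx
= \int_\Omega (-\Delta u + u - g_0)\, v \, dx,
```

which must vanish for every perturbation $v$, hence $-\Delta u + u - g_0 = 0$.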
def Scheme0(u,x):
    h=float(x[1]-x[0])
    d2u = fd.Diff2(u,(1,),h,padding=None) # padding=None is for periodic bc
    return -d2u + u - g0(x)
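Likewise, the centered second difference computed here by fd.Diff2 can, in this periodic 1D setting, be sketched in plain numpy as follows (an illustration, not the library code):

```python
import numpy as np

def diff2_periodic(u, h):
    # (u_{i+1} - 2 u_i + u_{i-1}) / h^2, with periodic wrap-around
    return (np.roll(u, -1) - 2.*u + np.roll(u, 1)) / h**2

d2u = diff2_periodic(np.array([0., 1., 0., 0.]), 1.)  # [1., -2., 1., 0.]
```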
Scheme0(u,x)
array([ 0. , -0.58778525, -0.95105652, -1.95105652, -1.58778525, -1. , -0.41221475, -0.04894348, 0.95105652, 0.58778525])
The sparse automatic differentiation classes, first and second order, allow one to compute gradients, Jacobian matrices, and Hessians, represented in sparse form.
Assume that $u^0 = (u^0_i)_{0 \leq i < n}$ is some array.
Using the ad.Sparse.identity constructor, one can create an augmented variable $u^1$ with components
$$
u^1_i = u^0_i + \delta_i + o(\delta_i)
$$
where $(\delta_i)_{0 \leq i < n}$ are independent first order symbolic perturbations.
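The principle can be emulated with a toy class storing (value, coefficients, indices) triplets. This is only a sketch of the sparse first-order representation, not the agd implementation:

```python
class ToySparse:
    """Toy sparse first-order AD: value + sum_j coefs[j] * delta_{indices[j]}."""
    def __init__(self, value, coefs, indices):
        self.value, self.coefs, self.indices = value, coefs, indices

    def __add__(self, other):
        # (a + da) + (b + db) = (a+b) + (da + db): concatenate the perturbations
        return ToySparse(self.value + other.value,
                         self.coefs + other.coefs,
                         self.indices + other.indices)

    def __mul__(self, other):
        # (a + da) * (b + db) = a*b + b*da + a*db, to first order
        return ToySparse(self.value * other.value,
                         [other.value * c for c in self.coefs]
                         + [self.value * c for c in other.coefs],
                         self.indices + other.indices)

def toy_identity(values):
    # Component i carries the elementary first-order perturbation delta_i
    return [ToySparse(v, [1.], [i]) for i, v in enumerate(values)]

u = toy_identity([2., 3.])
p = u[0] * u[1]   # value 6, with d/du_0 = 3 and d/du_1 = 2
```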
x = np.linspace(0,1,5,endpoint=False)
u0 = np.cos(2.*np.pi*x)
u1 = ad.Sparse.identity(constant=u0)
Elementary properties of the np.ndarray class are inherited
print("ndim :",u1.ndim,", shape :",u1.shape,", size :",u1.size,", len :",len(u1))
ndim : 1 , shape : (5,) , size : 5 , len : 5
However, the internal data includes coefficient and index arrays, with one additional dimension. For now, this additional dimension is a singleton, because the perturbation of each component $u^1_i$ is the single elementary perturbation $\delta_i$.
u1.size_ad
1
print("Constant term :", u1.value)
print("Coefficients of the symbolic perturbation :", u1.coef)
print("Indices of the symbolic perturbation :", u1.index)
Constant term : [ 1. 0.30901699 -0.80901699 -0.80901699 0.30901699]
Coefficients of the symbolic perturbation : [[1.] [1.] [1.] [1.] [1.]]
Indices of the symbolic perturbation : [[0] [1] [2] [3] [4]]
u1 # Scalar values, gradient coefficients, gradient indices.
spAD(array([ 1. , 0.30901699, -0.80901699, -0.80901699, 0.30901699]), array([[1.], [1.], [1.], [1.], [1.]]), array([[0], [1], [2], [3], [4]]))
Second order sparse automatic differentiation is based on the same principles.
Using the ad.Sparse2.identity constructor, one generates an augmented variable $u^2$ with components
$$
u^2_i = u^0_i + \delta_i + o(\delta_i^2)
$$
where $(\delta_i)_{0 \leq i < n}$ are independent symbolic perturbations. The only difference with the first order case lies in the remainder $o(\delta_i^2)$: the second order coefficients are known, and explicitly set to zero here.
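As a quick sanity check of the expansion encoded by second order AD, one can verify numerically that the remainder $f(u^0+\delta) - f(u^0) - f'(u^0)\delta - \frac 1 2 f''(u^0)\delta^2$ is indeed $o(\delta^2)$, here for $f=\sin$:

```python
import math

u0, d = 0.3, 1e-3
exact  = math.sin(u0 + d)
order2 = math.sin(u0) + math.cos(u0)*d - 0.5*math.sin(u0)*d**2
error  = abs(exact - order2)  # O(d^3), hence roughly 1e-10 here
```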
u2 = ad.Sparse2.identity(constant=u0)
print("Constant term :", u2.value)
print("First order coefficients :", u2.coef1)
print("First order indices :", u2.index)
print()
print("Second order coefficients :",u2.coef2)
print("Second order row indices :",u2.index_row)
print("Second order column indices :",u2.index_col)
Constant term : [ 1. 0.30901699 -0.80901699 -0.80901699 0.30901699]
First order coefficients : [[1.] [1.] [1.] [1.] [1.]]
First order indices : [[0] [1] [2] [3] [4]]

Second order coefficients : []
Second order row indices : []
Second order column indices : []
u2
spAD2(array([ 1. , 0.30901699, -0.80901699, -0.80901699, 0.30901699]), array([[1.], [1.], [1.], [1.], [1.]]), array([[0], [1], [2], [3], [4]]), array([], shape=(5, 0), dtype=float64), array([], shape=(5, 0), dtype=int64), array([], shape=(5, 0), dtype=int64))
Arithmetic operations, such as left and right multiplication, addition, subtraction, and division, work as usual. In particular, we can evaluate our objective function.
obj1 = Objective(u1,x,dx,F0,F0)
The result is an array scalar, i.e. an array whose shape is the empty tuple, but it carries substantial AD information.
obj1
spAD(array(17.39957514), array([ -6.90983006, -11.18033989, 0. , 11.18033989, 6.90983006, 6.90983006, 11.18033989, -0. , -11.18033989, -6.90983006, 0.1 , 0.0309017 , -0.0809017 , -0.0809017 , 0.0309017 ]), array([1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]))
Dense representation of the AD information would be more adequate for this particular variable. A conversion is obtained as follows.
obj1.to_dense()
denseAD(array(17.39957514), array([ 13.91966011, 4.30141153, -11.26124159, -11.26124159, 4.30141153]))
Note also that the variable $b^1=$ obj1 has the form
$$
b^1 = b^0 + \sum_{0 \leq i <I} \alpha_i \delta_{k_i},
$$
where the indices $k_i$ are neither sorted nor pairwise distinct. A canonical form is obtained with the simplify_ad method.
obj1.simplify_ad() # In place simplification
obj1
spAD(array(17.39957514), array([ 13.91966011, 4.30141153, -11.26124159, -11.26124159, 4.30141153]), array([0, 1, 2, 3, 4]))
obj1.to_dense()
denseAD(array(17.39957514), array([ 13.91966011, 4.30141153, -11.26124159, -11.26124159, 4.30141153]))
First order automatic differentiation provides us with the gradient of the objective function, which enables gradient descent methods for minimization. However, Newton methods are more appropriate in certain contexts. For that purpose, second order automatic differentiation can extract the Hessian of the objective function.
obj2 = Objective(u2,x,dx,F0,G0)
Looking at the dense form, one recognizes the matrix of a Laplacian.
obj2.to_dense()
denseAD2(array(17.78637854), array([ 13.91966011, 4.20630588, -11.42002011, -11.30246306, 4.39651718]), array([[ 20.1, -10. , 0. , 0. , -10. ], [-10. , 20.1, -10. , 0. , 0. ], [ 0. , -10. , 20.1, -10. , 0. ], [ 0. , 0. , -10. , 20.1, -10. ], [-10. , 0. , 0. , -10. , 20.1]]))
Our particular objective function, with these specific choices of $F_0$ and $G_0$, is quadratic. Therefore obj2 is in fact an exact representation, and a Newton method based on the reconstructed gradient and Hessian would converge in a single step. See the notebook Elliptic for more on this approach.
The numerical scheme, returning the PDE residue, can also be evaluated with AD variables.
scheme0 = Scheme0(u0,x)
scheme1 = Scheme0(u1,x)
scheme2 = Scheme0(u2,x)
The constant part is simply the PDE residue
scheme0
array([ 35.54915028, 10.03423506, -30.34765197, -29.17208146, 11.93634809])
The first order part contains the Jacobian matrix of the numerical scheme.
scheme1
spAD(array([ 35.54915028, 10.03423506, -30.34765197, -29.17208146, 11.93634809]), array([[-25., 50., -25., 1.], [-25., 50., -25., 1.], [-25., 50., -25., 1.], [-25., 50., -25., 1.], [-25., 50., -25., 1.]]), array([[1, 0, 4, 0], [2, 1, 0, 1], [3, 2, 1, 2], [4, 3, 2, 3], [0, 4, 3, 4]]))
Looking at the dense form, we recognize the matrix of a Laplacian.
scheme1.to_dense(5)
denseAD(array([ 35.54915028, 10.03423506, -30.34765197, -29.17208146, 11.93634809]), array([[ 51., -25., 0., 0., -25.], [-25., 51., -25., 0., 0.], [ 0., -25., 51., -25., 0.], [ 0., 0., -25., 51., -25.], [-25., 0., 0., -25., 51.]]))
Since this specific numerical scheme is linear, the variable scheme1 is a faithful representation of it, and a Newton method based on the extracted Jacobian will converge in a single step. See the notebook LinearMonotoneSchemes2D for similar but more advanced examples.
Again, since this numerical scheme is linear, second order terms are zero.
scheme2
spAD2(array([ 35.54915028, 10.03423506, -30.34765197, -29.17208146, 11.93634809]), array([[-25., 50., -25., 1.], [-25., 50., -25., 1.], [-25., 50., -25., 1.], [-25., 50., -25., 1.], [-25., 50., -25., 1.]]), array([[1, 0, 4, 0], [2, 1, 0, 1], [3, 2, 1, 2], [4, 3, 2, 3], [0, 4, 3, 4]]), array([], shape=(5, 0), dtype=float64), array([], shape=(5, 0), dtype=int64), array([], shape=(5, 0), dtype=int64))
A number of elementary mathematical functions are implemented.
np.sqrt(1+u1)
spAD(array([1.41421356, 1.14412281, 0.43701602, 0.43701602, 1.14412281]), array([[0.35355339], [0.43701602], [1.14412281], [1.14412281], [0.43701602]]), array([[0], [1], [2], [3], [4]]))
np.abs(u1)
spAD(array([1. , 0.30901699, 0.80901699, 0.80901699, 0.30901699]), array([[ 1.], [ 1.], [-1.], [-1.], [ 1.]]), array([[0], [1], [2], [3], [4]]))
Comparison operators, as well as integer-valued functions, return a basic np.ndarray. Indeed, since these functions take their values in a discrete set, AD information would be pointless.
u1 <= (-1+x)
array([False, False, True, True, False])
np.floor(u1+0.5)
array([ 1., 0., -1., -1., 0.])
Maximum and minimum work as well.
np.maximum(x,u1)
spAD(array([1. , 0.30901699, 0.4 , 0.6 , 0.8 ]), array([[1.], [1.], [0.], [0.], [0.]]), array([[0], [1], [2], [3], [4]]))
Let us recall the recommended use cases of Sparse and Dense automatic differentiation. In particular, using Sparse AD with a complex function, or Dense AD with many independent variables, will likely be inefficient in terms of memory usage and computation time. However, when numerically solving a PDE, one is often confronted with expressions of the form $$ f\Big(\frac{u(x+h e_0)-u(x)} h,\cdots,\frac{u(x+h e_n)-u(x)} h\Big), $$ involving a possibly complicated function $f$ of a few local finite differences of an unknown $u$ with many degrees of freedom. We show below how to combine sparse and dense automatic differentiation to evaluate such expressions efficiently.
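In dense linear algebra terms, this composition boils down to the chain rule: if $y=f(v)$ has a small dense Jacobian $Df$, and $v$ carries derivatives with respect to many variables $u$, then $dy/du = Df \cdot dv/du$. A minimal numpy sketch, with made-up Jacobians:

```python
import numpy as np

def compose_jacobians(Df, dv_du):
    # Chain rule: d(f(v))/du = Df(v) @ dv/du
    return Df @ dv_du

Df = np.array([[2., 1.]])             # hypothetical Jacobian of f : R^2 -> R
dv_du = np.array([[1., 0., -1.],      # derivatives of v_0 w.r.t. u_0, u_1, u_2
                  [0., 3., 1.]])      # derivatives of v_1 w.r.t. u_0, u_1, u_2
dy_du = compose_jacobians(Df, dv_du)  # [[2., 3., -1.]]
```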
Sparse automatic differentiation, as implemented, is mostly intended for very simple functions. Indeed, the size of the AD information grows quickly and can become overwhelming. For instance, consider the following function involving a loop.
def fun_with_loop(y):
    r=y.copy()
    for i in range(3):
        r=r+r**2
    return r
f1 = fun_with_loop(u1)
f1
spAD(array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371]), array([[ 1.00000000e+00, 2.00000000e+00, 4.00000000e+00, 8.00000000e+00, 1.20000000e+01, 2.40000000e+01, 4.80000000e+01, 9.60000000e+01], [ 1.00000000e+00, 6.18033989e-01, 8.09016994e-01, 5.00000000e-01, 1.13627124e+00, 7.02254249e-01, 9.19262746e-01, 5.68135621e-01], [ 1.00000000e+00, -1.61803399e+00, -3.09016994e-01, 5.00000000e-01, -2.61271243e-01, 4.22745751e-01, 8.07372542e-02, -1.30635621e-01], [ 1.00000000e+00, -1.61803399e+00, -3.09016994e-01, 5.00000000e-01, -2.61271243e-01, 4.22745751e-01, 8.07372542e-02, -1.30635621e-01], [ 1.00000000e+00, 6.18033989e-01, 8.09016994e-01, 5.00000000e-01, 1.13627124e+00, 7.02254249e-01, 9.19262746e-01, 5.68135621e-01]]), array([[0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2, 2, 2], [3, 3, 3, 3, 3, 3, 3, 3], [4, 4, 4, 4, 4, 4, 4, 4]]))
After each binary operation, the AD information is as big as the concatenation of the inputs' AD information. In a loop, it therefore grows exponentially and becomes extremely redundant. Here, the result can be simplified considerably.
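The growth can be tracked by counting stored coefficients. Assuming, as above, that each binary operation concatenates its operands' coefficient arrays, and that r**2 contributes one rescaled copy of the coefficients of r (via the chain rule), each iteration of r = r + r**2 doubles the count:

```python
size = 1                 # r starts with one elementary perturbation
sizes = [size]
for _ in range(3):
    size = size + size   # coefficients of r, plus those of 2*r*dr
    sizes.append(size)
# sizes == [1, 2, 4, 8], consistent with the 8 columns in the output above
```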
AD information is, however, simplified only at the request of the user, because simplification is a costly operation, involving some sorting and array resizing. Moreover, simplifying only the final result only partially solves the problem, due to the cost of computing the intermediate results.
f1.simplify_ad()
f1
spAD(array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371]), array([[195. ], [ 6.25297484], [ -0.31547484], [ -0.31547484], [ 6.25297484]]), array([[0], [1], [2], [3], [4]]))
When a loop is present, the AD information grows exponentially and its size quickly gets out of control, as illustrated above. A partial solution is to simplify the AD information at each iteration.
def fun_with_loop_simpl(y):
    r=y.copy()
    for i in range(3):
        r=r+r**2
        ad.simplify_ad(r) # Simplify the AD information at each iteration
    return r
Complexity is now linear with respect to the number of iterations, instead of exponential.
fun_with_loop_simpl(u1)
spAD(array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371]), array([[195. ], [ 6.25297484], [ -0.31547484], [ -0.31547484], [ 6.25297484]]), array([[0], [1], [2], [3], [4]]))
The function had to be slightly modified, but it still applies transparently to non AD variables.
fun_with_loop_simpl(u0)
array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371])
Dense automatic differentiation is not subject to the same limitations: it is efficient for evaluating complex functions depending only on a few independent variables.
When such a function needs to be evaluated on a sparse AD variable, the recommended approach is to compose dense and sparse automatic differentiation. This is achieved using the ad.apply command, indicating that the last dimensions are bound together.
u1 = ad.Sparse.identity(constant=u0)
u2 = ad.Sparse2.identity(constant=u0)
ad.apply(fun_with_loop,u1,shape_bound=u1.shape)
spAD(array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371]), array([[195. ], [ 6.25297484], [ -0.31547484], [ -0.31547484], [ 6.25297484]]), array([[0], [1], [2], [3], [4]]))
The following command achieves the same result "by hand", in a less automated way.
# These dimensions can be "bound together", a.k.a. not regarded as independent.
shape_bound = u1.shape
# Generate an adequately dimensioned denseAD variable
u1d = ad.Dense.identity(constant=u1.value,shape_bound=shape_bound)
# Evaluate the complex function
fu1d = fun_with_loop(u1d)
# Compose the dense and sparse AD
ad.compose(fu1d,np.expand_dims(u1,axis=0),shape_bound=shape_bound)
spAD(array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371]), array([[195. ], [ 6.25297484], [ -0.31547484], [ -0.31547484], [ 6.25297484]]), array([[0], [1], [2], [3], [4]]))
Composition also works between dense AD types. It may speed up the evaluation of a complex function on a dense AD type with many components.
ad.apply(fun_with_loop,u1.to_dense(),shape_bound=u1.shape)
denseAD(array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371]), array([[195. , 0. , 0. , 0. , 0. ], [ 0. , 6.25297484, 0. , 0. , 0. ], [ 0. , 0. , -0.31547484, 0. , 0. ], [ 0. , 0. , 0. , -0.31547484, 0. ], [ 0. , 0. , 0. , 0. , 6.25297484]]))
Composition also works for second order automatic differentiation.
result = fun_with_loop(u2.to_dense())
tuple(LInfNorm_AD2(result - a.to_dense()) for a in (
fun_with_loop(u2),
fun_with_loop_simpl(u2),
ad.apply(fun_with_loop,u2,shape_bound=u2.shape),
))
(1.4210854715202004e-14, 7.105427357601002e-15, 0.0)
tuple(LInfNorm_AD2(result - a) for a in (
ad.apply(fun_with_loop,u2.to_dense(),shape_bound=u2.shape),
))
(0.0,)
The envelope theorem. Consider a function $f:X \to R$ defined as the minimum of a bivariate function $F:X \times A \to R$ over the second variable: $$ f(x):=\min_{\alpha\in A} F(x,\alpha), $$ for each element $x$ of a domain $X \subset R^d$, and where $A$ is an arbitrary set. The envelope theorem states that, under minor technical assumptions, one has $$ \nabla f(x) = \frac{\partial F}{\partial x}(x,\alpha(x)), $$ where $\alpha(x) \in A$ is the minimizer to the above optimization problem, that is $f(x) = F(x,\alpha(x))$.
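A quick numerical illustration of the envelope theorem, on a made-up instance $F(x,\alpha) = \alpha x + (1-\alpha)^2$ minimized over a finite grid of $\alpha$ values: the centered finite difference of $f$ matches $\frac{\partial F}{\partial x}(x,\alpha(x)) = \alpha(x)$.

```python
import numpy as np

alphas = np.linspace(0., 1., 101)   # discrete set A (illustrative choice)

def F(x, a): return a*x + (1.-a)**2

def f(x): return np.min(F(x, alphas))            # f(x) = min over a of F(x,a)
def argmin_alpha(x): return alphas[np.argmin(F(x, alphas))]

x0, h = 0.7, 1e-6
numeric  = (f(x0+h) - f(x0-h)) / (2.*h)  # centered finite difference of f
envelope = argmin_alpha(x0)              # dF/dx(x,a) = a, at the minimizer
```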
Generalizations. The function $f$ may be defined by a maximum instead of a minimum, or a combination e.g. $f(x) =\min_{\alpha\in A}\max_{\beta \in B} F(x,\alpha,\beta)$. It may also be defined in a piecewise manner, etc.
Relevance to automatic differentiation. In a typical practical situation, the minimizer $\alpha(x)$ is obtained by a numerical optimization procedure, which may be expensive and is often non-differentiable. The envelope theorem then allows one to recover the gradient of $f$ by differentiating $F$ with respect to its first variable only, at the computed minimizer.
First order derivatives only! From a mathematical standpoint, the envelope theorem does not, in general, apply to second order derivatives. Keep this in mind: our AD library will not stop you or issue a warning if you attempt to use it anyway.
Practical implementation. Implement the function $f$ with the following enhancements:
- accept an optional oracle parameter (corresponding to $\alpha(x)$); if the oracle is provided, then the optimization step is bypassed;
- return the minimizer in addition to the value, so that it can be passed as an oracle in subsequent calls.
Then call the function $f$ twice: first with raw np.ndarray data types, and then with the desired AD type. We provide a helper function in the AD library to automate this process.
u1 = ad.Sparse.identity(constant=u0)
u2 = ad.Sparse2.identity(constant=u0)
We define a function whose evaluation involves a minimization by exhaustive search in a large table, here with $100$ entries.
def f(x,A,B):
    return np.min(A*x+B,axis=0) # Exhaustive search over the first axis
A=np.linspace(0,1,100)**0.5; B=np.linspace(1,0,100)**2
A,B = (np.expand_dims(e,axis=-1) for e in (A,B))
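Note the broadcasting at work: A*x+B has one row per candidate, and np.min over axis 0 computes the lower envelope of the affine functions $x \mapsto A_\alpha x + B_\alpha$. A tiny standalone check, with toy coefficients:

```python
import numpy as np

A = np.array([[0.], [1.], [2.]])    # slopes, shaped (3,1) to broadcast against x
B = np.array([[1.], [0.5], [-1.]])  # intercepts
x = np.array([0., 1., 2.])
values = np.min(A*x + B, axis=0)    # [-1., 1., 1.]
```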
A little bit of rewriting is needed to make this function compatible with the oracle interface.
For convenience, we provide a min_argmin function, and likewise a max_argmax function. (Note that the minimizer index in the output of f_oracle will be erroneous if an oracle is given, but this is not an issue here.)
def f_oracle(x,A,B,oracle=None):
    if oracle is not None:
        A,B = (np.take_along_axis(e,np.expand_dims(oracle,axis=0),axis=0) for e in (A,B))
    return ad.min_argmin(A*x+B,axis=0)
There is no need to use the AD types to compute the minimizer.
value,minimizer = f_oracle(u0,A,B)
print("Value (without AD) : ", value, "\nand minimizer ", minimizer)
Value (without AD) : [ 0.92667447 0.30279844 -0.80901699 -0.80901699 0.30279844] and minimizer [69 91 99 99 91]
Given the minimizer, one can efficiently evaluate $f$ on the AD type, bypassing most computations.
f_oracle(u1,A,B,oracle=minimizer)[0]
spAD(array([ 0.92667447, 0.30279844, -0.80901699, -0.80901699, 0.30279844]), array([[0.83484711], [0.95874497], [1. ], [1. ], [0.95874497]]), array([[0], [1], [2], [3], [4]]))
Use ad.apply with the keyword argument envelope=True to automatically apply the above procedure.
ad.apply(f_oracle,u1,A,B,envelope=True)
(spAD(array([ 0.92667447, 0.30279844, -0.80901699, -0.80901699, 0.30279844]), array([[0.83484711], [0.95874497], [1. ], [1. ], [0.95874497]]), array([[0], [1], [2], [3], [4]])), array([69, 91, 99, 99, 91]))