Basics of (finite-horizon, discrete) dynamic programming: Bellman's equation; forward induction, backward induction
Markov decision processes
Dynamic programming as linear programming: interpretation of duality
Vectorization, Kronecker products, multidimensional arrays
Ford, L. R., Jr., & Fulkerson, D. R. (1958). Constructing maximal dynamic flows from static flows. Operations Research, 6(3), 419-433.
Howard, R. (1960). Dynamic Programming and Markov Processes. Wiley.
Schrijver, A. (2003). Combinatorial Optimization: Polyhedra and Efficiency, Vol. A. Springer. Section 12.5.c.
Bertsekas, D. (2011). Dynamic Programming and Optimal Control, Vols. I and II. 3rd ed. Athena.
Ljungqvist, L., & Sargent, T. (2012). Recursive Macroeconomic Theory. 3rd ed. MIT.
Rust, J. (1987). Optimal replacement of GMC bus engines: an empirical model of Harold Zurcher. Econometrica, 55(5), 999-1033.
Nazareth, J., & Kulkarni, R. (1986). Linear programming formulations of Markov decision processes. Operations Research Letters.
John Rust describes the problem of Harold Zurcher, an engineer who runs a bus fleet, as follows:
each month, buses operate a stochastic number of miles
operations costs increase with mileage (maintenance, fuel, insurance and costs of unexpected breakdowns)
there is a fixed cost associated with overhaul (independent of mileage)
each month, Zurcher needs to decide whether to send each bus to overhaul, which resets its mileage to zero, or to let it operate.
This problem is a dynamic programming problem. When deciding whether or not to perform the overhaul, Zurcher needs to compare the overhaul cost not only with the current operation cost, but also with the reduction in operation costs in future periods.
While in this instance of the problem there is no externality across buses (so each bus's overhaul decision could be taken in isolation), it is not hard to envision a variant of this problem with externalities. For instance, one may assume that only a limited number of buses can go on overhaul at the same time.
We shall derive the optimal policy for Harold Zurcher, (somewhat freely) based on Rust's data.
First, let's import the libraries we'll need.
import numpy as np
import scipy.sparse as spr
from tabulate import tabulate
# !python -m pip install -i https://pypi.gurobi.com gurobipy ## only if Gurobi not here
import gurobipy as grb
import pandas as pd
Consider a finite set of states $x\in\mathcal{X}$; and a set of possible actions $y\in\mathcal{Y}$.
It is assumed that at initial time, the mass of agents in state $x$ is $q_{x}$ (exogenous).
Let $\pi_{xy}^{t}$ be the (endogenous) number of individuals who are in state $x$ and choose $y$ at time $t$ ("policy variable").
Because $\pi_{xy}^{t}$ is our decision variable, we shall need to worry about how the computer represents it in memory: by stacking columns or by stacking rows? This brings us to the important question of vectorization.
Bus example. In our bus maintenance example, one faces a maintenance decision, which is captured by $y\in\mathcal{Y}=\left\{0,1\right\}$, where $y = 0$ is to keep going, and $y=1$ is to perform overhaul. The state $x\in\mathcal{X}=\left\{x_{0},...,x_{I}\right\}$ represents the mileage level of a bus. The transitions between states will be described below.
We will need to represent matrices (such as $U_{x}^{t}$) and 3-dimensional arrays (such as $u_{xy}^{t}$). Under the row-major order (a.k.a. 'C order') used in C and by default in numpy, we represent arrays by varying the last index fastest: a $2\times2$ matrix $M$ is stored as $vec_C\left(M\right) = M_{11}, M_{12}, M_{21}, M_{22}$. Likewise, a $2\times2\times2$ 3-dimensional array $A$ is stored by varying the last index first, then the middle one, i.e.
$vec_C\left(A\right) = A_{111}, A_{112}, A_{121}, A_{122}, A_{211}, A_{212}, A_{221}, A_{222}$.
In numpy, this is implemented by reshape(...).
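This convention is easy to check directly in numpy (a minimal illustration):

```python
import numpy as np

# flatten() and reshape(...) use the C (row-major) order by default:
# the last index varies fastest
A = np.arange(1, 9).reshape(2, 2, 2)
assert list(A.flatten()) == [1, 2, 3, 4, 5, 6, 7, 8]
# element (0,0,1) comes right after (0,0,0); (1,0,0) starts the second half
assert A[0, 0, 1] == 2 and A[0, 1, 0] == 3 and A[1, 0, 0] == 5
```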
A very important identity is \begin{align*} vec_C\left(AXB\right) = \left( A\otimes B^\top\right) vec_C\left(X\right), \end{align*} where $vec_C$ is the vectorization under the C (row-major) order, and where the Kronecker product $\otimes$ is defined as follows for 2x2 matrices (with obvious generalization):
\begin{align*} A\otimes B= \begin{pmatrix} a_{11}B & a_{12}B\\ a_{21}B & a_{22}B \end{pmatrix}. \end{align*}We shall adopt the convention to store $\pi^t_{xy}$ by taking the indexing ordering to be $(t,x,y)$ under the row-major order. That way we shall store $(\pi^1_{xy})_{xy}$ on top of $(\pi^2_{xy})_{xy}$, etc., and within the $(\pi^1_{xy})_{xy}$ block, $(\pi^1_{x_1y})_{y}$ on top of $(\pi^1_{x_2y})_{y}$, etc.
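The vectorization identity $vec_C\left(AXB\right) = \left(A\otimes B^\top\right) vec_C\left(X\right)$ can be sanity-checked numerically on random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))

vec_C = lambda M: M.flatten()  # row-major (C order) vectorization

# vec_C(A X B) = (A ⊗ B') vec_C(X)
assert np.allclose(vec_C(A @ X @ B), np.kron(A, B.T) @ vec_C(X))
```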
Define $n_{x}^{t}$ to be the (endogenous) number of individuals in state $x$ at time $t$. The choice-counting equation expresses that the sum over $y$ of individuals of type $x$ who made choice $y$ at time $t$ is equal to $n_x^t$, that is:
$ n_{x}^{t} = \sum_{y\in\mathcal{Y}}\pi_{xy}^{t}. $
Clearly, this can be rewritten for each $t$ as
$ n^t = \begin{pmatrix} 1_{Y}^{\top } & 0 & \cdots & 0 \\ 0 & 1_{Y}^{\top } & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1_{Y}^{\top } \end{pmatrix} vec_C(\pi^{t}) =\left( I_{X}\otimes 1_{Y}^{\top } \right) vec_C(\pi^{t}), $
and stacking this over $t$ yields
$ n = \begin{pmatrix} I_{X}\otimes 1_{Y}^{\top } & 0 & \cdots & 0 \\ 0 & I_{X}\otimes 1_{Y}^{\top } & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & I_{X}\otimes 1_{Y}^{\top }% \end{pmatrix}% vec_C(\pi) = \left( I_{T}\otimes I_{X}\otimes 1_{Y}^{\top } \right) vec_C(\pi). $
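As a quick numerical check of the single-period identity $n^t = \left(I_X \otimes 1_Y^\top\right) vec_C(\pi^t)$, with made-up sizes and values:

```python
import numpy as np

nbX, nbY = 3, 2
pi_t = np.arange(nbX * nbY, dtype=float)       # vec_C of some pi^t_{xy}
M = np.kron(np.eye(nbX), np.ones((1, nbY)))    # I_X ⊗ 1_Y'
# entry x of n^t sums pi^t_{xy} over the choices y
assert np.allclose(M @ pi_t, pi_t.reshape(nbX, nbY).sum(axis=1))
```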
Therefore the choice-counting equation reads:
\begin{equation} \left( I_{T}\otimes I_{X}\otimes 1_{Y}^{\top } \right) vec_C(\pi) = n. \end{equation}
Now assume that an agent who is in state $x$ and takes decision $y$ at time $t-1$ transitions to state $x^{\prime}$ at time $t$ with probability $P_{x^{\prime}|xy}$: among the individuals in state $x$ who choose $y$ at time $t-1$, a fraction $P_{x^{\prime}|xy}$ shall transit to state $x^{\prime}$ at time $t$.
$P$ is represented by a matrix whose rows are indexed by $x^{\prime}$ and whose columns are indexed by $xy$. Under the row-major order, the index ordering will be $(x^{\prime},x,y)$.
For $t$ such that $1\leq t\leq T-1$, the Markov transitions are given by
$ n_{x^{\prime}}^{t} = \sum_{x\in\mathcal{X},~y\in\mathcal{Y}}P_{x^{\prime}|xy}\pi_{xy}^{t-1} $, that is
$ n^t = P \pi^{t-1} $, and $n^{0}=q$.
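Since each column of $P$ sums to one, the forward equation conserves total mass: $\sum_{x^{\prime}} n^t_{x^{\prime}} = \sum_{x,y}\pi^{t-1}_{xy}$. A quick sketch with a random column-stochastic $P$:

```python
import numpy as np

rng = np.random.default_rng(1)
nbX, nbY = 3, 2
P = rng.random((nbX, nbX * nbY))
P /= P.sum(axis=0)                 # each (x,y) column is a distribution over x'
pi_prev = rng.random(nbX * nbY)    # vec_C of pi^{t-1}
n_t = P @ pi_prev
assert np.isclose(n_t.sum(), pi_prev.sum())   # mass conservation
```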
Bus example. In the bus example, assume that the Markov transitions laws are given by:
If no overhaul is performed, each state but the last has probability $25\%$ of transiting to the next state and probability $75\%$ of remaining in the same state. The last state transits to $x=0$ with probability $25\%$ (an overhaul is performed once beyond the last state).
If overhaul is performed, the state transits to $0$ for sure.
Define the matrix $L^{\mathcal{X}}_{x^{\prime}|x}$ associated with a transition to the next state (with a reset at the last state), and $R^{\mathcal{X}}_{x^{\prime}|x}$ associated with a reset to the initial state, for the transitions $x\to x^{\prime}$, by: \begin{align*} L^{\mathcal{X}}= \begin{pmatrix} 0 & 0 & \cdots & 0 & 1\\ 1 & 0 & \ddots & \ddots & 0\\ 0 & 1 & \ddots & \ddots & 0\\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix} \text{ and } R^{\mathcal{X}}= \begin{pmatrix} 1 & 1 & \cdots & 1\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & 0 \end{pmatrix}, \end{align*} so that $P$ is given by \begin{align*} P=\left( 0.75I_{\mathcal{X}}+0.25L^{\mathcal{X}}\right) \otimes (1,0) + R^{\mathcal{X}}\otimes (0,1). \end{align*}
To fix ideas, we shall assume that $\mathcal{X}=\left\{0,1,2\right\}$, that $\mathcal{Y}=\left\{0,1\right\}$ as argued before, and $\mathcal{T}=\left\{0,...,3\right\}$, and we implement this as follows:
class dynamic_problem:
    def __init__(self, nbT, nbX):
        self.nbT = nbT  # number of periods
        self.nbX = nbX  # number of mileage states
        self.nbY = 2    # two choices: keep going (0) or overhaul (1)
small_example = dynamic_problem(4,3)
def P_x_xy(self):
    # LX: move one state forward (with wrap at the last state); RX: reset to state 0
    LX = spr.csr_matrix((np.ones(self.nbX), (list(range(1, self.nbX)) + [0], range(self.nbX))), shape=(self.nbX, self.nbX))
    RX = spr.csr_matrix((np.ones(self.nbX), ([0 for i in range(self.nbX)], range(self.nbX))), shape=(self.nbX, self.nbX))
    return spr.kron(0.75 * spr.identity(self.nbX) + 0.25 * LX, np.array([[1, 0]])) + spr.kron(RX, np.array([[0, 1]]))
dynamic_problem.P_x_xy = P_x_xy
small_example.P_x_xy().toarray()
array([[0.75, 1.  , 0.  , 1.  , 0.25, 1.  ],
       [0.25, 0.  , 0.75, 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.25, 0.  , 0.75, 0.  ]])
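A property worth checking on the matrix just printed: each column of $P$, indexed by a pair $(x,y)$, is a probability distribution over next states $x^{\prime}$ and must therefore sum to one. Recomputing $P$ outside the class:

```python
import numpy as np
import scipy.sparse as spr

nbX = 3
LX = spr.csr_matrix((np.ones(nbX), (list(range(1, nbX)) + [0], range(nbX))),
                    shape=(nbX, nbX))
RX = spr.csr_matrix((np.ones(nbX), ([0] * nbX, range(nbX))),
                    shape=(nbX, nbX))
P = spr.kron(0.75 * spr.identity(nbX) + 0.25 * LX, np.array([[1, 0]])) \
    + spr.kron(RX, np.array([[0, 1]]))
# columns (one per (x, y) pair) each sum to one
assert np.allclose(P.toarray().sum(axis=0), 1.0)
```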
Denoting $N_{T}$ the $T \times T$ matrix defined by:
$ N_{T}= \begin{pmatrix} 0 & 0 & \cdots & \cdots & 0\\ 1 & \ddots & \cdots & \cdots & 0\\ 0 & \ddots & \ddots & & \vdots\\ \vdots & \ddots & \ddots & \ddots & \vdots\\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix} $
this rewrites as
$ n = (N_{T} \otimes P)\, vec_C(\pi) + b $
where $b$ is the vector obtained by stacking the column vector $q$ on top of $(T-1)\left\vert\mathcal{X}\right\vert$ zeros.
Therefore the Markov forward equation reads:
\begin{equation} \left(N_{T} \otimes P\right) vec_C(\pi) + b = n. \end{equation}
def N_t_t(self):
    # subdiagonal of ones: a one-period time shift
    return spr.csr_matrix((np.ones(self.nbT-1), (list(range(1, self.nbT)), range(self.nbT-1))), shape=(self.nbT, self.nbT))
dynamic_problem.N_t_t = N_t_t
small_example.N_t_t().toarray()
array([[0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])
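As the printed matrix suggests, $N_T$ acts as a one-period delay: multiplying by $N_T$ shifts a time-indexed vector down by one slot, dropping the last period. In plain numpy:

```python
import numpy as np

nbT = 4
N = np.diag(np.ones(nbT - 1), k=-1)   # ones on the subdiagonal
v = np.array([10., 20., 30., 40.])    # a vector indexed by t = 0,...,3
# period t receives the value of period t-1; period 0 receives nothing
assert np.allclose(N @ v, [0., 10., 20., 30.])
```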
Combining the choice-counting equation with the Markov forward equation to substitute out $n$ yields
$$ A ~ vec_C(\pi) = b, $$
where we have defined
$$ A := I_{T}\otimes I_{X}\otimes 1_{Y}^{\top } - N_{T} \otimes P. $$
The implementation is straightforward:
def A_tx_txy(self):
    return spr.kron(spr.identity(self.nbT), spr.kron(spr.identity(self.nbX), np.ones(self.nbY))) - spr.kron(self.N_t_t(), self.P_x_xy())
dynamic_problem.A_tx_txy = A_tx_txy
def b_tx(self, q=np.array([])):
    if len(q) == 0:
        q = np.ones(self.nbX)  # default: one bus per initial state
    return np.concatenate((q, np.zeros(self.nbX * (self.nbT - 1))))
dynamic_problem.b_tx = b_tx
In our example:
print("b=",small_example.b_tx())
small_example.A_tx_txy()[0:5,0:8].toarray()
b= [1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
array([[ 1.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  1.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  1.  ,  1.  ,  0.  ,  0.  ],
       [-0.75, -1.  ,  0.  , -1.  , -0.25, -1.  ,  1.  ,  1.  ],
       [-0.25,  0.  , -0.75,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ]])
Let $u_{xy}^{t}$ be the (exogenous) payoff associated with choice $y\in\mathcal{Y}$ at time $t\in\mathcal{T}=\left\{ 0,...,T-1\right\} $ in state $x\in\mathcal{X}$, discounted in order to be expressed in zeroth-period equivalent (for example: $u_{xy}^{t}=\beta^{t}u_{xy}$ where $\beta$ is a constant discount factor).
Example. Back to our bus example, we assume there is a fixed cost $C$ associated with overhaul (independent of mileage), while operations costs $c\left( x\right)$ increase with mileage (maintenance, fuel, insurance and costs of unexpected breakdowns). Specifically, assume the following:
The cost of replacing an engine is $C=\$8,000$ (in $1985$ dollars).
The operations cost is $c\left( x\right) = \frac {C x} {\left| \mathcal{X} \right|} .$
The discount factor is $\beta=0.9$.
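With these numbers and the 3-state grid used below, the operations costs come out to $0$, about $2666.67$, and about $5333.33$ dollars; a quick check of the arithmetic:

```python
C, nbX = 8000, 3
c = lambda x: C * x / nbX   # operations cost c(x) = C x / |X|
assert [round(c(x), 2) for x in range(nbX)] == [0.0, 2666.67, 5333.33]
```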
Next, we build the $u_{xy}^{t}$'s in two steps.
First the $u_{xy}$'s, so that $u_{x0}=-c\left(x\right)$ (the operations cost), while $u_{x1}=-C$ (the overhaul cost) for all $x$.
Next the $u_{xy}^{t}$'s, so that $u_{xy}^{t}=u_{xy}\beta^{t}$, stored under the row-major order as $vec_C\left(u_{xy}^{t}\right)=\left(\beta^{0},...,\beta^{T-1}\right) \otimes vec_C\left(u_{xy}\right)$.
def u_txy(self, overhaulCost=8000, beta=0.9):
    maintCost = lambda x: x * overhaulCost / self.nbX  # operations cost c(x)
    u_xy = np.array(list(zip([-maintCost(x) for x in range(self.nbX)],
                             [-overhaulCost for x in range(self.nbX)]))).reshape(1, -1)
    return np.kron(np.array([beta ** t for t in range(self.nbT)]), u_xy)
dynamic_problem.u_txy = u_txy
In our example:
small_example.u_txy()
array([[   -0.        , -8000.        , -2666.66666667, -8000.        ,
        -5333.33333333, -8000.        ,    -0.        , -7200.        ,
        -2400.        , -7200.        , -4800.        , -7200.        ,
           -0.        , -6480.        , -2160.        , -6480.        ,
        -4320.        , -6480.        ,    -0.        , -5832.        ,
        -1944.        , -5832.        , -3888.        , -5832.        ]])
We now have everything ready to write down the intertemporal (primal) optimization problem, which reads
\begin{align*} \max_{\pi\geq0} & \, u^{\top}\pi\\ s.t.~ & A\pi=b~\left[U\right] \end{align*}
while the dual problem is given by
\begin{align*} \min_{U} & \, b^{\top}U\\ s.t.~ & A^{\top} U\geq u~\left[\pi \geq 0\right]. \end{align*}
We shall write out both problems explicitly to interpret the dual, but first, let's implement.
def solve_lp(self):
    m = grb.Model()
    pi_txy = m.addMVar(shape=self.nbT * self.nbX * self.nbY)
    m.setObjective(self.u_txy() @ pi_txy, grb.GRB.MAXIMIZE)
    m.addConstr(self.A_tx_txy() @ pi_txy == self.b_tx())
    m.optimize()
    piopt_t_x_y = np.array(m.getAttr('X')).reshape((self.nbT, self.nbX, self.nbY))
    Uopt_t_x = np.array(m.getAttr('pi')).reshape((self.nbT, self.nbX))  # duals of the constraints
    return {'pi_t_x_y': piopt_t_x_y, 'U_t_x': Uopt_t_x}
dynamic_problem.solve_lp = solve_lp
In our example:
sol_lp = small_example.solve_lp()
sol_lp['U_t_x'][0,]
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (win64)
Thread count: 6 physical cores, 12 logical processors, using up to 12 threads
Optimize a model with 12 rows, 24 columns and 51 nonzeros
Model fingerprint: 0x2dd846f5
Coefficient statistics:
  Matrix range     [3e-01, 1e+00]
  Objective range  [2e+03, 8e+03]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 12 rows and 24 columns
Presolve time: 0.01s
Presolve: All rows and columns removed
Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0   -2.2023625e+04   0.000000e+00   0.000000e+00      0s
Solved in 0 iterations and 0.01 seconds
Optimal objective -2.202362500e+04
array([-2999.625, -9512. , -9512. ])
The central planner's problem is:
\begin{align*} \max_{\pi_{xy}^{t}\geq0} & \sum_{x\in\mathcal{X},~y\in\mathcal{Y},~t\in\mathcal{T}}\pi_{xy}^{t}u_{xy}^{t} \\ s.t. & \sum_{y^{\prime}\in\mathcal{Y}}\pi_{xy^{\prime}}^{0}=q_{x}~\left[U_{x}^{0}\right] \\ & \sum_{y^{\prime}\in\mathcal{Y}}\pi_{x^{\prime}y^{\prime}}^{t}=\sum_{x\in\mathcal{X},~y\in\mathcal{Y}}P_{x^{\prime}|xy}\pi_{xy}^{t-1}~\forall t\in\mathcal{T}\backslash\left\{ 0\right\} ~\left[ U_{x^{\prime}}^{t}\right] \end{align*}
We have introduced $U_{x}^{t}$, the Lagrange multiplier associated with the constraints at time $t$. The dual problem is:
\begin{align*} \min_{U_{x}^{t},~t\in\mathcal{T},~x\in\mathcal{X}} & \sum_{x\in\mathcal{X}}q_{x}U_{x}^{0} \\ s.t.~ & U_{x}^{t}\geq u_{xy}^{t}+\sum_{x^{\prime}}U_{x^{\prime}}^{t+1}P_{x^{\prime}|xy}~\forall x\in\mathcal{X},~y\in\mathcal{Y},~t\in\mathcal{T}\backslash\left\{ T - 1\right\} \\ & U_{x}^{T-1}\geq u_{xy}^{T-1}~\forall x\in\mathcal{X},y\in\mathcal{Y} \end{align*}
We shall see that $U_{x}^t$ represents the intertemporal payoff of being in state $x$ at time $t$, and that the constraints are Bellman equations.
By complementary slackness, we have
\begin{align*} \pi_{xy}^{t}>0\Longrightarrow U_{x}^{t}=u_{xy}^{t}+\sum_{x^{\prime}}U_{x^{\prime}}^{t+1}P_{x^{\prime}|xy} \end{align*}whose interpretation is immediate: if $y$ is the optimal choice in state $x$ at time $t$, then the intertemporal payoff of $x$ at $t$ is the sum of her myopic payoff $u_{xy}^{t}$ and her expected payoff at the next step.
As a result, the dual variable is called the intertemporal payoff in the vocabulary of dynamic programming. The relation yields Bellman's equation, verified as soon as $n^t_x>0$:
\begin{align*} U_{x}^{t}=\max_{y\in\mathcal{Y}}\left\{ u_{xy}^{t}+\sum_{x^{\prime}}U_{x^{\prime}}^{t+1}P_{x^{\prime}|xy}\right\}. \end{align*}
It is easy to see that if $U$ satisfies Bellman's equation for all $(x,t)$, then it solves the dual program.
But there is in fact a much faster way to compute the primal and dual solutions without having to use the full power of a linear programming solver. Along with the fact that $U^{T}=0$, Bellman's equation implies that there is a particularly simple method to obtain the dual variables $U^{t}$, by solving recursively backward in time, from $t=T-1$ to $t=0$. This method is called backward induction:
Algorithm [Backward induction]
Set $U^{T}=0$
For $t=T-1$ down to $0$, set $U_{x}^{t}:=\max_{y\in\mathcal{Y}}\left\{u_{xy}^{t}+\sum_{x^{\prime}}U_{x^{\prime}}^{t+1}P_{x^{\prime}|xy}\right\}$.
We implement as follows:
def backward_induction(self):
    U_t_x = np.zeros((self.nbT, self.nbX))
    # terminal period: no continuation value, purely myopic
    contVals_x = self.u_txy().reshape(self.nbT, self.nbX, self.nbY)[self.nbT-1, :, :].max(axis=1)
    U_t_x[self.nbT-1, :] = contVals_x
    for t in range(self.nbT-2, -1, -1):
        myopic_x_y = self.u_txy().reshape(self.nbT, self.nbX, self.nbY)[t, :, :]
        EcontVals_x_y = (contVals_x.transpose() @ self.P_x_xy()).reshape((self.nbX, self.nbY))
        contVals_x = (myopic_x_y + EcontVals_x_y).max(axis=1)
        U_t_x[t, :] = contVals_x
    return U_t_x
dynamic_problem.backward_induction = backward_induction
In our example:
sol_bi = small_example.backward_induction()
print(tabulate(zip(sol_bi.flatten(), sol_lp['U_t_x'].flatten()), headers=["bi", "lp"]))
The primal variables $\pi^{t}$ are then deduced, also by recursion, but this time forward in time from $t=0$ to $t=T-1$, by the so-called forward induction method:
Algorithm [Forward induction]
Set $n^{0}=q$ and compute $\left( U^{t}\right)$ by backward induction.
For $t=0$ up to $T-1$, pick $\pi^{t}$ such that $\pi_{xy}^{t}/n_{x}^{t}$ is a probability measure supported in the set $\arg\max_{y\in\mathcal{Y}}\left\{ u_{xy}^{t}+\sum_{x^{\prime}}U_{x^{\prime}}^{t+1}P_{x^{\prime}|xy}\right\}$ (with the convention $U^{T}=0$), and set $n_{x^{\prime}}^{t+1}=\sum_{x\in\mathcal{X},~y\in\mathcal{Y}}P_{x^{\prime}|xy}\pi_{xy}^{t}$ for $t<T-1$.
We implement as follows:
def backward_forward_induction(self, q=np.array([])):
    U_t_x = self.backward_induction()
    n_t_x = self.b_tx(q).reshape((self.nbT, self.nbX))  # n^0 = q; later rows filled forward
    pi_t_x_y = np.zeros((self.nbT, self.nbX, self.nbY))
    for t in range(self.nbT-1):
        myopic_x_y = self.u_txy().reshape(self.nbT, self.nbX, self.nbY)[t, :, :]
        EcontVals_x_y = (U_t_x[t+1, :].transpose() @ self.P_x_xy()).reshape((self.nbX, self.nbY))
        Y_x = (myopic_x_y + EcontVals_x_y).argmax(axis=1)  # optimal choice in each state
        for x in range(self.nbX):
            pi_t_x_y[t, x, Y_x[x]] = n_t_x[t, x]
        n_t_x[t+1, :] = self.P_x_xy() @ pi_t_x_y[t, :, :].flatten()
    # last period: purely myopic choice
    myopic_x_y = self.u_txy().reshape(self.nbT, self.nbX, self.nbY)[self.nbT-1, :, :]
    Y_x = myopic_x_y.argmax(axis=1)
    for x in range(self.nbX):
        pi_t_x_y[self.nbT-1, x, Y_x[x]] = n_t_x[self.nbT-1, x]
    return {'pi_t_x_y': pi_t_x_y, 'U_t_x': U_t_x}
dynamic_problem.backward_forward_induction = backward_forward_induction
In our example:
sol_bf = small_example.backward_forward_induction()
print('Value sol bf=',(small_example.u_txy() * (sol_bf['pi_t_x_y'].flatten())).sum(), ' ; value sol lp=',(small_example.u_txy() * (sol_lp['pi_t_x_y'].flatten())).sum())
print(tabulate(zip(sol_bf['pi_t_x_y'].flatten(),sol_lp['pi_t_x_y'].flatten()),headers=["pi (bi)","pi (lp)" ]))
Value sol bf= -22023.625  ; value sol lp= -22023.625
  pi (bi)    pi (lp)
---------  ---------
  1          1
  0          0
  0          0
  1          1
  0          0
  1          1
  2.75       2.75
  0          0
  0.25       0.25
  0          0
  0          0
  0          0
  2.0625     2.0625
  0          0
  0.875      0.875
  0          0
  0          0
  0.0625     0.0625
  1.60938    1.60938
  0          0
  1.17188    1.17188
  0          0
  0.21875    0.21875
  0          0
The dual variable $U$ is not necessarily unique (if $(x,t)$ is not visited, $U^t_x$ can typically take several values); neither is the primal variable, as there may be ties between several choices.
The computation by the combination of the backward and forward algorithms is much faster than the computation by a black-box linear programming solver.
However, as soon as we introduce capacity constraints, the computation by backward induction no longer works, and the linear programming formulation is necessary, as we shall now see.
Let us now assume that, for each state $x$, at most $k_y$ individuals can take choice $y$ at each time, where $\sum_y k_y \geq \sum_x q_x$. An additional constraint is therefore that for all $t\in\mathcal{T}$, $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, one should have:
$$ \pi^t_{xy} \leq k_y. $$
Exercise. Express this set of constraints in matrix form.
With a constraint of the form $B\pi\leq m$, the primal problem then writes
\begin{align*} \max_{\pi\geq0} & \, u^{\top}\pi\\ s.t.~ & A \pi=b~\left[ U\right] \\ & B \pi\leq m~\left[ \Lambda\right] \end{align*}
whose dual is
\begin{align*} \min_{U,\Lambda\geq0} & \, b^{\top}U+m^{\top}\Lambda\\ s.t.~ & A^\top U+B^\top\Lambda\geq u~\left[ \pi\right] \end{align*}
In the bus problem, the dual becomes
\begin{align*} \min_{U_{x}^{t},\lambda_{y}^{t}\geq0} & \sum_{x\in\mathcal{X}}q_{x}U_{x}^{0}+\sum_{y\in\mathcal{Y}}\sum_{t\in\mathcal{T}}k_{y}\lambda_{y}^{t}\\ s.t.~ & U_{x}^{t}\geq u_{xy}^{t}-\lambda_{y}^{t}+\sum_{x^{\prime}}U_{x^{\prime}}^{t+1}P_{x^{\prime}|xy}~\forall x\in\mathcal{X},~y\in\mathcal{Y},~t\in\mathcal{T}\backslash\left\{ T-1\right\} \\ & U_{x}^{T-1}\geq u_{xy}^{T-1}-\lambda_{y}^{T-1}~\forall x\in\mathcal{X},y\in\mathcal{Y} \end{align*}
and $\lambda_{y}^{t}$ is interpreted as the shadow price of alternative $y$ at time $t$.
Solution to the exercise. The constraints $\pi^t_{xy} \leq k_y$ express in matrix form as
$$vec_C(\pi) \leq 1_{XT} \otimes k.$$
However, when an upper bound $k_y$ is $+\infty$, we'd rather drop the corresponding constraints. Taking $k_0=+\infty$ and $k_1=1$, the matrix form becomes
$$\left(I_{XT} \otimes (0,1)\right) vec_C(\pi) \leq 1_{XT},$$
which we code as:
Abis_tx_txy = spr.kron(spr.identity(small_example.nbT * small_example.nbX),np.array([0,1]))
bbis_tx = np.ones(small_example.nbX * small_example.nbT )
m = grb.Model()
pi_txy = m.addMVar(shape=small_example.nbT*small_example.nbX*small_example.nbY )
m.setObjective(small_example.u_txy() @ pi_txy, grb.GRB.MAXIMIZE)
m.addConstr( small_example.A_tx_txy() @ pi_txy == small_example.b_tx())
m.addConstr( Abis_tx_txy @ pi_txy <= bbis_tx)
m.optimize()
pioptbis_t_x_y = np.array(m.getAttr('X')).reshape((small_example.nbT,small_example.nbX,small_example.nbY))
Uoptbis_t_x = np.array(m.getAttr('pi'))[:(small_example.nbX * small_example.nbT)].reshape((small_example.nbT,small_example.nbX))
Uoptbis_t_x[0,]
Warning for adding constraints: zero or small (< 1e-13) coefficients, ignored
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (win64)
Thread count: 6 physical cores, 12 logical processors, using up to 12 threads
Optimize a model with 24 rows, 24 columns and 63 nonzeros
Model fingerprint: 0xf58cf4ba
Coefficient statistics:
  Matrix range     [3e-01, 1e+00]
  Objective range  [2e+03, 8e+03]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 21 rows and 17 columns
Presolve time: 0.01s
Presolved: 3 rows, 7 columns, 10 nonzeros
Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0   -1.5354500e+04   1.250000e+00   0.000000e+00      0s
       3   -2.2023625e+04   0.000000e+00   0.000000e+00      0s
Solved in 3 iterations and 0.01 seconds
Optimal objective -2.202362500e+04
array([ -2999.625 , -10185.04166667, -11475.83333333])
Another plausible constraint is to assume that the maximum number of buses that can go on overhaul at the same time is $k$, say e.g. $k=1.5$. This constraint now writes $$\left( I_{T}\otimes 1_{X}^{\top }\otimes (0,1)\right) vec_C\left( \pi \right) \leq k 1_{T},$$ and is coded as:
Abis_tx_txy = spr.kron(spr.kron(spr.identity(small_example.nbT ), np.ones(small_example.nbX)),np.array([0,1]))
bbis_tx = 1.5 * np.ones( small_example.nbT )
m = grb.Model()
pi_txy = m.addMVar(shape=small_example.nbT*small_example.nbX*small_example.nbY )
m.setObjective(small_example.u_txy() @ pi_txy, grb.GRB.MAXIMIZE)
m.addConstr( small_example.A_tx_txy() @ pi_txy == small_example.b_tx())
m.addConstr( Abis_tx_txy @ pi_txy <= bbis_tx)
m.optimize()
pioptbis_t_x_y = np.array(m.getAttr('X')).reshape((small_example.nbT,small_example.nbX,small_example.nbY))
Uoptbis_t_x = np.array(m.getAttr('pi'))[:(small_example.nbX * small_example.nbT)].reshape((small_example.nbT,small_example.nbX))
Uoptbis_t_x[0,]
Warning for adding constraints: zero or small (< 1e-13) coefficients, ignored
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (linux64)
Thread count: 2 physical cores, 4 logical processors, using up to 4 threads
Optimize a model with 16 rows, 24 columns and 63 nonzeros
Model fingerprint: 0xca47d0bd
Coefficient statistics:
  Matrix range     [2e-01, 1e+00]
  Objective range  [2e+03, 8e+03]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 2e+00]
Presolve removed 12 rows and 16 columns
Presolve time: 0.02s
Presolved: 4 rows, 8 columns, 14 nonzeros
Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0   -1.5354500e+04   1.250000e+00   0.000000e+00      0s
       4   -2.2360146e+04   0.000000e+00   0.000000e+00      0s
Solved in 4 iterations and 0.02 seconds
Optimal objective -2.236014583e+04
array([ -2999.625 , -10185.04166667, -10185.04166667])
In these cases, the linear programming formulation is our only way to compute the solution: the backward induction algorithm no longer applies because of the capacity constraints.