Basics of (finite-horizon, discrete) dynamic programming: Bellman's equation; forward induction, backward induction
Markov decision processes
Dynamic programming as linear programming: interpretation of duality
Vectorization, Kronecker products, multidimensional arrays
Ford, L. R., Jr., & Fulkerson, D. R. (1958). Constructing maximal dynamic flows from static flows. Operations Research, 6(3), 419-433.
Howard, R. (1960). Dynamic Programming and Markov Processes. Wiley.
Schrijver, A. (2003). Combinatorial Optimization: Polyhedra and Efficiency, Vol. A. Springer. Section 12.5.c.
Bertsekas, D. (2011). Dynamic Programming and Optimal Control, Vols. I and II. 3rd ed. Athena.
Ljungqvist, L., & Sargent, T. (2012). Recursive Macroeconomic Theory. 3rd ed. MIT.
Rust, J. (1987). Optimal replacement of GMC bus engines: an empirical model of Harold Zurcher. Econometrica, 55(5), 999-1033.
Nazareth, J., & Kulkarni, R. (1986). Linear programming formulations of Markov decision processes. Operations Research Letters.
John Rust describes the problem of Harold Zurcher, an engineer who runs a bus fleet, as follows:
each month, buses operate a stochastic number of miles
operations costs increase with mileage (maintenance, fuel, insurance and costs of unexpected breakdowns)
there is a fixed cost associated with overhaul (independent of mileage)
each month, Zurcher needs to decide whether to send each bus to overhaul, which resets its mileage to zero, or to let it operate.
This problem is a dynamic programming problem. When deciding whether or not to perform the overhaul, Zurcher needs to compare the overhaul cost not only with the current operation cost, but also with the reduction in operation costs in future periods.
While in this instance of the problem there is no externality across buses (so each bus's overhaul decision could be taken in isolation), it is not hard to envision a variant of this problem with externalities. For instance, one may assume that only a limited number of buses can go on overhaul at the same time.
We shall derive the optimal policy for Harold Zurcher, (somewhat freely) based on Rust's data.
First, let's import the libraries we'll need.
import numpy as np
import scipy.sparse as spr
from tabulate import tabulate
# !python -m pip install -i https://pypi.gurobi.com gurobipy ## only if Gurobi not here
import gurobipy as grb
import pandas as pd
Consider a finite set of states $x\in\mathcal{X}$; and a set of possible actions $y\in\mathcal{Y}$.
It is assumed that at initial time, the mass of agents in state $x$ is $q_{x}$ (exogenous).
Let $\pi_{xy}^{t}$ be the (endogenous) number of individuals who are in state $x$ and choose $y$ at time $t$ ("policy variable").
Because $\pi_{xy}^{t}$ is our decision variable, we shall need to worry about how the computer represents it in memory: by stacking columns or by stacking rows? This brings us to the important question of vectorization.
Bus example. In our bus maintenance example, one faces a maintenance decision, which is captured by $y\in\mathcal{Y}=\left\{0,1\right\}$, where $y = 0$ is to keep going, and $y=1$ is to perform overhaul. The state $x\in\mathcal{X}=\left\{x_{0},...,x_{I}\right\}$ represents the mileage level of a bus. The transitions between states will be described below.
We will need to represent matrices (such as $U_{x}^{t}$) and 3-dimensional arrays (such as $u_{xy}^{t}$). Under the row-major order (a.k.a. 'C order') used in C and by default in numpy, we represent arrays by varying the last index fastest: a $2\times2$ matrix $M$ is stored as $vec_C\left(M\right) = M_{11}, M_{12}, M_{21}, M_{22}$. Likewise, a $2\times2\times2$ 3-dimensional array $A$ is stored by varying the last index first, then the middle one, i.e.
$vec_C\left(A\right) = A_{111}, A_{112}, A_{121}, A_{122}, A_{211}, A_{212}, A_{221}, A_{222}$.
In numpy, this is implemented by reshape(...).
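This convention is easy to check directly in numpy (a minimal illustration):

```python
import numpy as np

# flatten() and reshape(...) use the C (row-major) order by default:
# the last index varies fastest
A = np.arange(1, 9).reshape(2, 2, 2)
assert list(A.flatten()) == [1, 2, 3, 4, 5, 6, 7, 8]
# element (0,0,1) comes right after (0,0,0); (1,0,0) starts the second half
assert A[0, 0, 1] == 2 and A[0, 1, 0] == 3 and A[1, 0, 0] == 5
```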
A very important identity is \begin{align*} vec_C\left(AXB\right) = \left( A\otimes B^\top\right) vec_C\left(X\right), \end{align*} where $vec_C$ is the vectorization under the C (row-major) order, and where the Kronecker product $\otimes$ is defined as follows for 2x2 matrices (with obvious generalization):
\begin{align*} A\otimes B= \begin{pmatrix} a_{11}B & a_{12}B\\ a_{21}B & a_{22}B \end{pmatrix}. \end{align*}We shall adopt the convention to store $\pi^t_{xy}$ by taking the indexing ordering to be $(t,x,y)$ under the row-major order. That way we shall store $(\pi^1_{xy})_{xy}$ on top of $(\pi^2_{xy})_{xy}$, etc., and within the $(\pi^1_{xy})_{xy}$ block, $(\pi^1_{x_1y})_{y}$ on top of $(\pi^1_{x_2y})_{y}$, etc.
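The vectorization identity $vec_C\left(AXB\right) = \left(A\otimes B^\top\right) vec_C\left(X\right)$ can be sanity-checked numerically on random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))

vec_C = lambda M: M.flatten()  # row-major (C order) vectorization

# vec_C(A X B) = (A ⊗ B') vec_C(X)
assert np.allclose(vec_C(A @ X @ B), np.kron(A, B.T) @ vec_C(X))
```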
Define $n_{x}^{t}$ to be the (endogenous) number of individuals in state $x$ at time $t$. The choice-counting equation expresses that the sum over $y$ of individuals of type $x$ who made choice $y$ at time $t$ is equal to $n_x^t$, that is:
$ n_{x}^{t} = \sum_{y\in\mathcal{Y}}\pi_{xy}^{t}. $
Clearly, this can be rewritten for each $t$ as
$ n^t = \begin{pmatrix} 1_{Y}^{\top } & 0 & \cdots & 0 \\ 0 & 1_{Y}^{\top } & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1_{Y}^{\top } \end{pmatrix} vec_C(\pi^{t}) =\left( I_{X}\otimes 1_{Y}^{\top } \right) vec_C(\pi^{t}), $
and stacking this over $t$ yields
$ n = \begin{pmatrix} I_{X}\otimes 1_{Y}^{\top } & 0 & \cdots & 0 \\ 0 & I_{X}\otimes 1_{Y}^{\top } & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & I_{X}\otimes 1_{Y}^{\top }% \end{pmatrix}% vec_C(\pi) = \left( I_{T}\otimes I_{X}\otimes 1_{Y}^{\top } \right) vec_C(\pi). $
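As a quick numerical check of the single-period identity $n^t = \left(I_X \otimes 1_Y^\top\right) vec_C(\pi^t)$, with made-up sizes and values:

```python
import numpy as np

nbX, nbY = 3, 2
pi_t = np.arange(nbX * nbY, dtype=float)       # vec_C of some pi^t_{xy}
M = np.kron(np.eye(nbX), np.ones((1, nbY)))    # I_X ⊗ 1_Y'
# entry x of n^t sums pi^t_{xy} over the choices y
assert np.allclose(M @ pi_t, pi_t.reshape(nbX, nbY).sum(axis=1))
```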
Therefore the choice-counting equation reads:
\begin{equation} \left( I_{T}\otimes I_{X}\otimes 1_{Y}^{\top } \right) vec_C(\pi) = n. \end{equation}
Now assume that an agent who is in state $x$ and takes decision $y$ at time $t-1$ transitions to state $x^{\prime}$ at time $t$ with probability $P_{x^{\prime}|xy}$: among the individuals in state $x$ who choose $y$ at time $t-1$, a fraction $P_{x^{\prime}|xy}$ shall transit to state $x^{\prime}$ at time $t$.
$P$ is represented by a matrix whose rows are indexed by $x^{\prime}$ and whose columns are indexed by $xy$. Under the row-major order, the index ordering will be $(x^{\prime},x,y)$.
For $t$ such that $1\leq t\leq T-1$, the Markov transitions are given by
$ n_{x^{\prime}}^{t} = \sum_{x\in\mathcal{X},~y\in\mathcal{Y}}P_{x^{\prime}|xy}\pi_{xy}^{t-1} $, that is
$ n^t = P \pi^{t-1} $, and $n^{0}=q$.
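Since each column of $P$ sums to one, the forward equation conserves total mass: $\sum_{x^{\prime}} n^t_{x^{\prime}} = \sum_{x,y}\pi^{t-1}_{xy}$. A quick sketch with a random column-stochastic $P$:

```python
import numpy as np

rng = np.random.default_rng(1)
nbX, nbY = 3, 2
P = rng.random((nbX, nbX * nbY))
P /= P.sum(axis=0)                 # each (x,y) column is a distribution over x'
pi_prev = rng.random(nbX * nbY)    # vec_C of pi^{t-1}
n_t = P @ pi_prev
assert np.isclose(n_t.sum(), pi_prev.sum())   # mass conservation
```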
Bus example. In the bus example, assume that the Markov transitions laws are given by:
If no overhaul is performed, each state but the last has probability $25\%$ of transiting to the next state and probability $75\%$ of remaining in the same state. The last state transits to $x=0$ with probability $25\%$ (an overhaul is performed once beyond the last state).
If overhaul is performed, the state transits to $0$ for sure.
Define the matrix $L^{\mathcal{X}}_{x^{\prime}|x}$ associated with a transition to the next state (with a reset at the last state), and $R^{\mathcal{X}}_{x^{\prime}|x}$ associated with a reset to the initial state, for the transitions $x\to x^{\prime}$, by: \begin{align*} L^{\mathcal{X}}= \begin{pmatrix} 0 & 0 & \cdots & 0 & 1\\ 1 & 0 & \ddots & \ddots & 0\\ 0 & 1 & \ddots & \ddots & 0\\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix} \text{ and } R^{\mathcal{X}}= \begin{pmatrix} 1 & 1 & \cdots & 1\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & 0 \end{pmatrix}, \end{align*} so that $P$ is given by \begin{align*} P=\left( 0.75I_{\mathcal{X}}+0.25L^{\mathcal{X}}\right) \otimes (1,0) + R^{\mathcal{X}}\otimes (0,1). \end{align*}
To fix ideas, we shall assume that $\mathcal{X}=\left\{0,1,2\right\}$, that $\mathcal{Y}=\left\{0,1\right\}$ as argued before, and $\mathcal{T}=\left\{0,...,3\right\}$, and we implement this as follows:
class dynamic_problem:
    def __init__(self, nbT, nbX):
        self.nbT = nbT  # number of periods
        self.nbX = nbX  # number of mileage states
        self.nbY = 2    # two choices: keep going (0) or overhaul (1)
small_example = dynamic_problem(4,3)
def P_x_xy(self):
    # LX: move one state forward (with wrap at the last state); RX: reset to state 0
    LX = spr.csr_matrix((np.ones(self.nbX), (list(range(1, self.nbX)) + [0], range(self.nbX))), shape=(self.nbX, self.nbX))
    RX = spr.csr_matrix((np.ones(self.nbX), ([0 for i in range(self.nbX)], range(self.nbX))), shape=(self.nbX, self.nbX))
    return spr.kron(0.75 * spr.identity(self.nbX) + 0.25 * LX, np.array([[1, 0]])) + spr.kron(RX, np.array([[0, 1]]))
dynamic_problem.P_x_xy = P_x_xy
small_example.P_x_xy().toarray()
array([[0.75, 1.  , 0.  , 1.  , 0.25, 1.  ],
       [0.25, 0.  , 0.75, 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.25, 0.  , 0.75, 0.  ]])
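A property worth checking on the matrix just printed: each column of $P$, indexed by a pair $(x,y)$, is a probability distribution over next states $x^{\prime}$ and must therefore sum to one. Recomputing $P$ outside the class:

```python
import numpy as np
import scipy.sparse as spr

nbX = 3
LX = spr.csr_matrix((np.ones(nbX), (list(range(1, nbX)) + [0], range(nbX))),
                    shape=(nbX, nbX))
RX = spr.csr_matrix((np.ones(nbX), ([0] * nbX, range(nbX))),
                    shape=(nbX, nbX))
P = spr.kron(0.75 * spr.identity(nbX) + 0.25 * LX, np.array([[1, 0]])) \
    + spr.kron(RX, np.array([[0, 1]]))
# columns (one per (x, y) pair) each sum to one
assert np.allclose(P.toarray().sum(axis=0), 1.0)
```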
Denoting $N_{T}$ the $T \times T$ matrix defined by:
$ N_{T}= \begin{pmatrix} 0 & 0 & \cdots & \cdots & 0\\ 1 & \ddots & \cdots & \cdots & 0\\ 0 & \ddots & \ddots & & \vdots\\ \vdots & \ddots & \ddots & \ddots & \vdots\\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix} $
this rewrites as
$ n = (N_{T} \otimes P)\, vec_C(\pi) + b $
where $b$ is the vector obtained by stacking the column vector $q$ on top of $(T-1)\left\vert\mathcal{X}\right\vert$ zeros.
Therefore the Markov forward equation reads:
\begin{equation} \left(N_{T} \otimes P\right) vec_C(\pi) + b = n. \end{equation}
def N_t_t(self):
    # subdiagonal of ones: a one-period time shift
    return spr.csr_matrix((np.ones(self.nbT-1), (list(range(1, self.nbT)), range(self.nbT-1))), shape=(self.nbT, self.nbT))
dynamic_problem.N_t_t = N_t_t
small_example.N_t_t().toarray()
array([[0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])
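As the printed matrix suggests, $N_T$ acts as a one-period delay: multiplying by $N_T$ shifts a time-indexed vector down by one slot, dropping the last period. In plain numpy:

```python
import numpy as np

nbT = 4
N = np.diag(np.ones(nbT - 1), k=-1)   # ones on the subdiagonal
v = np.array([10., 20., 30., 40.])    # a vector indexed by t = 0,...,3
# period t receives the value of period t-1; period 0 receives nothing
assert np.allclose(N @ v, [0., 10., 20., 30.])
```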
Combining the choice-counting equation with the Markov forward equation to substitute out $n$ yields
$$ A ~ vec_C(\pi) = b, $$
where we have defined
$$ A := I_{T}\otimes I_{X}\otimes 1_{Y}^{\top } - N_{T} \otimes P. $$
The implementation is straightforward:
def A_tx_txy(self):
    return spr.kron(spr.identity(self.nbT), spr.kron(spr.identity(self.nbX), np.ones(self.nbY))) - spr.kron(self.N_t_t(), self.P_x_xy())
dynamic_problem.A_tx_txy = A_tx_txy
def b_tx(self, q=np.array([])):
    if len(q) == 0:
        q = np.ones(self.nbX)  # default: one bus per initial state
    return np.concatenate((q, np.zeros(self.nbX * (self.nbT - 1))))
dynamic_problem.b_tx = b_tx
In our example:
print("b=",small_example.b_tx())
small_example.A_tx_txy()[0:5,0:8].toarray()
b= [1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
array([[ 1.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  1.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  1.  ,  1.  ,  0.  ,  0.  ],
       [-0.75, -1.  ,  0.  , -1.  , -0.25, -1.  ,  1.  ,  1.  ],
       [-0.25,  0.  , -0.75,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ]])
Let $u_{xy}^{t}$ be the (exogenous) payoff associated with choice $y\in\mathcal{Y}$ at time $t\in\mathcal{T}=\left\{ 0,...,T-1\right\} $ in state $x\in\mathcal{X}$, discounted in order to be expressed in zeroth-period equivalent (for example: $u_{xy}^{t}=\beta^{t}u_{xy}$ where $\beta$ is a constant discount factor).
Example. Back to our bus example, we assume there is a fixed cost $C$ associated with overhaul (independent of mileage), while operations costs $c\left( x\right)$ increase with mileage (maintenance, fuel, insurance and costs of unexpected breakdowns). Specifically, assume the following:
The cost of replacing an engine is $C=\$8,000$ (in $1985$ dollars).
The operations cost is $c\left( x\right) = \frac {C x} {\left| \mathcal{X} \right|} .$
The discount factor is $\beta=0.9$.
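With these numbers and the 3-state grid used below, the operations costs come out to $0$, about $2666.67$, and about $5333.33$ dollars; a quick check of the arithmetic:

```python
C, nbX = 8000, 3
c = lambda x: C * x / nbX   # operations cost c(x) = C x / |X|
assert [round(c(x), 2) for x in range(nbX)] == [0.0, 2666.67, 5333.33]
```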
Next, we build the $u_{xy}^{t}$'s in two steps.
First the $u_{xy}$'s, so that $u_{x0}=-c\left(x\right)$ (the operations cost), while $u_{x1}=-C$ (the overhaul cost) for all $x$.
Next the $u_{xy}^{t}$'s, so that $u_{xy}^{t}=u_{xy}\beta^{t}$, stored under the row-major order as $vec_C\left(u_{xy}^{t}\right)=\left(\beta^{0},...,\beta^{T-1}\right) \otimes vec_C\left(u_{xy}\right)$.
def u_txy(self, overhaulCost=8000, beta=0.9):
    maintCost = lambda x: x * overhaulCost / self.nbX  # operations cost c(x)
    u_xy = np.array(list(zip([-maintCost(x) for x in range(self.nbX)],
                             [-overhaulCost for x in range(self.nbX)]))).reshape(1, -1)
    return np.kron(np.array([beta ** t for t in range(self.nbT)]), u_xy)
dynamic_problem.u_txy = u_txy
In our example:
small_example.u_txy()
array([[   -0.        , -8000.        , -2666.66666667, -8000.        ,
        -5333.33333333, -8000.        ,    -0.        , -7200.        ,
        -2400.        , -7200.        , -4800.        , -7200.        ,
           -0.        , -6480.        , -2160.        , -6480.        ,
        -4320.        , -6480.        ,    -0.        , -5832.        ,
        -1944.        , -5832.        , -3888.        , -5832.        ]])
We now have everything ready to write down the intertemporal (primal) optimization problem, which reads
\begin{align*} \max_{\pi\geq0} & \, u^{\top}\pi\\ s.t.~ & A\pi=b~\left[U\right] \end{align*}
while the dual problem is given by
\begin{align*} \min_{U} & \, b^{\top}U\\ s.t.~ & A^{\top} U\geq u~\left[\pi \geq 0\right]. \end{align*}
We shall write out both problems explicitly to interpret the dual, but first, let's implement.
def solve_lp(self):
    m = grb.Model()
    pi_txy = m.addMVar(shape=self.nbT * self.nbX * self.nbY)
    m.setObjective(self.u_txy() @ pi_txy, grb.GRB.MAXIMIZE)
    m.addConstr(self.A_tx_txy() @ pi_txy == self.b_tx())
    m.optimize()
    piopt_t_x_y = np.array(m.getAttr('X')).reshape((self.nbT, self.nbX, self.nbY))
    Uopt_t_x = np.array(m.getAttr('pi')).reshape((self.nbT, self.nbX))  # duals of the constraints
    return {'pi_t_x_y': piopt_t_x_y, 'U_t_x': Uopt_t_x}
dynamic_problem.solve_lp = solve_lp
In our example:
sol_lp = small_example.solve_lp()
sol_lp['U_t_x'][0,]
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (win64)
Thread count: 6 physical cores, 12 logical processors, using up to 12 threads
Optimize a model with 12 rows, 24 columns and 51 nonzeros
Model fingerprint: 0x2dd846f5
Coefficient statistics:
  Matrix range     [3e-01, 1e+00]
  Objective range  [2e+03, 8e+03]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 12 rows and 24 columns
Presolve time: 0.01s
Presolve: All rows and columns removed
Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0   -2.2023625e+04   0.000000e+00   0.000000e+00      0s
Solved in 0 iterations and 0.01 seconds
Optimal objective -2.202362500e+04
array([-2999.625, -9512. , -9512. ])
The central planner's problem is:
\begin{align*} \max_{\pi_{xy}^{t}\geq0} & \sum_{x\in\mathcal{X},~y\in\mathcal{Y},~t\in\mathcal{T}}\pi_{xy}^{t}u_{xy}^{t} \\ s.t. & \sum_{y^{\prime}\in\mathcal{Y}}\pi_{xy^{\prime}}^{0}=q_{x}~\left[U_{x}^{0}\right] \\ & \sum_{y^{\prime}\in\mathcal{Y}}\pi_{x^{\prime}y^{\prime}}^{t}=\sum_{x\in\mathcal{X},~y\in\mathcal{Y}}P_{x^{\prime}|xy}\pi_{xy}^{t-1}~\forall t\in\mathcal{T}\backslash\left\{ 0\right\} ~\left[ U_{x^{\prime}}^{t}\right] \end{align*}
We have introduced $U_{x}^{t}$, the Lagrange multiplier associated with the constraints at time $t$. The dual problem is:
\begin{align*} \min_{U_{x}^{t},~t\in\mathcal{T},~x\in\mathcal{X}} & \sum_{x\in\mathcal{X}}q_{x}U_{x}^{0} \\ s.t.~ & U_{x}^{t}\geq u_{xy}^{t}+\sum_{x^{\prime}}U_{x^{\prime}}^{t+1}P_{x^{\prime}|xy}~\forall x\in\mathcal{X},~y\in\mathcal{Y},~t\in\mathcal{T}\backslash\left\{ T - 1\right\} \\ & U_{x}^{T-1}\geq u_{xy}^{T-1}~\forall x\in\mathcal{X},y\in\mathcal{Y} \end{align*}
We shall see that $U_{x}^t$ represents the intertemporal payoff of being in state $x$ at time $t$, and that the constraints are Bellman equations.
By complementary slackness, we have
\begin{align*} \pi_{xy}^{t}>0\Longrightarrow U_{x}^{t}=u_{xy}^{t}+\sum_{x^{\prime}}U_{x^{\prime}}^{t+1}P_{x^{\prime}|xy} \end{align*}whose interpretation is immediate: if $y$ is the optimal choice in state $x$ at time $t$, then the intertemporal payoff of $x$ at $t$ is the sum of her myopic payoff $u_{xy}^{t}$ and her expected payoff at the next step.
As a result, the dual variable is called the intertemporal payoff in the vocabulary of dynamic programming. The relation yields Bellman's equation, verified as soon as $n^t_x>0$:
\begin{align*} U_{x}^{t}=\max_{y\in\mathcal{Y}}\left\{ u_{xy}^{t}+\sum_{x^{\prime}}U_{x^{\prime}}^{t+1}P_{x^{\prime}|xy}\right\}. \end{align*}
It is easy to see that if $U$ satisfies Bellman's equation for all $(x,t)$, then it solves the dual program.
But there is in fact a much faster way to compute the primal and dual solutions without having to use the full power of a linear programming solver. Along with the fact that $U^{T}=0$, Bellman's equation implies that there is a particularly simple method to obtain the dual variables $U^{t}$, by solving recursively backward in time, from $t=T-1$ to $t=0$. This method is called backward induction:
Algorithm [Backward induction]
Set $U^{T}=0$
For $t=T-1$ down to $0$, set $U_{x}^{t}:=\max_{y\in\mathcal{Y}}\left\{u_{xy}^{t}+\sum_{x^{\prime}}U_{x^{\prime}}^{t+1}P_{x^{\prime}|xy}\right\}$.
We implement as follows:
def backward_induction(self):
    U_t_x = np.zeros((self.nbT, self.nbX))
    # terminal period: no continuation value, purely myopic
    contVals_x = self.u_txy().reshape(self.nbT, self.nbX, self.nbY)[self.nbT-1, :, :].max(axis=1)
    U_t_x[self.nbT-1, :] = contVals_x
    for t in range(self.nbT-2, -1, -1):
        myopic_x_y = self.u_txy().reshape(self.nbT, self.nbX, self.nbY)[t, :, :]
        EcontVals_x_y = (contVals_x.transpose() @ self.P_x_xy()).reshape((self.nbX, self.nbY))
        contVals_x = (myopic_x_y + EcontVals_x_y).max(axis=1)
        U_t_x[t, :] = contVals_x
    return U_t_x
dynamic_problem.backward_induction = backward_induction
In our example:
sol_bi = small_example.backward_induction()
print(tabulate(zip(sol_bi.flatten(), sol_lp['U_t_x'].flatten()), headers=["bi", "lp"]))
The primal variables $\pi^{t}$ are then deduced, also by recursion, but this time forward in time from $t=0$ to $t=T-1$, by the so-called forward induction method:
Algorithm [Forward induction]
Set $n^{0}=q$ and compute $\left( U^{t}\right)$ by backward induction.
For $t=0$ up to $T-1$, pick $\pi^{t}$ such that $\pi_{xy}^{t}/n_{x}^{t}$ is a probability measure supported in the set $\arg\max_{y\in\mathcal{Y}}\left\{ u_{xy}^{t}+\sum_{x^{\prime}}U_{x^{\prime}}^{t+1}P_{x^{\prime}|xy}\right\}$ (with the convention $U^{T}=0$), and set $n_{x^{\prime}}^{t+1}=\sum_{x\in\mathcal{X},~y\in\mathcal{Y}}P_{x^{\prime}|xy}\pi_{xy}^{t}$ for $t<T-1$.
We implement as follows:
def backward_forward_induction(self, q=np.array([])):
    U_t_x = self.backward_induction()
    n_t_x = self.b_tx(q).reshape((self.nbT, self.nbX))  # n^0 = q; later rows filled forward
    pi_t_x_y = np.zeros((self.nbT, self.nbX, self.nbY))
    for t in range(self.nbT-1):
        myopic_x_y = self.u_txy().reshape(self.nbT, self.nbX, self.nbY)[t, :, :]
        EcontVals_x_y = (U_t_x[t+1, :].transpose() @ self.P_x_xy()).reshape((self.nbX, self.nbY))
        Y_x = (myopic_x_y + EcontVals_x_y).argmax(axis=1)  # optimal choice in each state
        for x in range(self.nbX):
            pi_t_x_y[t, x, Y_x[x]] = n_t_x[t, x]
        n_t_x[t+1, :] = self.P_x_xy() @ pi_t_x_y[t, :, :].flatten()
    # last period: purely myopic choice
    myopic_x_y = self.u_txy().reshape(self.nbT, self.nbX, self.nbY)[self.nbT-1, :, :]
    Y_x = myopic_x_y.argmax(axis=1)
    for x in range(self.nbX):
        pi_t_x_y[self.nbT-1, x, Y_x[x]] = n_t_x[self.nbT-1, x]
    return {'pi_t_x_y': pi_t_x_y, 'U_t_x': U_t_x}
dynamic_problem.backward_forward_induction = backward_forward_induction
In our example:
sol_bf = small_example.backward_forward_induction()
print('Value sol bf=',(small_example.u_txy() * (sol_bf['pi_t_x_y'].flatten())).sum(), ' ; value sol lp=',(small_example.u_txy() * (sol_lp['pi_t_x_y'].flatten())).sum())
print(tabulate(zip(sol_bf['pi_t_x_y'].flatten(),sol_lp['pi_t_x_y'].flatten()),headers=["pi (bi)","pi (lp)" ]))
Value sol bf= -22023.625  ; value sol lp= -22023.625
  pi (bi)    pi (lp)
---------  ---------
  1          1
  0          0
  0          0
  1          1
  0          0
  1          1
  2.75       2.75
  0          0
  0.25       0.25
  0          0
  0          0
  0          0
  2.0625     2.0625
  0          0
  0.875      0.875
  0          0
  0          0
  0.0625     0.0625
  1.60938    1.60938
  0          0
  1.17188    1.17188
  0          0
  0.21875    0.21875
  0          0
The dual variable $U$ is not necessarily unique (if $(x,t)$ is not visited, $U^t_x$ can typically take several values); neither is the primal variable, as there may be ties between several choices.
The computation by the combination of the backward and forward algorithms is much faster than the computation by a black-box linear programming solver.
However, as soon as we introduce capacity constraints, the computation by backward induction no longer works, and the linear programming formulation is necessary, as we shall now see.
Let us now assume that, for each state $x$, at most $k_y$ individuals can take choice $y$ at each time, where $\sum_y k_y \geq \sum_x q_x$. An additional constraint is therefore that for all $t\in\mathcal{T}$, $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, one should have:
$$ \pi^t_{xy} \leq k_y. $$
Exercise. Express this set of constraints in matrix form.
With a constraint of the form $B\pi\leq m$, the primal problem then writes
\begin{align*} \max_{\pi\geq0} & \, u^{\top}\pi\\ s.t.~ & A \pi=b~\left[ U\right] \\ & B \pi\leq m~\left[ \Lambda\right] \end{align*}
whose dual is
\begin{align*} \min_{U,\Lambda\geq0} & \, b^{\top}U+m^{\top}\Lambda\\ s.t.~ & A^\top U+B^\top\Lambda\geq u~\left[ \pi\right] \end{align*}
In the bus problem, the dual becomes
\begin{align*} \min_{U_{x}^{t},\lambda_{y}^{t}\geq0} & \sum_{x\in\mathcal{X}}q_{x}U_{x}^{0}+\sum_{y\in\mathcal{Y}}\sum_{t\in\mathcal{T}}k_{y}\lambda_{y}^{t}\\ s.t.~ & U_{x}^{t}\geq u_{xy}^{t}-\lambda_{y}^{t}+\sum_{x^{\prime}}U_{x^{\prime}}^{t+1}P_{x^{\prime}|xy}~\forall x\in\mathcal{X},~y\in\mathcal{Y},~t\in\mathcal{T}\backslash\left\{ T-1\right\} \\ & U_{x}^{T-1}\geq u_{xy}^{T-1}-\lambda_{y}^{T-1}~\forall x\in\mathcal{X},y\in\mathcal{Y} \end{align*}
and $\lambda_{y}^{t}$ is interpreted as the shadow price of alternative $y$ at time $t$.
Solution to the exercise. The constraints $\pi^t_{xy} \leq k_y$ express in matrix form as
$$vec_C(\pi) \leq 1_{XT} \otimes k.$$
However, when an upper bound $k_y$ is $+\infty$, we'd rather drop the corresponding constraints. Taking $k_0=+\infty$ and $k_1=1$, the matrix form becomes
$$\left(I_{XT} \otimes (0,1)\right) vec_C(\pi) \leq 1_{XT},$$
which we code as:
Abis_tx_txy = spr.kron(spr.identity(small_example.nbT * small_example.nbX),np.array([0,1]))
bbis_tx = np.ones(small_example.nbX * small_example.nbT )
m = grb.Model()
pi_txy = m.addMVar(shape=small_example.nbT*small_example.nbX*small_example.nbY )
m.setObjective(small_example.u_txy() @ pi_txy, grb.GRB.MAXIMIZE)
m.addConstr( small_example.A_tx_txy() @ pi_txy == small_example.b_tx())
m.addConstr( Abis_tx_txy @ pi_txy <= bbis_tx)
m.optimize()
pioptbis_t_x_y = np.array(m.getAttr('X')).reshape((small_example.nbT,small_example.nbX,small_example.nbY))
Uoptbis_t_x = np.array(m.getAttr('pi'))[:(small_example.nbX * small_example.nbT)].reshape((small_example.nbT,small_example.nbX))
Uoptbis_t_x[0,]
Warning for adding constraints: zero or small (< 1e-13) coefficients, ignored
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (win64)
Thread count: 6 physical cores, 12 logical processors, using up to 12 threads
Optimize a model with 24 rows, 24 columns and 63 nonzeros
Model fingerprint: 0xf58cf4ba
Coefficient statistics:
  Matrix range     [3e-01, 1e+00]
  Objective range  [2e+03, 8e+03]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 21 rows and 17 columns
Presolve time: 0.01s
Presolved: 3 rows, 7 columns, 10 nonzeros
Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0   -1.5354500e+04   1.250000e+00   0.000000e+00      0s
       3   -2.2023625e+04   0.000000e+00   0.000000e+00      0s
Solved in 3 iterations and 0.01 seconds
Optimal objective -2.202362500e+04
array([ -2999.625 , -10185.04166667, -11475.83333333])
Another plausible constraint is to assume that the maximum number of buses that can go on overhaul at the same time is $k$, say e.g. $k=1.5$. This constraint now writes $$\left( I_{T}\otimes 1_{X}^{\top }\otimes (0,1)\right) vec_C\left( \pi \right) \leq k 1_{T},$$ and is coded as:
Abis_tx_txy = spr.kron(spr.kron(spr.identity(small_example.nbT ), np.ones(small_example.nbX)),np.array([0,1]))
bbis_tx = 1.5 * np.ones( small_example.nbT )
m = grb.Model()
pi_txy = m.addMVar(shape=small_example.nbT*small_example.nbX*small_example.nbY )
m.setObjective(small_example.u_txy() @ pi_txy, grb.GRB.MAXIMIZE)
m.addConstr( small_example.A_tx_txy() @ pi_txy == small_example.b_tx())
m.addConstr( Abis_tx_txy @ pi_txy <= bbis_tx)
m.optimize()
pioptbis_t_x_y = np.array(m.getAttr('X')).reshape((small_example.nbT,small_example.nbX,small_example.nbY))
Uoptbis_t_x = np.array(m.getAttr('pi'))[:(small_example.nbX * small_example.nbT)].reshape((small_example.nbT,small_example.nbX))
Uoptbis_t_x[0,]
Warning for adding constraints: zero or small (< 1e-13) coefficients, ignored
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (linux64)
Thread count: 2 physical cores, 4 logical processors, using up to 4 threads
Optimize a model with 16 rows, 24 columns and 63 nonzeros
Model fingerprint: 0xca47d0bd
Coefficient statistics:
  Matrix range     [2e-01, 1e+00]
  Objective range  [2e+03, 8e+03]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 2e+00]
Presolve removed 12 rows and 16 columns
Presolve time: 0.02s
Presolved: 4 rows, 8 columns, 14 nonzeros
Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0   -1.5354500e+04   1.250000e+00   0.000000e+00      0s
       4   -2.2360146e+04   0.000000e+00   0.000000e+00      0s
Solved in 4 iterations and 0.02 seconds
Optimal objective -2.236014583e+04
array([ -2999.625 , -10185.04166667, -10185.04166667])
In these cases, the linear programming formulation is our only way to compute the solution: the backward induction algorithm no longer applies because of the capacity constraints.