Richard M. Murray, 20 Jan 2022 (updated 7 Jul 2024)
This example works through the linear quadratic finite time optimal control problem. We assume that we have a linear system of the form $$ \dot x = A x + Bu $$ and that we want to minimize a cost function of the form $$ \int_0^T (x^T Q_x x + u^T Q_u u)\, dt + x^T(T) P_1 x(T). $$ We show how to compute the solution of the Riccati ODE and use this to obtain an optimal (time-varying) linear controller.
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import control as ct
import control.optimal as opt
import control.flatsys as fs
import time
We use the linearized dynamics of the vehicle steering problem as our linear system. This is mainly for convenience (since we have some intuition about its behavior).
# Vehicle dynamics (bicycle model)
# Function to take states, inputs and return the flat flag
def _kincar_flat_forward(x, u, params={}):
    # Get the parameter values
    b = params.get('wheelbase', 3.)
    #! TODO: add dir processing

    # Create a list of arrays to store the flat output and its derivatives
    zflag = [np.zeros(3), np.zeros(3)]

    # Flat output is the x, y position of the rear wheels
    zflag[0][0] = x[0]
    zflag[1][0] = x[1]

    # First derivatives of the flat output
    zflag[0][1] = u[0] * np.cos(x[2])  # dx/dt
    zflag[1][1] = u[0] * np.sin(x[2])  # dy/dt

    # First derivative of the angle
    thdot = (u[0]/b) * np.tan(u[1])

    # Second derivatives of the flat output (setting vdot = 0)
    zflag[0][2] = -u[0] * thdot * np.sin(x[2])
    zflag[1][2] =  u[0] * thdot * np.cos(x[2])

    return zflag
# Function to take the flat flag and return states, inputs
def _kincar_flat_reverse(zflag, params={}):
    # Get the parameter values
    b = params.get('wheelbase', 3.)
    dir = params.get('dir', 'f')

    # Create a vector to store the state and inputs
    x = np.zeros(3)
    u = np.zeros(2)

    # Given the flat variables, solve for the state
    x[0] = zflag[0][0]  # x position
    x[1] = zflag[1][0]  # y position
    if dir == 'f':
        x[2] = np.arctan2(zflag[1][1], zflag[0][1])  # tan(theta) = ydot/xdot
    elif dir == 'r':
        # Angle is flipped by 180 degrees (since v < 0)
        x[2] = np.arctan2(-zflag[1][1], -zflag[0][1])
    else:
        raise ValueError("unknown direction:", dir)

    # And next solve for the inputs
    u[0] = zflag[0][1] * np.cos(x[2]) + zflag[1][1] * np.sin(x[2])
    thdot_v = zflag[1][2] * np.cos(x[2]) - zflag[0][2] * np.sin(x[2])
    u[1] = np.arctan2(thdot_v, u[0]**2 / b)

    return x, u
# Function to compute the RHS of the system dynamics
def _kincar_update(t, x, u, params):
    b = params.get('wheelbase', 3.)     # get parameter values
    #! TODO: add dir processing
    dx = np.array([
        np.cos(x[2]) * u[0],
        np.sin(x[2]) * u[0],
        (u[0]/b) * np.tan(u[1])
    ])
    return dx
def _kincar_output(t, x, u, params):
    return x                            # return x, y, theta (full state)
# Create differentially flat input/output system
kincar = fs.FlatSystem(
    _kincar_flat_forward, _kincar_flat_reverse, name="kincar",
    updfcn=_kincar_update, outfcn=_kincar_output,
    inputs=('v', 'delta'), outputs=('x', 'y', 'theta'),
    states=('x', 'y', 'theta'))
# Utility function to plot lane change maneuver
def plot_lanechange(t, y, u, figure=None, yf=None):
    # Plot the xy trajectory
    plt.subplot(3, 1, 1, label='xy')
    plt.plot(y[0], y[1])
    plt.xlabel("x [m]")
    plt.ylabel("y [m]")
    if yf is not None:
        plt.plot(yf[0], yf[1], 'ro')

    # Plot the inputs as a function of time
    plt.subplot(3, 1, 2, label='v')
    plt.plot(t, u[0])
    plt.xlabel("Time $t$ [sec]")
    plt.ylabel("$v$ [m/s]")

    plt.subplot(3, 1, 3, label='delta')
    plt.plot(t, u[1])
    plt.xlabel("Time $t$ [sec]")
    plt.ylabel("$\\delta$ [rad]")

    plt.suptitle("Lane change maneuver")
    plt.tight_layout()
# Initial conditions
x0 = np.array([-40, -2., 0.])
u0 = np.array([10, 0]) # only used for linearization
Tf = 4
# Linearized dynamics
sys = kincar.linearize(x0, u0)
print(sys)
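As a quick check on the linearization, note that all of the eigenvalues of the linearized dynamics sit at the origin, so the open loop system will not converge to the origin on its own:

# Eigenvalues of the linearized dynamics: all at the origin, so feedback
# is required to bring the state to zero
print("eigenvalues of A =", np.linalg.eigvals(sys.A))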
We generate a trajectory for the system that minimizes the cost function above. Namely, starting from some initial condition $x(0) = x_0$, we wish to bring the system toward the origin without using too much control effort.
# Define the cost function and the terminal cost
# (try changing these later to see what happens)
Qx = np.diag([1, 1, 1]) # state costs
Qu = np.diag([1, 1]) # input costs
Pf = np.diag([1, 1, 1]) # terminal costs
The optimal solution satisfies the following equations, which follow from the maximum principle:
$$ \begin{aligned} \dot x &= \left(\frac{\partial H}{\partial \lambda}\right)^T = A x + Bu, \qquad & x(0) &= x_0, \\ -\dot \lambda &= \left(\frac{\partial H}{\partial x}\right)^T = Q_x x + A^T \lambda, \qquad & \lambda(T) &= P_1 x(T), \\ 0 &= \left(\frac{\partial H}{\partial u}\right)^T = Q_u u + B^T \lambda. && \end{aligned} $$The last condition can be solved to obtain the optimal controller
$$ u = -Q_u^{-1} B^T \lambda, $$which can be substituted into the equations for the optimal solution.
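Carrying out that substitution gives the coupled two-point boundary value problem $$ \dot x = A x - B Q_u^{-1} B^T \lambda, \quad x(0) = x_0, \qquad -\dot \lambda = Q_x x + A^T \lambda, \quad \lambda(T) = P_1 x(T), $$with the boundary conditions split between $t = 0$ and $t = T$, which is what makes it awkward to integrate directly.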
Given the linear nature of the dynamics, we attempt to find a solution by setting $\lambda(t) = P(t) x(t)$ where $P(t) \in {\mathbb R}^{n \times n}$. Substituting this into the necessary condition, we obtain
$$ \begin{aligned} & \dot\lambda = \dot P x + P \dot x = \dot P x + P(A - BQ_u^{-1} B^T P) x, \\ & \quad\implies\quad -\dot P x - PA x + PBQ_u^{-1}B^T P x = Q_x x + A^T P x. \end{aligned} $$This equation is satisfied if we can find $P(t)$ such that
$$ -\dot P = PA + A^T P - P B Q_u^{-1} B^T P + Q_x, \qquad P(T) = P_1. $$To solve a final value problem with $P(T) = P_1$, we set the "initial" condition to $P_1$ and then invert time, so that we solve
$$ \frac{dP}{d(-t)} = -\frac{dP}{dt} = PA + A^T P - P B Q_u^{-1} B^T P + Q_x, \qquad P(0) = P_1. $$Solving this equation from time $t = 0$ to time $t = T$ gives a solution that runs from $P(T)$ back to $P(0)$.
# Set up the Riccati ODE
def Pdot_reverse(t, x):
    # Get the P matrix from the state by resizing
    P = np.reshape(x, (sys.nstates, sys.nstates))

    # Compute the right hand side of the Riccati ODE
    Prhs = P @ sys.A + sys.A.T @ P + Qx - \
        P @ sys.B @ np.linalg.inv(Qu) @ sys.B.T @ P

    # Return P as a vector, *backwards* in time (no minus sign)
    return Prhs.reshape((-1))
# Solve the Riccati ODE (converting from matrix to vector and back)
P0 = np.reshape(Pf, (-1))
Psol = sp.integrate.solve_ivp(Pdot_reverse, (0, Tf), P0)
Pfwd = np.reshape(Psol.y, (sys.nstates, sys.nstates, -1))
# Reorder the solution in time
Prev = Pfwd[:, :, ::-1]
trev = Tf - Psol.t[::-1]
print("Trange = ", trev[0], "to", trev[-1])
print("P[Tf] =", Prev[:,:,-1])
print("P[0] =", Prev[:,:,0])
# For comparison, compute the infinite-horizon (algebraic Riccati) solution;
# the finite-horizon value at t = 0 should be close to it
_, P_lqr, _ = ct.lqr(sys.A, sys.B, Qx, Qu)
print("P_lqr =", P_lqr)
For solving the $x$ dynamics, we need a function to evaluate $P(t)$ at an arbitrary time (used by the integrator). We can do this with the SciPy interp1d function:
# Define an interpolation function for P
P = sp.interpolate.interp1d(trev, Prev)
print("P(0) =", P(0))
print("P(3.5) =", P(3.5))
print("P(4) =", P(4))
We now solve the $\dot x$ equations forward in time, using $P(t)$:
# Now solve the state forward in time
def xdot_forward(t, x):
    u = -np.linalg.inv(Qu) @ sys.B.T @ P(t) @ x
    return sys.A @ x + sys.B @ u
# Now simulate the closed loop system from the initial condition x0
xsol = sp.integrate.solve_ivp(xdot_forward, (0, Tf), x0)
tvec = xsol.t
x = xsol.y
print("x[0] =", x[:, 0])
print("x[Tf] =", x[:, -1])
# Finally compute the "desired" state and input values
xd = x
ud = np.zeros((sys.ninputs, tvec.size))
for i, t in enumerate(tvec):
    ud[:, i] = -np.linalg.inv(Qu) @ sys.B.T @ P(t) @ x[:, i]
plot_lanechange(tvec, xd, ud)
Note here that we are stabilizing the system to the origin (compared to some of the other examples, where we change lanes and so the final $y$ position is $y_\text{f} = 2$).
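As a final check (a sketch using only quantities already computed above), we can evaluate the finite-horizon cost of the simulated trajectory directly from its definition, integrating the running cost with the trapezoidal rule and adding the terminal cost. It should agree with the value $x_0^T P(0)\, x_0$ predicted by the Riccati solution, up to integration and interpolation error.

# Evaluate J = int_0^T (x' Qx x + u' Qu u) dt + x(T)' Pf x(T) along the trajectory
running = np.array([
    xd[:, i] @ Qx @ xd[:, i] + ud[:, i] @ Qu @ ud[:, i]
    for i in range(tvec.size)])
J = sp.integrate.trapezoid(running, tvec) + xd[:, -1] @ Pf @ xd[:, -1]
print("Cost along trajectory:", J)
print("Cost predicted by x0' P(0) x0:", x0 @ P(0) @ x0)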