import numpy as np
import scipy as sp
import scipy.linalg  # "import scipy" alone may not load the linalg submodule used below
import matplotlib.pyplot as plt
import pandas as pd
import sympy as sy
sy.init_printing()
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
LU factorization is valued for its computational efficiency: it mainly facilitates solving systems of linear equations, and its flop count (number of floating-point operations) for a single solve is comparable to that of Gauss-Jordan elimination.
Nonetheless, LU factorization is particularly handy if you need to solve $Ax = b$ repeatedly.
For example, if you have a set of right-hand sides $\{b_i,\ i = 1, 2, 3, \dots\}$ to substitute into the system, such that $$ Ax=b_1,\quad Ax=b_2,\quad Ax=b_3,\quad Ax=b_4, \quad\dots $$ we only decompose $A$ once, whereas the Gauss-Jordan algorithm has to redo its row operations for every augmented matrix $[A\,|\,b_i]$.
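To make this concrete, here is a minimal sketch of the factor-once, solve-many workflow using SciPy's lu_factor and lu_solve (the matrix and right-hand sides below are made-up illustrative values):
from scipy.linalg import lu_factor, lu_solve

# illustrative values only
A_demo = np.array([[4.0, 3.0], [6.0, 3.0]])
b_list = [np.array([10.0, 12.0]), np.array([1.0, 0.0]), np.array([7.0, 9.0])]

lu, piv = lu_factor(A_demo)          # factor A once: O(n^3) work
for b_i in b_list:
    x_i = lu_solve((lu, piv), b_i)   # each additional solve costs only O(n^2)
    print(x_i)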
We aim to decompose a matrix $A$ into $L$ and $U$, lower and upper triangular matrices respectively, where $L$ has $1$'s on its principal diagonal. $$ A = LU $$ For instance, $$ A=\underbrace{\left[\begin{array}{cccc} 1 & 0 & 0 & 0 \\ * & 1 & 0 & 0 \\ * & * & 1 & 0 \\ * & * & * & 1 \end{array}\right]}_{L}\underbrace{\left[\begin{array}{ccccc} \blacksquare & * & * & * & * \\ 0 & \blacksquare & * & * & * \\ 0 & 0 & 0 & \blacksquare & * \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]}_{U} $$ where each $\blacksquare$ marks a pivot and $*$ is any entry.
The plausibility of the decomposition hinges on whether $A$ can be converted into an upper triangular matrix by a series of row operations, i.e.
$$E_p \cdots E_2E_1 A = U$$ Then
$$A = (E_p \cdots E_2E_1)^{-1}U = LU$$ where
$$L = (E_p \cdots E_2E_1)^{-1}$$ We will hand-calculate an example here:
$$ A = \begin{bmatrix} 9 & 3 & 6\\3 & 4 & 6\\0 & 8 & 8\\ \end{bmatrix} $$ Two row-replacement operations triangularize $A$: $E_1$ subtracts $\frac{1}{3}$ of row $1$ from row $2$, then $E_2$ subtracts $\frac{8}{3}$ of row $2$ from row $3$, so that $E_2E_1A = U$. We could invert the product $E_2E_1$, or we can compute $E_1^{-1}$ and $E_2^{-1}$ directly, because $$ L = E_1^{-1}E_2^{-1} $$
Construct augmented matrices for $E_1$ and $E_2$: $$ [E_1| I]=\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0\\ -\frac{1}{3} & 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 &0 &0 &1 & 0 & 0\\ 0& 1& 0& \frac{1}{3} & 1 & 0\\ 0 & 0 & 1 &0 & 0 & 1 \end{array}\right] =[I|E_1^{-1}] $$
$$ [E_2| I]= \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 1 & 0\\ 0 & -\frac{8}{3} & 1 & 0 & 0 & 1\\ \end{array} \right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0 &\frac{8}{3}& 1\\ \end{array} \right] =[I|E_2^{-1}] $$ And we get the inverses of $E_1$ and $E_2$: $$ E_1^{-1} = \left[\begin{array}{ccc} 1& 0& 0 \\ \frac{1}{3}& 1& 0\\ 0& 0 &1 \end{array}\right],\qquad E_2^{-1} = \left[\begin{array}{ccc} 1& 0& 0 \\ 0& 1& 0\\ 0& \frac{8}{3} &1 \end{array}\right] $$
The final result of the LU decomposition: $$ A = LU = \underbrace{ \left[\begin{array}{ccc} 1& 0& 0 \\ \frac{1}{3}&1& 0\\ 0& \frac{8}{3} &1 \end{array}\right]}_{L=E_1^{-1}E_2^{-1}} \underbrace{ \begin{bmatrix} 9 & 3 & 6\\0 & 3 & 4\\0 & 0 & -\frac{8}{3}\\ \end{bmatrix}}_{E_2E_1A=U} $$
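Since SymPy is already imported, we can double-check the hand calculation in exact arithmetic; the matrices below are exactly the $L$ and $U$ derived above:
A = sy.Matrix([[9, 3, 6], [3, 4, 6], [0, 8, 8]])
L = sy.Matrix([[1, 0, 0], [sy.Rational(1, 3), 1, 0], [0, sy.Rational(8, 3), 1]])
U = sy.Matrix([[9, 3, 6], [0, 3, 4], [0, 0, sy.Rational(-8, 3)]])
L * U == A  # True: the hand-computed factors reproduce A exactly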
We can compare this with SciPy's results.
A = np.array([[9, 3, 6], [3, 4, 6], [0, 8, 8]]); A
array([[9, 3, 6], [3, 4, 6], [0, 8, 8]])
P, L, U = sp.linalg.lu(A)
P
L
U
array([[1., 0., 0.], [0., 0., 1.], [0., 1., 0.]])
array([[1. , 0. , 0. ], [0. , 1. , 0. ], [0.33333333, 0.375 , 1. ]])
array([[9., 3., 6.], [0., 8., 8.], [0., 0., 1.]])
P@L@U # this is A
array([[9., 3., 6.], [3., 4., 6.], [0., 8., 8.]])
It is easy to notice that SciPy's lu function gives us more than $L$ and $U$: it also returns $P$, a permutation matrix. In theory we don't need row switches to convert this $A$ into $U$, but in some situations we must make row switches beforehand, otherwise the decomposition fails to materialize (and in floating-point arithmetic, pivoting also improves numerical stability).
Thus SciPy uses the $PLU$ decomposition so that the procedure is always possible: $$ A = PLU $$
Also, $P = P^{-1}$ here. Why? It's easiest to see with an augmented matrix: an elementary matrix that switches two rows is its own inverse. (In general, a permutation matrix satisfies $P^{-1} = P^T$.)
$$ [P| I]=\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 & 1 & 0 \end{array}\right]=[I| P^{-1}] =[P^{-1}|I] $$ With this knowledge, what we are decomposing is actually
$$ PA = LU $$ Rearranging, we can see where $P$ and $L$ come from: $$ A = \underbrace{E_0^{-1}}_{P} \underbrace{(E_1^{-1}E_2^{-1})}_{L}U $$ where $E_0$ denotes the row-switching elementary matrix, so $E_0^{-1} = P$.
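We can confirm both facts numerically with the $P$, $L$, $U$ returned by SciPy above: this particular $P$ swaps a single pair of rows, so $PP = I$, and the factors satisfy $PA = LU$ (equivalently $A = PLU$, since $P = P^{-1}$):
np.allclose(P @ P, np.eye(3))  # True: a single row exchange is its own inverse
np.allclose(L @ U, P @ A)      # True: PA = LU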
Solve the linear system: $$ \begin{align} 3x_1-7x_2 -2x_3+2x_4&=-9\\ -3x_1+5x_2 +x_3 &=5\\ 6x_1-4x_2 -5x_4&=7\\ -9x_1+5x_2 -5x_3+12x_4&=11\\ \end{align} $$ In matrix form: $$ \underbrace{\left[\begin{array}{rrrr} 3 & -7 & -2 & 2 \\ -3 & 5 & 1 & 0 \\ 6 & -4 & 0 & -5 \\ -9 & 5 & -5 & 12 \end{array}\right]}_{A} \left[\begin{array}{r} x_1 \\ x_2 \\ x_3 \\ x_4 \end{array}\right] = \underbrace{\left[\begin{array}{r} -9 \\ 5 \\ 7 \\ 11 \end{array}\right]}_{b} $$ Perform $LU$ decomposition:
$$\underbrace{\left[\begin{array}{rrrr} 3 & -7 & -2 & 2 \\ -3 & 5 & 1 & 0 \\ 6 & -4 & 0 & -5 \\ -9 & 5 & -5 & 12 \end{array}\right]}_{A} =\underbrace{ \left[\begin{array}{rrrr} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 2 & -5 & 1 & 0 \\ -3 & 8 & 3 & 1 \end{array}\right]}_{L}\underbrace{\left[\begin{array}{rrrr} 3 & -7 & -2 & 2 \\ 0 & -2 & -1 & 2 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & -1 \end{array}\right]}_{U}$$ Replacing $A$ by $LU$, we get $L(Ux) = b$; now solve this pair of equations:
$$ Ly = b,\qquad Ux = y $$ Construct the augmented matrix $[L|b]$ and row-reduce: $$ [L|b]=\left[\begin{array}{rrrr|r} 1 & 0 & 0 & 0 & -9\\ -1 & 1 & 0 & 0 & 5\\ 2 & -5 & 1 & 0 & 7\\ -3 & 8 & 3 & 1 & 11 \end{array}\right] \sim \left[\begin{array}{rrrr|r} 1 & 0 & 0 & 0 & -9\\ 0 & 1 & 0 & 0 & -4\\ 0 & 0 & 1 & 0 & 5\\ 0 & 0 & 0 & 1 & 1 \end{array}\right]=[I|y] $$ so $y = (-9, -4, 5, 1)^T$.
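Instead of row-reducing by hand, we can let SciPy do the forward substitution; a minimal check with solve_triangular:
from scipy.linalg import solve_triangular

L = np.array([[1, 0, 0, 0], [-1, 1, 0, 0], [2, -5, 1, 0], [-3, 8, 3, 1]], dtype=float)
b = np.array([-9, 5, 7, 11], dtype=float)
solve_triangular(L, b, lower=True)  # forward substitution: array([-9., -4., 5., 1.])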
Next we solve $Ux = y$. To show this in NumPy:
U = np.array([[3, -7, -2, 2], [0, -2, -1, 2], [0, 0, -1, 1], [0, 0, 0,-1]])
y = np.array([-9, -4, 5, 1])
np.linalg.solve(U, y)
array([ 3., 4., -6., -1.])
If the process is correct, this is the solution of the system, and we can verify it by invoking np.linalg.solve() on the original system.
A = np.array([[3, -7, -2, 2], [-3, 5, 1, 0], [6, -4, 0, -5], [-9, 5, -5, 12]])
b = np.array([-9, 5, 7, 11])
np.linalg.solve(A, b)
array([ 3., 4., -6., -1.])
The results are the same: the $LU$ decomposition works!
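This two-stage solve is exactly what scipy.linalg.lu_factor and lu_solve bundle together; factoring $A$ once reproduces the same solution:
from scipy.linalg import lu_factor, lu_solve

lu, piv = lu_factor(A)   # A and b are the 4x4 system defined above
lu_solve((lu, piv), b)   # array([ 3.,  4., -6., -1.])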
Cholesky decomposition is analogous to taking the square root of a matrix: it factors a symmetric positive-definite matrix into a lower triangular factor times its transpose. A common example is the decomposition of a covariance matrix, where the $L$ matrix plays a role similar to that of a standard deviation. $$ \Sigma = L L^T $$
Suppose we have a vector of uncorrelated random variables $\mathbf{z}$. Each element of $\mathbf{z}$ is a standard normal random variable with zero mean and unit variance. In mathematical terms, $\mathbf{z} \sim \mathcal{N}(0, I)$, where $I$ is the identity matrix. $$ \mathbf{z}=\left(\begin{array}{c} z_1 \\ z_2 \\ \vdots \\ z_n \end{array}\right) $$
We can transform the uncorrelated vector $\mathbf{z}$ into a correlated vector $x$: $$ x = Lz $$
To understand this, we examine the covariance matrix of the transformed vector $x$: $$ \text{Cov}(x) = \text{Cov}(Lz) = L \text{Cov}(z) L^T = L I L^T = L L^T = \Sigma $$ This shows that $\text{Cov}(x)$ has the desired covariance structure.
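As a quick sanity check of this identity, we can factor the covariance matrix used in the example below and confirm that $LL^T$ reconstructs it:
Sigma = np.array([[1, 0.8, 0.7], [0.8, 1, 0.8], [0.7, 0.8, 1]])
L = np.linalg.cholesky(Sigma)  # lower triangular Cholesky factor
np.allclose(L @ L.T, Sigma)    # True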
Let's see a numerical example.
# use antithetic variates to reduce variance
n = 1000
z = np.random.normal(size=(n, 3))   # n i.i.d. standard normal draws in R^3
z2 = -z                             # antithetic counterparts
z = np.concatenate((z, z2), axis=0)
cov_matrix = np.array([[1, 0.8, 0.7], [0.8, 1, 0.8], [0.7, 0.8, 1]])
L = np.linalg.cholesky(cov_matrix)  # lower triangular factor of the target covariance
z = z @ L.T                         # each row is now a sample of x = Lz
np.cov(z.T)
array([[0.99424759, 0.80479653, 0.71424917], [0.80479653, 1.00962669, 0.81794008], [0.71424917, 0.81794008, 1.02650048]])
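Since the target covariance matrix has unit variances, it doubles as the target correlation matrix, so the sample correlations should also land near $0.8$ and $0.7$:
np.corrcoef(z.T)  # sample correlations, close to the off-diagonals of cov_matrix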