For simplicity we consider a square matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$.
History: the Chinese mathematical text The Nine Chapters on the Mathematical Art, Isaac Newton, and Carl Friedrich Gauss.
A toy example.
A = [2.0 1.0 -1.0; -3.0 -1.0 2.0; -2.0 1.0 2.0]
3×3 Array{Float64,2}:
  2.0   1.0  -1.0
 -3.0  -1.0   2.0
 -2.0   1.0   2.0
b = [8.0; -11.0; -3.0]
3-element Array{Float64,1}:
   8.0
 -11.0
  -3.0
# solve linear equation
A \ b
3-element Array{Float64,1}:
  2.0
  3.0
 -1.0
What happens when we call A \ b to solve a linear equation?
Gaussian elimination zeroes sub-diagonal entries using elementary operator matrices $\mathbf{E}_{ji}(c) = \mathbf{I} + c \, \mathbf{e}_j \mathbf{e}_i^T$. Left-multiplying $\mathbf{A}$ by $\mathbf{E}_{ji}(c)$ replaces row $j$ by (row $j$) $+\; c \times$ (row $i$), which takes only $2m$ flops for a matrix with $m$ columns.
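A minimal sketch of this row operation in Julia; the indices j, i and multiplier c are illustrative (they mimic the E21 step below), and we work on a copy so A is untouched:

# apply E_ji(c) without forming the matrix: row j ← row j + c ⋅ row i
B = copy(A)              # keep A intact for the steps below
j, i, c = 2, 1, 1.5      # illustrative values, matching E21
B[j, :] += c * B[i, :]   # 2m flops; no matrix-matrix multiply needed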
Column 1:
E21 = [1.0 0.0 0.0; 1.5 1.0 0.0; 0.0 0.0 1.0]
3×3 Array{Float64,2}:
 1.0  0.0  0.0
 1.5  1.0  0.0
 0.0  0.0  1.0
# zero (2, 1) entry
E21 * A
3×3 Array{Float64,2}:
  2.0  1.0  -1.0
  0.0  0.5   0.5
 -2.0  1.0   2.0
E31 = [1.0 0.0 0.0; 0.0 1.0 0.0; 1.0 0.0 1.0]
3×3 Array{Float64,2}:
 1.0  0.0  0.0
 0.0  1.0  0.0
 1.0  0.0  1.0
# zero (3, 1) entry
E31 * E21 * A
3×3 Array{Float64,2}:
 2.0  1.0  -1.0
 0.0  0.5   0.5
 0.0  2.0   1.0
Column 2:
E32 = [1.0 0.0 0.0; 0.0 1.0 0.0; 0.0 -4.0 1.0]
3×3 Array{Float64,2}:
 1.0   0.0  0.0
 0.0   1.0  0.0
 0.0  -4.0  1.0
# zero (3, 2) entry
E32 * E31 * E21 * A
3×3 Array{Float64,2}:
 2.0  1.0  -1.0
 0.0  0.5   0.5
 0.0  0.0  -1.0
For the $k$-th column, $$ \mathbf{M}_k = \mathbf{E}_{nk}(c_{nk}) \cdots \mathbf{E}_{k+1,k}(c_{k+1,k}) = \begin{pmatrix} 1 & \\ & \ddots \\ & & 1 & \\ & & c_{k+1,k} & 1\\ & & \vdots & & \ddots \\ & & c_{n,k} & & & 1 \end{pmatrix}. $$
M1 = E31 * E21
3×3 Array{Float64,2}:
 1.0  0.0  0.0
 1.5  1.0  0.0
 1.0  0.0  1.0
M2 = E32
3×3 Array{Float64,2}:
 1.0   0.0  0.0
 0.0   1.0  0.0
 0.0  -4.0  1.0
inv(M1)
3×3 Array{Float64,2}:
  1.0  0.0  0.0
 -1.5  1.0  0.0
 -1.0  0.0  1.0
inv(M2)
3×3 Array{Float64,2}:
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  4.0  1.0
Gaussian elimination does
$$
\mathbf{M}_{n-1} \cdots \mathbf{M}_1 \mathbf{A} = \mathbf{U}.
$$
Let $$ \mathbf{L} = \mathbf{M}_1^{-1} \cdots \mathbf{M}_{n-1}^{-1}; $$ as the inv(M1) and inv(M2) examples above show, inverting each $\mathbf{M}_k$ simply flips the signs of the multipliers below the diagonal.
Thus we have the LU decomposition $$ \mathbf{A} = \mathbf{L} \mathbf{U}, $$ where $\mathbf{L}$ is unit lower triangular and $\mathbf{U}$ is upper triangular.
The whole LU algorithm is done in place, i.e., $\mathbf{A}$ is overwritten by $\mathbf{L}$ and $\mathbf{U}$.
LU decomposition exists if the principal sub-matrix $\mathbf{A}[1:k, 1:k]$ is non-singular for $k=1,\ldots,n-1$.
If the LU decomposition exists and $\mathbf{A}$ is non-singular, then the LU decomposition is unique and $$ \det(\mathbf{A}) = \det(\mathbf{L}) \det(\mathbf{U}) = \prod_{k=1}^n u_{kk}. $$
Actual implementations can differ: outer product LU ($kij$ loop), block outer product LU (higher level-3 fraction), Crout's algorithm ($jki$ loop).
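To make the in-place idea concrete, here is a minimal unpivoted outer product ($kij$ loop) sketch; lu_kij! is a hypothetical name, and production code (e.g. LAPACK) adds pivoting and blocking:

# minimal sketch: unpivoted outer product LU (kij loop); A is overwritten in place,
# with L in the strict lower triangle (unit diagonal implicit) and U in the upper triangle
function lu_kij!(A::Matrix{Float64})
    n = size(A, 1)
    for k in 1:n-1
        for i in k+1:n
            A[i, k] /= A[k, k]                # multiplier ℓ_ik = a_ik / a_kk
            for j in k+1:n
                A[i, j] -= A[i, k] * A[k, j]  # rank-1 update of the trailing block
            end
        end
    end
    A
end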
Given LU decomposition $\mathbf{A} = \mathbf{L} \mathbf{U}$, solving $(\mathbf{L} \mathbf{U}) \mathbf{x} = \mathbf{b}$ for one right hand side costs $2n^2$ flops:
$$ \mathbf{L} \mathbf{y} = \mathbf{b} \quad \text{(forward substitution, } n^2 \text{ flops)}, $$
$$ \mathbf{U} \mathbf{x} = \mathbf{y} \quad \text{(back substitution, } n^2 \text{ flops)}. $$
Therefore to solve $\mathbf{A} \mathbf{x} = \mathbf{b}$ via LU, we need a total of $$ \frac{2}{3} n^3 + 2 n^2 \quad \text{flops:} $$ $\frac{2}{3} n^3$ for the decomposition and $2 n^2$ for the two triangular solves.
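A minimal sketch of the two substitutions; forward_sub and back_sub are hypothetical helper names (in practice these are BLAS/LAPACK triangular solves):

# forward substitution: solve L y = b for unit lower triangular L (n^2 flops)
function forward_sub(L::Matrix{Float64}, b::Vector{Float64})
    y = copy(b)
    for i in 2:length(y), j in 1:i-1
        y[i] -= L[i, j] * y[j]    # unit diagonal, so no division needed
    end
    y
end

# back substitution: solve U x = y for upper triangular U (n^2 flops)
function back_sub(U::Matrix{Float64}, y::Vector{Float64})
    n = length(y)
    x = copy(y)
    for i in n:-1:1
        for j in i+1:n
            x[i] -= U[i, j] * x[j]
        end
        x[i] /= U[i, i]
    end
    x
end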
No inversion mentality:
Whenever we see a matrix inverse, we should think in terms of solving linear equations (see the sketch after the list below).
We do not compute a matrix inverse unless
1. it is necessary to compute standard errors,
2. the number of right hand sides is much larger than $n$, or
3. $n$ is small.
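A sketch of the mentality on the toy example (both lines compute the same x; the first is cheaper and more accurate):

x = A \ b        # LU + two triangular solves: (2/3)n^3 + 2n^2 flops
x = inv(A) * b   # forms the full inverse first; avoid unless a case above applies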
Is there a solution to $\mathbf{A} \mathbf{x} = \mathbf{b}$ for an arbitrary $\mathbf{b}$? Does GE/LU work for $\mathbf{A}$? For example, $\mathbf{A} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ is non-singular, yet GE/LU without pivoting fails at the very first step because $a_{11} = 0$. More generally, what if, during the LU procedure, the pivot $a_{kk}$ is 0 or nearly 0 due to underflow?
Solution: pivoting.
Partial pivoting: before zeroing the $k$-th column, the row with $\max_{i=k}^n |a_{ik}|$ is moved into the $k$-th row.
LU with partial pivoting yields $$ \mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U}, $$ where $\mathbf{P}$ is a permutation matrix, $\mathbf{L}$ is unit lower triangular with $|\ell_{ij}| \le 1$, and $\mathbf{U}$ is upper triangular.
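A minimal sketch of elimination with partial pivoting; lu_gepp! is a hypothetical name, and indmax picks the pivot row:

# minimal sketch: LU with partial pivoting; A is overwritten by the factors,
# p records the row permutation, and the result satisfies (original A)[p, :] = L * U
function lu_gepp!(A::Matrix{Float64})
    n = size(A, 1)
    p = collect(1:n)
    for k in 1:n-1
        i = k - 1 + indmax(abs.(A[k:n, k]))  # row with largest |a_ik|, i = k,…,n
        if i != k
            A[[k, i], :] = A[[i, k], :]      # swap rows k and i
            p[[k, i]]    = p[[i, k]]
        end
        for r in k+1:n
            A[r, k] /= A[k, k]               # |ℓ_rk| ≤ 1 by choice of pivot
            for c in k+1:n
                A[r, c] -= A[r, k] * A[k, c]
            end
        end
    end
    A, p
end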
Complete pivoting: the entry with the largest absolute value in the sub-matrix $\mathbf{A}[k:n, k:n]$ is permuted to the $(k,k)$-th entry, using both row and column interchanges. This yields the decomposition $$ \mathbf{P} \mathbf{A} \mathbf{Q} = \mathbf{L} \mathbf{U}, $$ where $|\ell_{ij}| \le 1$. Partial pivoting is the method of choice in practice; complete pivoting is more stable but more costly.
LAPACK: ?GETRF does $\mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U}$ (LU decomposition with partial pivoting) in place.
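In Julia 0.6 this routine can be called directly; a sketch (getrf! overwrites its argument with the factors and returns the pivot indices):

Acopy = copy(A)   # getrf! destroys its input, so work on a copy
LU, ipiv, info = Base.LinAlg.LAPACK.getrf!(Acopy)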
R: solve() implicitly performs LU decomposition (it is a wrapper of the LAPACK routine DGESV). solve() allows specifying a single or multiple right hand sides; if none is given, it computes the matrix inverse. The Matrix package contains an lu() function that outputs L, U, and pivot.
Julia:
A
3×3 Array{Float64,2}:
  2.0   1.0  -1.0
 -3.0  -1.0   2.0
 -2.0   1.0   2.0
# second argument indicates partial pivoting (default) or not
alu = lufact(A, Val{true})
typeof(alu)
Base.LinAlg.LU{Float64,Array{Float64,2}}
alu[:L]
3×3 Array{Float64,2}:
  1.0       0.0  0.0
  0.666667  1.0  0.0
 -0.666667  0.2  1.0
alu[:U]
3×3 Array{Float64,2}:
 -3.0  -1.0      2.0
  0.0   1.66667  0.666667
  0.0   0.0      0.2
alu[:p]
3-element Array{Int64,1}:
 2
 3
 1
alu[:P]
3×3 Array{Float64,2}:
 0.0  1.0  0.0
 0.0  0.0  1.0
 1.0  0.0  0.0
alu[:L] * alu[:U]
3×3 Array{Float64,2}:
 -3.0  -1.0   2.0
 -2.0   1.0   2.0
  2.0   1.0  -1.0
A[alu[:p], :]
3×3 Array{Float64,2}:
 -3.0  -1.0   2.0
 -2.0   1.0   2.0
  2.0   1.0  -1.0
# this is doing two triangular solves
alu \ b
3-element Array{Float64,1}:
  2.0
  3.0
 -1.0
det(A) # this does LU!
-0.9999999999999998
det(alu) # this is cheap
-0.9999999999999998
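This is cheap because, with partial pivoting, $\det(\mathbf{A}) = \pm \prod_{k=1}^n u_{kk}$, the sign being that of the permutation; a quick check on the toy example:

# determinant read off the LU factor: product of U's diagonal times the permutation sign;
# here p = [2, 3, 1] is an even permutation, so no sign flip
prod(diag(alu[:U]))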
inv(A) # this does LU!
3×3 Array{Float64,2}:
  4.0   3.0  -1.0
 -2.0  -2.0   1.0
  5.0   4.0  -1.0
inv(alu) # this is cheap
3×3 Array{Float64,2}:
  4.0   3.0  -1.0
 -2.0  -2.0   1.0
  5.0   4.0  -1.0
Sections II.5.2 and II.5.3 of Computational Statistics by James Gentle (2010).
The definitive reference is Chapter 3 of the book Matrix Computations by Gene Golub and Charles Van Loan.
versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)