sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] fansi_1.0.4     crayon_1.5.2    digest_0.6.31   utf8_1.2.2
 [5] IRdisplay_1.1   repr_1.1.5      lifecycle_1.0.3 jsonlite_1.8.4
 [9] evaluate_0.20   pillar_1.8.1    rlang_1.0.6     cli_3.6.0
[13] uuid_1.1-0      vctrs_0.5.2     IRkernel_1.3.2  tools_4.2.2
[17] glue_1.6.2      fastmap_1.1.0   compiler_4.2.2  base64enc_0.1-3
[21] pbdZMQ_0.3-9    htmltools_0.5.4
library(Matrix)
We consider computer algorithms for solving linear equations $\mathbf{A} \mathbf{x} = \mathbf{b}$, a ubiquitous task in statistics.
The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.
-- James Gentle, Matrix Algebra, Springer, New York (2007).
A flop (floating point operation) consists of a floating point addition, subtraction, multiplication, division, or comparison, and the usually accompanying fetch and store.
How do we measure the efficiency of an algorithm? Big O notation. If $n$ is the size of a problem, an algorithm has order $O(f(n))$ if the leading term in its flop count is $c \cdot f(n)$ for some constant $c$. For example,
- A * b, where A is $m \times n$ and b is $n \times 1$, takes $2mn$ or $O(mn)$ flops.
- A * B, where A is $m \times n$ and B is $n \times p$, takes $2mnp$ or $O(mnp)$ flops (see the timing sketch below).

Typical orders of computational complexity:

- Exponential order: $O(b^n)$ (NP-hard, "horrible")
- Polynomial order: $O(n^q)$ (doable)
- $O(n \log n)$ (fast)
- Linear order: $O(n)$ (fast)
- Log order: $O(\log n)$ (super fast)
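A minimal timing sketch of the two counts above (the sizes are arbitrary): matrix-matrix multiplication should take roughly $p$ times longer than matrix-vector multiplication.

set.seed(100)
m <- n <- p <- 1000
A <- matrix(rnorm(m * n), m, n)
b <- rnorm(n)
B <- matrix(rnorm(n * p), n, p)
system.time(A %*% b) # O(mn) flops
system.time(A %*% B) # O(mnp) flops: roughly p times more work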
Difference of $O(n^2)$ and $O(n \log n)$ on massive data. Suppose we have a teraflop supercomputer capable of doing $10^{12}$ flops per second. For a problem of size $n = 10^{12}$, an $O(n \log n)$ algorithm takes about $10^{12} \log(10^{12}) / 10^{12} \approx 28$ seconds, while an
$O(n^2)$ algorithm takes about $10^{24} / 10^{12} = 10^{12}$ seconds, which is approximately 31710 years!
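The arithmetic behind these numbers, as a quick sanity check (natural log assumed):

n <- 1e12 # problem size
flops_per_sec <- 1e12 # a teraflop machine
n * log(n) / flops_per_sec # O(n log n): about 28 seconds
n^2 / flops_per_sec / (3600 * 24 * 365) # O(n^2): about 31710 years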
Idea: turn the original problem into an easy one, e.g., a triangular system.
To solve $\mathbf{A} \mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is lower triangular
$$
\begin{bmatrix}
a_{11} & 0 & \cdots & 0 \\
a_{21} & a_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n
\end{bmatrix} = \begin{bmatrix}
b_1 \\ b_2 \\ \vdots \\ b_n
\end{bmatrix}.
$$
To solve $\mathbf{A} \mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is upper triangular
$$
\begin{bmatrix}
a_{11} & \cdots & a_{1,n-1} & a_{1n} \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & a_{n-1,n-1} & a_{n-1,n} \\
0 & 0 & 0 & a_{nn}
\end{bmatrix}
\begin{bmatrix}
x_1 \\ \vdots \\ x_{n-1} \\ x_n
\end{bmatrix} = \begin{bmatrix}
b_1 \\ \vdots \\ b_{n-1} \\ b_n
\end{bmatrix}.
$$
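Written out, these are the standard substitution recurrences; each step uses only previously computed entries, for a total of about $n^2$ flops either way:
$$
x_1 = b_1 / a_{11}, \qquad x_i = \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j \right) / a_{ii}, \quad i = 2, \ldots, n,
$$
for the lower-triangular system (forward substitution), and
$$
x_n = b_n / a_{nn}, \qquad x_i = \left( b_i - \sum_{j=i+1}^{n} a_{ij} x_j \right) / a_{ii}, \quad i = n-1, \ldots, 1,
$$
for the upper-triangular system (back substitution).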
Most numerical computing software packages (including R) are built on top of the BLAS (basic linear algebra subprograms).
(See the sessionInfo() output above.) The BLAS triangular system solve is done in place, i.e., $\mathbf{b}$ is overwritten by $\mathbf{x}$.
# forward substitution: b is overwritten by the solution of a lower-triangular system
for (i in seq_len(n)) {
for (j in seq_len(i-1)) {
b[i] <- b[i] - A[i, j] * b[j]
}
b[i] <- b[i] / A[i, i]
}
# backsolve (back substitution): b is overwritten by the solution of an upper-triangular system
for (i in rev(seq_len(n))) {
for (j in i + seq_len(max(n - i, 0))) {
b[i] <- b[i] - A[i, j] * b[j]
}
b[i] <- b[i] / A[i, i]
}
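A quick check that the forward-substitution loop matches R's built-in forwardsolve() (a minimal sketch with an arbitrary random lower-triangular matrix):

set.seed(1)
n <- 4
L <- matrix(rnorm(n * n), n, n)
L[upper.tri(L)] <- 0 # make L lower triangular
diag(L) <- diag(L) + 2 # keep the diagonal away from zero
b <- rnorm(n)
x <- b # forward substitution in place, as in the loop above
for (i in seq_len(n)) {
  for (j in seq_len(i - 1)) x[i] <- x[i] - L[i, j] * x[j]
  x[i] <- x[i] / L[i, i]
}
all.equal(x, forwardsolve(L, b)) # should be TRUE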
See ?backsolve and ?forwardsolve. The underlying BLAS triangular solve routine is dtrsm, where d stands for double precision.

set.seed(123) # seed
n <- 5
A <- Matrix(rnorm(n * n), nrow = n)
b <- rnorm(n)
# a random matrix
A
5 x 5 Matrix of class "dgeMatrix"
            [,1]       [,2]       [,3]       [,4]       [,5]
[1,] -0.56047565  1.7150650  1.2240818  1.7869131 -1.0678237
[2,] -0.23017749  0.4609162  0.3598138  0.4978505 -0.2179749
[3,]  1.55870831 -1.2650612  0.4007715 -1.9666172 -1.0260044
[4,]  0.07050839 -0.6868529  0.1106827  0.7013559 -0.7288912
[5,]  0.12928774 -0.4456620 -0.5558411 -0.4727914 -0.6250393
Al <- lower.tri(A) # returns a logical matrix of the same size as A, with TRUE entries in the (strict) lower triangle
Aupper <- A
Aupper[Al] <- 0
Aupper
5 x 5 Matrix of class "dgeMatrix"
           [,1]      [,2]      [,3]       [,4]       [,5]
[1,] -0.5604756 1.7150650 1.2240818  1.7869131 -1.0678237
[2,]  0.0000000 0.4609162 0.3598138  0.4978505 -0.2179749
[3,]  0.0000000 0.0000000 0.4007715 -1.9666172 -1.0260044
[4,]  0.0000000 0.0000000 0.0000000  0.7013559 -0.7288912
[5,]  0.0000000 0.0000000 0.0000000  0.0000000 -0.6250393
backsolve(Aupper, b)
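The lower-triangular counterpart works the same way (a small sketch mirroring the construction of Aupper above):

Alower <- A
Alower[upper.tri(A)] <- 0 # zero out the strict upper triangle
forwardsolve(Alower, b) # solve the lower-triangular system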
Eigenvalues of a triangular matrix $\mathbf{A}$ are diagonal entries $\lambda_i = a_{ii}$.
Determinant $\det(\mathbf{A}) = \prod_i a_{ii}$.
The product of two upper (lower) triangular matrices is upper (lower) triangular.
The inverse of an upper (lower) triangular matrix is upper (lower) triangular.
The product of two unit upper (lower) triangular matrices is unit upper (lower) triangular.
The inverse of a unit upper (lower) triangular matrix is unit upper (lower) triangular.
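These properties are easy to check numerically. A small sketch with arbitrary random upper-triangular matrices:

set.seed(2)
U1 <- matrix(rnorm(9), 3, 3); U1[lower.tri(U1)] <- 0
U2 <- matrix(rnorm(9), 3, 3); U2[lower.tri(U2)] <- 0
U1 %*% U2 # product is again upper triangular
solve(U1) # inverse is again upper triangular
det(U1) - prod(diag(U1)) # determinant equals product of diagonal entries (~0)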
A toy example.
History: the method is named after Carl Friedrich Gauss (1777–1855), although it stems from the notes of Isaac Newton. In fact, GE was known to Chinese mathematicians as early as 179 AD.
A <- t(Matrix(c(2.0, -4.0, 2.0, 4.0, -9.0, 7.0, 2.0, 1.0, 3.0), ncol=3)) # R uses column major ordering
A
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    2   -4    2
[2,]    4   -9    7
[3,]    2    1    3
b <- c(6.0, 20.0, 14.0)
# R's way to solve a linear equation
solve(A, b)
3 x 1 Matrix of class "dgeMatrix"
     [,1]
[1,]    2
[2,]    1
[3,]    3
What happens when we call solve(A, b)?
Applying an elementary (row) operator $\mathbf{E}_{jk}(c) = \mathbf{I} + c \, \mathbf{e}_j \mathbf{e}_k^T$ to a matrix with $m$ columns takes $2m$ flops (why? only row $j$ changes, and each of its $m$ entries needs one multiplication and one addition).
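For instance (a small hypothetical example; $\mathbf{e}_j$ is the $j$-th standard basis vector), applying $\mathbf{E}_{jk}(c)$ only touches row $j$:

set.seed(10)
m <- 4 # number of columns
A0 <- matrix(rnorm(3 * m), 3, m)
c0 <- 2.0; j <- 3; k <- 1
E <- diag(3); E[j, k] <- c0 # elementary operator E_jk(c)
A1 <- A0
A1[j, ] <- A1[j, ] + c0 * A0[k, ] # 2m flops: one multiply and one add per column
all.equal(E %*% A0, A1) # should be TRUE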
Let's get back to our toy example.
E21 <- t(Matrix(c(1.0, 0.0, 0.0, -2.0, 1.0, 0.0, 0.0, 0.0, 1.0), nrow=3)) # Step 1, inv(L1)
E21
3 x 3 sparse Matrix of class "dtCMatrix"
[1,]  1 . .
[2,] -2 1 .
[3,]  . . 1
# zero (2, 1) entry
E21 %*% A # Step 1, A'
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    2   -4    2
[2,]    0   -1    3
[3,]    2    1    3
E31 <- t(Matrix(c(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0), nrow=3)) # Step 2, find the corresponding matrix!
E31
3 x 3 sparse Matrix of class "dtCMatrix"
[1,]  1 . .
[2,]  . 1 .
[3,] -1 . 1
# zero (3, 1) entry
E31 %*% E21 %*% A # Step 2, A''
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    2   -4    2
[2,]    0   -1    3
[3,]    0    5    1
E32 <- t(Matrix(c(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 5.0, 1.0), nrow=3)) # Step 3, find the corresponding matrix!
E32
3 x 3 sparse Matrix of class "dtCMatrix"
[1,] 1 . .
[2,] . 1 .
[3,] . 5 1
# zero (3, 2) entry
E32 %*% E31 %*% E21 %*% A # Step 3, A'''
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    2   -4    2
[2,]    0   -1    3
[3,]    0    0   16
Thus we have arrived at an upper triangular matrix $\mathbf{U}$.
We can combine elementary operations by column.
For column 1,
M1 <- E31 %*% E21 # inv(L1 * L2)
M1
3 x 3 sparse Matrix of class "dtCMatrix"
[1,]  1 . .
[2,] -2 1 .
[3,] -1 . 1
M2 <- E32 # inv(L3)
M2
3 x 3 sparse Matrix of class "dtCMatrix"
[1,] 1 . .
[2,] . 1 .
[3,] . 5 1
solve(M1) # L1 * L2; solve(A) gives the inverse of A, but use it with caution (see below)
3 x 3 sparse Matrix of class "dtCMatrix"
[1,] 1 . .
[2,] 2 1 .
[3,] 1 . 1
solve(M2) # L3
3 x 3 sparse Matrix of class "dtCMatrix"
[1,] 1  . .
[2,] .  1 .
[3,] . -5 1
Gaussian elimination does
$$
\mathbf{M}_{n-1} \cdots \mathbf{M}_1 \mathbf{A} = \mathbf{U}.
$$
Let $\mathbf{L} = \mathbf{M}_1^{-1} \cdots \mathbf{M}_{n-1}^{-1}$.
Thus we have the LU decomposition $$ \mathbf{A} = \mathbf{L} \mathbf{U}, $$ where $\mathbf{L}$ is unit lower triangular and $\mathbf{U}$ is upper triangular.
LU decomposition exists if the principal sub-matrix $\mathbf{A}[1:k, 1:k]$ is non-singular for $k=1,\ldots,n-1$.
If the LU decomposition exists and $\mathbf{A}$ is non-singular, then the LU decomposition is unique and
$$
\det(\mathbf{A}) = \det(\mathbf{L}) \det(\mathbf{U}) = \prod_{i=1}^n u_{ii}.
$$
- This is basically how R (and MATLAB and Julia) compute determinants.
# LU decomposition without pivoting: A is overwritten by L (strict lower
# triangle; the unit diagonal is implicit) and U (upper triangle)
for (k in seq_len(n - 1)) {
  for (i in k + seq_len(max(n - k, 0))) {
    A[i, k] <- A[i, k] / A[k, k] # computes L
    for (j in k + seq_len(max(n - k, 0))) {
      A[i, j] <- A[i, j] - A[i, k] * A[k, j] # computes U
    }
  }
}
Source: http://www.netlib.org
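Running these loops on the toy matrix reproduces the factors found with the elementary operators above (a sketch; note there is no pivoting here):

A0 <- matrix(c(2, 4, 2, -4, -9, 1, 2, 7, 3), nrow = 3) # the toy A, column major
n0 <- nrow(A0)
for (k in seq_len(n0 - 1)) {
  for (i in k + seq_len(n0 - k)) {
    A0[i, k] <- A0[i, k] / A0[k, k]
    for (j in k + seq_len(n0 - k)) {
      A0[i, j] <- A0[i, j] - A0[i, k] * A0[k, j]
    }
  }
}
A0 # strict lower triangle holds L (multipliers 2, 1, -5); upper triangle holds U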
Given LU decomposition $\mathbf{A} = \mathbf{L} \mathbf{U}$, solving $(\mathbf{L} \mathbf{U}) \mathbf{x} = \mathbf{b}$ for one right hand side costs $2n^2$ flops:
$$ \mathbf{L} \mathbf{y} = \mathbf{b} \quad \text{(forward substitution, $n^2$ flops)}, $$
$$ \mathbf{U} \mathbf{x} = \mathbf{y} \quad \text{(back substitution, $n^2$ flops)}. $$
Therefore to solve $\mathbf{A} \mathbf{x} = \mathbf{b}$ via LU, we need a total of $\frac{2n^3}{3} + 2n^2$ flops.
No inversion mentality:

Whenever we see a matrix inverse, we should think in terms of solving linear equations. In short, use solve(A, b), not solve(A) %*% b.
We do not compute a matrix inverse unless the inverse itself is the quantity of interest (e.g., for obtaining standard errors in statistical inference).
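A quick comparison showing why (a minimal sketch; the size is arbitrary): forming the inverse does strictly more work than a direct solve.

set.seed(280)
nbig <- 1000
Abig <- matrix(rnorm(nbig * nbig), nbig, nbig)
bbig <- rnorm(nbig)
system.time(solve(Abig) %*% bbig) # forms the inverse first, then multiplies
system.time(solve(Abig, bbig)) # one LU plus two triangular solves: cheaper and more accurate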
Is there a solution to $\mathbf{A} \mathbf{x} = \mathbf{b}$ for an arbitrary $\mathbf{b}$? Does GE/LU work for $\mathbf{A}$?

Consider the system
$$
\begin{split}
0.001 x_1 + x_2 &= 1 \\
x_1 + x_2 &= 2
\end{split}
,
$$
with solution
$$ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1.001001 \\ 0.998999 \end{bmatrix} \approx \begin{bmatrix} 1.0 \\ 1.0 \end{bmatrix} . $$
GE without pivoting uses the multiplier $a_{21}/a_{11} = 1000$; in finite-precision arithmetic where $[1 - 1000] = -1000$ and $[2 - 1000] = -1000$, the eliminated system becomes
$$
\begin{split}
0.001x_1 + x_2 &= 1 \\
-1000 x_2 &= -1000
\end{split}
,
$$
yielding
$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
=
\begin{bmatrix} 0.0 \\ 1.0 \end{bmatrix}
!
$$
The problem of solving $\mathbf{A}\mathbf{x}=\mathbf{b}$ for fixed $b$ can be viewed as a map $f: \mathbb{R}^{n\times n} \ni \mathbf{A} \mapsto \mathbf{A}^{-1}\mathbf{b} \in \mathbb{R}^n$.
A well-conditioned problem is one such that all small perturbations of a given input $\mathbf{A}$ lead to only small changes in $f(\mathbf{A})$.
The conditioning of the problem of computing $\mathbf{A}^{-1}\mathbf{b}$ given $\mathbf{A}$ is usually measured by the condition number
$$
\kappa(\mathbf{A}) = \frac{\sigma_{\max}(\mathbf{A})}{\sigma_{\min}(\mathbf{A})},
$$
where
+ $\sigma_{\max}(\mathbf{A}) = \max_{\mathbf{x}\neq 0} \|\mathbf{A}\mathbf{x}\|/\|\mathbf{x}\|$ is the maximum singular value of $\mathbf{A}$;
+ $\sigma_{\min}(\mathbf{A}) = \min_{\mathbf{x}\neq 0} \|\mathbf{A}\mathbf{x}\|/\|\mathbf{x}\|$ is the minimum singular value of $\mathbf{A}$.
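In R, kappa() computes this quantity (a quick sketch on an arbitrary random matrix; exact = TRUE uses the singular values):

set.seed(3)
Atest <- matrix(rnorm(16), 4, 4)
kappa(Atest, exact = TRUE) # sigma_max / sigma_min
max(svd(Atest)$d) / min(svd(Atest)$d) # the same ratio, from the singular values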
An algorithm for computing $\mathbf{A}^{-1}\mathbf{b}$ using floating-point arithmetic can be viewed as another map $\tilde{f}: \mathbb{R}^{n\times n} \to \mathbb{R}^{n}$.
For given input $\mathbf{A}$, the solution computed by algorithm $\tilde{f}$ is $\hat{\mathbf{x}} = \tilde{f}(\mathbf{A})$.
Any algorithm is affected by at least rounding errors.
Algorithm $\tilde{f}$ is called stable if for any perturbed input $\tilde{\mathbf{A}}$ from $\mathbf{A}$ with relative error bounded by $\epsilon$, the relative error between $\hat{\mathbf{x}}$ and $\tilde{\mathbf{A}}^{-1}\mathbf{b}$ is bounded by $O(\epsilon)$.
The condition number of matrix $\mathbf{A} = \begin{bmatrix} 0.001 & 1 \\ 1 & 1 \end{bmatrix}$ is about 2.62, so the problem itself is well-conditioned: a small change in $\mathbf{A}$ won't yield a large change in the solution $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$.
It is the instability of the GE/LU algorithm that yields the large error between the true solution $\mathbf{x}\approx(1, 1)$ and the computed solution (or the output of the algorithm) $\hat{\mathbf{x}}=(0, 1)$. This example suggests that with GE/LU, the worst-case relative error is $\|\mathbf{x}-\hat{\mathbf{x}}\| / \|\mathbf{x}\| \ge 1/\sqrt{2}$ (zero relative error in input $\tilde{\mathbf{A}}$), not small.
The root cause is the vastly different scale of $a_{11}$ compared with the other coefficients, together with the order of elimination. In particular, any $a_{22}$ such that $[a_{22} - 1000] = -1000$ and any $b_2$ such that $[b_2 - 1000] = -1000$ will yield the same solution $\hat{\mathbf{x}}=(0, 1)$.
That is, the large multiplier $1000 = a_{21}/a_{11}$ causes the loss of information in $a_{22}$ and $b_2$. (Recall Scenario 1 of Catastrophic Cancellation in Lecture 1.)
In this arithmetic, GE/LU computes the factors $\mathbf{L} = \begin{bmatrix} 1 & 0 \\ 1000 & 1 \end{bmatrix}$ and $\mathbf{U} = \begin{bmatrix} 0.001 & 1 \\ 0 & -1000 \end{bmatrix}$, resulting in $\mathbf{L}\mathbf{U} = \begin{bmatrix} 0.001 & 1 \\ 1 & 0 \end{bmatrix}$, so $\mathbf{L}\mathbf{U}$ is not even close to $\mathbf{A} = \begin{bmatrix} 0.001 & 1 \\ 1 & 1 \end{bmatrix}$!
With partial pivoting, rows 1 and 2 are interchanged first, so the multiplier is $0.001$ and the eliminated system is
$$
\begin{split}
x_1 + x_2 &= 2 \\
1.0 x_2 &= 1.0
\end{split}
,
$$
yielding
$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
=
\begin{bmatrix} 1.0 \\ 1.0 \end{bmatrix}.
$$
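We can reproduce the breakdown in double precision by shrinking $a_{11}$ further, e.g., to $10^{-20}$ (a sketch; the value is chosen so that rounding actually destroys $a_{22}$):

a11 <- 1e-20
A2x2 <- matrix(c(a11, 1, 1, 1), 2, 2)
b2x2 <- c(1, 2)
m21 <- A2x2[2, 1] / A2x2[1, 1] # huge multiplier 1e20
u22 <- A2x2[2, 2] - m21 * A2x2[1, 2] # fl(1 - 1e20) = -1e20: a22 is lost
x2 <- (b2x2[2] - m21 * b2x2[1]) / u22 # = 1
x1 <- (b2x2[1] - x2) / a11 # = 0, wrong!
c(x1, x2)
solve(A2x2, b2x2) # LAPACK's GEPP pivots and returns approximately (1, 1)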
Partial pivoting: before zeroing the $k$-th column, move the row with $\max_{i=k}^n |a_{ik}|$ into the $k$-th row (GE with partial pivoting, GEPP).
LU with partial pivoting yields
$$
\mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U},
$$
where $\mathbf{P}$ is a permutation matrix, $\mathbf{L}$ is unit lower triangular with $|\ell_{ij}| \le 1$, and $\mathbf{U}$ is upper triangular.

Complete pivoting (GECP): at each step, the entry of largest magnitude in the submatrix A[k:n, k:n]
is permuted to the $(k,k)$-th entry, using both row and column interchanges. This yields the decomposition
$$
\mathbf{P} \mathbf{A} \mathbf{Q} = \mathbf{L} \mathbf{U},
$$
where $|\ell_{ij}| \le 1$.
LAPACK (Linear Algebra PACKage) is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. -- Wikipedia
LAPACK routine GETRF
does $\mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U}$ (LU decomposition with partial pivoting) in place.
R: solve() implicitly performs LU decomposition (it is a wrapper of the LAPACK routine DGESV). solve() allows specifying a single or multiple right hand sides; if none is given, it computes the matrix inverse. The Matrix package contains the lu() function, which outputs L, U, and the pivot.
A # toy example, just to recall
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    2   -4    2
[2,]    4   -9    7
[3,]    2    1    3
alu <- Matrix::lu(A)
class(alu)
ealu <- expand(alu) # expand the decomposition into factors
ealu$L
3 x 3 Matrix of class "dtrMatrix" (unitriangular)
     [,1]       [,2]       [,3]
[1,] 1.00000000          .          .
[2,] 0.50000000 1.00000000          .
[3,] 0.50000000 0.09090909 1.00000000
ealu$U
3 x 3 Matrix of class "dtrMatrix"
     [,1]      [,2]      [,3]
[1,]  4.000000 -9.000000  7.000000
[2,]         .  5.500000 -0.500000
[3,]         .         . -1.454545
ealu$P
3 x 3 sparse Matrix of class "pMatrix"
[1,] . . |
[2,] | . .
[3,] . | .
with(ealu, L %*% U)
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    4   -9    7
[2,]    2    1    3
[3,]    2   -4    2
with(ealu, P %*% L %*% U)
3 x 3 Matrix of class "dgeMatrix"
     [,1] [,2] [,3]
[1,]    2   -4    2
[2,]    4   -9    7
[3,]    2    1    3
det(A) # this does LU!
det(ealu$U) # this is cheap
solve(A) # this does LU!
3 x 3 Matrix of class "dgeMatrix"
        [,1]    [,2]   [,3]
[1,]  1.0625 -0.4375 0.3125
[2,] -0.0625 -0.0625 0.1875
[3,] -0.6875  0.3125 0.0625
with(ealu, solve(U) %*% solve(L) %*% t(P) ) # this is cheap???
3 x 3 Matrix of class "dgeMatrix"
        [,1]    [,2]   [,3]
[1,]  1.0625 -0.4375 0.3125
[2,] -0.0625 -0.0625 0.1875
[3,] -0.6875  0.3125 0.0625
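Finally, the factors can be reused to solve $\mathbf{A}\mathbf{x} = \mathbf{b}$ with two cheap triangular solves instead of forming any inverse (a sketch using the ealu factors above; recall $\mathbf{A} = \mathbf{P}\mathbf{L}\mathbf{U}$ here):

b <- c(6.0, 20.0, 14.0) # right hand side of the toy system
y <- forwardsolve(ealu$L, as.vector(t(ealu$P) %*% b)) # solve L y = P' b
backsolve(ealu$U, y) # solve U x = y, giving the solution (2, 1, 3)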