versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 2 on 8 virtual cores
using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
Activating new project at `~/Dropbox/class/M1399.000200/2023/M1300_000200-2023fall/lectures/07-gelu`
No Changes to `~/Dropbox/class/M1399.000200/2023/M1300_000200-2023fall/lectures/07-gelu/Project.toml`
No Changes to `~/Dropbox/class/M1399.000200/2023/M1300_000200-2023fall/lectures/07-gelu/Manifest.toml`
Status `~/Dropbox/class/M1399.000200/2023/M1300_000200-2023fall/lectures/07-gelu/Project.toml` (empty project)
For simplicity we consider a square matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$.
History: the method is named after Carl Friedrich Gauss (1777–1855), although it stems from the notes of Isaac Newton. In fact, GE was known to Chinese mathematicians as early as 179 AD.
A toy example.
A = [2.0 -4.0 2.0; 4.0 -9.0 7.0; 2.0 1.0 3.0]
3×3 Matrix{Float64}:
 2.0  -4.0  2.0
 4.0  -9.0  7.0
 2.0   1.0  3.0
b = [6.0, 20.0, 14.0]
3-element Vector{Float64}:
  6.0
 20.0
 14.0
# Julia way to solve linear equation
# equivalent to `solve(A, b)` in R
A \ b
3-element Vector{Float64}:
 2.000000000000001
 1.0
 2.9999999999999996
What happens when we call A \ b to solve a linear equation?
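One way to see for ourselves, assuming an interactive session (where the InteractiveUtils macros are loaded), is to ask which method the call dispatches to:

# inspect the method dispatched for a dense square matrix and a vector;
# the answer leads to the LU decomposition discussed below
@which A \ b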
Define the elementary operator matrix $\mathbf{E}_{jk}(c) = \mathbf{I} + c \mathbf{e}_j \mathbf{e}_k^T$ for $j \ne k$, where $\mathbf{e}_j$ is the $j$-th standard basis vector. $\mathbf{E}_{jk}(c)$ is unit triangular, full rank, with inverse $\mathbf{E}_{jk}^{-1}(c) = \mathbf{E}_{jk}(-c)$.
$\mathbf{E}_{jk}(c)$ left-multiplied to an $n \times m$ matrix $\mathbf{X}$ effectively replaces its $j$-th row $\mathbf{x}_{j\cdot}$ by $c \mathbf{x}_{k \cdot} + \mathbf{x}_{j \cdot}$
+ $2m$ flops ($m$ multiplications and $m$ additions; why?).
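As a quick illustration, we can materialize $\mathbf{E}_{jk}(c) = \mathbf{I} + c \mathbf{e}_j \mathbf{e}_k^T$ and watch it act on rows; the helper name ejk below is ours, and in practice these operators are never formed explicitly:

using LinearAlgebra

# form E_jk(c) = I + c * e_j * e_k' explicitly (illustration only)
function ejk(n, j, k, c)
    E = Matrix{Float64}(I, n, n)
    E[j, k] = c
    return E
end

ejk(3, 2, 1, -2.0) * A  # replaces row 2 of A by -2 * (row 1) + (row 2)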
Column 1:
E21 = [1.0 0.0 0.0; -2.0 1.0 0.0; 0.0 0.0 1.0]
3×3 Matrix{Float64}:
  1.0  0.0  0.0
 -2.0  1.0  0.0
  0.0  0.0  1.0
# zero (2, 1) entry
E21 * A # Step 1, A'
3×3 Matrix{Float64}:
 2.0  -4.0  2.0
 0.0  -1.0  3.0
 2.0   1.0  3.0
E31 = [1.0 0.0 0.0; 0.0 1.0 0.0; -1.0 0.0 1.0]
3×3 Matrix{Float64}:
  1.0  0.0  0.0
  0.0  1.0  0.0
 -1.0  0.0  1.0
# zero (3, 1) entry
E31 * E21 * A # Step 2, A''
3×3 Matrix{Float64}:
 2.0  -4.0  2.0
 0.0  -1.0  3.0
 0.0   5.0  1.0
Column 2:
E32 = [1.0 0.0 0.0; 0.0 1.0 0.0; 0.0 5.0 1.0]
3×3 Matrix{Float64}:
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  5.0  1.0
# zero (3, 2) entry
E32 * E31 * E21 * A # Step 3, A'''
3×3 Matrix{Float64}:
 2.0  -4.0   2.0
 0.0  -1.0   3.0
 0.0   0.0  16.0
M1 = E31 * E21 # inv(L2 * L1)
3×3 Matrix{Float64}:
  1.0  0.0  0.0
 -2.0  1.0  0.0
 -1.0  0.0  1.0
M2 = E32 # inv(L3)
3×3 Matrix{Float64}:
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  5.0  1.0
inv(M1) # L2 * L1. inv(A) gives the inverse of A, but use with caution (see below)
3×3 Matrix{Float64}:
 1.0  0.0  0.0
 2.0  1.0  0.0
 1.0  0.0  1.0
inv(M2) # L3
3×3 Matrix{Float64}:
 1.0   0.0  0.0
 0.0   1.0  0.0
 0.0  -5.0  1.0
Gaussian elimination does $\mathbf{M}_{n-1} \cdots \mathbf{M}_1 \mathbf{A} = \mathbf{U}$.
Let $\mathbf{L} = \mathbf{M}_1^{-1} \cdots \mathbf{M}_{n-1}^{-1}$. Then $\mathbf{A} = \mathbf{L} \mathbf{U}$, where $\mathbf{L}$ is unit lower triangular and $\mathbf{U}$ is upper triangular.
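A quick numerical check of this identity on the toy example (not part of the derivation):

L = inv(M1) * inv(M2)    # unit lower triangular
U = E32 * E31 * E21 * A  # upper triangular, from the three elimination steps
L * U ≈ A                # true: this is the LU decomposition of A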
+ LU decomposition exists if the principal sub-matrix $\mathbf{A}[1:k, 1:k]$ is non-singular for $k=1,\ldots,n-1$.
+ If $\mathbf{A}$ is non-singular, then the LU decomposition is unique and
$$
\det(\mathbf{A}) = \det(\mathbf{L}) \det(\mathbf{U}) = \prod_{k=1}^n u_{kk}.
$$
+ The whole LU algorithm is done in place, i.e., $\mathbf{A}$ is overwritten by $\mathbf{L}$ and $\mathbf{U}$ (a self-contained runnable version follows below):

for k=1:n-1
    for i=k+1:n
        A[i, k] /= A[k, k]                # store multiplier l_ik in place
        for j=k+1:n
            A[i, j] -= A[i, k] * A[k, j]  # update the trailing submatrix
        end
    end
end
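The sketch assumes n and A are already in scope; a self-contained runnable version (the name lu_nopivot! is ours):

# in-place LU without pivoting: after the call, the upper triangle of A
# holds U and the strict lower triangle holds L (unit diagonal implicit)
function lu_nopivot!(A)
    n = size(A, 1)
    for k = 1:n-1
        for i = k+1:n
            A[i, k] /= A[k, k]
            for j = k+1:n
                A[i, j] -= A[i, k] * A[k, j]
            end
        end
    end
    return A
end

lu_nopivot!(copy(A))  # upper triangle matches E32 * E31 * E21 * A above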
The LU decomposition costs $2(n-1)^2 + 2(n-2)^2 + \cdots + 2 \cdot 1^2 \approx \frac 23 n^3$ flops.
Actual implementations can differ: outer product LU (kij loop), block outer product LU (higher level-3 fraction), Crout's algorithm (jki loop).
Given LU decomposition $\mathbf{A} = \mathbf{L} \mathbf{U}$, solving $(\mathbf{L} \mathbf{U}) \mathbf{x} = \mathbf{b}$ for one right hand side costs $2n^2$ flops:
$$ \mathbf{L} \mathbf{y} = \mathbf{b} $$
$$ \mathbf{U} \mathbf{x} = \mathbf{y} $$
Therefore to solve $\mathbf{A} \mathbf{x} = \mathbf{b}$ via LU, we need a total of $\frac 23 n^3 + 2n^2$ flops.
If there are multiple right hand sides, LU only needs to be done once.
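A minimal sketch of the two triangular solves, reusing the L and U formed in the check above (wrapping the factors in LowerTriangular/UpperTriangular dispatches to the $O(n^2)$ substitution routines):

using LinearAlgebra

y = LowerTriangular(L) \ b  # forward substitution, ~n^2 flops
x = UpperTriangular(U) \ y  # back substitution, ~n^2 flops
x ≈ A \ b                   # true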
For matrix inversion, there are $n$ right hand sides $\mathbf{e}_1, \ldots, \mathbf{e}_n$. Taking advantage of 0s reduces $2n^3$ flops to $\frac 43 n^3$ flops. So matrix inversion via LU costs $\frac 23 n^3 + \frac 43 n^3 = 2n^3$ flops.
No inversion mentality: whenever we see a matrix inverse, we should think of solving linear equations rather than forming inv(A). We do not compute a matrix inverse unless it is truly needed (e.g., when the inverse itself is the quantity of interest, as in computing standard errors).
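A small illustration of the mentality on the toy system (on large matrices the flop and accuracy gap becomes substantial):

x1 = A \ b       # LU + two triangular solves: (2/3) n^3 + 2 n^2 flops
x2 = inv(A) * b  # forms the full inverse first: ~2 n^3 flops, less accurate
x1 ≈ x2          # same answer here, but the first is the right habit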
Is there a solution to $\mathbf{A} \mathbf{x} = \mathbf{b}$ for an arbitrary $\mathbf{b}$? Does GE/LU work for $\mathbf{A}$?
Example:
$$
\begin{split}
0.001x_1 + x_2 &= 1 \\
x_1 + x_2 &= 2
\end{split}
$$
with solution
$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1.001001 \\ 0.998999 \end{bmatrix} \approx \begin{bmatrix} 1.0 \\ 1.0 \end{bmatrix}.
$$
Eliminating $x_1$ with the tiny pivot $0.001$ under limited-precision arithmetic (so that $1 - 1000$ and $2 - 1000$ both round to $-1000$) gives
$$
\begin{split}
0.001x_1 + x_2 &= 1 \\
-1000 x_2 &= -1000
\end{split},
$$
yielding
$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0.0 \\ 1.0 \end{bmatrix}!
$$
Interchanging the two rows first and eliminating with pivot $1$ instead gives
$$
\begin{split}
x_1 + x_2 &= 2 \\
1.0 x_2 &= 1.0
\end{split},
$$
yielding
$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1.0 \\ 1.0 \end{bmatrix}.
$$
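Julia's backslash pivots, so the tiny-pivot system above poses no problem (a quick check; the names A2 and b2 are ours):

A2 = [0.001 1.0; 1.0 1.0]
b2 = [1.0, 2.0]
A2 \ b2  # ≈ [1.001001, 0.998999], the accurate solution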
Partial pivoting: before zeroing the $k$-th column, move the row with $\max_{i=k}^n |a_{ik}|$ into the $k$-th row (called GEPP).
LU with partial pivoting yields
$$
\mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U},
$$
where $\mathbf{P}$ is a permutation matrix, $\mathbf{L}$ is unit lower triangular with $|\ell_{ij}| \le 1$, and $\mathbf{U}$ is upper triangular.
Complete pivoting: both rows and columns are interchanged so that the entry of largest magnitude in the remaining submatrix A[k:n, k:n] is permuted to the $(k,k)$-th entry. This yields the decomposition $\mathbf{P} \mathbf{A} \mathbf{Q} = \mathbf{L} \mathbf{U}$, where $|\ell_{ij}| \le 1$.
LAPACK: ?GETRF does $\mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U}$ (LU decomposition with partial pivoting) in place.
R: solve() implicitly performs LU decomposition (wrapper of the LAPACK routine DGESV). solve() allows specifying a single or multiple right hand sides; if none, it computes the matrix inverse. The Matrix package contains an lu() function that outputs L, U, and pivot.
Julia:
+ LinearAlgebra.lu.
+ LinearAlgebra.lu!: in-place version; the input matrix gets overwritten with L and U.
+ LAPACK.getrf! can be called directly.
+ Related LAPACK wrappers: getrs, gesv, gesvx, trtri (inverse of a triangular matrix), trtrs.

A
3×3 Matrix{Float64}:
 2.0  -4.0  2.0
 4.0  -9.0  7.0
 2.0   1.0  3.0
using LinearAlgebra
# second argument indicates partial pivoting (default) or not
alu = lu(A)
typeof(alu)
LU{Float64, Matrix{Float64}, Vector{Int64}}
fieldnames(typeof(alu))
(:factors, :ipiv, :info)
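The factors field stores L and U packed into a single matrix, the same layout LAPACK's GETRF returns; a quick check by calling the wrapper directly (it overwrites its argument, hence the copy):

F, ipiv, info = LAPACK.getrf!(copy(A))
F ≈ alu.factors   # packed factors: strict lower triangle L, upper triangle U
ipiv == alu.ipiv  # same pivot sequence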
alu.L
3×3 Matrix{Float64}:
 1.0  0.0        0.0
 0.5  1.0        0.0
 0.5  0.0909091  1.0
alu.U
3×3 Matrix{Float64}:
 4.0  -9.0   7.0
 0.0   5.5  -0.5
 0.0   0.0  -1.45455
alu.p
3-element Vector{Int64}:
 2
 3
 1
alu.P
3×3 Matrix{Float64}:
 0.0  1.0  0.0
 0.0  0.0  1.0
 1.0  0.0  0.0
alu.L * alu.U
3×3 Matrix{Float64}:
 4.0  -9.0  7.0
 2.0   1.0  3.0
 2.0  -4.0  2.0
A[alu.p, :]
3×3 Matrix{Float64}:
 4.0  -9.0  7.0
 2.0   1.0  3.0
 2.0  -4.0  2.0
# this is doing two triangular solves, 2n^2 flops
alu \ b
3-element Vector{Float64}:
 2.000000000000001
 1.0
 2.9999999999999996
det(A) # this does LU!
-32.0
det(alu) # this is cheap
-32.0
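It is cheap because only the diagonal of the stored U is needed, up to the sign of the permutation; here the permutation (2, 3, 1) is even, so no sign flip:

prod(diag(alu.U))  # ≈ -32.0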
inv(A) # this does LU!
3×3 Matrix{Float64}:
  1.0625  -0.4375  0.3125
 -0.0625  -0.0625  0.1875
 -0.6875   0.3125  0.0625
inv(alu) # this is cheap
3×3 Matrix{Float64}:
  1.0625  -0.4375  0.3125
 -0.0625  -0.0625  0.1875
 -0.6875   0.3125  0.0625
Sections II.5.2 and II.5.3, Computational Statistics by James Gentle (2010).
Chapter 2, Applied Numerical Linear Algebra by James W. Demmel (1997).
Chapter 3, Matrix Computations by Gene Golub and Charles Van Loan (2013).