using LinearAlgebra # load this package for most of Julia's linear algebra functions
Let's look more closely at the process of Gaussian elimination in matrix form, using the matrix from lecture 1.
A = [1 3 1
1 1 -1
3 11 6]
3×3 Matrix{Int64}:
 1   3   1
 1   1  -1
 3  11   6
Gaussian elimination produces the matrix $U$, which we can compute in Julia as in lecture 1:
# LU factorization (Gaussian elimination) of the matrix A,
# passing the NoPivot() to prevent row re-ordering ("pivoting")
L, U = lu(A, NoPivot())
U # just show U
3×3 Matrix{Float64}:
 1.0   3.0   1.0
 0.0  -2.0  -2.0
 0.0   0.0   1.0
Now, let's go through Gaussian elimination in matrix form, by expressing the elimination steps as matrix multiplications. In Gaussian elimination, we make linear combinations of rows to cancel elements below the pivot, and we now know that this corresponds to multiplying on the left by some elimination matrix $E$.
The first step is to eliminate in the first column of $A$. The pivot is the 1 in the upper-left-hand corner. For this $A$, we need to:
1. Leave the first row (the pivot row) alone.
2. Subtract 1× the first row from the second row, to cancel the 1 in the first column.
3. Subtract 3× the first row from the third row, to cancel the 3 in the first column.
This corresponds to multiplying $A$ on the left by the matrix E1. As above (in the "row × matrix" picture), the three rows of E1 correspond exactly to the three row operations listed above:
E1 = [ 1 0 0
-1 1 0
-3 0 1]
3×3 Matrix{Int64}:
  1  0  0
 -1  1  0
 -3  0  1
E1*A
3×3 Matrix{Int64}:
 1   3   1
 0  -2  -2
 0   2   3
As desired, this introduced zeros below the diagonal in the first column.
It's fun and perhaps illuminating to try acting E1 on a matrix of symbolic variables, using the Symbolics.jl computer-algebra (symbolic-math) package for Julia.
using Symbolics
We'll define a matrix S whose entries are variables instead of numbers, and then act E1 on it.
We could use boring variables like $s_{11}$ etcetera:
@variables s[1:3, 1:3] # declare a bunch of symbolic variables
S = collect(s) # and collect them into a matrix
E1 * S
But it's hard to visually distinguish all of these subscripts, and nowadays on a computer we have a lot more symbols to choose from.
Let's instead make a symbolic matrix full of emoji:
@variables 😀 🥸 😭 🐸 🐶 🐨 🚗 🚛 🚜 # declare "moar funner" symbolic variables
S = [ 😀 🥸 😭 # first row = faces
🐸 🐶 🐨 # second row = animals
🚗 🚛 🚜 ] # third row = Cars™
E1 * S
Now, we need to eliminate the 2 below the diagonal in the second column of E1*A. Our new pivot is $-2$ (in the second row), and we just add the second row of E1*A to the third row to make the new third row.
This corresponds to multiplying on the left by the matrix E2, which leaves the first two rows alone and makes the new third row by adding the second and third rows:
E2 = [1 0 0
0 1 0
0 1 1]
3×3 Matrix{Int64}:
 1  0  0
 0  1  0
 0  1  1
E2 * S
E2*E1*A
3×3 Matrix{Int64}:
 1   3   1
 0  -2  -2
 0   0   1
E2*E1*S
As expected, this is upper triangular, and in fact the same as the U matrix returned by the Julia lu function above:
E2*E1*A == U
true
Thus, we have arrived at the formula: $$ \underbrace{E_2 E_1}_E A = U $$ Notice that we multiplied $A$ by the elimination matrices from right to left in the order of the steps: it is $E_2 E_1 A$, not $E_1 E_2 A$. Because matrix multiplication is generally not commutative, $E_2 E_1$ and $E_1 E_2$ give different matrices:
E = E2*E1
3×3 Matrix{Int64}:
  1  0  0
 -1  1  0
 -4  1  1
(Notice that when we multiply $E_2 E_1$ to get $E$, the elimination steps get "mixed up together", and we get new numbers like $-4$ (above) that didn't appear as multipliers in the elimination steps. As discussed below, this makes $E$ a bit inconvenient as a way to work with Gaussian elimination, because it requires some pointless effort to compute $E$.)
E1*E2
3×3 Matrix{Int64}:
  1  0  0
 -1  1  0
 -3  1  1
Notice, furthermore, that the matrices $E_1$ and $E_2$ are both lower-triangular matrices. This is a consequence of the structure of Gaussian elimination (assuming no row re-ordering): we always add the pivot row to rows below it, never above it.
The product of lower-triangular matrices is always lower-triangular too. (In homework, you will explore a similar property for upper-triangular matrices.) In consequence, the product $E = E_2 E_1$ is lower-triangular, and Gaussian elimination can be viewed as yielding $EA=U$ where $E$ is lower triangular and $U$ is upper triangular.
Caveat: there is a slight complication if we have to do row swap steps, since row swaps are not lower triangular. We will defer this case until later, since row swaps don't arise in hand calculations unless you are unlucky.
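(As a quick sanity check of this triangular structure — not in the original notes, just a small verification sketch — Julia's istril and istriu functions from LinearAlgebra test whether a matrix is lower or upper triangular:)
istril(E1), istril(E2), istril(E2*E1) # the elimination matrices and their product are all lower triangular
istriu(E2*E1*A)                       # and the eliminated matrix U = E2*E1*A is upper triangular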
There are two problems with writing $U = EA$:
1. The matrix $E$ is annoying to compute: the different elimination steps are mixed up together, and we don't actually want to multiply those matrices together.
2. We want to think of this process as simplifying $A$, i.e. we want to replace $A$ with $A = \mbox{simpler stuff}$.
Both of these concerns are addressed by thinking about Gaussian elimination in reverse: how do we get $A$ back from $U$? Again, this consists of row operations on $U$, and will result in
$$ A = LU $$
where $L$ is a lower-triangular matrix with 1's on the diagonal, and the multipliers from the elimination steps written below the diagonal.
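For our particular matrix, we can write down this $L$ by hand (a small verification sketch): the multipliers were 1 and 3 in the first column, and $-1$ in the second column (we added the second row, i.e. subtracted $-1\times$ it).
L = [1  0 0
     1  1 0
     3 -1 1] # 1's on the diagonal, elimination multipliers below it
L * U ≈ A    # reversing the elimination steps recovers A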
See handwritten notes on:
Check:
lu(A, NoPivot())
LU{Float64, Matrix{Float64}}
L factor:
3×3 Matrix{Float64}:
 1.0   0.0  0.0
 1.0   1.0  0.0
 3.0  -1.0  1.0
U factor:
3×3 Matrix{Float64}:
 1.0   3.0   1.0
 0.0  -2.0  -2.0
 0.0   0.0   1.0
E^-1
3×3 Matrix{Float64}:
 1.0   0.0  0.0
 1.0   1.0  0.0
 3.0  -1.0  1.0
A very important special matrix is the identity matrix $I$. Here is a $ 5 \times 5$ identity matrix:
$$ I = \begin{pmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&1 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 & e_4 & e_5 \end{pmatrix} $$where the columns $e_1 \cdots e_5$ of $I$ are the unit vectors in each component.
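(As a small illustration, not from the notes: in Julia you can build a dense identity matrix with Matrix(1.0I, 5, 5) and pick out its columns, which are exactly these unit vectors. The name Id below is just an illustrative variable name.)
Id = Matrix(1.0I, 5, 5) # a dense 5×5 identity matrix
Id[:, 2]                # the 2nd column is the unit vector e₂ = [0, 1, 0, 0, 0]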
The identity matrix, which can be constructed by I(5) in Julia, has the property that $Ix=x$ for any $x$, and hence $IA = A$ for any (here $5 \times 5$) matrix $A$:
A = [4 -2 -7 -4 -8
9 -6 -6 -1 -5
-2 -9 3 -5 2
9 7 -9 5 -8
-1 6 -3 9 6] # a randomly chosen 5x5 matrix
5×5 Matrix{Int64}:
  4  -2  -7  -4  -8
  9  -6  -6  -1  -5
 -2  -9   3  -5   2
  9   7  -9   5  -8
 -1   6  -3   9   6
b = [-7,2,4,-4,-7] # a randomly chosen right-hand side
5-element Vector{Int64}:
 -7
  2
  4
 -4
 -7
I(5)
5×5 Diagonal{Bool, Vector{Bool}}:
 1  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅
 ⋅  ⋅  ⋅  1  ⋅
 ⋅  ⋅  ⋅  ⋅  1
I(5) * b == b
true
I(5) * A == A
true
A * I(5) == A
true
In principle, "$I$" is a little ambiguous in linear algebra, because there are $m \times m$ identity matrices of every size $m$, and $I$ could refer to any of them. In practice, you usually infer the size of $I$ from context: if you see an expression like $Ix$ or $IA$ or $A + I$, then $I$ refers to the identity matrix of the corresponding size.
There is actually a built-in constant called I in Julia that represents just that: an identity matrix whose size is inferred from context:
I
UniformScaling{Bool}
true*I
I * b
5-element Vector{Int64}:
 -7
  2
  4
 -4
 -7
I * A == A == A * I
true
I * [2, 3]
2-element Vector{Int64}:
 2
 3
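Expressions like $A + I$ (mentioned above) also work with this context-inferred I. As a quick check, not in the original notes, using the 5×5 A defined earlier:
A + I == A + I(5) # adding I puts an extra 1 on each diagonal entry; both spellings agree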
When there is an ambiguity, I will sometimes write $I_m$ for the $m \times m$ identity matrix.
For example, if $B$ is a $3\times2$ matrix, then
$$ I_3 B I_2 = B $$
where we need to multiply by a different-size identity matrix on the left and right.
B = [2 3
4 5
6 7]
3×2 Matrix{Int64}:
 2  3
 4  5
 6  7
I * B * I
3×2 Matrix{Int64}:
 2  3
 4  5
 6  7
I(3) * B * I(2)
3×2 Matrix{Int64}:
 2  3
 4  5
 6  7
I(2) * B * I(3) # should give an error: wrong-shape matrices!
DimensionMismatch("second dimension of D, 2, does not match the first of B, 3")
I * B * I
3×2 Matrix{Int64}:
 2  3
 4  5
 6  7
It is often conceptually convenient to talk about the inverse $A^{-1}$ of a matrix $A$, which exists for any non-singular square matrix. This is the matrix such that $x = A^{-1}b$ solves $Ax = b$ for any $b$. The inverse is conceptually convenient because it allows us to move matrices around in equations almost like numbers (except that matrices don't commute!).
Equivalently, the inverse matrix $A^{-1}$ is the matrix such that $A^{-1} A = A A^{-1} = I$.
Why does this correspond to solving $Ax=b$? Multiplying both sides on the left by $A^{-1}$ (multiplying on the right would make no sense: we can't multiply vector×matrix!), we get
$$ A^{-1}Ax = Ix = x = A^{-1}b $$
In Julia, you can just do inv(A) to get the inverse of a matrix:
inv(A)
5×5 Matrix{Float64}:
  0.0109991   0.529789  -0.908341  -0.635197  -0.0879927
  0.131989    0.35747   -0.900092  -0.622365  -0.055912
 -0.235564   -0.179652   0.370302   0.353804  -0.11549
 -0.301558   -0.69172    1.48701    1.16499    0.0791323
  0.2044      0.678582  -1.29667   -1.05408    0.0314696
round.(inv(A) * A, digits=3) # round each entry to 3 digits to suppress tiny roundoff errors
(This is the identity matrix up to roundoff errors: computers only keep about 15 significant digits when they are doing most arithmetic.)
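To connect this back to solving equations (a quick check, not in the original notes): with the b defined above, $x = A^{-1}b$ indeed satisfies $Ax = b$ up to roundoff:
x = inv(A) * b # candidate solution x = A⁻¹b
A * x ≈ b      # ≈ rather than == because of roundoff errors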
Inverses have a special relationship to matrix products: $$ (AB)^{-1} = B^{-1} A^{-1} $$
The reason for this is that we must have $(AB)^{-1} AB = I$, and it is easy to see that $B^{-1} A^{-1}$ does the trick. Equivalently, $AB$ is the matrix that first multiplies by $B$ then by $A$; to invert this, we must reverse the steps: first multiply by the inverse of $A$ and then by the inverse of $B$.
C = rand(4,4)
D = rand(4,4)
inv(C*D)
4×4 Matrix{Float64}:
  2.22127    1.98224   -3.57568    -1.32172
  2.93196   -1.87908    1.29332    -1.45601
  1.02713    1.25965   -0.0833894  -2.21414
 -5.81069   -0.203271   2.05968     4.76163
inv(D)*inv(C)
4×4 Matrix{Float64}:
  2.22127    1.98224   -3.57568    -1.32172
  2.93196   -1.87908    1.29332    -1.45601
  1.02713    1.25965   -0.0833894  -2.21414
 -5.81069   -0.203271   2.05968     4.76163
inv(C)*inv(D) # this is not the inverse of C*D
4×4 Matrix{Float64}:
  0.0647283  -3.91299    4.92046   -3.07784
 -1.34403    -0.547912  -0.499501   3.0236
 -1.23397    -0.192993   3.08178   -1.93372
  1.8898      3.55496   -5.53689    2.42184
Recall from the last lecture that we were looking at the Gaussian-elimination steps to convert a matrix $A$ into upper-triangular form via a sequence of row operations:
$$ \underbrace{\begin{pmatrix} \boxed{1} & 3 & 1 \\ 1 & 1 & -1 \\ 3 & 11 & 6 \end{pmatrix}}_A\to \underbrace{\begin{pmatrix} \boxed{1} & 3 & 1 \\ 0 & \boxed{-2} & -2 \\ 0 & 2 & 3 \end{pmatrix}}_{E_1 A}\to \underbrace{\begin{pmatrix} \boxed{1} & 3 & 1 \\ 0 & \boxed{-2} & -2 \\ 0 & 0 & \boxed{1} \end{pmatrix}}_{E_2 E_1 A} $$
Since row operations correspond to multiplying by matrices on the *left*, we constructed the "elimination matrices" $E_1$ and $E_2$ corresponding to each of these elimination steps (under the first and second pivots).
A = [1 3 1
1 1 -1
3 11 6]
E1 = [ 1 0 0
-1 1 0
-3 0 1]
E2 = [1 0 0
0 1 0
0 1 1]
3×3 Matrix{Int64}:
 1  0  0
 0  1  0
 0  1  1
Thus, we arrived at the formula: $$ \underbrace{E_2 E_1}_E A = U $$ Notice that we multiplied $A$ by the elimination matrices from right to left in the order of the steps: it is $E_2 E_1 A$, not $E_1 E_2 A$. Because matrix multiplication is generally not commutative, $E_2 E_1$ and $E_1 E_2$ give different matrices:
E2*E1
3×3 Matrix{Int64}:
  1  0  0
 -1  1  0
 -4  1  1
E1*E2
3×3 Matrix{Int64}:
  1  0  0
 -1  1  0
 -3  1  1
inv(E2*E1)
3×3 Matrix{Float64}:
 1.0   0.0  0.0
 1.0   1.0  0.0
 3.0  -1.0  1.0
E1
3×3 Matrix{Int64}:
  1  0  0
 -1  1  0
 -3  0  1
E2
3×3 Matrix{Int64}:
 1  0  0
 0  1  0
 0  1  1
Notice, furthermore, that the matrices $E_1$ and $E_2$ are both lower-triangular matrices. This is a consequence of the structure of Gaussian elimination (assuming no row re-ordering): we always add the pivot row to rows below it, never above it.
The product of lower-triangular matrices is always lower-triangular too. (In homework, you will explore a similar property for upper-triangular matrices.) In consequence, the product $E = E_2 E_1$ is lower-triangular, and Gaussian elimination can be viewed as yielding
$$EA=U$$
where $E$ is lower triangular and $U$ is upper triangular!
However, in practice, it turns out to be more useful to write this as
$$A = E^{-1} U = E_1^{-1}E_2^{-1}U$$
where $E^{-1}$ is the inverse of the matrix $E$. We will have more to say about how to compute matrix inverses later in 18.06, but for now we just need to know that it is the matrix that reverses the steps of Gaussian elimination, taking us back from $U$ to $A$. Computing matrix inverses is laborious in general, but in this particular case it is easy. We just need to reverse the steps one by one starting with the last elimination step and working back to the first one.
Hence, just as for our general rule about inverses of products above, we need to reverse (invert) $E_2$ first on $U$, and then reverse (invert) $E_1$: $A = E_1^{-1} E_2^{-1} U$. But reversing an individual elimination step like $E_2$ is easy: we just flip the signs below the diagonal, so that wherever we added the pivot row we subtract and vice-versa. That is: $$ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix} $$ (The last elimination step was adding the second row to the third row, so we reverse it by subtracting the second row from the third row of $U$.)
Julia can compute matrix inverses for us with the inv function. (It doesn't know the trick of flipping the sign, which only works for very special matrices, but it can compute it the "hard way" so quickly (for such a small matrix) that it doesn't matter.) Of course that gives the same result:
inv(E2)
3×3 Matrix{Float64}:
 1.0   0.0  0.0
 0.0   1.0  0.0
 0.0  -1.0  1.0
Similarly for $E_1$:
inv(E1)
3×3 Matrix{Float64}:
 1.0  0.0  0.0
 1.0  1.0  0.0
 3.0  0.0  1.0
If we didn't make any mistakes, then $E_1^{-1} E_2^{-1} U$ should give $A$, and it does:
L, U = lu(A, NoPivot()) # recompute the factors, passing NoPivot() (as above) to prevent row re-ordering
U
3×3 Matrix{Float64}:
 1.0   3.0   1.0
 0.0  -2.0  -2.0
 0.0   0.0   1.0
inv(E1)*inv(E2)*U
3×3 Matrix{Float64}:
 1.0   3.0   1.0
 1.0   1.0  -1.0
 3.0  11.0   6.0
We call this inverse elimination matrix $L = E^{-1} = E_1^{-1} E_2^{-1}$. Since the inverse of each elimination matrix was lower-triangular (with flipped signs), their product $L$ is also lower-triangular:
L = inv(E1)*inv(E2)
3×3 Matrix{Float64}:
 1.0   0.0  0.0
 1.0   1.0  0.0
 3.0  -1.0  1.0
As mentioned above, this is the same as the inverse of $E = E_2 E_1$:
inv(E2*E1)
3×3 Matrix{Float64}:
 1.0   0.0  0.0
 1.0   1.0  0.0
 3.0  -1.0  1.0
E1
3×3 Matrix{Int64}:
  1  0  0
 -1  1  0
 -3  0  1
The final result, therefore, is that Gaussian elimination (without row swaps) can be viewed as a factorization of the original matrix $A$
$$
A = LU
$$
into a product of lower- and upper-triangular matrices. (Furthermore, although we didn't comment on this above, $L$ is always 1 along its diagonal.) This factorization is called the LU factorization of $A$. (It's why we used the lu function in Julia above.) When a computer performs Gaussian elimination, what it computes are the $L$ and $U$ factors.
What this accomplishes is to break a complicated matrix $A$ into much simpler pieces $L$ and $U$. It may not seem at first that $L$ and $U$ are that much simpler than $A$, but they are: lots of operations that are very difficult with $A$, like solving equations or computing the determinant, become easy once you know $L$ and $U$.
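For example (a small sketch, not part of the original notes): since $L$ has 1's on its diagonal, the determinant of $A$ is just the product of the diagonal entries (the pivots) of $U$:
F = lu(A, NoPivot())     # factor the 3×3 A from above, without row re-ordering
prod(diag(F.U)) ≈ det(A) # det(A) = det(L)⋅det(U) = 1 × (product of the pivots) = -2 here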
Above, we used the trick that we can invert a single elimination matrix $E$ just by flipping the signs below the diagonal (to reverse the steps). This doesn't work for a general lower (or upper) triangular matrix.
For example, even for the $L$ matrix computed above, it doesn't work:
L
3×3 Matrix{Float64}:
 1.0   0.0  0.0
 1.0   1.0  0.0
 3.0  -1.0  1.0
inv(L)
3×3 Matrix{Float64}:
  1.0  0.0  0.0
 -1.0  1.0  0.0
 -4.0  1.0  1.0
Notice that the 3 in the lower-left corner of $L$ changes to a -4 in the lower-left of $L^{-1}$!
The reason it works for the "elementary" $E$ matrices representing elimination of a single column of $A$, but not for $L$ (or other lower-triangular matrices in general), is essentially that operations on different columns affect one another (the elimination steps don't commute).
In fact, there is no easy trick to invert a general lower/upper triangular matrix quickly. But you shouldn't have to! If you see an expression like $L^{-1} b$, you shouldn't compute $L^{-1}$ explicitly! Instead, solve $Lc = b$ by forward-substitution.
Similarly, if you see $U^{-1} c$ for an upper-triangular matrix, don't try to compute $U^{-1}$. Instead, solve $Ux=c$ by back-substitution.
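As a concrete sketch of this workflow (the right-hand side rhs below is made up, just for illustration), using the 3×3 A and its LU factors from above:
F = lu(A, NoPivot()) # LU factorization object; F.L and F.U are the triangular factors
rhs = [7, 1, 2]      # a hypothetical right-hand side
c = F.L \ rhs        # forward-substitution: solve Lc = rhs
x = F.U \ c          # back-substitution: solve Ux = c
A * x ≈ rhs          # together, the two triangular solves give the solution of Ax = rhs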
How do we find $A^{-1}$? The key is the equation $A A^{-1} = I$, which looks just like $AX=B$ for the right-hand sides consisting of the columns of the identity matrix, i.e. the unit vectors. So, we just solve $Ax=e_i$ for $i=1,\ldots,5$, or equivalently do A \ I in Julia. Of course, Julia comes with a built-in function inv(A) for computing $A^{-1}$ as well:
A = [4 -2 -7 -4 -8
9 -6 -6 -1 -5
-2 -9 3 -5 2
9 7 -9 5 -8
-1 6 -3 9 6] # a randomly chosen 5x5 matrix
5×5 Matrix{Int64}:
  4  -2  -7  -4  -8
  9  -6  -6  -1  -5
 -2  -9   3  -5   2
  9   7  -9   5  -8
 -1   6  -3   9   6
Ainv = A \ I(5)
5×5 Matrix{Float64}:
  0.0109991   0.529789  -0.908341  -0.635197  -0.0879927
  0.131989    0.35747   -0.900092  -0.622365  -0.055912
 -0.235564   -0.179652   0.370302   0.353804  -0.11549
 -0.301558   -0.69172    1.48701    1.16499    0.0791323
  0.2044      0.678582  -1.29667   -1.05408    0.0314696
Ainv - inv(A)
5×5 Matrix{Float64}:
 -1.73472e-18  -2.22045e-16   0.0           -1.11022e-16   0.0
  2.77556e-17   5.55112e-17  -1.11022e-16   -2.22045e-16   1.38778e-17
  2.77556e-17   1.11022e-16  -2.22045e-16   -1.11022e-16   1.38778e-17
  0.0          -1.11022e-16   0.0            0.0          -1.38778e-17
  0.0          -1.11022e-16   0.0            0.0            0.0
The difference is just roundoff errors. We can also check approximate equality (ignoring roundoff differences) with ≈ (typed by \approx followed by a tab):
Ainv ≈ inv(A)
true
Ainv * A
5×5 Matrix{Float64}:
  1.0           1.11022e-16   1.72085e-15  -6.66134e-16   1.88738e-15
 -1.70697e-15   1.0           1.9984e-15   -6.66134e-16   1.33227e-15
  4.44089e-16   1.11022e-15   1.0           4.44089e-16  -4.44089e-16
  1.19349e-15  -5.55112e-17   2.77556e-17   1.0           1.72085e-15
 -1.45023e-15  -1.05471e-15  -8.04912e-16  -2.77556e-16   1.0
(Again, we get $I$ up to roundoff errors because the computer does arithmetic only to 15–16 significant digits.)
A * Ainv
5×5 Matrix{Float64}:
  1.0          -8.88178e-16   1.77636e-15   0.0          -5.55112e-17
  0.0           1.0          -8.88178e-16   0.0          -5.55112e-17
 -1.66533e-16   4.44089e-16   1.0           2.66454e-15   1.38778e-17
  0.0          -8.88178e-16   0.0           1.0          -5.55112e-17
  6.66134e-16   1.77636e-15  -3.55271e-15  -3.55271e-15   1.0
Normally, $AB \ne BA$ for two matrices $A$ and $B$. Why can we multiply $A$ by $A^{-1}$ on either the left or right and get the same answer $I$? It is fairly easy to see why:
$$ A A^{-1} = I \implies A A^{-1} A = IA = A = A (A^{-1} A) $$
Since $A (A^{-1} A) = A$, and $A$ is non-singular (so there is a unique solution to this system of equations), we must have $A^{-1} A = I$.
[A\b Ainv*b] # print the two results side-by-side
5×2 Matrix{Float64}:
  0.505958   0.505958
 -0.928506  -0.928506
  2.16407    2.16407
  1.46166    1.46166
 -1.26428   -1.26428
Matrix inverses are funny, however:
Inverse matrices are very convenient in analytical manipulations, because they allow you to move matrices from one side to the other of equations easily.
Inverse matrices are almost never computed in "serious" numerical calculations. Whenever you see $A^{-1} B$ (or $A^{-1} b$), when you go to implement it on a computer you should read $A^{-1} B$ as "solve $AX = B$ by some method," e.g. solve it by A \ B or by first computing the LU factorization of $A$ and then using it to solve $AX = B$.
One reason that you don't usually compute inverse matrices is that it is wasteful: once you have $A=LU$ (later we will generalize this to "$PA = LU$"), you can solve $AX=B$ directly without bothering to find $A^{-1}$, and computing $A^{-1}$ requires much more work if you only have to solve a few right-hand sides.
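For instance (a sketch with made-up right-hand sides): Julia's lu returns a factorization object that can be re-used with \ for as many right-hand sides as you like, without ever forming $A^{-1}$:
F = lu(A)            # factor the 5×5 A once (with row re-ordering, i.e. PA = LU)
b1 = [1, 2, 3, 4, 5] # hypothetical right-hand sides, just for illustration
b2 = [5, 4, 3, 2, 1]
x1 = F \ b1          # each solve re-uses the stored factors (much cheaper than factoring again)
x2 = F \ b2
A * x1 ≈ b1 && A * x2 ≈ b2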
Another reason is that for many special matrices, there are ways to solve $AX=B$ much more quickly than you can find $A^{-1}$. For example, many large matrices in practice are sparse (mostly zero), and often for sparse matrices you can arrange for $L$ and $U$ to be sparse too. Sparse matrices are much more efficient to work with than general "dense" matrices because you don't have to multiply (or even store) the zeros. Even if $A$ is sparse, however, $A^{-1}$ is usually non-sparse, so you lose the special efficiency of sparsity if you compute the inverse matrix.
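As a small illustration of that last point (a sketch using Julia's standard SparseArrays package; the tridiagonal matrix here is just a made-up example), the matrix itself stores only a few nonzero entries, but its inverse is completely dense:
using SparseArrays
n = 100
S = spdiagm(-1 => fill(-1.0, n-1), 0 => fill(2.0, n), 1 => fill(-1.0, n-1)) # a sparse tridiagonal matrix
nnz(S)                         # only ≈ 3n entries are stored
count(!iszero, inv(Matrix(S))) # the inverse has all n² entries nonzero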