The Gauss–Jordan algorithm is a technique for computing a matrix inverse by hand. Nowadays, you should hardly ever compute a matrix inverse, even on a computer, but Gauss–Jordan is still useful to go over:
It helps us to understand when and why an inverse matrix exists.
It gives us yet another example to help us understand the structure of elimination operations.
using LinearAlgebra # as usual, we'll load this package
The inverse of a linear operator $A$ is the operator that "undoes" the action of $A$:
$$ \boxed{A^{-1}(Ax) = x} $$ for any $x$. Equivalently, $\boxed{Ax=b \implies x = A^{-1} b}$. This means that $A^{-1}$ doesn't always exist, since for non-square matrices or matrices with one or more "zero pivots" we can't always solve $Ax=b$ (we'd divide by zero during backsubstitution). It is also easy to see that $\boxed{(A^{-1})^{-1} = A}$, i.e. that $A$ undoes the action of $A^{-1}$.
Equivalently, $$ \boxed{AA^{-1} = A^{-1} A = I} $$ where $I$ is the m×m identity matrix — in linear algebra, we typically infer the size of $I$ from context, but if it is ambiguous we might write $I_m$.
It is easy to see that the inverse of a product $AB$ is the product of the inverses in reverse order: $\boxed{(AB)^{-1} = B^{-1} A^{-1}}$. Intuitively, when you reverse a sequence of operations, you always need to retrace your steps in backwards order. Explicitly: $$ (AB)^{-1} AB = B^{-1} \underbrace{A^{-1} A}_I B = B^{-1} B = I \, . $$
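As a quick numerical sanity check of this identity, here is a short sketch (the random matrices are arbitrary; shifting by 3I just makes them safely invertible):

A = rand(3, 3) + 3I   # an arbitrary invertible matrix (shifted away from singularity)
B = rand(3, 3) + 3I
inv(A * B) ≈ inv(B) * inv(A)   # true (up to roundoff)
inv(A * B) ≈ inv(A) * inv(B)   # almost always false: the order matters!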
For example, we saw that Gaussian elimination corresponded to the factorization $A = LU$, where $U$ is the result of elimination and $L$ is simply a record of the elimination steps. Then $$ Ax = b \implies x = A^{-1} b = (LU)^{-1} b = \underbrace{U^{-1} \underbrace{ L^{-1} b }_\mbox{forward substitution}}_\mbox{backsubstitution} \, . $$
In general, we rarely if ever compute inverses explicitly.
More on this below. Instead, inverses are mostly a conceptual tool to move operators/matrices around in equations. Once we have the equations in the form that we want, we then carry out the computations in some other way.
Inverses allow us to "divide by matrices", but we always have to be clear about whether we are dividing on the left or on the right. The following notations can be convenient, and are used in computer software like Julia and Matlab and elsewhere for square invertible matrices $A$:
$$ B / A = BA^{-1}, \\ A \backslash B = A^{-1} B . $$
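For example, here is a small sketch of these notations in Julia (the matrices are made up for illustration); both \ and / solve the corresponding system rather than forming $A^{-1}$ explicitly:

A = [2.0 1.0; 1.0 3.0]   # an arbitrary invertible matrix
B = [1.0 0.0; 2.0 1.0]
b = [1.0, 2.0]
x = A \ b                # x = A⁻¹b, computed by solving Ax = b
C = B / A                # C = BA⁻¹, computed by solving CA = B
A * x ≈ b, C * A ≈ B     # (true, true)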
The equation $A A^{-1} = I$ actually gives us the algorithm to compute $A^{-1}$. Suppose we denote the columns of $A^{-1} = \begin{pmatrix} x_1 & x_2 & \cdots & x_m \end{pmatrix}$, and the columns of $I = \begin{pmatrix} e_1 & e_2 & \cdots & e_m \end{pmatrix}$.
Then $$ A \underbrace{\begin{pmatrix} x_1 & x_2 & \cdots & x_m \end{pmatrix}}_{A^{-1}} = \begin{pmatrix} A x_1 & A x_2 & \cdots & A x_m \end{pmatrix} = \underbrace{\begin{pmatrix} e_1 & e_2 & \cdots & e_m \end{pmatrix}}_I. $$ (The key fact here is that multiplying $A$ by a matrix on the right is equivalent to multiplying $A$ by each column of that matrix, which you can easily see by writing out the computation.)
In consequence, $A x_k = e_k$, which is a linear equation for the $k$-th column $x_k$ of $A^{-1}$. Equivalently, to find $A^{-1}$ for an $m \times m$ matrix $A$, we must solve $Ax=b$ for $m$ right-hand sides equal to the columns of $I$.
Put another way, for any matrix $B$, $Be_k = k\mbox{-th column of }B$. So the k-th column of $A^{-1}$ is $x_k = A^{-1} e_k$, i.e. the solution to $Ax_k = e_k$.
Ideally, we do Gaussian elimination $A=LU$ once, then compute $x_k = U^{-1} L^{-1} e_k$ by forward+back-substitution for each column of $I$. (This is essentially what the computer does.)
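Here is a rough sketch of that strategy in Julia, using an arbitrary 3×3 example matrix (not from the lecture): factor once with lu, then do one forward+back substitution per column of $I$.

A = [2.0 1.0 1.0; 4.0 -6.0 0.0; -2.0 7.0 2.0]     # an arbitrary invertible example
F = lu(A)                          # one O(m³) factorization (with row swaps: PA = LU)
Id = Matrix{Float64}(I, 3, 3)      # explicit identity; its columns are e₁, e₂, e₃
Ainv = reduce(hcat, [F \ Id[:, k] for k in 1:3])  # solve A xₖ = eₖ for each column
Ainv ≈ inv(A)                      # true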
For example, how might we compute the inverse of the L matrix we got from Gaussian elimination in the last lecture, which should give us $L^{-1} = E$? We solve
$$ \underbrace{\begin{pmatrix} 1 & & \\ 1 & 1 & \\ 3 & -1 & 1 \end{pmatrix}}_L x_k = e_k $$ for $e_1,e_2,e_3$ (the columns of the 3×3 identity $I$).
Let's do it for $e_1$, to find the first column $x_1$ of $L^{-1} = E$: $$ \underbrace{\begin{pmatrix} 1 & & \\ 1 & 1 & \\ 3 & -1 & 1 \end{pmatrix}}_L \underbrace{\begin{pmatrix} a \\ b \\ c \end{pmatrix}}_{x_1} = \underbrace{\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}}_{e_1} $$ By forward substitution (from top to bottom), we get $a = 1$, $1a + 1b = 0 \implies b = -1$, $3a - 1b + 1c = 0 \implies c = -4$, so $\boxed{x_1 = [1, -1, -4]}$. Let's check:
L = [1 0 0
1 1 0
3 -1 1]
3×3 Matrix{Int64}:
 1   0  0
 1   1  0
 3  -1  1
E = L^-1
3×3 Matrix{Float64}:
  1.0  0.0  0.0
 -1.0  1.0  0.0
 -4.0  1.0  1.0
E[:,1] # first column
3-element Vector{Float64}:
  1.0
 -1.0
 -4.0
Yup, the first column is [1, -1, -4]. We could easily get the other two columns as well (left as an exercise).
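(If you'd rather let Julia do the exercise, forward substitution via \ gives the remaining columns of $L^{-1}$, re-using the L defined above:)

x2 = L \ [0, 1, 0]    # 2nd column of E = L⁻¹ (forward substitution with e₂)
x3 = L \ [0, 0, 1]    # 3rd column (e₃)
[x2 x3]               # matches columns 2 and 3 of E above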
Important note: there is no simple formula for the inverse of a triangular matrix like $L$ or $U$! You can invert individual elimination steps $E_k$ by flipping signs, but the product of the elimination steps is not so easy to invert.
(A lot of students get confused by this because Strang's lectures and textbook start by inverting individual elimination steps, which is easier.)
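To see this concretely, here is a quick check (re-using the L above): the naive guess of just flipping the signs of L's below-diagonal entries is not $L^{-1}$:

Lguess = [ 1 0 0
          -1 1 0
          -3 1 1]    # flip the signs of L's below-diagonal entries
Lguess == L^-1       # false: the true (3,1) entry of L⁻¹ is -4, not -3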
Another way to write this is L \ I, which conceptually means "multiply $I$ by $L^{-1}$ on the left", but in Julia it is actually computed without explicitly inverting any matrix, by instead solving with 3 right-hand sides:
L \ I
3×3 Matrix{Float64}:
  1.0  0.0  0.0
 -1.0  1.0  0.0
 -4.0  1.0  1.0
Note that I is a special object defined by Julia's LinearAlgebra package which essentially means an identity matrix whose size is inferred from context. If we want an $m \times m$ identity matrix, we can use I(m):
I(3)
3×3 Diagonal{Bool, Vector{Bool}}:
 1  ⋅  ⋅
 ⋅  1  ⋅
 ⋅  ⋅  1
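For instance, I happily participates in matrix arithmetic with whatever size makes sense (a small aside, re-using the L from above):

I * L == L    # here I acts as a 3×3 identity; its size is inferred from L
L + I         # adds 1 to each diagonal entry of L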
Gauss–Jordan could be viewed as just a trick (primarily for hand calculation) to organize solving $A x_k = e_k$. But it's also nice to think about algebraically — it is a nice application of our "matrix viewpoint" of Gaussian elimination.
The Gauss–Jordan idea, in a nutshell, is: if we do some row operations on $A$ to obtain $I$, then doing the same row operations on $I$ gives $A^{-1}$. Why?
Row operations correspond to multiplying $A$ by some matrix $E=\cdots E_2 E_1$ on the left.
So, doing row operations that turn $A$ into $I$ means that $EA = I$, and hence $E = A^{-1}$.
Doing the same row operations on $I$ is equivalent to multiplying $I$ on the left by the same matrix $E$, giving $EI$. But $EI = E$, and $E = A^{-1}$, so this gives $A^{-1}$!
As usual for Gaussian elimination, to do the same row operations on both $A$ and $I$ we augment $A$ with $I$. That is, we do:
$$ \boxed{ \left(\begin{array}{c|c}A & I\end{array}\right) \underset{\mbox{row ops}}{\longrightarrow} \left(\begin{array}{c|c}I & A^{-1}\end{array}\right) } $$ How do we do row operations to turn $A$ into $I$? Simple:
First, do ordinary Gaussian elimination "downwards" to turn $A$ into $U$ (an upper-triangular matrix).
Then, do Gaussian elimination "upwards" on $U$ to eliminate entries above the diagonal, turning $U$ into a diagonal matrix $D$.
Finally, divide each row of $D$ by the diagonal entry to turn it into $I$.
Let's perform these $A \to I$ elimination steps on $3 \times 3$ matrix $A$: first eliminate down to make $U$, then eliminate up to make $D$, then divide by the diagonals to make $I$:
$$ \underbrace{\begin{pmatrix} \boxed{1} & 4 & 1 \\ 1 & 2 & -1 \\ 3 & 14 & 6 \end{pmatrix}}_A \longrightarrow \begin{pmatrix} \boxed{1} & 4 & 1 \\ 0 & \boxed{-2} & -2 \\ 0 & 2 & 3 \end{pmatrix} \longrightarrow \underbrace{\begin{pmatrix} \boxed{1} & 4 & 1 \\ 0 & \boxed{-2} & -2 \\ 0 & 0 & \boxed{1} \end{pmatrix}}_U \\ \longrightarrow \begin{pmatrix} 1 & 0 & -3 \\ 0 & \boxed{-2} & -2 \\ 0 & 0 & 1 \end{pmatrix} \longrightarrow \underbrace{\begin{pmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & \boxed{1} \end{pmatrix}}_D \longrightarrow \underbrace{\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}}_I $$ No problem! It is easy to see that this will work whenever $A$ has all of its pivots (i.e. it is non-singular).
To get the inverse, we need to augment $A$ with $I$ so that we perform the same elimination steps on both:
$$ \left(\begin{array}{rrr|rrr} \boxed{1} & 4 & 1 & 1 & 0 & 0 \\ 1 & 2 & -1 & 0 & 1 & 0 \\ 3 & 14 & 6 & 0 & 0 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{rrr|rrr} \boxed{1} & 4 & 1 & 1 & 0 & 0 \\ 0 & \boxed{-2} & -2 & -1 & 1 & 0 \\ 0 & 2 & 3 & -3 & 0 & 1 \end{array}\right) \\ \longrightarrow \left(\begin{array}{rrr|rrr} \boxed{1} & 4 & 1 & 1 & 0 & 0 \\ 0 & \boxed{-2} & -2 & -1 & 1 & 0 \\ 0 & 0 & \boxed{1} & -4 & 1 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{rrr|rrr} 1 & 0 & -3 & -1 & 2 & 0 \\ 0 & \boxed{-2} & -2 & -1 & 1 & 0 \\ 0 & 0 & 1 & -4 & 1 & 1 \end{array}\right) \\ \longrightarrow \left(\begin{array}{rrr|rrr} 1 & 0 & 0 & -13 & 5 & 3 \\ 0 & -2 & 0 & -9 & 3 & 2 \\ 0 & 0 & \boxed{1} & -4 & 1 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{rrr|rrr} 1 & 0 & 0 & -13 & 5 & 3 \\ 0 & 1 & 0 & 4.5 & -1.5 & -1 \\ 0 & 0 & 1 & -4 & 1 & 1 \end{array}\right) $$ Whew, this was a lot of work! Did we get the right answer?
A = [1 4 1
1 2 -1
3 14 6]
3×3 Matrix{Int64}:
 1   4   1
 1   2  -1
 3  14   6
A^-1
3×3 Matrix{Float64}:
 -13.0   5.0   3.0
   4.5  -1.5  -1.0
  -4.0   1.0   1.0
Hooray!
(It is really easy to make a mistake during this process.)
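As a sanity check against arithmetic slips, we can also replay exactly the same row operations programmatically on the augmented matrix (a sketch, not how you would normally compute an inverse):

AI = [A Matrix{Float64}(I, 3, 3)]                   # the augmented matrix (A | I)
AI[2,:] -= 1 * AI[1,:];  AI[3,:] -= 3 * AI[1,:]     # eliminate below the 1st pivot
AI[3,:] -= -1 * AI[2,:]                             # eliminate below the 2nd pivot: now (U | ⋯)
AI[1,:] -= -2 * AI[2,:]                             # eliminate above the 2nd pivot
AI[1,:] -= -3 * AI[3,:]; AI[2,:] -= -2 * AI[3,:]    # eliminate above the 3rd pivot: now (D | ⋯)
AI[2,:] /= -2                                       # divide the 2nd row by its pivot: now (I | A⁻¹)
AI[:, 4:6] ≈ A^-1                                   # true: the right half is A⁻¹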
Matrix inverses are funny, however:
Inverse matrices are very convenient in analytical manipulations, because they allow you to move matrices from one side to the other of equations easily.
Inverse matrices are almost never computed in "serious" numerical calculations. Whenever you see $A^{-1} B$ (or $A^{-1} b$), when you go to implement it on a computer you should read $A^{-1} B$ as "solve $AX = B$ by some method", e.g. solve it by A \ B or by first computing the LU factorization of $A$ and then using it to solve $AX = B$.
One reason that you don't usually compute inverse matrices is that it is wasteful: once you have $A=LU$ (later we will generalize this to "$PA = LU$"), you can solve $AX=B$ directly without bothering to find $A^{-1}$, and computing $A^{-1}$ requires much more work if you only have to solve a few right-hand sides.
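For instance (a small sketch re-using the A above, with made-up right-hand sides):

F = lu(A)                         # one O(m³) factorization
B = [1.0 0.0; 0.0 2.0; 1.0 -1.0]  # a few right-hand sides, as the columns of B
X = F \ B                         # each column is just O(m²) forward+back substitution
A * X ≈ B                         # true, and we never formed A⁻¹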
Another reason is that for many special matrices, there are ways to solve $AX=B$ much more quickly than you can find $A^{-1}$. For example, many large matrices in practice are sparse (mostly zero), and often for sparse matrices you can arrange for $L$ and $U$ to be sparse too. Sparse matrices are much more efficient to work with than general "dense" matrices because you don't have to multiply (or even store) the zeros. Even if $A$ is sparse, however, $A^{-1}$ is usually non-sparse, so you lose the special efficiency of sparsity if you compute the inverse matrix.
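Here is a hedged sketch of the sparsity point, using Julia's standard SparseArrays package and a tridiagonal example matrix of our own choosing:

using SparseArrays
n = 1000
S = spdiagm(0 => 2ones(n), 1 => -ones(n-1), -1 => -ones(n-1))  # a sparse tridiagonal matrix
F = lu(S)                         # sparse LU factorization
nnz(S), nnz(F.L) + nnz(F.U)       # both have only O(n) nonzero entries
count(!iszero, inv(Matrix(S)))    # but the inverse is dense: about n² = 10⁶ nonzeros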
For example:
If you see $U^{-1} b$ where $U$ is upper triangular, don't compute $U^{-1}$ explicitly! Just solve $Ux = b$ by back-substitution (from the bottom row up).
If you see $L^{-1} b$ where $L$ is lower triangular, don't compute $L^{-1}$ explicitly! Just solve $Lx = b$ by forward-substitution (from the top row down).
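To make the triangular-solve recipe concrete, here is a minimal back-substitution sketch (the function backsub is our own name, not a library routine):

# Solve Ux = b for upper-triangular U (nonzero diagonal), working from the bottom row up.
function backsub(U, b)
    m = length(b)
    x = zeros(m)
    for i in m:-1:1
        s = b[i]
        for j in i+1:m
            s -= U[i, j] * x[j]   # subtract contributions of already-known unknowns
        end
        x[i] = s / U[i, i]        # divide by the pivot
    end
    return x
end

U = [1 4 1; 0 -2 -2; 0 0 1]       # the U from the worked example above
backsub(U, [1.0, 0.0, 0.0]) ≈ U \ [1.0, 0.0, 0.0]   # true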