One of the first things that most students learn about in linear algebra is the determinant of a matrix. Lots of useful formulas for 2×2 and 3×3 matrices can be expressed in terms of determinants, and determinants played a central role in linear algebra 100 years ago when most matrices were tiny.
Nowadays, determinants are much less useful as a practical tool, although they still occasionally show up. Determinant-related formulas are also useful in proving theorems in linear algebra. The basic computational problem, however, is that the determinant formulas don't scale — for a big matrix, there is almost always a better way of computing something than using explicit determinants, cofactors, Cramer's rule, and other tricks useful for small matrices.
Still, it is important to know what determinants are, and their basic properties. In 18.06, we mainly use determinants as a conceptual tool to help us understand eigenvalues via the characteristic polynomial — although, again, this is not a practical computational tool for eigenvalues, which are nowadays computed by very different methods.
In high school, you may have learned some explicit formulas for determinants of small matrices.
These formulas quickly become computationally useless for larger matrices (although there is still a theoretical formula, which we'll give at the end), but it is nice to see a few of them.
As the matrices get larger, the computer is much better at writing these formulas down than we are. We'll use the Symbolics.jl package for symbolic algebra in Julia to write out some determinant formulas with symbols, not numbers:
using Symbolics, LinearAlgebra
We can easily define a $3 \times 3$ matrix of symbolic variables in $a_{i,j}$ format and take its determinant:
@variables a[1:3, 1:3]
A = collect(a)
expand(det(A))
… but these $a_{i,j}$ variables can be a little hard to read. Let's define some more colorful symbols:
@variables a b c d 🍎 🍌 🍪 🥟 α β γ δ ♣ ♡ ♠ ♢
A = [a b
🍎 🍌]
Here is the determinant of a $2\times 2$ matrix:
det(A)
and here is $3 \times 3$:
A = [a b c
🍎 🍌 🍪
α β γ]
expand(det(A))
Notice that the terms in the determinant contain exactly one value from each row and one from each column.
This pattern continues for $4\times 4$, which is rapidly getting messier:
A = [ a b c d
🍎 🍌 🍪 🥟
α β γ δ
♣ ♡ ♠ ♢]
expand(det(A))
By $n=5$ these formulas have gotten ridiculous:
@variables a[1:5,1:5]
A = collect(a)
expand(det(A))
In fact, the number of terms in these formulas increases faster than exponentially with $n$, as we shall see. If we want to use determinants at all, we want a better way to think about them (and compute them) if possible.
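To get a feel for this growth: the number of terms turns out to be $n!$, one per permutation, as the formula at the end will show. A quick tabulation (using only Base Julia):

[n => factorial(n) for n in 1:10]  # number of terms in the n×n determinant formula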
@variables a # make this back to an ordinary scalar variable
The property that most students learn about determinants of 2×2 and 3×3 matrices is this: given a square matrix A, the determinant det(A) is some number that is zero if and only if the matrix is singular.
For example, the following matrix is not singular, and its determinant (det(A) in Julia) is nonzero:
A = [1 3
2 4]
det(A)
-2.0
(You may even remember the formula for the 2×2 determinant: $1 \times 4 - 3 \times 2 = -2$.)
But this matrix is singular (the second column is twice the first), and so its determinant is zero:
A = [1 2
2 4]
det(A)
0.0
In 18.06, we now have another way to check whether a matrix is singular: perform Gaussian elimination, and then check whether any pivots (diagonal entries of U) are zero.
But this gives us an obvious way to construct a single determinant-like number: just multiply the pivots together, and the result will be zero if and only if the matrix is singular.
In fact, this intuition turns out to be almost exactly the right guess: the determinant is $\pm$ the product of the pivots, with the sign depending on row swaps (as we'll see below).
We can check it for a random matrix:
A = randn(5,5)
det(A)
-1.9357572476241076
L,U = lu(A, NoPivot()) # LU without row swaps
U
5×5 Matrix{Float64}:
 -2.72812   1.17583   -1.60671    1.14719   0.456614
  0.0      -0.307043  -1.2456    -0.365474  0.592953
  0.0       0.0       -6.08018   -0.940134  2.8862
  0.0       0.0        0.0       -0.791331  0.368806
  0.0       0.0        0.0        0.0      -0.480301
prod(diag(U)) # the product of the diagonal elements of U
-1.9357572476241098
Note that this matches det(A) (up to roundoff errors in the last few digits).
This immediately gives you a hint of why the determinant is not such a useful computational tool as you might have thought:
The most efficient way to compute a determinant, in general, is to do Gaussian elimination and then multiply the pivots together.
Once you have done elimination, you already know whether the matrix is singular and you can already solve $Ax=b$ efficiently, so the determinant is mostly superfluous.
We'll discuss some actual determinant applications later.
Although we could use the "product of the pivots" as the definition of the determinant (at least for matrices where elimination involves no row swaps), it is more typical to build up the definition of the determinant from more basic properties, and to get the product of the pivots as a consequence. We will do that now.
The following three properties are actually sufficient to uniquely define the determinant of any matrix, and are taken from Strang's Introduction to Linear Algebra, section 5.1.
Therefore, we don't derive these properties: they are axioms that serve to define the determinant operation.
It is clear that the identity matrix $I$ is not singular, and all its pivots are 1. A reasonable starting point for defining determinants, therefore, is to require (axiom 1):

$$\det I = 1 \; .$$
For example:
I₅ = I(5) * 1
5×5 Diagonal{Int64, Vector{Int64}}:
 1  ⋅  ⋅  ⋅  ⋅
 ⋅  1  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅
 ⋅  ⋅  ⋅  1  ⋅
 ⋅  ⋅  ⋅  ⋅  1
det(I₅)
1
The second key property (axiom 2) is: swapping two rows of the matrix flips the sign of its determinant.
It's easy to see this for the high-school $2\times 2$ matrix formula:
det([a b
🍎 🍌])
det([🍎 🍌
a b])
Or for the identity matrix, where swapping the first two rows gives a determinant $-1$:
I₅_swapped = I₅[ [2,1,3,4,5], : ]
5×5 SparseArrays.SparseMatrixCSC{Int64, Int64} with 5 stored entries:
 ⋅  1  ⋅  ⋅  ⋅
 1  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅
 ⋅  ⋅  ⋅  1  ⋅
 ⋅  ⋅  ⋅  ⋅  1
det(I₅_swapped)
-1.0
As another example, let's try it with a random $5 \times 5$ matrix $A$:
A = rand(-3:3, 5,5)
5×5 Matrix{Int64}:
  1  -3   3   2  -1
  0  -1   0   2   1
 -2  -3   2   1   1
 -2  -3   0  -2  -3
  3   0  -2  -2   1
Swapping the first two rows gives the matrix $B$:
B = A[ [2,1,3,4,5], : ]
5×5 Matrix{Int64}:
  0  -1   0   2   1
  1  -3   3   2  -1
 -2  -3   2   1   1
 -2  -3   0  -2  -3
  3   0  -2  -2   1
Hence the determinants are equal and opposite:
det(A), det(B)
(372.0, -371.9999999999999)
(Up to roundoff errors, of course.)
The determinant will not be a linear operation on the whole matrix: $\det(A+B) \ne \det A + \det B$!! But we would like it to be linear with respect to operations on individual rows. This is axiom 3, and it means two things. First, if we multiply a single row of $A$ by a scalar $\alpha$ (leaving the other rows alone), then the determinant is multiplied by $\alpha$.
This axiom actually makes a lot of sense if you think about the example of the identity matrix. Multiplying the first row of $I$ by $\alpha$ leads to the matrix:
$$ \begin{pmatrix} \alpha & 0 & 0 & 0 & \cdots \\ 0 & 1 & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & \cdots \\ 0 & 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \\ \end{pmatrix} $$

The determinant of this matrix is exactly $\alpha$! As $\alpha \to 0$, this matrix becomes singular, and the determinant goes to zero at the same rate. It is also consistent with our "product of the pivots" intuitive guess above, because the pivots here are $(\alpha, 1, 1, \cdots)$.
We can also try this with our random matrix $A$ from above. Let's multiply the second row by 2:
C = copy(A)
C[2,:] = 2*A[2,:]
C
5×5 Matrix{Int64}:
  1  -3   3   2  -1
  0  -2   0   4   2
 -2  -3   2   1   1
 -2  -3   0  -2  -3
  3   0  -2  -2   1
det(A), det(C)
(372.0, 744.0)
As expected, the determinant doubles.
As a consequence of this, if you multiply an entire $m\times m$ matrix $A$ by $\alpha$, we obtain:

$$\det(\alpha A) = \alpha^m \det A \; .$$
This is not an axiom, it is a consequence of the axiom above: we pick up a factor of $\alpha$ for each row that we scale.
For our $5 \times 5$ matrix $A$, this means that $\det(2A) = 2^5 \det A = 32 \det A$:
det(2A) / det(A)
32.0
If we think back to our high-school formulas, this is consistent: the determinant terms each had exactly one factor from each row. So, if we multiply any row by $\alpha$, then the determinant multiplies by $\alpha$, and if we multiply all of the rows by $\alpha$ then we get a factor of $\alpha^m$.
There is a second property of linearity, corresponding to vector addition: if a single row of $A$ is a sum of two row vectors, the determinant is the sum of the two corresponding determinants (with all the other rows unchanged).
This is easier to explain with an example:
$$ \det \begin{pmatrix} a + a' & b + b' \\ c & d \end{pmatrix} = \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} + \det \begin{pmatrix} a' & b' \\ c & d \end{pmatrix} \; . $$

Or, in terms of our matrix $A$ from above, let's add $(1,2,3,4,5)$ to the first row:
A
5×5 Matrix{Int64}:
  1  -3   3   2  -1
  0  -1   0   2   1
 -2  -3   2   1   1
 -2  -3   0  -2  -3
  3   0  -2  -2   1
[1,0,0,0,0] * [1 2 3 4 5] # = column * row = outer product
5×5 Matrix{Int64}:
 1  2  3  4  5
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
det(A + [1,0,0,0,0] * [1 2 3 4 5])
452.0
This should be the same as $\det A$ plus the determinant of $A$ with the first row replaced by $(1,2,3,4,5)$:
A′ = copy(A)
A′[1,:] = [1,2,3,4,5] # replace first row
A′
5×5 Matrix{Int64}:
  1   2   3   4   5
  0  -1   0   2   1
 -2  -3   2   1   1
 -2  -3   0  -2  -3
  3   0  -2  -2   1
det(A) + det(A′)
452.0
Yup, it matches (up to roundoff errors, of course).
The following properties can be derived from the above 3, and are quite useful to know. Again, the numbering follows Strang, section 5.1. First (property 4): if two rows of $A$ are equal, then $\det A = 0$.
It's easy to see why this follows from property 2: if we swap two equal rows, the matrix doesn't change, but the determinant must flip sign. But this means:
$$\det A = -\det A \implies \det A = 0$$

For example:
det([ 1 2 3
4 5 6
1 2 3 ])
0.0
This property also makes sense if our expectation is that the determinant is zero for singular matrices: if two rows are equal, the matrix is singular.
Suppose we take a matrix $A$, and subtract (or add) a multiple of one row from another. For example:
$$ \det \begin{pmatrix} a & b \\ c - \alpha a & d - \alpha b \end{pmatrix} = \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} - \alpha \det \begin{pmatrix} a & b \\ a & b \end{pmatrix} = \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} + 0 $$

Here, we applied axiom 3 (linearity), and then property 4 (repeated rows).
The same thing happens for any size of matrix.
But this is precisely the kind of operation that we perform during Gaussian elimination. It has the crucial implications (property 5):

- Elimination operations on rows don't change the determinant.
- Gaussian elimination without row swaps doesn't change the determinant.
And, by axiom 2:

- Gaussian elimination *with* row swaps only flips the sign of the determinant, once per swap.
For example:
L, U = lu(A, NoPivot()) # elimination without row swaps
U
5×5 Matrix{Float64}:
 1.0  -3.0  3.0    2.0   -1.0
 0.0  -1.0  0.0    2.0    1.0
 0.0   0.0  8.0  -13.0  -10.0
 0.0   0.0  0.0   -6.25  -6.5
 0.0   0.0  0.0    0.0    7.44
det(A), det(U)
(372.0, 372.0)
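We can also check the sign-flip rule with Julia's default *pivoted* LU factorization (a quick sketch: the factorization object's P field is the permutation matrix, so det(F.P) supplies the $\pm 1$ from the row swaps):

F = lu(A)                   # LU *with* row swaps: F.P * A = F.L * F.U
prod(diag(F.U)) * det(F.P)  # product of pivots × (±1 from the swaps): matches det(A)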
Next (property 6): the determinant is zero if $A$ has a row of zeros. This is easy to see from axiom 3 (linearity): if we multiply the row of zeros by zero, it doesn't change the matrix but multiplies the determinant by zero, hence:

$$ 0 \times \det A = \det A \implies \det A = 0 $$

For example:
det([1 2 3
4 5 6
0 0 0])
0.0
Next (property 7): the determinant of a triangular matrix is the product of its diagonal entries. This is another incredibly useful property. To see this, suppose we have an upper-triangular matrix $U$ with diagonal entries $d_1, d_2, \ldots, d_m$. Then:
1. Eliminate "upward" above the pivots to get a diagonal matrix $D$. This doesn't change the determinant, by property 5.
2. Pull out each diagonal element by axiom 3 (linearity) until you get the identity matrix $I$, whose determinant is 1 by axiom 1:

$$ \det U = \det D = d_1 d_2 \cdots d_m \det I = d_1 d_2 \cdots d_m \; , $$

which is precisely the product of the diagonals.
If we have a zero diagonal entry, we can't eliminate upward above it (we can't divide by a zero "pivot"). But in that case we end up with a row of zeros after eliminating above the other diagonal entries, and by property 6 we get a zero determinant. So it still matches the product of the diagonal entries.
Similarly for a lower triangular matrix, except that we eliminate "downward".
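As a quick check of the zero-pivot case (a small made-up example): this upper-triangular matrix has a zero on its diagonal, so it is singular, and its determinant matches the (zero) product of its diagonal entries:

U₀ = [1 2 3
      0 0 4
      0 0 5]             # upper triangular, with a zero in the (2,2) pivot position
det(U₀), prod(diag(U₀))  # both zero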
We already saw an example of this earlier, but let's do it again. We got our $U$ matrix from elimination on $A$:
U
5×5 Matrix{Float64}:
 1.0  -3.0  3.0    2.0   -1.0
 0.0  -1.0  0.0    2.0    1.0
 0.0   0.0  8.0  -13.0  -10.0
 0.0   0.0  0.0   -6.25  -6.5
 0.0   0.0  0.0    0.0    7.44
Its diagonal entries are:
diag(U)
5-element Vector{Float64}:
  1.0
 -1.0
  8.0
 -6.25
  7.4399999999999995
The product of these is:
prod(diag(U))
372.0
which matches $\det U$ (and $\det A$):
det(U), det(A)
(372.0, 372.0)
If we do need to compute the determinant, this gives us a very practical way to do it: compute det(A) by taking the product of the pivots after elimination, with a sign flip for every row swap.
This is, in fact, exactly what the Julia det function does, as you can check by looking at the source code:
@which det(A)
@which det(UpperTriangular(U))
From the source code, this calls det on the LU factorization object returned by lu(A) (called lufact in older versions of Julia), which calls:

@which det(lu(A))
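To make the "product of the pivots" recipe concrete, here is a minimal determinant function of our own (the name mydet and the code are an illustrative sketch, not Julia's actual implementation): it performs elimination with partial pivoting, flips the sign once per row swap, and multiplies the pivots at the end.

function mydet(A)
    U = float.(A)          # floating-point working copy for elimination
    m = size(U, 1)
    sign = 1.0
    for j = 1:m
        i = argmax(abs.(U[j:end, j])) + j - 1  # partial pivoting: largest pivot in column j
        if i != j
            U[[i, j], :] = U[[j, i], :]        # row swap ⟹ sign flip (axiom 2)
            sign = -sign
        end
        U[j, j] == 0 && return 0.0             # zero pivot ⟹ singular ⟹ det = 0
        for k = j+1:m
            U[k, :] -= (U[k, j] / U[j, j]) * U[j, :]  # eliminate below the pivot
        end
    end
    return sign * prod(diag(U))                # ± product of the pivots
end

mydet(A) ≈ det(A)  # should be true, up to roundoff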
Next (property 8): $\det A = 0$ if and only if $A$ is singular. This follows from property 7. Since the determinant is ± the product of the pivots, we get zero if and only if there is a zero pivot, corresponding to a singular matrix.
Next (property 9): $\det(AB) = \det(A) \det(B)$. This is an amazing property of determinants, and probably the least obvious.
A nice way to show this (from Strang's book) is simply to check that $\det(AB)/\det(B)$, viewed as a function of $A$, satisfies axioms 1, 2, and 3. If it does, then it must equal $\det A$, and we are done! Let's check:

- Axiom 1: if $A = I$, we get $\det(IB)/\det(B) = \det(B)/\det(B) = 1$. ✓
- Axiom 2: swapping two rows of $A$ swaps the same two rows of $AB$, which flips the sign of $\det(AB)/\det(B)$. ✓
- Axiom 3: row $i$ of $AB$ is (row $i$ of $A$) times $B$, so $\det(AB)/\det(B)$ is linear in each row of $A$. ✓
Let's try it:
B = rand(-3:3, 5,5)
5×5 Matrix{Int64}:
  2   2   0   3   3
  0  -2  -2  -2  -3
 -3   2   3   2   0
  3   3  -2  -2  -1
  3  -3   0   1  -2
det(A), det(B)
(372.0, 556.0)
det(A*B), det(A)*det(B)
(206831.99999999994, 206832.0)
This rule has important consequences for matrix inverses. First:
$$\det (A^{-1}) = 1 / \det(A)$$

Proof: $1 = \det(I) = \det(A A^{-1}) = \det(A) \det(A^{-1})$.
For example:
det(inv(A)), 1/det(A)
(0.002688172043010752, 0.002688172043010753)
Recall from the last lecture that $X A X^{-1}$ corresponds simply to a change of basis applied to $A$. (We will later call this a matrix similar to $A$.) Now we know:
$$ \det(X A X^{-1}) = \det(X) \det(A) \det(X^{-1}) = \det(A) \; . $$

That is, a change of basis doesn't change the determinant.
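We can try this numerically, with an arbitrary change-of-basis matrix X (chosen at random here, so it is almost surely invertible):

X = randn(5, 5)               # random change-of-basis matrix
det(X * A * inv(X)), det(A)   # equal, up to roundoff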
Finally (property 10): $\det(A^T) = \det A$. This is another non-obvious, but very important, property of determinants. It is relatively easy to see from properties 7 and 9, however.
In particular, factorize $PA = LU$, or $A = P^T L U$. Then, from property 9:
$$ \det(A) = \det(P^T) \det(L) \det(U) = \det(P^T) \det(U) = \det(P^T) \times \mbox{(product of pivots)} \; , $$

where we have used the fact that $\det L = 1$ since the diagonal entries of $L$ are all 1's. But we also have:
$$ \det(A^T) = \det(U^T L^T P) = \det(U^T) \det(L^T) \det(P) = \det(P) \times \mbox{(product of pivots)} $$

where we have used the fact that $U$ and $U^T$ have the same diagonal entries and hence the same determinant ($U$ is upper triangular so $U^T$ is lower triangular), and similarly for $L$.
So the only difference is $\det P$ vs. $\det P^T$. But recall that $P$ is a permutation matrix: just a re-ordering of the rows of $I$, so by axioms 1 and 2 we must have $$\det P = \pm 1$$ (depending on how many row swaps there were). Furthermore, recall that $P^T P = I$: the permutation matrix $P$ is orthogonal, since its rows/columns (a re-ordering of the rows/columns of $I$) are orthonormal. By property 9, this means that $$ 1 = \det I = \det (P^T P) = \det(P^T) \det(P) $$ Since $\det P = \pm 1$, this means that $\det(P^T) = \det(P)$.
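A quick numerical check, using an arbitrarily chosen permutation of the rows of $I$:

P = Matrix{Int}(I, 5, 5)[[2, 4, 1, 3, 5], :]  # a 5×5 permutation matrix
det(P), det(P'), P'P == I                     # both determinants are ±1 (here −1), and PᵀP = I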
It follows that
$$ \det A = \det A^T = \pm \mbox{(product of pivots)} $$

For example:
det(A), det(A')
(372.0, 372.0)
Ignoring formulas that are mainly useful for tiny matrices (e.g. Cramer's rule, a formula for $A^{-1}$; see Strang, section 5.3), here are some examples of real uses of determinants, even for large matrices:
One such use is in multidimensional integration, where a determinant (the Jacobian determinant) appears when you change variables. The reason a determinant arises here is that, more generally, $|\det(A)|$ is the volume of a parallelepiped ("box") whose edges are given by the columns of $A$.
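For example (a small made-up box): the columns of the matrix below are the edge vectors of a parallelepiped, and the absolute value of the determinant is its volume:

edges = [1 1 0
         0 1 0
         0 0 2]    # columns = edge vectors of the box
abs(det(edges))    # volume = 2.0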
Integration may sound like something that only happens in a few dimensions (= tiny matrices), but extremely high-dimensional (even infinite-dimensional) integrals appear in statistics, quantum field theory, bioinformatics, and other fields.
This is no doubt an incomplete list. Nevertheless, although determinants are a much more marginal topic in modern linear algebra than they were in the 19th century, they have hardly disappeared.
You probably learned a neat formula for the determinant of a $2\times2$ matrix at some point:
$$ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc \; . $$

You might have even learned a formula for $3\times3$ matrices. You might be hoping, therefore, that there would be an extension of this "nice" formula (which seems a lot easier than doing elimination to get pivots) to arbitrary matrices. There is!
Here it is (see Strang, section 5.2):
$$ \det A = \sum_{\mbox{permutations }p} \operatorname{sign}(p) \times (\mbox{product of diagonals of }A\mbox{ with columns permuted by }p) $$

The important thing to know is that you have to consider all permutations (re-orderings) of $(1,2,3,\ldots,n)$. (The sign of a permutation is $+1$ if it is built from an even number of swaps and $-1$ if odd.) There are $n! = n (n-1)(n-2)\cdots 1$ ($n$ factorial) re-orderings.
That means that this formula requires $\sim n \times n!$ scalar operations, which is worse than exponential in $n$. This is far more expensive than elimination ($\sim n^3$), making this formula computationally useless for $n > 3$.
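Just to see it work, here is a direct (and hopelessly inefficient) implementation of the permutation formula. This sketch assumes the Combinatorics.jl package is available (it supplies permutations and levicivita, the $\pm 1$ sign of a permutation); the name bigdet is ours:

using Combinatorics

function bigdet(A)
    n = size(A, 1)
    # sum over all n! permutations p of the columns, with the sign of each permutation:
    sum(levicivita(p) * prod(A[i, p[i]] for i = 1:n) for p in permutations(1:n))
end

bigdet(A), det(A)  # same value (A has integer entries, so bigdet is exact)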
(There is also another computationally useless formula involving minors and cofactors; see Strang, section 5.2.)
The permutation formula is still sometimes useful conceptually, however.