using LinearAlgebra
If we can find a solution $x \ne 0$ to
$$ Ax = \lambda x $$then, for this vector, the matrix $A$ acts like a scalar. $x$ is called an eigenvector of $A$, and $\lambda$ is called an eigenvalue.
In fact, for an $m \times m$ matrix $A$, we typically find $m$ linearly independent eigenvectors $x_1,x_2,\ldots,x_m$ and $m$ corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$. Such a matrix is called diagonalizable. Most matrices are diagonalizable; we will deal with the rare "defective" (non-diagonalizable) case later.
Given such a basis of eigenvectors, the key idea for using them is:
Take any vector $x$ and expand it in this basis: $x = c_1 x_1 + \cdots + c_m x_m$, i.e. $x = Xc$ or $c = X^{-1}x$, where $X$ is the matrix whose columns are the eigenvectors.
For each eigenvector $x_k$, the matrix $A$ acts like a scalar $\lambda_k$: multiplication or division corresponds to multiplying or dividing $x_k$ by $\lambda_k$. Solve your problem for each eigenvector by treating $A$ as the scalar $\lambda_k$.
Add up the solutions to your problem: sum the eigenvectors scaled by their new coefficients. That is, multiply the vector of new coefficients by $X$. (These steps are sketched in code below.)
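To make this concrete, here is a minimal sketch of the three steps in Julia, used to solve $Ay = x$ for the small example matrix studied below (the names `λ`, `X`, `c`, `d`, and `y` are just illustrative):
using LinearAlgebra

A = [1 1
     -2 4]      # the example matrix diagonalized below
λ, X = eigen(A) # eigenvalues λ and matrix X of eigenvector columns
x = [1, 0]
c = X \ x       # step 1: expand x in the eigenvector basis (c = X⁻¹x)
d = c ./ λ      # step 2: treat A as the scalar λ for each eigenvector (here, divide)
y = X * d       # step 3: recombine; y solves A*y = x
A * y ≈ x       # check: should print true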
This process of expanding in the eigenvectors, multiplying (or whatever) by λ, and then summing up the eigenvectors times their new coefficients, is expressed algebraically as the following diagonalization of the matrix $A$:
$$ A = X \Lambda X^{-1} $$where $\Lambda$ is the diagonal matrix of the eigenvalues and $X = \begin{pmatrix} x_1 & x_2 & \cdots & x_m \end{pmatrix}$ from above.
For example, consider the matrix
$$ A = \begin{pmatrix} 1 & 1 \\ -2 & 4 \end{pmatrix} $$whose eigenvalues are $\lambda_1 = 2$ and $\lambda_2 = 3$ and whose eigenvectors are $x_1 = \begin{pmatrix}1\\1\end{pmatrix}$ and $x_2 = \begin{pmatrix}1\\2\end{pmatrix}$.
We put these eigenvectors into a matrix $X = \begin{pmatrix} x_1 & x_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$. This matrix is invertible, so the eigenvectors form a basis and $A$ is diagonalizable.
A = [1 1
     -2 4]
eigvals(A)
2-element Vector{Float64}:
 2.0
 3.0
X = [1 1
     1 2]
2×2 Matrix{Int64}:
 1  1
 1  2
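As a quick sanity check, each column of $X$ should indeed be an eigenvector of $A$ with the stated eigenvalue:
A * X[:,1] == 2 * X[:,1], A * X[:,2] == 3 * X[:,2]  # should print (true, true)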
To write any vector $x$ in the basis of eigenvectors, we just want $x = Xc$ or $c = X^{-1} x$. For example:
x = [1,0]
c = X \ x
2-element Vector{Float64}:
  2.0
 -1.0
i.e. $x = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} - \begin{pmatrix} 1 \\ 2 \end{pmatrix}$, which is obviously correct.
Now, multiplying $x$ by $A$ simply multiplies each coefficient by the corresponding eigenvalue: $Ax = \lambda_1 c_1 x_1 + \lambda_2 c_2 x_2$, or equivalently $$ Ax = X \begin{pmatrix} \lambda_1 c_1 \\ \lambda_2 c_2 \end{pmatrix} = X \underbrace{\begin{pmatrix} \lambda_1 & \\ & \lambda_2 \end{pmatrix}}_\Lambda c = X \Lambda X^{-1} x $$
A*x
2-element Vector{Int64}:
  1
 -2
Λ = Diagonal([2, 3])
2×2 Diagonal{Int64, Vector{Int64}}:
 2  ⋅
 ⋅  3
X*Λ*c
2-element Vector{Float64}:
  1.0
 -2.0
X * Λ * inv(X) * x
2-element Vector{Float64}:
  1.0
 -2.0
Since this is true for every $x$, it means $\boxed{A = X \Lambda X^{-1}}$:
X * Λ / X # / X is equivalent to multiplying by inv(X), but is more efficient
2×2 Matrix{Float64}:
  1.0  1.0
 -2.0  4.0
Another way to see this is to consider $$AX = \begin{pmatrix} Ax_1 & Ax_2 \end{pmatrix} = \begin{pmatrix} \lambda_1 x_1 & \lambda_2 x_2 \end{pmatrix} = X \Lambda$$
A*X
2×2 Matrix{Int64}:
 2  3
 2  6
X*Λ
2×2 Matrix{Int64}:
 2  3
 2  6
It follows that $A = X\Lambda X^{-1}$ or $\boxed{\Lambda = X^{-1} A X}$:
X \ A * X
2×2 Matrix{Float64}:
 2.0  0.0
 0.0  3.0
The key point is that this works for any matrix whose eigenvectors form a basis. Such a matrix is called diagonalizable, and it turns out that almost all matrices are; we will deal with the rare exceptions later.
Thus, eigenproblems join LU factorization (Gaussian elimination) and QR factorization (Gram–Schmidt): the eigensolutions are equivalent to a matrix factorization. This is extremely useful in helping us think algebraically about using eigenvalues and eigenvectors, because it lets us work with them all at once.
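Indeed, Julia's LinearAlgebra library reflects this viewpoint: the `eigen` function returns the eigenvalues and eigenvectors packaged together as a factorization object. A quick illustration (the last line is just a numerical check of $A = X\Lambda X^{-1}$):
F = eigen(A)      # an Eigen factorization object
F.values          # the eigenvalues, as from eigvals(A)
F.vectors         # the eigenvector matrix, as from eigvecs(A)
F.vectors * Diagonal(F.values) / F.vectors ≈ A   # A = XΛX⁻¹: should print true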
A key concept in linear algebra, especially for eigenproblems, is a change of basis. If $S$ is an $m\times m$ invertible matrix, then we know its columns form a basis for $\mathbb{R}^m$. Writing any $x$ in this basis is simply $x=Sc$, i.e. coordinates $c = S^{-1} x$ in the new basis (the new "coordinate system").
If we have a matrix $A$ representing a linear operation $y=Ax$, we can also try to write the same linear operation in a new coordinate system. That is, if we write $x=Sc$ and $y=Sd$, then what is the matrix relating $c$ and $d$? This is easy to compute: $$ d = S^{-1} y = S^{-1} Ax = S^{-1} A S c, $$ so the matrix $\boxed{B = S^{-1} A S}$ represents the same operation as $A$ but in the new coordinate system.
In linear algebra, we say that $A$ and $B$ are similar matrices. That is, $B$ is similar to $A$ if and only if $B = S^{-1} A S$ for some invertible matrix $S$. It also follows that $A = SBS^{-1} = (S^{-1})^{-1} B (S^{-1})$, i.e. if $B$ is similar to $A$ via $S$, then $A$ is similar to $B$ via $S^{-1}$.
For example, we can choose a random $2\times2$ matrix $S$ to write our matrix $A$ from above in a new coordinate system:
S = randn(2,2)
2×2 Matrix{Float64}:
  0.34504   1.3915
 -0.486365  1.30546
det(S) # a random S is almost certainly nonsingular
1.1272143106064274
B = S \ A * S # same as inv(S) * A * S, but more efficient since it avoids the explicit inverse
2×2 Matrix{Float64}:
  3.08981   0.11279
 -0.867718  1.91019
A key fact is that similar matrices have the same eigenvalues (but different eigenvectors):
eigvals(B)
2-element Vector{Float64}:
 1.9999999999999991
 3.000000000000001
This is easy to show in a variety of ways. For example, if $Ax=\lambda x$, then since $A=SBS^{-1}$ we have $SBS^{-1} x = \lambda x$, or $$ B(S^{-1}x) = \lambda (S^{-1}x) $$ so $S^{-1}x$ is an eigenvector of $B$ with the same eigenvalue $\lambda$!
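We can verify this numerically with the $A$, $S$, and $B$ from above, using the eigenvector $x_1 = (1,1)$ of $A$ for $\lambda = 2$:
x₁ = [1, 1]                  # eigenvector of A for λ = 2
B * (S \ x₁) ≈ 2 * (S \ x₁)  # S⁻¹x₁ is an eigenvector of B: should print true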
In contrast, multiplying $A$ only on one side by some matrix $S$ will typically change the eigenvalues:
eigvals(A * S)
2-element Vector{ComplexF64}:
 1.1487471858694138 - 2.3331664678277177im
 1.1487471858694138 + 2.3331664678277177im
Another way of seeing this is by looking at the characteristic polynomial:
$$ \det(A - \lambda I) = \det(SBS^{-1} - \lambda I) = \det \left[ S (B - \lambda I) S^{-1} \right] = \det(S) \det(B - \lambda I) \det(S^{-1}) = \det(B - \lambda I) $$i.e. similar matrices have the same characteristic polynomial.
(Here, we used the product rule for determinants and the fact that $\det(S^{-1}) = 1/\det(S)$.)
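As a spot check, $A$ and $B$ should give the same value of the characteristic polynomial at any point; the value $1.5$ below is an arbitrary choice, and `I` is the identity from LinearAlgebra:
det(A - 1.5I) ≈ det(B - 1.5I)  # same characteristic polynomial: should print true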
If we set $\lambda=0$, we see that $\det A = \det B$, i.e. similar matrices have the same determinant.
det(A), det(B)
(6.0, 5.999999999999999)
In this new language, we see that the diagonalization $A = X \Lambda X^{-1}$ is the same as saying that $A$ is similar to a diagonal matrix (of eigenvalues). That is, there is a basis (coordinate system) in which $A$ is diagonal.
From above, $\det A = \det \Lambda = \lambda_1 \lambda_2 \cdots \lambda_m$. That is, the determinant is the product of the eigenvalues.
(Technically, we have only shown this for diagonalizable matrices, but it turns out to be true in general.)
Let's try it:
det(A)
6.0
This is the same as the product of A's eigenvalues:
2 * 3
6
Equivalently, we can use Julia's `prod` function (which computes the product of the elements of a list):
prod(eigvals(A))
6.0
We can also try it for some large matrices:
Abig = randn(100,100)
det(Abig)
1.6680449397144425e78
prod(eigvals(Abig))
1.6680449397143506e78 - 1.3679861476883116e62im
(The tiny imaginary part, about $10^{16}$ times smaller than the real part, is just roundoff error: the complex eigenvalues of a real matrix come in conjugate pairs, so their product is real in exact arithmetic.)
It also follows that the log of the determinant is the sum of the logs of the eigenvalues.
Julia has a built-in function `logabsdet` to compute the log of the absolute value of the determinant, which should be the sum of the logs of the absolute values of the eigenvalues. (There is also a `logdet` function to compute the log without the absolute value, but then we need to deal with complex numbers.)
logabsdet(Abig) # the log of the absolute value of the determinant, and the sign
(180.11328949938405, 1.0)
s = sum(log.(abs.(eigvals(Abig)))) # the sum of the logs of |λ|
180.113289499384
If we try to compute the determinant of an even bigger matrix, we run into a problem: in the default precision, a computer can't represent numbers bigger than about $10^{308}$, and instead gives `Inf` (for "infinity"):
Abigger = randn(1000,1000)
det(Abigger)
-Inf
floatmax(Float64) # the problem is that there is a maximum value the computer can represent, beyond which it gives "Inf"
1.7976931348623157e308
But the log is fine:
logabsdet(Abigger) # but the log is fine
(2954.073853516206, -1.0)
s = sum(log.(abs.(eigvals(Abigger)))) # the sum of the logs of |λ|
2954.0738535162086
Another important quantity for a square matrix, which we haven't covered yet, is the trace, defined as the sum of the diagonal entries. By plugging in the definition of matrix multiplication, one can quickly show that the trace has a crucial property:
$$ \operatorname{trace} (AB) = \operatorname{trace}(BA) $$
(both sides equal $\sum_{i,j} a_{ij} b_{ji}$). It follows that similar matrices have the same trace, since if $A=SBS^{-1}$ then
$$ \operatorname{trace} (A) = \operatorname{trace}(SBS^{-1}) = \operatorname{trace}(S^{-1}SB) = \operatorname{trace}(B). $$
In particular, since $A$ and $\Lambda$ are similar, this means that the trace of a matrix is the sum of its eigenvalues!
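First, a quick numerical check of the property $\operatorname{trace}(AB) = \operatorname{trace}(BA)$, e.g. using the random $S$ from above:
tr(A * S) ≈ tr(S * A)  # trace(AB) = trace(BA): should print true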
Let's check:
tr(A)
5
Yup, this is the sum of the eigenvalues:
2 + 3
5
sum(eigvals(A))
5.0
Let's try it for our bigger example:
tr(Abig)
4.625235324319982
sum(eigvals(Abig))
4.625235324319927 + 1.2434497875801753e-14im
We've already seen that if $Ax = \lambda x$ then $A^n x = \lambda^n x$ (for both positive and negative $n$): if x is an eigenvector of A, then it is also an eigenvector of Aⁿ with the eigenvalue raised to the n-th power.
There is another cute way to see this for diagonalizable matrices. If $A = X\Lambda X^{-1}$, then for $n \ge 0$
$$ A^n = \underbrace{AAA\cdots A}_{n\mbox{ times}} = \underbrace{X\Lambda X^{-1}X\Lambda X^{-1}X\Lambda X^{-1}\cdots X\Lambda X^{-1}}_{n\mbox{ times}} = X\Lambda^n X^{-1} $$because all of the $X$ terms cancel except the first and last ones. $\Lambda^n$ is just the diagonal matrix with $\lambda_1^n, \lambda_2^n, \ldots$ on the diagonal.
So, since we have the diagonalization of $A^n$, we immediately see that its eigenvectors are $X$ (same as for $A$) and its eigenvalues are $\lambda^n$.
Since $A^{-1}x = \lambda^{-1} x$ for an eigenvector $x$, it immediately follows that $A^{-1} = X \Lambda^{-1} X^{-1}$ where $\Lambda^{-1}$ is the diagonal matrix with $\lambda_k^{-1}$ on the diagonal. Similarly for $A^{-n} = X \Lambda^{-n} X^{-1}$.
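We can verify both diagonalization formulas numerically with the $X$ and $\Lambda$ constructed above:
X * Λ^4 / X ≈ A^4        # A⁴ = XΛ⁴X⁻¹: should print true
X * inv(Λ) / X ≈ inv(A)  # A⁻¹ = XΛ⁻¹X⁻¹: should print true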
A = [1 1
     -2 4]
2×2 Matrix{Int64}:
  1  1
 -2  4
eigvals(A)
2-element Vector{Float64}:
 2.0
 3.0
Let's check that the eigenvalues of $A^4$ are $2^4 = 16$ and $3^4 = 81$:
eigvals(A^4)
2-element Vector{Float64}:
 16.0
 81.0
X = eigvecs(A)
X = X ./ X[1,:]'   # rescale each eigenvector so that its first entry is 1
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  2.0
Let's check that the eigenvectors $X'$ of $A^4$ are the same:
X′ = eigvecs(A^4)
X′ = X′ ./ X′[1,:]'   # normalize in the same way: first entry of each column = 1
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  2.0
Similarly for $A^{-1}$, whose eigenvalues should be 1/2 and 1/3:
eigvals(inv(A))
2-element Vector{Float64}:
 0.33333333333333337
 0.4999999999999999
Note that a matrix is invertible if and only if its eigenvalues are all nonzero.
In fact an eigenvector of $A$ for $\lambda = 0$ has another name we've already seen: it is a nonzero vector in the nullspace $N(A)$. Only matrices with $N(A) = \{\vec{0}\}$ (remember, $\vec{0}$ is not allowed as an eigenvector) are invertible.
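For example, here is a small singular matrix, chosen just for illustration: its second column is twice its first, so it has a nontrivial nullspace, and correspondingly one of its eigenvalues is zero:
Asing = [1 2
         2 4]     # rank 1: the columns are parallel, so N(Asing) ≠ {0}
eigvals(Asing)    # one eigenvalue should be (approximately) zero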
One really cool thing that diagonalization allows us to do easily is to compute $A^n$ for non-integer powers $n$. For example, we can now see how to find the square root of a matrix, at least for diagonalizable matrices.
If $A$ is a square matrix, its square root $\sqrt{A} = A^{1/2}$ is a matrix such that $A^{1/2} A^{1/2} = A$. But how would we find such a matrix?
As usual, for eigenvalues it is easy: if $Ax=\lambda x$, then we obviously want $A^{1/2} x = \lambda^{1/2} x$. If $A$ is diagonalizable and we do this for every eigenvalue/vector, we get the diagonalization:
$$ \sqrt{A} = A^{1/2} = X \underbrace{\begin{pmatrix} \sqrt{\lambda_1} & & & \\ & \sqrt{\lambda_2} & & \\ & & \ddots & \\ & & & \sqrt{\lambda_m} \end{pmatrix}}_{\sqrt{\Lambda}} X^{-1} $$(Obviously, this may give a complex matrix if any eigenvalues are negative or complex.)
Does this have the property that $A^{1/2} A^{1/2} = A$? Yes! $X \sqrt{\Lambda} X^{-1} X \sqrt{\Lambda} X^{-1} = X \sqrt{\Lambda} \sqrt{\Lambda} X^{-1} = A$, since obviously $\sqrt{\Lambda} \sqrt{\Lambda} = \Lambda$ from the definition above.
Let's try it:
sqrt.(eigvals(A)) # takes the square root (elementwise) of each eigenvalue
2-element Vector{Float64}:
 1.4142135623730951
 1.7320508075688772
Diagonal(sqrt.(eigvals(A))) # the diagonal matrix √Λ
2×2 Diagonal{Float64, Vector{Float64}}:
 1.41421   ⋅
  ⋅       1.73205
Asqrt = X * Diagonal(sqrt.(eigvals(A))) / X # the √A matrix
2×2 Matrix{Float64}:
  1.09638   0.317837
 -0.635674  2.04989
Asqrt * Asqrt
2×2 Matrix{Float64}:
  1.0  1.0
 -2.0  4.0
Hooray, it squared to $A$ as desired!
Julia's `sqrt` function will compute the matrix square root for you if you give it a square-matrix argument. (It is more general than our approach above, because it works even for non-diagonalizable matrices.)
sqrt(A)
2×2 Matrix{Float64}:
  1.09638   0.317837
 -0.635674  2.04989
Let's double-check that it is the same as we got from an explicit square root of the eigenvalues:
sqrt(A) ≈ Asqrt
true