using Polynomials, PyPlot, LinearAlgebra
# force IJulia to display as LaTeX rather than HTML
Base.showable(::MIME"text/html", ::Polynomial) = false
If we can find a solution $x \ne 0$ to
$$ Ax = \lambda x $$then, for this vector, the matrix $A$ acts like a scalar. $x$ is called an eigenvector of $A$, and $\lambda$ is called an eigenvalue.
In fact, for an $m \times m$ matrix $A$, we typically find $m$ linearly independent eigenvectors $x_1,x_2,\ldots,x_m$ and $m$ corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$. Such a matrix is called diagonalizable. Most matrices are diagonalizable; we will deal with the rare "defective" (non-diagonalizable) case later.
Given such a basis of eigenvectors, the key idea for using them is:
Take any vector $x$ and expand it in this basis: $x = c_1 x_1 + \cdots + c_m x_m$, i.e. $x = Xc$ or $c = X^{-1}x$, where $X$ is the matrix whose columns are the eigenvectors.
For each eigenvector $x_k$, the matrix $A$ acts like a scalar $\lambda_k$. Multiplication or division by $A$ corresponds to multiplying/dividing $x_k$ by $\lambda_k$. Solve your problem for each eigenvector by treating $A$ as the scalar $\lambda_k$.
Add up the solutions for each eigenvector (sum over the eigenvector basis). That is, multiply the new coefficients by $X$ (see the code sketch below).
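Here is a minimal sketch of this recipe in Julia, computing $A^n x$ via the eigenvector basis. The small matrix B below is chosen purely for illustration (it is not used elsewhere in these notes):
using LinearAlgebra

B = [2 1
     1 2]          # a small illustrative matrix
x = [3, 4]
n = 5

μ, X = eigen(B)    # eigenvalues μ and the matrix X whose columns are eigenvectors
c = X \ x          # 1. expand x in the eigenvector basis: c = X⁻¹x
d = μ.^n .* c      # 2. B acts like the scalar μₖ on each eigenvector, so Bⁿ multiplies cₖ by μₖⁿ
X * d ≈ B^n * x    # 3. sum up the pieces by multiplying by X: gives Bⁿx (returns true)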
To find the eigenvalues, one approach is to realize that $Ax = \lambda x$ means:
$$ (A - \lambda I) x = 0 \, , $$so the matrix $A - \lambda I$ is singular for any eigenvalue λ. This corresponds to the determinant being zero:
$$ p(\lambda) = \det(A - \lambda I) = 0 $$where $p(\lambda)$ is the characteristic polynomial of $A$: a polynomial of degree $m$ if $A$ is $m \times m$. The roots of this polynomial are the eigenvalues λ.
A polynomial of degree $m$ has exactly $m$ roots counting multiplicity (possibly complex), and typically those $m$ roots are distinct. This is why most matrices have $m$ distinct eigenvalues (and hence $m$ independent eigenvectors), and are therefore diagonalizable.
For example, let's plot $\det(A - \lambda I)$ for a 4×4 matrix $A$. The result is a quartic curve whose roots are the four eigenvalues (computed by the built-in eigvals function):
# some "random" matrix:
A = [ 0.325 -0.075 0.075 -0.075
0.025 0.225 -0.025 -0.275
0.15 -0.05 0.25 -0.05
-0.1 -0.1 0.1 0.4 ]
λ = eigvals(A)
4-element Vector{Float64}:
 0.10000000000000009
 0.1999999999999999
 0.3999999999999998
 0.5000000000000001
(I admit it: $A$ was not chosen at random to have such special eigenvalues.)
ξ = range(0,0.6,length=100)
plot(ξ, [det(A - λ*I) for λ in ξ], "r-")
plot(ξ, 0ξ, "k--")
plot(λ, 0λ, "bo")
xlabel(L"\lambda")
ylabel(L"\det(A - \lambda I)")
title("characteristic polynomial")
For example, consider the matrix
$$ A = \begin{pmatrix} 1 & 1 \\ -2 & 4 \end{pmatrix} $$whose eigenvalues are $\lambda = \{2,3\}$:
A = [ 1 1
-2 4 ]
2×2 Matrix{Int64}:
  1  1
 -2  4
eigvals(A)
2-element Vector{Float64}: 2.0 3.0
The characteristic polynomial is
$$ \det(A - \lambda I) = \det \begin{pmatrix} 1 - \lambda & 1 \\ -2 & 4 - \lambda \end{pmatrix} = (1 - \lambda)(4 - \lambda) - (1)(-2) = \lambda^2 - 5\lambda + 6 = (\lambda - 2) (\lambda - 3) $$where we have used high-school algebra to factor the polynomial. Hence its roots are $\lambda = \{2, 3\}$, as computed above.
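As a quick numerical check of this factorization (using the Polynomials package loaded at the top), the roots of $\lambda^2 - 5\lambda + 6$ are indeed 2 and 3:
roots(Polynomial([6, -5, 1]))   # coefficients (6, -5, 1) of 6 - 5λ + λ²; the roots come back as 2.0 and 3.0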
Once we have the eigenvalues, finding the eigenvectors is (in principle) easy: the eigenvectors are just (a basis for) the nullspace
$$ N(A - \lambda I) $$when $\lambda$ is an eigenvalue.
For example, with the matrix above, let's take the eigenvalue $\lambda_1 = 2$:
$$ A - 2I = \begin{pmatrix} -1 & 1 \\ -2 & 2 \end{pmatrix} $$We could go through Gaussian elimination to find the nullspace, but we can see by inspection that the second column is minus the first, hence $x_1 = (1, 1)$ is a basis for the nullspace:
$$ (A - 2I) x_1 = \begin{pmatrix} -1 & 1 \\ -2 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} $$or
$$ A x_1 = 2 x_1 $$as desired. $x_1 = (1, 1)$ is an eigenvector! Let's check:
A * [1, 1]
2-element Vector{Int64}: 2 2
For the other eigenvalue, $\lambda = 3$, we get:
$$ A - 3I = \begin{pmatrix} -2 & 1 \\ -2 & 1 \end{pmatrix} $$from which it is obvious that a basis for the nullspace is $x_2 = (1, 2)$. Let's check:
A * [1, 2]
2-element Vector{Int64}: 3 6
Yup, $A x_2 = 3 x_2$!
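We can also ask the computer for these nullspaces directly, via the nullspace function from LinearAlgebra (which returns an orthonormal basis, so the vectors come back rescaled to unit length and possibly flipped in sign):
nullspace(A - 2.0I)   # one column, proportional to x₁ = (1, 1)
nullspace(A - 3.0I)   # one column, proportional to x₂ = (1, 2)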
For more complicated cases, of course, we might have to go through elimination to find the nullspace. In practice, though, we always just let the computer do it. The eigen function in Julia will return the eigenvalues and eigenvectors:
λ, X = eigen(A)
Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
2-element Vector{Float64}:
 2.0
 3.0
vectors:
2×2 Matrix{Float64}:
 -0.707107  -0.447214
 -0.707107  -0.894427
λ
2-element Vector{Float64}: 2.0 3.0
X
2×2 Matrix{Float64}:
 -0.707107  -0.447214
 -0.707107  -0.894427
The columns of X are indeed the eigenvectors from above, but they are scaled differently (they are normalized to unit length). If we divide each one by its first element, though, we should recover our scaling from above:
X[:,1] / X[1,1] # first column, with first entry scaled to 1
2-element Vector{Float64}: 1.0 1.0
X[:,2] / X[1,2] # second column, with first entry scaled to 1
2-element Vector{Float64}: 1.0 2.0
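We can also verify all the eigenvector equations at once (a quick check using Diagonal from LinearAlgebra): multiplying $A$ by the matrix $X$ of eigenvectors scales each column $x_k$ by its eigenvalue $\lambda_k$:
A * X ≈ X * Diagonal(λ)   # true: column by column, A xₖ = λₖ xₖ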
In practice, computing eigenvalues by hand, especially by this method, is even more pointless than doing Gaussian elimination by hand, for reasons explained below, so I will focus more on the properties of eigenvalues and how to use them than how to compute them. The computer will give us their values.
If you multiply an eigenvector by $A^n$, it just multiplies the vector by $\lambda^n$. We will explore this more later, but for now let's try a couple of quick examples:
A^5 * [1,1] # gives 2⁵ * [1,1] = [32, 32]
2-element Vector{Int64}: 32 32
What if we multiply some other vector $y$ by $A^n$? To understand what happens, we expand $y$ in the basis of eigenvectors. A simple example is: $$ A^n \begin{pmatrix} 2 \\ 3 \end{pmatrix} = A^n \left[ \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right] = 2^n \begin{pmatrix} 1 \\ 1 \end{pmatrix} + 3^n \begin{pmatrix} 1 \\ 2 \end{pmatrix} \approx 3^n \begin{pmatrix} 1 \\ 2 \end{pmatrix} \mbox{ for } n \gg 1. $$ In this basis, each eigenvector is simply multiplied by λⁿ. Furthermore, the term with the largest |λ| grows fastest, so for large n the result points approximately in the direction of the corresponding eigenvector.
y = A^100.0 * [2,3]
2-element Vector{Float64}: 5.15377520732012e47 1.030755041464024e48
2^100.0 * [1,1] + 3^100.0 * [1,2] # same, but computed with eigenvectors
2-element Vector{Float64}: 5.153775207320113e47 1.0307550414640226e48
The result is approximately a multiple of $(1,2)$, so the second component should be nearly double the first component:
y[2] / y[1]
2.0
The same reasoning shows that for any vector $z$ that is not a multiple of $(1,1)$, $A^{100}z$ will blow up proportional to $3^{100}$ and will be approximately parallel to $(1,2)$:
z = A^100.0 * [17,-5]
2-element Vector{Float64}: -1.1338305456104243e49 -2.26766109122085e49
z[2]/z[1] # approximately 2 since z is nearly parallel to (1,2)
2.0000000000000013
One of the properties of the determinant is that $\det A^T = \det A$. It follows that $$\det(A-\lambda I) = \det\left[ (A -\lambda I)^T \right] = \det (A^T - \lambda I)$$ and therefore $A$ and $A^T$ have the same eigenvalues! (They have the same characteristic polynomial.)
Let's check:
eigvals(A')
2-element Vector{Float64}: 2.0 3.0
Yup, same eigenvalues (2 and 3) as for $A$.
However, $A$ and $A^T$ in general have different eigenvectors, because the left and right nullspaces are not usually the same: $N(A - \lambda I) \ne N(A^T - \lambda I)$ in general. Here, the eigenvectors of $A^T$ are:
Y = eigvecs(A')
Y ./ Y[1,:]' # normalize so that the first components are 1, for easier comparison
2×2 Matrix{Float64}:
  1.0   1.0
 -0.5  -1.0
Notice that these are different from the (1,1) and (1,2) that we got above.
As you might guess, the eigenvectors of $A^T$ are sometimes called its left eigenvectors.
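A quick check of what "left eigenvector" means (a small sketch using the Y computed above): each column $y_k$ of Y satisfies $y_k^T A = \lambda_k y_k^T$, i.e. it is an eigenvector that multiplies $A$ from the left:
Y[:,1]' * A ≈ 2 * Y[:,1]'   # true: left eigenvector for λ = 2
Y[:,2]' * A ≈ 3 * Y[:,2]'   # true: left eigenvector for λ = 3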
If we change the matrix to: $$ \begin{pmatrix} 1 & 3 \\ -2 & 4 \end{pmatrix} $$ we get a characteristic polynomial: $$ \det \begin{pmatrix} 1 - \lambda & 3 \\ -2 & 4 - \lambda \end{pmatrix} = (1 - \lambda)(4 - \lambda) - (3)(-2) = \lambda^2 - 5\lambda + 10 $$ whose roots, from the quadratic formula, are: $$ \lambda = \frac{5 \pm \sqrt{5^2 - 40}}{2} = \frac{5 \pm \sqrt{-15}}{2} $$ which are complex! Let's check:
eigvals([1 3
-2 4])
2-element Vector{ComplexF64}: 2.5 - 1.9364916731037085im 2.5 + 1.9364916731037085im
(5 + sqrt(15)*im) / 2
2.5 + 1.9364916731037085im
Yup, it matches our formula.
Eigenvalues may be complex numbers, even for real matrices. We can't avoid complex numbers any longer in 18.06!
(But, for real matrices, they are the roots of a real polynomial and hence come in complex conjugate pairs.)
You might think that finding roots of polynomials is how we must inevitably find eigenvalues. In fact, although we use the characteristic polynomial to think about eigenvalues, in practice it is not used to compute them except for tiny matrices.
In fact, working with the characteristic polynomial is a computational disaster in general, because roots of polynomials are exponentially sensitive to their coefficients. Any tiny roundoff error leads to disaster.
For example, consider the polynomial
$$ w(x) = (x - 1) (x - 2) (x - 3) \cdots (x - 10) $$whose roots are, obviously, $\{1,2,\ldots,10\}$. What happens if we actually multiply this polynomial together and compute the roots from the coefficients?
w = prod([Polynomial([-n, 1.0]) for n = 1:10])
Already, this seems hard: how do we find roots of a high-degree polynomial? More on this below.
For the moment, we will just use a "black box" function roots provided by the Polynomials package to "magically" get the roots of $w$ from its coefficients:
roots(w)
10-element Vector{Float64}:
  1.0000000000000342
  1.999999999998515
  3.0000000000138605
  3.9999999999602096
  4.999999999950206
  6.0000000005467955
  6.999999998669565
  8.000000001559398
  8.999999999086734
 10.000000000214685
Looks good! The roots are what they should be.
However, suppose we make a tiny error in computing the coefficients. Let's multiply each coefficient by $1 + \epsilon$, where $\epsilon$ is a random small number of root-mean-square value $R$.
The following code plots the roots in the complex plane for 100 random perturbations, and lets us vary the magnitude $R$ of the perturbation:
N = 10
w = prod([Polynomial([-n, 1.0]) for n = 1:N])
fig = figure()
#using Interact
#@manipulate for logR in -9:0.1:-1
for logR in -9:0.5:-1
    display(
        withfig(fig) do
            plot(1:N, zeros(N), "r*")   # mark the exact roots 1,…,N on the real axis
            R = exp10(logR)             # perturbation magnitude R = 10^logR
            for i = 1:100
                # perturb each coefficient by a random relative error of rms size R, then recompute the roots
                r = roots(Polynomial(coeffs(w) .* (1 .+ R .* randn(N+1))))
                plot(real(r), imag(r), "b.")
            end
            xlabel("real part of roots")
            ylabel("imaginary part of roots")
            title("roots of \$(x-1)\\cdots(x-10)\$ with coeffs perturbed by R=$R")
        end
    )
end
Even a tiny error causes the roots to become complete garbage. This gets exponentially worse as the degree of the polynomial increases.
Because computers inevitably use a finite precision (usually about 15 significant digits), the tiny roundoff errors mean that characteristic polynomials are a computational disaster if they are actually computed explicitly.
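To see how this gets worse with degree (a quick sketch using the fromroots function from the Polynomials package, which builds the monic polynomial with given roots): already at degree 20, merely storing the coefficients in double precision and asking for the roots back gives badly wrong answers, typically including spurious complex pairs:
roots(fromroots(collect(1.0:20.0)))   # exact roots are 1, 2, …, 20, but the computed roots are badly wrong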
Finding roots of polynomials is equivalent to finding eigenvalues. Not only can you find eigenvalues by solving for the roots of the characteristic polynomial, but you can conversely find the roots of any polynomial by turning it into a matrix and finding the eigenvalues.
Given the degree-$n$ polynomial:
$$ p(z)=c_0 + c_1 z + \cdots + c_{n-1}z^{n-1} + z^n \;, $$(notice that the $z^n$ coefficient is 1), we define the $n \times n$ companion matrix
$$ C=\begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ 0 & \ddots & \ddots & \ddots & \vdots \\ \vdots & \vdots & \ddots & 0 & 1 \\ -c_0 & -c_1 & \dots & -c_{n-2} & -c_{n-1} \end{pmatrix}. $$The amazing fact is that the characteristic polynomial $\det (C - \lambda I) = p(\lambda)$, and so the eigenvalues of C are the roots of p.
Suppose $z$ is a root of $p(z) = 0$. We can now show that this is an eigenvalue of $C$, with eigenvector $= (1,z,z^2,\ldots,z^{n-1})$:
$$ C \begin{pmatrix} 1 \\ z \\ z^2 \\ \vdots \\ z^{n-1} \end{pmatrix} = \begin{pmatrix} z \\ z^2 \\ \vdots \\ z^{n-1} \\ -c_0 - c_1 z - \cdots - c_{n-1} z^{n-1} \end{pmatrix} = \begin{pmatrix} z \\ z^2 \\ \vdots \\ z^{n-1} \\ z^n \end{pmatrix} = z \begin{pmatrix} 1 \\ z \\ z^2 \\ \vdots \\ z^{n-1} \end{pmatrix} $$where in the last row we used the fact that $p(z) = 0$, so $z^n = -c_0 - c_1 z - \cdots - c_{n-1} z^{n-1}$.
Hence $z$ is an eigenvalue. The eigenvalues of $C$ are the roots of $p$, and vice versa.
If you have a polynomial whose leading coefficient is not 1, you can just divide the polynomial by that coefficient to get it in this form, without changing its roots. Hence the roots of any polynomial can be found by computing the eigenvalues of a companion matrix.
function companion(p::Polynomial)
    c = coeffs(p)                     # coefficients c₀, c₁, …, cₙ (ascending order)
    n = degree(p)
    c = c[1:n] / c[end]               # divide by the leading coefficient to make p monic
    C = [ [ zeros(n-1)'; I ] -c ]'    # n×n companion matrix: 1's above the diagonal, -c in the last row
    return C
end
companion (generic function with 1 method)
p = Polynomial([-2, 1]) * Polynomial([-3, 1]) # (x - 2) * (x - 3)
C = companion(p)
2×2 adjoint(::Matrix{Float64}) with eltype Float64:
  0.0  1.0
 -6.0  5.0
eigvals(C)
2-element Vector{Float64}: 2.0 3.0
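We can also check the eigenvector claim from above (a quick sketch): for the root $z = 2$, the vector $(1, z) = (1, 2)$ should be an eigenvector of this companion matrix with eigenvalue 2:
C * [1, 2] ≈ 2 * [1, 2]   # true: (1, z, …, zⁿ⁻¹) is an eigenvector of C for each root z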
# (x - 2) * (x - 3) * (x - 4) * (x + 1)
p = Polynomial([-2, 1]) * Polynomial([-3, 1]) * Polynomial([-4, 1]) * Polynomial([1, 1])
C = companion(p)
4×4 adjoint(::Matrix{Float64}) with eltype Float64:
  0.0   1.0    0.0  0.0
  0.0   0.0    1.0  0.0
  0.0   0.0    0.0  1.0
 24.0  -2.0  -17.0  8.0
eigvals(C)
4-element Vector{Float64}:
 -0.9999999999999991
  1.999999999999996
  3.0000000000000084
  3.9999999999999947
In fact, this is the most common method to find roots of polynomials of degree ≥ 5: you find the companion matrix, and compute its eigenvalues. This is precisely how the Polynomials package does it (albeit with some extra cleverness to check for leading and trailing zero coefficients):
@which roots(p)
This would seem rather circular if eigenvalues were computed, in turn, by finding roots of polynomials. But they aren't: practical computer eigenvalue solvers never compute the characteristic polynomial, and don't resemble generic root-finding algorithms (like Newton's method).
Everyone learns the quadratic formula to find roots of a quadratic (degree-2) polynomial.
There is a (horrible) cubic formula to find the roots of any cubic (degree-3) polynomial.
There is a (terrifying) quartic formula to find the roots of any quartic (degree-4) polynomial.
There is no formula (in terms of a finite number of ±,×,÷,ⁿ√) for the roots of an arbitrary quintic polynomial or any degree ≥ 5. This is the Abel–Ruffini theorem, proved in the 19th century.
This does not mean that you can't compute roots (or eigenvalues) in practice! But it means that root-finding/eigenvalue algorithms are necessarily *iterative*: they converge toward the solution but never reach it exactly. You can get the solution to any desired accuracy.
For example, we've already seen one such algorithm! Newton's method is an algorithm that could be used to find the roots of an arbitrary polynomial (given enough starting guesses), and it converges very quickly, without ever exactly reaching the root.
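Here is a minimal Newton's-method sketch (the helper function newton_root is hypothetical, purely for illustration, and this is not how eigvals works), using the derivative function from the Polynomials package; each step $x \leftarrow x - p(x)/p'(x)$ roughly doubles the number of correct digits:
# hypothetical helper, just for illustration: repeatedly apply x ← x - p(x)/p′(x)
function newton_root(p, x, steps)
    dp = derivative(p)        # p′, from the Polynomials package
    for k = 1:steps
        x -= p(x) / dp(x)     # one Newton step
    end
    return x
end

newton_root(Polynomial([-2, 1]) * Polynomial([-3, 1]), 2.2, 6)   # ≈ 2.0, a root of (x - 2)(x - 3)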
The most common algorithm to find eigenvalues (and hence polynomial roots, via companion matrices) is the QR algorithm. As you might guess, it is related to the $A=QR$ factorization. Explaining how and why this algorithm works, however, is outside the scope of 18.06. (It takes me a week+ in 18.335: graduate numerical methods.)
This means that the textbook characteristic-polynomial method we use to find eigenvalues of $2\times 2$ matrices is something of a fraud: unlike Gaussian elimination, it bears no resemblance whatsoever to how eigenvalues are really computed. In 18.06, therefore, we will mostly assume that the computer hands us the eigenvalues and eigenvectors, and we will focus on what eigensolutions mean, how they are used, and what their properties are.
One thing that it is useful to know, however, is that the computer algorithm to compute eigenvalues/eigenvectors of an $m \times m$ matrix requires $\sim m^3$ operations, just like Gaussian elimination. However, the "constant" coefficient in front of $m^3$ is significantly worse:
A1000 = rand(1000,1000)
LinearAlgebra.BLAS.set_num_threads(1) # use 1 cpu for benchmarking
@time lu(A1000)
@time qr(A1000)
@time eigvals(A1000)
@time eigen(A1000);
  0.025024 seconds (4 allocations: 7.637 MiB)
  0.039420 seconds (7 allocations: 8.179 MiB)
  0.402130 seconds (14 allocations: 7.936 MiB)
  0.628848 seconds (21 allocations: 31.580 MiB)
@elapsed(eigen(A1000)) / @elapsed(lu(A1000))
37.08389555236729
Finding eigenvalues and/or eigenvectors is not so cheap, but it is often worth it!