using LinearAlgebra, PyPlot, Interact
We know that multiplying a vector by a matrix $A$ repeatedly will exponentially amplify the component along the eigenvector with the largest $|\lambda|$. This is the basis for many algorithms to compute eigenvectors and eigenvalues, the most basic of which is known as the power method.
The simplest version of this is to just start with a random vector $x$ and multiply it by $A$ repeatedly. (This is the procedure by which a Markov process approaches its steady state!) This works, but has the practical problem that the vector quickly becomes very large or very small, and eventually becomes too big or too small for the computer to represent (this is known as "overflow/underflow"). The fix is easy: normalize the vector after each multiplication by $A$.
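In code, a minimal sketch of this normalized power iteration might look like the following (the function name `powermethod` and its arguments are my own choices for illustration; it relies on `norm` from LinearAlgebra, loaded above):

# sketch: repeatedly multiply by A and re-normalize, returning a unit-length
# approximation to the biggest-|λ| eigenvector
function powermethod(A, x, nsteps)
    y = x / norm(x)
    for i = 1:nsteps
        y = A * y          # amplifies the component along the largest-|λ| eigenvector
        y = y / norm(y)    # re-normalize so y never overflows or underflows
    end
    return y
end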
For example, let's try it on a random matrix with eigenvalues 1 to 5:
A = randn(5,5)
A = A * Diagonal(1:5) / A   # similar to Diagonal(1:5), so the eigenvalues are 1,2,3,4,5
5×5 Matrix{Float64}:
  4.27661    1.1409      0.216216   2.8477    -2.26094
 -0.77939    1.01093     0.542869  -3.46742    2.19621
 -0.325662   0.57507     2.23157    0.54375    1.83044
  0.825317   0.693926    0.482976   4.41696   -0.892176
  0.302161   0.00126964  0.898394  -0.428578   3.06394
eigvals(A)
5-element Vector{Float64}:
 0.9999999999999998
 2.0
 3.000000000000002
 4.0000000000000036
 4.999999999999995
λ, X = eigen(A, sortby=λ->-abs(λ)) # sort eigenvals by -|λ|
λ
5-element Vector{Float64}:
 4.999999999999995
 4.0000000000000036
 3.000000000000002
 2.0
 0.9999999999999998
Let's look at the result of $n$ steps of the power method side-by-side with the eigenvector $x_1$ (which is normalized to unit length by Julia) for $\lambda = 5$:
x = [5,4,3,2,1] # arbitrary initial vector
# @manipulate
for n = 1:40
    y = x
    for i = 1:n
        y = A*y
        y = y / norm(y)
    end
    display(
        hbox(vbox(HTML("<b>power iteration $n</b>"), y),
             vline(),
             vbox(HTML("<b>eigenvector</b>"), X[:,1]))
    )
end
Note that the sign of the resulting eigenvector depends on the initial $x$.
We could also plot the difference $\Vert \pm y - x_1 \Vert$ versus the number $n$ of steps:
d = Float64[]
y = x
for i = 1:100
    y = A*y
    y = y / norm(y)
    push!(d, min(norm(y - X[:,1]), norm(-y - X[:,1]))) # pick the better of the two signs
end
semilogy(1:length(d), d, "b.-")
xlabel("number of power steps")
ylabel("error in eigenvector")
title(L"convergence of power method for $\lambda=1,2,3,4,5$")
How fast does the power method converge?
Suppose that $A$ is diagonalizable with eigenvalues sorted in order of decreasing magnitude $$|\lambda_1| > |\lambda_2| > \cdots$$. And suppose that we expand our initial $x$ in the basis of the eigenvectors: $$x = c_1 x_1 + c_2 x_2 + \cdots$$ Then, after $n$ steps, the power method produces:
$$ \mbox{scalar multiple of } A^n x = \mbox{multiple of } \left( \lambda_1^n c_1 x_1 + \lambda_2^n c_2 x_2 + \lambda_3^n c_3 x_3 + \cdots \right) = \mbox{multiple of } \lambda_1^n \left[ c_1 x_1 + (\lambda_2/\lambda_1)^n c_2 x_2 + (\lambda_3/\lambda_1)^n c_3 x_3 + \cdots \right] $$ The overall exponentially growing (or decaying) factor $\lambda_1^n$ gets removed by the normalization. The key thing is that the $x_2$, $x_3$, and other "error" terms not proportional to $x_1$ decay like the ratios of their eigenvalues to $\lambda_1$, raised to the $n$-th power.
For large $n$ the error is dominated by the $x_2$ term (the next-biggest |λ|), since that term decays most slowly, and the magnitude of this term decays proportional to $|\lambda_2/\lambda_1|^n$: the ratio of the magnitudes.
For example, in our case above, we'd expect the error to decay proportional to $(4/5)^n$:
semilogy(1:length(d), d, "b.-")
semilogy(1:length(d), (4/5).^(1:length(d)), "k--")
xlabel("number of power steps")
ylabel("error in eigenvector")
title(L"convergence of power method for $\lambda=1,2,3,4,5$")
legend(["error", L"(4/5)^n"])
Our analysis shows that the power method can converge very slowly if $|\lambda_2/\lambda_1|$ is close to 1. And if two eigenvalues have equal magnitude, the method may not converge at all.
Also, it only gives us $x_1$. What if we want $x_2$, or in general a few of the biggest-|λ| eigenvectors?
Still, the power method is the starting point for many more sophisticated methods, including the Arnoldi method (which gives a few of the biggest-|λ| eigenvectors and eigenvalues) and the QR algorithm (which gives all of them).
And the power method by itself can still be a pretty good method if we know that one eigenvalue is bigger than all of the others, e.g. for Markov matrices. And because it is so simple, the power method is easy to apply in lots of different cases, especially since it only requires multiplying $A$ by a vector over and over, which can be very cheap when $A$ is sparse.
An interesting application of the power method comes from the connection between Markov matrices and Google's PageRank. Google actually runs this algorithm on a huge Markov matrix whose rows/columns correspond to web pages: the matrix has over a billion rows and columns. But since most web pages only link to a few other pages, the matrix is mostly zeros, and you can multiply it by a vector in a few billion operations, rather than billion² operations. (They don't even store the whole matrix: you only store the nonzero entries of such a sparse matrix.)
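Just to illustrate that point at toy scale (this sketch is mine, not from the notes: the sizes are made up and it ignores PageRank details like the damping factor), we can build a sparse random Markov matrix with SparseArrays and take power steps on it, where each multiplication costs only about as many operations as there are nonzero entries:

using SparseArrays

n = 10^5                                          # toy number of "pages"
M = sprand(n, n, 10/n) + 1e-3 * sparse(I, n, n)   # ≈10 random "links" per page, plus a tiny diagonal so no column is all zero
M = M * Diagonal(1 ./ vec(sum(M, dims=1)))        # scale each column to sum to 1: a Markov matrix

y = normalize(rand(n), 1)                         # random probability vector
for i = 1:100
    y = M * y    # each step costs O(nnz(M)) ≈ 10n operations, not n²; no re-normalization needed since columns sum to 1
end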
In the textbook method of solving eigenproblems, we first find the eigenvalues from the roots of the characteristic polynomial, and then we find the eigenvectors from $N(A - \lambda I)$ for each eigenvalue.
The power method, however, gives you an eigenvector first! How do you find the eigenvalue? And how do you find an approximate eigenvalue given the approximate eigenvector that you get from a finite number of iterations?
For example, suppose that we do 30 iterations on the example above:
y = x
for i = 1:30
    y = A*y
    y = y / norm(y)
end
y
5-element Vector{Float64}:
  0.751664028549221
 -0.517065175988641
 -0.14283935626318672
  0.38223656917816207
 -0.03371817685193268
A*y
5-element Vector{Float64}:
  3.7585009856494422
 -2.585526794858033
 -0.7147699284677411
  1.9109744679392595
 -0.16898745218157352
If $y$ were an exact eigenvector, we could just compute $Ay$ and see how much each component increased: they would all grow (or shrink) by the same factor $\lambda$.
But, for an approximate eigenvector, each component of $Ay$ will increase by a slightly different amount:
(A*y) ./ y # divide each element of Ay elementwise (./ in Julia) by the corresponding element of y
5-element Vector{Float64}:
 5.000240590072783
 5.000388567871437
 5.0040125296473015
 4.999454845589477
 5.011761250427376
These are all pretty close to the true eigenvalue $\lambda=5$, but they don't quite agree. Clearly, we need some kind of average.
A problem with dividing elementwise is that some of the eigenvector elements might be zero (or nearly zero), in which case our estimate will blow up, so we need to take the average in some other way.
Instead, the most common approach is to use the Rayleigh quotient:
$$ \lambda \approx \frac{y^T A y}{y^T y} $$ where $y$ is our estimated eigenvector. If we have an exact eigenvector, so that $Ay = \lambda y$, then the Rayleigh quotient gives us exactly $\lambda$. Otherwise, it is a kind of weighted average (weighted by the components $y_k^2$), and is a reasonable approximation:
(y'A*y) / (y'y) # a Rayleigh quotient in Julia
5.000255409055291
Clearly, this is a pretty good estimate for the true eigenvalue 5. Let's see how it converges by plotting the error as a function of the number of power iterations:
Δλ = Float64[]
y = x
for i = 1:100
    y = A*y
    y = y / norm(y)
    λ̃ = (y'A*y) / (y'y)
    push!(Δλ, abs(λ̃ - 5))
end
semilogy(1:length(Δλ), Δλ, "b.-")
semilogy(1:length(Δλ), (4/5).^(1:length(Δλ)), "k--")
xlabel("number of power steps")
ylabel("error in eigenvalue")
title(L"convergence of power method $\lambda$ for $\lambda=1,2,3,4,5$")
legend(["error", L"(4/5)^n"])
It is converging at the same rate as the eigenvector.
(For symmetric matrices, it turns out that the eigenvalue converges even faster than the eigenvector, but that is not a topic for 18.06.)
The power method finds the eigenvector (and eigenvalue) for the biggest |λ|. If, instead, we want to find the smallest |λ|, we can apply the power method to $A^{-1}$. More generally, if we want to find the eigenvalue closest to some $\mu$, we can apply the power method to $(A - \mu I)^{-1}$. This is called a shift-and-invert method.
Notice that we don't usually compute $(A - \mu I)^{-1}$ explicitly; instead, we solve $(A-\mu I)y = x$ by whatever the best available method is. (If $A$ is a huge sparse matrix, this is a tricky problem in itself, which requires its own iterative or sparse methods.)
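For example, here is a hedged sketch of how one might package this (the helper name `shiftinvert` is mine, not from the notes): it factorizes $A - \mu I$ once with `lu` and then re-uses that factorization for every solve, which is much cheaper than re-solving from scratch at each iteration:

# sketch of shift-and-invert iteration: factor A - μI once, reuse it each step
function shiftinvert(A, μ, x, nsteps)
    F = lu(A - μ*I)            # one factorization up front; each subsequent solve re-uses F
    y = x / norm(x)
    for i = 1:nsteps
        y = F \ y              # applies (A - μI)⁻¹ without ever forming the inverse
        y = y / norm(y)
    end
    λ̃ = (y'A*y) / (y'y)        # Rayleigh quotient ≈ eigenvalue closest to μ
    return λ̃, y
end

With the example matrix above, something like `shiftinvert(A, 2.1, x, 15)` should return an eigenvalue estimate very close to 2.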
If $\lambda$ is the closest eigenvalue to $\mu$, and $\lambda^\prime$ is the second-closest, the same analysis as above tells us that the errors on the n-th iteration should go like $$ \mathrm{error} \sim \left| \frac{\lambda - \mu}{\lambda^\prime - \mu} \right|^n , $$ since $(\lambda - \mu)^{-1}$ and $(\lambda^\prime - \mu)^{-1}$ are the largest- and second-largest-magnitude eigenvalues of $(A - \mu I)^{-1}$, respectively.
Let's try this on our example matrix above to find the eigenvalue closest to 2.1. This should give us 2, and the error should go as $|2-2.1|^n / |3-2.1|^n = 1/9^n$:
Δλ = Float64[]
y = x
for i = 1:15
    y = (A - 2.1I) \ y
    y = y / norm(y)
    λ̃ = (y'A*y) / (y'y)
    push!(Δλ, abs(λ̃ - 2))
end
semilogy(1:length(Δλ), Δλ, "b.-")
semilogy(1:length(Δλ), (1/9).^(1:length(Δλ)), "k--")
xlabel("number of power steps")
ylabel("error in eigenvalue")
title(L"convergence of shift-and-invert method $\lambda$ for $\lambda=1,2,3,4,5$, $\mu = 2.1$")
legend(["error", L"1/9^n"])
Yup, it converges at the expected rate!
If you have a reasonable guess for the eigenvalue that you want, shift-and-invert can converge amazingly fast!
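For instance, with the matrix above and a closer (made-up) guess $\mu = 2.01$, the error ratio should be about $|2-2.01|/|3-2.01| \approx 1/99$ per step, so just a few iterations should get close to machine precision:

y = x
for i = 1:8
    y = (A - 2.01I) \ y
    y = y / norm(y)
end
(y'A*y) / (y'y)   # should agree with the eigenvalue 2 to roughly machine precision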