using LinearAlgebra
With a little effort, we can figure out that the number of arithmetic operations for an $n\times n$ matrix scales proportional to (for large $n$):
* $\sim n^2$ for multiplying a matrix by a vector $Ax$, or for solving a triangular system like $Ux=c$ or $Lc=b$ (back/forward substitution; a short sketch of this is given below)
* $\sim n^3$ for multiplying a matrix by a matrix $AB$, for the LU factorization $PA=LU$, or for solving a triangular system with $n$ right-hand sides, e.g. computing $A^{-1}$ from the LU factorization.

(In computer science, we would say that these have “complexity” $\Theta(n^2)$ and $\Theta(n^3)$, respectively.)
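To see where the $\sim n^2$ count comes from, here is a minimal sketch of back-substitution (the name `backsub` is just for illustration, not the routine that `\` actually calls): the doubly nested loop performs roughly $n^2/2$ multiply–add operations.

# Illustrative sketch of back-substitution for solving Ux = c with upper-triangular U.
# (Hypothetical helper, written only to make the operation count visible.)
function backsub(U, c)
    n = length(c)
    x = zeros(n)
    for i = n:-1:1
        s = c[i]
        for j = i+1:n      # ≈ n/2 multiply–adds per row ⇒ ≈ n²/2 in total
            s -= U[i,j] * x[j]
        end
        x[i] = s / U[i,i]  # plus n divisions in total
    end
    return x
end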
Let's see how these predictions match up to reality:
using BenchmarkTools # a useful Julia package for performance benchmarking
Measure the time for LU factorization of 10×10, 100×100, 500×500, 1000×1000, and 2000×2000 random real (double precision) matrices:
n = [10,100,500,1000,2000]
LinearAlgebra.BLAS.set_num_threads(1) # benchmarking on multiple cores is weird
t = [@belapsed(lu($(rand(n,n))), evals=1) for n in n]
5-element Vector{Float64}:
 6.25e-7
 4.5459e-5
 0.002457875
 0.016544833
 0.122767084
Now let's plot it on a log–log scale to see if it is the expected $n^3$ power law:
using PyPlot
loglog(n, t*1e9, "bo-")
loglog(n, n.^3, "k--")
xlabel("matrix size n")
ylabel("time (ns)")
legend(["time", L"n^3"])
title("time for LU factorization")
[Plot: “time for LU factorization”: measured times and the $n^3$ reference line on log–log axes]
It's pretty close! For large $n$, you can see it starting to go parallel to the $n^3$ line.
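As an extra sanity check (not part of the original benchmark), we can estimate the local power-law exponent from successive timings; for the largest sizes it should come out close to 3:

# slope of log(t) vs. log(n) between consecutive data points ≈ the power-law exponent
log.(t[2:end] ./ t[1:end-1]) ./ log.(n[2:end] ./ n[1:end-1])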
Let's also look at the time to solve $LUx=b$ when we are given the LU factors, which we predict should grow $\sim n^2$:
ts = [@belapsed($(lu(rand(n,n))) \ $(rand(n))) for n in n]
5-element Vector{Float64}:
 2.193675889328063e-7
 2.463e-6
 5.1167e-5
 0.000196583
 0.001126875
loglog(n, ts*1e9, "bo-")
loglog(n, n.^2, "k--")
xlabel("matrix size n")
ylabel("time (ns)")
legend(["time", L"n^2"])
title("time for LU solve")
[Plot: “time for LU solve”: measured times and the $n^2$ reference line on log–log axes]
Yup, it's pretty close to the $n^2$ growth! The key point is that, unless you have many ($\gtrsim n$) right-hand sides, most of the effort is spent in Gaussian elimination (finding L and U), not in the back/forward-substitution to solve $LUx=b$.
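This is also why, when you have several right-hand sides, you should factorize once and reuse the factors rather than redoing the elimination each time. A rough sketch (exact timings will of course depend on your machine):

# Reuse the factorization: the elimination (≈ n³ work) happens once in lu(A),
# and each additional right-hand side needs only ≈ n² back/forward substitution.
A = rand(1000,1000); b = rand(1000)
F = lu(A)            # the expensive step, done once
@belapsed($A \ $b)   # repeats the elimination on every call
@belapsed($F \ $b)   # only the triangular solves: much faster per right-hand side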
If we believe this scaling, how long would it take for my laptop to solve a $10^6 \times 10^6$ system of equations?
t[end], n[end] # the last measured time and n for LU factorization
(0.122767084, 2000)
secs = t[end] * (1e6/n[end])^3 # this many seconds
1.53458855e7
# convert to a human time period
using Dates
Dates.canonicalize(Dates.CompoundPeriod(Dates.Second(round(Int,secs))))
25 weeks, 2 days, 14 hours, 44 minutes, 46 seconds
In fact, we usually run out of memory before we run out of time:
println((1e6)^2 * sizeof(Float64) / 2^30, " GiB for a 10⁶×10⁶ matrix")
7450.580596923828 GiB for a 10⁶×10⁶ matrix
In practice, people do regularly solve problems this large, and even larger, but they can do so because real matrices that big almost always have some special structure that allows you to solve them more quickly and store them more compactly. For example, a common special structure is sparsity: matrices whose entries are mostly zero. We will learn some basic ways to take advantage of this later in 18.06, and sparse-matrix methods are covered more extensively in 18.335.
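As a small preview of sparsity (using Julia's standard SparseArrays package; the tridiagonal matrix below is just an illustrative example), a sparse format stores only the nonzero entries, so even $n = 10^6$ fits comfortably in memory:

using SparseArrays
N = 10^6   # a new variable name, to avoid clobbering the vector n above
# a sparse tridiagonal matrix: only ≈ 3N nonzeros are stored, not N² entries
S = spdiagm(-1 => -ones(N-1), 0 => fill(2.0, N), 1 => -ones(N-1))
nnz(S)                       # number of stored nonzeros (≈ 3×10⁶)
Base.summarysize(S) / 2^20   # memory in MiB: tens of MiB, not thousands of GiB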