This Julia notebook allows us to interactively visualize the process of Gaussian elimination.
Recall that the process of Gaussian elimination involves subtracting rows to turn a matrix $A$ into an upper triangular matrix $U$. Often we augment the matrix with an additional column, representing the right-hand side $b$ of a system of equations $Ax=b$ that we want to solve: by doing the same row operations to both $A$ and $b$, we arrive at an equivalent equation $Ux=c$ that is easy to solve by backsubstitution (solving for one variable at a time, working from the last row to the top row).
For example, suppose we are solving:
$$ Ax = \begin{pmatrix} 1 & 3 & 1 \\ 1 & 1 & -1 \\ 3 & 11 & 6 \end{pmatrix} x = \begin{pmatrix} 9 \\ 1 \\ 35 \end{pmatrix} = b $$

We would perform the following elimination process.

$$ \left[\begin{array}{rrr|r} \boxed{1} & 3 & 1 & 9 \\ 1 & 1 & -1 & 1 \\ 3 & 11 & 6 & 35 \end{array}\right]\to \left[\begin{array}{rrr|r} \boxed{1} & 3 & 1 & 9 \\ 0 & \boxed{-2} & -2 & -8 \\ 0 & 2 & 3 & 8 \end{array}\right]\to \left[\begin{array}{rrr|r} \boxed{1} & 3 & 1 & 9 \\ 0 & \boxed{-2} & -2 & -8 \\ 0 & 0 & \boxed{1} & 0 \end{array}\right] $$

The boxed values are known as the pivots. Now we do backsubstitution, working from the bottom up. The last row is a single equation in a single unknown:

$$ 1 x_3 = 0 \implies x_3 = 0 . $$

Now that we know $x_3$, the second row gives:

$$ -2x_2 - 2x_3 = -8 \implies -2x_2 - 0 = -8 \implies x_2 = 4 . $$

Finally, now that we know $x_2$ and $x_3$, the first row gives:

$$ 1 x_1 + 3 x_2 + 1x_3 = 9 \implies x_1 + 12 + 0 = 9 \implies x_1 = -3. $$

It is much more fun to let the computer do the arithmetic than to crunch through it ourselves on the blackboard, but usually the computer does things too quickly (and it often does some re-ordering of the rows that makes it harder to follow what is going on). For example, in Julia, we can solve the above system of equations by simply:
A = [1 3 1
1 1 -1
3 11 6]
3×3 Matrix{Int64}:
 1   3   1
 1   1  -1
 3  11   6
b = [9, 1, 35]
3-element Vector{Int64}:
  9
  1
 35
x = A \ b # solves Ax = b by (essentially) Gaussian elimination
3-element Vector{Float64}:
 -3.0
  4.0
 -0.0
A^-1 * b # alternative: multiply b by the inverse (works, but less efficient than A \ b)
3-element Vector{Float64}:
 -3.0
  4.0
  0.0
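The backsubstitution procedure described above is easy to sketch in Julia ourselves. (This is a minimal illustration, not the notebook's own code; `backsub` is a hypothetical helper name.)

```julia
# Backsubstitution: solve Ux = c for upper-triangular U, working from the
# last row up, one unknown at a time.
function backsub(U, c)
    m = length(c)
    x = zeros(m)
    for i = m:-1:1
        # subtract the contributions of the already-known unknowns,
        # then divide by the pivot U[i,i]:
        x[i] = (c[i] - sum(U[i,j]*x[j] for j = i+1:m; init=0.0)) / U[i,i]
    end
    return x
end

U = [1 3 1; 0 -2 -2; 0 0 1]   # the U from the elimination above
c = [9, -8, 0]                # the transformed right-hand side
backsub(U, c)                 # gives [-3.0, 4.0, 0.0], matching A \ b
```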
The basic Gaussian-elimination process is essentially the same as what you learned in 8th–9th grade algebra. Where do we go from here? There are several directions we would like to pursue in 18.06:
Zooming out and understanding this process algebraically: we want to think of elimination steps in terms of matrix operations (matrix products), and this will eventually lead us to viewing elimination as a factorization of the matrix A: the LU factorization. This factorization is key to how Gaussian elimination is actually used in large problems, and it helps us understand A by breaking it into simpler matrices.
What happens if you encounter a zero pivot?
How does this process scale? It is good to understand what happens for $n \times n$ matrices. Is Gaussian elimination practical on a computer for $n = 1000$? What about $n = 10^6$?
Occasionally, we may encounter a zero in the pivot position. Sometimes this means that the equations are singular (may have no solutions) — we will talk more about this later. However, as long as there is a nonzero value below the pivot, we can fix the problem by swapping rows (which just corresponds to re-ordering the equations).
For example:
$$ \left[\begin{array}{rrr|r} \boxed{1} & 3 & 1 & 9 \\ 1 & 3 & -1 & 1 \\ 3 & 11 & 6 & 35 \end{array}\right]\to \left[\begin{array}{rrr|r} \boxed{1} & 3 & 1 & 9 \\ 0 & 0 & -2 & -8 \\ 0 & 2 & 3 & 8 \end{array}\right]\to \left[\begin{array}{rrr|r} \boxed{1} & 3 & 1 & 9 \\ 0 & \boxed{2} & 3 & 8 \\ 0 & 0 & \boxed{-2} & -8 \end{array}\right] $$

where in the second step we swapped the second and third rows to get a nonzero pivot in the second row.
At this point we can again solve bottom-up by backsubstitution:
$$ -2x_3 = -8 \implies x_3 = 4 \\ 2x_2 + 3x_3 = 8 = 2x_2 + 12 \implies x_2 = -2 \\ x_1 + 3x_2 + x_3 = 9 = x_1 - 6 + 4 \implies x_1 = 11 $$

Of course, the computer can get the answer much more quickly and easily:
[ 1 3 1
1 3 -1
3 11 6 ] \
[9
1
35]
3-element Vector{Float64}:
 11.000000000000005
 -2.0000000000000013
  4.0
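In fact, Julia's `\` (via `lu`) handles zero pivots for us automatically: by default it re-orders the rows so that the largest available entry in each column becomes the pivot (partial pivoting). We can inspect the permutation it chose; this is a quick check using only standard `LinearAlgebra` calls:

```julia
using LinearAlgebra
A2 = [1 3  1
      1 3 -1
      3 11 6]
b2 = [9, 1, 35]
F = lu(A2)   # partial pivoting: rows are re-ordered automatically
F.p          # the row permutation that lu chose
A2 \ b2      # ≈ [11, -2, 4], the same answer we got by hand
```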
Of course, the computer can solve much bigger problems easily. It can solve 1000 equations in 1000 unknowns in a fraction of a second — nowadays, that is no longer considered a "big" system of equations.
Ahuge = rand(1000,1000)
1000×1000 Matrix{Float64}:
 0.754693  0.560733  0.910327  …  0.611877   0.292508   0.72601
 0.351665  0.369858  0.871103     0.25676    0.0405945  0.726211
 ⋮                             ⋱
 0.104612  0.662899  0.301098     0.698438   0.609498   0.937989
bhuge = rand(1000)
1000-element Vector{Float64}:
 0.11648951400817487
 0.638398291445165
 ⋮
 0.9432704795434865
Ahuge \ bhuge
1000-element Vector{Float64}:
 -0.40242365163345395
 -1.2575603935803086
  ⋮
 -0.1397370446925542
@time Ahuge \ bhuge;
@time Ahuge \ bhuge;
@time Ahuge \ bhuge;
  0.016476 seconds (4 allocations: 7.645 MiB)
  0.017585 seconds (4 allocations: 7.645 MiB)
  0.014655 seconds (4 allocations: 7.645 MiB)
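Almost all of that time is the elimination itself; the backsubstitution afterwards is comparatively cheap. So if we need to solve the same matrix with many different right-hand sides, we can do the elimination once, save the result, and reuse it. (A sketch; factorization objects like this are covered soon in 18.06.)

```julia
using LinearAlgebra
A = rand(1000, 1000)
b1, b2 = rand(1000), rand(1000)
F = lu(A)    # the expensive elimination work, done once
x1 = F \ b1  # each subsequent solve is only cheap substitution
x2 = F \ b2  # ...much faster than repeating A \ b from scratch
```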
If we want to see the matrix $U$ from above, we use the fact (covered soon in 18.06) that Gaussian elimination is really "LU" factorization, performed by the function `lu` in the built-in `LinearAlgebra` package. By default, however, "serious" computer implementations of this process automatically re-order the rows to reduce the effect of roundoff errors, so we need to pass an extra option that tells Julia not to do this. (You should not normally do this, except for learning exercises.)
using LinearAlgebra
# LU factorization (Gaussian elimination) of the augmented matrix [A b],
# passing the NoPivot() option to prevent row re-ordering
F = lu([A b], NoPivot()) # a "factorization" object storing both L and U
F.U # just show U
3×4 Matrix{Float64}:
 1.0   3.0   1.0   9.0
 0.0  -2.0  -2.0  -8.0
 0.0   0.0   1.0   0.0
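The factorization object also stores a matrix `L`, a unit lower-triangular matrix that records the multipliers used in each elimination step, and multiplying `L*U` "undoes" the elimination to recover the original augmented matrix. A quick check (using `A` and `b` as defined above):

```julia
using LinearAlgebra
A = [1 3 1; 1 1 -1; 3 11 6]
b = [9, 1, 35]
F = lu([A b], NoPivot())
F.L                # unit lower-triangular matrix of elimination multipliers
F.L * F.U ≈ [A b]  # true: L times U reproduces the original augmented matrix
```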
However, it would be nice to show the individual steps of this process. This requires some programming.
The following is some slightly tricky code that lets us visualize the process of Gaussian elimination in Julia. It takes advantage of the Interact package in Julia, which allows us to easily create interactive displays using sliders, pushbuttons, and other widgets.
Implementing this is not really a beginner exercise for new Julia programmers, though it is fairly straightforward for people who are used to Julia. It involves defining our own type to control display, our own implementation of Gaussian elimination that allows us to stop partway through, and using the Interact package to create interactive widgets.
You can skip this part if you aren't ready for the programming details.
"""
naive_gauss(A, [step])
Given a matrix `A`, performs Gaussian elimination to convert
`A` into an upper-triangular matrix `U`.
This implementation is "naive" because it *never re-orders the rows*.
(It will obviously fail if a zero pivot is encountered.)
If the optional `step` argument is supplied, only performs `step`
steps of Gaussian elimination.
Returns `(U, row, col, factor)`, where `row` and `col` are the
row and column of the last step performed, while `factor`
is the last factor multiplying the pivot row.
"""
function naive_gauss(A, step=typemax(Int))
m = size(A,1) # number of rows
factor = A[1,1]/A[1,1] # = 1, but in the type produced by division (e.g. Float64)
step ≤ 0 && return (A, 1, 1, factor)
U = copyto!(similar(A, typeof(factor)), A)
for j = 1:m # loop over m columns
for i = j+1:m # loop over rows below the pivot row j
# subtract a multiple of the pivot row (j)
# from the current row (i) to cancel U[i,j] = Uᵢⱼ:
factor = -U[i,j]/U[j,j]
U[i,:] = U[i,:] + U[j,:] * factor
step -= 1
step ≤ 0 && return (U, i, j, factor)
end
end
return U, m, m, factor
end
naive_gauss
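For example, we can call the function above directly to inspect a single elimination step of our first $3 \times 3$ system (this usage example assumes `naive_gauss` as just defined):

```julia
A = [1 3 1; 1 1 -1; 3 11 6]
b = [9, 1, 35]
# perform only the first elimination step on the augmented matrix:
U1, row, col, factor = naive_gauss([A b], 1)
# row == 2, col == 1, factor == -1.0: we added -1 × (row 1) to (row 2)
U, = naive_gauss([A b])   # all steps: U ≈ [1 3 1 9; 0 -2 -2 -8; 0 0 1 0]
```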
For display purposes on the 18.06 github page, we can't use interactive sliders, so I just use the following code to output each of the "steps" of the animation non-interactively. To go back to interactive sliders, change `import Interact` to `using Interact` below:
import Interact # change to "using Interact" to make interactive
# non-interactive replacements for Interact features
if !isdefined(Main, :slider)
# non-interactive replacement for Interact.@manipulate
macro manipulate(expr)
return Expr(:for, esc(expr.args[1]), :(display($(esc(expr.args[2])))))
end
slider(r; kws...) = r
end
slider (generic function with 1 method)
# For display, I only want to show 3 decimal places of floating-point values,
# but I want to show integers and similar types exactly, so I define a little
# function to do this rounding
shorten(x::AbstractFloat) = round(x, digits=3)
shorten(x) = x # leave non floating-point values as-is
# create an interactive widget to visualize the Gaussian-elimination process for the matrix A.
function visualize_gauss(A)
m = size(A, 1)
@manipulate for step in slider(1:(m*(m-1))÷2, value=1, label="gauss step")
Uprev, = naive_gauss(A, step-1)
U, row, col, factor = naive_gauss(A, step)
pivot = U[col,col]
Interact.vbox(
Interact.node(:pre, "Gaussian elimination for column $col with pivot $pivot: add $(shorten(factor)) * (row $col) to (row $row)"),
Interact.hbox(shorten.(Uprev),
Interact.hskip(1*Interact.em), "⟶", Interact.hskip(1*Interact.em),
shorten.(U)
)(Interact.alignitems(:center))
)(Interact.alignitems(:center))
end
end
visualize_gauss (generic function with 1 method)
Now, let's use this machinery to interact with some examples, starting with our $3 \times 3$ matrix from above:
visualize_gauss([A b])
visualize_gauss(rand(-9:9,5,5))
Of course, because we are not re-ordering the rows, this process can go horribly wrong, most obviously if a zero pivot is encountered:
Abad = [-3 5 5 3 -7
3 -5 8 -8 -6
8 2 8 2 -8
-6 -2 6 4 -8
-8 4 -6 -1 8]
visualize_gauss(Abad)
But this matrix is not actually singular:
det(Abad)
19211.999999999996
So we can fix the problem just by re-ordering the rows, e.g. swapping the first and last rows:
Aok = [-8 4 -6 -1 8
3 -5 8 -8 -6
8 2 8 2 -8
-6 -2 6 4 -8
-3 5 5 3 -7]
visualize_gauss(Aok)
We quickly run out of space for displaying matrices as text, but we can visualize the process for larger matrices by using images, with the PyPlot package (a wrapper around the Python Matplotlib library):
using PyPlot
m = 100
Abig = randn(m,m)
fig = figure()
nsteps = (m*(m-1))÷2
@manipulate for step in slider(0:550:nsteps, value=2, label="gauss step")
withfig(fig) do
U, row, col = naive_gauss(Abig, step)
# I had to experiment a little to find a nice way to plot this
V = log10.(abs.(U) .+ 500)
V[abs.(U) .< 0.001] .= 0 # color small entries black
imshow(V, cmap="hot", vmin=0, vmax=3)
title("step $step: column $col, row $row")
colorbar(label=L"\log_{10}(|U_{i,j}| + 500)")
end
end
Note that it takes a lot more steps of Gaussian elimination for a $100 \times 100$ matrix (4950 steps) than for a $5 \times 5$ matrix (10 steps). Later on in 18.06, we will analyze the computational cost of Gaussian elimination and how it scales with the size of the matrix (in computer science, this is known as the complexity of the algorithm).
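As a preview of that analysis: eliminating an $n \times n$ matrix takes roughly $n^3/3$ multiply–add operations, so doubling $n$ should multiply the time by about 8. We can check this roughly by timing solves of increasing size (a sketch; the exact timings depend on your machine and BLAS library, and parallelism blurs the scaling at small sizes):

```julia
using LinearAlgebra
for n in (500, 1000, 2000)
    A = rand(n, n)
    b = rand(n)
    A \ b        # run once first to exclude compilation from the timing
    @time A \ b  # time should grow roughly like n³ (≈ 8× per doubling of n)
end
```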
Another interesting example can be found in the Machine Learning with Gaussian Elimination notebook.