%matplotlib inline
%config InlineBackend.figure_format='retina'
# import libraries
import numpy as np
import matplotlib as mp
import pandas as pd
import matplotlib.pyplot as plt
import laUtilities as ut
import slideUtilities as sl
from IPython.display import Image
from IPython.display import display_html
from IPython.display import display
reload(ut)
print ''
%%html
<style>
.container.slides .celltoolbar, .container.slides .hide-in-slideshow {
display: None ! important;
}
</style>
# image credit: http://en.wikipedia.org/wiki/Carl_Friedrich_Gauss#mediaviewer/File:Carl_Friedrich_Gauss.jpg
sl.hide_code_in_slideshow()
display(Image("images/Carl_Friedrich_Gauss.jpg", width=250))
In the last lecture we described a method for solving linear systems, but our description was somewhat informal. Today we'll formally define Gaussian Elimination, sometimes called Gauss-Jordan Elimination.
Carl Gauss lived from 1777 to 1855, in Germany. He is often called "the greatest mathematician since antiquity."
When Gauss was around 17 years old, he developed a method for working with inconsistent linear systems, called the method of least squares. A few years later (at the advanced age of 24) he turned his attention to a particular problem in astronomy. In 1801 the Sicilian astronomer Piazzi discovered a (dwarf) planet, which he named Ceres, in honor of the patron goddess of Sicily. Piazzi took measurements of Ceres' position for 40 nights, but then lost track of it when it passed behind the sun. Piazzi had only tracked Ceres through about 3 degrees of sky. Gauss however then succeeded in calculating the orbit of Ceres, even though the task seemed hopeless on the basis of so few observations. His computations were so accurate that the astronomer Olbers located Ceres again later the same year.
In the course of his computations Gauss had to solve systems of 17 linear equations. Since Gauss at first refused to reveal the methods that led to this amazing accomplishment, some even accused him of sorcery. Eight years later, in 1809, Gauss revealed his methods of orbit computation in his book Theoria Motus Corporum Coelestium.
Although Gauss invented this method (which Jordan then popularized), it was a reinvention. As we mentioned in the previous lecture, linear systems were being solved by a similar method in China 2,000 years earlier.
Based on Bretscher, Linear Algebra, pp 17-18, and the Wikipedia article on Gauss.
A system of linear equations is plotted below. We create an augmented matrix to represent this linear system, then perform a series of elementary row operations. Which of the following graphs could represent the result of these row operations?
sl.hide_code_in_slideshow()
display(Image("images/RowOperationsQuestion.jpg", width=450))
An echelon is a term used in the military to decribe an arrangement of rows (of troops, or ships, etc) in which each successive row extends further than the row in front of it.
At the end of the last lecture, we had constructed this matrix:
$$ \left[\begin{array}{rrrr} 2&-3&2&1\\ 0&1&-4&8\\ 0&0&0&-37/2 \end{array}\right] $$A "leading entry" is the first nonzero element in a row.
Definition: A matrix is in echelon form (or row echelon form) if it has the following three properties:
For example: $$ \left[\begin{array}{cccccccccc} 0&\blacksquare&*&*&*&*&*&*&*&*\\ 0&0&0&\blacksquare&*&*&*&*&*&*\\ 0&0&0&0&\blacksquare&*&*&*&*&*\\ 0&0&0&0&0&\blacksquare&*&*&*&*\\ 0&0&0&0&0&0&0&0&\blacksquare&*\\ 0&0&0&0&0&0&0&0&0&0\\ \end{array} \right] $$
In this diagram, the $\blacksquare$s are nonzero, and the $*$s can be any value.
This definition is a refinement of the notion of a triangular matrix (or system) that was introduced in the previous lecture.
The goal of the first step of Gaussian elimination is to convert the augmented matrix into echelon form.
Definition: A matrix is in reduced echelon form (or reduced row echelon form) if it is in echelon form, and furthermore:
For example: $$ \left[\begin{array}{cccccccccc} 0&\fbox{1}&*&0&0&0&*&*&0&*\\ 0&0&0&\fbox{1}&0&0&*&*&0&*\\ 0&0&0&0&\fbox{1}&0&*&*&0&*\\ 0&0&0&0&0&\fbox{1}&*&*&0&*\\ 0&0&0&0&0&0&0&0&\fbox{1}&*\\ 0&0&0&0&0&0&0&0&0&0\\ \end{array} \right] $$
The goal of the second step of Gaussian elimination is to convert the matrix into reduced echelon form.
Any matrix may be row reduced to an echelon form. Echelon forms are not unique; depending on the sequence of row operations, different echelon forms may be produced from a given matrix.
However, the reduced echelon form of a matrix is unique.
Theorem: Each matrix is equivalent to one and only one reduced echelon matrix.
The positions of the leading entries of an echelon matrix and its reduced form are the same. So, by the Theorem, the leading entries of any echelon form of a given matrix are in the same positions.
Definition: A pivot position in a matrix $A$ is the position of a leading 1 in the reduced echelon form of $A$.
As suggested by the last lecture, Gaussian Elimination has two stages. Given an augmented matrix $A$ representing a linear system:
Each stage iterates over the rows of $A$, starting with the first row.
Row Reduction Operations
Before stating the algorithm, let's recall the set of operations that we can perform on rows without changing the solution set:
Gaussian Elimination, Stage 1 (Elimination):
Input: matrix $A$.
We will use $i$ to denote the index of the current row. To start, let $i = 1$. Repeat the following steps:
The output of this stage is an echelon form of $A$.
In the following, '*' means a nonzero value, and '0' means a zero value.
For the following starting matrix:
$$ \left[\begin{array}{ccc} *&*&*\\ *&*&*\\ *&*&*\\ \end{array}\right] $$which of the following situations could occur during the Stage 1 of Gaussian Elimination?
$$ \begin{array}{cccc} \left[\begin{array}{ccc} 0&*&*\\ *&*&*\\ *&*&*\\ \end{array}\right] & \left[\begin{array}{ccc} *&*&*\\ 0&*&*\\ *&*&*\\ \end{array}\right] & \left[\begin{array}{ccc} *&*&*\\ *&0&*\\ *&*&*\\ \end{array}\right] & \left[\begin{array}{ccc} *&0&*\\ *&*&*\\ *&*&*\\ \end{array}\right] \\ (a)&(b)&(c)&(d)\\ \end{array} $$Gaussian Elimination, Stage 2 (Backsubstitution):
Input: an echelon form of $A$.
We start at the top again, so let $i = 1$. Repeat the following steps:
The output of this stage is the reduced echelon form of $A$.
Note: the book describes this stage as proceeding from the bottom of the matrix up; that procedure is equivalent to this one.
The Gaussian Elimination process we've described is essentially equivalent to the process described in the last lecture, so we won't do a lengthy example. Let the input matrix $A$ be $$\left[\begin{array}{rrrrrr} 0 & 3 & -6 & 6 & 4 & -5\\ 3 & -7 & 8 & -5 & 8 & 9\\ 3 & -9 & 12 & -9 & 6 & 15 \end{array}\right]$$
Stage 1
Start with the first row ($i = 1$). The leftmost nonzero in row 1 and below is in position 1. But since it's not in row 1, we need to swap. We'll swap rows 1 and 3 (we could have swapped 1 and 2).
$$\left[\begin{array}{rrrrrr} 3 & -9 & 12 & -9 & 6 & 15\\ 3 & -7 & 8 & -5 & 8 & 9\\ 0 & 3 & -6 & 6 & 4 & -5 \end{array}\right]$$The pivot is shown in a box. Use row reduction operations to create zeros below the pivot. In this case, that means subtracting row 1 from row 2.
$$\left[\begin{array}{rrrrrr} \fbox{3} & -9 & 12 & -9 & 6 & 15\\ 0 & 2 & -4 & 4 & 2 & -6\\ 0 & 3 & -6 & 6 & 4 & -5 \end{array}\right]$$Now $i = 2$. The pivot is boxed (no need to do any swaps). Use row reduction to create zeros below the pivot. To do so we subtract $3/2$ times row 2 from row 3.
$$\left[\begin{array}{rrrrrr} 3 & -9 & 12 & -9 & 6 & 15\\ 0 & \fbox{2} & -4 & 4 & 2 & -6\\ 0 & 0 & 0 & 0 & 1 & 4 \end{array}\right]$$Now $i = 3$. Since it is the last row, we are done with Stage 1. The pivots are marked:
$$\left[\begin{array}{rrrrrr} \fbox{3} & -9 & 12 & -9 & 6 & 15\\ 0 & \fbox{2} & -4 & 4 & 2 & -6\\ 0 & 0 & 0 & 0 & \fbox{1} & 4 \end{array}\right]$$Stage 2
Starting again with the first row ($i = 1$). Divide row 1 by its pivot.
$$\left[\begin{array}{rrrrrr} \fbox{1} & -3 & 4 & -3 & 2 & 5\\ 0 & 2 & -4 & 4 & 2 & -6\\ 0 & 0 & 0 & 0 & 1 & 4 \end{array}\right]$$Moving to the next row ($i = 2$). Divide row 2 by its pivot.
$$\left[\begin{array}{rrrrrr} 1 & -3 & 4 & -3 & 2 & 5\\ 0 & \fbox{1} & -2 & 2 & 1 & -3\\ 0 & 0 & 0 & 0 & 1 & 4 \end{array}\right]$$And use row reduction operations to create zeros in all elements above the pivot. In this case, that means adding 3 times row 2 to row 1.
$$\left[\begin{array}{rrrrrr} 1 & 0 & -2 & 3 & 5 & -4\\ 0 & \fbox{1} & -2 & 2 & 1 & -3\\ 0 & 0 & 0 & 0 & 1 & 4 \end{array}\right]$$Moving to the next row ($i = 3$). The pivot is already 1. So we subtract row 3 from row 2, and subtract 5 times row 3 from row 1.
$$\left[\begin{array}{rrrrrr} 1 & 0 & -2 & 3 & 0 & -24\\ 0 & 1 & -2 & 2 & 0 & -7\\ 0 & 0 & 0 & 0 & \fbox{1} & 4 \end{array}\right]$$And we are done.
Let's assess the computational cost required to solve a system of $n$ equations in $n$ unknowns.
First, we need to define our units. We will count the number of additions, multiplications, divisions, or subtractions. These are performed on floating point numbers, so they are called flops (floating point operations).
For $n$ equations in $n$ unknowns, $A$ is an $n \times (n+1)$ matrix. We can summarize stage 1 of GE as, in the worst case:
For row 1, this becomes $(n-1) \cdot 2(n+1)$ flops. That is, there are $n-1$ rows below row 1, each of those has $n+1$ elements, and each element requires one multiplication and one addition. This is $2n^2-2$ flops for row 1.
When operating on row $i$, there are $k = n - i + 1$ unknowns and so there are $2k^2 - 2$ flops required to process the rows below row $i$. So we can see that $k$ ranges from $n$ down to $1$. The number of operations required is then:
$$ \begin{array}{rcl} \sum_{k=1}^n (2k^2 - 2) &=& 2 \left(\sum_{k=1}^n k^2 - \sum_{k=1}^n 1\right)\\ &=& 2 \left(\frac{n(n+1)(2n+1)}{6} - n\right)\\ &=& \frac{2}{3} n^3 + n^2 - \frac{5}{3} n \end{array} $$The second step above is based on known formulas.
Note that this is a different calculation than is done in the book.
When $n$ is large, this expression is dominated by (approximately equal to) $\frac{2}{3} n^3$.
The second stage of GE only requires on the order of $n^2$ flops, so the whole algorithm is dominated by the $\frac{2}{3} n^3$ flops in the first stage.
Fact: For a system of $n$ equations in $n$ unknowns, Gaussian Elimination requires approximately $\frac{2}{3} n^3$ operations when $n$ is large.
Question: In the homework you solve an equation with 4 unknowns. At the beginning of the lecture we talked about Gauss solving large systems, on the order of 16 unknowns. If it took you 10 minutes to solve the 4-unknown problem, about how long would you think Gauss might have needed for his 16-unknown system?
Answer: For a system that is 4 times larger, one needs to do $4^3 = 64$ times as much work. So our prediction has Gauss working on a single system for 640 minutes, or more than 10 hours.
Returning to the fundamental questions about a linear system -- consistency and uniqueness -- we've discussed how the echelon form exposes consistency (by creating an equation $0 = k$ for some nonzero $k$). Now we can also discuss uniqueness.
Let's assume that the augmented matrix of a system has been transformed into the equivalent reduced echelon form:
$$ \left[\begin{array}{rrrr} 1&0&-5&1\\ 0&1&1&4\\ 0&0&0&0 \end{array}\right] $$This system is consistent. Is the solution unique?
The associated system of equations is
$$ \begin{array}{rrrrr} x_1 & & -5x_3 &=& 1\\ &x_2 & +x_3 &=& 4\\ &&0&=&0\\ \end{array} $$Variables $x_1$ and $x_2$ correspond to pivot columns. They are called basic variables. The other variable $x_3$ is a free variable.
Whenever a system is consistent, the solution set can be described explicitly by solving the reduced system of equations for the basic variables in terms of the free variables.
This operation is possible because the reduced echelon form places each basic variable in one and only one equation.
In the example, solve the first and second equations for $x_1$ and $x_2$. Ignore the third equation; it offers no restriction on the variables.
So the solution set is:
$$\begin{array}{rl} x_1 &= 1 + 5x_3\\ x_2 &= 4 - x_3\\ x_3 &\mbox{is free} \end{array} $$"$x_3$ is free" means you can choose any value for $x_3$. There are an inifinite set of solutions, each corresponding to one value of $x_3$. For instance, when $x_3 = 0$, the solution is (1,4,0); when $x_3 = 1,$ the solution is (6,3,1).
These are parametric descriptions of solutions sets. The free variables act as parameters. So solving a system amounts to finding a parametric description of the solution set (or determining that the solution set is empty).
sl.hide_code_in_slideshow()
fig = plt.figure()
xmin = zmin = -4
xmax = zmax = 4
ymin = 0
ymax = 8
ax = fig.add_subplot(111, projection='3d')
ax.axes.set_xlim([xmin, xmax])
ax.axes.set_ylim([ymin, ymax])
ax.axes.set_zlim([zmin, zmax])
ax.axes.set_xlabel('$x_1$')
ax.axes.set_ylabel('$x_2$')
ax.axes.set_zlabel('$x_3$')
eq1 = [1,0,-5,1]
eq2 = [0,1,1,4]
ut.plotLinEqn3d(ax, eq1, 'Brown')
ut.plotLinEqn3d(ax, eq2, 'Green')
ut.plotIntersection3d(ax, eq1, eq2, 'Green')
plt.show()
Since there is a row of zeros in the reduced echelon form matrix, there are only two equations (rather than three) that determine the solution set. The equations
$$\begin{array}{rl} x_1 &= 1 + 5x_3\\ x_2 &= 4 - x_3\\ x_3 &\mbox{is free} \end{array} $$therefore are the equations of a line in 3-space.