Using rotations and reflections to introduce zeros in a matrix is a powerful paradigm in numerical linear algebra.
Reflections and rotations are orthogonal transformations. They preserve the Euclidean lengths of real vectors.
They neither shrink nor stretch vectors - these transformations have the perfect condition number, $\kappa=1$ (at least in the 2-norm and other unitarily invariant norms like the Frobenius norm).
This notebook walks through the basic construction and application of two important variants.
Householder reflections: roughly speaking, these are used for dense problems, introducing zeros column by column.
Givens rotations: These are often used for data-sparse problems, for instance, introducing zeros one entry at a time in a sparse matrix.
using LinearAlgebra
using SparseArrays
The Householder reflector $F_v$, constructed from a real $k$-dimensional vector $v$, is a rank-one modification of the identity:
$$F_v = I - \frac{2}{\alpha}\, xx^*, \quad\text{where}\quad x = {\rm sign}(v_1)\|v\|e_1+v \quad\text{and}\quad \alpha = \|x\|^2.$$
It looks like the orthogonal projection onto the subspace orthogonal to $x$, but it is actually a reflection across the subspace orthogonal to $x$. The orthogonal projection has rank $k-1$, but the reflection has full rank $k$ and satisfies $F_v^{-1}=F_v^T=F_v$.
The vector $x$ is chosen so that the reflection takes the vector $v$ to the first coordinate axis:
$$ F_v\begin{pmatrix}v_1 \\ v_2 \\ \vdots \\ v_k \end{pmatrix} = \begin{pmatrix}-{\rm sign}(v_1)\|v\| \\ 0 \\ \vdots \\ 0\end{pmatrix}.$$
# compute the Householder reflector
function hhr(v)
    x = copy(v) # copy v to x
    x[1] = x[1] + sign(x[1])*norm(x) # modify first entry of x: x = v + sign(v1)*||v||*e1
    return x
end
v = randn(6)
v = v / norm(v)
x = hhr(v)
Fv = I-2*x*x' / (x'*x)
display(Fv) # here's the full reflection matrix
display(norm(I - Fv'*Fv)) # orthogonal transformations have transpose = inverse
display(Fv*v) # x is chosen so that Fv*v = -sign(v1)*||v|| * e1 (on the computer, remember that rounding errors occur)
In practice, we never form the matrix $F_v$ explicitly. We apply it to vectors as a linear transformation by computing
$$ F_vw = w - \frac{2}{\alpha}\, x (x^*w). $$
The arithmetic cost is just a dot product, a vector addition, and a scalar-vector multiplication - much better than building the whole matrix and multiplying it by a vector.
We also only need to store the reflection vector $x$ (you could store the scalar $\alpha = \|x\|^2$ as well, to avoid recomputing it).
function apply_hhr(x, w)
    # apply the reflection F_v = I - 2*x*x'/(x'*x) to w without forming the matrix
    return w - 2 * x * (x' * w) / (x' * x)
end
w = randn(6)
Fvw = apply_hhr(x, w) # we can use the computed reflector x to apply Fv to any vector without forming the full reflection matrix
display(Fvw) # vectors other than v get reflected across the same subspace, but don't usually end up along a coordinate axis (mostly nonzeros)
display(norm(Fvw)-norm(w)) # but, reflection means norm is preserved for _any_ vector
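Since $F_v^{-1} = F_v$, reflecting twice should give back the vector we started with (up to rounding). A quick extra check, reusing the x, w, and apply_hhr from above:
display(norm(apply_hhr(x, apply_hhr(x, w)) - w)) # F_v is its own inverse: reflecting twice recovers w (up to rounding)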
Householder reflections really come into their own when we use them to introduce zeros and factor matrices. Here's a toy implementation for a familiar example: $A=QR$.
function hqr(A, k)
    # Householder triangularization of the first k columns of A
    R = copy(A)
    X = zeros(size(A))
    for j in 1:k
        X[j:end,j] = hhr(R[j:end,j]) # get Householder reflector for column j
        R[j:end,j:end] = apply_hhr(X[j:end,j], R[j:end,j:end]) # zero out column j below the diagonal by reflecting the lower right submatrix
    end
    return X, R # return reflectors (for orthogonal Q) and upper triangular R
end
A = randn(8,5)
F = hqr(A,5)
display(abs.(F[2]).>1e-14) # R is now (numerically) zero below the diagonal
display(abs.(F[1]).>1e-14) # F[1] contains Householder vectors of decreasing size
In practice, the Householder reflectors are usually stored in a compact blocked format that enjoys better storage properties and enables faster matrix-matrix multiplication implementations.
The point is, we store the reflectors, and not the full reflection matrix!
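For instance, if we ever do need $Q$ explicitly, we can rebuild it from the stored reflectors by applying them to the identity in reverse order. Here is a minimal sketch (the helper name form_Q is ours, not a library routine), reusing apply_hhr and the factorization F of the 8-by-5 random matrix from above:
function form_Q(X)
    # accumulate Q = F_1 F_2 ... F_k by applying the stored reflectors to the
    # identity in reverse order (reflector j acts only on rows j:end)
    m, k = size(X)
    Q = Matrix{Float64}(I, m, m)
    for j in k:-1:1
        Q[j:end,:] = apply_hhr(X[j:end,j], Q[j:end,:])
    end
    return Q
end
Q = form_Q(F[1])
display(norm(Q*F[2] - A)) # the reflectors encode Q: A = QR up to rounding
display(norm(Q'*Q - I)) # and Q is orthogonal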
Next week, Householder reflectors will play a starring role in the "crown jewel" of numerical analysis: the QR algorithm for computing eigenvalues (not to be confused with the QR factorization).
Householder reflections naturally operate on whole columns of $A$. But what if most of the entries in a column are already zero? Reflecting the entire column is wasteful - instead, we can introduce zeros one entry at a time with Givens rotations.
You can see the idea clearest in two dimensions first, where the vector $x = (x_1,\,\,x_2)^T$ is rotated counterclockwise by an angle $\theta$ into the vector $y = (y_1,\,\,y_2)^T$ by
$$ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. $$
Given $x$, how should we choose $\theta$ so that $y_2 = 0$? We need to rotate $x$ onto the $e_1=(1,\,\,0)^T$ coordinate axis! If we choose $\cos(\theta) = x_1/\|x\|$ and $\sin(\theta) = - x_2/\|x\|$, then
$$ \begin{pmatrix} \|x\| \\ 0 \end{pmatrix} = \frac{1}{\|x\|}\begin{pmatrix} x_1 & x_2 \\ -x_2 & x_1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. $$
The matrix we constructed is a rotation - an orthogonal transformation - that zeros out the second entry of the special vector $x$. A Givens rotation is the $2\times 2$ rotation analogue of a Householder reflection!
In numerical linear algebra, we are usually concerned with more than just two dimensions. But we can use Givens rotations to introduce one zero at a time by, e.g., mixing two rows at a time. Conceptually, this means embedding the Givens rotation matrix into a larger identity matrix. For example, if we want to use the first entry of a $5$-dimensional vector to zero out its last entry, we could write
$$ \begin{pmatrix} \sqrt{x_1^2+x_5^2} \\ x_2 \\ x_3 \\ x_4 \\ 0 \end{pmatrix} = \begin{pmatrix} c & 0 & 0 & 0 & s \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ -s & 0 & 0 & 0 & c \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}, \qquad\text{where}\qquad c = \frac{x_1}{\sqrt{x_1^2+x_5^2}} \qquad\text{and}\qquad s = \frac{x_5}{\sqrt{x_1^2+x_5^2}}. $$
Just as with Householder reflections, we never form the full rotation matrix on the computer. We store $c$ and $s$ and apply the Givens rotation as a linear transformation that combines two rows (when applied from the left).
function toy_givens(x, j, k)
    # compute c and s so that the rotation zeros out entry k of x against entry j
    r = sqrt(x[j]^2 + x[k]^2)
    c = x[j] / r
    s = x[k] / r
    return c, s
end
function apply_givens(v, c, s, j, k)
    # apply the Givens rotation from the left, mixing rows j and k
    w = copy(v)
    w[j,:] = c*v[j,:] + s*v[k,:]
    w[k,:] = -s*v[j,:] + c*v[k,:]
    return w
end
N = 10
A = diagm(-1 => -ones(N-1), 0 => 2*ones(N), 1 => -ones(N-1))
g = toy_givens(A[:,1], 1, 2) # compute Givens rotation to zero out first subdiagonal entry
B = apply_givens(A, g[1], g[2], 1, 2) # apply Givens rotation to mix first two rows of A
display(sparse(A)) # display matrix before Givens
display(sparse(B)) # display matrix after Givens
display( norm(A[:,2]) - norm(B[:,2]) ) # column norm is preserved
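Because the rotation is orthogonal, applying its transpose (just flip the sign of s) undoes it. A quick extra check with the g and B computed above:
display(norm(apply_givens(B, g[1], -g[2], 1, 2) - A)) # the transposed rotation restores the original matrix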
There are a number of subtle points to get right when doing Givens on the computer. Luckily for us, Julia has a great ready-to-use function for computing and working with Givens rotations. Let's test it out on the same example we used for our "toy" Givens rotation code.
N = 10
A = diagm(-1 => -ones(N-1), 0 => 2*ones(N), 1 => -ones(N-1))
G = givens(A, 1, 2, 1)
B = G[1]*A
display(sparse(A))
display(sparse(B))
display( norm(A[:,2]) - norm(B[:,2]) )
It looks like our toy code did okay on this simple example. Now, let's string a few more Givens rotations together to triangularize the tridiagonal matrix $A$ above. We'll also allow it to accumulate the same Givens rotations applied to another matrix B, which is useful if we want to express the orthogonal factor $Q$ as a matrix, or if we want to compute $Q^Tb$ for least-squares problems.
function triQR(A, B)
    # compute the QR decomposition of a tridiagonal matrix using Givens rotations,
    # accumulating the same rotations applied to B (so the second output is Q'*B)
    R = copy(A)
    QTB = copy(B)
    n = size(R,2)
    for j in 1:n-1
        G = givens(R, j, j + 1, j) # rotation of rows j and j+1 that zeros R[j+1,j]
        R = G[1]*R
        QTB = G[1]*QTB
    end
    return R, QTB
end
F = triQR(A,diagm(0 => ones(N)))
display(abs.(F[1]).>1e-14) # F[1] is the triangular factor - it is banded, with only two nonzero superdiagonals
display(norm(F[2]'*F[2]-I)) # F[2] is the transpose of the orthogonal factor
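The second output also lets us solve linear systems: pass a right-hand side $b$ through triQR instead of the identity, then back-substitute with the banded triangular factor. A small sketch we've added:
b = randn(N)
R, QTb = triQR(A, b) # accumulate the rotations applied to b, so QTb = Q'*b
x = UpperTriangular(R) \ QTb # back-substitution with the (numerically) upper triangular factor
display(norm(A*x - b)) # x solves the tridiagonal system A*x = b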
Banded matrices are very special. What happens for other types of sparse matrices? Let's replace the first row of the difference matrix with a row of ones and see how it affects the $QR$ factorization.
$$ A = \begin{pmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ -1 & 2 & -1 & 0 & \cdots & 0 \\ 0 & -1 & 2 & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & & 0 & -1 & 2 \end{pmatrix} $$
A[1,:] = ones(N) # replace the first row of A with ones
F = triQR(A,diagm(0 => ones(N)))
display(abs.(F[1]).>1e-14) # F[1] is the triangular factor - is it still banded?
display(norm(F[2]'*F[2]-I)) # F[2] is the transpose of the orthogonal factor
The upper triangular factor is now completely dense... what happened?
We can explore by running Householder QR on $A$ one column at a time, stopping to visualize how the upper triangular factor fills in.
F = hqr(A, 0) # try k = 0, 1, 2, ... to triangularize one more column each time
display(abs.(F[2]).>1e-14) # watch the (numerically) nonzero pattern of the triangular factor fill in as k increases
Just as we saw with Gaussian elimination and $A=LU$, the factors of $A$ can suffer from fill-in: many more nonzero entries than in the original matrix.
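To put a number on the fill-in, we can count the entries above the rounding threshold before and after a full Householder triangularization of the modified matrix (a quick check we've added; the 1e-14 cutoff matches the displays above):
F = hqr(A, N) # full Householder QR of the modified sparse matrix
display(count(abs.(A) .> 1e-14)) # number of (numerical) nonzeros in A
display(count(abs.(F[2]) .> 1e-14)) # number of (numerical) nonzeros in the triangular factor R - many more!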