Using rotations and reflections to introduce zeros in a matrix is a powerful paradigm in numerical linear algebra.
Reflections and rotations are orthogonal transformations. They preserve the Euclidean lengths of real vectors.
They neither shrink nor stretch vectors - these transformations have the perfect condition number, $\kappa=1$ (at least in the 2-norm and other unitarily invariant norms like the Frobenius norm).
This notebook walks through the basic construction and application of two important variants.
Householder reflections: roughly speaking, these are used for dense problems, introducing zeros column by column.
Givens rotations: These are often used for data-sparse problems, for instance, introducing zeros one entry at a time in a sparse matrix.
using LinearAlgebra
using SparseArrays
The Householder reflector $F_v$, constructed from a real $k$-dimensional vector $v$, is a rank-one modification of the identity:
$$F_v = I - \frac{2}{\alpha}\, xx^*, \quad\text{where}\quad x = {\rm sign}(v_1)\|v\|e_1+v \quad\text{and}\quad \alpha = \|x\|^2.$$
It looks like the orthogonal projection onto the subspace orthogonal to $x$, but it is actually a reflection across the subspace orthogonal to $x$. The orthogonal projection has rank $k-1$, but the reflection has full rank $k$ and satisfies $F_v^{-1}=F_v^T=F_v$.
The vector $x$ is chosen so that the reflection takes the vector $v$ to the first coordinate axis:
$$ F_v\begin{pmatrix}v_1 \\ v_2 \\ \vdots \\ v_k \end{pmatrix} = \begin{pmatrix}-{\rm sign}(v_1)\|v\| \\ 0 \\ \vdots \\ 0\end{pmatrix}.$$
# compute the Householder reflector
function hhr(v)
    x = copy(v) # copy v to x
    x[1] = x[1] + sign(x[1])*norm(x) # modify first entry of x: x = v + sign(v1)*||v||*e1
    return x
end
v = randn(6)
v = v / norm(v)
x = hhr(v)
Fv = I-2*x*x' / (x'*x)
display(Fv) # here's the full reflection matrix
display(norm(I - Fv'*Fv)) # orthogonal transformations have transpose = inverse
display(Fv*v) # x is chosen so that Fv*v = -sign(v1)*||v|| * e1 (on the computer, remember that rounding errors occur)
In practice, we never form the matrix $F_v$ explicitly. We apply it to vectors as a linear transformation by computing
$$ F_vw = w - \frac{2}{\alpha}\, x (x^*w). $$
The arithmetic cost is just a dot product, a vector addition, and a scalar-vector multiplication - much better than building the whole matrix and multiplying it by a vector.
We also only need to store the reflection vector $x$ (you could store the scalar $\alpha = \|x\|^2$ as well, to avoid recomputing it).
function apply_hhr(x, w)
    # apply the reflection F_v = I - 2*x*x'/(x'*x) to w without forming the matrix
    return w - 2 * x * (x' * w) / (x' * x)
end
w = randn(6)
Fvw = apply_hhr(x, w) # we can use the computed reflector x to apply Fv to any vector without forming the full reflection matrix
display(Fvw) # vectors other than v get reflected across the same subspace, but don't usually end up along a coordinate axis (mostly nonzeros)
display(norm(Fvw)-norm(w)) # but, reflection means norm is preserved for _any_ vector
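Since $F_v^{-1} = F_v$, reflecting twice should give back the vector we started with (up to rounding). A quick extra check, reusing the x, w, and apply_hhr from above:
display(norm(apply_hhr(x, apply_hhr(x, w)) - w)) # F_v is its own inverse: reflecting twice recovers w (up to rounding)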
Householder reflections really come into their own when we use them to introduce zeros and factor matrices. Here's a toy implementation for a familiar example: $A=QR$.
function hqr(A, k)
    # Householder triangularization of the first k columns of A
    R = copy(A)
    X = zeros(size(A))
    for j in 1:k
        X[j:end,j] = hhr(R[j:end,j]) # get Householder reflector for column j
        R[j:end,j:end] = apply_hhr(X[j:end,j], R[j:end,j:end]) # zero out column j below the diagonal by reflecting the lower right submatrix
    end
    return X, R # return reflectors (for orthogonal Q) and upper triangular R
end
A = randn(8,5)
F = hqr(A,5)
display(abs.(F[2]).>1e-14) # R is now (numerically) zero below the diagonal
display(abs.(F[1]).>1e-14) # F[1] contains Householder vectors of decreasing size
In practice, the Householder reflectors are usually stored in a compact blocked format that enjoys better storage properties and enables faster matrix-matrix multiplication implementations.
The point is, we store the reflectors, and not the full reflection matrix!
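For instance, if we ever do need $Q$ explicitly, we can rebuild it from the stored reflectors by applying them to the identity in reverse order. Here is a minimal sketch (the helper name form_Q is ours, not a library routine), reusing apply_hhr and the factorization F of the 8-by-5 random matrix from above:
function form_Q(X)
    # accumulate Q = F_1 F_2 ... F_k by applying the stored reflectors to the
    # identity in reverse order (reflector j acts only on rows j:end)
    m, k = size(X)
    Q = Matrix{Float64}(I, m, m)
    for j in k:-1:1
        Q[j:end,:] = apply_hhr(X[j:end,j], Q[j:end,:])
    end
    return Q
end
Q = form_Q(F[1])
display(norm(Q*F[2] - A)) # the reflectors encode Q: A = QR up to rounding
display(norm(Q'*Q - I)) # and Q is orthogonal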
Next week, Householder reflectors will play a starring role in the "crown jewel" of numerical analysis: the QR algorithm for computing eigenvalues (not to be confused with the QR factorization).
Householder reflections naturally operate on whole columns of $A$. But what if most of the entries in a column are already zero? Reflecting the entire column is wasteful - instead, we can introduce zeros one entry at a time with Givens rotations.
You can see the idea clearest in two dimensions first, where the vector $x = (x_1,\,\,x_2)^T$ is rotated counterclockwise by an angle $\theta$ into the vector $y = (y_1,\,\,y_2)^T$ by
$$ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. $$
Given $x$, how should we choose $\theta$ so that $y_2 = 0$? We need to rotate $x$ onto the $e_1=(1,\,\,0)^T$ coordinate axis! If we choose $\cos(\theta) = x_1/\|x\|$ and $\sin(\theta) = - x_2/\|x\|$, then
$$ \begin{pmatrix} \|x\| \\ 0 \end{pmatrix} = \frac{1}{\|x\|}\begin{pmatrix} x_1 & x_2 \\ -x_2 & x_1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. $$
The matrix we constructed is a rotation - an orthogonal transformation - that zeros out the second entry of the special vector $x$. A Givens rotation is the $2\times 2$ rotation analogue of a Householder reflection!
In numerical linear algebra, we are usually concerned with more than just two dimensions. But we can use Givens rotations to introduce one zero at a time by, e.g., mixing two rows at a time. Conceptually, this means embedding the Givens rotation matrix into a larger identity matrix. For example, if we want to use the first entry of a $5$-dimensional vector to zero out its last entry, we could write
$$ \begin{pmatrix} \sqrt{x_1^2+x_5^2} \\ x_2 \\ x_3 \\ x_4 \\ 0 \end{pmatrix} = \begin{pmatrix} c & 0 & 0 & 0 & s \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ -s & 0 & 0 & 0 & c \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}, \qquad\text{where}\qquad c = \frac{x_1}{\sqrt{x_1^2+x_5^2}} \qquad\text{and}\qquad s = \frac{x_5}{\sqrt{x_1^2+x_5^2}}. $$
Just as with Householder reflections, we never form the full rotation matrix on the computer. We store $c$ and $s$ and apply the Givens rotation as a linear transformation that combines two rows (when applied from the left).
function toy_givens(x, j, k)
    # compute c and s so that the rotation zeros out entry k of x against entry j
    r = sqrt(x[j]^2 + x[k]^2)
    c = x[j] / r
    s = x[k] / r
    return c, s
end
function apply_givens(v, c, s, j, k)
    # apply the Givens rotation from the left, mixing rows j and k
    w = copy(v)
    w[j,:] = c*v[j,:] + s*v[k,:]
    w[k,:] = -s*v[j,:] + c*v[k,:]
    return w
end
N = 10
A = diagm(-1 => -ones(N-1), 0 => 2*ones(N), 1 => -ones(N-1))
g = toy_givens(A[:,1], 1, 2) # compute Givens rotation to zero out first subdiagonal entry
B = apply_givens(A, g[1], g[2], 1, 2) # apply Givens rotation to mix first two rows of A
display(sparse(A)) # display matrix before Givens
display(sparse(B)) # display matrix after Givens
display( norm(A[:,2]) - norm(B[:,2]) ) # column norm is preserved
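Because the rotation is orthogonal, applying its transpose (just flip the sign of s) undoes it. A quick extra check with the g and B computed above:
display(norm(apply_givens(B, g[1], -g[2], 1, 2) - A)) # the transposed rotation restores the original matrix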
There are a number of subtle points to get right when doing Givens on the computer. Luckily for us, Julia has a great ready-to-use function for computing and working with Givens rotations. Let's test it out on the same example we used for our "toy" Givens rotation code.
N = 10
A = diagm(-1 => -ones(N-1), 0 => 2*ones(N), 1 => -ones(N-1))
G = givens(A, 1, 2, 1)
B = G[1]*A
display(sparse(A))
display(sparse(B))
display( norm(A[:,2]) - norm(B[:,2]) )
It looks like our toy code did okay on this simple example. Now, let's string a few more Givens rotations together to triangularize the tridiagonal matrix $A$ above. We'll also allow it to accumulate the same Givens rotations applied to another matrix B, which is useful if we want to express the orthogonal factor $Q$ as a matrix, or if we want to compute $Q^Tb$ for least-squares problems.
function triQR(A, B)
    # compute the QR decomposition of a tridiagonal matrix using Givens rotations,
    # accumulating the same rotations applied to B (so the second output is Q'*B)
    R = copy(A)
    QTB = copy(B)
    n = size(R,2)
    for j in 1:n-1
        G = givens(R, j, j + 1, j) # rotation of rows j and j+1 that zeros R[j+1,j]
        R = G[1]*R
        QTB = G[1]*QTB
    end
    return R, QTB
end
F = triQR(A,diagm(0 => ones(N)))
display(abs.(F[1]).>1e-14) # F[1] is the triangular factor - it is banded, with only two nonzero superdiagonals
display(norm(F[2]'*F[2]-I)) # F[2] is the transpose of the orthogonal factor
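The second output also lets us solve linear systems: pass a right-hand side $b$ through triQR instead of the identity, then back-substitute with the banded triangular factor. A small sketch we've added:
b = randn(N)
R, QTb = triQR(A, b) # accumulate the rotations applied to b, so QTb = Q'*b
x = UpperTriangular(R) \ QTb # back-substitution with the (numerically) upper triangular factor
display(norm(A*x - b)) # x solves the tridiagonal system A*x = b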
Banded matrices are very special. What happens for other types of sparse matrices? Let's replace the first row of the difference matrix with a row of ones and see how it affects the $QR$ factorization.
$$ A = \begin{pmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ -1 & 2 & -1 & 0 & \cdots & 0 \\ 0 & -1 & 2 & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & & 0 & -1 & 2 \end{pmatrix} $$
A[1,:] = ones(N) # replace the first row of A with ones
F = triQR(A,diagm(0 => ones(N)))
display(abs.(F[1]).>1e-14) # F[1] is the triangular factor - is it still banded?
display(norm(F[2]'*F[2]-I)) # F[2] is the transpose of the orthogonal factor
The upper triangular factor is now completely dense... what happened?
We can explore by running Householder QR on $A$ one column at a time, stopping to visualize how the upper triangular factor fills in.
F = hqr(A, 0) # try k = 0, 1, 2, ... to triangularize one more column each time
display(abs.(F[2]).>1e-14) # watch the (numerically) nonzero pattern of the triangular factor fill in as k increases
Just as we saw with Gaussian elimination and $A=LU$, the factors of $A$ can suffer from fill-in: many more nonzero entries than in the original matrix.
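To put a number on the fill-in, we can count the entries above the rounding threshold before and after a full Householder triangularization of the modified matrix (a quick check we've added; the 1e-14 cutoff matches the displays above):
F = hqr(A, N) # full Householder QR of the modified sparse matrix
display(count(abs.(A) .> 1e-14)) # number of (numerical) nonzeros in A
display(count(abs.(F[2]) .> 1e-14)) # number of (numerical) nonzeros in the triangular factor R - many more!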