Sascha Spors, Professorship Signal Theory and Digital Signal Processing, Institute of Communications Engineering (INT), Faculty of Computer Science and Electrical Engineering (IEF), University of Rostock, Germany
Winter Semester 2023/24 (Master Course #24512)
Feel free to contact the lecturer: frank.schultz@uni-rostock.de
Some convenient functions are found in scipy.linalg, some in numpy.linalg.
import numpy as np
from scipy.linalg import svd, diagsvd, inv, pinv, null_space, norm
from numpy.linalg import matrix_rank
np.set_printoptions(precision=2, floatmode="fixed", suppress=True)
rng = np.random.default_rng(1234)
mean, stdev = 0, 1
# we might convince ourselves that all works for complex data as well
# then the ^H operator (conj().T) needs to be used instead of just .T
use_complex = False
M = 7 # number of rows
N = 3 # number of cols
rank = min(M, N) # set desired rank == full column rank == independent columns
print("desired rank of A:", rank)
if use_complex:
    dtype = "complex128"
    A = np.zeros([M, N], dtype=dtype)
    for i in range(rank):
        col = rng.normal(mean, stdev, M) + 1j * rng.normal(mean, stdev, M)
        row = rng.normal(mean, stdev, N) + 1j * rng.normal(mean, stdev, N)
        A += np.outer(col, row)  # superposition of rank-1 matrices
else:
    dtype = "float64"
    A = np.zeros([M, N], dtype=dtype)
    for i in range(rank):
        col = rng.normal(mean, stdev, M)
        row = rng.normal(mean, stdev, N)
        A += np.outer(col, row)  # superposition of rank-1 matrices
# check if rng produced desired rank
print(" rank of A:", matrix_rank(A))
print("tall/thin matrix with full column rank")
print("-> matrix V contains only the row space")
print("-> null space is only the zero vector")
print("A =\n", A)
[U, s, Vh] = svd(A, full_matrices=True)
S = diagsvd(s, M, N) # for full SVD the matrix S has same dim as A
V = Vh.conj().T
Uh = U.conj().T
print("U =\n", U)
# number of non-zero singular values along the diagonal must match the rank
print("non-zero singular values: ", s[:rank])
print("S =\n", S) # contains 0 Matrix below diagonal part
print("V =\n", V)
The null space $N(\mathbf{A})$ of the tall/thin, full column rank matrix $\mathbf{A}$ contains only the zero vector, i.e. $N(\mathbf{A})=\{\mathbf{0}\}$. Except for $\mathbf{x}=\mathbf{0}$, all $\mathbf{x}$ are mapped into the column space $C(\mathbf{A})$. This, however, requires that the matrix $\mathbf{V}$ completely spans the row space and that no $\mathbf{v}$ vectors span a dedicated null space.
The tall/thin, full column rank matrix $\mathbf{A}$ spans a rather large left null space $N(\mathbf{A}^\mathrm{H})$ of dimension $M-\mathrm{rank}(\mathbf{A})$.
We therefore deal here with a linear set of equations that has more equations than unknowns ($M>N$, more rows than columns), i.e. the over-determined case. For this case, we can find a solution in the least-squares error sense with the help of the left inverse, as discussed below.
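We might cross-check these subspace dimensions numerically with the null_space() helper imported above:
# numerical cross-check of the subspace dimensions
print("dim N(A):  ", null_space(A).shape[1])  # 0, null space contains only the zero vector
print("dim N(A^H):", null_space(A.conj().T).shape[1])  # M - rank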
# all stuff that is in matrix U
print("U =\n", U)
# column space C(A)
print("\ncolumn space (orthogonal to left null space):")
print(U[:, :rank])
# left null space, if empty only 0 vector
print("left null space (orthogonal to column space):")
print(U[:, rank:])
print("###")
# all stuff that is in matrix V
print("\nV =\n", V)
# row space
print("\nrow space (orthogonal to null space):")
print(V[:, :rank])
# null space N(A), if empty only 0 vector
print("null space (orthogonal to row space):")
print(V[:, rank:]) # for full column rank this is only the zero vector
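We might also verify the claimed orthogonality numerically: the column space is orthogonal to the left null space, and $\mathbf{A}^\mathrm{H}$ maps the left null space basis vectors to zero:
# orthogonality checks for the U-space bases
print("C(A) orthogonal to N(A^H):", np.allclose(U[:, :rank].conj().T @ U[:, rank:], 0))
print("A^H maps N(A^H) basis to 0:", np.allclose(A.conj().T @ U[:, rank:], 0))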
[U, s, Vh] = svd(A, full_matrices=True)
V = Vh.conj().T
Uh = U.conj().T
Si = diagsvd(1 / s, N, M) # works if array s has only non-zero entries
print("Inverse singular value matrix with right zero block")
print("Si =\n", Si)
# left inverse using 'inverse' SVD:
Ali = V @ Si @ Uh
# left inverse using a dedicated pinv algorithm
# proper choice is done by pinv() itself
Ali_pinv = pinv(A)
print("pinv == left inverse via SVD?", np.allclose(Ali, Ali_pinv))
print("Si @ S\n", Si @ S, "\nyields NxN identity matrix")
print("Ali @ A = \n", Ali @ A, "\nyields NxN identity matrix")
u_col = U[:, 0, None]
u_left_null = U[:, rank, None]
u_tmp = u_col + u_left_null
# projection onto row space == I_NxN
# full rank and identity since we don't have a null space
# so each vector of the row space is projected onto itself
P_CAH = Ali @ A
print(
    "projection matrix P_CAH projects a row space vector onto itself:\n",
    "P_CAH @ V[:,0] == V[:,0]:",
    np.allclose(P_CAH @ V[:, 0], V[:, 0]),
)
# projection onto column space
P_CA = A @ Ali
print(
    "projection matrix P_CA projects U-space stuff to column space:\n",
    "P_CA @ (u_tmp) == u_col:",
    np.allclose(P_CA @ (u_tmp), u_col),
)
# projection onto null space == null matrix
P_NA = np.eye(N, N) - P_CAH
print("projection matrix P_NA is a null matrix\nP_NA=\n", P_NA)
# projection onto left null space
P_NAH = np.eye(M, M) - P_CA
print(
    "projection matrix P_NAH projects U-space stuff to left null space:\n",
    "P_NAH @ (u_tmp) == u_left_null:",
    np.allclose(P_NAH @ (u_tmp), u_left_null),
)
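The two U-space projection matrices obey the usual projector properties, which we might quickly verify:
# complementary, idempotent and Hermitian projection matrices
print("P_CA + P_NAH == I_MxM:", np.allclose(P_CA + P_NAH, np.eye(M)))
print("P_CA idempotent:", np.allclose(P_CA @ P_CA, P_CA))
print("P_CA Hermitian:", np.allclose(P_CA.conj().T, P_CA))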
# design a vector:
# one entry from column space + one entry from left null space
# so we take some special left singular vectors:
b = U[:, 0, None] + U[:, rank, None] # same as u_tmp above
print("b==u_tmp", np.allclose(b, u_tmp))
# the vector b is a linear combination and lives in column+left null spaces
# with above introduced projection matrices we can project b
# (i) onto column space
bhat = P_CA @ b # project b to column space C(A)
print("bhat==U[:, 0, None]:", np.allclose(bhat, U[:, 0, None]))
# and
# (ii) onto left null space
e = P_NAH @ b # project b to left null space N(A^H)
print("e==U[:, 0, None]:", np.allclose(e, U[:, rank, None]))
# to find x_hat, i.e. the LS solution of the inverse problem
# we bring b to row space via left inverse:
# only the column space part of b (i.e. bhat) is brought to the row space
# we can never map back a 'zero' (i.e. e)
print("x_hat = Ali @ b == Ali @ bhat:", np.allclose(Ali @ b, Ali @ bhat))
# we expect that this is the scaled first right singular vector
print("x_hat=\n", V[:, 0, None] / S[0, 0])
print(Ali @ bhat)
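We might also convince ourselves that this is optimal in the least-squares sense: any perturbation of the solution can only increase the residual norm, since the residual $\mathbf{e}$ cannot be reduced by adding further column space components.
# LS optimality check: a perturbed solution never yields a smaller residual
x_hat = Ali @ b
x_perturbed = x_hat + 0.1 * rng.normal(mean, stdev, (N, 1))
print(
    "||b - A @ x_hat|| <= ||b - A @ x_perturbed||:",
    norm(b - A @ x_hat) <= norm(b - A @ x_perturbed),
)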
Vector addition from the example above, with an error term $\mathbf{e}$:
$\hat{\mathbf{b}} + \mathbf{e} = \mathbf{b} \rightarrow \mathbf{e} = \mathbf{b} - \hat{\mathbf{b}}$
We know that the pure row space vector $\hat{\mathbf{x}}$ maps to the pure column space vector $\hat{\mathbf{b}}$
$\hat{\mathbf{b}} = \mathbf{A} \hat{\mathbf{x}}$
Inserting this
$\mathbf{e} = \mathbf{b} - \hat{\mathbf{b}} = \mathbf{b} - \mathbf{A} \hat{\mathbf{x}}$
The vector $\mathbf{A} \mathbf{x}$ always lives in the column space, no matter how $\mathbf{x}$ is constructed.
The vector $\mathbf{e}$ is orthogonal to the column space (since it lives in the left null space).
So, we know that the inner product must be zero
$(\mathbf{A} \mathbf{x})^\mathrm{H} \mathbf{e} = 0 \rightarrow \mathbf{x}^\mathrm{H} \mathbf{A}^\mathrm{H} (\mathbf{b} - \mathbf{A} \hat{\mathbf{x}}) = 0$
Rearranging yields
$\mathbf{x}^\mathrm{H} \mathbf{A}^\mathrm{H} \mathbf{b} = \mathbf{x}^\mathrm{H} \mathbf{A}^\mathrm{H} \mathbf{A} \hat{\mathbf{x}}$
and since this holds for arbitrary $\mathbf{x}$, we can cancel $\mathbf{x}^\mathrm{H}$ and obtain the famous normal equation
$\mathbf{A}^\mathrm{H} \mathbf{b} = \mathbf{A}^\mathrm{H} \mathbf{A} \hat{\mathbf{x}}$
This can be solved using the inverse of $\mathbf{A}^\mathrm{H} \mathbf{A}$ (since $\mathbf{A}$ has full column rank, this matrix is full rank and therefore invertible)
$(\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} \mathbf{A}^\mathrm{H} \mathbf{b} = (\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} (\mathbf{A}^\mathrm{H} \mathbf{A}) \hat{\mathbf{x}}$
Since $(\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} (\mathbf{A}^\mathrm{H} \mathbf{A}) = \mathbf{I}$ holds, we get the solution in least-squares error sense for $\mathbf{x}$ in the row space of $\mathbf{A}$
$(\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} \mathbf{A}^\mathrm{H} \mathbf{b} = \hat{\mathbf{x}}$
We find the left inverse of $\mathbf{A}$ as
$\mathbf{A}^{+L} = (\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} \mathbf{A}^\mathrm{H}$
such that left multiplying
$\mathbf{A}^{+L} \mathbf{A} = \mathbf{I}$
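For completeness, we might connect this closed form to the SVD-based left inverse from above: with $\mathbf{A} = \mathbf{U}\mathbf{S}\mathbf{V}^\mathrm{H}$ and full column rank
$\mathbf{A}^{+L} = (\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} \mathbf{A}^\mathrm{H} = (\mathbf{V}\mathbf{S}^\mathrm{H}\mathbf{S}\mathbf{V}^\mathrm{H})^{-1} \mathbf{V}\mathbf{S}^\mathrm{H}\mathbf{U}^\mathrm{H} = \mathbf{V}(\mathbf{S}^\mathrm{H}\mathbf{S})^{-1}\mathbf{S}^\mathrm{H}\mathbf{U}^\mathrm{H} = \mathbf{V}\mathbf{S}^{+}\mathbf{U}^\mathrm{H},$
which is exactly the V @ Si @ Uh construction used in the code above.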
Here, the origin of the name least-squares error becomes more obvious.
We set up an optimization problem that minimizes the squared length of the error vector
$\min_\mathbf{x} ||\mathbf{e}||^2_2 = \min_\mathbf{x} ||\mathbf{b} - \mathbf{A} {\mathbf{x}}||_2^2$
We could solve this with the help of calculus. But we have a nicer tool, i.e. the properties of subspaces, that does not require pages of calculation:
We must find the minimum distance from $\mathbf{b}$ to the column space of $\mathbf{A}$.
This can be only achieved if the error vector $\mathbf{e}=\mathbf{b} - \mathbf{A} {\mathbf{x}}$ is orthogonal to the column space of $\mathbf{A}$.
This in turn means that $\mathbf{e}$ must live in the left null space of $\mathbf{A}$, i.e. $\mathbf{e} \in N(\mathbf{A}^\mathrm{H})$.
By definition of the left null space we have $\mathbf{A}^\mathrm{H} \mathbf{e} = \mathbf{0}$.
So, let us insert $\mathbf{e}$ into $\mathbf{A}^\mathrm{H} \mathbf{e} = \mathbf{0}$:
$\mathbf{A}^\mathrm{H} (\mathbf{b} - \mathbf{A} {\mathbf{x}}) = \mathbf{0}$
$\mathbf{A}^\mathrm{H} \mathbf{b} = \mathbf{A}^\mathrm{H} \mathbf{A} {\mathbf{x}}$
The optimum $\mathbf{x}$ that solves the problem is the same as in derivation 1 above
$(\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} \mathbf{A}^\mathrm{H} \mathbf{b} = \hat{\mathbf{x}}$
And again, we find the left inverse of $\mathbf{A}$ as
$\mathbf{A}^{+L} = (\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} \mathbf{A}^\mathrm{H}$
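In numerical practice we would rather solve the normal equation directly than explicitly invert $\mathbf{A}^\mathrm{H} \mathbf{A}$; a small sketch using np.linalg.solve:
# solve the normal equation A^H A x = A^H b without forming the inverse explicitly
xhat_solve = np.linalg.solve(A.conj().T @ A, A.conj().T @ b)
print("xhat_solve == Ali @ b:", np.allclose(xhat_solve, Ali @ b))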
Ali_normaleq = inv(A.conj().T @ A) @ A.conj().T
xhat = Ali_normaleq @ b # LS solution in row space
print("xhat = ", xhat)
bhat = A @ xhat # map to column space
# thus this is the projection matrix that maps b (column + left nullspace stuff)
# to the column space of A in the LS sense
P_CA2 = A @ Ali_normaleq
print("P_CA == P_CA2 ?", np.allclose(P_CA, P_CA2)) # check with above result
print(
    "P_CA2 @ b == bhat:", np.allclose(P_CA2 @ b, bhat)
)  # check that both outputs are equal
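As a final cross-check, a general-purpose least-squares solver yields the same solution, and the residual indeed fulfills the normal equation condition $\mathbf{A}^\mathrm{H} \mathbf{e} = \mathbf{0}$:
from scipy.linalg import lstsq  # not imported above, added for this check
print("lstsq solution == xhat:", np.allclose(lstsq(A, b)[0], xhat))
print("A^H @ (b - A @ xhat) == 0:", np.allclose(A.conj().T @ (b - A @ xhat), 0))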