Sascha Spors, Professorship Signal Theory and Digital Signal Processing, Institute of Communications Engineering (INT), Faculty of Computer Science and Electrical Engineering (IEF), University of Rostock, Germany
Winter Semester 2023/24 (Master Course #24512)
Feel free to contact the lecturer: frank.schultz@uni-rostock.de
Some convenient functions are found in scipy.linalg, some in numpy.linalg.
import numpy as np
from scipy.linalg import svd, diagsvd, inv, pinv, null_space, norm
from numpy.linalg import matrix_rank
np.set_printoptions(precision=2, floatmode="fixed", suppress=True)
rng = np.random.default_rng(1234)
mean, stdev = 0, 1
# we might convince ourselves that all works for complex data as well
# then the ^H operator (conj().T) needs to be used instead of just .T
use_complex = False
M = 7 # number of rows
N = 3 # number of cols
rank = min(M, N) # set desired rank == full column rank == independent columns
print("desired rank of A:", rank)
if use_complex:
    dtype = "complex128"
    A = np.zeros([M, N], dtype=dtype)
    for i in range(rank):
        col = rng.normal(mean, stdev, M) + 1j * rng.normal(mean, stdev, M)
        row = rng.normal(mean, stdev, N) + 1j * rng.normal(mean, stdev, N)
        A += np.outer(col, row)  # superposition of rank-1 matrices
else:
    dtype = "float64"
    A = np.zeros([M, N], dtype=dtype)
    for i in range(rank):
        col = rng.normal(mean, stdev, M)
        row = rng.normal(mean, stdev, N)
        A += np.outer(col, row)  # superposition of rank-1 matrices
# check if rng produced desired rank
print(" rank of A:", matrix_rank(A))
print("tall/thin matrix with full column rank")
print("-> matrix V contains only the row space")
print("-> null space is only the zero vector")
print("A =\n", A)
[U, s, Vh] = svd(A, full_matrices=True)
S = diagsvd(s, M, N) # for full SVD the matrix S has same dim as A
V = Vh.conj().T
Uh = U.conj().T
print("U =\n", U)
# number of non-zero singular values along the diagonal must match the rank
print("non-zero singular values: ", s[:rank])
print("S =\n", S) # contains 0 Matrix below diagonal part
print("V =\n", V)
The null space $N(\mathbf{A})$ of the tall/thin, full column rank matrix $\mathbf{A}$ contains only the zero vector, i.e. $N(\mathbf{A})=\{\mathbf{0}\}$. Except for $\mathbf{x}=\mathbf{0}$, all $\mathbf{x}$ are mapped into the column space $C(\mathbf{A})$. This, however, requires that the matrix $\mathbf{V}$ completely spans the row space and that no $\mathbf{v}$ vectors span a dedicated null space.
The tall/thin, full column rank matrix $\mathbf{A}$ spans a rather large left null space $N(\mathbf{A}^\mathrm{H})$ of dimension $M-\mathrm{rank}(\mathbf{A})$.
We therefore deal here with a linear set of equations that has more equations than unknowns ($M>N$, more rows than columns), i.e. the over-determined case. For this case, we can find a solution in the least-squares error sense with the help of the left inverse, as discussed below.
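We might cross-check these subspace dimensions numerically with the null_space() helper imported above:
# numerical cross-check of the subspace dimensions
print("dim N(A):  ", null_space(A).shape[1])  # 0, null space contains only the zero vector
print("dim N(A^H):", null_space(A.conj().T).shape[1])  # M - rank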
# all stuff that is in matrix U
print("U =\n", U)
# column space C(A)
print("\ncolumn space (orthogonal to left null space):")
print(U[:, :rank])
# left null space, if empty only 0 vector
print("left null space (orthogonal to column space):")
print(U[:, rank:])
print("###")
# all stuff that is in matrix V
print("\nV =\n", V)
# row space
print("\nrow space (orthogonal to null space):")
print(V[:, :rank])
# null space N(A), if empty only 0 vector
print("null space (orthogonal to row space):")
print(V[:, rank:]) # for full column rank this is only the zero vector
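We might also verify the claimed orthogonality numerically: the column space is orthogonal to the left null space, and $\mathbf{A}^\mathrm{H}$ maps the left null space basis vectors to zero:
# orthogonality checks for the U-space bases
print("C(A) orthogonal to N(A^H):", np.allclose(U[:, :rank].conj().T @ U[:, rank:], 0))
print("A^H maps N(A^H) basis to 0:", np.allclose(A.conj().T @ U[:, rank:], 0))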
[U, s, Vh] = svd(A, full_matrices=True)
V = Vh.conj().T
Uh = U.conj().T
Si = diagsvd(1 / s, N, M) # works if array s has only non-zero entries
print("Inverse singular value matrix with right zero block")
print("Si =\n", Si)
# left inverse using 'inverse' SVD:
Ali = V @ Si @ Uh
# left inverse using a dedicated pinv algorithm
# proper choice is done by pinv() itself
Ali_pinv = pinv(A)
print("pinv == left inverse via SVD?", np.allclose(Ali, Ali_pinv))
print("Si @ S\n", Si @ S, "\nyields NxN identity matrix")
print("Ali @ A = \n", Ali @ A, "\nyields NxN identity matrix")
u_col = U[:, 0, None]
u_left_null = U[:, rank, None]
u_tmp = u_col + u_left_null
# projection onto row space == I_NxN
# full rank and identity since we don't have a null space
# so each vector of the row space is projected onto itself
P_CAH = Ali @ A
print(
    "projection matrix P_CAH projects a row space vector onto itself:\n",
    "P_CAH @ V[:,0] == V[:,0]:",
    np.allclose(P_CAH @ V[:, 0], V[:, 0]),
)
# projection onto column space
P_CA = A @ Ali
print(
    "projection matrix P_CA projects U-space stuff to column space:\n",
    "P_CA @ (u_tmp) == u_col:",
    np.allclose(P_CA @ (u_tmp), u_col),
)
# projection onto null space == null matrix
P_NA = np.eye(N, N) - P_CAH
print("projection matrix P_NA is a null matrix\nP_NA=\n", P_NA)
# projection onto left null space
P_NAH = np.eye(M, M) - P_CA
print(
    "projection matrix P_NAH projects U-space stuff to left null space:\n",
    "P_NAH @ (u_tmp) == u_left_null:",
    np.allclose(P_NAH @ (u_tmp), u_left_null),
)
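The two U-space projection matrices obey the usual projector properties, which we might quickly verify:
# complementary, idempotent and Hermitian projection matrices
print("P_CA + P_NAH == I_MxM:", np.allclose(P_CA + P_NAH, np.eye(M)))
print("P_CA idempotent:", np.allclose(P_CA @ P_CA, P_CA))
print("P_CA Hermitian:", np.allclose(P_CA.conj().T, P_CA))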
# design a vector:
# one entry from column space + one entry from left null space
# so we take some special left singular vectors:
b = U[:, 0, None] + U[:, rank, None] # same as u_tmp above
print("b==u_tmp", np.allclose(b, u_tmp))
# the vector b is a linear combination and lives in column+left null spaces
# with above introduced projection matrices we can project b
# (i) onto column space
bhat = P_CA @ b # project b to column space C(A)
print("bhat==U[:, 0, None]:", np.allclose(bhat, U[:, 0, None]))
# and
# (ii) onto left null space
e = P_NAH @ b # project b to left null space N(A^H)
print("e==U[:, 0, None]:", np.allclose(e, U[:, rank, None]))
# to find x_hat, i.e. the LS solution of the inverse problem
# we bring b to row space via left inverse:
# only the column space part of b (i.e. bhat) is brought to the row space
# we can never map back a 'zero' (i.e. e)
print("x_hat = Ali @ b == Ali @ bhat:", np.allclose(Ali @ b, Ali @ bhat))
# we expect that this is the scaled first right singular vector
print("x_hat=\n", V[:, 0, None] / S[0, 0])
print(Ali @ bhat)
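We might also convince ourselves that this is optimal in the least-squares sense: any perturbation of the solution can only increase the residual norm, since the residual $\mathbf{e}$ cannot be reduced by adding further column space components.
# LS optimality check: a perturbed solution never yields a smaller residual
x_hat = Ali @ b
x_perturbed = x_hat + 0.1 * rng.normal(mean, stdev, (N, 1))
print(
    "||b - A @ x_hat|| <= ||b - A @ x_perturbed||:",
    norm(b - A @ x_hat) <= norm(b - A @ x_perturbed),
)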
Vector addition from the example above, with an error term $\mathbf{e}$:
$\hat{\mathbf{b}} + \mathbf{e} = \mathbf{b} \rightarrow \mathbf{e} = \mathbf{b} - \hat{\mathbf{b}}$
We know that the pure row space vector $\hat{\mathbf{x}}$ maps to the pure column space vector $\hat{\mathbf{b}}$
$\hat{\mathbf{b}} = \mathbf{A} \hat{\mathbf{x}}$
Inserting this
$\mathbf{e} = \mathbf{b} - \hat{\mathbf{b}} = \mathbf{b} - \mathbf{A} \hat{\mathbf{x}}$
The vector $\mathbf{A} \mathbf{x}$ always lives in the column space, no matter how $\mathbf{x}$ is constructed.
The vector $\mathbf{e}$ is orthogonal to the column space (since it lives in the left null space).
So, we know that the inner product must be zero
$(\mathbf{A} \mathbf{x})^\mathrm{H} \mathbf{e} = 0 \rightarrow \mathbf{x}^\mathrm{H} \mathbf{A}^\mathrm{H} (\mathbf{b} - \mathbf{A} \hat{\mathbf{x}}) = 0$
Rearranging yields
$\mathbf{x}^\mathrm{H} \mathbf{A}^\mathrm{H} \mathbf{b} = \mathbf{x}^\mathrm{H} \mathbf{A}^\mathrm{H} \mathbf{A} \hat{\mathbf{x}}$
and since this holds for arbitrary $\mathbf{x}$, we can cancel $\mathbf{x}^\mathrm{H}$ and obtain the famous normal equation
$\mathbf{A}^\mathrm{H} \mathbf{b} = \mathbf{A}^\mathrm{H} \mathbf{A} \hat{\mathbf{x}}$
This can be solved using the inverse of $\mathbf{A}^\mathrm{H} \mathbf{A}$ (since $\mathbf{A}$ has full column rank, this matrix is full rank and therefore invertible)
$(\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} \mathbf{A}^\mathrm{H} \mathbf{b} = (\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} (\mathbf{A}^\mathrm{H} \mathbf{A}) \hat{\mathbf{x}}$
Since $(\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} (\mathbf{A}^\mathrm{H} \mathbf{A}) = \mathbf{I}$ holds, we get the solution in least-squares error sense for $\mathbf{x}$ in the row space of $\mathbf{A}$
$(\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} \mathbf{A}^\mathrm{H} \mathbf{b} = \hat{\mathbf{x}}$
We find the left inverse of $\mathbf{A}$ as
$\mathbf{A}^{+L} = (\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} \mathbf{A}^\mathrm{H}$
such that left multiplying
$\mathbf{A}^{+L} \mathbf{A} = \mathbf{I}$
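For completeness, we might connect this closed form to the SVD-based left inverse from above: with $\mathbf{A} = \mathbf{U}\mathbf{S}\mathbf{V}^\mathrm{H}$ and full column rank
$\mathbf{A}^{+L} = (\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} \mathbf{A}^\mathrm{H} = (\mathbf{V}\mathbf{S}^\mathrm{H}\mathbf{S}\mathbf{V}^\mathrm{H})^{-1} \mathbf{V}\mathbf{S}^\mathrm{H}\mathbf{U}^\mathrm{H} = \mathbf{V}(\mathbf{S}^\mathrm{H}\mathbf{S})^{-1}\mathbf{S}^\mathrm{H}\mathbf{U}^\mathrm{H} = \mathbf{V}\mathbf{S}^{+}\mathbf{U}^\mathrm{H},$
which is exactly the V @ Si @ Uh construction used in the code above.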
Here, the origin of the name least-squares error becomes more obvious.
We set up an optimization problem that minimizes the squared length of the error vector
$\min_\mathbf{x} ||\mathbf{e}||^2_2 = \min_\mathbf{x} ||\mathbf{b} - \mathbf{A} {\mathbf{x}}||_2^2$
We could solve this with the help of calculus. But we have a nicer tool, i.e. the properties of subspaces, that does not require pages of calculation:
We must find the minimum distance from $\mathbf{b}$ to the column space of $\mathbf{A}$.
This can be only achieved if the error vector $\mathbf{e}=\mathbf{b} - \mathbf{A} {\mathbf{x}}$ is orthogonal to the column space of $\mathbf{A}$.
This in turn means that $\mathbf{e}$ must live in the left null space of $\mathbf{A}$, i.e. $\mathbf{e} \in N(\mathbf{A}^\mathrm{H})$.
By definition of the left null space we have $\mathbf{A}^\mathrm{H} \mathbf{e} = \mathbf{0}$.
So, let us insert $\mathbf{e}$ into $\mathbf{A}^\mathrm{H} \mathbf{e} = \mathbf{0}$:
$\mathbf{A}^\mathrm{H} (\mathbf{b} - \mathbf{A} {\mathbf{x}}) = \mathbf{0}$
$\mathbf{A}^\mathrm{H} \mathbf{b} = \mathbf{A}^\mathrm{H} \mathbf{A} {\mathbf{x}}$
The optimum $\mathbf{x}$ that solves the problem is the same as in derivation 1 above
$(\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} \mathbf{A}^\mathrm{H} \mathbf{b} = \hat{\mathbf{x}}$
And again, we find the left inverse of $\mathbf{A}$ as
$\mathbf{A}^{+L} = (\mathbf{A}^\mathrm{H} \mathbf{A})^{-1} \mathbf{A}^\mathrm{H}$
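In numerical practice we would rather solve the normal equation directly than explicitly invert $\mathbf{A}^\mathrm{H} \mathbf{A}$; a small sketch using np.linalg.solve:
# solve the normal equation A^H A x = A^H b without forming the inverse explicitly
xhat_solve = np.linalg.solve(A.conj().T @ A, A.conj().T @ b)
print("xhat_solve == Ali @ b:", np.allclose(xhat_solve, Ali @ b))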
Ali_normaleq = inv(A.conj().T @ A) @ A.conj().T
xhat = Ali_normaleq @ b # LS solution in row space
print("xhat = ", xhat)
bhat = A @ xhat # map to column space
# thus this is the projection matrix that maps b (column + left nullspace stuff)
# to the column space of A in the LS sense
P_CA2 = A @ Ali_normaleq
print("P_CA == P_CA2 ?", np.allclose(P_CA, P_CA2)) # check with above result
print(
    "P_CA2 @ b == bhat:", np.allclose(P_CA2 @ b, bhat)
)  # check that both outputs are equal
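As a final cross-check, a general-purpose least-squares solver yields the same solution, and the residual indeed fulfills the normal equation condition $\mathbf{A}^\mathrm{H} \mathbf{e} = \mathbf{0}$:
from scipy.linalg import lstsq  # not imported above, added for this check
print("lstsq solution == xhat:", np.allclose(lstsq(A, b)[0], xhat))
print("A^H @ (b - A @ xhat) == 0:", np.allclose(A.conj().T @ (b - A @ xhat), 0))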