import numpy as np
import scipy as sp
import scipy.linalg # make sure sp.linalg is available (needed for sp.linalg.cholesky below)
import matplotlib.pyplot as plt
import sympy as sy
np.set_printoptions(precision=3)
np.set_printoptions(suppress=True)
def round_expr(expr, num_digits):
    # round every SymPy number appearing in expr to num_digits decimal places
    return expr.xreplace({n: round(n, num_digits) for n in expr.atoms(sy.Number)})
The first theorem of symmetric matrices:
If $A$ is symmetric, i.e. $A = A^T$, then any two eigenvectors from different eigenspaces are orthogonal.
Suppose $A\mathbf{v}_1 = \lambda_1\mathbf{v}_1$ and $A\mathbf{v}_2 = \lambda_2\mathbf{v}_2$ with $\lambda_1 \neq \lambda_2$. Using the symmetry of $A$,
$$ \lambda_{1}(\mathbf{v}_{1} \cdot \mathbf{v}_{2}) = (A\mathbf{v}_{1})^T\mathbf{v}_{2} = \mathbf{v}_{1}^TA^T\mathbf{v}_{2} = \mathbf{v}_{1}^T(A\mathbf{v}_{2}) = \lambda_{2}(\mathbf{v}_{1} \cdot \mathbf{v}_{2}) $$
Because $\lambda_1 \neq \lambda_2$, the only condition which makes the above equation hold is
$$ \mathbf{v}_{1} \cdot \mathbf{v}_{2}=0 $$
With the help of this theorem, we can conclude that if a symmetric matrix $A$ has distinct eigenvalues, its corresponding eigenvectors must be mutually orthogonal.
The diagonalization of $A$ is
$$ A = PDP^T = PDP^{-1} $$
where $P$ is an orthogonal matrix whose columns are the orthonormal eigenvectors of $A$, so that $P^{-1} = P^T$.
The second theorem of symmetric matrices:
An $n \times n$ matrix $A$ is orthogonally diagonalizable if and only if $A$ is a symmetric matrix. One direction is immediate: if $A = PDP^T$, then $A^{T}=\left(P D P^{T}\right)^{T}=P^{TT} D^{T} P^{T}=P D P^{T}=A$.
Create a random matrix; since the entries are random, they will differ from run to run.
A = np.round(2*np.random.rand(3, 3)); A
array([[1., 0., 0.],
       [1., 2., 0.],
       [2., 1., 2.]])
Make it symmetric.
B = A@A.T; B # generate a symmetric matrix
array([[1., 1., 2.],
       [1., 5., 4.],
       [2., 4., 9.]])
Perform diagonalization with np.linalg.eig().
D, P = np.linalg.eig(B); P
array([[ 0.2  ,  0.975,  0.095],
       [ 0.511, -0.021, -0.859],
       [ 0.836, -0.22 ,  0.503]])
D = np.diag(D); D
array([[11.926,  0.   ,  0.   ],
       [ 0.   ,  0.527,  0.   ],
       [ 0.   ,  0.   ,  2.547]])
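As a quick sanity check, we can reconstruct $B$ from its diagonalization; a minimal sketch using the P and D just computed:
P@D@P.T # should reproduce B up to floating-point error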
Check the norm of all eigenvectors.
for i in range(3):
    print(np.linalg.norm(P[:, i]))
1.0
1.0
1.0
Check the orthogonality of the eigenvectors by testing whether $PP^T=I$.
P@P.T
array([[ 1.,  0., -0.],
       [ 0.,  1., -0.],
       [-0., -0.,  1.]])
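Since $B$ is symmetric, it is worth noting that NumPy also provides np.linalg.eigh(), which is designed for symmetric (Hermitian) matrices and returns real eigenvalues in ascending order together with orthonormal eigenvectors. A minimal sketch (D2 and P2 are hypothetical names):
D2, P2 = np.linalg.eigh(B)
np.allclose(P2 @ np.diag(D2) @ P2.T, B) # True: the orthogonal diagonalization is recovered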
An $n \times n$ symmetric matrix $A$ has the following properties:
1. $A$ has $n$ real eigenvalues, counting multiplicities.
2. The dimension of the eigenspace for each eigenvalue $\lambda$ equals the multiplicity of $\lambda$ as a root of the characteristic equation.
3. The eigenspaces are mutually orthogonal.
4. $A$ is orthogonally diagonalizable.
As the example above shows, these properties are easy to observe numerically. However, the purpose of the theorem is not to reiterate the last section: it paves the way for the spectral decomposition.
Writing the diagonalization out explicitly, we get the representation known as the spectral decomposition
$$ \begin{aligned} A &=P D P^{T}=\left[\begin{array}{lll} \mathbf{u}_{1} & \cdots & \mathbf{u}_{n} \end{array}\right]\left[\begin{array}{ccc} \lambda_{1} & & 0 \\ & \ddots & \\ 0 & & \lambda_{n} \end{array}\right]\left[\begin{array}{c} \mathbf{u}_{1}^{T} \\ \vdots \\ \mathbf{u}_{n}^{T} \end{array}\right] \\ &=\left[\begin{array}{lll} \lambda_{1} \mathbf{u}_{1} & \cdots & \lambda_{n} \mathbf{u}_{n} \end{array}\right]\left[\begin{array}{c} \mathbf{u}_{1}^{T} \\ \vdots \\ \mathbf{u}_{n}^{T} \end{array}\right]\\ &= \lambda_{1} \mathbf{u}_{1} \mathbf{u}_{1}^{T}+\lambda_{2} \mathbf{u}_{2} \mathbf{u}_{2}^{T}+\cdots+\lambda_{n} \mathbf{u}_{n} \mathbf{u}_{n}^{T} \end{aligned} $$
Each $\mathbf{u}_{i} \mathbf{u}_{i}^{T}$ is a rank-$1$ symmetric matrix, because every row of $\mathbf{u}_{i} \mathbf{u}_{i}^{T}$ is a multiple of $\mathbf{u}_{i}^{T}$.
Following the example above, we demonstrate in NumPy.
lamb0,lamb1,lamb2 = D[0,0], D[1,1], D[2,2]
u0,u1,u2 = P[:,0], P[:,1], P[:,2]
Check the rank of $\mathbf{u}_{i} \mathbf{u}_{i}^{T}$ with np.linalg.matrix_rank().
np.linalg.matrix_rank(np.outer(u0,u0))
1
Use the spectral decomposition to recover $B$:
specDecomp = lamb0 * np.outer(u0,u0) + lamb1 * np.outer(u1,u1) + lamb2 * np.outer(u2,u2)
specDecomp
array([[1., 1., 2.],
       [1., 5., 4.],
       [2., 4., 9.]])
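We can confirm numerically that the spectral decomposition reproduces $B$:
np.allclose(specDecomp, B) # True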
A quadratic form is a function of the form $Q(\mathbf{x})=\mathbf{x}^TA\mathbf{x}$, where $A$ is an $n\times n$ symmetric matrix, which is called the matrix of the quadratic form.
Consider a matrix of a quadratic form
$$ A = \left[ \begin{matrix} 3 & 2 & 0\\ 2 & -1 & 4\\ 0 & 4 & -2 \end{matrix} \right] $$
and construct the quadratic form $\mathbf{x}^TA\mathbf{x}$.
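The direct way is symbolic matrix multiplication; a minimal sketch in SymPy (A3 is a hypothetical name, used to avoid overwriting the NumPy A above):
x1, x2, x3 = sy.symbols('x_1 x_2 x_3')
A3 = sy.Matrix([[3, 2, 0], [2, -1, 4], [0, 4, -2]])
x = sy.Matrix([x1, x2, x3])
sy.expand((x.T*A3*x)[0]) # 3*x_1**2 + 4*x_1*x_2 - x_2**2 + 8*x_2*x_3 - 2*x_3**2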
Fortunately, there is an easier way to read off a quadratic form. Notice that the coefficient of $x_i^2$ sits on the principal diagonal, and the coefficient of $x_ix_j$ ($i \neq j$) is split evenly between the $(i,j)$- and $(j,i)$-entries of $A$.
Consider another example,
$$ A = \left[ \begin{matrix} 3 & 2 & 0 & 5\\ 2 & -1 & 4 & -3\\ 0 & 4 & -2 & -4\\ 5 & -3 & -4 & 7 \end{matrix} \right] $$
All the $x_i^2$ terms are
$$ 3x_1^2-x_2^2-2x_3^2+7x_4^2 $$
whose coefficients come from the principal diagonal. All the $x_ix_j$ terms are
$$ 4x_1x_2+0x_1x_3+10x_1x_4+8x_2x_3-6x_2x_4-8x_3x_4 $$
Adding them together, the quadratic form is
$$ 3x_1^2-x_2^2-2x_3^2+7x_4^2+4x_1x_2+0x_1x_3+10x_1x_4+8x_2x_3-6x_2x_4-8x_3x_4 $$
Let's verify in SymPy.
x1, x2, x3, x4 = sy.symbols('x_1 x_2 x_3 x_4')
A = sy.Matrix([[3,2,0,5],[2,-1,4,-3],[0,4,-2,-4],[5,-3,-4,7]])
x = sy.Matrix([x1, x2, x3, x4])
sy.expand(x.T*A*x)
The result is exactly the same as the one we derived.
Converting the matrix of a quadratic form into a diagonal matrix can save us some trouble: the resulting form has no cross-product terms.
Since $A$ is symmetric, there is an orthogonal matrix $P$ such that
$$ PDP^T = A \qquad \text{and}\qquad PP^T = I $$
We can show that
$$ \mathbf{x}^TA\mathbf{x}=\mathbf{x}^TIAI\mathbf{x}=\mathbf{x}^TPP^TAPP^T\mathbf{x}=\mathbf{x}^TPDP^T\mathbf{x}=(P^T\mathbf{x})^TD(P^T\mathbf{x})=\mathbf{y}^T D \mathbf{y}$$
where $P^T$ defines a change of coordinates and $\mathbf{y} = P^T\mathbf{x}$.
Consider $A$
$$ A = \left[ \begin{matrix} 3 & 2 & 0\\ 2 & -1 & 4\\ 0 & 4 & -2 \end{matrix} \right] $$
Find its eigenvalues and eigenvectors.
A = np.array([[3,2,0],[2,-1,4],[0,4,-2]]); A
array([[ 3,  2,  0],
       [ 2, -1,  4],
       [ 0,  4, -2]])
D, P = np.linalg.eig(A)
D = np.diag(D); D
array([[ 4.388,  0.   ,  0.   ],
       [ 0.   ,  1.35 ,  0.   ],
       [ 0.   ,  0.   , -5.738]])
Test if $P$ is orthogonal, i.e. whether $P^TP=I$.
P.T@P
array([[ 1., -0., -0.],
       [-0.,  1., -0.],
       [-0., -0.,  1.]])
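Before switching to symbols, a quick numeric spot check of $\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y}$, using the A, P and D above and a random test vector (xv and yv are hypothetical names):
xv = np.random.rand(3) # a random test vector
yv = P.T @ xv # change of variable y = P^T x
print(xv @ A @ xv, yv @ D @ yv) # the two values agree up to floating-point error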
We can compute $\mathbf{y}= P^T\mathbf{x}$ symbolically.
x1, x2, x3 = sy.symbols('x1 x2 x3')
x = sy.Matrix([[x1], [x2], [x3]])
x
P = round_expr(sy.Matrix(P), 4); P
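The change of variable is then a single symbolic product, continuing with the SymPy P and x defined above:
round_expr(P.T*x, 4)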
So the $\mathbf{y} = P^T \mathbf{x}$ is
$$ \left[\begin{matrix}0.7738 x_{1} + 0.5369 x_{2} + 0.3362 x_{3}\\- 0.6143 x_{1} + 0.5067 x_{2} + 0.6049 x_{3}\\0.1544 x_{1} - 0.6746 x_{2} + 0.7219 x_{3}\end{matrix}\right] $$
The transformed quadratic form $\mathbf{y}^T D \mathbf{y}$ is
D = round_expr(sy.Matrix(D),4);D
y1, y2, y3 = sy.symbols('y1 y2 y3')
y = sy.Matrix([[y1], [y2], [y3]]);y
y.T*D*y
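The result is approximately $4.388 y_1^2 + 1.35 y_2^2 - 5.738 y_3^2$: a quadratic form with no cross-product terms, as promised.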
The plotting code below is exceedingly lengthy, but intuitive: each panel visualizes one type of quadratic surface.
k = 6
x = np.linspace(-k, k)
y = np.linspace(-k, k)
X, Y = np.meshgrid(x, y)
fig = plt.figure(figsize = (7, 7))
########################### xAx 1: positive definite ###########################
Z = 3*X**2 + 7*Y**2
ax = fig.add_subplot(221, projection='3d')
ax.plot_wireframe(X, Y, Z, linewidth = 1.5, alpha = .3, color = 'r')
ax.set_title('$z = 3x^2+7y^2$')
xarrow = np.array([[-5, 0, 0, 10, 0, 0]])
X1, Y1, Z1, U1, V1, W1 = zip(*xarrow)
ax.quiver(X1, Y1, Z1, U1, V1, W1, length=1, normalize=False, color = 'black',
alpha = .6, arrow_length_ratio = .12, pivot = 'tail',
linestyles = 'solid',linewidths = 2)
yarrow = np.array([[0, -5, 0, 0, 10, 0]])
X2, Y2, Z2, U2, V2, W2 = zip(*yarrow)
ax.quiver(X2, Y2, Z2, U2, V2, W2, length=1, normalize=False, color = 'black',
alpha = .6, arrow_length_ratio = .12, pivot = 'tail',
linestyles = 'solid',linewidths = 2)
zarrow = np.array([[0, 0, -3, 0, 0, 300]])
X3, Y3, Z3, U3, V3, W3 = zip(*zarrow)
ax.quiver(X3, Y3, Z3, U3, V3, W3, length=1, normalize=False, color = 'black',
alpha = .6, arrow_length_ratio = .001, pivot = 'tail',
linestyles = 'solid',linewidths = 2)
########################### xAx 2: positive semidefinite ###########################
Z = 3*X**2
ax = fig.add_subplot(222, projection='3d')
ax.plot_wireframe(X, Y, Z, linewidth = 1.5, alpha = .3, color = 'r')
ax.set_title('$z = 3x^2$')
xarrow = np.array([[-5, 0, 0, 10, 0, 0]])
X1, Y1, Z1, U1, V1, W1 = zip(*xarrow)
ax.quiver(X1, Y1, Z1, U1, V1, W1, length=1, normalize=False, color = 'black',
alpha = .6, arrow_length_ratio = .12, pivot = 'tail',
linestyles = 'solid',linewidths = 2)
yarrow = np.array([[0, -5, 0, 0, 10, 0]])
X2, Y2, Z2, U2, V2, W2 = zip(*yarrow)
ax.quiver(X2, Y2, Z2, U2, V2, W2, length=1, normalize=False, color = 'black',
alpha = .6, arrow_length_ratio = .12, pivot = 'tail',
linestyles = 'solid',linewidths = 2)
zarrow = np.array([[0, 0, -3, 0, 0, 800]])
X3, Y3, Z3, U3, V3, W3 = zip(*zarrow)
ax.quiver(X3, Y3, Z3, U3, V3, W3, length=1, normalize=False, color = 'black',
alpha = .6, arrow_length_ratio = .001, pivot = 'tail',
linestyles = 'solid',linewidths = 2)
########################### xAx 3: indefinite ###########################
Z = 3*X**2 - 7*Y**2
ax = fig.add_subplot(223, projection='3d')
ax.plot_wireframe(X, Y, Z, linewidth = 1.5, alpha = .3, color = 'r')
ax.set_title('$z = 3x^2-7y^2$')
xarrow = np.array([[-5, 0, 0, 10, 0, 0]])
X1, Y1, Z1, U1, V1, W1 = zip(*xarrow)
ax.quiver(X1, Y1, Z1, U1, V1, W1, length=1, normalize=False, color = 'black',
alpha = .6, arrow_length_ratio = .12, pivot = 'tail',
linestyles = 'solid',linewidths = 2)
yarrow = np.array([[0, -5, 0, 0, 10, 0]])
X2, Y2, Z2, U2, V2, W2 = zip(*yarrow)
ax.quiver(X2, Y2, Z2, U2, V2, W2, length=1, normalize=False, color = 'black',
alpha = .6, arrow_length_ratio = .12, pivot = 'tail',
linestyles = 'solid',linewidths = 2)
zarrow = np.array([[0, 0, -150, 0, 0, 300]])
X3, Y3, Z3, U3, V3, W3 = zip(*zarrow)
ax.quiver(X3, Y3, Z3, U3, V3, W3, length=1, normalize=False, color = 'black',
alpha = .6, arrow_length_ratio = .001, pivot = 'tail',
linestyles = 'solid',linewidths = 2)
########################### xAx 4: negative definite ###########################
Z = -3*X**2 - 7*Y**2
ax = fig.add_subplot(224, projection='3d')
ax.plot_wireframe(X, Y, Z, linewidth = 1.5, alpha = .3, color = 'r')
ax.set_title('$z = -3x^2-7y^2$')
xarrow = np.array([[-5, 0, 0, 10, 0, 0]])
X1, Y1, Z1, U1, V1, W1 = zip(*xarrow)
ax.quiver(X1, Y1, Z1, U1, V1, W1, length=1, normalize=False, color = 'black',
alpha = .6, arrow_length_ratio = .12, pivot = 'tail',
linestyles = 'solid',linewidths = 2)
yarrow = np.array([[0, -5, 0, 0, 10, 0]])
X2, Y2, Z2, U2, V2, W2 = zip(*yarrow)
ax.quiver(X2, Y2, Z2, U2, V2, W2, length=1, normalize=False, color = 'black',
alpha = .6, arrow_length_ratio = .12, pivot = 'tail',
linestyles = 'solid',linewidths = 2)
zarrow = np.array([[0, 0, -300, 0, 0, 330]])
X3, Y3, Z3, U3, V3, W3 = zip(*zarrow)
ax.quiver(X3, Y3, Z3, U3, V3, W3, length=1, normalize=False, color = 'black',
alpha = .6, arrow_length_ratio = .001, pivot = 'tail',
linestyles = 'solid',linewidths = 2)
plt.show()
Some terms need to be defined. A quadratic form $Q$ is:
1. positive definite if $Q(\mathbf{x})>0$ for all $\mathbf{x} \neq \mathbf{0}$;
2. negative definite if $Q(\mathbf{x})<0$ for all $\mathbf{x} \neq \mathbf{0}$;
3. indefinite if $Q(\mathbf{x})$ assumes both positive and negative values;
4. positive (negative) semidefinite if $Q(\mathbf{x}) \geq 0$ (resp. $Q(\mathbf{x}) \leq 0$) for all $\mathbf{x}$.
We have a theorem connecting quadratic forms and eigenvalues:
Let $A$ be an $n \times n$ symmetric matrix. Then the quadratic form $\mathbf{x}^{T} A \mathbf{x}$ is:
1. positive definite if and only if the eigenvalues of $A$ are all positive;
2. negative definite if and only if the eigenvalues of $A$ are all negative;
3. indefinite if and only if $A$ has both positive and negative eigenvalues.
With the help of this theorem, we can immediately tell whether a quadratic form has a maximum, a minimum, or a saddle point at the origin once we have calculated the eigenvalues.
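For instance, applying this to the matrix $A$ from the example above (a quick sketch reusing the same entries):
A = np.array([[3, 2, 0], [2, -1, 4], [0, 4, -2]])
np.linalg.eigvals(A) # approximately 4.388, 1.35, -5.738: mixed signs, so the form is indefinite (a saddle)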
Symmetric matrices are among the most important matrices in linear algebra. We now show that every positive definite matrix has strictly positive eigenvalues.
Let ${A}$ be a positive definite symmetric matrix. Premultiplying ${A}\mathbf{x}=\lambda \mathbf{x}$ by $\mathbf{x}^T$ gives
$$ \mathbf{x}^T{A}\mathbf{x} = \lambda \mathbf{x}^T\mathbf{x} = \lambda \|\mathbf{x}\|^2 $$
The left-hand side is positive by the definition of positive definiteness, and $\|\mathbf{x}\|^2 > 0$, so $\lambda$ must be larger than $0$.
Try asking the other way around: if all eigenvalues are positive, is $A_{n\times n}$ positive definite? Yes.
Here is the Principal Axes Theorem which employs the orthogonal change of variable $\mathbf{x}=P\mathbf{y}$:
$$ Q(\mathbf{x})=\mathbf{x}^{T} A \mathbf{x}=\mathbf{y}^{T} D \mathbf{y}=\lambda_{1} y_{1}^{2}+\lambda_{2} y_{2}^{2}+\cdots+\lambda_{n} y_{n}^{2} $$
If all the $\lambda$'s are positive, then $\mathbf{x}^{T} A \mathbf{x}>0$ for every $\mathbf{x} \neq \mathbf{0}$, since $\mathbf{y}=P^T\mathbf{x}$ is nonzero whenever $\mathbf{x}$ is.
The Cholesky decomposition is a modification of the $LU$ decomposition for symmetric positive definite matrices, and it is about twice as efficient as the general $LU$ algorithm.
If $A$ is a positive definite matrix, i.e. $\mathbf{x}^{T} A \mathbf{x}>0$ for all $\mathbf{x} \neq \mathbf{0}$, or equivalently every eigenvalue is strictly positive, then $A$ can be decomposed into the product of a lower triangular matrix and its transpose: $A = LL^T$.
We will show this with SciPy.
A = np.array([[16, -8, -4], [-8, 29, 12], [-4, 12, 41]]); A
array([[16, -8, -4],
       [-8, 29, 12],
       [-4, 12, 41]])
L = sp.linalg.cholesky(A, lower = True); L
array([[ 4.,  0.,  0.],
       [-2.,  5.,  0.],
       [-1.,  2.,  6.]])
Check if $LL^T=A$
L@L.T
array([[16., -8., -4.],
       [-8., 29., 12.],
       [-4., 12., 41.]])
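NumPy provides the same factorization through np.linalg.cholesky(), which returns the lower triangular factor directly:
np.linalg.cholesky(A) # same L as above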
If a symmetric matrix $A$ does not have full rank, then there must be a non-trivial vector $\mathbf{v}$ satisfying
$$ A\mathbf{v} = \mathbf{0} $$
which means the quadratic form vanishes there: $\mathbf{v}^TA\mathbf{v} = 0$. Thus $A$ cannot be positive definite if it does not have full rank. Equivalently, a positive definite matrix must have full rank.
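A minimal numeric sketch of this fact (S and v are hypothetical names): the symmetric matrix below has rank $1$, the vector $\mathbf{v}=(1,-1)$ lies in its null space, and the Cholesky routine rejects it.
S = np.array([[1., 1.], [1., 1.]]) # symmetric but rank-deficient
v = np.array([1., -1.])
print(S @ v) # [0. 0.]: Sv = 0, hence v^T S v = 0
try:
    np.linalg.cholesky(S)
except np.linalg.LinAlgError as err:
    print(err) # Matrix is not positive definite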