#!/usr/bin/env python
# coding: utf-8
# # Lecture 8: Symmetric eigenvalue problems (algorithms) and SVD (applications)
# ## Recap of the previous lecture
# - QR decomposition and Gram-Schmidt algorithm
# - Schur decomposition and QR-algorithm (basic)
# ## Today's lecture
#
# Today we will talk about:
#
# - Algorithms for the symmetric eigenvalue problem
# - QR algorithm (in more detail)
# - Divide-and-conquer
# - Bisection
# - SVD and its applications
# ## Schur form computation
#
# Recall that we are trying to avoid $\mathcal{O}(n^4)$ complexity. The idea is to make a matrix have a simpler structure so that each step of QR algorithm becomes cheaper.
#
# In case of a general matrix we can use the **Hessenberg form**.
# ## Hessenberg form
#
# The matrix $A$ is said to be in the Hessenberg form, if
#
# $$a_{ij} = 0, \quad \mbox{if } i \geq j+2.$$
#
# $$H = \begin{bmatrix} * & * & * & * & * \\ * & * & * & * & * \\ 0 & * & * & * & *\\ 0 & 0 & * & * & *\\ 0 & 0 & 0 & * & * \\ \end{bmatrix}.$$
# ## Reduction of any matrix to Hessenberg form
#
# By applying Householder reflections we can reduce any matrix to the Hessenberg form
# $$U^* A U = H$$
#
# The only difference from the Schur decomposition is that the first column is mapped to a vector with two non-zero elements rather than one: the Householder reflection acts only on rows $2, \dots, n$, so the first element is not changed, and the multiplication by the same reflection from the right does not destroy the zeros just created.
#
# The computational cost of such reduction is $\mathcal{O}(n^3)$ operations.
#
# In a Hessenberg form, computation of one iteration of the QR algorithm costs $\mathcal{O}(n^2)$ operations (e.g. using Givens rotations, how?), and the Hessenberg form is preserved by the QR iteration (check why).
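#
# As a quick sanity check (a minimal sketch, assuming SciPy is available), we can compute the Hessenberg form numerically with `scipy.linalg.hessenberg` and verify its structure:
# In[ ]:
import numpy as np
import scipy.linalg
n = 6
A = np.random.randn(n, n)
H, Q = scipy.linalg.hessenberg(A, calc_q=True)  # A = Q H Q^*
print(np.allclose(np.tril(H, -2), 0.0))         # everything below the first subdiagonal is zero
print(np.allclose(Q.dot(H).dot(Q.T), A))        # the reduction is a unitary similarity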
# ## Symmetric (Hermitian) case
#
# In the symmetric case $A = A^*$ we have $H = H^*$, so the upper Hessenberg form becomes a tridiagonal matrix.
#
# From now on we will talk about the case of symmetric tridiagonal form.
#
# Any symmetric (Hermitian) matrix can be reduced to the tridiagonal form by Householder reflections.
#
# The key point is that the tridiagonal form is preserved by the QR algorithm, and the cost of one step can be reduced to $\mathcal{O}(n)$!
# ## QR algorithm: iterations
#
# The iterations of the QR algorithm have the following form:
#
# $$A_k = Q_k R_k, \quad A_{k+1} = R_k Q_k.$$
#
# If $A_0 = A$ is a symmetric tridiagonal matrix, this form is preserved by the QR algorithm.
#
# Let us see...
# In[1]:
get_ipython().run_line_magic('matplotlib', 'inline')
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
#Generate a random symmetric tridiagonal matrix
n = 20
d = np.random.randn(n)
sub_diag = np.random.randn(n-1)
mat = np.diag(d) + np.diag(sub_diag, -1) + np.diag(sub_diag, 1)
mat1 = np.abs(mat)
mat1 = mat1/np.max(mat1.flatten())  #normalized absolute values (for the commented-out imshow below)
plt.spy(mat)  #sparsity pattern of A_0
#One QR step: A_1 = R Q
q, r = np.linalg.qr(mat)
plt.figure()
b = r.dot(q)
b[abs(b) <= 1e-12] = 0  #remove rounding noise so that spy shows the true pattern
plt.spy(b)  #A_1 is again tridiagonal
#plt.figure()
#plt.imshow(np.abs(r.dot(q)))
b[0, :]
# ## Tridiagonal form
# In the tridiagonal form, you do not have to compute the $Q$ matrix: you only have to compute the **tridiagonal part**
# that appears after the iterations
#
# $$A_k = Q_k R_k, \quad A_{k+1} = R_k Q_k,$$
#
# in the case when $A_k = A^*_k$ and is also tridiagonal.
#
#
# Such a matrix is defined by $\mathcal{O}(n)$ parameters; the computation of its QR decomposition is more complicated,
# but it is possible to compute $A_{k+1}$ directly without forming $Q_k$.
#
# This is called **implicit QR-step**.
# ## Theorem on implicit QR iteration
#
#
# All the implicit QR algorithms are based on the following **theorem**:
#
# Let $$Q^* A Q = H$$ where $H$ is an irreducible upper Hessenberg matrix (i.e. all of its subdiagonal elements are non-zero).
#
# Then the first column of the matrix $Q$ determines all of its other columns.
# It can be found from the equation
#
# $$A Q = Q H. $$
#
# ## Convergence of the QR-algorithm
#
# The convergence of the QR-algorithm is a very **delicate issue** (see Tyrtyshnikov, Brief introduction to numerical analysis for details).
#
#
# Summary. If we have a decomposition of the form
#
# $$A = X \Lambda X^{-1}, \quad A = \begin{bmatrix}A_{11} & A_{12} \\ A_{21} & A_{22}\end{bmatrix}$$
#
# and
# $$
# \Lambda = \begin{bmatrix} \Lambda_1 & 0 \\
# 0 & \Lambda_2 \end{bmatrix}, \quad \lambda(\Lambda_1)=\{\lambda_1,\dots,\lambda_m\}, \ \lambda(\Lambda_2)=\{\lambda_{m+1},\dots,\lambda_r\},
# $$
#
# and there is a **gap** between the eigenvalues of $\Lambda_1$ and $\Lambda_2$ ($|\lambda_1|\geq \dots \geq |\lambda_m| > |\lambda_{m+1}| \geq\dots \geq |\lambda_r| >0$), then the $A^{(k)}_{21}$ block of $A_k$
# in the QR-iteration goes to zero with
#
# $$\Vert A^{(k)}_{21} \Vert \leq C q^k, \quad q = \left| \frac{\lambda_{m+1}}{\lambda_{m}} \right |,$$
#
# where $m$ is the size of $\Lambda_1$.
#
# So we need to increase the gap! It can be done by the **QR algorithm with shifts**.
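#
# A small numerical illustration of this bound (the matrix below is constructed for illustration only, with a prescribed spectrum and a gap after the second eigenvalue):
# In[ ]:
import numpy as np
np.random.seed(0)
lam_exact = np.array([10.0, 9.0, 3.0, 2.0, 1.0])  # gap between lambda_2 = 9 and lambda_3 = 3
m = 2
X, _ = np.linalg.qr(np.random.randn(5, 5))
A = X.dot(np.diag(lam_exact)).dot(X.T)            # symmetric matrix with known eigenvalues
q_theory = abs(lam_exact[m] / lam_exact[m - 1])   # |lambda_{m+1} / lambda_m| = 1/3
for k in range(1, 21):
    Q, R = np.linalg.qr(A)
    A = R.dot(Q)
    if k % 5 == 0:
        print(k, np.linalg.norm(A[m:, :m]), q_theory**k)  # observed vs. predicted decay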
# ## QR-algorithm with shifts
#
# $$A_{k} - s_k I = Q_k R_k, \quad A_{k+1} = R_k Q_k + s_k I$$
#
#
# The convergence rate for a shifted version is then
#
# $$\left| \frac{\lambda_{m+1} - s_k}{\lambda_{m} - s_k} \right |,$$
#
# where $\lambda_m$ is the $m$-th largest eigenvalue of the matrix in modulus. If the shift $s_k$ is close to an eigenvalue, the convergence is much faster.
#
# There are different strategies to choose the shifts.
#
# Shifting is a general strategy for improving the convergence of iterative eigenvalue methods. On the next slides we illustrate how to choose the shift using a simpler algorithm.
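#
# A minimal sketch of the shifted QR iteration for a random symmetric tridiagonal matrix, using the simple shift $s_k = (A_k)_{nn}$ (in practice the Wilkinson shift is preferred to guarantee convergence). The last off-diagonal element typically collapses after a few iterations, after which we can deflate:
# In[ ]:
import numpy as np
np.random.seed(0)
n = 6
d0 = np.random.randn(n)
e0 = np.random.randn(n - 1)
A = np.diag(d0) + np.diag(e0, 1) + np.diag(e0, -1)
for k in range(5):
    s = A[-1, -1]                            # shift: the last diagonal element
    Q, R = np.linalg.qr(A - s * np.eye(n))
    A = R.dot(Q) + s * np.eye(n)
    print(k, abs(A[-1, -2]))                 # last off-diagonal element goes to zero very fast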
# ## Shifts and power method
#
# Remember the power method for the computation of the eigenvalues.
#
# $$x_{k+1} := A x_k, \quad x_{k+1} := \frac{x_{k+1}}{\Vert x_{k+1} \Vert}.$$
#
# It converges to the eigenvector corresponding to the largest eigenvalue in modulus.
#
# The convergence can be very slow.
#
# Let us try the shifting strategy. If we shift the matrix as
#
# $$ A := A - \lambda_k I,$$
#
# then the eigenvalue we are interested in becomes small in modulus, while the power method converges to the largest one. That is not what we want.
# ## Inverse iteration and Rayleigh quotient iteration
#
# To make a small eigenvalue large, we need to **invert the matrix**, and that gives us **inverse iteration**
#
# $$x_{k+1} = (A - \lambda I)^{-1} x_k,$$
#
# where $\lambda$ is the shift, an approximation to the eigenvalue that we want. As for the power method, the convergence is linear.
#
# To accelerate the convergence, one can use the **Rayleigh quotient iteration**: inverse iteration with an **adaptive shift**:
#
# $$x_{k+1} = (A - \lambda_k I)^{-1} x_k,$$
# $$\lambda_k = \frac{(Ax_k, x_k)}{(x_k, x_k)}$$
#
# In the symmetric case $A = A^*$ the convergence is **locally cubic** and **locally quadratic** otherwise.
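#
# A minimal sketch of the Rayleigh quotient iteration for a random symmetric matrix (the size and the starting vector are arbitrary); note how fast the residual drops:
# In[ ]:
import numpy as np
np.random.seed(0)
n = 50
A = np.random.randn(n, n)
A = (A + A.T) / 2                                 # symmetrize
x = np.random.randn(n)
x = x / np.linalg.norm(x)
lam = x.dot(A.dot(x))                             # Rayleigh quotient as the shift
for k in range(6):
    x = np.linalg.solve(A - lam * np.eye(n), x)   # inverse iteration step with the current shift
    x = x / np.linalg.norm(x)
    lam = x.dot(A.dot(x))
    print(k, np.linalg.norm(A.dot(x) - lam * x))  # residual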
# ## Singular values and eigenvalues (1)
#
# Now let us talk about singular values and eigenvalues.
#
# SVD: $$A = U \Sigma V^*$$
#
# exists for any matrix.
#
# It can also be viewed as a reduction of a given matrix to the diagonal form by means of two-sided unitary transformations:
#
# $$\Sigma = U^* A V.$$
#
# By two-sided Householder transformations we can reduce any matrix to the **bidiagonal form** $B$.
# ## Singular values and eigenvalues (2)
#
# The **implicit QR algorithm** (with shifts) gives a way of computing the eigenvalues (and the Schur form).
# But we cannot apply the QR algorithm directly to the bidiagonal matrix, since it is not diagonalizable in the general case.
#
# However, the problem of the computation of the SVD can be reduced to the **symmetric eigenvalue problem** in two ways:
#
# 1. Work with the tridiagonal matrix $$T = B^* B$$
# 2. Work with the extended matrix $$T = \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix}$$
#
#
# Case 1 is fine only if you **do not form $T$ explicitly**: forming $B^* B$ squares the condition number and spoils the accuracy of the small singular values.
#
# Thus, the problem of computing singular values can be reduced to the problem of the computation of the eigenvalues of symmetric tridiagonal matrix.
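#
# A quick numerical check of the second approach (a random rectangular $B$ just for illustration): the non-zero eigenvalues of the extended matrix are exactly $\pm \sigma_i(B)$.
# In[ ]:
import numpy as np
np.random.seed(0)
m, n = 5, 3
B = np.random.randn(m, n)
T = np.block([[np.zeros((m, m)), B], [B.T, np.zeros((n, n))]])
print(np.sort(np.linalg.svd(B, compute_uv=False)))
print(np.sort(np.linalg.eigvalsh(T)))          # contains +sigma, -sigma and (m - n) zeros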
# ## Algorithms for the SEV (symmetric eigenvalue problem)
#
# Done:
# - QR algorithm: the "gold standard" of the eigenvalue computations
# - RQI-iteration: Rayleigh quotient iteration is implicitly performed at each step of the QR algorithm
#
# Next slides:
# - Divide-and-conquer: the fastest (?) method nowadays
# - Bisection method
# - Jacobi method
# ## Divide-and-conquer
#
# Suppose we have a tridiagonal matrix, and we split it into two blocks:
#
#
# $$T = \begin{bmatrix} T'_1 & B \\ B^{\top} & T'_2 \end{bmatrix}$$
#
# We can write the matrix $T$ as
#
# $$T = \begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix} + b_m v v^*$$
#
# where $vv^*$ is a rank-$1$ matrix, $v = (0,\dots,0,1,1,0,\dots,0)^{\top}$ with the two ones at positions $m$ and $m+1$, and $b_m$ is the off-diagonal element of $T$ removed by the split.
#
# Suppose we have decomposed $T_1$ and $T_2$ already:
#
# $$T_1 = Q_1 \Lambda_1 Q^*_1, \quad T_2 = Q_2 \Lambda_2 Q^*_2$$
#
# Then (check),
#
# $$\begin{bmatrix} Q^*_1 & 0 \\ 0 & Q^*_2 \end{bmatrix} T\begin{bmatrix} Q_1 & 0 \\ 0 & Q_2 \end{bmatrix} = D + \rho u u^{*}, \quad D = \begin{bmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2\end{bmatrix}$$
#
# I.e., we have reduced the problem to the computation of the eigenvalues of a diagonal plus low-rank (here rank-one) matrix.
#
#
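# A small numerical check of this splitting (the split point $m$ and the diagonal correction by $b_m$ are standard, but the concrete matrix is just a random illustration):
# In[ ]:
import numpy as np
np.random.seed(0)
n, m = 8, 4
a = np.random.randn(n)
b = np.random.randn(n - 1)
T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
bm = b[m - 1]                                   # the coupling element removed by the split
T1 = T[:m, :m].copy(); T1[-1, -1] -= bm
T2 = T[m:, m:].copy(); T2[0, 0] -= bm
v = np.zeros(n); v[m - 1] = v[m] = 1.0
T_split = np.block([[T1, np.zeros((m, n - m))],
                    [np.zeros((n - m, m)), T2]]) + bm * np.outer(v, v)
print(np.allclose(T, T_split))                  # True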
# ## Diagonal-plus-low-rank matrix
#
# It is tricky to compute the eigenvalues of the matrix
#
# $$D + \rho u u^* $$
#
# The characteristic polynomial has the form
#
# $$\det(D + \rho uu^* - \lambda I) = \det(D - \lambda I)\det(I + \rho (D - \lambda I)^{-1} uu^*) = 0.$$
#
# Then (prove!!)
#
# $$\det(I + \rho (D - \lambda I)^{-1} uu^*) = 1 + \rho \sum_{i=1}^n \frac{|u_i|^2}{d_i - \lambda} = 0$$
#
# Hint: find $\det(I + w u^*)$ using the fact that $\det(C) = \prod_{i=1}^n\lambda_i(C)$ and $\mathrm{trace}(C) = \sum_{i=1}^n \lambda_i(C)$.
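#
# A quick numerical sanity check of this identity (it does not replace the proof; the numbers below are arbitrary):
# In[ ]:
import numpy as np
np.random.seed(0)
n0 = 5
d0 = np.random.randn(n0)
u0 = np.random.randn(n0)
rho0 = 0.7
lam_test = 0.3                                 # any point distinct from all d_i
lhs = np.linalg.det(np.diag(d0) + rho0 * np.outer(u0, u0) - lam_test * np.eye(n0))
rhs = np.prod(d0 - lam_test) * (1 + rho0 * np.sum(u0**2 / (d0 - lam_test)))
print(lhs, rhs)                                # the two values coincide up to rounding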
# ## Characteristic equation
#
# $$1 + \rho \sum_{i=1}^n \frac{|u_i|^2}{d_i - \lambda} = 0$$
#
# How to find the roots?
# In[2]:
import numpy as np
lm = [1, 2, 3, 4]
M = len(lm)
D = np.array(lm)
t = np.linspace(-1, 6, 1000)
u = 0.5 * np.ones(M)
rho = 1
def fun(lam):
    #secular function f(lam) = 1 + rho * sum_i u_i^2 / (d_i - lam)
    return 1 + rho * np.sum(u**2/(D - lam))
res = [fun(lam) for lam in t]
plt.plot(t, res, 'k')  #plot against lambda, so the poles sit at d_i = 1, 2, 3, 4
plt.ylim([-6, 6])
plt.tight_layout()
plt.title('Plot of the function')
# The function has exactly one root in each interval $(d_i, d_{i+1})$.
#
# By the way, we have essentially proved the **Cauchy interlacing theorem** (it describes what happens to the eigenvalues under a rank-$1$ perturbation).
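#
# We can check this numerically, reusing `D`, `u` and `rho` from the cell above: the eigenvalues of $D + \rho uu^*$ (i.e. the roots of $f$) fall one per interval $(d_i, d_{i+1})$, plus one to the right of $d_n$ (since $\rho > 0$).
# In[ ]:
eigs = np.linalg.eigvalsh(np.diag(D) + rho * np.outer(u, u))
print("d   =", D)
print("eig =", eigs)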
# ## How to find the root
#
# The Newton method can fail (draw a picture with a tangent line).
#
# Note that the Newton method is just the approximation of the function $f(\lambda)$ by a linear function.
#
# A much better approximation is a **hyperbola**:
#
# $$f(\lambda) \approx c_0 + \frac{c_1}{d_i - \lambda} + \frac{c_2}{d_{i+1} - \lambda}.$$
#
# To fit the coefficients, we have to evaluate $f(\lambda)$ and $f'(\lambda)$ at a particular point.
#
# After that, the next approximation is recovered by solving a **quadratic equation**.
#
#
#
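# Below is a sketch of one possible implementation of this idea (it reuses `D`, `u`, `rho` and `fun` from the cell above; a production implementation, e.g. the one inside LAPACK's divide-and-conquer, is considerably more careful):
# In[ ]:
def secular_root(i, maxiter=30):
    """Root of fun in the interval (D[i], D[i+1]) via the hyperbolic approximation."""
    lo, hi = float(D[i]), float(D[i + 1])
    x = 0.5 * (lo + hi)
    for _ in range(maxiter):
        left, right = slice(0, i + 1), slice(i + 1, None)
        # split f into the part with poles <= D[i] and the part with poles >= D[i+1]
        psi1, dpsi1 = rho * np.sum(u[left]**2 / (D[left] - x)), rho * np.sum(u[left]**2 / (D[left] - x)**2)
        psi2, dpsi2 = rho * np.sum(u[right]**2 / (D[right] - x)), rho * np.sum(u[right]**2 / (D[right] - x)**2)
        # fit c1/(d_i - lam) and c2/(d_{i+1} - lam) so that values and derivatives match at x
        c1, c2 = dpsi1 * (lo - x)**2, dpsi2 * (hi - x)**2
        c0 = 1.0 + (psi1 - c1 / (lo - x)) + (psi2 - c2 / (hi - x))
        # zero of c0 + c1/(lo - lam) + c2/(hi - lam): a quadratic equation in lam
        coeffs = [c0, -(c0 * (lo + hi) + c1 + c2), c0 * lo * hi + c1 * hi + c2 * lo]
        roots = np.roots(coeffs)
        roots = roots[np.abs(roots.imag) < 1e-12].real
        x_new = roots[(roots > lo) & (roots < hi)][0]   # keep the root inside the interval
        if abs(x_new - x) < 1e-14 * abs(x):
            return x_new
        x = x_new
    return x
root = secular_root(1)
print(root, fun(root))   # root in (2, 3); the residual of the secular equation is tiny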
# ## Important issues
#
# First, stability: this method was abandoned for a long time due to the instability of the computation of the eigenvectors.
#
# In the recursion, we need to compute the eigenvectors of the $D + \rho uu^*$ matrix.
#
# The exact expression for the eigenvectors is just (let us check!)
#
# $$(D - \alpha_i I)^{-1}u,$$ where $\alpha_i$ is the computed root.
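#
# A quick check of this formula, again reusing `D`, `u` and `rho` from above (the spectrum here is well separated, so the instability does not show up):
# In[ ]:
M_dpr = np.diag(D) + rho * np.outer(u, u)
alphas = np.linalg.eigvalsh(M_dpr)
alpha = alphas[1]                                     # one computed root
vec = u / (D - alpha)                                 # (D - alpha I)^{-1} u, computed elementwise
vec = vec / np.linalg.norm(vec)
print(np.linalg.norm(M_dpr.dot(vec) - alpha * vec))   # small residual: it is indeed an eigenvector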
# ## Lowner theorem
#
# The solution is to use the somewhat strange **Löwner theorem**:
#
# If $\alpha_i$ and $d_i$ satisfy the **interlacing condition**
#
#
# $$d_n < \alpha_n < d_{n-1} < \alpha_{n-1} < \ldots < d_{i+1} < \alpha_{i+1} < d_i < \ldots < d_1 < \alpha_1,$$
#
# Then there exists a vector $\widehat{u}$ such that $\alpha_i$ are exact eigenvalues of the matrix
#
# $$\widehat{D} = D + \widehat{u} \widehat{u}^*.$$
#
# So, you first compute the eigenvalues, then compute $\widehat{u}$ and only then the eigenvectors.
# ## Divide and conquer and the Fast Multipole Method
#
# In the computations of divide and conquer we have to evaluate the sums of the form
#
# $$f(\lambda) = 1 + \rho \sum_{i=1}^n \frac{|u_i|^2}{(d_i - \lambda)},$$
#
# and have to do it at least for $n$ points.
#
# The complexity is then $\mathcal{O}(n^2)$, as for the QR algorithm.
#
# Can we make it $\mathcal{O}(n \log n)$?
#
# The answer is yes, but we have to replace the exact computations by approximate ones with the help of the **Fast Multipole Method**.
#
# I will do a short explanation on the whiteboard.
# ## Few more algorithms
#
# A completely different approach is based on **bisection**.
#
# Given a matrix $A$, its inertia is defined as the triple $(\nu, \zeta, \pi)$,
# where $\nu$ is the number of negative, $\zeta$ the number of zero, and $\pi$ the number of positive eigenvalues.
#
# If $X$ is non-singular, then (Sylvester's law of inertia)
#
# $$\mathrm{Inertia}(A) = \mathrm{Inertia}(X^* A X).$$
# ## Bisection via Gaussian elimination
#
# Given $z$ we can do the Gaussian elimination:
#
# $$A - zI = L D L^*,$$
#
# and the inertia of the diagonal factor $D$ is trivial to compute; by Sylvester's law it equals the inertia of $A - zI$.
#
# Thus, if we want to find all the eigenvalues in an interval $[a, b]$, we can use the inertia to count how many eigenvalues lie below any given point and then bisect the interval.
#
# Illustration: $\nu(A - zI)$ equals the number of eigenvalues of $A$ that are smaller than $z$. If, say, $\mathrm{Inertia}(A)=(5,0,2)$ and $\mathrm{Inertia}(A-zI)=(4,0,3)$, then $z < 0$ and exactly one eigenvalue of $A$ lies in $[z, 0)$, and we can bisect this interval further.
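#
# A minimal sketch of the eigenvalue count via the pivots of the $LDL^*$ factorization of a symmetric tridiagonal $A - zI$ (the zero-pivot corner case is ignored here):
# In[ ]:
import numpy as np
def count_below(a, b, z):
    """Number of eigenvalues of tridiag(b, a, b) that are smaller than z."""
    count = 0
    piv = a[0] - z
    if piv < 0:
        count += 1
    for k in range(1, len(a)):
        piv = (a[k] - z) - b[k - 1]**2 / piv   # next pivot of the LDL^* factorization
        if piv < 0:
            count += 1
    return count
np.random.seed(0)
n = 10
a = np.random.randn(n)
b = np.random.randn(n - 1)
A = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
z = 0.5
print(count_below(a, b, z), np.sum(np.linalg.eigvalsh(A) < z))   # the two counts coincide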
# ## Jacobi method
#
# Recall what Jacobi (Givens) rotations are:
#
# in the plane they correspond to a $2 \times 2$ orthogonal matrix of the form
#
# $$\begin{pmatrix} \cos \phi & \sin \phi \\ -\sin \phi & \cos \phi \end{pmatrix},$$
#
# and in the $n$-dimensional case we select two indices $i$ and $j$ and rotate in the corresponding plane.
# ## Jacobi method (cont.)
#
# The idea of the Jacobi method is to minimize the sum of squares of the off-diagonal elements:
#
# $$\Gamma(A) = \mathrm{off}( U^* A U), \quad \mathrm{off}^2(X) = \sum_{i \ne j} \left|X_{ij}\right|^2 = \|X \|^2_F - \sum\limits_{i=1}^n x^2_{ii},$$
#
# by applying successive Jacobi rotations $U$ that zero off-diagonal elements.
#
# Once the "pivot" is chosen, it is easy to eliminate it.
#
# The main question is then in which order to eliminate the off-diagonal elements (the order of the **sweeps**).
#
# If we always eliminate the largest off-diagonal element, the method has quadratic convergence.
#
# In practice, a cyclic order (i.e., $(1, 2), (1, 3), \ldots, (2, 3), \ldots$) is used.
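#
# A compact sketch of a cyclic Jacobi sweep (it forms the full rotation matrix for clarity; a real implementation updates only rows and columns $p$, $q$):
# In[ ]:
import numpy as np
def off_norm(A):
    return np.sqrt(np.sum(A**2) - np.sum(np.diag(A)**2))
def jacobi_sweep(A):
    n = A.shape[0]
    for p in range(n - 1):
        for q in range(p + 1, n):              # cyclic order (1,2), (1,3), ..., (2,3), ...
            if abs(A[p, q]) < 1e-15:
                continue
            tau = (A[q, q] - A[p, p]) / (2 * A[p, q])
            t = np.sign(tau) / (abs(tau) + np.sqrt(1 + tau**2)) if tau != 0 else 1.0
            c = 1.0 / np.sqrt(1 + t**2)
            s = t * c
            J = np.eye(n)
            J[p, p] = J[q, q] = c
            J[p, q], J[q, p] = s, -s
            A = J.T.dot(A).dot(J)              # zeroes the (p, q) entry
    return A
np.random.seed(0)
n = 8
A = np.random.randn(n, n)
A = (A + A.T) / 2
for sweep in range(5):
    A = jacobi_sweep(A)
    print(sweep, off_norm(A))                  # off(A) decreases very quickly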
# ## Jacobi method: convergence
#
# To show convergence, we firstly show that
# $$
# \text{off}(B) < \text{off}(A),
# $$
# where $B = U^*AU$.
#
# Here we use the unitary invariance of the Frobenius norm and denote by $p$ and $q$ the indices changed by the rotation:
#
# $$\Gamma^2(A) = \text{off}^2(B) = \|B\|^2_F - \sum\limits_{i=1}^n b^2_{ii} = \| A \|^2_F - \sum\limits_{i \neq p, q} b^2_{ii} - (b^2_{pp} + b^2_{qq}) = \| A \|^2_F - \sum\limits_{i \neq p, q} a^2_{ii} - (a^2_{pp} + 2a^2_{pq} + a^2_{qq}) = \| A \|^2_F - \sum\limits_{i =1}^n a^2_{ii} - 2a^2_{pq} = \text{off}^2(A) - 2a^2_{pq} < \text{off}^2(A),$$
#
# where we used that $b_{ii} = a_{ii}$ for $i \neq p, q$, and that $b^2_{pp} + b^2_{qq} = a^2_{pp} + 2a^2_{pq} + a^2_{qq}$, since the rotation preserves the Frobenius norm of the $2\times 2$ block and makes $b_{pq} = 0$.
#
# So the "size" of the off-diagonal part decreases after each Jacobi rotation.
#
# If we always select the largest off-diagonal element to eliminate (the pivot), $|a_{pq}| = \gamma$, then for every off-diagonal element we have
#
# $$
# |a_{ij}| \leq \gamma,
# $$
#
# thus
# $$
# \text{off}(A)^2 \leq 2 N \gamma^2,
# $$
# where $2N = n(n-1)$ is the number of off-diagonal elements.
# Or rewrite this inequality in the form
#
# $$2\gamma^2 \geq \frac{\text{off}^2(A)}{N}.$$
# Now we use relations $\Gamma^2(A) = \text{off}^2(A) - 2\gamma^2 \leq \text{off}^2(A) - \dfrac{\text{off}^2(A)}{N}$ and get
# $$
# \Gamma(A) \leq \sqrt{\left(1 - \frac{1}{N}\right)} \text{off}(A).
# $$
#
# After $K$ steps we have the factor
#
# $$\left(1 - \frac{1}{N}\right)^{\frac{K}{2}} \approx e^{-\frac{K}{2N}},$$
#
# i.e. linear convergence: one sweep of $K = N$ rotations reduces $\text{off}(A)$ roughly by the factor $e^{-\frac{1}{2}}$. However, the convergence is locally quadratic (given without proof here).
# ## Jacobi: summary
#
# Jacobi method was the first numerical method for the eigenvalues, proposed in 1846.
#
# - Large constant
# - Very accurate (small eigenvalues are computed with high relative accuracy)
# - Good parallel capabilities
# ## Summary for this part
# - Many algorithms for the solution of the symmetric eigenvalue problem (QR, divide-and-conquer, bisection, Jacobi)
# ## Summary of today's lecture
#
# - Algorithms for symmetric eigenvalue problems
# ## Next lecture
# - We start **sparse and/or structured** NLA.
# # Questions?
# In[14]:
from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/custom.css", "r").read()
    return HTML(styles)
css_styling()