Week 1: Matrices, vectors, matrix/vector norms, scalar products & unitary matrices
Week 2: TAs-week (Strassen, FFT, a bit of SVD)
Week 3: Matrix ranks, singular value decomposition, linear systems, eigenvalues
Week 4: Matrix decompositions: QR, LU, SVD + test + structured matrices start
Today we will talk about sparse matrices: where they arise, how we store them, and how we operate with them.
Consider the simplest partial differential equation (PDE), the Laplace equation (with a source term $f$ it is also called the Poisson equation):
$$
\Delta T = \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} = f.
$$
We approximate the second derivative by the standard three-point finite difference on a uniform grid with step $h$,
$$\frac{\partial^2 T}{\partial x^2} \approx \frac{T(x+h) - 2T(x) + T(x-h)}{h^2},$$
do the same for $\frac{\partial^2 T}{\partial y^2},$
and we get a linear system.
First, let us consider the one-dimensional case.
After the discretization of the one-dimensional Laplace equation with Dirichlet boundary conditions we have $$\frac{u_{i+1} + u_{i-1} - 2u_i}{h^2} = f_i,$$
or in the matrix form
$$ A u = f, $$ and ($n = 5$ illustration)
\begin{equation}
A=-\frac{1}{h^2}\begin{bmatrix}
2& -1 & 0 & 0 & 0\\
-1 & 2 & -1 & 0 &0 \\
0 & -1 & 2& -1 & 0 \\
0 & 0 & -1 & 2 &-1\\
0 & 0 & 0 & -1 & 2
\end{bmatrix}
\end{equation}
The matrix is tridiagonal and sparse
(and also Toeplitz: all elements on each diagonal are the same).
In two dimensions, we get an equation of the form
$$\frac{4u_{ij} -u_{(i-1)j} - u_{(i+1)j} - u_{i(j-1)}-u_{i(j+1)}}{h^2} = f_{ij},$$
or in the Kronecker product form
$$\Delta_2 = \Delta_1 \otimes I + I \otimes \Delta_1,$$where $\Delta_1$ is a 1D Laplacian, and $\otimes$ is a Kronecker product of matrices.
For matrices $A$ and $B$, their Kronecker product $A \otimes B$ is defined as the block matrix $$[a_{ij} B].$$
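As a quick illustration (a small sketch using numpy's kron, which implements the same definition), here is the Kronecker product of two $2 \times 2$ matrices:
import numpy as np

# Each element a_ij of A is replaced by the block a_ij * B
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
print(np.kron(A, B))
# [[0 1 0 2]
#  [1 0 2 0]
#  [0 3 0 4]
#  [3 0 4 0]]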
In the block matrix form the 2D-Laplace matrix can be written in the following form:
$$ A = -\frac{1}{h^2}\begin{bmatrix} \Delta_1 + 2I & -I & 0 & 0 & 0\\ -I & \Delta_1 + 2I & -I & 0 &0 \\ 0 & -I & \Delta_1 + 2I & -I & 0 \\ 0 & 0 & -I & \Delta_1 + 2I &-I\\ 0 & 0 & 0 & -I & \Delta_1 + 2I \end{bmatrix} $$
(here $\Delta_1$ denotes the unscaled tridiagonal stencil matrix $\mathrm{tridiag}(-1, 2, -1)$, without the $-\frac{1}{h^2}$ factor).
We can create this matrix using the scipy.sparse package (actually, this is not the best sparse matrix package):
import numpy as np
import scipy as sp
import scipy.sparse
from scipy.sparse import csc_matrix
import matplotlib.pyplot as plt
%matplotlib inline
n = 20
ex = np.ones(n)
# 1D Laplacian stencil: tridiagonal with -2 on the main diagonal and 1 on the off-diagonals
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
# 2D Laplacian via the Kronecker product formula
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csc_matrix(A)
plt.spy(A, aspect='equal', marker='.', markersize=1)
The spy command plots the sparsity pattern of the matrix: the $(i, j)$ pixel is drawn if the corresponding matrix element is non-zero.
The sparsity pattern is really important for understanding the complexity of sparse linear algebra algorithms.
Often, only the sparsity pattern is needed to analyze how "complex" the matrix is.
A matrix is called sparse if the number of non-zero elements is much smaller than the total number of elements, so that the basic linear algebra operations (solving linear systems in the first place) can be done faster than with the full (dense) matrix.
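For instance, for the 2D Laplacian constructed above the fraction of non-zero elements is tiny (a small check, repeating the same construction):
import numpy as np
import scipy as sp
import scipy.sparse
from scipy.sparse import csr_matrix

n = 128  # grid size in each direction; the matrix is (n*n) x (n*n)
ex = np.ones(n)
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
A = csr_matrix(sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1))
# Fraction of non-zero elements: about 5 / (n*n), i.e. roughly 3e-4 here
print(A.nnz / float(A.shape[0] * A.shape[1]))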
The scipy.sparse package has tools for solving sparse linear systems:
import numpy as np
import scipy as sp
import scipy.sparse
import scipy.sparse.linalg
import seaborn as sns
from scipy.sparse import csc_matrix, csr_matrix
import matplotlib.pyplot as plt
%matplotlib inline
n = 512
ex = np.ones(n);
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr');
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csr_matrix(A)
rhs = np.ones(n * n)
sol = sp.sparse.linalg.spsolve(A, rhs)  # sparse direct solver
plt.contourf(sol.reshape((n, n)))
Question 1: How to store the sparse matrix in memory?
Question 2: How to solve linear systems with sparse matrices fast?
In the CSR (compressed sparse row) format, a matrix is stored as 3 different arrays:
ia, ja, sa
where ia is the integer array of row pointers (of length $n+1$), ja contains the column indices of the non-zero elements, and sa contains the non-zero values themselves (both of length nnz, the number of non-zeros). The matrix-by-vector product $y = Ax$ is then computed as:
for i in range(n):
    for k in range(ia[i], ia[i + 1]):
        y[i] += sa[k] * x[ja[k]]
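To make the storage scheme concrete, here is a tiny hand-built example (a sketch: the arrays are written out by hand for a $3 \times 3$ matrix) together with a check of the matrix-by-vector product loop above:
import numpy as np

# The 3 x 3 matrix
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
# stored in CSR format:
ia = np.array([0, 2, 3, 5])           # row pointers: row i occupies positions ia[i]:ia[i+1]
ja = np.array([0, 2, 1, 0, 2])        # column indices of the non-zero elements
sa = np.array([1., 2., 3., 4., 5.])   # the non-zero values themselves

x = np.array([1., 2., 3.])
y = np.zeros(3)
for i in range(3):
    for k in range(ia[i], ia[i + 1]):
        y[i] += sa[k] * x[ja[k]]
print(y)  # [ 7.  6. 19.], the same as the dense matrix-by-vector product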
Let us do a short timing test, comparing matrix-by-vector products in the CSR and COO (coordinate) formats:
import numpy as np
import scipy as sp
import scipy.sparse
import scipy.sparse.linalg
from scipy.sparse import csc_matrix, csr_matrix, coo_matrix
import matplotlib.pyplot as plt
%matplotlib inline
n = 60
ex = np.ones(n);
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr');
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csr_matrix(A)
rhs = np.ones(n * n)
B = coo_matrix(A)  # the same matrix in coordinate (COO) format
%timeit A.dot(rhs)
%timeit B.dot(rhs)
10000 loops, best of 3: 46.7 µs per loop
10000 loops, best of 3: 56.8 µs per loop
As you see, CSR is faster, and for more unstructured patterns the gain will be larger.
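We can check this on a random (unstructured) sparse matrix; a quick sketch using scipy.sparse.random (the size and density below are arbitrary):
import numpy as np
import scipy as sp
import scipy.sparse

n = 4000
# Random sparse matrix with ~0.1% non-zeros, in COO and CSR formats
R_coo = sp.sparse.random(n, n, density=1e-3, format='coo', random_state=0)
R_csr = R_coo.tocsr()
x = np.random.randn(n)
%timeit R_csr.dot(x)
%timeit R_coo.dot(x)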
Sparse matrices give a complexity reduction.
But they are not very well suited for parallel implementation,
and they do not reach maximal efficiency because of the random data access pattern.
Typically, a peak efficiency of $10\%-15\%$ is considered good.
The standard way to measure the efficiency of linear algebra operations on a particular computing architecture is to use flops (the number of floating point operations per second).
The peak performance is determined as
frequency x number of cores x pipeline size x 2
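For instance, a back-of-the-envelope estimate following this formula (the hardware numbers below are made up, purely for illustration):
# Rough peak performance estimate, following the formula above
# (the hardware numbers below are made up for illustration)
frequency = 2.5e9   # clock frequency, Hz
n_cores = 4         # number of cores
pipeline = 4        # flops issued per cycle (SIMD/pipeline width)
peak = frequency * n_cores * pipeline * 2
print('Peak performance: {0:.1f} Gflops'.format(peak / 10 ** 9))  # 80.0 Gflops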
We can measure the efficiency of an ordinary (dense) matrix-by-vector product:
import numpy as np
import time
n = 1000
a = np.random.randn(n, n)
v = np.random.randn(n)
t = time.time()
np.dot(a, v)
t = time.time() - t
print('Time: {0: 3.1e}, Efficiency: {1: 3.1e} Gflops'.\
format(t, ((2 * n ** 2)/t) / 10 ** 9))
Time: 8.5e-04, Efficiency: 2.3e+00 Gflops
n = 1000
ex = np.ones(n);
a = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr');
rhs = np.random.randn(n)
t = time.time()
a.dot(rhs)
t = time.time() - t
print('Time: {0: 3.1e}, Efficiency: {1: 3.1e} Gflops'.\
format(t, (3 * n) / t / 10 ** 9))
Time: 1.4e-04, Efficiency: 2.2e-02 Gflops
Sparse matrix computations dominate linear algebra computations nowadays.
They allow us to work with much larger matrices, but they utilize only $10\%-15\%$ of the peak computer performance.
This means that our computer architectures are not well suited for standard sparse matrix algorithms.
There are many possible solutions to this problem; for example, multiplying a sparse matrix by a block of vectors at once (many right-hand sides) improves data reuse:
n = 1000
k = 10
ex = np.ones(n);
a = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr');
rhs = np.random.randn(n, k)  # a block of k right-hand sides processed at once
t = time.time()
a.dot(rhs)
t = time.time() - t
print('Time: {0: 3.1e}, Efficiency: {1: 3.1e} Gflops'.\
format(t, (3 * n * k) / t / 10 ** 9))
Time: 2.9e-04, Efficiency: 1.0e-01 Gflops
There are many other types of matrices besides tridiagonal!
There is the Florida sparse matrix collection, which contains all sorts of matrices arising in different applications.
It is also a good place to find test matrices.
We can have a look.
from IPython.display import HTML
HTML('<iframe src=http://yifanhu.net/GALLERY/GRAPHS/search.html width=700 height=450></iframe>')
Sparse matrices and fast algorithms (especially for linear systems) have a deep connection with graph theory.
First of all, a sparse matrix can be treated as the adjacency matrix of a certain graph: vertices $i$ and $j$ are connected if the corresponding matrix element $a_{ij}$ is non-zero.
Can you guess why this is important?
Why can sparse linear systems be solved fast?
Because the LU decomposition is often also sparse (i.e., the matrices $L$ and $U$ are sparse as well).
How sparse the factors are, and which variables/equations to select for elimination at each step, is governed by the sparsity structure.
And solving linear systems with sparse triangular matrices is very easy.
Note that the inverse matrix is not sparse!
# Indeed, the inverse is not sparse
n = 7
ex = np.ones(n)
a = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
a = a.todense()
b = np.array(np.linalg.inv(a))
print(a)
print(b)
[[-2.  1.  0.  0.  0.  0.  0.]
 [ 1. -2.  1.  0.  0.  0.  0.]
 [ 0.  1. -2.  1.  0.  0.  0.]
 [ 0.  0.  1. -2.  1.  0.  0.]
 [ 0.  0.  0.  1. -2.  1.  0.]
 [ 0.  0.  0.  0.  1. -2.  1.]
 [ 0.  0.  0.  0.  0.  1. -2.]]
[[-0.875 -0.75  -0.625 -0.5   -0.375 -0.25  -0.125]
 [-0.75  -1.5   -1.25  -1.    -0.75  -0.5   -0.25 ]
 [-0.625 -1.25  -1.875 -1.5   -1.125 -0.75  -0.375]
 [-0.5   -1.    -1.5   -2.    -1.5   -1.    -0.5  ]
 [-0.375 -0.75  -1.125 -1.5   -1.875 -1.25  -0.625]
 [-0.25  -0.5   -0.75  -1.    -1.25  -1.5   -0.75 ]
 [-0.125 -0.25  -0.375 -0.5   -0.625 -0.75  -0.875]]
The $L$ and $U$ factors are typically much sparser. In the tridiagonal case they are even bidiagonal!
import scipy.linalg
p, l, u = scipy.linalg.lu(a)
print(l)
[[ 1.          0.          0.          0.          0.          0.          0.        ]
 [-0.5         1.          0.          0.          0.          0.          0.        ]
 [-0.         -0.66666667  1.          0.          0.          0.          0.        ]
 [-0.         -0.         -0.75        1.          0.          0.          0.        ]
 [-0.         -0.         -0.         -0.8         1.          0.          0.        ]
 [-0.         -0.         -0.         -0.         -0.83333333  1.          0.        ]
 [-0.         -0.         -0.         -0.         -0.         -0.85714286  1.        ]]
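Since $L$ is unit lower bidiagonal, the system $Ly = f$ can be solved by forward substitution in $\mathcal{O}(n)$ operations. A minimal sketch (the helper function is purely illustrative), reusing the factor l computed above:
import numpy as np

def bidiagonal_forward_substitution(l_sub, f):
    # Solve L y = f where L has unit diagonal and subdiagonal l_sub
    y = np.zeros_like(f)
    y[0] = f[0]
    for i in range(1, len(f)):
        y[i] = f[i] - l_sub[i - 1] * y[i - 1]
    return y

l_sub = np.diag(l, -1)     # subdiagonal of the L factor computed above
f = np.ones(l.shape[0])
print(np.allclose(bidiagonal_forward_substitution(l_sub, f), np.linalg.solve(l, f)))  # True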
However, in the 2D case things are much worse:
n = 20
ex = np.ones(n);
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr');
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csc_matrix(A)
T = scipy.sparse.linalg.spilu(A)  # (incomplete) LU factorization computed by SuperLU
plt.spy(T.L, marker='.', color='k', markersize=8)
For the two-dimensional case, the number of non-zeros in the $L$ factor grows as $\mathcal{O}(N^{3/2})$, where $N$ is the total number of unknowns.
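We can check this scaling empirically; a rough sketch, using splu with the natural ordering (i.e. no fill-reducing permutation) and counting the non-zeros of the $L$ factor for a few grid sizes:
import numpy as np
import scipy as sp
import scipy.sparse
import scipy.sparse.linalg

# Count the non-zeros of the L factor for several grid sizes
# (natural ordering, i.e. no fill-reducing permutation)
for n in [10, 20, 40]:
    ex = np.ones(n)
    lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
    e = sp.sparse.eye(n)
    A = sp.sparse.csc_matrix(sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1))
    lu = sp.sparse.linalg.splu(A, permc_spec='NATURAL')
    N = n * n
    print(n, lu.L.nnz, lu.L.nnz / N ** 1.5)  # the last column should stay roughly constant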
The number of non-zeros in the LU decomposition has a deep connection to graph theory (recall: there is an edge between vertices $i$ and $j$ if $a_{ij} \ne 0$).
The networkx package can be used to visualize graphs, given only the adjacency matrix.
It may even recover the graph structure to some extent.
import networkx as nx
n = 10
ex = np.ones(n);
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr');
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = csc_matrix(A)
G = nx.Graph(A)
nx.draw(G, pos=nx.spring_layout(G), node_size=10)
Reordering the rows and the columns of a sparse matrix in order to reduce the number of non-zeros in the $L$ and $U$ factors is called fill-in minimization, and it is based on graph theory.
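For example, we can compare the fill-in produced by the natural ordering with the fill-reducing COLAMD ordering available through splu (a quick sketch):
import numpy as np
import scipy as sp
import scipy.sparse
import scipy.sparse.linalg

n = 40
ex = np.ones(n)
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')
e = sp.sparse.eye(n)
A = sp.sparse.csc_matrix(sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1))
for ordering in ['NATURAL', 'COLAMD']:
    lu = sp.sparse.linalg.splu(A, permc_spec=ordering)
    # Total number of non-zeros in the triangular factors for this ordering
    print(ordering, lu.L.nnz + lu.U.nnz)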