Notebook

Matrix imputation¶

$$ \newcommand{\eg}{{\it e.g.}} \newcommand{\ie}{{\it i.e.}} \newcommand{\argmin}{\operatornamewithlimits{argmin}} \newcommand{\mc}{\mathcal} \newcommand{\mb}{\mathbb} \newcommand{\mf}{\mathbf} \newcommand{\minimize}{{\text{minimize}}} \newcommand{\Vec}{{\text{vec}}} \newcommand{\argmin}{{\text{argmin}}} \newcommand{\diag}{{\text{diag}}} \newcommand{\cond}{{\text{cond}}} \newcommand{\rank}{{\text{rank }}} \newcommand{\range}{{\mathcal{R}}} \newcommand{\null}{{\mathcal{N}}} \newcommand{\tr}{{\text{trace}}} \newcommand{\dom}{{\text{dom}}} \newcommand{\dist}{{\text{dist}}} \newcommand{\E}{\mathbf{E}} \newcommand{\var}{\mathbf{var}} \newcommand{\R}{\mathbf{R}} \newcommand{\B}{\mathbf{B}} \newcommand{\SM}{\mathbf{S}} \newcommand{\ball}{\mathcal{B}} \newcommand{\bmat}[1]{\begin{bmatrix}#1\end{bmatrix}} $$

ASE7030: Convex Optimization, Inha University.

Jong-Han Kim (jonghank@inha.ac.kr)

This problem concerns arrays of real numbers on an $m \times n$ grid. Such as array can represent an image, or a sampled description of a function defined on a rectangle. We can describe such an array by a matrix $X \in \R^{m \times n}$, where $X_{ij}$ gives the real number at location $i,j$, for $i=1,\ldots, m$ and $j=1,\ldots, n$. We will think of the index $i$ as associated with the $y$ axis, and the index $j$ as associated with the $x$ axis.

It will also be convenient to describe such an array by a vector $x=\Vec (X) \in \R^{mn}$. Here $\Vec$ is the function that stacks the columns of a matrix on top of each other:

$$ \Vec (X) = \bmat{ x_1 \\ \vdots \\ x_n } $$

where $X = \bmat{x_1 & \cdots & x_n}$. To go back to the array representation, from the vector, we have $X=\Vec^{-1} (x)$. (This looks complicated, but isn't; $\Vec^{-1}$ just arranges the elements in a vector into an array.)

We will need two linear functions that operate on $m \times n$ arrays. These are simple approximations of partial differentiation with respect to the $x$ and $y$ axes, respectively. The first function takes as argument an $m \times n$ array $X$ and returns an $m \times (n-1)$ array $V$ of forward (rightward) differences:

$$ V_{ij} = X_{i,j+1} - X_{ij}, \quad i=1, \ldots, m, \quad j = 1, \ldots, n-1. $$

We can represent this linear mapping as multiplication by a matrix $D_x \in \R^{m(n-1) \times mn}$, which satisfies

$$ \Vec (V) = D_x \Vec(X). $$

(This looks scarier than it is---each row of the matrix $D_x$ has exactly one $+1$ and one $-1$ entry in it.)

The other linear function, which is a simple approximation of partial differentiation with respect to the $y$ axis, maps an $m \times n$ array $X$ into an $(m-1) \times n$ array $W$, is defined as

$$ W_{ij} = X_{i+1,j} - X_{ij}, \quad i=1, \ldots, m-1, \quad j = 1, \ldots, n. $$

We define the matrix $D_y \in \R^{(m-1)n\times mn}$, which satisfies $\Vec (W) = D_y \Vec(X)$.

We define the roughness of an array $X$ as

$$ R = \|D_x \Vec(X)\|_2^2 + \|D_y \Vec(X)\|_2^2. $$

The roughness measure $R$ is the sum of the squares of the differences of each element in the array and its neighbors. Small $R$ corresponds to smooth, or smoothly varying, $X$. The roughness measure $R$ is zero precisely for constant arrays, i.e., when $X_{ij}$ are all equal.

Now we get to the problem, which is to interpolate some unknown values in an array in the smoothest possible way, given the known values in the array. To define this precisely, we partition the set of indices $\{1, \ldots, mn\}$ into two sets: $I_\mathrm{known}$ and $I_\mathrm{unknown}$.

We let $p \geq 1$ denote the number of known values (i.e., the number of elements in $I_\mathrm{known}$), and $q=mn-p$ the number of unknown values (the number of elements in $I_\mathrm{unknown}$).

We are given the values $x_i$ for $i \in I_\mathrm{known}$; the goal is to guess (or estimate or assign) values for $x_i$ for $i \in I_\mathrm{unknown}$.

We'll choose the values for $x_i$, with $i \in I_\mathrm{unknown}$, so that the resulting $X$ is as smooth as possible, i.e., so it minimizes $R$. Thus, the goal is to fill in or interpolate missing data in a 2D array (an image, say), so the reconstructed array is as smooth as possible.

We give the $p$ known values in a vector $x_\mathrm{k} \in \R^{p}$, and the $q=mn-p$ unknown values in a vector $x_\mathrm{u} \in \R^{mn-p}$.

The complete array is obtained by putting the entries of $x_\mathrm{k}$ and $x_\mathrm{u}$ into the correct positions of the array.

We describe these operations using two matrices $Z_\mathrm{k} \in \R^{mn \times p}$ and $Z_\mathrm{u} \in \R^{mn \times (mn-p)}$, that satisfy

$$ \Vec(X) = Z_\mathrm{k}x_\mathrm{k} + Z_\mathrm{u}x_\mathrm{u}. $$

(This looks complicated, but isn't: Each row of these matrices is a unit vector, so multiplication with either matrix just stuffs the entries of the $w$ vectors into particular locations in $\Vec(X)$. In fact, the matrix $\bmat{Z_\mathrm{k} & Z_\mathrm{u}}$ is an $mn \times mn$ permutation matrix.)

In summary, you are given the problem data $x_\mathrm{k}$ (which gives the known array values), $Z_\mathrm{k}$ (which gives the locations of the known values), and $Z_\mathrm{u}$ (which gives the locations of the unknown array values, in some specific order).

Your job is to find $x_\mathrm{u}$ that minimizes $R$.

In [1]:

import numpy as np
import scipy.sparse as ssp
import matplotlib.pyplot as plt

u = np.linspace(-2, 2, 100)
v = np.linspace(-2, 2, 100)

U, V = np.meshgrid(u, v)

X_original = np.sin(-U+V-1)**2*np.cos(U+V+1) + 1

xmin = np.min(X_original)
xmax = np.max(X_original)

m,n = X_original.shape

B = np.random.rand(m, n) > 0.50

X_obscured = B*X_original

plt.figure(figsize=(12,6))
plt.subplot(121)
plt.imshow(X_original, cmap='gray')
plt.title('Original image')
plt.axis('off')
plt.subplot(122)
plt.imshow(X_obscured, cmap='gray')
plt.title('Obscured image')
plt.axis('off')
plt.show()

In [2]:

mn = m*n
p = np.sum(B)
q = mn - p

B_k = B.flatten('F')
B_u = 1-B.flatten('F')

Z_k = ssp.coo_matrix( ( np.ones(p), ( np.where(B_k>0)[0], range(p) ) ),
    shape=(mn, p), dtype=np.int8)
Z_u = ssp.coo_matrix( ( np.ones(q), ( np.where(B_u>0)[0], range(q) ) ),
    shape=(mn, q), dtype=np.int8)
x_k = Z_k.T@X_original.flatten('F')

Dx = ssp.hstack( [ np.zeros( (m*(n-1),m) ), ssp.eye( m*(n-1) ) ] ) \
   - ssp.hstack( [ ssp.eye( m*(n-1) ), np.zeros( (m*(n-1),m) ) ] )
Dy = ssp.kron( ssp.eye(n), np.diff(np.eye(m), axis=0) )

plt.figure(figsize=(12,6))
plt.subplot(121)
plt.spy(Z_k, markersize=0.2)
plt.title(r'$Z_{k}$')
plt.xticks(np.arange(0,p,500))
plt.subplot(122)
plt.spy(Z_u, markersize=0.2)
plt.title(r'$Z_{u}$')
plt.xticks(np.arange(0,q,500))
plt.show()

plt.figure(figsize=(12,6))
plt.subplot(121)
plt.spy(Dx, markersize=0.2)
plt.title(r'$D_x$')
plt.subplot(122)
plt.spy(Dy, markersize=0.2)
plt.title(r'$D_y$')
plt.show()

We can express our roughness measure directly in terms of the vector of known values $x_\mathrm{k}$ and unknown values $x_\mathrm{u}$ as

\begin{eqnarray*} R &=& \| D_x (Z_\mathrm{k} x_\mathrm{k} + Z_\mathrm{u} x_\mathrm{u}) \|^2 \\ && + \| D_y (Z_\mathrm{k} x_\mathrm{k} + Z_\mathrm{u} x_\mathrm{u}) \|^2\\ &=& \left\| \bmat{D_x \\ D_y } Z_\mathrm{k} x_\mathrm{k} +\bmat{D_x \\ D_y } Z_\mathrm{u} x_\mathrm{u}\right\|^2. \end{eqnarray*}

Defining

$$ A = \bmat{D_x \\ D_y} Z_\mathrm{u}, \quad b = -\bmat{D_x \\ D_y} Z_\mathrm{k} x_\mathrm{k}, $$

we can express the problem in the familiar form

$$ \begin{array}{ll} \mbox{minimize} & \| A x_\mathrm{u} - b \|^2 .\\ \end{array} $$

Provided $A$ is skinny and full rank, the solution is

\begin{eqnarray*} x_\mathrm{unknown} &=& A^\dagger b\\ &=& (A^TA)^{-1}A^Tb\\ &=& - \left( Z_\mathrm{u}^T (D_x^TD_x+D_y^TD_y) Z_\mathrm{u} \right)^{-1} \left( Z_\mathrm{u}^T (D_x^TD_x+D_y^TD_y) Z_\mathrm{k} \right) x_\mathrm{k}. \end{eqnarray*}

When is $A \in \R^{(2mn-m-n)\times (mn-p)}$ skinny and full rank? It's always skinny, since $2mn-m-n\geq mn-p$. If $A$ were not full rank, then there would exist some nonzero $w$ with $Aw=0$. This means that $Z_\mathrm{u} w$ is in the nullspace of both $D_x$ and $D_y$, which means that $Z_\mathrm{u} w$ is a constant (i.e., its entries are all the same). This means that we have to have $w=0$, assuming there is at least one known array value. In other words, $A$ is always full rank and skinny.

In [3]:

import scipy.sparse.linalg as sla

DD = ssp.vstack( [ Dx, Dy ] )

A_MATRIX = DD@Z_u
B_MATRIX = DD@Z_k@x_k

x_u = sla.lsqr(A_MATRIX, -B_MATRIX)[0]
X_recon_L2 = (Z_k@x_k + Z_u@x_u).reshape(m, n, order='F')

plt.figure(figsize=(12,12))
plt.subplot(221)
plt.imshow(X_original, cmap='gray')
plt.title('Original image')
plt.axis('off')
plt.subplot(222)
plt.imshow(X_obscured, cmap='gray')
plt.title('Obscured image')
plt.axis('off')
plt.subplot(223)
plt.imshow(X_recon_L2, cmap='gray')
plt.title('Reconstructed image')
plt.axis('off')
plt.subplot(224)
plt.imshow(np.abs(X_recon_L2-X_original), cmap='gray', vmin=xmin, vmax=xmax)
plt.title('Difference image')
plt.axis('off')
plt.show()

Now let us do the same with a little bit different image.

In [4]:

X_original = np.sin(-U+V-1)**2*np.cos(U+V+1) + 1

X_original[10:30,10:30] = 0*np.ones((20,20))
X_original[30:50,30:50] = 0*np.ones((20,20))
X_original[10:30,50:70] = 2*np.ones((20,20))
X_original[30:50,70:90] = 2*np.ones((20,20))

xmin = np.min(X_original)
xmax = np.max(X_original)

m,n = X_original.shape

B = np.random.rand(m, n) > 0.50

X_obscured = B*X_original

plt.figure(figsize=(12,6))
plt.subplot(121)
plt.imshow(X_original, cmap='gray')
plt.title('Original image')
plt.axis('off')
plt.subplot(122)
plt.imshow(X_obscured, cmap='gray')
plt.title('Obscured image')
plt.axis('off')
plt.show()

In [5]:

p = np.sum(B)
q = mn - p

B_k = B.flatten('F')
B_u = 1-B.flatten('F')

Z_k = ssp.coo_matrix( ( np.ones(p), ( np.where(B_k>0)[0], range(p) ) ),
    shape=(mn, p), dtype=np.int8)
Z_u = ssp.coo_matrix( ( np.ones(q), ( np.where(B_u>0)[0], range(q) ) ),
    shape=(mn, q), dtype=np.int8)
x_k = Z_k.T@X_original.flatten('F')

Dx = ssp.hstack( [ np.zeros( (m*(n-1),m) ), ssp.eye( m*(n-1) ) ] ) \
   - ssp.hstack( [ ssp.eye( m*(n-1) ), np.zeros( (m*(n-1),m) ) ] )
Dy = ssp.kron( ssp.eye(n), np.diff(np.eye(m), axis=0) )

Solving the same problem gives the following image.

In [6]:

DD = ssp.vstack( [ Dx, Dy ] )

A_MATRIX = DD@Z_u
B_MATRIX = DD@Z_k@x_k

x_u = sla.lsqr(A_MATRIX, -B_MATRIX)[0]
X_recon_L2 = (Z_k@x_k + Z_u@x_u).reshape(m, n, order='F')

plt.figure(figsize=(12,12))
plt.subplot(221)
plt.imshow(X_original, cmap='gray')
plt.title('Original image')
plt.axis('off')
plt.subplot(222)
plt.imshow(X_obscured, cmap='gray')
plt.title('Obscured image')
plt.axis('off')
plt.subplot(223)
plt.imshow(X_recon_L2, cmap='gray')
plt.title('Reconstructed image')
plt.axis('off')
plt.subplot(224)
plt.imshow(np.abs(X_recon_L2-X_original), cmap='gray', vmin=xmin, vmax=xmax)
plt.title('Difference image')
plt.axis('off')
plt.show()

Now we define the roughness of an array $X$ in two differnt ways,

$$ R_{\ell_2} = \|D_x \Vec(X)\|_2^2 + \|D_y \Vec(X)\|_2^2, $$

and

$$ R_{\ell_1} = \|D_x \Vec(X)\|_1 + \|D_y \Vec(X)\|_1. $$

The optimal image reconstructions minimizing $R_{\ell_2}$ and $R_{\ell_1}$ respectively are given below.

In [7]:

import cvxpy as cp

v = cp.Variable(q)
obj = cp.Minimize( cp.norm( A_MATRIX@v + B_MATRIX, 1 ) )
prob = cp.Problem(obj)
prob.solve(verbose=True)
x_u = v.value

X_recon_L1 = (Z_k@x_k + Z_u@x_u).reshape(m, n, order='F')

plt.figure(figsize=(12,12))
plt.subplot(221)
plt.imshow(X_original, cmap='gray')
plt.title('Original image')
plt.axis('off')
plt.subplot(222)
plt.imshow(X_obscured, cmap='gray')
plt.title('Obscured image')
plt.axis('off')
plt.subplot(223)
plt.imshow(X_recon_L2, cmap='gray')
plt.title('Reconstructed image (L2)')
plt.axis('off')
plt.subplot(224)
plt.imshow(X_recon_L1, cmap='gray')
plt.title('Reconstructed (L1)')
plt.axis('off')
plt.show()

===============================================================================
                                     CVXPY                                     
                                     v1.6.4                                    
===============================================================================
(CVXPY) Apr 17 05:01:14 AM: Your problem has 5046 variables, 0 constraints, and 0 parameters.
(CVXPY) Apr 17 05:01:14 AM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Apr 17 05:01:14 AM: (If you need to solve this problem multiple times, but with different data, consider using parameters.)
(CVXPY) Apr 17 05:01:14 AM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
(CVXPY) Apr 17 05:01:14 AM: Your problem is compiled with the CPP canonicalization backend.
-------------------------------------------------------------------------------
                                  Compilation                                  
-------------------------------------------------------------------------------
(CVXPY) Apr 17 05:01:14 AM: Compiling problem (target solver=CLARABEL).
(CVXPY) Apr 17 05:01:14 AM: Reduction chain: Dcp2Cone -> CvxAttr2Constr -> ConeMatrixStuffing -> CLARABEL
(CVXPY) Apr 17 05:01:14 AM: Applying reduction Dcp2Cone
(CVXPY) Apr 17 05:01:14 AM: Applying reduction CvxAttr2Constr
(CVXPY) Apr 17 05:01:14 AM: Applying reduction ConeMatrixStuffing
(CVXPY) Apr 17 05:01:14 AM: Applying reduction CLARABEL
(CVXPY) Apr 17 05:01:14 AM: Finished problem compilation (took 2.491e-01 seconds).
-------------------------------------------------------------------------------
                                Numerical solver                               
-------------------------------------------------------------------------------
(CVXPY) Apr 17 05:01:14 AM: Invoking solver CLARABEL  to obtain a solution.
-------------------------------------------------------------
           Clarabel.rs v0.10.0  -  Clever Acronym                

                   (c) Paul Goulart                          
                University of Oxford, 2022                   
-------------------------------------------------------------

problem:
  variables     = 24846
  constraints   = 39600
  nnz(P)        = 0
  nnz(A)        = 79568
  cones (total) = 1
    : Nonnegative = 1,  numel = 39600

settings:
  linear algebra: direct / qdldl, precision: 64 bit
  max iter = 200, time limit = Inf,  max step = 0.990
  tol_feas = 1.0e-8, tol_gap_abs = 1.0e-8, tol_gap_rel = 1.0e-8,
  static reg : on, ϵ1 = 1.0e-8, ϵ2 = 4.9e-32
  dynamic reg: on, ϵ = 1.0e-13, δ = 2.0e-7
  iter refine: on, reltol = 1.0e-13, abstol = 1.0e-12,
               max iter = 10, stop ratio = 5.0
  equilibrate: on, min_scale = 1.0e-4, max_scale = 1.0e4
               max iter = 10

iter    pcost        dcost       gap       pres      dres      k/t        μ       step      
---------------------------------------------------------------------------------------------
  0  +7.4470e-17  +2.4647e-14  2.46e-14  8.76e-01  5.00e-01  1.00e+00  2.98e+00   ------   
  1  +2.4303e+02  +2.4322e+02  8.07e-04  5.97e-01  1.53e-01  4.19e-01  1.08e+00  7.80e-01  
  2  +3.6576e+02  +3.6584e+02  2.27e-04  3.17e-01  5.68e-02  1.61e-01  4.28e-01  6.46e-01  
  3  +4.5227e+02  +4.5230e+02  5.38e-05  1.09e-01  1.75e-02  4.81e-02  1.37e-01  7.59e-01  
  4  +5.9872e+02  +5.9873e+02  9.00e-06  2.86e-02  4.35e-03  1.16e-02  3.62e-02  8.33e-01  
  5  +6.7267e+02  +6.7267e+02  2.30e-06  9.84e-03  1.42e-03  3.66e-03  1.25e-02  7.99e-01  
  6  +7.0037e+02  +7.0037e+02  5.88e-07  2.75e-03  3.87e-04  1.00e-03  3.48e-03  7.64e-01  
  7  +7.0827e+02  +7.0827e+02  1.70e-07  8.85e-04  1.23e-04  3.11e-04  1.12e-03  7.90e-01  
  8  +7.1107e+02  +7.1108e+02  4.67e-08  2.57e-04  3.54e-05  8.85e-05  3.25e-04  7.81e-01  
  9  +7.1193e+02  +7.1193e+02  1.26e-08  7.17e-05  9.88e-06  2.44e-05  9.09e-05  7.86e-01  
 10  +7.1220e+02  +7.1220e+02  2.53e-09  1.47e-05  2.02e-06  4.96e-06  1.86e-05  8.22e-01  
 11  +7.1227e+02  +7.1227e+02  3.01e-10  1.86e-06  2.55e-07  6.14e-07  2.35e-06  9.42e-01  
 12  +7.1227e+02  +7.1227e+02  2.69e-11  1.68e-07  2.31e-08  5.54e-08  2.13e-07  9.31e-01  
 13  +7.1227e+02  +7.1227e+02  5.18e-13  3.25e-09  4.47e-10  1.07e-09  4.12e-09  9.81e-01  
---------------------------------------------------------------------------------------------
Terminated with status = Solved
solve time = 1.296264925s
-------------------------------------------------------------------------------
                                    Summary                                    
-------------------------------------------------------------------------------
(CVXPY) Apr 17 05:01:16 AM: Problem status: optimal
(CVXPY) Apr 17 05:01:16 AM: Optimal value: 7.123e+02
(CVXPY) Apr 17 05:01:16 AM: Compilation took 2.491e-01 seconds
(CVXPY) Apr 17 05:01:16 AM: Solver (including time spent in interface) took 1.463e+00 seconds

References:

Course notes on EE263: Introduction to linear dynamical systems, http://ee263.stanford.edu

In [7]: