Nonconvex least squares

Suppose a signal $x\in\{-1,1\}^{n}$ is sent over a noisy channel, and received as $b=Ax + v$ where $v\sim\mathcal{N}(0,\sigma^{2}I)$ is Gaussian noise. Based on the received signal $b\in\mathbb{R}^{m}$, the maximum likelihood estimate $x_{\text{ML}}^{*}$ of the original signal $x$ is obtained by solving

$$x_{\text{ML}}^{*} = \underset{x\in\mathbb{R}^{n}}{\arg\min}\quad \|Ax - b\|_{2}^{2}\quad\text{subject to} \quad x_{j}^{2} = 1, \quad\forall j =1,..., n,$$

where $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^{m}$ are known. Assume that $\text{rank}(A)=n$, that is, $A^{\top}A$ is nonsingular.

(a) [2 + 15 = 17 points] Primal and its convex relaxation

(i) Why is the primal problem above nonconvex?

(ii) A convex relaxation of the primal is obtained by simply deleting the nonconvex constraint, and is therefore a convex optimization problem. Numerically solve the convex relaxation of the primal in MATLAB/Python/Julia using cvx/cvxpy/Convex.jl for the $A,b$ data given in the starter code $\texttt{AM229F20HW7problem.m}$ (inside CANVAS). Report the numerically computed optimal value, and the percentage error between true $x$ and $\text{sign}(x_{a})$ where $x_{a}$ is the minimizer of the convex relaxation. Submit your code.

Solution for part (a):

(i) The constraint set $\{x\in\mathbb{R}^{n}\mid x_{j}^{2}=1,\;\forall\:j=1,...,n\}=\{-1,1\}^{n}$ is the set of vertices of the hypercube $[-1,1]^{n}$, and is therefore nonconvex: the midpoint of any two distinct vertices lies outside the set. Hence the given optimization problem asks us to minimize a convex function over a nonconvex set, and is a nonconvex problem.
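This failure of convexity is easy to exhibit concretely; a minimal numerical sketch (the dimension $n=3$ is an arbitrary illustrative choice):

```python
import numpy as np

# Two feasible points of the constraint set {-1, 1}^n (n = 3 here).
u = np.array([1.0, -1.0, 1.0])
w = np.array([-1.0, -1.0, 1.0])

# Both satisfy x_j^2 = 1 for all j ...
assert np.allclose(u**2, 1.0) and np.allclose(w**2, 1.0)

# ... but their midpoint (a convex combination) does not:
mid = 0.5 * (u + w)
print(np.allclose(mid**2, 1.0))  # False: the constraint set is not convex
```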

(ii) The convex relaxation, obtained by deleting the constraints $x_{j}^{2}=1$, is an ordinary least-squares problem. The posted code $\texttt{AM229F20HW7solution.m}$ in CANVAS Files section, inside folder HW problems and solutions, gives

$$\|A\text{sign}(x_{a})-b\|_{2}^{2} = 63.791726053201685.$$

The error is $0\%$, i.e., applying the sign function to the minimizer of the convex relaxation recovers the true signal $x$.
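Since the course data from $\texttt{AM229F20HW7problem.m}$ is not reproduced here, the relaxation-plus-rounding pipeline can be sketched on synthetic stand-in data; the dimensions, seed, and noise level below are illustrative assumptions, not the course data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma = 100, 10, 0.1          # synthetic stand-ins for the course data

A = rng.standard_normal((m, n))
x_true = rng.choice([-1.0, 1.0], size=n)          # signal in {-1, 1}^n
b = A @ x_true + sigma * rng.standard_normal(m)   # noisy observation

# Convex relaxation = ordinary least squares (constraint x_j^2 = 1 dropped).
x_a, *_ = np.linalg.lstsq(A, b, rcond=None)

# Round to the feasible set and compare with the true signal.
x_hat = np.sign(x_a)
err = 100.0 * np.mean(x_hat != x_true)
print(f"optimal value of relaxation: {np.sum((A @ x_a - b)**2):.4f}")
print(f"percentage error after sign rounding: {err:.1f}%")
```

At this low noise level, sign rounding of the least-squares minimizer recovers the true signal, mirroring the $0\%$ error reported above.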

(b) [30 + 15 = 45 points] Dual

(i) Derive the Lagrange dual problem of the original primal problem as an SDP.

Hint: use epigraph form followed by Schur complement lemma for possibly singular matrices (see textbook Appendix A.5.5, printed page number 651).

(ii) Use the same code (and problem data) as in part (a)(ii) to numerically solve the dual problem of part (b)(i). Report the numerically computed optimal value of the dual problem, and submit your code (same file as in part (a)).

Solution for part (b):

(i) The Lagrangian \begin{align*} L(x,\nu) &= \|Ax-b\|_{2}^{2} + \sum_{j=1}^{n}\nu_{j}\left(x_{j}^{2}-1\right)\\ &= (Ax-b)^{\top}(Ax-b) + x^{\top}\operatorname{diag}(\nu)x - \mathbf{1}^{\top}\nu\\ &= x^{\top}\left(A^{\top}A + \operatorname{diag}(\nu)\right)x - 2b^{\top}Ax - \mathbf{1}^{\top}\nu + b^{\top}b \end{align*}

is unbounded below in $x$ if either $A^{\top}A + \operatorname{diag}(\nu) \not\succeq 0$ or if $-A^{\top}b \not\in\text{range}(A^{\top}A + \operatorname{diag}(\nu))$.

When both $A^{\top}A + \operatorname{diag}(\nu) \succeq 0$ and $-A^{\top}b \in\text{range}(A^{\top}A + \operatorname{diag}(\nu))$, the infimum of $L$ over $x$ is attained at a stationary point, so we solve $\nabla_{x}L = 0$: $$\left(A^{\top}A + \operatorname{diag}(\nu)\right)x^{\text{opt}} = A^{\top}b\quad\Rightarrow\quad x^{\text{opt}} = \left(A^{\top}A + \operatorname{diag}(\nu)\right)^{\dagger}A^{\top}b.$$
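This stationarity condition can be sanity-checked numerically; a small sketch with made-up data (the choice $\nu = 0.5\cdot\mathbf{1}$ is just one multiplier for which $A^{\top}A + \operatorname{diag}(\nu)\succ 0$):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
nu = np.full(n, 0.5)                # any nu keeping A^T A + diag(nu) PSD works

P = A.T @ A + np.diag(nu)           # A^T A + diag(nu), here positive definite
x_opt = np.linalg.pinv(P) @ (A.T @ b)

# x_opt solves the stationarity equation P x = A^T b ...
assert np.allclose(P @ x_opt, A.T @ b)

# ... so the gradient of L(x, nu) in x vanishes there; since P is PSD,
# L is convex in x and x_opt is a global minimizer.
grad = 2 * (P @ x_opt - A.T @ b)
print(np.linalg.norm(grad))
```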

Thus, the infimum value of the Lagrangian (a.k.a. the Lagrange dual function) is

\begin{align*} &g(\nu) = \underset{x}{\inf}\;L(x,\nu) = L\left(x= x^{\text{opt}},\nu\right)\\ &= -b^{\top}A\left(A^{\top}A + \operatorname{diag}(\nu)\right)^{\dagger}A^{\top}b - \mathbf{1}^{\top}\nu + b^{\top}b, \,\text{if}\, A^{\top}A + \operatorname{diag}(\nu) \succeq 0, \:\text{and}\: -A^{\top}b \in\text{range}(A^{\top}A + \operatorname{diag}(\nu)), \end{align*}

and $g(\nu)= -\infty$ otherwise. Hence the dual problem, written as a minimization (maximizing $g$ is equivalent to minimizing $-g$), is:

\begin{align*} &\underset{\nu\in\mathbb{R}^{n}}{\text{minimize}}\quad\mathbf{1}^{\top}\nu +b^{\top}A\left(A^{\top}A + \operatorname{diag}(\nu)\right)^{\dagger}A^{\top}b - b^{\top}b\\ &\text{subject to}\quad A^{\top}A + \operatorname{diag}(\nu) \succeq 0,\\ &\qquad\qquad -A^{\top}b \in\text{range}(A^{\top}A + \operatorname{diag}(\nu)). \end{align*}

Using epigraph form for the nonlinear (in $\nu$) term in the objective, we rewrite the dual problem as

\begin{align*} &\underset{\nu\in\mathbb{R}^{n}}{\text{minimize}}\quad\mathbf{1}^{\top}\nu + t - b^{\top}b\\ &\text{subject to}\quad b^{\top}A\left(A^{\top}A + \operatorname{diag}(\nu)\right)^{\dagger}A^{\top}b \leq t,\\ &\qquad\qquad\; A^{\top}A + \operatorname{diag}(\nu) \succeq 0,\\ &\qquad\qquad -A^{\top}b \in\text{range}(A^{\top}A + \operatorname{diag}(\nu)). \end{align*}

Following the hint, we notice that by Schur complement lemma (see Appendix A.5.5 last para in text, printed page number 651), the above three constraints are equivalent to the LMI $$\begin{pmatrix} A^{\top}A + \operatorname{diag}(\nu) & -A^{\top}b\\ -b^{\top}A & t \end{pmatrix} \succeq 0.$$ Therefore, the dual problem is the SDP \begin{align*} &\underset{\nu\in\mathbb{R}^{n}}{\text{minimize}}\quad\mathbf{1}^{\top}\nu + t - b^{\top}b\\ &\text{subject to}\quad \begin{pmatrix} A^{\top}A + \operatorname{diag}(\nu) & -A^{\top}b\\ -b^{\top}A & t \end{pmatrix} \succeq 0. \end{align*}
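The Schur-complement equivalence underlying this LMI can be spot-checked numerically: when the upper-left block is positive definite, say $P\succ 0$, the block matrix $\begin{pmatrix} P & q\\ q^{\top} & t\end{pmatrix}$ is PSD exactly when $t \geq q^{\top}P^{-1}q$. A small sketch with arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
P = B @ B.T + np.eye(n)             # P > 0, playing the role of A^T A + diag(nu)
q = rng.standard_normal(n)          # playing the role of -A^T b

t_star = q @ np.linalg.solve(P, q)  # Schur-complement boundary: t = q^T P^{-1} q

def min_eig(t):
    """Smallest eigenvalue of the (n+1) x (n+1) block matrix [[P, q], [q^T, t]]."""
    M = np.block([[P, q[:, None]], [q[None, :], np.array([[t]])]])
    return np.linalg.eigvalsh(M).min()

print(min_eig(t_star + 0.1) >= 0)   # True: t above the boundary, M is PSD
print(min_eig(t_star - 0.1) >= 0)   # False: t below the boundary, M is not PSD
```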

(ii) The posted code $\texttt{AM229F20HW7solution.m}$ gives the optimal value of the dual: $d^{*} = 63.791723605885636$.

(c) [20 + 15 + 3 = 38 points] Bidual

(i) Starting from part (b)(i), prove that the bidual (dual of the dual) problem can be written as:

$$\text{minimize}\quad \operatorname{trace}(A^{\top}AZ) -2b^{\top}Az + b^{\top}b\\ \text{subject to} \quad \operatorname{diag}(Z) = \mathbf{1}, \quad \begin{pmatrix} Z & z\\ z^{\top} & 1 \end{pmatrix} \succeq \mathbf{0}.$$

(ii) Use the same code (and problem data) as in parts (a)(ii) and (b)(ii) to numerically solve the bidual problem of part (c)(i). Report the numerically computed optimal value, and the percentage error between the true $x$ and $\text{sign}(z)$, where $z$ is the minimizer in part (c)(i). Submit your code (same file as in part (a)).

(iii) Explain what you observe by comparing the numerically computed optimal solutions from (a)(ii), (b)(ii) and (c)(ii).

Solution for part (c):

(i) Corresponding to the LMI derived in part (b)(i), we introduce a symmetric Lagrange multiplier matrix $$\begin{pmatrix} Z & z\\ z^{\top} & \lambda \end{pmatrix} \succeq 0$$ (the multiplier associated with an LMI constraint must itself be positive semidefinite). Then the Lagrangian of the optimization problem derived in part (b)(i) is

\begin{align*} L(\nu,t,Z,z,\lambda) &= 1^{\top}\nu + t - b^{\top}b - \text{trace}\left(\begin{pmatrix} Z & z\\ z^{\top} & \lambda \end{pmatrix}\begin{pmatrix} A^{\top}A + \text{diag}(\nu) & -A^{\top}b\\ -b^{\top}A & t \end{pmatrix}\right)\\ &= 1^{\top}\nu + t - b^{\top}b - \text{trace}\left(Z\left(A^{\top}A + \text{diag}(\nu)\right) - zb^{\top}A\right) - \left(-z^{\top}A^{\top}b + \lambda t\right)\\ &= 1^{\top}\nu + t - b^{\top}b - \text{trace}\left(A^{\top}AZ\right) -(\text{diag}(Z))^{\top}\nu + b^{\top}Az + z^{\top}A^{\top}b - \lambda t\\ &= \left(\mathbf{1}-\text{diag}(Z)\right)^{\top}\nu + t(1-\lambda) - \text{trace}\left(A^{\top}AZ\right) + 2 b^{\top}Az - b^{\top}b. \end{align*}

The above Lagrangian is unbounded below unless $\text{diag}(Z)=\mathbf{1}$ and $\lambda = 1$. Thus, the dual function is $$g(Z,z,\lambda) = -\text{trace}\left(A^{\top}AZ\right) + 2b^{\top}Az - b^{\top}b, \quad \text{if}\;\text{diag}(Z)=\mathbf{1},\:\lambda = 1,$$ and $g(Z,z,\lambda) = -\infty$ otherwise. Hence the bidual problem is $$\text{maximize}\quad -\operatorname{trace}(A^{\top}AZ) +2b^{\top}Az - b^{\top}b\\ \text{subject to} \quad \operatorname{diag}(Z) = \mathbf{1}, \quad \begin{pmatrix} Z & z\\ z^{\top} & 1 \end{pmatrix} \succeq \mathbf{0}.$$ Negating the objective above, we arrive at the desired form.

(ii) The posted code $\texttt{AM229F20HW7solution.m}$ gives the optimal value of the bidual: $63.791752427470477$. The error is $0\%$, i.e., $\text{sign}(z)$ recovers the true signal $x$, where $z$ is the bidual minimizer.

(iii) The numerically computed optimal values from (a)(ii), (b)(ii) and (c)(ii) agree up to floating-point/solver precision. For the randomly generated problem data given, the convex relaxation recovers the true signal, but this is not guaranteed for arbitrary problem data: numerical experiments with different random seeds show that recovery occurs often (i.e., with high probability) but not always. The dual and the bidual are both convex optimization problems, and the numerics indicate a zero duality gap between them.