versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
using Pkg
Pkg.activate("../..")
Pkg.status()
Activating environment at `~/Dropbox/class/M1399.000200/2021/M1399_000200-2021fall/Project.toml`
Status `~/Dropbox/class/M1399.000200/2021/M1399_000200-2021fall/Project.toml`
  [7d9fca2a] Arpack v0.4.0
  [6e4b80f9] BenchmarkTools v1.2.0
  [1e616198] COSMO v0.8.1
  [f65535da] Convex v0.14.16
  [a93c6f00] DataFrames v1.2.2
  [31a5f54b] Debugger v0.6.8
  [31c24e10] Distributions v0.24.18
  [e2685f51] ECOS v0.12.3
  [f6369f11] ForwardDiff v0.10.21
  [28b8d3ca] GR v0.61.0
  [c91e804a] Gadfly v1.3.3
  [bd48cda9] GraphRecipes v0.5.7
  [2e9cd046] Gurobi v0.9.14
  [f67ccb44] HDF5 v0.14.3
  [82e4d734] ImageIO v0.5.8
  [6218d12a] ImageMagick v1.2.1
  [916415d5] Images v0.24.1
  [b6b21f68] Ipopt v0.7.0
  [42fd0dbc] IterativeSolvers v0.9.1
  [4076af6c] JuMP v0.21.10
  [b51810bb] MatrixDepot v1.0.4
  [6405355b] Mosek v1.2.1
  [1ec41992] MosekTools v0.9.4
  [76087f3c] NLopt v0.6.3
  [47be7bcc] ORCA v0.5.0
  [a03496cd] PlotlyBase v0.4.3
  [f0f68f2c] PlotlyJS v0.15.0
  [91a5bcdd] Plots v1.22.6
  [438e738f] PyCall v1.92.3
  [d330b81b] PyPlot v2.10.0
  [dca85d43] QuartzImageIO v0.7.3
  [6f49c342] RCall v0.13.12
  [ce6b1742] RDatasets v0.7.5
  [c946c3f1] SCS v0.7.1
  [276daf66] SpecialFunctions v1.7.0
  [2913bbd2] StatsBase v0.33.12
  [f3b207a7] StatsPlots v0.14.28
  [b8865327] UnicodePlots v2.4.6
  [0f1e0344] WebIO v0.8.16
  [8f399da3] Libdl
  [2f01184e] SparseArrays
  [10745b16] Statistics
over $\mathbf{x} \in \mathbb{R}^d$. This says the points $(\mathbf{A}_i \mathbf{x} + \mathbf{b}_i, \mathbf{c}_i^T \mathbf{x} + d_i)$ live in the second order cone (ice cream cone, Lorentz cone, quadratic cone) \begin{eqnarray*} \mathbb{L}^{d+1} = \{(\mathbf{x}, t): \|\mathbf{x}\|_2 \le t\} \end{eqnarray*} in $\mathbb{R}^{d+1}$.
QP is a special case of SOCP. Why?
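One way to see this (a sketch of the standard epigraph argument; the factorization $\mathbf{P} = \mathbf{F}^T \mathbf{F}$, e.g. by Cholesky, exists because $\mathbf{P} \in \mathbf{S}_+^n$): the QP objective
$$
\begin{eqnarray*}
&\text{minimize}& \frac 12 \mathbf{x}^T \mathbf{P} \mathbf{x} + \mathbf{q}^T \mathbf{x} + r
\end{eqnarray*}
$$
is equivalent to the epigraph form
$$
\begin{eqnarray*}
&\text{minimize}& t \\
&\text{subject to}& \frac 12 \|\mathbf{F} \mathbf{x}\|_2^2 + \mathbf{q}^T \mathbf{x} + r \le t,
\end{eqnarray*}
$$
and the constraint reads $\|\mathbf{F}\mathbf{x}\|_2^2 \le 2 (t - \mathbf{q}^T \mathbf{x} - r) \cdot 1$, i.e., $(\mathbf{F}\mathbf{x}, t - \mathbf{q}^T\mathbf{x} - r, 1)$ lies in a rotated second order cone (defined below); the QP's affine constraints carry over unchanged.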
When $\mathbf{c}_i = \mathbf{0}$ for $i=1,\ldots,m$, SOCP is equivalent to a quadratically constrained quadratic program (QCQP)
where $\mathbf{P}_i \in \mathbf{S}_+^n$, $i=0,1,\ldots,m$. Why?
- $\mathbb{L}_r^{d+2}$ is SOCr: $\|\mathbf{x}\|_2^2 \le 2 t_1 t_2,~t_1, t_2 \ge 0 \iff \left\| \begin{bmatrix} \mathbf{x} \\ \frac{t_1 - t_2}{\sqrt 2} \end{bmatrix} \right\|_2 \le \frac{t_1 + t_2}{\sqrt 2}$.
- $\mathbb{L}^{d+1}$ is $\mathbb{L}_r^{d+2}$-representable:
$$
\|\mathbf{x}\|_2 \le t \iff \left( \mathbf{x}, \frac{t}{\sqrt 2}, \frac{t}{\sqrt 2} \right) \in \mathbb{L}_r^{d+2}, \quad \text{since } 2 \cdot \frac{t}{\sqrt 2} \cdot \frac{t}{\sqrt 2} = t^2.
$$
- In fact $\mathbb{L}_r^{d+2}$ is $\mathbb{L}^{d+2}$ rotated by 45$^\circ$ in the $t_1$-$t_2$ plane, hence the rotation
$$
\begin{bmatrix} \mathbf{x} \\ t_1 \\ t_2 \end{bmatrix}
\mapsto
\begin{bmatrix} \mathbf{x} \\ (t_1-t_2)/\sqrt 2 \\ (t_1+t_2)/\sqrt 2 \end{bmatrix}
$$
is 1-to-1 and onto between the two cones.

Gurobi allows users to input second-order cone constraints and quadratic constraints directly.

Mosek allows users to input second-order cone constraints, quadratic constraints, and rotated quadratic cone constraints directly.
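A quick numeric sanity check (not part of the original notes) that the 45$^\circ$ rotation maps the rotated cone into the standard second order cone, using the $1/\sqrt 2$ scaling consistent with the factor 2 in the definition $\|\mathbf{x}\|_2^2 \le 2 t_1 t_2$:

```julia
using LinearAlgebra, Random

Random.seed!(123)
for _ in 1:100
    x  = randn(3)
    t1 = rand() + 0.1
    t2 = norm(x)^2 / (2t1) + rand()     # constructed so that ‖x‖² ≤ 2 t1 t2
    @assert norm(x)^2 ≤ 2t1 * t2 + 1e-12
    # rotate by 45° in the t1-t2 plane
    u, v = (t1 - t2) / sqrt(2), (t1 + t2) / sqrt(2)
    # image satisfies the standard cone inequality ‖(x, u)‖₂ ≤ v
    @assert norm([x; u]) ≤ v + 1e-9
end
```

The check works because $\|\mathbf{x}\|_2^2 + (t_1-t_2)^2/2 \le 2t_1t_2 + (t_1-t_2)^2/2 = (t_1+t_2)^2/2$.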
(Epigraphs of affine functions) $\mathbf{a}^T\mathbf{x} + b \le t \iff (0, t - \mathbf{a}^T\mathbf{x} - b) \in \mathbb{L}^{2}$.
(Euclidean norms) $\|\mathbf{x}\|_2 \le t \Leftrightarrow (\mathbf{x}, t) \in \mathbb{L}^{d+1}$.
(Sum of squares) $\|\mathbf{x}\|_2^2 \le t \Leftrightarrow (\mathbf{x}, t, 1/2) \in \mathbb{L}_r^{d+2}$.
(Epigraphs of quadratic-fractional functions) Let
Then, $\text{epi}\, g = \{(\mathbf{x}, s, t): \mathbf{x}^T\mathbf{x} \le s t, s \ge 0 \}$; since $\mathbf{x}^T\mathbf{x} \le st \iff (\mathbf{x}, s/2, t) \in \mathbb{L}_r^{d+2}$, the epigraph is SOCr.
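The memberships above can be verified numerically. The snippet below (a check added here, not from the notes; `inLr` is a helper defined for illustration) tests the sum-of-squares and quadratic-fractional facts under the convention $\mathbb{L}_r = \{(\mathbf{x}, t_1, t_2): \|\mathbf{x}\|_2^2 \le 2 t_1 t_2,\ t_1, t_2 \ge 0\}$:

```julia
using LinearAlgebra, Random

# membership test for the rotated cone ‖x‖² ≤ 2 t1 t2, t1, t2 ≥ 0
inLr(x, t1, t2) = t1 ≥ 0 && t2 ≥ 0 && norm(x)^2 ≤ 2t1 * t2 + 1e-12

Random.seed!(1)
x = randn(4)
# (Sum of squares)  ‖x‖² ≤ t  ⟺  (x, t, 1/2) ∈ L_r: 2·t·(1/2) = t
t = norm(x)^2
@assert inLr(x, t, 1/2)
# (Quadratic-fractional)  xᵀx/s ≤ t, s ≥ 0  ⟺  (x, s/2, t) ∈ L_r: 2·(s/2)·t = st
s = 2.5
@assert inLr(x, s/2, dot(x, x)/s)
```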
Intersection.
Cartesian product.
Minkowski sum.
Image $\mathcal{A}C$ of a SOCr set $C$ under a linear (affine) map $\mathcal{A}$.
Inverse image $\mathcal{A}^{-1}C$ of a SOCr set $C$ under a linear (affine) map $\mathcal{A}$.
Nonnegative weighted sum.
Composition with an affine mapping.
Pointwise maximum and supremum.
Separable sum. $g(\mathbf{x}_1,\dotsc,\mathbf{x}_m) = g_1(\mathbf{x}_1) + \dotsb + g_m(\mathbf{x}_m)$ is SOCr if each summand is so.
Partial minimization.
Perspective. If $f(\mathbf{x})$ is SOCr, then its perspective $g(\mathbf{x}, t)=t f(t^{-1}\mathbf{x})$ is SOCr.
which is the sum of a squared Euclidean norm (of an affine map of the variable) and an affine function. Both are SOCr.
This fact shows that QP and QCQP are instances of SOCP.
can be represented by rotated quadratic cones. For example,
$$
\begin{eqnarray*}
\mathbb{K}_{\text{gm}}^2 &=& \{(x_1, x_2, t): \sqrt{x_1 x_2} \ge t, x_1, x_2 \ge 0\} \\
&=& \{(x_1, x_2, t): (\sqrt 2 t, x_1, x_2) \in \mathbb{L}_r^3\}.
\end{eqnarray*}
$$
Then
$$
\begin{eqnarray*}
\mathbb{K}_{\text{gm}}^4 &=& \{(x_1, x_2, x_3, x_4, t): (x_1 x_2 x_3 x_4)^{1/4} \ge t, x_1, x_2, x_3, x_4 \ge 0\} \\
&=& \{(x_1, x_2, x_3, x_4, t): (\exists u_1, u_2 \ge 0)~ \sqrt{x_1 x_2} \ge u_1, \sqrt{x_3 x_4} \ge u_2, \sqrt{u_1 u_2} \ge t, ~ x_1, x_2, x_3, x_4 \ge 0\} \\
&=& \{(x_1, x_2, x_3, x_4, t): (\exists u_1, u_2 \ge 0)~ (\sqrt 2 u_1, x_1, x_2) \in \mathbb{L}_r^3, (\sqrt 2 u_2, x_3, x_4) \in \mathbb{L}_r^3, (\sqrt 2 t, u_1, u_2) \in \mathbb{L}_r^3\}
\end{eqnarray*}
$$
and so on.
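As an illustration (a numeric check added here, not from the notes; `inLr` is a helper defined for this purpose), the tower of rotated cones above indeed certifies $\mathbb{K}_{\text{gm}}^4$ membership:

```julia
using LinearAlgebra, Random

# membership test for the rotated cone ‖x‖² ≤ 2 t1 t2, t1, t2 ≥ 0
inLr(x, t1, t2) = t1 ≥ 0 && t2 ≥ 0 && norm(x)^2 ≤ 2t1 * t2 + 1e-12

Random.seed!(2)
x = rand(4) .+ 0.1
t = sqrt(sqrt(prod(x))) - 0.05          # t slightly below (x1 x2 x3 x4)^{1/4}
# auxiliary variables from the decomposition
u1, u2 = sqrt(x[1] * x[2]), sqrt(x[3] * x[4])
@assert inLr([sqrt(2) * u1], x[1], x[2])    # 2u1² ≤ 2 x1 x2
@assert inLr([sqrt(2) * u2], x[3], x[4])    # 2u2² ≤ 2 x3 x4
@assert inLr([sqrt(2) * t], u1, u2)         # 2t²  ≤ 2 u1 u2
```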
For $n \neq 2^l$, consider, e.g.,
$$
(x_1 x_2 x_3)^{1/3} \ge t \iff (x_1 x_2 x_3 t)^{1/4} \ge t.
$$

(Harmonic mean) The hypograph of the harmonic mean function $\left( n^{-1} \sum_{i=1}^n x_i^{-1} \right)^{-1}$ is SOCr.
(Rational $\ell_p$-norm) Function $\|\mathbf{x}\|_p = (\sum_{i=1}^d |x_i|^p)^{1/p}$ for $p \ge 1$ rational is SOCr.
References:
Now our catalogue of SOCP-representable functions includes all of the terms above.
Most of these functions are implemented as built-in functions in the convex optimization modeling languages cvx (for Matlab) and Convex.jl (for Julia).
where $\beta_g$ is the subvector of regression coefficients for group $g$, and $w_g$ are fixed group weights. This is equivalent to the SOCP \begin{eqnarray*} &\text{minimize}& \frac 12 \beta^T \mathbf{X}^T \left(\mathbf{I} - \frac{\mathbf{1} \mathbf{1}^T}{n} \right) \mathbf{X} \beta + \mathbf{y}^T \left(\mathbf{I} - \frac{\mathbf{1} \mathbf{1}^T}{n} \right) \mathbf{X} \beta + \lambda \sum_{g=1}^G w_g t_g \\ &\text{subject to}& \|\beta_g\|_2 \le t_g, \quad g = 1,\ldots, G, \end{eqnarray*} in variables $\beta$ and $t_1,\ldots,t_G$.
achieves sparsity at both group and individual coefficient level and can be solved by SOCP as well.
We have seen zero-sum group lasso before.
Evidently we can solve any of the previous loss functions (quantile, $\ell_1$, composite quantile, Huber, multi-response model) plus a group or sparse group penalty by SOCP.
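As a sanity check on the group lasso reformulation (a sketch added here with a hypothetical grouping and weights, not from the notes): at the optimum each auxiliary variable $t_g$ sits at its lower bound $\|\beta_g\|_2$, so the SOCP term $\sum_g w_g t_g$ coincides with the group lasso penalty:

```julia
using LinearAlgebra, Random

Random.seed!(3)
groups = [1:3, 4:6, 7:10]           # hypothetical grouping of 10 coefficients
w = sqrt.(length.(groups))          # a common choice: w_g = √|g|
β = randn(10)
# group lasso penalty computed directly
penalty_direct = sum(w[g] * norm(β[groups[g]]) for g in eachindex(groups))
# SOCP form: minimal feasible t_g subject to ‖β_g‖₂ ≤ t_g is t_g = ‖β_g‖₂
t = [norm(β[gr]) for gr in groups]
@assert all(norm(β[groups[g]]) ≤ t[g] for g in eachindex(groups))
@assert penalty_direct ≈ sum(w .* t)
```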
by SOCP. This variant generates the same solution path as lasso (why?) but simplifies the choice of $\lambda$.
using Statistics
using Random, LinearAlgebra, SparseArrays
Random.seed!(123) # seed
n, p = 100, 10
# Design matrix
X0 = randn(n, p)
X = [ones(n, 1) (X0 .- mean(X0 ,dims=1)) ./ std(X0, dims=1)] # design matrix is standardized and includes intercept
# True regression coefficients (first 5 are non-zero)
β = [1.0; randn(5); zeros(p - 5)]
# Responses
y = X * β + randn(n)
# plot the true β
using Plots; gr()
plt = plot(1:p, β[2:end], line=:stem, marker=2, legend=:none)
xlabel!(plt, "coordinate")
ylabel!(plt, "beta_j")
title!(plt, "True beta")
using Convex
## Use Mosek solver
#using Mosek, MosekTools
#opt = () -> Mosek.Optimizer(LOG=0)
## Use SCS solver
#using SCS
#opt = () -> SCS.Optimizer(verbose=0)
# Use COSMO solver
using COSMO
opt = () -> COSMO.Optimizer(max_iter=10000, verbose=false)
# solve at a grid of λ
λgrid = 0:10:200
β̂path = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
β̂lasso = Variable(size(X, 2))
@time for i in 1:length(λgrid)
λ = λgrid[i]
# objective
problem = minimize(0.5sumsquares(y - X * β̂lasso) + λ * norm(β̂lasso[2:end], 1))
solve!(problem, opt)
β̂path[i, :] = β̂lasso.value
end
plt = plot(collect(λgrid), β̂path, legend=:none)
xlabel!(plt, "lambda")
ylabel!(plt, "beta_hat")
title!(plt, "Lasso")
36.154562 seconds (92.12 M allocations: 5.221 GiB, 7.07% gc time, 1.75% compilation time)
can be formulated as a SOCP. Why? For instance, $\ell_{3/2}$ regression combines the advantages of both robust $\ell_1$ regression and least squares.
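One way to see the SOCr structure for, say, $p = 3/2$ (a sketch consistent with the rational $\ell_p$-norm fact above): for $r, t \ge 0$,
$$
r^{3/2} \le t \iff \exists s \ge 0: ~ r^2 \le s t \text{ and } s^2 \le r,
$$
since $s^2 \le r$ gives $s \le \sqrt r$, hence $r^2 \le st \le \sqrt r \, t$, i.e., $r^{3/2} \le t$; conversely take $s = \sqrt r$. Both constraints are rotated cone memberships: $r^2 \le st \iff (\sqrt 2 r, s, t) \in \mathbb{L}_r^3$ and $s^2 \le r \iff (\sqrt 2 s, r, 1) \in \mathbb{L}_r^3$. Applying this to each absolute residual $|y_i - \mathbf{x}_i^T \beta|$ (itself SOCP-representable) turns $\ell_{3/2}$ regression into an SOCP.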
λlist = 0:0.5:10
βpath = zeros(length(λlist), size(X, 2)) # each row is β̂ at a λ
βsqrtlasso = Variable(size(X, 2))
t = Variable()
@time for i = 1:length(λlist)
λ = λlist[i]
problem = minimize(norm(y - X * βsqrtlasso, 2) + λ * norm(βsqrtlasso[2:end], 1))
solve!(problem, opt)
βpath[i, :] = βsqrtlasso.value
end
plt = plot(collect(λlist), βpath, legend=:none)
xlabel!(plt, "lambda")
ylabel!(plt, "beta_hat")
title!(plt, "Square-root Lasso")
0.281433 seconds (570.34 k allocations: 50.319 MiB, 19.59% gc time, 66.93% compilation time)
Rudin, L.I., Osher, S. and Fatemi, E., 1992. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena, 60(1-4), pp.259-268. https://doi.org/10.1016/0167-2789(92)90242-F
Goldfarb, D. and Yin, W., 2005. Second-order cone programming methods for total variation-based image restoration. SIAM Journal on Scientific Computing, 27(2), pp.622-645. https://doi.org/10.1137/040608982
Many parts of this lecture note are based on Dr. Hua Zhou's 2019 Spring Statistical Computing course notes available at http://hua-zhou.github.io/teaching/biostatm280-2019spring/index.html.