# Piece-wise linear approximation of a convex function

Every convex function can be approximated, to some extent, by a convex piecewise-linear function. In this notebook we show how to model such an approximation as a least squares problem. This model was introduced and studied by Kuosmanen and by Seijo and Sen, among others.

We are given a set of points $X_1,\ldots,X_n\in \mathbb{R}^d$ and $Y_1,\ldots,Y_n\in\mathbb{R}$, which we presume were generated as $Y_i = f(X_i) + \varepsilon_i$ where $f$ is a convex function and $\varepsilon_i$ is noise. Our least squares problem looks as follows

$$\begin{array}{rll} \text{minimize} & \sum_{i=1}^n (t_i - Y_i)^2 & \\ \text{subject to} & t_i \geq t_j + s_j^T (X_i - X_j) &\text{for all}\ 1\leq i,j\leq n, \\ & t_1, ..., t_n \in \mathbb{R}, & \\ & s_1, ..., s_n \in \mathbb{R}^d. & \end{array}$$

Then the piecewise linear function approximating the data is defined as the maximum $\hat{\Phi}(x)=\max\{\Phi_i(x),\ i=1,\ldots,n\}$ of $n$ linear functions $\Phi_i:\mathbb{R}^d\to \mathbb{R}$ given by

$$\Phi_i(x) = t_i + s_i^T(x-X_i)$$

for $i=1,\ldots,n$.

Note that $\Phi_i(X_i)=t_i$ and $\nabla \Phi_i(X_i)=s_i$. Therefore the main constraint appearing in the model corresponds precisely to the convexity requirement

$$\hat{\Phi}(X_i) - \hat{\Phi}(X_j) \geq \langle\nabla \hat{\Phi}(X_j),X_i-X_j\rangle$$

for the function $\hat{\Phi}$.
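As a quick illustration of why the max-of-affine construction works, here is a minimal sketch with made-up toy values for $t_i$, $s_i$, $X_i$ (not a fitted model): the resulting $\hat\Phi$ satisfies the midpoint convexity inequality, since a maximum of affine functions is always convex.

```python
import numpy as np

# Toy 1-d pieces (hypothetical values, not fitted): t_i, s_i, X_i
t  = np.array([0.0, 1.0, 4.0])
s  = np.array([0.0, 2.0, 4.0])
Xi = np.array([0.0, 1.0, 2.0])

def phi_hat(x):
    # max-of-affine evaluation: max_i { t_i + s_i * (x - X_i) }
    return np.max(t + s * (x - Xi))

# midpoint convexity check for one pair of points
a, b = -1.0, 3.0
lhs = phi_hat((a + b) / 2)
rhs = 0.5 * phi_hat(a) + 0.5 * phi_hat(b)
# lhs <= rhs holds for any choice of pieces
```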

### Preparing synthetic data

In the example we will show an approximation of a simple quadratic function $f(x) = x^Tx.$

Let us first define our data. We generate $n$ points of dimension $d$ uniformly from $[-1, 1]^d$. The corresponding response is given as $Y_i = f(X_i) + \varepsilon_i$. The error $\varepsilon$ follows the normal distribution $N(0, \sigma^2 I_n)$, where $\sigma^2 = \mathrm{Var}(f(\mathbf{X})) / SNR$ and $SNR = 3$. Lastly, we mean-center the responses $Y$ and the data $X$ and standardize them to unit $l_2$ norm. We do this to get a more predictive model.

In [1]:
import numpy as np
from mosek.fusion import *
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [10, 10]
from matplotlib import cm
import sys
In [2]:
np.random.seed(0)

# specify the dimension
n = 200
d = 2

# specify the function
fun = lambda x: np.sum(np.multiply(x, x), 1)   # x^T x

# generate data and get corresponding responses
x = np.random.uniform(-1, 1, n*d).reshape(n, d)
response = fun(x)

# compute response vector with error
varResponse = np.var(response, ddof=1)  # unbiased sample variance
SNR = 3
stdError = np.sqrt(varResponse / SNR)
error = np.random.normal(0, stdError, n)
responseWithError = response + error

# standardize and mean-center the response
meanY = np.sum(responseWithError) / n
standY = responseWithError - meanY
norm = np.sqrt( np.sum( np.square( standY )))
Y = standY / norm

# standardize the x (by the dimension) and mean-center
meanX = np.sum(x, axis=0) / n
standX = x - meanX
norm = np.sqrt(np.sum (np.square( standX ), axis=0))
X = np.divide(standX, norm)
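As a sanity check, the standardization step can be verified in isolation. The sketch below uses its own stand-in sample rather than the notebook's `x`; after centering and scaling, every column should have zero mean and unit $l_2$ norm.

```python
import numpy as np

# Stand-in sample, independent of the notebook's `x`
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 2))

standX = x - x.mean(axis=0)                  # mean-center each column
X = standX / np.linalg.norm(standX, axis=0)  # scale each column to unit l2 norm

# columns of X now have (numerically) zero mean and unit l2 norm
```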
In [3]:
# Plot generated data
fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# set title
ax.set_title("Generated Data")

# scatter old points
scatter = ax.scatter(X[:, 0], X[:, 1], Y)

### Fusion Model

The following block shows how to build and solve this model using MOSEK Fusion for Python.

In [4]:
# solve QP
with Model('ConvexApproxLeastSquares') as M:

############## create variables ##############

m =  M.variable( 1, Domain.unbounded() )       # maximum variable
t =  M.variable( n, Domain.unbounded() )
ss = M.variable( [n, d], Domain.unbounded() )  # variables s stored in n X d matrix

############## create constraints ##############

# create Qr constraints

# auxDiff = t - ys
auxDiff = M.variable( n, Domain.unbounded() )
M.constraint( Expr.sub(t, auxDiff), Domain.equalsTo(Y) )

# (0.5, m, t - ys) in Q_r
auxHalf = M.variable( 1, Domain.equalsTo(0.5) )
z1 = Var.vstack( auxHalf, m, auxDiff )
M.constraint( z1, Domain.inRotatedQCone() )

# create constraints on the maximum of the linear-piecewise functions
# i.e.
#       0 >= t_j - t_i + <s_j,x_i - x_j>

# compute t_j - t_i
t_is = Expr.flatten( Expr.repeat(t, n, 1) )
t_js = Expr.repeat( t, n, 0 )

Diff = Expr.sub(t_js, t_is)

# make a dot product <s_j,x_i - x_j>
x_is = np.repeat( X, repeats=n, axis=0 )
x_js = np.tile( X, (n, 1) )
ss_stack = Expr.repeat( ss, n, 0 )

dotProd = Expr.sum( Expr.mulElm(ss_stack ,x_is - x_js) , 1 )   # dot product

M.constraint( expr, Domain.lessThan(0) )

############## solve problem ##############

# minimize the least squares
M.objective( "obj", ObjectiveSense.Minimize, m )

# set the log handler to stdout
M.setLogHandler(sys.stdout)

M.solve()

# get the optimal values
solt = t.level()
sols = ss.level().reshape((n, d))
Problem
Name                   : ConvexApproxLeastSquares
Objective sense        : minimize
Type                   : CONIC (conic optimization problem)
Constraints            : 40200
Affine conic cons.     : 1
Disjunctive cons.      : 0
Cones                  : 0
Scalar variables       : 803
Matrix variables       : 0
Integer variables      : 0

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 200
Eliminator terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 2                 time                   : 0.00
Lin. dep.  - tries                  : 1                 time                   : 0.00
Lin. dep.  - number                 : 0
Presolve terminated. Time: 0.03
Problem
Name                   : ConvexApproxLeastSquares
Objective sense        : minimize
Type                   : CONIC (conic optimization problem)
Constraints            : 40200
Affine conic cons.     : 1
Disjunctive cons.      : 0
Cones                  : 0
Scalar variables       : 803
Matrix variables       : 0
Integer variables      : 0

Optimizer  - solved problem         : the dual
Optimizer  - Constraints            : 592
Optimizer  - Cones                  : 1
Optimizer  - Scalar variables       : 39206             conic                  : 202
Optimizer  - Semi-definite variables: 0                 scalarized             : 0
Factor     - setup time             : 0.02              dense det. time        : 0.00
Factor     - ML order time          : 0.00              GP order time          : 0.00
Factor     - nonzeros before factor : 9.91e+04          after factor           : 9.91e+04
Factor     - dense dim.             : 0                 flops                  : 1.94e+07
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME
0   2.1e-01  1.0e+00  2.1e+00  0.00e+00   7.071067812e-01   -3.535533906e-01  1.0e+00  0.06
1   1.2e-01  5.7e-01  1.8e-01  1.23e+00   2.938859452e-01   -2.209341240e-01  5.7e-01  0.11
2   3.4e-02  1.7e-01  3.7e-02  2.36e+00   3.283410900e-01   2.541459120e-01   1.7e-01  0.13
3   5.1e-03  2.4e-02  2.0e-03  1.57e+00   2.984161649e-01   2.900558823e-01   2.4e-02  0.14
4   1.2e-03  5.9e-03  2.3e-04  1.07e+00   2.833396729e-01   2.813883669e-01   5.9e-03  0.15
5   5.2e-04  2.5e-03  6.1e-05  1.01e+00   2.697746770e-01   2.689233337e-01   2.5e-03  0.16
6   1.9e-04  9.3e-04  1.3e-05  1.00e+00   2.540998423e-01   2.537768700e-01   9.3e-04  0.17
7   1.1e-04  5.2e-04  5.5e-06  9.90e-01   2.405247711e-01   2.403411101e-01   5.2e-04  0.18
8   7.6e-05  3.7e-04  3.2e-06  9.87e-01   2.395813200e-01   2.394502320e-01   3.7e-04  0.19
9   3.5e-05  1.7e-04  9.8e-07  9.44e-01   2.282203169e-01   2.281592375e-01   1.7e-04  0.20
10  1.8e-05  8.7e-05  3.7e-07  9.48e-01   2.228880403e-01   2.228554922e-01   8.7e-05  0.21
11  1.2e-05  5.6e-05  1.9e-07  9.47e-01   2.206282771e-01   2.206070722e-01   5.6e-05  0.22
12  8.1e-06  3.9e-05  1.1e-07  9.45e-01   2.192059131e-01   2.191906637e-01   3.9e-05  0.22
13  5.6e-06  2.7e-05  6.6e-08  9.08e-01   2.175484624e-01   2.175376218e-01   2.7e-05  0.23
14  5.3e-06  2.6e-05  6.1e-08  8.31e-01   2.172616367e-01   2.172513140e-01   2.6e-05  0.24
15  4.1e-06  2.0e-05  4.2e-08  8.35e-01   2.161293821e-01   2.161211400e-01   2.0e-05  0.25
16  3.1e-06  1.5e-05  2.8e-08  8.61e-01   2.152832407e-01   2.152768442e-01   1.5e-05  0.26
17  2.5e-06  1.2e-05  2.1e-08  8.75e-01   2.148823263e-01   2.148769841e-01   1.2e-05  0.27
18  1.7e-06  8.2e-06  1.2e-08  8.85e-01   2.143013544e-01   2.142976924e-01   8.2e-06  0.28
19  1.6e-06  7.7e-06  1.1e-08  9.03e-01   2.142786291e-01   2.142751783e-01   7.7e-06  0.29
20  9.9e-07  4.8e-06  5.4e-09  9.09e-01   2.137893790e-01   2.137871826e-01   4.8e-06  0.30
21  8.5e-07  4.1e-06  4.4e-09  8.60e-01   2.136317544e-01   2.136298245e-01   4.1e-06  0.30
22  8.1e-07  3.9e-06  4.1e-09  8.33e-01   2.135692363e-01   2.135673840e-01   3.9e-06  0.31
23  7.9e-07  3.8e-06  3.9e-09  8.35e-01   2.135530013e-01   2.135511868e-01   3.8e-06  0.32
24  6.9e-07  3.3e-06  3.2e-09  8.33e-01   2.134027805e-01   2.134011731e-01   3.3e-06  0.33
25  3.5e-07  1.7e-06  1.2e-09  8.23e-01   2.129778108e-01   2.129769367e-01   1.7e-06  0.34
26  3.3e-07  1.6e-06  1.1e-09  8.43e-01   2.129379387e-01   2.129371256e-01   1.6e-06  0.35
27  2.2e-07  1.1e-06  6.5e-10  8.48e-01   2.127934005e-01   2.127928223e-01   1.1e-06  0.36
28  1.2e-07  5.8e-07  2.7e-10  8.72e-01   2.126512890e-01   2.126509681e-01   5.8e-07  0.37
29  1.2e-07  5.7e-07  2.6e-10  9.18e-01   2.126472296e-01   2.126469133e-01   5.7e-07  0.37
30  4.6e-08  2.2e-07  6.5e-11  9.22e-01   2.125597556e-01   2.125596301e-01   2.2e-07  0.38
31  1.7e-08  8.3e-08  1.5e-11  9.61e-01   2.125239401e-01   2.125238932e-01   8.3e-08  0.39
32  3.7e-09  1.9e-08  1.7e-12  9.85e-01   2.125108858e-01   2.125108750e-01   1.9e-08  0.40
Optimizer terminated. Time: 0.42

Interior-point solution summary
Problem status  : PRIMAL_AND_DUAL_FEASIBLE
Solution status : OPTIMAL
Primal.  obj: 2.1251088582e-01    nrm: 5e+01    Viol.  con: 4e-14    var: 0e+00    acc: 0e+00
Dual.    obj: 2.1251087505e-01    nrm: 1e+00    Viol.  con: 6e-09    var: 1e-12    acc: 0e+00

### Model evaluation

Let us evaluate our fit. This can be done as follows:

$\hat \Phi (\mathbf{x}) = \max \limits _{1 \leq j \leq n} \{ \hat{t}_j + \hat{s}_j^T (\mathbf{x} - X_j) \}$

This exposes an obvious drawback of the method: to evaluate the fit we need to keep all $n$ data points. Note that the code below is left unvectorized for educational purposes; it should be vectorized when efficiency matters.

In [5]:
# create points for evaluation
n = 100
xs = np.linspace(-.1, .1, n)
ys = np.linspace(-.1, .1, n)

xx, yy = np.meshgrid(xs, ys)
x = np.c_[xx.reshape(xx.size, 1), yy.reshape(yy.size, 1)]
x = x.astype(np.double)

# get number of linear functions
numberOfLinearFuns = solt.size

# evaluate data
phi = np.zeros(n**2)

for i in range(0, n**2):

    possibleValues = np.zeros(numberOfLinearFuns)
    for j in range(0, numberOfLinearFuns):

        # max_j { t_j + <s_j , x_i - X_j> }
        val = solt[j] + np.dot(sols[j, :], x[i, :] - X[j, :])
        possibleValues[j] = val

    phi[i] = np.max(possibleValues)
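For efficiency, the double loop above can be replaced by a single broadcasted evaluation. The sketch below uses small stand-in arrays in place of the notebook's `solt`, `sols`, `X`, and evaluation grid; in the notebook those come from the model and the meshgrid.

```python
import numpy as np

# Stand-in values (hypothetical; in the notebook these come from the solved model)
rng = np.random.default_rng(1)
n, d = 5, 2
solt = rng.normal(size=n)               # stand-in for t.level()
sols = rng.normal(size=(n, d))          # stand-in for ss.level().reshape((n, d))
X = rng.uniform(-1, 1, size=(n, d))     # stand-in data points
pts = rng.uniform(-1, 1, size=(7, d))   # stand-in evaluation points

# vals[i, j] = t_j + <s_j, x_i - X_j>, built by broadcasting
vals = solt + pts @ sols.T - np.sum(sols * X, axis=1)
phi = vals.max(axis=1)                  # phi[i] = max_j { t_j + <s_j, x_i - X_j> }
```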

### Plot the results

Finally, we can plot our solution.

In [6]:
# create plot
fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# set title
ax.set_title("Linear piece-wise approximation")

# scatter old points
ax.scatter(X[:, 0], X[:, 1], Y)

# plot the model
xx = x[:, 0].reshape(n, n)
yy = x[:, 1].reshape(n, n)
zz = phi.reshape(n, n)
axs = ax.contour3D(xx, yy, zz, 50, cmap=cm.cool)

# set the legend
leg = ax.legend(["Data"])