Practical Optimization for Stats Nerds

Ryan J. O'Neil
Data Science DC
March 20, 2017

ryanjoneil@gmail.com
https://ryanjoneil.github.io
@ryanjoneil

What to expect

  • We'll take familiar models, show how they work from an optimization perspective, then show applications.
  • This talk is (mostly) about model formulation.
  • It might get a little technical, so...

Take-aways

  • Many statistical techniques are based on some sort of optimization.
  • Optimization has lots of uses, such as solving decision models.
  • Learning to structure problems you already know for optimization solvers is a great way to understand them!

Model 1: Least Squares

We observe noisy data from an unknown function. We want to infer that function.

But let's assume that, deep down, we actually know the function. That way we can generate noisy data and see if our techniques work right.

Function:

$$y = 3x^2 - 2x + 10 + \epsilon$$

Noise:

$$\epsilon \sim N\left(0, 25\right)$$
In [1]:
# Generate some random data.
import numpy as np
import random

# Sort the data so they're easier to plot later.
x = [random.uniform(-10, 10) for _ in range(500)]
x.sort()

y = []
for xi in x:
    eps = random.normalvariate(0, 25)
    yi = 3*xi**2 - 2*xi + 10 + eps
    y.append(yi)
    
x = np.array(x)
y = np.array(y)
In [2]:
from bokeh.charts import Scatter, output_notebook, show
output_notebook()

scatter = Scatter({'x': x, 'y': y}, width=750, height=400)
show(scatter)

Least Squares the way you do it...

...assuming you use scikit-learn like every other sane Python programmer.

We looked at a chart of our data and decided to describe it with:

  • A quadratic term
  • A linear term
  • An offset
In [3]:
from sklearn.linear_model import LinearRegression

# Note: X is our feature matrix.
#       We intentionally add a "1" for the offset, instead of letting 
#       sklearn do that for us. This will make sense soon.

X = np.array([[xi**2, xi, 1] for xi in x])

lin = LinearRegression(fit_intercept=False)
lin.fit(X, y)

print(lin.coef_)
[  2.98259014  -2.25266697  10.66100195]

How'd we do?

In [4]:
from bokeh.charts import Line
y_hat = lin.predict(X)
show(Line({'x': x, 'y': y_hat}, x='x', y='y', width=750, height=400))

Least Squares the way your grandparents did it...

...with chalk and a slab of slate.

Construct a function to calculate the sum of squared residuals...

$$ \begin{align} \text{min}\ f(\beta) & = \frac{1}{2} ||y - X \beta||^2 \\ & \\ & = \frac{1}{2} (y - X \beta)'(y - X \beta) \\ & \\ & = \frac{1}{2} y'y - y'X\beta + \frac{1}{2} \beta'X'X\beta \\ & \\ \end{align} $$

First Order Necessary Conditions

...and take its derivative to find a closed-form solution.

$$\nabla f(\beta) = X'X\beta - X'y = 0$$


$$\beta = (X'X)^{-1}X'y$$

In [5]:
from numpy.linalg import inv

# beta = (X'X)^-1 * X' * y
Xt = X.transpose()
pseudo_inv = inv(np.matmul(Xt, X))
beta = np.matmul(np.matmul(pseudo_inv, Xt), y)
print(beta)
[  2.98259014  -2.25266697  10.66100195]

How'd grandma and grandpa do?

In [6]:
y_hat = [beta[0]*xi**2 + beta[1]*xi + beta[2] for xi in x]
show(Line({'x': x, 'y': y_hat}, x='x', y='y', width=750, height=400))

Least Squares the way your crazy uncle Eddie does it...

...'cause he used to work at NASA and code in Forth.

cvxopt provides a qp method that can solve anything of this form.

$$ \begin{align} \text{min} \ \ \ & \ \frac{1}{2} \beta'P\beta + q'\beta \\ & \\ \text{s.t.} \ \ \ & \ G\beta \preceq h \\ & \\ & \ A\beta = b \end{align} $$

So we need to convert from

$$\frac{1}{2} \beta'X'X\beta - y'X\beta + \frac{1}{2} y'y $$

to another form

$$\frac{1}{2} \beta'P\beta + q'\beta$$

which is simply

$$P = X'X, \quad q = -X'y$$
In [7]:
import cvxopt as cvx

P = cvx.matrix(np.matmul(Xt, X))
q = cvx.matrix(-1 * np.matmul(y.transpose(), X))
solution = cvx.solvers.qp(P, q)
beta = solution['x'] # unrelated to our x
print(beta)
[ 2.98e+00]
[-2.25e+00]
[ 1.07e+01]

How'd Crazy Uncle Eddie do?

In [8]:
y_hat = [beta[0]*xi**2 + beta[1]*xi + beta[2] for xi in x]
show(Line({'x': x, 'y': y_hat}, x='x', y='y', width=750, height=400))

So what's different about Crazy Uncle Eddie?

Well, besides the obvious.

While all three techniques produced the same result, Crazy Uncle Eddie's is interesting because it is more general than the others.

Crazy Eddie can solve any convex quadratic optimization problem, of which Least Squares is just one instance.

If we change the structure of the problem slightly:

  • We probably can't solve it with scikit-learn
  • Grams and Gramps have to go back to their chalkboard
  • Crazy Eddie can update the inputs to his problem and reoptimize
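
For example, here's a minimal sketch (reusing X, Xt, y, np, and cvx from the cells above) of a variant the closed-form normal equations can't express at all: least squares with the coefficients constrained to be nonnegative. Only the G and h inputs change; they encode $\beta \ge 0$ as $-I\beta \preceq 0$.

n = X.shape[1]
P = cvx.matrix(np.matmul(Xt, X))
q = cvx.matrix(-1 * np.matmul(y.transpose(), X))
G = cvx.matrix(-np.eye(n))      # -I * beta <= 0  is the same as  beta >= 0
h = cvx.matrix(np.zeros(n))

nonneg_beta = cvx.solvers.qp(P, q, G, h)['x']
print(nonneg_beta)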

Application 1: Portfolio Optimization

We have a big pot of money to allocate among different investments. Lucky us!

Some investment returns are correlated. They go up and down together.

Other returns are anticorrelated. They tend to do the opposite things.

How do we allocate our money to maximize our expected return, subject to our tolerance for risk?

We'll use 100 months of exchange rate data from the Fed, circa 2014.

Since this talk isn't about data wrangling, I've already cleaned it up into the important pieces.

  • Expected monthly return data for each foreign currency
  • A covariance matrix for those investments

The Markowitz Portfolio Optimization Model

Inputs:

$$\mu = \text{vector of expected investment returns}$$

$$\Sigma = \text{covariance matrix for returns}$$

$$\alpha = \text{unitless measure of risk aversion}$$

Model:

$x$ tells me how much of my total budget to put in each investment.

$$ \begin{align} \text{max} \ \ \ & \ \mu'x - \alpha x'\Sigma x \\ \text{s.t.} \ \ \ & \ e'x = 1 \\ & \ x \ge 0 \end{align} $$

But wait! This is a maximization problem! The last model used $\text{min}$.

$$ \begin{align} \text{min} \ \ \ & \ \alpha x'\Sigma x - \mu'x \\ \text{s.t.} \ \ \ & \ e'x = 1 \\ & \ x \ge 0 \end{align} $$

The only differences between this model and the least squares model are the constraints we've added.

This one forces the model to allocate all of my budget into investments:

$$e'x = 1$$

This one disallows the model from making negative investments:

$$x \ge 0$$
In [9]:
# Read in the returns and covariance data.
import pandas as pd
exp_returns = pd.read_csv('portfolio-optimization/currency-returns.csv')
exp_returns.head()
Out[9]:
Unnamed: 0 mean variance
0 RXI$US_N.M.AL 0.152151 10.996718
1 RXI$US_N.M.EU 0.002683 5.928188
2 RXI$US_N.M.NZ 0.220217 9.690793
3 RXI$US_N.M.UK -0.159779 5.098969
4 RXI_N.M.BZ -0.128507 12.743632
In [10]:
returns_cov = pd.read_csv('portfolio-optimization/currency-covariance.csv', header=None)
returns_cov.head()
Out[10]:
0 1 2 3 4 5 6 7 8 9 ... 13 14 15 16 17 18 19 20 21 22
0 10.996718 5.181782 8.654762 4.457704 9.789818 5.421746 0.194428 5.166875 0.028496 4.379632 ... 6.494732 7.078498 7.083824 8.574515 3.287411 0.343673 4.163674 2.450772 2.381722 0.542450
1 5.181782 5.928188 4.487890 3.659548 4.686754 2.467804 0.313081 5.867674 0.038117 2.545200 ... 2.894461 5.350733 5.515164 4.314480 2.265781 0.418269 5.173168 1.559857 1.351946 1.776855
2 8.654762 4.487890 9.690793 4.330531 8.130412 4.268307 0.129303 4.466514 0.017975 3.919170 ... 5.536301 5.602586 6.054567 6.733828 2.811946 0.451742 3.903260 2.135899 2.569281 0.379954
3 4.457704 3.659548 4.330531 5.098969 4.301933 2.573100 0.137068 3.632088 0.012420 1.821456 ... 2.775715 4.081803 4.233658 3.749874 1.780620 0.326828 3.335175 1.356927 1.014109 1.584577
4 9.789818 4.686754 8.130412 4.301933 12.743632 5.433992 0.279201 4.689891 -0.034405 5.567318 ... 7.177333 7.288341 6.774689 8.761299 2.999360 0.843294 4.269524 2.163251 2.439204 0.496304

5 rows × 23 columns

In [11]:
# A model that will return an optimal portfolio for any risk aversion.
# (Set cvx.solvers.options['show_progress'] = False to silence the
# iteration logs printed below.)
def portfolio(alpha):
    # Objective: minimize alpha * x' Sigma x - mu' x.
    P = cvx.matrix(alpha * returns_cov.as_matrix())
    q = cvx.matrix(-exp_returns['mean'].as_matrix())

    # G and h encode x >= 0 (written as -I x <= 0).
    G = cvx.matrix(0.0, (len(q),len(q)))
    G[::len(q)+1] = -1.0
    h = cvx.matrix(0.0, (len(q),1))

    # A and b encode e'x = 1: invest the entire budget.
    A = cvx.matrix(1.0, (1,len(q)))
    b = cvx.matrix(1.0)

    solution = cvx.solvers.qp(P, q, G, h, A, b)

    # Report the expected return of the optimal portfolio.
    return exp_returns['mean'].dot(solution['x'])[0]
In [12]:
risk_aversion = [ra/2.0 for ra in range(41)]
returns = [portfolio(alpha) for alpha in risk_aversion]
     pcost       dcost       gap    pres   dres
 0: -1.4109e+00 -1.2870e+00  6e+01  9e+00  5e+00
 1: -1.8207e-02 -1.2285e+00  1e+00  5e-15  8e-16
 2: -6.2411e-02 -2.9637e-01  2e-01  7e-16  7e-16
 3: -1.4238e-01 -3.5309e-01  2e-01  8e-16  5e-16
 4: -2.8201e-01 -2.9496e-01  1e-02  4e-16  5e-16
 5: -2.8688e-01 -2.8703e-01  2e-04  1e-16  3e-16
 6: -2.8695e-01 -2.8695e-01  2e-06  3e-16  2e-16
 7: -2.8695e-01 -2.8695e-01  2e-08  1e-16  5e-16
Optimal solution found.
[... similar cvxopt iteration logs for the remaining 40 risk-aversion values omitted; each run ends with "Optimal solution found." ...]
In [13]:
show(Line(
    {'risk aversion': risk_aversion, 'expected return': returns}, 
    x='risk aversion', 
    y='expected return', 
    width=750, 
    height=400
))

Model 2: Support Vector Machines

We want to draw a line that separates two sets with as little misclassification as possible.

If $f(x) \le 5$, the point is of type $-1$, otherwise it is type $+1$.

$$f(x) = (x_1 + \epsilon_1) - (x_2+ \epsilon_2)$$

$$\epsilon_i \sim N\left(0, 1.25^2\right)\ \forall\ i \in \{1, 2\}$$
In [14]:
x1 = [random.uniform(0, 10) for _ in range(150)]
x2 = [random.uniform(0, 10) for _ in range(150)]

y = []
for xi_1, xi_2 in zip(x1, x2):
    eps_1 = random.normalvariate(0, 1.25)
    eps_2 = random.normalvariate(0, 1.25)
    
    if (xi_1 + eps_1) - (xi_2 + eps_2) <= 5:
        y.append(-1)
    else:
        y.append(+1)
        
X = np.array(list(zip(x1, x2)))
y = np.array(y)
In [15]:
show(Scatter(
    {'x1': x1, 'x2': x2, 'y': y}, 
    x='x1', y='x2', color='y', marker='y', palette=['lightblue', 'orange'],
    width=500, height=500
))

Support Vector Machines a la scikit-learn

In [16]:
from sklearn import svm
clf = svm.SVC(kernel='linear')
clf.fit(X, y)
Out[16]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

The SVM coefficients give us a separating hyperplane with minimum classification error.

$$b + w'x = 0$$
In [17]:
print('b =', clf.intercept_[0])
print('w =', clf.coef_[0])
b = -3.36050577816
w = [ 0.67262893 -0.60681053]

We classify a point $x = (x_1, x_2)'$ based on which side of that hyperplane it is on.

$$b + w'x \ge 0\ \rightarrow\ \text{Type +1}$$


$$b + w'x < 0\ \rightarrow\ \text{Type -1}$$
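
As a quick sanity check (a sketch reusing clf, X, and np from above), applying that rule by hand should reproduce clf.predict:

b_val = clf.intercept_[0]
w_val = clf.coef_[0]

# Classify every observation by the sign of b + w'x.
manual = np.where(b_val + X.dot(w_val) >= 0, 1, -1)
print((manual == clf.predict(X)).all())   # expect True (boundary ties aside)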

In [18]:
from sklearn import metrics

y_hat = clf.predict(X)

print('confusion matrix:')
metrics.confusion_matrix(y_true=y, y_pred=y_hat)
confusion matrix:
Out[18]:
array([[117,   2],
       [ 15,  16]])
In [19]:
# We need a function to plot classifications and the separating hyperplane.
def plot_svm_results(correct, b_val, w1_val, w2_val):
    # Classifications
    plot = Scatter(
        {'x1': x1, 'x2': x2, 'y': y, 'correct': correct}, 
        x='x1', y='x2', 
        color='correct', palette=['lightgrey', 'darkred'], marker='correct',
        width=500, height=500
    )

    # Separating hyperplane: x2 = -(b + (w1*x1)) / w2
    hyperplane_x1 = np.linspace(0, 10)
    hyperplane_x2 = [
        -(b_val + (w1_val * x1_i)) / w2_val for x1_i in hyperplane_x1
    ]
    plot.line(hyperplane_x1, hyperplane_x2, color='darkblue')
    show(plot)

How'd scikit-learn do?

In [20]:
correct = ['Correct' if yi == yh_i else 'Incorrect' for yi, yh_i in zip(y, y_hat)]
plot_svm_results(correct, clf.intercept_[0], clf.coef_[0][0], clf.coef_[0][1])

Support Vector Machines a la Linear Optimization

PuLP is a modeling tool that lets us solve anything of this form.

$$ \begin{align} \text{min} \ \ \ & \ c'x \\ \text{s.t.} \ \ \ & \ A x = b \\ & \ x \ge 0 \end{align} $$

Where $c$ and $b$ are input vectors, $A$ is an input matrix, and $x$ is a vector of real-valued decision variables.

Linear Optimization is well solved. You can assume that a linear optimizer is robust and fast for any model in that form.

PuLP can also solve discrete models, which are not so computationally easy.
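
Here's a minimal sketch of the PuLP API on a toy problem (assuming PuLP's bundled default solver is available); the SVM model below uses exactly the same pieces.

import pulp

# min 2*a1 + 3*a2   s.t.   a1 + a2 >= 10,   a1, a2 >= 0
toy = pulp.LpProblem(sense=pulp.LpMinimize)
a1 = pulp.LpVariable('a1', lowBound=0)
a2 = pulp.LpVariable('a2', lowBound=0)
toy += a1 + a2 >= 10

toy.setObjective(2*a1 + 3*a2)
assert toy.solve() == pulp.LpStatusOptimal
print(a1.value(), a2.value())   # expect 10.0 0.0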

How do we get our SVM problem into a linear optimizer?

Inputs:

$$(x_{i1}, x_{i2})' = \text{column vector of inputs for observation}\ i$$

$$y_i \in \{+1, -1\} = \text{known classification of observation}\ i $$

Model:

The decision variables $b$, $w_1$, and $w_2$ give us a separating hyperplane $b + w'x = 0$.

For each observation $i$, $\xi_i$ measures its classification error ($\xi_i = 0$ when the point lies on the correct side of the margin). We find the best hyperplane by minimizing total error.

$$ \begin{align} \text{min} \ \ \ & \ \sum_i \xi_i \ & \ \\ \text{s.t.} \ \ \ & \ b + w_1 x_{i1} + w_2 x_{i2} \le -1 + \xi_i\ & \ \forall\ y_i = -1 \\ & \ b + w_1 x_{i1} + w_2 x_{i2} \ge +1 - \xi_i\ & \ \forall\ y_i = +1 \\ & \ \xi \ge 0 \end{align} $$

We can simplify the model by multiplying each constraint by $y_i$.

$$ \begin{align} \text{min} \ \ \ & \ \sum_i \xi_i \\ \text{s.t.} \ \ \ & \ y_i (b + w_1 x_{i1} + w_2 x_{i2}) \ge 1 - \xi_i\ \forall\ i \\ & \ \xi \ge 0 \end{align} $$

But wait! Constraints like $a'x \ge b$ aren't in the form you gave us!

That's OK. They get converted by the optimizer.

$$a'x - e = b$$

$$e \ge 0$$

So do constraints of the form $a'x \le b$.

$$a'x + s = b$$

$$s \ge 0$$

As do any variables that are unrestricted in sign.

$$x_i = x_i^+ - x_i^-$$

$$x_i^+, x_i^- \ge 0$$

As long as your objective function is linear and your constraints are linear and of the form $=$, $\le$, or $\ge$, you can solve it with a linear optimizer.

Let's get hacking. First make variables for the separating hyperplane.

Note these are unrestricted in sign.

In [21]:
import pulp

b = pulp.LpVariable('b')
w1 = pulp.LpVariable('w1')
w2 = pulp.LpVariable('w2')

We also need variables for the classification errors.

These are nonnegative since we only want to penalize errors, not reward the model for correct classifications.

In [22]:
errors = []
for i in range(len(y)):
    e = pulp.LpVariable('e_%d' % i, lowBound=0) # lowBound=0 means >=0
    errors.append(e)

Build a model that records classification error based on the separating hyperplane in the appropriate variables.

In [23]:
m = pulp.LpProblem(sense=pulp.LpMinimize)

for x_i1, x_i2, y_i, e_i in zip(x1, x2, y, errors):
    m += y_i * (b + w1 * x_i1 + w2 * x_i2) >= (1 - e_i)

So far the model represents the feasible region of all possible separating hyperplanes.

We want the best one.

In [24]:
m.setObjective(sum(errors))
assert m.solve() == pulp.LpStatusOptimal

print('b =', b.value())
print('w =', [w1.value(), w2.value()])
b = -3.5456539
w = [0.71991737, -0.64676861]

We don't have a predict function, so let's build one.

In [25]:
def classify(xi_1, xi_2):
    if b.value() + (w1.value() * xi_1) + (w2.value() * xi_2) >= 0:
        return +1
    return -1

y_hat = [classify(xi_1, xi_2) for xi_1, xi_2 in zip(x1, x2)]

print('confusion matrix:')
metrics.confusion_matrix(y_true=y, y_pred=y_hat)
confusion matrix:
Out[25]:
array([[117,   2],
       [ 15,  16]])

How'd the linear optimizer do?

In [26]:
correct = ['Correct' if yi == yh_i else 'Incorrect' for yi, yh_i in zip(y, y_hat)]
plot_svm_results(correct, b.value(), w1.value(), w2.value())

Interlude: Problem Shapes

In Linear Algebra 101 we learned to solve problems that look like this.

$$Ax = b$$

$$ \begin{bmatrix} \cdot & \cdot \\ \cdot & \cdot \end{bmatrix} \begin{bmatrix} \cdot \\ \cdot \end{bmatrix} = \begin{bmatrix} \cdot \\ \cdot \end{bmatrix} $$

Many statistical inference problems are overdetermined and unsolvable directly.

$$ \begin{bmatrix} \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \\ \cdot & \cdot \end{bmatrix} \begin{bmatrix} \cdot \\ \cdot \end{bmatrix} = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \end{bmatrix} $$

Decision models tend to involve a large number of potential actions in relation to the constraint set.

$$ \begin{bmatrix} \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \end{bmatrix} \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \end{bmatrix} = \begin{bmatrix} \cdot \\ \cdot \end{bmatrix} $$

In both cases, we start with a problem we can't solve easily and reformulate it as something we can.

For statistical inference, we solve a secondary problem that minimizes error against the observed data.

For decision models, we choose a set of actions to take (i.e. a basis) and solve the remaining system.
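
A minimal numpy sketch of the two shapes (illustrative only; the dimensions and data are made up):

import numpy as np

# Overdetermined (tall): 10 equations, 2 unknowns. No exact solution in general,
# so we minimize ||Ax - b|| instead -- exactly what least squares does.
A_tall, b_tall = np.random.rand(10, 2), np.random.rand(10)
x_tall = np.linalg.lstsq(A_tall, b_tall)[0]

# Underdetermined (wide): 2 equations, 10 unknowns. Infinitely many solutions,
# so something has to pick one. A decision model picks the best basis; lstsq
# just returns the minimum-norm solution.
A_wide, b_wide = np.random.rand(2, 10), np.random.rand(2)
x_wide = np.linalg.lstsq(A_wide, b_wide)[0]

print(x_tall, x_wide)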

Model 3: Clustering

We have a set of unclassified data and we want to cluster it around logical centers.

We generate a handful of points (7 in the code below) around each center $(x_{10}, x_{20}) \in \{(0.3, 0.3), (0.3, 0.7), (0.7, 0.7)\}$.

$$x_{1i} = x_{10} + \epsilon$$

$$x_{2i} = x_{20} + \epsilon$$

$$\epsilon \sim N\left(0, 0.2^2\right)$$
In [27]:
centers = [(0.3, 0.3), (0.3, 0.7), (0.7, 0.7)]

x1 = []
x2 = []
for x10, x20 in centers:
    for _ in range(7):
        eps_x1 = random.normalvariate(0, 0.2)
        eps_x2 = random.normalvariate(0, 0.2)

        x1i = min(max(0.0, x10+eps_x1), 1.0)
        x2i = min(max(0.0, x20+eps_x2), 1.0)
        
        x1.append(x1i)
        x2.append(x2i)
        
X = np.array(list(zip(x1, x2)))
In [28]:
show(Scatter({'x1': x1, 'x2': x2}, x='x1', y='x2', width=500, height=500))

Clustering a la scikit-learn

Assume we know a priori how many clusters there are.

In [29]:
from sklearn.cluster import KMeans
clust = KMeans(n_clusters=3)
clust.fit(X)
Out[29]:
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=3, n_init=10, n_jobs=1, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)

The KMeans model has two outputs:

  • The center point of each cluster
  • Which cluster each point is assigned to
In [30]:
clust.cluster_centers_
Out[30]:
array([[ 0.80998334,  0.53503504],
       [ 0.32591213,  0.69901855],
       [ 0.25386159,  0.29119595]])

We need a function to easily plot clustering results.

In [31]:
def plot_clusters(clusters, cluster_centers):
    plot = Scatter(
        {'x1': x1, 'x2': x2, 'clusters': clusters},
        x='x1', y='x2',
        color='clusters', marker='clusters', 
        width=500, height=500
    )

    for x10, x20 in cluster_centers:
        plot.circle(x10, x20, alpha=0.5, color='grey', size=15)

    show(plot)

How'd scikit-learn do?

In [32]:
plot_clusters(clust.predict(X), clust.cluster_centers_)

Clustering a la Mixed Integer Linear Optimization

Let's extend the linear optimization model to support binary variables.

$$ \begin{align} \text{min} \ \ \ & \ c'x + d'y \\ \text{s.t.} \ \ \ & \ A x + D y = b \\ & \ x \ge 0 \\ & \ y \in \{0, 1\} \end{align} $$

$y$ is a vector of variables which can only take values $0$ or $1$. These represent discrete decisions and allow logical relationships among them.
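
For instance, here's a minimal sketch (with made-up variables) of two common logical relationships encoded with binaries in PuLP:

import pulp

pick_a = pulp.LpVariable('pick_a', cat=pulp.LpBinary)
pick_b = pulp.LpVariable('pick_b', cat=pulp.LpBinary)
pick_c = pulp.LpVariable('pick_c', cat=pulp.LpBinary)

logic = pulp.LpProblem(sense=pulp.LpMaximize)
logic += pick_a + pick_b <= 1   # choose at most one of a and b
logic += pick_a <= pick_c       # choosing a forces c to be chosen too

logic.setObjective(3*pick_a + 2*pick_b + pick_c)
assert logic.solve() == pulp.LpStatusOptimal
print(pick_a.value(), pick_b.value(), pick_c.value())   # expect 1.0 0.0 1.0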

Discrete optimization is very powerful, but not well solved the way linear optimization is. Model solvability is highly dependent on formulation.

What's a clustering model look like with binary variables?

Inputs:

$$(x_{1i}, x_{2i}) = \text{location of observed point} \in [0, 1]\ \forall\ i$$

Model:

We need to make two major types of decisions:

  • Where are the cluster centers, $\{w_1, w_2, w_3\}$?
  • Which cluster is each point assigned to? $v_{ij} = 1$ if point $i$ is part of cluster $j$, $0$ if not.

We minimize total taxicab distance from each point to its assigned cluster

  • $d_{x_{1}i}$ is the distance from point $i$ to its cluster center along the $x_1$ axis
  • $d_{x_{2}i}$ is the distance from point $i$ to its cluster center along the $x_2$ axis
$$ \begin{align} \text{min} \ \ \ & \ \sum_i (d_{x_{1}i} + d_{x_{2}i}) \ & \ \\ \text{s.t.} \ \ \ & \ \sum_j v_{ij} = 1 & \ \forall\ i \\ & \ d_{x_{1}i} \ge (x_{1i} - w_{j1}) - (1 - v_{ij}) & \ \forall\ i, j \\ & \ d_{x_{1}i} \ge (w_{j1} - x_{1i}) - (1 - v_{ij}) & \ \forall\ i, j \\ & \ d_{x_{2}i} \ge (x_{2i} - w_{j2}) - (1 - v_{ij}) & \ \forall\ i, j \\ & \ d_{x_{2}i} \ge (w_{j2} - x_{2i}) - (1 - v_{ij}) & \ \forall\ i, j \\ & \ 0 \le d_x \le 1 \\ & v_{ij} \in \{0, 1\} \end{align} $$

Primary decision variables = locations of cluster centers

In [33]:
w = []
for j in range(len(centers)):
    x10 = pulp.LpVariable('x_10%d' % j, lowBound=0, upBound=1)
    x20 = pulp.LpVariable('x_20%d' % j, lowBound=0, upBound=1)
    w.append([x10, x20])

Secondary decision variables = do we associate point $i$ with cluster $j$?

In [34]:
v = []
for i in range(len(x1)):
    vi = []
    for j in range(len(w)):
        vij = pulp.LpVariable('v_%d%d' % (i, j), cat=pulp.LpBinary)
        vi.append(vij)
    v.append(vi)

Each point is associated with exactly one cluster.

In [35]:
m = pulp.LpProblem(sense=pulp.LpMinimize)
for vi in v:
    m += sum(vi) == 1

Make a list of taxicab distance variables to guide the optimizer.

In [36]:
distances = []
for i, (vi, x1i, x2i) in enumerate(zip(v, x1, x2)):
    for j, (wj, vij) in enumerate(zip(w, vi)):
        x10, x20 = wj
        
        dx1 = pulp.LpVariable('dx_1%d%d' % (i, j), lowBound=0, upBound=1)
        dx2 = pulp.LpVariable('dx_2%d%d' % (i, j), lowBound=0, upBound=1)
    
        m += dx1 >= (x1i - x10) - (1 - vij)
        m += dx1 >= (x10 - x1i) - (1 - vij)

        m += dx2 >= (x2i - x20) - (1 - vij)
        m += dx2 >= (x20 - x2i) - (1 - vij)
        
        distances.extend([dx1, dx2])

WARNING WARNING WARNING

Symmetry elimination is beyond the scope of this talk.

We don't technically need it to solve this model, but it sure can help: without it, the solver wastes time exploring solutions that merely permute the cluster labels. Ordering the centers (below) breaks those ties.

In [37]:
for j, (x01_1, x02_1) in enumerate(w):
    for x01_2, x02_2 in w[j+1:]:
        m += x01_1 + x02_1 <= x01_2 + x02_2

Total distance from points to their cluster centers is our objective function.

In [38]:
m.setObjective(sum(distances))
assert m.solve() == pulp.LpStatusOptimal
In [39]:
cluster_centers = []
for wj in w:
    cluster_centers.append([wj[0].value(), wj[1].value()])

for c in cluster_centers:
    print(c)
[0.35947481, 0.24456388]
[0.23018792, 0.75395849]
[0.79554363, 0.60669282]

There's no predict method, so we yank that information out of the $v_{ij}$ variables.

In [40]:
clusters = []
for vi in v:
    for c, vij in zip([0, 1, 2], vi):
        if vij.value() > 0.5:
            clusters.append(c)
            break

How'd mixed integer optimization do?

In [41]:
plot_clusters(clusters, cluster_centers)

Hmmm. That model took a while to solve. What gives?

  • When we introduce discrete choices into models, we lose the characteristics that make them easy to solve.
  • This matters a lot less now than it did in the 90s.
  • But effective formulation of discrete models is still (may always be?) an art.

Let's look at something more realistic.

Application 2: Pedicab Sharing

Peuber is the Uber of pedicabs.

We have 7 pedicabs in our fleet and 11 customers requesting service. It's a prime day for Peuber!

In [42]:
pedicabs = range(7)
customers = range(11)

Generate locations for each of these.

In [43]:
rng = lambda: random.uniform(0, 100)

pedicab_locations = [(rng(), rng()) for _ in pedicabs]
pickup_locations = [(rng(), rng()) for _ in customers]
dropoff_locations = [(rng(), rng()) for _ in customers]

This function will help us visualize the routing problem and its solutions.

In [44]:
from bokeh.models import Range1d

def build_pedicabs_and_customers_plot():
    all_points = pedicab_locations + pickup_locations + dropoff_locations
    point_types = ['Pedicab'] * len(pedicabs) + \
                  ['Pickup'] * len(customers) + \
                  ['Dropoff'] * len(customers)

    plot = Scatter(
        {
            'x': [l[0] for l in all_points],
            'y': [l[1] for l in all_points],
            'type': point_types
        },
        x='x', y='y',
        color='type', marker='type',
        width=600, height=600
    )

    plot.legend.orientation = "horizontal"
    plot.x_range = plot.y_range = Range1d(-15, 115)

    return plot

So what's our problem look like right now?

In [45]:
plot = build_pedicabs_and_customers_plot()

# Connect pickups to dropoffs
for p, d in zip(pickup_locations, dropoff_locations):
    plot.line(
        [p[0], d[0]], [p[1], d[1]], 
        line_alpha=0.5, 
        line_color='grey', 
        line_dash='dashed'
    )
In [46]:
show(plot)

Column generation is a popular technique for solving problems like this. We generate columns (i.e. binary variables) that represent pre-defined routes. (Here we keep things simple and enumerate all allowable routes up front rather than pricing them out on the fly.)

  • $\Omega = \{\text{set of allowable routes}\}$
  • $\Omega_i = \{\text{set of routes covering customer i}\}$
  • $\Omega_j = \{\text{set of routes involving pedicab j}\}$
  • $c_r = \{\text{cost of route r}\}$
  • $x_r = 1$ if we use route $r$, and $0$ otherwise.

Our model minimizes overall cost while covering each customer and assigning no more than one route to each pedicab.

$$ \begin{align} \text{min} \ \ \ & \ \sum_r c_r x_r \ & \ \\ \text{s.t.} \ \ \ & \ \sum_{r \in \Omega_i} x_r = 1 & \ \forall\ i \\ & \ \sum_{r \in \Omega_j} x_r \le 1 & \ \forall\ j \\ & x_r \in \{0, 1\} \end{align} $$

We need to track routes, their costs, and what actors they are associated with.

In [47]:
from collections import defaultdict
import math

l2 = lambda pt1, pt2: math.sqrt(sum((x1-x2)**2 for x1, x2 in zip(pt1, pt2)))

routes = []
route_costs = {}

routes_by_pedicab = defaultdict(list)
routes_by_customer = defaultdict(list)

def add_route(pedicab, customers, locations):
    stops = [pedicab_locations[pedicab]] + list(locations)
    
    route_num = len(routes)
    
    routes.append(stops)
    route_costs[route_num] = sum(l2(s1, s2) for s1, s2 in zip(stops, stops[1:]))
    
    routes_by_pedicab[pedicab].append(route_num)
    for cust in customers:
        routes_by_customer[cust].append(route_num)

Generate $\Omega$! Let's assume we're not interested in assigning more than 2 customers to any pedicab.

In [48]:
from itertools import permutations

pickups_dropoffs = list(zip(pickup_locations, dropoff_locations))

for cab in pedicabs:
    # Single-customer routes
    for cust in customers:
        add_route(cab, [cust], pickups_dropoffs[cust])

    # Doubles
    for c1, c2 in permutations(customers, 2):
        p1, d1 = pickups_dropoffs[c1]
        p2, d2 = pickups_dropoffs[c2]
        
        add_route(cab, [c1, c2], [p1, p2, d1, d2])
        add_route(cab, [c1, c2], [p1, p2, d2, d1])
        add_route(cab, [c1, c2], [p1, d1, p2, d2])
     
len(routes)
Out[48]:
2387

Binary variables indicate which routes to use.

In [49]:
x = []
for r in range(len(routes)):
    x.append(pulp.LpVariable('x%d' % r, cat=pulp.LpBinary))

The entire constraint set to the problem:

In [50]:
prm = pulp.LpProblem('Pedicab Routing Model', sense=pulp.LpMinimize)

# No more than one route assigned to each cab
for cab_routes in routes_by_pedicab.values():
    prm += sum(x[r] for r in cab_routes) <= 1
    
# Each customer is picked up by exactly one pedicab
for cust_routes in routes_by_customer.values():
    prm += sum(x[r] for r in cust_routes) == 1

Objective = minimize total distance.

In [51]:
z = pulp.LpVariable('z', lowBound=0)
prm += z == sum(route_costs[r] * x[r] for r in range(len(routes)))

prm.setObjective(z)
assert prm.solve() == pulp.LpStatusOptimal

best_total_cost = z.value()
print('total cost:', best_total_cost)
total cost: 692.04026

We need a function to plot solutions.

In [52]:
def plot_routes():
    plot = build_pedicabs_and_customers_plot()

    # Show only the routes selected in the solution (x_r = 1).
    for r, stops in enumerate(routes):
        if x[r].value() < 0.5:
            continue

        plot.line(
            [s[0] for s in stops], 
            [s[1] for s in stops], 
            line_alpha=0.5, 
            line_color='grey', 
            line_dash='dashed'
        )
    
    show(plot)
In [53]:
plot_routes()

Some of our pedicabs complain that they have to travel much farther than others. Let's fix that by minimizing the maximum trip length.

In [54]:
z2 = pulp.LpVariable('z2', lowBound=0)
for r in range(len(routes)):
    prm += z2 >= route_costs[r] * x[r]

prm.setObjective(z2)
assert prm.solve() == pulp.LpStatusOptimal

print('total cost:', z.value())
print('max route length:', z2.value())
total cost: 916.95713
max route length: 155.61076
In [55]:
plot_routes()

That can hurt our global solution a lot. What if we're willing to accept up to a 10% increase in total cost in exchange for more balance among the individual routes?

In [56]:
# Only willing to accept 10% degradation in global optimality
prm += z <= best_total_cost * 1.1
assert prm.solve() == pulp.LpStatusOptimal

print('total cost:', z.value())
print('max route length:', z2.value())
total cost: 710.47738
max route length: 167.38201
In [57]:
plot_routes()

Where to find more information