import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat
pointsClass1 = loadmat('KernelPointsEx1class1.mat')['PointsEx1class1']
pointsClass2 = loadmat('KernelPointsEx1class2.mat')['PointsEx1class2']
plt.scatter(pointsClass1[:,0], pointsClass1[:,1], c='r')
plt.scatter(pointsClass2[:,0], pointsClass2[:,1], c='b')
plt.show()
Start by generating polynomial features using the function 'sklearn.preprocessing.PolynomialFeatures' from scikit-learn. Then, relying on the kernel trick and starting from the $\ell_2$ (OLS) loss, derive the dual formulation: an optimization problem over the coefficients $\lambda_i$ that express the weight vector $\mathbf{\beta}$ as the combination $\mathbf{\beta} = \sum_{i=1}^N \lambda_i \mathbf{\phi}(\mathbf{x}^{(i)})$. Once you have that loss, find the optimal coefficients $\lambda^*$ through gradient updates.
# put your code here
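One possible sketch of a solution, under stated assumptions: labels $t^{(i)} \in \{-1, +1\}$, and a synthetic stand-in for the two point clouds loaded above (the `.mat` files are not reproduced here). The kernel trick gives $K_{ij} = (1 + \mathbf{x}^{(i)T}\mathbf{x}^{(j)})^d = \mathbf{\phi}(\mathbf{x}^{(i)})^T\mathbf{\phi}(\mathbf{x}^{(j)})$, so the $\ell_2$ loss in the dual variables reads $\|K\mathbf{\lambda} - \mathbf{t}\|^2$ with gradient $2K(K\mathbf{\lambda} - \mathbf{t})$:

```python
import numpy as np

# Synthetic stand-in for the two point clouds loaded above
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)), rng.normal(1.0, 0.5, (20, 2))])
t = np.hstack([-np.ones(20), np.ones(20)])

def polynomial_kernel(X1, X2, degree=2):
    '''K_ij = (1 + x_i . x_j)^degree: the inner product of the polynomial
    feature maps, computed without ever building the features.'''
    return (1.0 + X1 @ X2.T) ** degree

# Dual l2 loss L(lam) = ||K lam - t||^2, with gradient 2 K (K lam - t)
K = polynomial_kernel(X, X)
lam = np.zeros(len(t))
eta = 0.5 / np.linalg.norm(K, 2) ** 2   # conservative step size for this quadratic loss
for _ in range(5000):
    lam = lam - eta * 2.0 * K @ (K @ lam - t)

accuracy = np.mean(np.sign(K @ lam) == t)
```

The step size is chosen from the spectral norm of $K$ so that the quadratic loss cannot diverge; on real data you would tune it (or use a line search) instead.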
Using the optimal coefficients, derive the classifier $y(\mathbf{x}) = \mathbf{\beta}^T\mathbf{\phi}(\mathbf{x})$, expressing the weight vector $\mathbf{\beta}$ as the combination $\mathbf{\beta} = \sum_{i=1}^N \lambda_i^* \mathbf{\phi}(\mathbf{x}^{(i)})$. Then display the boundary by using meshgrid.
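A hedged sketch of the boundary display, self-contained with its own synthetic data and with $\mathbf{\lambda}^*$ obtained here by a least-squares solve (any converged gradient iteration would do). The classifier only ever needs kernel evaluations, $y(\mathbf{x}) = \sum_i \lambda_i^* K(\mathbf{x}^{(i)}, \mathbf{x})$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)), rng.normal(1.0, 0.5, (20, 2))])
t = np.hstack([-np.ones(20), np.ones(20)])

def polynomial_kernel(X1, X2, degree=2):
    return (1.0 + X1 @ X2.T) ** degree

# Stand-in for the optimal dual coefficients (here: a least-squares fit)
lam = np.linalg.lstsq(polynomial_kernel(X, X), t, rcond=None)[0]

# y(x) = sum_i lam_i K(x_i, x): evaluate on a grid, then contour at level 0
xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
grid = np.column_stack([xx.ravel(), yy.ravel()])
yvals = (polynomial_kernel(grid, X) @ lam).reshape(xx.shape)

# Plotting (uncomment in the notebook):
# import matplotlib.pyplot as plt
# plt.contour(xx, yy, yvals, levels=[0.0], colors='k')
# plt.scatter(X[:, 0], X[:, 1], c=t)
# plt.show()
```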
We now want to make the problem a little more difficult. Use your implementation of the descent algorithm to learn a classifier for the dataset below. Progressively increase the number of features (i.e. increase the degree) and keep in mind that there is often an efficient way to compute the kernel matrix.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat
pointsClass1 = loadmat('KernelPointsEx2class1.mat')['PointsEx2class1']
pointsClass2 = loadmat('KernelPointsEx2class2.mat')['PointsEx2class2']
plt.scatter(pointsClass1[:,0], pointsClass1[:,1], c='r')
plt.scatter(pointsClass2[:,0], pointsClass2[:,1], c='b')
plt.show()
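The "efficient way" hinted at above is the kernel trick itself: $(1 + \mathbf{x}^T\mathbf{y})^d$ can be computed directly on the raw inputs instead of building the (much larger) polynomial feature vectors. A small check for degree 2, where the explicit feature map matching the kernel is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))

# Explicit feature map with phi(x).phi(y) = (1 + x.y)^2 =
# 1 + 2 x1 y1 + 2 x2 y2 + x1^2 y1^2 + 2 x1 x2 y1 y2 + x2^2 y2^2
def phi(x):
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

Phi = np.array([phi(x) for x in X])
K_explicit = Phi @ Phi.T            # O(N^2 D) with D explicit features
K_trick = (1.0 + X @ X.T) ** 2      # O(N^2 d) with d = 2 raw coordinates
```

The gap widens quickly: in 2 dimensions the degree-$d$ feature space has $\binom{d+2}{2}$ coordinates, while the kernel evaluation cost is independent of $d$.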
Use your gradient descent iterations on the $\mathbf{\lambda}_i$ to learn a classifier for the dataset below by relying on the Gaussian kernel.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat
pointsClass1 = loadmat('KernelPointsEx3class1.mat')['PointsEx3class1']
pointsClass2 = loadmat('KernelPointsEx3class2.mat')['PointsEx3class2']
plt.scatter(pointsClass1[:,0], pointsClass1[:,1], c='r')
plt.scatter(pointsClass2[:,0], pointsClass2[:,1], c='b')
plt.show()
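The same gradient iteration carries over unchanged; only the kernel matrix is different. A minimal sketch with a Gaussian kernel, again on a synthetic ring-versus-centre stand-in for the loaded points:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-in: an inner cluster (class -1) surrounded by a ring (class +1)
angles = rng.uniform(0, 2 * np.pi, 30)
X = np.vstack([rng.normal(0, 0.3, (30, 2)),
               np.column_stack([2 * np.cos(angles), 2 * np.sin(angles)])
               + rng.normal(0, 0.1, (30, 2))])
t = np.hstack([-np.ones(30), np.ones(30)])

def gaussian_kernel(X1, X2, sigma=1.0):
    '''K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), fully vectorized.'''
    sq = (np.sum(X1 ** 2, 1)[:, None] + np.sum(X2 ** 2, 1)[None, :]
          - 2 * X1 @ X2.T)
    return np.exp(-sq / (2 * sigma ** 2))

# Same dual gradient updates as before, only the kernel changes
K = gaussian_kernel(X, X)
lam = np.zeros(len(t))
eta = 0.5 / np.linalg.norm(K, 2) ** 2
for _ in range(5000):
    lam = lam - eta * 2.0 * K @ (K @ lam - t)

accuracy = np.mean(np.sign(K @ lam) == t)
```

The bandwidth `sigma` is a modelling choice: too small and every point becomes its own island, too large and the kernel degenerates toward a constant.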
Consider the dataset below. We would like to learn a classifier for this dataset that maximizes the margin (i.e. such that the distance from the separating plane to the closest points is maximized). We have seen that one can solve this problem by means of the constrained formulation
\begin{align*} \min_{\mathbf{\beta}} \quad & \|\mathbf{\beta}\|^2 \\ \text{subject to} \quad & y(\mathbf{x}^{(i)})t^{(i)} \geq 1 \end{align*}where $y(\mathbf{x}^{(i)}) = \mathbf{\beta}^T\mathbf{x}^{(i)} + \beta_0$. We might sometimes want to use a (softer) unconstrained formulation. In particular, when selecting this option, we can use the following function, known as the hinge loss
\begin{align*} \max(0, 1-t^{(i)}y(\mathbf{x}^{(i)})) = \max(0, 1-t^{(i)}(\mathbf{\beta}^T\mathbf{x}^{(i)}+\beta_0)) \end{align*}For such a loss, we can derive a softer, unconstrained version of the problem as
\begin{align*} \min_{\mathbf{\beta}} \quad & \|\mathbf{\beta}\|^2 + \frac{C}{N}\sum_{i=1}^N \max(0, 1-t^{(i)}(\mathbf{\beta}^T\mathbf{x}^{(i)}+\beta_0)) \end{align*}In short, we penalize a point only if it lies on the wrong side of the plane or strictly inside the margin.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat
pointsClass1 = loadmat('KernelPointsEx4class1.mat')['PointsEx4class1']
pointsClass2 = loadmat('KernelPointsEx4class2.mat')['PointsEx4class2']
plt.scatter(pointsClass1[:,0], pointsClass1[:,1], c='r')
plt.scatter(pointsClass2[:,0], pointsClass2[:,1], c='b')
plt.show()
Start by completing the function below, which should return the value and gradient of the hinge loss at a point $\mathbf{x}^{(i)}$. What is the gradient of the hinge loss? (Note that the loss is not differentiable at the kink, so a subgradient is needed there.)
def HingeLoss(x):
    '''Returns the value and gradient of the hinge
    loss at the point x'''
    return value, gradient
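A possible completion, written with a more explicit signature than the stub (the label `t` and the current `beta`, `beta0` are assumptions about how the function is meant to be called). Away from the kink the gradient of $\max(0, 1-t(\mathbf{\beta}^T\mathbf{x}+\beta_0))$ with respect to $\mathbf{\beta}$ is either $\mathbf{0}$ or $-t\,\mathbf{x}$:

```python
import numpy as np

def hinge_loss(x, t, beta, beta0):
    '''Value and a (sub)gradient, w.r.t. beta and beta0, of the hinge
    loss max(0, 1 - t * (beta^T x + beta0)) at a single point x.'''
    margin = t * (beta @ x + beta0)
    if margin >= 1.0:
        # correct side of the plane and outside the margin: no penalty
        return 0.0, np.zeros_like(beta), 0.0
    # active branch: d/dbeta = -t * x, d/dbeta0 = -t
    return 1.0 - margin, -t * x, -t
```

At `margin == 1` any convex combination of the two branches is a valid subgradient; returning the active branch there is a common convention.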
Once you have the function, implement a function HingeLossSVC that takes as input a starting weight vector $\mathbf{\beta}$ and intercept $\beta_0$, as well as the set of training points and a value for the parameter $C$, and returns the maximum margin classifier.
def HingeLossSVC(beta_init, beta0_init, training, C):
    '''Returns the maximal margin classifier for the
    training dataset'''
    return beta, beta0
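A sketch of one way to fill in the stub, assuming labels in $\{-1,+1\}$ and plain subgradient descent on the unconstrained objective $\|\mathbf{\beta}\|^2 + \frac{C}{N}\sum_i \max(0, 1-t^{(i)}(\mathbf{\beta}^T\mathbf{x}^{(i)}+\beta_0))$ (the step size and iteration count are illustrative, not tuned):

```python
import numpy as np

def hinge_loss_svc(beta_init, beta0_init, X, t, C, eta=0.01, iters=2000):
    '''Subgradient descent on ||beta||^2 + (C/N) sum_i hinge_i.
    X: (N, d) training points, t: (N,) labels in {-1, +1}.'''
    beta, beta0 = beta_init.astype(float).copy(), float(beta0_init)
    N = len(t)
    for _ in range(iters):
        margins = t * (X @ beta + beta0)
        active = margins < 1.0   # misclassified or inside the margin
        g_beta = 2 * beta - (C / N) * (t[active][:, None] * X[active]).sum(0)
        g_beta0 = -(C / N) * t[active].sum()
        beta -= eta * g_beta
        beta0 -= eta * g_beta0
    return beta, beta0

# Toy usage on two well-separated synthetic clouds
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
t = np.hstack([-np.ones(20), np.ones(20)])
beta, beta0 = hinge_loss_svc(np.zeros(2), 0.0, X, t, C=10.0)
accuracy = np.mean(np.sign(X @ beta + beta0) == t)
```

Only the points in the `active` set contribute to the hinge part of the subgradient, which matches the piecewise gradient derived for `HingeLoss` above.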
We would now like to find a maximal margin classifier based on the Gaussian kernel. Write the dual formulation and use the kernel trick to replace the inner products of the feature vectors with the (Gaussian) kernel matrix. The dual formulation is a quadratic program. In order to solve this problem, we will rely on the CVXOPT library. You can install this library from the terminal using the line 'pip install cvxopt'.
CVXOPT provides a quadratic solver which is parametrized by the six matrices and vectors $\mathbf{P}, \mathbf{q}, \mathbf{G}, \mathbf{h}, \mathbf{A}, \mathbf{b}$, which define the problem to be solved as
\begin{align*} \text{minimize} \quad & \frac{1}{2} \mathbf{x}^T\mathbf{P}\mathbf{x} + \mathbf{q}^T\mathbf{x}\\ \text{subject to}\quad & \mathbf{G}\mathbf{x} \preceq \mathbf{h}\\ &\mathbf{A}\mathbf{x} = \mathbf{b} \end{align*}Here the notation $\mathbf{G}\mathbf{x} \preceq \mathbf{h}$ denotes a componentwise inequality: every entry of $\mathbf{G}\mathbf{x}$ has to be less than or equal to the corresponding entry of $\mathbf{h}$.
Start by providing the definition of the Gaussian kernel in order to define the matrix $\mathbf{P}$
import numpy as np
def GaussianKernel(training, sigma):
    '''should return the kernel matrix K whose entry K_ij is defined
    as K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2))'''
    return K
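One way to complete the stub without a double loop, using the expansion $\|\mathbf{x}_i - \mathbf{x}_j\|^2 = \|\mathbf{x}_i\|^2 + \|\mathbf{x}_j\|^2 - 2\,\mathbf{x}_i^T\mathbf{x}_j$:

```python
import numpy as np

def GaussianKernel(training, sigma):
    '''Kernel matrix with K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)),
    computed from squared norms instead of an explicit double loop.'''
    sq_norms = np.sum(training ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * training @ training.T
    np.maximum(sq_dists, 0.0, out=sq_dists)   # guard against tiny negative round-off
    return np.exp(-sq_dists / (2 * sigma ** 2))
```

The clamping step matters in practice: floating-point cancellation can make the computed squared distance slightly negative for nearly identical points.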
Relying on the dual formulation and on the Gaussian kernel, provide a sensible definition for each of $\mathbf{P}, \mathbf{q}, \mathbf{G}, \mathbf{h}, \mathbf{A}, \mathbf{b}$ and solve the quadratic program using CVXOPT.
import numpy as np
from cvxopt import matrix, solvers
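A hedged sketch of one sensible mapping, assuming the standard soft-margin SVM dual: minimize $\frac{1}{2}\mathbf{\lambda}^T\mathbf{P}\mathbf{\lambda} - \mathbf{1}^T\mathbf{\lambda}$ with $P_{ij} = t^{(i)}t^{(j)}K_{ij}$, subject to $0 \leq \lambda_i \leq C$ and $\sum_i \lambda_i t^{(i)} = 0$. The arrays are built with NumPy; the CVXOPT call itself is shown commented so the construction can be inspected even without the library installed:

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1.0, 0.5, (10, 2)), rng.normal(1.0, 0.5, (10, 2))])
t = np.hstack([-np.ones(10), np.ones(10)])
N, C, sigma = len(t), 1.0, 1.0

# Gaussian kernel matrix
sq = np.sum(X ** 2, 1)[:, None] + np.sum(X ** 2, 1)[None, :] - 2 * X @ X.T
K = np.exp(-sq / (2 * sigma ** 2))

# Dual QP: min_l 0.5 l^T P l + q^T l  s.t.  G l <= h,  A l = b
P = np.outer(t, t) * K                     # (N, N), positive semidefinite
q = -np.ones(N)                            # maximizing sum_i l_i
G = np.vstack([-np.eye(N), np.eye(N)])     # encodes -l_i <= 0 and l_i <= C
h = np.hstack([np.zeros(N), C * np.ones(N)])
A = t.reshape(1, N)                        # equality: sum_i l_i t_i = 0
b = np.zeros(1)

# With CVXOPT installed, the solve step would look like:
# from cvxopt import matrix, solvers
# sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h),
#                  matrix(A), matrix(b))
# lam = np.ravel(sol['x'])
```

Note that CVXOPT expects its own `matrix` type (column-major doubles), hence the `matrix(...)` wrappers, and that the box constraint $0 \leq \lambda_i \leq C$ is what turns the hard-margin dual into the soft-margin one.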