#!/usr/bin/env python
# coding: utf-8

# # CSCI-UA 9473 Introduction to Machine Learning
#
# ### Spring 2022
#
# ## Assignment 3: Convolutional nets, SVM and Robust PCA
#
# #### Given date: Wednesday April 5
# #### Due date: Sunday April 24
#
# #### Total: 30pts
# ## Additional readings (to go further)
#
# - [Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning](https://www.deeplearningbook.org/)
# - [Saharon Rosset, Ji Zhu and Trevor Hastie, Margin Maximizing Loss Functions](https://web.stanford.edu/~hastie/Papers/margmax1.pdf)
#
# The assignment is divided into three parts. In the first part, we go back to neural networks: you will be asked to build and train a convolutional neural network for image classification. In the second part, we focus on the max margin classifier and study how such a classifier can be learned by means of gradient descent. Finally, in the last part, we will implement a principal component decomposition of a video sequence to extract moving targets from their background.

# ## Question I: (15pts) Conv nets and autonomous driving
#
# In this first question, we will use [the Keras API](https://keras.io/) to build and train a convolutional neural network to discriminate between four types of road signs. To simplify, we will consider the detection of 4 different signs:
#
# - A '30 km/h' sign (folder 1)
# - A 'Stop' sign (folder 2)
# - A 'Go straight' sign (folder 3)
# - A 'Keep left' sign (folder 4)
#
# An example of each sign is given below.

# In[2]:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

img1 = mpimg.imread('1/00001_00000_00012.png')
plt.subplot(141)
plt.imshow(img1)
plt.axis('off')
plt.subplot(142)
img2 = mpimg.imread('2/00014_00001_00019.png')
plt.imshow(img2)
plt.axis('off')
plt.subplot(143)
img3 = mpimg.imread('3/00035_00008_00023.png')
plt.imshow(img3)
plt.axis('off')
plt.subplot(144)
img4 = mpimg.imread('4/00039_00000_00029.png')
plt.imshow(img4)
plt.axis('off')
plt.show()

# ### Question I.1. (10pts)
#
# In this exercise, you need to build and train a convolutional neural network to discriminate between the four signs.
#
# - Before building the network, you should start by cropping the images so that they all have a common predefined size (take the smallest size across all images); see the preprocessing sketch after the next cell.
#
# - We will use a __Sequential model__ from Keras, but it will be up to you to define the structure of the convolutional net. Initialization of the sequential model can be done with the following lines.

# In[ ]:

from tensorflow.keras.models import Sequential

model = Sequential()
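# As a starting point, here is a minimal sketch of one possible preprocessing
# pipeline for Question I.1 (including the 90/10 split requested in Question
# I.2 below). It assumes the images sit in the folders '1' to '4' shown above;
# the center-cropping strategy, the use of PIL and scikit-learn, and all
# variable names are choices of this sketch, not requirements.

# In[ ]:

import glob
import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split

folders = ['1', '2', '3', '4']
paths = [(p, label) for label, folder in enumerate(folders)
         for p in sorted(glob.glob(folder + '/*.png'))]

# Smallest size across all images (PIL's size attribute is (width, height))
sizes = [Image.open(p).size for p, _ in paths]
min_w = min(s[0] for s in sizes)
min_h = min(s[1] for s in sizes)

def center_crop(img, w, h):
    '''Crop a PIL image to (w, h) around its center.'''
    left = (img.size[0] - w) // 2
    top = (img.size[1] - h) // 2
    return img.crop((left, top, left + w, top + h))

# Stack the cropped images into an array and rescale pixels to [0, 1]
X = np.stack([np.asarray(center_crop(Image.open(p), min_w, min_h),
                         dtype=np.float32) / 255.0 for p, _ in paths])
labels = np.array([label for _, label in paths])

# One-hot targets, as required by the categorical cross entropy used later
t = np.eye(4)[labels]

# 90/10 train/test split, stratified so each folder is split separately
X_train, X_test, t_train, t_test = train_test_split(
    X, t, test_size=0.1, stratify=labels, random_state=0)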
# #### I.1.a. Convolutions
#
# We will use a __convolutional__ architecture. You can add convolutional layers to the model by using the following lines

# In[ ]:

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# channels-last ordering (height, width, 3), which is the Keras default and
# matches the images read above
model.add(Conv2D(num_units, (filter_size1, filter_size2), padding='same',
                 input_shape=(IMG_SIZE, IMG_SIZE, 3), activation='relu'))

# for the first layer and

# In[ ]:

model.add(Conv2D(filters, filter_size, activation=activation))

# for all the others. 'filters' indicates the number of filters you want to use in the convolutional layer, filter_size is the size of each filter, and activation is the usual activation that comes on top of the convolution, i.e. $x_{\text{out}} = \sigma(\text{filter}*\text{input})$. Finally, input_shape indicates the size of your input. Note that only the input layer should be given the input size; subsequent layers automatically compute the size of their inputs based on the previous layers.

# #### I.1.b. Pooling layers
#
# On top of the convolutional layers, convolutional neural networks (CNNs) also often rely on __pooling layers__. Such a layer can be added through the following line.

# In[ ]:

model.add(MaxPooling2D(pool_size=(filter_sz1, filter_sz2), strides=None))

# The _pooling layers_ usually come with two parameters: the 'pool size' and the 'stride'. The basic choice for the pool size is (2, 2), and the stride is usually set to None (which means the image is split into non-overlapping regions, as in the figure below). You should however feel free to play a little with those parameters. The __MaxPool operator__ considers a mask of size 'pool_size' which is slid over the image by a number of pixels equal to the stride parameter (in x and y; there are hence two translation parameters). For each position of the mask, the output only retains the max of the pixels appearing in the mask (this idea is illustrated below). One way to understand the effect of the pooling operator is that if a filter detects an edge in a subregion of the image (thus returning at least one large value), a MaxPooling layer will reduce the number of parameters while keeping track of this information.
#
# Adding 'MaxPooling' layers is known to work well in practice.
#
# Although it is a little bit up to you to decide how you want to structure the network, a good start is to add a few (definitely not exceeding 4) combinations of the form (convolution, convolution, pooling) with an increasing number of filters (e.g. increasing powers of two such as 16, 32, 64, ...).

# #### I.1.c. Flattening and fully connected layers
#
# Once you have stacked the convolutional and pooling layers, you should flatten the output through a line of the form

# In[ ]:

model.add(Flatten())

# and add a couple (no need to put more than 2 or 3) of dense, fully connected layers through lines of the form

# In[ ]:

model.add(Dense(num_units, activation='relu'))

# #### I.1.d. Concluding
#
# Since there are four possible signs, you need to __finish your network with a dense layer of 4 units__. Together, those units should output four numbers between 0 and 1, representing the likelihood that each of the four signs is detected, and such that $p_1 + p_2 + p_3 + p_4 = 1$ (hopefully with one probability much larger than the others). For this reason, a good choice for the __final activation function__ of those four units is the __softmax__ (why?).
#
# Build your model below; one possible architecture is sketched after this cell, purely for illustration.

# In[ ]:

model = Sequential()

# construct the model using convolutional, pooling and dense fully connected layers
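# For illustration only, here is one (hedged) possible instance of such an
# architecture; the depth, the filter counts, the kernel sizes and the value
# IMG_SIZE = 30 are assumptions of this sketch, and you should design and tune
# your own network.

# In[ ]:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

IMG_SIZE = 30  # hypothetical common size obtained from the cropping step

example_model = Sequential()
# Block 1: two convolutions followed by a pooling layer
example_model.add(Conv2D(16, (3, 3), padding='same', activation='relu',
                         input_shape=(IMG_SIZE, IMG_SIZE, 3)))
example_model.add(Conv2D(16, (3, 3), padding='same', activation='relu'))
example_model.add(MaxPooling2D(pool_size=(2, 2)))
# Block 2: the same pattern with twice as many filters
example_model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
example_model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
example_model.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten, then a couple of dense layers, ending with 4 softmax units
example_model.add(Flatten())
example_model.add(Dense(64, activation='relu'))
example_model.add(Dense(4, activation='softmax'))
example_model.summary()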
# ### Question I.2 (3pts). Setting up the optimizer
#
# Once you have found a good architecture for your network, split the dataset, retaining about 90% of the images of each folder for training and 10% for test. To train your network in Keras, we need two more steps. The first step is to set up the optimizer. Here again, it is a little bit up to you to decide how you want to set up the optimization. Two popular approaches are __SGD and Adam__. You get to choose the learning rate; this rate should however lie between 1e-3 and 1e-2. Once you have set up the optimizer, we need to set up the optimization parameters. This includes the loss, which we will take to be the __categorical cross entropy__ (the extension of the log loss to the multiclass problem).

# In[ ]:

from tensorflow.keras.optimizers import SGD
from tensorflow.keras.optimizers import Adam

# Set up the optimizer here (the learning rate below is only an example)
# Myoptimizer = SGD(learning_rate=1e-3)
Myoptimizer = Adam(learning_rate=1e-3)

model.compile(loss='categorical_crossentropy',
              optimizer=Myoptimizer,
              metrics=['accuracy'])

# ### Question I.3 (2pts). Optimization
#
# The last step is to fit the network to your data. Just as with any estimator in scikit-learn, we use a call to the function 'fit'. The training of neural networks can be done by splitting the dataset into minibatches and using a different batch at each SGD step. This process is repeated over the whole dataset; a complete screening of the dataset is called an epoch, and we can then repeat this idea several times. In Keras, the number of epochs is stored in the 'epochs' parameter and the batch size is stored in the 'batch_size' parameter.

# In[ ]:

batch_size = 32
epochs = 30

# X contains the cropped training images and t the one-hot encoded labels
model.fit(X, t, batch_size=batch_size, epochs=epochs, validation_split=0.2)

# ## Question II (10pts): Max margin classifiers and outliers
#
# Consider the dataset below. We would like to learn a classifier for this dataset that maximizes the margin (i.e. such that the distance between the closest points and the classifier is maximized). We have seen that one can solve this problem by means of the constrained formulation
#
# \begin{align*}
# \min_{\mathbf{\beta}} \quad & \|\mathbf{\beta}\|^2 \\
# \text{subject to} \quad & y(\mathbf{x}^{(i)})t^{(i)} \geq 1
# \end{align*}
#
# where $y(\mathbf{x}^{(i)}) = \mathbf{\beta}^T\mathbf{x}^{(i)} + \beta_0$. We might sometimes want to use a (softer) unconstrained formulation. In particular, when selecting this option, we can use the following function, known as the _hinge loss_:
#
# \begin{align*}
# \max(0, 1-t^{(i)}y(\mathbf{x}^{(i)})) = \max(0, 1-t^{(i)}(\mathbf{\beta}^T\mathbf{x}^{(i)}+\beta_0))
# \end{align*}
#
# For such a loss, we can derive a softer, unconstrained version of the problem as
#
# \begin{align*}
# \min_{\mathbf{\beta}} \quad & \|\mathbf{\beta}\|^2 + \frac{C}{N}\sum_{i=1}^N \max(0, 1-t^{(i)}(\mathbf{\beta}^T\mathbf{x}^{(i)}+\beta_0))
# \end{align*}
#
# In short, a point is penalized only if it lies inside the margin or on the wrong side of the decision boundary.

# In[2]:

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat

pointsClass1 = loadmat('KernelPointsEx4class1.mat')['PointsEx4class1']
pointsClass2 = loadmat('KernelPointsEx4class2.mat')['PointsEx4class2']

plt.scatter(pointsClass1[:,0], pointsClass1[:,1], c='r')
plt.scatter(pointsClass2[:,0], pointsClass2[:,1], c='b')
plt.show()

# ### Question II.1 (3pts)
#
# Start by completing the function below, which should return the value and gradient of the hinge loss at a point $\mathbf{x}^{(i)}$. What is the gradient of the hinge loss?

# In[ ]:

def HingeLoss(x):
    '''Returns the value and gradient of the hinge loss at the point x'''

    return value, gradient

# ### Question II.2 (7pts)
#
# Once you have the function, implement a function HingeLossSVC that takes as input a starting weight vector $\mathbf{\beta}$ and intercept $\beta_0$, as well as the set of training points and a value for the parameter $C$, and returns the maximum margin classifier.

# In[ ]:

def HingeLossSVC(beta_init, beta0_init, training, C):
    '''Returns the maximal margin classifier for the training dataset'''

    return beta, beta0

# ## Question III (5pts): Segmentation with K-means
#
# Upload a picture of yourself (possibly downsampled) and apply a K-means segmentation in RGB space for a few distinct numbers of centroids (e.g. 5, 10, 20). A sketch of one possible approach is given after the cell below.

# In[ ]:
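# A minimal sketch of one possible K-means segmentation in RGB space. The file
# name 'me.png' is a placeholder for your own picture, and the use of
# scikit-learn's KMeans is a choice of this sketch, not a requirement.

# In[ ]:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from sklearn.cluster import KMeans

img = mpimg.imread('me.png')[:, :, :3]   # keep RGB, drop alpha if present
h, w, _ = img.shape
pixels = img.reshape(-1, 3)              # one row per pixel in RGB space

for i, k in enumerate([5, 10, 20]):
    kmeans = KMeans(n_clusters=k, n_init=10).fit(pixels)
    # Replace each pixel by the centroid of its cluster
    segmented = kmeans.cluster_centers_[kmeans.labels_].reshape(h, w, 3)
    plt.subplot(1, 3, i + 1)
    plt.imshow(segmented)
    plt.axis('off')
    plt.title('K = %d' % k)
plt.show()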