#!/usr/bin/env python
# coding: utf-8

# # Introduction to Machine Learning, Summer 2022

#

# # Assignment 2

# # __Given date:__ June 13
# 
# # __Due date:__ June 21
# 
# # __Total: 25pts__
# 
# 
# ### Question 1 Logistic regression (15pts)
# 
# #### Question 1.1 Logistic regression (5pts)
# 
# As we saw during the lectures, one approach to learning a (binary) linear discriminant is to combine the sigmoid activation function with the linear discriminant $\beta_0 + \boldsymbol{\beta}^T \mathbf{x}$. We then assume that the probability of observing a particular target ($0$ vs $1$) follows a Bernoulli distribution with parameter $\sigma(\tilde{\boldsymbol{\beta}}^T\tilde{\mathbf{x}})$, where $\tilde{\mathbf{x}}$ is the vector $\mathbf{x}$ extended with a constant entry equal to $1$ and $\tilde{\boldsymbol{\beta}}$ includes the bias $\beta_0$ (we drop the tildes below to lighten the notation). I.e. we have
# 
# $$\left\{\begin{array}{l}
# P(t = 1|\mathbf{x}) = \sigma(\boldsymbol{\beta}^T\mathbf{x})\\
# P(t = 0|\mathbf{x}) = 1-\sigma(\boldsymbol{\beta}^T\mathbf{x})\end{array}\right.$$
# 
# The total density then reads as the product of the independent densities,
# 
# $$P\left(\left\{t^{(i)}\right\}_{i=1}^N\right) = \prod_{i=1}^N \sigma(\boldsymbol{\beta}^T\mathbf{x}^{(i)})^{t^{(i)}}\left(1-\sigma(\boldsymbol{\beta}^T\mathbf{x}^{(i)})\right)^{1-t^{(i)}}$$
# 
# We can then take the log and compute the derivatives of the resulting expression with respect to each weight $\beta_j$. Implement this approach below. Recall that the derivative of the sigmoid $\sigma(x)$ has a _simple expression_.

# In[ ]:

# Step 1: define the sigmoid activation and its derivative

def sigmoid(x):
    '''The function should return the sigmoid and its derivative at all the entries of x'''
    return sig, deriv_sig


def solve_logisticRegression(xi, ti, beta0, maxIter, eta):
    '''The function should return the vector of weights for a logistic regression classifier
    learned through gradient descent iterations applied to the log likelihood function'''
    return beta

# #### Question 1.2 Logistic regression and Fisher scoring (5pts)
# 
# An interesting aspect of the MLE estimator in logistic regression (as opposed to other objective functions) is that the Hessian of the negative log likelihood is positive semi-definite, so the problem is convex. We can thus improve the iterations by using a second order method (such as Newton's method), where the simpler gradient iterations $\boldsymbol{\beta}^{k+1}\leftarrow \boldsymbol{\beta}^k - \eta\nabla \ell(\boldsymbol{\beta}^k)$ are replaced by
# 
# $$\boldsymbol{\beta}^{k+1}\leftarrow \boldsymbol{\beta}^k - \eta H^{-1}({\boldsymbol{\beta}^k})\nabla \ell(\boldsymbol{\beta}^k)$$
# 
# (see e.g. [here](https://statacumen.com/teach/SC1/SC1_11_LogisticRegression.pdf) for more details). Start by completing the function 'HessianMLE' below, which should return the Hessian of the negative log likelihood.

# In[ ]:

def HessianMLE(beta, xi, ti):
    '''The function should return the Hessian (see https://en.wikipedia.org/wiki/Hessian_matrix)
    of the negative log likelihood at a particular value of the weights beta'''
    return HessianMatrix

# Then complete the function 'Fisher_scoring', which should learn a logistic regression classifier based on the second order Fisher scoring iterations.

# In[ ]:

def Fisher_scoring(beta0, xi, ti, maxIter, eta):
    '''The function should compute the logistic regression classifier by relying on Fisher scoring.
    The iterates should start at beta0 and use the learning rate eta'''
    beta = beta0
    numIter = 0
    while numIter < maxIter:
        # complete the Fisher scoring update here
        numIter += 1
    return beta
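# As a sanity check for the two functions above (this is a standard derivation, not an additional requirement), recall that the sigmoid satisfies $\sigma'(z) = \sigma(z)(1-\sigma(z))$. Using this identity, the gradient and the Hessian of the negative log likelihood take the compact forms
# 
# $$\nabla \ell(\boldsymbol{\beta}) = \sum_{i=1}^N \left(\sigma(\boldsymbol{\beta}^T\mathbf{x}^{(i)}) - t^{(i)}\right)\mathbf{x}^{(i)}, \qquad H(\boldsymbol{\beta}) = \sum_{i=1}^N \sigma(\boldsymbol{\beta}^T\mathbf{x}^{(i)})\left(1-\sigma(\boldsymbol{\beta}^T\mathbf{x}^{(i)})\right)\mathbf{x}^{(i)}{\mathbf{x}^{(i)}}^T.$$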
# ### Question 2. Convolutional neural networks (10pts)
# 
# An example of each sign is given below.

# In[2]:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

img1 = mpimg.imread('1/00001_00000_00012.png')
plt.subplot(141)
plt.imshow(img1)
plt.axis('off')

plt.subplot(142)
img2 = mpimg.imread('2/00014_00001_00019.png')
plt.imshow(img2)
plt.axis('off')

plt.subplot(143)
img3 = mpimg.imread('3/00035_00008_00023.png')
plt.imshow(img3)
plt.axis('off')

plt.subplot(144)
img4 = mpimg.imread('4/00039_00000_00029.png')
plt.imshow(img4)
plt.axis('off')

plt.show()

# ### Question 2.1. Constructing the network (5pts)
# 
# In this first part, we will set up the convolutional net step by step.
# 
# - Before building the network, you should start by cropping the images so that they all have a common predefined size (take the smallest size across all images).
# 
# - We will use a __Sequential model__ from Keras, but it will be up to you to define the final structure of the network. The construction of a sequential model starts with the following lines.

# In[ ]:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense  # layers used below

model = Sequential()

# #### 2.1.a. Convolutions
# 
# - We will use a __convolutional__ architecture. You can add convolutional layers to the model by using the following line

# In[ ]:

model.add(Conv2D(num_units, (filter_size1, filter_size2), padding='same',
                 input_shape=(IMG_SIZE, IMG_SIZE, 3),  # (height, width, channels) in Keras' default channels-last format
                 activation='relu'))

# for the first layer and

# In[ ]:

model.add(Conv2D(filters, filter_size, activation=activation))

# for all the other layers. The 'filters' parameter indicates the number of filters you want to use in the layer, 'filter_size' encodes the size of each filter, and 'activation' can be used to specify the activation function that will be applied to the output of the layer, i.e.
# 
# $$x_{\text{out}} = \sigma(\text{filter}*\text{input}).$$
# 
# Finally, 'input_shape' encodes the size of the input. Note that the input layer is the only layer for which the input size should be explicitly specified. Subsequent layers automatically compute the size of their inputs from the outputs of the previous layers.

# #### 2.1.b. Pooling layers
# 
# On top of the convolutional layers, convolutional neural networks (CNNs) also involve __pooling layers__. Such a layer can be added through the following line

# In[ ]:

model.add(MaxPooling2D(pool_size=(filter_sz1, filter_sz2), strides=None))

# The __pooling layers__ come with two parameters: the 'pool_size' and the 'strides'. The basic choice for the pool size is (2, 2), and the strides parameter is usually set to None (which means the stride defaults to the pool size, so the image is split into non-overlapping regions such as in the Figure below). You should however feel free to play a little with those parameters. A __Max Pooling operator__ slides a mask of size 'pool_size' over the image by a number of pixels equal to the stride parameters (in x and y, there are hence two translation parameters). For each position of the mask, the output returns the max of the pixels appearing in the mask (again, see the Figure below). One way to understand the effect of a pooling operator is that when a filter detects an edge in a subregion of the image (thus returning at least one large value), the MaxPooling operation reduces the resolution but keeps track of this information.
# 
# Adding 'MaxPooling' layers is known to work well in practice for image processing tasks.
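# To make the pooling operation concrete, here is a small illustration (not part of the assignment) of a (2, 2) max pooling with the default stride applied to a toy 4x4 single-channel "image". The array values are arbitrary and chosen only for the example.

# In[ ]:

import numpy as np
from tensorflow.keras.layers import MaxPooling2D

# A toy 4x4 "image" with a single channel, reshaped to (batch, height, width, channels).
toy = np.array([[1, 3, 2, 0],
                [4, 8, 1, 1],
                [0, 2, 5, 7],
                [1, 0, 6, 2]], dtype=np.float32).reshape(1, 4, 4, 1)

# With pool_size=(2, 2) and strides=None, the image is split into non-overlapping
# 2x2 blocks and only the maximum of each block is kept.
pooled = MaxPooling2D(pool_size=(2, 2), strides=None)(toy)
print(pooled.numpy().reshape(2, 2))
# [[8. 2.]
#  [2. 7.]]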
# Although it is up to you to decide how you want to structure the network, a good start is to add a couple (definitely not exceeding 4) of (convolution, convolution, pooling) combinations with an increasing number of units per layer (you can for example consider a number of units increasing according to powers of 2, such as 16, 32, 128, ...).

# #### 2.1.c. Flattening and fully connected layers
# 
# Once you have stacked the convolutional and pooling layers, you should flatten the output through a line of the form

# In[ ]:

model.add(Flatten())

# and add a couple (no need to put more than 2 or 3) of dense, fully connected layers through lines of the form

# In[ ]:

model.add(Dense(num_units, activation='relu'))

# #### 2.1.d. Concluding
# 
# Since there are four possible signs, you need to __finish your network with a dense layer consisting of 4 units__. Each of those units should output a number between 0 and 1 representing the likelihood that the corresponding sign is detected. Those numbers should therefore satisfy $n_1 + n_2 + n_3 + n_4 = 1$ (hopefully with one $n_i$ larger than the others). For this reason, a good choice for the __final activation function__ of those four units is the __softmax__ (why?).
# 
# Build your model below.

# In[ ]:

model = Sequential()

# construct the model using convolutional layers, pooling layers and dense fully connected layers

# ### Question 2.2. Setting up the optimizer (3pts)
# 
# Once you have found a good architecture for your network, split the dataset, retaining about 90% of the images for training and 10% for test. To train the network in Keras, we need two more steps. The first step is to set up the optimizer. Here again it is up to you to decide how you want to set up the optimization. Two popular approaches are __SGD and Adam__. You will get to choose the learning rate (although it is a good idea to take it between 1e-3 and 1e-2). Once you have set up the optimizer, you need to specify the loss (we will take it to be the __categorical cross entropy__, which is the extension of the log loss to the multiclass problem).

# In[ ]:

from tensorflow.keras.optimizers import SGD
from tensorflow.keras.optimizers import Adam

# set up the optimizer here
# Myoptimizer = SGD(...)
# Myoptimizer = Adam(...)

model.compile(loss='categorical_crossentropy', optimizer=Myoptimizer, metrics=['accuracy'])

# ### Question 2.3 Optimization (2pts)
# 
# Our last step consists in fitting the network to the training set. Just as for any implementation in scikit-learn, we will rely on the function 'fit'. In image processing tasks, the training of convolutional neural networks is usually done by splitting the dataset into minibatches and using a different batch for each SGD iteration. This process is repeated over the whole dataset. A complete pass over the dataset is known as an 'epoch', and the complete training step repeats several epochs. In Keras, the number of epochs is stored in the 'epochs' parameter of the function 'fit', and the batch size is stored in the 'batch_size' parameter. Plot the evolution of the loss through the SGD iterations.

# In[ ]:

from sklearn.model_selection import train_test_split

batch_size = 32
epochs = 30

X_train, X_test, t_train, t_test = train_test_split(X, t, test_size=0.1, random_state=1)

history = model.fit(X_train, t_train, batch_size=batch_size, epochs=epochs, validation_split=0.15)
model.evaluate(X_test, t_test)
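# A possible way to produce the requested plot: 'model.fit' returns a History object (captured in the variable 'history' above), whose 'history' attribute is a dictionary holding one value per epoch for each tracked metric. A minimal sketch, assuming the training cell above has been run, is given below.

# In[ ]:

import matplotlib.pyplot as plt

# One value per epoch for the training and the validation loss.
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('categorical cross entropy')
plt.legend()
plt.show()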