In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

import numpy as np
from numpy import linalg

from sklearn.datasets import fetch_olivetti_faces

For this exercise, you will be making use of the Olivetti Faces dataset. Here is a description from the original website:

There are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).

In [2]:
# load faces dataset
faces = fetch_olivetti_faces(shuffle=True).data
downloading Olivetti faces from https://ndownloader.figshare.com/files/5976027 to /Users/rasa/scikit_learn_data
In [3]:
faces.shape  # 400 images, each 64 x 64 pixels
# note: each image is flattened into a 4096-dimensional vector
Out[3]:
(400, 4096)
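
As a small illustration of the flattening mentioned above, the sketch below builds a hypothetical 64 x 64 array (a stand-in, not an image from the dataset), flattens it into a 4096-dimensional vector, and reshapes it back. It assumes row-by-row (C-order) flattening, which is NumPy's default for `reshape`.

```python
import numpy as np

# Hypothetical stand-in for one 64 x 64 grayscale image (not taken from the dataset).
image = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)

# Flatten row by row into a single 4096-dimensional vector.
flat = image.reshape(-1)

# Reshape back to 64 x 64 to recover the original pixel grid.
restored = flat.reshape(64, 64)

print(flat.shape, restored.shape)
```

The same `reshape((64, 64))` call is what turns a row of `faces` back into something that can be displayed as an image.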
In [36]:
faces[np.random.randint(400)]  # inspect one randomly chosen flattened image
Out[36]:
array([0.42975205, 0.45041323, 0.49586776, ..., 0.30578512, 0.2892562 ,
       0.3140496 ], dtype=float32)

For this exercise, we will center the dataset by subtracting the mean face (the per-pixel mean over all 400 images) from every image.

In [37]:
faces = faces - faces.mean(axis=0)  # center data
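
To see what the centering step does, here is a minimal sketch on a small hypothetical matrix (`toy` is made up for illustration). Taking the mean over `axis=0` gives one value per column (per pixel, in the faces case), and broadcasting subtracts it from every row, so each column of the result should average to approximately zero.

```python
import numpy as np

rng = np.random.default_rng(0)
toy = rng.random((6, 4))          # hypothetical data: 6 samples, 4 features

mean_sample = toy.mean(axis=0)    # per-feature mean, shape (4,)
centered = toy - mean_sample      # broadcasting subtracts the mean from every row

# After centering, every feature (column) has mean approximately 0.
print(np.allclose(centered.mean(axis=0), 0.0))
```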

Below is a function that will help you plot images.

In [52]:
# Helper function: plots up to the first 15 images on a 3 x 5 grid.
def plot_images(images):
    images = images[:15, :]
    plt.figure(figsize=(10, 7))
    for i, image in enumerate(images):
        plt.subplot(3, 5, i + 1)
        # Symmetric color range, so that 0 maps to mid-gray for centered data.
        cmap_range = max(image.max(), -image.min())
        reshaped = image.reshape((64, 64))  # restore the 64 x 64 pixel grid
        plt.imshow(reshaped, vmin=-cmap_range, vmax=cmap_range,
                   cmap=plt.cm.gray, interpolation='nearest')
        plt.xticks(())
        plt.yticks(())

To get a sense of the dataset, let's first plot some of the images:

In [53]:
indexes = np.random.randint(0, 400, 15)  # 15 random image indices (repeats possible)
plot_images(faces[indexes])

Your task begins here:

In [ ]:
"""
In this box, compute the SVD of the Faces dataset. 
You may use the linalg module in numpy.
You should get three outputs, the left singular matrix U, 
a vector of singular values S, and the transpose of the right 
singular matrix, V_T. 
"""
def compute_svd(data):
    pass

U, S, V_T = compute_svd(faces)
In [ ]:
"""
In this box, print the dimensions of each of U, S, and V_T. 
Are these dimensions in line with your expectations?
"""
pass 
In [ ]:
"""
Now, print the first 15 elements of the vector of singular values.
How are they ordered?
"""
pass
In [ ]:
"""
Plot the first 15 right singular vectors of the Faces dataset.
What do you observe? 
"""
pass

How are these singular vectors related to the dataset? Think about the relationship between SVD and eigendecomposition.
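
As a hint, the relationship can be checked numerically on a small hypothetical matrix before reasoning about the faces data. The sketch below is only an illustration under stated assumptions: `X` is random toy data, not the dataset, and the comparison assumes the eigenvalues are distinct, which holds for generic random data. It compares the SVD of a centered `X` with the eigendecomposition of `X.T @ X`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))   # hypothetical toy data: 8 samples, 5 features
X = X - X.mean(axis=0)            # center, as done for the faces

# Reduced SVD: U is (8, 5), S is (5,), V_T is (5, 5).
U, S, V_T = np.linalg.svd(X, full_matrices=False)

# Eigendecomposition of the symmetric matrix X^T X.
# eigh returns eigenvalues in ascending order, so reorder them to descending.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Squared singular values compared with the eigenvalues of X^T X.
print(np.allclose(S**2, eigvals))

# Rows of V_T compared with the eigenvectors, up to a sign flip:
# the absolute inner product of each matching pair should be close to 1.
alignment = np.abs(np.sum(V_T * eigvecs.T, axis=1))
print(np.allclose(alignment, 1.0))
```

The sign ambiguity is why the check uses absolute inner products rather than comparing the vectors entry by entry.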

Write your answer HERE