This is a Python rendition of principal component analysis (PCA) applied to facial recognition, using the Extended Yale Face Database B, which you can download here. The analysis was originally done in R; this version was written to experiment with the scikit-learn (sklearn) library. If you have any questions about this notebook, please do not hesitate to contact me at ryan.quan08@gmail.com.
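For orientation, the core of PCA on a matrix whose rows are flattened face images can be sketched with NumPy alone: subtract the average face, take the SVD of the centered matrix, and treat the right singular vectors as the principal directions ("eigenfaces"). This is a minimal sketch under that assumption, not the notebook's sklearn-based method; pca_svd is a hypothetical helper name, and random data stands in for the real face matrix.

```python
import numpy as np

def pca_svd(X, k):
    """Project the rows of X (one flattened image per row) onto the top-k
    principal components, computed from the SVD of the centered data."""
    mean = X.mean(axis=0)                 # the "average face"
    Xc = X - mean                         # center every pixel column
    # Rows of Vt are orthonormal principal directions, sorted by singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                   # k x (H*W) "eigenfaces"
    scores = Xc @ components.T            # n x k coordinates in face space
    explained_var = S[:k] ** 2 / (X.shape[0] - 1)  # variance of each score column
    return mean, components, scores, explained_var

# Random data stands in for the real (n_images x H*W) face matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 12 * 10))
mean, comps, scores, var = pca_svd(X, k=5)
print(comps.shape, scores.shape)  # (5, 120) (20, 5)
```

sklearn.decomposition.PCA performs the same centering and SVD internally, so its components should agree with these up to the sign of each component.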
%matplotlib inline
import numpy as np
import os
import matplotlib.pyplot as plt
import math
from PIL import Image
!pwd
data_dir = "/Users/Quan/GitHub/sklearn-practice/CroppedYale"
os.chdir(data_dir)
!ls
/Users/Quan/github/sklearn-practice
yaleB01 yaleB05 yaleB09 yaleB13 yaleB18 yaleB22 yaleB26 yaleB30 yaleB34 yaleB38
yaleB02 yaleB06 yaleB10 yaleB15 yaleB19 yaleB23 yaleB27 yaleB31 yaleB35 yaleB39
yaleB03 yaleB07 yaleB11 yaleB16 yaleB20 yaleB24 yaleB28 yaleB32 yaleB36
yaleB04 yaleB08 yaleB12 yaleB17 yaleB21 yaleB25 yaleB29 yaleB33 yaleB37
The CroppedYale images come as .pgm files. If your files are not already in .png format, you can use this block of code to convert them with ImageMagick's convert command, called from the shell. Credits to UCSD_Big_Data for this block of code.
# converting from pgm to png (requires ImageMagick's convert)
# from glob import glob
# %cd $data_dir
# files = glob('yaleB*/*.pgm')
# print('number of files is', len(files))
# count = 0
# for f in files:
#     new_f = f[:-3] + 'png'
#     !convert $f $new_f
#     count += 1
#     if count % 100 == 0:
#         print(count, f, new_f)
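If ImageMagick is not installed, the same conversion can likely be done in pure Python with Pillow, which the notebook already imports and which can read PGM files. A minimal sketch, assuming the same yaleB*/*.pgm folder layout as the commented block; convert_pgm_to_png is a hypothetical helper name.

```python
import os
from glob import glob
from PIL import Image

def convert_pgm_to_png(data_dir):
    """Convert every yaleB*/*.pgm under data_dir into a .png saved next to it.
    Returns the list of PNG paths that were written."""
    written = []
    for f in sorted(glob(os.path.join(data_dir, "yaleB*", "*.pgm"))):
        new_f = f[:-3] + "png"            # same name, .png extension
        with Image.open(f) as im:
            im.save(new_f)                # output format inferred from the extension
        written.append(new_f)
    return written

# Usage in the notebook: convert_pgm_to_png(data_dir)
```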
The image_grid function reshapes each row of the array back into an H x W image and plots the images in a grid. The cols parameter sets the number of images in each row.
def image_grid(D, H, W, cols=10, scale=1):
    """Display the rows of D as a grid of grayscale images.
    D: array with one flattened image per row
    H, W: height and width of the images
    cols: number of columns = number of images in each row
    scale: 1 to fill the screen
    """
    n = np.shape(D)[0]
    rows = int(math.ceil(n / float(cols)))
    plt.figure(1, figsize=[scale * 20.0 / H * W, scale * 20.0 / cols * rows], dpi=300)
    for i in range(n):
        plt.subplot(rows, cols, i + 1)
        # restore the flattened row to its original H x W shape
        plt.imshow(np.reshape(D[i, :], [H, W]), cmap=plt.get_cmap("gray"))
        plt.axis('off')
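Because image_grid creates one subplot per image, a large array means many axes at dpi=300. An alternative, sketched here as a hypothetical make_mosaic helper (not part of the original notebook), tiles every reshaped image into a single 2-D array so that plt.imshow only has to be called once.

```python
import math
import numpy as np

def make_mosaic(D, H, W, cols=10, fill=0.0):
    """Tile the rows of D (each a flattened H x W image) into one 2-D array,
    left to right and top to bottom, padding unused cells with `fill`."""
    n = D.shape[0]
    rows = math.ceil(n / cols)
    mosaic = np.full((rows * H, cols * W), fill, dtype=float)
    for i in range(n):
        r, c = divmod(i, cols)            # grid position of image i
        mosaic[r * H:(r + 1) * H, c * W:(c + 1) * W] = D[i].reshape(H, W)
    return mosaic

# Usage with the notebook's array:
# plt.imshow(make_mosaic(arr, H, W), cmap="gray"); plt.axis("off")
```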
The create_filenames function builds the list of image paths to load, given the location of the CroppedYale folder and the image views to grab for each subject.
def create_filenames(data_dir, view_list):
    """Build the list of image paths to load.
    data_dir: path to the CroppedYale folder
    view_list: the views to grab for each subject
    """
    file_list = []
    for subject in sorted(os.listdir(data_dir)):
        if not subject.startswith("yaleB"):
            continue  # skip stray entries such as .DS_Store
        for view in view_list:
            filename = os.path.join(data_dir, subject, "%s_%s.png" % (subject, view))
            file_list.append(filename)
    return file_list
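The view codes in view_list follow a fixed pattern. Assuming the usual Extended Yale B reading, P00 is the pose number and the A and E fields give the light source's azimuth and elevation in degrees; a hypothetical parse_view helper can split a code into integers, which makes it easier to choose views programmatically (for example, only near-frontal lighting).

```python
import re

# Assumed pattern: P<pose>A<azimuth>E<elevation>, e.g. "P00A+005E-10"
VIEW_RE = re.compile(r"^P(\d{2})A([+-]\d{3})E([+-]\d{2})$")

def parse_view(view):
    """Split a view code into (pose, azimuth, elevation) integers."""
    m = VIEW_RE.match(view)
    if m is None:
        raise ValueError("unrecognized view code: %r" % view)
    pose, az, el = m.groups()
    return int(pose), int(az), int(el)  # int() accepts a leading + or -

# Example: parse_view("P00A+005E-10") gives (0, 5, -10)
```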
view_list = ['P00A+000E+00', 'P00A+005E+10' , 'P00A+005E-10' , 'P00A+010E+00']
file_list = create_filenames(data_dir, view_list)
len(file_list)
152
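With 38 subject folders and 4 views, 152 paths is the expected count, but create_filenames builds the paths without checking that the files exist, so a missing image would only surface later as an error from Image.open. A small check, sketched with a hypothetical missing_files helper, can catch that before loading.

```python
import os

def missing_files(file_list, root="."):
    """Return the entries of file_list that are not existing files under root.
    Absolute paths are checked as-is, since os.path.join keeps them intact."""
    return [f for f in file_list if not os.path.isfile(os.path.join(root, f))]

# Usage in the notebook: missing_files(file_list) should return []
```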
# open the first image and record its original dimensions
im = Image.open(file_list[0]).convert("L")
H, W = np.shape(im)
print('shape=', (H, W))
im_number = len(file_list)
# fill an array with one image per row
# and one pixel per column
arr = np.zeros([im_number, H * W])
for i in range(im_number):
    im = Image.open(file_list[i]).convert("L")
    arr[i, :] = np.asarray(im).reshape(H * W)
image_grid(arr, H, W)
shape= (192, 168)
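The loop above assumes every image has the same size as the first one; if one differs, the reshape fails with a generic error. A variant, sketched as a hypothetical load_image_matrix helper, collects the flattened images, checks each shape explicitly, and builds the matrix with np.stack.

```python
import numpy as np
from PIL import Image

def load_image_matrix(paths):
    """Load grayscale images into an (n_images, H*W) float array with one
    flattened image per row. Returns (array, H, W); all sizes must match."""
    if not paths:
        raise ValueError("no image paths given")
    rows, shape = [], None
    for p in paths:
        with Image.open(p) as im:
            a = np.asarray(im.convert("L"), dtype=float)  # H x W pixel array
        if shape is None:
            shape = a.shape
        elif a.shape != shape:
            raise ValueError("%s has shape %s, expected %s" % (p, a.shape, shape))
        rows.append(a.ravel())
    H, W = shape
    return np.stack(rows), H, W

# Usage in the notebook: arr, H, W = load_image_matrix(file_list)
```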