In this example we autoencode the faces of the Olivetti faces dataset (the ORL/AT&T face database downloaded below) and then try to reconstruct them.
%matplotlib inline
import matplotlib
import numpy as np
import pandas as pd
import scipy.io
import matplotlib.pyplot as plt
from IPython.display import Image, display
import h2o
from h2o.estimators.deeplearning import H2OAutoEncoderEstimator
h2o.init()
Checking whether there is an H2O instance running at http://localhost:54321. connected.
H2O cluster uptime: | 2 hours 1 min |
H2O cluster version: | 3.11.0.99999 |
H2O cluster version age: | 9 hours and 8 minutes |
H2O cluster name: | arno |
H2O cluster total nodes: | 1 |
H2O cluster free memory: | 13.47 Gb |
H2O cluster total cores: | 12 |
H2O cluster allowed cores: | 12 |
H2O cluster status: | locked, healthy |
H2O connection url: | http://localhost:54321 |
H2O connection proxy: | None |
Python version: | 2.7.12 final |
!wget -c http://www.cl.cam.ac.uk/Research/DTG/attarchive/pub/data/att_faces.tar.Z
--2016-10-23 01:11:47-- http://www.cl.cam.ac.uk/Research/DTG/attarchive/pub/data/att_faces.tar.Z Resolving www.cl.cam.ac.uk (www.cl.cam.ac.uk)... 2001:630:212:200::80:14, 128.232.0.20 Connecting to www.cl.cam.ac.uk (www.cl.cam.ac.uk)|2001:630:212:200::80:14|:80... connected. HTTP request sent, awaiting response... 301 Moved Permanently Location: http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_faces.tar.Z [following] --2016-10-23 01:11:47-- http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_faces.tar.Z Reusing existing connection to [www.cl.cam.ac.uk]:80. HTTP request sent, awaiting response... 200 OK Length: 4075767 (3.9M) [application/x-compress] Saving to: ‘att_faces.tar.Z’ att_faces.tar.Z 100%[===================>] 3.89M 2.55MB/s in 1.5s 2016-10-23 01:11:49 (2.55 MB/s) - ‘att_faces.tar.Z’ saved [4075767/4075767]
!tar xzvf att_faces.tar.Z;rm att_faces.tar.Z;
orl_faces/ orl_faces/README orl_faces/s1/ orl_faces/s1/6.pgm orl_faces/s1/7.pgm orl_faces/s1/8.pgm orl_faces/s1/9.pgm orl_faces/s1/10.pgm orl_faces/s1/1.pgm orl_faces/s1/2.pgm orl_faces/s1/3.pgm orl_faces/s1/4.pgm orl_faces/s1/5.pgm orl_faces/s2/ orl_faces/s2/6.pgm orl_faces/s2/7.pgm orl_faces/s2/8.pgm orl_faces/s2/9.pgm orl_faces/s2/10.pgm orl_faces/s2/1.pgm orl_faces/s2/2.pgm orl_faces/s2/3.pgm orl_faces/s2/4.pgm orl_faces/s2/5.pgm orl_faces/s3/ orl_faces/s3/6.pgm orl_faces/s3/7.pgm orl_faces/s3/8.pgm orl_faces/s3/9.pgm orl_faces/s3/10.pgm orl_faces/s3/1.pgm orl_faces/s3/2.pgm orl_faces/s3/3.pgm orl_faces/s3/4.pgm orl_faces/s3/5.pgm orl_faces/s4/ orl_faces/s4/6.pgm orl_faces/s4/7.pgm orl_faces/s4/8.pgm orl_faces/s4/9.pgm orl_faces/s4/10.pgm orl_faces/s4/1.pgm orl_faces/s4/2.pgm orl_faces/s4/3.pgm orl_faces/s4/4.pgm orl_faces/s4/5.pgm orl_faces/s5/ orl_faces/s5/6.pgm orl_faces/s5/7.pgm orl_faces/s5/8.pgm orl_faces/s5/9.pgm orl_faces/s5/10.pgm orl_faces/s5/1.pgm orl_faces/s5/2.pgm orl_faces/s5/3.pgm orl_faces/s5/4.pgm orl_faces/s5/5.pgm orl_faces/s6/ orl_faces/s6/6.pgm orl_faces/s6/7.pgm orl_faces/s6/8.pgm orl_faces/s6/9.pgm orl_faces/s6/10.pgm orl_faces/s6/1.pgm orl_faces/s6/2.pgm orl_faces/s6/3.pgm orl_faces/s6/4.pgm orl_faces/s6/5.pgm orl_faces/s7/ orl_faces/s7/6.pgm orl_faces/s7/7.pgm orl_faces/s7/8.pgm orl_faces/s7/9.pgm orl_faces/s7/10.pgm orl_faces/s7/1.pgm orl_faces/s7/2.pgm orl_faces/s7/3.pgm orl_faces/s7/4.pgm orl_faces/s7/5.pgm orl_faces/s8/ orl_faces/s8/6.pgm orl_faces/s8/7.pgm orl_faces/s8/8.pgm orl_faces/s8/9.pgm orl_faces/s8/10.pgm orl_faces/s8/1.pgm orl_faces/s8/2.pgm orl_faces/s8/3.pgm orl_faces/s8/4.pgm orl_faces/s8/5.pgm orl_faces/s9/ orl_faces/s9/6.pgm orl_faces/s9/7.pgm orl_faces/s9/8.pgm orl_faces/s9/9.pgm orl_faces/s9/10.pgm orl_faces/s9/1.pgm orl_faces/s9/2.pgm orl_faces/s9/3.pgm orl_faces/s9/4.pgm orl_faces/s9/5.pgm orl_faces/s10/ orl_faces/s10/6.pgm orl_faces/s10/7.pgm orl_faces/s10/8.pgm orl_faces/s10/9.pgm orl_faces/s10/10.pgm 
orl_faces/s10/1.pgm orl_faces/s10/2.pgm orl_faces/s10/3.pgm orl_faces/s10/4.pgm orl_faces/s10/5.pgm orl_faces/s11/ orl_faces/s11/6.pgm orl_faces/s11/7.pgm orl_faces/s11/8.pgm orl_faces/s11/9.pgm orl_faces/s11/10.pgm orl_faces/s11/1.pgm orl_faces/s11/2.pgm orl_faces/s11/3.pgm orl_faces/s11/4.pgm orl_faces/s11/5.pgm orl_faces/s12/ orl_faces/s12/6.pgm orl_faces/s12/7.pgm orl_faces/s12/8.pgm orl_faces/s12/9.pgm orl_faces/s12/10.pgm orl_faces/s12/1.pgm orl_faces/s12/2.pgm orl_faces/s12/3.pgm orl_faces/s12/4.pgm orl_faces/s12/5.pgm orl_faces/s13/ orl_faces/s13/6.pgm orl_faces/s13/7.pgm orl_faces/s13/8.pgm orl_faces/s13/9.pgm orl_faces/s13/10.pgm orl_faces/s13/1.pgm orl_faces/s13/2.pgm orl_faces/s13/3.pgm orl_faces/s13/4.pgm orl_faces/s13/5.pgm orl_faces/s14/ orl_faces/s14/6.pgm orl_faces/s14/7.pgm orl_faces/s14/8.pgm orl_faces/s14/9.pgm orl_faces/s14/10.pgm orl_faces/s14/1.pgm orl_faces/s14/2.pgm orl_faces/s14/3.pgm orl_faces/s14/4.pgm orl_faces/s14/5.pgm orl_faces/s15/ orl_faces/s15/6.pgm orl_faces/s15/7.pgm orl_faces/s15/8.pgm orl_faces/s15/9.pgm orl_faces/s15/10.pgm orl_faces/s15/1.pgm orl_faces/s15/2.pgm orl_faces/s15/3.pgm orl_faces/s15/4.pgm orl_faces/s15/5.pgm orl_faces/s16/ orl_faces/s16/6.pgm orl_faces/s16/7.pgm orl_faces/s16/8.pgm orl_faces/s16/9.pgm orl_faces/s16/10.pgm orl_faces/s16/1.pgm orl_faces/s16/2.pgm orl_faces/s16/3.pgm orl_faces/s16/4.pgm orl_faces/s16/5.pgm orl_faces/s17/ orl_faces/s17/6.pgm orl_faces/s17/7.pgm orl_faces/s17/8.pgm orl_faces/s17/9.pgm orl_faces/s17/10.pgm orl_faces/s17/1.pgm orl_faces/s17/2.pgm orl_faces/s17/3.pgm orl_faces/s17/4.pgm orl_faces/s17/5.pgm orl_faces/s18/ orl_faces/s18/6.pgm orl_faces/s18/7.pgm orl_faces/s18/8.pgm orl_faces/s18/9.pgm orl_faces/s18/10.pgm orl_faces/s18/1.pgm orl_faces/s18/2.pgm orl_faces/s18/3.pgm orl_faces/s18/4.pgm orl_faces/s18/5.pgm orl_faces/s19/ orl_faces/s19/6.pgm orl_faces/s19/7.pgm orl_faces/s19/8.pgm orl_faces/s19/9.pgm orl_faces/s19/10.pgm orl_faces/s19/1.pgm orl_faces/s19/2.pgm 
orl_faces/s19/3.pgm orl_faces/s19/4.pgm orl_faces/s19/5.pgm orl_faces/s20/ orl_faces/s20/6.pgm orl_faces/s20/7.pgm orl_faces/s20/8.pgm orl_faces/s20/9.pgm orl_faces/s20/10.pgm orl_faces/s20/1.pgm orl_faces/s20/2.pgm orl_faces/s20/3.pgm orl_faces/s20/4.pgm orl_faces/s20/5.pgm orl_faces/s21/ orl_faces/s21/6.pgm orl_faces/s21/7.pgm orl_faces/s21/8.pgm orl_faces/s21/9.pgm orl_faces/s21/10.pgm orl_faces/s21/1.pgm orl_faces/s21/2.pgm orl_faces/s21/3.pgm orl_faces/s21/4.pgm orl_faces/s21/5.pgm orl_faces/s22/ orl_faces/s22/6.pgm orl_faces/s22/7.pgm orl_faces/s22/8.pgm orl_faces/s22/9.pgm orl_faces/s22/10.pgm orl_faces/s22/1.pgm orl_faces/s22/2.pgm orl_faces/s22/3.pgm orl_faces/s22/4.pgm orl_faces/s22/5.pgm orl_faces/s23/ orl_faces/s23/6.pgm orl_faces/s23/7.pgm orl_faces/s23/8.pgm orl_faces/s23/9.pgm orl_faces/s23/10.pgm orl_faces/s23/1.pgm orl_faces/s23/2.pgm orl_faces/s23/3.pgm orl_faces/s23/4.pgm orl_faces/s23/5.pgm orl_faces/s24/ orl_faces/s24/6.pgm orl_faces/s24/7.pgm orl_faces/s24/8.pgm orl_faces/s24/9.pgm orl_faces/s24/10.pgm orl_faces/s24/1.pgm orl_faces/s24/2.pgm orl_faces/s24/3.pgm orl_faces/s24/4.pgm orl_faces/s24/5.pgm orl_faces/s25/ orl_faces/s25/6.pgm orl_faces/s25/7.pgm orl_faces/s25/8.pgm orl_faces/s25/9.pgm orl_faces/s25/10.pgm orl_faces/s25/1.pgm orl_faces/s25/2.pgm orl_faces/s25/3.pgm orl_faces/s25/4.pgm orl_faces/s25/5.pgm orl_faces/s26/ orl_faces/s26/6.pgm orl_faces/s26/7.pgm orl_faces/s26/8.pgm orl_faces/s26/9.pgm orl_faces/s26/10.pgm orl_faces/s26/1.pgm orl_faces/s26/2.pgm orl_faces/s26/3.pgm orl_faces/s26/4.pgm orl_faces/s26/5.pgm orl_faces/s27/ orl_faces/s27/6.pgm orl_faces/s27/7.pgm orl_faces/s27/8.pgm orl_faces/s27/9.pgm orl_faces/s27/10.pgm orl_faces/s27/1.pgm orl_faces/s27/2.pgm orl_faces/s27/3.pgm orl_faces/s27/4.pgm orl_faces/s27/5.pgm orl_faces/s28/ orl_faces/s28/6.pgm orl_faces/s28/7.pgm orl_faces/s28/8.pgm orl_faces/s28/9.pgm orl_faces/s28/10.pgm orl_faces/s28/1.pgm orl_faces/s28/2.pgm orl_faces/s28/3.pgm orl_faces/s28/4.pgm 
orl_faces/s28/5.pgm orl_faces/s29/ orl_faces/s29/6.pgm orl_faces/s29/7.pgm orl_faces/s29/8.pgm orl_faces/s29/9.pgm orl_faces/s29/10.pgm orl_faces/s29/1.pgm orl_faces/s29/2.pgm orl_faces/s29/3.pgm orl_faces/s29/4.pgm orl_faces/s29/5.pgm orl_faces/s30/ orl_faces/s30/6.pgm orl_faces/s30/7.pgm orl_faces/s30/8.pgm orl_faces/s30/9.pgm orl_faces/s30/10.pgm orl_faces/s30/1.pgm orl_faces/s30/2.pgm orl_faces/s30/3.pgm orl_faces/s30/4.pgm orl_faces/s30/5.pgm orl_faces/s31/ orl_faces/s31/6.pgm orl_faces/s31/7.pgm orl_faces/s31/8.pgm orl_faces/s31/9.pgm orl_faces/s31/10.pgm orl_faces/s31/1.pgm orl_faces/s31/2.pgm orl_faces/s31/3.pgm orl_faces/s31/4.pgm orl_faces/s31/5.pgm orl_faces/s32/ orl_faces/s32/6.pgm orl_faces/s32/7.pgm orl_faces/s32/8.pgm orl_faces/s32/9.pgm orl_faces/s32/10.pgm orl_faces/s32/1.pgm orl_faces/s32/2.pgm orl_faces/s32/3.pgm orl_faces/s32/4.pgm orl_faces/s32/5.pgm orl_faces/s33/ orl_faces/s33/6.pgm orl_faces/s33/7.pgm orl_faces/s33/8.pgm orl_faces/s33/9.pgm orl_faces/s33/10.pgm orl_faces/s33/1.pgm orl_faces/s33/2.pgm orl_faces/s33/3.pgm orl_faces/s33/4.pgm orl_faces/s33/5.pgm orl_faces/s34/ orl_faces/s34/6.pgm orl_faces/s34/7.pgm orl_faces/s34/8.pgm orl_faces/s34/9.pgm orl_faces/s34/10.pgm orl_faces/s34/1.pgm orl_faces/s34/2.pgm orl_faces/s34/3.pgm orl_faces/s34/4.pgm orl_faces/s34/5.pgm orl_faces/s35/ orl_faces/s35/6.pgm orl_faces/s35/7.pgm orl_faces/s35/8.pgm orl_faces/s35/9.pgm orl_faces/s35/10.pgm orl_faces/s35/1.pgm orl_faces/s35/2.pgm orl_faces/s35/3.pgm orl_faces/s35/4.pgm orl_faces/s35/5.pgm orl_faces/s36/ orl_faces/s36/6.pgm orl_faces/s36/7.pgm orl_faces/s36/8.pgm orl_faces/s36/9.pgm orl_faces/s36/10.pgm orl_faces/s36/1.pgm orl_faces/s36/2.pgm orl_faces/s36/3.pgm orl_faces/s36/4.pgm orl_faces/s36/5.pgm orl_faces/s37/ orl_faces/s37/6.pgm orl_faces/s37/7.pgm orl_faces/s37/8.pgm orl_faces/s37/9.pgm orl_faces/s37/10.pgm orl_faces/s37/1.pgm orl_faces/s37/2.pgm orl_faces/s37/3.pgm orl_faces/s37/4.pgm orl_faces/s37/5.pgm orl_faces/s38/ orl_faces/s38/6.pgm 
orl_faces/s38/7.pgm orl_faces/s38/8.pgm orl_faces/s38/9.pgm orl_faces/s38/10.pgm orl_faces/s38/1.pgm orl_faces/s38/2.pgm orl_faces/s38/3.pgm orl_faces/s38/4.pgm orl_faces/s38/5.pgm orl_faces/s39/ orl_faces/s39/6.pgm orl_faces/s39/7.pgm orl_faces/s39/8.pgm orl_faces/s39/9.pgm orl_faces/s39/10.pgm orl_faces/s39/1.pgm orl_faces/s39/2.pgm orl_faces/s39/3.pgm orl_faces/s39/4.pgm orl_faces/s39/5.pgm orl_faces/s40/ orl_faces/s40/6.pgm orl_faces/s40/7.pgm orl_faces/s40/8.pgm orl_faces/s40/9.pgm orl_faces/s40/10.pgm orl_faces/s40/1.pgm orl_faces/s40/2.pgm orl_faces/s40/3.pgm orl_faces/s40/4.pgm orl_faces/s40/5.pgm
We now need some code to read PGM files. Thanks to StackOverflow, we have some code to leverage:
import re
def read_pgm(filename, byteorder='>'):
    """Return image data from a raw PGM file as numpy array.
    Format specification: http://netpbm.sourceforge.net/doc/pgm.html
    """
    with open(filename, 'rb') as f:
        buffer = f.read()
    try:
        header, width, height, maxval = re.search(
            b"(^P5\s(?:\s*#.*[\r\n])*"
            b"(\d+)\s(?:\s*#.*[\r\n])*"
            b"(\d+)\s(?:\s*#.*[\r\n])*"
            b"(\d+)\s(?:\s*#.*[\r\n]\s)*)", buffer).groups()
    except AttributeError:
        raise ValueError("Not a raw PGM file: '%s'" % filename)
    return np.frombuffer(buffer,
                         dtype='u1' if int(maxval) < 256 else byteorder+'u2',
                         count=int(width)*int(height),
                         offset=len(header)
                         ).reshape((int(height), int(width)))
image = read_pgm("orl_faces/s12/6.pgm", byteorder='<')
image.shape
(112, 92)
plt.imshow(image, plt.cm.gray)
plt.show()
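To make the raw PGM layout that `read_pgm` relies on more concrete, here is a minimal sketch that builds a tiny binary (P5) image in memory and decodes it with `np.frombuffer`. The 2x3 image and its pixel values are made up for illustration, and the sketch skips the comment handling that the regular expression above provides. Because `maxval` is below 256, each pixel is a single byte, so the `byteorder` argument would only matter for 16-bit images.

```python
import numpy as np

# Hypothetical 3-wide, 2-high image: magic number, width, height, maxval, then raw pixels.
header = b"P5\n3 2\n255\n"
pixels = bytes(bytearray([0, 64, 128, 192, 224, 255]))
buffer = header + pixels

# Split the header on whitespace (no '#' comment support in this sketch).
magic, width, height, maxval = header.split()
width, height, maxval = int(width), int(height), int(maxval)

# One unsigned byte per pixel because maxval < 256; pixels start right after the header.
image = np.frombuffer(buffer, dtype='u1', count=width * height,
                      offset=len(header)).reshape((height, width))
print(image)
```

The array comes back with shape `(height, width)`, which is why the face image above reports `(112, 92)` for a picture that is 92 pixels wide and 112 pixels high.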
import glob
import os
from collections import defaultdict
images = glob.glob("orl_faces/**/*.pgm")
data = defaultdict(list)
image_data = []
for img in images:
    _,label,_ = img.split(os.path.sep)
    imgdata = read_pgm(img, byteorder='<').flatten().tolist()
    data[label].append(imgdata)
    image_data.append(imgdata)
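The loop above takes the subject label from the middle component of each path. A minimal sketch of that grouping step, using a few hypothetical paths in the same `orl_faces/<subject>/<n>.pgm` layout instead of the real files:

```python
import os
from collections import defaultdict

# Hypothetical paths that mimic the layout of the extracted archive.
paths = [
    os.path.join("orl_faces", "s1", "1.pgm"),
    os.path.join("orl_faces", "s1", "2.pgm"),
    os.path.join("orl_faces", "s12", "6.pgm"),
]

by_subject = defaultdict(list)
for path in paths:
    # Three components: top-level folder, subject label, file name.
    _, label, filename = path.split(os.path.sep)
    by_subject[label].append(filename)

print(dict(by_subject))
```

Note that the three-way unpacking only works when every path has exactly three components, as is the case for the relative paths returned by the `glob` call above.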
Let's import it into H2O:
faces = h2o.H2OFrame(image_data)
Parse progress: |█████████████████████████████████████████████████████████████████████████████| 100%
faces.shape
(400, 10304)
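The frame shape follows directly from the data: 40 subjects with 10 images each give 400 rows, and each 112x92 image is flattened into 10304 columns. A small sketch of that flattening, using a synthetic array as a stand-in for a real face:

```python
import numpy as np

height, width = 112, 92
n_subjects, images_per_subject = 40, 10

# Synthetic stand-in for one face image (not real pixel data).
image = np.arange(height * width).reshape(height, width)

# flatten() lays the rows end to end; reshape() undoes it.
row = image.flatten()
restored = row.reshape(height, width)

print(n_subjects * images_per_subject, row.shape[0])
```

The same `reshape(112, 92)` call is what later turns a reconstructed row back into a displayable image.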
from h2o.estimators.deeplearning import H2OAutoEncoderEstimator
model = H2OAutoEncoderEstimator(
    activation="Tanh",
    hidden=[50],
    l1=1e-4,
    epochs=10
)
model.train(x=faces.names, training_frame=faces)
deeplearning Model Build progress: |██████████████████████████████████████████████████████████| 100%
model
Model Details
=============
H2OAutoEncoderEstimator : Deep Learning
Model Key: DeepLearning_model_python_1477202989522_3315
Status of Neuron Layers: auto-encoder, gaussian distribution, Quadratic loss, 1,040,754 weights/biases, 13.0 MB, 4,000 training samples, mini-batch size 1
layer | units | type | dropout | l1 | l2 | mean_rate | rate_rms | momentum | mean_weight | weight_rms | mean_bias | bias_rms
1 | 10304 | Input | 0.0 | | | | | | | | |
2 | 50 | Tanh | 0.0 | 0.0001 | 0.0 | 0.0007149 | 0.0002085 | 0.0 | 0.0002015 | 0.0198808 | 0.0081105 | 0.1122776
3 | 10304 | Tanh | | 0.0001 | 0.0 | 0.0049906 | 0.0019985 | 0.0 | 0.0003122 | 0.0250720 | 0.0011189 | 0.0135767
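The "1,040,754 weights/biases" figure in the summary can be checked by hand from the layer sizes in the table. A short sketch of that arithmetic, assuming fully connected layers with one bias per unit:

```python
n_inputs = 112 * 92   # 10304 pixels per image
n_hidden = 50         # hidden=[50] in the estimator

# Encoder: one weight per (input, hidden) pair plus one bias per hidden unit.
encoder_params = n_inputs * n_hidden + n_hidden
# Decoder: one weight per (hidden, output) pair plus one bias per output unit.
decoder_params = n_hidden * n_inputs + n_inputs

total_params = encoder_params + decoder_params
print(total_params)
```

Almost all of the parameters sit in the two weight matrices; the 50-unit bottleneck is what forces the model to compress each face.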
ModelMetricsAutoEncoder: deeplearning
** Reported on train data. **
MSE: 0.0150895996476
RMSE: 0.122839731551
Scoring History:
timestamp | duration | training_speed | epochs | iterations | samples | training_rmse | training_mse
2016-10-23 01:12:20 | 1.372 sec | 0.00000 obs/sec | 0.0 | 0 | 0.0 | 0.2161652 | 0.0467274
2016-10-23 01:12:26 | 6.921 sec | 370 obs/sec | 5.0 | 5 | 2000.0 | 0.1310679 | 0.0171788
2016-10-23 01:12:31 | 12.148 sec | 342 obs/sec | 9.0 | 9 | 3600.0 | 0.1246621 | 0.0155406
2016-10-23 01:12:32 | 13.363 sec | 345 obs/sec | 10.0 | 10 | 4000.0 | 0.1228397 | 0.0150896
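The reported metrics are related in a simple way: RMSE is the square root of MSE, and the final `samples` count is the number of rows times the number of epochs. A quick sketch that reproduces both from the numbers printed above:

```python
import math

mse = 0.0150895996476          # final training MSE reported above
rmse = math.sqrt(mse)          # should agree with the reported RMSE

rows, epochs = 400, 10         # frame rows and epochs=10 from the estimator
samples = rows * epochs        # matches the last "samples" entry

# Fraction of the initial reconstruction error (MSE at epoch 0) that remains.
remaining = mse / 0.0467274

print(rmse, samples, remaining)
```

Roughly a third of the initial reconstruction error remains after ten epochs, so the reconstructions should be recognisable but blurry.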
Now that the model is trained, we would like to better understand its internal representation. What makes a face a face?
We will feed the model some Gaussian noise and see what it produces.
We start by creating some Gaussian noise:
import pandas as pd
gaussian_noise = np.random.randn(10304)
plt.imshow(gaussian_noise.reshape(112, 92), plt.cm.gray);
Then we import this data into H2O. We first have to map the frame's column names to the Gaussian noise values.
gaussian_noise_pre = dict(zip(faces.names,gaussian_noise))
gaussian_noise_hf = h2o.H2OFrame.from_python(gaussian_noise_pre)
Parse progress: |█████████████████████████████████████████████████████████████████████████████| 100%
result = model.predict(gaussian_noise_hf)
deeplearning prediction progress: |███████████████████████████████████████████████████████████| 100%
result.shape
(1, 10304)
img = result.as_data_frame()
img_data = img.T.values.reshape(112, 92)
plt.imshow(img_data, plt.cm.gray);
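Conceptually, the prediction step pushes the 10304 noise values through the 50-unit Tanh bottleneck and back out to 10304 values. The NumPy sketch below illustrates that forward pass only; the weights are random placeholders, not the trained H2O parameters, and it ignores any input scaling that H2O may apply internally, so its output will not look like a face.

```python
import numpy as np

rng = np.random.RandomState(0)
n_inputs, n_hidden = 112 * 92, 50

# Hypothetical, randomly initialised parameters (NOT the trained model).
W_enc = rng.normal(scale=0.01, size=(n_hidden, n_inputs))
b_enc = np.zeros(n_hidden)
W_dec = rng.normal(scale=0.01, size=(n_inputs, n_hidden))
b_dec = np.zeros(n_inputs)

def reconstruct(x):
    """Encode x into the hidden code, then decode it back to input size."""
    code = np.tanh(W_enc.dot(x) + b_enc)
    output = np.tanh(W_dec.dot(code) + b_dec)
    return output, code

noise = rng.randn(n_inputs)
output, code = reconstruct(noise)
print(code.shape, output.shape)
```

With trained weights, the decoder can only produce images that are combinations of what the 50 hidden units have learned, which is why even pure noise tends to come back looking face-like.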