
A Network Tour of Data Science

Xavier Bresson, Winter 2016/17

Assignment 3: Recurrent Neural Networks

In [1]:

# Import libraries
import tensorflow as tf
import numpy as np
import collections
import os

In [2]:

# Load text data
data = open(os.path.join('datasets', 'text_ass_6.txt'), 'r').read() # must be simple plain text file
print('Text data:',data)
chars = list(set(data))
print('\nSingle characters:',chars)
data_len, vocab_size = len(data), len(chars)
print('\nText data has %d characters, %d unique.' % (data_len, vocab_size))
char_to_ix = { ch:i for i,ch in enumerate(chars) }
ix_to_char = { i:ch for i,ch in enumerate(chars) }
print('\nMapping characters to numbers:',char_to_ix)
print('\nMapping numbers to characters:',ix_to_char)

Text data: hello world! is a very simple program in most programming languages often used to illustrate the basic syntax of a programming language

Single characters: ['x', 'w', 'a', 'b', '!', 'n', 'h', 'd', 'i', 's', 'v', 'y', 'e', ' ', 'r', 'u', 'g', 'f', 'c', 'l', 'm', 'p', 't', 'o']

Text data has 135 characters, 24 unique.

Mapping characters to numbers: {'x': 0, 's': 9, 'v': 10, 'y': 11, 't': 22, 'w': 1, 'e': 12, 'r': 14, 'u': 15, 'g': 16, 'f': 17, 'c': 18, 'l': 19, 'm': 20, 'b': 3, 'a': 2, ' ': 13, '!': 4, 'o': 23, 'n': 5, 'h': 6, 'p': 21, 'd': 7, 'i': 8}

Mapping numbers to characters: {0: 'x', 1: 'w', 2: 'a', 3: 'b', 4: '!', 5: 'n', 6: 'h', 7: 'd', 8: 'i', 9: 's', 10: 'v', 11: 'y', 12: 'e', 13: ' ', 14: 'r', 15: 'u', 16: 'g', 17: 'f', 18: 'c', 19: 'l', 20: 'm', 21: 'p', 22: 't', 23: 'o'}

Goal

The goal is to define with TensorFlow a vanilla recurrent neural network (RNN) model:

$$ \begin{aligned} h_t &= \textrm{tanh}(W_h h_{t-1} + W_x x_t + b_h)\\ y_t &= W_y h_t + b_y \end{aligned} $$

to predict a sequence of characters. $x_t \in \mathbb{R}^D$ is the input character, encoded as a vector over a dictionary of size $D$. $y_t \in \mathbb{R}^D$ holds the scores of the predicted character, which are turned into a distribution over the dictionary by a softmax. $h_t \in \mathbb{R}^H$ is the memory of the RNN, called the hidden state at time $t$; its dimensionality $H$ is chosen arbitrarily. The variables of the system are $W_h \in \mathbb{R}^{H\times H}$, $W_x \in \mathbb{R}^{H\times D}$, $W_y \in \mathbb{R}^{D\times H}$, $b_h \in \mathbb{R}^H$, and $b_y \in \mathbb{R}^D$.

The number of time steps of the RNN is $T$, that is, we will learn a sequence of data of length $T$: $x_t$ for $t=0,...,T-1$.
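As a rough illustration of the recursion, the following NumPy sketch evaluates one time step in the column-vector form of the equations and then in the row-vector (batched) form, where each sample is a row and the weights are transposed. The random weights, the seed, and the chosen input index are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 24, 120                      # dictionary size and hidden size used in this notebook

# Column-vector form, as in the equations: W_x is H x D, W_h is H x H, W_y is D x H.
W_x = rng.normal(size=(H, D))
W_h = 0.01 * np.eye(H)
W_y = rng.normal(size=(D, H))
b_h = np.zeros(H)
b_y = np.zeros(D)

x_t = np.zeros(D)
x_t[3] = 1.0                        # one-hot input character (index 3 is arbitrary)
h_prev = rng.normal(size=H)         # some previous hidden state

h_t = np.tanh(W_h @ h_prev + W_x @ x_t + b_h)    # shape (H,)
y_t = W_y @ h_t + b_y                            # shape (D,), unnormalized scores

# Row-vector form: samples are rows, so the transposed weights multiply from the right.
h_row = np.tanh(h_prev[None, :] @ W_h.T + x_t[None, :] @ W_x.T + b_h)   # shape (1, H)
y_row = h_row @ W_y.T + b_y                                             # shape (1, D)

print(h_t.shape, y_t.shape, np.allclose(h_row[0], h_t), np.allclose(y_row[0], y_t))
```

Both forms give the same numbers; the row-vector form is convenient when a whole batch of characters is processed at once.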

In [3]:

# hyperparameters of RNN
batch_size = 3                                  # batch size
batch_len = data_len // batch_size              # batch length
T = 5                                           # temporal length
epoch_size = (batch_len - 1) // T               # nb of iterations to get one epoch
D = vocab_size                                  # data dimension = nb of unique characters
H = 5*D                                         # size of hidden state, the memory layer

print('data_len=',data_len,' batch_size=',batch_size,' batch_len=',
      batch_len,' T=',T,' epoch_size=',epoch_size,' D=',D)

data_len= 135  batch_size= 3  batch_len= 45  T= 5  epoch_size= 8  D= 24

Step 1

Initialize input variables of the computational graph:
(1) Xin of size batch_size x T x D and type tf.float32. Each input character is encoded on a vector of size D.
(2) Ytarget of size batch_size x T and type tf.int64. Each target character is encoded by a value in {0,...,D-1}.
(3) hin of size batch_size x H and type tf.float32
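To make the three shapes concrete, here is a small NumPy sketch of values that could be fed to such inputs. The random character indices and the names `Xin_value`, `Ytarget_value`, and `hin_value` are hypothetical and serve only to illustrate shapes and dtypes.

```python
import numpy as np

batch_size, T, D, H = 3, 5, 24, 120   # values used in this notebook

# Hypothetical integer-encoded characters, one row per sequence in the batch.
rng = np.random.default_rng(0)
x_idx = rng.integers(0, D, size=(batch_size, T))
y_idx = rng.integers(0, D, size=(batch_size, T))

Xin_value = np.eye(D, dtype=np.float32)[x_idx]            # one-hot inputs, (batch_size, T, D)
Ytarget_value = y_idx.astype(np.int64)                    # class indices, (batch_size, T)
hin_value = np.zeros((batch_size, H), dtype=np.float32)   # initial memory, (batch_size, H)

print(Xin_value.shape, Ytarget_value.shape, hin_value.shape)
# prints: (3, 5, 24) (3, 5) (3, 120)
```

The inputs are one-hot vectors while the targets stay as integers, because the loss used later takes class indices directly.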

In [4]:

# input variables of computational graph (CG)
Xin = tf.placeholder(tf.float32, [batch_size,T,D]); #print('Xin=',Xin) # Input
Ytarget = tf.placeholder(tf.int64, [batch_size,T]); #print('Y_=',Y_) # target 
hin = tf.placeholder(tf.float32, [batch_size,H]); #print('hin=',hin.get_shape())

Step 2

Define the variables of the computational graph:
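(1) $W_x$ is a random matrix of shape D x H, drawn from a normal distribution of variance $\frac{6}{D+H}$
(2) $W_h$ is the H x H identity matrix multiplied by the constant $0.01$
(3) $W_y$ is a random matrix of shape H x D, drawn from a normal distribution of variance $\frac{6}{D+H}$
(4) $b_h$ and $b_y$ are zero vectors of size H and D, respectively
The shapes of $W_x$ and $W_y$ are transposed with respect to the Goal section because the code stores each sample as a row and computes $x_t W_x$ and $h_t W_y$.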
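The initialization above can be checked numerically. The sketch below is a NumPy stand-in (not the TensorFlow solution) that draws the matrices with standard deviation $\sqrt{6/(D+H)}$ and compares the empirical variance of $W_x$ with the target value; the seed is an arbitrary assumption.

```python
import numpy as np

D, H = 24, 120
rng = np.random.default_rng(0)
std = np.sqrt(6.0 / (D + H))          # standard deviation, so that the variance is 6/(D+H)

Wx = rng.normal(0.0, std, size=(D, H)).astype(np.float32)
Wh = 0.01 * np.eye(H, dtype=np.float32)
Wy = rng.normal(0.0, std, size=(H, D)).astype(np.float32)
bh = np.zeros(H, dtype=np.float32)
by = np.zeros(D, dtype=np.float32)

# Empirical variance next to the target value 6/(D+H) = 6/144.
print(Wx.var(), 6.0 / (D + H))
```

Note that a random-normal initializer typically takes a standard deviation, not a variance, which is why the square root appears.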

In [5]:

# Model variables
Wx = tf.Variable(tf.random_normal([D,H], stddev=tf.sqrt(6./tf.to_float(D+H)))); print('Wx=',Wx.get_shape())
Wh = tf.Variable(0.01*np.identity(H, np.float32)); print('Wh=',Wh.get_shape())
Wy = tf.Variable(tf.random_normal([H,D], stddev=tf.sqrt(6./tf.to_float(H+D)))); print('Wy=',Wy.get_shape())
bh = tf.Variable(tf.zeros([H])); print('bh=',bh.get_shape())
by = tf.Variable(tf.zeros([D])); print('by=',by.get_shape())

Wx= (24, 120)
Wh= (120, 120)
Wy= (120, 24)
bh= (120,)
by= (24,)

Step 3

Implement the recursive formula:

$$ \begin{aligned} h_t &= \textrm{tanh}(W_h h_{t-1} + W_x x_t + b_h)\\ y_t &= W_y h_t + b_y \end{aligned} $$

with $h_{t=0}=hin$.

Hints:
(1) You may use the functions tf.split(), enumerate(), tf.squeeze(), tf.matmul(), tf.tanh(), tf.transpose(), list.append(), and tf.pack().
(2) You may use a matrix Y of shape batch_size x T x D. We recall that Ytarget should have the shape batch_size x T.
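Before writing the TensorFlow graph, it can help to see the same unrolled loop in plain NumPy. The function name `rnn_forward`, the random weights, and the random inputs below are assumptions for illustration; the loop follows the recursive formula in row-vector form.

```python
import numpy as np

def rnn_forward(X, h0, Wx, Wh, Wy, bh, by):
    """Unroll a vanilla RNN over time.

    X: (B, T, D) inputs, h0: (B, H) initial state.
    Returns the scores Y of shape (B, T, D) and the final hidden state (B, H).
    """
    h = h0
    outputs = []
    for t in range(X.shape[1]):
        x_t = X[:, t, :]                       # (B, D) slice at time t
        h = np.tanh(h @ Wh + x_t @ Wx + bh)    # hidden state update
        outputs.append(h @ Wy + by)            # scores at time t, (B, D)
    return np.stack(outputs, axis=1), h        # stack along the time axis

B, T, D, H = 3, 5, 24, 120
rng = np.random.default_rng(0)
X = np.eye(D)[rng.integers(0, D, size=(B, T))]     # random one-hot inputs
Wx = rng.normal(0.0, np.sqrt(6.0 / (D + H)), size=(D, H))
Wh = 0.01 * np.eye(H)
Wy = rng.normal(0.0, np.sqrt(6.0 / (D + H)), size=(H, D))
bh, by = np.zeros(H), np.zeros(D)

Y, h_last = rnn_forward(X, np.zeros((B, H)), Wx, Wh, Wy, bh, by)
print(Y.shape, h_last.shape)
# prints: (3, 5, 24) (3, 120)
```

Stacking along axis 1 yields batch_size x T x D directly; stacking along axis 0 instead would give T x batch_size x D and require a transpose afterwards.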

In [6]:

# Vanilla RNN implementation
Y = []
ht = hin                                        # initial hidden state, [batch_size, H]
for t, xt in enumerate(tf.split(1, T, Xin)):    # T slices of shape [batch_size, 1, D]
    xt = tf.squeeze(xt, [1])                    # drop the time axis only: [batch_size, D]
    ht = tf.tanh(tf.matmul(ht, Wh) + tf.matmul(xt, Wx) + bh)   # hidden state update
    yt = tf.matmul(ht, Wy) + by                 # output scores, [batch_size, D]
    Y.append(yt)

Y = tf.pack(Y)                                  # [T, batch_size, D]
Y = tf.transpose(Y, [1, 0, 2])                  # [batch_size, T, D]
print('Y=',Y.get_shape())
print('Ytarget=',Ytarget.get_shape())
print('Ytarget=',Ytarget.get_shape())

Y= (3, 5, 24)
Ytarget= (3, 5)

Step 4

Perplexity loss is implemented as:

In [7]:

# perplexity
logits = tf.reshape(Y,[batch_size*T,D])
weights = tf.ones([batch_size*T])
cross_entropy_perplexity = tf.nn.seq2seq.sequence_loss_by_example([logits],[Ytarget],[weights])
cross_entropy_perplexity = tf.reduce_sum(cross_entropy_perplexity) / batch_size
loss = cross_entropy_perplexity
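The link between this loss and the perplexity reported during training can be sketched in NumPy. The helper `cross_entropy_per_step` is a hypothetical stand-in for the per-example loss, and the all-zero logits are an assumption chosen to model a predictor that is uniform over the $D$ characters.

```python
import numpy as np

def cross_entropy_per_step(logits, targets):
    """logits: (N, D) unnormalized scores, targets: (N,) class indices. Returns (N,) losses."""
    shifted = logits - logits.max(axis=1, keepdims=True)                  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets]

batch_size, T, D = 3, 5, 24
logits = np.zeros((batch_size * T, D))               # uniform prediction over D characters
targets = np.zeros(batch_size * T, dtype=np.int64)   # arbitrary targets

# Same normalization as above: sum over all steps, divided by the batch size.
loss = cross_entropy_per_step(logits, targets).sum() / batch_size
perplexity = np.exp(loss / T)                        # exponential of the mean loss per step
print(perplexity)
```

A uniform predictor has perplexity equal to $D = 24$, which gives a reference point for the values printed at the start of training.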

Step 5

Implement the optimization of the loss function.

Hint: You may use the function tf.train.GradientDescentOptimizer().

In [8]:

# Optimization
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
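The update rule behind plain gradient descent can be sketched on a toy problem. This is not what the optimizer does internally in full; it only shows the rule "parameter minus learning rate times gradient" applied to a softmax cross-entropy loss on a single score vector. The target index and the number of steps are arbitrary assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

D = 24
target = 7                      # arbitrary target class
scores = np.zeros(D)            # stand-in for a trainable parameter
learning_rate = 0.1             # same value as in the cell above

losses = []
for step in range(100):
    p = softmax(scores)
    losses.append(-np.log(p[target]))     # cross-entropy of the target class
    grad = p.copy()
    grad[target] -= 1.0                   # gradient of the loss with respect to the scores
    scores -= learning_rate * grad        # gradient descent update

print(losses[0], losses[-1])
```

The loss starts at $\log 24$ and decreases at every step, because each update raises the target score and lowers the others.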

Step 6

Implement the prediction scheme: given an input character, e.g. "h", the RNN should predict the characters that follow, e.g. "ello".

Hints:
(1) You should use the learned RNN.
(2) You may use the functions tf.one_hot(), tf.nn.softmax(), and tf.argmax().

In [9]:

# Predict
idx_pred = tf.placeholder(tf.int64) # input seed
xtp = tf.one_hot(idx_pred,depth=D); #print('xtp1=',xtp.get_shape())
htp = tf.zeros([1,H])
Ypred = []
for t in range(T):
    htp = tf.matmul(htp, Wh); #print('htp1=',htp) 
    htp += tf.matmul(xtp, Wx); #print('htp2=',htp) 
    htp += bh; #print('htp3=',htp) # (1, H)
    htp = tf.tanh(htp); #print('htp4=',htp) # (1, H)
    ytp = tf.matmul(htp, Wy); #print('ytp1=',ytp)
    ytp += by; #print('ytp2=',ytp)
    ytp = tf.nn.softmax(ytp); #print('yt1=',ytp)
    ytp = tf.squeeze(ytp); #print('yt2=',ytp)  
    seed_idx = tf.argmax(ytp,dimension=0); #print('seed_idx=',seed_idx)
    xtp = tf.one_hot(seed_idx,depth=D)[None,:]; #print('xtp2=',xtp.get_shape())
    Ypred.append(seed_idx)
Ypred = tf.convert_to_tensor(Ypred)
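The prediction loop feeds each predicted character back as the next input, which is usually called greedy decoding. Here is a NumPy sketch of the same idea; the function name `greedy_decode` and the random, untrained weights are assumptions, so the predicted indices carry no meaning.

```python
import numpy as np

def greedy_decode(seed_idx, n_steps, Wx, Wh, Wy, bh, by):
    """Predict n_steps character indices, starting from seed_idx with zero memory."""
    D, H = Wx.shape
    h = np.zeros((1, H))
    x = np.eye(D)[[seed_idx]]                 # one-hot seed, shape (1, D)
    predicted = []
    for _ in range(n_steps):
        h = np.tanh(h @ Wh + x @ Wx + bh)
        scores = (h @ Wy + by)[0]             # shape (D,)
        idx = int(np.argmax(scores))          # most likely next character
        predicted.append(idx)
        x = np.eye(D)[[idx]]                  # feed the prediction back as the next input
    return predicted

D, H, T = 24, 120, 5
rng = np.random.default_rng(0)
Wx = rng.normal(0.0, np.sqrt(6.0 / (D + H)), size=(D, H))
Wh = 0.01 * np.eye(H)
Wy = rng.normal(0.0, np.sqrt(6.0 / (D + H)), size=(H, D))
bh, by = np.zeros(H), np.zeros(D)

print(greedy_decode(6, T, Wx, Wh, Wy, bh, by))   # seed index 6 is arbitrary
```

The softmax is omitted in this sketch because it preserves the ordering of the scores, so the argmax of the scores equals the argmax of the probabilities.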

In [10]:

# Prepare train data matrix of size "batch_size x batch_len"
data_ix = [char_to_ix[ch] for ch in data[:data_len]]
train_data = np.array(data_ix)
print('original train set shape',train_data.shape)
train_data = np.reshape(train_data[:batch_size*batch_len], [batch_size,batch_len])
print('pre-processed train set shape',train_data.shape)

original train set shape (135,)
pre-processed train set shape (3, 45)
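Each row of this matrix is then cut into windows of length T, and the target window is the input window shifted by one character. The sketch below uses consecutive integers as a stand-in for the character indices (an assumption, so that the shift is easy to see).

```python
import numpy as np

data_len, batch_size, T = 135, 3, 5
batch_len = data_len // batch_size          # 45 characters per row
epoch_size = (batch_len - 1) // T           # 8 windows per row

# Stand-in for the character indices: consecutive integers.
train_data = np.arange(data_len).reshape(batch_size, batch_len)

i = 0                                       # window index within the epoch
batch_x = train_data[:, i*T:(i+1)*T]        # inputs:  columns 0..4
batch_y = train_data[:, i*T+1:(i+1)*T+1]    # targets: columns 1..5

print(batch_x[0], batch_y[0])
# prints: [0 1 2 3 4] [1 2 3 4 5]
```

The `- 1` in `epoch_size` leaves room for the extra target character: the last window (i = 7) reads target columns 36 to 40, which still lie inside the 45 columns of each row.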

In [11]:

# The following function transforms integer values d in {0,...,D-1} into one-hot vectors, that is,
# vectors of dimension D which have value 1 at index d and 0 elsewhere (one row per input value)
from scipy.sparse import coo_matrix
def convert_to_one_hot(a,max_val=None):
    N = a.size
    data = np.ones(N,dtype=int)
    sparse_out = coo_matrix((data,(np.arange(N),a.ravel())), shape=(N,max_val))
    return np.array(sparse_out.todense())
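The same encoding can also be written without SciPy by assigning directly into a zero matrix. The function name `one_hot_numpy` is hypothetical; the sketch is meant to show what the function above returns, not to replace it.

```python
import numpy as np

def one_hot_numpy(a, depth):
    """Return an (N, depth) integer matrix with a 1 at column a[n] in row n."""
    a = np.asarray(a).ravel()
    out = np.zeros((a.size, depth), dtype=int)
    out[np.arange(a.size), a] = 1          # row n gets its 1 at index a[n]
    return out

print(one_hot_numpy([0, 2], 3))
# prints:
# [[1 0 0]
#  [0 0 1]]
```

As the example shows, the value d = 2 sets index 2 of its row, so indices are zero-based throughout.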

Step 7

Run the computational graph with batches of training data.
Predict the sequence of characters starting from the character "h".

Hints:
(1) The initial memory $h_{t=0}$ is 0.
(2) Run the computational graph to optimize the perplexity loss and to predict the sequence of characters starting from the character "h".

In [12]:

# Run CG
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

h0 = np.zeros([batch_size,H])
indices = collections.deque()
costs = 0.0; epoch_iters = 0
for n in range(50):
    
    # Batch extraction
    if len(indices) < 1:
        indices.extend(range(epoch_size))
        costs = 0.0; epoch_iters = 0
    i = indices.popleft() 
    batch_x = train_data[:,i*T:(i+1)*T]
    batch_x = convert_to_one_hot(batch_x,D); batch_x = np.reshape(batch_x,[batch_size,T,D])
    batch_y = train_data[:,i*T+1:(i+1)*T+1]
    #print(batch_x.shape,batch_y.shape)

    # Train
    idx = char_to_ix['h'];
    loss_value,_,Ypredicted = sess.run([loss,train_step,Ypred], feed_dict={Xin: batch_x, Ytarget: batch_y, hin: h0, idx_pred: [idx]})
   
    # Perplexity
    costs += loss_value
    epoch_iters += T
    perplexity = np.exp(costs/epoch_iters)
    
    if not n%1:
        idx_char = Ypredicted
        txt = ''.join(ix_to_char[ix] for ix in list(idx_char))
        print('\nn=',n,', perplexity value=',perplexity)
        print('starting char=',ix_to_char[idx], ', predicted sequences=',txt)
    
sess.close()    

n= 0 , perplexity value= 29.5535130421
starting char= h , predicted sequences= bguct

n= 1 , perplexity value= 27.9625002039
starting char= h , predicted sequences= epepe

n= 2 , perplexity value= 28.2821211393
starting char= h , predicted sequences=  toe 

n= 3 , perplexity value= 27.6789270581
starting char= h , predicted sequences=  la i

n= 4 , perplexity value= 26.0211196019
starting char= h , predicted sequences=  aa  

n= 5 , perplexity value= 23.946255338
starting char= h , predicted sequences=  a  a

n= 6 , perplexity value= 21.8258818021
starting char= h , predicted sequences=  a ps

n= 7 , perplexity value= 21.1786920954
starting char= h , predicted sequences=  osra

n= 8 , perplexity value= 9.46772242765
starting char= h , predicted sequences=  in i

n= 9 , perplexity value= 10.4062497066
starting char= h , predicted sequences= ello 

n= 10 , perplexity value= 10.4282811726
starting char= h , predicted sequences= ello 

n= 11 , perplexity value= 11.502705008
starting char= h , predicted sequences=  lan 

n= 12 , perplexity value= 11.6051496676
starting char= h , predicted sequences=  lan 

n= 13 , perplexity value= 10.6132289482
starting char= h , predicted sequences=  of a

n= 14 , perplexity value= 9.6058489675
starting char= h , predicted sequences=  ls p

n= 15 , perplexity value= 9.37241584996
starting char= h , predicted sequences= e pro

n= 16 , perplexity value= 5.31786451328
starting char= h , predicted sequences=  lang

n= 17 , perplexity value= 5.71144402466
starting char= h , predicted sequences= ello 

n= 18 , perplexity value= 5.68151127767
starting char= h , predicted sequences= ello 

n= 19 , perplexity value= 6.46925124888
starting char= h , predicted sequences= elln 

n= 20 , perplexity value= 6.58066032529
starting char= h , predicted sequences= e bas

n= 21 , perplexity value= 6.13328235991
starting char= h , predicted sequences= e pf 

n= 22 , perplexity value= 5.65600492947
starting char= h , predicted sequences= e pro

n= 23 , perplexity value= 5.61072280515
starting char= h , predicted sequences= el of

n= 24 , perplexity value= 3.5418044391
starting char= h , predicted sequences= ellu 

n= 25 , perplexity value= 3.61793218568
starting char= h , predicted sequences= ello 

n= 26 , perplexity value= 3.56632600989
starting char= h , predicted sequences= ello 

n= 27 , perplexity value= 3.98579038998
starting char= h , predicted sequences= ello 

n= 28 , perplexity value= 4.01507879557
starting char= h , predicted sequences= ello 

n= 29 , perplexity value= 3.81765365683
starting char= h , predicted sequences= e bas

n= 30 , perplexity value= 3.60902281626
starting char= h , predicted sequences= ello 

n= 31 , perplexity value= 3.66261425863
starting char= h , predicted sequences= ello 

n= 32 , perplexity value= 2.58569042517
starting char= h , predicted sequences= ellu 

n= 33 , perplexity value= 2.71294759439
starting char= h , predicted sequences= ello 

n= 34 , perplexity value= 2.69944025877
starting char= h , predicted sequences= ello 

n= 35 , perplexity value= 2.93036847301
starting char= h , predicted sequences= ello 

n= 36 , perplexity value= 2.8802468858
starting char= h , predicted sequences= ello 

n= 37 , perplexity value= 2.74185760328
starting char= h , predicted sequences= ello 

n= 38 , perplexity value= 2.60001086299
starting char= h , predicted sequences= ello 

n= 39 , perplexity value= 2.63901644204
starting char= h , predicted sequences= ello 

n= 40 , perplexity value= 2.10067073021
starting char= h , predicted sequences= ello 

n= 41 , perplexity value= 2.21468356725
starting char= h , predicted sequences= ello 

n= 42 , perplexity value= 2.19319889961
starting char= h , predicted sequences= ello 

n= 43 , perplexity value= 2.36261821496
starting char= h , predicted sequences= ello 

n= 44 , perplexity value= 2.31412813917
starting char= h , predicted sequences= ello 

n= 45 , perplexity value= 2.22972349843
starting char= h , predicted sequences= ello 

n= 46 , perplexity value= 2.12354689044
starting char= h , predicted sequences= ello 

n= 47 , perplexity value= 2.12585888398
starting char= h , predicted sequences= ello 

n= 48 , perplexity value= 1.87534942843
starting char= h , predicted sequences= ello 

n= 49 , perplexity value= 1.9234134446
starting char= h , predicted sequences= ello
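
The printed perplexity is a running average since the start of the current epoch, because `costs` and `epoch_iters` are reset whenever the deque of batch indices is refilled. The sketch below replays only the batch scheduling of the training loop (no training); the names `schedule` and `resets` are introduced here for illustration.

```python
import collections

epoch_size = 8
indices = collections.deque()
schedule = []                       # batch index used at each iteration n
for n in range(50):
    if len(indices) < 1:            # deque exhausted: a new epoch starts
        indices.extend(range(epoch_size))
    schedule.append(indices.popleft())

# Iterations at which the running cost is reset (batch index 0).
resets = [n for n, i in enumerate(schedule) if i == 0]
print(resets)
# prints: [0, 8, 16, 24, 32, 40, 48]
```

These iterations line up with the sharpest drops in the log, for example from 21.18 at n=7 to 9.47 at n=8. One plausible reading is that the value at those steps covers a single batch under the latest parameters, whereas later values in the same epoch average over several batches.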