Notebook

Movie List Recommender using Actor-critic based RL method¶

Training a list-wise movie recommender using actor-critic policy and evaluating offline using experience replay method

toc: true
badges: true
comments: true
categories: [RL, Movie, Tensorflow 1x]
image:

Introduction¶

We will model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users’ feedbacks.

Efforts have been made on utilizing reinforcement learning for recommender systems, such as POMDP and Q-learning. However, these methods may become inflexible with the increasing number of items for recommendations. This prevents them to be adopted by practical recommender systems.

Generally, there exist two Deep Q-learning architectures, shown in the above figure. Traditional deep Q-learning adopts the first architecture as shown in (a), which inputs only the state space and outputs Q-values of all actions. This architecture is suitable for the scenario with high state space and small action space, like playing Atari. However, one drawback is that it cannot handle large and dynamic action space scenario, like recommender systems. The second Q-learning architecture, shown (b), treats the state and the action as the input of Neural Networks and outputs the Q-value corresponding to this action. This architecture does not need to store each Q-value in memory and thus can deal with large action space or even continuous action space. A challenging problem of leveraging the second architecture is temporal complexity, i.e., this architecture computes Q-value for all potential actions, separately.

Recommending a list of items is more desirable (especially for sellers) than recommending a single item. To achieve this, one option is to score the items seperately and select the top ones. For example, DQN can calculate Q-values of all recalled items separately, and recommend a list of items with highest Q-values. But this strategy do not consider the relation between items. e.g. If the next best item is egg, all kind of different eggs will get high scores, like white eggs, brown eggs, farm eggs etc. But these are similar items, not complimentary. And the whole purpose of list-wise recommendation is to recommend complimentary items. That's where DQN fails.

One option to resolve this issue is by adding a simple rule - select only 1 top-scoring item from each category. This is a good rule and will improve the list quality but we have to compromise with some missed opportunities here because let's say we recommend a 12-brown-eggs 🥚 tray and a small brown bread 🍞. Now it is possible that if a 24-brown-eggs tray is scored higher than 12-brown-eggs tray but in bread category, small-brown-bread is still the highest score item. As per business sense, we should recommend a large brown bread with 24-brown-eggs tray. This is what we missed - either customer will manually select the large bread (lost opportunity for higher customer satisfaction) or just buy the small one (lost opportunity for higher revenue).

In this tutorial, our goal is to fill this gap. We will evaluate the RL agent offline using experience replay method. Also, it is possible that productionizing this model cost more than the benefit, especially for small businesses, because if we are getting 1% revenue gain, on $1M, it might not be sufficient, and on $1B, the same model will become one of the highest priority model to productionize 💵.

To tackle this problem, in this paper, our recommending policy builds upon the Actor-Critic framework. We model this problem as a Markov Decision Process (MDP), which includes a sequence of states, actions and rewards. More formally, MDP consists of a tuple of five elements $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$ as follows:

State space $\mathcal{S}$: A state $s_t \in S$ is defined as the browsing history of a user, i.e., previous $N$ items that a user browsed before time $t$. The items in $s_t$ are sorted in chronological order.
Action space $\mathcal{A}$: An action $a_t \in \mathcal{A}$ is to recommend a list of items to a user at time $t$ based on current state $s_t$.
Reward $\mathcal{R}$: After the recommender agent takes an action $a_t$ at the state $s_t$ , i.e., recommending a list of items to a user, the user browses these items and provides her feedback. She can skip (not click), click, or order these items, and the agent receives immediate reward $r(s_t,a_t)$ according to the user’s feedback.
Transition probability $\mathcal{P}$: Transition probability defines the probability of state transition from $s_t$ to $s_{t+1}$ when RA takes action $a_t$. If user skips all the recommended items, then the next state $s_{t+1}$ = $s_t$; while if the user clicks/orders part of items, then the next state $s_{t+1}$ updates.
Discount factor $\gamma$ : $\gamma \in [0,1]$ defines the discount factor when we measure the present value of future reward. In particular, when $\gamma$ = 0, RA only considers the immediate reward. In other words, when $\gamma$ = 1, all future rewards can be counted fully into that of the current action.

With the notations and definitions above, the problem of listwise item recommendation can be formally de!ned as follows: Given the historical MDP, i.e., $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$, the goal is to find a recommendation policy $\pi : \mathcal{S} \to A$, which can maximize the cumulative reward for the recommender system.

According to collaborative filtering techniques, users with similar interests will make similar decisions on the same item. With this intuition, we match the current state and action to existing historical state-action pairs, and stochastically generate a simulated reward. To be more specific, we first build a memory $M = {m_1,m_2, ···}$ to store users’ historical browsing history, where $m_i$ is a user-agent interaction triplet $((s_i, a_i) \to r_i)$. The procedure to build the online simulator memory is illustrated in the following figure. Given a historical recommendation session ${a_1, ··· , a_L}$, we can observe the initial state $s_0 = {s_0^1, ··· ,s_0^N}$ from the previous sessions (line 2). Each time we observe $K$ items in temporal order (line 3), which means that each iteration we will move forward a window of K. We can observe the current state (line 4), current $K$ items (line 5), and the user’s feedbacks for these items (line 6). Then we store triplet $((s_i, a_i) \to r_i)$ in memory (line 7). Finally we update the state (lines 8-13), and move to the next $K$ items. Since we keep a fixed length state $s = {s_1, ··· ,s_N }$, each time a user clicked/ordered some items in the recommended list, we add these items to the end of state and remove the same number of items in the top of the state. For example, the RA recommends a list of !ve items ${a_1, ··· , a_5}$ to a user, if the user clicks $a_1$ and orders $a_5$, then update $s = {s_3, ··· ,s_N , a_1, a_5}$.

Then we calculated the similarity of the current state-action pair, say $p_t(s_t,a_t)$, to each existing historical state-action pair in the memory. In this work, we adopt cosine similarity as:

where the first term measures the state similarity and the second term evaluates the action similarity. Parameter $\alpha$ controls the balance of two similarities.

The proposed framework is as follows:

The framework works like this:

Input: Current state $s_t$ , Item space $\mathcal{I}$, the length of recommendation list $K$. Output: Recommendation list $a_t$.

Generate $w_t = {w_t^1 , ··· , w_t^K}$ according to $f_\theta\pi : s_t \to w_t$ where $f_\theta\pi$ is a function parametrized by $\theta^\pi$, mapping from the state space to the weight representation space
For $k = 1:K$ do
1. Score items in $\mathcal{I}$ according to $score_i = w_t^ke_i^T$
2. Select an item with highest score $a_t^k$
3. Add item $a_t^k$ in the bottom of $a_t$
4. Remove item $a_t^k$ from $\mathcal{I}$
end for
return $a_t$

Setup¶

In [ ]:

%tensorflow_version 1.x

TensorFlow 1.x selected.

In [ ]:

import itertools
import pandas as pd
import numpy as np
import random
import csv
import time

import matplotlib.pyplot as plt

import tensorflow as tf

import keras.backend as K
from keras import Sequential
from keras.layers import Dense, Dropout

Using TensorFlow backend.

Download data¶

Downloading Movielens dataset from official source

In [ ]:

!wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
!unzip -q ml-100k.zip

DataGenerator `class`¶

Load the data into pandas dataframe
List down user's rating history in chronological order
Generate a sample of state-action pair
Split the data into train/test
Store the data back into csv file format

In [ ]:

#collapse-hide
class DataGenerator():
  def __init__(self, datapath, itempath):
    '''
    Load data from the DB MovieLens
    List the users and the items
    List all the users histories
    '''
    self.data  = self.load_data(datapath, itempath)
    self.users = self.data['userId'].unique()   #list of all users
    self.items = self.data['itemId'].unique()   #list of all items
    self.histo = self.generate_history()
    self.train = []
    self.test  = []

  def load_data(self, datapath, itempath):
    '''
    Load the data and merge the name of each movie. 
    A row corresponds to a rate given by a user to a movie.

     Parameters
    ----------
    datapath :  string
                path to the data 100k MovieLens
                contains usersId;itemId;rating 
    itempath :  string
                path to the data 100k MovieLens
                contains itemId;itemName
     Returns
    -------
    result :    DataFrame
                Contains all the ratings 
    '''
    data = pd.read_csv(datapath, sep='\t', 
                       names=['userId', 'itemId', 'rating', 'timestamp'])
    movie_titles = pd.read_csv(itempath, sep='|', names=['itemId', 'itemName'],
                           usecols=range(2), encoding='latin-1')
    return data.merge(movie_titles,on='itemId', how='left')


  def generate_history(self):
    '''
    Group all rates given by users and store them from older to most recent.
    
    Returns
    -------
    result :    List(DataFrame)
                List of the historic for each user
    '''
    historic_users = []
    for i, u in enumerate(self.users):
      temp = self.data[self.data['userId'] == u]
      temp = temp.sort_values('timestamp').reset_index()
      temp.drop('index', axis=1, inplace=True)
      historic_users.append(temp)
    return historic_users

  def sample_history(self, user_histo, action_ratio=0.8, max_samp_by_user=5,  max_state=100, max_action=50, nb_states=[], nb_actions=[]):
    '''
    For a given history, make one or multiple sampling.
    If no optional argument given for nb_states and nb_actions, then the sampling
    is random and each sample can have differents size for action and state.
    To normalize sampling we need to give list of the numbers of states and actions
    to be sampled.

    Parameters
    ----------
    user_histo :  DataFrame
                      historic of user
    delimiter :       string, optional
                      delimiter for the csv
    action_ratio :    float, optional
                      ratio form which movies in history will be selected
    max_samp_by_user: int, optional
                      Nulber max of sample to make by user
    max_state :       int, optional
                      Number max of movies to take for the 'state' column
    max_action :      int, optional
                      Number max of movies to take for the 'action' action
    nb_states :       array(int), optional
                      Numbers of movies to be taken for each sample made on user's historic
    nb_actions :      array(int), optional
                      Numbers of rating to be taken for each sample made on user's historic
    
    Returns
    -------
    states :         List(String)
                     All the states sampled, format of a sample: itemId&rating
    actions :        List(String)
                     All the actions sampled, format of a sample: itemId&rating
  

    Notes
    -----
    States must be before(timestamp<) the actions.
    If given, size of nb_states is the numbller of sample by user
    sizes of nb_states and nb_actions must be equals
    '''

    n = len(user_histo)
    sep = int(action_ratio * n)
    nb_sample = random.randint(1, max_samp_by_user)
    if not nb_states:
      nb_states = [min(random.randint(1, sep), max_state) for i in range(nb_sample)]
    if not nb_actions:
      nb_actions = [min(random.randint(1, n - sep), max_action) for i in range(nb_sample)]
    assert len(nb_states) == len(nb_actions), 'Given array must have the same size'
    
    states  = []
    actions = []
    # SELECT SAMPLES IN HISTORY
    for i in range(len(nb_states)):
      sample_states = user_histo.iloc[0:sep].sample(nb_states[i])
      sample_actions = user_histo.iloc[-(n - sep):].sample(nb_actions[i])
      
      sample_state =  []
      sample_action = []
      for j in range(nb_states[i]):
        row   = sample_states.iloc[j]
        # FORMAT STATE
        state = str(row.loc['itemId']) + '&' + str(row.loc['rating'])
        sample_state.append(state)
      
      for j in range(nb_actions[i]):
        row    = sample_actions.iloc[j]
        # FORMAT ACTION
        action = str(row.loc['itemId']) + '&' + str(row.loc['rating'])
        sample_action.append(action)

      states.append(sample_state)
      actions.append(sample_action)
    return states, actions

  def gen_train_test(self, test_ratio, seed=None):
    '''
    Shuffle the historic of users and separate it in a train and a test set.
    Store the ids for each set.
    An user can't be in both set.

     Parameters
    ----------
    test_ratio :  float
                  Ratio to control the sizes of the sets
    seed       :  float
                  Seed on the shuffle
    '''
    n = len(self.histo)

    if seed is not None:
      random.Random(seed).shuffle(self.histo)
    else:
      random.shuffle(self.histo)

    self.train = self.histo[:int((test_ratio * n))]
    self.test  = self.histo[int((test_ratio * n)):]
    self.user_train = [h.iloc[0,0] for h in self.train]
    self.user_test  = [h.iloc[0,0] for h in self.test]
    

  def write_csv(self, filename, histo_to_write, delimiter=';', action_ratio=0.8, max_samp_by_user=5, max_state=100, max_action=50, nb_states=[], nb_actions=[]):
    '''
    From  a given historic, create a csv file with the format:
    columns : state;action_reward;n_state
    rows    : itemid&rating1 | itemid&rating2 | ... ; itemid&rating3 | ... | itemid&rating4; itemid&rating1 | itemid&rating2 | itemid&rating3 | ... | item&rating4
    at filename location.

    Parameters
    ----------
    filename :        string
                      path to the file to be produced
    histo_to_write :  List(DataFrame)
                      List of the historic for each user
    delimiter :       string, optional
                      delimiter for the csv
    action_ratio :    float, optional
                      ratio form which movies in history will be selected
    max_samp_by_user: int, optional
                      Nulber max of sample to make by user
    max_state :       int, optional
                      Number max of movies to take for the 'state' column
    max_action :      int, optional
                      Number max of movies to take for the 'action' action
    nb_states :       array(int), optional
                      Numbers of movies to be taken for each sample made on user's historic
    nb_actions :      array(int), optional
                      Numbers of rating to be taken for each sample made on user's historic

    Notes
    -----
    if given, size of nb_states is the numbller of sample by user
    sizes of nb_states and nb_actions must be equals

    '''
    with open(filename, mode='w') as file:
      f_writer = csv.writer(file, delimiter=delimiter)
      f_writer.writerow(['state', 'action_reward', 'n_state'])
      for user_histo in histo_to_write:
        states, actions = self.sample_history(user_histo, action_ratio, max_samp_by_user, max_state, max_action, nb_states, nb_actions)
        for i in range(len(states)):
          # FORMAT STATE
          state_str   = '|'.join(states[i])
          # FORMAT ACTION
          action_str  = '|'.join(actions[i])
          # FORMAT N_STATE
          n_state_str = state_str + '|' + action_str
          f_writer.writerow([state_str, action_str, n_state_str])

EmbeddingsGenerator `class`¶

Load the data
Build a keras sequential model
Convert train and test set into required format
Train and evaluate the model
Generate item embeddings for each movie id
Save the embeddings into a csv file

In [ ]:

#collapse-hide
class EmbeddingsGenerator:
  def  __init__(self, train_users, data):
    self.train_users = train_users

    #preprocess
    self.data = data.sort_values(by=['timestamp'])
    #make them start at 0
    self.data['userId'] = self.data['userId'] - 1
    self.data['itemId'] = self.data['itemId'] - 1
    self.user_count = self.data['userId'].max() + 1
    self.movie_count = self.data['itemId'].max() + 1
    self.user_movies = {} #list of rated movies by each user
    for userId in range(self.user_count):
      self.user_movies[userId] = self.data[self.data.userId == userId]['itemId'].tolist()
    self.m = self.model()

  def model(self, hidden_layer_size=100):
    m = Sequential()
    m.add(Dense(hidden_layer_size, input_shape=(1, self.movie_count)))
    m.add(Dropout(0.2))
    m.add(Dense(self.movie_count, activation='softmax'))
    m.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return m
  
  def generate_input(self, user_id):
    '''
    Returns a context and a target for the user_id
    context: user's history with one random movie removed
    target: id of random removed movie
    '''
    user_movies_count = len(self.user_movies[user_id])
    #picking random movie
    random_index = np.random.randint(0, user_movies_count-1) # -1 avoids taking the last movie
    #setting target
    target = np.zeros((1, self.movie_count))
    target[0][self.user_movies[user_id][random_index]] = 1
    #setting context
    context = np.zeros((1, self.movie_count))
    context[0][self.user_movies[user_id][:random_index] + self.user_movies[user_id][random_index+1:]] = 1
    return context, target

  def train(self, nb_epochs = 300, batch_size = 10000):
    '''
    Trains the model from train_users's history
    '''
    for i in range(nb_epochs):
      print('%d/%d' % (i+1, nb_epochs))
      batch = [self.generate_input(user_id=np.random.choice(self.train_users) - 1) for _ in range(batch_size)]
      X_train = np.array([b[0] for b in batch])
      y_train = np.array([b[1] for b in batch])
      self.m.fit(X_train, y_train, epochs=1, validation_split=0.5)

  def test(self, test_users, batch_size = 100000):
    '''
    Returns [loss, accuracy] on the test set
    '''
    batch_test = [self.generate_input(user_id=np.random.choice(test_users) - 1) for _ in range(batch_size)]
    X_test = np.array([b[0] for b in batch_test])
    y_test = np.array([b[1] for b in batch_test])
    return self.m.evaluate(X_test, y_test)

  def save_embeddings(self, file_name):
    '''
    Generates a csv file containg the vector embedding for each movie.
    '''
    inp = self.m.input                                           # input placeholder
    outputs = [layer.output for layer in self.m.layers]          # all layer outputs
    functor = K.function([inp, K.learning_phase()], outputs )   # evaluation function

    #append embeddings to vectors
    vectors = []
    for movie_id in range(self.movie_count):
      movie = np.zeros((1, 1, self.movie_count))
      movie[0][0][movie_id] = 1
      layer_outs = functor([movie])
      vector = [str(v) for v in layer_outs[0][0][0]]
      vector = '|'.join(vector)
      vectors.append([movie_id, vector])

    #saves as a csv file
    embeddings = pd.DataFrame(vectors, columns=['item_id', 'vectors']).astype({'item_id': 'int32'})
    embeddings.to_csv(file_name, sep=';', index=False)

Embeddings `helper class`¶

In [ ]:

#collapse-hide
class Embeddings:
  def __init__(self, item_embeddings):
    self.item_embeddings = item_embeddings
  
  def size(self):
    return self.item_embeddings.shape[1]
  
  def get_embedding_vector(self):
    return self.item_embeddings
  
  def get_embedding(self, item_index):
    return self.item_embeddings[item_index]

  def embed(self, item_list):
    return np.array([self.get_embedding(item) for item in item_list])

read_file `helper function`¶

This function will read the stored data csv files into pandas dataframe

In [ ]:

#collapse-hide
def read_file(data_path):
  ''' Load data from train.csv or test.csv. '''

  data = pd.read_csv(data_path, sep=';')
  for col in ['state', 'n_state', 'action_reward']:
    data[col] = [np.array([[np.int(k) for k in ee.split('&')] for ee in e.split('|')]) for e in data[col]]
  for col in ['state', 'n_state']:
    data[col] = [np.array([e[0] for e in l]) for l in data[col]]

  data['action'] = [[e[0] for e in l] for l in data['action_reward']]
  data['reward'] = [tuple(e[1] for e in l) for l in data['action_reward']]
  data.drop(columns=['action_reward'], inplace=True)

  return data

read_embeddings `helper function`¶

This function will read the stored embedding csv file into pandas dataframe and return as multi-dimensional array

In [ ]:

def read_embeddings(embeddings_path):
  ''' Load embeddings (a vector for each item). '''
  
  embeddings = pd.read_csv(embeddings_path, sep=';')

  return np.array([[np.float64(k) for k in e.split('|')]
                   for e in embeddings['vectors']])

Environment `class`¶

This is the simulator. It will help orchestrating the whole process of learning list recommendations by our actor-critic based MDP agent.

In [ ]:

#collapse-hide
class Environment():
  def __init__(self, data, embeddings, alpha, gamma, fixed_length):
    self.embeddings = embeddings

    self.embedded_data = pd.DataFrame()
    self.embedded_data['state'] = [np.array([embeddings.get_embedding(item_id) 
      for item_id in row['state']]) for _, row in data.iterrows()]
    self.embedded_data['action'] = [np.array([embeddings.get_embedding(item_id) 
      for item_id in row['action']]) for _, row in data.iterrows()]
    self.embedded_data['reward'] = data['reward']

    self.alpha = alpha # α (alpha) in Equation (1)
    self.gamma = gamma # Γ (Gamma) in Equation (4)
    self.fixed_length = fixed_length
    self.current_state = self.reset()
    self.groups = self.get_groups()

  def reset(self):
    self.init_state = self.embedded_data['state'].sample(1).values[0]
    return self.init_state

  def step(self, actions):
    '''
    Compute reward and update state.
    Args:
      actions: embedded chosen items.
    Returns:
      cumulated_reward: overall reward.
      current_state: updated state.
    '''

    # '18: Compute overall reward r_t according to Equation (4)'
    simulated_rewards, cumulated_reward = self.simulate_rewards(self.current_state.reshape((1, -1)), actions.reshape((1, -1)))

    # '11: Set s_t+1 = s_t' <=> self.current_state = self.current_state

    for k in range(len(simulated_rewards)): # '12: for k = 1, K do'
      if simulated_rewards[k] > 0: # '13: if r_t^k > 0 then'
        # '14: Add a_t^k to the end of s_t+1'
        self.current_state = np.append(self.current_state, [actions[k]], axis=0)
        if self.fixed_length: # '15: Remove the first item of s_t+1'
          self.current_state = np.delete(self.current_state, 0, axis=0)

    return cumulated_reward, self.current_state

  def get_groups(self):
    ''' Calculate average state/action value for each group. Equation (3). '''

    groups = []
    for rewards, group in self.embedded_data.groupby(['reward']):
      size = group.shape[0]
      states = np.array(list(group['state'].values))
      actions = np.array(list(group['action'].values))
      groups.append({
        'size': size, # N_x in article
        'rewards': rewards, # U_x in article (combination of rewards)
        'average state': (np.sum(states / np.linalg.norm(states, 2, axis=1)[:, np.newaxis], axis=0) / size).reshape((1, -1)), # s_x^-
        'average action': (np.sum(actions / np.linalg.norm(actions, 2, axis=1)[:, np.newaxis], axis=0) / size).reshape((1, -1)) # a_x^-
      })
    return groups

  def simulate_rewards(self, current_state, chosen_actions, reward_type='grouped cosine'):
    '''
    Calculate simulated rewards.
    Args:
      current_state: history, list of embedded items.
      chosen_actions: embedded chosen items.
      reward_type: from ['normal', 'grouped average', 'grouped cosine'].
    Returns:
      returned_rewards: most probable rewards.
      cumulated_reward: probability weighted rewards.
    '''

    # Equation (1)
    def cosine_state_action(s_t, a_t, s_i, a_i):
      cosine_state = np.dot(s_t, s_i.T) / (np.linalg.norm(s_t, 2) * np.linalg.norm(s_i, 2))
      cosine_action = np.dot(a_t, a_i.T) / (np.linalg.norm(a_t, 2) * np.linalg.norm(a_i, 2))
      return (self.alpha * cosine_state + (1 - self.alpha) * cosine_action).reshape((1,))

    if reward_type == 'normal':
      # Calculate simulated reward in normal way: Equation (2)
      probabilities = [cosine_state_action(current_state, chosen_actions, row['state'], row['action'])
        for _, row in self.embedded_data.iterrows()]
    elif reward_type == 'grouped average':
      # Calculate simulated reward by grouped average: Equation (3)
      probabilities = np.array([g['size'] for g in self.groups]) *\
        [(self.alpha * (np.dot(current_state, g['average state'].T) / np.linalg.norm(current_state, 2))\
        + (1 - self.alpha) * (np.dot(chosen_actions, g['average action'].T) / np.linalg.norm(chosen_actions, 2)))
        for g in self.groups]
    elif reward_type == 'grouped cosine':
      # Calculate simulated reward by grouped cosine: Equations (1) and (3)
      probabilities = [cosine_state_action(current_state, chosen_actions, g['average state'], g['average action'])
        for g in self.groups]

    # Normalize (sum to 1)
    probabilities = np.array(probabilities) / sum(probabilities)

    # Get most probable rewards
    if reward_type == 'normal':
      returned_rewards = self.embedded_data.iloc[np.argmax(probabilities)]['reward']
    elif reward_type in ['grouped average', 'grouped cosine']:
      returned_rewards = self.groups[np.argmax(probabilities)]['rewards']

    # Equation (4)
    def overall_reward(rewards, gamma):
      return np.sum([gamma**k * reward for k, reward in enumerate(rewards)])

    if reward_type in ['normal', 'grouped average']:
      # Get cumulated reward: Equation (4)
      cumulated_reward = overall_reward(returned_rewards, self.gamma)
    elif reward_type == 'grouped cosine':
      # Get probability weighted cumulated reward
      cumulated_reward = np.sum([p * overall_reward(g['rewards'], self.gamma)
        for p, g in zip(probabilities, self.groups)])

    return returned_rewards, cumulated_reward

Actor `class`¶

This is the policy approximator actor

In [ ]:

#collapse-hide
class Actor():
  ''' Policy function approximator. '''
  
  def __init__(self, sess, state_space_size, action_space_size, batch_size, ra_length, history_length, embedding_size, tau, learning_rate, scope='actor'):
    self.sess = sess
    self.state_space_size = state_space_size
    self.action_space_size = action_space_size
    self.batch_size = batch_size
    self.ra_length = ra_length
    self.history_length = history_length
    self.embedding_size = embedding_size
    self.tau = tau
    self.learning_rate = learning_rate
    self.scope = scope

    with tf.variable_scope(self.scope):
      # Build Actor network
      self.action_weights, self.state, self.sequence_length = self._build_net('estimator_actor')
      self.network_params = tf.trainable_variables()

      # Build target Actor network
      self.target_action_weights, self.target_state, self.target_sequence_length = self._build_net('target_actor')
      self.target_network_params = tf.trainable_variables()[len(self.network_params):] # TODO: why sublist [len(x):]? Maybe because its equal to network_params + target_network_params

      # Initialize target network weights with network weights (θ^π′ ← θ^π)
      self.init_target_network_params = [self.target_network_params[i].assign(self.network_params[i])
        for i in range(len(self.target_network_params))]
        
      # Update target network weights (θ^π′ ← τθ^π + (1 − τ)θ^π′)
      self.update_target_network_params = [self.target_network_params[i].assign(
        tf.multiply(self.tau, self.network_params[i]) +
        tf.multiply(1 - self.tau, self.target_network_params[i]))
        for i in range(len(self.target_network_params))]

      # Gradient computation from Critic's action_gradients
      self.action_gradients = tf.placeholder(tf.float32, [None, self.action_space_size])
      gradients = tf.gradients(tf.reshape(self.action_weights, [self.batch_size, self.action_space_size], name='42222222222'),
                               self.network_params,
                               self.action_gradients)
      params_gradients = list(map(lambda x: tf.div(x, self.batch_size * self.action_space_size), gradients))
      
      # Compute ∇_a.Q(s, a|θ^µ).∇_θ^π.f_θ^π(s)
      self.optimizer = tf.train.AdamOptimizer(self.learning_rate).apply_gradients(
          zip(params_gradients, self.network_params))

  def _build_net(self, scope):
    ''' Build the (target) Actor network. '''

    def gather_last_output(data, seq_lens):
      def cli_value(x, v):
        y = tf.constant(v, shape=x.get_shape(), dtype=tf.int64)
        x = tf.cast(x, tf.int64)
        return tf.where(tf.greater(x, y), x, y)

      batch_range = tf.range(tf.cast(tf.shape(data)[0], dtype=tf.int64), dtype=tf.int64)
      tmp_end = tf.map_fn(lambda x: cli_value(x, 0), seq_lens - 1, dtype=tf.int64)
      indices = tf.stack([batch_range, tmp_end], axis=1)
      return tf.gather_nd(data, indices)

    with tf.variable_scope(scope):
      # Inputs: current state, sequence_length
      # Outputs: action weights to compute the score Equation (6)
      state = tf.placeholder(tf.float32, [None, self.state_space_size], 'state')
      state_ = tf.reshape(state, [-1, self.history_length, self.embedding_size])
      sequence_length = tf.placeholder(tf.int32, [None], 'sequence_length')
      cell = tf.nn.rnn_cell.GRUCell(self.embedding_size,
                                    activation=tf.nn.relu,
                                    kernel_initializer=tf.initializers.random_normal(),
                                    bias_initializer=tf.zeros_initializer())
      outputs, _ = tf.nn.dynamic_rnn(cell, state_, dtype=tf.float32, sequence_length=sequence_length)
      last_output = gather_last_output(outputs, sequence_length) # TODO: replace by h
      x = tf.keras.layers.Dense(self.ra_length * self.embedding_size)(last_output)
      action_weights = tf.reshape(x, [-1, self.ra_length, self.embedding_size])

    return action_weights, state, sequence_length

  def train(self, state, sequence_length, action_gradients):
    '''  Compute ∇_a.Q(s, a|θ^µ).∇_θ^π.f_θ^π(s). '''
    self.sess.run(self.optimizer,
                  feed_dict={
                      self.state: state,
                      self.sequence_length: sequence_length,
                      self.action_gradients: action_gradients})

  def predict(self, state, sequence_length):
    return self.sess.run(self.action_weights,
                         feed_dict={
                             self.state: state,
                             self.sequence_length: sequence_length})

  def predict_target(self, state, sequence_length):
    return self.sess.run(self.target_action_weights,
                         feed_dict={
                             self.target_state: state,
                             self.target_sequence_length: sequence_length})

  def init_target_network(self):
    self.sess.run(self.init_target_network_params)

  def update_target_network(self):
    self.sess.run(self.update_target_network_params)
    
  def get_recommendation_list(self, ra_length, noisy_state, embeddings, target=False):
    '''
    Algorithm 2
    Args:
      ra_length: length of the recommendation list.
      noisy_state: current/remembered environment state with noise.
      embeddings: Embeddings object.
      target: boolean to use Actor's network or target network.
    Returns:
      Recommendation List: list of embedded items as future actions.
    '''

    def get_score(weights, embedding, batch_size):
      '''
      Equation (6)
      Args:
        weights: w_t^k shape=(embedding_size,).
        embedding: e_i shape=(embedding_size,).
      Returns:
        score of the item i: score_i=w_t^k.e_i^T shape=(1,).
      '''
      ret = np.dot(weights, embedding.T)
      return ret

    batch_size = noisy_state.shape[0]

    # '1: Generate w_t = {w_t^1, ..., w_t^K} according to Equation (5)'
    method = self.predict_target if target else self.predict
    weights = method(noisy_state, [ra_length] * batch_size)

    # '3: Score items in I according to Equation (6)'
    scores = np.array([[[get_score(weights[i][k], embedding, batch_size)
      for embedding in embeddings.get_embedding_vector()]
      for k in range(ra_length)]
      for i in range(batch_size)])

    # '8: return a_t'
    return np.array([[embeddings.get_embedding(np.argmax(scores[i][k]))
      for k in range(ra_length)]
      for i in range(batch_size)])

Critic `class`¶

This is the value approximator critic

In [ ]:

#collapse-hide
class Critic():
  ''' Value function approximator. '''
  
  def __init__(self, sess, state_space_size, action_space_size, history_length, embedding_size, tau, learning_rate, scope='critic'):
    self.sess = sess
    self.state_space_size = state_space_size
    self.action_space_size = action_space_size
    self.history_length = history_length
    self.embedding_size = embedding_size
    self.tau = tau
    self.learning_rate = learning_rate
    self.scope = scope

    with tf.variable_scope(self.scope):
      # Build Critic network
      self.critic_Q_value, self.state, self.action, self.sequence_length = self._build_net('estimator_critic')
      self.network_params = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='estimator_critic')

      # Build target Critic network
      self.target_Q_value, self.target_state, self.target_action, self.target_sequence_length = self._build_net('target_critic')
      self.target_network_params = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='target_critic')

      # Initialize target network weights with network weights (θ^µ′ ← θ^µ)
      self.init_target_network_params = [self.target_network_params[i].assign(self.network_params[i])
        for i in range(len(self.target_network_params))]

      # Update target network weights (θ^µ′ ← τθ^µ + (1 − τ)θ^µ′)
      self.update_target_network_params = [self.target_network_params[i].assign(
        tf.multiply(self.tau, self.network_params[i]) +
        tf.multiply(1 - self.tau, self.target_network_params[i]))
        for i in range(len(self.target_network_params))]

      # Minimize MSE between Critic's and target Critic's outputed Q-values
      self.expected_reward = tf.placeholder(tf.float32, [None, 1])
      self.loss = tf.reduce_mean(tf.squared_difference(self.expected_reward, self.critic_Q_value))
      self.optimizer = tf.train.AdamOptimizer(self.learning_rate).minimize(self.loss)

      # Compute ∇_a.Q(s, a|θ^µ)
      self.action_gradients = tf.gradients(self.critic_Q_value, self.action)

  def _build_net(self, scope):
    ''' Build the (target) Critic network. '''

    def gather_last_output(data, seq_lens):
      def cli_value(x, v):
        y = tf.constant(v, shape=x.get_shape(), dtype=tf.int64)
        return tf.where(tf.greater(x, y), x, y)

      this_range = tf.range(tf.cast(tf.shape(seq_lens)[0], dtype=tf.int64), dtype=tf.int64)
      tmp_end = tf.map_fn(lambda x: cli_value(x, 0), seq_lens - 1, dtype=tf.int64)
      indices = tf.stack([this_range, tmp_end], axis=1)
      return tf.gather_nd(data, indices)

    with tf.variable_scope(scope):
      # Inputs: current state, current action
      # Outputs: predicted Q-value
      state = tf.placeholder(tf.float32, [None, self.state_space_size], 'state')
      state_ = tf.reshape(state, [-1, self.history_length, self.embedding_size])
      action = tf.placeholder(tf.float32, [None, self.action_space_size], 'action')
      sequence_length = tf.placeholder(tf.int64, [None], name='critic_sequence_length')
      cell = tf.nn.rnn_cell.GRUCell(self.history_length,
                                    activation=tf.nn.relu,
                                    kernel_initializer=tf.initializers.random_normal(),
                                    bias_initializer=tf.zeros_initializer())
      predicted_state, _ = tf.nn.dynamic_rnn(cell, state_, dtype=tf.float32, sequence_length=sequence_length)
      predicted_state = gather_last_output(predicted_state, sequence_length)

      inputs = tf.concat([predicted_state, action], axis=-1)
      layer1 = tf.layers.Dense(32, activation=tf.nn.relu)(inputs)
      layer2 = tf.layers.Dense(16, activation=tf.nn.relu)(layer1)
      critic_Q_value = tf.layers.Dense(1)(layer2)
      return critic_Q_value, state, action, sequence_length

  def train(self, state, action, sequence_length, expected_reward):
    ''' Minimize MSE between expected reward and target Critic's Q-value. '''
    return self.sess.run([self.critic_Q_value, self.loss, self.optimizer],
                         feed_dict={
                             self.state: state,
                             self.action: action,
                             self.sequence_length: sequence_length,
                             self.expected_reward: expected_reward})

  def predict(self, state, action, sequence_length):
    ''' Returns Critic's predicted Q-value. '''
    return self.sess.run(self.critic_Q_value,
                         feed_dict={
                             self.state: state,
                             self.action: action,
                             self.sequence_length: sequence_length})

  def predict_target(self, state, action, sequence_length):
    ''' Returns target Critic's predicted Q-value. '''
    return self.sess.run(self.target_Q_value,
                         feed_dict={
                             self.target_state: state,
                             self.target_action: action,
                             self.target_sequence_length: sequence_length})

  def get_action_gradients(self, state, action, sequence_length):
    ''' Returns ∇_a.Q(s, a|θ^µ). '''
    return np.array(self.sess.run(self.action_gradients,
                         feed_dict={
                             self.state: state,
                             self.action: action,
                             self.sequence_length: sequence_length})[0])

  def init_target_network(self):
    self.sess.run(self.init_target_network_params)

  def update_target_network(self):
    self.sess.run(self.update_target_network_params)

ReplayMemory `class`¶

In [ ]:

#collapse-hide
class ReplayMemory():
  ''' Replay memory D. '''
  
  def __init__(self, buffer_size):
    self.buffer_size = buffer_size
    # self.buffer = [[row['state'], row['action'], row['reward'], row['n_state']] for _, row in data.iterrows()][-self.buffer_size:] TODO: empty or not?
    self.buffer = []

  def add(self, state, action, reward, n_state):
    self.buffer.append([state, action, reward, n_state])
    if len(self.buffer) > self.buffer_size:
      self.buffer.pop(0)

  def size(self):
    return len(self.buffer)

  def sample_batch(self, batch_size):
    return random.sample(self.buffer, batch_size)

experience_replay `function`¶

In [ ]:

#collapse-hide
def experience_replay(replay_memory, batch_size, actor, critic, embeddings, ra_length, state_space_size, action_space_size, discount_factor):
  '''
  Experience replay.
  Args:
    replay_memory: replay memory D in article.
    batch_size: sample size.
    actor: Actor network.
    critic: Critic network.
    embeddings: Embeddings object.
    state_space_size: dimension of states.
    action_space_size: dimensions of actions.
  Returns:
    Best Q-value, loss of Critic network for printing/recording purpose.
  '''

  # '22: Sample minibatch of N transitions (s, a, r, s′) from D'
  samples = replay_memory.sample_batch(batch_size)
  states = np.array([s[0] for s in samples])
  actions = np.array([s[1] for s in samples])
  rewards = np.array([s[2] for s in samples])
  n_states = np.array([s[3] for s in samples]).reshape(-1, state_space_size)

  # '23: Generate a′ by target Actor network according to Algorithm 2'
  n_actions = actor.get_recommendation_list(ra_length, states, embeddings, target=True).reshape(-1, action_space_size)

  # Calculate predicted Q′(s′, a′|θ^µ′) value
  target_Q_value = critic.predict_target(n_states, n_actions, [ra_length] * batch_size)

  # '24: Set y = r + γQ′(s′, a′|θ^µ′)'
  expected_rewards = rewards + discount_factor * target_Q_value
  
  # '25: Update Critic by minimizing (y − Q(s, a|θ^µ))²'
  critic_Q_value, critic_loss, _ = critic.train(states, actions, [ra_length] * batch_size, expected_rewards)
  
  # '26: Update the Actor using the sampled policy gradient'
  action_gradients = critic.get_action_gradients(states, n_actions, [ra_length] * batch_size)
  actor.train(states, [ra_length] * batch_size, action_gradients)

  # '27: Update the Critic target networks'
  critic.update_target_network()

  # '28: Update the Actor target network'
  actor.update_target_network()

  return np.amax(critic_Q_value), critic_loss

OrnsteinUhlenbeckNoise `class`¶

In [ ]:

#collapse-hide
class OrnsteinUhlenbeckNoise:
  ''' Noise for Actor predictions. '''
  def __init__(self, action_space_size, mu=0, theta=0.5, sigma=0.2):
    self.action_space_size = action_space_size
    self.mu = mu
    self.theta = theta
    self.sigma = sigma
    self.state = np.ones(self.action_space_size) * self.mu

  def get(self):
    self.state += self.theta * (self.mu - self.state) + self.sigma * np.random.rand(self.action_space_size)
    return self.state

def train(sess, environment, actor, critic, embeddings, history_length, ra_length, buffer_size, batch_size, discount_factor, nb_episodes, filename_summary):
  ''' Algorithm 3 in article. '''

  # Set up summary operators
  def build_summaries():
    episode_reward = tf.Variable(0.)
    tf.summary.scalar('reward', episode_reward)
    episode_max_Q = tf.Variable(0.)
    tf.summary.scalar('max_Q_value', episode_max_Q)
    critic_loss = tf.Variable(0.)
    tf.summary.scalar('critic_loss', critic_loss)

    summary_vars = [episode_reward, episode_max_Q, critic_loss]
    summary_ops = tf.summary.merge_all()
    return summary_ops, summary_vars

  summary_ops, summary_vars = build_summaries()
  sess.run(tf.global_variables_initializer())
  writer = tf.summary.FileWriter(filename_summary, sess.graph)

  # '2: Initialize target network f′ and Q′'
  actor.init_target_network()
  critic.init_target_network()

  # '3: Initialize the capacity of replay memory D'
  replay_memory = ReplayMemory(buffer_size) # Memory D in article
  replay = False


  start_time = time.time()
  for i_session in range(nb_episodes): # '4: for session = 1, M do'
    session_reward = 0
    session_Q_value = 0
    session_critic_loss = 0

    # '5: Reset the item space I' is useless because unchanged.

    states = environment.reset() # '6: Initialize state s_0 from previous sessions'
    
    if (i_session + 1) % 10 == 0: # Update average parameters every 10 episodes
      environment.groups = environment.get_groups()
      
    exploration_noise = OrnsteinUhlenbeckNoise(history_length * embeddings.size())

    for t in range(nb_rounds): # '7: for t = 1, T do'
      # '8: Stage 1: Transition Generating Stage'

      # '9: Select an action a_t = {a_t^1, ..., a_t^K} according to Algorithm 2'
      actions = actor.get_recommendation_list(
          ra_length,
          states.reshape(1, -1), # TODO + exploration_noise.get().reshape(1, -1),
          embeddings).reshape(ra_length, embeddings.size())

      # '10: Execute action a_t and observe the reward list {r_t^1, ..., r_t^K} for each item in a_t'
      rewards, next_states = environment.step(actions)

      # '19: Store transition (s_t, a_t, r_t, s_t+1) in D'
      replay_memory.add(states.reshape(history_length * embeddings.size()),
                        actions.reshape(ra_length * embeddings.size()),
                        [rewards],
                        next_states.reshape(history_length * embeddings.size()))

      states = next_states # '20: Set s_t = s_t+1'

      session_reward += rewards
      
      # '21: Stage 2: Parameter Updating Stage'
      if replay_memory.size() >= batch_size: # Experience replay
        replay = True
        replay_Q_value, critic_loss = experience_replay(replay_memory, batch_size,
          actor, critic, embeddings, ra_length, history_length * embeddings.size(),
          ra_length * embeddings.size(), discount_factor)
        session_Q_value += replay_Q_value
        session_critic_loss += critic_loss

      summary_str = sess.run(summary_ops,
                             feed_dict={summary_vars[0]: session_reward,
                                        summary_vars[1]: session_Q_value,
                                        summary_vars[2]: session_critic_loss})
      
      writer.add_summary(summary_str, i_session)

      '''
      print(state_to_items(embeddings.embed(data['state'][0]), actor, ra_length, embeddings),
            state_to_items(embeddings.embed(data['state'][0]), actor, ra_length, embeddings, True))
      '''

    str_loss = str('Loss=%0.4f' % session_critic_loss)
    print(('Episode %d/%d Reward=%d Time=%ds ' + (str_loss if replay else 'No replay')) % (i_session + 1, nb_episodes, session_reward, time.time() - start_time))
    start_time = time.time()

  writer.close()
  tf.train.Saver().save(sess, 'models.h5', write_meta_graph=False)

Hyperparameters¶

In [ ]:

# Hyperparameters
history_length = 12 # N in article
ra_length = 4 # K in article
discount_factor = 0.99 # Gamma in Bellman equation
actor_lr = 0.0001
critic_lr = 0.001
tau = 0.001 # τ in Algorithm 3
batch_size = 64
nb_episodes = 100
nb_rounds = 50
filename_summary = 'summary.txt'
alpha = 0.5 # α (alpha) in Equation (1)
gamma = 0.9 # Γ (Gamma) in Equation (4)
buffer_size = 1000000 # Size of replay memory D in article
fixed_length = True # Fixed memory length

Data generation¶

In [ ]:

dg = DataGenerator('ml-100k/u.data', 'ml-100k/u.item')
dg.gen_train_test(0.8, seed=42)

dg.write_csv('train.csv', dg.train, nb_states=[history_length], nb_actions=[ra_length])
dg.write_csv('test.csv', dg.test, nb_states=[history_length], nb_actions=[ra_length])

data = read_file('train.csv')

In [ ]:

data.head()

Out[ ]:

	state	n_state	action	reward
0	[732, 257, 507, 602, 481, 568, 1286, 50, 501, ...	[732, 257, 507, 602, 481, 568, 1286, 50, 501, ...	[731, 525, 80, 88]	(3, 4, 3, 3)
1	[1226, 855, 339, 124, 16, 147, 59, 827, 323, 2...	[1226, 855, 339, 124, 16, 147, 59, 827, 323, 2...	[52, 1005, 347, 70]	(4, 5, 4, 3)
2	[316, 286, 313, 748, 258, 272, 300, 302, 347, ...	[316, 286, 313, 748, 258, 272, 300, 302, 347, ...	[751, 271, 689, 289]	(4, 4, 4, 5)
3	[235, 433, 96, 117, 429, 7, 471, 201, 276, 55,...	[235, 433, 96, 117, 429, 7, 471, 201, 276, 55,...	[31, 198, 724, 654]	(3, 5, 3, 4)
4	[77, 241, 98, 423, 71, 157, 955, 186, 121, 421...	[77, 241, 98, 423, 71, 157, 955, 186, 121, 421...	[316, 427, 313, 959]	(4, 5, 4, 5)

Embedding generation¶

In [ ]:

#collapse-output
if True: # Generate embeddings?
  eg = EmbeddingsGenerator(dg.user_train, pd.read_csv('ml-100k/u.data', sep='\t', names=['userId', 'itemId', 'rating', 'timestamp']))
  eg.train(nb_epochs=300)
  train_loss, train_accuracy = eg.test(dg.user_train)
  print('Train set: Loss=%.4f ; Accuracy=%.1f%%' % (train_loss, train_accuracy * 100))
  test_loss, test_accuracy = eg.test(dg.user_test)
  print('Test set: Loss=%.4f ; Accuracy=%.1f%%' % (test_loss, test_accuracy * 100))
  eg.save_embeddings('embeddings.csv')

WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
1/300
WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/math_grad.py:1424: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 493us/step - loss: 6.9202 - accuracy: 0.0100 - val_loss: 6.5489 - val_accuracy: 0.0160
2/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 392us/step - loss: 6.4452 - accuracy: 0.0150 - val_loss: 6.3391 - val_accuracy: 0.0144
3/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 397us/step - loss: 6.2737 - accuracy: 0.0172 - val_loss: 6.2418 - val_accuracy: 0.0142
4/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 395us/step - loss: 6.2539 - accuracy: 0.0156 - val_loss: 6.1208 - val_accuracy: 0.0152
5/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 387us/step - loss: 6.1386 - accuracy: 0.0196 - val_loss: 6.1390 - val_accuracy: 0.0230
6/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 399us/step - loss: 6.0841 - accuracy: 0.0170 - val_loss: 6.0209 - val_accuracy: 0.0210
7/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 396us/step - loss: 6.0844 - accuracy: 0.0204 - val_loss: 5.9844 - val_accuracy: 0.0240
8/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 397us/step - loss: 6.0329 - accuracy: 0.0222 - val_loss: 5.9625 - val_accuracy: 0.0244
9/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 396us/step - loss: 6.0147 - accuracy: 0.0204 - val_loss: 5.9273 - val_accuracy: 0.0246
10/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 387us/step - loss: 5.9878 - accuracy: 0.0250 - val_loss: 5.9182 - val_accuracy: 0.0264
11/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 388us/step - loss: 5.9183 - accuracy: 0.0254 - val_loss: 5.8513 - val_accuracy: 0.0324
12/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 390us/step - loss: 5.9244 - accuracy: 0.0266 - val_loss: 5.8591 - val_accuracy: 0.0334
13/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 386us/step - loss: 5.8851 - accuracy: 0.0312 - val_loss: 5.8540 - val_accuracy: 0.0326
14/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 392us/step - loss: 5.8685 - accuracy: 0.0316 - val_loss: 5.8123 - val_accuracy: 0.0378
15/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 399us/step - loss: 5.8641 - accuracy: 0.0334 - val_loss: 5.8084 - val_accuracy: 0.0338
16/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 400us/step - loss: 5.8641 - accuracy: 0.0322 - val_loss: 5.7705 - val_accuracy: 0.0390
17/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 399us/step - loss: 5.8240 - accuracy: 0.0434 - val_loss: 5.7161 - val_accuracy: 0.0510
18/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 387us/step - loss: 5.7978 - accuracy: 0.0442 - val_loss: 5.7052 - val_accuracy: 0.0492
19/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 384us/step - loss: 5.7698 - accuracy: 0.0396 - val_loss: 5.7139 - val_accuracy: 0.0484
20/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 392us/step - loss: 5.7513 - accuracy: 0.0440 - val_loss: 5.6890 - val_accuracy: 0.0524
21/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 401us/step - loss: 5.7090 - accuracy: 0.0488 - val_loss: 5.6584 - val_accuracy: 0.0514
22/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 401us/step - loss: 5.7031 - accuracy: 0.0432 - val_loss: 5.5928 - val_accuracy: 0.0622
23/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 396us/step - loss: 5.6509 - accuracy: 0.0554 - val_loss: 5.5568 - val_accuracy: 0.0670
24/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 381us/step - loss: 5.6746 - accuracy: 0.0600 - val_loss: 5.6020 - val_accuracy: 0.0604
25/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 390us/step - loss: 5.5880 - accuracy: 0.0618 - val_loss: 5.5072 - val_accuracy: 0.0756
26/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 406us/step - loss: 5.6083 - accuracy: 0.0634 - val_loss: 5.5330 - val_accuracy: 0.0712
27/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 391us/step - loss: 5.5992 - accuracy: 0.0658 - val_loss: 5.5303 - val_accuracy: 0.0782
28/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 390us/step - loss: 5.5620 - accuracy: 0.0730 - val_loss: 5.4303 - val_accuracy: 0.0820
29/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 386us/step - loss: 5.5142 - accuracy: 0.0720 - val_loss: 5.3807 - val_accuracy: 0.0934
30/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 387us/step - loss: 5.5019 - accuracy: 0.0752 - val_loss: 5.3523 - val_accuracy: 0.0960
31/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 391us/step - loss: 5.4424 - accuracy: 0.0876 - val_loss: 5.3459 - val_accuracy: 0.0970
32/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 391us/step - loss: 5.4085 - accuracy: 0.0878 - val_loss: 5.3406 - val_accuracy: 0.1014
33/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 398us/step - loss: 5.3886 - accuracy: 0.0888 - val_loss: 5.3043 - val_accuracy: 0.0976
34/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 393us/step - loss: 5.3653 - accuracy: 0.0924 - val_loss: 5.2621 - val_accuracy: 0.1140
35/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 388us/step - loss: 5.4007 - accuracy: 0.0890 - val_loss: 5.2699 - val_accuracy: 0.1140
36/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 404us/step - loss: 5.2848 - accuracy: 0.1040 - val_loss: 5.2311 - val_accuracy: 0.1204
37/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 395us/step - loss: 5.3042 - accuracy: 0.1008 - val_loss: 5.1873 - val_accuracy: 0.1230
38/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 396us/step - loss: 5.2921 - accuracy: 0.1066 - val_loss: 5.1309 - val_accuracy: 0.1314
39/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 398us/step - loss: 5.2618 - accuracy: 0.1070 - val_loss: 5.0953 - val_accuracy: 0.1448
40/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 390us/step - loss: 5.1676 - accuracy: 0.1136 - val_loss: 5.0245 - val_accuracy: 0.1492
41/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 389us/step - loss: 5.1354 - accuracy: 0.1242 - val_loss: 5.0307 - val_accuracy: 0.1498
42/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 5.1219 - accuracy: 0.1276 - val_loss: 4.9901 - val_accuracy: 0.1650
43/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 400us/step - loss: 5.1534 - accuracy: 0.1336 - val_loss: 4.9714 - val_accuracy: 0.1694
44/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 399us/step - loss: 5.1184 - accuracy: 0.1376 - val_loss: 4.9489 - val_accuracy: 0.1592
45/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 401us/step - loss: 5.0298 - accuracy: 0.1376 - val_loss: 4.9274 - val_accuracy: 0.1720
46/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 395us/step - loss: 5.0099 - accuracy: 0.1504 - val_loss: 4.8445 - val_accuracy: 0.1786
47/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 395us/step - loss: 4.9666 - accuracy: 0.1526 - val_loss: 4.7906 - val_accuracy: 0.1948
48/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 399us/step - loss: 4.9227 - accuracy: 0.1608 - val_loss: 4.7955 - val_accuracy: 0.1882
49/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 392us/step - loss: 4.9143 - accuracy: 0.1596 - val_loss: 4.7671 - val_accuracy: 0.1988
50/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 396us/step - loss: 4.8729 - accuracy: 0.1662 - val_loss: 4.7239 - val_accuracy: 0.2054
51/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 394us/step - loss: 4.7912 - accuracy: 0.1836 - val_loss: 4.6844 - val_accuracy: 0.2136
52/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 393us/step - loss: 4.8110 - accuracy: 0.1784 - val_loss: 4.6213 - val_accuracy: 0.2212
53/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 408us/step - loss: 4.6814 - accuracy: 0.1954 - val_loss: 4.5962 - val_accuracy: 0.2324
54/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 394us/step - loss: 4.7148 - accuracy: 0.1876 - val_loss: 4.5031 - val_accuracy: 0.2446
55/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 395us/step - loss: 4.6427 - accuracy: 0.2044 - val_loss: 4.5063 - val_accuracy: 0.2494
56/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 394us/step - loss: 4.6296 - accuracy: 0.2088 - val_loss: 4.4856 - val_accuracy: 0.2628
57/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 389us/step - loss: 4.5943 - accuracy: 0.2136 - val_loss: 4.4382 - val_accuracy: 0.2612
58/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 394us/step - loss: 4.5594 - accuracy: 0.2218 - val_loss: 4.3101 - val_accuracy: 0.2852
59/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 395us/step - loss: 4.5126 - accuracy: 0.2246 - val_loss: 4.3327 - val_accuracy: 0.2772
60/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 397us/step - loss: 4.4381 - accuracy: 0.2378 - val_loss: 4.2424 - val_accuracy: 0.2890
61/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 388us/step - loss: 4.4603 - accuracy: 0.2266 - val_loss: 4.2749 - val_accuracy: 0.2970
62/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 392us/step - loss: 4.4146 - accuracy: 0.2394 - val_loss: 4.1974 - val_accuracy: 0.3112
63/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 405us/step - loss: 4.3489 - accuracy: 0.2488 - val_loss: 4.1782 - val_accuracy: 0.3094
64/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 395us/step - loss: 4.3680 - accuracy: 0.2514 - val_loss: 4.1138 - val_accuracy: 0.3308
65/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 395us/step - loss: 4.3094 - accuracy: 0.2558 - val_loss: 4.0360 - val_accuracy: 0.3318
66/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 392us/step - loss: 4.2179 - accuracy: 0.2738 - val_loss: 4.0146 - val_accuracy: 0.3466
67/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 400us/step - loss: 4.1800 - accuracy: 0.2802 - val_loss: 3.9621 - val_accuracy: 0.3408
68/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 399us/step - loss: 4.1356 - accuracy: 0.2862 - val_loss: 3.9211 - val_accuracy: 0.3626
69/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 390us/step - loss: 4.0549 - accuracy: 0.3104 - val_loss: 3.8791 - val_accuracy: 0.3770
70/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 401us/step - loss: 4.0348 - accuracy: 0.3038 - val_loss: 3.8400 - val_accuracy: 0.3868
71/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 397us/step - loss: 3.9944 - accuracy: 0.3064 - val_loss: 3.7690 - val_accuracy: 0.3856
72/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 414us/step - loss: 4.0106 - accuracy: 0.3132 - val_loss: 3.7704 - val_accuracy: 0.3956
73/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 404us/step - loss: 3.9379 - accuracy: 0.3204 - val_loss: 3.6701 - val_accuracy: 0.3974
74/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 403us/step - loss: 3.8949 - accuracy: 0.3356 - val_loss: 3.6144 - val_accuracy: 0.4296
75/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 390us/step - loss: 3.8187 - accuracy: 0.3368 - val_loss: 3.5836 - val_accuracy: 0.4128
76/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 405us/step - loss: 3.8102 - accuracy: 0.3428 - val_loss: 3.5028 - val_accuracy: 0.4346
77/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 404us/step - loss: 3.7603 - accuracy: 0.3534 - val_loss: 3.4695 - val_accuracy: 0.4408
78/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 3.7149 - accuracy: 0.3650 - val_loss: 3.4650 - val_accuracy: 0.4510
79/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 392us/step - loss: 3.6468 - accuracy: 0.3808 - val_loss: 3.4893 - val_accuracy: 0.4436
80/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 414us/step - loss: 3.6445 - accuracy: 0.3664 - val_loss: 3.3478 - val_accuracy: 0.4686
81/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 398us/step - loss: 3.5365 - accuracy: 0.3988 - val_loss: 3.3008 - val_accuracy: 0.4854
82/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 3.5254 - accuracy: 0.3944 - val_loss: 3.3290 - val_accuracy: 0.4742
83/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 414us/step - loss: 3.4863 - accuracy: 0.4056 - val_loss: 3.3257 - val_accuracy: 0.4792
84/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 398us/step - loss: 3.4112 - accuracy: 0.4164 - val_loss: 3.1776 - val_accuracy: 0.5012
85/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 3.4042 - accuracy: 0.4216 - val_loss: 3.1592 - val_accuracy: 0.5088
86/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 3.2852 - accuracy: 0.4386 - val_loss: 3.1144 - val_accuracy: 0.5164
87/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 403us/step - loss: 3.2709 - accuracy: 0.4408 - val_loss: 3.0742 - val_accuracy: 0.5240
88/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 403us/step - loss: 3.2722 - accuracy: 0.4414 - val_loss: 3.0320 - val_accuracy: 0.5292
89/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 3.1894 - accuracy: 0.4532 - val_loss: 2.9413 - val_accuracy: 0.5480
90/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 404us/step - loss: 3.1212 - accuracy: 0.4698 - val_loss: 2.8748 - val_accuracy: 0.5718
91/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 394us/step - loss: 3.0990 - accuracy: 0.4772 - val_loss: 2.9096 - val_accuracy: 0.5594
92/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 397us/step - loss: 2.9949 - accuracy: 0.4906 - val_loss: 2.7876 - val_accuracy: 0.5796
93/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 405us/step - loss: 2.9939 - accuracy: 0.4926 - val_loss: 2.7424 - val_accuracy: 0.5816
94/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 405us/step - loss: 2.9415 - accuracy: 0.4980 - val_loss: 2.6546 - val_accuracy: 0.5992
95/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 397us/step - loss: 2.9397 - accuracy: 0.5084 - val_loss: 2.6482 - val_accuracy: 0.5952
96/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 387us/step - loss: 2.8384 - accuracy: 0.5260 - val_loss: 2.6494 - val_accuracy: 0.6058
97/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 396us/step - loss: 2.8045 - accuracy: 0.5276 - val_loss: 2.6678 - val_accuracy: 0.6152
98/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 406us/step - loss: 2.7874 - accuracy: 0.5284 - val_loss: 2.5639 - val_accuracy: 0.6278
99/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 408us/step - loss: 2.7609 - accuracy: 0.5396 - val_loss: 2.5057 - val_accuracy: 0.6398
100/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 404us/step - loss: 2.6811 - accuracy: 0.5578 - val_loss: 2.4427 - val_accuracy: 0.6516
101/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 410us/step - loss: 2.6871 - accuracy: 0.5456 - val_loss: 2.4066 - val_accuracy: 0.6654
102/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 408us/step - loss: 2.5663 - accuracy: 0.5738 - val_loss: 2.3830 - val_accuracy: 0.6528
103/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 393us/step - loss: 2.5405 - accuracy: 0.5778 - val_loss: 2.2917 - val_accuracy: 0.6684
104/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 398us/step - loss: 2.5373 - accuracy: 0.5870 - val_loss: 2.2606 - val_accuracy: 0.6852
105/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 2.4934 - accuracy: 0.5946 - val_loss: 2.2190 - val_accuracy: 0.6852
106/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 409us/step - loss: 2.4745 - accuracy: 0.5920 - val_loss: 2.1826 - val_accuracy: 0.6910
107/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 400us/step - loss: 2.3676 - accuracy: 0.6158 - val_loss: 2.1226 - val_accuracy: 0.7058
108/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 413us/step - loss: 2.3591 - accuracy: 0.6148 - val_loss: 2.0710 - val_accuracy: 0.7092
109/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 413us/step - loss: 2.3524 - accuracy: 0.6132 - val_loss: 2.0538 - val_accuracy: 0.7120
110/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 412us/step - loss: 2.2083 - accuracy: 0.6440 - val_loss: 2.0058 - val_accuracy: 0.7246
111/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 2.2500 - accuracy: 0.6318 - val_loss: 1.9156 - val_accuracy: 0.7410
112/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 398us/step - loss: 2.1819 - accuracy: 0.6530 - val_loss: 1.8126 - val_accuracy: 0.7542
113/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 2.0875 - accuracy: 0.6602 - val_loss: 1.8725 - val_accuracy: 0.7480
114/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 403us/step - loss: 2.0695 - accuracy: 0.6694 - val_loss: 1.7876 - val_accuracy: 0.7560
115/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 1.9872 - accuracy: 0.6808 - val_loss: 1.7615 - val_accuracy: 0.7638
116/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 396us/step - loss: 2.0265 - accuracy: 0.6764 - val_loss: 1.8029 - val_accuracy: 0.7500
117/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 395us/step - loss: 1.9593 - accuracy: 0.6968 - val_loss: 1.7222 - val_accuracy: 0.7694
118/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 405us/step - loss: 1.9793 - accuracy: 0.6876 - val_loss: 1.7054 - val_accuracy: 0.7854
119/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 406us/step - loss: 1.8934 - accuracy: 0.6984 - val_loss: 1.6764 - val_accuracy: 0.7760
120/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 411us/step - loss: 1.8731 - accuracy: 0.7044 - val_loss: 1.6600 - val_accuracy: 0.7762
121/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 413us/step - loss: 1.8633 - accuracy: 0.7042 - val_loss: 1.5655 - val_accuracy: 0.7950
122/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 400us/step - loss: 1.8155 - accuracy: 0.7212 - val_loss: 1.5577 - val_accuracy: 0.8050
123/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 406us/step - loss: 1.7936 - accuracy: 0.7226 - val_loss: 1.5797 - val_accuracy: 0.7924
124/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 400us/step - loss: 1.6727 - accuracy: 0.7428 - val_loss: 1.4695 - val_accuracy: 0.8170
125/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 401us/step - loss: 1.6898 - accuracy: 0.7436 - val_loss: 1.4540 - val_accuracy: 0.8192
126/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 405us/step - loss: 1.6531 - accuracy: 0.7418 - val_loss: 1.4008 - val_accuracy: 0.8206
127/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 399us/step - loss: 1.6137 - accuracy: 0.7506 - val_loss: 1.3895 - val_accuracy: 0.8118
128/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 406us/step - loss: 1.6290 - accuracy: 0.7560 - val_loss: 1.3585 - val_accuracy: 0.8338
129/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 401us/step - loss: 1.6117 - accuracy: 0.7524 - val_loss: 1.2903 - val_accuracy: 0.8374
130/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 411us/step - loss: 1.5511 - accuracy: 0.7594 - val_loss: 1.2891 - val_accuracy: 0.8316
131/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 396us/step - loss: 1.5014 - accuracy: 0.7786 - val_loss: 1.3013 - val_accuracy: 0.8308
132/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 416us/step - loss: 1.4592 - accuracy: 0.7834 - val_loss: 1.2238 - val_accuracy: 0.8452
133/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 415us/step - loss: 1.4994 - accuracy: 0.7738 - val_loss: 1.1824 - val_accuracy: 0.8542
134/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 1.4425 - accuracy: 0.7844 - val_loss: 1.1497 - val_accuracy: 0.8548
135/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 422us/step - loss: 1.4319 - accuracy: 0.7870 - val_loss: 1.1830 - val_accuracy: 0.8530
136/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 405us/step - loss: 1.3450 - accuracy: 0.8006 - val_loss: 1.1152 - val_accuracy: 0.8616
137/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 408us/step - loss: 1.4073 - accuracy: 0.7928 - val_loss: 1.1236 - val_accuracy: 0.8584
138/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 413us/step - loss: 1.3359 - accuracy: 0.8014 - val_loss: 1.1054 - val_accuracy: 0.8554
139/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 420us/step - loss: 1.3105 - accuracy: 0.8080 - val_loss: 1.0732 - val_accuracy: 0.8714
140/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 406us/step - loss: 1.2528 - accuracy: 0.8166 - val_loss: 1.1127 - val_accuracy: 0.8648
141/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 412us/step - loss: 1.2472 - accuracy: 0.8178 - val_loss: 1.0218 - val_accuracy: 0.8784
142/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 420us/step - loss: 1.2278 - accuracy: 0.8228 - val_loss: 0.9639 - val_accuracy: 0.8844
143/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 1.2285 - accuracy: 0.8170 - val_loss: 1.0322 - val_accuracy: 0.8720
144/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 407us/step - loss: 1.1853 - accuracy: 0.8240 - val_loss: 0.8959 - val_accuracy: 0.8954
145/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 414us/step - loss: 1.1545 - accuracy: 0.8328 - val_loss: 0.9459 - val_accuracy: 0.8820
146/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 409us/step - loss: 1.1736 - accuracy: 0.8298 - val_loss: 0.9650 - val_accuracy: 0.8752
147/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 408us/step - loss: 1.0828 - accuracy: 0.8468 - val_loss: 0.8727 - val_accuracy: 0.8946
148/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 418us/step - loss: 1.0743 - accuracy: 0.8454 - val_loss: 0.8732 - val_accuracy: 0.8952
149/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 397us/step - loss: 1.1223 - accuracy: 0.8380 - val_loss: 0.8399 - val_accuracy: 0.8968
150/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 409us/step - loss: 1.0736 - accuracy: 0.8504 - val_loss: 0.8629 - val_accuracy: 0.8986
151/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 404us/step - loss: 1.0527 - accuracy: 0.8480 - val_loss: 0.7800 - val_accuracy: 0.9100
152/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 406us/step - loss: 1.0271 - accuracy: 0.8550 - val_loss: 0.8736 - val_accuracy: 0.8946
153/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 401us/step - loss: 0.9801 - accuracy: 0.8648 - val_loss: 0.7773 - val_accuracy: 0.9082
154/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 398us/step - loss: 0.9624 - accuracy: 0.8606 - val_loss: 0.7587 - val_accuracy: 0.9122
155/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 412us/step - loss: 0.9653 - accuracy: 0.8668 - val_loss: 0.7569 - val_accuracy: 0.9102
156/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 409us/step - loss: 0.9171 - accuracy: 0.8768 - val_loss: 0.7783 - val_accuracy: 0.9008
157/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 410us/step - loss: 0.9824 - accuracy: 0.8598 - val_loss: 0.7716 - val_accuracy: 0.9114
158/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 412us/step - loss: 0.9050 - accuracy: 0.8720 - val_loss: 0.6798 - val_accuracy: 0.9246
159/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 419us/step - loss: 0.8868 - accuracy: 0.8782 - val_loss: 0.7305 - val_accuracy: 0.9134
160/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 414us/step - loss: 0.8656 - accuracy: 0.8774 - val_loss: 0.6773 - val_accuracy: 0.9174
161/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 412us/step - loss: 0.9094 - accuracy: 0.8732 - val_loss: 0.7563 - val_accuracy: 0.9078
162/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 408us/step - loss: 0.8625 - accuracy: 0.8852 - val_loss: 0.6772 - val_accuracy: 0.9216
163/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 411us/step - loss: 0.9143 - accuracy: 0.8728 - val_loss: 0.7034 - val_accuracy: 0.9154
164/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 417us/step - loss: 0.8791 - accuracy: 0.8746 - val_loss: 0.7079 - val_accuracy: 0.9188
165/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 410us/step - loss: 0.8265 - accuracy: 0.8880 - val_loss: 0.6240 - val_accuracy: 0.9240
166/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 414us/step - loss: 0.7777 - accuracy: 0.8974 - val_loss: 0.6988 - val_accuracy: 0.9150
167/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 403us/step - loss: 0.8305 - accuracy: 0.8876 - val_loss: 0.6100 - val_accuracy: 0.9274
168/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 407us/step - loss: 0.7893 - accuracy: 0.8900 - val_loss: 0.6668 - val_accuracy: 0.9140
169/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 414us/step - loss: 0.8148 - accuracy: 0.8926 - val_loss: 0.6679 - val_accuracy: 0.9230
170/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 419us/step - loss: 0.7757 - accuracy: 0.8962 - val_loss: 0.6510 - val_accuracy: 0.9218
171/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 403us/step - loss: 0.7666 - accuracy: 0.8958 - val_loss: 0.5848 - val_accuracy: 0.9266
172/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 417us/step - loss: 0.7406 - accuracy: 0.8994 - val_loss: 0.5751 - val_accuracy: 0.9296
173/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 413us/step - loss: 0.7387 - accuracy: 0.8992 - val_loss: 0.5893 - val_accuracy: 0.9270
174/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 402us/step - loss: 0.7045 - accuracy: 0.9026 - val_loss: 0.5447 - val_accuracy: 0.9280
175/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 417us/step - loss: 0.7532 - accuracy: 0.8990 - val_loss: 0.5440 - val_accuracy: 0.9338
176/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 410us/step - loss: 0.7437 - accuracy: 0.9028 - val_loss: 0.5865 - val_accuracy: 0.9252
177/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 416us/step - loss: 0.6795 - accuracy: 0.9090 - val_loss: 0.5411 - val_accuracy: 0.9344
178/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 411us/step - loss: 0.7007 - accuracy: 0.9012 - val_loss: 0.5581 - val_accuracy: 0.9260
179/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 421us/step - loss: 0.6832 - accuracy: 0.9086 - val_loss: 0.5115 - val_accuracy: 0.9390
180/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 413us/step - loss: 0.6835 - accuracy: 0.9100 - val_loss: 0.5173 - val_accuracy: 0.9446
181/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 415us/step - loss: 0.6935 - accuracy: 0.9046 - val_loss: 0.5112 - val_accuracy: 0.9412
182/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 410us/step - loss: 0.7066 - accuracy: 0.9000 - val_loss: 0.5668 - val_accuracy: 0.9314
183/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 429us/step - loss: 0.6225 - accuracy: 0.9148 - val_loss: 0.5051 - val_accuracy: 0.9388
184/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 414us/step - loss: 0.6505 - accuracy: 0.9174 - val_loss: 0.5356 - val_accuracy: 0.9334
185/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 425us/step - loss: 0.6816 - accuracy: 0.9102 - val_loss: 0.4791 - val_accuracy: 0.9428
186/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 407us/step - loss: 0.6619 - accuracy: 0.9106 - val_loss: 0.5131 - val_accuracy: 0.9410
187/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 407us/step - loss: 0.6706 - accuracy: 0.9084 - val_loss: 0.5034 - val_accuracy: 0.9348
188/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 413us/step - loss: 0.6367 - accuracy: 0.9156 - val_loss: 0.4722 - val_accuracy: 0.9390
189/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 412us/step - loss: 0.6209 - accuracy: 0.9154 - val_loss: 0.4924 - val_accuracy: 0.9394
190/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 417us/step - loss: 0.5862 - accuracy: 0.9240 - val_loss: 0.4789 - val_accuracy: 0.9396
191/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 419us/step - loss: 0.6070 - accuracy: 0.9210 - val_loss: 0.4566 - val_accuracy: 0.9392
192/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 428us/step - loss: 0.5869 - accuracy: 0.9196 - val_loss: 0.4740 - val_accuracy: 0.9422
193/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 428us/step - loss: 0.6011 - accuracy: 0.9222 - val_loss: 0.4707 - val_accuracy: 0.9468
194/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 412us/step - loss: 0.5858 - accuracy: 0.9198 - val_loss: 0.4336 - val_accuracy: 0.9468
195/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 422us/step - loss: 0.5947 - accuracy: 0.9202 - val_loss: 0.4398 - val_accuracy: 0.9484
196/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 427us/step - loss: 0.5615 - accuracy: 0.9256 - val_loss: 0.4687 - val_accuracy: 0.9408
197/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 420us/step - loss: 0.5673 - accuracy: 0.9236 - val_loss: 0.4215 - val_accuracy: 0.9478
198/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 415us/step - loss: 0.5637 - accuracy: 0.9294 - val_loss: 0.4343 - val_accuracy: 0.9456
199/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 417us/step - loss: 0.6137 - accuracy: 0.9172 - val_loss: 0.4341 - val_accuracy: 0.9462
200/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 414us/step - loss: 0.6006 - accuracy: 0.9218 - val_loss: 0.3884 - val_accuracy: 0.9512
201/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 417us/step - loss: 0.5635 - accuracy: 0.9268 - val_loss: 0.4230 - val_accuracy: 0.9480
202/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 418us/step - loss: 0.5658 - accuracy: 0.9256 - val_loss: 0.4512 - val_accuracy: 0.9440
203/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 416us/step - loss: 0.6056 - accuracy: 0.9200 - val_loss: 0.4215 - val_accuracy: 0.9438
204/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 414us/step - loss: 0.5344 - accuracy: 0.9278 - val_loss: 0.4380 - val_accuracy: 0.9458
205/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 426us/step - loss: 0.5138 - accuracy: 0.9304 - val_loss: 0.3961 - val_accuracy: 0.9506
206/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 418us/step - loss: 0.5704 - accuracy: 0.9264 - val_loss: 0.3948 - val_accuracy: 0.9486
207/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 436us/step - loss: 0.5551 - accuracy: 0.9248 - val_loss: 0.3943 - val_accuracy: 0.9526
208/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 427us/step - loss: 0.4828 - accuracy: 0.9366 - val_loss: 0.4855 - val_accuracy: 0.9334
209/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 416us/step - loss: 0.4814 - accuracy: 0.9376 - val_loss: 0.3574 - val_accuracy: 0.9580
210/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 422us/step - loss: 0.4560 - accuracy: 0.9418 - val_loss: 0.4189 - val_accuracy: 0.9474
211/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 417us/step - loss: 0.5182 - accuracy: 0.9278 - val_loss: 0.3576 - val_accuracy: 0.9526
212/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 416us/step - loss: 0.4829 - accuracy: 0.9360 - val_loss: 0.3724 - val_accuracy: 0.9542
213/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 422us/step - loss: 0.5509 - accuracy: 0.9252 - val_loss: 0.4110 - val_accuracy: 0.9492
214/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 431us/step - loss: 0.5758 - accuracy: 0.9202 - val_loss: 0.4106 - val_accuracy: 0.9498
215/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 440us/step - loss: 0.4821 - accuracy: 0.9340 - val_loss: 0.3331 - val_accuracy: 0.9600
216/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 430us/step - loss: 0.4816 - accuracy: 0.9352 - val_loss: 0.3872 - val_accuracy: 0.9562
217/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 418us/step - loss: 0.4919 - accuracy: 0.9354 - val_loss: 0.3316 - val_accuracy: 0.9600
218/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 424us/step - loss: 0.4545 - accuracy: 0.9382 - val_loss: 0.3393 - val_accuracy: 0.9538
219/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 426us/step - loss: 0.4772 - accuracy: 0.9376 - val_loss: 0.3637 - val_accuracy: 0.9542
220/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 423us/step - loss: 0.4726 - accuracy: 0.9400 - val_loss: 0.3490 - val_accuracy: 0.9598
221/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 440us/step - loss: 0.4793 - accuracy: 0.9378 - val_loss: 0.3734 - val_accuracy: 0.9488
222/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 415us/step - loss: 0.5026 - accuracy: 0.9352 - val_loss: 0.3776 - val_accuracy: 0.9526
223/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 427us/step - loss: 0.4759 - accuracy: 0.9306 - val_loss: 0.3640 - val_accuracy: 0.9504
224/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 424us/step - loss: 0.4789 - accuracy: 0.9378 - val_loss: 0.3393 - val_accuracy: 0.9588
225/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 431us/step - loss: 0.4675 - accuracy: 0.9372 - val_loss: 0.3774 - val_accuracy: 0.9558
226/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 422us/step - loss: 0.5579 - accuracy: 0.9288 - val_loss: 0.3467 - val_accuracy: 0.9576
227/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 446us/step - loss: 0.4209 - accuracy: 0.9410 - val_loss: 0.3965 - val_accuracy: 0.9468
228/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 447us/step - loss: 0.4648 - accuracy: 0.9406 - val_loss: 0.3432 - val_accuracy: 0.9578
229/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 438us/step - loss: 0.5176 - accuracy: 0.9314 - val_loss: 0.3913 - val_accuracy: 0.9500
230/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 435us/step - loss: 0.4967 - accuracy: 0.9360 - val_loss: 0.3768 - val_accuracy: 0.9560
231/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 424us/step - loss: 0.4823 - accuracy: 0.9396 - val_loss: 0.3141 - val_accuracy: 0.9628
232/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 424us/step - loss: 0.4552 - accuracy: 0.9438 - val_loss: 0.3027 - val_accuracy: 0.9600
233/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 423us/step - loss: 0.4230 - accuracy: 0.9444 - val_loss: 0.3282 - val_accuracy: 0.9578
234/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 427us/step - loss: 0.4708 - accuracy: 0.9340 - val_loss: 0.3755 - val_accuracy: 0.9472
235/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 414us/step - loss: 0.4087 - accuracy: 0.9416 - val_loss: 0.3489 - val_accuracy: 0.9550
236/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 433us/step - loss: 0.4523 - accuracy: 0.9386 - val_loss: 0.3294 - val_accuracy: 0.9594
237/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 431us/step - loss: 0.4482 - accuracy: 0.9392 - val_loss: 0.3893 - val_accuracy: 0.9538
238/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 434us/step - loss: 0.4506 - accuracy: 0.9400 - val_loss: 0.3563 - val_accuracy: 0.9530
239/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 406us/step - loss: 0.4631 - accuracy: 0.9388 - val_loss: 0.3517 - val_accuracy: 0.9570
240/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 421us/step - loss: 0.4394 - accuracy: 0.9492 - val_loss: 0.2732 - val_accuracy: 0.9688
241/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 428us/step - loss: 0.4131 - accuracy: 0.9440 - val_loss: 0.3108 - val_accuracy: 0.9628
242/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 420us/step - loss: 0.4081 - accuracy: 0.9440 - val_loss: 0.3488 - val_accuracy: 0.9532
243/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 415us/step - loss: 0.4218 - accuracy: 0.9440 - val_loss: 0.3199 - val_accuracy: 0.9612
244/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 414us/step - loss: 0.4196 - accuracy: 0.9452 - val_loss: 0.3118 - val_accuracy: 0.9610
245/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 427us/step - loss: 0.3984 - accuracy: 0.9474 - val_loss: 0.3152 - val_accuracy: 0.9612
246/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 422us/step - loss: 0.4311 - accuracy: 0.9400 - val_loss: 0.2962 - val_accuracy: 0.9634
247/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 424us/step - loss: 0.4427 - accuracy: 0.9402 - val_loss: 0.3643 - val_accuracy: 0.9556
248/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 420us/step - loss: 0.4436 - accuracy: 0.9410 - val_loss: 0.3296 - val_accuracy: 0.9612
249/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 423us/step - loss: 0.4356 - accuracy: 0.9440 - val_loss: 0.3044 - val_accuracy: 0.9630
250/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 428us/step - loss: 0.3828 - accuracy: 0.9490 - val_loss: 0.2771 - val_accuracy: 0.9684
251/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 435us/step - loss: 0.4115 - accuracy: 0.9416 - val_loss: 0.3557 - val_accuracy: 0.9580
252/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 432us/step - loss: 0.3686 - accuracy: 0.9490 - val_loss: 0.3319 - val_accuracy: 0.9634
253/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 435us/step - loss: 0.4639 - accuracy: 0.9432 - val_loss: 0.2853 - val_accuracy: 0.9638
254/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 437us/step - loss: 0.4792 - accuracy: 0.9362 - val_loss: 0.3423 - val_accuracy: 0.9564
255/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 426us/step - loss: 0.4066 - accuracy: 0.9480 - val_loss: 0.3347 - val_accuracy: 0.9576
256/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 428us/step - loss: 0.4724 - accuracy: 0.9376 - val_loss: 0.2919 - val_accuracy: 0.9658
257/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 432us/step - loss: 0.4215 - accuracy: 0.9410 - val_loss: 0.2725 - val_accuracy: 0.9642
258/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 425us/step - loss: 0.4419 - accuracy: 0.9464 - val_loss: 0.3282 - val_accuracy: 0.9636
259/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 420us/step - loss: 0.4133 - accuracy: 0.9474 - val_loss: 0.2633 - val_accuracy: 0.9680
260/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 438us/step - loss: 0.4547 - accuracy: 0.9410 - val_loss: 0.3277 - val_accuracy: 0.9632
261/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 432us/step - loss: 0.3550 - accuracy: 0.9518 - val_loss: 0.2824 - val_accuracy: 0.9662
262/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 432us/step - loss: 0.4278 - accuracy: 0.9406 - val_loss: 0.2624 - val_accuracy: 0.9686
263/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 437us/step - loss: 0.4622 - accuracy: 0.9398 - val_loss: 0.3406 - val_accuracy: 0.9556
264/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 426us/step - loss: 0.3704 - accuracy: 0.9546 - val_loss: 0.3197 - val_accuracy: 0.9664
265/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 421us/step - loss: 0.3736 - accuracy: 0.9492 - val_loss: 0.3204 - val_accuracy: 0.9618
266/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 427us/step - loss: 0.3926 - accuracy: 0.9480 - val_loss: 0.3420 - val_accuracy: 0.9558
267/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 426us/step - loss: 0.3492 - accuracy: 0.9552 - val_loss: 0.3409 - val_accuracy: 0.9594
268/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 424us/step - loss: 0.4315 - accuracy: 0.9456 - val_loss: 0.3871 - val_accuracy: 0.9564
269/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 438us/step - loss: 0.4241 - accuracy: 0.9416 - val_loss: 0.3569 - val_accuracy: 0.9562
270/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 433us/step - loss: 0.4078 - accuracy: 0.9438 - val_loss: 0.2925 - val_accuracy: 0.9646
271/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 419us/step - loss: 0.3924 - accuracy: 0.9468 - val_loss: 0.3646 - val_accuracy: 0.9536
272/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 433us/step - loss: 0.3643 - accuracy: 0.9520 - val_loss: 0.3494 - val_accuracy: 0.9608
273/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 418us/step - loss: 0.3252 - accuracy: 0.9564 - val_loss: 0.2771 - val_accuracy: 0.9666
274/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 422us/step - loss: 0.4002 - accuracy: 0.9480 - val_loss: 0.3212 - val_accuracy: 0.9644
275/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 422us/step - loss: 0.4312 - accuracy: 0.9450 - val_loss: 0.3275 - val_accuracy: 0.9604
276/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 431us/step - loss: 0.4204 - accuracy: 0.9418 - val_loss: 0.2861 - val_accuracy: 0.9620
277/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 434us/step - loss: 0.4327 - accuracy: 0.9462 - val_loss: 0.2808 - val_accuracy: 0.9636
278/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 429us/step - loss: 0.4367 - accuracy: 0.9428 - val_loss: 0.3191 - val_accuracy: 0.9568
279/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 434us/step - loss: 0.3983 - accuracy: 0.9492 - val_loss: 0.3552 - val_accuracy: 0.9592
280/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 432us/step - loss: 0.3887 - accuracy: 0.9472 - val_loss: 0.2665 - val_accuracy: 0.9658
281/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 435us/step - loss: 0.3997 - accuracy: 0.9486 - val_loss: 0.2733 - val_accuracy: 0.9666
282/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 435us/step - loss: 0.3756 - accuracy: 0.9484 - val_loss: 0.2950 - val_accuracy: 0.9626
283/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 430us/step - loss: 0.3351 - accuracy: 0.9502 - val_loss: 0.2685 - val_accuracy: 0.9652
284/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 431us/step - loss: 0.3385 - accuracy: 0.9566 - val_loss: 0.2542 - val_accuracy: 0.9664
285/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 446us/step - loss: 0.3217 - accuracy: 0.9572 - val_loss: 0.3173 - val_accuracy: 0.9628
286/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 421us/step - loss: 0.3968 - accuracy: 0.9494 - val_loss: 0.3565 - val_accuracy: 0.9536
287/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 434us/step - loss: 0.4626 - accuracy: 0.9386 - val_loss: 0.3295 - val_accuracy: 0.9590
288/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 436us/step - loss: 0.3845 - accuracy: 0.9450 - val_loss: 0.3039 - val_accuracy: 0.9630
289/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 427us/step - loss: 0.4553 - accuracy: 0.9394 - val_loss: 0.2742 - val_accuracy: 0.9644
290/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 423us/step - loss: 0.4083 - accuracy: 0.9486 - val_loss: 0.2771 - val_accuracy: 0.9692
291/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 447us/step - loss: 0.3854 - accuracy: 0.9468 - val_loss: 0.3103 - val_accuracy: 0.9640
292/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 429us/step - loss: 0.3969 - accuracy: 0.9484 - val_loss: 0.2863 - val_accuracy: 0.9642
293/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 435us/step - loss: 0.3743 - accuracy: 0.9508 - val_loss: 0.2994 - val_accuracy: 0.9654
294/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 431us/step - loss: 0.3768 - accuracy: 0.9484 - val_loss: 0.3063 - val_accuracy: 0.9620
295/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 422us/step - loss: 0.3746 - accuracy: 0.9496 - val_loss: 0.3148 - val_accuracy: 0.9662
296/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 421us/step - loss: 0.3738 - accuracy: 0.9516 - val_loss: 0.2518 - val_accuracy: 0.9680
297/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 427us/step - loss: 0.3634 - accuracy: 0.9562 - val_loss: 0.3113 - val_accuracy: 0.9608
298/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 433us/step - loss: 0.3427 - accuracy: 0.9530 - val_loss: 0.3012 - val_accuracy: 0.9556
299/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 437us/step - loss: 0.3641 - accuracy: 0.9508 - val_loss: 0.2865 - val_accuracy: 0.9550
300/300
Train on 5000 samples, validate on 5000 samples
Epoch 1/1
5000/5000 [==============================] - 2s 439us/step - loss: 0.3436 - accuracy: 0.9544 - val_loss: 0.2552 - val_accuracy: 0.9686
100000/100000 [==============================] - 10s 101us/step
Train set: Loss=0.2559 ; Accuracy=96.9%
100000/100000 [==============================] - 10s 101us/step
Test set: Loss=14.9620 ; Accuracy=1.8%

Load embeddings¶

In [ ]:

embeddings = Embeddings(read_embeddings('embeddings.csv'))

state_space_size = embeddings.size() * history_length
action_space_size = embeddings.size() * ra_length

Start Agent training¶

In [ ]:

environment = Environment(data, embeddings, alpha, gamma, fixed_length)

tf.reset_default_graph() # For multiple consecutive executions

sess = tf.Session()
# '1: Initialize actor network f_θ^π and critic network Q(s, a|θ^µ) with random weights'
actor = Actor(sess, state_space_size, action_space_size, batch_size, ra_length, history_length, embeddings.size(), tau, actor_lr)
critic = Critic(sess, state_space_size, action_space_size, history_length, embeddings.size(), tau, critic_lr)

In [ ]:

#collapse-output
train(sess, environment, actor, critic, embeddings, history_length, ra_length, buffer_size, batch_size, discount_factor, nb_episodes, filename_summary)

WARNING:tensorflow:From <ipython-input-19-1a5cd2de5f02>:69: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From <ipython-input-19-1a5cd2de5f02>:70: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API
WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
WARNING:tensorflow:From <ipython-input-19-1a5cd2de5f02>:40: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Episode 1/100 Reward=552 Time=4s No replay
Episode 2/100 Reward=551 Time=52s Loss=2082.7095
Episode 3/100 Reward=551 Time=68s Loss=123.3409
Episode 4/100 Reward=551 Time=67s Loss=67.4848
Episode 5/100 Reward=551 Time=67s Loss=44.0779
Episode 6/100 Reward=552 Time=67s Loss=40.3507
Episode 7/100 Reward=552 Time=67s Loss=29.4237
Episode 8/100 Reward=552 Time=68s Loss=26.5713
Episode 9/100 Reward=552 Time=68s Loss=27.1646
Episode 10/100 Reward=552 Time=68s Loss=25.9928
Episode 11/100 Reward=552 Time=67s Loss=22.3598
Episode 12/100 Reward=552 Time=68s Loss=18.7026
Episode 13/100 Reward=552 Time=67s Loss=17.8249
Episode 14/100 Reward=551 Time=67s Loss=19.8687
Episode 15/100 Reward=552 Time=68s Loss=20.9238
Episode 16/100 Reward=551 Time=67s Loss=19.9583
Episode 17/100 Reward=551 Time=68s Loss=20.2933
Episode 18/100 Reward=551 Time=67s Loss=19.9462
Episode 19/100 Reward=551 Time=67s Loss=24.3696
Episode 20/100 Reward=551 Time=67s Loss=25.6828
Episode 21/100 Reward=551 Time=67s Loss=28.5111
Episode 22/100 Reward=551 Time=67s Loss=29.4505
Episode 23/100 Reward=551 Time=67s Loss=27.2863
Episode 24/100 Reward=551 Time=68s Loss=28.4667
Episode 25/100 Reward=551 Time=67s Loss=26.7666
Episode 26/100 Reward=551 Time=67s Loss=26.2378
Episode 27/100 Reward=551 Time=67s Loss=25.0391
Episode 28/100 Reward=551 Time=67s Loss=25.0170
Episode 29/100 Reward=551 Time=68s Loss=22.9655
Episode 30/100 Reward=551 Time=68s Loss=23.8730
Episode 31/100 Reward=551 Time=67s Loss=20.4020
Episode 32/100 Reward=551 Time=67s Loss=22.4662
Episode 33/100 Reward=551 Time=67s Loss=23.8943
Episode 34/100 Reward=551 Time=67s Loss=21.1020
Episode 35/100 Reward=551 Time=67s Loss=22.8528
Episode 36/100 Reward=551 Time=67s Loss=21.2307
Episode 37/100 Reward=551 Time=67s Loss=19.7094
Episode 38/100 Reward=551 Time=67s Loss=21.4233
Episode 39/100 Reward=551 Time=67s Loss=23.6903
Episode 40/100 Reward=551 Time=67s Loss=24.4852
Episode 41/100 Reward=551 Time=67s Loss=25.7120
Episode 42/100 Reward=551 Time=67s Loss=21.7722
Episode 43/100 Reward=551 Time=67s Loss=20.9898
Episode 44/100 Reward=551 Time=67s Loss=20.6604
Episode 45/100 Reward=551 Time=68s Loss=20.8646
Episode 46/100 Reward=551 Time=67s Loss=19.4622
Episode 47/100 Reward=551 Time=67s Loss=20.4751
Episode 48/100 Reward=551 Time=67s Loss=18.9989
Episode 49/100 Reward=551 Time=67s Loss=17.7407
Episode 50/100 Reward=551 Time=67s Loss=17.3576
Episode 51/100 Reward=551 Time=68s Loss=17.2397
Episode 52/100 Reward=552 Time=67s Loss=16.5722
Episode 53/100 Reward=552 Time=67s Loss=15.5511
Episode 54/100 Reward=552 Time=67s Loss=15.7651
Episode 55/100 Reward=552 Time=67s Loss=14.0308
Episode 56/100 Reward=552 Time=67s Loss=14.4518
Episode 57/100 Reward=551 Time=67s Loss=15.9018
Episode 58/100 Reward=552 Time=67s Loss=14.2520
Episode 59/100 Reward=551 Time=67s Loss=14.2282
Episode 60/100 Reward=551 Time=67s Loss=14.1576
Episode 61/100 Reward=551 Time=67s Loss=13.1366
Episode 62/100 Reward=551 Time=67s Loss=13.7383
Episode 63/100 Reward=551 Time=67s Loss=12.3095
Episode 64/100 Reward=551 Time=68s Loss=11.7993
Episode 65/100 Reward=551 Time=67s Loss=12.1072
Episode 66/100 Reward=551 Time=67s Loss=12.8614
Episode 67/100 Reward=552 Time=67s Loss=11.4739
Episode 68/100 Reward=552 Time=67s Loss=12.6560
Episode 69/100 Reward=552 Time=67s Loss=12.8773
Episode 70/100 Reward=552 Time=67s Loss=11.7954
Episode 71/100 Reward=552 Time=67s Loss=11.2212
Episode 72/100 Reward=552 Time=67s Loss=12.3400
Episode 73/100 Reward=552 Time=67s Loss=12.5248
Episode 74/100 Reward=552 Time=67s Loss=11.2045
Episode 75/100 Reward=552 Time=67s Loss=11.1089
Episode 76/100 Reward=552 Time=67s Loss=11.6253
Episode 77/100 Reward=552 Time=67s Loss=10.9183
Episode 78/100 Reward=552 Time=67s Loss=9.2532
Episode 79/100 Reward=552 Time=67s Loss=10.4258
Episode 80/100 Reward=552 Time=67s Loss=10.0044
Episode 81/100 Reward=552 Time=67s Loss=10.4150
Episode 82/100 Reward=552 Time=67s Loss=10.9766
Episode 83/100 Reward=551 Time=67s Loss=8.8571
Episode 84/100 Reward=551 Time=67s Loss=10.5467
Episode 85/100 Reward=551 Time=67s Loss=8.8356
Episode 86/100 Reward=551 Time=67s Loss=11.0192
Episode 87/100 Reward=551 Time=67s Loss=9.3348
Episode 88/100 Reward=551 Time=67s Loss=11.0042
Episode 89/100 Reward=551 Time=67s Loss=8.5757
Episode 90/100 Reward=551 Time=67s Loss=8.8545
Episode 91/100 Reward=551 Time=67s Loss=10.0286
Episode 92/100 Reward=551 Time=67s Loss=10.2471
Episode 93/100 Reward=551 Time=67s Loss=9.3180
Episode 94/100 Reward=551 Time=68s Loss=8.2303
Episode 95/100 Reward=551 Time=68s Loss=9.1910
Episode 96/100 Reward=551 Time=67s Loss=8.2708
Episode 97/100 Reward=551 Time=67s Loss=8.1502
Episode 98/100 Reward=551 Time=67s Loss=8.1103
Episode 99/100 Reward=551 Time=67s Loss=8.2638
Episode 100/100 Reward=551 Time=67s Loss=8.2967

Testing¶

In [ ]:

dict_embeddings = {}
for i, item in enumerate(embeddings.get_embedding_vector()):
  str_item = str(item)
  assert(str_item not in dict_embeddings)
  dict_embeddings[str_item] = i

In [ ]:

def state_to_items(state, actor, ra_length, embeddings, dict_embeddings, target=False):
  return [dict_embeddings[str(action)]
          for action in actor.get_recommendation_list(ra_length, np.array(state).reshape(1, -1), embeddings, target).reshape(ra_length, embeddings.size())]

In [ ]:

def test_actor(actor, test_df, embeddings, dict_embeddings, ra_length, history_length, target=False, nb_rounds=1):
  ratings = []
  unknown = 0
  random_seen = []
  for _ in range(nb_rounds):
    for i in range(len(test_df)):
      history_sample = list(test_df[i].sample(history_length)['itemId'])
      recommendation = state_to_items(embeddings.embed(history_sample), actor, ra_length, embeddings, dict_embeddings, target)
      for item in recommendation:
        l = list(test_df[i].loc[test_df[i]['itemId'] == item]['rating'])
        assert(len(l) < 2)
        if len(l) == 0:
          unknown += 1
        else:
          ratings.append(l[0])
      for item in history_sample:
        random_seen.append(list(test_df[i].loc[test_df[i]['itemId'] == item]['rating'])[0])

  return ratings, unknown, random_seen

Test 1 - Trainset and target=False¶

In [ ]:

ratings, unknown, random_seen = test_actor(actor, dg.train, embeddings, dict_embeddings, ra_length, history_length, target=False, nb_rounds=10)
print('%0.1f%% unknown' % (100 * unknown / (len(ratings) + unknown)))

91.5% unknown

In [ ]:

plt.figure(figsize=(12,6))
plt.subplot(1, 2, 1)
plt.hist(ratings)
plt.title('Predictions ; Mean = %.4f' % (np.mean(ratings)))
plt.subplot(1, 2, 2)
plt.hist(random_seen)
plt.title('Random ; Mean = %.4f' % (np.mean(random_seen)))
plt.show()

Test 2 - Trainset and target=True¶

In [ ]:

ratings, unknown, random_seen = test_actor(actor, dg.train, embeddings, dict_embeddings, ra_length, history_length, target=True, nb_rounds=10)
print('%0.1f%% unknown' % (100 * unknown / (len(ratings) + unknown)))

91.5% unknown

In [ ]:

plt.figure(figsize=(12,6))
plt.subplot(1, 2, 1)
plt.hist(ratings)
plt.title('Predictions ; Mean = %.4f' % (np.mean(ratings)))
plt.subplot(1, 2, 2)
plt.hist(random_seen)
plt.title('Random ; Mean = %.4f' % (np.mean(random_seen)))
plt.show()

Movie List Recommender using Actor-critic based RL method¶

Introduction¶

Setup¶

Download data¶

DataGenerator class¶

EmbeddingsGenerator class¶

Embeddings helper class¶

read_file helper function¶

read_embeddings helper function¶

Environment class¶

Actor class¶

Critic class¶

ReplayMemory class¶

experience_replay function¶

OrnsteinUhlenbeckNoise class¶

Hyperparameters¶

Data generation¶

Embedding generation¶

Load embeddings¶

Start Agent training¶

Testing¶

Test 1 - Trainset and target=False¶

Test 2 - Trainset and target=True¶

DataGenerator `class`¶

EmbeddingsGenerator `class`¶

Embeddings `helper class`¶

read_file `helper function`¶

read_embeddings `helper function`¶

Environment `class`¶

Actor `class`¶

Critic `class`¶

ReplayMemory `class`¶

experience_replay `function`¶

OrnsteinUhlenbeckNoise `class`¶