Training a list-wise movie recommender using actor-critic policy and evaluating offline using experience replay method
We will model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users’ feedbacks.
Efforts have been made on utilizing reinforcement learning for recommender systems, such as POMDP and Q-learning. However, these methods may become inflexible with the increasing number of items for recommendations. This prevents them to be adopted by practical recommender systems.
Generally, there exist two Deep Q-learning architectures, shown in the above figure. Traditional deep Q-learning adopts the first architecture as shown in (a), which inputs only the state space and outputs Q-values of all actions. This architecture is suitable for the scenario with high state space and small action space, like playing Atari. However, one drawback is that it cannot handle large and dynamic action space scenario, like recommender systems. The second Q-learning architecture, shown (b), treats the state and the action as the input of Neural Networks and outputs the Q-value corresponding to this action. This architecture does not need to store each Q-value in memory and thus can deal with large action space or even continuous action space. A challenging problem of leveraging the second architecture is temporal complexity, i.e., this architecture computes Q-value for all potential actions, separately.
Recommending a list of items is more desirable (especially for sellers) than recommending a single item. To achieve this, one option is to score the items seperately and select the top ones. For example, DQN can calculate Q-values of all recalled items separately, and recommend a list of items with highest Q-values. But this strategy do not consider the relation between items. e.g. If the next best item is egg, all kind of different eggs will get high scores, like white eggs, brown eggs, farm eggs etc. But these are similar items, not complimentary. And the whole purpose of list-wise recommendation is to recommend complimentary items. That's where DQN fails.
One option to resolve this issue is by adding a simple rule - select only 1 top-scoring item from each category. This is a good rule and will improve the list quality but we have to compromise with some missed opportunities here because let's say we recommend a 12-brown-eggs 🥚 tray and a small brown bread 🍞. Now it is possible that if a 24-brown-eggs tray is scored higher than 12-brown-eggs tray but in bread category, small-brown-bread is still the highest score item. As per business sense, we should recommend a large brown bread with 24-brown-eggs tray. This is what we missed - either customer will manually select the large bread (lost opportunity for higher customer satisfaction) or just buy the small one (lost opportunity for higher revenue).
In this tutorial, our goal is to fill this gap. We will evaluate the RL agent offline using experience replay method. Also, it is possible that productionizing this model cost more than the benefit, especially for small businesses, because if we are getting 1% revenue gain, on $1M, it might not be sufficient, and on $1B, the same model will become one of the highest priority model to productionize 💵.
To tackle this problem, in this paper, our recommending policy builds upon the Actor-Critic framework. We model this problem as a Markov Decision Process (MDP), which includes a sequence of states, actions and rewards. More formally, MDP consists of a tuple of five elements $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$ as follows:
With the notations and definitions above, the problem of listwise item recommendation can be formally de!ned as follows: Given the historical MDP, i.e., $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$, the goal is to find a recommendation policy $\pi : \mathcal{S} \to A$, which can maximize the cumulative reward for the recommender system.
According to collaborative filtering techniques, users with similar interests will make similar decisions on the same item. With this intuition, we match the current state and action to existing historical state-action pairs, and stochastically generate a simulated reward. To be more specific, we first build a memory $M = {m_1,m_2, ···}$ to store users’ historical browsing history, where $m_i$ is a user-agent interaction triplet $((s_i, a_i) \to r_i)$. The procedure to build the online simulator memory is illustrated in the following figure. Given a historical recommendation session ${a_1, ··· , a_L}$, we can observe the initial state $s_0 = {s_0^1, ··· ,s_0^N}$ from the previous sessions (line 2). Each time we observe $K$ items in temporal order (line 3), which means that each iteration we will move forward a window of K. We can observe the current state (line 4), current $K$ items (line 5), and the user’s feedbacks for these items (line 6). Then we store triplet $((s_i, a_i) \to r_i)$ in memory (line 7). Finally we update the state (lines 8-13), and move to the next $K$ items. Since we keep a fixed length state $s = {s_1, ··· ,s_N }$, each time a user clicked/ordered some items in the recommended list, we add these items to the end of state and remove the same number of items in the top of the state. For example, the RA recommends a list of !ve items ${a_1, ··· , a_5}$ to a user, if the user clicks $a_1$ and orders $a_5$, then update $s = {s_3, ··· ,s_N , a_1, a_5}$.
Then we calculated the similarity of the current state-action pair, say $p_t(s_t,a_t)$, to each existing historical state-action pair in the memory. In this work, we adopt cosine similarity as:
where the first term measures the state similarity and the second term evaluates the action similarity. Parameter $\alpha$ controls the balance of two similarities.
The proposed framework is as follows:
The framework works like this:
Input: Current state $s_t$ , Item space $\mathcal{I}$, the length of recommendation list $K$. Output: Recommendation list $a_t$.
%tensorflow_version 1.x
TensorFlow 1.x selected.
import itertools
import pandas as pd
import numpy as np
import random
import csv
import time
import matplotlib.pyplot as plt
import tensorflow as tf
import keras.backend as K
from keras import Sequential
from keras.layers import Dense, Dropout
Using TensorFlow backend.
Downloading Movielens dataset from official source
!wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
!unzip -q ml-100k.zip
class
¶#collapse-hide
class DataGenerator():
def __init__(self, datapath, itempath):
'''
Load data from the DB MovieLens
List the users and the items
List all the users histories
'''
self.data = self.load_data(datapath, itempath)
self.users = self.data['userId'].unique() #list of all users
self.items = self.data['itemId'].unique() #list of all items
self.histo = self.generate_history()
self.train = []
self.test = []
def load_data(self, datapath, itempath):
'''
Load the data and merge the name of each movie.
A row corresponds to a rate given by a user to a movie.
Parameters
----------
datapath : string
path to the data 100k MovieLens
contains usersId;itemId;rating
itempath : string
path to the data 100k MovieLens
contains itemId;itemName
Returns
-------
result : DataFrame
Contains all the ratings
'''
data = pd.read_csv(datapath, sep='\t',
names=['userId', 'itemId', 'rating', 'timestamp'])
movie_titles = pd.read_csv(itempath, sep='|', names=['itemId', 'itemName'],
usecols=range(2), encoding='latin-1')
return data.merge(movie_titles,on='itemId', how='left')
def generate_history(self):
'''
Group all rates given by users and store them from older to most recent.
Returns
-------
result : List(DataFrame)
List of the historic for each user
'''
historic_users = []
for i, u in enumerate(self.users):
temp = self.data[self.data['userId'] == u]
temp = temp.sort_values('timestamp').reset_index()
temp.drop('index', axis=1, inplace=True)
historic_users.append(temp)
return historic_users
def sample_history(self, user_histo, action_ratio=0.8, max_samp_by_user=5, max_state=100, max_action=50, nb_states=[], nb_actions=[]):
'''
For a given history, make one or multiple sampling.
If no optional argument given for nb_states and nb_actions, then the sampling
is random and each sample can have differents size for action and state.
To normalize sampling we need to give list of the numbers of states and actions
to be sampled.
Parameters
----------
user_histo : DataFrame
historic of user
delimiter : string, optional
delimiter for the csv
action_ratio : float, optional
ratio form which movies in history will be selected
max_samp_by_user: int, optional
Nulber max of sample to make by user
max_state : int, optional
Number max of movies to take for the 'state' column
max_action : int, optional
Number max of movies to take for the 'action' action
nb_states : array(int), optional
Numbers of movies to be taken for each sample made on user's historic
nb_actions : array(int), optional
Numbers of rating to be taken for each sample made on user's historic
Returns
-------
states : List(String)
All the states sampled, format of a sample: itemId&rating
actions : List(String)
All the actions sampled, format of a sample: itemId&rating
Notes
-----
States must be before(timestamp<) the actions.
If given, size of nb_states is the numbller of sample by user
sizes of nb_states and nb_actions must be equals
'''
n = len(user_histo)
sep = int(action_ratio * n)
nb_sample = random.randint(1, max_samp_by_user)
if not nb_states:
nb_states = [min(random.randint(1, sep), max_state) for i in range(nb_sample)]
if not nb_actions:
nb_actions = [min(random.randint(1, n - sep), max_action) for i in range(nb_sample)]
assert len(nb_states) == len(nb_actions), 'Given array must have the same size'
states = []
actions = []
# SELECT SAMPLES IN HISTORY
for i in range(len(nb_states)):
sample_states = user_histo.iloc[0:sep].sample(nb_states[i])
sample_actions = user_histo.iloc[-(n - sep):].sample(nb_actions[i])
sample_state = []
sample_action = []
for j in range(nb_states[i]):
row = sample_states.iloc[j]
# FORMAT STATE
state = str(row.loc['itemId']) + '&' + str(row.loc['rating'])
sample_state.append(state)
for j in range(nb_actions[i]):
row = sample_actions.iloc[j]
# FORMAT ACTION
action = str(row.loc['itemId']) + '&' + str(row.loc['rating'])
sample_action.append(action)
states.append(sample_state)
actions.append(sample_action)
return states, actions
def gen_train_test(self, test_ratio, seed=None):
'''
Shuffle the historic of users and separate it in a train and a test set.
Store the ids for each set.
An user can't be in both set.
Parameters
----------
test_ratio : float
Ratio to control the sizes of the sets
seed : float
Seed on the shuffle
'''
n = len(self.histo)
if seed is not None:
random.Random(seed).shuffle(self.histo)
else:
random.shuffle(self.histo)
self.train = self.histo[:int((test_ratio * n))]
self.test = self.histo[int((test_ratio * n)):]
self.user_train = [h.iloc[0,0] for h in self.train]
self.user_test = [h.iloc[0,0] for h in self.test]
def write_csv(self, filename, histo_to_write, delimiter=';', action_ratio=0.8, max_samp_by_user=5, max_state=100, max_action=50, nb_states=[], nb_actions=[]):
'''
From a given historic, create a csv file with the format:
columns : state;action_reward;n_state
rows : itemid&rating1 | itemid&rating2 | ... ; itemid&rating3 | ... | itemid&rating4; itemid&rating1 | itemid&rating2 | itemid&rating3 | ... | item&rating4
at filename location.
Parameters
----------
filename : string
path to the file to be produced
histo_to_write : List(DataFrame)
List of the historic for each user
delimiter : string, optional
delimiter for the csv
action_ratio : float, optional
ratio form which movies in history will be selected
max_samp_by_user: int, optional
Nulber max of sample to make by user
max_state : int, optional
Number max of movies to take for the 'state' column
max_action : int, optional
Number max of movies to take for the 'action' action
nb_states : array(int), optional
Numbers of movies to be taken for each sample made on user's historic
nb_actions : array(int), optional
Numbers of rating to be taken for each sample made on user's historic
Notes
-----
if given, size of nb_states is the numbller of sample by user
sizes of nb_states and nb_actions must be equals
'''
with open(filename, mode='w') as file:
f_writer = csv.writer(file, delimiter=delimiter)
f_writer.writerow(['state', 'action_reward', 'n_state'])
for user_histo in histo_to_write:
states, actions = self.sample_history(user_histo, action_ratio, max_samp_by_user, max_state, max_action, nb_states, nb_actions)
for i in range(len(states)):
# FORMAT STATE
state_str = '|'.join(states[i])
# FORMAT ACTION
action_str = '|'.join(actions[i])
# FORMAT N_STATE
n_state_str = state_str + '|' + action_str
f_writer.writerow([state_str, action_str, n_state_str])
class
¶#collapse-hide
class EmbeddingsGenerator:
def __init__(self, train_users, data):
self.train_users = train_users
#preprocess
self.data = data.sort_values(by=['timestamp'])
#make them start at 0
self.data['userId'] = self.data['userId'] - 1
self.data['itemId'] = self.data['itemId'] - 1
self.user_count = self.data['userId'].max() + 1
self.movie_count = self.data['itemId'].max() + 1
self.user_movies = {} #list of rated movies by each user
for userId in range(self.user_count):
self.user_movies[userId] = self.data[self.data.userId == userId]['itemId'].tolist()
self.m = self.model()
def model(self, hidden_layer_size=100):
m = Sequential()
m.add(Dense(hidden_layer_size, input_shape=(1, self.movie_count)))
m.add(Dropout(0.2))
m.add(Dense(self.movie_count, activation='softmax'))
m.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
return m
def generate_input(self, user_id):
'''
Returns a context and a target for the user_id
context: user's history with one random movie removed
target: id of random removed movie
'''
user_movies_count = len(self.user_movies[user_id])
#picking random movie
random_index = np.random.randint(0, user_movies_count-1) # -1 avoids taking the last movie
#setting target
target = np.zeros((1, self.movie_count))
target[0][self.user_movies[user_id][random_index]] = 1
#setting context
context = np.zeros((1, self.movie_count))
context[0][self.user_movies[user_id][:random_index] + self.user_movies[user_id][random_index+1:]] = 1
return context, target
def train(self, nb_epochs = 300, batch_size = 10000):
'''
Trains the model from train_users's history
'''
for i in range(nb_epochs):
print('%d/%d' % (i+1, nb_epochs))
batch = [self.generate_input(user_id=np.random.choice(self.train_users) - 1) for _ in range(batch_size)]
X_train = np.array([b[0] for b in batch])
y_train = np.array([b[1] for b in batch])
self.m.fit(X_train, y_train, epochs=1, validation_split=0.5)
def test(self, test_users, batch_size = 100000):
'''
Returns [loss, accuracy] on the test set
'''
batch_test = [self.generate_input(user_id=np.random.choice(test_users) - 1) for _ in range(batch_size)]
X_test = np.array([b[0] for b in batch_test])
y_test = np.array([b[1] for b in batch_test])
return self.m.evaluate(X_test, y_test)
def save_embeddings(self, file_name):
'''
Generates a csv file containg the vector embedding for each movie.
'''
inp = self.m.input # input placeholder
outputs = [layer.output for layer in self.m.layers] # all layer outputs
functor = K.function([inp, K.learning_phase()], outputs ) # evaluation function
#append embeddings to vectors
vectors = []
for movie_id in range(self.movie_count):
movie = np.zeros((1, 1, self.movie_count))
movie[0][0][movie_id] = 1
layer_outs = functor([movie])
vector = [str(v) for v in layer_outs[0][0][0]]
vector = '|'.join(vector)
vectors.append([movie_id, vector])
#saves as a csv file
embeddings = pd.DataFrame(vectors, columns=['item_id', 'vectors']).astype({'item_id': 'int32'})
embeddings.to_csv(file_name, sep=';', index=False)
helper class
¶#collapse-hide
class Embeddings:
def __init__(self, item_embeddings):
self.item_embeddings = item_embeddings
def size(self):
return self.item_embeddings.shape[1]
def get_embedding_vector(self):
return self.item_embeddings
def get_embedding(self, item_index):
return self.item_embeddings[item_index]
def embed(self, item_list):
return np.array([self.get_embedding(item) for item in item_list])
helper function
¶This function will read the stored data csv files into pandas dataframe
#collapse-hide
def read_file(data_path):
''' Load data from train.csv or test.csv. '''
data = pd.read_csv(data_path, sep=';')
for col in ['state', 'n_state', 'action_reward']:
data[col] = [np.array([[np.int(k) for k in ee.split('&')] for ee in e.split('|')]) for e in data[col]]
for col in ['state', 'n_state']:
data[col] = [np.array([e[0] for e in l]) for l in data[col]]
data['action'] = [[e[0] for e in l] for l in data['action_reward']]
data['reward'] = [tuple(e[1] for e in l) for l in data['action_reward']]
data.drop(columns=['action_reward'], inplace=True)
return data
helper function
¶This function will read the stored embedding csv file into pandas dataframe and return as multi-dimensional array
def read_embeddings(embeddings_path):
''' Load embeddings (a vector for each item). '''
embeddings = pd.read_csv(embeddings_path, sep=';')
return np.array([[np.float64(k) for k in e.split('|')]
for e in embeddings['vectors']])
class
¶This is the simulator. It will help orchestrating the whole process of learning list recommendations by our actor-critic based MDP agent.
#collapse-hide
class Environment():
def __init__(self, data, embeddings, alpha, gamma, fixed_length):
self.embeddings = embeddings
self.embedded_data = pd.DataFrame()
self.embedded_data['state'] = [np.array([embeddings.get_embedding(item_id)
for item_id in row['state']]) for _, row in data.iterrows()]
self.embedded_data['action'] = [np.array([embeddings.get_embedding(item_id)
for item_id in row['action']]) for _, row in data.iterrows()]
self.embedded_data['reward'] = data['reward']
self.alpha = alpha # α (alpha) in Equation (1)
self.gamma = gamma # Γ (Gamma) in Equation (4)
self.fixed_length = fixed_length
self.current_state = self.reset()
self.groups = self.get_groups()
def reset(self):
self.init_state = self.embedded_data['state'].sample(1).values[0]
return self.init_state
def step(self, actions):
'''
Compute reward and update state.
Args:
actions: embedded chosen items.
Returns:
cumulated_reward: overall reward.
current_state: updated state.
'''
# '18: Compute overall reward r_t according to Equation (4)'
simulated_rewards, cumulated_reward = self.simulate_rewards(self.current_state.reshape((1, -1)), actions.reshape((1, -1)))
# '11: Set s_t+1 = s_t' <=> self.current_state = self.current_state
for k in range(len(simulated_rewards)): # '12: for k = 1, K do'
if simulated_rewards[k] > 0: # '13: if r_t^k > 0 then'
# '14: Add a_t^k to the end of s_t+1'
self.current_state = np.append(self.current_state, [actions[k]], axis=0)
if self.fixed_length: # '15: Remove the first item of s_t+1'
self.current_state = np.delete(self.current_state, 0, axis=0)
return cumulated_reward, self.current_state
def get_groups(self):
''' Calculate average state/action value for each group. Equation (3). '''
groups = []
for rewards, group in self.embedded_data.groupby(['reward']):
size = group.shape[0]
states = np.array(list(group['state'].values))
actions = np.array(list(group['action'].values))
groups.append({
'size': size, # N_x in article
'rewards': rewards, # U_x in article (combination of rewards)
'average state': (np.sum(states / np.linalg.norm(states, 2, axis=1)[:, np.newaxis], axis=0) / size).reshape((1, -1)), # s_x^-
'average action': (np.sum(actions / np.linalg.norm(actions, 2, axis=1)[:, np.newaxis], axis=0) / size).reshape((1, -1)) # a_x^-
})
return groups
def simulate_rewards(self, current_state, chosen_actions, reward_type='grouped cosine'):
'''
Calculate simulated rewards.
Args:
current_state: history, list of embedded items.
chosen_actions: embedded chosen items.
reward_type: from ['normal', 'grouped average', 'grouped cosine'].
Returns:
returned_rewards: most probable rewards.
cumulated_reward: probability weighted rewards.
'''
# Equation (1)
def cosine_state_action(s_t, a_t, s_i, a_i):
cosine_state = np.dot(s_t, s_i.T) / (np.linalg.norm(s_t, 2) * np.linalg.norm(s_i, 2))
cosine_action = np.dot(a_t, a_i.T) / (np.linalg.norm(a_t, 2) * np.linalg.norm(a_i, 2))
return (self.alpha * cosine_state + (1 - self.alpha) * cosine_action).reshape((1,))
if reward_type == 'normal':
# Calculate simulated reward in normal way: Equation (2)
probabilities = [cosine_state_action(current_state, chosen_actions, row['state'], row['action'])
for _, row in self.embedded_data.iterrows()]
elif reward_type == 'grouped average':
# Calculate simulated reward by grouped average: Equation (3)
probabilities = np.array([g['size'] for g in self.groups]) *\
[(self.alpha * (np.dot(current_state, g['average state'].T) / np.linalg.norm(current_state, 2))\
+ (1 - self.alpha) * (np.dot(chosen_actions, g['average action'].T) / np.linalg.norm(chosen_actions, 2)))
for g in self.groups]
elif reward_type == 'grouped cosine':
# Calculate simulated reward by grouped cosine: Equations (1) and (3)
probabilities = [cosine_state_action(current_state, chosen_actions, g['average state'], g['average action'])
for g in self.groups]
# Normalize (sum to 1)
probabilities = np.array(probabilities) / sum(probabilities)
# Get most probable rewards
if reward_type == 'normal':
returned_rewards = self.embedded_data.iloc[np.argmax(probabilities)]['reward']
elif reward_type in ['grouped average', 'grouped cosine']:
returned_rewards = self.groups[np.argmax(probabilities)]['rewards']
# Equation (4)
def overall_reward(rewards, gamma):
return np.sum([gamma**k * reward for k, reward in enumerate(rewards)])
if reward_type in ['normal', 'grouped average']:
# Get cumulated reward: Equation (4)
cumulated_reward = overall_reward(returned_rewards, self.gamma)
elif reward_type == 'grouped cosine':
# Get probability weighted cumulated reward
cumulated_reward = np.sum([p * overall_reward(g['rewards'], self.gamma)
for p, g in zip(probabilities, self.groups)])
return returned_rewards, cumulated_reward
class
¶This is the policy approximator actor
#collapse-hide
class Actor():
''' Policy function approximator. '''
def __init__(self, sess, state_space_size, action_space_size, batch_size, ra_length, history_length, embedding_size, tau, learning_rate, scope='actor'):
self.sess = sess
self.state_space_size = state_space_size
self.action_space_size = action_space_size
self.batch_size = batch_size
self.ra_length = ra_length
self.history_length = history_length
self.embedding_size = embedding_size
self.tau = tau
self.learning_rate = learning_rate
self.scope = scope
with tf.variable_scope(self.scope):
# Build Actor network
self.action_weights, self.state, self.sequence_length = self._build_net('estimator_actor')
self.network_params = tf.trainable_variables()
# Build target Actor network
self.target_action_weights, self.target_state, self.target_sequence_length = self._build_net('target_actor')
self.target_network_params = tf.trainable_variables()[len(self.network_params):] # TODO: why sublist [len(x):]? Maybe because its equal to network_params + target_network_params
# Initialize target network weights with network weights (θ^π′ ← θ^π)
self.init_target_network_params = [self.target_network_params[i].assign(self.network_params[i])
for i in range(len(self.target_network_params))]
# Update target network weights (θ^π′ ← τθ^π + (1 − τ)θ^π′)
self.update_target_network_params = [self.target_network_params[i].assign(
tf.multiply(self.tau, self.network_params[i]) +
tf.multiply(1 - self.tau, self.target_network_params[i]))
for i in range(len(self.target_network_params))]
# Gradient computation from Critic's action_gradients
self.action_gradients = tf.placeholder(tf.float32, [None, self.action_space_size])
gradients = tf.gradients(tf.reshape(self.action_weights, [self.batch_size, self.action_space_size], name='42222222222'),
self.network_params,
self.action_gradients)
params_gradients = list(map(lambda x: tf.div(x, self.batch_size * self.action_space_size), gradients))
# Compute ∇_a.Q(s, a|θ^µ).∇_θ^π.f_θ^π(s)
self.optimizer = tf.train.AdamOptimizer(self.learning_rate).apply_gradients(
zip(params_gradients, self.network_params))
def _build_net(self, scope):
''' Build the (target) Actor network. '''
def gather_last_output(data, seq_lens):
def cli_value(x, v):
y = tf.constant(v, shape=x.get_shape(), dtype=tf.int64)
x = tf.cast(x, tf.int64)
return tf.where(tf.greater(x, y), x, y)
batch_range = tf.range(tf.cast(tf.shape(data)[0], dtype=tf.int64), dtype=tf.int64)
tmp_end = tf.map_fn(lambda x: cli_value(x, 0), seq_lens - 1, dtype=tf.int64)
indices = tf.stack([batch_range, tmp_end], axis=1)
return tf.gather_nd(data, indices)
with tf.variable_scope(scope):
# Inputs: current state, sequence_length
# Outputs: action weights to compute the score Equation (6)
state = tf.placeholder(tf.float32, [None, self.state_space_size], 'state')
state_ = tf.reshape(state, [-1, self.history_length, self.embedding_size])
sequence_length = tf.placeholder(tf.int32, [None], 'sequence_length')
cell = tf.nn.rnn_cell.GRUCell(self.embedding_size,
activation=tf.nn.relu,
kernel_initializer=tf.initializers.random_normal(),
bias_initializer=tf.zeros_initializer())
outputs, _ = tf.nn.dynamic_rnn(cell, state_, dtype=tf.float32, sequence_length=sequence_length)
last_output = gather_last_output(outputs, sequence_length) # TODO: replace by h
x = tf.keras.layers.Dense(self.ra_length * self.embedding_size)(last_output)
action_weights = tf.reshape(x, [-1, self.ra_length, self.embedding_size])
return action_weights, state, sequence_length
def train(self, state, sequence_length, action_gradients):
''' Compute ∇_a.Q(s, a|θ^µ).∇_θ^π.f_θ^π(s). '''
self.sess.run(self.optimizer,
feed_dict={
self.state: state,
self.sequence_length: sequence_length,
self.action_gradients: action_gradients})
def predict(self, state, sequence_length):
return self.sess.run(self.action_weights,
feed_dict={
self.state: state,
self.sequence_length: sequence_length})
def predict_target(self, state, sequence_length):
return self.sess.run(self.target_action_weights,
feed_dict={
self.target_state: state,
self.target_sequence_length: sequence_length})
def init_target_network(self):
self.sess.run(self.init_target_network_params)
def update_target_network(self):
self.sess.run(self.update_target_network_params)
def get_recommendation_list(self, ra_length, noisy_state, embeddings, target=False):
'''
Algorithm 2
Args:
ra_length: length of the recommendation list.
noisy_state: current/remembered environment state with noise.
embeddings: Embeddings object.
target: boolean to use Actor's network or target network.
Returns:
Recommendation List: list of embedded items as future actions.
'''
def get_score(weights, embedding, batch_size):
'''
Equation (6)
Args:
weights: w_t^k shape=(embedding_size,).
embedding: e_i shape=(embedding_size,).
Returns:
score of the item i: score_i=w_t^k.e_i^T shape=(1,).
'''
ret = np.dot(weights, embedding.T)
return ret
batch_size = noisy_state.shape[0]
# '1: Generate w_t = {w_t^1, ..., w_t^K} according to Equation (5)'
method = self.predict_target if target else self.predict
weights = method(noisy_state, [ra_length] * batch_size)
# '3: Score items in I according to Equation (6)'
scores = np.array([[[get_score(weights[i][k], embedding, batch_size)
for embedding in embeddings.get_embedding_vector()]
for k in range(ra_length)]
for i in range(batch_size)])
# '8: return a_t'
return np.array([[embeddings.get_embedding(np.argmax(scores[i][k]))
for k in range(ra_length)]
for i in range(batch_size)])
class
¶This is the value approximator critic
#collapse-hide
class Critic():
''' Value function approximator. '''
def __init__(self, sess, state_space_size, action_space_size, history_length, embedding_size, tau, learning_rate, scope='critic'):
self.sess = sess
self.state_space_size = state_space_size
self.action_space_size = action_space_size
self.history_length = history_length
self.embedding_size = embedding_size
self.tau = tau
self.learning_rate = learning_rate
self.scope = scope
with tf.variable_scope(self.scope):
# Build Critic network
self.critic_Q_value, self.state, self.action, self.sequence_length = self._build_net('estimator_critic')
self.network_params = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='estimator_critic')
# Build target Critic network
self.target_Q_value, self.target_state, self.target_action, self.target_sequence_length = self._build_net('target_critic')
self.target_network_params = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='target_critic')
# Initialize target network weights with network weights (θ^µ′ ← θ^µ)
self.init_target_network_params = [self.target_network_params[i].assign(self.network_params[i])
for i in range(len(self.target_network_params))]
# Update target network weights (θ^µ′ ← τθ^µ + (1 − τ)θ^µ′)
self.update_target_network_params = [self.target_network_params[i].assign(
tf.multiply(self.tau, self.network_params[i]) +
tf.multiply(1 - self.tau, self.target_network_params[i]))
for i in range(len(self.target_network_params))]
# Minimize MSE between Critic's and target Critic's outputed Q-values
self.expected_reward = tf.placeholder(tf.float32, [None, 1])
self.loss = tf.reduce_mean(tf.squared_difference(self.expected_reward, self.critic_Q_value))
self.optimizer = tf.train.AdamOptimizer(self.learning_rate).minimize(self.loss)
# Compute ∇_a.Q(s, a|θ^µ)
self.action_gradients = tf.gradients(self.critic_Q_value, self.action)
def _build_net(self, scope):
''' Build the (target) Critic network. '''
def gather_last_output(data, seq_lens):
def cli_value(x, v):
y = tf.constant(v, shape=x.get_shape(), dtype=tf.int64)
return tf.where(tf.greater(x, y), x, y)
this_range = tf.range(tf.cast(tf.shape(seq_lens)[0], dtype=tf.int64), dtype=tf.int64)
tmp_end = tf.map_fn(lambda x: cli_value(x, 0), seq_lens - 1, dtype=tf.int64)
indices = tf.stack([this_range, tmp_end], axis=1)
return tf.gather_nd(data, indices)
with tf.variable_scope(scope):
# Inputs: current state, current action
# Outputs: predicted Q-value
state = tf.placeholder(tf.float32, [None, self.state_space_size], 'state')
state_ = tf.reshape(state, [-1, self.history_length, self.embedding_size])
action = tf.placeholder(tf.float32, [None, self.action_space_size], 'action')
sequence_length = tf.placeholder(tf.int64, [None], name='critic_sequence_length')
cell = tf.nn.rnn_cell.GRUCell(self.history_length,
activation=tf.nn.relu,
kernel_initializer=tf.initializers.random_normal(),
bias_initializer=tf.zeros_initializer())
predicted_state, _ = tf.nn.dynamic_rnn(cell, state_, dtype=tf.float32, sequence_length=sequence_length)
predicted_state = gather_last_output(predicted_state, sequence_length)
inputs = tf.concat([predicted_state, action], axis=-1)
layer1 = tf.layers.Dense(32, activation=tf.nn.relu)(inputs)
layer2 = tf.layers.Dense(16, activation=tf.nn.relu)(layer1)
critic_Q_value = tf.layers.Dense(1)(layer2)
return critic_Q_value, state, action, sequence_length
def train(self, state, action, sequence_length, expected_reward):
''' Minimize MSE between expected reward and target Critic's Q-value. '''
return self.sess.run([self.critic_Q_value, self.loss, self.optimizer],
feed_dict={
self.state: state,
self.action: action,
self.sequence_length: sequence_length,
self.expected_reward: expected_reward})
def predict(self, state, action, sequence_length):
''' Returns Critic's predicted Q-value. '''
return self.sess.run(self.critic_Q_value,
feed_dict={
self.state: state,
self.action: action,
self.sequence_length: sequence_length})
def predict_target(self, state, action, sequence_length):
''' Returns target Critic's predicted Q-value. '''
return self.sess.run(self.target_Q_value,
feed_dict={
self.target_state: state,
self.target_action: action,
self.target_sequence_length: sequence_length})
def get_action_gradients(self, state, action, sequence_length):
''' Returns ∇_a.Q(s, a|θ^µ). '''
return np.array(self.sess.run(self.action_gradients,
feed_dict={
self.state: state,
self.action: action,
self.sequence_length: sequence_length})[0])
def init_target_network(self):
self.sess.run(self.init_target_network_params)
def update_target_network(self):
self.sess.run(self.update_target_network_params)
class
¶#collapse-hide
class ReplayMemory():
''' Replay memory D. '''
def __init__(self, buffer_size):
self.buffer_size = buffer_size
# self.buffer = [[row['state'], row['action'], row['reward'], row['n_state']] for _, row in data.iterrows()][-self.buffer_size:] TODO: empty or not?
self.buffer = []
def add(self, state, action, reward, n_state):
self.buffer.append([state, action, reward, n_state])
if len(self.buffer) > self.buffer_size:
self.buffer.pop(0)
def size(self):
return len(self.buffer)
def sample_batch(self, batch_size):
return random.sample(self.buffer, batch_size)
function
¶#collapse-hide
def experience_replay(replay_memory, batch_size, actor, critic, embeddings, ra_length, state_space_size, action_space_size, discount_factor):
'''
Experience replay.
Args:
replay_memory: replay memory D in article.
batch_size: sample size.
actor: Actor network.
critic: Critic network.
embeddings: Embeddings object.
state_space_size: dimension of states.
action_space_size: dimensions of actions.
Returns:
Best Q-value, loss of Critic network for printing/recording purpose.
'''
# '22: Sample minibatch of N transitions (s, a, r, s′) from D'
samples = replay_memory.sample_batch(batch_size)
states = np.array([s[0] for s in samples])
actions = np.array([s[1] for s in samples])
rewards = np.array([s[2] for s in samples])
n_states = np.array([s[3] for s in samples]).reshape(-1, state_space_size)
# '23: Generate a′ by target Actor network according to Algorithm 2'
n_actions = actor.get_recommendation_list(ra_length, states, embeddings, target=True).reshape(-1, action_space_size)
# Calculate predicted Q′(s′, a′|θ^µ′) value
target_Q_value = critic.predict_target(n_states, n_actions, [ra_length] * batch_size)
# '24: Set y = r + γQ′(s′, a′|θ^µ′)'
expected_rewards = rewards + discount_factor * target_Q_value
# '25: Update Critic by minimizing (y − Q(s, a|θ^µ))²'
critic_Q_value, critic_loss, _ = critic.train(states, actions, [ra_length] * batch_size, expected_rewards)
# '26: Update the Actor using the sampled policy gradient'
action_gradients = critic.get_action_gradients(states, n_actions, [ra_length] * batch_size)
actor.train(states, [ra_length] * batch_size, action_gradients)
# '27: Update the Critic target networks'
critic.update_target_network()
# '28: Update the Actor target network'
actor.update_target_network()
return np.amax(critic_Q_value), critic_loss
class
¶#collapse-hide
class OrnsteinUhlenbeckNoise:
''' Noise for Actor predictions. '''
def __init__(self, action_space_size, mu=0, theta=0.5, sigma=0.2):
self.action_space_size = action_space_size
self.mu = mu
self.theta = theta
self.sigma = sigma
self.state = np.ones(self.action_space_size) * self.mu
def get(self):
self.state += self.theta * (self.mu - self.state) + self.sigma * np.random.rand(self.action_space_size)
return self.state
def train(sess, environment, actor, critic, embeddings, history_length, ra_length, buffer_size, batch_size, discount_factor, nb_episodes, filename_summary):
''' Algorithm 3 in article. '''
# Set up summary operators
def build_summaries():
episode_reward = tf.Variable(0.)
tf.summary.scalar('reward', episode_reward)
episode_max_Q = tf.Variable(0.)
tf.summary.scalar('max_Q_value', episode_max_Q)
critic_loss = tf.Variable(0.)
tf.summary.scalar('critic_loss', critic_loss)
summary_vars = [episode_reward, episode_max_Q, critic_loss]
summary_ops = tf.summary.merge_all()
return summary_ops, summary_vars
summary_ops, summary_vars = build_summaries()
sess.run(tf.global_variables_initializer())
writer = tf.summary.FileWriter(filename_summary, sess.graph)
# '2: Initialize target network f′ and Q′'
actor.init_target_network()
critic.init_target_network()
# '3: Initialize the capacity of replay memory D'
replay_memory = ReplayMemory(buffer_size) # Memory D in article
replay = False
start_time = time.time()
for i_session in range(nb_episodes): # '4: for session = 1, M do'
session_reward = 0
session_Q_value = 0
session_critic_loss = 0
# '5: Reset the item space I' is useless because unchanged.
states = environment.reset() # '6: Initialize state s_0 from previous sessions'
if (i_session + 1) % 10 == 0: # Update average parameters every 10 episodes
environment.groups = environment.get_groups()
exploration_noise = OrnsteinUhlenbeckNoise(history_length * embeddings.size())
for t in range(nb_rounds): # '7: for t = 1, T do'
# '8: Stage 1: Transition Generating Stage'
# '9: Select an action a_t = {a_t^1, ..., a_t^K} according to Algorithm 2'
actions = actor.get_recommendation_list(
ra_length,
states.reshape(1, -1), # TODO + exploration_noise.get().reshape(1, -1),
embeddings).reshape(ra_length, embeddings.size())
# '10: Execute action a_t and observe the reward list {r_t^1, ..., r_t^K} for each item in a_t'
rewards, next_states = environment.step(actions)
# '19: Store transition (s_t, a_t, r_t, s_t+1) in D'
replay_memory.add(states.reshape(history_length * embeddings.size()),
actions.reshape(ra_length * embeddings.size()),
[rewards],
next_states.reshape(history_length * embeddings.size()))
states = next_states # '20: Set s_t = s_t+1'
session_reward += rewards
# '21: Stage 2: Parameter Updating Stage'
if replay_memory.size() >= batch_size: # Experience replay
replay = True
replay_Q_value, critic_loss = experience_replay(replay_memory, batch_size,
actor, critic, embeddings, ra_length, history_length * embeddings.size(),
ra_length * embeddings.size(), discount_factor)
session_Q_value += replay_Q_value
session_critic_loss += critic_loss
summary_str = sess.run(summary_ops,
feed_dict={summary_vars[0]: session_reward,
summary_vars[1]: session_Q_value,
summary_vars[2]: session_critic_loss})
writer.add_summary(summary_str, i_session)
'''
print(state_to_items(embeddings.embed(data['state'][0]), actor, ra_length, embeddings),
state_to_items(embeddings.embed(data['state'][0]), actor, ra_length, embeddings, True))
'''
str_loss = str('Loss=%0.4f' % session_critic_loss)
print(('Episode %d/%d Reward=%d Time=%ds ' + (str_loss if replay else 'No replay')) % (i_session + 1, nb_episodes, session_reward, time.time() - start_time))
start_time = time.time()
writer.close()
tf.train.Saver().save(sess, 'models.h5', write_meta_graph=False)
# Hyperparameters
history_length = 12 # N in article
ra_length = 4 # K in article
discount_factor = 0.99 # Gamma in Bellman equation
actor_lr = 0.0001
critic_lr = 0.001
tau = 0.001 # τ in Algorithm 3
batch_size = 64
nb_episodes = 100
nb_rounds = 50
filename_summary = 'summary.txt'
alpha = 0.5 # α (alpha) in Equation (1)
gamma = 0.9 # Γ (Gamma) in Equation (4)
buffer_size = 1000000 # Size of replay memory D in article
fixed_length = True # Fixed memory length
dg = DataGenerator('ml-100k/u.data', 'ml-100k/u.item')
dg.gen_train_test(0.8, seed=42)
dg.write_csv('train.csv', dg.train, nb_states=[history_length], nb_actions=[ra_length])
dg.write_csv('test.csv', dg.test, nb_states=[history_length], nb_actions=[ra_length])
data = read_file('train.csv')
data.head()
state | n_state | action | reward | |
---|---|---|---|---|
0 | [732, 257, 507, 602, 481, 568, 1286, 50, 501, ... | [732, 257, 507, 602, 481, 568, 1286, 50, 501, ... | [731, 525, 80, 88] | (3, 4, 3, 3) |
1 | [1226, 855, 339, 124, 16, 147, 59, 827, 323, 2... | [1226, 855, 339, 124, 16, 147, 59, 827, 323, 2... | [52, 1005, 347, 70] | (4, 5, 4, 3) |
2 | [316, 286, 313, 748, 258, 272, 300, 302, 347, ... | [316, 286, 313, 748, 258, 272, 300, 302, 347, ... | [751, 271, 689, 289] | (4, 4, 4, 5) |
3 | [235, 433, 96, 117, 429, 7, 471, 201, 276, 55,... | [235, 433, 96, 117, 429, 7, 471, 201, 276, 55,... | [31, 198, 724, 654] | (3, 5, 3, 4) |
4 | [77, 241, 98, 423, 71, 157, 955, 186, 121, 421... | [77, 241, 98, 423, 71, 157, 955, 186, 121, 421... | [316, 427, 313, 959] | (4, 5, 4, 5) |
#collapse-output
if True: # Generate embeddings?
eg = EmbeddingsGenerator(dg.user_train, pd.read_csv('ml-100k/u.data', sep='\t', names=['userId', 'itemId', 'rating', 'timestamp']))
eg.train(nb_epochs=300)
train_loss, train_accuracy = eg.test(dg.user_train)
print('Train set: Loss=%.4f ; Accuracy=%.1f%%' % (train_loss, train_accuracy * 100))
test_loss, test_accuracy = eg.test(dg.user_test)
print('Test set: Loss=%.4f ; Accuracy=%.1f%%' % (test_loss, test_accuracy * 100))
eg.save_embeddings('embeddings.csv')
WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version. Instructions for updating: If using Keras pass *_constraint arguments to layers. 1/300 WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/math_grad.py:1424: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead. Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 493us/step - loss: 6.9202 - accuracy: 0.0100 - val_loss: 6.5489 - val_accuracy: 0.0160 2/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 392us/step - loss: 6.4452 - accuracy: 0.0150 - val_loss: 6.3391 - val_accuracy: 0.0144 3/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 397us/step - loss: 6.2737 - accuracy: 0.0172 - val_loss: 6.2418 - val_accuracy: 0.0142 4/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 395us/step - loss: 6.2539 - accuracy: 0.0156 - val_loss: 6.1208 - val_accuracy: 0.0152 5/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 387us/step - loss: 6.1386 - accuracy: 0.0196 - val_loss: 6.1390 - val_accuracy: 0.0230 6/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 399us/step - loss: 6.0841 - accuracy: 0.0170 - val_loss: 6.0209 - val_accuracy: 0.0210 7/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 396us/step - loss: 6.0844 - accuracy: 0.0204 - val_loss: 5.9844 - val_accuracy: 0.0240 8/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 397us/step - loss: 6.0329 - accuracy: 0.0222 - val_loss: 5.9625 - val_accuracy: 0.0244 9/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 396us/step - loss: 6.0147 - accuracy: 0.0204 - val_loss: 5.9273 - val_accuracy: 0.0246 10/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 387us/step - loss: 5.9878 - accuracy: 0.0250 - val_loss: 5.9182 - val_accuracy: 0.0264 11/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 388us/step - loss: 5.9183 - accuracy: 0.0254 - val_loss: 5.8513 - val_accuracy: 0.0324 12/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 390us/step - loss: 5.9244 - accuracy: 0.0266 - val_loss: 5.8591 - val_accuracy: 0.0334 13/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 386us/step - loss: 5.8851 - accuracy: 0.0312 - val_loss: 5.8540 - val_accuracy: 0.0326 14/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 392us/step - loss: 5.8685 - accuracy: 0.0316 - val_loss: 5.8123 - val_accuracy: 0.0378 15/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 399us/step - loss: 5.8641 - accuracy: 0.0334 - val_loss: 5.8084 - val_accuracy: 0.0338 16/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 400us/step - loss: 5.8641 - accuracy: 0.0322 - val_loss: 5.7705 - val_accuracy: 0.0390 17/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 399us/step - loss: 5.8240 - accuracy: 0.0434 - val_loss: 5.7161 - val_accuracy: 0.0510 18/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 387us/step - loss: 5.7978 - accuracy: 0.0442 - val_loss: 5.7052 - val_accuracy: 0.0492 19/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 384us/step - loss: 5.7698 - accuracy: 0.0396 - val_loss: 5.7139 - val_accuracy: 0.0484 20/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 392us/step - loss: 5.7513 - accuracy: 0.0440 - val_loss: 5.6890 - val_accuracy: 0.0524 21/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 401us/step - loss: 5.7090 - accuracy: 0.0488 - val_loss: 5.6584 - val_accuracy: 0.0514 22/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 401us/step - loss: 5.7031 - accuracy: 0.0432 - val_loss: 5.5928 - val_accuracy: 0.0622 23/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 396us/step - loss: 5.6509 - accuracy: 0.0554 - val_loss: 5.5568 - val_accuracy: 0.0670 24/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 381us/step - loss: 5.6746 - accuracy: 0.0600 - val_loss: 5.6020 - val_accuracy: 0.0604 25/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 390us/step - loss: 5.5880 - accuracy: 0.0618 - val_loss: 5.5072 - val_accuracy: 0.0756 26/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 406us/step - loss: 5.6083 - accuracy: 0.0634 - val_loss: 5.5330 - val_accuracy: 0.0712 27/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 391us/step - loss: 5.5992 - accuracy: 0.0658 - val_loss: 5.5303 - val_accuracy: 0.0782 28/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 390us/step - loss: 5.5620 - accuracy: 0.0730 - val_loss: 5.4303 - val_accuracy: 0.0820 29/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 386us/step - loss: 5.5142 - accuracy: 0.0720 - val_loss: 5.3807 - val_accuracy: 0.0934 30/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 387us/step - loss: 5.5019 - accuracy: 0.0752 - val_loss: 5.3523 - val_accuracy: 0.0960 31/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 391us/step - loss: 5.4424 - accuracy: 0.0876 - val_loss: 5.3459 - val_accuracy: 0.0970 32/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 391us/step - loss: 5.4085 - accuracy: 0.0878 - val_loss: 5.3406 - val_accuracy: 0.1014 33/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 398us/step - loss: 5.3886 - accuracy: 0.0888 - val_loss: 5.3043 - val_accuracy: 0.0976 34/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 393us/step - loss: 5.3653 - accuracy: 0.0924 - val_loss: 5.2621 - val_accuracy: 0.1140 35/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 388us/step - loss: 5.4007 - accuracy: 0.0890 - val_loss: 5.2699 - val_accuracy: 0.1140 36/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 404us/step - loss: 5.2848 - accuracy: 0.1040 - val_loss: 5.2311 - val_accuracy: 0.1204 37/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 395us/step - loss: 5.3042 - accuracy: 0.1008 - val_loss: 5.1873 - val_accuracy: 0.1230 38/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 396us/step - loss: 5.2921 - accuracy: 0.1066 - val_loss: 5.1309 - val_accuracy: 0.1314 39/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 398us/step - loss: 5.2618 - accuracy: 0.1070 - val_loss: 5.0953 - val_accuracy: 0.1448 40/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 390us/step - loss: 5.1676 - accuracy: 0.1136 - val_loss: 5.0245 - val_accuracy: 0.1492 41/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 389us/step - loss: 5.1354 - accuracy: 0.1242 - val_loss: 5.0307 - val_accuracy: 0.1498 42/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 5.1219 - accuracy: 0.1276 - val_loss: 4.9901 - val_accuracy: 0.1650 43/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 400us/step - loss: 5.1534 - accuracy: 0.1336 - val_loss: 4.9714 - val_accuracy: 0.1694 44/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 399us/step - loss: 5.1184 - accuracy: 0.1376 - val_loss: 4.9489 - val_accuracy: 0.1592 45/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 401us/step - loss: 5.0298 - accuracy: 0.1376 - val_loss: 4.9274 - val_accuracy: 0.1720 46/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 395us/step - loss: 5.0099 - accuracy: 0.1504 - val_loss: 4.8445 - val_accuracy: 0.1786 47/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 395us/step - loss: 4.9666 - accuracy: 0.1526 - val_loss: 4.7906 - val_accuracy: 0.1948 48/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 399us/step - loss: 4.9227 - accuracy: 0.1608 - val_loss: 4.7955 - val_accuracy: 0.1882 49/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 392us/step - loss: 4.9143 - accuracy: 0.1596 - val_loss: 4.7671 - val_accuracy: 0.1988 50/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 396us/step - loss: 4.8729 - accuracy: 0.1662 - val_loss: 4.7239 - val_accuracy: 0.2054 51/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 394us/step - loss: 4.7912 - accuracy: 0.1836 - val_loss: 4.6844 - val_accuracy: 0.2136 52/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 393us/step - loss: 4.8110 - accuracy: 0.1784 - val_loss: 4.6213 - val_accuracy: 0.2212 53/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 408us/step - loss: 4.6814 - accuracy: 0.1954 - val_loss: 4.5962 - val_accuracy: 0.2324 54/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 394us/step - loss: 4.7148 - accuracy: 0.1876 - val_loss: 4.5031 - val_accuracy: 0.2446 55/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 395us/step - loss: 4.6427 - accuracy: 0.2044 - val_loss: 4.5063 - val_accuracy: 0.2494 56/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 394us/step - loss: 4.6296 - accuracy: 0.2088 - val_loss: 4.4856 - val_accuracy: 0.2628 57/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 389us/step - loss: 4.5943 - accuracy: 0.2136 - val_loss: 4.4382 - val_accuracy: 0.2612 58/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 394us/step - loss: 4.5594 - accuracy: 0.2218 - val_loss: 4.3101 - val_accuracy: 0.2852 59/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 395us/step - loss: 4.5126 - accuracy: 0.2246 - val_loss: 4.3327 - val_accuracy: 0.2772 60/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 397us/step - loss: 4.4381 - accuracy: 0.2378 - val_loss: 4.2424 - val_accuracy: 0.2890 61/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 388us/step - loss: 4.4603 - accuracy: 0.2266 - val_loss: 4.2749 - val_accuracy: 0.2970 62/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 392us/step - loss: 4.4146 - accuracy: 0.2394 - val_loss: 4.1974 - val_accuracy: 0.3112 63/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 405us/step - loss: 4.3489 - accuracy: 0.2488 - val_loss: 4.1782 - val_accuracy: 0.3094 64/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 395us/step - loss: 4.3680 - accuracy: 0.2514 - val_loss: 4.1138 - val_accuracy: 0.3308 65/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 395us/step - loss: 4.3094 - accuracy: 0.2558 - val_loss: 4.0360 - val_accuracy: 0.3318 66/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 392us/step - loss: 4.2179 - accuracy: 0.2738 - val_loss: 4.0146 - val_accuracy: 0.3466 67/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 400us/step - loss: 4.1800 - accuracy: 0.2802 - val_loss: 3.9621 - val_accuracy: 0.3408 68/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 399us/step - loss: 4.1356 - accuracy: 0.2862 - val_loss: 3.9211 - val_accuracy: 0.3626 69/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 390us/step - loss: 4.0549 - accuracy: 0.3104 - val_loss: 3.8791 - val_accuracy: 0.3770 70/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 401us/step - loss: 4.0348 - accuracy: 0.3038 - val_loss: 3.8400 - val_accuracy: 0.3868 71/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 397us/step - loss: 3.9944 - accuracy: 0.3064 - val_loss: 3.7690 - val_accuracy: 0.3856 72/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 414us/step - loss: 4.0106 - accuracy: 0.3132 - val_loss: 3.7704 - val_accuracy: 0.3956 73/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 404us/step - loss: 3.9379 - accuracy: 0.3204 - val_loss: 3.6701 - val_accuracy: 0.3974 74/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 403us/step - loss: 3.8949 - accuracy: 0.3356 - val_loss: 3.6144 - val_accuracy: 0.4296 75/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 390us/step - loss: 3.8187 - accuracy: 0.3368 - val_loss: 3.5836 - val_accuracy: 0.4128 76/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 405us/step - loss: 3.8102 - accuracy: 0.3428 - val_loss: 3.5028 - val_accuracy: 0.4346 77/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 404us/step - loss: 3.7603 - accuracy: 0.3534 - val_loss: 3.4695 - val_accuracy: 0.4408 78/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 3.7149 - accuracy: 0.3650 - val_loss: 3.4650 - val_accuracy: 0.4510 79/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 392us/step - loss: 3.6468 - accuracy: 0.3808 - val_loss: 3.4893 - val_accuracy: 0.4436 80/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 414us/step - loss: 3.6445 - accuracy: 0.3664 - val_loss: 3.3478 - val_accuracy: 0.4686 81/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 398us/step - loss: 3.5365 - accuracy: 0.3988 - val_loss: 3.3008 - val_accuracy: 0.4854 82/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 3.5254 - accuracy: 0.3944 - val_loss: 3.3290 - val_accuracy: 0.4742 83/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 414us/step - loss: 3.4863 - accuracy: 0.4056 - val_loss: 3.3257 - val_accuracy: 0.4792 84/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 398us/step - loss: 3.4112 - accuracy: 0.4164 - val_loss: 3.1776 - val_accuracy: 0.5012 85/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 3.4042 - accuracy: 0.4216 - val_loss: 3.1592 - val_accuracy: 0.5088 86/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 3.2852 - accuracy: 0.4386 - val_loss: 3.1144 - val_accuracy: 0.5164 87/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 403us/step - loss: 3.2709 - accuracy: 0.4408 - val_loss: 3.0742 - val_accuracy: 0.5240 88/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 403us/step - loss: 3.2722 - accuracy: 0.4414 - val_loss: 3.0320 - val_accuracy: 0.5292 89/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 3.1894 - accuracy: 0.4532 - val_loss: 2.9413 - val_accuracy: 0.5480 90/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 404us/step - loss: 3.1212 - accuracy: 0.4698 - val_loss: 2.8748 - val_accuracy: 0.5718 91/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 394us/step - loss: 3.0990 - accuracy: 0.4772 - val_loss: 2.9096 - val_accuracy: 0.5594 92/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 397us/step - loss: 2.9949 - accuracy: 0.4906 - val_loss: 2.7876 - val_accuracy: 0.5796 93/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 405us/step - loss: 2.9939 - accuracy: 0.4926 - val_loss: 2.7424 - val_accuracy: 0.5816 94/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 405us/step - loss: 2.9415 - accuracy: 0.4980 - val_loss: 2.6546 - val_accuracy: 0.5992 95/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 397us/step - loss: 2.9397 - accuracy: 0.5084 - val_loss: 2.6482 - val_accuracy: 0.5952 96/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 387us/step - loss: 2.8384 - accuracy: 0.5260 - val_loss: 2.6494 - val_accuracy: 0.6058 97/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 396us/step - loss: 2.8045 - accuracy: 0.5276 - val_loss: 2.6678 - val_accuracy: 0.6152 98/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 406us/step - loss: 2.7874 - accuracy: 0.5284 - val_loss: 2.5639 - val_accuracy: 0.6278 99/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 408us/step - loss: 2.7609 - accuracy: 0.5396 - val_loss: 2.5057 - val_accuracy: 0.6398 100/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 404us/step - loss: 2.6811 - accuracy: 0.5578 - val_loss: 2.4427 - val_accuracy: 0.6516 101/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 410us/step - loss: 2.6871 - accuracy: 0.5456 - val_loss: 2.4066 - val_accuracy: 0.6654 102/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 408us/step - loss: 2.5663 - accuracy: 0.5738 - val_loss: 2.3830 - val_accuracy: 0.6528 103/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 393us/step - loss: 2.5405 - accuracy: 0.5778 - val_loss: 2.2917 - val_accuracy: 0.6684 104/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 398us/step - loss: 2.5373 - accuracy: 0.5870 - val_loss: 2.2606 - val_accuracy: 0.6852 105/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 2.4934 - accuracy: 0.5946 - val_loss: 2.2190 - val_accuracy: 0.6852 106/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 409us/step - loss: 2.4745 - accuracy: 0.5920 - val_loss: 2.1826 - val_accuracy: 0.6910 107/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 400us/step - loss: 2.3676 - accuracy: 0.6158 - val_loss: 2.1226 - val_accuracy: 0.7058 108/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 413us/step - loss: 2.3591 - accuracy: 0.6148 - val_loss: 2.0710 - val_accuracy: 0.7092 109/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 413us/step - loss: 2.3524 - accuracy: 0.6132 - val_loss: 2.0538 - val_accuracy: 0.7120 110/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 412us/step - loss: 2.2083 - accuracy: 0.6440 - val_loss: 2.0058 - val_accuracy: 0.7246 111/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 2.2500 - accuracy: 0.6318 - val_loss: 1.9156 - val_accuracy: 0.7410 112/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 398us/step - loss: 2.1819 - accuracy: 0.6530 - val_loss: 1.8126 - val_accuracy: 0.7542 113/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 2.0875 - accuracy: 0.6602 - val_loss: 1.8725 - val_accuracy: 0.7480 114/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 403us/step - loss: 2.0695 - accuracy: 0.6694 - val_loss: 1.7876 - val_accuracy: 0.7560 115/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 1.9872 - accuracy: 0.6808 - val_loss: 1.7615 - val_accuracy: 0.7638 116/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 396us/step - loss: 2.0265 - accuracy: 0.6764 - val_loss: 1.8029 - val_accuracy: 0.7500 117/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 395us/step - loss: 1.9593 - accuracy: 0.6968 - val_loss: 1.7222 - val_accuracy: 0.7694 118/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 405us/step - loss: 1.9793 - accuracy: 0.6876 - val_loss: 1.7054 - val_accuracy: 0.7854 119/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 406us/step - loss: 1.8934 - accuracy: 0.6984 - val_loss: 1.6764 - val_accuracy: 0.7760 120/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 411us/step - loss: 1.8731 - accuracy: 0.7044 - val_loss: 1.6600 - val_accuracy: 0.7762 121/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 413us/step - loss: 1.8633 - accuracy: 0.7042 - val_loss: 1.5655 - val_accuracy: 0.7950 122/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 400us/step - loss: 1.8155 - accuracy: 0.7212 - val_loss: 1.5577 - val_accuracy: 0.8050 123/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 406us/step - loss: 1.7936 - accuracy: 0.7226 - val_loss: 1.5797 - val_accuracy: 0.7924 124/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 400us/step - loss: 1.6727 - accuracy: 0.7428 - val_loss: 1.4695 - val_accuracy: 0.8170 125/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 401us/step - loss: 1.6898 - accuracy: 0.7436 - val_loss: 1.4540 - val_accuracy: 0.8192 126/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 405us/step - loss: 1.6531 - accuracy: 0.7418 - val_loss: 1.4008 - val_accuracy: 0.8206 127/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 399us/step - loss: 1.6137 - accuracy: 0.7506 - val_loss: 1.3895 - val_accuracy: 0.8118 128/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 406us/step - loss: 1.6290 - accuracy: 0.7560 - val_loss: 1.3585 - val_accuracy: 0.8338 129/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 401us/step - loss: 1.6117 - accuracy: 0.7524 - val_loss: 1.2903 - val_accuracy: 0.8374 130/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 411us/step - loss: 1.5511 - accuracy: 0.7594 - val_loss: 1.2891 - val_accuracy: 0.8316 131/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 396us/step - loss: 1.5014 - accuracy: 0.7786 - val_loss: 1.3013 - val_accuracy: 0.8308 132/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 416us/step - loss: 1.4592 - accuracy: 0.7834 - val_loss: 1.2238 - val_accuracy: 0.8452 133/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 415us/step - loss: 1.4994 - accuracy: 0.7738 - val_loss: 1.1824 - val_accuracy: 0.8542 134/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 1.4425 - accuracy: 0.7844 - val_loss: 1.1497 - val_accuracy: 0.8548 135/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 422us/step - loss: 1.4319 - accuracy: 0.7870 - val_loss: 1.1830 - val_accuracy: 0.8530 136/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 405us/step - loss: 1.3450 - accuracy: 0.8006 - val_loss: 1.1152 - val_accuracy: 0.8616 137/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 408us/step - loss: 1.4073 - accuracy: 0.7928 - val_loss: 1.1236 - val_accuracy: 0.8584 138/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 413us/step - loss: 1.3359 - accuracy: 0.8014 - val_loss: 1.1054 - val_accuracy: 0.8554 139/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 420us/step - loss: 1.3105 - accuracy: 0.8080 - val_loss: 1.0732 - val_accuracy: 0.8714 140/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 406us/step - loss: 1.2528 - accuracy: 0.8166 - val_loss: 1.1127 - val_accuracy: 0.8648 141/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 412us/step - loss: 1.2472 - accuracy: 0.8178 - val_loss: 1.0218 - val_accuracy: 0.8784 142/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 420us/step - loss: 1.2278 - accuracy: 0.8228 - val_loss: 0.9639 - val_accuracy: 0.8844 143/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 1.2285 - accuracy: 0.8170 - val_loss: 1.0322 - val_accuracy: 0.8720 144/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 407us/step - loss: 1.1853 - accuracy: 0.8240 - val_loss: 0.8959 - val_accuracy: 0.8954 145/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 414us/step - loss: 1.1545 - accuracy: 0.8328 - val_loss: 0.9459 - val_accuracy: 0.8820 146/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 409us/step - loss: 1.1736 - accuracy: 0.8298 - val_loss: 0.9650 - val_accuracy: 0.8752 147/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 408us/step - loss: 1.0828 - accuracy: 0.8468 - val_loss: 0.8727 - val_accuracy: 0.8946 148/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 418us/step - loss: 1.0743 - accuracy: 0.8454 - val_loss: 0.8732 - val_accuracy: 0.8952 149/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 397us/step - loss: 1.1223 - accuracy: 0.8380 - val_loss: 0.8399 - val_accuracy: 0.8968 150/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 409us/step - loss: 1.0736 - accuracy: 0.8504 - val_loss: 0.8629 - val_accuracy: 0.8986 151/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 404us/step - loss: 1.0527 - accuracy: 0.8480 - val_loss: 0.7800 - val_accuracy: 0.9100 152/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 406us/step - loss: 1.0271 - accuracy: 0.8550 - val_loss: 0.8736 - val_accuracy: 0.8946 153/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 401us/step - loss: 0.9801 - accuracy: 0.8648 - val_loss: 0.7773 - val_accuracy: 0.9082 154/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 398us/step - loss: 0.9624 - accuracy: 0.8606 - val_loss: 0.7587 - val_accuracy: 0.9122 155/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 412us/step - loss: 0.9653 - accuracy: 0.8668 - val_loss: 0.7569 - val_accuracy: 0.9102 156/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 409us/step - loss: 0.9171 - accuracy: 0.8768 - val_loss: 0.7783 - val_accuracy: 0.9008 157/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 410us/step - loss: 0.9824 - accuracy: 0.8598 - val_loss: 0.7716 - val_accuracy: 0.9114 158/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 412us/step - loss: 0.9050 - accuracy: 0.8720 - val_loss: 0.6798 - val_accuracy: 0.9246 159/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 419us/step - loss: 0.8868 - accuracy: 0.8782 - val_loss: 0.7305 - val_accuracy: 0.9134 160/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 414us/step - loss: 0.8656 - accuracy: 0.8774 - val_loss: 0.6773 - val_accuracy: 0.9174 161/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 412us/step - loss: 0.9094 - accuracy: 0.8732 - val_loss: 0.7563 - val_accuracy: 0.9078 162/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 408us/step - loss: 0.8625 - accuracy: 0.8852 - val_loss: 0.6772 - val_accuracy: 0.9216 163/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 411us/step - loss: 0.9143 - accuracy: 0.8728 - val_loss: 0.7034 - val_accuracy: 0.9154 164/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 417us/step - loss: 0.8791 - accuracy: 0.8746 - val_loss: 0.7079 - val_accuracy: 0.9188 165/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 410us/step - loss: 0.8265 - accuracy: 0.8880 - val_loss: 0.6240 - val_accuracy: 0.9240 166/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 414us/step - loss: 0.7777 - accuracy: 0.8974 - val_loss: 0.6988 - val_accuracy: 0.9150 167/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 403us/step - loss: 0.8305 - accuracy: 0.8876 - val_loss: 0.6100 - val_accuracy: 0.9274 168/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 407us/step - loss: 0.7893 - accuracy: 0.8900 - val_loss: 0.6668 - val_accuracy: 0.9140 169/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 414us/step - loss: 0.8148 - accuracy: 0.8926 - val_loss: 0.6679 - val_accuracy: 0.9230 170/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 419us/step - loss: 0.7757 - accuracy: 0.8962 - val_loss: 0.6510 - val_accuracy: 0.9218 171/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 403us/step - loss: 0.7666 - accuracy: 0.8958 - val_loss: 0.5848 - val_accuracy: 0.9266 172/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 417us/step - loss: 0.7406 - accuracy: 0.8994 - val_loss: 0.5751 - val_accuracy: 0.9296 173/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 413us/step - loss: 0.7387 - accuracy: 0.8992 - val_loss: 0.5893 - val_accuracy: 0.9270 174/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 402us/step - loss: 0.7045 - accuracy: 0.9026 - val_loss: 0.5447 - val_accuracy: 0.9280 175/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 417us/step - loss: 0.7532 - accuracy: 0.8990 - val_loss: 0.5440 - val_accuracy: 0.9338 176/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 410us/step - loss: 0.7437 - accuracy: 0.9028 - val_loss: 0.5865 - val_accuracy: 0.9252 177/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 416us/step - loss: 0.6795 - accuracy: 0.9090 - val_loss: 0.5411 - val_accuracy: 0.9344 178/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 411us/step - loss: 0.7007 - accuracy: 0.9012 - val_loss: 0.5581 - val_accuracy: 0.9260 179/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 421us/step - loss: 0.6832 - accuracy: 0.9086 - val_loss: 0.5115 - val_accuracy: 0.9390 180/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 413us/step - loss: 0.6835 - accuracy: 0.9100 - val_loss: 0.5173 - val_accuracy: 0.9446 181/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 415us/step - loss: 0.6935 - accuracy: 0.9046 - val_loss: 0.5112 - val_accuracy: 0.9412 182/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 410us/step - loss: 0.7066 - accuracy: 0.9000 - val_loss: 0.5668 - val_accuracy: 0.9314 183/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 429us/step - loss: 0.6225 - accuracy: 0.9148 - val_loss: 0.5051 - val_accuracy: 0.9388 184/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 414us/step - loss: 0.6505 - accuracy: 0.9174 - val_loss: 0.5356 - val_accuracy: 0.9334 185/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 425us/step - loss: 0.6816 - accuracy: 0.9102 - val_loss: 0.4791 - val_accuracy: 0.9428 186/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 407us/step - loss: 0.6619 - accuracy: 0.9106 - val_loss: 0.5131 - val_accuracy: 0.9410 187/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 407us/step - loss: 0.6706 - accuracy: 0.9084 - val_loss: 0.5034 - val_accuracy: 0.9348 188/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 413us/step - loss: 0.6367 - accuracy: 0.9156 - val_loss: 0.4722 - val_accuracy: 0.9390 189/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 412us/step - loss: 0.6209 - accuracy: 0.9154 - val_loss: 0.4924 - val_accuracy: 0.9394 190/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 417us/step - loss: 0.5862 - accuracy: 0.9240 - val_loss: 0.4789 - val_accuracy: 0.9396 191/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 419us/step - loss: 0.6070 - accuracy: 0.9210 - val_loss: 0.4566 - val_accuracy: 0.9392 192/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 428us/step - loss: 0.5869 - accuracy: 0.9196 - val_loss: 0.4740 - val_accuracy: 0.9422 193/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 428us/step - loss: 0.6011 - accuracy: 0.9222 - val_loss: 0.4707 - val_accuracy: 0.9468 194/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 412us/step - loss: 0.5858 - accuracy: 0.9198 - val_loss: 0.4336 - val_accuracy: 0.9468 195/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 422us/step - loss: 0.5947 - accuracy: 0.9202 - val_loss: 0.4398 - val_accuracy: 0.9484 196/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 427us/step - loss: 0.5615 - accuracy: 0.9256 - val_loss: 0.4687 - val_accuracy: 0.9408 197/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 420us/step - loss: 0.5673 - accuracy: 0.9236 - val_loss: 0.4215 - val_accuracy: 0.9478 198/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 415us/step - loss: 0.5637 - accuracy: 0.9294 - val_loss: 0.4343 - val_accuracy: 0.9456 199/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 417us/step - loss: 0.6137 - accuracy: 0.9172 - val_loss: 0.4341 - val_accuracy: 0.9462 200/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 414us/step - loss: 0.6006 - accuracy: 0.9218 - val_loss: 0.3884 - val_accuracy: 0.9512 201/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 417us/step - loss: 0.5635 - accuracy: 0.9268 - val_loss: 0.4230 - val_accuracy: 0.9480 202/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 418us/step - loss: 0.5658 - accuracy: 0.9256 - val_loss: 0.4512 - val_accuracy: 0.9440 203/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 416us/step - loss: 0.6056 - accuracy: 0.9200 - val_loss: 0.4215 - val_accuracy: 0.9438 204/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 414us/step - loss: 0.5344 - accuracy: 0.9278 - val_loss: 0.4380 - val_accuracy: 0.9458 205/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 426us/step - loss: 0.5138 - accuracy: 0.9304 - val_loss: 0.3961 - val_accuracy: 0.9506 206/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 418us/step - loss: 0.5704 - accuracy: 0.9264 - val_loss: 0.3948 - val_accuracy: 0.9486 207/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 436us/step - loss: 0.5551 - accuracy: 0.9248 - val_loss: 0.3943 - val_accuracy: 0.9526 208/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 427us/step - loss: 0.4828 - accuracy: 0.9366 - val_loss: 0.4855 - val_accuracy: 0.9334 209/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 416us/step - loss: 0.4814 - accuracy: 0.9376 - val_loss: 0.3574 - val_accuracy: 0.9580 210/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 422us/step - loss: 0.4560 - accuracy: 0.9418 - val_loss: 0.4189 - val_accuracy: 0.9474 211/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 417us/step - loss: 0.5182 - accuracy: 0.9278 - val_loss: 0.3576 - val_accuracy: 0.9526 212/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 416us/step - loss: 0.4829 - accuracy: 0.9360 - val_loss: 0.3724 - val_accuracy: 0.9542 213/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 422us/step - loss: 0.5509 - accuracy: 0.9252 - val_loss: 0.4110 - val_accuracy: 0.9492 214/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 431us/step - loss: 0.5758 - accuracy: 0.9202 - val_loss: 0.4106 - val_accuracy: 0.9498 215/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 440us/step - loss: 0.4821 - accuracy: 0.9340 - val_loss: 0.3331 - val_accuracy: 0.9600 216/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 430us/step - loss: 0.4816 - accuracy: 0.9352 - val_loss: 0.3872 - val_accuracy: 0.9562 217/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 418us/step - loss: 0.4919 - accuracy: 0.9354 - val_loss: 0.3316 - val_accuracy: 0.9600 218/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 424us/step - loss: 0.4545 - accuracy: 0.9382 - val_loss: 0.3393 - val_accuracy: 0.9538 219/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 426us/step - loss: 0.4772 - accuracy: 0.9376 - val_loss: 0.3637 - val_accuracy: 0.9542 220/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 423us/step - loss: 0.4726 - accuracy: 0.9400 - val_loss: 0.3490 - val_accuracy: 0.9598 221/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 440us/step - loss: 0.4793 - accuracy: 0.9378 - val_loss: 0.3734 - val_accuracy: 0.9488 222/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 415us/step - loss: 0.5026 - accuracy: 0.9352 - val_loss: 0.3776 - val_accuracy: 0.9526 223/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 427us/step - loss: 0.4759 - accuracy: 0.9306 - val_loss: 0.3640 - val_accuracy: 0.9504 224/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 424us/step - loss: 0.4789 - accuracy: 0.9378 - val_loss: 0.3393 - val_accuracy: 0.9588 225/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 431us/step - loss: 0.4675 - accuracy: 0.9372 - val_loss: 0.3774 - val_accuracy: 0.9558 226/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 422us/step - loss: 0.5579 - accuracy: 0.9288 - val_loss: 0.3467 - val_accuracy: 0.9576 227/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 446us/step - loss: 0.4209 - accuracy: 0.9410 - val_loss: 0.3965 - val_accuracy: 0.9468 228/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 447us/step - loss: 0.4648 - accuracy: 0.9406 - val_loss: 0.3432 - val_accuracy: 0.9578 229/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 438us/step - loss: 0.5176 - accuracy: 0.9314 - val_loss: 0.3913 - val_accuracy: 0.9500 230/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 435us/step - loss: 0.4967 - accuracy: 0.9360 - val_loss: 0.3768 - val_accuracy: 0.9560 231/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 424us/step - loss: 0.4823 - accuracy: 0.9396 - val_loss: 0.3141 - val_accuracy: 0.9628 232/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 424us/step - loss: 0.4552 - accuracy: 0.9438 - val_loss: 0.3027 - val_accuracy: 0.9600 233/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 423us/step - loss: 0.4230 - accuracy: 0.9444 - val_loss: 0.3282 - val_accuracy: 0.9578 234/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 427us/step - loss: 0.4708 - accuracy: 0.9340 - val_loss: 0.3755 - val_accuracy: 0.9472 235/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 414us/step - loss: 0.4087 - accuracy: 0.9416 - val_loss: 0.3489 - val_accuracy: 0.9550 236/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 433us/step - loss: 0.4523 - accuracy: 0.9386 - val_loss: 0.3294 - val_accuracy: 0.9594 237/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 431us/step - loss: 0.4482 - accuracy: 0.9392 - val_loss: 0.3893 - val_accuracy: 0.9538 238/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 434us/step - loss: 0.4506 - accuracy: 0.9400 - val_loss: 0.3563 - val_accuracy: 0.9530 239/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 406us/step - loss: 0.4631 - accuracy: 0.9388 - val_loss: 0.3517 - val_accuracy: 0.9570 240/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 421us/step - loss: 0.4394 - accuracy: 0.9492 - val_loss: 0.2732 - val_accuracy: 0.9688 241/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 428us/step - loss: 0.4131 - accuracy: 0.9440 - val_loss: 0.3108 - val_accuracy: 0.9628 242/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 420us/step - loss: 0.4081 - accuracy: 0.9440 - val_loss: 0.3488 - val_accuracy: 0.9532 243/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 415us/step - loss: 0.4218 - accuracy: 0.9440 - val_loss: 0.3199 - val_accuracy: 0.9612 244/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 414us/step - loss: 0.4196 - accuracy: 0.9452 - val_loss: 0.3118 - val_accuracy: 0.9610 245/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 427us/step - loss: 0.3984 - accuracy: 0.9474 - val_loss: 0.3152 - val_accuracy: 0.9612 246/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 422us/step - loss: 0.4311 - accuracy: 0.9400 - val_loss: 0.2962 - val_accuracy: 0.9634 247/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 424us/step - loss: 0.4427 - accuracy: 0.9402 - val_loss: 0.3643 - val_accuracy: 0.9556 248/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 420us/step - loss: 0.4436 - accuracy: 0.9410 - val_loss: 0.3296 - val_accuracy: 0.9612 249/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 423us/step - loss: 0.4356 - accuracy: 0.9440 - val_loss: 0.3044 - val_accuracy: 0.9630 250/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 428us/step - loss: 0.3828 - accuracy: 0.9490 - val_loss: 0.2771 - val_accuracy: 0.9684 251/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 435us/step - loss: 0.4115 - accuracy: 0.9416 - val_loss: 0.3557 - val_accuracy: 0.9580 252/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 432us/step - loss: 0.3686 - accuracy: 0.9490 - val_loss: 0.3319 - val_accuracy: 0.9634 253/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 435us/step - loss: 0.4639 - accuracy: 0.9432 - val_loss: 0.2853 - val_accuracy: 0.9638 254/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 437us/step - loss: 0.4792 - accuracy: 0.9362 - val_loss: 0.3423 - val_accuracy: 0.9564 255/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 426us/step - loss: 0.4066 - accuracy: 0.9480 - val_loss: 0.3347 - val_accuracy: 0.9576 256/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 428us/step - loss: 0.4724 - accuracy: 0.9376 - val_loss: 0.2919 - val_accuracy: 0.9658 257/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 432us/step - loss: 0.4215 - accuracy: 0.9410 - val_loss: 0.2725 - val_accuracy: 0.9642 258/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 425us/step - loss: 0.4419 - accuracy: 0.9464 - val_loss: 0.3282 - val_accuracy: 0.9636 259/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 420us/step - loss: 0.4133 - accuracy: 0.9474 - val_loss: 0.2633 - val_accuracy: 0.9680 260/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 438us/step - loss: 0.4547 - accuracy: 0.9410 - val_loss: 0.3277 - val_accuracy: 0.9632 261/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 432us/step - loss: 0.3550 - accuracy: 0.9518 - val_loss: 0.2824 - val_accuracy: 0.9662 262/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 432us/step - loss: 0.4278 - accuracy: 0.9406 - val_loss: 0.2624 - val_accuracy: 0.9686 263/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 437us/step - loss: 0.4622 - accuracy: 0.9398 - val_loss: 0.3406 - val_accuracy: 0.9556 264/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 426us/step - loss: 0.3704 - accuracy: 0.9546 - val_loss: 0.3197 - val_accuracy: 0.9664 265/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 421us/step - loss: 0.3736 - accuracy: 0.9492 - val_loss: 0.3204 - val_accuracy: 0.9618 266/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 427us/step - loss: 0.3926 - accuracy: 0.9480 - val_loss: 0.3420 - val_accuracy: 0.9558 267/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 426us/step - loss: 0.3492 - accuracy: 0.9552 - val_loss: 0.3409 - val_accuracy: 0.9594 268/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 424us/step - loss: 0.4315 - accuracy: 0.9456 - val_loss: 0.3871 - val_accuracy: 0.9564 269/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 438us/step - loss: 0.4241 - accuracy: 0.9416 - val_loss: 0.3569 - val_accuracy: 0.9562 270/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 433us/step - loss: 0.4078 - accuracy: 0.9438 - val_loss: 0.2925 - val_accuracy: 0.9646 271/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 419us/step - loss: 0.3924 - accuracy: 0.9468 - val_loss: 0.3646 - val_accuracy: 0.9536 272/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 433us/step - loss: 0.3643 - accuracy: 0.9520 - val_loss: 0.3494 - val_accuracy: 0.9608 273/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 418us/step - loss: 0.3252 - accuracy: 0.9564 - val_loss: 0.2771 - val_accuracy: 0.9666 274/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 422us/step - loss: 0.4002 - accuracy: 0.9480 - val_loss: 0.3212 - val_accuracy: 0.9644 275/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 422us/step - loss: 0.4312 - accuracy: 0.9450 - val_loss: 0.3275 - val_accuracy: 0.9604 276/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 431us/step - loss: 0.4204 - accuracy: 0.9418 - val_loss: 0.2861 - val_accuracy: 0.9620 277/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 434us/step - loss: 0.4327 - accuracy: 0.9462 - val_loss: 0.2808 - val_accuracy: 0.9636 278/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 429us/step - loss: 0.4367 - accuracy: 0.9428 - val_loss: 0.3191 - val_accuracy: 0.9568 279/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 434us/step - loss: 0.3983 - accuracy: 0.9492 - val_loss: 0.3552 - val_accuracy: 0.9592 280/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 432us/step - loss: 0.3887 - accuracy: 0.9472 - val_loss: 0.2665 - val_accuracy: 0.9658 281/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 435us/step - loss: 0.3997 - accuracy: 0.9486 - val_loss: 0.2733 - val_accuracy: 0.9666 282/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 435us/step - loss: 0.3756 - accuracy: 0.9484 - val_loss: 0.2950 - val_accuracy: 0.9626 283/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 430us/step - loss: 0.3351 - accuracy: 0.9502 - val_loss: 0.2685 - val_accuracy: 0.9652 284/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 431us/step - loss: 0.3385 - accuracy: 0.9566 - val_loss: 0.2542 - val_accuracy: 0.9664 285/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 446us/step - loss: 0.3217 - accuracy: 0.9572 - val_loss: 0.3173 - val_accuracy: 0.9628 286/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 421us/step - loss: 0.3968 - accuracy: 0.9494 - val_loss: 0.3565 - val_accuracy: 0.9536 287/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 434us/step - loss: 0.4626 - accuracy: 0.9386 - val_loss: 0.3295 - val_accuracy: 0.9590 288/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 436us/step - loss: 0.3845 - accuracy: 0.9450 - val_loss: 0.3039 - val_accuracy: 0.9630 289/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 427us/step - loss: 0.4553 - accuracy: 0.9394 - val_loss: 0.2742 - val_accuracy: 0.9644 290/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 423us/step - loss: 0.4083 - accuracy: 0.9486 - val_loss: 0.2771 - val_accuracy: 0.9692 291/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 447us/step - loss: 0.3854 - accuracy: 0.9468 - val_loss: 0.3103 - val_accuracy: 0.9640 292/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 429us/step - loss: 0.3969 - accuracy: 0.9484 - val_loss: 0.2863 - val_accuracy: 0.9642 293/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 435us/step - loss: 0.3743 - accuracy: 0.9508 - val_loss: 0.2994 - val_accuracy: 0.9654 294/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 431us/step - loss: 0.3768 - accuracy: 0.9484 - val_loss: 0.3063 - val_accuracy: 0.9620 295/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 422us/step - loss: 0.3746 - accuracy: 0.9496 - val_loss: 0.3148 - val_accuracy: 0.9662 296/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 421us/step - loss: 0.3738 - accuracy: 0.9516 - val_loss: 0.2518 - val_accuracy: 0.9680 297/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 427us/step - loss: 0.3634 - accuracy: 0.9562 - val_loss: 0.3113 - val_accuracy: 0.9608 298/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 433us/step - loss: 0.3427 - accuracy: 0.9530 - val_loss: 0.3012 - val_accuracy: 0.9556 299/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 437us/step - loss: 0.3641 - accuracy: 0.9508 - val_loss: 0.2865 - val_accuracy: 0.9550 300/300 Train on 5000 samples, validate on 5000 samples Epoch 1/1 5000/5000 [==============================] - 2s 439us/step - loss: 0.3436 - accuracy: 0.9544 - val_loss: 0.2552 - val_accuracy: 0.9686 100000/100000 [==============================] - 10s 101us/step Train set: Loss=0.2559 ; Accuracy=96.9% 100000/100000 [==============================] - 10s 101us/step Test set: Loss=14.9620 ; Accuracy=1.8%
embeddings = Embeddings(read_embeddings('embeddings.csv'))
state_space_size = embeddings.size() * history_length
action_space_size = embeddings.size() * ra_length
environment = Environment(data, embeddings, alpha, gamma, fixed_length)
tf.reset_default_graph() # For multiple consecutive executions
sess = tf.Session()
# '1: Initialize actor network f_θ^π and critic network Q(s, a|θ^µ) with random weights'
actor = Actor(sess, state_space_size, action_space_size, batch_size, ra_length, history_length, embeddings.size(), tau, actor_lr)
critic = Critic(sess, state_space_size, action_space_size, history_length, embeddings.size(), tau, critic_lr)
#collapse-output
train(sess, environment, actor, critic, embeddings, history_length, ra_length, buffer_size, batch_size, discount_factor, nb_episodes, filename_summary)
WARNING:tensorflow:From <ipython-input-19-1a5cd2de5f02>:69: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version. Instructions for updating: This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0. WARNING:tensorflow:From <ipython-input-19-1a5cd2de5f02>:70: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version. Instructions for updating: Please use `keras.layers.RNN(cell)`, which is equivalent to this API WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version. Instructions for updating: Please use `layer.add_weight` method instead. WARNING:tensorflow:From <ipython-input-19-1a5cd2de5f02>:40: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Deprecated in favor of operator or tf.math.divide. Episode 1/100 Reward=552 Time=4s No replay Episode 2/100 Reward=551 Time=52s Loss=2082.7095 Episode 3/100 Reward=551 Time=68s Loss=123.3409 Episode 4/100 Reward=551 Time=67s Loss=67.4848 Episode 5/100 Reward=551 Time=67s Loss=44.0779 Episode 6/100 Reward=552 Time=67s Loss=40.3507 Episode 7/100 Reward=552 Time=67s Loss=29.4237 Episode 8/100 Reward=552 Time=68s Loss=26.5713 Episode 9/100 Reward=552 Time=68s Loss=27.1646 Episode 10/100 Reward=552 Time=68s Loss=25.9928 Episode 11/100 Reward=552 Time=67s Loss=22.3598 Episode 12/100 Reward=552 Time=68s Loss=18.7026 Episode 13/100 Reward=552 Time=67s Loss=17.8249 Episode 14/100 Reward=551 Time=67s Loss=19.8687 Episode 15/100 Reward=552 Time=68s Loss=20.9238 Episode 16/100 Reward=551 Time=67s Loss=19.9583 Episode 17/100 Reward=551 Time=68s Loss=20.2933 Episode 18/100 Reward=551 Time=67s Loss=19.9462 Episode 19/100 Reward=551 Time=67s Loss=24.3696 Episode 20/100 Reward=551 Time=67s Loss=25.6828 Episode 21/100 Reward=551 Time=67s Loss=28.5111 Episode 22/100 Reward=551 Time=67s Loss=29.4505 Episode 23/100 Reward=551 Time=67s Loss=27.2863 Episode 24/100 Reward=551 Time=68s Loss=28.4667 Episode 25/100 Reward=551 Time=67s Loss=26.7666 Episode 26/100 Reward=551 Time=67s Loss=26.2378 Episode 27/100 Reward=551 Time=67s Loss=25.0391 Episode 28/100 Reward=551 Time=67s Loss=25.0170 Episode 29/100 Reward=551 Time=68s Loss=22.9655 Episode 30/100 Reward=551 Time=68s Loss=23.8730 Episode 31/100 Reward=551 Time=67s Loss=20.4020 Episode 32/100 Reward=551 Time=67s Loss=22.4662 Episode 33/100 Reward=551 Time=67s Loss=23.8943 Episode 34/100 Reward=551 Time=67s Loss=21.1020 Episode 35/100 Reward=551 Time=67s Loss=22.8528 Episode 36/100 Reward=551 Time=67s Loss=21.2307 Episode 37/100 Reward=551 Time=67s Loss=19.7094 Episode 38/100 Reward=551 Time=67s Loss=21.4233 Episode 39/100 Reward=551 Time=67s Loss=23.6903 Episode 40/100 Reward=551 Time=67s Loss=24.4852 Episode 41/100 Reward=551 Time=67s Loss=25.7120 Episode 42/100 Reward=551 Time=67s Loss=21.7722 Episode 43/100 Reward=551 Time=67s Loss=20.9898 Episode 44/100 Reward=551 Time=67s Loss=20.6604 Episode 45/100 Reward=551 Time=68s Loss=20.8646 Episode 46/100 Reward=551 Time=67s Loss=19.4622 Episode 47/100 Reward=551 Time=67s Loss=20.4751 Episode 48/100 Reward=551 Time=67s Loss=18.9989 Episode 49/100 Reward=551 Time=67s Loss=17.7407 Episode 50/100 Reward=551 Time=67s Loss=17.3576 Episode 51/100 Reward=551 Time=68s Loss=17.2397 Episode 52/100 Reward=552 Time=67s Loss=16.5722 Episode 53/100 Reward=552 Time=67s Loss=15.5511 Episode 54/100 Reward=552 Time=67s Loss=15.7651 Episode 55/100 Reward=552 Time=67s Loss=14.0308 Episode 56/100 Reward=552 Time=67s Loss=14.4518 Episode 57/100 Reward=551 Time=67s Loss=15.9018 Episode 58/100 Reward=552 Time=67s Loss=14.2520 Episode 59/100 Reward=551 Time=67s Loss=14.2282 Episode 60/100 Reward=551 Time=67s Loss=14.1576 Episode 61/100 Reward=551 Time=67s Loss=13.1366 Episode 62/100 Reward=551 Time=67s Loss=13.7383 Episode 63/100 Reward=551 Time=67s Loss=12.3095 Episode 64/100 Reward=551 Time=68s Loss=11.7993 Episode 65/100 Reward=551 Time=67s Loss=12.1072 Episode 66/100 Reward=551 Time=67s Loss=12.8614 Episode 67/100 Reward=552 Time=67s Loss=11.4739 Episode 68/100 Reward=552 Time=67s Loss=12.6560 Episode 69/100 Reward=552 Time=67s Loss=12.8773 Episode 70/100 Reward=552 Time=67s Loss=11.7954 Episode 71/100 Reward=552 Time=67s Loss=11.2212 Episode 72/100 Reward=552 Time=67s Loss=12.3400 Episode 73/100 Reward=552 Time=67s Loss=12.5248 Episode 74/100 Reward=552 Time=67s Loss=11.2045 Episode 75/100 Reward=552 Time=67s Loss=11.1089 Episode 76/100 Reward=552 Time=67s Loss=11.6253 Episode 77/100 Reward=552 Time=67s Loss=10.9183 Episode 78/100 Reward=552 Time=67s Loss=9.2532 Episode 79/100 Reward=552 Time=67s Loss=10.4258 Episode 80/100 Reward=552 Time=67s Loss=10.0044 Episode 81/100 Reward=552 Time=67s Loss=10.4150 Episode 82/100 Reward=552 Time=67s Loss=10.9766 Episode 83/100 Reward=551 Time=67s Loss=8.8571 Episode 84/100 Reward=551 Time=67s Loss=10.5467 Episode 85/100 Reward=551 Time=67s Loss=8.8356 Episode 86/100 Reward=551 Time=67s Loss=11.0192 Episode 87/100 Reward=551 Time=67s Loss=9.3348 Episode 88/100 Reward=551 Time=67s Loss=11.0042 Episode 89/100 Reward=551 Time=67s Loss=8.5757 Episode 90/100 Reward=551 Time=67s Loss=8.8545 Episode 91/100 Reward=551 Time=67s Loss=10.0286 Episode 92/100 Reward=551 Time=67s Loss=10.2471 Episode 93/100 Reward=551 Time=67s Loss=9.3180 Episode 94/100 Reward=551 Time=68s Loss=8.2303 Episode 95/100 Reward=551 Time=68s Loss=9.1910 Episode 96/100 Reward=551 Time=67s Loss=8.2708 Episode 97/100 Reward=551 Time=67s Loss=8.1502 Episode 98/100 Reward=551 Time=67s Loss=8.1103 Episode 99/100 Reward=551 Time=67s Loss=8.2638 Episode 100/100 Reward=551 Time=67s Loss=8.2967
dict_embeddings = {}
for i, item in enumerate(embeddings.get_embedding_vector()):
str_item = str(item)
assert(str_item not in dict_embeddings)
dict_embeddings[str_item] = i
def state_to_items(state, actor, ra_length, embeddings, dict_embeddings, target=False):
return [dict_embeddings[str(action)]
for action in actor.get_recommendation_list(ra_length, np.array(state).reshape(1, -1), embeddings, target).reshape(ra_length, embeddings.size())]
def test_actor(actor, test_df, embeddings, dict_embeddings, ra_length, history_length, target=False, nb_rounds=1):
ratings = []
unknown = 0
random_seen = []
for _ in range(nb_rounds):
for i in range(len(test_df)):
history_sample = list(test_df[i].sample(history_length)['itemId'])
recommendation = state_to_items(embeddings.embed(history_sample), actor, ra_length, embeddings, dict_embeddings, target)
for item in recommendation:
l = list(test_df[i].loc[test_df[i]['itemId'] == item]['rating'])
assert(len(l) < 2)
if len(l) == 0:
unknown += 1
else:
ratings.append(l[0])
for item in history_sample:
random_seen.append(list(test_df[i].loc[test_df[i]['itemId'] == item]['rating'])[0])
return ratings, unknown, random_seen
ratings, unknown, random_seen = test_actor(actor, dg.train, embeddings, dict_embeddings, ra_length, history_length, target=False, nb_rounds=10)
print('%0.1f%% unknown' % (100 * unknown / (len(ratings) + unknown)))
91.5% unknown
plt.figure(figsize=(12,6))
plt.subplot(1, 2, 1)
plt.hist(ratings)
plt.title('Predictions ; Mean = %.4f' % (np.mean(ratings)))
plt.subplot(1, 2, 2)
plt.hist(random_seen)
plt.title('Random ; Mean = %.4f' % (np.mean(random_seen)))
plt.show()
ratings, unknown, random_seen = test_actor(actor, dg.train, embeddings, dict_embeddings, ra_length, history_length, target=True, nb_rounds=10)
print('%0.1f%% unknown' % (100 * unknown / (len(ratings) + unknown)))
91.5% unknown
plt.figure(figsize=(12,6))
plt.subplot(1, 2, 1)
plt.hist(ratings)
plt.title('Predictions ; Mean = %.4f' % (np.mean(ratings)))
plt.subplot(1, 2, 2)
plt.hist(random_seen)
plt.title('Random ; Mean = %.4f' % (np.mean(random_seen)))
plt.show()