This recipe illustrates two different training methods: batch and stochastic training. In each case, we use a regression model with a single model variable.
We start by loading the necessary libraries and resetting the computational graph.
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops
ops.reset_default_graph()
We start a computational graph session.
sess = tf.Session()
The data we will create is 100 random samples from a Normal(mean = 1, sd = 0.1). The target will be an array of size 100 filled with the constant 10.0.
We also create the necessary placeholders in the graph for the data and target. Note that we use a shape of [1] for stochastic training.
x_vals = np.random.normal(1, 0.1, 100)
y_vals = np.repeat(10., 100)
x_data = tf.placeholder(shape=[1], dtype=tf.float32)
y_target = tf.placeholder(shape=[1], dtype=tf.float32)
We create the one variable in the graph, A. We then create the model operation, which is just the multiplication of the input data and A.
# Create variable (one model parameter = A)
A = tf.Variable(tf.random_normal(shape=[1]))
# Add operation to graph
my_output = tf.multiply(x_data, A)
For this, we choose the L2 loss. We can easily switch to the L1 loss by replacing tf.square() with tf.abs().
# Add L2 loss operation to graph
loss = tf.square(my_output - y_target)
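To see why the choice matters: the L2 loss penalizes large residuals quadratically, while the L1 loss penalizes them linearly. A quick NumPy-only sketch (separate from the graph above) compares the two on the same residuals:

```python
import numpy as np

residuals = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

l2 = np.square(residuals)  # 4.0, 0.25, 0.0, 0.25, 4.0 -- large errors dominate
l1 = np.abs(residuals)     # 2.0, 0.5, 0.0, 0.5, 2.0 -- all errors weighted linearly

print(l2)
print(l1)
```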
For the optimization function, we will choose the standard Gradient Descent Algorithm with a learning rate of 0.02. We also add and run a variable initialization operation.
# Create Optimizer
my_opt = tf.train.GradientDescentOptimizer(0.02)
train_step = my_opt.minimize(loss)
# Initialize variables
init = tf.global_variables_initializer()
sess.run(init)
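For this one-parameter model the gradient descent update can be written out by hand: the loss is (x*A - y)**2, so dL/dA = 2*x*(x*A - y), and each step moves A against that gradient scaled by the learning rate. A minimal sketch of the update rule (the function name here is illustrative, not part of the TensorFlow API):

```python
def sgd_step(A, x, y, lr=0.02):
    # Gradient of the L2 loss (x*A - y)**2 with respect to A
    grad = 2.0 * x * (x * A - y)
    return A - lr * grad

# One step from A = 0 with a single sample x = 1, y = 10:
A = sgd_step(0.0, x=1.0, y=10.0)
print(A)  # 0.4
```

Note that if A already satisfies x*A = y, the gradient is zero and the update leaves A unchanged.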
We run the training step for 100 iterations and print out the value of A and the loss every 5 iterations.
loss_stochastic = []
# Run Loop
for i in range(100):
    rand_index = np.random.choice(100)
    rand_x = [x_vals[rand_index]]
    rand_y = [y_vals[rand_index]]
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})
    if (i+1)%5==0:
        print('Step #' + str(i+1) + ' A = ' + str(sess.run(A)))
        temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y})
        print('Loss = ' + str(temp_loss))
        loss_stochastic.append(temp_loss)
Step #5 A = [ 2.15584159] Loss = [ 59.26628113]
Step #10 A = [ 3.6463747] Loss = [ 38.98156357]
Step #15 A = [ 4.82094002] Loss = [ 22.86421394]
Step #20 A = [ 5.77139997] Loss = [ 19.58551025]
Step #25 A = [ 6.52262354] Loss = [ 9.57485962]
Step #30 A = [ 7.12625837] Loss = [ 6.36455345]
Step #35 A = [ 7.63606024] Loss = [ 2.55103087]
Step #40 A = [ 7.93105412] Loss = [ 2.34990478]
Step #45 A = [ 8.30565166] Loss = [ 2.08385348]
Step #50 A = [ 8.55292988] Loss = [ 6.9521904]
Step #55 A = [ 8.81163883] Loss = [ 1.32351923]
Step #60 A = [ 8.95606041] Loss = [ 6.76757669]
Step #65 A = [ 9.12868977] Loss = [ 0.61686873]
Step #70 A = [ 9.18059349] Loss = [ 0.01681929]
Step #75 A = [ 9.22375107] Loss = [ 0.00448726]
Step #80 A = [ 9.28870773] Loss = [ 0.08788975]
Step #85 A = [ 9.44317436] Loss = [ 0.01499911]
Step #90 A = [ 9.61464977] Loss = [ 0.00621458]
Step #95 A = [ 9.65019512] Loss = [ 0.32731441]
Step #100 A = [ 9.64083195] Loss = [ 0.31338796]
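The same stochastic procedure can be replicated in plain NumPy, which makes a useful sanity check when TensorFlow is not installed; the loop below hand-codes the gradient step the graph performs (the seed is arbitrary):

```python
import numpy as np

np.random.seed(0)
x_vals = np.random.normal(1, 0.1, 100)
y_vals = np.repeat(10., 100)

A = np.random.normal()   # one model parameter
lr = 0.02
for i in range(100):
    idx = np.random.choice(100)
    x, y = x_vals[idx], y_vals[idx]
    grad = 2.0 * x * (x * A - y)  # d/dA of the L2 loss (x*A - y)**2
    A -= lr * grad

print(A)  # converges toward 10
```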
# Batch Training:
# Re-initialize graph
ops.reset_default_graph()
sess = tf.Session()
For batch training, we need to declare our batch size. The larger the batch size, the smoother the convergence will be toward the optimal value. But if the batch size is too large, the optimization algorithm may get stuck in a local minimum, whereas a noisier stochastic update can jump out of it.
Here, we may change the batch size from 1 to 100 to see the effect of the batch size on the convergence plots at the end.
# Declare batch size
batch_size = 25
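The smoothing effect of the batch size can be checked directly: the spread of the batch-averaged gradient shrinks roughly as 1/sqrt(batch_size). A standalone NumPy sketch (not part of the recipe itself) that measures the spread of the gradient estimate at a fixed parameter value:

```python
import numpy as np

np.random.seed(0)
x_vals = np.random.normal(1, 0.1, 100)
y_vals = np.repeat(10., 100)
A = 5.0  # some fixed parameter value, partway through training

def batch_grad(size):
    # Batch-averaged gradient of the mean L2 loss at A
    idx = np.random.choice(100, size=size)
    x, y = x_vals[idx], y_vals[idx]
    return np.mean(2.0 * x * (x * A - y))

for size in (1, 25, 100):
    grads = [batch_grad(size) for _ in range(1000)]
    print(size, np.std(grads))  # spread shrinks as size grows
```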
The data we will create is 100 random samples from a Normal(mean = 1, sd = 0.1). The target will be an array of size 100 filled with the constant 10.0.
We also create the necessary placeholders in the graph for the data and target.
Note that here our placeholders have shape [None, 1], where the batch size will take the place of the None dimension.
# Create data
x_vals = np.random.normal(1, 0.1, 100)
y_vals = np.repeat(10., 100)
x_data = tf.placeholder(shape=[None, 1], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
We create the one variable in the graph, A. We then create the model operation, which is just the multiplication of the input data and A.
# Create variable (one model parameter = A)
A = tf.Variable(tf.random_normal(shape=[1,1]))
# Add operation to graph
my_output = tf.matmul(x_data, A)
For this, we choose the L2 loss. We can easily switch to the L1 loss by replacing tf.square() with tf.abs().
# Add L2 loss operation to graph
loss = tf.reduce_mean(tf.square(my_output - y_target))
For the optimization function, we will choose the standard Gradient Descent Algorithm with a learning rate of 0.02. We also add and run a variable initialization operation.
# Initialize variables
init = tf.global_variables_initializer()
sess.run(init)
# Create Optimizer
my_opt = tf.train.GradientDescentOptimizer(0.02)
train_step = my_opt.minimize(loss)
We run the training step for 100 iterations and print out the value of A and the loss every 5 iterations.
Note that here we select a batch of data instead of just one data point.
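Before the loop, it is worth checking the shaping trick used there: wrapping a flat sample in a list and calling np.transpose() turns it into a column vector, matching the [None, 1] placeholders. A quick check with a tiny batch of 3:

```python
import numpy as np

x_vals = np.random.normal(1, 0.1, 100)
rand_index = np.random.choice(100, size=3)

flat = x_vals[rand_index]       # shape (3,)
rand_x = np.transpose([flat])   # shape (3, 1): one column, one row per sample
print(rand_x.shape)  # (3, 1)
```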
loss_batch = []
# Run Loop
for i in range(100):
    rand_index = np.random.choice(100, size=batch_size)
    rand_x = np.transpose([x_vals[rand_index]])
    rand_y = np.transpose([y_vals[rand_index]])
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})
    if (i+1)%5==0:
        print('Step #' + str(i+1) + ' A = ' + str(sess.run(A)))
        temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y})
        print('Loss = ' + str(temp_loss))
        loss_batch.append(temp_loss)
Step #5 A = [[ 1.87009048]] Loss = 65.4902
Step #10 A = [[ 3.36440945]] Loss = 45.8059
Step #15 A = [[ 4.58703709]] Loss = 29.0102
Step #20 A = [[ 5.57683086]] Loss = 19.0908
Step #25 A = [[ 6.38111591]] Loss = 13.8178
Step #30 A = [[ 7.03755045]] Loss = 10.0696
Step #35 A = [[ 7.5764699]] Loss = 6.38239
Step #40 A = [[ 8.01234055]] Loss = 5.45151
Step #45 A = [[ 8.35609531]] Loss = 2.84672
Step #50 A = [[ 8.64307308]] Loss = 2.26105
Step #55 A = [[ 8.84179211]] Loss = 2.03782
Step #60 A = [[ 9.0474844]] Loss = 1.74324
Step #65 A = [[ 9.21524715]] Loss = 1.54201
Step #70 A = [[ 9.32061958]] Loss = 1.09765
Step #75 A = [[ 9.4463768]] Loss = 1.29265
Step #80 A = [[ 9.53134918]] Loss = 0.934568
Step #85 A = [[ 9.59794903]] Loss = 0.810933
Step #90 A = [[ 9.66680145]] Loss = 1.40375
Step #95 A = [[ 9.71764565]] Loss = 1.36738
Step #100 A = [[ 9.73819923]] Loss = 0.717119
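The batch procedure, too, can be replicated in plain NumPy as a sanity check; compared with a stochastic update, the only change is that the gradient is averaged over a batch of 25 samples before the step is taken (the seed is arbitrary):

```python
import numpy as np

np.random.seed(0)
x_vals = np.random.normal(1, 0.1, 100)
y_vals = np.repeat(10., 100)

A = np.random.normal()   # one model parameter
lr, batch_size = 0.02, 25
for i in range(100):
    idx = np.random.choice(100, size=batch_size)
    x, y = x_vals[idx], y_vals[idx]
    grad = np.mean(2.0 * x * (x * A - y))  # gradient of the mean L2 loss
    A -= lr * grad

print(A)  # converges toward 10, more smoothly than the stochastic version
```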
Here is the matplotlib code to plot the loss for each.
plt.plot(range(0, 100, 5), loss_stochastic, 'b-', label='Stochastic Loss')
plt.plot(range(0, 100, 5), loss_batch, 'r--', label='Batch Loss, size=25')
plt.legend(loc='upper right', prop={'size': 11})
plt.show()