This notebook is based on the paper "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" (Silver et al., 2017), which introduced AlphaZero,
with additional insight from David Foster's blog post "How to build your own AlphaZero AI using Python and Keras".
This code uses the new conx layer that sits on top of Keras. Conx is designed to be simpler and more intuitive than Keras, with integrated visualizations.
Currently this code requires the TensorFlow backend, as it includes a loss function written directly in TensorFlow.
First, let's look at a specific game. We could use many, but for this demonstration we'll pick ConnectFour. There is a good collection of games and a game engine in aima3, a code base built around Artificial Intelligence: A Modern Approach.
If you would like to install aima3, you can use something like this in a cell:
! pip install aima3 -U --user
aima3 has other games you can play besides ConnectFour, including TicTacToe, and it wraps up many AI algorithms for playing them. You can see more details about the game engine and ConnectFour here:
and other resources in that repository.
We import some of these that will be useful in our AlphaZero exploration:
from aima3.games import (ConnectFour, RandomPlayer,
MCTSPlayer, QueryPlayer, Player,
MiniMaxPlayer, AlphaBetaPlayer,
AlphaBetaCutoffPlayer)
import numpy as np
Let's make a game:
game = ConnectFour()
and play a game between two random players:
game.play_game(RandomPlayer("Random-1"), RandomPlayer("Random-2"))
Random-2 is thinking...
Random-2 makes action (1, 1): [42-cell board printout omitted]
Random-1 is thinking...
Random-1 makes action (3, 1): [board printout omitted]
[... 29 more moves, board printouts omitted ...]
Random-1 makes action (2, 4): [board printout omitted]
***** Random-1 wins!
['Random-1']
We can also play a match (a series of games), or even a tournament among several players.
p1 = RandomPlayer("Random-1")
p2 = MiniMaxPlayer("MiniMax-1")
p3 = AlphaBetaCutoffPlayer("ABCutoff-1")
game.play_matches(10, p1, p2)
game.play_tournament(1, p1, p2, p3)
Can you beat RandomPlayer? Hope so!
Can you beat MiniMax? No! But it takes too long to be practical.
Humans enter their commands by (column, row) where column starts at 1 from left, and row starts at 1 from bottom.
# game.play_game(AlphaBetaCutoffPlayer("AlphaBetaCutoff"), QueryPlayer("Your Name Here"))
Next, we are going to build the same kind of network described in the AlphaZero paper.
Make sure to set your Keras backend to TensorFlow for now, as we have a function that is written at that level.
import conx as cx
from aima3.games import Game
from keras import regularizers
Using TensorFlow backend.
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
conx, version 3.5.14
## NEED TO REWRITE THIS FUNCTION IN KERAS:
import tensorflow as tf

def softmax_cross_entropy_with_logits(y_true, y_pred):
    # y_pred: raw logits from the policy head
    # y_true: the target distribution pi (zero for illegal moves)
    p = y_pred
    pi = y_true
    # mask out illegal moves by forcing their logits to a large negative value
    zero = tf.zeros(shape=tf.shape(pi), dtype=tf.float32)
    where = tf.equal(pi, zero)
    negatives = tf.fill(tf.shape(pi), -100.0)
    p = tf.where(where, negatives, p)
    loss = tf.nn.softmax_cross_entropy_with_logits(labels=pi, logits=p)
    return loss
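As the comment above notes, this could be rewritten at the Keras level. One backend-agnostic sketch (our addition, not tested in this notebook) masks the illegal moves arithmetically instead of with tf.where, and uses Keras's built-in cross-entropy:

import keras.backend as K

def softmax_cross_entropy_with_logits_keras(y_true, y_pred):
    # wherever the target probability pi is zero, the move is illegal;
    # push its logit to a large negative value before the softmax
    mask = K.cast(K.equal(y_true, 0.0), K.floatx())
    logits = y_pred * (1.0 - mask) + mask * (-100.0)
    return K.categorical_crossentropy(y_true, logits, from_logits=True)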
The state of the board is the most important piece of information. How should we represent it?
We decided to represent the state of the board as two 6x7 matrices: one for the current player's pieces, and one for the opponent's pieces.
We also need to represent actions.
We decided to represent them as 42 outputs, one per board position.
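For a board with v rows and h columns, the move-to-index rule we build below with make_mappings is equivalent to a small closed-form formula; here is an illustrative sketch (our addition):

def move_index(x, y, v=6, h=7):
    # rows are laid out top-first, so row y=v maps to indices 0..h-1
    return (v - y) * h + (x - 1)

assert move_index(1, 6) == 0    # top-left corner is position 0
assert move_index(2, 1) == 36   # matches move2pos[(2, 1)] computed later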
The network architecture in AlphaZero is quite large and has repeating blocks of layers. To help construct the network, we define some functions:
def add_conv_block(net, input_layer):
    cname = net.add(cx.Conv2DLayer("conv2d-%d",
                                   filters=75,
                                   kernel_size=(4,4),
                                   padding='same',
                                   use_bias=False,
                                   activation='linear',
                                   kernel_regularizer=regularizers.l2(0.0001)))
    bname = net.add(cx.BatchNormalizationLayer("batch-norm-%d", axis=1))
    lname = net.add(cx.LeakyReLULayer("leaky-relu-%d"))
    net.connect(input_layer, cname)
    net.connect(cname, bname)
    net.connect(bname, lname)
    return lname
def add_residual_block(net, input_layer):
    prev_layer = add_conv_block(net, input_layer)
    cname = net.add(cx.Conv2DLayer("conv2d-%d",
                                   filters=75,
                                   kernel_size=(4,4),
                                   padding='same',
                                   use_bias=False,
                                   activation='linear',
                                   kernel_regularizer=regularizers.l2(0.0001)))
    bname = net.add(cx.BatchNormalizationLayer("batch-norm-%d", axis=1))
    aname = net.add(cx.AddLayer("add-%d"))
    lname = net.add(cx.LeakyReLULayer("leaky-relu-%d"))
    net.connect(prev_layer, cname)
    net.connect(cname, bname)
    net.connect(input_layer, aname)
    net.connect(bname, aname)
    net.connect(aname, lname)
    return lname
def add_value_block(net, input_layer):
    l1 = net.add(cx.Conv2DLayer("conv2d-%d",
                                filters=1,
                                kernel_size=(1,1),
                                padding='same',
                                use_bias=False,
                                activation='linear',
                                kernel_regularizer=regularizers.l2(0.0001)))
    l2 = net.add(cx.BatchNormalizationLayer("batch-norm-%d", axis=1))
    l3 = net.add(cx.LeakyReLULayer("leaky-relu-%d"))
    l4 = net.add(cx.FlattenLayer("flatten-%d"))
    l5 = net.add(cx.Layer("dense-%d",
                          20,
                          use_bias=False,
                          activation='linear',
                          kernel_regularizer=regularizers.l2(0.0001)))
    l6 = net.add(cx.LeakyReLULayer("leaky-relu-%d"))
    l7 = net.add(cx.Layer('value_head',
                          1,
                          use_bias=False,
                          activation='tanh',
                          kernel_regularizer=regularizers.l2(0.0001)))
    net.connect(input_layer, l1)
    net.connect(l1, l2)
    net.connect(l2, l3)
    net.connect(l3, l4)
    net.connect(l4, l5)
    net.connect(l5, l6)
    net.connect(l6, l7)
    return l7
def add_policy_block(net, input_layer):
    l1 = net.add(cx.Conv2DLayer("conv2d-%d",
                                filters=2,
                                kernel_size=(1,1),
                                padding='same',
                                use_bias=False,
                                activation='linear',
                                kernel_regularizer=regularizers.l2(0.0001)))
    l2 = net.add(cx.BatchNormalizationLayer("batch-norm-%d", axis=1))
    l3 = net.add(cx.LeakyReLULayer("leaky-relu-%d"))
    l4 = net.add(cx.FlattenLayer("flatten-%d"))
    l5 = net.add(cx.Layer('policy_head',
                          42,
                          use_bias=False,
                          activation='linear',
                          kernel_regularizer=regularizers.l2(0.0001)))
    net.connect(input_layer, l1)
    net.connect(l1, l2)
    net.connect(l2, l3)
    net.connect(l3, l4)
    net.connect(l4, l5)
    return l5
def make_network(game, residuals=5):
    net = cx.Network("AlphaZero Network")
    net.add(cx.Layer("main_input", (game.v, game.h, 2)))
    out_layer = add_conv_block(net, "main_input")
    for i in range(residuals):
        out_layer = add_residual_block(net, out_layer)
    add_policy_block(net, out_layer)
    add_value_block(net, out_layer)
    net.compile(loss={'value_head': 'mean_squared_error',
                      'policy_head': softmax_cross_entropy_with_logits},
                optimizer=cx.SGD(lr=0.1, momentum=0.9),
                loss_weights={'value_head': 0.5,
                              'policy_head': 0.5})
    for layer in net.layers:
        if layer.kind() == "hidden":
            layer.visible = False
    return net
game = ConnectFour()
net = make_network(game)
net.model.summary()
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
main_input (InputLayer)         (None, 6, 7, 2)      0
__________________________________________________________________________________________________
conv2d-1 (Conv2D)               (None, 6, 7, 75)     2400        main_input[0][0]
__________________________________________________________________________________________________
batch-norm-1 (BatchNormalizatio (None, 6, 7, 75)     24          conv2d-1[0][0]
__________________________________________________________________________________________________
leaky-relu-1 (LeakyReLU)        (None, 6, 7, 75)     0           batch-norm-1[0][0]
__________________________________________________________________________________________________
[... five residual blocks (conv2d-2 through leaky-relu-11) and the policy/value block
     internals omitted for brevity ...]
__________________________________________________________________________________________________
policy_head (Dense)             (None, 42)           3528        flatten-1[0][0]
__________________________________________________________________________________________________
value_head (Dense)              (None, 1)            20          leaky-relu-14[0][0]
==================================================================================================
Total params: 907,325
Trainable params: 907,169
Non-trainable params: 156
__________________________________________________________________________________________________
len(net.layers)
51
net.render()
First, we need a mapping from game (x,y) moves to a position in a list of actions and probabilities.
def make_mappings(game):
    """
    Get a mapping from game's (x,y) to array position.
    """
    move2pos = {}
    pos2move = []
    position = 0
    for y in range(game.v, 0, -1):
        for x in range(1, game.h + 1):
            move2pos[(x,y)] = position
            pos2move.append((x,y))
            position += 1
    return move2pos, pos2move
We use the ConnectFour game, defined above:
move2pos, pos2move = make_mappings(game)
move2pos[(2,1)]
36
pos2move[35]
(1, 1)
We need a method for converting a state's board into an array:
def state2array(game, state):
    array = []
    to_move = game.to_move(state)
    for y in range(game.v, 0, -1):
        for x in range(1, game.h + 1):
            item = state.board.get((x, y), 0)
            if item != 0:
                item = 1 if item == to_move else -1
            array.append(item)
    return array
cx.shape(state2array(game, game.initial))
(42,)
So, state2array returns a list of 42 numbers, where 1 marks a piece belonging to the player about to move, -1 a piece belonging to the opponent, and 0 an empty cell.
Note that "mine" and "my opponent's" swap back and forth depending on perspective (i.e., whose turn it is, as determined by game.to_move(state)).
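For example (an illustrative check we added): after the first move it is the opponent's turn, so the mover's own piece shows up as -1, at index 35 (the bottom-left cell):

state = game.result(game.initial, (1, 1))
array = state2array(game, state)
assert array[35] == -1   # the mover's piece, seen from the next player's perspective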
def state2inputs(game, state):
    board = np.array(state2array(game, state)) # 1 is my pieces, -1 other
    currentplayer_position = np.zeros(len(board), dtype=np.int)
    currentplayer_position[board==1] = 1
    other_position = np.zeros(len(board), dtype=np.int)
    other_position[board==-1] = 1
    position = np.array(list(zip(currentplayer_position, other_position)))
    inputs = position.reshape((game.v, game.h, 2))
    return inputs.tolist()
We need to convert the state's board into a form for the neural network:
state2inputs(game, game.initial)
[[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]], [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]], [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]], [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]], [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]], [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]]
We can check to see if this is correct by propagating the activations to the first layer.
Initial board state has no pieces on the board:
state = game.initial
net.propagate_to_features("main_input", state2inputs(game, state))
Now we make a move to (1,1). Note that after the move it is the other player's turn, so the first move shows up on the opponent's plane (the right side, feature #1):
state = game.result(game.initial, (1,1))
net.propagate_to_features("main_input", state2inputs(game, state))
Now the second player moves to (3,1). We are back to the first player's perspective, so the first move reappears on the left-hand (current player) plane, and the new move shows up on the right-hand (opponent) plane.
state = game.result(state, (3,1))
net.propagate_to_features("main_input", state2inputs(game, state))
Finally, we are ready to connect the game to the network. We define a function get_predictions that takes a game and a state, propagates the state through the network, and returns a (value, probabilities, allowedActions) tuple. The probabilities are the pi list from the AlphaZero paper.
def get_predictions(net, game, state):
    """
    Given a state, give output of network on preferred
    actions. state.allowedActions removes impossible
    actions.
    Returns (value, probabilities, allowedActions)
    """
    board = np.array(state2array(game, state)) # 1 is my pieces, -1 other
    inputs = state2inputs(game, state)
    preds = net.propagate(inputs, visualize=True)
    value = preds[1][0]
    logits = np.array(preds[0])
    allowedActions = np.array([move2pos[act] for act in game.actions(state)])
    # mask out illegal actions by pushing their logits far down:
    mask = np.ones(len(board), dtype=bool)
    mask[allowedActions] = False
    logits[mask] = -100
    # softmax over the masked logits:
    odds = np.exp(logits)
    probs = odds / np.sum(odds)
    return (value, probs.tolist(), allowedActions.tolist())
value, probs, acts = get_predictions(net, game, state)
net.snapshot(state2inputs(game, state))
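A few quick sanity checks on these (untrained) predictions; these assertions are our addition:

assert abs(sum(probs) - 1.0) < 1e-6   # the masked softmax is still a distribution
assert -1.0 <= value <= 1.0           # the value head is tanh-bounded
assert len(acts) == 7                 # every column is still open this early in the game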
Finally, we turn the predictions into a move, and we can play a game with the network.
class NNPlayer(Player):
    def set_game(self, game):
        """
        Build the network, and get a mapping from game's (x,y)
        to array position.
        """
        self.net = make_network(game)
        self.game = game
        self.move2pos = {}
        self.pos2move = []
        position = 0
        for y in range(self.game.v, 0, -1):
            for x in range(1, self.game.h + 1):
                self.move2pos[(x,y)] = position
                self.pos2move.append((x,y))
                position += 1

    def get_predictions(self, state):
        """
        Given a state, give output of network on preferred
        actions. state.allowedActions removes impossible
        actions.
        Returns (value, probabilities, allowedActions)
        """
        board = np.array(self.state2array(state)) # 1 is my pieces, -1 other
        inputs = self.state2inputs(state)
        preds = self.net.propagate(inputs)
        value = preds[1][0]
        logits = np.array(preds[0])
        allowedActions = np.array([self.move2pos[act] for act in self.game.actions(state)])
        # mask out illegal actions, then softmax:
        mask = np.ones(len(board), dtype=bool)
        mask[allowedActions] = False
        logits[mask] = -100
        odds = np.exp(logits)
        probs = odds / np.sum(odds)
        return (value, probs.tolist(), allowedActions.tolist())

    def get_action(self, state, turn):
        value, probabilities, moves = self.get_predictions(state)
        probs = np.array(probabilities)[moves]
        pos = cx.choice(moves, probs)
        return self.pos2move[pos]

    def state2inputs(self, state):
        board = np.array(self.state2array(state)) # 1 is my pieces, -1 other
        currentplayer_position = np.zeros(len(board), dtype=np.int)
        currentplayer_position[board==1] = 1
        other_position = np.zeros(len(board), dtype=np.int)
        other_position[board==-1] = 1
        position = np.array(list(zip(currentplayer_position, other_position)))
        inputs = position.reshape((self.game.v, self.game.h, 2))
        return inputs

    def state2array(self, state):
        array = []
        to_move = self.game.to_move(state)
        for y in range(self.game.v, 0, -1):
            for x in range(1, self.game.h + 1):
                item = state.board.get((x, y), 0)
                if item != 0:
                    item = 1 if item == to_move else -1
                array.append(item)
        return array
p1 = RandomPlayer("Random")
p2 = NNPlayer("NNPlayer")
p2.set_game(game)
p2.get_action(state, 2)
(2, 1)
game.play_game(p1, p2)
NNPlayer is thinking...
NNPlayer makes action (2, 1): [board printout omitted]
Random is thinking...
Random makes action (4, 1): [board printout omitted]
[... 24 more moves, board printouts omitted ...]
NNPlayer makes action (1, 4): [board printout omitted]
***** NNPlayer wins!
['NNPlayer']
Now we are ready to train the network. The training is a clever combination of Monte Carlo Tree Search (MCTS) and self-play.
There is a Monte Carlo Tree Search player in aima3 that we will use; we set its policy to come from the neural network's predictions.
class AlphaZeroMCTSPlayer(MCTSPlayer):
    """
    A Monte Carlo Tree Search player with a policy function from
    the neural network. The network will be set later, as self.nnplayer.
    """
    def policy(self, game, state):
        # these moves are positions:
        value, probs_all, moves = self.nnplayer.get_predictions(state)
        if len(moves) == 0:
            result = [], value
        else:
            probs = np.array(probs_all)[moves]
            moves = [self.nnplayer.pos2move[pos] for pos in moves]
            # we need to return probs and moves for the game
            result = [(act, prob) for (act, prob) in list(zip(moves, probs))], value
        return result
The main AlphaZeroPlayer needs to be able to play in one of two modes: self-play, where moves come from MCTS guided by the network and each (state, probabilities) pair is saved as training data; or regular play, where moves come directly from the network.
class AlphaZeroPlayer(NNPlayer):
    ## TODO: load weights here if continuing a previous run
    def __init__(self, name, n_playout=40, *args, **kwargs):
        super().__init__(name, *args, **kwargs)
        self.mcts_players = [AlphaZeroMCTSPlayer("MCTS-1", n_playout=n_playout),
                             AlphaZeroMCTSPlayer("MCTS-2", n_playout=n_playout)]

    def set_game(self, game):
        super().set_game(game)
        self.mcts_players[0].set_game(game)
        self.mcts_players[1].set_game(game)
        self.mcts_players[0].nnplayer = self
        self.mcts_players[1].nnplayer = self
        self.data = [[], []]
        self.cache = {}

    def get_action(self, state, turn, self_play):
        if self_play:
            ## Only way to determine which player is which: if we have
            ## seen this turn number before, we are the second player.
            if turn in self.cache:
                player_num = 1
            else:
                player_num = 0
            self.cache[turn] = True
            ## now use the policy to get some probs:
            move, pi = self.mcts_players[player_num].get_action(state, round(turn), return_prob=True)
            ## save the state and probs:
            self.data[player_num].append((self.state2inputs(state), self.move_probs2all_probs(pi)))
            return move
        else:
            # play the network directly; we're in the playoffs!
            return super().get_action(state, round(turn))

    def move_probs2all_probs(self, move_probs):
        all_probs = np.zeros(len(self.state2array(self.game.initial)))
        for move in move_probs:
            all_probs[self.move2pos[move]] = move_probs[move]
        return all_probs.tolist()
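To see what move_probs2all_probs produces, here is the same conversion done by hand with the module-level mapping from earlier (a toy distribution, just for illustration):

pi = {(1, 1): 0.7, (2, 1): 0.3}   # a toy MCTS distribution over two moves
all_probs = [0.0] * 42
for move in pi:
    all_probs[move2pos[move]] = pi[move]
assert all_probs[35] == 0.7 and all_probs[36] == 0.3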
We now set up the game to play in one of the two modes.
One complication when a player plays itself: it cannot tell whether it is moving first or second, and we want to keep the two sides' data separate. To keep track, we cache the turn number; if we see the same turn number again, we know it is the second player's move.
class AlphaZeroGame(ConnectFour):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.memory = []

    def play_game(self, *players, flip_coin=False, verbose=1, **kwargs):
        results = super().play_game(*players, flip_coin=flip_coin, verbose=verbose, **kwargs)
        if "self_play" in kwargs and kwargs["self_play"]:
            ## Do not allow flipping coins when self-playing:
            ## assumes that player1 == player2 when self-playing
            assert flip_coin is False, "no coin_flip when self-playing"
            ## value is in terms of player 0
            value = self.final_utility
            for state, probs in players[0].data[0]:
                self.memory.append([state, [probs, [value]]])
            # also data from opponent, so flip value:
            value = -value
            for state, probs in players[1].data[1]:
                self.memory.append([state, [probs, [value]]])
        return results
game = AlphaZeroGame()
best_player = AlphaZeroPlayer("best_player")
current_player = AlphaZeroPlayer("current_player")
Some basic tests to make sure things are going in the right place:
current_player.set_game(game)
assert current_player.data == [[], []]
print(current_player.get_action(game.initial, 1, self_play=False))
assert current_player.data == [[], []]
print(current_player.get_action(game.initial, 1, self_play=True))
assert current_player.data[0] != []
print(current_player.get_action(game.initial, 1, self_play=True))
assert current_player.data[1] != []
(5, 1)
(4, 1)
(3, 1)
A small sample run, just for testing:
game.play_tournament(1, best_player, best_player, verbose=1, mode="ordered", self_play=True)
Tournament to begin with 2 matches...
best_player is thinking...
best_player makes action (3, 1): [board printout omitted]
[... 10 more moves, board printouts omitted ...]
best_player makes action (4, 4): [board printout omitted]
***** best_player wins!
best_player is thinking...
best_player makes action (7, 1): [board printout omitted]
[... 12 more moves, board printouts omitted ...]
best_player makes action (2, 4): [board printout omitted]
***** best_player wins!
{'DRAW': 0, 'best_player': 2}
Did we collect some history?
len(game.memory)
26
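We can peek at one collected pattern to confirm the layout that play_game stored, i.e. an input tensor plus the two head targets (an illustrative check we added):

inputs, targets = game.memory[0]
probs_target, value_target = targets
print(np.array(inputs).shape)   # (6, 7, 2): the two board planes
print(len(probs_target))        # 42: the policy target
print(value_target)             # e.g. [1], [-1], or [0]: the value target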
Ok, we are ready to learn!
config = dict(
    MINIMUM_MEMORY_SIZE_BEFORE_TRAINING = 1000, # min size of memory before training
    TRAINING_EPOCHS_PER_CYCLE = 500,  # training on current network
    CYCLES = 1,                       # number of cycles to run
    SELF_PLAY_MATCHES = 1,            # matches to test yo' self per self-play round
    TOURNAMENT_MATCHES = 2,           # plays each player as first mover per match, so * 2
    BEST_SWAP_PERCENT = 1.0,          # you must be this much better than best
)
def alphazero_train(config):
    ## Uses global game, best_player, and current_player
    for cycle in range(config["CYCLES"]):
        print("Epoch #%s..." % cycle)
        # self-play, collect data:
        print("Self-play matches begin...")
        while len(game.memory) < config["MINIMUM_MEMORY_SIZE_BEFORE_TRAINING"]:
            results = game.play_tournament(config["SELF_PLAY_MATCHES"],
                                           best_player, best_player,
                                           mode="ordered", self_play=True)
            print("Memory size is %s" % len(game.memory))
        print("Enough to train!")
        current_player.net.dataset.clear()
        current_player.net.dataset.load(game.memory)
        print("Training on ", len(current_player.net.dataset.inputs), "patterns...")
        current_player.net.train(config["TRAINING_EPOCHS_PER_CYCLE"],
                                 batch_size=len(game.memory),
                                 plot=True)
        ## save dataset every once in a while (a sketch follows below)
        ## now see which net is better:
        print("Playing best vs current to see who wins the title...")
        results = game.play_tournament(config["TOURNAMENT_MATCHES"],
                                       best_player, current_player,
                                       mode="one-each", self_play=False)
        if results["current_player"] > results["best_player"] * config["BEST_SWAP_PERCENT"]:
            print("current won! swapping weights")
            # give the better weights to the best_player
            best_player.net.set_weights(
                current_player.net.get_weights())
            game.memory = []
        else:
            print("best won!")
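The "save dataset" comment above is a placeholder; one minimal way to checkpoint progress between cycles (our sketch, with hypothetical filenames, using the underlying Keras model's standard save_weights/load_weights API):

import os

def checkpoint(player, filename):
    # save just the weights of the underlying Keras model
    player.net.model.save_weights(filename)

def restore(player, filename):
    # reload them later; the architecture must match
    if os.path.exists(filename):
        player.net.model.load_weights(filename)

# e.g., after each cycle:
# checkpoint(best_player, "best_player.h5")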
alphazero_train(config)
Epoch #0...
Self-play matches begin...
Memory size is 171
Memory size is 208
[... 19 more self-play rounds omitted ...]
Memory size is 1028
Enough to train!
Training on 1028 patterns...
Training...
       |  Training | policy    | value
Epochs |     Error | head acc  | head acc
------ | --------- | --------- | ---------
#  801 |   0.39601 |   0.00000 |   0.92708
#  802 |   1.03875 |   0.00000 |   0.27140
#  803 |   0.87153 |   0.00000 |   0.15953
[... epochs 804-1019 omitted ...]
# 1020 |   0.51186 |   0.00000 |   0.74514
len(game.memory)
133
Let's train best_player some more:
best_player.net.train(1000, report_rate=5, plot=True)
Interrupted! Cleaning up...
========================================================================
       |  Training | policy    | value
Epochs |     Error | head acc  | head acc
------ | --------- | --------- | ---------
#  801 |   0.39601 |   0.00000 |   0.92708
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-48-5c805804ac93> in <module>()
      1 current_player.net.dataset.clear()
      2 current_player.net.dataset.load(game.memory)
----> 3 current_player.net.train(1000, report_rate=5, plot=True)

~/.local/lib/python3.6/site-packages/conx/network.py in train(self, epochs, accuracy, error, batch_size, report_rate, verbose, kverbose, shuffle, tolerance, class_weight, sample_weight, use_validation_to_stop, plot, record, callbacks, save)
   1262                     print("Saved!")
   1263         if interrupted:
-> 1264             raise KeyboardInterrupt
   1265         if verbose == 0:
   1266             return (self.epoch_count, result)

KeyboardInterrupt:
best_player.net["policy_head"].vshape = (6,7)
best_player.net.config["show_targets"] = True
best_player.net.dashboard()
Now, you can play the best player to see how it does:
p1 = QueryPlayer("Your Name")
p2 = NNPlayer("Trained AlphaZero")
p2.net = best_player.net
connect4 = ConnectFour()
connect4.play_game(p1, p2)