In this notebook we rely on IBM Qiskit [1], OpenAI Gym [2] and the stable-baselines library [3] to set up a quantum game and have reinforcement learning agents play and learn it.
We set up a very simple game, qcircuit-v0, and we compare the performance of different agents playing it.
First of all, let us set up the packages necessary for this simulation, as explained in Setup.ipynb.
Next, let us import some basic libraries.
import numpy as np
import gym
from IPython.display import display
The game we will run is provided by gym-qcircuit [4], and it is implemented in compliance with the standard OpenAI Gym interface.
The game is a simple quantum circuit building game: given a fixed number of qubits and a desired final state for these qubits, the objective is to design a quantum circuit that takes the given qubits to the desired final state.
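To make the task concrete, here is a small Qiskit sketch of the underlying idea; the target state $\left|+\right\rangle$ used below is just an illustrative choice, not necessarily the target used by the game.
# Illustration of the circuit-building task with Qiskit (hypothetical target state)
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

target = Statevector.from_label('+')            # desired final state (|0> + |1>)/sqrt(2)
qc = QuantumCircuit(1)
qc.h(0)                                         # candidate circuit: a single Hadamard gate
reached = Statevector.from_instruction(qc)
print(np.allclose(reached.data, target.data))   # True: this circuit reaches the target state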
import qcircuit
The module qcircuit offers two versions of the game, qcircuit-v0 and qcircuit-v1.
Details on the implementation of these games are available at https://github.com/FMZennaro/gym-qcircuit/blob/master/qcircuit/envs/qcircuit_env.py.
We start by loading the first scenario and running agents on it.
env = gym.make('qcircuit-v0')
The game qcircuit-v0 is fully observable, and both its state space and its action space are small.
Remember that a single qubit is described by $\alpha\left|0\right\rangle +\beta\left|1\right\rangle$, where $\alpha, \beta$ are complex numbers and $\left|0\right\rangle, \left|1\right\rangle$ are the computational basis states. The state space is thus described by four real numbers between -1 and 1, representing the real and imaginary parts of $\alpha$ and $\beta$.
An agent plays the game by interacting with a quantum circuit, adding and removing standard gates. In this version of the game only three actions are available: add an X gate, add a Hadamard gate, or remove the last inserted gate.
Again, details on the implementation of the state space and the action space are available at https://github.com/FMZennaro/gym-qcircuit/blob/master/qcircuit/envs/qcircuit_env.py.
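As a quick sanity check, we can also inspect these spaces directly on the loaded environment; the expected values in the comments below follow from the description above, so refer to the linked implementation for the authoritative definitions.
# Inspect the observation and action spaces of qcircuit-v0
print(env.observation_space)   # expected: a Box of 4 reals in [-1, 1]
print(env.action_space)        # expected: Discrete(3) -- add X, add H, remove last gate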
First, we simply run a random agent. This allows us to test out the game and see its evolution.
A random agent selects an action from the action space at random and executes it. Given the small number of available actions (including the possibility of undoing moves by removing a gate) and the simple objective, the random agent should be able to land on the right circuit within a limited number of steps.
env.reset()
display(env.render())

done = False
while not done:
    # Sample a random action and apply it to the circuit
    obs, _, done, info = env.step(env.action_space.sample())
    display(info['circuit_img'])
env.close()
We now run a PPO2 agent, a more sophisticated agent taken from the stable-baselines library.
First we import the agent.
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import PPO2
Then we train it.
# Wrap the environment in a (dummy) vectorized environment, as required by stable-baselines
env = DummyVecEnv([lambda: env])
modelPPO2 = PPO2(MlpPolicy, env, verbose=1)
modelPPO2.learn(total_timesteps=10000)
[PPO2 training log over 78 updates (9,984 timesteps): approxkl and clipfrac stay near zero, policy_entropy decreases from about 1.10 to 0.008, and value_loss drops from about 4,000 to 2,890 as the policy converges.]
<stable_baselines.ppo2.ppo2.PPO2 at 0x7fda7c594550>
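Optionally, the trained agent can be saved to disk and reloaded later through the standard stable-baselines save/load API; the file name below is just an example.
# Persist the trained PPO2 agent and reload it (optional)
modelPPO2.save('ppo2_qcircuit_v0')
modelPPO2 = PPO2.load('ppo2_qcircuit_v0', env=env)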
Last, we test it by letting it play the game.
obs = env.reset()
display(env.render())

for _ in range(1):
    # Let the trained agent choose an action and apply it to the circuit
    action, _states = modelPPO2.predict(obs)
    obs, _, done, info = env.step(action)
    display(info[0]['circuit_img'])
env.close()
As expected, the agent easily learned the optimal circuit.
For comparison, we now run an A2C agent, another agent from the stable-baselines library.
First we import the agent.
from stable_baselines import A2C
We train it.
modelA2C = A2C(MlpPolicy, env, verbose=1)
modelA2C.learn(total_timesteps=10000)
[A2C training log over 2,000 updates (10,000 timesteps): policy_entropy decreases from 1.1 to about 0.056 and value_loss drops from about 9.9e+03 to 870 as the policy converges.]
<stable_baselines.a2c.a2c.A2C at 0x7fd9682dd450>
And we test it by letting it play the game.
obs = env.reset()
display(env.render())

for _ in range(1):
    # Let the trained A2C agent choose an action and apply it to the circuit
    action, _states = modelA2C.predict(obs)
    obs, _, done, info = env.step(action)
    display(info[0]['circuit_img'])
env.close()
Finally, we compare the agents quantitatively by contrasting their average reward over 1000 episodes of the game. We rely on the evaluation module, which provides simple, standard routines to evaluate the agents.
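As a rough reference (not the actual implementation), a routine like evaluate_model could be sketched as follows, assuming a vectorized environment that auto-resets at the end of each episode and returning the mean per-episode reward.
def evaluate_model_sketch(model, env, num_steps=1000):
    # Run the model for num_steps environment steps and average the reward per episode
    obs = env.reset()
    episode_rewards, current_reward = [], 0.0
    for _ in range(num_steps):
        action, _ = model.predict(obs)
        obs, reward, done, _ = env.step(action)
        current_reward += reward[0]      # DummyVecEnv returns arrays of length 1
        if done[0]:                      # the vectorized environment auto-resets
            episode_rewards.append(current_reward)
            current_reward = 0.0
    return np.mean(episode_rewards), episode_rewards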
import evaluation
n_episodes = 1000
PPO2_perf, _ = evaluation.evaluate_model(modelPPO2, env, num_steps=n_episodes)
A2C_perf, _ = evaluation.evaluate_model(modelA2C, env, num_steps=n_episodes)
# Re-create a plain (non-vectorized) environment for the random baseline
env = gym.make('qcircuit-v0')
rand_perf, _ = evaluation.evaluate_random(env, num_steps=n_episodes)
print('Mean performance of random agent (out of {0} episodes): {1}'.format(n_episodes,rand_perf))
print('Mean performance of PPO2 agent (out of {0} episodes): {1}'.format(n_episodes,PPO2_perf))
print('Mean performance of A2C agent (out of {0} episodes): {1}'.format(n_episodes,A2C_perf))
Mean performance of random agent (out of 1000 episodes): 97.674
Mean performance of PPO2 agent (out of 1000 episodes): 99.9
Mean performance of A2C agent (out of 1000 episodes): 99.893
As expected, the reinforcement learning agents (PPO2, A2C) learned to play the game optimally. The random agent is still able to play and reach a solution, given the small state and action spaces available; its average reward, however, is clearly lower: on average, the random agent needs about two and a half more actions (or guesses) per episode than PPO2/A2C to reach the solution.
[1] IBM qiskit, https://qiskit.org/
[2] OpenAI gym, http://gym.openai.com/docs/
[3] stable-baselines, https://github.com/hill-a/stable-baselines
[4] gym-qcircuit, https://github.com/FMZennaro/gym-qcircuit