Objectives: This notebook briefly explains how to use grid2op with commonly used RL frameworks. It also explains the main methods / classes of the grid2op.gym_compat module that ease grid2op integration with these frameworks.
This explains the ideas and shows a "self contained" somewhat minimal example of use of some RL frameworks with grid2op. For a more complete, easier, more concise, etc. integration, please use the "l2rpn_baselines" package.
The structure is always very similar:
In this notebook, we will demonstrate its usage with 3 different frameworks. The code provided here is given as an example and we do not assume anything about its performance or fitness for use. More detailed examples will be provided in the l2rpn-baselines repository in due time (work in progress at the time of writing this notebook). The 3 frameworks we will demonstrate are rllib (ray), stable baselines 3 and tf agents.
Other RL frameworks are not covered here. If you already use them, let us know!
Note also that it is still possible to use past code from the l2rpn-baselines repository: https://github.com/rte-france/l2rpn-baselines . This repository contains code snippets that can be reused to build really nice agents for the l2rpn competitions. You can try it out :-)
Execute the cell below by removing the `#` characters if you use google colab! The cell will look like:
import sys
!$sys.executable -m pip install grid2op[optional] # for use with google colab (grid2Op is not installed by default)
!$sys.executable -m pip install tensorflow pytorch stable-baselines3 'ray[rllib]' tf_agents
It might take a while.
import sys
# !$sys.executable -m pip install grid2op[optional] # for use with google colab (grid2Op is not installed by default)
# !$sys.executable -m pip install stable-baselines3 'ray[rllib]' tf_agents
# because this notebook is part of some tests, we train the agent for only a small number of steps
nb_step_train = 0
For the organisation of this notebook, we decided to first detail the features closer to grid2op and then move on to "higher level" features that are closer to the "standard" gym representation (eg Box and Discrete spaces).
Note that the closer you are to grid2op, the more grid2op features you can use. For example, in a gym environment, it is not possible to use the "simulate" function at all (remember, this function allows you to use a simulator whose behaviour is close to that of the environment). Also, grid2op observations and actions come with a lot of different features (capacity to add them, to retrieve the graph of the grid etc.) which cannot be used directly in gym.
That being said, this notebook is organized as follows:
- gym_compat: the grid2op module allowing to convert a grid2op environment to a gym environment.
- removing attributes (gym_env.observation_space.ignore_attr) or transforming features from a continuous space to a discrete space (ContinuousToDiscreteConverter)
- keeping only some attributes (keep_only_attr) or scaling the data to a certain range (ScalerAttrConverter)
- re-encoding or adding attributes (gym_env.observation_space.reencode_space and gym_env.observation_space.add_key) or modifying the type of gym attributes (MultiToTupleConverter), as well as an example of how to use the RLLIB framework
- BoxGymObsSpace, together with BoxGymActSpace if you are more focused on continuous actions and MultiDiscreteActSpace for discrete actions (NB in both cases there will be a loss of information compared to regular grid2op actions! For example it will be harder to have a representation of the graph of the grid there)
- DiscreteActSpace
In each section, we also explain concisely how to train the agent. Note that we did not spend any time customizing the default agents and training schemes. It is therefore unlikely that these agents perform well.
As in other machine learning tasks, we highly recommend, before even trying to train an agent, to split the "chronics" (ie the episode data) into 3 datasets (training, validation and test).
Grid2op lets you do that with relative ease:
import grid2op
env_name = "l2rpn_case14_sandbox" # or any other...
env = grid2op.make(env_name)
# extract 1% of the "chronics" to be used in the validation environment and 1% to be used in the
# test environment. The remaining 98% will be used for training
nm_env_train, nm_env_val, nm_env_test = env.train_val_split_random(pct_val=1., pct_test=1.)
# and now you can use the training set only to train your agent:
print(f"The name of the training environment is \"{nm_env_train}\"")
print(f"The name of the validation environment is \"{nm_env_val}\"")
print(f"The name of the test environment is \"{nm_env_test}\"")
And now, you can use the training environment to train your agent:
import grid2op
env_name = "l2rpn_case14_sandbox"
env = grid2op.make(env_name+"_train")
Be careful: on Windows you might run into issues. Don't hesitate to have a look at the documentation of this function if this is the case (see https://grid2op.readthedocs.io/en/latest/environment.html#grid2op.Environment.Environment.train_val_split and https://grid2op.readthedocs.io/en/latest/environment.html#grid2op.Environment.Environment.train_val_split_random)
More information is provided here: https://grid2op.readthedocs.io/en/latest/environment.html#splitting-into-raining-validation-test-scenarios
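To make the idea of the split concrete, here is a minimal, library-free sketch of a random split of episode names into three disjoint sets. The helper `split_chronics` and the episode names are hypothetical and only illustrate the principle; this is not how grid2op implements `train_val_split_random`. As in the call above, the percentages are expressed in percent (1.0 means 1%).

```python
import random


def split_chronics(names, pct_val=1.0, pct_test=1.0, seed=0):
    """Randomly split episode names into train / validation / test lists.

    Hypothetical helper: pct_val and pct_test are percentages (1.0 means 1%).
    """
    rng = random.Random(seed)
    shuffled = list(names)
    rng.shuffle(shuffled)
    # keep at least one episode in each held-out set
    n_val = max(1, round(len(shuffled) * pct_val / 100.0))
    n_test = max(1, round(len(shuffled) * pct_test / 100.0))
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


# made-up episode names, for illustration only
chronics = [f"episode_{i:04d}" for i in range(1000)]
train, val, test = split_chronics(chronics, pct_val=1.0, pct_test=1.0)
print(len(train), len(val), len(test))
```

The important property is that the three sets are disjoint, so that an agent is never evaluated on episodes it has been trained on.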
The grid2op documentation is full of details on how to "optimize" the number of steps you can do per second. This number can rise from a few dozen per second to around a thousand per second with proper care.
We strongly encourage you to leverage all the possibilities, which include (but are not limited to):
This is a rather standard step, with lots of inspiration drawn from openAI gym framework, and there is absolutely no specificity here.
import grid2op
try:
    from lightsim2grid import LightSimBackend
    bk_cls = LightSimBackend
except ImportError as exc:
    print(f"Error: {exc} when importing faster LightSimBackend")
    from grid2op.Backend import PandaPowerBackend
    bk_cls = PandaPowerBackend
env_name = "l2rpn_case14_sandbox"
env_glop = grid2op.make(env_name, test=True, backend=bk_cls())
# NOTE: do not set the flag "test=True" for a real usage !
# NOTE: use grid2op.make(env_name+"_train", test=True) for a real usage (see paragraph above !)
# This flag is here for testing purpose !!!
obs_glop = env_glop.reset()
obs_glop
To that end, we recommend using the "gym_compat" module. More information is given in the official grid2op documentation
import gym
import numpy as np
from grid2op.gym_compat import GymEnv
env_gym = GymEnv(env_glop)
print(f"The \"env_gym\" is a gym environment: {isinstance(env_gym, gym.Env)}")
obs_gym = env_gym.reset()
# obs_gym
This step is optional, but highly recommended.
By default, grid2op actions and observations are huge. Even for this very simplistic example, the sizes are really large:
dim_act_space = np.sum([np.sum(env_gym.action_space[el].shape) for el in env_gym.action_space.spaces])
print(f"The size of the action space is : "
f"{dim_act_space}")
dim_obs_space = np.sum([np.sum(env_gym.observation_space[el].shape).astype(int)
for el in env_gym.observation_space.spaces])
print(f"The size of the observation space is : "
f"{dim_obs_space}")
This is partly because, in grid2op, you can represent the same concept (eg reconnect a powerline) in different manners. In this case: either you "toggle a switch" (if the said powerline was connected, it will disconnect it, otherwise it will reconnect it) or you can say "I want this line connected whatever its original state". This behaviour is detailed in the official grid2op documentation.
To (in general) reduce the action space by a factor of 2, you can represent these actions only using the change method (for example). You can do that with:
# example: ignore the "set_status" and "set_bus" type of actions, that are covered by the "change_status" and
# "change_bus"
env_gym.action_space = env_gym.action_space.ignore_attr("set_bus").ignore_attr("set_line_status")
new_dim_act_space = np.sum([np.sum(env_gym.action_space[el].shape) for el in env_gym.action_space.spaces])
print(f"The new size of the action space is : {new_dim_act_space}")
Grid2op environments allow for both continuous and discrete actions. For the sake of the example, let's "convert" the continuous actions into discrete ones (this is done by "binning" the values, as explained in more detail in the documentation).
# example: convert the continuous action type "redispatch" to a discrete action type
from grid2op.gym_compat import ContinuousToDiscreteConverter
env_gym.action_space = env_gym.action_space.reencode_space("redispatch",
ContinuousToDiscreteConverter(nb_bins=11)
)
# And now our action space looks like:
env_gym.action_space
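The "binning" mentioned above can be sketched in a few lines of plain Python. This is a generic illustration with evenly spaced bins over an assumed range `[low, high]`; the function names are hypothetical and the exact bin placement used by `ContinuousToDiscreteConverter` may differ (see the grid2op documentation).

```python
def make_bins(low, high, nb_bins):
    """Return nb_bins evenly spaced representative values covering [low, high]."""
    step = (high - low) / (nb_bins - 1)
    return [low + i * step for i in range(nb_bins)]


def to_discrete(value, low, high, nb_bins):
    """Index of the bin whose representative value is closest to `value`."""
    clipped = min(max(value, low), high)  # values outside the range are clipped
    step = (high - low) / (nb_bins - 1)
    return int(round((clipped - low) / step))


def to_continuous(index, low, high, nb_bins):
    """Representative continuous value of the bin `index`."""
    return make_bins(low, high, nb_bins)[index]


# assumed range of -5 MW to +5 MW, with 11 bins as in the cell above
low, high, nb_bins = -5.0, 5.0, 11
bin_id = to_discrete(1.3, low, high, nb_bins)
print(bin_id, to_continuous(bin_id, low, high, nb_bins))
```

The agent then only has to choose an integer between 0 and `nb_bins - 1`, at the price of a loss of precision: here the value 1.3 is mapped to the bin representing 1.0.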
You also have the possibility to use other types of more common action space.
For example, just like in most Atari games, you can encode each unary action by an integer (for example "0" might be "turn left", "1" "turn right" etc.) and have your agent predict the ID of the action instead of its complex form.
This action space will "automatically" transform continuous actions into discrete ones by "binning" (more information in the official documentation).
This can be achieved with:
from grid2op.gym_compat import DiscreteActSpace
env_gym.action_space = DiscreteActSpace(env_gym.init_env.action_space)
print(f"There are {env_gym.action_space.n} independent actions")
env_gym.action_space
You can customize it even more, for example if you have at your disposal a list of grid2op actions you want to use (and do not want to use the others); this is explained in the documentation.
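The idea of identifying each unary action by an integer can be illustrated without grid2op. The sketch below builds a flat list of hypothetical unary actions and uses the position in that list as the action ID. The ordering and the sizes are arbitrary assumptions chosen for the illustration; they are not the encoding used by `DiscreteActSpace`.

```python
def enumerate_unary_actions(n_line, n_gen, nb_bins):
    """Hypothetical enumeration: ID 0 is "do nothing", then one ID per
    line-status change, then one ID per (generator, redispatch bin) pair."""
    actions = [("do_nothing", None)]
    actions += [("change_line_status", line_id) for line_id in range(n_line)]
    actions += [("redispatch", (gen_id, bin_id))
                for gen_id in range(n_gen)
                for bin_id in range(nb_bins)]
    return actions


# arbitrary sizes, for illustration only
actions = enumerate_unary_actions(n_line=20, n_gen=6, nb_bins=7)
print(f"{len(actions)} unary actions")

# the agent predicts an integer, which is decoded by a simple lookup
predicted_id = 21
print(actions[predicted_id])
```

With such an encoding, the agent can only perform one unary action per step: it cannot, for example, act on a powerline and perform redispatching at the same time.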
For the observation space, we will remove lots of useless attributes (remember, it is for the sake of the example here) and rescale some others so that they have values roughly between 0 and 1, which stabilizes the learning process.
# first let's see which are the attributes in the observation space:
# More information on
# https://beta-grid2op.readthedocs.io/en/latest/observation.html#main-observation-attributes
# and
# https://grid2op.readthedocs.io/en/latest/gym.html#observation-space-and-action-space-customization
env_gym.observation_space
Let's keep only the information about the flow on the powerlines (rho), the generation (gen_p), the load (load_p) and the representation of the topology (topo_vect), for the sake of the example, once again.
env_gym.observation_space = env_gym.observation_space.keep_only_attr(["rho", "gen_p", "load_p", "topo_vect",
"actual_dispatch"])
new_dim_obs_space = np.sum([np.sum(env_gym.observation_space[el].shape).astype(int)
for el in env_gym.observation_space.spaces])
print(f"The new size of the observation space is : "
f"{new_dim_obs_space} (it was {dim_obs_space} before!)")
One other detail here: the generation and loads are not scaled (they are given in MW). We recommend scaling them to have values roughly between 0 and 1, for stability during learning.
This can be done pretty easily with the code below:
from grid2op.gym_compat import ScalerAttrConverter
from gym.spaces import Box
ob_space = env_gym.observation_space
ob_space = ob_space.reencode_space("actual_dispatch",
ScalerAttrConverter(substract=0.,
divide=env_glop.gen_pmax
)
)
ob_space = ob_space.reencode_space("gen_p",
ScalerAttrConverter(substract=0.,
divide=env_glop.gen_pmax
)
)
ob_space = ob_space.reencode_space("load_p",
ScalerAttrConverter(substract=obs_gym["load_p"],
divide=0.5 * obs_gym["load_p"]
)
)
# for even more customization, you can use any functions you want !
shape_ = (env_glop.dim_topo, env_glop.dim_topo)
env_gym.observation_space.add_key("connectivity_matrix",
lambda obs: obs.connectivity_matrix(), # can be any function returning a gym space
Box(shape=shape_,
low=np.zeros(shape_),
high=np.ones(shape_),
) # this "Box" should represent the return type of the above function
)
env_gym.observation_space = ob_space
env_gym.observation_space
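The scaling used above boils down to the element-wise formula `(value - subtract) / divide`. Below is a minimal sketch of this formula in plain Python, with made-up numbers; the helper `scale` is hypothetical and is not the `ScalerAttrConverter` implementation.

```python
def scale(values, subtract, divide):
    """Element-wise (value - subtract) / divide."""
    return [(v - s) / d for v, s, d in zip(values, subtract, divide)]


# made-up generator data: production and maximum production, both in MW
gen_p = [40.0, 0.0, 75.0]
gen_pmax = [80.0, 100.0, 150.0]
print(scale(gen_p, subtract=[0.0, 0.0, 0.0], divide=gen_pmax))

# made-up load data: center on the initial load, divide by half of it
load_p_init = [20.0, 50.0]
load_p_now = [22.0, 45.0]
print(scale(load_p_now,
            subtract=load_p_init,
            divide=[0.5 * v for v in load_p_init]))
```

With this choice, the generation ends up between 0 and 1, while the load is expressed as a relative deviation from its initial value (so it is centered on 0 rather than bounded by 0 and 1).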
In this subsection we briefly explain how to wrap the trained agent (see below for training methods depending on the framework you want to use). The goal is to make this "tutorial" complete, in the sense that you will be able to use the trained agent in the regular grid2op framework, for example using the Runner.
This subsection is compatible with all the code explained in this notebook, even though we demonstrate it with the env created above.
The basic idea is really simple: you create a grid2op agent, initialize it with the gym_env (that you got from the gym_compat module) and use the "gym_env.action_space.from_gym" and "gym_env.observation_space.to_gym" functions to convert the action and the observation.
from grid2op.Agent import BaseAgent
class AgentFromGym(BaseAgent):
    def __init__(self, gym_env, trained_agent):
        self.gym_env = gym_env
        BaseAgent.__init__(self, gym_env.init_env.action_space)
        self.trained_agent = trained_agent

    def act(self, obs, reward, done):
        # convert the grid2op observation to gym, query the trained agent,
        # then convert its gym action back to a grid2op action
        gym_obs = self.gym_env.observation_space.to_gym(obs)
        gym_act = self.trained_agent.act(gym_obs, reward, done)
        grid2op_act = self.gym_env.action_space.from_gym(gym_act)
        return grid2op_act
And this is it. You are done ;-)
We recommend that you read the notebook 04_TrainingAnAgent for more information about this "template" agent. Most importantly, some examples of such agents (and "better" grid2op environments) are provided in the "l2rpn_baselines" package.
This part is not a tutorial on how to use rllib. Please refer to their documentation for more detailed information.
As explained in the header of this notebook, we will follow the recommended usage:
The issue with rllib is that it does not take into account MultiBinary nor MultiDiscrete action spaces (see https://github.com/ray-project/ray/issues/1519) so we need some way to encode these types of actions. This can be done automatically with the MultiToTupleConverter provided in grid2op (as always, more information in the documentation).
We will then use this to customize our environment previously defined:
import copy
env_rllib = copy.deepcopy(env_gym)
from grid2op.gym_compat import MultiToTupleConverter
env_rllib.action_space = env_rllib.action_space.reencode_space("change_bus", MultiToTupleConverter())
env_rllib.action_space = env_rllib.action_space.reencode_space("change_line_status", MultiToTupleConverter())
env_rllib.action_space = env_rllib.action_space.reencode_space("redispatch", MultiToTupleConverter())
env_rllib.action_space
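Conceptually, a MultiBinary space of size n carries the same information as a Tuple of n two-valued Discrete spaces, which is the kind of re-encoding performed above. The sketch below illustrates this correspondence on plain Python values; the two helpers are hypothetical and this is not the `MultiToTupleConverter` implementation.

```python
def multibinary_to_tuple(vector):
    """Encode a 0/1 vector as a tuple of independent binary choices."""
    if any(bit not in (0, 1) for bit in vector):
        raise ValueError("expected only 0 / 1 entries")
    return tuple(int(bit) for bit in vector)


def tuple_to_multibinary(choices):
    """Decode a tuple of binary choices back into a 0/1 vector."""
    return [int(bit) for bit in choices]


# made-up "change_line_status" vector: toggle the second powerline only
change_line_status = [0, 1, 0, 0]
encoded = multibinary_to_tuple(change_line_status)
decoded = tuple_to_multibinary(encoded)
print(encoded, decoded)
```

No information is lost in this conversion: only the container type changes, which is what makes the space usable by frameworks that handle Tuple spaces but not MultiBinary or MultiDiscrete ones.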
Another specificity of RLLIB is that it handles the creation of environments "on its own". This implies that you need to create a custom class representing an environment, rather than a python object.
And finally, you ask it to use this class, and learn a specific agent. This is really well explained in their documentation: https://docs.ray.io/en/master/rllib-env.html#configuring-environments.
# gym specific, we simply do a copy paste of what we did in the previous cells, wrapping it in the
# MyEnv class, and train a Proximal Policy Optimisation based agent
import gym
import ray
import numpy as np


class MyEnv(gym.Env):
    def __init__(self, env_config):
        import grid2op
        from grid2op.gym_compat import GymEnv
        from grid2op.gym_compat import ScalerAttrConverter, ContinuousToDiscreteConverter, MultiToTupleConverter

        # 1. create the grid2op environment
        if "env_name" not in env_config:
            raise RuntimeError("The configuration for RLLIB should provide the env name")
        nm_env = env_config["env_name"]
        del env_config["env_name"]
        self.env_glop = grid2op.make(nm_env, **env_config)

        # 2. create the gym environment
        self.env_gym = GymEnv(self.env_glop)
        obs_gym = self.env_gym.reset()

        # 3. (optional) customize it (see section above for more information)
        ## customize action space
        self.env_gym.action_space = self.env_gym.action_space.ignore_attr("set_bus").ignore_attr("set_line_status")
        self.env_gym.action_space = self.env_gym.action_space.reencode_space("redispatch",
                                                                             ContinuousToDiscreteConverter(nb_bins=11)
                                                                             )
        self.env_gym.action_space = self.env_gym.action_space.reencode_space("change_bus", MultiToTupleConverter())
        self.env_gym.action_space = self.env_gym.action_space.reencode_space("change_line_status",
                                                                             MultiToTupleConverter())
        self.env_gym.action_space = self.env_gym.action_space.reencode_space("redispatch", MultiToTupleConverter())
        ## customize observation space
        ob_space = self.env_gym.observation_space
        ob_space = ob_space.keep_only_attr(["rho", "gen_p", "load_p", "topo_vect", "actual_dispatch"])
        ob_space = ob_space.reencode_space("actual_dispatch",
                                           ScalerAttrConverter(substract=0.,
                                                               divide=self.env_glop.gen_pmax
                                                               )
                                           )
        ob_space = ob_space.reencode_space("gen_p",
                                           ScalerAttrConverter(substract=0.,
                                                               divide=self.env_glop.gen_pmax
                                                               )
                                           )
        ob_space = ob_space.reencode_space("load_p",
                                           ScalerAttrConverter(substract=obs_gym["load_p"],
                                                               divide=0.5 * obs_gym["load_p"]
                                                               )
                                           )
        self.env_gym.observation_space = ob_space

        # 4. specific to rllib
        self.action_space = self.env_gym.action_space
        self.observation_space = self.env_gym.observation_space

        # 4. bis: to avoid other type of issues, we recommend to build the action space and observation
        # space directly from the spaces class.
        d = {k: v for k, v in self.env_gym.observation_space.spaces.items()}
        self.observation_space = gym.spaces.Dict(d)
        a = {k: v for k, v in self.env_gym.action_space.items()}
        self.action_space = gym.spaces.Dict(a)

    def reset(self):
        obs = self.env_gym.reset()
        return obs

    def step(self, action):
        obs, reward, done, info = self.env_gym.step(action)
        return obs, reward, done, info
test = MyEnv({"env_name": "l2rpn_case14_sandbox"})
And now you can train it :
if nb_step_train:  # remember: don't forget to change this number to perform an actual training!
    from ray.rllib.agents import ppo  # import the type of agents
    # nb_step_train = 100  # Do not forget to turn on the actual training!
    # first initialize ray
    try:
        # then define a "trainer"
        trainer = ppo.PPOTrainer(env=MyEnv, config={
            "env_config": {"env_name": "l2rpn_case14_sandbox"},  # config to pass to env class
        })
        # and then train it for a given number of iterations
        for step in range(nb_step_train):
            trainer.train()
    finally:
        # shutdown ray
        ray.shutdown()
Because we are approximating a physical system with real equations, and limited computational power regardless of the "backend" / "powergrid simulator" used internally by grid2op, it is sometimes possible that an observation obs["gen_p"] is not exactly in the range env.observation_space["gen_p"].low, env.observation_space["gen_p"].high.
In these "pathological" cases we recommend manually changing the low / high values of the gen_p part of the observation space, for example by adding, after the definition of self.observation_space, something like:
# 4. specific to rllib
self.action_space = self.env_gym.action_space
self.observation_space = self.env_gym.observation_space
self.observation_space["gen_p"].low[:] = -np.inf
self.observation_space["gen_p"].high[:] = np.inf
More information at https://github.com/rte-france/Grid2Op/issues/196
NB these cases can be spotted with an error like:
RayTaskError(ValueError): ray::RolloutWorker.par_iter_next() (pid=378, ip=172.28.0.2)
File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
File "/usr/local/lib/python3.7/dist-packages/ray/util/iter.py", line 1152, in par_iter_next
return next(self.local_it)
File "/usr/local/lib/python3.7/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 327, in gen_rollouts
yield self.sample()
File "/usr/local/lib/python3.7/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 662, in sample
batches = [self.input_reader.next()]
File "/usr/local/lib/python3.7/dist-packages/ray/rllib/evaluation/sampler.py", line 95, in next
batches = [self.get_data()]
File "/usr/local/lib/python3.7/dist-packages/ray/rllib/evaluation/sampler.py", line 224, in get_data
item = next(self.rollout_provider)
File "/usr/local/lib/python3.7/dist-packages/ray/rllib/evaluation/sampler.py", line 620, in _env_runner
sample_collector=sample_collector,
File "/usr/local/lib/python3.7/dist-packages/ray/rllib/evaluation/sampler.py", line 1056, in _process_observations_w_trajectory_view_api
policy_id).transform(raw_obs)
File "/usr/local/lib/python3.7/dist-packages/ray/rllib/models/preprocessors.py", line 257, in transform
self.check_shape(observation)
File "/usr/local/lib/python3.7/dist-packages/ray/rllib/models/preprocessors.py", line 68, in check_shape
observation, self._obs_space)
ValueError: ('Observation ({}) outside given space ({})!', OrderedDict([('actual_dispatch', array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0.], dtype=float32)), ('gen_p', array([0. , 0.14583334, 0. , 0.5376 , 0. ,
0.13690476, 0. , 0. , 0.13988096, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.10416667, 0. , 0.9975 ,
0. , 0.0872582 ], dtype=float32)), ('load_p', array([-8.33333358e-02, 1.27543859e+01, -3.14843726e+00, -4.91228588e-02,
-7.84314200e-02, 2.70270016e-02, 4.51001197e-01, -7.63358772e-02,
-8.42104480e-02, -7.90961310e-02, -2.31212564e-02, -7.31706619e-02,
-5.47945984e-02, -5.57769537e-02, -4.65115122e-02, 0.00000000e+00,
-6.25000373e-02, -2.98508592e-02, 0.00000000e+00, 2.59741265e-02,
-5.12821227e-02, 2.12766770e-02, -4.38757129e-02, 1.45455096e-02,
-1.45278079e-02, -3.63636017e-02, 7.14286715e-02, 1.03358915e-02,
8.95522386e-02, 4.81927246e-02, -1.76759213e-02, 1.11111533e-02,
1.00000061e-01, -5.28445065e-01, 3.00833374e-01, 7.76839375e-01,
-7.07498193e-01], dtype=float32)), ('rho', array([0.49652272, 0.42036632, 0.12563582, 0.22375877, 0.54946697,
0.08844228, 0.05907034, 0.10975129, 0.13002895, 0.14068729,
0.17318982, 0.6956544 , 0.38796344, 0.67179894, 0.22992906,
0.25189328, 0.15049867, 0.09095841, 0.35627988, 0.35627988,
0.36776555, 0.27249542, 0.6269728 , 0.62393713, 0.3464659 ,
0.35879263, 0.22755426, 0.35994047, 0.36117986, 0.12019955,
0.03638522, 0.2805753 , 0.5809281 , 0.6191531 , 0.5243356 ,
0.60382956, 0.35834518, 0.35867074, 0.3580954 , 0.6681824 ,
0.3441911 , 0.6081861 , 0.34460714, 0.18246886, 0.10307808,
0.46778303, 0.47179568, 0.45407027, 0.30089107, 0.30089107,
0.34481782, 0.3182735 , 0.35940355, 0.21895139, 0.19766088,
0.63653564, 0.46778303, 0.4566811 , 0.64398617], dtype=float32)), ('topo_vect', array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1], dtype=int32))]), Dict(actual_dispatch:Box(-1.0, 1.0, (22,), float32), gen_p:Box(0.0, 1.2000000476837158, (22,), float32), load_p:Box(-inf, inf, (37,), float32), rho:Box(0.0, inf, (59,), float32), topo_vect:Box(-1, 2, (177,), int32)))
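When such an error is raised, the message does not say which attribute is at fault. A small diagnostic helper such as the hypothetical `find_out_of_bounds` below can help to locate it. It is a simplified sketch working on plain Python lists, with one scalar (low, high) pair per attribute and made-up values; it is not part of grid2op, gym or rllib.

```python
def find_out_of_bounds(observation, bounds, tol=0.0):
    """Return {key: [indices]} for entries outside [low - tol, high + tol]."""
    problems = {}
    for key, values in observation.items():
        low, high = bounds[key]
        bad = [i for i, v in enumerate(values)
               if v < low - tol or v > high + tol]
        if bad:
            problems[key] = bad
    return problems


# made-up bounds and observation, for illustration only
bounds = {"gen_p": (0.0, 1.2), "rho": (0.0, float("inf"))}
obs = {"gen_p": [0.0, 0.146, -1e-7, 0.9975], "rho": [0.49, 0.42]}

print(find_out_of_bounds(obs, bounds))            # strict check
print(find_out_of_bounds(obs, bounds, tol=1e-5))  # with a small tolerance
```

In this made-up example, a generator production that is negative by a tiny numerical error is enough to make the strict check fail, which is exactly the kind of situation described above.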
NB We want to emphasize here that:
For a better "usecase" of the PPO agent using RLLIB we strongly encourage you to check out the "PPO_RLLIB" agent of l2rpn_baselines package.
This part is not a tutorial on how to use stable baselines. Please refer to their documentation for more detailed information.
As explained in the header of this notebook, we will follow the recommended usage:
The issue with stable baselines 3 is that it expects standard action / observation types, as explained here: https://stable-baselines3.readthedocs.io/en/master/guide/algos.html#rl-algorithms
Non-array spaces such as Dict or Tuple are not currently supported by any algorithm.
Unfortunately, it is not possible to convert an action space of dictionary type to a vector without any "loss of information".
It is possible to use the grid2op framework in such cases, and in this section, we will explain how.
First, as always, we convert the grid2op environment into a gym environment.
env_sb = GymEnv(env_glop) # sb for "stable baselines"
glop_obs = env_glop.reset()
Then, we need to convert everything into a "Box", as it is the only thing that stable baselines seems to digest at the time of writing (March 2021).
We explain here how to convert an observation into a single Box. This step is rather easy: you just need to specify which attributes of the observation you want to keep and whether you want to scale them (with the keywords subtract and divide).
from grid2op.gym_compat import BoxGymObsSpace
env_sb.observation_space = BoxGymObsSpace(env_sb.init_env.observation_space,
attr_to_keep=["gen_p", "load_p", "topo_vect",
"rho", "actual_dispatch", "connectivity_matrix"],
divide={"gen_p": env_glop.gen_pmax,
"load_p": glop_obs.load_p,
"actual_dispatch": env_glop.gen_pmax},
functs={"connectivity_matrix": (
lambda grid2obs: grid2obs.connectivity_matrix().flatten(),
0., 1., None, None,
)
}
)
obs_gym = env_sb.reset()
obs_gym in env_sb.observation_space
NB: the above code is equivalent to something like:
import numpy as np
from gym.spaces import Box

dt_float = np.float32


class BoxGymObsSpaceExample(Box):
    def __init__(self, observation_space):
        ob_sp = observation_space
        shape = (ob_sp.n_gen              # dimension of gen_p
                 + ob_sp.n_load           # load_p
                 + ob_sp.dim_topo         # topo_vect
                 + ob_sp.n_line           # rho
                 + ob_sp.n_gen            # actual_dispatch
                 + ob_sp.dim_topo ** 2,   # connectivity_matrix
                 )

        # lowest value the attribute can take (see doc for more information)
        low = np.concatenate((np.full(shape=(ob_sp.n_gen,), fill_value=0., dtype=dt_float),  # gen_p
                              np.full(shape=(ob_sp.n_load,), fill_value=-np.inf, dtype=dt_float),  # load_p
                              np.full(shape=(ob_sp.dim_topo,), fill_value=-1., dtype=dt_float),  # topo_vect
                              np.full(shape=(ob_sp.n_line,), fill_value=0., dtype=dt_float),  # rho
                              np.asarray(-ob_sp.gen_pmax, dtype=dt_float),  # actual_dispatch
                              np.full(shape=(ob_sp.dim_topo**2,), fill_value=0., dtype=dt_float),  # connectivity_matrix
                              ))

        # highest value the attribute can take
        high = np.concatenate((np.full(shape=(ob_sp.n_gen,), fill_value=np.inf, dtype=dt_float),  # gen_p
                               np.full(shape=(ob_sp.n_load,), fill_value=np.inf, dtype=dt_float),  # load_p
                               np.full(shape=(ob_sp.dim_topo,), fill_value=2., dtype=dt_float),  # topo_vect
                               np.full(shape=(ob_sp.n_line,), fill_value=np.inf, dtype=dt_float),  # rho
                               np.asarray(ob_sp.gen_pmax, dtype=dt_float),  # actual_dispatch
                               np.full(shape=(ob_sp.dim_topo**2,), fill_value=1., dtype=dt_float),  # connectivity_matrix
                               ))
        Box.__init__(self, low=low, high=high, shape=shape)

    def to_gym(self, obs):
        # glop_obs and env_glop are the variables defined in the cells above
        res = np.concatenate((obs.gen_p / obs.gen_pmax,
                              obs.load_p / glop_obs.load_p,
                              obs.topo_vect.astype(float),
                              obs.rho,
                              obs.actual_dispatch / env_glop.gen_pmax,
                              obs.connectivity_matrix().flatten()
                              ))
        return res
So if you want more customization, at the cost of less generic code (the BoxGymObsSpace works for all the attributes of the observation), you can adapt the snippet above or read the documentation here (TODO).
Only the "to_gym" function, with this exact signature, is important in this case. It should take an observation in the grid2op format and return the same observation in a format compatible with the gym Box (so a numpy array with the right shape and in the right range).
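The core of such a "to_gym" function is a concatenation of the selected attributes, each optionally divided by a scaling vector. The sketch below shows this logic on plain Python lists with made-up values. The helper `flatten_observation` and its arguments are hypothetical (the argument names only mimic the ones used above); it is not the `BoxGymObsSpace` implementation.

```python
def flatten_observation(obs, attr_to_keep, divide=None):
    """Concatenate the selected attributes into a single flat list of floats.

    Attributes listed in `divide` are divided element-wise by the given vector.
    """
    divide = divide or {}
    flat = []
    for name in attr_to_keep:
        values = obs[name]
        scale = divide.get(name)
        if scale is None:
            flat.extend(float(v) for v in values)
        else:
            flat.extend(float(v) / float(d) for v, d in zip(values, scale))
    return flat


# made-up observation with 2 generators, 3 elements in topo_vect and 1 powerline
obs = {"gen_p": [40.0, 75.0], "topo_vect": [1, 1, 2], "rho": [0.5]}
vector = flatten_observation(obs,
                             attr_to_keep=["gen_p", "topo_vect", "rho"],
                             divide={"gen_p": [80.0, 150.0]})
print(vector)
```

The order given in `attr_to_keep` fixes the layout of the resulting vector, so it must stay the same between training and evaluation.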
We now need to convert the grid2op actions into something that is neither a Tuple nor a Dict. The main restriction in these frameworks is that they do not allow for easy integration of environments where both discrete and continuous actions are possible.
We can use the same kind of method as explained above, with the class BoxGymActSpace. In this case, you need to provide a way to convert a numpy array (an element of a gym Box) into a grid2op action.
NB This method is particularly suited if you want to focus on the CONTINUOUS part of the action space, for example redispatching, curtailment or actions on storage units.
Though we made it possible to also use discrete actions with it, we do not recommend it. Prefer using the MultiDiscreteActSpace for that purpose.
from grid2op.gym_compat import BoxGymActSpace
scale_gen = env_sb.init_env.gen_max_ramp_up + env_sb.init_env.gen_max_ramp_down
scale_gen[~env_sb.init_env.gen_redispatchable] = 1.0
env_sb.action_space = BoxGymActSpace(env_sb.init_env.action_space,
attr_to_keep=["redispatch"],
multiply={"redispatch": scale_gen},
)
obs_gym = env_sb.reset()
NB: the above code is equivalent to something like:
import numpy as np
from gym.spaces import Box

dt_float = np.float32


class BoxGymActSpaceExample(Box):
    def __init__(self, action_space):
        shape = (action_space.n_gen,)  # redispatch

        # lowest value the attribute can take (see doc for more information)
        low = np.full(shape=shape, fill_value=-1., dtype=dt_float)

        # highest value the attribute can take
        high = np.full(shape=shape, fill_value=1., dtype=dt_float)

        Box.__init__(self, low=low, high=high, shape=shape)
        self.action_space = action_space

    def from_gym(self, gym_action):
        # scale_gen is the variable defined in the cell above
        res = self.action_space()
        res.redispatch = gym_action * scale_gen
        return res
So if you want more customization, at the cost of less generic code (the BoxGymActSpace works for all the attributes of the action), you can adapt the snippet above or read the documentation here (TODO). The only important method you need to code is "from_gym": it should take an action as sampled by the gym Box and return a grid2op action.
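The arithmetic behind such a "from_gym" function is simple: the agent outputs values in a normalized range and each one is multiplied by a per-generator scale to obtain MW. Here is a minimal sketch on plain Python lists, with made-up scales. The helper `box_to_redispatch` is hypothetical, and the clipping step is an assumption added for safety, not something the snippet above does.

```python
def box_to_redispatch(gym_action, scale, low=-1.0, high=1.0):
    """Clip a normalized action to [low, high], then rescale it to MW."""
    clipped = [min(max(a, low), high) for a in gym_action]
    return [a * s for a, s in zip(clipped, scale)]


# made-up scales (in MW), one per generator
scale_example = [10.0, 5.0, 1.0]
print(box_to_redispatch([0.5, -2.0, 0.0], scale_example))
```

Clipping guarantees that a value sampled outside of the Box (here -2.0) cannot produce a redispatching amount larger than the scale of the generator.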
We can use the same kind of method as explained above, this time with the class MultiDiscreteActSpace, which is more suited to discrete types of actions.
In this case, you need to provide a way to convert a numpy array of integers (an element of a gym MultiDiscrete) into a grid2op action.
NB This method is particularly suited if you want to focus on the DISCRETE part of the action space, for example set_bus or change_line_status.
from grid2op.gym_compat import MultiDiscreteActSpace
reencoded_act_space = MultiDiscreteActSpace(env_sb.init_env.action_space,
attr_to_keep=["set_line_status", "set_bus", "redispatch"])
env_sb.action_space = reencoded_act_space
obs_gym = env_sb.reset()
First, let's make sure our environment is compatible with stable baselines, thanks to their helper function.
This means that
from stable_baselines3.common.env_checker import check_env
check_env(env_sb)
So as we see, the environment seems to be compatible with stable baselines. Now we can start the training.
from stable_baselines3 import PPO
model = PPO("MlpPolicy", env_sb, verbose=1)
if nb_step_train:
model.learn(total_timesteps=nb_step_train)
# model.save("ppo_stable_baselines3")
Again, the goal of this section was not to demonstrate how to train a state of the art algorithm, but rather to demonstrate how to use grid2op with the stable baselines repository.
Most importantly, the neural networks there are not customized for the environment, default parameters are used. This is unlikely to work at all !
For more information and to use tips and tricks to get started with RL agents, the devs of "stable baselines" have done a really nice job. You can have some tips for training RL agents here https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html and consult any of the resources listed there https://stable-baselines3.readthedocs.io/en/master/guide/rl.html
For a better "usecase" of the PPO agent using stable-baselines3 we strongly encourage you to check out the "PPO_SB3" agent of l2rpn_baselines package.
The last RL framework we will use is tf agents.
Compared to the previous ones, this framework is more verbose. In this notebook we will mimic what has been done in https://github.com/tensorflow/agents/blob/master/docs/tutorials/1_dqn_tutorial.ipynb
To that end, we will introduce the last "gym transformer" available in grid2op at time of writing. This function will transform the action space in a Discrete one. With this modeling, the agent can take an action on a substation, or act on a powerline or perform redispatching. But, as opposed to what is done previously, it cannot act on, say, a substation and a powerline at the same time.
This limitation does not come from tf agents. But this limitation is necessary to run the tutorial of the DQN provided with tf agents.
First we will build the observation space as for the stable baselines repository. See section 2) Stable baselines for more information.
# create the gym environment
env_tfa = GymEnv(env_glop) # tfa for "tf agents"
glop_obs = env_glop.reset()
# customize the observation space
env_tfa.observation_space = BoxGymObsSpace(env_tfa.init_env.observation_space,
                                           attr_to_keep=["gen_p", "load_p", "topo_vect",
                                                         "rho", "actual_dispatch", "connectivity_matrix"],
                                           divide={"gen_p": env_glop.gen_pmax,
                                                   "load_p": glop_obs.load_p,
                                                   "actual_dispatch": env_glop.gen_pmax},
                                           functs={"connectivity_matrix": (
                                                       lambda grid2obs: grid2obs.connectivity_matrix().flatten(),
                                                       0., 1., None, None,
                                                   )
                                                  }
                                           )
obs_gym = env_tfa.reset()
Again, the observation space might need to be customized. We do not assume that everything kept here is relevant, nor that all the information an agent would need is present.
This example is only here to demonstrate how to use grid2op with openai gym framework.
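To make the role of attr_to_keep and divide in the code above more concrete, here is a plain-Python sketch of the idea: keep a few attributes, rescale some of them element-wise, and concatenate everything into one flat vector. The function `flatten_observation` and the toy values are hypothetical and are not part of grid2op; the real converter also handles bounds, dtypes and custom functions (functs), which are left out here.

```python
def flatten_observation(obs, attr_to_keep, divide=None):
    """Concatenate the selected attributes, dividing some of them element-wise."""
    divide = divide or {}
    vector = []
    for name in attr_to_keep:
        values = obs[name]
        scale = divide.get(name)
        if scale is None:
            # attribute kept as is
            vector.extend(float(v) for v in values)
        else:
            # attribute rescaled, one divisor per component
            vector.extend(float(v) / float(s) for v, s in zip(values, scale))
    return vector


# Toy observation with two generators, two loads and two powerlines (made-up numbers).
toy_obs = {"gen_p": [50.0, 20.0], "load_p": [30.0, 40.0], "rho": [0.5, 0.9]}
toy_vec = flatten_observation(
    toy_obs,
    attr_to_keep=["gen_p", "load_p", "rho"],
    divide={"gen_p": [100.0, 40.0],   # e.g. the maximum production of each generator
            "load_p": [30.0, 40.0]},  # e.g. the loads seen in a reference observation
)
print(toy_vec)
```

Rescaling in this way keeps the inputs of the neural network at a comparable order of magnitude, which is the usual motivation for the divide argument.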
As opposed to the previous action space, to follow the tf agents tutorial we need to customize the action space so that it outputs a single number (the id of the action you want to take).
This can be done with the DiscreteActSpace gym converter, which behaves approximately the same way as MultiDiscreteActSpace does.
from grid2op.gym_compat import DiscreteActSpace
reencoded_act_space = DiscreteActSpace(env_tfa.init_env.action_space,
                                       attr_to_keep=["set_line_status", "set_bus", "redispatch"])
env_tfa.action_space = reencoded_act_space
obs_gym = env_tfa.reset()
print(env_tfa.action_space.from_gym(env_tfa.action_space.sample()))
print(env_tfa.action_space.from_gym(env_tfa.action_space.sample()))
And that is it. All the rest is done thanks to tf agents.
tf agents is a lot more verbose than ray and stable baselines, but it gives more control over what you want to do. For the sake of the example, we only show the steps without detailing them.
For more information, you can visit their github: https://github.com/tensorflow/agents , their website: https://www.tensorflow.org/agents/api_docs/python/tf_agents , and the notebook that inspired this one: https://colab.research.google.com/github/tensorflow/agents/blob/master/docs/tutorials/1_dqn_tutorial.ipynb
Note: the code below, once again, only aims at showing how to integrate grid2op with tf agents. Its aim is not to showcase the best use of tensorflow, tf agents or grid2op.
It is only an example for demonstration purposes and does not aim at providing an interesting agent at all. For that you might want to use something other than DQN, tune the hyperparameters (size of each neural network, number of training steps, learning rate, etc.), define the action space and observation space in a better fashion, etc.
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import tf_py_environment
from tf_agents.networks import sequential
from tf_agents.policies import random_tf_policy
from tf_agents.replay_buffers import tf_uniform_replay_buffer
from tf_agents.trajectories import trajectory
from tf_agents.specs import tensor_spec
from tf_agents.utils import common
# initialize the environment
import copy  # used for the deepcopy below (harmless if already imported earlier)
from tf_agents.environments.gym_wrapper import GymWrapper
tf_env_train = tf_py_environment.TFPyEnvironment(GymWrapper(env_tfa))
eval_env = tf_py_environment.TFPyEnvironment(GymWrapper(copy.deepcopy(env_tfa)))
# meta parameters
num_iterations = nb_step_train
initial_collect_steps = 100
collect_steps_per_iteration = 1
replay_buffer_max_length = 100000
batch_size = 64
learning_rate = 1e-3
log_interval = 200
num_eval_episodes = 10
eval_interval = 1000
# neural nets (for the agents)
fc_layer_params = (100, 50)
action_tensor_spec = tensor_spec.from_spec(tf_env_train.action_spec())
num_actions = action_tensor_spec.maximum - action_tensor_spec.minimum + 1
# Define a helper function to create Dense layers configured with the right
# activation and kernel initializer.
def dense_layer(num_units):
    return tf.keras.layers.Dense(
        num_units,
        activation=tf.keras.activations.relu,
        kernel_initializer=tf.keras.initializers.VarianceScaling(
            scale=2.0, mode='fan_in', distribution='truncated_normal'))
# QNetwork consists of a sequence of Dense layers followed by a dense layer
# with `num_actions` units to generate one q_value per available action as
# its output.
dense_layers = [dense_layer(num_units) for num_units in fc_layer_params]
q_values_layer = tf.keras.layers.Dense(
    num_actions,
    activation=None,
    kernel_initializer=tf.keras.initializers.RandomUniform(
        minval=-0.03, maxval=0.03),
    bias_initializer=tf.keras.initializers.Constant(-0.2))
q_net = sequential.Sequential(dense_layers + [q_values_layer])
# optimizer (for training)
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
# just a variable to count the number of "env.step" performed
train_step_counter = tf.Variable(0)
# create the agent
agent = dqn_agent.DqnAgent(
    tf_env_train.time_step_spec(),
    tf_env_train.action_spec(),
    q_network=q_net,
    optimizer=optimizer,
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=train_step_counter)
agent.initialize()
# for exploration
random_policy = random_tf_policy.RandomTFPolicy(tf_env_train.time_step_spec(),
                                                tf_env_train.action_spec())
# replay buffer (to store the past actions / states / rewards)
replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=tf_env_train.batch_size,
    max_length=replay_buffer_max_length)

def collect_step(environment, policy, buffer):
    time_step = environment.current_time_step()
    action_step = policy.action(time_step)
    next_time_step = environment.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)
    # Add trajectory to the replay buffer
    buffer.add_batch(traj)

def collect_data(env, policy, buffer, steps):
    for _ in range(steps):
        collect_step(env, policy, buffer)
collect_data(tf_env_train, random_policy, replay_buffer, initial_collect_steps)
# generate the datasets
# Dataset generates trajectories with shape [Bx2x...]
dataset = replay_buffer.as_dataset(
    num_parallel_calls=3,
    sample_batch_size=batch_size,
    num_steps=2).prefetch(3)
iterator = iter(dataset)
# train it
# (Optional) Optimize by wrapping some of the code in a graph using TF function.
agent.train = common.function(agent.train)
# Reset the train step
agent.train_step_counter.assign(0)
# Evaluate the agent's policy once before training.
def compute_avg_return(environment, policy, num_episodes=10):
    total_return = 0.0
    for _ in range(num_episodes):
        time_step = environment.reset()
        episode_return = 0.0
        while not time_step.is_last():
            action_step = policy.action(time_step)
            time_step = environment.step(action_step.action)
            episode_return += time_step.reward
        total_return += episode_return
    avg_return = total_return / num_episodes
    return avg_return.numpy()[0]
# See also the metrics module for standard implementations of different metrics.
# https://github.com/tensorflow/agents/tree/master/tf_agents/metrics
avg_return = compute_avg_return(eval_env, agent.policy, num_eval_episodes)
returns = [avg_return]
for _ in range(num_iterations):
    # Collect a few steps using collect_policy and save to the replay buffer.
    collect_data(tf_env_train, agent.collect_policy, replay_buffer, collect_steps_per_iteration)
    # Sample a batch of data from the buffer and update the agent's network.
    experience, unused_info = next(iterator)
    trainer = agent.train(experience)
    train_loss = trainer.loss
    step = agent.train_step_counter.numpy()
    if step % log_interval == 0:
        print('step = {0}: loss = {1}'.format(step, train_loss))
    if step % eval_interval == 0:
        avg_return = compute_avg_return(eval_env, agent.policy, num_eval_episodes)
        print('step = {0}: Average Return = {1}'.format(step, avg_return))
        returns.append(avg_return)
avg_return = compute_avg_return(eval_env, agent.policy, num_eval_episodes)
if num_iterations:
    print('Final average return after training for {} steps: {}'.format(step, avg_return))
returns.append(avg_return)
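The replay buffer and the `num_steps=2` sampling used above can be pictured with the following plain-Python sketch: a bounded buffer that forgets its oldest entries and returns pairs of consecutive steps, which is the kind of (state, next state) data a DQN update consumes. `TinyReplayBuffer` is a hypothetical helper written for this illustration only. It is not how tf agents implements its buffer, and it ignores episode boundaries.

```python
import random
from collections import deque


class TinyReplayBuffer:
    """Bounded FIFO buffer that samples pairs of consecutive steps uniformly."""

    def __init__(self, max_length):
        # deque drops the oldest element once max_length is reached
        self._steps = deque(maxlen=max_length)

    def add(self, step):
        self._steps.append(step)

    def __len__(self):
        return len(self._steps)

    def sample_pairs(self, batch_size, rng):
        if len(self._steps) < 2:
            raise ValueError("need at least 2 stored steps to form a pair")
        # pick the first element of each pair uniformly at random
        starts = [rng.randrange(len(self._steps) - 1) for _ in range(batch_size)]
        return [(self._steps[i], self._steps[i + 1]) for i in starts]


buffer = TinyReplayBuffer(max_length=5)
for t in range(8):          # steps 0..7 are stored, only the last 5 are kept
    buffer.add(t)

pairs = buffer.sample_pairs(batch_size=4, rng=random.Random(0))
print(len(buffer), pairs)
```

In the notebook, `replay_buffer_max_length` plays the role of `max_length` and `batch_size` the role of the number of sampled pairs.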
If you want to use another RL framework, let us know by opening a github issue with this template: https://github.com/rte-france/Grid2Op/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=
Even better, if you have used another RL framework, let us know and we will find a way to integrate your development into this notebook ! You can open an issue at https://github.com/rte-france/Grid2Op/issues/new?assignees=&labels=documentation&template=documentation.md&title= explaining which framework you used and providing a minimal code example we could use.