
In this notebook we will outline the type of problems that grid2op aims to model.

We will first present the "powergrid control" problem modeled by grid2op; in a second part, we will give some introductory elements on Reinforcement Learning.

I The "powergrid control" problem

The reality

A powergrid is a really complex man-made "object". In our case, a powergrid can be represented as a set of "objects" that are:

  • powerlines (with two ends: "origin" and "extremity")
  • loads (usually the size of a small city or of a big industrial firm)
  • generators (usually big power plants, such as wind farms, dams, coal or nuclear power plants for example)

All these objects are connected together at places called "substations". In reality, a substation looks like this:

[Image: a substation]

Substations are connected together with "powerlines" that allow power to flow from one place to another:

[Image: powerlines]

The fiction (aka our "model")

Of course, in the grid2op framework, we will not model every object you can see in these pictures. For us, a powergrid (here a relatively simple one) will be represented as:

In the above figure you can see:

  • Substations, represented as circles; there are five in this figure, labelled 0, 1, ..., 4
  • Powerlines, shown as... lines on this plot; there are eight of them
  • Generators, displayed as small green circles; there are two
  • Loads, displayed as small orange circles; there are three on this simple powergrid.

Focus on a substation

Now let's zoom in (or at least pretend to) on the substation with id 2. As you can see in the right part (bottom), this substation (like all substations in grid2op) has 2 "busbars" (represented here as vertical black lines). Note that in all these diagrams, powerlines are represented with blue lines.

What does this entail? It entails that we can "split" a substation into different independent "buses". A "bus" is a fancy word from the power system community meaning that if two objects are on the same bus, then there exists a direct electrical path in the substation connecting them. But wait, how can you split buses?

An object in a substation

In fact, in our modeling, each "object" in a substation has two switches, as shown in the image below:

We can then choose to connect the object (in this case, the powerline with id 1) to either busbar (denoted by "a" and "b" in the figure).

So, basically, one of the main goals of the grid2op platform is to make it possible to study what the "best" position of these switches is. Throughout the grid2op framework, when we speak about the topology, you can think of it as the list of the positions of every switch in the grid.

Different configurations

This possibility of changing the switch configuration leads to multiple configurations of the elements in each substation. If we take the example of substation 2 (which we already used above), we can list the possible configurations:

NB: Here we only listed the topologies with all elements connected (in grid2op it is possible to disconnect a powerline, for example). We also didn't take into account the symmetries of the situation. For example, the topology we called "topo 2: {1,4}; {5,6}" has been represented here with objects 1 and 4 connected to busbar b and objects 5 and 6 to busbar a. It would have been equivalent to assign 5 and 6 to busbar b and 1 and 4 to busbar a. What matters here is that objects 1 and 4 are electrically isolated from objects 5 and 6 in this substation: there is no electrical path that connects 5 and 4 in this topology.
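The enumeration described above can be sketched in a few lines of plain Python. This is only an illustration, not grid2op code: the function name `two_busbar_configurations` is hypothetical, and the sketch assumes every element stays connected. It removes the busbar a / busbar b symmetry by pinning the first element on busbar a, which is why a substation with k elements yields $2^{k-1}$ configurations.

```python
from itertools import product


def two_busbar_configurations(elements):
    """List the ways to spread `elements` over two busbars.

    Hypothetical helper (not part of grid2op). Swapping the two busbars
    gives the same electrical topology, so the first element is pinned
    on busbar "a" to count each topology only once.
    """
    first, *others = elements
    configurations = []
    for choice in product("ab", repeat=len(others)):
        on_a = [first] + [e for e, bus in zip(others, choice) if bus == "a"]
        on_b = [e for e, bus in zip(others, choice) if bus == "b"]
        configurations.append((on_a, on_b))
    return configurations


# The four objects of substation 2 used in the text above.
configs = two_busbar_configurations([1, 4, 5, 6])
for on_a, on_b in configs:
    print("busbar a:", on_a, "| busbar b:", on_b)
print(len(configs), "configurations")
```

Note that this count still includes configurations where a single element sits alone on a busbar, which may not all be desirable in practice.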

In order to be "valid", a topology must also satisfy some physical constraints. For example:

  • we could decide to connect 1, 4 and 5 to busbar a and leave 6 alone on busbar b. This is not strictly impossible (and it is not equivalent to disconnecting powerline 6). In reality it can serve certain purposes, mostly related to voltages. However, we do not recommend using this kind of topology.
  • having a load alone on a busbar (i.e., without at least one powerline connected to the same busbar) is equivalent to disconnecting it, and will lead to a game over.
  • having a generator alone on a busbar (i.e., without at least one powerline connected to the same busbar) is also equivalent to disconnecting it, and will also lead to a game over.
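The last two rules above can be expressed as a small check. The sketch below is a simplified illustration and not how grid2op itself validates a topology: the function names and the "line" / "load" / "gen" labels are assumptions made for this example, and the substation contents are made up.

```python
def is_valid_busbar(kinds):
    """Check one busbar, given the kinds of the elements connected to it.

    `kinds` is a list of strings among "line", "load" and "gen"
    (hypothetical labels chosen for this sketch). A load or a generator
    needs at least one powerline on the same busbar; otherwise it is
    effectively disconnected, which leads to a game over.
    """
    has_line = "line" in kinds
    has_injection = "load" in kinds or "gen" in kinds
    return has_line or not has_injection


def is_valid_topology(busbars):
    """A substation topology is valid if each of its busbars is valid."""
    return all(is_valid_busbar(kinds) for kinds in busbars)


# Made-up substation with two powerlines, one load and one generator.
everything_together = [["line", "line", "load", "gen"], []]
load_isolated = [["line", "line", "gen"], ["load"]]
line_alone = [["line", "load", "gen"], ["line"]]

print(is_valid_topology(everything_together))  # valid
print(is_valid_topology(load_isolated))        # invalid: load without a powerline
print(is_valid_topology(line_alone))           # allowed here, though not recommended
```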

Note on the topology count

When trying to modify the topology of the powergrid, you can expect 1) highly nonlinear effects (which we will cover in the next notebook) and 2) a dramatic increase in the number of actions you can perform at each time step.

We want to emphasize this: Action spaces in power systems are really humongous.

Different possibilities per substation

As we covered in the previous subsection (Different configurations), the number of different configurations for a substation with k elements is on the order of $2^{k-1}$. For example, in the standard IEEE 118 grid used in many power system publications, a single substation has 17 elements. This leads to $\approx 2^{16} = 65{,}536$ configurations for this substation alone (i.e., if you only care about finding the right topology for this 17-element substation, you need to choose among more than $65{,}000$ possible actions).

Modifying all substations

But in theory, you could change the topology of not just one substation but of as many substations as you want (in theory; see the section Introduction of "operational constraints" in grid2op for more information). Say the first substation has 11 elements and the second one 17 elements.

You can choose among $\approx 2^{11-1}$ actions for the first substation and among $\approx 2^{17-1}$ actions for the second. This leads to $1{,}024 \times 65{,}536 \approx 67{,}000{,}000$ actions for the two substations. There are only 116 more to consider...
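The arithmetic of this two-substation example can be checked directly. The helper name `n_configurations` is hypothetical and simply applies the $2^{k-1}$ approximation used in this notebook.

```python
def n_configurations(n_elements):
    """Approximate number of two-busbar configurations of one substation."""
    return 2 ** (n_elements - 1)


first = n_configurations(11)   # substation with 11 elements
second = n_configurations(17)  # substation with 17 elements
both = first * second          # independent choices multiply

print(first, second, both)
```

Because the choices made at different substations are independent, the counts multiply, which is what makes the total grow so quickly with the number of substations.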

As an example, the table below sums up the total number of possible topological actions for some of the grids you can use in grid2op:

| grid2op env name | Number of substations | Number of powerlines | Total number of topologies |
| --- | --- | --- | --- |
| rte_case5_example | 5 | 8 | 31,320 |
| l2rpn_case14_sandbox | 14 | 20 | 1,397,519,564 |
| l2rpn_wcci_2020 | 36 | 59 | 1.88e+21 (*) |
| IEEE case 118 | 118 | 186 | 3.88e+76 (*) |

(*) These are approximations, of course (and some numerical overflows might have occurred during the computation)... These figures also do not take symmetries into account.

Connection / disconnection of powerlines

Of course, at each time step it is also possible to change the status of the powerlines, meaning connecting or disconnecting them. If there are N powerlines, the number of such possible actions is $2^N$.
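As a quick check of the $2^N$ formula, the snippet below evaluates it for the powerline counts listed in the table above (8, 20, 59 and 186). The dictionary is just a convenience for this sketch.

```python
# Number of powerlines per grid, as listed in the table above.
n_powerlines = {
    "rte_case5_example": 8,
    "l2rpn_case14_sandbox": 20,
    "l2rpn_wcci_2020": 59,
    "IEEE case 118": 186,
}

# Each powerline is either connected or disconnected: 2 choices per line.
n_line_actions = {name: 2 ** n for name, n in n_powerlines.items()}

for name, count in n_line_actions.items():
    print(f"{name}: {count:.2e}")
```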

Wrapping up: different number of "actions"

Finally, we can sum up the number of different actions in this table (be careful: this assumes that you are allowed to reconfigure everything at each time step):

| grid2op env name | Number of substations | Number of powerlines | Total number of topologies | Number of powerline actions |
| --- | --- | --- | --- | --- |
| rte_case5_example | 5 | 8 | 31,320 | 256 |
| l2rpn_case14_sandbox | 14 | 20 | 1,397,519,564 | 1,048,576 |
| l2rpn_wcci_2020 | 36 | 59 | 1.88e+21 (*) | 5.76e+17 (*) |
| IEEE case 118 | 118 | 186 | 3.88e+76 (*) | 9.81e+55 (*) |

(*) Again, these are approximations.

NB: This is way more than the number of actions of AlphaStar which, according to the DeepMind blog, is on the order of $10^{26}$.

Introduction of "operational constraints" in grid2op

In reality though, we modeled different types of "constraints" that make this total number of "actions" largely irrelevant. In the grid2op framework, we decided to enforce some operational constraints that dramatically reduce the action space described above. These limitations can be modified through the Parameters class.

On the number of moves someone can perform at the same time

In reality an operator would not switch off half the powerlines of the grid (this would surely lead to a blackout).

Nor would they act on more than one or two substations at the same time (this is probably limited more by the fact that humans cannot focus efficiently on many things at once).

So we decided to have 2 attributes of the Parameters class:

  • MAX_LINE_STATUS_CHANGED limits the number of powerline statuses you can change in a single action (it is usually set to 1)
  • MAX_SUB_CHANGED limits the number of substations you can modify at the same time (it is also usually set to 1)
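To make the effect of these two limits concrete, here is a minimal sketch in plain Python. It is not grid2op's implementation: the `Limits` dataclass is a hypothetical stand-in that only borrows the two attribute names from the Parameters class, and `respects_limits` is a made-up helper.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical stand-in for the two Parameters attributes above."""
    MAX_LINE_STATUS_CHANGED: int = 1
    MAX_SUB_CHANGED: int = 1


def respects_limits(lines_changed, subs_changed, limits):
    """Return True if an action touches few enough lines and substations.

    `lines_changed` and `subs_changed` are the ids of the powerlines whose
    status changes and of the substations whose topology changes.
    """
    return (
        len(set(lines_changed)) <= limits.MAX_LINE_STATUS_CHANGED
        and len(set(subs_changed)) <= limits.MAX_SUB_CHANGED
    )


limits = Limits()
print(respects_limits(lines_changed=[3], subs_changed=[], limits=limits))
print(respects_limits(lines_changed=[3, 5], subs_changed=[], limits=limits))
print(respects_limits(lines_changed=[3], subs_changed=[2], limits=limits))
print(respects_limits(lines_changed=[], subs_changed=[1, 2], limits=limits))
```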

On the frequency at which an object can change

Would you put this person in charge of the powergrid?

[Image credit: "Les Visiteurs", French fantasy comedy film directed by Jean-Marie Poiré, released in 1993]

Well... We think your answer is likely to be ... Not in charge of a real grid ...

And this has nothing to do with the clothing. Indeed, the more you operate a breaker, the faster you can expect it to age and the more fragile it becomes. Now imagine having to replace the breakers of the extra-high-voltage part of the powergrid every few months. If you cannot picture the problem this would cause, here is what a circuit breaker looks like:

[Image: a circuit breaker]

All this long introduction to say that there are 2 attributes of the Parameters class that limit how frequently you can act on the same object:

  • NB_TIMESTEP_COOLDOWN_LINE tells you how many timesteps you have to wait before redoing an action of type connection / disconnection of powerline (it is usually 3 when not 0)
  • NB_TIMESTEP_COOLDOWN_SUB indicates how many timesteps you have to wait before making an action on the same substation again (it is usually 3 too).

NB: These "cooldown" times also enforce some physical constraints. For example, there exist elements in real powergrids that cannot be opened / closed more than once every 5 or 10 minutes.
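The cooldown mechanism can be pictured with a small bookkeeping class. This is only a sketch under stated assumptions: `CooldownTracker` is a hypothetical name, and the exact counting convention (an object used at timestep t is blocked for the next `cooldown` timesteps) is an assumption of this example that may differ from grid2op's own convention.

```python
class CooldownTracker:
    """Hypothetical bookkeeping of cooldowns (not grid2op's implementation)."""

    def __init__(self, cooldown):
        self.cooldown = cooldown
        self.t = 0
        # object id -> first timestep at which the object may be used again
        self.blocked_until = {}

    def can_act(self, obj_id):
        return self.t >= self.blocked_until.get(obj_id, 0)

    def act(self, obj_id):
        if not self.can_act(obj_id):
            raise ValueError(f"{obj_id} is still in cooldown")
        # Assumption: the object is blocked for the next `cooldown` timesteps.
        self.blocked_until[obj_id] = self.t + self.cooldown + 1

    def next_step(self):
        self.t += 1


tracker = CooldownTracker(cooldown=3)
tracker.act("line_1")

availability = []
for _ in range(4):
    tracker.next_step()
    availability.append(tracker.can_act("line_1"))
print(availability)
```

Other objects are unaffected: in this sketch, acting on "line_1" does not prevent acting on "line_2" at the next timestep (subject to the limits on simultaneous changes discussed earlier).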

Total number of actions

This introduction is... already pretty long. So let's summarize the total number of actions that are feasible in grid2op, taking into account the limitations mentioned above:

| grid2op env name | Number of substations | Number of powerlines | Total number of topologies | Number of powerline actions |
| --- | --- | --- | --- | --- |
| rte_case5_example | 5 | 8 | 117 | 8 |
| l2rpn_case14_sandbox | 14 | 20 | 179 | 20 |
| l2rpn_wcci_2020 | 36 | 59 | 66,811 | 59 |
| IEEE case 118 | 118 | 186 | 72,150 | 186 |

In particular, if you are able to act both on the powerlines and on the substation topology, the total number of actions you might be able to perform is $72{,}150 \times 186 = 13{,}419{,}900$, which is already a pretty decent size for an action space (in comparison, in Go the action space counts 361 actions at most).
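One way to see why the constraints shrink the action space so much: when only one substation may be modified per action, the per-substation counts are added instead of multiplied. The sketch below illustrates this with the earlier two-substation example (11 and 17 elements) and the $2^{k-1}$ approximation; it is a rough illustration, not the computation used to produce the table. It also checks the product quoted above.

```python
import math

# Element counts of the two substations from the earlier example.
substation_sizes = [11, 17]
per_substation = [2 ** (k - 1) for k in substation_sizes]

# Reconfiguring every substation at once: independent choices multiply.
all_at_once = math.prod(per_substation)

# Reconfiguring a single substation per action: pick one substation,
# then one of its configurations, so the counts add up.
one_at_a_time = sum(per_substation)

print(all_at_once, one_at_a_time)

# Combined topology and powerline actions for the IEEE case 118 row.
combined_case118 = 72150 * 186
print(combined_case118)
```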

Comparison with "the reality"

We want to emphasize that this modeling is an (over)simplification of reality.

  • In reality there can also be "switches" that connect the two busbars (so reconfiguring the topology of a substation can sometimes be done with only one switch, but on the other hand, changing one switch will sometimes have no effect at all).

  • You can also have more than 2 busbars in each substation (sometimes 5 or 6, for example). This makes the number of possible topologies even higher than it is in grid2op.

  • Finally, most of the time a single busbar has a "switch" in its middle that allows one part of the elements connected to it to be disconnected from the other part. Basically, this entails that some combinations of elements are not possible.

And of course, this framework explicitly models (i.e., allows the agents to act on) only some elements of a powergrid. In reality, many more heterogeneous objects exist, with more complex properties.

We decided to make all these assumptions because we thought it was the simplest setting that still allows some topological reconfiguration to be performed, besides connecting / disconnecting powerlines.

So keep in mind that the tables presented above greatly underestimate the number of actions that companies in charge of the powergrid can take at a given time, mostly because a "real" grid can have more than $10{,}000$ substations, because many more objects are connected at each substation, and also because some substations can often be "split" into a dozen independent "buses".

II Summary of Reinforcement Learning

Though the Grid2Op package can be used to perform many different tasks, this set of notebooks will be focused on the machine learning part, and its usage in a Reinforcement learning framework.

Reinforcement learning is a framework that allows one to train an "agent" to solve a task in a time-dependent domain. We tried to cast the grid operation planning task into this framework. The Grid2Op package was inspired by it.

In reinforcement learning (RL), there are 2 distinct entities:

  • Environment: a model of the "world" in which the agent takes actions to achieve some predefined objectives.
  • Agent: will take actions on the environment that will have consequences.

These 2 entities exchange 3 main types of information:

  • Action: a piece of information sent by the Agent that will modify the internal state of the environment.
  • State / Observation: the (partial) view of the environment as seen by the Agent. The Agent receives a new state after each action it takes. It can use the observation (state) at time step t to take an action at time t.
  • Reward: the score received by the agent for its previous action. It is the reward that defines the objectives of the agent (the goal of the agent is to maximize the reward it receives over time).
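The exchange of actions, observations and rewards described above can be sketched as a simple loop. The environment and agent below are toy, hypothetical classes written for this illustration (they are not grid2op classes): the state is an integer that drifts randomly, and the agent tries to keep it close to 0.

```python
import random


class ToyEnvironment:
    """Hypothetical environment: an integer state with random drift."""

    def reset(self):
        self.state = 0
        return self.state  # first observation

    def step(self, action):
        # The action (-1, 0 or +1) modifies the internal state,
        # which is also perturbed by a random drift.
        self.state += action + random.choice([-1, 0, 1])
        reward = -abs(self.state)    # being close to 0 is rewarded
        done = abs(self.state) > 5   # "game over" if the state drifts too far
        return self.state, reward, done


class ToyAgent:
    """Hypothetical agent: pushes the state back toward 0."""

    def act(self, observation):
        if observation > 0:
            return -1
        if observation < 0:
            return 1
        return 0


def run_episode(env, agent, max_steps=50):
    observation = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = agent.act(observation)               # agent -> environment
        observation, reward, done = env.step(action)  # environment -> agent
        total_reward += reward
        if done:
            break
    return total_reward


random.seed(0)
total = run_episode(ToyEnvironment(), ToyAgent())
print("cumulated reward:", total)
```

In a learning setting, the agent would also use the rewards it receives to improve how it chooses actions; this sketch only shows the interaction loop.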

A schematic representation of this is shown in the figure below (Credit: Sutton & Barto):


III Other materials

Other useful information is provided in the white paper Reinforcement Learning for Electricity Network Operation, presented for the L2RPN 2020 NeurIPS edition.