In this tutorial, we introduce the EnergyFlow package for computing arbitrary Energy Flow Polynomials (EFPs) and collections of EFPs. The package is built with both usability, flexibility, and computational efficiency in mind.
For a collection of $M$ particles with energy measure $z_i$ and pairwise angular distance measures $\theta_{ij}$, the EFPs are multiparticle energy correlator observables, indexed by multigraphs $G$, and defined as: $$ \mathrm{EFP}_G = \sum_{i_1=1}^M \sum_{i_2=1}^M\cdots \sum_{i_N=1}^M z_{i_1}z_{i_2}\cdots z_{i_N} \prod_{(k,\ell)\in G} \theta_{i_ki_\ell},$$ where $(k,\ell)\in G$ iterates over the edges in the multigraph.
The specific choice of energy and angular measures depends on the collider context. We provide the following measure
options (default is hadrdot
):
hadr
: $$ z_i = p_{T,i}^\kappa,\,\,\,\,\,\, \theta_{ij} = ((y_i - y_j)^2 + (\phi_i - \phi_j)^2)^{\beta/2} $$
hadrdot
: $$ z_i = p_{T,i}^\kappa,\,\,\,\,\,\, \theta_{ij} = (2\, \hat p_i^\mu \hat p_{i\,\mu})^{\beta/2},\,\,\,\,\,\, \mathrm{where }\,\,\,\,\,\,\hat p_i^\mu \equiv p_i^\mu/p_{T,i} $$
ee
: $$ z_i = E_i^\kappa,\,\,\,\,\,\, \theta_{ij} = (2\, \hat p_i^\mu \hat p_{i\,\mu})^{\beta/2},\,\,\,\,\,\, \mathrm{where }\,\,\,\,\,\,\hat p_i^\mu \equiv p_i^\mu/E_i $$
The energy and angular weighting parameters kappa
and beta
default to $\kappa=1$ and $\beta = 2$. The choice of $\kappa = 1$ is required for infrared and collinear (IRC) safety of the observables. Any choice of $\beta > 0$ guarantees IRC safety. We also provide the normed
option (default is True
) to use a normalized and dimensionless energy measure $z_i/\sum_{j=1}^M z_j$ rather than $z_i$.
With this refresher, we have enough to begin using EnergyFlow to compute arbitrary EFPs! Ensure you have EnergyFlow installed. It is easily pip
installable by executing:
pip install energyflow
We start by importing the EnergyFlow package as well as some other helpful Python libraries.
import energyflow as ef
import numpy as np
Let's get some events. Typically these would be read in from your favorite event generator for your physics case of interest.
For the purposes of this tutorial, we will uniformly sample massless $M$-body phase space with the use of our implementation of the RAMBO algorithm via the function ef.gen_massless_phase_space
. It returns nevents
events consisting of nparticles
massless four-momenta with center of mass energy of energy
in the center of momentum frame. In general, EnergyFlow supports events as arrays of four-momenta [E,px,py,pz]
or arrays of hadronic coordinates [pT,y,phi]
for hadronic measures.
Let's generate 50 events with 20 particles each at center of mass energy 100 GeV.
events = ef.gen_massless_phase_space(nevents=50, nparticles=20, energy=100)
To specify a particular EFP to be computed for our events, we must simply specify the corresponding multigraph.
In EnergyFlow, multigraphs are specified as lists of edges, where edges are pairs of vertices. Here are several examples of graphs given as edge lists, where we label the vertices with integers from $0$ to $N-1$.
line = [(0,1)]
wedge = [(0,1), (0,2)]
triangle = [(0,1), (0,2), (1,2)]
square = [(0,1), (1,2), (2,3), (3,1)]
Multigraphs can have multiple edges per pair of vertices. Here are several examples of multigraphs as edgelists, with one or more doubled edges.
multiline = [(0,1), (0,1)]
multiwedge1 = [(0,1), (0,1), (0,2)]
multiwedge2 = [(0,1), (0,1), (0,2), (0,2)]
multitriangle1 = [(0,1), (0,2), (0,2), (1,2)]
In fact, arbitrary objects can be used to label the vertices. We typically use integer-labeling for simplicity and readibility, though this is not required. The following two edge lists both define the same graph from the perspective of EnergyFlow.
graph_LHC = [('atlas','cms'),('atlas','cms'),('atlas','lhcb'),('cms','lhcb'),('lhcb','alice')]
graph_ints = [(0,1),(0,1),(0,2),(1,2),(2,3)]
EFPs are defined by passing a graph and measure choies to ef.EFP
. The compute
method of ef.EFP
can then be called on an event in order to compute the EFP on that event.
For concreteness, let's begin by defining the EFP corresponding to the line graph [(0,1)]
(one edge connecting two vertices) which should be equal to twice the squared center of mass energy of the event with a suitable choice of measure parameters.
# Computing a single EFP on a single event
# specify a graph and event
graph = [(0,1)]
event = events[0]
# define the EFP corresponding to the specified graph
EFP_graph = ef.EFP(graph, measure='hadrdot', beta=2, normed=False, coords='epxpypz')
# compute the EFP on the specified event
result = EFP_graph.compute(event)
print("EFP Value:", result)
EFP Value: 19999.999999999993
We can see that the value of this EFP is indeed $\mathrm{EFP}_{[(0,1)]} = 2\times E_\mathrm{CM}^2 = 2\times (100\,\mathrm{GeV})^2 = 20\,000\, \mathrm{GeV}^2$, as expected for our events.
The framework above can be immediately extended to computing the value of an EFP on a collection of many events. This can either be done with list comprehension or even more simply using the batch_compute
method, which tries to use as many processes as there are CPUs in the machine. The number of worker processes can be controlled by passing in n_jobs
.
# Computing a single EFP on many events
# specify a graph
graph = [(0,1), (0,2), (0,2), (1,2)]
# define the EFP corresponding to the specified graph
efp_graph = ef.EFP(graph, measure='hadr', beta=1, normed=True)
# compute the EFP on the collection of events with list comprehension
results = np.asarray([efp_graph.compute(event) for event in events])
print("EFP w. list comprehension:", results)
EFP w. list comprehension: [ 410.89816574 2729.30841947 1405.84627912 5499.17017239 5020.55782768 1611.06862123 4439.88653423 2145.62086557 2343.14870895 638.33009236 4667.05882859 1664.01582395 1968.89508412 1597.20768481 3420.74774541 8538.19219307 437.71338479 1210.09001188 780.02457182 2924.27631757 383.17265712 3246.23472963 469.16416123 3092.43821852 1268.80016388 4700.20244546 2107.83309722 1024.96296112 2484.42211565 679.25199535 4193.25578551 302.61963128 774.84572218 3692.18469762 13791.99683833 2428.4647007 2336.81681925 1345.07043461 9644.01252448 562.55177915 519.71366825 2214.86688602 2610.89117682 1466.50508983 2286.28548137 2401.37933511 2404.73871182 1664.4322198 1302.25035557 3029.57234645]
If we are interested in using the spanning properties of EFPs, we typically want to compute large numbers of EFPs. The EnergyFlow package does the heavy lifting for you! It contains information about all multigraphs with up to 10 edges, which can be easily and intuitively accessed using EFPSet
.
Relevant graph quantities which can easily be used to select a set of multigraphs are summarized below. More options are available: for a full list see the documentation.
n
: Number of vertices in the multigraph.
d
: Degree, or number of edges in the multigraph.
v
: Maximum valency (number of edges touching a vertex).
c
: Variable Elimination computational complexity $\mathcal O(M^c)$
p
: Number of prime factors (or connected components of the multigraph).
As a basis, EFPs are typically organized by the number of edges d
. Not only does this correspond to the degree of the polynomial, but there are also a finite number of multigraphs up to a specified degree d
. Lets get all EFPs with up to five edges by passing d<=5
to EFPSet
and compute them on our events!
# get all EFPs with d<=5 (up to d<=10 available by default)
efpset = ef.EFPSet('d<=5', measure='hadr', beta=1, normed=True, verbose=True)
# compute their values on our events
results = np.asarray([efpset.compute(event) for event in events])
print("Results:", results)
Originally Available EFPs: Prime: 23690 Composite: 21540 Total: 45230 Current Stored EFPs: Prime: 53 Composite: 48 Total: 101 Results: [[3.90550800e+00 2.16316980e+01 1.44384985e+02 ... 1.28861507e+03 9.87233332e+02 9.08631223e+02] [6.52840850e+00 6.58316927e+01 7.97008752e+02 ... 1.83171115e+04 1.28596703e+04 1.18586871e+04] [5.38216351e+00 4.49973688e+01 4.72965232e+02 ... 7.01548629e+03 4.90667406e+03 4.51631719e+03] ... [5.73676667e+00 4.76448286e+01 4.75423962e+02 ... 8.99533469e+03 6.63865977e+03 6.21349467e+03] [5.06673559e+00 4.06291743e+01 4.17566823e+02 ... 5.28472897e+03 3.81526658e+03 3.33919056e+03] [6.67712759e+00 6.53690256e+01 7.81348061e+02 ... 1.94599194e+04 1.42460112e+04 1.32723668e+04]]
We can very easily do much more sophisticated EFP selections using any of the graph quantities. For example to select all prime EFPs with at most 4 vertices and at most 5 edges, we simply use EFPSet
with the following intuitive syntax:
# get all EFPs with n<=4, d<=5, that are prime (i.e. p==1)
efpset = ef.EFPSet(('n<=',6), ('d<=',5), ('p==',1), measure='hadr', beta=1, normed=True, verbose=True)
# compute their values on our events
results = np.asarray([efpset.compute(event) for event in events])
Originally Available EFPs: Prime: 23690 Composite: 21540 Total: 45230 Current Stored EFPs: Prime: 53 Composite: 0 Total: 53
To study the properties of an individual EFP within EFPSet
we can easily do this using specs
and graphs
. Suppose we are interested in the 95th EFP. We can find out its graph and all of its relevant information simply with the following syntax. Here we show only a subset of all the information that specs provides about an EFP: for a full list see the documentation.
efpset = ef.EFPSet('d<=5', measure='hadr', beta=1, normed=True, verbose=True)
ind = 95
graph = efpset.graphs(ind)
n, _, d, v, _, c, p, _ = efpset.specs[ind]
print("Graph:", graph)
print("Number of vertices, n:", n)
print("Number of edges, d:", d)
print("Maximum valency, v:", v)
print("VE complexity, c:", c)
print("Number of primes, p:", p)
Originally Available EFPs: Prime: 23690 Composite: 21540 Total: 45230 Current Stored EFPs: Prime: 53 Composite: 48 Total: 101 Graph: [(0, 1), (2, 3), (4, 5), (4, 6), (4, 7)] Number of vertices, n: 8 Number of edges, d: 5 Maximum valency, v: 3 VE complexity, c: 2 Number of primes, p: 3
If you want to specify your own custom measure for $z_i$ and $\theta_{ij}$ to be used in the EFP formula, that's also possible within the EnergyFlow package. You simply compute your own custom-defined zs
and thetas
on the events and pass them to the compute
methods.
We demonstrate this below by using random numbers for zs
and thetas
, which can be replaced with any custom values.
# zs and thetas can be passed in explicitly if you want to use a custom measure
(zs, thetas) = (np.random.rand(100,25), np.random.rand(100,25,25))
results = np.asarray([efpset.compute(zs=z, thetas=theta) for (z,theta) in zip(zs,thetas)])
print(results)
[[5.84562543e+01 3.83445229e+01 2.84429674e+01 ... 7.65942713e+06 6.32247695e+07 6.82582135e+08] [8.44642656e+01 5.54759810e+01 4.08175307e+01 ... 3.34290489e+07 3.38025843e+08 4.29897630e+09] [5.02285473e+01 3.38970445e+01 2.57924291e+01 ... 4.29549963e+06 3.24553405e+07 3.19707694e+08] ... [6.22437927e+01 4.13875391e+01 3.10700569e+01 ... 9.98062519e+06 8.48302112e+07 9.34286853e+08] [1.00674840e+02 6.79039536e+01 5.13063021e+01 ... 6.92879816e+07 7.41633349e+08 1.03420051e+10] [6.85991243e+01 4.74525222e+01 3.66658850e+01 ... 1.53184568e+07 1.32060566e+08 1.51912272e+09]]
And that's it! Now you should be able to specify any EFP (i.e. multigraph) or set of EFPs that you want to comput with EFP
or EFPSet
, compute them on a set of events with compute
or batch_compute
, and study the results with specs
and graphs
! As always, see the documentation for a full description of the EnergyFlow package.