NanoEvents awkward1 demo

NanoEvents is a Coffea utility to wrap the CMS NanoAOD or similar flat nTuple structure into a single awkward array with appropriate object methods (such as Lorentz vector methods), cross references, and nested objects, all lazily accessed from the source ROOT TTree via uproot.

NanoEvents is at an experimental stage. It has been available for awkward0 for about 6 months and was recently ported to awkward1. Here we demo an awkward1-based NanoEvents array.

It can be instantiated using the NanoEventsFactory:

In [ ]:
import awkward1 as ak
from coffea.nanoevents import NanoEventsFactory

fname = ""
cache = {}
factory = NanoEventsFactory(fname, cache=cache)
events = factory.events()

The events object is an awkward array, which at its top level is a record array with one record for each "collection", where a collection is a grouping of column (TBranch) names, categorized based on the available columns as follows:

  • one branch exists named name and no branches start with name_, interpreted as a single flat array;
  • one branch exists named name, one named n{name}, and no branches start with name_, interpreted as a single jagged array;
  • no branch exists named n{name} and many branches start with name_*, interpreted as a flat table; or
  • one branch exists named n{name} and many branches start with name_*, interpreted as a jagged table.

Any ROOT TTree that follows such a naming convention should be readable as a NanoEvents array.
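
The four rules above can be sketched in plain Python. This is only an illustration of the naming convention, not the code NanoEvents runs; the function name `classify_collections` and the example branch names are made up for the sketch, and a `name` branch that coexists with `name_*` branches (a case the rules do not cover) is simply treated as a table here.

```python
from collections import defaultdict


def classify_collections(branch_names):
    """Map each collection name to its kind, following the four naming rules.

    Hypothetical helper for illustration; not part of coffea.
    """
    names = set(branch_names)

    # Group "name_field" branches by the prefix before the first underscore.
    prefixed = defaultdict(list)
    for branch in names:
        if "_" in branch:
            prefix, _, field = branch.partition("_")
            prefixed[prefix].append(field)

    # Every prefix and every underscore-free branch is a candidate collection.
    candidates = set(prefixed) | {b for b in names if "_" not in b}
    # A candidate "nX" is a counter branch if "X" is itself a candidate.
    counters = {c for c in candidates if c.startswith("n") and c[1:] in candidates}

    kinds = {}
    for name in candidates - counters:
        has_counter = ("n" + name) in names
        if name in prefixed:
            kinds[name] = "jagged table" if has_counter else "flat table"
        else:
            kinds[name] = "jagged array" if has_counter else "flat array"
    return kinds


# Made-up branch list that exercises all four rules.
example_branches = [
    "run",
    "nPSWeight", "PSWeight",
    "Generator_x1", "Generator_x2",
    "nJet", "Jet_pt", "Jet_eta",
]
for name, kind in sorted(classify_collections(example_branches).items()):
    print(f"{name}: {kind}")
```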

For example, in the file we opened, the branches:


are grouped into one sub-record named Generator which can be accessed using either getitem or getattr syntax, i.e. events["Generator"] or events.Generator.

In [ ]:
In [ ]:
# all column names can be listed with:
In [ ]:
# In CMS NanoAOD, each TBranch title is a help string, which is carried into the NanoEvents
# e.g. executing the following cell should produce a help pop-up "id of first parton"

Based on a collection's name, some collections acquire additional methods, which are extra features exposed by the code in the mixin classes of the nanoaod.methods module. For example, although events.GenJet has the columns:

In [ ]:

we can access additional attributes associated to each generated jet by virtue of the fact that they can be interpreted as Lorentz vectors:

In [ ]:
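
To make "interpreted as Lorentz vectors" concrete, here is a minimal sketch of the arithmetic involved, assuming the jet is described by `pt`, `eta`, `phi` and `mass` values (an assumption about the columns, which are not listed above). The function name is hypothetical and this is not the coffea implementation.

```python
import math


def cartesian_from_ptetaphim(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) into (px, py, pz, energy).

    Hypothetical helper for illustration only.
    """
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return px, py, pz, energy


# A massless jet at eta = 0 pointing along the x axis.
px, py, pz, energy = cartesian_from_ptetaphim(10.0, 0.0, 0.0, 0.0)
print(px, py, pz, energy)
```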

We can call more complex methods, like computing the distance $\Delta R = \sqrt{\Delta \eta^2 + \Delta \phi ^2}$ between two LorentzVector objects:

In [ ]:
# find distance between leading jet and all electrons in each event
events.Jet[:, 0].delta_r(events.Electron)
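
For reference, the $\Delta R$ formula can be written out for a single pair of objects as below. This is a scalar sketch with hypothetical function names, not the array-based method used above; the one subtlety it shows is that $\Delta \phi$ has to be wrapped into $[-\pi, \pi)$ before squaring.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the (eta, phi) plane between two directions."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


# Two objects on either side of the phi = +-pi boundary are close together,
# even though their raw phi values differ by 6 radians.
print(delta_r(0.0, 3.0, 0.0, -3.0))
```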

The mapping from collection name to methods is controlled by NanoEventsFactory.default_mixins and can be overridden with new mappings in the NanoEventsFactory constructor, if desired. Additional methods provide convenience functions for interpreting some branches, e.g.

In [ ]:
# unpacked Jet_jetId flags
In [ ]:
# unpacked GenPart_statusFlags
events.GenPart.hasFlags(['isPrompt', 'isLastCopy'])
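
Conceptually, a flag check of this kind is a bitmask test on the packed integer. The sketch below shows the idea for a single integer value; the bit order in `GEN_STATUS_FLAGS` is an assumption taken from the usual NanoAOD description of GenPart_statusFlags and should be checked against the TBranch title of the file in use. The function `has_flags` is a hypothetical stand-in, not the coffea method.

```python
# Assumed bit order for GenPart_statusFlags (bit 0 first); verify against
# the branch help string before relying on it.
GEN_STATUS_FLAGS = [
    "isPrompt",
    "isDecayedLeptonHadron",
    "isTauDecayProduct",
    "isPromptTauDecayProduct",
    "isDirectTauDecayProduct",
    "isDirectPromptTauDecayProduct",
    "isDirectHadronDecayProduct",
    "isHardProcess",
    "fromHardProcess",
    "isHardProcessTauDecayProduct",
    "isDirectHardProcessTauDecayProduct",
    "fromHardProcessBeforeFSR",
    "isFirstCopy",
    "isLastCopy",
    "isLastCopyBeforeFSR",
]


def has_flags(status_flags, *names):
    """Return True if every named flag bit is set in the packed integer."""
    mask = 0
    for name in names:
        mask |= 1 << GEN_STATUS_FLAGS.index(name)
    return (status_flags & mask) == mask


packed = (1 << 0) | (1 << 13)  # isPrompt and isLastCopy set
print(has_flags(packed, "isPrompt", "isLastCopy"))
print(has_flags(packed, "isPrompt", "isFirstCopy"))
```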

CMS NanoAOD also contains pre-computed cross-references for some types of collections. For example, there is a TBranch Electron_genPartIdx which indexes the GenPart collection per event to give the matched generated particle, and -1 if no match is found. NanoEvents transforms these indices into an awkward indexed array pointing to the collection, so that one can directly access the matched particle using getattr syntax:

In [ ]:
In [ ]:
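
The index-to-object transformation can be pictured with plain nested lists: each per-event index selects an entry of the target collection in the same event, and -1 becomes a missing value. The data and the function name `match_by_index` below are invented for the sketch; NanoEvents does this with awkward indexed arrays rather than Python loops.

```python
def match_by_index(indices_per_event, targets_per_event):
    """Replace each index by the target it points to, or None for -1.

    Hypothetical helper for illustration only.
    """
    matched = []
    for indices, targets in zip(indices_per_event, targets_per_event):
        matched.append([targets[i] if i >= 0 else None for i in indices])
    return matched


# Invented toy data: three events.
electron_genpart_idx = [[2, -1], [], [0]]
genpart_pdg_id = [[21, 22, 11], [21], [-11, 13]]

print(match_by_index(electron_genpart_idx, genpart_pdg_id))
```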

For generated particles, the parent index is similarly mapped:

In [ ]:

In addition, using the parent index, a helper method computes the inverse mapping, namely, children. As such, one can find particle siblings with:

In [ ]:
# notice this is a doubly-jagged array
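
The inverse mapping is straightforward to sketch for a single event: walk over the parent indices once and append each particle to its parent's list of children. Looking up the children of each particle's parent then gives one list per particle, which is why the result gains an extra level of nesting. Function names and data are hypothetical; note that a sibling list built this way includes the particle itself.

```python
def children_from_parents(parent_idx):
    """Invert a parent-index list into a list of child indices per particle."""
    children = [[] for _ in parent_idx]
    for child, parent in enumerate(parent_idx):
        if parent >= 0:
            children[parent].append(child)
    return children


def siblings(parent_idx):
    """For each particle, the children of its parent (empty if no parent)."""
    children = children_from_parents(parent_idx)
    return [children[p] if p >= 0 else [] for p in parent_idx]


# Invented single-event example: particle 0 has no parent.
parent_idx = [-1, 0, 0, 1]
print(children_from_parents(parent_idx))
print(siblings(parent_idx))
```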

Since one often wants to skip over repeated copies of the same particle in a decay sequence, a helper method distinctParent is also available. Here we use it to find the parent particle ID for all prompt electrons:

In [ ]:
events.GenPart[
    (abs(events.GenPart.pdgId) == 11)
    & events.GenPart.hasFlags(['isPrompt', 'isLastCopy'])
].distinctParent.pdgId
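
The idea behind a distinct-parent lookup can be sketched for one event as follows: keep following the parent index while the parent has the same PDG ID as the particle we started from. This is an illustration under that assumption about what "distinct" means, with a hypothetical function name and invented data; it is not the coffea implementation.

```python
def distinct_parent(pdg_id, parent_idx, i):
    """Index of the first ancestor of particle i with a different PDG ID.

    Returns -1 if no such ancestor exists.
    """
    parent = parent_idx[i]
    while parent >= 0 and pdg_id[parent] == pdg_id[i]:
        parent = parent_idx[parent]
    return parent


# Invented event: Z (23) -> e- (11) -> e- (11, a repeated copy), and Z -> e+ (-11).
pdg_id = [23, 11, 11, -11]
parent_idx = [-1, 0, 1, 0]

# The last-copy electron (index 2) skips its own earlier copy and lands on the Z.
print(pdg_id[distinct_parent(pdg_id, parent_idx, 2)])
```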

Events can be filtered like any other awkward array using boolean fancy-indexing:

In [ ]:
mmevents = events[ak.num(events.Muon) == 2]
zmm = mmevents.Muon[:, 0] + mmevents.Muon[:, 1]
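
The same selection and pair mass can be spelled out with plain lists to show what the two lines above compute. The sketch assumes each muon is given as a `(pt, eta, phi)` tuple and neglects the muon mass, in which case the pair mass is $m^2 = 2\, p_{T1}\, p_{T2}\, (\cosh \Delta\eta - \cos \Delta\phi)$. The massless approximation, the data and the names are choices made for this example; the vector sum above uses the full four-vectors.

```python
import math


def dimuon_mass(mu1, mu2):
    """Invariant mass of two massless particles given as (pt, eta, phi)."""
    pt1, eta1, phi1 = mu1
    pt2, eta2, phi2 = mu2
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding


# Invented toy data: one list of muons per event.
muons_per_event = [
    [(45.0, 0.0, 0.0), (45.0, 0.0, math.pi)],  # back-to-back pair
    [(30.0, 1.2, 0.5)],                        # only one muon
    [],                                        # no muons
]

# Boolean mask with one entry per event, then keep the events where it is True.
mask = [len(muons) == 2 for muons in muons_per_event]
selected = [muons for muons, keep in zip(muons_per_event, mask) if keep]
masses = [dimuon_mass(muons[0], muons[1]) for muons in selected]
print(masses)
```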

One can assign new variables to the arrays, with some caveats:

  • Assignment must use setitem (events["path", "to", "name"] = value)
  • Assignment to a sliced events array won't be accessible from the original variable
  • New variables are not visible from cross-references
In [ ]:
mmevents["Electron", "myvar2"] = + zmm.mass

Just to demonstrate that everything is lazily accessed, here are all the cache items that have built up through the execution of this demo:

In [ ]:
sorted(cache.keys())
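
The pattern behind this can be sketched without any ROOT file: columns are only read when first requested, and each read leaves an entry in a shared dictionary. The class `LazyColumns`, the loader functions and the column names are all hypothetical and only mimic the behaviour; the keys NanoEvents actually stores in its cache will look different.

```python
class LazyColumns:
    """Read columns on first access and remember them in a user-supplied cache."""

    def __init__(self, loaders, cache):
        self._loaders = loaders  # column name -> zero-argument function
        self._cache = cache      # plain dict shared with the caller

    def __getitem__(self, name):
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]


reads = []  # records every simulated read so repeated access can be checked


def make_loader(name, values):
    def load():
        reads.append(name)
        return values
    return load


demo_cache = {}
columns = LazyColumns(
    {
        "Muon_pt": make_loader("Muon_pt", [45.0, 45.0]),
        "Jet_pt": make_loader("Jet_pt", [120.0]),
    },
    demo_cache,
)

columns["Muon_pt"]
columns["Muon_pt"]  # second access is served from the cache
print(sorted(demo_cache), reads)
```

Only the column that was touched shows up in the cache, and it was read once.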