#!/usr/bin/env python
# coding: utf-8

# # NanoEvents awkward1 demo
#
# NanoEvents is a Coffea utility to wrap the CMS [NanoAOD](https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_06021.pdf) or similar flat nTuple structure into a single awkward array with appropriate object methods (such as Lorentz vector methods), cross-references, and nested objects, all lazily accessed from the source ROOT TTree via uproot.
#
# NanoEvents is in an **experimental** stage. It has been available in awkward0 for about 6 months and was only recently ported to awkward1. Here we demo an awkward1-based NanoEvents array.
#
# It can be instantiated using the [NanoEventsFactory](https://coffeateam.github.io/coffea/api/coffea.nanoevents.NanoEventsFactory.html#nanoeventsfactory):

# In[ ]:


import awkward1 as ak
from coffea.nanoevents import NanoEventsFactory

fname = "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root"
cache = {}
factory = NanoEventsFactory(fname, cache=cache)
events = factory.events()


# The `events` object is an awkward array, which at its top level is a record array (one record per event) with one field for each "collection", where a collection is a grouping of column (TBranch) names, categorized based on the available columns as follows:
#
# * one branch exists named `name` and no branches start with `name_`, interpreted as a single flat array;
# * one branch exists named `name`, one named `n{name}`, and no branches start with `name_`, interpreted as a single jagged array;
# * no branch exists named `n{name}` and many branches start with `name_*`, interpreted as a flat table; or
# * one branch exists named `n{name}` and many branches start with `name_*`, interpreted as a jagged table.
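#
# The four grouping rules above can be sketched in plain Python. This is a simplified, hypothetical reading of the rules (`classify_collections` is not part of coffea), assuming a collection's fields are exactly the branches named `{name}_*`; the real NanoEventsFactory may treat edge cases differently.

# In[ ]:


def classify_collections(branches):
    """Group flat TBranch names into collections using the naming rules above.

    Hypothetical illustration only, not coffea's implementation.
    Returns {collection_name: (kind, sorted_field_names)}.
    """
    branches = set(branches)
    # fields of table-like collections: everything after the first "_"
    prefixed = {}
    for b in branches:
        if "_" in b:
            name, field = b.split("_", 1)
            prefixed.setdefault(name, []).append(field)
    # candidate names: table prefixes plus bare (underscore-free) branches
    names = set(prefixed) | {b for b in branches if "_" not in b}
    # counter branches such as `nJet` describe another collection, not their own
    counters = {"n" + n for n in names if "n" + n in branches}
    result = {}
    for name in sorted(names - counters):
        jagged = ("n" + name) in branches
        if name in prefixed:
            kind = "jagged table" if jagged else "flat table"
            result[name] = (kind, sorted(prefixed[name]))
        else:
            kind = "jagged array" if jagged else "flat array"
            result[name] = (kind, [])
    return result


# expected: Generator -> flat table, Jet -> jagged table,
# LHEPdfWeight -> jagged array, run -> flat array (nJet, nLHEPdfWeight absorbed)
classify_collections(
    ["run", "nJet", "Jet_pt", "Jet_eta", "Generator_id1", "Generator_weight",
     "nLHEPdfWeight", "LHEPdfWeight"]
)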
#
# *Any ROOT TTree that follows such a naming convention should be readable as a NanoEvents array.*
#
# For example, in the file we opened, the branches:
# ```
# Generator_binvar
# Generator_scalePDF
# Generator_weight
# Generator_x1
# Generator_x2
# Generator_xpdf1
# Generator_xpdf2
# Generator_id1
# Generator_id2
# ```
# are grouped into one sub-record named `Generator`, which can be accessed using either getitem or getattr syntax, i.e. `events["Generator"]` or `events.Generator`.

# In[ ]:


events.Generator.id1


# In[ ]:


# all column names can be listed with:
events.Generator.columns


# In[ ]:


# In CMS NanoAOD, each TBranch title is a help string, which is carried into the NanoEvents
# e.g. executing the following cell should produce a help pop-up "id of first parton"
# (nbconvert's rendering of the notebook cell `events.Generator.id1?`)
get_ipython().run_line_magic('pinfo', 'events.Generator.id1')


# Based on a collection's name, some collections acquire additional _methods_, which are extra features exposed by the code in the mixin classes of the [coffea.nanoevents.methods](https://coffeateam.github.io/coffea/modules/coffea.nanoevents.methods.html) module. For example, although `events.GenJet` has the columns:

# In[ ]:


events.GenJet.columns


# we can access additional attributes associated with each generated jet because they can be interpreted as Lorentz vectors:

# In[ ]:


events.GenJet.energy


# We can call more complex methods, like computing the distance $\Delta R = \sqrt{\Delta \eta^2 + \Delta \phi ^2}$ between two LorentzVector objects:

# In[ ]:


# find distance between leading jet and all electrons in each event
events.Jet[:, 0].delta_r(events.Electron)


# The mapping from collection name to methods is controlled by [NanoEventsFactory.default_mixins](https://coffeateam.github.io/coffea/api/coffea.nanoevents.NanoEventsFactory.html#coffea.nanoevents.NanoEventsFactory.default_mixins) and can be overridden with new mappings in the NanoEventsFactory constructor, if desired.
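#
# The `delta_r` method used above implements the $\Delta R$ formula given earlier. As an illustration only (plain Python with hypothetical helper names, not coffea's implementation), here it is for a single pair of directions. It shows the one subtlety the formula glosses over: since $\phi$ is an angle, $\Delta\phi$ should be wrapped into $[-\pi, \pi)$ before squaring.

# In[ ]:


import math


def delta_phi(phi1, phi2):
    """Azimuthal difference phi1 - phi2, wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(deta^2 + dphi^2) between two (eta, phi) directions."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


# directions on either side of phi = +-pi are close (about 0.083), not ~6.2 apart
delta_r(0.0, 3.1, 0.0, -3.1)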
# Additional methods provide convenience functions for interpreting some branches, e.g.:

# In[ ]:


# unpacked Jet_jetId flags
events.Jet.isTight


# In[ ]:


# unpacked GenPart_statusFlags
events.GenPart.hasFlags(['isPrompt', 'isLastCopy'])


# CMS NanoAOD also contains pre-computed cross-references for some types of collections. For example, the TBranch `Electron_genPartIdx` gives, for each electron, the index of its matched generated particle in the same event's `GenPart` collection, or `-1` if no match is found. NanoEvents transforms these indices into an awkward _indexed array_ pointing to the collection, so that one can directly access the matched particle using getattr syntax:

# In[ ]:


events.Electron.matched_gen.pdgId


# In[ ]:


events.Muon.matched_jet.pt


# For generated particles, the parent index is similarly mapped:

# In[ ]:


events.GenPart.parent.pdgId


# Using the parent index, a helper method also computes the inverse mapping, `children`, so one can find particle siblings with:

# In[ ]:


events.GenPart.parent.children.pdgId
# notice this is a doubly-jagged array


# Since one often wants to skip over repeated particles in a decay sequence, a helper method `distinctParent` is also available.
# Here we use it to find the parent particle ID for all prompt electrons:

# In[ ]:


events.GenPart[
    (abs(events.GenPart.pdgId) == 11)
    & events.GenPart.hasFlags(['isPrompt', 'isLastCopy'])
].distinctParent.pdgId


# Events can be filtered like any other awkward array using boolean fancy-indexing:

# In[ ]:


mmevents = events[ak.num(events.Muon) == 2]
zmm = mmevents.Muon[:, 0] + mmevents.Muon[:, 1]
zmm.mass


# One can assign new variables to the arrays, with some caveats:
#
# * Assignment must use setitem (`events["path", "to", "name"] = value`)
# * Assignment to a sliced `events` won't be accessible from the original variable
# * New variables are not visible from cross-references

# In[ ]:


mmevents["Electron", "myvar2"] = mmevents.Electron.pt + zmm.mass
mmevents.Electron.myvar2


# Just to demonstrate that everything is lazily accessed, here are all the cache items that have built up through the execution of this demo:

# In[ ]:


print("\n".join(sorted(cache.keys())))
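

# The dimuon `zmm.mass` above comes from adding two Lorentz vectors component-wise and taking the invariant mass of the sum. A plain-Python sketch of that arithmetic, assuming each muon is given as `(pt, eta, phi, mass)` in GeV as in NanoAOD; `to_cartesian` and `pair_mass` are hypothetical helpers, not coffea's API:

# In[ ]:


import math


def to_cartesian(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to (px, py, pz, E)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return px, py, pz, energy


def pair_mass(v1, v2):
    """Invariant mass of the sum of two (pt, eta, phi, mass) vectors."""
    # add the two four-vectors component by component
    px, py, pz, e = (a + b for a, b in zip(to_cartesian(*v1), to_cartesian(*v2)))
    # clamp tiny negative values caused by floating-point rounding
    return math.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))


# two massless, back-to-back 45 GeV muons give a pair mass of 90 GeV
pair_mass((45.0, 0.0, 0.0, 0.0), (45.0, 0.0, math.pi, 0.0))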