#!/usr/bin/env python
# coding: utf-8

# # NanoEvents awkward1 demo
# 
# NanoEvents is a Coffea utility to wrap the CMS [NanoAOD](https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_06021.pdf) or similar flat nTuple structure into a single awkward array with appropriate object methods (such as Lorentz vector methods), cross references, and nested objects, all lazily accessed from the source ROOT TTree via uproot.
# 
# NanoEvents is in an **experimental** stage, and has been available in awkward0 for about 6 months. Quite recently, it was ported to awkward1. Here we demo using an awkward1-based NanoEvents array.
# 
# It can be instantiated using the [NanoEventsFactory](https://coffeateam.github.io/coffea/api/coffea.nanoevents.NanoEventsFactory.html#nanoeventsfactory):

# In[ ]:


import awkward1 as ak
from coffea.nanoevents import NanoEventsFactory

fname = "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root"
cache = {}
factory = NanoEventsFactory(fname, cache=cache)
events = factory.events()


# The `events` object is an awkward array, which at its top level is a record array with one record for each "collection", where a collection is a grouping of column (TBranch) names, categorized based on the available columns as follows:
# 
#  * one branch exists named `name` and no branches start with `name_`, interpreted as a single flat array;
#  * one branch exists named `name`, one named `n{name}`, and no branches start with `name_`, interpreted as a single jagged array;
#  * no branch exists named `n{name}` and many branches start with `name_*`, interpreted as a flat table; or
#  * one branch exists named `n{name}` and many branches start with `name_*`, interpreted as a jagged table.
# 
# *Any ROOT TTree that follows such a naming convention should be readable as a NanoEvents array.*
# 
# For example, in the file we opened, the branches:
# ```
# Generator_binvar
# Generator_scalePDF
# Generator_weight
# Generator_x1
# Generator_x2
# Generator_xpdf1
# Generator_xpdf2
# Generator_id1
# Generator_id2
# ```
# are grouped into one sub-record named `Generator`, which can be accessed using either getitem or getattr syntax, i.e. `events["Generator"]` or `events.Generator`.

# In[ ]:


events.Generator.id1


# In[ ]:


# all column names can be listed with:
events.Generator.columns
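

# The four grouping rules listed earlier can be sketched in plain Python. This is only an illustration of the naming convention, not the code NanoEvents itself runs: the helper `classify_branches` is hypothetical, the branch list is a small hand-written example, and the sketch ignores cases a real implementation would need to handle (for instance, collection names that themselves contain underscores).

# In[ ]:


def classify_branches(branches):
    """Group branch names into collections following the naming convention above.

    Hypothetical helper for illustration only; not part of coffea.
    """
    branches = set(branches)
    # collection names that own member branches, e.g. Muon_pt -> Muon
    prefixes = {b.split("_", 1)[0] for b in branches if "_" in b}
    plain = {b for b in branches if "_" not in b}
    # n{name} is a counter, not a collection, when `name` is otherwise known
    counters = {
        b for b in plain
        if b.startswith("n") and (b[1:] in prefixes or b[1:] in plain)
    }
    layout = {}
    for name in sorted(prefixes | (plain - counters)):
        jagged = "n" + name in branches
        kind = "table" if name in prefixes else "array"
        layout[name] = ("jagged " if jagged else "flat ") + kind
    return layout


# hand-written example list of branch names
example_branches = [
    "run", "event",
    "Generator_id1", "Generator_x1",
    "nMuon", "Muon_pt", "Muon_eta",
    "nPSWeight", "PSWeight",
]
layout = classify_branches(example_branches)
layout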


# In[ ]:


# In CMS NanoAOD, each TBranch title is a help string, which is carried over into the NanoEvents array.
# For example, executing the following cell should produce a help pop-up reading "id of first parton".
get_ipython().run_line_magic('pinfo', 'events.Generator.id1')


# Based on a collection's name, some collections acquire additional _methods_, which are extra features exposed by the code in the mixin classes of the [coffea.nanoevents.methods](https://coffeateam.github.io/coffea/modules/coffea.nanoevents.methods.html) module. For example, although `events.GenJet` has the columns:

# In[ ]:


events.GenJet.columns


# we can access additional attributes associated with each generated jet, because the jets can be interpreted as Lorentz vectors:

# In[ ]:


events.GenJet.energy


# We can call more complex methods, like computing the distance $\Delta R = \sqrt{\Delta \eta^2 + \Delta \phi ^2}$ between two LorentzVector objects:

# In[ ]:


# find distance between leading jet and all electrons in each event
events.Jet[:, 0].delta_r(events.Electron)
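

# For a single pair of objects, the $\Delta R$ formula above can be written out by hand. The sketch below assumes the usual convention of wrapping $\Delta\phi$ into $[-\pi, \pi)$ before squaring it; the function names are illustrative and are not coffea's own.

# In[ ]:


import math


def delta_phi(phi1, phi2):
    """Difference phi1 - phi2 wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance between two directions in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


# two directions on either side of the phi = +/-pi boundary are close together,
# even though their raw phi values differ by 6
delta_r(0.0, 3.0, 0.0, -3.0)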


# The mapping from collection name to methods is controlled by [NanoEventsFactory.default_mixins](https://coffeateam.github.io/coffea/api/coffea.nanoevents.NanoEventsFactory.html#coffea.nanoevents.NanoEventsFactory.default_mixins) and can be overridden with new mappings in the NanoEventsFactory constructor, if desired.
# Additional methods provide convenience functions for interpreting some branches, e.g.

# In[ ]:


# unpacked Jet_jetId flags
events.Jet.isTight


# In[ ]:


# unpacked GenPart_statusFlags
events.GenPart.hasFlags(['isPrompt', 'isLastCopy'])
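

# Unpacking a flag word amounts to testing individual bits of an integer. The sketch below shows the idea for one particle in plain Python. The bit order in `STATUS_FLAGS` is our understanding of the `GenPart_statusFlags` documentation and should be treated as an assumption, and `has_flags` is a hypothetical stand-in rather than the coffea method.

# In[ ]:


# Flag names in assumed bit order (bit 0 first); check the GenPart_statusFlags
# branch title in your own file before relying on this ordering.
STATUS_FLAGS = [
    "isPrompt",
    "isDecayedLeptonHadron",
    "isTauDecayProduct",
    "isPromptTauDecayProduct",
    "isDirectTauDecayProduct",
    "isDirectPromptTauDecayProduct",
    "isDirectHadronDecayProduct",
    "isHardProcess",
    "fromHardProcess",
    "isHardProcessTauDecayProduct",
    "isDirectHardProcessTauDecayProduct",
    "fromHardProcessBeforeFSR",
    "isFirstCopy",
    "isLastCopy",
    "isLastCopyBeforeFSR",
]


def has_flags(status_flags, names):
    """True if every named flag is set in the packed integer `status_flags`."""
    mask = 0
    for name in names:
        mask |= 1 << STATUS_FLAGS.index(name)
    return (status_flags & mask) == mask


packed = (1 << 0) | (1 << 13)  # a word with bits 0 and 13 set
has_flags(packed, ["isPrompt", "isLastCopy"])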


# CMS NanoAOD also contains pre-computed cross-references for some types of collections. For example, there is a TBranch `Electron_genPartIdx` which indexes the `GenPart` collection per event to give the matched generated particle, with `-1` indicating that no match was found. NanoEvents transforms these indices into an awkward _indexed array_ pointing to the collection, so that one can directly access the matched particle using getattr syntax:

# In[ ]:


events.Electron.matched_gen.pdgId


# In[ ]:


events.Muon.matched_jet.pt
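

# Conceptually, following a cross-reference is an index lookup in which the sentinel `-1` becomes a missing value. A minimal single-event sketch in plain Python, with a hypothetical helper `take_optional` and made-up values:

# In[ ]:


def take_optional(values, indices):
    """Look up values[i] for each index, mapping the sentinel -1 to None."""
    return [values[i] if i >= 0 else None for i in indices]


# hypothetical single event: three generated particles, two electrons
genpart_pdgid = [23, 11, -11]
electron_genpartidx = [1, -1]  # the second electron has no generator match
take_optional(genpart_pdgid, electron_genpartidx)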


# For generated particles, the parent index is similarly mapped:

# In[ ]:


events.GenPart.parent.pdgId


# In addition, using the parent index, a helper method computes the inverse mapping, namely, `children`. As such, one can find particle siblings with:

# In[ ]:


events.GenPart.parent.children.pdgId
# notice this is a doubly-jagged array
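

# The inverse mapping can be pictured for a single event as follows. This is a plain-Python sketch with hypothetical helper names and a made-up parent index list, not the vectorized implementation; it also shows why the sibling lookup gains one level of nesting per particle.

# In[ ]:


def children_from_parents(parent_idx):
    """Invert a per-event parent index list into lists of child indices."""
    children = [[] for _ in parent_idx]
    for child, parent in enumerate(parent_idx):
        if parent >= 0:  # -1 marks a particle without a recorded parent
            children[parent].append(child)
    return children


def siblings(parent_idx):
    """For each particle, the children of its parent (itself included)."""
    children = children_from_parents(parent_idx)
    return [children[p] if p >= 0 else None for p in parent_idx]


# hypothetical event: particle 0 decays to 1 and 2; particle 1 decays to 3 and 4
parent_idx = [-1, 0, 0, 1, 1]
siblings(parent_idx)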


# Since one often wants to skip over repeated copies of the same particle in a decay sequence, a helper method `distinctParent` is also available. Here we use it to find the parent particle ID for all prompt electrons:

# In[ ]:


events.GenPart[
    (abs(events.GenPart.pdgId) == 11)
    & events.GenPart.hasFlags(['isPrompt', 'isLastCopy'])
].distinctParent.pdgId
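

# One plausible reading of "skipping repeated particles" is to walk up the parent chain until the PDG ID changes. The sketch below does this for one particle in one event; it is our guess at the behaviour, written with a hypothetical helper `distinct_parent` and a made-up decay chain, not coffea's implementation.

# In[ ]:


def distinct_parent(index, parent_idx, pdg_id):
    """Index of the nearest ancestor whose pdgId differs, or None if there is none."""
    parent = parent_idx[index]
    while parent >= 0 and pdg_id[parent] == pdg_id[index]:
        parent = parent_idx[parent]
    return parent if parent >= 0 else None


# hypothetical chain: Z (23) -> e (11) -> e (11) -> e (11)
chain_pdg_id = [23, 11, 11, 11]
chain_parent_idx = [-1, 0, 1, 2]
distinct_parent(3, chain_parent_idx, chain_pdg_id)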


# Events can be filtered like any other awkward array using boolean fancy-indexing:

# In[ ]:


mmevents = events[ak.num(events.Muon) == 2]
zmm = mmevents.Muon[:, 0] + mmevents.Muon[:, 1]
zmm.mass
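

# Adding two Lorentz vectors and taking the mass of the sum is equivalent to the following scalar calculation for one pair of particles. The helper names and the input values are illustrative assumptions; each particle is given as a `(pt, eta, phi, mass)` tuple.

# In[ ]:


import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to Cartesian (px, py, pz, E)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return px, py, pz, energy


def pair_mass(p1, p2):
    """Invariant mass of the sum of two (pt, eta, phi, mass) tuples."""
    px, py, pz, energy = (a + b for a, b in zip(four_vector(*p1), four_vector(*p2)))
    m2 = energy * energy - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors


# hypothetical back-to-back massless muons, each with pt = 45 at eta = 0
pair_mass((45.0, 0.0, 0.0, 0.0), (45.0, 0.0, math.pi, 0.0))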


# One can assign new variables to the arrays, with some caveats:
# 
#  * Assignment must use setitem (`events["path", "to", "name"] = value`)
#  * Assignment to a sliced `events` won't be accessible from the original variable
#  * New variables are not visible from cross-references

# In[ ]:


mmevents["Electron", "myvar2"] = mmevents.Electron.pt + zmm.mass
mmevents.Electron.myvar2


# To demonstrate that everything is lazily accessed, here are all the cache items that have built up through the execution of this demo:

# In[ ]:


print("\n".join(sorted(cache.keys())))
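

# The idea of lazy access backed by a cache can be illustrated with a toy class: a column is only read the first time it is requested, and the result is stored in a dictionary supplied by the caller. `LazyColumns`, its loader functions, and the values are all hypothetical, and this is not how NanoEvents is implemented internally.

# In[ ]:


class LazyColumns:
    """Toy lazy table: columns are loaded on first access and kept in a cache."""

    def __init__(self, loaders, cache):
        self._loaders = loaders  # column name -> zero-argument loader function
        self._cache = cache      # dict shared with the caller

    def __getitem__(self, name):
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]


toy_cache = {}
toy = LazyColumns(
    {"Muon_pt": lambda: [31.2, 28.4], "Muon_eta": lambda: [0.1, -1.3]},
    toy_cache,
)
toy["Muon_pt"]  # only this column is loaded; Muon_eta is never touched
sorted(toy_cache)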