The modern interface to process datasets in ROOT files (aka TTrees) is RDataFrame. The concept is a computation graph, which is built in a declarative manner and executes the booked computations as efficiently as possible. The following notebook provides examples of the workflow in Python.
import ROOT
Welcome to JupyROOT 6.22/00
path = 'root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root'
df = ROOT.RDataFrame('Events', path)
We filter the dataset for events with exactly two muons of opposite charge. The last line, Range(100000), keeps only the first 100000 events that pass the preceding filters, so only a subset of the 66 million events in total is processed.
df = df.Filter("nMuon == 2", "Events with exactly two muons")\
.Filter("Muon_charge[0] != Muon_charge[1]", "Muons with opposite charge")\
.Range(100000)
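The selection semantics can be mimicked in plain Python to make them concrete. This is a toy sketch, not how RDataFrame works internally: events are assumed to be dicts whose keys mirror the branch names used above, and `select_dimuon` and `toy_events` are hypothetical names. Chained generators are lazy, so, like `Range` placed after the filters, `itertools.islice` stops pulling events once enough of them have passed. This is the same reason the cutflow report further down shows that only 270077 events were read.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator

# Hypothetical event record: keys mirror the NanoAOD branch names used above.
Event = Dict[str, Any]


def select_dimuon(events: Iterable[Event], max_events: int) -> Iterator[Event]:
    """Lazily yield events with exactly two opposite-charge muons.

    Mirrors Filter("nMuon == 2") -> Filter("Muon_charge[0] != Muon_charge[1]")
    -> Range(max_events); nothing is read until the result is iterated.
    """
    two_muons = (ev for ev in events if ev["nMuon"] == 2)
    opposite = (ev for ev in two_muons if ev["Muon_charge"][0] != ev["Muon_charge"][1])
    # islice stops pulling from upstream once max_events have passed the filters.
    return islice(opposite, max_events)


toy_events = [
    {"nMuon": 2, "Muon_charge": [1, -1]},  # passes both cuts
    {"nMuon": 1, "Muon_charge": [1]},      # fails nMuon == 2
    {"nMuon": 2, "Muon_charge": [1, 1]},   # fails the opposite-charge cut
    {"nMuon": 2, "Muon_charge": [-1, 1]},  # passes both cuts
    {"nMuon": 2, "Muon_charge": [1, -1]},  # never read: the range is already full
]
selected = list(select_dimuon(toy_events, max_events=2))
print(len(selected))  # 2
```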
As an example of injecting efficient C++ kernels, the invariant mass is computed explicitly from two four-vectors. Alternatively, the ROOT utility ROOT::VecOps::InvariantMass could be used.
ROOT.gInterpreter.Declare(
"""
using Vec_t = const ROOT::VecOps::RVec<float>&;
float compute_mass(Vec_t pt, Vec_t eta, Vec_t phi, Vec_t mass) {
ROOT::Math::PtEtaPhiMVector p1(pt[0], eta[0], phi[0], mass[0]);
ROOT::Math::PtEtaPhiMVector p2(pt[1], eta[1], phi[1], mass[1]);
return (p1 + p2).mass();
}
""")
df = df.Define("Dimuon_mass", "compute_mass(Muon_pt, Muon_eta, Muon_phi, Muon_mass)")
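The arithmetic inside `compute_mass` can be checked with a plain-Python re-implementation. Each muon's (pt, eta, phi, m) is converted to Cartesian components px = pt*cos(phi), py = pt*sin(phi), pz = pt*sinh(eta), E = sqrt(px^2 + py^2 + pz^2 + m^2), and the pair mass is m = sqrt((E1 + E2)^2 - |p1 + p2|^2). The helper names `four_vector` and `dimuon_mass` are hypothetical and only serve this illustration; in the notebook the computation runs in the compiled C++ kernel, which takes and returns single-precision floats, so its results may differ from this double-precision version in the last digits.

```python
import math
from typing import Sequence, Tuple


def four_vector(pt: float, eta: float, phi: float, mass: float) -> Tuple[float, float, float, float]:
    """Convert (pt, eta, phi, m) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def dimuon_mass(pt: Sequence[float], eta: Sequence[float],
                phi: Sequence[float], mass: Sequence[float]) -> float:
    """Invariant mass of the first two entries, mirroring compute_mass above."""
    e1, px1, py1, pz1 = four_vector(pt[0], eta[0], phi[0], mass[0])
    e2, px2, py2, pz2 = four_vector(pt[1], eta[1], phi[1], mass[1])
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    # m^2 = E^2 - |p|^2; clamp tiny negative values caused by rounding.
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


# Two massless, back-to-back muons with pt = 45 GeV give m = 90 GeV.
print(round(dimuon_mass([45.0, 45.0], [0.0, 0.0], [0.0, math.pi], [0.0, 0.0]), 6))  # 90.0
```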
This cell books a histogram. Note that the computation has not started yet! Since the workflow is declarative, the event loop only starts when a result is first accessed, so that all booked computations run together in a single, optimized pass over the data.
hist = df.Histo1D(("hist", ";m_{#mu#mu} (GeV);N_{Events}", 5000, 2, 200), "Dimuon_mass")
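The book-then-run behaviour can be sketched in plain Python. `ToyLazyFrame` is a hypothetical stand-in, not the RDataFrame API: it only records filters, defines and booked results, and runs one loop over the data on the first access to any booked result, filling every booked result in that same pass. Unlike RDataFrame, where each Filter or Define returns a new node that can branch into several paths, this toy keeps a single linear chain.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class ToyLazyFrame:
    """Hypothetical, minimal lazy computation graph (a single linear chain)."""

    def __init__(self, events: List[Row]):
        self._events = events
        self._steps: List[tuple] = []       # recorded filter/define nodes, in order
        self._booked: Dict[str, str] = {}   # result name -> column to collect
        self._results: Optional[Dict[str, List[Any]]] = None
        self.loops_run = 0                  # number of passes over the data

    def filter(self, predicate: Callable[[Row], bool]) -> "ToyLazyFrame":
        self._steps.append(("filter", predicate))  # no event is touched here
        return self

    def define(self, name: str, func: Callable[[Row], Any]) -> "ToyLazyFrame":
        self._steps.append(("define", name, func))
        return self

    def book(self, result_name: str, column: str) -> None:
        """Book collecting `column`; nothing is computed yet."""
        self._booked[result_name] = column
        # Booking after a loop forces a new loop on next access
        # (unlike RDataFrame, this toy then recomputes every result).
        self._results = None

    def result(self, result_name: str) -> List[Any]:
        if self._results is None:  # first access triggers the single event loop
            self._run_event_loop()
        return self._results[result_name]

    def _run_event_loop(self) -> None:
        self.loops_run += 1
        self._results = {name: [] for name in self._booked}
        for event in self._events:
            row = dict(event)
            passed = True
            for step in self._steps:
                if step[0] == "filter":
                    if not step[1](row):
                        passed = False
                        break
                else:
                    row[step[1]] = step[2](row)  # add the defined column
            if passed:
                for name, column in self._booked.items():
                    self._results[name].append(row[column])


frame = ToyLazyFrame([{"x": 1.0}, {"x": 2.0}, {"x": 3.0}])
frame.filter(lambda r: r["x"] > 1.0).define("x2", lambda r: r["x"] ** 2)
frame.book("squares", "x2")
frame.book("raw", "x")
print(frame.loops_run)          # 0: nothing has been computed yet
print(frame.result("squares"))  # [4.0, 9.0]
print(frame.result("raw"))      # [2.0, 3.0]
print(frame.loops_run)          # 1: both results came from one pass
```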
In addition, we book a cutflow report, which is shown below the result plot.
report = df.Report()
Note that drawing the histogram accesses the booked result, which triggers the actual event loop.
ROOT.gStyle.SetOptStat(0); ROOT.gStyle.SetTextFont(42)
c = ROOT.TCanvas("c", "", 800, 700)
c.SetLogx(); c.SetLogy()
hist.Draw()
label = ROOT.TLatex(); label.SetNDC(True)
label.SetTextSize(0.040); label.DrawLatex(0.100, 0.920, "#bf{CMS Open Data}")
label.SetTextSize(0.030); label.DrawLatex(0.630, 0.920, "#sqrt{s} = 8 TeV, L_{int} = 11.6 fb^{-1}");
c.Draw()
Here you can inspect the number of events actually read and how many ended up in the histogram.
report.Print()
Events with exactly two muons: pass=131936 all=270077 -- eff=48.85 % cumulative eff=48.85 %
Muons with opposite charge: pass=100000 all=131936 -- eff=75.79 % cumulative eff=37.03 %
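The percentages in the report follow directly from the pass counts: `eff` is the fraction of events reaching a filter that pass it, and `cumulative eff` is the fraction of all events entering the first filter that survive up to and including that filter. The sketch below, using the hypothetical helper `cutflow_lines`, recomputes the two report lines from their counts; it imitates the printed format for comparison and is not ROOT's implementation.

```python
from typing import List, Tuple


def cutflow_lines(total: int, cuts: List[Tuple[str, int]]) -> List[str]:
    """Format a cutflow from pass counts.

    total: events entering the first filter.
    cuts:  (filter name, events passing) in the order the filters are applied.
    """
    lines = []
    entering = total
    for name, passed in cuts:
        eff = 100.0 * passed / entering      # efficiency of this filter alone
        cumulative = 100.0 * passed / total  # efficiency of the chain so far
        lines.append(
            f"{name}: pass={passed} all={entering} -- "
            f"eff={eff:.2f} % cumulative eff={cumulative:.2f} %"
        )
        entering = passed  # the next filter only sees events that survived
    return lines


for line in cutflow_lines(270077, [("Events with exactly two muons", 131936),
                                   ("Muons with opposite charge", 100000)]):
    print(line)
```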
The following part of the notebook highlights example analyses in Python that use a workflow similar to the example above, all freely accessible thanks to open source and open data!
Link to analysis and notebooks