By far the most common data interface in the Python world is the numpy array. It is therefore crucial for PyROOT to provide interoperability between the wrapped C++ objects and numpy arrays, so that data can be passed seamlessly from C++ to Python and back.
import ROOT
import numpy as np
Welcome to JupyROOT 6.22/00
PyROOT attaches the numpy array interface to suitable C++ containers, which allows numpy arrays to adopt their memory instead of copying it.
v1 = ROOT.std.vector['float']((1, 2, 3))
print("ROOT.std.vector['float']", v1)
v2 = np.asarray(v1)
print('numpy.array', v2)
v1[0] = 42
print('numpy.array', v2)
ROOT.std.vector['float'] { 1.00000f, 2.00000f, 3.00000f }
numpy.array [1. 2. 3.]
numpy.array [42. 2. 3.]
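The memory adoption shown above relies on numpy's array interface protocol. The following is a minimal sketch of that general mechanism in pure Python, not PyROOT's actual implementation: `FloatBuffer` is a hypothetical stand-in for a C++ container that owns a contiguous float buffer and advertises it through `__array_interface__`.

```python
import ctypes
import numpy as np

class FloatBuffer:
    """Hypothetical stand-in for a C++ container owning contiguous floats."""

    def __init__(self, values):
        self._n = len(values)
        self._data = (ctypes.c_float * self._n)(*values)

    @property
    def __array_interface__(self):
        # Describe the existing buffer so numpy can wrap it without copying.
        return {
            'shape': (self._n,),
            'typestr': np.dtype(np.float32).str,
            'data': (ctypes.addressof(self._data), False),  # (address, read-only flag)
            'version': 3,
        }

    def __setitem__(self, i, value):
        self._data[i] = value

buf = FloatBuffer([1, 2, 3])
view = np.asarray(buf)   # adopts the memory of buf
buf[0] = 42              # a write through the container ...
print(view)              # ... is visible through the numpy array
```

Because `np.asarray` wraps the advertised buffer, both objects refer to the same memory, which is the behaviour seen with `ROOT.std.vector` above.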
Thanks to ROOT::RVec, PyROOT also supports the reverse direction: adopting the memory of objects that expose the numpy array interface.
v1 = np.array((1, 2, 3), dtype=np.float32)
print('numpy.array', v1)
v2 = ROOT.VecOps.AsRVec(v1)
print("ROOT.RVec['float']", v2)
v1[0] = 42
print("ROOT.RVec['float']", v2)
numpy.array [1. 2. 3.]
ROOT.RVec['float'] { 1.00000f, 2.00000f, 3.00000f }
ROOT.RVec['float'] { 42.0000f, 2.00000f, 3.00000f }
In addition, C++ interfaces that take raw pointers accept numpy arrays natively.
ROOT.gInterpreter.Declare('''
float get_element(float* v, unsigned int i) {
    return v[i];
}
''')
print('The first element of the numpy.array is', ROOT.get_element(v1, 0))
The first element of the numpy.array is 42.0
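What is handed to a `float*` parameter is, in effect, the address of the array's buffer, so the array's dtype has to match the pointer type (`np.float32` for `float`) and the data has to be contiguous. As a rough illustration that does not need ROOT, the sketch below obtains such a pointer with `ctypes`; the Python `get_element` here is only a stand-in for the C++ function declared above.

```python
import ctypes
import numpy as np

def get_element(ptr, i):
    """Python stand-in for the C++ get_element: index through a raw pointer."""
    return ptr[i]

arr = np.array((42, 2, 3), dtype=np.float32)

# View the array's buffer as a C float pointer; no data is copied.
ptr = arr.ctypes.data_as(ctypes.POINTER(ctypes.c_float))

first = get_element(ptr, 0)
arr[1] = 7                     # modify the array ...
second = get_element(ptr, 1)   # ... and read the change through the pointer
print(first, second)
```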
Another crucial feature for Python-based analysis is moving data from ROOT files into numpy arrays. To this end, PyROOT offers extensions to ROOT::RDataFrame that load the data of a TTree as a dictionary of numpy arrays. The intended workflow is to do the heavy computation in C++ with RDataFrame and push only the required data to Python.
path = 'root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root'
df = ROOT.RDataFrame('Events', path)
npy = df.Filter('nMuon == 2')\
        .Filter('Muon_pt[0] != Muon_pt[1]')\
        .Define('Dimuon_mass', 'InvariantMass(Muon_pt, Muon_eta, Muon_phi, Muon_mass)')\
        .Range(10000)\
        .AsNumpy(['Dimuon_mass'])
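For reference, the `Dimuon_mass` column defined above is the invariant mass of the muon pair. The numpy sketch below spells out the standard formula for particles given in (pt, eta, phi, mass) coordinates. It is an illustration only, not the code ROOT runs; the function name `invariant_mass` and the input values are made up for the example, and natural units (c = 1) are assumed.

```python
import numpy as np

def invariant_mass(pt, eta, phi, mass):
    """Invariant mass of a set of particles given as (pt, eta, phi, mass)."""
    pt, eta, phi, mass = (np.asarray(a, dtype=np.float64)
                          for a in (pt, eta, phi, mass))
    # Convert to Cartesian momentum components.
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    energy = np.sqrt(px**2 + py**2 + pz**2 + mass**2)
    # Sum the four-vectors and take the Minkowski norm.
    m2 = energy.sum()**2 - px.sum()**2 - py.sum()**2 - pz.sum()**2
    return float(np.sqrt(max(m2, 0.0)))

# Two massless particles, back to back in the transverse plane.
m = invariant_mass(pt=[10.0, 10.0], eta=[0.0, 0.0],
                   phi=[0.0, np.pi], mass=[0.0, 0.0])
print(m)
```

For this back-to-back configuration the momenta cancel, so the result is simply the sum of the two energies.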
The result of AsNumpy can be used directly to construct a pandas dataframe:
import pandas
pdf = pandas.DataFrame(npy)
print(pdf)
      Dimuon_mass
0       34.415466
1       27.915493
2      113.646866
3        1.587861
4       23.723238
...           ...
9995    24.469269
9996    91.798920
9997    18.113958
9998     1.600781
9999     3.073879

[10000 rows x 1 columns]
Or make a plot with matplotlib:
import matplotlib.pyplot as plt
plt.hist(npy['Dimuon_mass'], range=(70, 110), bins=20)
plt.xlabel('Dimuon mass in GeV');
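To inspect the binned values without drawing anything, the same binning can be reproduced with `np.histogram`. The sketch below uses synthetic numbers as a stand-in for `npy['Dimuon_mass']`; the distribution parameters are arbitrary and chosen only so that most values fall inside the plotted range.

```python
import numpy as np

# Synthetic stand-in for npy['Dimuon_mass'] (assumption: not real data).
rng = np.random.default_rng(0)
masses = rng.normal(loc=91.0, scale=3.0, size=1000)

# Same binning as the plt.hist call above: 20 bins between 70 and 110.
counts, edges = np.histogram(masses, bins=20, range=(70, 110))

print(counts)
print(edges)
```

Values outside the requested range are dropped, so `counts.sum()` can be smaller than the number of entries.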
PyROOT can also create a ROOT::RDataFrame from numpy arrays, which makes it possible to transform the dataset further and eventually write the data back to disk in a ROOT file.
npy2 = {'x': np.array((1, 2, 3), dtype=np.float32), 'y': np.array((4, 5, 6), dtype=np.int32)}
df = ROOT.RDF.MakeNumpyDataFrame(npy2)
display = df.Display()
df.Snapshot('Events', 'file.root')
display.Print()
x        | y | 
1.00000f | 4 | 
2.00000f | 5 | 
3.00000f | 6 | 
We can list the objects in the newly created file with rootls:
%%bash
rootls -t file.root
TTree  Jul 10 12:06 2020 Events "Events"
  x  "x/F"  82
  y  "y/I"  82
  Cluster INCLUSIVE ranges:
   - # 0: [0, 2]
  The total number of clusters is 1