PyROOT aims to combine the convenience of the Python language with the efficiency of C++ implementations. Its dynamic C++ bindings, powered by the C++ interpreter cling, make efficient C++ implementations convenient to use from Python.
import ROOT
import numpy as np
np.random.seed(1234)
Welcome to JupyROOT 6.22/00
Just-in-time compilation (jitting) lets you use the power of the C++ compiler to optimize computation-heavy operations. Just as numpy implements its actual functionality under the hood in C(++), PyROOT allows you to do the same dynamically.
ROOT.gInterpreter.Declare('''
float largest_sum(float* v1, float* v2, std::size_t size){
    float r = -999.f;
    for (size_t i1 = 0; i1 < size; i1++) {
        for (size_t i2 = 0; i2 < size; i2++) {
            const auto tmp = v1[i1] + v2[i2];
            if (tmp > r) r = tmp;
        }
    }
    return r;
}
''');
As example inputs, we generate two numpy arrays with random numbers.
size = 100
v1 = np.random.randn(size).astype(np.float32)
v2 = np.random.randn(size).astype(np.float32)
And next we benchmark the runtime:
%%timeit
ROOT.largest_sum(v1, v2, size)
25.5 µs ± 323 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
How does the C++ kernel compare to a pure Python implementation?
def largest_sum(x1, x2):
    r = -999.0
    for e1 in x1:
        for e2 in x2:
            tmp = e1 + e2
            if tmp > r: r = tmp
    return r
The Python implementation is roughly a factor of 90 slower!
%%timeit
largest_sum(v1, v2)
2.25 ms ± 242 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
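The `%%timeit` cell magic is only available in IPython and Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard-library `timeit` module. This is a minimal sketch rather than the notebook's own method: plain Python lists filled by `random.gauss` stand in for the numpy arrays, and the `number` and `repeat` values are arbitrary choices.

```python
import random
import statistics
import timeit

def largest_sum(x1, x2):
    # Same pure-Python kernel as above: largest pairwise sum of two sequences.
    r = -999.0
    for e1 in x1:
        for e2 in x2:
            tmp = e1 + e2
            if tmp > r:
                r = tmp
    return r

random.seed(1234)
size = 100
# Assumption: plain lists of Gaussian samples replace the float32 numpy arrays.
x1 = [random.gauss(0.0, 1.0) for _ in range(size)]
x2 = [random.gauss(0.0, 1.0) for _ in range(size)]

number = 20  # calls per timing run (arbitrary choice)
repeat = 5   # number of timing runs (arbitrary choice)
runs = timeit.repeat(lambda: largest_sum(x1, x2), number=number, repeat=repeat)
per_call = [t / number for t in runs]  # seconds per single call

mean_s = statistics.mean(per_call)
std_s = statistics.stdev(per_call)
print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e6:.1f} µs per loop")
```

The absolute numbers depend on the machine and on the element type, so they are not directly comparable to the notebook output above.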
Improved C++ performance can be expected from precompiling the functionality into a shared library, then loading it and interfacing the functions via PyROOT.
%%bash
echo 'Header:'
cat analysis.hxx
echo
echo 'Source:'
cat analysis.cxx
g++ -Ofast -shared -o libanalysis.so analysis.cxx
Header:
#include <cstddef>
float optimized_largest_sum(float* v1, float* v2, std::size_t size);

Source:
# include "analysis.hxx"
float optimized_largest_sum(float* v1, float* v2, std::size_t size){
    float r = -999.f;
    for (size_t i1 = 0; i1 < size; i1++) {
        for (size_t i2 = 0; i2 < size; i2++) {
            const auto tmp = v1[i1] + v2[i2];
            if (tmp > r) r = tmp;
        }
    }
    return r;
}
You can interactively include the header and functionality from the shared library.
ROOT.gInterpreter.Declare('#include "analysis.hxx"')
ROOT.gSystem.Load('libanalysis.so');
The optimized compilation improves the runtime again, by about a factor of 5!
%%timeit
ROOT.optimized_largest_sum(v1, v2, size)
4.83 µs ± 468 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
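The speedup factors follow directly from the mean timings reported by `%%timeit`. The short calculation below simply divides the quoted means; the variable names are chosen here for illustration, and since each measurement carries a spread, the ratios should be read as indicative only.

```python
# Mean timings quoted from the %%timeit outputs above, in seconds.
t_python = 2.25e-3        # pure Python implementation
t_jit = 25.5e-6           # C++ function jitted via cling
t_precompiled = 4.83e-6   # C++ function precompiled with g++ -Ofast

speedup_jit = t_python / t_jit
speedup_precompiled = t_jit / t_precompiled
speedup_total = t_python / t_precompiled

print(f"jitted vs. Python:      {speedup_jit:.0f}x")
print(f"precompiled vs. jitted: {speedup_precompiled:.1f}x")
print(f"precompiled vs. Python: {speedup_total:.0f}x")
```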
Finally, we can show that all implementations come to the same result:
print('PyROOT:', ROOT.largest_sum(v1, v2, size))
print('Native Python:', largest_sum(v1, v2))
print('PyROOT (optimized):', ROOT.optimized_largest_sum(v1, v2, size))
PyROOT: 4.7567291259765625
Native Python: 4.756729
PyROOT (optimized): 4.7567291259765625
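The printed values differ only in the number of digits shown. A plausible reading, assuming the pure-Python function returns a `numpy.float32` scalar while PyROOT converts the C++ `float` result to a Python `float` (a double), is that both denote the same single-precision number. The standard-library sketch below checks this by rounding the short decimal to single precision with `struct`; the helper name `to_float32` is introduced here for illustration.

```python
import struct

def to_float32(x):
    # Round a Python float (double) to the nearest single-precision value
    # and return it widened back to a double.
    return struct.unpack("f", struct.pack("f", x))[0]

short_repr = 4.756729           # value printed by the pure-Python version
full_repr = 4.7567291259765625  # value printed by both PyROOT versions

# Both comparisons check that the two printed numbers name the same float32.
print(to_float32(short_repr) == full_repr)
print(to_float32(full_repr) == full_repr)
```

If both comparisons hold, the three implementations agree exactly and the shorter output is just the shortest decimal that identifies the same single-precision value.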