rf408_RDataFrameToRooFit

Fill RooDataSet/RooDataHist in RDataFrame.

This tutorial shows how to fill RooFit data classes directly from RDataFrame. Using two small helpers, we tell RDataFrame where the data has to go.

Author: Harshal Shende, Stephan Hageboeck (C++ version)
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Wednesday, November 30, 2022 at 11:23 AM.

In [1]:
import ROOT
import math
Welcome to JupyROOT 6.27/01

Set up

We create an RDataFrame with two columns filled with 2 million random numbers.

In [2]:
d = ROOT.RDataFrame(2000000)
dd = d.Define("x", "gRandom->Uniform(-5.,  5.)").Define("y", "gRandom->Gaus(1., 3.)")

We create RooFit variables that will represent the dataset.

In [3]:
x = ROOT.RooRealVar("x", "x", -5.0, 5.0)
y = ROOT.RooRealVar("y", "y", -50.0, 50.0)
x.setBins(10)
y.setBins(20)

Booking the creation of RooDataSet / RooDataHist in RDataFrame

Method 1:

We directly book the RooDataSetHelper action. We need to pass

  • the RDataFrame column types as template parameters
  • the constructor arguments for RooDataSet (they follow the same syntax as the usual RooDataSet constructors)
  • the column names that RDataFrame should fill into the dataset

NOTE: RDataFrame columns are matched to RooFit variables by position, not by name!

In [4]:
rooDataSet = dd.Book(
    ROOT.std.move(ROOT.RooDataSetHelper("dataset", "Title of dataset", ROOT.RooArgSet(x, y))), ("x", "y")
)

Method 2:

We first declare the RooDataHistHelper

In [5]:
rdhMaker = ROOT.RooDataHistHelper("dataset", "Title of dataset", ROOT.RooArgSet(x, y))

Then, we move it into an RDataFrame action:

In [6]:
rooDataHist = dd.Book(ROOT.std.move(rdhMaker), ("x", "y"))

Run it and inspect the results

Let's inspect the dataset / datahist. Note that the first time we touch one of those objects, the RDataFrame event loop will run.

In [7]:
def printData(data):
    print("")
    data.Print()
    for i in range(min(data.numEntries(), 20)):
        print(
            "("
            + ", ".join(["{0:8.3f}".format(var.getVal()) for var in data.get(i)])
            + ", )  weight={0:10.3f}".format(data.weight())
        )

    print("mean(x) = {0:.3f}".format(data.mean(x)) + "\tsigma(x) = {0:.3f}".format(math.sqrt(data.moment(x, 2.0))))
    print("mean(y) = {0:.3f}".format(data.mean(y)) + "\tsigma(y) = {0:.3f}\n".format(math.sqrt(data.moment(y, 2.0))))


printData(rooDataSet)
printData(rooDataHist)
(   4.997,   -0.304, )  weight=     1.000
(   4.472,    0.910, )  weight=     1.000
(   4.575,    0.830, )  weight=     1.000
(   0.400,    0.776, )  weight=     1.000
(   2.599,   -0.232, )  weight=     1.000
(  -1.844,    1.575, )  weight=     1.000
(   0.197,    0.853, )  weight=     1.000
(  -1.077,   -0.721, )  weight=     1.000
(  -4.697,   -3.165, )  weight=     1.000
(   4.437,   -1.208, )  weight=     1.000
(   3.983,   -0.146, )  weight=     1.000
(  -0.014,   -1.447, )  weight=     1.000
(  -3.177,   -2.704, )  weight=     1.000
(  -4.371,   -0.363, )  weight=     1.000
(   2.254,   -0.499, )  weight=     1.000
(   2.139,    6.533, )  weight=     1.000
(   1.993,    6.991, )  weight=     1.000
(  -3.708,    7.781, )  weight=     1.000
(  -4.168,    1.284, )  weight=     1.000
(  -4.177,    4.650, )  weight=     1.000
mean(x) = 0.001	sigma(x) = 2.886
mean(y) = 1.000	sigma(y) = 3.000


(  -4.500,  -47.500, )  weight=     0.000
(  -4.500,  -42.500, )  weight=     0.000
(  -4.500,  -37.500, )  weight=     0.000
(  -4.500,  -32.500, )  weight=     0.000
(  -4.500,  -27.500, )  weight=     0.000
(  -4.500,  -22.500, )  weight=     0.000
(  -4.500,  -17.500, )  weight=     0.000
(  -4.500,  -12.500, )  weight=    24.000
(  -4.500,   -7.500, )  weight=  4537.000
(  -4.500,   -2.500, )  weight= 69653.000
(  -4.500,    2.500, )  weight=107838.000
(  -4.500,    7.500, )  weight= 17790.000
(  -4.500,   12.500, )  weight=   292.000
(  -4.500,   17.500, )  weight=     0.000
(  -4.500,   22.500, )  weight=     0.000
(  -4.500,   27.500, )  weight=     0.000
(  -4.500,   32.500, )  weight=     0.000
(  -4.500,   37.500, )  weight=     0.000
(  -4.500,   42.500, )  weight=     0.000
(  -4.500,   47.500, )  weight=     0.000
mean(x) = 0.001	sigma(x) = 2.872
mean(y) = 0.999	sigma(y) = 3.329

RooDataSet::dataset[x,y] = 2000000 entries
RooDataHist::dataset[x,y] = 200 bins (2e+06 weights)

Draw all canvases

In [8]:
from ROOT import gROOT 
gROOT.GetListOfCanvases().Draw()