rf408_RDataFrameToRooFit

Fill RooDataSet/RooDataHist in RDataFrame.

This tutorial shows how to fill RooFit data classes directly from RDataFrame. Using two small helpers, we tell RDataFrame where the data has to go.

Author: Stephan Hageboeck (CERN)
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Wednesday, November 30, 2022 at 11:23 AM.

In [1]:
%%cpp -d
#include <RooAbsDataHelper.h>

#include <TRandom.h>

Print the first few entries and summary statistics.

In [2]:
%%cpp -d
void printData(const RooAbsData& data) {
  std::cout << "\n";
  data.Print();

  for (int i=0; i < data.numEntries() && i < 20; ++i) {
    std::cout << "(";
    for (const auto var : *data.get(i)) {
      std::cout << std::setprecision(3) << std::right << std::fixed << std::setw(8) << static_cast<const RooAbsReal*>(var)->getVal() << ", ";
    }
    std::cout << ")\tweight=" << std::setw(10) << data.weight() << std::endl;
  }

  // Get the x and y variables from the dataset:
  const auto & x = static_cast<const RooRealVar&>(*(*data.get())[0]);
  const auto & y = static_cast<const RooRealVar&>(*(*data.get())[1]);

  std::cout << "mean(x) = " << data.mean(x) << "\tsigma(x) = " << std::sqrt(data.moment(x, 2.))
    << "\n" << "mean(y) = " << data.mean(y) << "\tsigma(y) = " << std::sqrt(data.moment(y, 2.)) << std::endl;
}

Set up

We create an RDataFrame with two columns filled with 2 million random numbers.

In [3]:
ROOT::RDataFrame d(2000000);
auto dd = d.Define("x", [](){ return gRandom->Uniform(-5.,  5.); })
           .Define("y", [](){ return gRandom->Gaus(1., 3.); });

We create RooFit variables that will represent the dataset.

In [4]:
RooRealVar x("x", "x", -5.,   5.);
RooRealVar y("y", "y", -50., 50.);
x.setBins(10);
y.setBins(20);

Booking the creation of RooDataSet / RooDataHist in RDataFrame

Method 1:

We directly book the RooDataSetHelper action. We need to pass

  • the RDataFrame column types as template parameters
  • the constructor arguments for RooDataSet (they follow the same syntax as the usual RooDataSet constructors)
  • the column names that RDataFrame should fill into the dataset

NOTE: RDataFrame columns are matched to RooFit variables by position, not by name!

In [5]:
auto rooDataSet = dd.Book<double, double>(
    RooDataSetHelper("dataset", // Name
        "Title of dataset",     // Title
        RooArgSet(x, y)         // Variables in this dataset
        ),
    {"x", "y"}                  // Column names in RDataFrame.
);

Method 2:

We first declare the RooDataHistHelper

In [6]:
RooDataHistHelper rdhMaker{"datahist",  // Name
  "Title of data hist",                 // Title
  RooArgSet(x, y)                       // Variables in this dataset
};

Then, we move it into an RDataFrame action:

In [7]:
auto rooDataHist = dd.Book<double, double>(std::move(rdhMaker), {"x", "y"});

Run it and inspect the results

Let's inspect the dataset / datahist. Note that the first time we touch one of those objects, the RDataFrame event loop will run.

In [8]:
printData(*rooDataSet);
printData(*rooDataHist);
RooDataSet::dataset[x,y] = 2000000 entries
(   4.997,   -0.304, )	weight=     1.000
(   4.472,    0.910, )	weight=     1.000
(   4.575,    0.830, )	weight=     1.000
(   0.400,    0.776, )	weight=     1.000
(   2.599,   -0.232, )	weight=     1.000
(  -1.844,    1.575, )	weight=     1.000
(   0.197,    0.853, )	weight=     1.000
(  -1.077,   -0.721, )	weight=     1.000
(  -4.697,   -3.165, )	weight=     1.000
(   4.437,   -1.208, )	weight=     1.000
(   3.983,   -0.146, )	weight=     1.000
(  -0.014,   -1.447, )	weight=     1.000
(  -3.177,   -2.704, )	weight=     1.000
(  -4.371,   -0.363, )	weight=     1.000
(   2.254,   -0.499, )	weight=     1.000
(   2.139,    6.533, )	weight=     1.000
(   1.993,    6.991, )	weight=     1.000
(  -3.708,    7.781, )	weight=     1.000
(  -4.168,    1.284, )	weight=     1.000
(  -4.177,    4.650, )	weight=     1.000
mean(x) = 0.001	sigma(x) = 2.886
mean(y) = 1.000	sigma(y) = 3.000

RooDataHist::datahist[x,y] = 200 bins (2000000.000 weights)
(  -4.500,  -47.500, )	weight=     0.000
(  -4.500,  -42.500, )	weight=     0.000
(  -4.500,  -37.500, )	weight=     0.000
(  -4.500,  -32.500, )	weight=     0.000
(  -4.500,  -27.500, )	weight=     0.000
(  -4.500,  -22.500, )	weight=     0.000
(  -4.500,  -17.500, )	weight=     0.000
(  -4.500,  -12.500, )	weight=    24.000
(  -4.500,   -7.500, )	weight=  4537.000
(  -4.500,   -2.500, )	weight= 69653.000
(  -4.500,    2.500, )	weight=107838.000
(  -4.500,    7.500, )	weight= 17790.000
(  -4.500,   12.500, )	weight=   292.000
(  -4.500,   17.500, )	weight=     0.000
(  -4.500,   22.500, )	weight=     0.000
(  -4.500,   27.500, )	weight=     0.000
(  -4.500,   32.500, )	weight=     0.000
(  -4.500,   37.500, )	weight=     0.000
(  -4.500,   42.500, )	weight=     0.000
(  -4.500,   47.500, )	weight=     0.000
mean(x) = 0.001	sigma(x) = 2.872
mean(y) = 0.999	sigma(y) = 3.329

Draw all canvases

In [9]:
gROOT->GetListOfCanvases()->Draw()