rf102_dataimport

'BASIC FUNCTIONALITY' RooFit tutorial macro #102 Importing data from ROOT TTrees and THx histograms

Authors: Clemens Lange, Wouter Verkerke (C version)
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Sunday, November 27, 2022 at 11:05 AM.

In [1]:
import ROOT
from array import array


def makeTH1():

    # Create a ROOT TH1 filled with a Gaussian distribution

    hh = ROOT.TH1D("hh", "hh", 25, -10, 10)
    for i in range(100):
        hh.Fill(ROOT.gRandom.Gaus(0, 3))
    return hh


def makeTTree():
    # Create a ROOT TTree filled with a Gaussian distribution in x and a
    # uniform distribution in y

    tree = ROOT.TTree("tree", "tree")
    px = array("d", [0])
    py = array("d", [0])
    tree.Branch("x", px, "x/D")
    tree.Branch("y", py, "y/D")
    for i in range(100):
        px[0] = ROOT.gRandom.Gaus(0, 3)
        py[0] = ROOT.gRandom.Uniform() * 30 - 15
        tree.Fill()
    return tree
Welcome to JupyROOT 6.27/01

Importing ROOT histograms


Import ROOT TH1 into a RooDataHist

Create a ROOT TH1 histogram

In [2]:
hh = makeTH1()

Declare observable x

In [3]:
x = ROOT.RooRealVar("x", "x", -10, 10)

Create a binned dataset that imports the contents of the ROOT.TH1 and associates them with observable 'x'

In [4]:
dh = ROOT.RooDataHist("dh", "dh", [x], Import=hh)

Plot and fit a RooDataHist

Make plot of binned dataset showing Poisson error bars (RooFit default)

In [5]:
frame = x.frame(Title="Imported ROOT.TH1 with Poisson error bars")
dh.plotOn(frame)
Out[5]:
<cppyy.gbl.RooPlot object at 0xa359bc0>

Fit a Gaussian p.d.f. to the data

In [6]:
mean = ROOT.RooRealVar("mean", "mean", 0, -10, 10)
sigma = ROOT.RooRealVar("sigma", "sigma", 3, 0.1, 10)
gauss = ROOT.RooGaussian("gauss", "gauss", x, mean, sigma)
gauss.fitTo(dh)
gauss.plotOn(frame)
Out[6]:
<cppyy.gbl.RooPlot object at 0xa359bc0>
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization
 **********
 **    1 **SET PRINT           1
 **********
 **********
 **    2 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 mean         0.00000e+00  2.00000e+00   -1.00000e+01  1.00000e+01
     2 sigma        3.00000e+00  9.90000e-01    1.00000e-01  1.00000e+01
 **********
 **    3 **SET ERR         0.5
 **********
 **********
 **    4 **SET PRINT           1
 **********
 **********
 **    5 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **    6 **MIGRAD        1000           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=249.349 FROM MIGRAD    STATUS=INITIATE        8 CALLS           9 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  mean         0.00000e+00   2.00000e+00   2.01358e-01   1.15556e+01
   2  sigma        3.00000e+00   9.90000e-01   2.22742e-01   5.42294e+00
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=249.251 FROM MIGRAD    STATUS=CONVERGED      23 CALLS          24 TOTAL
                     EDM=1.58964e-05    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  mean        -1.05079e-01   2.95122e-01   3.29083e-04  -2.34747e-02
   2  sigma        2.93926e+00   2.13363e-01   5.44955e-04  -8.23858e-02
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  2    ERR DEF=0.5
  8.712e-02 -9.823e-05 
 -9.823e-05  4.556e-02 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2
        1  0.00156   1.000 -0.002
        2  0.00156  -0.002  1.000
 **********
 **    7 **SET ERR         0.5
 **********
 **********
 **    8 **SET PRINT           1
 **********
 **********
 **    9 **HESSE        1000
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=249.251 FROM HESSE     STATUS=OK             10 CALLS          34 TOTAL
                     EDM=1.58906e-05    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  mean        -1.05079e-01   2.95122e-01   6.58167e-05  -1.05081e-02
   2  sigma        2.93926e+00   2.13363e-01   1.08991e-04  -4.40523e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  2    ERR DEF=0.5
  8.712e-02 -1.406e-04 
 -1.406e-04  4.556e-02 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2
        1  0.00223   1.000 -0.002
        2  0.00223  -0.002  1.000
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization

Plot and fit a RooDataHist with internal errors

If the histogram has custom errors (i.e. its contents do not originate from a Poisson process but are, e.g., a sum of weighted events), you can plot the data with symmetric 'sum-of-weights' errors instead (the same error bars as shown by ROOT)

In [7]:
frame2 = x.frame(Title="Imported ROOT.TH1 with internal errors")
dh.plotOn(frame2, DataError="SumW2")
gauss.plotOn(frame2)
Out[7]:
<cppyy.gbl.RooPlot object at 0xa6b1970>

Please note that the error bars shown (Poisson or SumW2) are for visualization only; they are NOT used in a maximum likelihood fit

A (binned) ML fit will ALWAYS assume the Poisson error interpretation of data (the mathematical definition of likelihood does not take any external definition of errors). Data with non-unit weights can only be correctly fitted with a chi^2 fit (see rf602_chi2fit.py)
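To make the two error conventions concrete, here is a minimal plain-Python sketch. It does not use ROOT, and the function names and weights are made up for illustration. It compares the symmetric sum-of-weights error of one bin, sqrt(sum of w_i^2), with the square root of the bin content, which is used here only as a simple symmetric stand-in for a Poisson error (RooFit's default Poisson error bars are typically asymmetric intervals, so this is an approximation).

```python
import math


def sumw2_error(weights):
    """Symmetric sum-of-weights error of one bin: sqrt(sum of w_i^2)."""
    return math.sqrt(sum(w * w for w in weights))


def sqrt_content_error(weights):
    """sqrt of the bin content, a symmetric stand-in for a Poisson error."""
    return math.sqrt(sum(weights))


# Hypothetical bin filled with 4 unweighted events: both conventions agree.
unit = [1.0, 1.0, 1.0, 1.0]
# Hypothetical bin with the same content (4.0) built from 16 events of weight 0.25.
weighted = [0.25] * 16

print(sumw2_error(unit), sqrt_content_error(unit))
print(sumw2_error(weighted), sqrt_content_error(weighted))
```

For unit weights the two numbers coincide. For the weighted bin they differ, which is why a fit that assumes Poisson statistics misjudges the uncertainty of weighted data.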

Importing ROOT TTrees

Import ROOT TTree into a RooDataSet

In [8]:
tree = makeTTree()

Define 2nd observable y

In [9]:
y = ROOT.RooRealVar("y", "y", -10, 10)

Construct an unbinned dataset importing tree branches x and y. Matching between branches and ROOT.RooRealVars is done by the name of the branch/RRV

Note that ONLY entries for which x and y have values within their allowed ranges, as defined in ROOT.RooRealVar x and y, are imported. Since the y values in the imported tree are in the range [-15,15] and RRV y defines a range [-10,10], the ROOT.RooDataSet below will have fewer entries than the ROOT.TTree 'tree'

In [10]:
ds = ROOT.RooDataSet("ds", "ds", {x, y}, Import=tree)
[#1] INFO:DataHandling -- RooTreeDataStore::loadValues(ds) Skipping event #0 because y cannot accommodate the value 14.424
[#1] INFO:DataHandling -- RooTreeDataStore::loadValues(ds) Skipping event #3 because y cannot accommodate the value -12.0022
[#1] INFO:DataHandling -- RooTreeDataStore::loadValues(ds) Skipping event #5 because y cannot accommodate the value 13.8261
[#1] INFO:DataHandling -- RooTreeDataStore::loadValues(ds) Skipping event #6 because y cannot accommodate the value -14.9925
[#1] INFO:DataHandling -- RooTreeDataStore::loadValues(ds) Skipping ...
[#0] WARNING:DataHandling -- RooTreeDataStore::loadValues(ds) Ignored 36 out-of-range events
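The range filtering described above can be mimicked in plain Python. This is only a sketch of the selection rule, not RooFit's implementation: `import_in_range`, the row dictionaries, and the seed are hypothetical names and values chosen for illustration. Since y is uniform on [-15,15] and the accepted range is [-10,10], about one third of the entries are expected to be dropped, which is in line with the 36 ignored events reported in the log.

```python
import random


def import_in_range(rows, ranges):
    """Keep only rows whose every value lies inside its variable's [lo, hi] range."""
    kept = []
    for row in rows:
        if all(lo <= row[name] <= hi for name, (lo, hi) in ranges.items()):
            kept.append(row)
    return kept


random.seed(1)  # arbitrary seed, only to make the sketch reproducible
rows = [
    {"x": random.gauss(0, 3), "y": random.uniform(-15, 15)}
    for _ in range(100)
]
# Ranges mirror the RooRealVar definitions of x and y in this tutorial.
ranges = {"x": (-10, 10), "y": (-10, 10)}

kept = import_in_range(rows, ranges)
print(len(rows), "rows generated,", len(kept), "kept,", len(rows) - len(kept), "skipped")
```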

Use ASCII import/export for datasets

In [11]:
def write_dataset(ds, filename):
    # Write data to output stream
    outstream = ROOT.std.ofstream(filename)
    # Optionally, adjust the stream here (e.g. std::setprecision)
    ds.write(outstream)
    outstream.close()


write_dataset(ds, "rf102_testData.txt")

Read data from input stream. The variables of the dataset need to be supplied to the RooDataSet::read() function.

In [12]:
print("\n-----------------------\nReading data from ASCII")
dataReadBack = ROOT.RooDataSet.read(
    "rf102_testData.txt",
    [x, y],  # variables to be read. If the file has more fields, these are ignored.
    "D",  # Prints if a RooFit message stream listens for debug messages. Use Q for quiet.
)

dataReadBack.Print("V")

print("\nOriginal data, line 20:")
ds.get(20).Print("V")

print("\nRead-back data, line 20:")
dataReadBack.get(20).Print("V")
-----------------------
Reading data from ASCII

Original data, line 20:

Read-back data, line 20:
[#1] INFO:DataHandling -- RooDataSet::read: reading file rf102_testData.txt
[#1] INFO:DataHandling -- RooDataSet::read: read 64 events (ignored 0 out of range events)
DataStore dataset (rf102_testData.txt)
  Contains 64 entries
  Observables: 
    1)           x = 0.0174204  L(-10 - 10)  "x"
    2)           y = 9.46654  L(-10 - 10)  "y"
    3)  blindState = Normal(idx = 0)
  "Blinding State"
  1) RooRealVar:: x = -0.79919
  2) RooRealVar:: y = 0.0106407
  1) RooRealVar::          x = -0.79919
  2) RooRealVar::          y = 0.0106407
  3) RooCategory:: blindState = Normal(idx = 0)
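A text round trip preserves values only up to the number of digits written. The following plain-Python sketch illustrates this under the assumption that values are written with six significant digits, which is the C++ stream default. Whether the file produced here uses exactly that precision is not verified. The helper `to_text` and the sample value are hypothetical.

```python
def to_text(value, digits=6):
    """Format a float with a given number of significant digits (like C's %g)."""
    return format(value, f".{digits}g")


original = 0.123456789012  # hypothetical value, not taken from the dataset

# Round trip with 6 significant digits: the trailing digits are lost.
six = float(to_text(original))
# Round trip with 17 significant digits: enough to recover any double exactly.
seventeen = float(to_text(original, 17))

print(to_text(original), six == original)
print(seventeen == original)
```

If exact read-back matters, raising the stream precision before writing (the spot marked in `write_dataset`) is the natural place to do it.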

Plot data set with multiple binning choices

Print number of events in dataset

In [13]:
ds.Print()
RooDataSet::ds[x,y] = 64 entries

Plot unbinned dataset with default frame binning (100 bins)

In [14]:
frame3 = y.frame(Title="Unbinned data shown in default frame binning")
ds.plotOn(frame3)
Out[14]:
<cppyy.gbl.RooPlot object at 0xac593f0>

Plot unbinned dataset with custom binning choice (20 bins)

In [15]:
frame4 = y.frame(Title="Unbinned data shown with custom binning")
ds.plotOn(frame4, Binning=20)

frame5 = y.frame(Title="Unbinned data read back from ASCII file")
ds.plotOn(frame5, Binning=20)
dataReadBack.plotOn(frame5, Binning=20, MarkerColor="r", MarkerStyle=5)
Out[15]:
<cppyy.gbl.RooPlot object at 0xab9c6f0>
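The effect of the binning choice can be estimated with simple arithmetic: over the range [-10,10], 100 bins are 0.2 wide and 20 bins are 1.0 wide, so 64 entries give on average 0.64 and 3.2 entries per bin respectively. The plain-Python sketch below reproduces this with a hypothetical `histogram` helper and 64 evenly spaced stand-in values. It only illustrates equal-width binning and is not how RooFit bins data internally.

```python
def histogram(values, nbins, lo, hi):
    """Count values into nbins equal-width bins on [lo, hi)."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        if lo <= v < hi:
            # min() guards against rounding pushing a value into bin index nbins.
            counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts


# 64 hypothetical values spread evenly over [-10, 10), standing in for ds.
values = [-10 + 20 * (i + 0.5) / 64 for i in range(64)]

fine = histogram(values, 100, -10, 10)   # default frame binning
coarse = histogram(values, 20, -10, 10)  # custom binning

print("mean entries per bin:", sum(fine) / len(fine), "vs", sum(coarse) / len(coarse))
```

With so few entries, the finer binning leaves many bins empty, which is why the coarser choice gives a more readable plot for this dataset.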

Draw all frames on a canvas

In [16]:
c = ROOT.TCanvas("rf102_dataimport", "rf102_dataimport", 800, 800)
c.Divide(3, 2)
c.cd(1)
ROOT.gPad.SetLeftMargin(0.15)
frame.GetYaxis().SetTitleOffset(1.4)
frame.Draw()
c.cd(2)
ROOT.gPad.SetLeftMargin(0.15)
frame2.GetYaxis().SetTitleOffset(1.4)
frame2.Draw()
c.cd(4)
ROOT.gPad.SetLeftMargin(0.15)
frame3.GetYaxis().SetTitleOffset(1.4)
frame3.Draw()
c.cd(5)
ROOT.gPad.SetLeftMargin(0.15)
frame4.GetYaxis().SetTitleOffset(1.4)
frame4.Draw()
c.cd(6)
ROOT.gPad.SetLeftMargin(0.15)
frame5.GetYaxis().SetTitleOffset(1.4)
frame5.Draw()

c.SaveAs("rf102_dataimport.png")
Info in <TCanvas::Print>: png file rf102_dataimport.png has been created

Draw all canvases

In [17]:
from ROOT import gROOT 
gROOT.GetListOfCanvases().Draw()