rf603_multicpu

Likelihood and minimization: setting up a multi-core parallelized unbinned maximum likelihood fit

Author: Clemens Lange, Wouter Verkerke (C++ version)
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Sunday, November 27, 2022 at 11:07 AM.

In [1]:
import ROOT
Welcome to JupyROOT 6.27/01

Create 3D pdf and data

Create observables

In [2]:
x = ROOT.RooRealVar("x", "x", -5, 5)
y = ROOT.RooRealVar("y", "y", -5, 5)
z = ROOT.RooRealVar("z", "z", -5, 5)

Create signal pdf gauss(x)gauss(y)gauss(z)

In [3]:
gx = ROOT.RooGaussian("gx", "gx", x, ROOT.RooFit.RooConst(0), ROOT.RooFit.RooConst(1))
gy = ROOT.RooGaussian("gy", "gy", y, ROOT.RooFit.RooConst(0), ROOT.RooFit.RooConst(1))
gz = ROOT.RooGaussian("gz", "gz", z, ROOT.RooFit.RooConst(0), ROOT.RooFit.RooConst(1))
sig = ROOT.RooProdPdf("sig", "sig", [gx, gy, gz])

Create background pdf poly(x)poly(y)poly(z)

In [4]:
px = ROOT.RooPolynomial("px", "px", x, [-0.1, 0.004])
py = ROOT.RooPolynomial("py", "py", y, [0.1, -0.004])
pz = ROOT.RooPolynomial("pz", "pz", z)
bkg = ROOT.RooProdPdf("bkg", "bkg", [px, py, pz])

Create composite pdf sig+bkg

In [5]:
fsig = ROOT.RooRealVar("fsig", "signal fraction", 0.1, 0.0, 1.0)
model = ROOT.RooAddPdf("model", "model", [sig, bkg], [fsig])

Generate large dataset

In [6]:
data = model.generate({x, y, z}, 200000)

Parallel fitting

In parallel mode the likelihood calculation is split in N pieces, that are calculated in parallel and added a posteriori before passing it back to MINUIT.

Use four processes and time results both in wall time and CPU time

In [7]:
model.fitTo(data, NumCPU=4, Timer=True)
Out[7]:
<cppyy.gbl.RooFitResult object at 0x(nil)>
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization
[#1] INFO:Eval -- RooAbsTestStatistic::initMPMode: started 4 remote server process.
 **********
 **    1 **SET PRINT           1
 **********
 **********
 **    2 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 fsig         1.00000e-01  5.00000e-02    0.00000e+00  1.00000e+00
 **********
 **    3 **SET ERR         0.5
 **********
 **********
 **    4 **SET PRINT           1
 **********
 **********
 **    5 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **    6 **MIGRAD         500           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
[#1] INFO:Minimization --  The following expressions have been identified as constant and will be precalculated and cached: (sig,bkg)
[#1] INFO:Minimization --  The following expressions have been identified as constant and will be precalculated and cached: (sig,bkg)
[#1] INFO:Minimization --  The following expressions have been identified as constant and will be precalculated and cached: (sig,bkg)
[#1] INFO:Minimization --  The following expressions have been identified as constant and will be precalculated and cached: (sig,bkg)
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=1.35338e+06 FROM MIGRAD    STATUS=INITIATE        6 CALLS           7 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  fsig         1.00000e-01   5.00000e-02   1.72186e-01  -1.02596e+02
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=1.35338e+06 FROM MIGRAD    STATUS=CONVERGED      16 CALLS          17 TOTAL
                     EDM=1.10509e-08    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  fsig         1.00271e-01   8.91444e-04   2.38484e-03  -3.54200e-02
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  7.947e-07 
[#1] INFO:Minimization -- Command timer: Real time 0:00:00, CP time 0.090
[#1] INFO:Minimization -- Session timer: Real time 0:00:00, CP time 0.090
 **********
 **    7 **SET ERR         0.5
 **********
 **********
 **    8 **SET PRINT           1
 **********
 **********
 **    9 **HESSE         500
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=1.35338e+06 FROM HESSE     STATUS=OK              5 CALLS          22 TOTAL
                     EDM=1.11088e-08    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  fsig         1.00271e-01   8.91444e-04   9.53935e-05  -9.26391e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  1    ERR DEF=0.5
  7.947e-07 
[#1] INFO:Minimization -- Command timer: Real time 0:00:00, CP time 0.010
[#1] INFO:Minimization -- Session timer: Real time 0:00:00, CP time 0.100, 2 slices
[#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization

Parallel MC projections

Construct signal, likelihood projection on (y,z) observables and likelihood ratio

In [8]:
sigyz = sig.createProjection({x})
totyz = model.createProjection({x})
llratio_func = ROOT.RooFormulaVar("llratio", "log10(@0)-log10(@1)", [sigyz, totyz])

Calculate likelihood ratio for each event, subset of events with high signal likelihood

In [9]:
data.addColumn(llratio_func)
dataSel = data.reduce(Cut="llratio>0.7")
[#1] INFO:InputArguments -- The formula llratio>0.7 claims to use the variables (x,y,z,llratio) but only (llratio) seem to be in use.
  inputs:         llratio>0.7

Make plot frame and plot data

In [10]:
frame = x.frame(Title="Projection on X with LLratio(y,z)>0.7", Bins=40)
dataSel.plotOn(frame)
Out[10]:
<cppyy.gbl.RooPlot object at 0x88216f0>

Perform parallel projection using MC integration of pdf using given input dataSet. In self mode the data-weighted average of the pdf is calculated by splitting the input dataset in N equal pieces and calculating in parallel the weighted average one each subset. The N results of those calculations are then weighted into the final result

Use four processes

In [11]:
model.plotOn(frame, ProjWData=dataSel, NumCPU=4)

c = ROOT.TCanvas("rf603_multicpu", "rf603_multicpu", 600, 600)
ROOT.gPad.SetLeftMargin(0.15)
frame.GetYaxis().SetTitleOffset(1.6)
frame.Draw()

c.SaveAs("rf603_multicpu.png")
[#1] INFO:Plotting -- RooAbsReal::plotOn(model) plot on x averages using data variables (y,z)
[#1] INFO:Plotting -- RooAbsReal::plotOn(model) only the following components of the projection data will be used: (y,z)
[#1] INFO:Plotting -- RooDataWeightedAverage::ctor(modelDataWgtAvg) constructing data weighted average of function model_Norm[x] over 14497 data points of (y,z) with a total weight of 14497
[#1] INFO:Eval -- RooAbsTestStatistic::initMPMode: started 4 remote server process.
[#1] INFO:Minimization --  The following expressions have been identified as constant and will be precalculated and cached: (gy,gz,py,pz)
[#1] INFO:Minimization --  The following expressions have been identified as constant and will be precalculated and cached: (gy,gz,py,pz)
[#1] INFO:Minimization --  The following expressions have been identified as constant and will be precalculated and cached: (gy,gz,py,pz)
[#1] INFO:Minimization --  The following expressions have been identified as constant and will be precalculated and cached: (gy,gz,py,pz)
Info in <TCanvas::Print>: png file rf603_multicpu.png has been created

Draw all canvases

In [12]:
from ROOT import gROOT 
gROOT.GetListOfCanvases().Draw()