rf204b_extendedLikelihood_rangedFit¶

This macro demonstrates how to set up a fit in two ranges for plain likelihoods and extended likelihoods.

1. Shape fits (plain likelihood)¶

If you fit a non-extended pdf in two ranges, e.g. pdf->fitTo(data,Range("Range1,Range2")), it will fit the shapes in the two selected ranges and also take into account the relative predicted yields in those ranges.

This is useful for example to represent a full-range fit, but with a blinded signal region inside it.

2. Shape+rate fits (extended likelihood)¶

If your pdf is extended, i.e. measuring both the distribution in the observable as well as the event count in the fitted region, some intervention is needed to make fits in ranges work in a way that corresponds to intuition.

If an extended fit is performed in a sub-range, the observed yield is only that of the subrange, hence the expected event count will converge to a number that is smaller than what's visible in a plot. In such cases, it is often preferred to interpret the extended term with respect to the full range that's plotted, i.e., apply a correction to the extended likelihood term in such a way that the interpretation of the expected event count remains that of the full range. This can be done by applying a correcion factor (equal to the fraction of the pdf that is contained in the fitted range) in the Poisson term that represents the extended likelihood term.

If an extended likelihood fit is performed over two sub-ranges, this correction is even more important: without it, each component likelihood would have a different interpretation of the expected event count (each corresponding to the count in its own region), and a joint fit of these regions with different interpretations of the same model parameter results in a number that is not easily interpreted.

If both regions correct their interpretatin such that N_expected refers to the full range, it is interpreted easily, and consistent in both regions.

This requires that the likelihood model is extended using RooAddPdf in the form SumPdf = Nsig sigPdf + Nbkg bkgPdf.

Author: Stephan Hageboeck, Wouter Verkerke
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Wednesday, November 30, 2022 at 11:22 AM.

In [1]:
using namespace RooFit;


PART 1: Background-only fits¶

Build plain exponential model

In [2]:
RooRealVar x("x", "x", 10, 100);
RooRealVar alpha("alpha", "alpha", -0.04, -0.1, -0.0);
RooExponential model("model", "Exponential model", x, alpha);


Define side band regions and full range

In [3]:
x.setRange("LEFT",10,20);
x.setRange("RIGHT",60,100);

x.setRange("FULL",10,100);

std::unique_ptr<RooDataSet> data{model.generate(x, 10000)};

[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'LEFT' created with bounds [10,20]
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'RIGHT' created with bounds [60,100]
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'FULL' created with bounds [10,100]

input_line_51:7:1: warning: 'data' shadows a declaration with the same name in the 'std' namespace; use '::data' to reference this declaration
std::unique_ptr<RooDataSet> data{model.generate(x, 10000)};
^


Construct an extended pdf, which measures the event count N on the full range. If the actual domain of x that is fitted is identical to FULL, this has no affect.

If the fitted domain is a subset of FULL, though, the expected event count is divided by $$\mathrm{frac} = \frac{ \int_{\mathrm{Fit range}} \mathrm{model}(x) \; \mathrm{d}x }{ \int_{\mathrm{Full range}} \mathrm{model}(x) \; \mathrm{d}x }.$$ N will therefore return the count extrapolated to the full range instead of the fit range.

Note: When using a RooAddPdf for extending the likelihood, the same effect can be achieved with RooAddPdf::fixCoefRange(),

In [4]:
RooRealVar N("N", "Extended term", 0, 20000);
RooExtendPdf extmodel("extmodel", "Extended model", model, N, "FULL");


It can be instructive to fit the above model to either the LEFT or RIGHT range. N should approximately converge to the expected number of events in the full range. One may try to leave out "FULL" in the constructor, or the interpretation of N changes.

In [5]:
extmodel.fitTo(*data, Range("LEFT"), PrintLevel(-1));
N.Print();

input_line_53:2:18: error: reference to 'data' is ambiguous
extmodel.fitTo(*data, Range("LEFT"), PrintLevel(-1));
^
input_line_51:7:29: note: candidate found by name lookup is '__cling_N522::data'
std::unique_ptr<RooDataSet> data{model.generate(x, 10000)};
^
/usr/include/c++/9/bits/range_access.h:318:5: note: candidate found by name lookup is 'std::data'
data(initializer_list<_Tp> __il) noexcept
^
/usr/include/c++/9/bits/range_access.h:289:5: note: candidate found by name lookup is 'std::data'
data(_Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:299:5: note: candidate found by name lookup is 'std::data'
data(const _Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:309:5: note: candidate found by name lookup is 'std::data'
data(_Tp (&__array)[_Nm]) noexcept
^


If we now do a simultaneous fit to the extended model, instead of the original model, the LEFT and RIGHT range will each correct their local N such that it refers to the FULL range.

This joint fit of the extmodel will include (w.r.t. the plain model fit) a product of extended terms $$L_\mathrm{ext} = L \cdot \mathrm{Poisson} \left( N_\mathrm{obs}^\mathrm{LEFT} | N_\mathrm{exp} / \mathrm{frac LEFT} \right) \cdot \mathrm{Poisson} \left( N_\mathrm{obs}^\mathrm{RIGHT} | N_\mathrm{exp} / \mathrm{frac RIGHT} \right)$$

In [6]:
TCanvas* c = new TCanvas("c", "c", 2100, 700);
c->Divide(3);
c->cd(1);

std::unique_ptr<RooFitResult> r{model.fitTo(*data, Range("LEFT,RIGHT"), PrintLevel(-1), Save())};
r->Print();

RooPlot* frame = x.frame();
data->plotOn(frame);
model.plotOn(frame, VisualizeError(*r));
model.plotOn(frame);
model.paramOn(frame, Label("Non-extended fit"));
frame->Draw();

c->cd(2);

std::unique_ptr<RooFitResult> r2{extmodel.fitTo(*data, Range("LEFT,RIGHT"), PrintLevel(-1), Save())};
r2->Print();
RooPlot* frame2 = x.frame();
data->plotOn(frame2);
extmodel.plotOn(frame2);
extmodel.plotOn(frame2, VisualizeError(*r2));
extmodel.paramOn(frame2, Label("Extended fit"), Layout(0.4,0.95));
frame2->Draw();

input_line_54:6:46: error: reference to 'data' is ambiguous
std::unique_ptr<RooFitResult> r{model.fitTo(*data, Range("LEFT,RIGHT"), PrintLevel(-1), Save())};
^
input_line_51:7:29: note: candidate found by name lookup is '__cling_N522::data'
std::unique_ptr<RooDataSet> data{model.generate(x, 10000)};
^
/usr/include/c++/9/bits/range_access.h:318:5: note: candidate found by name lookup is 'std::data'
data(initializer_list<_Tp> __il) noexcept
^
/usr/include/c++/9/bits/range_access.h:289:5: note: candidate found by name lookup is 'std::data'
data(_Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:299:5: note: candidate found by name lookup is 'std::data'
data(const _Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:309:5: note: candidate found by name lookup is 'std::data'
data(_Tp (&__array)[_Nm]) noexcept
^
input_line_54:10:1: error: reference to 'data' is ambiguous
data->plotOn(frame);
^
input_line_51:7:29: note: candidate found by name lookup is '__cling_N522::data'
std::unique_ptr<RooDataSet> data{model.generate(x, 10000)};
^
/usr/include/c++/9/bits/range_access.h:318:5: note: candidate found by name lookup is 'std::data'
data(initializer_list<_Tp> __il) noexcept
^
/usr/include/c++/9/bits/range_access.h:289:5: note: candidate found by name lookup is 'std::data'
data(_Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:299:5: note: candidate found by name lookup is 'std::data'
data(const _Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:309:5: note: candidate found by name lookup is 'std::data'
data(_Tp (&__array)[_Nm]) noexcept
^
input_line_54:18:50: error: reference to 'data' is ambiguous
std::unique_ptr<RooFitResult> r2{extmodel.fitTo(*data, Range("LEFT,RIGHT"), PrintLevel(-1), Save())};
^
input_line_51:7:29: note: candidate found by name lookup is '__cling_N522::data'
std::unique_ptr<RooDataSet> data{model.generate(x, 10000)};
^
/usr/include/c++/9/bits/range_access.h:318:5: note: candidate found by name lookup is 'std::data'
data(initializer_list<_Tp> __il) noexcept
^
/usr/include/c++/9/bits/range_access.h:289:5: note: candidate found by name lookup is 'std::data'
data(_Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:299:5: note: candidate found by name lookup is 'std::data'
data(const _Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:309:5: note: candidate found by name lookup is 'std::data'
data(_Tp (&__array)[_Nm]) noexcept
^
input_line_54:21:1: error: reference to 'data' is ambiguous
data->plotOn(frame2);
^
input_line_51:7:29: note: candidate found by name lookup is '__cling_N522::data'
std::unique_ptr<RooDataSet> data{model.generate(x, 10000)};
^
/usr/include/c++/9/bits/range_access.h:318:5: note: candidate found by name lookup is 'std::data'
data(initializer_list<_Tp> __il) noexcept
^
/usr/include/c++/9/bits/range_access.h:289:5: note: candidate found by name lookup is 'std::data'
data(_Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:299:5: note: candidate found by name lookup is 'std::data'
data(const _Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:309:5: note: candidate found by name lookup is 'std::data'
data(_Tp (&__array)[_Nm]) noexcept
^


Now we repeat the above exercise, but instead of explicitly adding an extended term to a single shape pdf (RooExponential), we assume that we already have an extended likelihood model in the form of a RooAddPdf constructed in the form Nsig * sigPdf + Nbkg * bkgPdf.

We add a Gaussian to the previously defined exponential background. The signal shape parameters are chosen constant, since the signal region is entirely in the blinded region, i.e., the fit has no sensitivity.

In [7]:
c->cd(3);

RooRealVar Nsig("Nsig", "Number of signal events", 1000, 0, 2000);
RooRealVar Nbkg("Nbkg", "Number of background events", 10000, 0, 20000);

RooRealVar mean("mean", "Mean of signal model", 40.);
RooRealVar width("width", "Width of signal model", 5.);
RooGaussian sig("sig", "Signal model", x, mean, width);

RooAddPdf modelsum("modelsum", "NSig*signal + NBkg*background", {sig, model}, {Nsig, Nbkg});

input_line_56:2:3: error: use of undeclared identifier 'c'
(c->cd(3))
^
Error in <HandleInterpreterException>: Error evaluating expression (c->cd(3))
Execution of your code was aborted.


This model will automatically insert the correction factor for the reinterpretation of Nsig and Nnbkg in the full ranges.

When this happens, it reports this with lines like the following:

[#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_modelsum_modelsumData_LEFT) fixing interpretation of coefficients of any RooAddPdf to full domain of observables
[#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_modelsum_modelsumData_RIGHT) fixing interpretation of coefficients of any RooAddPdf to full domain of observables
In [8]:
std::unique_ptr<RooFitResult> r3{modelsum.fitTo(*data, Range("LEFT,RIGHT"), PrintLevel(-1), Save())};
r3->Print();

RooPlot* frame3 = x.frame();
data->plotOn(frame3);
modelsum.plotOn(frame3);
modelsum.plotOn(frame3, VisualizeError(*r3));
modelsum.paramOn(frame3, Label("S+B fit with RooAddPdf"), Layout(0.3,0.95));
frame3->Draw();

c->Draw();

input_line_57:2:51: error: reference to 'data' is ambiguous
std::unique_ptr<RooFitResult> r3{modelsum.fitTo(*data, Range("LEFT,RIGHT"), PrintLevel(-1), Save())};
^
input_line_51:7:29: note: candidate found by name lookup is '__cling_N522::data'
std::unique_ptr<RooDataSet> data{model.generate(x, 10000)};
^
/usr/include/c++/9/bits/range_access.h:318:5: note: candidate found by name lookup is 'std::data'
data(initializer_list<_Tp> __il) noexcept
^
/usr/include/c++/9/bits/range_access.h:289:5: note: candidate found by name lookup is 'std::data'
data(_Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:299:5: note: candidate found by name lookup is 'std::data'
data(const _Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:309:5: note: candidate found by name lookup is 'std::data'
data(_Tp (&__array)[_Nm]) noexcept
^
input_line_57:3:3: error: member reference type 'std::unique_ptr<RooFitResult>' is not a pointer; did you mean to use '.'?
r3->Print();
~~^~
.
input_line_57:3:5: error: no member named 'Print' in 'std::unique_ptr<RooFitResult, std::default_delete<RooFitResult> >'
r3->Print();
~~  ^
input_line_57:6:1: error: reference to 'data' is ambiguous
data->plotOn(frame3);
^
input_line_51:7:29: note: candidate found by name lookup is '__cling_N522::data'
std::unique_ptr<RooDataSet> data{model.generate(x, 10000)};
^
/usr/include/c++/9/bits/range_access.h:318:5: note: candidate found by name lookup is 'std::data'
data(initializer_list<_Tp> __il) noexcept
^
/usr/include/c++/9/bits/range_access.h:289:5: note: candidate found by name lookup is 'std::data'
data(_Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:299:5: note: candidate found by name lookup is 'std::data'
data(const _Container& __cont) noexcept(noexcept(__cont.data()))
^
/usr/include/c++/9/bits/range_access.h:309:5: note: candidate found by name lookup is 'std::data'
data(_Tp (&__array)[_Nm]) noexcept
^
input_line_57:8:40: error: indirection requires pointer operand ('std::unique_ptr<RooFitResult>' invalid)
modelsum.plotOn(frame3, VisualizeError(*r3));
^~~
input_line_57:12:1: error: use of undeclared identifier 'c'
c->Draw();
^