Standard Test Stat Distribution Demo

StandardTestStatDistributionDemo.C

This script plots the sampling distribution of the profile likelihood ratio test statistic for the model in the input file. To do this, one needs to specify the value of the parameter of interest at which the test statistic is evaluated and the parameter values used to generate the toy data. Here, the parameter of interest is set to the upper limit estimated by the ProfileLikelihoodCalculator, which assumes the asymptotic chi-square distribution for -2 log profile likelihood ratio. The script is therefore handy for checking whether the asymptotic approximation is valid. To aid that comparison, the script also overlays a chi-square distribution. The most common parameter of interest is proportional to the signal rate and often has a lower limit of 0, which breaks the standard chi-square distribution. The script therefore allows the parameter to be negative, so that the overlaid chi-square is the correct asymptotic distribution.
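
For reference, the quantities described above can be written out as follows. This is the usual textbook form, not something taken from the macro: the symbols \theta (nuisance parameters), \hat{\hat{\theta}}(\mu) (their conditional best-fit values at fixed \mu) and n (the number of parameters of interest) are notation chosen here for illustration.

```latex
\lambda(\mu) = \frac{L\bigl(\mu, \hat{\hat{\theta}}(\mu)\bigr)}{L\bigl(\hat{\mu}, \hat{\theta}\bigr)},
\qquad
-2 \ln \lambda(\mu) \sim \chi^2_n
\quad \text{(asymptotically, for data generated at the tested value of } \mu \text{)}
```

The chi-square limit relies on regularity conditions, one of which is that the tested value does not sit on a boundary of the allowed parameter range.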

Author: Kyle Cranmer
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Thursday, June 30, 2022 at 10:04 AM.

In [1]:
%%cpp -d
#include "TFile.h"
#include "TROOT.h"
#include "TH1F.h"
#include "TCanvas.h"
#include "TSystem.h"
#include "TF1.h"

#include "RooWorkspace.h"
#include "RooAbsData.h"

#include "RooStats/ModelConfig.h"
#include "RooStats/FeldmanCousins.h"
#include "RooStats/ToyMCSampler.h"
#include "RooStats/PointSetInterval.h"
#include "RooStats/ConfidenceBelt.h"

#include "RooStats/ProfileLikelihoodCalculator.h"
#include "RooStats/LikelihoodInterval.h"
#include "RooStats/ProfileLikelihoodTestStat.h"
#include "RooStats/SamplingDistribution.h"
#include "RooStats/SamplingDistPlot.h"
In [2]:
%%cpp -d
// This is a workaround to make sure the namespace is used inside functions
using namespace RooFit;
using namespace RooStats;
In [3]:
bool useProof = false; // flag to control whether to use Proof
int nworkers = 0;      // number of workers (default use all available cores)

The actual macro

Arguments are defined.

In [4]:
const char *infile = "";
const char *workspaceName = "combined";
const char *modelConfigName = "ModelConfig";
const char *dataName = "obsData";

The number of toy Monte Carlo experiments used to generate the distribution

In [5]:
int nToyMC = 1000;

The flag below must be true for the asymptotic distribution to be chi-square; set it to false if your model is not numerically stable for mu<0

In [6]:
bool allowNegativeMu = true;
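
To see why the lower bound matters, here is a minimal standard-library sketch, not part of the macro. It assumes a toy model with a single Gaussian measurement of unit width; the function names and the fixed seed are hypothetical. It counts how often -2 log lambda exceeds 3.841, the 95% quantile of a chi-square distribution with one degree of freedom, with and without a bound at mu = 0.

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Toy model (an assumption for illustration): one measurement x ~ Gauss(mu, 1).
// -2 log lambda(muTest) = (x - muTest)^2 - (x - muHat)^2, where muHat is the
// best-fit value, optionally restricted to muHat >= 0.
double minusTwoLogLambda(double x, double muTest, bool allowNegativeMu)
{
   const double muHat = allowNegativeMu ? x : std::max(x, 0.0);
   return (x - muTest) * (x - muTest) - (x - muHat) * (x - muHat);
}

// Fraction of toys, generated at mu = muTest, whose test statistic exceeds
// 3.841, the 95% quantile of a chi-square distribution with one degree of freedom.
double tailFraction(double muTest, bool allowNegativeMu, int nToys, unsigned seed = 12345)
{
   assert(nToys > 0);
   std::mt19937 rng(seed);
   std::normal_distribution<double> gauss(muTest, 1.0);
   int nAbove = 0;
   for (int i = 0; i < nToys; ++i) {
      if (minusTwoLogLambda(gauss(rng), muTest, allowNegativeMu) > 3.841)
         ++nAbove;
   }
   return static_cast<double>(nAbove) / nToys;
}
```

In this toy, `tailFraction(1.0, true, 200000)` should come out close to 0.05, while `tailFraction(1.0, false, 200000)` comes out lower, because toys with x < 0 are fitted at the boundary and no longer follow the chi-square shape.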

The first part accesses a user-defined file, or creates the standard example file if it doesn't exist

In [7]:
const char *filename = "";
if (!strcmp(infile, "")) {
   filename = "results/example_combined_GaussExample_model.root";
   bool fileExist = !gSystem->AccessPathName(filename); // note opposite return code
   // if the file does not exist, generate it with HistFactory
   if (!fileExist) {
#ifdef _WIN32
      cout << "HistFactory file cannot be generated on Windows - exit" << endl;
      return;
#endif
      // Normally this would be run on the command line
      cout << "will run standard hist2workspace example" << endl;
      gROOT->ProcessLine(".! prepareHistFactory .");
      gROOT->ProcessLine(".! hist2workspace config/example.xml");
      cout << "\n\n---------------------" << endl;
      cout << "Done creating example input" << endl;
      cout << "---------------------\n\n" << endl;
   }

} else
   filename = infile;

Try to open the file

In [8]:
TFile *file = TFile::Open(filename);

If the input file was specified but not found, quit

In [9]:
if (!file) {
   cout << "StandardTestStatDistributionDemo: Input file " << filename << " is not found" << endl;
   return;
}

Now get the data and workspace

Get the workspace out of the file

In [10]:
RooWorkspace *w = (RooWorkspace *)file->Get(workspaceName);
if (!w) {
   cout << "workspace not found" << endl;
   return;
}

Get the ModelConfig out of the workspace

In [11]:
ModelConfig *mc = (ModelConfig *)w->obj(modelConfigName);

Get the data out of the workspace

In [12]:
RooAbsData *data = w->data(dataName);
input_line_87:2:2: warning: 'data' shadows a declaration with the same name in the 'std' namespace; use '::data' to reference this declaration
 RooAbsData *data = w->data(dataName);
 ^

Make sure ingredients are found

In [13]:
if (!data || !mc) {
   w->Print();
   cout << "data or ModelConfig was not found" << endl;
   return;
}

mc->Print();
input_line_88:2:7: error: reference to 'data' is ambiguous
 if (!data || !mc) {
      ^
input_line_87:2:14: note: candidate found by name lookup is '__cling_N530::data'
 RooAbsData *data = w->data(dataName);
             ^
/usr/include/c++/9/bits/range_access.h:318:5: note: candidate found by name lookup is 'std::data'
    data(initializer_list<_Tp> __il) noexcept
    ^
/usr/include/c++/9/bits/range_access.h:289:5: note: candidate found by name lookup is 'std::data'
    data(_Container& __cont) noexcept(noexcept(__cont.data()))
    ^
/usr/include/c++/9/bits/range_access.h:299:5: note: candidate found by name lookup is 'std::data'
    data(const _Container& __cont) noexcept(noexcept(__cont.data()))
    ^
/usr/include/c++/9/bits/range_access.h:309:5: note: candidate found by name lookup is 'std::data'
    data(_Tp (&__array)[_Nm]) noexcept
    ^

Now find the upper limit based on the asymptotic results
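
As a rough illustration of what "asymptotic" means for this step, consider a toy Gaussian likelihood for which -2 log lambda(mu) = ((mu - muHat)/sigma)^2. Assuming a 95% confidence level, the upper edge of the interval is where this quantity crosses 3.841, the 95% quantile of a chi-square distribution with one degree of freedom. The sketch below finds that crossing by bisection. It is not how ProfileLikelihoodCalculator is implemented, and all names in it are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Toy -2 log lambda for a Gaussian likelihood with best-fit muHat and width sigma.
double toyMinusTwoLogLambda(double mu, double muHat, double sigma)
{
   const double pull = (mu - muHat) / sigma;
   return pull * pull;
}

// Bisection for the upper crossing of `threshold` on the range [muHat, muMax].
double toyUpperLimit(double muHat, double sigma, double muMax, double threshold = 3.841)
{
   assert(sigma > 0 && muMax > muHat);
   assert(toyMinusTwoLogLambda(muMax, muHat, sigma) > threshold); // crossing is bracketed
   double lo = muHat, hi = muMax;
   for (int i = 0; i < 100; ++i) {
      const double mid = 0.5 * (lo + hi);
      if (toyMinusTwoLogLambda(mid, muHat, sigma) < threshold)
         lo = mid;
      else
         hi = mid;
   }
   return 0.5 * (lo + hi);
}
```

For this toy the result agrees with the closed form muHat + sigma * sqrt(3.841), which is the familiar "best fit plus 1.96 sigma" rule.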

In [14]:
RooRealVar *firstPOI = (RooRealVar *)mc->GetParametersOfInterest()->first();
ProfileLikelihoodCalculator plc(*data, *mc);
LikelihoodInterval *interval = plc.GetInterval();
double plcUpperLimit = interval->UpperLimit(*firstPOI);
delete interval;
cout << "\n\n--------------------------------------" << endl;
cout << "Will generate sampling distribution at " << firstPOI->GetName() << " = " << plcUpperLimit << endl;
int nPOI = mc->GetParametersOfInterest()->getSize();
if (nPOI > 1) {
   cout << "not sure what to do with other parameters of interest, but here are their values" << endl;
   mc->GetParametersOfInterest()->Print("v");
}
input_line_89:3:34: error: reference to 'data' is ambiguous
ProfileLikelihoodCalculator plc(*data, *mc);
                                 ^

Create the test statistic

In [15]:
ProfileLikelihoodTestStat ts(*mc->GetPdf());

To avoid boundary effects and simplify the asymptotic comparison, set min=-max

In [16]:
if (allowNegativeMu)
   firstPOI->setMin(-1 * firstPOI->getMax());
input_line_92:2:3: error: use of undeclared identifier 'firstPOI'
 (firstPOI->setMin(-1 * firstPOI->getMax()))
  ^
input_line_92:2:25: error: use of undeclared identifier 'firstPOI'
 (firstPOI->setMin(-1 * firstPOI->getMax()))
                        ^
Error in <HandleInterpreterException>: Error evaluating expression (firstPOI->setMin(-1 * firstPOI->getMax()))
Execution of your code was aborted.

Temporary RooArgSet

In [17]:
RooArgSet poi;
poi.add(*mc->GetParametersOfInterest());

Create and configure the ToyMCSampler

In [18]:
ToyMCSampler sampler(ts, nToyMC);
sampler.SetPdf(*mc->GetPdf());
sampler.SetObservables(*mc->GetObservables());
sampler.SetGlobalObservables(*mc->GetGlobalObservables());
if (!mc->GetPdf()->canBeExtended() && (data->numEntries() == 1)) {
   cout << "tell it to use 1 event" << endl;
   sampler.SetNEventsPerToy(1);
}
firstPOI->setVal(plcUpperLimit);                                  // set POI value for generation
sampler.SetParametersForTestStat(*mc->GetParametersOfInterest()); // set POI value for evaluation

if (useProof) {
   ProofConfig pc(*w, nworkers, "", false);
   sampler.SetProofConfig(&pc); // enable proof
}

firstPOI->setVal(plcUpperLimit);
RooArgSet allParameters;
allParameters.add(*mc->GetParametersOfInterest());
allParameters.add(*mc->GetNuisanceParameters());
allParameters.Print("v");

SamplingDistribution *sampDist = sampler.GetSamplingDistribution(allParameters);
SamplingDistPlot plot;
plot.AddSamplingDistribution(sampDist);
plot.GetTH1F(sampDist)->GetYaxis()->SetTitle(
   Form("f(-log #lambda(#mu=%.2f) | #mu=%.2f)", plcUpperLimit, plcUpperLimit));
plot.SetAxisTitle(Form("-log #lambda(#mu=%.2f)", plcUpperLimit));

TCanvas *c1 = new TCanvas("c1");
c1->SetLogy();
plot.Draw();
double min = plot.GetTH1F(sampDist)->GetXaxis()->GetXmin();
double max = plot.GetTH1F(sampDist)->GetXaxis()->GetXmax();

TF1 *f = new TF1("f", Form("2*ROOT::Math::chisquared_pdf(2*x,%d,0)", nPOI), min, max);
f->Draw("same");
c1->SaveAs("standard_test_stat_distribution.pdf");
input_line_94:6:40: error: reference to 'data' is ambiguous
if (!mc->GetPdf()->canBeExtended() && (data->numEntries() == 1)) {
                                       ^
input_line_94:6:40: error: unknown type name 'data'
if (!mc->GetPdf()->canBeExtended() && (data->numEntries() == 1)) {
                                       ^
input_line_94:6:44: error: expected ')'
if (!mc->GetPdf()->canBeExtended() && (data->numEntries() == 1)) {
                                           ^
input_line_94:6:39: note: to match this '('
if (!mc->GetPdf()->canBeExtended() && (data->numEntries() == 1)) {
                                      ^
input_line_94:10:1: error: use of undeclared identifier 'firstPOI'
firstPOI->setVal(plcUpperLimit);                                  // set POI value for generation
^
input_line_94:10:18: error: use of undeclared identifier 'plcUpperLimit'
firstPOI->setVal(plcUpperLimit);                                  // set POI value for generation
                 ^
input_line_94:18:1: error: use of undeclared identifier 'firstPOI'
firstPOI->setVal(plcUpperLimit);
^
input_line_94:18:18: error: use of undeclared identifier 'plcUpperLimit'
firstPOI->setVal(plcUpperLimit);
                 ^
input_line_94:28:49: error: use of undeclared identifier 'plcUpperLimit'
   Form("f(-log #lambda(#mu=%.2f) | #mu=%.2f)", plcUpperLimit, plcUpperLimit));
                                                ^
input_line_94:28:64: error: use of undeclared identifier 'plcUpperLimit'
   Form("f(-log #lambda(#mu=%.2f) | #mu=%.2f)", plcUpperLimit, plcUpperLimit));
                                                               ^
input_line_94:29:50: error: use of undeclared identifier 'plcUpperLimit'
plot.SetAxisTitle(Form("-log #lambda(#mu=%.2f)", plcUpperLimit));
                                                 ^
input_line_94:37:70: error: use of undeclared identifier 'nPOI'
TF1 *f = new TF1("f", Form("2*ROOT::Math::chisquared_pdf(2*x,%d,0)", nPOI), min, max);
                                                                     ^
input_line_94:34:1: warning: 'min' shadows a declaration with the same name in the 'std' namespace; use '::min' to reference this declaration
double min = plot.GetTH1F(sampDist)->GetXaxis()->GetXmin();
^
input_line_94:35:1: warning: 'max' shadows a declaration with the same name in the 'std' namespace; use '::max' to reference this declaration
double max = plot.GetTH1F(sampDist)->GetXaxis()->GetXmax();
^
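
The overlay formula in the cell above, 2*chisquared_pdf(2*x, nPOI), follows from a change of variables: if q = -2 log lambda is chi-square distributed with density f_q, then t = -log lambda = q/2 has density f_t(t) = 2 f_q(2t). The standard-library sketch below checks numerically that this density is normalized. Here `chi2Pdf` is a hand-written stand-in for `ROOT::Math::chisquared_pdf`, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Chi-square density with n degrees of freedom; returns 0 for x <= 0.
double chi2Pdf(double x, int n)
{
   assert(n > 0);
   if (x <= 0) return 0.0;
   const double k = 0.5 * n;
   return std::exp((k - 1.0) * std::log(x) - 0.5 * x - k * std::log(2.0) - std::lgamma(k));
}

// Density of t = q/2 when q is chi-square distributed: f_t(t) = 2 * f_q(2t).
double halfChi2Pdf(double t, int n)
{
   return 2.0 * chi2Pdf(2.0 * t, n);
}

// Midpoint-rule integral of halfChi2Pdf over [0, tMax].
double integrateHalfChi2(int n, double tMax, int nSteps)
{
   assert(nSteps > 0 && tMax > 0);
   const double h = tMax / nSteps;
   double sum = 0.0;
   for (int i = 0; i < nSteps; ++i)
      sum += halfChi2Pdf((i + 0.5) * h, n);
   return sum * h;
}
```

For n = 2 the transformed density reduces to exp(-t), which makes a convenient cross-check. For n = 1 the density diverges at t = 0, so a simple midpoint rule converges slowly there.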

Draw all canvases

In [19]:
gROOT->GetListOfCanvases()->Draw()