SPlot tutorial
This tutorial shows an example of using SPlot to unfold two distributions. The physics context is that we want to know the isolation distribution for real electrons from Z events and for fake electrons from QCD. Isolation is our 'control' variable. To unfold the two distributions, we need a model for a variable that discriminates between Z and QCD and is uncorrelated with isolation. For this we use the invariant mass of two electrons. We model the Z with a Gaussian and the QCD with a falling exponential.
Note that because we don't have real data in this tutorial, we need to generate toy data. To do that we need a model of the isolation variable for both Z and QCD. These isolation models are used only to generate the toy data and would not be needed if we had real data.
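For reference, the weights that the sPlot technique assigns to each event can be written compactly. The notation here is an assumption of this note and does not appear in the macro: $N_k$ are the fitted yields, $f_k(m_e)$ are the normalised invariant-mass pdfs (Gaussian for Z, exponential for QCD) evaluated at event $e$, and $V$ is the covariance matrix of the yields from a fit in which all shape parameters are held fixed.

```latex
w_n(m_e) = \frac{\sum_{j} V_{nj}\, f_j(m_e)}{\sum_{k} N_k\, f_k(m_e)},
\qquad
\left(V^{-1}\right)_{nj} = \sum_{e=1}^{N} \frac{f_n(m_e)\, f_j(m_e)}{\left(\sum_{k} N_k\, f_k(m_e)\right)^{2}},
\qquad n, j, k \in \{\mathrm{Z}, \mathrm{QCD}\}.
```

At the maximum of the extended likelihood these weights have two properties: for each event the weights sum to one over the species, and for each species the weights sum to its fitted yield over the events. The checks printed by DoSPlot test both, up to the numerical precision of the fit.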
Author: Kyle Cranmer
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Tuesday, March 19, 2024 at 07:18 PM.
%%cpp -d
#include "RooRealVar.h"
#include "RooStats/SPlot.h"
#include "RooDataSet.h"
#include "RooPlot.h"
#include "RooMsgService.h"
#include "RooGaussian.h"
#include "RooExponential.h"
#include "RooChebychev.h"
#include "RooAddPdf.h"
#include "RooProdPdf.h"
#include "RooAddition.h"
#include "RooProduct.h"
#include "RooAbsPdf.h"
#include "RooFitResult.h"
#include "RooWorkspace.h"
#include "RooConstVar.h"
#include "TCanvas.h"
#include "TLegend.h"
#include <iomanip>
#include <iostream>
#include <memory>
using namespace RooFit;
using namespace RooStats;
void AddModel(RooWorkspace &);
void AddData(RooWorkspace &);
void DoSPlot(RooWorkspace &);
void MakePlots(RooWorkspace &);
%%cpp -d
void AddModel(RooWorkspace &ws)
{
// Make models for the two components: Z (real electrons) and QCD (fake electrons).
// In real life, this part requires an intelligent modeling
// of each component -- this is only an example.
// set range of observable
Double_t lowRange = 0., highRange = 200.;
// make a RooRealVar for the observables
RooRealVar invMass("invMass", "M_{inv}", lowRange, highRange, "GeV");
RooRealVar isolation("isolation", "isolation", 0., 20., "GeV");
// --------------------------------------
// make 2-d model for Z including the invariant mass
// distribution and an isolation distribution which we want to
// unfold from QCD.
std::cout << "make z model" << std::endl;
// mass model for Z
RooRealVar mZ("mZ", "Z Mass", 91.2, lowRange, highRange);
RooRealVar sigmaZ("sigmaZ", "Width of Gaussian", 2, 0, 10, "GeV");
RooGaussian mZModel("mZModel", "Z+jets Model", invMass, mZ, sigmaZ);
// we know Z mass
mZ.setConstant();
// we leave the width of the Z free during the fit in this example.
// isolation model for Z. Only used to generate toy MC.
// the exponential is of the form exp(c*x). If we want
// the isolation to decay an e-fold every R GeV, we use
// c = -1/R.
RooConstVar zIsolDecayConst("zIsolDecayConst", "z isolation decay constant", -1);
RooExponential zIsolationModel("zIsolationModel", "z isolation model", isolation, zIsolDecayConst);
// make the combined Z model
RooProdPdf zModel("zModel", "2-d model for Z", RooArgSet(mZModel, zIsolationModel));
// --------------------------------------
// make QCD model
std::cout << "make qcd model" << std::endl;
// mass model for QCD.
// the exponential is of the form exp(c*x). If we want
// the mass to decay an e-fold every R GeV, we use
// c = -1/R.
// We can leave this parameter free during the fit.
RooRealVar qcdMassDecayConst("qcdMassDecayConst", "Decay const for QCD mass spectrum", -0.01, -100, 100, "1/GeV");
RooExponential qcdMassModel("qcdMassModel", "qcd Mass Model", invMass, qcdMassDecayConst);
// isolation model for QCD. Only used to generate toy MC
// the exponential is of the form exp(c*x). If we want
// the isolation to decay an e-fold every R GeV, we use
// c = -1/R.
RooConstVar qcdIsolDecayConst("qcdIsolDecayConst", "QCD isolation decay constant", -.1);
RooExponential qcdIsolationModel("qcdIsolationModel", "QCD isolation model", isolation, qcdIsolDecayConst);
// make the 2-d model
RooProdPdf qcdModel("qcdModel", "2-d model for QCD", RooArgSet(qcdMassModel, qcdIsolationModel));
// --------------------------------------
// combined model
// These variables represent the number of Z or QCD events
// They will be fitted.
RooRealVar zYield("zYield", "fitted yield for Z", 500, 0., 5000);
RooRealVar qcdYield("qcdYield", "fitted yield for QCD", 1000, 0., 10000);
// now make the combined models
std::cout << "make full model" << std::endl;
RooAddPdf model("model", "z+qcd background models", {zModel, qcdModel}, {zYield, qcdYield});
RooAddPdf massModel("massModel", "z+qcd invariant mass model", {mZModel, qcdMassModel}, {zYield, qcdYield});
// interesting for debugging and visualizing the model
model.graphVizTree("fullModel.dot");
std::cout << "import model" << std::endl;
ws.import(model);
ws.import(massModel, RecycleConflictNodes());
}
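The invariant-mass shapes built in AddModel can be illustrated without RooFit. The following is a minimal sketch, not the RooFit implementation: the function names are hypothetical, and each shape is normalised by hand on the observable range. It shows the c = -1/R convention from the comments: with the initial value -0.01 of qcdMassDecayConst, the QCD mass spectrum falls by one e-fold every R = 100 GeV.

```cpp
#include <cassert>
#include <cmath>

// Exponential shape exp(c*x), normalised to unit area on [lo, hi].
// c must be non-zero; c = -1/R gives one e-fold of decay every R units of x.
double truncatedExpPdf(double x, double c, double lo, double hi)
{
   assert(c != 0.0 && lo < hi);
   if (x < lo || x > hi) return 0.0;
   const double norm = (std::exp(c * hi) - std::exp(c * lo)) / c;
   return std::exp(c * x) / norm;
}

// Gaussian shape, normalised to unit area on [lo, hi].
double truncatedGaussPdf(double x, double mean, double sigma, double lo, double hi)
{
   assert(sigma > 0.0 && lo < hi);
   if (x < lo || x > hi) return 0.0;
   const double s = sigma * std::sqrt(2.0);
   const double area = 0.5 * (std::erf((hi - mean) / s) - std::erf((lo - mean) / s));
   const double pi = std::acos(-1.0);
   const double z = (x - mean) / sigma;
   return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * pi) * area);
}

// Expected event density in invariant mass on [0, 200] GeV:
// the yield-weighted sum of the Z and QCD shapes, with the Z mass fixed at 91.2 GeV.
double massDensity(double m, double nZ, double nQcd, double sigmaZ, double c)
{
   return nZ * truncatedGaussPdf(m, 91.2, sigmaZ, 0., 200.) + nQcd * truncatedExpPdf(m, c, 0., 200.);
}
```

Because both shapes are normalised to one, integrating massDensity over the mass range returns nZ + nQcd, which is why the coefficients of the RooAddPdf can be read as event yields.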
%%cpp -d
void AddData(RooWorkspace &ws)
{
// Add a toy dataset
// get what we need out of the workspace to make toy data
RooAbsPdf *model = ws.pdf("model");
RooRealVar *invMass = ws.var("invMass");
RooRealVar *isolation = ws.var("isolation");
// make the toy data
std::cout << "make data set and import to workspace" << std::endl;
std::unique_ptr<RooDataSet> data{model->generate({*invMass, *isolation})};
// import data into workspace
ws.import(*data, Rename("data"));
}
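What the toy generation amounts to can be sketched in standard C++. This is an illustration under stated assumptions, not what RooAbsPdf::generate does internally: the names ToyEvent, sampleTruncatedExp and generateToy are hypothetical, the number of events per species is fixed by the caller rather than drawn at random, and the shape values are the initial ones from AddModel. The key point is that mass and isolation are drawn independently within each species, which is the assumption sPlot relies on.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct ToyEvent {
   double invMass;
   double isolation;
   bool isZ; // truth label, available only because the data are simulated
};

// Inverse CDF of the shape exp(c*x) restricted to [lo, hi].
// u is a uniform random number in [0, 1]; c must be non-zero.
double sampleTruncatedExp(double u, double c, double lo, double hi)
{
   assert(c != 0.0 && lo < hi && u >= 0.0 && u <= 1.0);
   const double a = std::exp(c * lo);
   const double b = std::exp(c * hi);
   return std::log(a + u * (b - a)) / c;
}

std::vector<ToyEvent> generateToy(int nZ, int nQcd, unsigned seed)
{
   std::mt19937 rng(seed);
   std::uniform_real_distribution<double> uniform(0.0, 1.0);
   std::normal_distribution<double> zMass(91.2, 2.0);
   std::vector<ToyEvent> events;
   events.reserve(static_cast<std::size_t>(nZ + nQcd));
   for (int i = 0; i < nZ; ++i) {
      double m = zMass(rng);
      while (m < 0.0 || m > 200.0) m = zMass(rng); // keep the mass inside the observable range
      events.push_back({m, sampleTruncatedExp(uniform(rng), -1.0, 0.0, 20.0), true});
   }
   for (int i = 0; i < nQcd; ++i) {
      const double m = sampleTruncatedExp(uniform(rng), -0.01, 0.0, 200.0);
      events.push_back({m, sampleTruncatedExp(uniform(rng), -0.1, 0.0, 20.0), false});
   }
   return events;
}
```

A call such as generateToy(500, 1000, 42) gives 1500 events, the same sample size as the toy dataset in this tutorial.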
%%cpp -d
void DoSPlot(RooWorkspace &ws)
{
std::cout << "Calculate sWeights" << std::endl;
// get what we need out of the workspace to do the fit
RooAbsPdf *model = ws.pdf("model");
RooAbsPdf *massModel = ws.pdf("massModel");
RooRealVar *zYield = ws.var("zYield");
RooRealVar *qcdYield = ws.var("qcdYield");
RooDataSet& data = static_cast<RooDataSet&>(*ws.data("data"));
// The sPlot technique requires that we fix the parameters
// of the model that are not yields after doing the fit.
//
// This *could* be done with the lines below, however this is taken care of
// by the RooStats::SPlot constructor (or more precisely the AddSWeight
// method).
//
// RooRealVar* sigmaZ = ws.var("sigmaZ");
// RooRealVar* qcdMassDecayConst = ws.var("qcdMassDecayConst");
// sigmaZ->setConstant();
// qcdMassDecayConst->setConstant();
RooMsgService::instance().setSilentMode(true);
std::cout << "\n\n------------------------------------------\nThe dataset before creating sWeights:\n";
data.Print();
RooMsgService::instance().setGlobalKillBelow(RooFit::ERROR);
// Now we use the SPlot class to add SWeights for the isolation variable to
// our data set based on fitting the yields to the invariant mass variable
RooStats::SPlot sData{"sData", "An SPlot", data, massModel, RooArgList(*zYield, *qcdYield)};
std::cout << "\n\nThe dataset after creating sWeights:\n";
data.Print();
// Check that our weights have the desired properties
std::cout << "\n\n------------------------------------------\n\nCheck SWeights:" << std::endl;
std::cout << std::endl
<< "Yield of Z is\t" << zYield->getVal() << ". From sWeights it is "
<< sData.GetYieldFromSWeight("zYield") << std::endl;
std::cout << "Yield of QCD is\t" << qcdYield->getVal() << ". From sWeights it is "
<< sData.GetYieldFromSWeight("qcdYield") << std::endl
<< std::endl;
for (Int_t i = 0; i < 10; i++) {
std::cout << "z Weight for event " << i << std::right << std::setw(12) << sData.GetSWeight(i, "zYield") << " qcd Weight"
<< std::setw(12) << sData.GetSWeight(i, "qcdYield") << " Total Weight" << std::setw(12) << sData.GetSumOfEventSWeight(i)
<< std::endl;
}
std::cout << std::endl;
// import this new dataset with sWeights
std::cout << "import new dataset with sWeights" << std::endl;
ws.import(data, Rename("dataWithSWeights"));
RooMsgService::instance().setGlobalKillBelow(RooFit::INFO);
}
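To make the sWeight calculation concrete, here is a minimal two-species sketch in standard C++. It is not the RooStats::SPlot implementation: the names SWeights and computeSWeights are hypothetical, and the yield fit uses a simple fixed-point (EM-style) update of the extended likelihood instead of RooFit's minimizer. The inputs are the two normalised mass pdfs evaluated at each event, with all shape parameters already fixed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct SWeights {
   double nZ = 0.0, nQcd = 0.0;  // fitted yields
   std::vector<double> wZ, wQcd; // per-event sWeights
};

// fZ[e] and fQcd[e] are the normalised mass pdfs evaluated at event e.
// Assumes at least one pdf is positive at every event.
SWeights computeSWeights(const std::vector<double> &fZ, const std::vector<double> &fQcd)
{
   assert(fZ.size() == fQcd.size() && !fZ.empty());
   const std::size_t n = fZ.size();
   SWeights r;
   r.nZ = r.nQcd = 0.5 * static_cast<double>(n);

   // Yield fit: iterate N_k <- sum_e N_k f_k(e) / D(e), with D = N_Z f_Z + N_QCD f_QCD.
   // A fixed point satisfies sum_e f_k(e) / D(e) = 1, the extended-likelihood condition.
   for (int iter = 0; iter < 10000; ++iter) {
      double newZ = 0.0, newQcd = 0.0;
      for (std::size_t e = 0; e < n; ++e) {
         const double d = r.nZ * fZ[e] + r.nQcd * fQcd[e];
         newZ += r.nZ * fZ[e] / d;
         newQcd += r.nQcd * fQcd[e] / d;
      }
      const double change = std::fabs(newZ - r.nZ) + std::fabs(newQcd - r.nQcd);
      r.nZ = newZ;
      r.nQcd = newQcd;
      if (change < 1e-12 * static_cast<double>(n)) break;
   }

   // Inverse covariance matrix of the yields: sum_e f_n f_j / D^2.
   double a = 0.0, b = 0.0, c = 0.0;
   for (std::size_t e = 0; e < n; ++e) {
      const double d = r.nZ * fZ[e] + r.nQcd * fQcd[e];
      a += fZ[e] * fZ[e] / (d * d);
      b += fZ[e] * fQcd[e] / (d * d);
      c += fQcd[e] * fQcd[e] / (d * d);
   }
   const double det = a * c - b * b;
   assert(det > 0.0);
   const double vZZ = c / det, vZQ = -b / det, vQQ = a / det; // 2x2 inverse

   // sWeights: w_n(e) = sum_j V_nj f_j(e) / D(e).
   for (std::size_t e = 0; e < n; ++e) {
      const double d = r.nZ * fZ[e] + r.nQcd * fQcd[e];
      r.wZ.push_back((vZZ * fZ[e] + vZQ * fQcd[e]) / d);
      r.wQcd.push_back((vZQ * fZ[e] + vQQ * fQcd[e]) / d);
   }
   return r;
}
```

Summing wZ over all events should reproduce nZ, and wZ[e] + wQcd[e] should be close to one for every event. These are the same two properties that DoSPlot prints with GetYieldFromSWeight and GetSumOfEventSWeight.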
Definition of a helper function:
%%cpp -d
void MakePlots(RooWorkspace &ws)
{
// Here we make plots of the discriminating variable (invMass) after the fit
// and of the control variable (isolation) after unfolding with sPlot.
std::cout << "make plots" << std::endl;
// make our canvas
TCanvas *cdata = new TCanvas("sPlot", "sPlot demo", 400, 600);
cdata->Divide(1, 3);
// get what we need out of the workspace
RooAbsPdf *model = ws.pdf("model");
RooAbsPdf *zModel = ws.pdf("zModel");
RooAbsPdf *qcdModel = ws.pdf("qcdModel");
RooRealVar *isolation = ws.var("isolation");
RooRealVar *invMass = ws.var("invMass");
// note, we get the dataset with sWeights
auto& data = static_cast<RooDataSet&>(*ws.data("dataWithSWeights"));
// create weighted data sets
RooDataSet dataw_z{data.GetName(), data.GetTitle(), &data, *data.get(), nullptr, "zYield_sw"};
RooDataSet dataw_qcd{data.GetName(), data.GetTitle(), &data, *data.get(), nullptr, "qcdYield_sw"};
// this shouldn't be necessary, need to fix something with workspace
// do this to set parameters back to their fitted values.
// model->fitTo(data, Extended());
// plot invMass for data with full model and individual components overlaid
// TCanvas* cdata = new TCanvas();
cdata->cd(1);
RooPlot *frame = invMass->frame(Title("Fit of model to discriminating variable"));
data.plotOn(frame);
model->plotOn(frame, Name("FullModel"));
model->plotOn(frame, Components(*zModel), LineStyle(kDashed), LineColor(kRed), Name("ZModel"));
model->plotOn(frame, Components(*qcdModel), LineStyle(kDashed), LineColor(kGreen), Name("QCDModel"));
TLegend leg(0.11, 0.5, 0.5, 0.8);
leg.AddEntry(frame->findObject("FullModel"), "Full model", "L");
leg.AddEntry(frame->findObject("ZModel"), "Z model", "L");
leg.AddEntry(frame->findObject("QCDModel"), "QCD model", "L");
leg.SetBorderSize(0);
leg.SetFillStyle(0);
frame->Draw();
leg.DrawClone();
// Now use the sWeights to show the isolation distribution for Z and QCD.
// The SPlot class could make this step easier, but here we demonstrate in
// more detail how the sWeights are used.
// Plot isolation for Z component.
// Do this by plotting all events weighted by the sWeight for the Z component.
// The SPlot class adds a new variable that has the name of the corresponding
// yield + "_sw".
cdata->cd(2);
RooPlot *frame2 = isolation->frame(Title("Isolation distribution with s weights to project out Z"));
// Since the data are weighted, we use SumW2 to compute the errors.
dataw_z.plotOn(frame2, DataError(RooAbsData::SumW2));
zModel->plotOn(frame2, LineStyle(kDashed), LineColor(kRed));
frame2->Draw();
// Plot isolation for QCD component.
// I.e., plot all events weighted by the sWeight for the QCD component.
// The SPlot class adds a new variable that has the name of the corresponding
// yield + "_sw".
cdata->cd(3);
RooPlot *frame3 = isolation->frame(Title("Isolation distribution with s weights to project out QCD"));
dataw_qcd.plotOn(frame3, DataError(RooAbsData::SumW2));
qcdModel->plotOn(frame3, LineStyle(kDashed), LineColor(kGreen));
frame3->Draw();
// cdata->SaveAs("SPlot.gif");
}
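The sWeighted isolation plots are weighted histograms whose error bars come from the sum of squared weights, which is what DataError(RooAbsData::SumW2) requests. The following is a minimal sketch of that bookkeeping in standard C++. The WeightedHist type is hypothetical and is not how RooPlot or RooDataSet store their contents.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct WeightedHist {
   double lo, hi;
   std::vector<double> sumW;  // bin content: sum of weights
   std::vector<double> sumW2; // sum of squared weights, used for the error

   WeightedHist(std::size_t nBins, double lo_, double hi_)
      : lo(lo_), hi(hi_), sumW(nBins, 0.0), sumW2(nBins, 0.0)
   {
      assert(nBins > 0 && lo < hi);
   }

   // Add one event with value x (e.g. isolation) and weight w (e.g. its sWeight).
   void fill(double x, double w)
   {
      if (x < lo || x >= hi) return; // ignore out-of-range entries
      const double nBins = static_cast<double>(sumW.size());
      auto bin = static_cast<std::size_t>((x - lo) / (hi - lo) * nBins);
      bin = std::min(bin, sumW.size() - 1); // guard against rounding at the upper edge
      sumW[bin] += w;
      sumW2[bin] += w * w;
   }

   double content(std::size_t bin) const { return sumW.at(bin); }
   double error(std::size_t bin) const { return std::sqrt(sumW2.at(bin)); }
};
```

Filling one histogram with the zYield_sw weights and another with the qcdYield_sw weights gives the two unfolded isolation distributions. Negative sWeights reduce a bin's content but still increase its error, because the error depends only on the squared weights.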
Create a workspace to manage the project.
RooWorkspace wspace{"myWS"};
Add the Z and QCD models to the workspace. Inside this function you will find a description of our model.
AddModel(wspace);
make z model
[#0] WARNING:InputArguments -- The parameter 'sigmaZ' with range [0, 10] of the RooGaussian 'mZModel' exceeds the safe range of (0, inf). Advise to limit its range.
make qcd model
make full model
import model
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooAddPdf::model
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooProdPdf::zModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooGaussian::mZModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::invMass
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::mZ
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::sigmaZ
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooExponential::zIsolationModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::isolation
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooConstVar::zIsolDecayConst
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::zYield
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooProdPdf::qcdModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooExponential::qcdMassModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::qcdMassDecayConst
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooExponential::qcdIsolationModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooConstVar::qcdIsolDecayConst
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::qcdYield
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooAddPdf::massModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) using existing copy of RooGaussian::mZModel for import of RooAddPdf::massModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) using existing copy of RooRealVar::invMass for import of RooAddPdf::massModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) using existing copy of RooRealVar::mZ for import of RooAddPdf::massModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) using existing copy of RooRealVar::sigmaZ for import of RooAddPdf::massModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) using existing copy of RooRealVar::zYield for import of RooAddPdf::massModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) using existing copy of RooExponential::qcdMassModel for import of RooAddPdf::massModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) using existing copy of RooRealVar::qcdMassDecayConst for import of RooAddPdf::massModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) using existing copy of RooRealVar::qcdYield for import of RooAddPdf::massModel
Add some toy data to the workspace.
AddData(wspace);
make data set and import to workspace
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing dataset modelData
[#1] INFO:ObjectHandling -- RooWorkSpace::import(myWS) changing name of dataset from modelData to data
Inspect the workspace if you wish: wspace.Print();
Do sPlot. This will make a new dataset with sWeights added for every event.
DoSPlot(wspace);
Calculate sWeights
------------------------------------------
The dataset before creating sWeights:
RooDataSet::data[invMass,isolation] = 1500 entries
The dataset after creating sWeights:
RooDataSet::data[invMass,isolation,zYield_sw,L_zYield,qcdYield_sw,L_qcdYield] = 1500 entries
------------------------------------------
Check SWeights:
Yield of Z is 543.477. From sWeights it is 543.477
Yield of QCD is 956.917. From sWeights it is 956.917
z Weight for event 0 -0.0490677 qcd Weight 1.04942 Total Weight 1.00036
z Weight for event 1 -0.0490677 qcd Weight 1.04942 Total Weight 1.00036
z Weight for event 2 1.0038 qcd Weight -0.00368157 Total Weight 1.00012
z Weight for event 3 0.950485 qcd Weight 0.0496438 Total Weight 1.00013
z Weight for event 4 -0.0490677 qcd Weight 1.04942 Total Weight 1.00036
z Weight for event 5 0.994479 qcd Weight 0.00564054 Total Weight 1.00012
z Weight for event 6 -0.0490677 qcd Weight 1.04942 Total Weight 1.00036
z Weight for event 7 -0.0490677 qcd Weight 1.04942 Total Weight 1.00036
z Weight for event 8 1.04017 qcd Weight -0.0400605 Total Weight 1.00011
z Weight for event 9 1.04125 qcd Weight -0.0411384 Total Weight 1.00011
import new dataset with sWeights
Make some plots showing the discriminating variable and the control variable after unfolding.
MakePlots(wspace);
make plots
[#1] INFO:Plotting -- RooAbsReal::plotOn(model) plot on invMass integrates over variables (isolation)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) directly selected PDF components: (zModel)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) indirectly selected PDF components: (mZModel,zIsolationModel)
[#1] INFO:Plotting -- RooAbsReal::plotOn(model) plot on invMass integrates over variables (isolation)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) directly selected PDF components: (qcdModel)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) indirectly selected PDF components: (qcdMassModel,qcdIsolationModel)
[#1] INFO:Plotting -- RooAbsReal::plotOn(model) plot on invMass integrates over variables (isolation)
[#1] INFO:Plotting -- RooAbsReal::plotOn(zModel) plot on isolation integrates over variables (invMass)
[#1] INFO:Plotting -- RooAbsReal::plotOn(qcdModel) plot on isolation integrates over variables (invMass)
Draw all canvases
%jsroot on
gROOT->GetListOfCanvases()->Draw()