order: An introduction

In this example, we get to know the most important classes of order and how they relate to each other to describe your analysis and all external data. We will set up a simple but scalable example analysis that touches most of the API. For more information, see the full API documentation.
Name | Purpose |
---|---|
Analysis | Represents the central object of a physics analysis. |
Campaign | Provides data of a well-defined range of data-taking, detector alignment, MC settings, datasets, etc. |
Config | Holds analysis information related to a campaign instance (most configuration happens here!). |
Dataset | Definition of a dataset, produced for / measured in a campaign. |
Process | Physics process with cross sections for multiple center-of-mass energies, labels, etc. |
Channel | Analysis channel, often defined by a particular decay resulting in distinct final state objects. |
Category | Category definition, (optionally) within the phase-space of an analysis channel. |
Variable | Generic variable description providing expression and selection statements, titles, binning, etc. |
Shift | Represents a systematic shift with a name, direction and type. |
Relations between these classes are the glue that holds an analysis together. If you have ever performed a HEP analysis, they might look pretty familiar to you:
Analysis ↔ Campaign ↔ Config
Campaign, Config ↔ Dataset
Dataset ↔ Process
Channel ↔ Category
Config ↔ Channel, Variable, Shift

In this example, we define a toy $t\bar{t}H$ analysis with a signal dataset, a $t\bar{t}$ background and real data.
import order as od
import scinum as sn
Define a campaign and its datasets, and link processes to them. This could also be done externally, or even by importing a centrally maintained repository.
# campaign
c_2017 = od.Campaign("2017_13Tev_25ns", 1, ecm=13, bx=25)
# processes
p_data = od.Process("data", 1,
    is_data=True,
    label="data",
)
p_ttH = od.Process("ttH", 2,
    label=r"$t\bar{t}H$",
    xsecs={
        13: sn.Number(0.5071, {"scale": (sn.Number.REL, 0.058, 0.092)}),
    },
)
p_tt = od.Process("tt", 3,
    label=r"$t\bar{t}$",
    xsecs={
        13: sn.Number(831.76, {"scale": (19.77, 29.20)}),
    },
)
# datasets
d_data = od.Dataset("data", 1,
    campaign=c_2017,
    is_data=True,
    n_files=100,
    n_events=200000,
    keys=["/data/2017.../AOD"],
)
d_ttH = od.Dataset("ttH", 2,
    campaign=c_2017,
    n_files=50,
    n_events=100000,
    keys=["/ttH_powheg.../.../AOD"],
)
d_tt = od.Dataset("tt", 3,
    campaign=c_2017,
    n_files=500,
    n_events=87654321,
    keys=["/tt_powheg.../.../AOD"],
)
d_WW = od.Dataset("WW", 4,
    campaign=c_2017,
    n_files=100,
    n_events=54321,
    keys=["/WW_madgraph.../.../AOD"],
)
# link processes to datasets
d_data.add_process(p_data)
d_ttH.add_process(p_ttH)
d_tt.add_process(p_tt)
print([len(d.processes) for d in [d_data, d_ttH, d_tt]])
[1, 1, 1]
Task: Get the cross section of the process in the ttH dataset at the energy of its campaign.
d_ttH.get_process("ttH").get_xsec(d_ttH.campaign.ecm)
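The ttH cross section above stores its scale uncertainty as relative values (marked by sn.Number.REL), while the tt entry uses absolute ones. As a rough illustration of what the relative form means, the following standalone sketch converts it to absolute values in plain Python. It does not use scinum; the helper name `rel_to_abs` is hypothetical, and the tuple is assumed to be ordered (up, down).

```python
def rel_to_abs(nominal, rel_up, rel_down):
    """Convert relative uncertainties into absolute ones (hypothetical helper)."""
    return nominal * rel_up, nominal * rel_down


# values copied from the ttH process definition above
xsec_ttH = 0.5071
# assumption: the uncertainty tuple is ordered (up, down)
abs_up, abs_down = rel_to_abs(xsec_ttH, 0.058, 0.092)

print(f"{xsec_ttH} +{abs_up:.4f} -{abs_down:.4f}")
```

The absolute values come out in the same unit as the nominal cross section.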
Now, define the analysis object and create a config for the 2017_13Tev_25ns campaign:
ana = od.Analysis("ttH", 1)
# create a config by passing the campaign, so id and name will be identical
cfg = ana.add_config(c_2017)
# add processes manually
cfg.add_process(p_data)
cfg.add_process(p_ttH)
cfg.add_process(p_tt)
# add datasets in a loop
for name in ["data", "ttH", "tt"]:
    cfg.add_dataset(c_2017.get_dataset(name))
Task: Get the mean number of events per file in the ttH dataset.
cfg.get_dataset("ttH").n_events / float(cfg.get_dataset("ttH").n_files)
2000.0
ch_bb = cfg.add_channel("ttH_bb", 1)
cat_5j = ch_bb.add_category("eq5j",
    label="5 jets",
    selection="n_jets == 5",
)
cat_6j = ch_bb.add_category("ge6j",
    label=r"$\geq$ 6 jets",
    selection="n_jets >= 6",
)
# divide the 6j category further
cat_6j_3b = cat_6j.add_category("ge6j_eq3b",
    label=r"$\geq$ 6 jets, 3 b-tags",
    selection="n_jets >= 6 && n_btags == 3",
)
cat_6j_4b = cat_6j.add_category("ge6j_ge4b",
    label=r"$\geq$ 6 jets, $\geq$ 4 b-tags",
    selection="n_jets >= 6 && n_btags >= 4",
)
Task: Get the ROOT-latex label of the 6j4b category by using only the config.
cfg.get_channel("ttH_bb").get_category("ge6j_ge4b", deep=True).label_root
'#geq 6 jets, #geq 4 b-tags'
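To make the relation between the LaTeX label and its ROOT-latex form more tangible, here is a minimal standalone sketch of such a conversion. It is not how order implements label_root; the helper `to_root_latex` is hypothetical and only covers the simple case seen here, where math-mode delimiters are dropped and backslash commands become `#` commands.

```python
def to_root_latex(label):
    """Rough LaTeX -> ROOT-latex conversion (hypothetical helper, not order's code)."""
    # drop the "$" math-mode delimiters and turn "\cmd" into ROOT's "#cmd" form
    return label.replace("$", "").replace("\\", "#")


# label copied from the ge6j_ge4b category above
root_label = to_root_latex(r"$\geq$ 6 jets, $\geq$ 4 b-tags")
print(root_label)
```

For the label of the ge6j_ge4b category, this reproduces the string shown above.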
cfg.add_shift("nominal", 1)
cfg.add_shift("lumi_up", 2, type="rate")
cfg.add_shift("lumi_down", 3, type="rate")
cfg.add_shift("scale_up", 4, type="shape")
cfg.add_shift("scale_down", 5, type="shape")
print(len(cfg.shifts))
5
Task: Starting from the source of the scale_down shift, determine all shift objects with that source.
shifts = [s for s in cfg.shifts if s.source == "scale"]
print(shifts)
[<Shift at 0x10b8d2050, name=scale_up, id=4, context=shift>, <Shift at 0x10b8c7b50, name=scale_down, id=5, context=shift>]
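The shift names used here follow a "source_direction" pattern. The following plain-Python sketch shows how a name could be split into these two parts, and how the source can be derived from scale_down rather than written out by hand. The helper `split_shift_name` is hypothetical, and treating "nominal" as both its own source and direction is an assumption of this sketch.

```python
def split_shift_name(name):
    """Split a shift name into (source, direction); hypothetical helper."""
    # assumption: the nominal shift acts as its own source and direction
    if name == "nominal":
        return "nominal", "nominal"
    # split at the last underscore so that sources may contain underscores
    source, _, direction = name.rpartition("_")
    if not source or direction not in ("up", "down"):
        raise ValueError(f"invalid shift name: {name!r}")
    return source, direction


# shift names as added to the config above
names = ["nominal", "lumi_up", "lumi_down", "scale_up", "scale_down"]

# derive the source from the scale_down shift, then collect all matching names
target_source = split_shift_name("scale_down")[0]
matching = [n for n in names if split_shift_name(n)[0] == target_source]
print(matching)
```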
cfg.add_variable("jet1_pt",
    expression="Reco__jet1__pt",
    binning=(25, 0., 500,),
    unit="GeV",
    x_title=r"Leading jet $p_{T}$",
)
cfg.add_variable("jet1_px",
    expression="Reco__jet1__pt * cos(Reco__jet1__phi)",
    binning=(25, 0., 500,),
    unit="GeV",
    x_title=r"Leading jet $p_{x}$",
)
print(len(cfg.variables))
2
Task: Get the full ROOT histogram title (i.e. including axis labels) of the jet1_px variable.
cfg.get_variable("jet1_px").get_full_title(root=True)
'jet1_px;Leading jet p_{x} / GeV;Entries / 20.0 GeV'
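The "20.0 GeV" in the y-axis label follows from the binning tuple: 25 bins between 0 and 500 give a bin width of 20. The standalone sketch below assembles such a title by hand to make this explicit. It is only an illustration under the assumption that the binning tuple means (number of bins, minimum, maximum); the helper `full_title` is hypothetical and not part of order.

```python
def full_title(name, x_title, unit, binning):
    """Assemble a 'title;x label;y label' string (hypothetical helper)."""
    # assumption: binning is (number of bins, lower edge, upper edge)
    n_bins, x_min, x_max = binning
    bin_width = (x_max - x_min) / float(n_bins)
    return f"{name};{x_title} / {unit};Entries / {bin_width} {unit}"


# values copied from the jet1_px variable above, with the x title in ROOT form
title = full_title("jet1_px", "Leading jet p_{x}", "GeV", (25, 0., 500))
print(title)
```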
cfg.set_aux("lumi", 40.)
cfg.set_aux(("globalTag", "data"), "80X_dataRun2...")
cfg.set_aux(("globalTag", "mc"), "80X_mcRun2...")
print(len(cfg.aux))
3
Task: Get the MC global tag.
print(cfg.get_aux(("globalTag", "mc")))
80X_mcRun2...
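The tuple keys used for the global tags can be pictured with a plain Python dictionary. This is only a sketch under the assumption that the auxiliary data behaves like a dict with hashable keys; it does not use order, and the variable names are made up for illustration.

```python
# plain dict standing in for the auxiliary data (assumption: dict-like behavior)
aux = {}
aux["lumi"] = 40.
aux[("globalTag", "data")] = "80X_dataRun2..."
aux[("globalTag", "mc")] = "80X_mcRun2..."

# each tuple is one key of its own, so three entries are stored
print(len(aux))

# collect all entries whose key is a tuple starting with "globalTag"
global_tags = {
    key[1]: value
    for key, value in aux.items()
    if isinstance(key, tuple) and key[0] == "globalTag"
}
print(global_tags["mc"])
```

Grouping related values under a common first tuple element keeps them easy to select together, as the comprehension shows.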