`order`: An introduction¶

This example introduces the most important classes of order and shows how they relate to one another to describe your analysis and its external data. We set up a simple but scalable example analysis that touches most of the API. For more information, see the full API documentation.

Classes and Relations¶

Name	Purpose
`Analysis`	Represents the central object of a physics analysis.
`Campaign`	Provides data of a well-defined range of data-taking, detector alignment, MC settings, datasets, etc.
`Config`	Holds analysis information related to a campaign instance (most configuration happens here!).
`Dataset`	Definition of a dataset, produced for / measured in a campaign.
`Process`	Physics process with cross sections for multiple center-of-mass energies, labels, etc.
`Channel`	Analysis channel, often defined by a particular decay resulting in distinct final state objects.
`Category`	Category definition, (optionally) within the phase-space of an analysis channel.
`Variable`	Generic variable description providing expression and selection statements, titles, binning, etc.
`Shift`	Represents a systematic shift with a name, direction and type.

Relations between these classes are the glue that holds an analysis together. If you have ever performed a HEP analysis, they might look pretty familiar to you.

`Analysis` ↔ `Campaign` ↔ `Config`¶

An analysis is not limited to a single campaign (e.g. for combining results across several data-taking periods or even experiments).
A campaign is independent of the analyses it is used in. In general, it could be defined externally / centrally.
An analysis stores campaign-related data in config objects.
An analysis can store multiple config objects that are related to the same campaign.

`Campaign`, `Config` ↔ `Dataset`¶

A campaign can contain all datasets that were recorded / produced for its era and settings.
A config contains a subset of its campaign's datasets, depending on what is required in its analysis.
A dataset belongs to a campaign and since a config is distinctly assigned to a campaign, a dataset is also related to a config.

`Dataset` ↔ `Process`¶

A dataset contains physics processes.
A process can be contained in multiple datasets.
Processes can have child and parent processes.

`Channel` ↔ `Category`¶

A category describes a sub-phase-space of a channel. It therefore belongs to a channel, and channels have categories.
Channels can have child channels and a parent channel.
Categories can have child and parent categories.
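The parent / child relation between channels and categories can be pictured as a tree. The following is a minimal sketch with a hypothetical `Node` class (an assumption for illustration, not part of order) that shows how a recursive lookup could find a nested category by name. The lookup logic in order itself may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a channel or category; not the order API."""

    name: str
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str) -> "Node":
        # create a child node, attach it and return it
        child = Node(name)
        self.children.append(child)
        return child

    def find(self, name: str, deep: bool = False) -> Optional["Node"]:
        # look at direct children first
        for child in self.children:
            if child.name == name:
                return child
        # optionally descend into the children of children
        if deep:
            for child in self.children:
                found = child.find(name, deep=True)
                if found is not None:
                    return found
        return None


channel = Node("ttH_bb")
ge6j = channel.add("ge6j")
ge6j.add("ge6j_ge4b")

shallow = channel.find("ge6j_ge4b")            # None, not a direct child
nested = channel.find("ge6j_ge4b", deep=True)  # found two levels down
```

A shallow lookup only sees direct children, which is why a nested category needs the recursive variant.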

`Config` ↔ `Channel`, `Variable`, `Shift`¶

A config has channels.
A config has variables.
A config has shifts.

Example Analysis¶

In this example, we define a toy $t\bar{t}H$ analysis with a signal dataset, a $t\bar{t}$ background and real data.

Imports¶

In [1]:

import order as od
import scinum as sn

General, Analysis-unrelated Setup¶

Define a campaign and its datasets, and link processes to the datasets. This could be done externally or even by importing a centrally maintained repository.

In [2]:

# campaign
c_2017 = od.Campaign("2017_13Tev_25ns", 1, ecm=13, bx=25)

# processes
p_data = od.Process("data", 1,
    is_data=True,
    label="data",
)
p_ttH = od.Process("ttH", 2,
    label=r"$t\bar{t}H$",
    xsecs={
        13: sn.Number(0.5071, {"scale": (sn.Number.REL, 0.058, 0.092)}),
    },
)
p_tt = od.Process("tt", 3,
    label=r"$t\bar{t}$",
    xsecs={
        13: sn.Number(831.76, {"scale": (19.77, 29.20)}),
    },
)

# datasets
d_data = od.Dataset("data", 1,
    campaign=c_2017,
    is_data=True,
    n_files=100,
    n_events=200000,
    keys=["/data/2017.../AOD"],
)
d_ttH = od.Dataset("ttH", 2,
    campaign=c_2017,
    n_files=50,
    n_events=100000,
    keys=["/ttH_powheg.../.../AOD"],
)
d_tt = od.Dataset("tt", 3,
    campaign=c_2017,
    n_files=500,
    n_events=87654321,
    keys=["/tt_powheg.../.../AOD"],
)
d_WW = od.Dataset("WW", 4,
    campaign=c_2017,
    n_files=100,
    n_events=54321,
    keys=["/WW_madgraph.../.../AOD"],
)

# link processes to datasets
d_data.add_process(p_data)
d_ttH.add_process(p_ttH)
d_tt.add_process(p_tt)
print([len(d.processes) for d in [d_data, d_ttH, d_tt]])

[1, 1, 1]

Task: Get the cross section of the process in the ttH dataset at the energy of its campaign.

In [3]:

d_ttH.get_process("ttH").get_xsec(d_ttH.campaign.ecm)

Out[3]:

$0.5071\;^{+0.0294118}_{-0.0466532}\;\left(\text{scale}\right)$
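The absolute uncertainties in this output follow from the relative "scale" uncertainties given in the process definition. As a quick cross-check in plain Python (no scinum involved, and assuming the tuple is ordered as up, down, which the output suggests):

```python
nominal = 0.5071                  # ttH cross section value at ecm = 13
rel_up, rel_down = 0.058, 0.092   # relative "scale" uncertainties (up, down)

# relative uncertainties scale with the nominal value
abs_up = nominal * rel_up
abs_down = nominal * rel_down

print(f"+{abs_up:.7f} / -{abs_down:.7f}")  # +0.0294118 / -0.0466532
```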

Analysis Setup¶

Now, define the analysis object and create a config for the 2017_13Tev_25ns campaign:

In [4]:

ana = od.Analysis("ttH", 1)

# create a config by passing the campaign, so id and name will be identical
cfg = ana.add_config(c_2017)

Add processes we're interested in and datasets that we want to use:

In [5]:

# add processes manually
cfg.add_process(p_data)
cfg.add_process(p_ttH)
cfg.add_process(p_tt)

# add datasets in a loop
for name in ["data", "ttH", "tt"]:
    cfg.add_dataset(c_2017.get_dataset(name))

Task: Get the mean number of events per file in the ttH dataset.

In [6]:

cfg.get_dataset("ttH").n_events / float(cfg.get_dataset("ttH").n_files)

Out[6]:

2000.0

Define channels and categories:

In [7]:

ch_bb = cfg.add_channel("ttH_bb", 1)
cat_5j = ch_bb.add_category("eq5j",
    label="5 jets",
    selection="n_jets == 5",
)
cat_6j = ch_bb.add_category("ge6j",
    label=r"$\geq$ 6 jets",
    selection="n_jets >= 6",
)

# divide the 6j category further
cat_6j_3b = cat_6j.add_category("ge6j_eq3b",
    label=r"$\geq$ 6 jets, 3 b-tags",
    selection="n_jets >= 6 && n_btags == 3",
)
cat_6j_4b = cat_6j.add_category("ge6j_ge4b",
    label=r"$\geq$ 6 jets, $\geq$ 4 b-tags",
    selection="n_jets >= 6 && n_btags >= 4",
)

Task: Get the ROOT-latex label of the 6j4b category by using only the config.

In [8]:

cfg.get_channel("ttH_bb").get_category("ge6j_ge4b", deep=True).label_root

Out[8]:

'#geq 6 jets, #geq 4 b-tags'
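Comparing the label defined above with this output suggests what the ROOT-latex conversion does: math-mode delimiters are dropped and the backslash is replaced by `#`. A toy version of that idea is sketched below. The function `to_root_label` is a hypothetical name, and the conversion in order may handle more cases than these two rules.

```python
def to_root_label(label: str) -> str:
    """Toy LaTeX to ROOT-latex conversion; not order's implementation."""
    # drop math-mode delimiters
    label = label.replace("$", "")
    # ROOT introduces commands with "#" instead of a backslash
    return label.replace("\\", "#")


root_label = to_root_label(r"$\geq$ 6 jets, $\geq$ 4 b-tags")
print(root_label)  # #geq 6 jets, #geq 4 b-tags
```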

Systematic shifts we're going to study:

In [9]:

cfg.add_shift("nominal", 1)
cfg.add_shift("lumi_up", 2, type="rate")
cfg.add_shift("lumi_down", 3, type="rate")
cfg.add_shift("scale_up", 4, type="shape")
cfg.add_shift("scale_down", 5, type="shape")
print(len(cfg.shifts))

Task: Determine all shift objects that share the source of the scale_down shift.

In [10]:

shifts = [s for s in cfg.shifts if s.source == "scale"]
print(shifts)

[<Shift at 0x10b8d2050, name=scale_up, id=4, context=shift>, <Shift at 0x10b8c7b50, name=scale_down, id=5, context=shift>]
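The result suggests that the source of a shift is the part of its name before the trailing direction. The sketch below mimics this with a hypothetical helper `split_shift_name` in plain Python. It is an assumption about the naming convention, not the order API, and it derives the source from the `scale_down` name instead of hard-coding it.

```python
def split_shift_name(name: str) -> tuple:
    """Hypothetical helper: split "<source>_<direction>" into its parts."""
    if name == "nominal":
        # assumption: the nominal shift has no separate direction
        return ("nominal", "nominal")
    source, direction = name.rsplit("_", 1)
    if direction not in ("up", "down"):
        raise ValueError(f"invalid shift direction in {name!r}")
    return (source, direction)


names = ["nominal", "lumi_up", "lumi_down", "scale_up", "scale_down"]

# derive the source from the reference shift, then select all matches
source, _ = split_shift_name("scale_down")
same_source = [n for n in names if split_shift_name(n)[0] == source]
print(same_source)  # ['scale_up', 'scale_down']
```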

Add some variables that we want to project via ROOT trees (or numpy arrays / pandas dataframes with numexpr).

In [11]:

cfg.add_variable("jet1_pt",
    expression="Reco__jet1__pt",
    binning=(25, 0., 500,),
    unit="GeV",
    x_title=r"Leading jet $p_{T}$",
)
cfg.add_variable("jet1_px",
    expression="Reco__jet1__pt * cos(Reco__jet1__phi)",
    binning=(25, 0., 500,),
    unit="GeV",
    x_title=r"Leading jet $p_{x}$",
)
print(len(cfg.variables))

Task: Get the full ROOT histogram title (i.e. + axis labels) of the jet1_px variable.

In [12]:

cfg.get_variable("jet1_px").get_full_title(root=True)

Out[12]:

'jet1_px;Leading jet p_{x} / GeV;Entries / 20.0 GeV'
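The bin width in the y-axis label follows from the binning tuple: 25 bins between 0 and 500 give a width of 20.0. A toy reconstruction of the title string is sketched below, assuming the binning is ordered as (number of bins, minimum, maximum). The function `full_root_title` is a hypothetical name and not how order builds the title internally.

```python
def full_root_title(name, x_title, unit, binning):
    """Toy reconstruction of the title string; not order's implementation."""
    n_bins, x_min, x_max = binning
    bin_width = (x_max - x_min) / n_bins
    # ROOT separates histogram, x-axis and y-axis titles by semicolons
    return f"{name};{x_title} / {unit};Entries / {bin_width} {unit}"


title = full_root_title("jet1_px", "Leading jet p_{x}", "GeV", (25, 0., 500))
print(title)  # jet1_px;Leading jet p_{x} / GeV;Entries / 20.0 GeV
```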

Add "soft" information as auxiliary data.

In [13]:

cfg.set_aux("lumi", 40.)
cfg.set_aux(("globalTag", "data"), "80X_dataRun2...")
cfg.set_aux(("globalTag", "mc"), "80X_mcRun2...")
print(len(cfg.aux))

Task: Get the MC global tag.

In [14]:

print(cfg.get_aux(("globalTag", "mc")))

80X_mcRun2...
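The tuple keys used above work like ordinary dictionary keys. As a minimal sketch, assuming the auxiliary data behaves like a plain mapping (the `aux` dict below is a stand-in, not the order object):

```python
aux = {}
aux["lumi"] = 40.
# tuples are hashable, so they can group related entries under one prefix
aux[("globalTag", "data")] = "80X_dataRun2..."
aux[("globalTag", "mc")] = "80X_mcRun2..."

n_entries = len(aux)               # three separate entries
mc_tag = aux[("globalTag", "mc")]

# collect all entries that share the "globalTag" prefix
global_tags = {k[1]: v for k, v in aux.items()
               if isinstance(k, tuple) and k[0] == "globalTag"}
```

Each tuple is a single flat key, so the two global tags are stored as independent entries rather than as a nested dictionary.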

Now, we can start to use the analysis objects in a "framework" ...
