Z to leptons analysis with ATLAS open data and ADL/CutLang

This exercise walks through a simple analysis of the Z -> 2 lepton final state. It explores the kinematics of Z -> e+e- and Z -> μ+μ- events.

The analysis runs on ATLAS Run 1 open data Monte Carlo (MC) ntuples. The first code cell below retrieves one such ntuple.

The analysis consists of two parts:

Applying an event selection to the input events and filling distributions. This part is written in a domain-specific language called ADL and executed by CutLang, software that reads and processes ADL.
Drawing the plots produced by the previous step. This part is performed using ROOT (with Python syntax). ROOT is the main analysis software used at CERN.

In [ ]:

!wget --progress=dot:giga http://opendata.atlas.cern/release/samples/MC/mc_105987.WZ.root
// Get the ROOT file containing the Z -> eemumu background events

What are ADL and CutLang?

(More information at cern.ch/adl)

LHC data analyses are usually performed with complex analysis frameworks written in general-purpose languages such as C++ and Python. This approach has a steep learning curve: even the simplest tasks can end up coded in a complicated way, and the code is hard to understand, change or extend. An emerging alternative decouples the physics content from the technical code and lets analyses be written in a simple, self-describing syntax. Analysis Description Language (ADL) is a HEP-specific analysis language developed for this purpose.

A HEP analysis includes 3 main parts:

Object definitions: Which objects are used, e.g. electrons, muons, jets? Which selections are applied to them?
Event variable definitions: Are event-wide variables used, such as an invariant mass or a transverse mass? How are they calculated?
Event selections: Which selections are applied to events, for example to enhance the signal and reduce the backgrounds? Is there more than one event selection? How are they defined?
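
To make the three parts concrete, here is a minimal sketch of the same structure in plain Python, run on a hand-made toy event. The `Lepton` record, the function names and the toy values are assumptions made for illustration; they are not part of ADL, CutLang or the ATLAS ntuple format. Leptons are treated as massless, and the cut values mirror the ones used later in this exercise.

```python
import math
from collections import namedtuple

# Hypothetical lepton record: pT in GeV, charge in units of e,
# PDG ID (+-11 for electrons, +-13 for muons). Not the ATLAS ntuple format.
Lepton = namedtuple("Lepton", "pt eta phi charge pdgid")


def good_leptons(leptons, pt_min=25.0, eta_max=2.5):
    """Object definition: keep leptons passing the pT and |eta| cuts."""
    return [lep for lep in leptons if lep.pt > pt_min and abs(lep.eta) < eta_max]


def invariant_mass(l1, l2):
    """Event variable: invariant mass of two leptons, treated as massless."""
    m2 = 2.0 * l1.pt * l2.pt * (
        math.cosh(l1.eta - l2.eta) - math.cos(l1.phi - l2.phi)
    )
    return math.sqrt(max(m2, 0.0))


def passes_selection(leptons, m_z=91.18, window=20.0):
    """Event selection: two good, opposite-charge, same-flavor leptons near m_Z."""
    good = good_leptons(leptons)
    if len(good) != 2:
        return False
    first, second = good
    if first.charge * second.charge != -1:
        return False
    if first.pdgid + second.pdgid != 0:
        return False
    return abs(invariant_mass(first, second) - m_z) < window


# Toy event: two back-to-back muons, each with pT = 45.59 GeV at eta = 0
mu_minus = Lepton(pt=45.59, eta=0.0, phi=0.0, charge=-1, pdgid=13)
mu_plus = Lepton(pt=45.59, eta=0.0, phi=math.pi, charge=+1, pdgid=-13)

print(round(invariant_mass(mu_minus, mu_plus), 2))
print(passes_selection([mu_minus, mu_plus]))
```

Each function corresponds to one of the three parts. In ADL the same content is written declaratively, without the surrounding loops and bookkeeping.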

ADL consists of blocks that keep object, variable and event selection definitions clearly separated. Blocks have a keyword-expression structure, in which keywords specify analysis concepts and operations. The syntax includes mathematical and logical operations, comparison and optimization operators, reducers, 4-vector algebra and HEP-specific functions (dφ, dR, …).

ADL is designed to be self-describing, so for simple cases like this example one does not need to read the syntax rules to understand an ADL description. If you are interested, however, the set of syntax rules can be found here.

Once an analysis is written, it needs to be run on events. This is done by CutLang, the runtime interpreter that reads and understands the ADL syntax and runs it on events. CutLang is also a framework that automatically handles many tedious tasks such as reading input events and writing output histograms. CutLang can be run in various environments such as Linux, macOS, conda, Docker and Jupyter.

To learn more about CutLang, please see the CutLang GitHub repository.

Writing the analysis with ADL and running with CutLang

Writing the analysis with ADL: In the following cell, part of the analysis is written using the ADL syntax, but some parts are missing. Please follow the instructions in the comments to complete them. If you feel adventurous, you can modify the object or event selections, or add new variables or histograms.

Running the analysis with CutLang: Executing the cell will run the analysis on the events in the input file. The run parameters are given in the first line of the cell:

file : input root file
filetype : input event format (do not change!)
adlfile : the name we use for labeling the analysis
events : number of events used from each file
verbose : frequency of processed event numbers written in output text

NOTE: When running Jupyter/Binder via a direct link, if your run does not complete due to memory issues, please reduce the number of events via the "events" parameter.
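
The run parameters are plain `key=value` pairs separated by spaces. The sketch below only illustrates that structure by splitting the line used in this exercise into a dictionary; `parse_run_line` is a hypothetical helper and is not how CutLang itself reads the line.

```python
def parse_run_line(line):
    """Split a '%%cutlang key=value ...' line into a dict of strings."""
    tokens = line.split()
    if not tokens or tokens[0] != "%%cutlang":
        raise ValueError("expected a line starting with %%cutlang")
    # Split each token at the first '=' only, so values may contain '='
    return dict(token.split("=", 1) for token in tokens[1:])


params = parse_run_line(
    "%%cutlang file=mc_105987.WZ.root filetype=ATLASOD "
    "adlfile=ZtoLL events=100000 verbose=10000"
)
n_events = int(params["events"])  # values are strings until converted
print(params["file"], params["adlfile"], n_events)
```

Lowering the number after `events=` is the change the note above suggests when memory is tight.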

Analysis output: Running the analysis will produce two outputs:

Text output shown as the cell output: This includes "cutflows" for each region, i.e. the selections applied and how many events survive each of them. Histograms are also listed. You should see a separate output for each ROOT file that is run.
ROOT output: One ROOT file called histoOut-<adlfile name>-<file name>.root that includes all the histograms produced by the analysis. These ROOT files will be used in the next step.
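
A cutflow is simply a running count of the events that survive each successive selection. The following sketch shows the idea on five made-up events. The event dictionaries, their keys and the `cutflow` helper are assumptions for illustration and do not reproduce CutLang's actual output format.

```python
def cutflow(events, cuts):
    """Apply cuts in order; return [(cut name, events surviving so far)]."""
    surviving = list(events)
    table = []
    for name, passes in cuts:
        surviving = [ev for ev in surviving if passes(ev)]
        table.append((name, len(surviving)))
    return table


# Toy events with hypothetical, precomputed quantities (mll in GeV)
toy_events = [
    {"n_leptons": 2, "charge_product": -1, "mll": 90.5},
    {"n_leptons": 2, "charge_product": -1, "mll": 130.0},
    {"n_leptons": 2, "charge_product": 1, "mll": 91.0},
    {"n_leptons": 1, "charge_product": 0, "mll": 0.0},
    {"n_leptons": 3, "charge_product": -1, "mll": 88.0},
]

cuts = [
    ("ALL", lambda ev: True),
    ("exactly two leptons", lambda ev: ev["n_leptons"] == 2),
    ("opposite charge", lambda ev: ev["charge_product"] == -1),
    ("|mLL - 91.18| < 20", lambda ev: abs(ev["mll"] - 91.18) < 20),
]

table = cutflow(toy_events, cuts)
for name, n_pass in table:
    print(f"{name:25s} {n_pass}")
```

Because the cuts are applied in sequence, the counts can only stay the same or decrease from one line to the next, which is a useful sanity check when reading a real cutflow.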

In [ ]:

%%cutlang file=mc_105987.WZ.root filetype=ATLASOD adlfile=ZtoLL events=100000 verbose=10000

# ADL file for Z->ee/mumu analysis

# Object selection
# Take input electrons, labeled "ele", and obtain a set of selected electrons, "goodEle"
object goodEle
  take ele                   # start with initial electron set
  select pT(ele) > 25        # apply a cut on transverse momentum
  select abs(eta(ele)) < 2.5 # apply a cut on pseudorapidity

# Take input muons, labeled "muo", and obtain a set of selected muons, "goodMuo"
object goodMuo
  take muo                   # start with initial muon set
  select pT(muo) > 25        # apply a cut on transverse momentum
  select abs(eta(muo)) < 2.5 # apply a cut on pseudorapidity

object goodLeptons : Union (goodEle, goodMuo)

# Useful definitions
define mLL = m(goodLeptons[0] goodLeptons[1])
define elePDGid = 11    
define muoPDGid = 13

    
# Event selection

algo Zll
  select ALL                                               #cut0: count all events
  histo hneinp, "number of input electrons",    6, 0, 6, size(ele)
  histo hnesel, "number of selected electrons", 6, 0, 6, size(goodEle)
  histo hnminp, "number of input muons",        6, 0, 6, size(muo)
  histo hnmsel, "number of selected muons",     6, 0, 6, size(goodMuo)
  histo hnleps, "number of selected lepts",     6, 0, 6, size(goodLeptons)
  histo hnenminp, "number of input electrons vs muons",    6, 0, 6, 6, 0, 6, size(ele), size(muo)
  histo hnenmsel, "number of selected electrons vs muons", 6, 0, 6, 6, 0, 6, size(goodEle), size(goodMuo)      
  select Size(ele) + Size(muo) > 1                         #cut1: We just want events with at least two leptons
  select Size(goodLeptons) == 2                            #cut2: We want, in fact, exactly two good leptons
  select q(goodLeptons[0]) * q(goodLeptons[1]) == -1       #cut3: The two selected leptons must have opposite charge
  select pdgID(goodLeptons[0])+pdgID(goodLeptons[1])==0    #cut4: The two selected leptons have the same flavor
  histo hZllselbc, "Z(->LL,selected) candidate mass (GeV)", 50, 50, 150, mLL
  select abs(mLL - 91.18) < 20                             #cut5: The dilepton invariant mass must be
#    within 20 GeV of the known Z boson mass (91.18 GeV)
  histo hZllselac, "Z(->LL,selected,massWindow) candidate mass (GeV)", 50, 50, 150, mLL
  select abs(pdgID(goodLeptons[0])) == elePDGid ? hMZee,"Inv.Mass of Z (Zee)",50,50.0,150.0,mLL  : ALL
  select abs(pdgID(goodLeptons[0])) == muoPDGid ? hMZmm,"Inv.Mass of Z (Zmm)",50,50.0,150.0,mLL  : ALL

Checking the analysis output with ROOT

Now let's make some plots using the ROOT package in Python (ROOT is widely used at CERN). Instructions are given in the comments of the following cells.

What to do:

Compare some of the histograms you made:
- Electrons vs. muons
- Initial leptons vs. selected leptons
- Z candidate invariant mass before and after mass window selection
- Z candidate from selected electrons vs selected muons
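
One way to quantify the before/after comparison is the fraction of histogram entries that fall inside the mass window. The sketch below works on a plain list of bin contents with the same binning as the mass histograms in this exercise (50 bins from 50 to 150 GeV). The helper names and the flat toy contents are assumptions for illustration, and the result is a bin-center approximation of the exact cut.

```python
def bin_centers(nbins, lo, hi):
    """Centers of nbins equal-width bins covering [lo, hi]."""
    width = (hi - lo) / nbins
    return [lo + (i + 0.5) * width for i in range(nbins)]


def fraction_in_window(contents, lo, hi, center=91.18, half_width=20.0):
    """Fraction of entries whose bin center lies inside the mass window."""
    total = sum(contents)
    if total == 0:
        return 0.0
    centers = bin_centers(len(contents), lo, hi)
    inside = sum(
        count for count, x in zip(contents, centers) if abs(x - center) < half_width
    )
    return inside / total


# Toy histogram: one entry in each of the 50 bins between 50 and 150 GeV
toy_contents = [1.0] * 50
frac = fraction_in_window(toy_contents, 50.0, 150.0)
print(frac)
```

With a ROOT histogram `h`, the list of contents could be filled from `h.GetBinContent(i)` for `i` running from 1 to `h.GetNbinsX()`.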

In [ ]:

%%python
# Let's start with importing the needed modules
from ROOT import gStyle, TFile, TH1, TH1D, TH2D, TCanvas, TLegend, TColor

# Now let's set some ROOT styling parameters:
# You do not need to know what they mean, but can directly use these settings

gStyle.SetOptStat(0)
gStyle.SetPalette(1)

gStyle.SetTextFont(42)

gStyle.SetTitleStyle(0000)
gStyle.SetTitleBorderSize(0)
gStyle.SetTitleFont(42)
gStyle.SetTitleFontSize(0.055)

gStyle.SetTitleFont(42, "xyz")
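gStyle.SetTitleSize(0.05, "xyz")   # same value as the per-histogram settings used later
gStyle.SetLabelFont(42, "xyz")
gStyle.SetLabelSize(0.045, "xyz")  # same value as the per-histogram settings used later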

In [ ]:

%%python
# Let's open the output file produced by CutLang: 
# (If you changed the adlfile option when running cutlang, you will need to change the file names)
f = TFile("histoOut-ZtoLL-mc_105987.root")
# We can see what is inside the signal file:
f.ls()
# There should be a directory (TDirectoryFile) per selection algorithm also known as a region.

In [ ]:

%%python
# Let's see what is available:
f.cd("Zll")
f.ls()

In [ ]:

%%python

# Get the histograms out of the file

# lepton counts:
hneinp = f.Get("Zll/hneinp")
hnminp = f.Get("Zll/hnminp")
hnesel = f.Get("Zll/hnesel")
hnmsel = f.Get("Zll/hnmsel")
hnenminp = f.Get("Zll/hnenminp")
hnenmsel = f.Get("Zll/hnenmsel")
# Z reconstruction before cut
hZllselbc = f.Get("Zll/hZllselbc")
# Z reconstruction after cut
hZllselac = f.Get("Zll/hZllselac")
# Z from electrons only
hMZee = f.Get("Zll/hMZee")
# Z from muons only
hMZmm = f.Get("Zll/hMZmm")

In [ ]:

%%python
############ LET'S SEE 1D MULTIPLICITIES AND HOW TO MAKE NICE PLOTS

# In order to be able to make many plots, let's define two generic histograms to which we can 
# assign any of the histograms above:
h1 = hneinp
h2 = hnminp

# Now we format the histograms: lines, colors, axes titles, etc..  
# You do not need to learn the commands here unless you are really curious.
# Otherwise just execute the cell.

# Color numbers can be retrieved from https://root.cern.ch/doc/master/classTColor.html
# (check for color wheel)
h1.SetLineColor(600) # kBlue
h2.SetLineColor(416+2) # kGreen + 2

# Make the x-axis title:
title = h1.GetTitle()
    
h1.SetTitle("")
h1.GetXaxis().SetTitle(title)
h1.GetXaxis().SetTitleOffset(1.25)
h1.GetXaxis().SetTitleSize(0.05)
h1.GetXaxis().SetLabelSize(0.045)
h1.GetXaxis().SetNdivisions(8, 5, 0)
h1.GetYaxis().SetTitle("number of events")
h1.GetYaxis().SetTitleOffset(1.4)
h1.GetYaxis().SetTitleSize(0.05)
h1.GetYaxis().SetLabelSize(0.045)

# Set the maximum of the y axis:
if h2.GetMaximum() > h1.GetMaximum():
    h1.SetMaximum(h2.GetMaximum() * 1.1)
    
# Make a generically usable legend
l = TLegend(0.65, 0.75, 0.88, 0.87)
l.SetBorderSize(0)
l.SetFillStyle(0000)
# You can change the legend titles from here based on what you are plotting
l.AddEntry(h1,h1.GetName(), "l")
l.AddEntry(h2,h2.GetName(), "l")

In [ ]:

%%python %jsroot on
############ LET'S DRAW THE 1D MULTIPLICITIES PREPARED ABOVE
c = TCanvas("c", "c", 620, 500)
c.SetBottomMargin(0.15)
c.SetLeftMargin(0.15)
c.SetRightMargin(0.15)
h1.Draw()
h2.Draw("same")
l.Draw("same")
c.Draw()
# Don't worry about the error that appears below!

In [ ]:

%%python %jsroot on
############ LET'S SEE 2D MULTIPLICITIES AND THE EFFECT OF MASS WINDOW CUT
c2 = TCanvas("c2", "c2", 620, 500)
c2.Divide(2,1)
c2.SetBottomMargin(0.15)
c2.SetLeftMargin(0.15)
c2.SetRightMargin(0.15)
c2.cd(1)
hnenmsel.Draw("colz")    
hnenmsel.Draw("sametext")    
c2.cd(2)    
hZllselbc.SetLineColor(2)
hZllselac.SetLineColor(4)
hZllselbc.Draw("e")
hZllselac.Draw("esame")    
c2.Draw()
# Don't worry about the error that appears below!

In [ ]:

%%python %jsroot on
############ LET'S SEE MUON AND ELECTRON CHANNELS ON TOP OF EACH OTHER
c3 = TCanvas("c3", "c3", 620, 500)
c3.SetBottomMargin(0.15)
c3.SetLeftMargin(0.15)
c3.SetRightMargin(0.15)
hMZmm.SetLineColor(2)
hMZmm.SetTitle("compare ee(blue) & mm(red) channels")
hMZmm.GetXaxis().SetTitle("mLL (GeV)")    
hMZmm.Draw("e")    
hMZee.Draw("esame")    
c3.Draw()
# Don't worry about the error that appears below!
