This tutorial shows how to use this software from your own Python project or Jupyter notebook. There is also a convenient command line interface that lets you do the same with just two lines in your terminal.
NOTE FOR CONTRIBUTORS: Always clear all output before committing (Cell > All Output > Clear)!
# Magic
%matplotlib inline
# Reload modules whenever they change
%load_ext autoreload
%autoreload 2
# Make clusterking package available even without installation
import sys
sys.path = ["../../"] + sys.path
import clusterking as ck
Let's set up a scanner object and configure it.
s = ck.scan.WilsonScanner()
First we set up the function/distribution that we want to consider. Here we look at the branching ratio of $B\to D \,\tau\, \bar\nu_\tau$ with respect to $q^2$. The differential branching ratio is taken from the flavio package (https://flav-io.github.io/). The $q^2$ binning is chosen to have 9 bins between $3.2 \,\text{GeV}^2$ and $11.6\,\text{GeV}^2$ and is implemented as follows:
import flavio
import numpy as np
def dBrdq2(w, q):
    return flavio.np_prediction("dBR/dq2(B+->Dtaunu)", w, q)
s.set_dfunction(
    dBrdq2,
    binning=np.linspace(3.2, 11.6, 10),
    normalize=True
)
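As a side note, `np.linspace(3.2, 11.6, 10)` returns 10 equally spaced bin edges, which define the 9 bins mentioned above. A quick sanity check with plain numpy:

```python
import numpy as np

# 10 equally spaced edges between 3.2 and 11.6 GeV^2
edges = np.linspace(3.2, 11.6, 10)
print(len(edges))      # 10 edges
print(len(edges) - 1)  # 9 bins
print(np.diff(edges))  # each bin is ~0.933 GeV^2 wide
```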
Next, let's set up the Wilson coefficients that should be sampled. The Wilson coefficients are implemented using the wilson package (https://wilson-eft.github.io/), which supports a variety of bases and EFTs and matches the coefficients to user-specified scales.
Using the example of $B\to D \tau \bar\nu_\tau$, we sample the coefficients CVL_bctaunutau, CSL_bctaunutau and CT_bctaunutau from the flavio basis (https://wcxf.github.io/assets/pdf/WET.flavio.pdf) with 10 points each between $-1$ and $1$ at a scale of 5 GeV:
s.set_spoints_equidist(
    {
        "CVL_bctaunutau": (-1, 1, 10),
        "CSL_bctaunutau": (-1, 1, 10),
        "CT_bctaunutau": (-1, 1, 10)
    },
    scale=5,
    eft='WET',
    basis='flavio'
)
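Since each coefficient is sampled at 10 equidistant values, the scan covers a Cartesian grid of $10 \times 10 \times 10 = 1000$ points in Wilson coefficient space. A minimal numpy sketch of how such a grid can be built (this is only an illustration of the sampling, not ClusterKing's internal implementation):

```python
import itertools
import numpy as np

# 10 equidistant values per coefficient, as in set_spoints_equidist
values = np.linspace(-1, 1, 10)

# Cartesian product over the three coefficients
grid = list(itertools.product(values, values, values))
print(len(grid))  # 1000 sample points
```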
Now, to compute the kinematical distributions from the Wilson coefficients sampled above, we need a data instance:
d = ck.Data()
Computing the kinematical distributions is done using the run() method:
s.run(d)
The results are saved in a dataframe, d.df. Let's have a look:
d.df.head()
Let's build a hierarchy cluster out of the data object we created above.
c = ck.cluster.HierarchyCluster(d)
First, we have to specify the metric that we want to use to measure the distance between different distributions. If no argument is specified, the common $\chi^2$ metric is used.
c.set_metric()
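To get an intuition for such a metric, a chi-square-like distance between two normalized, binned distributions can be sketched in plain numpy as follows (a simplified illustration with made-up uncertainties, not ClusterKing's exact implementation):

```python
import numpy as np

def chi2_distance(h1, h2, err1, err2):
    """Chi-square-like distance between two binned distributions."""
    return float(np.sum((h1 - h2) ** 2 / (err1 ** 2 + err2 ** 2)))

# Two toy normalized histograms with 3 bins and equal uncertainties
h1 = np.array([0.2, 0.5, 0.3])
h2 = np.array([0.25, 0.45, 0.3])
err = np.full(3, 0.05)
print(chi2_distance(h1, h2, err, err))  # ≈ 1.0
```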
Let's build now the hierarchy cluster:
c.build_hierarchy()
The maximal distance between the individual clusters, max_d, can be chosen as follows:
c.cluster(max_d=0.15)
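Hierarchical clustering with such a distance cutoff can be sketched with scipy's linkage/fcluster functions (an illustration with toy data, not the ClusterKing API):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two well-separated groups of 1D observations
points = np.array([[0.0], [0.1], [0.05], [1.0], [1.1], [0.95]])

# Build the hierarchy, then cut it at a maximal distance of 0.3
Z = linkage(points, method="complete")
labels = fcluster(Z, t=0.3, criterion="distance")
print(labels)  # two clusters: first three points vs. last three
```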
Now we add the information about the clusters to the dataframe created above:
c.write()
Let's take a look and notice the new column cluster at the end of the data frame:
d.df.head()
In a similar way we can determine the benchmark points representing the individual clusters. Initializing a benchmark point object
b = ck.Benchmark(d)
and again choosing a metric ($\chi^2$ metric is default)
b.set_metric()
the benchmark points can be computed
b.select_bpoints()
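Conceptually, a benchmark point is the member of a cluster that is closest to all other members under the chosen metric. A minimal numpy sketch of this idea (illustrative only, with Euclidean distances instead of the $\chi^2$ metric):

```python
import numpy as np

# Toy cluster: each row is one binned distribution
cluster = np.array([
    [0.2,  0.5,  0.3],
    [0.25, 0.45, 0.3],
    [0.5,  0.3,  0.2],
])

# Pairwise distances between all members
dists = np.linalg.norm(cluster[:, None, :] - cluster[None, :, :], axis=-1)

# The benchmark point minimizes the summed distance to all others
bpoint_index = int(np.argmin(dists.sum(axis=1)))
print(bpoint_index)  # the middle row is most "central"
```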
and written in the dataframe:
b.write()
Let's take a look and notice the new column bpoint at the end of the data frame:
d.df.head()
Now it's time to write out the results for later use.
d.write("output/cluster", "tutorial_basics", overwrite="overwrite")
This will not only write out the data itself, but also a lot of associated metadata that makes it easy to later reconstruct what the data actually represents. This metadata was accumulated in the attribute d.md over all steps:
d.md