# Using boost-histogram¶

### Henry Schreiner, Hans Dembenski¶

Setup (done for you on binder):

conda env create
conda activate bh-talk
jupyter lab


The interesting parts of the environment:

• boost-histogram 0.10.0: This is the current boost-histogram.
• hist 2.0.0a5: You have to have a recent alpha release of Hist.
• uproot4: The new version that will become default by the end of the year.
• mplhep: The matplotlib HEP helper library

#### Note:¶

boost-histogram has minimal, lightweight requirements and works almost anywhere. Anything beyond NumPy and boost-histogram is just for our demos later.

In [ ]:
import boost_histogram as bh
import numpy as np
import matplotlib.pyplot as plt
import mplhep


#### IPython setup¶

For the demos, let's see the last expression so it's easier to follow what's happening, even if it is an assignment. The default here is 'last_expr', but 'last_expr_or_assign' is better for demos. Also see 'last', 'all', and 'none'.

In [ ]:
%config InteractiveShell.ast_node_interactivity="last_expr_or_assign"


# What is boost-histogram?¶

The Python ecosystem is missing a good Histogram object. NumPy can perform a histogram operation, but it does not produce an object, and there are limitations and performance issues. The closest thing we have to a histogram is in ROOT.

## 1: Basic histograms¶

Let's start with the basics. We will create a histogram using boost-histogram and fill it.

### 1.1: Data¶

Let's make a 1D dataset to run on.

In [ ]:
data1 = np.random.normal(3.5, 2.5, size=1_000_000)


### 1.2: Histogram definition¶

Now, let's make a 1D histogram

In [ ]:
hist1 = bh.Histogram(bh.axis.Regular(40, -2, 10))


### 1.3 Histogram fill¶

Like ROOT, we can fill after we make a histogram, as many times as we want. You can fill single values, but to take advantage of the performance, you should fill with arrays.

In [ ]:
hist1.fill(data1)


You can see that the histogram has been filled. Let's explicitly check to see how many entries are in the histogram:

In [ ]:
hist1.sum()


What happened to the missing items? They are in the underflow and overflow bins:

In [ ]:
hist1.sum(flow=True)


Like ROOT, we have overflow bins by default. We can turn them off, but they enable some powerful things like projections.

Let's plot this:

In [ ]:
plt.bar(hist1.axes[0].centers, hist1.view(), width=hist1.axes[0].widths);


Note: You can leave off the .view() if you want to - histograms conform to the buffer protocol. Also, you can select the axes before or after calling .centers; this is very useful for ND histograms.

From now on, let's be lazy, and use mplhep, which natively supports boost-histogram now.

In [ ]:
mplhep.histplot(hist1);


You should think of boost-histogram like NumPy: No plotting is built in, but the data is easy to access.

Unlike ROOT, Histograms are built of of basic building blocks: 1 or more axes, and a storage. The storage holds accumulators, which can be simple doubles or ints, or more complex things that hold extra information about the operation (which might not even be a sum! (generalized histograms). Here's what it looks like for 1D:

If we go to 2D, this generalizes - we have the same API for ND!

## 2: Drop-in replacement for NumPy¶

To start using this yourself, you don't even need to change your code. Let's try the NumPy adapters.

In [ ]:
bins2, edges2 = bh.numpy.histogram(data1, bins=10)

In [ ]:
b2, e2 = np.histogram(data1, bins=10)

In [ ]:
bins2 - b2

In [ ]:
e2 - edges2


Not bad! Let's start moving to the boost-histogram API, so we can use the plotting function we just learned about:

In [ ]:
hist2 = bh.numpy.histogram(data1, bins="auto", histogram=bh.Histogram)
mplhep.histplot(hist2);


Now we can move over to boost-histogram one step at a time! Just to be complete, we can also go back to a Numpy tuple from a Histogram object:

In [ ]:
b2p, e2p = bh.numpy.histogram(data1, bins=10, histogram=bh.Histogram).to_numpy()
b2p == b2


And, while "recently" NumPy optimized 1D regular binning, we still beat optimized NumPy:

In [ ]:
%%timeit
bh.numpy.histogram(data1, bins=100)

In [ ]:
%%timeit
np.histogram(data1, bins=100)


## 3: More dimensions¶

The same API works for multiple dimensions.

In [ ]:
hist3 = bh.Histogram(bh.axis.Regular(150, -1.5, 1.5), bh.axis.Regular(100, -1, 1))

In [ ]:
data2d = [np.random.normal(size=1_000_000) for _ in range(2)]

In [ ]:
hist3.fill(*data2d)

In [ ]:
plt.pcolormesh(*hist3.axes.edges.T, hist3.view().T);


This is transposed because of differing indexing conventions.

Let's try a 3D histogram

In [ ]:
data3d = [np.random.normal(size=1_000_000) for _ in range(3)]

hist3d = bh.Histogram(
bh.axis.Regular(150, -5, 5),
bh.axis.Regular(100, -5, 5),
bh.axis.Regular(100, -5, 5),
)

hist3d.fill(*data3d)


Let's project to the first two axes:

In [ ]:
mplhep.hist2dplot(hist3d[:, :, sum]);


## 4: UHI¶

Let's explore the boost-histogram UHI syntax. We will start with a 1D histogram:

In [ ]:
h = bh.Histogram(bh.axis.Regular(100, -3.5, 3.5))


Fill it with some mildly interesting data:

In [ ]:
data = np.concatenate([
np.random.normal(-.75,.3, 1_000_000),
np.random.normal(.75,.3, 750_000),
np.random.normal(-1.5,.2, 200_000),
])

h.fill(data)

In [ ]:
mplhep.histplot(h);


I can see that I want x from -2 to 0, in data coordinates:

In [ ]:
mplhep.histplot(h[bh.loc(-2):bh.loc(0)]);


What's the contents of a bin?

In [ ]:
h[bh.loc(-.75)]


To get the coordinates manually, you could do (not UHI):

In [ ]:
h.axes[0].index(.75)


How about reducing a histogram? Let's try the previous 2D Histogram

In [ ]:
mplhep.histplot(hist3[:, sum])
mplhep.histplot(hist3[sum, :]);


Let's look at one part and rebin:

In [ ]:
mplhep.hist2dplot(hist3[: 50 : bh.rebin(2), 50 :: bh.rebin(2)]);


What is the value at (-.75, .5)?

In [ ]:
hist3[bh.loc(-0.75), bh.loc(0.5)]


### Distribution is state-of-the-art Scikit-HEP style¶

The Scikit-HEP developer pages were heavily influenced by boost-histogram.

We now support ARM and PowerPC wheels!