!jupyter nbextension enable splitcell/splitcell
!jupyter nbextension enable rise/main
Enabling notebook extension splitcell/splitcell... - Validating: OK
Enabling notebook extension rise/main... - Validating: OK
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")
from matplotlib import colors, cm
import os
import pandas as pd
from tensorflow import keras
from tutorial import get_file
eos access: ✗
plt.rcParams["axes.grid"] = False
plt.rcParams.update({'font.size': 20})
plt.rcParams.update({'figure.figsize': (12,9)})
plt.rcParams['lines.markersize'] = 8
Martin Erdmann, Yannik Rath
This first session is an introduction to neural networks and deep learning
This is a tutorial, not a talk. Feel free to ask questions throughout!
Marked tutorials use the same technical setup
You can follow along locally or use Swan/Binder:
Suggested: Click Open in SWAN
$\qquad \rightarrow$ Enter /eos/user/m/mrieger/public/iml2019/intro/setup.sh
as environment script
$\qquad \rightarrow$ Press Start my session
$\qquad \rightarrow$ Choose Tensorflow v1 image
Chat channel for the tutorials at https://gitter.im/IMLWorkshop19:
For any questions that may arise during the open part or afterwards (especially remote)
For us to share code snippets if needed
Many different deep learning frameworks exist, steered mostly from Python
Two most popular:
In this tutorial, we will use TensorFlow + Keras
Side note: Many changes coming with TensorFlow 2.0. Some glimpses into the future later today
data = np.random.uniform(size=100)
labels = 3*data + 1 + np.random.normal(loc=0.0, scale=0.1, size=100)
plt.scatter(data, labels, label="data")
plt.legend()
plt.show()
Data: $\{x_{i}, y_{i}\}$, $i = 1 \ldots N$
Define model $y_{m}(x, \theta) = Wx + b$ with free parameters $\theta = (W, b)$
Define objective function (loss/cost) \begin{equation} J(\theta|x,y) = \frac{1}{N} \sum_{i}^{N} [y_{i}-y_{m}(x_{i}, \theta)]^{2} \end{equation}
Minimize objective ("train model") \begin{equation} \hat{\theta} = \operatorname{argmin}_{\theta}\, J(\theta) \end{equation}
Easy to do if we have only a few variables
def cost(params):
W, b = params
return np.mean((labels - (W*data + b))**2)
from scipy.optimize import minimize
res = minimize(cost, [1., 1.])
W, b = res.x
points = np.linspace(0, 1, 100)
prediction = W*points + b
plt.scatter(data, labels, label="data")
plt.plot(points, prediction, label="model", color="blue")
plt.legend()
plt.show()
Now, consider multiple inputs $x = (x_{1}, .., x_{n})$ and outputs $y = (y_{1}, ..., y_{m})$
Example: $x \in \mathbb{R}^{3}$, $y \in \mathbb{R}^{2}$
\begin{gather} \begin{bmatrix} W_{11} & W_{12} & W_{13} \\ W_{21} & W_{22} & W_{23} \end{bmatrix} \begin{pmatrix} x_{1} \\ x_{2} \\ x_{3} \end{pmatrix} + \begin{pmatrix} b_{1} \\ b_{2} \end{pmatrix} = \begin{pmatrix} y_{1} \\ y_{2} \end{pmatrix} \end{gather}
Adapted from Deep Learning in Physics Research Course, RWTH
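As a quick numerical check, the affine map above written out in NumPy (the specific numbers are illustrative):

```python
import numpy as np

# W maps R^3 -> R^2, b is the bias in R^2
W = np.array([[1., 0., -1.],
              [2., 1., 0.]])   # shape (2, 3)
b = np.array([0.5, -0.5])      # shape (2,)
x = np.array([1., 2., 3.])     # shape (3,)

y = W @ x + b                  # shape (2,)
print(y)  # [-1.5  3.5]
```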
So far we can only describe linear models
$\quad \rightarrow$ Compose model with multiple layers
\begin{align} & h = W^{(1)}x + b^{(1)} \\ & y = W^{(2)}h + b^{(2)} \end{align}
The model is still linear:
\begin{align} & y = W^{(2)}(W^{(1)}x + b^{(1)}) + b^{(2)} \\ & y = W^{(2)}W^{(1)}x + W^{(2)}b^{(1)} + b^{(2)} \end{align}
$\quad \rightarrow$ Apply non-linear activation function $\sigma(x)$
Deep Learning in Physics Research Course, RWTH
Applied elementwise to each node in a hidden layer
Introduces non-linearity
$\quad \rightarrow$ Allows stacking of multiple layers
Mostly ReLU variants used in deep learning due to vanishing gradient problem
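The vanishing-gradient argument can be made concrete in a few lines of NumPy (a sketch, not part of the tutorial code): the sigmoid derivative never exceeds 0.25, so chaining many layers shrinks gradients multiplicatively, while ReLU passes a gradient of 1 for positive inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 1001)

# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) peaks at 0.25 (at z = 0),
# so gradients shrink when many sigmoid layers are chained
sig_grad = sigmoid(z) * (1 - sigmoid(z))
print(sig_grad.max())  # 0.25

# ReLU'(z) = 1 for z > 0, so the gradient passes through unchanged
relu_grad = (z > 0).astype(float)
print(relu_grad.max())  # 1.0
```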
Hierarchical feature extraction: Higher level of abstraction with each layer
$\quad \rightarrow$ Enables training on lower-level/"raw" data
Predict a continuous label
Separate events into multiple categories
Minimize mean squared error
\begin{equation} J(\theta|x,y) = \frac{1}{N} \sum_{i}^{N} [y_{i}-y_{m}(x_{i}, \theta)]^{2} \end{equation}
Typically linear / no activation function in the output layer
Minimize cross entropy
\begin{equation} J(\theta|x,y) = - \frac{1}{N} \sum_{i}^{N} y_{i}\log[y_{m}(x_{i}, \theta)] \end{equation}
with softmax activation to constrain outputs to (0, 1) and their sum to 1 ("probability-like")
\begin{equation} y_{j}(z) = \frac{e^{z_{j}}}{\sum_{i}e^{z_{i}}} \end{equation}
We can no longer directly minimize the objective function. Instead, minimize iteratively by updating $\theta$ in the opposite direction of the gradient
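A minimal NumPy softmax matching the formula above, with the usual max-subtraction for numerical stability (a common implementation detail not shown in the formula):

```python
import numpy as np

def softmax(z):
    # subtracting the max does not change the result, but avoids overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p)        # roughly [0.66 0.24 0.10]
print(p.sum())  # 1.0
```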
\begin{equation} \theta \rightarrow \theta - \alpha\frac{dJ}{d\theta} \end{equation}
with learning rate $\alpha$
Derivative can be calculated using the chain rule
\begin{equation} \frac{dJ}{d\theta} = \frac{dJ}{dy_{m}}\frac{dy_{m}}{d\theta} = \ldots \end{equation}
Evaluating and differentiating the objective function on the full dataset is costly.
Instead, calculate the gradient for a small subset (batch) of the training data
$\rightarrow$ Stochastic updates also help avoid local minima
One iteration over the full dataset is called an epoch
More advanced options than fixed learning rate
$\qquad \rightarrow$ E.g. Adagrad, Adadelta, Adam
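A hand-written gradient-descent loop for the linear toy problem from the beginning makes the update rule concrete (learning rate and iteration count are arbitrary choices for this sketch):

```python
import numpy as np

# regenerate the toy data: y = 3x + 1 plus noise
rng = np.random.RandomState(0)
data = rng.uniform(size=100)
labels = 3 * data + 1 + rng.normal(scale=0.1, size=100)

W, b = 0.0, 0.0
alpha = 0.5  # learning rate

for _ in range(2000):
    pred = W * data + b
    # gradients of the mean squared error w.r.t. W and b
    dW = -2 * np.mean((labels - pred) * data)
    db = -2 * np.mean(labels - pred)
    W -= alpha * dW
    b -= alpha * db

print(W, b)  # close to the true values 3 and 1
```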
Neural network parameters initialized randomly
$\quad \rightarrow$ Break symmetry
Suitable values depend on layer sizes and activation functions
$\quad \rightarrow$ Basic idea: $Var[output] = Var[input]$
Glorot (tanh)
\begin{equation} Var[W] = \frac{2}{N_{in} + N_{out}} \end{equation}
He (ReLU)
\begin{equation} Var[W] = \frac{2}{N_{in}} \end{equation}
Values are typically sampled from a normal or uniform (with range $\pm \sqrt{3\,Var}$) distribution
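Sampling both initializations by hand in NumPy (layer sizes are illustrative; Keras provides these as `he_normal` and `glorot_uniform`):

```python
import numpy as np

rng = np.random.RandomState(0)
n_in, n_out = 256, 128

# He initialization (ReLU): Var[W] = 2 / N_in, sampled from a normal
he_std = np.sqrt(2.0 / n_in)
W_he = rng.normal(0.0, he_std, size=(n_in, n_out))

# Glorot initialization (tanh): Var[W] = 2 / (N_in + N_out),
# sampled uniformly with range +- sqrt(3 Var)
glorot_limit = np.sqrt(3.0 * 2.0 / (n_in + n_out))
W_glorot = rng.uniform(-glorot_limit, glorot_limit, size=(n_in, n_out))

print(W_he.std(), he_std)  # empirical std close to the target
```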
Input features can have vastly different scales (e.g. $p_{T}$ vs $\eta$)
Most basic strategy: Normalize to mean 0 and variance 1
Andrej Karpathy, http://cs231n.github.io
Other options possible, e.g. input decorrelation, non-linear transformations
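The basic strategy can be sketched in NumPy (toy data; in practice the statistics must come from the training set only and be reused for validation/test data):

```python
import numpy as np

rng = np.random.RandomState(1)
# toy features with very different scales, e.g. a pT-like and an eta-like column
train = np.stack([rng.exponential(100., 1000), rng.normal(0., 2.4, 1000)], axis=1)
test = np.stack([rng.exponential(100., 200), rng.normal(0., 2.4, 200)], axis=1)

# compute the statistics on the training set only ...
mean, std = train.mean(axis=0), train.std(axis=0)

# ... and apply the same transformation to all sets
train_norm = (train - mean) / std
test_norm = (test - mean) / std

print(train_norm.mean(axis=0), train_norm.std(axis=0))  # ~0 and ~1 per feature
```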
For now we use plain TensorFlow to see all the steps.
Later we will use higher level abstractions provided in tf.keras
First, let's generate some toy data:
n_samples = 50
np.random.seed(4321)
class1_data = np.random.multivariate_normal([-1., -1.], [[1., 0.], [0., 1.]], n_samples)
class2_data = np.random.multivariate_normal([1., 1.], [[1., 0.], [0., 1.]], n_samples)
train_data = np.concatenate([class1_data, class2_data])
toy_labels = np.zeros(train_data.shape)
toy_labels[:n_samples, 0] = 1
toy_labels[n_samples:, 1] = 1
plt.scatter(*class1_data.T)
plt.scatter(*class2_data.T)
plt.show()
Our class labels are one-hot encoded, i.e. $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ for the first class and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ for the second
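An equivalent, more compact way to build one-hot labels from integer class indices (a sketch; the cell above constructs them manually instead):

```python
import numpy as np

class_indices = np.array([0, 0, 1, 1, 0])

# rows of the identity matrix are exactly the one-hot vectors
one_hot = np.eye(2)[class_indices]
print(one_hot)
# [[1. 0.]
#  [1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]
```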
TensorFlow provides datasets and iterators to handle the data. Our model consumes one iterator batch at a time, loaded dynamically during training
tf.reset_default_graph()
inp_placeholder = tf.placeholder(dtype=tf.float32, shape=[None, 2])
target_placeholder = tf.placeholder(dtype=tf.float32, shape=[None, 2])
# shuffle before batching so individual samples, not whole batches, are shuffled
dataset = tf.data.Dataset.from_tensor_slices((inp_placeholder, target_placeholder)).shuffle(buffer_size=2*n_samples).batch(10)
iterator = dataset.make_initializable_iterator()
inp, target = iterator.get_next()
We define a simple model with one hidden layer, using the initialization rules we learned about
Since this is a classification task, we use cross entropy as our objective function
n_hidden = 10
n_epochs = 100
with tf.variable_scope("initialization"):
W1 = tf.get_variable("W1", initializer=tf.random.normal((2, n_hidden), stddev=1.))
b1 = tf.get_variable("b1", initializer=tf.constant(0., shape=(n_hidden,)))
hidden = tf.nn.relu(tf.add(tf.matmul(inp, W1), b1))
W2 = tf.get_variable("W2", initializer=tf.random.normal((n_hidden, 2), stddev=(2./n_hidden)**0.5))
b2 = tf.get_variable("b2", initializer=tf.constant(0., shape=(2,)))
out = tf.nn.softmax(tf.add(tf.matmul(hidden, W2), b2))
# tensorflow also provides predefined objective functions in tf.losses
cost = -tf.reduce_mean(tf.reduce_sum(target*tf.log(tf.clip_by_value(out, 1e-10, 1.0)), axis=1))
train_step = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
In our training loop, we iterate over all batches in our training data, then re-initialize the iterator and repeat this for N epochs
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for i in range(n_epochs):
sess.run(iterator.initializer, feed_dict={inp_placeholder: train_data,
target_placeholder: toy_labels})
while True:
try:
sess.run(train_step)
except tf.errors.OutOfRangeError:
break
x = y = np.linspace(-3, 3, 41)
z = np.array([sess.run(out, feed_dict={inp: [[j, i]]})[0, 0] for i in x for j in y])
Z = z.reshape(41, 41)
plt.scatter(*class1_data.T)
plt.scatter(*class2_data.T)
plt.imshow(Z, interpolation="bilinear", origin="lower", extent=(-3, 3, -3., 3.))
plt.show()
Solution 1: Evaluate performance on a statistically independent validation set to measure generalization capabilities
$\quad \rightarrow$ Stop training when performance on validation set decreases: Early stopping
Caution: Optimization of network hyperparameters is itself a form of training
$\quad \rightarrow$ The validation set is not unbiased
Use a third test set to measure final performance once
In physics: Validation set is sometimes called test set, test set is called evaluation set
More sophisticated option: Cross validation
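A minimal K-fold split sketch with NumPy (scikit-learn's `KFold` does this more robustly; the helper name is illustrative):

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k folds."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# every sample appears in exactly one validation fold
all_val = np.concatenate([v for _, v in kfold_indices(100, 5)])
print(sorted(all_val) == list(range(100)))  # True
```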
Methods to suppress overtraining
Already mentioned: Early stopping
L1/L2 regularization: Penalize high weights
with scaling factor $\lambda$
Randomly drop a percentage of nodes at each training step
Learn redundant representations $\rightarrow$ More robust model
Can be seen as training an ensemble of loosely coupled networks in parallel
For evaluation: Disable dropout, scale node outputs accordingly
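A sketch of the "inverted" dropout variant commonly used in frameworks: the rescaling happens already during training, so evaluation needs no adjustment (function and names are illustrative):

```python
import numpy as np

def dropout(x, p_drop, rng, training=True):
    if not training:
        return x  # evaluation: no masking, no scaling needed
    mask = rng.uniform(size=x.shape) >= p_drop
    # rescale surviving nodes so the expected activation is unchanged
    return x * mask / (1.0 - p_drop)

rng = np.random.RandomState(0)
x = np.ones(100000)
y = dropout(x, 0.5, rng)
print(y.mean())  # close to 1: the expectation is preserved
```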
So far, considered only fully connected neural networks
For images, convolutional neural networks (CNNs) are used
Exploit structures in data: Local correlations and translational invariance
Small filter (weight tensor) is applied to image patch, mapping it to a single value
Convolution: Slide filter over image to create feature map
Multiple filters are stacked depth-wise
Each filter can learn a different feature
Deep Learning in Physics Research Course, RWTH
Multiple convolutional layers will decrease output size due to edge effects
$\quad \rightarrow$ Can pad with zeros to keep dimensions the same
On the other hand, can pool outputs together when output size reduction is desired
E.g. max pooling: Take the maximum of each patch
Deep Learning in Physics Research Course, RWTH
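Max pooling can be written compactly with NumPy reshaping; here a 2×2 pool with stride 2 (as in the Keras `MaxPooling2D` layer used later) on a toy image:

```python
import numpy as np

x = np.array([[1, 2, 0, 1],
              [3, 4, 1, 0],
              [0, 1, 5, 6],
              [1, 0, 7, 8]], dtype=float)

# split the 4x4 image into 2x2 patches and take the max of each patch
h, w = x.shape
pooled = x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
print(pooled)
# [[4. 1.]
#  [1. 8.]]
```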
Zeiler & Fergus 2013, adapted by Yann leCun
Abstraction increases for later layers
$\quad \rightarrow$ Typically smaller spatial extent (pooling), but larger number of feature maps
Deep Learning in Physics Research Course, RWTH
Fully connected layers at the end to combine features
Problem statement: Hadronically decaying top quark, boosted so that all decay products are contained in one fat jet
$\quad \rightarrow$ Distinguish from QCD jets
Open dataset (citation), with comparison of various machine learning methods here
One approach: Image based using CNNs
Simplified model here based on Deep-learning Top Taggers or The End of QCD? and Pulling Out All the Tops with Computer Vision and Deep Learning
The dataset contains one million jets each for QCD and top events. For this tutorial we only use 10,000 events each for experimentation.
Features are the Cartesian four-vectors of the first 200 constituents (zero-padded if fewer than 200).
label_key = "is_signal_new"
n_constituents = 200
n_bins = 40
n_events_per_class = None
R_jet = 0.8
plot_range = ([-R_jet, R_jet], [-R_jet, R_jet])
vector_keys = ["PT", "ETA", "PHI"]
vector_names = {
key: ["{}_{}".format(key, i) for i in range(n_constituents)]
for key in vector_keys
}
# get input file
input_file = get_file("intro/data/top_tagging.h5")
df = pd.read_hdf(input_file, key="table")
if n_events_per_class is not None:
# select subset of events for faster testing
df = df.iloc[pd.np.r_[0:n_events_per_class, -n_events_per_class:0]]
print(df)
(DataFrame output truncated: per-constituent four-vector columns E_0/PX_0/PY_0/PZ_0 … E_199/PX_199/PY_199/PZ_199, truth four-vector columns truthE/truthPX/truthPY/truthPZ, a ttv split flag, and the is_signal_new label — [20000 rows x 806 columns])
We want to work on images in the $\eta$-$\phi$ plane, with $p_{T}$ as the pixel values
Start by calculating $p_{T}$, $\eta$, and $\phi$ for all constituents
# convert Cartesian coordinates to pt/eta/phi
for i in range(n_constituents):
px = "PX_{}".format(i)
py = "PY_{}".format(i)
pz = "PZ_{}".format(i)
pt = "PT_{}".format(i)
df[pt] = (df[px]**2 + df[py]**2)**0.5
df["ETA_{}".format(i)] = np.arctanh(df[pz] / (df[pt]**2 + df[pz]**2)**0.5)
df["PHI_{}".format(i)] = np.arctan2(df[py], df[px])
df.fillna(0, inplace=True)
First step: Center the jet
# (first constituent has highest pT)
df[vector_names["ETA"]] = df[vector_names["ETA"]].subtract(df["ETA_0"], axis=0)
# for phi, take smaller angle
df[vector_names["PHI"]] = df[vector_names["PHI"]].subtract(df["PHI_0"], axis=0).add(np.pi).mod(2 * np.pi).subtract(np.pi)
def plot_jet_image(jet_histogram):
fig = plt.figure()
ax = fig.add_subplot(111)
norm = colors.LogNorm(10**-4, jet_histogram.max(), clip=True)
im = ax.imshow(jet_histogram, norm=norm)
fig.colorbar(im)
plt.show()
average_centered_jet, x_edges, y_edges = \
np.histogram2d(df[vector_names["ETA"]].values.reshape(-1), df[vector_names["PHI"]].values.reshape(-1),
weights=df[vector_names["PT"]].values.reshape(-1) / len(df), bins=n_bins, range=plot_range,
)
plot_jet_image(average_centered_jet)
Multiple options for further preprocessing. Here, we rotate to align the jets and then normalize the pixel values
stacked_matrix = np.stack([df[vector_names["ETA"]], df[vector_names["PHI"]]], axis=1)
# get constituent with second highest p_T
rotation_vectors = stacked_matrix[:, :, 1].copy()
rotation_vectors /= np.linalg.norm(rotation_vectors, axis=-1)[:, None]
# rotate to align that maximum
rotation_matrix = np.stack([rotation_vectors, rotation_vectors.dot([[0, 1],[-1, 0]])], axis=-1)
rotated_matrix = np.einsum('ijl,ijk->ilk', rotation_matrix, stacked_matrix)
# plot rotated jets
average_rotated_jet, x_edges, y_edges = \
np.histogram2d(rotated_matrix[:,0,:].reshape(-1), rotated_matrix[:,1,:].reshape(-1),
weights=df[vector_names["PT"]].values.reshape(-1) / len(df), bins=n_bins, range=plot_range,
)
plot_jet_image(average_rotated_jet)
# create image data
weights = df[vector_names["PT"]].values
images = []
for i in range(len(df)):
image, x_edges, y_edges = np.histogram2d(rotated_matrix[i, 0], rotated_matrix[i, 1],
weights=weights[i], bins=n_bins, range=plot_range,
)
# scale to keep values between 0 and 1
image /= np.max(image)
images.append(image)
images = np.stack(images)
# plot data we use for training
average_image = np.sum(images, axis=0) / len(images)
plot_jet_image(average_image)
# add color channel dimension to data
data = np.expand_dims(images, axis=-1)
labels = df[label_key].values
activation = "relu"
padding = "same"
from tensorflow.keras import layers
We use Keras to simplify the network definition. Two ways of writing the model:
inputs = layers.Input(shape=(40, 40, 1))
# convolutional layers: n_filters, kernel size, **kwargs
x = layers.Conv2D(8, 4, activation=activation,
padding=padding)(inputs)
x = layers.Conv2D(8, 4, activation=activation,
padding=padding)(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(8, 4, activation=activation, padding=padding)(x)
x = layers.Conv2D(8, 4, activation=activation, padding=padding)(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation=activation)(x)
x = layers.Dense(64, activation=activation)(x)
x = layers.Dense(64, activation=activation)(x)
output = layers.Dense(2, activation="softmax")(x)
model = keras.models.Model(inputs, output)
model.compile(loss="sparse_categorical_crossentropy",
optimizer="adam", metrics=["accuracy"])
model = keras.Sequential([
# convolutional layers: n_filters, kernel size, **kwargs
layers.Conv2D(8, 4, activation=activation, padding=padding,
input_shape=(40, 40, 1)),
layers.Conv2D(8, 4, activation=activation, padding=padding),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Conv2D(8, 4, activation=activation, padding=padding),
layers.Conv2D(8, 4, activation=activation, padding=padding),
layers.Flatten(),
layers.Dense(64, activation=activation),
layers.Dense(64, activation=activation),
layers.Dense(64, activation=activation),
layers.Dense(2, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
optimizer="adam", metrics=["accuracy"])
The sequential definition is slightly simpler, but the functional definition allows access to intermediate layers of the model
Very helpful feature in Keras: Model summary
print(model.summary())
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_4 (Conv2D)            (None, 40, 40, 8)         136
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 40, 40, 8)         1032
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 20, 20, 8)         0
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 20, 20, 8)         1032
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 20, 20, 8)         1032
_________________________________________________________________
flatten_1 (Flatten)          (None, 3200)              0
_________________________________________________________________
dense_4 (Dense)              (None, 64)                204864
_________________________________________________________________
dense_5 (Dense)              (None, 64)                4160
_________________________________________________________________
dense_6 (Dense)              (None, 64)                4160
_________________________________________________________________
dense_7 (Dense)              (None, 2)                 130
=================================================================
Total params: 216,546
Trainable params: 216,546
Non-trainable params: 0
_________________________________________________________________
None
Now we train the model, with early stopping if the loss on the validation set does not improve for 2 epochs
early_stopper = keras.callbacks.EarlyStopping(monitor="val_loss", patience=2, mode="auto", restore_best_weights=True)
history = model.fit(data, labels, batch_size=50, epochs=20, shuffle=True, validation_split=0.3, callbacks=[early_stopper])
Train on 14000 samples, validate on 6000 samples
Epoch 1/20
14000/14000 [==============================] - 21s 1ms/sample - loss: 0.3727 - acc: 0.8469 - val_loss: 0.3028 - val_acc: 0.8847
Epoch 2/20
14000/14000 [==============================] - 20s 1ms/sample - loss: 0.2967 - acc: 0.8800 - val_loss: 0.2802 - val_acc: 0.8910
Epoch 3/20
14000/14000 [==============================] - 21s 1ms/sample - loss: 0.2700 - acc: 0.8874 - val_loss: 0.2505 - val_acc: 0.8988
Epoch 4/20
14000/14000 [==============================] - 21s 2ms/sample - loss: 0.2532 - acc: 0.8945 - val_loss: 0.2453 - val_acc: 0.8980
Epoch 5/20
14000/14000 [==============================] - 21s 1ms/sample - loss: 0.2448 - acc: 0.8978 - val_loss: 0.2382 - val_acc: 0.9057
Epoch 6/20
14000/14000 [==============================] - 21s 2ms/sample - loss: 0.2313 - acc: 0.9025 - val_loss: 0.2395 - val_acc: 0.9023
Epoch 7/20
14000/14000 [==============================] - 21s 2ms/sample - loss: 0.2211 - acc: 0.9082 - val_loss: 0.2331 - val_acc: 0.9057
Epoch 8/20
14000/14000 [==============================] - 21s 1ms/sample - loss: 0.2099 - acc: 0.9124 - val_loss: 0.2526 - val_acc: 0.8980
Epoch 9/20
14000/14000 [==============================] - 21s 2ms/sample - loss: 0.2039 - acc: 0.9154 - val_loss: 0.2464 - val_acc: 0.9010
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.legend()
plt.show()
A note on metrics (loss/accuracy/...) in Keras:
Training metrics are accumulated during an epoch, with dropout applied (if used)
Validation metrics are evaluated at the end of an epoch
$\quad \rightarrow$ E.g. training loss can be larger than validation loss
Target: Validation accuracy of 0.93 when trained on full dataset
Otherwise: Explore top-tagging dataset. Some inspiration