!jupyter nbextension enable splitcell/splitcell
!jupyter nbextension enable rise/main
Enabling notebook extension splitcell/splitcell... - Validating: OK
Enabling notebook extension rise/main... - Validating: OK
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")
from matplotlib import colors, cm
import os
import pandas as pd
from tensorflow import keras
from tutorial import get_file
eos access: ✗
plt.rcParams["axes.grid"] = False
plt.rcParams.update({'font.size': 20})
plt.rcParams.update({'figure.figsize': (12,9)})
plt.rcParams['lines.markersize'] = 8
Martin Erdmann, Yannik Rath
This first session is an introduction to neural networks and deep learning
This is a tutorial, not a talk. Feel free to ask questions throughout!
Marked tutorials use the same technical setup
You can follow along locally or use Swan/Binder:
Suggested: Click Open in SWAN
$\qquad \rightarrow$ Enter /eos/user/m/mrieger/public/iml2019/intro/setup.sh
as environment script
$\qquad \rightarrow$ Press Start my session
$\qquad \rightarrow$ Choose Tensorflow v1 image
Chat channel for the tutorials at https://gitter.im/IMLWorkshop19:
For any questions that may arise during the open part or afterwards (especially remote)
For us to share code snippets if needed
Many different deep learning frameworks exist, steered mostly from Python
Two most popular:
In this tutorial, we will use TensorFlow + Keras
Side note: Many changes coming with TensorFlow 2.0. Some glimpses into the future later today
data = np.random.uniform(size=100)
labels = 3*data + 1 + np.random.normal(loc=0.0, scale=0.1, size=100)
plt.scatter(data, labels, label="data")
plt.legend()
plt.show()
Data: $\{x_{i}, y_{i}\}$, $i = 1 \ldots N$
Define model $y_{m}(x, \theta) = Wx + b$ with free parameters $\theta = (W, b)$
Define objective function (loss/cost) \begin{equation} J(\theta|x,y) = \frac{1}{N} \sum_{i}^{N} [y_{i}-y_{m}(x_{i}, \theta)]^{2} \end{equation}
Minimize objective ("train model") \begin{equation} \hat{\theta} = \operatorname{argmin}_{\theta}\, J(\theta) \end{equation}
Easy to do if we have only a few variables
def cost(params):
W, b = params
return np.mean((labels - (W*data + b))**2)
from scipy.optimize import minimize
res = minimize(cost, [1., 1.])
W, b = res.x
points = np.linspace(0, 1, 100)
prediction = W*points + b
plt.scatter(data, labels, label="data")
plt.plot(points, prediction, label="model", color="blue")
plt.legend()
plt.show()
Now, consider multiple inputs $x = (x_{1}, .., x_{n})$ and outputs $y = (y_{1}, ..., y_{m})$
Example: $x \in \mathbb{R}^{3}$, $y \in \mathbb{R}^{2}$
\begin{gather} \begin{bmatrix} W_{11} & W_{12} & W_{13} \\ W_{21} & W_{22} & W_{23} \end{bmatrix} \begin{pmatrix} x_{1} \\ x_{2} \\ x_{3} \end{pmatrix} + \begin{pmatrix} b_{1} \\ b_{2} \end{pmatrix} = \begin{pmatrix} y_{1} \\ y_{2} \end{pmatrix} \end{gather}
Adapted from Deep Learning in Physics Research Course, RWTH
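As a quick numerical check, the affine map above written out in NumPy (the specific numbers are illustrative):

```python
import numpy as np

# W maps R^3 -> R^2, b is the bias in R^2
W = np.array([[1., 0., -1.],
              [2., 1., 0.]])   # shape (2, 3)
b = np.array([0.5, -0.5])      # shape (2,)
x = np.array([1., 2., 3.])     # shape (3,)

y = W @ x + b                  # shape (2,)
print(y)  # [-1.5  3.5]
```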
So far we can only describe linear models
$\quad \rightarrow$ Compose model with multiple layers
\begin{align} & h = W^{(1)}x + b^{(1)} \\ & y = W^{(2)}h + b^{(2)} \end{align}
The model is still linear:
\begin{align} & y = W^{(2)}(W^{(1)}x + b^{(1)}) + b^{(2)} \\ & y = W^{(2)}W^{(1)}x + W^{(2)}b^{(1)} + b^{(2)} \end{align}
$\quad \rightarrow$ Apply non-linear activation function $\sigma(x)$
Deep Learning in Physics Research Course, RWTH
Applied elementwise to each node in a hidden layer
Introduces non-linearity
$\quad \rightarrow$ Allows stacking of multiple layers
Mostly ReLU variants used in deep learning due to vanishing gradient problem
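The vanishing-gradient argument can be made concrete in a few lines of NumPy (a sketch, not part of the tutorial code): the sigmoid derivative never exceeds 0.25, so chaining many layers shrinks gradients multiplicatively, while ReLU passes a gradient of 1 for positive inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 1001)

# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) peaks at 0.25 (at z = 0),
# so gradients shrink when many sigmoid layers are chained
sig_grad = sigmoid(z) * (1 - sigmoid(z))
print(sig_grad.max())  # 0.25

# ReLU'(z) = 1 for z > 0, so the gradient passes through unchanged
relu_grad = (z > 0).astype(float)
print(relu_grad.max())  # 1.0
```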
Hierarchical feature extraction: Higher level of abstraction with each layer
$\quad \rightarrow$ Enables training on lower-level/"raw" data
Predict a continuous label
Separate events into multiple categories
Minimize mean squared error
\begin{equation} J(\theta|x,y) = \frac{1}{N} \sum_{i}^{N} [y_{i}-y_{m}(x_{i}, \theta)]^{2} \end{equation}
Typically linear / no activation function in the output layer
Minimize cross entropy
\begin{equation} J(\theta|x,y) = - \frac{1}{N} \sum_{i}^{N} y_{i}\log[y_{m}(x_{i}, \theta)] \end{equation}
with softmax activation to constrain outputs to (0, 1) and their sum to 1 ("probability-like")
\begin{equation} y_{j}(z) = \frac{e^{z_{j}}}{\sum_{i}e^{z_{i}}} \end{equation}
We can no longer directly minimize the objective function. Instead, minimize iteratively by updating $\theta$ in the opposite direction of the gradient
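A minimal NumPy softmax matching the formula above, with the usual max-subtraction for numerical stability (a common implementation detail not shown in the formula):

```python
import numpy as np

def softmax(z):
    # subtracting the max does not change the result, but avoids overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p)        # roughly [0.66 0.24 0.10]
print(p.sum())  # 1.0
```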
\begin{equation} \theta \rightarrow \theta - \alpha\frac{dJ}{d\theta} \end{equation}
with learning rate $\alpha$
Derivative can be calculated using the chain rule
\begin{equation} \frac{dJ}{d\theta} = \frac{dJ}{dy_{m}}\frac{dy_{m}}{d\theta} = \ldots \end{equation}
Evaluating and differentiating the objective function on the full dataset is costly.
Instead, calculate the gradient for a small subset (batch) of the training data
$\rightarrow$ Stochastic updates also help avoid local minima
One iteration over the full dataset is called an epoch
More advanced options than fixed learning rate
$\qquad \rightarrow$ E.g. Adagrad, Adadelta, Adam
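A hand-written gradient-descent loop for the linear toy problem from the beginning makes the update rule concrete (learning rate and iteration count are arbitrary choices for this sketch):

```python
import numpy as np

# regenerate the toy data: y = 3x + 1 plus noise
rng = np.random.RandomState(0)
data = rng.uniform(size=100)
labels = 3 * data + 1 + rng.normal(scale=0.1, size=100)

W, b = 0.0, 0.0
alpha = 0.5  # learning rate

for _ in range(2000):
    pred = W * data + b
    # gradients of the mean squared error w.r.t. W and b
    dW = -2 * np.mean((labels - pred) * data)
    db = -2 * np.mean(labels - pred)
    W -= alpha * dW
    b -= alpha * db

print(W, b)  # close to the true values 3 and 1
```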
Neural network parameters initialized randomly
$\quad \rightarrow$ Break symmetry
Suitable values depend on layer sizes and activation functions
$\quad \rightarrow$ Basic idea: $Var[output] = Var[input]$
Glorot (tanh)
\begin{equation} Var[W] = \frac{2}{N_{in} + N_{out}} \end{equation}
He (ReLU)
\begin{equation} Var[W] = \frac{2}{N_{in}} \end{equation}
Values are typically sampled from a normal or uniform (with range $\pm \sqrt{3\,Var}$) distribution
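Sampling both initializations by hand in NumPy (layer sizes are illustrative; Keras provides these as `he_normal` and `glorot_uniform`):

```python
import numpy as np

rng = np.random.RandomState(0)
n_in, n_out = 256, 128

# He initialization (ReLU): Var[W] = 2 / N_in, sampled from a normal
he_std = np.sqrt(2.0 / n_in)
W_he = rng.normal(0.0, he_std, size=(n_in, n_out))

# Glorot initialization (tanh): Var[W] = 2 / (N_in + N_out),
# sampled uniformly with range +- sqrt(3 Var)
glorot_limit = np.sqrt(3.0 * 2.0 / (n_in + n_out))
W_glorot = rng.uniform(-glorot_limit, glorot_limit, size=(n_in, n_out))

print(W_he.std(), he_std)  # empirical std close to the target
```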
Input features can have vastly different scales (e.g. $p_{T}$ vs $\eta$)
Most basic strategy: Normalize to mean 0 and variance 1
Andrej Karpathy, http://cs231n.github.io
Other options possible, e.g. input decorrelation, non-linear transformations
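The basic strategy can be sketched in NumPy (toy data; in practice the statistics must come from the training set only and be reused for validation/test data):

```python
import numpy as np

rng = np.random.RandomState(1)
# toy features with very different scales, e.g. a pT-like and an eta-like column
train = np.stack([rng.exponential(100., 1000), rng.normal(0., 2.4, 1000)], axis=1)
test = np.stack([rng.exponential(100., 200), rng.normal(0., 2.4, 200)], axis=1)

# compute the statistics on the training set only ...
mean, std = train.mean(axis=0), train.std(axis=0)

# ... and apply the same transformation to all sets
train_norm = (train - mean) / std
test_norm = (test - mean) / std

print(train_norm.mean(axis=0), train_norm.std(axis=0))  # ~0 and ~1 per feature
```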
For now we use plain TensorFlow to see all the steps.
Later we will use higher level abstractions provided in tf.keras
First, let's generate some toy data:
n_samples = 50
np.random.seed(4321)
class1_data = np.random.multivariate_normal([-1., -1.], [[1., 0.], [0., 1.]], n_samples)
class2_data = np.random.multivariate_normal([1., 1.], [[1., 0.], [0., 1.]], n_samples)
train_data = np.concatenate([class1_data, class2_data])
toy_labels = np.zeros(train_data.shape)
toy_labels[:n_samples, 0] = 1
toy_labels[n_samples:, 1] = 1
plt.scatter(*class1_data.T)
plt.scatter(*class2_data.T)
plt.show()
Our class labels are one-hot encoded, i.e. $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ for the first class and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ for the second
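An equivalent, more compact way to build one-hot labels from integer class indices (a sketch; the cell above constructs them manually instead):

```python
import numpy as np

class_indices = np.array([0, 0, 1, 1, 0])

# rows of the identity matrix are exactly the one-hot vectors
one_hot = np.eye(2)[class_indices]
print(one_hot)
# [[1. 0.]
#  [1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]
```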
TensorFlow provides datasets and iterators to handle the data. Our model consumes one iterator batch at a time, loaded dynamically during training
tf.reset_default_graph()
inp_placeholder = tf.placeholder(dtype=tf.float32, shape=[None, 2])
target_placeholder = tf.placeholder(dtype=tf.float32, shape=[None, 2])
# shuffle before batching so individual samples, not whole batches, are shuffled
dataset = tf.data.Dataset.from_tensor_slices((inp_placeholder, target_placeholder)).shuffle(buffer_size=2*n_samples).batch(10)
iterator = dataset.make_initializable_iterator()
inp, target = iterator.get_next()
We define a simple model with one hidden layer, using the initialization rules we learned about
Since this is a classification task, we use cross entropy as our objective function
n_hidden = 10
n_epochs = 100
with tf.variable_scope("initialization"):
W1 = tf.get_variable("W1", initializer=tf.random.normal((2, n_hidden), stddev=1.))
b1 = tf.get_variable("b1", initializer=tf.constant(0., shape=(n_hidden,)))
hidden = tf.nn.relu(tf.add(tf.matmul(inp, W1), b1))
W2 = tf.get_variable("W2", initializer=tf.random.normal((n_hidden, 2), stddev=(2./n_hidden)**0.5))
b2 = tf.get_variable("b2", initializer=tf.constant(0., shape=(2,)))
out = tf.nn.softmax(tf.add(tf.matmul(hidden, W2), b2))
# tensorflow also provides predefined objective functions in tf.losses
cost = -tf.reduce_mean(tf.reduce_sum(target*tf.log(tf.clip_by_value(out, 1e-10, 1.0)), axis=1))
train_step = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
In our training loop, we iterate over all batches in our training data, then re-initialize the iterator and repeat this for N epochs
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for i in range(n_epochs):
sess.run(iterator.initializer, feed_dict={inp_placeholder: train_data,
target_placeholder: toy_labels})
while True:
try:
sess.run(train_step)
except tf.errors.OutOfRangeError:
break
x = y = np.linspace(-3, 3, 41)
z = np.array([sess.run(out, feed_dict={inp: [[j, i]]})[0, 0] for i in x for j in y])
Z = z.reshape(41, 41)
plt.scatter(*class1_data.T)
plt.scatter(*class2_data.T)
plt.imshow(Z, interpolation="bilinear", origin="lower", extent=(-3, 3, -3., 3.))
plt.show()
Solution 1: Evaluate performance on a statistically independent validation set to measure generalization capabilities
$\quad \rightarrow$ Stop training when performance on validation set decreases: Early stopping
Caution: Optimization of network hyperparameters is itself a form of training
$\quad \rightarrow$ The validation set is not unbiased
Use a third test set to measure final performance once
In physics: Validation set is sometimes called test set, test set is called evaluation set
More sophisticated option: Cross validation
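A minimal K-fold split sketch with NumPy (scikit-learn's `KFold` does this more robustly; the helper name is illustrative):

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k folds."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# every sample appears in exactly one validation fold
all_val = np.concatenate([v for _, v in kfold_indices(100, 5)])
print(sorted(all_val) == list(range(100)))  # True
```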
Methods to suppress overtraining
Already mentioned: Early stopping
L1/L2 regularization: Penalize high weights
with scaling factor $\lambda$
Randomly drop a percentage of nodes at each training step
Learn redundant representations $\rightarrow$ More robust model
Can be seen as training an ensemble of loosely coupled networks in parallel
For evaluation: Disable dropout, scale node outputs accordingly
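A sketch of the "inverted" dropout variant commonly used in frameworks: the rescaling happens already during training, so evaluation needs no adjustment (function and names are illustrative):

```python
import numpy as np

def dropout(x, p_drop, rng, training=True):
    if not training:
        return x  # evaluation: no masking, no scaling needed
    mask = rng.uniform(size=x.shape) >= p_drop
    # rescale surviving nodes so the expected activation is unchanged
    return x * mask / (1.0 - p_drop)

rng = np.random.RandomState(0)
x = np.ones(100000)
y = dropout(x, 0.5, rng)
print(y.mean())  # close to 1: the expectation is preserved
```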
So far, considered only fully connected neural networks
For images, convolutional neural networks (CNNs) are used
Exploit structures in data: Local correlations and translational invariance
Small filter (weight tensor) is applied to image patch, mapping it to a single value
Convolution: Slide filter over image to create feature map
Multiple filters are stacked depth-wise
Each filter can learn a different feature
Deep Learning in Physics Research Course, RWTH
Multiple convolutional layers will decrease output size due to edge effects
$\quad \rightarrow$ Can pad with zeros to keep dimensions the same
On the other hand, can pool outputs together when output size reduction is desired
E.g. max pooling: Take the maximum of each patch
Deep Learning in Physics Research Course, RWTH
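Max pooling can be written compactly with NumPy reshaping; here a 2×2 pool with stride 2 (as in the Keras `MaxPooling2D` layer used later) on a toy image:

```python
import numpy as np

x = np.array([[1, 2, 0, 1],
              [3, 4, 1, 0],
              [0, 1, 5, 6],
              [1, 0, 7, 8]], dtype=float)

# split the 4x4 image into 2x2 patches and take the max of each patch
h, w = x.shape
pooled = x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
print(pooled)
# [[4. 1.]
#  [1. 8.]]
```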
Zeiler & Fergus 2013, adapted by Yann leCun
Abstraction increases for later layers
$\quad \rightarrow$ Typically smaller spatial extent (pooling), but larger number of feature maps
Deep Learning in Physics Research Course, RWTH
Fully connected layers at the end to combine features
Problem statement: Hadronically decaying top quark, boosted so that all decay products are contained in one fat jet
$\quad \rightarrow$ Distinguish from QCD jets
Open dataset (citation), with comparison of various machine learning methods here
One approach: Image based using CNNs
Simplified model here based on Deep-learning Top Taggers or The End of QCD? and Pulling Out All the Tops with Computer Vision and Deep Learning
The dataset contains one million jets each for QCD and top events. For this tutorial we only use 10,000 events each for experimentation.
Features are the Cartesian four-vectors of the first 200 constituents (zero-padded if fewer than 200).
label_key = "is_signal_new"
n_constituents = 200
n_bins = 40
n_events_per_class = None
R_jet = 0.8
plot_range = ([-R_jet, R_jet], [-R_jet, R_jet])
vector_keys = ["PT", "ETA", "PHI"]
vector_names = {
key: ["{}_{}".format(key, i) for i in range(n_constituents)]
for key in vector_keys
}
# get input file
input_file = get_file("intro/data/top_tagging.h5")
df = pd.read_hdf(input_file, key="table")
if n_events_per_class is not None:
# select subset of events for faster testing
df = df.iloc[pd.np.r_[0:n_events_per_class, -n_events_per_class:0]]
print(df)
(DataFrame output truncated: per-constituent four-vector columns E_0/PX_0/PY_0/PZ_0 … E_199/PX_199/PY_199/PZ_199, truth four-vector columns truthE/truthPX/truthPY/truthPZ, a ttv split flag, and the is_signal_new label — [20000 rows x 806 columns])
We want to work on images in the $\eta$-$\phi$ plane, with $p_{T}$ as the pixel values
Start by calculating $p_{T}$, $\eta$, and $\phi$ for all constituents
# convert Cartesian coordinates to pt/eta/phi
for i in range(n_constituents):
px = "PX_{}".format(i)
py = "PY_{}".format(i)
pz = "PZ_{}".format(i)
pt = "PT_{}".format(i)
df[pt] = (df[px]**2 + df[py]**2)**0.5
df["ETA_{}".format(i)] = np.arctanh(df[pz] / (df[pt]**2 + df[pz]**2)**0.5)
df["PHI_{}".format(i)] = np.arctan2(df[py], df[px])
df.fillna(0, inplace=True)
First step: Center the jet
# (first constituent has highest pT)
df[vector_names["ETA"]] = df[vector_names["ETA"]].subtract(df["ETA_0"], axis=0)
# for phi, take smaller angle
df[vector_names["PHI"]] = df[vector_names["PHI"]].subtract(df["PHI_0"], axis=0).add(np.pi).mod(2 * np.pi).subtract(np.pi)
def plot_jet_image(jet_histogram):
fig = plt.figure()
ax = fig.add_subplot(111)
norm = colors.LogNorm(10**-4, jet_histogram.max(), clip=True)
im = ax.imshow(jet_histogram, norm=norm)
fig.colorbar(im)
plt.show()
average_centered_jet, x_edges, y_edges = \
np.histogram2d(df[vector_names["ETA"]].values.reshape(-1), df[vector_names["PHI"]].values.reshape(-1),
weights=df[vector_names["PT"]].values.reshape(-1) / len(df), bins=n_bins, range=plot_range,
)
plot_jet_image(average_centered_jet)
Multiple options for further preprocessing. Here, we rotate to align the jets and then normalize the pixel values
stacked_matrix = np.stack([df[vector_names["ETA"]], df[vector_names["PHI"]]], axis=1)
# get constituent with second highest p_T
rotation_vectors = stacked_matrix[:, :, 1].copy()
rotation_vectors /= np.linalg.norm(rotation_vectors, axis=-1)[:, None]
# rotate to align that maximum
rotation_matrix = np.stack([rotation_vectors, rotation_vectors.dot([[0, 1],[-1, 0]])], axis=-1)
rotated_matrix = np.einsum('ijl,ijk->ilk', rotation_matrix, stacked_matrix)
# plot rotated jets
average_rotated_jet, x_edges, y_edges = \
np.histogram2d(rotated_matrix[:,0,:].reshape(-1), rotated_matrix[:,1,:].reshape(-1),
weights=df[vector_names["PT"]].values.reshape(-1) / len(df), bins=n_bins, range=plot_range,
)
plot_jet_image(average_rotated_jet)
# create image data
weights = df[vector_names["PT"]].values
images = []
for i in range(len(df)):
image, x_edges, y_edges = np.histogram2d(rotated_matrix[i, 0], rotated_matrix[i, 1],
weights=weights[i], bins=n_bins, range=plot_range,
)
# scale to keep values between 0 and 1
image /= np.max(image)
images.append(image)
images = np.stack(images)
# plot data we use for training
average_image = np.sum(images, axis=0) / len(images)
plot_jet_image(average_image)
# add color channel dimension to data
data = np.expand_dims(images, axis=-1)
labels = df[label_key].values
activation = "relu"
padding = "same"
from tensorflow.keras import layers
We use Keras to simplify the network definition. Two ways of writing the model:
inputs = layers.Input(shape=(40, 40, 1))
# convolutional layers: n_filters, kernel size, **kwargs
x = layers.Conv2D(8, 4, activation=activation,
padding=padding)(inputs)
x = layers.Conv2D(8, 4, activation=activation,
padding=padding)(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(8, 4, activation=activation, padding=padding)(x)
x = layers.Conv2D(8, 4, activation=activation, padding=padding)(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation=activation)(x)
x = layers.Dense(64, activation=activation)(x)
x = layers.Dense(64, activation=activation)(x)
output = layers.Dense(2, activation="softmax")(x)
model = keras.models.Model(inputs, output)
model.compile(loss="sparse_categorical_crossentropy",
optimizer="adam", metrics=["accuracy"])
model = keras.Sequential([
# convolutional layers: n_filters, kernel size, **kwargs
layers.Conv2D(8, 4, activation=activation, padding=padding,
input_shape=(40, 40, 1)),
layers.Conv2D(8, 4, activation=activation, padding=padding),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Conv2D(8, 4, activation=activation, padding=padding),
layers.Conv2D(8, 4, activation=activation, padding=padding),
layers.Flatten(),
layers.Dense(64, activation=activation),
layers.Dense(64, activation=activation),
layers.Dense(64, activation=activation),
layers.Dense(2, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
optimizer="adam", metrics=["accuracy"])
The sequential definition is slightly simpler, but the functional definition allows access to intermediate layers of the model
Very helpful feature in Keras: Model summary
print(model.summary())
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_4 (Conv2D)            (None, 40, 40, 8)         136
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 40, 40, 8)         1032
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 20, 20, 8)         0
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 20, 20, 8)         1032
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 20, 20, 8)         1032
_________________________________________________________________
flatten_1 (Flatten)          (None, 3200)              0
_________________________________________________________________
dense_4 (Dense)              (None, 64)                204864
_________________________________________________________________
dense_5 (Dense)              (None, 64)                4160
_________________________________________________________________
dense_6 (Dense)              (None, 64)                4160
_________________________________________________________________
dense_7 (Dense)              (None, 2)                 130
=================================================================
Total params: 216,546
Trainable params: 216,546
Non-trainable params: 0
_________________________________________________________________
None
Now we train the model, with early stopping if the loss on the validation set does not improve for 2 epochs
early_stopper = keras.callbacks.EarlyStopping(monitor="val_loss", patience=2, mode="auto", restore_best_weights=True)
history = model.fit(data, labels, batch_size=50, epochs=20, shuffle=True, validation_split=0.3, callbacks=[early_stopper])
Train on 14000 samples, validate on 6000 samples
Epoch 1/20
14000/14000 [==============================] - 21s 1ms/sample - loss: 0.3727 - acc: 0.8469 - val_loss: 0.3028 - val_acc: 0.8847
Epoch 2/20
14000/14000 [==============================] - 20s 1ms/sample - loss: 0.2967 - acc: 0.8800 - val_loss: 0.2802 - val_acc: 0.8910
Epoch 3/20
14000/14000 [==============================] - 21s 1ms/sample - loss: 0.2700 - acc: 0.8874 - val_loss: 0.2505 - val_acc: 0.8988
Epoch 4/20
14000/14000 [==============================] - 21s 2ms/sample - loss: 0.2532 - acc: 0.8945 - val_loss: 0.2453 - val_acc: 0.8980
Epoch 5/20
14000/14000 [==============================] - 21s 1ms/sample - loss: 0.2448 - acc: 0.8978 - val_loss: 0.2382 - val_acc: 0.9057
Epoch 6/20
14000/14000 [==============================] - 21s 2ms/sample - loss: 0.2313 - acc: 0.9025 - val_loss: 0.2395 - val_acc: 0.9023
Epoch 7/20
14000/14000 [==============================] - 21s 2ms/sample - loss: 0.2211 - acc: 0.9082 - val_loss: 0.2331 - val_acc: 0.9057
Epoch 8/20
14000/14000 [==============================] - 21s 1ms/sample - loss: 0.2099 - acc: 0.9124 - val_loss: 0.2526 - val_acc: 0.8980
Epoch 9/20
14000/14000 [==============================] - 21s 2ms/sample - loss: 0.2039 - acc: 0.9154 - val_loss: 0.2464 - val_acc: 0.9010
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.legend()
plt.show()
A note on metrics (loss/accuracy/...) in Keras:
Training metrics are accumulated during an epoch, with dropout applied (if used)
Validation metrics are evaluated at the end of an epoch
$\quad \rightarrow$ E.g. training loss can be larger than validation loss
Target: Validation accuracy of 0.93 when trained on full dataset
Otherwise: Explore top-tagging dataset. Some inspiration