In this example, we will train a neural network to reconstruct air-shower properties measured by a hypothetical cosmic-ray observatory located at a height of 1400 m. The observatory features a Cartesian array of 14 x 14 particle detectors with a spacing of 750 m. The measured data thus contain the simulated detector responses to secondary particles of the cosmic-ray-induced air shower that reach the ground-based observatory.
Each particle detector measures two quantities that are stored in the form of a Cartesian image (2D array with 14 x 14 pixels):
- arrival time T: arrival time of the first particles
- integrated signal S: total signal measured by the detector

In this task, we are interested in reconstructing the cosmic-ray energy in EeV (exaelectronvolt = $10^{18}~\mathrm{eV}$) using the measured particle footprint and a convolutional neural network.
Enable a GPU for this task.
To do so, click:
Edit -> Notebook settings -> select GPU as Hardware accelerator
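Whether the runtime actually sees a GPU can be checked directly in TensorFlow (an empty list means the runtime is CPU-only):

```python
import tensorflow as tf

# list the GPUs visible to TensorFlow; an empty list means no GPU is available
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
```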
The tutorial is inspired by the simulation and the study performed in https://doi.org/10.1016/j.astropartphys.2017.10.006.
import tensorflow as tf
from tensorflow import keras
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
layers = keras.layers
print("keras", keras.__version__, "tf", tf.__version__)
keras 2.6.0 tf 2.6.0
import os
import gdown
url = "https://drive.google.com/u/0/uc?export=download&confirm=HgGH&id=19gOSVQ0mhdjMkm5i2u5NsGrWNBdlhR4F"
output = 'airshowers_mnist.npz'
if not os.path.exists(output):
    gdown.download(url, output, quiet=False)
f = np.load(output)
Our first input is the arrival times of the first shower particles, measured at the various stations. We normalize the arrival times with respect to the mean and the standard deviation of the arrival time distribution.
Remember that each input has the shape 14 x 14, as each station measures its own arrival time.
# time map
T = f['time']
T -= np.nanmean(T)
T /= np.nanstd(T)
T[np.isnan(T)] = 0
print(T.shape)
(100000, 14, 14)
To visualize a map of arrival times, we can plot a few random example events. Here, dark blue indicates an early trigger and bright yellow a late trigger.
With this information, one can directly get an impression of the shower axis (arrival direction) of the arriving particle shower.
nsamples = len(T)
random_samples = np.random.choice(nsamples, 4)

def rectangular_array(n=14):
    """ Return x,y coordinates for rectangular array with n^2 stations. """
    n0 = (n - 1) / 2
    return (np.mgrid[0:n, 0:n].astype(float) - n0)
plt.figure(figsize=(9, 7))
for i, j in enumerate(random_samples):
    plt.subplot(2, 2, i + 1)
    footprint = T[j, ...]
    xd, yd = rectangular_array()
    mask = footprint != 0
    mask[5, 5] = True
    plt.scatter(xd, yd, c='grey', s=10, alpha=0.3, label="silent")
    circles = plt.scatter(xd[mask], yd[mask], c=footprint[mask],
                          cmap="viridis", alpha=1, label="loud",
                          edgecolors="k", linewidths=0.3)
    cbar = plt.colorbar(circles)
    cbar.set_label('normalized time [a.u.]')
    plt.xlabel("x / km")
    plt.ylabel("y / km")
    plt.grid(True)

plt.tight_layout()
plt.show()
In the following, we inspect the measured signals of the detectors. Again, we have 14 x 14 measurements (a few are zero as no signal was measured) that form a single event. We process the signal using a logarithmic re-scaling.
# signal map
S = f['signal']
S = np.log10(1 + S)
S -= np.nanmin(S)
S /= np.nanmax(S)
S[np.isnan(S)] = 0
print(S.shape)
(100000, 14, 14)
plt.figure(figsize=(9, 7))
for i, j in enumerate(random_samples):
    plt.subplot(2, 2, i + 1)
    footprint = S[j, ...]
    xd, yd = rectangular_array()
    mask = footprint != 0
    mask[5, 5] = True
    marker_size = 130 * footprint[mask] + 20
    plt.scatter(xd, yd, c='grey', s=10, alpha=0.4, label="silent")
    circles = plt.scatter(xd[mask], yd[mask], c=footprint[mask], s=marker_size,
                          cmap="autumn_r", alpha=1, label="loud",
                          edgecolors="k", linewidths=0.4)
    cbar = plt.colorbar(circles)
    cbar.set_label('normalized signals [a.u.]')
    plt.xlabel("x / km")
    plt.ylabel("y / km")
    plt.grid(True)

plt.tight_layout()
plt.show()
Now, we can prepare the targets of our neural network training.
axis = f['showeraxis']
core = f['showercore'][:, 0:2]
core /= 750
# energy in EeV, in the range 3 - 100 EeV
energy = f['energy']
print(energy)
[ 4.00461776 5.99159294 7.49367551 ... 10.81657107 5.07542798 43.32910954]
We further split the data into one large training set and a smaller test set. The test set is used for the final evaluation of the model, and only once, to ensure an unbiased performance measure.
Before the split, we build our input X by stacking the maps of arrival times and the maps of signals along a new channel axis.
X = np.stack([T, S], axis=-1)
X_train, X_test = np.split(X, [-20000])
energy_train, energy_test = np.split(energy, [-20000])
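On small dummy arrays, the stacking and splitting above can be sketched as follows (shapes only; the real arrays hold 100000 events):

```python
import numpy as np

# dummy stand-ins for the real time and signal maps (1000 events of 14 x 14)
T_dummy = np.zeros((1000, 14, 14))
S_dummy = np.zeros((1000, 14, 14))

# stack along a new trailing axis -> one "image" with two channels per event
X_dummy = np.stack([T_dummy, S_dummy], axis=-1)
print(X_dummy.shape)  # (1000, 14, 14, 2)

# split off the last 200 events as a test set
X_tr, X_te = np.split(X_dummy, [-200])
print(X_tr.shape, X_te.shape)  # (800, 14, 14, 2) (200, 14, 14, 2)
```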
Now, we will set up a neural network to reconstruct the energy of the particle shower. In this step, we define the architecture of the model.
Modification section
Feel free to modify the model. For example:
- Change the number of nodes (but remember that the number of weights scales with n x n; also, the final layer has to have exactly one node, as we are reconstructing the energy, which is a scalar).
- Change the activation function, e.g., use relu, tanh, sigmoid, softplus, elu (take care not to use an activation function for the final layer!).
- Add new convolutional layers.
- Add new pooling layers, followed by further convolutional layers.
- Add fully-connected layers (remember to use Flatten() before).
- Increase the Dropout fraction or place Dropout between several layers if you observe overtraining (validation loss increases).
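As one illustration of such modifications (a sketch, not the tutorial's reference model), a variant that replaces the global pooling with Flatten() followed by a fully-connected layer might look like this:

```python
import tensorflow as tf
from tensorflow import keras
layers = keras.layers

# hypothetical variant: Flatten() + Dense instead of GlobalMaxPooling2D
alt_model = keras.models.Sequential(name="energy_regression_CNN_variant")
alt_model.add(layers.Conv2D(32, (3, 3), activation="relu", padding="same",
                            input_shape=(14, 14, 2)))
alt_model.add(layers.MaxPooling2D((2, 2)))
alt_model.add(layers.Conv2D(64, (3, 3), activation="relu", padding="same"))
alt_model.add(layers.Flatten())          # needed before fully-connected layers
alt_model.add(layers.Dense(64, activation="relu"))
alt_model.add(layers.Dropout(0.3))
alt_model.add(layers.Dense(1))           # single node: energy is a scalar
print(alt_model.output_shape)  # (None, 1)
```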
model = keras.models.Sequential(name="energy_regression_CNN")
kwargs = dict(kernel_initializer="he_normal", padding="same",)
model.add(layers.Conv2D(32, (3, 3), activation="elu", input_shape=X_train.shape[1:], **kwargs))
model.add(layers.Conv2D(32, (3, 3), activation="elu", **kwargs))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="elu", **kwargs))
model.add(layers.Conv2D(64, (3, 3), activation="elu", **kwargs))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation="elu", **kwargs))
model.add(layers.Conv2D(128, (3, 3), activation="elu", **kwargs))
model.add(layers.GlobalMaxPooling2D())
model.add(layers.Dropout(0.3))
model.add(layers.Dense(1))
We can have a deeper look at our designed model and inspect the number of adaptive parameters by printing the model summary.
print(model.summary())
Model: "energy_regression_CNN" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d (Conv2D) (None, 14, 14, 32) 608 _________________________________________________________________ conv2d_1 (Conv2D) (None, 14, 14, 32) 9248 _________________________________________________________________ max_pooling2d (MaxPooling2D) (None, 7, 7, 32) 0 _________________________________________________________________ conv2d_2 (Conv2D) (None, 7, 7, 64) 18496 _________________________________________________________________ conv2d_3 (Conv2D) (None, 7, 7, 64) 36928 _________________________________________________________________ max_pooling2d_1 (MaxPooling2 (None, 3, 3, 64) 0 _________________________________________________________________ conv2d_4 (Conv2D) (None, 3, 3, 128) 73856 _________________________________________________________________ conv2d_5 (Conv2D) (None, 3, 3, 128) 147584 _________________________________________________________________ global_max_pooling2d (Global (None, 128) 0 _________________________________________________________________ dropout (Dropout) (None, 128) 0 _________________________________________________________________ dense (Dense) (None, 1) 129 ================================================================= Total params: 286,849 Trainable params: 286,849 Non-trainable params: 0 _________________________________________________________________ None
We now compile the model to prepare it for the training. During the compile step, we set a loss/objective function (mean_squared_error
, as energy reconstruction is a regression task) and set an optimizer. In this case, we use the Adam optimizer with a learning rate of 0.001.
To monitor the resolution and the bias of the model, we add them as a metric.
def resolution(y_true, y_pred):
    """ Metric to monitor the standard deviation of the residuals """
    mean, var = tf.nn.moments((y_true - y_pred), axes=[0])
    return tf.sqrt(var)

def bias(y_true, y_pred):
    """ Metric to monitor the mean of the residuals """
    mean, var = tf.nn.moments((y_true - y_pred), axes=[0])
    return mean
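As a quick consistency check (not part of the training pipeline), the two metrics correspond to the standard deviation and the mean of the residuals, which can be verified with plain NumPy on a toy example:

```python
import numpy as np

def np_resolution(y_true, y_pred):
    # standard deviation of the residuals, matching tf.nn.moments with axes=[0]
    return np.std(y_true - y_pred)

def np_bias(y_true, y_pred):
    # mean of the residuals
    return np.mean(y_true - y_pred)

y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([11.0, 19.0, 32.0, 38.0])
# residuals are [-1, 1, -2, 2] -> bias 0.0, resolution sqrt(2.5)
print(np_bias(y_true, y_pred))        # 0.0
print(np_resolution(y_true, y_pred))  # ~1.581
```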
model.compile(
    loss='mean_squared_error',
    metrics=[bias, resolution],
    optimizer=keras.optimizers.Adam(learning_rate=1e-3))
We can now run the training for 50 epochs (50-fold iteration over the dataset) using our training data X_train and our energy labels energy_train.
For each iteration (calculating the gradients, updating the model parameters), we use 64 samples (batch_size).
We can furthermore set the fraction of validation data (initially set to 0.1) that is used to monitor overtraining.
Modification section
Feel free to modify the training procedure, for example:
- Change (increase) the number of epochs.
- Modify the batch size via batch_size.
fit = model.fit(
    X_train,
    energy_train,
    batch_size=64,
    epochs=50,
    verbose=2,
    validation_split=0.1,
)
Epoch 1/50 1125/1125 - 10s - loss: 51.2510 - bias: 0.2206 - resolution: 5.5853 - val_loss: 7.1643 - val_bias: 0.2620 - val_resolution: 2.6028 Epoch 2/50 1125/1125 - 7s - loss: 18.3058 - bias: 0.1312 - resolution: 3.8974 - val_loss: 7.3300 - val_bias: 0.8466 - val_resolution: 2.5060 Epoch 3/50 1125/1125 - 7s - loss: 14.9923 - bias: 0.1080 - resolution: 3.6163 - val_loss: 8.8342 - val_bias: 1.0222 - val_resolution: 2.7385 Epoch 4/50 1125/1125 - 7s - loss: 14.7818 - bias: 0.1117 - resolution: 3.5481 - val_loss: 6.2154 - val_bias: -1.0135e+00 - val_resolution: 2.2248 Epoch 5/50 1125/1125 - 7s - loss: 13.4509 - bias: 0.1009 - resolution: 3.3991 - val_loss: 5.4807 - val_bias: 0.7123 - val_resolution: 2.1765 Epoch 6/50 1125/1125 - 7s - loss: 12.5764 - bias: 0.1025 - resolution: 3.3318 - val_loss: 3.7811 - val_bias: 0.2103 - val_resolution: 1.8875 Epoch 7/50 1125/1125 - 7s - loss: 12.3931 - bias: 0.0986 - resolution: 3.3015 - val_loss: 6.3476 - val_bias: 0.2326 - val_resolution: 2.4490 Epoch 8/50 1125/1125 - 8s - loss: 11.8061 - bias: 0.0998 - resolution: 3.2197 - val_loss: 4.8639 - val_bias: -6.1058e-01 - val_resolution: 2.0738 Epoch 9/50 1125/1125 - 7s - loss: 11.0862 - bias: 0.1001 - resolution: 3.1308 - val_loss: 5.6667 - val_bias: 1.0685 - val_resolution: 2.0814 Epoch 10/50 1125/1125 - 7s - loss: 10.9595 - bias: 0.1017 - resolution: 3.1238 - val_loss: 5.8789 - val_bias: -7.6734e-01 - val_resolution: 2.2383 Epoch 11/50 1125/1125 - 7s - loss: 10.9668 - bias: 0.0949 - resolution: 3.1261 - val_loss: 5.4742 - val_bias: -6.9801e-01 - val_resolution: 2.1887 Epoch 12/50 1125/1125 - 8s - loss: 14.2411 - bias: 0.1067 - resolution: 3.3518 - val_loss: 5.0514 - val_bias: -1.0154e+00 - val_resolution: 1.9655 Epoch 13/50 1125/1125 - 8s - loss: 11.0807 - bias: 0.0909 - resolution: 3.1564 - val_loss: 4.4283 - val_bias: -5.9475e-02 - val_resolution: 1.9238 Epoch 14/50 1125/1125 - 8s - loss: 10.9338 - bias: 0.0962 - resolution: 3.1070 - val_loss: 3.2731 - val_bias: -4.4556e-01 - 
val_resolution: 1.7086 Epoch 15/50 1125/1125 - 8s - loss: 10.2959 - bias: 0.0961 - resolution: 3.0459 - val_loss: 4.9692 - val_bias: -9.8476e-02 - val_resolution: 2.1837 Epoch 16/50 1125/1125 - 8s - loss: 10.3641 - bias: 0.0912 - resolution: 3.0307 - val_loss: 3.9938 - val_bias: 0.4335 - val_resolution: 1.9019 Epoch 17/50 1125/1125 - 8s - loss: 10.3517 - bias: 0.0936 - resolution: 3.0283 - val_loss: 4.7277 - val_bias: -1.8346e-02 - val_resolution: 2.1363 Epoch 18/50 1125/1125 - 8s - loss: 10.3864 - bias: 0.0891 - resolution: 3.0255 - val_loss: 6.6410 - val_bias: 1.5696 - val_resolution: 1.9911 Epoch 19/50 1125/1125 - 8s - loss: 10.2311 - bias: 0.0963 - resolution: 3.0281 - val_loss: 3.9939 - val_bias: 0.1593 - val_resolution: 1.9415 Epoch 20/50 1125/1125 - 8s - loss: 10.2525 - bias: 0.0925 - resolution: 3.0340 - val_loss: 6.6985 - val_bias: -1.5925e-01 - val_resolution: 2.5286 Epoch 21/50 1125/1125 - 8s - loss: 9.8306 - bias: 0.0920 - resolution: 2.9892 - val_loss: 3.3819 - val_bias: 0.2883 - val_resolution: 1.7735 Epoch 22/50 1125/1125 - 8s - loss: 9.6167 - bias: 0.0896 - resolution: 2.9479 - val_loss: 3.2679 - val_bias: 0.1500 - val_resolution: 1.7628 Epoch 23/50 1125/1125 - 8s - loss: 9.8323 - bias: 0.0923 - resolution: 2.9840 - val_loss: 4.7279 - val_bias: -1.1579e+00 - val_resolution: 1.7977 Epoch 24/50 1125/1125 - 8s - loss: 10.0505 - bias: 0.0934 - resolution: 3.0126 - val_loss: 3.2452 - val_bias: -1.2292e-02 - val_resolution: 1.7605 Epoch 25/50 1125/1125 - 8s - loss: 9.5215 - bias: 0.0890 - resolution: 2.9296 - val_loss: 3.9381 - val_bias: 0.2119 - val_resolution: 1.9251 Epoch 26/50 1125/1125 - 8s - loss: 9.5036 - bias: 0.0914 - resolution: 2.9311 - val_loss: 7.8479 - val_bias: 0.6907 - val_resolution: 2.6659 Epoch 27/50 1125/1125 - 8s - loss: 9.2302 - bias: 0.0897 - resolution: 2.8851 - val_loss: 4.3504 - val_bias: 0.5339 - val_resolution: 1.9730 Epoch 28/50 1125/1125 - 8s - loss: 9.0326 - bias: 0.0875 - resolution: 2.8620 - val_loss: 4.1697 - val_bias: 
0.8761 - val_resolution: 1.7970 Epoch 29/50 1125/1125 - 8s - loss: 8.7526 - bias: 0.0898 - resolution: 2.8216 - val_loss: 4.3673 - val_bias: -2.6423e-01 - val_resolution: 2.0314 Epoch 30/50 1125/1125 - 8s - loss: 8.7202 - bias: 0.0855 - resolution: 2.8117 - val_loss: 4.2290 - val_bias: -1.4770e-01 - val_resolution: 2.0123 Epoch 31/50 1125/1125 - 8s - loss: 8.6606 - bias: 0.0905 - resolution: 2.8043 - val_loss: 4.4510 - val_bias: -1.0919e+00 - val_resolution: 1.7586 Epoch 32/50 1125/1125 - 8s - loss: 8.8006 - bias: 0.0843 - resolution: 2.8235 - val_loss: 3.2122 - val_bias: 0.3594 - val_resolution: 1.7116 Epoch 33/50 1125/1125 - 8s - loss: 8.6037 - bias: 0.0867 - resolution: 2.7932 - val_loss: 3.3244 - val_bias: -5.3664e-02 - val_resolution: 1.7777 Epoch 34/50 1125/1125 - 8s - loss: 8.4480 - bias: 0.0854 - resolution: 2.7728 - val_loss: 4.9355 - val_bias: -6.1919e-02 - val_resolution: 2.1743 Epoch 35/50 1125/1125 - 8s - loss: 8.3973 - bias: 0.0848 - resolution: 2.7553 - val_loss: 2.9338 - val_bias: -1.5135e-01 - val_resolution: 1.6573 Epoch 36/50 1125/1125 - 8s - loss: 8.4444 - bias: 0.0845 - resolution: 2.7592 - val_loss: 4.0460 - val_bias: -3.4717e-01 - val_resolution: 1.9152 Epoch 37/50 1125/1125 - 8s - loss: 8.4157 - bias: 0.0828 - resolution: 2.7608 - val_loss: 3.3563 - val_bias: 0.1284 - val_resolution: 1.7755 Epoch 38/50 1125/1125 - 8s - loss: 8.4045 - bias: 0.0835 - resolution: 2.7544 - val_loss: 3.2529 - val_bias: 0.0631 - val_resolution: 1.7567 Epoch 39/50 1125/1125 - 8s - loss: 8.1558 - bias: 0.0830 - resolution: 2.7154 - val_loss: 4.0424 - val_bias: 0.3233 - val_resolution: 1.9358 Epoch 40/50 1125/1125 - 8s - loss: 8.3666 - bias: 0.0801 - resolution: 2.7408 - val_loss: 2.9017 - val_bias: 0.4578 - val_resolution: 1.5972 Epoch 41/50 1125/1125 - 8s - loss: 7.9695 - bias: 0.0810 - resolution: 2.6883 - val_loss: 2.9058 - val_bias: -6.1835e-02 - val_resolution: 1.6628 Epoch 42/50 1125/1125 - 8s - loss: 7.8910 - bias: 0.0803 - resolution: 2.6773 - val_loss: 
5.1250 - val_bias: 1.5005 - val_resolution: 1.6534 Epoch 43/50 1125/1125 - 8s - loss: 7.9850 - bias: 0.0815 - resolution: 2.6898 - val_loss: 2.8235 - val_bias: 0.1839 - val_resolution: 1.6263 Epoch 44/50 1125/1125 - 8s - loss: 8.1262 - bias: 0.0800 - resolution: 2.7059 - val_loss: 3.6428 - val_bias: 0.4255 - val_resolution: 1.8185 Epoch 45/50 1125/1125 - 8s - loss: 7.8745 - bias: 0.0794 - resolution: 2.6626 - val_loss: 3.5905 - val_bias: 0.0579 - val_resolution: 1.8435 Epoch 46/50 1125/1125 - 8s - loss: 7.9296 - bias: 0.0789 - resolution: 2.6789 - val_loss: 3.1791 - val_bias: 0.6325 - val_resolution: 1.6294 Epoch 47/50 1125/1125 - 8s - loss: 7.8913 - bias: 0.0777 - resolution: 2.6621 - val_loss: 2.7095 - val_bias: -1.1288e-01 - val_resolution: 1.5988 Epoch 48/50 1125/1125 - 8s - loss: 7.6065 - bias: 0.0775 - resolution: 2.6237 - val_loss: 4.8592 - val_bias: 1.2698 - val_resolution: 1.7565 Epoch 49/50 1125/1125 - 8s - loss: 7.8245 - bias: 0.0793 - resolution: 2.6488 - val_loss: 2.8825 - val_bias: 0.3158 - val_resolution: 1.6318 Epoch 50/50 1125/1125 - 8s - loss: 7.7593 - bias: 0.0783 - resolution: 2.6517 - val_loss: 3.1409 - val_bias: -5.9592e-01 - val_resolution: 1.6252
fig, ax = plt.subplots(1, figsize=(8,5))
n = np.arange(len(fit.history['loss']))
ax.plot(n, fit.history['loss'], ls='--', c='k', label='loss')
ax.plot(n, fit.history['val_loss'], label='val_loss', c='k')
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss')
ax.legend()
ax.semilogy()
ax.grid()
plt.show()
After training the model, we can use the test set X_test
to evaluate the performance of the DNN.
Particularly interesting are the resolution and the bias of the method. A bias close to 0 is desirable. A resolution below 3 EeV can be regarded as good.
energy_pred = model.predict(X_test, batch_size=128, verbose=1)[:,0]
157/157 [==============================] - 1s 4ms/step
diff = energy_pred - energy_test
resolution = np.std(diff)
plt.figure()
plt.hist(diff, bins=100)
plt.xlabel(r'$E_\mathrm{rec} - E_\mathrm{true}$')
plt.ylabel('# Events')
plt.text(0.95, 0.95, r'$\sigma = %.3f$ EeV' % resolution, ha='right', va='top', transform=plt.gca().transAxes)
plt.text(0.95, 0.85, r'$\mu = %.1f$ EeV' % diff.mean(), ha='right', va='top', transform=plt.gca().transAxes)
plt.grid()
plt.xlim(-10, 10)
plt.tight_layout()
After estimating the bias and the resolution, we can now inspect the reconstruction via a scatter plot.
Furthermore, we can study the energy dependence of the resolution. With increasing energy, the resolution improves due to the lower sampling fluctuations of the ground-based particle detectors and the larger footprints.
x = [3, 10, 30, 100]
labels = ["3", "10", "30", "100"]
diff = energy_pred - energy_test
# Embedding plot
fig, axes = plt.subplots(1, 2, figsize=(20, 9))
axes[0].scatter(energy_test, energy_pred, s=3, alpha=0.60)
axes[0].set_xlabel(r"$E_{true}\;/\;\mathrm{EeV}$")
axes[0].set_ylabel(r"$E_{DNN}\;/\;\mathrm{EeV}$")
stat_box = r"$\mu = $ %.3f" % np.mean(diff) + " / EeV" + "\n" + r"$\sigma = $ %.3f" % np.std(diff) + " / EeV"
axes[0].text(0.95, 0.2, stat_box, verticalalignment="top", horizontalalignment="right",
             transform=axes[0].transAxes, backgroundcolor="w")
axes[0].plot([np.min(energy_test), np.max(energy_test)],
             [np.min(energy_test), np.max(energy_test)], color="red")
sns.regplot(x=energy_test, y=diff / energy_test, x_estimator=np.std, x_bins=12,
            fit_reg=False, color="royalblue", ax=axes[1])
axes[1].tick_params(axis="both", which="major")
axes[1].set(xscale="log")
plt.xticks(x, labels)
axes[1].set_xlabel(r"$E_{true}\;/\;\mathrm{EeV}$")
axes[1].set_ylabel(r"$\sigma_{E}/E$")
axes[1].set_ylim(0, 0.2)
plt.show()
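The binned relative resolution that seaborn computes via x_estimator can also be obtained directly with NumPy. A sketch using toy data in place of energy_test and the network predictions (a 10% Gaussian smearing stands in for the reconstruction error):

```python
import numpy as np

rng = np.random.default_rng(42)

# toy stand-ins: true energies in 3-100 EeV, predictions with 10% relative smearing
e_true = rng.uniform(3, 100, 20000)
e_pred = e_true * (1 + 0.1 * rng.standard_normal(20000))

# 12 logarithmic energy bins, relative resolution per bin
bins = np.geomspace(3, 100, 13)
idx = np.digitize(e_true, bins)
rel_res = [np.std((e_pred - e_true)[idx == k] / e_true[idx == k])
           for k in range(1, len(bins))]
print(np.round(rel_res, 3))  # each entry should be close to 0.1
```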