import numpy as np
from collections.abc import Iterable
import matplotlib.pyplot as plt
from matplotlib import ticker
from mpl_toolkits.axes_grid1 import make_axes_locatable
from ipywidgets import interact
import ipywidgets as widgets
import seaborn as sns
sns.set_theme(context="talk")
# sns.set(font="Times New Roman")
from pyrcn.base.blocks import InputToNode, NodeToNode
from input_to_node import SimpleInputToNode
from node_to_node import DLRBNodeToNode
from sklearn.pipeline import Pipeline
%matplotlib notebook
def build_visualization_widget(plot_func, estimator, params, X):
    def visualize(func, estimator, constant_params, X, axs, **interactive_params):
        # Merge the fixed parameters with the current widget values and redraw.
        params = constant_params.copy()
        params.update(interactive_params)
        func(estimator, params, X, axs)

    _, axs = plt.subplots(figsize=(3.5, 2.25))
    # Parameters given as ranges or lists become interactive widgets,
    # scalar parameters stay fixed.
    constant_params = dict()
    interactive_params = dict()
    for key, value in params.items():
        if isinstance(value, Iterable) and not isinstance(value, str):
            interactive_params[key] = value
        else:
            constant_params[key] = value
    interact(lambda **interactive_params: visualize(plot_func, estimator,
                                                    constant_params, X, axs,
                                                    **interactive_params),
             **interactive_params)
In this notebook, we feed an example impulse through the ESN and treat it as a non-linear filter. The unit impulse starts at $n=5$.
X = np.zeros(shape=(100, ), dtype=int)
X[5] = 1
fig, axs = plt.subplots()
axs.stem(np.arange(100), X)
axs.set_xlim([-0.5, 20.5])
axs.xaxis.set_major_locator(ticker.MultipleLocator(2))
axs.set_xlabel('n')
axs.set_ylabel('u[n]')
axs.legend(("Unit impulse", "Baseline"))
plt.savefig('input_impulse.pdf', bbox_inches='tight')
Next, we define a plot function that visualizes the hidden_layer_state at each time step.
def my_custom_plot(estimator, params, X, axs):
    # Apply the current hyper-parameters and compute the reservoir states.
    estimator.set_params(**params)
    hidden_layer_state = estimator.fit_transform(X)
    # Remove the previous image and its colorbar before redrawing.
    if axs.images:
        axs.images[-1].colorbar.remove()
    axs.clear()
    im = axs.imshow(hidden_layer_state.T, cmap="RdBu", vmin=-1., vmax=1.)
    axs.set_xlabel("Time step")
    axs.set_ylabel("Neuron index")
    divider = make_axes_locatable(axs)
    cax = divider.append_axes("right", size="5%", pad=0.05)
    plt.colorbar(mappable=im, cax=cax)
    plt.show()
First, we show the impact of different input scaling factors.
To isolate this effect, we neutralize the other hyper-parameters, i.e., no recurrent connections ($\rho = 0$), no bias ($\alpha_{\mathrm{b}} = 0$) and no leakage ($\lambda = 1$).
The visualization shows exactly what we would expect. With an input scaling factor of 0.1, the reservoir states are non-zero for exactly one sample: they are zero at all times except for $n=5$, when the impulse is fed into the ESN.
The absolute values of the reservoir states lie between 0 and 0.1.
If we increase the input scaling factor, each reservoir state still has only one non-zero value at $n=5$ as before, just with higher activations of up to 0.8. The $\tanh$ non-linearity damps the reservoir states so that they cannot reach 1.
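As a quick numerical check, here is a minimal sketch that only uses the InputToNode parameters from the widget below. It fits the input layer with an input scaling of 0.1 and verifies that the only non-zero activations occur at $n=5$ and stay below 0.1 in magnitude; the exact values depend on the random input weights.
i2n = InputToNode(hidden_layer_size=50, input_activation='identity',
                  input_scaling=0.1, bias_scaling=0.0)
input_layer_state = i2n.fit_transform(X.reshape(-1, 1))
# Time steps with at least one non-zero activation (expected: only n=5)
print(np.nonzero(np.any(input_layer_state != 0, axis=1))[0])
# Largest absolute activation (expected: at most 0.1)
print(np.max(np.abs(input_layer_state)))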
InputToNode()
params = {'hidden_layer_size': 50,
          'input_activation': 'identity',
          'input_scaling': (0.1, 2.0, 0.1),
          'bias_scaling': 0.0}
build_visualization_widget(plot_func=my_custom_plot, estimator=InputToNode(),
                           params=params, X=X.reshape(-1, 1))
params = {'hidden_layer_size': 50,
          'input_activation': 'identity',
          'input_scaling': 1.0,
          'bias_scaling': (0.0, 0.5, 0.05)}
build_visualization_widget(plot_func=my_custom_plot, estimator=InputToNode(),
                           params=params, X=X.reshape(-1, 1))
Let us keep the input scaling factor fixed to 1.0 for the next experiments. Echo State Networks have two ways to incorporate past information: recurrent connections, controlled by the spectral radius, and leaky integration.
Next, let us analyze the spectral radius. To that end, we set it to 0.3 in the following example.
We can observe that the impulse responses start at $n=5$ and decay to zero after a short time. The reservoir states decay rather fast because the recurrent connections are small compared to the input scaling.
If we increase the spectral radius close to 1.0, the values are still bounded in $[-0.8, 0.8]$, but the reservoir states remain active over a longer time.
If we further increase the spectral radius, we can see that the reservoir states decay very slowly and oscillate at a resonance frequency. For many tasks, it is indeed necessary to preserve the echo state property of the reservoir and keep $\rho < 1$. However, in some cases, such as time-series prediction, the spectral radius can be larger than 1.
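To quantify the decay, the following sketch (using only the pipeline parameters from the widget below) measures how long the impulse response stays above a small threshold for different spectral radii; the exact numbers depend on the random reservoir weights.
for rho in (0.3, 0.9, 1.2):
    esn = Pipeline([("i2n", InputToNode(hidden_layer_size=50,
                                        input_activation='identity',
                                        input_scaling=1.0, bias_scaling=0.0)),
                    ("n2n", NodeToNode(hidden_layer_size=50, spectral_radius=rho,
                                       leakage=1.0, k_rec=10,
                                       reservoir_activation='tanh',
                                       bidirectional=False))])
    hidden_layer_state = esn.fit_transform(X.reshape(-1, 1))
    # Last time step at which any reservoir state exceeds a small threshold
    active = np.nonzero(np.max(np.abs(hidden_layer_state), axis=1) > 1e-3)[0]
    print(f"spectral_radius={rho}: response active until n={active[-1]}")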
Pipeline([("i2n", InputToNode()), ("n2n", NodeToNode())])
params = {'i2n__hidden_layer_size': 50,
          'n2n__hidden_layer_size': 50,
          'i2n__input_activation': 'identity',
          'i2n__input_scaling': 1.0,
          'i2n__bias_scaling': 0.0,
          'n2n__spectral_radius': (0.0, 2.0, 0.1),
          'n2n__leakage': 1.0,
          'n2n__k_rec': 10,
          'n2n__reservoir_activation': 'tanh',
          'n2n__bidirectional': False}
build_visualization_widget(plot_func=my_custom_plot,
                           estimator=Pipeline([("i2n", InputToNode()),
                                               ("n2n", NodeToNode())]),
                           params=params, X=X.reshape(-1, 1))
For some use cases, the reservoir should behave in a more non-linear way. To achieve this, we can play around with the bias. Here, we show the impact of the bias scaling factor for just one example and increase it to 0.2.
The spectral radius is decreased to 0.9 in order to fulfil the Echo State Property.
Two main effects of the bias scaling can be observed: (1) The absolute values of the stable states of the reservoir neurons are approximately distributed between 0 and 0.2, and each neuron has its own stable state that serves as its operating point when new information from the input arrives. (2) Before the impulse arrives in the reservoir ($n=5$), the states approach their stable states. Due to the spectral radius, each reservoir neuron is connected to other neurons and thus feeds the constant bias through the network until each neuron has reached its final state.
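As a cross-check of observation (1), this minimal sketch uses the same pipeline parameters as the widget below with a fixed bias scaling of 0.2 and prints the range of the absolute reservoir states just before the impulse arrives; the exact values depend on the random weights.
esn = Pipeline([("i2n", InputToNode(hidden_layer_size=50,
                                    input_activation='identity',
                                    input_scaling=1.0, bias_scaling=0.2)),
                ("n2n", NodeToNode(hidden_layer_size=50, spectral_radius=0.9,
                                   leakage=1.0, k_rec=10,
                                   reservoir_activation='tanh',
                                   bidirectional=False))])
hidden_layer_state = esn.fit_transform(X.reshape(-1, 1))
# Absolute states just before the impulse (n=4): expected roughly between 0 and 0.2
print(np.abs(hidden_layer_state[4]).min(), np.abs(hidden_layer_state[4]).max())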
params = {'i2n__hidden_layer_size': 50,
          'n2n__hidden_layer_size': 50,
          'i2n__input_activation': 'identity',
          'i2n__input_scaling': 1.0,
          'i2n__bias_scaling': (0.0, 1.0, 0.1),
          'n2n__spectral_radius': 0.9,
          'n2n__leakage': 1.0,
          'n2n__k_rec': 10,
          'n2n__reservoir_activation': 'tanh',
          'n2n__bidirectional': False}
build_visualization_widget(plot_func=my_custom_plot,
                           estimator=Pipeline([("i2n", InputToNode()),
                                               ("n2n", NodeToNode())]),
                           params=params, X=X.reshape(-1, 1))
Finally, let us observe the impact of the leakage. Leaky integration is the other way to incorporate past information into the reservoir. It works by retaining the previous reservoir states over a longer time.
The leakage behaves in the same way for all nodes in the reservoir and acts like a low-pass filter: the magnitude is strongly damped, and all reservoir states decay exponentially over a longer time. Due to the spectral radius, each neuron still has its individual decay time.
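A common formulation of leaky integration (the exact update in PyRCN may differ in detail) is $x[n] = (1 - \lambda)\,x[n-1] + \lambda \tanh(\cdot)$, so $\lambda = 1$ disables the leaky memory entirely. The following sketch, with the same pipeline parameters as above, compares how long the impulse response stays active with and without leakage; the exact numbers depend on the random weights.
for leakage in (1.0, 0.3):
    esn = Pipeline([("i2n", InputToNode(hidden_layer_size=50,
                                        input_activation='identity',
                                        input_scaling=1.0, bias_scaling=0.0)),
                    ("n2n", NodeToNode(hidden_layer_size=50, spectral_radius=0.9,
                                       leakage=leakage, k_rec=10,
                                       reservoir_activation='tanh',
                                       bidirectional=False))])
    hidden_layer_state = esn.fit_transform(X.reshape(-1, 1))
    # Last time step at which any reservoir state exceeds a small threshold
    active = np.nonzero(np.max(np.abs(hidden_layer_state), axis=1) > 1e-3)[0]
    print(f"leakage={leakage}: response active until n={active[-1]}")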
If we would like to incorporate future information into the reservoir, we can pass our input samples forward and backward through the reservoir. To do so, we set bidirectional to True.
We can clearly see the effect of using bidirectional ESNs:
Because of the additional backward pass, the number of reservoir states is doubled, and we can see that they decay in both the forward and the backward direction. This is especially useful for some classification tasks.
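As a quick sanity check (a minimal sketch with the parameters from the widget below), we would expect the state matrix to have twice as many columns when bidirectional is enabled, since forward and backward states are concatenated.
for bidirectional in (False, True):
    esn = Pipeline([("i2n", InputToNode(hidden_layer_size=50,
                                        input_activation='identity',
                                        input_scaling=1.0, bias_scaling=0.0)),
                    ("n2n", NodeToNode(hidden_layer_size=50, spectral_radius=0.9,
                                       leakage=1.0, k_rec=10,
                                       reservoir_activation='tanh',
                                       bidirectional=bidirectional))])
    hidden_layer_state = esn.fit_transform(X.reshape(-1, 1))
    # Expected: (100, 50) without and (100, 100) with the backward pass
    print(f"bidirectional={bidirectional}: {hidden_layer_state.shape}")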
params = {'i2n__hidden_layer_size': 50,
          'n2n__hidden_layer_size': 50,
          'i2n__input_activation': 'identity',
          'i2n__input_scaling': 1.0,
          'i2n__bias_scaling': 0.0,
          'n2n__spectral_radius': 0.9,
          'n2n__leakage': 1.0,
          'n2n__k_rec': 10,
          'n2n__reservoir_activation': 'tanh',
          'n2n__bidirectional': [False, True]}
build_visualization_widget(plot_func=my_custom_plot,
                           estimator=Pipeline([("i2n", InputToNode()),
                                               ("n2n", NodeToNode())]),
                           params=params, X=X.reshape(-1, 1))
Try to play around with all these hyper-parameters!
params = {'i2n__hidden_layer_size': 50,
          'n2n__hidden_layer_size': 50,
          'i2n__input_activation': 'identity',
          'i2n__input_scaling': (0.1, 1.0, 0.1),
          'i2n__bias_scaling': (0.0, 0.5, 0.1),
          'n2n__spectral_radius': (0.0, 1.5, 0.1),
          'n2n__leakage': (0.1, 1.0, 0.1),
          'n2n__k_rec': 10,
          'n2n__reservoir_activation': ['tanh', 'relu', 'identity'],
          'n2n__bidirectional': [False, True]}
build_visualization_widget(plot_func=my_custom_plot,
                           estimator=Pipeline([("i2n", InputToNode()),
                                               ("n2n", NodeToNode())]),
                           params=params, X=X.reshape(-1, 1))