In the first step, we set up the environment and load the CSV file that we want to process. To do so, we import the `orion.data.load_signal` function and call it with the path to the CSV file. In this case, we load the `S-1.csv` file from inside the `data/multivariate` folder.
from orion.data import load_signal
signal_path = 'multivariate/S-1'
data = load_signal(signal_path)
data.head()
|   | timestamp | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | ... | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1222819200 | -0.366359 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1222840800 | -0.394108 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1222862400 | 0.403625 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1222884000 | -0.362759 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 1222905600 | -0.370746 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 26 columns
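The `timestamp` column holds Unix timestamps in seconds, and the remaining 25 columns are the signal channels. As a quick sanity check (not part of the original tutorial, and relying only on pandas), we can convert the timestamps to datetimes and confirm that the rows shown above are evenly spaced, 21600 seconds (6 hours) apart:

```python
import pandas as pd

# Inspect the time index of the loaded signal.
print(pd.to_datetime(data['timestamp'], unit='s').head())

# Confirm the sampling interval; the rows above are 21600 seconds (6 hours) apart.
print(data['timestamp'].diff().dropna().unique())
```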
Once we have the data, let us use the LSTM pipeline to analyze it and search for anomalies. In order to do so, we import the `Orion` class from `orion` and pass it the name of the pipeline JSON that we want to use, together with its hyperparameters; the loaded data is then passed to its `fit` method. In this case, we will be using the `lstm_dynamic_threshold` pipeline from inside the `orion` folder.

In addition, we set up the hyperparameters to correctly identify the signal we are trying to predict. In this case, dimension `0` is the signal value, so we set `target_column` to `0`. Note that `0` refers to the position of the channel rather than its name.
from orion import Orion

hyperparameters = {
    "mlstars.custom.timeseries_preprocessing.rolling_window_sequences#1": {
        'target_column': 0
    },
    'keras.Sequential.LSTMTimeSeriesRegressor#1': {
        'epochs': 5,
        'verbose': True
    }
}

orion = Orion(
    pipeline='lstm_dynamic_threshold',
    hyperparameters=hyperparameters
)

orion.fit(data)
Using TensorFlow backend.
WARNING:tensorflow:From /Users/sarah/opt/anaconda3/envs/orion/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /Users/sarah/opt/anaconda3/envs/orion/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
Train on 7919 samples, validate on 1980 samples
Epoch 1/5
7919/7919 [==============================] - 39s 5ms/step - loss: 0.2056 - mse: 0.2056 - val_loss: 0.2614 - val_mse: 0.2614
Epoch 2/5
7919/7919 [==============================] - 35s 4ms/step - loss: 0.1993 - mse: 0.1993 - val_loss: 0.2581 - val_mse: 0.2581
Epoch 3/5
7919/7919 [==============================] - 37s 5ms/step - loss: 0.1973 - mse: 0.1973 - val_loss: 0.2637 - val_mse: 0.2637
Epoch 4/5
7919/7919 [==============================] - 36s 5ms/step - loss: 0.1934 - mse: 0.1934 - val_loss: 0.2594 - val_mse: 0.2594
Epoch 5/5
7919/7919 [==============================] - 36s 5ms/step - loss: 0.1933 - mse: 0.1933 - val_loss: 0.2808 - val_mse: 0.2808
9899/9899 [==============================] - 13s 1ms/step
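Before moving on to detection, note that the fitted pipeline can be persisted and reused without retraining. This is a minimal sketch, assuming the `save` and `load` helpers exposed by recent Orion versions and a writable working directory; the file name is illustrative:

```python
# Persist the fitted pipeline to disk.
orion.save('lstm_dynamic_threshold.pkl')

# Later, in a new session, restore it without retraining.
from orion import Orion

orion = Orion.load('lstm_dynamic_threshold.pkl')
```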
The output will be a `pandas.DataFrame` containing a table with the detected anomalies.
orion.detect(data)
9899/9899 [==============================] - 12s 1ms/step
|   | start | end | severity |
|---|---|---|---|
| 0 | 1228219200 | 1229472000 | 0.623775 |
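The `start` and `end` columns are Unix timestamps, just like the input. As a small convenience (not part of the original tutorial, and relying only on pandas), we can convert them to readable datetimes:

```python
import pandas as pd

# Convert the detected interval boundaries from Unix seconds to datetimes.
anomalies = orion.detect(data)
anomalies['start'] = pd.to_datetime(anomalies['start'], unit='s')
anomalies['end'] = pd.to_datetime(anomalies['end'], unit='s')
anomalies
```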
For reconstruction-based pipelines, we need to specify the shape of the input and target sequences. For example, assuming we are using the `lstm_autoencoder` pipeline, we set the hyperparameter values as follows:
hyperparameters = {
    "mlstars.custom.timeseries_preprocessing.rolling_window_sequences#1": {
        'window_size': 100,
        'target_column': 0
    },
    'keras.Sequential.LSTMSeq2Seq#1': {
        'epochs': 5,
        'verbose': True,
        'window_size': 100,
        'input_shape': [100, 25],
        'target_shape': [100, 1],
    }
}
where the shape of the input depends on the `window_size` and the number of channels in the signal (here, 25). Similarly, the shape of the output depends on the `window_size`. Currently, we are focusing on multivariate input and univariate output, therefore the target shape should always be [`window_size`, 1]. For example, with a `window_size` of 150, the hyperparameters become:
hyperparameters = {
    "mlstars.custom.timeseries_preprocessing.rolling_window_sequences#1": {
        'window_size': 150,
        'target_column': 0
    },
    'keras.Sequential.LSTMSeq2Seq#1': {
        'epochs': 5,
        'verbose': True,
        'window_size': 150,
        'input_shape': [150, 25],
        'target_shape': [150, 1],
    }
}

orion = Orion(
    pipeline='lstm_autoencoder',
    hyperparameters=hyperparameters
)

orion.fit(data)
Train on 7999 samples, validate on 2000 samples
Epoch 1/5
7999/7999 [==============================] - 17s 2ms/step - loss: 0.2017 - mse: 0.2017 - val_loss: 0.2593 - val_mse: 0.2593
Epoch 2/5
7999/7999 [==============================] - 16s 2ms/step - loss: 0.1985 - mse: 0.1985 - val_loss: 0.2604 - val_mse: 0.2604
Epoch 3/5
7999/7999 [==============================] - 17s 2ms/step - loss: 0.1985 - mse: 0.1985 - val_loss: 0.2579 - val_mse: 0.2579
Epoch 4/5
7999/7999 [==============================] - 16s 2ms/step - loss: 0.1985 - mse: 0.1985 - val_loss: 0.2577 - val_mse: 0.2577
Epoch 5/5
7999/7999 [==============================] - 17s 2ms/step - loss: 0.1984 - mse: 0.1984 - val_loss: 0.2558 - val_mse: 0.2558
9999/9999 [==============================] - 7s 677us/step
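Detection itself works exactly as before: we call `detect` on the fitted instance. If labeled anomalies for the signal are available, they can also be compared against the output. This is a minimal sketch; `load_anomalies` and the `evaluate` method are assumed to be available in the installed Orion version:

```python
# Detect anomalies with the fitted lstm_autoencoder pipeline.
anomalies = orion.detect(data)

# Optional: compare against the known anomalies for this signal, if available.
# `load_anomalies` and `orion.evaluate` are assumed here.
from orion.data import load_anomalies

ground_truth = load_anomalies('S-1')
scores = orion.evaluate(data, ground_truth)
```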
TadGAN is also a reconstruction-based pipeline, thus we specify the `input_shape` to match the multivariate input as needed.
hyperparameters = {
    'orion.primitives.tadgan.TadGAN#1': {
        'epochs': 5,
        'verbose': True,
        'input_shape': [100, 25]
    }
}
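For completeness, the TadGAN pipeline is then run the same way as the previous ones. A minimal sketch, assuming `tadgan` is the registered name of the pipeline in the installed Orion version:

```python
# Run the TadGAN pipeline end to end with the hyperparameters above.
orion = Orion(
    pipeline='tadgan',
    hyperparameters=hyperparameters
)

orion.fit(data)
anomalies = orion.detect(data)
```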