In the first step, we set up the environment and load the CSV file that we want to process. To do so, we import the `orion.data.load_signal` function and call it with the path to the CSV file. In this case, we load the `S-1.csv` file from inside the `data/multivariate` folder.
from orion.data import load_signal
signal_path = 'multivariate/S-1'
data = load_signal(signal_path)
data.head()
|   | timestamp | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | ... | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1222819200 | -0.366359 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1222840800 | -0.394108 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1222862400 | 0.403625 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 1222884000 | -0.362759 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 1222905600 | -0.370746 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 26 columns
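The `timestamp` column holds Unix timestamps in seconds, and the remaining 25 columns are the signal channels. As a quick sanity check (not part of the original tutorial, and relying only on pandas), we can convert the timestamps to datetimes and confirm that the rows shown above are evenly spaced, 21600 seconds (6 hours) apart:

```python
import pandas as pd

# Inspect the time index of the loaded signal.
print(pd.to_datetime(data['timestamp'], unit='s').head())

# Confirm the sampling interval; the rows above are 21600 seconds (6 hours) apart.
print(data['timestamp'].diff().dropna().unique())
```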
Once we have the data, let us use the LSTM pipeline to analyze it and search for anomalies. In order to do so, we import the `Orion` class from `orion` and pass it the name of the pipeline JSON that we want to use, together with its hyperparameters; the loaded data is then passed to its `fit` method. In this case, we will be using the `lstm_dynamic_threshold` pipeline from inside the `orion` folder.

In addition, we set up the hyperparameters to correctly identify the signal we are trying to predict. In this case, dimension `0` is the signal value, so we set `target_column` to `0`. Note that `0` refers to the position of the channel rather than its name.
from orion import Orion

hyperparameters = {
    "mlstars.custom.timeseries_preprocessing.rolling_window_sequences#1": {
        'target_column': 0
    },
    'keras.Sequential.LSTMTimeSeriesRegressor#1': {
        'epochs': 5,
        'verbose': True
    }
}

orion = Orion(
    pipeline='lstm_dynamic_threshold',
    hyperparameters=hyperparameters
)

orion.fit(data)
Using TensorFlow backend.
WARNING:tensorflow:From /Users/sarah/opt/anaconda3/envs/orion/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /Users/sarah/opt/anaconda3/envs/orion/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
Train on 7919 samples, validate on 1980 samples
Epoch 1/5
7919/7919 [==============================] - 39s 5ms/step - loss: 0.2056 - mse: 0.2056 - val_loss: 0.2614 - val_mse: 0.2614
Epoch 2/5
7919/7919 [==============================] - 35s 4ms/step - loss: 0.1993 - mse: 0.1993 - val_loss: 0.2581 - val_mse: 0.2581
Epoch 3/5
7919/7919 [==============================] - 37s 5ms/step - loss: 0.1973 - mse: 0.1973 - val_loss: 0.2637 - val_mse: 0.2637
Epoch 4/5
7919/7919 [==============================] - 36s 5ms/step - loss: 0.1934 - mse: 0.1934 - val_loss: 0.2594 - val_mse: 0.2594
Epoch 5/5
7919/7919 [==============================] - 36s 5ms/step - loss: 0.1933 - mse: 0.1933 - val_loss: 0.2808 - val_mse: 0.2808
9899/9899 [==============================] - 13s 1ms/step
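Before moving on to detection, note that the fitted pipeline can be persisted and reused without retraining. This is a minimal sketch, assuming the `save` and `load` helpers exposed by recent Orion versions and a writable working directory; the file name is illustrative:

```python
# Persist the fitted pipeline to disk.
orion.save('lstm_dynamic_threshold.pkl')

# Later, in a new session, restore it without retraining.
from orion import Orion

orion = Orion.load('lstm_dynamic_threshold.pkl')
```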
The output will be a `pandas.DataFrame` containing a table with the detected anomalies.
orion.detect(data)
9899/9899 [==============================] - 12s 1ms/step
|   | start | end | severity |
|---|---|---|---|
| 0 | 1228219200 | 1229472000 | 0.623775 |
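The `start` and `end` columns are Unix timestamps, just like the input. As a small convenience (not part of the original tutorial, and relying only on pandas), we can convert them to readable datetimes:

```python
import pandas as pd

# Convert the detected interval boundaries from Unix seconds to datetimes.
anomalies = orion.detect(data)
anomalies['start'] = pd.to_datetime(anomalies['start'], unit='s')
anomalies['end'] = pd.to_datetime(anomalies['end'], unit='s')
anomalies
```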
For reconstruction-based pipelines, we need to specify the shape of the input and target sequences. For example, assuming we are using the `lstm_autoencoder` pipeline, we set the hyperparameter values as follows:
hyperparameters = {
    "mlstars.custom.timeseries_preprocessing.rolling_window_sequences#1": {
        'window_size': 100,
        'target_column': 0
    },
    'keras.Sequential.LSTMSeq2Seq#1': {
        'epochs': 5,
        'verbose': True,
        'window_size': 100,
        'input_shape': [100, 25],
        'target_shape': [100, 1],
    }
}
where the shape of the input depends on the `window_size` and the number of channels in the signal (here, 25). Similarly, the shape of the output depends on the `window_size`. Currently, we are focusing on multivariate input and univariate output, therefore the target shape should always be [`window_size`, 1]. For example, with a `window_size` of 150, the hyperparameters become:
hyperparameters = {
    "mlstars.custom.timeseries_preprocessing.rolling_window_sequences#1": {
        'window_size': 150,
        'target_column': 0
    },
    'keras.Sequential.LSTMSeq2Seq#1': {
        'epochs': 5,
        'verbose': True,
        'window_size': 150,
        'input_shape': [150, 25],
        'target_shape': [150, 1],
    }
}

orion = Orion(
    pipeline='lstm_autoencoder',
    hyperparameters=hyperparameters
)

orion.fit(data)
Train on 7999 samples, validate on 2000 samples
Epoch 1/5
7999/7999 [==============================] - 17s 2ms/step - loss: 0.2017 - mse: 0.2017 - val_loss: 0.2593 - val_mse: 0.2593
Epoch 2/5
7999/7999 [==============================] - 16s 2ms/step - loss: 0.1985 - mse: 0.1985 - val_loss: 0.2604 - val_mse: 0.2604
Epoch 3/5
7999/7999 [==============================] - 17s 2ms/step - loss: 0.1985 - mse: 0.1985 - val_loss: 0.2579 - val_mse: 0.2579
Epoch 4/5
7999/7999 [==============================] - 16s 2ms/step - loss: 0.1985 - mse: 0.1985 - val_loss: 0.2577 - val_mse: 0.2577
Epoch 5/5
7999/7999 [==============================] - 17s 2ms/step - loss: 0.1984 - mse: 0.1984 - val_loss: 0.2558 - val_mse: 0.2558
9999/9999 [==============================] - 7s 677us/step
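Detection itself works exactly as before: we call `detect` on the fitted instance. If labeled anomalies for the signal are available, they can also be compared against the output. This is a minimal sketch; `load_anomalies` and the `evaluate` method are assumed to be available in the installed Orion version:

```python
# Detect anomalies with the fitted lstm_autoencoder pipeline.
anomalies = orion.detect(data)

# Optional: compare against the known anomalies for this signal, if available.
# `load_anomalies` and `orion.evaluate` are assumed here.
from orion.data import load_anomalies

ground_truth = load_anomalies('S-1')
scores = orion.evaluate(data, ground_truth)
```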
TadGAN is also a reconstruction-based pipeline, thus we specify the `input_shape` to match the multivariate input as needed.
hyperparameters = {
    'orion.primitives.tadgan.TadGAN#1': {
        'epochs': 5,
        'verbose': True,
        'input_shape': [100, 25]
    }
}
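For completeness, the TadGAN pipeline is then run the same way as the previous ones. A minimal sketch, assuming `tadgan` is the registered name of the pipeline in the installed Orion version:

```python
# Run the TadGAN pipeline end to end with the hyperparameters above.
orion = Orion(
    pipeline='tadgan',
    hyperparameters=hyperparameters
)

orion.fit(data)
anomalies = orion.detect(data)
```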