import pandas as pd
from orion.data import load_signal
signal_name = 'S-1'
data = load_signal(signal_name)
data.head()
| | timestamp | value |
|---|---|---|
| 0 | 1222819200 | -0.366359 |
| 1 | 1222840800 | -0.394108 |
| 2 | 1222862400 | 0.403625 |
| 3 | 1222884000 | -0.362759 |
| 4 | 1222905600 | -0.370746 |
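The `timestamp` column holds Unix epoch seconds spaced 21600 seconds (6 hours) apart. As a quick sanity check, a small hypothetical slice of the signal (values copied from the output above) can be converted to readable datetimes with pandas:

```python
import pandas as pd

# Hypothetical miniature of the 'S-1' signal: Unix timestamps (seconds)
# spaced 21600 s (6 hours) apart, as in the head() output above.
data = pd.DataFrame({
    'timestamp': [1222819200, 1222840800, 1222862400],
    'value': [-0.366359, -0.394108, 0.403625],
})

# Convert the epoch seconds to readable datetimes.
data['datetime'] = pd.to_datetime(data['timestamp'], unit='s')
print(data['datetime'].iloc[0])  # 2008-10-01 00:00:00
```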
from mlblocks import MLPipeline
pipeline_name = 'lstm_dynamic_threshold'
pipeline = MLPipeline(pipeline_name)
hyperparameters = {
'keras.Sequential.LSTMTimeSeriesRegressor#1': {
'epochs': 5,
'verbose': True
}
}
pipeline.set_hyperparameters(hyperparameters)
MLPipelines are composed of a sequence of primitives. These primitives apply transformation and calculation operations to the data and update the variables within the pipeline. To view the primitives used by the pipeline, we access its primitives attribute.

The lstm_dynamic_threshold pipeline contains 7 primitives. We will observe how the context (the variables held within the pipeline) is updated after the execution of each primitive.
pipeline.primitives
['mlstars.custom.timeseries_preprocessing.time_segments_aggregate', 'sklearn.impute.SimpleImputer', 'sklearn.preprocessing.MinMaxScaler', 'mlstars.custom.timeseries_preprocessing.rolling_window_sequences', 'keras.Sequential.LSTMTimeSeriesRegressor', 'orion.primitives.timeseries_errors.regression_errors', 'orion.primitives.timeseries_anomalies.find_anomalies']
This primitive creates an equi-spaced time series by aggregating values over a fixed, specified interval.

- input: X, an n-dimensional sequence of values.
- output: X, a sequence of aggregated values, one column for each aggregation method; index, a sequence of index values (the first index of each aggregated segment).

context = pipeline.fit(data, output_=0)
context.keys()
dict_keys(['X', 'index'])
for i, x in list(zip(context['index'], context['X']))[:5]:
print("entry at {} has value {}".format(i, x))
entry at 1222819200 has value [-0.36635895]
entry at 1222840800 has value [-0.39410778]
entry at 1222862400 has value [0.4036246]
entry at 1222884000 has value [-0.36275906]
entry at 1222905600 has value [-0.37074649]
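As a rough illustration of what this aggregation step does, here is a minimal pandas sketch, assuming mean aggregation over a fixed interval. The function and data are illustrative, not Orion's actual implementation:

```python
import numpy as np
import pandas as pd

def time_segments_aggregate(X, interval, time_column='timestamp', method='mean'):
    """Sketch of equi-spaced aggregation: bucket rows into fixed-width
    time segments and aggregate each bucket (here with the mean)."""
    X = X.sort_values(time_column)
    start = X[time_column].iloc[0]
    # Assign each row to a segment of width `interval`.
    segment = ((X[time_column] - start) // interval) * interval + start
    aggregated = X.groupby(segment)['value'].agg(method)
    return aggregated.values.reshape(-1, 1), aggregated.index.values

# Hypothetical irregular signal: two readings fall in the first segment.
df = pd.DataFrame({'timestamp': [0, 100, 21600, 43200],
                   'value': [1.0, 3.0, 5.0, 7.0]})
X, index = time_segments_aggregate(df, interval=21600)
print(X.ravel())  # [2. 5. 7.]  (first segment averages 1.0 and 3.0)
print(index)
```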
This primitive is an imputation transformer for filling missing values.

- input: X, an n-dimensional sequence of values.
- output: X, a transformed version of X.

step = 1
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X'])
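By default, SimpleImputer replaces missing entries with the column mean. A minimal numpy sketch of that behavior (illustrative only, not sklearn's implementation):

```python
import numpy as np

def impute_mean(X):
    """Sketch of SimpleImputer's default strategy: replace NaNs in each
    column with that column's mean."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = col_means[nan_cols]
    return X

X = np.array([[1.0], [np.nan], [3.0]])
print(impute_mean(X).ravel())  # [1. 2. 3.]
```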
This primitive transforms features by scaling each feature to a given range.

- input: X, the data used to compute the per-feature minimum and maximum used for later scaling along the features axis.
- output: X, a transformed version of X.

step = 2
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X'])
# after scaling the data between [-1, 1]
# in this example, no change is observed
# since the data was already scaled beforehand
for i, x in list(zip(context['index'], context['X']))[:5]:
print("entry at {} has value {}".format(i, x))
entry at 1222819200 has value [-0.36635895]
entry at 1222840800 has value [-0.39410778]
entry at 1222862400 has value [0.4036246]
entry at 1222884000 has value [-0.36275906]
entry at 1222905600 has value [-0.37074649]
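The scaling itself is a simple linear map. A numpy sketch of the MinMaxScaler formula, scaling to [-1, 1] as the pipeline does (illustrative, not sklearn's code):

```python
import numpy as np

def minmax_scale(X, feature_range=(-1, 1)):
    """Sketch of MinMaxScaler: map each feature linearly so that its
    minimum and maximum land on the ends of `feature_range`."""
    lo, hi = feature_range
    X_min, X_max = X.min(axis=0), X.max(axis=0)
    X_std = (X - X_min) / (X_max - X_min)
    return X_std * (hi - lo) + lo

X = np.array([[0.0], [5.0], [10.0]])
print(minmax_scale(X).ravel())  # [-1.  0.  1.]
```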
This primitive generates many sub-sequences of the original sequence, using a rolling window approach to create the sub-sequences out of time series data.

- input: X, an n-dimensional sequence to iterate over; index, an array containing the index values of X.
- output: X, the input sequences; y, the target sequences; index, the first index value of each input sequence; target_index, the first index value of each target sequence.

step = 3
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X', 'y', 'target_index'])
# after slicing X into multiple sub-sequences
# we obtain a 3 dimensional matrix X where
# the shape indicates (# slices, window size, 1)
# and similarly y is (# slices, target size)
print("X shape = {}\ny shape = {}\nindex shape = {}\ntarget index shape = {}".format(
context['X'].shape, context['y'].shape, context['index'].shape, context['target_index'].shape))
X shape = (9899, 250, 1) y shape = (9899, 1) index shape = (9899,) target index shape = (9899,)
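The shapes above follow from the window size (250) and target size (1). A minimal numpy sketch of the rolling-window slicing on a tiny hypothetical signal (illustrative, not Orion's implementation):

```python
import numpy as np

def rolling_window_sequences(X, index, window_size, target_size=1, step_size=1):
    """Sketch of rolling-window slicing: each input sequence is
    `window_size` long and its target is the next `target_size` values."""
    out_X, out_y, out_index, out_target_index = [], [], [], []
    for start in range(0, len(X) - window_size - target_size + 1, step_size):
        end = start + window_size
        out_X.append(X[start:end])
        out_y.append(X[end:end + target_size, 0])
        out_index.append(index[start])
        out_target_index.append(index[end])
    return (np.asarray(out_X), np.asarray(out_y),
            np.asarray(out_index), np.asarray(out_target_index))

# Hypothetical scaled signal of 6 points, sliced with a window of 3.
X = np.arange(6, dtype=float).reshape(-1, 1)
index = np.arange(6) * 21600
Xs, ys, idx, tidx = rolling_window_sequences(X, index, window_size=3)
print(Xs.shape, ys.shape)  # (3, 3, 1) (3, 1)
```

With 6 points, a window of 3, and a target of 1, we get 6 - 3 - 1 + 1 = 3 slices; the same arithmetic yields the 9899 slices seen above.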
This is a prediction model with double stacked LSTM layers, used as a time series regressor. You can read more about it in the related paper.

- input: X, an n-dimensional array containing the input sequences for the model; y, an n-dimensional array containing the target sequences for the model.
- output: y_hat, the predicted values.

step = 4
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
Train on 7919 samples, validate on 1980 samples
Epoch 1/5
7919/7919 [==============================] - 29s 4ms/step - loss: 0.1960 - mse: 0.1960 - val_loss: 0.2748 - val_mse: 0.2748
Epoch 2/5
7919/7919 [==============================] - 31s 4ms/step - loss: 0.1928 - mse: 0.1928 - val_loss: 0.3016 - val_mse: 0.3016
Epoch 3/5
7919/7919 [==============================] - 32s 4ms/step - loss: 0.1898 - mse: 0.1898 - val_loss: 0.3237 - val_mse: 0.3237
Epoch 4/5
7919/7919 [==============================] - 29s 4ms/step - loss: 0.1886 - mse: 0.1886 - val_loss: 0.2640 - val_mse: 0.2640
Epoch 5/5
7919/7919 [==============================] - 29s 4ms/step - loss: 0.1863 - mse: 0.1863 - val_loss: 0.4433 - val_mse: 0.4433
9899/9899 [==============================] - 11s 1ms/step
dict_keys(['index', 'target_index', 'X', 'y', 'y_hat'])
for i, y, y_hat in list(zip(context['target_index'], context['y'], context['y_hat']))[:5]:
print("entry at {} has value {}, predicted value {}".format(i, y, y_hat))
entry at 1228219200 has value [-0.3741225], predicted value [0.15370035]
entry at 1228240800 has value [1.], predicted value [0.18855423]
entry at 1228262400 has value [-0.35400432], predicted value [0.13846633]
entry at 1228284000 has value [1.], predicted value [0.11550236]
entry at 1228305600 has value [-0.38154089], predicted value [0.00080755]
This primitive computes an array of absolute errors comparing the predictions to the expected output. Optionally, the errors are smoothed using an exponentially weighted moving average (EWMA).

- input: y, the ground truth; y_hat, the predicted values.
- output: errors, an array of errors.

step = 5
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'target_index', 'y_hat', 'X', 'y', 'errors'])
for i, e in list(zip(context['target_index'], context['errors']))[:5]:
print("entry at {} has error value {:.3f}".format(i, e))
entry at 1228219200 has error value 0.528
entry at 1228240800 has error value 0.671
entry at 1228262400 has error value 0.610
entry at 1228284000 has error value 0.681
entry at 1228305600 has error value 0.619
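A minimal sketch of this error computation, assuming point-wise absolute error with optional EWMA smoothing via pandas (the values below are copied from the outputs above; this is not Orion's exact implementation):

```python
import numpy as np
import pandas as pd

def regression_errors(y, y_hat, smoothing_window=3, smooth=True):
    """Sketch of the error step: point-wise absolute error, optionally
    smoothed with an exponentially weighted moving average."""
    errors = np.abs(np.asarray(y).ravel() - np.asarray(y_hat).ravel())
    if smooth:
        errors = pd.Series(errors).ewm(span=smoothing_window).mean().values
    return errors

y = np.array([-0.374, 1.0, -0.354])
y_hat = np.array([0.154, 0.189, 0.138])
print(regression_errors(y, y_hat, smooth=False))  # [0.528 0.811 0.492]
```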
This primitive extracts anomalies from sequences of errors following the approach explained in the related paper.

- input: errors, an array of errors; target_index, an array of the indices of the errors.
- output: y, an array containing the start index, end index, and score of each anomalous sequence that was found.

step = 6
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'target_index', 'y_hat', 'errors', 'X', 'y'])
pd.DataFrame(context['y'], columns=['start', 'end', 'severity'])
| | start | end | severity |
|---|---|---|---|
| 0 | 1.228219e+09 | 1.229472e+09 | 0.614295 |
| 1 | 1.400134e+09 | 1.404086e+09 | 0.282328 |
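To build intuition for this final step, here is a deliberately simplified sketch that flags errors above a static threshold (mean + k standard deviations) and merges consecutive flagged points into (start, end, severity) triples. Note that Orion's actual primitive uses the dynamic thresholding method of the related paper, not this static rule, and the data here is hypothetical:

```python
import numpy as np

def find_anomalies_static(errors, index, k=3.0):
    """Simplified sketch: flag errors above mean + k*std and merge runs
    of consecutive flagged points into (start, end, severity) triples,
    where severity is the mean error of the run."""
    threshold = errors.mean() + k * errors.std()
    above = np.where(errors > threshold)[0]
    sequences = []
    if len(above):
        start = prev = above[0]
        for i in list(above[1:]) + [None]:
            if i is None or i != prev + 1:
                run = errors[start:prev + 1]
                sequences.append((int(index[start]), int(index[prev]),
                                  float(run.mean())))
                if i is not None:
                    start = i
            if i is not None:
                prev = i
    return sequences

# Hypothetical error sequence with one clear spike of two points.
errors = np.array([0.1, 0.1, 0.1, 5.0, 5.5, 0.1, 0.1, 0.1, 0.1, 0.1])
index = np.arange(10) * 21600
print(find_anomalies_static(errors, index, k=1.0))  # [(64800, 86400, 5.25)]
```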