import pandas as pd
from orion.data import load_signal
signal_name = 'S-1'
data = load_signal(signal_name)
data.head()
| | timestamp | value |
|---|---|---|
| 0 | 1222819200 | -0.366359 |
| 1 | 1222840800 | -0.394108 |
| 2 | 1222862400 | 0.403625 |
| 3 | 1222884000 | -0.362759 |
| 4 | 1222905600 | -0.370746 |
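The `timestamp` column holds Unix epoch seconds spaced 21600 seconds (6 hours) apart. As a quick sanity check, a small hypothetical slice of the signal (values copied from the output above) can be converted to readable datetimes with pandas:

```python
import pandas as pd

# Hypothetical miniature of the 'S-1' signal: Unix timestamps (seconds)
# spaced 21600 s (6 hours) apart, as in the head() output above.
data = pd.DataFrame({
    'timestamp': [1222819200, 1222840800, 1222862400],
    'value': [-0.366359, -0.394108, 0.403625],
})

# Convert the epoch seconds to readable datetimes.
data['datetime'] = pd.to_datetime(data['timestamp'], unit='s')
print(data['datetime'].iloc[0])  # 2008-10-01 00:00:00
```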
from mlblocks import MLPipeline
pipeline_name = 'lstm_dynamic_threshold'
pipeline = MLPipeline(pipeline_name)
hyperparameters = {
'keras.Sequential.LSTMTimeSeriesRegressor#1': {
'epochs': 5,
'verbose': True
}
}
pipeline.set_hyperparameters(hyperparameters)
MLPipelines are composed of a sequence of primitives. These primitives apply transformation and calculation operations to the data and update the variables within the pipeline. To view the primitives used by the pipeline, we access its primitives attribute.

The lstm_dynamic_threshold pipeline contains 7 primitives. We will observe how the context (the variables held within the pipeline) is updated after the execution of each primitive.
pipeline.primitives
['mlstars.custom.timeseries_preprocessing.time_segments_aggregate', 'sklearn.impute.SimpleImputer', 'sklearn.preprocessing.MinMaxScaler', 'mlstars.custom.timeseries_preprocessing.rolling_window_sequences', 'keras.Sequential.LSTMTimeSeriesRegressor', 'orion.primitives.timeseries_errors.regression_errors', 'orion.primitives.timeseries_anomalies.find_anomalies']
This primitive creates an equi-spaced time series by aggregating values over a fixed, specified interval.

- input: X, an n-dimensional sequence of values.
- output: X, a sequence of aggregated values, one column for each aggregation method; index, a sequence of index values (the first index of each aggregated segment).

context = pipeline.fit(data, output_=0)
context.keys()
dict_keys(['X', 'index'])
for i, x in list(zip(context['index'], context['X']))[:5]:
print("entry at {} has value {}".format(i, x))
entry at 1222819200 has value [-0.36635895]
entry at 1222840800 has value [-0.39410778]
entry at 1222862400 has value [0.4036246]
entry at 1222884000 has value [-0.36275906]
entry at 1222905600 has value [-0.37074649]
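As a rough illustration of what this aggregation step does, here is a minimal pandas sketch, assuming mean aggregation over a fixed interval. The function and data are illustrative, not Orion's actual implementation:

```python
import numpy as np
import pandas as pd

def time_segments_aggregate(X, interval, time_column='timestamp', method='mean'):
    """Sketch of equi-spaced aggregation: bucket rows into fixed-width
    time segments and aggregate each bucket (here with the mean)."""
    X = X.sort_values(time_column)
    start = X[time_column].iloc[0]
    # Assign each row to a segment of width `interval`.
    segment = ((X[time_column] - start) // interval) * interval + start
    aggregated = X.groupby(segment)['value'].agg(method)
    return aggregated.values.reshape(-1, 1), aggregated.index.values

# Hypothetical irregular signal: two readings fall in the first segment.
df = pd.DataFrame({'timestamp': [0, 100, 21600, 43200],
                   'value': [1.0, 3.0, 5.0, 7.0]})
X, index = time_segments_aggregate(df, interval=21600)
print(X.ravel())  # [2. 5. 7.]  (first segment averages 1.0 and 3.0)
print(index)
```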
This primitive is an imputation transformer for filling missing values.

- input: X, an n-dimensional sequence of values.
- output: X, a transformed version of X.

step = 1
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X'])
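By default, SimpleImputer replaces missing entries with the column mean. A minimal numpy sketch of that behavior (illustrative only, not sklearn's implementation):

```python
import numpy as np

def impute_mean(X):
    """Sketch of SimpleImputer's default strategy: replace NaNs in each
    column with that column's mean."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = col_means[nan_cols]
    return X

X = np.array([[1.0], [np.nan], [3.0]])
print(impute_mean(X).ravel())  # [1. 2. 3.]
```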
This primitive transforms features by scaling each feature to a given range.

- input: X, the data used to compute the per-feature minimum and maximum used for later scaling along the features axis.
- output: X, a transformed version of X.

step = 2
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X'])
# after scaling the data between [-1, 1]
# in this example, no change is observed
# since the data was already scaled beforehand
for i, x in list(zip(context['index'], context['X']))[:5]:
print("entry at {} has value {}".format(i, x))
entry at 1222819200 has value [-0.36635895]
entry at 1222840800 has value [-0.39410778]
entry at 1222862400 has value [0.4036246]
entry at 1222884000 has value [-0.36275906]
entry at 1222905600 has value [-0.37074649]
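The scaling itself is a simple linear map. A numpy sketch of the MinMaxScaler formula, scaling to [-1, 1] as the pipeline does (illustrative, not sklearn's code):

```python
import numpy as np

def minmax_scale(X, feature_range=(-1, 1)):
    """Sketch of MinMaxScaler: map each feature linearly so that its
    minimum and maximum land on the ends of `feature_range`."""
    lo, hi = feature_range
    X_min, X_max = X.min(axis=0), X.max(axis=0)
    X_std = (X - X_min) / (X_max - X_min)
    return X_std * (hi - lo) + lo

X = np.array([[0.0], [5.0], [10.0]])
print(minmax_scale(X).ravel())  # [-1.  0.  1.]
```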
This primitive generates many sub-sequences of the original sequence, using a rolling window approach to create the sub-sequences out of time series data.

- input: X, an n-dimensional sequence to iterate over; index, an array containing the index values of X.
- output: X, the input sequences; y, the target sequences; index, the first index value of each input sequence; target_index, the first index value of each target sequence.

step = 3
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X', 'y', 'target_index'])
# after slicing X into multiple sub-sequences
# we obtain a 3 dimensional matrix X where
# the shape indicates (# slices, window size, 1)
# and similarly y is (# slices, target size)
print("X shape = {}\ny shape = {}\nindex shape = {}\ntarget index shape = {}".format(
context['X'].shape, context['y'].shape, context['index'].shape, context['target_index'].shape))
X shape = (9899, 250, 1) y shape = (9899, 1) index shape = (9899,) target index shape = (9899,)
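The shapes above follow from the window size (250) and target size (1). A minimal numpy sketch of the rolling-window slicing on a tiny hypothetical signal (illustrative, not Orion's implementation):

```python
import numpy as np

def rolling_window_sequences(X, index, window_size, target_size=1, step_size=1):
    """Sketch of rolling-window slicing: each input sequence is
    `window_size` long and its target is the next `target_size` values."""
    out_X, out_y, out_index, out_target_index = [], [], [], []
    for start in range(0, len(X) - window_size - target_size + 1, step_size):
        end = start + window_size
        out_X.append(X[start:end])
        out_y.append(X[end:end + target_size, 0])
        out_index.append(index[start])
        out_target_index.append(index[end])
    return (np.asarray(out_X), np.asarray(out_y),
            np.asarray(out_index), np.asarray(out_target_index))

# Hypothetical scaled signal of 6 points, sliced with a window of 3.
X = np.arange(6, dtype=float).reshape(-1, 1)
index = np.arange(6) * 21600
Xs, ys, idx, tidx = rolling_window_sequences(X, index, window_size=3)
print(Xs.shape, ys.shape)  # (3, 3, 1) (3, 1)
```

With 6 points, a window of 3, and a target of 1, we get 6 - 3 - 1 + 1 = 3 slices; the same arithmetic yields the 9899 slices seen above.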
This is a prediction model with double stacked LSTM layers, used as a time series regressor. You can read more about it in the related paper.

- input: X, an n-dimensional array containing the input sequences for the model; y, an n-dimensional array containing the target sequences for the model.
- output: y_hat, the predicted values.

step = 4
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
Train on 7919 samples, validate on 1980 samples
Epoch 1/5
7919/7919 [==============================] - 29s 4ms/step - loss: 0.1960 - mse: 0.1960 - val_loss: 0.2748 - val_mse: 0.2748
Epoch 2/5
7919/7919 [==============================] - 31s 4ms/step - loss: 0.1928 - mse: 0.1928 - val_loss: 0.3016 - val_mse: 0.3016
Epoch 3/5
7919/7919 [==============================] - 32s 4ms/step - loss: 0.1898 - mse: 0.1898 - val_loss: 0.3237 - val_mse: 0.3237
Epoch 4/5
7919/7919 [==============================] - 29s 4ms/step - loss: 0.1886 - mse: 0.1886 - val_loss: 0.2640 - val_mse: 0.2640
Epoch 5/5
7919/7919 [==============================] - 29s 4ms/step - loss: 0.1863 - mse: 0.1863 - val_loss: 0.4433 - val_mse: 0.4433
9899/9899 [==============================] - 11s 1ms/step
dict_keys(['index', 'target_index', 'X', 'y', 'y_hat'])
for i, y, y_hat in list(zip(context['target_index'], context['y'], context['y_hat']))[:5]:
print("entry at {} has value {}, predicted value {}".format(i, y, y_hat))
entry at 1228219200 has value [-0.3741225], predicted value [0.15370035]
entry at 1228240800 has value [1.], predicted value [0.18855423]
entry at 1228262400 has value [-0.35400432], predicted value [0.13846633]
entry at 1228284000 has value [1.], predicted value [0.11550236]
entry at 1228305600 has value [-0.38154089], predicted value [0.00080755]
This primitive computes an array of absolute errors comparing the predictions to the expected output. Optionally, the errors are smoothed using an exponentially weighted moving average (EWMA).

- input: y, the ground truth; y_hat, the predicted values.
- output: errors, an array of errors.

step = 5
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'target_index', 'y_hat', 'X', 'y', 'errors'])
for i, e in list(zip(context['target_index'], context['errors']))[:5]:
print("entry at {} has error value {:.3f}".format(i, e))
entry at 1228219200 has error value 0.528
entry at 1228240800 has error value 0.671
entry at 1228262400 has error value 0.610
entry at 1228284000 has error value 0.681
entry at 1228305600 has error value 0.619
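A minimal sketch of this error computation, assuming point-wise absolute error with optional EWMA smoothing via pandas (the values below are copied from the outputs above; this is not Orion's exact implementation):

```python
import numpy as np
import pandas as pd

def regression_errors(y, y_hat, smoothing_window=3, smooth=True):
    """Sketch of the error step: point-wise absolute error, optionally
    smoothed with an exponentially weighted moving average."""
    errors = np.abs(np.asarray(y).ravel() - np.asarray(y_hat).ravel())
    if smooth:
        errors = pd.Series(errors).ewm(span=smoothing_window).mean().values
    return errors

y = np.array([-0.374, 1.0, -0.354])
y_hat = np.array([0.154, 0.189, 0.138])
print(regression_errors(y, y_hat, smooth=False))  # [0.528 0.811 0.492]
```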
This primitive extracts anomalies from sequences of errors following the approach explained in the related paper.

- input: errors, an array of errors; target_index, an array of the indices of the errors.
- output: y, an array containing the start index, end index, and score of each anomalous sequence that was found.

step = 6
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'target_index', 'y_hat', 'errors', 'X', 'y'])
pd.DataFrame(context['y'], columns=['start', 'end', 'severity'])
| | start | end | severity |
|---|---|---|---|
| 0 | 1.228219e+09 | 1.229472e+09 | 0.614295 |
| 1 | 1.400134e+09 | 1.404086e+09 | 0.282328 |
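To build intuition for this final step, here is a deliberately simplified sketch that flags errors above a static threshold (mean + k standard deviations) and merges consecutive flagged points into (start, end, severity) triples. Note that Orion's actual primitive uses the dynamic thresholding method of the related paper, not this static rule, and the data here is hypothetical:

```python
import numpy as np

def find_anomalies_static(errors, index, k=3.0):
    """Simplified sketch: flag errors above mean + k*std and merge runs
    of consecutive flagged points into (start, end, severity) triples,
    where severity is the mean error of the run."""
    threshold = errors.mean() + k * errors.std()
    above = np.where(errors > threshold)[0]
    sequences = []
    if len(above):
        start = prev = above[0]
        for i in list(above[1:]) + [None]:
            if i is None or i != prev + 1:
                run = errors[start:prev + 1]
                sequences.append((int(index[start]), int(index[prev]),
                                  float(run.mean())))
                if i is not None:
                    start = i
            if i is not None:
                prev = i
    return sequences

# Hypothetical error sequence with one clear spike of two points.
errors = np.array([0.1, 0.1, 0.1, 5.0, 5.5, 0.1, 0.1, 0.1, 0.1, 0.1])
index = np.arange(10) * 21600
print(find_anomalies_static(errors, index, k=1.0))  # [(64800, 86400, 5.25)]
```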