import pandas as pd
from orion.data import load_signal
signal_name = 'S-1'
data = load_signal(signal_name)
data.head()
| | timestamp | value |
|---|---|---|
| 0 | 1222819200 | -0.366359 |
| 1 | 1222840800 | -0.394108 |
| 2 | 1222862400 | 0.403625 |
| 3 | 1222884000 | -0.362759 |
| 4 | 1222905600 | -0.370746 |
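The timestamp column holds POSIX timestamps (seconds since the epoch) spaced 21600 seconds (6 hours) apart. A quick sketch to view them as datetimes:

```python
import pandas as pd

# timestamps are epoch seconds at a fixed 6-hour spacing
pd.to_datetime(data['timestamp'].head(), unit='s')
# 0   2008-10-01 00:00:00
# 1   2008-10-01 06:00:00
# ...
```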
from mlblocks import MLPipeline
pipeline_name = 'matrixprofile'
pipeline = MLPipeline(pipeline_name)
MLPipelines are composed of a sequence of primitives. These primitives apply transformation and calculation operations to the data and update the variables within the pipeline. To view the primitives used by the pipeline, we access its primitives attribute.
The matrixprofile pipeline contains 8 primitives. We will observe how the context (the set of variables held within the pipeline) is updated after the execution of each primitive.
pipeline.primitives
['mlstars.custom.timeseries_preprocessing.time_segments_aggregate',
 'sklearn.impute.SimpleImputer',
 'sklearn.preprocessing.MinMaxScaler',
 'numpy.reshape',
 'stumpy.stump',
 'orion.primitives.timeseries_preprocessing.slice_array_by_dims',
 'numpy.reshape',
 'orion.primitives.timeseries_anomalies.find_anomalies']
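Throughout this section we run the pipeline one primitive at a time by passing output_ (which intermediate output to return) and start_ (which primitive to resume from) to fit. The general pattern looks like this:

```python
# run only the first primitive (index 0) and capture the produced context
context = pipeline.fit(data, output_=0)

# resume from primitive 1, feeding the previous context back in,
# and stop again after that primitive
context = pipeline.fit(**context, output_=1, start_=1)
```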
The first primitive, time_segments_aggregate, creates an equi-spaced time series by aggregating values over a fixed, specified interval.

- input: X, an n-dimensional sequence of values.
- output: X, the sequence of aggregated values, one column for each aggregation method; index, the sequence of index values (the first index of each aggregated segment).

context = pipeline.fit(data, output_=0)
context.keys()
dict_keys(['X', 'index'])
for i, x in list(zip(context['index'], context['X']))[:5]:
print("entry at {} has value {}".format(i, x))
entry at 1222819200 has value [-0.36635895]
entry at 1222840800 has value [-0.39410778]
entry at 1222862400 has value [0.4036246]
entry at 1222884000 has value [-0.36275906]
entry at 1222905600 has value [-0.37074649]
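For intuition, the aggregation can be sketched with plain pandas. This is an illustrative re-creation only; the function name and the choice of mean aggregation are assumptions, not the primitive's actual internals:

```python
import numpy as np
import pandas as pd

def aggregate_time_segments(df, interval=21600):
    """Illustrative: aggregate 'value' over fixed-width time segments."""
    start = df['timestamp'].min()
    segment = (df['timestamp'] - start) // interval   # segment id per row
    grouped = df.groupby(segment)['value'].mean()     # one aggregation method
    index = start + grouped.index.values * interval   # first index of each segment
    X = grouped.values.reshape(-1, 1)                 # one column per method
    return X, index
```

With interval=21600 this reproduces the six-hour spacing seen in the index above.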
The second primitive, sklearn.impute.SimpleImputer, is an imputation transformer for filling missing values.

- input: X, an n-dimensional sequence of values.
- output: X, a transformed version of X with missing values imputed.

step = 1
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X'])
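As a standalone sketch of what this step does, sklearn's SimpleImputer with its default strategy replaces each NaN with the column mean:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_demo = np.array([[0.1], [np.nan], [0.5]])
imputer = SimpleImputer()                 # default strategy='mean'
print(imputer.fit_transform(X_demo))      # NaN becomes 0.3, the column mean
```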
The third primitive, sklearn.preprocessing.MinMaxScaler, transforms features by scaling each feature to a given range, here [0, 1].

- input: X, the data used to compute the per-feature minimum and maximum for later scaling along the features axis.
- output: X, a transformed version of X scaled to the given range.

step = 2
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X'])
# after scaling, the data lies between [0, 1]
# in this example only a linear rescaling is applied,
# since the data was already scaled beforehand
for i, x in list(zip(context['index'], context['X']))[:5]:
print("entry at {} has value {}".format(i, x))
entry at 1222819200 has value [0.31682053]
entry at 1222840800 has value [0.30294611]
entry at 1222862400 has value [0.7018123]
entry at 1222884000 has value [0.31862047]
entry at 1222905600 has value [0.31462675]
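Each feature is rescaled as (x - min) / (max - min) and mapped into the target range. A minimal standalone sketch with toy data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_demo = np.array([[-1.0], [0.0], [1.0]])
scaler = MinMaxScaler()                   # default feature_range=(0, 1)
print(scaler.fit_transform(X_demo))       # [[0. ], [0.5], [1. ]]
```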
The fourth primitive, numpy.reshape, flattens the array.

- input: X, an n-dimensional array of values.
- output: X, a flat (one-dimensional) version of X.

step = 3
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X'])
context['X'].shape
(10149,)
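This is plain numpy reshaping with a single -1 dimension, needed because stumpy.stump (the next step) expects a one-dimensional series. A minimal sketch:

```python
import numpy as np

X_demo = np.array([[0.3], [0.7], [0.5]])   # shape (3, 1)
flat = np.reshape(X_demo, (-1,))           # shape (3,)
print(flat)                                # [0.3 0.7 0.5]
```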
The fifth primitive, stumpy.stump, computes the matrix profile of X.

- input: X, a one-dimensional array of values (flattened in the previous step).
- output: y, the matrix profile of X.

step = 4
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X', 'y'])
context['y'].shape
(10050, 4)
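stumpy.stump returns one row per subsequence with four columns: the matrix profile distance, the nearest-neighbor index, and the left and right nearest-neighbor indices. Because the profile has n - m + 1 rows, the shape (10050, 4) with n = 10149 implies a window size of m = 100; this is an inference from the shapes, not a parameter shown here. A standalone sketch:

```python
import numpy as np
import stumpy

ts = np.random.rand(1000)
m = 100                           # window size inferred from the shapes above
mp = stumpy.stump(ts, m)          # one row per subsequence: (1000 - 100 + 1, 4)
print(mp.shape)                   # (901, 4)
print(mp[0, 0])                   # distance from subsequence 0 to its nearest neighbor
```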
The sixth primitive, slice_array_by_dims, extracts the distance to the nearest neighbor from the matrix profile.

- input: y, an n-dimensional array of values (the matrix profile).
- output: y, the distance array sliced from y.

step = 5
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X', 'y'])
context['y'].shape
(10050, 1)
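The slicing is equivalent to taking the first column of the matrix profile while keeping a 2-D shape; the primitive's actual dimension parameters are configured inside the pipeline. A sketch with a stand-in array:

```python
import numpy as np

mp_demo = np.arange(12.0).reshape(4, 3)   # stand-in for an (n, k) matrix profile
distances = mp_demo[:, 0:1]               # keep only column 0, preserving 2-D shape
print(distances.shape)                    # (4, 1)
```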
The seventh primitive, numpy.reshape, flattens the array once more.

- input: y, an n-dimensional array of values.
- output: errors, a flat version of y.

step = 6
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X', 'y', 'errors'])
The eighth primitive, find_anomalies, extracts anomalies from the sequence of errors following the approach explained in the related paper.

- input: errors, an array of errors; index, an array of indices of the errors.
- output: y, an array containing the start index, end index, and severity score of each anomalous sequence that was found.

step = 7
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'errors', 'X', 'y'])
pd.DataFrame(context['y'], columns=['start', 'end', 'severity'])
| | start | end | severity |
|---|---|---|---|
| 0 | 1.310386e+09 | 1.312826e+09 | 0.198253 |
| 1 | 1.398125e+09 | 1.401408e+09 | 1.728175 |
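The start and end values are epoch-second timestamps like the input index. A quick sketch to read the detected intervals as datetimes:

```python
import pandas as pd

anomalies = pd.DataFrame(context['y'], columns=['start', 'end', 'severity'])
anomalies['start'] = pd.to_datetime(anomalies['start'], unit='s')
anomalies['end'] = pd.to_datetime(anomalies['end'], unit='s')
print(anomalies)
```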