import pandas as pd
from orion.data import load_signal
signal_name = 'S-1'
data = load_signal(signal_name)
data.head()
| | timestamp | value |
|---|---|---|
| 0 | 1222819200 | -0.366359 |
| 1 | 1222840800 | -0.394108 |
| 2 | 1222862400 | 0.403625 |
| 3 | 1222884000 | -0.362759 |
| 4 | 1222905600 | -0.370746 |
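The timestamp column holds POSIX timestamps (seconds since the epoch) spaced 21600 seconds (6 hours) apart. A quick sketch to view them as datetimes:

```python
import pandas as pd

# timestamps are epoch seconds at a fixed 6-hour spacing
pd.to_datetime(data['timestamp'].head(), unit='s')
# 0   2008-10-01 00:00:00
# 1   2008-10-01 06:00:00
# ...
```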
from mlblocks import MLPipeline
pipeline_name = 'matrixprofile'
pipeline = MLPipeline(pipeline_name)
MLPipelines are composed of a sequence of primitives. These primitives apply transformation and calculation operations to the data and update the variables within the pipeline. To view the primitives used by the pipeline, we access its primitives attribute.
The matrixprofile pipeline contains 8 primitives. We will observe how the context (the set of variables held within the pipeline) is updated after the execution of each primitive.
pipeline.primitives
['mlstars.custom.timeseries_preprocessing.time_segments_aggregate',
 'sklearn.impute.SimpleImputer',
 'sklearn.preprocessing.MinMaxScaler',
 'numpy.reshape',
 'stumpy.stump',
 'orion.primitives.timeseries_preprocessing.slice_array_by_dims',
 'numpy.reshape',
 'orion.primitives.timeseries_anomalies.find_anomalies']
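Throughout this section we run the pipeline one primitive at a time by passing output_ (which intermediate output to return) and start_ (which primitive to resume from) to fit. The general pattern looks like this:

```python
# run only the first primitive (index 0) and capture the produced context
context = pipeline.fit(data, output_=0)

# resume from primitive 1, feeding the previous context back in,
# and stop again after that primitive
context = pipeline.fit(**context, output_=1, start_=1)
```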
The first primitive, time_segments_aggregate, creates an equi-spaced time series by aggregating values over a fixed, specified interval.

- input: X, an n-dimensional sequence of values.
- output: X, the sequence of aggregated values, one column for each aggregation method; index, the sequence of index values (the first index of each aggregated segment).

context = pipeline.fit(data, output_=0)
context.keys()
dict_keys(['X', 'index'])
for i, x in list(zip(context['index'], context['X']))[:5]:
print("entry at {} has value {}".format(i, x))
entry at 1222819200 has value [-0.36635895]
entry at 1222840800 has value [-0.39410778]
entry at 1222862400 has value [0.4036246]
entry at 1222884000 has value [-0.36275906]
entry at 1222905600 has value [-0.37074649]
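For intuition, the aggregation can be sketched with plain pandas. This is an illustrative re-creation only; the function name and the choice of mean aggregation are assumptions, not the primitive's actual internals:

```python
import numpy as np
import pandas as pd

def aggregate_time_segments(df, interval=21600):
    """Illustrative: aggregate 'value' over fixed-width time segments."""
    start = df['timestamp'].min()
    segment = (df['timestamp'] - start) // interval   # segment id per row
    grouped = df.groupby(segment)['value'].mean()     # one aggregation method
    index = start + grouped.index.values * interval   # first index of each segment
    X = grouped.values.reshape(-1, 1)                 # one column per method
    return X, index
```

With interval=21600 this reproduces the six-hour spacing seen in the index above.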
The second primitive, sklearn.impute.SimpleImputer, is an imputation transformer for filling missing values.

- input: X, an n-dimensional sequence of values.
- output: X, a transformed version of X with missing values imputed.

step = 1
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X'])
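As a standalone sketch of what this step does, sklearn's SimpleImputer with its default strategy replaces each NaN with the column mean:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_demo = np.array([[0.1], [np.nan], [0.5]])
imputer = SimpleImputer()                 # default strategy='mean'
print(imputer.fit_transform(X_demo))      # NaN becomes 0.3, the column mean
```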
The third primitive, sklearn.preprocessing.MinMaxScaler, transforms features by scaling each feature to a given range, here [0, 1].

- input: X, the data used to compute the per-feature minimum and maximum for later scaling along the features axis.
- output: X, a transformed version of X scaled to the given range.

step = 2
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X'])
# after scaling, the data lies between [0, 1]
# in this example only a linear rescaling is applied,
# since the data was already scaled beforehand
for i, x in list(zip(context['index'], context['X']))[:5]:
print("entry at {} has value {}".format(i, x))
entry at 1222819200 has value [0.31682053]
entry at 1222840800 has value [0.30294611]
entry at 1222862400 has value [0.7018123]
entry at 1222884000 has value [0.31862047]
entry at 1222905600 has value [0.31462675]
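Each feature is rescaled as (x - min) / (max - min) and mapped into the target range. A minimal standalone sketch with toy data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_demo = np.array([[-1.0], [0.0], [1.0]])
scaler = MinMaxScaler()                   # default feature_range=(0, 1)
print(scaler.fit_transform(X_demo))       # [[0. ], [0.5], [1. ]]
```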
The fourth primitive, numpy.reshape, flattens the array.

- input: X, an n-dimensional array of values.
- output: X, a flat (one-dimensional) version of X.

step = 3
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X'])
context['X'].shape
(10149,)
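This is plain numpy reshaping with a single -1 dimension, needed because stumpy.stump (the next step) expects a one-dimensional series. A minimal sketch:

```python
import numpy as np

X_demo = np.array([[0.3], [0.7], [0.5]])   # shape (3, 1)
flat = np.reshape(X_demo, (-1,))           # shape (3,)
print(flat)                                # [0.3 0.7 0.5]
```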
The fifth primitive, stumpy.stump, computes the matrix profile of X.

- input: X, a one-dimensional array of values (flattened in the previous step).
- output: y, the matrix profile of X.

step = 4
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X', 'y'])
context['y'].shape
(10050, 4)
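stumpy.stump returns one row per subsequence with four columns: the matrix profile distance, the nearest-neighbor index, and the left and right nearest-neighbor indices. Because the profile has n - m + 1 rows, the shape (10050, 4) with n = 10149 implies a window size of m = 100; this is an inference from the shapes, not a parameter shown here. A standalone sketch:

```python
import numpy as np
import stumpy

ts = np.random.rand(1000)
m = 100                           # window size inferred from the shapes above
mp = stumpy.stump(ts, m)          # one row per subsequence: (1000 - 100 + 1, 4)
print(mp.shape)                   # (901, 4)
print(mp[0, 0])                   # distance from subsequence 0 to its nearest neighbor
```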
The sixth primitive, slice_array_by_dims, extracts the distance to the nearest neighbor from the matrix profile.

- input: y, an n-dimensional array of values (the matrix profile).
- output: y, the distance array sliced from y.

step = 5
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X', 'y'])
context['y'].shape
(10050, 1)
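The slicing is equivalent to taking the first column of the matrix profile while keeping a 2-D shape; the primitive's actual dimension parameters are configured inside the pipeline. A sketch with a stand-in array:

```python
import numpy as np

mp_demo = np.arange(12.0).reshape(4, 3)   # stand-in for an (n, k) matrix profile
distances = mp_demo[:, 0:1]               # keep only column 0, preserving 2-D shape
print(distances.shape)                    # (4, 1)
```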
The seventh primitive, numpy.reshape, flattens the array once more.

- input: y, an n-dimensional array of values.
- output: errors, a flat version of y.

step = 6
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'X', 'y', 'errors'])
The eighth primitive, find_anomalies, extracts anomalies from the sequence of errors following the approach explained in the related paper.

- input: errors, an array of errors; index, an array of indices of the errors.
- output: y, an array containing the start index, end index, and severity score of each anomalous sequence that was found.

step = 7
context = pipeline.fit(**context, output_=step, start_=step)
context.keys()
dict_keys(['index', 'errors', 'X', 'y'])
pd.DataFrame(context['y'], columns=['start', 'end', 'severity'])
| | start | end | severity |
|---|---|---|---|
| 0 | 1.310386e+09 | 1.312826e+09 | 0.198253 |
| 1 | 1.398125e+09 | 1.401408e+09 | 1.728175 |
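The start and end values are epoch-second timestamps like the input index. A quick sketch to read the detected intervals as datetimes:

```python
import pandas as pd

anomalies = pd.DataFrame(context['y'], columns=['start', 'end', 'severity'])
anomalies['start'] = pd.to_datetime(anomalies['start'], unit='s')
anomalies['end'] = pd.to_datetime(anomalies['end'], unit='s')
print(anomalies)
```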