This is a demo notebook showing how to use the azure pipeline on a signal using the orion.analysis.analyze function. For more information about the usage of Microsoft's anomaly detection API, view their documentation here.
In the first step, we load the signal that we want to process. To do so, we need to import the orion.data.load_signal function and call it, passing either the path to the CSV file or the name of the signal to fetch from the s3 bucket. In this case, we will be loading the S-1 signal.
from orion.data import load_signal
signal_path = 'S-1'
data = load_signal(signal_path)
data.head()
|   | timestamp | value |
|---|---|---|
| 0 | 1222819200 | -0.366359 |
| 1 | 1222840800 | -0.394108 |
| 2 | 1222862400 | 0.403625 |
| 3 | 1222884000 | -0.362759 |
| 4 | 1222905600 | -0.370746 |
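Before configuring the pipeline, it helps to confirm the sampling interval of the signal. The timestamps are Unix epoch seconds; the following quick check (a minimal sketch using pandas, not part of the pipeline itself) shows that consecutive observations are 21600 seconds, i.e. 6 hours, apart.

import pandas as pd

# Convert the epoch timestamps to datetimes and inspect the spacing between
# consecutive observations; we expect a single step of 21600 seconds (6 hours).
readable_times = pd.to_datetime(data['timestamp'], unit='s')
print(readable_times.head())
print(data['timestamp'].diff().dropna().unique())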
To use the azure pipeline, we first need two pieces of information: subscription_key and endpoint. In order to obtain them, you must set up an Anomaly Detection resource on the Azure portal; follow the steps mentioned here to set up your resource instance. Once that's accomplished, update the hyperparameter dictionary below with the values of your instance.
# your subscription key and endpoint
subscription_key = None
endpoint = None
hyperparameters = {
    "mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1": {
        "interval": 21600,
    },
    "orion.primitives.azure_anomaly_detector.split_sequence#1": {
        "sequence_size": 6000,
        "overlap_size": 2640
    },
    "orion.primitives.azure_anomaly_detector.detect_anomalies#1": {
        "subscription_key": subscription_key,
        "endpoint": endpoint,
        "overlap_size": 2640,
        "interval": 21600,
        "granularity": "hourly",
        "custom_interval": 6
    }
}
The split_sequence primitive takes the signal and splits it into multiple segments based on the sequence_size and overlap_size. Since the method uses a rolling window sequence approach, we use the overlap_size to maintain historical information when splitting the sequence. It is customary to set the overlap_size to the same value in both the split_sequence and detect_anomalies primitives. In addition, we need the frequency of the signal, recorded as the timestamp interval, as well as the convention-based settings where granularity refers to the aggregation unit (e.g. hourly, minutely, etc.) and custom_interval refers to the quantity (in this case, 6 hours). The rolling window idea is sketched below.
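Here is a minimal, hypothetical sketch of how a sequence could be split with an overlap. It is not the actual split_sequence implementation, only an illustration using the sequence_size and overlap_size values from the hyperparameters above.

def split_with_overlap(values, sequence_size=6000, overlap_size=2640):
    # Illustrative rolling-window split: each window re-uses the last
    # overlap_size points of the previous window as historical context.
    step = sequence_size - overlap_size
    windows = []
    for start in range(0, len(values), step):
        windows.append(values[start:start + sequence_size])
        if start + sequence_size >= len(values):
            break
    return windows

windows = split_with_overlap(data['value'].values)
print(len(windows), [len(window) for window in windows])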
Once we have the data and the setup, we use the azure pipeline to analyze the signal and search for anomalies. In order to do so, we will have to import the orion.analysis.analyze function and pass it the loaded data and the path to the pipeline JSON that we want to use. In this case, we will be using the azure.json pipeline from inside the orion folder. The output will be a pandas.DataFrame containing a table with the detected anomalies.
from orion.analysis import analyze
pipeline_path = 'azure'
if subscription_key and endpoint:
    anomalies = analyze(pipeline_path, data, hyperparams=hyperparameters)
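If the credentials were provided and the analysis ran, the detected anomalies can be inspected directly. The sketch below assumes the usual Orion convention of reporting each anomaly as an interval with epoch-second start and end columns; the column names are an assumption, so adjust them if your output differs.

import pandas as pd

if subscription_key and endpoint:
    readable = anomalies.copy()
    # Convert epoch-second interval boundaries to datetimes for readability;
    # the 'start' and 'end' column names are assumed here, not guaranteed.
    for column in ('start', 'end'):
        if column in readable.columns:
            readable[column] = pd.to_datetime(readable[column], unit='s')
    print(readable.head())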