This is a demo notebook showing how to use the azure pipeline on a signal using the orion.analysis.analyze function. For more information about the usage of Microsoft's anomaly detection API, view their documentation here.
In the first step, we load the signal that we want to process. To do so, we need to import the orion.data.load_signal function and call it, passing either the path to the CSV file or the name of the signal to fetch from the s3 bucket. In this case, we will be loading the S-1 signal.
from orion.data import load_signal
signal_path = 'S-1'
data = load_signal(signal_path)
data.head()
|   | timestamp | value |
|---|---|---|
| 0 | 1222819200 | -0.366359 |
| 1 | 1222840800 | -0.394108 |
| 2 | 1222862400 | 0.403625 |
| 3 | 1222884000 | -0.362759 |
| 4 | 1222905600 | -0.370746 |
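Before configuring the pipeline, it helps to confirm the sampling interval of the signal. The timestamps are Unix epoch seconds; the following quick check (a minimal sketch using pandas, not part of the pipeline itself) shows that consecutive observations are 21600 seconds, i.e. 6 hours, apart.

import pandas as pd

# Convert the epoch timestamps to datetimes and inspect the spacing between
# consecutive observations; we expect a single step of 21600 seconds (6 hours).
readable_times = pd.to_datetime(data['timestamp'], unit='s')
print(readable_times.head())
print(data['timestamp'].diff().dropna().unique())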
To use the azure pipeline, we first need two pieces of information: subscription_key and endpoint. In order to obtain them, you must set up an Anomaly Detection resource on the Azure portal; follow the steps mentioned here to set up your resource instance. Once that's accomplished, update the hyperparameter dictionary below with the values of your instance.
# your subscription key and endpoint
subscription_key = None
endpoint = None
hyperparameters = {
    "mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1": {
        "interval": 21600,
    },
    "orion.primitives.azure_anomaly_detector.split_sequence#1": {
        "sequence_size": 6000,
        "overlap_size": 2640
    },
    "orion.primitives.azure_anomaly_detector.detect_anomalies#1": {
        "subscription_key": subscription_key,
        "endpoint": endpoint,
        "overlap_size": 2640,
        "interval": 21600,
        "granularity": "hourly",
        "custom_interval": 6
    }
}
The split_sequence primitive takes the signal and splits it into multiple segments based on the sequence_size and overlap_size. Since the method uses a rolling window sequence approach, we use the overlap_size to maintain historical information when splitting the sequence. It is customary to set the overlap_size to the same value in both the split_sequence and detect_anomalies primitives. In addition, we need the frequency of the signal, recorded as the timestamp interval, as well as the convention-based settings where granularity refers to the aggregation unit (e.g. hourly, minutely, etc.) and custom_interval refers to the quantity (in this case, 6 hours). The rolling window idea is sketched below.
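Here is a minimal, hypothetical sketch of how a sequence could be split with an overlap. It is not the actual split_sequence implementation, only an illustration using the sequence_size and overlap_size values from the hyperparameters above.

def split_with_overlap(values, sequence_size=6000, overlap_size=2640):
    # Illustrative rolling-window split: each window re-uses the last
    # overlap_size points of the previous window as historical context.
    step = sequence_size - overlap_size
    windows = []
    for start in range(0, len(values), step):
        windows.append(values[start:start + sequence_size])
        if start + sequence_size >= len(values):
            break
    return windows

windows = split_with_overlap(data['value'].values)
print(len(windows), [len(window) for window in windows])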
Once we have the data and the setup, we use the azure pipeline to analyze the signal and search for anomalies. In order to do so, we will have to import the orion.analysis.analyze function and pass it the loaded data and the path to the pipeline JSON that we want to use. In this case, we will be using the azure.json pipeline from inside the orion folder. The output will be a pandas.DataFrame containing a table with the detected anomalies.
from orion.analysis import analyze
pipeline_path = 'azure'
if subscription_key and endpoint:
    anomalies = analyze(pipeline_path, data, hyperparams=hyperparameters)
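If the credentials were provided and the analysis ran, the detected anomalies can be inspected directly. The sketch below assumes the usual Orion convention of reporting each anomaly as an interval with epoch-second start and end columns; the column names are an assumption, so adjust them if your output differs.

import pandas as pd

if subscription_key and endpoint:
    readable = anomalies.copy()
    # Convert epoch-second interval boundaries to datetimes for readability;
    # the 'start' and 'end' column names are assumed here, not guaranteed.
    for column in ('start', 'end'):
        if column in readable.columns:
            readable[column] = pd.to_datetime(readable[column], unit='s')
    print(readable.head())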