#!/usr/bin/env python
# coding: utf-8

# # Azure Analysis Example
#
# This is a demo notebook showing how to use the **azure** pipeline on a signal using the `orion.analysis.analyze` function. For more information about the usage of Microsoft's anomaly detection API, view their documentation [here](https://docs.microsoft.com/en-us/azure/cognitive-services/anomaly-detector/).

# ## 1. Load the data
#
# In the first step, we load the signal that we want to process.
#
# To do so, we need to import the `orion.data.load_signal` function and call it passing
# either the path to the CSV file or the name of the signal to fetch from the `s3 bucket`.
#
# In this case, we will be loading the `S-1` signal.

# In[1]:

from orion.data import load_signal

signal_path = 'S-1'

data = load_signal(signal_path)
data.head()

# ## 2. Set up the pipeline
#
# To use the `azure` pipeline, we first need two pieces of information: the `subscription_key` and the `endpoint`. In order to obtain them, you must set up an Anomaly Detector resource on the Azure portal; follow the steps described [here](https://docs.microsoft.com/en-us/azure/cognitive-services/anomaly-detector/quickstarts/client-libraries?pivots=programming-language-python&tabs=linux) to set up your resource instance.
#
# Once that's done, update the hyperparameter dictionary below with the values of your instance.

# In[2]:

# your subscription key and endpoint
subscription_key = None
endpoint = None

# In[3]:

hyperparameters = {
    "mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate#1": {
        "interval": 21600,
    },
    "orion.primitives.azure_anomaly_detector.split_sequence#1": {
        "sequence_size": 6000,
        "overlap_size": 2640
    },
    "orion.primitives.azure_anomaly_detector.detect_anomalies#1": {
        "subscription_key": subscription_key,
        "endpoint": endpoint,
        "overlap_size": 2640,
        "interval": 21600,
        "granularity": "hourly",
        "custom_interval": 6
    }
}

# The `split_sequence` primitive takes the signal and splits it into multiple signals based on `sequence_size` and `overlap_size`. Since the method uses a rolling window sequence approach, the `overlap_size` maintains historical information when splitting the sequence (a minimal sketch of this splitting logic is included at the end of this notebook).
#
# It is customary to set `overlap_size` to the same value in both the `split_sequence` and `detect_anomalies` primitives. In addition, the frequency of the signal must be recorded: `interval` gives the timestamp interval in seconds (here 21600 seconds, i.e. 6 hours), while by convention `granularity` refers to the aggregation unit (e.g. hourly, minutely, etc.) and `custom_interval` refers to the quantity (in this case, 6 hours).

# ## 3. Detect anomalies using the azure pipeline
#
# Once we have the data and the setup, we use the azure pipeline to analyze the data and search for anomalies.
#
# In order to do so, we have to import the `orion.analysis.analyze` function and pass it
# the loaded data and the path to the pipeline JSON that we want to use.
#
# In this case, we will be using the `azure.json` pipeline from inside the `orion` folder.
#
# The output will be a ``pandas.DataFrame`` containing a table with the detected anomalies.

# In[4]:

from orion.analysis import analyze

pipeline_path = 'azure'

# run the pipeline only when the credentials have been provided
if subscription_key and endpoint:
    anomalies = analyze(pipeline_path, data, hyperparams=hyperparameters)
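
# Once the pipeline has finished running, we can inspect the output. The cell
# below is a small usage sketch added for illustration: it assumes the usual
# Orion convention of one row per detected anomalous interval, with `start`
# and `end` timestamps.

# In[5]:

if subscription_key and endpoint:
    # each row marks one detected anomalous interval
    print(anomalies.head())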
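
# As mentioned in Section 2, the following is a minimal sketch of the rolling
# window splitting logic, provided for illustration only. The `rolling_split`
# helper below is hypothetical and is not the actual `split_sequence`
# primitive; it simply shows how `sequence_size` and `overlap_size` interact.

# In[6]:

def rolling_split(df, sequence_size, overlap_size):
    """Hypothetical helper: split ``df`` into overlapping windows.

    Each window keeps the last ``overlap_size`` rows of the previous
    window as historical context.
    """
    step = sequence_size - overlap_size
    windows = []
    for start in range(0, len(df), step):
        windows.append(df.iloc[start:start + sequence_size])
        if start + sequence_size >= len(df):
            break
    return windows


windows = rolling_split(data, sequence_size=6000, overlap_size=2640)
print('number of windows:', len(windows))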