#!/usr/bin/env python
# coding: utf-8

# # Azure Analysis Example
#
# This is a demo notebook showing how to use the **azure** pipeline on a signal using the `orion.analysis.analyze` function. For more information about the usage of Microsoft's anomaly detection API, view their documentation [here](https://docs.microsoft.com/en-us/azure/cognitive-services/anomaly-detector/).

# ## 1. Load the data
#
# In the first step, we load the signal that we want to process.
#
# To do so, we need to import the `orion.data.load_signal` function and call it passing
# either the path to the CSV file or the name of the signal to fetch from the `s3 bucket`.
#
# In this case, we will be loading the `S-1` signal.

# In[1]:

from orion.data import load_signal

signal_path = 'S-1'

data = load_signal(signal_path)
data.head()

# ## 2. Set up the pipeline
#
# To use the `azure` pipeline, we first need two pieces of information: the `subscription_key` and the `endpoint`. In order to obtain them, you must set up an Anomaly Detector resource on the Azure portal; follow the steps described [here](https://docs.microsoft.com/en-us/azure/cognitive-services/anomaly-detector/quickstarts/client-libraries?pivots=programming-language-python&tabs=linux) to set up your resource instance.
#
# Once that's done, update the hyperparameter dictionary below with the values of your instance.

# In[2]:

# your subscription key and endpoint
subscription_key = None
endpoint = None

# In[3]:

hyperparameters = {
    "mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate#1": {
        "interval": 21600,
    },
    "orion.primitives.azure_anomaly_detector.split_sequence#1": {
        "sequence_size": 6000,
        "overlap_size": 2640
    },
    "orion.primitives.azure_anomaly_detector.detect_anomalies#1": {
        "subscription_key": subscription_key,
        "endpoint": endpoint,
        "overlap_size": 2640,
        "interval": 21600,
        "granularity": "hourly",
        "custom_interval": 6
    }
}

# The `split_sequence` primitive takes the signal and splits it into multiple signals based on `sequence_size` and `overlap_size`. Since the method uses a rolling window sequence approach, the `overlap_size` maintains historical information when splitting the sequence (a minimal sketch of this splitting logic is included at the end of this notebook).
#
# It is customary to set `overlap_size` to the same value in both the `split_sequence` and `detect_anomalies` primitives. In addition, the frequency of the signal must be recorded: `interval` gives the timestamp interval in seconds (here 21600 seconds, i.e. 6 hours), while by convention `granularity` refers to the aggregation unit (e.g. hourly, minutely, etc.) and `custom_interval` refers to the quantity (in this case, 6 hours).

# ## 3. Detect anomalies using the azure pipeline
#
# Once we have the data and the setup, we use the azure pipeline to analyze the data and search for anomalies.
#
# In order to do so, we have to import the `orion.analysis.analyze` function and pass it
# the loaded data and the path to the pipeline JSON that we want to use.
#
# In this case, we will be using the `azure.json` pipeline from inside the `orion` folder.
#
# The output will be a ``pandas.DataFrame`` containing a table with the detected anomalies.

# In[4]:

from orion.analysis import analyze

pipeline_path = 'azure'

# run the pipeline only when the credentials have been provided
if subscription_key and endpoint:
    anomalies = analyze(pipeline_path, data, hyperparams=hyperparameters)
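
# Once the pipeline has finished running, we can inspect the output. The cell
# below is a small usage sketch added for illustration: it assumes the usual
# Orion convention of one row per detected anomalous interval, with `start`
# and `end` timestamps.

# In[5]:

if subscription_key and endpoint:
    # each row marks one detected anomalous interval
    print(anomalies.head())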
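
# As mentioned in Section 2, the following is a minimal sketch of the rolling
# window splitting logic, provided for illustration only. The `rolling_split`
# helper below is hypothetical and is not the actual `split_sequence`
# primitive; it simply shows how `sequence_size` and `overlap_size` interact.

# In[6]:

def rolling_split(df, sequence_size, overlap_size):
    """Hypothetical helper: split ``df`` into overlapping windows.

    Each window keeps the last ``overlap_size`` rows of the previous
    window as historical context.
    """
    step = sequence_size - overlap_size
    windows = []
    for start in range(0, len(df), step):
        windows.append(df.iloc[start:start + sequence_size])
        if start + sequence_size >= len(df):
            break
    return windows


windows = rolling_split(data, sequence_size=6000, overlap_size=2640)
print('number of windows:', len(windows))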