#!/usr/bin/env python
# coding: utf-8

# # Overview
#
# In the 10x series of notebooks, we will look at Time Series modeling in pycaret using univariate data and no exogenous variables. We will use the famous airline dataset for illustration. Our plan of action is as follows:
#
# 1. Perform EDA on the dataset to extract valuable insights about the process generating the time series. **(COMPLETED)**
# 2. Model the dataset based on the exploratory analysis (univariate model without exogenous variables). **(COMPLETED)**
# 3. Use an automated approach (AutoML) to improve the performance. **(Covered in this notebook)**

# In[1]:


# Only enable critical logging (Optional)
import os

os.environ["PYCARET_CUSTOM_LOGGING_LEVEL"] = "CRITICAL"


# In[2]:


def what_is_installed():
    from pycaret import show_versions

    show_versions()


try:
    what_is_installed()
except ModuleNotFoundError:
    get_ipython().system('pip install pycaret')
    what_is_installed()


# In[3]:


import time

import numpy as np
import pandas as pd

from pycaret.datasets import get_data
from pycaret.time_series import TSForecastingExperiment


# In[4]:


y = get_data('airline', verbose=False)


# In[5]:


# We want to forecast the next 12 months of data, and we will use 3-fold cross-validation to evaluate the models.
fh = 12  # or alternately fh = np.arange(1, 13)
fold = 3


# In[6]:


# Global Figure Settings for notebook ----
# Depending on whether you are using jupyter notebook, jupyter lab, or Google Colab, you may have to set the renderer appropriately.
# NOTE: Setting to a static renderer here so that the saved notebook size is reduced.
fig_kwargs = {
    # "renderer": "notebook",
    "renderer": "png",
    "width": 1000,
    "height": 600,
}


# # Auto Create
#
# We have so many models to choose from. How do we know which ones perform best? Let's see how we can find out with `pycaret`.

# In[7]:


exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, session_id=42)


# # Compare Models

# In[8]:


# Get the 3 best baseline models
best_baseline_models = exp.compare_models(n_select=3)
best_baseline_models


# In[9]:


# We will save the metrics to be used in a later step.
compare_metrics = exp.pull()
# compare_metrics


# * Note that some models like BATS and TBATS are disabled by default.
# * You can enable them by setting `turbo=False`:
#
# `best_baseline_models = exp.compare_models(n_select=3, turbo=False)`

# # Tune Best Models

# In[10]:


best_tuned_models = [exp.tune_model(model) for model in best_baseline_models]
best_tuned_models


# # Blend Best Models
#
# We can sometimes achieve even better results by combining the forecasts of several good models. This can be achieved using the `blend_models` functionality. Several blending methods are available, such as `mean`, `gmean`, `median`, `min`, and `max`. In addition, weights can be applied to the forecasts from the base learners. This is useful when we want to give more importance (weight) to models with a lower error, for example. Please refer to the `blend_models` docstring for more information about the blending functionality.

# In[11]:


# help(exp.blend_models)
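# As a brief aside, any of the blending methods listed above can be swapped in. Below is a minimal sketch, assuming the `median` method mentioned in the docstring; unlike the weighted mean used next, it requires no weights. This cell is illustrative and is not used in the rest of the notebook.

# In[ ]:


# Illustrative sketch: an unweighted median blend of the same tuned models.
median_blender = exp.blend_models(best_tuned_models, method='median')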
# Let's see the voting blender in action. We will derive the weights from the cross-validated MAE so that models with a lower error receive a higher weight.

# In[12]:


# Get model weights to use
top_model_metrics = compare_metrics.iloc[0:3]['MAE']
display(top_model_metrics)

top_model_weights = 1 - top_model_metrics / top_model_metrics.sum()
display(top_model_weights)


# In[13]:


blender = exp.blend_models(
    best_tuned_models, method='mean', weights=top_model_weights.values.tolist()
)


# In[14]:


y_predict = exp.predict_model(blender)
print(y_predict)

exp.plot_model(estimator=blender)


# # Finalize Model

# In[15]:


final_model = exp.finalize_model(blender)
print(exp.predict_model(final_model))
exp.plot_model(final_model)


# # Save and Load Model

# In[16]:


_ = exp.save_model(final_model, "my_blender")


# In[17]:


loaded_exp = TSForecastingExperiment()

m = loaded_exp.load_model("my_blender")

# Predictions should be the same as before the model was saved and loaded
loaded_exp.predict_model(m)


# **That's it for this notebook. Users can hopefully see how easy it was to use the automated approach to model this data, and we were able to achieve reasonable results on par with (or even better than) the manual approach.**
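# As an optional last step, the loaded model can be asked for a different horizon. Below is a minimal sketch, assuming the `fh` argument of `predict_model`, which accepts an alternate forecast horizon at prediction time; the 24-month horizon is an illustrative choice, not part of the original workflow.

# In[ ]:


# Illustrative sketch: forecast 24 months ahead with the loaded, finalized blender.
extended_forecast = loaded_exp.predict_model(m, fh=24)
print(extended_forecast)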