Now that we have gone through the manual process of modeling our dataset, let's see if we can replicate it with an automated workflow. As a reminder, the plan of action is to set up the experiment, compare candidate models (using exogenous variables where supported), finalize the best one, and generate future predictions.
# Only enable critical logging (Optional)
import os
os.environ["PYCARET_CUSTOM_LOGGING_LEVEL"] = "CRITICAL"

def what_is_installed():
    from pycaret import show_versions
    show_versions()

try:
    what_is_installed()
except ModuleNotFoundError:
    !pip install pycaret
    what_is_installed()
System:
    python: 3.9.16 (main, Jan 11 2023, 16:16:36) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\Nikhil\.conda\envs\pycaret_dev_sktime_16p1\python.exe
    machine: Windows-10-10.0.19044-SP0

PyCaret required dependencies:
    pip: 22.3.1, setuptools: 65.6.3, pycaret: 3.0.0rc9, IPython: 8.10.0,
    ipywidgets: 8.0.4, tqdm: 4.64.1, numpy: 1.23.5, pandas: 1.5.3,
    jinja2: 3.1.2, scipy: 1.10.0, joblib: 1.2.0, sklearn: 1.2.1,
    pyod: 1.0.8, imblearn: 0.10.1, category_encoders: 2.6.0, lightgbm: 3.3.5,
    numba: 0.56.4, requests: 2.28.2, matplotlib: 3.7.0, scikitplot: 0.3.7,
    yellowbrick: 1.5, plotly: 5.13.0, kaleido: 0.2.1, statsmodels: 0.13.5,
    sktime: 0.16.1, tbats: 1.1.2, pmdarima: 2.0.2, psutil: 5.9.4

PyCaret optional dependencies:
    shap: 0.41.0, mlflow: 2.1.1, fugue: 0.8.0, prophet: 1.1.2
    Not installed: interpret, umap, pandas_profiling, explainerdashboard,
    autoviz, fairlearn, xgboost, catboost, kmodes, mlxtend, statsforecast,
    tune_sklearn, ray, hyperopt, optuna, skopt, gradio, fastapi, uvicorn,
    m2cgen, evidently, streamlit
import numpy as np
import pandas as pd
from pycaret.datasets import get_data
from pycaret.time_series import TSForecastingExperiment
# Global Figure Settings for notebook ----
# Depending on whether you are using Jupyter Notebook, JupyterLab, or Google Colab, you may have to set the renderer appropriately.
# NOTE: Setting to a static renderer here so that the notebook saved size is reduced.
global_fig_settings = {
# "renderer": "notebook",
"renderer": "png",
"width": 1000,
"height": 600,
}
data = get_data("airquality", verbose=False)
# Limiting the data for demonstration purposes.
data = data.iloc[-720:]
data["index"] = pd.to_datetime(data["Date"] + " " + data["Time"])
data.drop(columns=["Date", "Time"], inplace=True)
data.replace(-200, np.nan, inplace=True)
data.set_index("index", inplace=True)
target = "CO(GT)"
exog_vars = ['NOx(GT)', 'PT08.S3(NOx)', 'RH']
include = [target] + exog_vars
data = data[include]
data.head()
index | CO(GT) | NOx(GT) | PT08.S3(NOx) | RH |
---|---|---|---|---|
2005-03-05 15:00:00 | 1.5 | 180.0 | 820.0 | 28.3 |
2005-03-05 16:00:00 | 1.8 | 255.0 | 751.0 | 29.7 |
2005-03-05 17:00:00 | 2.0 | 251.0 | 721.0 | 38.7 |
2005-03-05 18:00:00 | 1.9 | 258.0 | 695.0 | 56.3 |
2005-03-05 19:00:00 | 2.5 | 344.0 | 654.0 | 57.9 |
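In this dataset, -200 is the sentinel value for missing readings, which the preprocessing above maps to NaN before imputation. A quick pandas sanity check on a toy frame (hypothetical values, not the actual air-quality data) illustrates the pattern:

```python
import numpy as np
import pandas as pd

# Toy frame using the same -200 missing-value sentinel (hypothetical values)
toy = pd.DataFrame({
    "CO(GT)": [1.5, -200, 2.0],
    "RH": [28.3, 29.7, -200],
})

# Map the sentinel to NaN, as done for the air-quality data
toy = toy.replace(-200, np.nan)

# Fraction of rows with at least one missing value (2 of 3 rows here)
frac_missing_rows = toy.isna().any(axis=1).mean()
print(frac_missing_rows)
```

Running the same check on the real data should line up with the "Rows with missing values" figure reported in the setup summary below.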
FH = 48  # forecast horizon: 48 hours (data is hourly)
metric = "mase"
exclude = ["auto_arima", "bats", "tbats", "lar_cds_dt", "par_cds_dt"]
exp_auto = TSForecastingExperiment()
# enforce_exogenous=False --> Use multivariate forecasting when model supports it, else use univariate forecasting
exp_auto.setup(
data=data, target=target, fh=FH, enforce_exogenous=False,
numeric_imputation_target="ffill", numeric_imputation_exogenous="ffill",
fig_kwargs=global_fig_settings, session_id=42
)
  | Description | Value |
---|---|---|
0 | session_id | 42 |
1 | Target | CO(GT) |
2 | Approach | Univariate |
3 | Exogenous Variables | Present |
4 | Original data shape | (720, 4) |
5 | Transformed data shape | (720, 4) |
6 | Transformed train set shape | (672, 4) |
7 | Transformed test set shape | (48, 4) |
8 | Rows with missing values | 3.8% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [24, 23, 25, 2, 48, 22, 47, 49, 26, 3, 12, 11, 21, 13, 46, 10, 50] |
18 | Significant Seasonal Period(s) | [24, 23, 25, 2, 48, 22, 47, 49, 26, 3, 12, 11, 21, 13, 46, 10, 50] |
19 | Significant Seasonal Period(s) without Harmonics | [48, 46, 50, 22, 47, 49, 26, 21] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [24] |
24 | Primary Seasonality | 24 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 0 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | True |
31 | Numerical Imputation (Target) | ffill |
32 | Transformation (Target) | None |
33 | Scaling (Target) | None |
34 | Feature Engineering (Target) - Reduced Regression | False |
35 | Numerical Imputation (Exogenous) | ffill |
36 | Transformation (Exogenous) | None |
37 | Scaling (Exogenous) | None |
38 | CPU Jobs | -1 |
39 | Use GPU | False |
40 | Log Experiment | False |
41 | Experiment Name | ts-default-name |
42 | USI | d7b9 |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x27849347c40>
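The summary above reports an ExpandingWindowSplitter with 3 folds over the 672 transformed training rows (720 rows minus the 48-row holdout implied by `fh=48`). A simplified sketch of expanding-window cross-validation (an illustration only, not PyCaret's exact internals) shows how each fold trains on a growing prefix and validates on the next FH rows:

```python
# Simplified expanding-window CV sketch (illustration, not PyCaret's internals):
# each fold trains on an ever-growing prefix and validates on the next FH rows.
N_TRAIN = 672   # rows left after the 48-row holdout (720 - 48)
FH = 48         # forecast horizon = validation window per fold
N_FOLDS = 3

folds = []
for k in range(N_FOLDS, 0, -1):
    train_end = N_TRAIN - k * FH          # expanding training prefix
    folds.append((train_end, train_end + FH))

for train_end, val_end in folds:
    print(f"train rows [0, {train_end}) -> validate rows [{train_end}, {val_end})")
```

Each successive fold absorbs the previous validation window into its training prefix, which is why the splitter is called "expanding".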
# # Check available models ----
# exp_auto.models()
# Include slower models like Prophet (turbo=False), but exclude some specific models ----
best = exp_auto.compare_models(sort=metric, turbo=False, exclude=exclude)
  | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec) |
---|---|---|---|---|---|---|---|---|---|
arima | ARIMA | 0.2509 | 0.2302 | 0.1810 | 0.2449 | 0.1443 | 0.1523 | 0.8499 | 0.9367 |
prophet | Prophet | 0.3079 | 0.2647 | 0.2226 | 0.2823 | 0.1943 | 0.2027 | 0.8501 | 0.5800 |
br_cds_dt | Bayesian Ridge w/ Cond. Deseasonalize & Detrending | 0.8609 | 0.8199 | 0.6211 | 0.8730 | 0.4675 | 0.4450 | -0.7053 | 1.1967 |
ridge_cds_dt | Ridge w/ Cond. Deseasonalize & Detrending | 0.8647 | 0.8218 | 0.6238 | 0.8750 | 0.4714 | 0.4491 | -0.7106 | 1.2200 |
lr_cds_dt | Linear w/ Cond. Deseasonalize & Detrending | 0.8651 | 0.8220 | 0.6240 | 0.8752 | 0.4718 | 0.4494 | -0.7111 | 1.4067 |
snaive | Seasonal Naive Forecaster | 0.9672 | 0.9659 | 0.6972 | 1.0275 | 0.4645 | 0.3643 | -1.8616 | 1.3867 |
theta | Theta Forecaster | 0.9871 | 0.8962 | 0.7146 | 0.9574 | 0.4639 | 0.4412 | -0.2349 | 0.0600 |
en_cds_dt | Elastic Net w/ Cond. Deseasonalize & Detrending | 0.9948 | 0.9519 | 0.7181 | 1.0130 | 0.5099 | 0.3702 | -1.7375 | 1.3767 |
lasso_cds_dt | Lasso w/ Cond. Deseasonalize & Detrending | 1.0081 | 0.9598 | 0.7275 | 1.0213 | 0.5178 | 0.3741 | -1.8353 | 1.1533 |
llar_cds_dt | Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending | 1.0081 | 0.9598 | 0.7275 | 1.0213 | 0.5178 | 0.3741 | -1.8354 | 1.1967 |
omp_cds_dt | Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending | 1.0119 | 0.9608 | 0.7303 | 1.0223 | 0.5209 | 0.3738 | -1.8667 | 1.1067 |
huber_cds_dt | Huber w/ Cond. Deseasonalize & Detrending | 1.0141 | 0.9402 | 0.7314 | 1.0009 | 0.5600 | 0.4724 | -1.3001 | 1.1867 |
knn_cds_dt | K Neighbors w/ Cond. Deseasonalize & Detrending | 1.0397 | 0.9920 | 0.7505 | 1.0557 | 0.5770 | 0.4194 | -1.9501 | 1.1100 |
gbr_cds_dt | Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.0614 | 0.8726 | 0.7667 | 0.9294 | 0.6470 | 0.6835 | -0.7997 | 1.5100 |
naive | Naive Forecaster | 1.0848 | 0.9259 | 0.7861 | 0.9895 | 0.6125 | 0.5160 | -0.3784 | 2.0967 |
croston | Croston | 1.1033 | 0.9167 | 0.7966 | 0.9775 | 0.7744 | 0.5053 | -0.6474 | 0.0500 |
et_cds_dt | Extra Trees w/ Cond. Deseasonalize & Detrending | 1.1264 | 0.9404 | 0.8140 | 1.0019 | 0.6625 | 0.7238 | -1.0762 | 1.6300 |
rf_cds_dt | Random Forest w/ Cond. Deseasonalize & Detrending | 1.1411 | 0.9626 | 0.8240 | 1.0255 | 0.6866 | 0.6809 | -1.1643 | 1.9033 |
ada_cds_dt | AdaBoost w/ Cond. Deseasonalize & Detrending | 1.1815 | 0.9690 | 0.8545 | 1.0328 | 0.7225 | 0.7716 | -1.0861 | 1.3300 |
lightgbm_cds_dt | Light Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.1975 | 0.9889 | 0.8647 | 1.0531 | 0.7274 | 0.7735 | -1.4445 | 1.3067 |
grand_means | Grand Means Forecaster | 1.3017 | 1.0074 | 0.9392 | 1.0725 | 0.9892 | 0.5728 | -1.6494 | 1.4100 |
dt_cds_dt | Decision Tree w/ Cond. Deseasonalize & Detrending | 1.3115 | 1.1741 | 0.9486 | 1.2521 | 0.7326 | 0.7303 | -2.1020 | 1.1233 |
polytrend | Polynomial Trend Forecaster | 1.4885 | 1.1214 | 1.0745 | 1.1939 | 1.1644 | 0.6238 | -2.3456 | 0.0500 |
exp_smooth | Exponential Smoothing | 1.7959 | 1.6246 | 1.2987 | 1.7354 | 0.8998 | 0.5140 | -3.1090 | 0.1700 |
ets | ETS | 2.3059 | 2.1201 | 1.6615 | 2.2557 | 1.2981 | 0.6474 | -10.2278 | 1.7233 |
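The models above are ranked by MASE, which scales the forecast's mean absolute error by the in-sample MAE of a naive forecast; values below 1 beat the naive baseline. A minimal numpy sketch with made-up numbers (not taken from this experiment):

```python
import numpy as np

def mase(y_train, y_true, y_pred, sp=1):
    """Mean Absolute Scaled Error: MAE of the forecast divided by the
    in-sample MAE of a (seasonal-)naive forecast with period sp."""
    naive_mae = np.mean(np.abs(y_train[sp:] - y_train[:-sp]))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

# Made-up series for illustration
y_train = np.array([1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
y_true = np.array([1.0, 2.0])
y_pred = np.array([1.2, 1.8])

print(mase(y_train, y_true, y_pred))
```

Because the scaling uses only training data, MASE is comparable across series with different units, which is why it is a common default for ranking forecasters.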
exp_auto.plot_model(best)
final_auto_model = exp_auto.finalize_model(best)
def safe_predict(exp, model):
    """Prediction wrapper for demo purposes."""
    try:
        future_preds = exp.predict_model(model)
    except ValueError as exception:
        print(exception)
        exo_vars = exp.exogenous_variables
        print(f"{len(exo_vars)} exogenous variables (X) needed in order to make future predictions:\n{exo_vars}")

        # Step 1: Model and finalize each exogenous variable individually ----
        exog_exps = []
        exog_models = []
        for exog_var in exo_vars:
            exog_exp = TSForecastingExperiment()
            exog_exp.setup(
                data=data[exog_var], fh=FH,
                numeric_imputation_target="ffill", numeric_imputation_exogenous="ffill",
                fig_kwargs=global_fig_settings, session_id=42
            )

            # Users can customize how future exogenous variables are modeled, i.e. add
            # more steps and models to potentially get better models at the expense
            # of higher modeling time.
            best = exog_exp.compare_models(
                sort=metric, include=["arima", "ets", "exp_smooth", "theta", "lightgbm_cds_dt"]
            )
            final_exog_model = exog_exp.finalize_model(best)

            exog_exps.append(exog_exp)
            exog_models.append(final_exog_model)

        # Step 2: Get future predictions for the exog variables ----
        future_exog = [
            exog_exp.predict_model(exog_model)
            for exog_exp, exog_model in zip(exog_exps, exog_models)
        ]
        future_exog = pd.concat(future_exog, axis=1)
        future_exog.columns = exo_vars

        # Step 3: Use the future exog values to forecast the target ----
        future_preds = exp.predict_model(model, X=future_exog)

    return future_preds
future_preds = safe_predict(exp_auto, final_auto_model)
future_preds.plot()
Model was trained with exogenous variables but you have not passed any for predictions. Please pass exogenous variables to make predictions.
3 exogenous variables (X) needed in order to make future predictions:
['NOx(GT)', 'PT08.S3(NOx)', 'RH']
  | Description | Value |
---|---|---|
0 | session_id | 42 |
1 | Target | NOx(GT) |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (720, 1) |
5 | Transformed data shape | (720, 1) |
6 | Transformed train set shape | (672, 1) |
7 | Transformed test set shape | (48, 1) |
8 | Rows with missing values | 0.8% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [24, 48, 23, 25, 47, 49, 13, 12, 36, 11, 35, 60] |
18 | Significant Seasonal Period(s) | [24, 48, 23, 25, 47, 49, 13, 12, 36, 11, 35, 60] |
19 | Significant Seasonal Period(s) without Harmonics | [48, 23, 25, 47, 49, 13, 60, 36, 11, 35] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [24] |
24 | Primary Seasonality | 24 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | True |
31 | Numerical Imputation (Target) | ffill |
32 | Transformation (Target) | None |
33 | Scaling (Target) | None |
34 | Feature Engineering (Target) - Reduced Regression | False |
35 | CPU Jobs | -1 |
36 | Use GPU | False |
37 | Log Experiment | False |
38 | Experiment Name | ts-default-name |
39 | USI | 95be |
  | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec) |
---|---|---|---|---|---|---|---|---|---|
arima | ARIMA | 0.8406 | 0.9158 | 87.3689 | 133.0642 | 0.4273 | 0.3443 | -1.3072 | 0.3200 |
exp_smooth | Exponential Smoothing | 0.8954 | 0.8400 | 93.0132 | 121.9760 | 0.4828 | 0.5917 | -0.9311 | 0.1533 |
theta | Theta Forecaster | 1.0279 | 0.9437 | 107.4620 | 137.6886 | 0.5192 | 0.4990 | -0.4072 | 0.0600 |
lightgbm_cds_dt | Light Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.2021 | 1.1033 | 124.6215 | 160.1069 | 0.6995 | 0.5078 | -2.5717 | 1.1700 |
ets | ETS | 1.6466 | 1.5514 | 171.0757 | 225.4193 | 0.9284 | 0.5548 | -4.3206 | 1.7100 |
  | Description | Value |
---|---|---|
0 | session_id | 42 |
1 | Target | PT08.S3(NOx) |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (720, 1) |
5 | Transformed data shape | (720, 1) |
6 | Transformed train set shape | (672, 1) |
7 | Transformed test set shape | (48, 1) |
8 | Rows with missing values | 0.1% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [24, 48, 25, 23, 47, 49, 12, 36, 11] |
18 | Significant Seasonal Period(s) | [24, 48, 25, 23, 47, 49, 12, 36, 11] |
19 | Significant Seasonal Period(s) without Harmonics | [48, 25, 23, 47, 49, 36, 11] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [24] |
24 | Primary Seasonality | 24 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | True |
31 | Numerical Imputation (Target) | ffill |
32 | Transformation (Target) | None |
33 | Scaling (Target) | None |
34 | Feature Engineering (Target) - Reduced Regression | False |
35 | CPU Jobs | -1 |
36 | Use GPU | False |
37 | Log Experiment | False |
38 | Experiment Name | ts-default-name |
39 | USI | 7c62 |
  | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec) |
---|---|---|---|---|---|---|---|---|---|
exp_smooth | Exponential Smoothing | 1.2435 | 1.2056 | 126.5383 | 158.9241 | 0.1738 | 0.1695 | -0.0211 | 0.1600 |
ets | ETS | 1.3630 | 1.3140 | 138.7259 | 173.2545 | 0.1906 | 0.1879 | -0.2091 | 0.7067 |
theta | Theta Forecaster | 1.3716 | 1.3079 | 139.5929 | 172.4272 | 0.1909 | 0.1878 | -0.1963 | 0.0500 |
arima | ARIMA | 1.3929 | 1.3245 | 141.6775 | 174.6211 | 0.1792 | 0.1953 | -0.3985 | 0.4100 |
lightgbm_cds_dt | Light Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.6778 | 1.5491 | 170.7442 | 204.2588 | 0.2197 | 0.2541 | -0.7666 | 1.1667 |
  | Description | Value |
---|---|---|
0 | session_id | 42 |
1 | Target | RH |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (720, 1) |
5 | Transformed data shape | (720, 1) |
6 | Transformed train set shape | (672, 1) |
7 | Transformed test set shape | (48, 1) |
8 | Rows with missing values | 0.1% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [2, 3, 24, 23, 25, 22, 4, 26, 21, 48, 47, 49, 46, 5, 27, 50, 20, 45, 51, 28, 19, 6, 44, 52] |
18 | Significant Seasonal Period(s) | [2, 3, 24, 23, 25, 22, 4, 26, 21, 48, 47, 49, 46, 5, 27, 50, 20, 45, 51, 28, 19, 6, 44, 52] |
19 | Significant Seasonal Period(s) without Harmonics | [52, 51, 48, 46, 50, 44, 21, 47, 49, 27, 20, 45, 28, 19] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [2] |
24 | Primary Seasonality | 2 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 0 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | True |
31 | Numerical Imputation (Target) | ffill |
32 | Transformation (Target) | None |
33 | Scaling (Target) | None |
34 | Feature Engineering (Target) - Reduced Regression | False |
35 | CPU Jobs | -1 |
36 | Use GPU | False |
37 | Log Experiment | False |
38 | Experiment Name | ts-default-name |
39 | USI | 9d33 |
  | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec) |
---|---|---|---|---|---|---|---|---|---|
theta | Theta Forecaster | 1.6218 | 1.4765 | 11.3749 | 13.1578 | 0.2481 | 0.2286 | -0.0585 | 0.0500 |
arima | ARIMA | 1.8001 | 1.6165 | 12.6310 | 14.4108 | 0.2548 | 0.2523 | -0.2695 | 0.0800 |
lightgbm_cds_dt | Light Gradient Boosting w/ Cond. Deseasonalize & Detrending | 2.7797 | 2.6115 | 19.4667 | 23.2519 | 0.5241 | 0.3609 | -4.8711 | 0.5633 |
exp_smooth | Exponential Smoothing | 5.2972 | 4.7592 | 37.2423 | 42.4918 | 0.7188 | 0.9298 | -10.5261 | 0.1400 |
ets | ETS | 5.3235 | 4.7812 | 37.4259 | 42.6872 | 0.7228 | 0.9349 | -10.5911 | 0.1233 |
(Plot of the future predictions for the target CO(GT))
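The key move in `safe_predict` is forecasting each exogenous series independently and then aligning the per-variable predictions column-wise before calling `predict_model` on the target. The alignment step is plain pandas; a toy illustration with made-up forecast values (the `y_pred` column name mirrors PyCaret's prediction output, but the numbers are invented):

```python
import pandas as pd

# Made-up per-variable future forecasts, indexed by the same future timestamps
idx = pd.to_datetime(["2005-04-04 15:00", "2005-04-04 16:00", "2005-04-04 17:00"])
pred_nox = pd.DataFrame({"y_pred": [250.0, 260.0, 240.0]}, index=idx)
pred_s3 = pd.DataFrame({"y_pred": [700.0, 690.0, 710.0]}, index=idx)
pred_rh = pd.DataFrame({"y_pred": [30.0, 32.0, 31.0]}, index=idx)

# Column-wise concat, then rename to the original exogenous variable names
future_exog = pd.concat([pred_nox, pred_s3, pred_rh], axis=1)
future_exog.columns = ["NOx(GT)", "PT08.S3(NOx)", "RH"]
print(future_exog.shape)  # (3, 3): one row per future timestamp, one column per exog variable
```

Because all three forecasts share the same future index, `pd.concat(..., axis=1)` lines them up row-for-row, producing exactly the `X` frame the target model expects.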