Now that we have gone through the manual process of modeling our dataset, let's see if we can replicate it with an automated workflow. As a reminder, the plan of action is to set up the experiment, compare candidate models (using exogenous variables where supported), finalize the best one, and generate future predictions.
# Only enable critical logging (Optional)
import os
os.environ["PYCARET_CUSTOM_LOGGING_LEVEL"] = "CRITICAL"

def what_is_installed():
    from pycaret import show_versions
    show_versions()

try:
    what_is_installed()
except ModuleNotFoundError:
    !pip install pycaret
    what_is_installed()
System:
    python: 3.9.16 (main, Jan 11 2023, 16:16:36) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\Nikhil\.conda\envs\pycaret_dev_sktime_16p1\python.exe
    machine: Windows-10-10.0.19044-SP0

PyCaret required dependencies:
    pip: 22.3.1, setuptools: 65.6.3, pycaret: 3.0.0rc9, IPython: 8.10.0,
    ipywidgets: 8.0.4, tqdm: 4.64.1, numpy: 1.23.5, pandas: 1.5.3,
    jinja2: 3.1.2, scipy: 1.10.0, joblib: 1.2.0, sklearn: 1.2.1,
    pyod: 1.0.8, imblearn: 0.10.1, category_encoders: 2.6.0, lightgbm: 3.3.5,
    numba: 0.56.4, requests: 2.28.2, matplotlib: 3.7.0, scikitplot: 0.3.7,
    yellowbrick: 1.5, plotly: 5.13.0, kaleido: 0.2.1, statsmodels: 0.13.5,
    sktime: 0.16.1, tbats: 1.1.2, pmdarima: 2.0.2, psutil: 5.9.4

PyCaret optional dependencies:
    shap: 0.41.0, mlflow: 2.1.1, fugue: 0.8.0, prophet: 1.1.2
    Not installed: interpret, umap, pandas_profiling, explainerdashboard,
    autoviz, fairlearn, xgboost, catboost, kmodes, mlxtend, statsforecast,
    tune_sklearn, ray, hyperopt, optuna, skopt, gradio, fastapi, uvicorn,
    m2cgen, evidently, streamlit
import numpy as np
import pandas as pd
from pycaret.datasets import get_data
from pycaret.time_series import TSForecastingExperiment
# Global Figure Settings for notebook ----
# Depending on whether you are using Jupyter Notebook, JupyterLab, or Google Colab, you may have to set the renderer appropriately.
# NOTE: Setting to a static renderer here so that the notebook saved size is reduced.
global_fig_settings = {
# "renderer": "notebook",
"renderer": "png",
"width": 1000,
"height": 600,
}
data = get_data("airquality", verbose=False)
# Limiting the data for demonstration purposes.
data = data.iloc[-720:]
data["index"] = pd.to_datetime(data["Date"] + " " + data["Time"])
data.drop(columns=["Date", "Time"], inplace=True)
data.replace(-200, np.nan, inplace=True)
data.set_index("index", inplace=True)
target = "CO(GT)"
exog_vars = ['NOx(GT)', 'PT08.S3(NOx)', 'RH']
include = [target] + exog_vars
data = data[include]
data.head()
index | CO(GT) | NOx(GT) | PT08.S3(NOx) | RH |
---|---|---|---|---|
2005-03-05 15:00:00 | 1.5 | 180.0 | 820.0 | 28.3 |
2005-03-05 16:00:00 | 1.8 | 255.0 | 751.0 | 29.7 |
2005-03-05 17:00:00 | 2.0 | 251.0 | 721.0 | 38.7 |
2005-03-05 18:00:00 | 1.9 | 258.0 | 695.0 | 56.3 |
2005-03-05 19:00:00 | 2.5 | 344.0 | 654.0 | 57.9 |
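In this dataset, -200 is the sentinel value for missing readings, which the preprocessing above maps to NaN before imputation. A quick pandas sanity check on a toy frame (hypothetical values, not the actual air-quality data) illustrates the pattern:

```python
import numpy as np
import pandas as pd

# Toy frame using the same -200 missing-value sentinel (hypothetical values)
toy = pd.DataFrame({
    "CO(GT)": [1.5, -200, 2.0],
    "RH": [28.3, 29.7, -200],
})

# Map the sentinel to NaN, as done for the air-quality data
toy = toy.replace(-200, np.nan)

# Fraction of rows with at least one missing value (2 of 3 rows here)
frac_missing_rows = toy.isna().any(axis=1).mean()
print(frac_missing_rows)
```

Running the same check on the real data should line up with the "Rows with missing values" figure reported in the setup summary below.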
FH = 48  # forecast horizon: 48 hours (data is hourly)
metric = "mase"
exclude = ["auto_arima", "bats", "tbats", "lar_cds_dt", "par_cds_dt"]
exp_auto = TSForecastingExperiment()
# enforce_exogenous=False --> Use multivariate forecasting when model supports it, else use univariate forecasting
exp_auto.setup(
data=data, target=target, fh=FH, enforce_exogenous=False,
numeric_imputation_target="ffill", numeric_imputation_exogenous="ffill",
fig_kwargs=global_fig_settings, session_id=42
)
  | Description | Value |
---|---|---|
0 | session_id | 42 |
1 | Target | CO(GT) |
2 | Approach | Univariate |
3 | Exogenous Variables | Present |
4 | Original data shape | (720, 4) |
5 | Transformed data shape | (720, 4) |
6 | Transformed train set shape | (672, 4) |
7 | Transformed test set shape | (48, 4) |
8 | Rows with missing values | 3.8% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [24, 23, 25, 2, 48, 22, 47, 49, 26, 3, 12, 11, 21, 13, 46, 10, 50] |
18 | Significant Seasonal Period(s) | [24, 23, 25, 2, 48, 22, 47, 49, 26, 3, 12, 11, 21, 13, 46, 10, 50] |
19 | Significant Seasonal Period(s) without Harmonics | [48, 46, 50, 22, 47, 49, 26, 21] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [24] |
24 | Primary Seasonality | 24 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 0 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | True |
31 | Numerical Imputation (Target) | ffill |
32 | Transformation (Target) | None |
33 | Scaling (Target) | None |
34 | Feature Engineering (Target) - Reduced Regression | False |
35 | Numerical Imputation (Exogenous) | ffill |
36 | Transformation (Exogenous) | None |
37 | Scaling (Exogenous) | None |
38 | CPU Jobs | -1 |
39 | Use GPU | False |
40 | Log Experiment | False |
41 | Experiment Name | ts-default-name |
42 | USI | d7b9 |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x27849347c40>
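The summary above reports an ExpandingWindowSplitter with 3 folds over the 672 transformed training rows (720 rows minus the 48-row holdout implied by `fh=48`). A simplified sketch of expanding-window cross-validation (an illustration only, not PyCaret's exact internals) shows how each fold trains on a growing prefix and validates on the next FH rows:

```python
# Simplified expanding-window CV sketch (illustration, not PyCaret's internals):
# each fold trains on an ever-growing prefix and validates on the next FH rows.
N_TRAIN = 672   # rows left after the 48-row holdout (720 - 48)
FH = 48         # forecast horizon = validation window per fold
N_FOLDS = 3

folds = []
for k in range(N_FOLDS, 0, -1):
    train_end = N_TRAIN - k * FH          # expanding training prefix
    folds.append((train_end, train_end + FH))

for train_end, val_end in folds:
    print(f"train rows [0, {train_end}) -> validate rows [{train_end}, {val_end})")
```

Each successive fold absorbs the previous validation window into its training prefix, which is why the splitter is called "expanding".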
# # Check available models ----
# exp_auto.models()
# Include slower models like Prophet (turbo=False), but exclude some specific models ----
best = exp_auto.compare_models(sort=metric, turbo=False, exclude=exclude)
  | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec) |
---|---|---|---|---|---|---|---|---|---|
arima | ARIMA | 0.2509 | 0.2302 | 0.1810 | 0.2449 | 0.1443 | 0.1523 | 0.8499 | 0.9367 |
prophet | Prophet | 0.3079 | 0.2647 | 0.2226 | 0.2823 | 0.1943 | 0.2027 | 0.8501 | 0.5800 |
br_cds_dt | Bayesian Ridge w/ Cond. Deseasonalize & Detrending | 0.8609 | 0.8199 | 0.6211 | 0.8730 | 0.4675 | 0.4450 | -0.7053 | 1.1967 |
ridge_cds_dt | Ridge w/ Cond. Deseasonalize & Detrending | 0.8647 | 0.8218 | 0.6238 | 0.8750 | 0.4714 | 0.4491 | -0.7106 | 1.2200 |
lr_cds_dt | Linear w/ Cond. Deseasonalize & Detrending | 0.8651 | 0.8220 | 0.6240 | 0.8752 | 0.4718 | 0.4494 | -0.7111 | 1.4067 |
snaive | Seasonal Naive Forecaster | 0.9672 | 0.9659 | 0.6972 | 1.0275 | 0.4645 | 0.3643 | -1.8616 | 1.3867 |
theta | Theta Forecaster | 0.9871 | 0.8962 | 0.7146 | 0.9574 | 0.4639 | 0.4412 | -0.2349 | 0.0600 |
en_cds_dt | Elastic Net w/ Cond. Deseasonalize & Detrending | 0.9948 | 0.9519 | 0.7181 | 1.0130 | 0.5099 | 0.3702 | -1.7375 | 1.3767 |
lasso_cds_dt | Lasso w/ Cond. Deseasonalize & Detrending | 1.0081 | 0.9598 | 0.7275 | 1.0213 | 0.5178 | 0.3741 | -1.8353 | 1.1533 |
llar_cds_dt | Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending | 1.0081 | 0.9598 | 0.7275 | 1.0213 | 0.5178 | 0.3741 | -1.8354 | 1.1967 |
omp_cds_dt | Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending | 1.0119 | 0.9608 | 0.7303 | 1.0223 | 0.5209 | 0.3738 | -1.8667 | 1.1067 |
huber_cds_dt | Huber w/ Cond. Deseasonalize & Detrending | 1.0141 | 0.9402 | 0.7314 | 1.0009 | 0.5600 | 0.4724 | -1.3001 | 1.1867 |
knn_cds_dt | K Neighbors w/ Cond. Deseasonalize & Detrending | 1.0397 | 0.9920 | 0.7505 | 1.0557 | 0.5770 | 0.4194 | -1.9501 | 1.1100 |
gbr_cds_dt | Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.0614 | 0.8726 | 0.7667 | 0.9294 | 0.6470 | 0.6835 | -0.7997 | 1.5100 |
naive | Naive Forecaster | 1.0848 | 0.9259 | 0.7861 | 0.9895 | 0.6125 | 0.5160 | -0.3784 | 2.0967 |
croston | Croston | 1.1033 | 0.9167 | 0.7966 | 0.9775 | 0.7744 | 0.5053 | -0.6474 | 0.0500 |
et_cds_dt | Extra Trees w/ Cond. Deseasonalize & Detrending | 1.1264 | 0.9404 | 0.8140 | 1.0019 | 0.6625 | 0.7238 | -1.0762 | 1.6300 |
rf_cds_dt | Random Forest w/ Cond. Deseasonalize & Detrending | 1.1411 | 0.9626 | 0.8240 | 1.0255 | 0.6866 | 0.6809 | -1.1643 | 1.9033 |
ada_cds_dt | AdaBoost w/ Cond. Deseasonalize & Detrending | 1.1815 | 0.9690 | 0.8545 | 1.0328 | 0.7225 | 0.7716 | -1.0861 | 1.3300 |
lightgbm_cds_dt | Light Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.1975 | 0.9889 | 0.8647 | 1.0531 | 0.7274 | 0.7735 | -1.4445 | 1.3067 |
grand_means | Grand Means Forecaster | 1.3017 | 1.0074 | 0.9392 | 1.0725 | 0.9892 | 0.5728 | -1.6494 | 1.4100 |
dt_cds_dt | Decision Tree w/ Cond. Deseasonalize & Detrending | 1.3115 | 1.1741 | 0.9486 | 1.2521 | 0.7326 | 0.7303 | -2.1020 | 1.1233 |
polytrend | Polynomial Trend Forecaster | 1.4885 | 1.1214 | 1.0745 | 1.1939 | 1.1644 | 0.6238 | -2.3456 | 0.0500 |
exp_smooth | Exponential Smoothing | 1.7959 | 1.6246 | 1.2987 | 1.7354 | 0.8998 | 0.5140 | -3.1090 | 0.1700 |
ets | ETS | 2.3059 | 2.1201 | 1.6615 | 2.2557 | 1.2981 | 0.6474 | -10.2278 | 1.7233 |
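The models above are ranked by MASE, which scales the forecast's mean absolute error by the in-sample MAE of a naive forecast; values below 1 beat the naive baseline. A minimal numpy sketch with made-up numbers (not taken from this experiment):

```python
import numpy as np

def mase(y_train, y_true, y_pred, sp=1):
    """Mean Absolute Scaled Error: MAE of the forecast divided by the
    in-sample MAE of a (seasonal-)naive forecast with period sp."""
    naive_mae = np.mean(np.abs(y_train[sp:] - y_train[:-sp]))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

# Made-up series for illustration
y_train = np.array([1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
y_true = np.array([1.0, 2.0])
y_pred = np.array([1.2, 1.8])

print(mase(y_train, y_true, y_pred))
```

Because the scaling uses only training data, MASE is comparable across series with different units, which is why it is a common default for ranking forecasters.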
exp_auto.plot_model(best)
final_auto_model = exp_auto.finalize_model(best)
def safe_predict(exp, model):
    """Prediction wrapper for demo purposes."""
    try:
        future_preds = exp.predict_model(model)
    except ValueError as exception:
        print(exception)
        exo_vars = exp.exogenous_variables
        print(f"{len(exo_vars)} exogenous variables (X) needed in order to make future predictions:\n{exo_vars}")

        # Step 1: Model and finalize each exogenous variable individually ----
        exog_exps = []
        exog_models = []
        for exog_var in exo_vars:
            exog_exp = TSForecastingExperiment()
            exog_exp.setup(
                data=data[exog_var], fh=FH,
                numeric_imputation_target="ffill", numeric_imputation_exogenous="ffill",
                fig_kwargs=global_fig_settings, session_id=42
            )

            # Users can customize how future exogenous variables are modeled, i.e. add
            # more steps and models to potentially get better models at the expense
            # of higher modeling time.
            best = exog_exp.compare_models(
                sort=metric, include=["arima", "ets", "exp_smooth", "theta", "lightgbm_cds_dt"]
            )
            final_exog_model = exog_exp.finalize_model(best)

            exog_exps.append(exog_exp)
            exog_models.append(final_exog_model)

        # Step 2: Get future predictions for the exog variables ----
        future_exog = [
            exog_exp.predict_model(exog_model)
            for exog_exp, exog_model in zip(exog_exps, exog_models)
        ]
        future_exog = pd.concat(future_exog, axis=1)
        future_exog.columns = exo_vars

        # Step 3: Use the future exog values to forecast the target ----
        future_preds = exp.predict_model(model, X=future_exog)

    return future_preds
future_preds = safe_predict(exp_auto, final_auto_model)
future_preds.plot()
Model was trained with exogenous variables but you have not passed any for predictions. Please pass exogenous variables to make predictions.
3 exogenous variables (X) needed in order to make future predictions:
['NOx(GT)', 'PT08.S3(NOx)', 'RH']
  | Description | Value |
---|---|---|
0 | session_id | 42 |
1 | Target | NOx(GT) |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (720, 1) |
5 | Transformed data shape | (720, 1) |
6 | Transformed train set shape | (672, 1) |
7 | Transformed test set shape | (48, 1) |
8 | Rows with missing values | 0.8% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [24, 48, 23, 25, 47, 49, 13, 12, 36, 11, 35, 60] |
18 | Significant Seasonal Period(s) | [24, 48, 23, 25, 47, 49, 13, 12, 36, 11, 35, 60] |
19 | Significant Seasonal Period(s) without Harmonics | [48, 23, 25, 47, 49, 13, 60, 36, 11, 35] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [24] |
24 | Primary Seasonality | 24 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | True |
31 | Numerical Imputation (Target) | ffill |
32 | Transformation (Target) | None |
33 | Scaling (Target) | None |
34 | Feature Engineering (Target) - Reduced Regression | False |
35 | CPU Jobs | -1 |
36 | Use GPU | False |
37 | Log Experiment | False |
38 | Experiment Name | ts-default-name |
39 | USI | 95be |
  | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec) |
---|---|---|---|---|---|---|---|---|---|
arima | ARIMA | 0.8406 | 0.9158 | 87.3689 | 133.0642 | 0.4273 | 0.3443 | -1.3072 | 0.3200 |
exp_smooth | Exponential Smoothing | 0.8954 | 0.8400 | 93.0132 | 121.9760 | 0.4828 | 0.5917 | -0.9311 | 0.1533 |
theta | Theta Forecaster | 1.0279 | 0.9437 | 107.4620 | 137.6886 | 0.5192 | 0.4990 | -0.4072 | 0.0600 |
lightgbm_cds_dt | Light Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.2021 | 1.1033 | 124.6215 | 160.1069 | 0.6995 | 0.5078 | -2.5717 | 1.1700 |
ets | ETS | 1.6466 | 1.5514 | 171.0757 | 225.4193 | 0.9284 | 0.5548 | -4.3206 | 1.7100 |
  | Description | Value |
---|---|---|
0 | session_id | 42 |
1 | Target | PT08.S3(NOx) |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (720, 1) |
5 | Transformed data shape | (720, 1) |
6 | Transformed train set shape | (672, 1) |
7 | Transformed test set shape | (48, 1) |
8 | Rows with missing values | 0.1% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [24, 48, 25, 23, 47, 49, 12, 36, 11] |
18 | Significant Seasonal Period(s) | [24, 48, 25, 23, 47, 49, 12, 36, 11] |
19 | Significant Seasonal Period(s) without Harmonics | [48, 25, 23, 47, 49, 36, 11] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [24] |
24 | Primary Seasonality | 24 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | True |
31 | Numerical Imputation (Target) | ffill |
32 | Transformation (Target) | None |
33 | Scaling (Target) | None |
34 | Feature Engineering (Target) - Reduced Regression | False |
35 | CPU Jobs | -1 |
36 | Use GPU | False |
37 | Log Experiment | False |
38 | Experiment Name | ts-default-name |
39 | USI | 7c62 |
  | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec) |
---|---|---|---|---|---|---|---|---|---|
exp_smooth | Exponential Smoothing | 1.2435 | 1.2056 | 126.5383 | 158.9241 | 0.1738 | 0.1695 | -0.0211 | 0.1600 |
ets | ETS | 1.3630 | 1.3140 | 138.7259 | 173.2545 | 0.1906 | 0.1879 | -0.2091 | 0.7067 |
theta | Theta Forecaster | 1.3716 | 1.3079 | 139.5929 | 172.4272 | 0.1909 | 0.1878 | -0.1963 | 0.0500 |
arima | ARIMA | 1.3929 | 1.3245 | 141.6775 | 174.6211 | 0.1792 | 0.1953 | -0.3985 | 0.4100 |
lightgbm_cds_dt | Light Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.6778 | 1.5491 | 170.7442 | 204.2588 | 0.2197 | 0.2541 | -0.7666 | 1.1667 |
  | Description | Value |
---|---|---|
0 | session_id | 42 |
1 | Target | RH |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (720, 1) |
5 | Transformed data shape | (720, 1) |
6 | Transformed train set shape | (672, 1) |
7 | Transformed test set shape | (48, 1) |
8 | Rows with missing values | 0.1% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [2, 3, 24, 23, 25, 22, 4, 26, 21, 48, 47, 49, 46, 5, 27, 50, 20, 45, 51, 28, 19, 6, 44, 52] |
18 | Significant Seasonal Period(s) | [2, 3, 24, 23, 25, 22, 4, 26, 21, 48, 47, 49, 46, 5, 27, 50, 20, 45, 51, 28, 19, 6, 44, 52] |
19 | Significant Seasonal Period(s) without Harmonics | [52, 51, 48, 46, 50, 44, 21, 47, 49, 27, 20, 45, 28, 19] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [2] |
24 | Primary Seasonality | 2 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 0 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | True |
31 | Numerical Imputation (Target) | ffill |
32 | Transformation (Target) | None |
33 | Scaling (Target) | None |
34 | Feature Engineering (Target) - Reduced Regression | False |
35 | CPU Jobs | -1 |
36 | Use GPU | False |
37 | Log Experiment | False |
38 | Experiment Name | ts-default-name |
39 | USI | 9d33 |
  | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec) |
---|---|---|---|---|---|---|---|---|---|
theta | Theta Forecaster | 1.6218 | 1.4765 | 11.3749 | 13.1578 | 0.2481 | 0.2286 | -0.0585 | 0.0500 |
arima | ARIMA | 1.8001 | 1.6165 | 12.6310 | 14.4108 | 0.2548 | 0.2523 | -0.2695 | 0.0800 |
lightgbm_cds_dt | Light Gradient Boosting w/ Cond. Deseasonalize & Detrending | 2.7797 | 2.6115 | 19.4667 | 23.2519 | 0.5241 | 0.3609 | -4.8711 | 0.5633 |
exp_smooth | Exponential Smoothing | 5.2972 | 4.7592 | 37.2423 | 42.4918 | 0.7188 | 0.9298 | -10.5261 | 0.1400 |
ets | ETS | 5.3235 | 4.7812 | 37.4259 | 42.6872 | 0.7228 | 0.9349 | -10.5911 | 0.1233 |
(Plot of the future predictions for the target CO(GT))
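The key move in `safe_predict` is forecasting each exogenous series independently and then aligning the per-variable predictions column-wise before calling `predict_model` on the target. The alignment step is plain pandas; a toy illustration with made-up forecast values (the `y_pred` column name mirrors PyCaret's prediction output, but the numbers are invented):

```python
import pandas as pd

# Made-up per-variable future forecasts, indexed by the same future timestamps
idx = pd.to_datetime(["2005-04-04 15:00", "2005-04-04 16:00", "2005-04-04 17:00"])
pred_nox = pd.DataFrame({"y_pred": [250.0, 260.0, 240.0]}, index=idx)
pred_s3 = pd.DataFrame({"y_pred": [700.0, 690.0, 710.0]}, index=idx)
pred_rh = pd.DataFrame({"y_pred": [30.0, 32.0, 31.0]}, index=idx)

# Column-wise concat, then rename to the original exogenous variable names
future_exog = pd.concat([pred_nox, pred_s3, pred_rh], axis=1)
future_exog.columns = ["NOx(GT)", "PT08.S3(NOx)", "RH"]
print(future_exog.shape)  # (3, 3): one row per future timestamp, one column per exog variable
```

Because all three forecasts share the same future index, `pd.concat(..., axis=1)` lines them up row-for-row, producing exactly the `X` frame the target model expects.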