In the 10x series of notebooks, we look at time series modeling in pycaret using univariate data with no exogenous variables. We will use the famous airline dataset for illustration. Our plan of action: set up the experiment, customize predictions and plots, explore cross-validation strategies, and control seasonal period detection.
# Only enable critical logging (Optional)
import os
os.environ["PYCARET_CUSTOM_LOGGING_LEVEL"] = "CRITICAL"
def what_is_installed():
    from pycaret import show_versions
    show_versions()

try:
    what_is_installed()
except ModuleNotFoundError:
    !pip install pycaret
    what_is_installed()
System:
python: 3.9.16 (main, Jan 11 2023, 16:16:36) [MSC v.1916 64 bit (AMD64)] executable: C:\Users\Nikhil\.conda\envs\pycaret_dev_sktime_16p1\python.exe machine: Windows-10-10.0.19044-SP0

PyCaret required dependencies:
pip: 22.3.1 setuptools: 65.6.3 pycaret: 3.0.0rc9 IPython: 8.10.0 ipywidgets: 8.0.4 tqdm: 4.64.1 numpy: 1.23.5 pandas: 1.5.3 jinja2: 3.1.2 scipy: 1.10.0 joblib: 1.2.0 sklearn: 1.2.1 pyod: 1.0.8 imblearn: 0.10.1 category_encoders: 2.6.0 lightgbm: 3.3.5 numba: 0.56.4 requests: 2.28.2 matplotlib: 3.7.0 scikitplot: 0.3.7 yellowbrick: 1.5 plotly: 5.13.0 kaleido: 0.2.1 statsmodels: 0.13.5 sktime: 0.16.1 tbats: 1.1.2 pmdarima: 2.0.2 psutil: 5.9.4

PyCaret optional dependencies:
shap: 0.41.0 interpret: Not installed umap: Not installed pandas_profiling: Not installed explainerdashboard: Not installed autoviz: Not installed fairlearn: Not installed xgboost: Not installed catboost: Not installed kmodes: Not installed mlxtend: Not installed statsforecast: Not installed tune_sklearn: Not installed ray: Not installed hyperopt: Not installed optuna: Not installed skopt: Not installed mlflow: 2.1.1 gradio: Not installed fastapi: Not installed uvicorn: Not installed m2cgen: Not installed evidently: Not installed fugue: 0.8.0 streamlit: Not installed prophet: 1.1.2
import time
import numpy as np
import pandas as pd
from pycaret.datasets import get_data
from pycaret.time_series import TSForecastingExperiment
y = get_data('airline', verbose=False)
# We want to forecast the next 12 months of data and will use 3-fold cross-validation to evaluate the models.
fh = 12 # or alternately fh = np.arange(1,13)
fold = 3
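An integer `fh` is shorthand for forecasting every step from 1 through `fh` ahead; the explicit array form is equivalent, and also lets you request non-contiguous horizons. A quick sketch of the equivalence:

```python
import numpy as np

# fh=12 is shorthand for forecasting steps 1 through 12 ahead;
# an explicit array lets you request non-contiguous horizons too.
fh_int = 12
fh_arr = np.arange(1, 13)
```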
# Global figure settings for the notebook ----
# Depending on whether you are using Jupyter Notebook, Jupyter Lab, or Google Colab,
# you may have to set the renderer appropriately.
# NOTE: Using a static renderer here so that the saved notebook size is reduced.
fig_kwargs = {
    # "renderer": "notebook",
    "renderer": "png",
    "width": 1000,
    "height": 400,
}
Let's look at how users can customize various steps in the modeling process.
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, session_id=42, verbose=False)
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1ffcb6085e0>
model = exp.create_model("arima")
cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | |
---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.4462 | 0.4933 | 13.0286 | 16.1485 | 0.0327 | 0.0334 | 0.9151 |
1 | 1957-12 | 0.5983 | 0.5993 | 18.2920 | 20.3442 | 0.0506 | 0.0491 | 0.8916 |
2 | 1958-12 | 1.0044 | 0.9280 | 28.6999 | 30.1669 | 0.0671 | 0.0697 | 0.7964 |
Mean | NaT | 0.6830 | 0.6735 | 20.0069 | 22.2199 | 0.0501 | 0.0507 | 0.8677 |
SD | NaT | 0.2356 | 0.1851 | 6.5117 | 5.8746 | 0.0141 | 0.0148 | 0.0513 |
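MASE in the table above scales the forecast MAE by the in-sample MAE of a seasonal naive forecast (y[t] = y[t - sp]); values below 1 mean the model beats that naive baseline. A minimal pure-Python sketch of the definition (not pycaret's implementation):

```python
def mase(y_train, y_true, y_pred, sp=12):
    """Mean Absolute Scaled Error with seasonal period sp."""
    # Denominator: in-sample MAE of the seasonal naive forecast y[t] = y[t - sp]
    scale = sum(abs(a - b) for a, b in zip(y_train[sp:], y_train[:-sp])) / (len(y_train) - sp)
    # Numerator: out-of-sample MAE of the model's forecast
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
    return mae / scale
```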
# Default prediction
exp.predict_model(model)
Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | |
---|---|---|---|---|---|---|---|---|
0 | ARIMA | 0.4955 | 0.5395 | 15.0867 | 18.6380 | 0.0312 | 0.0312 | 0.9373 |
y_pred | |
---|---|
1960-01 | 420.8767 |
1960-02 | 397.1069 |
1960-03 | 456.4335 |
1960-04 | 442.6482 |
1960-05 | 463.5822 |
1960-06 | 513.0988 |
1960-07 | 587.0872 |
1960-08 | 596.4580 |
1960-09 | 499.1383 |
1960-10 | 442.0694 |
1960-11 | 396.2036 |
1960-12 | 438.5023 |
# Increased forecast horizon to 2 years instead of the original 1 year
exp.predict_model(model, fh=24)
y_pred | |
---|---|
1960-01 | 420.8767 |
1960-02 | 397.1069 |
1960-03 | 456.4335 |
1960-04 | 442.6482 |
1960-05 | 463.5822 |
1960-06 | 513.0988 |
1960-07 | 587.0872 |
1960-08 | 596.4580 |
1960-09 | 499.1383 |
1960-10 | 442.0694 |
1960-11 | 396.2036 |
1960-12 | 438.5023 |
1961-01 | 453.8109 |
1961-02 | 429.5811 |
1961-03 | 488.5351 |
1961-04 | 474.4479 |
1961-05 | 495.1374 |
1961-06 | 544.4560 |
1961-07 | 618.2840 |
1961-08 | 627.5248 |
1961-09 | 530.0999 |
1961-10 | 472.9458 |
1961-11 | 427.0110 |
1961-12 | 469.2538 |
# With Prediction Interval (default coverage = 0.9)
exp.predict_model(model, return_pred_int=True)
Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | |
---|---|---|---|---|---|---|---|---|
0 | ARIMA | 0.4955 | 0.5395 | 15.0867 | 18.6380 | 0.0312 | 0.0312 | 0.9373 |
y_pred | lower | upper | |
---|---|---|---|
1960-01 | 420.8767 | 403.9466 | 437.8067 |
1960-02 | 397.1069 | 375.3199 | 418.8939 |
1960-03 | 456.4335 | 431.9786 | 480.8884 |
1960-04 | 442.6482 | 416.5909 | 468.7055 |
1960-05 | 463.5822 | 436.5252 | 490.6392 |
1960-06 | 513.0988 | 485.4054 | 540.7921 |
1960-07 | 587.0872 | 558.9843 | 615.1902 |
1960-08 | 596.4580 | 568.0895 | 624.8264 |
1960-09 | 499.1383 | 470.5969 | 527.6796 |
1960-10 | 442.0694 | 413.4152 | 470.7236 |
1960-11 | 396.2036 | 367.4756 | 424.9316 |
1960-12 | 438.5023 | 409.7260 | 467.2786 |
# With Prediction Interval (custom coverage = 0.8, corresponding to lower and upper quantiles of 0.1 and 0.9 respectively)
# The point estimate remains the same as before.
# But the lower and upper intervals are now narrower since we are OK with a lower coverage.
exp.predict_model(model, return_pred_int=True, coverage=0.8)
Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | |
---|---|---|---|---|---|---|---|---|
0 | ARIMA | 0.4955 | 0.5395 | 15.0867 | 18.6380 | 0.0312 | 0.0312 | 0.9373 |
y_pred | lower | upper | |
---|---|---|---|
1960-01 | 420.8767 | 407.6860 | 434.0673 |
1960-02 | 397.1069 | 380.1320 | 414.0818 |
1960-03 | 456.4335 | 437.3800 | 475.4870 |
1960-04 | 442.6482 | 422.3463 | 462.9502 |
1960-05 | 463.5822 | 442.5013 | 484.6631 |
1960-06 | 513.0988 | 491.5221 | 534.6754 |
1960-07 | 587.0872 | 565.1915 | 608.9830 |
1960-08 | 596.4580 | 574.3553 | 618.5606 |
1960-09 | 499.1383 | 476.9009 | 521.3756 |
1960-10 | 442.0694 | 419.7441 | 464.3946 |
1960-11 | 396.2036 | 373.8208 | 418.5863 |
1960-12 | 438.5023 | 416.0819 | 460.9227 |
Sometimes, users may wish to get the point estimate at values other than the mean/median. In such cases, they can specify the alpha (quantile) value for the point estimate directly.
NOTE: Not all models support this feature. If it is used with a model that does not support it, an error is raised. If you want to use only models that support this feature, set `point_alpha` to a floating point value in the `setup` stage (see below).
# With Custom Point Estimate (alpha = 0.7)
# The point estimate is now higher than before since we are asking for the
# 70th percentile as the point estimate, vs. the mean/median before.
exp.predict_model(model, alpha=0.7)
Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | |
---|---|---|---|---|---|---|---|---|
0 | ARIMA | 0.4335 | 0.5168 | 13.2004 | 17.8549 | 0.0292 | 0.0287 | 0.9425 |
y_pred | |
---|---|
1960-01 | 426.2742 |
1960-02 | 404.0529 |
1960-03 | 464.2301 |
1960-04 | 450.9556 |
1960-05 | 472.2083 |
1960-06 | 521.9277 |
1960-07 | 596.0468 |
1960-08 | 605.5022 |
1960-09 | 508.2376 |
1960-10 | 451.2047 |
1960-11 | 405.3624 |
1960-12 | 447.6766 |
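Why does alpha = 0.7 shift every point estimate upward? For a Gaussian predictive distribution (as in ARIMA), the alpha-quantile point estimate is mu + z_alpha * sigma, which sits above the median whenever alpha > 0.5. A stdlib sketch with made-up numbers (not taken from the fitted model):

```python
from statistics import NormalDist

# Hypothetical forecast mean and standard deviation for one horizon step
mu, sigma = 420.9, 10.3
q50 = NormalDist(mu, sigma).inv_cdf(0.5)   # median == mean for a Gaussian
q70 = NormalDist(mu, sigma).inv_cdf(0.7)   # 70th percentile sits above the median
```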
# Models that do not produce a prediction interval return NA values for the bounds
model_no_pred_int = exp.create_model("lr_cds_dt")
exp.predict_model(model_no_pred_int, return_pred_int=True)
cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | |
---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.7137 | 0.8440 | 20.8412 | 27.6262 | 0.0513 | 0.0533 | 0.7516 |
1 | 1957-12 | 0.6678 | 0.7038 | 20.4172 | 23.8918 | 0.0557 | 0.0539 | 0.8505 |
2 | 1958-12 | 0.7198 | 0.7630 | 20.5669 | 24.8024 | 0.0457 | 0.0471 | 0.8624 |
Mean | NaT | 0.7004 | 0.7702 | 20.6084 | 25.4401 | 0.0509 | 0.0514 | 0.8215 |
SD | NaT | 0.0232 | 0.0575 | 0.1756 | 1.5898 | 0.0041 | 0.0031 | 0.0497 |
Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | |
---|---|---|---|---|---|---|---|---|
0 | LinearRegression | 0.7993 | 0.9275 | 24.3376 | 32.0418 | 0.0475 | 0.0493 | 0.8147 |
y_pred | lower | upper | |
---|---|---|---|
1960-01 | 399.5740 | NaN | NaN |
1960-02 | 384.6911 | NaN | NaN |
1960-03 | 420.8922 | NaN | NaN |
1960-04 | 412.8696 | NaN | NaN |
1960-05 | 438.3520 | NaN | NaN |
1960-06 | 494.9357 | NaN | NaN |
1960-07 | 556.8907 | NaN | NaN |
1960-08 | 558.1492 | NaN | NaN |
1960-09 | 503.6881 | NaN | NaN |
1960-10 | 449.0433 | NaN | NaN |
1960-11 | 405.1229 | NaN | NaN |
1960-12 | 431.7701 | NaN | NaN |
Similar to the prediction customization, we can customize the forecast plots as well.
# Regular Plot
exp.plot_model(model)
# Modified Plot (zoom into the plot to see differences between the 2 plots)
exp.plot_model(model, data_kwargs={"alpha": 0.7, "coverage": 0.8})
In some use cases, it is important to have prediction intervals, so users may wish to restrict the modeling to only those models that support them. Setting `point_alpha` to any floating point value in `setup` restricts the models to those that provide a prediction interval; the value specified corresponds to the quantile of the point prediction that is returned. The `COVERAGE` metric gives the percentage of actuals that fall within the prediction interval.
exp = TSForecastingExperiment()
# We can see that specifying a value for point_alpha enables `Enforce Prediction Interval` in the grid (and limits the models).
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, point_alpha=0.5)
exp.models()
Description | Value | |
---|---|---|
0 | session_id | 3833 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (132, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | True |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [12, 24, 36, 11, 48] |
18 | Significant Seasonal Period(s) | [12, 24, 36, 11, 48] |
19 | Significant Seasonal Period(s) without Harmonics | [48, 36, 11] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [12] |
24 | Primary Seasonality | 12 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 1 |
30 | Preprocess | False |
31 | CPU Jobs | -1 |
32 | Use GPU | False |
33 | Log Experiment | False |
34 | Experiment Name | ts-default-name |
35 | USI | 95ff |
Name | Reference | Turbo | |
---|---|---|---|
ID | |||
naive | Naive Forecaster | sktime.forecasting.naive.NaiveForecaster | True |
grand_means | Grand Means Forecaster | sktime.forecasting.naive.NaiveForecaster | True |
snaive | Seasonal Naive Forecaster | sktime.forecasting.naive.NaiveForecaster | True |
arima | ARIMA | sktime.forecasting.arima.ARIMA | True |
auto_arima | Auto ARIMA | sktime.forecasting.arima.AutoARIMA | True |
ets | ETS | sktime.forecasting.ets.AutoETS | True |
theta | Theta Forecaster | sktime.forecasting.theta.ThetaForecaster | True |
tbats | TBATS | sktime.forecasting.tbats.TBATS | False |
bats | BATS | sktime.forecasting.bats.BATS | False |
prophet | Prophet | pycaret.containers.models.time_series.ProphetP... | False |
best_model = exp.compare_models()
# # To enable slower models such as prophet, BATS and TBATS, add turbo=False
# best_model = exp.compare_models(turbo=False)
Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | COVERAGE | TT (Sec) | |
---|---|---|---|---|---|---|---|---|---|---|
ets | ETS | 0.5865 | 0.6145 | 17.2184 | 20.2857 | 0.0435 | 0.0440 | 0.8909 | 0.6667 | 0.1500 |
arima | ARIMA | 0.6830 | 0.6735 | 20.0069 | 22.2199 | 0.0501 | 0.0507 | 0.8677 | 0.6389 | 0.0733 |
auto_arima | Auto ARIMA | 0.7181 | 0.7114 | 21.0297 | 23.4661 | 0.0525 | 0.0531 | 0.8509 | 0.6944 | 2.9400 |
theta | Theta Forecaster | 0.9729 | 1.0306 | 28.3192 | 33.8639 | 0.0670 | 0.0700 | 0.6710 | 0.6389 | 0.0400 |
snaive | Seasonal Naive Forecaster | 1.1479 | 1.0945 | 33.3611 | 35.9139 | 0.0832 | 0.0879 | 0.6072 | 0.8333 | 0.0200 |
naive | Naive Forecaster | 2.3599 | 2.7612 | 69.0278 | 91.0322 | 0.1569 | 0.1792 | -1.2216 | 0.7778 | 1.3900 |
grand_means | Grand Means Forecaster | 5.5306 | 5.2596 | 162.4117 | 173.6492 | 0.4000 | 0.5075 | -7.0462 | 0.4167 | 0.0233 |
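The COVERAGE column above is simply the fraction of test-set actuals that land inside the model's prediction interval. A minimal sketch with hypothetical actuals and interval bounds (not values from the models above):

```python
# Hypothetical actuals and prediction-interval bounds for six test periods
actuals = [417, 391, 419, 461, 472, 535]
lower   = [404, 375, 432, 417, 437, 485]
upper   = [438, 419, 481, 469, 491, 541]

# COVERAGE: fraction of actuals inside [lower, upper]
coverage = sum(lo <= y <= hi for y, lo, hi in zip(actuals, lower, upper)) / len(actuals)
```

Here the third actual (419) falls below its lower bound (432), so five of six points are covered.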
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, fold_strategy='sliding', verbose=False)
exp.plot_model(plot="cv")
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, fold_strategy='expanding', verbose=False)
exp.plot_model(plot="cv")
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, fold_strategy='rolling', verbose=False)
exp.plot_model(plot="cv")
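The difference between the strategies plotted above: an expanding window keeps the start of the training set fixed and grows it each fold, while a sliding window moves the start forward so the training length stays constant. A simplified illustration of the split indices (not pycaret's or sktime's internal code):

```python
def cv_splits(n, fh, folds, strategy):
    """Return (train_indices, test_indices) pairs for a toy time-series CV."""
    initial = n - folds * fh  # length of the first training window
    out = []
    for k in range(folds):
        train_end = initial + k * fh
        # expanding: start stays at 0; sliding: start moves forward each fold
        train_start = 0 if strategy == "expanding" else k * fh
        out.append((range(train_start, train_end), range(train_end, train_end + fh)))
    return out
```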
Sometimes there are not enough data points available to perform the experiment. In such cases, pycaret raises an informative error.
try:
    exp = TSForecastingExperiment()
    exp.setup(data=y[:30], fh=12, fold=3, fig_kwargs=fig_kwargs)
except ValueError as error:
    print(error)
Not Enough Data Points, set a lower number of folds or fh
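A rough back-of-the-envelope check for why 30 points fail with fh=12 but succeed with fh=6 (this is an approximation, NOT pycaret's exact rule): you need a test split of length fh, plus `fold` CV splits of length fh, with some initial training window left over.

```python
def enough_data(n_points, fh, fold, min_initial_window=1):
    # Test set (fh) + fold CV splits (fold * fh) must leave an initial window
    return n_points - (fold + 1) * fh >= min_initial_window

# enough_data(30, 12, 3) -> False, matching the error above
# enough_data(30, 6, 3)  -> True, matching the successful setup below
```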
try:
    exp = TSForecastingExperiment()
    exp.setup(data=y[:30], fh=6, fold=3, fig_kwargs=fig_kwargs)
except ValueError as error:
    print(error)
Description | Value | |
---|---|---|
0 | session_id | 5965 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (30, 1) |
5 | Transformed data shape | (30, 1) |
6 | Transformed train set shape | (24, 1) |
7 | Transformed test set shape | (6, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [] |
18 | Significant Seasonal Period(s) | [1] |
19 | Significant Seasonal Period(s) without Harmonics | [1] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [1] |
24 | Primary Seasonality | 1 |
25 | Seasonality Present | False |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | False |
31 | CPU Jobs | -1 |
32 | Use GPU | False |
33 | Log Experiment | False |
34 | Experiment Name | ts-default-name |
35 | USI | a170 |
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, session_id=42)
Description | Value | |
---|---|---|
0 | session_id | 42 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (132, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [12, 24, 36, 11, 48] |
18 | Significant Seasonal Period(s) | [12, 24, 36, 11, 48] |
19 | Significant Seasonal Period(s) without Harmonics | [48, 36, 11] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [12] |
24 | Primary Seasonality | 12 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 1 |
30 | Preprocess | False |
31 | CPU Jobs | -1 |
32 | Use GPU | False |
33 | Log Experiment | False |
34 | Experiment Name | ts-default-name |
35 | USI | a819 |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1ffd3016790>
model = exp.create_model("lr_cds_dt")
cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | |
---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.7137 | 0.8440 | 20.8412 | 27.6262 | 0.0513 | 0.0533 | 0.7516 |
1 | 1957-12 | 0.6678 | 0.7038 | 20.4172 | 23.8918 | 0.0557 | 0.0539 | 0.8505 |
2 | 1958-12 | 0.7198 | 0.7630 | 20.5669 | 24.8024 | 0.0457 | 0.0471 | 0.8624 |
Mean | NaT | 0.7004 | 0.7702 | 20.6084 | 25.4401 | 0.0509 | 0.0514 | 0.8215 |
SD | NaT | 0.0232 | 0.0575 | 0.1756 | 1.5898 | 0.0041 | 0.0031 | 0.0497 |
# Random Grid Search (default)
tuned_model = exp.tune_model(model)
print(model)
print(tuned_model)
cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | |
---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.4373 | 0.5021 | 12.7700 | 16.4341 | 0.0327 | 0.0336 | 0.9121 |
1 | 1957-12 | 0.5497 | 0.6410 | 16.8061 | 21.7622 | 0.0457 | 0.0441 | 0.8759 |
2 | 1958-12 | 0.6689 | 0.6846 | 19.1128 | 22.2542 | 0.0462 | 0.0476 | 0.8892 |
Mean | NaT | 0.5520 | 0.6092 | 16.2296 | 20.1502 | 0.0415 | 0.0418 | 0.8924 |
SD | NaT | 0.0946 | 0.0778 | 2.6213 | 2.6353 | 0.0062 | 0.0059 | 0.0149 |
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 4.8s finished
BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]}, n_jobs=1)], regressor=LinearRegression(n_jobs=-1), sp=12, window_length=12)
BaseCdsDtForecaster(deseasonal_model='multiplicative', fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]}, n_jobs=1)], regressor=LinearRegression(fit_intercept=False, n_jobs=-1), sp=24, window_length=23)
exp.plot_model([model, tuned_model], data_kwargs={"labels": ["Original", "Tuned"]})
# Fixed Grid Search
tuned_model = exp.tune_model(model, search_algorithm="grid")
print(model)
print(tuned_model)
cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | |
---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.7053 | 0.8345 | 20.5954 | 27.3146 | 0.0507 | 0.0527 | 0.7571 |
1 | 1957-12 | 0.6793 | 0.7113 | 20.7679 | 24.1471 | 0.0568 | 0.0549 | 0.8472 |
2 | 1958-12 | 0.6699 | 0.7197 | 19.1424 | 23.3933 | 0.0425 | 0.0436 | 0.8776 |
Mean | NaT | 0.6848 | 0.7551 | 20.1686 | 24.9517 | 0.0500 | 0.0504 | 0.8273 |
SD | NaT | 0.0149 | 0.0562 | 0.7290 | 1.6990 | 0.0059 | 0.0049 | 0.0511 |
Fitting 3 folds for each of 2 candidates, totalling 6 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 6 out of 6 | elapsed: 0.8s finished
BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]}, n_jobs=1)], regressor=LinearRegression(n_jobs=-1), sp=12, window_length=12)
BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]}, n_jobs=1)], regressor=LinearRegression(fit_intercept=False, n_jobs=-1), sp=12)
Observations:
tune_model uses choose_better=True by default, so the tuned model is returned only if it outperforms the original; otherwise the original model is returned. To always get the tuned model back regardless of performance, pass choose_better=False.
tuned_model = exp.tune_model(model, search_algorithm="grid", choose_better=False)
print(model)
print(tuned_model)
cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | |
---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.7053 | 0.8345 | 20.5954 | 27.3146 | 0.0507 | 0.0527 | 0.7571 |
1 | 1957-12 | 0.6793 | 0.7113 | 20.7679 | 24.1471 | 0.0568 | 0.0549 | 0.8472 |
2 | 1958-12 | 0.6699 | 0.7197 | 19.1424 | 23.3933 | 0.0425 | 0.0436 | 0.8776 |
Mean | NaT | 0.6848 | 0.7551 | 20.1686 | 24.9517 | 0.0500 | 0.0504 | 0.8273 |
SD | NaT | 0.0149 | 0.0562 | 0.7290 | 1.6990 | 0.0059 | 0.0049 | 0.0511 |
Fitting 3 folds for each of 2 candidates, totalling 6 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 6 out of 6 | elapsed: 0.8s finished
BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]}, n_jobs=1)], regressor=LinearRegression(n_jobs=-1), sp=12, window_length=12)
BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]}, n_jobs=1)], regressor=LinearRegression(fit_intercept=False, n_jobs=-1), sp=12)
Sometimes there are time constraints on tuning, so users may wish to limit the number of hyperparameter combinations tried using the n_iter argument.
tuned_model = exp.tune_model(model, n_iter=5)
print(model)
print(tuned_model)
cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | |
---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.5447 | 0.5667 | 15.9077 | 18.5515 | 0.0447 | 0.0437 | 0.8880 |
1 | 1957-12 | 1.2555 | 1.2590 | 38.3847 | 42.7414 | 0.1072 | 0.1004 | 0.5214 |
2 | 1958-12 | 0.6878 | 0.7155 | 19.6530 | 23.2573 | 0.0442 | 0.0454 | 0.8790 |
Mean | NaT | 0.8293 | 0.8471 | 24.6484 | 28.1834 | 0.0654 | 0.0632 | 0.7628 |
SD | NaT | 0.3070 | 0.2975 | 9.8326 | 10.4718 | 0.0296 | 0.0263 | 0.1707 |
Fitting 3 folds for each of 5 candidates, totalling 15 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 15 out of 15 | elapsed: 1.9s finished
BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]}, n_jobs=1)], regressor=LinearRegression(n_jobs=-1), sp=12, window_length=12)
BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]}, n_jobs=1)], regressor=LinearRegression(n_jobs=-1), sp=12, window_length=12)
More information about tuning in pycaret time series can be found here:
Sometimes the plotly renderer is not detected correctly for the environment. In such cases, users can manually specify the renderer in pycaret.
exp = TSForecastingExperiment()
exp.setup(
data=y,
fh=fh,
fold=fold,
# fig_kwargs={'renderer': 'notebook'},
verbose=False
)
exp.plot_model(plot="cv")
Users can also specify the renderer for specific plot types.
exp.plot_model(fig_kwargs={'renderer': 'png'})
All these options are shown below.
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs)
Description | Value | |
---|---|---|
0 | session_id | 8371 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (132, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [12, 24, 36, 11, 48] |
18 | Significant Seasonal Period(s) | [12, 24, 36, 11, 48] |
19 | Significant Seasonal Period(s) without Harmonics | [48, 36, 11] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [12] |
24 | Primary Seasonality | 12 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 1 |
30 | Preprocess | False |
31 | CPU Jobs | -1 |
32 | Use GPU | False |
33 | Log Experiment | False |
34 | Experiment Name | ts-default-name |
35 | USI | d6ea |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1ffd2c8f340>
Observations:
As noted above, users can change the seasonal period manually based on their own EDA. For example, let's change it to 36 and see what happens.
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, seasonal_period=36, fig_kwargs=fig_kwargs)
Description | Value | |
---|---|---|
0 | session_id | 641 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (132, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | 36 |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | user_defined |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [36] |
18 | Significant Seasonal Period(s) | [1] |
19 | Significant Seasonal Period(s) without Harmonics | [1] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [1] |
24 | Primary Seasonality | 1 |
25 | Seasonality Present | False |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | False |
31 | CPU Jobs | -1 |
32 | Use GPU | False |
33 | Log Experiment | False |
34 | Experiment Name | ts-default-name |
35 | USI | a308 |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1ffd2e40ac0>
Observations:
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, seasonal_period=52, fig_kwargs=fig_kwargs)
Description | Value | |
---|---|---|
0 | session_id | 955 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (132, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | 52 |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | user_defined |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [52] |
18 | Significant Seasonal Period(s) | [1] |
19 | Significant Seasonal Period(s) without Harmonics | [1] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [1] |
24 | Primary Seasonality | 1 |
25 | Seasonality Present | False |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | False |
31 | CPU Jobs | -1 |
32 | Use GPU | False |
33 | Log Experiment | False |
34 | Experiment Name | ts-default-name |
35 | USI | 7330 |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1ffd31954f0>
Even so, if users want to force pycaret to use a certain seasonal period, they can do so by passing the ignore_seasonality_test argument.
exp.setup(data=y, fh=fh, fold=fold, seasonal_period=52, ignore_seasonality_test=True, fig_kwargs=fig_kwargs)
Description | Value | |
---|---|---|
0 | session_id | 6483 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (132, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | 52 |
14 | Ignore Seasonality Test | True |
15 | Seasonality Detection Algo | user_defined |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [52] |
18 | Significant Seasonal Period(s) | [52] |
19 | Significant Seasonal Period(s) without Harmonics | [52] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [52] |
24 | Primary Seasonality | 52 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | False |
31 | CPU Jobs | -1 |
32 | Use GPU | False |
33 | Log Experiment | False |
34 | Experiment Name | ts-default-name |
35 | USI | a5ba |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1ffd31954f0>
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, sp_detection="index", fig_kwargs=fig_kwargs)
Description | Value | |
---|---|---|
0 | session_id | 1913 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (132, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | index |
16 | Max Period to Consider | 60 |
17 | Seasonal Period(s) Tested | [12] |
18 | Significant Seasonal Period(s) | [12] |
19 | Significant Seasonal Period(s) without Harmonics | [12] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [12] |
24 | Primary Seasonality | 12 |
25 | Seasonality Present | True |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 1 |
30 | Preprocess | False |
31 | CPU Jobs | -1 |
32 | Use GPU | False |
33 | Log Experiment | False |
34 | Experiment Name | ts-default-name |
35 | USI | 8cde |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1ffd2dc1430>
Observations:
In some cases, the frequency cannot be derived from the index (see the example below). Users then need to switch to one of the other methods (auto detection, or manually specifying the seasonal period).
y = get_data("1", folder="time_series/ar1")
x | |
---|---|
0 | 173.786244 |
1 | 174.850941 |
2 | 175.435101 |
3 | 174.807199 |
4 | 174.872474 |
try:
    exp = TSForecastingExperiment()
    exp.setup(data=y, fh=fh, fold=fold, sp_detection="index", fig_kwargs=fig_kwargs)
except ValueError as error:
    print(error)
The index of your 'data' is of type '<class 'pandas.core.indexes.range.RangeIndex'>'. If the 'data' index is not of one of the following types: <class 'pandas.core.indexes.period.PeriodIndex'>, <class 'pandas.core.indexes.datetimes.DatetimeIndex'>, then 'seasonal_period' must be provided. Refer to docstring for options.
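Index-based detection works by mapping the index frequency to a candidate seasonal period; a RangeIndex carries no frequency, hence the error above. A sketch of a typical frequency-to-period mapping (illustrative only, not pycaret's internal table):

```python
# Common pandas frequency strings mapped to a candidate seasonal period
FREQ_TO_SP = {"M": 12, "Q": 4, "W": 52, "D": 7, "H": 24}

def sp_from_freq(freqstr):
    # Returns None when the period cannot be inferred, as with a RangeIndex
    return FREQ_TO_SP.get(freqstr)
```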
Observations:
eda = TSForecastingExperiment()
eda.setup(data=y, fh=fh, fold=fold, sp_detection="auto", max_sp_to_consider = 40, fig_kwargs=fig_kwargs)
Description | Value | |
---|---|---|
0 | session_id | 785 |
1 | Target | x |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (340, 1) |
5 | Transformed data shape | (340, 1) |
6 | Transformed train set shape | (328, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | User Defined Seasonal Period(s) | None |
14 | Ignore Seasonality Test | False |
15 | Seasonality Detection Algo | auto |
16 | Max Period to Consider | 40 |
17 | Seasonal Period(s) Tested | [] |
18 | Significant Seasonal Period(s) | [1] |
19 | Significant Seasonal Period(s) without Harmonics | [1] |
20 | Remove Harmonics | False |
21 | Harmonics Order Method | harmonic_max |
22 | Num Seasonalities to Use | 1 |
23 | All Seasonalities to Use | [1] |
24 | Primary Seasonality | 1 |
25 | Seasonality Present | False |
26 | Target Strictly Positive | True |
27 | Target White Noise | No |
28 | Recommended d | 1 |
29 | Recommended Seasonal D | 0 |
30 | Preprocess | False |
31 | CPU Jobs | -1 |
32 | Use GPU | False |
33 | Log Experiment | False |
34 | Experiment Name | ts-default-name |
35 | USI | 5add |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1ffd2d693d0>
That's it for this notebook. If you would like to see other demonstrations, feel free to open an issue on GitHub.