In the 10x series of notebooks, we will look at Time Series modeling in pycaret using univariate data and no exogenous variables. We will use the famous airline dataset for illustration. Our plan of action is as follows:
# Only enable critical logging (Optional)
import os
os.environ["PYCARET_CUSTOM_LOGGING_LEVEL"] = "CRITICAL"
def what_is_installed():
    from pycaret import show_versions
    show_versions()

try:
    what_is_installed()
except ModuleNotFoundError:
    !pip install pycaret
    what_is_installed()
System:
    python: 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\Nikhil\.conda\envs\pycaret_dev_sktime_0p11_2\python.exe
   machine: Windows-10-10.0.19044-SP0

PyCaret required dependencies:
pip: 21.2.2
setuptools: 61.2.0
pycaret: 3.0.0
ipython: Not installed
ipywidgets: 7.7.0
numpy: 1.21.6
pandas: 1.4.2
jinja2: 3.1.2
scipy: 1.8.0
joblib: 1.1.0
sklearn: 1.0.2
pyod: Installed but version unavailable
imblearn: 0.9.0
category_encoders: 2.4.1
lightgbm: 3.3.2
numba: 0.55.1
requests: 2.27.1
matplotlib: 3.5.2
scikitplot: 0.3.7
yellowbrick: 1.4
plotly: 5.8.0
kaleido: 0.2.1
statsmodels: 0.13.2
sktime: 0.11.4
tbats: Installed but version unavailable
pmdarima: 1.8.5

PyCaret optional dependencies:
shap: Not installed
interpret: Not installed
umap: Not installed
pandas_profiling: Not installed
explainerdashboard: Not installed
autoviz: Not installed
fairlearn: Not installed
xgboost: Not installed
catboost: Not installed
kmodes: Not installed
mlxtend: Not installed
statsforecast: 0.5.5
tune_sklearn: Not installed
ray: Not installed
hyperopt: Not installed
optuna: Not installed
skopt: Not installed
mlflow: 1.25.1
gradio: Not installed
fastapi: Not installed
uvicorn: Not installed
m2cgen: Not installed
evidently: Not installed
nltk: Not installed
pyLDAvis: Not installed
gensim: Not installed
spacy: Not installed
wordcloud: Not installed
textblob: Not installed
psutil: 5.9.0
fugue: Not installed
streamlit: Not installed
prophet: Not installed
import time
import numpy as np
import pandas as pd
from pycaret.datasets import get_data
from pycaret.time_series import TSForecastingExperiment
y = get_data('airline', verbose=False)
# We want to forecast the next 12 months of data and we will use 3 fold cross-validation to test the models.
fh = 12 # or alternately fh = np.arange(1,13)
fold = 3
# Global Plot Settings
fig_kwargs={'renderer': 'notebook'}
Let's look at how users can customize various steps in the modeling process.
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, session_id=42, verbose=False)
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1e886404850>
model = exp.create_model("arima")
| | cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
|---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.4462 | 0.4933 | 13.0286 | 16.1485 | 0.0327 | 0.0334 | 0.9151 |
1 | 1957-12 | 0.5983 | 0.5993 | 18.2920 | 20.3442 | 0.0506 | 0.0491 | 0.8916 |
2 | 1958-12 | 1.0044 | 0.9280 | 28.6999 | 30.1669 | 0.0671 | 0.0697 | 0.7964 |
Mean | nan | 0.6830 | 0.6735 | 20.0069 | 22.2199 | 0.0501 | 0.0507 | 0.8677 |
SD | nan | 0.2356 | 0.1851 | 6.5117 | 5.8746 | 0.0141 | 0.0148 | 0.0513 |
# Default prediction
exp.predict_model(model)
| | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
|---|---|---|---|---|---|---|---|---|
0 | ARIMA | 0.4955 | 0.5395 | 15.0867 | 18.6380 | 0.0312 | 0.0312 | 0.9373 |
| | y_pred |
|---|---|
1960-01 | 420.8767 |
1960-02 | 397.1069 |
1960-03 | 456.4335 |
1960-04 | 442.6482 |
1960-05 | 463.5822 |
1960-06 | 513.0988 |
1960-07 | 587.0872 |
1960-08 | 596.4580 |
1960-09 | 499.1383 |
1960-10 | 442.0694 |
1960-11 | 396.2036 |
1960-12 | 438.5023 |
# Increased forecast horizon to 2 years instead of the original 1 year
exp.predict_model(model, fh=24)
| | y_pred |
|---|---|
1960-01 | 420.8767 |
1960-02 | 397.1069 |
1960-03 | 456.4335 |
1960-04 | 442.6482 |
1960-05 | 463.5822 |
1960-06 | 513.0988 |
1960-07 | 587.0872 |
1960-08 | 596.4580 |
1960-09 | 499.1383 |
1960-10 | 442.0694 |
1960-11 | 396.2036 |
1960-12 | 438.5023 |
1961-01 | 453.8109 |
1961-02 | 429.5811 |
1961-03 | 488.5351 |
1961-04 | 474.4479 |
1961-05 | 495.1374 |
1961-06 | 544.4560 |
1961-07 | 618.2840 |
1961-08 | 627.5248 |
1961-09 | 530.0999 |
1961-10 | 472.9458 |
1961-11 | 427.0110 |
1961-12 | 469.2538 |
# With Prediction Interval (default coverage = 0.9)
exp.predict_model(model, return_pred_int=True)
| | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
|---|---|---|---|---|---|---|---|---|
0 | ARIMA | 0.4955 | 0.5395 | 15.0867 | 18.6380 | 0.0312 | 0.0312 | 0.9373 |
| | y_pred | lower | upper |
|---|---|---|---|
1960-01 | 420.8767 | 403.9466 | 437.8067 |
1960-02 | 397.1069 | 375.3199 | 418.8939 |
1960-03 | 456.4335 | 431.9786 | 480.8884 |
1960-04 | 442.6482 | 416.5909 | 468.7055 |
1960-05 | 463.5822 | 436.5252 | 490.6392 |
1960-06 | 513.0988 | 485.4054 | 540.7921 |
1960-07 | 587.0872 | 558.9843 | 615.1902 |
1960-08 | 596.4580 | 568.0895 | 624.8264 |
1960-09 | 499.1383 | 470.5969 | 527.6796 |
1960-10 | 442.0694 | 413.4152 | 470.7236 |
1960-11 | 396.2036 | 367.4756 | 424.9315 |
1960-12 | 438.5023 | 409.7260 | 467.2786 |
# With Prediction Interval (custom coverage = 0.8, corresponding to lower and upper quantiles of 0.1 and 0.9 respectively)
# The point estimate remains the same as before.
# But the lower and upper intervals are now narrower since we are OK with a lower coverage.
exp.predict_model(model, return_pred_int=True, coverage=0.8)
| | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
|---|---|---|---|---|---|---|---|---|
0 | ARIMA | 0.4955 | 0.5395 | 15.0867 | 18.6380 | 0.0312 | 0.0312 | 0.9373 |
| | y_pred | lower | upper |
|---|---|---|---|
1960-01 | 420.8767 | 407.6860 | 434.0673 |
1960-02 | 397.1069 | 380.1320 | 414.0818 |
1960-03 | 456.4335 | 437.3800 | 475.4870 |
1960-04 | 442.6482 | 422.3463 | 462.9502 |
1960-05 | 463.5822 | 442.5013 | 484.6631 |
1960-06 | 513.0988 | 491.5221 | 534.6754 |
1960-07 | 587.0872 | 565.1915 | 608.9830 |
1960-08 | 596.4580 | 574.3553 | 618.5606 |
1960-09 | 499.1383 | 476.9009 | 521.3756 |
1960-10 | 442.0694 | 419.7441 | 464.3946 |
1960-11 | 396.2036 | 373.8208 | 418.5863 |
1960-12 | 438.5023 | 416.0819 | 460.9227 |
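The mapping from coverage to the interval quantiles mentioned in the comment above is just a symmetric split of the remaining probability mass. A minimal sketch of that arithmetic (not pycaret's internal code):

```python
def interval_quantiles(coverage):
    """Symmetric lower/upper quantiles for a given coverage level."""
    tail = (1 - coverage) / 2
    return tail, 1 - tail

# coverage = 0.8 -> quantiles 0.1 and 0.9, as noted in the comment above
print(tuple(round(q, 2) for q in interval_quantiles(0.8)))
# coverage = 0.9 (the default) -> quantiles 0.05 and 0.95
print(tuple(round(q, 2) for q in interval_quantiles(0.9)))
```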
Sometimes, users may wish to get the point estimates at values other than the mean/median. In such cases, they can specify the alpha (quantile) value for the point estimate directly.
NOTE: Not all models support this feature. If this is used with models that do not support it, an error is raised. If you want to use only models that support this feature, you must set `point_alpha` to a floating point value in the `setup` stage (see below).
# With Custom Point Estimate (alpha = 0.7)
# The point estimate is now higher than before since we are asking for the
# 70th percentile as the point estimate, vs. the mean/median before.
exp.predict_model(model, alpha=0.7)
| | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
|---|---|---|---|---|---|---|---|---|
0 | ARIMA | 0.4335 | 0.5168 | 13.2004 | 17.8549 | 0.0292 | 0.0287 | 0.9425 |
| | y_pred |
|---|---|
1960-01 | 426.2742 |
1960-02 | 404.0529 |
1960-03 | 464.2301 |
1960-04 | 450.9556 |
1960-05 | 472.2083 |
1960-06 | 521.9277 |
1960-07 | 596.0468 |
1960-08 | 605.5022 |
1960-09 | 508.2376 |
1960-10 | 451.2047 |
1960-11 | 405.3624 |
1960-12 | 447.6766 |
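Why does alpha = 0.7 raise the point estimates? For any unimodal forecast distribution, the 70th percentile sits above the median. The numbers below are purely illustrative (a hypothetical Gaussian, not the fitted ARIMA distribution):

```python
from scipy.stats import norm

mu, sigma = 420.9, 10.3  # hypothetical forecast mean and standard deviation
median = norm.ppf(0.5, loc=mu, scale=sigma)  # alpha = 0.5 -> median
p70 = norm.ppf(0.7, loc=mu, scale=sigma)     # alpha = 0.7 -> 70th percentile
print(median, p70)  # the 70th percentile is higher than the median
```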
# For models that do not produce a prediction interval --> returns NA values
model_no_pred_int = exp.create_model("lr_cds_dt")
exp.predict_model(model_no_pred_int, return_pred_int=True)
| | cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
|---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.7137 | 0.8440 | 20.8412 | 27.6262 | 0.0513 | 0.0533 | 0.7516 |
1 | 1957-12 | 0.6678 | 0.7038 | 20.4172 | 23.8918 | 0.0557 | 0.0539 | 0.8505 |
2 | 1958-12 | 0.7198 | 0.7630 | 20.5669 | 24.8024 | 0.0457 | 0.0471 | 0.8624 |
Mean | nan | 0.7004 | 0.7702 | 20.6084 | 25.4401 | 0.0509 | 0.0514 | 0.8215 |
SD | nan | 0.0232 | 0.0575 | 0.1756 | 1.5898 | 0.0041 | 0.0031 | 0.0497 |
| | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
|---|---|---|---|---|---|---|---|---|
0 | LinearRegression | 0.7993 | 0.9275 | 24.3376 | 32.0418 | 0.0475 | 0.0493 | 0.8147 |
| | y_pred | lower | upper |
|---|---|---|---|
1960-01 | 399.5740 | NaN | NaN |
1960-02 | 384.6911 | NaN | NaN |
1960-03 | 420.8922 | NaN | NaN |
1960-04 | 412.8696 | NaN | NaN |
1960-05 | 438.3520 | NaN | NaN |
1960-06 | 494.9357 | NaN | NaN |
1960-07 | 556.8907 | NaN | NaN |
1960-08 | 558.1492 | NaN | NaN |
1960-09 | 503.6881 | NaN | NaN |
1960-10 | 449.0433 | NaN | NaN |
1960-11 | 405.1229 | NaN | NaN |
1960-12 | 431.7701 | NaN | NaN |
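A quick way to detect that a model did not produce intervals is to check the lower/upper columns for NaN. A sketch, assuming a frame shaped like the output above (toy values, not the actual predictions):

```python
import numpy as np
import pandas as pd

# Toy stand-in for a prediction frame where intervals came back as NA
preds = pd.DataFrame({
    "y_pred": [399.57, 384.69],
    "lower": [np.nan, np.nan],
    "upper": [np.nan, np.nan],
})
# True only if every lower/upper entry is a real number
has_intervals = bool(preds[["lower", "upper"]].notna().all(axis=None))
print(has_intervals)  # False -> no usable prediction interval
```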
Similar to the prediction customization, we can customize the forecast plots as well.
# Regular Plot
exp.plot_model(model)
# Modified Plot (zoom into the plot to see differences between the 2 plots)
exp.plot_model(model, data_kwargs={"alpha": 0.7, "coverage": 0.8})
In some use cases, it is important to have prediction intervals. Users may wish to restrict the modeling to only those models that support prediction intervals.
Setting `point_alpha` to any floating point value restricts the models to only those that provide a prediction interval. The value that is specified corresponds to the quantile of the point prediction that is returned. The `COVERAGE` metric gives the percentage of actuals that fall within the prediction interval.

exp = TSForecastingExperiment()
# We can see that specifying a value for point_alpha enables `Enforce Prediction Interval` in the grid (and limits the models).
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, point_alpha=0.5)
exp.models()
| | Description | Value |
|---|---|---|
0 | session_id | 3833 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (132, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | True |
12 | Seasonal Period(s) Tested | 12 |
13 | Seasonality Present | True |
14 | Seasonalities Detected | [12] |
15 | Primary Seasonality | 12 |
16 | Target Strictly Positive | True |
17 | Target White Noise | No |
18 | Recommended d | 1 |
19 | Recommended Seasonal D | 1 |
20 | Preprocess | False |
21 | CPU Jobs | -1 |
22 | Use GPU | False |
23 | Log Experiment | False |
24 | Experiment Name | ts-default-name |
25 | USI | 17db |
| ID | Name | Reference | Turbo |
|---|---|---|---|
arima | ARIMA | sktime.forecasting.arima.ARIMA | True |
auto_arima | Auto ARIMA | sktime.forecasting.arima.AutoARIMA | True |
ets | ETS | sktime.forecasting.ets.AutoETS | True |
theta | Theta Forecaster | sktime.forecasting.theta.ThetaForecaster | True |
tbats | TBATS | sktime.forecasting.tbats.TBATS | False |
bats | BATS | sktime.forecasting.bats.BATS | False |
best_model = exp.compare_models()
# # To enable slower models such as prophet, BATS and TBATS, add turbo=False
# best_model = exp.compare_models(turbo=False)
| | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | COVERAGE | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|---|
ets | ETS | 0.5932 | 0.6201 | 17.4214 | 20.4742 | 0.0440 | 0.0445 | 0.8884 | 0.6667 | 0.1933 |
arima | ARIMA | 0.6830 | 0.6735 | 20.0069 | 22.2199 | 0.0501 | 0.0507 | 0.8677 | 0.6389 | 1.5467 |
auto_arima | Auto ARIMA | 0.7181 | 0.7114 | 21.0297 | 23.4661 | 0.0525 | 0.0531 | 0.8509 | 0.6944 | 2.7633 |
theta | Theta Forecaster | 0.9729 | 1.0306 | 28.3192 | 33.8639 | 0.0670 | 0.0700 | 0.6710 | 0.6389 | 0.0500 |
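The COVERAGE column above is simply the fraction of actuals that fall inside the prediction interval on each validation fold. This is a sketch of the idea, not pycaret's internal implementation:

```python
import numpy as np

def coverage(y_true, lower, upper):
    """Fraction of actual values that fall within [lower, upper]."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

# 3 of the 4 actuals fall inside their intervals -> coverage of 0.75
print(coverage([10, 12, 15, 20], [9, 11, 16, 18], [11, 13, 17, 22]))
```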
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, fold_strategy='sliding', verbose=False)
exp.plot_model(plot="cv")
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, fold_strategy='expanding', verbose=False)
exp.plot_model(plot="cv")
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, fold_strategy='rolling', verbose=False)
exp.plot_model(plot="cv")
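The difference between the fold strategies is only in how the training window behaves across folds: expanding grows the window, while sliding/rolling keep it a fixed length (pycaret delegates the actual splitting to sktime splitters). A toy illustration of the index arithmetic, with hypothetical sizes:

```python
# Hypothetical series of 24 points, fh=6, 3 CV folds
n, fh, fold = 24, 6, 3
initial = n - fold * fh  # length of the first training window

for strategy in ("expanding", "sliding"):
    print(strategy)
    for k in range(fold):
        train_end = initial + k * fh
        train_start = 0 if strategy == "expanding" else train_end - initial
        print(f"  fold {k}: train [{train_start}, {train_end}), "
              f"test [{train_end}, {train_end + fh})")
```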
Sometimes, there are not enough data points available to perform the experiment. In such cases, pycaret will raise an error accordingly.
try:
    exp = TSForecastingExperiment()
    exp.setup(data=y[:30], fh=12, fold=3, fig_kwargs=fig_kwargs)
except ValueError as error:
    print(error)
Not Enough Data Points, set a lower number of folds or fh
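Intuitively, the test split consumes `fh` points and each cross-validation fold needs its own `fh`-point validation window, so some training data must remain after both. The following is a rough back-of-the-envelope check (an approximation for illustration, not pycaret's exact internal rule):

```python
def roughly_enough_data(n_obs, fh, fold):
    """Approximate feasibility check: the test split takes fh points,
    each CV fold needs a further fh-point validation window, and at
    least one point must remain to train on."""
    return n_obs - fh - fold * fh > 0

print(roughly_enough_data(30, fh=12, fold=3))  # False -> the error above
print(roughly_enough_data(30, fh=6, fold=3))   # True  -> succeeds below
```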
try:
    exp = TSForecastingExperiment()
    exp.setup(data=y[:30], fh=6, fold=3, fig_kwargs=fig_kwargs)
except ValueError as error:
    print(error)
| | Description | Value |
|---|---|---|
0 | session_id | 5965 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (30, 1) |
5 | Transformed data shape | (30, 1) |
6 | Transformed train set shape | (24, 1) |
7 | Transformed test set shape | (6, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Seasonal Period(s) Tested | 12 |
13 | Seasonality Present | False |
14 | Seasonalities Detected | [1] |
15 | Primary Seasonality | 1 |
16 | Target Strictly Positive | True |
17 | Target White Noise | No |
18 | Recommended d | 1 |
19 | Recommended Seasonal D | 0 |
20 | Preprocess | False |
21 | CPU Jobs | -1 |
22 | Use GPU | False |
23 | Log Experiment | False |
24 | Experiment Name | ts-default-name |
25 | USI | 2a8f |
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs, session_id=42)
| | Description | Value |
|---|---|---|
0 | session_id | 42 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (132, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Seasonal Period(s) Tested | 12 |
13 | Seasonality Present | True |
14 | Seasonalities Detected | [12] |
15 | Primary Seasonality | 12 |
16 | Target Strictly Positive | True |
17 | Target White Noise | No |
18 | Recommended d | 1 |
19 | Recommended Seasonal D | 1 |
20 | Preprocess | False |
21 | CPU Jobs | -1 |
22 | Use GPU | False |
23 | Log Experiment | False |
24 | Experiment Name | ts-default-name |
25 | USI | f401 |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1e88a825e20>
model = exp.create_model("lr_cds_dt")
| | cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
|---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.7137 | 0.8440 | 20.8412 | 27.6262 | 0.0513 | 0.0533 | 0.7516 |
1 | 1957-12 | 0.6678 | 0.7038 | 20.4172 | 23.8918 | 0.0557 | 0.0539 | 0.8505 |
2 | 1958-12 | 0.7198 | 0.7630 | 20.5669 | 24.8024 | 0.0457 | 0.0471 | 0.8624 |
Mean | nan | 0.7004 | 0.7702 | 20.6084 | 25.4401 | 0.0509 | 0.0514 | 0.8215 |
SD | nan | 0.0232 | 0.0575 | 0.1756 | 1.5898 | 0.0041 | 0.0031 | 0.0497 |
# Random Grid Search (default)
tuned_model = exp.tune_model(model)
print(model)
print(tuned_model)
| | cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
|---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.3157 | 0.3699 | 9.2184 | 12.1077 | 0.0233 | 0.0235 | 0.9523 |
1 | 1957-12 | 1.0009 | 0.9835 | 30.6011 | 33.3898 | 0.0834 | 0.0794 | 0.7079 |
2 | 1958-12 | 0.4787 | 0.4882 | 13.6786 | 15.8682 | 0.0320 | 0.0325 | 0.9437 |
Mean | nan | 0.5984 | 0.6139 | 17.8327 | 20.4552 | 0.0462 | 0.0452 | 0.8680 |
SD | nan | 0.2923 | 0.2658 | 9.2104 | 9.2741 | 0.0265 | 0.0245 | 0.1132 |
BaseCdsDtForecaster(regressor=LinearRegression(n_jobs=-1), sp=12, window_length=12)
BaseCdsDtForecaster(degree=2, deseasonal_model='multiplicative', regressor=LinearRegression(fit_intercept=False, n_jobs=-1, normalize=True), sp=12, window_length=23)
exp.plot_model([model, tuned_model], data_kwargs={"labels": ["Original", "Tuned"]})
# Fixed Grid Search
tuned_model = exp.tune_model(model, search_algorithm="grid")
print(model)
print(tuned_model)
| | cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
|---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.8252 | 1.0020 | 24.0975 | 32.7998 | 0.0587 | 0.0616 | 0.6498 |
1 | 1957-12 | 0.7115 | 0.7487 | 21.7530 | 25.4177 | 0.0583 | 0.0571 | 0.8307 |
2 | 1958-12 | 0.7203 | 0.8296 | 20.5818 | 26.9684 | 0.0445 | 0.0459 | 0.8373 |
Mean | nan | 0.7523 | 0.8601 | 22.1441 | 28.3953 | 0.0539 | 0.0549 | 0.7726 |
SD | nan | 0.0516 | 0.1056 | 1.4617 | 3.1781 | 0.0066 | 0.0066 | 0.0869 |
BaseCdsDtForecaster(regressor=LinearRegression(n_jobs=-1), sp=12, window_length=12)
BaseCdsDtForecaster(regressor=LinearRegression(n_jobs=-1), sp=12, window_length=12)
Observations: `tune_model` uses `choose_better=True` by default, so it returns the better of the original and tuned models. Here the grid-searched hyperparameters did not improve on the original, so the original model was returned (the two printed models above are identical). To always return the tuned model regardless of performance, pass `choose_better=False`.
tuned_model = exp.tune_model(model, search_algorithm="grid", choose_better=False)
print(model)
print(tuned_model)
| | cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
|---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.8252 | 1.0020 | 24.0975 | 32.7998 | 0.0587 | 0.0616 | 0.6498 |
1 | 1957-12 | 0.7115 | 0.7487 | 21.7530 | 25.4177 | 0.0583 | 0.0571 | 0.8307 |
2 | 1958-12 | 0.7203 | 0.8296 | 20.5818 | 26.9684 | 0.0445 | 0.0459 | 0.8373 |
Mean | nan | 0.7523 | 0.8601 | 22.1441 | 28.3953 | 0.0539 | 0.0549 | 0.7726 |
SD | nan | 0.0516 | 0.1056 | 1.4617 | 3.1781 | 0.0066 | 0.0066 | 0.0869 |
BaseCdsDtForecaster(regressor=LinearRegression(n_jobs=-1), sp=12, window_length=12)
BaseCdsDtForecaster(regressor=LinearRegression(fit_intercept=False, n_jobs=-1, normalize=True), sp=12)
Sometimes, there are time constraints on the tuning, so users may wish to adjust the number of hyperparameter combinations that are tried using the `n_iter` argument.
tuned_model = exp.tune_model(model, n_iter=5)
print(model)
print(tuned_model)
| | cutoff | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 |
|---|---|---|---|---|---|---|---|---|
0 | 1956-12 | 0.3157 | 0.3699 | 9.2184 | 12.1077 | 0.0233 | 0.0235 | 0.9523 |
1 | 1957-12 | 1.0009 | 0.9835 | 30.6011 | 33.3898 | 0.0834 | 0.0794 | 0.7079 |
2 | 1958-12 | 0.4787 | 0.4882 | 13.6786 | 15.8682 | 0.0320 | 0.0325 | 0.9437 |
Mean | nan | 0.5984 | 0.6139 | 17.8327 | 20.4552 | 0.0462 | 0.0452 | 0.8680 |
SD | nan | 0.2923 | 0.2658 | 9.2104 | 9.2741 | 0.0265 | 0.0245 | 0.1132 |
BaseCdsDtForecaster(regressor=LinearRegression(n_jobs=-1), sp=12, window_length=12)
BaseCdsDtForecaster(degree=2, deseasonal_model='multiplicative', regressor=LinearRegression(fit_intercept=False, n_jobs=-1, normalize=True), sp=12, window_length=23)
More information about tuning in pycaret time series can be found here:

Sometimes the plotly renderer is not detected correctly for the environment. In such cases, users can manually specify the renderer in pycaret.
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, verbose=False)
exp.plot_model(plot="cv")
Renderer 'plotly_mimetype+notebook' is not a valid Plotly renderer. Valid renderers are:

Renderers configuration
-----------------------
Default renderer: 'plotly_mimetype+notebook'
Available renderers: ['plotly_mimetype', 'jupyterlab', 'nteract', 'vscode', 'notebook', 'notebook_connected', 'kaggle', 'azure', 'colab', 'cocalc', 'databricks', 'json', 'png', 'jpeg', 'jpg', 'svg', 'pdf', 'browser', 'firefox', 'chrome', 'chromium', 'iframe', 'iframe_connected', 'sphinx_gallery', 'sphinx_gallery_png']

When data exceeds a certain threshold (determined by `big_data_threshold`), the renderer is switched to a static one to prevent notebooks from being slowed down. This renderer may need to be installed manually by users. Alternately:

Option 1: Users can increase `big_data_threshold` in either `setup` (globally) or `plot_model` (plot specific). Examples:
>>> setup(..., fig_kwargs={'big_data_threshold': 1000})
>>> plot_model(..., fig_kwargs={'big_data_threshold': 1000})

Option 2: Users can specify any plotly renderer directly in either `setup` (globally) or `plot_model` (plot specific). Examples:
>>> setup(..., fig_kwargs={'renderer': 'notebook'})
>>> plot_model(..., fig_kwargs={'renderer': 'colab'})

Refer to the docstring in `setup` for more details.
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs={'renderer': 'notebook'}, verbose=False)
exp.plot_model(plot="cv")
Users can also specify the renderer for specific plot types.
exp.plot_model(fig_kwargs={'renderer': 'png'})
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs)
| | Description | Value |
|---|---|---|
0 | session_id | 641 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (132, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Seasonal Period(s) Tested | 12 |
13 | Seasonality Present | True |
14 | Seasonalities Detected | [12] |
15 | Primary Seasonality | 12 |
16 | Target Strictly Positive | True |
17 | Target White Noise | No |
18 | Recommended d | 1 |
19 | Recommended Seasonal D | 1 |
20 | Preprocess | False |
21 | CPU Jobs | -1 |
22 | Use GPU | False |
23 | Log Experiment | False |
24 | Experiment Name | ts-default-name |
25 | USI | 325d |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1e88a4a09a0>
Observations: pycaret automatically detected and used a seasonal period of 12. Users can override this based on EDA, e.g., let's change it to 36.
exp = TSForecastingExperiment()
exp.setup(data=y, fh=fh, fold=fold, seasonal_period=36, fig_kwargs=fig_kwargs)
| | Description | Value |
|---|---|---|
0 | session_id | 955 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (132, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Seasonal Period(s) Tested | 36 |
13 | Seasonality Present | False |
14 | Seasonalities Detected | [1] |
15 | Primary Seasonality | 1 |
16 | Target Strictly Positive | True |
17 | Target White Noise | No |
18 | Recommended d | 1 |
19 | Recommended Seasonal D | 0 |
20 | Preprocess | False |
21 | CPU Jobs | -1 |
22 | Use GPU | False |
23 | Log Experiment | False |
24 | Experiment Name | ts-default-name |
25 | USI | 546a |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1e88a6cb100>
Observations:
y = get_data("1", folder="time_series/ar1")
| | x |
|---|---|
0 | 173.786244 |
1 | 174.850941 |
2 | 175.435101 |
3 | 174.807199 |
4 | 174.872474 |
try:
    exp = TSForecastingExperiment()
    exp.setup(data=y, fh=fh, fold=fold, fig_kwargs=fig_kwargs)
except ValueError as error:
    print(error)
The index of your 'data' is of type '<class 'pandas.core.indexes.range.RangeIndex'>'. If the 'data' index is not of one of the following types: <class 'pandas.core.indexes.period.PeriodIndex'>, <class 'pandas.core.indexes.datetimes.DatetimeIndex'>, then 'seasonal_period' must be provided. Refer to docstring for options.
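One way to resolve the error above (besides passing `seasonal_period`) is to attach a proper time index to the data before calling `setup`. A sketch with toy values and a hypothetical monthly start date:

```python
import pandas as pd

y_toy = pd.Series([173.79, 174.85, 175.44], name="x")  # toy stand-in data
# Replace the default RangeIndex with a monthly PeriodIndex so the
# frequency (and hence the candidate seasonal period) can be inferred
y_toy.index = pd.period_range(start="2000-01", periods=len(y_toy), freq="M")
print(y_toy.index.dtype)  # period[M]
```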
Observations:
eda = TSForecastingExperiment()
eda.setup(data=y, fh=fh, fold=fold, seasonal_period=3, fig_kwargs=fig_kwargs)
| | Description | Value |
|---|---|---|
0 | session_id | 5965 |
1 | Target | x |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (340, 1) |
5 | Transformed data shape | (340, 1) |
6 | Transformed train set shape | (328, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Seasonal Period(s) Tested | 3 |
13 | Seasonality Present | True |
14 | Seasonalities Detected | [3] |
15 | Primary Seasonality | 3 |
16 | Target Strictly Positive | True |
17 | Target White Noise | No |
18 | Recommended d | 1 |
19 | Recommended Seasonal D | 0 |
20 | Preprocess | False |
21 | CPU Jobs | -1 |
22 | Use GPU | False |
23 | Log Experiment | False |
24 | Experiment Name | ts-default-name |
25 | USI | 41bc |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1e88a5d72b0>
eda.plot_model(plot="diagnostics", fig_kwargs={"height": 600, "width": 1000})
Observations:
eda = TSForecastingExperiment()
eda.setup(data=y, fh=fh, fold=fold, seasonal_period=1, fig_kwargs=fig_kwargs)
| | Description | Value |
|---|---|---|
0 | session_id | 5108 |
1 | Target | x |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (340, 1) |
5 | Transformed data shape | (340, 1) |
6 | Transformed train set shape | (328, 1) |
7 | Transformed test set shape | (12, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Seasonal Period(s) Tested | 1 |
13 | Seasonality Present | False |
14 | Seasonalities Detected | [1] |
15 | Primary Seasonality | 1 |
16 | Target Strictly Positive | True |
17 | Target White Noise | No |
18 | Recommended d | 1 |
19 | Recommended Seasonal D | 0 |
20 | Preprocess | False |
21 | CPU Jobs | -1 |
22 | Use GPU | False |
23 | Log Experiment | False |
24 | Experiment Name | ts-default-name |
25 | USI | cf16 |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1e88a8b6b20>
That's it for this notebook. If you would like to see other demonstrations, feel free to open an issue on GitHub.