Last updated: 16 Feb 2023

👋 PyCaret Time Series Forecasting Tutorial¶

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that substantially speeds up the experiment cycle and makes you more productive.

Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can replace hundreds of lines of code with just a few. This makes experiments much faster and more efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks, such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and a few more.

The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen Data Scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more technical expertise.

💻 Installation¶

PyCaret is tested and supported on the following 64-bit systems:

- Python 3.7 – 3.10
- Python 3.9 for Ubuntu only
- Ubuntu 16.04 or later
- Windows 7 or later

You can install PyCaret with Python's pip package manager:

pip install pycaret

PyCaret's default installation will not install all the extra dependencies automatically. For that you will have to install the full version:

pip install pycaret[full]

or, depending on your use case, you may install one of the following variants:

pip install pycaret[analysis]
pip install pycaret[models]
pip install pycaret[tuner]
pip install pycaret[mlops]
pip install pycaret[parallel]
pip install pycaret[test]

In [1]:

# check installed version
import pycaret
pycaret.__version__

Out[1]:

'3.0.0'

🚀 Quick start¶

PyCaret's time series forecasting module is now available. The module is currently suitable for univariate and multivariate time series forecasting tasks. The API of the time series module is consistent with the other modules of PyCaret.

It comes with built-in preprocessing capabilities and over 30 algorithms, comprising statistical / time-series methods as well as machine-learning-based models. In addition to model training, this module offers many other capabilities such as automated hyperparameter tuning, ensembling, model analysis, model packaging, and deployment.

A typical workflow in PyCaret consists of the following 5 steps, in this order:

Setup ➡️ Compare Models ➡️ Analyze Model ➡️ Prediction ➡️ Save Model¶

In [2]:

### loading sample dataset from pycaret dataset module
from pycaret.datasets import get_data
data = get_data('airline')

Period
1949-01    112.0
1949-02    118.0
1949-03    132.0
1949-04    129.0
1949-05    121.0
Freq: M, Name: Number of airline passengers, dtype: float64

In [3]:

# plot the dataset
data.plot()

Out[3]:

<AxesSubplot:xlabel='Period'>

Setup¶

This function initializes the training environment and creates the transformation pipeline. The setup function must be called before executing any other function in PyCaret. It has only one required parameter, data; all other parameters are optional.

In [4]:

# import pycaret time series and init setup
from pycaret.time_series import *
s = setup(data, fh = 3, session_id = 123)

	Description	Value
0	session_id	123
1	Target	Number of airline passengers
2	Approach	Univariate
3	Exogenous Variables	Not Present
4	Original data shape	(144, 1)
5	Transformed data shape	(144, 1)
6	Transformed train set shape	(141, 1)
7	Transformed test set shape	(3, 1)
8	Rows with missing values	0.0%
9	Fold Generator	ExpandingWindowSplitter
10	Fold Number	3
11	Enforce Prediction Interval	False
12	Splits used for hyperparameters	all
13	Seasonality Detection Algo	auto
14	Max Period to Consider	60
15	Seasonal Period(s) Tested	[12, 24, 36, 11, 48]
16	Significant Seasonal Period(s)	[12, 24, 36, 11, 48]
17	Significant Seasonal Period(s) without Harmonics	[48, 36, 11]
18	Remove Harmonics	False
19	Harmonics Order Method	harmonic_max
20	Num Seasonalities to Use	1
21	All Seasonalities to Use	[12]
22	Primary Seasonality	12
23	Seasonality Present	True
24	Target Strictly Positive	True
25	Target White Noise	No
26	Recommended d	1
27	Recommended Seasonal D	1
28	Preprocess	False
29	CPU Jobs	-1
30	Use GPU	False
31	Log Experiment	False
32	Experiment Name	ts-default-name
33	USI	4a01

Once setup has executed successfully, it displays an information grid containing experiment-level information.

- **Session id:** A pseudo-random number distributed as a seed in all functions for later reproducibility. If no session_id is passed, a random number is automatically generated and distributed to all functions.

- **Approach:** Univariate or multivariate.

- **Exogenous Variables:** Exogenous variables to be used in the model.

- **Original data shape:** Shape of the original data prior to any transformations.

- **Transformed train set shape:** Shape of the transformed train set.

- **Transformed test set shape:** Shape of the transformed test set.
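
The train and test shapes in the grid follow from the forecast horizon: with 144 observations and fh = 3, the last 3 points are held out. A minimal sketch of that idea in plain Python is shown here; the helper name temporal_train_test_split and the stand-in series are illustrative assumptions, not PyCaret's internal code.

```python
def temporal_train_test_split(series, fh):
    """Hold out the last `fh` observations as the test set (no shuffling)."""
    if not 0 < fh < len(series):
        raise ValueError("fh must be between 1 and len(series) - 1")
    return series[:-fh], series[-fh:]


# Stand-in for the 144 monthly observations of the airline dataset.
series = list(range(144))
train, test = temporal_train_test_split(series, fh=3)
print(len(train), len(test))  # 141 3
```

Because the split is purely positional, the order of observations is preserved, which is what a time series holdout needs.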

PyCaret has two sets of APIs that you can work with: (1) functional (as seen above) and (2) object-oriented.

With the object-oriented API, instead of executing functions directly, you import a class and execute its methods.

In [5]:

# import TSForecastingExperiment and init the class
from pycaret.time_series import TSForecastingExperiment
exp = TSForecastingExperiment()

In [6]:

# check the type of exp
type(exp)

Out[6]:

pycaret.time_series.forecasting.oop.TSForecastingExperiment

In [7]:

# init setup on exp
exp.setup(data, fh = 3, session_id = 123)

	Description	Value
0	session_id	123
1	Target	Number of airline passengers
2	Approach	Univariate
3	Exogenous Variables	Not Present
4	Original data shape	(144, 1)
5	Transformed data shape	(144, 1)
6	Transformed train set shape	(141, 1)
7	Transformed test set shape	(3, 1)
8	Rows with missing values	0.0%
9	Fold Generator	ExpandingWindowSplitter
10	Fold Number	3
11	Enforce Prediction Interval	False
12	Splits used for hyperparameters	all
13	Seasonality Detection Algo	auto
14	Max Period to Consider	60
15	Seasonal Period(s) Tested	[12, 24, 36, 11, 48]
16	Significant Seasonal Period(s)	[12, 24, 36, 11, 48]
17	Significant Seasonal Period(s) without Harmonics	[48, 36, 11]
18	Remove Harmonics	False
19	Harmonics Order Method	harmonic_max
20	Num Seasonalities to Use	1
21	All Seasonalities to Use	[12]
22	Primary Seasonality	12
23	Seasonality Present	True
24	Target Strictly Positive	True
25	Target White Noise	No
26	Recommended d	1
27	Recommended Seasonal D	1
28	Preprocess	False
29	CPU Jobs	-1
30	Use GPU	False
31	Log Experiment	False
32	Experiment Name	ts-default-name
33	USI	cf71

Out[7]:

<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1d36ad79a90>

You can use either method, functional or OOP, and even switch back and forth between the two sets of APIs. The choice of method does not affect the results and has been tested for consistency.

Check Stats¶

The check_stats function is used to get summary statistics and run statistical tests on the original data or model residuals.

In [8]:

# check statistical tests on original data
check_stats()

Out[8]:

	Test	Test Name	Data	Property	Setting	Value
0	Summary	Statistics	Transformed	Length		144.0
1	Summary	Statistics	Transformed	# Missing Values		0.0
2	Summary	Statistics	Transformed	Mean		280.298611
3	Summary	Statistics	Transformed	Median		265.5
4	Summary	Statistics	Transformed	Standard Deviation		119.966317
5	Summary	Statistics	Transformed	Variance		14391.917201
6	Summary	Statistics	Transformed	Kurtosis		-0.364942
7	Summary	Statistics	Transformed	Skewness		0.58316
8	Summary	Statistics	Transformed	# Distinct Values		118.0
9	White Noise	Ljung-Box	Transformed	Test Statictic	{'alpha': 0.05, 'K': 24}	1606.083817
10	White Noise	Ljung-Box	Transformed	Test Statictic	{'alpha': 0.05, 'K': 48}	1933.155822
11	White Noise	Ljung-Box	Transformed	p-value	{'alpha': 0.05, 'K': 24}	0.0
12	White Noise	Ljung-Box	Transformed	p-value	{'alpha': 0.05, 'K': 48}	0.0
13	White Noise	Ljung-Box	Transformed	White Noise	{'alpha': 0.05, 'K': 24}	False
14	White Noise	Ljung-Box	Transformed	White Noise	{'alpha': 0.05, 'K': 48}	False
15	Stationarity	ADF	Transformed	Stationarity	{'alpha': 0.05}	False
16	Stationarity	ADF	Transformed	p-value	{'alpha': 0.05}	0.99188
17	Stationarity	ADF	Transformed	Test Statistic	{'alpha': 0.05}	0.815369
18	Stationarity	ADF	Transformed	Critical Value 1%	{'alpha': 0.05}	-3.481682
19	Stationarity	ADF	Transformed	Critical Value 5%	{'alpha': 0.05}	-2.884042
20	Stationarity	ADF	Transformed	Critical Value 10%	{'alpha': 0.05}	-2.57877
21	Stationarity	KPSS	Transformed	Trend Stationarity	{'alpha': 0.05}	True
22	Stationarity	KPSS	Transformed	p-value	{'alpha': 0.05}	0.1
23	Stationarity	KPSS	Transformed	Test Statistic	{'alpha': 0.05}	0.09615
24	Stationarity	KPSS	Transformed	Critical Value 10%	{'alpha': 0.05}	0.119
25	Stationarity	KPSS	Transformed	Critical Value 5%	{'alpha': 0.05}	0.146
26	Stationarity	KPSS	Transformed	Critical Value 2.5%	{'alpha': 0.05}	0.176
27	Stationarity	KPSS	Transformed	Critical Value 1%	{'alpha': 0.05}	0.216
28	Normality	Shapiro	Transformed	Normality	{'alpha': 0.05}	False
29	Normality	Shapiro	Transformed	p-value	{'alpha': 0.05}	0.000068
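
For reference, the Ljung-Box rows above test whether the series behaves like white noise by combining sample autocorrelations over K lags. The sketch below implements the textbook statistic Q = n(n+2) Σ ρ_k² / (n−k) in plain Python; it is not PyCaret's own implementation, the function names are illustrative, and the p-value (which needs a chi-square distribution) is left out.

```python
def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(x, lags):
    """Ljung-Box Q statistic using autocorrelations at lags 1..lags."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, lags + 1))


# A strongly trending series is far from white noise, so Q comes out large.
trend = [float(i) for i in range(48)]
q_trend = ljung_box_q(trend, lags=12)
print(round(q_trend, 2))
```

A large Q relative to the chi-square critical value leads to rejecting the white-noise hypothesis, which is consistent with the False entries in the grid.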

Compare Models¶

This function trains and evaluates the performance of all the estimators available in the model library using cross-validation. The output of this function is a scoring grid with average cross-validated scores. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.

In [9]:

# compare baseline models
best = compare_models()

	Model	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2	TT (Sec)
ets	ETS	0.4912	0.5541	15.0940	19.3099	0.0318	0.0316	-0.4465	0.0967
exp_smooth	Exponential Smoothing	0.4929	0.5560	15.1460	19.3779	0.0320	0.0317	-0.4600	0.1033
arima	ARIMA	0.6964	0.7110	21.3757	24.7774	0.0447	0.0456	-0.5495	0.0667
auto_arima	Auto ARIMA	0.7136	0.6945	21.9389	24.2138	0.0459	0.0464	-0.5454	9.6867
par_cds_dt	Passive Aggressive w/ Cond. Deseasonalize & Detrending	0.7212	0.6696	22.1794	23.3673	0.0453	0.0468	0.0261	0.1200
lar_cds_dt	Least Angular Regressor w/ Cond. Deseasonalize & Detrending	0.8503	0.8261	26.2655	28.9830	0.0513	0.0534	0.0367	0.0967
huber_cds_dt	Huber w/ Cond. Deseasonalize & Detrending	0.8658	0.8362	26.7826	29.3947	0.0516	0.0536	0.1501	0.1333
lr_cds_dt	Linear w/ Cond. Deseasonalize & Detrending	0.8904	0.8722	27.5266	30.6243	0.0534	0.0555	-0.0092	0.4067
ridge_cds_dt	Ridge w/ Cond. Deseasonalize & Detrending	0.8905	0.8722	27.5270	30.6246	0.0534	0.0555	-0.0092	0.2933
en_cds_dt	Elastic Net w/ Cond. Deseasonalize & Detrending	0.8944	0.8746	27.6535	30.7127	0.0535	0.0557	-0.0063	0.3833
lasso_cds_dt	Lasso w/ Cond. Deseasonalize & Detrending	0.8966	0.8759	27.7231	30.7594	0.0536	0.0558	-0.0040	0.1033
br_cds_dt	Bayesian Ridge w/ Cond. Deseasonalize & Detrending	0.9156	0.8878	28.3188	31.1821	0.0547	0.0569	-0.0209	0.1067
knn_cds_dt	K Neighbors w/ Cond. Deseasonalize & Detrending	1.0695	0.9924	33.1500	34.9277	0.0631	0.0656	-0.1682	0.1233
theta	Theta Forecaster	1.0839	1.0393	33.3223	36.2555	0.0686	0.0710	-1.7926	0.0333
et_cds_dt	Extra Trees w/ Cond. Deseasonalize & Detrending	1.1678	1.0866	36.1678	38.2100	0.0694	0.0726	-0.4302	0.1900
dt_cds_dt	Decision Tree w/ Cond. Deseasonalize & Detrending	1.1930	1.1346	36.9106	39.8518	0.0733	0.0769	-0.8135	0.1300
lightgbm_cds_dt	Light Gradient Boosting w/ Cond. Deseasonalize & Detrending	1.2019	1.1362	37.2359	39.9827	0.0713	0.0746	-0.6051	0.6633
omp_cds_dt	Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending	1.2171	1.1475	37.6457	40.3070	0.0724	0.0757	-0.7057	0.1067
gbr_cds_dt	Gradient Boosting w/ Cond. Deseasonalize & Detrending	1.2274	1.1449	37.9963	40.2550	0.0735	0.0769	-0.7190	0.1467
rf_cds_dt	Random Forest w/ Cond. Deseasonalize & Detrending	1.2500	1.1782	38.6418	41.3528	0.0749	0.0784	-0.9426	0.2133
catboost_cds_dt	CatBoost Regressor w/ Cond. Deseasonalize & Detrending	1.2523	1.1604	38.8002	40.8201	0.0745	0.0780	-0.6842	1.5933
ada_cds_dt	AdaBoost w/ Cond. Deseasonalize & Detrending	1.2786	1.1951	39.6382	42.0658	0.0750	0.0788	-0.6308	0.1367
xgboost_cds_dt	Extreme Gradient Boosting w/ Cond. Deseasonalize & Detrending	1.3198	1.2045	40.8342	42.3045	0.0792	0.0831	-0.9192	0.1800
llar_cds_dt	Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending	1.3659	1.2672	42.3974	44.6597	0.0793	0.0834	-0.7393	0.0967
naive	Naive Forecaster	1.5654	1.4951	48.4444	52.5232	0.0920	0.0981	-1.8344	2.5533
snaive	Seasonal Naive Forecaster	1.6741	1.5343	51.6667	53.7350	0.1052	0.1117	-4.5388	1.2567
polytrend	Polynomial Trend Forecaster	2.1553	2.1096	66.9817	74.4048	0.1241	0.1350	-4.2525	0.0167
croston	Croston	2.4565	2.3513	76.3953	82.9794	0.1394	0.1562	-4.5895	0.0167
grand_means	Grand Means Forecaster	7.3065	6.5029	226.0502	228.3880	0.4469	0.5821	-72.1183	1.4433

In [10]:

# compare models using OOP
exp.compare_models()

	Model	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2	TT (Sec)
ets	ETS	0.4912	0.5541	15.0940	19.3099	0.0318	0.0316	-0.4465	0.0967
exp_smooth	Exponential Smoothing	0.4929	0.5560	15.1460	19.3779	0.0320	0.0317	-0.4600	0.0867
arima	ARIMA	0.6964	0.7110	21.3757	24.7774	0.0447	0.0456	-0.5495	0.1300
auto_arima	Auto ARIMA	0.7136	0.6945	21.9389	24.2138	0.0459	0.0464	-0.5454	13.9433
par_cds_dt	Passive Aggressive w/ Cond. Deseasonalize & Detrending	0.7212	0.6696	22.1794	23.3673	0.0453	0.0468	0.0261	0.1100
lar_cds_dt	Least Angular Regressor w/ Cond. Deseasonalize & Detrending	0.8503	0.8261	26.2655	28.9830	0.0513	0.0534	0.0367	0.1200
huber_cds_dt	Huber w/ Cond. Deseasonalize & Detrending	0.8658	0.8362	26.7826	29.3947	0.0516	0.0536	0.1501	0.0967
lr_cds_dt	Linear w/ Cond. Deseasonalize & Detrending	0.8904	0.8722	27.5266	30.6243	0.0534	0.0555	-0.0092	0.0967
ridge_cds_dt	Ridge w/ Cond. Deseasonalize & Detrending	0.8905	0.8722	27.5270	30.6246	0.0534	0.0555	-0.0092	0.0967
en_cds_dt	Elastic Net w/ Cond. Deseasonalize & Detrending	0.8944	0.8746	27.6535	30.7127	0.0535	0.0557	-0.0063	0.1133
lasso_cds_dt	Lasso w/ Cond. Deseasonalize & Detrending	0.8966	0.8759	27.7231	30.7594	0.0536	0.0558	-0.0040	0.0933
br_cds_dt	Bayesian Ridge w/ Cond. Deseasonalize & Detrending	0.9156	0.8878	28.3188	31.1821	0.0547	0.0569	-0.0209	0.0900
knn_cds_dt	K Neighbors w/ Cond. Deseasonalize & Detrending	1.0695	0.9924	33.1500	34.9277	0.0631	0.0656	-0.1682	0.1300
theta	Theta Forecaster	1.0839	1.0393	33.3223	36.2555	0.0686	0.0710	-1.7926	0.0300
et_cds_dt	Extra Trees w/ Cond. Deseasonalize & Detrending	1.1678	1.0866	36.1678	38.2100	0.0694	0.0726	-0.4302	0.2600
dt_cds_dt	Decision Tree w/ Cond. Deseasonalize & Detrending	1.1930	1.1346	36.9106	39.8518	0.0733	0.0769	-0.8135	0.1767
lightgbm_cds_dt	Light Gradient Boosting w/ Cond. Deseasonalize & Detrending	1.2019	1.1362	37.2359	39.9827	0.0713	0.0746	-0.6051	0.5800
omp_cds_dt	Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending	1.2171	1.1475	37.6457	40.3070	0.0724	0.0757	-0.7057	0.1167
gbr_cds_dt	Gradient Boosting w/ Cond. Deseasonalize & Detrending	1.2274	1.1449	37.9963	40.2550	0.0735	0.0769	-0.7190	0.1633
rf_cds_dt	Random Forest w/ Cond. Deseasonalize & Detrending	1.2500	1.1782	38.6418	41.3528	0.0749	0.0784	-0.9426	0.2433
catboost_cds_dt	CatBoost Regressor w/ Cond. Deseasonalize & Detrending	1.2523	1.1604	38.8002	40.8201	0.0745	0.0780	-0.6842	1.6900
ada_cds_dt	AdaBoost w/ Cond. Deseasonalize & Detrending	1.2786	1.1951	39.6382	42.0658	0.0750	0.0788	-0.6308	0.1733
xgboost_cds_dt	Extreme Gradient Boosting w/ Cond. Deseasonalize & Detrending	1.3198	1.2045	40.8342	42.3045	0.0792	0.0831	-0.9192	0.2167
llar_cds_dt	Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending	1.3659	1.2672	42.3974	44.6597	0.0793	0.0834	-0.7393	0.0967
naive	Naive Forecaster	1.5654	1.4951	48.4444	52.5232	0.0920	0.0981	-1.8344	0.0467
snaive	Seasonal Naive Forecaster	1.6741	1.5343	51.6667	53.7350	0.1052	0.1117	-4.5388	0.0367
polytrend	Polynomial Trend Forecaster	2.1553	2.1096	66.9817	74.4048	0.1241	0.1350	-4.2525	0.0433
croston	Croston	2.4565	2.3513	76.3953	82.9794	0.1394	0.1562	-4.5895	0.0267
grand_means	Grand Means Forecaster	7.3065	6.5029	226.0502	228.3880	0.4469	0.5821	-72.1183	0.0400

Out[10]:

AutoETS(seasonal='mul', sp=12, trend='add')

Notice that the output of the functional and OOP APIs is consistent. The rest of the functions in this notebook are shown using the functional API only.
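
The scoring grids above are sorted by MASE. As a rough illustration of what that column measures, here is a minimal sketch of the usual definition: the forecast MAE divided by the in-sample MAE of a (seasonal) naive forecast. It assumes the scaling uses a seasonal period sp; the function name and the toy numbers are illustrative, and the library's exact computation may differ in detail.

```python
def mase(y_true, y_pred, y_train, sp=1):
    """Mean absolute scaled error relative to an in-sample seasonal naive forecast."""
    mae = sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)
    # In-sample errors of the forecast "same value as sp steps earlier".
    naive_errors = [abs(y_train[t] - y_train[t - sp]) for t in range(sp, len(y_train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae / scale


y_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(mase([7.0, 8.0], [6.0, 8.0], y_train, sp=1))  # 0.5
print(mase([7.0, 8.0], [6.0, 8.0], y_train, sp=2))  # 0.25
```

Under this definition, values below 1 mean the model's errors are smaller than those of the naive baseline on the training data.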

Analyze Model¶

You can use the plot_model function to analyze the performance of a trained model on the test set. It may require re-training the model in certain cases.

In [11]:

# plot forecast
plot_model(best, plot = 'forecast')

In [12]:

# plot forecast for 36 months in future
plot_model(best, plot = 'forecast', data_kwargs = {'fh' : 36})

In [13]:

# residuals plot
plot_model(best, plot = 'residuals')

In [14]:

# check docstring to see available plots 
# help(plot_model)

An alternative to the plot_model function is evaluate_model. It can only be used in a notebook since it relies on ipywidgets.

Prediction¶

The predict_model function returns y_pred. When data is None (default), it uses fh as defined during the setup function.

In [15]:

# predict on test set
holdout_pred = predict_model(best)

	Model	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2
0	ETS	0.2516	0.2962	8.0352	10.7426	0.0179	0.0182	0.8642

In [16]:

# show predictions df
holdout_pred.head()

Out[16]:

	y_pred
1960-10	442.9857
1960-11	388.2084
1960-12	427.7002
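
The holdout metrics reported for ETS can be cross-checked by hand from the three predictions above. The sketch below does that in plain Python. Note the assumption: the actual values for October to December 1960 (461, 390, 432) are not displayed in this notebook and are taken here from the classic airline passengers dataset.

```python
import math

y_pred = [442.9857, 388.2084, 427.7002]  # from the predictions grid above
y_true = [461.0, 390.0, 432.0]           # assumed actuals for 1960-10 to 1960-12

errors = [a - p for a, p in zip(y_true, y_pred)]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
mape = sum(abs(e) / abs(a) for e, a in zip(errors, y_true)) / len(errors)

print(round(mae, 4), round(rmse, 4), round(mape, 4))  # 8.0352 10.7426 0.0179
```

These agree with the MAE, RMSE, and MAPE columns printed by predict_model, up to the rounding of the displayed predictions.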

In [17]:

# generate forecast for 36 periods in future
predict_model(best, fh = 36)

Out[17]:

	y_pred
1960-10	442.9857
1960-11	388.2084
1960-12	427.7002
1961-01	440.8284
1961-02	414.1669
1961-03	460.3102
1961-04	489.8039
1961-05	500.6157
1961-06	567.9574
1961-07	657.4232
1961-08	648.7133
1961-09	541.5302
1961-10	472.1907
1961-11	413.6623
1961-12	455.5911
1962-01	469.4199
1962-02	440.8848
1962-03	489.8460
1962-04	521.0650
1962-05	532.3979
1962-06	603.8251
1962-07	698.7234
1962-08	689.2542
1962-09	575.1974
1962-10	501.3958
1962-11	439.1161
1962-12	483.4819
1963-01	498.0115
1963-02	467.6027
1963-03	519.3819
1963-04	552.3262
1963-05	564.1800
1963-06	639.6927
1963-07	740.0237
1963-08	729.7950
1963-09	608.8646

Save Model¶

Finally, you can save the entire pipeline to disk for later use with PyCaret's save_model function.

In [18]:

# save pipeline
save_model(best, 'my_first_pipeline')

Transformation Pipeline and Model Successfully Saved

Out[18]:

(AutoETS(seasonal='mul', sp=12, trend='add'), 'my_first_pipeline.pkl')

In [19]:

# load pipeline
loaded_best_pipeline = load_model('my_first_pipeline')
loaded_best_pipeline

Transformation Pipeline and Model Successfully Loaded

Out[19]:

AutoETS(seasonal='mul', sp=12, trend='add')

👇 Detailed function-by-function overview¶

✅ Setup¶

This function initializes the training environment and creates the transformation pipeline. The setup function must be called before executing any other function in PyCaret. It has only one required parameter, data; all other parameters are optional.

In [20]:

s = setup(data, fh = 3, session_id = 123)

	Description	Value
0	session_id	123
1	Target	Number of airline passengers
2	Approach	Univariate
3	Exogenous Variables	Not Present
4	Original data shape	(144, 1)
5	Transformed data shape	(144, 1)
6	Transformed train set shape	(141, 1)
7	Transformed test set shape	(3, 1)
8	Rows with missing values	0.0%
9	Fold Generator	ExpandingWindowSplitter
10	Fold Number	3
11	Enforce Prediction Interval	False
12	Splits used for hyperparameters	all
13	Seasonality Detection Algo	auto
14	Max Period to Consider	60
15	Seasonal Period(s) Tested	[12, 24, 36, 11, 48]
16	Significant Seasonal Period(s)	[12, 24, 36, 11, 48]
17	Significant Seasonal Period(s) without Harmonics	[48, 36, 11]
18	Remove Harmonics	False
19	Harmonics Order Method	harmonic_max
20	Num Seasonalities to Use	1
21	All Seasonalities to Use	[12]
22	Primary Seasonality	12
23	Seasonality Present	True
24	Target Strictly Positive	True
25	Target White Noise	No
26	Recommended d	1
27	Recommended Seasonal D	1
28	Preprocess	False
29	CPU Jobs	-1
30	Use GPU	False
31	Log Experiment	False
32	Experiment Name	ts-default-name
33	USI	0889

To access the variables created by the setup function, such as the transformed dataset, random_state, etc., you can use the get_config function.

In [21]:

# check all available config
get_config()

Out[21]:

{'USI',
 'X',
 'X_test',
 'X_test_transformed',
 'X_train',
 'X_train_transformed',
 'X_transformed',
 '_available_plots',
 '_ml_usecase',
 'all_sps_to_use',
 'approach_type',
 'candidate_sps',
 'data',
 'dataset',
 'dataset_transformed',
 'enforce_exogenous',
 'enforce_pi',
 'exogenous_present',
 'exp_id',
 'exp_name_log',
 'fh',
 'fold_generator',
 'fold_param',
 'gpu_n_jobs_param',
 'gpu_param',
 'html_param',
 'idx',
 'index_type',
 'is_multiclass',
 'log_plots_param',
 'logging_param',
 'memory',
 'model_engines',
 'n_jobs_param',
 'pipeline',
 'primary_sp_to_use',
 'seasonality_present',
 'seed',
 'significant_sps',
 'significant_sps_no_harmonics',
 'strictly_positive',
 'test',
 'test_transformed',
 'train',
 'train_transformed',
 'variable_and_property_keys',
 'variables',
 'y',
 'y_test',
 'y_test_transformed',
 'y_train',
 'y_train_transformed',
 'y_transformed'}

In [22]:

# let's access y_train_transformed
get_config('y_train_transformed')

Out[22]:

Period
1949-01    112.0
1949-02    118.0
1949-03    132.0
1949-04    129.0
1949-05    121.0
           ...  
1960-05    472.0
1960-06    535.0
1960-07    622.0
1960-08    606.0
1960-09    508.0
Freq: M, Name: Number of airline passengers, Length: 141, dtype: float64

In [23]:

# another example: let's access seed
print("The current seed is: {}".format(get_config('seed')))

# now let's change it using set_config
set_config('seed', 786)
print("The new seed is: {}".format(get_config('seed')))

The current seed is: 123
The new seed is: 786

All the preprocessing configurations and experiment settings/parameters are passed into the setup function. To see all available parameters, check the docstring:

In [24]:

# help(setup)

In [25]:

# init setup fold_strategy = expanding
s = setup(data, fh = 3, session_id = 123,
          fold_strategy = 'expanding', numeric_imputation_target = 'drift')

	Description	Value
0	session_id	123
1	Target	Number of airline passengers
2	Approach	Univariate
3	Exogenous Variables	Not Present
4	Original data shape	(144, 1)
5	Transformed data shape	(144, 1)
6	Transformed train set shape	(141, 1)
7	Transformed test set shape	(3, 1)
8	Rows with missing values	0.0%
9	Fold Generator	ExpandingWindowSplitter
10	Fold Number	3
11	Enforce Prediction Interval	False
12	Splits used for hyperparameters	all
13	Seasonality Detection Algo	auto
14	Max Period to Consider	60
15	Seasonal Period(s) Tested	[12, 24, 36, 11, 48]
16	Significant Seasonal Period(s)	[12, 24, 36, 11, 48]
17	Significant Seasonal Period(s) without Harmonics	[48, 36, 11]
18	Remove Harmonics	False
19	Harmonics Order Method	harmonic_max
20	Num Seasonalities to Use	1
21	All Seasonalities to Use	[12]
22	Primary Seasonality	12
23	Seasonality Present	True
24	Target Strictly Positive	True
25	Target White Noise	No
26	Recommended d	1
27	Recommended Seasonal D	1
28	Preprocess	True
29	Numerical Imputation (Target)	drift
30	Transformation (Target)	None
31	Scaling (Target)	None
32	Feature Engineering (Target) - Reduced Regression	False
33	CPU Jobs	-1
34	Use GPU	False
35	Log Experiment	False
36	Experiment Name	ts-default-name
37	USI	b1f7
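
The grid reports an ExpandingWindowSplitter with 3 folds. To make the idea concrete, the following plain-Python sketch generates expanding-window folds over the 141 training points. The window arithmetic (initial window = training size minus folds × fh, step equal to fh) is an assumption about how the splitter is configured, and the function name is illustrative.

```python
def expanding_window_splits(n_train, fh, folds):
    """Yield (train_indices, test_indices) pairs with a growing training window."""
    initial_window = n_train - folds * fh
    if initial_window <= 0:
        raise ValueError("not enough data for the requested folds")
    for i in range(folds):
        train_end = initial_window + i * fh
        yield list(range(train_end)), list(range(train_end, train_end + fh))


splits = list(expanding_window_splits(n_train=141, fh=3, folds=3))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

Each fold trains on everything before its test window, so later folds see more history and no fold ever trains on future observations.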

✅ Compare Models¶

This function trains and evaluates the performance of all estimators available in the model library using cross-validation. The output of this function is a scoring grid with average cross-validated scores. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.

In [26]:

best = compare_models()

	Model	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2	TT (Sec)
ets	ETS	0.4912	0.5541	15.0940	19.3099	0.0318	0.0316	-0.4465	0.1100
exp_smooth	Exponential Smoothing	0.4929	0.5560	15.1460	19.3779	0.0320	0.0317	-0.4600	0.1067
arima	ARIMA	0.6964	0.7110	21.3757	24.7774	0.0447	0.0456	-0.5495	0.1167
auto_arima	Auto ARIMA	0.7136	0.6945	21.9389	24.2138	0.0459	0.0464	-0.5454	11.6333
par_cds_dt	Passive Aggressive w/ Cond. Deseasonalize & Detrending	0.7212	0.6696	22.1794	23.3673	0.0453	0.0468	0.0261	0.1267
lar_cds_dt	Least Angular Regressor w/ Cond. Deseasonalize & Detrending	0.8503	0.8261	26.2655	28.9830	0.0513	0.0534	0.0367	0.1167
huber_cds_dt	Huber w/ Cond. Deseasonalize & Detrending	0.8658	0.8362	26.7826	29.3947	0.0516	0.0536	0.1501	0.1267
lr_cds_dt	Linear w/ Cond. Deseasonalize & Detrending	0.8904	0.8722	27.5266	30.6243	0.0534	0.0555	-0.0092	0.1300
ridge_cds_dt	Ridge w/ Cond. Deseasonalize & Detrending	0.8905	0.8722	27.5270	30.6246	0.0534	0.0555	-0.0092	0.1333
en_cds_dt	Elastic Net w/ Cond. Deseasonalize & Detrending	0.8944	0.8746	27.6535	30.7127	0.0535	0.0557	-0.0063	0.1333
lasso_cds_dt	Lasso w/ Cond. Deseasonalize & Detrending	0.8966	0.8759	27.7231	30.7594	0.0536	0.0558	-0.0040	0.1233
br_cds_dt	Bayesian Ridge w/ Cond. Deseasonalize & Detrending	0.9156	0.8878	28.3188	31.1821	0.0547	0.0569	-0.0209	0.1167
knn_cds_dt	K Neighbors w/ Cond. Deseasonalize & Detrending	1.0695	0.9924	33.1500	34.9277	0.0631	0.0656	-0.1682	0.1433
theta	Theta Forecaster	1.0839	1.0393	33.3223	36.2555	0.0686	0.0710	-1.7926	0.0600
et_cds_dt	Extra Trees w/ Cond. Deseasonalize & Detrending	1.1678	1.0866	36.1678	38.2100	0.0694	0.0726	-0.4302	0.2033
dt_cds_dt	Decision Tree w/ Cond. Deseasonalize & Detrending	1.1930	1.1346	36.9106	39.8518	0.0733	0.0769	-0.8135	0.1233
lightgbm_cds_dt	Light Gradient Boosting w/ Cond. Deseasonalize & Detrending	1.2019	1.1362	37.2359	39.9827	0.0713	0.0746	-0.6051	0.3767
omp_cds_dt	Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending	1.2171	1.1475	37.6457	40.3070	0.0724	0.0757	-0.7057	0.1200
gbr_cds_dt	Gradient Boosting w/ Cond. Deseasonalize & Detrending	1.2274	1.1449	37.9963	40.2550	0.0735	0.0769	-0.7190	0.1567
rf_cds_dt	Random Forest w/ Cond. Deseasonalize & Detrending	1.2500	1.1782	38.6418	41.3528	0.0749	0.0784	-0.9426	0.2267
catboost_cds_dt	CatBoost Regressor w/ Cond. Deseasonalize & Detrending	1.2523	1.1604	38.8002	40.8201	0.0745	0.0780	-0.6842	1.0700
ada_cds_dt	AdaBoost w/ Cond. Deseasonalize & Detrending	1.2786	1.1951	39.6382	42.0658	0.0750	0.0788	-0.6308	0.1433
xgboost_cds_dt	Extreme Gradient Boosting w/ Cond. Deseasonalize & Detrending	1.3198	1.2045	40.8342	42.3045	0.0792	0.0831	-0.9192	0.1400
llar_cds_dt	Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending	1.3659	1.2672	42.3974	44.6597	0.0793	0.0834	-0.7393	0.1167
naive	Naive Forecaster	1.5654	1.4951	48.4444	52.5232	0.0920	0.0981	-1.8344	0.1067
snaive	Seasonal Naive Forecaster	1.6741	1.5343	51.6667	53.7350	0.1052	0.1117	-4.5388	0.0733
polytrend	Polynomial Trend Forecaster	2.1553	2.1096	66.9817	74.4048	0.1241	0.1350	-4.2525	0.0567
croston	Croston	2.4565	2.3513	76.3953	82.9794	0.1394	0.1562	-4.5895	0.0433
grand_means	Grand Means Forecaster	7.3065	6.5029	226.0502	228.3880	0.4469	0.5821	-72.1183	0.0733

By default, compare_models uses all the estimators in the model library except those with Turbo=False. To see all available models, use the models() function.

In [27]:

# check available models
models()

Out[27]:

	Name	Reference	Turbo
ID
naive	Naive Forecaster	sktime.forecasting.naive.NaiveForecaster	True
grand_means	Grand Means Forecaster	sktime.forecasting.naive.NaiveForecaster	True
snaive	Seasonal Naive Forecaster	sktime.forecasting.naive.NaiveForecaster	True
polytrend	Polynomial Trend Forecaster	sktime.forecasting.trend.PolynomialTrendForeca...	True
arima	ARIMA	sktime.forecasting.arima.ARIMA	True
auto_arima	Auto ARIMA	sktime.forecasting.arima.AutoARIMA	True
exp_smooth	Exponential Smoothing	sktime.forecasting.exp_smoothing.ExponentialSm...	True
croston	Croston	sktime.forecasting.croston.Croston	True
ets	ETS	sktime.forecasting.ets.AutoETS	True
theta	Theta Forecaster	sktime.forecasting.theta.ThetaForecaster	True
tbats	TBATS	sktime.forecasting.tbats.TBATS	False
bats	BATS	sktime.forecasting.bats.BATS	False
lr_cds_dt	Linear w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
en_cds_dt	Elastic Net w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
ridge_cds_dt	Ridge w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
lasso_cds_dt	Lasso w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
lar_cds_dt	Least Angular Regressor w/ Cond. Deseasonalize...	pycaret.containers.models.time_series.BaseCdsD...	True
llar_cds_dt	Lasso Least Angular Regressor w/ Cond. Deseaso...	pycaret.containers.models.time_series.BaseCdsD...	True
br_cds_dt	Bayesian Ridge w/ Cond. Deseasonalize & Detren...	pycaret.containers.models.time_series.BaseCdsD...	True
huber_cds_dt	Huber w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
par_cds_dt	Passive Aggressive w/ Cond. Deseasonalize & De...	pycaret.containers.models.time_series.BaseCdsD...	True
omp_cds_dt	Orthogonal Matching Pursuit w/ Cond. Deseasona...	pycaret.containers.models.time_series.BaseCdsD...	True
knn_cds_dt	K Neighbors w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
dt_cds_dt	Decision Tree w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
rf_cds_dt	Random Forest w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
et_cds_dt	Extra Trees w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
gbr_cds_dt	Gradient Boosting w/ Cond. Deseasonalize & Det...	pycaret.containers.models.time_series.BaseCdsD...	True
ada_cds_dt	AdaBoost w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
xgboost_cds_dt	Extreme Gradient Boosting w/ Cond. Deseasonali...	pycaret.containers.models.time_series.BaseCdsD...	True
lightgbm_cds_dt	Light Gradient Boosting w/ Cond. Deseasonalize...	pycaret.containers.models.time_series.BaseCdsD...	True
catboost_cds_dt	CatBoost Regressor w/ Cond. Deseasonalize & De...	pycaret.containers.models.time_series.BaseCdsD...	True

You can use the include and exclude parameters of compare_models to train only selected models, or to leave specific models out of training, by passing their model IDs.

In [28]:

compare_ts_models = compare_models(include = ['ets', 'arima', 'theta', 'naive', 'snaive', 'grand_means', 'polytrend'])

	Model	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2	TT (Sec)
ets	ETS	0.4912	0.5541	15.0940	19.3099	0.0318	0.0316	-0.4465	0.1033
arima	ARIMA	0.6964	0.7110	21.3757	24.7774	0.0447	0.0456	-0.5495	0.0800
theta	Theta Forecaster	1.0839	1.0393	33.3223	36.2555	0.0686	0.0710	-1.7926	0.0500
naive	Naive Forecaster	1.5654	1.4951	48.4444	52.5232	0.0920	0.0981	-1.8344	0.0467
snaive	Seasonal Naive Forecaster	1.6741	1.5343	51.6667	53.7350	0.1052	0.1117	-4.5388	0.0400
polytrend	Polynomial Trend Forecaster	2.1553	2.1096	66.9817	74.4048	0.1241	0.1350	-4.2525	0.0500
grand_means	Grand Means Forecaster	7.3065	6.5029	226.0502	228.3880	0.4469	0.5821	-72.1183	0.0500
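
The interplay of the Turbo flag with include and exclude can be pictured with a small stand-alone sketch. It uses a subset of the catalogue printed by models() and a hypothetical helper, select_model_ids. The rule that an explicit include list bypasses the Turbo filter is an assumption made for illustration, not a statement about PyCaret's internals.

```python
# Subset of the catalogue shown by models(): model ID -> Turbo flag.
CATALOG = {
    "naive": True,
    "arima": True,
    "ets": True,
    "theta": True,
    "tbats": False,
    "bats": False,
}


def select_model_ids(catalog, include=None, exclude=None):
    """Return the model IDs that would be trained (illustrative logic only)."""
    if include is not None:
        unknown = [m for m in include if m not in catalog]
        if unknown:
            raise ValueError(f"unknown model IDs: {unknown}")
        return list(include)  # assumed: explicit choices skip the Turbo filter
    excluded = set(exclude or [])
    return [m for m, turbo in catalog.items() if turbo and m not in excluded]


print(select_model_ids(CATALOG))                       # Turbo=True models only
print(select_model_ids(CATALOG, include=["ets", "arima"]))
print(select_model_ids(CATALOG, exclude=["naive"]))
```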

In [29]:

compare_ts_models

Out[29]:

AutoETS(seasonal='mul', sp=12, trend='add')

The function above returns the trained model object. The scoring grid is only displayed, not returned. If you need access to the scoring grid, you can use the pull function to get it as a dataframe.

In [30]:

compare_ts_models_results = pull()
compare_ts_models_results

Out[30]:

	Model	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2	TT (Sec)
ets	ETS	0.4912	0.5541	15.094	19.3099	0.0318	0.0316	-0.4465	0.1033
arima	ARIMA	0.6964	0.711	21.3757	24.7774	0.0447	0.0456	-0.5495	0.0800
theta	Theta Forecaster	1.0839	1.0393	33.3223	36.2555	0.0686	0.071	-1.7926	0.0500
naive	Naive Forecaster	1.5654	1.4951	48.4444	52.5232	0.092	0.0981	-1.8344	0.0467
snaive	Seasonal Naive Forecaster	1.6741	1.5343	51.6667	53.735	0.1052	0.1117	-4.5388	0.0400
polytrend	Polynomial Trend Forecaster	2.1553	2.1096	66.9817	74.4048	0.1241	0.135	-4.2525	0.0500
grand_means	Grand Means Forecaster	7.3065	6.5029	226.0502	228.388	0.4469	0.5821	-72.1183	0.0500
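To make the MAE and MASE columns easier to read, here is a minimal sketch of how these two metrics are commonly defined. It is a textbook formulation on invented toy numbers, not PyCaret's internal code; the function names and the sp argument (seasonal period used for the naive scaling) are assumptions made for this illustration.

```python
from statistics import mean

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(a - f) for a, f in zip(y_true, y_pred))

def mase(y_true, y_pred, y_train, sp=1):
    """Forecast MAE scaled by the in-sample MAE of a (seasonal) naive forecast."""
    naive_errors = [abs(y_train[t] - y_train[t - sp]) for t in range(sp, len(y_train))]
    return mae(y_true, y_pred) / mean(naive_errors)

# Toy numbers, invented for illustration only
y_train = [10, 12, 14, 16]
y_true = [18, 20]
y_pred = [17, 22]

print(mae(y_true, y_pred))            # 1.5
print(mase(y_true, y_pred, y_train))  # 0.75
```

Under this definition, a MASE below 1 means the forecast error is smaller than the average in-sample error of the naive benchmark.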

By default, compare_models returns the single best-performing model based on the metric defined in the sort parameter. Let's change our code to return the top 3 models, this time ranked by R2.

In [31]:

best_mae_models_top3 = compare_models(sort = 'R2', n_select = 3)

	Model	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2	TT (Sec)
huber_cds_dt	Huber w/ Cond. Deseasonalize & Detrending	0.8658	0.8362	26.7826	29.3947	0.0516	0.0536	0.1501	0.1300
lar_cds_dt	Least Angular Regressor w/ Cond. Deseasonalize & Detrending	0.8503	0.8261	26.2655	28.9830	0.0513	0.0534	0.0367	0.1067
par_cds_dt	Passive Aggressive w/ Cond. Deseasonalize & Detrending	0.7212	0.6696	22.1794	23.3673	0.0453	0.0468	0.0261	0.1100
lasso_cds_dt	Lasso w/ Cond. Deseasonalize & Detrending	0.8966	0.8759	27.7231	30.7594	0.0536	0.0558	-0.0040	0.1133
en_cds_dt	Elastic Net w/ Cond. Deseasonalize & Detrending	0.8944	0.8746	27.6535	30.7127	0.0535	0.0557	-0.0063	0.1133
lr_cds_dt	Linear w/ Cond. Deseasonalize & Detrending	0.8904	0.8722	27.5266	30.6243	0.0534	0.0555	-0.0092	0.1200
ridge_cds_dt	Ridge w/ Cond. Deseasonalize & Detrending	0.8905	0.8722	27.5270	30.6246	0.0534	0.0555	-0.0092	0.1167
br_cds_dt	Bayesian Ridge w/ Cond. Deseasonalize & Detrending	0.9156	0.8878	28.3188	31.1821	0.0547	0.0569	-0.0209	0.1267
knn_cds_dt	K Neighbors w/ Cond. Deseasonalize & Detrending	1.0695	0.9924	33.1500	34.9277	0.0631	0.0656	-0.1682	0.1367
et_cds_dt	Extra Trees w/ Cond. Deseasonalize & Detrending	1.1678	1.0866	36.1678	38.2100	0.0694	0.0726	-0.4302	0.2133
ets	ETS	0.4912	0.5541	15.0940	19.3099	0.0318	0.0316	-0.4465	0.1000
exp_smooth	Exponential Smoothing	0.4929	0.5560	15.1460	19.3779	0.0320	0.0317	-0.4600	0.1033
auto_arima	Auto ARIMA	0.7136	0.6945	21.9389	24.2138	0.0459	0.0464	-0.5454	11.7400
arima	ARIMA	0.6964	0.7110	21.3757	24.7774	0.0447	0.0456	-0.5495	0.0867
lightgbm_cds_dt	Light Gradient Boosting w/ Cond. Deseasonalize & Detrending	1.2019	1.1362	37.2359	39.9827	0.0713	0.0746	-0.6051	0.3667
ada_cds_dt	AdaBoost w/ Cond. Deseasonalize & Detrending	1.2786	1.1951	39.6382	42.0658	0.0750	0.0788	-0.6308	0.1433
catboost_cds_dt	CatBoost Regressor w/ Cond. Deseasonalize & Detrending	1.2523	1.1604	38.8002	40.8201	0.0745	0.0780	-0.6842	1.1400
omp_cds_dt	Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending	1.2171	1.1475	37.6457	40.3070	0.0724	0.0757	-0.7057	0.1200
gbr_cds_dt	Gradient Boosting w/ Cond. Deseasonalize & Detrending	1.2274	1.1449	37.9963	40.2550	0.0735	0.0769	-0.7190	0.1333
llar_cds_dt	Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending	1.3659	1.2672	42.3974	44.6597	0.0793	0.0834	-0.7393	0.1033
dt_cds_dt	Decision Tree w/ Cond. Deseasonalize & Detrending	1.1930	1.1346	36.9106	39.8518	0.0733	0.0769	-0.8135	0.1200
xgboost_cds_dt	Extreme Gradient Boosting w/ Cond. Deseasonalize & Detrending	1.3198	1.2045	40.8342	42.3045	0.0792	0.0831	-0.9192	0.1333
rf_cds_dt	Random Forest w/ Cond. Deseasonalize & Detrending	1.2500	1.1782	38.6418	41.3528	0.0749	0.0784	-0.9426	0.2033
theta	Theta Forecaster	1.0839	1.0393	33.3223	36.2555	0.0686	0.0710	-1.7926	0.0600
naive	Naive Forecaster	1.5654	1.4951	48.4444	52.5232	0.0920	0.0981	-1.8344	0.0500
polytrend	Polynomial Trend Forecaster	2.1553	2.1096	66.9817	74.4048	0.1241	0.1350	-4.2525	0.0500
snaive	Seasonal Naive Forecaster	1.6741	1.5343	51.6667	53.7350	0.1052	0.1117	-4.5388	0.0600
croston	Croston	2.4565	2.3513	76.3953	82.9794	0.1394	0.1562	-4.5895	0.0433
grand_means	Grand Means Forecaster	7.3065	6.5029	226.0502	228.3880	0.4469	0.5821	-72.1183	0.0633

In [32]:

# list of top 3 models (ranked by R2 in the call above)
best_mae_models_top3

Out[32]:

[BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11,
                                                                         10, 9,
                                                                         8, 7, 6,
                                                                         5, 4, 3,
                                                                         2, 1]},
                                                    n_jobs=1)],
                     regressor=HuberRegressor(), sp=12, window_length=12),
 BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11,
                                                                         10, 9,
                                                                         8, 7, 6,
                                                                         5, 4, 3,
                                                                         2, 1]},
                                                    n_jobs=1)],
                     regressor=Lars(random_state=123), sp=12, window_length=12),
 BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11,
                                                                         10, 9,
                                                                         8, 7, 6,
                                                                         5, 4, 3,
                                                                         2, 1]},
                                                    n_jobs=1)],
                     regressor=PassiveAggressiveRegressor(random_state=123),
                     sp=12, window_length=12)]
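The effect of sort and n_select can be pictured as ranking the scoring grid and keeping the first rows. The sketch below copies a few mean CV scores from the grid above; the helper top_n and the HIGHER_IS_BETTER set are hypothetical names for this illustration, not part of PyCaret.

```python
# Mean CV scores copied from the scoring grid above (subset of rows)
grid = {
    "huber_cds_dt": {"MAE": 26.7826, "R2": 0.1501},
    "lar_cds_dt": {"MAE": 26.2655, "R2": 0.0367},
    "par_cds_dt": {"MAE": 22.1794, "R2": 0.0261},
    "lasso_cds_dt": {"MAE": 27.7231, "R2": -0.0040},
    "ets": {"MAE": 15.0940, "R2": -0.4465},
}

# Assumption for this sketch: R2 is higher-is-better, error metrics are lower-is-better
HIGHER_IS_BETTER = {"R2"}

def top_n(grid, sort, n_select):
    """Rank model ids by one metric and keep the best n_select."""
    reverse = sort in HIGHER_IS_BETTER
    ranked = sorted(grid, key=lambda model_id: grid[model_id][sort], reverse=reverse)
    return ranked[:n_select]

print(top_n(grid, "R2", 3))   # ['huber_cds_dt', 'lar_cds_dt', 'par_cds_dt']
print(top_n(grid, "MAE", 3))  # ['ets', 'par_cds_dt', 'lar_cds_dt']
```

Note that the two metrics select different models here, so the value passed to sort matters when you reuse the returned list later.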

Some other parameters that you might find very useful in compare_models are:

fold
cross_validation
budget_time
errors
parallel
engine

You can check the docstring of the function for more info.

In [33]:

# help(compare_models)

✅ Check Stats¶

The check_stats function is used to get summary statistics and run statistical tests on the original data or model residuals.

In [34]:

# check stats on original data
check_stats()

Out[34]:

	Test	Test Name	Data	Property	Setting	Value
0	Summary	Statistics	Transformed	Length		144.0
1	Summary	Statistics	Transformed	# Missing Values		0.0
2	Summary	Statistics	Transformed	Mean		280.298611
3	Summary	Statistics	Transformed	Median		265.5
4	Summary	Statistics	Transformed	Standard Deviation		119.966317
5	Summary	Statistics	Transformed	Variance		14391.917201
6	Summary	Statistics	Transformed	Kurtosis		-0.364942
7	Summary	Statistics	Transformed	Skewness		0.58316
8	Summary	Statistics	Transformed	# Distinct Values		118.0
9	White Noise	Ljung-Box	Transformed	Test Statictic	{'alpha': 0.05, 'K': 24}	1606.083817
10	White Noise	Ljung-Box	Transformed	Test Statictic	{'alpha': 0.05, 'K': 48}	1933.155822
11	White Noise	Ljung-Box	Transformed	p-value	{'alpha': 0.05, 'K': 24}	0.0
12	White Noise	Ljung-Box	Transformed	p-value	{'alpha': 0.05, 'K': 48}	0.0
13	White Noise	Ljung-Box	Transformed	White Noise	{'alpha': 0.05, 'K': 24}	False
14	White Noise	Ljung-Box	Transformed	White Noise	{'alpha': 0.05, 'K': 48}	False
15	Stationarity	ADF	Transformed	Stationarity	{'alpha': 0.05}	False
16	Stationarity	ADF	Transformed	p-value	{'alpha': 0.05}	0.99188
17	Stationarity	ADF	Transformed	Test Statistic	{'alpha': 0.05}	0.815369
18	Stationarity	ADF	Transformed	Critical Value 1%	{'alpha': 0.05}	-3.481682
19	Stationarity	ADF	Transformed	Critical Value 5%	{'alpha': 0.05}	-2.884042
20	Stationarity	ADF	Transformed	Critical Value 10%	{'alpha': 0.05}	-2.57877
21	Stationarity	KPSS	Transformed	Trend Stationarity	{'alpha': 0.05}	True
22	Stationarity	KPSS	Transformed	p-value	{'alpha': 0.05}	0.1
23	Stationarity	KPSS	Transformed	Test Statistic	{'alpha': 0.05}	0.09615
24	Stationarity	KPSS	Transformed	Critical Value 10%	{'alpha': 0.05}	0.119
25	Stationarity	KPSS	Transformed	Critical Value 5%	{'alpha': 0.05}	0.146
26	Stationarity	KPSS	Transformed	Critical Value 2.5%	{'alpha': 0.05}	0.176
27	Stationarity	KPSS	Transformed	Critical Value 1%	{'alpha': 0.05}	0.216
28	Normality	Shapiro	Transformed	Normality	{'alpha': 0.05}	False
29	Normality	Shapiro	Transformed	p-value	{'alpha': 0.05}	0.000068

In [35]:

# check_stats on residuals of best model
check_stats(estimator = best)

Out[35]:

	Test	Test Name	Data	Property	Setting	Value
0	Summary	Statistics	Residual	Length		141.0
1	Summary	Statistics	Residual	# Missing Values		0.0
2	Summary	Statistics	Residual	Mean		-0.040771
3	Summary	Statistics	Residual	Median		-0.9734
4	Summary	Statistics	Residual	Standard Deviation		10.584861
5	Summary	Statistics	Residual	Variance		112.039291
6	Summary	Statistics	Residual	Kurtosis		1.564477
7	Summary	Statistics	Residual	Skewness		-0.180433
8	Summary	Statistics	Residual	# Distinct Values		141.0
9	White Noise	Ljung-Box	Residual	Test Statictic	{'alpha': 0.05, 'K': 24}	41.377235
10	White Noise	Ljung-Box	Residual	Test Statictic	{'alpha': 0.05, 'K': 48}	62.234507
11	White Noise	Ljung-Box	Residual	p-value	{'alpha': 0.05, 'K': 24}	0.015137
12	White Noise	Ljung-Box	Residual	p-value	{'alpha': 0.05, 'K': 48}	0.081294
13	White Noise	Ljung-Box	Residual	White Noise	{'alpha': 0.05, 'K': 24}	False
14	White Noise	Ljung-Box	Residual	White Noise	{'alpha': 0.05, 'K': 48}	True
15	Stationarity	ADF	Residual	Stationarity	{'alpha': 0.05}	True
16	Stationarity	ADF	Residual	p-value	{'alpha': 0.05}	0.000377
17	Stationarity	ADF	Residual	Test Statistic	{'alpha': 0.05}	-4.341183
18	Stationarity	ADF	Residual	Critical Value 1%	{'alpha': 0.05}	-3.481282
19	Stationarity	ADF	Residual	Critical Value 5%	{'alpha': 0.05}	-2.883868
20	Stationarity	ADF	Residual	Critical Value 10%	{'alpha': 0.05}	-2.578677
21	Stationarity	KPSS	Residual	Trend Stationarity	{'alpha': 0.05}	True
22	Stationarity	KPSS	Residual	p-value	{'alpha': 0.05}	0.1
23	Stationarity	KPSS	Residual	Test Statistic	{'alpha': 0.05}	0.036131
24	Stationarity	KPSS	Residual	Critical Value 10%	{'alpha': 0.05}	0.119
25	Stationarity	KPSS	Residual	Critical Value 5%	{'alpha': 0.05}	0.146
26	Stationarity	KPSS	Residual	Critical Value 2.5%	{'alpha': 0.05}	0.176
27	Stationarity	KPSS	Residual	Critical Value 1%	{'alpha': 0.05}	0.216
28	Normality	Shapiro	Residual	Normality	{'alpha': 0.05}	False
29	Normality	Shapiro	Residual	p-value	{'alpha': 0.05}	0.026076
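For readers unfamiliar with the white noise rows, the following is a rough sketch of the textbook Ljung-Box Q statistic, which grows when a series is autocorrelated. It is written from the standard formula and is not the code check_stats runs; reading the K in the Setting column as the number of lags is an assumption here, and the p-value step (a chi-squared tail probability) is left out.

```python
def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mu = sum(x) / n
    denom = sum((v - mu) ** 2 for v in x)
    num = sum((x[t] - mu) * (x[t - k] - mu) for t in range(k, n))
    return num / denom

def ljung_box_q(x, lags):
    """Q = n (n + 2) * sum_{k=1..lags} rho_k^2 / (n - k)."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, lags + 1))

# Tiny hand-checkable case: rho_1 = 0.25, so Q = 4 * 6 * 0.0625 / 3, about 0.5
print(ljung_box_q([1, 2, 3, 4], lags=1))

# A trending series is strongly autocorrelated, so Q is large
print(ljung_box_q(list(range(1, 25)), lags=5))
```

A large Q relative to the chi-squared distribution leads to rejecting the white noise hypothesis, which is consistent with the False value reported for the original, trending series.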

✅ Experiment Logging¶

PyCaret integrates with several types of experiment loggers (default = 'mlflow'). To turn on experiment tracking in PyCaret, set the log_experiment and experiment_name parameters in setup. PyCaret will then automatically track the metrics, hyperparameters, and artifacts using the chosen logger.

In [36]:

# from pycaret.time_series import *
# s = setup(data, fh = 3, session_id = 123, log_experiment='mlflow', experiment_name='airline_experiment')

In [37]:

# compare models
# best = compare_models()

In [38]:

# start mlflow server on localhost:5000
# !mlflow ui

By default PyCaret uses the MLflow logger; this can be changed with the log_experiment parameter. The following loggers are available:

- mlflow
- wandb
- comet_ml
- dagshub

Other logging related parameters that you may find useful are:

experiment_custom_tags
log_plots
log_data
log_profile

For more information check out the docstring of the setup function.

In [39]:

# help(setup)

✅ Create Model¶

This function trains and evaluates the performance of a given estimator using cross-validation. The output of this function is a scoring grid with CV scores by fold. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions. All the available models can be listed using the models function.

In [40]:

# check all the available models
models()

Out[40]:

	Name	Reference	Turbo
ID
naive	Naive Forecaster	sktime.forecasting.naive.NaiveForecaster	True
grand_means	Grand Means Forecaster	sktime.forecasting.naive.NaiveForecaster	True
snaive	Seasonal Naive Forecaster	sktime.forecasting.naive.NaiveForecaster	True
polytrend	Polynomial Trend Forecaster	sktime.forecasting.trend.PolynomialTrendForeca...	True
arima	ARIMA	sktime.forecasting.arima.ARIMA	True
auto_arima	Auto ARIMA	sktime.forecasting.arima.AutoARIMA	True
exp_smooth	Exponential Smoothing	sktime.forecasting.exp_smoothing.ExponentialSm...	True
croston	Croston	sktime.forecasting.croston.Croston	True
ets	ETS	sktime.forecasting.ets.AutoETS	True
theta	Theta Forecaster	sktime.forecasting.theta.ThetaForecaster	True
tbats	TBATS	sktime.forecasting.tbats.TBATS	False
bats	BATS	sktime.forecasting.bats.BATS	False
lr_cds_dt	Linear w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
en_cds_dt	Elastic Net w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
ridge_cds_dt	Ridge w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
lasso_cds_dt	Lasso w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
lar_cds_dt	Least Angular Regressor w/ Cond. Deseasonalize...	pycaret.containers.models.time_series.BaseCdsD...	True
llar_cds_dt	Lasso Least Angular Regressor w/ Cond. Deseaso...	pycaret.containers.models.time_series.BaseCdsD...	True
br_cds_dt	Bayesian Ridge w/ Cond. Deseasonalize & Detren...	pycaret.containers.models.time_series.BaseCdsD...	True
huber_cds_dt	Huber w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
par_cds_dt	Passive Aggressive w/ Cond. Deseasonalize & De...	pycaret.containers.models.time_series.BaseCdsD...	True
omp_cds_dt	Orthogonal Matching Pursuit w/ Cond. Deseasona...	pycaret.containers.models.time_series.BaseCdsD...	True
knn_cds_dt	K Neighbors w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
dt_cds_dt	Decision Tree w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
rf_cds_dt	Random Forest w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
et_cds_dt	Extra Trees w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
gbr_cds_dt	Gradient Boosting w/ Cond. Deseasonalize & Det...	pycaret.containers.models.time_series.BaseCdsD...	True
ada_cds_dt	AdaBoost w/ Cond. Deseasonalize & Detrending	pycaret.containers.models.time_series.BaseCdsD...	True
xgboost_cds_dt	Extreme Gradient Boosting w/ Cond. Deseasonali...	pycaret.containers.models.time_series.BaseCdsD...	True
lightgbm_cds_dt	Light Gradient Boosting w/ Cond. Deseasonalize...	pycaret.containers.models.time_series.BaseCdsD...	True
catboost_cds_dt	CatBoost Regressor w/ Cond. Deseasonalize & De...	pycaret.containers.models.time_series.BaseCdsD...	True

In [41]:

# train ets with default fold=3
ets = create_model('ets')

	cutoff	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2
0	1959-12	0.5083	0.7238	15.4772	25.0045	0.0371	0.0354	-2.8436
1	1960-03	0.6856	0.6262	21.0315	21.7984	0.0437	0.0448	0.5529
2	1960-06	0.2796	0.3123	8.7733	11.1270	0.0147	0.0146	0.9512
Mean	NaT	0.4912	0.5541	15.0940	19.3099	0.0318	0.0316	-0.4465
SD	NaT	0.1662	0.1755	5.0117	5.9324	0.0124	0.0126	1.7028

The function above returns the trained model object. The scoring grid is only displayed, not returned. If you need access to the scoring grid, you can use the pull function to get it as a dataframe.

In [42]:

ets_results = pull()
print(type(ets_results))
ets_results

<class 'pandas.core.frame.DataFrame'>

Out[42]:

	cutoff	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2
0	1959-12	0.5083	0.7238	15.4772	25.0045	0.0371	0.0354	-2.8436
1	1960-03	0.6856	0.6262	21.0315	21.7984	0.0437	0.0448	0.5529
2	1960-06	0.2796	0.3123	8.7733	11.1270	0.0147	0.0146	0.9512
Mean	NaT	0.4912	0.5541	15.0940	19.3099	0.0318	0.0316	-0.4465
SD	NaT	0.1662	0.1755	5.0117	5.9324	0.0124	0.0126	1.7028
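The cutoff column marks the last training period of each fold. As a rough sketch, the cutoffs above are consistent with an expanding window that moves forward by fh periods per fold. The code assumes 144 monthly observations starting in 1949-01 with the last 3 held out by setup; the start date and both helper functions are assumptions for this illustration, not PyCaret's splitter.

```python
def month_label(start_year, start_month, offset):
    """Return 'YYYY-MM' for the month `offset` steps after the start month."""
    months = (start_month - 1) + offset
    return f"{start_year + months // 12}-{months % 12 + 1:02d}"

def expanding_cutoffs(n_train, fh, fold):
    """Index of the last training point per fold; the next `fh` points form the test window."""
    return [n_train - fh * (fold - i) - 1 for i in range(fold)]

# Assumptions: 144 monthly points starting 1949-01, last fh=3 points held out as the test set
n_train, fh = 144 - 3, 3

for fold in (3, 5):
    labels = [month_label(1949, 1, idx) for idx in expanding_cutoffs(n_train, fh, fold)]
    print(fold, labels)
# 3 ['1959-12', '1960-03', '1960-06']
# 5 ['1959-06', '1959-09', '1959-12', '1960-03', '1960-06']
```

Raising fold adds earlier cutoffs, so each extra fold is trained on a shorter history.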

In [43]:

# train theta model with fold=5
theta = create_model('theta', fold=5)

	cutoff	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2
0	1959-06	0.8152	0.8212	23.7114	27.0777	0.0436	0.0448	0.6016
1	1959-09	0.1622	0.1723	4.8339	5.8216	0.0127	0.0128	0.9213
2	1959-12	0.6788	0.7857	20.6700	27.1432	0.0501	0.0481	-3.5292
3	1960-03	2.0377	1.8037	62.5075	62.7874	0.1276	0.1363	-2.7090
4	1960-06	0.5352	0.5287	16.7895	18.8359	0.0282	0.0286	0.8603
Mean	NaT	0.8458	0.8223	25.7024	28.3332	0.0524	0.0541	-0.7710
SD	NaT	0.6346	0.5428	19.4876	18.9053	0.0397	0.0430	1.9377

In [44]:

# train theta with specific model parameters
create_model('theta', deseasonalize = False, fold=5)

	cutoff	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2
0	1959-06	1.9597	1.9658	57.0033	64.8214	0.1046	0.1117	-1.2833
1	1959-09	2.5537	2.3345	76.0868	78.8857	0.1979	0.1785	-13.4421
2	1959-12	0.3980	0.3686	12.1206	12.7351	0.0300	0.0298	0.0030
3	1960-03	2.1688	2.1163	66.5262	73.6688	0.1324	0.1436	-4.1060
4	1960-06	1.9552	1.8291	61.3391	65.1682	0.1034	0.1083	-0.6723
Mean	NaT	1.8071	1.7229	54.6152	59.0559	0.1136	0.1144	-3.9002
SD	NaT	0.7374	0.6976	22.1793	23.7612	0.0541	0.0493	4.9718

Out[44]:

ThetaForecaster(deseasonalize=False, sp=12)

Some other parameters that you might find very useful in create_model are:

cross_validation
engine
fit_kwargs

You can check the docstring of the function for more info.

In [45]:

# help(create_model)

✅ Tune Model¶

The tune_model function tunes the hyperparameters of the model. The output of this function is a scoring grid with cross-validated scores by fold. The best model is selected based on the metric defined in the optimize parameter. Metrics evaluated during cross-validation can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.

In [46]:

# train a dt model with default params
dt = create_model('dt_cds_dt')

	cutoff	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2
0	1959-12	0.5039	0.5459	15.3434	18.8593	0.0377	0.0388	-1.1865
1	1960-03	1.5566	1.3747	47.7489	47.8526	0.0984	0.1036	-1.1544
2	1960-06	1.5185	1.4832	47.6395	52.8433	0.0838	0.0884	-0.0996
Mean	NaT	1.1930	1.1346	36.9106	39.8518	0.0733	0.0769	-0.8135
SD	NaT	0.4875	0.4186	15.2504	14.9831	0.0259	0.0277	0.5050

In [47]:

# tune hyperparameters of dt
tuned_dt = tune_model(dt)

	cutoff	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2
0	1959-12	0.6369	0.7822	19.3938	27.0225	0.0470	0.0450	-3.4890
1	1960-03	1.3005	1.1639	39.8938	40.5155	0.0819	0.0856	-0.5444
2	1960-06	0.9561	0.9788	29.9971	34.8742	0.0495	0.0512	0.5211
Mean	NaT	0.9645	0.9750	29.7616	34.1374	0.0595	0.0606	-1.1708
SD	NaT	0.2710	0.1559	8.3707	5.5331	0.0159	0.0178	1.6960

Fitting 3 folds for each of 10 candidates, totalling 30 fits

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed:    3.2s finished

The metric to optimize can be set with the optimize parameter (default = 'MASE'). A custom tuning grid can also be passed with the custom_grid parameter.

In [48]:

dt

Out[48]:

BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11,
                                                                        10, 9,
                                                                        8, 7, 6,
                                                                        5, 4, 3,
                                                                        2, 1]},
                                                   n_jobs=1)],
                    regressor=DecisionTreeRegressor(random_state=123), sp=12,
                    window_length=12)

In [49]:

# define tuning grid
dt_grid = {'regressor__max_depth' : [None, 2, 4, 6, 8, 10, 12]}

# tune model with custom grid and metric = MAE
tuned_dt = tune_model(dt, custom_grid = dt_grid, optimize = 'MAE')

	cutoff	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2
0	1959-12	0.5466	0.5815	16.6450	20.0910	0.0409	0.0421	-1.4814
1	1960-03	1.2777	1.1388	39.1945	39.6419	0.0799	0.0833	-0.4785
2	1960-06	1.6742	1.5262	52.5234	54.3772	0.0906	0.0952	-0.1643
Mean	NaT	1.1662	1.0822	36.1210	38.0367	0.0705	0.0735	-0.7081
SD	NaT	0.4670	0.3877	14.8077	14.0432	0.0214	0.0227	0.5617

Fitting 3 folds for each of 7 candidates, totalling 21 fits

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  21 out of  21 | elapsed:    2.9s finished
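The "7 candidates, totalling 21 fits" line in the log follows directly from the size of the custom grid. The sketch below enumerates the grid with the standard library to show the arithmetic; expand_grid is a hypothetical helper written for this example, and the fold count of 3 is taken from the log above.

```python
from itertools import product

# Same grid as in the cell above
dt_grid = {"regressor__max_depth": [None, 2, 4, 6, 8, 10, 12]}

def expand_grid(grid):
    """List every parameter combination in a dict of candidate lists."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

candidates = expand_grid(dt_grid)
n_folds = 3  # number of CV folds reported in the log

print(len(candidates), "candidates x", n_folds, "folds =", len(candidates) * n_folds, "fits")
# 7 candidates x 3 folds = 21 fits
```

Adding a second key to the grid multiplies the number of combinations, so the fit count can grow quickly.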

In [50]:

# see tuned_dt params
tuned_dt

Out[50]:

BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11,
                                                                        10, 9,
                                                                        8, 7, 6,
                                                                        5, 4, 3,
                                                                        2, 1]},
                                                   n_jobs=1)],
                    regressor=DecisionTreeRegressor(max_depth=4,
                                                    random_state=123),
                    sp=12, window_length=12)

In [51]:

# to access the tuner object you can set return_tuner = True
tuned_dt, tuner = tune_model(dt, return_tuner=True)

	cutoff	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2
0	1959-12	0.6369	0.7822	19.3938	27.0225	0.0470	0.0450	-3.4890
1	1960-03	1.3005	1.1639	39.8938	40.5155	0.0819	0.0856	-0.5444
2	1960-06	0.9561	0.9788	29.9971	34.8742	0.0495	0.0512	0.5211
Mean	NaT	0.9645	0.9750	29.7616	34.1374	0.0595	0.0606	-1.1708
SD	NaT	0.2710	0.1559	8.3707	5.5331	0.0159	0.0178	1.6960

Fitting 3 folds for each of 10 candidates, totalling 30 fits

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed:    3.8s finished

In [52]:

# model object
tuned_dt

Out[52]:

BaseCdsDtForecaster(degree=3, deseasonal_model='multiplicative',
                    fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11,
                                                                        10, 9,
                                                                        8, 7, 6,
                                                                        5, 4, 3,
                                                                        2, 1]},
                                                   n_jobs=1)],
                    regressor=DecisionTreeRegressor(max_depth=9,
                                                    max_features='log2',
                                                    min_impurity_decrease=0.005742993267225779,
                                                    min_samples_leaf=5,
                                                    min_samples_split=4,
                                                    random_state=123),
                    sp=12, window_length=22)

In [53]:

# tuner object
tuner

Out[53]:

<pycaret.utils.time_series.forecasting.model_selection.ForecastingRandomizedSearchCV at 0x1d36d1965b0>

For details on the available search_library and search_algorithm options, please check the docstring. Some other parameters that you might find very useful in tune_model are:

choose_better
custom_scorer
n_iter
search_algorithm
optimize

You can check the docstring of the function for more info.
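Tuning does not always improve every metric, which is the situation choose_better is meant to guard against. The sketch below is a conceptual illustration only: it compares the mean CV scores printed earlier for the base dt model and the default tune_model(dt) run. The function and its higher_is_better flag are assumptions for this example, not PyCaret internals.

```python
def choose_better(base_score, tuned_score, higher_is_better=False):
    """Return 'tuned' only if the tuned mean CV score beats the base score."""
    if higher_is_better:
        return "tuned" if tuned_score > base_score else "base"
    return "tuned" if tuned_score < base_score else "base"

# Mean CV scores from the grids above: create_model('dt_cds_dt') vs. tune_model(dt)
print(choose_better(1.1930, 0.9645))                           # MASE -> tuned
print(choose_better(-0.8135, -1.1708, higher_is_better=True))  # R2   -> base
```

In this run the tuned model wins on MASE but loses on R2, so the outcome of such a comparison depends on the metric chosen in optimize.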

In [54]:

# help(tune_model)

✅ Blend Models¶

This function trains an EnsembleForecaster for the models passed in the estimator_list parameter. The output of this function is a scoring grid with CV scores by fold. Metrics evaluated during CV can be accessed using the get_metrics function. Custom metrics can be added or removed using the add_metric and remove_metric functions.
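Conceptually, blending combines the forecasts of several models step by step. Here is a minimal sketch of that idea with invented forecast values; the blend helper, its method and weights arguments, and the numbers are all assumptions for illustration and do not reproduce the ensemble class PyCaret builds.

```python
from statistics import mean, median

# Hypothetical 3-step forecasts from three fitted models (invented numbers)
forecasts = {
    "huber": [410.0, 390.0, 430.0],
    "lars": [420.0, 400.0, 440.0],
    "par": [400.0, 380.0, 450.0],
}

def blend(forecasts, method="mean", weights=None):
    """Combine per-model forecasts step by step (mean, weighted mean, or median)."""
    steps = zip(*forecasts.values())
    if method == "median":
        return [median(step) for step in steps]
    if weights is None:
        return [mean(step) for step in steps]
    total = sum(weights)
    return [sum(w * v for w, v in zip(weights, step)) / total for step in steps]

print(blend(forecasts))                     # [410.0, 390.0, 440.0]
print(blend(forecasts, method="median"))    # [410.0, 390.0, 440.0]
print(blend(forecasts, weights=[2, 1, 1]))  # [410.0, 390.0, 437.5]
```

Averaging tends to cancel out errors that the individual models make in different directions, which is one reason a blend can score better than any single member.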

In [55]:

# top 3 models (selected with sort = 'R2' earlier)
best_mae_models_top3

Out[55]:

[BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11,
                                                                         10, 9,
                                                                         8, 7, 6,
                                                                         5, 4, 3,
                                                                         2, 1]},
                                                    n_jobs=1)],
                     regressor=HuberRegressor(), sp=12, window_length=12),
 BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11,
                                                                         10, 9,
                                                                         8, 7, 6,
                                                                         5, 4, 3,
                                                                         2, 1]},
                                                    n_jobs=1)],
                     regressor=Lars(random_state=123), sp=12, window_length=12),
 BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12, 11,
                                                                         10, 9,
                                                                         8, 7, 6,
                                                                         5, 4, 3,
                                                                         2, 1]},
                                                    n_jobs=1)],
                     regressor=PassiveAggressiveRegressor(random_state=123),
                     sp=12, window_length=12)]

In [56]:

# blend top 3 models
blend_models(best_mae_models_top3)

	cutoff	MASE	RMSSE	MAE	RMSE	MAPE	SMAPE	R2
0	1959-12	0.1240	0.1641	3.7761	5.6693	0.0091	0.0092	0.8024
1	1960-03	1.4150	1.2555	43.4050	43.7064	0.0890	0.0932	-0.7972
2	1960-06	0.7444	0.7505	23.3552	26.7403	0.0386	0.0395	0.7184
Mean	NaT	0.7612	0.7234	23.5121	25.3720	0.0456	0.0473	0.2412
SD	NaT	0.5272	0.4460	16.1788	15.5587	0.0330	0.0347	0.7351

Out[56]:

_EnsembleForecasterWithVoting(forecasters=[('HuberRegressor',
                                            BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12,
                                                                                                                    11,
                                                                                                                    10,
                                                                                                                    9,
                                                                                                                    8,
                                                                                                                    7,
                                                                                                                    6,
                                                                                                                    5,
                                                                                                                    4,
                                                                                                                    3,
                                                                                                                    2,
                                                                                                                    1]},
                                                                                               n_jobs=1)],
                                                                regressor=HuberRegressor(),
                                                                sp=12,
                                                                window_length=12)),
                                           ('Lars',
                                            BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12,
                                                                                                                    11,
                                                                                                                    10,
                                                                                                                    9,
                                                                                                                    8,
                                                                                                                    7,
                                                                                                                    6,
                                                                                                                    5,
                                                                                                                    4,
                                                                                                                    3,
                                                                                                                    2,
                                                                                                                    1]},
                                                                                               n_jobs=1)],
                                                                regressor=Lars(random_state=123),
                                                                sp=12,
                                                                window_length=12)),
                                           ('PassiveAggressiveRegressor',
                                            BaseCdsDtForecaster(fe_target_rr=[WindowSummarizer(lag_feature={'lag': [12,
                                                                                                                    11,
                                                                                                                    10,
                                                                                                                    9,
                                                                                                                    8,
                                                                                                                    7,
                                                                                                                    6,
                                                                                                                    5,
                                                                                                                    4,
                                                                                                                    3,
                                                                                                                    2,
                                                                                                                    1]},
                                                                                               n_jobs=1)],
                                                                regressor=PassiveAggressiveRegressor(random_state=123),
                                                                sp=12,
                                                                window_length=12))],
                              n_jobs=-1)

Some other parameters that you might find very useful in blend_models are:

choose_better
method
weights
fit_kwargs
optimize

You can check the docstring of the function for more info.

In [57]:

# help(blend_models)
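To make the method and weights parameters more concrete, the sketch below shows one way per-model forecasts could be combined into a single blended forecast. It is plain Python and not PyCaret's implementation: the helper blend_forecasts, the 'mean' and 'median' options as written here, and all numbers are assumptions for illustration. Check the docstring for the options your installed version supports.

```python
from statistics import median


def blend_forecasts(forecasts, method="mean", weights=None):
    """Combine per-model forecasts step by step.

    Hypothetical helper for illustration only; not part of PyCaret.
    """
    n_models = len(forecasts)
    horizon = len(forecasts[0])
    if any(len(f) != horizon for f in forecasts):
        raise ValueError("all forecasts must cover the same horizon")

    if method == "median":
        # weights are ignored for the median in this sketch
        return [median(f[t] for f in forecasts) for t in range(horizon)]

    if method == "mean":
        w = [1.0] * n_models if weights is None else list(weights)
        if len(w) != n_models:
            raise ValueError("need exactly one weight per model")
        total = sum(w)
        return [
            sum(wi * f[t] for wi, f in zip(w, forecasts)) / total
            for t in range(horizon)
        ]

    raise ValueError(f"unsupported method: {method!r}")


# made-up 3-step forecasts from three models
huber = [100.0, 110.0, 120.0]
lars = [104.0, 108.0, 126.0]
par = [90.0, 118.0, 114.0]

equal = blend_forecasts([huber, lars, par])                             # simple average
weighted = blend_forecasts([huber, lars, par], weights=[0.5, 0.3, 0.2])  # favour the first model
med = blend_forecasts([huber, lars, par], method="median")              # robust to one outlying model
```

Unequal weights pull the blend toward the models you trust more, while the median limits the influence of a single badly behaved forecaster.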

✅ Plot Model¶

This function analyzes the performance of a trained model on the hold-out set. It may require re-training the model in certain cases.

In [58]:

# plot forecast
plot_model(best, plot = 'forecast')

In [59]:

# plot acf
# for certain plots you don't need a trained model
plot_model(plot = 'acf')

In [60]:

# plot diagnostics
# for certain plots you don't need a trained model
plot_model(plot = 'diagnostics')
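The acf plot needs no trained model because it describes the series itself: it shows how strongly the series correlates with lagged copies of itself. As a rough illustration of the quantity on the vertical axis, the sketch below computes a sample autocorrelation in plain Python. The helper sample_acf and the toy series are hypothetical and unrelated to PyCaret's internals.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (illustrative helper)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# made-up series that repeats every 4 observations
toy = [10, 20, 30, 20] * 6
acf = sample_acf(toy, max_lag=8)

# lags that are multiples of the period stand out with large positive values
seasonal_lags = [k for k in range(1, 9) if acf[k] > 0.5]
```

On the toy series the large positive values fall at lags 4 and 8, which is the same kind of pattern that points to a seasonal period of 12 in monthly data.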

Some other parameters that you might find very useful in plot_model are:

fig_kwargs
data_kwargs
display_format
return_fig
return_data
save

You can check the docstring of the function for more info.

In [61]:

# help(plot_model)

✅ Finalize Model¶

This function trains a given model on the entire dataset including the hold-out set.

In [62]:

final_best = finalize_model(best)

In [63]:

final_best

Out[63]:

ForecastingPipeline(steps=[('forecaster',
                            TransformedTargetForecaster(steps=[('transformer_target',
                                                                TransformerPipeline(steps=[('numerical_imputer',
                                                                                            Imputer(random_state=123))])),
                                                               ('model',
                                                                AutoETS(seasonal='mul',
                                                                        sp=12,
                                                                        trend='add'))]))])
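The effect of refitting on the full dataset can be seen with a toy seasonal-naive forecaster. This is a conceptual sketch only: fit_seasonal_naive and the data are hypothetical, and PyCaret refits the actual pipeline rather than anything like this helper.

```python
def fit_seasonal_naive(history, sp):
    """Return a predictor that repeats the last `sp` observed values.

    Hypothetical stand-in for a real forecaster.
    """
    if len(history) < sp:
        raise ValueError("need at least one full seasonal cycle")
    last_cycle = list(history[-sp:])

    def predict(fh):
        # forecast fh steps beyond the end of `history`
        return [last_cycle[h % sp] for h in range(fh)]

    return predict


data = list(range(1, 13))          # 12 made-up observations
holdout = 3
train, test = data[:-holdout], data[-holdout:]

# during the experiment: fit on the train split, score against the hold-out
model = fit_seasonal_naive(train, sp=3)
holdout_forecast = model(len(test))

# finalizing: refit on train + hold-out, so forecasts start after the last observation
final_model = fit_seasonal_naive(data, sp=3)
future_forecast = final_model(3)
```

The finalized model has seen the most recent observations, so its forecasts describe the period after the end of the data rather than the hold-out window.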

✅ Deploy Model¶

This function deploys the entire ML pipeline on the cloud.

AWS: When deploying a model to AWS S3, AWS credentials must be configured using the command-line interface. To configure them, type aws configure in a terminal. The following information is required and can be generated in the Identity and Access Management (IAM) portal of your Amazon console account:

AWS Access Key ID
AWS Secret Access Key
Default Region Name (can be seen under Global settings on your AWS console)
Default output format (must be left blank)

GCP: To deploy a model on Google Cloud Platform ('gcp'), the project must be created using the command-line or GCP console. Once the project is created, you must create a service account and download the service account key as a JSON file to set environment variables in your local environment. Learn more about it: https://cloud.google.com/docs/authentication/production

Azure: To deploy a model on Microsoft Azure ('azure'), the connection string must be set as an environment variable in your local environment. Go to the settings of your storage account on the Azure portal to get the connection string, and store it in AZURE_STORAGE_CONNECTION_STRING (required as an environment variable). Learn more about it: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?toc=%2Fpython%2Fazure%2FTOC.json
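Before calling deploy_model it can help to confirm that the expected environment variable is actually set. The sketch below is a hypothetical pre-flight check, not part of PyCaret. AZURE_STORAGE_CONNECTION_STRING is the name given above; GOOGLE_APPLICATION_CREDENTIALS is assumed here as the variable that points to the GCP service account key file. AWS is left out because aws configure stores credentials in its own configuration files.

```python
import os

# Environment variables this sketch expects per platform (assumptions, see lead-in).
REQUIRED_ENV = {
    "gcp": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "azure": ["AZURE_STORAGE_CONNECTION_STRING"],
}


def missing_env_vars(platform, environ=None):
    """Return the required variables that are unset or empty.

    Hypothetical helper; `environ` defaults to the real process environment.
    """
    environ = os.environ if environ is None else environ
    if platform not in REQUIRED_ENV:
        raise ValueError(f"no check defined for platform: {platform!r}")
    return [name for name in REQUIRED_ENV[platform] if not environ.get(name)]


# example with an explicit (fake) environment instead of os.environ
fake_env = {"GOOGLE_APPLICATION_CREDENTIALS": "/path/to/key.json"}
gcp_missing = missing_env_vars("gcp", fake_env)      # nothing missing
azure_missing = missing_env_vars("azure", fake_env)  # connection string not set
```

Calling missing_env_vars('azure') with no second argument checks the real environment, and an empty result means the variable is present.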

In [64]:

# deploy model on aws s3
# deploy_model(best, model_name = 'my_first_platform_on_aws',
#             platform = 'aws', authentication = {'bucket' : 'pycaret-test'})

In [65]:

# load model from aws s3
# loaded_from_aws = load_model(model_name = 'my_first_platform_on_aws', platform = 'aws',
#                              authentication = {'bucket' : 'pycaret-test'})

# loaded_from_aws

✅ Save / Load Model¶

This function saves the transformation pipeline and a trained model object into the current working directory as a pickle file for later use.

In [66]:

# save model
save_model(best, 'my_first_model')

Transformation Pipeline and Model Successfully Saved

Out[66]:

(AutoETS(seasonal='mul', sp=12, trend='add'), 'my_first_model.pkl')

In [67]:

# load model
loaded_from_disk = load_model('my_first_model')
loaded_from_disk

Transformation Pipeline and Model Successfully Loaded

Out[67]:

AutoETS(seasonal='mul', sp=12, trend='add')
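For readers unfamiliar with pickle files, the sketch below shows the general save-then-load round trip using only the standard library. It uses a plain dictionary as a stand-in for a trained model and a temporary directory instead of the working directory. It illustrates the idea only and makes no claim about how save_model and load_model serialize objects internally.

```python
import pickle
import tempfile
from pathlib import Path

# stand-in for a trained model object (not a real forecaster)
model = {"name": "AutoETS", "params": {"seasonal": "mul", "sp": 12, "trend": "add"}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_first_model.pkl"

    # save: serialize the object to a .pkl file
    with path.open("wb") as f:
        pickle.dump(model, f)

    # load: rebuild an equal but separate object from the file
    with path.open("rb") as f:
        restored = pickle.load(f)
```

Unpickling can execute arbitrary code, so only load pickle files that come from a source you trust.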

✅ Save / Load Experiment¶

This function saves all the experiment variables to disk, so you can resume later without rerunning the setup function.

In [68]:

# save experiment
save_experiment('my_experiment')

In [69]:

# load experiment from disk
exp_from_disk = load_experiment('my_experiment', data=data)

	Description	Value
0	session_id	123
1	Target	Number of airline passengers
2	Approach	Univariate
3	Exogenous Variables	Not Present
4	Original data shape	(144, 1)
5	Transformed data shape	(144, 1)
6	Transformed train set shape	(141, 1)
7	Transformed test set shape	(3, 1)
8	Rows with missing values	0.0%
9	Fold Generator	ExpandingWindowSplitter
10	Fold Number	3
11	Enforce Prediction Interval	False
12	Splits used for hyperparameters	all
13	Seasonality Detection Algo	auto
14	Max Period to Consider	60
15	Seasonal Period(s) Tested	[12, 24, 36, 11, 48]
16	Significant Seasonal Period(s)	[12, 24, 36, 11, 48]
17	Significant Seasonal Period(s) without Harmonics	[48, 36, 11]
18	Remove Harmonics	False
19	Harmonics Order Method	harmonic_max
20	Num Seasonalities to Use	1
21	All Seasonalities to Use	[12]
22	Primary Seasonality	12
23	Seasonality Present	True
24	Target Strictly Positive	True
25	Target White Noise	No
26	Recommended d	1
27	Recommended Seasonal D	1
28	Preprocess	True
29	Numerical Imputation (Target)	drift
30	Transformation (Target)	None
31	Scaling (Target)	None
32	Feature Engineering (Target) - Reduced Regression	False
33	CPU Jobs	-1
34	Use GPU	False
35	Log Experiment	False
36	Experiment Name	ts-default-name
37	USI	46d6
