Last updated: 16 Feb 2023
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that dramatically speeds up the experiment cycle and makes you more productive.
Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can replace hundreds of lines of code with only a few, making experiments faster and more efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks, such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and a few more.
The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen Data Scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more technical expertise.
PyCaret is tested and supported on the following 64-bit systems:
You can install PyCaret with Python's pip package manager:
pip install pycaret
PyCaret's default installation does not install all the optional dependencies automatically. To get them, install the full version:
pip install pycaret[full]
or, depending on your use case, install one of the following variants:
pip install pycaret[analysis]
pip install pycaret[models]
pip install pycaret[tuner]
pip install pycaret[mlops]
pip install pycaret[parallel]
pip install pycaret[test]
# check installed version
import pycaret
pycaret.__version__
'3.0.0'
PyCaret's time series forecasting module is now available. The module supports univariate and multivariate time series forecasting tasks, and its API is consistent with the other modules of PyCaret.
It comes with built-in preprocessing capabilities and over 30 algorithms, comprising statistical / time-series methods as well as machine learning based models. In addition to model training, this module offers many other capabilities such as automated hyperparameter tuning, ensembling, model analysis, and model packaging and deployment.
A typical workflow in PyCaret consists of the following five steps, in this order:
# load the sample dataset from the pycaret datasets module
from pycaret.datasets import get_data
data = get_data('airline')
Period
1949-01    112.0
1949-02    118.0
1949-03    132.0
1949-04    129.0
1949-05    121.0
Freq: M, Name: Number of airline passengers, dtype: float64
# plot the dataset
data.plot()
<AxesSubplot:xlabel='Period'>
This function initializes the training environment and creates the transformation pipeline. The setup function must be called before executing any other function in PyCaret. It has only one required parameter, data; all other parameters are optional.
# import pycaret time series and init setup
from pycaret.time_series import *
s = setup(data, fh = 3, session_id = 123)
 | Description | Value
---|---|---
0 | session_id | 123 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (141, 1) |
7 | Transformed test set shape | (3, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | Seasonality Detection Algo | auto |
14 | Max Period to Consider | 60 |
15 | Seasonal Period(s) Tested | [12, 24, 36, 11, 48] |
16 | Significant Seasonal Period(s) | [12, 24, 36, 11, 48] |
17 | Significant Seasonal Period(s) without Harmonics | [48, 36, 11] |
18 | Remove Harmonics | False |
19 | Harmonics Order Method | harmonic_max |
20 | Num Seasonalities to Use | 1 |
21 | All Seasonalities to Use | [12] |
22 | Primary Seasonality | 12 |
23 | Seasonality Present | True |
24 | Target Strictly Positive | True |
25 | Target White Noise | No |
26 | Recommended d | 1 |
27 | Recommended Seasonal D | 1 |
28 | Preprocess | False |
29 | CPU Jobs | -1 |
30 | Use GPU | False |
31 | Log Experiment | False |
32 | Experiment Name | ts-default-name |
33 | USI | 4a01 |
Once the setup has been successfully executed, it prints the information grid containing experiment-level information. If no session_id is passed, a random number is automatically generated and distributed to all functions. PyCaret has two sets of APIs that you can work with: (1) the Functional API (as seen above) and (2) the Object Oriented API.
With the Object Oriented API, instead of executing functions directly, you import a class and execute its methods.
# import TSForecastingExperiment and init the class
from pycaret.time_series import TSForecastingExperiment
exp = TSForecastingExperiment()
# check the type of exp
type(exp)
pycaret.time_series.forecasting.oop.TSForecastingExperiment
# init setup on exp
exp.setup(data, fh = 3, session_id = 123)
 | Description | Value
---|---|---
0 | session_id | 123 |
1 | Target | Number of airline passengers |
2 | Approach | Univariate |
3 | Exogenous Variables | Not Present |
4 | Original data shape | (144, 1) |
5 | Transformed data shape | (144, 1) |
6 | Transformed train set shape | (141, 1) |
7 | Transformed test set shape | (3, 1) |
8 | Rows with missing values | 0.0% |
9 | Fold Generator | ExpandingWindowSplitter |
10 | Fold Number | 3 |
11 | Enforce Prediction Interval | False |
12 | Splits used for hyperparameters | all |
13 | Seasonality Detection Algo | auto |
14 | Max Period to Consider | 60 |
15 | Seasonal Period(s) Tested | [12, 24, 36, 11, 48] |
16 | Significant Seasonal Period(s) | [12, 24, 36, 11, 48] |
17 | Significant Seasonal Period(s) without Harmonics | [48, 36, 11] |
18 | Remove Harmonics | False |
19 | Harmonics Order Method | harmonic_max |
20 | Num Seasonalities to Use | 1 |
21 | All Seasonalities to Use | [12] |
22 | Primary Seasonality | 12 |
23 | Seasonality Present | True |
24 | Target Strictly Positive | True |
25 | Target White Noise | No |
26 | Recommended d | 1 |
27 | Recommended Seasonal D | 1 |
28 | Preprocess | False |
29 | CPU Jobs | -1 |
30 | Use GPU | False |
31 | Log Experiment | False |
32 | Experiment Name | ts-default-name |
33 | USI | cf71 |
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1d36ad79a90>
You can use either method, i.e. Functional or OOP, and even switch back and forth between the two sets of APIs. The choice of method does not affect the results and has been tested for consistency.
The check_stats function is used to get summary statistics and run statistical tests on the original data or model residuals.
# check statistical tests on original data
check_stats()
 | Test | Test Name | Data | Property | Setting | Value
---|---|---|---|---|---|---
0 | Summary | Statistics | Transformed | Length | 144.0 | |
1 | Summary | Statistics | Transformed | # Missing Values | 0.0 | |
2 | Summary | Statistics | Transformed | Mean | 280.298611 | |
3 | Summary | Statistics | Transformed | Median | 265.5 | |
4 | Summary | Statistics | Transformed | Standard Deviation | 119.966317 | |
5 | Summary | Statistics | Transformed | Variance | 14391.917201 | |
6 | Summary | Statistics | Transformed | Kurtosis | -0.364942 | |
7 | Summary | Statistics | Transformed | Skewness | 0.58316 | |
8 | Summary | Statistics | Transformed | # Distinct Values | 118.0 | |
9 | White Noise | Ljung-Box | Transformed | Test Statistic | {'alpha': 0.05, 'K': 24} | 1606.083817 |
10 | White Noise | Ljung-Box | Transformed | Test Statistic | {'alpha': 0.05, 'K': 48} | 1933.155822 |
11 | White Noise | Ljung-Box | Transformed | p-value | {'alpha': 0.05, 'K': 24} | 0.0 |
12 | White Noise | Ljung-Box | Transformed | p-value | {'alpha': 0.05, 'K': 48} | 0.0 |
13 | White Noise | Ljung-Box | Transformed | White Noise | {'alpha': 0.05, 'K': 24} | False |
14 | White Noise | Ljung-Box | Transformed | White Noise | {'alpha': 0.05, 'K': 48} | False |
15 | Stationarity | ADF | Transformed | Stationarity | {'alpha': 0.05} | False |
16 | Stationarity | ADF | Transformed | p-value | {'alpha': 0.05} | 0.99188 |
17 | Stationarity | ADF | Transformed | Test Statistic | {'alpha': 0.05} | 0.815369 |
18 | Stationarity | ADF | Transformed | Critical Value 1% | {'alpha': 0.05} | -3.481682 |
19 | Stationarity | ADF | Transformed | Critical Value 5% | {'alpha': 0.05} | -2.884042 |
20 | Stationarity | ADF | Transformed | Critical Value 10% | {'alpha': 0.05} | -2.57877 |
21 | Stationarity | KPSS | Transformed | Trend Stationarity | {'alpha': 0.05} | True |
22 | Stationarity | KPSS | Transformed | p-value | {'alpha': 0.05} | 0.1 |
23 | Stationarity | KPSS | Transformed | Test Statistic | {'alpha': 0.05} | 0.09615 |
24 | Stationarity | KPSS | Transformed | Critical Value 10% | {'alpha': 0.05} | 0.119 |
25 | Stationarity | KPSS | Transformed | Critical Value 5% | {'alpha': 0.05} | 0.146 |
26 | Stationarity | KPSS | Transformed | Critical Value 2.5% | {'alpha': 0.05} | 0.176 |
27 | Stationarity | KPSS | Transformed | Critical Value 1% | {'alpha': 0.05} | 0.216 |
28 | Normality | Shapiro | Transformed | Normality | {'alpha': 0.05} | False |
29 | Normality | Shapiro | Transformed | p-value | {'alpha': 0.05} | 0.000068 |
This function trains and evaluates the performance of all the estimators available in the model library using cross-validation. The output of this function is a scoring grid with average cross-validated scores. Metrics evaluated during CV can be accessed using the get_metrics function; custom metrics can be added or removed using the add_metric and remove_metric functions.
# compare baseline models
best = compare_models()
 | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec)
---|---|---|---|---|---|---|---|---|---
ets | ETS | 0.4912 | 0.5541 | 15.0940 | 19.3099 | 0.0318 | 0.0316 | -0.4465 | 0.0967 |
exp_smooth | Exponential Smoothing | 0.4929 | 0.5560 | 15.1460 | 19.3779 | 0.0320 | 0.0317 | -0.4600 | 0.1033 |
arima | ARIMA | 0.6964 | 0.7110 | 21.3757 | 24.7774 | 0.0447 | 0.0456 | -0.5495 | 0.0667 |
auto_arima | Auto ARIMA | 0.7136 | 0.6945 | 21.9389 | 24.2138 | 0.0459 | 0.0464 | -0.5454 | 9.6867 |
par_cds_dt | Passive Aggressive w/ Cond. Deseasonalize & Detrending | 0.7212 | 0.6696 | 22.1794 | 23.3673 | 0.0453 | 0.0468 | 0.0261 | 0.1200 |
lar_cds_dt | Least Angular Regressor w/ Cond. Deseasonalize & Detrending | 0.8503 | 0.8261 | 26.2655 | 28.9830 | 0.0513 | 0.0534 | 0.0367 | 0.0967 |
huber_cds_dt | Huber w/ Cond. Deseasonalize & Detrending | 0.8658 | 0.8362 | 26.7826 | 29.3947 | 0.0516 | 0.0536 | 0.1501 | 0.1333 |
lr_cds_dt | Linear w/ Cond. Deseasonalize & Detrending | 0.8904 | 0.8722 | 27.5266 | 30.6243 | 0.0534 | 0.0555 | -0.0092 | 0.4067 |
ridge_cds_dt | Ridge w/ Cond. Deseasonalize & Detrending | 0.8905 | 0.8722 | 27.5270 | 30.6246 | 0.0534 | 0.0555 | -0.0092 | 0.2933 |
en_cds_dt | Elastic Net w/ Cond. Deseasonalize & Detrending | 0.8944 | 0.8746 | 27.6535 | 30.7127 | 0.0535 | 0.0557 | -0.0063 | 0.3833 |
lasso_cds_dt | Lasso w/ Cond. Deseasonalize & Detrending | 0.8966 | 0.8759 | 27.7231 | 30.7594 | 0.0536 | 0.0558 | -0.0040 | 0.1033 |
br_cds_dt | Bayesian Ridge w/ Cond. Deseasonalize & Detrending | 0.9156 | 0.8878 | 28.3188 | 31.1821 | 0.0547 | 0.0569 | -0.0209 | 0.1067 |
knn_cds_dt | K Neighbors w/ Cond. Deseasonalize & Detrending | 1.0695 | 0.9924 | 33.1500 | 34.9277 | 0.0631 | 0.0656 | -0.1682 | 0.1233 |
theta | Theta Forecaster | 1.0839 | 1.0393 | 33.3223 | 36.2555 | 0.0686 | 0.0710 | -1.7926 | 0.0333 |
et_cds_dt | Extra Trees w/ Cond. Deseasonalize & Detrending | 1.1678 | 1.0866 | 36.1678 | 38.2100 | 0.0694 | 0.0726 | -0.4302 | 0.1900 |
dt_cds_dt | Decision Tree w/ Cond. Deseasonalize & Detrending | 1.1930 | 1.1346 | 36.9106 | 39.8518 | 0.0733 | 0.0769 | -0.8135 | 0.1300 |
lightgbm_cds_dt | Light Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.2019 | 1.1362 | 37.2359 | 39.9827 | 0.0713 | 0.0746 | -0.6051 | 0.6633 |
omp_cds_dt | Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending | 1.2171 | 1.1475 | 37.6457 | 40.3070 | 0.0724 | 0.0757 | -0.7057 | 0.1067 |
gbr_cds_dt | Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.2274 | 1.1449 | 37.9963 | 40.2550 | 0.0735 | 0.0769 | -0.7190 | 0.1467 |
rf_cds_dt | Random Forest w/ Cond. Deseasonalize & Detrending | 1.2500 | 1.1782 | 38.6418 | 41.3528 | 0.0749 | 0.0784 | -0.9426 | 0.2133 |
catboost_cds_dt | CatBoost Regressor w/ Cond. Deseasonalize & Detrending | 1.2523 | 1.1604 | 38.8002 | 40.8201 | 0.0745 | 0.0780 | -0.6842 | 1.5933 |
ada_cds_dt | AdaBoost w/ Cond. Deseasonalize & Detrending | 1.2786 | 1.1951 | 39.6382 | 42.0658 | 0.0750 | 0.0788 | -0.6308 | 0.1367 |
xgboost_cds_dt | Extreme Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.3198 | 1.2045 | 40.8342 | 42.3045 | 0.0792 | 0.0831 | -0.9192 | 0.1800 |
llar_cds_dt | Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending | 1.3659 | 1.2672 | 42.3974 | 44.6597 | 0.0793 | 0.0834 | -0.7393 | 0.0967 |
naive | Naive Forecaster | 1.5654 | 1.4951 | 48.4444 | 52.5232 | 0.0920 | 0.0981 | -1.8344 | 2.5533 |
snaive | Seasonal Naive Forecaster | 1.6741 | 1.5343 | 51.6667 | 53.7350 | 0.1052 | 0.1117 | -4.5388 | 1.2567 |
polytrend | Polynomial Trend Forecaster | 2.1553 | 2.1096 | 66.9817 | 74.4048 | 0.1241 | 0.1350 | -4.2525 | 0.0167 |
croston | Croston | 2.4565 | 2.3513 | 76.3953 | 82.9794 | 0.1394 | 0.1562 | -4.5895 | 0.0167 |
grand_means | Grand Means Forecaster | 7.3065 | 6.5029 | 226.0502 | 228.3880 | 0.4469 | 0.5821 | -72.1183 | 1.4433 |
# compare models using OOP
exp.compare_models()
 | Model | MASE | RMSSE | MAE | RMSE | MAPE | SMAPE | R2 | TT (Sec)
---|---|---|---|---|---|---|---|---|---
ets | ETS | 0.4912 | 0.5541 | 15.0940 | 19.3099 | 0.0318 | 0.0316 | -0.4465 | 0.0967 |
exp_smooth | Exponential Smoothing | 0.4929 | 0.5560 | 15.1460 | 19.3779 | 0.0320 | 0.0317 | -0.4600 | 0.0867 |
arima | ARIMA | 0.6964 | 0.7110 | 21.3757 | 24.7774 | 0.0447 | 0.0456 | -0.5495 | 0.1300 |
auto_arima | Auto ARIMA | 0.7136 | 0.6945 | 21.9389 | 24.2138 | 0.0459 | 0.0464 | -0.5454 | 13.9433 |
par_cds_dt | Passive Aggressive w/ Cond. Deseasonalize & Detrending | 0.7212 | 0.6696 | 22.1794 | 23.3673 | 0.0453 | 0.0468 | 0.0261 | 0.1100 |
lar_cds_dt | Least Angular Regressor w/ Cond. Deseasonalize & Detrending | 0.8503 | 0.8261 | 26.2655 | 28.9830 | 0.0513 | 0.0534 | 0.0367 | 0.1200 |
huber_cds_dt | Huber w/ Cond. Deseasonalize & Detrending | 0.8658 | 0.8362 | 26.7826 | 29.3947 | 0.0516 | 0.0536 | 0.1501 | 0.0967 |
lr_cds_dt | Linear w/ Cond. Deseasonalize & Detrending | 0.8904 | 0.8722 | 27.5266 | 30.6243 | 0.0534 | 0.0555 | -0.0092 | 0.0967 |
ridge_cds_dt | Ridge w/ Cond. Deseasonalize & Detrending | 0.8905 | 0.8722 | 27.5270 | 30.6246 | 0.0534 | 0.0555 | -0.0092 | 0.0967 |
en_cds_dt | Elastic Net w/ Cond. Deseasonalize & Detrending | 0.8944 | 0.8746 | 27.6535 | 30.7127 | 0.0535 | 0.0557 | -0.0063 | 0.1133 |
lasso_cds_dt | Lasso w/ Cond. Deseasonalize & Detrending | 0.8966 | 0.8759 | 27.7231 | 30.7594 | 0.0536 | 0.0558 | -0.0040 | 0.0933 |
br_cds_dt | Bayesian Ridge w/ Cond. Deseasonalize & Detrending | 0.9156 | 0.8878 | 28.3188 | 31.1821 | 0.0547 | 0.0569 | -0.0209 | 0.0900 |
knn_cds_dt | K Neighbors w/ Cond. Deseasonalize & Detrending | 1.0695 | 0.9924 | 33.1500 | 34.9277 | 0.0631 | 0.0656 | -0.1682 | 0.1300 |
theta | Theta Forecaster | 1.0839 | 1.0393 | 33.3223 | 36.2555 | 0.0686 | 0.0710 | -1.7926 | 0.0300 |
et_cds_dt | Extra Trees w/ Cond. Deseasonalize & Detrending | 1.1678 | 1.0866 | 36.1678 | 38.2100 | 0.0694 | 0.0726 | -0.4302 | 0.2600 |
dt_cds_dt | Decision Tree w/ Cond. Deseasonalize & Detrending | 1.1930 | 1.1346 | 36.9106 | 39.8518 | 0.0733 | 0.0769 | -0.8135 | 0.1767 |
lightgbm_cds_dt | Light Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.2019 | 1.1362 | 37.2359 | 39.9827 | 0.0713 | 0.0746 | -0.6051 | 0.5800 |
omp_cds_dt | Orthogonal Matching Pursuit w/ Cond. Deseasonalize & Detrending | 1.2171 | 1.1475 | 37.6457 | 40.3070 | 0.0724 | 0.0757 | -0.7057 | 0.1167 |
gbr_cds_dt | Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.2274 | 1.1449 | 37.9963 | 40.2550 | 0.0735 | 0.0769 | -0.7190 | 0.1633 |
rf_cds_dt | Random Forest w/ Cond. Deseasonalize & Detrending | 1.2500 | 1.1782 | 38.6418 | 41.3528 | 0.0749 | 0.0784 | -0.9426 | 0.2433 |
catboost_cds_dt | CatBoost Regressor w/ Cond. Deseasonalize & Detrending | 1.2523 | 1.1604 | 38.8002 | 40.8201 | 0.0745 | 0.0780 | -0.6842 | 1.6900 |
ada_cds_dt | AdaBoost w/ Cond. Deseasonalize & Detrending | 1.2786 | 1.1951 | 39.6382 | 42.0658 | 0.0750 | 0.0788 | -0.6308 | 0.1733 |
xgboost_cds_dt | Extreme Gradient Boosting w/ Cond. Deseasonalize & Detrending | 1.3198 | 1.2045 | 40.8342 | 42.3045 | 0.0792 | 0.0831 | -0.9192 | 0.2167 |
llar_cds_dt | Lasso Least Angular Regressor w/ Cond. Deseasonalize & Detrending | 1.3659 | 1.2672 | 42.3974 | 44.6597 | 0.0793 | 0.0834 | -0.7393 | 0.0967 |
naive | Naive Forecaster | 1.5654 | 1.4951 | 48.4444 | 52.5232 | 0.0920 | 0.0981 | -1.8344 | 0.0467 |
snaive | Seasonal Naive Forecaster | 1.6741 | 1.5343 | 51.6667 | 53.7350 | 0.1052 | 0.1117 | -4.5388 | 0.0367 |
polytrend | Polynomial Trend Forecaster | 2.1553 | 2.1096 | 66.9817 | 74.4048 | 0.1241 | 0.1350 | -4.2525 | 0.0433 |
croston | Croston | 2.4565 | 2.3513 | 76.3953 | 82.9794 | 0.1394 | 0.1562 | -4.5895 | 0.0267 |
grand_means | Grand Means Forecaster | 7.3065 | 6.5029 | 226.0502 | 228.3880 | 0.4469 | 0.5821 | -72.1183 | 0.0400 |
AutoETS(seasonal='mul', sp=12, trend='add')
Notice that the output of the Functional and OOP APIs is consistent. The rest of the functions in this notebook will be shown using the Functional API only.
You can use the plot_model function to analyze the performance of a trained model on the test set. It may require re-training the model in certain cases.
# plot forecast
plot_model(best, plot = 'forecast')