Overview

In the 10x series of notebooks, we look at time series modeling in pycaret using univariate data with no exogenous variables. We use the well-known airline passengers dataset for illustration. Our plan of action is as follows:

  1. Perform EDA on the dataset to extract valuable insight about the process generating the time series. (Covered in this notebook)
  2. Model the dataset based on the exploratory analysis (univariate model without exogenous variables).
  3. Use an automated approach (AutoML) to improve the performance.
  4. Apply user customizations, and cover potential pitfalls and how to overcome them.
In [21]:
def what_is_installed():
    from pycaret import show_versions
    show_versions()

try:
    what_is_installed()
except ModuleNotFoundError:
    !pip install pycaret-ts-alpha
    what_is_installed()
System:
    python: 3.8.12 (default, Oct 12 2021, 03:01:40) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\Nikhil\.conda\envs\pycaret_dev_sktime_0p10p0\python.exe
   machine: Windows-10-10.0.19044-SP0

Python dependencies:
          pip: 21.2.2
   setuptools: 60.8.2
      pycaret: 3.0.0
      sklearn: 1.0.2
       sktime: 0.10.1
  statsmodels: 0.12.2
        numpy: 1.21.5
        scipy: 1.7.3
       pandas: 1.4.2
   matplotlib: 3.5.1
       plotly: 5.5.0
       joblib: 1.0.1
        numba: 0.55.1
       mlflow: 1.23.1
     lightgbm: 3.3.2
      xgboost: Not installed
     pmdarima: 1.8.4
        tbats: Installed but version unavailable
      prophet: 1.0
      tsfresh: Not installed
In [20]:
import time
import numpy as np
import pandas as pd

from pycaret.datasets import get_data
from pycaret.time_series import TSForecastingExperiment
In [3]:
y = get_data('airline', verbose=False)
In [4]:
# We want to forecast the next 12 months of data and we will use 3-fold cross-validation to test the models.
fh = 12 # or alternatively fh = np.arange(1,13)
fold = 3
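
To make the roles of `fh` and `fold` concrete, here is a minimal sketch of how an expanding-window scheme could carve up the 132 training points that remain once the last 12 of the 144 observations are held out for testing. The helper `expanding_window_folds` is hypothetical and written only for illustration. It assumes each validation window is `fh` points long and that the windows sit back to back at the end of the training data. It is not pycaret's internal implementation.

```python
def expanding_window_folds(n_train, fh, n_folds):
    """Return a (train_range, validation_range) pair for each fold.

    Hypothetical helper: assumes back-to-back validation windows of
    length `fh` placed at the end of the training data.
    """
    folds = []
    for k in range(n_folds):
        # Fold k validates on the window that ends (n_folds - 1 - k) horizons
        # before the end of the training data.
        val_end = n_train - fh * (n_folds - 1 - k)
        val_start = val_end - fh
        # The training window always starts at 0 and grows with each fold.
        folds.append((range(0, val_start), range(val_start, val_end)))
    return folds


fh = 12
fold = 3

# An integer horizon of 12 stands for the steps 1, 2, ..., 12 ahead.
horizon_steps = list(range(1, fh + 1))

folds = expanding_window_folds(n_train=132, fh=fh, n_folds=fold)
for train_idx, val_idx in folds:
    print(f"train: 0..{train_idx[-1]}  validate: {val_idx[0]}..{val_idx[-1]}")
```

Under these assumptions the training window grows by one horizon per fold while every validation window keeps the same length as the forecast horizon.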
In [5]:
# Global Plot Settings
fig_kwargs = {'renderer': 'notebook'}

Exploratory Analysis

The pycaret Time Series Forecasting module provides a convenient interface for performing exploratory analysis using plot_model.

NOTE:

  • Without an estimator argument, plot_model will plot using the original dataset. We will cover this in the current notebook.
  • If an estimator (model) is passed to plot_model, then the plots are made using the model data (e.g. future forecasts, or analysis of in-sample residuals). We will cover this in a subsequent notebook.
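
The two modes described in the note can be sketched side by side. This is only an illustrative outline: the function name `compare_plot_model_inputs` is hypothetical, and the choice of the `"naive"` model and the `"forecast"` plot is an assumption made for the example rather than something this notebook prescribes.

```python
def compare_plot_model_inputs(y, fh=12):
    """Call plot_model without an estimator, then with one (illustrative)."""
    # Imported inside the function so that defining it does not require pycaret.
    from pycaret.time_series import TSForecastingExperiment

    exp = TSForecastingExperiment()
    exp.setup(data=y, fh=fh, verbose=False)

    # No estimator: the plot is drawn from the original dataset.
    exp.plot_model(plot="ts")

    # Estimator passed: the plot is drawn from the model's output,
    # here the forecast of a simple baseline chosen for illustration.
    model = exp.create_model("naive", verbose=False)
    exp.plot_model(model, plot="forecast")
    return model
```

Calling `compare_plot_model_inputs(y)` with the airline series loaded earlier should display one plot of the raw data followed by one plot of the model's forecast.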

Let's see how this works next.

First, we will plot the original dataset.

In [6]:
eda = TSForecastingExperiment()
eda.setup(data=y, fh=fh, fig_kwargs=fig_kwargs)
    Description                   Value
0   session_id                    5291
1   Target                        Number of airline passengers
2   Approach                      Univariate
3   Exogenous Variables           Not Present
4   Data shape                    (144, 1)
5   Train data shape              (132, 1)
6   Test data shape               (12, 1)
7   Fold Generator                ExpandingWindowSplitter
8   Fold Number                   3
9   Enforce Prediction Interval   False
10  Seasonal Period(s) Tested     12
11  Seasonality Present           True
12  Seasonalities Detected        [12]
13  Primary Seasonality           12
14  Target Strictly Positive      True
15  Target White Noise            No
16  Recommended d                 1
17  Recommended Seasonal D        1
18  Missing Values                0
19  Preprocess                    True
20  CPU Jobs                      -1
21  Use GPU                       False
22  Log Experiment                False
23  Experiment Name               ts-default-name
24  USI                           3627
Out[6]:
<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x27999bbb910>
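
The `Recommended d` and `Recommended Seasonal D` rows in the setup summary refer to differencing orders: `d = 1` means one ordinary (lag-1) difference and `D = 1` means one seasonal difference at the primary seasonal period. The sketch below illustrates what those two operations do. It uses a small synthetic series with a period of 4 instead of the airline data, and the helper `difference` is a hypothetical name introduced only for this example.

```python
def difference(values, lag=1):
    """Return values[t] - values[t - lag] for every t >= lag."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# Synthetic data (not the airline series): a linear trend of slope 2
# plus a repeating seasonal pattern with period 4.
season = [0, 5, 2, -3]
series = [2 * t + season[t % 4] for t in range(16)]

# Seasonal differencing (D = 1) at lag 4 cancels the seasonal pattern
# and leaves only the trend's contribution over one period.
seasonal_diff = difference(series, lag=4)

# Ordinary differencing (d = 1) then removes that remaining constant level.
fully_differenced = difference(seasonal_diff, lag=1)

print(seasonal_diff)
print(fully_differenced)
```

For the airline data the analogous seasonal difference would use a lag of 12, matching the primary seasonality reported by setup.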
In [7]:
# NOTE: This is the same as `eda.plot_model(plot="ts")`
eda.plot_model()