In the 10x series of notebooks, we will look at time series modeling in `pycaret` using univariate data without exogenous variables. We will use the famous airline dataset for illustration. Our plan of action is as follows:

- Perform EDA on the dataset to extract valuable insights about the process generating the time series. **(Covered in this notebook)**
- Model the dataset based on the exploratory analysis (univariate model without exogenous variables).
- Use an automated approach (AutoML) to improve the performance.

In [1]:

```
# Only enable critical logging (Optional)
import os
os.environ["PYCARET_CUSTOM_LOGGING_LEVEL"] = "CRITICAL"
```

In [2]:

```
def what_is_installed():
    from pycaret import show_versions
    show_versions()

try:
    what_is_installed()
except ModuleNotFoundError:
    !pip install pycaret
    what_is_installed()
```

In [3]:

```
import time
import numpy as np
import pandas as pd
from pycaret.datasets import get_data
from pycaret.time_series import TSForecastingExperiment
```

In [4]:

```
y = get_data('airline', verbose=False)
```

In [5]:

```
# We want to forecast the next 12 months of data, and we will use 3-fold cross-validation to test the models.
fh = 12  # or alternatively fh = np.arange(1, 13)
fold = 3
```
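To build intuition for what 3-fold cross-validation with `fh=12` means here, the fold boundaries can be sketched in plain Python. This is an illustrative sketch assuming folds step forward by `fh` (one common convention), not PyCaret's or sktime's internal code; the function name `expanding_window_folds` is made up for this example.

```python
# Sketch of expanding-window cross-validation with fh=12 and 3 folds.
# Illustrative only -- PyCaret/sktime handle this internally.
def expanding_window_folds(n_points, fh, n_folds):
    """Return (train_end, test_indices) pairs for each fold.

    Assumes each fold's test window is `fh` points long and the last
    fold's test window ends at the final observation.
    """
    folds = []
    for k in range(n_folds):
        test_end = n_points - (n_folds - 1 - k) * fh
        train_end = test_end - fh
        folds.append((train_end, list(range(train_end, test_end))))
    return folds

# 132 training points, as in the airline train split below
for train_end, test_idx in expanding_window_folds(132, 12, 3):
    print(f"train: 0..{train_end - 1}, test: {test_idx[0]}..{test_idx[-1]}")
```

Each successive fold trains on a longer (expanding) window, which is why PyCaret reports an `ExpandingWindowSplitter` as the fold generator in the setup summary.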

In [6]:

```
# Global Figure Settings for notebook ----
# Depending on whether you are using jupyter notebook, jupyter lab, Google Colab, you may have to set the renderer appropriately
# NOTE: Setting to a static renderer here so that the notebook saved size is reduced.
fig_kwargs = {
    # "renderer": "notebook",
    "renderer": "png",
    "width": 1000,
    "height": 600,
}
```

`pycaret`'s Time Series Forecasting module provides a convenient interface for performing exploratory analysis using `plot_model`.

**NOTE:**

- Without an estimator argument, `plot_model` will plot using the original dataset. We will cover this in the current notebook.
- If an estimator (model) is passed to `plot_model`, the plots are made using the model data (e.g. future forecasts, or analysis of in-sample residuals). We will cover this in a subsequent notebook.

Let's see how this works next.

**First, we will plot the original dataset.**

In [7]:

```
eda = TSForecastingExperiment()
eda.setup(data=y, fh=fh, fig_kwargs=fig_kwargs)
```

|    | Description | Value |
|---|---|---|
| 0 | session_id | 4531 |
| 1 | Target | Number of airline passengers |
| 2 | Approach | Univariate |
| 3 | Exogenous Variables | Not Present |
| 4 | Original data shape | (144, 1) |
| 5 | Transformed data shape | (144, 1) |
| 6 | Transformed train set shape | (132, 1) |
| 7 | Transformed test set shape | (12, 1) |
| 8 | Rows with missing values | 0.0% |
| 9 | Fold Generator | ExpandingWindowSplitter |
| 10 | Fold Number | 3 |
| 11 | Enforce Prediction Interval | False |
| 12 | Splits used for hyperparameters | all |
| 13 | User Defined Seasonal Period(s) | None |
| 14 | Ignore Seasonality Test | False |
| 15 | Seasonality Detection Algo | auto |
| 16 | Max Period to Consider | 60 |
| 17 | Seasonal Period(s) Tested | [12, 24, 36, 11, 48] |
| 18 | Significant Seasonal Period(s) | [12, 24, 36, 11, 48] |
| 19 | Significant Seasonal Period(s) without Harmonics | [48, 36, 11] |
| 20 | Remove Harmonics | False |
| 21 | Harmonics Order Method | harmonic_max |
| 22 | Num Seasonalities to Use | 1 |
| 23 | All Seasonalities to Use | [12] |
| 24 | Primary Seasonality | 12 |
| 25 | Seasonality Present | True |
| 26 | Target Strictly Positive | True |
| 27 | Target White Noise | No |
| 28 | Recommended d | 1 |
| 29 | Recommended Seasonal D | 1 |
| 30 | Preprocess | False |
| 31 | CPU Jobs | -1 |
| 32 | Use GPU | False |
| 33 | Log Experiment | False |
| 34 | Experiment Name | ts-default-name |
| 35 | USI | 964b |

Out[7]:

<pycaret.time_series.forecasting.oop.TSForecastingExperiment at 0x1a586ae8670>

In [8]:

```
# NOTE: This is the same as `eda.plot_model(plot="ts")`
eda.plot_model()
```

Before we explore the data further, there are a few minor things to know about how PyCaret prepares a **modeling pipeline** under the hood. The data being modeled is usually fed through an internal pipeline that has optional steps in the following order:

**Data Input (by user) >> Imputation >> Transformation & Scaling >> Model**

**Imputation**

- This step is optional if the data does not have missing values, and mandatory if it does. This is because many statistical tests and models can not work with missing data.
- Although some models like `Prophet` can work with missing data, the statistical tests used to extract useful information from the data for default model settings necessitate imputation when the data has missing values.

**Transformations and Scaling**

- This step is optional, and users should usually only enable it after evaluating the models (e.g. by performing residual analysis), or if they have specific requirements such as limiting the forecast to only positive values.
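The pipeline order above (Imputation, then Transformation & Scaling) can be sketched with pandas. This is a conceptual illustration only, not PyCaret's internal pipeline; the series values and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Conceptual sketch of the internal pipeline order:
# Data Input >> Imputation >> Transformation & Scaling >> Model
y_original = pd.Series([112.0, 118.0, np.nan, 129.0, 121.0], name="passengers")

# Step 1: imputation (mandatory when missing values are present,
# since many statistical tests and models can not handle NaNs)
y_imputed = y_original.interpolate()

# Step 2: optional transformation, e.g. a log transform, one way to
# keep back-transformed forecasts strictly positive
y_transformed = np.log(y_imputed)

# With no missing values and no transformation requested,
# original == imputed == transformed, which is the case for the airline data.
print(y_imputed.isna().sum())  # -> 0
```

Imputation must come first: the log transform in step 2 would simply propagate any NaN left in the series.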

We will discuss imputation and transformations in more detail in another notebook. For now, our data does not have any missing values and no transformations are applied. So **Data Input (by user), i.e. Original data = Imputed data = Transformed data = Data fed to models**. We can verify this by plotting the internal datasets, specifying `plot_data_type` in `data_kwargs`.

NOTE: If `plot_data_type` is not provided, each plot type has its own default data type that is determined internally, but users can always override it using `plot_data_type`.

In [9]:

```
eda.plot_model(data_kwargs={"plot_data_type": ["original", "imputed", "transformed"]})
```

**Let's explore the standard ACF and PACF plots for the dataset next**

In [10]:

```
# ACF and PACF for the original dataset
eda.plot_model(plot="acf")
```

In [11]:

```
# NOTE: you can customize the plots with kwargs - e.g. number of lags, figure size (width, height), etc
# data_kwargs such as `nlags` are passed to the underlying function that gets the ACF values
# figure kwargs such as `width`, `height` & `template` are passed to plotly and can have any value that plotly accepts
eda.plot_model(plot="pacf", data_kwargs={'nlags':36}, fig_kwargs={'height': 500, "width": 800})
```
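As a sanity check on what the ACF measures, the lag-k autocorrelation can be computed directly with NumPy. The sketch below uses synthetic data with a known 12-month cycle; the function `acf_at_lag` is hand-rolled for illustration and is not the estimator `plot_model` uses internally.

```python
import numpy as np

def acf_at_lag(x, lag):
    """Sample autocorrelation of x at a single lag (simple biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Synthetic monthly series with a 12-month seasonal cycle plus noise
rng = np.random.default_rng(42)
t = np.arange(120)
y_synth = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, size=t.size)

print(acf_at_lag(y_synth, 12))  # large positive: strong annual seasonality
print(acf_at_lag(y_synth, 6))   # large negative: half-period anti-correlation
```

This mirrors what the ACF plot shows for the airline data: pronounced positive spikes at lags 12, 24, 36, ... are the signature of annual seasonality.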

**Users may also wish to explore the periodogram or the FFT, which are very useful for studying the frequency components of the time series.**

For example:

- A peak at f ≈ 0 can indicate the wandering behavior characteristic of a random walk, which needs to be differenced. It could also be indicative of a stationary ARMA process with a high positive autoregressive coefficient (phi).
- A peak at a frequency and its multiples is indicative of seasonality. The lowest such frequency is called the fundamental frequency, and the inverse of this frequency is the seasonal period for the model.
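The second bullet can be checked numerically: for a monthly series with an annual cycle, the periodogram peaks at f = 1/12, and inverting that frequency recovers a seasonal period of 12. The sketch below uses NumPy's FFT on synthetic data and is an illustration of the idea, not PyCaret's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 144  # 12 years of monthly data, matching the airline series length
t = np.arange(n)
# Synthetic series: annual (period-12) cycle plus noise
y_synth = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, size=n)

# Periodogram via the FFT of the demeaned series
spectrum = np.abs(np.fft.rfft(y_synth - y_synth.mean())) ** 2
freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per observation (per month)

peak_freq = freqs[np.argmax(spectrum[1:]) + 1]  # skip the f=0 bin
seasonal_period = round(1 / peak_freq)
print(seasonal_period)  # -> 12
```

Skipping the f = 0 bin matters: per the first bullet, trend or random-walk behavior concentrates power near zero frequency and would otherwise mask the seasonal peak.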

In [12]:

```
eda.plot_model(plot="periodogram")
eda.plot_model(plot="fft")
```