Testing pmdarima for auto ARIMA

Playing around with the pmdarima library for basic ARIMA and ARIMAX models in Python: a nice alternative to working with statsmodels directly. Unfortunately, time series modeling in Python still feels much more scattered than it does in R.
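For reference, this is the ARIMAX model written the way statsmodels documents its `SARIMAX` class, which pmdarima builds on: a linear regression on exogenous variables $x_t$ whose errors $u_t$ follow an ARIMA(p, d, q) process. Seasonal terms and any intercept or trend are left out here to keep the sketch short. $B$ is the backshift operator ($B y_t = y_{t-1}$), and plain ARIMA is the special case with no $\beta^\top x_t$ term. For instance, $y_t$ could be quarterly wine sales with residents as $x_t$.

```latex
\begin{aligned}
y_t &= \beta^\top x_t + u_t, \\
\phi(B)\,(1 - B)^d\, u_t &= \theta(B)\,\varepsilon_t, \qquad \varepsilon_t \sim \mathrm{WN}(0, \sigma^2), \\
\phi(B) &= 1 - \phi_1 B - \cdots - \phi_p B^p, \qquad
\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q .
\end{aligned}
```

The "auto" part of `pm.auto_arima` picks $d$ with a unit-root test (KPSS by default). It then searches over $p$ and $q$, plus the seasonal orders when `seasonal=True` with a period `m` such as 4 for quarterly data, and keeps the fit with the lowest information criterion (AIC by default).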

In [1]:
import numpy as np
import pandas as pd
import pmdarima as pm
from pmdarima import model_selection

import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
In [2]:
# Local helper module (not shown) that loads, cleans, and combines the datasets
from create_dataset import load_combined_dataset

Using the sample datasets bundled with pmdarima, we can build a combined quarterly dataframe covering 1980 to 1994, with Australian beer production (in megaliters), residents (in thousands), and wine sales (in bottles).
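pmdarima's `load_wineind` series is monthly (January 1980 to August 1994), so the wine numbers have to be rolled up to quarters before they line up with the other two series. The helper's source isn't shown, so this is only a sketch of that step. It sums consecutive three-month blocks of the first twelve `wineind` values (January to December 1980), and those sums reproduce the first four `Wine` values in the `df.head()` output that follows. The function name `to_quarterly` is hypothetical.

```python
# Sketch: roll monthly wine sales up into quarterly totals.
# Assumption: the Wine column is the monthly wineind series summed by quarter.
# These are the first twelve monthly values (Jan-Dec 1980) of that series.
monthly_wine_1980 = [
    15136, 16733, 20016,  # Jan, Feb, Mar
    17708, 18019, 19227,  # Apr, May, Jun
    22893, 23739, 21133,  # Jul, Aug, Sep
    22591, 26786, 29740,  # Oct, Nov, Dec
]


def to_quarterly(monthly):
    """Sum consecutive groups of three months.

    Assumes the series starts in January; a trailing partial quarter is dropped.
    """
    n_full = len(monthly) // 3 * 3
    return [sum(monthly[i:i + 3]) for i in range(0, n_full, 3)]


quarterly_wine_1980 = to_quarterly(monthly_wine_1980)
print(quarterly_wine_1980)  # [51885, 54954, 67765, 79117]
```

Dropping the incomplete final quarter is a choice made for this sketch. Because `wineind` stops in August 1994, any real pipeline has to decide what to do with 1994 Q3 one way or another.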

In [3]:
df = load_combined_dataset()
df.head()
Out[3]:
   timeperiod     Wine  Residents   Beer
0       19801  51885.0    14515.7  513.0
1       19802  54954.0    14554.9  427.0
2       19803  67765.0    14602.5  473.0
3       19804  79117.0    14646.4  526.0
4       19811  53013.0    14695.4  548.0
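Judging from the output, `timeperiod` packs the year and quarter into a single integer (`19804` is 1980 Q4 and `19811` is 1981 Q1). This is inferred from the printed rows, not from the helper's code. The sketch below decodes those codes with plain integer arithmetic. It also counts quarters elapsed since 1980 Q1, which matches the row index and explains the x-axis label used in the next plot. The function names are hypothetical.

```python
# Sketch: decode the YYYYQ codes in the timeperiod column.
# Assumption: the last digit is the quarter (1-4) and the rest is the year.


def decode_timeperiod(code):
    """Split a YYYYQ integer into (year, quarter)."""
    year, quarter = divmod(code, 10)
    if not 1 <= quarter <= 4:
        raise ValueError(f"quarter digit must be 1-4, got {quarter}")
    return year, quarter


def quarters_since(code, base_year=1980):
    """Number of quarters elapsed since Q1 of base_year."""
    year, quarter = decode_timeperiod(code)
    return (year - base_year) * 4 + (quarter - 1)


for code in [19801, 19802, 19803, 19804, 19811]:
    year, quarter = decode_timeperiod(code)
    print(f"{code} -> {year}Q{quarter}, quarters since 1980Q1: {quarters_since(code)}")
```

To get proper pandas time periods, a label like `"1980Q1"` can be passed to `pd.Period(label, freq="Q")`.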
In [4]:
fig = go.Figure()

# df has one row per quarter, so its integer index counts quarters since Q1 1980
fig.add_trace(go.Scatter(x=df.index, y=df.Wine, name="Wine Sales (bottles)"))
fig.add_trace(go.Scatter(x=df.index, y=df.Residents, name="Residents (thousands)"))
fig.update_layout(title="Wine Sales and Residents Over Time",
                  xaxis_title="Quarters since Q1 1980", yaxis_title="Quarterly Value")
fig.show()
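Wine sales (tens of thousands of bottles per quarter) and residents (about 14,500, in thousands) share a single y-axis, so the residents line looks almost flat next to wine. One optional tweak, which is not part of the original analysis, is to rebase each series to 100 at its first observation so growth rates can be compared directly. This stdlib sketch applies that to the first four rows from the table above. The helper name `rebase` is hypothetical.

```python
def rebase(values, base=100.0):
    """Scale a series so its first value equals `base` (an index number)."""
    first = values[0]
    if first == 0:
        raise ValueError("cannot rebase a series whose first value is 0")
    return [base * v / first for v in values]


# First four quarters (1980 Q1-Q4) copied from the df.head() output
wine = [51885.0, 54954.0, 67765.0, 79117.0]
residents = [14515.7, 14554.9, 14602.5, 14646.4]

for name, series in [("Wine", rebase(wine)), ("Residents", rebase(residents))]:
    print(name, [round(v, 1) for v in series])
```

In the notebook itself, the pandas equivalent is `df[["Wine", "Residents"]] / df[["Wine", "Residents"]].iloc[0] * 100`. Another option is to keep the raw units and put residents on a secondary y-axis.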