Exploring pandas Date Offsets

Creating finance calender lookup tables with the powerful date and time classes in base pandas.

In [1]:
import datetime
import numpy as np
import pandas as pd
print(pd.__version__)
1.1.3

pandas has a variety of functions that let you create and manipulate date ranges that conform to a fiscal calendar that begins on January 1st of every year and ends on Dec 31st of every year, and each quarter is exactly 3 months long.

We can see an example of this built-in functionality if we use pandas to give us an array of the last day in each quarter in 2021 using the freq='Q' parameter.

In [2]:
start = datetime.datetime(2021, 1, 1)
end = datetime.datetime(2022, 1, 1)
# Create a DatetimeIndex with freq='Q'
# the 'Q' indicates the last day of each quarter
ts = pd.date_range(start, end, freq='Q')
ts
Out[2]:
DatetimeIndex(['2021-03-31', '2021-06-30', '2021-09-30', '2021-12-31'], dtype='datetime64[ns]', freq='Q-DEC')

We can use this index to create a dataframe with columns with various attributes, by converting this stuff to a period object:

In [3]:
tp = ts.to_period()

pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear})
Out[3]:
quarter fiscal_year
2021-03-31 1 2021
2021-06-30 2 2021
2021-09-30 3 2021
2021-12-31 4 2021

However, what if you wanted to use pandas to handle fiscal calendars that aren't set to this standard? For example, let's say we have a fiscal calendar that starts on June 1st (i.e. Nike).

Notice that the previous creation of a pd.date_range with freq='Q' created a DatetimeIndex with a freq of 'Q-DEC'. While there's a list of 'offset aliases' there isn't much mention of what 'Q-DEC' is except mentioned briefly in the Pandas user guide:

Many organizations define quarters relative to the month in which their fiscal year starts and ends. Thus, first quarter of 2011 could start in 2010 or a few months into 2011. Via anchored frequencies, pandas works for all quarterly frequencies Q-JAN through Q-DEC.

So in other words, the default quarter system Q-DEC (and A-DEC) means that it ends in December. So if we wanted to have a fiscal calendar that ends in May and starts in June 1st, we'd use Q-MAY:

In [4]:
ts = pd.date_range(start, end, freq='Q-MAY')
ts
Out[4]:
DatetimeIndex(['2021-02-28', '2021-05-31', '2021-08-31', '2021-11-30'], dtype='datetime64[ns]', freq='Q-MAY')
In [5]:
tp = ts.to_period()

pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear})
Out[5]:
quarter fiscal_year
2021-02-28 3 2021
2021-05-31 4 2021
2021-08-31 1 2022
2021-11-30 2 2022

So that was easy! And instead of creating the date range with just the end quarters, we can create a daily date range and then use the to_period() call to do the fiscal year translation for us, to create a daily calendar lookup table:

In [6]:
# create a DatetimeIndex with all days between 2021 and 2022
ts = pd.date_range(start='2021', end='2022', freq='D')

tp = ts.to_period(freq='Q-MAY')
df = pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear})
df.tail()
Out[6]:
quarter fiscal_year
2021-12-28 3 2022
2021-12-29 3 2022
2021-12-30 3 2022
2021-12-31 3 2022
2022-01-01 3 2022

We can see that the correct fiscal year and fiscal quarters are represented. Nice!

What are Periods?

Notice in the above example, I converted the DatetimeIndex into something with the to_period() call. But what exactly is that? Let's take a look:

In [7]:
tp
Out[7]:
PeriodIndex(['2021Q3', '2021Q3', '2021Q3', '2021Q3', '2021Q3', '2021Q3',
             '2021Q3', '2021Q3', '2021Q3', '2021Q3',
             ...
             '2022Q3', '2022Q3', '2022Q3', '2022Q3', '2022Q3', '2022Q3',
             '2022Q3', '2022Q3', '2022Q3', '2022Q3'],
            dtype='period[Q-MAY]', length=366, freq='Q-MAY')

Our DatetimeIndex was converted into a PeriodIndex, which is an index of Period objects. Each Period object is a custom object that has a variety of helpful attributes, like quarter, month, etc that we can extract for our lookup calendar if we wanted to.

52/53 Workweek Calendars

What if your company follows a 52/53-workweek calendar using the 4-4-5 quarter system (i.e. Intel, Dell)? I want to be able to use pandas methods on those, too!

We can use the pd.tseries.offsets.FY5253 DateOffset type to find the dates of the beginning of each fiscal year:

In [8]:
variation = 'last'

# create an offset for the Intel fiscal calendar, which ends on the last Saturday of December
yoffset = pd.tseries.offsets.FY5253(n=1, weekday=5, startingMonth=12, variation=variation)

# use the offset to create a date range with intervals at each fiscal year beginning
yoffset_range = pd.date_range('2020', periods=3, freq=yoffset)
yoffset_range
Out[8]:
DatetimeIndex(['2020-12-26', '2021-12-25', '2022-12-31'], dtype='datetime64[ns]', freq='RE-L-DEC-SAT')

Let's take a look at the above result - we've generated a DatetimeIndex with the last day of each fiscal year.

Notice the new freq string. Creating the timeseries offset object generates a new freq string that you can use for the pd.date_range function. However, unlike the previous examples you can't convert these to Period objects to get the quarter, month, etc (you can try it - it'll return an error).

So how can we use this to create a lookup table? Well, first off having the beginning dates for each fiscal year is helpful to us because having to worry about leap years and figuring out whether a year is going to be a 52 workweek year vs 53 workweek year is annoying. Trust me, I went down that rabbit hole.

But with the start/end dates for each fiscal year and the fact that we're following a 4-4-5 convention, we can create this manually:

In [9]:
# Create a '1 day' offset:
d1 = pd.tseries.offsets.DateOffset(n = 1)

# Use our business acumen to specify what that first fiscal year was. '2020-12-27' is the first day of fiscal year 2021 for Intel, 
# while '2021-12-26' was the first day of fiscal year 2022
yr = (yoffset_range[0] - pd.tseries.offsets.DateOffset(n=8)).year +1

# iterate over each item in the date range we created earlier:
result = pd.DataFrame()
for i in yoffset_range:
    # recall each item is a Timestamp that represents the first day of the fiscal year,
    # so create a date_range from beginning to end of the fiscal year
    current_range = pd.date_range(i+d1, i+yoffset, freq='D')
    interim_df = pd.DataFrame(index = current_range)
    # day of year
    interim_df['DOY'] = (current_range-current_range[0]).days +1
    # fiscal year
    interim_df['FY'] = yr
    result = result.append(interim_df)
    yr += 1
In [10]:
result.query('FY==2021').tail(10)
Out[10]:
DOY FY
2021-12-16 355 2021
2021-12-17 356 2021
2021-12-18 357 2021
2021-12-19 358 2021
2021-12-20 359 2021
2021-12-21 360 2021
2021-12-22 361 2021
2021-12-23 362 2021
2021-12-24 363 2021
2021-12-25 364 2021

Let's go ahead and add in a few more features:

In [11]:
# workweek in year
result['WW'] = ((result['DOY']-1) //7) + 1
# fiscal quarter
result['FQ'] = np.minimum((result['WW']//13) + 1, 4)
# workweek in quarter
result['WWinQ'] = result['WW'] - ((result['FQ']-1) * 13)
# fiscal month
result['FM'] = ((result['FQ']-1)*3) + np.minimum(((result['WWinQ'] // 4)+1), 3)
In [12]:
result.query('FY==2021').tail(10)
Out[12]:
DOY FY WW FQ WWinQ FM
2021-12-16 355 2021 51 4 12 12
2021-12-17 356 2021 51 4 12 12
2021-12-18 357 2021 51 4 12 12
2021-12-19 358 2021 52 4 13 12
2021-12-20 359 2021 52 4 13 12
2021-12-21 360 2021 52 4 13 12
2021-12-22 361 2021 52 4 13 12
2021-12-23 362 2021 52 4 13 12
2021-12-24 363 2021 52 4 13 12
2021-12-25 364 2021 52 4 13 12

Great! We can summarize the function below and in a public gist.

In [13]:
"""
Make a dataframe for a 52/53-workweek accounting period calendar given the last day of the previous year.
Since many companies use this 52/53 workweek for their fiscal periods, this is a useful function to quickly
build custom lookup tables so you can join between datasets!
"""

import pandas as pd
from datetime import datetime
import numpy as np


def create_5253_calendar(start='2020', periods=3, weekday=5, startingMonth=12, variation='last'):
    """
    Builds a lookup dataframe with calendar date on the index and relevant fiscal dates for a 52/53 workweek
    aka 4-4-5 fiscal calendar. 
    
    See https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.FY5253.html for info on the 
    pandas 52/53 workweek dateoffset object.
    
    Parameters
    ----------
    start : str, datetime, or pd.Timestamp object (default : '2020')
        Indicate the calendar year (i.e. '2020') or datetime/timestamp for the final output dataframe
    periods : int (default : 3
        Number of fiscal years ahead to build the output dataframe
    weekday : int (default : 5)
        Which day of week the fiscal year ends on
        See also: https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.FY5253.html
        * 0 : Monday
        * 1 : Tuesday
        * 2 : Wednesday
        * 3 : Thursday
        * 4 : Friday
        * 5 : Saturday
        * 6 : Sunday
    startingMonth : int (default : 12)
        Following the pandas convention, but this actually signals the 'ending month' for the fiscal year. So a fiscal year
        that ends on the last Saturday of every December would have a value of 12 here.
    variation : str (default : 'last')
        Either 'last' or 'nearest' - parameters for the pandas FY5253 offset that specifies the type of 5253 calendar this is.
        If it's 'last', then the fiscal year ends on the 'last' `weekday` of the `startingMonth`.
        If it's 'nearest', then the fiscal year ends on the 'nearest' `weekday` of the `startingMonth`. 
        
    Returns
    -------
    pd.DataFrame
        Returns a dataframe with a daily DatetimeIndex and columns that map to key fiscal timelines for
        a 52/53 workweek calendar, such as DOY (day of fiscal year), WW (workweek in fiscal year), etc
        
    Examples
    --------
    >>>create_5253_calendar(start='20', periods=5, weekday=5, startingMonth=12, variation='last')
        
    """
    # create an offset for the Intel fiscal calendar, which ends on the last Saturday of December
    yoffset = pd.tseries.offsets.FY5253(n=1, weekday=weekday, startingMonth=startingMonth, variation=variation)

    # use the offset to create a date range with intervals at each fiscal year beginning
    yoffset_range = pd.date_range(start=start, periods=periods, freq=yoffset)    
    
    # Create a '1 day' offset:
    d1 = pd.tseries.offsets.DateOffset(n = 1)

    # Use our business acumen to specify what that first fiscal year was. '2020-12-27' is the first day of fiscal year 2021 for Intel, 
    # while '2021-12-26' was the first day of fiscal year 2022
    yr = (yoffset_range[0] - pd.tseries.offsets.DateOffset(n=8)).year +1

    # iterate over each item in the date range we created earlier:
    result = pd.DataFrame()
    for i in yoffset_range:
        # recall each item is a Timestamp that represents the first day of the fiscal year,
        # so create a date_range from beginning to end of the fiscal year
        current_range = pd.date_range(i+d1, i+yoffset, freq='D')
        interim_df = pd.DataFrame(index = current_range)
        # day of year
        interim_df['DOY'] = (current_range-current_range[0]).days +1
        # fiscal year
        interim_df['FY'] = yr
        result = result.append(interim_df)
        yr += 1

    # workweek in year
    result['WW'] = ((result['DOY']-1) //7) + 1
    # fiscal quarter
    result['FQ'] = np.minimum((result['WW']//13) + 1, 4)
    # workweek in quarter
    result['WWinQ'] = result['WW'] - ((result['FQ']-1) * 13)
    # fiscal month
    result['FM'] = ((result['FQ']-1)*3) + np.minimum(((result['WWinQ'] // 4)+1), 3)

    return result
In [14]:
%%timeit
create_5253_calendar(start='2000', periods=30, weekday=5, startingMonth=12, variation='last')
64.3 ms ± 8.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [15]:
df = create_5253_calendar(start='2000', periods=30, weekday=5, startingMonth=12, variation='last')
df.tail()
Out[15]:
DOY FY WW FQ WWinQ FM
2030-12-24 360 2030 52 4 13 12
2030-12-25 361 2030 52 4 13 12
2030-12-26 362 2030 52 4 13 12
2030-12-27 363 2030 52 4 13 12
2030-12-28 364 2030 52 4 13 12
In [ ]: