Exploring pandas Date Offsets¶

Creating finance calender lookup tables with the powerful date and time classes in base pandas.

In [1]:
import datetime
import numpy as np
import pandas as pd
print(pd.__version__)

1.1.3


pandas has a variety of functions that let you create and manipulate date ranges that conform to a fiscal calendar that begins on January 1st of every year and ends on Dec 31st of every year, and each quarter is exactly 3 months long.

We can see an example of this built-in functionality if we use pandas to give us an array of the last day in each quarter in 2021 using the freq='Q' parameter.

In [2]:
start = datetime.datetime(2021, 1, 1)
end = datetime.datetime(2022, 1, 1)
# Create a DatetimeIndex with freq='Q'
# the 'Q' indicates the last day of each quarter
ts = pd.date_range(start, end, freq='Q')
ts

Out[2]:
DatetimeIndex(['2021-03-31', '2021-06-30', '2021-09-30', '2021-12-31'], dtype='datetime64[ns]', freq='Q-DEC')

We can use this index to create a dataframe with columns with various attributes, by converting this stuff to a period object:

In [3]:
tp = ts.to_period()

pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear})

Out[3]:
quarter fiscal_year
2021-03-31 1 2021
2021-06-30 2 2021
2021-09-30 3 2021
2021-12-31 4 2021

However, what if you wanted to use pandas to handle fiscal calendars that aren't set to this standard? For example, let's say we have a fiscal calendar that starts on June 1st (i.e. Nike).

Notice that the previous creation of a pd.date_range with freq='Q' created a DatetimeIndex with a freq of 'Q-DEC'. While there's a list of 'offset aliases' there isn't much mention of what 'Q-DEC' is except mentioned briefly in the Pandas user guide:

Many organizations define quarters relative to the month in which their fiscal year starts and ends. Thus, first quarter of 2011 could start in 2010 or a few months into 2011. Via anchored frequencies, pandas works for all quarterly frequencies Q-JAN through Q-DEC.

So in other words, the default quarter system Q-DEC (and A-DEC) means that it ends in December. So if we wanted to have a fiscal calendar that ends in May and starts in June 1st, we'd use Q-MAY:

In [4]:
ts = pd.date_range(start, end, freq='Q-MAY')
ts

Out[4]:
DatetimeIndex(['2021-02-28', '2021-05-31', '2021-08-31', '2021-11-30'], dtype='datetime64[ns]', freq='Q-MAY')
In [5]:
tp = ts.to_period()

pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear})

Out[5]:
quarter fiscal_year
2021-02-28 3 2021
2021-05-31 4 2021
2021-08-31 1 2022
2021-11-30 2 2022

So that was easy! And instead of creating the date range with just the end quarters, we can create a daily date range and then use the to_period() call to do the fiscal year translation for us, to create a daily calendar lookup table:

In [6]:
# create a DatetimeIndex with all days between 2021 and 2022
ts = pd.date_range(start='2021', end='2022', freq='D')

tp = ts.to_period(freq='Q-MAY')
df = pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear})
df.tail()

Out[6]:
quarter fiscal_year
2021-12-28 3 2022
2021-12-29 3 2022
2021-12-30 3 2022
2021-12-31 3 2022
2022-01-01 3 2022

We can see that the correct fiscal year and fiscal quarters are represented. Nice!

What are Periods?¶

Notice in the above example, I converted the DatetimeIndex into something with the to_period() call. But what exactly is that? Let's take a look:

In [7]:
tp

Out[7]:
PeriodIndex(['2021Q3', '2021Q3', '2021Q3', '2021Q3', '2021Q3', '2021Q3',
'2021Q3', '2021Q3', '2021Q3', '2021Q3',
...
'2022Q3', '2022Q3', '2022Q3', '2022Q3', '2022Q3', '2022Q3',
'2022Q3', '2022Q3', '2022Q3', '2022Q3'],
dtype='period[Q-MAY]', length=366, freq='Q-MAY')

Our DatetimeIndex was converted into a PeriodIndex, which is an index of Period objects. Each Period object is a custom object that has a variety of helpful attributes, like quarter, month, etc that we can extract for our lookup calendar if we wanted to.

52/53 Workweek Calendars¶

What if your company follows a 52/53-workweek calendar using the 4-4-5 quarter system (i.e. Intel, Dell)? I want to be able to use pandas methods on those, too!

We can use the pd.tseries.offsets.FY5253 DateOffset type to find the dates of the beginning of each fiscal year:

In [8]:
variation = 'last'

# create an offset for the Intel fiscal calendar, which ends on the last Saturday of December
yoffset = pd.tseries.offsets.FY5253(n=1, weekday=5, startingMonth=12, variation=variation)

# use the offset to create a date range with intervals at each fiscal year beginning
yoffset_range = pd.date_range('2020', periods=3, freq=yoffset)
yoffset_range

Out[8]:
DatetimeIndex(['2020-12-26', '2021-12-25', '2022-12-31'], dtype='datetime64[ns]', freq='RE-L-DEC-SAT')

Let's take a look at the above result - we've generated a DatetimeIndex with the last day of each fiscal year.

Notice the new freq string. Creating the timeseries offset object generates a new freq string that you can use for the pd.date_range function. However, unlike the previous examples you can't convert these to Period objects to get the quarter, month, etc (you can try it - it'll return an error).

So how can we use this to create a lookup table? Well, first off having the beginning dates for each fiscal year is helpful to us because having to worry about leap years and figuring out whether a year is going to be a 52 workweek year vs 53 workweek year is annoying. Trust me, I went down that rabbit hole.

But with the start/end dates for each fiscal year and the fact that we're following a 4-4-5 convention, we can create this manually:

In [9]:
# Create a '1 day' offset:
d1 = pd.tseries.offsets.DateOffset(n = 1)

# Use our business acumen to specify what that first fiscal year was. '2020-12-27' is the first day of fiscal year 2021 for Intel,
# while '2021-12-26' was the first day of fiscal year 2022
yr = (yoffset_range[0] - pd.tseries.offsets.DateOffset(n=8)).year +1

# iterate over each item in the date range we created earlier:
result = pd.DataFrame()
for i in yoffset_range:
# recall each item is a Timestamp that represents the first day of the fiscal year,
# so create a date_range from beginning to end of the fiscal year
current_range = pd.date_range(i+d1, i+yoffset, freq='D')
interim_df = pd.DataFrame(index = current_range)
# day of year
interim_df['DOY'] = (current_range-current_range[0]).days +1
# fiscal year
interim_df['FY'] = yr
result = result.append(interim_df)
yr += 1

In [10]:
result.query('FY==2021').tail(10)

Out[10]:
DOY FY
2021-12-16 355 2021
2021-12-17 356 2021
2021-12-18 357 2021
2021-12-19 358 2021
2021-12-20 359 2021
2021-12-21 360 2021
2021-12-22 361 2021
2021-12-23 362 2021
2021-12-24 363 2021
2021-12-25 364 2021

In [11]:
# workweek in year
result['WW'] = ((result['DOY']-1) //7) + 1
# fiscal quarter
result['FQ'] = np.minimum((result['WW']//13) + 1, 4)
# workweek in quarter
result['WWinQ'] = result['WW'] - ((result['FQ']-1) * 13)
# fiscal month
result['FM'] = ((result['FQ']-1)*3) + np.minimum(((result['WWinQ'] // 4)+1), 3)

In [12]:
result.query('FY==2021').tail(10)

Out[12]:
DOY FY WW FQ WWinQ FM
2021-12-16 355 2021 51 4 12 12
2021-12-17 356 2021 51 4 12 12
2021-12-18 357 2021 51 4 12 12
2021-12-19 358 2021 52 4 13 12
2021-12-20 359 2021 52 4 13 12
2021-12-21 360 2021 52 4 13 12
2021-12-22 361 2021 52 4 13 12
2021-12-23 362 2021 52 4 13 12
2021-12-24 363 2021 52 4 13 12
2021-12-25 364 2021 52 4 13 12

Great! We can summarize the function below and in a public gist.

In [13]:
"""
Make a dataframe for a 52/53-workweek accounting period calendar given the last day of the previous year.
Since many companies use this 52/53 workweek for their fiscal periods, this is a useful function to quickly
build custom lookup tables so you can join between datasets!
"""

import pandas as pd
from datetime import datetime
import numpy as np

def create_5253_calendar(start='2020', periods=3, weekday=5, startingMonth=12, variation='last'):
"""
Builds a lookup dataframe with calendar date on the index and relevant fiscal dates for a 52/53 workweek
aka 4-4-5 fiscal calendar.

See https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.FY5253.html for info on the
pandas 52/53 workweek dateoffset object.

Parameters
----------
start : str, datetime, or pd.Timestamp object (default : '2020')
Indicate the calendar year (i.e. '2020') or datetime/timestamp for the final output dataframe
periods : int (default : 3
Number of fiscal years ahead to build the output dataframe
weekday : int (default : 5)
Which day of week the fiscal year ends on
* 0 : Monday
* 1 : Tuesday
* 2 : Wednesday
* 3 : Thursday
* 4 : Friday
* 5 : Saturday
* 6 : Sunday
startingMonth : int (default : 12)
Following the pandas convention, but this actually signals the 'ending month' for the fiscal year. So a fiscal year
that ends on the last Saturday of every December would have a value of 12 here.
variation : str (default : 'last')
Either 'last' or 'nearest' - parameters for the pandas FY5253 offset that specifies the type of 5253 calendar this is.
If it's 'last', then the fiscal year ends on the 'last' weekday of the startingMonth.
If it's 'nearest', then the fiscal year ends on the 'nearest' weekday of the startingMonth.

Returns
-------
pd.DataFrame
Returns a dataframe with a daily DatetimeIndex and columns that map to key fiscal timelines for
a 52/53 workweek calendar, such as DOY (day of fiscal year), WW (workweek in fiscal year), etc

Examples
--------
>>>create_5253_calendar(start='20', periods=5, weekday=5, startingMonth=12, variation='last')

"""
# create an offset for the Intel fiscal calendar, which ends on the last Saturday of December
yoffset = pd.tseries.offsets.FY5253(n=1, weekday=weekday, startingMonth=startingMonth, variation=variation)

# use the offset to create a date range with intervals at each fiscal year beginning
yoffset_range = pd.date_range(start=start, periods=periods, freq=yoffset)

# Create a '1 day' offset:
d1 = pd.tseries.offsets.DateOffset(n = 1)

# Use our business acumen to specify what that first fiscal year was. '2020-12-27' is the first day of fiscal year 2021 for Intel,
# while '2021-12-26' was the first day of fiscal year 2022
yr = (yoffset_range[0] - pd.tseries.offsets.DateOffset(n=8)).year +1

# iterate over each item in the date range we created earlier:
result = pd.DataFrame()
for i in yoffset_range:
# recall each item is a Timestamp that represents the first day of the fiscal year,
# so create a date_range from beginning to end of the fiscal year
current_range = pd.date_range(i+d1, i+yoffset, freq='D')
interim_df = pd.DataFrame(index = current_range)
# day of year
interim_df['DOY'] = (current_range-current_range[0]).days +1
# fiscal year
interim_df['FY'] = yr
result = result.append(interim_df)
yr += 1

# workweek in year
result['WW'] = ((result['DOY']-1) //7) + 1
# fiscal quarter
result['FQ'] = np.minimum((result['WW']//13) + 1, 4)
# workweek in quarter
result['WWinQ'] = result['WW'] - ((result['FQ']-1) * 13)
# fiscal month
result['FM'] = ((result['FQ']-1)*3) + np.minimum(((result['WWinQ'] // 4)+1), 3)

return result

In [14]:
%%timeit
create_5253_calendar(start='2000', periods=30, weekday=5, startingMonth=12, variation='last')

64.3 ms ± 8.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [15]:
df = create_5253_calendar(start='2000', periods=30, weekday=5, startingMonth=12, variation='last')
df.tail()

Out[15]:
DOY FY WW FQ WWinQ FM
2030-12-24 360 2030 52 4 13 12
2030-12-25 361 2030 52 4 13 12
2030-12-26 362 2030 52 4 13 12
2030-12-27 363 2030 52 4 13 12
2030-12-28 364 2030 52 4 13 12
In [ ]: