pandas
Date Offsets¶Creating finance calender lookup tables with the powerful date and time classes in base pandas
.
import datetime
import numpy as np
import pandas as pd
print(pd.__version__)
1.1.3
pandas
has a variety of functions that let you create and manipulate date ranges that conform to a fiscal calendar that begins on January 1st of every year and ends on Dec 31st of every year, and each quarter is exactly 3 months long.
We can see an example of this built-in functionality if we use pandas
to give us an array of the last day in each quarter in 2021 using the freq='Q'
parameter.
start = datetime.datetime(2021, 1, 1)
end = datetime.datetime(2022, 1, 1)
# Create a DatetimeIndex with freq='Q'
# the 'Q' indicates the last day of each quarter
ts = pd.date_range(start, end, freq='Q')
ts
DatetimeIndex(['2021-03-31', '2021-06-30', '2021-09-30', '2021-12-31'], dtype='datetime64[ns]', freq='Q-DEC')
We can use this index to create a dataframe with columns with various attributes, by converting this stuff to a period
object:
tp = ts.to_period()
pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear})
quarter | fiscal_year | |
---|---|---|
2021-03-31 | 1 | 2021 |
2021-06-30 | 2 | 2021 |
2021-09-30 | 3 | 2021 |
2021-12-31 | 4 | 2021 |
However, what if you wanted to use pandas
to handle fiscal calendars that aren't set to this standard? For example, let's say we have a fiscal calendar that starts on June 1st (i.e. Nike).
Notice that the previous creation of a pd.date_range
with freq='Q'
created a DatetimeIndex with a freq of 'Q-DEC'
. While there's a list of 'offset aliases' there isn't much mention of what 'Q-DEC'
is except mentioned briefly in the Pandas user guide:
Many organizations define quarters relative to the month in which their fiscal year starts and ends. Thus, first quarter of 2011 could start in 2010 or a few months into 2011. Via anchored frequencies, pandas works for all quarterly frequencies
Q-JAN
throughQ-DEC
.
So in other words, the default quarter system Q-DEC
(and A-DEC
) means that it ends in December. So if we wanted to have a fiscal calendar that ends in May and starts in June 1st, we'd use Q-MAY
:
ts = pd.date_range(start, end, freq='Q-MAY')
ts
DatetimeIndex(['2021-02-28', '2021-05-31', '2021-08-31', '2021-11-30'], dtype='datetime64[ns]', freq='Q-MAY')
tp = ts.to_period()
pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear})
quarter | fiscal_year | |
---|---|---|
2021-02-28 | 3 | 2021 |
2021-05-31 | 4 | 2021 |
2021-08-31 | 1 | 2022 |
2021-11-30 | 2 | 2022 |
So that was easy! And instead of creating the date range with just the end quarters, we can create a daily date range and then use the to_period()
call to do the fiscal year translation for us, to create a daily calendar lookup table:
# create a DatetimeIndex with all days between 2021 and 2022
ts = pd.date_range(start='2021', end='2022', freq='D')
tp = ts.to_period(freq='Q-MAY')
df = pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear})
df.tail()
quarter | fiscal_year | |
---|---|---|
2021-12-28 | 3 | 2022 |
2021-12-29 | 3 | 2022 |
2021-12-30 | 3 | 2022 |
2021-12-31 | 3 | 2022 |
2022-01-01 | 3 | 2022 |
We can see that the correct fiscal year and fiscal quarters are represented. Nice!
Notice in the above example, I converted the DatetimeIndex into something with the to_period()
call. But what exactly is that? Let's take a look:
tp
PeriodIndex(['2021Q3', '2021Q3', '2021Q3', '2021Q3', '2021Q3', '2021Q3', '2021Q3', '2021Q3', '2021Q3', '2021Q3', ... '2022Q3', '2022Q3', '2022Q3', '2022Q3', '2022Q3', '2022Q3', '2022Q3', '2022Q3', '2022Q3', '2022Q3'], dtype='period[Q-MAY]', length=366, freq='Q-MAY')
Our DatetimeIndex
was converted into a PeriodIndex
, which is an index of Period
objects. Each Period
object is a custom object that has a variety of helpful attributes, like quarter
, month
, etc that we can extract for our lookup calendar if we wanted to.
What if your company follows a 52/53-workweek calendar using the 4-4-5 quarter system (i.e. Intel, Dell)? I want to be able to use pandas
methods on those, too!
We can use the pd.tseries.offsets.FY5253
DateOffset type to find the dates of the beginning of each fiscal year:
variation = 'last'
# create an offset for the Intel fiscal calendar, which ends on the last Saturday of December
yoffset = pd.tseries.offsets.FY5253(n=1, weekday=5, startingMonth=12, variation=variation)
# use the offset to create a date range with intervals at each fiscal year beginning
yoffset_range = pd.date_range('2020', periods=3, freq=yoffset)
yoffset_range
DatetimeIndex(['2020-12-26', '2021-12-25', '2022-12-31'], dtype='datetime64[ns]', freq='RE-L-DEC-SAT')
Let's take a look at the above result - we've generated a DatetimeIndex
with the last day of each fiscal year.
Notice the new freq
string. Creating the timeseries offset object generates a new freq
string that you can use for the pd.date_range
function. However, unlike the previous examples you can't convert these to Period
objects to get the quarter, month, etc (you can try it - it'll return an error).
So how can we use this to create a lookup table? Well, first off having the beginning dates for each fiscal year is helpful to us because having to worry about leap years and figuring out whether a year is going to be a 52 workweek year vs 53 workweek year is annoying. Trust me, I went down that rabbit hole.
But with the start/end dates for each fiscal year and the fact that we're following a 4-4-5 convention, we can create this manually:
# Create a '1 day' offset:
d1 = pd.tseries.offsets.DateOffset(n = 1)
# Use our business acumen to specify what that first fiscal year was. '2020-12-27' is the first day of fiscal year 2021 for Intel,
# while '2021-12-26' was the first day of fiscal year 2022
yr = (yoffset_range[0] - pd.tseries.offsets.DateOffset(n=8)).year +1
# iterate over each item in the date range we created earlier:
result = pd.DataFrame()
for i in yoffset_range:
# recall each item is a Timestamp that represents the first day of the fiscal year,
# so create a date_range from beginning to end of the fiscal year
current_range = pd.date_range(i+d1, i+yoffset, freq='D')
interim_df = pd.DataFrame(index = current_range)
# day of year
interim_df['DOY'] = (current_range-current_range[0]).days +1
# fiscal year
interim_df['FY'] = yr
result = result.append(interim_df)
yr += 1
result.query('FY==2021').tail(10)
DOY | FY | |
---|---|---|
2021-12-16 | 355 | 2021 |
2021-12-17 | 356 | 2021 |
2021-12-18 | 357 | 2021 |
2021-12-19 | 358 | 2021 |
2021-12-20 | 359 | 2021 |
2021-12-21 | 360 | 2021 |
2021-12-22 | 361 | 2021 |
2021-12-23 | 362 | 2021 |
2021-12-24 | 363 | 2021 |
2021-12-25 | 364 | 2021 |
Let's go ahead and add in a few more features:
# workweek in year
result['WW'] = ((result['DOY']-1) //7) + 1
# fiscal quarter
result['FQ'] = np.minimum((result['WW']//13) + 1, 4)
# workweek in quarter
result['WWinQ'] = result['WW'] - ((result['FQ']-1) * 13)
# fiscal month
result['FM'] = ((result['FQ']-1)*3) + np.minimum(((result['WWinQ'] // 4)+1), 3)
result.query('FY==2021').tail(10)
DOY | FY | WW | FQ | WWinQ | FM | |
---|---|---|---|---|---|---|
2021-12-16 | 355 | 2021 | 51 | 4 | 12 | 12 |
2021-12-17 | 356 | 2021 | 51 | 4 | 12 | 12 |
2021-12-18 | 357 | 2021 | 51 | 4 | 12 | 12 |
2021-12-19 | 358 | 2021 | 52 | 4 | 13 | 12 |
2021-12-20 | 359 | 2021 | 52 | 4 | 13 | 12 |
2021-12-21 | 360 | 2021 | 52 | 4 | 13 | 12 |
2021-12-22 | 361 | 2021 | 52 | 4 | 13 | 12 |
2021-12-23 | 362 | 2021 | 52 | 4 | 13 | 12 |
2021-12-24 | 363 | 2021 | 52 | 4 | 13 | 12 |
2021-12-25 | 364 | 2021 | 52 | 4 | 13 | 12 |
Great! We can summarize the function below and in a public gist.
"""
Make a dataframe for a 52/53-workweek accounting period calendar given the last day of the previous year.
Since many companies use this 52/53 workweek for their fiscal periods, this is a useful function to quickly
build custom lookup tables so you can join between datasets!
"""
import pandas as pd
from datetime import datetime
import numpy as np
def create_5253_calendar(start='2020', periods=3, weekday=5, startingMonth=12, variation='last'):
"""
Builds a lookup dataframe with calendar date on the index and relevant fiscal dates for a 52/53 workweek
aka 4-4-5 fiscal calendar.
See https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.FY5253.html for info on the
pandas 52/53 workweek dateoffset object.
Parameters
----------
start : str, datetime, or pd.Timestamp object (default : '2020')
Indicate the calendar year (i.e. '2020') or datetime/timestamp for the final output dataframe
periods : int (default : 3
Number of fiscal years ahead to build the output dataframe
weekday : int (default : 5)
Which day of week the fiscal year ends on
See also: https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.FY5253.html
* 0 : Monday
* 1 : Tuesday
* 2 : Wednesday
* 3 : Thursday
* 4 : Friday
* 5 : Saturday
* 6 : Sunday
startingMonth : int (default : 12)
Following the pandas convention, but this actually signals the 'ending month' for the fiscal year. So a fiscal year
that ends on the last Saturday of every December would have a value of 12 here.
variation : str (default : 'last')
Either 'last' or 'nearest' - parameters for the pandas FY5253 offset that specifies the type of 5253 calendar this is.
If it's 'last', then the fiscal year ends on the 'last' `weekday` of the `startingMonth`.
If it's 'nearest', then the fiscal year ends on the 'nearest' `weekday` of the `startingMonth`.
Returns
-------
pd.DataFrame
Returns a dataframe with a daily DatetimeIndex and columns that map to key fiscal timelines for
a 52/53 workweek calendar, such as DOY (day of fiscal year), WW (workweek in fiscal year), etc
Examples
--------
>>>create_5253_calendar(start='20', periods=5, weekday=5, startingMonth=12, variation='last')
"""
# create an offset for the Intel fiscal calendar, which ends on the last Saturday of December
yoffset = pd.tseries.offsets.FY5253(n=1, weekday=weekday, startingMonth=startingMonth, variation=variation)
# use the offset to create a date range with intervals at each fiscal year beginning
yoffset_range = pd.date_range(start=start, periods=periods, freq=yoffset)
# Create a '1 day' offset:
d1 = pd.tseries.offsets.DateOffset(n = 1)
# Use our business acumen to specify what that first fiscal year was. '2020-12-27' is the first day of fiscal year 2021 for Intel,
# while '2021-12-26' was the first day of fiscal year 2022
yr = (yoffset_range[0] - pd.tseries.offsets.DateOffset(n=8)).year +1
# iterate over each item in the date range we created earlier:
result = pd.DataFrame()
for i in yoffset_range:
# recall each item is a Timestamp that represents the first day of the fiscal year,
# so create a date_range from beginning to end of the fiscal year
current_range = pd.date_range(i+d1, i+yoffset, freq='D')
interim_df = pd.DataFrame(index = current_range)
# day of year
interim_df['DOY'] = (current_range-current_range[0]).days +1
# fiscal year
interim_df['FY'] = yr
result = result.append(interim_df)
yr += 1
# workweek in year
result['WW'] = ((result['DOY']-1) //7) + 1
# fiscal quarter
result['FQ'] = np.minimum((result['WW']//13) + 1, 4)
# workweek in quarter
result['WWinQ'] = result['WW'] - ((result['FQ']-1) * 13)
# fiscal month
result['FM'] = ((result['FQ']-1)*3) + np.minimum(((result['WWinQ'] // 4)+1), 3)
return result
%%timeit
create_5253_calendar(start='2000', periods=30, weekday=5, startingMonth=12, variation='last')
64.3 ms ± 8.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
df = create_5253_calendar(start='2000', periods=30, weekday=5, startingMonth=12, variation='last')
df.tail()
DOY | FY | WW | FQ | WWinQ | FM | |
---|---|---|---|---|---|---|
2030-12-24 | 360 | 2030 | 52 | 4 | 13 | 12 |
2030-12-25 | 361 | 2030 | 52 | 4 | 13 | 12 |
2030-12-26 | 362 | 2030 | 52 | 4 | 13 | 12 |
2030-12-27 | 363 | 2030 | 52 | 4 | 13 | 12 |
2030-12-28 | 364 | 2030 | 52 | 4 | 13 | 12 |