#!/usr/bin/env python # coding: utf-8 # # Exploring `pandas` Date Offsets # Creating finance calender lookup tables with the powerful date and time classes in base `pandas`. # Reference: # # * [Pandas user guide on time series/date functionality](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html) # * [Pandas docs on `timedeltas`](https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html) # * [Pandas Reference Docs on Date offsets](https://pandas.pydata.org/docs/reference/offset_frequency.html#) # * [Docs on Period objects](https://pandas.pydata.org/docs/reference/api/pandas.Period.html) # * [Pandas frequency strings aka offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases) # * [Pandas Info on the FY5253 offset](https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.FY5253.html) # In[1]: import datetime import numpy as np import pandas as pd print(pd.__version__) # `pandas` has a variety of functions that let you create and manipulate date ranges that conform to a fiscal calendar that begins on January 1st of every year and ends on Dec 31st of every year, and each quarter is exactly 3 months long. # # We can see an example of this built-in functionality if we use `pandas` to give us an array of the *last* day in each quarter in 2021 using the `freq='Q'` parameter. # In[2]: start = datetime.datetime(2021, 1, 1) end = datetime.datetime(2022, 1, 1) # Create a DatetimeIndex with freq='Q' # the 'Q' indicates the last day of each quarter ts = pd.date_range(start, end, freq='Q') ts # We can use this index to create a dataframe with columns with various attributes, by converting this stuff to a `period` object: # In[3]: tp = ts.to_period() pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear}) # However, what if you wanted to use `pandas` to handle fiscal calendars that aren't set to this standard? For example, let's say we have a fiscal calendar that starts on June 1st (i.e. Nike). # # Notice that the previous creation of a `pd.date_range` with `freq='Q'` created a DatetimeIndex with a freq of `'Q-DEC'`. While there's a list of ['offset aliases'](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases) there isn't much mention of what `'Q-DEC'` is except mentioned briefly in the [Pandas user guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#frequency-conversion-and-resampling-with-periodindex): # # >Many organizations define quarters relative to the month in which their fiscal year starts and ends. Thus, first quarter of 2011 could start in 2010 or a few months into 2011. Via anchored frequencies, pandas works for all quarterly frequencies `Q-JAN` through `Q-DEC`. # So in other words, the default quarter system `Q-DEC` (and `A-DEC`) means that it ends in December. So if we wanted to have a fiscal calendar that ends in May and starts in June 1st, we'd use `Q-MAY`: # In[4]: ts = pd.date_range(start, end, freq='Q-MAY') ts # In[5]: tp = ts.to_period() pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear}) # So that was easy! And instead of creating the date range with just the end quarters, we can create a daily date range and then use the `to_period()` call to do the fiscal year translation for us, to create a *daily* calendar lookup table: # In[6]: # create a DatetimeIndex with all days between 2021 and 2022 ts = pd.date_range(start='2021', end='2022', freq='D') tp = ts.to_period(freq='Q-MAY') df = pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear}) df.tail() # We can see that the correct fiscal year and fiscal quarters are represented. Nice! # ## What are Periods? # # Notice in the above example, I converted the DatetimeIndex into something with the `to_period()` call. But what exactly is that? Let's take a look: # In[7]: tp # Our `DatetimeIndex` was converted into a `PeriodIndex`, which is an index of `Period` objects. Each `Period` object is a custom object that has a variety of helpful attributes, like `quarter`, `month`, etc that we can extract for our lookup calendar if we wanted to. # ## 52/53 Workweek Calendars # # What if your company follows a 52/53-workweek calendar using the 4-4-5 quarter system (i.e. Intel, Dell)? I want to be able to use `pandas` methods on those, too! # # We can use the `pd.tseries.offsets.FY5253` DateOffset type to find the dates of the beginning of each fiscal year: # In[8]: variation = 'last' # create an offset for the Intel fiscal calendar, which ends on the last Saturday of December yoffset = pd.tseries.offsets.FY5253(n=1, weekday=5, startingMonth=12, variation=variation) # use the offset to create a date range with intervals at each fiscal year beginning yoffset_range = pd.date_range('2020', periods=3, freq=yoffset) yoffset_range # Let's take a look at the above result - we've generated a `DatetimeIndex` with the *last* day of each fiscal year. # # Notice the new `freq` string. Creating the timeseries offset object generates a new `freq` string that you can use for the `pd.date_range` function. However, unlike the previous examples you can't convert these to `Period` objects to get the quarter, month, etc (you can try it - it'll return an error). # # So how can we use this to create a lookup table? Well, first off having the beginning dates for each fiscal year is helpful to us because having to worry about leap years and figuring out whether a year is going to be a 52 workweek year vs 53 workweek year is annoying. Trust me, I went down that rabbit hole. # # But with the start/end dates for each fiscal year and the fact that we're following a 4-4-5 convention, we can create this manually: # In[9]: # Create a '1 day' offset: d1 = pd.tseries.offsets.DateOffset(n = 1) # Use our business acumen to specify what that first fiscal year was. '2020-12-27' is the first day of fiscal year 2021 for Intel, # while '2021-12-26' was the first day of fiscal year 2022 yr = (yoffset_range[0] - pd.tseries.offsets.DateOffset(n=8)).year +1 # iterate over each item in the date range we created earlier: result = pd.DataFrame() for i in yoffset_range: # recall each item is a Timestamp that represents the first day of the fiscal year, # so create a date_range from beginning to end of the fiscal year current_range = pd.date_range(i+d1, i+yoffset, freq='D') interim_df = pd.DataFrame(index = current_range) # day of year interim_df['DOY'] = (current_range-current_range[0]).days +1 # fiscal year interim_df['FY'] = yr result = result.append(interim_df) yr += 1 # In[10]: result.query('FY==2021').tail(10) # Let's go ahead and add in a few more features: # In[11]: # workweek in year result['WW'] = ((result['DOY']-1) //7) + 1 # fiscal quarter result['FQ'] = np.minimum((result['WW']//13) + 1, 4) # workweek in quarter result['WWinQ'] = result['WW'] - ((result['FQ']-1) * 13) # fiscal month result['FM'] = ((result['FQ']-1)*3) + np.minimum(((result['WWinQ'] // 4)+1), 3) # In[12]: result.query('FY==2021').tail(10) # Great! We can summarize the function below and in a [public gist](https://gist.github.com/banditkings/735fe1885a442b3457d208060ac4b970). # In[13]: """ Make a dataframe for a 52/53-workweek accounting period calendar given the last day of the previous year. Since many companies use this 52/53 workweek for their fiscal periods, this is a useful function to quickly build custom lookup tables so you can join between datasets! """ import pandas as pd from datetime import datetime import numpy as np def create_5253_calendar(start='2020', periods=3, weekday=5, startingMonth=12, variation='last'): """ Builds a lookup dataframe with calendar date on the index and relevant fiscal dates for a 52/53 workweek aka 4-4-5 fiscal calendar. See https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.FY5253.html for info on the pandas 52/53 workweek dateoffset object. Parameters ---------- start : str, datetime, or pd.Timestamp object (default : '2020') Indicate the calendar year (i.e. '2020') or datetime/timestamp for the final output dataframe periods : int (default : 3 Number of fiscal years ahead to build the output dataframe weekday : int (default : 5) Which day of week the fiscal year ends on See also: https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.FY5253.html * 0 : Monday * 1 : Tuesday * 2 : Wednesday * 3 : Thursday * 4 : Friday * 5 : Saturday * 6 : Sunday startingMonth : int (default : 12) Following the pandas convention, but this actually signals the 'ending month' for the fiscal year. So a fiscal year that ends on the last Saturday of every December would have a value of 12 here. variation : str (default : 'last') Either 'last' or 'nearest' - parameters for the pandas FY5253 offset that specifies the type of 5253 calendar this is. If it's 'last', then the fiscal year ends on the 'last' `weekday` of the `startingMonth`. If it's 'nearest', then the fiscal year ends on the 'nearest' `weekday` of the `startingMonth`. Returns ------- pd.DataFrame Returns a dataframe with a daily DatetimeIndex and columns that map to key fiscal timelines for a 52/53 workweek calendar, such as DOY (day of fiscal year), WW (workweek in fiscal year), etc Examples -------- >>>create_5253_calendar(start='20', periods=5, weekday=5, startingMonth=12, variation='last') """ # create an offset for the Intel fiscal calendar, which ends on the last Saturday of December yoffset = pd.tseries.offsets.FY5253(n=1, weekday=weekday, startingMonth=startingMonth, variation=variation) # use the offset to create a date range with intervals at each fiscal year beginning yoffset_range = pd.date_range(start=start, periods=periods, freq=yoffset) # Create a '1 day' offset: d1 = pd.tseries.offsets.DateOffset(n = 1) # Use our business acumen to specify what that first fiscal year was. '2020-12-27' is the first day of fiscal year 2021 for Intel, # while '2021-12-26' was the first day of fiscal year 2022 yr = (yoffset_range[0] - pd.tseries.offsets.DateOffset(n=8)).year +1 # iterate over each item in the date range we created earlier: result = pd.DataFrame() for i in yoffset_range: # recall each item is a Timestamp that represents the first day of the fiscal year, # so create a date_range from beginning to end of the fiscal year current_range = pd.date_range(i+d1, i+yoffset, freq='D') interim_df = pd.DataFrame(index = current_range) # day of year interim_df['DOY'] = (current_range-current_range[0]).days +1 # fiscal year interim_df['FY'] = yr result = result.append(interim_df) yr += 1 # workweek in year result['WW'] = ((result['DOY']-1) //7) + 1 # fiscal quarter result['FQ'] = np.minimum((result['WW']//13) + 1, 4) # workweek in quarter result['WWinQ'] = result['WW'] - ((result['FQ']-1) * 13) # fiscal month result['FM'] = ((result['FQ']-1)*3) + np.minimum(((result['WWinQ'] // 4)+1), 3) return result # In[14]: get_ipython().run_cell_magic('timeit', '', "create_5253_calendar(start='2000', periods=30, weekday=5, startingMonth=12, variation='last')\n") # In[15]: df = create_5253_calendar(start='2000', periods=30, weekday=5, startingMonth=12, variation='last') df.tail() # In[ ]: