{admonition}
:class: note
When you want rolling returns over some calendar unit (hours, days, weeks, months, year), in realistic data with many firms and returns missing for parts of the calendar (e.g. no weekend trading, no after-hours trading):
1. `df['newVarName'] = df.set_index('date').groupby('asset')['ret'].rolling(window=<dateoffset>,1).apply(lambda x: np.prod(1+x)-1).values`
1. Do not use `window=#` unless you are sure your data is perfectly formed. `window=5` fails in real data because stock markets close for the weekend.
1. The `<dateoffset>` could be '7D', '24H', '1M' or many other things - [see this link](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
1. The docs for the [`rolling` function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html#pandas.DataFrame.rolling) provide extra functionality.
Suppose we have an asset-timeperiod dataset, in long/tidy form. For example, this is a firm-day trading dataset:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Firm': [1,1,1,1,1,1,1,2,2,2,2,2,2,2],
'Date' : ['1/11/22','1/12/22','1/13/22','1/14/22','1/18/22','1/19/22','1/30/22',
'1/11/22','1/12/22','1/13/22','1/14/22','1/18/22','1/19/22','1/30/22'],
'ret' : [.01,.01,.01,-.0294,.01,-0.01,.03,.02,-.02,.03,-.03,.04,-.04,.05] })
df['Date'] = pd.to_datetime(df.Date,format="%m/%d/%y")
df
Firm | Date | ret | |
---|---|---|---|
0 | 1 | 2022-01-11 | 0.0100 |
1 | 1 | 2022-01-12 | 0.0100 |
2 | 1 | 2022-01-13 | 0.0100 |
3 | 1 | 2022-01-14 | -0.0294 |
4 | 1 | 2022-01-18 | 0.0100 |
5 | 1 | 2022-01-19 | -0.0100 |
6 | 1 | 2022-01-30 | 0.0300 |
7 | 2 | 2022-01-11 | 0.0200 |
8 | 2 | 2022-01-12 | -0.0200 |
9 | 2 | 2022-01-13 | 0.0300 |
10 | 2 | 2022-01-14 | -0.0300 |
11 | 2 | 2022-01-18 | 0.0400 |
12 | 2 | 2022-01-19 | -0.0400 |
13 | 2 | 2022-01-30 | 0.0500 |
Let's create a variable containing the cumulative return over the last week. Let's stipulate that if we have less than five days of returns for a firm at any point, we just use what we have.
{exercise}
Before continuing, eyeball the dataset above and guess what the rolling week-long returns should be for each observation.
This one-liner is very close to correct:
df.groupby('Firm')['ret'].rolling(5,1).apply(lambda x: np.prod(1+x)-1)
It's not correct, but let's start here because it's easier to understand. Let me separate that code up and add comments for each part:
(df.groupby('Firm')['ret'] # for each firm, grab the ret var
.rolling(window=5,min_periods=1) # in each row, use the prior 1-5 obs (including the current one)
.apply(lambda x: np.prod(1+x)-1) # R[0,T] = prod{ R(0), ..., R(T) } - 1
)
To add that variable to our dataset, df['varname'] = <the above>
doesn't quite work. A small workaround does it: add .values
to the end of the line. (It returns just the column of numbers and ignores the weird index the code above makes.)
So let's look at the result:
df['ret1week_wrong'] = df.groupby('Firm')['ret'].rolling(5,1).apply(lambda x: np.prod(1+x)-1).values
df[:7] # just look at the top of the dataset
Firm | Date | ret | ret1week_wrong | |
---|---|---|---|---|
0 | 1 | 2022-01-11 | 0.0100 | 0.010000 |
1 | 1 | 2022-01-12 | 0.0100 | 0.020100 |
2 | 1 | 2022-01-13 | 0.0100 | 0.030301 |
3 | 1 | 2022-01-14 | -0.0294 | 0.000010 |
4 | 1 | 2022-01-18 | 0.0100 | 0.010010 |
5 | 1 | 2022-01-19 | -0.0100 | -0.009990 |
6 | 1 | 2022-01-30 | 0.0300 | 0.009614 |
{warning}
The result above is wrong!
- 1/17 was a holiday, so the return for 1/18 should start with 1/12 (and only use 4 rows).
- 1/30 should only include itself since we are missing dates.
Let's fix that. The documentation mentions that the window length can use a time period!
So instead of window=5
, we can use window='7D'
!
Except there is a final complication: After df.groupby('Firm')['ret']
, the date variable is gone. And rolling()
needs that, So we need to keep the date variable around somehow.
The last fix: Add .set_index('Date')
before groupby, so that the date variable will stick around as the index.
Put all that together:
We can write this in one line:
df['ret1week'] = df.set_index('Date').groupby('Firm')['ret'].rolling('7D',1).apply(lambda x: np.prod(1+x)-1).values
{note}
Look at the last three dates for each firm, and you'll see how incorrect the `window=#` approach is.
df.set_index(['Firm','Date'])
ret | ret1week_wrong | ret1week | ||
---|---|---|---|---|
Firm | Date | |||
1 | 2022-01-11 | 0.0100 | 0.010000 | 0.010000 |
2022-01-12 | 0.0100 | 0.020100 | 0.020100 | |
2022-01-13 | 0.0100 | 0.030301 | 0.030301 | |
2022-01-14 | -0.0294 | 0.000010 | 0.000010 | |
2022-01-18 | 0.0100 | 0.010010 | 0.000010 | |
2022-01-19 | -0.0100 | -0.009990 | -0.019792 | |
2022-01-30 | 0.0300 | 0.009614 | 0.030000 | |
2 | 2022-01-11 | 0.0200 | 0.020000 | 0.020000 |
2022-01-12 | -0.0200 | -0.000400 | -0.000400 | |
2022-01-13 | 0.0300 | 0.029588 | 0.029588 | |
2022-01-14 | -0.0300 | -0.001300 | -0.001300 | |
2022-01-18 | 0.0400 | 0.038648 | 0.018283 | |
2022-01-19 | -0.0400 | -0.022449 | -0.002499 | |
2022-01-30 | 0.0500 | 0.047377 | 0.050000 |