This notebook explains how to calculate the mean absolute difference of time series data in pandas
.
This notebook will use gold and silver price data from rdatasets
for this tutorial
The documentation for each package used in this tutorial is linked below:
import statsmodels.api as sm
import numpy as np
import pandas as pd
The data is from rdatasets
imported using the Python package statsmodels
.
df = sm.datasets.get_rdataset('GoldSilver', 'AER').data.reset_index().rename(columns={'index': 'date'})
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9132 entries, 0 to 9131 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date 9132 non-null object 1 gold 9132 non-null float64 2 silver 9132 non-null float64 dtypes: float64(2), object(1) memory usage: 214.2+ KB
df['date'] = pd.to_datetime(df.date)
The pandas
function rolling
can be used to create aggregations on windows of specific lengths. Here, an aggregate of the daily gold and silver price data will be created covering the primary week.
First, a datetime index needs to be created from the date column.
df.set_index('date', inplace=True)
Pandas aggregation will require a function that can be called. The mean difference will be called using three functions from numpy
: mean
, abs
and diff
. diff
calculates the difference between the current value and a prior value (by default the immediate prior value).
def mean_absolute_difference(series):
return np.mean(np.abs(np.diff(series)))
If, instead of an offset ('7D' representing 7 days), a number is used, it will just use the prior number of observations.
weekly_resample = df.rolling('7D')
aggregated_df = weekly_resample.agg(['mean', mean_absolute_difference])
aggregated_df.columns = ['_'.join(col).strip() + '_week' for col in aggregated_df.columns.values]
aggregated_df.head(20)
gold_mean_week | gold_mean_absolute_difference_week | silver_mean_week | silver_mean_absolute_difference_week | |
---|---|---|---|---|
date | ||||
1977-12-30 | 100.000 | NaN | 223.420 | NaN |
1978-01-02 | 100.000 | 0.0000 | 223.420 | 0.000000 |
1978-01-03 | 100.000 | 0.0000 | 225.560 | 3.210000 |
1978-01-04 | 100.000 | 0.0000 | 225.315 | 3.893333 |
1978-01-05 | 100.000 | 0.0000 | 225.850 | 3.772500 |
1978-01-06 | 100.000 | 0.0000 | 226.604 | 3.972500 |
1978-01-09 | 100.246 | 0.3075 | 227.844 | 2.975000 |
1978-01-10 | 100.436 | 0.3775 | 227.670 | 1.822500 |
1978-01-11 | 100.886 | 0.7025 | 228.998 | 1.532500 |
1978-01-12 | 101.062 | 1.0450 | 228.978 | 2.165000 |
1978-01-13 | 101.586 | 1.1725 | 230.050 | 2.722500 |
1978-01-16 | 101.958 | 1.2200 | 230.624 | 2.575000 |
1978-01-17 | 102.124 | 1.2225 | 230.866 | 2.590000 |
1978-01-18 | 101.848 | 1.1075 | 230.214 | 2.312500 |
1978-01-19 | 102.012 | 0.8800 | 230.412 | 1.377500 |
1978-01-20 | 101.832 | 0.7675 | 229.686 | 1.372500 |
1978-01-23 | 101.938 | 0.9150 | 229.988 | 2.065000 |
1978-01-24 | 102.300 | 0.6950 | 230.678 | 1.602500 |
1978-01-25 | 102.836 | 0.4975 | 231.986 | 1.590000 |
1978-01-26 | 103.152 | 0.5600 | 232.508 | 2.332500 |