This notebook explains how to aggregate time series data in pandas
.
This notebook will use gold and silver price data from rdatasets
for this tutorial
The documentation for each package used in this tutorial is linked below:
import statsmodels.api as sm
import pandas as pd
The data is from rdatasets
imported using the Python package statsmodels
.
df = sm.datasets.get_rdataset('GoldSilver', 'AER').data.reset_index().rename(columns={'index': 'date'})
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9132 entries, 0 to 9131 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date 9132 non-null object 1 gold 9132 non-null float64 2 silver 9132 non-null float64 dtypes: float64(2), object(1) memory usage: 214.2+ KB
df['date'] = pd.to_datetime(df.date)
The pandas
function resample
can be used to create aggregations on specified windows. Here, a weekly aggregate of the daily gold and silver price data will be created.
First, a datetime index needs to be created from the date column.
df.set_index('date', inplace=True)
The parameter origin is used to specify the day of week to start the aggregation on. January 1, 1978 is a Sunday, so each of these aggreagtions will start on Sunday and aggregate over the subsequent week.
weekly_resample = df.resample('7D', origin=pd.to_datetime('1978-01-01'))
aggregated_df = weekly_resample.agg(['min', 'mean', 'max', 'std'])
aggregated_df.columns = ['_'.join(col).strip() + '_week' for col in aggregated_df.columns.values]
aggregated_df.head(20)
gold_min_week | gold_mean_week | gold_max_week | gold_std_week | silver_min_week | silver_mean_week | silver_max_week | silver_std_week | |
---|---|---|---|---|---|---|---|---|
date | ||||||||
1977-12-25 | 100.00 | 100.000 | 100.00 | NaN | 223.42 | 223.420 | 223.42 | NaN |
1978-01-01 | 100.00 | 100.000 | 100.00 | 0.000000 | 223.42 | 226.604 | 229.84 | 2.596657 |
1978-01-08 | 100.88 | 101.586 | 102.62 | 0.796825 | 227.89 | 230.050 | 232.55 | 1.846334 |
1978-01-15 | 100.87 | 101.832 | 103.09 | 0.796976 | 227.96 | 229.686 | 232.49 | 1.755130 |
1978-01-22 | 102.49 | 103.306 | 103.62 | 0.475742 | 230.37 | 232.798 | 234.50 | 1.777630 |
1978-01-29 | 101.96 | 102.578 | 103.49 | 0.636530 | 226.91 | 229.116 | 231.61 | 1.936835 |
1978-02-05 | 101.90 | 102.572 | 103.15 | 0.505193 | 224.98 | 227.506 | 229.90 | 1.875361 |
1978-02-12 | 103.92 | 104.936 | 105.87 | 0.850312 | 228.16 | 231.506 | 234.01 | 2.412370 |
1978-02-19 | 105.14 | 105.956 | 107.23 | 0.822028 | 229.81 | 232.392 | 234.55 | 1.784550 |
1978-02-26 | 106.20 | 107.076 | 107.60 | 0.538451 | 229.83 | 231.092 | 232.22 | 1.043681 |
1978-03-05 | 108.89 | 109.916 | 111.03 | 0.791789 | 235.33 | 244.862 | 248.98 | 5.619695 |
1978-03-12 | 106.05 | 108.042 | 110.39 | 1.638740 | 248.09 | 251.940 | 254.95 | 2.875239 |
1978-03-19 | 102.65 | 104.458 | 105.53 | 1.355865 | 240.71 | 243.672 | 246.05 | 2.578317 |
1978-03-26 | 104.66 | 106.180 | 107.29 | 1.119710 | 246.81 | 249.628 | 253.55 | 2.764710 |
1978-04-02 | 103.33 | 103.968 | 105.27 | 0.771894 | 241.89 | 243.780 | 246.38 | 1.809627 |
1978-04-09 | 102.99 | 104.120 | 105.07 | 0.806908 | 241.23 | 243.816 | 247.27 | 2.410940 |
1978-04-16 | 99.11 | 99.996 | 100.93 | 0.852397 | 230.53 | 232.784 | 235.15 | 2.045527 |
1978-04-23 | 97.45 | 98.234 | 99.41 | 0.856756 | 225.99 | 228.588 | 230.49 | 1.848194 |
1978-04-30 | 98.38 | 99.778 | 100.92 | 1.143119 | 228.87 | 232.358 | 234.71 | 2.439912 |
1978-05-07 | 100.11 | 100.874 | 101.51 | 0.644306 | 232.08 | 233.124 | 234.55 | 0.905003 |