(c) 2016 - present. Enplus Advisors, Inc.
import numpy as np
import pandas as pd
pd.set_option('display.float_format', '{:,.1f}'.format)
dat = pd.read_csv('data/weather-6m.csv')
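The `display.float_format` option expects a callable that turns each float into the text shown in printed output. Passing the bound method `'{:,.1f}'.format` adds a thousands separator and keeps one decimal place. The small sketch below uses arbitrary sample values, and the names `fmt` and `s` are illustrative only. It shows what that callable produces and that the option changes display only. Because Python rounds exact ties to even, a value such as 0.25 prints as 0.2, which is worth keeping in mind when reading rounded output.

```python
import pandas as pd

# Same callable as the one passed to display.float_format above
fmt = '{:,.1f}'.format

# Thousands separator plus one decimal place
print(fmt(1234.567))
# Exact binary ties round half-to-even, so 0.25 -> 0.2 and 0.75 -> 0.8
print(fmt(0.25), fmt(0.75))

# The option only affects printing; stored values keep full precision
with pd.option_context('display.float_format', fmt):
    s = pd.Series([0.25, 1234.567])
    print(s)
```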
Exercise:
Calculate the average `air_temp` by month.
grp = dat.groupby('month')
grp['air_temp'].mean()
month | air_temp |
---|---|
1 | -10.0 |
2 | -3.0 |
3 | 2.1 |
4 | 7.0 |
5 | 14.0 |
6 | 18.1 |
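`groupby('month')` splits the rows by month, and `['air_temp'].mean()` then averages each piece, skipping missing values by default. As a minimal sketch on a hypothetical five-row frame (`toy`, not the real `data/weather-6m.csv`), the grouped result can be checked against a per-month loop:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for data/weather-6m.csv
toy = pd.DataFrame({
    'month':    [1, 1, 1, 2, 2],
    'air_temp': [-12.0, -6.0, np.nan, -4.0, 0.0],
})

# Split by month, then average air_temp within each group;
# the NaN reading is skipped by default
by_month = toy.groupby('month')['air_temp'].mean()
print(by_month)

# The same numbers computed one group at a time, for comparison
manual = {m: toy.loc[toy.month == m, 'air_temp'].mean() for m in [1, 2]}
```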
Exercise:
Compute summary statistics on `air_temp` and `dew_point` using the `describe` method.
grp[['air_temp', 'dew_point']].describe()
month | air_temp count | air_temp mean | air_temp std | air_temp min | air_temp 25% | air_temp 50% | air_temp 75% | air_temp max | dew_point count | dew_point mean | dew_point std | dew_point min | dew_point 25% | dew_point 50% | dew_point 75% | dew_point max |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 712.0 | -10.0 | 6.2 | -29.4 | -13.3 | -10.0 | -5.6 | 2.8 | 712.0 | -14.1 | 6.8 | -32.8 | -18.3 | -13.9 | -8.9 | 1.0 |
2 | 644.0 | -3.0 | 6.8 | -19.4 | -7.2 | -2.2 | 1.7 | 15.0 | 644.0 | -7.3 | 7.3 | -22.8 | -12.2 | -7.2 | -2.2 | 8.3 |
3 | 713.0 | 2.1 | 6.7 | -13.3 | -1.7 | 2.2 | 5.6 | 22.8 | 713.0 | -3.4 | 6.1 | -17.2 | -7.8 | -2.8 | 0.6 | 13.3 |
4 | 691.0 | 7.0 | 6.0 | -2.8 | 2.8 | 5.6 | 9.4 | 28.3 | 691.0 | 0.3 | 6.1 | -13.3 | -3.9 | -1.1 | 3.9 | 16.7 |
5 | 713.0 | 14.0 | 5.1 | 1.1 | 10.6 | 13.9 | 17.2 | 28.3 | 713.0 | 6.2 | 4.6 | -6.1 | 2.8 | 6.7 | 10.0 | 18.3 |
6 | 688.0 | 18.1 | 6.0 | 3.3 | 13.8 | 17.8 | 22.8 | 33.3 | 688.0 | 12.3 | 5.5 | -3.3 | 8.9 | 11.7 | 17.2 | 23.3 |
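`describe` on a grouped column selection returns one row per group and a two-level column index of (variable, statistic), which is what the table above flattens into `air_temp count`, `air_temp mean`, and so on. The sketch below uses a hypothetical six-row `toy` frame with made-up values to show that shape and to compare part of it with an explicit `agg` call over the same statistics (`summary` and `subset` are illustrative names):

```python
import pandas as pd

# Hypothetical toy data; the exercise itself uses data/weather-6m.csv
toy = pd.DataFrame({
    'month':     [1, 1, 1, 2, 2, 2],
    'air_temp':  [-12.0, -9.0, -6.0, -4.0, -2.0, 0.0],
    'dew_point': [-15.0, -14.0, -10.0, -8.0, -7.0, -3.0],
})
grp = toy.groupby('month')

# One row per month; columns are a MultiIndex of (variable, statistic)
summary = grp[['air_temp', 'dew_point']].describe()
print(summary.columns.tolist()[:8])

# A subset of the same statistics requested explicitly with agg
subset = grp[['air_temp', 'dew_point']].agg(['count', 'mean', 'std', 'min', 'max'])
print(subset)
```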
Exercise:
For January and February and hours 0 through 11, calculate the mean and standard deviation of `air_temp`, grouping by month and hour of the day. Name the result columns `air_temp_mean` and `air_temp_sd`.
The resulting DataFrame should have 24 rows: the number of months (2) times the number of hours (12), $2 \times 12 = 24$.
idx = dat.month.isin([1, 2]) & (dat.hour < 12)
grp2 = dat[idx].groupby(['month', 'hour'])
hourly_temp = grp2.agg(
air_temp_mean=('air_temp', 'mean'),
air_temp_sd=('air_temp', 'std')
)
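The `agg` call above uses pandas named aggregation: each keyword is the output column name, and its value is a `(source_column, function)` tuple. To check the 24-row expectation without the original CSV, the sketch below builds a hypothetical synthetic `dat` with one random reading per hour for three days in each of months 1 to 3. The `day` column and the random values are assumptions made only for this check; the filter and aggregation are the same as above.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: one reading per hour, three days per month,
# months 1-3; values are random and only used to check the result's shape
rng = np.random.default_rng(0)
rows = [(m, d, h) for m in [1, 2, 3] for d in range(3) for h in range(24)]
dat = pd.DataFrame(rows, columns=['month', 'day', 'hour'])
dat['air_temp'] = rng.normal(0.0, 5.0, len(dat))

# Keep January/February rows with hour 0-11; the comparison needs
# parentheses because & binds more tightly than <
idx = dat.month.isin([1, 2]) & (dat.hour < 12)
grp2 = dat[idx].groupby(['month', 'hour'])

# Named aggregation: output_column=(source_column, function)
hourly_temp = grp2.agg(
    air_temp_mean=('air_temp', 'mean'),
    air_temp_sd=('air_temp', 'std'),
)
print(hourly_temp.shape)  # (24, 2)
```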
Exercise:
By month, calculate quantiles of `air_temp` at the probabilities listed in `breaks`.
Hint: Use the `quantile` method defined on a `Series` (`pd.Series.quantile`).
breaks = [0.01, 0.25, 0.5, 0.75, 0.99]
grp3 = dat.groupby('month')
grp3.apply(lambda x: x.air_temp.quantile(breaks))
month | air_temp 0.01 | air_temp 0.25 | air_temp 0.50 | air_temp 0.75 | air_temp 0.99 |
---|---|---|---|---|---|
1 | -25.0 | -13.3 | -10.0 | -5.6 | 1.1 |
2 | -18.3 | -7.2 | -2.2 | 1.7 | 12.8 |
3 | -11.0 | -1.7 | 2.2 | 5.6 | 19.3 |
4 | -2.2 | 2.8 | 5.6 | 9.4 | 23.4 |
5 | 2.8 | 10.6 | 13.9 | 17.2 | 27.8 |
6 | 6.1 | 13.8 | 17.8 | 22.8 | 32.2 |
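`SeriesGroupBy.quantile` also accepts a list of probabilities, so the `apply` with a lambda can likely be replaced by a single call followed by `unstack`, which moves the quantile level from the row index into the columns. A minimal sketch on a hypothetical ten-row `toy` frame (not the weather data) compares both approaches. By default pandas interpolates linearly between order statistics, so the 0.01 and 0.99 levels fall between observed values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for data/weather-6m.csv
toy = pd.DataFrame({
    'month':    [1] * 5 + [2] * 5,
    'air_temp': [-20.0, -15.0, -10.0, -5.0, 0.0, -8.0, -4.0, 0.0, 4.0, 8.0],
})
breaks = [0.01, 0.25, 0.5, 0.75, 0.99]
grp3 = toy.groupby('month')

# Approach from the exercise: Series.quantile applied to each group
via_apply = grp3.apply(lambda x: x.air_temp.quantile(breaks))

# Alternative: one grouped quantile call gives a Series indexed by
# (month, quantile); unstack turns the quantile level into columns
via_quantile = grp3['air_temp'].quantile(breaks).unstack()

same = np.allclose(via_apply.to_numpy(), via_quantile.to_numpy())
print(via_quantile)
```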