A notebook that demonstrates how to cast various time periods (weekly, monthly, quarterly) using pandas.
Time series such as the NHS Digital Accident and Emergency figures are published as quarterly and monthly time series.
The following example shows how to cast the quarterly reports to a quarterly time period in pandas, with quarters aligned by financial year.
#Download example data file
url='https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2016/06/Quarterly-timeseries-May-2017.xls'
!wget -P data/ {url}
--2017-08-08 11:15:51-- https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2016/06/Quarterly-timeseries-May-2017.xls Resolving www.england.nhs.uk... 54.230.199.109, 54.230.199.252, 54.230.199.172, ... Connecting to www.england.nhs.uk|54.230.199.109|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 92672 (90K) [application/vnd.ms-excel] Saving to: 'data/Quarterly-timeseries-May-2017.xls' Quarterly-timeserie 100%[=====================>] 90.50K --.-KB/s in 0.02s 2017-08-08 11:15:52 (4.52 MB/s) - 'data/Quarterly-timeseries-May-2017.xls' saved [92672/92672]
import pandas as pd
#Preview data
dfw=pd.read_excel('data/Quarterly-timeseries-May-2017.xls',
skiprows=15,header=None,na_values='-').dropna(how='all').dropna(axis=1,how='all')
dfw.iloc[0,0]='Reporting'
dfw.iloc[1,0]='Code'
dfw= dfw.fillna(axis=1,method='ffill').T.set_index([0,1]).T.dropna(how='all').dropna(axis=1,how='all')
dfw=dfw.reset_index(drop=True)
dfw.head()
0 | Reporting | A&E attendances | A&E attendances > 4 hours from arrival to admission, transfer or discharge | Emergency Admissions | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Code | Year | Quarter | Type 1 Departments - Major A&E | Type 2 Departments - Single Specialty | Type 3 Departments - Other A&E/Minor Injury Unit | Total attendances | Type 1 Departments - Major A&E | Type 2 Departments - Single Specialty | Type 3 Departments - Other A&E/Minor Injury Unit | ... | Percentage in 4 hours or less (type 1) | Percentage in 4 hours or less (all) | Emergency Admissions via Type 1 A&E | Emergency Admissions via Type 2 A&E | Emergency Admissions via Type 3 and 4 A&E | Total Emergency admissions via A&E | Other Emergency admissions (i.e not via A&E) | Total emergency admissions | Number of patients spending >4 hours from decision to admit to admission | Number of patients spending >12 hours from decision to admit to admission |
0 | NaN | 2004-05 | Q1: April - June | 3377850 | 159966 | 964762 | 4502578 | 232110 | 2612 | 3305 | ... | 0.931285 | 0.947135 | 651785 | 6778 | 7830 | 666393 | 666393 | 666393 | 27665 | 27665 |
1 | NaN | NaN | Q2: July - Sept | 3381219 | 156697 | 1018779 | 4556695 | 182554 | 1830 | 2625 | ... | 0.946009 | 0.95896 | 673452 | 6670 | 8256 | 688378 | 688378 | 688378 | 20989 | 20989 |
2 | NaN | NaN | Q3: Oct - Dec | 3257398 | 150643 | 966886 | 4374927 | 136953 | 1415 | 1364 | ... | 0.957956 | 0.968061 | 705901 | 6802 | 8691 | 721394 | 721394 | 721394 | 19831 | 19831 |
3 | NaN | NaN | Q4: Jan - Mar | 3249353 | 151926 | 1001701 | 4402980 | 115978 | 982 | 687 | ... | 0.964307 | 0.97328 | 724814 | 6975 | 5414 | 737203 | 737203 | 737203 | 24178 | 24178 |
4 | NaN | 2005-06 | Q1: April - June | 3520931 | 165801 | 1172846 | 4859578 | 83845 | 742 | 555 | ... | 0.976187 | 0.98248 | 719644 | 7717 | 6048 | 733409 | 733409 | 733409 | 12991 | 12991 |
5 rows × 21 columns
Note how the time series is described in the data file across two columns: the financial year and the quarter:
dfw[[('Reporting','Year'),('Reporting','Quarter')]].head()
0 | Reporting | |
---|---|---|
1 | Year | Quarter |
0 | 2004-05 | Q1: April - June |
1 | NaN | Q2: July - Sept |
2 | NaN | Q3: Oct - Dec |
3 | NaN | Q4: Jan - Mar |
4 | 2005-06 | Q1: April - June |
Fill down on the financial year, and create a new period column joining those two elements.
dfw['Reporting','Year'].fillna(method='ffill',inplace=True)
dfw['Reporting','period']=dfw['Reporting','Year']+' '+dfw['Reporting','Quarter']
dfw[[('Reporting','Year'),('Reporting','Quarter'),('Reporting','period')]].head()
0 | Reporting | ||
---|---|---|---|
1 | Year | Quarter | period |
0 | 2004-05 | Q1: April - June | 2004-05 Q1: April - June |
1 | 2004-05 | Q2: July - Sept | 2004-05 Q2: July - Sept |
2 | 2004-05 | Q3: Oct - Dec | 2004-05 Q3: Oct - Dec |
3 | 2004-05 | Q4: Jan - Mar | 2004-05 Q4: Jan - Mar |
4 | 2005-06 | Q1: April - June | 2005-06 Q1: April - June |
Use the pandas "anchored offsets" to specifiy the quarterly period relative to the financial year.
#Return a date that we can use to anchor a period on
def getMonthYear(row):
if not row['Reporting','Quarter'].startswith('Q'): return ''
month=row['Reporting','Quarter'].split(':')[1].split('-')[0].strip()
year=int(row['Reporting','Year'].split('-')[0])
if month in ['Jan']:
year= year+1
#Following the conversion, the _quarter year specifies the calendar year in which the financial year ends
#Return the date corresponding to the first day of the month in the calendar year the financial quarter applies to
return pd.to_datetime("01-{}-{}".format(month[:3],year),format='%d-%b-%Y')
#Get a date corrsponding to the first day in the month of the first month in the quarter, by calendar year
#Use this date to align a particular quarterly period
#Q-MAR specifies a quarterly frequency, with year ending in March
dfw['Reporting','_quarter']=pd.PeriodIndex(dfw.apply(getMonthYear,axis=1), freq='Q-MAR')
#Sort the columns
dfw=dfw.sort_index(axis=1)
dfw=dfw.reindex(columns=['Reporting']+[c for c in dfw.columns.levels[0] if c!='Reporting'],level=0)
#Preview the data with the quarter time period column defined
dfw[[('Reporting','Year'),('Reporting','Quarter'),('Reporting','period'),('Reporting','_quarter')]].head()
0 | Reporting | |||
---|---|---|---|---|
1 | Year | Quarter | period | _quarter |
0 | 2004-05 | Q1: April - June | 2004-05 Q1: April - June | 2005Q1 |
1 | 2004-05 | Q2: July - Sept | 2004-05 Q2: July - Sept | 2005Q2 |
2 | 2004-05 | Q3: Oct - Dec | 2004-05 Q3: Oct - Dec | 2005Q3 |
3 | 2004-05 | Q4: Jan - Mar | 2004-05 Q4: Jan - Mar | 2005Q4 |
4 | 2005-06 | Q1: April - June | 2005-06 Q1: April - June | 2006Q1 |
NHS Digital Accident and Emergency figures are also published as a monthly time series.
url='https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2017/06/May-17-Monthly-AE-Timeseries-with-growth-charts.xlsx'
!wget -P data/ {url}
--2017-08-08 11:31:25-- https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2017/06/May-17-Monthly-AE-Timeseries-with-growth-charts.xlsx Resolving www.england.nhs.uk... 54.230.199.30, 54.230.199.113, 54.230.199.252, ... Connecting to www.england.nhs.uk|54.230.199.30|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 111779 (109K) [application/vnd.openxmlformats-officedocument.spreadsheetml.sheet] Saving to: 'data/May-17-Monthly-AE-Timeseries-with-growth-charts.xlsx' May-17-Monthly-AE-T 100%[=====================>] 109.16K --.-KB/s in 0.02s 2017-08-08 11:31:26 (4.53 MB/s) - 'data/May-17-Monthly-AE-Timeseries-with-growth-charts.xlsx' saved [111779/111779]
dfw=pd.read_excel('data/May-17-Monthly-AE-Timeseries-with-growth-charts.xlsx',skiprows=16,header=None,na_values='-').dropna(how='all').dropna(axis=1,how='all')
dfw.iloc[0,0]='Reporting'
dfw.iloc[1,0]='Code'
dfw= dfw.fillna(axis=1,method='ffill').T.set_index([0,1]).T.dropna(how='all').dropna(axis=1,how='all')
dfw.head()
0 | Reporting | A&E attendances | A&E attendances > 4 hours from arrival to admission, transfer or discharge | Emergency Admissions | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Code | Period | Type 1 Departments - Major A&E | Type 2 Departments - Single Specialty | Type 3 Departments - Other A&E/Minor Injury Unit | Total Attendances | Type 1 Departments - Major A&E | Type 2 Departments - Single Specialty | Type 3 Departments - Other A&E/Minor Injury Unit | Total Attendances > 4 hours | ... | Percentage in 4 hours or less (all) | Emergency Admissions via Type 1 A&E | Emergency Admissions via Type 2 A&E | Emergency Admissions via Type 3 and 4 A&E | Total Emergency Admissions via A&E | Other Emergency Admissions (i.e not via A&E) | Total Emergency Admissions | Number of patients spending >4 hours from decision to admit to admission | Number of patients spending >12 hours from decision to admit to admission | Operational standard (Performance) |
2 | NaN | 2010-08-01 00:00:00 | 1138652 | 54371 | 559358 | 1752381 | 32664 | 97 | 423 | 33184 | ... | 0.981063 | 287438 | 5367 | 8081 | 300886 | 124816 | 425702 | 3697 | 1 | 0.95 |
3 | NaN | 2010-09-01 00:00:00 | 1150728 | 55181 | 550359 | 1756268 | 40416 | 142 | 593 | 41151 | ... | 0.976569 | 293991 | 5543 | 3673 | 303207 | 121693 | 424900 | 5907 | 0 | 0.95 |
4 | NaN | 2010-10-01 00:00:00 | 1163143 | 54961 | 583244 | 1801348 | 46467 | 141 | 806 | 47414 | ... | 0.973679 | 303452 | 5485 | 2560 | 311497 | 124718 | 436215 | 6932 | 0 | 0.95 |
5 | NaN | 2010-11-01 00:00:00 | 1.11129e+06 | 53727.4 | 486005 | 1.65103e+06 | 45838.4 | 143.429 | 454.571 | 46436.4 | ... | 0.971874 | 297832 | 5731.14 | 3279 | 306842 | 122257 | 429099 | 7179 | 2 | 0.95 |
6 | NaN | 2010-12-01 00:00:00 | 1.1592e+06 | 45536.4 | 533001 | 1.73774e+06 | 88475.3 | 140.571 | 1301.43 | 89917.3 | ... | 0.948256 | 318602 | 6277 | 3198.43 | 328078 | 124651 | 452729 | 13818.1 | 15 | 0.95 |
5 rows × 21 columns
The pandas dt.to_period('M')
function returns a monthly period corresponding to the month within which a specified date falls.
#Cast the date corresponding to the month start as a month period starting on that date
dfw['Reporting','_period']=pd.to_datetime(dfw['Reporting','Period'],
format='%d/%m/%Y').dt.to_period('M')
dfw=dfw.sort_index(axis=1)
dfw=dfw.reindex(columns=['Reporting']+[c for c in dfw.columns.levels[0] if c!='Reporting'],level=0)
dfw[[('Reporting','Period'),('Reporting','_period')]].head()
0 | Reporting | |
---|---|---|
1 | Period | _period |
2 | 2010-08-01 00:00:00 | 2010-08 |
3 | 2010-09-01 00:00:00 | 2010-09 |
4 | 2010-10-01 00:00:00 | 2010-10 |
5 | 2010-11-01 00:00:00 | 2010-11 |
6 | 2010-12-01 00:00:00 | 2010-12 |
The NHS used to publish weekly time series for A & E figures. Archival data is still available.
url='https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2015/04/2015.06.28-AE-TimeseriesBaG87.xls'
!wget -P data/ {url}
--2017-08-08 11:40:14-- https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2015/04/2015.06.28-AE-TimeseriesBaG87.xls Resolving www.england.nhs.uk... 54.230.199.90, 54.230.199.191, 54.230.199.113, ... Connecting to www.england.nhs.uk|54.230.199.90|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 84480 (82K) [application/vnd.ms-excel] Saving to: 'data/2015.06.28-AE-TimeseriesBaG87.xls' 2015.06.28-AE-Times 100%[=====================>] 82.50K --.-KB/s in 0.03s 2017-08-08 11:40:15 (3.18 MB/s) - 'data/2015.06.28-AE-TimeseriesBaG87.xls' saved [84480/84480]
dfw=pd.read_excel('data/2015.06.28-AE-TimeseriesBaG87.xls',
skiprows=14,header=None,na_values='-').dropna(how='all').dropna(axis=1,how='all')
dfw.iloc[0,0]='Reporting'
dfw.iloc[1,0]='Code'
dfw= dfw.fillna(axis=1,method='ffill').T.set_index([0,1]).T.dropna(how='all').dropna(axis=1,how='all')
dfw.head()
0 | Reporting | A&E attendances | A&E attendances > 4 hours from arrival to admission, transfer or discharge | Emergency Admissions | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Code | Period | Type 1 Departments - Major A&E | Type 2 Departments - Single Specialty | Type 3 Departments - Other A&E/Minor Injury Unit | Total Attendances | Type 1 Departments - Major A&E | Type 2 Departments - Single Specialty | Type 3 Departments - Other A&E/Minor Injury Unit | Total Attendances > 4 hours | Percentage in 4 hours or less (type 1) | Percentage in 4 hours or less (all) | Emergency Admissions via Type 1 A&E | Emergency Admissions via Type 2 A&E | Emergency Admissions via Type 3 and 4 A&E | Total Emergency Admissions via A&E | Other Emergency Admissions (i.e not via A&E) | Total Emergency Admissions | Number of patients spending >4 hours from decision to admit to admission | Number of patients spending >12 hours from decision to admit to admission |
2 | NaN | 2015-16 Year to Date | 3702458 | 159697 | 1850023 | 5712178 | 329060 | 1278 | 6765 | 337103 | 0.911124 | 0.940985 | 994256 | 4077 | 12543 | 1010876 | 363004 | 1373880 | 71252 | 130 |
4 | Eng | W/E 07/11/2010 | 267142 | 12806 | 111547 | 391495 | 11827 | 40 | 85 | 11952 | 0.955728 | 0.969471 | 73589 | 1372 | 1272 | 76233 | 28181 | 104414 | 1847 | 0 |
5 | Eng | W/E 14/11/2010 | 256893 | 13135 | 115657 | 385685 | 10223 | 29 | 131 | 10383 | 0.960205 | 0.973079 | 67526 | 1340 | 563 | 69429 | 28558 | 97987 | 1601 | 0 |
6 | Eng | W/E 21/11/2010 | 260958 | 12751 | 115034 | 388743 | 9790 | 33 | 94 | 9917 | 0.962484 | 0.97449 | 68990 | 1343 | 649 | 70982 | 28794 | 99776 | 1430 | 0 |
7 | Eng | W/E 28/11/2010 | 256061 | 12126 | 113086 | 381273 | 10087 | 36 | 96 | 10219 | 0.960607 | 0.973198 | 68181 | 1305 | 639 | 70125 | 29095 | 99220 | 1703 | 2 |
The pandas .to_period('W')
function returns a week period, starting on a Monday, that a given date falls within.
#Filter to retain rows that have a W/E period specified
dfw=dfw[dfw[('Reporting','Period')].str.startswith('W/E')]
#Remove the W/E prefix and use the date to define
dfw['Reporting','_period']=pd.to_datetime(dfw['Reporting','Period'].str.replace('W/E ',''),
format='%d/%m/%Y').dt.to_period('W')
dfw[[('Reporting','Period'),('Reporting','_period')]].head()
0 | Reporting | |
---|---|---|
1 | Period | _period |
4 | W/E 07/11/2010 | 2010-11-01/2010-11-07 |
5 | W/E 14/11/2010 | 2010-11-08/2010-11-14 |
6 | W/E 21/11/2010 | 2010-11-15/2010-11-21 |
7 | W/E 28/11/2010 | 2010-11-22/2010-11-28 |
8 | W/E 05/12/2010 | 2010-11-29/2010-12-05 |