Pulling together spending data from Milton Keynes Council.
We need to start by pulling down the data files and getting the data into a single data table.
!mkdir mkdata
import requests, re, urllib.request
from bs4 import BeautifulSoup
def linkgrabber_MK(response):
soup=BeautifulSoup(response.content)
results=[]
#Get the list of results
resultsblock=soup.find('div',{'class':'content right'})
for result in resultsblock.findAll('a',text=re.compile('^Council Spend Data.*')):
csvurl=result.attrs['href']
fn=result.attrs['href'].split('/')[-1]
results.append((csvurl,fn))
return results
def harvester(links,path='.'):
for csvurl,fn in links:
print('Grabbing {0} as {1}'.format(csvurl, fn))
urllib.request.urlretrieve(csvurl.strip('/'), '/'.join([path,fn]))
url="http://www.milton-keynes.gov.uk/your-council-and-elections/council-information-and-accounts/data-performance-and-spending/Milton-Keynes-Council-spend-archive"
html2= requests.get(url)
cc=linkgrabber_MK(html2)
harvester(cc,'mkdata')
Grabbing http://www.milton-keynes.gov.uk/assets/attach/18898/03-2014.csv as 03-2014.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/18178/02-2014.csv as 02-2014.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/17750/01-2014.csv as 01-2014.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/17149/12-2013.csv as 12-2013.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/16719/11-2013.csv as 11-2013.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/16333/10-2013.csv as 10-2013.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/15845/09-2013.csv as 09-2013.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/15393/Spend%20data%2008-2013.csv as Spend%20data%2008-2013.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/15236/07-2013.csv as 07-2013.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/15038/06-2013.csv as 06-2013.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/14869/05-2013(1).csv as 05-2013(1).csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/14535/04-2013.csv as 04-2013.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/14287/03-2013.csv as 03-2013.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/14069/02-2013.csv as 02-2013.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/13688/01_2013.csv as 01_2013.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/13291/12-2012.csv as 12-2012.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/13011/11-2012.csv as 11-2012.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/12815/Council_Spend_Data_October_2012.csv as Council_Spend_Data_October_2012.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/12569/Council_Spend_September_2012.csv as Council_Spend_September_2012.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/12410/August_Spend_2012.csv as August_Spend_2012.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/12085/Council_Spend_Data_July_2012.csv as Council_Spend_Data_July_2012.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/11674/Council_Spend_Data_June_2012.csv as Council_Spend_Data_June_2012.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/11384/05-2012.csv as 05-2012.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/11055/04-2012.csv as 04-2012.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/10857/10-2011_to_03-2012.csv as 10-2011_to_03-2012.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/10856/01-2011_to_09-2011.csv as 01-2011_to_09-2011.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/14534/03-2011(1).csv as 03-2011(1).csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/14533/02-2011(1).csv as 02-2011(1).csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/14532/01-2011(2).csv as 01-2011(2).csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/7023/12-2010.csv as 12-2010.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/7022/11-2010.csv as 11-2010.csv Grabbing http://www.milton-keynes.gov.uk/assets/attach/7021/10-2010.csv as 10-2010.csv
def linkgrabber_MK2(response):
soup=BeautifulSoup(response.content)
results=[]
#Get the list of results
resultsblock=soup.find('ul',{'class':'documents'})
for result in resultsblock.findAll('a',href=re.compile('.*\.csv$')):
csvurl=result.attrs['href']
csvurl='http://www.milton-keynes.gov.uk/'+csvurl
fn=result.attrs['href'].split('/')[-1]
results.append((csvurl,fn))
return results
url='http://www.milton-keynes.gov.uk/your-council-and-elections/council-information-and-accounts/data-performance-and-spending/milton-keynes-council-spend'
html2= requests.get(url)
cc=linkgrabber_MK2(html2)
harvester(cc,'mkdata')
Grabbing http://www.milton-keynes.gov.uk//assets/attach/22476/11-2014.csv as 11-2014.csv Grabbing http://www.milton-keynes.gov.uk//assets/attach/22178/10-2014.csv as 10-2014.csv Grabbing http://www.milton-keynes.gov.uk//assets/attach/21862/09-2014.csv as 09-2014.csv Grabbing http://www.milton-keynes.gov.uk//assets/attach/21228/08-2014.csv as 08-2014.csv Grabbing http://www.milton-keynes.gov.uk//assets/attach/20768/07-2014.csv as 07-2014.csv Grabbing http://www.milton-keynes.gov.uk//assets/attach/20306/06-2014.csv as 06-2014.csv Grabbing http://www.milton-keynes.gov.uk//assets/attach/19870/05-2014.csv as 05-2014.csv Grabbing http://www.milton-keynes.gov.uk//assets/attach/19433/04-2014.csv as 04-2014.csv
!head mkdata/04-2014.csv
#derived from /usr/local/bin/chardetect.py
import os
from chardet.universaldetector import UniversalDetector
def sniffer(fname):
file=open(fname, 'rb')
u = UniversalDetector()
for line in file:
u.feed(line)
u.close()
result = u.result
if result['encoding']:
return (result['encoding'], result['confidence'])
for fname in os.listdir('mkdata'):
if fname.endswith('csv'):
print(fname,sniffer('mkdata/'+fname))
01-2011(2).csv ('ascii', 1.0) 01-2011_to_09-2011.csv ('ascii', 1.0) 01-2014.csv ('ascii', 1.0) 01_2013.csv ('ascii', 1.0) 02-2011(1).csv ('ascii', 1.0) 02-2013.csv ('ascii', 1.0) 02-2014.csv ('ascii', 1.0) 03-2011(1).csv ('ascii', 1.0) 03-2013.csv ('ascii', 1.0) 03-2014.csv ('ascii', 1.0) 04-2012.csv ('ascii', 1.0) 04-2013.csv ('ascii', 1.0) 04-2014.csv ('ascii', 1.0) 05-2012.csv ('ascii', 1.0) 05-2013(1).csv ('ascii', 1.0) 05-2014.csv ('ascii', 1.0) 06-2013.csv ('ascii', 1.0) 06-2014.csv ('ascii', 1.0) 07-2013.csv ('ascii', 1.0) 07-2014.csv ('ascii', 1.0) 08-2014.csv ('ascii', 1.0) 09-2013.csv ('ascii', 1.0) 09-2014.csv ('ISO-8859-2', 0.7888398211101958) 10-2010.csv ('ascii', 1.0) 10-2011_to_03-2012.csv ('ascii', 1.0) 10-2013.csv ('ascii', 1.0) 10-2014.csv ('ascii', 1.0) 11-2010.csv ('ascii', 1.0) 11-2012.csv ('ascii', 1.0) 11-2013.csv ('ascii', 1.0) 11-2014.csv ('ascii', 1.0) 12-2010.csv ('ascii', 1.0) 12-2012.csv ('ascii', 1.0) 12-2013.csv ('ascii', 1.0) August_Spend_2012.csv ('ascii', 1.0) Council_Spend_Data_July_2012.csv ('ascii', 1.0) Council_Spend_Data_June_2012.csv ('ascii', 1.0) Council_Spend_Data_October_2012.csv ('ascii', 1.0) Council_Spend_September_2012.csv ('ascii', 1.0) Spend%20data%2008-2013.csv ('ascii', 1.0)
import pandas as pd
dd=pd.DataFrame()
for fname in os.listdir('mkdata/'):
enc,cert=sniffer('mkdata/'+fname)
dd=pd.concat([dd,pd.read_csv('mkdata/'+fname,encoding=enc)])
dd.to_csv('mkSpendingData.csv',index=False)
dd.dtypes
Body name object Service area categorisation object Service division categorisation object Expenses type object Date object Transaction number int64 Amount float64 Supplier name object dtype: object
from dateutil.parser import *
dd['Date']=dd['Date'].apply(parse)
Quick preview of what the datatable looks like:
dd[:5]
Body name | Service area categorisation | Service division categorisation | Expenses type | Date | Transaction number | Amount | Supplier name | Cumul Amount | Financial Year | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Milton Keynes Council | Environment | Infrastructure | Supplies and services | 2010-01-10 | 5100473631 | 8432.64 | A1 Security Home Counties (Uk) Ltd | 8432.64 | 2009 |
1 | Milton Keynes Council | CHILDREN AND YOUNG PEOPLE'S SERVICE | Early Years and Extended Services | Premises-related expenditure | 2010-01-10 | 5100470016 | 75.00 | Acorn (M K) Nurseries | 8507.64 | 2009 |
2 | Milton Keynes Council | CHILDREN AND YOUNG PEOPLE'S SERVICE | Early Years and Extended Services | Premises-related expenditure | 2010-01-10 | 5100470014 | 350.00 | Acorn (M K) Nurseries | 8857.64 | 2009 |
3 | Milton Keynes Council | CHILDREN AND YOUNG PEOPLE'S SERVICE | Early Years and Extended Services | Premises-related expenditure | 2010-01-10 | 5100470017 | 75.00 | Acorn (M K) Nurseries | 8932.64 | 2009 |
4 | Milton Keynes Council | Community Wellbeing-GF | Head of Adult Social Care | Employees | 2010-01-10 | 5100473816 | 600.00 | ACS Care Services Limited | 9532.64 | 2009 |
Start by looking at accumulated totals.
from ggplot import *
dd.sort('Date',inplace=True)
dd['Cumul Amount'] = dd['Amount'].cumsum()
#Financial year models
#http://stackoverflow.com/a/26342880/454773
def getFiscalYear(dt):
year = dt.year
if dt.month<4: year -= 1
return year
dd['Financial Year']= dd['Date'].apply(getFiscalYear)
dd['Cumul Amount Tax Year'] = dd.groupby('Financial Year')['Amount'].cumsum()
#I don't know (yet?!) how to cast an April-March tax year x-axis that would allow us to compare
# accumulated totals using differently coloured lines in the same chart.
# Also not that the incomplete 2014 tax year has an x-axis of fewer days than the other years...
ggplot(dd,aes(x='Date',y='Cumul Amount Tax Year'))+geom_line()+facet_wrap('Financial Year',scales="free_x")
<ggplot: (-9223363294681154030)>
pd.options.display.float_format = '{:20,.2f}'.format
Let's look at totals by Service Area Categorisation:
servSpend=pd.pivot_table(dd, values='Amount', index=['Service area categorisation'], columns=['Financial Year'], aggfunc=np.sum)
servSpend
Financial Year | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
---|---|---|---|---|---|---|
Service area categorisation | ||||||
Adult Social Care & Health Group | nan | 2,507,671.42 | 66,917,432.11 | 50,622,120.65 | 76,771,472.60 | 40,815,721.67 |
BBS | nan | nan | nan | nan | nan | 203,630.77 |
CHILDREN AND YOUNG PEOPLE'S SERVICE | 1,481,308.65 | 19,530,427.28 | 2,868,301.07 | nan | nan | nan |
Capital PCs not allocated | nan | 612,216.00 | 1,191,209.76 | 880,015.12 | 822,128.10 | 539,744.52 |
Chief Executives | 1.48 | 317,793.94 | 60,454.79 | nan | nan | nan |
Children & Families: Education, Effectiv | nan | 1,667,997.99 | 31,636,348.07 | 29,491,161.55 | 41,141,363.03 | 23,808,969.52 |
Children & Families: Integrated Support | nan | 1,196,990.11 | 22,621,653.85 | 22,436,458.02 | 30,706,208.94 | 18,541,475.99 |
Community Facilities Unit | nan | 155,514.68 | 6,706,288.32 | 11,342,114.06 | 437,578.74 | nan |
Community Wellbeing-GF | 2,249,593.30 | 23,665,867.38 | 3,061,094.89 | nan | nan | nan |
Corporate Codes | nan | 32,981.50 | 2,997,220.00 | 53,883,296.53 | 26,704,333.75 | 14,228,367.00 |
Corporate Core | nan | 8,222.61 | 291,620.39 | 286,825.18 | 647,557.65 | 426,898.86 |
Debt Financing | nan | 11,998.00 | 735,093.15 | 722,527.40 | 1,074,312.42 | 709,355.60 |
Environment | 895,065.20 | 24,735,756.19 | 5,320,645.98 | nan | nan | nan |
Finance & Risk | 3,354,756.67 | 34,734,949.25 | 6,422,187.71 | nan | nan | nan |
HOUSING REVENUE ACCOUNT | 163,061.58 | 16,796,584.03 | 1,852,262.94 | nan | nan | nan |
Highways & Transportation Group | nan | 1,470,833.03 | 27,779,408.21 | 23,530,954.98 | 49,991,082.73 | 27,460,049.83 |
Housing & Community Group | nan | 166,380.22 | 12,759,630.05 | 22,477,055.85 | 19,272,726.22 | 10,132,276.11 |
Housing Revenue Account | nan | 41,097.36 | 26,222,740.82 | 11,102,537.85 | 15,595,743.21 | 7,915,323.07 |
Neighbourhood Services Group | nan | 491,764.86 | 30,507,386.58 | 27,860,325.86 | 632,060.74 | nan |
Organisational Transformation Programme | nan | 39,026.38 | 39,954.74 | 3,821.84 | nan | nan |
Planning & Strategic Transport | nan | nan | nan | nan | 232,366.84 | 4,293,981.23 |
Planning & Transport | nan | nan | nan | nan | 505,401.39 | 4,118,557.54 |
Planning, Economy & Development Group | nan | 19,827.39 | 2,382,212.36 | 2,875,179.69 | 18,770,095.17 | 5,299,982.97 |
Profit Centres No Longer in use | nan | nan | nan | 256,297.93 | 293,838.52 | nan |
Public Health | nan | nan | nan | 197,454.41 | 6,479,471.47 | 4,864,291.54 |
Regulatory Unit | nan | 33,756.20 | 2,978,481.19 | 224,546.58 | nan | nan |
Resources | nan | 6,165,323.55 | 1,824,799.93 | nan | nan | nan |
Resources: Public Access Group | nan | 361,435.29 | 6,098,724.27 | 8,757,760.53 | 8,173,702.20 | 3,941,619.20 |
Resources:Finance,HR & Governance Group | nan | 12,065,805.46 | 111,422,782.02 | 47,384,457.88 | 46,524,010.08 | 28,219,036.37 |
Strategy & Partnerships | 236,446.90 | 2,194,903.19 | 300,030.00 | nan | nan | nan |
Tariff Projects | nan | nan | nan | nan | 47,775.00 | 912,063.14 |
Do we trust that the nan values are based on no transactions recorded in those areas? Check by listing the unique tax years associated with spending items in each category:
dd.groupby(['Service area categorisation'])['Financial Year'].unique()
Service area categorisation Adult Social Care & Health Group [2010, 2011, 2012, 2013, 2014] BBS [2014] CHILDREN AND YOUNG PEOPLE'S SERVICE [2009, 2010, 2011] Capital PCs not allocated [2010, 2011, 2012, 2013, 2014] Chief Executives [2009, 2010, 2011] Children & Families: Education, Effectiv [2010, 2011, 2012, 2013, 2014] Children & Families: Integrated Support [2010, 2011, 2012, 2013, 2014] Community Facilities Unit [2010, 2011, 2012, 2013] Community Wellbeing-GF [2009, 2010, 2011] Corporate Codes [2010, 2011, 2012, 2013, 2014] Corporate Core [2010, 2011, 2012, 2013, 2014] Debt Financing [2010, 2011, 2012, 2013, 2014] Environment [2009, 2010, 2011] Finance & Risk [2009, 2010, 2011] HOUSING REVENUE ACCOUNT [2009, 2010, 2011] Highways & Transportation Group [2010, 2011, 2012, 2013, 2014] Housing & Community Group [2010, 2011, 2012, 2013, 2014] Housing Revenue Account [2010, 2011, 2012, 2013, 2014] Neighbourhood Services Group [2010, 2011, 2012, 2013] Organisational Transformation Programme [2010, 2011, 2012] Planning & Strategic Transport [2013, 2014] Planning & Transport [2013, 2014] Planning, Economy & Development Group [2010, 2011, 2012, 2013, 2014] Profit Centres No Longer in use [2012, 2013] Public Health [2012, 2013, 2014] Regulatory Unit [2010, 2011, 2012] Resources [2010, 2011] Resources: Public Access Group [2010, 2011, 2012, 2013, 2014] Resources:Finance,HR & Governance Group [2010, 2011, 2012, 2013, 2014] Strategy & Partnerships [2009, 2010, 2011] Tariff Projects [2013, 2014] Name: Financial Year, dtype: object