%%HTML
<style>
/* style for presentation only */
.reveal .rendered_html table { font-size: 24px }
/* centre images */
.reveal .rendered_html img {
display: table-cell;
text-align: center;
vertical-align: middle;
horizontal-align: middle;
margin-left: auto;
margin-right: auto;
}
</style>
from IPython.display import Image
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
from matplotlib import pyplot as plt
plt.rcParams.update({'figure.max_open_warning': 0})
graph_figsize = (10,6) # I'm forgetful and lazy
plt.rcParams.update({'figure.figsize': graph_figsize})
%matplotlib nbagg
# Jupyter Magics!
import geopandas as gp
import urllib.request
import zipfile
from pathlib import Path
import pandas as pd
import plotly.plotly as py
from plotly.offline import init_notebook_mode, plot, iplot
import plotly.graph_objs as go
from datetime import datetime
init_notebook_mode(connected=False)
import cufflinks
Andrew Bolster
DISCLAIMERS
*Hint: If you really want to learn data science, go to/get involved in some of these*
*Hint the Second: These are the best ways to get a job in the field*
Open data and content can be freely used, modified, and shared by anyone for any purpose - The Open Definition
Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. - Wikipedia
Open data is data that’s available to everyone to access, use and share. Yep, even your nan.- The Open Data Institute
Open data is only useful if it’s shared in ways that people can actually understand. It needs to be shared in a standardised format and easily traced back to where it came from.
When people talk about ‘big data’ they mean a lot of data. Obviously. But ‘big data’, like government statistics on health care, is often closed to the public (or expensive to access, like the terrifyingly-named TWITTER FIREHOSE). Opening up big data lets people use it to spot trends, fill gaps and improve services.
If you’re worried about big companies being fed all your private details, that’s got nothing to do with open data. Groups sharing information with each other is different from opening it up for all to access. Your private data should only be open if you choose to share it. (But if you want to know who’s accessing or sharing your data, open data can help.)
If citizens know about their governments they can hold leaders to account, make more informed decisions and demand better services. Open data can also help governments stay on their toes and make better policies for society, the economy and the environment.
It’s helped people in London to track stolen bikes and police in Vancouver to stay one step ahead of criminals.
With tools like FoodTradeMenu using it to help restaurants make sure they don’t serve you food you’re allergic to without realising.
It helps groups to coordinate aid delivery in humanitarian disasters.
Apps like CityMapper use open data from groups like Transport for London to help you find the quickest and cheapest way to get from A to B. Even maps can be open, like OpenStreetMap, which powers map data for websites and humanitarian crises relief around the world.
‘Data journalism’ only differs from ‘words journalism’ in that we use a different kit. We all sniff out, report, and relate stories for a living. It’s like ‘photo journalism’; just swap the camera for a laptop. - Brian Boyer, Chicago Tribune
[...] providing information and analysis to help inform us all about important issues of the day. - Aron Pilhofer, New York Times
Using Open Data and Data Science principles to ask, analyse and answer complex or contentious questions or areas of inquiry using available evidence. - Bolster, Here
This is usually the wrong question;
I think we can get most of these... but first things first
Gov Open Data is notoriously difficult to 'wander around'
When in doubt, bring in pandas
read_csv
¶import pandas as pd
url_from_odni = "https://www.opendatani.gov.uk/dataset/6058be29-b2e1-4253-bab8-8a018568560a/resource/b4fcfed1-2dc1-4f61-8968-573535522d53/download/school-level-post-primary-reference-data.csv"
pd.read_csv(url_from_odni)
De ref | school name | address 1 | town | postcode | school type | management type | constituency | council | rural | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1210014 | Ashfield Girls' High School | 397 HOLYWOOD ROAD | BELFAST | BT42LY | Secondary | Controlled | BELFAST EAST | BELFAST | Urban |
1 | 1210015 | Ashfield Boys' High School | 395 HOLYWOOD ROAD | BELFAST | BT42LY | Secondary | Controlled | BELFAST EAST | BELFAST | Urban |
2 | 1210021 | Belfast Model School For Girls | 35 DUNOWEN GARDENS | BELFAST | BT146NQ | Secondary | Controlled | BELFAST NORTH | BELFAST | Urban |
3 | 1210022 | Belfast Boys' Model School | BALLYSILLAN ROAD | BELFAST | BT146RB | Secondary | Controlled | BELFAST NORTH | BELFAST | Urban |
4 | 1230026 | St Patrick's College, Belfast | 619-629 ANTRIM ROAD | BELFAST | BT154DZ | Secondary | Catholic Maintained | BELFAST NORTH | BELFAST | Urban |
5 | 1230053 | St Louise's Comprehensive College | 468 FALLS ROAD | BELFAST | BT126EN | Secondary | Catholic Maintained | BELFAST WEST | BELFAST | Urban |
6 | 1230089 | Little Flower Girls' School | 71A SOMERTON ROAD | BELFAST | BT154DE | Secondary | Catholic Maintained | BELFAST NORTH | BELFAST | Urban |
7 | 1230104 | Mercy College | BALLYSILLAN ROAD | BELFAST | BT147QR | Secondary | Catholic Maintained | BELFAST NORTH | BELFAST | Urban |
8 | 1230130 | St Rose?s Dominican College | 65 BEECHMOUNT AVENUE | BELFAST | BT127NA | Secondary | Catholic Maintained | BELFAST WEST | BELFAST | Urban |
9 | 1230146 | Christian Brothers School, Belfast | GLEN ROAD | BELFAST | BT118BW | Secondary | Catholic Maintained | BELFAST WEST | BELFAST | Urban |
10 | 1230155 | St Genevieve's High School | TRENCH HOUSE | BELFAST | BT119JP | Secondary | Catholic Maintained | BELFAST WEST | BELFAST | Urban |
11 | 1230182 | De La Salle College | 36 EDENMORE DRIVE | BELFAST | BT118LT | Secondary | Catholic Maintained | BELFAST WEST | BELFAST | Urban |
12 | 1230262 | Corpus Christi College | ARD NA VA ROAD | BELFAST | BT126FF | Secondary | Catholic Maintained | BELFAST WEST | BELFAST | Urban |
13 | 1230275 | St Joseph's College, Belfast | 518-572 RAVENHILL ROAD | BELFAST | BT60BY | Secondary | Catholic Maintained | BELFAST SOUTH | BELFAST | Urban |
14 | 1240291 | Colaiste Feirste | 7 BEECHVIEW PARK | BELFAST | BT127PY | Secondary | Other Maintained | BELFAST WEST | BELFAST | Urban |
15 | 1260269 | Hazelwood College | 70 WHITEWELL ROAD | NEWTOWNABBEY | BT367ES | Secondary | GMI | BELFAST NORTH | BELFAST | Urban |
16 | 1260294 | Malone Integrated College | 45 FINAGHY ROAD NORTH | BELFAST | BT100JB | Secondary | GMI | BELFAST SOUTH | BELFAST | Urban |
17 | 1410079 | Grosvenor Grammar School | 50 MARINA PARK | BELFAST | BT56BA | Grammar | Controlled | BELFAST EAST | BELFAST | Urban |
18 | 1410270 | Wellington College | 18 CAROLAN ROAD | BELFAST | BT73HE | Grammar | Controlled | BELFAST SOUTH | BELFAST | Urban |
19 | 1410315 | Bloomfield Collegiate | 8 ASTORIA GARDENS | BELFAST | BT56HW | Grammar | Controlled | BELFAST EAST | BELFAST | Urban |
20 | 1420020 | Campbell College | BELMONT ROAD | BELFAST | BT42ND | Grammar | Voluntary | BELFAST EAST | BELFAST | Urban |
21 | 1420021 | St Mary's Christian Brothers' Grammar, Belfast | 147A GLEN ROAD | BELFAST | BT118NR | Grammar | Voluntary | BELFAST WEST | BELFAST | Urban |
22 | 1420022 | Methodist College | 1 MALONE ROAD | BELFAST | BT96BY | Grammar | Voluntary | BELFAST SOUTH | BELFAST | Urban |
23 | 1420027 | The Royal Belfast Academical Institution | COLLEGE SQUARE EAST | BELFAST | BT16DL | Grammar | Voluntary | BELFAST SOUTH | BELFAST | Urban |
24 | 1420028 | Belfast Royal Academy | 5-17 CLIFTONVILLE ROAD | BELFAST | BT146JL | Grammar | Voluntary | BELFAST NORTH | BELFAST | Urban |
25 | 1420029 | St Dominic's High School, Belfast | 135-137 FALLS ROAD | BELFAST | BT126AE | Grammar | Voluntary | BELFAST WEST | BELFAST | Urban |
26 | 1420030 | St Malachy's College, Belfast | 36 ANTRIM ROAD | BELFAST | BT152AE | Grammar | Voluntary | BELFAST NORTH | BELFAST | Urban |
27 | 1420082 | Dominican College, Belfast | 38 FORTWILLIAM PARK | BELFAST | BT154AQ | Grammar | Voluntary | BELFAST NORTH | BELFAST | Urban |
28 | 1420089 | Strathearn School, Belfast | 188 BELMONT ROAD | BELFAST | BT42AU | Grammar | Voluntary | BELFAST EAST | BELFAST | Urban |
29 | 1420095 | Rathmore Grammar School | KINGSWAY | BELFAST | BT100LF | Grammar | Voluntary | BELFAST SOUTH | BELFAST | Urban |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
171 | 5230076 | St Patrick's College, Banbridge | 38 SCARVA ROAD | BANBRIDGE | BT323AS | Secondary | Catholic Maintained | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Urban |
172 | 5230108 | St Mary's High School, Newry | UPPER CHAPEL STREET | NEWRY | BT342DT | Secondary | Catholic Maintained | NEWRY AND ARMAGH | NEWRY MOURNE AND DOWN | Urban |
173 | 5230135 | St Mark's High School | UPPER DROMORE ROAD | NEWRY | BT343PN | Secondary | Catholic Maintained | SOUTH DOWN | NEWRY MOURNE AND DOWN | Urban |
174 | 5230152 | St Ciaran's High School | 15 TULLYBRYAN ROAD | DUNGANNON | BT702LY | Secondary | Catholic Maintained | FERMANAGH AND SOUTH TYRONE | MID ULSTER | Rural |
175 | 5230157 | St Paul's High School, Bessbrook | 108 CAMLOUGH ROAD | NEWRY | BT357EE | Secondary | Catholic Maintained | NEWRY AND ARMAGH | NEWRY MOURNE AND DOWN | Rural |
176 | 5230167 | St Joseph's High School, Crossmaglen | 77 DUNDALK ROAD | NEWRY | BT359HL | Secondary | Catholic Maintained | NEWRY AND ARMAGH | NEWRY MOURNE AND DOWN | Rural |
177 | 5230187 | St Patrick's High School, Keady | MIDDLETOWN ROAD | KEADY | BT603TH | Secondary | Catholic Maintained | NEWRY AND ARMAGH | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Rural |
178 | 5230192 | St Joseph's College, Coalisland | 29 SCHOOL LANE | DUNGANNON | BT714NW | Secondary | Catholic Maintained | MID ULSTER | MID ULSTER | Urban |
179 | 5230213 | Lismore Comprehensive School | DRUMGASK | CRAIGAVON | BT655DU | Secondary | Catholic Maintained | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Urban |
180 | 5230218 | St Catherine's College | 2A CONVENT ROAD | ARMAGH | BT604BG | Secondary | Catholic Maintained | NEWRY AND ARMAGH | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Urban |
181 | 5230256 | Drumcree College | 4 MOY ROAD | PORTADOWN | BT621QL | Secondary | Catholic Maintained | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Urban |
182 | 5230278 | Holy Trinity College | CHAPEL STREET | COOKSTOWN | BT808QB | Secondary | Catholic Maintained | MID ULSTER | MID ULSTER | Urban |
183 | 5230293 | St Patrick's College, Dungannon | 41 KILLYMEAL ROAD | DUNGANNON | BT716LJ | Secondary | Catholic Maintained | FERMANAGH AND SOUTH TYRONE | MID ULSTER | Urban |
184 | 5250216 | Brownlow Int College | TULLYGALLY ROAD | CRAIGAVON | BT655BS | Secondary | Controlled Integrated | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Urban |
185 | 5260285 | New-Bridge Integrated College | 25 DONARD VIEW ROAD | BANBRIDGE | BT323LN | Secondary | GMI | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Rural |
186 | 5260286 | Integrated College Dungannon | 21 GORTMERRON LINK ROAD | DUNGANNON | BT716LS | Secondary | GMI | FERMANAGH AND SOUTH TYRONE | MID ULSTER | Urban |
187 | 5410013 | Banbridge Academy | LURGAN ROAD | BANBRIDGE | BT324AQ | Grammar | Controlled | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Urban |
188 | 5410057 | Lurgan College | 9 COLLEGE WALK | CRAIGAVON | BT666JW | Grammar | Controlled | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Urban |
189 | 5410067 | Portadown College | 4 KILLYCOMAINE ROAD | CRAIGAVON | BT635BU | Grammar | Controlled | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Urban |
190 | 5420045 | St Louis Grammar School, Kilkeel | 151 NEWRY ROAD | KILKEEL | BT344EU | Grammar | Voluntary | SOUTH DOWN | NEWRY MOURNE AND DOWN | Urban |
191 | 5420059 | Abbey Christian Brothers Grammar School | 77A ASHGROVE ROAD | NEWRY | BT342QN | Grammar | Voluntary | SOUTH DOWN | NEWRY MOURNE AND DOWN | Urban |
192 | 5420060 | Our Lady's Grammar School | CHEQUER HILL | NEWRY | BT356DY | Grammar | Voluntary | NEWRY AND ARMAGH | NEWRY MOURNE AND DOWN | Urban |
193 | 5420062 | St Colman's College, Newry | 46 ARMAGH ROAD | NEWRY | BT356PP | Grammar | Voluntary | NEWRY AND ARMAGH | NEWRY MOURNE AND DOWN | Urban |
194 | 5420073 | St Joseph's Convent Grammar School, Donaghmore | 58 CASTLECAULFIELD ROAD | DUNGANNON | BT703HE | Grammar | Voluntary | MID ULSTER | MID ULSTER | Rural |
195 | 5420076 | Sacred Heart Grammar School, Newry | 10 ASHGROVE AVENUE | NEWRY | BT341PR | Grammar | Voluntary | SOUTH DOWN | NEWRY MOURNE AND DOWN | Urban |
196 | 5420260 | The Royal School, Dungannon | 2 RANFURLEY ROAD | DUNGANNON | BT716AP | Grammar | Voluntary | FERMANAGH AND SOUTH TYRONE | MID ULSTER | Urban |
197 | 5420263 | The Royal School, Armagh | COLLEGE HILL | ARMAGH | BT619DH | Grammar | Voluntary | NEWRY AND ARMAGH | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Urban |
198 | 5420268 | St Patrick's Grammar School, Armagh | CATHEDRAL ROAD | ARMAGH | BT617QZ | Grammar | Voluntary | NEWRY AND ARMAGH | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Urban |
199 | 5420304 | St Patrick's Academy, Dungannon | 37 KILLYMEAL ROAD | DUNGANNON | BT716DS | Grammar | Voluntary | FERMANAGH AND SOUTH TYRONE | MID ULSTER | Urban |
200 | 5420314 | St Ronan?s College | 12 CORNAKINEGAR ROAD | LURGAN | BT679JW | Grammar | Voluntary | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Urban |
201 rows × 10 columns
from ckanapi import RemoteCKAN # it's on pip
def build_odni_connection():
version_no = (pd.to_datetime('now') -
pd.to_datetime('1988/05/17')).days/365
ua = f'@Bolster/{version_no:.2f} (+http://bolster.online/)'
return RemoteCKAN('https://www.opendatani.gov.uk/', user_agent=ua)
def dataset_generator(resource_id):
"""A Generator that yields records from a given dataset resource id"""
demo = build_odni_connection()
offset = 0
while True:
datastore_page = demo.action.datastore_search(
resource_id=resource_id, offset=offset)
if not datastore_page['records']:
raise StopIteration
for record in datastore_page['records']:
yield record # Execution is passed back to the caller here
offset += 1
resource= "b4fcfed1-2dc1-4f61-8968-573535522d53"
pd.DataFrame.from_records(dataset_generator(resource))
De ref | _id | address 1 | constituency | council | management type | postcode | rural | school name | school type | town | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1210014 | 1 | 397 HOLYWOOD ROAD | BELFAST EAST | BELFAST | Controlled | BT42LY | Urban | Ashfield Girls' High School | Secondary | BELFAST |
1 | 1210015 | 2 | 395 HOLYWOOD ROAD | BELFAST EAST | BELFAST | Controlled | BT42LY | Urban | Ashfield Boys' High School | Secondary | BELFAST |
2 | 1210021 | 3 | 35 DUNOWEN GARDENS | BELFAST NORTH | BELFAST | Controlled | BT146NQ | Urban | Belfast Model School For Girls | Secondary | BELFAST |
3 | 1210022 | 4 | BALLYSILLAN ROAD | BELFAST NORTH | BELFAST | Controlled | BT146RB | Urban | Belfast Boys' Model School | Secondary | BELFAST |
4 | 1230026 | 5 | 619-629 ANTRIM ROAD | BELFAST NORTH | BELFAST | Catholic Maintained | BT154DZ | Urban | St Patrick's College, Belfast | Secondary | BELFAST |
5 | 1230053 | 6 | 468 FALLS ROAD | BELFAST WEST | BELFAST | Catholic Maintained | BT126EN | Urban | St Louise's Comprehensive College | Secondary | BELFAST |
6 | 1230089 | 7 | 71A SOMERTON ROAD | BELFAST NORTH | BELFAST | Catholic Maintained | BT154DE | Urban | Little Flower Girls' School | Secondary | BELFAST |
7 | 1230104 | 8 | BALLYSILLAN ROAD | BELFAST NORTH | BELFAST | Catholic Maintained | BT147QR | Urban | Mercy College | Secondary | BELFAST |
8 | 1230130 | 9 | 65 BEECHMOUNT AVENUE | BELFAST WEST | BELFAST | Catholic Maintained | BT127NA | Urban | St Rose?s Dominican College | Secondary | BELFAST |
9 | 1230146 | 10 | GLEN ROAD | BELFAST WEST | BELFAST | Catholic Maintained | BT118BW | Urban | Christian Brothers School, Belfast | Secondary | BELFAST |
10 | 1230155 | 11 | TRENCH HOUSE | BELFAST WEST | BELFAST | Catholic Maintained | BT119JP | Urban | St Genevieve's High School | Secondary | BELFAST |
11 | 1230182 | 12 | 36 EDENMORE DRIVE | BELFAST WEST | BELFAST | Catholic Maintained | BT118LT | Urban | De La Salle College | Secondary | BELFAST |
12 | 1230262 | 13 | ARD NA VA ROAD | BELFAST WEST | BELFAST | Catholic Maintained | BT126FF | Urban | Corpus Christi College | Secondary | BELFAST |
13 | 1230275 | 14 | 518-572 RAVENHILL ROAD | BELFAST SOUTH | BELFAST | Catholic Maintained | BT60BY | Urban | St Joseph's College, Belfast | Secondary | BELFAST |
14 | 1240291 | 15 | 7 BEECHVIEW PARK | BELFAST WEST | BELFAST | Other Maintained | BT127PY | Urban | Colaiste Feirste | Secondary | BELFAST |
15 | 1260269 | 16 | 70 WHITEWELL ROAD | BELFAST NORTH | BELFAST | GMI | BT367ES | Urban | Hazelwood College | Secondary | NEWTOWNABBEY |
16 | 1260294 | 17 | 45 FINAGHY ROAD NORTH | BELFAST SOUTH | BELFAST | GMI | BT100JB | Urban | Malone Integrated College | Secondary | BELFAST |
17 | 1410079 | 18 | 50 MARINA PARK | BELFAST EAST | BELFAST | Controlled | BT56BA | Urban | Grosvenor Grammar School | Grammar | BELFAST |
18 | 1410270 | 19 | 18 CAROLAN ROAD | BELFAST SOUTH | BELFAST | Controlled | BT73HE | Urban | Wellington College | Grammar | BELFAST |
19 | 1410315 | 20 | 8 ASTORIA GARDENS | BELFAST EAST | BELFAST | Controlled | BT56HW | Urban | Bloomfield Collegiate | Grammar | BELFAST |
20 | 1420020 | 21 | BELMONT ROAD | BELFAST EAST | BELFAST | Voluntary | BT42ND | Urban | Campbell College | Grammar | BELFAST |
21 | 1420021 | 22 | 147A GLEN ROAD | BELFAST WEST | BELFAST | Voluntary | BT118NR | Urban | St Mary's Christian Brothers' Grammar, Belfast | Grammar | BELFAST |
22 | 1420022 | 23 | 1 MALONE ROAD | BELFAST SOUTH | BELFAST | Voluntary | BT96BY | Urban | Methodist College | Grammar | BELFAST |
23 | 1420027 | 24 | COLLEGE SQUARE EAST | BELFAST SOUTH | BELFAST | Voluntary | BT16DL | Urban | The Royal Belfast Academical Institution | Grammar | BELFAST |
24 | 1420028 | 25 | 5-17 CLIFTONVILLE ROAD | BELFAST NORTH | BELFAST | Voluntary | BT146JL | Urban | Belfast Royal Academy | Grammar | BELFAST |
25 | 1420029 | 26 | 135-137 FALLS ROAD | BELFAST WEST | BELFAST | Voluntary | BT126AE | Urban | St Dominic's High School, Belfast | Grammar | BELFAST |
26 | 1420030 | 27 | 36 ANTRIM ROAD | BELFAST NORTH | BELFAST | Voluntary | BT152AE | Urban | St Malachy's College, Belfast | Grammar | BELFAST |
27 | 1420082 | 28 | 38 FORTWILLIAM PARK | BELFAST NORTH | BELFAST | Voluntary | BT154AQ | Urban | Dominican College, Belfast | Grammar | BELFAST |
28 | 1420089 | 29 | 188 BELMONT ROAD | BELFAST EAST | BELFAST | Voluntary | BT42AU | Urban | Strathearn School, Belfast | Grammar | BELFAST |
29 | 1420095 | 30 | KINGSWAY | BELFAST SOUTH | BELFAST | Voluntary | BT100LF | Urban | Rathmore Grammar School | Grammar | BELFAST |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
171 | 5230076 | 172 | 38 SCARVA ROAD | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Catholic Maintained | BT323AS | Urban | St Patrick's College, Banbridge | Secondary | BANBRIDGE |
172 | 5230108 | 173 | UPPER CHAPEL STREET | NEWRY AND ARMAGH | NEWRY MOURNE AND DOWN | Catholic Maintained | BT342DT | Urban | St Mary's High School, Newry | Secondary | NEWRY |
173 | 5230135 | 174 | UPPER DROMORE ROAD | SOUTH DOWN | NEWRY MOURNE AND DOWN | Catholic Maintained | BT343PN | Urban | St Mark's High School | Secondary | NEWRY |
174 | 5230152 | 175 | 15 TULLYBRYAN ROAD | FERMANAGH AND SOUTH TYRONE | MID ULSTER | Catholic Maintained | BT702LY | Rural | St Ciaran's High School | Secondary | DUNGANNON |
175 | 5230157 | 176 | 108 CAMLOUGH ROAD | NEWRY AND ARMAGH | NEWRY MOURNE AND DOWN | Catholic Maintained | BT357EE | Rural | St Paul's High School, Bessbrook | Secondary | NEWRY |
176 | 5230167 | 177 | 77 DUNDALK ROAD | NEWRY AND ARMAGH | NEWRY MOURNE AND DOWN | Catholic Maintained | BT359HL | Rural | St Joseph's High School, Crossmaglen | Secondary | NEWRY |
177 | 5230187 | 178 | MIDDLETOWN ROAD | NEWRY AND ARMAGH | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Catholic Maintained | BT603TH | Rural | St Patrick's High School, Keady | Secondary | KEADY |
178 | 5230192 | 179 | 29 SCHOOL LANE | MID ULSTER | MID ULSTER | Catholic Maintained | BT714NW | Urban | St Joseph's College, Coalisland | Secondary | DUNGANNON |
179 | 5230213 | 180 | DRUMGASK | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Catholic Maintained | BT655DU | Urban | Lismore Comprehensive School | Secondary | CRAIGAVON |
180 | 5230218 | 181 | 2A CONVENT ROAD | NEWRY AND ARMAGH | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Catholic Maintained | BT604BG | Urban | St Catherine's College | Secondary | ARMAGH |
181 | 5230256 | 182 | 4 MOY ROAD | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Catholic Maintained | BT621QL | Urban | Drumcree College | Secondary | PORTADOWN |
182 | 5230278 | 183 | CHAPEL STREET | MID ULSTER | MID ULSTER | Catholic Maintained | BT808QB | Urban | Holy Trinity College | Secondary | COOKSTOWN |
183 | 5230293 | 184 | 41 KILLYMEAL ROAD | FERMANAGH AND SOUTH TYRONE | MID ULSTER | Catholic Maintained | BT716LJ | Urban | St Patrick's College, Dungannon | Secondary | DUNGANNON |
184 | 5250216 | 185 | TULLYGALLY ROAD | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Controlled Integrated | BT655BS | Urban | Brownlow Int College | Secondary | CRAIGAVON |
185 | 5260285 | 186 | 25 DONARD VIEW ROAD | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | GMI | BT323LN | Rural | New-Bridge Integrated College | Secondary | BANBRIDGE |
186 | 5260286 | 187 | 21 GORTMERRON LINK ROAD | FERMANAGH AND SOUTH TYRONE | MID ULSTER | GMI | BT716LS | Urban | Integrated College Dungannon | Secondary | DUNGANNON |
187 | 5410013 | 188 | LURGAN ROAD | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Controlled | BT324AQ | Urban | Banbridge Academy | Grammar | BANBRIDGE |
188 | 5410057 | 189 | 9 COLLEGE WALK | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Controlled | BT666JW | Urban | Lurgan College | Grammar | CRAIGAVON |
189 | 5410067 | 190 | 4 KILLYCOMAINE ROAD | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Controlled | BT635BU | Urban | Portadown College | Grammar | CRAIGAVON |
190 | 5420045 | 191 | 151 NEWRY ROAD | SOUTH DOWN | NEWRY MOURNE AND DOWN | Voluntary | BT344EU | Urban | St Louis Grammar School, Kilkeel | Grammar | KILKEEL |
191 | 5420059 | 192 | 77A ASHGROVE ROAD | SOUTH DOWN | NEWRY MOURNE AND DOWN | Voluntary | BT342QN | Urban | Abbey Christian Brothers Grammar School | Grammar | NEWRY |
192 | 5420060 | 193 | CHEQUER HILL | NEWRY AND ARMAGH | NEWRY MOURNE AND DOWN | Voluntary | BT356DY | Urban | Our Lady's Grammar School | Grammar | NEWRY |
193 | 5420062 | 194 | 46 ARMAGH ROAD | NEWRY AND ARMAGH | NEWRY MOURNE AND DOWN | Voluntary | BT356PP | Urban | St Colman's College, Newry | Grammar | NEWRY |
194 | 5420073 | 195 | 58 CASTLECAULFIELD ROAD | MID ULSTER | MID ULSTER | Voluntary | BT703HE | Rural | St Joseph's Convent Grammar School, Donaghmore | Grammar | DUNGANNON |
195 | 5420076 | 196 | 10 ASHGROVE AVENUE | SOUTH DOWN | NEWRY MOURNE AND DOWN | Voluntary | BT341PR | Urban | Sacred Heart Grammar School, Newry | Grammar | NEWRY |
196 | 5420260 | 197 | 2 RANFURLEY ROAD | FERMANAGH AND SOUTH TYRONE | MID ULSTER | Voluntary | BT716AP | Urban | The Royal School, Dungannon | Grammar | DUNGANNON |
197 | 5420263 | 198 | COLLEGE HILL | NEWRY AND ARMAGH | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Voluntary | BT619DH | Urban | The Royal School, Armagh | Grammar | ARMAGH |
198 | 5420268 | 199 | CATHEDRAL ROAD | NEWRY AND ARMAGH | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Voluntary | BT617QZ | Urban | St Patrick's Grammar School, Armagh | Grammar | ARMAGH |
199 | 5420304 | 200 | 37 KILLYMEAL ROAD | FERMANAGH AND SOUTH TYRONE | MID ULSTER | Voluntary | BT716DS | Urban | St Patrick's Academy, Dungannon | Grammar | DUNGANNON |
200 | 5420314 | 201 | 12 CORNAKINEGAR ROAD | UPPER BANN | ARMAGH CITY, BANBRIDGE AND CRAIGAVON | Voluntary | BT679JW | Urban | St Ronan?s College | Grammar | LURGAN |
201 rows × 11 columns
Great for one resource, but we need a load of these....
Most ODNI datasets are collected together as 'packages'.
And you can enumerate them...
odni = build_odni_connection()
odni.action.package_list()
['2011-census-microdata-sample-of-anonymised-records-teaching-file', 'a26-dualling-lidar', 'a6-lidar', 'active-places-ni-sports-facilities-database', 'adult-bmi-health-survey-northern-ireland', 'adult-magistrates-defendants-dealt-with', 'advice-centres', 'air-quality-monitoring-stations', 'analysis-of-the-trees-of-belfast', 'anbc-car-parking', 'anbc-local-nature-reserves', 'annual-survey-of-hours-and-earnings', 'anti-social-behaviour-incidents-recorded-by-the-police-in-northern-ireland', 'antrim-and-newtownabbey-borough-council-bring-bank-locations', 'antrim-and-newtownabbey-councillors', 'antrim-newtownabbey-borough-council-assets', 'ards-and-north-down-borough-council-car-parks', 'ards-and-north-down-councillors', 'areas-of-archaeological-potential', 'areas-of-outstanding-natural-beauty', 'areas-of-significant-archaeological-interest', 'areas-of-special-scientific-interest', 'armagh-city-banbridge-craigavon-borough-councillors', 'attendance-allowance-recipients-statistical-geography', 'authorised-waste-sites-treatment-storage', 'average-2015-prescription-costs-for-gp-practices-in-northern-ireland', 'bathymetric-lidar-for-red-bay', 'bathymetric-lidar-for-small-area-of-strangford-lough', 'bcni2018', 'belfast-bicycle-network', 'belfast-bike-hire-rentals-and-returns', 'belfast-bike-hire-rentals-returns', 'belfast-city-council-councillors', 'belfast-council-car-parks', 'belfast-trees', 'bike-hire-stations', 'botanic-gardens-3d-tree-model', 'bottle-banks', 'boundary-commission-for-northern-ireland-revised-proposals', 'boundary-review', 'bowling-pavilions', 'breastfeeding-welcome-here-certificate-holders', 'cancer-waiting-times', 'carers-allowance-recipients-statistical-geography', 'causewaycoastandglensbc', 'causeway-coast-and-glens-borough-councillors', 'causeway-coast-and-glens-borough-council-off-street-car-parks', 'cemeteries', 'city-parks', 'civic-amenity-sites', 'client-group-analysis-claimants-statisical-geographies', 'clinical-social-care-negligence-cases', 'community-centres', 'contracts-awarded-by-central-procurement-directorate-in-the-2016-2017-year', 'council-properties', 'crown-court-defendants-dealt-with', 'datasets-financial-abuse-of-older-people', 'defence-heritage', 'dental-list-march-2018', 'department-of-health-trust-boundaries', 'derry-and-strabane-borough-councillors', 'derry-city-and-strabane-district-council-off-street-car-parks-locations', 'diagnostic-waiting-times', 'disability-living-allowance-recipients-statistical-geographies', 'disabled-persons-allowance-statistics', 'disease-prevalence', 'disease-prevalence-in-northern-ireland', 'domestic-abuse-incidents-and-crimes-recorded-by-the-police-in-northern-ireland', 'draft-programme-for-government-framework-2016-21', 'drainage-asset', 'driver-testing-outcomes-by-test-centre', 'emergency-care-waiting-times', 'emergency-care-waiting-times1', 'enrolments-by-school-management-type', 'environmental-noise-directive-noise-mapping', 'essential-skills-activity', 'essential-skills-activity-2002-20014-15', 'essential-skills-activity-2002-2015-16', 'essential-skills-enrolments-administrative-geographies', 'estimates-of-the-population-aged-85-and-over-northern-ireland', 'eu-referendum-23-june-2016', 'european-parliament-election-22-may-2014-ni-region', 'events', 'farm-census-administrative-geographies', 'fermanagh-and-omagh-district-council-bowling-pavilions', 'fermanagh-and-omagh-district-council-car-parks', 'fermanagh-and-omagh-district-council-leisure-centres', 'fermanagh-and-omagh-district-councillors', 'filtering-offence-list', 'find-your-nearest-bus-stops', 'find-your-public-amenities-in-northern-ireland', 'fly-tipping', 'fodc-community-centres', 'fodc-recycling-centres', 'fodc-tree-preservation-orders', 'food-premise-hygiene-ratings', 'full-list-of-first-forenames-given-to-babies-registered-in-northern-ireland', 'further-education-achievements-administrative-geographies', 'further-education-activity', 'further-education-regulated-enrolments-administrative-geographies', 'gp-practice-list-sizes', 'gp-prescribing-data', 'gp-prescription-spending-by-british-national-formulary-bnf-chapter-and-drug-april-2013-may-2016', 'greater-beflast-area-park-and-ride-sites', 'gross-hourly-pay-annual-survey-of-hours-and-earnings', 'gross-weekly-pay-annual-survey-of-hours-and-earnings', 'gsni-250k-geology', 'gsni-tellus-regional-airborne-geophysical-survey-electromagnetics', 'gsni-tellus-regional-airborne-geophysical-survey-magnetics', 'gsni-tellus-regional-airborne-geophysical-survey-radiometrics', 'gsni-tellus-regional-stream-sediments', 'gsni-tellus-regional-stream-waters-icp', 'hea', 'health', 'health-trust-community-services-and-reference-costs', 'health-trust-community-services-indicators-and-reference-costs-2015-16', 'health-trust-reference-costs-2015-16', 'health-trust-specialist-services-reference-costs-2015-16', 'higher-education-enrolments-administrative-geographies', 'higher-education-qualifications-administrative-geographies', 'highway-network', 'historic-environment-division-lidar', 'historic-environment-division-lidar-2008', 'historic-environment-division-lidar-2009', 'historic-environment-division-lidar-2013', 'historic-environment-division-lidar-2014', 'historic-parks-and-gardens', 'https-www-daera-ni-gov-uk-sites-default-files-publications-doe-lakewaterbodygml-zip', 'https-www-daera-ni-gov-uk-sites-default-files-publications-doe-riversegmentgml-zip', 'https-www-nisra-gov-uk-statistics-deprivation', 'incidents-and-crimes-with-a-hate-motivation-recorded-by-the-police-in-northern-ireland', 'income-support-recipients-stat-geog', 'industrial-heritage-record', 'inpatient-waiting-times', 'invest-ni-financial-offers-of-support-2012-13-to-2016-17', 'jobs', 'job-seekers-allowance-claimants-statistical-geographies', 'landscape-character-areas', 'leisure-centres', 'libraries-ni-annual-loans-summary', 'library-locations-ni', 'licensed-bus-vehicles', 'licensed-taxi-vehicles', 'lidar-lough-neagh-environs-2017', 'lighting-assets', 'lisburn-and-castlereagh-city-council', 'lisburn-carpark-data', 'listed-buildings-northern-ireland', 'locate-a-school', 'lone-pensioner-allowance-statistics', 'lps-arcgis', 'materials-collected-at-northern-ireland-local-authority-waste-management-sites-time-series-data', 'members-allowances', 'metro-timetable-data-valid-from-18-june-until-31-august-2016', 'mid-and-east-antrim-borough-councillors', 'mid-and-east-antrim-borough-council-off-street-car-parks', 'mid-ulster-district-councillors', 'my-train-for-amazon-echo', 'national-nature-reserves', 'newry-mourne-and-down-district-councillors', 'nia-constituencies', 'ni-air-quality', 'nia-members', 'nia-questions', 'nia-register', 'ni-assembly-election-5-may-2016', 'ni-cycle-counters-raw-data-2011-2016', 'ni-direct-website-data', 'niea-authorised-landfill-sites', 'ni-gp-prescribing-data-november-december-2015', 'ni-herd-flock-map-references', 'ni-housing-stock', 'nihpi-by-propertytype', 'nihpi-mean-median-standardisded-price', 'nihpi-verified-sales-by-propertytype', 'nir20160126v2', 'nisra-open-data-boundaries-output-areas-2001', 'nisra-open-data-boundaries-small-areas-2011', 'nisra-open-data-boundaries-super-output-areas-2011', 'ni-water-customer-tap-authorised-supply-point-results', 'norther', 'northern-ireland-capital-value-bands-april-2016', 'northern-ireland-catchment-stakeholder-groups', 'northern-ireland-civil-service-employment-statistics', 'northern-ireland-composite-economic-index', 'northern-ireland-groundwater-bodies', 'northern-ireland-index-of-production', 'northern-ireland-index-of-services', 'northern-ireland-local-authority-collected-municipal-waste-management-statistics', 'northern-ireland-local-councils-election-22-may-2014', 'northern-ireland-local-management-areas', 'northern-ireland-multiple-deprivation-measure-2005', 'northern-ireland-multiple-deprivation-measures-2017', 'northern-ireland-new-dwelling-completions', 'northern-ireland-new-dwelling-starts', 'northern-ireland-planning-statistics-annual-dataset', 'northern-ireland-railways-halts', 'northern-ireland-railways-nir-bridges', 'northern-ireland-railways-nir-culverts', 'northern-ireland-railways-nir-railway-network', 'northern-ireland-railways-nir-signal-posts', 'northern-ireland-railways-platforms', 'northern-ireland-railways-stations', 'northern-ireland-river-basin-districts', 'northern-ireland-river-water-bodies', 'northern-ireland-road-safety-partnership-2011-2016', 'northern-ireland-road-traffic-collision-data-2013-15-a-visualisation-of-the-data', 'northern-ireland-sites-and-monuments-record', 'northern-ireland-traffic-count-data', 'notifiable-infectious-diseases-report-2016-week-17', 'notifiable-infectious-diseases-reports-noids-northern-ireland-trends-predictions', 'number-of-education-establishments-in-ni-by-management-type1991-2017', 'number-of-pupils-attending-education-establishmentsin-ni-by-school-type-1991-2017', 'obese-and-overweight-children-health-survey-northern-ireland', 'off-street-car-parks-within-mid-ulster', 'opendatani-dataset-numbers-by-publisher', 'opendatani-dataset-suggestions', 'osni-open-data-10k-grid', 'osni-open-data-10m-dtm-sheets-101-1501', 'osni-open-data-10m-dtm-sheets-1-501', 'osni-open-data-10m-dtm-sheets-151-2001', 'osni-open-data-10m-dtm-sheets-201-2501', 'osni-open-data-10m-dtm-sheets-251-2931', 'osni-open-data-10m-dtm-sheets-51-1001', 'osni-open-data-1-million-raster-infrastructure1', 'osni-open-data-1-million-raster-locations1', 'osni-open-data-1-million-raster-natural-environment1', 'osni-open-data-1-million-raster-parliamentary-boundaries1', 'osni-open-data-50k-admin-boundaries-counties1', 'osni-open-data-50k-admin-boundaries-deas-19931', 'osni-open-data-50k-admin-boundaries-lgds-19931', 'osni-open-data-50k-admin-boundaries-ni-outline1', 'osni-open-data-50k-admin-boundaries-parliamentary-constituencies-20081', 'osni-open-data-50k-admin-boundaries-townlands1', 'osni-open-data-50k-admin-boundaries-wards-19931', 'osni-open-data-50k-grid', 'osni-open-data-50k-transport-line1', 'osni-open-data-50k-transport-text1', 'osni-open-data-50m-digital-terrain-model-csv1', 'osni-open-data-benchmark-data1', 'osni-open-data-largescale-boundaries-county-boundaries1', 'osni-open-data-largescale-boundaries-district-electoral-areas-19931', 'osni-open-data-largescale-boundaries-district-electoral-areas-20121', 'osni-open-data-largescale-boundaries-local-government-districts-19931', 'osni-open-data-largescale-boundaries-local-government-districts-20121', 'osni-open-data-largescale-boundaries-ni-outline1', 'osni-open-data-largescale-boundaries-parliamentary-constituencies-20081', 'osni-open-data-largescale-boundaries-townlands1', 'osni-open-data-largescale-boundaries-wards-19931', 'osni-open-data-largescale-boundaries-wards-20121', 'osni-open-data-midscale-raster1', 'osni-open-data-placename-gazetteer1', 'osni-open-data-river-basin-lidar-2004-dtms-and-dsms', 'osni-open-data-river-basin-lidar-2007-dtms-and-dsms', 'osni-open-data-river-basin-lidar-2008-dtms-and-dsms', 'osni-open-data-river-basin-lidar-2009-dtms-and-dsms', 'osni-open-data-river-basin-lidar-2010-dtms-and-dsms', 'osni-open-data-river-basin-lidar-2012-dtms-and-dsms', 'osni-open-data-river-basin-lidar-2013-dtms-and-dsms', 'osni-open-data-river-basin-lidar-2014-dtms-and-dsms', 'osni-open-data-streetmaps1', 'osni-open-data-street-name-gazetteer1', 'osni-open-data-townland-raster-maps1', 'osni-open-data-transport-50k-points', 'ou', 'outpatient-first-and-follow-up-appointments', 'outpatient-first-and-follow-up-appointments-2015-16', 'outpatient-waiting-times', 'pedestrain-crossing', 'pension-credit-claimants-statistical-geography', 'pitches-playing-fields', 'play-centres', 'playgrounds', 'police-recorded-crime-in-northern-ireland', 'police-recorded-injury-road-traffic-collision-statistics-northern-ireland', 'police-recorded-injury-road-traffic-collision-statistics-northern-ireland-2013', 'police-recorded-injury-road-traffic-collision-statistics-northern-ireland-2014', 'police-recorded-injury-road-traffic-collision-statistics-northern-ireland-2016', 'population-estimates-for-northern-ireland', 'population-estimates-for-small-areas-northern-ireland', 'population-estimates-for-super-output-areas-soas-and-electoral-wards-northern-ireland', 'population-projection-for-northern-ireland', 'population-projections-for-areas-within-northern-ireland-2016-based', 'pothole-enquiries', 'prescriptions-cost-savings', 'prescriptive-variability-of-drugs-by-general-practitioners', 'primary-1-and-year-8-obesity', 'protected-wreck-sites', 'psni-street-crime-data', 'public-toilets', 'ramsar-sites', 'rating-income', 'real-time-rail-stations-arrivals-and-departures', 'recycling-banks', 'recycling-centres', 'registered-and-responsible-body-list', 'register-of-charities', 'roadworks', 'rqia-registered-services', 'rural-soil-survey', 'scheduled-historic-monument-areas', 'school-census-nursery-schools', 'school-census-post-primary', 'school-census-pre-primary-schools', 'school-census-primary-schools', 'school-census-special-schools', 'school-enrolments-ni-summary-data', 'senior-officer-salaries', 'settlement-boundary-data', 'special-areas-of-conservation', 'special-protection-areas', 'sports-pitches', 'state-pension-claimants-statistical-geography', 'stop-and-search-statistics-2016-17', 'surface-defects', 'surface-water-bodies-with-2015-status', 'theatre-utilisation-activity', 'the-continuing-religious-divide-in-northern-ireland-s-schools', 'the-public-trees-of-belfast-city', 'total-paid-hours-worked-annual-survey-of-hours-and-earnings', 'translink-bus-stop-list', 'translink-metro-bus-routes', 'translink-ni-railways-stations', 'translink-ulsterbus-routes', 'trust-community-services-indicators-and-reference-costs', 'trust-reference-costs', 'trust-specialist-services-reference-costs', 'type-of-long-term-condition-by-age-by-sex-dc3101ni', 'uk-parliamentary-election-7-may-2015', 'ulsterbus-and-goldline-timetable-data-from-28-june-31-august-2016', 'umbrella-bodies', 'upcoming-parades', 'using-machine-learning-to-extract-useful-information-about-belfast-trees', 'vehicle-testing-outcomes-by-test-centre', 'victims-survivors-services-funded-organisations-2017-18', 'visualising-the-n-i-bus-network-you-wait-for-ages-and-3-743-show-up-at-once', 'water-abstraction-licence-application-locations-except-public-water-supplies', 'what-s-under-the-trees-visualising-opendatani-lidar-in-a-geographical-information-system-gis', 'whylivehere', 'world-heritage-site', 'york-street-interchange-lidar', 'youth-magistrates-defendants-dealt-with']
resource = 'school-census-post-primary'
dfs = {}
for dataset in odni.action.package_show(id=resource)['resources']:
df = pd.read_csv(dataset['url'], index_col=False)
dfs[dataset['name']] = df
print(list(dfs.keys()))
['School Census - post primary - Reference data', 'School Census - post primary - enrolments', 'School Census - post primary - Free School Meal entitlement', 'School Census - post primary - SEN', 'School Census - post Primary - Religion', 'School Census - post - Primary - newcomers']
There's a problem; We want to know about change over time.
Dept of Ed have this data 2009-18 on their website in Excel format
OpenDataNI do not 😭
base_link = "https://www.education-ni.gov.uk/sites/default/files/publications/education/"
pd.read_excel(base_link+'School%20level%20-%20post%20primary%20schools%20data%20supp%201718.xls').head()
Data are collected annually through the School census exercise. | Unnamed: 1 | |
---|---|---|
0 | This takes place in early October, when each s... | NaN |
1 | In 2017/18, the school census date was 6th Oct... | NaN |
2 | Among the information collected are details on... | NaN |
3 | Statistics produced from the school census exe... | NaN |
4 | NaN | NaN |
pd.read_excel(base_link+\
'School%20level%20-%20post%20primary%20schools%20data%20supp%201617.xls'
).head()
--------------------------------------------------------------------------- HTTPError Traceback (most recent call last) <ipython-input-229-3526dec84404> in <module>() ----> 1 pd.read_excel(base_link+ 'School%20level%20-%20post%20primary%20schools%20data%20supp%201617.xls' 2 ).head() ~/anaconda3/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs) 116 else: 117 kwargs[new_arg_name] = new_arg_value --> 118 return func(*args, **kwargs) 119 return wrapper 120 return _deprecate_kwarg ~/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py in read_excel(io, sheet_name, header, skiprows, skip_footer, index_col, names, usecols, parse_dates, date_parser, na_values, thousands, convert_float, converters, dtype, true_values, false_values, engine, squeeze, **kwds) 228 229 if not isinstance(io, ExcelFile): --> 230 io = ExcelFile(io, engine=engine) 231 232 return io._parse_excel( ~/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py in __init__(self, io, **kwds) 281 # to get_filepath_or_buffer() 282 if _is_url(self._io): --> 283 io = _urlopen(self._io) 284 elif not isinstance(self.io, (ExcelFile, xlrd.Book)): 285 io, _, _ = get_filepath_or_buffer(self._io) ~/anaconda3/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context) 221 else: 222 opener = _opener --> 223 return opener.open(url, data, timeout) 224 225 def install_opener(opener): ~/anaconda3/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout) 530 for processor in self.process_response.get(protocol, []): 531 meth = getattr(processor, meth_name) --> 532 response = meth(req, response) 533 534 return response ~/anaconda3/lib/python3.6/urllib/request.py in http_response(self, request, response) 640 if not (200 <= code < 300): 641 response = self.parent.error( --> 642 'http', request, response, code, msg, hdrs) 643 644 return response ~/anaconda3/lib/python3.6/urllib/request.py in error(self, proto, *args) 568 if http_err: 569 args = (dict, 'default', 'http_error_default') + orig_args --> 570 return self._call_chain(*args) 571 572 # XXX probably also want an abstract factory that knows when it makes ~/anaconda3/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args) 502 for handler in handlers: 503 func = getattr(handler, meth_name) --> 504 result = func(*args) 505 if result is not None: 506 return result ~/anaconda3/lib/python3.6/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs) 648 class HTTPDefaultErrorHandler(BaseHandler): 649 def http_error_default(self, req, fp, code, msg, hdrs): --> 650 raise HTTPError(req.full_url, code, msg, hdrs, fp) 651 652 class HTTPRedirectHandler(BaseHandler): HTTPError: HTTP Error 404: Not Found
School%20level%20-%20post%20primary%20schools%20data%20supp%201718.xls
School%20level%20-%20post%20primary%20schools%20data%201617%20supp_0.XLS
from bs4 import BeautifulSoup
import requests
base_url="https://www.education-ni.gov.uk"
listing_path = "/articles/school-enrolments-school-level-data"
soup = BeautifulSoup(requests.get(base_url+listing_path).text, 'lxml')
for link in soup.find_all('a'):
if 'School enrolments - school level data 20' in ' '.join(map(str,link.contents)):
print(link)
<a href="/publications/school-enrolments-school-level-data-201718">School enrolments - school level data 2017/18</a> <a href="/publications/school-enrolments-school-level-data-201617">School enrolments - school level data 2016/17</a> <a href="/publications/school-enrolments-school-level-data-201516">School enrolments - school level data 2015/16</a> <a href="/publications/school-enrolments-school-level-data-201415">School enrolments - school level data 2014/15</a> <a href="/publications/school-enrolments-school-level-data-201314">School enrolments - school level data 2013/14</a> <a href="/publications/school-enrolments-school-level-data-201213">School enrolments - school level data 2012/13</a> <a href="/publications/school-enrolments-school-level-data-201112">School enrolments - school level data 2011/12</a> <a href="/publications/school-enrolments-school-level-data-201011">School enrolments - school level data 2010/11</a> <a href="/publications/school-enrolments-school-level-data-200910">School enrolments - school level data 2009/10</a>
soup = BeautifulSoup(requests.get(base_url+listing_path).text, 'lxml')
year_urls = {}
for link in soup.find_all('a'):
contents = ' '.join(map(str,link.contents))
if 'School enrolments - school level data 20' in contents:
year = link.get('href')[-6:-2]
year_urls[int(year)] = link.get('href')
year_urls
{2009: '/publications/school-enrolments-school-level-data-200910', 2010: '/publications/school-enrolments-school-level-data-201011', 2011: '/publications/school-enrolments-school-level-data-201112', 2012: '/publications/school-enrolments-school-level-data-201213', 2013: '/publications/school-enrolments-school-level-data-201314', 2014: '/publications/school-enrolments-school-level-data-201415', 2015: '/publications/school-enrolments-school-level-data-201516', 2016: '/publications/school-enrolments-school-level-data-201617', 2017: '/publications/school-enrolments-school-level-data-201718'}
Using urllib.request.urlretrieve
to store them on the local filesystem
import urllib.request
from pathlib import Path
from collections import defaultdict
year_files = defaultdict(list)
dest = Path('./data/education-ni/')
for year, path in year_urls.items():
dest.joinpath(str(year)).mkdir(parents=True, exist_ok=True)
soup = BeautifulSoup(requests.get(base_url+path).text, 'lxml')
for link in soup.find_all('a'):
href=link.get('href',"")
filename = href.split('/')[-1]
contents = ' '.join(map(str,link.contents))
if 'xls' in href.split('.')[-1].lower():
# See this requirement right 👆 Fuck you DoE
# Mix of xls, XLSX, xlsx and XLS
if not dest.joinpath(str(year),filename).exists():
urllib.request.urlretrieve(href, dest.joinpath(str(year),filename))
year_files[year].append(filename)
print(f"{year}:{len(year_files[year])} docs")
2017:8 docs 2016:8 docs 2015:5 docs 2014:5 docs 2013:5 docs 2012:5 docs 2011:5 docs 2010:5 docs 2009:5 docs
We've technically 'Extracted' the data we wanted from the source,
Clean and twist the data until it's useful
This is the most frustrating part of being a data scientist/engineer...
for year, filenames in year_files.items():
for file in filenames:
print(f"{year}:/{file}")
2017:/School%20level%20-%20nursery%20schools%20data%201718%20supp.xlsx 2017:/School%20level%20-%20post%20primary%20schools%20data%20supp%201718.xls 2017:/School%20level%20-%20primary%20schools%20data%20supp%201718.xlsx 2017:/School%20level%20-%20pre%20school%20data%201718%20supp.xlsx 2017:/available%20places%20-%20primary%201718%20supp.xlsx 2017:/available%20places%20-%20post-primary%201718%20supp.xlsx 2017:/available%20places%20-%20nursery%201718%20supp.xlsx 2017:/School%20level%20-%20special%20school%20data%202017-18_0.xlsx 2016:/School%20level%20-%20nursery%20schools%20data%201617%20supp.XLS 2016:/School%20level%20-%20post%20primary%20schools%20data%201617%20supp_0.XLS 2016:/School%20level%20-%20primary%20schools%20data%201617%20supp.XLSX 2016:/School%20level%20-%20pre%20school%20data%201617%20supp.XLSX 2016:/School%20level%20-%20special_school_data%201617%20supp_0.XLSX 2016:/available%20places%20-%20primary%201617%20supp.XLSX 2016:/available%20places%20-%20post-primary%201617%20supp.XLSX 2016:/available%20places%20-%20nursery%201617%20supp%20final.XLSX 2015:/School%20level%20-%20nursery%20schools%20data%201516%20supp.XLS 2015:/School-level-post-primary-schools-data-1516_1.XLS 2015:/School%20level%20-%20primary%20schools%20data%201516%20supp_0.XLSX 2015:/School%20level%20-%20pre%20school%20data%201516%20supp.XLSX 2015:/School%20level%20-%20special_school_data%201516%20supp.XLSX 2014:/School%20level%20-%20pre%20school%20data%201415.XLS 2014:/School%20level%20-%20nursery%20schools%20data%201415%20supp%20-%20unfilled%20places.XLS 2014:/primary-schools-data-1415-supp-with-unfilled-places.xlsx 2014:/post-primary-schools-data-1415-supp-with-unfilled-places.xlsx 2014:/special-school-data-1415-supp.xlsx 2013:/pre-school-data-2013-14.xlsx 2013:/nursery-schools-data-13-14-supp-incl-unfilled-places.xlsx 2013:/primary-schools-data-2013-14-supp-inc-unfilled-places.xlsx 2013:/post-primary-schools-data-13-14-supp-inc-unfilled-places.xlsx 2013:/special-school-data-2013-14-supp.xlsx 2012:/pre-school-data-2012-13.xlsx 2012:/nursery-schools-data-2012-13-supp-inc-unfilled-places.xlsx 2012:/primary-schools-data-2012-13-supp-inc-unfilled-places-2.xlsx 2012:/post-primary-schools-data-2012-13-supp-inc-unfilled-places.xlsx 2012:/special-school-data-2012-13-supp.xlsx 2011:/pre-school-data-2011-12.xlsx 2011:/nursery-schools-data-2011-12-supp-inc-unfilled-2.xlsx 2011:/primary-schools-data-11-12-supp-inc-unfilled-places-r.xlsx 2011:/post-primary-schools-data-11-12-supp-inc-unfilled-2.xlsx 2011:/special-school-data-11-12-supp.xlsx 2010:/pre-school-data-1011-3.xlsx 2010:/nursery-schools-data-2010-11-supp-inc-unfilled.xlsx 2010:/primary-schools-data-1011-supp-inc-unfilled-r.xlsx 2010:/post-primary-schools-data-10-11-supp-inc-unfilled.xlsx 2010:/special-school-data-1011-supp-2.xlsx 2009:/pre-school-data-0910-3.xlsx 2009:/nursery-schools-data-0910-supp-3.xlsx 2009:/primary-schools-data-0910-supp-r.xlsx 2009:/post-primary-schools-data-0910-supp-2.xlsx 2009:/special-school-data-0910-supp-3.xlsx
for year, filenames in year_files.items():
for file in filenames:
if 'post' in file:
print(f"{year}:/{file}")
2017:/School%20level%20-%20post%20primary%20schools%20data%20supp%201718.xls 2017:/available%20places%20-%20post-primary%201718%20supp.xlsx 2016:/School%20level%20-%20post%20primary%20schools%20data%201617%20supp_0.XLS 2016:/available%20places%20-%20post-primary%201617%20supp.XLSX 2015:/School-level-post-primary-schools-data-1516_1.XLS 2014:/post-primary-schools-data-1415-supp-with-unfilled-places.xlsx 2013:/post-primary-schools-data-13-14-supp-inc-unfilled-places.xlsx 2012:/post-primary-schools-data-2012-13-supp-inc-unfilled-places.xlsx 2011:/post-primary-schools-data-11-12-supp-inc-unfilled-2.xlsx 2010:/post-primary-schools-data-10-11-supp-inc-unfilled.xlsx 2009:/post-primary-schools-data-0910-supp-2.xlsx
Right, lets limit our scope to look at places available
for year, filenames in year_files.items():
print(f"Year: {year}")
for file in filenames:
if file.lower().startswith('available'):
print(file)
Year: 2017 available%20places%20-%20primary%201718%20supp.xlsx available%20places%20-%20post-primary%201718%20supp.xlsx available%20places%20-%20nursery%201718%20supp.xlsx Year: 2016 available%20places%20-%20primary%201617%20supp.XLSX available%20places%20-%20post-primary%201617%20supp.XLSX available%20places%20-%20nursery%201617%20supp%20final.XLSX Year: 2015 Year: 2014 Year: 2013 Year: 2012 Year: 2011 Year: 2010 Year: 2009
Ok, what about actually parsing the excel sheets?
Will they be consistent?
sheets = defaultdict(list)
for year, files in year_files.items():
for file in files:
if 'post' in file:
xls = pd.ExcelFile(f'data/education-ni/{year}/{file}')
sheets[year].extend(xls.sheet_names)
dict(sheets)
{2009: ['meta data', 'reference data', 'enrolments', 'gender', 'free school meals', 'SEN'], 2010: ['meta data', 'reference data', 'enrolments', 'gender', 'free school meals', 'SEN', 'unfilled places'], 2011: ['meta data', 'reference data', 'enrolments', 'gender', 'free school meals', 'SEN', 'unfilled places'], 2012: ['meta data', 'reference data', 'enrolments', 'gender', 'free school meals', 'SEN', 'unfilled places'], 2013: ['meta data', 'reference data', 'enrolments', 'gender', 'free school meals', 'SEN', 'unfilled places'], 2014: ['meta data', 'reference data', 'enrolments', 'gender', 'free school meals', 'SEN', 'religion', 'newcomers', 'unfilled places'], 2015: ['meta data', 'reference data', 'enrolments', 'gender', 'free school meals', 'SEN', 'religion', 'newcomers', 'unfilled places'], 2016: ['meta data', 'reference data', 'enrolments', 'free school meals', 'SEN', 'religion', 'newcomers', 'School level data', 'Overall'], 2017: ['meta data', 'reference data', 'enrolments', 'free school meals', 'SEN', 'religion', 'newcomers', 'School level data', 'Overall']}
from collections import Counter
all_sheet_names = Counter([_ for d in sheets.values() for _ in d])
all_sheet_names.most_common()
[('meta data', 9), ('reference data', 9), ('enrolments', 9), ('free school meals', 9), ('SEN', 9), ('gender', 7), ('unfilled places', 6), ('religion', 4), ('newcomers', 4), ('School level data', 2), ('Overall', 2)]
df = pd.DataFrame.from_dict({
year: [sn in sheet_names for sn in all_sheet_names]
for year, sheet_names in sheets.items()
}, orient='index')
df.columns=all_sheet_names.keys()
df.T
2017 | 2016 | 2015 | 2014 | 2013 | 2012 | 2011 | 2010 | 2009 | |
---|---|---|---|---|---|---|---|---|---|
meta data | True | True | True | True | True | True | True | True | True |
reference data | True | True | True | True | True | True | True | True | True |
enrolments | True | True | True | True | True | True | True | True | True |
free school meals | True | True | True | True | True | True | True | True | True |
SEN | True | True | True | True | True | True | True | True | True |
religion | True | True | True | True | False | False | False | False | False |
newcomers | True | True | True | True | False | False | False | False | False |
School level data | True | True | False | False | False | False | False | False | False |
Overall | True | True | False | False | False | False | False | False | False |
gender | False | False | True | True | True | True | True | True | True |
unfilled places | False | False | True | True | True | True | True | True | False |
Use visualisation tools to validate your transformation assumptions...
All. The. Time.
import seaborn as sns
f,ax = plt.subplots(figsize=graph_figsize)
sns.heatmap(df.T, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x128401710>
from collections import defaultdict
import numpy as np
## Initialisation and re-walking already extracted
## Datasets.
year_files = defaultdict(list)
dest = Path('./data/education-ni/')
for year_p in dest.iterdir():
try:
year=int(year_p.parts[-1])
for file in year_p.iterdir():
year_files[year].append(file.parts[-1])
except ValueError:
pass
year_files = dict(sorted(year_files.items()))
This was the only document that seemed to explain what 'supernumerary' actually means
It means 'extra'...
Excel on Mac helpfully puts down a lock-entry on open excel files which prevent it from being opened by other things.
Pandas occasionally has unhelpful error messages in this regard
Special Educational Needs classification policy is >>100pg policy that appears to change every year.
I am not an expert on education policy so I'm not sticking my foot in that one without being sure I understand the implications.
<img src="img/hashs.png", width="90%"/>
def parse_reference_table(xls):
"""
From an ExcelFile, clean up:
* School Mgmt Type disaster
* Inconsistent header depth
* Multi-row header names
* inconsistent headers (ref_key_map)
* inconsistent col order
* inconsistent caps/spacing (strip|lower)
"""
cols= [
'de ref',
'school name',
'school type',
'address 1',
'postcode',
'urban_rural',
'school management type',
'district council',
'parliamentary constituency',
'town'
]
categories = [
'school type',
'urban_rural',
'school management type',
'district council',
'parliamentary constituency',
'town'
]
ref_key_map={
'denino':'de ref',
'urban/ rural': 'urban_rural',
'schoolname': 'school name'
}
reference_value_rename = {
'school management type':{
'gmi':'integrated',
'controlled integrated':'integrated',
'roman catholic maintained':'rc maintained',
'grant maintained integrated':'integrated',
'voluntary - other managed':'voluntary',
'voluntary - rc managed':'voluntary',
'catholic maintained':'rc maintained'
}
}
join_n_strip_n_lower = lambda l: ' '.join(l).strip().lower()
if 'reference data' in xls.sheet_names:
df = pd.read_excel(xls, 'reference data', header=None)
h_range = 2 if isinstance(df.ix[3,0], int) else 3
try:
df.columns=df.ix[1:h_range].fillna('').apply(join_n_strip_n_lower, axis=0).values
df.rename(columns=ref_key_map, inplace=True)
df = df.drop(df.index[0:h_range+1]).reset_index(drop=True)
df = df[cols]
df['de ref'] = df['de ref'].astype(int)
df.set_index('de ref', inplace=True)
for c in df:
df[c]=df[c].str.lower().str.strip()
df.replace(reference_value_rename, inplace=True)
for c in categories:
df[c].fillna('NA', inplace=True)
df[c] = df[c].astype('category')
except TypeError as e:
print(e)
else:
df=None
return df
def parse_enrolments_table(xls):
"""From an ExcelFile, clean up:
* Inconsistent header depth
* fucked up nans/nulls all over the place
* inconsistent *footer* depth...
* Set de ref as join index and drop pointless fields
"""
def join_n_strip_n_lower(l): return ' '.join(l).strip().lower()
def strip_n_lower(s): return s.strip().lower()
def unyearify(s): return int(s.replace('year ', ''))
if 'enrolments' in xls.sheet_names:
df = pd.read_excel(xls, 'enrolments', header=None, skip_footer=5)
h_range = 2 if isinstance(df.ix[3, 0], int) else 3
try:
df.columns = df.ix[3].fillna('').apply(strip_n_lower).values
df = df.drop(df.index[0:h_range+1]).reset_index(drop=True)
df.dropna(how='all', inplace=True, axis=0)
df['de ref'] = df['de ref'].astype(int)
df.drop('schoolname', axis=1, inplace=True)
df.drop('total pupils', axis=1, inplace=True)
df.set_index('de ref', inplace=True)
df.rename(columns=unyearify, inplace=True)
df = df.astype(float)
except TypeError as e:
print(e)
else:
df = None
return df
def parse_fsm_table(xls):
"""From an ExcelFile, clean up:
* Inconsistent header depth
* fucked up nans/nulls all over the place
* inconsistent *footer* depth...
* Set de ref as join index and drop pointless fields
"""
join_n_strip_n_lower = lambda l: ' '.join(l).strip().lower()
strip_n_lower = lambda s: s.strip().lower()
unyearify = lambda s: int(s.replace('year ',''))
if 'free school meals' in xls.sheet_names:
df = pd.read_excel(xls, 'free school meals', header=None, skip_footer=5)
h_range = 2 if isinstance(df.ix[3,0], int) else 3
try:
df.columns=df.ix[3].fillna('').apply(strip_n_lower).values
df = df.drop(df.index[0:h_range+1]).reset_index(drop=True)
df.dropna(how='all', inplace=True, axis=0)
df['de ref'] = df['de ref'].astype(int)
df.drop('schoolname',axis=1, inplace=True)
df.drop('free school meals', axis=1, inplace=True)
df.set_index('de ref', inplace=True)
df.replace('#',pd.np.nan, inplace=True) # # = Undisclosed
df.replace('*',2.0, inplace=True) # * == < 5
df.replace('!',1, inplace=True) # ! avoid identification, so it's prob one or two
df=df.astype(float)
except TypeError as e:
print(e)
else:
df=None
return df
def parse_available_table(xls):
"""From an ExcelFile, clean up:
* Inconsistent header depth
* fucked up nans/nulls all over the place
* inconsistent *footer* depth...
* Set de ref as join index and drop pointless fields
* Totally different schemas between years
* Inconsistent metric naming
* non numerical data flags (*/!)
"""
ref_key_map={
'schoolname': 'school name',
'total unfilled places': 'available places',
'unfilled places': 'available places',
'total approved enrolment number': 'approved enrolments'
}
join_n_strip_n_lower = lambda l: ' '.join(l).strip().lower()
if 'School level data' in xls.sheet_names:
df = pd.read_excel(xls, 'School level data', header=None)
h_range = 2 if isinstance(df.ix[3,0], int) else 3
elif 'unfilled places' in xls.sheet_names:
df = pd.read_excel(xls, 'unfilled places', header=None)
h_range = 2 if isinstance(df.ix[3,0], int) else 3
else:
df=None
if df is not None:
try:
df.columns=df.ix[1:h_range].fillna('').apply(join_n_strip_n_lower, axis=0).values
df.rename(columns=ref_key_map, inplace=True)
df = df.drop(df.index[0:h_range+1]).reset_index(drop=True)
df=df.applymap(lambda x: np.nan if isinstance(x, str) and x.isspace() else x)
df.dropna(how='all', axis=1, inplace=True)
df.dropna(how='any', axis=0, inplace=True)
if df.shape[1] == 6: # recent doesn't have fecking headers
cols = list(df.columns)
cols[0] = 'de ref'
cols[1] = 'school name'
df.columns=cols
df.drop('school name', axis=1, inplace=True)
df['de ref'] = df['de ref'].astype(int)
df.set_index('de ref', inplace=True)
df.replace('*',2.0, inplace=True) # * == < 5
df.replace('!',1, inplace=True) # ! avoid identification, so it's prob one or two
df.dropna(how='all', inplace=True, axis=1)
df.astype(int, inplace=True)
except TypeError as e:
print(e)
return df
re_dfs={}
av_dfs={}
en_dfs={}
fsm_dfs={}
for year, files in year_files.items():
for file in files:
if 'post' in file:
xls = pd.ExcelFile(f'data/education-ni/{year}/{file}')
df = parse_reference_table(xls)
if df is not None:
print(f'Got reference data for {year}')
re_dfs[year]=df
df = parse_enrolments_table(xls)
if df is not None:
print(f'Got enrolment data for {year}')
en_dfs[year]=df
df = parse_available_table(xls)
if df is not None:
print(f'Got available data for {year}')
av_dfs[year]=df
df = parse_fsm_table(xls)
if df is not None:
print(f'Got fsm data for {year}')
fsm_dfs[year]=df
reference = pd.Panel(re_dfs)
available = pd.Panel(av_dfs)
enrolment = pd.Panel(en_dfs)
fsm = pd.Panel(fsm_dfs)
Got reference data for 2009 Got enrolment data for 2009 Got fsm data for 2009 Got reference data for 2010 Got enrolment data for 2010 Got available data for 2010 Got fsm data for 2010 Got reference data for 2011 Got enrolment data for 2011 Got available data for 2011 Got fsm data for 2011 Got reference data for 2012 Got enrolment data for 2012 Got available data for 2012 Got fsm data for 2012 Got reference data for 2013 Got enrolment data for 2013 Got available data for 2013 Got fsm data for 2013 Got reference data for 2014 Got enrolment data for 2014 Got available data for 2014 Got fsm data for 2014 Got reference data for 2015 Got enrolment data for 2015 Got available data for 2015 Got fsm data for 2015 Got reference data for 2016 Got enrolment data for 2016 Got fsm data for 2016 Got available data for 2016 Got available data for 2017 Got reference data for 2017 Got enrolment data for 2017 Got fsm data for 2017
Panels are 3D Dataframes, i.e. [Year x School x Metric]
enrolment
<class 'pandas.core.panel.Panel'> Dimensions: 9 (items) x 232 (major_axis) x 8 (minor_axis) Items axis: 2009 to 2017 Major_axis axis: 1210014 to 5420314 Minor_axis axis: 8 to 15
enrolment.sum()
2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | |
---|---|---|---|---|---|---|---|---|---|
8 | 23116.0 | 23113.0 | 21770.0 | 21322.0 | 21418.0 | 21367.0 | 21792.0 | 22571.0 | 22817.0 |
9 | 23859.0 | 23403.0 | 22925.0 | 21928.0 | 21425.0 | 21481.0 | 21422.0 | 22284.0 | 22650.0 |
10 | 22954.0 | 24155.0 | 23245.0 | 23101.0 | 21971.0 | 21527.0 | 21443.0 | 21864.0 | 22381.0 |
11 | 23521.0 | 23277.0 | 24094.0 | 23524.0 | 23299.0 | 22092.0 | 21601.0 | 22118.0 | 22102.0 |
12 | 23808.0 | 23656.0 | 22967.0 | 24027.0 | 23353.0 | 23166.0 | 21864.0 | 22091.0 | 22099.0 |
13 | 14912.0 | 15652.0 | 15421.0 | 15892.0 | 16567.0 | 16208.0 | 15912.0 | 15667.0 | 15347.0 |
14 | 12607.0 | 13097.0 | 13274.0 | 13482.0 | 13554.0 | 14174.0 | 13922.0 | 13818.0 | 13149.0 |
15 | 100.0 | 154.0 | 154.0 | 196.0 | 201.0 | 275.0 | 365.0 | 0.0 | 0.0 |
enrolment.sum().sum().iplot()
f,ax = plt.subplots(figsize=graph_figsize)
enrolment.sum().sum(axis=1).plot.bar(ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x125176940>
enrolment.sum().iplot(kind='bar')
enrolment.sum().iplot(kind='bar',barmode='stack')
(enrolment.sum()/enrolment.sum().sum()).iplot(kind='bar')
enrolment.sum(axis=2).head()
2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | |
---|---|---|---|---|---|---|---|---|---|
de ref | |||||||||
1210014 | 672.0 | 688.0 | 703.0 | 708.0 | 704.0 | 697.0 | 704.0 | 696.0 | 691.0 |
1210015 | 647.0 | 650.0 | 649.0 | 666.0 | 709.0 | 737.0 | 744.0 | 728.0 | 740.0 |
1210021 | 963.0 | 929.0 | 899.0 | 930.0 | 916.0 | 884.0 | 910.0 | 954.0 | 946.0 |
1210022 | 907.0 | 932.0 | 913.0 | 930.0 | 943.0 | 969.0 | 968.0 | 991.0 | 997.0 |
1210266 | 297.0 | 290.0 | 264.0 | 247.0 | 88.0 | 0.0 | 0.0 | 0.0 | 0.0 |
When is a zero not nothing?
[10]+[np.nan]*10
[10, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
sum([10]+[np.nan]*10)
nan
pd.Series([10]+[np.nan]*10).sum()
10.0
pd.Series([10]+[np.nan]*10).mean()
10.0
pd.Series([np.nan]*10).sum()
0.0
pd.Series([10]+[0.0]*10).mean()
0.9090909090909091
# Average 'throughput' of a school, right?
enrolment.sum(axis=2).mean(axis=1).sum()
142229.22222222225
enrolment.sum(axis=2).replace(0.0, np.nan).mean(axis=1).sum()
155243.9103174603
enrolment.mean(axis=2).T.iplot()
annual_school_enrolment = enrolment.sum(axis=2).T
annual_school_enrolment.iplot() # Those numbers aren't accessible...
name_lookup = reference.minor_xs('school name').T.describe().ix['top']
annual_school_enrolment.rename(columns=name_lookup)
de ref | ashfield girls' high school | ashfield boys' high school | belfast model school for girls | belfast boys' model school | orangefield high school | st patrick's college | st louise's comprehensive college | little flower girls' school | our lady of mercy girls' school | st rose's high school | ... | abbey christian brothers grammar school | our lady's grammar school | st colman's college | st joseph's grammar school | sacred heart grammar school | the royal school dungannon | the royal school armagh | st patrick's grammar school | st patrick's academy | st ronan’s college |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2009 | 672.0 | 647.0 | 963.0 | 907.0 | 297.0 | 784.0 | 1547.0 | 691.0 | 438.0 | 468.0 | ... | 880.0 | 861.0 | 919.0 | 527.0 | 836.0 | 646.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2010 | 688.0 | 650.0 | 929.0 | 932.0 | 290.0 | 785.0 | 1530.0 | 687.0 | 429.0 | 440.0 | ... | 887.0 | 861.0 | 940.0 | 555.0 | 846.0 | 658.0 | 682.0 | 803.0 | 0.0 | 0.0 |
2011 | 703.0 | 649.0 | 899.0 | 913.0 | 264.0 | 764.0 | 1530.0 | 690.0 | 384.0 | 406.0 | ... | 894.0 | 864.0 | 942.0 | 579.0 | 845.0 | 665.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2012 | 708.0 | 666.0 | 930.0 | 930.0 | 247.0 | 765.0 | 1521.0 | 692.0 | 367.0 | 388.0 | ... | 907.0 | 861.0 | 920.0 | 604.0 | 849.0 | 652.0 | 706.0 | 0.0 | 0.0 | 0.0 |
2013 | 704.0 | 709.0 | 916.0 | 943.0 | 88.0 | 705.0 | 1516.0 | 684.0 | 423.0 | 352.0 | ... | 900.0 | 858.0 | 900.0 | 635.0 | 855.0 | 653.0 | 712.0 | 0.0 | 0.0 | 0.0 |
2014 | 697.0 | 737.0 | 884.0 | 969.0 | 0.0 | 679.0 | 1518.0 | 644.0 | 433.0 | 316.0 | ... | 911.0 | 845.0 | 904.0 | 678.0 | 853.0 | 660.0 | 722.0 | 0.0 | 0.0 | 0.0 |
2015 | 704.0 | 744.0 | 910.0 | 968.0 | 0.0 | 674.0 | 1474.0 | 637.0 | 446.0 | 273.0 | ... | 902.0 | 853.0 | 897.0 | 707.0 | 854.0 | 650.0 | 720.0 | 972.0 | 0.0 | 0.0 |
2016 | 696.0 | 728.0 | 954.0 | 991.0 | 0.0 | 654.0 | 1420.0 | 633.0 | 452.0 | 240.0 | ... | 882.0 | 854.0 | 903.0 | 709.0 | 852.0 | 652.0 | 724.0 | 975.0 | 1338.0 | 1398.0 |
2017 | 691.0 | 740.0 | 946.0 | 997.0 | 0.0 | 0.0 | 1435.0 | 0.0 | 450.0 | 243.0 | ... | 885.0 | 853.0 | 885.0 | 686.0 | 843.0 | 638.0 | 735.0 | 967.0 | 1340.0 | 1469.0 |
9 rows × 232 columns
annual_school_enrolment.rename(columns=name_lookup).iplot()
first = annual_school_enrolment.replace(0.0,np.nan).apply(pd.Series.first_valid_index).replace(2009,np.nan)
last = annual_school_enrolment.replace(0.0,np.nan).apply(pd.Series.last_valid_index).replace(2017,np.nan)
first.dropna()
de ref 1230321 2017.0 1410315 2014.0 2210312 2013.0 2230322 2017.0 2240319 2015.0 2410311 2011.0 2420320 2016.0 3210313 2015.0 3230310 2010.0 3420317 2015.0 4210316 2015.0 5230321 2017.0 5420263 2010.0 5420268 2010.0 5420304 2016.0 5420314 2016.0 dtype: float64
close_vec=pd.DataFrame({'opened':first.groupby(first).size(),
'closed':last.groupby(last).size()})
close_vec.cumsum().fillna(method='ffill').iplot()
enrolment.minor_axis<13
array([ True, True, True, True, True, False, False, False])
enrolment[:,:,enrolment.minor_axis<13].sum()
2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | |
---|---|---|---|---|---|---|---|---|---|
8 | 23116.0 | 23113.0 | 21770.0 | 21322.0 | 21418.0 | 21367.0 | 21792.0 | 22571.0 | 22817.0 |
9 | 23859.0 | 23403.0 | 22925.0 | 21928.0 | 21425.0 | 21481.0 | 21422.0 | 22284.0 | 22650.0 |
10 | 22954.0 | 24155.0 | 23245.0 | 23101.0 | 21971.0 | 21527.0 | 21443.0 | 21864.0 | 22381.0 |
11 | 23521.0 | 23277.0 | 24094.0 | 23524.0 | 23299.0 | 22092.0 | 21601.0 | 22118.0 | 22102.0 |
12 | 23808.0 | 23656.0 | 22967.0 | 24027.0 | 23353.0 | 23166.0 | 21864.0 | 22091.0 | 22099.0 |
reference.major_axis[(reference.minor_xs('town')=='ballymoney').any(axis=1)]
Int64Index([3210133, 3230075, 3420012], dtype='int64', name='de ref')
However... we do get towns and administrative boundaries...
if not Path('data/gztr.csv').exists():
gazetteer = "45341066-1998-4116-b234-6223e7e19397"
gztr = pd.DataFrame.from_records(dataset_generator(gazetteer))
gztr['town'] = gztr['PLACENAME'].str.lower()
gztr.to_csv('data/gztr.csv', index=False)
else:
gztr = pd.read_csv('data/gztr.csv')
town_lookup = gztr.set_index('town')[['X','Y']].astype(int)
town_lookup.head()
X | Y | |
---|---|---|
town | ||
acton | 305600 | 341200 |
aghadowey | 285900 | 420900 |
aghalee | 312300 | 365000 |
agivey | 289900 | 422900 |
ahoghill | 305000 | 401700 |
f,ax = plt.subplots(figsize=graph_figsize)
town_lookup.plot.scatter(x='X',y='Y', ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x12874ec18>
base_map_zip="http://osni-spatial-ni.opendata.arcgis.com/datasets/d9dfdaf77847401e81efc9471dcd09e1_0.zip"
base_map_shp = "OSNI_Open_Data_Largescale_Boundaries__NI_Outline.shp"
if not Path('data/'+base_map_shp).exists():
urllib.request.urlretrieve(base_map_zip, 'data/_tmp.zip')
with zipfile.ZipFile('data/_tmp.zip') as z:
z.extractall('data/')
Path('data/_tmp.zip').unlink()
base_map=gp.GeoDataFrame.from_file('data/'+base_map_shp)
f,ax = plt.subplots(figsize=graph_figsize)
base_map.plot(ax=ax, color='gray')
<matplotlib.axes._subplots.AxesSubplot at 0x1a31fd0128>
from shapely.geometry import Point
from pyproj import Proj, transform
prj_wgs = Proj(proj='latlong',datum='WGS84')
prj_itm = Proj(init='EPSG:29903')
def apply_itm_to_wgs(row, e='eastings', n='northings'):
eastings, northings = row[e], row[n]
if eastings is not None and northings is not None:
lat,long = transform(prj_itm, prj_wgs, eastings, northings)
else:
lat,long = None,None
return pd.Series({'longitude':long, 'latitude':lat})
geometry= town_lookup.apply(lambda r: Point(apply_itm_to_wgs(r, e='X',n='Y')), axis=1)
town_lookup = gp.GeoDataFrame(town_lookup, crs={'init': 'epsg:4326'},
geometry=geometry)
town_lookup.plot(figsize=graph_figsize)
<matplotlib.axes._subplots.AxesSubplot at 0x1a31c03fd0>
f,ax = plt.subplots(figsize=graph_figsize)
base_map.plot(ax=ax, color="lightgray")
town_lookup.plot(ax=ax, color='darkgray')
<matplotlib.axes._subplots.AxesSubplot at 0x1a31fb2080>
towns = set(reference.minor_xs('town').values.ravel('k'))
en_df= pd.DataFrame.from_dict({
town:
enrolment[:,reference.major_axis[(reference.minor_xs('town')==town).any(axis=1)],:].sum().sum()
for town in towns
}).T.drop(np.nan)
av_df= pd.DataFrame.from_dict({
town:
available[:,reference.major_axis[(reference.minor_xs('town')==town).any(axis=1)],'available places'].sum()
for town in towns
}).T.drop(np.nan)
town_stats = gp.GeoDataFrame(geometry=geometry)
for c in av_df:
town_stats[f"av_{c}"] = av_df[c]
town_stats[f"av_{c}_rat"] = (av_df[c]/en_df[c])
for c in en_df:
town_stats[f"en_{c}"] = en_df[c]
town_stats[["av_2017","en_2017", "av_2017_rat"]]\
.dropna().sort_values('av_2017_rat').sort_values('en_2017').head()
av_2017 | en_2017 | av_2017_rat | |
---|---|---|---|
town | |||
dungiven | 18.0 | 83.0 | 0.216867 |
crumlin | 286.0 | 126.0 | 2.269841 |
aughnacloy | 86.0 | 166.0 | 0.518072 |
crossgar | 230.0 | 235.0 | 0.978723 |
tandragee | 68.0 | 287.0 | 0.236934 |
f,ax = plt.subplots(figsize = graph_figsize)
base_map.plot(color='lightgray', ax=ax)
town_stats.plot(column="en_2017",
ax=ax, legend=True,
cmap='RdYlGn_r')
<matplotlib.axes._subplots.AxesSubplot at 0x1289bda90>
Need some regions
cons_map_zip="http://osni-spatial-ni.opendata.arcgis.com/datasets/563dc2ec3d9943428e3fe68966d40deb_3.zip"
cons_map_shp = "OSNI_Open_Data_Largescale_Boundaries__Parliamentary_Constituencies_2008.shp"
if not Path('data/'+cons_map_shp).exists():
urllib.request.urlretrieve(cons_map_zip, 'data/_tmp.zip')
with zipfile.ZipFile('data/_tmp.zip') as z:
z.extractall('data/')
Path('data/_tmp.zip').unlink()
cons_map=gp.GeoDataFrame.from_file('data/'+cons_map_shp)
cons_map.rename(columns={'PC_NAME':'constituency'}, inplace=True)
cons_map.drop(['OBJECTID','PC_ID'], axis=1, inplace=True)
cons_map['constituency'] = cons_map['constituency'].str.lower().str.strip()
cons_map.set_index('constituency', inplace=True)
f,ax = plt.subplots(figsize=graph_figsize)
cons_map.plot(ax=ax)
cons_map.apply(lambda x: ax.annotate(s=x.name,
xy=x.geometry.centroid.coords[0],
ha='center'),
axis=1);
cons_lookup = {
r.name: list(
town_stats.index[town_stats.geometry.intersects(r.geometry)]
)
for i, r in cons_map.iterrows()
}
cons_lookup
{'belfast east': ['dundonald'], 'belfast north': ['newtownabbey'], 'belfast south': ['belfast', 'carryduff'], 'belfast west': ['dunmurry'], 'east antrim': ['ballycarry', 'ballygalley', 'ballylumford', 'beltoy', 'carncastle', 'carnlough', 'carrickfergus', 'cushendall', 'cushendun', 'eden', 'glenariff', 'glenarm', 'glenoe', 'glynn', 'greenisland', 'larne', 'magheramorne', 'millbrook', 'monkstown', 'whiteabbey', 'whitehead'], 'east londonderry': ['aghadowey', 'agivey', 'articlave', 'ballykelly', 'bolea', 'bovedy', 'bovevagh', 'castlerock', 'claudy', 'coleraine', 'downhill', 'drumsurn', 'dungiven', 'feeny', 'garvagh', 'gortnahey', 'greysteel', 'kilrea', 'limavady', 'macosquin', 'milltown', 'milltown', 'moneydig', 'mullan', 'portballintrae', 'portrush', 'portstewart'], 'fermanagh and south tyrone': ['augher', 'aughnacloy', 'ballinamallard', 'ballygawley', 'bellanaleck', 'belleek', 'benburb', 'boho', 'brookeborough', 'caledon', 'castlecaulfield', 'clogher', 'clonfeacle', 'derrygonnelly', 'donagh', 'dungannon', 'ederney', 'eglish', 'enniskillen', 'fivemiletown', 'garrison', 'irvinestown', 'kesh', 'killadeas', 'kinawley', 'lack', 'lisbellaw', 'lisnaskea', 'maguiresbridge', 'minterburn', 'moy', 'moygashel', 'newtownbutler', 'rosslea', 'tedd', 'tempo'], 'foyle': ['drumahoe', 'eglinton', 'londonderry', 'new buildings'], 'lagan valley': ['aghalee', 'annahilt', 'baileysmill', 'ballyskeagh', 'boardmills', 'culcavy', 'dromara', 'dromore', 'drumbeg', 'drumbo', 'finnis', 'hillsborough', 'lambeg', 'lisburn', 'maghaberry', 'mazetown', 'moira', 'the temple', 'waringsford'], 'mid ulster': ['ballyronan', 'bellaghy', 'cappagh', 'castledawson', 'coagh', 'coalisland', 'cookstown', 'desertmartin', 'donaghmore', 'draperstown', 'dunnamore', 'gulladuff', 'knockcloghrim', 'maghera', 'magherafelt', 'moneymore', 'moneyneany', 'newferry', 'newmills', 'orritor', 'pomeroy', 'sandholes', 'stewartstown', 'swatragh', 'tamlaght', 'the diamond', 'the rock', 'tobermore', 'upperlands'], 'newry and armagh': ['acton', 'armagh', 'belcoo', 'belleek', 'bessbrook', 'blackwatertown', 'camlough', 'carnagh', 'charlemont', 'clare', 'creggan', 'crossmaglen', 'cullaville', 'cullyhanna', 'darkley', 'flurrybridge', 'forkhill', 'hamiltonsbawn', 'jerrettspass', 'jonesborough', 'keady', 'killylea', 'kingsmills', 'laurelvale', 'loughgall', 'markethill', 'middletown', 'milltown', 'newry', 'newtownhamilton', 'poyntz pass', 'richhill', 'tandragee', 'tynan'], 'north antrim': ['ahoghill', 'armoy', 'ballintoy', 'ballybogy', 'ballycastle', 'ballymena', 'ballymoney', 'balnamore', 'bendooragh', 'broughshane', 'buckna', 'bushmills', 'clogh mills', 'clogh', 'corkey', 'craigs', 'cullybackey', 'dervock', 'dunloy', 'dunseverick', 'galgorm', 'garryduff', 'gracehill', '"kells connor"', 'killybegs', 'kilraghts', 'loughguile', 'martinstown', 'moss-side', 'portglenone', 'rasharkin', 'stranocum', 'the drones', 'the dry arch'], 'north down': ['bangor', 'conlig', 'craigavad', 'crawfordsburn', 'cultra', 'donaghadee', 'groomsport', "helen's bay", 'holywood', 'millisle', 'six road end'], 'south antrim': ['antrim', 'ballyclare', 'carnmoney', 'clady', 'crumlin', 'doagh', 'dundrod', 'glenavy', 'killead', 'mallusk', 'moneyglass', 'mossley', 'muckamore', 'randalstown', 'templepatrick', 'the diamond', 'toome'], 'south down': ['annaclone', 'annacloy', 'annadorn', 'annalong', 'annsborough', 'ardglass', 'attical', 'ballyhornan', 'ballymartin', 'ballyroney', 'ballyward', 'bishops court', 'burren', 'castlewellan', 'clough', 'crossgar', 'downpatrick', 'drumaness', 'greencastle', 'hilltown', 'katesbridge', 'kilcoo', 'kilkeel', 'killough', 'loughinisland', 'maghera', 'mayobridge', 'milltown', 'newcastle', 'rathfriland', 'rostrevor', 'saul', 'seaforde', 'strangford', 'the spa', 'warrenpoint'], 'strangford': ['ballygowan', 'ballyhalbert', 'ballynahinch', 'ballywalter', 'carrowdore', 'comber', 'derryboy', 'greyabbey', 'killinchy', 'killyleagh', 'kircubbin', 'kirkistown', 'moneyreagh', 'newtownards', 'portaferry', 'raffrey', 'saintfield', 'shrigley'], 'upper bann': ['banbridge', 'craigavon', 'dollingstown', 'donaghcloney', 'dunbarton', 'gilford', 'lawrencetown', 'lurgan', 'magheralin', 'portadown', 'scarva', 'scotch street', 'waringstown'], 'west tyrone': ['artigarvan', 'ballymagorry', 'beragh', 'carrickmore', 'castlederg', 'clady', 'cranagh', 'dromore', 'drumquin', 'dunnamanagh', 'fintona', 'garvaghy', 'gortin', 'greencastle', 'killen', 'magheramason', 'milltown', 'mountfield', 'mountjoy', 'newtownstewart', 'omagh', 'plumbridge', 'seskinore', 'sion mills', 'sixmilecross', 'strabane', 'trillick', 'victoria bridge']}
reference.minor_xs('parliamentary constituency')
2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | |
---|---|---|---|---|---|---|---|---|---|
de ref | |||||||||
1210014 | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east |
1210015 | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east |
1210021 | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north |
1210022 | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north |
1210266 | belfast east | belfast east | belfast east | belfast east | belfast east | NaN | NaN | NaN | NaN |
1230026 | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | NaN |
1230053 | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west |
1230089 | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | NaN |
1230104 | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north |
1230130 | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west |
1230146 | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west |
1230155 | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west |
1230173 | belfast north | belfast north | belfast north | belfast north | NaN | NaN | NaN | NaN | NaN |
1230182 | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west |
1230262 | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west |
1230275 | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south |
1230321 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | belfast north |
1240291 | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west |
1260269 | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north |
1260294 | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south |
1410079 | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east |
1410270 | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south |
1410315 | NaN | NaN | NaN | NaN | NaN | belfast east | belfast east | belfast east | belfast east |
1420018 | belfast east | belfast east | belfast east | belfast east | belfast east | NaN | NaN | NaN | NaN |
1420020 | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east | belfast east |
1420021 | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west |
1420022 | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south |
1420027 | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south | belfast south |
1420028 | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north | belfast north |
1420029 | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west | belfast west |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5230152 | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone |
5230157 | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh |
5230160 | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | NaN | NaN | NaN |
5230167 | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh |
5230187 | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh |
5230192 | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster |
5230213 | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann |
5230218 | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh |
5230256 | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | NaN |
5230278 | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster |
5230293 | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone |
5230321 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | upper bann |
5250216 | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann |
5260285 | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann |
5260286 | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone |
5410013 | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann |
5410057 | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann |
5410067 | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann |
5420045 | south down | south down | south down | south down | south down | south down | south down | south down | south down |
5420056 | upper bann | upper bann | upper bann | upper bann | upper bann | upper bann | NaN | NaN | NaN |
5420059 | newry and armagh | newry and armagh | south down | south down | south down | south down | south down | south down | south down |
5420060 | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh |
5420062 | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh |
5420073 | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster | mid ulster |
5420076 | newry and armagh | newry and armagh | south down | south down | south down | south down | south down | south down | south down |
5420260 | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone |
5420263 | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh |
5420268 | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh | newry and armagh |
5420304 | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone | fermanagh and south tyrone |
5420314 | NaN | NaN | NaN | NaN | NaN | NaN | upper bann | upper bann | upper bann |
232 rows × 9 columns
cons = set(reference.minor_xs('parliamentary constituency').values.ravel('k'))
en_df= pd.DataFrame.from_dict({
con:
enrolment[:,reference.major_axis[(reference.minor_xs('parliamentary constituency')==con).any(axis=1)],:].sum().sum()
for con in cons
}).T.drop(np.nan)
av_df= pd.DataFrame.from_dict({
con:
available[:,reference.major_axis[(reference.minor_xs('parliamentary constituency')==con).any(axis=1)],'available places'].sum()
for con in cons
}).T.drop(np.nan)
cons_stats = gp.GeoDataFrame(geometry=cons_map.geometry)
for c in av_df:
cons_stats[f"av_{c}"] = av_df[c]
cons_stats[f"av_{c}_rat"] = (av_df[c]/en_df[c])
for c in en_df:
cons_stats[f"en_{c}"] = en_df[c]
cons_stats[f"en_{c}_pk"] = en_df[c]/cons_map["Area_sqkm"]
f,ax = plt.subplots(figsize=graph_figsize)
ax.set_title("Per Constituency Total Enrolments [2017]")
cons_stats.plot(column='en_2017',ax=ax, legend=True)
cons_stats.apply(lambda x: ax.annotate(s=x.name, xy=x.geometry.centroid.coords[0], ha='center', color='white'),axis=1);
f,ax = plt.subplots(figsize=graph_figsize)
ax.set_title("Per Constituency Total Enrolments per $km^2$[2017]")
cons_stats.plot(column='en_2017_pk',ax=ax, legend=True)
cons_stats.apply(lambda x: ax.annotate(s=x.name, xy=x.geometry.centroid.coords[0], ha='center', color='white'),axis=1);
cons_stats["en_2017_pk"].sort_values(ascending=False)
constituency belfast south 204.103113 belfast north 188.638143 belfast west 168.999917 belfast east 142.627717 foyle 50.799751 north down 49.887102 upper bann 20.446435 lagan valley 11.912162 newry and armagh 11.243427 strangford 10.671039 east antrim 10.611417 south down 8.404405 south antrim 7.097824 north antrim 6.583296 mid ulster 6.462817 east londonderry 5.307115 fermanagh and south tyrone 3.857524 west tyrone 3.768899 Name: en_2017_pk, dtype: float64
Gather data from other sources to put things in context;
How many people are in each region?
# This is very very slow, not my fault!
if not Path('data/cons_pop.csv').exists():
population_resource = "67c25586-b9aa-4717-9a4b-42de21a403f2"
cons_pop = pd.DataFrame.from_records(dataset_generator(population_resource))
cons_pop.set_index('_id', inplace=True)
cons_pop.to_csv('data/cons_pop.csv', index=False)
cons_pop = pd.read_csv('data/cons_pop.csv')
cons_pop['Mid_Year_Ending'] = cons_pop.Mid_Year_Ending.astype(int)
cons_pop['Population_Estimate'] = cons_pop.Population_Estimate.astype(float)
cons_pop['Age'] = cons_pop.Age.astype(int)
cons_pop.rename(columns={'Geo_Name':'constituency'}, inplace=True)
cons_pop['constituency']= cons_pop.constituency.str.strip().str.lower()
f,ax = plt.subplots(figsize=graph_figsize)
ax.set_title('Population Distribution (2016)')
cons_pop[(cons_pop.Gender == 'All People') & (cons_pop.Mid_Year_Ending == 2016)]\
.groupby('Age')['Population_Estimate'].sum().plot.line(ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x1a33a0e5c0>
f,ax = plt.subplots(figsize=graph_figsize)
ax.set_title('Constituency Population Distribution (2016)')
cons_pop[(cons_pop.Gender == 'All People') & (cons_pop.Mid_Year_Ending == 2016)]\
.groupby(['Age','constituency'])['Population_Estimate'].sum().unstack().plot.bar(stacked=True, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x1a33a55c88>
def idxquantile(s, q=0.5, *args, **kwargs):
qv = s.quantile(q, *args, **kwargs)
return (s.sort_values()[::-1] <= qv).idxmax()
f,ax = plt.subplots(figsize=graph_figsize)
ax.set_title('Cumulative Population Distribution (2016)')
pop_cumsum = cons_pop[(cons_pop.Gender == 'All People') & (cons_pop.Mid_Year_Ending == 2016)]\
.groupby('Age')['Population_Estimate'].sum().cumsum()
pop_cumsum.plot(ax=ax)
age_quartiles = [idxquantile(pop_cumsum, q) for q in [0.25,0.5,0.75]]
ax.vlines(age_quartiles, ymin=pop_cumsum.min(), ymax = pop_cumsum[age_quartiles])
age_quartiles
[22, 45, 67]
f,ax = plt.subplots(figsize=graph_figsize)
ax.set_title('Cumulative Population Distribution (2009-2016)')
yrs=sorted(cons_pop.Mid_Year_Ending.unique())
colors = plt.cm.jet(np.linspace(0,1,len(yrs)))
for yr,c in zip(yrs,colors):
pop_cumsum = cons_pop[(cons_pop.Gender == 'All People') \
& (cons_pop.Mid_Year_Ending == yr)]\
.groupby('Age')['Population_Estimate'].sum().cumsum()
pop_cumsum.plot(ax=ax, label=yr, c=c)
age_quartiles = [idxquantile(pop_cumsum, q) for q in [0.25,0.5,0.75]]
ax.vlines(age_quartiles, ymin=pop_cumsum.min(), ymax = pop_cumsum[age_quartiles],color=c,alpha=0.2)
ax.legend()
<matplotlib.legend.Legend at 0x12a62ba20>
def constituency_age_profile(constituency=None):
# I have a hunch we might was to do this
# with lists....
if constituency is None:
constituency = cons_pop.constituency.unique()
label = f"Population profile for NI"
elif not isinstance(constituency, list):
label = f"Population profile for {constituency}"
constituency = [constituency]
else:
label = f"Population profile for {constituency}"
f,ax = plt.subplots(figsize=graph_figsize)
yrs=sorted(cons_pop.Mid_Year_Ending.unique())
colors = plt.cm.jet(np.linspace(0,1,len(yrs)))
for yr,c in zip(yrs,colors):
pop_cumsum = cons_pop[(cons_pop.Gender == 'All People') \
& (cons_pop.constituency.isin(constituency))\
& (cons_pop.Mid_Year_Ending == yr)]\
.groupby('Age')['Population_Estimate'].sum().cumsum()
pop_cumsum.plot(ax=ax, label=yr, c=c)
age_quartiles = [idxquantile(pop_cumsum, q) for q in [0.25,0.5,0.75]]
ax.vlines(age_quartiles, ymin=pop_cumsum.min(), ymax = pop_cumsum[age_quartiles],color=c,alpha=0.2)
for i,age in enumerate(age_quartiles):
ax.annotate(f"<={age}", xy=(age, i))
ax.legend()
ax.set_title(label)
constituency_age_profile('belfast south')
constituency_age_profile('newry and armagh')
constituency_age_profile('foyle')
constituency_age_profile('north antrim')
constituency_age_profile('lagan valley')
df = cons_pop[(cons_pop.Gender == 'All People') \
& (cons_pop.Mid_Year_Ending == 2016)]\
.groupby(['Age','constituency'])['Population_Estimate'].sum().unstack()
df.iplot()
f,ax = plt.subplots(figsize=graph_figsize)
sns.heatmap(df.T, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x1a2fd8b828>
def colnorm(df):
return (df-df.min())/(df.max()-df.min())
colnorm(df).iplot()
f,ax = plt.subplots(figsize=graph_figsize)
sns.heatmap(colnorm(df).T, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x1a2f4a2f98>
def year_age_profile(yr=2016):
constituencies = list(cons_pop.groupby("constituency")['Population_Estimate'].sum().sort_values(ascending=True).keys())
label = f"Population profile for NI in {yr}"
f,ax = plt.subplots(figsize=graph_figsize)
colors = plt.cm.jet(np.linspace(0,1,len(constituencies)))
for constituency,c in zip(constituencies,colors):
pop_cumsum = cons_pop[(cons_pop.Gender == 'All People') \
& (cons_pop.constituency==constituency)\
& (cons_pop.Mid_Year_Ending == yr)]\
.groupby('Age')['Population_Estimate'].sum().cumsum()
pop_cumsum.plot(ax=ax, label=constituency, c=c)
age_quartiles = [idxquantile(pop_cumsum, q) for q in [0.25,0.5,0.75]]
ax.vlines(age_quartiles, ymin=pop_cumsum.min(), ymax = pop_cumsum[age_quartiles],color=c,alpha=0.2)
ax.legend()
ax.set_title(label)
year_age_profile()
cons_pop_qilted=pd.DataFrame.from_dict(
{
qtile:cons_pop[(cons_pop.Gender == 'All People') \
& (cons_pop.Mid_Year_Ending == 2016) \
& (lotile<cons_pop.Age) &(cons_pop.Age<=qtile)
].groupby('constituency')['Population_Estimate'].sum()
for lotile, qtile in zip([-1]+age_quartiles, age_quartiles+[90])
}
)
cons_pop_qilted.idxmax(axis=1).sort_values()
constituency west tyrone 22 south down 22 south antrim 22 belfast west 22 east londonderry 22 newry and armagh 22 foyle 22 mid ulster 22 north antrim 45 belfast east 45 fermanagh and south tyrone 45 belfast south 45 belfast north 45 upper bann 45 lagan valley 45 east antrim 67 north down 67 strangford 67 dtype: int64
pop_years = cons_pop.Mid_Year_Ending.unique()
for c in en_df:
if c not in pop_years:
yr = pop_years[np.abs(pop_years-c).argmin()]
else:
yr=c
_cons_pop = cons_pop[(cons_pop.Gender != 'All People') \
& (cons_pop.Mid_Year_Ending == yr) \
].groupby(['constituency', 'Gender'])['Population_Estimate'].sum().unstack().rename(columns=lambda c: c.lower())
_cons_pop['total'] = _cons_pop.sum(axis=1)
_cons_pop['m_per_f'] = _cons_pop['males']/_cons_pop['females']
cons_pop_qilted=pd.DataFrame.from_dict(
{
qtile:cons_pop[(cons_pop.Gender == 'All People') \
& (cons_pop.Mid_Year_Ending == yr) \
& (lotile<cons_pop.Age) &(cons_pop.Age<=qtile)
].groupby('constituency')['Population_Estimate'].sum()
for lotile, qtile in zip([-1]+age_quartiles, age_quartiles+[90])
}
)
cons_stats[f"pop_{c}"] = _cons_pop['total']
cons_stats[f"pop_{c}_males"] = _cons_pop['males']
cons_stats[f"pop_{c}_females"] = _cons_pop['females']
cons_stats[f"pop_{c}_m_per_f"] = _cons_pop['m_per_f']
cons_stats[f"topqt_{c}"]=cons_pop_qilted.idxmax(axis=1)
cons_stats[f"en_{c}_pc"]=cons_stats[f"en_{c}"]/cons_pop_qilted.sum(axis=1)
cons_pop_mean=cons_pop[(cons_pop.Gender.isin(["All People"])) & (cons_pop.Mid_Year_Ending == yr)]
cons_pop_mean['popprod'] = cons_pop_mean[['Age','Population_Estimate']].product(axis=1)
cons_pop_mean=cons_pop_mean.groupby('constituency')[['popprod','Population_Estimate']].sum(axis=0)
cons_stats[f'age_{c}_avg'] = cons_pop_mean['popprod']/cons_pop_mean['Population_Estimate']
try:
cons_stats[f"av_{c}_pc"]=cons_stats[f"av_{c}"]/cons_pop_qilted.sum(axis=1)
except:
print(f"No data for av in {c}")
/Users/andrew.bolster/anaconda3/lib/python3.6/site-packages/ipykernel/__main__.py:30: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
No data for av in 2009
f,ax = plt.subplots(figsize=graph_figsize)
cons_stats.plot(column='age_2016_avg', legend=True, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x1a31170b38>
Who rules the roost
f,ax = plt.subplots(figsize=graph_figsize)
cons_stats.plot(column='topqt_2016', legend=True, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x1a2ecb2f28>
f,ax = plt.subplots(figsize=graph_figsize)
cons_stats.plot(column='pop_2016_m_per_f', legend=True, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x1a2fd8b240>
Lagan Valley and South Antrim popping up again!
f,ax = plt.subplots(figsize=graph_figsize)
cons_stats.plot(column='en_2016_pc', legend=True, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x1a2f654588>
f,ax = plt.subplots(figsize=graph_figsize)
cons_stats.plot(column='av_2016_pc', legend=True, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x1a31191908>
fsm_df= pd.DataFrame.from_dict({
con:
fsm[:,reference.major_axis[(reference.minor_xs('parliamentary constituency')==con).any(axis=1)],'fsme'].sum()
for con in cons
}).T.drop(np.nan)
for c in fsm_df:
cons_stats[f"fsm_{c}"] = fsm_df[c]
cons_stats[f"fsm_{c}_rat"] = (fsm_df[c]/en_df[c])
f,ax = plt.subplots(figsize=graph_figsize)
cons_stats.plot(column='fsm_2016_rat', legend=True, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x1a36881828>
unyearify = lambda c:'_'.join([c.split('_')[0]]+c.split('_')[2:])
cons_stats_2016 = cons_stats[[c for c in cons_stats.columns if '2016' in c]].rename(columns=unyearify)
sns.pairplot(data=cons_stats_2016, hue='topqt',
vars=['age_avg','fsm_rat','pop_m_per_f'])
<seaborn.axisgrid.PairGrid at 0x1a37ba0fd0>
import matplotlib.animation as animation
unyearify = lambda c:'_'.join([c.split('_')[0]]+c.split('_')[2:])
years = list(range(2009,2017))
explanations={
'age_avg': "Average Age",
'av_pc': "Available School Places per capita",
'av_rat': "Ratio of Available Places to Actualised Enrolment",
'av': "Available School Places",
'en_pc': "Enrolled Pupils per capita",
'en_pk': "Enrolled Pupils per $km^2$",
'en': "Enrolled Pupils",
'fsm_rat': "Ratio of FSM pupils to total pupils",
'fsm': "FSM pupils",
'pop': "Population",
'pop_females': "Population (F)",
'pop_males': "Population (M)",
'pop_m_per_f': "Ratio of Males to Females",
'topqt': "Most populous age quartile (Age or younger)"
}
for metric in set(map(unyearify, cons_stats.columns)):
for year in years:
try:
f, ax = plt.subplots(figsize=graph_figsize)
if metric in explanations:
ax.set_title(f"{explanations[metric]}:({year})")
else:
ax.set_title(f"{metric}:({year})")
cons_stats[[c for c in cons_stats.columns if str(year) in c]+['geometry']]\
.rename(columns=unyearify).plot(column=metric, legend=True, ax=ax)
ax.axis('off')
f.tight_layout()
p = Path(f"outputs/{metric}/{year}.png")
p.parent.mkdir(parents=True, exist_ok=True)
f.savefig(p)
plt.close(f)
fps = 12
except:
print(f"{metric}:{year} failed")
finally:
if f:
plt.close(f)
age_avg:2016 failed
av:2009 failed
av_pc:2009 failed
geometry:2009 failed
geometry:2010 failed
geometry:2011 failed
geometry:2012 failed
geometry:2013 failed
geometry:2014 failed
geometry:2015 failed
geometry:2016 failed
av_rat:2009 failed
import moviepy.editor as mpy
import moviepy.video as mpyv
clips = {}
for p in Path('outputs/').iterdir():
if not p.is_dir():
continue
metric = p.parts[-1]
file_list = [str(_p) for _p in sorted(filter(lambda s:s.suffix=='.png', p.iterdir()))]
clip = mpy.ImageSequenceClip(file_list, fps=1.5)
clip = mpyv.fx.all.freeze(clip, t='end', freeze_duration=2)
clip.write_gif(f"outputs/{metric}.gif")
[Path(f).unlink() for f in file_list]
p.rmdir()
clips[metric]=clip
[MoviePy] Building file outputs/Population Distribution.gif with imageio
92%|█████████▏| 11/12 [00:01<00:00, 10.61it/s]
10% fewer men than women in West Belfast, ~2% more men than women out west
There just aren't any where people want to send their kids...
Could possibly indicate over-provisioning in yellow areas...
clips.keys()
dict_keys(['av_pc', 'pop', 'pop_males', 'av', 'en_pk', 'en_pc', 'age_avg', 'fsm', 'pop_females', 'fsm_rat', 'pop_m_per_f', 'av_rat', 'en', 'topqt'])
def batch(iterable, n=1):
"""
Split an iterable into n-length batches
:param iterable:
:param n:
:return:
"""
parent_length = len(iterable)
for ndx in range(0, parent_length, n):
yield iterable[ndx:min(ndx + n, parent_length)]
ex_list = ['pop_males','pop_females']
all_gifs = [v for k,v in clips.items() if k not in ex_list]
mpy.clips_array(list(batch(all_gifs,4))
).write_gif('outputs/all.gif')
[MoviePy] Building file outputs/all.gif with imageio
92%|█████████▏| 11/12 [00:08<00:00, 1.25it/s]
Andrew Bolster
metric='Population Distribution'
for year in years:
try:
f, ax = plt.subplots(figsize=graph_figsize)
if metric in explanations:
ax.set_title(f"{explanations[metric]}:({year})")
else:
ax.set_title(f"{metric}:({year})")
df = cons_pop[(cons_pop.Gender == 'All People') \
& (cons_pop.Mid_Year_Ending == year)]\
.groupby(['Age','constituency'])['Population_Estimate']\
.sum().unstack()
sns.heatmap(df.T, ax=ax)
f.tight_layout()
p = Path(f"outputs/{metric}/{year}.png")
p.parent.mkdir(parents=True, exist_ok=True)
f.savefig(p)
plt.close(f)
fps = 12
except:
print(f"{metric}:{year} failed")
finally:
if f:
plt.close(f)
metric='Population Distribution (normed)'
for year in years:
try:
f, ax = plt.subplots(figsize=graph_figsize)
if metric in explanations:
ax.set_title(f"{explanations[metric]}:({year})")
else:
ax.set_title(f"{metric}:({year})")
df = cons_pop[(cons_pop.Gender == 'All People') \
& (cons_pop.Mid_Year_Ending == year)]\
.groupby(['Age','constituency'])['Population_Estimate']\
.sum().unstack()
sns.heatmap(colnorm(df.T), ax=ax)
f.tight_layout()
p = Path(f"outputs/{metric}/{year}.png")
p.parent.mkdir(parents=True, exist_ok=True)
f.savefig(p)
plt.close(f)
fps = 12
except:
print(f"{metric}:{year} failed")
finally:
if f:
plt.close(f)
import moviepy.editor as mpy
import moviepy.video as mpyv
clips = {}
for p in Path('outputs/').iterdir():
if not p.is_dir():
continue
metric = p.parts[-1]
file_list = [str(_p) for _p in sorted(filter(lambda s:s.suffix=='.png', p.iterdir()))]
clip = mpy.ImageSequenceClip(file_list, fps=1.5)
clip = mpyv.fx.all.freeze(clip, t='end', freeze_duration=2)
clip.write_gif(f"outputs/{metric}.gif")
[Path(f).unlink() for f in file_list]
p.rmdir()
clips[metric]=clip
[MoviePy] Building file outputs/Population Distribution.gif with imageio
92%|█████████▏| 11/12 [00:01<00:00, 10.36it/s]
[MoviePy] Building file outputs/Population Distribution (normed).gif with imageio
92%|█████████▏| 11/12 [00:01<00:00, 10.34it/s]
https://www.opendatani.gov.uk/dataset/ni-housing-stock
https://www.education-ni.gov.uk/sites/default/files/publications/de/the-code-of-practice.pdf
~ hundred page doc to work out that 5 is 'confirmed and fully stated special educational needs. Everthing less is 'suspected and in need of advice and support'
"Controlled and Maintained schools have [a budget] for this" "Grant maintained integrated schools also receive this through their annual budget"