In the last year, several worm hotels have popped up in Amsterdam. This grassroots initiative lets neighbourhoods turn green waste into compost.
Using open data from the municipality of Amsterdam, I will see what I can find out about these worm hotels. I will dig into the following:
Use a routing machine to calculate the shortest walking duration and distance to the nearest worm hotel for all addresses in Amsterdam. Keep only residences (and throw out companies, schools, museums and the like).
Create a function which takes in an address or coordinates, and returns the walking duration to the nearest worm hotel, and all information about the nearest worm hotel.
Then I can do things like:
Calculate the percentage of residences within a 5 minute walk of a worm hotel, and show a histogram.
Score neighbourhoods and boroughs on reachability, taking into account the population per neighbourhood. Show a choropleth with the scores.
The OSRM lookup could be much faster with duplicate detection. Pandas can then fill down the missing values.
I would like to turn the data part into an API and run a web front end on it, to look up addresses.
I'd like to create a matrix optimization function that returns the perfect placement for a new hotel (probably inside a designated area). I think I need to cover the area in random points and pick the one with the greatest walking distance to all nearby worm hotels.
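That last idea can already be sketched, with straight-line distance standing in for walking distance (a real version would query a router instead); the hotel coordinates and bounding box below are toy values, not real data:

```python
import random

random.seed(0)  # reproducible candidate points
hotels = [(4.888, 52.356), (4.908, 52.349), (4.871, 52.391)]  # toy hotel locations

def min_dist(p, sites):
    # Euclidean stand-in for walking distance; a real version would ask OSRM
    return min(((p[0] - s[0]) ** 2 + (p[1] - s[1]) ** 2) ** 0.5 for s in sites)

# Cover the designated area with random points and keep the one that is
# farthest from every existing hotel (maximise the minimum distance)
candidates = [(random.uniform(4.85, 4.95), random.uniform(52.33, 52.40))
              for _ in range(500)]
best = max(candidates, key=lambda p: min_dist(p, hotels))
```

With enough candidate points this max-min search converges on the spot worst served by the current hotels.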
import numpy as np
import pandas as pd
import requests
import folium
from folium.plugins import HeatMap
import geojson
import seaborn as sns
from decimal import Decimal, ROUND_HALF_UP
# Set interactivity to 'all' so I can easily print more than one output per cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# Suppress 'SettingWithCopyWarning'
pd.options.mode.chained_assignment = None # default='warn'
# Load the buurtcompost (worm hotel) locations from the Amsterdam open data site
df_buurtcompost_import = pd.read_csv('http://maps.amsterdam.nl/open_geodata/excel.php?KAARTLAAG=BUURTCOMPOST&THEMA=buurtcompost', encoding='latin-1', sep=";")
df_buurtcompost = df_buurtcompost_import
# Clean up the coordinates column: I need a comma-separated coordinate
# string with decimal points, formatted as '{lng},{lat}'.
df_buurtcompost['locatie_cleaned'] = df_buurtcompost_import['COORDS'].str.replace('POINT(', '', regex=False).str.replace(')', '', regex=False)
# Create a function that adds lng and lat float columns from the 'LNG' and 'LAT' columns
def create_lng_lat_columns(df):
    df['lng'] = df['LNG'].str.replace(',', '.').astype(float)
    df['lat'] = df['LAT'].str.replace(',', '.').astype(float)
    return df
df_buurtcompost = create_lng_lat_columns(df_buurtcompost)
df_buurtcompost.head()
 | OBJECTNUMMER | Straatnaam | Initiatiefnemer | Email | Aantal_bewoners | Soort_afval | Gebruik | Startjaar | Foto | COORDS | LNG | LAT | Unnamed: 12 | locatie_cleaned | lng | lat |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 4 | Frans Halsstraat | Peter Jan Brouwer | amsterdam+C001@wormenhotel.nl | 20 | GF | Toestemming van initiatiefnemer | 2015 | frans hals .jpeg | POINT(4.888706,52.356007) | 4,888706 | 52,356007 | NaN | 4.888706,52.356007 | 4.888706 | 52.356007 |
1 | 5 | hoek saffierstraat jozef israelkade | christien & anneke | amsterdam+c002@wormenhotel.nl | 7 | GF | Toestemming van initiatiefnemer | 2016 | NaN | POINT(4.908527,52.349857) | 4,908527 | 52,349857 | NaN | 4.908527,52.349857 | 4.908527 | 52.349857 |
2 | 6 | zaanstraat t/o 300 | soeptuinen bredius | www.soeptuinen.nl | 0 | GFT | Vrij inleveren | 2016 | NaN | POINT(4.871422,52.391292) | 4,871422 | 52,391292 | NaN | 4.871422,52.391292 | 4.871422 | 52.391292 |
3 | 7 | kramatweg 51 | oost indisch groen | info@oostindischgroen.nl | 0 | GFT | Vrij inleveren | 2016 | NaN | POINT(4.945789,52.36213) | 4,945789 | 52,36213 | NaN | 4.945789,52.36213 | 4.945789 | 52.362130 |
4 | 8 | IJplein | buurtbak voedseltuin | ireen@balkonton.nl | 6 | GF | Toestemming van initiatiefnemer | 2016 | NaN | POINT(4.910984,52.382154) | 4,910984 | 52,382154 | NaN | 4.910984,52.382154 | 4.910984 | 52.382154 |
Let's do some mapping! I'll use Folium to plot all hotels, and also plot the approximate location of my house in Amsterdam.
# Temporarily turn off showing all output for folium mapping; needs to be in a separate cell
InteractiveShell.ast_node_interactivity = "last_expr"
buurtcompostlocaties_markers = list(zip(df_buurtcompost.lat.values, df_buurtcompost.lng.values, df_buurtcompost.Straatnaam))
hmap = folium.Map(location=[52.3824856,4.877951499999995], zoom_start=12, tiles='stamentoner',)
feature_group = folium.FeatureGroup("Locations")
for lat, lon, name in buurtcompostlocaties_markers:
feature_group.add_child(folium.Marker(location=[lat,lon],popup=name))
my_house = folium.FeatureGroup("Locations")
my_house.add_child(folium.Marker([52.381994, 4.876480], popup="My house (approx.)", icon=folium.Icon(color='red',icon='info-sign')))
hmap.add_child(my_house)
hmap.add_child(feature_group)
hmap.save('01-locations.html')
hmap
I can see a worm hotel is near the house (it's so close by that the red marker is obscured by it).
First, let's see how much of Amsterdam is covered within a 5 minute walk from the worm hotels by looking at some isochrones.
There is no way to plot isochrones with OSRM, so I'll use Openrouteservice instead. Its free tier only allows 5 locations per call, so this will be a bit clunky.
InteractiveShell.ast_node_interactivity = "all" # Turn on all output again
# Create a list of coordinate strings in '{lng},{lat}' format
buurtcompostlocaties_arrays = list(df_buurtcompost.locatie_cleaned)
# Divide length by 5 to see how many calls I need to make to openrouteservice.org. Shows 7.8, so 8 calls.
len(buurtcompostlocaties_arrays) / 5
# Put groups of up to 5 coordinates together in arrays for building the API request URLs
buurtcompostlocaties_arrays_5 = [buurtcompostlocaties_arrays[i:i+5] for i in range(0, len(buurtcompostlocaties_arrays), 5)]
# print('Calls I need to make to openrouteservice API: ' + str(len(buurtcompostlocaties_arrays_5))) # Shows 8
7.8
# Create a function to create the GeoJSON files with isochrones
def create_isochrone_json(range_seconds, filename):
    isochrone_features = []
    for i in range(len(buurtcompostlocaties_arrays_5)):
        coords = '|'.join(buurtcompostlocaties_arrays_5[i])
        requestURL = 'https://private-anon-e7e15c7342-openrouteservice.apiary-proxy.com/isochrones' + \
            '?api_key=58d904a497c67e00015b45fc892c214578cd4fae9ea5f36e23e8cb2f&profile=foot-walking' + \
            '&range=' + range_seconds + '&location_type=destination&locations=' + coords
        features = requests.get(requestURL).json()['features']
        isochrone_features.append(features)
    isochrone_featurecollection = {
        "type": "FeatureCollection",
        "features": []
    }
    for i in range(len(isochrone_features)):
        isochrone_featurecollection['features'].extend(isochrone_features[i])
    with open(filename, 'w') as file:
        geojson.dump(isochrone_featurecollection, file, indent=4, sort_keys=False)
# Create isochrone GeoJSON
create_isochrone_json('300', 'data/isochrone_featurecollection_300.geojson')
create_isochrone_json('600', 'data/isochrone_featurecollection_600.geojson')
InteractiveShell.ast_node_interactivity = "last_expr"
buurtcompostlocaties_markers = list(zip(df_buurtcompost.lat.values, df_buurtcompost.lng.values, df_buurtcompost.Straatnaam))
hmap = folium.Map(location=[52.3824856,4.877951499999995], zoom_start=12, tiles='stamentoner',)
feature_group = folium.FeatureGroup("Locations")
for lat, lon, name in buurtcompostlocaties_markers:
feature_group.add_child(folium.Marker(location=[lat,lon],popup=name))
hmap.add_child(feature_group)
folium.GeoJson(open("data/isochrone_featurecollection_300.geojson",encoding = "utf-8-sig").read(), name='geojson').add_to(hmap)
# folium.GeoJson(open("data/isochrone_featurecollection_600.geojson",encoding = "utf-8-sig").read(), name='geojson').add_to(hmap)
hmap.save('02-locations-isochrones.html')
hmap
InteractiveShell.ast_node_interactivity = "all" # Turn on all output again
Some neighbourhoods have decent coverage. It's notable that the city center does not have any hotels. Let's look at the data on all addresses in Amsterdam!
# Loading from disk because the file is 17 MB; loading it from the site takes too long
df_adressen = pd.read_csv('data/export_20180227_122333.zip', encoding='latin-1', sep=";")
# Convert lat and long values to float
df_adressen['lng'] = df_adressen['Longitude (WGS84)'].str.replace(',', '.').astype(float)
df_adressen['lat'] = df_adressen['Latitude (WGS84)'].str.replace(',', '.').astype(float)
# A few rows have NaN as longitude or latitude; let's remove those
df_adressen = df_adressen[pd.notnull(df_adressen['lng'])]
df_adressen = df_adressen[pd.notnull(df_adressen['lat'])]
# Now combine the coordinates into a string
df_adressen['locatie_cleaned'] = df_adressen['lng'].map(str) + "," + df_adressen['lat'].map(str)
df_adressen.head()
df_adressen['locatie_cleaned'].describe()
Naam openbare ruimte | Huisnummer | Huisletter | Huisnummertoevoeging | Postcode | Woonplaats | Naam stadsdeel | Code stadsdeel | Naam gebiedsgerichtwerkengebied | Code gebiedsgerichtwerkengebied | ... | Verblijfsobjectstatus | Openbareruimte-identificatie | Pandidentificatie | Verblijfsobjectidentificatie | Ligplaatsidentificatie | Standplaatsidentificatie | Nummeraanduidingidentificatie | lng | lat | locatie_cleaned | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Eerste Constantijn Huygensstraat | 15 | NaN | 1 | 1054BP | NaN | West | E | Oud West / De Baarsjes | DX05 | ... | Verblijfsobject in gebruik | 363300000002537 | ['0363100012159829'] | 3.630100e+14 | NaN | NaN | 363200000007060 | 4.874401 | 52.364906 | 4.8744013,52.3649059 |
1 | Eerste Helmersstraat | 188 | NaN | 3 | 1054EL | NaN | West | E | Oud West / De Baarsjes | DX05 | ... | Verblijfsobject in gebruik | 363300000002541 | ['0363100012120031'] | 3.630100e+14 | NaN | NaN | 363200000007846 | 4.864202 | 52.360826 | 4.8642015,52.3608257 |
2 | Eerste Anjeliersdwarsstraat | 1 | NaN | 2 | 1015NR | NaN | Centrum | A | Centrum-West | DX01 | ... | Verblijfsobject in gebruik | 363300000002529 | ['0363100012174784'] | 3.630100e+14 | NaN | NaN | 363200000006112 | 4.883470 | 52.378220 | 4.8834704,52.3782196 |
3 | Eerste Constantijn Huygensstraat | 19 | NaN | 2 | 1054BP | NaN | West | E | Oud West / De Baarsjes | DX05 | ... | Verblijfsobject in gebruik | 363300000002537 | ['0363100012157570'] | 3.630100e+14 | NaN | NaN | 363200000007066 | 4.874329 | 52.364798 | 4.8743292,52.3647977 |
4 | Eerste Helmersstraat | 190 | NaN | 3 | 1054EL | NaN | West | E | Oud West / De Baarsjes | DX05 | ... | Verblijfsobject in gebruik | 363300000002541 | ['0363100012120432'] | 3.630100e+14 | NaN | NaN | 363200000007852 | 4.864099 | 52.360816 | 4.8640988,52.3608163 |
5 rows × 34 columns
count                   510436
unique                  314880
top       4.8507211,52.3574849
freq                       145
Name: locatie_cleaned, dtype: object
Let's calculate who lives inside the isochrones I calculated earlier. Using the even–odd rule I can determine if a point is inside a polygon or not.
# Put geojson in variable
with open('data/isochrone_featurecollection_300.geojson') as geojson_file:
isochrones_300 = geojson.load(geojson_file)
# Create an even-odd rule function, see https://en.wikipedia.org/wiki/Even–odd_rule
def is_point_in_path(x, y, poly):
    """
    x, y -- x and y coordinates of point
    poly -- a list of tuples [(x, y), (x, y), ...]
    """
    num = len(poly)
    j = num - 1
    c = False
    for i in range(num):
        if ((poly[i][1] > y) != (poly[j][1] > y)) and \
           (x < poly[i][0] + (poly[j][0] - poly[i][0]) *
                (y - poly[i][1]) / (poly[j][1] - poly[i][1])):
            c = not c
        j = i
    return c
# Create a function that outputs an array with the names of the nearest worm hotels, or 'Not found'
def is_point_in_isochrones(isochrone_geojson, lng, lat):
    geos = []
    for i in range(len(isochrone_geojson['features'])):
        geos.append(list(geojson.utils.coords(isochrone_geojson['features'][i])))
    coordinates = []
    for i in range(len(geos)):
        if is_point_in_path(lng, lat, geos[i]):
            coordinates.append(df_buurtcompost.Straatnaam.iloc[i])
    return coordinates or ["Not found"]
# Create a function with a single argument I can apply to the pandas dataframe
def is_point_in_isochrones_300(locatie_cleaned):
    array_string = locatie_cleaned.split(',')
    lng = float(array_string[0])
    lat = float(array_string[1])
    return is_point_in_isochrones(isochrones_300, lng, lat)
is_point_in_isochrones(isochrones_300, 4.876480, 52.381994) # Near my house
is_point_in_isochrones(isochrones_300, 4.88718,52.357117) # Address which is within 3 polygons
is_point_in_isochrones_300(df_adressen.locatie_cleaned[4433]) # Testing the function to use with Pandas
['hoek jacob catskade - de wittenstraat']
['Frans Halsstraat', 'daniel stalpertstraat', 'quellijnstraat']
['Frans Halsstraat', 'sarphatipark/sweelinckstraat']
# Calculate this for all residences. Warning: this will take a while (10 minutes on my computer)
df_adressen['wormhotel_within_5_minutes'] = df_adressen['locatie_cleaned'].apply(
    is_point_in_isochrones_300)
df_adressen.head()
Naam openbare ruimte | Huisnummer | Huisletter | Huisnummertoevoeging | Postcode | Woonplaats | Naam stadsdeel | Code stadsdeel | Naam gebiedsgerichtwerkengebied | Code gebiedsgerichtwerkengebied | ... | Openbareruimte-identificatie | Pandidentificatie | Verblijfsobjectidentificatie | Ligplaatsidentificatie | Standplaatsidentificatie | Nummeraanduidingidentificatie | lng | lat | locatie_cleaned | wormhotel_within_5_minutes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Eerste Constantijn Huygensstraat | 15 | NaN | 1 | 1054BP | NaN | West | E | Oud West / De Baarsjes | DX05 | ... | 363300000002537 | ['0363100012159829'] | 3.630100e+14 | NaN | NaN | 363200000007060 | 4.874401 | 52.364906 | 4.8744013,52.3649059 | [Not found] |
1 | Eerste Helmersstraat | 188 | NaN | 3 | 1054EL | NaN | West | E | Oud West / De Baarsjes | DX05 | ... | 363300000002541 | ['0363100012120031'] | 3.630100e+14 | NaN | NaN | 363200000007846 | 4.864202 | 52.360826 | 4.8642015,52.3608257 | [Not found] |
2 | Eerste Anjeliersdwarsstraat | 1 | NaN | 2 | 1015NR | NaN | Centrum | A | Centrum-West | DX01 | ... | 363300000002529 | ['0363100012174784'] | 3.630100e+14 | NaN | NaN | 363200000006112 | 4.883470 | 52.378220 | 4.8834704,52.3782196 | [Not found] |
3 | Eerste Constantijn Huygensstraat | 19 | NaN | 2 | 1054BP | NaN | West | E | Oud West / De Baarsjes | DX05 | ... | 363300000002537 | ['0363100012157570'] | 3.630100e+14 | NaN | NaN | 363200000007066 | 4.874329 | 52.364798 | 4.8743292,52.3647977 | [Not found] |
4 | Eerste Helmersstraat | 190 | NaN | 3 | 1054EL | NaN | West | E | Oud West / De Baarsjes | DX05 | ... | 363300000002541 | ['0363100012120432'] | 3.630100e+14 | NaN | NaN | 363200000007852 | 4.864099 | 52.360816 | 4.8640988,52.3608163 | [Not found] |
5 rows × 35 columns
# Also create a true/false column for easy plotting
def wormhotel_within_5_minutes_true_false(array):
    if array[0] == "Not found":
        return False
    else:
        return True
df_adressen['is_wormhotel_within_5_minutes'] = df_adressen['wormhotel_within_5_minutes'].apply(wormhotel_within_5_minutes_true_false)
We're only interested in residences (addresses where people live). Let's select those.
df_adressen_woning = df_adressen[df_adressen['Feitelijk gebruik'].str.contains("woning", na=False)]
len(df_adressen_woning)
df_adressen_woning.is_wormhotel_within_5_minutes.value_counts()
p = sns.countplot(data=df_adressen_woning, x = 'is_wormhotel_within_5_minutes')
438028
False    334937
True     103091
Name: is_wormhotel_within_5_minutes, dtype: int64
This is quite surprising! 24% of all residence addresses are within a 5 minute walk of a worm hotel.
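For reference, the 24% comes straight from the boolean column; using the counts shown above, it can be reproduced like this (the Series here is a stand-in for the real `is_wormhotel_within_5_minutes` column):

```python
import pandas as pd

# Stand-in for df_adressen_woning['is_wormhotel_within_5_minutes'],
# rebuilt from the value_counts output above
s = pd.Series([True] * 103091 + [False] * 334937)
share = s.mean()  # the mean of a boolean Series is the fraction of True values
print(f"{share:.0%}")  # → 24%
```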
Seeing which addresses are within a 5 minute walk from a worm hotel is a nice start, but let's go deeper! I can calculate how far every address needs to walk to the nearest worm hotel. If I know this, I can be much more precise in scoring neighbourhoods on reachability.
To get walking distances for all addresses, I need a routing service. I installed the excellent OSRM project (http://router.project-osrm.org/) locally with Docker, using the latest OpenStreetMap extract for the Netherlands from http://download.geofabrik.de.
OSRM can return a matrix of walking durations for a series of points. I can create a one-to-many matrix for each address and pick out the worm hotel with the shortest walking duration. OSRM doesn't return distances, but since the walking speed defaults to 5 km/h and walking penalties are only applied for sandy terrain and the like, I can reasonably derive the distance in metres from the duration.
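The duration-to-distance conversion is simple arithmetic: at a walking speed of 5 km/h, one second of walking covers 5000/3600 ≈ 1.389 metres. A minimal sketch:

```python
WALKING_SPEED_M_PER_S = 5000 / 3600  # 5 km/h expressed in metres per second

def duration_to_meters(seconds):
    # Convert an OSRM walking duration (seconds) to an approximate distance
    return seconds * WALKING_SPEED_M_PER_S

print(round(duration_to_meters(300)))  # a 5 minute walk ≈ 417 metres
```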
I'll first explore how to get the right information from the API, and then create a function which picks the nearest worm hotel for an address, and also prints the walking duration in minutes and distance in meters.
# close_to_my_address = "4.877951499999995,52.3824856"  # earlier test coordinate
close_to_my_address = "4.876480,52.381994"
buurtcompostlocaties = df_buurtcompost.locatie_cleaned.values
buurtcompostlocaties_string = ";".join(buurtcompostlocaties)
profile = "foot"
# URL format for an HTTP API call for getting a matrix is
# http://0.0.0.0:5000/table/v1/foot/{lng},{lat};{lng},{lat};...?sources=0
# 'sources=0' indicates that I only want the durations from the first set of coordinates
# to all the others. As a test I construct a URL that gives me the walking duration from my
# address to all the worm hotels.
testURL_wormhotel_distance_from_my_address = "http://0.0.0.0:5000/table/v1/" + profile + "/" \
+ close_to_my_address + ";" \
+ buurtcompostlocaties_string \
+ "?sources=0"
print("Example of URL to send the API call with: " + testURL_wormhotel_distance_from_my_address)
# Let's run a request and only select the duration array
distances_from_testURL = requests.get(testURL_wormhotel_distance_from_my_address).json()["durations"][0]
# This gives us a durations array. The first value is 0, the duration from my address to itself, so I can remove it:
distances_from_testURL.pop(0)
print("All distances I get from the test url: " + str(distances_from_testURL))
# Give us the duration and the index of the wormhotel that's closest
print("Index of the wormhotel in the wormhotel dataset: " + str(distances_from_testURL.index(min(distances_from_testURL))))
print("Seconds I have to walk to the nearest hotel: " + str(min(distances_from_testURL)))
# Let's find out where the nearest wormhotel to my house is! 🎉
df_buurtcompost.iloc[33]
Example of URL to send the API call with: http://0.0.0.0:5000/table/v1/foot/4.876480,52.381994;4.888706,52.356007;4.908527,52.349857;4.871422,52.391292;4.945789,52.36213;4.910984,52.382154;4.864346,52.3687925;4.926176,52.3370046;4.971355,52.37154;5.004663,52.345872;4.861766,52.376636;4.923331,52.366302;4.9177,52.351515;4.887118,52.356278;4.887075,52.35694;5.000672,52.353723;4.9758081,52.3612775;4.9457027,52.3681734;4.9939396,52.3572812;4.9520167,52.3340925;4.877403,52.313325;4.9121,52.3511739;4.8247244,52.3604001;4.8815228,52.3514622;4.9442745,52.3920048;4.9464001,52.3773338;4.8999979,52.4053634;4.9392225,52.3594959;4.7711392,52.3800813;4.9461426,52.3742948;4.8538532,52.3676329;4.8515358,52.2839112;4.9365725,52.3703089;4.8761048,52.3900331;4.8761638,52.3813059;4.8487945,52.3403283;4.8980131,52.355381;4.9840798,52.2981711;4.9370124,52.3571305;4.9071004,52.3491685?sources=0
0
All distances I get from the test url: [2602, 3748.6, 1119.6, 4334.4, 2271.9, 1586.6, 5376.6, 7132.4, 8087.9, 1292, 3097.4, 4021.7, 2553.4, 2505, 8017.2, 5816.8, 4204.5, 8434.5, 6620.2, 7349.9, 3681.9, 3936.7, 2952.7, 4164.8, 4094.8, 4238.1, 4239.7, 5938.3, 3923.9, 2215.6, 9138.1, 3642.4, 891.9, 59.4, 4627.8, 2997.8, 10855.8, 4395.8, 3726.4]
Index of the wormhotel in the wormhotel dataset: 33
Seconds I have to walk to the nearest hotel: 59.4
OBJECTNUMMER                                          42
Straatnaam         hoek jacob catskade - de wittenstraat
Initiatiefnemer                                    hilde
Email                      amsterdam+a209@wormenhotel.nl
Aantal_bewoners                                       15
Soort_afval                                           GF
Gebruik                  Toestemming van initiatiefnemer
Startjaar                                           2017
Foto                                                 NaN
COORDS                       POINT(4.8761638,52.3813059)
LNG                                            4,8761638
LAT                                           52,3813059
Unnamed: 12                                          NaN
locatie_cleaned                     4.8761638,52.3813059
lng                                              4.87616
lat                                              52.3813
Name: 33, dtype: object
Turns out I only have to walk a minute to the nearest hotel!
Let's wrap this up in a function so I can encode the whole dataset with this information.
buurtcompostlocaties = df_buurtcompost.locatie_cleaned.values
buurtcompostlocaties_string = ";".join(buurtcompostlocaties)
len(buurtcompostlocaties)
def compute_nearest_target_information(cell):
    current_coordinates = cell
    profile = "foot"
    requestURL = "http://0.0.0.0:5000/table/v1/" + profile + "/" \
        + current_coordinates + ";" \
        + buurtcompostlocaties_string \
        + "?sources=0"
    durations = requests.get(requestURL).json()["durations"][0]
    durations.pop(0)  # drop the duration from the address to itself
    seconds = min(durations)
    meters = seconds * 1.388888889  # 5 km/h walking speed = 1.389 m/s
    index = durations.index(seconds)
    buurtcompost_values = df_buurtcompost.iloc[index].values
    return (seconds, meters, index, buurtcompost_values)
39
# Test output with my house and someone else's (or rather, points near them)
compute_nearest_target_information("4.877951499999995,52.3824856")
compute_nearest_target_information("4.8796727,52.3712739")
(162.2, 225.27777779579998, 33, array([42, 'hoek jacob catskade - de wittenstraat', 'hilde', 'amsterdam+a209@wormenhotel.nl', 15, 'GF', 'Toestemming van initiatiefnemer', 2017, nan, 'POINT(4.8761638,52.3813059)', '4,8761638', '52,3813059', nan, '4.8761638,52.3813059', 4.8761637999999996, 52.381305900000001], dtype=object))
(949.3, 1318.4722223277, 5, array([9, 'schimmelstraat 44', 'stadsboerderij zimmerhoeve, annelijn van amsterdam', 'info@zimmerhoeve.nl', 0, 'GF', 'Toestemming van initiatiefnemer', 0, nan, 'POINT(4.864346,52.3687925)', '4,864346', '52,3687925', nan, '4.864346,52.3687925', 4.8643460000000003, 52.368792499999998], dtype=object))
Success!
Neat! Now I can apply this function to all the addresses in Amsterdam. First, let's apply it to a small sample and time it, so I can estimate how long my computer needs for the whole dataset.
Note: because this takes a while, I ran it separately and wrote the output to a CSV, which I'll import here instead.
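One way to speed this up, as noted at the start, is duplicate detection: the `describe()` output earlier showed ~510k addresses but only ~315k unique coordinate strings, so querying OSRM once per unique coordinate and mapping the results back would cut the number of requests by almost 40%. A sketch, with a stand-in for the OSRM call:

```python
import pandas as pd

# Toy frame standing in for df_adressen: two addresses share a coordinate
df = pd.DataFrame({'locatie_cleaned': ['4.88,52.35', '4.88,52.35', '4.91,52.38']})

calls = []
def expensive_lookup(coord):
    # Stand-in for compute_nearest_target_information (one OSRM request each)
    calls.append(coord)
    return len(coord)

# Query once per unique coordinate, then map the results back onto all rows
unique_results = {c: expensive_lookup(c) for c in df['locatie_cleaned'].unique()}
df['nearest_hotel_duration'] = df['locatie_cleaned'].map(unique_results)
```

Here `expensive_lookup` runs twice instead of three times; on the real data the saving scales with the number of duplicate coordinates.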
# %%time
# df_adressen.is_copy = False
# df_adressen['Nearest wormhotel information'] = df_adressen['locatie_cleaned'].apply(
#     compute_nearest_target_information)
I now need to split the output from the function over new columns in the dataset. I also add the information from the nearest hotel as new columns to the dataset.
# # Let's split it! Values inside df_adressen['Nearest wormhotel information'] are
# # 'seconds, meters, index, buurtcompost_values', and buurtcompost_values are
# # 'OBJECTNUMMER', 'Straatnaam', 'Initiatiefnemer', 'Email', 'Aantal_bewoners', 'Soort_afval',
# # 'Gebruik', 'Startjaar', 'Foto', 'COORDS', 'LNG', 'LAT', 'Unnamed: 12', 'locatie_cleaned'
# df_adressen['nearest_hotel_duration'], df_adressen['nearest_hotel_meters'], df_adressen[
# 'nearest_hotel_index'], df_adressen['nearest_hotel_values'] = zip(*df_adressen['Nearest wormhotel information'])
# df_adressen['hotel_OBJECTNUMMER'], df_adressen['hotel_Straatnaam'], df_adressen['hotel_Initiatiefnemer'], \
# df_adressen['hotel_Email'], df_adressen['hotel_Aantal_bewoners'], df_adressen['hotel_Soort_afval'], \
# df_adressen['hotel_Gebruik'], df_adressen['hotel_Startjaar'], df_adressen['hotel_Foto'], \
# df_adressen['hotel_COORDS'], df_adressen['hotel_LNG'], df_adressen['hotel_LAT'], df_adressen['hotel_Unnamed: 12'], \
# df_adressen['hotel_locatie_cleaned'], df_adressen['hotel_lng'], df_adressen['hotel_lat'] = zip(*df_adressen['nearest_hotel_values'])
# df_adressen.to_csv('data/df_adressen.csv')
# Using a zip file to bypass GitHub's file size limit of 100 MB
df_adressen = pd.read_csv('data/df_adressen_schoon.zip')
df_adressen_woning = df_adressen[df_adressen['Feitelijk gebruik'].str.contains("woning", na=False)]
df_adressen_woning.tail()
Naam openbare ruimte | Huisnummer | Huisletter | Huisnummertoevoeging | Postcode | Woonplaats | Naam stadsdeel | Code stadsdeel | Naam gebiedsgerichtwerkengebied | Code gebiedsgerichtwerkengebied | ... | hotel_Foto | hotel_COORDS | hotel_LNG | hotel_LAT | hotel_Unnamed: 12 | hotel_locatie_cleaned | hotel_lng | hotel_lat | wormhotel_within_5_minutes | is_wormhotel_within_5_minutes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
509940 | Goudsbloemstraat | 91 | NaN | 1 | 1015JK | NaN | Centrum | A | Centrum-West | DX01 | ... | NaN | POINT(4.8761638,52.3813059) | 4,8761638 | 52,3813059 | NaN | 4.8761638,52.3813059 | 4.876164 | 52.381306 | ['Not found'] | False |
510067 | John Franklinstraat | 69 | NaN | H | 1056TA | NaN | West | E | Oud West / De Baarsjes | DX05 | ... | magalhaens.jpeg | POINT(4.8538532,52.3676329) | 4,8538532 | 52,3676329 | NaN | 4.8538532,52.3676329 | 4.853853 | 52.367633 | ['Not found'] | False |
510117 | Retiefstraat | 22 | A | NaN | 1092XD | NaN | Oost | M | Oud-Oost | DX13 | ... | NaN | POINT(4.9177,52.351515) | 4,9177 | 52,351515 | NaN | 4.9177,52.351515 | 4.917700 | 52.351515 | ['Not found'] | False |
510188 | Tweede Atjehstraat | 40 | NaN | 4 | 1094LH | NaN | Oost | M | Indische Buurt / Oostelijk Havengebied | DX14 | ... | halma.jpeg | POINT(4.9392225,52.3594959) | 4,9392225 | 52,3594959 | NaN | 4.9392225,52.3594959 | 4.939222 | 52.359496 | ['Not found'] | False |
510354 | De Lairessestraat | 13 | A | NaN | 1071NR | NaN | Zuid | K | Oud Zuid | DX10 | ... | hartmonie.jpg | POINT(4.8815228,52.3514622) | 4,8815228 | 52,3514622 | NaN | 4.8815228,52.3514622 | 4.881523 | 52.351462 | ['Not found'] | False |
5 rows × 57 columns
The address dataset is complete!
Now I can build a simple function which queries the dataset for a street name and number, and returns information about the nearest worm hotel:
def lookup_nearest_worm_hotel(straatnaam, nummer):
    matches = df_adressen_woning.loc[
        (df_adressen_woning['Naam openbare ruimte'] == straatnaam) &
        (df_adressen_woning['Huisnummer'] == nummer)]
    if len(matches) == 0:
        return "Nothing found"
    row = matches.iloc[0]
    hotel_latlong = str(row['hotel_lat']) + "," + str(row['hotel_lng'])
    adres_latlong = str(row['lat']) + "," + str(row['lng'])
    return ("Nearest Worm hotel name: " + row['hotel_Straatnaam']
            + ", Walking duration: " + str(round(row['nearest_hotel_duration'] / 60, 2)) + " minutes."
            + " Contact " + str(row['hotel_Email']) + " for more information."
            + " Openrouteservice.org URL: https://maps.openrouteservice.org/directions?a="
            + adres_latlong + "," + hotel_latlong + "&b=2&c=0&g1=-1&g2=0&k1=en-US&k2=km")
    # Alternative: a Google Maps URL: "http://maps.google.com/maps?q=" + hotel_latlong
lookup_nearest_worm_hotel("De Wittenstraat",174)
lookup_nearest_worm_hotel("Lauriergracht",112)
lookup_nearest_worm_hotel("The Wittenstraat",172) # Intentionally incorrect streetname to test error handling
lookup_nearest_worm_hotel("Groenburgwal", 1)
'Nearest Worm hotel name: hoek jacob catskade - de wittenstraat, Walking duration: 0.82 minutes. Contact amsterdam+a209@wormenhotel.nl for more information. Openrouteservice.org URL: https://maps.openrouteservice.org/directions?a=52.3818921,4.8763493,52.3813059,4.8761638&b=2&c=0&g1=-1&g2=0&k1=en-US&k2=km'
'Nearest Worm hotel name: schimmelstraat 44, Walking duration: 15.83 minutes. Contact info@zimmerhoeve.nl for more information. Openrouteservice.org URL: https://maps.openrouteservice.org/directions?a=52.3711914,4.8793065,52.3687925,4.864346&b=2&c=0&g1=-1&g2=0&k1=en-US&k2=km'
'Nothing found'
'Nearest Worm hotel name: sarphatipark/sweelinckstraat, Walking duration: 23.79 minutes. Contact wormenhotelsweelinck@gmail.com for more information. Openrouteservice.org URL: https://maps.openrouteservice.org/directions?a=52.3695584,4.8991772,52.355381,4.8980131&b=2&c=0&g1=-1&g2=0&k1=en-US&k2=km'
Let's see how the numbers from the isochrones compare to the information I have now.
len(df_adressen_woning)
df_adressen_woning.is_wormhotel_within_5_minutes.value_counts()
isochrones = sns.countplot(data=df_adressen_woning, x = 'is_wormhotel_within_5_minutes')
438028
False    334765
True     103263
Name: is_wormhotel_within_5_minutes, dtype: int64
df_adressen_woning['nearest_hotel_duration_under_300'] = df_adressen_woning['nearest_hotel_duration'].transform(lambda x: x < 300.5)
len(df_adressen_woning)
df_adressen_woning.nearest_hotel_duration_under_300.value_counts()
p = sns.countplot(data=df_adressen_woning, x = 'nearest_hotel_duration_under_300')
438028
False    386897
True      51131
Name: nearest_hotel_duration_under_300, dtype: int64
They don't! According to our routing through OSRM, only 12% of the residence addresses are within a 5 minute walk of a worm hotel, as opposed to the 24% I calculated with openrouteservice.org. Your mileage may vary!
Digging into the data generated by OSRM a bit more, I noticed that OSRM missed a few nearby worm hotels for a few neighbourhoods. When I tested the route between an address and a nearby worm hotel in IJburg, the frontend gave me this:
The route is completely off, so the results get skewed here. I don't know how to solve this, and it's strange that openrouteservice.org gives a different result with the isochrones, but also gives a weird (and different!) route when routing addresses in IJburg:
Google Maps does work, and shows this:
I notified OpenStreetMap of this error.
I can also see this when I plot a histogram of the nearest times calculated by OSRM. Look at the tail over 10000 seconds (a 2.8 hour walk!):
ax = sns.distplot(df_adressen_woning['nearest_hotel_duration'], kde=False)
Ideally this would be fixed in OpenStreetMap and temporarily worked around by using as-the-crow-flies distances for IJburg, but for the purpose of this city-wide analysis I'll exclude IJburg for now. Luckily IJburg is already nicely covered with 3 worm hotels.
df_adressen_woning_not_ijburg = df_adressen_woning[df_adressen_woning['Naam Wijk'].str.contains('IJburg') == False]
Also, openrouteservice.org seems to give much shorter times when calculating an isochrone than when using its router. For example, 'Amaliastraat 10' in Amsterdam should be less than 300 seconds away from the nearest hotel according to the earlier isochrone calculations, but according to their own router it's a 480 second walk. That goes a long way toward explaining why the OSRM calculation shows far fewer worm hotels within a 5 minute walking radius.
If I expand the radius of the OSRM calculation to 500 seconds, I get about the same distribution as I got from the isochrone calculation. I think 500 seconds is a good upper bound: an 8 minute walk is doable for early adopters, and by bike it's even faster.
len(df_adressen_woning_not_ijburg)
df_adressen_woning_not_ijburg['nearest_hotel_duration_under_500'] = df_adressen_woning_not_ijburg['nearest_hotel_duration'].transform(lambda x: x < 500.5)
df_adressen_woning_not_ijburg.nearest_hotel_duration_under_500.value_counts()
p = sns.countplot(data=df_adressen_woning_not_ijburg, x = 'nearest_hotel_duration_under_500')
429179
False 321889 True 107290 Name: nearest_hotel_duration_under_500, dtype: int64
According to OSRM, 25% of residences lie within a 500 second walk of a worm hotel:
str(round(df_adressen_woning_not_ijburg.nearest_hotel_duration_under_500.mean() * 100, 1)) + "%"
'25.0%'
Looking at the histogram, I see a lot of times below 1000 seconds (16.7 minutes).
ax = sns.distplot(df_adressen_woning_not_ijburg['nearest_hotel_duration'], kde=False)
Let's zoom in on the most common durations:
woning_not_ijburg_durations_below_3000 = df_adressen_woning_not_ijburg['nearest_hotel_duration'][df_adressen_woning_not_ijburg['nearest_hotel_duration'] < 3000]
ax = sns.distplot(woning_not_ijburg_durations_below_3000)
And use our own bins:
ax = sns.distplot(woning_not_ijburg_durations_below_3000, bins=[0, 500, 1000, 1500, 2000, 3000])
Let's compute the average duration per (larger) neighbourhood, normalize that into a score, and create a choropleth map. I used the so-called 'gebiedsgerichtwerkengebied', a larger neighbourhood, instead of 'normal' neighbourhoods, because the municipality's address data is out of sync with the 'new' neighbourhood borders of 2015.
First, I'll create a pivot table of the neighbourhoods and the mean walking duration in seconds:
df_neighbourhood_nearest_hotel_duration = df_adressen_woning_not_ijburg.pivot_table( \
columns='Naam gebiedsgerichtwerkengebied', values='nearest_hotel_duration').transpose()
df_neighbourhood_nearest_hotel_duration.sort_values(by="nearest_hotel_duration")
len(df_neighbourhood_nearest_hotel_duration.sort_values(by="nearest_hotel_duration"))
Naam gebiedsgerichtwerkengebied | nearest_hotel_duration
---|---
Indische Buurt / Oostelijk Havengebied | 395.445561 |
Westerpark | 434.300928 |
De Pijp / Rivierenbuurt | 494.153409 |
Oud-Oost | 516.719633 |
Oud West / De Baarsjes | 562.601836 |
Ijburg / Eiland Zeeburg | 676.957284 |
Oud-Noord | 793.821163 |
Centrum-Oost | 798.746384 |
Watergraafsmeer | 804.365915 |
Bos en Lommer | 848.735375 |
Oud Zuid | 902.842803 |
Slotervaart | 925.571860 |
Centrum-West | 937.145351 |
Oost | 1169.752248 |
Gaasperdam / Driemond | 1290.230570 |
West | 1538.622201 |
Buitenveldert / Zuidas | 1627.046032 |
Osdorp | 1775.954414 |
Geuzenveld-Slotermeer-Sloterdijken | 1885.094372 |
Bijlmer Centrum | 2109.461601 |
De Aker, Sloten en Nieuw Sloten | 2264.087821 |
Bijlmer Oost | 3200.194357 |
22
Now I can also see the percentage of neighbourhoods where the mean walking time for residence addresses is less than 500 seconds (13.6%, or 3 out of 22):
df_neighbourhood_nearest_hotel_duration['nearest_hotel_duration_under_500'] = \
df_neighbourhood_nearest_hotel_duration['nearest_hotel_duration'].transform(lambda x: x < 500.5)
len(df_neighbourhood_nearest_hotel_duration)
df_neighbourhood_nearest_hotel_duration.nearest_hotel_duration_under_500.value_counts()
"Percentage neighbourhoods under 500 second walk: " + str(df_neighbourhood_nearest_hotel_duration.nearest_hotel_duration_under_500.value_counts()[1] / len(df_neighbourhood_nearest_hotel_duration) * 100) + "%"
p = sns.countplot(data=df_neighbourhood_nearest_hotel_duration, x = 'nearest_hotel_duration_under_500')
22
False 19 True 3 Name: nearest_hotel_duration_under_500, dtype: int64
'Percentage neighbourhoods under 500 second walk: 13.6363636364%'
To make the choropleth useful, I'll normalize the data into an inverted score, so that a shorter mean duration translates to a higher score.
df_neighbourhood_nearest_hotel_duration['nearest_hotel_duration_normalized'] = \
df_neighbourhood_nearest_hotel_duration['nearest_hotel_duration'].transform(
lambda x: 1 - (x - x.min()) / (x.max() - x.min()))
df_neighbourhood_nearest_hotel_duration.sort_values(by="nearest_hotel_duration_normalized", ascending=False)
Naam gebiedsgerichtwerkengebied | nearest_hotel_duration | nearest_hotel_duration_under_500 | nearest_hotel_duration_normalized
---|---|---|---
Indische Buurt / Oostelijk Havengebied | 395.445561 | True | 1.000000 |
Westerpark | 434.300928 | True | 0.986147 |
De Pijp / Rivierenbuurt | 494.153409 | True | 0.964807 |
Oud-Oost | 516.719633 | False | 0.956761 |
Oud West / De Baarsjes | 562.601836 | False | 0.940402 |
Ijburg / Eiland Zeeburg | 676.957284 | False | 0.899630 |
Oud-Noord | 793.821163 | False | 0.857964 |
Centrum-Oost | 798.746384 | False | 0.856208 |
Watergraafsmeer | 804.365915 | False | 0.854204 |
Bos en Lommer | 848.735375 | False | 0.838385 |
Oud Zuid | 902.842803 | False | 0.819094 |
Slotervaart | 925.571860 | False | 0.810990 |
Centrum-West | 937.145351 | False | 0.806863 |
Oost | 1169.752248 | False | 0.723930 |
Gaasperdam / Driemond | 1290.230570 | False | 0.680975 |
West | 1538.622201 | False | 0.592414 |
Buitenveldert / Zuidas | 1627.046032 | False | 0.560887 |
Osdorp | 1775.954414 | False | 0.507796 |
Geuzenveld-Slotermeer-Sloterdijken | 1885.094372 | False | 0.468883 |
Bijlmer Centrum | 2109.461601 | False | 0.388888 |
De Aker, Sloten en Nieuw Sloten | 2264.087821 | False | 0.333758 |
Bijlmer Oost | 3200.194357 | False | 0.000000 |
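The inverted min-max normalization used above can be sanity-checked on a toy series (a sketch using three durations taken from the table):

```python
import pandas as pd

# Shortest, middling, and longest mean durations from the table above
s = pd.Series([395.4, 902.8, 3200.2])
normalized = 1 - (s - s.min()) / (s.max() - s.min())
# The shortest duration scores 1.0, the longest 0.0
print(normalized.round(3).tolist())  # → [1.0, 0.819, 0.0]
```

The middle value (0.819) matches the score of 'Oud Zuid' in the table, as expected.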
Let's set up a color scale, colormap function, and import the neighbourhood geojson.
from branca.colormap import linear
colormap = linear.YlGn.scale(
df_neighbourhood_nearest_hotel_duration.nearest_hotel_duration_normalized.min(),
df_neighbourhood_nearest_hotel_duration.nearest_hotel_duration_normalized.max())
print(colormap(5.0))
colormap
#005a32
InteractiveShell.ast_node_interactivity = "last_expr"
m = folium.Map(location=[52.3824856,4.877951499999995], zoom_start=11, tiles='stamentoner',)
# folium.GeoJson(open("data/amsterdam-buurten-simplified-removed.geojson",encoding = "utf-8-sig").read(), name='geojson',
folium.GeoJson(open("data/amsterdam-gebiedsgerichtwerken.geojson",encoding = "utf-8-sig").read(), name='geojson',
style_function=lambda feature: {
'fillColor': colormap(df_neighbourhood_nearest_hotel_duration.loc[str(feature['properties']['titel'])]['nearest_hotel_duration_normalized']),
'color': 'black',
'weight': 1,
'dashArray': '5, 5',
'fillOpacity': 0.9,
}
).add_to(m)
m.save('03-choropleth-mean-duration.html')
m
InteractiveShell.ast_node_interactivity = "all" # Turn on all output again
Let's create a score that takes the population of each neighbourhood into account. By multiplying the average duration by the population, I get another number that can tell us where to put a new hotel. I'll create a new choropleth and combine it with the isochrones to further help figure out the best placement. I'll also add mouseovers to the map.
I used this data from the municipality.
# Add the gebiedsgerichtwerkengebied code to the table
gebiedsgerichtwerkengebied_keys = df_adressen['Naam gebiedsgerichtwerkengebied'].unique()
gebiedsgerichtwerkengebied_values = df_adressen['Code gebiedsgerichtwerkengebied'].unique()
gebiedsgerichtwerkengebied_code_dict = dict(zip(gebiedsgerichtwerkengebied_keys, gebiedsgerichtwerkengebied_values))
df_neighbourhood_nearest_hotel_duration['Code gebied'] = \
df_neighbourhood_nearest_hotel_duration.index.map(lambda x: gebiedsgerichtwerkengebied_code_dict[x])
# Import the population excel
df_population = pd.read_csv('data/goisbasisbestandenbasisstatistiekbestandnu-op-de-ois-sitedata.amsterdambbga_latest_and_greatest.zip', sep=";")
df_population_2018_gebieds = df_population[(df_population['jaar'] == 2018) & (df_population['variabele'] == 'BEVTOTAAL')]
/Users/jurian/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2728: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False. interactivity=interactivity, compiler=compiler, result=result)
# Add the population to the table
bevolking_keys = df_population_2018_gebieds.gebiedcode15.values
bevolking_values = df_population_2018_gebieds.waarde.values
bevolking_dict = dict(zip(bevolking_keys, bevolking_values))
df_neighbourhood_nearest_hotel_duration['Population'] = \
df_neighbourhood_nearest_hotel_duration['Code gebied'].apply(lambda x: bevolking_dict[x])
df_neighbourhood_nearest_hotel_duration['Total seconds'] = \
df_neighbourhood_nearest_hotel_duration['nearest_hotel_duration'] * \
df_neighbourhood_nearest_hotel_duration['Population']
# Add normalized score
df_neighbourhood_nearest_hotel_duration['Total seconds normalized'] = \
df_neighbourhood_nearest_hotel_duration['Total seconds'].transform(
lambda x: 1 - (x - x.min()) / (x.max() - x.min()))
# Add rank according to scores
df_neighbourhood_nearest_hotel_duration['Rank - Duration'] = pd.to_numeric(df_neighbourhood_nearest_hotel_duration['nearest_hotel_duration_normalized'].rank(ascending=False),downcast='integer')
df_neighbourhood_nearest_hotel_duration['Rank - Total Seconds'] = pd.to_numeric(df_neighbourhood_nearest_hotel_duration['Total seconds normalized'].rank(ascending=False),downcast='integer')
# Housekeeping
df_ranks = df_neighbourhood_nearest_hotel_duration.rename(columns={'nearest_hotel_duration': 'Mean duration', "nearest_hotel_duration_under_500": 'Mean duration < 500', "nearest_hotel_duration_normalized": "Mean duration normalized"})
df_ranks.drop(columns=['Mean duration < 500', 'Code gebied'], inplace=True)
As you can see, there are some notable shifts when I take population size into account. For example: even though 'Oud West / De Baarsjes' has a relatively short average walking duration to the nearest worm hotel, it still ranks low in terms of total seconds to be walked, because so many people live there.
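One way to surface these shifts is to look at the difference between the two rank columns; a sketch on a few rows copied from the table ('Rank shift' is a column name I'm introducing here, not one from the analysis above):

```python
import pandas as pd

# A few rows copied from the ranking table above
df = pd.DataFrame(
    {'Rank - Duration': [1, 5, 20], 'Rank - Total Seconds': [2, 13, 17]},
    index=['Indische Buurt / Oostelijk Havengebied',
           'Oud West / De Baarsjes',
           'Bijlmer Centrum'],
)

# Positive shift: the neighbourhood ranks worse once population is factored in
df['Rank shift'] = df['Rank - Total Seconds'] - df['Rank - Duration']
print(df.sort_values('Rank shift', ascending=False))
```

'Oud West / De Baarsjes' drops 8 places, the biggest shift of the three.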
# Qgrid is a nice table display, but doesn't render in nbviewer, so I'll skip it here
# # import qgrid
# # # qgrid.QgridWidget(df=None, grid_options={'filterable': False}, precision=None, show_toolbar=None)
# # qgrid = qgrid.show_grid(df_ranks.sort_values('Rank - Total Seconds', ascending=False), grid_options={'filterable': False, 'maxVisibleRows': 23})
# # qgrid
df_ranks.sort_values('Rank - Total Seconds', ascending=False)
df_ranks.round(decimals=2).sort_values('Rank - Total Seconds', ascending=False).to_clipboard()
Naam gebiedsgerichtwerkengebied | Mean duration | Mean duration normalized | Population | Total seconds | Total seconds normalized | Rank - Duration | Rank - Total Seconds
---|---|---|---|---|---|---|---
Bijlmer Oost | 3200.194357 | 0.000000 | 28848.0 | 9.231921e+07 | 0.000000 | 22 | 22 |
Geuzenveld-Slotermeer-Sloterdijken | 1885.094372 | 0.468883 | 45940.0 | 8.660124e+07 | 0.074601 | 19 | 21 |
Osdorp | 1775.954414 | 0.507796 | 39312.0 | 6.981632e+07 | 0.293588 | 18 | 20 |
De Aker, Sloten en Nieuw Sloten | 2264.087821 | 0.333758 | 28541.0 | 6.461933e+07 | 0.361391 | 21 | 19 |
West | 1538.622201 | 0.592414 | 37851.0 | 5.823839e+07 | 0.444641 | 16 | 18 |
Bijlmer Centrum | 2109.461601 | 0.388888 | 25008.0 | 5.275342e+07 | 0.516202 | 20 | 17 |
Oud Zuid | 902.842803 | 0.819094 | 54235.0 | 4.896568e+07 | 0.565619 | 11 | 16 |
Gaasperdam / Driemond | 1290.230570 | 0.680975 | 34143.0 | 4.405234e+07 | 0.629722 | 15 | 15 |
Buitenveldert / Zuidas | 1627.046032 | 0.560887 | 26280.0 | 4.275877e+07 | 0.646599 | 17 | 14 |
Oud West / De Baarsjes | 562.601836 | 0.940402 | 73117.0 | 4.113576e+07 | 0.667774 | 5 | 13 |
Centrum-West | 937.145351 | 0.806863 | 43386.0 | 4.065899e+07 | 0.673994 | 13 | 12 |
Slotervaart | 925.571860 | 0.810990 | 41988.0 | 3.886291e+07 | 0.697427 | 12 | 11 |
Centrum-Oost | 798.746384 | 0.856208 | 43465.0 | 3.471751e+07 | 0.751510 | 8 | 10 |
Oost | 1169.752248 | 0.723930 | 29365.0 | 3.434977e+07 | 0.756308 | 14 | 9 |
De Pijp / Rivierenbuurt | 494.153409 | 0.964807 | 65033.0 | 3.213628e+07 | 0.785187 | 3 | 8 |
Bos en Lommer | 848.735375 | 0.838385 | 35009.0 | 2.971338e+07 | 0.816798 | 10 | 7 |
Watergraafsmeer | 804.365915 | 0.854204 | 34119.0 | 2.744416e+07 | 0.846404 | 9 | 6 |
Oud-Noord | 793.821163 | 0.857964 | 29059.0 | 2.306765e+07 | 0.903502 | 7 | 5 |
Oud-Oost | 516.719633 | 0.956761 | 35846.0 | 1.852233e+07 | 0.962804 | 4 | 4 |
Ijburg / Eiland Zeeburg | 676.957284 | 0.899630 | 26062.0 | 1.764286e+07 | 0.974278 | 6 | 3 |
Indische Buurt / Oostelijk Havengebied | 395.445561 | 1.000000 | 41427.0 | 1.638212e+07 | 0.990726 | 1 | 2 |
Westerpark | 434.300928 | 0.986147 | 36084.0 | 1.567131e+07 | 1.000000 | 2 | 1 |
InteractiveShell.ast_node_interactivity = "last_expr"
from branca.colormap import linear
colormap = linear.YlGn.scale(
df_ranks['Total seconds normalized'].min(),
df_ranks['Total seconds normalized'].max())
print(colormap(5.0))
colormap
#005a32
m = folium.Map(location=[52.3824856,4.877951499999995], zoom_start=11, tiles='stamentoner',)
totalsecondslayer = folium.FeatureGroup(name="By Total Seconds")
totalsecondslayer.add_child(folium.GeoJson(open("data/amsterdam-gebiedsgerichtwerken.geojson",encoding = "utf-8-sig").read(), name='geojson',
style_function=lambda feature: {
'fillColor': colormap(df_ranks.loc[str(feature['properties']['titel'])]['Total seconds normalized']),
'color': 'black',
'weight': 1,
# 'dashArray': '5, 5',
'fillOpacity': 0.9,
},
# highlight_function =lambda feature: { 'name': feature['properties']['titel']}
))
m.add_child(totalsecondslayer)
m.save('04-choropleth-total-seconds.html')
m
Complex Leaflet maps don't run in Jupyter, but here is a map showing both layers with popups.
And here is a screenshot of the maps side by side:
Right now, the worm hotels are placed on the initiative of neighbourhood residents, and it's apparent that some neighbourhoods get left out this way. My recommendation would be to promote the possibility of a worm hotel to people in the neighbourhoods with the longest mean walking duration, but to also take the 'Total seconds' column into account.
Feel free to email me at jurian [at] jurb dot me for details.