Romain Goldenberg / Flowminder foundation
This exercise provides an example of how we can use geographical data within an application.
The problem: We want to assess health facility coverage for maternal healthcare in Kaduna state. We are interested in finding out which areas are over-stretched with a high number of women of child-bearing age (WOCBA) per health-facility. By identifying a target number of people per health facility, we can begin to highlight locations that may need further invention.
Using geospatial analysis in Python, this demo will show an assessment of health facility coverage for maternal health using the GRID3 population data for Kaduna State.
In this exercise we are going to:
# General purpose libraries.
import pandas as pd
import os
import numpy as np
# Geospatial libraries.
import geopandas as gpd
import rasterio
from rasterio.plot import show
import folium
import shapely
# Voronoi and Zonal statistics libraries
from rasterstats import zonal_stats
# Visualization libraries.
from IPython.display import display
from matplotlib import pyplot as plt
import matplotlib.colors as mcolors
# Path for the data folder
data_path = os.path.join(os.getcwd(), "Data/Practical_1")
# Path to population (we will open it later with rasterio)
population = data_path + "/NGA_population_v1_2_agesex_f15_49_simple.tif"
# Below, we directly open the different shapefiles datasets with geopandas:
# 1- Health facilities
health_care = gpd.read_file(
data_path + "/health-care-facilities-primary-secondary-and-tertiary.geojson"
)
# 2- State boundaries
state_boundary = gpd.read_file(data_path + "/state-administrative-boundaries.geojson")
# 3- Wards boundaries
wards = gpd.read_file(data_path + "/operational-ward-boundaries.geojson")
We will be using four datasets in this example. They can be obtained from the GRID3 portal available at https://data.grid3.org/.
- Health facility locations
- Gridded population (We provide an already prepared dataset with women aged 15-49 for Kaduna State)
- State boundary
- Ward boundaries
In particular, the Nigeria Data portal is available here
We will view our health facilty data below. This includes data such as their location, the type of health centre (primary, secondary, tertiary), and whether it is private or publicly owned. An example showing some attributes (i.e. columns) of the data is shown below:
# Manually creating the list of column indices
indices = list(range(1, 3)) + [9, 11, 15, 19, 24]
display(health_care.iloc[:, indices])
latitude | longitude | category | name | ownership | ward_name | geometry | |
---|---|---|---|---|---|---|---|
0 | 10.648418 | 7.041725 | Primary Health Center | Labi Health Center | State Primary Healthcare Development Agency | Gayam | POINT (7.04172 10.64842) |
1 | 10.725830 | 6.745430 | Primary Health Center | Mando Health Center | State Primary Healthcare Development Agency | Gayam | POINT (6.74543 10.72583) |
2 | 10.725047 | 6.745297 | Primary Health Center | Gayam Health Center | State Primary Healthcare Development Agency | Gayam | POINT (6.74530 10.72505) |
3 | 10.769058 | 6.642158 | Primary Health Clinic | Girezin Primary Health Carep | Public | Gayam | POINT (6.64216 10.76906) |
4 | 10.791455 | 6.636343 | Primary Health Center | Kafanin Doka Health Center | National Primary Healthcare Development Agency | Gayam | POINT (6.63634 10.79145) |
... | ... | ... | ... | ... | ... | ... | ... |
2245 | 10.927040 | 7.787299 | Dispensary | Tsakiya Dispensary | Private | Wuciciri | POINT (7.78730 10.92704) |
2246 | 11.020853 | 7.761741 | Primary Health Center | Wuciciri Primary Health Center | National Primary Healthcare Development Agency | Wuciciri | POINT (7.76174 11.02085) |
2247 | 10.979158 | 7.766452 | Primary Health Center | Dan Arewa Clinic | Private | Wuciciri | POINT (7.76645 10.97916) |
2248 | 10.970477 | 7.779581 | Primary Health Clinic | Angwar Bisa Primary Health Clinic | Public | Wuciciri | POINT (7.77958 10.97048) |
2249 | 11.046583 | 7.761747 | Primary Health Clinic | Tudun Kusa Nursing Home And Maternity Ed | Public | Wuciciri | POINT (7.76175 11.04658) |
2250 rows × 7 columns
For our example, we are only interested in public health centers. We will therefore filter the dataset below:
# We select places where "ownership" = "Public", in other words public health centers.
public_health_care = (
health_care[health_care["ownership"] == "Public"].copy().reset_index(drop=True)
)
# We display the results.
display(public_health_care.iloc[:, indices])
latitude | longitude | category | name | ownership | ward_name | geometry | |
---|---|---|---|---|---|---|---|
0 | 10.769058 | 6.642158 | Primary Health Clinic | Girezin Primary Health Carep | Public | Gayam | POINT (6.64216 10.76906) |
1 | 10.707223 | 6.688533 | Primary Health Center | Pole Wire's Health Center | Public | Gayam | POINT (6.68853 10.70722) |
2 | 10.777357 | 6.180544 | Primary Health Center | Kankangi Model Primary Health Center | Public | Kakangi | POINT (6.18054 10.77736) |
3 | 10.649940 | 6.354280 | Primary Health Center | Sabon Layi Primary Health Center | Public | Kakangi | POINT (6.35428 10.64994) |
4 | 10.646690 | 6.351030 | Primary Health Clinic | Ikon Allah Nursing And Maternity | Public | Kakangi | POINT (6.35103 10.64669) |
... | ... | ... | ... | ... | ... | ... | ... |
807 | 11.051805 | 7.689320 | Health Post | Kofar Kuyam Bana Health Post | Public | Unguwar Juma | POINT (7.68932 11.05180) |
808 | 11.024940 | 7.775530 | Primary Health Clinic | Bogari Health Clinic | Public | Wuciciri | POINT (7.77553 11.02494) |
809 | 10.932492 | 7.785242 | Health Post | Rubuchi Health Post | Public | Wuciciri | POINT (7.78524 10.93249) |
810 | 10.970477 | 7.779581 | Primary Health Clinic | Angwar Bisa Primary Health Clinic | Public | Wuciciri | POINT (7.77958 10.97048) |
811 | 11.046583 | 7.761747 | Primary Health Clinic | Tudun Kusa Nursing Home And Maternity Ed | Public | Wuciciri | POINT (7.76175 11.04658) |
812 rows × 7 columns
All of our data is geospatial, meaning it has a relationship to spatial location. Our health care
dataset is point(s) data, wards
and state_boundary
polygon(s) data. They are examples of shapefile data.
So let's visualize our data on an interactive map.
Below is the first layer of our map, the wards
of Kaduna state:
# explore() is a geopandas method to create interactive maps.
# we assign it to the variable 'kaduna_map', to add more map layers after.
kaduna_map = wards.explore(
color="black", # Use black color for borders.
# Styling instructions. We fill the wards with lightgrey color (when hovering over them),
# and change the opacity of different elements.
style_kwds=dict(
fillColor="lightgrey", fill=True, opacity=1.0, fillOpacity=0, interactive=True
),
tiles="OpenStreetMap", # Use Open Street Map background tiles.
tooltip=False, # Do not show tooltip when hovering on wards.
popup="ward_name", # Show the name of the ward on click.
# Do not show the column label "ward_name" in the popup.
popup_kwds=dict(labels=False),
smooth_factor=0, # Prevent smoothing of the polygons edges.
name="wards", # Name of the layer in the map.
)
Then we continue with our second layer, the public_health_care
dataset:
public_health_care.explore(
m=kaduna_map, # Pass the previous map object 'kaduna_map'.
column="category", # Make choropleth based on "category" column.
tooltip="name", # Show "name" value in tooltip (on hover)
# Do not show column label in the tooltip.
tooltip_kwds=dict(labels=False),
# Show the selected values in popup (on click).
popup=["category", "name", "ownership"],
cmap="gnuplot2", # Use "gnuplot2" matplotlib color scheme.
marker_kwds=dict(radius=5), # Size of the points.
# Styling instructions. We draw small black circles around our points,
# and change the opacity of different elements.
style_kwds=dict(color="black", weight=1, fill=True, opacity=0.5, fillOpacity=0.8),
name="health_care", # Name of the layer in the map.
)
# Use the folium library (which Geopandas is based on for interactive mapping) to add layer control
folium.LayerControl().add_to(kaduna_map)
<folium.map.LayerControl at 0x157f0b7b590>
We can now display our map:
kaduna_map
Our dataset is instead an example of data. Now, let's visualize our population
raster data. We will use a different strategy for the sake of this exercise and produce here a simple static map.
We will use here the rasterio
library to manage and open our raster data:
# Use rasterio to import the raster data
with rasterio.open(population) as pop:
data_pop = pop.dataset_mask() # Returns the valid data mask.
img_pop = pop.read() # Returns the full image.
nodata_pop = pop.nodata # Returns the value of "no data" cells
Then, we can map our population
dataset using imshow()
from the matplotlib
library.
# We first define a normalization for our colormap (range of colors)
norm = mcolors.TwoSlopeNorm(vmin=0, vmax=300.0, vcenter=30)
# Then, we create a new figure
fig, ax = plt.subplots(figsize=(10, 10))
# We first plot the population data. We use 'afmhot_r' as color scheme, normalize the color scheme,
# and provide the geographic extent of the data in 'extent'.
plot_pop = ax.imshow(
data_pop, cmap="afmhot_r", norm=norm, extent=rasterio.plot.plotting_extent(pop)
)
# Then, we plot our wards on top of the population data
wards.plot(edgecolor="black", facecolor="none", linewidth=0.3, alpha=1, ax=ax)
# Finally, we add a colorbar to the map
fig.colorbar(plot_pop, fraction=0.043, pad=0.04, label="Population count", ax=ax)
plt.title("Women aged 15-49 in Kaduna state") # Title for our plot
plt.show() # Show the figure
plt.close()
Before running this algorithm, we need to make sure our datasets have the same CRS.
# Using .crs (from geopandas), we easily obtain information on the dataset CRS.
public_health_care.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
Currently, our data is in WGS 84 (epsg:4326), a latitude/longitude coordinate system based on the Earth's center of mass (very common, used by the Global Positioning System among others). For our purpose, it is better to use a local projected coordinate system, rather than a global geographic system (as we currently have) to avoid distortions in results due to the projection.
We will now switch our data to the local CRS epsg:26392, a projected coordinate system (position measurements in meters) corresponding to Minna / Nigeria Mid Belt.
# With .to_crs (from geopandas), we can easily reproject our data.
public_health_care = public_health_care.to_crs("epsg:26392")
state_boundary = state_boundary.to_crs("epsg:26392")
wards = wards.to_crs("epsg:26392")
Let's have a look at the new CRS:
public_health_care.crs
<Projected CRS: EPSG:26392> Name: Minna / Nigeria Mid Belt Axis Info [cartesian]: - E[east]: Easting (metre) - N[north]: Northing (metre) Area of Use: - name: Nigeria between 6°30'E and 10°30'E, onshore and offshore shelf. - bounds: (6.5, 3.57, 10.51, 13.53) Coordinate Operation: - name: Nigeria Mid Belt - method: Transverse Mercator Datum: Minna - Ellipsoid: Clarke 1880 (RGS) - Prime Meridian: Greenwich
And a quick look at the data.
You should now see a difference in the coordinates of our geometry
column, compared to the first time we opened the data:
# We display the results.
display(public_health_care.iloc[:, indices])
latitude | longitude | category | name | ownership | ward_name | geometry | |
---|---|---|---|---|---|---|---|
0 | 10.769058 | 6.642158 | Primary Health Clinic | Girezin Primary Health Carep | Public | Gayam | POINT (467453.734 748966.431) |
1 | 10.707223 | 6.688533 | Primary Health Center | Pole Wire's Health Center | Public | Gayam | POINT (472486.719 742094.807) |
2 | 10.777357 | 6.180544 | Primary Health Center | Kankangi Model Primary Health Center | Public | Kakangi | POINT (416954.069 750229.065) |
3 | 10.649940 | 6.354280 | Primary Health Center | Sabon Layi Primary Health Center | Public | Kakangi | POINT (435865.980 735990.760) |
4 | 10.646690 | 6.351030 | Primary Health Clinic | Ikon Allah Nursing And Maternity | Public | Kakangi | POINT (435507.743 735633.587) |
... | ... | ... | ... | ... | ... | ... | ... |
807 | 11.051805 | 7.689320 | Health Post | Kofar Kuyam Bana Health Post | Public | Unguwar Juma | POINT (582069.723 779738.960) |
808 | 11.024940 | 7.775530 | Primary Health Clinic | Bogari Health Clinic | Public | Wuciciri | POINT (591481.024 776743.611) |
809 | 10.932492 | 7.785242 | Health Post | Rubuchi Health Post | Public | Wuciciri | POINT (592517.860 766516.749) |
810 | 10.970477 | 7.779581 | Primary Health Clinic | Angwar Bisa Primary Health Clinic | Public | Wuciciri | POINT (591909.185 770719.178) |
811 | 11.046583 | 7.761747 | Primary Health Clinic | Tudun Kusa Nursing Home And Maternity Ed | Public | Wuciciri | POINT (589981.002 779140.922) |
812 rows × 7 columns
We can now create our Voronoi polygons.
First, we select the geometry of our datasets:
# We select the 'geometry' columns from both our datasets
# This is our health care centers (points)
points_health_care = public_health_care.geometry.copy()
# Create a MultiPoint object from the GeoSeries
points_health_care = shapely.geometry.MultiPoint(points_health_care.tolist())
# And this is the boundaries of Kaduna state
polygon_state = state_boundary.geometry.iloc[0]
polygon_state
points_health_care
Then we can run the Voronoi creation:
voronoi = shapely.voronoi_polygons(points_health_care, extend_to=polygon_state)
voronoi
We can now extract our voronoi results and store them in a new geopandas dataset:
# Extract geometries from the GeometryCollection
geometries_list = [geom for geom in voronoi.geoms]
# Convert list of geometries to GeoDataFrame, and make sure to indicate the CRS of the data
# (epsg:26392, the same CRS we used to run the voronoi algorithm)
voronoi_polygons = gpd.GeoDataFrame({"geometry": geometries_list}, crs="epsg:26392")
# Intersect (i.e. cut) Voronoi polygons with Kaduna state
voronoi_polygons = voronoi_polygons.geometry.intersection(polygon_state)
# Create a new GeoDataFrame with clipped geometries and set the CRS
voronoi_polygons = gpd.GeoDataFrame({"geometry": voronoi_polygons}, crs="epsg:26392")
Finally we reproject our datasets back to the original WGS 84 CRS (epsg:4326):
voronoi_polygons = voronoi_polygons.to_crs("epsg:4326")
public_health_care = public_health_care.to_crs("epsg:4326")
state_boundary = state_boundary.to_crs("epsg:4326")
Let's now have a look at our results!
voronoi_map = voronoi_polygons.explore(
color="black", # Use black color for borders.
# Styling instructions. We fill the wards with lightgrey color (when hovering over them),
# and change the opacity of different elements.
style_kwds=dict(
fillColor="lightgrey", fill=True, opacity=1.0, fillOpacity=0, interactive=True
),
tiles="openstreetmap", # Use "Open Street Map" background tiles.
tooltip=False, # Do not show tooltip when hovering on wards.
smooth_factor=0, # Prevent smoothing of the polygons edges.
name="voronoi", # Name of the layer in the map.
)
public_health_care.explore(
m=voronoi_map, # Pass the previous map object 'voronoi_map'.
column="category", # Make choropleth based on "category" column.
tooltip="name", # Show "name" value in tooltip (on hover).
# Do not show column label in the tooltip.
tooltip_kwds=dict(labels=False),
popup=True, # Show the selected values in popup (on click).
cmap="gnuplot2", # Use "gnuplot2" matplotlib color scheme.
marker_kwds=dict(radius=5), # Size of the points.
# Styling instructions. We draw small black circles around our points,
# and change the opacity of different elements.
style_kwds=dict(color="black", weight=1, fill=True, opacity=0.5, fillOpacity=0.8),
name="health_care", # Name of the layer in the map.
)
folium.LayerControl().add_to(voronoi_map) # use folium to add layer control
<folium.map.LayerControl at 0x157f180a0d0>
We can now display our map:
voronoi_map
We are interested in finding out the population bound to specific health facilities.
We can, using zonal statistics, count the number of people living in each of our voronoi polygons:
# We use the function zonal_stats from the library 'rasterstats',
# to calculate a mean and sum of population per polygon.
# We store the results of the function in 'stats'.
stats = zonal_stats(
vectors=voronoi_polygons["geometry"], raster=population, stats=["mean", "sum"]
)
Then, we can join these results back to our voronoi_polygons
dataset:
# We make a dataframe out of our 'stats' results, and join it back to the voronoi polygons.
voronoi_polygons = voronoi_polygons.join(
pd.DataFrame(stats), how="left"
) # Note the .join (simple join) method.
We can rename the new columns, and have a look back at voronoi_polygons
:
# Rename the columns to 'Population_mean' and 'Population_sum'
voronoi_polygons = voronoi_polygons.rename(
columns={"mean": "Population_mean", "sum": "Population_sum"}
)
voronoi_polygons
geometry | Population_mean | Population_sum | |
---|---|---|---|
0 | POLYGON ((6.35607 10.90042, 6.33851 10.80842, ... | 8.322122 | 6433.0 |
1 | POLYGON ((6.20999 10.64080, 6.24852 10.55920, ... | 4.877341 | 5209.0 |
2 | POLYGON ((7.06982 10.15453, 7.27864 10.16575, ... | 3.727173 | 3388.0 |
3 | POLYGON ((6.96235 10.04701, 6.95227 10.04755, ... | 3.379874 | 2687.0 |
4 | POLYGON ((6.20999 10.64080, 6.27522 10.72405, ... | 7.081126 | 4277.0 |
... | ... | ... | ... |
807 | POLYGON ((8.67009 10.30241, 8.67188 10.32655, ... | 4.840290 | 2667.0 |
808 | POLYGON ((8.65935 10.40248, 8.67945 10.43338, ... | 12.548701 | 11595.0 |
809 | POLYGON ((8.71894 10.44895, 8.72037 10.45051, ... | 5.669492 | 669.0 |
810 | POLYGON ((8.71926 10.38929, 8.71951 10.39065, ... | 7.242188 | 927.0 |
811 | POLYGON ((8.74707 10.32979, 8.72229 10.38606, ... | 4.751402 | 2542.0 |
812 rows × 3 columns
We will now finally join the population data back to our public_health_care
.
To make sure we join the population data from our voronoi polygons to the correct health center data points, we will peform here instead a *spatial join*.
The spatial join (denoted .sjoin
below) is different from the simple join (denoted .join
above) we just used. The previous join simply add the population information we calculated, by matching the data in order to each rows of the table of voronoi_polygons
. The spatial join instead check the spatial location of each health care center point, and if this point is inside a particular voronoi polygon, it will add the information from the polygon to the point.
This is done for each health care center point:
# We use a spatial join to join population results back to health care centers.
public_health_care = public_health_care.sjoin(
voronoi_polygons, how="left"
) # Note the .sjoin (spatial join) method.
We can make sure the data has been correctly added:
# Display the results by descending order
public_health_care.sort_values(by=["Population_sum"], ascending=False)
display(public_health_care.iloc[:, indices + [26, 27]])
latitude | longitude | category | name | ownership | ward_name | geometry | Population_mean | Population_sum | |
---|---|---|---|---|---|---|---|---|---|
0 | 10.769058 | 6.642158 | Primary Health Clinic | Girezin Primary Health Carep | Public | Gayam | POINT (6.64216 10.76906) | 6.617143 | 4632.0 |
1 | 10.707223 | 6.688533 | Primary Health Center | Pole Wire's Health Center | Public | Gayam | POINT (6.68853 10.70722) | 5.366460 | 2592.0 |
2 | 10.777357 | 6.180544 | Primary Health Center | Kankangi Model Primary Health Center | Public | Kakangi | POINT (6.18054 10.77736) | 8.322122 | 6433.0 |
3 | 10.649940 | 6.354280 | Primary Health Center | Sabon Layi Primary Health Center | Public | Kakangi | POINT (6.35428 10.64994) | 8.494071 | 2149.0 |
4 | 10.646690 | 6.351030 | Primary Health Clinic | Ikon Allah Nursing And Maternity | Public | Kakangi | POINT (6.35103 10.64669) | 7.081126 | 4277.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
807 | 11.051805 | 7.689320 | Health Post | Kofar Kuyam Bana Health Post | Public | Unguwar Juma | POINT (7.68932 11.05180) | 28.758389 | 4285.0 |
808 | 11.024940 | 7.775530 | Primary Health Clinic | Bogari Health Clinic | Public | Wuciciri | POINT (7.77553 11.02494) | 5.909657 | 1897.0 |
809 | 10.932492 | 7.785242 | Health Post | Rubuchi Health Post | Public | Wuciciri | POINT (7.78524 10.93249) | 10.726744 | 1845.0 |
810 | 10.970477 | 7.779581 | Primary Health Clinic | Angwar Bisa Primary Health Clinic | Public | Wuciciri | POINT (7.77958 10.97048) | 11.590909 | 1530.0 |
811 | 11.046583 | 7.761747 | Primary Health Clinic | Tudun Kusa Nursing Home And Maternity Ed | Public | Wuciciri | POINT (7.76175 11.04658) | 5.276786 | 1182.0 |
812 rows × 9 columns
And perform a final check to make sure all of our health care centers have a population value attached:
# We check for null values (missing values) in each rows, and sum it back for each columns.
public_health_care[["Population_mean", "Population_sum"]].isnull().sum()
Population_mean 0 Population_sum 0 dtype: int64
Here, we set a target at 4,000 women of child-bearing age.
target = 4000
Let's quickly plot an histogram, to have a look at the repartition of population for the different health care centers.
# We create a new figure
fig, ax = plt.subplots(figsize=(14, 5))
# We plot an histogram based on the 'population_sum' column
ax.hist(public_health_care["Population_sum"], bins=50, alpha=0.5, edgecolor="black")
# Draw a vertical line at population=4000
ax.axvline(x=target, color="k", linestyle="--")
ax.set_xlabel("Population")
ax.set_ylabel("Frequency")
plt.show()
To finish, we just need to select voronoi areas where we have more than 4000 women of child-bearing age.
outliers = voronoi_polygons[voronoi_polygons["Population_sum"] > target]
And using the different element we have already created, add our outliers to the interactive map:
map_final = voronoi_polygons.explore(
color="black", # Use black color for borders.
# Styling instructions. We fill the wards with lightgrey color (when hovering over them),
# and change the opacity of different elements.
style_kwds=dict(
fillColor="lightgrey", fill=True, opacity=1.0, fillOpacity=0, interactive=True
),
tiles="openstreetmap", # Use "Open Street Map" background tiles.
tooltip=False, # Do not show tooltip when hovering on wards.
popup="Population_sum", # show population values in popup (on click).
popup_kwds=dict(labels=True), # Show column label in the popup.
smooth_factor=0, # Prevent smoothing of the polygons edges.
name="voronoi", # Name of the layer in the map.
)
outliers.explore(
m=map_final, # Pass the previous map object 'map_final'.
color="black", # Use black color for borders.
# Styling instructions. We fill the outlier wards with red color,
# and change the opacity of different elements.
style_kwds=dict(
fillColor="red", fill=True, opacity=0.0, fillOpacity=0.5, interactive=True
),
tooltip=False, # Do not show tooltip when hovering on wards.
popup="Population_sum", # Show population values in popup (on click).
popup_kwds=dict(labels=True), # Show column label in the popup.
smooth_factor=0, # Prevent smoothing of the polygons edges.
name="wards", # Name of the layer in the map.
)
public_health_care.explore(
m=map_final, # Pass the previous map object 'map_final'.
column="category", # Make choropleth based on "category" column.
tooltip="name", # Show "name" value in tooltip (on hover).
# Do not show column label in the tooltip.
tooltip_kwds=dict(labels=False),
popup=True, # Show the selected values in popup (on click).
cmap="gnuplot2", # Use "gnuplot2" matplotlib color scheme.
marker_kwds=dict(radius=5), # Size of the points.
# Styling instructions. We draw small black circles around our points,
# and change the opacity of different elements.
style_kwds=dict(color="black", weight=1, fill=True, opacity=0.5, fillOpacity=0.8),
name="health_care", # Name of the layer in the map.
)
folium.LayerControl().add_to(map_final) # use folium to add layer control
<folium.map.LayerControl at 0x157f4d1bf10>
map_final