%matplotlib inline
import pandas as pd
import geopandas
import matplotlib.pyplot as plt
In this case study, we will explore a dataset on artisanal mining sites located in eastern DR Congo.
Note: this tutorial is meant as a hands-on session, and most code examples are provided as exercises to be filled in. I highly recommend actually trying to do this yourself, but if you want to follow the solved tutorial, you can find this in the _solved
directory.
IPIS, the International Peace Information Service, manages a database on mining site visits in eastern DR Congo: http://ipisresearch.be/home/conflict-mapping/maps/open-data/
Since 2009, IPIS has visited artisanal mining sites in the region during various data collection campaigns. As part of these campaigns, surveyor teams visit mining sites in the field, meet with miners and complete predefined questionnaires. These contain questions about the mining site, the minerals mined at the site and the armed groups possibly present at the site.
Some additional links:
IPIS provides a WFS server to access the data. We can send a query to this server to download the data, and load the result into a geopandas GeoDataFrame:
import requests
import json
wfs_url = "http://geo.ipisresearch.be/geoserver/public/ows"
params = dict(service='WFS', version='1.0.0', request='GetFeature',
typeName='public:cod_mines_curated_all_opendata_p_ipis', outputFormat='json')
r = requests.get(wfs_url, params=params)
data_features = json.loads(r.content.decode('UTF-8'))
data_visits = geopandas.GeoDataFrame.from_features(data_features, crs={'init': 'epsg:4326'})
However, the data is also provided in the tutorial materials as a GeoJSON file, so it is certainly available during the tutorial.
data_visits = geopandas.read_file("data/cod_mines_curated_all_opendata_p_ipis.geojson")
data_visits.head()
id | vid | source | project | pcode | name | visit_date | visit_onsite | visit_onsite_novisitreason | longitude | ... | digging_armed_group2 | forced_labour_armed_group2 | pillaging_armed_group2 | state_service1 | state_service2 | state_service3 | state_service4 | itsci | qualification | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | cod_mines_curated_all_opendata_p_ipis.fid-11f0... | 1 | IPIS - Ministère des Mines | IPIS - 2009 | codmine00191 | Eohe | 2009-01-01Z | 1 | None | 28.712580 | ... | NaN | NaN | NaN | None | None | None | None | None | None | POINT (28.71258 0.33188) |
1 | cod_mines_curated_all_opendata_p_ipis.fid-11f0... | 2 | IPIS - Ministère des Mines | IPIS - 2009 | codmine00192 | Eita | 2009-01-01Z | 1 | None | 28.699160 | ... | NaN | NaN | NaN | None | None | None | None | None | None | POINT (28.69916 0.32153) |
2 | cod_mines_curated_all_opendata_p_ipis.fid-11f0... | 3 | IPIS - Ministère des Mines | IPIS - 2009 | codmine00242 | Mungu Iko | 2009-01-01Z | 1 | None | 28.185142 | ... | NaN | NaN | NaN | None | None | None | None | None | None | POINT (28.1851423 0.54499175) |
3 | cod_mines_curated_all_opendata_p_ipis.fid-11f0... | 4 | IPIS - Ministère des Mines | IPIS - 2009 | codmine00260 | Kiviri/Tayna | 2009-01-01Z | 1 | None | 28.884528 | ... | NaN | NaN | NaN | None | None | None | None | None | None | POINT (28.884528 -0.352529) |
4 | cod_mines_curated_all_opendata_p_ipis.fid-11f0... | 5 | IPIS - Ministère des Mines | IPIS - 2009 | codmine00272 | Makanga | 2009-01-01Z | 1 | None | 28.903945 | ... | NaN | NaN | NaN | None | None | None | None | None | None | POINT (28.903945 -0.036707) |
5 rows × 62 columns
len(data_visits)
3687
The provided dataset contains a lot of information, much more than we are going to use in this tutorial. Therefore, we will select a subset of the column:
data_visits = data_visits[['vid', 'project', 'visit_date', 'name', 'pcode', 'workers_numb', 'interference', 'armed_group1', 'mineral1', 'geometry']]
data_visits.head()
vid | project | visit_date | name | pcode | workers_numb | interference | armed_group1 | mineral1 | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | IPIS - 2009 | 2009-01-01Z | Eohe | codmine00191 | 300.0 | NaN | None | Or | POINT (28.71258 0.33188) |
1 | 2 | IPIS - 2009 | 2009-01-01Z | Eita | codmine00192 | 110.0 | NaN | None | Or | POINT (28.69916 0.32153) |
2 | 3 | IPIS - 2009 | 2009-01-01Z | Mungu Iko | codmine00242 | NaN | NaN | FARDC | Or | POINT (28.1851423 0.54499175) |
3 | 4 | IPIS - 2009 | 2009-01-01Z | Kiviri/Tayna | codmine00260 | NaN | NaN | FDLR | Or | POINT (28.884528 -0.352529) |
4 | 5 | IPIS - 2009 | 2009-01-01Z | Makanga | codmine00272 | NaN | NaN | FDLR | Or | POINT (28.903945 -0.036707) |
Before starting the actual geospatial tutorial, we will use some more advanced pandas queries to construct a subset of the data that we will use further on:
# Take only the data of visits by IPIS
data_ipis = data_visits[data_visits['project'].str.contains('IPIS') & (data_visits['workers_numb'] > 0)]
# For those mining sites that were visited multiple times, take only the last visit
data_ipis_lastvisit = data_ipis.sort_values('visit_date').groupby('pcode', as_index=False).last()
data = geopandas.GeoDataFrame(data_ipis_lastvisit, crs=data_visits.crs)
Next to the mining site data, we are also going to use a dataset on protected areas (national parks) in Congo. This dataset was downloaded from http://www.wri.org/our-work/project/congo-basin-forests/democratic-republic-congo#project-tabs and included in the tutorial repository: data/cod_conservation.zip
.
protected_areas = geopandas.read_file("data/Conservation/RDC_aire_protegee_2013.shp")
# or to read it directly from the zip file:
# protected_areas = geopandas.read_file("/Conservation", vfs="zip://./data/cod_conservation.zip")
protected_areas.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fd7294d8128>
We will see that both datasets use a different Coordinate Reference System (CRS). For many operations, however, it is important that we use a consistent CRS, and therefore we will convert both to a commong CRS.
But first, we explore problems we can encounter related to CRSs.
Goma is the capital city of North Kivu province of Congo, close to the border with Rwanda. It's coordinates are 1.66°S 29.22°E.
from shapely.geometry import Point
goma = Point(29.22, -1.66)
dist_goma = data.distance(goma)
dist_goma.nsmallest(5)
301 0.226353 1868 0.235082 1865 0.252101 1067 0.263424 1538 0.264856 dtype: float64
The distances we see here in degrees, which is not helpful for interpreting those distances. That is a reason we will convert the data to another coordinate reference system (CRS) for the remainder of this tutorial.
Check the first section of the [04-more-on-visualization.ipynb](04-more-on-visualization.ipynb) notebook for tips and tricks to plot with GeoPandas.
ax = protected_areas.plot()
data.plot(ax=ax, color='C1')
<matplotlib.axes._subplots.AxesSubplot at 0x7fd729500c18>
You will notice that the protected areas and mining sites do not map to the same area on the plot. This is because the Coordinate Reference Systems (CRS) differ for both datasets. Another reason we will need to convert the CRS!
Let's check the Coordinate Reference System (CRS) for both datasets.
The mining sites data uses the WGS 84 lat/lon (EPSG 4326) CRS:
data.crs
{'init': 'epsg:4326'}
The protected areas dataset, on the other hand, uses a WGS 84 / World Mercator (EPSG 3395) projection (with meters as unit):
protected_areas.crs
{'datum': 'WGS84', 'lat_ts': 5, 'lon_0': 0, 'no_defs': True, 'proj': 'merc', 'units': 'm', 'x_0': 0, 'y_0': 0}
We will convert both datasets to a local UTM zone, so we can plot them together and that distance-based calculations give sensible results.
To find the appropriate UTM zone, you can check http://www.dmap.co.uk/utmworld.htm or https://www.latlong.net/lat-long-utm.html, and in this case we will use UTM zone 35, which gives use EPSG 32735: https://epsg.io/32735
data_utm = data.to_crs(epsg=32735)
protected_areas_utm = protected_areas.to_crs(epsg=32735)
ax = protected_areas_utm.plot()
data_utm.plot(ax=ax, color='C1')
<matplotlib.axes._subplots.AxesSubplot at 0x7fd7292c4898>
For the following exercises, check the first section of the [04-more-on-visualization.ipynb](04-more-on-visualization.ipynb) notebook for tips and tricks to plot with GeoPandas.
ax = protected_areas_utm.plot(figsize=(10, 10), color='green')
data_utm.plot(ax=ax, markersize=5, alpha=0.5)
ax.set_axis_off()
# alternative with constructing the matplotlib figure first
fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(aspect='equal'))
protected_areas_utm.plot(ax=ax, color='green')
data_utm.plot(ax=ax, markersize=5, alpha=0.5)
ax.set_axis_off()
In addition to the previous figure:
ax = protected_areas_utm.plot(figsize=(10, 10), color='green')
data_utm.plot(ax=ax, markersize=5, alpha=0.5, column='interference')
ax.set_axis_off()
In addition to the previous figure:
ax = protected_areas_utm.plot(figsize=(10, 10), color='green')
data_utm.plot(ax=ax, markersize=5, alpha=0.5, column='mineral1', legend=True)
ax.set_axis_off()
kahuzi = protected_areas_utm[protected_areas_utm['NAME_AP'] == "Kahuzi-Biega National park"].geometry.squeeze()
mines_kahuzi = data_utm[data_utm.within(kahuzi)]
mines_kahuzi
pcode | vid | project | visit_date | name | workers_numb | interference | armed_group1 | mineral1 | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
661 | codmine00680 | 1032 | IPIS - PROMINES MoFA 2013-2014 | 2013-08-28Z | Ibozia/Kalumé | 80.0 | 1.0 | Raïa Mutomboki | Cassitérite | POINT (567832.7086093378 9759143.339360647) |
662 | codmine00681 | 1025 | IPIS - PROMINES MoFA 2013-2014 | 2013-08-26Z | Matamba | 150.0 | 1.0 | Raïa Mutomboki | Cassitérite | POINT (598323.5389475008 9758688.142411157) |
663 | codmine00682 | 1031 | IPIS - PROMINES MoFA 2013-2014 | 2013-08-27Z | Mutete/Mukina | 170.0 | 1.0 | Raïa Mutomboki | Cassitérite | POINT (570733.4369126211 9761871.114227083) |
664 | codmine00683 | 1033 | IPIS - PROMINES MoFA 2013-2014 | 2013-08-28Z | Mutete | 100.0 | 1.0 | Raïa Mutomboki | Cassitérite | POINT (569881.0930415759 9762219.110778008) |
760 | codmine00779 | 1603 | IPIS - PROMINES MoFA 2013-2014 | 2014-02-25Z | Mazankala | 120.0 | 1.0 | Raïa Mutomboki | Cassitérite | POINT (613075.5326777868 9722956.979837928) |
813 | codmine00833 | 2439 | IPIS - IOM PROMINES 2015 | 2015-07-28Z | Kitendebwa | 50.0 | 0.0 | FARDC | Or | POINT (693078.9282059025 9770107.517721133) |
871 | codmine00893 | 1226 | IPIS - PROMINES MoFA 2013-2014 | 2013-09-28Z | Sebwa-Lukoma | 130.0 | 1.0 | Raïa Mutomboki | Cassitérite | POINT (660406.3452248175 9715261.717041001) |
872 | codmine00894 | 1305 | IPIS - PROMINES MoFA 2013-2014 | 2013-10-30Z | Rwamakaza | 160.0 | 1.0 | Raïa Mutomboki | Cassitérite | POINT (661266.834456568 9716072.198784607) |
1486 | codmine01764 | 180 | IPIS - 2009 | 2009-01-01Z | Mugaba I | 50.0 | NaN | NaN | Or | POINT (685167.3714990132 9744069.967416598) |
1487 | codmine01765 | 181 | IPIS - 2009 | 2009-01-01Z | Mugaba Ouest | 46.0 | NaN | NaN | Or | POINT (683156.6865782175 9746324.416321497) |
1681 | codmine01997 | 2476 | IPIS - IOM PROMINES 2015 | 2015-08-02Z | Nguba(Nkuba) kamisoke | 122.0 | 1.0 | Raïa Mutomboki | Cassitérite | POINT (622151.3489110788 9808363.111073116) |
len(mines_kahuzi)
11
Shapely and GeoPandas provide a lot functionality out of the box, because sometimes they will not exactly fit your analysis needs. However, we can still easily use them as building blocks for a custom spatial operation, and apply this to all our geometries.
single_mine = data_utm.geometry[0]
dist = protected_areas_utm.distance(single_mine)
idx = dist.idxmin()
closest_area = protected_areas_utm.loc[idx, 'NAME_AP']
closest_area
'Virunga National park'
def closest_protected_area(mine, protected_areas):
dist = protected_areas.distance(mine)
idx = dist.idxmin()
closest_area = protected_areas.loc[idx, 'NAME_AP']
return closest_area
result = data_utm.geometry.apply(lambda site: closest_protected_area(site, protected_areas_utm))
Based on the analysis and visualizations above, we can already see that there are mining sites inside the protected areas. Let's now do an actual spatial join to determine which sites are within the protected areas.
data_within_protected = geopandas.sjoin(data_utm, protected_areas_utm[['NAME_AP', 'geometry']],
op='within', how='inner')
len(data_within_protected)
64
data_within_protected['NAME_AP'].value_counts()
# or data_within_protected.groupby('NAME_AP').size()
Itombwe Nature Reserve 21 Luama-Katanga Hunting Domain 14 Kahuzi-Biega National park 11 Luama-Kivu Hunting Domain 9 Okapi Faunal Reserve 5 Maiko National park 3 Tayna Nature Reserve 1 Name: NAME_AP, dtype: int64
data_within_protected.groupby('NAME_AP')['workers_numb'].sum()
NAME_AP Itombwe Nature Reserve 2987.0 Kahuzi-Biega National park 1178.0 Luama-Katanga Hunting Domain 930.0 Luama-Kivu Hunting Domain 1057.0 Maiko National park 493.0 Okapi Faunal Reserve 997.0 Tayna Nature Reserve 244.0 Name: workers_numb, dtype: float64
And what about the borders of the protected areas? (just outside the park)
protected_areas_border = protected_areas_utm[['NAME_AP', 'geometry']].copy()
protected_areas_border['geometry'] = protected_areas_border.buffer(10000).difference(protected_areas_utm.unary_union)
protected_areas_border.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fd728f47518>
data_within_border = geopandas.sjoin(data_utm, protected_areas_border,
op='within', how='inner')
data_within_border['NAME_AP'].value_counts()
Kahuzi-Biega National park 99 Okapi Faunal Reserve 50 Maiko National park 33 Itombwe Nature Reserve 32 Luama-Kivu Hunting Domain 23 Kisimba Ikobo Nature Reserve 21 Tayna Nature Reserve 11 Luama-Katanga Hunting Domain 9 Upemba National park 4 Virunga National park 3 Mulumbu Hunting Domain 1 Name: NAME_AP, dtype: int64
This section shows an example how to extract information from a raster layer for each of the vector features. In this case, we will extract the value of a vegetation map for each mining site. The data can be downloaded from http://www.wri.org/our-work/project/congo-basin-forests/democratic-republic-congo#project-tabs (scroll a bit down, and in the right sidebar you can download the "Vegetation" data).
We use the rasterstats
package to do this:
from rasterstats import point_query
vegetation_raster = "data/Vegetation/Carte_vgt_ucl_10/carte_foraf/central_africa_vegetation_map_foraf.tif"
data['vegetation'] = point_query(data.geometry, vegetation_raster, interpolate='nearest')
/home/joris/miniconda3/lib/python3.5/site-packages/rasterstats/point.py:167: FutureWarning: The value of this property will change in version 1.0. Please see https://github.com/mapbox/rasterio/issues/86 for details. with Raster(raster, nodata=nodata, affine=affine, band=band) as rast: /home/joris/miniconda3/lib/python3.5/site-packages/rasterstats/io.py:242: FutureWarning: GDAL-style transforms are deprecated and will not be supported in Rasterio 1.0. self.affine = guard_transform(self.src.transform) /home/joris/miniconda3/lib/python3.5/site-packages/rasterstats/io.py:294: UserWarning: Setting nodata to -999; specify nodata explicitly warnings.warn("Setting nodata to -999; specify nodata explicitly")
The legend available as an excel file:
legend = pd.read_excel("data/Vegetation/Carte_vgt_ucl_10/carte_foraf/legend_central_africa_vegetation_map_foraf.xls")
data['vegetation'] = data['vegetation'].replace(legend['Final Label (sur le tif)'])
data['vegetation'].tail(10)
2141 Mosaic cultivated areas / vegetation (herbaceo... 2142 Mosaic cultivated areas / vegetation (herbaceo... 2143 Rural complex and Young secondary forest 2144 Savanna woodland - Tree savanna 2145 Rural complex and Young secondary forest 2146 Rural complex and Young secondary forest 2147 Submontane forest 2148 Rural complex and Young secondary forest 2149 Rural complex and Young secondary forest 2150 Rural complex and Young secondary forest Name: vegetation, dtype: object
data.plot(column='vegetation', figsize=(15, 15), legend=True)
<matplotlib.axes._subplots.AxesSubplot at 0x7faebd571320>