%matplotlib inline
import pandas as pd
import geopandas
import matplotlib.pyplot as plt
In this case study, we will explore a dataset on artisanal mining sites located in eastern DR Congo.
Note: this tutorial is meant as a hands-on session, and most code examples are provided as exercises to be filled in. I highly recommend actually trying to do this yourself, but if you want to follow the solved tutorial, you can find this in the _solved
directory.
IPIS, the International Peace Information Service, manages a database on mining site visits in eastern DR Congo: http://ipisresearch.be/home/conflict-mapping/maps/open-data/
Since 2009, IPIS has visited artisanal mining sites in the region during various data collection campaigns. As part of these campaigns, surveyor teams visit mining sites in the field, meet with miners and complete predefined questionnaires. These contain questions about the mining site, the minerals mined at the site and the armed groups possibly present at the site.
Some additional links:
IPIS provides a WFS server to access the data. We can send a query to this server to download the data, and load the result into a geopandas GeoDataFrame:
import requests
import json
wfs_url = "http://geo.ipisresearch.be/geoserver/public/ows"
params = dict(service='WFS', version='1.0.0', request='GetFeature',
typeName='public:cod_mines_curated_all_opendata_p_ipis', outputFormat='json')
r = requests.get(wfs_url, params=params)
data_features = json.loads(r.content.decode('UTF-8'))
data_visits = geopandas.GeoDataFrame.from_features(data_features, crs={'init': 'epsg:4326'})
However, the data is also provided in the tutorial materials as a GeoJSON file, so it is certainly available during the tutorial.
# %load _solved/solutions/case-conflict-mapping3.py
# %load _solved/solutions/case-conflict-mapping4.py
# %load _solved/solutions/case-conflict-mapping5.py
The provided dataset contains a lot of information, much more than we are going to use in this tutorial. Therefore, we will select a subset of the column:
data_visits = data_visits[['vid', 'project', 'visit_date', 'name', 'pcode', 'workers_numb', 'interference', 'armed_group1', 'mineral1', 'geometry']]
data_visits.head()
Before starting the actual geospatial tutorial, we will use some more advanced pandas queries to construct a subset of the data that we will use further on:
# Take only the data of visits by IPIS
data_ipis = data_visits[data_visits['project'].str.contains('IPIS') & (data_visits['workers_numb'] > 0)]
# For those mining sites that were visited multiple times, take only the last visit
data_ipis_lastvisit = data_ipis.sort_values('visit_date').groupby('pcode', as_index=False).last()
data = geopandas.GeoDataFrame(data_ipis_lastvisit, crs=data_visits.crs)
Next to the mining site data, we are also going to use a dataset on protected areas (national parks) in Congo. This dataset was downloaded from http://www.wri.org/our-work/project/congo-basin-forests/democratic-republic-congo#project-tabs and included in the tutorial repository: data/cod_conservation.zip
.
# %load _solved/solutions/case-conflict-mapping10.py
# %load _solved/solutions/case-conflict-mapping11.py
We will see that both datasets use a different Coordinate Reference System (CRS). For many operations, however, it is important that we use a consistent CRS, and therefore we will convert both to a commong CRS.
But first, we explore problems we can encounter related to CRSs.
Goma is the capital city of North Kivu province of Congo, close to the border with Rwanda. It's coordinates are 1.66°S 29.22°E.
# %load _solved/solutions/case-conflict-mapping12.py
# %load _solved/solutions/case-conflict-mapping13.py
# %load _solved/solutions/case-conflict-mapping14.py
# %load _solved/solutions/case-conflict-mapping15.py
The distances we see here in degrees, which is not helpful for interpreting those distances. That is a reason we will convert the data to another coordinate reference system (CRS) for the remainder of this tutorial.
Check the first section of the [04-more-on-visualization.ipynb](04-more-on-visualization.ipynb) notebook for tips and tricks to plot with GeoPandas.
# %load _solved/solutions/case-conflict-mapping16.py
You will notice that the protected areas and mining sites do not map to the same area on the plot. This is because the Coordinate Reference Systems (CRS) differ for both datasets. Another reason we will need to convert the CRS!
Let's check the Coordinate Reference System (CRS) for both datasets.
The mining sites data uses the WGS 84 lat/lon (EPSG 4326) CRS:
data.crs
The protected areas dataset, on the other hand, uses a WGS 84 / World Mercator (EPSG 3395) projection (with meters as unit):
protected_areas.crs
We will convert both datasets to a local UTM zone, so we can plot them together and that distance-based calculations give sensible results.
To find the appropriate UTM zone, you can check http://www.dmap.co.uk/utmworld.htm or https://www.latlong.net/lat-long-utm.html, and in this case we will use UTM zone 35, which gives use EPSG 32735: https://epsg.io/32735
# %load _solved/solutions/case-conflict-mapping19.py
# %load _solved/solutions/case-conflict-mapping20.py
For the following exercises, check the first section of the [04-more-on-visualization.ipynb](04-more-on-visualization.ipynb) notebook for tips and tricks to plot with GeoPandas.
# %load _solved/solutions/case-conflict-mapping21.py
# %load _solved/solutions/case-conflict-mapping22.py
In addition to the previous figure:
# %load _solved/solutions/case-conflict-mapping23.py
In addition to the previous figure:
# %load _solved/solutions/case-conflict-mapping24.py
# %load _solved/solutions/case-conflict-mapping25.py
# %load _solved/solutions/case-conflict-mapping26.py
# %load _solved/solutions/case-conflict-mapping27.py
# %load _solved/solutions/case-conflict-mapping28.py
# %load _solved/solutions/case-conflict-mapping29.py
# %load _solved/solutions/case-conflict-mapping30.py
# %load _solved/solutions/case-conflict-mapping31.py
# %load _solved/solutions/case-conflict-mapping32.py
Based on the analysis and visualizations above, we can already see that there are mining sites inside the protected areas. Let's now do an actual spatial join to determine which sites are within the protected areas.
# %load _solved/solutions/case-conflict-mapping33.py
# %load _solved/solutions/case-conflict-mapping34.py
# %load _solved/solutions/case-conflict-mapping35.py
# %load _solved/solutions/case-conflict-mapping36.py
And what about the borders of the protected areas? (just outside the park)
# %load _solved/solutions/case-conflict-mapping37.py
# %load _solved/solutions/case-conflict-mapping38.py
# %load _solved/solutions/case-conflict-mapping39.py
# %load _solved/solutions/case-conflict-mapping40.py
# %load _solved/solutions/case-conflict-mapping41.py