Spatial relationships and operations

In [1]:
%matplotlib inline

import pandas as pd
import geopandas

pd.options.display.max_rows = 10
In [2]:
countries = geopandas.read_file("zip://./data/ne_110m_admin_0_countries.zip")
cities = geopandas.read_file("zip://./data/ne_110m_populated_places.zip")
rivers = geopandas.read_file("zip://./data/ne_50m_rivers_lake_centerlines.zip")

Spatial relationships

An important aspect of geospatial data is that we can look at spatial relationships: how two spatial objects relate to each other (whether they overlap, intersect, contain, .. one another).

The topological, set-theoretic relationships in GIS are typically based on the DE-9IM model. See https://en.wikipedia.org/wiki/Spatial_relation for more information.

(Image by Krauss, CC BY-SA 3.0)

Relationships between individual objects

Let's first create some small toy spatial objects:

A polygon (note: we use `.squeeze()` here to to extract the scalar geometry object from the GeoSeries of length 1):

In [3]:
belgium = countries.loc[countries['name'] == 'Belgium', 'geometry'].squeeze()

Two points:

In [4]:
paris = cities.loc[cities['name'] == 'Paris', 'geometry'].squeeze()
brussels = cities.loc[cities['name'] == 'Brussels', 'geometry'].squeeze()

And a linestring:

In [5]:
from shapely.geometry import LineString
line = LineString([paris, brussels])

Let's visualize those 4 geometry objects together (I only put them in a GeoSeries to easily display them together with the geopandas .plot() method):

In [6]:
geopandas.GeoSeries([belgium, paris, brussels, line]).plot(cmap='tab10')
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9975ca11d0>

You can recognize the abstract shape of Belgium.

Brussels, the capital of Belgium, is thus located within Belgium. This is a spatial relationship, and we can test this using the individual shapely geometry objects as follow:

In [7]:
brussels.within(belgium)
Out[7]:
True

And using the reverse, Belgium contains Brussels:

In [8]:
belgium.contains(brussels)
Out[8]:
True

On the other hand, Paris is not located in Belgium:

In [9]:
belgium.contains(paris)
Out[9]:
False
In [10]:
paris.within(belgium)
Out[10]:
False

The straight line we draw from Paris to Brussels is not fully located within Belgium, but it does intersect with it:

In [11]:
belgium.contains(line)
Out[11]:
False
In [12]:
line.intersects(belgium)
Out[12]:
True

Spatial relationships with GeoDataFrames

The same methods that are available on individual shapely geometries as we have seen above, are also available as methods on GeoSeries / GeoDataFrame objects.

For example, if we call the contains method on the world dataset with the paris point, it will do this spatial check for each country in the world dataframe:

In [13]:
countries.contains(paris)
Out[13]:
0      False
1      False
2      False
3      False
4      False
       ...  
172    False
173    False
174    False
175    False
176    False
Length: 177, dtype: bool

Because the above gives us a boolean result, we can use that to filter the dataframe:

In [14]:
countries[countries.contains(paris)]
Out[14]:
iso_a3 name continent pop_est gdp_md_est geometry
55 FRA France Europe 67106161.0 2699000.0 (POLYGON ((2.513573032246114 51.14850617126189...

And indeed, France is the only country in the world in which Paris is located.

Another example, extracting the linestring of the Amazon river in South America, we can query through which countries the river flows:

In [15]:
amazon = rivers[rivers['name'] == 'Amazonas'].geometry.squeeze()
In [16]:
countries[countries.crosses(amazon)]  # or .intersects
Out[16]:
iso_a3 name continent pop_est gdp_md_est geometry
22 BRA Brazil South America 207353391.0 3081000.0 POLYGON ((-57.625133429583 -30.21629485445423,...
35 COL Colombia South America 47698524.0 688000.0 POLYGON ((-66.87632585312258 1.253360500489336...
124 PER Peru South America 31036656.0 410400.0 POLYGON ((-69.52967810736496 -10.9517343075021...
REFERENCE:

Overview of the different functions to check spatial relationships (*spatial predicate functions*): * `equals` * `contains` * `crosses` * `disjoint` * `intersects` * `overlaps` * `touches` * `within` * `covers` See https://shapely.readthedocs.io/en/stable/manual.html#predicates-and-relationships for an overview of those methods. See https://en.wikipedia.org/wiki/DE-9IM for all details on the semantics of those operations.

Let's practice!

We will again use the Paris datasets to do some exercises. Let's start importing them again:

In [17]:
districts = geopandas.read_file("data/paris_districts_utm.geojson")
stations = geopandas.read_file("data/paris_sharing_bike_stations_utm.geojson")
EXERCISE: * Create a shapely `Point` object for the Notre Dame cathedral (which has x/y coordinates of (452321.4581477511, 5411311.330882619)) * Calculate the distance of each bike station to the Notre Dame. * Check in which district the Notre Dame is located.
In [18]:
from shapely.geometry import Point
In [19]:
notre_dame = Point(452321.4581477511, 5411311.330882619)
In [20]:
stations.distance(notre_dame)
Out[20]:
0       2143.292639
1       4138.525426
2       4053.291188
3       6427.098278
4       6059.063211
           ...     
1221    2549.427604
1222    3575.859934
1223    1521.203550
1224    5961.745568
1225    2347.675288
Length: 1226, dtype: float64
In [21]:
districts.contains(notre_dame)
Out[21]:
0     False
1     False
2     False
3     False
4     False
      ...  
75    False
76    False
77    False
78    False
79    False
Length: 80, dtype: bool
In [22]:
districts[districts.contains(notre_dame)]
Out[22]:
id district_name population geometry
15 16 Notre-Dame 4087 POLYGON ((453143.5543612476 5410820.043549786,...

Spatial operations

Next to the spatial predicates that return boolean values, Shapely and GeoPandas also provide operations that return new geometric objects.

Binary operations:

Buffer:

See https://shapely.readthedocs.io/en/stable/manual.html#spatial-analysis-methods for more details.

For example, using the toy data from above, let's construct a buffer around Brussels (which returns a Polygon):

In [23]:
geopandas.GeoSeries([belgium, brussels.buffer(1)]).plot(alpha=0.5, cmap='tab10')
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f997432a080>

and now take the intersection, union or difference of those two polygons:

In [24]:
brussels.buffer(1).intersection(belgium)
Out[24]:
In [25]:
brussels.buffer(1).union(belgium)
Out[25]:
In [26]:
brussels.buffer(1).difference(belgium)
Out[26]:

Another useful method is the unary_union attribute, which converts the set of geometry objects in a GeoDataFrame into a single geometry object by taking the union of all those geometries.

For example, we can construct a single object for the Africa continent:

In [27]:
africa_countries = countries[countries['continent'] == 'Africa']
In [28]:
africa = africa_countries.unary_union
In [29]:
africa
Out[29]:
In [30]:
print(str(africa)[:1000])
MULTIPOLYGON (((49.54351891459575 -12.46983285894055, 49.80898074727909 -12.89528492599955, 50.05651085795716 -13.55576140712198, 50.21743126811407 -14.7587887508768, 50.47653689962553 -15.22651213955054, 50.37711144389596 -15.70606943121913, 50.20027469259318 -16.00026336025677, 49.86060550313868 -15.41425261806692, 49.67260664246086 -15.71020354580248, 49.86334435405016 -16.45103687913878, 49.77456424337271 -16.8750420060936, 49.49861209493412 -17.10603565843827, 49.43561852397031 -17.95306406013437, 49.04179243347394 -19.11878101977445, 48.54854088724801 -20.49688811613413, 47.93074913919867 -22.39150115325108, 47.54772342305131 -23.78195891692852, 47.0957613462266 -24.94162973399045, 46.28247765481709 -25.17846282318411, 45.40950768411045 -25.60143442149309, 44.83357384621755 -25.34610116953894, 44.03972049334976 -24.98834522878231, 43.76376834491117 -24.46067717864999, 43.69777754087445 -23.5741163062506, 43.34565433123763 -22.77690398528387, 43.254187046081 -22.05741301848412, 43
REMEMBER:

GeoPandas (and Shapely for the individual objects) provides a whole lot of basic methods to analyse the geospatial data (distance, length, centroid, boundary, convex_hull, simplify, transform, ....), much more than the few that we can touch in this tutorial. * An overview of all methods provided by GeoPandas can be found here: http://geopandas.readthedocs.io/en/latest/reference.html

Let's practice!

EXERCISE: What are the districts close to the Seine?

Below, the coordinates for the Seine river in the neighbourhood of Paris are provided as a GeoJSON-like feature dictionary (created at http://geojson.io).

Based on this `seine` object, we want to know which districts are located close (maximum 150 m) to the Seine.

  • Create a buffer of 150 m around the Seine.
  • Check which districts intersect with this buffered object.
  • Make a visualization of the districts indicating which districts are located close to the Seine.

In [31]:
# created a line with http://geojson.io
s_seine = geopandas.GeoDataFrame.from_features({"type":"FeatureCollection","features":[{"type":"Feature","properties":{},"geometry":{"type":"LineString","coordinates":[[2.408924102783203,48.805619828930226],[2.4092674255371094,48.81703747481909],[2.3927879333496094,48.82325391133874],[2.360687255859375,48.84912860497674],[2.338714599609375,48.85827758964043],[2.318115234375,48.8641501307046],[2.298717498779297,48.863246707697],[2.2913360595703125,48.859519915404825],[2.2594070434570312,48.8311646245967],[2.2436141967773438,48.82325391133874],[2.236919403076172,48.82347994904826],[2.227306365966797,48.828339513221444],[2.2224998474121094,48.83862215329593],[2.2254180908203125,48.84856379804802],[2.2240447998046875,48.85409863123821],[2.230224609375,48.867989496547864],[2.260265350341797,48.89192242750887],[2.300262451171875,48.910203080780285]]}}]},
                                               crs={'init': 'epsg:4326'})
In [32]:
# convert to local UTM zone
s_seine_utm = s_seine.to_crs(epsg=32631)
In [33]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(20, 10))
districts.plot(ax=ax, color='grey', alpha=0.4, edgecolor='k')
s_seine_utm.plot(ax=ax)
Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f99742abda0>
In [34]:
# access the single geometry object
seine = s_seine_utm.geometry.squeeze()
In [35]:
seine_buffer = seine.buffer(150)
In [36]:
seine_buffer
Out[36]:
In [37]:
districts_seine = districts[districts.intersects(seine_buffer)]
In [38]:
fig, ax = plt.subplots(figsize=(20, 10))
districts.plot(ax=ax, color='grey', alpha=0.4, edgecolor='k')
districts_seine.plot(ax=ax, color='blue', alpha=0.4, edgecolor='k')
s_seine_utm.plot(ax=ax)
Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f99741e5908>
In [ ]: