%matplotlib inline
import pandas as pd
import geopandas
Geospatial data is often available in specific GIS file formats or data stores, such as ESRI shapefiles, GeoJSON files, GeoPackage files, or a PostGIS (PostgreSQL) database.
We can use the GeoPandas library to read many of those GIS file formats (relying on the fiona
library under the hood, which is an interface to GDAL/OGR), using the geopandas.read_file
function.
For example, let's start by reading a shapefile with all the countries of the world (adapted from http://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-admin-0-countries/, zip file is available in the /data
directory), and inspect the data:
countries = geopandas.read_file("zip://./data/ne_110m_admin_0_countries.zip")
# or if the archive is unpacked:
# countries = geopandas.read_file("data/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp")
countries.head()
countries.plot()
What can we observe:
Using .head(), we can see the first rows of the dataset, just like we can do with pandas.
We can use the .plot() method to quickly get a basic visualization of the data.
We used the GeoPandas library to read in the geospatial data, which returned a GeoDataFrame:
type(countries)
A GeoDataFrame contains a tabular, geospatial dataset.
Such a GeoDataFrame is just like a pandas DataFrame, but with some additional functionality for working with geospatial data:
A .geometry attribute that always returns the column with the geometry information (returning a GeoSeries). The column name itself does not necessarily need to be 'geometry', but it will always be accessible as the .geometry attribute.
countries.geometry
type(countries.geometry)
countries.geometry.area
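To illustrate that the geometry column does not need to be called 'geometry', here is a minimal, self-contained sketch. The toy data and the column name 'geom' are made up for this illustration and are not part of the countries dataset.

```python
import geopandas
from shapely.geometry import Point

# Toy data: the geometry information lives in a column named "geom" (hypothetical name)
toy_gdf = geopandas.GeoDataFrame(
    {"name": ["a", "b"], "geom": [Point(0, 0), Point(1, 1)]},
    geometry="geom",  # tell GeoPandas which column holds the geometries
)

# The .geometry attribute still gives access to that column as a GeoSeries
print(toy_gdf.geometry.name)
print(type(toy_gdf.geometry).__name__)
```

The .geometry attribute is therefore the safest way to access the geometries when you do not know how the column is named.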
It's still a DataFrame, so we have all the pandas functionality available to use on the geospatial dataset, and to do data manipulations with the attributes and geometry information together.
For example, we can calculate the average population over all countries (by accessing the 'pop_est' column, and calling the mean
method on it):
countries['pop_est'].mean()
Or, we can use boolean filtering to select a subset of the dataframe based on a condition:
africa = countries[countries['continent'] == 'Africa']
africa.plot()
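The same pattern can be tried without the data files. The sketch below uses a toy GeoDataFrame whose names, continents and population values are invented for illustration only; it shows that a column aggregation works as in pandas and that a boolean filter gives back a GeoDataFrame.

```python
import geopandas
from shapely.geometry import Point

# Toy dataset with made-up values (not the Natural Earth data)
toy = geopandas.GeoDataFrame(
    {
        "name": ["A", "B", "C"],
        "continent": ["Africa", "Europe", "Africa"],
        "pop_est": [10, 20, 30],
    },
    geometry=[Point(0, 0), Point(10, 50), Point(20, -10)],
)

# Plain pandas aggregation on an attribute column
mean_pop = toy["pop_est"].mean()
print(mean_pop)  # 20.0

# Boolean filtering keeps the geospatial type and the geometry column
toy_africa = toy[toy["continent"] == "Africa"]
print(type(toy_africa).__name__)  # GeoDataFrame
print(len(toy_africa))  # 2
```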
The rest of the tutorial assumes you already know some pandas basics, but we will try to give hints along the way for those who are not familiar with it.
A few resources in case you want to learn more about pandas:
Spatial vector data can consist of different types, and the 3 fundamental types are points, lines, and polygons.
Each of them can also be combined in multi-part geometries (see https://shapely.readthedocs.io/en/stable/manual.html#geometric-objects for an extensive overview).
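As a small sketch of these geometry types, the snippet below builds one object of each kind with shapely, plus one multi-part geometry. The coordinates are arbitrary and chosen only for illustration.

```python
from shapely.geometry import LineString, MultiPoint, Point, Polygon

# One example of each fundamental type (arbitrary coordinates)
point = Point(1, 1)
line = LineString([(0, 0), (1, 2), (2, 2)])
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])

# A multi-part geometry: a collection of several points treated as one object
multi_point = MultiPoint([(0, 0), (1, 1)])

for geom in (point, line, square, multi_point):
    print(geom.geom_type)

# The individual parts of a multi-part geometry are available through .geoms
print(len(multi_point.geoms))  # 2
```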
For the example we have seen up to now, the individual geometry objects are Polygons:
print(countries.geometry[2])
Let's import some other datasets with different types of geometry objects.
A dataset about cities in the world (adapted from http://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-populated-places/, zip file is available in the /data
directory), consisting of Point data:
cities = geopandas.read_file("zip://./data/ne_110m_populated_places.zip")
print(cities.geometry[0])
And a dataset of rivers in the world (from http://www.naturalearthdata.com/downloads/50m-physical-vectors/50m-rivers-lake-centerlines/, zip file is available in the /data
directory) where each river is a (multi-)line:
rivers = geopandas.read_file("zip://./data/ne_50m_rivers_lake_centerlines.zip")
print(rivers.geometry[0])
type(countries.geometry[0])
To construct one ourselves:
from shapely.geometry import Point, Polygon, LineString
p = Point(0, 0)
print(p)
polygon = Polygon([(1, 1), (2, 2), (2, 1)])
polygon.area
polygon.distance(p)
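To see where those numbers come from, here is a self-contained version of the same triangle and point under different variable names (triangle and origin are names chosen for this sketch). The triangle has a base and a height of 1, so its area is 0.5, and its vertex closest to the origin is (1, 1), at a distance of sqrt(2).

```python
from shapely.geometry import Point, Polygon

origin = Point(0, 0)
triangle = Polygon([(1, 1), (2, 2), (2, 1)])

# Area of a right triangle with legs of length 1
print(triangle.area)  # 0.5

# Shortest distance between the two geometries: from (0, 0) to the vertex (1, 1)
print(triangle.distance(origin))  # sqrt(2), about 1.414

# Spatial predicates return booleans
print(triangle.contains(Point(1.8, 1.2)))  # True
print(triangle.contains(origin))  # False
```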
ax = countries.plot(edgecolor='k', facecolor='none', figsize=(15, 10))
rivers.plot(ax=ax)
cities.plot(ax=ax, color='red')
ax.set(xlim=(-20, 60), ylim=(-40, 40))
See the 04-more-on-visualization.ipynb notebook for more details on visualizing geospatial datasets.
Throughout the exercises in this course, we will work with several datasets about the city of Paris.
Here, we start with the following datasets:
The districts of Paris: paris_districts_utm.geojson
The bike stations in Paris: data/paris_bike_stations_mercator.gpkg
Both datasets are provided as files.
Let's explore those datasets:
# %load _solved/solutions/01-introduction-geospatial-data1.py
# %load _solved/solutions/01-introduction-geospatial-data2.py
# %load _solved/solutions/01-introduction-geospatial-data3.py
# %load _solved/solutions/01-introduction-geospatial-data4.py
# %load _solved/solutions/01-introduction-geospatial-data5.py
A plot with just some points can be hard to interpret without any spatial context. Therefore, in the next exercise we will learn how to add a background map.
We are going to make use of the contextily package. The add_basemap()
function of this package makes it easy to add a background web map to our plot. We begin by plotting our data, and then pass the matplotlib axes object (returned by the dataframe's plot()
method) to the add_basemap()
function. contextily
will then download the web tiles needed for the geographical extent of your plot.
# %load _solved/solutions/01-introduction-geospatial-data6.py
# %load _solved/solutions/01-introduction-geospatial-data7.py
# %load _solved/solutions/01-introduction-geospatial-data8.py
# %load _solved/solutions/01-introduction-geospatial-data9.py
# %load _solved/solutions/01-introduction-geospatial-data10.py
# %load _solved/solutions/01-introduction-geospatial-data11.py
# %load _solved/solutions/01-introduction-geospatial-data12.py
# %load _solved/solutions/01-introduction-geospatial-data13.py
# %load _solved/solutions/01-introduction-geospatial-data14.py
# %load _solved/solutions/01-introduction-geospatial-data15.py
# %load _solved/solutions/01-introduction-geospatial-data16.py
# %load _solved/solutions/01-introduction-geospatial-data17.py
# %load _solved/solutions/01-introduction-geospatial-data18.py
# %load _solved/solutions/01-introduction-geospatial-data19.py
fiona
Under the hood, GeoPandas uses the Fiona library (a Pythonic interface to GDAL/OGR) to read and write data. GeoPandas provides a more user-friendly wrapper, which is sufficient for most use cases. But sometimes you want more control; in that case, you can read a file with fiona as follows:
import fiona
from shapely.geometry import shape

with fiona.Env():
    with fiona.open("zip://./data/ne_110m_admin_0_countries.zip") as collection:
        for feature in collection:
            # ... do something with geometry
            geom = shape(feature['geometry'])
            # ... do something with properties
            print(feature['properties']['name'])
geopandas.GeoDataFrame({
'geometry': [Point(1, 1), Point(2, 2)],
'attribute1': [1, 2],
'attribute2': [0.1, 0.2]})
For example, if you have lat/lon coordinates in two columns:
df = pd.DataFrame(
{'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]})
gdf = geopandas.GeoDataFrame(
df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude))
gdf
See http://geopandas.readthedocs.io/en/latest/gallery/create_geopandas_from_pandas.html for a full example.
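One detail that is easy to get wrong is the argument order of points_from_xy: x is the longitude and y is the latitude. The sketch below repeats the conversion on two of the rows above and also attaches a coordinate reference system. The choice of "EPSG:4326" (WGS84 lat/lon) is an assumption made for this illustration; the original example does not set a CRS.

```python
import geopandas
import pandas as pd

df_small = pd.DataFrame(
    {
        "City": ["Buenos Aires", "Brasilia"],
        "Latitude": [-34.58, -15.78],
        "Longitude": [-58.66, -47.91],
    }
)

gdf_small = geopandas.GeoDataFrame(
    df_small,
    # x = longitude, y = latitude
    geometry=geopandas.points_from_xy(df_small["Longitude"], df_small["Latitude"]),
    crs="EPSG:4326",  # assumption: the coordinates are WGS84 lat/lon
)

# The x coordinates of the points are the longitudes
print(gdf_small.geometry.x.tolist())  # [-58.66, -47.91]
print(gdf_small.crs.to_epsg())  # 4326
```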