Tutorial 0: Getting started with MovingPandas

MovingPandas provides a trajectory datatype based on GeoPandas. The project home is at https://github.com/anitagraser/movingpandas

This tutorial presents some of the trajectory manipulation and visualization functions implemented in MovingPandas.

After following this tutorial, you will have a basic understanding of what MovingPandas is and what it can be used for. You'll be ready to dive into the application examples presented in the follow-up tutorials.

Introduction

MovingPandas follows the "trajectories = time series with geometries" approach to modeling movement data.

A MovingPandas trajectory can be interpreted as either a time series of points or a time series of line segments. The line-based approach has many advantages for trajectory analysis and visualization. (For more detail, see, e.g., Westermeier (2018).)
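
The two interpretations can be illustrated without any library. The following is a minimal sketch in plain Python, not MovingPandas code; the `records` list and the `to_segments` helper are hypothetical names introduced only for this illustration. It pairs consecutive timestamped points into line segments, which is what makes per-segment properties such as duration (and later speed) possible:

```python
from datetime import datetime

# Hypothetical sample data: (timestamp, (x, y)) records ordered by time.
records = [
    (datetime(2018, 1, 1, 12, 0, 0), (0, 0)),
    (datetime(2018, 1, 1, 12, 6, 0), (6, 0)),
    (datetime(2018, 1, 1, 12, 10, 0), (6, 6)),
]

def to_segments(records):
    """Pair each record with its successor to form line segments."""
    return list(zip(records[:-1], records[1:]))

segments = to_segments(records)
for (t0, p0), (t1, p1) in segments:
    # Each segment has a start, an end, and a duration of its own.
    print(p0, "->", p1, "duration:", t1 - t0)
```

A series of n points yields n - 1 segments, so attributes that describe movement between two observations naturally belong to the segment view.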


References

  • Graser, A. (2019). MovingPandas: Efficient Structures for Movement Data in Python. GI_Forum ‒ Journal of Geographic Information Science 2019, 1-2019, 54-68. doi:10.1553/giscience2019_01_s54. URL: https://www.austriaca.at/rootcollection?arp=0x003aba2b
  • Westermeier, E.M. (2018). Contextual Trajectory Modeling and Analysis. Master Thesis, Interfaculty Department of Geoinformatics, University of Salzburg.

Jupyter notebook setup

In [ ]:
%matplotlib inline
In [ ]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
In [ ]:
import urllib
import os
import pandas as pd
import geopandas as gpd
from geopandas import GeoDataFrame, read_file
from shapely.geometry import Point, LineString, Polygon
from fiona.crs import from_epsg
from datetime import datetime, timedelta
from matplotlib import pyplot as plt

import sys
sys.path.append("..")
import movingpandas as mpd
print(mpd.__version__)

import warnings
warnings.simplefilter("ignore")
In [ ]:
CRS_METRIC = from_epsg(31256)

Creating a trajectory from scratch

Trajectory objects consist of a trajectory ID and a GeoPandas GeoDataFrame with a DatetimeIndex. The data frame therefore represents the trajectory data as a Pandas time series with associated point locations (and optional further attributes).

Let's create a small toy trajectory to see how this works:

In [ ]:
df = pd.DataFrame([
  {'geometry':Point(0,0), 't':datetime(2018,1,1,12,0,0)},
  {'geometry':Point(6,0), 't':datetime(2018,1,1,12,6,0)},
  {'geometry':Point(6,6), 't':datetime(2018,1,1,12,10,0)},
  {'geometry':Point(9,9), 't':datetime(2018,1,1,12,15,0)}
]).set_index('t')
geo_df = GeoDataFrame(df, crs=CRS_METRIC)
toy_traj = mpd.Trajectory(geo_df, 1)
toy_traj.df

We can access key information about our trajectory by looking at the print output:

In [ ]:
print(toy_traj)

We can also access the trajectory's GeoDataFrame:

In [ ]:
toy_traj.df

Visualizing trajectories

To visualize the trajectory, we can turn it into a linestring.

(The notebook environment automatically plots Shapely geometry objects like the LineString returned by to_linestring().)

In [ ]:
toy_traj.to_linestring()

We can compute the speed of movement between consecutive points along the trajectory. Because the toy trajectory uses a projected CRS with meter units (EPSG:31256), the values are in meters per second:

In [ ]:
toy_traj.add_speed(overwrite=True)
toy_traj.df
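
To make the numbers in the speed column easier to check, here is a minimal sketch of the underlying arithmetic in plain Python. It is not the MovingPandas implementation; `records` and `segment_speeds` are hypothetical names, and the sketch assumes planar coordinates in meters, as in the toy trajectory:

```python
from datetime import datetime
from math import hypot

# The toy trajectory from above as plain (timestamp, x, y) tuples.
records = [
    (datetime(2018, 1, 1, 12, 0, 0), 0, 0),
    (datetime(2018, 1, 1, 12, 6, 0), 6, 0),
    (datetime(2018, 1, 1, 12, 10, 0), 6, 6),
    (datetime(2018, 1, 1, 12, 15, 0), 9, 9),
]

def segment_speeds(records):
    """Return distance divided by elapsed seconds for each consecutive pair."""
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(records[:-1], records[1:]):
        distance = hypot(x1 - x0, y1 - y0)       # in CRS units (meters here)
        seconds = (t1 - t0).total_seconds()
        speeds.append(distance / seconds)
    return speeds

speeds = segment_speeds(records)
print([round(s, 4) for s in speeds])  # -> [0.0167, 0.025, 0.0141]
```

For example, the first segment covers 6 m in 360 s, which gives roughly 0.0167 m/s.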

We can also visualize the speed values:

In [ ]:
toy_traj.plot(column="speed", linewidth=5, capstyle='round', legend=True)

In contrast to the earlier example, where we visualized the whole trajectory as one linestring, the trajectory plot() function draws each line segment individually, so each segment can have its own color.

Analyzing trajectories

MovingPandas provides many functions for trajectory analysis.

To see all available functions of the MovingPandas.Trajectory class use:

In [ ]:
dir(mpd.Trajectory)

Functions that start with an underscore (e.g. __str__) should not be called directly. All other functions are free to use.

Extracting a moving object's position at a certain time

For example, let's have a look at the get_position_at() function:

In [ ]:
help(mpd.Trajectory.get_position_at)

When we call this method, the resulting point is directly rendered:

In [ ]:
toy_traj.get_position_at(datetime(2018,1,1,12,6,0), method="nearest")    

To see its coordinates, we can look at the print output:

In [ ]:
print(toy_traj.get_position_at(datetime(2018,1,1,12,6,0), method="nearest"))

The method parameter describes what the function should do if there is no entry in the trajectory GeoDataFrame for the specified timestamp.

For example, the trajectory has no entry at 2018-01-01 12:07:00:

In [ ]:
toy_traj.df
In [ ]:
print(toy_traj.get_position_at(datetime(2018,1,1,12,7,0), method="nearest"))
print(toy_traj.get_position_at(datetime(2018,1,1,12,7,0), method="interpolated"))
print(toy_traj.get_position_at(datetime(2018,1,1,12,7,0), method="ffill")) # from the previous row
print(toy_traj.get_position_at(datetime(2018,1,1,12,7,0), method="bfill")) # from the following row
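
The four strategies can be sketched in plain Python to show what each one does with the 12:07:00 timestamp. This is an illustration under stated assumptions, not the MovingPandas implementation: `records` and `position_at` are hypothetical names, interpolation is assumed to be linear in time along the straight segment, and the sketch only handles timestamps within the trajectory's time range:

```python
from datetime import datetime

# The toy trajectory from above as plain (timestamp, x, y) tuples.
records = [
    (datetime(2018, 1, 1, 12, 0, 0), 0, 0),
    (datetime(2018, 1, 1, 12, 6, 0), 6, 0),
    (datetime(2018, 1, 1, 12, 10, 0), 6, 6),
    (datetime(2018, 1, 1, 12, 15, 0), 9, 9),
]

def position_at(records, t, method="interpolated"):
    """Look up an (x, y) position for t using one of four strategies."""
    # Latest record at or before t, and earliest record at or after t.
    before = max((r for r in records if r[0] <= t), key=lambda r: r[0])
    after = min((r for r in records if r[0] >= t), key=lambda r: r[0])
    if method == "ffill":
        return before[1:]
    if method == "bfill":
        return after[1:]
    if method == "nearest":
        return min(records, key=lambda r: abs(r[0] - t))[1:]
    if method == "interpolated":
        if before[0] == after[0]:
            return before[1:]            # exact hit, nothing to interpolate
        fraction = (t - before[0]) / (after[0] - before[0])
        x = before[1] + fraction * (after[1] - before[1])
        y = before[2] + fraction * (after[2] - before[2])
        return (x, y)
    raise ValueError("unknown method: " + method)

t = datetime(2018, 1, 1, 12, 7, 0)
for method in ("nearest", "interpolated", "ffill", "bfill"):
    print(method, position_at(records, t, method))
```

At 12:07:00 the object is one quarter of the way through the four-minute segment from (6, 0) to (6, 6), so the interpolated position is (6, 1.5), while "nearest" and "ffill" both fall back to the 12:06:00 record.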

Extracting trajectory segments based on time or geometry (i.e. clipping)

First, let's extract the trajectory segment for a certain time period:

In [ ]:
segment = toy_traj.get_segment_between(datetime(2018,1,1,12,6,0),datetime(2018,1,1,12,12,0))
print(segment)
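
Conceptually, a time-based extraction is a filter on the time index. The following plain-Python sketch assumes that both bounds are inclusive, as with pandas label slicing on a DatetimeIndex; `records` and `records_between` are hypothetical names, and the sketch only selects existing records rather than interpolating positions at the exact start and end times:

```python
from datetime import datetime

# The toy trajectory from above as plain (timestamp, x, y) tuples.
records = [
    (datetime(2018, 1, 1, 12, 0, 0), 0, 0),
    (datetime(2018, 1, 1, 12, 6, 0), 6, 0),
    (datetime(2018, 1, 1, 12, 10, 0), 6, 6),
    (datetime(2018, 1, 1, 12, 15, 0), 9, 9),
]

def records_between(records, t1, t2):
    """Return the records whose timestamp lies in the closed interval [t1, t2]."""
    return [r for r in records if t1 <= r[0] <= t2]

segment_records = records_between(
    records, datetime(2018, 1, 1, 12, 6, 0), datetime(2018, 1, 1, 12, 12, 0)
)
print(segment_records)
```

Only the 12:06:00 and 12:10:00 records fall inside the requested period.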

Now, let's extract the trajectory segment that intersects with a given polygon:

In [ ]:
xmin, xmax, ymin, ymax = 2, 8, -10, 5
polygon = Polygon([(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)])
polygon
In [ ]:
intersections = toy_traj.clip(polygon)
print(intersections[0])
In [ ]:
intersections[0].plot(linewidth=5, capstyle='round')
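
Clipping differs from time-based extraction because the entry and exit points usually lie between two recorded positions. The sketch below illustrates the idea for the rectangle used above, using the Liang-Barsky line clipping algorithm and linear interpolation of time along each segment. It is a simplified stand-in, not how MovingPandas implements clip(): the helpers `clip_segment`, `interpolate` and `clip_records` are hypothetical, and they only handle axis-aligned rectangles, whereas clip() accepts Shapely polygons:

```python
from datetime import datetime

# The toy trajectory from above as plain (timestamp, x, y) tuples.
records = [
    (datetime(2018, 1, 1, 12, 0, 0), 0, 0),
    (datetime(2018, 1, 1, 12, 6, 0), 6, 0),
    (datetime(2018, 1, 1, 12, 10, 0), 6, 6),
    (datetime(2018, 1, 1, 12, 15, 0), 9, 9),
]

def clip_segment(x0, y0, x1, y1, xmin, xmax, ymin, ymax):
    """Liang-Barsky: return the (u_in, u_out) fraction range inside the box, or None."""
    dx, dy = x1 - x0, y1 - y0
    u_in, u_out = 0.0, 1.0
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None            # parallel to this edge and outside of it
            continue
        u = q / p
        if p < 0:
            u_in = max(u_in, u)        # segment enters through this edge
        else:
            u_out = min(u_out, u)      # segment leaves through this edge
    return (u_in, u_out) if u_in <= u_out else None

def interpolate(r0, r1, u):
    """Linearly interpolate time and position at fraction u between two records."""
    (t0, x0, y0), (t1, x1, y1) = r0, r1
    return (t0 + u * (t1 - t0), x0 + u * (x1 - x0), y0 + u * (y1 - y0))

def clip_records(records, xmin, xmax, ymin, ymax):
    """Return the parts of the trajectory inside the box as lists of records."""
    pieces, current = [], []
    for r0, r1 in zip(records[:-1], records[1:]):
        span = clip_segment(r0[1], r0[2], r1[1], r1[2], xmin, xmax, ymin, ymax)
        if span is None:
            if current:                # segment is fully outside: close open piece
                pieces.append(current)
                current = []
            continue
        u_in, u_out = span
        if not current:
            current.append(interpolate(r0, r1, u_in))   # entry point
        current.append(interpolate(r0, r1, u_out))      # vertex or exit point
        if u_out < 1.0:                # trajectory left the box: close this piece
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces

pieces = clip_records(records, xmin=2, xmax=8, ymin=-10, ymax=5)
for piece in pieces:
    print(piece)
```

For the toy trajectory this yields a single piece that enters the rectangle at x = 2 on the first segment, passes the recorded vertex (6, 0), and leaves at y = 5 on the second segment, with entry and exit times interpolated accordingly.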

Beyond toy trajectories: Loading trajectory data from GeoPackage

The MovingPandas repository contains a demo GeoPackage file that can be loaded as follows:

In [ ]:
%%time
df = read_file('data/demodata_geolife.gpkg')
df['t'] = pd.to_datetime(df['t'])
df = df.set_index('t').tz_localize(None)
print("Finished reading {} rows".format(len(df)))

After reading the trajectory point data from file, we want to construct the trajectories.

There are two options:

  1. Manually calling the Trajectory constructor
  2. Using TrajectoryCollection

Option 1: Creating trajectories manually

Pandas makes it straightforward to group trajectory points by trajectory id. After the grouping step, we can call the Trajectory constructor:

In [ ]:
%%time
trajectories = []
for key, values in df.groupby('trajectory_id'):
    trajectory = mpd.Trajectory(values, key)
    print(trajectory)
    trajectories.append(trajectory)

print("Finished creating {} trajectories".format(len(trajectories)))

Option 2: Creating trajectories with TrajectoryCollection

TrajectoryCollection is a convenience class that takes care of creating trajectories from a GeoDataFrame:

In [ ]:
traj_collection = mpd.TrajectoryCollection(df, 'trajectory_id')
print(traj_collection)
In [ ]:
traj_collection.plot(column='trajectory_id', legend=True, figsize=(9,5))

Let's look at one of those trajectories:

In [ ]:
my_traj = traj_collection.trajectories[1]
print(my_traj)
In [ ]:
my_traj.plot(column='speed', linewidth=5, capstyle='round', figsize=(9,3), legend=True, vmax=20)

To visualize trajectories in their geographical context, we can also create interactive plots with basemaps:

In [ ]:
my_traj.hvplot(c='speed', width=700, height=400, line_width=7.0, tiles='StamenTonerBackground', cmap='Viridis', colorbar=True, clim=(0,20))
In [ ]:
( my_traj.hvplot(c='speed', width=700, height=400, line_width=7.0, tiles='StamenTonerBackground', cmap='Viridis', colorbar=True, clim=(0,20)) * 
  gpd.GeoDataFrame([my_traj.get_row_at(datetime(2009,6,29,8,0,0))]).hvplot(geo=True, size=200, color='red') )

Trajectory manipulation and handling

Finding intersections with a Shapely polygon

The clip function can be used to extract trajectory segments that are located within an area of interest polygon.

This is how to use clip on a list of Trajectory objects:

In [ ]:
xmin, xmax, ymin, ymax = 116.3685035,116.3702945,39.904675,39.907728
polygon = Polygon([(xmin,ymin), (xmin,ymax), (xmax,ymax), (xmax,ymin), (xmin,ymin)])

intersections = []
for traj in trajectories:
    for intersection in traj.clip(polygon):
        intersections.append(intersection)
print("Found {} intersections".format(len(intersections)))
In [ ]:
intersections[2].plot(linewidth=5.0, capstyle='round')

Alternatively, using TrajectoryCollection:

In [ ]:
clipped = traj_collection.clip(polygon)
clipped.trajectories[2].plot(linewidth=5.0, capstyle='round')

Splitting trajectories

Gaps are quite common in trajectories. For example, GPS tracks may contain gaps if moving objects enter tunnels where GPS reception is lost. In other use cases, moving objects may leave the observation area for a longer time before returning and continuing their recorded track.

Depending on the use case, we therefore might want to split trajectories at observation gaps that exceed a certain minimum duration:

In [ ]:
my_traj = trajectories[1]
print(my_traj)
my_traj.plot(linewidth=5.0, capstyle='round')
In [ ]:
split = mpd.ObservationGapSplitter(my_traj).split(gap=timedelta(minutes=5))
for traj in split:
    print(traj)
In [ ]:
fig, axes = plt.subplots(nrows=1, ncols=len(split), figsize=(19,4))
for i, traj in enumerate(split):
    traj.plot(ax=axes[i], linewidth=5.0, capstyle='round')
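
The splitting rule itself is simple and can be sketched in plain Python. This is an illustration only: `split_at_gaps` and the sample `timestamps` are hypothetical, and the sketch assumes that a split happens when the time between two consecutive records is strictly greater than the gap:

```python
from datetime import datetime, timedelta

def split_at_gaps(timestamps, gap):
    """Split an ordered list of timestamps wherever consecutive records
    are more than `gap` apart."""
    if not timestamps:
        return []
    parts = [[timestamps[0]]]
    for prev, curr in zip(timestamps[:-1], timestamps[1:]):
        if curr - prev > gap:
            parts.append([])           # observation gap: start a new part
        parts[-1].append(curr)
    return parts

# Hypothetical sample data with a 18-minute and a 19-minute gap.
start = datetime(2009, 6, 29, 8, 0, 0)
timestamps = [start + timedelta(minutes=m) for m in (0, 1, 2, 20, 21, 40)]

parts = split_at_gaps(timestamps, gap=timedelta(minutes=5))
for part in parts:
    print([t.strftime("%H:%M") for t in part])
```

Note that the last part in this example contains a single record. A single point cannot form a line segment, so a real splitter has to decide what to do with such leftovers.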

Generalizing trajectories

To reduce the size of trajectory objects, we can generalize them, for example, using the Douglas-Peucker algorithm:

In [ ]:
original_traj = trajectories[1]
print(original_traj)
In [ ]:
original_traj.plot(column='speed', linewidth=5, capstyle='round', figsize=(9,3), legend=True)

Try different tolerance settings and observe how the line geometry, and therefore the trajectory length, changes:

In [ ]:
generalized_traj = mpd.DouglasPeuckerGeneralizer(original_traj).generalize(tolerance=0.001)
generalized_traj.plot(column='speed', linewidth=5, capstyle='round', figsize=(9,3), legend=True)
In [ ]:
print('Original length: %s'%(original_traj.get_length()))
print('Generalized length: %s'%(generalized_traj.get_length()))
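
To see why generalization can only shorten a trajectory, here is a compact plain-Python sketch of the Douglas-Peucker idea. It is not the code MovingPandas runs; `douglas_peucker`, `point_segment_distance`, `path_length` and the sample `points` are hypothetical, the distance is measured to the segment between the two end points (a common variant), and the tolerance is assumed to be in the same units as the coordinates:

```python
from math import hypot

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Projection of p onto the line through a and b, clamped to the segment.
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))
    return hypot(px - (ax + u * dx), py - (ay + u * dy))

def douglas_peucker(points, tolerance):
    """Recursively drop points closer than `tolerance` to the simplified line."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]   # all intermediate points are negligible
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right             # avoid duplicating the split point

def path_length(points):
    """Sum of the straight-line distances between consecutive points."""
    return sum(hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]))

# Hypothetical sample points: a small wiggle followed by a sharp peak.
points = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
simplified = douglas_peucker(points, tolerance=0.1)
print(simplified)  # -> [(0, 0), (2, 0), (3, 3), (4, 0)]
print(path_length(points), path_length(simplified))
```

The wiggle at (1, 0.05) is within the tolerance and is dropped, while the peak at (3, 3) is kept. Replacing a detour with a straight connection never increases the path length, which is why the generalized length is at most the original length.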

An alternative generalization method is to down-sample the trajectory to ensure a certain time delta between records:

In [ ]:
time_generalized = mpd.MinTimeDeltaGeneralizer(original_traj).generalize(tolerance=timedelta(minutes=1))
time_generalized.plot(column='speed', linewidth=5, capstyle='round', figsize=(9,3), legend=True)
In [ ]:
time_generalized.df.head(10)
In [ ]:
original_traj.df.head(10)
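
The down-sampling rule can be sketched in plain Python as well. This is an assumption-laden illustration rather than the library code: `min_time_delta` and the sample `timestamps` are hypothetical, and the sketch assumes that the first and last records are always kept and that a record is kept once at least the tolerance has passed since the previously kept record:

```python
from datetime import datetime, timedelta

def min_time_delta(timestamps, tolerance):
    """Keep the first record, then every record that is at least `tolerance`
    later than the previously kept one, and always keep the last record."""
    if not timestamps:
        return []
    kept = [timestamps[0]]
    for t in timestamps[1:]:
        if t - kept[-1] >= tolerance:
            kept.append(t)
    if kept[-1] != timestamps[-1]:
        kept.append(timestamps[-1])    # preserve the trajectory's end time
    return kept

# Hypothetical sample data: one record every 20 seconds.
start = datetime(2009, 6, 29, 8, 0, 0)
timestamps = [start + timedelta(seconds=20 * i) for i in range(11)]

kept = min_time_delta(timestamps, tolerance=timedelta(minutes=1))
print([t.strftime("%H:%M:%S") for t in kept])
```

With records every 20 seconds and a one-minute tolerance, roughly every third record survives, plus the final record so that the trajectory's overall time span is unchanged.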

Continue exploring MovingPandas