In [39]:
import sys
sys.path.append('./notebooks')
from gps_utils import rdp

import matplotlib.pyplot as plt
plt.rcParams['axes.xmargin'] = 0.1
plt.rcParams['axes.ymargin'] = 0.1
%matplotlib inline

import seaborn as sns
sns.set_style("whitegrid")
sns.set_context("talk")

%load_ext autoreload
%autoreload 2

# to suppress warnings of Seaborn's deprecated usage of Matplotlib
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd

from pykalman import KalmanFilter
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

Handling GPS data with Python

Dr. Florian Wilhelm

Senior Data Scientist @ inovex

Motivation

  • Project work and general interest
  • Hard to find information about Python libraries
  • Interest in the mathematical algorithms of that domain
  • Needed a subject for a EuroPython talk ;-)

GPX: GPS Exchange Format

  • common GPS data format
  • based on XML schema
  • describes waypoints, routes and tracks
  • GPS coordinates based GS-84 ellipsoid, height in meters

General structure

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<gpx version="1.1" creator="Creator of the file">
  <metadata> <!-- metadata of the file --> </metadata>
  <wpt lat="xx.xxx" lon="yy.yyy"><!-- ... --></wpt>
  <!-- more waypoints -->
  <rte>
    <!-- attributes of the route -->
    <rtept lat="xx.xxx" lon="yy.yyy"><!-- ... --></rtept>
    <!-- more route points -->
  </rte>
  <!-- more routes -->
  <trk>
    <!-- attributes of the track -->
    <trkseg>
      <trkpt lat="xx.xxx" lon="yy.yyy"><!-- ... --></trkpt>
      <!-- more track points -->
    </trkseg>
    <!-- more track segments -->
  </trk>
  <!-- more tracks -->
</gpx>

Example from Polar Flow

<?xml version="1.0" encoding="UTF-8"?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="Polar Flow">
  <metadata>
    <author>
      <name>Polar</name>
    </author>
    <time>2016-04-17T08:02:12.000Z</time>
  </metadata>
  <trk>
    <trkseg>
      <trkpt lat="53.560591" lon="9.9755985">
        <ele>17.0</ele>
        <time>2016-04-17T08:02:12.000Z</time>
      </trkpt>
      <trkpt lat="53.560591" lon="9.9755985">
        <ele>17.0</ele>
        <time>2016-04-17T08:02:13.000Z</time>
      </trkpt>
      <!-- ... many more points -->
    </trkseg>
  </trk>
</gpx>

gpxpy: GPX file parser

  • for reading and writing GPX files
  • licensed unter Apache 2.0
  • contains gpxinfo cli tool for basic stats
  • Python 3 compatible
  • written by Tomo Krajina
  • used by http://www.trackprofiler.com/
In [40]:
import gpxpy

with open('./gpx/hh_marathon.gpx') as fh:
    gpx_file = gpxpy.parse(fh)
In [41]:
segment = gpx_file.tracks[0].segments[0]
coords = pd.DataFrame([
        {'lat': p.latitude, 
         'lon': p.longitude, 
         'ele': p.elevation,
         'time': p.time} for p in segment.points])
coords.set_index('time', drop=True, inplace=True)
coords.head(3)
Out[41]:
ele lat lon
time
2016-04-17 08:02:12 17.0 53.560591 9.975599
2016-04-17 08:02:13 17.0 53.560591 9.975599
2016-04-17 08:02:14 17.0 53.560561 9.975591

Plotting a track

In [42]:
plt.plot(coords['lon'].values, coords['lat'].values);

... and the actual GPS coordinates

In [43]:
plt.plot(coords['lon'].values, coords['lat'].values)
plt.plot(coords['lon'].values, coords['lat'].values, 'ro');

Ohh! Need to reduce the number of points!

In [44]:
plt.plot(coords['lon'].values, coords['lat'].values)
plt.plot(coords['lon'].values[::150], coords['lat'].values[::150], 'ro');

Simplifying GPS tracks

Most GPS sensors have a uniform sampling rate leading to overly many points on almost straight lines.

How can we reduce the number of points?

Ramer-Douglas-Peucker algorithm



In [45]:
simple_coords = rdp(coords[['lon', 'lat']].values, epsilon=1e-4)
print("{} points reduced to {}!".format(coords.shape[0], simple_coords.shape[0]))
plt.plot(simple_coords[:, 0], simple_coords[:, 1])
plt.plot(simple_coords[:, 0], simple_coords[:, 1], 'ro');
12072 points reduced to 193!

Elevation of a GPS track

In [46]:
with open('./gpx/3-laender-giro.gpx') as fh:
    gpx_file = gpxpy.parse(fh)
    
segment = gpx_file.tracks[0].segments[0]
coords = pd.DataFrame([{'lat': p.latitude, 
                        'lon': p.longitude, 
                        'ele': p.elevation,
                        'time': p.time} for p in segment.points])
coords.set_index('time', drop=True, inplace=True)
In [47]:
coords.ele.plot();

... but what if the elevation was missing?

In [48]:
# Delete elevation data
for p in gpx_file.tracks[0].segments[0].points:
    p.elevation = None

srtm.py: add missing GPS elevation

In [49]:
import srtm
elevation_data = srtm.get_data()
elevation_data.add_elevations(gpx_file, smooth=True)
In [50]:
coords['srtm'] = [p.elevation for p in gpx_file.tracks[0].segments[0].points]
coords[['ele','srtm']].plot(title='Elevation');

mplleaflet: interactive Leaflet web maps

  • converts a matplotlib plot into a pannable, zoomable slippy map
  • extremely easy to use
  • licensed under new-BSD
  • integrates well with Jupyter
  • Python 3 compatible
  • tracks should be simplified with Ramer-Douglas-Peucker algorithm first

First we start with a matplotlib plot...

In [51]:
fig = plt.figure()
plt.plot(simple_coords[:,0], simple_coords[:,1])
plt.plot(simple_coords[:,0], simple_coords[:,1], 'ro');

... and project it onto OpenStreetMap

In [52]:
import mplleaflet
mplleaflet.display(fig=fig) # .show(fig=fig) to display in new window
Out[52]:
In [53]:
# prepare 3-laender-giro gps track
fig = plt.figure()
simple_coords = rdp(coords[['lon', 'lat']].values, epsilon=0.0003)
plt.plot(simple_coords[:, 0], simple_coords[:, 1])
plt.plot(simple_coords[:, 0], simple_coords[:, 1], 'ro');
In [54]:
mplleaflet.display(fig=fig)
Out[54]: