For the last two years, I've used an R Markdown file to visualize my OpenPaths.cc data and give some insight into my whereabouts in the past year. I've been using Python a lot more this year and came to love both Jupyter and mplleaflet (for my happy new year card). Since I can also massage the data more easily in Python, I've moved the old "analysis" to this Jupyter notebook.
To visualize the OpenPaths data, we first import a few Python packages:
```python
# Imports
import numpy  # Numerical calculations
numpy.seterr(divide='ignore', invalid='ignore')  # We use 'log' below for (sub)zero values and don't want to be warned
import matplotlib.pyplot as plt  # The de-facto standard for Python plotting
%matplotlib inline
import mplleaflet  # Easy matplotlib plots to interactive Leaflet web maps
import pandas  # Data analysis and statistics
import geopy  # Geodata handling
from geopy.geocoders import Nominatim  # Address search from OpenStreetMap
import geopandas as gpd  # Geodata handling, again
# Set up some default values
plt.rcParams['image.cmap'] = 'viridis'  # Change default colormap to a nice one (https://bids.github.io/colormap/)
plt.rcParams['figure.figsize'] = (16, 9)  # We live in a widescreen world
```
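The `numpy.seterr` call is needed because taking the log of zero or of a negative number raises a RuntimeWarning. The sketch below shows what those values become. It uses made-up numbers and the scoped `numpy.errstate` context manager, which is an alternative to the global `seterr` setting and not what the notebook itself does:

```python
import numpy

# Made-up values: positive, one, zero and negative
values = numpy.array([10.0, 1.0, 0.0, -5.0])

# Suppress the warnings only inside this block; numpy.seterr (as in the
# import cell) changes the setting for the rest of the session instead
with numpy.errstate(divide='ignore', invalid='ignore'):
    logs = numpy.log(values)

# log(0) gives -inf and the log of a negative number gives nan
print(logs)
```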
First, we use the pandas CSV reader to import the data. The fourth column contains the date, so we parse it as a date right away.
```python
# Load data from CSV
openpaths = pandas.read_csv('openpaths_habi.csv', parse_dates=[3])
openpaths.describe()
```
| | lat | lon | alt | version |
|---|---|---|---|---|
| count | 28275.000000 | 28275.000000 | 28275.000000 | 2.827500e+04 |
| mean | 46.336489 | 14.821240 | 514.422474 | 1.100000e+00 |
| std | 3.143443 | 29.446446 | 362.733171 | 2.220485e-16 |
| min | 30.024679 | -9.880663 | -48.000000 | 1.100000e+00 |
| 25% | 46.931829 | 7.437073 | 356.000000 | 1.100000e+00 |
| 50% | 46.968079 | 7.900918 | 489.330231 | 1.100000e+00 |
| 75% | 47.481705 | 8.220958 | 553.736145 | 1.100000e+00 |
| max | 53.586563 | 141.174377 | 3893.417969 | 1.100000e+00 |
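The `parse_dates=[3]` argument is easiest to see on a tiny in-memory file. The sketch below uses two made-up rows in the OpenPaths column order (lat, lon, alt, date, device, os, version). The device name contains a comma, so this sample quotes it. Whether the real export quotes it the same way is an assumption here:

```python
import io
import pandas

# Made-up rows, not real OpenPaths data; "iPhone6,2" is quoted because of its comma
csv_text = '''lat,lon,alt,date,device,os,version
46.900000,7.400000,540.0,2016-01-01 07:57:52,"iPhone6,2",9.2.1,1.1
46.950000,7.450000,550.0,2016-03-02 18:30:00,"iPhone6,2",9.3,1.1
'''

# parse_dates=[3] parses the fourth column (position 3, counting from 0) as datetimes
sample = pandas.read_csv(io.StringIO(csv_text), parse_dates=[3])
print(sample.dtypes)
```

Passing the column name instead, `parse_dates=['date']`, gives the same result and does not depend on the column's position.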
Since we're only interested in this year's data, we subset the so-called dataframe to 2016. To show all the data in the file instead, set `plot_current_year` to `False`; then no subsetting happens.
```python
# Use only this year's data
plot_current_year = True
if plot_current_year:
    whichyear = 2016
    start = pandas.Timestamp(str(whichyear))      # 1 January of that year, 00:00
    end = pandas.Timestamp(str(whichyear + 1))    # 1 January of the next year, 00:00
    # Keep rows from the start (inclusive) up to the end (exclusive)
    thisyear = openpaths[(openpaths['date'] >= start) & (openpaths['date'] < end)]
else:
    thisyear = openpaths
# Show the beginning of the dataframe
thisyear.head()
```
| | lat | lon | alt | date | device | os | version |
|---|---|---|---|---|---|---|---|
| 22765 | 46.286259 | 7.798826 | 1208.067993 | 2016-01-01 07:57:52 | iPhone6,2 | 9.2.1 | 1.1 |
| 22766 | 46.294952 | 7.800951 | 1017.783081 | 2016-01-01 10:57:04 | iPhone6,2 | 9.2.1 | 1.1 |
| 22767 | 46.304230 | 7.801154 | 640.996643 | 2016-01-01 11:03:28 | iPhone6,2 | 9.2.1 | 1.1 |
| 22768 | 46.306042 | 7.809315 | 640.624329 | 2016-01-01 11:14:08 | iPhone6,2 | 9.2.1 | 1.1 |
| 22769 | 46.313583 | 7.823861 | 635.348633 | 2016-01-01 11:22:40 | iPhone6,2 | 9.2.1 | 1.1 |
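The year filter depends on how the boundaries are compared. A strict `<` on the lower bound drops a point stamped exactly at midnight on 1 January. A half-open interval, or a comparison on `.dt.year`, keeps it. The sketch below checks this on a handful of made-up timestamps around the year boundaries:

```python
import pandas

# Made-up timestamps around the year boundaries, not real OpenPaths data
df = pandas.DataFrame({'date': pandas.to_datetime([
    '2015-12-31 23:59:59', '2016-01-01 00:00:00', '2016-07-15 12:00:00',
    '2016-12-31 23:59:59', '2017-01-01 00:00:00'])})

whichyear = 2016
start = pandas.Timestamp(str(whichyear))
end = pandas.Timestamp(str(whichyear + 1))

strict = df[(start < df['date']) & (df['date'] < end)]       # misses 2016-01-01 00:00:00
half_open = df[(df['date'] >= start) & (df['date'] < end)]   # [start, end)
by_year = df[df['date'].dt.year == whichyear]                # same rows as half_open

print(len(strict), len(half_open), len(by_year))
```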
I was still using an iPhone 5S (iPhone6,2) the whole year. When the new Salt store opened in town, one could get repairs for half the price, and I wanted to fix my broken display and get a new battery for my nearly three-year-old phone. Instead of repairing it, I got a new one for a very good price, so the data actually comes from two different phones of the same model.

I went through nine different versions of iOS. If we assume that the app recorded positions at the same rate under each version, then I used iOS 9.3 the longest, with nearly 2000 data points.
```python
print('iPhone models: %s' % ', '.join(thisyear['device'].unique()))
print('iOS versions: %s' % len(thisyear['os'].unique()))
for version in sorted(thisyear['os'].unique()):
    print('Version %s:\t%4s data points' % (version, len(thisyear[thisyear['os'] == version])))
```

```
iPhone models: iPhone6,2
iOS versions: 9
Version 10.0:	1149 data points
Version 10.0.2:	 706 data points
Version 10.1.1:	 378 data points
Version 10.2:	  33 data points
Version 9.2.1:	 256 data points
Version 9.3:	1979 data points
Version 9.3.1:	 214 data points
Version 9.3.2:	 535 data points
Version 9.3.5:	 256 data points
```
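The counting and sorting can also be written more compactly. `value_counts` returns the points per version in a single call. Dividing by the total turns the counts into shares, and those shares reflect time spent on each version only under the equal-rate assumption above. A plain `sorted()` compares the version strings character by character, which is why 10.x is listed before 9.x. The sketch below uses a hypothetical `version_key` helper and made-up `os` values:

```python
import pandas


def version_key(version):
    """Hypothetical helper: split '10.0.2' into (10, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


# Made-up sample of the 'os' column, not the real data
os_versions = pandas.Series(['9.3', '10.0', '9.3', '9.2.1', '10.0.2', '9.3'])

counts = os_versions.value_counts()   # data points per version
share = counts / counts.sum()         # fraction of all points per version

# Numeric order: 9.2.1, 9.3, 10.0, 10.0.2
for version in sorted(counts.index, key=version_key):
    print('Version %s:\t%4d data points (%.0f%%)' % (version, counts[version], 100 * share[version]))

print('Most data points:', counts.idxmax())
```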
Let's see where I've been this year. For the overview, we use the nice Toner map tiles from Stamen.
It seems I've only been in Switzerland and Morocco. Unfortunately, most of the data from May is missing, so there are no location points from our vacation in Sardinia...
```python
# Plot a subset of the data points
# We use a subset of the data to not overwhelm the map...
subset = 10
plt.scatter(thisyear['lon'][::subset],
            thisyear['lat'][::subset], edgecolor='none', alpha=0.618, s=200)
mplleaflet.display(tiles='stamen_toner')
```
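The `[::subset]` slice keeps every tenth row by position, so the map shows roughly a tenth of the points. The sketch below applies the same thinning to made-up coordinates and draws them as a plain static matplotlib scatter, with no map tiles and no mplleaflet. That can be handy for checking the point density before rendering the web map:

```python
import math

import matplotlib
matplotlib.use('Agg')  # off-screen backend for a standalone script; leave out in the notebook
import matplotlib.pyplot as plt
import numpy
import pandas

# Made-up track of 95 points around Bern, not real OpenPaths data
rng = numpy.random.default_rng(0)
track = pandas.DataFrame({'lat': 46.9 + rng.normal(0, 0.05, 95),
                          'lon': 7.4 + rng.normal(0, 0.05, 95)})

subset = 10
thinned = track.iloc[::subset]  # rows 0, 10, 20, ..., 90

fig, ax = plt.subplots()
ax.scatter(thinned['lon'], thinned['lat'], edgecolor='none', alpha=0.618, s=200)
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')

print(len(thinned), math.ceil(len(track) / subset))
```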