from __future__ import print_function
On 10 May 2015, I'll run my third marathon in Eugene, Oregon. This is my second time training for this particular race, and my second attempt to run 26.2 miles in less than four hours. I missed it last year by about 10 minutes.
In this notebook, I compare my training log to the one from my previous effort. This data is all on Strava, which has an excellent API and third-party Python binding, so it's easy to dive in.
%matplotlib inline
import numpy as np
import pylab as p
p.mpl.rc('savefig', dpi=200)
p.mpl.rc('figure', figsize=(5,2.5))
p.mpl.rc('font', size=6)
from stravalib import Client, unithelper
To use the Strava API I needed to sign up for an access token. This is a secret string that authenticates me and limits my usage to 600 requests every 15 minutes. I pasted it into a separate file to avoid publishing it.
with open('strava_token.txt') as f:
    TOKEN = f.readline().rstrip()
c = Client(access_token=TOKEN)
me = c.get_athlete()
me
<Athlete id=2930517 firstname=Thomas lastname=Baldwin>
I can grab a list of all the activities I've ever done:
activities = c.get_activities()
# strava returns most recent first, reverse this and convert to list
activities = list(activities)[::-1]
As an example, look back on the first GPS run I ever logged:
eg = activities[0]
eg.name
u'First Nike+ run'
print(eg.distance)
6750.00 m
stravalib is managing units for me, which is cool. I can use unithelper to work in miles instead of meters:
print(unithelper.miles(eg.distance))
4.19 mi
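The conversion itself is plain arithmetic. As a rough sketch that does not depend on stravalib (the helper name meters_to_miles is my own invention, not part of the library), dividing by the length of the international mile in meters reproduces the figure above:

```python
# Hypothetical helper, not part of stravalib: convert meters to miles by hand.
METERS_PER_MILE = 1609.344  # length of the international mile in meters

def meters_to_miles(meters):
    """Return a distance given in meters as a plain float number of miles."""
    return meters / METERS_PER_MILE

first_run_miles = meters_to_miles(6750.0)
print('%.2f mi' % first_run_miles)
```

Unlike unithelper, this returns a bare float with no unit attached, so it is up to the caller to remember what the number means.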
My list of activities also involves bike rides, hikes, etc. I'll limit myself to only runs:
runs = [a for a in activities if a.type == a.RUN]
To plot these all on one graph, I make numpy arrays for the dates and the distances. (I cast the distances to floating-point to discard the unit.)
dists = np.array([float(unithelper.miles(a.distance)) for a in runs])
dates = np.array([a.start_date_local for a in runs])
p.plot(dates, dists, 'o-')
p.xticks(rotation='vertical');
I usually run about 4 miles at a time. My two periods of marathon training are pretty obvious on this graph - two 18-week spans where I often went long.
A good way of visualizing cumulative mileage is the famous "Goering diagram", named for its inventor, Andrea Goering.
import datetime
def in_year(date, year):
    begin = datetime.datetime(year, 1, 1)
    end = datetime.datetime(year + 1, 1, 1)
    return (date > begin) & (date < end)
years = (2013, 2014, 2015)
for i,year in enumerate(reversed(years)):
    mask = in_year(dates, year)
    logged = np.cumsum(dists[mask])
    # shift earlier years forward so every curve shares one calendar axis
    calendar = dates[mask] + i * datetime.timedelta(365)
    p.plot(calendar, logged, 'o-', label=str(year))
p.xticks(rotation='vertical')
p.ylabel('cumulative mileage')
p.legend(loc='best');
It looks like I've really been getting after it in 2015, which is true, but not any more so than in 2014. I've just been doing so earlier in the year, since the marathon was moved from late July to early May.
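The overlay trick behind this plot can be tried without any Strava data. Below is a minimal sketch on a made-up log, using only the standard library; the function name overlay_year and the sample entries are assumptions for illustration. It keeps one year's runs, shifts their dates onto a common target year, and accumulates the mileage:

```python
import datetime

# Made-up (start time, miles) log entries, purely for illustration.
log = [
    (datetime.datetime(2014, 1, 5), 4.0),
    (datetime.datetime(2014, 3, 9), 10.0),
    (datetime.datetime(2015, 1, 4), 5.0),
    (datetime.datetime(2015, 2, 8), 12.0),
]

def overlay_year(log, year, target_year):
    """Cumulative mileage for `year`, with dates shifted onto `target_year`."""
    shift = datetime.timedelta(365 * (target_year - year))
    shifted_dates, totals = [], []
    running_total = 0.0
    for start, miles in log:
        if start.year != year:
            continue
        running_total += miles
        shifted_dates.append(start + shift)
        totals.append(running_total)
    return shifted_dates, totals

dates_2014, totals_2014 = overlay_year(log, 2014, 2015)
dates_2015, totals_2015 = overlay_year(log, 2015, 2015)
```

Shifting by a flat 365 days per year ignores leap days, which is harmless for 2013 through 2015 but would drift by a day across a leap year.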
A fairer Goering diagram would compare my 18-week preparation for each of these races in isolation:
races = [
    ('EM 2014', datetime.date(2014, 7, 27)),
    ('EM 2015', datetime.date(2015, 5, 10)),
]
def end_of_day(date):
    return datetime.datetime.combine(date, datetime.time(23, 59, 59))
def in_training(date, race_day):
    end = end_of_day(race_day)
    begin = end - datetime.timedelta(weeks=18)
    return (date > begin) & (date < end)
for race,race_day in reversed(races):
    mask = in_training(dates, race_day)
    logged = np.cumsum(dists[mask])
    diff = dates[mask] - end_of_day(race_day)
    tminus = [dt.days for dt in diff]
    p.plot(tminus, logged, 'o-', label=race)
p.xticks(rotation='vertical')
p.ylabel('cumulative mileage')
p.xlabel('race countdown')
p.legend(loc='best');
I've actually trained less for the 2015 race than I did for the 2014 one. I lost two consecutive weeks to injury/illness this time around, as opposed to only one last year.
Outside of those periods, my training has been more or less identical. I stick with "Novice 2" by Hal Higdon.
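The countdown arithmetic is worth a closer look, because timedelta.days rounds toward negative infinity. Here is a small sketch on made-up run times (the helper names days_to_race and in_window are hypothetical, chosen for this example) that mirrors the 18-week window and the day count used for the x-axis:

```python
import datetime

def race_end(race_day):
    """Last second of race day, the reference point for the countdown."""
    return datetime.datetime.combine(race_day, datetime.time(23, 59, 59))

def in_window(run_start, race_day, weeks=18):
    """True if the run started within `weeks` weeks before the end of race day."""
    end = race_end(race_day)
    return end - datetime.timedelta(weeks=weeks) < run_start < end

def days_to_race(run_start, race_day):
    """Whole days relative to the end of race day (negative before the race)."""
    return (run_start - race_end(race_day)).days

race_day = datetime.date(2015, 5, 10)
# Made-up start times, for illustration only.
sample_runs = [
    datetime.datetime(2015, 1, 1, 7, 0),   # more than 18 weeks out
    datetime.datetime(2015, 1, 10, 7, 0),  # inside the window
    datetime.datetime(2015, 5, 9, 18, 0),  # evening before the race
]
countdown = [days_to_race(r, race_day)
             for r in sample_runs if in_window(r, race_day)]
print(countdown)
```

Because the reference point is the last second of race day, a run on the eve of the race lands at -2 rather than -1, and anything on race day itself lands at -1.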
Strava also provides 'stream data' - raw data logs from the workout. I'll fetch a couple from a track workout I did with the TRE Flyers a couple weeks ago.
FLYERS = 291449438
run = c.get_activity(FLYERS)
run
WARNING:stravalib.model.Activity:No such attribute similar_activities on entity <Activity id=291449438 name=u'Your Fly is Open' resource_state=None>
<Activity id=291449438 name=u'Your Fly is Open' resource_state=3>
types = ['time', 'moving', 'distance', 'velocity_smooth']
# download streams from strava
streams = c.get_activity_streams(FLYERS, types)
time = np.array(streams['time'].data)
distance = np.array(streams['distance'].data)
moving = np.array(streams['moving'].data)
velocity = np.array(streams['velocity_smooth'].data)
It's easy to plot my speed over the course of the workout:
p.plot(time, velocity)
p.xlabel('time (seconds)')
p.ylabel('speed (m/s)')
Looks like I did 4 sets of 4x400m, with a warmup mile and a couple cooldown laps.
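Runners tend to think in pace (time per mile) rather than speed. As a sketch, assuming the velocity values are in meters per second as the axis label says, a small helper (pace_min_per_mile is a name I made up) converts between the two, and the same arithmetic gives the speed a four-hour marathon requires:

```python
METERS_PER_MILE = 1609.344  # length of the international mile in meters

def pace_min_per_mile(speed_mps):
    """Convert a speed in m/s to a pace in minutes per mile."""
    if speed_mps <= 0:
        return float('inf')  # standing still: no finite pace
    return METERS_PER_MILE / speed_mps / 60.0

# Speed needed to cover 26.2 miles in exactly four hours.
goal_speed = 26.2 * METERS_PER_MILE / (4 * 3600.0)
goal_pace = pace_min_per_mile(goal_speed)
print('%.2f m/s is %.2f min/mile' % (goal_speed, goal_pace))
```

The goal works out to a little over nine minutes per mile, which is a handy reference when reading the speed plot.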
I can also plot cumulative distance within the workout:
p.plot(time, distance, '-o')
p.xlabel('time (seconds)')
p.ylabel('distance (m)')
I paused my GPS for a while after my warmup lap, but I left it running during the other rests. Strava can figure out when I wasn't moving, regardless of whether I paused the recording. This is the "moving" stream:
p.plot(time[moving], distance[moving], '-o')
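Filtering with the moving flag drops the stationary samples but keeps the original timestamps, so rests still show up as gaps on the time axis. A related quantity is total moving time. The sketch below uses made-up samples and one plausible convention (count an interval if the sample that ends it is flagged as moving); I have not checked how Strava computes its own moving time, so treat this as an assumption:

```python
# Made-up stream samples: elapsed seconds, cumulative meters, moving flag.
time_s = [0, 10, 20, 30, 40, 50]
dist_m = [0.0, 30.0, 60.0, 60.0, 60.0, 90.0]
moving = [True, True, True, False, False, True]

def moving_time(time_s, moving):
    """Sum the sampling intervals that end on a sample flagged as moving."""
    total = 0
    for i in range(1, len(time_s)):
        if moving[i]:
            total += time_s[i] - time_s[i - 1]
    return total

elapsed = time_s[-1] - time_s[0]
active = moving_time(time_s, moving)
# Keep only the samples recorded while moving, as in the plot above.
kept = [(t, d) for t, d, m in zip(time_s, dist_m, moving) if m]
print(elapsed, active, len(kept))
```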
Now I'll do a cumulative distance plot (trajectory) of each run in the 18-week training period. Hopefully they will scatter promisingly around my target pace.
ids = np.array([a.id for a in runs])
prep = {}
for race,race_day in reversed(races):
    mask = in_training(dates, race_day)
    prep[race] = ids[mask]
prep_2014 = prep['EM 2014']
prep_2015 = prep['EM 2015']
The following two cells make many Strava requests, so I will avoid re-running them.
tracks_2014 = [c.get_activity_streams(run_id, types) for run_id in prep_2014]
tracks_2015 = [c.get_activity_streams(run_id, types) for run_id in prep_2015]
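One way to make cells like these safe to re-run is to cache each response on disk. The sketch below is an illustration, not something this notebook does: cached_fetch and fake_fetch are hypothetical names, and it assumes the fetched data has already been reduced to JSON-friendly dicts of lists (stravalib's stream objects would need converting first).

```python
import json
import os
import tempfile

def cached_fetch(run_id, fetch, cache_dir):
    """Return fetch(run_id), reusing a JSON file in cache_dir when one exists."""
    path = os.path.join(cache_dir, '%s.json' % run_id)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = fetch(run_id)
    with open(path, 'w') as f:
        json.dump(data, f)
    return data

# Stand-in for a real API call; records every id it is asked for.
calls = []
def fake_fetch(run_id):
    calls.append(run_id)
    return {'time': [0, 1, 2], 'distance': [0.0, 3.1, 6.0]}

cache_dir = tempfile.mkdtemp()
first = cached_fetch(123, fake_fetch, cache_dir)
second = cached_fetch(123, fake_fetch, cache_dir)  # served from disk
print(len(calls))
```

With a cache like this, only runs that have never been downloaded count against the rate limit.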
MILE = 1609.344 # meters per mile
HOUR = 3600. # seconds per hour
def make_trajectory(streams):
    # return (seconds, miles) for the samples recorded while moving
    time = np.array(streams['time'].data)
    distance = np.array(streams['distance'].data)
    moving = np.array(streams['moving'].data)
    return time[moving], distance[moving]/MILE
def make_trajectory_plot(tracks, highlight_last=True):
    for streams in tracks:
        t,d = make_trajectory(streams)
        p.plot(t/HOUR, d, 'k-')
    if highlight_last:
        # the most recently plotted run is drawn in thick red
        p.gca().lines[-1].set_color('r')
        p.gca().lines[-1].set_lw(2)
    p.plot([0,4], [0,26.2], 'c--') # goal
    p.ylim(0,26.2)
    p.xlim(0,4.5)
    p.xticks(np.arange(0,5))
f, (ax1, ax2) = p.subplots(1, 2, sharex=True, sharey=True)
f.subplots_adjust(wspace=0)
p.sca(ax1)
make_trajectory_plot(tracks_2014)
p.sca(ax2)
make_trajectory_plot(tracks_2015)
for ax,label in zip((ax1,ax2), (2014,2015)):
    # label each panel in its own axes coordinates
    ax.text(0.1, 0.8, str(label), transform=ax.transAxes)
    ax.set_xlabel('time (hours)')
ax1.set_ylabel('distance (miles)')
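To put a number on how a single trajectory compares with the dashed goal line, one crude option is to extrapolate its average pace to the full distance. This is a deliberately naive sketch on a made-up long run (projected_finish_hours is a hypothetical helper); it assumes the pace holds for all 26.2 miles, which real marathons rarely honor:

```python
MARATHON_MILES = 26.2
GOAL_HOURS = 4.0

def projected_finish_hours(hours, miles):
    """Naively extrapolate a run's average pace to the full marathon distance."""
    return MARATHON_MILES * hours / miles

# Made-up long run: 20 miles in 3 hours 10 minutes of moving time.
projection = projected_finish_hours(3 + 10 / 60.0, 20.0)
on_pace = projection <= GOAL_HOURS
print('%.2f hours, on pace: %s' % (projection, on_pace))
```

A trajectory that ends above the dashed line corresponds to a projection under four hours, and one that ends below it to a projection over.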