Let's take a look at the Velib' open data. You'll need to register for free to get an API key if you want to run this notebook.
We first import some modules to load and analyze the data.
import urllib2
import json
import os
import datetime
import numpy as np
import pandas as pd
from matplotlib.pyplot import figure, get_cmap, scatter, title, xticks, yticks
We read the API key from ~/.velib, a text file containing just the API key.
with open(os.path.expanduser('~/.velib'), 'r') as f:
    key = f.read().strip()  # drop the trailing newline so it does not end up in the URL
This function generates the full URL from a short REST path.
def geturl(path):
    delim = '&' if '?' in path else '?'
    return "https://api.jcdecaux.com/vls/v1/{0:s}{1:s}apiKey={2:s}".format(path, delim, key)
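To see how the delimiter is chosen, here is a minimal, self-contained sketch of the same logic. The name build_url and the placeholder key 'MY_KEY' are illustrative assumptions, not part of the notebook; the key is passed as a parameter so the snippet runs without ~/.velib.

```python
BASE_URL = "https://api.jcdecaux.com/vls/v1/"

def build_url(path, key):
    # Use '&' when the path already carries a query string, '?' otherwise.
    delim = '&' if '?' in path else '?'
    return "{0}{1}{2}apiKey={3}".format(BASE_URL, path, delim, key)

# 'MY_KEY' is a placeholder, not a real API key.
print(build_url('contracts', 'MY_KEY'))
print(build_url('stations?contract=Paris', 'MY_KEY'))
```

The first call appends the key with '?', the second with '&', because its path already contains a query string.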
This function returns the requested data as Python objects (lists and dictionaries) decoded from the JSON response.
def get(path):
    url = geturl(path)
    return json.loads(urllib2.urlopen(url).read())
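This notebook targets Python 2, where urllib2 is available. Under Python 3 the same request could be sketched with urllib.request, as below. The names get_json and fake_opener, the example.invalid URL, and the canned payload are assumptions made for illustration; the fake opener lets the snippet run offline and without an API key.

```python
import io
import json
from urllib.request import urlopen

def get_json(url, opener=urlopen):
    # Fetch the URL and decode its JSON body.
    # 'opener' defaults to urlopen and can be swapped out for offline testing.
    response = opener(url)
    try:
        return json.loads(response.read().decode('utf-8'))
    finally:
        response.close()

def fake_opener(url):
    # Stand-in for the network: returns a made-up payload as a file-like object.
    return io.BytesIO(b'[{"name": "Paris", "commercial_name": "Velib"}]')

contracts = get_json('https://example.invalid/contracts', opener=fake_opener)
print(contracts[0]['name'])
```

Calling get_json with a real URL and the default opener would perform an actual HTTP request.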
Here, we retrieve the list of all contracts and show only the Paris contract.
filter(lambda d: d['name'] == 'Paris', get('contracts'))
[{u'cities': [u'Arcueil', u'Aubervilliers', u'Bagnolet', u'Boulogne Billancourt', u'Charenton', u'Clichy', u'Fontenay-sous-Bois', u'Gentilly', u'Issy les Moulineaux', u'Ivry', u'Joinville', u'Le Kremlin Bic\xeatre', u'Le Pr\xe9 St Gervais', u'Les Lilas', u'Levallois-Perret', u'Malakoff', u'Montreuil', u'Montrouge', u'Neuilly', u'Nogent', u'Pantin', u'Paris', u'Puteaux', u'Saint Cloud', u'Saint Denis', u'Saint Mand\xe9', u'Saint Maurice', u'Saint Ouen', u'Suresnes', u'Vanves', u'Vincennes'], u'commercial_name': u'Velib', u'name': u'Paris'}]
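In Python 2, filter returns a list, which is why the result prints directly; in Python 3 it returns a lazy iterator. A list comprehension behaves the same in both versions. The sketch below uses made-up sample records shaped like the response above, with only two of its fields.

```python
# Made-up sample shaped like the 'contracts' response (the second entry is hypothetical).
contracts = [
    {'name': 'Paris', 'commercial_name': 'Velib'},
    {'name': 'Elsewhere', 'commercial_name': 'Example'},
]

# Keep only the contract named 'Paris'.
paris = [c for c in contracts if c['name'] == 'Paris']
print(paris)
```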
Now, we retrieve the list of all stations in the Paris contract.
stations = get('stations?contract=Paris')
We also build a pandas DataFrame from this list of station dictionaries.
stations_df = pd.DataFrame(stations)
Let's analyse the bike stands.
stands = stations_df.bike_stands
print("""There are {0:d} stations with a total of {1:d} bike stands near Paris.
Each station has between {2:d} and {3:d} stands, with a mean of {4:.1f} stands.
""".format(
    stands.count(),
    stands.sum(),
    stands.min(),
    stands.max(),
    stands.mean(),
))
There are 1227 stations with a total of 39920 bike stands near Paris.
Each station has between 7 and 72 stands, with a mean of 32.5 stands.
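As a cross-check that does not depend on pandas, the same summary can be computed with built-in functions. This is only a sketch on made-up records: sample_stations and its values are assumptions, shaped like the station dictionaries but reduced to the one field used here.

```python
# Made-up records shaped like the 'stations' response (only the field used here).
sample_stations = [
    {'bike_stands': 20},
    {'bike_stands': 35},
    {'bike_stands': 50},
]

stands = [s['bike_stands'] for s in sample_stations]
summary = {
    'count': len(stands),
    'total': sum(stands),
    'min': min(stands),
    'max': max(stands),
    # float() keeps the division exact under Python 2 as well.
    'mean': sum(stands) / float(len(stands)),
}
print("{count:d} stations, {total:d} stands, {min:d} to {max:d} per station, "
      "mean {mean:.1f}".format(**summary))
```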
Let's plot a histogram with the number of bike stands per station.
stands.hist();
title("Number of bike stands per station.");
When was the last bike availability update across all stations?
timestamp = stations_df.last_update.max()
date = datetime.datetime.fromtimestamp(timestamp / 1000.).strftime('%Y-%m-%d %H:%M:%S')
print(date)
2013-05-05 15:05:24
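Note that datetime.fromtimestamp without a time zone argument converts to the local time zone of the machine running the notebook, so the printed date depends on where it runs. The Python 3 sketch below pins the conversion to UTC instead. The helper name ms_to_utc_string and the sample value are illustrative assumptions; the value is not taken from the notebook's data.

```python
import datetime

def ms_to_utc_string(timestamp_ms):
    # last_update is divided by 1000 in the notebook, i.e. it is in milliseconds;
    # datetime expects seconds since the Unix epoch.
    moment = datetime.datetime.fromtimestamp(
        timestamp_ms / 1000.0, tz=datetime.timezone.utc)
    return moment.strftime('%Y-%m-%d %H:%M:%S')

# Illustrative timestamp in milliseconds.
print(ms_to_utc_string(1367759124000))
```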
available_bike_stands = stations_df.available_bike_stands
available_bikes = stations_df.available_bikes
print("""There are {0:d} stations with no bikes out of {1:d} stations on {2:s}.""".format(
    np.sum(available_bikes == 0),
    available_bikes.count(),
    date,
))
There are 207 stations with no bikes out of 1227 stations on 2013-05-05 15:05:24.
We retrieve the (longitude, latitude) coordinates of all stations and drop the stations whose coordinates are missing, which show up here as a value of 0.
positions = np.array([(d['position']['lng'], d['position']['lat']) for d in stations])
indices = positions.min(axis=1) != 0.0
positions = positions[indices,:]
x, y = positions.T
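The mask positions.min(axis=1) != 0.0 keeps a station unless the smaller of its two coordinates is exactly 0. The same test can be sketched in plain Python as follows; sample_stations and its coordinates are made up for illustration and keep only the 'position' field.

```python
# Made-up records shaped like the 'stations' response (only 'position' is used).
sample_stations = [
    {'position': {'lng': 2.35, 'lat': 48.85}},
    {'position': {'lng': 0.0, 'lat': 0.0}},   # placeholder coordinates
    {'position': {'lng': 2.29, 'lat': 48.88}},
]

positions = [(s['position']['lng'], s['position']['lat']) for s in sample_stations]

# Row-wise equivalent of positions.min(axis=1) != 0.0.
keep = [min(lng, lat) != 0.0 for lng, lat in positions]
kept = [p for p, k in zip(positions, keep) if k]

# Split the remaining (lng, lat) pairs into x and y sequences.
x, y = zip(*kept)
print(keep)
print(x, y)
```

The keep list plays the role of indices in the notebook, so it can be reused to select the matching rows of other per-station columns.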
Let's get the number of bike stands and available bikes for these stations.
sizes = stations_df.bike_stands[indices]
available_stands = stations_df.available_bike_stands[indices]
We now display all stations, with the marker size proportional to the number of stands and the color indicating the number of available bike stands (red = few free stands, green = many free stands).
figure(figsize=(12,8));
scatter(x, y, c=available_stands, s=sizes, edgecolors='none', cmap=get_cmap('RdYlGn'));
xticks([]);
yticks([]);
title("Available bike stands in Paris Velib' stations, {0:s}".format(date));
It was a sunny day, so there were probably a lot of people around the Seine...