In [1]:

%matplotlib inline

import pandas as pd

In [2]:

trajets = pd.read_hdf('../datachallenge/divvy-trips.h5', 'fixed')

Display summary statistics for trip duration (in seconds)

In [3]:

trajets.tripduration.describe()

Out[3]:

count    1548935.000000
mean         996.651052
std         1746.803926
min           60.000000
25%          422.000000
50%          724.000000
75%         1200.000000
max        86392.000000
dtype: float64

The same statistics, but in minutes

In [4]:

(trajets.tripduration / 60).describe()

Out[4]:

count    1548935.000000
mean          16.610851
std           29.113399
min            1.000000
25%            7.033333
50%           12.066667
75%           20.000000
max         1439.866667
dtype: float64

Subscribers vs. occasional users

Careful: sum() is not the same as count(). Here we want to count the number of rows (it does not matter which column we use).

In [5]:

trajets.groupby('usertype').count().trip_id.plot(kind='bar')

Out[5]:

<matplotlib.axes._subplots.AxesSubplot at 0x7faaba1c8350>

In [6]:

# An alternative solution
trajets.usertype.value_counts().plot(kind='bar')

Out[6]:

<matplotlib.axes._subplots.AxesSubplot at 0x7faaba152310>
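To make the difference between sum() and count() concrete, here is a minimal sketch on a toy frame. The values are invented for illustration and are not taken from the Divvy data; only the column names `trip_id` and `usertype` come from the notebook.

```python
import pandas as pd

# Toy data: made-up values, only the column names match the notebook.
toy = pd.DataFrame({
    'trip_id': [101, 102, 103, 104],
    'usertype': ['Subscriber', 'Subscriber', 'Customer', 'Subscriber'],
})

by_type = toy.groupby('usertype').trip_id

# count() returns the number of non-missing rows in each group.
counts = by_type.count()

# sum() adds up the trip_id values themselves, which is meaningless here.
sums = by_type.sum()

# size() counts rows without having to pick a column.
sizes = toy.groupby('usertype').size()

print(counts)
print(sums)
```

Choosing the column before aggregating, as above, also avoids computing the count for every column of the frame.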

Maximum and minimum age

In [7]:

(2014 - trajets.birthyear).describe()

Out[7]:

count    1071696.000000
mean          35.155730
std           10.476417
min           16.000000
25%           27.000000
50%           32.000000
75%           41.000000
max          116.000000
dtype: float64
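The count in Out[7] (1,071,696) is lower than the total number of trips (1,548,935) because describe() skips missing values, so some trips have no birth year recorded. The maximum of 116 years also looks more like a data-entry error than a real rider. The sketch below reproduces both effects on invented values; the 16 to 100 bounds used for filtering are an arbitrary assumption, not something the notebook prescribes.

```python
import numpy as np
import pandas as pd

# Toy birth years: one missing value and one implausible entry.
birthyear = pd.Series([1980.0, 1990.0, np.nan, 1898.0])

age = 2014 - birthyear

# describe() ignores NaN, so the count is 3, not 4.
stats = age.describe()

# Keep only ages inside an (assumed) plausible range; NaN is dropped too.
plausible = age[age.between(16, 100)]

print(stats)
print(plausible)
```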

Gender distribution.

The same remarks as before apply: we count rows, we do not sum them.

In [8]:

trajets.groupby('gender').count().trip_id.plot(kind='pie')

Out[8]:

<matplotlib.axes._subplots.AxesSubplot at 0x7faaba12f690>

Number of trips by age.

Once again, we want to count, not sum!

In [9]:

trajets.groupby(2014 - trajets.birthyear).count().trip_id.plot()

Out[9]:

<matplotlib.axes._subplots.AxesSubplot at 0x7faab97d0350>
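The cell above groups by a computed Series rather than by a column name. As a small illustration with invented values, groupby() accepts any Series aligned with the frame's index as the grouping key:

```python
import pandas as pd

# Toy data: made-up values, only the column names match the notebook.
toy = pd.DataFrame({
    'trip_id': [1, 2, 3, 4],
    'birthyear': [1980, 1980, 1990, 1985],
})

# The grouping key is computed on the fly; it is never stored as a column.
age = 2014 - toy.birthyear

# Number of trips for each age, sorted by age.
trips_per_age = toy.groupby(age).trip_id.count()

print(trips_per_age)
```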

In [10]:

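# Index the trips by start time (keeping the column) to get datetime attributes on the index
trajets.set_index('starttime', inplace=True, drop=False)
# Derive the calendar date, the day of the week (Monday = 0) and the hour of day
trajets['date'] = trajets.index.date
trajets['joursemaine'] = trajets.index.weekday
trajets['heure'] = trajets.index.hour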

Mean duration per calendar day.

We no longer want to count: we want a mean.

In [11]:

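# Select the column before aggregating: recent pandas versions raise on mean() over non-numeric columns
trajets.groupby('date').tripduration.mean().plot()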

Out[11]:

<matplotlib.axes._subplots.AxesSubplot at 0x7faab3f6b2d0>
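The two steps above (deriving date parts from a datetime index, then averaging per group) can be sketched on a toy frame. The timestamps and durations are invented, and the English column names `weekday` and `hour` are stand-ins for the notebook's `joursemaine` and `heure`.

```python
import pandas as pd

# Toy data: made-up timestamps and durations (in seconds).
toy = pd.DataFrame({
    'starttime': pd.to_datetime([
        '2014-06-30 08:15',
        '2014-06-30 17:40',
        '2014-07-01 09:05',
    ]),
    'tripduration': [600, 1200, 300],
})

# A DatetimeIndex exposes .date, .weekday and .hour for every row.
toy = toy.set_index('starttime', drop=False)
toy['date'] = toy.index.date
toy['weekday'] = toy.index.weekday  # Monday = 0, Sunday = 6
toy['hour'] = toy.index.hour

# Mean duration per calendar day, selecting the column before aggregating.
mean_by_date = toy.groupby('date').tripduration.mean()

print(mean_by_date)
```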

In [12]:

# Mean duration by day of the week (Monday = 0)
trajets.groupby('joursemaine').tripduration.mean().plot(kind='bar')

Out[12]:

<matplotlib.axes._subplots.AxesSubplot at 0x7faab9626550>
