Importing pandas, a widely used Python library for data manipulation.
import pandas as pd
Reading in the data set.
df = pd.read_csv('raw/data.csv', names=['station_id', 'bike_id', 'from', 'to'])
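Passing names= suggests that raw/data.csv has no header row of its own, so the first line is read as data. The sketch below assumes exactly that and uses a small in-memory stand-in (three rows copied from the df.head() output further down) instead of the real file. It also passes header=None explicitly. When names= is given, pandas already treats the file as headerless, so the flag is redundant, but it makes the assumption visible.

```python
import io

import pandas as pd

# Hypothetical stand-in for raw/data.csv: three rows copied from df.head(),
# assuming the real file has NO header line.
csv_text = """212,2294,2020-12-13T11:26:54Z,2020-12-17T16:13:54Z
418,3441,2020-08-25T11:37:11Z,2020-08-31T11:18:11Z
301,6467,2021-04-10T17:05:16Z,2021-04-11T15:00:16Z
"""

# header=None: treat the first line as data; names=: supply the column labels.
sample = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=['station_id', 'bike_id', 'from', 'to'],
)
print(sample)
```

If the real file did have a header line, the same call would read that header in as a data row, and the later pd.to_datetime step would fail on the literal strings 'from' and 'to'.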
Inspecting the data set
type(df)
pandas.core.frame.DataFrame
len(df)
1000
df.shape
(1000, 4)
df.head(10)
| | station_id | bike_id | from | to |
|---|---|---|---|---|
| 0 | 212 | 2294 | 2020-12-13T11:26:54Z | 2020-12-17T16:13:54Z |
| 1 | 418 | 3441 | 2020-08-25T11:37:11Z | 2020-08-31T11:18:11Z |
| 2 | 301 | 6467 | 2021-04-10T17:05:16Z | 2021-04-11T15:00:16Z |
| 3 | 560 | 9386 | 2021-04-28T12:10:24Z | 2021-05-02T07:31:24Z |
| 4 | 87 | 8755 | 2021-01-10T10:40:53Z | 2021-01-11T20:58:53Z |
| 5 | 651 | 1525 | 2020-08-08T06:44:20Z | 2020-08-11T01:15:20Z |
| 6 | 40 | 8238 | 2021-01-16T10:20:47Z | 2021-01-16T14:06:47Z |
| 7 | 97 | 2437 | 2021-06-21T15:52:08Z | 2021-06-24T00:56:08Z |
| 8 | 362 | 8428 | 2021-03-12T16:38:10Z | 2021-03-17T05:07:10Z |
| 9 | 234 | 7266 | 2021-01-02T19:17:57Z | 2021-01-08T01:42:57Z |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype
 0   station_id  1000 non-null   int64
 1   bike_id     1000 non-null   int64
 2   from        1000 non-null   object
 3   to          1000 non-null   object
dtypes: int64(2), object(2)
memory usage: 31.4+ KB
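The two facts df.info() reports here (no missing values; from and to still stored as text) can also be checked programmatically, which is handy as an assertion in a script. This is a minimal sketch on a hypothetical three-row frame built from the df.head() rows, not on the full data set.

```python
import pandas as pd

# Hypothetical three-row frame mirroring the first rows of df.head().
df = pd.DataFrame({
    'station_id': [212, 418, 301],
    'bike_id': [2294, 3441, 6467],
    'from': ['2020-12-13T11:26:54Z', '2020-08-25T11:37:11Z', '2021-04-10T17:05:16Z'],
    'to': ['2020-12-17T16:13:54Z', '2020-08-31T11:18:11Z', '2021-04-11T15:00:16Z'],
})

# Missing values per column; info() shows the complementary non-null counts.
missing = df.isna().sum()
print(missing)

# Column dtypes: the ids are integers, while 'from' and 'to' are still strings.
print(df.dtypes)
```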
Converting the 'from' and 'to' columns, currently stored as strings (dtype object), to pandas datetime values.
df['from'] = pd.to_datetime(df['from'])
df['to'] = pd.to_datetime(df['to'])
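The timestamps end in Z, the ISO 8601 marker for UTC. The sketch below passes utc=True so the result is explicitly timezone-aware in UTC rather than relying on pandas to infer the offset from the string. Using utc=True is an illustrative choice, not something the original notebook does; the two sample strings are taken from the first row of df.head().

```python
import pandas as pd

raw = pd.Series(['2020-12-13T11:26:54Z', '2020-12-17T16:13:54Z'])

# utc=True returns timezone-aware datetimes localized (or converted) to UTC.
parsed = pd.to_datetime(raw, utc=True)
print(parsed.dtype)
print(parsed.dt.tz)
```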
Subtracting the two datetime columns returns a Series of timedeltas holding, for each row, the time elapsed between 'from' and 'to'.
difference = (df['to'] - df['from'])
difference
0     4 days 04:47:00
1     5 days 23:41:00
2     0 days 21:55:00
3     3 days 19:21:00
4     1 days 10:18:00
...
995   5 days 02:16:00
996   0 days 20:08:00
997   5 days 14:35:00
998   3 days 04:50:00
999   3 days 00:12:00
Length: 1000, dtype: timedelta64[ns]
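Timedeltas print in a days-plus-clock format, which is readable but awkward for arithmetic or plotting. A minimal sketch, using only the first three rows of df.head() as hypothetical input, converts each duration to a float number of hours with .dt.total_seconds() and adds a sanity check that no journey ends before it starts.

```python
import pandas as pd

# Hypothetical sample: in the notebook these are df['from'] and df['to'].
start = pd.to_datetime(pd.Series(
    ['2020-12-13T11:26:54Z', '2020-08-25T11:37:11Z', '2021-04-10T17:05:16Z']))
end = pd.to_datetime(pd.Series(
    ['2020-12-17T16:13:54Z', '2020-08-31T11:18:11Z', '2021-04-11T15:00:16Z']))

difference = end - start  # timedelta Series, one value per journey

# total_seconds() gives a float per row; divide by 3600 for hours.
hours = difference.dt.total_seconds() / 3600
print(hours)

# Sanity check: True here would flag journeys whose 'to' precedes 'from'.
print((difference < pd.Timedelta(0)).any())
```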
avg_journey_duration = difference.mean()
print(f'{avg_journey_duration} is the average (mean) journey duration across all bikes and all stations for this reporting period.')
3 days 13:43:53.760000 is the average (mean) journey duration across all bikes and all stations for this reporting period.
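The mean of a timedelta Series is simply the total duration divided by the number of journeys. The sketch below checks that identity on a hypothetical three-value sample (the first three durations shown above, not the full 1,000 rows) and also expresses the result in hours, which can be easier to compare than the days-plus-clock format.

```python
import pandas as pd

# Hypothetical sample: the first three journey durations from `difference`.
difference = pd.Series(pd.to_timedelta(
    ['4 days 04:47:00', '5 days 23:41:00', '0 days 21:55:00']))

avg = difference.mean()
# Same quantity computed by hand: total duration / number of journeys.
manual = difference.sum() / len(difference)

print(avg)
print(avg.total_seconds() / 3600)  # the mean expressed in hours
```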