Finding Heavy Traffic Indicators on I-94
I'm going to analyze a dataset about the westbound traffic on the I-94 Interstate highway.
My goal is to find out which factors are associated with heavy traffic on I-94. These factors can be weather type, time of day, day of the week, etc.
Here is the section to import all the packages/libraries that will be used throughout this notebook.
# Data handling
import pandas as pd
import numpy as np
from datetime import datetime
# Visualization (Matplotlib, Plotly, Seaborn, etc. )
import seaborn as sns
import matplotlib.pyplot as plt
# EDA (pandas-profiling, etc. )
...
# Feature Processing (Scikit-learn processing, etc. )
...
# Machine Learning (Scikit-learn Estimators, Catboost, LightGBM, etc. )
...
# Hyperparameters Fine-tuning (Scikit-learn hp search, cross-validation, etc. )
...
# Other packages
Here is the section to load the datasets (train, eval, test) and the additional files
traffic=pd.read_csv("Metro_Interstate_Traffic_Volume.csv")
Here is the section to inspect the datasets in depth, present them, make hypotheses, and plan the cleaning, processing, and feature creation.
Have a look at the loaded datasets using the following methods: .head(), .info()
traffic.shape
(48204, 9)
traffic.head(20)
| | holiday | temp | rain_1h | snow_1h | clouds_all | weather_main | weather_description | date_time | traffic_volume |
---|---|---|---|---|---|---|---|---|---|
0 | None | 288.28 | 0.0 | 0.0 | 40 | Clouds | scattered clouds | 2012-10-02 09:00:00 | 5545 |
1 | None | 289.36 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 2012-10-02 10:00:00 | 4516 |
2 | None | 289.58 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 2012-10-02 11:00:00 | 4767 |
3 | None | 290.13 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 2012-10-02 12:00:00 | 5026 |
4 | None | 291.14 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 2012-10-02 13:00:00 | 4918 |
5 | None | 291.72 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-02 14:00:00 | 5181 |
6 | None | 293.17 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-02 15:00:00 | 5584 |
7 | None | 293.86 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-02 16:00:00 | 6015 |
8 | None | 294.14 | 0.0 | 0.0 | 20 | Clouds | few clouds | 2012-10-02 17:00:00 | 5791 |
9 | None | 293.10 | 0.0 | 0.0 | 20 | Clouds | few clouds | 2012-10-02 18:00:00 | 4770 |
10 | None | 290.97 | 0.0 | 0.0 | 20 | Clouds | few clouds | 2012-10-02 19:00:00 | 3539 |
11 | None | 289.38 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-02 20:00:00 | 2784 |
12 | None | 288.61 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-02 21:00:00 | 2361 |
13 | None | 287.16 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-02 22:00:00 | 1529 |
14 | None | 285.45 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-02 23:00:00 | 963 |
15 | None | 284.63 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-03 00:00:00 | 506 |
16 | None | 283.47 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-03 01:00:00 | 321 |
17 | None | 281.18 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-03 02:00:00 | 273 |
18 | None | 281.09 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-03 03:00:00 | 367 |
19 | None | 279.53 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2012-10-03 04:00:00 | 814 |
traffic['holiday'].unique()
array(['None', 'Columbus Day', 'Veterans Day', 'Thanksgiving Day', 'Christmas Day', 'New Years Day', 'Washingtons Birthday', 'Memorial Day', 'Independence Day', 'State Fair', 'Labor Day', 'Martin Luther King Jr Day'], dtype=object)
MORNING: the time from midnight to midday.
AFTERNOON: the time from midday (noon) to evening, from 12:00 hours to approximately 18:00 hours.
EVENING: the time from the end of the afternoon to midnight, from approximately 18:00 hours to 00:00 hours.
MIDNIGHT: the middle of the night (00:00 hours).
MIDDAY: the middle of the day, also called "NOON" (12:00 hours).
traffic.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48204 entries, 0 to 48203
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   holiday              48204 non-null  object
 1   temp                 48204 non-null  float64
 2   rain_1h              48204 non-null  float64
 3   snow_1h              48204 non-null  float64
 4   clouds_all           48204 non-null  int64
 5   weather_main         48204 non-null  object
 6   weather_description  48204 non-null  object
 7   date_time            48204 non-null  object
 8   traffic_volume       48204 non-null  int64
dtypes: float64(3), int64(2), object(4)
memory usage: 3.3+ MB
#splitting the date_time column into date and time
traffic[['date','time']] = traffic['date_time'].str.split(' ',expand=True)
traffic.head(1)
| | holiday | temp | rain_1h | snow_1h | clouds_all | weather_main | weather_description | date_time | traffic_volume | date | time |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | None | 288.28 | 0.0 | 0.0 | 40 | Clouds | scattered clouds | 2012-10-02 09:00:00 | 5545 | 2012-10-02 | 09:00:00 |
#creating weekday column from date_time column
traffic['weekday'] = pd.to_datetime(traffic['date_time']).dt.day_name()
#dropping date_time column now that date, time and weekday have been derived from it
traffic.drop(columns=['date_time'], inplace=True)
traffic['weekday'].unique()
array(['Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday', 'Monday'], dtype=object)
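As an alternative to splitting the string, the timestamp can be parsed once and every date part read from the parsed column. The following is a minimal sketch, not the notebook's own method; the three rows in `sample` are copied from the table above, and the `hour` column is an illustrative addition that does not exist in the dataset.

```python
import pandas as pd

# Three rows copied from the head() output above.
sample = pd.DataFrame({
    "date_time": ["2012-10-02 09:00:00", "2012-10-03 00:00:00", "2012-10-04 18:00:00"],
    "traffic_volume": [5545, 506, 4907],
})

# Parse the strings once, then derive each part through the .dt accessor.
parsed = pd.to_datetime(sample["date_time"])
sample["date"] = parsed.dt.date           # calendar date
sample["hour"] = parsed.dt.hour           # illustrative column: hour of day as an integer
sample["weekday"] = parsed.dt.day_name()  # e.g. 'Tuesday'

print(sample[["date", "hour", "weekday"]])
```

Working from a parsed datetime column avoids re-parsing the same strings for each derived feature.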
def stringToTime(timeString):
    #parse an 'HH:MM:SS' string into a datetime.time object
    return datetime.strptime(timeString, '%H:%M:%S').time()
midnight=stringToTime('00:00:00')
midday=stringToTime('12:00:00')
sixpm=stringToTime('18:00:00')
def timeOfDay(timeString):
    #label a time with the periods defined above; 18:00 itself counts as afternoon
    t = stringToTime(timeString)
    if t == midnight:
        return 'midnight'
    if t < midday:
        return 'morning'
    if t == midday:
        return 'midday'
    if t <= sixpm:
        return 'afternoon'
    return 'evening'
#creating time_of_day column from time column
traffic['time_of_day'] = traffic['time'].apply(timeOfDay)
#converting the time column to an integer of the form HHMMSS (e.g. 09:00:00 becomes 90000)
traffic['time'] = traffic['time'].apply(lambda x: int(stringToTime(x).strftime("%H%M%S")))
traffic['time_of_day'].unique()
array(['morning', 'midday', 'afternoon', 'evening', 'midnight'], dtype=object)
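The same labelling can also be done without a row-by-row `apply`. The sketch below is one possible vectorized version, assuming the same boundaries as above (18:00 counted as afternoon); `times`, `seconds`, `noon` and `six_pm` are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Sample times in the same HH:MM:SS format as the `time` column.
times = pd.Series(["00:00:00", "09:00:00", "12:00:00", "18:00:00", "19:00:00"])

# Seconds since midnight, so readings that are not on the hour would also work.
seconds = pd.to_timedelta(times).dt.total_seconds()

noon = 12 * 3600
six_pm = 18 * 3600

# np.select takes the first condition that is True for each element,
# so the order of the conditions matters.
conditions = [
    seconds == 0,       # exactly 00:00
    seconds < noon,     # after midnight, before 12:00
    seconds == noon,    # exactly 12:00
    seconds <= six_pm,  # after 12:00, up to and including 18:00
]
labels = ["midnight", "morning", "midday", "afternoon"]

time_of_day = pd.Series(
    np.select(conditions, labels, default="evening"),
    index=times.index,
)
print(time_of_day.tolist())
```

Keeping the boundaries in named variables makes it easy to change where the afternoon ends if a different definition is preferred.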
traffic.head(60)
| | holiday | temp | rain_1h | snow_1h | clouds_all | weather_main | weather_description | traffic_volume | date | time | weekday | time_of_day |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | None | 288.28 | 0.0 | 0.0 | 40 | Clouds | scattered clouds | 5545 | 2012-10-02 | 90000 | Tuesday | morning |
1 | None | 289.36 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 4516 | 2012-10-02 | 100000 | Tuesday | morning |
2 | None | 289.58 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 4767 | 2012-10-02 | 110000 | Tuesday | morning |
3 | None | 290.13 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 5026 | 2012-10-02 | 120000 | Tuesday | midday |
4 | None | 291.14 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 4918 | 2012-10-02 | 130000 | Tuesday | afternoon |
5 | None | 291.72 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5181 | 2012-10-02 | 140000 | Tuesday | afternoon |
6 | None | 293.17 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5584 | 2012-10-02 | 150000 | Tuesday | afternoon |
7 | None | 293.86 | 0.0 | 0.0 | 1 | Clear | sky is clear | 6015 | 2012-10-02 | 160000 | Tuesday | afternoon |
8 | None | 294.14 | 0.0 | 0.0 | 20 | Clouds | few clouds | 5791 | 2012-10-02 | 170000 | Tuesday | afternoon |
9 | None | 293.10 | 0.0 | 0.0 | 20 | Clouds | few clouds | 4770 | 2012-10-02 | 180000 | Tuesday | afternoon |
10 | None | 290.97 | 0.0 | 0.0 | 20 | Clouds | few clouds | 3539 | 2012-10-02 | 190000 | Tuesday | evening |
11 | None | 289.38 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2784 | 2012-10-02 | 200000 | Tuesday | evening |
12 | None | 288.61 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2361 | 2012-10-02 | 210000 | Tuesday | evening |
13 | None | 287.16 | 0.0 | 0.0 | 1 | Clear | sky is clear | 1529 | 2012-10-02 | 220000 | Tuesday | evening |
14 | None | 285.45 | 0.0 | 0.0 | 1 | Clear | sky is clear | 963 | 2012-10-02 | 230000 | Tuesday | evening |
15 | None | 284.63 | 0.0 | 0.0 | 1 | Clear | sky is clear | 506 | 2012-10-03 | 0 | Wednesday | midnight |
16 | None | 283.47 | 0.0 | 0.0 | 1 | Clear | sky is clear | 321 | 2012-10-03 | 10000 | Wednesday | morning |
17 | None | 281.18 | 0.0 | 0.0 | 1 | Clear | sky is clear | 273 | 2012-10-03 | 20000 | Wednesday | morning |
18 | None | 281.09 | 0.0 | 0.0 | 1 | Clear | sky is clear | 367 | 2012-10-03 | 30000 | Wednesday | morning |
19 | None | 279.53 | 0.0 | 0.0 | 1 | Clear | sky is clear | 814 | 2012-10-03 | 40000 | Wednesday | morning |
20 | None | 278.62 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2718 | 2012-10-03 | 50000 | Wednesday | morning |
21 | None | 278.23 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5673 | 2012-10-03 | 60000 | Wednesday | morning |
22 | None | 278.12 | 0.0 | 0.0 | 1 | Clear | sky is clear | 6511 | 2012-10-03 | 80000 | Wednesday | morning |
23 | None | 282.48 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5471 | 2012-10-03 | 90000 | Wednesday | morning |
24 | None | 291.97 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5097 | 2012-10-03 | 120000 | Wednesday | midday |
25 | None | 293.23 | 0.0 | 0.0 | 1 | Clear | sky is clear | 4887 | 2012-10-03 | 130000 | Wednesday | afternoon |
26 | None | 294.31 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5337 | 2012-10-03 | 140000 | Wednesday | afternoon |
27 | None | 295.17 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5692 | 2012-10-03 | 150000 | Wednesday | afternoon |
28 | None | 295.13 | 0.0 | 0.0 | 1 | Clear | sky is clear | 6137 | 2012-10-03 | 160000 | Wednesday | afternoon |
29 | None | 293.66 | 0.0 | 0.0 | 20 | Clouds | few clouds | 4623 | 2012-10-03 | 180000 | Wednesday | afternoon |
30 | None | 290.65 | 0.0 | 0.0 | 20 | Clouds | few clouds | 3591 | 2012-10-03 | 190000 | Wednesday | evening |
31 | None | 288.19 | 0.0 | 0.0 | 20 | Clouds | few clouds | 2898 | 2012-10-03 | 200000 | Wednesday | evening |
32 | None | 287.10 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2637 | 2012-10-03 | 210000 | Wednesday | evening |
33 | None | 286.25 | 0.0 | 0.0 | 1 | Clear | sky is clear | 1777 | 2012-10-03 | 220000 | Wednesday | evening |
34 | None | 285.26 | 0.0 | 0.0 | 1 | Clear | sky is clear | 1015 | 2012-10-03 | 230000 | Wednesday | evening |
35 | None | 284.55 | 0.0 | 0.0 | 1 | Clear | sky is clear | 598 | 2012-10-04 | 0 | Thursday | midnight |
36 | None | 283.47 | 0.0 | 0.0 | 1 | Clear | sky is clear | 369 | 2012-10-04 | 10000 | Thursday | morning |
37 | None | 283.17 | 0.0 | 0.0 | 1 | Clear | sky is clear | 312 | 2012-10-04 | 20000 | Thursday | morning |
38 | None | 282.04 | 0.0 | 0.0 | 1 | Clear | sky is clear | 367 | 2012-10-04 | 30000 | Thursday | morning |
39 | None | 281.69 | 0.0 | 0.0 | 1 | Clear | sky is clear | 835 | 2012-10-04 | 40000 | Thursday | morning |
40 | None | 281.32 | 0.0 | 0.0 | 1 | Clear | sky is clear | 2726 | 2012-10-04 | 50000 | Thursday | morning |
41 | None | 280.74 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5689 | 2012-10-04 | 60000 | Thursday | morning |
42 | None | 280.57 | 0.0 | 0.0 | 1 | Clear | sky is clear | 6990 | 2012-10-04 | 70000 | Thursday | morning |
43 | None | 281.86 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5985 | 2012-10-04 | 80000 | Thursday | morning |
44 | None | 284.98 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5309 | 2012-10-04 | 90000 | Thursday | morning |
45 | None | 289.18 | 0.0 | 0.0 | 1 | Clear | sky is clear | 4603 | 2012-10-04 | 100000 | Thursday | morning |
46 | None | 291.55 | 0.0 | 0.0 | 1 | Clear | sky is clear | 4884 | 2012-10-04 | 110000 | Thursday | morning |
47 | None | 294.97 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5104 | 2012-10-04 | 120000 | Thursday | midday |
48 | None | 296.38 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5178 | 2012-10-04 | 130000 | Thursday | afternoon |
49 | None | 297.32 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5501 | 2012-10-04 | 140000 | Thursday | afternoon |
50 | None | 298.17 | 0.0 | 0.0 | 1 | Clear | sky is clear | 5713 | 2012-10-04 | 150000 | Thursday | afternoon |
51 | None | 298.06 | 0.0 | 0.0 | 20 | Clouds | few clouds | 6292 | 2012-10-04 | 160000 | Thursday | afternoon |
52 | None | 297.67 | 0.0 | 0.0 | 20 | Clouds | few clouds | 6057 | 2012-10-04 | 170000 | Thursday | afternoon |
53 | None | 296.36 | 0.0 | 0.0 | 40 | Clouds | scattered clouds | 4907 | 2012-10-04 | 180000 | Thursday | afternoon |
54 | None | 293.85 | 0.0 | 0.0 | 40 | Clouds | scattered clouds | 3503 | 2012-10-04 | 190000 | Thursday | evening |
55 | None | 292.43 | 0.0 | 0.0 | 40 | Clouds | scattered clouds | 3037 | 2012-10-04 | 200000 | Thursday | evening |
56 | None | 291.77 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 2822 | 2012-10-04 | 210000 | Thursday | evening |
57 | None | 291.36 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 1992 | 2012-10-04 | 220000 | Thursday | evening |
58 | None | 291.12 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 1166 | 2012-10-04 | 230000 | Thursday | evening |
59 | None | 290.63 | 0.0 | 0.0 | 75 | Clouds | broken clouds | 627 | 2012-10-05 | 0 | Friday | midnight |
top_holidays=traffic.groupby("holiday")["traffic_volume"].sum().reset_index().sort_values(by="traffic_volume",ascending=False)
top_holidays=top_holidays[top_holidays.holiday != 'None']
top_holidays
| | holiday | traffic_volume |
---|---|---|
6 | New Years Day | 8136 |
3 | Labor Day | 7092 |
9 | Thanksgiving Day | 5601 |
5 | Memorial Day | 5538 |
2 | Independence Day | 5380 |
0 | Christmas Day | 4965 |
4 | Martin Luther King Jr Day | 3676 |
10 | Veterans Day | 3457 |
11 | Washingtons Birthday | 3176 |
8 | State Fair | 3174 |
1 | Columbus Day | 2597 |
fig = plt.figure(figsize=(12,5))
plt.title("Top Holidays by Traffic_Volume")
#head(-7) keeps all rows except the last 7, i.e. the top 4 holidays
sns.barplot(data=top_holidays.head(-7), y="holiday", x="traffic_volume", palette='Blues_d')
#plt.show() works with the inline notebook backend, unlike fig.show()
plt.show()
top_weekdays=traffic.groupby('weekday')["traffic_volume"].sum().reset_index().sort_values(by="traffic_volume",ascending=False)
top_weekdays
| | weekday | traffic_volume |
---|---|---|
0 | Friday | 24994869 |
6 | Wednesday | 24831553 |
4 | Thursday | 24799562 |
5 | Tuesday | 23882653 |
1 | Monday | 23403986 |
2 | Saturday | 18946722 |
3 | Sunday | 16276939 |
fig = plt.figure(figsize=(5,5))
plt.title("Top Weekdays by Traffic_Volume")
#head(-3) keeps all rows except the last 3, i.e. the top 4 weekdays
sns.barplot(data=top_weekdays.head(-3), x="weekday", y="traffic_volume")
#plt.show() works with the inline notebook backend, unlike fig.show()
plt.show()
top_hours=traffic.groupby('time_of_day')["traffic_volume"].sum().reset_index().sort_values(by="traffic_volume",ascending=False)
top_hours
| | time_of_day | traffic_volume |
---|---|---|
4 | morning | 62684570 |
0 | afternoon | 58819695 |
1 | evening | 24707307 |
2 | midday | 9224263 |
3 | midnight | 1700449 |
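These totals depend on how many hourly readings fall into each period: with the definitions used here, morning and evening span many hours, while midday and midnight are a single hour each. A minimal sketch of reporting the mean and the row count next to the sum is shown below. It uses nine rows copied from the table above, so the numbers describe this small sample only, not the full dataset.

```python
import pandas as pd

# Nine rows copied from the head(60) output above (2012-10-02 and 2012-10-03).
sample = pd.DataFrame({
    "time_of_day": ["morning", "morning", "morning", "midday",
                    "afternoon", "afternoon", "evening", "evening", "midnight"],
    "traffic_volume": [5545, 4516, 4767, 5026, 4918, 5181, 3539, 2784, 506],
})

# Sum, mean and number of readings per period, ranked by mean volume.
summary = (
    sample.groupby("time_of_day")["traffic_volume"]
    .agg(["sum", "mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

In this sample, morning has the largest sum only because it contributes the most rows; ranking by mean puts afternoon first. Whether the same holds for the full dataset would need to be checked by running the aggregation on `traffic`.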
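#numeric_only=True restricts the correlation to numeric columns (newer pandas versions raise an error on text columns otherwise)
corr_matrix=traffic.corr(numeric_only=True)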
corr_matrix['traffic_volume'].sort_values(ascending=False)
traffic_volume    1.000000
time              0.352401
temp              0.130299
clouds_all        0.067054
rain_1h           0.004714
snow_1h           0.000733
Name: traffic_volume, dtype: float64
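No variable has a strong linear correlation with traffic_volume.
A weak linear correlation does not mean that traffic volume is independent of these variables; it only means the relationship is not well described by a straight line. Among the numeric columns, time has the strongest linear correlation with traffic volume (about 0.35).
This is consistent with the expectation that time of day is one of the factors affecting traffic_volume, although a correlation coefficient alone does not prove it.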
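To illustrate why a low Pearson coefficient is not evidence of independence, here is a small sketch on synthetic data. The `volume` curve is hypothetical and chosen only for illustration: it is fully determined by the hour, yet its linear correlation with the hour is essentially zero because it rises and then falls symmetrically.

```python
import numpy as np
import pandas as pd

hours = pd.Series(np.arange(24), name="hour")

# Hypothetical volume curve that peaks in the middle of the day.
# It depends on the hour and on nothing else.
volume = pd.Series(1000.0 - (hours - 11.5) ** 2, name="volume")

# Pearson correlation only measures linear association.
pearson = hours.corr(volume)
print(f"Pearson correlation: {pearson:.6f}")

# The dependence shows up when the values are compared by part of the day.
print(volume[hours < 6].mean(), volume[(hours >= 9) & (hours < 15)].mean())
```

For traffic data, which typically rises and falls over the day, comparing group means by hour or by period is therefore a useful complement to the correlation matrix.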
top_traffic = traffic.sort_values('traffic_volume', ascending=False)
top_traffic.head(1)
| | holiday | temp | rain_1h | snow_1h | clouds_all | weather_main | weather_description | traffic_volume | date | time | weekday | time_of_day |
---|---|---|---|---|---|---|---|---|---|---|---|---|
31615 | None | 270.75 | 0.0 | 0.0 | 90 | Clouds | overcast clouds | 7280 | 2017-03-09 | 160000 | Thursday | afternoon |