ECON 323 Final Project

Factors affecting bike share in San Francisco

Background

Bike sharing is an increasingly popular service, especially as cities are moving to be more sustainable. Eren and Uz (2020) identified several factors affecting bike sharing, including weather (e.g. seasons, precipitation, and temperature), public transportation, safety, and other temporal factors. Bay Wheels is a bike sharing service in San Francisco provided by Lyft (similar to the Shaw Mobi bikes in Vancouver). Some differences are that Bay Wheels provides both electric bikes and regular bikes, and that the service is accessible through the Lyft application and Clipper cards (similar to Vancouver’s Compass card). Bay Wheels’ dataset is publicly available on Lyft’s website at https://www.lyft.com/bikes/bay-wheels/system-data. This project aims to look at factors that influence bike sharing in San Francisco.

Literature Review

Based on Eren and Uz's (2020) findings, bike sharing demand is highest at temperatures of around 20 – 30 °C with no precipitation, while within the 10 – 20 °C range temperature is also positively correlated with bike sharing. Eren and Uz also found that bike sharing can substitute for public transportation in areas with poor public transit options or at times when service is reduced (e.g. late night/early morning). However, bike sharing services are also used as a complement to public transportation to reduce travel time. Eren and Uz found that bike share usage tends to increase with income level. In contrast, Guo et al. (2017) found that in Ningbo, China, bike sharing use is highest among lower-income residents, followed by middle-income and then high-income residents.

Hypothesis/Research

Using Eren and Uz (2020)’s findings as a starting point, I will look at how the following factors affect bike sharing in San Francisco.

  • Weather – San Francisco's mild coastal weather ranges around 8 – 21 °C all year round, with an average of 8 days of precipitation in the winter months and 0 days in the summer months. These conditions are generally favorable for bike sharing, so I will examine whether seasonality still influences bike sharing in San Francisco by cross-referencing weather data with the start/end times of bike share trips.

  • Public transportation – I will look at bike share usage near BART (rail/subway), Caltrain, and MUNI (light rail and cable car) stops by time of day. Since many people who work in the city commute, I expect higher usage near major transit stops before and after work hours.

  • Safety – I will examine whether crime rate (from San Francisco Police Department incident reports) affects bike share usage. Safety is often cited as a concern for bike share users, so I expect a negative correlation between bike share usage and crime rate.

  • Temporal factors – I will break down bike share usage by weekday/weekend and by time of day.
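The safety hypothesis above amounts to checking for a negative rank correlation between neighborhood crime and bike share usage. A minimal sketch, assuming per-neighborhood incident and ride totals have already been computed; the function name `crime_usage_correlation` and the toy numbers are hypothetical and are not results from this project:

```python
import pandas as pd


def crime_usage_correlation(incidents: pd.Series, rides: pd.Series) -> float:
    """Spearman rank correlation between incident counts and ride counts.

    Both Series are indexed by neighborhood name (hypothetical inputs).
    Spearman correlation is the Pearson correlation of the ranks, so each
    Series is ranked (ties get their average rank) and the ranks correlated.
    """
    # align on neighborhood and keep only neighborhoods present in both
    both = pd.concat({'incidents': incidents, 'rides': rides}, axis=1).dropna()
    return both['incidents'].rank().corr(both['rides'].rank())


# toy example: more incidents paired with fewer rides gives a negative correlation
incidents = pd.Series({'A': 10, 'B': 50, 'C': 30})
rides = pd.Series({'A': 900, 'B': 100, 'C': 400})
print(crime_usage_correlation(incidents, rides))
```

Busy downtown neighborhoods tend to have high counts of both incidents and rides, so a per-area or per-capita rate may be a fairer comparison than raw counts.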

Dataset

The main dataset used in this analysis is Lyft's Bay Wheels bike share system data from June 2019 to March 2021 (the most recent data available at the time of writing), available at https://www.lyft.com/bikes/bay-wheels/system-data. I have chosen to exclude data from before June 2019, when the system was rebranded from Ford GoBike to Bay Wheels under Lyft, as the rebrand brought changes to the bike share model (e.g. docked and dockless bikes, more electric bikes, etc.) and to the rental access methods (i.e. bikes can now be rented through the Lyft app). Bay Wheels operates in San Francisco, Oakland, Berkeley, Emeryville, and San Jose; this study focuses on bike sharing in San Francisco only.

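Other datasets used:

  • Daily San Francisco weather data (average temperature and precipitation) for 2019 – 2021 (sf_weather_data_2019.csv, sf_weather_data_2020.csv, sf_weather_data_2021.csv)

  • San Francisco Police Department incident reports, 2018 to present (SF_Police_Incident_Reports_2018_to_Present.csv)

  • San Francisco analysis neighborhood boundaries (SF_Analysis_Neighborhoods.csv)

  • Passenger rail stations, 2019 (Passenger_Rail_Stations_2019.csv)

  • U.S. Census Bureau county boundary shapefile, used to keep only trips that start within San Francisco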

Outline

  1. Examining temporal factors: visualizations of usage by time of day and by weekday/weekend, as well as user types (casual versus subscribed member) on weekdays versus weekends

  2. Examining the role of weather: visualization of daily temperature, precipitation, and rate of bike shares

  3. Examining safety and public transportation as factors affecting bike sharing: map visualization of public transportation (BART, MUNI, Caltrain) stations, crime rate for each neighborhood, and bike share usage based on start location

  4. Training linear regression models to predict bike share usage, comparing a plain linear regression with one using polynomial feature transformations

  5. Conclusion: discussion of limitations, areas for further research
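Step 4 compares a straight-line fit with a polynomial one. As a minimal sketch of that comparison (using NumPy's least-squares `np.polyfit`, not necessarily the model used later in this project), the hypothetical helper `compare_fits` fits both models and reports each one's mean squared error on made-up data, not Bay Wheels data:

```python
import numpy as np


def compare_fits(x: np.ndarray, y: np.ndarray, degree: int = 2) -> dict:
    """Fit a straight line and a degree-`degree` polynomial; return each model's MSE.

    The MSEs are in-sample; a fair comparison should score both models on
    held-out data, since a higher-degree fit never does worse in-sample.
    """
    results = {}
    for name, deg in [('linear', 1), ('polynomial', degree)]:
        coeffs = np.polyfit(x, y, deg)        # least-squares coefficients
        predictions = np.polyval(coeffs, x)   # evaluate the fitted polynomial
        results[name] = float(np.mean((y - predictions) ** 2))
    return results


# toy data (not Bay Wheels data): rides rise until 20 °C, then fall
temps = np.linspace(8, 26, 19)
rides = 5000 + 600 * (temps - 8) - 25 * (temps - 8) ** 2
print(compare_fits(temps, rides))
```

A straight line cannot capture a rise-then-fall pattern, which is the kind of relationship a polynomial transformation is meant to pick up.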

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Point
from geopandas import GeoDataFrame
import geopandas as gpd

all_files_to_format = ['201906-baywheels-tripdata.csv', '201907-baywheels-tripdata.csv',
                       '201908-baywheels-tripdata.csv', '201909-baywheels-tripdata.csv', 
                       '201910-baywheels-tripdata.csv','201911-baywheels-tripdata.csv',
                       '201912-baywheels-tripdata.csv', '202001-baywheels-tripdata.csv',
                       '202002-baywheels-tripdata.csv', '202003-baywheels-tripdata.csv']

all_files = ['202004-baywheels-tripdata.csv', '202005-baywheels-tripdata.csv', '202006-baywheels-tripdata.csv', 
             '202007-baywheels-tripdata.csv', '202008-baywheels-tripdata.csv', '202009-baywheels-tripdata.csv', 
             '202010-baywheels-tripdata.csv', '202011-baywheels-tripdata.csv', '202012-baywheels-tripdata.csv',
             '202101-baywheels-tripdata.csv', '202102-baywheels-tripdata.csv', '202103-baywheels-tripdata.csv']

# read each month's data and append them to the df
li = []
for filename in all_files_to_format:
    df = pd.read_csv(filename, index_col=None, header=0, dtype={'bike_share_for_all_trip': 'string', 'rental_access_method': 'string'})
    # pre-2020 data has some discrepancies in column names and values, rename them to match 2020 data
    df = df.rename(columns={'start_station_latitude': 'start_lat', 'start_station_longitude': 'start_lng',
                           'end_station_latitude': 'end_lat', 'end_station_longitude': 'end_lng',
                           'user_type': 'member_casual', 'start_time': 'started_at', 'end_time': 'ended_at'
                          })
    df = df.replace({'member_casual': r'Customer$'}, {'member_casual': 'casual'}, regex=True)
    df = df.replace({'member_casual': r'Subscriber$'}, {'member_casual': 'member'}, regex=True)
    li.append(df)
    
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, dtype={'bike_share_for_all_trip': 'string', 'rental_access_method': 'string'})
    li.append(df)

lyft_data = pd.concat(li, axis=0, ignore_index=True)


# keep only trips that start inside San Francisco
# get the San Francisco polygon from the Census boundary file
state_df = gpd.read_file("http://www2.census.gov/geo/tiger/PREVGENZ/ma/ma99/ce00shp/ce99_d00_shp.zip")
SF_polygon = state_df.loc[state_df['NAME'] == 'San Francisco'].iloc[1].geometry

# build every start point at once and test containment in one vectorized call
# (equivalent to SF_polygon.contains(Point(lon, lat)) for each row)
start_points = gpd.GeoSeries(gpd.points_from_xy(lyft_data.start_lng, lyft_data.start_lat),
                             index=lyft_data.index)
lyft_data = lyft_data[start_points.within(SF_polygon)]

# drop unused cols
lyft_data = lyft_data.drop(columns=['duration_sec', 'ride_id'])

lyft_data
Out[2]:
started_at ended_at start_station_id start_station_name start_lat start_lng end_station_id end_station_name end_lat end_lng bike_id member_casual bike_share_for_all_trip rental_access_method rideable_type
0 2019-06-30 18:16:09.7730 2019-07-01 16:57:45.5920 109 17th St at Valencia St 37.763316 -122.421904 56 Koshland Park 37.773414 -122.427317 1502.0 member No <NA> NaN
1 2019-06-30 18:09:55.8300 2019-07-01 14:47:36.6810 50 2nd St at Townsend St 37.780526 -122.390288 101 15th St at Potrero Ave 37.767079 -122.407359 2526.0 casual No <NA> NaN
2 2019-06-30 15:40:31.0380 2019-07-01 08:13:54.3490 23 The Embarcadero at Steuart St 37.791464 -122.391034 30 San Francisco Caltrain (Townsend St at 4th St) 37.776598 -122.395282 2427.0 member No <NA> NaN
4 2019-06-30 17:21:00.0550 2019-07-01 06:55:54.9960 15 San Francisco Ferry Building (Harry Bridges Pl... 37.795392 -122.394203 30 San Francisco Caltrain (Townsend St at 4th St) 37.776598 -122.395282 1070.0 casual No <NA> NaN
6 2019-06-30 14:31:39.5730 2019-07-01 00:53:02.2520 6 The Embarcadero at Sansome St 37.804770 -122.403234 400 Buchanan St at North Point St 37.804272 -122.433537 1980.0 casual No <NA> NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3944933 2021-03-17 15:28:27 2021-03-17 15:29:10 SF-C28-2 Broadway at Battery St 37.798572 -122.400869 SF-C28-2 Broadway at Battery St 37.798572 -122.400869 NaN casual NaN NaN classic_bike
3944935 2021-03-26 11:27:13 2021-03-26 11:38:59 SF-C28-2 Broadway at Battery St 37.798509 -122.400834 SF-C28-2 Broadway at Battery St 37.798637 -122.400695 NaN casual NaN NaN electric_bike
3944936 2021-03-21 11:24:06 2021-03-21 11:25:33 SF-E18 Divisadero St at Clay St 37.789588 -122.440683 SF-E18 Divisadero St at Clay St 37.789588 -122.440683 NaN casual NaN NaN classic_bike
3944937 2021-03-29 20:34:27 2021-03-29 20:47:19 SF-BB17 London St at Geneva Ave 37.716183 -122.440112 SF-BB17 London St at Geneva Ave 37.716209 -122.440078 NaN casual NaN NaN electric_bike
3944938 2021-03-15 18:18:24 2021-03-15 18:27:03 SF-N27 Rhode Island St at 17th St 37.764478 -122.402570 SF-N27 Rhode Island St at 17th St 37.764478 -122.402570 NaN casual NaN NaN classic_bike

3118591 rows × 15 columns

In [8]:
import datetime
import seaborn as sns

# convert timestamp into useful date/hour/day of week columns we can use

def getDate(timestamp):
    date_only = timestamp.split()[0]
    return date_only

def getYear(timestamp):
    date_only = timestamp.split()[0]
    date_arr = date_only.split('-')
    year = int(date_arr[0])
    return year

def getMonth(timestamp):
    date_only = timestamp.split()[0]
    date_arr = date_only.split('-')
    month = int(date_arr[1])
    return month

def getDayOfWeek(timestamp):
    date_only = timestamp.split()[0]
    date_arr = date_only.split('-')
    year = int(date_arr[0])
    month = int(date_arr[1])
    day = int(date_arr[2])
    return datetime.datetime(year, month, day).weekday()

def getHourOfDay(timestamp):
    time_only = timestamp.split()[1]
    time_arr = time_only.split(':')
    return time_arr[0]

def isWeekend(day_of_week):
    return day_of_week > 4

lyft_data['date'] = lyft_data.apply(lambda row: getDate(row.started_at), axis=1)
lyft_data['year']= lyft_data.apply(lambda row: getYear(row.started_at), axis=1)
lyft_data['month'] = lyft_data.apply(lambda row: getMonth(row.started_at), axis=1)
lyft_data['day_of_week'] = lyft_data.apply(lambda row: getDayOfWeek(row.started_at), axis=1)
lyft_data['is_weekend'] = lyft_data.apply(lambda row: isWeekend(row.day_of_week), axis=1)
lyft_data['hour_of_day'] = lyft_data.apply(lambda row: getHourOfDay(row.started_at), axis=1)

lyft_data.head()
Out[8]:
started_at ended_at start_station_id start_station_name start_lat start_lng end_station_id end_station_name end_lat end_lng ... member_casual bike_share_for_all_trip rental_access_method rideable_type date year month day_of_week is_weekend hour_of_day
0 2019-06-30 18:16:09.7730 2019-07-01 16:57:45.5920 109 17th St at Valencia St 37.763316 -122.421904 56 Koshland Park 37.773414 -122.427317 ... member No <NA> NaN 2019-06-30 2019 6 6 True 18
1 2019-06-30 18:09:55.8300 2019-07-01 14:47:36.6810 50 2nd St at Townsend St 37.780526 -122.390288 101 15th St at Potrero Ave 37.767079 -122.407359 ... casual No <NA> NaN 2019-06-30 2019 6 6 True 18
2 2019-06-30 15:40:31.0380 2019-07-01 08:13:54.3490 23 The Embarcadero at Steuart St 37.791464 -122.391034 30 San Francisco Caltrain (Townsend St at 4th St) 37.776598 -122.395282 ... member No <NA> NaN 2019-06-30 2019 6 6 True 15
4 2019-06-30 17:21:00.0550 2019-07-01 06:55:54.9960 15 San Francisco Ferry Building (Harry Bridges Pl... 37.795392 -122.394203 30 San Francisco Caltrain (Townsend St at 4th St) 37.776598 -122.395282 ... casual No <NA> NaN 2019-06-30 2019 6 6 True 17
6 2019-06-30 14:31:39.5730 2019-07-01 00:53:02.2520 6 The Embarcadero at Sansome St 37.804770 -122.403234 400 Buchanan St at North Point St 37.804272 -122.433537 ... casual No <NA> NaN 2019-06-30 2019 6 6 True 14

5 rows × 21 columns
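The helper functions above parse each timestamp string row by row, which is slow on millions of rows. A vectorized alternative is sketched below using pandas' `.dt` accessor; `add_time_features` is a hypothetical name, and the column types are chosen to match the helpers ('date' and 'hour_of_day' stay strings). Because the 2019 files include fractional seconds and the later files do not, pandas 2.x may need `format='ISO8601'` in the `pd.to_datetime` call to parse both.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame, col: str = 'started_at') -> pd.DataFrame:
    """Add date/year/month/day-of-week/weekend/hour columns in one vectorized pass."""
    ts = pd.to_datetime(df[col])
    out = df.copy()
    out['date'] = ts.dt.strftime('%Y-%m-%d')     # string, like getDate()
    out['year'] = ts.dt.year
    out['month'] = ts.dt.month
    out['day_of_week'] = ts.dt.dayofweek         # Monday=0 ... Sunday=6, like datetime.weekday()
    out['is_weekend'] = out['day_of_week'] > 4
    out['hour_of_day'] = ts.dt.strftime('%H')    # zero-padded string, e.g. '08'
    return out


sample = pd.DataFrame({'started_at': ['2019-06-30 18:16:09', '2021-03-15 08:27:03']})
print(add_time_features(sample))
```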

1) Examining Temporal Factors

Looking at usage over weekdays/weekends and over different times of day.

In [10]:
fig, axes = plt.subplots(2, 1, figsize=(12, 12), sharex=False)

# count rides for each (weekday/weekend, hour of day) pair
hourly_usage = (lyft_data.groupby(['is_weekend'])['hour_of_day'].value_counts()
                   .rename('usage count')
                   .reset_index()
                   .sort_values('hour_of_day'))

# label the boolean weekend flag for plotting
hourly_usage['is_weekend'] = hourly_usage['is_weekend'].map({True: 'weekend', False: 'weekday'})


hourly_usage_plot = sns.barplot(ax=axes[0], x="is_weekend", y="usage count", hue="hour_of_day", data=hourly_usage)
hourly_usage_plot.set(xlabel=None, title='Figure 1: Hourly Usage during Weekdays and Weekends')
axes[0].legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.1, title='Hour of Day')

# Plot usage data for each day of the week
weekday_data = lyft_data.groupby(['day_of_week']).count()

labels = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
weekday_plot = weekday_data.plot.pie(ax=axes[1], y='started_at', labels=labels, autopct='%1.1f%%')
axes[1].get_legend().remove()
weekday_plot.set(ylabel=None, title='Figure 2: Distribution of Usage by Day of Week')
Out[10]:
[Figure 1: Hourly Usage during Weekdays and Weekends; Figure 2: Distribution of Usage by Day of Week]

The distribution of usage by day of week (Figure 2) shows that usage is slightly higher toward the weekend but otherwise fairly evenly distributed. Hourly usage, however, varies considerably between weekdays and weekends.

According to Figure 1, weekday usage peaks during the morning and evening commutes. The evening peak is higher than the morning one, suggesting that people might use bike sharing as a more leisurely way of getting home, when they are not under pressure to get to work on time as they are in the mornings. On weekends, usage seems to mirror people's leisure activities and peaks in the middle of the day.
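Figure 1 compares raw totals, and there are five weekdays for every two weekend days. A minimal sketch of finding each group's busiest hour on a per-day basis instead, assuming the 'date', 'is_weekend', and 'hour_of_day' columns built above; `peak_hours` and the toy rows are hypothetical:

```python
import pandas as pd


def peak_hours(df: pd.DataFrame) -> pd.Series:
    """Return the busiest hour of day for weekdays and for weekends.

    Ride counts are divided by the number of distinct dates in each group,
    giving the average number of rides per day at each hour.
    """
    rides = df.groupby(['is_weekend', 'hour_of_day']).size()
    days = df.groupby('is_weekend')['date'].nunique()
    per_day = rides.div(days, level='is_weekend')   # broadcast the day counts across hours
    # idxmax returns (is_weekend, hour_of_day) tuples; keep only the hour
    return per_day.groupby(level='is_weekend').idxmax().map(lambda key: key[1])


sample = pd.DataFrame({
    'date': ['2021-03-15', '2021-03-15', '2021-03-16', '2021-03-20', '2021-03-20', '2021-03-20'],
    'is_weekend': [False, False, False, True, True, True],
    'hour_of_day': ['08', '08', '17', '13', '13', '10'],
})
print(peak_hours(sample))
```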

Next, let's compare the type of users over the weekend/weekday.

In [11]:
weekday_usage_types = (lyft_data.groupby(['is_weekend'])['member_casual'].value_counts(normalize=True)
                   .rename('percentage')
                   .mul(100)
                   .reset_index())

# label the boolean weekend flag for plotting
weekday_usage_types['is_weekend'] = weekday_usage_types['is_weekend'].map({True: 'weekend', False: 'weekday'})

sns.set(rc={'figure.figsize':(6, 6)})
weekday_usage_types_plot = sns.barplot(x="is_weekend", y="percentage", hue="member_casual", data=weekday_usage_types)
weekday_usage_types_plot.set(xlabel=None, ylabel='Percentage (%)', title='Figure 3: Usage Types as a Percentage of Total Usage')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0, title='Usage Type')
Out[11]:
[Figure 3: Usage Types as a Percentage of Total Usage]

On weekdays, rides are split roughly equally between casual and subscribed (member) users. During weekends, however, casual users account for a much higher proportion of bike share usage. Together with the evening commute peak in the hourly usage in Figure 1, this suggests that a significant number of subscribed members are weekday commuters who regularly use bikes as a means to get home.
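The percentages in Figure 3 are shares within each group, so the weekday bars sum to 100 and the weekend bars sum to 100. The same table can be sketched with `pd.crosstab` and `normalize='index'`; `user_type_shares` and the toy rows are hypothetical:

```python
import pandas as pd


def user_type_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Percentage of rides by user type within weekdays and within weekends.

    normalize='index' divides each row by its own total, so each row sums to 100.
    """
    shares = pd.crosstab(df['is_weekend'], df['member_casual'], normalize='index') * 100
    return shares.rename(index={False: 'weekday', True: 'weekend'})


sample = pd.DataFrame({
    'is_weekend': [False, False, True, True, True, True],
    'member_casual': ['member', 'casual', 'casual', 'casual', 'member', 'casual'],
})
print(user_type_shares(sample))
```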

2) Examining the role of weather

Here I look at bike share usage plotted against temperature and precipitation. I first compare monthly usage counts with monthly averages of temperature and precipitation; then, to examine the relationship further, I plot average daily temperature and precipitation against daily bike share usage counts.

An outline of the steps below:

  1. Get daily weather data (temperature and precipitation), convert to metric units
  2. Calculate the average monthly weather data, total up monthly bike share usage counts, and plot them by month.
  3. Group the Lyft bike share data into daily counts, plot these counts against daily temperature and precipitation
In [13]:
import calendar

# read daily weather data files
weather_files_by_year = ['sf_weather_data_2020.csv', 'sf_weather_data_2019.csv', 'sf_weather_data_2021.csv']
li = []
for filename in weather_files_by_year:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
    
sf_weather = pd.concat(li, axis=0, ignore_index=True)
sf_weather = sf_weather.drop(columns=['Maximum', 'Minimum', 'Departure', 'HDD', 'CDD'])
sf_weather = sf_weather.rename(columns={'Date': 'date', 'Precipitation': 'precipitation', 'Average': 'temp_average'})

def fahr_to_celsius(temp_fahr):
    temp_celsius = (temp_fahr - 32) * 5 / 9
    return temp_celsius

def in_to_cm(precip_in):
    precip_cm = precip_in * 2.54
    return precip_cm

def formatMonthYear(row):
    m = calendar.month_abbr[row['month']]
    y = str(row['year'])
    y = y[-2:]
    return m + " '" + y

# convert units to metric (brrrr imperial)
sf_weather['temp_average'] = sf_weather['temp_average'].astype(float)
sf_weather['temp_average'] = fahr_to_celsius(sf_weather['temp_average'])
sf_weather['precipitation'] = sf_weather['precipitation'].astype(float)
sf_weather['precipitation'] = in_to_cm(sf_weather['precipitation'])


sf_weather['month'] = pd.DatetimeIndex(sf_weather['date']).month
sf_weather['year'] = pd.DatetimeIndex(sf_weather['date']).year
sf_weather['month_year'] = sf_weather.apply(lambda row: formatMonthYear(row), axis=1)

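# get each month's average temperature and precipitation (sf_weather is our daily average);
# select the numeric columns first so the string 'date' and 'month_year' columns are not averaged
sf_weather_monthly_average = (sf_weather.groupby(['month', 'year'])[['temp_average', 'precipitation']].mean()
                             .reset_index())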

sf_weather_monthly_average.head()
Out[13]:
month year temp_average precipitation
0 1 2019 12.768817 0.374445
1 1 2020 11.568100 0.200742
2 1 2021 11.818996 0.239252
3 2 2019 9.990079 0.652236
4 2 2020 13.534483 0.000000
In [242]:
fig, axes = plt.subplots(2, 1, figsize=(20, 15))

###################### plot monthly bike share data with weather ######################

# get monthly bike share totals
usage_by_month = (lyft_data.groupby(['month', 'year']).size()
                             .rename('usage count')
                             .reset_index())
usage_by_month['month_year'] = usage_by_month.apply(lambda row: formatMonthYear(row), axis=1)

# sort bars chronologically (by year, then month) and fix that order as an ordered categorical
usage_by_month = usage_by_month.sort_values(['year', 'month']).reset_index(drop=True)
usage_by_month['month_year'] = pd.Categorical(usage_by_month['month_year'],
                                   categories=usage_by_month['month_year'].tolist(),
                                   ordered=True)
# merge monthly usage data and monthly weather averages
monthly_usage_and_weather = pd.merge(usage_by_month, sf_weather_monthly_average, on=['month', 'year'])

# plot monthly usage as bars, with temperature and precipitation as lines on a secondary axis
monthly_usage_plot = sns.barplot(ax=axes[0], data=monthly_usage_and_weather, x='month_year', y='usage count', color='b')
ax2 = axes[0].twinx()
monthly_weather_plot = sns.lineplot(data=monthly_usage_and_weather, x='month_year', y='temp_average',
                                    color='r', label='temperature (°C)', ax=ax2)
sns.lineplot(data=monthly_usage_and_weather, x='month_year', y='precipitation',
             color='g', label='precipitation (cm)', ax=ax2)
ax2.legend(loc='upper right')
axes[0].grid(False)
ax2.grid(False)
monthly_weather_plot.set(xlabel='month', ylabel='precipitation (cm) / temp (c)', title='Figure 4: Monthly Usage and Weather Averages')

###################### showing the relationship between daily bike share counts with weather ######################

# get daily bike share totals
lyft_data_daily = (lyft_data.groupby(['date']).size()
                             .rename('count')
                             .reset_index())

# merge daily usage data with daily weather averages
lyft_data_daily = pd.merge(lyft_data_daily, sf_weather, on='date')

# temperature and precipitation share one x-axis, so label each series
daily_weather_and_usage_plot = sns.scatterplot(ax=axes[1], data=lyft_data_daily, x='temp_average', y='count',
                                               label='temperature (°C)')
sns.scatterplot(ax=axes[1], data=lyft_data_daily, x='precipitation', y='count', label='precipitation (cm)')
axes[1].legend()
daily_weather_and_usage_plot.set(xlabel='precipitation (cm) / temp (c)', ylabel='usage count', title='Figure 5: Weather and Daily Usage')
Out[242]:
[Figure 4: Monthly Usage and Weather Averages; Figure 5: Weather and Daily Usage]

Based on Figure 4, usage seems to decrease as temperatures fall: from October to December of both 2019 and 2020, usage and temperatures fall together. There is a notable spike in usage in January and February of 2020, which may be attributable to the return of electric bikes after battery issues in July 2019 caused Lyft to pause its electric fleet. From March 19, 2020 to January 25, 2021, California was under a statewide stay-at-home order, which could explain a drop in usage over that period. Apart from the spike following the return of electric bikes at the beginning of 2020, the monthly data do not show much fluctuation in usage.
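One way to put a number on the stay-at-home drop is to compare average daily rides before, during, and after the order. A minimal sketch, assuming a frame shaped like `lyft_data_daily` ('date' strings and a 'count' column); the function `mean_daily_rides_by_period` and the toy rows are hypothetical, while the order dates are the ones given above:

```python
import pandas as pd

# California's statewide stay-at-home order, as dated in the text above
ORDER_START = pd.Timestamp('2020-03-19')
ORDER_END = pd.Timestamp('2021-01-25')


def mean_daily_rides_by_period(daily: pd.DataFrame) -> pd.Series:
    """Average daily ride count before, during (inclusive), and after the order."""
    dates = pd.to_datetime(daily['date'])
    period = pd.Series('before', index=daily.index)
    period[dates >= ORDER_START] = 'during'
    period[dates > ORDER_END] = 'after'
    return daily['count'].groupby(period).mean()


sample = pd.DataFrame({
    'date': ['2020-03-01', '2020-03-18', '2020-04-01', '2021-01-25', '2021-02-01'],
    'count': [100, 200, 50, 70, 120],
})
print(mean_daily_rides_by_period(sample))
```

Because the order covers most of a year, this comparison mixes its effect with seasonal weather, so it is descriptive rather than causal.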

Figure 5 is more telling of the relationship between weather and usage. Although San Francisco gets little rain, users are still very sensitive to precipitation. At low levels of precipitation, usage stays under 10,000 rides per day, and at slightly higher levels (around 2 – 4 cm per day), usage does not go above 6,000 rides per day. Temperatures within the moderate, typical range (around 10 – 18 °C) do not seem to have much of an effect on bike share usage. However, when temperatures get warmer (above 20 °C), usage drops to below 8,000 rides per day.
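The thresholds described above can be checked in a table instead of by eye. A minimal sketch, assuming a frame shaped like `lyft_data_daily` ('temp_average' in °C, 'precipitation' in cm, 'count'); the function `usage_by_weather`, the band edges, and the toy rows are illustrative choices, not values from the analysis:

```python
import pandas as pd


def usage_by_weather(daily: pd.DataFrame) -> pd.DataFrame:
    """Mean, max, and number of days of ride counts by temperature band and by rain vs. dry."""
    # right-inclusive bins: (-inf, 10], (10, 18], (18, 20], (20, inf)
    bands = pd.cut(daily['temp_average'], bins=[-float('inf'), 10, 18, 20, float('inf')],
                   labels=['<10C', '10-18C', '18-20C', '>20C'])
    rain = daily['precipitation'].gt(0).map({True: 'rain', False: 'dry'})
    return pd.concat([
        daily.groupby(bands, observed=True)['count'].agg(['mean', 'max', 'count']),
        daily.groupby(rain)['count'].agg(['mean', 'max', 'count']),
    ])


sample = pd.DataFrame({
    'temp_average': [9, 12, 15, 19, 22],
    'precipitation': [0, 0, 2.5, 0, 0],
    'count': [5000, 9000, 4000, 8500, 7000],
})
print(usage_by_weather(sample))
```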

These results confirm Eren and Uz's (2020) findings to an extent. They agree with Eren and Uz's finding that trip production is positively correlated with temperature when temperatures range around 0 – 20 °C with no precipitation. However, Eren and Uz found that bike sharing demand is highest in the 20 – 30 °C range, whereas here demand reaches its highest levels at around 10 – 20 °C and drops once temperatures rise above 20 °C.
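The comparison with Eren and Uz can be made explicit by computing the temperature–usage correlation separately below and above 20 °C on dry days. A minimal sketch, assuming a frame shaped like `lyft_data_daily`; the function `correlation_by_range` and the toy rows are hypothetical, and on real data each range should contain enough days for the correlation to be meaningful:

```python
import pandas as pd


def correlation_by_range(daily: pd.DataFrame, split: float = 20.0) -> dict:
    """Pearson correlation of temperature with ride count below/above `split` °C, dry days only."""
    dry = daily[daily['precipitation'] == 0]   # matches the 'no precipitation' condition
    below = dry[dry['temp_average'] <= split]
    above = dry[dry['temp_average'] > split]
    return {
        'below': below['temp_average'].corr(below['count']),
        'above': above['temp_average'].corr(above['count']),
    }


sample = pd.DataFrame({
    'temp_average': [10, 14, 18, 21, 24, 27],
    'precipitation': [0, 0, 0, 0, 0, 0],
    'count': [6000, 7000, 8000, 7500, 7000, 6500],
})
print(correlation_by_range(sample))
```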

3) Examining Safety and Public Transportation as Factors Affecting Bike Sharing

Mapping crime rate, public transportation options, and bike share usage
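The map below shows transit stations and bike share stations side by side; to test the hypothesis that usage is higher near transit stops, each bike share station's distance to its nearest rail station can be computed. A minimal sketch using the haversine formula, assuming coordinate arrays such as `loc_data.start_lat`/`start_lng` and the stations file's `Y`/`X` columns; the function name `nearest_station_distance` is hypothetical:

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def nearest_station_distance(lat, lng, station_lats, station_lngs):
    """Distance in metres from each point to its nearest station (haversine formula).

    Builds a full points x stations distance matrix, so apply it to the unique
    bike share stations rather than to every trip.
    """
    lat1 = np.radians(np.asarray(lat, dtype=float))[:, None]            # points as rows
    lng1 = np.radians(np.asarray(lng, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(station_lats, dtype=float))[None, :]   # stations as columns
    lng2 = np.radians(np.asarray(station_lngs, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    distances = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
    return distances.min(axis=1)


d = nearest_station_distance([37.77, 37.78], [-122.42, -122.42],
                             [37.77, 37.80], [-122.42, -122.40])
print(d)
```

Stations within some walking distance of a rail stop (the cutoff is a choice to make, e.g. a few hundred metres) could then be compared with the rest by hour of day.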

In [241]:
import shapely.wkt  # explicitly load the WKT parser used by shapely.wkt.loads below
import folium

sf_latitude = 37.7749
sf_longitude = -122.4194

# Create map
sf_usage_map = folium.Map(
    location=[sf_latitude,sf_longitude],
    zoom_start=13,
    tiles='cartodb positron')

###################### map out crime rate per neighborhood ###################### 

# read sf neighborhood polygons
neighborhoods = gpd.read_file('SF_Analysis_Neighborhoods.csv')
neighborhoods = neighborhoods.drop(columns=['geometry'])
neighborhoods = neighborhoods.rename(columns={'NHOOD': 'Analysis Neighborhood', 'the_geom': 'geometry'})

# read sf crime data
incidents = pd.read_csv('SF_Police_Incident_Reports_2018_to_Present.csv') 
incidents = incidents.dropna(subset=['Latitude', 'Longitude'])

# get a count of incidents by neighborhood, sorted from lowest to highest
incidents_by_neighborhood = (incidents.groupby(['Analysis Neighborhood']).size()
                             .rename('count')
                             .reset_index()
                             .sort_values('count'))

# add color based on number of incidents (for map) 
incidents_by_neighborhood['fill_color'] = pd.cut(incidents_by_neighborhood['count'], bins=5, 
                                                 labels=['#fcfadc', '#F2E750', '#F2B807', '#F28705', '#C52104'])


# get each neighborhood's polygons from the neighborhood dataset
incidents_by_neighborhood = pd.merge(incidents_by_neighborhood, neighborhoods)

# convert polygon data for mapping
geometry = incidents_by_neighborhood['geometry'].map(shapely.wkt.loads)
incidents_by_neighborhood = incidents_by_neighborhood.drop('geometry', axis=1)
neighborhood_incidents_gdf = gpd.GeoDataFrame(incidents_by_neighborhood, crs="EPSG:4326", geometry=geometry)


folium.GeoJson(neighborhood_incidents_gdf,
               style_function=lambda feature: {
                   'fillColor': feature['properties']['fill_color'],
                   'color' : feature['properties']['fill_color'],
                   'fillOpacity' : 0.3,
               }).add_to(sf_usage_map)


###################### add public transit stations data ###################### 

stations = pd.read_csv('Passenger_Rail_Stations_2019.csv')

# add MUNI stations (light rail)
muni_stations = stations[stations['mode'] == 'Light Rail']
for index, row in muni_stations.iterrows():
    lng = row.X
    lat = row.Y
    folium.CircleMarker(
      [lat,lng],
      radius=1,
      color='#d1feff',
      fill=False,
      fill_opacity=1
    ).add_to(sf_usage_map) 

# add Cal Train stations
cal_train_stations = stations[stations['mode'] == 'Commuter Rail']
for index, row in cal_train_stations.iterrows():
    lng = row.X
    lat = row.Y
    folium.CircleMarker(
      [lat,lng],
      radius=8,
      color='#9768D1',
      fill=True,
      fill_color='#9768D1',
      fill_opacity=1
    ).add_to(sf_usage_map) 

# add BART stations (subway)
bart_stations = stations[stations['mode'] == 'Rapid Rail']
for index, row in bart_stations.iterrows():
    lng = row.X
    lat = row.Y
    folium.CircleMarker(
      [lat,lng],
      radius=5,
      color='#9C27B0',
      fill=True,
      fill_color='#9C27B0',
      fill_opacity=1
    ).add_to(sf_usage_map) 

###################### add bike usage data ###################### 

# get total usage by station, color marker based on usage
loc_data = lyft_data.groupby(['start_station_id']).size().rename('count').reset_index().sort_values('count')

# ten shades from light to dark green, one per usage bin
marker_colors = ['#b1ff91', '#94e586', '#7acb7b', '#4b9963', '#27b48f',
                 '#378056', '#246848', '#14513a', '#063b2b', '#00261c']
loc_data['marker_color'] = pd.cut(loc_data['count'], bins=10, 
                                  labels=marker_colors)


# getlat lon of each station 
lyft_data = lyft_data.dropna(subset=['start_station_id'])
loc_data_first_row = lyft_data.groupby(['start_station_id']).head(1)
loc_data_first_row = loc_data_first_row.drop(
    columns=['started_at', 'ended_at', 'bike_id', 'hour_of_day', 'day_of_week', 'is_weekend', 'rideable_type', 'member_casual'])

loc_data = pd.merge(loc_data, loc_data_first_row ,on='start_station_id',how='inner')

for lat,lng, marker_color in zip(loc_data.start_lat, loc_data.start_lng, loc_data.marker_color):
    folium.CircleMarker(
      [lat,lng],
      radius=3,
      color=marker_color,
      fill=True,
      fill_color=marker_color,
      fill_opacity=0.5
    ).add_to(sf_usage_map)   


sf_usage_map
Out[241]: