A summary of the lecture "Analyzing Police Activity with pandas", via DataCamp
# Import the pandas library as pd
import pandas as pd
# Read 'police.csv' into a DataFrame named ri
ri = pd.read_csv('./dataset/police.csv')
# Count the unique values in 'violation'
print(ri['violation'].value_counts())
# Express the counts as proportions
print(ri['violation'].value_counts(normalize=True))
Speeding               48424
Moving violation       16224
Equipment              10922
Other                   4410
Registration/plates     3703
Seat belt               2856
Name: violation, dtype: int64
Speeding               0.559563
Moving violation       0.187476
Equipment              0.126209
Other                  0.050960
Registration/plates    0.042790
Seat belt              0.033002
Name: violation, dtype: float64
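As a quick sanity check on what `normalize=True` does, the sketch below uses a small made-up Series (hypothetical values, not the police data) and shows that the proportions are the raw counts divided by their total.

```python
import pandas as pd

# Hypothetical toy data, not taken from police.csv
violations = pd.Series(
    ['Speeding', 'Speeding', 'Speeding', 'Equipment', 'Other'],
    name='violation',
)

counts = violations.value_counts()
proportions = violations.value_counts(normalize=True)

# Dividing the counts by their total gives the same proportions
manual = counts / counts.sum()

print(proportions)
print(manual)
```

The same relationship holds in the output above: 48424 speeding stops out of the 86539 stops with a recorded violation is roughly 0.5596.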
# Create a DataFrame of female drivers
female = ri[ri['driver_gender'] == 'F']
# Create a DataFrame of male drivers
male = ri[ri['driver_gender'] == 'M']
# Compute the violations by female drivers (as proportions)
print(female['violation'].value_counts(normalize=True))
# Compute the violations by male drivers (as proportions)
print(male['violation'].value_counts(normalize=True))
Speeding               0.658114
Moving violation       0.138218
Equipment              0.105199
Registration/plates    0.044418
Other                  0.029738
Seat belt              0.024312
Name: violation, dtype: float64
Speeding               0.522243
Moving violation       0.206144
Equipment              0.134158
Other                  0.058985
Registration/plates    0.042175
Seat belt              0.036296
Name: violation, dtype: float64
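The two filtered Series above can also be viewed side by side. One possible alternative, which is not part of the lecture, is `pd.crosstab` with `normalize='index'`, so that each gender's row sums to 1. The DataFrame below is made up purely for illustration.

```python
import pandas as pd

# Hypothetical toy data, not taken from police.csv
stops = pd.DataFrame({
    'driver_gender': ['F', 'F', 'F', 'F', 'M', 'M', 'M', 'M'],
    'violation': ['Speeding', 'Speeding', 'Speeding', 'Equipment',
                  'Speeding', 'Speeding', 'Equipment', 'Equipment'],
})

# Rows are genders, columns are violations; each row is normalized to sum to 1
by_gender = pd.crosstab(stops['driver_gender'], stops['violation'],
                        normalize='index')

print(by_gender)
```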
# Create a DataFrame of female drivers stopped for speeding
female_and_speeding = ri[(ri['driver_gender'] == 'F') & (ri['violation'] == 'Speeding')]
# Create a DataFrame of male drivers stopped for speeding
male_and_speeding = ri[(ri['driver_gender'] == 'M') & (ri['violation'] == 'Speeding')]
# Compute the stop outcomes for female drivers (as proportions)
print(female_and_speeding.stop_outcome.value_counts(normalize=True))
# Compute the stop outcomes for male drivers (as proportions)
print(male_and_speeding.stop_outcome.value_counts(normalize=True))
Citation            0.952192
Warning             0.040074
Arrest Driver       0.005752
N/D                 0.000959
Arrest Passenger    0.000639
No Action           0.000383
Name: stop_outcome, dtype: float64
Citation            0.944595
Warning             0.036184
Arrest Driver       0.015895
Arrest Passenger    0.001281
No Action           0.001068
N/D                 0.000976
Name: stop_outcome, dtype: float64
# Check the data type of 'search_conducted'
print(ri['search_conducted'].dtypes)
# Calculate the search rate by counting the values
print(ri['search_conducted'].value_counts(normalize=True))
# Calculate the search rate by taking the mean
print(ri['search_conducted'].mean())
bool
False    0.963953
True     0.036047
Name: search_conducted, dtype: float64
0.03604713268876511
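The mean works as a rate here because pandas treats `True` as 1 and `False` as 0 when aggregating a boolean column. A minimal sketch on a made-up Series (hypothetical values):

```python
import pandas as pd

# Hypothetical toy column: 1 search in 5 stops
search_conducted = pd.Series([False, False, True, False, False])

# Summing counts the True values; dividing by the length gives the rate
rate_from_sum = search_conducted.sum() / len(search_conducted)

# The mean of a boolean Series is the same quantity
rate_from_mean = search_conducted.mean()

print(rate_from_sum, rate_from_mean)
```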
# Calculate the search rate for female drivers
print(ri[ri['driver_gender'] == 'F'].search_conducted.mean())
# Calculate the search rate for male drivers
print(ri[ri['driver_gender'] == 'M'].search_conducted.mean())
# Calculate the search rate for both groups simultaneously
print(ri.groupby('driver_gender').search_conducted.mean())
0.019180617481282074
0.04542557598546892
driver_gender
F    0.019181
M    0.045426
Name: search_conducted, dtype: float64
# Calculate the search rate for each combination of gender and violation
print(ri.groupby(['driver_gender', 'violation']).search_conducted.mean())
# Reverse the ordering to group by violation before gender
print(ri.groupby(['violation', 'driver_gender']).search_conducted.mean())
driver_gender  violation
F              Equipment              0.039984
               Moving violation       0.039257
               Other                  0.041018
               Registration/plates    0.054924
               Seat belt              0.017301
               Speeding               0.008309
M              Equipment              0.071496
               Moving violation       0.061524
               Other                  0.046191
               Registration/plates    0.108802
               Seat belt              0.035119
               Speeding               0.027885
Name: search_conducted, dtype: float64
violation            driver_gender
Equipment            F                0.039984
                     M                0.071496
Moving violation     F                0.039257
                     M                0.061524
Other                F                0.041018
                     M                0.046191
Registration/plates  F                0.054924
                     M                0.108802
Seat belt            F                0.017301
                     M                0.035119
Speeding             F                0.008309
                     M                0.027885
Name: search_conducted, dtype: float64
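A grouped result with two index levels can be easier to compare as a table. One possible approach, not shown in the lecture, is to call `unstack()` on the result so the inner level becomes columns. The data below is invented for illustration only.

```python
import pandas as pd

# Hypothetical toy data, not taken from police.csv
stops = pd.DataFrame({
    'driver_gender': ['F', 'F', 'M', 'M', 'M', 'F'],
    'violation': ['Speeding', 'Equipment', 'Speeding',
                  'Speeding', 'Equipment', 'Speeding'],
    'search_conducted': [False, True, False, True, False, False],
})

rates = stops.groupby(['violation', 'driver_gender'])['search_conducted'].mean()

# Move the inner index level (driver_gender) into columns
table = rates.unstack()

print(table)
```

Each row of `table` is a violation and each column a gender, so the two rates for a given violation sit next to each other.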
During a vehicle search, the police officer may pat down the driver to check if they have a weapon. This is known as a "protective frisk."
In this exercise, you'll first check to see how many times "Protective Frisk" was the only search type. Then, you'll use a string method to locate all instances in which the driver was frisked.
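The string method in question is `Series.str.contains`. The sketch below uses made-up values and assumes that, as the counts in this dataset suggest, stops without a search have a missing `search_type`. Passing `na=False` makes those rows count as "not frisked" and keeps the result a plain boolean column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column; NaN stands for a stop with no search
search_type = pd.Series([
    'Protective Frisk',
    np.nan,
    'Inventory,Protective Frisk',
    'Probable Cause',
])

# na=False maps missing values to False instead of propagating them
frisk = search_type.str.contains('Protective Frisk', na=False)

print(frisk.tolist())
print(frisk.sum())
```

Because the match is a substring test, combined search types such as 'Inventory,Protective Frisk' are counted as frisks too.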
# Count the 'search_type' values
print(ri['search_type'].value_counts())
# Check if 'search_type' contains the string 'Protective Frisk'
ri['frisk'] = ri.search_type.str.contains('Protective Frisk', na=False)
# Check the data type of 'frisk'
print(ri['frisk'].dtypes)
# Take the sum of frisk
print(ri['frisk'].sum())
Incident to Arrest                                          1290
Probable Cause                                               924
Inventory                                                    219
Reasonable Suspicion                                         214
Protective Frisk                                             164
Incident to Arrest,Inventory                                 123
Incident to Arrest,Probable Cause                            100
Probable Cause,Reasonable Suspicion                           54
Probable Cause,Protective Frisk                               35
Incident to Arrest,Inventory,Probable Cause                   35
Incident to Arrest,Protective Frisk                           33
Inventory,Probable Cause                                      25
Protective Frisk,Reasonable Suspicion                         19
Incident to Arrest,Inventory,Protective Frisk                 18
Incident to Arrest,Probable Cause,Protective Frisk            13
Inventory,Protective Frisk                                    12
Incident to Arrest,Reasonable Suspicion                        8
Incident to Arrest,Probable Cause,Reasonable Suspicion         5
Probable Cause,Protective Frisk,Reasonable Suspicion           5
Incident to Arrest,Inventory,Reasonable Suspicion              4
Inventory,Reasonable Suspicion                                 2
Incident to Arrest,Protective Frisk,Reasonable Suspicion       2
Inventory,Probable Cause,Protective Frisk                      1
Inventory,Protective Frisk,Reasonable Suspicion                1
Inventory,Probable Cause,Reasonable Suspicion                  1
Name: search_type, dtype: int64
bool
303
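The total of 303 can be cross-checked against the value counts: it should equal the sum of the counts for every search type whose label mentions 'Protective Frisk'. The sketch below re-enters the counts printed above by hand, so it runs without the CSV; the variable names are illustrative.

```python
import pandas as pd

# Counts copied from the value_counts() output above
search_type_counts = pd.Series({
    'Incident to Arrest': 1290,
    'Probable Cause': 924,
    'Inventory': 219,
    'Reasonable Suspicion': 214,
    'Protective Frisk': 164,
    'Incident to Arrest,Inventory': 123,
    'Incident to Arrest,Probable Cause': 100,
    'Probable Cause,Reasonable Suspicion': 54,
    'Probable Cause,Protective Frisk': 35,
    'Incident to Arrest,Inventory,Probable Cause': 35,
    'Incident to Arrest,Protective Frisk': 33,
    'Inventory,Probable Cause': 25,
    'Protective Frisk,Reasonable Suspicion': 19,
    'Incident to Arrest,Inventory,Protective Frisk': 18,
    'Incident to Arrest,Probable Cause,Protective Frisk': 13,
    'Inventory,Protective Frisk': 12,
    'Incident to Arrest,Reasonable Suspicion': 8,
    'Incident to Arrest,Probable Cause,Reasonable Suspicion': 5,
    'Probable Cause,Protective Frisk,Reasonable Suspicion': 5,
    'Incident to Arrest,Inventory,Reasonable Suspicion': 4,
    'Inventory,Reasonable Suspicion': 2,
    'Incident to Arrest,Protective Frisk,Reasonable Suspicion': 2,
    'Inventory,Probable Cause,Protective Frisk': 1,
    'Inventory,Protective Frisk,Reasonable Suspicion': 1,
    'Inventory,Probable Cause,Reasonable Suspicion': 1,
})

# Boolean mask over the index: True where the label mentions a frisk
mentions_frisk = search_type_counts.index.str.contains('Protective Frisk')

frisk_only = search_type_counts['Protective Frisk']
frisk_total = search_type_counts[mentions_frisk].sum()

print(frisk_only)   # 164: frisk was the only search type
print(frisk_total)  # 303: frisk appeared alone or combined with other types
```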
In this exercise, you'll compare the rates at which female and male drivers are frisked during a search. Are males frisked more often than females, perhaps because police officers consider them to be higher risk?
Before doing any calculations, it's important to filter the DataFrame to only include the relevant subset of data, namely stops in which a search was conducted.
# Create a DataFrame of stops in which a search was conducted
searched = ri[ri.search_conducted]
# Calculate the overall frisk rate by taking the mean of 'frisk'
print(searched.frisk.mean())
# Calculate the frisk rate for each gender
print(searched.groupby('driver_gender').frisk.mean())
0.09162382824312065
driver_gender
F    0.074561
M    0.094353
Name: frisk, dtype: float64
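To see why the filtering step matters, the sketch below compares the frisk rate over all stops with the frisk rate over searched stops only. The DataFrame is invented for illustration; only the pattern mirrors the code above.

```python
import pandas as pd

# Hypothetical toy data: 10 stops, 2 searches, 1 frisk
stops = pd.DataFrame({
    'search_conducted': [True, True] + [False] * 8,
    'frisk': [True] + [False] * 9,
})

# Rate over every stop: diluted by stops where no search happened
rate_all_stops = stops['frisk'].mean()

# Rate over searched stops only: the question actually being asked
searched_stops = stops[stops['search_conducted']]
rate_searched = searched_stops['frisk'].mean()

print(rate_all_stops)
print(rate_searched)
```

In the toy data the unfiltered rate is 0.1 while the rate among searches is 0.5, so skipping the filter would answer a different question (how often a stop ends in a frisk, rather than how often a search includes one).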