#!/usr/bin/env python
# coding: utf-8

# # Election Insights: Uncovering Voter Trends
# 
# ### Description
# It's election season, and the Democratic National Committee (**DNC**) and the Republican National Committee (**RNC**) need your help analyzing voter demographics and donation data to inform campaign strategies.
# 
# ### Tasks
# - Merge the voter demographics and donation dataframes on the voter ID column and calculate the total donations received from each state.
# - Transform the 'donation_date' column to datetime format and extract the month and year of each donation. Then, create a new dataframe showing the total donations received each month, grouped by state.
# - Pivot the merged dataframe to show the average donation amount by age group and state.

# In[1]:


# import libraries
import pandas as pd
import numpy as np
import sys

print('Python version ' + sys.version)
print('Pandas version ' + pd.__version__)
print('Numpy version ' + np.__version__)


# # The Data
# 
# The dataset consists of two CSV files: "voter_demographics" and "donations", containing information on 100,000 voters (age, state, party affiliation) and 50,000 donations (date, amount) made by these voters, respectively. The data is synthetic, representing a fictional scenario, and is designed to mimic real-world voter demographics and donation patterns.
# 
# ### Columns:
# **Voter Demographics:**
# - **voter_id:** Unique identifier for each voter.
# - **state:** Two-letter abbreviation for the voter's state.
# - **age:** Voter's age.
# - **party_affiliation:** Voter's party affiliation.
# 
# **Donations:**
# - **voter_id:** Unique identifier for each voter.
# - **donation_date:** Date of donation.
# - **donation_amount:** Amount donated.

# In[2]:


# set the seed
np.random.seed(0)

# voter demographics data
voter_demographics = pd.DataFrame({
    'voter_id': range(100000),
    'state': np.random.choice(['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA'], size=100000),
    'age': np.random.randint(18, 80, size=100000),
    'party_affiliation': np.random.choice(['Democrat', 'Republican', 'Independent'], size=100000)
})

# donation data
donations = pd.DataFrame({
    'voter_id': np.random.choice(range(100000), size=50000),
    'donation_date': pd.date_range('2022-01-01', '2024-11-04', periods=50000),
    'donation_amount': np.random.randint(10, 1000, size=50000)
})

# save dataframes to CSV files
voter_demographics.to_csv('voter_demographics.csv', index=False)
donations.to_csv('donations.csv', index=False)


# We will begin by reading the CSV files into memory.

# In[3]:


# create dataframes
voter_demographics_df = pd.read_csv('voter_demographics.csv')
donations_df = pd.read_csv('donations.csv')


# Let us take a look at the data types.

# In[4]:


voter_demographics_df.info()


# Note that the "donation_date" column was read in as a string, so it will need to be converted to a datetime type (see Task #2).

# In[5]:


donations_df.info()


# # Task #1
# 
# Merge the voter demographics and donation dataframes on the voter ID column and calculate the total donations received from each state.
# 
# **Note:** By default, the `.merge` method performs an inner join. Rows whose "voter_id" appears in only one of the two dataframes are dropped. Here, that means voters who never donated are left out of the result, and a voter who donated several times appears once per donation.

# In[6]:


# merge the two dataframes
df = voter_demographics_df.merge(donations_df, on='voter_id')
df.head()


# We will make use of the `.groupby` method to calculate the total donations received from each state.
# 
# Looking at the final results, we can see that California gave the largest total donation amount and Arizona the smallest.

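# The join behaviour and the meaning of "most" can both be checked on a few rows. The cell below is a minimal sketch, not part of the original analysis: the `toy_voters` and `toy_donations` frames and every value in them are hypothetical. It compares the default inner join with a left join, then reports both the total amount and the number of donations per state, since the two can rank states differently.

# In[ ]:


import pandas as pd

# hypothetical toy data: voter 3 never donated
toy_voters = pd.DataFrame({'voter_id': [1, 2, 3], 'state': ['CA', 'AZ', 'CA']})
toy_donations = pd.DataFrame({'voter_id': [1, 1, 2], 'donation_amount': [100, 50, 500]})

# inner join (the default): voter 3 has no matching donation and is dropped
toy_inner = toy_voters.merge(toy_donations, on='voter_id')

# left join: voter 3 is kept, with NaN as the donation amount
toy_left = toy_voters.merge(toy_donations, on='voter_id', how='left')

# total amount ('sum') and number of donations ('count') per state
toy_summary = toy_inner.groupby('state')['donation_amount'].agg(['sum', 'count'])
toy_summary


# In this toy data, AZ has the larger total while CA has more donations, so it helps to state which of the two a ranking refers to.
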
# In[7]:


# create group object
group = df.groupby('state')

# total donation amount per state, largest first
group['donation_amount'].sum().sort_values(ascending=False)


# # Task #2
# 
# Transform the 'donation_date' column to datetime format and extract the month and year of each donation. Then, create a new dataframe showing the total donations received each month, grouped by state.

# In[8]:


# convert to date object
df['donation_date'] = pd.to_datetime(df['donation_date'])

# check to make sure it worked
df.info()


# In[9]:


# create a new dataframe (copy so the new columns are not added to a view of df)
df_new = df.loc[:, ['state', 'donation_date', 'donation_amount']].copy()

# create new month and year columns
df_new['donation_month'] = df_new['donation_date'].dt.month
df_new['donation_year'] = df_new['donation_date'].dt.year

df_new.head()


# In[10]:


# create group object
# the year is included so that the same month in different years is not added together
group = df_new.groupby(['donation_year', 'donation_month', 'state'])

# total donations received by group, returned as a new dataframe
group['donation_amount'].sum().reset_index(name='total_donations')


# # Task #3
# 
# Pivot the merged dataframe to show the average donation amount by age group and state.

# In[11]:


# create group object
group = df.groupby(['state', 'age'])

# average donation amount for each state and age
group['donation_amount'].mean()


# DNC and RNC listen up!
# 
# Voters in the 45-54 age bracket appear to give the highest median donation amounts. Do you agree? Keep in mind that the synthetic donation amounts were generated independently of age, so any gap between brackets is due to chance.

# In[12]:


# create bins for the (18-24), (25-34), (35-44), (45-54) and (55+) categories
# with right=False each bin includes its left edge and excludes its right edge
bins = [18, 25, 35, 45, 55, float('inf')]

# labels for the five categories
labels = ['18-24', '25-34', '35-44', '45-54', '55+']

# bin it up!
df['age_brackets'] = pd.cut(df['age'], bins=bins, labels=labels, right=False)

# plot it!
df.groupby('age_brackets', observed=True)['donation_amount'].median().plot();


# # Summary
# 
# This tutorial focused on analyzing voter demographics and donation data to assist in election strategies. Using pandas, we merged, grouped, and binned the data to uncover trends in voter donations across different states, age groups, and time periods.
# 
# ### Key Takeaways:
# - Learned how to merge two DataFrames (voter demographics and donations) on a common column (voter_id).
# - Calculated total donations received by each state using the `.groupby()` and `.sum()` functions.
# - Transformed the donation_date column from string to datetime format.
# - Extracted month and year from the donation_date for further analysis.
# - Visualized the median donation amount per age bracket, where the 45-54 age group appeared to have the highest median.
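
# Task #3 asks for a pivot of the average donation amount by age group and state. The cell below is a minimal sketch of one way to build that table with `pivot_table`. The `toy` frame and its values are hypothetical stand-ins for the merged dataframe, and the bin edges are an assumption chosen to match the 18-24, 25-34, 35-44, 45-54 and 55+ brackets named in the tutorial.

# In[ ]:


import pandas as pd

# hypothetical toy data standing in for the merged dataframe
toy = pd.DataFrame({
    'state': ['CA', 'CA', 'AZ', 'AZ', 'CA'],
    'age': [22, 47, 47, 60, 24],
    'donation_amount': [100, 300, 200, 400, 200],
})

# left-closed brackets: [18, 25), [25, 35), [35, 45), [45, 55), [55, inf)
bins = [18, 25, 35, 45, 55, float('inf')]
labels = ['18-24', '25-34', '35-44', '45-54', '55+']
toy['age_brackets'] = pd.cut(toy['age'], bins=bins, labels=labels, right=False)

# rows are age brackets, columns are states, cells are average donation amounts
toy_pivot = toy.pivot_table(index='age_brackets', columns='state',
                            values='donation_amount', aggfunc='mean', observed=True)
toy_pivot


# Cells with no donations for a given bracket and state show up as NaN. Passing `observed=True` leaves out brackets that never occur in the data.
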

# This tutorial was created by HEDARO