#!/usr/bin/env python
# coding: utf-8

# # Election Insights: Uncovering Voter Trends  
# 
# ### Description  
# It's election season, and the Democratic National Committee (**DNC**) and the Republican National Committee (**RNC**) need your help analyzing voter demographics and donation data to inform campaign strategies.
# 
# ### Tasks
# - Merge the voter demographics and donation dataframes on the voter ID column and calculate the total donations received from each state.
# - Transform the 'donation_date' column to datetime format and extract the month and year of each donation. Then, create a new dataframe showing the total donations received each month, grouped by state.
# - Pivot the merged dataframe to show the average donation amount by age group and state.

# In[1]:


# import libraries
import pandas as pd
import numpy as np
import sys

print('Python version ' + sys.version)
print('Pandas version ' + pd.__version__)
print('Numpy version ' + np.__version__)


# # The Data  
# 
# The dataset consists of two CSV files: "voter_demographics" and "donations", containing information on 100,000 voters (age, state, party affiliation) and 50,000 donations (date, amount) made by these voters, respectively. The data is synthetic, representing a fictional scenario, and is designed to mimic real-world voter demographics and donation patterns.
# 
# ### Columns:
# **Voter Demographics:**  
# - **voter_id:** Unique identifier for each voter.
# - **state:** Two-letter abbreviation for the voter's state.
# - **age:** Voter's age.
# - **party_affiliation:** Voter's party affiliation.
# 
# 
# **Donations:**  
# - **voter_id:** Unique identifier for each voter.
# - **donation_date:** Date of donation.
# - **donation_amount:** Amount donated.

# In[2]:


# set the seed
np.random.seed(0)

# voter demographics data
voter_demographics = pd.DataFrame({
    'voter_id': range(100000),
    'state': np.random.choice(['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA'], size=100000),
    'age': np.random.randint(18, 80, size=100000),
    'party_affiliation': np.random.choice(['Democrat', 'Republican', 'Independent'], size=100000)
})

# donation data
donations = pd.DataFrame({
    'voter_id': np.random.choice(range(100000), size=50000),
    'donation_date': pd.date_range('2022-01-01', '2024-11-04', periods=50000),
    'donation_amount': np.random.randint(10, 1000, size=50000)
})

# save dataframes to CSV files
voter_demographics.to_csv('voter_demographics.csv', index=False)
donations.to_csv('donations.csv', index=False)


# We will begin by reading the CSV files into memory.

# In[3]:


# create dataframes
voter_demographics_df = pd.read_csv('voter_demographics.csv')
donations_df = pd.read_csv('donations.csv')


# Let us take a look at the data types.

# In[4]:


voter_demographics_df.info()


# Note that the "donation_date" column is currently read in as a string, so it will need to be converted to a datetime before we can extract the month and year (see Task #2).

# In[5]:


donations_df.info()


# # Task #1  
# 
# Merge the voter demographics and donation dataframes on the voter ID column and calculate the total donations received from each state.
# 
# **Note:** By default, the `.merge` method performs an inner join, so any row whose "voter_id" does not appear in both dataframes is dropped. Here, that means voters who never donated are excluded from the merged result.

# In[6]:


# merge the two dataframes
df = voter_demographics_df.merge(donations_df, on='voter_id')
df.head()
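

# The inner-join behavior can be checked on a tiny example. This is only a sketch: `toy_voters` and `toy_donations` are hypothetical frames made up for illustration, not the tutorial data. Passing `how='left'` with `indicator=True` keeps every voter and adds a `_merge` column that flags the rows with no matching donation.

# In[ ]:


import pandas as pd

# hypothetical toy data: voter 3 never donated, and voter_id 9 has no voter record
toy_voters = pd.DataFrame({'voter_id': [1, 2, 3], 'state': ['AL', 'AK', 'AZ']})
toy_donations = pd.DataFrame({'voter_id': [1, 1, 2, 9], 'donation_amount': [50, 25, 100, 10]})

# inner join (the default): only voter_ids present in both frames survive
toy_inner = toy_voters.merge(toy_donations, on='voter_id')

# left join: every voter is kept; voters without a donation get NaN amounts
toy_left = toy_voters.merge(toy_donations, on='voter_id', how='left', indicator=True)

print(len(toy_inner), len(toy_left))
print(toy_left['_merge'].value_counts())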


# We will make use of the `.groupby` method to calculate the total donations received from each state.
# 
# Looking at the final results, we can see that California contributed the largest total donation amount and Arizona the smallest.

# In[7]:


# create group object
group = df.groupby('state')

# total donations for the group
group['donation_amount'].sum().sort_values(ascending=False)
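

# A state's total depends on both how many donations it made and how large they were. As a sketch on hypothetical toy data (the `toy_merged` frame and its values are made up), named aggregation with `.agg` can report the total, the count, and the average side by side.

# In[ ]:


import pandas as pd

# hypothetical toy data standing in for the merged dataframe
toy_merged = pd.DataFrame({
    'state': ['CA', 'CA', 'AZ', 'AZ', 'AZ'],
    'donation_amount': [500, 300, 100, 200, 150],
})

# one row per state with three summary columns, largest total first
toy_summary = (
    toy_merged.groupby('state')['donation_amount']
    .agg(total='sum', count='count', average='mean')
    .sort_values('total', ascending=False)
)

print(toy_summary)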


# # Task #2  
# 
# Transform the 'donation_date' column to datetime format and extract the month and year of each donation. Then, create a new dataframe showing the total donations received each month, grouped by state.

# In[8]:


# convert to date object
df['donation_date'] = pd.to_datetime(df['donation_date'])

# check to make sure it worked
df.info()
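

# As an alternative to converting after the fact, the dates can be parsed while the file is read. The sketch below uses a small in-memory CSV (`toy_csv`, hypothetical data) instead of the tutorial files. It also shows `errors='coerce'`, which turns unparseable values into `NaT` instead of raising an error.

# In[ ]:


import io
import pandas as pd

# hypothetical CSV content with the same column names as donations.csv
toy_csv = io.StringIO(
    "voter_id,donation_date,donation_amount\n"
    "1,2022-01-01 00:00:00,50\n"
    "2,2023-06-15 12:30:00,75\n"
)

# parse_dates converts the listed columns to datetime while reading
toy_parsed = pd.read_csv(toy_csv, parse_dates=['donation_date'])
print(toy_parsed.dtypes)

# errors='coerce' replaces values that cannot be parsed with NaT
toy_dirty = pd.Series(['2024-01-05', 'not a date'])
toy_clean = pd.to_datetime(toy_dirty, errors='coerce')
print(toy_clean)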


# In[9]:


# create a new dataframe
df_new = df.loc[:,['state','donation_date','donation_amount']]

# create new month and year columns
df_new['donation_month'] = df_new['donation_date'].dt.month
df_new['donation_year'] = df_new['donation_date'].dt.year

df_new.head()


# In[10]:


# create group object (include the year so the same month in different years is not combined)
group = df_new.groupby(['donation_year','donation_month','state'])

# total donations received by group
group['donation_amount'].sum()
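

# Because the donations span several years, grouping on the month number alone would combine, for example, January 2022 with January 2023. One way to keep calendar months separate, sketched here on hypothetical toy data (`toy_monthly` is made up), is to group on a year-month period created with `.dt.to_period('M')`.

# In[ ]:


import pandas as pd

# hypothetical toy data: two Januaries in different years
toy_monthly = pd.DataFrame({
    'state': ['CA', 'CA', 'CA', 'AZ'],
    'donation_date': pd.to_datetime(['2022-01-10', '2023-01-20', '2023-01-25', '2022-01-05']),
    'donation_amount': [100, 200, 50, 75],
})

# a monthly period keeps the year and the month together in one key
toy_monthly['donation_period'] = toy_monthly['donation_date'].dt.to_period('M')

# total donations per calendar month and state
toy_totals = toy_monthly.groupby(['donation_period', 'state'])['donation_amount'].sum()
print(toy_totals)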


# # Task #3  
# 
# Pivot the merged dataframe to show the average donation amount by age group and state.

# In[11]:


# create group object
group = df.groupby(['state','age'])

# average donation amount by group
group['donation_amount'].mean()
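

# The grouped result above is a long Series indexed by state and age. If a wide table is preferred, `pivot_table` can put one key on the rows and the other on the columns. This is a sketch on hypothetical toy data: `toy_pivot_src` and its `age_group` column are made up for illustration.

# In[ ]:


import pandas as pd

# hypothetical toy data with a ready-made age group column
toy_pivot_src = pd.DataFrame({
    'state': ['CA', 'CA', 'AZ', 'AZ'],
    'age_group': ['18-24', '25-34', '18-24', '18-24'],
    'donation_amount': [100, 300, 50, 150],
})

# rows are age groups, columns are states, cells are the average donation
toy_pivot = toy_pivot_src.pivot_table(
    index='age_group',
    columns='state',
    values='donation_amount',
    aggfunc='mean',
)

# combinations with no donations show up as NaN
print(toy_pivot)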


# DNC and RNC listen up!
# 
# Voters in the 45-54 age bracket appear to have the highest median donation amount. Do you agree?

# In[12]:


# create bins for the (18-24), (25-34), (35-44), (45-54) and (55+) categories
# bin edges are right-inclusive by default: (17, 24], (24, 34], (34, 44], (44, 54], (54, inf)
bins = [17, 24, 34, 44, 54, float('inf')]
 
# labels for the five categories
labels = ['18-24', '25-34', '35-44', '45-54', '55+'] 
 
# bin it up!
df['age_brackets'] = pd.cut(df['age'], bins=bins, labels=labels, include_lowest=True)
 
# plot it!
df.groupby('age_brackets', observed=True)['donation_amount'].median().plot();
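

# Bin edges in `pd.cut` are easy to get off by one, because each bin includes its right edge by default. As a sketch (the `toy_` names and sample ages are hypothetical), passing `right=False` makes each bin include its left edge instead, so the edges can be written as the first age of each bracket. Checking the boundary ages confirms that each one lands under the intended label.

# In[ ]:


import pandas as pd

# hypothetical ages chosen to sit on the bracket boundaries
toy_ages = pd.Series([18, 24, 25, 34, 35, 54, 55, 79])

# with right=False the bins are [18, 25), [25, 35), [35, 45), [45, 55), [55, inf)
toy_edges = [18, 25, 35, 45, 55, float('inf')]
toy_labels = ['18-24', '25-34', '35-44', '45-54', '55+']

toy_brackets = pd.cut(toy_ages, bins=toy_edges, labels=toy_labels, right=False)

print(pd.DataFrame({'age': toy_ages, 'bracket': toy_brackets}))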


# # Summary  
# 
# This tutorial focused on analyzing voter demographics and donation data to assist in election strategies. Using Pandas, various tasks were performed to uncover trends in voter donations across different states, age groups, and time periods.
# 
# ### Key Takeaways:
# - Learned how to merge two DataFrames (voter demographics and donations) on a common column (voter_id).
# - Calculated total donations received by each state using the `.groupby()` and `.sum()` methods.
# - Transformed the donation_date column from string to datetime format.
# - Extracted month and year from the donation_date for further analysis.
# - Visualized the median donation amount per age bracket, where the 45-54 age group showed the highest median.

# <p class="text-muted">This tutorial was created by <a href="https://www.hedaro.com" target="_blank"><strong>HEDARO</strong></a></p>