#!/usr/bin/env python # coding: utf-8 # # Challenge: "Sales Analysis" # # ### Description: # You are a sales analyst for an e-commerce company. You have been tasked with analyzing the sales data to determine which regions and product categories are generating the most revenue. You have a dataset containing information about sales, including the region, product category, and sales amount. # # ### Task: # # Using Pandas, write a program to: # # - Load the dataset into a pandas DataFrame. # - Group the sales data by region and product category, and calculate the total sales amount for each region and product category. # - Calculate the total sales amount for each region and product category as a percentage of the overall sales amount. # - Print the top 3 regions and product categories with their total sales amount and percentage of overall sales. # # **Hint:** You can use the groupby function to group the data by region and product category, and calculate the total sales amount for each group. Then, use the pivot_table function to reshape the data and calculate the percentages. # ### Here is the dataset # ``` # Region,Product Category,Sales Amount # North,Electronics,1000 # North,Clothing,800 # South,Electronics,1200 # South,Electronics,900 # East,Clothing,1100 # East,Electronics,700 # West,Electronics,1000 # West,Electronics,900 # North,Clothing,600 # South,Clothing,1000 # ``` # In[1]: import pandas as pd import sys # In[2]: print('Python version ' + sys.version) print('Pandas version ' + pd.__version__) # In[3]: df = pd.read_clipboard(sep=",") df # In[4]: # group the sales data by region and product category group = df.groupby(['Region','Product Category']) # total sales of group group.sum() # In[5]: # flatten the group agg = group['Sales Amount'].sum().reset_index() agg # In[6]: # calculate total sales amount total_sales = agg['Sales Amount'].sum() # calculate percentage of total sales for each region and product category agg['Percentage of Total Sales'] = (agg['Sales Amount'] / total_sales) * 100 agg # In[7]: # get the top 3 regions agg.sort_values(by='Percentage of Total Sales', ascending=False).head(3) # Here is an alternative solution using pivot tables. # In[8]: # pivot the data to get sales amount by region and product category pivot = pd.pivot_table(df, values='Sales Amount', index='Region', columns='Product Category', aggfunc='sum') pivot # In[9]: # calculate total sales amount total_sales = df['Sales Amount'].sum() total_sales # In[10]: # calculate percentage of total sales for each region and product category agg = (pivot / total_sales) * 100 agg # With the shape of this DataFrame ordering by percentage of total sales is not really possible. # In[11]: # the stack function lets us reshape the data so we can sort by the percentages agg.stack().sort_values(ascending=False).head(3) #

This tutorial was created by HEDARO