Finding the Best Markets to Advertise In

1) Introduction

In the present project, we'll assume that we're working for an an e-learning company that offers courses on programming, covering topics such as: web and mobile development, data science, game development, etc. We want to promote our product and we'd like to invest some money in advertisement. Our goal in this project is to find out the two best markets to advertise our product in.

We would like to explore surveys that have been already conducted, instead of organizing one ourselves (which would increase our expenses), and a good candidate is the data from freeCodeCamp's 2017 New Coder Survey. freeCodeCamp is a free e-learning platform that offers courses on web development. Because they run a popular Medium publication (over 400,000 followers), their survey attracted new coders with varying interests (not only web development), which is ideal for the purpose of our analysis.

The survey data is publicly available in this GitHub repository.

2) Preparing the data

2.1) Data reading and initial inspection

In [1]:
# Importing packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
# Reading the data
raw_data = pd.read_csv("2017-fCC-New-Coders-Survey-Data.csv")

# Setting pandas so we can inspect every column in the output
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Displaying info of interest
display("Shape of the dataframe:", raw_data.shape)
display(raw_data.head())
#17 and 62
# display(raw_data.iloc[:,17].value_counts())
C:\Users\Alvaro\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3444: DtypeWarning: Columns (17,62) have mixed types.Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)
'Shape of the dataframe:'
(18175, 136)
Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName BootcampRecommend ChildrenNumber CityPopulation CodeEventConferences CodeEventDjangoGirls CodeEventFCC CodeEventGameJam CodeEventGirlDev CodeEventHackathons CodeEventMeetup CodeEventNodeSchool CodeEventNone CodeEventOther CodeEventRailsBridge CodeEventRailsGirls CodeEventStartUpWknd CodeEventWkdBootcamps CodeEventWomenCode CodeEventWorkshops CommuteTime CountryCitizen CountryLive EmploymentField EmploymentFieldOther EmploymentStatus EmploymentStatusOther ExpectedEarning FinanciallySupporting FirstDevJob Gender GenderOther HasChildren HasDebt HasFinancialDependents HasHighSpdInternet HasHomeMortgage HasServedInMilitary HasStudentDebt HomeMortgageOwe HoursLearning ID.x ID.y Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev IsUnderEmployed JobApplyWhen JobInterestBackEnd JobInterestDataEngr JobInterestDataSci JobInterestDevOps JobInterestFrontEnd JobInterestFullStack JobInterestGameDev JobInterestInfoSec JobInterestMobile JobInterestOther JobInterestProjMngr JobInterestQAEngr JobInterestUX JobPref JobRelocateYesNo JobRoleInterest JobWherePref LanguageAtHome MaritalStatus MoneyForLearning MonthsProgramming NetworkID Part1EndTime Part1StartTime Part2EndTime Part2StartTime PodcastChangeLog PodcastCodeNewbie PodcastCodePen PodcastDevTea PodcastDotNET PodcastGiantRobots PodcastJSAir PodcastJSJabber PodcastNone PodcastOther PodcastProgThrowdown PodcastRubyRogues PodcastSEDaily PodcastSERadio PodcastShopTalk PodcastTalkPython PodcastTheWebAhead ResourceCodecademy ResourceCodeWars ResourceCoursera ResourceCSS ResourceEdX ResourceEgghead ResourceFCC ResourceHackerRank ResourceKA ResourceLynda ResourceMDN ResourceOdinProj ResourceOther ResourcePluralSight ResourceSkillcrush ResourceSO ResourceTreehouse ResourceUdacity ResourceUdemy ResourceW3S SchoolDegree SchoolMajor StudentDebtOwe YouTubeCodeCourse YouTubeCodingTrain YouTubeCodingTut360 YouTubeComputerphile YouTubeDerekBanas YouTubeDevTips YouTubeEngineeredTruth YouTubeFCC YouTubeFunFunFunction YouTubeGoogleDev YouTubeLearnCode YouTubeLevelUpTuts YouTubeMIT YouTubeMozillaHacks YouTubeOther YouTubeSimplilearn YouTubeTheNewBoston
0 27.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes Canada Canada software development and IT NaN Employed for wages NaN NaN NaN NaN female NaN NaN 1.0 0.0 1.0 0.0 0.0 0.0 NaN 15.0 02d9465b21e8bd09374b0066fb2d5614 eb78c1c3ac6cd9052aec557065070fbf NaN NaN 0.0 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN start your own business NaN NaN NaN English married or domestic partnership 150.0 6.0 6f1fbc6b2b 2017-03-09 00:36:22 2017-03-09 00:32:59 2017-03-09 00:59:46 2017-03-09 00:36:26 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 some college credit, no degree NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 34.0 0.0 NaN NaN NaN NaN NaN less than 100,000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN United States of America United States of America NaN NaN Not working but looking for work NaN 35000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 10.0 5bfef9ecb211ec4f518cfc1d2a6f3e0c 21db37adb60cdcafadfa7dca1b13b6b1 NaN 0.0 0.0 0.0 NaN Within 7 to 12 months NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN work for a nonprofit 1.0 Full-Stack Web Developer in an office with other developers English single, never married 80.0 6.0 f8f8be6910 2017-03-09 00:37:07 2017-03-09 00:33:26 2017-03-09 00:38:59 2017-03-09 00:37:10 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 1.0 some college credit, no degree NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 21.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes United States of America United States of America software development and IT NaN Employed for wages NaN 70000.0 NaN NaN male NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 25.0 14f1863afa9c7de488050b82eb3edd96 21ba173828fbe9e27ccebaf4d5166a55 13000.0 1.0 0.0 0.0 0.0 Within 7 to 12 months 1.0 NaN NaN 1.0 1.0 1.0 NaN NaN 1.0 NaN NaN NaN NaN work for a medium-sized company 1.0 Front-End Web Developer, Back-End Web Develo... no preference Spanish single, never married 1000.0 5.0 2ed189768e 2017-03-09 00:37:58 2017-03-09 00:33:53 2017-03-09 00:40:14 2017-03-09 00:38:02 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN Codenewbie NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN high school diploma or equivalent (GED) NaN NaN NaN NaN 1.0 NaN 1.0 1.0 NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN
3 26.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN I work from home Brazil Brazil software development and IT NaN Employed for wages NaN 40000.0 0.0 NaN male NaN 0.0 1.0 1.0 1.0 1.0 0.0 0.0 40000.0 14.0 91756eb4dc280062a541c25a3d44cfb0 3be37b558f02daae93a6da10f83f0c77 24000.0 0.0 0.0 0.0 1.0 Within the next 6 months 1.0 NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN work for a medium-sized company NaN Front-End Web Developer, Full-Stack Web Deve... from home Portuguese married or domestic partnership 0.0 5.0 dbdc0664d1 2017-03-09 00:40:13 2017-03-09 00:37:45 2017-03-09 00:42:26 2017-03-09 00:40:18 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN 1.0 NaN NaN NaN NaN some college credit, no degree NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN
4 20.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Portugal Portugal NaN NaN Not working but looking for work NaN 140000.0 NaN NaN female NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 10.0 aa3f061a1949a90b27bef7411ecd193f d7c56bbf2c7b62096be9db010e86d96d NaN 0.0 0.0 0.0 NaN Within 7 to 12 months 1.0 NaN NaN NaN 1.0 1.0 NaN 1.0 1.0 NaN NaN NaN NaN work for a multinational corporation 1.0 Full-Stack Web Developer, Information Security... in an office with other developers Portuguese single, never married 0.0 24.0 11b0f2d8a9 2017-03-09 00:42:45 2017-03-09 00:39:44 2017-03-09 00:45:42 2017-03-09 00:42:50 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN bachelor's degree Information Technology NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

2.2) Representativeness of our data

After a first inspection of the data, we will examine whether this data is actually useful to us, taking into account the population of our interest. The "JobRoleInterest" column describes for every participant the role(s) they'd be interested in working, so we'll filter the data to keep only rows where this question was answered, and we'll see its unique values next:

In [3]:
# Keeping only rows with jobs role of interest
filtered_data = raw_data.dropna(subset=["JobRoleInterest"]).copy()
display("New dataframe's shape:", filtered_data.shape)

# Plotting the job roles that are more interesting to respondants
filtered_data["JobRoleInterest"].value_counts(normalize=True, dropna=False)[:10].plot.barh()
plt.xticks(ticks=[0, .025, .05, .075, .1, .125], labels=[0, 2.5, 5, 7.5, 10, 12.5])
plt.gca().invert_yaxis()
plt.title("% of choices for each job role of interest")
plt.show()
"New dataframe's shape:"
(6992, 136)

Looking at the information above, we can conclude that this data clearly shows professional profiles of our interest.

People can be interested in more than one profile at the same time, so we chose the 10 most popular choices.

Also, we have kept a small piece of the original data, because we're only interested in people that we know they can be interested in jobs that our company offers trainings about.

3) Data analysis

Two columns are of special interest to us to proceed with our analysis:

  1. "CountryLive", which designates the country of residence of respondants.
  2. "MoneyForLearning", which designates how much mony the respondants have spent on learning code so far, in US dollars.

3.1) Countries with bigger target populations

We'll begin by analyzing what countries have more people interested in the topics we offer:

In [4]:
# Generating absolute and relative value counts for countries
freq_table_abs = pd.DataFrame.from_dict(filtered_data["CountryLive"].value_counts().reset_index())
freq_table_rel = pd.DataFrame.from_dict(filtered_data["CountryLive"].value_counts(normalize=True).reset_index())

# Merging both types of value counts
freq_table = freq_table_abs.merge(freq_table_rel, on="index")

# Renaming columns and properly calculating percentages
freq_table.columns = ["Country", "Number", "Percentage"]
freq_table["Percentage"] = round(freq_table["Percentage"] * 100, 1)

display(freq_table[:10])
Country Number Percentage
0 United States of America 3125 45.7
1 India 528 7.7
2 United Kingdom 315 4.6
3 Canada 260 3.8
4 Poland 131 1.9
5 Brazil 129 1.9
6 Germany 125 1.8
7 Australia 112 1.6
8 Russia 102 1.5
9 Ukraine 89 1.3

The table above shows that the USA is, by far, the country that could interest us more - almost 46% of the respondants live there.

We'll now proceed with the money spent on learning.

3.2) Money spent on learning per month

Our company sells subscriptions at a price of $59 per month, and for this reason we're interested in finding out how much money each student spends per month. Therefore, we'll need to divide the "MoneyForLearning" column to the "MonthsProgramming" column.

In [5]:
# Replacing "0 months programming" for 1 month learning, as these are people that just started - to avoid dividing by 0
filtered_data["MonthsProgramming"] = filtered_data["MonthsProgramming"].replace(to_replace=0, value=1)

# Creating a "MoneyPerMonth" column
filtered_data["MoneyPerMonth"] = filtered_data["MoneyForLearning"] / filtered_data["MonthsProgramming"]

# Dropping rows with null values in our new column "MoneyPerMonth" or "CountryLive", as these are fundamental for us
filtered_data.dropna(subset=["MoneyPerMonth", "CountryLive"], inplace=True)

# Generating grouped mean and median for money spent per month, per country
money_spent = filtered_data.groupby("CountryLive")["MoneyPerMonth"].agg(["mean", "median"])

# Displaing the results for the countries that are more interesting to us
display(money_spent.loc[["United States of America", "India" ,"United Kingdom", "Canada"]])
mean median
CountryLive
United States of America 227.997996 3.333333
India 135.100982 0.000000
United Kingdom 45.534443 0.000000
Canada 113.510961 0.000000

The table above shows some odd results:

  1. The mean and median values are extremely appart from each other, being the median always close to zero.
  2. Canada has a very low mean value compared to India, given their socio-economical metrics.

We'll search for outliers and distributions to understand this discrepancies:

In [6]:
# Keeping the countries we're interested in
graph_data = filtered_data[filtered_data["CountryLive"].isin(["United States of America", "India" ,"United Kingdom", "Canada"])]

# Plotting box plots of money spent per month for these countries
sns.boxplot(data=graph_data, x="CountryLive", y="MoneyPerMonth")
Out[6]:
<AxesSubplot:xlabel='CountryLive', ylabel='MoneyPerMonth'>

The plot above shows that the data has several outliers, therefore the distributions are completely odd. We'll clean this data to see the actual values and distributions. The value we'll use to filter the data will be 300$ per month, which is actualy a very high price, and see what happens:

In [7]:
# Filtering the data
graph_data = graph_data[graph_data["MoneyPerMonth"] <= 300]

# Displaying grouped statistics
display(graph_data.groupby("CountryLive")["MoneyPerMonth"].agg(["mean", "median"]))

# Plotting "MoneyPerMonth" distributions
sns.boxplot(data=graph_data, x="CountryLive", y="MoneyPerMonth")
mean median
CountryLive
Canada 19.655420 0.000000
India 17.955836 0.000000
United Kingdom 21.842034 0.000000
United States of America 25.828829 0.416667
Out[7]:
<AxesSubplot:xlabel='CountryLive', ylabel='MoneyPerMonth'>

Even if we just cleaned the data, we still see several outliers in each distribution. This means that the values are centered in the very low part of the scale.

Still, we can clearly see that USA and United Kingdom are the countries with higher money per month values.

4) Conclussions

The analysis we just conducted shows some interesting results:

  1. The countries with higher target populations are: United States of America (45.7%), India (7.7%), United Kingdom (4.6%) and Canada (3.8%).
  2. The average money per month that people in these countries spend in learning the topics our company offers are: United States of America (26 USD), United Kingdom (22 USD), Canada (20 USD) and India (18 USD).

Therefore, our recommendation for advertisment would be to clearly invest in ***USA***, and probably ***India*** - even if the average money spent per month is 4 and 2 USD lower than UK and Canada, respectively, there are nearly twice more people compared with those countries.