Finding the Best Markets to Advertise In¶

1) Introduction¶

In the present project, we'll assume that we're working for an an e-learning company that offers courses on programming, covering topics such as: web and mobile development, data science, game development, etc. We want to promote our product and we'd like to invest some money in advertisement. Our goal in this project is to find out the two best markets to advertise our product in.

We would like to explore surveys that have been already conducted, instead of organizing one ourselves (which would increase our expenses), and a good candidate is the data from freeCodeCamp's 2017 New Coder Survey. freeCodeCamp is a free e-learning platform that offers courses on web development. Because they run a popular Medium publication (over 400,000 followers), their survey attracted new coders with varying interests (not only web development), which is ideal for the purpose of our analysis.

The survey data is publicly available in this GitHub repository.

2) Preparing the data¶

2.1) Data reading and initial inspection¶

In [1]:

# Importing packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:

# Reading the data
raw_data = pd.read_csv("2017-fCC-New-Coders-Survey-Data.csv")

# Setting pandas so we can inspect every column in the output
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Displaying info of interest
display("Shape of the dataframe:", raw_data.shape)
display(raw_data.head())
#17 and 62
# display(raw_data.iloc[:,17].value_counts())

C:\Users\Alvaro\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3444: DtypeWarning: Columns (17,62) have mixed types.Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)

'Shape of the dataframe:'

(18175, 136)

	Age	BootcampFinish	BootcampLoanYesNo	BootcampName	BootcampRecommend	ChildrenNumber	CityPopulation	CodeEventConferences	CodeEventDjangoGirls	CodeEventFCC	CodeEventGameJam	CodeEventGirlDev	CodeEventHackathons	CodeEventMeetup	CodeEventNodeSchool	CodeEventNone	CodeEventOther	CodeEventRailsBridge	CodeEventRailsGirls	CodeEventStartUpWknd	CodeEventWkdBootcamps	CodeEventWomenCode	CodeEventWorkshops	CommuteTime	CountryCitizen	CountryLive	EmploymentField	EmploymentFieldOther	EmploymentStatus	EmploymentStatusOther	ExpectedEarning	FinanciallySupporting	FirstDevJob	Gender	GenderOther	HasChildren	HasDebt	HasFinancialDependents	HasHighSpdInternet	HasHomeMortgage	HasStudentDebt	HomeMortgageOwe	HoursLearning	ID.x	ID.y	Income	IsEthnicMinority	IsUnderEmployed	JobApplyWhen	JobInterestBackEnd	JobInterestDataEngr	JobInterestDataSci	JobInterestDevOps	JobInterestFrontEnd	JobInterestFullStack	JobInterestGameDev	JobInterestInfoSec	JobInterestMobile	JobInterestOther	JobInterestProjMngr	JobInterestQAEngr	JobInterestUX	JobPref	JobRelocateYesNo	JobRoleInterest	JobWherePref	LanguageAtHome	MaritalStatus	MoneyForLearning	MonthsProgramming	NetworkID	Part1EndTime	Part1StartTime	Part2EndTime	Part2StartTime	PodcastChangeLog	PodcastCodeNewbie	PodcastCodePen	PodcastDevTea	PodcastDotNET	PodcastGiantRobots	PodcastJSAir	PodcastJSJabber	PodcastNone	PodcastOther	PodcastProgThrowdown	PodcastRubyRogues	PodcastSEDaily	PodcastSERadio	PodcastShopTalk	PodcastTalkPython	PodcastTheWebAhead	ResourceCodecademy	ResourceCodeWars	ResourceCoursera	ResourceCSS	ResourceEdX	ResourceEgghead	ResourceFCC	ResourceHackerRank	ResourceKA	ResourceLynda	ResourceMDN	ResourceOdinProj	ResourceOther	ResourcePluralSight	ResourceSkillcrush	ResourceSO	ResourceTreehouse	ResourceUdacity	ResourceUdemy	ResourceW3S	SchoolDegree	SchoolMajor	StudentDebtOwe	YouTubeCodeCourse	YouTubeCodingTrain	YouTubeCodingTut360	YouTubeComputerphile	YouTubeDerekBanas	YouTubeDevTips	YouTubeEngineeredTruth	YouTubeFCC	YouTubeFunFunFunction	YouTubeGoogleDev	YouTubeLearnCode	YouTubeLevelUpTuts	YouTubeMIT	YouTubeMozillaHacks	YouTubeOther	YouTubeSimplilearn	YouTubeTheNewBoston
0	27.0	NaN	NaN	NaN	NaN	NaN	more than 1 million	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	15 to 29 minutes	Canada	Canada	software development and IT	NaN	Employed for wages	NaN	NaN	NaN	NaN	female	NaN	NaN	1.0	0.0	1.0	0.0	0.0	NaN	15.0	02d9465b21e8bd09374b0066fb2d5614	eb78c1c3ac6cd9052aec557065070fbf	NaN	NaN	0.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	start your own business	NaN	NaN	NaN	English	married or domestic partnership	150.0	6.0	6f1fbc6b2b	2017-03-09 00:36:22	2017-03-09 00:32:59	2017-03-09 00:59:46	2017-03-09 00:36:26	NaN	NaN	NaN	1.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.0	NaN	NaN	NaN	NaN	NaN	1.0	NaN	NaN	NaN	1.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.0	1.0	some college credit, no degree	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	34.0	NaN	NaN	NaN	NaN	NaN	less than 100,000	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	United States of America	United States of America	NaN	NaN	Not working but looking for work	NaN	35000.0	NaN	NaN	male	NaN	NaN	1.0	0.0	1.0	0.0	1.0	NaN	10.0	5bfef9ecb211ec4f518cfc1d2a6f3e0c	21db37adb60cdcafadfa7dca1b13b6b1	NaN	0.0	NaN	Within 7 to 12 months	NaN	NaN	NaN	NaN	NaN	1.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	work for a nonprofit	1.0	Full-Stack Web Developer	in an office with other developers	English	single, never married	80.0	6.0	f8f8be6910	2017-03-09 00:37:07	2017-03-09 00:33:26	2017-03-09 00:38:59	2017-03-09 00:37:10	NaN	1.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.0	NaN	NaN	1.0	NaN	NaN	1.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.0	NaN	NaN	1.0	1.0	some college credit, no degree	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	21.0	NaN	NaN	NaN	NaN	NaN	more than 1 million	NaN	NaN	NaN	NaN	NaN	1.0	NaN	1.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	15 to 29 minutes	United States of America	United States of America	software development and IT	NaN	Employed for wages	NaN	70000.0	NaN	NaN	male	NaN	NaN	0.0	0.0	1.0	NaN	NaN	NaN	25.0	14f1863afa9c7de488050b82eb3edd96	21ba173828fbe9e27ccebaf4d5166a55	13000.0	1.0	0.0	Within 7 to 12 months	1.0	NaN	NaN	1.0	1.0	1.0	NaN	NaN	1.0	NaN	NaN	NaN	NaN	work for a medium-sized company	1.0	Front-End Web Developer, Back-End Web Develo...	no preference	Spanish	single, never married	1000.0	5.0	2ed189768e	2017-03-09 00:37:58	2017-03-09 00:33:53	2017-03-09 00:40:14	2017-03-09 00:38:02	1.0	NaN	1.0	NaN	NaN	NaN	NaN	NaN	NaN	Codenewbie	NaN	NaN	NaN	NaN	1.0	NaN	NaN	1.0	NaN	NaN	1.0	NaN	NaN	1.0	NaN	NaN	NaN	1.0	NaN	NaN	NaN	NaN	NaN	NaN	1.0	1.0	NaN	high school diploma or equivalent (GED)	NaN	NaN	NaN	NaN	1.0	NaN	1.0	1.0	NaN	NaN	NaN	NaN	1.0	1.0	NaN	NaN	NaN	NaN	NaN
3	26.0	NaN	NaN	NaN	NaN	NaN	between 100,000 and 1 million	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	I work from home	Brazil	Brazil	software development and IT	NaN	Employed for wages	NaN	40000.0	0.0	NaN	male	NaN	0.0	1.0	1.0	1.0	1.0	0.0	40000.0	14.0	91756eb4dc280062a541c25a3d44cfb0	3be37b558f02daae93a6da10f83f0c77	24000.0	0.0	1.0	Within the next 6 months	1.0	NaN	NaN	NaN	1.0	1.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	work for a medium-sized company	NaN	Front-End Web Developer, Full-Stack Web Deve...	from home	Portuguese	married or domestic partnership	0.0	5.0	dbdc0664d1	2017-03-09 00:40:13	2017-03-09 00:37:45	2017-03-09 00:42:26	2017-03-09 00:40:18	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.0	1.0	NaN	NaN	NaN	1.0	NaN	NaN	NaN	NaN	1.0	NaN	NaN	NaN	NaN	some college credit, no degree	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.0	NaN	1.0	1.0	NaN	NaN	1.0	NaN	NaN	NaN	NaN	NaN
4	20.0	NaN	NaN	NaN	NaN	NaN	between 100,000 and 1 million	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Portugal	Portugal	NaN	NaN	Not working but looking for work	NaN	140000.0	NaN	NaN	female	NaN	NaN	0.0	0.0	1.0	NaN	NaN	NaN	10.0	aa3f061a1949a90b27bef7411ecd193f	d7c56bbf2c7b62096be9db010e86d96d	NaN	0.0	NaN	Within 7 to 12 months	1.0	NaN	NaN	NaN	1.0	1.0	NaN	1.0	1.0	NaN	NaN	NaN	NaN	work for a multinational corporation	1.0	Full-Stack Web Developer, Information Security...	in an office with other developers	Portuguese	single, never married	0.0	24.0	11b0f2d8a9	2017-03-09 00:42:45	2017-03-09 00:39:44	2017-03-09 00:45:42	2017-03-09 00:42:50	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1.0	NaN	NaN	NaN	NaN	bachelor's degree	Information Technology	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

2.2) Representativeness of our data¶

After a first inspection of the data, we will examine whether this data is actually useful to us, taking into account the population of our interest. The "JobRoleInterest" column describes for every participant the role(s) they'd be interested in working, so we'll filter the data to keep only rows where this question was answered, and we'll see its unique values next:

In [3]:

# Keeping only rows with jobs role of interest
filtered_data = raw_data.dropna(subset=["JobRoleInterest"]).copy()
display("New dataframe's shape:", filtered_data.shape)

# Plotting the job roles that are more interesting to respondants
filtered_data["JobRoleInterest"].value_counts(normalize=True, dropna=False)[:10].plot.barh()
plt.xticks(ticks=[0, .025, .05, .075, .1, .125], labels=[0, 2.5, 5, 7.5, 10, 12.5])
plt.gca().invert_yaxis()
plt.title("% of choices for each job role of interest")
plt.show()

"New dataframe's shape:"

(6992, 136)

Looking at the information above, we can conclude that this data clearly shows professional profiles of our interest.

People can be interested in more than one profile at the same time, so we chose the 10 most popular choices.

Also, we have kept a small piece of the original data, because we're only interested in people that we know they can be interested in jobs that our company offers trainings about.

3) Data analysis¶

Two columns are of special interest to us to proceed with our analysis:

"CountryLive", which designates the country of residence of respondants.
"MoneyForLearning", which designates how much mony the respondants have spent on learning code so far, in US dollars.

3.1) Countries with bigger target populations¶

We'll begin by analyzing what countries have more people interested in the topics we offer:

In [4]:

# Generating absolute and relative value counts for countries
freq_table_abs = pd.DataFrame.from_dict(filtered_data["CountryLive"].value_counts().reset_index())
freq_table_rel = pd.DataFrame.from_dict(filtered_data["CountryLive"].value_counts(normalize=True).reset_index())

# Merging both types of value counts
freq_table = freq_table_abs.merge(freq_table_rel, on="index")

# Renaming columns and properly calculating percentages
freq_table.columns = ["Country", "Number", "Percentage"]
freq_table["Percentage"] = round(freq_table["Percentage"] * 100, 1)

display(freq_table[:10])

	Country	Number	Percentage
0	United States of America	3125	45.7
1	India	528	7.7
2	United Kingdom	315	4.6
3	Canada	260	3.8
4	Poland	131	1.9
5	Brazil	129	1.9
6	Germany	125	1.8
7	Australia	112	1.6
8	Russia	102	1.5
9	Ukraine	89	1.3

The table above shows that the USA is, by far, the country that could interest us more - almost *46%* of the respondants live there.

We'll now proceed with the money spent on learning.

3.2) Money spent on learning per month¶

Our company sells subscriptions at a price of $59 per month, and for this reason we're interested in finding out how much money each student spends per month. Therefore, we'll need to divide the "MoneyForLearning" column to the "MonthsProgramming" column.

In [5]:

# Replacing "0 months programming" for 1 month learning, as these are people that just started - to avoid dividing by 0
filtered_data["MonthsProgramming"] = filtered_data["MonthsProgramming"].replace(to_replace=0, value=1)

# Creating a "MoneyPerMonth" column
filtered_data["MoneyPerMonth"] = filtered_data["MoneyForLearning"] / filtered_data["MonthsProgramming"]

# Dropping rows with null values in our new column "MoneyPerMonth" or "CountryLive", as these are fundamental for us
filtered_data.dropna(subset=["MoneyPerMonth", "CountryLive"], inplace=True)

# Generating grouped mean and median for money spent per month, per country
money_spent = filtered_data.groupby("CountryLive")["MoneyPerMonth"].agg(["mean", "median"])

# Displaing the results for the countries that are more interesting to us
display(money_spent.loc[["United States of America", "India" ,"United Kingdom", "Canada"]])

	mean	median
CountryLive
United States of America	227.997996	3.333333
India	135.100982	0.000000
United Kingdom	45.534443	0.000000
Canada	113.510961	0.000000

The table above shows some odd results:

The mean and median values are extremely appart from each other, being the median always close to zero.
Canada has a very low mean value compared to India, given their socio-economical metrics.

We'll search for outliers and distributions to understand this discrepancies:

In [6]:

# Keeping the countries we're interested in
graph_data = filtered_data[filtered_data["CountryLive"].isin(["United States of America", "India" ,"United Kingdom", "Canada"])]

# Plotting box plots of money spent per month for these countries
sns.boxplot(data=graph_data, x="CountryLive", y="MoneyPerMonth")

Out[6]:

<AxesSubplot:xlabel='CountryLive', ylabel='MoneyPerMonth'>

The plot above shows that the data has several outliers, therefore the distributions are completely odd. We'll clean this data to see the actual values and distributions. The value we'll use to filter the data will be 300$ per month, which is actualy a very high price, and see what happens:

In [7]:

# Filtering the data
graph_data = graph_data[graph_data["MoneyPerMonth"] <= 300]

# Displaying grouped statistics
display(graph_data.groupby("CountryLive")["MoneyPerMonth"].agg(["mean", "median"]))

# Plotting "MoneyPerMonth" distributions
sns.boxplot(data=graph_data, x="CountryLive", y="MoneyPerMonth")

	mean	median
CountryLive
Canada	19.655420	0.000000
India	17.955836	0.000000
United Kingdom	21.842034	0.000000
United States of America	25.828829	0.416667

Out[7]:

<AxesSubplot:xlabel='CountryLive', ylabel='MoneyPerMonth'>

Even if we just cleaned the data, we still see several outliers in each distribution. This means that the values are centered in the very low part of the scale.

Still, we can clearly see that *USA* and *United Kingdom* are the countries with higher money per month values.

4) Conclussions¶

The analysis we just conducted shows some interesting results:

The countries with higher target populations are: United States of America (45.7%), India (7.7%), United Kingdom (4.6%) and Canada (3.8%).
The average money per month that people in these countries spend in learning the topics our company offers are: United States of America (26 USD), United Kingdom (22 USD), Canada (20 USD) and India (18 USD).

Therefore, our recommendation for advertisment would be to clearly invest in *USA, and probably India* - even if the average money spent per month is 4 and 2 USD lower than UK and Canada, respectively, there are nearly twice more people compared with those countries.