What is the best market to advertise in?

We're working for an an e-learning company that offers courses on programming. Most of our courses are on web and mobile development, but we also cover many other domains, like data science, game development, etc. We want to promote our product and we'd like to invest some money in advertisement. Our goal in this project is to find out the two best markets to advertise our product in.

To reach our goal, we could organize surveys for a couple of different markets to find out which would the best choices for advertising. This is very costly, however, and it's a good call to explore cheaper options first.

We can try to search existing data that might be relevant for our purpose. One good candidate is the data from freeCodeCamp's 2017 New Coder Survey. freeCodeCamp is a free e-learning platform that offers courses on web development. Because they run a popular Medium publication (over 400,000 followers), their survey attracted new coders with varying interests (not only web development), which is ideal for the purpose of our analysis.

Let's have a look at the survey dataset:

Data Exploration

In [1]:
# Import Pandas
import pandas as pd

# view all columns
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# Read file and view first 5 rows
file = pd.read_csv("2017-fCC-New-Coders-Survey-Data.csv", low_memory=False)
file.head()
Out[1]:
Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName BootcampRecommend ChildrenNumber CityPopulation CodeEventConferences CodeEventDjangoGirls CodeEventFCC CodeEventGameJam CodeEventGirlDev CodeEventHackathons CodeEventMeetup CodeEventNodeSchool CodeEventNone CodeEventOther CodeEventRailsBridge CodeEventRailsGirls CodeEventStartUpWknd CodeEventWkdBootcamps CodeEventWomenCode CodeEventWorkshops CommuteTime CountryCitizen CountryLive EmploymentField EmploymentFieldOther EmploymentStatus EmploymentStatusOther ExpectedEarning FinanciallySupporting FirstDevJob Gender GenderOther HasChildren HasDebt HasFinancialDependents HasHighSpdInternet HasHomeMortgage HasServedInMilitary HasStudentDebt HomeMortgageOwe HoursLearning ID.x ID.y Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev IsUnderEmployed JobApplyWhen JobInterestBackEnd JobInterestDataEngr JobInterestDataSci JobInterestDevOps JobInterestFrontEnd JobInterestFullStack JobInterestGameDev JobInterestInfoSec JobInterestMobile JobInterestOther JobInterestProjMngr JobInterestQAEngr JobInterestUX JobPref JobRelocateYesNo JobRoleInterest JobWherePref LanguageAtHome MaritalStatus MoneyForLearning MonthsProgramming NetworkID Part1EndTime Part1StartTime Part2EndTime Part2StartTime PodcastChangeLog PodcastCodeNewbie PodcastCodePen PodcastDevTea PodcastDotNET PodcastGiantRobots PodcastJSAir PodcastJSJabber PodcastNone PodcastOther PodcastProgThrowdown PodcastRubyRogues PodcastSEDaily PodcastSERadio PodcastShopTalk PodcastTalkPython PodcastTheWebAhead ResourceCodecademy ResourceCodeWars ResourceCoursera ResourceCSS ResourceEdX ResourceEgghead ResourceFCC ResourceHackerRank ResourceKA ResourceLynda ResourceMDN ResourceOdinProj ResourceOther ResourcePluralSight ResourceSkillcrush ResourceSO ResourceTreehouse ResourceUdacity ResourceUdemy ResourceW3S SchoolDegree SchoolMajor StudentDebtOwe YouTubeCodeCourse YouTubeCodingTrain YouTubeCodingTut360 YouTubeComputerphile YouTubeDerekBanas YouTubeDevTips YouTubeEngineeredTruth YouTubeFCC YouTubeFunFunFunction YouTubeGoogleDev YouTubeLearnCode YouTubeLevelUpTuts YouTubeMIT YouTubeMozillaHacks YouTubeOther YouTubeSimplilearn YouTubeTheNewBoston
0 27.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes Canada Canada software development and IT NaN Employed for wages NaN NaN NaN NaN female NaN NaN 1.0 0.0 1.0 0.0 0.0 0.0 NaN 15.0 02d9465b21e8bd09374b0066fb2d5614 eb78c1c3ac6cd9052aec557065070fbf NaN NaN 0.0 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN start your own business NaN NaN NaN English married or domestic partnership 150.0 6.0 6f1fbc6b2b 2017-03-09 00:36:22 2017-03-09 00:32:59 2017-03-09 00:59:46 2017-03-09 00:36:26 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 some college credit, no degree NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 34.0 0.0 NaN NaN NaN NaN NaN less than 100,000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN United States of America United States of America NaN NaN Not working but looking for work NaN 35000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 10.0 5bfef9ecb211ec4f518cfc1d2a6f3e0c 21db37adb60cdcafadfa7dca1b13b6b1 NaN 0.0 0.0 0.0 NaN Within 7 to 12 months NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN work for a nonprofit 1.0 Full-Stack Web Developer in an office with other developers English single, never married 80.0 6.0 f8f8be6910 2017-03-09 00:37:07 2017-03-09 00:33:26 2017-03-09 00:38:59 2017-03-09 00:37:10 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 1.0 some college credit, no degree NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 21.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes United States of America United States of America software development and IT NaN Employed for wages NaN 70000.0 NaN NaN male NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 25.0 14f1863afa9c7de488050b82eb3edd96 21ba173828fbe9e27ccebaf4d5166a55 13000.0 1.0 0.0 0.0 0.0 Within 7 to 12 months 1.0 NaN NaN 1.0 1.0 1.0 NaN NaN 1.0 NaN NaN NaN NaN work for a medium-sized company 1.0 Front-End Web Developer, Back-End Web Develo... no preference Spanish single, never married 1000.0 5.0 2ed189768e 2017-03-09 00:37:58 2017-03-09 00:33:53 2017-03-09 00:40:14 2017-03-09 00:38:02 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN Codenewbie NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN high school diploma or equivalent (GED) NaN NaN NaN NaN 1.0 NaN 1.0 1.0 NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN
3 26.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN I work from home Brazil Brazil software development and IT NaN Employed for wages NaN 40000.0 0.0 NaN male NaN 0.0 1.0 1.0 1.0 1.0 0.0 0.0 40000.0 14.0 91756eb4dc280062a541c25a3d44cfb0 3be37b558f02daae93a6da10f83f0c77 24000.0 0.0 0.0 0.0 1.0 Within the next 6 months 1.0 NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN work for a medium-sized company NaN Front-End Web Developer, Full-Stack Web Deve... from home Portuguese married or domestic partnership 0.0 5.0 dbdc0664d1 2017-03-09 00:40:13 2017-03-09 00:37:45 2017-03-09 00:42:26 2017-03-09 00:40:18 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN 1.0 NaN NaN NaN NaN some college credit, no degree NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN
4 20.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Portugal Portugal NaN NaN Not working but looking for work NaN 140000.0 NaN NaN female NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 10.0 aa3f061a1949a90b27bef7411ecd193f d7c56bbf2c7b62096be9db010e86d96d NaN 0.0 0.0 0.0 NaN Within 7 to 12 months 1.0 NaN NaN NaN 1.0 1.0 NaN 1.0 1.0 NaN NaN NaN NaN work for a multinational corporation 1.0 Full-Stack Web Developer, Information Security... in an office with other developers Portuguese single, never married 0.0 24.0 11b0f2d8a9 2017-03-09 00:42:45 2017-03-09 00:39:44 2017-03-09 00:45:42 2017-03-09 00:42:50 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN bachelor's degree Information Technology NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

The data contains a lot of information that could be useful for us. For example, columns include location, Money for learning, Job preferences and information on where they learn. All of this will help alot but we do not yet know if this sample is actually useful for our analysis. We are working for a company that focuses mainly on web development and mobile development. So, let's make sure that these interests are well represented in the survey:

In [2]:
# Value counts of Job role interests
file["JobRoleInterest"].value_counts(normalize=True)
Out[2]:
Full-Stack Web Developer                                                                                                                                                                                                      0.117706
  Front-End Web Developer                                                                                                                                                                                                     0.064359
  Data Scientist                                                                                                                                                                                                              0.021739
Back-End Web Developer                                                                                                                                                                                                        0.020309
  Mobile Developer                                                                                                                                                                                                            0.016733
                                                                                                                                                                                                                                ...   
Game Developer,   Data Scientist, Full-Stack Web Developer                                                                                                                                                                    0.000143
Back-End Web Developer, Game Developer,   User Experience Designer,   Data Scientist, Full-Stack Web Developer,   Mobile Developer                                                                                            0.000143
  Front-End Web Developer,   Mobile Developer,   Quality Assurance Engineer, Data Engineer, Full-Stack Web Developer, Game Developer,   Product Manager,   DevOps / SysAdmin, Information Security, Back-End Web Developer    0.000143
Full-Stack Web Developer, Back-End Web Developer,   Front-End Web Developer, Information Security, Data Engineer,   Data Scientist                                                                                            0.000143
Information Security, Back-End Web Developer,   User Experience Designer,   Front-End Web Developer,   Mobile Developer, Data Engineer                                                                                        0.000143
Name: JobRoleInterest, Length: 3213, dtype: float64

Just from looking through the most popular responses in the 'JobRoleInterest' column doesn't help too much. We can see that the most popular repsonses seem to involve Web development but responders can give multiple responses which means we have 3212 unique responses. Does this indicate that most are unsure of what they are interested in learning? let's take a look.

In [3]:
# Import matplotlib
import matplotlib.pyplot as plt


# Find numbers with 1 or more interests
multiple_interests = 0
one_interest = 0

for i in file["JobRoleInterest"].dropna():
        i = i.split(",")
        if len(i) > 1:
            multiple_interests += 1
        else:
            one_interest += 1
            
# convert to %            
percent_multiple_interests = (multiple_interests / len(file["JobRoleInterest"].dropna())) * 100
percent_one_interest = (one_interest / len(file["JobRoleInterest"].dropna())) * 100   

# Print Values
print("Multiple Interests: {}".format(percent_multiple_interests))
print("percent_one_interest: {}".format(percent_one_interest))

# Show in Chart
plt.bar(['Multiple Interests', 'One Interest'], [percent_multiple_interests, percent_one_interest])
plt.title("Majority have more than 1 interest")
plt.ylabel("Percentage")
plt.show()
Multiple Interests: 68.34954233409611
percent_one_interest: 31.650457665903893

68% of responders gave more than response which indicates the majority are unsure of what they want to learn. Now remember that the company we are working for offers many different courses so this isnt actually a problem. With may dfferent courses they can appeal to those unsure what they want to learn.

The main focus of the company, however, is on web development and mobile development. So, what % are interested in at least one of those?

In [4]:
# Find munber intersted in web or mobile
are_interested = 0
for i in file["JobRoleInterest"].dropna():
        if 'Web' in i or 'Mobile' in i:
            are_interested += 1

# Convert to %
percent_interested = (are_interested / len(file["JobRoleInterest"].dropna())) * 100
percent_not_interested = 100 - percent_interested

# Print
print('Interested: {}'.format(percent_interested))
print('Not Interested: {}'.format(percent_not_interested))

# Show in Bar Chart
plt.bar(['Interested', 'Not Interested'], [percent_interested, percent_not_interested])
plt.title("Majority are interested in web or mobile")
plt.ylabel("Percentage")
plt.show()
Interested: 86.29862700228833
Not Interested: 13.701372997711672

Great, over 86% of responders are interested in at least one. Now that we know the survey is representative of our population interest we can move on. Let's look through where most of our potential customers live.

Where do they live?

In [5]:
# Remove null columns
file_clean = file[file['JobRoleInterest'].notnull()].copy()
# Show proportion from each country
file_clean['CountryLive'].value_counts(normalize=True) *100
Out[5]:
United States of America         45.700497
India                             7.721556
United Kingdom                    4.606610
Canada                            3.802281
Poland                            1.915765
Brazil                            1.886517
Germany                           1.828020
Australia                         1.637906
Russia                            1.491664
Ukraine                           1.301550
Nigeria                           1.228429
Spain                             1.126060
France                            1.096812
Romania                           1.038315
Netherlands (Holland, Europe)     0.950570
Italy                             0.906698
Serbia                            0.760456
Philippines                       0.760456
Greece                            0.672711
Ireland                           0.628839
South Africa                      0.570342
Mexico                            0.541094
Turkey                            0.526470
Singapore                         0.497221
Hungary                           0.497221
New Zealand                       0.482597
Croatia                           0.467973
Argentina                         0.467973
Pakistan                          0.453349
Indonesia                         0.453349
Norway                            0.453349
Sweden                            0.453349
Denmark                           0.438725
Israel                            0.424101
Egypt                             0.424101
Finland                           0.424101
Portugal                          0.409476
China                             0.409476
Vietnam                           0.409476
Malaysia                          0.409476
Czech Republic                    0.380228
Kenya                             0.380228
Japan                             0.350980
Bangladesh                        0.336356
Lithuania                         0.336356
Great Britain                     0.307107
Belarus                           0.292483
Bosnia & Herzegovina              0.292483
Belgium                           0.277859
United Arab Emirates              0.277859
Austria                           0.248611
Nepal                             0.248611
Korea South                       0.248611
Venezuela                         0.233987
Colombia                          0.233987
Bulgaria                          0.219362
Republic of Serbia                0.204738
Taiwan                            0.204738
Switzerland                       0.204738
Thailand                          0.190114
Latvia                            0.190114
Ghana                             0.175490
Hong Kong                         0.160866
Kazakhstan                        0.160866
Sri Lanka                         0.146242
Macedonia                         0.146242
Morocco                           0.146242
Slovakia                          0.131617
Slovenia                          0.131617
Saudi Arabia                      0.131617
Jamaica                           0.131617
Estonia                           0.116993
Peru                              0.116993
Algeria                           0.116993
Dominican Republic                0.116993
Costa Rica                        0.102369
Puerto Rico                       0.102369
Virgin Islands (USA)              0.087745
Luxembourg                        0.087745
Chile                             0.087745
Albania                           0.087745
Iran                              0.073121
Tunisia                           0.073121
Uruguay                           0.073121
Azerbaijan                        0.073121
Zimbabwe                          0.058497
Afghanistan                       0.058497
Georgia                           0.058497
Cambodia                          0.058497
Iceland                           0.043872
Uganda                            0.043872
Senegal                           0.043872
Netherland Antilles               0.043872
Paraguay                          0.043872
Niger                             0.043872
Uzbekistan                        0.043872
Lebanon                           0.029248
Iraq                              0.029248
Haiti                             0.029248
Ecuador                           0.029248
Mauritius                         0.029248
Moldova                           0.029248
Guam                              0.029248
Honduras                          0.029248
Bahrain                           0.029248
Cyprus                            0.029248
Nambia                            0.014624
Qatar                             0.014624
Papua New Guinea                  0.014624
Aruba                             0.014624
Trinidad & Tobago                 0.014624
Angola                            0.014624
Yemen                             0.014624
Somalia                           0.014624
Anguilla                          0.014624
Turkmenistan                      0.014624
Rwanda                            0.014624
Jordan                            0.014624
Botswana                          0.014624
Samoa                             0.014624
Kyrgyzstan                        0.014624
Mozambique                        0.014624
Liberia                           0.014624
Gambia                            0.014624
Gibraltar                         0.014624
Myanmar                           0.014624
Nicaragua                         0.014624
Sudan                             0.014624
Vanuatu                           0.014624
Panama                            0.014624
Channel Islands                   0.014624
Cameroon                          0.014624
Guadeloupe                        0.014624
Guatemala                         0.014624
Cuba                              0.014624
Cayman Islands                    0.014624
Bolivia                           0.014624
Name: CountryLive, dtype: float64

By far the the highest proportion come from the United States. In second place is India. I am fairly sure that with such a large number of potential customers that the United States is going to be 1 of the 2 markets I suggest advertising in. However, before deciding on India as the second choice we need to also check how much money each customer is able to spend.

I am going to narrow things down to the top 4 countries in the chart above. These are 4 countries with English as an official language(the programming courses are in English) and are also a fair bit ahead of the 5th most common country.

How much do they spend?

The subscription for a our programming cours is 59$ per month so lets find out how much on average each customer can spend by dividing the money for learning column with the months programming column. I will then use this number to find the average spend of each potential customer from our top 4 countries. This should give us a good idea which 2 markets to invest in.

In [6]:
# Create money spent column
programming = file_clean["MonthsProgramming"].replace(0, 1, inplace = True)
file_clean['Money_Spent'] = round(file_clean['MoneyForLearning'] / file_clean["MonthsProgramming"], 2)


# Remove null values
file_clean = file_clean[file_clean['Money_Spent'].notnull()].copy()
file_clean = file_clean[file_clean['CountryLive'].notnull()].copy()

# Filter top 4 countries
top_four = ["United States of America", "United Kingdom", "Canada", "India"]
file_clean_four = file_clean[(file_clean["CountryLive"] == "United States of America") | (file_clean["CountryLive"] == "United Kingdom")
                             | (file_clean["CountryLive"] == "Canada") | (file_clean["CountryLive"] == "India")].copy()

# Show mean per country
money_per_region = file_clean_four.groupby('CountryLive')["Money_Spent"].mean()
money_per_region
Out[6]:
CountryLive
Canada                      113.510958
India                       135.101102
United Kingdom               45.534337
United States of America    227.998023
Name: Money_Spent, dtype: float64

I must admit these results are a slght suprise. I expected the average spend in India to be the lowest because of the average income in each country. Before suggesting USA and India as the 2 markets I will just check to make sure that the results are not being influenced by extreme outliers. Let's check a box plot for each country:

In [7]:
# Import module
import seaborn as sns

# Plot box plot
ax = sns.boxplot(x = 'CountryLive', y = 'Money_Spent', data = file_clean_four)
ax.set_title('Average money spent per person in each Country')
ax.set_xticklabels(labels = ['USA', 'UK', 'India', 'Canada'],  rotation=90)
ax.tick_params(left=False, bottom=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

It's clear we have outliers affecting the results. Someone in America is spending 80,000 a month! I am going to use z score to remove some of the extreme outliers from each country. Usually, anything with a z score above 3 would be removed. However, I do expect quite a lot of variance in the data because of the difference in spending between learners who study for free online and thos who pay for expensive coding bootcamps. I will therefore remove any that have a z score above 10.

In [8]:
# Calculate mean and std per country
grouped_mean = file_clean_four.groupby('CountryLive')["Money_Spent"].transform("mean")
grouped_std = file_clean_four.groupby('CountryLive')["Money_Spent"].transform("std")
file_clean_four["grouped_std"] = grouped_std
file_clean_four["grouped_mean"] = grouped_mean

# Create x score column
file_clean_four["z_score"] = (file_clean_four['Money_Spent'] - file_clean_four["grouped_mean"]) / file_clean_four["grouped_std"]
# Filter out high z score
file_clean_four = file_clean_four[file_clean_four["z_score"] < 10]

# Show new mean
money_per_region = file_clean_four.groupby('CountryLive')["Money_Spent"].mean()
money_per_region
Out[8]:
CountryLive
Canada                       93.065397
India                       113.748506
United Kingdom               45.534337
United States of America    183.800136
Name: Money_Spent, dtype: float64
In [9]:
# Plot box plot
ax = sns.boxplot(x = 'CountryLive', y = 'Money_Spent', data = file_clean_four)
ax.set_title('Average money spent per person in each Country')
ax.set_xticklabels(labels = ['USA', 'UK', 'India', 'Canada'],  rotation=90)
ax.tick_params(left=False, bottom=False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

These boxplots look better but we still have a few outliers in the USA and India columns. Let's take a closer look:

In [10]:
# Show USA outliers
usa_outliers = file_clean_four[(file_clean_four["CountryLive"] == "United States of America") & (file_clean_four["Money_Spent"] > 7500)]
usa_outliers
Out[10]:
Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName BootcampRecommend ChildrenNumber CityPopulation CodeEventConferences CodeEventDjangoGirls CodeEventFCC CodeEventGameJam CodeEventGirlDev CodeEventHackathons CodeEventMeetup CodeEventNodeSchool CodeEventNone CodeEventOther CodeEventRailsBridge CodeEventRailsGirls CodeEventStartUpWknd CodeEventWkdBootcamps CodeEventWomenCode CodeEventWorkshops CommuteTime CountryCitizen CountryLive EmploymentField EmploymentFieldOther EmploymentStatus EmploymentStatusOther ExpectedEarning FinanciallySupporting FirstDevJob Gender GenderOther HasChildren HasDebt HasFinancialDependents HasHighSpdInternet HasHomeMortgage HasServedInMilitary HasStudentDebt HomeMortgageOwe HoursLearning ID.x ID.y Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev IsUnderEmployed JobApplyWhen JobInterestBackEnd JobInterestDataEngr JobInterestDataSci JobInterestDevOps JobInterestFrontEnd JobInterestFullStack JobInterestGameDev JobInterestInfoSec JobInterestMobile JobInterestOther JobInterestProjMngr JobInterestQAEngr JobInterestUX JobPref JobRelocateYesNo JobRoleInterest JobWherePref LanguageAtHome MaritalStatus MoneyForLearning MonthsProgramming NetworkID Part1EndTime Part1StartTime Part2EndTime Part2StartTime PodcastChangeLog PodcastCodeNewbie PodcastCodePen PodcastDevTea PodcastDotNET PodcastGiantRobots PodcastJSAir PodcastJSJabber PodcastNone PodcastOther PodcastProgThrowdown PodcastRubyRogues PodcastSEDaily PodcastSERadio PodcastShopTalk PodcastTalkPython PodcastTheWebAhead ResourceCodecademy ResourceCodeWars ResourceCoursera ResourceCSS ResourceEdX ResourceEgghead ResourceFCC ResourceHackerRank ResourceKA ResourceLynda ResourceMDN ResourceOdinProj ResourceOther ResourcePluralSight ResourceSkillcrush ResourceSO ResourceTreehouse ResourceUdacity ResourceUdemy ResourceW3S SchoolDegree SchoolMajor StudentDebtOwe YouTubeCodeCourse YouTubeCodingTrain YouTubeCodingTut360 YouTubeComputerphile YouTubeDerekBanas YouTubeDevTips YouTubeEngineeredTruth YouTubeFCC YouTubeFunFunFunction YouTubeGoogleDev YouTubeLearnCode YouTubeLevelUpTuts YouTubeMIT YouTubeMozillaHacks YouTubeOther YouTubeSimplilearn YouTubeTheNewBoston Money_Spent grouped_std grouped_mean z_score
718 26.0 1.0 0.0 0.0 The Coding Boot Camp at UCLA Extension 1.0 NaN more than 1 million 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes United States of America United States of America architecture or physical engineering NaN Employed for wages NaN 50000.0 NaN NaN male NaN NaN 0.0 0.0 0.0 NaN 0.0 NaN NaN 35.0 796ae14c2acdee36eebc250a252abdaf d9e44d73057fa5d322a071adc744bf07 44500.0 0.0 0.0 0.0 1.0 Within the next 6 months 1.0 NaN NaN NaN 1.0 1.0 NaN NaN 1.0 NaN NaN NaN 1.0 work for a startup 1.0 User Experience Designer, Full-Stack Web Dev... in an office with other developers English single, never married 8000.0 1.0 50dab3f716 2017-03-09 21:26:35 2017-03-09 21:21:58 2017-03-09 21:29:10 2017-03-09 21:26:39 NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN bachelor's degree Architecture NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 8000.00 1940.245622 227.998023 4.005679
3184 34.0 1.0 1.0 0.0 We Can Code IT 1.0 NaN more than 1 million NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN Less than 15 minutes NaN United States of America software development and IT NaN Employed for wages NaN 60000.0 NaN NaN male NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 10.0 5d4889491d9d25a255e57fd1c0022458 585e8f8b9a838ef1abbe8c6f1891c048 40000.0 0.0 0.0 0.0 0.0 I haven't decided NaN 1.0 1.0 1.0 NaN NaN NaN 1.0 NaN NaN NaN 1.0 1.0 work for a medium-sized company 0.0 Quality Assurance Engineer, DevOps / SysAd... in an office with other developers English single, never married 9000.0 1.0 e7bebaabd4 2017-03-11 23:34:16 2017-03-11 23:31:17 2017-03-11 23:36:02 2017-03-11 23:34:21 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 1.0 1.0 some college credit, no degree NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 9000.00 1940.245622 227.998023 4.521078
3930 31.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN United States of America United States of America NaN NaN Not working and not looking for work NaN 100000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 50.0 e1d790033545934fbe5bb5b60e368cd9 7cf1e41682462c42ce48029abf77d43c NaN 1.0 0.0 0.0 NaN Within the next 6 months 1.0 NaN NaN 1.0 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN work for a startup 1.0 DevOps / SysAdmin, Front-End Web Developer... no preference English married or domestic partnership 65000.0 6.0 75759e5a1c 2017-03-13 10:06:46 2017-03-13 09:56:13 2017-03-13 10:10:00 2017-03-13 10:06:50 NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN 1.0 NaN reactivex.io/learnrx/ & jafar husain NaN NaN 1.0 NaN NaN NaN NaN bachelor's degree Biology 40000.0 NaN NaN NaN NaN NaN NaN 1.0 1.0 1.0 1.0 1.0 1.0 1.0 NaN various conf presentations NaN NaN 10833.33 1940.245622 227.998023 5.465974
6805 46.0 1.0 1.0 1.0 Sabio.la 0.0 NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN United States of America United States of America NaN NaN Not working but looking for work NaN 70000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 45.0 69096aacf4245694303cf8f7ce68a63f 4c56f82a348836e76dd90d18a3d5ed88 NaN 1.0 0.0 0.0 NaN Within the next 6 months NaN 1.0 1.0 NaN NaN 1.0 1.0 NaN NaN NaN 1.0 NaN NaN work for a multinational corporation 1.0 Full-Stack Web Developer, Game Developer, Pr... no preference English married or domestic partnership 15000.0 1.0 53d13b58e9 2017-03-21 20:13:08 2017-03-21 20:10:25 2017-03-21 20:14:36 2017-03-21 20:13:11 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 1.0 1.0 1.0 bachelor's degree Business Administration and Management 45000.0 NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 15000.00 1940.245622 227.998023 7.613470
7198 32.0 0.0 NaN NaN NaN NaN NaN more than 1 million 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes United States of America United States of America education NaN Employed for wages NaN 55000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 4.0 cb2754165344e6be79da8a4c76bf3917 272219fbd28a3a7562cb1d778e482e1e NaN 1.0 0.0 0.0 0.0 I'm already applying 1.0 NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN work for a multinational corporation 0.0 Full-Stack Web Developer, Back-End Web Developer no preference Spanish single, never married 70000.0 5.0 439a4adaf6 2017-03-23 01:37:46 2017-03-23 01:35:01 2017-03-23 01:39:37 2017-03-23 01:37:49 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 1.0 NaN NaN 1.0 NaN 1.0 NaN 1.0 NaN NaN NaN NaN 1.0 NaN 1.0 NaN 1.0 professional degree (MBA, MD, JD, etc.) Computer Science NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 1.0 1.0 NaN NaN NaN NaN NaN 14000.00 1940.245622 227.998023 7.098071
9778 33.0 1.0 0.0 1.0 Grand Circus 1.0 NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes United States of America United States of America education NaN Employed for wages NaN 55000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 40.0 7a62790f6ded15e26d5f429b8a4d1095 98eeee1aa81ba70b2ab288bf4b63d703 20000.0 0.0 0.0 0.0 1.0 Within the next 6 months 1.0 1.0 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN work for a medium-sized company NaN Full-Stack Web Developer, Data Engineer, Qua... from home English single, never married 8000.0 1.0 ea80a3b15e 2017-04-05 19:48:12 2017-04-05 19:40:19 2017-04-05 19:49:44 2017-04-05 19:49:03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 1.0 NaN NaN master's degree (non-professional) Chemical Engineering 45000.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 8000.00 1940.245622 227.998023 4.005679
16650 29.0 0.0 NaN NaN NaN NaN 2.0 more than 1 million NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN United States of America United States of America NaN NaN Not working but looking for work NaN NaN 1.0 NaN male NaN 1.0 1.0 1.0 1.0 1.0 0.0 1.0 400000.0 40.0 e1925d408c973b91cf3e9a9285238796 7e9e3c31a3dc2cafe3a09269398c4de8 NaN 1.0 1.0 0.0 NaN I'm already applying 1.0 1.0 NaN NaN 1.0 1.0 1.0 NaN NaN NaN 1.0 NaN NaN work for a multinational corporation 1.0 Product Manager, Data Engineer, Full-Stack W... in an office with other developers English married or domestic partnership 200000.0 12.0 1a45f4a3ef 2017-03-14 02:42:57 2017-03-14 02:40:10 2017-03-14 02:45:55 2017-03-14 02:43:05 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 associate's degree Computer Programming 30000.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 16666.67 1940.245622 227.998023 8.472470
16997 27.0 0.0 NaN NaN NaN NaN 1.0 more than 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes United States of America United States of America health care NaN Employed for wages NaN 60000.0 0.0 NaN female NaN 1.0 1.0 1.0 1.0 0.0 0.0 1.0 NaN 12.0 624914ce07c296c866c9e16a14dc01c7 6384a1e576caf4b6b9339fe496a51f1f 40000.0 1.0 0.0 0.0 0.0 Within 7 to 12 months NaN NaN NaN NaN 1.0 1.0 1.0 NaN 1.0 NaN 1.0 NaN 1.0 work for a medium-sized company 1.0 Mobile Developer, Game Developer, User Exp... in an office with other developers English single, never married 12500.0 1.0 ad1a21217c 2017-03-20 05:43:28 2017-03-20 05:40:08 2017-03-20 05:45:28 2017-03-20 05:43:32 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 1.0 some college credit, no degree NaN 12500.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 12500.00 1940.245622 227.998023 6.324973
17231 50.0 0.0 NaN NaN NaN NaN 2.0 less than 100,000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN Kenya United States of America NaN NaN Not working but looking for work NaN 40000.0 0.0 NaN female NaN 1.0 0.0 1.0 1.0 NaN 0.0 NaN NaN 1.0 d4bc6ae775b20816fcd41048ef75417c 606749cd07b124234ab6dff81b324c02 NaN 1.0 0.0 0.0 NaN Within the next 6 months NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN work for a nonprofit 0.0 Front-End Web Developer in an office with other developers English married or domestic partnership 30000.0 2.0 38c1b478d0 2017-03-24 18:48:23 2017-03-24 18:46:01 2017-03-24 18:51:20 2017-03-24 18:48:27 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN bachelor's degree Computer Programming NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15000.00 1940.245622 227.998023 7.613470

Of the outliers here 5 of the 9 didn't go to bootcamp. How else would they have spent so much? (University tuition is not included). They can be removed. The other 4 did attend bootcamp but the data shows that they been programming for no more than three months when they completed the survey. They most likely paid a large sum of money for a bootcamp that was going to last for several months, so the amount of money spent per month is unrealistic and should be significantly lower (because they probably didn't spend anything for the next couple of months after the survey). As a consequence, I'll remove all outliers.

In [11]:
# Show India outliers
india_outliers = file_clean_four[(file_clean_four["CountryLive"] == "India") & (file_clean_four["Money_Spent"] > 2500)]
india_outliers
Out[11]:
Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName BootcampRecommend ChildrenNumber CityPopulation CodeEventConferences CodeEventDjangoGirls CodeEventFCC CodeEventGameJam CodeEventGirlDev CodeEventHackathons CodeEventMeetup CodeEventNodeSchool CodeEventNone CodeEventOther CodeEventRailsBridge CodeEventRailsGirls CodeEventStartUpWknd CodeEventWkdBootcamps CodeEventWomenCode CodeEventWorkshops CommuteTime CountryCitizen CountryLive EmploymentField EmploymentFieldOther EmploymentStatus EmploymentStatusOther ExpectedEarning FinanciallySupporting FirstDevJob Gender GenderOther HasChildren HasDebt HasFinancialDependents HasHighSpdInternet HasHomeMortgage HasServedInMilitary HasStudentDebt HomeMortgageOwe HoursLearning ID.x ID.y Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev IsUnderEmployed JobApplyWhen JobInterestBackEnd JobInterestDataEngr JobInterestDataSci JobInterestDevOps JobInterestFrontEnd JobInterestFullStack JobInterestGameDev JobInterestInfoSec JobInterestMobile JobInterestOther JobInterestProjMngr JobInterestQAEngr JobInterestUX JobPref JobRelocateYesNo JobRoleInterest JobWherePref LanguageAtHome MaritalStatus MoneyForLearning MonthsProgramming NetworkID Part1EndTime Part1StartTime Part2EndTime Part2StartTime PodcastChangeLog PodcastCodeNewbie PodcastCodePen PodcastDevTea PodcastDotNET PodcastGiantRobots PodcastJSAir PodcastJSJabber PodcastNone PodcastOther PodcastProgThrowdown PodcastRubyRogues PodcastSEDaily PodcastSERadio PodcastShopTalk PodcastTalkPython PodcastTheWebAhead ResourceCodecademy ResourceCodeWars ResourceCoursera ResourceCSS ResourceEdX ResourceEgghead ResourceFCC ResourceHackerRank ResourceKA ResourceLynda ResourceMDN ResourceOdinProj ResourceOther ResourcePluralSight ResourceSkillcrush ResourceSO ResourceTreehouse ResourceUdacity ResourceUdemy ResourceW3S SchoolDegree SchoolMajor StudentDebtOwe YouTubeCodeCourse YouTubeCodingTrain YouTubeCodingTut360 YouTubeComputerphile YouTubeDerekBanas YouTubeDevTips YouTubeEngineeredTruth YouTubeFCC YouTubeFunFunFunction YouTubeGoogleDev YouTubeLearnCode YouTubeLevelUpTuts YouTubeMIT YouTubeMozillaHacks YouTubeOther YouTubeSimplilearn YouTubeTheNewBoston Money_Spent grouped_std grouped_mean z_score
1728 24.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN India India NaN NaN A stay-at-home parent or homemaker NaN 70000.0 NaN NaN male NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 30.0 d964ec629fd6d85a5bf27f7339f4fa6d 950a8cf9cef1ae6a15da470e572b1b7a NaN 0.0 0.0 0.0 NaN Within the next 6 months 1.0 NaN NaN NaN 1.0 NaN NaN NaN 1.0 NaN 1.0 NaN 1.0 work for a startup 1.0 User Experience Designer, Mobile Developer... in an office with other developers Bengali single, never married 20000.0 4.0 38d312a990 2017-03-10 10:22:34 2017-03-10 10:17:42 2017-03-10 10:24:38 2017-03-10 10:22:40 NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 bachelor's degree Computer Programming NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 5000.00 692.960385 135.101102 7.020457
1755 20.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN 1.0 NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN India India NaN NaN Not working and not looking for work NaN 100000.0 NaN NaN male NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 10.0 811bf953ef546460f5436fcf2baa532d 81e2a4cab0543e14746c4a20ffdae17c NaN 0.0 0.0 0.0 NaN I haven't decided NaN 1.0 NaN 1.0 1.0 1.0 NaN 1.0 NaN NaN NaN NaN NaN work for a multinational corporation 1.0 Information Security, Full-Stack Web Developer... no preference Hindi single, never married 50000.0 15.0 4611a76b60 2017-03-10 10:48:31 2017-03-10 10:42:29 2017-03-10 10:51:37 2017-03-10 10:48:38 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN 1.0 1.0 1.0 NaN 1.0 NaN 1.0 NaN 1.0 1.0 NaN NaN NaN 1.0 NaN NaN NaN 1.0 1.0 1.0 bachelor's degree Computer Science NaN NaN NaN 1.0 NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN 3333.33 692.960385 135.101102 4.615313
7989 28.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 15 to 29 minutes India India software development and IT NaN Employed for wages NaN 500000.0 1.0 NaN male NaN 0.0 1.0 1.0 1.0 0.0 0.0 1.0 NaN 20.0 a6a5597bbbc2c282386d6675641b744a da7bbb54a8b26a379707be56b6c51e65 300000.0 0.0 0.0 0.0 0.0 more than 12 months from now 1.0 NaN NaN NaN 1.0 1.0 1.0 NaN NaN NaN NaN NaN 1.0 work for a multinational corporation 1.0 User Experience Designer, Back-End Web Devel... in an office with other developers Marathi married or domestic partnership 5000.0 1.0 c47a447b5d 2017-03-26 14:06:48 2017-03-26 14:02:41 2017-03-26 14:13:13 2017-03-26 14:07:17 NaN NaN NaN NaN NaN NaN NaN NaN NaN Not listened to anything yet. NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN bachelor's degree Aerospace and Aeronautical Engineering 2500.0 NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 5000.00 692.960385 135.101102 7.020457
8126 22.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN India India NaN NaN Not working but looking for work NaN 80000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 80.0 69e8ab9126baee49f66e3577aea7fd3c 9f08092e82f709e63847ba88841247c0 NaN 0.0 0.0 0.0 NaN I'm already applying 1.0 NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN work for a startup 1.0 Back-End Web Developer, Full-Stack Web Develop... in an office with other developers Malayalam single, never married 5000.0 1.0 0d3d1762a4 2017-03-27 07:10:17 2017-03-27 07:05:23 2017-03-27 07:12:21 2017-03-27 07:10:22 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN bachelor's degree Electrical and Electronics Engineering 10000.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN 1.0 5000.00 692.960385 135.101102 7.020457
15587 27.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes India India software development and IT NaN Employed for wages NaN 65000.0 0.0 NaN male NaN 0.0 1.0 1.0 1.0 0.0 0.0 1.0 NaN 36.0 5a7394f24292cb82b72adb702886543a 8bc7997217d4a57b22242471cc8d89ef 60000.0 0.0 0.0 0.0 1.0 I haven't decided NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN work for a startup NaN Full-Stack Web Developer, Data Scientist from home Hindi single, never married 100000.0 24.0 8af0c2b6da 2017-04-03 09:43:53 2017-04-03 09:39:38 2017-04-03 09:54:39 2017-04-03 09:43:57 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN 1.0 bachelor's degree Communications 25000.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN 1.0 NaN NaN NaN NaN 4166.67 692.960385 135.101102 5.817892

Meanwhile, none of the Indian outliers attended bootcamp and can therefore all be excluded.

In [12]:
# Drop outliers
file_clean_four = file_clean_four.drop(usa_outliers.index)
file_clean_four = file_clean_four.drop(india_outliers.index)

# Show new mean values per country
money_per_region = file_clean_four.groupby('CountryLive')["Money_Spent"].mean()
money_per_region
Out[12]:
CountryLive
Canada                       93.065397
India                        65.758884
United Kingdom               45.534337
United States of America    147.063039
Name: Money_Spent, dtype: float64

Removing all the outliers from the box plot leaves us with the following average money spent shown above. Before giving the final verdict I am going to make a couple more charts. The first will show the total number of potential customers in each country who spend enough for a monthly subscription. The second will show the total money spent by all learners from each country. This will give a good idea of the total potnetial customers and revenue available in each market.

Market Potential

In [13]:
# Number who spend above 59 dollars in each country
india_spend_enough = len(file_clean_four[(file_clean_four["CountryLive"] == "India") &
                                 (file_clean_four["MoneyForLearning"] > 59)])
canada_spend_enough = len(file_clean_four[(file_clean_four["CountryLive"] == "Canada") &
                                 (file_clean_four["MoneyForLearning"] > 59)])
uk_spend_enough = len(file_clean_four[(file_clean_four["CountryLive"] == "United Kingdom") &
                                 (file_clean_four["MoneyForLearning"] > 59)])
usa_spend_enough = len(file_clean_four[(file_clean_four["CountryLive"] == "United States of America") &
                                 (file_clean_four["MoneyForLearning"] > 59)])

spend_enough = [canada_spend_enough, india_spend_enough, usa_spend_enough, uk_spend_enough]

# Spend per country
canada_meanxpotential = money_per_region["Canada"] * len(file_clean_four[file_clean_four["CountryLive"] == "Canada"])
india_meanxpotential = money_per_region["India"] * len(file_clean_four[file_clean_four["CountryLive"] == "Canada"])
usa_meanxpotential = money_per_region["United States of America"] * len(file_clean_four[file_clean_four["CountryLive"] == "United States of America"])
uk_meanxpotential = money_per_region["United Kingdom"] * len(file_clean_four[file_clean_four["CountryLive"] == "United Kingdom"])
meanxpotential = [canada_meanxpotential, india_meanxpotential, usa_meanxpotential, uk_meanxpotential]

# Create new dataframe
potential_customers = pd.DataFrame([spend_enough, meanxpotential])
potential_customers.columns = ['Canada', 'India', 'USA', 'UK']
potential_customers.index = ['Total Spend > 59', 'Spending * potential customers']
potential_customers
Out[13]:
Canada India USA UK
Total Spend > 59 85.00 151.000000 1281.0 96.00
Spending * potential customers 22242.63 15716.373282 429718.2 12704.08
In [14]:
# Create 1x2 grid
fig = plt.figure(figsize= (14,6))
fig.tight_layout
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)

# plot number who spend above 59 dollars
ax1.bar(potential_customers.columns, potential_customers.iloc[0])
ax1.set_title("Number of potential customers who spend more per month \n than subscription cost.")
ax1.set_ylabel("Number of people")
ax1.set_xlabel("Countries")
ax1.tick_params(left=False, bottom=False)
ax1.spines["top"].set_visible(False)
ax1.spines["right"].set_visible(False)

# Plot total spend per country
ax2.bar(potential_customers.columns, potential_customers.iloc[1], color='g')
ax2.set_title("Potential money in the market")
ax2.set_ylabel("Spend X potential Customers")
ax2.set_xlabel("Countries")
ax2.tick_params(left=False, bottom=False)
ax2.spines["top"].set_visible(False)
ax2.spines["right"].set_visible(False)

If it wasn't clear already the USA is by far the best market to advertise in. With both the highest number of potential customers and the highest mean spend the US market dwarfs the competition. The chart on the left shows that the US has over 6X the number of potential customers who spend more than the subscription price per month. The chart on the right shows how much more potential money is in the US market. Candada and India are both slightly ahead of the UK in both charts. India has more potential customers who spend at least the subscription cost per month, whilst there is more total money in the Canadian market due to it's higher mean spend.

Conclusion

One of the markets must be the United States and it may be prudent just to spend all of the budget there considering how much more potential money is in the market. The second market should be either India or Canada. If it has to be 2 markets then I would suggest 80% of the budget goes to the US whilst 20% goes to Canada which has slightly more potential money in its market than India.

Suggested Budget

USA 80%

Canada 20%

In [ ]: