Guided Project : Finding the Best Markets to Advertise In

We're working for an an e-learning company that offers courses on programming. Most of our courses are on web and mobile development, but we also cover many other domains, like data science, game development, etc. We want to promote our product and we'd like to invest some money in advertisement. Our goal in this project is to find out the two best markets to advertise our product in.

To reach our goal, we could organize surveys for a couple of different markets to find out which would the best choices for advertising. This is very costly, however, and it's a good call to explore cheaper options first.

We can try to search existing data that might be relevant for our purpose. One good candidate is the data from freeCodeCamp's 2017 New Coder Survey. freeCodeCamp is a free e-learning platform that offers courses on web development. Because they run a popular Medium publication (over 400,000 followers), their survey attracted new coders with varying interests (not only web development), which is ideal for the purpose of our analysis.

The survey data is publicly available in this GitHub repository.

Reading in and Exploring our Data

In [1]:
# read in our data en import the libraries we'll use
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

survey = pd.read_csv('2017-fCC-New-Coders-Survey-Data.csv', low_memory=False)
survey.shape
Out[1]:
(18175, 136)
In [2]:
survey.head()
Out[2]:
Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName BootcampRecommend ChildrenNumber CityPopulation CodeEventConferences CodeEventDjangoGirls ... YouTubeFCC YouTubeFunFunFunction YouTubeGoogleDev YouTubeLearnCode YouTubeLevelUpTuts YouTubeMIT YouTubeMozillaHacks YouTubeOther YouTubeSimplilearn YouTubeTheNewBoston
0 27.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 34.0 0.0 NaN NaN NaN NaN NaN less than 100,000 NaN NaN ... 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 21.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN ... NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN
3 26.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN ... 1.0 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN
4 20.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 136 columns

Our dataset contains 18175 rows and 136 columns. Some interesting columns for our analysis are JobRoleInterest, HoursLearning (the time clients spent studying), CityPopulation (the place clients are located) and MoneyForLearning (hwo much money they want to spend on learning).

Representativity of our Sample

As we mentioned earlier, most of the courses we offer are on web and mobile development, but we also cover many other domains, like data science, game development, etc. For the purpose of our analysis, we want to answer questions about a population of new coders that are interested in the subjects we teach. What we'd like to know:

  • Where are these new coders located?
  • What are the locations with the greatest number of new coders?
  • How much money new coders are willing to spend on learning?

Before starting to analyze the sample data we have, we need to clarify whether it's representative for our population of interest and whether it has the right categories of people for our purpose. The JobRoleInterest column describes for every participant the role(s) they'd be interested in working.

In [3]:
# generating a frequency table for the JobRoleInterest column
freq_interest = survey['JobRoleInterest'].value_counts(normalize=True) * 100
freq_interest
Out[3]:
Full-Stack Web Developer                                                                                                                                                                                                                                                    11.770595
  Front-End Web Developer                                                                                                                                                                                                                                                    6.435927
  Data Scientist                                                                                                                                                                                                                                                             2.173913
Back-End Web Developer                                                                                                                                                                                                                                                       2.030892
  Mobile Developer                                                                                                                                                                                                                                                           1.673341
Game Developer                                                                                                                                                                                                                                                               1.630435
Information Security                                                                                                                                                                                                                                                         1.315789
Full-Stack Web Developer,   Front-End Web Developer                                                                                                                                                                                                                          0.915332
  Front-End Web Developer, Full-Stack Web Developer                                                                                                                                                                                                                          0.800915
  Product Manager                                                                                                                                                                                                                                                            0.786613
Data Engineer                                                                                                                                                                                                                                                                0.758009
  User Experience Designer                                                                                                                                                                                                                                                   0.743707
  User Experience Designer,   Front-End Web Developer                                                                                                                                                                                                                        0.614989
  Front-End Web Developer, Back-End Web Developer, Full-Stack Web Developer                                                                                                                                                                                                  0.557780
Back-End Web Developer, Full-Stack Web Developer,   Front-End Web Developer                                                                                                                                                                                                  0.514874
Back-End Web Developer,   Front-End Web Developer, Full-Stack Web Developer                                                                                                                                                                                                  0.514874
  DevOps / SysAdmin                                                                                                                                                                                                                                                          0.514874
Full-Stack Web Developer,   Front-End Web Developer, Back-End Web Developer                                                                                                                                                                                                  0.443364
  Front-End Web Developer, Full-Stack Web Developer, Back-End Web Developer                                                                                                                                                                                                  0.429062
Full-Stack Web Developer,   Mobile Developer                                                                                                                                                                                                                                 0.414760
  Front-End Web Developer,   User Experience Designer                                                                                                                                                                                                                        0.414760
Back-End Web Developer, Full-Stack Web Developer                                                                                                                                                                                                                             0.386156
Full-Stack Web Developer, Back-End Web Developer                                                                                                                                                                                                                             0.371854
Back-End Web Developer,   Front-End Web Developer                                                                                                                                                                                                                            0.286041
Full-Stack Web Developer, Back-End Web Developer,   Front-End Web Developer                                                                                                                                                                                                  0.271739
Data Engineer,   Data Scientist                                                                                                                                                                                                                                              0.271739
  Front-End Web Developer,   Mobile Developer                                                                                                                                                                                                                                0.257437
Full-Stack Web Developer,   Data Scientist                                                                                                                                                                                                                                   0.243135
  Mobile Developer, Game Developer                                                                                                                                                                                                                                           0.228833
  Data Scientist, Data Engineer                                                                                                                                                                                                                                              0.228833
                                                                                                                                                                                                                                                                              ...    
  User Experience Designer,   Front-End Web Developer, Data Engineer,   Product Manager,   DevOps / SysAdmin, Full-Stack Web Developer,   Data Scientist, Game Developer, Back-End Web Developer                                                                             0.014302
  User Experience Designer,   Front-End Web Developer,   Data Scientist, Information Security,   Mobile Developer,   DevOps / SysAdmin,   Product Manager                                                                                                                    0.014302
  Mobile Developer, Back-End Web Developer,   Front-End Web Developer, Information Security                                                                                                                                                                                  0.014302
  DevOps / SysAdmin, Game Developer, Data Engineer,   Mobile Developer, Information Security, Full-Stack Web Developer,   Data Scientist                                                                                                                                     0.014302
  Front-End Web Developer, Full-Stack Web Developer,   Data Scientist,   DevOps / SysAdmin, Back-End Web Developer, Game Developer,   Mobile Developer                                                                                                                       0.014302
Data Engineer,   User Experience Designer,   Mobile Developer,   Data Scientist,   Product Manager, Full-Stack Web Developer, Information Security, Product Designer                                                                                                         0.014302
  Data Scientist, Data Engineer, Full-Stack Web Developer, Information Security,   Front-End Web Developer, Back-End Web Developer,   DevOps / SysAdmin                                                                                                                      0.014302
  User Experience Designer, Game Developer, Full-Stack Web Developer,   DevOps / SysAdmin,   Data Scientist                                                                                                                                                                  0.014302
Data Engineer,   DevOps / SysAdmin,   Data Scientist, Game Developer,   Front-End Web Developer                                                                                                                                                                              0.014302
  Mobile Developer,   Quality Assurance Engineer, Full-Stack Web Developer,   Front-End Web Developer,   User Experience Designer,   Product Manager                                                                                                                         0.014302
Game Developer,   Mobile Developer, Back-End Web Developer, Full-Stack Web Developer,   Front-End Web Developer, Information Security                                                                                                                                        0.014302
Full-Stack Web Developer,   Mobile Developer,   Data Scientist, Game Developer, Back-End Web Developer                                                                                                                                                                       0.014302
  Mobile Developer,   Product Manager, Full-Stack Web Developer, Back-End Web Developer, Information Security                                                                                                                                                                0.014302
Back-End Web Developer,   Mobile Developer,   Data Scientist,   User Experience Designer,   Quality Assurance Engineer,   Product Manager, Information Security,   DevOps / SysAdmin,   Front-End Web Developer, Data Engineer, Full-Stack Web Developer, Game Developer     0.014302
  Mobile Developer, Information Security,   Front-End Web Developer, Back-End Web Developer                                                                                                                                                                                  0.014302
Back-End Web Developer,   DevOps / SysAdmin, Data Engineer                                                                                                                                                                                                                   0.014302
  Data Scientist,   DevOps / SysAdmin,   Product Manager                                                                                                                                                                                                                     0.014302
Game Developer,   Front-End Web Developer, Information Security, Full-Stack Web Developer, Data Engineer                                                                                                                                                                     0.014302
  Mobile Developer,   User Experience Designer, Full-Stack Web Developer,   DevOps / SysAdmin, Technical Writer                                                                                                                                                              0.014302
Information Security, Game Developer,   Mobile Developer,   DevOps / SysAdmin                                                                                                                                                                                                0.014302
  Mobile Developer,   User Experience Designer,   Front-End Web Developer, Information Security, Full-Stack Web Developer                                                                                                                                                    0.014302
  Front-End Web Developer,   Mobile Developer, Back-End Web Developer, Information Security, Full-Stack Web Developer,   Data Scientist, Data Engineer,   DevOps / SysAdmin                                                                                                  0.014302
  Mobile Developer, Back-End Web Developer,   Product Manager, Full-Stack Web Developer, Information Security,   DevOps / SysAdmin                                                                                                                                           0.014302
  Front-End Web Developer, Full-Stack Web Developer,   User Experience Designer,   Mobile Developer, Information Security                                                                                                                                                    0.014302
Full-Stack Web Developer,   Front-End Web Developer, Data Engineer, Game Developer,   Quality Assurance Engineer,   Data Scientist, Back-End Web Developer, Information Security,   DevOps / SysAdmin,   Mobile Developer                                                    0.014302
Game Developer,   Mobile Developer,   DevOps / SysAdmin,   Data Scientist,   Front-End Web Developer, Data Engineer,   User Experience Designer, Information Security,   Quality Assurance Engineer                                                                          0.014302
Information Security, Game Developer,   Front-End Web Developer, Data Engineer,   Data Scientist,   Quality Assurance Engineer, Back-End Web Developer,   Mobile Developer                                                                                                   0.014302
Game Developer, Back-End Web Developer, Full-Stack Web Developer, Desktop Application Developer                                                                                                                                                                              0.014302
Full-Stack Web Developer, Back-End Web Developer,   Front-End Web Developer,   Mobile Developer, Information Security, Data Engineer,   Quality Assurance Engineer                                                                                                           0.014302
  DevOps / SysAdmin, Information Security,   Mobile Developer,   Quality Assurance Engineer                                                                                                                                                                                  0.014302
Name: JobRoleInterest, Length: 3213, dtype: float64

Analyzing our frequency table, we can tell that almost 12% of the people who took the survey are interested in becoming a Full-Stack Web Developer. Almost 6% is interested in a role as Front-End Web Developer. When we look to the second half of our table, we can notice that many people are interested in more than one subject. We'll plot this data in a pie chart.

In [4]:
# analyze the number of students interested in one or multiple courses
multiple_courses = survey['JobRoleInterest'].str.contains(',')
print("Number of students interested in multiple courses: " + str(multiple_courses.sum()))

all_students = survey['JobRoleInterest'].shape[0]

single_course = all_students - multiple_courses.sum()
print("Number of students interested in one single course: " + str(single_course))

print('Total number of students: ' + str(all_students))
Number of students interested in multiple courses: 4779
Number of students interested in one single course: 13396
Total number of students: 18175
In [5]:
# plot our data in a pie chart
df = pd.DataFrame({'Courses': [multiple_courses.sum(), single_course]}, index=['Multiple Courses', 'One Course'])
df.plot.pie(y='Courses', figsize=(5, 5))
plt.ylabel('')
plt.legend(loc='lower right')
Out[5]:
<matplotlib.legend.Legend at 0x7fa2c7756630>

More than a quarter of the students is interested in multiple courses. We can say that it makes sense to our company to propose a variety of course subjects. Let's analyze how many people are interested in web and mobile development - the focus of our courses.

In [6]:
# create a frequency table with absolute values
survey_web_mobile = survey['JobRoleInterest'].str.contains('Web Developer|Mobile Developer')
survey_web_mobile.value_counts()
Out[6]:
True     6030
False     962
Name: JobRoleInterest, dtype: int64
In [7]:
# create a frequency table with percentages
survey_web_mobile_percent = survey_web_mobile.value_counts(normalize=True) * 100
survey_web_mobile_percent.plot(kind='bar')

# plot the results in a bar chart
plt.style.use('fivethirtyeight')
plt.title('Distribution of Students \nby Job Role Interest', size=18)
plt.ylabel('Percentage', size=12)
plt.ylim([0, 100])
plt.xticks([0, 1], ['Web or Mobile \nDevelopment', 'Other Subjects'], rotation=0)
plt.show()

More than 86% of the people who took the survey indicated that they are interested in web or mobile related subjects. We can conclude that our sample is representative for the purpose of our analysis. Now that we found out that the sample has the right categories of people, we can begin analyzing it.

Location of Potential Customers

We can start with finding out where these new coders are located and what are the densities for each location. We can find this information in the CountryCitizen and CountryLive columns which describe the country of origin and the country each participant lives in. As we're interested in where people are located, we'll use the last mentioned column only. One good indicator of a good market is the number of potential customers - the more potential customers in a market, the better.

In [8]:
# drop the rows where respondents did not answer what role they are interested in
survey_clean = survey[survey['JobRoleInterest'].notnull()].copy()
In [9]:
survey_clean.shape
Out[9]:
(6992, 136)
In [10]:
# generate a frequency table for the `CountryLive` column with absolute values
absolute_values = survey_clean['CountryLive'].value_counts()
absolute_values.head()
Out[10]:
United States of America    3125
India                        528
United Kingdom               315
Canada                       260
Poland                       131
Name: CountryLive, dtype: int64
In [11]:
# generate a frequency table for the `CountryLive` column with percentages
percentage = survey_clean['CountryLive'].value_counts(normalize=True) * 100
percentage.head()
Out[11]:
United States of America    45.700497
India                        7.721556
United Kingdom               4.606610
Canada                       3.802281
Poland                       1.915765
Name: CountryLive, dtype: float64

The United States is the top country where more than 45% of the respondents live. India comes far behind with almost 8% of the respondents. All the other countries stay under 5%. For our advertisement, we certainly will choose the United States for advertisement. As a second market, we could choose India, but we should check whether people living in India are willing to spend a lot of money on learning courses compared to the numbers 3 and 4 in the list - the United Kingdom and Canada. The MoneyForLearning column will be helpfull to answer this question.

We'll narrow down our analysis to only four countries: the US, India, the United Kingdom and Canada. These are the countries having the highest absolute frequencies in our sample, which means we have a decent amount of data for each. In addition to that, our courses are written in English, and English is an official language in all these four countries. The more people that know English, the better our chances to target the right people with our ads.

Amount of Money spent for Learning

The MoneyForLearning column describes in American dollars the amount of money spent by participants from the moment they started coding until the moment they completed the survey. Our company sells subscriptions at a price of \$59 per month, and for this reason we're interested in finding out how much money each student spends per month. Therefore, we will divide the MoneyForLearning column to the MonthsProgramming column and replace all values of 0 (students that had just started when they completed the survey) in the last mentioned column with 1 to avoid dividing by 0.

In [12]:
# replace values of 0 with 1
survey_clean['MonthsProgramming'] = survey_clean['MonthsProgramming'].replace(0, 1)

# create a new column that describes the money students spent per month
survey_clean['MonthlySpent'] = round(survey_clean['MoneyForLearning'] / survey_clean['MonthsProgramming'], 2)
survey_clean['MonthlySpent'].head()
Out[12]:
1     13.33
2    200.00
3      0.00
4      0.00
6      0.00
Name: MonthlySpent, dtype: float64
In [13]:
# count the null values in the `MonthlySpent` column
survey_clean['MonthlySpent'].isnull().sum()
Out[13]:
675
In [14]:
# count the null values in the `CountryLive` column
survey_clean['CountryLive'].isnull().sum()
Out[14]:
154
In [15]:
# remove the null values in the new column and the `CountryLive` column
survey_clean['MonthlySpent'] = survey_clean['MonthlySpent'].notnull().copy()
survey_clean['CountryLive'] = survey_clean['CountryLive'].notnull().copy()
In [16]:
# check the null values again
print(survey_clean['MonthlySpent'].isnull().sum())
print(survey_clean['CountryLive'].isnull().sum())
0
0
In [17]:
# group the remaining data by country and mean of the `MonthlySpent` column
countries_mean = survey_clean.groupby('CountryLive')['MonthlySpent'].mean()
In [18]:
countries_mean
Out[18]:
CountryLive
False    0.681818
True     0.908453
Name: MonthlySpent, dtype: float64

We got some surprising results. Intuitively, we'd expect people in the UK and Canada to spend more on learning than people in India, taking into account a few socio-economical metrics like GDP per capita. It might be that we don't have enough representative data for the UK, Canada and India, or we might have some outliers coming from wrong survey answers for example influencing the mean, or it might be that the results are correct.

We'll generate 4 box plots to visualize the distribution of the MonthlySpent column for each country and spot possible outliers.

In [23]:
# generate 4 box plots for the `MonthlySpent` column
plt.boxplot(countries_mean[['United States of America']])

KeyErrorTraceback (most recent call last)
<ipython-input-23-a9b99658ecbc> in <module>()
      1 # generate 4 box plots for the `MonthlySpent` column
----> 2 plt.boxplot(countries_mean[['United States of America']])

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/series.py in __getitem__(self, key)
    662             key = check_bool_indexer(self.index, key)
    663 
--> 664         return self._get_with(key)
    665 
    666     def _get_with(self, key):

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/series.py in _get_with(self, key)
    703                     # handle the dup indexing case (GH 4246)
    704                     if isinstance(key, (list, tuple)):
--> 705                         return self.loc[key]
    706 
    707                     return self.reindex(key)

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1371 
   1372             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1373             return self._getitem_axis(maybe_callable, axis=axis)
   1374 
   1375     def _is_scalar_access(self, key):

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1614                     raise ValueError('Cannot index with multidimensional key')
   1615 
-> 1616                 return self._getitem_iterable(key, axis=axis)
   1617 
   1618             # nested tuple slicing

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1113 
   1114         if self._should_validate_iterable(axis):
-> 1115             self._has_valid_type(key, axis)
   1116 
   1117         labels = self.obj._get_axis(axis)

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1470                         raise KeyError(
   1471                             u"None of [{key}] are in the [{axis}]".format(
-> 1472                                 key=key, axis=self.obj._get_axis_name(axis)))
   1473                     else:
   1474 

KeyError: "None of [['United States of America']] are in the [index]"
In [ ]: