In this practice project, I assume that I'm working for an e-learning company that offers online courses on programming. Most of the courses cover web and mobile development, but the company also covers many other domains, such as data science and game development. The company wants to promote its product and is willing to invest money in advertising.
The goal in this project is to find out the two best markets for advertising the e-learning platform.
To reach this goal, the company could organize surveys for a couple of different markets to find out which would be the best choices for advertising. However, this is very costly. It would be more efficient to first explore cheaper options, such as using existing sources of market research and surveys.
To select these markets, I use data from an existing survey conducted by FreeCodeCamp as a sample of the population of people we would like to target with advertising: new coders or coders interested in continuing education. This will help narrow down which two markets are the best for our advertising campaign.
The main questions I wanted to answer are:
By answering these questions, I will then suggest which two national markets would be the best for advertising.
The conclusions for these questions were:
Based on these conclusions, I recommend the US and India as the two best national markets for advertising:
After answering those questions, I decided to look at additional market segments and related points of investigation. I considered looking at:
I know that within tech fields, the underrepresentation of women is a major problem, contributing to gender discrimination in the form of unequal pay, workplace harassment, and a lack of technology designed with women's needs in mind. I therefore decided to explore the gender breakdown of respondents: how many men vs. women participated in the survey, and whether there are differences in job role interests, location, or ability to pay between genders.
My findings from the gender analysis were that:
My conclusions for the gender analysis of the sample were that:
Female respondents show an even greater ability to pay monthly for learning than male respondents, and they are underrepresented both in the sample and in tech fields worldwide. By targeting advertising at interested female learners in web and mobile development, or even offering them discounted rates, the company could both grow its user base and help reduce gender inequality.
I am using data from freeCodeCamp's 2017 New Coder Survey. Because freeCodeCamp runs a popular Medium publication (over 400,000 followers), the survey attracted new coders with varying interests (not only web development), which is ideal for the purpose of our analysis. I am using an existing survey rather than conducting a new one in order to save money on the initial analysis.
Survey data is taken from this GitHub repository: https://github.com/freeCodeCamp/2017-new-coder-survey
import numpy as np
import pandas as pd
survey_data = pd.read_csv('2017-fCC-New-Coders-Survey-Data.csv', sep=',')
survey_data.info()
## On first pass, read_csv emits a DtypeWarning:
## /dataquest/system/env/python3/lib/python3.4/site-packages/IPython/core/interactiveshell.py:2723:
## DtypeWarning: Columns (17,62) have mixed types.
## Specify dtype option on import or set low_memory=False.
##   interactivity=interactivity, compiler=compiler, result=result)
## So we will investigate the dtypes of columns 17 and 62
survey_data.columns[17]
## Printed 'CodeEventOther'
survey_data.columns[62]
## Printed 'JobInterestOther'
survey_data['CodeEventOther'].dtype
survey_data['JobInterestOther'].dtype
## Both return dtype 'O' indicating a python object (string)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18175 entries, 0 to 18174
Columns: 136 entries, Age to YouTubeTheNewBoston
dtypes: float64(105), object(31)
memory usage: 18.9+ MB
/dataquest/system/env/python3/lib/python3.4/site-packages/IPython/core/interactiveshell.py:2723: DtypeWarning: Columns (17,62) have mixed types. Specify dtype option on import or set low_memory=False. interactivity=interactivity, compiler=compiler, result=result)
dtype('O')
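Since both columns are free-text strings, one way to silence the warning on future runs would be to declare their dtypes at import time (the alternative the warning itself suggests is `low_memory=False`). The sketch below uses a tiny inline stand-in for the real CSV, with hypothetical values, since only the `read_csv` option matters here:

```python
import io
import pandas as pd

# Tiny inline stand-in for the real survey CSV (hypothetical values).
csv_text = (
    "Age,CodeEventOther,JobInterestOther\n"
    "25,Ladies Learning Code,Researcher\n"
    "30,,\n"
)

# Declaring the two mixed-type columns as strings up front avoids the
# chunk-by-chunk type inference that triggers the DtypeWarning.
sample = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'CodeEventOther': str, 'JobInterestOther': str},
)
print(sample['CodeEventOther'].dtype)  # object
```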
Next, I'll look more closely at the unique values in each column to determine what dtype to assign.
survey_data['CodeEventOther'].unique()
## This returned unique values that were entirely titles of
## different code camps, so we will leave the dtype 'O' designation
array([nan, 'Ladies Learning Code', 'Peatix Events', 'Virada Tecnológica - Brasil', 'Mentoring', 'Microsoft Dev Camps', 'GDG', 'Include Girls Workshop', 'Local Event Called CodeCamp', 'DjangoCon', 'C3', 'Learners Guild Enrollment Day', 'Bootcamp', 'Laracon', 'Girl Geek Carrot', 'Coding Kids', 'No', 'Meet-ups In College', 'Codepen Meetups', 'Code2040', 'Summer Of Tech Bootcamps', 'CTFs', 'CodeNewbie', 'Been Wanting To Attend Women Who Code Etc Hackathons, Go To Hacklab But Am A Chicken And Also A Tired Human', 'Pyladies Events', 'LRUG', 'General Assembly', 'School Mettings', 'TechOlympics', 'The Iron Yard Crash Courses', 'LearningFuze Development Bootcamp In Irvine, CA', 'Code First: Girls', "Havn't Attended Any Events", 'WordCamp', 'App Academy Prep Course', ' Kc Code Noobs', 'Noone', 'CCC Waterloo', 'BrazilJS', 'MLH Hackathons', 'No One', 'Techzion', 'Bitmaker (Toronto, ON - CA)', 'The Recurse Center', 'Outreachy', 'Mentor', 'Pink Programming', 'Women Techmakers', 'There Are Very Limited Resources Where I Live', 'Institute Of Code Bali', 'Mentorship For Girls', 'Code Up', 'Competitive Programming', 'HearHerCode DC, Iron Yard Meetups', 'DeveloperDeveloperDeveloper', 'Just In My University.', 'No Attended', 'Weekend Coding Meet Ups', 'Hackathons', 'ACM ICPC', 'Code4 America', 'PyCon', 'Css3 At Universidad De La Laguna', 'Mentorship Saturdays In Portland', 'Codebar!!', 'Some Meetups Organized In My Neighbourhood', 'South Florida Code Camp', 'Hack Reactor', 'Launch Code', 'IndustryConf', 'Email Course On The Fundamentals Of Functional Programming', 'DrupalCon', 'Udemy', 'Software Carpentry', 'Nan', 'Team Retreat :P', 'Hear Me Code', 'Coalition For Queens (C4Q)', 'Google Developer Group', 'WeTeach_CS', 'Code For America', 'Coding Dojo', "Just Moved To Kansas City And I'm Going To All The Tech Meetups Here, Kansas City Women In Tech, Kansas City Code And Meetup, Etc. 
", 'School', 'IEEEXtreme Competition', 'Perl User Groups', 'Codeathons', 'CodeCoach', 'Zappos Coding Challenge', 'In-company Events', 'Derek Benas', 'Hz', 'Pyday', 'N0', 'I Went To A Bootcamp', 'DevCon Philippines', 'For Loop', 'Programming Knowledge Youtube Channel', 'Metal Toad Meetup', 'Try Turing', 'Local Lectures From College', 'PDX Ruby Group', '12-week Bootcamp', 'Code Fellows Bootcamp In Seattle, WA', 'Python Code School', 'Saylani Mass Training Program', 'Conferences', 'Codebar', 'There Are No Events In My Area Whatsoever.', 'Noble Desktop', 'Have Yet To Visit', 'All Star Code', '2 Week Html Class And School', 'High School Code Class', 'Pycon India', 'Google Developers Group', 'General Assembly Introductory Class', 'Detroit.js', 'Discord', 'Andela Bootcamp', 'Course Run At Local Digital Hub', 'Private Classes', 'Pyladies', 'Launchcode', 'TENT', 'Code.org Teacher Training', 'Null', 'Small University Group', 'Didnt Attended Any', 'Android App Building School', 'OpenHack', 'Didnt Have Chance To Attend To Any Event', 'Study Groups For My Minor In College', 'WoC Hackers', 'Py Ohio (Hosted By Local Meetup "Python Ohio" Every Year)', 'Q', 'SharePoint Saturdays', 'Manchester Girl Geeks', 'College', 'Spring In JAVA', 'No Events', 'Codesmith', 'Black Tech Week', 'Ninguno', 'Na', 'PyLadies', 'Any', 'AWS Day By AWS User Group Japan In Japan', 'DrupalCamp', 'Meet.js', 'CoderCamps Meet And Greet', 'Library Code Courses', 'Steer Data Viz D3.js', 'Vilnius Girls Code', 'Yt Videos', 'Intro To Computer Science Class', 'Self Coding', 'College-related', 'Minor Programmen At The University Of Amsterdam', 'Coding Bootcamp', '5 Month Private Class', 'Camp', 'I Meet With A Friend Weekly.', 'CoderGirls', 'Meetup With Friends', 'OKCoders', 'JavaForum Stuttgart', 'Angular', 'Boise Code Camp', 'Competion', 'Eduweb.pl Web Design UI/UX/HTML Bootcamp', 'MyCocoahead', 'Puppy', 'Haven;t Been 2 One', 'Competitions', 'Library Hosted Classes And Workshops', 'Girls Who Code', 'I Dont Atend', 
'Weekly Courses Held By Our School', 'Campus Party Mexico', 'Le Wagon', 'Code & Beer In Seattle', 'Local Events', 'Codeforce', 'Hackerspace Member', 'Brainstation', 'Code For Orlando', 'Radius Co-Working Events', 'PiterJS', 'Meetup Data Girls', 'ChiHack Night', 'WordPress Meetup Groups', 'CoderGirl By LaunchCode', 'Code Fellows', 'NIT FossMeet', 'LaunchCode', 'Any One', 'Letslearncoding.org', 'No Event', 'Local', 'IDTech', 'Barcode', 'Www.thenewboston.com', '.', 'Local Coding Club', 'CodeCamp Cluj, DevDays Cluj', 'We Got Coders', 'Learn Teach Code LA', 'N A', "I Havn't", 'Computer Science 1st Year', 'Programming Group Projects In My College Of Technology', 'Hyperion', 'YouTube', 'Epicodus (part Time/full Time Boot Camp)', 'Rmotr', 'High School Curriculum', 'NEXT Academy (Malaysia)', 'Easyctf', 'Meetup', 'GreenFox', 'Friends', 'School Camp', 'Codemotion', 'At Our University PAFKIET(IntraBattle)', 'Code.org', 'Lær Kids Koding', 'Networking With Women Like Me Who Are Currently Learning Code', 'Free Class', 'Mini Curso De Python', 'Google', 'Hacklab Almeria', 'Local Developers Events', 'Symposium', 'Coding Club Started With Developers At The Software Company I Work For.', 'Meetus', 'Reactive Conference Bratislava', 'CoderDojo Scotland', 'Pink Programming, Codher', 'Coding Tutorials 360', 'Code Clan', 'Ladies Learn Code', 'Bay Area Coding Sessions', 'Codementor.io', 'HackSSC', 'Thinkful Meetups', 'CodeUp Manchester', 'Amazon Workshop For Getting Into Coding', 'Php Conference', 'HeartLandGaming Expo', 'DroidCon', 'Pycon', 'Continuing Education Program For IT', 'Smashing Conference', 'Google Developers Conference. GDG', 'Informatics Olympiads', 'Hour Of Code', 'Galvanize', 'SC Codes', 'Hackerspace', 'Google Developers Group Ghana', 'Codingame', 'AZ Code Challenge', "Here In My Country We Don't Have Much Of This Kind Of Meetings. 
Brazil", '7 Week Bootcamp', 'Ironhack', 'Google Developer Groups Study Jam,devfest', '微软win10校园开发', 'Codechef', 'Code Over Coffee', 'Week Long Bootcamp', 'Free Software Day Event', 'Grace Hopper Conference', 'Free Code Camp', '-', 'Firebase Dev Summit', 'Talking To Other Developers At My Company', 'PuppetConf 2016', 'Hackaton', 'Code For Miami', 'Ainsleys Cooking Classes', 'College Competion', 'Laracast', 'Code Academy Azerbaijan', 'Iforum, Wordpress Kitchen', 'Codecool', 'Bristol JS', 'Hack Reactor Meetups, Rithm School Meetups', 'WoMoz', 'Founders And Coders', 'Local Courses From Es-press-oh', 'Exercises At The University', 'FISL', 'Inspaya Incubator In Nigeria', 'Edx', ' D', 'Weekly Bootcamps', 'Νονε', 'The Iron Yard', 'Sololearn', 'Code Camp South Florida', 'Bhkbhk', 'LaunchCode Mentorship', 'A', 'LaunchCode LC101', 'Unknown', 'Roblox', 'Html500 Vancouver', 'Tech Meetup Edinburgh, Scotland Based Event', 'Asdf', 'GDG Meetups', "I Don't Remember What's That Called", 'Bootcamps Night', 'CoderGirl', 'General Assembly Data Science Bootcamp', 'Codebar Brighton', 'School Classes', 'Coder Dojo', 'She Codes;', 'Codecademy London', 'Bitwise', 'CTF', 'Class', 'Crash Courses', 'Didnt Attempt', 'Google DevFest', "GDG Code D'Armor", 'I Havnt', 'Sleep Over With Friends', 'I Attended In Bootcamps In My Region', 'Coding Contests', 'Gdg', 'Campus Party Brasil 2017', 'Web Summit', 'Google Developer Groups', 'College Class For A Year', 'College Events', 'PyCaribbean 2017', "Couplo' Kids Web Pages On Coding :)", "I Live In Saudi Arabia And There Ain't Much Event Organized Here", 'School Groups :)', 'College Hackathon', 'Home Meets', 'IRL', 'Mobile App Development Class', 'Baidu', "My College's Own Activities", 'Iduntdoanyting', 'Random Conversations With Friends Of Friends And Friend-friends', 'Hackers News Meetup', 'Meet Ups', 'Other School', 'Coding Classes', 'College :(', 'Coder Camps-Pearland Texas', 'Seattle Coder Dojo', 'University Events Etc', 'Short Bootcamp', 'Have Yet To 
Attend', 'Group, Friend, Coding Jams!', 'Php Conf', 'Slef', 'Jug Meet Up', 'Just Starting School, Take Advantage Of Office Hours, Slack And Coffe Clatches Dedicated To Coding', 'Code First Girls', 'I Attend Firestation 101 Every Saturday. We Use FreeCodeCamp.', 'Developer Conferences', "I Don't Have Attended Yet.", 'Youtube', 'Coderhouse', 'University Workshop', 'Codecademy', 'LearningFuze', 'Geek Girls Carrots', 'CoderDojo Silicon Valley', 'Rails Girls', 'CoLab Kaduna', 'Internet', 'DevFest', 'Q College', 'AndroidSchool', 'Campus Party', 'Python For (Typo)Graphic Designers', 'For Loop Lagos', 'Meet Up And Teaching', 'HackerRank', 'Female Coders Lab', 'Hackathon', 'Open Source 101', 'Ieee', 'Hackerrank', 'PHP 7 Lection', 'RubyRails', 'Other', 'Swap Round Project', 'CodeFirst:Girls', 'Didnt Attend', 'Coderbunker', 'Nobody', 'Forloop', 'I Didnt', 'DevCongress Meetups', 'Audax', 'CodeSmith', 'A Part-time Course With Instructors And Peers', "I Don't Know", 'No,I Have No Attended It', 'Ihub Meetups', 'GDG Events', 'Facebook Groups', 'Code Platoon', 'Rutgers Coding Bootcamp', 'Lighthouse Labs', 'Local Part Time Bootcamps', 'University', 'Linux Security Related 5 Days Boot Camp', 'PHP Courses', 'AstanaJUG', 'Betabeers', 'School Events Such As Cat Barcamp', 'Yow', 'Local Ones', 'Books', 'College Class Meetups', 'Youtube Conversations Live - Video Streaming', 'No Coding-related Events', 'Baseball Hack Day', 'Code For Greenville', 'User Group Meetup', 'GoCode Colorado', 'Bootcamp Free Workshops', 'Google Dev Conference', 'Work Full Time', 'HackerYou, Ladies Learning Code', 'Rails Camp', 'Learning Tree', 'User Groups', 'Bootcamp Online', 'Cisco Live DevNet', 'YOU TUBE FREE TUTORIALS', 'Meetups', 'Webinar Online', 'Local Ruby Meetup (Dakar)', 'Local Game Developer Group', 'Chi Hack Night Open Data Hacking', 'Off-line Java Courses', 'Legal Hackers', 'Word Camp', 'Grand Circus', 'Edx, Alison, Coursera', 'An Event Apart', 'Self Trained', 'EL ZERO WEB SCHOOL', 'Getting Together 
With Family Who Code', 'Learners Guild', 'Local Coding And Dinner Event', 'My Friend', 'N', 'Sir Syed', 'Google Code Camp', 'WSD - WebStandardsDays', 'A Class At Community College', 'Matf', 'EPAM Trainings'], dtype=object)
survey_data['JobInterestOther'].unique()
## This returned a list of different job titles, so we will
## also leave the dtype 'O' designation
array([nan, 'Security Expert', 'Technical Writer', 'Researcher', 'Systems Engineer', 'Desktop Applications Programmer', 'Robotics', 'Non Technical', 'UI Design', 'Software Engineer', 'Email Coder', 'Data Analyst', 'I Dont Yet Know', 'UX Developer/designer', 'Support Scientific Resaerch', 'AI And Neuroscience', 'Full Stack Software Engineer', 'Program Manager', 'Application Support Analyst', "This Futurist's Dream Of Using Some Tech In A Way That Inspires Critical Amounts Of People To Influence The Changes We Need To Protect & Repair Our Planet", 'Information Architect', 'Physicist', 'Security Business Analyst', 'Bioinformatics/science', 'Creative Coder / Generative Artist/designer', 'A Job In Which I Can Use Coding Skills To Create Valuable Portals To Advance Human Rights', 'Research', 'Bitcoin/Crypto', 'Embedded Hardware', 'Data/Interactive Journalist', 'Software Engineering', 'Business Analyst', 'Network Engineer', 'Information Developer', 'Java Developer', 'Project Management', 'Machine Learning Engineer', 'Real-time Systems', 'Cybersecurity', 'GIS Developer', 'Research And Education', 'System Software', 'Full Stack Developer & Instructional Designer / Educational Technologist', 'AI', ' Bioinformatics', 'Urban Planner', 'Full Stack Developer', 'SWE', 'Embedded Developer', 'Virtual Reality Developer', 'Journalist/Graphic Designer/Marketing', 'Web Designer', 'Computer Architect', 'Networking', 'Software Developer', 'AI And Machine Learning', 'Computer Engineer', 'Artificial Intelligence', 'Systems Programming', 'Software Engineer (Computer Science Based)', 'Technology Management', 'Full-stack Developer', 'BA Or Developer', 'User Interface Design', 'System Engineer', 'Network', 'Analyst', 'Machine Learning', 'Pharmacy Tech', 'Data Journalist / Data Visualist', 'Desings', 'Infrastructure Architect', 'Tech Art', 'Technology-Business Liaison', 'Product Designer', 'Front-End Web Designer', 'Document Controller', 'Software Enginner', 'Programmer', 'Undeceided', 
'Pharmaceutical Industry', 'Information Technology', 'Library Developer', 'Desktop Application Developer', 'Operating Systems, Compilers, Etc...', 'GIS Database Admin', 'Designer', 'Support Engineer Or API Support', 'Python Developer', 'Bioinformatics', 'Robotics Process Automation Specialist', 'Data Visualisation', 'Desktop Applications Developer', 'All - Whatever Is Required To Develop Tools To Revolutionize The Mechanical Engineering Process', 'Digital Humanitites', 'User Interface Designer', 'Software Development', 'Programming', 'Web Development', 'Marketing', 'Financial Services', 'Natural Language Processing', 'Entreprenuer / Web Dev Hustler', 'Marketing Automation', 'AI Developer', 'Network Admin', 'Front End, Back End, Game, Web, Mobile Developer', 'Computer Scientist', 'UI Designer', 'Data Entry', 'Business Consultant', 'Cloud Computing', 'Machine\u200b Learning Engineer', "I'd Like To Wear Lots Of Hats And Do Hard Work", 'Fintech', 'Neuroscientist', 'Visual Designer', 'Database Administration', 'Application Developer', 'AI Development', 'Eggs', 'Project Manager', 'Undecided', 'Milatary Engineer', 'SEO', 'Astrophysicist', '*', 'Journalist', 'Philosopher', 'Desktop Applications', 'IoT Developer', 'Systems Programmer', 'Professor', 'Artificial Intelligence Engineer', 'Developer Evangelist', 'Interaction Developer', 'Bioinformatitian', 'IoT', 'Entrepreneur', 'I Am Interested In Game Development, Mobile Development, Web Design, Front End Web Development', 'Data Reporter', 'Computer Vision Engineer/Research Scientist', 'Web Developer', 'Robotics And AI Engineer', 'Ethical Hacker', 'Scientific Programming', 'Software Developer Or Front-End Web Developer', 'Campaign Manager', 'AI Engineer', 'Software Specialist', 'Growth Hacker', 'Founder', 'Software Engineers', 'VR Technology Developer', 'Developer', 'Plc', 'Ceo', 'Tech Lobbiest', 'Quant (Algorithmic Trader)', 'Machine Learning And AI', 'Databases', 'Software Developper', 'College Professor', 'System 
Administrator/Network', 'Software Projects Manager', 'Teacher. Teaching Students To Code.', 'Education', 'Code Developer...in Whatever Format, Front-end, Back-end, App Dev Etc.', 'Improving In My Current Career As A Learning Technologist', 'Informatician', 'Lab Scientist', 'Data Visualization Specialist', "I'm Just Learning Code To Increase My Skill-set. I See It As A Literacy Issue.", 'Teacher', 'Criminal Defense Attorney-- Focusing On Cyber Crimes', 'Remote Support', 'Non-programmer', 'IT Specialist'], dtype=object)
## Print first 5 rows for overview
## Data has 136 columns
survey_data.head()
## Build a list of column names so we can identify the most useful ones for analysis
column_list = list(survey_data.columns)
## Print a numbered list to more easily identify columns
for i, item in enumerate(column_list, start=0):
    print(i, item)
0 Age 1 AttendedBootcamp 2 BootcampFinish 3 BootcampLoanYesNo 4 BootcampName 5 BootcampRecommend 6 ChildrenNumber 7 CityPopulation 8 CodeEventConferences 9 CodeEventDjangoGirls 10 CodeEventFCC 11 CodeEventGameJam 12 CodeEventGirlDev 13 CodeEventHackathons 14 CodeEventMeetup 15 CodeEventNodeSchool 16 CodeEventNone 17 CodeEventOther 18 CodeEventRailsBridge 19 CodeEventRailsGirls 20 CodeEventStartUpWknd 21 CodeEventWkdBootcamps 22 CodeEventWomenCode 23 CodeEventWorkshops 24 CommuteTime 25 CountryCitizen 26 CountryLive 27 EmploymentField 28 EmploymentFieldOther 29 EmploymentStatus 30 EmploymentStatusOther 31 ExpectedEarning 32 FinanciallySupporting 33 FirstDevJob 34 Gender 35 GenderOther 36 HasChildren 37 HasDebt 38 HasFinancialDependents 39 HasHighSpdInternet 40 HasHomeMortgage 41 HasServedInMilitary 42 HasStudentDebt 43 HomeMortgageOwe 44 HoursLearning 45 ID.x 46 ID.y 47 Income 48 IsEthnicMinority 49 IsReceiveDisabilitiesBenefits 50 IsSoftwareDev 51 IsUnderEmployed 52 JobApplyWhen 53 JobInterestBackEnd 54 JobInterestDataEngr 55 JobInterestDataSci 56 JobInterestDevOps 57 JobInterestFrontEnd 58 JobInterestFullStack 59 JobInterestGameDev 60 JobInterestInfoSec 61 JobInterestMobile 62 JobInterestOther 63 JobInterestProjMngr 64 JobInterestQAEngr 65 JobInterestUX 66 JobPref 67 JobRelocateYesNo 68 JobRoleInterest 69 JobWherePref 70 LanguageAtHome 71 MaritalStatus 72 MoneyForLearning 73 MonthsProgramming 74 NetworkID 75 Part1EndTime 76 Part1StartTime 77 Part2EndTime 78 Part2StartTime 79 PodcastChangeLog 80 PodcastCodeNewbie 81 PodcastCodePen 82 PodcastDevTea 83 PodcastDotNET 84 PodcastGiantRobots 85 PodcastJSAir 86 PodcastJSJabber 87 PodcastNone 88 PodcastOther 89 PodcastProgThrowdown 90 PodcastRubyRogues 91 PodcastSEDaily 92 PodcastSERadio 93 PodcastShopTalk 94 PodcastTalkPython 95 PodcastTheWebAhead 96 ResourceCodecademy 97 ResourceCodeWars 98 ResourceCoursera 99 ResourceCSS 100 ResourceEdX 101 ResourceEgghead 102 ResourceFCC 103 ResourceHackerRank 104 ResourceKA 105 
ResourceLynda 106 ResourceMDN 107 ResourceOdinProj 108 ResourceOther 109 ResourcePluralSight 110 ResourceSkillcrush 111 ResourceSO 112 ResourceTreehouse 113 ResourceUdacity 114 ResourceUdemy 115 ResourceW3S 116 SchoolDegree 117 SchoolMajor 118 StudentDebtOwe 119 YouTubeCodeCourse 120 YouTubeCodingTrain 121 YouTubeCodingTut360 122 YouTubeComputerphile 123 YouTubeDerekBanas 124 YouTubeDevTips 125 YouTubeEngineeredTruth 126 YouTubeFCC 127 YouTubeFunFunFunction 128 YouTubeGoogleDev 129 YouTubeLearnCode 130 YouTubeLevelUpTuts 131 YouTubeMIT 132 YouTubeMozillaHacks 133 YouTubeOther 134 YouTubeSimplilearn 135 YouTubeTheNewBoston
There are many columns here that would be useful when deciding how to target an advertising campaign. Some of the most potentially relevant are (column names in parentheses):
Although the focus of the e-learning platform is on web and mobile development, it also wants to appeal to those interested in other programming areas like data science or game development.
To get an idea of whether the population taking this survey represents our population of interest, I look at the JobRoleInterest column to see which roles survey participants are interested in.
## Generate a frequency table of the JobRoleInterest column
## Normalize to show percentages and drop null values (NaN)
job_freq = survey_data['JobRoleInterest'].value_counts(normalize=True, dropna=True)*100
job_freq
Full-Stack Web Developer 11.770595 Front-End Web Developer 6.435927 Data Scientist 2.173913 Back-End Web Developer 2.030892 Mobile Developer 1.673341 Game Developer 1.630435 Information Security 1.315789 Full-Stack Web Developer, Front-End Web Developer 0.915332 Front-End Web Developer, Full-Stack Web Developer 0.800915 Product Manager 0.786613 Data Engineer 0.758009 User Experience Designer 0.743707 User Experience Designer, Front-End Web Developer 0.614989 Front-End Web Developer, Back-End Web Developer, Full-Stack Web Developer 0.557780 Back-End Web Developer, Full-Stack Web Developer, Front-End Web Developer 0.514874 DevOps / SysAdmin 0.514874 Back-End Web Developer, Front-End Web Developer, Full-Stack Web Developer 0.514874 Full-Stack Web Developer, Front-End Web Developer, Back-End Web Developer 0.443364 Front-End Web Developer, Full-Stack Web Developer, Back-End Web Developer 0.429062 Full-Stack Web Developer, Mobile Developer 0.414760 Front-End Web Developer, User Experience Designer 0.414760 Back-End Web Developer, Full-Stack Web Developer 0.386156 Full-Stack Web Developer, Back-End Web Developer 0.371854 Back-End Web Developer, Front-End Web Developer 0.286041 Full-Stack Web Developer, Back-End Web Developer, Front-End Web Developer 0.271739 Data Engineer, Data Scientist 0.271739 Front-End Web Developer, Mobile Developer 0.257437 Full-Stack Web Developer, Data Scientist 0.243135 Data Scientist, Data Engineer 0.228833 Mobile Developer, Game Developer 0.228833 ... 
Data Engineer, Mobile Developer, Front-End Web Developer, Back-End Web Developer, Game Developer 0.014302 Front-End Web Developer, Back-End Web Developer, Quality Assurance Engineer, Full-Stack Web Developer, Product Manager 0.014302 Full-Stack Web Developer, Data Engineer, Information Security, User Experience Designer, Mobile Developer, Back-End Web Developer, Front-End Web Developer 0.014302 Game Developer, Front-End Web Developer, User Experience Designer, Information Security 0.014302 Information Security, Full-Stack Web Developer, Data Scientist, Back-End Web Developer 0.014302 Full-Stack Web Developer, Back-End Web Developer, Mobile Developer, User Experience Designer, Front-End Web Developer 0.014302 Mobile Developer, Back-End Web Developer, User Experience Designer, Full-Stack Web Developer, DevOps / SysAdmin, Front-End Web Developer 0.014302 Game Developer, Full-Stack Web Developer, Software Developer 0.014302 Game Developer, Full-Stack Web Developer, Front-End Web Developer, Data Engineer, User Experience Designer, Data Scientist, Mobile Developer, Back-End Web Developer, Information Security 0.014302 Data Engineer, Full-Stack Web Developer, Data Scientist, Information Security, Back-End Web Developer 0.014302 Mobile Developer, Full-Stack Web Developer, DevOps / SysAdmin, Front-End Web Developer, Game Developer, Back-End Web Developer 0.014302 Full-Stack Web Developer, I dont yet know 0.014302 Front-End Web Developer, Data Scientist, Back-End Web Developer, Data Engineer, Full-Stack Web Developer 0.014302 I'm just learning code to increase my skill-set. I see it as a literacy issue. 
0.014302 User Experience Designer, Data Engineer, Front-End Web Developer, Back-End Web Developer, Game Developer, Data Scientist, Information Security, Full-Stack Web Developer, Mobile Developer 0.014302 User Experience Designer, Information Security, Mobile Developer, Product Manager, Quality Assurance Engineer 0.014302 Data Engineer, Data Scientist, Front-End Web Developer 0.014302 DevOps / SysAdmin, Front-End Web Developer, User Experience Designer, Data Scientist, Full-Stack Web Developer, Information Security, Mobile Developer 0.014302 Game Developer, Mobile Developer, Data Engineer, User Experience Designer, Product Manager, DevOps / SysAdmin, Full-Stack Web Developer, Front-End Web Developer, Back-End Web Developer 0.014302 Mobile Developer, Game Developer, User Experience Designer, Product Manager, Full-Stack Web Developer, Front-End Web Developer 0.014302 Full-Stack Web Developer, User Experience Designer, Data Scientist, Mobile Developer, Game Developer 0.014302 Full-Stack Web Developer, Game Developer, Quality Assurance Engineer, Front-End Web Developer 0.014302 Data Engineer, Back-End Web Developer, Full-Stack Web Developer, Data Scientist 0.014302 Full-Stack Web Developer, Front-End Web Developer, Mobile Developer, Data Scientist, Product Manager, User Experience Designer 0.014302 Back-End Web Developer, User Experience Designer, Mobile Developer, Full-Stack Web Developer, Front-End Web Developer 0.014302 Information Security, Game Developer, Full-Stack Web Developer, Back-End Web Developer, Mobile Developer, Front-End Web Developer 0.014302 Front-End Web Developer, Full-Stack Web Developer, Information Security, Data Scientist, Back-End Web Developer, Game Developer 0.014302 Data Engineer, Front-End Web Developer, Mobile Developer, Full-Stack Web Developer, Back-End Web Developer 0.014302 User Experience Designer, Full-Stack Web Developer, DevOps / SysAdmin 0.014302 Back-End Web Developer, Data Scientist, Product Manager, Data Engineer 0.014302 Name: 
JobRoleInterest, Length: 3213, dtype: float64
The frequency table shows that the largest share of survey respondents are interested in Full-Stack Web Development, followed by Front-End Web Development, Data Science, and Back-End Web Development. However, scrolling through the list reveals that a large number of respondents selected more than one job role. We therefore can't yet say whether the majority is interested in web development, since the percentages for the answers that combine multiple roles would need to be added up.
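A toy example (hypothetical answers, not the real survey) illustrates the undercounting: an exact frequency table misses every combined answer that mentions a role, while a substring check catches them all.

```python
import pandas as pd

# Toy stand-in for the JobRoleInterest column (hypothetical answers):
# 'Web Developer' is the exact answer only once, but is mentioned twice.
answers = pd.Series([
    'Web Developer',
    'Web Developer, Data Scientist',
    'Game Developer',
])

# Share of answers that are exactly 'Web Developer' (1 of 3)
exact_share = (answers.value_counts(normalize=True) * 100)['Web Developer']
# Share of answers that mention 'Web Developer' anywhere (2 of 3)
mention_share = answers.str.contains('Web Developer').mean() * 100

print(exact_share)    # counts only the single-role answer
print(mention_share)  # counts every answer mentioning the role
```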
First, let's see how many respondents chose more than one job role by generating another frequency table.
## Starting with our original dataset, we first drop null values from the
## JobRoleInterest column
jobrole_no_nulls = survey_data['JobRoleInterest'].dropna()
## We then use our cleaned column to split by commas to separate out
## individual roles from the multiple selection answers
multi_jobs = jobrole_no_nulls.str.split(',')
multi_jobs
## This returns the same column, but with each individual job role selected
## as a separate list item within each row
1 [Full-Stack Web Developer] 2 [ Front-End Web Developer, Back-End Web Deve... 3 [ Front-End Web Developer, Full-Stack Web De... 4 [Full-Stack Web Developer, Information Securi... 6 [Full-Stack Web Developer] 9 [Full-Stack Web Developer, Quality Assuranc... 11 [ DevOps / SysAdmin, Data Scientist, Info... 13 [Back-End Web Developer, Full-Stack Web Devel... 14 [Full-Stack Web Developer] 15 [Full-Stack Web Developer] 16 [Full-Stack Web Developer] 18 [Full-Stack Web Developer, Front-End Web De... 19 [ Front-End Web Developer, Mobile Develope... 21 [Information Security] 22 [Full-Stack Web Developer] 23 [Back-End Web Developer] 28 [Full-Stack Web Developer] 29 [ Front-End Web Developer, Data Scientist,... 30 [Back-End Web Developer, Full-Stack Web Devel... 31 [ Front-End Web Developer] 32 [ Data Scientist, Information Security, Dat... 33 [Full-Stack Web Developer, Quality Assuranc... 34 [Back-End Web Developer, Full-Stack Web Devel... 35 [Back-End Web Developer, Full-Stack Web Devel... 37 [ Mobile Developer, Product Manager] 40 [ Front-End Web Developer, Back-End Web Deve... 41 [ Front-End Web Developer] 42 [Full-Stack Web Developer] 43 [Back-End Web Developer, Front-End Web Deve... 52 [ Data Scientist, Game Developer, Full-Stac... ... 18080 [ Mobile Developer, Front-End Web Developer] 18081 [Full-Stack Web Developer, Back-End Web Devel... 18088 [Full-Stack Web Developer] 18089 [ Quality Assurance Engineer] 18090 [Game Developer, Data Scientist, Full-Stac... 18093 [ Front-End Web Developer, Mobile Developer] 18097 [Game Developer, Mobile Developer, Full-St... 18098 [ Front-End Web Developer, Full-Stack Web De... 18099 [Full-Stack Web Developer] 18107 [Full-Stack Web Developer] 18111 [ Mobile Developer, Game Developer] 18112 [ Mobile Developer, Game Developer, Full-St... 18113 [ Mobile Developer, Game Developer] 18118 [ DevOps / SysAdmin, Full-Stack Web Develope... 18125 [ Front-End Web Developer, Full-Stack Web De... 
18129 [ Mobile Developer] 18130 [ Front-End Web Developer, User Experience... 18131 [Game Developer, Front-End Web Developer, ... 18151 [ Front-End Web Developer] 18153 [Information Security, Full-Stack Web Developer] 18154 [Full-Stack Web Developer] 18155 [Full-Stack Web Developer, Front-End Web De... 18156 [Full-Stack Web Developer] 18157 [Back-End Web Developer, Data Engineer, Mo... 18160 [ User Experience Designer] 18161 [Full-Stack Web Developer] 18162 [ Data Scientist, Game Developer, Quality... 18163 [Back-End Web Developer, Data Engineer, Da... 18171 [ DevOps / SysAdmin, Mobile Developer, ... 18174 [Back-End Web Developer, Data Engineer, Da... Name: JobRoleInterest, Length: 6992, dtype: object
## We now count the number of job roles selected in each row
n_options = multi_jobs.apply(lambda x: len(x))  ## x is the list of roles in a row
## Generate frequency table of number of options
n_freq = n_options.value_counts(normalize=True).sort_index()*100
n_freq
1     31.650458
2     10.883867
3     15.889588
4     15.217391
5     12.042334
6      6.721968
7      3.861556
8      1.759153
9      0.986842
10     0.471968
11     0.185927
12     0.300343
13     0.028604
Name: JobRoleInterest, dtype: float64
This shows that almost 32% of respondents chose only one job role. The remaining 68%, however, selected multiple roles, most commonly between two and five.
This indicates that many potential customers are interested in multi-disciplinary learning, and it would be in the platform's best interests to offer learning opportunities on a variety of subjects to attract the broadest customer segment.
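As a quick sanity check on that two-to-five range, we can sum the corresponding slice of the frequency table (values reproduced from the output above):

```python
import pandas as pd

## Relative frequencies (%) from the n_freq table above,
## indexed by the number of job roles selected
n_freq = pd.Series(
    [31.650458, 10.883867, 15.889588, 15.217391, 12.042334, 6.721968,
     3.861556, 1.759153, 0.986842, 0.471968, 0.185927, 0.300343, 0.028604],
    index=range(1, 14))

## Share of respondents who selected between 2 and 5 roles
## (.loc label slicing is inclusive on both ends)
share_2_to_5 = n_freq.loc[2:5].sum()
print(round(share_2_to_5, 1))  ## about 54.0
```

So a little over half of all respondents picked between two and five roles.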
I will next need to select all rows that contain a mention of a certain role to get a better idea of how much interest there really is, including those who are interested in multiple roles.
The main course offerings are in web and mobile development, but the company is also interested in offering courses in data science, game development or other technical fields that meet our customers' needs.
So let's first see what proportion of the respondents are interested in at least one of those four fields: Web Development, Mobile Development, Data Science or Game Development.
## Select all the rows where at least one of the included job roles is mentioned
markets_4 = jobrole_no_nulls.str.contains('Web Developer|Mobile Developer|Data Scientist|Game Developer'
) ## returns a boolean array of the column
markets_4_freq = markets_4.value_counts(normalize=True)*100
markets_4_freq
True     93.320938
False     6.679062
Name: JobRoleInterest, dtype: float64
About 93% of survey respondents showed an interest in learning in at least one of those fields. This indicates the company is on the right track in terms of its planned course offerings. Next, let's see how many respondents are interested in our main 2 areas of specialization: Web and Mobile Development.
## Identify rows that include either Web or Mobile Developer as a selected job role
web_mobile = jobrole_no_nulls.str.contains('Web Developer|Mobile Developer') ## returns a boolean array
## Generate a relative frequency table
web_mobile_freq = web_mobile.value_counts(normalize=True)*100
web_mobile_freq
True     86.241419
False    13.758581
Name: JobRoleInterest, dtype: float64
Or visualized below:
## Display the plots within the cell
%matplotlib inline
##Import pyplot module from matplotlib library; this will remain imported
import matplotlib.pyplot as plt
## Set style; this will hold true for future plots generated
plt.style.use('ggplot')
## Generate bar plot of the relative frequency table for web or mobile development
web_mobile_freq.plot.bar()
## Set the title; y number pads the title upward
plt.title('Interest in Web or Mobile Development', y = 1.1)
## Set x tick labels and rotate horizontally
plt.xticks([0,1],['Web or Mobile \nDevelopment', 'Other'], rotation=0)
## Set y label and size
plt.ylabel('Percentage', fontsize=12)
## Set x label and size
plt.xlabel('Job Roles', fontsize=12)
## Set limits of y axis; since we normalized the frequencies to percentage,
## upper limit should be 100%
plt.ylim([0,100])
plt.show()
The chart shows that about 86% of respondents listed some interest in Web or Mobile Development specifically. It's safe to say it would be good for the company to continue specializing in these areas, while providing some offerings in Data Science or Game Development to supplement.
Let's continue this line of investigation to get even more nuanced data regarding the job roles respondents are most interested in.
Let's see both what percentage mentions web development at all, and which role within web development (Full Stack, Front End, Back End) is most in demand.
## Generate boolean array for row containing Web Developer as a job selection
## Then generate a relative frequency table (all within same code line)
web = jobrole_no_nulls.str.contains('Web Developer').value_counts(normalize=True)*100
web
True     82.608696
False    17.391304
Name: JobRoleInterest, dtype: float64
Or visualized below:
%matplotlib inline
## ggplot style is kept from the plt.style.use call executed above
web.plot.bar()
## Set same graph style parameters
plt.title('Interest in Web Development', y = 1.1)
plt.xticks([0,1],['Web Development', 'Other'], rotation=0)
plt.ylabel('Percentage', fontsize=12)
plt.xlabel('Job Roles', fontsize=12)
plt.ylim([0,100])
plt.show()
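The question above also asked which web-development sub-role (Full Stack, Front End, Back End) is most in demand, but the cell so far only measures web development as a whole. A sketch of how that breakdown could be computed with the same str.contains approach, shown on a small illustrative Series (in the notebook it would run on jobrole_no_nulls rather than this toy data):

```python
import pandas as pd

## Toy stand-in for jobrole_no_nulls (the real Series has 6992 rows)
toy = pd.Series([
    'Full-Stack Web Developer',
    'Front-End Web Developer, Back-End Web Developer',
    'Front-End Web Developer, Full-Stack Web Developer',
    'Full-Stack Web Developer, Mobile Developer',
])

## Percentage of respondents mentioning each web-development sub-role
pcts = {}
for role in ['Full-Stack Web Developer', 'Front-End Web Developer',
             'Back-End Web Developer']:
    pcts[role] = toy.str.contains(role, regex=False).mean() * 100
    print(role, pcts[role])
```

Since respondents can select several sub-roles, these percentages can overlap and need not sum to 100.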
And interest in Mobile Development?
## Select rows with Mobile Developer selected as a job interest;
## then generate relative frequency table
mobile = jobrole_no_nulls.str.contains('Mobile Developer').value_counts(normalize=True)*100
mobile
False    67.048055
True     32.951945
Name: JobRoleInterest, dtype: float64
Above, we see that only about 33% of respondents indicated some interest in Mobile Development, while almost 83% indicated some interest in Web Development.
Recall from the first frequency table above that, among responses with only one job interest selected, data scientist actually took third place, behind full-stack and front-end web development.
Let's see at what frequency it occurs across all responses, including those with multiple selections.
## Select rows with Data Scientist selected as a job interest;
## then generate relative frequency table
data_sci = jobrole_no_nulls.str.contains('Data Scientist').value_counts(normalize=True)*100
data_sci
False    76.501716
True     23.498284
Name: JobRoleInterest, dtype: float64
Interest in data science shows up in only about 23% of responses, even lower than mobile development.
From this initial data on Job Roles, it's confirmed that web and mobile development continue to be the best fields to focus our e-learning courses on, as they account for the majority of interest in the sample. I could run this same analysis for other single interest roles to see how often they are mentioned in the multi-selection answers, but it seems unlikely given the data we've already seen that they would account for a greater share than either mobile or web.
Besides, focusing particularly on web development (which accounts for the largest share of interest in our sample) provides the opportunity to reach learners in a variety of sub-fields (full stack, front-end, back-end) and to create specially curated content for these users.
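That sweep over the remaining roles could be automated in one pass rather than run role by role; a sketch on a toy Series standing in for jobrole_no_nulls:

```python
import pandas as pd

## Toy stand-in for jobrole_no_nulls
toy = pd.Series([
    'Full-Stack Web Developer',
    'DevOps / SysAdmin, Data Scientist',
    'Mobile Developer, Game Developer',
    'Data Scientist, Information Security',
])

roles = ['Data Scientist', 'Game Developer', 'Information Security',
         'DevOps / SysAdmin', 'User Experience Designer']

## Percentage of rows mentioning each role, ranked in descending order
mention_pct = pd.Series(
    {r: toy.str.contains(r, regex=False).mean() * 100 for r in roles}
).sort_values(ascending=False)
print(mention_pct)
```

On the real data, this would quickly confirm whether any other single role rivals web or mobile development in mention rate.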
Now that the target job roles are confirmed, I now want to analyze where new coders in the sample are located.
I can use the columns previously noted, CountryCitizen and CountryLive, to see both which countries new coders are originally from and where they are living now (if it is different than their origin country).
However, for the purpose of advertising, I'm more interested in where new coders are living now.
Since country is the most granular location data we have on coders, I will use 'country' as synonymous with 'market' and find the two best countries/national markets to advertise in.
I will again remove all respondents who didn't answer which job role they were interested in, so the sample remains representative of the population we are interested in. I'll then use it to generate frequency tables for the CountryLive column.
## Drop rows with null JobRoleInterest from the entire survey dataset;
## .copy() gives an independent DataFrame, avoiding SettingWithCopyWarning
## when we add columns later
job_no_nulls = survey_data[survey_data['JobRoleInterest'].notnull()].copy()
## Generate an absolute frequency table for the CountryLive column
job_no_nulls['CountryLive'].value_counts()
United States of America         3125
India                             528
United Kingdom                    315
Canada                            260
Poland                            131
Brazil                            129
Germany                           125
Australia                         112
Russia                            102
Ukraine                            89
Nigeria                            84
Spain                              77
France                             75
Romania                            71
Netherlands (Holland, Europe)      65
...
(122 more countries, many with only a single respondent)
Name: CountryLive, Length: 137, dtype: int64
The table above shows that, in absolute terms, our sample has most respondents residing in the US. Let's look at it in relative proportions.
## Generate relative frequency table
top_countries = job_no_nulls['CountryLive'].value_counts(normalize=True)*100
top_countries
United States of America         45.700497
India                             7.721556
United Kingdom                    4.606610
Canada                            3.802281
Poland                            1.915765
Brazil                            1.886517
Germany                           1.828020
Australia                         1.637906
Russia                            1.491664
Ukraine                           1.301550
Nigeria                           1.228429
Spain                             1.126060
France                            1.096812
Romania                           1.038315
Netherlands (Holland, Europe)     0.950570
...
(122 more countries, each below 1%)
Name: CountryLive, Length: 137, dtype: float64
About 46% of respondents in the sample are US residents. This is leagues ahead of the next highest proportion coming from India at roughly 8% and the UK at roughly 4%.
Based on these results, the company could focus the majority of its advertising budget on the US market since, taking this sample as representative of our target population, it accounts for nearly half of our potential market.
%matplotlib inline
## ggplot2 style remains
## Too many individual country values to visualize well
## So we visualize the top 5 rows here
top_countries[:5].plot.bar()
## Same style parameters
plt.title('Resident Countries of \nSurvey Respondents', y = 1)
## Change the xtick parameters to avoid label overlap but still be
## easily readable; 50 is equivalent to 50 degrees rotation
plt.xticks(rotation=50)
plt.ylabel('Percentage', fontsize=12)
plt.xlabel('Countries', fontsize=12)
plt.ylim([0,100])
plt.show()
Let's also check whether some larger region could function as a similar 'single' market beyond an individual country.
For example, although the EU comprises many different countries, it is characterized by a high degree of cultural exchange, labor migration and educational mobility, with similar services, companies and websites operating throughout the region.
Let's see how big of a market there is in the EU by looking at all respondents from EU countries in the sample. The EU includes:
Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain and Sweden
## Create list of EU countries for use in row selection of survey data
eu =['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic', 'Denmark',
'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Ireland', 'Italy',
'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands', 'Poland', 'Portugal',
'Romania', 'Slovakia', 'Slovenia', 'Spain', 'Sweden']
## Select rows where the CountryLive column value matches one of the elements
## within the eu list above;
## then generate a relative frequency table
eu_countries = job_no_nulls['CountryLive'].isin(eu).value_counts(normalize=True)*100
eu_countries
False    86.255721
True     13.744279
Name: CountryLive, dtype: float64
%matplotlib inline
eu_countries.plot.bar()
## Same style parameters
plt.title('Percentage of EU Residents \namong Survey Respondents', y = 1)
## Change the xtick label and rotation
plt.xticks([0,1], ['Outside EU', 'Within EU'], rotation=0)
plt.ylabel('Percentage', fontsize=12)
plt.xlabel('Countries', fontsize=12)
plt.ylim([0,100])
plt.show()
Even if all respondents from EU countries in our sample are summed together, they still only account for roughly 14% of overall respondents.
Conclusion #2: The US should be our primary market of focus for advertising, given that it is the country of residence for about 46% of survey respondents. The second recommended market would be the EU overall, accounting for 14% of respondents, if a website or advertising medium can be found that extends over most or all of the EU. Otherwise, India would be the second target market: although it only accounts for roughly 8% of respondents, it is the country of residence with the second-highest frequency in the data.
Since the e-learning platform is entirely in English, we will stick with the US and India as the two recommended countries for advertising: English is widely spoken in both (as the dominant native language in the US and a common official language in India), and they have the two highest absolute frequencies of respondents.
One point to keep in mind:
Free Code Camp offers all of its content in English only, which would likely influence the nationalities using it (based on the average English level in a given country) and therefore who is responding to the survey. If the company is only interested in offering courses in English, then it could stick with this sample. However, if it is interested in hosting courses in a variety of languages or using automatic translation in its courses, it would need to use a different sample representing more linguistic communities to draw conclusions about which markets to advertise in.
We saw before that the dataset contains a MoneyForLearning column, showing how much money (in USD) respondents had spent on learning to code from the moment they started until the time of the survey.
The company sells monthly subscriptions to its e-learning website for $59/month.
So it is interested in analyzing how much each respondent spends monthly on coding, on average.
For this analysis, I will include respondents from the two countries selected above (the US and India) as our two highest-potential markets. I will also include the UK and Canada, which come in third and fourth, respectively, in absolute frequency of respondents and are also predominantly English-speaking countries in which the ads would be understandable and attractive.
I will calculate a new column in the dataset using the MoneyForLearning column and dividing by the MonthsProgramming column to get the approximate amount spent monthly by each respondent.
Since I will be dividing one column by another, I first need to check whether either column has characteristics that would interfere with the calculation, including the respective data types and whether the denominator column (MonthsProgramming) contains any zeros.
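Those checks can be sketched as follows (a toy DataFrame with the same column names stands in for the survey data):

```python
import pandas as pd

## Toy stand-in for the survey data; column names match the real dataset
df = pd.DataFrame({
    'MoneyForLearning': [100.0, 0.0, 250.0],
    'MonthsProgramming': [4.0, 0.0, 10.0],
})

## 1) Both columns should have a numeric dtype for the division to work
print(df.dtypes)

## 2) Count zeros in the denominator column; dividing by them would
##    produce inf (or NaN for 0/0)
n_zeros = (df['MonthsProgramming'] == 0).sum()
print(n_zeros)  ## 1 zero in the toy data
```

In the real data, any zeros found this way are what the replacement in the next cell handles.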
## Replace 0 values with 1 in MonthsProgramming to avoid division by 0
job_no_nulls['MonthsProgramming'] = job_no_nulls['MonthsProgramming'].replace(0, 1)
## New column for the amount of money each student spends each month
job_no_nulls['MoneyPerMonth'] = job_no_nulls['MoneyForLearning'] / job_no_nulls['MonthsProgramming']
job_no_nulls['MoneyPerMonth']
[Output truncated: per-respondent MoneyPerMonth values (e.g. 13.33, 200.00, 0.00, NaN, ...) — Name: MoneyPerMonth, Length: 6992, dtype: float64]
We can see above that the MoneyPerMonth column I created has some null values. We'll now remove those.
## Remove the null values from MoneyPerMonth and assign to new variable
money = job_no_nulls[job_no_nulls['MoneyPerMonth'].notnull()]
money['MoneyPerMonth']
[Output truncated: MoneyPerMonth values with nulls removed — Name: MoneyPerMonth, Length: 6317, dtype: float64]
## Remove null values from the CountryLive column, in preparation for
## grouping by country
money = money[money['CountryLive'].notnull()]
## Group the dataset by country and calculate the mean amount of money
## spent per month by students in each country
mpm_mean = money.groupby('CountryLive').mean()
mpm_mean
[Output truncated: table of per-country means for every numeric column (Age, AttendedBootcamp, ..., MoneyPerMonth) — 131 rows × 106 columns. Only the MoneyPerMonth column is used in the next cell.]
## See the average money spent per month per student in the top 4 countries
mpm_mean['MoneyPerMonth'][['United States of America',
'India', 'United Kingdom',
'Canada']].sort_values(ascending=False)
CountryLive
United States of America    227.997996
India                       135.100982
Canada                      113.510961
United Kingdom               45.534443
Name: MoneyPerMonth, dtype: float64
This gives us some surprising results. I expected the countries with the highest average MoneyPerMonth to roughly track per-capita GDP. By that logic, one would expect respondents in the UK and Canada to have more money available for learning each month than respondents in India.
We will generate plots to understand how the average spending per month is distributed among respondents from each country. This may give us a clue as to what is driving Indian respondents' average spending per month to be higher than expected.
import seaborn as sns
## Create separate data set for the top 4 countries identified above
top_4 = money[money['CountryLive'].str.contains(
'United States of America|India|United Kingdom|Canada')]
## Plot the MoneyPerMonth column for the top 4 countries
sns.set_style('darkgrid')
sns.boxplot(y = 'MoneyPerMonth', x = 'CountryLive',
data = top_4)
plt.title('Money Spent Per Month Per Country\n(Distributions)',
fontsize = 16)
plt.ylabel('Money Per Month (US dollars)')
plt.xlabel('Country')
plt.xticks([0,1,2,3],['US', 'UK', 'India', 'Canada']) # avoids tick labels overlap
plt.show()
We can already see some obvious outliers. Anything above $10,000 per month is unrealistic: even for expensive coding coursework like bootcamps, the total cost usually tops out around $9,000. Both India and the US show values of $10,000 per month or above, so I'll first remove all values at or above $10,000.
top_4_under = top_4[top_4['MoneyPerMonth'] < 10000]
top_4_under['MoneyPerMonth'].describe()
count    3906.000000
mean      140.096420
std       555.236533
min         0.000000
25%         0.000000
50%         1.846591
75%        37.500000
max      9000.000000
Name: MoneyPerMonth, dtype: float64
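Worth noting against the $59/month subscription price: the quartiles above show the median respondent spends far less than that. A related per-country view, the share of respondents already spending at least $59/month, could be computed like this (toy DataFrame with the same column names; in the notebook it would run on top_4_under):

```python
import pandas as pd

## Toy stand-in for top_4_under
df = pd.DataFrame({
    'CountryLive': ['United States of America', 'United States of America',
                    'India', 'India', 'Canada', 'United Kingdom'],
    'MoneyPerMonth': [200.0, 10.0, 75.0, 0.0, 59.0, 20.0],
})

## Percentage of respondents per country spending >= $59/month
share_at_price = (df['MoneyPerMonth'] >= 59).groupby(df['CountryLive']).mean() * 100
print(share_at_price.sort_values(ascending=False))
```

This complements the mean, which outliers can inflate, with a figure tied directly to the product's price point.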
Now I will group the data by country again and recompute the mean MoneyPerMonth to see how removing those outliers changed the results.
top_4_mean = top_4_under.groupby('CountryLive').mean()
top_4_mean['MoneyPerMonth'].sort_values(ascending=False)
CountryLive
United States of America    155.459187
India                       113.748387
Canada                      113.510961
United Kingdom               45.534443
Name: MoneyPerMonth, dtype: float64
We can see that India is still on par with Canada and above the UK, which still doesn't match what we would expect based on each country's per-capita GDP.
import seaborn as sns
## Generate boxplot of distributions for the MoneyPerMonth column
sns.set_style('darkgrid')
sns.boxplot(y = 'MoneyPerMonth', x = 'CountryLive',
data = top_4_under)
plt.title('Money Spent Per Month Per Country\n(Distributions)',
fontsize = 16)
plt.ylabel('Money Per Month (US dollars)')
plt.xlabel('Country')
plt.xticks([0,1,2,3],['US', 'UK', 'India', 'Canada']) # avoids tick labels overlap
plt.ylim(0,10000)
plt.show()
Students who spend thousands of dollars per month on learning are most likely either attending a bootcamp or pursuing some other kind of formal education, like university. The lower monthly cost of e-learning platforms is one of their major selling points over formal, in-person education. I could remove respondents who reported very high monthly spending in one of two ways:
I could manually set another, lower spending limit that we think will effectively filter out those respondents with thousands to spend each month, indicating they are most likely participating in a bootcamp or formal education
I could remove respondents from the data set who indicated on the survey that they were participating or had participated in a bootcamp.
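For completeness, option 1 might look something like the sketch below. The 500 USD cutoff is purely illustrative (an assumption, not a value derived from the data), and the toy DataFrame stands in for top_4_under:

```python
import pandas as pd

def cap_monthly_spend(df, limit=500):
    """Keep only respondents spending below `limit` USD per month.

    The default of 500 USD is an illustrative cutoff meant to screen
    out likely bootcamp/formal-education spending; it would need
    justification before being used in the real analysis."""
    return df[df['MoneyPerMonth'] < limit]

# Toy stand-in for top_4_under
toy = pd.DataFrame({'MoneyPerMonth': [0, 25, 80, 450, 1200, 5000]})
print(cap_monthly_spend(toy)['MoneyPerMonth'].tolist())  # [0, 25, 80, 450]
```

The drawback of this approach is that any fixed cutoff is arbitrary, which is why option 2 (filtering on the survey's own bootcamp question) is explored below.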
Let's explore option 2 by looking at the original dataset to see which columns we could use to filter out bootcamp participants.
top_4_under.describe()
## We can see there is an AttendedBootcamp column
Age | AttendedBootcamp | BootcampFinish | BootcampLoanYesNo | BootcampRecommend | ChildrenNumber | CodeEventConferences | CodeEventDjangoGirls | CodeEventFCC | CodeEventGameJam | ... | YouTubeFCC | YouTubeFunFunFunction | YouTubeGoogleDev | YouTubeLearnCode | YouTubeLevelUpTuts | YouTubeMIT | YouTubeMozillaHacks | YouTubeSimplilearn | YouTubeTheNewBoston | MoneyPerMonth | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 3866.000000 | 3889.000000 | 255.000000 | 257.000000 | 257.000000 | 611.000000 | 217.0 | 25.0 | 376.0 | 46.0 | ... | 1463.0 | 244.0 | 631.0 | 561.0 | 264.0 | 808.0 | 85.0 | 43.0 | 646.0 | 3906.000000 |
mean | 28.223487 | 0.066855 | 0.529412 | 0.334630 | 0.770428 | 1.906710 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 140.096420 |
std | 8.989643 | 0.249803 | 0.500116 | 0.472782 | 0.421378 | 0.974833 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 555.236533 |
min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.000000 |
25% | 22.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.000000 |
50% | 27.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 2.000000 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.846591 |
75% | 33.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 2.000000 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 37.500000 |
max | 71.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 7.000000 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 9000.000000 |
8 rows × 106 columns
top_4_under['AttendedBootcamp'].value_counts()
## This looks like a binary response column; 0=no and 1=yes
0.0 3629 1.0 260 Name: AttendedBootcamp, dtype: int64
This indicates that 260 respondents said they did participate in a bootcamp. Let's look at how much money these respondents are spending per month to see whether their answers could be skewing our data.
bootcamps = top_4_under[top_4_under['AttendedBootcamp'] == 1.0]
bootcamps['MoneyPerMonth'].describe()
count 260.000000 mean 1031.457503 std 1459.192236 min 0.000000 25% 82.500000 50% 500.000000 75% 1297.619048 max 9000.000000 Name: MoneyPerMonth, dtype: float64
top_4_under['MoneyPerMonth'].describe()
count 3906.000000 mean 140.096420 std 555.236533 min 0.000000 25% 0.000000 50% 1.846591 75% 37.500000 max 9000.000000 Name: MoneyPerMonth, dtype: float64
Comparing the descriptive statistics for MoneyPerMonth between bootcamp attendees and all respondents, it seems likely the bootcamp attendees are skewing the MoneyPerMonth results: their mean is over 1,000 USD per month, while the mean across all respondents is only about 140 USD.
I'll now remove the respondents who attended bootcamps and see how it changes the results for MoneyPerMonth.
## Select only respondents who did not attend a bootcamp
top4_no_bc = top_4_under[top_4_under['AttendedBootcamp'] == 0.0]
## Generate new boxplots
sns.set_style('darkgrid')
sns.boxplot(y = 'MoneyPerMonth', x = 'CountryLive',
data = top4_no_bc)
plt.title('Money Spent Per Month Per Country\n(Distributions)',
fontsize = 16)
plt.ylabel('Money Per Month (US dollars)')
plt.xlabel('Country')
plt.xticks([0,1,2,3],['US', 'UK', 'India', 'Canada']) # avoids tick labels overlap
plt.ylim(0,10000)
plt.show()
Although the variation in our data is gradually lessening and looking more realistic, the data for India still seems unusual given its GDP per capita relative to the other countries. We can also see in the box plot that the highest data points on the upper whisker are widely dispersed from one another compared to the majority of points near the box itself; they do not match the rest of the data well.
We'll take a look at the India data points at or above 2,000 USD and then remove these outliers.
## Examine India data points at 2000 or above
india_outliers = top4_no_bc[(top4_no_bc['CountryLive'] == 'India') & (top4_no_bc['MoneyPerMonth'] >= 2000)]
india_outliers
Age | AttendedBootcamp | BootcampFinish | BootcampLoanYesNo | BootcampName | BootcampRecommend | ChildrenNumber | CityPopulation | CodeEventConferences | CodeEventDjangoGirls | ... | YouTubeFunFunFunction | YouTubeGoogleDev | YouTubeLearnCode | YouTubeLevelUpTuts | YouTubeMIT | YouTubeMozillaHacks | YouTubeOther | YouTubeSimplilearn | YouTubeTheNewBoston | MoneyPerMonth | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1728 | 24.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5000.000000 |
1755 | 20.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | NaN | ... | NaN | NaN | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | 3333.333333 |
7989 | 28.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | 1.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5000.000000 |
8126 | 22.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | NaN | ... | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | 5000.000000 |
9410 | 38.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | 1.0 | NaN | ... | 1.0 | 1.0 | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | 2000.000000 |
12451 | 24.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | NaN | NaN | ... | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 2000.000000 |
15587 | 27.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | NaN | ... | NaN | 1.0 | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | 4166.666667 |
7 rows × 137 columns
## Remove the outliers for India
top4_no_outliers = top4_no_bc.drop(india_outliers.index) #Passes the row labels of Indian outliers
# to the drop function
## Generate boxplots
sns.set_style('darkgrid')
sns.boxplot(y = 'MoneyPerMonth', x = 'CountryLive',
data = top4_no_outliers)
plt.title('Money Spent Per Month Per Country\n(Distributions)',
fontsize = 16)
plt.ylabel('Money Per Month (US dollars)')
plt.xlabel('Country')
plt.xticks([0,1,2,3],['US', 'UK', 'India', 'Canada']) # avoids tick labels overlap
plt.ylim(0,10000)
plt.show()
The spread of these boxplots looks more coherent now. Let's regroup the data by country to see what the mean MoneyPerMonth spent is now.
top4_new_mean = top4_no_outliers.groupby('CountryLive').mean()
top4_new_mean['MoneyPerMonth'].sort_values(ascending=False)
CountryLive United States of America 76.350634 Canada 64.127841 India 53.923528 United Kingdom 34.468329 Name: MoneyPerMonth, dtype: float64
Compared to the first grouping by country, the US and Canada now show the highest average spending per month by students, with India in 3rd place and the UK in 4th. Previously, before removing outliers, India was equal to Canada in monthly spending. While it is still a bit surprising that Indian respondents report more available monthly spending on learning than UK respondents, we have already removed many outliers, so we can reasonably take this trend at face value. We also don't want to let our assumptions about how the world works override what the data actually shows.
We can now safely say that the top market we want to advertise in is still the US. We saw before that the majority of respondents in the survey were from the US, and we now see above that US respondents have the highest available monthly spending, on average.
Which should be our second national target market? Canada has the second-highest average monthly spending, at roughly 64 USD. Given that the monthly price of our e-learning platform is 59 USD, we would likely do well advertising in Canada, since our price is below the average willingness to spend of beginner coders there. However, let's look again at how many respondents come from Canada versus the other countries.
top4_no_outliers['CountryLive'].value_counts(normalize=True)*100
United States of America 73.826615 India 12.313639 United Kingdom 7.509663 Canada 6.350083 Name: CountryLive, dtype: float64
Although Canada has the second-highest average monthly spending, it has the lowest percentage of respondents among our top 4 countries. India, on the other hand, still shows an average willingness to spend (roughly 54 USD) relatively close to our monthly price and has a much higher percentage of respondents (~12%) than Canada or the UK.
For this reason, we stand to gain more potential subscribers by choosing India as our second advertising market than by choosing Canada.
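To make this trade-off explicit, the two signals can be put side by side. The figures are copied (rounded) from the outputs above; the composite "score" of respondent share times mean spend is only an illustrative way of combining them, not a validated metric:

```python
import pandas as pd

# Respondent share (%) and mean monthly spend (USD), rounded from the outputs above
markets = pd.DataFrame({
    'share_pct': [73.83, 12.31, 7.51, 6.35],
    'mean_spend': [76.35, 53.92, 34.47, 64.13],
}, index=['US', 'India', 'UK', 'Canada'])

# Illustrative composite: respondent share weighted by mean spend
markets['score'] = markets['share_pct'] * markets['mean_spend']
print(markets.sort_values('score', ascending=False))
```

Under this rough weighting, India edges out Canada for second place, matching the reasoning above: its larger pool of potential subscribers outweighs Canada's modestly higher average spend.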
We will advertise our e-learning platform to learners in the fields of web and mobile development in both the US and Indian national markets.
We can see above that 73% of the respondents in our top 4 countries come from the US, while only 12% come from the next best market, India.
I propose using these percentages as a simple rule for allocating the advertising budget: 70% of the budget to the US market (in line with the percentage of respondents from the US) and the remaining 30% to the Indian market.
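A minimal sketch of this proportional allocation rule follows; the 100,000 USD total is a placeholder amount for illustration only, not a real budget figure:

```python
def split_budget(total, weights):
    """Allocate `total` across markets proportionally to `weights`.

    `weights` maps market name -> relative weight (e.g. percentage).
    """
    weight_sum = sum(weights.values())
    return {market: round(total * w / weight_sum, 2)
            for market, w in weights.items()}

# The 70/30 weights mirror the proposal above
print(split_budget(100_000, {'US': 70, 'India': 30}))
# {'US': 70000.0, 'India': 30000.0}
```

Because the function normalizes by the weight sum, the same helper would work with the raw respondent percentages (73.83 and 12.31) instead of the rounded 70/30 split.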
However:
It might be useful to simply *pass the analysis on to the marketing or business development team for further decision making on the budget*, as other important factors to consider when making this decision are:
The costs of advertising in the US vs. the Indian market. How much will we have to pay per ad in each? If advertising in the US market is more expensive, should we allocate even more of our budget there to ensure we can create enough marketing material to reach our audience?
What types of advertisements (e.g. web banners, video, social media) work best in each market? How will relying on different types affect the cost of advertising in each market?
How many competitors do we have in each market? And what is the relative likelihood that we can outcompete them with this budget?
Marketing or business development would likely have the most available data and experience to answer these questions.
Is there any further useful information we could get out of this data to help marketing or business development make their decision? Let's see.
## Look at other information provided in the dataset's columns
list(top4_no_outliers.columns)
['Age', 'AttendedBootcamp', 'BootcampFinish', 'BootcampLoanYesNo', 'BootcampName', 'BootcampRecommend', 'ChildrenNumber', 'CityPopulation', 'CodeEventConferences', 'CodeEventDjangoGirls', 'CodeEventFCC', 'CodeEventGameJam', 'CodeEventGirlDev', 'CodeEventHackathons', 'CodeEventMeetup', 'CodeEventNodeSchool', 'CodeEventNone', 'CodeEventOther', 'CodeEventRailsBridge', 'CodeEventRailsGirls', 'CodeEventStartUpWknd', 'CodeEventWkdBootcamps', 'CodeEventWomenCode', 'CodeEventWorkshops', 'CommuteTime', 'CountryCitizen', 'CountryLive', 'EmploymentField', 'EmploymentFieldOther', 'EmploymentStatus', 'EmploymentStatusOther', 'ExpectedEarning', 'FinanciallySupporting', 'FirstDevJob', 'Gender', 'GenderOther', 'HasChildren', 'HasDebt', 'HasFinancialDependents', 'HasHighSpdInternet', 'HasHomeMortgage', 'HasServedInMilitary', 'HasStudentDebt', 'HomeMortgageOwe', 'HoursLearning', 'ID.x', 'ID.y', 'Income', 'IsEthnicMinority', 'IsReceiveDisabilitiesBenefits', 'IsSoftwareDev', 'IsUnderEmployed', 'JobApplyWhen', 'JobInterestBackEnd', 'JobInterestDataEngr', 'JobInterestDataSci', 'JobInterestDevOps', 'JobInterestFrontEnd', 'JobInterestFullStack', 'JobInterestGameDev', 'JobInterestInfoSec', 'JobInterestMobile', 'JobInterestOther', 'JobInterestProjMngr', 'JobInterestQAEngr', 'JobInterestUX', 'JobPref', 'JobRelocateYesNo', 'JobRoleInterest', 'JobWherePref', 'LanguageAtHome', 'MaritalStatus', 'MoneyForLearning', 'MonthsProgramming', 'NetworkID', 'Part1EndTime', 'Part1StartTime', 'Part2EndTime', 'Part2StartTime', 'PodcastChangeLog', 'PodcastCodeNewbie', 'PodcastCodePen', 'PodcastDevTea', 'PodcastDotNET', 'PodcastGiantRobots', 'PodcastJSAir', 'PodcastJSJabber', 'PodcastNone', 'PodcastOther', 'PodcastProgThrowdown', 'PodcastRubyRogues', 'PodcastSEDaily', 'PodcastSERadio', 'PodcastShopTalk', 'PodcastTalkPython', 'PodcastTheWebAhead', 'ResourceCodecademy', 'ResourceCodeWars', 'ResourceCoursera', 'ResourceCSS', 'ResourceEdX', 'ResourceEgghead', 'ResourceFCC', 'ResourceHackerRank', 'ResourceKA', 
'ResourceLynda', 'ResourceMDN', 'ResourceOdinProj', 'ResourceOther', 'ResourcePluralSight', 'ResourceSkillcrush', 'ResourceSO', 'ResourceTreehouse', 'ResourceUdacity', 'ResourceUdemy', 'ResourceW3S', 'SchoolDegree', 'SchoolMajor', 'StudentDebtOwe', 'YouTubeCodeCourse', 'YouTubeCodingTrain', 'YouTubeCodingTut360', 'YouTubeComputerphile', 'YouTubeDerekBanas', 'YouTubeDevTips', 'YouTubeEngineeredTruth', 'YouTubeFCC', 'YouTubeFunFunFunction', 'YouTubeGoogleDev', 'YouTubeLearnCode', 'YouTubeLevelUpTuts', 'YouTubeMIT', 'YouTubeMozillaHacks', 'YouTubeOther', 'YouTubeSimplilearn', 'YouTubeTheNewBoston', 'MoneyPerMonth']
Some further avenues for analysis could be:
Exploring the average age of respondents in our target markets
Exploring the gender balance of respondents in our target markets
We see some columns referencing whether respondents have attended different coding events, listen to different podcasts or have used different coding resources. By looking at which one of these media sources is most popular and/or which is most popular within each category, I might be able to identify some excellent advertising platforms.
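As a sketch of how that media-popularity check could work: the describe() output above suggests these columns hold 1.0 when a respondent used the source and NaN otherwise, so popularity reduces to counting non-null entries per column. The toy DataFrame below stands in for the real dataset:

```python
import pandas as pd

def rank_media(df, prefix):
    """Rank indicator-style columns (1.0 = used, NaN = not used)
    whose names start with `prefix`, by number of respondents who
    used each source."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    return df[cols].notnull().sum().sort_values(ascending=False)

# Toy stand-in for top4_no_outliers
toy = pd.DataFrame({
    'PodcastCodeNewbie': [1.0, None, 1.0],
    'PodcastChangeLog':  [None, None, 1.0],
    'Age': [25, 31, 40],
})
print(rank_media(toy, 'Podcast'))
```

The same call with prefix `'Resource'`, `'YouTube'`, or `'CodeEvent'` would rank the other media categories listed in the column dump above.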
Since we know there is a large gender gap in many tech fields still today, particularly in web and mobile development, let's look at gender first.
## Use dataset with null values in JobRole column removed
## Remove null values from Gender column
gender = job_no_nulls[job_no_nulls['Gender'].notnull()]
## First look at unique values in the gender column
gender['Gender'].value_counts()
male 5221 female 1572 trans 42 genderqueer 38 agender 18 Name: Gender, dtype: int64
## Look at unique values in the GenderOther column
gender['GenderOther'].value_counts()
Series([], Name: GenderOther, dtype: int64)
## Closer look at GenderOther
gender['GenderOther']
1 NaN 2 NaN 3 NaN 4 NaN 6 NaN 9 NaN 11 NaN 13 NaN 14 NaN 15 NaN 16 NaN 18 NaN 19 NaN 21 NaN 22 NaN 23 NaN 28 NaN 29 NaN 30 NaN 31 NaN 32 NaN 33 NaN 34 NaN 35 NaN 37 NaN 40 NaN 41 NaN 42 NaN 43 NaN 52 NaN ... 18080 NaN 18081 NaN 18088 NaN 18089 NaN 18090 NaN 18093 NaN 18097 NaN 18098 NaN 18099 NaN 18107 NaN 18111 NaN 18112 NaN 18113 NaN 18118 NaN 18125 NaN 18129 NaN 18130 NaN 18131 NaN 18151 NaN 18153 NaN 18154 NaN 18155 NaN 18156 NaN 18157 NaN 18160 NaN 18161 NaN 18162 NaN 18163 NaN 18171 NaN 18174 NaN Name: GenderOther, Length: 6891, dtype: object
The GenderOther column seems to contain nothing but null values, indicating it wasn't used by respondents. Maybe it wasn't relevant to the majority of respondents who already answered the first 'Gender' selection question, or maybe the question was confusing or inaccessible. Regardless, I will stick with just the Gender column for further analysis.
We can see from looking at the unique values for that column that men account for the majority of respondents. Let's look more closely at the relative amounts though.
## Show proportion of each unique value of the total
gender['Gender'].value_counts(normalize=True)*100
male 75.765491 female 22.812364 trans 0.609491 genderqueer 0.551444 agender 0.261210 Name: Gender, dtype: float64
Men account for a whopping 76% of those who responded. Women, trans and other non-binary genders together only account for approximately 25% of respondents. Female respondents alone account for only 23%.
This is a clear gender gap. We're well aware that women are less represented in tech fields, which leads to a variety of other problems, such as technology that doesn't meet the needs of women, harassment at work, a wider gender-based salary gap, and others. Let's say the company here wants to play a role in helping to correct this by encouraging women and underrepresented genders to learn programming and other tech skills on their platform so that they can move into tech careers.
I'll now explore job role interests and whether there are any significant differences in interest between men and women. I'll first isolate the male and female respondents with their job roles in separate data frames, and then look at which job roles have the most interest proportionally. Let's look first at what job roles women are most interested in.
women_jobroles = gender[(gender['Gender'] == 'female') & (gender['JobRoleInterest'].notnull())]
women_jobroles['JobRoleInterest'].value_counts(normalize=True)*100
Full-Stack Web Developer 9.351145 Front-End Web Developer 8.778626 Data Scientist 2.226463 User Experience Designer 1.781170 Mobile Developer 1.463104 Back-End Web Developer 1.399491 Information Security 1.335878 User Experience Designer, Front-End Web Developer 1.335878 Game Developer 1.145038 Front-End Web Developer, User Experience Designer 0.954198 Front-End Web Developer, Full-Stack Web Developer 0.890585 Data Engineer 0.699746 Product Manager 0.636132 Back-End Web Developer, Front-End Web Developer, Full-Stack Web Developer 0.636132 Front-End Web Developer, Back-End Web Developer, Full-Stack Web Developer 0.572519 Full-Stack Web Developer, Front-End Web Developer 0.508906 Back-End Web Developer, Full-Stack Web Developer, Front-End Web Developer 0.508906 Front-End Web Developer, Mobile Developer, Full-Stack Web Developer 0.445293 DevOps / SysAdmin 0.445293 Full-Stack Web Developer, Front-End Web Developer, Back-End Web Developer 0.445293 User Experience Designer, Full-Stack Web Developer, Front-End Web Developer 0.445293 Mobile Developer, Full-Stack Web Developer 0.381679 Back-End Web Developer, Front-End Web Developer 0.381679 Back-End Web Developer, Data Scientist 0.381679 Game Developer, Mobile Developer 0.381679 Back-End Web Developer, Full-Stack Web Developer 0.381679 Mobile Developer, Front-End Web Developer 0.381679 Front-End Web Developer, Full-Stack Web Developer, User Experience Designer 0.318066 Full-Stack Web Developer, Back-End Web Developer 0.318066 Full-Stack Web Developer, User Experience Designer, Front-End Web Developer 0.318066 ... 
Mobile Developer, Game Developer, User Experience Designer, Product Manager, Full-Stack Web Developer, Front-End Web Developer 0.063613 Mobile Developer, User Experience Designer, Full-Stack Web Developer, Back-End Web Developer, Front-End Web Developer 0.063613 Front-End Web Developer, User Experience Designer, Data Scientist, Full-Stack Web Developer, Back-End Web Developer 0.063613 Product Manager, Game Developer, Front-End Web Developer, User Experience Designer 0.063613 User Experience Designer, Front-End Web Developer, Data Engineer, Back-End Web Developer, Game Developer, Mobile Developer, Full-Stack Web Developer 0.063613 Mobile Developer, Information Security, Back-End Web Developer, Front-End Web Developer, Game Developer, User Experience Designer 0.063613 Mobile Developer, Product Manager, Full-Stack Web Developer, Front-End Web Developer, Back-End Web Developer, Game Developer 0.063613 User Experience Designer, Front-End Web Developer, Full-Stack Web Developer, Mobile Developer, Back-End Web Developer 0.063613 User Experience Designer, Game Developer, Data Scientist, Full-Stack Web Developer 0.063613 Data Scientist, Game Developer, Information Security, Mobile Developer, Front-End Web Developer 0.063613 Mobile Developer, User Experience Designer, Data Scientist, Front-End Web Developer, Quality Assurance Engineer, DevOps / SysAdmin, Full-Stack Web Developer, Data Engineer, Information Security, Back-End Web Developer 0.063613 User Experience Designer, Front-End Web Developer, Mobile Developer, Game Developer 0.063613 Front-End Web Developer, Product Manager, Mobile Developer, Data Scientist, Data Engineer, User Experience Designer 0.063613 User Experience Designer, Game Developer, Full-Stack Web Developer, Front-End Web Developer 0.063613 Game Developer, Front-End Web Developer, Product Manager, Mobile Developer, Back-End Web Developer 0.063613 Game Developer, Mobile Developer, Front-End Web Developer 0.063613 Full-Stack Web Developer, Mobile Developer, 
Game Developer, Front-End Web Developer, Back-End Web Developer 0.063613 Mobile Developer, Front-End Web Developer, Full-Stack Web Developer, Data Scientist, Product Manager, Back-End Web Developer 0.063613 User Experience Designer, Mobile Developer, Back-End Web Developer, Front-End Web Developer, DevOps / SysAdmin, Full-Stack Web Developer 0.063613 Front-End Web Developer, Full-Stack Web Developer, User Experience Designer, Quality Assurance Engineer, Back-End Web Developer 0.063613 Quality Assurance Engineer, Front-End Web Developer, Product Manager, User Experience Designer 0.063613 Back-End Web Developer, Mobile Developer, Front-End Web Developer 0.063613 Back-End Web Developer, Mobile Developer, Game Developer, DevOps / SysAdmin, Front-End Web Developer 0.063613 I don't know yet! 0.063613 Product Manager, Back-End Web Developer, Full-Stack Web Developer, Mobile Developer, Front-End Web Developer 0.063613 Front-End Web Developer, Information Security, Game Developer, Data Scientist 0.063613 Data Engineer, DevOps / SysAdmin 0.063613 Full-Stack Web Developer, Mobile Developer, Game Developer, Back-End Web Developer, Front-End Web Developer 0.063613 Mobile Developer, Product Manager, Information Security, User Experience Designer 0.063613 Information Security, Full-Stack Web Developer, Front-End Web Developer, User Experience Designer, Back-End Web Developer, Mobile Developer 0.063613 Name: JobRoleInterest, Length: 874, dtype: float64
We can see here that, similar to the full dataset containing all genders, front-end and full-stack web development hold the top 2 positions, and data science is still within the top 4. However, we can also see that User Experience Designer is in the top 4. Much earlier in this analysis, when we looked at the full dataset containing respondents of all genders, we barely saw rows where User Experience Designer was selected.
Next, I'll isolate male respondents to see their specific job role interests.
## Create dataset isolating respondents who identify as male
men_jobroles = gender[(gender['Gender'] == 'male') & (gender['JobRoleInterest'].notnull())]
## Generate frequency table
men_jobroles['JobRoleInterest'].value_counts(normalize=True)*100
Full-Stack Web Developer 12.679563 Front-End Web Developer 5.669412 Back-End Web Developer 2.240950 Data Scientist 2.106876 Game Developer 1.781268 Mobile Developer 1.685501 Information Security 1.340739 Full-Stack Web Developer, Front-End Web Developer 1.072591 Product Manager 0.823597 Data Engineer 0.804444 Front-End Web Developer, Full-Stack Web Developer 0.785290 Front-End Web Developer, Back-End Web Developer, Full-Stack Web Developer 0.536296 Front-End Web Developer, Full-Stack Web Developer, Back-End Web Developer 0.517142 DevOps / SysAdmin 0.517142 Back-End Web Developer, Full-Stack Web Developer, Front-End Web Developer 0.517142 Back-End Web Developer, Front-End Web Developer, Full-Stack Web Developer 0.497989 Full-Stack Web Developer, Front-End Web Developer, Back-End Web Developer 0.459682 Full-Stack Web Developer, Mobile Developer 0.440529 User Experience Designer 0.440529 User Experience Designer, Front-End Web Developer 0.402222 Back-End Web Developer, Full-Stack Web Developer 0.383068 Full-Stack Web Developer, Back-End Web Developer 0.383068 Full-Stack Web Developer, Back-End Web Developer, Front-End Web Developer 0.325608 Mobile Developer, Game Developer 0.268148 Data Scientist, Data Engineer 0.268148 Data Engineer, Data Scientist 0.268148 Full-Stack Web Developer, Game Developer 0.268148 Front-End Web Developer, User Experience Designer 0.268148 Back-End Web Developer, Front-End Web Developer 0.268148 Data Scientist, Full-Stack Web Developer 0.248994 ... 
Mobile Developer, Back-End Web Developer, Front-End Web Developer, User Experience Designer, Full-Stack Web Developer, Information Security 0.019153 Full-Stack Web Developer, Game Developer, Back-End Web Developer, User Experience Designer 0.019153 Front-End Web Developer, DevOps / SysAdmin, Back-End Web Developer, Full-Stack Web Developer, Data Scientist, Mobile Developer, Data Engineer 0.019153 Data Engineer, Data Scientist, Product Manager, Back-End Web Developer, Quality Assurance Engineer 0.019153 Back-End Web Developer, Game Developer, Front-End Web Developer, Full-Stack Web Developer, Mobile Developer, Quality Assurance Engineer 0.019153 Data Scientist, Full-Stack Web Developer, DevOps / SysAdmin, Data Engineer 0.019153 Information Security, Back-End Web Developer, Mobile Developer, Full-Stack Web Developer, Data Scientist 0.019153 Front-End Web Developer, Full-Stack Web Developer, Data Engineer, Game Developer, Data Scientist, Back-End Web Developer 0.019153 Mobile Developer, Back-End Web Developer, Front-End Web Developer, Full-Stack Web Developer, Product Manager 0.019153 Quality Assurance Engineer, Back-End Web Developer, Data Engineer, Game Developer 0.019153 Game Developer, Mobile Developer, Full-Stack Web Developer, Front-End Web Developer, DevOps / SysAdmin 0.019153 Back-End Web Developer, Front-End Web Developer, Full-Stack Web Developer, DevOps / SysAdmin, Information Security 0.019153 Information Security, Data Engineer, Data Scientist, Front-End Web Developer, Back-End Web Developer, Game Developer, DevOps / SysAdmin, Full-Stack Web Developer 0.019153 Back-End Web Developer, Game Developer, Full-Stack Web Developer, Data Engineer 0.019153 Full-Stack Web Developer, Game Developer, Back-End Web Developer, Front-End Web Developer, DevOps / SysAdmin 0.019153 DevOps / SysAdmin, Front-End Web Developer, Back-End Web Developer, Information Security, Full-Stack Web Developer 0.019153 Information Security, Product Manager, Data Scientist 0.019153 Data 
Scientist, Back-End Web Developer, Product Manager, DevOps / SysAdmin, Mobile Developer 0.019153 Product Manager, Full-Stack Web Developer, Back-End Web Developer, Mobile Developer 0.019153 Back-End Web Developer, Information Security, DevOps / SysAdmin, Full-Stack Web Developer 0.019153 Game Developer, Data Engineer, Back-End Web Developer, Information Security 0.019153 DevOps / SysAdmin, Back-End Web Developer, Front-End Web Developer, Full-Stack Web Developer, Information Security 0.019153 Front-End Web Developer, Mobile Developer, User Experience Designer, Game Developer 0.019153 Data Scientist, Data Engineer, Back-End Web Developer, Front-End Web Developer, Full-Stack Web Developer 0.019153 Full-Stack Web Developer, Data Scientist, Mobile Developer, Game Developer, Information Security, Back-End Web Developer, Front-End Web Developer, Data Engineer 0.019153 Front-End Web Developer, User Experience Designer, Mobile Developer, Information Security, Back-End Web Developer, Full-Stack Web Developer 0.019153 Front-End Web Developer, Data Engineer, Full-Stack Web Developer, Data Scientist, Back-End Web Developer 0.019153 Data Scientist, Data Engineer, DevOps / SysAdmin, Full-Stack Web Developer, Back-End Web Developer 0.019153 Data Engineer, Front-End Web Developer, Full-Stack Web Developer, Back-End Web Developer, Information Security, Mobile Developer, Data Scientist 0.019153 DevOps / SysAdmin, Front-End Web Developer, Full-Stack Web Developer, Game Developer, Mobile Developer, Back-End Web Developer, Data Engineer, User Experience Designer 0.019153 Name: JobRoleInterest, Length: 2514, dtype: float64
The frequency table for male respondents shows that web development, specifically full-stack and front-end development, holds the top choices among single-selection answers, just as for female respondents.
One thing that stands out when comparing the frequency tables of job role interests between male and female respondents is the relative interest in the Back-End Web Developer role, which shows the reverse pattern from UX Design. In the frequency table for male respondents, Back-End Web Developer is the number 3 job role among single selections in the JobRoleInterest column, whereas for female respondents it takes only 6th place among single selections.
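A wrinkle in the frequency tables above is that multi-select answers are counted separately for every ordering, e.g. "Data Scientist, Data Engineer" and "Data Engineer, Data Scientist" appear as distinct rows. One possible normalization (a sketch, not part of the original analysis) sorts each answer's selections before counting:

```python
import pandas as pd

def normalize_multiselect(s):
    """Sort the comma-separated selections in each answer so that
    answer order doesn't split what is logically the same combination."""
    return s.dropna().apply(
        lambda v: ', '.join(sorted(part.strip() for part in v.split(',')))
    )

answers = pd.Series(['Data Scientist, Data Engineer',
                     'Data Engineer, Data Scientist',
                     'Full-Stack Web Developer'])
print(normalize_multiselect(answers).value_counts())
```

With this normalization, the two order variants collapse into a single row with a count of 2, giving a truer picture of how popular each combination is.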
Next, I'll get a fuller picture of the interest in different job roles by selecting all rows, including the multi-select answers, containing the names of certain roles. I'll then graphically compare the popularity of different roles between male and female respondents.
First, I'll check whether web development really does have relatively equal interest among men and women in our sample.
## Isolate rows indicating interest in web development for women
women_web = women_jobroles['JobRoleInterest'].str.contains('Web Developer')
## Generate frequency table for relative interest
women_web_freq = women_web.value_counts(normalize=True)*100
women_web_freq
True 81.615776 False 18.384224 Name: JobRoleInterest, dtype: float64
## Bar plot in matplotlib, still with ggplot style
women_web_freq.plot.bar()
## Set plot parameters
plt.title('Percentage of Women Interested in \nWeb Development')
plt.xticks([0,1],['Web Dev','Other'], rotation=0)
plt.xlabel('Job Role', fontsize=12)
plt.ylabel('Percentage', fontsize=12)
plt.ylim([0,100])
plt.show()
## How many men are interested in web development compared to women?
men_web = men_jobroles['JobRoleInterest'].str.contains('Web Developer')
men_web_freq = men_web.value_counts(normalize=True)*100
men_web_freq
#Roughly 82% of women showed some interest in web development
#For men, it's roughly 83%
True 82.991764 False 17.008236 Name: JobRoleInterest, dtype: float64
## Plot a comparison of the relative percentages of interest in web dev
## Create new data set from relative frequencies for web dev interest
web_data = {'Men': men_web_freq,
'Women': women_web_freq}
web_compare = pd.concat(web_data, axis=1)
web_compare
Men | Women | |
---|---|---|
True | 82.991764 | 81.615776 |
False | 17.008236 | 18.384224 |
## Create grouped bar chart
web_compare.plot.bar()
plt.title('Comparative Gender Differences in \nWeb Development Interest')
plt.xlabel('Interested- True or False?')
plt.ylabel('Percentage')
plt.ylim([0,100])
plt.legend()
plt.show()
The bar graph clearly shows that interest in working in web development is almost equal between men and women.
Next, I'll look at relative interest between genders in UX design.
## Isolate responses containing 'User Experience Designer' in the job role interest column
women_ux = women_jobroles['JobRoleInterest'].str.contains('User Experience Designer')
## Generate frequency table
women_ux_freq = women_ux.value_counts(normalize=True)*100
women_ux_freq
False 68.956743 True 31.043257 Name: JobRoleInterest, dtype: float64
## Generate bar plot
women_ux_freq.plot.bar()
plt.title('Women Interested in \nUX Design')
plt.xticks([0,1],['Other','UX Design'], rotation=0)
plt.xlabel('Job Role', fontsize=12)
plt.ylabel('Percentage', fontsize=12)
plt.ylim([0,100])
plt.show()
It looks like about 31% of women who responded indicated some interest in User Experience Design. While not nearly as high as interest in Web Development, it stands out because in the full dataset containing all genders, UX design did not appear anywhere near the top results for relative interest. This is why we did not explore this job role before.
Logically, this must mean that most men showed little interest in UX design, skewing the results. Let's look at how many men are interested in UX design.
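This intuition can be checked with simple weighted-average arithmetic: the overall interest rate is a weighted mean of the per-gender rates, weighted by group size. The group sizes below are hypothetical; only the rates come from the analysis:

```python
## Hypothetical respondent counts; male respondents outnumber female
men_n, women_n = 5000, 1400
## UX interest rates reported in this analysis (~17.9% men, ~31% women)
men_rate, women_rate = 0.179, 0.310

## Overall rate is pulled toward the larger group's rate
overall = (men_n * men_rate + women_n * women_rate) / (men_n + women_n)
print(round(overall * 100, 1))  # sits much closer to the men's 17.9%
```

Because men dominate the sample, the combined percentage lands near the men's rate, which is why UX design never surfaced in the all-genders analysis.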
## How many men are interested in user experience design compared to women?
men_ux = men_jobroles['JobRoleInterest'].str.contains('User Experience Designer')
men_ux_freq = men_ux.value_counts(normalize=True)*100
men_ux_freq
## Only about 18% of men showed an interest in UX design, compared to 31% of women
False    82.149014
True     17.850986
Name: JobRoleInterest, dtype: float64
## Plotting differences in interest in UX design between men & women
## Create new data set from UX relative frequencies
ux_data = {'Men': men_ux_freq,
'Women': women_ux_freq}
ux_compare = pd.concat(ux_data, axis=1)
ux_compare
## Create grouped bar chart
ux_compare.plot.bar()
plt.title('Comparative Gender Differences in \nUX Design Interest')
plt.xlabel('Interested- True or False?')
plt.ylabel('Percentage')
plt.ylim([0,100])
plt.legend()
plt.show()
The bar graph above shows that interest in UX design is not especially high for either gender, with neither group reaching 50%. However, significantly more women than men expressed interest in UX design.
Let's see if there are differences in men and women interested in back-end development.
## Isolate women who listed Back-End Web Development in JobRoleInterest column
women_backend = women_jobroles['JobRoleInterest'].str.contains('Back-End Web Developer')
## Generate frequency table
women_backend_freq = women_backend.value_counts(normalize=True)*100
women_backend_freq
## Roughly 34% of female respondents showed interest in back-end development
False    66.284987
True     33.715013
Name: JobRoleInterest, dtype: float64
## Isolate men who listed Back-End Web Development in JobRoleInterest column
men_backend = men_jobroles['JobRoleInterest'].str.contains('Back-End Web Developer')
## Generate frequency table
men_backend_freq = men_backend.value_counts(normalize=True)*100
men_backend_freq
## Roughly 41% of male respondents showed interest in back-end development
False    58.762689
True     41.237311
Name: JobRoleInterest, dtype: float64
## Plotting differences in interest in back-end development between men & women
## Create new data set from back-end relative frequencies
backend_data = {'Men': men_backend_freq,
'Women': women_backend_freq}
backend_compare = pd.concat(backend_data, axis=1)
## Create grouped bar chart
backend_compare.plot.bar()
plt.title('Comparative Gender Differences in \nBack-End Development Interest')
plt.xlabel('Interested- True or False?')
plt.ylabel('Percentage')
plt.ylim([0,100])
plt.legend()
plt.show()
Interest in back-end development shows the reverse pattern from UX Design. While not a top choice for either gender, significantly more men show interest in back-end development than women.
We can see a few obvious differences by comparing the frequency tables for job role interest between men and women.
Although web development is the top field of interest for both men and women, at roughly 83% and 82% respectively, far more women than men are interested in UX design: 31% of women showed an interest compared to only about 18% of men.
Conversely, back-end web development draws more men than women, with 41% of men showing interest compared to only 34% of women.
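The three role comparisons can be collected into a single summary table. The percentages below are the (rounded) values from the frequency tables above:

```python
import pandas as pd

## Interest percentages reported above, rounded to one decimal place
summary = pd.DataFrame(
    {'Men':   [83.0, 17.9, 41.2],
     'Women': [81.6, 31.0, 33.7]},
    index=['Web Development', 'UX Design', 'Back-End Development'])

## Positive gap = more women interested; negative = more men
summary['Gap (Women - Men)'] = summary['Women'] - summary['Men']
print(summary)
```

Laying the gaps side by side makes the mirror-image pattern obvious: web development is near parity, while UX design and back-end development skew in opposite directions.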
This indicates that if the company wants to encourage more women to use its platform, one potential way of doing so would be to offer some courses in user experience design, or courses that combine elements of UX design and web development and/or data science.
While it's not necessarily important for our immediate business decisions, this also suggests that the tech industry as a whole might have a gender gap, particularly in back-end web development. It would be interesting to investigate why fewer women are interested in pursuing back-end development and why so few men are interested in pursuing UX design. This could also have implications for average earnings of men and women in tech, as the average salary for back-end developers is often significantly higher than for UX designers.
While this is an interesting point to consider, it is outside the scope of the current goal of the analysis.
Now that we know what proportion of our sample identifies as women and which fields women are most interested in, it would also be interesting to check whether the women in our sample are mostly located in the same top 4 countries we identified earlier. I also want to check where most men are located, and how that skews the country analysis we ran earlier on the full data set. Let's have a look.
## Remove null values in the CountryLive column from gender dataset
gender_countries = gender[gender['CountryLive'].notnull()]
gender_countries
## Isolate female respondents
women_countries = gender_countries[gender_countries['Gender'] == 'female']
## Generate frequency table
women_countries_freq = women_countries['CountryLive'].value_counts(normalize=True)*100
women_countries_freq
United States of America    56.957929
United Kingdom               4.983819
Canada                       4.142395
India                        3.689320
Australia                    1.941748
Poland                       1.682848
Germany                      1.553398
Brazil                       1.488673
...
Name: CountryLive, Length: 83, dtype: float64
## Isolate male respondents
men_countries = gender_countries[gender_countries['Gender'] == 'male']
## Generate frequency table
men_countries_freq = men_countries['CountryLive'].value_counts(normalize=True)*100
men_countries_freq
United States of America    41.956142
India                        9.101494
United Kingdom               4.482826
Canada                       3.725985
Poland                       2.037648
Brazil                       1.998836
Germany                      1.843586
Russia                       1.571900
...
Name: CountryLive, Length: 133, dtype: float64
## Compare top 5 countries for men and women side by side
## Top 5 countries for women
women_top5 = women_countries_freq.head()
## Top 5 countries for men
men_top5 = men_countries_freq.head()
## Create dictionary with data
top5_data = {'Women': women_top5, 'Men': men_top5}
## Create new frame from both series
top5_mw = pd.concat(top5_data, axis=1)
top5_mw
 | Men | Women |
---|---|---|
Australia | NaN | 1.941748 |
Canada | 3.725985 | 4.142395 |
India | 9.101494 | 3.689320 |
Poland | 2.037648 | NaN |
United Kingdom | 4.482826 | 4.983819 |
United States of America | 41.956142 | 56.957929 |
## Plotting grouped bar chart
top5_mw.plot.bar()
plt.title('Comparative Gender Differences by \nResident Country')
plt.xlabel('Countries')
plt.ylabel('Percentage Residing')
plt.ylim([0,100])
plt.legend()
plt.show()
We can tell a few things from looking at this gender breakdown by country.
First, the same 4 countries appear in the top 4 for both men and women in our sample, though in a slightly different order. The order for men (US, India, UK, Canada) matches the order, from most interested learners to fewest, that we found when analysing the overall data set across all genders.
However, when we isolate countries for women, India falls to a lower rank, with only roughly 4% of all female respondents residing there, compared to 9% of male respondents. Poland is the 5th country with the most male respondents, while Australia is the 5th country with the most female respondents. The top countries for female respondents are, in descending order, the US, UK, Canada and India.
This has some interesting implications if the company wants to help close the gender gap in tech by appealing to its broadest female audience. The US is still the clear winner in the sample for interested female learners, accounting for roughly 57%. But India is no longer the second most important market: the UK, home to roughly 5% of female respondents, has the second highest proportion of female learners, with Canada close behind at about 4%.
This means that if the company decided at some point to run a targeted marketing campaign encouraging women to learn tech and development skills, it would want to keep its primary focus on the US but shift its secondary focus to the UK or Canada.
In the previous analysis on available spending per month for the full data, I created a MoneyPerMonth column by dividing the MoneyForLearning column by the MonthsProgramming column, and then removed null values.
I will use this and filter by gender to:

- see if there is a general difference between men and women in the money available to spend on courses each month. Looking across all countries will show whether there is a gap in ability to pay. Given the well-documented gender pay gap, this seems likely, and it can tell us how much of a disadvantage women who want to learn tech skills may face when paying for courses.
- use the groupby function to see how available spending for women differs by country, and compares to men's, in our 3 countries of interest for potential female coders (US, UK, Canada).
I will again remove respondents who have attended bootcamps, to avoid skewing the data towards high monthly spending amounts and use a sample of spending data that is more realistic for a remote, self-paced learning platform like ours.
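As a reminder, MoneyPerMonth was built by dividing MoneyForLearning by MonthsProgramming and dropping nulls. A minimal sketch of that construction on toy data (replacing 0 months with 1 is my assumption here to avoid division by zero; the original cleaning step may have handled this differently):

```python
import pandas as pd

## Toy data standing in for the survey columns
toy = pd.DataFrame({'MoneyForLearning': [200.0, 0.0, 50.0, None],
                    'MonthsProgramming': [4.0, 0.0, 2.0, 6.0]})

## Assumption: treat 0 months as 1 so brand-new learners don't divide by zero
months = toy['MonthsProgramming'].replace(0, 1)
toy['MoneyPerMonth'] = toy['MoneyForLearning'] / months

## Drop rows where MoneyPerMonth could not be computed
toy = toy[toy['MoneyPerMonth'].notnull()]
print(toy['MoneyPerMonth'].tolist())  # [50.0, 0.0, 25.0]
```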
## Female respondents who did not attend a bootcamp and reported
## nonzero monthly spending
women_mpm = money[(money['Gender'] == 'female') &
                  (money['AttendedBootcamp'] == 0.0) &
                  (money['MoneyPerMonth'] > 0)]
women_mpm['MoneyPerMonth'].describe()
count      597.000000
mean       187.835575
std        879.070620
min          0.066667
25%          8.333333
50%         30.000000
75%        100.000000
max      15000.000000
Name: MoneyPerMonth, dtype: float64
## Male respondents who did not attend a bootcamp and reported
## nonzero monthly spending
men_mpm = money[(money['Gender'] == 'male') &
                (money['AttendedBootcamp'] == 0.0) &
                (money['MoneyPerMonth'] > 0)]
men_mpm['MoneyPerMonth'].describe()
count     2192.000000
mean       259.765508
std       2208.591653
min          0.033333
25%          7.500000
50%         25.000000
75%         84.910000
max      80000.000000
Name: MoneyPerMonth, dtype: float64
We can see from the descriptive stats that we will still need to remove significant outliers. The men's data has a much larger standard deviation than the women's, but both are very large. The maximum value in each group is also unrealistic in the context of monthly spending on learning programming, sitting far beyond the IQRs.
We'll graph both in boxplots to visualize how best to define the outliers to remove.
## Concatenate women and men MPM data to single dataframe for use in pyplot
## Reset index to remove gaps within columns
gender_mpm = pd.concat([women_mpm['MoneyPerMonth'].reset_index(drop=True).rename('Women'),
men_mpm['MoneyPerMonth'].reset_index(drop=True).rename('Men')], axis=1)
gender_mpm
 | Women | Men |
---|---|---|
0 | 35.714286 | 13.333333 |
1 | 100.000000 | 200.000000 |
2 | 285.714286 | 5.555556 |
3 | 100.000000 | 16.666667 |
4 | 166.666667 | 17.857143 |
... | ... | ... |
2189 | NaN | 1000.000000 |
2190 | NaN | 33.333333 |
2191 | NaN | 10000.000000 |
2192 rows × 2 columns
## Generate boxplots
sns.set_style('darkgrid')
sns.boxplot(data = gender_mpm)
plt.title('Money Spent Per Month \nby Gender')
plt.ylabel('Money Per Month (US dollars)')
plt.xlabel('Gender')
plt.xticks([0,1],['Women','Men']) # avoids tick labels overlap
plt.show()
The box plots show outliers for both the Men and Women columns, although the most significant outliers are in the Men column. For uniformity, we will follow the same methodology used previously in the full dataset to remove outliers: removing any data points indicating MPM at or above 10,000 USD.
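The fixed 10,000 USD cutoff keeps this consistent with the earlier analysis. A common alternative worth noting is the 1.5 × IQR rule, which derives the cutoff from the data itself. A minimal sketch on toy data:

```python
import pandas as pd

## Toy monthly-spending values with one extreme outlier
spend = pd.Series([5, 10, 25, 30, 50, 80, 100, 15000], dtype=float)

## Tukey's rule: anything above Q3 + 1.5*IQR is flagged as an outlier
q1, q3 = spend.quantile(0.25), spend.quantile(0.75)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

## Keep only values at or below the fence
trimmed = spend[spend <= upper_fence]
print(upper_fence, trimmed.max())
```

This is a sketch only; for this project, the fixed threshold has the advantage of matching the methodology already applied to the full dataset.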
## Keep rows where both Women and Men are below 10,000 USD
## (rows where either column is NaN are dropped as well)
gender_mpm_removed = gender_mpm[(gender_mpm['Women'] < 10000) & (gender_mpm['Men'] < 10000)]
gender_mpm_removed
 | Women | Men |
---|---|---|
0 | 35.714286 | 13.333333 |
1 | 100.000000 | 200.000000 |
2 | 285.714286 | 5.555556 |
3 | 100.000000 | 16.666667 |
4 | 166.666667 | 17.857143 |
... | ... | ... |
595 | 182.000000 | 3.333333 |
596 | 28.571429 | 7.000000 |
593 rows × 2 columns
## Generate new boxplots
sns.set_style('darkgrid')
sns.boxplot(data = gender_mpm_removed)
plt.title('Money Spent Per Month \nby Gender without Outliers')
plt.ylabel('Money Per Month (US dollars)')
plt.xlabel('Gender')
plt.xticks([0,1],['Women','Men']) # avoids tick labels overlap
plt.ylim([0,10000])
plt.show()
With outliers removed, we can see that, when considering respondents from all countries, women and men actually reported relatively equal financial resources dedicated to learning.
Now I will compare how men and women differ in the financial resources available for learning by country, using the 3 countries we found to have the most female respondents: the US, the UK, and Canada.
## Isolate the money data set for:
## 1) male & female respondents
## 2) our 3 countries of interest for the gender analysis: US, UK, Canada
## 3) MoneyPerMonth below 10,000 USD (outliers removed)
gender_top3_mw = money[(money['CountryLive'].str.contains(
                            'United States of America|United Kingdom|Canada')) &
                       (money['Gender'].str.contains('female|male')) &
                       (money['MoneyPerMonth'] < 10000)]
## Generate grouped box plots by country and gender
sns.set_style('darkgrid')
sns.boxplot(x='CountryLive', y='MoneyPerMonth', hue='Gender', data=gender_top3_mw)
plt.title('Money Spent Per Month \nby Gender and Country')
plt.ylabel('Money Per Month (US dollars)')
plt.xlabel('Country')
plt.xticks(rotation=0) # avoids tick labels overlap
plt.ylim([0,10000])
plt.show()
## Group gender_top3_mw by country and gender with average MPM
top3_mw_grouped = gender_top3_mw.groupby(['CountryLive','Gender']).mean()
top3_mw_grouped['MoneyPerMonth']
CountryLive               Gender
Canada                    female    152.252715
                          male      102.158928
United Kingdom            female     78.673260
                          male       36.668347
United States of America  female    205.464014
                          male      138.559582
Name: MoneyPerMonth, dtype: float64
We can see that, despite published statistics showing lower average salaries for women than for men, female survey respondents in the US, UK and Canada actually reported having more money available per month to spend on learning than men.
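One caveat: these are means, and a few remaining large values below the 10,000 cutoff can still pull a group's mean up. Comparing group medians is a cheap robustness check. A sketch on toy data (the numbers are illustrative, not survey values):

```python
import pandas as pd

## Illustrative data: both groups share the same median, but one large
## value pulls the mean of group 'a' far above group 'b'
toy = pd.DataFrame({'Gender': ['a', 'a', 'a', 'b', 'b', 'b'],
                    'MoneyPerMonth': [20.0, 30.0, 900.0, 20.0, 30.0, 40.0]})

grouped = toy.groupby('Gender')['MoneyPerMonth']
print(grouped.mean())    # means differ a lot
print(grouped.median())  # medians are close
```

If the gender-by-country medians told the same story as the means above, that would strengthen the conclusion; if not, the mean gap may be driven by a handful of big spenders.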
This disproves our original hypothesis: that female survey respondents may be at a disadvantage financially compared to the male respondents and may benefit from a lower cost for our e-learning courses.
Although the data here shows that women in these 3 countries have the financial resources to study programming and other tech skills, they still represent a minority in our sample, accounting for only 22% of overall responses.
Therefore, should the company want to contribute to reducing the gender representation gap in tech, offering women a scholarship or discount on learning fees as part of its advertising campaign could encourage more women to use its courses to start a career in tech.