Best Markets for Advertising

Suppose we are data analysts for an e-learning company that specializes in programming courses. We cover domains such as data science and game development, but our primary focus is web and mobile development. Our goal is to promote our products and invest more in advertising, but to do that we need to know which markets to advertise in. We utilized three surveys related to programming and web development from FreeCodeCamp and Stack Overflow.

These surveys were conducted online with participants worldwide in 2016, 2017, and 2018. FreeCodeCamp's surveys targeted new programmers and asked about career interests, income expectations, age, gender, home country, time spent programming, and so on. Stack Overflow's 2018 survey was aimed primarily at people already in the developer community and covered topics from favorite technologies to job preferences.

We discovered that new programmers are interested in a wide variety of career fields, including web development, data science, data engineering, game development, QA engineering, machine learning, and many others. We found that the likely motivation for their programming journey was to improve their income and career opportunities. With this knowledge, we need to ensure that our courses stay up to date, relevant, and beneficial for our customers.

Most importantly, after exploring the surveys we found that the two best countries to invest our advertising in are the United States and India. First, these two countries had the most survey participants, which suggests that new programmers are most numerous there. Second, the US has the highest average monthly spending on programming education. India's average is lower, but it is still roughly equal to the price of our monthly subscription ($59 per month).

In short, the two best markets for advertising are the United States and India, and we recommend that the marketing team focus its efforts on these two countries.

We want to answer questions about a population of new coders who are interested in the subjects we teach. We'd like to know:

  • Where are these new coders located?
  • Which locations have the greatest number of new coders?
  • How much money are new coders willing to spend on learning?
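The three questions above map onto a frequency count and a grouped mean. The following is only a minimal sketch on invented rows: the column names `CountryLive`, `MoneyForLearning`, and `MonthsProgramming` come from the FreeCodeCamp data, but the values, the `MoneyPerMonth` column, and the choice to treat 0 months as 1 month are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the survey data; these rows are invented for illustration.
survey = pd.DataFrame({
    "CountryLive": ["United States of America", "India", "India",
                    "United States of America", "Canada", None],
    "MoneyForLearning": [1000.0, 60.0, 0.0, 80.0, 150.0, 20.0],
    "MonthsProgramming": [5.0, 3.0, 0.0, 8.0, 6.0, 2.0],
})

# Where are new coders located, and which locations have the most of them?
country_counts = survey["CountryLive"].value_counts()

# How much do they spend per month? Assumption: treat 0 months as 1 month
# so that the division is always defined.
months = survey["MonthsProgramming"].replace(0, 1)
survey["MoneyPerMonth"] = survey["MoneyForLearning"] / months

# Average monthly spending per country (rows without a country are skipped)
monthly_by_country = survey.groupby("CountryLive")["MoneyPerMonth"].mean()

print(country_counts)
print(monthly_by_country)
```

Comparing `monthly_by_country` against the subscription price is then a simple filter on the resulting Series.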

FreeCodeCamp Survey: https://www.freecodecamp.org/news/we-asked-20-000-people-who-they-are-and-how-theyre-learning-to-code-fff5d668969

GitHub repository: Survey Year 2017: https://github.com/freeCodeCamp/2017-new-coder-survey/tree/master/clean-data

Survey Year 2016: https://github.com/freeCodeCamp/2016-new-coder-survey#about-the-data

Stack Overflow Survey: https://www.kaggle.com/datasets/stackoverflow/stack-overflow-2018-developer-survey

Some limitations for analyzing survey data:

  • For some questions, participants could write in their own responses. Differences in spelling, grammar, punctuation, and word usage make it difficult to map every response to a unique value, although we have done our best to clean some of these columns.
  • Almost all columns have missing data because participants could leave blank any question they did not want to answer. This makes a completely accurate analysis of all the data impossible.
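To illustrate the first limitation, here is a small sketch of how write-in responses might be normalized before counting. The sample responses are invented, and the cleaning steps (trim, lowercase, strip punctuation) are one plausible approach rather than the exact steps used in this analysis.

```python
import pandas as pd

# Invented write-in answers that differ only in case, spacing, or punctuation
responses = pd.Series(["Female", " female", "FEMALE.", "male", "Male ", None])

cleaned = (
    responses
    .str.strip()                              # remove surrounding whitespace
    .str.lower()                              # unify capitalization
    .str.replace(r"[^\w\s]", "", regex=True)  # drop punctuation
)

# Missing answers stay missing and are ignored by value_counts
counts = cleaned.value_counts()
print(counts)
```

Simple normalization like this cannot catch misspellings or synonyms, which is why some write-in columns remain only partially cleaned.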

Method

  • Load datasets
  • Clean dataframes, including standardizing any columns if needed
  • Concatenate/merge datasets
  • Correct any remaining inconsistencies/errors
  • Perform analysis and visualization
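The standardize-then-concatenate steps in the list above can be sketched on toy data as follows. The two small frames, the mismatched lowercase column names, and the `SurveyYear` marker column are all hypothetical; they only show the general shape of the approach.

```python
import pandas as pd

# Hypothetical stand-ins for two survey years with inconsistent column names
survey_2016 = pd.DataFrame({"Age": [25, 31], "CountryLive": ["India", "Canada"]})
survey_2017 = pd.DataFrame({"age": [22], "countrylive": ["Brazil"]})

# Standardize column names so that both frames share one schema
survey_2017 = survey_2017.rename(columns={"age": "Age", "countrylive": "CountryLive"})

# Record each row's origin before stacking (SurveyYear is an assumed helper column)
survey_2016["SurveyYear"] = 2016
survey_2017["SurveyYear"] = 2017

# Stack the rows and rebuild a clean 0..n-1 index
combined = pd.concat([survey_2016, survey_2017], ignore_index=True)
print(combined)
```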
In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.style as style
#style.use("fivethirtyeight")
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
#pd.options.display.float_format = '{:20,.2f}'.format
In [2]:
pd.options.display.max_columns = 150 # to avoid truncated output 

# FreeCodeCamp new coder survey, 2017
csv = pd.read_csv("2017-fCC-New-Coders-Survey-Data.csv", low_memory=False)

# FreeCodeCamp new coder survey, 2016
csv2016 = pd.read_csv("2016-fCC-New-Coders-Survey-Data.csv", low_memory=False)

# Stack Overflow developer survey, 2018
exchange = pd.read_csv("survey_results_public.csv", low_memory=False)
In [3]:
csv.head()
Out[3]:
Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName BootcampRecommend ChildrenNumber CityPopulation CodeEventConferences CodeEventDjangoGirls CodeEventFCC CodeEventGameJam CodeEventGirlDev CodeEventHackathons CodeEventMeetup CodeEventNodeSchool CodeEventNone CodeEventOther CodeEventRailsBridge CodeEventRailsGirls CodeEventStartUpWknd CodeEventWkdBootcamps CodeEventWomenCode CodeEventWorkshops CommuteTime CountryCitizen CountryLive EmploymentField EmploymentFieldOther EmploymentStatus EmploymentStatusOther ExpectedEarning FinanciallySupporting FirstDevJob Gender GenderOther HasChildren HasDebt HasFinancialDependents HasHighSpdInternet HasHomeMortgage HasServedInMilitary HasStudentDebt HomeMortgageOwe HoursLearning ID.x ID.y Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev IsUnderEmployed JobApplyWhen JobInterestBackEnd JobInterestDataEngr JobInterestDataSci JobInterestDevOps JobInterestFrontEnd JobInterestFullStack JobInterestGameDev JobInterestInfoSec JobInterestMobile JobInterestOther JobInterestProjMngr JobInterestQAEngr JobInterestUX JobPref JobRelocateYesNo JobRoleInterest JobWherePref LanguageAtHome MaritalStatus MoneyForLearning MonthsProgramming NetworkID Part1EndTime Part1StartTime Part2EndTime Part2StartTime PodcastChangeLog PodcastCodeNewbie PodcastCodePen PodcastDevTea PodcastDotNET PodcastGiantRobots PodcastJSAir PodcastJSJabber PodcastNone PodcastOther PodcastProgThrowdown PodcastRubyRogues PodcastSEDaily PodcastSERadio PodcastShopTalk PodcastTalkPython PodcastTheWebAhead ResourceCodecademy ResourceCodeWars ResourceCoursera ResourceCSS ResourceEdX ResourceEgghead ResourceFCC ResourceHackerRank ResourceKA ResourceLynda ResourceMDN ResourceOdinProj ResourceOther ResourcePluralSight ResourceSkillcrush ResourceSO ResourceTreehouse ResourceUdacity ResourceUdemy ResourceW3S SchoolDegree SchoolMajor StudentDebtOwe YouTubeCodeCourse YouTubeCodingTrain YouTubeCodingTut360 YouTubeComputerphile YouTubeDerekBanas YouTubeDevTips 
YouTubeEngineeredTruth YouTubeFCC YouTubeFunFunFunction YouTubeGoogleDev YouTubeLearnCode YouTubeLevelUpTuts YouTubeMIT YouTubeMozillaHacks YouTubeOther YouTubeSimplilearn YouTubeTheNewBoston
0 27.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes Canada Canada software development and IT NaN Employed for wages NaN NaN NaN NaN female NaN NaN 1.0 0.0 1.0 0.0 0.0 0.0 NaN 15.0 02d9465b21e8bd09374b0066fb2d5614 eb78c1c3ac6cd9052aec557065070fbf NaN NaN 0.0 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN start your own business NaN NaN NaN English married or domestic partnership 150.0 6.0 6f1fbc6b2b 2017-03-09 00:36:22 2017-03-09 00:32:59 2017-03-09 00:59:46 2017-03-09 00:36:26 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 some college credit, no degree NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 34.0 0.0 NaN NaN NaN NaN NaN less than 100,000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN United States of America United States of America NaN NaN Not working but looking for work NaN 35000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 10.0 5bfef9ecb211ec4f518cfc1d2a6f3e0c 21db37adb60cdcafadfa7dca1b13b6b1 NaN 0.0 0.0 0.0 NaN Within 7 to 12 months NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN work for a nonprofit 1.0 Full-Stack Web Developer in an office with other developers English single, never married 80.0 6.0 f8f8be6910 2017-03-09 00:37:07 2017-03-09 00:33:26 2017-03-09 00:38:59 2017-03-09 00:37:10 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN 1.0 1.0 some college credit, no degree NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 21.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN NaN NaN NaN NaN 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN NaN 15 to 29 minutes United States of America United States of America software development and IT NaN Employed for wages NaN 70000.0 NaN NaN male NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 25.0 14f1863afa9c7de488050b82eb3edd96 21ba173828fbe9e27ccebaf4d5166a55 13000.0 1.0 0.0 0.0 0.0 Within 7 to 12 months 1.0 NaN NaN 1.0 1.0 1.0 NaN NaN 1.0 NaN NaN NaN NaN work for a medium-sized company 1.0 Front-End Web Developer, Back-End Web Develo... no preference Spanish single, never married 1000.0 5.0 2ed189768e 2017-03-09 00:37:58 2017-03-09 00:33:53 2017-03-09 00:40:14 2017-03-09 00:38:02 1.0 NaN 1.0 NaN NaN NaN NaN NaN NaN Codenewbie NaN NaN NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN high school diploma or equivalent (GED) NaN NaN NaN NaN 1.0 NaN 1.0 1.0 NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN
3 26.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN I work from home Brazil Brazil software development and IT NaN Employed for wages NaN 40000.0 0.0 NaN male NaN 0.0 1.0 1.0 1.0 1.0 0.0 0.0 40000.0 14.0 91756eb4dc280062a541c25a3d44cfb0 3be37b558f02daae93a6da10f83f0c77 24000.0 0.0 0.0 0.0 1.0 Within the next 6 months 1.0 NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN work for a medium-sized company NaN Front-End Web Developer, Full-Stack Web Deve... from home Portuguese married or domestic partnership 0.0 5.0 dbdc0664d1 2017-03-09 00:40:13 2017-03-09 00:37:45 2017-03-09 00:42:26 2017-03-09 00:40:18 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 1.0 NaN NaN NaN 1.0 NaN NaN NaN NaN 1.0 NaN NaN NaN NaN some college credit, no degree NaN NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0 1.0 NaN NaN 1.0 NaN NaN NaN NaN NaN
4 20.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Portugal Portugal NaN NaN Not working but looking for work NaN 140000.0 NaN NaN female NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 10.0 aa3f061a1949a90b27bef7411ecd193f d7c56bbf2c7b62096be9db010e86d96d NaN 0.0 0.0 0.0 NaN Within 7 to 12 months 1.0 NaN NaN NaN 1.0 1.0 NaN 1.0 1.0 NaN NaN NaN NaN work for a multinational corporation 1.0 Full-Stack Web Developer, Information Security... in an office with other developers Portuguese single, never married 0.0 24.0 11b0f2d8a9 2017-03-09 00:42:45 2017-03-09 00:39:44 2017-03-09 00:45:42 2017-03-09 00:42:50 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN bachelor's degree Information Technology NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
In [4]:
exchange.head()
Out[4]:
Respondent Hobby OpenSource Country Student Employment FormalEducation UndergradMajor CompanySize DevType YearsCoding YearsCodingProf JobSatisfaction CareerSatisfaction HopeFiveYears JobSearchStatus LastNewJob AssessJob1 AssessJob2 AssessJob3 AssessJob4 AssessJob5 AssessJob6 AssessJob7 AssessJob8 AssessJob9 AssessJob10 AssessBenefits1 AssessBenefits2 AssessBenefits3 AssessBenefits4 AssessBenefits5 AssessBenefits6 AssessBenefits7 AssessBenefits8 AssessBenefits9 AssessBenefits10 AssessBenefits11 JobContactPriorities1 JobContactPriorities2 JobContactPriorities3 JobContactPriorities4 JobContactPriorities5 JobEmailPriorities1 JobEmailPriorities2 JobEmailPriorities3 JobEmailPriorities4 JobEmailPriorities5 JobEmailPriorities6 JobEmailPriorities7 UpdateCV Currency Salary SalaryType ConvertedSalary CurrencySymbol CommunicationTools TimeFullyProductive EducationTypes SelfTaughtTypes TimeAfterBootcamp HackathonReasons AgreeDisagree1 AgreeDisagree2 AgreeDisagree3 LanguageWorkedWith LanguageDesireNextYear DatabaseWorkedWith DatabaseDesireNextYear PlatformWorkedWith PlatformDesireNextYear FrameworkWorkedWith FrameworkDesireNextYear IDE OperatingSystem NumberMonitors Methodology VersionControl CheckInCode AdBlocker AdBlockerDisable AdBlockerReasons AdsAgreeDisagree1 AdsAgreeDisagree2 AdsAgreeDisagree3 AdsActions AdsPriorities1 AdsPriorities2 AdsPriorities3 AdsPriorities4 AdsPriorities5 AdsPriorities6 AdsPriorities7 AIDangerous AIInteresting AIResponsible AIFuture EthicsChoice EthicsReport EthicsResponsible EthicalImplications StackOverflowRecommend StackOverflowVisit StackOverflowHasAccount StackOverflowParticipate StackOverflowJobs StackOverflowDevStory StackOverflowJobsRecommend StackOverflowConsiderMember HypotheticalTools1 HypotheticalTools2 HypotheticalTools3 HypotheticalTools4 HypotheticalTools5 WakeTime HoursComputer HoursOutside SkipMeals ErgonomicDevices Exercise Gender SexualOrientation EducationParents RaceEthnicity Age Dependents MilitaryUS SurveyTooLong SurveyEasy
0 1 Yes No Kenya No Employed part-time Bachelor’s degree (BA, BS, B.Eng., etc.) Mathematics or statistics 20 to 99 employees Full-stack developer 3-5 years 3-5 years Extremely satisfied Extremely satisfied Working as a founder or co-founder of my own c... I’m not actively looking, but I am open to new... Less than a year ago 10.0 7.0 8.0 1.0 2.0 5.0 3.0 4.0 9.0 6.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.0 1.0 4.0 2.0 5.0 5.0 6.0 7.0 2.0 1.0 4.0 3.0 My job status or other personal status changed NaN NaN Monthly NaN KES Slack One to three months Taught yourself a new language, framework, or ... The official documentation and/or standards fo... NaN To build my professional network Strongly agree Strongly agree Neither Agree nor Disagree JavaScript;Python;HTML;CSS JavaScript;Python;HTML;CSS Redis;SQL Server;MySQL;PostgreSQL;Amazon RDS/A... Redis;SQL Server;MySQL;PostgreSQL;Amazon RDS/A... AWS;Azure;Linux;Firebase AWS;Azure;Linux;Firebase Django;React Django;React Komodo;Vim;Visual Studio Code Linux-based 1 Agile;Scrum Git Multiple times per day Yes No NaN Strongly agree Strongly agree Strongly agree Saw an online advertisement and then researche... 1.0 5.0 4.0 7.0 2.0 6.0 3.0 Artificial intelligence surpassing human intel... Algorithms making important decisions The developers or the people creating the AI I'm excited about the possibilities more than ... No Yes, and publicly Upper management at the company/organization Yes 10 (Very Likely) Multiple times per day Yes I have never participated in Q&A on Stack Over... No, I knew that Stack Overflow had a jobs boar... Yes NaN Yes Extremely interested Extremely interested Extremely interested Extremely interested Extremely interested Between 5:00 - 6:00 AM 9 - 12 hours 1 - 2 hours Never Standing desk 3 - 4 times per week Male Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) Black or of African descent 25 - 34 years old Yes NaN The survey was an appropriate length Very easy
1 3 Yes Yes United Kingdom No Employed full-time Bachelor’s degree (BA, BS, B.Eng., etc.) A natural science (ex. biology, chemistry, phy... 10,000 or more employees Database administrator;DevOps specialist;Full-... 30 or more years 18-20 years Moderately dissatisfied Neither satisfied nor dissatisfied Working in a different or more specialized tec... I am actively looking for a job More than 4 years ago 1.0 7.0 10.0 8.0 2.0 5.0 4.0 3.0 6.0 9.0 1.0 5.0 3.0 7.0 10.0 4.0 11.0 9.0 6.0 2.0 8.0 3.0 1.0 5.0 2.0 4.0 1.0 3.0 4.0 5.0 2.0 6.0 7.0 I saw an employer’s advertisement British pounds sterling (£) 51000 Yearly 70841.0 GBP Confluence;Office / productivity suite (Micros... One to three months Taught yourself a new language, framework, or ... The official documentation and/or standards fo... NaN NaN Agree Agree Neither Agree nor Disagree JavaScript;Python;Bash/Shell Go;Python Redis;PostgreSQL;Memcached PostgreSQL Linux Linux Django React IPython / Jupyter;Sublime Text;Vim Linux-based 2 NaN Git;Subversion A few times per week Yes Yes The website I was visiting asked me to disable it Somewhat agree Neither agree nor disagree Neither agree nor disagree NaN 3.0 5.0 1.0 4.0 6.0 7.0 2.0 Increasing automation of jobs Increasing automation of jobs The developers or the people creating the AI I'm excited about the possibilities more than ... Depends on what it is Depends on what it is Upper management at the company/organization Yes 10 (Very Likely) A few times per month or weekly Yes A few times per month or weekly Yes No, I have one but it's out of date 7 Yes A little bit interested A little bit interested A little bit interested A little bit interested A little bit interested Between 6:01 - 7:00 AM 5 - 8 hours 30 - 59 minutes Never Ergonomic keyboard or mouse Daily or almost every day Male Straight or heterosexual Bachelor’s degree (BA, BS, B.Eng., etc.) White or of European descent 35 - 44 years old Yes NaN The survey was an appropriate length Somewhat easy
2 4 Yes Yes United States No Employed full-time Associate degree Computer science, computer engineering, or sof... 20 to 99 employees Engineering manager;Full-stack developer 24-26 years 6-8 years Moderately satisfied Moderately satisfied Working as a founder or co-founder of my own c... I’m not actively looking, but I am open to new... Less than a year ago NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 5 No No United States No Employed full-time Bachelor’s degree (BA, BS, B.Eng., etc.) Computer science, computer engineering, or sof... 100 to 499 employees Full-stack developer 18-20 years 12-14 years Neither satisfied nor dissatisfied Slightly dissatisfied Working as a founder or co-founder of my own c... I’m not actively looking, but I am open to new... Less than a year ago NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN A recruiter contacted me U.S. dollars ($) NaN NaN NaN NaN NaN Three to six months Completed an industry certification program (e... The official documentation and/or standards fo... NaN NaN Disagree Disagree Strongly disagree C#;JavaScript;SQL;TypeScript;HTML;CSS;Bash/Shell C#;JavaScript;SQL;TypeScript;HTML;CSS;Bash/Shell SQL Server;Microsoft Azure (Tables, CosmosDB, ... SQL Server;Microsoft Azure (Tables, CosmosDB, ... Azure Azure NaN Angular;.NET Core;React Visual Studio;Visual Studio Code Windows 2 Agile;Kanban;Scrum Git Multiple times per day Yes Yes The ad-blocking software was causing display i... Neither agree nor disagree Somewhat agree Somewhat agree Stopped going to a website because of their ad... NaN NaN NaN NaN NaN NaN NaN Artificial intelligence surpassing human intel... Artificial intelligence surpassing human intel... A governmental or other regulatory body I don't care about it, or I haven't thought ab... No Yes, but only within the company Upper management at the company/organization Yes 10 (Very Likely) A few times per week Yes A few times per month or weekly Yes No, I have one but it's out of date 8 Yes Somewhat interested Somewhat interested Somewhat interested Somewhat interested Somewhat interested Between 6:01 - 7:00 AM 9 - 12 hours Less than 30 minutes 3 - 4 times per week NaN I don't typically exercise Male Straight or heterosexual Some college/university study without earning ... 
White or of European descent 35 - 44 years old No No The survey was an appropriate length Somewhat easy
4 7 Yes No South Africa Yes, part-time Employed full-time Some college/university study without earning ... Computer science, computer engineering, or sof... 10,000 or more employees Data or business analyst;Desktop or enterprise... 6-8 years 0-2 years Slightly satisfied Moderately satisfied Working in a different or more specialized tec... I’m not actively looking, but I am open to new... Between 1 and 2 years ago 8.0 5.0 7.0 1.0 2.0 6.0 4.0 3.0 10.0 9.0 1.0 10.0 2.0 4.0 8.0 3.0 11.0 7.0 5.0 9.0 6.0 2.0 1.0 4.0 5.0 3.0 7.0 3.0 6.0 2.0 1.0 4.0 5.0 My job status or other personal status changed South African rands (R) 260000 Yearly 21426.0 ZAR Office / productivity suite (Microsoft Office,... Three to six months Taken a part-time in-person course in programm... The official documentation and/or standards fo... NaN NaN Strongly agree Agree Strongly disagree C;C++;Java;Matlab;R;SQL;Bash/Shell Assembly;C;C++;Matlab;SQL;Bash/Shell SQL Server;PostgreSQL;Oracle;IBM Db2 PostgreSQL;Oracle;IBM Db2 Arduino;Windows Desktop or Server Arduino;Windows Desktop or Server NaN NaN Notepad++;Visual Studio;Visual Studio Code Windows 2 Evidence-based software engineering;Formal sta... Zip file back-ups Weekly or a few times per month No NaN NaN Somewhat agree Somewhat agree Somewhat disagree Clicked on an online advertisement;Saw an onli... 2.0 3.0 4.0 6.0 1.0 7.0 5.0 Algorithms making important decisions Algorithms making important decisions The developers or the people creating the AI I'm excited about the possibilities more than ... No Yes, but only within the company Upper management at the company/organization Yes 10 (Very Likely) Daily or almost daily Yes Less than once per month or monthly No, I knew that Stack Overflow had a jobs boar... 
No, I know what it is but I don't have one NaN Yes Extremely interested Extremely interested Extremely interested Extremely interested Extremely interested Before 5:00 AM Over 12 hours 1 - 2 hours Never NaN 3 - 4 times per week Male Straight or heterosexual Some college/university study without earning ... White or of European descent 18 - 24 years old Yes NaN The survey was an appropriate length Somewhat easy

Data Processing and Cleaning

The first step in our analysis is to identify the relevant columns. The survey has over 100 columns, far too many for a practical analysis.

We identified a few columns for analysis using datapackage.json, a JSON file that describes each column in FreeCodeCamp's new coder surveys.
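A column lookup of this kind could be built along the following lines. This sketch assumes the file follows the common "Data Package" layout (`resources`, then `schema`, then `fields`, each with a `name` and a `description`); the actual file may be organized differently, and the descriptions shown are placeholders rather than text from the real file.

```python
import json

# Assumption: a Data Package style layout. The two fields and their
# descriptions are placeholders, not content copied from datapackage.json.
example_package = json.loads("""
{
  "resources": [
    {
      "schema": {
        "fields": [
          {"name": "CountryLive", "description": "placeholder: country of residence"},
          {"name": "MoneyForLearning", "description": "placeholder: money spent on learning"}
        ]
      }
    }
  ]
}
""")

# Build a column-name -> description lookup across all resources
descriptions = {
    field["name"]: field.get("description", "")
    for resource in example_package["resources"]
    for field in resource["schema"]["fields"]
}

for name, text in descriptions.items():
    print(f"{name}: {text}")
```

With the real file, `json.load(open(...))` would replace the inline string, provided the layout matches the assumption.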

In [5]:
# Index location of the first set of columns to drop
print(csv.columns.get_loc("CodeEventConferences"))
print(csv.columns.get_loc("CodeEventWorkshops"))
8
23
In [6]:
# Drop columns 8 through 22; the slice end is exclusive, so CodeEventWorkshops (23) is kept for now
csv = csv.drop(columns=csv.columns[8:23])
In [7]:
# Index location of the next set of columns to drop
print(csv.columns.get_loc("NetworkID"))
print(csv.columns.get_loc("ResourceW3S"))
59
100
In [8]:
# Drop columns 59 through 99; the slice end is exclusive, so ResourceW3S (100) is kept for now
csv = csv.drop(columns=csv.columns[59:100])
In [9]:
print(csv.columns.get_loc("YouTubeCodeCourse"))
63
In [10]:
# Drop the remaining columns from index position 63 onward
csv = csv.drop(columns=csv.columns[63:])
csv.head()
Out[10]:
Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName BootcampRecommend ChildrenNumber CityPopulation CodeEventWorkshops CommuteTime CountryCitizen CountryLive EmploymentField EmploymentFieldOther EmploymentStatus EmploymentStatusOther ExpectedEarning FinanciallySupporting FirstDevJob Gender GenderOther HasChildren HasDebt HasFinancialDependents HasHighSpdInternet HasHomeMortgage HasServedInMilitary HasStudentDebt HomeMortgageOwe HoursLearning ID.x ID.y Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev IsUnderEmployed JobApplyWhen JobInterestBackEnd JobInterestDataEngr JobInterestDataSci JobInterestDevOps JobInterestFrontEnd JobInterestFullStack JobInterestGameDev JobInterestInfoSec JobInterestMobile JobInterestOther JobInterestProjMngr JobInterestQAEngr JobInterestUX JobPref JobRelocateYesNo JobRoleInterest JobWherePref LanguageAtHome MaritalStatus MoneyForLearning MonthsProgramming ResourceW3S SchoolDegree SchoolMajor StudentDebtOwe
0 27.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN 15 to 29 minutes Canada Canada software development and IT NaN Employed for wages NaN NaN NaN NaN female NaN NaN 1.0 0.0 1.0 0.0 0.0 0.0 NaN 15.0 02d9465b21e8bd09374b0066fb2d5614 eb78c1c3ac6cd9052aec557065070fbf NaN NaN 0.0 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN start your own business NaN NaN NaN English married or domestic partnership 150.0 6.0 1.0 some college credit, no degree NaN NaN
1 34.0 0.0 NaN NaN NaN NaN NaN less than 100,000 NaN NaN United States of America United States of America NaN NaN Not working but looking for work NaN 35000.0 NaN NaN male NaN NaN 1.0 0.0 1.0 0.0 0.0 1.0 NaN 10.0 5bfef9ecb211ec4f518cfc1d2a6f3e0c 21db37adb60cdcafadfa7dca1b13b6b1 NaN 0.0 0.0 0.0 NaN Within 7 to 12 months NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN NaN work for a nonprofit 1.0 Full-Stack Web Developer in an office with other developers English single, never married 80.0 6.0 1.0 some college credit, no degree NaN NaN
2 21.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN 15 to 29 minutes United States of America United States of America software development and IT NaN Employed for wages NaN 70000.0 NaN NaN male NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 25.0 14f1863afa9c7de488050b82eb3edd96 21ba173828fbe9e27ccebaf4d5166a55 13000.0 1.0 0.0 0.0 0.0 Within 7 to 12 months 1.0 NaN NaN 1.0 1.0 1.0 NaN NaN 1.0 NaN NaN NaN NaN work for a medium-sized company 1.0 Front-End Web Developer, Back-End Web Develo... no preference Spanish single, never married 1000.0 5.0 NaN high school diploma or equivalent (GED) NaN NaN
3 26.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN I work from home Brazil Brazil software development and IT NaN Employed for wages NaN 40000.0 0.0 NaN male NaN 0.0 1.0 1.0 1.0 1.0 0.0 0.0 40000.0 14.0 91756eb4dc280062a541c25a3d44cfb0 3be37b558f02daae93a6da10f83f0c77 24000.0 0.0 0.0 0.0 1.0 Within the next 6 months 1.0 NaN NaN NaN 1.0 1.0 NaN NaN NaN NaN NaN NaN NaN work for a medium-sized company NaN Front-End Web Developer, Full-Stack Web Deve... from home Portuguese married or domestic partnership 0.0 5.0 NaN some college credit, no degree NaN NaN
4 20.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN Portugal Portugal NaN NaN Not working but looking for work NaN 140000.0 NaN NaN female NaN NaN 0.0 0.0 1.0 NaN 0.0 NaN NaN 10.0 aa3f061a1949a90b27bef7411ecd193f d7c56bbf2c7b62096be9db010e86d96d NaN 0.0 0.0 0.0 NaN Within 7 to 12 months 1.0 NaN NaN NaN 1.0 1.0 NaN 1.0 1.0 NaN NaN NaN NaN work for a multinational corporation 1.0 Full-Stack Web Developer, Information Security... in an office with other developers Portuguese single, never married 0.0 24.0 NaN bachelor's degree Information Technology NaN
In [11]:
csv.iloc[:,:20]
Out[11]:
Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName BootcampRecommend ChildrenNumber CityPopulation CodeEventWorkshops CommuteTime CountryCitizen CountryLive EmploymentField EmploymentFieldOther EmploymentStatus EmploymentStatusOther ExpectedEarning FinanciallySupporting FirstDevJob Gender
0 27.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN 15 to 29 minutes Canada Canada software development and IT NaN Employed for wages NaN NaN NaN NaN female
1 34.0 0.0 NaN NaN NaN NaN NaN less than 100,000 NaN NaN United States of America United States of America NaN NaN Not working but looking for work NaN 35000.0 NaN NaN male
2 21.0 0.0 NaN NaN NaN NaN NaN more than 1 million NaN 15 to 29 minutes United States of America United States of America software development and IT NaN Employed for wages NaN 70000.0 NaN NaN male
3 26.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN I work from home Brazil Brazil software development and IT NaN Employed for wages NaN 40000.0 0.0 NaN male
4 20.0 0.0 NaN NaN NaN NaN NaN between 100,000 and 1 million NaN NaN Portugal Portugal NaN NaN Not working but looking for work NaN 140000.0 NaN NaN female
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
18170 41.0 0.0 NaN NaN NaN NaN 1.0 more than 1 million NaN I work from home Indonesia Indonesia software development and IT NaN Self-employed freelancer NaN NaN 0.0 NaN male
18171 31.0 0.0 NaN NaN NaN NaN 1.0 more than 1 million NaN Less than 15 minutes Nigeria Nigeria transportation NaN Self-employed freelancer NaN 70000.0 1.0 NaN male
18172 39.0 0.0 NaN NaN NaN NaN 3.0 more than 1 million 1.0 45 to 60 minutes South Africa South Africa NaN IT support and website update Employed for wages NaN NaN 0.0 1.0 male
18173 54.0 0.0 NaN NaN NaN NaN 3.0 between 100,000 and 1 million NaN Less than 15 minutes United Kingdom United Kingdom education NaN Employed for wages NaN NaN 0.0 NaN male
18174 50.0 0.0 NaN NaN NaN NaN 2.0 less than 100,000 NaN 15 to 29 minutes United Kingdom United Kingdom health care NaN Employed for wages NaN NaN 0.0 NaN male

18175 rows × 20 columns
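One detail of the positional drops used above is that integer slices exclude their end position, which is why the column at each end index survives the drop. The toy frame below is hypothetical and contrasts that behavior with label-based slicing, which includes both endpoints.

```python
import pandas as pd

# Hypothetical five-column frame
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["a", "b", "c", "d", "e"])

start = df.columns.get_loc("b")  # position 1
end = df.columns.get_loc("d")    # position 3

# Positional slice: the end is exclusive, so "d" is kept
exclusive = df.drop(columns=df.columns[start:end])

# Label slice with .loc: both endpoints are included, so "d" is dropped too
inclusive = df.drop(columns=df.loc[:, "b":"d"].columns)

print(list(exclusive.columns))
print(list(inclusive.columns))
```

Either form works; the label-based version avoids off-by-one surprises when the end column should also be removed.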

Computing the share of missing values in each column gives a better picture of the gaps in the data. Respondents often skipped questions during the survey, so many columns have missing data, and it would be difficult to clean the dataset by dropping incomplete rows without removing nearly every row.
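To see why dropping every incomplete row is too aggressive, consider this toy sketch. The rows and the bootcamp name are invented; only the column names are taken from the survey. Restricting the check to the columns an analysis actually needs keeps far more rows.

```python
import numpy as np
import pandas as pd

# Invented rows: a sparsely answered column (BootcampName) sits next to
# two columns that most respondents filled in
toy = pd.DataFrame({
    "CountryLive": ["India", "Canada", None, "Brazil"],
    "MoneyForLearning": [60.0, np.nan, 20.0, 0.0],
    "BootcampName": [None, None, None, "Example Camp"],
})

# Requiring every column to be present keeps only one row
all_complete = toy.dropna()

# Requiring only the columns we need keeps two rows
needed = toy.dropna(subset=["CountryLive", "MoneyForLearning"])

print(len(all_complete), len(needed))
```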

In [12]:
# Percentage of missing values in each column
missing_pct = csv.isnull().mean() * 100

# Columns with at most 60% missing data points
cols_to_keep = missing_pct[missing_pct <= 60].index
In [13]:
print(missing_pct)
Age                  15.449794
AttendedBootcamp      2.563961
BootcampFinish       94.118294
BootcampLoanYesNo    94.063274
BootcampName         94.778542
                       ...    
MonthsProgramming     6.002751
ResourceW3S          46.272352
SchoolDegree         15.444292
SchoolMajor          51.983494
StudentDebtOwe       81.502063
Length: 63, dtype: float64
In [14]:
# Convert the pandas Index of columns we want to keep to a plain list
cols_to_use = cols_to_keep.tolist()
# Add two columns that fell outside the threshold but are still needed for the analysis
cols_to_use.extend(["JobRoleInterest", "ExpectedEarning"])

# Isolates the dataframe down to only preferred columns
csv = csv[cols_to_use]

# Drop the ID.x and ID.y identifier columns and ResourceW3S
csv = csv.drop(columns=["ID.x","ID.y","ResourceW3S"])
csv
Out[14]:
Age AttendedBootcamp CityPopulation CommuteTime CountryCitizen CountryLive EmploymentField EmploymentStatus Gender HasDebt HasFinancialDependents HasHighSpdInternet HasServedInMilitary HoursLearning Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev IsUnderEmployed JobApplyWhen JobPref JobWherePref LanguageAtHome MaritalStatus MoneyForLearning MonthsProgramming SchoolDegree SchoolMajor JobRoleInterest ExpectedEarning
0 27.0 0.0 more than 1 million 15 to 29 minutes Canada Canada software development and IT Employed for wages female 1.0 0.0 1.0 0.0 15.0 NaN NaN 0.0 0.0 0.0 NaN start your own business NaN English married or domestic partnership 150.0 6.0 some college credit, no degree NaN NaN NaN
1 34.0 0.0 less than 100,000 NaN United States of America United States of America NaN Not working but looking for work male 1.0 0.0 1.0 0.0 10.0 NaN 0.0 0.0 0.0 NaN Within 7 to 12 months work for a nonprofit in an office with other developers English single, never married 80.0 6.0 some college credit, no degree NaN Full-Stack Web Developer 35000.0
2 21.0 0.0 more than 1 million 15 to 29 minutes United States of America United States of America software development and IT Employed for wages male 0.0 0.0 1.0 0.0 25.0 13000.0 1.0 0.0 0.0 0.0 Within 7 to 12 months work for a medium-sized company no preference Spanish single, never married 1000.0 5.0 high school diploma or equivalent (GED) NaN Front-End Web Developer, Back-End Web Develo... 70000.0
3 26.0 0.0 between 100,000 and 1 million I work from home Brazil Brazil software development and IT Employed for wages male 1.0 1.0 1.0 0.0 14.0 24000.0 0.0 0.0 0.0 1.0 Within the next 6 months work for a medium-sized company from home Portuguese married or domestic partnership 0.0 5.0 some college credit, no degree NaN Front-End Web Developer, Full-Stack Web Deve... 40000.0
4 20.0 0.0 between 100,000 and 1 million NaN Portugal Portugal NaN Not working but looking for work female 0.0 0.0 1.0 0.0 10.0 NaN 0.0 0.0 0.0 NaN Within 7 to 12 months work for a multinational corporation in an office with other developers Portuguese single, never married 0.0 24.0 bachelor's degree Information Technology Full-Stack Web Developer, Information Security... 140000.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
18170 41.0 0.0 more than 1 million I work from home Indonesia Indonesia software development and IT Self-employed freelancer male 1.0 1.0 0.0 0.0 10.0 60000.0 0.0 0.0 0.0 0.0 NaN start your own business NaN Indonesian married or domestic partnership 10.0 1.0 bachelor's degree Telecommunications Technician NaN NaN
18171 31.0 0.0 more than 1 million Less than 15 minutes Nigeria Nigeria transportation Self-employed freelancer male 1.0 1.0 0.0 0.0 1.0 60000.0 0.0 0.0 0.0 1.0 more than 12 months from now work for a nonprofit no preference English divorced 10000.0 1.0 high school diploma or equivalent (GED) NaN DevOps / SysAdmin, Mobile Developer, Pro... 70000.0
18172 39.0 0.0 more than 1 million 45 to 60 minutes South Africa South Africa NaN Employed for wages male 1.0 1.0 0.0 0.0 10.0 1000000.0 0.0 0.0 1.0 1.0 NaN NaN NaN Zulu married or domestic partnership 19.0 3.0 some high school NaN NaN NaN
18173 54.0 0.0 between 100,000 and 1 million Less than 15 minutes United Kingdom United Kingdom education Employed for wages male 0.0 1.0 1.0 0.0 1.0 1000000.0 0.0 0.0 0.0 1.0 NaN freelance NaN English divorced 0.0 5.0 trade, technical, or vocational training NaN NaN NaN
18174 50.0 0.0 less than 100,000 15 to 29 minutes United Kingdom United Kingdom health care Employed for wages male 1.0 1.0 1.0 1.0 5.0 1000000.0 0.0 0.0 0.0 1.0 I haven't decided work for a government no preference English married or domestic partnership NaN 10.0 bachelor's degree Computer and Information Studies Back-End Web Developer, Data Engineer, Data ... NaN

18175 rows × 30 columns

In [15]:
# Percentage of missing values per column
nulls = csv.apply(pd.isnull).sum()/csv.shape[0] * 100
nulls = nulls.sort_values()
nulls
Out[15]:
IsSoftwareDev                     0.588721
AttendedBootcamp                  2.563961
MonthsProgramming                 6.002751
HoursLearning                     8.038514
MoneyForLearning                  8.792297
Gender                           14.971114
CountryCitizen                   15.367263
HasHighSpdInternet               15.378267
SchoolDegree                     15.444292
Age                              15.449794
CityPopulation                   15.521320
LanguageAtHome                   15.576341
CountryLive                      15.620358
MaritalStatus                    15.625860
HasFinancialDependents           15.658872
IsEthnicMinority                 15.856946
HasDebt                          15.867950
HasServedInMilitary              16.060523
IsReceiveDisabilitiesBenefits    16.247593
EmploymentStatus                 21.072902
JobPref                          25.815681
CommuteTime                      49.127923
IsUnderEmployed                  49.254470
SchoolMajor                      51.983494
JobApplyWhen                     55.224209
JobWherePref                     55.334250
EmploymentField                  55.345254
Income                           58.057772
ExpectedEarning                  60.385144
JobRoleInterest                  61.529574
dtype: float64
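
The same percentages can be computed a little more directly, because the mean of a boolean column is the fraction of True values. The sketch below uses a small made-up frame (the `toy` name and its values are assumptions for illustration, not survey data) and checks that both forms agree.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the survey data (values are made up)
toy = pd.DataFrame({
    "Age": [27.0, np.nan, 21.0, 26.0],
    "Income": [np.nan, np.nan, 13000.0, np.nan],
    "Gender": ["female", "male", "male", "male"],
})

# isna() yields booleans; the column mean of booleans is the fraction missing
missing_pct = toy.isna().mean().mul(100).sort_values()

# The form used in the cell above, for comparison
original_form = (toy.apply(pd.isnull).sum() / toy.shape[0] * 100).sort_values()

print(missing_pct)
print(missing_pct.equals(original_form))
```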
In [16]:
csv.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18175 entries, 0 to 18174
Data columns (total 30 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Age                            15367 non-null  float64
 1   AttendedBootcamp               17709 non-null  float64
 2   CityPopulation                 15354 non-null  object 
 3   CommuteTime                    9246 non-null   object 
 4   CountryCitizen                 15382 non-null  object 
 5   CountryLive                    15336 non-null  object 
 6   EmploymentField                8116 non-null   object 
 7   EmploymentStatus               14345 non-null  object 
 8   Gender                         15454 non-null  object 
 9   HasDebt                        15291 non-null  float64
 10  HasFinancialDependents         15329 non-null  float64
 11  HasHighSpdInternet             15380 non-null  float64
 12  HasServedInMilitary            15256 non-null  float64
 13  HoursLearning                  16714 non-null  float64
 14  Income                         7623 non-null   float64
 15  IsEthnicMinority               15293 non-null  float64
 16  IsReceiveDisabilitiesBenefits  15222 non-null  float64
 17  IsSoftwareDev                  18068 non-null  float64
 18  IsUnderEmployed                9223 non-null   float64
 19  JobApplyWhen                   8138 non-null   object 
 20  JobPref                        13483 non-null  object 
 21  JobWherePref                   8118 non-null   object 
 22  LanguageAtHome                 15344 non-null  object 
 23  MaritalStatus                  15335 non-null  object 
 24  MoneyForLearning               16577 non-null  float64
 25  MonthsProgramming              17084 non-null  float64
 26  SchoolDegree                   15368 non-null  object 
 27  SchoolMajor                    8727 non-null   object 
 28  JobRoleInterest                6992 non-null   object 
 29  ExpectedEarning                7200 non-null   float64
dtypes: float64(15), object(15)
memory usage: 4.2+ MB
In [17]:
# New column to indicate year of survey completion
csv["Year"] = 2017
csv2016["Year"] = 2016

# Columns of interest
column_lists = csv.columns.to_list()
column_lists

# Apply column filtering to survey 2016
survey_2016 = csv2016[column_lists]

Dataset merging

In [18]:
# Merge dataframes
combined_survey = pd.concat([csv, survey_2016])

# Merged dataframe length (rows)
print("Number of Rows:")
print(combined_survey.shape[0])
Number of Rows:
33795
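
One detail worth keeping in mind: pd.concat keeps each frame's original row labels. Assuming both surveys carry the default 0-based index, the merged frame will contain repeated index values. That is harmless for the counting done here, but label-based lookups can then return more than one row. A minimal sketch on made-up frames (`a` and `b` are hypothetical stand-ins for the two surveys):

```python
import pandas as pd

# Two tiny frames with the default 0-based index (values are made up)
a = pd.DataFrame({"Age": [27, 34], "Year": 2017})
b = pd.DataFrame({"Age": [41, 19], "Year": 2016})

stacked = pd.concat([a, b])                         # keeps labels 0, 1, 0, 1
renumbered = pd.concat([a, b], ignore_index=True)   # relabels rows 0, 1, 2, 3

print(stacked.index.tolist())
print(renumbered.index.tolist())
```

Passing ignore_index=True is optional; it only matters if later steps select rows by label.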

JobRoleInterest: "Which one of these careers are you interested in?"

Most of the courses offered on our e-learning platform are for web and mobile development. We need to determine whether the sample in the dataset is representative of the population of new coders. One significant limitation of this survey is the number of rows with missing information for JobRoleInterest: roughly 6 out of 10 observations have no response to this question.

It's strange that so many people who took the survey did not answer this question. In addition to this question, perhaps another should have been asked, such as "What are your goals for learning programming?"

After merging both dataframes together we ended up with 33,795 rows. For analysis we're going to remove all observations that failed to answer this question. The final dataframe will include only 13,495 rows.
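
As a rough illustration of this filtering step, the sketch below drops rows with no JobRoleInterest answer from a small made-up frame and reports the share of rows kept. The `toy` frame is an assumption for illustration; the last line repeats the arithmetic with the row counts quoted above.

```python
import numpy as np
import pandas as pd

# Made-up rows; only JobRoleInterest matters for the filter
toy = pd.DataFrame({
    "JobRoleInterest": ["Full-Stack Web Developer", np.nan, np.nan,
                        "Game Developer", np.nan],
    "Age": [34, 27, 41, 20, 39],
})

# Keep only respondents who answered the question
answered = toy.dropna(subset=["JobRoleInterest"])
share_kept = len(answered) / len(toy) * 100
print(f"{len(answered)} of {len(toy)} rows kept ({share_kept:.1f}%)")

# Same arithmetic with the row counts reported in this notebook
print(round(13495 / 33795 * 100, 1))
```

With the notebook's counts, about 40% of the merged rows remain, which matches the "roughly 6 out of 10 missing" figure.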

Among these observations, career interest leans heavily toward web development (full stack, front end, and back end). Many respondents also selected several categories rather than just one. Splitting the string in each row of the JobRoleInterest column will show how many choices each person selected.

For rows containing multiple categories, we can split out each occurrence of a job category with pandas.Series.str.split. This approach lets us count every individual job category.
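
A minimal sketch of the idea on three made-up answers, assuming (as in this survey) that categories are comma-separated and may carry stray spaces. It uses Series.explode as one possible way to count categories; the notebook's own cells use a dictionary loop instead.

```python
import pandas as pd

# Made-up answers in the same comma-separated style as JobRoleInterest
roles = pd.Series([
    "Full-Stack Web Developer",
    "  Front-End Web Developer, Back-End Web Developer",
    "Full-Stack Web Developer,   Front-End Web Developer",
])

per_role = (
    roles.str.split(",")   # each row becomes a list of categories
         .explode()        # one row per category
         .str.strip()      # drop the padding left around the commas
)
role_counts = per_role.value_counts()
print(role_counts)

# Number of categories each respondent picked
n_choices = roles.str.split(",").str.len()
print(n_choices.tolist())
```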

In [19]:
interests = combined_survey["JobRoleInterest"].value_counts(normalize=True) * 100
interests.head(20)
Out[19]:
Full-Stack Web Developer                                                       25.150056
  Front-End Web Developer                                                      13.553168
Back-End Web Developer                                                          6.268989
  Data Scientist / Data Engineer                                                4.786958
  Mobile Developer                                                              3.934791
  User Experience Designer                                                      2.423120
  DevOps / SysAdmin                                                             1.889589
  Product Manager                                                               1.822897
  Data Scientist                                                                1.126343
  Quality Assurance Engineer                                                    0.881808
Game Developer                                                                  0.844757
Information Security                                                            0.681734
Full-Stack Web Developer,   Front-End Web Developer                             0.474250
  Front-End Web Developer, Full-Stack Web Developer                             0.414969
Data Engineer                                                                   0.392738
  User Experience Designer,   Front-End Web Developer                           0.318637
  Front-End Web Developer, Back-End Web Developer, Full-Stack Web Developer     0.288996
Back-End Web Developer,   Front-End Web Developer, Full-Stack Web Developer     0.266765
Back-End Web Developer, Full-Stack Web Developer,   Front-End Web Developer     0.266765
Full-Stack Web Developer,   Front-End Web Developer, Back-End Web Developer     0.229715
Name: JobRoleInterest, dtype: float64
In [20]:
# Combination of all job interests
len(interests)
Out[20]:
3214
In [21]:
# New dataframe excluding any missing data from JobRoleInterest column
survey = combined_survey[combined_survey["JobRoleInterest"].notnull()].copy()

# Splits each occurrence of a job category
survey["JobRoleInterest"] = survey["JobRoleInterest"].str.split(",")
In [22]:
# Combined dataset (survey) missing values in percentage
(survey.apply(pd.isnull).sum()/survey.shape[0] * 100).sort_values(ascending = False)
Out[22]:
EmploymentField                  61.645054
Income                           59.147833
CommuteTime                      53.864394
IsUnderEmployed                  53.093738
SchoolMajor                      47.143386
MaritalStatus                    39.066321
EmploymentStatus                 14.071878
ExpectedEarning                  10.596517
IsReceiveDisabilitiesBenefits     7.773249
HasServedInMilitary               7.654687
IsEthnicMinority                  7.476843
LanguageAtHome                    7.476843
HasDebt                           7.454613
Age                               7.387921
CityPopulation                    7.365691
CountryLive                       7.321230
HasFinancialDependents            7.306410
SchoolDegree                      7.128566
CountryCitizen                    7.121156
HasHighSpdInternet                7.054465
Gender                            6.595035
MoneyForLearning                  6.587625
HoursLearning                     5.779918
MonthsProgramming                 4.579474
AttendedBootcamp                  1.237495
JobPref                           0.955910
JobWherePref                      0.652093
JobApplyWhen                      0.548351
IsSoftwareDev                     0.229715
JobRoleInterest                   0.000000
Year                              0.000000
dtype: float64
In [23]:
# Fill missing data points with the median
survey["ExpectedEarning"] = survey["ExpectedEarning"].fillna(survey["ExpectedEarning"].median())
In [24]:
survey
Out[24]:
Age AttendedBootcamp CityPopulation CommuteTime CountryCitizen CountryLive EmploymentField EmploymentStatus Gender HasDebt HasFinancialDependents HasHighSpdInternet HasServedInMilitary HoursLearning Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev IsUnderEmployed JobApplyWhen JobPref JobWherePref LanguageAtHome MaritalStatus MoneyForLearning MonthsProgramming SchoolDegree SchoolMajor JobRoleInterest ExpectedEarning Year
1 34.0 0.0 less than 100,000 NaN United States of America United States of America NaN Not working but looking for work male 1.0 0.0 1.0 0.0 10.0 NaN 0.0 0.0 0.0 NaN Within 7 to 12 months work for a nonprofit in an office with other developers English single, never married 80.0 6.0 some college credit, no degree NaN [Full-Stack Web Developer] 35000.0 2017
2 21.0 0.0 more than 1 million 15 to 29 minutes United States of America United States of America software development and IT Employed for wages male 0.0 0.0 1.0 0.0 25.0 13000.0 1.0 0.0 0.0 0.0 Within 7 to 12 months work for a medium-sized company no preference Spanish single, never married 1000.0 5.0 high school diploma or equivalent (GED) NaN [ Front-End Web Developer, Back-End Web Deve... 70000.0 2017
3 26.0 0.0 between 100,000 and 1 million I work from home Brazil Brazil software development and IT Employed for wages male 1.0 1.0 1.0 0.0 14.0 24000.0 0.0 0.0 0.0 1.0 Within the next 6 months work for a medium-sized company from home Portuguese married or domestic partnership 0.0 5.0 some college credit, no degree NaN [ Front-End Web Developer, Full-Stack Web De... 40000.0 2017
4 20.0 0.0 between 100,000 and 1 million NaN Portugal Portugal NaN Not working but looking for work female 0.0 0.0 1.0 0.0 10.0 NaN 0.0 0.0 0.0 NaN Within 7 to 12 months work for a multinational corporation in an office with other developers Portuguese single, never married 0.0 24.0 bachelor's degree Information Technology [Full-Stack Web Developer, Information Securi... 140000.0 2017
6 29.0 0.0 between 100,000 and 1 million 30 to 44 minutes United Kingdom United Kingdom NaN Employed for wages female 1.0 0.0 1.0 0.0 16.0 40000.0 NaN 0.0 0.0 0.0 I'm already applying work for a medium-sized company no preference English married or domestic partnership 0.0 12.0 some college credit, no degree NaN [Full-Stack Web Developer] 30000.0 2017
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
15585 32.0 0.0 more than 1 million 40.0 Ukraine Ukraine health care Employed for wages female 0.0 1.0 1.0 0.0 5.0 36000.0 1.0 0.0 0.0 1.0 Within the next 6 months work for a multinational corporation in an office with other developers Russian married or domestic partnership 5.0 2.0 bachelor's degree Linguistics [ Front-End Web Developer] 8400.0 2016
15598 51.0 0.0 less than 100,000 30.0 United States of America United States of America finance Employed for wages male 1.0 1.0 1.0 1.0 30.0 200000.0 0.0 0.0 0.0 0.0 more than 12 months from now work for a medium-sized company in an office with other developers English married or domestic partnership 100.0 12.0 professional degree (MBA, MD, JD, etc.) Investments and Securities [Full-Stack Web Developer] 100000.0 2016
15600 38.0 0.0 more than 1 million 90.0 United States of America United States of America finance Employed for wages male 0.0 1.0 1.0 0.0 6.0 200000.0 0.0 0.0 0.0 0.0 more than 12 months from now work for a startup no preference English married or domestic partnership 500.0 12.0 bachelor's degree Finance [Full-Stack Web Developer] 150000.0 2016
15608 40.0 0.0 more than 1 million 60.0 Australia Australia software development and IT Employed for wages male 1.0 1.0 0.0 0.0 10.0 200000.0 0.0 0.0 0.0 0.0 more than 12 months from now work for a multinational corporation in an office with other developers English married or domestic partnership 0.0 2.0 bachelor's degree Computer Systems Analysis [ DevOps / SysAdmin] 80000.0 2016
15615 28.0 0.0 less than 100,000 7.0 United States of America United States of America food and beverage Employed for wages male 1.0 1.0 1.0 0.0 20.0 200000.0 0.0 0.0 0.0 1.0 I'm already applying work for a medium-sized company from home English married or domestic partnership 1400.0 7.0 associate's degree Computer and Information Systems Security [Full-Stack Web Developer] 50000.0 2016

13495 rows × 31 columns

In [25]:
# Counts each occurrence of a particular category
category_count = dict()

# For loop for counting each individual category in the JobRoleInterest column
for categories in survey["JobRoleInterest"]: 
    for category in categories:
        if category in category_count:
            category_count[category] += 1 # counts category key if already present in dictionary
        else:
            category_count[category] = 1 # adds unique category key to dictionary if not already present

# Transforms dictionary to dataframe 
category_count = pd.DataFrame.from_dict(category_count, orient="index", columns= ["Count"])
category_count = category_count.reset_index(level = 0)
category_count = category_count.rename(columns = {"index":"Interests"})
In [26]:
category_count["Interests"].unique()
Out[26]:
array(['Full-Stack Web Developer', '  Front-End Web Developer',
       ' Back-End Web Developer', '   DevOps / SysAdmin',
       '   Mobile Developer', ' Full-Stack Web Developer',
       ' Information Security', '   Front-End Web Developer',
       '   Quality Assurance Engineer', ' Game Developer',
       '   User Experience Designer', '  DevOps / SysAdmin',
       '   Data Scientist', ' Data Engineer', 'Back-End Web Developer',
       'Information Security', '  Data Scientist', '  Mobile Developer',
       '   Product Manager', 'Data Engineer', 'Game Developer',
       '  Product Manager', '  User Experience Designer',
       '  Quality Assurance Engineer', 'Ethical Hacker',
       ' security expert', ' Technical Writer', ' Researcher',
       'Systems Engineer', 'Desktop Applications Programmer', ' Robotics',
       'Non technical ', ' UI Design', 'Software engineer ',
       'email coder', ' Data analyst', ' I dont yet know',
       ' UX developer/designer', ' support scientific resaerch ',
       ' AI and neuroscience', 'Full Stack Software Engineer',
       ' Program Manager', ' Application Support Analyst',
       " This futurist's dream of using some tech in a way that inspires critical amounts of people to influence the changes we need to protect ",
       ' Information Architect', 'Physicist ',
       'Security Business Analyst ', ' Bioinformatics/science ',
       ' creative coder / generative artist/designer',
       ' a job in which I can use coding skills to create valuable portals to advance human rights',
       'Research ', ' Bitcoin/Crypto', 'Embedded hardware',
       'Data/Interactive Journalist', 'Software Engineering',
       ' Software Engineer', ' Business Analyst', 'Network Engineer',
       'Information Developer', 'Java developer', ' Project Management',
       'Machine learning engineer', 'Real-time systems', ' Cybersecurity',
       ' software engineer', 'GIS Developer', 'Research and education',
       ' System Software', 'Full Stack Developer ', 'AI',
       '  Bioinformatics ', ' Data Analyst', 'Urban Planner',
       'Software Engineer', 'full stack developer', ' SWE',
       ' Embedded Developer', ' virtual reality developer',
       ' Journalist/Graphic Designer/Marketing', ' Web Designer',
       'Computer Architect', ' Networking', 'Software Developer',
       ' Software Developer', ' Machine Learning Engineer',
       ' data analyst', ' AI and Machine Learning', ' computer engineer',
       ' Artificial Intelligence', 'Systems Programming',
       'Software Engineer (Computer Science Based)',
       'Technology Management', 'full-stack developer',
       ' Software developer', 'BA or developer', ' User Interface Design',
       'System Engineer', 'Network', ' Analyst', ' Machine Learning ',
       'Pharmacy tech', 'data journalist / data visualist', 'Desings',
       ' Infrastructure Architect ', ' Tech art',
       ' Technology-Business Liaison', ' Product Designer',
       'Front-End Web Designer', 'Document Controller',
       ' Software enginner', ' programmer', 'undeceided',
       'Pharmaceutical industry', ' Information Technology',
       ' Library Developer', ' Desktop Application Developer',
       ' Machine Learning', ' Operating Systems', ' Compilers', ' etc...',
       ' GIS Database Admin', ' designer',
       'Support Engineer or API Support', ' Software engineer',
       ' Python Developer', ' Bioinformatics',
       'Robotics Process Automation Specialist', 'Data visualisation',
       ' Desktop applications developer',
       'All - whatever is required to develop tools to revolutionize the mechanical engineering process',
       'Digital Humanitites', ' User Interface Designer',
       'Artificial Intelligence', ' Software Development', 'Programming',
       'Web development ', ' Marketing', 'Financial Services',
       'software developer', 'Natural Language Processing',
       ' Entreprenuer / Web Dev Hustler ', ' Machine Learning Engineer ',
       'Marketing Automation ', 'AI Developer', ' network admin',
       'Front end', ' back end', ' game', ' web', ' mobile developer',
       'Not sure!', ' Anything that engages me',
       "i don't know what the difference is between most of these soz lol",
       'Unsure', 'Any of them.', 'Not sure yet', 'Not Sure Yet',
       'Not sure', ' i dunno!!!!', ' milatary engineer', ' SEO',
       'Software engineer', 'Astrophysicist', ' Journalist',
       'philosopher', ' Java developer', 'Desktop Applications',
       ' Programmer', 'IoT Developer', 'Systems Programmer',
       'Web Designer', "Don't know yet", ' Artificial intelligence',
       ' Artificial Intelligence Engineer', 'Developer Evangelist',
       ' Bioinformatitian', ' IoT', ' Entrepreneur',
       ' I am interested in Game Development', ' Mobile Development',
       ' Web Design', ' Front End Web Development', 'programmer',
       'Data Reporter', 'Not Sure', 'Web developer',
       'User Interface Designer', 'Robotics and AI Engineer',
       ' Ethical Hacker', ' Artificial Intelligence engineer',
       ' Scientific Programming',
       ' Software Developer or Front-End Web Developer', ' UI Designer',
       ' Campaign Manager', ' AI Engineer', 'Software Specialist ',
       ' Project Manager', ' Growth Hacker', 'Research', 'idk',
       ' Founder', 'Software Engineers', 'VR Technology developer',
       ' developer', ' plc', 'Ceo', ' Tech lobbiest',
       'Quant (Algorithmic Trader)', 'Machine learning and AI ',
       'Project manager', 'undecided', ' Databases', 'Project Manager',
       'Cloud computing ', 'Software Developper', 'College professor',
       ' System Administrator/Network', ' Software Projects Manager',
       'Teacher. Teaching students to code. ', 'Education',
       'code developer...in whatever format', ' front-end', ' back-end',
       ' app dev etc.',
       'improving in my current career as a Learning technologist',
       'Informatician', ' Artificial Intelligence ', 'lab scientist',
       'Data Visualization Specialist', "I don't know yet!",
       "I'm just learning code to increase my skill-set. I see it as a literacy issue.",
       ' Teacher',
       ' Criminal Defense Attorney-- focusing on cyber crimes ',
       'Remote Support', 'non-programmer', ' IT specialist ',
       '  Data Scientist / Data Engineer'], dtype=object)

There are many different "job interests" throughout the survey, and it's clear that respondents were able to write in their own response to the question. The main drawback of this approach is that we end up with many variations of the same career, inconsistent spelling and capitalization, and non-answers.

pandas treats each of these as a distinct value, which makes a completely accurate count harder to obtain; "Front-End Web Developer", for example, appears in several variations. Some values also carry extra whitespace. To clean up the values in this dataframe, we'll strip the surrounding whitespace and convert everything to lowercase.
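
To show what this normalization does and does not fix, here is a small sketch on made-up counts (the `counts` frame is hypothetical). Whitespace and capitalization variants collapse into one label, but genuinely different spellings such as "full stack developer" stay separate.

```python
import pandas as pd

# Made-up labels and counts imitating the write-in variations
counts = pd.DataFrame({
    "Interests": ["Full-Stack Web Developer", " Full-Stack Web Developer",
                  "full stack developer", "Software engineer ",
                  " software engineer"],
    "Count": [5, 3, 1, 2, 4],
})

# Strip surrounding whitespace and lowercase, then re-aggregate
counts["Interests"] = counts["Interests"].str.strip().str.lower()
cleaned = counts.groupby("Interests")["Count"].sum().sort_values(ascending=False)
print(cleaned)
```

Merging the remaining spelling variants would need an explicit mapping from each variant to a canonical label, which this notebook does not attempt.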

In [27]:
# Strips whitespace, changes to lower case
category_count["Interests"] = category_count["Interests"].str.strip().str.lower()

# Group by interests and add up the number of occurrences
category_count.groupby("Interests").sum().sort_values(by = "Count", ascending= False).head(50)
Out[27]:
Count
Interests
full-stack web developer 6769
front-end web developer 4912
back-end web developer 3476
mobile developer 2719
user experience designer 1744
data scientist 1643
game developer 1628
information security 1326
data engineer 1248
devops / sysadmin 1146
product manager 1005
data scientist / data engineer 646
quality assurance engineer 602
software engineer 16
software developer 8
artificial intelligence 5
data analyst 5
programmer 4
machine learning engineer 4
desktop application developer 3
not sure 3
not sure yet 3
project manager 3
machine learning 2
product designer 2
web designer 2
full stack developer 2
research 2
ethical hacker 2
user interface designer 2
researcher 2
business analyst 2
bioinformatics 2
undecided 2
unsure 2
java developer 2
artificial intelligence engineer 2
python developer 1
quant (algorithmic trader) 1
project management 1
remote support 1
research and education 1
real-time systems 1
philosopher 1
programming 1
program manager 1
mobile development 1
natural language processing 1
network 1
network admin 1
In [28]:
# Career interest frequency
group_category = category_count.groupby("Interests").sum().sort_values(by = "Count", ascending= False).head(50)

# Plot results
fig, ax = plt.subplots(figsize = (10,8))
plt.barh(group_category.index[:15], group_category["Count"][:15], height = .6, color = "grey")

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# Invert data, x ticks to top
plt.gca().invert_yaxis()
ax.xaxis.tick_top()

# Title
plt.title("Career Interests", size = 20, loc = "left", x = -0.28, y = 1.08)

# X label
plt.text(-1950, -1.6,"Frequency", size = 14, color = "grey")

plt.show()

The cleaning is not perfect, but the result clearly shows a wide range of interests, led by web development and extending to data science, game development, and many other fields.

Although the interests are mixed, this is a good way to show that individuals might be interested in topics other than web development. We also see that some individuals responded with different versions of "I don't know". While it would be possible to remove any rows with these answers, there are so few of them that doing so is unlikely to affect our analysis either way.
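
For a sense of scale, the sketch below flags the "don't know" style answers and sums them. The counts are copied from the table above for a handful of labels only, and the `unknown` set is a hypothetical, incomplete list that would need extending to cover every variant.

```python
import pandas as pd

# A few counts copied from the cleaned table above
counts = pd.Series({
    "full-stack web developer": 6769,
    "not sure": 3,
    "not sure yet": 3,
    "undecided": 2,
    "unsure": 2,
})

# Hypothetical list of "don't know" labels; extend as needed
unknown = {"not sure", "not sure yet", "undecided", "unsure", "idk"}

is_unknown = counts.index.isin(unknown)
n_unknown = counts[is_unknown].sum()
print(n_unknown)              # total "don't know" style answers in this sample
print(counts[~is_unknown])    # what remains after dropping them
```

Even against the single largest category, these answers are a tiny fraction, which supports leaving them in.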

Age and Gender

In [29]:
# Gender frequency (Freecodecamp)
genders = survey["Gender"].value_counts(normalize=True, dropna=False) * 100

# Plot results
fig, ax = plt.subplots(figsize = (12, 8))
genders.plot(kind = "bar", color = "grey", width = .58)

# Title
plt.title("Gender representation (FreeCodeCamp)", size = 19, loc = "left", x = -0.1, y = 1.02)

# Remove spines
plt.gca().spines[["top", "left", "right"]].set_visible(False)

# X and Y labels
plt.ylabel("Frequency (percent)", color = "grey", size = 14, loc = "top")
plt.xlabel("Gender", color = "grey", size = 14, loc = "left")

# X and Y ticks
plt.yticks(size = 12)
plt.xticks(rotation = 0, size = 12)

plt.show()

We'll now introduce a similar survey conducted in 2018 by Stack Overflow, a popular site for asking and answering programming questions and part of the Stack Exchange network (hence the "Stack Exchange" labels in the code and plots below). We'll clean this dataset shortly, but first we can get an overview of its contents and how its demographics compare to FreeCodeCamp's.
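
One way to compare the two surveys numerically, rather than only by eye across separate plots, is to line up their percentage tables in a single frame. The sketch below uses made-up gender values for both sources (the `fcc` and `stack` series are assumptions, not the real columns) and lowercases the labels so the two tables share an index.

```python
import pandas as pd

# Made-up responses; the real surveys may use different labels
fcc = pd.Series(["male", "male", "female", "male", "female"])
stack = pd.Series(["Male", "Male", "Male", "Female"])

# Percent per category for each source, aligned on the shared labels
side_by_side = pd.concat(
    {
        "FreeCodeCamp": fcc.str.lower().value_counts(normalize=True) * 100,
        "Stack Exchange": stack.str.lower().value_counts(normalize=True) * 100,
    },
    axis=1,
)
print(side_by_side)
```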

In [30]:
# Gender frequency (Stack Exchange)
genders_stk_exchange = exchange["Gender"].value_counts(normalize=True, dropna=False) * 100

# Plot results
fig, ax = plt.subplots(figsize = (12, 8))
genders_stk_exchange[:3].plot(kind = "bar", color = "grey", width = .57)

# Title
plt.title("Gender representation (Stack Exchange)", size = 19, loc = "left", x = -0.1, y = 1.02)

# Remove spines
plt.gca().spines[["top", "left", "right"]].set_visible(False)

# X and Y labels
plt.ylabel("Frequency (percent)", color = "grey", size = 14, loc = "top")
plt.xlabel("Gender", color = "grey", size = 14, loc = "left")

# X and Y ticks
plt.yticks(size = 12)
plt.xticks(rotation = 0, size = 12)

plt.show()
In [31]:
# Age distribution plotted
fig, ax = plt.subplots(figsize = (12,8))
survey["Age"].hist(bins = 20, color = "grey")

# Title
plt.title("Age Groups (FreeCodeCamp)", size = 19, loc = "left", x = -0.1, y = 1.02)

# Remove gridlines
ax.grid(False)

# Remove spines
plt.gca().spines[["right","top"]].set_visible(False)

# X and Y labels
plt.ylabel("# of observations", color = "grey", size = 14, loc = "top")
plt.xlabel("Age", color = "grey", size = 14, loc = "left")

# X and Y ticks
plt.yticks(size = 12)
plt.xticks(size = 12)

# Text
plt.text(32.5,2700,"Most new programmers\nare in their early 20s to early 30s", size = 14, color = "maroon")

# Main demographic highlighted (interquartile range); ymin/ymax are axes fractions between 0 and 1
plt.axvspan(survey["Age"].quantile(0.25), survey["Age"].quantile(0.75), ymax=1, color = "maroon", alpha = 0.4)

plt.show()
In [32]:
# Stack exchange age groups

# Color assignment
colors = ["grey","grey", "maroon", "grey", "grey", "grey"]

# Plot results
fig, ax = plt.subplots(figsize = (12, 8))
ages = exchange["Age"].value_counts().iloc[[4,1,0,2,3,5]].plot.bar(width = 0.65, color = colors)

# Remove spines
plt.gca().spines[["top", "left", "right"]].set_visible(False)

# Title
plt.title("Age Groups (Stack Exchange)", size = 19, loc = "left",x = -0.1, y = 1.02)

# X and Y labels
plt.ylabel("# of observations", color = "grey", size = 14, loc = "top")
plt.xlabel("Age", color = "grey", size = 14, loc = "left")

# X and Y ticks
plt.yticks(size = 12, color = "grey")
plt.xticks(size = 11, rotation = 0, color = "grey")

# Most frequent age group highlighted
plt.gca().get_xticklabels()[2].set_color("maroon")

plt.show()

Country Representation

In [33]:
# FreeCodeCamp countries
# Country frequency in percent, as a dataframe with "Country" and "Percentage" columns.
# Naming the columns explicitly avoids relying on the names value_counts/reset_index
# produce, which differ between pandas versions.
countries = (
    survey["CountryLive"]
    .value_counts(normalize=True)
    .mul(100)
    .rename_axis("Country")
    .reset_index(name="Percentage")
)

#------------------------------------------------------------------------------------------------#

# Stack Exchange countries
# Country frequency in percent, as a dataframe with "Country" and "Percentage" columns.
# Naming the columns explicitly avoids relying on the names value_counts/reset_index
# produce, which differ between pandas versions.
countries_stack = (
    exchange["Country"]
    .value_counts(normalize=True)
    .mul(100)
    .rename_axis("Country")
    .reset_index(name="Percentage")
)

#---------------------------------------------------------------------------------------------------#

# Plot results (FreeCodeCamp)

# Color assignment
colors = ["maroon","maroon","maroon","maroon","grey","grey","grey","grey","grey","grey"]

fig, ax = plt.subplots(figsize = (10, 8))
plt.barh(countries["Country"][:10], countries["Percentage"][:10], color = colors, height= 0.65)

# Title
plt.title("Country Representation (FreeCodeCamp)", loc = "left", size = 18, x = -0.3, y = 1.08)

# Invert data, x ticks to top
plt.gca().invert_yaxis()
ax.xaxis.tick_top()

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# Text
plt.text(-15.2, -1.4,"Frequency (in percent)", size = 14, color = "grey")

# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14, color = "grey")

# Top 4 countries highlighted
plt.gca().get_yticklabels()[0].set_color("maroon")
plt.gca().get_yticklabels()[1].set_color("maroon")
plt.gca().get_yticklabels()[2].set_color("maroon")
plt.gca().get_yticklabels()[3].set_color("maroon")

plt.show()


# Plot results (Stack Exchange)

# Color Assignment
colors = ["maroon","maroon","#D6A0A9","maroon","maroon","grey","grey","grey","grey","grey"]

fig, ax = plt.subplots(figsize = (10, 8))
plt.barh(countries_stack["Country"][:10], countries_stack["Percentage"][:10], color = colors, height= 0.6)

# Title
plt.title("Country Representation (Stack Exchange)", loc = "left", size = 18, x = -0.23, y = 1.09)

# Invert data, x ticks to top
plt.gca().invert_yaxis()
ax.xaxis.tick_top()

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# Text
plt.text(-4.9, -1.4,"Frequency (in percent)", size = 14, color = "grey")

# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14, color = "grey")

# Highlight top 5 countries
plt.gca().get_yticklabels()[0].set_color("maroon")
plt.gca().get_yticklabels()[1].set_color("maroon")
plt.gca().get_yticklabels()[2].set_color("#D6A0A9") # Germany
plt.gca().get_yticklabels()[3].set_color("maroon")
plt.gca().get_yticklabels()[4].set_color("maroon")

plt.show()
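
The frequency tables above rely on `reset_index()` producing a column literally named "index", which depends on the installed pandas version. A minimal sketch of a version-independent alternative follows; the helper name `frequency_table` and the toy data are hypothetical and not part of the original notebook.

```python
import pandas as pd

def frequency_table(series: pd.Series, label: str) -> pd.DataFrame:
    """Return a two-column table of category percentages, largest first."""
    return (
        series.value_counts(normalize=True)  # proportions; NaN is excluded by default
        .mul(100)                            # convert proportions to percent
        .rename_axis(label)                  # name the category index explicitly
        .reset_index(name="Percentage")      # name the value column explicitly
    )

# Hypothetical toy data, for illustration only
toy = pd.Series(["A", "A", "B", "A", "C", None], name="CountryLive")
table = frequency_table(toy, "Country")
print(table)
```

Because both column names are set explicitly, no `rename` step is needed afterwards.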

Education levels

In [34]:
# FreeCodeCamp
# School degree frequency (Freecodecamp)
code_camp_edu = survey["SchoolDegree"].value_counts(normalize=True) * 100
# Frequency table to dataframe
code_camp_edu = pd.Series.to_frame(code_camp_edu).reset_index()
# Rename dataframe columns
code_camp_edu = code_camp_edu.rename(columns={"index":"School Degree","SchoolDegree":"Percentage"})

# Color assignment
colors = ["maroon","maroon","grey","grey","grey","grey","grey","grey","grey","grey"]

# Plot results
fig, ax = plt.subplots(figsize = (10, 8))
plt.barh(code_camp_edu["School Degree"][:10], code_camp_edu["Percentage"][:10], color = colors, height= 0.62)

# Title
plt.title("School Degree Representation (FreeCodeCamp)", loc = "left", size = 18, x = -0.52, y = 1.1)

# Y label
plt.ylabel("School Degree", loc = "top", size = 14, color = "grey")

# Invert data, x ticks to top
plt.gca().invert_yaxis()
ax.xaxis.tick_top()

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# Text
plt.text(-12, -1.6,"Frequency (in percent)", size = 14, color = "grey")

# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14, color = "grey")

# Highlight top 2 degrees
plt.gca().get_yticklabels()[0].set_color("maroon")
plt.gca().get_yticklabels()[1].set_color("maroon")

plt.show()


# Stack Exchange
# Replace string values
exchange["FormalEducation"] = exchange["FormalEducation"].replace({"Secondary school (e.g. American high school, German Realschule or Gymnasium, etc.)":"High School"})

# School degree frequency (Stack Exchange)
stk_exchange_edu = exchange["FormalEducation"].value_counts(normalize=True) * 100
# Frequency table to dataframe
stk_exchange_edu = pd.Series.to_frame(stk_exchange_edu).reset_index()
# Rename dataframe columns
stk_exchange_edu = stk_exchange_edu.rename(columns={"index":"School Degree","FormalEducation":"Percentage"})

# Color assignment
colors = ["maroon","maroon","grey","grey","grey","grey","grey","grey","grey","grey"]

# Plot results
fig, ax = plt.subplots(figsize = (10, 8))
plt.barh(stk_exchange_edu["School Degree"], stk_exchange_edu["Percentage"], color = colors, height= 0.62)

# Title
plt.title("School Degree Representation (Stack Exchange)", loc = "left", size = 18, x = -0.72, y = 1.1)

# Y label
plt.ylabel("School Degree", loc = "top", size = 14, color = "grey")

# Invert data, x ticks to top
plt.gca().invert_yaxis()
ax.xaxis.tick_top()

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# Text
plt.text(-12, -1.5,"Frequency (in percent)", size = 14, color = "grey")

# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14, color = "grey")

# Highlight top 2 degrees
plt.gca().get_yticklabels()[0].set_color("maroon")
plt.gca().get_yticklabels()[1].set_color("maroon")
plt.show()
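
The four bar charts above repeat the same styling steps. As a sketch only, the shared parts could be collected in one function; the name `highlighted_barh` and its parameters are hypothetical, the toy values are made up, and the title and text offsets used in the notebook are left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

def highlighted_barh(labels, values, n_highlight, title):
    """Horizontal bar chart with the first n_highlight bars and labels in maroon."""
    colors = ["maroon" if i < n_highlight else "grey" for i in range(len(labels))]
    fig, ax = plt.subplots(figsize=(10, 8))
    ax.barh(labels, values, color=colors, height=0.62)
    ax.set_title(title, loc="left", size=18)
    ax.invert_yaxis()      # first (largest) category at the top
    ax.xaxis.tick_top()    # x ticks along the top edge
    ax.spines[["right", "left", "top", "bottom"]].set_visible(False)
    # Color each y tick label to match its bar
    for tick_label, color in zip(ax.get_yticklabels(), colors):
        tick_label.set_color(color)
    return fig, ax

# Hypothetical values, for illustration only
fig, ax = highlighted_barh(["A", "B", "C"], [50, 30, 20], n_highlight=2, title="Toy example")
```

Deriving the color list from `n_highlight` avoids hand-writing a ten-element list for every chart.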

Thus far we have done the following:

  1. Read in the dataframes
  2. Investigated missing data
  3. Selected appropriate columns for analysis
  4. Merged 2016 and 2017 Freecodecamp surveys into one dataframe
  5. Filtered out all observations that are missing JobRoleInterest input
  6. Plotted the frequency of JobRoleInterest categories
  7. Plotted the frequency of age and genders for both dataframes (freecodecamp & stack exchange)
  8. Plotted the frequency of countries for both dataframes
  9. Plotted education levels

Both datasets share a similar distribution of age and gender. Among new programmers, men make up the majority of respondents (about 70%), with women at about 20%.

The Stack Exchange survey consists primarily of respondents in STEM careers, and its gender imbalance is even more pronounced. Men represent nearly 60% of respondents, NaNs (unknown or missing data) roughly 35%, and women only around 5%.

Age distribution is roughly the same too. New programmers are most likely to be in their early 20s to early 30s, and stack exchange survey participants are usually 25 to 34 years old.

Country representation is about the same in both surveys. The largest share of participants comes from the United States, followed by India, in both cases. The countries with the highest participation are English-speaking countries (except for Germany in Stack Exchange).

Bachelor's degrees are the most common degree held by respondents from both surveys.

We've seen a high-level overview of the data. To provide customers with the most relevant training possible, we need to discover why people decide to learn a new skill like programming.

We'll provide several charts and data that we believe support the idea that new programmers are motivated by income and career opportunities. Although only 40% of respondents answered the JobRoleInterest question, 13,495 observations are more than enough to get a representative sample. Respondents are interested in many different career paths that use programming and tech skills.

Job Benefits and Satisfaction

Participants were asked the following questions regarding employment opportunities:

  1. "Imagine that you are assessing a potential job opportunity. Please rank the following aspects of the job opportunity in order of importance, where 1 is the most important and 10 is the least important."

  2. "Now, imagine you are assessing a job's benefits package. Please rank the following aspects of a job's benefits package from most to least important to you, where 1 is most important and 11 is least important."

When we average the rankings for each job aspect and benefit, the most important items should have the lowest average scores (since 1 is most important, and 10 or 11 is least important). Before this calculation, we'll perform a bit of data cleaning on the Stack Exchange dataset.
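
To make the "lower average means more important" reading concrete, here is a small sketch with made-up rankings. The column names mirror the renamed survey columns, but the values are hypothetical.

```python
import pandas as pd

# Hypothetical rankings (1 = most important); the last respondent skipped the question
ranks = pd.DataFrame({
    "Compensation": [1, 2, 1, float("nan")],
    "Company_culture": [2, 1, 3, float("nan")],
    "Company_diversity": [3, 3, 2, float("nan")],
})

# Column means skip NaN by default; the lowest mean rank is the top priority
mean_rank = ranks.mean(axis=0).sort_values()
top = mean_rank.index[0]
print(mean_rank)
print("Most important on average:", top)
```

Sorting in ascending order puts the most important item first, which is the reverse of the descending sort the notebook uses so that the plots draw it last.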

In [35]:
# Rename current job related columns from stack exchange dataset
# Currency related columns
currency = exchange.columns[51:56].tolist()

# Columns up to index 38
columns = exchange.columns[:38].tolist()

# Age and gender columns
columns.extend(["Gender", "Age"])

# Add currency related columns to list
columns.extend(currency)

# Isolates dataframe down to columns from list "columns"
stk_exchange = exchange[columns].copy()

# Rename job aspects and job benefits columns for easier comprehension
rename_cols = {
                "AssessJob1":"Industry_working_in",
                "AssessJob2":"Company_funding",
                "AssessJob3":"Department_working_in",
                "AssessJob4":"Technologies/Frameworks",
                "AssessJob5":"Compensation_and_benefits",
                "AssessJob6":"Company_culture",
                "AssessJob7":"WFH",
                "AssessJob8":"Professional_development",
                "AssessJob9":"Company_diversity",
                "AssessJob10":"Product_impact",
                "AssessBenefits1":"Compensation",
                "AssessBenefits2":"Stock_options",
                "AssessBenefits3":"Health_insurance",
                "AssessBenefits4":"Parental_leave",
                "AssessBenefits5":"Fitness_wellness_benefit",
                "AssessBenefits6":"Retirement",
                "AssessBenefits7":"Meals/snacks",
                "AssessBenefits8":"Computer/office_equipment",
                "AssessBenefits9":"Childcare_benefit",
                "AssessBenefits10":"Transportaion_benefit",
                "AssessBenefits11":"Conference/education_budget"
                }

exchange = exchange.rename(columns=rename_cols)

# Isolate rows only containing following countries listed below
stk_countries = stk_exchange[stk_exchange["Country"].str.contains("United States|India|United Kingdom|Canada", na = False)]
len(stk_countries["Country"])
Out[35]:
43644
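
Note that `str.contains` matches substrings, so a pattern such as "India" would also match any longer country name that contains it. The sketch below compares it with exact matching via `isin`. The toy values are hypothetical, and whether the survey actually contains such names is not checked here.

```python
import pandas as pd

# Hypothetical country column, for illustration only
country = pd.Series([
    "United States",
    "India",
    "British Indian Ocean Territory",
    "Canada",
    None,
])

targets = ["United States", "India", "United Kingdom", "Canada"]

# Substring match: also keeps names that merely contain a target
substring_mask = country.str.contains("|".join(targets), na=False)
# Exact match: keeps only names equal to a target; missing values are False
exact_mask = country.isin(targets)

print(substring_mask.sum(), exact_mask.sum())
```

If the two counts differ on the real data, the extra rows are worth inspecting before they are included in the country subset.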
In [36]:
exchange
Out[36]:
[DataFrame preview: rows 0-4 and 98850-98854 of exchange, with the AssessJob and AssessBenefits columns shown under the descriptive names assigned above]

98855 rows × 129 columns

These benefits and aspects were rated by current employees working in STEM fields, so we have to be careful not to assume that these ratings apply directly to the new programmers who participated in FreeCodeCamp's survey (many of whom do not work in software/tech jobs).

However, if FreeCodeCamp had asked the same questions, we would probably see similar results. Using the Stack Exchange survey as a proxy, compensation and health insurance are the most important benefits to job applicants and to those interested in switching jobs. The least important benefits include childcare, parental leave, and a fitness/wellness benefit.

Job aspects describe how job candidates view a potential job opportunity and the particular make-up of an organization. Respondents rated pay and benefits (which for some reason is listed as both a benefit and an aspect), the technologies or programs used, career mobility, and company culture as more important than other aspects.

In [37]:
# Slice dataset to contain only job aspect columns
job_assessment = exchange.iloc[:,17:27]

# Constructs new dataframe of column averages
assessments = pd.Series.to_frame(job_assessment.mean(axis=0).sort_values(ascending=False)) # Calculate averages along each column
# Assign index name
assessments.index.name = "Aspects"
# Keep the aspect names as the index; the plot below reads them from assessments.index

#---------------------------------------------------------------------------------------------------------------------------------#
# Slice dataset to contain only job benefit columns
benefits = exchange.iloc[:,27:38]
# Constructs new dataframe of column averages
job_benefits = pd.Series.to_frame(benefits.mean(axis=0).sort_values(ascending=False)) # Calculate averages along each column
# Assign index name
job_benefits.index.name = "Benefits"
# Keep the benefit names as the index; the plot below reads them from job_benefits.index

#----------------------------------------------------------------------------------------------------------------------------------#

# Plot results
# If looking for a new job, rate importance of job benefits from 1 (most important) to 11 (least important)
# Color assignment
colors = ["grey","grey","grey","grey","grey","grey","grey","grey","grey","maroon","maroon"]

fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(job_benefits.index, job_benefits[0], color = colors, height= 0.62)

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# X axis top
ax.xaxis.tick_top()

# Title
plt.title("Job benefits", size = 19, loc = "left", x= -0.35, y = 1.16)

# Text
plt.text(-3.2,12,"Rating (1 most important, 11 least important), average", color = "grey", size = 14)

# X and Y ticks
plt.yticks(size = 14, color = "grey")
plt.xticks(size = 13, color = "grey")

# Highlight top 2 benefits
plt.gca().get_yticklabels()[-1].set_color("maroon")
plt.gca().get_yticklabels()[-2].set_color("maroon")
plt.show()

# Plot results
# If looking for a new job, rate importance of job aspects from 1(most important) to 10(least important)
# Color assignment
colors = ["grey","grey","grey","grey","grey","grey","maroon","maroon","maroon","maroon"]

fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(assessments.index, assessments[0], color = colors, height= 0.6)

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# X axis top
ax.xaxis.tick_top()

# Title
plt.title("Job aspects", size = 19, loc = "left", x= -0.4, y = 1.16)

# Text
plt.text(-3.2,11,"Rating (1 most important, 10 least important), average", color = "grey", size = 14)

# X and Y ticks
plt.yticks(size = 14, color = "grey")
plt.xticks(size = 13, color = "grey")

# Highlight top 4 job aspects
plt.gca().get_yticklabels()[-1].set_color("maroon")
plt.gca().get_yticklabels()[-2].set_color("maroon")
plt.gca().get_yticklabels()[-3].set_color("maroon")
plt.gca().get_yticklabels()[-4].set_color("maroon")

plt.show()

Income/Financial Situations

Income: Respondents were asked their current yearly income.

ExpectedEarning: "About how much money do you expect to earn per year at your first developer job, in US dollars?"

Has Debt: The question asked was "Do you have any debt?"

In a high-level overview, we'll see that both the median and the average income of new programmers are less than \$50,000 (US dollars). We'll also see that new programmers expect to earn about \$15,000 to \$20,000 more in their new tech/software careers than they currently earn.

In [38]:
# Income distribution
# Plot results
fig, ax = plt.subplots(figsize = (14,10))
survey["Income"].plot.hist(bins = 120, color = "grey", xlim = (0,250000))

# Remove spines
plt.gca().spines[["right","top"]].set_visible(False)

# Title
plt.title("Income distribution of survey respondents\n(All countries)",loc= "left", size = 18, y = 1.02)

# Average and median income
plt.axvline(survey["Income"].mean(), color = "red", alpha = 0.5, linewidth = 3)
plt.axvline(survey["Income"].median(), color = "blue", alpha = 0.5, linewidth = 3)

# Misc. Text
plt.text(41000, 850, " Average \n Income", size = 15, color = "red")
plt.text(14000, 850, " Median \n Income", size = 15, color = "blue")

# X and Y labels
plt.ylabel("Frequency",size = 15, loc = "top", color ="grey")
plt.xlabel("Income, Yearly (US dollars)", size = 15, loc = "left", color ="grey")

# X and Y ticks
plt.yticks(size = 14)
plt.xticks(size = 13)

plt.show()
In [39]:
# Difference between current income and expected income
fig, ax = plt.subplots(figsize = (13,10))

# Freecodecamp survey expected earning distribution
survey["ExpectedEarning"].plot.kde(xlim = (0, 200000), color = "#ED7E00", linewidth = 3)

# Freecodecamp survey current income distribution
survey["Income"].plot.kde(color = "#4B86C1", linewidth = 3)

# Title
plt.title("Current earnings vs. Expected earnings\n(All countries)", size = 18, loc = "left", y = 1.05)

# X and Y ticks
plt.xticks(size = 14)
plt.yticks(size = 14)

# X and Y labels
plt.ylabel("Density (Probability)", size = 14, color = "grey", loc = "top")
plt.xlabel("Income, Yearly (US dollars)", size = 14, color = "grey", loc = "left")

# Remove spines
plt.gca().spines[["right", "top"]].set_visible(False)

# Misc. text
plt.text(x = 0.01, y = 0.84, s="Income: Freecodecamp", color = "#4B86C1", size = 13, transform=ax.transAxes)
plt.text(x = 0.29, y = .90, s="Desired Income", color = "#ED7E00", size = 13, transform=ax.transAxes)
plt.text(0.55,0.85,"""Typically, survey participants expect to earn\n\$15,000 to \$20,000 more in their new career,
compared to their current income""", color = "grey", size = 14, transform=ax.transAxes)

# X and Y ticks
plt.yticks(size = 13)
plt.xticks(size = 13)

plt.show()

We can find each person's desired salary increase (relative to their current income, as a percentage) by utilizing the following formula:

Increase = New Number - Original Number

% increase = Increase / Original Number x 100

Rows with a missing value in either column yield NaN in the new column, and respondents who expect to earn less than their current income yield negative percentages. Missing data won't be dropped; instead, we'll ignore any percentages below 0.

We'll notice that most often, respondents desire a salary increase in the range of 0% to 120%.
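
As a worked example of the formula, the sketch below applies it to a few made-up rows and shows the special cases: a missing value gives NaN, an income of zero gives an infinite percentage, and a lower expected salary gives a negative percentage. The column names follow the survey; the values and the cleaning step are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, for illustration only
toy = pd.DataFrame({
    "Income": [40000, 50000, np.nan, 0, 30000],
    "ExpectedEarning": [60000, 45000, 70000, 50000, np.nan],
})

toy["Percent_Increase"] = (toy["ExpectedEarning"] - toy["Income"]) / toy["Income"] * 100

# Treat division by zero as missing, then keep only non-negative values
cleaned = toy["Percent_Increase"].replace([np.inf, -np.inf], np.nan)
valid = cleaned[cleaned >= 0]

print(toy)
print(valid)
```

The first row works out to (60000 - 40000) / 40000 × 100 = 50%, and it is the only row that survives the filter.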

In [40]:
# Column creation using formula above
survey["Percent_Increase"] = (survey["ExpectedEarning"] - survey["Income"]) / survey["Income"] * 100

# Frequency distribution
survey["Percent_Increase"].value_counts(bins = 20, normalize= True) * 100
Out[40]:
(-115.647, 734.302]       40.192664
(734.302, 1567.585]        0.570582
(1567.585, 2400.867]       0.044461
(15733.384, 16566.667]     0.014820
(4067.432, 4900.714]       0.007410
(5733.996, 6567.279]       0.007410
(2400.867, 3234.149]       0.007410
(14066.82, 14900.102]      0.007410
(3234.149, 4067.432]       0.000000
(4900.714, 5733.996]       0.000000
(6567.279, 7400.561]       0.000000
(7400.561, 8233.843]       0.000000
(9067.126, 9900.408]       0.000000
(9900.408, 10733.69]       0.000000
(10733.69, 11566.973]      0.000000
(11566.973, 12400.255]     0.000000
(12400.255, 13233.537]     0.000000
(13233.537, 14066.82]      0.000000
(14900.102, 15733.384]     0.000000
(8233.843, 9067.126]       0.000000
Name: Percent_Increase, dtype: float64
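
In the table above, a few extreme values stretch the equal-width bins so far that almost every respondent falls into the first one. One possible alternative, sketched here on made-up numbers with bin edges chosen purely for illustration, is to pass explicit edges to `pd.cut` and reserve an open-ended bin for the tail.

```python
import numpy as np
import pandas as pd

# Hypothetical percent increases with one extreme value
pct = pd.Series([10, 40, 75, 110, 130, 250, 16000], dtype=float)

# Equal-width bins: the outlier stretches the range, so most values share one bin
equal_width = pct.value_counts(bins=4, sort=False)

# Explicit edges: finer detail where the data is dense, one open-ended bin for the tail
edges = [0, 50, 100, 150, 500, np.inf]
custom = pd.cut(pct, bins=edges, include_lowest=True).value_counts().sort_index()

print(equal_width)
print(custom)
```

With the explicit edges the seven toy values spread across five bins instead of collapsing into one.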
In [41]:
fig, ax = plt.subplots(figsize = (13,9))

# Expected salary increase (in a percentage) histogram
survey[survey["Percent_Increase"] <= 500]["Percent_Increase"].plot.hist(bins = 15, color = "grey")
# The boolean mask above keeps only increases of 500% or less

# Shade the interquartile range (25th to 75th percentile)
plt.axvspan(survey["Percent_Increase"].quantile(0.25), survey["Percent_Increase"].quantile(0.75), color = "maroon", alpha = 0.4)

# Remove spines
plt.gca().spines[["right","top"]].set_visible(False)

# Title
plt.title("Desired salary increase (in percent)", loc="left", size = 20, y = 1.05)

# X and Y labels
plt.ylabel("Frequency", size = 15, color = "grey", loc = "top")
plt.xlabel("Percent Increase", size = 15, loc = "left", color = "grey")

# X and Y ticks
plt.xticks(size = 13)
plt.yticks(size = 13)

# Text
plt.text(126,1000,"Typical range of expected salary increase (in percent)", size = 14, color = "maroon")

plt.show()

Most respondents do not have financial dependents to care for, and fewer than half have debts to pay off.

In [42]:
# Replace 1.0/0.0 in the following columns with the strings "True"/"False"
survey["HasDebt"] = survey["HasDebt"].replace({1.0:"True", 0.0: "False"})
survey["HasFinancialDependents"] = survey["HasFinancialDependents"].replace({1.0:"True", 0.0: "False"})
In [43]:
# Financial dependents
print("Financial Dependents:","\n", survey["HasFinancialDependents"].value_counts(normalize = True, dropna=False) * 100) 
print("\n")

# Has debt of any kind
print("Has Debt:", "\n", survey["HasDebt"].value_counts(normalize = True, dropna=False) * 100) 
print("\n")
Financial Dependents: 
 False    71.619118
True     21.074472
NaN       7.306410
Name: HasFinancialDependents, dtype: float64


Has Debt: 
 False    50.596517
True     41.948870
NaN       7.454613
Name: HasDebt, dtype: float64


Employment status

EmploymentStatus: "Regarding employment status, are you currently..." Respondents were asked to select their current employment status; examples include not working, employed for wages, self-employed, and military. About half of respondents answered that they are actively working in some manner for their income. A smaller share (about 14%) did not answer, and the remaining participants are not working but looking for work, not working and not looking for work, or in another category such as stay-at-home parent, unpaid intern, or retired.

"Employed for wages" is the most common employment status, but this group, together with "Military", has the lowest median hours spent learning per week (10 hours). The groups "Not working but looking for work" and "Self-employed freelancer" have the highest median (20 hours). Typically, respondents spend about 12 hours per week (median), or roughly 1.7 hours per day, learning programming. We rely on the median rather than the weekly average because the data contain many outliers, ranging from about 30 up to 168 hours per week, that significantly skew the distribution.

In [44]:
# Fills in missing data from hours learning column
survey["HoursLearning"] = survey["HoursLearning"].fillna(survey["HoursLearning"].median())
In [45]:
# Hours spent learning distribution
fig, ax = plt.subplots(figsize = (12, 6))
sns.boxplot(x= "HoursLearning", data = survey, color = "grey", medianprops=dict(color="maroon", alpha=1))

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# Title
plt.title("Hours spent learning per week", loc="left", size = 20, y = 1.05)

# Misc. Text
plt.text(80,-0.04,"Outliers", size = 14)
plt.text(0, -0.42, "Median (12 hours/week)", size = 14, color = "maroon")
plt.text(100, -0.3, "75% of participants spent\n20 hours or less per week learning", size = 14, color = "grey")
plt.text(100, -0.15, "25% of participants spent\n7 hours or less per week learning", size = 14, color = "grey")

# X label
plt.xlabel("Hours", size = 15, loc = "left", color = "grey")

# X ticks
plt.xticks(size = 13, color = "grey")


plt.show()

# Print stats
print(survey["HoursLearning"].describe())
count    13495.000000
mean        16.955761
std         14.573179
min          0.000000
25%          7.000000
50%         12.000000
75%         20.000000
max        168.000000
Name: HoursLearning, dtype: float64
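
The summary statistics above show a mean of about 17 hours against a median of 12, which is the gap we would expect from a right-skewed distribution. As a minimal sketch of why the median is the safer summary, the example below uses made-up weekly hours (illustrative values, not survey data) to show how a few extreme responses pull the mean upward while the median stays put.

```python
from statistics import mean, median

# Hypothetical weekly learning hours (illustrative values, not survey data)
typical_hours = [5, 8, 10, 12, 12, 15, 20]

# The same sample with two extreme responses added
with_outliers = typical_hours + [120, 168]

print("Without outliers: mean =", round(mean(typical_hours), 1), "| median =", median(typical_hours))
print("With outliers:    mean =", round(mean(with_outliers), 1), "| median =", median(with_outliers))

# Converting a weekly median of 12 hours into a daily figure
hours_per_day = 12 / 7
print("Hours per day:", round(hours_per_day, 1))
```

In this toy sample the median is 12 in both cases, while the mean more than triples once the two extreme values are included.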
In [46]:
# Frequency table employment status
survey["EmploymentStatus"].value_counts(dropna=False)
Out[46]:
Employed for wages                      5564
Not working but looking for work        3395
NaN                                     1899
Not working and not looking for work     948
Self-employed freelancer                 644
Doing an unpaid internship               294
Unable to work                           235
A stay-at-home parent or homemaker       220
Self-employed business owner             210
Military                                  62
Retired                                   24
Name: EmploymentStatus, dtype: int64
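
As a quick check of the "about half" figure, the sketch below recomputes the shares from the counts in the frequency table above. Which statuses count as "working for income" is our assumption here (employed for wages, both self-employed groups, and military); the `working_statuses` list and the "No answer (NaN)" label are illustrative names, not part of the survey.

```python
# Counts copied from the EmploymentStatus frequency table above
status_counts = {
    "Employed for wages": 5564,
    "Not working but looking for work": 3395,
    "No answer (NaN)": 1899,
    "Not working and not looking for work": 948,
    "Self-employed freelancer": 644,
    "Doing an unpaid internship": 294,
    "Unable to work": 235,
    "A stay-at-home parent or homemaker": 220,
    "Self-employed business owner": 210,
    "Military": 62,
    "Retired": 24,
}

# Assumption: these four statuses are treated as "working for income"
working_statuses = [
    "Employed for wages",
    "Self-employed freelancer",
    "Self-employed business owner",
    "Military",
]

total = sum(status_counts.values())
working_share = sum(status_counts[s] for s in working_statuses) / total * 100
missing_share = status_counts["No answer (NaN)"] / total * 100

print(f"Working for income: {working_share:.1f}%")
print(f"Did not answer:     {missing_share:.1f}%")
```

Under that grouping, roughly 48% of respondents are working for income and about 14% did not answer.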
In [47]:
# Hours spent per week by employment status
survey.groupby("EmploymentStatus")["HoursLearning"].median().sort_values(ascending=False)
Out[47]:
EmploymentStatus
Not working but looking for work        20.0
Self-employed freelancer                20.0
Doing an unpaid internship              15.0
Self-employed business owner            15.0
Not working and not looking for work    14.0
A stay-at-home parent or homemaker      13.0
Retired                                 12.0
Unable to work                          12.0
Employed for wages                      10.0
Military                                10.0
Name: HoursLearning, dtype: float64

Career comparison by salary and experience (months programming)

MonthsProgramming: "About how many months have you been programming for?" ("programming experience")

There is some evidence to suggest that a person's career field has relatively little influence on their motivation to learn programming.

Farming/fishing/forestry and education (careers we would not typically associate with programming/software development) have the highest average number of months programming. After these two, the IT/software development field has the third-highest average amount of experience. Presumably, respondents in IT/software development were spending time outside of work learning, or had just been hired.

Farming/fishing/forestry and education are among the lowest-paid career fields in this survey and, despite their longer programming experience, respondents in these fields expected a lower income on average than respondents in other career fields. Instead, we see respondents in higher-paying careers with less "programming experience" expecting a higher income after switching to tech/software-related jobs.

A stronger argument may be that education level has more influence on a person's decision to begin learning a skill like programming in pursuit of more career opportunities.

In [48]:
# Salary and experience comparison for employment fields

# Assign groupby objects for plotting using EmploymentField
# numeric_only=True skips text columns (e.g. HasDebt) that cannot be averaged
empfld_means = survey.groupby("EmploymentField").mean(numeric_only=True)
empfld_months_prg = empfld_means.sort_values(by="MonthsProgramming") # sort by the average number of months programming
empfld_income = empfld_means.sort_values(by="Income") # sort by the average income
empfld_expected_salary = empfld_means.sort_values(by="ExpectedEarning") # sort by the average expected earning

#-------------------------------------------------------------------------------------------------------------------------------------#
# Color assignment
colors = ["grey", "grey", "grey", "grey", "grey","grey", "grey", "grey", "grey", "grey","grey", "grey", "maroon", "maroon", "maroon",]

# Plot results experience
fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(empfld_months_prg.index, empfld_months_prg["MonthsProgramming"], color = colors, height = 0.6)

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# Y label
plt.ylabel("Career Field", loc = "top", size = 14, color = "grey")

# Text
plt.text(-8,16.3,"Average number of months", size = 14, color = "grey")

# Title
plt.title("New programmer experience by career field", size = 20, loc = "left", x = -0.65, y = 1.12)

# X axis to top
ax.xaxis.tick_top()

# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14)

# Highlight top 3 career fields most experience
plt.gca().get_yticklabels()[-1].set_color("maroon")
plt.gca().get_yticklabels()[-2].set_color("maroon")
plt.gca().get_yticklabels()[-3].set_color("maroon")

plt.show()

#---------------------------------------------------------------------------------------------------------------------------------------#

# Salary

# Color assignment
colors = ["maroon", "grey", "grey", "grey", "grey","maroon", "grey", "maroon", "grey", "grey","grey", "grey", "grey", "grey", "grey",]

# Plot results income
fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(empfld_income.index, empfld_income["Income"], color = colors, height = 0.6)

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# Y label
plt.ylabel("Career Field", loc = "top", size = 14, color = "grey")

# Text
plt.text(-25000,16.4,"Average Salary (US dollars)", size = 14, color = "grey")
plt.text(30000,-0.5, "Career fields shaded in red\nhave the highest average number of months\nspent learning programming", color = "grey")

# Title
plt.title("Salary by career field", size = 20, loc = "left", x = -0.65, y = 1.12)

# X axis to top
ax.xaxis.tick_top()

# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14)

# Highlight top 3 career fields most experience
plt.gca().get_yticklabels()[0].set_color("maroon")
plt.gca().get_yticklabels()[5].set_color("maroon")
plt.gca().get_yticklabels()[-8].set_color("maroon")

plt.show()

#--------------------------------------------------------------------------------------------------------------------------------------------#

# Color assignment
colors_salary = ["maroon", "grey", "grey", "grey", "grey","grey", "grey", "maroon", "grey", "grey","maroon", "grey", "grey", "grey", "grey"]

# Plot results expected earning
fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(empfld_expected_salary.index, empfld_expected_salary["ExpectedEarning"].sort_values(), color = colors_salary, height = 0.6)

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# Y label
plt.ylabel("Career Field", loc = "top", size = 14, color = "grey")

# Text
plt.text(-20000, 16.3,"Average (US dollars)", size = 14, color = "grey")

# Title
plt.title("Expected annual salary by career field", size = 20, loc = "left", x = -0.65, y = 1.12)

# X axis to top
ax.xaxis.tick_top()

# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14)

# Highlight top 3 career fields most experience
plt.gca().get_yticklabels()[0].set_color("maroon")
plt.gca().get_yticklabels()[-5].set_color("maroon")
plt.gca().get_yticklabels()[7].set_color("maroon")

plt.show()

Education

Earlier we noticed that a person's career field may have only a weak influence on their motivation to start learning programming. Instead, a person's education level may be a more significant factor. The data suggests that individuals with less than a bachelor's degree have some of the greatest amounts of "programming experience".

These three education levels are:

  • "Some high school"
  • "Associate's degree"
  • "Some college credit, no degree"

They have the highest median and average expected salary increase (in percent), and in terms of yearly income, respondents in these groups are some of the lowest earning. However, in terms of average expected earnings (in US dollars), holders of Ph.D.s, professional degrees, and bachelor's degrees generally expect a higher salary than these groups. The exception is associate's degree holders, whose average expected salary is second only to that of Ph.D. holders.
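
The contrast between the percent and dollar figures follows from simple arithmetic: a lower current income makes a given expected salary look like a larger percentage raise. The sketch below uses hypothetical incomes (not survey data) and assumes the percent increase is computed as (expected - current) / current * 100; the `percent_increase` helper is illustrative and not taken from the notebook.

```python
def percent_increase(current_income, expected_income):
    """Percent change from current to expected income (assumed definition)."""
    return (expected_income - current_income) / current_income * 100

# Hypothetical respondents (illustrative values, not survey data)
lower_earner = percent_increase(20_000, 50_000)   # low base, lower expected salary
higher_earner = percent_increase(45_000, 60_000)  # high base, higher expected salary

print(f"Lower earner:  {lower_earner:.1f}% increase")
print(f"Higher earner: {higher_earner:.1f}% increase")
```

Here the lower earner expects a smaller salary in dollars but a far larger increase in percent, which is the same pattern the education groups show.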

Neither career type nor education level is a perfect indicator of whether someone is motivated to learn new programming/tech skills. We think it is reasonable to argue that the data suggests survey participants are generally interested in programming for the career and income opportunities.

In [49]:
# Average expected earning by school degree
round(survey.groupby("SchoolDegree")["ExpectedEarning"].mean().sort_values(ascending=False), 2)
Out[49]:
SchoolDegree
Ph.D.                                       61165.49
associate's degree                          60870.16
professional degree (MBA, MD, JD, etc.)     56383.49
bachelor's degree                           55103.32
no high school (secondary school)           54756.51
some high school                            53954.36
some college credit, no degree              53603.29
master's degree (non-professional)          52670.46
trade, technical, or vocational training    49398.36
high school diploma or equivalent (GED)     48131.91
Name: ExpectedEarning, dtype: float64
In [50]:
# Median expected salary increase (percent)
salary_increase_median = round(survey.groupby("SchoolDegree")["Percent_Increase"].median().sort_values(ascending=False), 2)
# reset_index(name=...) turns the Series into a DataFrame and names the value column
salary_increase_median = salary_increase_median.reset_index(name="Percentage")

# Average expected salary increase (percent)
salary_increase = round(survey.groupby("SchoolDegree")["Percent_Increase"].mean().sort_values(ascending=False), 2)
salary_increase = salary_increase.reset_index(name="Percentage")
#------------------------------------------------------------------------------------------------------------------------#

# Color assignment
colors = ["#145DDE", "#145DDE","#145DDE", "grey", "grey", "grey","grey", "grey", "grey", "grey"]

# Plot (1) results average and median salary increase (percent)
fig, ax = plt.subplots(figsize = (8, 6))

# Title
plt.title("Expected salary raise by education level", size = 19, loc = "left", x= -0.65, y = 1.16)

# Y label
plt.ylabel("School Degree", loc = "top", size = 14, color = "grey")

# Misc. text
plt.text(-4,-2,"Median", color = "#4B86C1", size = 14)
plt.text(35,-2,"Average", color = "grey", size = 14)
plt.text(80,-2,"(Percent)", size = 14)
plt.text(160, 2.5,"Education levels below bachelor's\ndegree have the highest average\nand median expected salary increase", color = "grey")

# Average plotted
plt.barh(salary_increase["SchoolDegree"], salary_increase["Percentage"], color = colors, height = 0.62)
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
ax.xaxis.tick_top()

# Median plotted
plt.barh(salary_increase_median["SchoolDegree"], salary_increase_median["Percentage"], color = "#4B86C1", height = 0.62)
plt.yticks(size = 14, color = "grey")
plt.xticks(size = 13, color = "grey")
plt.gca().invert_yaxis()

# Top 3 education levels by expected salary raise highlighted
plt.gca().get_yticklabels()[0].set_color("#145DDE")
plt.gca().get_yticklabels()[1].set_color("#145DDE")
plt.gca().get_yticklabels()[2].set_color("#145DDE")

plt.show()

#-----------------------------------------------------------------------------------------------------------------------------------------#

# Assign groupby objects for plotting using SchoolDegree 
# numeric_only=True skips text columns (e.g. HasDebt) that cannot be averaged
schl_dgree = survey.groupby("SchoolDegree").mean(numeric_only=True).sort_values(by = "MonthsProgramming") # sort by the average number of months programming
degree_income = survey.groupby("SchoolDegree").mean(numeric_only=True).sort_values(by = "Income") # sort by the average income

# Plot results income by school education level
colors_degree_income = ["#145DDE", "grey","#145DDE", "grey", "grey", "#145DDE","grey", "grey", "grey", "grey"]

# Plot (2) school degree income
fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(degree_income.index, degree_income["Income"], color = colors_degree_income, height = 0.62)

# Title
plt.title("Salary by education", size = 20, loc = "left", x = -0.7, y = 1.12)

# Text
plt.text(-23000,10.7,"Average Salary (US dollars)", size = 14, color = "grey")

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# Y label
plt.ylabel("School Degree", loc = "top", size = 14, color = "grey")

# X axis to top
ax.xaxis.tick_top()

# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14)

# Top 3 education levels by expected salary raise highlighted
plt.gca().get_yticklabels()[0].set_color("#145DDE")
plt.gca().get_yticklabels()[2].set_color("#145DDE")
plt.gca().get_yticklabels()[5].set_color("#145DDE")

plt.show()

#-----------------------------------------------------------------------------------------------------------------------------------------#

# Plot results number of months programming by education level
colors_schl_dgree = ["grey", "grey","grey", "grey", "grey", "#145DDE","#145DDE", "grey", "#145DDE", "grey"]

# Plot (3) school degree number of months programming
fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(schl_dgree.index, schl_dgree["MonthsProgramming"], color = colors_schl_dgree, height = 0.62)

# Title
plt.title("New programmer experience by education", size = 20, loc = "left", x = -0.7, y = 1.12)

# Text
plt.text(-11,10.7,"Average number of months", size = 14, color = "grey")

# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)

# Y label
plt.ylabel("School Degree", loc = "top", size = 14, color = "grey")

# X axis to top
ax.xaxis.tick_top()

# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14)

# Top 3 education levels by expected salary raise highlighted
plt.gca().get_yticklabels()[-2].set_color("#145DDE")
plt.gca().get_yticklabels()[-4].set_color("#145DDE")
plt.gca().get_yticklabels()[-5].set_color("#145DDE")

plt.show()