We will look at the programming languages most used in data-related jobs, based on the Stack Overflow Developer Survey 2021 results.
import pandas as pd
import numpy as np
result_df = pd.read_csv("C:/Users/Marselo/Downloads/Stackoverflow survey results/2021 survey_results_public.csv")
result_df.head(5)
| | ResponseId | MainBranch | Employment | Country | US_State | UK_Country | EdLevel | Age1stCode | LearnCode | YearsCode | ... | Age | Gender | Trans | Sexuality | Ethnicity | Accessibility | MentalHealth | SurveyLength | SurveyEase | ConvertedCompYearly |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | I am a developer by profession | Independent contractor, freelancer, or self-em... | Slovakia | NaN | NaN | Secondary school (e.g. American high school, G... | 18 - 24 years | Coding Bootcamp;Other online resources (ex: vi... | NaN | ... | 25-34 years old | Man | No | Straight / Heterosexual | White or of European descent | None of the above | None of the above | Appropriate in length | Easy | 62268.0 |
1 | 2 | I am a student who is learning to code | Student, full-time | Netherlands | NaN | NaN | Bachelor’s degree (B.A., B.S., B.Eng., etc.) | 11 - 17 years | Other online resources (ex: videos, blogs, etc... | 7 | ... | 18-24 years old | Man | No | Straight / Heterosexual | White or of European descent | None of the above | None of the above | Appropriate in length | Easy | NaN |
2 | 3 | I am not primarily a developer, but I write co... | Student, full-time | Russian Federation | NaN | NaN | Bachelor’s degree (B.A., B.S., B.Eng., etc.) | 11 - 17 years | Other online resources (ex: videos, blogs, etc... | NaN | ... | 18-24 years old | Man | No | Prefer not to say | Prefer not to say | None of the above | None of the above | Appropriate in length | Easy | NaN |
3 | 4 | I am a developer by profession | Employed full-time | Austria | NaN | NaN | Master’s degree (M.A., M.S., M.Eng., MBA, etc.) | 11 - 17 years | NaN | NaN | ... | 35-44 years old | Man | No | Straight / Heterosexual | White or of European descent | I am deaf / hard of hearing | NaN | Appropriate in length | Neither easy nor difficult | NaN |
4 | 5 | I am a developer by profession | Independent contractor, freelancer, or self-em... | United Kingdom of Great Britain and Northern I... | NaN | England | Master’s degree (M.A., M.S., M.Eng., MBA, etc.) | 5 - 10 years | Friend or family member | 17 | ... | 25-34 years old | Man | No | NaN | White or of European descent | None of the above | NaN | Appropriate in length | Easy | NaN |
5 rows × 48 columns
result_df.info()
<class 'pandas.core.frame.DataFrame'>

RangeIndex: 83439 entries, 0 to 83438

Data columns (total 48 columns):

| # | Column | Non-Null Count | Dtype |
---|---|---|---|
0 | ResponseId | 83439 non-null | int64 |
1 | MainBranch | 83439 non-null | object |
2 | Employment | 83323 non-null | object |
3 | Country | 83439 non-null | object |
4 | US_State | 14920 non-null | object |
5 | UK_Country | 4418 non-null | object |
6 | EdLevel | 83126 non-null | object |
7 | Age1stCode | 83243 non-null | object |
8 | LearnCode | 82963 non-null | object |
9 | YearsCode | 81641 non-null | object |
10 | YearsCodePro | 61216 non-null | object |
11 | DevType | 66484 non-null | object |
12 | OrgSize | 60726 non-null | object |
13 | Currency | 61080 non-null | object |
14 | CompTotal | 47183 non-null | float64 |
15 | CompFreq | 52150 non-null | object |
16 | LanguageHaveWorkedWith | 82357 non-null | object |
17 | LanguageWantToWorkWith | 76821 non-null | object |
18 | DatabaseHaveWorkedWith | 69546 non-null | object |
19 | DatabaseWantToWorkWith | 58299 non-null | object |
20 | PlatformHaveWorkedWith | 52135 non-null | object |
21 | PlatformWantToWorkWith | 41619 non-null | object |
22 | WebframeHaveWorkedWith | 61707 non-null | object |
23 | WebframeWantToWorkWith | 52095 non-null | object |
24 | MiscTechHaveWorkedWith | 47055 non-null | object |
25 | MiscTechWantToWorkWith | 38021 non-null | object |
26 | ToolsTechHaveWorkedWith | 72537 non-null | object |
27 | ToolsTechWantToWorkWith | 65480 non-null | object |
28 | NEWCollabToolsHaveWorkedWith | 81234 non-null | object |
29 | NEWCollabToolsWantToWorkWith | 73022 non-null | object |
30 | OpSys | 83294 non-null | object |
31 | NEWStuck | 83052 non-null | object |
32 | NEWSOSites | 83171 non-null | object |
33 | SOVisitFreq | 82413 non-null | object |
34 | SOAccount | 82525 non-null | object |
35 | SOPartFreq | 67553 non-null | object |
36 | SOComm | 82319 non-null | object |
37 | NEWOtherComms | 82828 non-null | object |
38 | Age | 82407 non-null | object |
39 | Gender | 82286 non-null | object |
40 | Trans | 80678 non-null | object |
41 | Sexuality | 73366 non-null | object |
42 | Ethnicity | 79464 non-null | object |
43 | Accessibility | 77603 non-null | object |
44 | MentalHealth | 76920 non-null | object |
45 | SurveyLength | 81711 non-null | object |
46 | SurveyEase | 81948 non-null | object |
47 | ConvertedCompYearly | 46844 non-null | float64 |

dtypes: float64(2), int64(1), object(45)

memory usage: 30.6+ MB
result_df["MainBranch"].value_counts()
MainBranch | Count |
---|---|
I am a developer by profession | 58153 |
I am a student who is learning to code | 12029 |
I am not primarily a developer, but I write code sometimes as part of my work | 6578 |
I code primarily as a hobby | 4929 |
I used to be a developer by profession, but no longer am | 1237 |
None of these | 513 |

Name: MainBranch, dtype: int64
We are primarily interested in those who code as part of their work, so we will keep only these two MainBranch values: "I am a developer by profession" and "I am not primarily a developer, but I write code sometimes as part of my work".
result2_df: pd.DataFrame = result_df.loc[(result_df["MainBranch"]=="I am a developer by profession") | (result_df["MainBranch"]=="I am not primarily a developer, but I write code sometimes as part of my work")]
result2_df.head(5)
| | ResponseId | MainBranch | Employment | Country | US_State | UK_Country | EdLevel | Age1stCode | LearnCode | YearsCode | ... | Age | Gender | Trans | Sexuality | Ethnicity | Accessibility | MentalHealth | SurveyLength | SurveyEase | ConvertedCompYearly |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | I am a developer by profession | Independent contractor, freelancer, or self-em... | Slovakia | NaN | NaN | Secondary school (e.g. American high school, G... | 18 - 24 years | Coding Bootcamp;Other online resources (ex: vi... | NaN | ... | 25-34 years old | Man | No | Straight / Heterosexual | White or of European descent | None of the above | None of the above | Appropriate in length | Easy | 62268.0 |
2 | 3 | I am not primarily a developer, but I write co... | Student, full-time | Russian Federation | NaN | NaN | Bachelor’s degree (B.A., B.S., B.Eng., etc.) | 11 - 17 years | Other online resources (ex: videos, blogs, etc... | NaN | ... | 18-24 years old | Man | No | Prefer not to say | Prefer not to say | None of the above | None of the above | Appropriate in length | Easy | NaN |
3 | 4 | I am a developer by profession | Employed full-time | Austria | NaN | NaN | Master’s degree (M.A., M.S., M.Eng., MBA, etc.) | 11 - 17 years | NaN | NaN | ... | 35-44 years old | Man | No | Straight / Heterosexual | White or of European descent | I am deaf / hard of hearing | NaN | Appropriate in length | Neither easy nor difficult | NaN |
4 | 5 | I am a developer by profession | Independent contractor, freelancer, or self-em... | United Kingdom of Great Britain and Northern I... | NaN | England | Master’s degree (M.A., M.S., M.Eng., MBA, etc.) | 5 - 10 years | Friend or family member | 17 | ... | 25-34 years old | Man | No | NaN | White or of European descent | None of the above | NaN | Appropriate in length | Easy | NaN |
8 | 9 | I am a developer by profession | Employed part-time | India | NaN | NaN | Bachelor’s degree (B.A., B.S., B.Eng., etc.) | 18 - 24 years | Coding Bootcamp | 6 | ... | 25-34 years old | Man | No | NaN | South Asian | NaN | I have a concentration and/or memory disorder ... | Appropriate in length | Easy | NaN |
5 rows × 48 columns
Drop the rows where the DevType column is null.
print(result2_df.shape)
result2_df = result2_df.dropna(subset="DevType")
print(result2_df.shape)
(64731, 48) (61602, 48)
result2_df["DevType"].value_counts()
Developer, full-stack 8415 Developer, back-end 5378 Developer, front-end 2304 Developer, front-end;Developer, full-stack;Developer, back-end 2108 Developer, full-stack;Developer, back-end 1759 ... Developer, desktop or enterprise applications;Developer, full-stack;Developer, back-end;Database administrator;Developer, QA or test;DevOps specialist;Engineer, site reliability;System administrator;Educator 1 Developer, mobile;Developer, front-end;Developer, desktop or enterprise applications;Developer, full-stack;Other (please specify):;Developer, back-end;Academic researcher;Database administrator;Developer, game or graphics;Developer, embedded applications or devices;DevOps specialist;Designer;System administrator 1 Developer, mobile;Developer, front-end;Developer, desktop or enterprise applications;Developer, full-stack;Developer, back-end;Database administrator;Developer, embedded applications or devices;Designer;System administrator;Educator 1 Developer, front-end;Developer, full-stack;Developer, back-end;Academic researcher;Database administrator;DevOps specialist 1 Developer, mobile;Developer, desktop or enterprise applications;Data scientist or machine learning specialist;Developer, back-end;Engineering manager 1 Name: DevType, Length: 8159, dtype: int64
We can see that a respondent may choose more than one option for the DevType column (separated by ;). Next, we check which options are available by looking at the rows that contain a single choice.
result2_df.loc[~result2_df["DevType"].str.contains(";"), "DevType"].unique()
array(['Developer, mobile', 'Developer, front-end', 'Data scientist or machine learning specialist', 'Developer, back-end', 'Developer, full-stack', 'Developer, game or graphics', 'Developer, embedded applications or devices', 'Developer, desktop or enterprise applications', 'Data or business analyst', 'Engineer, data', 'Academic researcher', 'Other (please specify):', 'Engineering manager', 'DevOps specialist', 'Senior Executive (C-Suite, VP, etc.)', 'Product manager', 'Developer, QA or test', 'Designer', 'Scientist', 'System administrator', 'Database administrator', 'Student', 'Engineer, site reliability', 'Educator', 'Marketing or sales professional'], dtype=object)
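Filtering on single-choice rows only reveals options that at least one respondent picked on their own. As a minimal sketch of an alternative, assuming a small hypothetical `toy` frame in place of `result2_df`, each answer can be split on `;` and the distinct tokens collected:

```python
import pandas as pd

# Hypothetical stand-in for result2_df; only the DevType column matters here.
toy = pd.DataFrame({
    "DevType": [
        "Developer, back-end",
        "Engineer, data;Developer, back-end",
        "Data or business analyst;Database administrator",
    ]
})

# Split every multi-select answer on ";" and flatten to one option per row.
options = toy["DevType"].str.split(";").explode()

# Distinct options, including those never chosen on their own.
unique_options = sorted(options.unique())
print(unique_options)
```

In this toy data, "Engineer, data" is found even though nobody selected it alone.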
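These are the data-related jobs among the DevType options: "Data scientist or machine learning specialist", "Data or business analyst", "Engineer, data", and "Database administrator". All of them contain the word "data" (ignoring case), so a case-insensitive substring match selects every row that mentions at least one of them.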
data_job_df = result2_df.loc[result2_df["DevType"].str.contains("data", case=False)]
data_job_df.info()
<class 'pandas.core.frame.DataFrame'>

Int64Index: 12316 entries, 4 to 83436

Data columns (total 48 columns):

| # | Column | Non-Null Count | Dtype |
---|---|---|---|
0 | ResponseId | 12316 non-null | int64 |
1 | MainBranch | 12316 non-null | object |
2 | Employment | 12316 non-null | object |
3 | Country | 12316 non-null | object |
4 | US_State | 2614 non-null | object |
5 | UK_Country | 758 non-null | object |
6 | EdLevel | 12304 non-null | object |
7 | Age1stCode | 12308 non-null | object |
8 | LearnCode | 12281 non-null | object |
9 | YearsCode | 12179 non-null | object |
10 | YearsCodePro | 11976 non-null | object |
11 | DevType | 12316 non-null | object |
12 | OrgSize | 12105 non-null | object |
13 | Currency | 12143 non-null | object |
14 | CompTotal | 9346 non-null | float64 |
15 | CompFreq | 10334 non-null | object |
16 | LanguageHaveWorkedWith | 12257 non-null | object |
17 | LanguageWantToWorkWith | 11452 non-null | object |
18 | DatabaseHaveWorkedWith | 11231 non-null | object |
19 | DatabaseWantToWorkWith | 9688 non-null | object |
20 | PlatformHaveWorkedWith | 8771 non-null | object |
21 | PlatformWantToWorkWith | 7194 non-null | object |
22 | WebframeHaveWorkedWith | 9167 non-null | object |
23 | WebframeWantToWorkWith | 7693 non-null | object |
24 | MiscTechHaveWorkedWith | 9026 non-null | object |
25 | MiscTechWantToWorkWith | 7749 non-null | object |
26 | ToolsTechHaveWorkedWith | 10871 non-null | object |
27 | ToolsTechWantToWorkWith | 9898 non-null | object |
28 | NEWCollabToolsHaveWorkedWith | 12052 non-null | object |
29 | NEWCollabToolsWantToWorkWith | 10930 non-null | object |
30 | OpSys | 12305 non-null | object |
31 | NEWStuck | 12287 non-null | object |
32 | NEWSOSites | 12299 non-null | object |
33 | SOVisitFreq | 12249 non-null | object |
34 | SOAccount | 12261 non-null | object |
35 | SOPartFreq | 10294 non-null | object |
36 | SOComm | 12244 non-null | object |
37 | NEWOtherComms | 12249 non-null | object |
38 | Age | 12212 non-null | object |
39 | Gender | 12189 non-null | object |
40 | Trans | 11984 non-null | object |
41 | Sexuality | 11025 non-null | object |
42 | Ethnicity | 11812 non-null | object |
43 | Accessibility | 11514 non-null | object |
44 | MentalHealth | 11475 non-null | object |
45 | SurveyLength | 12093 non-null | object |
46 | SurveyEase | 12125 non-null | object |
47 | ConvertedCompYearly | 9272 non-null | float64 |

dtypes: float64(2), int64(1), object(45)

memory usage: 4.6+ MB
Drop the rows if the LanguageHaveWorkedWith column is not available.
print(data_job_df.shape)
data_job_df = data_job_df.dropna(subset="LanguageHaveWorkedWith")
print(data_job_df.shape)
(12316, 48) (12257, 48)
data_job_df["DevType"].value_counts()
Data scientist or machine learning specialist 632 Engineer, data 358 Engineer, data;Developer, back-end 316 Data or business analyst 269 Data scientist or machine learning specialist;Data or business analyst 197 ... Developer, mobile;Developer, full-stack;Database administrator;Data or business analyst;Senior Executive (C-Suite, VP, etc.) 1 Developer, desktop or enterprise applications;Data scientist or machine learning specialist;Database administrator;Designer 1 Developer, front-end;Developer, full-stack;Developer, back-end;Academic researcher;Database administrator;DevOps specialist 1 Developer, mobile;Developer, front-end;Developer, desktop or enterprise applications;Developer, full-stack;Developer, back-end;Database administrator;Developer, embedded applications or devices;Designer;System administrator;Educator 1 Developer, mobile;Developer, desktop or enterprise applications;Data scientist or machine learning specialist;Developer, back-end;Engineering manager 1 Name: DevType, Length: 4830, dtype: int64
Since the rows with more than one DevType might skew the language count results, we are going to include only the options below: "Data scientist or machine learning specialist", "Data or business analyst", "Engineer, data", "Database administrator"
data_job_df = data_job_df[(data_job_df["DevType"] == "Data scientist or machine learning specialist") | (data_job_df["DevType"] == "Data or business analyst") | (data_job_df["DevType"] == "Engineer, data") | (data_job_df["DevType"] == "Database administrator")]
data_job_df["DevType"].value_counts()
DevType | Count |
---|---|
Data scientist or machine learning specialist | 632 |
Engineer, data | 358 |
Data or business analyst | 269 |
Database administrator | 44 |

Name: DevType, dtype: int64
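The chained `==` comparisons used for this filter can also be written with `Series.isin`. The sketch below assumes a hypothetical `toy` frame and a hypothetical `DATA_ROLES` list; it keeps only rows whose DevType is exactly one of the four data roles:

```python
import pandas as pd

# The four single-role values kept in the analysis.
DATA_ROLES = [
    "Data scientist or machine learning specialist",
    "Data or business analyst",
    "Engineer, data",
    "Database administrator",
]

# Hypothetical stand-in for data_job_df.
toy = pd.DataFrame({
    "DevType": [
        "Engineer, data",
        "Engineer, data;Developer, back-end",  # multi-role, excluded
        "Database administrator",
        "Developer, front-end",                # not a data role, excluded
    ]
})

# isin performs exact matching against the list, so multi-role rows drop out.
single_role = toy[toy["DevType"].isin(DATA_ROLES)]
print(single_role["DevType"].value_counts())
```

Exact matching is what excludes the multi-role rows; a substring match would keep them.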
Now we want to see the programming languages that are used by data related professions.
language_df: pd.DataFrame = data_job_df.loc[:,["DevType","LanguageHaveWorkedWith"]]
language_df
| | DevType | LanguageHaveWorkedWith |
---|---|---|
9 | Data scientist or machine learning specialist | C++;Python |
61 | Data or business analyst | Python;R;VBA |
77 | Data scientist or machine learning specialist | HTML/CSS;Python;R |
137 | Data or business analyst | HTML/CSS;JavaScript;R;SQL |
160 | Engineer, data | Assembly;Python |
... | ... | ... |
83112 | Data scientist or machine learning specialist | Bash/Shell;Perl;Python;R |
83120 | Data scientist or machine learning specialist | C;C++;Java;Kotlin |
83203 | Engineer, data | Bash/Shell;C#;Ruby;Rust |
83216 | Data or business analyst | HTML/CSS;JavaScript;SQL;VBA |
83335 | Engineer, data | Bash/Shell;Python;R |
1303 rows × 2 columns
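Similar to DevType, a respondent may choose more than one option for the LanguageHaveWorkedWith column (separated by ;). Now we check the options available for the LanguageHaveWorkedWith column. Because only single-language rows are inspected, languages that appear only in combination with others (for example JavaScript or Bash/Shell) will not be listed.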
languages = language_df.loc[~language_df["LanguageHaveWorkedWith"].str.contains(";"), "LanguageHaveWorkedWith"].unique()
for language in languages:
    print(language)
Matlab Python SQL VBA C# R C Scala Delphi Perl PHP Java C++ Clojure Julia APL Swift HTML/CSS
Now we will make a dataframe for each DevType ("Data scientist or machine learning specialist", "Data or business analyst", "Engineer, data", and "Database administrator").
ds_df = language_df[language_df["DevType"].str.contains("Data scientist or machine learning specialist")]
ds_df
| | DevType | LanguageHaveWorkedWith |
---|---|---|
9 | Data scientist or machine learning specialist | C++;Python |
77 | Data scientist or machine learning specialist | HTML/CSS;Python;R |
161 | Data scientist or machine learning specialist | Matlab |
209 | Data scientist or machine learning specialist | C#;HTML/CSS;Matlab;Python;R;SQL |
224 | Data scientist or machine learning specialist | Python |
... | ... | ... |
82900 | Data scientist or machine learning specialist | HTML/CSS;Python |
82906 | Data scientist or machine learning specialist | HTML/CSS;Python;R;SQL |
82921 | Data scientist or machine learning specialist | Python;R |
83112 | Data scientist or machine learning specialist | Bash/Shell;Perl;Python;R |
83120 | Data scientist or machine learning specialist | C;C++;Java;Kotlin |
632 rows × 2 columns
da_df = language_df[language_df["DevType"].str.contains("Data or business analyst")]
de_df = language_df[language_df["DevType"].str.contains("Engineer, data")]
dba_df = language_df[language_df["DevType"].str.contains("Database administrator")]
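First we create a new dataframe to store the languages and their counts for each DevType. We record the number of respondents for each DevType and count the occurrences of each language using a for loop. C and C++ are handled outside the for loop because str.count treats its argument as a regex pattern: the unescaped + in C++ raises an error, and a bare C would also match C++ and C#. Note that str.count matches substrings, so the count for Java also includes JavaScript and the count for R also includes Ruby and Rust; those two counts should be read as upper bounds.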
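lang_count_df = pd.DataFrame()
# Boolean indexing drops "C" and "C++", which get dedicated regex patterns below.
languages = languages[(languages != "C") & (languages != "C++")]
def add_lang_count(df: pd.DataFrame, job_type: str):
    # Number of respondents for this job type.
    lang_count_df.loc["total", job_type] = df.shape[0]
    for language in languages:
        # str.count treats the name as a regex and counts substring matches.
        lang_count_df.loc[language, job_type] = df["LanguageHaveWorkedWith"].str.count(language).sum()
    # "C" followed by anything other than a word character, "+" or "#".
    lang_count_df.loc["C", job_type] = df["LanguageHaveWorkedWith"].str.count(r"C[^\w+#]").sum()
    # Escape the plus signs so they are matched literally.
    lang_count_df.loc["C++", job_type] = df["LanguageHaveWorkedWith"].str.count(r"C\+\+").sum()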
add_lang_count(ds_df,"Data Scientist")
add_lang_count(da_df, "Data Analyst")
add_lang_count(de_df, "Data Engineer")
add_lang_count(dba_df, "Database Administrator")
lang_count_df
| | Data Scientist | Data Analyst | Data Engineer | Database Administrator |
---|---|---|---|---|
total | 632.0 | 269.0 | 358.0 | 44.0 |
Matlab | 55.0 | 13.0 | 29.0 | 1.0 |
Python | 595.0 | 179.0 | 294.0 | 18.0 |
SQL | 315.0 | 168.0 | 234.0 | 36.0 |
VBA | 25.0 | 77.0 | 24.0 | 7.0 |
C# | 42.0 | 34.0 | 46.0 | 11.0 |
R | 219.0 | 81.0 | 58.0 | 9.0 |
Scala | 40.0 | 2.0 | 89.0 | 1.0 |
Delphi | 0.0 | 3.0 | 2.0 | 1.0 |
Perl | 10.0 | 5.0 | 8.0 | 2.0 |
PHP | 14.0 | 19.0 | 23.0 | 5.0 |
Java | 218.0 | 115.0 | 230.0 | 15.0 |
Clojure | 9.0 | 6.0 | 5.0 | 1.0 |
Julia | 33.0 | 5.0 | 9.0 | 1.0 |
APL | 0.0 | 5.0 | 3.0 | 3.0 |
Swift | 8.0 | 4.0 | 4.0 | 4.0 |
HTML/CSS | 113.0 | 78.0 | 86.0 | 11.0 |
C | 89.0 | 20.0 | 49.0 | 8.0 |
C++ | 136.0 | 23.0 | 58.0 | 5.0 |
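Because `str.count` matches substrings, some counts in a table like this can include look-alike names. As a hedged sketch of one alternative (not the method used above), `Series.str.get_dummies(sep=";")` splits each answer into exact tokens before counting. The `toy` frame below is hypothetical and only illustrates the difference:

```python
import pandas as pd

# Hypothetical answers; not taken from the survey file.
toy = pd.DataFrame({
    "LanguageHaveWorkedWith": [
        "C++;Python",
        "JavaScript;Python;R",
        "C;Java;Rust",
        "Objective-C;Ruby",
    ]
})
col = toy["LanguageHaveWorkedWith"]

# One 0/1 indicator column per exact language token, summed per language.
exact_counts = col.str.get_dummies(sep=";").sum()

# Substring-based counts for comparison.
substring_java = col.str.count("Java").sum()   # also hits "JavaScript"
substring_r = col.str.count("R").sum()         # also hits "Rust" and "Ruby"
substring_c = col.str.count(r"C[^\w+#]").sum() # also hits "Objective-C;"

print(exact_counts[["C", "Java", "R"]])
print(substring_c, substring_java, substring_r)
```

With exact tokens there is no need for special regex handling of C and C++, since no pattern matching is involved.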
Import visualization libraries
import plotly.graph_objs as go
import plotly.express as px
import plotly.offline as pyo
pyo.init_notebook_mode()
We will add a Total column that sums the counts across all data-related professions, so we can rank the most used programming languages overall. Note that the data scientist sample is larger than the other groups, while the database administrator sample is much smaller (44 respondents), so the total is weighted toward data scientists.
total_ser = lang_count_df.sum(axis = 1)
lang_count_df["Total"] = total_ser
Normalize the counts to the share of respondents in each group (a fraction between 0 and 1), and add a color code to each language.
normalize_df = lang_count_df.iloc[1:]/lang_count_df.loc["total"]
normalize_df["Color"] = "NA"
for n in range(normalize_df.shape[0]):
    normalize_df.iloc[n, -1] = px.colors.qualitative.Light24[n]
normalize_df
| | Data Scientist | Data Analyst | Data Engineer | Database Administrator | Total | Color |
---|---|---|---|---|---|---|
Matlab | 0.087025 | 0.048327 | 0.081006 | 0.022727 | 0.075211 | #FD3216 |
Python | 0.941456 | 0.665428 | 0.821229 | 0.409091 | 0.833461 | #00FE35 |
SQL | 0.498418 | 0.624535 | 0.653631 | 0.818182 | 0.577897 | #6A76FC |
VBA | 0.039557 | 0.286245 | 0.067039 | 0.159091 | 0.102072 | #FED4C4 |
C# | 0.066456 | 0.126394 | 0.128492 | 0.250000 | 0.102072 | #FE00CE |
R | 0.346519 | 0.301115 | 0.162011 | 0.204545 | 0.281658 | #0DF9FF |
Scala | 0.063291 | 0.007435 | 0.248603 | 0.022727 | 0.101305 | #F6F926 |
Delphi | 0.000000 | 0.011152 | 0.005587 | 0.022727 | 0.004605 | #FF9616 |
Perl | 0.015823 | 0.018587 | 0.022346 | 0.045455 | 0.019186 | #479B55 |
PHP | 0.022152 | 0.070632 | 0.064246 | 0.113636 | 0.046815 | #EEA6FB |
Java | 0.344937 | 0.427509 | 0.642458 | 0.340909 | 0.443592 | #DC587D |
Clojure | 0.014241 | 0.022305 | 0.013966 | 0.022727 | 0.016117 | #D626FF |
Julia | 0.052215 | 0.018587 | 0.025140 | 0.022727 | 0.036838 | #6E899C |
APL | 0.000000 | 0.018587 | 0.008380 | 0.068182 | 0.008442 | #00B5F7 |
Swift | 0.012658 | 0.014870 | 0.011173 | 0.090909 | 0.015349 | #B68E00 |
HTML/CSS | 0.178797 | 0.289963 | 0.240223 | 0.250000 | 0.221028 | #C9FBE5 |
C | 0.140823 | 0.074349 | 0.136872 | 0.181818 | 0.127398 | #FF0092 |
C++ | 0.215190 | 0.085502 | 0.162011 | 0.113636 | 0.170376 | #22FFA7 |
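The normalization step relies on pandas aligning the `total` row (a Series indexed by job type) with the frame's columns, so each column is divided by its own respondent count. A minimal sketch, using the Python and SQL counts from the table above in a hypothetical `counts` frame:

```python
import pandas as pd

# Counts copied from the language count table; "total" is respondents per group.
counts = pd.DataFrame(
    {
        "Data Scientist": [632.0, 595.0, 315.0],
        "Data Analyst": [269.0, 179.0, 168.0],
        "Data Engineer": [358.0, 294.0, 234.0],
        "Database Administrator": [44.0, 18.0, 36.0],
    },
    index=["total", "Python", "SQL"],
)

# Pool the groups: summing the "total" row too gives the overall respondent count.
counts["Total"] = counts.sum(axis=1)

# Dividing by the "total" row aligns on column labels, one divisor per column.
shares = counts.drop(index="total") / counts.loc["total"]
print(shares.round(6))
```

The Total share is therefore a pooled figure (1086 of 1303 respondents for Python), not an average of the four group shares.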
Extract the top 5 languages from each job type. sort_index() is used to give a better visualization result.
top_5_total = normalize_df[["Total","Color"]].sort_values("Total", ascending=False)[:5].sort_index()
top_5_ds = normalize_df[["Data Scientist", "Color"]].sort_values("Data Scientist", ascending=False)[:5].sort_index()
top_5_da = normalize_df[["Data Analyst", "Color"]].sort_values("Data Analyst", ascending=False)[:5].sort_index()
top_5_de = normalize_df[["Data Engineer", "Color"]].sort_values("Data Engineer", ascending=False)[:5].sort_index()
top_5_dba = normalize_df[["Database Administrator", "Color"]].sort_values("Database Administrator", ascending=False)[:5].sort_index()
top_5_list = [top_5_total, top_5_ds, top_5_da, top_5_de, top_5_dba]
for df in top_5_list:
    df.columns = ["Total", "Color"]
top_5_total
| | Total | Color |
---|---|---|
HTML/CSS | 0.221028 | #C9FBE5 |
Java | 0.443592 | #DC587D |
Python | 0.833461 | #00FE35 |
R | 0.281658 | #0DF9FF |
SQL | 0.577897 | #6A76FC |
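The sort-then-slice selection can also be expressed with `Series.nlargest`. The sketch below uses a hypothetical `shares` Series filled with a few Total values from the normalized table:

```python
import pandas as pd

# A subset of the Total shares from the normalized table.
shares = pd.Series(
    {
        "Python": 0.833461,
        "SQL": 0.577897,
        "Java": 0.443592,
        "R": 0.281658,
        "HTML/CSS": 0.221028,
        "C++": 0.170376,
        "C": 0.127398,
    },
    name="Total",
)

# Keep the five largest shares, then order alphabetically for plotting.
top_5 = shares.nlargest(5).sort_index()
print(top_5)
```

Sorting by index afterwards keeps each language at the same angular position regardless of its rank.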
fig = go.Figure(go.Barpolar(
r= top_5_total["Total"],
theta= top_5_total.index,
width=0.45,
marker_color=top_5_total["Color"],
marker_line_color="black",
marker_line_width=2,
opacity=0.7
))
fig.update_layout(
template="ggplot2",
polar = dict(
radialaxis = dict(range=[0, 1.0], showticklabels=False, ticks=''),
angularaxis = dict(showticklabels=True, ticks='')
)
)
fig.show()
from plotly.subplots import make_subplots
fig = make_subplots(rows=2, cols=2, specs=[[{'type': 'polar'}]*2]*2, subplot_titles=("<b>Data Scientist</b>", "<b>Data Analyst</b>", "<b>Data Engineer</b>", "<b>Database Administrator</b>"))
fig.add_trace(go.Barpolar(
r= top_5_ds["Total"],
theta= top_5_ds.index,
width=0.45,
marker_color=top_5_ds["Color"],
marker_line_color="black",
marker_line_width=2,
opacity=0.7
), 1, 1)
fig.add_trace(go.Barpolar(
r= top_5_da["Total"],
theta= top_5_da.index,
width=0.45,
marker_color=top_5_da["Color"],
marker_line_color="black",
marker_line_width=2,
opacity=0.7
), 1, 2)
fig.add_trace(go.Barpolar(
r= top_5_de["Total"],
theta= top_5_de.index,
width=0.45,
marker_color=top_5_de["Color"],
marker_line_color="black",
marker_line_width=2,
opacity=0.7
), 2, 1)
fig.add_trace(go.Barpolar(
r= top_5_dba["Total"],
theta= top_5_dba.index,
width=0.45,
marker_color=top_5_dba["Color"],
marker_line_color="black",
marker_line_width=2,
opacity=0.7
), 2, 2)
fig.update_traces(hovertemplate='%{theta}: %{r:.0%}<extra></extra>')
fig.update_layout(
font_color="darkblue",
font_family="Helvetica",
title_font_color="darkblue",
title_font_family="Helvetica",
title_font_size=24,
margin = dict(t = 150),
paper_bgcolor='#DEDCC6',
height = 800,
width = 1000,
showlegend = False,
title=dict(text="<b>Popular Languages for Data Professionals</b>"),
template="ggplot2",
polar = dict(
radialaxis = dict(range=[0, 1.0], showline = False, showticklabels=False, ticks=''),
angularaxis = dict(showticklabels=True, ticks='', linecolor='black', linewidth=1.5)
),
polar2 = dict(
radialaxis = dict(range=[0, 1.0], showline = False, showticklabels=False, ticks=''),
angularaxis = dict(showticklabels=True, ticks='', linecolor='black', linewidth=1.5)
),
polar3 = dict(
radialaxis = dict(range=[0, 1.0], showline = False, showticklabels=False, ticks=''),
angularaxis = dict(showticklabels=True, ticks='', linecolor='black', linewidth=1.5)
),
polar4 = dict(
radialaxis = dict(range=[0, 1.0], showline = False, showticklabels=False, ticks=''),
angularaxis = dict(showticklabels=True, ticks='', linecolor='black', linewidth=1.5)
)
)
fig.update_annotations(yshift=20)
fig.add_annotation({
"x": 1.05,
"y": -0.1,
"font": {"size": 14},
"text": "tmtsmrsl.github.io",
"showarrow": False
})
fig.show()
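The four `polar`, `polar2`, `polar3`, and `polar4` layout entries above are identical, so they could be generated instead of written out. A minimal sketch in plain Python, where `polar_axes` is a hypothetical helper name and the axis settings are copied from the layout above:

```python
def polar_axes(n_subplots: int) -> dict:
    """Build layout keys polar, polar2, ... with identical axis settings."""
    layouts = {}
    for i in range(1, n_subplots + 1):
        # Plotly names the first polar subplot "polar" and later ones "polarN".
        key = "polar" if i == 1 else f"polar{i}"
        layouts[key] = dict(
            radialaxis=dict(range=[0, 1.0], showline=False, showticklabels=False, ticks=""),
            angularaxis=dict(showticklabels=True, ticks="", linecolor="black", linewidth=1.5),
        )
    return layouts


polar_layouts = polar_axes(4)
print(sorted(polar_layouts))
# Intended use with the figure above (not run here):
# fig.update_layout(**polar_layouts)
```

Each key gets its own dict, so a later tweak to one subplot does not leak into the others.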