This project focuses on data cleaning and some preliminary analysis of exit survey data. This is the sixth guided project in Dataquest's Data Scientist in Python Path.
The data we will use for this project are from exit surveys of employees from the Deparment of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institutes in Australia. The raw data for the DETE exit survey can be found here while the TAFE exit survey is here.
We'll try to answer the following questions:
We begin by loading the modules and reading in the two data sets to get an idea of what we're working with.
# Loading Modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Magic function to make plots appear directly below the code in Jupyter Notebook
%matplotlib inline
# Loading DETE exit surveys data
dete_survey = pd.read_csv('dete_survey.csv')
print("DETE")
print(dete_survey.info())
print("DETE shape:", dete_survey.shape)
print("\n")
# Loading TAFE exit surveys data
tafe_survey = pd.read_csv('tafe_survey.csv')
print("TAFE")
print(tafe_survey.info())
print("TAFE shape:", tafe_survey.shape)
DETE <class 'pandas.core.frame.DataFrame'> RangeIndex: 822 entries, 0 to 821 Data columns (total 56 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ID 822 non-null int64 1 SeparationType 822 non-null object 2 Cease Date 822 non-null object 3 DETE Start Date 822 non-null object 4 Role Start Date 822 non-null object 5 Position 817 non-null object 6 Classification 455 non-null object 7 Region 822 non-null object 8 Business Unit 126 non-null object 9 Employment Status 817 non-null object 10 Career move to public sector 822 non-null bool 11 Career move to private sector 822 non-null bool 12 Interpersonal conflicts 822 non-null bool 13 Job dissatisfaction 822 non-null bool 14 Dissatisfaction with the department 822 non-null bool 15 Physical work environment 822 non-null bool 16 Lack of recognition 822 non-null bool 17 Lack of job security 822 non-null bool 18 Work location 822 non-null bool 19 Employment conditions 822 non-null bool 20 Maternity/family 822 non-null bool 21 Relocation 822 non-null bool 22 Study/Travel 822 non-null bool 23 Ill Health 822 non-null bool 24 Traumatic incident 822 non-null bool 25 Work life balance 822 non-null bool 26 Workload 822 non-null bool 27 None of the above 822 non-null bool 28 Professional Development 808 non-null object 29 Opportunities for promotion 735 non-null object 30 Staff morale 816 non-null object 31 Workplace issue 788 non-null object 32 Physical environment 817 non-null object 33 Worklife balance 815 non-null object 34 Stress and pressure support 810 non-null object 35 Performance of supervisor 813 non-null object 36 Peer support 812 non-null object 37 Initiative 813 non-null object 38 Skills 811 non-null object 39 Coach 767 non-null object 40 Career Aspirations 746 non-null object 41 Feedback 792 non-null object 42 Further PD 768 non-null object 43 Communication 814 non-null object 44 My say 812 non-null object 45 Information 816 non-null object 46 Kept informed 813 non-null object 47 Wellness programs 766 non-null object 48 Health & Safety 793 non-null object 49 Gender 798 non-null object 50 Age 811 non-null object 51 Aboriginal 16 non-null object 52 Torres Strait 3 non-null object 53 South Sea 7 non-null object 54 Disability 23 non-null object 55 NESB 32 non-null object dtypes: bool(18), int64(1), object(37) memory usage: 258.6+ KB None DETE shape: (822, 56) TAFE <class 'pandas.core.frame.DataFrame'> RangeIndex: 702 entries, 0 to 701 Data columns (total 72 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Record ID 702 non-null float64 1 Institute 702 non-null object 2 WorkArea 702 non-null object 3 CESSATION YEAR 695 non-null float64 4 Reason for ceasing employment 701 non-null object 5 Contributing Factors. Career Move - Public Sector 437 non-null object 6 Contributing Factors. Career Move - Private Sector 437 non-null object 7 Contributing Factors. Career Move - Self-employment 437 non-null object 8 Contributing Factors. Ill Health 437 non-null object 9 Contributing Factors. Maternity/Family 437 non-null object 10 Contributing Factors. Dissatisfaction 437 non-null object 11 Contributing Factors. Job Dissatisfaction 437 non-null object 12 Contributing Factors. Interpersonal Conflict 437 non-null object 13 Contributing Factors. Study 437 non-null object 14 Contributing Factors. Travel 437 non-null object 15 Contributing Factors. Other 437 non-null object 16 Contributing Factors. NONE 437 non-null object 17 Main Factor. Which of these was the main factor for leaving? 113 non-null object 18 InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction 608 non-null object 19 InstituteViews. Topic:2. I was given access to skills training to help me do my job better 613 non-null object 20 InstituteViews. Topic:3. I was given adequate opportunities for personal development 610 non-null object 21 InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL% 608 non-null object 22 InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had 615 non-null object 23 InstituteViews. Topic:6. The organisation recognised when staff did good work 607 non-null object 24 InstituteViews. Topic:7. Management was generally supportive of me 614 non-null object 25 InstituteViews. Topic:8. Management was generally supportive of my team 608 non-null object 26 InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me 610 non-null object 27 InstituteViews. Topic:10. Staff morale was positive within the Institute 602 non-null object 28 InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly 601 non-null object 29 InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently 597 non-null object 30 InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly 601 non-null object 31 WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit 609 non-null object 32 WorkUnitViews. Topic:15. I worked well with my colleagues 605 non-null object 33 WorkUnitViews. Topic:16. My job was challenging and interesting 607 non-null object 34 WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work 610 non-null object 35 WorkUnitViews. Topic:18. I had sufficient contact with other people in my job 613 non-null object 36 WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job 609 non-null object 37 WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job 609 non-null object 38 WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT] 608 non-null object 39 WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job 608 non-null object 40 WorkUnitViews. Topic:23. My job provided sufficient variety 611 non-null object 41 WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job 610 non-null object 42 WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction 611 non-null object 43 WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance 606 non-null object 44 WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area 610 non-null object 45 WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date 609 non-null object 46 WorkUnitViews. Topic:29. There was adequate communication between staff in my unit 603 non-null object 47 WorkUnitViews. Topic:30. Staff morale was positive within my work unit 606 non-null object 48 Induction. Did you undertake Workplace Induction? 619 non-null object 49 InductionInfo. Topic:Did you undertake a Corporate Induction? 432 non-null object 50 InductionInfo. Topic:Did you undertake a Institute Induction? 483 non-null object 51 InductionInfo. Topic: Did you undertake Team Induction? 440 non-null object 52 InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted? 555 non-null object 53 InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted? 555 non-null object 54 InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction? 555 non-null object 55 InductionInfo. Face to Face Topic:Did you undertake a Institute Induction? 530 non-null object 56 InductionInfo. On-line Topic:Did you undertake a Institute Induction? 555 non-null object 57 InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction? 553 non-null object 58 InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category? 555 non-null object 59 InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.] 555 non-null object 60 InductionInfo. Induction Manual Topic: Did you undertake Team Induction? 555 non-null object 61 Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)? 608 non-null object 62 Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? 594 non-null object 63 Workplace. Topic:Does your workplace promote and practice the principles of employment equity? 587 non-null object 64 Workplace. Topic:Does your workplace value the diversity of its employees? 586 non-null object 65 Workplace. Topic:Would you recommend the Institute as an employer to others? 581 non-null object 66 Gender. What is your Gender? 596 non-null object 67 CurrentAge. Current Age 596 non-null object 68 Employment Type. Employment Type 596 non-null object 69 Classification. Classification 596 non-null object 70 LengthofServiceOverall. Overall Length of Service at Institute (in years) 596 non-null object 71 LengthofServiceCurrent. Length of Service at current workplace (in years) 596 non-null object dtypes: float64(2), object(70) memory usage: 395.0+ KB None TAFE shape: (702, 72)
We're dealing with columns mostly containing strings, mixed in with a few bools and some numeric data. DETE exit survey has 822 rows and 56 columns while TAFE exit survey has 72 columns and 702 rows. Let's also check the first few rows of our data.
dete_survey.head()
ID | SeparationType | Cease Date | DETE Start Date | Role Start Date | Position | Classification | Region | Business Unit | Employment Status | ... | Kept informed | Wellness programs | Health & Safety | Gender | Age | Aboriginal | Torres Strait | South Sea | Disability | NESB | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Ill Health Retirement | 08/2012 | 1984 | 2004 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | N | N | N | Male | 56-60 | NaN | NaN | NaN | NaN | Yes |
1 | 2 | Voluntary Early Retirement (VER) | 08/2012 | Not Stated | Not Stated | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | N | N | N | Male | 56-60 | NaN | NaN | NaN | NaN | NaN |
2 | 3 | Voluntary Early Retirement (VER) | 05/2012 | 2011 | 2011 | Schools Officer | NaN | Central Office | Education Queensland | Permanent Full-time | ... | N | N | N | Male | 61 or older | NaN | NaN | NaN | NaN | NaN |
3 | 4 | Resignation-Other reasons | 05/2012 | 2005 | 2006 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | ... | A | N | A | Female | 36-40 | NaN | NaN | NaN | NaN | NaN |
4 | 5 | Age Retirement | 05/2012 | 1970 | 1989 | Head of Curriculum/Head of Special Education | NaN | South East | NaN | Permanent Full-time | ... | N | A | M | Female | 61 or older | NaN | NaN | NaN | NaN | NaN |
5 rows × 56 columns
tafe_survey.head()
Record ID | Institute | WorkArea | CESSATION YEAR | Reason for ceasing employment | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | ... | Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? | Workplace. Topic:Does your workplace promote and practice the principles of employment equity? | Workplace. Topic:Does your workplace value the diversity of its employees? | Workplace. Topic:Would you recommend the Institute as an employer to others? | Gender. What is your Gender? | CurrentAge. Current Age | Employment Type. Employment Type | Classification. Classification | LengthofServiceOverall. Overall Length of Service at Institute (in years) | LengthofServiceCurrent. Length of Service at current workplace (in years) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6.341330e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | ... | Yes | Yes | Yes | Yes | Female | 26 30 | Temporary Full-time | Administration (AO) | 1-2 | 1-2 |
1 | 6.341337e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Retirement | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 6.341388e+17 | Mount Isa Institute of TAFE | Delivery (teaching) | 2010.0 | Retirement | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | ... | Yes | Yes | Yes | Yes | Male | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
5 rows × 72 columns
We will now check the severity of missing data per column. We'll decide later which columns to drop.
Note that when we checked the first few rows of dete_survey
, we saw that some dates have string entries indicating "Not Stated"
. We will reload the DETE exit survey data set, this time indicating that entries with "Not Stated"
should be considered missing so that we can load the date columns as numeric types instead of strings.
# Reloading DETE exit survey data
dete_survey = pd.read_csv('dete_survey.csv', na_values='Not Stated')
# Checking columns with missing data
dete_miss = dete_survey.isnull().sum()
print("Number of columns with missing data in DETE exit survey:", len(dete_miss[dete_miss>0]))
print(type(dete_miss))
for index, value in dete_miss[dete_miss>0].iteritems():
print(value, ":", index)
print('\n')
tafe_miss = tafe_survey.isnull().sum()
print("Number of columns with missing data in TAFE ext survey:", len(tafe_miss[tafe_miss>0]))
for index, value in tafe_miss[tafe_miss>0].iteritems():
print(value, ":", index)
Number of columns with missing data in DETE exit survey: 36 <class 'pandas.core.series.Series'> 34 : Cease Date 73 : DETE Start Date 98 : Role Start Date 5 : Position 367 : Classification 105 : Region 696 : Business Unit 5 : Employment Status 14 : Professional Development 87 : Opportunities for promotion 6 : Staff morale 34 : Workplace issue 5 : Physical environment 7 : Worklife balance 12 : Stress and pressure support 9 : Performance of supervisor 10 : Peer support 9 : Initiative 11 : Skills 55 : Coach 76 : Career Aspirations 30 : Feedback 54 : Further PD 8 : Communication 10 : My say 6 : Information 9 : Kept informed 56 : Wellness programs 29 : Health & Safety 24 : Gender 11 : Age 806 : Aboriginal 819 : Torres Strait 815 : South Sea 799 : Disability 790 : NESB Number of columns with missing data in TAFE ext survey: 69 7 : CESSATION YEAR 1 : Reason for ceasing employment 265 : Contributing Factors. Career Move - Public Sector 265 : Contributing Factors. Career Move - Private Sector 265 : Contributing Factors. Career Move - Self-employment 265 : Contributing Factors. Ill Health 265 : Contributing Factors. Maternity/Family 265 : Contributing Factors. Dissatisfaction 265 : Contributing Factors. Job Dissatisfaction 265 : Contributing Factors. Interpersonal Conflict 265 : Contributing Factors. Study 265 : Contributing Factors. Travel 265 : Contributing Factors. Other 265 : Contributing Factors. NONE 589 : Main Factor. Which of these was the main factor for leaving? 94 : InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction 89 : InstituteViews. Topic:2. I was given access to skills training to help me do my job better 92 : InstituteViews. Topic:3. I was given adequate opportunities for personal development 94 : InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL% 87 : InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had 95 : InstituteViews. Topic:6. The organisation recognised when staff did good work 88 : InstituteViews. Topic:7. Management was generally supportive of me 94 : InstituteViews. Topic:8. Management was generally supportive of my team 92 : InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me 100 : InstituteViews. Topic:10. Staff morale was positive within the Institute 101 : InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly 105 : InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently 101 : InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly 93 : WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit 97 : WorkUnitViews. Topic:15. I worked well with my colleagues 95 : WorkUnitViews. Topic:16. My job was challenging and interesting 92 : WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work 89 : WorkUnitViews. Topic:18. I had sufficient contact with other people in my job 93 : WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job 93 : WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job 94 : WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT] 94 : WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job 91 : WorkUnitViews. Topic:23. My job provided sufficient variety 92 : WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job 91 : WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction 96 : WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance 92 : WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area 93 : WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date 99 : WorkUnitViews. Topic:29. There was adequate communication between staff in my unit 96 : WorkUnitViews. Topic:30. Staff morale was positive within my work unit 83 : Induction. Did you undertake Workplace Induction? 270 : InductionInfo. Topic:Did you undertake a Corporate Induction? 219 : InductionInfo. Topic:Did you undertake a Institute Induction? 262 : InductionInfo. Topic: Did you undertake Team Induction? 147 : InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted? 147 : InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted? 147 : InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction? 172 : InductionInfo. Face to Face Topic:Did you undertake a Institute Induction? 147 : InductionInfo. On-line Topic:Did you undertake a Institute Induction? 149 : InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction? 147 : InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category? 147 : InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.] 147 : InductionInfo. Induction Manual Topic: Did you undertake Team Induction? 94 : Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)? 108 : Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? 115 : Workplace. Topic:Does your workplace promote and practice the principles of employment equity? 116 : Workplace. Topic:Does your workplace value the diversity of its employees? 121 : Workplace. Topic:Would you recommend the Institute as an employer to others? 106 : Gender. What is your Gender? 106 : CurrentAge. Current Age 106 : Employment Type. Employment Type 106 : Classification. Classification 106 : LengthofServiceOverall. Overall Length of Service at Institute (in years) 106 : LengthofServiceCurrent. Length of Service at current workplace (in years)
It seems the TAFE exit survey data is more problematic, with 69 out of its 72 columns having at least 1 row of missing data. Meanwhile, the DETE exit survey data has 36 out of its 56 columns with at least 1 row of missing data.
We could drop all columns that have a certain number of rows with missing values but that would most likely result in the dropping of important columns that indicate contributing factors for the exit (which have 265 missing values).
Instead, we will manually select which columns we will keep for both data sets. Before we do that though, let's drop columns where more than 50% of the rows have missing values.
# Computing for dropping threshold
dete_thresh = int(len(dete_survey)/2)
tafe_thresh = int(len(tafe_survey)/2)
# Dropping columns where more than half of the observations are missing
dete_survey = dete_survey.dropna(axis=1, thresh=dete_thresh)
tafe_survey = tafe_survey.dropna(axis=1, thresh=tafe_thresh)
# Checking remaining columns and their index numbers
print('DETE Columns')
for idx, val in enumerate(dete_survey.columns):
print(idx, val)
print('\n')
print('TAFE Columns')
for idx, val in enumerate(tafe_survey.columns):
print(idx, val)
DETE Columns 0 ID 1 SeparationType 2 Cease Date 3 DETE Start Date 4 Role Start Date 5 Position 6 Classification 7 Region 8 Employment Status 9 Career move to public sector 10 Career move to private sector 11 Interpersonal conflicts 12 Job dissatisfaction 13 Dissatisfaction with the department 14 Physical work environment 15 Lack of recognition 16 Lack of job security 17 Work location 18 Employment conditions 19 Maternity/family 20 Relocation 21 Study/Travel 22 Ill Health 23 Traumatic incident 24 Work life balance 25 Workload 26 None of the above 27 Professional Development 28 Opportunities for promotion 29 Staff morale 30 Workplace issue 31 Physical environment 32 Worklife balance 33 Stress and pressure support 34 Performance of supervisor 35 Peer support 36 Initiative 37 Skills 38 Coach 39 Career Aspirations 40 Feedback 41 Further PD 42 Communication 43 My say 44 Information 45 Kept informed 46 Wellness programs 47 Health & Safety 48 Gender 49 Age TAFE Columns 0 Record ID 1 Institute 2 WorkArea 3 CESSATION YEAR 4 Reason for ceasing employment 5 Contributing Factors. Career Move - Public Sector 6 Contributing Factors. Career Move - Private Sector 7 Contributing Factors. Career Move - Self-employment 8 Contributing Factors. Ill Health 9 Contributing Factors. Maternity/Family 10 Contributing Factors. Dissatisfaction 11 Contributing Factors. Job Dissatisfaction 12 Contributing Factors. Interpersonal Conflict 13 Contributing Factors. Study 14 Contributing Factors. Travel 15 Contributing Factors. Other 16 Contributing Factors. NONE 17 InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction 18 InstituteViews. Topic:2. I was given access to skills training to help me do my job better 19 InstituteViews. Topic:3. I was given adequate opportunities for personal development 20 InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL% 21 InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had 22 InstituteViews. Topic:6. The organisation recognised when staff did good work 23 InstituteViews. Topic:7. Management was generally supportive of me 24 InstituteViews. Topic:8. Management was generally supportive of my team 25 InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me 26 InstituteViews. Topic:10. Staff morale was positive within the Institute 27 InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly 28 InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently 29 InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly 30 WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit 31 WorkUnitViews. Topic:15. I worked well with my colleagues 32 WorkUnitViews. Topic:16. My job was challenging and interesting 33 WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work 34 WorkUnitViews. Topic:18. I had sufficient contact with other people in my job 35 WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job 36 WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job 37 WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT] 38 WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job 39 WorkUnitViews. Topic:23. My job provided sufficient variety 40 WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job 41 WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction 42 WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance 43 WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area 44 WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date 45 WorkUnitViews. Topic:29. There was adequate communication between staff in my unit 46 WorkUnitViews. Topic:30. Staff morale was positive within my work unit 47 Induction. Did you undertake Workplace Induction? 48 InductionInfo. Topic:Did you undertake a Corporate Induction? 49 InductionInfo. Topic:Did you undertake a Institute Induction? 50 InductionInfo. Topic: Did you undertake Team Induction? 51 InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted? 52 InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted? 53 InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction? 54 InductionInfo. Face to Face Topic:Did you undertake a Institute Induction? 55 InductionInfo. On-line Topic:Did you undertake a Institute Induction? 56 InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction? 57 InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category? 58 InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.] 59 InductionInfo. Induction Manual Topic: Did you undertake Team Induction? 60 Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)? 61 Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? 62 Workplace. Topic:Does your workplace promote and practice the principles of employment equity? 63 Workplace. Topic:Does your workplace value the diversity of its employees? 64 Workplace. Topic:Would you recommend the Institute as an employer to others? 65 Gender. What is your Gender? 66 CurrentAge. Current Age 67 Employment Type. Employment Type 68 Classification. Classification 69 LengthofServiceOverall. Overall Length of Service at Institute (in years) 70 LengthofServiceCurrent. Length of Service at current workplace (in years)
After dropping columns with plenty of missing data, our next task is to clean up our dataframes some more by dropping columns that we will not be using for our analysis.
For the dete_survey
dataframe, we'll drop the columns that contain Likert scale data. Those are the columns with indexes 27 to 47.
For the tafe_survey
dataframe, we'll only keep the columns that help identify employee characteristics and the contributing factors for the exit. Hence, we'll drop columns from index 17 to 64.
# Dropping unnecessary columns
dete_survey_updated = dete_survey.drop(dete_survey.columns[27:48], axis=1).copy()
tafe_survey_updated = tafe_survey.drop(tafe_survey.columns[17:65], axis=1).copy()
Next, let's clean up our column names and convert all of them to camel case.
# Cleaning up column names
dete_survey_updated.columns = (dete_survey_updated.columns.
str.lower().
str.strip().
str.replace(' ','_', regex=False).
str.replace('.','_', regex=False).
str.replace('/','_', regex=False).
str.replace(':','_', regex=False).
str.replace('(','', regex=False).
str.replace(')','', regex=False).
str.replace('?','', regex=False).
str.replace('__','_', regex=False)
)
tafe_survey_updated.columns = (tafe_survey_updated.columns.
str.lower().
str.strip().
str.replace(' ','_', regex=False).
str.replace('.','_', regex=False).
str.replace('/','_', regex=False).
str.replace(':','_', regex=False).
str.replace('(','', regex=False).
str.replace(')','', regex=False).
str.replace('?','', regex=False).
str.replace('__','_', regex=False)
)
# Checking the new column names
print("DETE columns")
for col in dete_survey_updated.columns:
print(col)
print('\n')
print("TAFE columns")
for col in tafe_survey_updated.columns:
print(col)
DETE columns id separationtype cease_date dete_start_date role_start_date position classification region employment_status career_move_to_public_sector career_move_to_private_sector interpersonal_conflicts job_dissatisfaction dissatisfaction_with_the_department physical_work_environment lack_of_recognition lack_of_job_security work_location employment_conditions maternity_family relocation study_travel ill_health traumatic_incident work_life_balance workload none_of_the_above gender age TAFE columns record_id institute workarea cessation_year reason_for_ceasing_employment contributing_factors_career_move_-_public_sector contributing_factors_career_move_-_private_sector contributing_factors_career_move_-_self-employment contributing_factors_ill_health contributing_factors_maternity_family contributing_factors_dissatisfaction contributing_factors_job_dissatisfaction contributing_factors_interpersonal_conflict contributing_factors_study contributing_factors_travel contributing_factors_other contributing_factors_none gender_what_is_your_gender currentage_current_age employment_type_employment_type classification_classification lengthofserviceoverall_overall_length_of_service_at_institute_in_years lengthofservicecurrent_length_of_service_at_current_workplace_in_years
We notice that some column names are repeating, let's clean up the column names some more by manually renaming some of them.
# Renaming DETE columns
dete_column_dict = {'separationtype':'separation_type'}
dete_survey_updated.rename(columns=dete_column_dict, inplace=True)
# Renaming TAFE columns
tafe_column_dict = {"record_id":"id",
"workarea":"work_area",
"reason_for_ceasing_employment":"separation_type",
"contributing_factors_career_move_-_public_sector":"factor_move_public_sector",
"contributing_factors_career_move_-_private_sector":"factor_move_private_sector",
"contributing_factors_career_move_-_self-employment":"factor_move_self_employment",
"contributing_factors_ill_health":"factor_ill_health",
"contributing_factors_maternity_family":"factor_maternity_family",
"contributing_factors_dissatisfaction":"factor_dissatisfaction",
"contributing_factors_job_dissatisfaction":"factor_job_dissatisfaction",
"contributing_factors_interpersonal_conflict":"factor_interpersonal_conflict",
"contributing_factors_study":"factor_study",
"contributing_factors_travel":"factor_travel",
"contributing_factors_other":"factor_other",
"contributing_factors_none":"factor_none",
"gender_what_is_your_gender":"gender",
"currentage_current_age":"age",
"employment_type_employment_type":"employment_type",
"classification_classification":"classification",
"lengthofserviceoverall_overall_length_of_service_at_institute_in_years":"institute_service",
"lengthofservicecurrent_length_of_service_at_current_workplace_in_years":"workplace_service"
}
tafe_survey_updated.rename(columns=tafe_column_dict, inplace=True)
print(dete_survey_updated.info())
print(tafe_survey_updated.info())
<class 'pandas.core.frame.DataFrame'> RangeIndex: 822 entries, 0 to 821 Data columns (total 29 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 822 non-null int64 1 separation_type 822 non-null object 2 cease_date 788 non-null object 3 dete_start_date 749 non-null float64 4 role_start_date 724 non-null float64 5 position 817 non-null object 6 classification 455 non-null object 7 region 717 non-null object 8 employment_status 817 non-null object 9 career_move_to_public_sector 822 non-null bool 10 career_move_to_private_sector 822 non-null bool 11 interpersonal_conflicts 822 non-null bool 12 job_dissatisfaction 822 non-null bool 13 dissatisfaction_with_the_department 822 non-null bool 14 physical_work_environment 822 non-null bool 15 lack_of_recognition 822 non-null bool 16 lack_of_job_security 822 non-null bool 17 work_location 822 non-null bool 18 employment_conditions 822 non-null bool 19 maternity_family 822 non-null bool 20 relocation 822 non-null bool 21 study_travel 822 non-null bool 22 ill_health 822 non-null bool 23 traumatic_incident 822 non-null bool 24 work_life_balance 822 non-null bool 25 workload 822 non-null bool 26 none_of_the_above 822 non-null bool 27 gender 798 non-null object 28 age 811 non-null object dtypes: bool(18), float64(2), int64(1), object(8) memory usage: 85.2+ KB None <class 'pandas.core.frame.DataFrame'> RangeIndex: 702 entries, 0 to 701 Data columns (total 23 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 702 non-null float64 1 institute 702 non-null object 2 work_area 702 non-null object 3 cessation_year 695 non-null float64 4 separation_type 701 non-null object 5 factor_move_public_sector 437 non-null object 6 factor_move_private_sector 437 non-null object 7 factor_move_self_employment 437 non-null object 8 factor_ill_health 437 non-null object 9 factor_maternity_family 437 non-null object 10 factor_dissatisfaction 437 non-null object 11 factor_job_dissatisfaction 437 non-null object 12 factor_interpersonal_conflict 437 non-null object 13 factor_study 437 non-null object 14 factor_travel 437 non-null object 15 factor_other 437 non-null object 16 factor_none 437 non-null object 17 gender 596 non-null object 18 age 596 non-null object 19 employment_type 596 non-null object 20 classification 596 non-null object 21 institute_service 596 non-null object 22 workplace_service 596 non-null object dtypes: float64(2), object(21) memory usage: 126.3+ KB None
We note that some columns that should be numeric or datetime such as age, length of service, and dates are currently stored as strings. We need to convert these columns to the appropriate data types before we can proceed with our analysis.
For the dete_survey_updated
dataframe, we need to convert the "cease_date"
and "age"
columns to numeric. Here's what we'll do:
"cease_date"
column such that it only contains the year component"age"
column which is stored as ranges of numbersFor the tafe_survey_updated
dataframe, we need to convert the "age"
, "institute_service"
, and "workplace_service"
columns to numeric. Here's what we'll do:
"age"
column which is stored as ranges of numbers"institute_service"
column which is stored as ranges of numbers"workplace_service"
column which is stored as ranges of numbers# Cleaning cease_date column of DETE
dete_survey_updated.loc[:, 'cease_date'] = dete_survey_updated['cease_date'].str.split('/').str[-1].astype(float)
dete_survey_updated['cease_date'].value_counts(dropna=False)
2013.0 380 2012.0 354 2014.0 51 NaN 34 2010.0 2 2006.0 1 Name: cease_date, dtype: int64
Quick note: For some reason, using the .loc
accessor (i.e. df.loc[:, "col_name"]
to update the values of the following columns results in missing values. Thus we just use brackets to access and update the columns of interest.
# Cleaning the age column of DETE
dete_survey_updated['age'] = dete_survey_updated['age'].str.extract(r"(\d+)").astype(float)
dete_survey_updated['age'].value_counts(dropna=False)
61.0 222 56.0 174 51.0 103 46.0 63 41.0 61 26.0 57 36.0 51 21.0 40 31.0 39 NaN 11 20.0 1 Name: age, dtype: int64
# Cleaning the age column of TAFE
tafe_survey_updated['age'] = tafe_survey_updated['age'].str.extract(r"(\d+)").astype(float)
tafe_survey_updated['age'].value_counts(dropna=False)
56.0 162 NaN 106 51.0 82 41.0 80 46.0 59 31.0 52 36.0 51 26.0 50 21.0 44 20.0 16 Name: age, dtype: int64
# Cleaning the institute_length_of_service column of TAFE
tafe_survey_updated['institute_service'] = tafe_survey_updated['institute_service'].str.extract(r"(\d+)").astype(float)
tafe_survey_updated['institute_service'].value_counts(dropna=False)
1.0 249 NaN 106 3.0 96 11.0 89 20.0 71 5.0 48 7.0 43 Name: institute_service, dtype: int64
# Cleaning the workplace_length_of_service column of TAFE
tafe_survey_updated['workplace_service'] = tafe_survey_updated['workplace_service'].str.extract(r"(\d+)").astype(float)
tafe_survey_updated['workplace_service'].value_counts(dropna=False)
1.0 290 NaN 106 3.0 86 11.0 82 20.0 54 7.0 44 5.0 40 Name: workplace_service, dtype: int64
In addition to the above, let's also create a new column for our dete_survey_updated
dataframe indicating the number of years of service to the institute.
We will also create columns that identify whether an employee left the institute because of dissatisfaction.
For DETE, we identify sources of dissatisfaction as:
For TEFE, the identifiers for dissatisfaction are:
# Creating institute_service column for DETE
dete_survey_updated.loc[:,'institute_service'] = dete_survey_updated['cease_date'] - dete_survey_updated['dete_start_date']
# Creating dissatisfied column for DETE
dete_diss_cols = ['job_dissatisfaction',
'dissatisfaction_with_the_department',
'physical_work_environment',
'lack_of_recognition',
'lack_of_job_security',
'work_location',
'employment_conditions',
'work_life_balance',
'workload'
]
dete_survey_updated.loc[:, 'dissatisfied'] = dete_survey_updated[dete_diss_cols].any(axis=1, skipna=False)
dete_survey_ready = dete_survey_updated.copy()
dete_survey_ready['dissatisfied'].value_counts(dropna=False)
False 447 True 375 Name: dissatisfied, dtype: int64
Recall that the entries for factors in the TAFE data are not stored as bools. We will define a function which we can use to transform entries into bools before we create the "dissastisfied"
column for TAFE.
# Factors identifying dissatisfaction in TAFE
tafe_diss_cols = ['factor_dissatisfaction',
'factor_job_dissatisfaction',
'factor_interpersonal_conflict'
]
for col in tafe_diss_cols:
print(tafe_survey_updated[col].value_counts(dropna=False))
- 371 NaN 265 Contributing Factors. Dissatisfaction 66 Name: factor_dissatisfaction, dtype: int64 - 360 NaN 265 Job Dissatisfaction 77 Name: factor_job_dissatisfaction, dtype: int64 - 410 NaN 265 Interpersonal Conflict 27 Name: factor_interpersonal_conflict, dtype: int64
# Defining function for transforming values to bool
def update_vals(val):
if pd.isnull(val):
return np.nan
elif val == '-':
return False
else:
return True
tafe_survey_updated.loc[:, 'dissatisfied'] = (tafe_survey_updated[tafe_diss_cols].
applymap(update_vals).
any(axis=1, skipna=False)
)
tafe_survey_ready = tafe_survey_updated.copy()
tafe_survey_ready['dissatisfied'].value_counts(dropna=False)
False 315 NaN 265 True 122 Name: dissatisfied, dtype: int64
We will combine the data sets to make it easier to analyze both data sets. Before we do that, we need to make sure that it's easy to identify what institute each observation is from.
dete_survey_updated['institute'] = 'DETE'
tafe_survey_updated['institute'] = 'TAFE'
combined = pd.concat([dete_survey_updated, tafe_survey_updated], ignore_index=True)
print(combined.shape)
combined.head()
(1524, 48)
id | separation_type | cease_date | dete_start_date | role_start_date | position | classification | region | employment_status | career_move_to_public_sector | ... | factor_maternity_family | factor_dissatisfaction | factor_job_dissatisfaction | factor_interpersonal_conflict | factor_study | factor_travel | factor_other | factor_none | employment_type | workplace_service | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | Ill Health Retirement | 2012.0 | 1984.0 | 2004.0 | Public Servant | A01-A04 | Central Office | Permanent Full-time | True | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 2.0 | Voluntary Early Retirement (VER) | 2012.0 | NaN | NaN | Public Servant | AO5-AO7 | Central Office | Permanent Full-time | False | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 3.0 | Voluntary Early Retirement (VER) | 2012.0 | 2011.0 | 2011.0 | Schools Officer | NaN | Central Office | Permanent Full-time | False | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 4.0 | Resignation-Other reasons | 2012.0 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | Permanent Full-time | False | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 5.0 | Age Retirement | 2012.0 | 1970.0 | 1989.0 | Head of Curriculum/Head of Special Education | NaN | South East | Permanent Full-time | False | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 48 columns
dete = (combined['institute'] == 'DETE')
tafe = (combined['institute'] == 'TAFE')
# Creating dictionary for updating values of separation_type
dete_sep_dict ={"Age Retirement":"Retirement",
"Resignation-Other reasons":"Resignation",
"Resignation-Other employer":"Resignation",
"Resignation-Move overseas/interstate":"Resignation",
"Voluntary Early Retirement (VER)":"Retirement",
"Ill Health Retirement":"Retirement"
}
# Updating values of separation_type_column
combined = combined.replace({'separation_type':dete_sep_dict})
combined.loc[dete, 'separation_type'].value_counts().plot(kind='pie',
autopct='%1.0f%%',
startangle=45,
cmap='tab20b'
)
plt.ylabel("")
plt.title('Separation Types for DETE')
plt.show()
combined.loc[dete, 'separation_type'].value_counts()
Retirement 413 Resignation 311 Other 49 Contract Expired 34 Termination 15 Name: separation_type, dtype: int64
combined.loc[tafe, 'separation_type'].value_counts().plot(kind='pie',
autopct='%1.0f%%',
startangle=45,
cmap='tab20b'
)
plt.ylabel("")
plt.title('Separation Types for TAFE')
plt.show()
combined.loc[tafe, 'separation_type'].value_counts()
Resignation 340 Contract Expired 127 Retrenchment/ Redundancy 104 Retirement 82 Transfer 25 Termination 23 Name: separation_type, dtype: int64
The reasons for employee exit vary between DETE and TAFE.
Let's explore how long employees have worked for the institutes when they leave. For this analysis, we will remove retirements and focus instead on the other reasons for exit. We will also categorize their lengths of service by category.
# Creating filtered dataframe removing retirement exits
not_retirement = (combined['separation_type'] != 'Retirement')
combined_non_retire = combined[not_retirement].copy()
# Defining function for creating service length categories
def experience(val):
if pd.isnull(val):
return np.nan
elif val >= 11:
return 'Veteran'
elif val >= 7:
return 'Established'
elif val >= 3:
return 'Experienced'
else:
return 'New'
# Creating service category column
combined_non_retire['service_cat'] = combined_non_retire['institute_service'].apply(experience)
# Setting category order
combined_non_retire['service_cat'] = pd.Categorical(combined_non_retire['service_cat'],
ordered=True,
categories = ['Veteran', 'Established', 'Experienced', 'New']
)
# Plotting overall service categories
combined_non_retire['service_cat'].value_counts().plot(kind='pie',
autopct='%1.0f%%',
startangle=45,
cmap='tab20b'
)
plt.title('Non-Retirement Employee Exit Experience Level (Combined)')
plt.show()
combined_non_retire['service_cat'].value_counts()
New 332 Experienced 242 Veteran 224 Established 85 Name: service_cat, dtype: int64
# Plotting DETE non-retirement service categories
combined_non_retire.loc[dete, 'service_cat'].value_counts().plot(kind='pie',
autopct='%1.0f%%',
startangle=45,
cmap='tab20b'
)
plt.title('Non-Retirement Employee Exit Experience Level (DETE)')
plt.show()
combined_non_retire.loc[dete, 'service_cat'].value_counts()
Veteran 121 Experienced 104 New 84 Established 48 Name: service_cat, dtype: int64
# Plotting TAFE non-retirement service categories
combined_non_retire.loc[tafe, 'service_cat'].value_counts().plot(kind='pie',
autopct='%1.0f%%',
startangle=45,
cmap='tab20b'
)
plt.title('Non-Retirement Employee Exit Experience Level (TAFE)')
plt.show()
combined_non_retire.loc[tafe, 'service_cat'].value_counts()
New 248 Experienced 138 Veteran 103 Established 37 Name: service_cat, dtype: int64
Overall, employees who have worked with the institutes for a fewer number of years (New and Experienced) account for a majority of the recorded employee exits, combining for a total of 63% of all non-retirement exits.
TAFE appears to have a problem with retaining employees for long periods of time. Even when we only consider non-retirement exits, 47% of the employees leaving the institute can be considered new (less than 3 years of institute service).
DETE, on the other hand, is seeing a significant portion of veterans (more than 11 years of service) leaving the institute, accounting for 34% of non-retirement exits.
Let's dig a little deeper and explore whether employees who resign do so because of some form of dissatisfaction. We begin by creating a new filtered dataframe that includes only resignation exits.
# Creating filtered dataframe containing only resignation exits
resignation = (combined['separation_type'] == 'Resignation')
combined_resignation = combined[resignation].copy()
# Defining function for creating service length categories
def experience(val):
if pd.isnull(val):
return np.nan
elif val >= 11:
return 'Veteran'
elif val >= 7:
return 'Established'
elif val >= 3:
return 'Experienced'
else:
return 'New'
# Creating service category column
combined_resignation['service_cat'] = combined_resignation['institute_service'].apply(experience)
# Setting category order
combined_resignation['service_cat'] = pd.Categorical(combined_resignation['service_cat'],
ordered=True,
categories = ['Veteran', 'Established', 'Experienced', 'New']
)
combined_resignation['service_cat'].value_counts().plot(kind='pie',
autopct='%1.0f%%',
startangle=45,
cmap='tab20b'
)
plt.ylabel("")
plt.title("Service Length for Resignation Exits (Overall)")
plt.show()
combined_resignation['service_cat'].value_counts()
New 193 Experienced 172 Veteran 136 Established 62 Name: service_cat, dtype: int64
We'll also perform just a little bit more cleaning. Since there are rows with missing data for the "dissatisfied"
column, we'll fill those up with the most common value found in the data.
# Checking value counts for dissatisfied column
print('Dissatisfied before cleaning')
print(combined_resignation['dissatisfied'].value_counts(ascending=False, dropna=False))
print('\n')
# Filling in missing rows
combined_resignation['dissatisfied'].fillna(combined_resignation['dissatisfied'].value_counts().index[0], inplace=True)
print('Dissatisfied after cleaning')
print(combined_resignation['dissatisfied'].value_counts(ascending=False, dropna=False))
Dissatisfied before cleaning False 397 True 246 NaN 8 Name: dissatisfied, dtype: int64 Dissatisfied after cleaning False 405 True 246 Name: dissatisfied, dtype: int64
It's now time to explore the how many of the resigning employees expressed dissatisfaction. We'll compare these based on different categories. We'll start with service length.
Dissatisfaction by Service Length
# Dataframes filtered by institution
dete_resignation = combined_resignation[combined_resignation['institute'] == 'DETE'].copy()
tafe_resignation = combined_resignation[combined_resignation['institute'] == 'TAFE'].copy()
# Setting up tables
service_diss_table = combined_resignation.pivot_table(index='service_cat', values='dissatisfied')
dete_service_diss_table = dete_resignation.pivot_table(index='service_cat', values='dissatisfied')
tafe_service_diss_table = tafe_resignation.pivot_table(index='service_cat', values='dissatisfied')
# Lists of lists for looping over
service_diss_tables = [service_diss_table, dete_service_diss_table, tafe_service_diss_table]
service_diss_titles = ["Overall", "DETE", "TAFE"]
# Generating Horizontal Bar Graphs
sides = ['top', 'right', 'left']
for sp in range(0,3):
ax = service_diss_tables[sp].sort_index(ascending=False).plot(kind='barh', legend=False)
ax.set_title("Resignations: Portion Dissatisfied by Service Length ({})".format(service_diss_titles[sp]))
for side in sides:
ax.spines[side].set_visible(False)
ax.set_ylabel(None)
ax.set_xlim(0, 0.6)
plt.show()
Overall, resignations by more experienced employees (Veteran and Established) expressed dissatisfaction as a factor in their decision to resign. Interestingly, the proportion of resignations who expressed dissatisfaction is significantly lower in TAFE, maxing out at just over 30% for Established employees.
Dissatisfaction by Gender
# Setting up tables
gender_diss_table = combined_resignation.pivot_table(index='gender', values='dissatisfied')
dete_gender_diss_table = dete_resignation.pivot_table(index='gender', values='dissatisfied')
tafe_gender_diss_table = tafe_resignation.pivot_table(index='gender', values='dissatisfied')
# Lists of lists for looping over
gender_diss_tables = [gender_diss_table, dete_gender_diss_table, tafe_gender_diss_table]
gender_diss_titles = ["Overall", "DETE", "TAFE"]
# Generating Horizontal Bar Graphs
sides = ['top', 'right', 'left']
for sp in range(0,3):
ax = gender_diss_tables[sp].sort_index(ascending=False).plot(kind='barh', legend=False, width=0.4)
ax.set_title("Resignations: Portion Dissatisfied by Gender ({})".format(gender_diss_titles[sp]))
for side in sides:
ax.spines[side].set_visible(False)
ax.set_ylabel(None)
ax.set_xlim(0, 0.6)
plt.show()
The same pattern where expressed dissatisfaction upon resigning is lower in TAFE can be seen when we look at dissatisfaction by gender. Overall, dissatisfaction does not vary much between males and females. For DETE, however, dissatisfaction among resigning males is around 10% higher compared to females.
Dissatisfaction by Age Group
We will perform a little more cleaning and create a new column "age_group"
to categorize observations based on age.
def age_group(val):
if pd.isnull(val):
return np.nan
elif val >= 50:
return '60 & above'
elif val >= 45:
return '45 to 60'
elif val >= 30:
return '30 to 45'
else:
return 'below 30'
combined_resignation['age_group'] = combined_resignation['age'].apply(age_group)
combined_resignation['age_group'] = pd.Categorical(combined_resignation['age_group'],
ordered=True,
categories = ['60 & above', '45 to 60', '30 to 45', 'below 30']
)
# Dataframes filtered by institution
dete_resignation = combined_resignation[combined_resignation['institute'] == 'DETE'].copy()
tafe_resignation = combined_resignation[combined_resignation['institute'] == 'TAFE'].copy()
# Setting up tables
age_diss_table = combined_resignation.pivot_table(index='age_group', values='dissatisfied')
dete_age_diss_table = dete_resignation.pivot_table(index='age_group', values='dissatisfied')
tafe_age_diss_table = tafe_resignation.pivot_table(index='age_group', values='dissatisfied')
# Lists of lists for looping over
age_diss_tables = [age_diss_table, dete_age_diss_table, tafe_age_diss_table]
age_diss_titles = ["Overall", "DETE", "TAFE"]
# Generating Horizontal Bar Graphs
sides = ['top', 'right', 'left']
for sp in range(0,3):
ax = age_diss_tables[sp].sort_index(ascending=False).plot(kind='barh', legend=False)
ax.set_title("Resignations: Portion Dissatisfied by Age ({})".format(age_diss_titles[sp]))
for side in sides:
ax.spines[side].set_visible(False)
ax.set_ylabel(None)
ax.set_xlim(0, 0.6)
plt.show()
Overall, dissatisfaction among resigning employees is highest in the 60 & above age group. This is something that may be worth looking into deeper to figure out why some personnel near retirement age choose to resign rather than retire.
For TAFE, dissatisfaction is once again lower across the board, with the highest dissatisfaction rate recorded for the 45 to 60 age group.
For DETE, well over 50% of the age group 60 and above expressed some dissatisfaction
For this project, we were able to practice data cleaning tasks such as creating functions for data cleaning, using built-in methods and functions to explore and correct values, and combining data sets properly. We were also able to practice producing charts and graphs combined with various techniques such as pivot tables to help in our analysis and the visual presentation of our findings.
We learned that patterns in employee exits differ between DETE and TAFE, where dissatisfaction appears to be lower in TAFE. Caution, however, must be taken before making definitive conclusions since the questionnaires used for measuring dissatisfaction are different for the two institutions. Given the data we have available so far, it appears as if TAFE has younger employees who decide to resign. We may want to look closer as to the exact reasons for these resignations although that is outside the scope of this project for now.
There are plenty more interesting questions that may be answered using the available data and we can revisit this data set once we further learn more and improve our data science skills.