Employee Exit Surveys : Clean & Analyze

Introduction

BACKGROUND:

  • Exit surveys from employees of the Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia. You can find the DETE survey here(https://data.gov.au/dataset/ds-qld-fe96ff30-d157-4a81-851d-215f2a0fe26d/details?q=exit%20survey). The TAFE survey is no longer available. We've made some slight modifications to these datasets to make them easier to work with, including changing the encoding to UTF-8 (the original ones are encoded using cp1252.)


TARGET:

  • Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?
  • Are younger employees resigning due to some kind of dissatisfaction? What about older employees?

Below is a preview of a couple columns we'll work with from the dete_survey.csv:

  • ID: An id used to identify the participant of the survey
  • SeparationType: The reason why the person's employment ended
  • Cease Date: The year or month the person's employment ended
  • DETE Start Date: The year the person began employment with the DETE


Below is a preview of a couple columns we'll work with from the tafe_survey.csv:

  • Record ID: An id used to identify the participant of the survey
  • Reason for ceasing employment: The reason why the person's employment ended
  • LengthofServiceOverall. Overall Length of Service at Institute (in years): The length of the person's employment (in years)

Import Packages

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Load Data

In [2]:
dete_survey = pd.read_csv('dete_survey.csv',na_values='Not Stated')
tafe_survey = pd.read_csv('tafe_survey.csv')

1st Glance at data sets

Get Informationabout content, shape, size of data sets

DETE DATA

In [3]:
dete_survey.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 822 entries, 0 to 821
Data columns (total 56 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   ID                                   822 non-null    int64  
 1   SeparationType                       822 non-null    object 
 2   Cease Date                           788 non-null    object 
 3   DETE Start Date                      749 non-null    float64
 4   Role Start Date                      724 non-null    float64
 5   Position                             817 non-null    object 
 6   Classification                       455 non-null    object 
 7   Region                               717 non-null    object 
 8   Business Unit                        126 non-null    object 
 9   Employment Status                    817 non-null    object 
 10  Career move to public sector         822 non-null    bool   
 11  Career move to private sector        822 non-null    bool   
 12  Interpersonal conflicts              822 non-null    bool   
 13  Job dissatisfaction                  822 non-null    bool   
 14  Dissatisfaction with the department  822 non-null    bool   
 15  Physical work environment            822 non-null    bool   
 16  Lack of recognition                  822 non-null    bool   
 17  Lack of job security                 822 non-null    bool   
 18  Work location                        822 non-null    bool   
 19  Employment conditions                822 non-null    bool   
 20  Maternity/family                     822 non-null    bool   
 21  Relocation                           822 non-null    bool   
 22  Study/Travel                         822 non-null    bool   
 23  Ill Health                           822 non-null    bool   
 24  Traumatic incident                   822 non-null    bool   
 25  Work life balance                    822 non-null    bool   
 26  Workload                             822 non-null    bool   
 27  None of the above                    822 non-null    bool   
 28  Professional Development             808 non-null    object 
 29  Opportunities for promotion          735 non-null    object 
 30  Staff morale                         816 non-null    object 
 31  Workplace issue                      788 non-null    object 
 32  Physical environment                 817 non-null    object 
 33  Worklife balance                     815 non-null    object 
 34  Stress and pressure support          810 non-null    object 
 35  Performance of supervisor            813 non-null    object 
 36  Peer support                         812 non-null    object 
 37  Initiative                           813 non-null    object 
 38  Skills                               811 non-null    object 
 39  Coach                                767 non-null    object 
 40  Career Aspirations                   746 non-null    object 
 41  Feedback                             792 non-null    object 
 42  Further PD                           768 non-null    object 
 43  Communication                        814 non-null    object 
 44  My say                               812 non-null    object 
 45  Information                          816 non-null    object 
 46  Kept informed                        813 non-null    object 
 47  Wellness programs                    766 non-null    object 
 48  Health & Safety                      793 non-null    object 
 49  Gender                               798 non-null    object 
 50  Age                                  811 non-null    object 
 51  Aboriginal                           16 non-null     object 
 52  Torres Strait                        3 non-null      object 
 53  South Sea                            7 non-null      object 
 54  Disability                           23 non-null     object 
 55  NESB                                 32 non-null     object 
dtypes: bool(18), float64(2), int64(1), object(35)
memory usage: 258.6+ KB

Check NaN/null values in DETE dataframe

In [4]:
sorted = dete_survey.set_index('Region').sort_values(['Region', 'SeparationType'])
sns.heatmap(sorted.isnull(), cbar=False)
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffa849c6610>
In [5]:
dete_survey.isnull().sum()
Out[5]:
ID                                       0
SeparationType                           0
Cease Date                              34
DETE Start Date                         73
Role Start Date                         98
Position                                 5
Classification                         367
Region                                 105
Business Unit                          696
Employment Status                        5
Career move to public sector             0
Career move to private sector            0
Interpersonal conflicts                  0
Job dissatisfaction                      0
Dissatisfaction with the department      0
Physical work environment                0
Lack of recognition                      0
Lack of job security                     0
Work location                            0
Employment conditions                    0
Maternity/family                         0
Relocation                               0
Study/Travel                             0
Ill Health                               0
Traumatic incident                       0
Work life balance                        0
Workload                                 0
None of the above                        0
Professional Development                14
Opportunities for promotion             87
Staff morale                             6
Workplace issue                         34
Physical environment                     5
Worklife balance                         7
Stress and pressure support             12
Performance of supervisor                9
Peer support                            10
Initiative                               9
Skills                                  11
Coach                                   55
Career Aspirations                      76
Feedback                                30
Further PD                              54
Communication                            8
My say                                  10
Information                              6
Kept informed                            9
Wellness programs                       56
Health & Safety                         29
Gender                                  24
Age                                     11
Aboriginal                             806
Torres Strait                          819
South Sea                              815
Disability                             799
NESB                                   790
dtype: int64
In [6]:
dete_survey.columns[dete_survey.isnull().sum()>100]
Out[6]:
Index(['Classification', 'Region', 'Business Unit', 'Aboriginal',
       'Torres Strait', 'South Sea', 'Disability', 'NESB'],
      dtype='object')

Check for columns with remaining 'Not Stated' elements (fixed via na_values parameter of read_csv() )

In [7]:
for col in dete_survey.columns:
    if ('Not Stated' in dete_survey[col].unique().tolist()):
        print(col)

Check for total amount of columns that contain NaN

In [8]:
dete_survey.columns[dete_survey.isnull().sum()>0].shape
Out[8]:
(36,)

Check Data Types of columns

In [9]:
dt = dete_survey.dtypes
dt.value_counts()
Out[9]:
object     35
bool       18
float64     2
int64       1
dtype: int64

Observations DETE SURVEY DATA:

  • 56 columns and 821 rows
  • Column names seem cleaned but some columns contain Camel Case Naming
  • 36 columns contain NaN values (57%)
  • 8 column contain more than 100 NaN (13%) and are almost entirely empty (Career Aspirations, Feedback, Further PD, Communication, My say, Information, Kept informed)
  • most columns are strings/object(35), some boolean(18) , two float (DETE start date, Role start date) and one single int64 (ID)

TAFE DATA

In [10]:
tafe_survey.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 702 entries, 0 to 701
Data columns (total 72 columns):
 #   Column                                                                                                                                                         Non-Null Count  Dtype  
---  ------                                                                                                                                                         --------------  -----  
 0   Record ID                                                                                                                                                      702 non-null    float64
 1   Institute                                                                                                                                                      702 non-null    object 
 2   WorkArea                                                                                                                                                       702 non-null    object 
 3   CESSATION YEAR                                                                                                                                                 695 non-null    float64
 4   Reason for ceasing employment                                                                                                                                  701 non-null    object 
 5   Contributing Factors. Career Move - Public Sector                                                                                                              437 non-null    object 
 6   Contributing Factors. Career Move - Private Sector                                                                                                             437 non-null    object 
 7   Contributing Factors. Career Move - Self-employment                                                                                                            437 non-null    object 
 8   Contributing Factors. Ill Health                                                                                                                               437 non-null    object 
 9   Contributing Factors. Maternity/Family                                                                                                                         437 non-null    object 
 10  Contributing Factors. Dissatisfaction                                                                                                                          437 non-null    object 
 11  Contributing Factors. Job Dissatisfaction                                                                                                                      437 non-null    object 
 12  Contributing Factors. Interpersonal Conflict                                                                                                                   437 non-null    object 
 13  Contributing Factors. Study                                                                                                                                    437 non-null    object 
 14  Contributing Factors. Travel                                                                                                                                   437 non-null    object 
 15  Contributing Factors. Other                                                                                                                                    437 non-null    object 
 16  Contributing Factors. NONE                                                                                                                                     437 non-null    object 
 17  Main Factor. Which of these was the main factor for leaving?                                                                                                   113 non-null    object 
 18  InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction                                                                         608 non-null    object 
 19  InstituteViews. Topic:2. I was given access to skills training to help me do my job better                                                                     613 non-null    object 
 20  InstituteViews. Topic:3. I was given adequate opportunities for personal development                                                                           610 non-null    object 
 21  InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL%                                                            608 non-null    object 
 22  InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had                                                                615 non-null    object 
 23  InstituteViews. Topic:6. The organisation recognised when staff did good work                                                                                  607 non-null    object 
 24  InstituteViews. Topic:7. Management was generally supportive of me                                                                                             614 non-null    object 
 25  InstituteViews. Topic:8. Management was generally supportive of my team                                                                                        608 non-null    object 
 26  InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me                                                          610 non-null    object 
 27  InstituteViews. Topic:10. Staff morale was positive within the Institute                                                                                       602 non-null    object 
 28  InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly                                                                                 601 non-null    object 
 29  InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently                                                                             597 non-null    object 
 30  InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly                                                                              601 non-null    object 
 31  WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit                                                609 non-null    object 
 32  WorkUnitViews. Topic:15. I worked well with my colleagues                                                                                                      605 non-null    object 
 33  WorkUnitViews. Topic:16. My job was challenging and interesting                                                                                                607 non-null    object 
 34  WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work                                                                        610 non-null    object 
 35  WorkUnitViews. Topic:18. I had sufficient contact with other people in my job                                                                                  613 non-null    object 
 36  WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job                                                   609 non-null    object 
 37  WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job                                                                               609 non-null    object 
 38  WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT]  608 non-null    object 
 39  WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job                                                                            608 non-null    object 
 40  WorkUnitViews. Topic:23. My job provided sufficient variety                                                                                                    611 non-null    object 
 41  WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job                                                                    610 non-null    object 
 42  WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction                                                        611 non-null    object 
 43  WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance                                                    606 non-null    object 
 44  WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area                       610 non-null    object 
 45  WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date  609 non-null    object 
 46  WorkUnitViews. Topic:29. There was adequate communication between staff in my unit                                                                             603 non-null    object 
 47  WorkUnitViews. Topic:30. Staff morale was positive within my work unit                                                                                         606 non-null    object 
 48  Induction. Did you undertake Workplace Induction?                                                                                                              619 non-null    object 
 49  InductionInfo. Topic:Did you undertake a Corporate Induction?                                                                                                  432 non-null    object 
 50  InductionInfo. Topic:Did you undertake a Institute Induction?                                                                                                  483 non-null    object 
 51  InductionInfo. Topic: Did you undertake Team Induction?                                                                                                        440 non-null    object 
 52  InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted?                                                      555 non-null    object 
 53  InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted?                                                           555 non-null    object 
 54  InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction?                                                                                 555 non-null    object 
 55  InductionInfo. Face to Face Topic:Did you undertake a Institute Induction?                                                                                     530 non-null    object 
 56  InductionInfo. On-line Topic:Did you undertake a Institute Induction?                                                                                          555 non-null    object 
 57  InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction?                                                                                 553 non-null    object 
 58  InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category?                                                                                 555 non-null    object 
 59  InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.]                                                555 non-null    object 
 60  InductionInfo. Induction Manual Topic: Did you undertake Team Induction?                                                                                       555 non-null    object 
 61  Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)?                                                      608 non-null    object 
 62  Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?                                                    594 non-null    object 
 63  Workplace. Topic:Does your workplace promote and practice the principles of employment equity?                                                                 587 non-null    object 
 64  Workplace. Topic:Does your workplace value the diversity of its employees?                                                                                     586 non-null    object 
 65  Workplace. Topic:Would you recommend the Institute as an employer to others?                                                                                   581 non-null    object 
 66  Gender. What is your Gender?                                                                                                                                   596 non-null    object 
 67  CurrentAge. Current Age                                                                                                                                        596 non-null    object 
 68  Employment Type. Employment Type                                                                                                                               596 non-null    object 
 69  Classification. Classification                                                                                                                                 596 non-null    object 
 70  LengthofServiceOverall. Overall Length of Service at Institute (in years)                                                                                      596 non-null    object 
 71  LengthofServiceCurrent. Length of Service at current workplace (in years)                                                                                      596 non-null    object 
dtypes: float64(2), object(70)
memory usage: 395.0+ KB
In [11]:
tafe_survey.columns
Out[11]:
Index(['Record ID', 'Institute', 'WorkArea', 'CESSATION YEAR',
       'Reason for ceasing employment',
       'Contributing Factors. Career Move - Public Sector ',
       'Contributing Factors. Career Move - Private Sector ',
       'Contributing Factors. Career Move - Self-employment',
       'Contributing Factors. Ill Health',
       'Contributing Factors. Maternity/Family',
       'Contributing Factors. Dissatisfaction',
       'Contributing Factors. Job Dissatisfaction',
       'Contributing Factors. Interpersonal Conflict',
       'Contributing Factors. Study', 'Contributing Factors. Travel',
       'Contributing Factors. Other', 'Contributing Factors. NONE',
       'Main Factor. Which of these was the main factor for leaving?',
       'InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction',
       'InstituteViews. Topic:2. I was given access to skills training to help me do my job better',
       'InstituteViews. Topic:3. I was given adequate opportunities for personal development',
       'InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL%',
       'InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had',
       'InstituteViews. Topic:6. The organisation recognised when staff did good work',
       'InstituteViews. Topic:7. Management was generally supportive of me',
       'InstituteViews. Topic:8. Management was generally supportive of my team',
       'InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me',
       'InstituteViews. Topic:10. Staff morale was positive within the Institute',
       'InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly',
       'InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently',
       'InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly',
       'WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit',
       'WorkUnitViews. Topic:15. I worked well with my colleagues',
       'WorkUnitViews. Topic:16. My job was challenging and interesting',
       'WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work',
       'WorkUnitViews. Topic:18. I had sufficient contact with other people in my job',
       'WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job',
       'WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job',
       'WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT]',
       'WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job',
       'WorkUnitViews. Topic:23. My job provided sufficient variety',
       'WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job',
       'WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction',
       'WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance',
       'WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area',
       'WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date',
       'WorkUnitViews. Topic:29. There was adequate communication between staff in my unit',
       'WorkUnitViews. Topic:30. Staff morale was positive within my work unit',
       'Induction. Did you undertake Workplace Induction?',
       'InductionInfo. Topic:Did you undertake a Corporate Induction?',
       'InductionInfo. Topic:Did you undertake a Institute Induction?',
       'InductionInfo. Topic: Did you undertake Team Induction?',
       'InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted?',
       'InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted?',
       'InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction?',
       'InductionInfo. Face to Face Topic:Did you undertake a Institute Induction?',
       'InductionInfo. On-line Topic:Did you undertake a Institute Induction?',
       'InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction?',
       'InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category?',
       'InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.]',
       'InductionInfo. Induction Manual Topic: Did you undertake Team Induction?',
       'Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)?',
       'Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?',
       'Workplace. Topic:Does your workplace promote and practice the principles of employment equity?',
       'Workplace. Topic:Does your workplace value the diversity of its employees?',
       'Workplace. Topic:Would you recommend the Institute as an employer to others?',
       'Gender. What is your Gender?', 'CurrentAge. Current Age',
       'Employment Type. Employment Type', 'Classification. Classification',
       'LengthofServiceOverall. Overall Length of Service at Institute (in years)',
       'LengthofServiceCurrent. Length of Service at current workplace (in years)'],
      dtype='object')
In [12]:
tafe_survey.sample()
Out[12]:
Record ID Institute WorkArea CESSATION YEAR Reason for ceasing employment Contributing Factors. Career Move - Public Sector Contributing Factors. Career Move - Private Sector Contributing Factors. Career Move - Self-employment Contributing Factors. Ill Health Contributing Factors. Maternity/Family ... Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? Workplace. Topic:Does your workplace promote and practice the principles of employment equity? Workplace. Topic:Does your workplace value the diversity of its employees? Workplace. Topic:Would you recommend the Institute as an employer to others? Gender. What is your Gender? CurrentAge. Current Age Employment Type. Employment Type Classification. Classification LengthofServiceOverall. Overall Length of Service at Institute (in years) LengthofServiceCurrent. Length of Service at current workplace (in years)
582 6.348832e+17 Southern Queensland Institute of TAFE Delivery (teaching) 2012.0 Termination NaN NaN NaN NaN NaN ... No Yes Yes No Female 20 or younger Contract/casual Teacher (including LVT) 7-10 7-10

1 rows × 72 columns

Check NaN/null values in TAFE dataframe

In [13]:
sorted = tafe_survey.set_index('Institute').sort_values(['Institute', 'Reason for ceasing employment'])
sns.heatmap(sorted.isnull(), cbar=False)
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffa84ab8b20>
In [14]:
tafe_survey.columns[tafe_survey.isnull().sum()>0]
Out[14]:
Index(['CESSATION YEAR', 'Reason for ceasing employment',
       'Contributing Factors. Career Move - Public Sector ',
       'Contributing Factors. Career Move - Private Sector ',
       'Contributing Factors. Career Move - Self-employment',
       'Contributing Factors. Ill Health',
       'Contributing Factors. Maternity/Family',
       'Contributing Factors. Dissatisfaction',
       'Contributing Factors. Job Dissatisfaction',
       'Contributing Factors. Interpersonal Conflict',
       'Contributing Factors. Study', 'Contributing Factors. Travel',
       'Contributing Factors. Other', 'Contributing Factors. NONE',
       'Main Factor. Which of these was the main factor for leaving?',
       'InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction',
       'InstituteViews. Topic:2. I was given access to skills training to help me do my job better',
       'InstituteViews. Topic:3. I was given adequate opportunities for personal development',
       'InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL%',
       'InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had',
       'InstituteViews. Topic:6. The organisation recognised when staff did good work',
       'InstituteViews. Topic:7. Management was generally supportive of me',
       'InstituteViews. Topic:8. Management was generally supportive of my team',
       'InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me',
       'InstituteViews. Topic:10. Staff morale was positive within the Institute',
       'InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly',
       'InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently',
       'InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly',
       'WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit',
       'WorkUnitViews. Topic:15. I worked well with my colleagues',
       'WorkUnitViews. Topic:16. My job was challenging and interesting',
       'WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work',
       'WorkUnitViews. Topic:18. I had sufficient contact with other people in my job',
       'WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job',
       'WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job',
       'WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT]',
       'WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job',
       'WorkUnitViews. Topic:23. My job provided sufficient variety',
       'WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job',
       'WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction',
       'WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance',
       'WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area',
       'WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date',
       'WorkUnitViews. Topic:29. There was adequate communication between staff in my unit',
       'WorkUnitViews. Topic:30. Staff morale was positive within my work unit',
       'Induction. Did you undertake Workplace Induction?',
       'InductionInfo. Topic:Did you undertake a Corporate Induction?',
       'InductionInfo. Topic:Did you undertake a Institute Induction?',
       'InductionInfo. Topic: Did you undertake Team Induction?',
       'InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted?',
       'InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted?',
       'InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction?',
       'InductionInfo. Face to Face Topic:Did you undertake a Institute Induction?',
       'InductionInfo. On-line Topic:Did you undertake a Institute Induction?',
       'InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction?',
       'InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category?',
       'InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.]',
       'InductionInfo. Induction Manual Topic: Did you undertake Team Induction?',
       'Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)?',
       'Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?',
       'Workplace. Topic:Does your workplace promote and practice the principles of employment equity?',
       'Workplace. Topic:Does your workplace value the diversity of its employees?',
       'Workplace. Topic:Would you recommend the Institute as an employer to others?',
       'Gender. What is your Gender?', 'CurrentAge. Current Age',
       'Employment Type. Employment Type', 'Classification. Classification',
       'LengthofServiceOverall. Overall Length of Service at Institute (in years)',
       'LengthofServiceCurrent. Length of Service at current workplace (in years)'],
      dtype='object')
In [15]:
tafe_survey.columns[tafe_survey.isnull().sum()>100].shape
Out[15]:
(38,)
In [16]:
dt = tafe_survey.dtypes
dt.value_counts()
Out[16]:
object     70
float64     2
dtype: int64

Observations TAFE SURVEY DATA:

  • contains 72 columns and 702 rows
  • Column names seem not cleaned, contain a lot of special characters and are not formated in any way
  • 69 columns contain NaN values (96%)
  • 38 column contain more than 100 NaN (53%)
  • most columns are strings/object(70), some are float64 (2)

Data set cleaning

Clean DETE Data set

Drop columns that are not needed for the analysis. Those columns contain work place and career related topics.

In [17]:
dete_survey_updated = dete_survey.drop(dete_survey.columns[28:49],axis=1)

Format column names to lowercase, remove spaces - to have common naming between DETE and TAFE data set

In [18]:
dete_survey_updated.columns = dete_survey_updated.columns.str.lower().str.strip().str.replace(' ','_')
In [19]:
dete_survey_updated.columns
Out[19]:
Index(['id', 'separationtype', 'cease_date', 'dete_start_date',
       'role_start_date', 'position', 'classification', 'region',
       'business_unit', 'employment_status', 'career_move_to_public_sector',
       'career_move_to_private_sector', 'interpersonal_conflicts',
       'job_dissatisfaction', 'dissatisfaction_with_the_department',
       'physical_work_environment', 'lack_of_recognition',
       'lack_of_job_security', 'work_location', 'employment_conditions',
       'maternity/family', 'relocation', 'study/travel', 'ill_health',
       'traumatic_incident', 'work_life_balance', 'workload',
       'none_of_the_above', 'gender', 'age', 'aboriginal', 'torres_strait',
       'south_sea', 'disability', 'nesb'],
      dtype='object')

Clean TAFE Data set

Drop columns that are not needed for the analysis. Those columns contain detailed questions/answers regarding the reason to leave the institute

In [20]:
tafe_survey_updated = tafe_survey.drop(tafe_survey.columns[17:66],axis=1)

Rename column names - to have common naming between DETE and TAFE data set

In [21]:
col_map={'Record ID':'id',
        'CESSATION YEAR': 'cease_date',
         'Reason for ceasing employment':'separationtype',
         'Gender. What is your Gender?':'gender',
         'CurrentAge. Current Age':'age',
         'Employment Type. Employment Type':'employment_status',
         'Classification. Classification':'position',
         'LengthofServiceOverall. Overall Length of Service at Institute (in years)':'institute_service',
         'LengthofServiceCurrent. Length of Service at current workplace (in years)':'role_service'
        }
In [22]:
tafe_survey_updated.rename(col_map,axis=1,inplace=True)
In [23]:
tafe_survey_updated.columns
Out[23]:
Index(['id', 'Institute', 'WorkArea', 'cease_date', 'separationtype',
       'Contributing Factors. Career Move - Public Sector ',
       'Contributing Factors. Career Move - Private Sector ',
       'Contributing Factors. Career Move - Self-employment',
       'Contributing Factors. Ill Health',
       'Contributing Factors. Maternity/Family',
       'Contributing Factors. Dissatisfaction',
       'Contributing Factors. Job Dissatisfaction',
       'Contributing Factors. Interpersonal Conflict',
       'Contributing Factors. Study', 'Contributing Factors. Travel',
       'Contributing Factors. Other', 'Contributing Factors. NONE', 'gender',
       'age', 'employment_status', 'position', 'institute_service',
       'role_service'],
      dtype='object')

Check amount Resignation cases in each data set

In [24]:
dete_survey_updated.separationtype.value_counts()
Out[24]:
Age Retirement                          285
Resignation-Other reasons               150
Resignation-Other employer               91
Resignation-Move overseas/interstate     70
Voluntary Early Retirement (VER)         67
Ill Health Retirement                    61
Other                                    49
Contract Expired                         34
Termination                              15
Name: separationtype, dtype: int64

Extract only Resignation cases (seperationtype entry starts with Resignation...)

In [25]:
dete_resignations = dete_survey_updated[dete_survey_updated.separationtype.str.contains('^Resignation.*')].copy()
In [26]:
dete_resignations.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 311 entries, 3 to 821
Data columns (total 35 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   id                                   311 non-null    int64  
 1   separationtype                       311 non-null    object 
 2   cease_date                           300 non-null    object 
 3   dete_start_date                      283 non-null    float64
 4   role_start_date                      271 non-null    float64
 5   position                             308 non-null    object 
 6   classification                       161 non-null    object 
 7   region                               265 non-null    object 
 8   business_unit                        32 non-null     object 
 9   employment_status                    307 non-null    object 
 10  career_move_to_public_sector         311 non-null    bool   
 11  career_move_to_private_sector        311 non-null    bool   
 12  interpersonal_conflicts              311 non-null    bool   
 13  job_dissatisfaction                  311 non-null    bool   
 14  dissatisfaction_with_the_department  311 non-null    bool   
 15  physical_work_environment            311 non-null    bool   
 16  lack_of_recognition                  311 non-null    bool   
 17  lack_of_job_security                 311 non-null    bool   
 18  work_location                        311 non-null    bool   
 19  employment_conditions                311 non-null    bool   
 20  maternity/family                     311 non-null    bool   
 21  relocation                           311 non-null    bool   
 22  study/travel                         311 non-null    bool   
 23  ill_health                           311 non-null    bool   
 24  traumatic_incident                   311 non-null    bool   
 25  work_life_balance                    311 non-null    bool   
 26  workload                             311 non-null    bool   
 27  none_of_the_above                    311 non-null    bool   
 28  gender                               302 non-null    object 
 29  age                                  306 non-null    object 
 30  aboriginal                           7 non-null      object 
 31  torres_strait                        0 non-null      object 
 32  south_sea                            3 non-null      object 
 33  disability                           8 non-null      object 
 34  nesb                                 9 non-null      object 
dtypes: bool(18), float64(2), int64(1), object(14)
memory usage: 49.2+ KB
In [27]:
tafe_survey_updated.separationtype.value_counts()
Out[27]:
Resignation                 340
Contract Expired            127
Retrenchment/ Redundancy    104
Retirement                   82
Transfer                     25
Termination                  23
Name: separationtype, dtype: int64

Extract only cases where seperationtype column gives Resignation value

In [28]:
tafe_resignations = tafe_survey_updated[tafe_survey_updated.separationtype=='Resignation'].copy()
In [29]:
tafe_resignations.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 340 entries, 3 to 701
Data columns (total 23 columns):
 #   Column                                               Non-Null Count  Dtype  
---  ------                                               --------------  -----  
 0   id                                                   340 non-null    float64
 1   Institute                                            340 non-null    object 
 2   WorkArea                                             340 non-null    object 
 3   cease_date                                           335 non-null    float64
 4   separationtype                                       340 non-null    object 
 5   Contributing Factors. Career Move - Public Sector    332 non-null    object 
 6   Contributing Factors. Career Move - Private Sector   332 non-null    object 
 7   Contributing Factors. Career Move - Self-employment  332 non-null    object 
 8   Contributing Factors. Ill Health                     332 non-null    object 
 9   Contributing Factors. Maternity/Family               332 non-null    object 
 10  Contributing Factors. Dissatisfaction                332 non-null    object 
 11  Contributing Factors. Job Dissatisfaction            332 non-null    object 
 12  Contributing Factors. Interpersonal Conflict         332 non-null    object 
 13  Contributing Factors. Study                          332 non-null    object 
 14  Contributing Factors. Travel                         332 non-null    object 
 15  Contributing Factors. Other                          332 non-null    object 
 16  Contributing Factors. NONE                           332 non-null    object 
 17  gender                                               290 non-null    object 
 18  age                                                  290 non-null    object 
 19  employment_status                                    290 non-null    object 
 20  position                                             290 non-null    object 
 21  institute_service                                    290 non-null    object 
 22  role_service                                         290 non-null    object 
dtypes: float64(2), object(21)
memory usage: 63.8+ KB
In [30]:
dete_resignations.cease_date.value_counts()
Out[30]:
2012       126
2013        74
01/2014     22
12/2013     17
06/2013     14
09/2013     11
11/2013      9
07/2013      9
10/2013      6
08/2013      4
05/2012      2
05/2013      2
09/2010      1
2010         1
07/2012      1
07/2006      1
Name: cease_date, dtype: int64

Convert "cease_date" column from string to float, extracting only year

In [31]:
pattern = r"(2\d{3})"
cease_yr = dete_resignations['cease_date'].str.extract(pattern)
dete_resignations['cease_date'] = cease_yr.squeeze().astype('float') # convert to float and squeeze to column array/Series
In [32]:
# check resulting cease date column
print(dete_resignations['cease_date'] .value_counts().sort_index())
print('Contained NaN values: {}'.format(dete_resignations['cease_date'].isnull().sum()))
2006.0      1
2010.0      2
2012.0    129
2013.0    146
2014.0     22
Name: cease_date, dtype: int64
Contained NaN values: 11
In [33]:
#check start date column
print(dete_resignations['dete_start_date'] .value_counts().sort_index())
print('Contained NaN values: {}'.format(dete_resignations['dete_start_date'] .isnull().sum()))
1963.0     1
1971.0     1
1972.0     1
1973.0     1
1974.0     2
1975.0     1
1976.0     2
1977.0     1
1980.0     5
1982.0     1
1983.0     2
1984.0     1
1985.0     3
1986.0     3
1987.0     1
1988.0     4
1989.0     4
1990.0     5
1991.0     4
1992.0     6
1993.0     5
1994.0     6
1995.0     4
1996.0     6
1997.0     5
1998.0     6
1999.0     8
2000.0     9
2001.0     3
2002.0     6
2003.0     6
2004.0    14
2005.0    15
2006.0    13
2007.0    21
2008.0    22
2009.0    13
2010.0    17
2011.0    24
2012.0    21
2013.0    10
Name: dete_start_date, dtype: int64
Contained NaN values: 28
In [34]:
#check tafe cease_date
print(tafe_resignations['cease_date'].value_counts().sort_index())
print('Contained NaN values:{}'.format(tafe_resignations['cease_date'].isnull().sum()))
2009.0      2
2010.0     68
2011.0    116
2012.0     94
2013.0     55
Name: cease_date, dtype: int64
Contained NaN values:5
In [35]:
import matplotlib.pyplot as plt
In [36]:
fig,ax = plt.subplots(1,2)

dete_resignations.boxplot(column='cease_date',ax=ax[0])
tafe_resignations.boxplot(column='cease_date',ax=ax[1])
ax[0].set_title('DETE')
ax[1].set_title('TAFE')
fig.tight_layout()

Observations :

  • DETE cease_date column dates are spread from 2012-2014,with date beeing spread from quartiles Q1(25%) - Q3(75%) between 2012/13
  • TAFE cease_date column dates are spread from 2010-2013,with date beeing spread from quartiles Q1(25%) - Q3(75%) between 2010/12
In [37]:
tafe_resignations['institute_service'].value_counts()
Out[37]:
Less than 1 year      73
1-2                   64
3-4                   63
5-6                   33
11-20                 26
7-10                  21
More than 20 years    10
Name: institute_service, dtype: int64

Create new column, calculated time worked at DETE from difference of "cease_date" - "dete_start_date"

In [38]:
dete_resignations['institute_service'] = dete_resignations['cease_date'] - dete_resignations['dete_start_date']
In [39]:
#check results
print(dete_resignations['institute_service'].value_counts().sort_index())
print('Contained NaN values: {}'.format(dete_resignations['institute_service'].isnull().sum()))
0.0     20
1.0     22
2.0     14
3.0     20
4.0     16
5.0     23
6.0     17
7.0     13
8.0      8
9.0     14
10.0     6
11.0     4
12.0     6
13.0     8
14.0     6
15.0     7
16.0     5
17.0     6
18.0     5
19.0     3
20.0     7
21.0     3
22.0     6
23.0     4
24.0     4
25.0     2
26.0     2
27.0     1
28.0     2
29.0     1
30.0     2
31.0     1
32.0     3
33.0     1
34.0     1
35.0     1
36.0     2
38.0     1
39.0     3
41.0     1
42.0     1
49.0     1
Name: institute_service, dtype: int64
Contained NaN values: 38
In [40]:
dete_resignations.boxplot(column='institute_service')
Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffa840571c0>

Observations:

  • The worked time at DETE (column "institute_service") is spread from 0-49 years, the column contains 38 NaN values

Extract employees resignments due to dissatisfaction

Convert 'Contributing Factors. Dissatisfaction' column to boolean values

In [41]:
## ALTERNATIVE CODE
#rp_dict = {'Contributing Factors. Dissatisfaction ':True,
#          '-':False}
#tafe_resignations['Contributing Factors. Dissatisfaction'] = tafe_resignations['Contributing Factors. Dissatisfaction'].replace(rp_dict)
#tafe_resignations['Contributing Factors. Dissatisfaction'].value_counts()

#rp_dict = {'Job Dissatisfaction':True,
#          '-':False}
#tafe_resignations['Contributing Factors. Job Dissatisfaction'] = tafe_resignations['Contributing Factors. Job Dissatisfaction'].replace(rp_dict)
#tafe_resignations['Contributing Factors. Job Dissatisfaction'].value_counts()
In [42]:
print(tafe_resignations['Contributing Factors. Dissatisfaction'].value_counts())
print(tafe_resignations['Contributing Factors. Job Dissatisfaction'].value_counts())
-                                         277
Contributing Factors. Dissatisfaction      55
Name: Contributing Factors. Dissatisfaction, dtype: int64
-                      270
Job Dissatisfaction     62
Name: Contributing Factors. Job Dissatisfaction, dtype: int64
In [43]:
def update_vals(element):
    if pd.isnull(element):
        return np.nan
    elif element == '-':
        return False
    else:
        return True

Update columns 'Contruting Factors. Job Dissatisfaction' and ' Contibuting Factor. Dissatisfaction' with boolean values (True/False) or NaN

In [44]:
tafe_resignations[['Contributing Factors. Job Dissatisfaction','Contributing Factors. Dissatisfaction']] = tafe_resignations[['Contributing Factors. Job Dissatisfaction','Contributing Factors. Dissatisfaction']].applymap(update_vals)

Create a single column dissatisfaction containig boolean value in function of the two columns:

  • Contributing Factors. Job Dissatisfaction
  • Contributing Factors. Dissatisfaction
In [45]:
factors = ['Contributing Factors. Job Dissatisfaction','Contributing Factors. Dissatisfaction']
tafe_resignations['dissatisfied'] = tafe_resignations[factors].any(axis=1,skipna=False)

Check if NaN elements are skipped >> represented as NaN

In [46]:
ii_nan = (tafe_resignations[factors[0]].isnull()) | (tafe_resignations[factors[1]].isnull())
In [47]:
tafe_resignations.loc[ii_nan,["dissatisfied","Contributing Factors. Dissatisfaction","Contributing Factors. Dissatisfaction"]]
Out[47]:
dissatisfied Contributing Factors. Dissatisfaction Contributing Factors. Dissatisfaction
16 NaN NaN NaN
18 NaN NaN NaN
51 NaN NaN NaN
258 NaN NaN NaN
276 NaN NaN NaN
437 NaN NaN NaN
513 NaN NaN NaN
670 NaN NaN NaN
In [48]:
display(tafe_resignations[factors[0]].value_counts(dropna=False))
display(tafe_resignations[factors[1]].value_counts(dropna=False))
display(tafe_resignations['dissatisfied'].value_counts(dropna=False))
False    270
True      62
NaN        8
Name: Contributing Factors. Job Dissatisfaction, dtype: int64
False    277
True      55
NaN        8
Name: Contributing Factors. Dissatisfaction, dtype: int64
False    241
True      91
NaN        8
Name: dissatisfied, dtype: int64
In [49]:
tafe_resignations_up = tafe_resignations.copy()

Create a single column 'dissatisfied' containing boolean value in function of the below mentioned 9 columns indicating dissatisfaction:

  • job_dissatisfaction
  • dissatisfaction_with_the_department
  • physical_work_environment
  • lack_of_recognition
  • lack_of_job_security
  • work_location
  • employment_conditions
  • work_life_balance
  • workload
In [50]:
factors =  ['job_dissatisfaction','dissatisfaction_with_the_department',
            'physical_work_environment','lack_of_recognition',
            'lack_of_job_security','work_location','employment_conditions',
           'work_life_balance','workload']
dete_resignations['dissatisfied'] = dete_resignations[factors].any(axis=1,skipna=False)
In [51]:
ii_nan = (dete_resignations[factors[0]].isnull()) | \
         (dete_resignations[factors[1]].isnull()) | \
         (dete_resignations[factors[2]].isnull()) | \
         (dete_resignations[factors[3]].isnull()) | \
         (dete_resignations[factors[4]].isnull()) | \
         (dete_resignations[factors[5]].isnull()) | \
         (dete_resignations[factors[6]].isnull()) | \
         (dete_resignations[factors[7]].isnull()) | \
         (dete_resignations[factors[8]].isnull()) 

Check for NaN in the 9 "dissatified" columns

In [52]:
[dete_resignations[el].isnull().sum() for el in factors]
Out[52]:
[0, 0, 0, 0, 0, 0, 0, 0, 0]
In [53]:
dc_str = [el for el in factors]
dc_str.append("dissatisfied")
dete_resignations.loc[ii_nan,dc_str]
Out[53]:
job_dissatisfaction dissatisfaction_with_the_department physical_work_environment lack_of_recognition lack_of_job_security work_location employment_conditions work_life_balance workload dissatisfied
In [54]:
dete_resignations_up = dete_resignations.copy()

FRAME8: Combine data frames: DETE and TAFE

In [55]:
dete_resignations_up['institute'] = 'DETE'
tafe_resignations_up['institute'] = 'TAFE'
In [56]:
dete_resignations_up['age'].value_counts()
Out[56]:
41-45            48
46-50            42
36-40            41
26-30            35
51-55            32
31-35            29
21-25            29
56-60            26
61 or older      23
20 or younger     1
Name: age, dtype: int64
In [57]:
tafe_resignations_up['age'].value_counts()
Out[57]:
41  45           45
46  50           39
51-55            39
21  25           33
36  40           32
26  30           32
31  35           32
56 or older      29
20 or younger     9
Name: age, dtype: int64

Examine common columns between both data sets before joining

In [58]:
set.intersection(set(dete_resignations_up.keys()),set(tafe_resignations_up.keys()))
Out[58]:
{'age',
 'cease_date',
 'dissatisfied',
 'employment_status',
 'gender',
 'id',
 'institute',
 'institute_service',
 'position',
 'separationtype'}

Concatenate data sets

In [59]:
combined = pd.concat([dete_resignations_up, tafe_resignations_up],ignore_index=True)
combined.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 651 entries, 0 to 650
Data columns (total 53 columns):
 #   Column                                               Non-Null Count  Dtype  
---  ------                                               --------------  -----  
 0   id                                                   651 non-null    float64
 1   separationtype                                       651 non-null    object 
 2   cease_date                                           635 non-null    float64
 3   dete_start_date                                      283 non-null    float64
 4   role_start_date                                      271 non-null    float64
 5   position                                             598 non-null    object 
 6   classification                                       161 non-null    object 
 7   region                                               265 non-null    object 
 8   business_unit                                        32 non-null     object 
 9   employment_status                                    597 non-null    object 
 10  career_move_to_public_sector                         311 non-null    object 
 11  career_move_to_private_sector                        311 non-null    object 
 12  interpersonal_conflicts                              311 non-null    object 
 13  job_dissatisfaction                                  311 non-null    object 
 14  dissatisfaction_with_the_department                  311 non-null    object 
 15  physical_work_environment                            311 non-null    object 
 16  lack_of_recognition                                  311 non-null    object 
 17  lack_of_job_security                                 311 non-null    object 
 18  work_location                                        311 non-null    object 
 19  employment_conditions                                311 non-null    object 
 20  maternity/family                                     311 non-null    object 
 21  relocation                                           311 non-null    object 
 22  study/travel                                         311 non-null    object 
 23  ill_health                                           311 non-null    object 
 24  traumatic_incident                                   311 non-null    object 
 25  work_life_balance                                    311 non-null    object 
 26  workload                                             311 non-null    object 
 27  none_of_the_above                                    311 non-null    object 
 28  gender                                               592 non-null    object 
 29  age                                                  596 non-null    object 
 30  aboriginal                                           7 non-null      object 
 31  torres_strait                                        0 non-null      object 
 32  south_sea                                            3 non-null      object 
 33  disability                                           8 non-null      object 
 34  nesb                                                 9 non-null      object 
 35  institute_service                                    563 non-null    object 
 36  dissatisfied                                         643 non-null    object 
 37  institute                                            651 non-null    object 
 38  Institute                                            340 non-null    object 
 39  WorkArea                                             340 non-null    object 
 40  Contributing Factors. Career Move - Public Sector    332 non-null    object 
 41  Contributing Factors. Career Move - Private Sector   332 non-null    object 
 42  Contributing Factors. Career Move - Self-employment  332 non-null    object 
 43  Contributing Factors. Ill Health                     332 non-null    object 
 44  Contributing Factors. Maternity/Family               332 non-null    object 
 45  Contributing Factors. Dissatisfaction                332 non-null    object 
 46  Contributing Factors. Job Dissatisfaction            332 non-null    object 
 47  Contributing Factors. Interpersonal Conflict         332 non-null    object 
 48  Contributing Factors. Study                          332 non-null    object 
 49  Contributing Factors. Travel                         332 non-null    object 
 50  Contributing Factors. Other                          332 non-null    object 
 51  Contributing Factors. NONE                           332 non-null    object 
 52  role_service                                         290 non-null    object 
dtypes: float64(4), object(49)
memory usage: 269.7+ KB

Drop all columns with more then 500 NaN

In [60]:
combined_updated = combined.dropna(axis=1,thresh=500)
combined_updated.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 651 entries, 0 to 650
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 651 non-null    float64
 1   separationtype     651 non-null    object 
 2   cease_date         635 non-null    float64
 3   position           598 non-null    object 
 4   employment_status  597 non-null    object 
 5   gender             592 non-null    object 
 6   age                596 non-null    object 
 7   institute_service  563 non-null    object 
 8   dissatisfied       643 non-null    object 
 9   institute          651 non-null    object 
dtypes: float64(2), object(8)
memory usage: 51.0+ KB

Checking contents of ["institute_service"] column

In [61]:
combined_updated.institute_service.value_counts()
Out[61]:
Less than 1 year      73
1-2                   64
3-4                   63
5-6                   33
11-20                 26
5.0                   23
1.0                   22
7-10                  21
3.0                   20
0.0                   20
6.0                   17
4.0                   16
9.0                   14
2.0                   14
7.0                   13
More than 20 years    10
13.0                   8
8.0                    8
20.0                   7
15.0                   7
14.0                   6
17.0                   6
12.0                   6
10.0                   6
22.0                   6
18.0                   5
16.0                   5
24.0                   4
23.0                   4
11.0                   4
39.0                   3
19.0                   3
21.0                   3
32.0                   3
25.0                   2
26.0                   2
36.0                   2
28.0                   2
30.0                   2
42.0                   1
49.0                   1
35.0                   1
34.0                   1
38.0                   1
33.0                   1
29.0                   1
27.0                   1
41.0                   1
31.0                   1
Name: institute_service, dtype: int64
In [62]:
combined_updated['institute_service'].astype('str').value_counts()
Out[62]:
nan                   88
Less than 1 year      73
1-2                   64
3-4                   63
5-6                   33
11-20                 26
5.0                   23
1.0                   22
7-10                  21
0.0                   20
3.0                   20
6.0                   17
4.0                   16
2.0                   14
9.0                   14
7.0                   13
More than 20 years    10
13.0                   8
8.0                    8
15.0                   7
20.0                   7
17.0                   6
14.0                   6
10.0                   6
22.0                   6
12.0                   6
18.0                   5
16.0                   5
11.0                   4
24.0                   4
23.0                   4
39.0                   3
32.0                   3
21.0                   3
19.0                   3
28.0                   2
25.0                   2
26.0                   2
30.0                   2
36.0                   2
41.0                   1
38.0                   1
49.0                   1
34.0                   1
35.0                   1
33.0                   1
42.0                   1
31.0                   1
29.0                   1
27.0                   1
Name: institute_service, dtype: int64

Frame 9: Extract numbers/years and calc float value

In [63]:
# map function - calculating YEAR info
# - based on extracted> float-converted 2 columns of numbers
def calc_yrs(ycol):
    if ycol[1]==0:       # if second number is zero (original value notes as float e.g. 3.0)
        return ycol[0]   # return 1st column
    else:
        return (ycol[0]+ycol[1])/2   # return average of 1st+2nd number

Extract the years worked at service in numeric/float format

In [64]:
yrs_str = combined_updated['institute_service'].astype('str')  # convert to string

patt = r"([0-9]+)[-\.]?([0-9]+)?" #regular exp pattern to extract up to 2 numbers

yrs_extr = yrs_str.str.extract(patt)              # extract numbers
yrs_cln  = yrs_extr.dropna(how='all')             # remove all lines with both elements NaN - 88 lines(ref above)
yrs_calc = yrs_cln.fillna('0').astype('float').apply(calc_yrs,axis=1) #fill single column NaN with '0'   >> convert to float >> deploy calc_yrs function 

Mapping function for Carreer stages

In [65]:
# map function to convert YEAR info to carreer stages
def carr_stage(el):
    if el<3.0:
        return 'New'
    elif (el>=3.0) & (el<7.0):
        return 'Experienced'
    elif (el>=7.0) & (el<11.0):
        return 'Established'
    elif el>=11.0:
        return 'Veteran'
 
In [66]:
combined_updated.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 651 entries, 0 to 650
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 651 non-null    float64
 1   separationtype     651 non-null    object 
 2   cease_date         635 non-null    float64
 3   position           598 non-null    object 
 4   employment_status  597 non-null    object 
 5   gender             592 non-null    object 
 6   age                596 non-null    object 
 7   institute_service  563 non-null    object 
 8   dissatisfied       643 non-null    object 
 9   institute          651 non-null    object 
dtypes: float64(2), object(8)
memory usage: 51.0+ KB

Create a new colum based on the calculated years at service Series mapped with the carreer stages

In [119]:
# add column "service_cat"
#combined_updated['service_cat'] = yrs_calc
#combined_updated['service_cat'].map(carr_stage,inplace=True)
#combined_updated['service_cat'] = yrs_calc.map(carr_stage)#.copy()
combined_updated.loc[:,'service_cat'] = yrs_calc.map(carr_stage).copy()
In [68]:
combined_updated.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 651 entries, 0 to 650
Data columns (total 11 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 651 non-null    float64
 1   separationtype     651 non-null    object 
 2   cease_date         635 non-null    float64
 3   position           598 non-null    object 
 4   employment_status  597 non-null    object 
 5   gender             592 non-null    object 
 6   age                596 non-null    object 
 7   institute_service  563 non-null    object 
 8   dissatisfied       643 non-null    object 
 9   institute          651 non-null    object 
 10  service_cat        563 non-null    object 
dtypes: float64(2), object(9)
memory usage: 56.1+ KB

Frame 10: First analysis

In [69]:
combined_updated.dissatisfied.value_counts(dropna=False)
Out[69]:
False    403
True     240
NaN        8
Name: dissatisfied, dtype: int64

Fill remaining NaN in ["dissatisfied"] column with "False", as it is the most occuring value

In [70]:
combined_updated.loc[:,'dissatisfied'] = combined_updated.dissatisfied.fillna(value=False).copy()
In [71]:
combined_updated.dissatisfied.value_counts(dropna=False)
Out[71]:
False    411
True     240
Name: dissatisfied, dtype: int64

Cross-Check values vs pivot_table/mean approach

In [72]:
def dissatisfied_perc(group):
    n = len(group)
    diss_n = group[group==True].value_counts()
    return diss_n/n*100
In [73]:
combined_updated.pivot_table(values='dissatisfied',index='service_cat',aggfunc=dissatisfied_perc)
Out[73]:
dissatisfied
service_cat
Established 51.612903
Experienced 34.302326
New 29.533679
Veteran 48.529412
In [84]:
combined_updated.pivot_table(values='dissatisfied',index='service_cat')*100
Out[84]:
dissatisfied
service_cat
Established 51.612903
Experienced 34.302326
New 29.533679
Veteran 48.529412
In [75]:
combined_updated.pivot_table(values='dissatisfied',index='service_cat').reindex(['New','Experienced','Established','Veteran']).plot(kind='bar')
Out[75]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffa84362430>

Conclusion 1

Answering the first question:
" Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction?"
" What about employees who have been there longer?"

  • We observe a continuous increase of dissatisfaction with seniority, peaking with 7-10 years(Established).


Therefore we can say that "New" employees are not resigning due to dissatisfaction with higher rates then other seniority levels.
The highest rates were found for the "Established" level.

Compare DETE vs TAFE

In [82]:
combined_updated.pivot_table(values='dissatisfied',index=['institute','service_cat']).reindex([['DETE','DETE','DETE','DETE','TAFE','TAFE','TAFE','TAFE'],['New','Experienced','Established','Veteran','New','Experienced','Established','Veteran']]).plot(kind='bar',stacked=True)
Out[82]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffa84322160>

Conclusion

  • The DETE sourced data seems to be responsible for the trend we observed in the combined data set
  • TAFE datas shows equallevel of dissatisfaction except for a small peak for the Established (7-10yrs) employees
    This can be confirmed in the alternative plot below in which we displayed the data from DETE (blue) and TAFE (red) as stacked bar graphs
In [83]:
fig,axi = plt.subplots(1,1)
combined_updated[combined_updated.institute=='DETE'].pivot_table(values='dissatisfied',index='service_cat').reindex(['New','Experienced','Established','Veteran']).plot(kind='bar',ax=axi,color='b',stacked=True)
combined_updated[combined_updated.institute=='TAFE'].pivot_table(values='dissatisfied',index='service_cat').reindex(['New','Experienced','Established','Veteran']).plot(kind='bar',ax=axi,color='r',stacked=True)
Out[83]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffa7d794250>

Analyse influence of Age

In [77]:
combined_updated['age'].value_counts()
Out[77]:
51-55            71
41-45            48
41  45           45
46-50            42
36-40            41
46  50           39
26-30            35
21  25           33
36  40           32
26  30           32
31  35           32
21-25            29
56 or older      29
31-35            29
56-60            26
61 or older      23
20 or younger    10
Name: age, dtype: int64
In [128]:
rp_age = {'41  45':'41-45',
          '46  50':'46-50',
          '21  25':'21-25',
          '36  40':'36-40',
          '31  35':'31-35',
          '26  30':'26-30'}
combined_updated['age'].replace(rp_age,inplace=True)
In [127]:
combined_updated.pivot_table(values='dissatisfied',index='age').plot(kind='bar')
Out[127]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffa7d234250>

Conclusion 2

Answering the second question:
"Are younger employees resigning due to some kind of dissatisfaction?
"What about older employees?"

  • Dissaticfaction seems to increase with age, peaking for age groups older than 56yrs (remark that "56 or older" bar could be equally added to the "56-60" and "61 or older" bars)
  • A first peakof dissatisfaction seems to occur for the 26-30yrs old


Difficult to detect a significant trend for younger employees (<30years), but dissatisfaction seems to peak with the age group of 26-30 years. As this is as well the age group where most people become parents for a first time being aged 25-34 years:
https://aifs.gov.au/facts-and-figures/births-in-australia
It would have been interesting to investigate the influence of an upcoming child birth on the resigning rate in that age group


We abosere a rather big jump in resigning rate for the age groups above 56years

Quick visualization of remaining "NaN/Null" Values

In [76]:
sns.heatmap(combined_updated.isnull(), cbar=False)
Out[76]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ffa84802a90>