Clean and analyse employee exit surveys

This project entails working with exit surveys from employees of the Department of Education, training and Employment(DETE) and the Technical and Further Education(TAFE) institue in Queensland, AUstralia.

project goal?

Present to the stakeholders some key important answers to some questions which include:

1) Are employees who worked for a period of short time resigning due to a kind of dissatisfaction? considering the employees who have worked there for a longer period of time.

2) Are younger employees resigning due to some kind of dissatisfaction while considering the older employees.

In [1]:
#start by first reading in the datasets into pandas. this will be done by first importing the necessary libraries(pandas and numpy)
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from importlib import reload
reload(plt)

#i will read in the csv files right now using pandas
dete_survey = pd.read_csv('dete_survey.csv')
tafe_survey = pd.read_csv('tafe_survey.csv')

#lets access the information in our files using the dataframe method
In [2]:
dete_survey.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 822 entries, 0 to 821
Data columns (total 56 columns):
ID                                     822 non-null int64
SeparationType                         822 non-null object
Cease Date                             822 non-null object
DETE Start Date                        822 non-null object
Role Start Date                        822 non-null object
Position                               817 non-null object
Classification                         455 non-null object
Region                                 822 non-null object
Business Unit                          126 non-null object
Employment Status                      817 non-null object
Career move to public sector           822 non-null bool
Career move to private sector          822 non-null bool
Interpersonal conflicts                822 non-null bool
Job dissatisfaction                    822 non-null bool
Dissatisfaction with the department    822 non-null bool
Physical work environment              822 non-null bool
Lack of recognition                    822 non-null bool
Lack of job security                   822 non-null bool
Work location                          822 non-null bool
Employment conditions                  822 non-null bool
Maternity/family                       822 non-null bool
Relocation                             822 non-null bool
Study/Travel                           822 non-null bool
Ill Health                             822 non-null bool
Traumatic incident                     822 non-null bool
Work life balance                      822 non-null bool
Workload                               822 non-null bool
None of the above                      822 non-null bool
Professional Development               808 non-null object
Opportunities for promotion            735 non-null object
Staff morale                           816 non-null object
Workplace issue                        788 non-null object
Physical environment                   817 non-null object
Worklife balance                       815 non-null object
Stress and pressure support            810 non-null object
Performance of supervisor              813 non-null object
Peer support                           812 non-null object
Initiative                             813 non-null object
Skills                                 811 non-null object
Coach                                  767 non-null object
Career Aspirations                     746 non-null object
Feedback                               792 non-null object
Further PD                             768 non-null object
Communication                          814 non-null object
My say                                 812 non-null object
Information                            816 non-null object
Kept informed                          813 non-null object
Wellness programs                      766 non-null object
Health & Safety                        793 non-null object
Gender                                 798 non-null object
Age                                    811 non-null object
Aboriginal                             16 non-null object
Torres Strait                          3 non-null object
South Sea                              7 non-null object
Disability                             23 non-null object
NESB                                   32 non-null object
dtypes: bool(18), int64(1), object(37)
memory usage: 258.6+ KB
In [3]:
dete_survey.head(20)
Out[3]:
ID SeparationType Cease Date DETE Start Date Role Start Date Position Classification Region Business Unit Employment Status ... Kept informed Wellness programs Health & Safety Gender Age Aboriginal Torres Strait South Sea Disability NESB
0 1 Ill Health Retirement 08/2012 1984 2004 Public Servant A01-A04 Central Office Corporate Strategy and Peformance Permanent Full-time ... N N N Male 56-60 NaN NaN NaN NaN Yes
1 2 Voluntary Early Retirement (VER) 08/2012 Not Stated Not Stated Public Servant AO5-AO7 Central Office Corporate Strategy and Peformance Permanent Full-time ... N N N Male 56-60 NaN NaN NaN NaN NaN
2 3 Voluntary Early Retirement (VER) 05/2012 2011 2011 Schools Officer NaN Central Office Education Queensland Permanent Full-time ... N N N Male 61 or older NaN NaN NaN NaN NaN
3 4 Resignation-Other reasons 05/2012 2005 2006 Teacher Primary Central Queensland NaN Permanent Full-time ... A N A Female 36-40 NaN NaN NaN NaN NaN
4 5 Age Retirement 05/2012 1970 1989 Head of Curriculum/Head of Special Education NaN South East NaN Permanent Full-time ... N A M Female 61 or older NaN NaN NaN NaN NaN
5 6 Resignation-Other reasons 05/2012 1994 1997 Guidance Officer NaN Central Office Education Queensland Permanent Full-time ... D D NaN Female 41-45 NaN NaN NaN NaN NaN
6 7 Age Retirement 05/2012 1972 2007 Teacher Secondary Darling Downs South West NaN Permanent Part-time ... D D SD Female 56-60 NaN NaN NaN NaN NaN
7 8 Age Retirement 05/2012 1988 1990 Teacher Aide NaN North Coast NaN Permanent Part-time ... SA NaN SA Female 61 or older NaN NaN NaN NaN NaN
8 9 Resignation-Other reasons 07/2012 2009 2009 Teacher Secondary North Queensland NaN Permanent Full-time ... A D N Female 31-35 NaN NaN NaN NaN NaN
9 10 Resignation-Other employer 2012 1997 2008 Teacher Aide NaN Not Stated NaN Permanent Part-time ... SD SD SD Female 46-50 NaN NaN NaN NaN NaN
10 11 Age Retirement 2012 1999 1999 Teacher Primary Central Office Education Queensland Permanent Full-time ... A NaN A Male 61 or older NaN NaN NaN NaN NaN
11 12 Resignation-Move overseas/interstate 2012 2009 2009 Teacher Secondary Far North Queensland NaN Permanent Full-time ... N N N Male 31-35 NaN NaN NaN NaN NaN
12 13 Resignation-Other reasons 2012 1998 1998 Teacher Primary Far North Queensland NaN Permanent Full-time ... SA A A Female 36-40 NaN NaN NaN NaN NaN
13 14 Age Retirement 2012 1967 2000 Teacher Primary Metropolitan NaN Permanent Part-time ... A D A Female 61 or older NaN NaN NaN NaN NaN
14 15 Resignation-Other employer 2012 2007 2010 Teacher Secondary Central Queensland NaN Permanent Full-time ... SA N SA Male 31-35 NaN NaN NaN NaN NaN
15 16 Voluntary Early Retirement (VER) 2012 1995 2004 Teacher Secondary Central Queensland NaN Permanent Full-time ... A N A Male 61 or older NaN NaN NaN NaN NaN
16 17 Resignation-Other reasons 2012 Not Stated Not Stated Teacher Aide NaN South East NaN Permanent Part-time ... M M M Male 61 or older NaN NaN NaN NaN NaN
17 18 Age Retirement 2012 1996 1996 Teacher Primary Central Queensland NaN Permanent Full-time ... A A A Female 56-60 NaN NaN NaN NaN NaN
18 19 Age Retirement 2012 2006 2006 Cleaner NaN Central Office Education Queensland Permanent Full-time ... A A A Female 61 or older NaN NaN NaN NaN NaN
19 20 Age Retirement 2012 1989 1989 Cleaner NaN Central Office Education Queensland Permanent Full-time ... A A A Male 61 or older NaN NaN NaN NaN NaN

20 rows × 56 columns

In [4]:
#let us look closer to each columns missing value sum

dete_survey['Classification'].value_counts()
Out[4]:
Primary              161
Secondary            124
A01-A04               66
AO5-AO7               46
Special Education     33
AO8 and Above         14
PO1-PO4                8
Middle                 3
Name: Classification, dtype: int64
In [5]:
#let us take a look at the age variation
dete_survey['Age'].value_counts()
Out[5]:
61 or older      222
56-60            174
51-55            103
46-50             63
41-45             61
26-30             57
36-40             51
21-25             40
31-35             39
20 or younger      1
Name: Age, dtype: int64
In [6]:
#let us look at disability column
dete_survey['Disability'].value_counts()
Out[6]:
Yes    23
Name: Disability, dtype: int64
In [7]:
#lets see how many missing values each columns have in our dete_survey
dete_survey.isnull().sum()
Out[7]:
ID                                       0
SeparationType                           0
Cease Date                               0
DETE Start Date                          0
Role Start Date                          0
Position                                 5
Classification                         367
Region                                   0
Business Unit                          696
Employment Status                        5
Career move to public sector             0
Career move to private sector            0
Interpersonal conflicts                  0
Job dissatisfaction                      0
Dissatisfaction with the department      0
Physical work environment                0
Lack of recognition                      0
Lack of job security                     0
Work location                            0
Employment conditions                    0
Maternity/family                         0
Relocation                               0
Study/Travel                             0
Ill Health                               0
Traumatic incident                       0
Work life balance                        0
Workload                                 0
None of the above                        0
Professional Development                14
Opportunities for promotion             87
Staff morale                             6
Workplace issue                         34
Physical environment                     5
Worklife balance                         7
Stress and pressure support             12
Performance of supervisor                9
Peer support                            10
Initiative                               9
Skills                                  11
Coach                                   55
Career Aspirations                      76
Feedback                                30
Further PD                              54
Communication                            8
My say                                  10
Information                              6
Kept informed                            9
Wellness programs                       56
Health & Safety                         29
Gender                                  24
Age                                     11
Aboriginal                             806
Torres Strait                          819
South Sea                              815
Disability                             799
NESB                                   790
dtype: int64

I will explore the dete_survey with large NaN values to get a deep view of what is going on

In [8]:
dete_survey['Classification'].value_counts()
Out[8]:
Primary              161
Secondary            124
A01-A04               66
AO5-AO7               46
Special Education     33
AO8 and Above         14
PO1-PO4                8
Middle                 3
Name: Classification, dtype: int64
In [9]:
#lets see what the tafe_survey has to tell us
tafe_survey.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 702 entries, 0 to 701
Data columns (total 72 columns):
Record ID                                                                                                                                                        702 non-null float64
Institute                                                                                                                                                        702 non-null object
WorkArea                                                                                                                                                         702 non-null object
CESSATION YEAR                                                                                                                                                   695 non-null float64
Reason for ceasing employment                                                                                                                                    701 non-null object
Contributing Factors. Career Move - Public Sector                                                                                                                437 non-null object
Contributing Factors. Career Move - Private Sector                                                                                                               437 non-null object
Contributing Factors. Career Move - Self-employment                                                                                                              437 non-null object
Contributing Factors. Ill Health                                                                                                                                 437 non-null object
Contributing Factors. Maternity/Family                                                                                                                           437 non-null object
Contributing Factors. Dissatisfaction                                                                                                                            437 non-null object
Contributing Factors. Job Dissatisfaction                                                                                                                        437 non-null object
Contributing Factors. Interpersonal Conflict                                                                                                                     437 non-null object
Contributing Factors. Study                                                                                                                                      437 non-null object
Contributing Factors. Travel                                                                                                                                     437 non-null object
Contributing Factors. Other                                                                                                                                      437 non-null object
Contributing Factors. NONE                                                                                                                                       437 non-null object
Main Factor. Which of these was the main factor for leaving?                                                                                                     113 non-null object
InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction                                                                           608 non-null object
InstituteViews. Topic:2. I was given access to skills training to help me do my job better                                                                       613 non-null object
InstituteViews. Topic:3. I was given adequate opportunities for personal development                                                                             610 non-null object
InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL%                                                              608 non-null object
InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had                                                                  615 non-null object
InstituteViews. Topic:6. The organisation recognised when staff did good work                                                                                    607 non-null object
InstituteViews. Topic:7. Management was generally supportive of me                                                                                               614 non-null object
InstituteViews. Topic:8. Management was generally supportive of my team                                                                                          608 non-null object
InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me                                                            610 non-null object
InstituteViews. Topic:10. Staff morale was positive within the Institute                                                                                         602 non-null object
InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly                                                                                   601 non-null object
InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently                                                                               597 non-null object
InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly                                                                                601 non-null object
WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit                                                  609 non-null object
WorkUnitViews. Topic:15. I worked well with my colleagues                                                                                                        605 non-null object
WorkUnitViews. Topic:16. My job was challenging and interesting                                                                                                  607 non-null object
WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work                                                                          610 non-null object
WorkUnitViews. Topic:18. I had sufficient contact with other people in my job                                                                                    613 non-null object
WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job                                                     609 non-null object
WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job                                                                                 609 non-null object
WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT]    608 non-null object
WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job                                                                              608 non-null object
WorkUnitViews. Topic:23. My job provided sufficient variety                                                                                                      611 non-null object
WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job                                                                      610 non-null object
WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction                                                          611 non-null object
WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance                                                      606 non-null object
WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area                         610 non-null object
WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date    609 non-null object
WorkUnitViews. Topic:29. There was adequate communication between staff in my unit                                                                               603 non-null object
WorkUnitViews. Topic:30. Staff morale was positive within my work unit                                                                                           606 non-null object
Induction. Did you undertake Workplace Induction?                                                                                                                619 non-null object
InductionInfo. Topic:Did you undertake a Corporate Induction?                                                                                                    432 non-null object
InductionInfo. Topic:Did you undertake a Institute Induction?                                                                                                    483 non-null object
InductionInfo. Topic: Did you undertake Team Induction?                                                                                                          440 non-null object
InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted?                                                        555 non-null object
InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted?                                                             555 non-null object
InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction?                                                                                   555 non-null object
InductionInfo. Face to Face Topic:Did you undertake a Institute Induction?                                                                                       530 non-null object
InductionInfo. On-line Topic:Did you undertake a Institute Induction?                                                                                            555 non-null object
InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction?                                                                                   553 non-null object
InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category?                                                                                   555 non-null object
InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.]                                                  555 non-null object
InductionInfo. Induction Manual Topic: Did you undertake Team Induction?                                                                                         555 non-null object
Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)?                                                        608 non-null object
Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?                                                      594 non-null object
Workplace. Topic:Does your workplace promote and practice the principles of employment equity?                                                                   587 non-null object
Workplace. Topic:Does your workplace value the diversity of its employees?                                                                                       586 non-null object
Workplace. Topic:Would you recommend the Institute as an employer to others?                                                                                     581 non-null object
Gender. What is your Gender?                                                                                                                                     596 non-null object
CurrentAge. Current Age                                                                                                                                          596 non-null object
Employment Type. Employment Type                                                                                                                                 596 non-null object
Classification. Classification                                                                                                                                   596 non-null object
LengthofServiceOverall. Overall Length of Service at Institute (in years)                                                                                        596 non-null object
LengthofServiceCurrent. Length of Service at current workplace (in years)                                                                                        596 non-null object
dtypes: float64(2), object(70)
memory usage: 395.0+ KB
In [10]:
tafe_survey.head()
Out[10]:
Record ID Institute WorkArea CESSATION YEAR Reason for ceasing employment Contributing Factors. Career Move - Public Sector Contributing Factors. Career Move - Private Sector Contributing Factors. Career Move - Self-employment Contributing Factors. Ill Health Contributing Factors. Maternity/Family ... Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? Workplace. Topic:Does your workplace promote and practice the principles of employment equity? Workplace. Topic:Does your workplace value the diversity of its employees? Workplace. Topic:Would you recommend the Institute as an employer to others? Gender. What is your Gender? CurrentAge. Current Age Employment Type. Employment Type Classification. Classification LengthofServiceOverall. Overall Length of Service at Institute (in years) LengthofServiceCurrent. Length of Service at current workplace (in years)
0 6.341330e+17 Southern Queensland Institute of TAFE Non-Delivery (corporate) 2010.0 Contract Expired NaN NaN NaN NaN NaN ... Yes Yes Yes Yes Female 26 30 Temporary Full-time Administration (AO) 1-2 1-2
1 6.341337e+17 Mount Isa Institute of TAFE Non-Delivery (corporate) 2010.0 Retirement - - - - - ... Yes Yes Yes Yes NaN NaN NaN NaN NaN NaN
2 6.341388e+17 Mount Isa Institute of TAFE Delivery (teaching) 2010.0 Retirement - - - - - ... Yes Yes Yes Yes NaN NaN NaN NaN NaN NaN
3 6.341399e+17 Mount Isa Institute of TAFE Non-Delivery (corporate) 2010.0 Resignation - - - - - ... Yes Yes Yes Yes NaN NaN NaN NaN NaN NaN
4 6.341466e+17 Southern Queensland Institute of TAFE Delivery (teaching) 2010.0 Resignation - Career Move - Private Sector - - - ... Yes Yes Yes Yes Male 41 45 Permanent Full-time Teacher (including LVT) 3-4 3-4

5 rows × 72 columns

In [11]:
tafe_survey.isnull().sum()
Out[11]:
Record ID                                                                                                                                                          0
Institute                                                                                                                                                          0
WorkArea                                                                                                                                                           0
CESSATION YEAR                                                                                                                                                     7
Reason for ceasing employment                                                                                                                                      1
Contributing Factors. Career Move - Public Sector                                                                                                                265
Contributing Factors. Career Move - Private Sector                                                                                                               265
Contributing Factors. Career Move - Self-employment                                                                                                              265
Contributing Factors. Ill Health                                                                                                                                 265
Contributing Factors. Maternity/Family                                                                                                                           265
Contributing Factors. Dissatisfaction                                                                                                                            265
Contributing Factors. Job Dissatisfaction                                                                                                                        265
Contributing Factors. Interpersonal Conflict                                                                                                                     265
Contributing Factors. Study                                                                                                                                      265
Contributing Factors. Travel                                                                                                                                     265
Contributing Factors. Other                                                                                                                                      265
Contributing Factors. NONE                                                                                                                                       265
Main Factor. Which of these was the main factor for leaving?                                                                                                     589
InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction                                                                            94
InstituteViews. Topic:2. I was given access to skills training to help me do my job better                                                                        89
InstituteViews. Topic:3. I was given adequate opportunities for personal development                                                                              92
InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL%                                                               94
InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had                                                                   87
InstituteViews. Topic:6. The organisation recognised when staff did good work                                                                                     95
InstituteViews. Topic:7. Management was generally supportive of me                                                                                                88
InstituteViews. Topic:8. Management was generally supportive of my team                                                                                           94
InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me                                                             92
InstituteViews. Topic:10. Staff morale was positive within the Institute                                                                                         100
InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly                                                                                   101
InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently                                                                               105
                                                                                                                                                                ... 
WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction                                                           91
WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance                                                       96
WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area                          92
WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date     93
WorkUnitViews. Topic:29. There was adequate communication between staff in my unit                                                                                99
WorkUnitViews. Topic:30. Staff morale was positive within my work unit                                                                                            96
Induction. Did you undertake Workplace Induction?                                                                                                                 83
InductionInfo. Topic:Did you undertake a Corporate Induction?                                                                                                    270
InductionInfo. Topic:Did you undertake a Institute Induction?                                                                                                    219
InductionInfo. Topic: Did you undertake Team Induction?                                                                                                          262
InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted?                                                        147
InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted?                                                             147
InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction?                                                                                   147
InductionInfo. Face to Face Topic:Did you undertake a Institute Induction?                                                                                       172
InductionInfo. On-line Topic:Did you undertake a Institute Induction?                                                                                            147
InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction?                                                                                   149
InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category?                                                                                   147
InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.]                                                  147
InductionInfo. Induction Manual Topic: Did you undertake Team Induction?                                                                                         147
Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)?                                                         94
Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?                                                      108
Workplace. Topic:Does your workplace promote and practice the principles of employment equity?                                                                   115
Workplace. Topic:Does your workplace value the diversity of its employees?                                                                                       116
Workplace. Topic:Would you recommend the Institute as an employer to others?                                                                                     121
Gender. What is your Gender?                                                                                                                                     106
CurrentAge. Current Age                                                                                                                                          106
Employment Type. Employment Type                                                                                                                                 106
Classification. Classification                                                                                                                                   106
LengthofServiceOverall. Overall Length of Service at Institute (in years)                                                                                        106
LengthofServiceCurrent. Length of Service at current workplace (in years)                                                                                        106
Length: 72, dtype: int64

from the explored data, we could see that in out dete_survey

a) We have an unclean data and we need to do a lot of data cleaning

b) Both datasets have quite disimilar data recording and contains a lot of columns we may not necessary need

c) Some NaN values are not explicitly shown as Nan but as 'Not stated' which appeared in any format different from Nan

d) some rows repeated the reason for resignation.

e) Each dataset contains many of thesame columns but the columns have different names.

Now, let us start by handling the NaN and extra not needed columns in our analysis.

cleaning column names

In [12]:
# we will start by reading the dete_survey again as csv but this time,
# we will specify what should be counted as our NaN value

dete_survey = pd.read_csv('dete_survey.csv', na_values='Not Stated')
In [13]:
# let us view our dete_survey again

dete_survey.head()
Out[13]:
ID SeparationType Cease Date DETE Start Date Role Start Date Position Classification Region Business Unit Employment Status ... Kept informed Wellness programs Health & Safety Gender Age Aboriginal Torres Strait South Sea Disability NESB
0 1 Ill Health Retirement 08/2012 1984.0 2004.0 Public Servant A01-A04 Central Office Corporate Strategy and Peformance Permanent Full-time ... N N N Male 56-60 NaN NaN NaN NaN Yes
1 2 Voluntary Early Retirement (VER) 08/2012 NaN NaN Public Servant AO5-AO7 Central Office Corporate Strategy and Peformance Permanent Full-time ... N N N Male 56-60 NaN NaN NaN NaN NaN
2 3 Voluntary Early Retirement (VER) 05/2012 2011.0 2011.0 Schools Officer NaN Central Office Education Queensland Permanent Full-time ... N N N Male 61 or older NaN NaN NaN NaN NaN
3 4 Resignation-Other reasons 05/2012 2005.0 2006.0 Teacher Primary Central Queensland NaN Permanent Full-time ... A N A Female 36-40 NaN NaN NaN NaN NaN
4 5 Age Retirement 05/2012 1970.0 1989.0 Head of Curriculum/Head of Special Education NaN South East NaN Permanent Full-time ... N A M Female 61 or older NaN NaN NaN NaN NaN

5 rows × 56 columns

In [14]:
# let us drop some columns that is not necessary. we wont actually need 
# from column 28 to 49
dete_survey.columns[28:49]
Out[14]:
Index(['Professional Development', 'Opportunities for promotion',
       'Staff morale', 'Workplace issue', 'Physical environment',
       'Worklife balance', 'Stress and pressure support',
       'Performance of supervisor', 'Peer support', 'Initiative', 'Skills',
       'Coach', 'Career Aspirations', 'Feedback', 'Further PD',
       'Communication', 'My say', 'Information', 'Kept informed',
       'Wellness programs', 'Health & Safety'],
      dtype='object')
In [15]:
# let us drop some columns that is not necessary. we wont actually need 
# from column 28 to 49 dete_survey
dete_to_drop = dete_survey.columns[28:49]
dete_survey_updated = dete_survey.drop(dete_to_drop, axis=1)
In [16]:
dete_survey_updated.head()
Out[16]:
ID SeparationType Cease Date DETE Start Date Role Start Date Position Classification Region Business Unit Employment Status ... Work life balance Workload None of the above Gender Age Aboriginal Torres Strait South Sea Disability NESB
0 1 Ill Health Retirement 08/2012 1984.0 2004.0 Public Servant A01-A04 Central Office Corporate Strategy and Peformance Permanent Full-time ... False False True Male 56-60 NaN NaN NaN NaN Yes
1 2 Voluntary Early Retirement (VER) 08/2012 NaN NaN Public Servant AO5-AO7 Central Office Corporate Strategy and Peformance Permanent Full-time ... False False False Male 56-60 NaN NaN NaN NaN NaN
2 3 Voluntary Early Retirement (VER) 05/2012 2011.0 2011.0 Schools Officer NaN Central Office Education Queensland Permanent Full-time ... False False True Male 61 or older NaN NaN NaN NaN NaN
3 4 Resignation-Other reasons 05/2012 2005.0 2006.0 Teacher Primary Central Queensland NaN Permanent Full-time ... False False False Female 36-40 NaN NaN NaN NaN NaN
4 5 Age Retirement 05/2012 1970.0 1989.0 Head of Curriculum/Head of Special Education NaN South East NaN Permanent Full-time ... True False False Female 61 or older NaN NaN NaN NaN NaN

5 rows × 35 columns

Let us work on cleaning our column name to the choice we want. we want to start our cleaning from the columns names.

In [17]:
dete_survey_updated.columns = dete_survey_updated.columns.str.lower().str.strip().str.replace(' ', '_')
dete_survey_updated.columns
Out[17]:
Index(['id', 'separationtype', 'cease_date', 'dete_start_date',
       'role_start_date', 'position', 'classification', 'region',
       'business_unit', 'employment_status', 'career_move_to_public_sector',
       'career_move_to_private_sector', 'interpersonal_conflicts',
       'job_dissatisfaction', 'dissatisfaction_with_the_department',
       'physical_work_environment', 'lack_of_recognition',
       'lack_of_job_security', 'work_location', 'employment_conditions',
       'maternity/family', 'relocation', 'study/travel', 'ill_health',
       'traumatic_incident', 'work_life_balance', 'workload',
       'none_of_the_above', 'gender', 'age', 'aboriginal', 'torres_strait',
       'south_sea', 'disability', 'nesb'],
      dtype='object')
In [18]:
# let us drop some columns that is not necessary. we wont actually need 
# from column 17:66 for tafe_survey
tafe_to_drop = tafe_survey.columns[17:66]
tafe_survey_updated = tafe_survey.drop(tafe_to_drop, axis=1)

From the above code, i first set my NaN values to be all missing values from the dataset which is not seen as string NaN.

I had to first identify some columns that is not necessary for both of my datasets analysis for both datsets.

After the identification, i had to put each of them(differently) in a variable which is based on the columns not needed, and called them using the dataframe drop method since i am working with a dataframe with many columns

working on the cleaning of the columns names for tafe_survey dataset.

In [19]:
#let's assign the columns we want to rename to overcome ambigious nature of the column names
#we will start by assigning them to a seperate variable, for ease.
tafe_rename = {'Record ID': 'id','CESSATION YEAR': 'cease_date',
               'Reason for ceasing employment': 'separationtype', 
               'Gender. What is your Gender?': 'gender',
               'CurrentAge. Current Age': 'age', 
               'Employment Type. Employment Type': 'employment_status',
               'Classification. Classification': 'position',
               'LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service',
               'LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service'
}

tafe_survey_updated = tafe_survey_updated.rename(tafe_rename, axis=1)

#let's view our columns head
tafe_survey_updated.head()
Out[19]:
id Institute WorkArea cease_date separationtype Contributing Factors. Career Move - Public Sector Contributing Factors. Career Move - Private Sector Contributing Factors. Career Move - Self-employment Contributing Factors. Ill Health Contributing Factors. Maternity/Family ... Contributing Factors. Study Contributing Factors. Travel Contributing Factors. Other Contributing Factors. NONE gender age employment_status position institute_service role_service
0 6.341330e+17 Southern Queensland Institute of TAFE Non-Delivery (corporate) 2010.0 Contract Expired NaN NaN NaN NaN NaN ... NaN NaN NaN NaN Female 26 30 Temporary Full-time Administration (AO) 1-2 1-2
1 6.341337e+17 Mount Isa Institute of TAFE Non-Delivery (corporate) 2010.0 Retirement - - - - - ... - Travel - - NaN NaN NaN NaN NaN NaN
2 6.341388e+17 Mount Isa Institute of TAFE Delivery (teaching) 2010.0 Retirement - - - - - ... - - - NONE NaN NaN NaN NaN NaN NaN
3 6.341399e+17 Mount Isa Institute of TAFE Non-Delivery (corporate) 2010.0 Resignation - - - - - ... - Travel - - NaN NaN NaN NaN NaN NaN
4 6.341466e+17 Southern Queensland Institute of TAFE Delivery (teaching) 2010.0 Resignation - Career Move - Private Sector - - - ... - - - - Male 41 45 Permanent Full-time Teacher (including LVT) 3-4 3-4

5 rows × 23 columns

In [20]:
tafe_survey_updated['age'].value_counts()
Out[20]:
56 or older      162
51-55             82
41  45            80
46  50            59
31  35            52
36  40            51
26  30            50
21  25            44
20 or younger     16
Name: age, dtype: int64

Generally, what i have done so far, is looking at my data and trying to understand it fully.

This data was gotten from different sources with different survey pattern. so what insights are the survey saying? which should be my goal (knowing the employee exit survey)

To analyze this data so far, what i have done is

1) importing both pandas and numpy

2) reading my csv files(both dete_survey and tafe_survey) with pandas

3) looking at my column headers and checking for null(NaN) or missing data in my dataset, which i had many of them

4) Dropped some columns which it is not helpful to my endpoint, or my goal.

5) Tried working on my columns for each dataset, where in dete, i replaced some formats and in tafe, i renamed some headers.

Filtering the Data

In [21]:
#let us first look at the reason for the employee exit from the separationtype 
#column by looking at the unique value counts for each reason.
print(dete_survey_updated['separationtype'].value_counts())
print(tafe_survey_updated['separationtype'].value_counts())
Age Retirement                          285
Resignation-Other reasons               150
Resignation-Other employer               91
Resignation-Move overseas/interstate     70
Voluntary Early Retirement (VER)         67
Ill Health Retirement                    61
Other                                    49
Contract Expired                         34
Termination                              15
Name: separationtype, dtype: int64
Resignation                 340
Contract Expired            127
Retrenchment/ Redundancy    104
Retirement                   82
Transfer                     25
Termination                  23
Name: separationtype, dtype: int64

The goal of our project is to analyse the resignation type of employee exit, which from the separation type column for both dataset, we could see that Resignation for dete_survey accounts for (150+91+70) which is dependent on the reason for resignation, and that of tafe_survey is 340

In [22]:
#let us reuse our series.value_counts() method to look at what happens to
#our separationtype column for dete_survey_updated

dete_survey_updated['separationtype'].value_counts()
Out[22]:
Age Retirement                          285
Resignation-Other reasons               150
Resignation-Other employer               91
Resignation-Move overseas/interstate     70
Voluntary Early Retirement (VER)         67
Ill Health Retirement                    61
Other                                    49
Contract Expired                         34
Termination                              15
Name: separationtype, dtype: int64
In [23]:
#now since we have different resignations orders, let us convert them to one word which is resignation
#i will first split the resignation type and remove other words that is not resignation
dete_survey_updated['separationtype'] = dete_survey_updated['separationtype'].str.split('-').str[0]

#lets confirm if our data was updated

dete_survey_updated['separationtype'].value_counts()

#confrimed, our resignation is now together.
Out[23]:
Resignation                         311
Age Retirement                      285
Voluntary Early Retirement (VER)     67
Ill Health Retirement                61
Other                                49
Contract Expired                     34
Termination                          15
Name: separationtype, dtype: int64
In [24]:
#let us reuse our series.value_counts() method to look at what happens to
#our separationtype column for tafe_survey_updated

tafe_survey_updated['separationtype'].value_counts()
Out[24]:
Resignation                 340
Contract Expired            127
Retrenchment/ Redundancy    104
Retirement                   82
Transfer                     25
Termination                  23
Name: separationtype, dtype: int64
In [25]:
#since we are interested in the resignation of the employee exit type,
# we will work with resignation type from the separation column. but we will use dataframe.copy()
#this is so because we want to avoid the problem ofsettingwithCopy Warning.

dete_resignation = dete_survey_updated[dete_survey_updated['separationtype']=='Resignation'].copy()
In [26]:
#since we are interested in the resignation of the employee exit type,
# we will work with resignation type from the separation column. but we will use dataframe.copy()
#this is so because we want to avoid the problem ofsettingwithCopy Warning.

tafe_resignation = tafe_survey_updated[tafe_survey_updated['separationtype']=='Resignation'].copy()

At this point, i introudced a dataframe.copy method to copy seperately the findings from the resignation type in the separationtype column. This is so because i would encouter a SettingwithCopy warning. This warning arises as a result of chained assignment error.

Verifying the Data

when working with real life data, it is good we avoid the error of assuming that the data we are analyzing isn't corrupted. catching all the errors at once might not always be possible, but we have to make the data look reasonable. i will start be verifying the cease_date and dete_start_date columns to see if the years make any meaningful sense.

observations? yes, there is inconsistency in my date format type. But we are only interested in the years.

we will extract the year from the cease_date column

In [27]:
dete_resignation['cease_date'].value_counts()
Out[27]:
2012       126
2013        74
01/2014     22
12/2013     17
06/2013     14
09/2013     11
07/2013      9
11/2013      9
10/2013      6
08/2013      4
05/2013      2
05/2012      2
07/2006      1
07/2012      1
09/2010      1
2010         1
Name: cease_date, dtype: int64
In [28]:
#let's look at the dete_start_date column to check out for outliers

dete_resignation['dete_start_date'].value_counts().sort_index(ascending=True)
Out[28]:
1963.0     1
1971.0     1
1972.0     1
1973.0     1
1974.0     2
1975.0     1
1976.0     2
1977.0     1
1980.0     5
1982.0     1
1983.0     2
1984.0     1
1985.0     3
1986.0     3
1987.0     1
1988.0     4
1989.0     4
1990.0     5
1991.0     4
1992.0     6
1993.0     5
1994.0     6
1995.0     4
1996.0     6
1997.0     5
1998.0     6
1999.0     8
2000.0     9
2001.0     3
2002.0     6
2003.0     6
2004.0    14
2005.0    15
2006.0    13
2007.0    21
2008.0    22
2009.0    13
2010.0    17
2011.0    24
2012.0    21
2013.0    10
Name: dete_start_date, dtype: int64
In [29]:
#let us check out for tafe_resignation column

tafe_resignation['cease_date'].value_counts(ascending=True)
Out[29]:
2009.0      2
2013.0     55
2010.0     68
2012.0     94
2011.0    116
Name: cease_date, dtype: int64
In [30]:
# Extract the years and convert them to a float type
dete_resignation['cease_date'] = dete_resignation['cease_date'].str.split('/').str[-1]
dete_resignation['cease_date'] = dete_resignation['cease_date'].astype("float")

# Check the values again and look for outliers
dete_resignation['cease_date'].value_counts()
Out[30]:
2013.0    146
2012.0    129
2014.0     22
2010.0      2
2006.0      1
Name: cease_date, dtype: int64

Boxplot representation

let us use boxplot to reresent our date column data to point out outliers

In [31]:
dete_resignation['cease_date'].describe()
Out[31]:
count     300.00000
mean     2012.60000
std         0.75403
min      2006.00000
25%      2012.00000
50%      2013.00000
75%      2013.00000
max      2014.00000
Name: cease_date, dtype: float64

box plot for dete_resignation cease_date

ax = dete_resignation.boxplot(column=['cease_date']) plt.title("dete cease employee year") ax.set_ylim(2004,2016) plt.ylabel("Year") plt.show

The box plot for dete_resignation and statistics of cease_date suggests that we recorded large number of resignation within 2012 - 2014 alone, while we have an outlier in 2006

In [32]:
#lets work for start date on dete_survey
ax = dete_resignation.boxplot(column = ['dete_start_date'])
plt.title("dete start date")
ax.set_ylim(1963,2013)
plt.ylabel("year")
plt.show()
In [33]:
dete_resignation['dete_start_date'].describe()
Out[33]:
count     283.000000
mean     2002.067138
std         9.914479
min      1963.000000
25%      1997.000000
50%      2005.000000
75%      2010.000000
max      2013.000000
Name: dete_start_date, dtype: float64

As can be seen, most of our start date occured in the late 1997 to 2010, while we outliers in periods below 1997

In [34]:
tafe_resignation['cease_date'].describe()
Out[34]:
count     335.000000
mean     2011.394030
std         1.005952
min      2009.000000
25%      2011.000000
50%      2011.000000
75%      2012.000000
max      2013.000000
Name: cease_date, dtype: float64
In [35]:
#lets work on tafe_resignation
ax = tafe_resignation.boxplot(column=['cease_date'])
plt.title("tafe employee cease date")
ax.set_ylim(2009,2013)
plt.ylabel("year")
plt.show()

tafe_resignation suggests that people resigned mostly betwen 2011-2012

Summary on date values

I did not see any years listed after the current date, and I did not see any start dates before the year 1940. There were many outliers present, but nothing that seemed inherently incorrect.

Therefore, I can verify that there aren't any major issues with the years.

Creating a new column, institute_service in dete_resignation

In [36]:
# let us review our dete_resignation columns again.


dete_resignation.head()
Out[36]:
id separationtype cease_date dete_start_date role_start_date position classification region business_unit employment_status ... work_life_balance workload none_of_the_above gender age aboriginal torres_strait south_sea disability nesb
3 4 Resignation 2012.0 2005.0 2006.0 Teacher Primary Central Queensland NaN Permanent Full-time ... False False False Female 36-40 NaN NaN NaN NaN NaN
5 6 Resignation 2012.0 1994.0 1997.0 Guidance Officer NaN Central Office Education Queensland Permanent Full-time ... False False False Female 41-45 NaN NaN NaN NaN NaN
8 9 Resignation 2012.0 2009.0 2009.0 Teacher Secondary North Queensland NaN Permanent Full-time ... False False False Female 31-35 NaN NaN NaN NaN NaN
9 10 Resignation 2012.0 1997.0 2008.0 Teacher Aide NaN NaN NaN Permanent Part-time ... False False False Female 46-50 NaN NaN NaN NaN NaN
11 12 Resignation 2012.0 2009.0 2009.0 Teacher Secondary Far North Queensland NaN Permanent Full-time ... False False False Male 31-35 NaN NaN NaN NaN NaN

5 rows × 35 columns

Having our goal in mind which is checking out on whether resignation of employees was as a result of dissatisfaction. we will do it by comparing both employees with long and short service rate.(which will be done by substracting start date and cease date of the various employees) we will create a column in dete_resignation column, and term it institue_service, since we already have a corresponding column description in tafe_resignation.

from the dete_resignation column ahead, we can see that we can infer length of service from the difference between cease_date and dete_start_date.

Now let us create a new column that would refer to the length of service

In [37]:
#creating length of service column will be done by subtracting the dete_start_date from the cease_date

institute_service = dete_resignation['cease_date'] - dete_resignation['dete_start_date']

#let us assign this variable as a column in our dete_resignation data

dete_resignation['institute_service'] = institute_service

print(dete_resignation['institute_service'])
3       7.0
5      18.0
8       3.0
9      15.0
11      3.0
12     14.0
14      5.0
16      NaN
20     30.0
21     32.0
22     15.0
23     39.0
25     17.0
27      7.0
33      9.0
34      6.0
37      1.0
39      NaN
40     35.0
41     38.0
42      1.0
43     36.0
48      3.0
50      3.0
51     19.0
55      4.0
57      9.0
61      1.0
69      6.0
71      1.0
       ... 
747     6.0
751     8.0
752    15.0
753     9.0
755     1.0
762     0.0
766     7.0
769     5.0
770     NaN
771    12.0
774     NaN
784     0.0
786    20.0
788     NaN
789    31.0
790     6.0
791     NaN
794     NaN
797     NaN
798     NaN
802     NaN
803    10.0
804     6.0
806     8.0
807     9.0
808     3.0
815     2.0
816     2.0
819     5.0
821     NaN
Name: institute_service, Length: 311, dtype: float64
In [38]:
ax = dete_resignation.boxplot(column=['institute_service'])
plt.title("dete work years")
plt.ylabel("year")
plt.show()
In [39]:
dete_resignation['institute_service'].describe()
Out[39]:
count    273.000000
mean      10.457875
std        9.931709
min        0.000000
25%        3.000000
50%        7.000000
75%       16.000000
max       49.000000
Name: institute_service, dtype: float64

dete_resignation insitute_service boxplot suggests that majority of recorded years work lies between 3 - 16 years.

We have successfully created a new column in our dete_resignation column known as institute_service, which takes care of the difference between the start and end work date of the exited employees (which is known as years of service).

Observations made from the result:

from our observations, we can see that though we have succesfully crated a column termed institute_service, which calculates the length of service by finding the years range between start and end work date of employees(years of service), some rows still shows NaN.

tafe institute service

In [40]:
tafe_resignation['institute_service'].value_counts()
Out[40]:
Less than 1 year      73
1-2                   64
3-4                   63
5-6                   33
11-20                 26
7-10                  21
More than 20 years    10
Name: institute_service, dtype: int64

Identifying Dissatisfied Employees

In [41]:
dete_survey_updated.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 822 entries, 0 to 821
Data columns (total 35 columns):
id                                     822 non-null int64
separationtype                         822 non-null object
cease_date                             788 non-null object
dete_start_date                        749 non-null float64
role_start_date                        724 non-null float64
position                               817 non-null object
classification                         455 non-null object
region                                 717 non-null object
business_unit                          126 non-null object
employment_status                      817 non-null object
career_move_to_public_sector           822 non-null bool
career_move_to_private_sector          822 non-null bool
interpersonal_conflicts                822 non-null bool
job_dissatisfaction                    822 non-null bool
dissatisfaction_with_the_department    822 non-null bool
physical_work_environment              822 non-null bool
lack_of_recognition                    822 non-null bool
lack_of_job_security                   822 non-null bool
work_location                          822 non-null bool
employment_conditions                  822 non-null bool
maternity/family                       822 non-null bool
relocation                             822 non-null bool
study/travel                           822 non-null bool
ill_health                             822 non-null bool
traumatic_incident                     822 non-null bool
work_life_balance                      822 non-null bool
workload                               822 non-null bool
none_of_the_above                      822 non-null bool
gender                                 798 non-null object
age                                    811 non-null object
aboriginal                             16 non-null object
torres_strait                          3 non-null object
south_sea                              7 non-null object
disability                             23 non-null object
nesb                                   32 non-null object
dtypes: bool(18), float64(2), int64(1), object(14)
memory usage: 123.7+ KB
In [42]:
tafe_survey_updated.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 702 entries, 0 to 701
Data columns (total 23 columns):
id                                                     702 non-null float64
Institute                                              702 non-null object
WorkArea                                               702 non-null object
cease_date                                             695 non-null float64
separationtype                                         701 non-null object
Contributing Factors. Career Move - Public Sector      437 non-null object
Contributing Factors. Career Move - Private Sector     437 non-null object
Contributing Factors. Career Move - Self-employment    437 non-null object
Contributing Factors. Ill Health                       437 non-null object
Contributing Factors. Maternity/Family                 437 non-null object
Contributing Factors. Dissatisfaction                  437 non-null object
Contributing Factors. Job Dissatisfaction              437 non-null object
Contributing Factors. Interpersonal Conflict           437 non-null object
Contributing Factors. Study                            437 non-null object
Contributing Factors. Travel                           437 non-null object
Contributing Factors. Other                            437 non-null object
Contributing Factors. NONE                             437 non-null object
gender                                                 596 non-null object
age                                                    596 non-null object
employment_status                                      596 non-null object
position                                               596 non-null object
institute_service                                      596 non-null object
role_service                                           596 non-null object
dtypes: float64(2), object(21)
memory usage: 126.2+ KB

We are going to select the columns necessary to categorie employees as 'dissatisfied'.

Going through the tafe_survey_updated, i think, 3 columns would be necessary for this.

1) Contributing Factors: Dissatisfaction(are they tired of the whole system?)

2) Contributing Facctors: Job Dissatisfaction(Are they tired of the job?) and if so, why?? the next step to this question is, is it because of education advancement(if they are young), this will make me to go for the next column selection which will be

for dete_survey_updated i will select the following:

1) job_dissatisfaction

2) dissatisfaction_with_the_department

3) physical_work_environment

4) lack_of_recognition

5) lack_of_job_security

6) work_location

7) employment_conditions

8) work_life_balance

9) workload

Hence, our next step is to see if any of the employees matches this discription (respectively for each survey) and we will classlify them as dissatisfied in a new column.

in Summary: True: indicates a person resigned because they were dissatisfied in some way

False: indicates a person resigned because of a reason other than dissatisfaction with the job

NaN: indicates the value is missing

In [43]:
#let us verify the unique content of each selected contributing factors in tafe_resignation
tafe_resignation['Contributing Factors. Dissatisfaction'].value_counts()

#shows that contributing factor.dissatifaction numbers 55 recorded cases, while others are represented as (-)
Out[43]:
-                                         277
Contributing Factors. Dissatisfaction      55
Name: Contributing Factors. Dissatisfaction, dtype: int64
In [44]:
##let us verify the unique content of each selected contributing factors in tafe_resignation
tafe_resignation['Contributing Factors. Job Dissatisfaction'].value_counts()
#shows that contributing factor. job dissatifaction numbers 62 recorded cases, while others are represented as (-)
Out[44]:
-                      270
Job Dissatisfaction     62
Name: Contributing Factors. Job Dissatisfaction, dtype: int64
In [45]:
# let us define a function that would carry out this operation for us.

def considered_value(val):
    if val == '-':
        return False
    elif pd.isnull(val):
        return np.nan
    else:
        return True
#let us now create a column termed dissatisfied that fits in with this defined function

tafe_resignation['dissatisfied'] = tafe_resignation[['Contributing Factors. Dissatisfaction', 
                                                     'Contributing Factors. Job Dissatisfaction', 
                                                 ]].applymap(considered_value).any(axis=1, skipna=False)


tafe_resignation_up = tafe_resignation.copy()
tafe_resignation_up['dissatisfied'].value_counts(dropna=False)
Out[45]:
False    241
True      91
NaN        8
Name: dissatisfied, dtype: int64

from this, it can be seen that i have 91 cases of dissatisfaction for tafe_resignation

In [46]:
# lets do the same for dete_resignation
dete_resignation['dissatisfied'] = dete_resignation[['job_dissatisfaction',
       'dissatisfaction_with_the_department', 'physical_work_environment',
       'lack_of_recognition', 'lack_of_job_security', 'work_location',
       'employment_conditions', 'work_life_balance',
       'workload']].any(axis=1, skipna=False)

dete_resignation_up = dete_resignation.copy()

dete_resignation_up['dissatisfied'].value_counts(dropna=False)
Out[46]:
False    162
True     149
Name: dissatisfied, dtype: int64

from this, it can be seen that i have 149 cases of dissatisfaction for dete_resignation.

So far, we have created a new column(with required or needed variables) seperately for our analysis for both data survey respectively which shows the reason (which surrounds dissatisfaction) of employees exit

Combining the Data

Summary of what i have done so far:

  • Renamed the columns
  • Dropped any data not needed for our analysis
  • Verified the quality of our data
  • Created a new institute_service column
  • Cleaned the Contributing Factors columns
  • Created a new column indicating if an employee resigned because they were dissatisfied in some way

Let us try to aggregate the datasets. Our end goal is aggregating data according to the institute_service column(which we earlier on created)

In [47]:
#we will add a new column to be able to differentiate between the intitute_service column of both data surveys or dataframes easily.

dete_resignation_up['institute'] = "DETE"
tafe_resignation_up['institute'] = "TAFE"

#combining the dataframe
combined = pd.concat([dete_resignation_up, tafe_resignation_up], ignore_index=True) 

#let us verify our data concatenation 
combined.notnull().sum().sort_values()
Out[47]:
torres_strait                                            0
south_sea                                                3
aboriginal                                               7
disability                                               8
nesb                                                     9
business_unit                                           32
classification                                         161
region                                                 265
role_start_date                                        271
dete_start_date                                        283
role_service                                           290
career_move_to_public_sector                           311
employment_conditions                                  311
work_location                                          311
lack_of_job_security                                   311
job_dissatisfaction                                    311
dissatisfaction_with_the_department                    311
workload                                               311
lack_of_recognition                                    311
interpersonal_conflicts                                311
maternity/family                                       311
none_of_the_above                                      311
physical_work_environment                              311
relocation                                             311
study/travel                                           311
traumatic_incident                                     311
work_life_balance                                      311
career_move_to_private_sector                          311
ill_health                                             311
Contributing Factors. Career Move - Private Sector     332
Contributing Factors. Other                            332
Contributing Factors. Career Move - Public Sector      332
Contributing Factors. Career Move - Self-employment    332
Contributing Factors. Travel                           332
Contributing Factors. Study                            332
Contributing Factors. Dissatisfaction                  332
Contributing Factors. Ill Health                       332
Contributing Factors. NONE                             332
Contributing Factors. Maternity/Family                 332
Contributing Factors. Job Dissatisfaction              332
Contributing Factors. Interpersonal Conflict           332
WorkArea                                               340
Institute                                              340
institute_service                                      563
gender                                                 592
age                                                    596
employment_status                                      597
position                                               598
cease_date                                             635
dissatisfied                                           643
id                                                     651
separationtype                                         651
institute                                              651
dtype: int64
In [48]:
#since we only need the relevant data for our analyses, let us drop those which are not relevant
#from observation, we are going to drop data with less than 500 non null values because they are not important for our amalyses.

combined_update = combined.dropna(thresh=500, axis=1).copy()
combined_update.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 651 entries, 0 to 650
Data columns (total 10 columns):
age                  596 non-null object
cease_date           635 non-null float64
dissatisfied         643 non-null object
employment_status    597 non-null object
gender               592 non-null object
id                   651 non-null float64
institute            651 non-null object
institute_service    563 non-null object
position             598 non-null object
separationtype       651 non-null object
dtypes: float64(2), object(8)
memory usage: 50.9+ KB
In [49]:
combined_update['institute'].value_counts()
Out[49]:
TAFE    340
DETE    311
Name: institute, dtype: int64

in combining the data, we first created a column, which would make it easier to identify and spot each dataset(between DETE and TAFE)

After that, we were able to remove the data not necessary for our analyses by using dropna method

Cleaning the Service column

In [50]:
combined['institute_service'].value_counts()
Out[50]:
Less than 1 year      73
1-2                   64
3-4                   63
5-6                   33
11-20                 26
5.0                   23
1.0                   22
7-10                  21
3.0                   20
0.0                   20
6.0                   17
4.0                   16
9.0                   14
2.0                   14
7.0                   13
More than 20 years    10
13.0                   8
8.0                    8
20.0                   7
15.0                   7
14.0                   6
17.0                   6
12.0                   6
10.0                   6
22.0                   6
18.0                   5
16.0                   5
24.0                   4
23.0                   4
11.0                   4
19.0                   3
39.0                   3
21.0                   3
32.0                   3
36.0                   2
25.0                   2
26.0                   2
28.0                   2
30.0                   2
42.0                   1
35.0                   1
49.0                   1
34.0                   1
31.0                   1
33.0                   1
29.0                   1
27.0                   1
41.0                   1
38.0                   1
Name: institute_service, dtype: int64

working of the institute service column in important since there are variations in the years range.

hence, we will convert this numbers into categories. Hence, our analysis of this article(https://www.businesswire.com/news/home/20171108006002/en/Age-Number-Engage-Employees-Career-Stage) which makes cases of understanding employeees needs based on career stage than age.

our classification will be:

New: Less than 3 years at a company

Experienced: 3-6 years at a company

Established: 7-10 years at a company

Veteran: 11 or more years at a company

now let us categorize the institute_service column

In [51]:
#let us extract the years of service from the institute_service and change the type to str which is based on regex method

combined_update['institute_service_up'] = combined_update['institute_service'].astype('str').str.extract(r'(\d+)', expand=False)
combined_update['institute_service_up'] = combined_update['institute_service_up'].astype('float')

#let us verify our changes and compare with the combined_update['institute_service']

combined_update['institute_service_up'].value_counts()
Out[51]:
1.0     159
3.0      83
5.0      56
7.0      34
11.0     30
0.0      20
20.0     17
6.0      17
4.0      16
9.0      14
2.0      14
13.0      8
8.0       8
15.0      7
17.0      6
10.0      6
12.0      6
14.0      6
22.0      6
16.0      5
18.0      5
24.0      4
23.0      4
39.0      3
19.0      3
21.0      3
32.0      3
28.0      2
36.0      2
25.0      2
30.0      2
26.0      2
29.0      1
38.0      1
42.0      1
27.0      1
41.0      1
35.0      1
49.0      1
34.0      1
33.0      1
31.0      1
Name: institute_service_up, dtype: int64
In [52]:
# Change values to float
combined_update['institute_service_up'] = combined_update['institute_service_up'].astype(float)
combined_update['institute_service_up'].describe()
Out[52]:
count    563.000000
mean       7.067496
std        8.251974
min        0.000000
25%        1.000000
50%        4.000000
75%       10.000000
max       49.000000
Name: institute_service_up, dtype: float64
In [53]:
combined_update['institute_service_up'].value_counts(dropna=False)
Out[53]:
 1.0     159
NaN       88
 3.0      83
 5.0      56
 7.0      34
 11.0     30
 0.0      20
 20.0     17
 6.0      17
 4.0      16
 9.0      14
 2.0      14
 13.0      8
 8.0       8
 15.0      7
 22.0      6
 10.0      6
 17.0      6
 14.0      6
 12.0      6
 16.0      5
 18.0      5
 24.0      4
 23.0      4
 21.0      3
 39.0      3
 32.0      3
 19.0      3
 36.0      2
 30.0      2
 25.0      2
 26.0      2
 28.0      2
 42.0      1
 29.0      1
 35.0      1
 27.0      1
 41.0      1
 49.0      1
 38.0      1
 34.0      1
 33.0      1
 31.0      1
Name: institute_service_up, dtype: int64

we have succesfully converted the institute_service_up to float values with characters.

Now let us analyse our data with the above grouped classification for age, by creating a function that will do the mapping. the grouping will be like:

New: Less than 3 years at a company

Experienced: 3-6 years at a company

Established: 7-10 years at a company

Veteran: 11 or more years at a company

In [54]:
#let us create a function that will map the age classification.

def institute_map(element):
    if element < 3:
        return "New"
    elif element >=3 and element <= 6:
        return "Experienced"
    elif element >=7 and element <= 10:
        return "Established"
    elif pd.isnull(element):
        return np.nan
    else:
        return "Veteran"

combined_update['service_cat'] = combined_update['institute_service_up'].apply(institute_map)

combined_update['service_cat'].value_counts(dropna=False)
    
Out[54]:
New            193
Experienced    172
Veteran        136
NaN             88
Established     62
Name: service_cat, dtype: int64

I have created a column category service_cat that categorizes employees according to the amount of years spent in their workplace.

Performing Analysis

In [55]:
#verifying our dissatisfied column
#we have 392 false value, 251 true and 8 missing values
combined_update['dissatisfied'].value_counts(dropna=False)
Out[55]:
False    403
True     240
NaN        8
Name: dissatisfied, dtype: int64
In [56]:
#since i have missing values as nan, i will update my false values to include nan
#since the false values appears most

combined_update['dissatisfied'] = combined_update['dissatisfied'].fillna(False)
In [57]:
#calculating the percentage of dissastisfied employees in each service_cat group

dissatisfied_percent = combined_update.pivot_table(index='service_cat', values='dissatisfied')

%matplotlib inline
dissatisfied_percent.plot(kind='bar', rot=30)
Out[57]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f2d9a195390>

consdering the job dissatisfaction from both dataset, Established(more than 7-10years) seems like they are more likely to exit due to dissatisaction. we will look on the next dataset to finalize

In [58]:
gender_dist = combined_update['gender'].value_counts(dropna=False)
print(gender_dist)
Female    424
Male      168
NaN        59
Name: gender, dtype: int64
In [59]:
combined_update.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 651 entries, 0 to 650
Data columns (total 12 columns):
age                     596 non-null object
cease_date              635 non-null float64
dissatisfied            651 non-null bool
employment_status       597 non-null object
gender                  592 non-null object
id                      651 non-null float64
institute               651 non-null object
institute_service       563 non-null object
position                598 non-null object
separationtype          651 non-null object
institute_service_up    563 non-null float64
service_cat             563 non-null object
dtypes: bool(1), float64(3), object(8)
memory usage: 56.7+ KB
In [60]:
combined_update['age'].value_counts(dropna=False)
Out[60]:
51-55            71
NaN              55
41-45            48
41  45           45
46-50            42
36-40            41
46  50           39
26-30            35
21  25           33
26  30           32
36  40           32
31  35           32
21-25            29
56 or older      29
31-35            29
56-60            26
61 or older      23
20 or younger    10
Name: age, dtype: int64

In order to answer the question about dissatisfaction and age, I need to evaluate and clean the "age" column, since my age column seems not to be good and neat for analysis.

from the age column, there are irregularities in the different rows, some contain "-" while others dont. i will make it in uniform by converting those without "-" to contain "-", and i will replace 56 or older to 56-60.

In [61]:
# first, let me convert the age column to a string type

combined_update['age'] = combined_update['age'].astype(str)

combined_update['age_cleaned'] = combined['age'].str.replace("  ", "-").str.replace("56 or older", "56-60")

combined_update['age_cleaned'].value_counts(dropna=False)
Out[61]:
41-45            93
46-50            81
36-40            73
51-55            71
26-30            67
21-25            62
31-35            61
56-60            55
NaN              55
61 or older      23
20 or younger    10
Name: age_cleaned, dtype: int64

This section, I cleaned the age column in order to have consistent values across the dataset. i will use this information in analysing the dissastifaction rate of each employee catgory

Analysis on dissastifaction of the employees

We are going to look on the resignation of the employees under the following headings

1) Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?

2) Are younger employees resigning due to some kind of dissatisfaction? What about older employees?

In [62]:
combined_update['dissatisfied'].value_counts()
Out[62]:
False    411
True     240
Name: dissatisfied, dtype: int64

Career stage dissastifaction plot

We will look on young resignation against old resignation category. Are new employees resigning more than the veterans or old employees? what should be the cause? for us to that, we will have to convert our data to pivot_table

In [63]:
#let us calculate the percentage of dissastified employees with there category

category_per_dis = combined_update.pivot_table(index = 'service_cat', values='dissatisfied')
print(category_per_dis)
             dissatisfied
service_cat              
Established      0.516129
Experienced      0.343023
New              0.295337
Veteran          0.485294
In [64]:
#lets plot this using bar chart
category_per_dis.plot(kind='bar', legend=False, rot=30, title='Resignation: Career Stage and Dissatisfaction')
plt.xlabel("Employee stage")
plt.ylabel("percent dissatisfied")
plt.show()

Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?

New employees of about 29.5% resigned due to dissastifaction, while Established of about 7-10 years of experience has the highest resigning rate with about 51.6% while veterans with about 48% rate resigned and Experienced at about 34% resigned

Summary, New employees have a lesser resigning rate than those with higher work years experience.

Age dissastifaction rate

lets work with the age range. recall that we have a NaN value, lets drop it from this plot.

In [65]:
combined_update['age_cleaned'].value_counts()
Out[65]:
41-45            93
46-50            81
36-40            73
51-55            71
26-30            67
21-25            62
31-35            61
56-60            55
61 or older      23
20 or younger    10
Name: age_cleaned, dtype: int64
In [66]:
#lets convert the age catgory to pivot table to ease the plot
age_dis_per = combined_update.pivot_table(index='age_cleaned', values='dissatisfied')
age_dis_per
Out[66]:
dissatisfied
age_cleaned
20 or younger 0.200000
21-25 0.306452
26-30 0.417910
31-35 0.377049
36-40 0.342466
41-45 0.376344
46-50 0.382716
51-55 0.422535
56-60 0.381818
61 or older 0.521739
In [67]:
# lets visualize the plot
age_dis_per.plot(kind='bar', rot=90, title='Resignation: Age and Dissatisfaction', legend=False)
plt.xlabel('Age')
plt.ylabel('dissatisfied percent')
plt.show()

Employees 20 years younger are the least to resign at the rate of 20%, while those with more than 61 years of age resign the most at the rate of 52%

Summary

The analysis so far has showed that the groups most at risk of resigning due to dissatisfaction are the older employees (61 or older) and those who have been at the company 7 to 10 years (Established). The groups least at risk are the employees who are 21 and younger and those who have been at the company less than three years (New).

Age Impact and Career dissatisfaction rate

I want to investigate into that a bit further, and look at which combination of career stage and age is most likely to claim leaving due to dissatisfaction. I will also look into the effects of institute and position.

In [68]:
# let me create a pivot table for age, career stage and dissastifaction
age_car_dis_pt = combined_update.pivot_table(index="service_cat",columns="age_cleaned",values="dissatisfied")
age_car_dis_pt.head()
Out[68]:
age_cleaned 20 or younger 21-25 26-30 31-35 36-40 41-45 46-50 51-55 56-60 61 or older
service_cat
Established NaN 0.000000 0.545455 0.75 0.400000 0.666667 0.363636 0.500000 0.333333 0.500000
Experienced 0.333333 0.285714 0.444444 0.30 0.380952 0.413793 0.285714 0.250000 0.250000 0.500000
New 0.142857 0.297297 0.320000 0.25 0.347826 0.233333 0.368421 0.318182 0.400000 NaN
Veteran NaN NaN NaN 0.40 0.285714 0.500000 0.545455 0.600000 0.387097 0.642857
In [69]:
# now lets plot this into a figure plot.
fig = plt.figure(figsize=(16,16))
fig.suptitle('Dissatisfied Employee by Career stage and Age', fontsize=16)

#let us plot for New 
ax = plt.subplot('221')
#to select only the new role,
age_car_dis_pt.iloc[2,:].plot(kind='bar', ax= ax)
plt.ylabel("Percent Dissatisfied")
ax.set_title("New: Less than 3 years")
ax.set_ylim([0, .8])

#lets plot for established
ax = plt.subplot('222')
#to select only the new role,
age_car_dis_pt.iloc[0,:].plot(kind='bar', ax= ax)
ax.set_title("Established: 3 - 6 years")
ax.set_ylim([0, .8])

#lets plot for experienced
ax = plt.subplot('223')
#to select only the new role,
age_car_dis_pt.iloc[1,:].plot(kind='bar', ax= ax)
plt.ylabel("Percent Dissatisfied")
ax.set_title("Experienced: 7 - 10 years")
ax.set_ylim([0, .8])

#lets plot for veteran

ax = plt.subplot('224')
#to select only the new role,
age_car_dis_pt.iloc[3,:].plot(kind='bar', ax= ax)
ax.set_title("New: more than 11 years")
ax.set_ylim([0, .8])

plt.show()
In [70]:
#using for loop to achieve the same result
fig = plt.figure(figsize=(16,16))
fig.suptitle('Dissatisfied Employee by Career stage and Age', fontsize=16)

for_plot = (("221", 2, "New: less than 3 years"),
            ("222", 0, "Established: 3-6 years"),
            ("223", 1, "Experienced: 7-10 years"),
            ("224", 3, "veteran: more than 11 years")
           )
for sn, id, title in for_plot:
    ax = plt.subplot(sn)
    age_car_dis_pt.iloc[id,:].plot(kind="bar", ax= ax)
    ax.set_title(title)
    ax.set_ylim([0, .8])
    plt.xlabel("Employee Age")
    plt.ylabel("Percent Dissatisfied")

plt.show()

The new plot, Experienced and Veteran showed that the resignation rate rises on the inxrease for those workers with more than 61 of years.

Thus this was different for Established(between 3-6 years) which sees themselves in the high rsignation

DETE Resignation and TAFE rsignation

Now the question still rooms, amongst the different institute, which recorded the highest resigning rate?

I will work in comparing the resignation rate between the two institute.

let us first view the institute column

In [71]:
#lets view the total data in each institute survey
combined_update['institute'].value_counts()
Out[71]:
TAFE    340
DETE    311
Name: institute, dtype: int64
In [72]:
#We can see that Tafe seems to record more survey, lets see how it will affect the resignation rate

institute_plot = combined_update.pivot_table(index='institute', values='dissatisfied')

#let us put our data in a plot visual

institute_plot.plot(kind='bar', rot=0, title='Resignation due to Dissatisfaction per Institute', legend=False)
plt.xlabel='Institute'
plt.ylabel='Percent Dissatisfied'
plt.show()
This plot shows that Dete has more resignation rate than Tafe. DEffects of Career Stage on Dissatisfaction per Inete has about 48% reignation rate while Tafe recorded about 28%. Let us move forward in seeing how this will affect the different service_cat category.

Effects of Career Stage on Dissatisfaction per Institute

In [73]:
#let us first convert the require column to pivot table
inst_car_plot = combined_update.pivot_table(index='service_cat', columns='institute', values='dissatisfied')
inst_car_plot.head()
Out[73]:
institute DETE TAFE
service_cat
Established 0.609756 0.333333
Experienced 0.460526 0.250000
New 0.375000 0.262774
Veteran 0.560000 0.277778
In [74]:
#let us now plot our career stage on dissatisfaction per institute

inst_car_plot.plot(kind='bar', rot=0, title='Effects of Career Stage on Dissatisfaction per Institute')
plt.xlabel='career stage'
plt.ylabel='percent dissatisfied'
plt.legend(loc='upper right', fontsize='small')
plt.show()

Dete Established saw the highest resigning rate from the dete data survey by about 60% based on career stage category, which is followed by the veteran(more than 61 years older) and in Tafe, it seems that resignation is relatively constant across the levels but, established saw the highest resigning rate.

It could be looked further for the reason of resignation of established.

In [75]:
combined_update['gender'].value_counts()
Out[75]:
Female    424
Male      168
Name: gender, dtype: int64
In [76]:
gen_plot = combined_update.pivot_table(index='service_cat', columns='gender', values='dissatisfied')
gen_plot
Out[76]:
gender Female Male
service_cat
Established 0.545455 0.444444
Experienced 0.372881 0.283019
New 0.262411 0.384615
Veteran 0.478723 0.500000
In [77]:
gen_plot.plot(kind='bar', rot=0, title='gender resignation rate by service category')
plt.ylabel='percent dissatisfied'
plt.xlabel='gender resignation'
plt.legend(loc='best', fontsize='small')
plt.show()

More females contributed to the resignation rate in the employee exit.

Institute Summary

The differences between the institutes look very significant, but it is important to remember that there were 9 columns to describe dissatisfaction in the DETE survey, and two in TAFE.

Resignation effect on position

lets us look ath row position played in this resignation of employees to see which position in each survey got tired quickly

let us look at our position setting for both survey.

In [78]:
combined_update['position'].value_counts()
Out[78]:
Administration (AO)                                        148
Teacher                                                    129
Teacher (including LVT)                                     95
Teacher Aide                                                63
Cleaner                                                     39
Public Servant                                              30
Professional Officer (PO)                                   16
Operational (OO)                                            13
Head of Curriculum/Head of Special Education                10
School Administrative Staff                                  8
Technical Officer                                            8
Schools Officer                                              7
Workplace Training Officer                                   6
School Based Professional Staff (Therapist, nurse, etc)      5
Technical Officer (TO)                                       5
Executive (SES/SO)                                           4
Guidance Officer                                             3
Tutor                                                        3
Other                                                        3
Professional Officer                                         2
Business Service Manager                                     1
Name: position, dtype: int64

To get a bigger picture of what each survey is saying, and how to further clean or relace position names which are not matching or will ease our analysis, we will have to seperate and look at each survey independently

In [79]:
print('Dete position')

combined_update.loc[combined_update['institute']=="DETE", "position"].value_counts()
Dete position
Out[79]:
Teacher                                                    129
Teacher Aide                                                63
Cleaner                                                     39
Public Servant                                              30
Head of Curriculum/Head of Special Education                10
School Administrative Staff                                  8
Technical Officer                                            8
Schools Officer                                              7
School Based Professional Staff (Therapist, nurse, etc)      5
Guidance Officer                                             3
Other                                                        3
Professional Officer                                         2
Business Service Manager                                     1
Name: position, dtype: int64
In [80]:
combined_update.loc[combined_update['institute']=="TAFE", "position"].value_counts()
Out[80]:
Administration (AO)           148
Teacher (including LVT)        95
Professional Officer (PO)      16
Operational (OO)               13
Workplace Training Officer      6
Technical Officer (TO)          5
Executive (SES/SO)              4
Tutor                           3
Name: position, dtype: int64

Now we can see clearly that our Institute position in tafe survey needs some cleaning. we have to rename some measures.

In [81]:
tafe_pos_rename = {'Administration (AO)' : 'Administration', 
                   'Teacher (including LVT)' : 'Teacher',
                   'Professional Officer (PO)' : 'Professional Officer',
                   'Operational (OO)' : 'Operational',
                   'Technical Officer (TO)' : 'Technical Officer',
                   'Executive (SES/SO)' : 'Executive'}
combined_update = combined_update.replace({"position" : tafe_pos_rename})
combined_update.loc[combined_update['institute']=="TAFE", "position"].value_counts()
Out[81]:
Administration                148
Teacher                        95
Professional Officer           16
Operational                    13
Workplace Training Officer      6
Technical Officer               5
Executive                       4
Tutor                           3
Name: position, dtype: int64

The position for Tafe dateset column has been corrected. Now lets see what th data shows for position level and the rate of resignation.

we will convert it into pivot_table first before we start visualizing

In [82]:
from importlib import reload
reload(plt)
pos_val = combined_update.pivot_table(index="position",values="dissatisfied") # mean is default arg
pos_val.plot(kind="bar", rot = 90, legend=False,
                title="Resignation due to Dissatisfaction per Position and Institute")
plt.xlabel("Position")
plt.ylabel("Percent Dissatisfied")
plt.show()

From the graph, we could see that the Guidance value recorded the highest value to 1 while others ranges between 0 and 1. This could be as a large response of the data.

Nows visualize individually.

In [83]:
pos_num_counts = combined_update["position"].value_counts()
pos_num_counts.plot(kind="bar", color = "green", title="Employees per Position")
plt.xlabel("Position")
plt.ylabel("Number of Employees")
plt.show()

We could now finalize that from our combined dataset, Teachers showed the highest form of dissatisfaction rate with their resignations

Now let us visualize it indivudually and see what indiviual dataset has to say about the resignation

In [84]:
pos_inst_cat = combined_update.pivot_table(index="position",columns="institute",values="dissatisfied") # mean is default arg
pos_inst_cat.plot(kind="bar", rot = 90,
                title="Resignation due to Dissatisfaction per Position and Institute")
plt.xlabel("Position")
plt.ylabel("Percent Dissatisfied")
plt.legend(loc='best',fontsize="small") 
plt.show()

We have analysed the age, position and institute category to finalize the resignation of employees and its dissatisaction rate