Clean And Analyze Employee Exit Surveys¶

By Combining Multiple Datasets

Author - Raghav_A

For this project, I was approached by 2 (imaginary) institutes - Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia. Both the stakeholders were eager to know whether -

Employees who only worked for the institutes for a short period of time were resigning due to some kind of dissatisfaction. Also, what about experienced, established and veteran employees, who have been there longer?
Younger employees were resigning due to some kind of dissatisfaction. Also, what about older employees?

Both the institutes made their own exit-surve data available to me. Also, they want the datasets to be combined before I answer these questions.

The datasets are available in public domain also, I have provided the link below-
TAFE - https://data.gov.au/dataset/ds-qld-89970a3b-182b-41ea-aea2-6f9f17b5907e/details?q=exit%20survey
DETE - https://data.gov.au/dataset/ds-qld-fe96ff30-d157-4a81-851d-215f2a0fe26d/details?q=exit%20survey

Step 0 - Importing the Libraries & Reading the Datasets¶

Before familiarising with the datasets, I will import the relevant python libraries.
Since I intend to work with dataframes and arrays specifically (since the data is tabular and is stored in .CSV files), I shall import the Numpy and Pandas libraries for now.
Having read my files into DataFrame objects, I shall use the .head(), .info() methods to look inside the datasets, to familiarise with it.
Here we go -

In [1]:

import pandas as pd
import numpy as np

In [2]:

dete_survey = pd.read_csv('dete_survey.csv')
tafe_survey = pd.read_csv('tafe_survey.csv')

Step 1 - Display Both Datasets using `head()` (to seek out BAD DATA)¶

1.1 Displaying Top 5 Rows of the DETE dataset¶

In [3]:

dete_survey.head()

Out[3]:

	ID	SeparationType	Cease Date	DETE Start Date	Role Start Date	Position	Classification	Region	Business Unit	Employment Status	...	Kept informed	Wellness programs	Health & Safety	Gender	Age	Aboriginal	Torres Strait	South Sea	Disability	NESB
0	1	Ill Health Retirement	08/2012	1984	2004	Public Servant	A01-A04	Central Office	Corporate Strategy and Peformance	Permanent Full-time	...	N	N	N	Male	56-60	NaN	NaN	NaN	NaN	Yes
1	2	Voluntary Early Retirement (VER)	08/2012	Not Stated	Not Stated	Public Servant	AO5-AO7	Central Office	Corporate Strategy and Peformance	Permanent Full-time	...	N	N	N	Male	56-60	NaN	NaN	NaN	NaN	NaN
2	3	Voluntary Early Retirement (VER)	05/2012	2011	2011	Schools Officer	NaN	Central Office	Education Queensland	Permanent Full-time	...	N	N	N	Male	61 or older	NaN	NaN	NaN	NaN	NaN
3	4	Resignation-Other reasons	05/2012	2005	2006	Teacher	Primary	Central Queensland	NaN	Permanent Full-time	...	A	N	A	Female	36-40	NaN	NaN	NaN	NaN	NaN
4	5	Age Retirement	05/2012	1970	1989	Head of Curriculum/Head of Special Education	NaN	South East	NaN	Permanent Full-time	...	N	A	M	Female	61 or older	NaN	NaN	NaN	NaN	NaN

5 rows × 56 columns

1.2 Displaying Top 5 Rows of the TAFE dataset¶

In [4]:

tafe_survey.head()

Out[4]:

	Record ID	Institute	WorkArea	CESSATION YEAR	Reason for ceasing employment	Contributing Factors. Career Move - Public Sector	Contributing Factors. Career Move - Private Sector	Contributing Factors. Career Move - Self-employment	Contributing Factors. Ill Health	Contributing Factors. Maternity/Family	...	Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?	Workplace. Topic:Does your workplace promote and practice the principles of employment equity?	Workplace. Topic:Does your workplace value the diversity of its employees?	Workplace. Topic:Would you recommend the Institute as an employer to others?	Gender. What is your Gender?	CurrentAge. Current Age	Employment Type. Employment Type	Classification. Classification	LengthofServiceOverall. Overall Length of Service at Institute (in years)	LengthofServiceCurrent. Length of Service at current workplace (in years)
0	6.341330e+17	Southern Queensland Institute of TAFE	Non-Delivery (corporate)	2010.0	Contract Expired	NaN	NaN	NaN	NaN	NaN	...	Yes	Yes	Yes	Yes	Female	26 30	Temporary Full-time	Administration (AO)	1-2	1-2
1	6.341337e+17	Mount Isa Institute of TAFE	Non-Delivery (corporate)	2010.0	Retirement	-	-	-	-	-	...	Yes	Yes	Yes	Yes	NaN	NaN	NaN	NaN	NaN	NaN
2	6.341388e+17	Mount Isa Institute of TAFE	Delivery (teaching)	2010.0	Retirement	-	-	-	-	-	...	Yes	Yes	Yes	Yes	NaN	NaN	NaN	NaN	NaN	NaN
3	6.341399e+17	Mount Isa Institute of TAFE	Non-Delivery (corporate)	2010.0	Resignation	-	-	-	-	-	...	Yes	Yes	Yes	Yes	NaN	NaN	NaN	NaN	NaN	NaN
4	6.341466e+17	Southern Queensland Institute of TAFE	Delivery (teaching)	2010.0	Resignation	-	Career Move - Private Sector	-	-	-	...	Yes	Yes	Yes	Yes	Male	41 45	Permanent Full-time	Teacher (including LVT)	3-4	3-4

5 rows × 72 columns

1.3 Displaying Columns Names of the DETE dataset¶

In [5]:

index = 0
for colname in dete_survey.columns:
    print(index, colname)
    index +=1

0 ID
1 SeparationType
2 Cease Date
3 DETE Start Date
4 Role Start Date
5 Position
6 Classification
7 Region
8 Business Unit
9 Employment Status
10 Career move to public sector
11 Career move to private sector
12 Interpersonal conflicts
13 Job dissatisfaction
14 Dissatisfaction with the department
15 Physical work environment
16 Lack of recognition
17 Lack of job security
18 Work location
19 Employment conditions
20 Maternity/family
21 Relocation
22 Study/Travel
23 Ill Health
24 Traumatic incident
25 Work life balance
26 Workload
27 None of the above
28 Professional Development
29 Opportunities for promotion
30 Staff morale
31 Workplace issue
32 Physical environment
33 Worklife balance
34 Stress and pressure support
35 Performance of supervisor
36 Peer support
37 Initiative
38 Skills
39 Coach
40 Career Aspirations
41 Feedback
42 Further PD
43 Communication
44 My say
45 Information
46 Kept informed
47 Wellness programs
48 Health & Safety
49 Gender
50 Age
51 Aboriginal
52 Torres Strait
53 South Sea
54 Disability
55 NESB

1.4 Displaying Columns Names of the TAFE dataset¶

In [6]:

index = 0
for colname in tafe_survey.columns:
    print(index, colname)
    index +=1

0 Record ID
1 Institute
2 WorkArea
3 CESSATION YEAR
4 Reason for ceasing employment
5 Contributing Factors. Career Move - Public Sector
6 Contributing Factors. Career Move - Private Sector
7 Contributing Factors. Career Move - Self-employment
8 Contributing Factors. Ill Health
9 Contributing Factors. Maternity/Family
10 Contributing Factors. Dissatisfaction
11 Contributing Factors. Job Dissatisfaction
12 Contributing Factors. Interpersonal Conflict
13 Contributing Factors. Study
14 Contributing Factors. Travel
15 Contributing Factors. Other
16 Contributing Factors. NONE
17 Main Factor. Which of these was the main factor for leaving?
18 InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction
19 InstituteViews. Topic:2. I was given access to skills training to help me do my job better
20 InstituteViews. Topic:3. I was given adequate opportunities for personal development
21 InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL%
22 InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had
23 InstituteViews. Topic:6. The organisation recognised when staff did good work
24 InstituteViews. Topic:7. Management was generally supportive of me
25 InstituteViews. Topic:8. Management was generally supportive of my team
26 InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me
27 InstituteViews. Topic:10. Staff morale was positive within the Institute
28 InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly
29 InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently
30 InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly
31 WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit
32 WorkUnitViews. Topic:15. I worked well with my colleagues
33 WorkUnitViews. Topic:16. My job was challenging and interesting
34 WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work
35 WorkUnitViews. Topic:18. I had sufficient contact with other people in my job
36 WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job
37 WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job
38 WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT]
39 WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job
40 WorkUnitViews. Topic:23. My job provided sufficient variety
41 WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job
42 WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction
43 WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance
44 WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area
45 WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date
46 WorkUnitViews. Topic:29. There was adequate communication between staff in my unit
47 WorkUnitViews. Topic:30. Staff morale was positive within my work unit
48 Induction. Did you undertake Workplace Induction?
49 InductionInfo. Topic:Did you undertake a Corporate Induction?
50 InductionInfo. Topic:Did you undertake a Institute Induction?
51 InductionInfo. Topic: Did you undertake Team Induction?
52 InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted?
53 InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted?
54 InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction?
55 InductionInfo. Face to Face Topic:Did you undertake a Institute Induction?
56 InductionInfo. On-line Topic:Did you undertake a Institute Induction?
57 InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction?
58 InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category?
59 InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.]
60 InductionInfo. Induction Manual Topic: Did you undertake Team Induction?
61 Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)?
62 Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?
63 Workplace. Topic:Does your workplace promote and practice the principles of employment equity?
64 Workplace. Topic:Does your workplace value the diversity of its employees?
65 Workplace. Topic:Would you recommend the Institute as an employer to others?
66 Gender. What is your Gender?
67 CurrentAge. Current Age
68 Employment Type. Employment Type
69 Classification. Classification
70 LengthofServiceOverall. Overall Length of Service at Institute (in years)
71 LengthofServiceCurrent. Length of Service at current workplace (in years)

How "BAD" is the Raw-Data?¶

Some things in life just aren't perfect.
Bad Data almost always creeps inside datasets, and the bigger the dataset, the higher the volume of bad data. Cleaning and reshaping the data, thus, becomes a crucial aspect of data analysis.
The following observations can be pointed out immediately after glancing at the datasets above -

The dete_survey dataset contains 'Not Stated' values that indicate values are missing, but they aren't represented as NaN.
Both the dete_survey and tafe_survey dataframes contain many columns that we don't need to complete our analysis.
Each dataframe contains many of the same columns, but the column names are different.
There are multiple columns/answers that indicate an employee resigned because they were dissatisfied.

Step 2 - Re-Read "`Not Stated`" values in datasets as `NaN` values¶

From 1.1, a couple of Not Stated values can be observed. These values are of the str type, and it might be better if these values could be re-read as NaN or NULL type.

In [7]:

dete_survey = pd.read_csv('dete_survey.csv',na_values = 'Not Stated')
dete_survey[:3]

Out[7]:

	ID	SeparationType	Cease Date	DETE Start Date	Role Start Date	Position	Classification	Region	Business Unit	Employment Status	...	Kept informed	Wellness programs	Health & Safety	Gender	Age	Aboriginal	Torres Strait	South Sea	Disability	NESB
0	1	Ill Health Retirement	08/2012	1984.0	2004.0	Public Servant	A01-A04	Central Office	Corporate Strategy and Peformance	Permanent Full-time	...	N	N	N	Male	56-60	NaN	NaN	NaN	NaN	Yes
1	2	Voluntary Early Retirement (VER)	08/2012	NaN	NaN	Public Servant	AO5-AO7	Central Office	Corporate Strategy and Peformance	Permanent Full-time	...	N	N	N	Male	56-60	NaN	NaN	NaN	NaN	NaN
2	3	Voluntary Early Retirement (VER)	05/2012	2011.0	2011.0	Schools Officer	NaN	Central Office	Education Queensland	Permanent Full-time	...	N	N	N	Male	61 or older	NaN	NaN	NaN	NaN	NaN

3 rows × 56 columns

Step 3 - Clean Column Names - 1¶

From 1.4, it can be obsered that tafe_survey has too many columns, most of them having pretty long names. It would be better to replace the column names in tafe_survey dataset with some easy-to-write column names (for only the relevant columns) -

In [8]:

replace_dict = {'Record ID':'id',
                'CESSATION YEAR':'cease_date',
                'Reason for ceasing employment': 'separationtype',
                'Gender. What is your Gender?': 'gender',
                'CurrentAge. Current Age': 'age',
                'Employment Type. Employment Type': 'employment_status',
                'Classification. Classification': 'position',
                'LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service',
                'LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service'
               }
tafe_survey.rename(columns = replace_dict, inplace = True)
tafe_survey[:3]

Out[8]:

	id	Institute	WorkArea	cease_date	separationtype	Contributing Factors. Career Move - Public Sector	Contributing Factors. Career Move - Private Sector	Contributing Factors. Career Move - Self-employment	Contributing Factors. Ill Health	Contributing Factors. Maternity/Family	...	Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?	Workplace. Topic:Does your workplace promote and practice the principles of employment equity?	Workplace. Topic:Does your workplace value the diversity of its employees?	Workplace. Topic:Would you recommend the Institute as an employer to others?	gender	age	employment_status	position	institute_service	role_service
0	6.341330e+17	Southern Queensland Institute of TAFE	Non-Delivery (corporate)	2010.0	Contract Expired	NaN	NaN	NaN	NaN	NaN	...	Yes	Yes	Yes	Yes	Female	26 30	Temporary Full-time	Administration (AO)	1-2	1-2
1	6.341337e+17	Mount Isa Institute of TAFE	Non-Delivery (corporate)	2010.0	Retirement	-	-	-	-	-	...	Yes	Yes	Yes	Yes	NaN	NaN	NaN	NaN	NaN	NaN
2	6.341388e+17	Mount Isa Institute of TAFE	Delivery (teaching)	2010.0	Retirement	-	-	-	-	-	...	Yes	Yes	Yes	Yes	NaN	NaN	NaN	NaN	NaN	NaN

3 rows × 72 columns

Step 4 - Clean Column Names - 2¶

Make all column names lowercase
Replace spaces with underscores ("_")
Strip Whitespace characters from ends of column names
Remove . character from column names

In [9]:

dete_survey.columns = dete_survey.columns.str.lower().str.strip().str.replace(' ','_').str.replace('.','')
dete_survey[:2]

Out[9]:

	id	separationtype	cease_date	dete_start_date	role_start_date	position	classification	region	business_unit	employment_status	...	kept_informed	wellness_programs	health_&_safety	gender	age	aboriginal	torres_strait	south_sea	disability	nesb
0	1	Ill Health Retirement	08/2012	1984.0	2004.0	Public Servant	A01-A04	Central Office	Corporate Strategy and Peformance	Permanent Full-time	...	N	N	N	Male	56-60	NaN	NaN	NaN	NaN	Yes
1	2	Voluntary Early Retirement (VER)	08/2012	NaN	NaN	Public Servant	AO5-AO7	Central Office	Corporate Strategy and Peformance	Permanent Full-time	...	N	N	N	Male	56-60	NaN	NaN	NaN	NaN	NaN

2 rows × 56 columns

In [10]:

tafe_survey.columns = tafe_survey.columns.str.lower().str.strip().str.replace(' ','_').str.replace('.','')
tafe_survey[:2]

Out[10]:

	id	institute	workarea	cease_date	separationtype	contributing_factors_career_move_-_public_sector	contributing_factors_career_move_-_private_sector	contributing_factors_career_move_-_self-employment	contributing_factors_ill_health	contributing_factors_maternity/family	...	workplace_topic:does_your_workplace_promote_a_work_culture_free_from_all_forms_of_unlawful_discrimination?	workplace_topic:does_your_workplace_promote_and_practice_the_principles_of_employment_equity?	workplace_topic:does_your_workplace_value_the_diversity_of_its_employees?	workplace_topic:would_you_recommend_the_institute_as_an_employer_to_others?	gender	age	employment_status	position	institute_service	role_service
0	6.341330e+17	Southern Queensland Institute of TAFE	Non-Delivery (corporate)	2010.0	Contract Expired	NaN	NaN	NaN	NaN	NaN	...	Yes	Yes	Yes	Yes	Female	26 30	Temporary Full-time	Administration (AO)	1-2	1-2
1	6.341337e+17	Mount Isa Institute of TAFE	Non-Delivery (corporate)	2010.0	Retirement	-	-	-	-	-	...	Yes	Yes	Yes	Yes	NaN	NaN	NaN	NaN	NaN	NaN

2 rows × 72 columns

Step 5 - Drop Unnecessary Columns¶

From 1.3 and 1.4, it can be observed that both the datasets have a lot of columns that don't seem to be important from the point of view of our objective. It is best to declutter the datasets and remove those unnecessary columns -

Junk Columns in dete_survey - index numbers [28:49]
Junk Columns in tafe_survey - index numbers [17:66]

5.1 Dropping columns in DETE Dataset¶

In [11]:

# Drop Junk Columns in DETE Dataset
dete_survey_updated = dete_survey.drop(dete_survey.columns[28:49], axis = 1)
# Display Columns in DETE Dataset
dete_survey_updated.columns

Out[11]:

Index(['id', 'separationtype', 'cease_date', 'dete_start_date',
       'role_start_date', 'position', 'classification', 'region',
       'business_unit', 'employment_status', 'career_move_to_public_sector',
       'career_move_to_private_sector', 'interpersonal_conflicts',
       'job_dissatisfaction', 'dissatisfaction_with_the_department',
       'physical_work_environment', 'lack_of_recognition',
       'lack_of_job_security', 'work_location', 'employment_conditions',
       'maternity/family', 'relocation', 'study/travel', 'ill_health',
       'traumatic_incident', 'work_life_balance', 'workload',
       'none_of_the_above', 'gender', 'age', 'aboriginal', 'torres_strait',
       'south_sea', 'disability', 'nesb'],
      dtype='object')

5.2 Dropping columns in TAFE Dataset¶

In [12]:

# Drop Junk Columns in TAFE Dataset
tafe_survey_updated = tafe_survey.drop(tafe_survey.columns[17:66], axis = 1)
# Display Columns in TAFE Dataset
tafe_survey_updated.columns

Out[12]:

Index(['id', 'institute', 'workarea', 'cease_date', 'separationtype',
       'contributing_factors_career_move_-_public_sector',
       'contributing_factors_career_move_-_private_sector',
       'contributing_factors_career_move_-_self-employment',
       'contributing_factors_ill_health',
       'contributing_factors_maternity/family',
       'contributing_factors_dissatisfaction',
       'contributing_factors_job_dissatisfaction',
       'contributing_factors_interpersonal_conflict',
       'contributing_factors_study', 'contributing_factors_travel',
       'contributing_factors_other', 'contributing_factors_none', 'gender',
       'age', 'employment_status', 'position', 'institute_service',
       'role_service'],
      dtype='object')

Step 6 - Filter out employees whose reason for separation was `resignation`¶

Since our objective is to carry out analysis on only the subset of empployees who resigned, we will single out only those rows where value in the separatioontype columns contains the Resignation keyword. As a refresher, find below the screenshot of the objective below -

6.1 TAFE dataset

6.1.1 Analysing the distinct count of separationtype of TAFE employees

Out of 702 employees who separated from TAFE, almost 340 resigned.

In [13]:

tafe_survey['separationtype'].value_counts(dropna = False)

Out[13]:

Resignation                 340
Contract Expired            127
Retrenchment/ Redundancy    104
Retirement                   82
Transfer                     25
Termination                  23
NaN                           1
Name: separationtype, dtype: int64

6.1.2 Filtering rows having Resignation value in separationtype column

In [14]:

tafe_resignations = tafe_survey_updated[tafe_survey_updated['separationtype']=='Resignation'].copy()

6.2 DETE Dataset

6.2.1 Analysing the distinct count of separationtype of DETE employees

Of the 822 employees in DETE, 311 resigned.

In [15]:

dete_survey['separationtype'].value_counts(dropna = False)

Out[15]:

Age Retirement                          285
Resignation-Other reasons               150
Resignation-Other employer               91
Resignation-Move overseas/interstate     70
Voluntary Early Retirement (VER)         67
Ill Health Retirement                    61
Other                                    49
Contract Expired                         34
Termination                              15
Name: separationtype, dtype: int64

6.2.2 Filtering rows having Resignation value in separationtype column

In [16]:

dete_resignations = dete_survey_updated[dete_survey_updated['separationtype'].str.contains(r'Resignation')].copy()

Step 7 - Verifying `date` columns¶

7.1 Cleaning `cease_date` column from DETE dataset¶

It appears that some values of cease_date column in dete_resignations are formatted as mm/yyyy, while some are formatted as yyyy. In such a scenario, it is better to remove all the mm/ sequence of characters from cease_date values, for ease of calculation in the analysis ahead.

In [17]:

dete_resignations.head()

Out[17]:

	id	separationtype	cease_date	dete_start_date	role_start_date	position	classification	region	business_unit	employment_status	...	work_life_balance	workload	none_of_the_above	gender	age	aboriginal	torres_strait	south_sea	disability	nesb
3	4	Resignation-Other reasons	05/2012	2005.0	2006.0	Teacher	Primary	Central Queensland	NaN	Permanent Full-time	...	False	False	False	Female	36-40	NaN	NaN	NaN	NaN	NaN
5	6	Resignation-Other reasons	05/2012	1994.0	1997.0	Guidance Officer	NaN	Central Office	Education Queensland	Permanent Full-time	...	False	False	False	Female	41-45	NaN	NaN	NaN	NaN	NaN
8	9	Resignation-Other reasons	07/2012	2009.0	2009.0	Teacher	Secondary	North Queensland	NaN	Permanent Full-time	...	False	False	False	Female	31-35	NaN	NaN	NaN	NaN	NaN
9	10	Resignation-Other employer	2012	1997.0	2008.0	Teacher Aide	NaN	NaN	NaN	Permanent Part-time	...	False	False	False	Female	46-50	NaN	NaN	NaN	NaN	NaN
11	12	Resignation-Move overseas/interstate	2012	2009.0	2009.0	Teacher	Secondary	Far North Queensland	NaN	Permanent Full-time	...	False	False	False	Male	31-35	NaN	NaN	NaN	NaN	NaN

5 rows × 35 columns

In [18]:

# Filtering 'mm/' and assigning 'yyyy' to self
dete_resignations['cease_date'] = dete_resignations['cease_date'].astype(str).str[-4:].astype(float)

In [19]:

dete_resignations['cease_date'].value_counts(dropna = False)

Out[19]:

2013.0    146
2012.0    129
2014.0     22
NaN        11
2010.0      2
2006.0      1
Name: cease_date, dtype: int64

7.2 Making a new column for Years of Service of employees (`institute_service`)¶

In [20]:

# Years of Service of DETE employees (before resigning)
dete_resignations['institute_service'] = dete_resignations['cease_date'] - dete_resignations['dete_start_date']

In [21]:

dete_resignations['institute_service']

Out[21]:

3       7.0
5      18.0
8       3.0
9      15.0
11      3.0
       ... 
808     3.0
815     2.0
816     2.0
819     5.0
821     NaN
Name: institute_service, Length: 311, dtype: float64

Step 8 - Identifying Dissatisfied Employees¶

Now that the Resignation type of data has been filtered, we can figure out whether the reason for Resignation was either employee dissatisfaction, or some other reason.
From the mentioned column names in 1.3 & 1.4, the columns that are associated with employee dissatisfaction need to be singled out first.

Once that is done, those columns will be used to create a single (Boolean) column calleddissatisfaction in both the datasets, which shall have -

True value for an employee whose reason for resignation was some sort of dissatisfaction, and
False value for an employee whose reason for resignation was not dissatisfaction.

8.1 DETE Dataset resignation factor columns (Highlighted in Yellow)

In [22]:

dete_dissed_df = dete_resignations[['job_dissatisfaction','dissatisfaction_with_the_department',
                                    'physical_work_environment','lack_of_recognition',
                                    'lack_of_job_security','work_location','employment_conditions',
                                    'work_life_balance','workload'
                                  ]]
dete_dissed_df

Out[22]:

	job_dissatisfaction	dissatisfaction_with_the_department	physical_work_environment	lack_of_recognition	lack_of_job_security	work_location	employment_conditions	work_life_balance	workload
3	False	False	False	False	False	False	False	False	False
5	False	False	False	False	False	False	True	False	False
8	False	False	False	False	False	False	False	False	False
9	True	True	False	False	False	False	False	False	False
11	False	False	False	False	False	False	False	False	False
...	...	...	...	...	...	...	...	...	...
808	False	False	False	False	False	False	False	False	False
815	False	False	False	False	False	False	False	False	False
816	False	False	False	False	False	False	False	False	False
819	False	False	False	False	False	False	False	True	False
821	False	False	False	False	False	False	False	False	False

311 rows × 9 columns

In [23]:

dete_dissed_s = dete_dissed_df.any(axis = 1, skipna = False)
dete_dissed_s

Out[23]:

3      False
5       True
8      False
9       True
11     False
       ...  
808    False
815    False
816    False
819     True
821    False
Length: 311, dtype: bool

In [24]:

dete_resignations['dissatisfied'] = dete_dissed_s.copy()

8.2 TAFE Dataset

TAFE Dataset resignation factor columns (Highlighted in Yellow)

In [25]:

tafe_dissed_df = tafe_resignations[['contributing_factors_dissatisfaction',
                                    'contributing_factors_job_dissatisfaction'
                                  ]]
tafe_dissed_df.head(25)

Out[25]:

	contributing_factors_dissatisfaction	contributing_factors_job_dissatisfaction
3	-	-
4	-	-
5	-	-
6	-	-
7	-	-
8	-	-
9	-	-
10	-	-
13	-	-
14	Contributing Factors. Dissatisfaction	Job Dissatisfaction
15	-	-
16	NaN	NaN
17	-	Job Dissatisfaction
18	NaN	NaN
19	-	-
20	Contributing Factors. Dissatisfaction	Job Dissatisfaction
21	-	-
22	-	-
23	-	-
24	-	-
26	Contributing Factors. Dissatisfaction	Job Dissatisfaction
27	-	-
29	-	-
32	-	-
36	-	-

In [26]:

def update_vals(element):
    if element == '-':
        return False
    elif pd.isnull(element):
        return np.nan
    else:
        return True

temp_df = tafe_dissed_df.applymap(update_vals)
tafe_dissed_s = temp_df.any(axis=1, skipna = False)
temp_df

Out[26]:

	contributing_factors_dissatisfaction	contributing_factors_job_dissatisfaction
3	False	False
4	False	False
5	False	False
6	False	False
7	False	False
...	...	...
696	False	False
697	False	False
698	False	False
699	False	False
701	False	False

340 rows × 2 columns

In [27]:

tafe_dissed_s.head(25)

Out[27]:

3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
13    False
14     True
15    False
16      NaN
17     True
18      NaN
19    False
20     True
21    False
22    False
23    False
24    False
26     True
27    False
29    False
32    False
36    False
dtype: object

In [28]:

tafe_resignations['dissatisfied'] = tafe_dissed_s.copy()

Step 9 - Combining the Datasets¶

9.1 Creating a new column named `institute`¶

Before combining the data, a column named institute needs to be created in the DETE dataset, which will have 'DETE' as the value in each cell. (Note that the TAFE dataset already has the institute column, nevertheless each cell-value of that column will have to be rewritten as - 'TAFE')

In [29]:

dete_resignations['institute'] = 'DETE'
tafe_resignations['institute'] = 'TAFE'

9.2 Combining the Datasets¶

The combined dataset has 651 rows and 52 columns -

In [30]:

combined = pd.concat([dete_resignations,tafe_resignations], ignore_index = True)
combined

Out[30]:

	id	separationtype	cease_date	dete_start_date	role_start_date	position	classification	region	business_unit	employment_status	...	contributing_factors_ill_health	contributing_factors_maternity/family	contributing_factors_dissatisfaction	contributing_factors_job_dissatisfaction	contributing_factors_interpersonal_conflict	contributing_factors_study	contributing_factors_travel	contributing_factors_other	contributing_factors_none	role_service
0	4.000000e+00	Resignation-Other reasons	2012.0	2005.0	2006.0	Teacher	Primary	Central Queensland	NaN	Permanent Full-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	6.000000e+00	Resignation-Other reasons	2012.0	1994.0	1997.0	Guidance Officer	NaN	Central Office	Education Queensland	Permanent Full-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	9.000000e+00	Resignation-Other reasons	2012.0	2009.0	2009.0	Teacher	Secondary	North Queensland	NaN	Permanent Full-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	1.000000e+01	Resignation-Other employer	2012.0	1997.0	2008.0	Teacher Aide	NaN	NaN	NaN	Permanent Part-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	1.200000e+01	Resignation-Move overseas/interstate	2012.0	2009.0	2009.0	Teacher	Secondary	Far North Queensland	NaN	Permanent Full-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
646	6.350660e+17	Resignation	2013.0	NaN	NaN	Operational (OO)	NaN	NaN	NaN	Temporary Full-time	...	-	-	-	-	-	-	-	-	-	5-6
647	6.350668e+17	Resignation	2013.0	NaN	NaN	Teacher (including LVT)	NaN	NaN	NaN	Temporary Full-time	...	-	-	-	-	-	-	-	-	-	1-2
648	6.350677e+17	Resignation	2013.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	-	-	-	-	-	-	-	-	-	NaN
649	6.350704e+17	Resignation	2013.0	NaN	NaN	Teacher (including LVT)	NaN	NaN	NaN	Permanent Full-time	...	-	-	-	-	-	-	-	Other	-	1-2
650	6.350730e+17	Resignation	2013.0	NaN	NaN	Administration (AO)	NaN	NaN	NaN	Contract/casual	...	-	-	-	-	-	-	Travel	-	-	1-2

651 rows × 52 columns

9.3 Assigning `NaN` type to `-` values in `combined` dataset¶

It is immediately observed that many cells in the combined dataset have '-' value in them. Although we can read this value as NULL, python can't, and it will return the data type of such a value as str. Before proceeding, it is best to assign NULL or NaN type to any such cells in the combined dataset.
The updated combined dataset is named as combined_up -

In [31]:

def update_nan(element):
    if element == '-':
        return np.nan
    else:
        return element
combined_up = combined.applymap(update_nan)
combined_up

Out[31]:

	id	separationtype	cease_date	dete_start_date	role_start_date	position	classification	region	business_unit	employment_status	...	contributing_factors_ill_health	contributing_factors_maternity/family	contributing_factors_dissatisfaction	contributing_factors_job_dissatisfaction	contributing_factors_interpersonal_conflict	contributing_factors_study	contributing_factors_travel	contributing_factors_other	contributing_factors_none	role_service
0	4.000000e+00	Resignation-Other reasons	2012.0	2005.0	2006.0	Teacher	Primary	Central Queensland	NaN	Permanent Full-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	6.000000e+00	Resignation-Other reasons	2012.0	1994.0	1997.0	Guidance Officer	NaN	Central Office	Education Queensland	Permanent Full-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	9.000000e+00	Resignation-Other reasons	2012.0	2009.0	2009.0	Teacher	Secondary	North Queensland	NaN	Permanent Full-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	1.000000e+01	Resignation-Other employer	2012.0	1997.0	2008.0	Teacher Aide	NaN	NaN	NaN	Permanent Part-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	1.200000e+01	Resignation-Move overseas/interstate	2012.0	2009.0	2009.0	Teacher	Secondary	Far North Queensland	NaN	Permanent Full-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
646	6.350660e+17	Resignation	2013.0	NaN	NaN	Operational (OO)	NaN	NaN	NaN	Temporary Full-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	5-6
647	6.350668e+17	Resignation	2013.0	NaN	NaN	Teacher (including LVT)	NaN	NaN	NaN	Temporary Full-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	1-2
648	6.350677e+17	Resignation	2013.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
649	6.350704e+17	Resignation	2013.0	NaN	NaN	Teacher (including LVT)	NaN	NaN	NaN	Permanent Full-time	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	Other	NaN	1-2
650	6.350730e+17	Resignation	2013.0	NaN	NaN	Administration (AO)	NaN	NaN	NaN	Contract/casual	...	NaN	NaN	NaN	NaN	NaN	NaN	Travel	NaN	NaN	1-2

651 rows × 52 columns

9.4 Dropping unnecessary columns in `combined` dataset¶

On doing a NULL-Count on the combined_up dataset, it can be clearly observed that out of 52 columns, most of them comprise of NaN (NULL) values. As most of them aren't required for our analysis anyways, it is best to further de-clutter our dataset, and drop any columns that have less than 500 non-null values.

By doing this, I am able to bring down the number of columns from 52 too the relevant 10, without compromising on any useful data.

In [32]:

combined_up.isnull().sum()

Out[32]:

id                                                      0
separationtype                                          0
cease_date                                             16
dete_start_date                                       368
role_start_date                                       380
position                                               53
classification                                        490
region                                                386
business_unit                                         619
employment_status                                      54
career_move_to_public_sector                          340
career_move_to_private_sector                         340
interpersonal_conflicts                               340
job_dissatisfaction                                   340
dissatisfaction_with_the_department                   340
physical_work_environment                             340
lack_of_recognition                                   340
lack_of_job_security                                  340
work_location                                         340
employment_conditions                                 340
maternity/family                                      340
relocation                                            340
study/travel                                          340
ill_health                                            340
traumatic_incident                                    340
work_life_balance                                     340
workload                                              340
none_of_the_above                                     340
gender                                                 59
age                                                    55
aboriginal                                            644
torres_strait                                         651
south_sea                                             648
disability                                            643
nesb                                                  642
institute_service                                      88
dissatisfied                                            8
institute                                               0
workarea                                              311
contributing_factors_career_move_-_public_sector      603
contributing_factors_career_move_-_private_sector     552
contributing_factors_career_move_-_self-employment    638
contributing_factors_ill_health                       630
contributing_factors_maternity/family                 631
contributing_factors_dissatisfaction                  596
contributing_factors_job_dissatisfaction              589
contributing_factors_interpersonal_conflict           627
contributing_factors_study                            635
contributing_factors_travel                           634
contributing_factors_other                            565
contributing_factors_none                             635
role_service                                          361
dtype: int64

In [33]:

combined_updated = combined_up.dropna(thresh=500, axis =1).copy()
combined_updated

Out[33]:

	id	separationtype	cease_date	position	employment_status	gender	age	institute_service	dissatisfied	institute
0	4.000000e+00	Resignation-Other reasons	2012.0	Teacher	Permanent Full-time	Female	36-40	7	False	DETE
1	6.000000e+00	Resignation-Other reasons	2012.0	Guidance Officer	Permanent Full-time	Female	41-45	18	True	DETE
2	9.000000e+00	Resignation-Other reasons	2012.0	Teacher	Permanent Full-time	Female	31-35	3	False	DETE
3	1.000000e+01	Resignation-Other employer	2012.0	Teacher Aide	Permanent Part-time	Female	46-50	15	True	DETE
4	1.200000e+01	Resignation-Move overseas/interstate	2012.0	Teacher	Permanent Full-time	Male	31-35	3	False	DETE
...	...	...	...	...	...	...	...	...	...	...
646	6.350660e+17	Resignation	2013.0	Operational (OO)	Temporary Full-time	Male	21 25	5-6	False	TAFE
647	6.350668e+17	Resignation	2013.0	Teacher (including LVT)	Temporary Full-time	Male	51-55	1-2	False	TAFE
648	6.350677e+17	Resignation	2013.0	NaN	NaN	NaN	NaN	NaN	False	TAFE
649	6.350704e+17	Resignation	2013.0	Teacher (including LVT)	Permanent Full-time	Female	51-55	5-6	False	TAFE
650	6.350730e+17	Resignation	2013.0	Administration (AO)	Contract/casual	Female	26 30	3-4	False	TAFE

651 rows × 10 columns

In [34]:

combined_updated.isnull().sum()

Out[34]:

id                    0
separationtype        0
cease_date           16
position             53
employment_status    54
gender               59
age                  55
institute_service    88
dissatisfied          8
institute             0
dtype: int64

Step 10 - Cleaning the `institute_service` (years of service) column¶

Recalling the objective, it is required that we run some kind of initial analysis that depicts whether employees with low years of experience were more likely to resign due to dissatisfaction than those with higher experience.

Thus, the institute_service column that depicts the years of service of the employees needs to be cleaned next. But there's a slight hiccup - the institute_service column contains data in different forms -

Since we have both categorical and numerical values in our column, it is best to assign them all into different categories for our analysis further. A slighly modified definition for "years of experience" would be categorised as follows -

New: Less than 3 years at a company
Experienced: 3-6 years at a company
Established: 7-10 years at a company
Veteran: 11 or more years at a company

In [35]:

yrsofservice = combined_updated['institute_service'].copy()
yrsofservice = yrsofservice.astype(str)

In [36]:

yrsofservice.value_counts().sort_index()

Out[36]:

0.0                   20
1-2                   64
1.0                   22
10.0                   6
11-20                 26
11.0                   4
12.0                   6
13.0                   8
14.0                   6
15.0                   7
16.0                   5
17.0                   6
18.0                   5
19.0                   3
2.0                   14
20.0                   7
21.0                   3
22.0                   6
23.0                   4
24.0                   4
25.0                   2
26.0                   2
27.0                   1
28.0                   2
29.0                   1
3-4                   63
3.0                   20
30.0                   2
31.0                   1
32.0                   3
33.0                   1
34.0                   1
35.0                   1
36.0                   2
38.0                   1
39.0                   3
4.0                   16
41.0                   1
42.0                   1
49.0                   1
5-6                   33
5.0                   23
6.0                   17
7-10                  21
7.0                   13
8.0                    8
9.0                   14
Less than 1 year      73
More than 20 years    10
nan                   88
Name: institute_service, dtype: int64

In [37]:

temp = yrsofservice.copy()

In [38]:

def category_of_service(element):
    if element == 'nan': 
        return np.nan
    elif 'Less than 1 year' in element:
        return 'New'
    elif 'More than 20 years' in element:
        return 'Veteran'
    elif int(element) < 3:
        return 'New'
    elif (int(element) >= 3) & (int(element) <= 6):
        return 'Experienced'
    elif (int(element) >= 7) & (int(element) <= 10):
        return 'Established'
    else:
        return 'Veteran'

temp = temp.str.split('.').str[0].str.split('-').str[0]
service_cat = temp.apply(category_of_service)

In [39]:

service_cat

Out[39]:

0      Established
1          Veteran
2      Experienced
3          Veteran
4      Experienced
          ...     
646    Experienced
647            New
648            NaN
649    Experienced
650    Experienced
Name: institute_service, Length: 651, dtype: object

In [40]:

service_cat.value_counts().sort_index()

Out[40]:

Established     62
Experienced    172
New            193
Veteran        136
Name: institute_service, dtype: int64

Step 11 Categorising the Service Category - creating a new `service_cat` column¶

On basis of the years of experience, a new column called service_cat is created next, in the combined_updated dataset.
After that, we have a look at how many NaN values are in the dissatisfied column.

Since there are only 8 such values, we assign them a Boolean - False (since there are 403 False values as compared to 240 True values, and moreover, excluding or including these rows won't have a sigificant impact on the overall analysis due to their sheer low count (10))

In [41]:

combined_updated['service_cat'] = service_cat.copy()
combined_updated

Out[41]:

	id	separationtype	cease_date	position	employment_status	gender	age	institute_service	dissatisfied	institute	service_cat
0	4.000000e+00	Resignation-Other reasons	2012.0	Teacher	Permanent Full-time	Female	36-40	7	False	DETE	Established
1	6.000000e+00	Resignation-Other reasons	2012.0	Guidance Officer	Permanent Full-time	Female	41-45	18	True	DETE	Veteran
2	9.000000e+00	Resignation-Other reasons	2012.0	Teacher	Permanent Full-time	Female	31-35	3	False	DETE	Experienced
3	1.000000e+01	Resignation-Other employer	2012.0	Teacher Aide	Permanent Part-time	Female	46-50	15	True	DETE	Veteran
4	1.200000e+01	Resignation-Move overseas/interstate	2012.0	Teacher	Permanent Full-time	Male	31-35	3	False	DETE	Experienced
...	...	...	...	...	...	...	...	...	...	...	...
646	6.350660e+17	Resignation	2013.0	Operational (OO)	Temporary Full-time	Male	21 25	5-6	False	TAFE	Experienced
647	6.350668e+17	Resignation	2013.0	Teacher (including LVT)	Temporary Full-time	Male	51-55	1-2	False	TAFE	New
648	6.350677e+17	Resignation	2013.0	NaN	NaN	NaN	NaN	NaN	False	TAFE	NaN
649	6.350704e+17	Resignation	2013.0	Teacher (including LVT)	Permanent Full-time	Female	51-55	5-6	False	TAFE	Experienced
650	6.350730e+17	Resignation	2013.0	Administration (AO)	Contract/casual	Female	26 30	3-4	False	TAFE	Experienced

651 rows × 11 columns

In [42]:

combined_updated[(combined_updated['dissatisfied']!=False)&(combined_updated['dissatisfied']!=True)]

Out[42]:

	id	separationtype	cease_date	position	employment_status	gender	age	institute_service	dissatisfied	institute	service_cat
322	6.341770e+17	Resignation	2010.0	NaN	NaN	NaN	NaN	NaN	NaN	TAFE	NaN
324	6.341779e+17	Resignation	2010.0	NaN	NaN	NaN	NaN	NaN	NaN	TAFE	NaN
345	6.342141e+17	Resignation	2010.0	NaN	NaN	NaN	NaN	NaN	NaN	TAFE	NaN
466	6.345510e+17	Resignation	2011.0	NaN	NaN	NaN	NaN	NaN	NaN	TAFE	NaN
472	6.345581e+17	Resignation	2011.0	NaN	NaN	NaN	NaN	NaN	NaN	TAFE	NaN
523	6.346963e+17	Resignation	2012.0	NaN	NaN	NaN	NaN	NaN	NaN	TAFE	NaN
543	6.347827e+17	Resignation	NaN	NaN	NaN	NaN	NaN	NaN	NaN	TAFE	NaN
627	6.350124e+17	Resignation	2013.0	NaN	NaN	NaN	NaN	NaN	NaN	TAFE	NaN

In [43]:

combined_updated['dissatisfied'] = combined_updated['dissatisfied'].fillna(False)

In [44]:

# To check if there aren't any NaN values left in 'dissatisfied' column
combined_updated['dissatisfied'].value_counts(dropna = False)

Out[44]:

False    411
True     240
Name: dissatisfied, dtype: int64

Step 12 - CONCLUSION & Performing Initial Analysis¶

After all this cleaning, finally I am ready to perform my analysis, and (hopefully) I can arrive at some meaningful insights.

12.1 % Dissatisfied Resignations (Service Category wise)¶

Using a straightforward bar-graph to compare the percentage of resignations that were due to some sort of dissatisfaction of the exiting employee, the following significant findings are observed -

Contrary to the institutes' beliefs, the % of dissatisfied employees rises as the the years of service of an employee increase!
New employees (0-3 years of experience) are least likely to resign from the company due to dissatisfaction.
Almost half of the Established (7-10 yrs) and Veteran (> 11 yrs) employees resign due to dissatisfaction.

In [45]:

combined_updated.pivot_table(values='dissatisfied', index = 'service_cat')

Out[45]:

	dissatisfied
service_cat
Established	0.516129
Experienced	0.343023
New	0.295337
Veteran	0.485294

In [46]:

%matplotlib inline
combined_updated.pivot_table(values='dissatisfied', index = 'service_cat').plot.bar()

Out[46]:

<matplotlib.axes._subplots.AxesSubplot at 0xf609681e48>

12.2 % Dissatisfied Resignations (Institute Wise)¶

Although my imaginary stakeholders told me to carry out a combiined analysis for both the institutes, it is wise to know, which institute contributes more to the dissatisfied employee resignations in our data.

Here, again, I use a straighforward bar chart to compare the % resignations due to dissatisfaction. The findings are significant -

Nearly 50% resignations in DETE are from dissatisfied employees.
Only about 25% resignations in TAFE are from dissatisfied employees.
Employees in DETE are much more dissatisfied TAFE (which implicitly tells you a lot about the institute)

In [47]:

combined_updated.pivot_table(values='dissatisfied', index = 'institute').plot.bar()

Out[47]:

<matplotlib.axes._subplots.AxesSubplot at 0xf60a4f1f08>

12.3 % Dissatisfied Resignations (Employment Status Wise)¶

Here again, I use a straighforward bar chart to compare the % resignations due to dissatisfaction. The findings are significant -

More than 40% Permanent Full-Time and Permanent Part-Time employees' resign due to dissatisfaction.
Only 20% of other employees' (Temporary, Casual, Contractual) resign due to dissatisfaction.

Bottom Line - Permanent Employees are most dissatisfied with their jobs. This is something that the institutes should look into.

In [48]:

combined_updated['employment_status'].value_counts(dropna = False)
combined_updated.pivot_table(values='dissatisfied', index = 'employment_status').plot.bar()

Out[48]:

<matplotlib.axes._subplots.AxesSubplot at 0xf60a5628c8>

12.4 % Dissatisfied Resignations (Gender Wise)¶

Again, I use a straighforward bar chart to compare the % resignations due to dissatisfaction. The findings this time are NOT significant -

Both the genders are almost equally likely to resign due to dissatisfaction.

In [49]:

combined_updated['gender'].value_counts(dropna = False)
combined_updated.pivot_table(values='dissatisfied', index = 'gender').plot.bar()

Out[49]:

<matplotlib.axes._subplots.AxesSubplot at 0xf60a5e2e08>

12.5 % Dissatisfied Resignations (Job Position Wise)¶

Again, I use a straighforward bar chart to compare the % resignations due to dissatisfaction.

Here, I divide the Job Positions into 3 main categories -

Job Category 1 - Administration Position (287 rows)
Job Category 2 - Teaching Position (156 rows)
Job Categoty 3 - Others (155 rows)

The findings are significant -

Only about 20% resignations in Administration job roles are due to dissatisfaction.
More than 40% resignations in Teaching job roles are due to dissatisfaction.

In [50]:

combined_updated['position'].value_counts(dropna = False)

Out[50]:

Administration (AO)                                        148
Teacher                                                    129
Teacher (including LVT)                                     95
Teacher Aide                                                63
NaN                                                         53
Cleaner                                                     39
Public Servant                                              30
Professional Officer (PO)                                   16
Operational (OO)                                            13
Head of Curriculum/Head of Special Education                10
School Administrative Staff                                  8
Technical Officer                                            8
Schools Officer                                              7
Workplace Training Officer                                   6
Technical Officer (TO)                                       5
School Based Professional Staff (Therapist, nurse, etc)      5
Executive (SES/SO)                                           4
Tutor                                                        3
Other                                                        3
Guidance Officer                                             3
Professional Officer                                         2
Business Service Manager                                     1
Name: position, dtype: int64

In [51]:

def position_cat_func(element):
    if pd.isnull(element):
        return np.nan
    elif 'Admin' in element:
        return 'Administration Position'
    elif 'Teach' in element:
        return 'Teaching Position'
    else:
        return 'Others'

temp_df = combined_updated.copy()
temp_df['position_cat'] = temp_df['position'].apply(position_cat_func)

# Plotting on Bar Chart
temp_df.pivot_table(values='dissatisfied', index = 'position_cat').plot.bar()

Out[51]:

<matplotlib.axes._subplots.AxesSubplot at 0xf60a65cd88>

12.6 % Dissatisfied Resignations (Employee Age Wise)¶

Again, I use a straighforward bar chart to compare the % resignations due to dissatisfaction per age group.

Here, I divide the age the following categories -

< 20
21-30
31-40
41-50
51-55
56 +

The findings are significant -

Employees less than 20 years of age are less likely to resign from dissatisfaction (although take this figure with a grain of salt, as there are only 10 employees aged 20 years or less in the data set)
Employees in the 50+ years and 21-30 years bracket are most dissatisfied with their jobs, as that is the reason for more than 40% of the resignations in those age groups.

In [52]:

def age_cat_classifier(element):
    if pd.isnull(element):
        return np.nan
    elif '20 or younger' in element:
        return '< 20'
    elif ('56 or' in element) | ('56-60' in element) | ('61 or' in element):
        return '> 56'
    elif '5' in element:
        return '51-55'
    else:
        return element[0] + '1 to ' + str(int(element[0])+1) + '0'

# Add a new column to temp_df to capture age-category ('age_cat')
temp_df['age_cat'] = temp_df['age'].apply(age_cat_classifier)

In [53]:

temp_df['age_cat'].value_counts()

Out[53]:

51-55       368
> 56         78
31 to 40     73
21 to 30     67
< 20         10
Name: age_cat, dtype: int64

In [54]:

temp_df.pivot_table(values='dissatisfied', index = 'age_cat').plot.bar()

Out[54]:

<matplotlib.axes._subplots.AxesSubplot at 0xf60a6cb848>

Conclusion¶

More experience == More Dissatisfaction (12.1) - Almost Half of the established and veteran employees resigned due to dissatisfaction (compared to a 30% dissatisfaction rate for employees with <7 years of work-ex)
DETE Employees are more dissatisfied than TAFE (12.2) - Dissatisfaction rate in resignations is almost 50% for DETE, compared to 25% in TAFE.
Permanent Employees are more dissatisfied than Temporary/Casual/Contractual workers (12.3)
Teachers are more dissatisfied than Administrators (12.5) - Dissatisfaction rate in resignations is almost 45% in Teaching job roles, compared to just 25% in Administration job roles.
21-30 and 50+ Age Groups are more dissatisfied than other age-groups (12.6)

Thanks!