Clean and Analyze Employee Exit Surveys

NOTE: SettingWithCopyWarning

I used the recommended procedure shown below throughout the complete project to prevent the above Warning. It didn't always work.

"Try using .loc[row_indexer,col_indexer] = value instead"

Introduction

In this guided project, we'll work with exit surveys from employees of the Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia. You can find the TAFE exit survey here and the survey for the DETE here.

Some slight modifications were made to these datasets to make them easier to work with, including changing the encoding to UTF-8 (the original ones are encoded using cp1252.)

For this project, the questions posed are:

1. Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?

2. Are younger employees resigning due to some kind of dissatisfaction? What about older employees?

NOTE:

**I'm going to deviate slightly from the project guidelines.** The guidelines suggest that after we clean and organize the two data sets, combine them and answer the questions "across" the two institutes.

I would agree with this if I knew at this point of the analysis that the answers to the questions for each of the institutes independent of each other were close enough to be considered the same. I choose to not assume that, and will therefore answer the questions for each of the Institutes separately and compare the results. After that, I will still combine the data and analyze the results of that as well.

MY CHOSEN PROCESS:

  1. I will do all of the data cleaning, organizing, correcting, etc. for the DETE Institute.
  2. I will then answer the questions associated with the DETE Institute.
  3. I will do all of the data cleaning, organizing, correcting, etc. for the TAFE Institute.
  4. I will then answer the questions associated with the TAFE Institute.
  5. I will compare the results.
  6. Whether they are very different or not, I will still join the two data sets and answer the questions for the combined data.
  7. Based on the overall results, I will then make conclusions.

DETE INSTITUTE ANALYSIS

Read in Data: DETE

In [1]:
# Import both pandas, numpy and matplotlib to cover all needed code for this project.
# Read in and save the DETE Institute *.csv data file as a DataFrame.
# Not all missing data is signified as NaN, some are as 'Not Stated'.
# Add na_values function to standardize missing data as NaN.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dete_survey = pd.read_csv('dete_survey.csv', na_values=['Not Stated'])

# print column headings and file info to get a feel for the data file structure.

print(dete_survey.columns)
print('\n')
print(dete_survey.info)
Index(['ID', 'SeparationType', 'Cease Date', 'DETE Start Date',
       'Role Start Date', 'Position', 'Classification', 'Region',
       'Business Unit', 'Employment Status', 'Career move to public sector',
       'Career move to private sector', 'Interpersonal conflicts',
       'Job dissatisfaction', 'Dissatisfaction with the department',
       'Physical work environment', 'Lack of recognition',
       'Lack of job security', 'Work location', 'Employment conditions',
       'Maternity/family', 'Relocation', 'Study/Travel', 'Ill Health',
       'Traumatic incident', 'Work life balance', 'Workload',
       'None of the above', 'Professional Development',
       'Opportunities for promotion', 'Staff morale', 'Workplace issue',
       'Physical environment', 'Worklife balance',
       'Stress and pressure support', 'Performance of supervisor',
       'Peer support', 'Initiative', 'Skills', 'Coach', 'Career Aspirations',
       'Feedback', 'Further PD', 'Communication', 'My say', 'Information',
       'Kept informed', 'Wellness programs', 'Health & Safety', 'Gender',
       'Age', 'Aboriginal', 'Torres Strait', 'South Sea', 'Disability',
       'NESB'],
      dtype='object')


<bound method DataFrame.info of       ID                        SeparationType Cease Date  DETE Start Date  \
0      1                 Ill Health Retirement    08/2012           1984.0   
1      2      Voluntary Early Retirement (VER)    08/2012              NaN   
2      3      Voluntary Early Retirement (VER)    05/2012           2011.0   
3      4             Resignation-Other reasons    05/2012           2005.0   
4      5                        Age Retirement    05/2012           1970.0   
..   ...                                   ...        ...              ...   
817  819                        Age Retirement    02/2014           1977.0   
818  820                        Age Retirement    01/2014           1980.0   
819  821  Resignation-Move overseas/interstate    01/2014           2009.0   
820  822                 Ill Health Retirement    12/2013           2001.0   
821  823  Resignation-Move overseas/interstate    12/2013              NaN   

     Role Start Date                                      Position  \
0             2004.0                                Public Servant   
1                NaN                                Public Servant   
2             2011.0                               Schools Officer   
3             2006.0                                       Teacher   
4             1989.0  Head of Curriculum/Head of Special Education   
..               ...                                           ...   
817           1999.0                                       Teacher   
818           1980.0                                       Teacher   
819           2009.0                                Public Servant   
820           2009.0                                       Teacher   
821              NaN                                  Teacher Aide   

    Classification                    Region  \
0          A01-A04            Central Office   
1          AO5-AO7            Central Office   
2              NaN            Central Office   
3          Primary        Central Queensland   
4              NaN                South East   
..             ...                       ...   
817        Primary        Central Queensland   
818      Secondary               North Coast   
819        A01-A04            Central Office   
820      Secondary  Darling Downs South West   
821            NaN              Metropolitan   

                         Business Unit    Employment Status  ...  \
0    Corporate Strategy and Peformance  Permanent Full-time  ...   
1    Corporate Strategy and Peformance  Permanent Full-time  ...   
2                 Education Queensland  Permanent Full-time  ...   
3                                  NaN  Permanent Full-time  ...   
4                                  NaN  Permanent Full-time  ...   
..                                 ...                  ...  ...   
817                                NaN  Permanent Part-time  ...   
818                                NaN  Permanent Full-time  ...   
819               Education Queensland  Permanent Full-time  ...   
820                                NaN  Permanent Full-time  ...   
821                                NaN                  NaN  ...   

     Kept informed  Wellness programs  Health & Safety  Gender          Age  \
0                N                  N                N    Male        56-60   
1                N                  N                N    Male        56-60   
2                N                  N                N    Male  61 or older   
3                A                  N                A  Female        36-40   
4                N                  A                M  Female  61 or older   
..             ...                ...              ...     ...          ...   
817              A                  A               SA  Female        56-60   
818              N                  N                N    Male        51-55   
819              A                  N                A  Female        31-35   
820              A                  N                A  Female        41-45   
821            NaN                NaN              NaN     NaN          NaN   

     Aboriginal  Torres Strait  South Sea  Disability  NESB  
0           NaN            NaN        NaN         NaN   Yes  
1           NaN            NaN        NaN         NaN   NaN  
2           NaN            NaN        NaN         NaN   NaN  
3           NaN            NaN        NaN         NaN   NaN  
4           NaN            NaN        NaN         NaN   NaN  
..          ...            ...        ...         ...   ...  
817         NaN            NaN        NaN         NaN   NaN  
818         NaN            NaN        NaN         NaN   NaN  
819         NaN            NaN        NaN         NaN   NaN  
820         NaN            NaN        NaN         NaN   NaN  
821         NaN            NaN        NaN         NaN   NaN  

[822 rows x 56 columns]>

Observations: DETE

There is a lot of variation on column heading formats and many columns with information that is not relevant to answering the questions posed for this project.

Identify Missing Values and Drop Unnecessary Columns: DETE

In [2]:
# Observe volume of missing data. by column.

print(dete_survey.isnull().sum())
ID                                       0
SeparationType                           0
Cease Date                              34
DETE Start Date                         73
Role Start Date                         98
Position                                 5
Classification                         367
Region                                 105
Business Unit                          696
Employment Status                        5
Career move to public sector             0
Career move to private sector            0
Interpersonal conflicts                  0
Job dissatisfaction                      0
Dissatisfaction with the department      0
Physical work environment                0
Lack of recognition                      0
Lack of job security                     0
Work location                            0
Employment conditions                    0
Maternity/family                         0
Relocation                               0
Study/Travel                             0
Ill Health                               0
Traumatic incident                       0
Work life balance                        0
Workload                                 0
None of the above                        0
Professional Development                14
Opportunities for promotion             87
Staff morale                             6
Workplace issue                         34
Physical environment                     5
Worklife balance                         7
Stress and pressure support             12
Performance of supervisor                9
Peer support                            10
Initiative                               9
Skills                                  11
Coach                                   55
Career Aspirations                      76
Feedback                                30
Further PD                              54
Communication                            8
My say                                  10
Information                              6
Kept informed                            9
Wellness programs                       56
Health & Safety                         29
Gender                                  24
Age                                     11
Aboriginal                             806
Torres Strait                          819
South Sea                              815
Disability                             799
NESB                                   790
dtype: int64
In [3]:
# There are many columns which are not relevant to answering the questions as described in the introduction.
# Use appropriate function to drop columns 28 to 49.

labels = dete_survey.columns[28:49]
dete_survey_updated = dete_survey.drop(columns=labels, axis=1)
dete_survey_updated.info
Out[3]:
<bound method DataFrame.info of       ID                        SeparationType Cease Date  DETE Start Date  \
0      1                 Ill Health Retirement    08/2012           1984.0   
1      2      Voluntary Early Retirement (VER)    08/2012              NaN   
2      3      Voluntary Early Retirement (VER)    05/2012           2011.0   
3      4             Resignation-Other reasons    05/2012           2005.0   
4      5                        Age Retirement    05/2012           1970.0   
..   ...                                   ...        ...              ...   
817  819                        Age Retirement    02/2014           1977.0   
818  820                        Age Retirement    01/2014           1980.0   
819  821  Resignation-Move overseas/interstate    01/2014           2009.0   
820  822                 Ill Health Retirement    12/2013           2001.0   
821  823  Resignation-Move overseas/interstate    12/2013              NaN   

     Role Start Date                                      Position  \
0             2004.0                                Public Servant   
1                NaN                                Public Servant   
2             2011.0                               Schools Officer   
3             2006.0                                       Teacher   
4             1989.0  Head of Curriculum/Head of Special Education   
..               ...                                           ...   
817           1999.0                                       Teacher   
818           1980.0                                       Teacher   
819           2009.0                                Public Servant   
820           2009.0                                       Teacher   
821              NaN                                  Teacher Aide   

    Classification                    Region  \
0          A01-A04            Central Office   
1          AO5-AO7            Central Office   
2              NaN            Central Office   
3          Primary        Central Queensland   
4              NaN                South East   
..             ...                       ...   
817        Primary        Central Queensland   
818      Secondary               North Coast   
819        A01-A04            Central Office   
820      Secondary  Darling Downs South West   
821            NaN              Metropolitan   

                         Business Unit    Employment Status  ...  \
0    Corporate Strategy and Peformance  Permanent Full-time  ...   
1    Corporate Strategy and Peformance  Permanent Full-time  ...   
2                 Education Queensland  Permanent Full-time  ...   
3                                  NaN  Permanent Full-time  ...   
4                                  NaN  Permanent Full-time  ...   
..                                 ...                  ...  ...   
817                                NaN  Permanent Part-time  ...   
818                                NaN  Permanent Full-time  ...   
819               Education Queensland  Permanent Full-time  ...   
820                                NaN  Permanent Full-time  ...   
821                                NaN                  NaN  ...   

     Work life balance  Workload  None of the above  Gender          Age  \
0                False     False               True    Male        56-60   
1                False     False              False    Male        56-60   
2                False     False               True    Male  61 or older   
3                False     False              False  Female        36-40   
4                 True     False              False  Female  61 or older   
..                 ...       ...                ...     ...          ...   
817              False      True              False  Female        56-60   
818              False     False               True    Male        51-55   
819               True     False              False  Female        31-35   
820              False     False              False  Female        41-45   
821              False     False              False     NaN          NaN   

     Aboriginal  Torres Strait  South Sea  Disability  NESB  
0           NaN            NaN        NaN         NaN   Yes  
1           NaN            NaN        NaN         NaN   NaN  
2           NaN            NaN        NaN         NaN   NaN  
3           NaN            NaN        NaN         NaN   NaN  
4           NaN            NaN        NaN         NaN   NaN  
..          ...            ...        ...         ...   ...  
817         NaN            NaN        NaN         NaN   NaN  
818         NaN            NaN        NaN         NaN   NaN  
819         NaN            NaN        NaN         NaN   NaN  
820         NaN            NaN        NaN         NaN   NaN  
821         NaN            NaN        NaN         NaN   NaN  

[822 rows x 35 columns]>

Observations: DETE

I removed many columns from the DETE dataset that were not needed to answer the questions. This reduces the volume of output from the many commands in this project and makes it a little easier to manipulate the file.

Clean Column Names: DETE

In [4]:
# Use the _pd.str.replace_ code to make column titles simplified and consistent.

print(dete_survey_updated.columns)
print('\n')
dete_survey_updated.columns = dete_survey_updated.columns.str.replace(' ', '_').str.strip().str.upper()

print(dete_survey_updated.columns)
Index(['ID', 'SeparationType', 'Cease Date', 'DETE Start Date',
       'Role Start Date', 'Position', 'Classification', 'Region',
       'Business Unit', 'Employment Status', 'Career move to public sector',
       'Career move to private sector', 'Interpersonal conflicts',
       'Job dissatisfaction', 'Dissatisfaction with the department',
       'Physical work environment', 'Lack of recognition',
       'Lack of job security', 'Work location', 'Employment conditions',
       'Maternity/family', 'Relocation', 'Study/Travel', 'Ill Health',
       'Traumatic incident', 'Work life balance', 'Workload',
       'None of the above', 'Gender', 'Age', 'Aboriginal', 'Torres Strait',
       'South Sea', 'Disability', 'NESB'],
      dtype='object')


Index(['ID', 'SEPARATIONTYPE', 'CEASE_DATE', 'DETE_START_DATE',
       'ROLE_START_DATE', 'POSITION', 'CLASSIFICATION', 'REGION',
       'BUSINESS_UNIT', 'EMPLOYMENT_STATUS', 'CAREER_MOVE_TO_PUBLIC_SECTOR',
       'CAREER_MOVE_TO_PRIVATE_SECTOR', 'INTERPERSONAL_CONFLICTS',
       'JOB_DISSATISFACTION', 'DISSATISFACTION_WITH_THE_DEPARTMENT',
       'PHYSICAL_WORK_ENVIRONMENT', 'LACK_OF_RECOGNITION',
       'LACK_OF_JOB_SECURITY', 'WORK_LOCATION', 'EMPLOYMENT_CONDITIONS',
       'MATERNITY/FAMILY', 'RELOCATION', 'STUDY/TRAVEL', 'ILL_HEALTH',
       'TRAUMATIC_INCIDENT', 'WORK_LIFE_BALANCE', 'WORKLOAD',
       'NONE_OF_THE_ABOVE', 'GENDER', 'AGE', 'ABORIGINAL', 'TORRES_STRAIT',
       'SOUTH_SEA', 'DISABILITY', 'NESB'],
      dtype='object')

Observations: DETE

I standardized the column title format with all capitals as well as removing spaces and changing the format from camelcase to snakecase. Again, it's all about simplifying and standardizing.

Filter the Data: DETE

In [5]:
# Observe the various types of reasons for resigning in column ['SEPEARATIONTYPE'].
# Extract only the resignation type reasons.

print(dete_survey_updated.loc[:, 'SEPARATIONTYPE'].value_counts())

dete_resignations = dete_survey_updated[dete_survey_updated.loc[:, 'SEPARATIONTYPE'].isin(['Resignation-Other reasons','Resignation-Other employer', 'Resignation-Move overseas/interstate'])]

print('\n')
print(dete_resignations.info())
Age Retirement                          285
Resignation-Other reasons               150
Resignation-Other employer               91
Resignation-Move overseas/interstate     70
Voluntary Early Retirement (VER)         67
Ill Health Retirement                    61
Other                                    49
Contract Expired                         34
Termination                              15
Name: SEPARATIONTYPE, dtype: int64


<class 'pandas.core.frame.DataFrame'>
Int64Index: 311 entries, 3 to 821
Data columns (total 35 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   ID                                   311 non-null    int64  
 1   SEPARATIONTYPE                       311 non-null    object 
 2   CEASE_DATE                           300 non-null    object 
 3   DETE_START_DATE                      283 non-null    float64
 4   ROLE_START_DATE                      271 non-null    float64
 5   POSITION                             308 non-null    object 
 6   CLASSIFICATION                       161 non-null    object 
 7   REGION                               265 non-null    object 
 8   BUSINESS_UNIT                        32 non-null     object 
 9   EMPLOYMENT_STATUS                    307 non-null    object 
 10  CAREER_MOVE_TO_PUBLIC_SECTOR         311 non-null    bool   
 11  CAREER_MOVE_TO_PRIVATE_SECTOR        311 non-null    bool   
 12  INTERPERSONAL_CONFLICTS              311 non-null    bool   
 13  JOB_DISSATISFACTION                  311 non-null    bool   
 14  DISSATISFACTION_WITH_THE_DEPARTMENT  311 non-null    bool   
 15  PHYSICAL_WORK_ENVIRONMENT            311 non-null    bool   
 16  LACK_OF_RECOGNITION                  311 non-null    bool   
 17  LACK_OF_JOB_SECURITY                 311 non-null    bool   
 18  WORK_LOCATION                        311 non-null    bool   
 19  EMPLOYMENT_CONDITIONS                311 non-null    bool   
 20  MATERNITY/FAMILY                     311 non-null    bool   
 21  RELOCATION                           311 non-null    bool   
 22  STUDY/TRAVEL                         311 non-null    bool   
 23  ILL_HEALTH                           311 non-null    bool   
 24  TRAUMATIC_INCIDENT                   311 non-null    bool   
 25  WORK_LIFE_BALANCE                    311 non-null    bool   
 26  WORKLOAD                             311 non-null    bool   
 27  NONE_OF_THE_ABOVE                    311 non-null    bool   
 28  GENDER                               302 non-null    object 
 29  AGE                                  306 non-null    object 
 30  ABORIGINAL                           7 non-null      object 
 31  TORRES_STRAIT                        0 non-null      object 
 32  SOUTH_SEA                            3 non-null      object 
 33  DISABILITY                           8 non-null      object 
 34  NESB                                 9 non-null      object 
dtypes: bool(18), float64(2), int64(1), object(14)
memory usage: 49.2+ KB
None

Observations: DETE

Resigning accounts for 38% of all the reasons for leaving the DETE Institute.

Verify the Data: DETE

In [6]:
# Use the _pd.str.strip & replace & get_ functions to separate year in ['CEASE_DATE'] column.

dete_resignations = dete_resignations.copy()
print(dete_resignations.loc[:, 'CEASE_DATE'].value_counts())
print('\n')

CEASE_DATE_CLEANED = dete_resignations.loc[:, 'CEASE_DATE'].str.strip().str.replace('/', '/ ').str.split().str.get(-1)
dete_resignations.loc[:, 'CEASE_DATE_CLEANED'] = CEASE_DATE_CLEANED
print(dete_resignations.loc[:, 'CEASE_DATE_CLEANED'].value_counts)

# Convert years from string to numeric - float in order to execute subtraction. 

dete_resignations.loc[:, 'CEASE_DATE_CLEANED'] = dete_resignations.loc[:, 'CEASE_DATE_CLEANED'].astype(float)

print(dete_resignations.loc[:, 'CEASE_DATE_CLEANED'].unique())
print('\n')
dete_resignations.loc[:, 'CEASE_DATE_CLEANED'] = dete_resignations.loc[:, 'CEASE_DATE_CLEANED'].dropna()
dete_resignations.loc[:, 'CEASE_DATE_CLEANED'].sort_index()
print(dete_resignations.loc[:, 'CEASE_DATE_CLEANED'].value_counts())
print('\n')
dete_resignations.tail()

print(dete_resignations.loc[:, 'DETE_START_DATE'].value_counts())

# Calculate difference between start and cease dates in order to determine years of service.

INSTITUTE_SERVICE = dete_resignations.loc[:, 'CEASE_DATE_CLEANED'] - dete_resignations.loc[:, 'DETE_START_DATE']
dete_resignations.loc[:, 'INSTITUTE_SERVICE'] = INSTITUTE_SERVICE
dete_resignations.loc[:, 'INSTITUTE_SERVICE'].value_counts()
2012       126
2013        74
01/2014     22
12/2013     17
06/2013     14
09/2013     11
11/2013      9
07/2013      9
10/2013      6
08/2013      4
05/2012      2
05/2013      2
07/2012      1
2010         1
09/2010      1
07/2006      1
Name: CEASE_DATE, dtype: int64


<bound method IndexOpsMixin.value_counts of 3      2012
5      2012
8      2012
9      2012
11     2012
       ... 
808    2013
815    2014
816    2014
819    2014
821    2013
Name: CEASE_DATE_CLEANED, Length: 311, dtype: object>
[2012. 2013. 2010. 2014.   nan 2006.]


2013.0    146
2012.0    129
2014.0     22
2010.0      2
2006.0      1
Name: CEASE_DATE_CLEANED, dtype: int64


2011.0    24
2008.0    22
2007.0    21
2012.0    21
2010.0    17
2005.0    15
2004.0    14
2009.0    13
2006.0    13
2013.0    10
2000.0     9
1999.0     8
1996.0     6
2002.0     6
1992.0     6
1998.0     6
2003.0     6
1994.0     6
1993.0     5
1990.0     5
1980.0     5
1997.0     5
1991.0     4
1989.0     4
1988.0     4
1995.0     4
2001.0     3
1985.0     3
1986.0     3
1983.0     2
1976.0     2
1974.0     2
1971.0     1
1972.0     1
1984.0     1
1982.0     1
1987.0     1
1975.0     1
1973.0     1
1977.0     1
1963.0     1
Name: DETE_START_DATE, dtype: int64
Out[6]:
5.0     23
1.0     22
3.0     20
0.0     20
6.0     17
4.0     16
9.0     14
2.0     14
7.0     13
13.0     8
8.0      8
20.0     7
15.0     7
10.0     6
22.0     6
14.0     6
17.0     6
12.0     6
16.0     5
18.0     5
23.0     4
11.0     4
24.0     4
39.0     3
19.0     3
21.0     3
32.0     3
28.0     2
26.0     2
25.0     2
30.0     2
36.0     2
29.0     1
33.0     1
42.0     1
27.0     1
41.0     1
35.0     1
38.0     1
34.0     1
49.0     1
31.0     1
Name: INSTITUTE_SERVICE, dtype: int64
In [7]:
# create a function and use with _applymap_ to consolidate
# True and False results into one column ['dissatisfied'].

def update_vals(element):
    if element == False:
        return False
    elif pd.isnull(element):
        return np.nan
    else:
        return True

columns = ['JOB_DISSATISFACTION', 'DISSATISFACTION_WITH_THE_DEPARTMENT',
       'PHYSICAL_WORK_ENVIRONMENT', 'LACK_OF_RECOGNITION',
       'LACK_OF_JOB_SECURITY', 'WORK_LOCATION', 'EMPLOYMENT_CONDITIONS',
       'WORK_LIFE_BALANCE', 'WORKLOAD',]    
    
# Use _applymap_ function to consolidate True and False results into one column ['dissatisfied'].
    
dete_resignations.loc[:, 'dissatisfied'] = dete_resignations[columns].applymap(update_vals).any(1, skipna=False)

dete_resignations_up = dete_resignations.copy()
print("False means resigned for other than being dissatisfied.")
print("True means resigned due to being dissatisfied in some particular way.")
print('\n')
print(dete_resignations_up.loc[:, 'dissatisfied'].value_counts(dropna=False))
dete_resignations_up.head()
False means resigned for other than being dissatisfied.
True means resigned due to being dissatisfied in some particular way.


False    162
True     149
Name: dissatisfied, dtype: int64
Out[7]:
ID SEPARATIONTYPE CEASE_DATE DETE_START_DATE ROLE_START_DATE POSITION CLASSIFICATION REGION BUSINESS_UNIT EMPLOYMENT_STATUS ... GENDER AGE ABORIGINAL TORRES_STRAIT SOUTH_SEA DISABILITY NESB CEASE_DATE_CLEANED INSTITUTE_SERVICE dissatisfied
3 4 Resignation-Other reasons 05/2012 2005.0 2006.0 Teacher Primary Central Queensland NaN Permanent Full-time ... Female 36-40 NaN NaN NaN NaN NaN 2012.0 7.0 False
5 6 Resignation-Other reasons 05/2012 1994.0 1997.0 Guidance Officer NaN Central Office Education Queensland Permanent Full-time ... Female 41-45 NaN NaN NaN NaN NaN 2012.0 18.0 True
8 9 Resignation-Other reasons 07/2012 2009.0 2009.0 Teacher Secondary North Queensland NaN Permanent Full-time ... Female 31-35 NaN NaN NaN NaN NaN 2012.0 3.0 False
9 10 Resignation-Other employer 2012 1997.0 2008.0 Teacher Aide NaN NaN NaN Permanent Part-time ... Female 46-50 NaN NaN NaN NaN NaN 2012.0 15.0 True
11 12 Resignation-Move overseas/interstate 2012 2009.0 2009.0 Teacher Secondary Far North Queensland NaN Permanent Full-time ... Male 31-35 NaN NaN NaN NaN NaN 2012.0 3.0 False

5 rows × 38 columns

Observations: DETE

The cease_date had many different forms in the original datafile. It had to be standardized to year only in order to calculate years of service by subtracting start_date from it.

I also extracted data associated with "dissatisfied" as a reason for resigning from many different columns and consolidated it into one new column.

Create a New Column: DETE

In [8]:
# Create a new column designated as ['institute'] with every row being 'DETE'.

dete_resignations_up.loc[:, 'institute'] = 'DETE'

dete_resignations_up.columns = dete_resignations_up.columns.str.strip().str.lower()
print(dete_resignations_up.head())
dete_resignations_up.loc[:, 'institute_service'].value_counts(dropna=False)
    id                        separationtype cease_date  dete_start_date  \
3    4             Resignation-Other reasons    05/2012           2005.0   
5    6             Resignation-Other reasons    05/2012           1994.0   
8    9             Resignation-Other reasons    07/2012           2009.0   
9   10            Resignation-Other employer       2012           1997.0   
11  12  Resignation-Move overseas/interstate       2012           2009.0   

    role_start_date          position classification                region  \
3            2006.0           Teacher        Primary    Central Queensland   
5            1997.0  Guidance Officer            NaN        Central Office   
8            2009.0           Teacher      Secondary      North Queensland   
9            2008.0      Teacher Aide            NaN                   NaN   
11           2009.0           Teacher      Secondary  Far North Queensland   

           business_unit    employment_status  ...    age  aboriginal  \
3                    NaN  Permanent Full-time  ...  36-40         NaN   
5   Education Queensland  Permanent Full-time  ...  41-45         NaN   
8                    NaN  Permanent Full-time  ...  31-35         NaN   
9                    NaN  Permanent Part-time  ...  46-50         NaN   
11                   NaN  Permanent Full-time  ...  31-35         NaN   

    torres_strait  south_sea  disability  nesb  cease_date_cleaned  \
3             NaN        NaN         NaN   NaN              2012.0   
5             NaN        NaN         NaN   NaN              2012.0   
8             NaN        NaN         NaN   NaN              2012.0   
9             NaN        NaN         NaN   NaN              2012.0   
11            NaN        NaN         NaN   NaN              2012.0   

    institute_service  dissatisfied  institute  
3                 7.0         False       DETE  
5                18.0          True       DETE  
8                 3.0         False       DETE  
9                15.0          True       DETE  
11                3.0         False       DETE  

[5 rows x 39 columns]
Out[8]:
NaN     38
5.0     23
1.0     22
3.0     20
0.0     20
6.0     17
4.0     16
9.0     14
2.0     14
7.0     13
13.0     8
8.0      8
20.0     7
15.0     7
12.0     6
22.0     6
17.0     6
10.0     6
14.0     6
16.0     5
18.0     5
24.0     4
23.0     4
11.0     4
39.0     3
32.0     3
19.0     3
21.0     3
36.0     2
30.0     2
25.0     2
28.0     2
26.0     2
29.0     1
42.0     1
38.0     1
27.0     1
41.0     1
35.0     1
49.0     1
34.0     1
33.0     1
31.0     1
Name: institute_service, dtype: int64

Observations: DETE

The purpose of creating a new column "insitute" with 'DETE' as the input was to distinguish the DETE file from the TAFE file after they are combined.

I also changed the column titles to lower case to match the TAFE column titles.

Clean the Service Column: DETE

In [9]:
# Allocate the years of service into four categorical groups:
# New < 3 years
# Experienced 3-6 years
# Established 7-10 yers
# Veteran > 10 years

def update_vals(element):
    if element < 3.0:
        return 'New'
    elif 3 <= element < 7:
        return 'Experienced'
    elif 6  < element < 11:
        return 'Established'
    elif element > 10:
        return 'Veteran'
    elif pd.isnull(element):
        return np.nan
    
dete_resignations_up.loc[:, 'service_cat'] = dete_resignations_up.loc[:, 'institute_service'].map(update_vals)
dete_resignations_up.head()

dete_resignations_up.loc[:, 'service_cat'].value_counts(dropna=False)
Out[9]:
Veteran        100
Experienced     76
New             56
Established     41
NaN             38
Name: service_cat, dtype: int64

Observations: DETE

Rather than making a bar graph with as many distinct bars as there are distinct numbers of years of service, I grouped the years into categorical groups covering ranges of years of service. I condensed the number of groups to four as shown above in the output.

Perform Initial Analysis: DETE

In [10]:
# Use _pivot_table_ to determine percentage of employees resigning
# due to dissatisfaction for each years of service group.

dete_resignations_up.loc[:, 'dissatisfied'] = dete_resignations_up.loc[:, 'dissatisfied'].fillna(False)
print(dete_resignations_up.loc[:, 'dissatisfied'].value_counts(dropna=False))
print('\n')
Percent_Average = 100*(149/(154+149))
print("Percent Average DETE Employees Dissatisfied Regardless of Years of Service =", Percent_Average)
print('\n')
diss_dete_pct1 = dete_resignations_up.pivot_table(index='service_cat', values='dissatisfied')
print(diss_dete_pct1.info)
False    162
True     149
Name: dissatisfied, dtype: int64


Percent Average DETE Employees Dissatisfied Regardless of Years of Service = 49.17491749174918


<bound method DataFrame.info of              dissatisfied
service_cat              
Established      0.609756
Experienced      0.460526
New              0.375000
Veteran          0.560000>
In [11]:
# Use matplotlib dunction to plt a bar chart to provide visual
# comparison between yrs of service groups for percent employees
# resigning due to some kind of dissatisfaction.

%matplotlib inline
diss_dete_pct1.plot(kind='bar', rot=30, ylim=(0,0.75))

for i in range(4):
    plt.text(i-0.14, 0.2, round(diss_dete_pct1.loc[:, 'dissatisfied'], 3).iloc[i], c='k')
    
plt.xlabel('Years of Service Categories')
plt.ylabel('Proportion of Employees Dissatisfied')
plt.title('Proportion Dissatisfied by Service Categories: DETE')
Out[11]:
Text(0.5, 1.0, 'Proportion Dissatisfied by Service Categories: DETE')

Answers to Question 1: DETE

Question 1:

1. Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?

Answers:

The answer to both questions above is YES.

For employees classified as "New" (less than 3 years of service), about 38% resigned due to some kind of dissatisfaction. For employees with more than 6 years of service (Established and Veteran), about 61% and 56% respectively resigned due to some kind of dissatisfaction. The percent for employees classified as Experienced (between 3-6 years of srvice) had 46% dissatisfied.

The average percent across all years of service was 49%. To me that seems fairly high.

From the graph results shown above, it appears that the percentage of employees who resigned for "dissatisfied" reasons, increases as years of service increases. I can surmise why that might be; such as, the longer one is in service, the greater chance to see significant changes in management (good or bad), changes in moral, changes in institute direction, etc.

I will not conclude anything further at this point but will wait until analysis is complete for both institutes.

Consolidate Age into Ranges: DETE

In [12]:
# Consolidate the employee age into five age range groups:

print(dete_resignations_up.loc[:, 'age'].value_counts(dropna=False))
print('\n')

def update_vals(element):
    if element == '20 or younger':
        return 'Twenties'
    if element == '21-25':
        return 'Twenties'
    if element == '26-30':
        return 'Twenties'
    if element == '31-35':
        return 'Thirties'
    if element == '36-40':
        return 'Thirties'
    if element == '41-45':
        return 'Forties'
    if element == '46-50':
        return 'Forties'
    if element == '51-55':
        return 'Fifties'
    if element == '56-60':
        return 'Fifties'
    if element == '61 or older':
        return 'Above Sixty'
    elif pd.isnull(element):
        return np.nan

    
dete_resignations_up.loc[:, 'age_groups'] = dete_resignations_up.loc[:, 'age'].map(update_vals)
dete_resignations_up.head()

dete_resignations_up.loc[:, 'age_groups'].value_counts(dropna=False)
41-45            48
46-50            42
36-40            41
26-30            35
51-55            32
21-25            29
31-35            29
56-60            26
61 or older      23
NaN               5
20 or younger     1
Name: age, dtype: int64


Out[12]:
Forties        90
Thirties       70
Twenties       65
Fifties        58
Above Sixty    23
NaN             5
Name: age_groups, dtype: int64
In [13]:
# Use _pivot_table_ to determine percentage of employees resigning
# due to dissatisfaction for each age group.

dete_resignations_up.loc[:, 'dissatisfied'] = dete_resignations_up.loc[:, 'dissatisfied'].fillna(False)
print(dete_resignations_up.loc[:, 'dissatisfied'].value_counts(dropna=False))
print('\n')
Percent_Average = 100*(149/(154+149))
print("Percent Average DETE Employees Dissatisfied Regardless Their Age =", Percent_Average)
print('\n')
diss_dete_pct2 = dete_resignations_up.pivot_table(index='age_groups', values='dissatisfied')
print(diss_dete_pct2.info)
False    162
True     149
Name: dissatisfied, dtype: int64


Percent Average DETE Employees Dissatisfied Regardless Their Age = 49.17491749174918


<bound method DataFrame.info of              dissatisfied
age_groups               
Above Sixty      0.521739
Fifties          0.586207
Forties          0.466667
Thirties         0.457143
Twenties         0.446154>
In [14]:
# Use matplotlib function to plot a bar chart to provide visual
# comparison between age groups for percent employees resigning
# due to some kind of dissatisfaction.

%matplotlib inline
diss_dete_pct2.plot(kind='bar', rot=30, ylim=(0,0.75))

for i in range(5):
    plt.text(i-0.22, 0.25, round(diss_dete_pct2.loc[:, 'dissatisfied'], 3).iloc[i], c='k')
    
plt.xlabel('Age Group Categories')
plt.ylabel('Proportion of Employees Dissatisfied')
plt.title('Proportion Dissatisfied by Age Group: DETE')
Out[14]:
Text(0.5, 1.0, 'Proportion Dissatisfied by Age Group: DETE')

Answers to Question 2: DETE

Question 2:

Are younger employees resigning due to some kind of dissatisfaction? What about older employees?

Answers:

The answer to both questions above is YES.

There's very little difference in percent dissatisfied for employees in there twenties, thirties and forties: between 45% to 47%. There's a significant increase in percent for employees in their fifties: 59%, followed by a drop off for employees over 60: 52%.

The total range of percentage among the age groups is 14, which is much less than the total range for years of service groups: 23.5.



TAFE INSTITUTE ANALYSIS

NOTE:

**I will not write observations after each series of executions for the TAFE file in that they would basically be the same as the ones I wrote for the DETE file above.**

Read in Data: TAFE

In [15]:
# read in and save the TAFE Institution *.csv data files as DataFrame.
# not all missing data is signified as NaN, some are as 'Not Stated'.
# add na_values function to standardize missing data as NaN.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
tafe_survey = pd.read_csv('tafe_survey.csv', na_values=['Not Stated'])
print('\n')
print(tafe_survey.columns)
print('\n')
print(tafe_survey.head())
print('\n')
print(tafe_survey.info)
Index(['Record ID', 'Institute', 'WorkArea', 'CESSATION YEAR',
       'Reason for ceasing employment',
       'Contributing Factors. Career Move - Public Sector ',
       'Contributing Factors. Career Move - Private Sector ',
       'Contributing Factors. Career Move - Self-employment',
       'Contributing Factors. Ill Health',
       'Contributing Factors. Maternity/Family',
       'Contributing Factors. Dissatisfaction',
       'Contributing Factors. Job Dissatisfaction',
       'Contributing Factors. Interpersonal Conflict',
       'Contributing Factors. Study', 'Contributing Factors. Travel',
       'Contributing Factors. Other', 'Contributing Factors. NONE',
       'Main Factor. Which of these was the main factor for leaving?',
       'InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction',
       'InstituteViews. Topic:2. I was given access to skills training to help me do my job better',
       'InstituteViews. Topic:3. I was given adequate opportunities for personal development',
       'InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL%',
       'InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had',
       'InstituteViews. Topic:6. The organisation recognised when staff did good work',
       'InstituteViews. Topic:7. Management was generally supportive of me',
       'InstituteViews. Topic:8. Management was generally supportive of my team',
       'InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me',
       'InstituteViews. Topic:10. Staff morale was positive within the Institute',
       'InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly',
       'InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently',
       'InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly',
       'WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit',
       'WorkUnitViews. Topic:15. I worked well with my colleagues',
       'WorkUnitViews. Topic:16. My job was challenging and interesting',
       'WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work',
       'WorkUnitViews. Topic:18. I had sufficient contact with other people in my job',
       'WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job',
       'WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job',
       'WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT]',
       'WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job',
       'WorkUnitViews. Topic:23. My job provided sufficient variety',
       'WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job',
       'WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction',
       'WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance',
       'WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area',
       'WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date',
       'WorkUnitViews. Topic:29. There was adequate communication between staff in my unit',
       'WorkUnitViews. Topic:30. Staff morale was positive within my work unit',
       'Induction. Did you undertake Workplace Induction?',
       'InductionInfo. Topic:Did you undertake a Corporate Induction?',
       'InductionInfo. Topic:Did you undertake a Institute Induction?',
       'InductionInfo. Topic: Did you undertake Team Induction?',
       'InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted?',
       'InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted?',
       'InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction?',
       'InductionInfo. Face to Face Topic:Did you undertake a Institute Induction?',
       'InductionInfo. On-line Topic:Did you undertake a Institute Induction?',
       'InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction?',
       'InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category?',
       'InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.]',
       'InductionInfo. Induction Manual Topic: Did you undertake Team Induction?',
       'Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)?',
       'Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?',
       'Workplace. Topic:Does your workplace promote and practice the principles of employment equity?',
       'Workplace. Topic:Does your workplace value the diversity of its employees?',
       'Workplace. Topic:Would you recommend the Institute as an employer to others?',
       'Gender. What is your Gender?', 'CurrentAge. Current Age',
       'Employment Type. Employment Type', 'Classification. Classification',
       'LengthofServiceOverall. Overall Length of Service at Institute (in years)',
       'LengthofServiceCurrent. Length of Service at current workplace (in years)'],
      dtype='object')


      Record ID                              Institute  \
0  6.341330e+17  Southern Queensland Institute of TAFE   
1  6.341337e+17            Mount Isa Institute of TAFE   
2  6.341388e+17            Mount Isa Institute of TAFE   
3  6.341399e+17            Mount Isa Institute of TAFE   
4  6.341466e+17  Southern Queensland Institute of TAFE   

                   WorkArea  CESSATION YEAR Reason for ceasing employment  \
0  Non-Delivery (corporate)          2010.0              Contract Expired   
1  Non-Delivery (corporate)          2010.0                    Retirement   
2       Delivery (teaching)          2010.0                    Retirement   
3  Non-Delivery (corporate)          2010.0                   Resignation   
4       Delivery (teaching)          2010.0                   Resignation   

  Contributing Factors. Career Move - Public Sector   \
0                                                NaN   
1                                                  -   
2                                                  -   
3                                                  -   
4                                                  -   

  Contributing Factors. Career Move - Private Sector   \
0                                                NaN    
1                                                  -    
2                                                  -    
3                                                  -    
4                       Career Move - Private Sector    

  Contributing Factors. Career Move - Self-employment  \
0                                                NaN    
1                                                  -    
2                                                  -    
3                                                  -    
4                                                  -    

  Contributing Factors. Ill Health Contributing Factors. Maternity/Family  \
0                              NaN                                    NaN   
1                                -                                      -   
2                                -                                      -   
3                                -                                      -   
4                                -                                      -   

   ...  \
0  ...   
1  ...   
2  ...   
3  ...   
4  ...   

  Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?  \
0                                                Yes                                                            
1                                                Yes                                                            
2                                                Yes                                                            
3                                                Yes                                                            
4                                                Yes                                                            

  Workplace. Topic:Does your workplace promote and practice the principles of employment equity?  \
0                                                Yes                                               
1                                                Yes                                               
2                                                Yes                                               
3                                                Yes                                               
4                                                Yes                                               

  Workplace. Topic:Does your workplace value the diversity of its employees?  \
0                                                Yes                           
1                                                Yes                           
2                                                Yes                           
3                                                Yes                           
4                                                Yes                           

  Workplace. Topic:Would you recommend the Institute as an employer to others?  \
0                                                Yes                             
1                                                Yes                             
2                                                Yes                             
3                                                Yes                             
4                                                Yes                             

  Gender. What is your Gender? CurrentAge. Current Age  \
0                       Female                  26  30   
1                          NaN                     NaN   
2                          NaN                     NaN   
3                          NaN                     NaN   
4                         Male                  41  45   

  Employment Type. Employment Type Classification. Classification  \
0              Temporary Full-time            Administration (AO)   
1                              NaN                            NaN   
2                              NaN                            NaN   
3                              NaN                            NaN   
4              Permanent Full-time        Teacher (including LVT)   

  LengthofServiceOverall. Overall Length of Service at Institute (in years)  \
0                                                1-2                          
1                                                NaN                          
2                                                NaN                          
3                                                NaN                          
4                                                3-4                          

  LengthofServiceCurrent. Length of Service at current workplace (in years)  
0                                                1-2                         
1                                                NaN                         
2                                                NaN                         
3                                                NaN                         
4                                                3-4                         

[5 rows x 72 columns]


<bound method DataFrame.info of         Record ID                              Institute  \
0    6.341330e+17  Southern Queensland Institute of TAFE   
1    6.341337e+17            Mount Isa Institute of TAFE   
2    6.341388e+17            Mount Isa Institute of TAFE   
3    6.341399e+17            Mount Isa Institute of TAFE   
4    6.341466e+17  Southern Queensland Institute of TAFE   
..            ...                                    ...   
697  6.350668e+17         Barrier Reef Institute of TAFE   
698  6.350677e+17  Southern Queensland Institute of TAFE   
699  6.350704e+17       Tropical North Institute of TAFE   
700  6.350712e+17      Southbank Institute of Technology   
701  6.350730e+17       Tropical North Institute of TAFE   

                     WorkArea  CESSATION YEAR Reason for ceasing employment  \
0    Non-Delivery (corporate)          2010.0              Contract Expired   
1    Non-Delivery (corporate)          2010.0                    Retirement   
2         Delivery (teaching)          2010.0                    Retirement   
3    Non-Delivery (corporate)          2010.0                   Resignation   
4         Delivery (teaching)          2010.0                   Resignation   
..                        ...             ...                           ...   
697       Delivery (teaching)          2013.0                   Resignation   
698  Non-Delivery (corporate)          2013.0                   Resignation   
699       Delivery (teaching)          2013.0                   Resignation   
700  Non-Delivery (corporate)          2013.0              Contract Expired   
701  Non-Delivery (corporate)          2013.0                   Resignation   

    Contributing Factors. Career Move - Public Sector   \
0                                                  NaN   
1                                                    -   
2                                                    -   
3                                                    -   
4                                                    -   
..                                                 ...   
697                        Career Move - Public Sector   
698                        Career Move - Public Sector   
699                                                  -   
700                                                NaN   
701                                                  -   

    Contributing Factors. Career Move - Private Sector   \
0                                                  NaN    
1                                                    -    
2                                                    -    
3                                                    -    
4                         Career Move - Private Sector    
..                                                 ...    
697                                                  -    
698                                                  -    
699                                                  -    
700                                                NaN    
701                                                  -    

    Contributing Factors. Career Move - Self-employment  \
0                                                  NaN    
1                                                    -    
2                                                    -    
3                                                    -    
4                                                    -    
..                                                 ...    
697                                                  -    
698                                                  -    
699                                                  -    
700                                                NaN    
701                      Career Move - Self-employment    

    Contributing Factors. Ill Health Contributing Factors. Maternity/Family  \
0                                NaN                                    NaN   
1                                  -                                      -   
2                                  -                                      -   
3                                  -                                      -   
4                                  -                                      -   
..                               ...                                    ...   
697                                -                                      -   
698                                -                                      -   
699                                -                                      -   
700                              NaN                                    NaN   
701                                -                                      -   

     ...  \
0    ...   
1    ...   
2    ...   
3    ...   
4    ...   
..   ...   
697  ...   
698  ...   
699  ...   
700  ...   
701  ...   

    Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?  \
0                                                  Yes                                                            
1                                                  Yes                                                            
2                                                  Yes                                                            
3                                                  Yes                                                            
4                                                  Yes                                                            
..                                                 ...                                                            
697                                                Yes                                                            
698                                                NaN                                                            
699                                                Yes                                                            
700                                                 No                                                            
701                                                Yes                                                            

    Workplace. Topic:Does your workplace promote and practice the principles of employment equity?  \
0                                                  Yes                                               
1                                                  Yes                                               
2                                                  Yes                                               
3                                                  Yes                                               
4                                                  Yes                                               
..                                                 ...                                               
697                                                Yes                                               
698                                                NaN                                               
699                                                Yes                                               
700                                                 No                                               
701                                                Yes                                               

    Workplace. Topic:Does your workplace value the diversity of its employees?  \
0                                                  Yes                           
1                                                  Yes                           
2                                                  Yes                           
3                                                  Yes                           
4                                                  Yes                           
..                                                 ...                           
697                                                Yes                           
698                                                NaN                           
699                                                Yes                           
700                                                Yes                           
701                                                Yes                           

    Workplace. Topic:Would you recommend the Institute as an employer to others?  \
0                                                  Yes                             
1                                                  Yes                             
2                                                  Yes                             
3                                                  Yes                             
4                                                  Yes                             
..                                                 ...                             
697                                                Yes                             
698                                                NaN                             
699                                                Yes                             
700                                                 No                             
701                                                Yes                             

    Gender. What is your Gender? CurrentAge. Current Age  \
0                         Female                  26  30   
1                            NaN                     NaN   
2                            NaN                     NaN   
3                            NaN                     NaN   
4                           Male                  41  45   
..                           ...                     ...   
697                         Male                   51-55   
698                          NaN                     NaN   
699                       Female                   51-55   
700                       Female                  41  45   
701                       Female                  26  30   

    Employment Type. Employment Type Classification. Classification  \
0                Temporary Full-time            Administration (AO)   
1                                NaN                            NaN   
2                                NaN                            NaN   
3                                NaN                            NaN   
4                Permanent Full-time        Teacher (including LVT)   
..                               ...                            ...   
697              Temporary Full-time        Teacher (including LVT)   
698                              NaN                            NaN   
699              Permanent Full-time        Teacher (including LVT)   
700              Temporary Full-time      Professional Officer (PO)   
701                  Contract/casual            Administration (AO)   

    LengthofServiceOverall. Overall Length of Service at Institute (in years)  \
0                                                  1-2                          
1                                                  NaN                          
2                                                  NaN                          
3                                                  NaN                          
4                                                  3-4                          
..                                                 ...                          
697                                                1-2                          
698                                                NaN                          
699                                                5-6                          
700                                                1-2                          
701                                                3-4                          

    LengthofServiceCurrent. Length of Service at current workplace (in years)  
0                                                  1-2                         
1                                                  NaN                         
2                                                  NaN                         
3                                                  NaN                         
4                                                  3-4                         
..                                                 ...                         
697                                                1-2                         
698                                                NaN                         
699                                                1-2                         
700                                                1-2                         
701                                                1-2                         

[702 rows x 72 columns]>

Identify Missing Values and Drop Unnecessary Columns: TAFE

In [16]:
# Observe volume of missing data by column.

print(tafe_survey.isnull().sum())
Record ID                                                                      0
Institute                                                                      0
WorkArea                                                                       0
CESSATION YEAR                                                                 7
Reason for ceasing employment                                                  1
                                                                            ... 
CurrentAge. Current Age                                                      106
Employment Type. Employment Type                                             106
Classification. Classification                                               106
LengthofServiceOverall. Overall Length of Service at Institute (in years)    106
LengthofServiceCurrent. Length of Service at current workplace (in years)    106
Length: 72, dtype: int64
In [17]:
# There are many columns which are not relevant to answering the research questions as described in the introduction.
# Use appropriate function to drop the following columns.

labels = tafe_survey.columns[17:66]
tafe_survey_updated = tafe_survey.drop(columns=labels, axis=1)
print(tafe_survey_updated.info)
print('\n')
print(tafe_survey_updated.columns)
<bound method DataFrame.info of         Record ID                              Institute  \
0    6.341330e+17  Southern Queensland Institute of TAFE   
1    6.341337e+17            Mount Isa Institute of TAFE   
2    6.341388e+17            Mount Isa Institute of TAFE   
3    6.341399e+17            Mount Isa Institute of TAFE   
4    6.341466e+17  Southern Queensland Institute of TAFE   
..            ...                                    ...   
697  6.350668e+17         Barrier Reef Institute of TAFE   
698  6.350677e+17  Southern Queensland Institute of TAFE   
699  6.350704e+17       Tropical North Institute of TAFE   
700  6.350712e+17      Southbank Institute of Technology   
701  6.350730e+17       Tropical North Institute of TAFE   

                     WorkArea  CESSATION YEAR Reason for ceasing employment  \
0    Non-Delivery (corporate)          2010.0              Contract Expired   
1    Non-Delivery (corporate)          2010.0                    Retirement   
2         Delivery (teaching)          2010.0                    Retirement   
3    Non-Delivery (corporate)          2010.0                   Resignation   
4         Delivery (teaching)          2010.0                   Resignation   
..                        ...             ...                           ...   
697       Delivery (teaching)          2013.0                   Resignation   
698  Non-Delivery (corporate)          2013.0                   Resignation   
699       Delivery (teaching)          2013.0                   Resignation   
700  Non-Delivery (corporate)          2013.0              Contract Expired   
701  Non-Delivery (corporate)          2013.0                   Resignation   

    Contributing Factors. Career Move - Public Sector   \
0                                                  NaN   
1                                                    -   
2                                                    -   
3                                                    -   
4                                                    -   
..                                                 ...   
697                        Career Move - Public Sector   
698                        Career Move - Public Sector   
699                                                  -   
700                                                NaN   
701                                                  -   

    Contributing Factors. Career Move - Private Sector   \
0                                                  NaN    
1                                                    -    
2                                                    -    
3                                                    -    
4                         Career Move - Private Sector    
..                                                 ...    
697                                                  -    
698                                                  -    
699                                                  -    
700                                                NaN    
701                                                  -    

    Contributing Factors. Career Move - Self-employment  \
0                                                  NaN    
1                                                    -    
2                                                    -    
3                                                    -    
4                                                    -    
..                                                 ...    
697                                                  -    
698                                                  -    
699                                                  -    
700                                                NaN    
701                      Career Move - Self-employment    

    Contributing Factors. Ill Health Contributing Factors. Maternity/Family  \
0                                NaN                                    NaN   
1                                  -                                      -   
2                                  -                                      -   
3                                  -                                      -   
4                                  -                                      -   
..                               ...                                    ...   
697                                -                                      -   
698                                -                                      -   
699                                -                                      -   
700                              NaN                                    NaN   
701                                -                                      -   

     ... Contributing Factors. Study Contributing Factors. Travel  \
0    ...                         NaN                          NaN   
1    ...                           -                       Travel   
2    ...                           -                            -   
3    ...                           -                       Travel   
4    ...                           -                            -   
..   ...                         ...                          ...   
697  ...                           -                            -   
698  ...                           -                            -   
699  ...                           -                            -   
700  ...                         NaN                          NaN   
701  ...                           -                       Travel   

    Contributing Factors. Other Contributing Factors. NONE  \
0                           NaN                        NaN   
1                             -                          -   
2                             -                       NONE   
3                             -                          -   
4                             -                          -   
..                          ...                        ...   
697                           -                          -   
698                           -                          -   
699                       Other                          -   
700                         NaN                        NaN   
701                           -                          -   

    Gender. What is your Gender? CurrentAge. Current Age  \
0                         Female                  26  30   
1                            NaN                     NaN   
2                            NaN                     NaN   
3                            NaN                     NaN   
4                           Male                  41  45   
..                           ...                     ...   
697                         Male                   51-55   
698                          NaN                     NaN   
699                       Female                   51-55   
700                       Female                  41  45   
701                       Female                  26  30   

    Employment Type. Employment Type Classification. Classification  \
0                Temporary Full-time            Administration (AO)   
1                                NaN                            NaN   
2                                NaN                            NaN   
3                                NaN                            NaN   
4                Permanent Full-time        Teacher (including LVT)   
..                               ...                            ...   
697              Temporary Full-time        Teacher (including LVT)   
698                              NaN                            NaN   
699              Permanent Full-time        Teacher (including LVT)   
700              Temporary Full-time      Professional Officer (PO)   
701                  Contract/casual            Administration (AO)   

    LengthofServiceOverall. Overall Length of Service at Institute (in years)  \
0                                                  1-2                          
1                                                  NaN                          
2                                                  NaN                          
3                                                  NaN                          
4                                                  3-4                          
..                                                 ...                          
697                                                1-2                          
698                                                NaN                          
699                                                5-6                          
700                                                1-2                          
701                                                3-4                          

    LengthofServiceCurrent. Length of Service at current workplace (in years)  
0                                                  1-2                         
1                                                  NaN                         
2                                                  NaN                         
3                                                  NaN                         
4                                                  3-4                         
..                                                 ...                         
697                                                1-2                         
698                                                NaN                         
699                                                1-2                         
700                                                1-2                         
701                                                1-2                         

[702 rows x 23 columns]>


Index(['Record ID', 'Institute', 'WorkArea', 'CESSATION YEAR',
       'Reason for ceasing employment',
       'Contributing Factors. Career Move - Public Sector ',
       'Contributing Factors. Career Move - Private Sector ',
       'Contributing Factors. Career Move - Self-employment',
       'Contributing Factors. Ill Health',
       'Contributing Factors. Maternity/Family',
       'Contributing Factors. Dissatisfaction',
       'Contributing Factors. Job Dissatisfaction',
       'Contributing Factors. Interpersonal Conflict',
       'Contributing Factors. Study', 'Contributing Factors. Travel',
       'Contributing Factors. Other', 'Contributing Factors. NONE',
       'Gender. What is your Gender?', 'CurrentAge. Current Age',
       'Employment Type. Employment Type', 'Classification. Classification',
       'LengthofServiceOverall. Overall Length of Service at Institute (in years)',
       'LengthofServiceCurrent. Length of Service at current workplace (in years)'],
      dtype='object')

Clean Column Names: TAFE

In [18]:
# Use the _mapping_ code to make column titles simplified and consistent. 

print(tafe_survey_updated.columns)
print('\n')

mapping = {'Record ID': 'id', 'CESSATION YEAR': 'cease_date', 'Reason for ceasing employment': 'separationtype',
           'Gender. What is your Gender?': 'gender', 'CurrentAge. Current Age': 'age',
           'Employment Type. Employment Type': 'employment_status', 'Classification. Classification': 'position',
           'LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service',
           'LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service' }

tafe_survey_updated = tafe_survey_updated.rename(mapping, axis=1)
print(tafe_survey_updated.columns)
Index(['Record ID', 'Institute', 'WorkArea', 'CESSATION YEAR',
       'Reason for ceasing employment',
       'Contributing Factors. Career Move - Public Sector ',
       'Contributing Factors. Career Move - Private Sector ',
       'Contributing Factors. Career Move - Self-employment',
       'Contributing Factors. Ill Health',
       'Contributing Factors. Maternity/Family',
       'Contributing Factors. Dissatisfaction',
       'Contributing Factors. Job Dissatisfaction',
       'Contributing Factors. Interpersonal Conflict',
       'Contributing Factors. Study', 'Contributing Factors. Travel',
       'Contributing Factors. Other', 'Contributing Factors. NONE',
       'Gender. What is your Gender?', 'CurrentAge. Current Age',
       'Employment Type. Employment Type', 'Classification. Classification',
       'LengthofServiceOverall. Overall Length of Service at Institute (in years)',
       'LengthofServiceCurrent. Length of Service at current workplace (in years)'],
      dtype='object')


Index(['id', 'Institute', 'WorkArea', 'cease_date', 'separationtype',
       'Contributing Factors. Career Move - Public Sector ',
       'Contributing Factors. Career Move - Private Sector ',
       'Contributing Factors. Career Move - Self-employment',
       'Contributing Factors. Ill Health',
       'Contributing Factors. Maternity/Family',
       'Contributing Factors. Dissatisfaction',
       'Contributing Factors. Job Dissatisfaction',
       'Contributing Factors. Interpersonal Conflict',
       'Contributing Factors. Study', 'Contributing Factors. Travel',
       'Contributing Factors. Other', 'Contributing Factors. NONE', 'gender',
       'age', 'employment_status', 'position', 'institute_service',
       'role_service'],
      dtype='object')

Filter the Data: TAFE

In [19]:
# Observe the various types of reasons for resigning in column ['SEPEARATIONTYPE'].
# Extract only the resignation type reasons.

print(tafe_survey_updated.loc[:, 'separationtype'].value_counts())

tafe_resignations = tafe_survey_updated[tafe_survey_updated.loc[:, 'separationtype'] == 'Resignation']
tafe_resignations.copy()
print(tafe_resignations)                                        
                                        
Resignation                 340
Contract Expired            127
Retrenchment/ Redundancy    104
Retirement                   82
Transfer                     25
Termination                  23
Name: separationtype, dtype: int64
               id                              Institute  \
3    6.341399e+17            Mount Isa Institute of TAFE   
4    6.341466e+17  Southern Queensland Institute of TAFE   
5    6.341475e+17  Southern Queensland Institute of TAFE   
6    6.341520e+17         Barrier Reef Institute of TAFE   
7    6.341537e+17  Southern Queensland Institute of TAFE   
..            ...                                    ...   
696  6.350660e+17  Southern Queensland Institute of TAFE   
697  6.350668e+17         Barrier Reef Institute of TAFE   
698  6.350677e+17  Southern Queensland Institute of TAFE   
699  6.350704e+17       Tropical North Institute of TAFE   
701  6.350730e+17       Tropical North Institute of TAFE   

                     WorkArea  cease_date separationtype  \
3    Non-Delivery (corporate)      2010.0    Resignation   
4         Delivery (teaching)      2010.0    Resignation   
5         Delivery (teaching)      2010.0    Resignation   
6    Non-Delivery (corporate)      2010.0    Resignation   
7         Delivery (teaching)      2010.0    Resignation   
..                        ...         ...            ...   
696  Non-Delivery (corporate)      2013.0    Resignation   
697       Delivery (teaching)      2013.0    Resignation   
698  Non-Delivery (corporate)      2013.0    Resignation   
699       Delivery (teaching)      2013.0    Resignation   
701  Non-Delivery (corporate)      2013.0    Resignation   

    Contributing Factors. Career Move - Public Sector   \
3                                                    -   
4                                                    -   
5                                                    -   
6                                                    -   
7                                                    -   
..                                                 ...   
696                                                  -   
697                        Career Move - Public Sector   
698                        Career Move - Public Sector   
699                                                  -   
701                                                  -   

    Contributing Factors. Career Move - Private Sector   \
3                                                    -    
4                         Career Move - Private Sector    
5                                                    -    
6                         Career Move - Private Sector    
7                                                    -    
..                                                 ...    
696                       Career Move - Private Sector    
697                                                  -    
698                                                  -    
699                                                  -    
701                                                  -    

    Contributing Factors. Career Move - Self-employment  \
3                                                    -    
4                                                    -    
5                                                    -    
6                                                    -    
7                                                    -    
..                                                 ...    
696                                                  -    
697                                                  -    
698                                                  -    
699                                                  -    
701                      Career Move - Self-employment    

    Contributing Factors. Ill Health Contributing Factors. Maternity/Family  \
3                                  -                                      -   
4                                  -                                      -   
5                                  -                                      -   
6                                  -                       Maternity/Family   
7                                  -                                      -   
..                               ...                                    ...   
696                                -                                      -   
697                                -                                      -   
698                                -                                      -   
699                                -                                      -   
701                                -                                      -   

     ... Contributing Factors. Study Contributing Factors. Travel  \
3    ...                           -                       Travel   
4    ...                           -                            -   
5    ...                           -                            -   
6    ...                           -                            -   
7    ...                           -                            -   
..   ...                         ...                          ...   
696  ...                           -                            -   
697  ...                           -                            -   
698  ...                           -                            -   
699  ...                           -                            -   
701  ...                           -                       Travel   

    Contributing Factors. Other Contributing Factors. NONE  gender  \
3                             -                          -     NaN   
4                             -                          -    Male   
5                         Other                          -  Female   
6                         Other                          -    Male   
7                         Other                          -    Male   
..                          ...                        ...     ...   
696                           -                          -    Male   
697                           -                          -    Male   
698                           -                          -     NaN   
699                       Other                          -  Female   
701                           -                          -  Female   

               age    employment_status                 position  \
3              NaN                  NaN                      NaN   
4           41  45  Permanent Full-time  Teacher (including LVT)   
5      56 or older      Contract/casual  Teacher (including LVT)   
6    20 or younger  Temporary Full-time      Administration (AO)   
7           46  50  Permanent Full-time  Teacher (including LVT)   
..             ...                  ...                      ...   
696         21  25  Temporary Full-time         Operational (OO)   
697          51-55  Temporary Full-time  Teacher (including LVT)   
698            NaN                  NaN                      NaN   
699          51-55  Permanent Full-time  Teacher (including LVT)   
701         26  30      Contract/casual      Administration (AO)   

    institute_service role_service  
3                 NaN          NaN  
4                 3-4          3-4  
5                7-10         7-10  
6                 3-4          3-4  
7                 3-4          3-4  
..                ...          ...  
696               5-6          5-6  
697               1-2          1-2  
698               NaN          NaN  
699               5-6          1-2  
701               3-4          1-2  

[340 rows x 23 columns]

Verify the Data: TAFE

In [20]:
# Create a function and use with _applymap_ to consolidate
# True and False results into one column ['dissatisfied'].

def update_vals(element):
    if element == '-':
        return False
    elif pd.isnull(element):
        return np.nan
    else:
        return True

# Use _applymap_ function to consolidate True and False results into one column ['dissatisfied'].    
    
tafe_resignations.loc[:, 'dissatisfied'] = tafe_resignations.loc[:, ('Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction')].applymap(update_vals).any(1, skipna=False)


tafe_resignations_up = tafe_resignations

print(tafe_resignations_up.loc[:, 'dissatisfied'].value_counts(dropna=False))
False    241
True      91
NaN        8
Name: dissatisfied, dtype: int64
/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexing.py:845: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexing.py:966: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Create a New Column: TAFE

In [21]:
# Create a new column designated as ['institute'] with every row being 'TAFE'.

tafe_resignations_up.loc[:, 'institute'] = 'TAFE'
tafe_resignations_up.head()
tafe_resignations_up.loc[:, 'institute_service'].value_counts(dropna=False)
Out[21]:
Less than 1 year      73
1-2                   64
3-4                   63
NaN                   50
5-6                   33
11-20                 26
7-10                  21
More than 20 years    10
Name: institute_service, dtype: int64

Clean the Service Column: TAFE

In [22]:
# Allocate the years of service into four categorical groups:
# New < 3 years
# Experienced 3-6 years
# Established 7-10 yers
# Veteran > 10 years

def update_vals(element):
    if element == 'Less than 1 year':
        return 'New'
    elif element == '1-2':
        return 'New'
    elif element == '3-4':
        return 'Experienced'
    elif element == '5-6':
        return 'Experienced'
    elif element == '7-10':
        return 'Established'
    elif element == '11-20':
        return 'Veteran'
    elif element == 'More than 20 years':
        return 'Veteran'
    elif pd.isnull(element):
        return np.nan

    
tafe_resignations_up.loc[:, 'service_cat'] = tafe_resignations_up.loc[:, 'institute_service'].map(update_vals)
tafe_resignations_up.head()

tafe_resignations_up.loc[:, 'service_cat'].value_counts(dropna=False)
Out[22]:
New            137
Experienced     96
NaN             50
Veteran         36
Established     21
Name: service_cat, dtype: int64

Perform Initial Analysis: TAFE

In [23]:
# Use _pivot_table_ to determine percentage of employees resigning
# due to dissatisfaction for each years of service group.

tafe_resignations_up.loc[:, 'dissatisfied'] = tafe_resignations_up.loc[:, 'dissatisfied'].fillna(False)
print(tafe_resignations_up.loc[:, 'dissatisfied'].value_counts(dropna=False))
print('\n')
Percent_Average = 100*(91/(241+91))
print("Percent Average TAFE Employees Dissatisfied Regardless of Years of Service =", Percent_Average)
print('\n')

diss_tafe_pct1 = tafe_resignations_up.pivot_table(index='service_cat', values='dissatisfied')
print(diss_tafe_pct1.info)
False    249
True      91
Name: dissatisfied, dtype: int64


Percent Average TAFE Employees Dissatisfied Regardless of Years of Service = 27.40963855421687


<bound method DataFrame.info of              dissatisfied
service_cat              
Established      0.333333
Experienced      0.250000
New              0.262774
Veteran          0.277778>
In [24]:
# Use matplotlib dunction to plot a bar chart to provide visual
# comparison between yrs of service groups for percent employees
# resigning due to some kind of dissatisfaction.

%matplotlib inline

diss_tafe_pct1.plot(kind='bar', rot=30, ylim=(0,0.40))

for i in range(4):
    plt.text(i-0.16, 0.15, round(diss_tafe_pct1.loc[:, 'dissatisfied'], 3).iloc[i], c='k')
    
plt.xlabel('Years of Service Categories')
plt.ylabel('Proportion of Employees Dissatisfied')
plt.title('Proportion Dissatisfied by Service Categories: TAFE')
Out[24]:
Text(0.5, 1.0, 'Proportion Dissatisfied by Service Categories: TAFE')

Answers to Question 1: TAFE

Question 1:

1. Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?

Answers:

The answer to both questions above is YES.

For employees classified as "New" (less than 3 years of service), about 26% resigned due to some kind of dissatisfaction, which is very close to Experienced and Veteran groups. The Established group ws slightly higher at 33%.

**The overall range of percentages for the TAFE Institute being 8.3 is much lower than that of the DETE Institute: 23.5. Also the overall average across all years of service groups for TAFE is 27% which is much lower than that for DETE: 49%.**

To see these significant differences between the two institutes tells me it was appopriate to evaluate the results for each institute separately first before combining the data and analyzing the grouped results. We shall see if that masks the signficant differences between the two institutes.

I will not conclude anything further at this point but wait until analysis is complete for both institutes.

Consolidate Age into Ranges: TAFE

In [25]:
# Observe the data stucture in the age column to determine how to organize into groups.

tafe_resignations_up.loc[:, 'age'].value_counts(dropna=False)
Out[25]:
NaN              50
41  45           45
51-55            39
46  50           39
21  25           33
31  35           32
36  40           32
26  30           32
56 or older      29
20 or younger     9
Name: age, dtype: int64
In [26]:
# Use the _pd_str.strip and replace_ code to standardize the age group format:

tafe_resignations_up.copy()
tafe_resignations_up.loc[:, 'age_cleaned'] = tafe_resignations_up.loc[:, 'age'].str.strip().str.replace('  ', '-')

tafe_resignations_up.loc[:, 'age_cleaned'].value_counts(dropna=False)
/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexing.py:845: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexing.py:966: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Out[26]:
NaN              50
41-45            45
51-55            39
46-50            39
21-25            33
36-40            32
31-35            32
26-30            32
56 or older      29
20 or younger     9
Name: age_cleaned, dtype: int64
In [27]:
# Consolidate the employee age into five age range groups:

def update_vals(element):
    if element == '20 or younger':
        return 'Twenties'
    if element == '21-25':
        return 'Twenties'
    if element == '26-30':
        return 'Twenties'
    if element == '31-35':
        return 'Thirties'
    if element == '36-40':
        return 'Thirties'
    if element == '41-45':
        return 'Forties'
    if element == '46-50':
        return 'Forties'
    if element == '51-55':
        return 'Fifties'
    if element == '56 or older':
        return 'Above 55'
    elif pd.isnull(element):
        return np.nan
    
tafe_resignations_up.loc[:, 'age_groups'] = tafe_resignations_up.loc[:, 'age_cleaned'].map(update_vals)
tafe_resignations_up.head()

tafe_resignations_up.loc[:, 'age_groups'].value_counts(dropna=False)
Out[27]:
Forties     84
Twenties    74
Thirties    64
NaN         50
Fifties     39
Above 55    29
Name: age_groups, dtype: int64
In [28]:
# Use _pivot_table_ to determine percentage of employees resigning
# due to dissatisfaction for each age group.

tafe_resignations_up.loc[:, 'dissatisfied'] = tafe_resignations_up.loc[:, 'dissatisfied'].fillna(False)
print(tafe_resignations_up.loc[:, 'dissatisfied'].value_counts(dropna=False))
print('\n')
Percent_Average = 100*(91/(241+91))
print("Percent Average TAFE Employees Dissatisfied Regardless of Years of Service =", Percent_Average)
print('\n')
diss_tafe_pct2 = tafe_resignations_up.pivot_table(index='age_groups', values='dissatisfied')
print(diss_tafe_pct2.info)
False    249
True      91
Name: dissatisfied, dtype: int64


Percent Average TAFE Employees Dissatisfied Regardless of Years of Service = 27.40963855421687


<bound method DataFrame.info of             dissatisfied
age_groups              
Above 55        0.206897
Fifties         0.282051
Forties         0.285714
Thirties        0.250000
Twenties        0.270270>
In [29]:
# Use matplotlib function to plot a bar chart to provide visual
# comparison between age groups for percent employees resigning
# due to some kind of dissatisfaction.

%matplotlib inline
import matplotlib.pyplot as plt
diss_tafe_pct2.plot(kind='bar', rot=30, ylim=(0,0.33))

for i in range(5):
    plt.text(i-0.22, 0.12, round(diss_tafe_pct2.loc[:, 'dissatisfied'], 3).iloc[i], c='k')
    
plt.xlabel('Age Group Categories')
plt.ylabel('Proportion of Employees Dissatisfied')
plt.title('Proportion Dissatisfied by Age Group: TAFE')
Out[29]:
Text(0.5, 1.0, 'Proportion Dissatisfied by Age Group: TAFE')

Answers to Question 2: TAFE

Question 2:

Are younger employees resigning due to some kind of dissatisfaction? What about older employees?

Answers:

The answer to both questions above is YES.

There's very little difference in percent dissatisfied for employees in there twenties, thirties, forties and fifties: between 25% to 29%. There's a slight decrease in percent for employees over 55: 21%.

**The overall range of percentages for the TAFE Institute being 7.9 is much lower than that of the DETE Institute: 14. Also the overall average across all employee age groups for TAFE is 27% which is much lower than that for DETE: 49%.**

Combining the Data: DETE and TAFE

In [30]:
# USE concatenate to combine the two data sets.

combined = pd.concat([dete_resignations_up, tafe_resignations_up], axis=0, ignore_index=True)

combined.head()

print(combined.info())
print('\n')
print(combined.loc[:, 'service_cat'].value_counts(dropna=False))
print('\n')
print(combined.loc[:, 'dissatisfied'].value_counts(dropna=False))
print('\n')
print(combined.notnull().sum().sort_values())
print('\n')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 651 entries, 0 to 650
Data columns (total 57 columns):
 #   Column                                               Non-Null Count  Dtype  
---  ------                                               --------------  -----  
 0   id                                                   651 non-null    float64
 1   separationtype                                       651 non-null    object 
 2   cease_date                                           635 non-null    object 
 3   dete_start_date                                      283 non-null    float64
 4   role_start_date                                      271 non-null    float64
 5   position                                             598 non-null    object 
 6   classification                                       161 non-null    object 
 7   region                                               265 non-null    object 
 8   business_unit                                        32 non-null     object 
 9   employment_status                                    597 non-null    object 
 10  career_move_to_public_sector                         311 non-null    object 
 11  career_move_to_private_sector                        311 non-null    object 
 12  interpersonal_conflicts                              311 non-null    object 
 13  job_dissatisfaction                                  311 non-null    object 
 14  dissatisfaction_with_the_department                  311 non-null    object 
 15  physical_work_environment                            311 non-null    object 
 16  lack_of_recognition                                  311 non-null    object 
 17  lack_of_job_security                                 311 non-null    object 
 18  work_location                                        311 non-null    object 
 19  employment_conditions                                311 non-null    object 
 20  maternity/family                                     311 non-null    object 
 21  relocation                                           311 non-null    object 
 22  study/travel                                         311 non-null    object 
 23  ill_health                                           311 non-null    object 
 24  traumatic_incident                                   311 non-null    object 
 25  work_life_balance                                    311 non-null    object 
 26  workload                                             311 non-null    object 
 27  none_of_the_above                                    311 non-null    object 
 28  gender                                               592 non-null    object 
 29  age                                                  596 non-null    object 
 30  aboriginal                                           7 non-null      object 
 31  torres_strait                                        0 non-null      object 
 32  south_sea                                            3 non-null      object 
 33  disability                                           8 non-null      object 
 34  nesb                                                 9 non-null      object 
 35  cease_date_cleaned                                   300 non-null    float64
 36  institute_service                                    563 non-null    object 
 37  dissatisfied                                         651 non-null    bool   
 38  institute                                            651 non-null    object 
 39  service_cat                                          563 non-null    object 
 40  age_groups                                           596 non-null    object 
 41  Institute                                            340 non-null    object 
 42  WorkArea                                             340 non-null    object 
 43  Contributing Factors. Career Move - Public Sector    332 non-null    object 
 44  Contributing Factors. Career Move - Private Sector   332 non-null    object 
 45  Contributing Factors. Career Move - Self-employment  332 non-null    object 
 46  Contributing Factors. Ill Health                     332 non-null    object 
 47  Contributing Factors. Maternity/Family               332 non-null    object 
 48  Contributing Factors. Dissatisfaction                332 non-null    object 
 49  Contributing Factors. Job Dissatisfaction            332 non-null    object 
 50  Contributing Factors. Interpersonal Conflict         332 non-null    object 
 51  Contributing Factors. Study                          332 non-null    object 
 52  Contributing Factors. Travel                         332 non-null    object 
 53  Contributing Factors. Other                          332 non-null    object 
 54  Contributing Factors. NONE                           332 non-null    object 
 55  role_service                                         290 non-null    object 
 56  age_cleaned                                          290 non-null    object 
dtypes: bool(1), float64(4), object(52)
memory usage: 285.6+ KB
None


New            193
Experienced    172
Veteran        136
NaN             88
Established     62
Name: service_cat, dtype: int64


False    411
True     240
Name: dissatisfied, dtype: int64


torres_strait                                            0
south_sea                                                3
aboriginal                                               7
disability                                               8
nesb                                                     9
business_unit                                           32
classification                                         161
region                                                 265
role_start_date                                        271
dete_start_date                                        283
role_service                                           290
age_cleaned                                            290
cease_date_cleaned                                     300
none_of_the_above                                      311
workload                                               311
work_life_balance                                      311
traumatic_incident                                     311
study/travel                                           311
relocation                                             311
maternity/family                                       311
ill_health                                             311
work_location                                          311
lack_of_job_security                                   311
lack_of_recognition                                    311
physical_work_environment                              311
dissatisfaction_with_the_department                    311
job_dissatisfaction                                    311
interpersonal_conflicts                                311
career_move_to_private_sector                          311
career_move_to_public_sector                           311
employment_conditions                                  311
Contributing Factors. Career Move - Private Sector     332
Contributing Factors. Career Move - Public Sector      332
Contributing Factors. Maternity/Family                 332
Contributing Factors. Dissatisfaction                  332
Contributing Factors. Job Dissatisfaction              332
Contributing Factors. Travel                           332
Contributing Factors. NONE                             332
Contributing Factors. Other                            332
Contributing Factors. Study                            332
Contributing Factors. Interpersonal Conflict           332
Contributing Factors. Career Move - Self-employment    332
Contributing Factors. Ill Health                       332
Institute                                              340
WorkArea                                               340
service_cat                                            563
institute_service                                      563
gender                                                 592
age                                                    596
age_groups                                             596
employment_status                                      597
position                                               598
cease_date                                             635
institute                                              651
dissatisfied                                           651
separationtype                                         651
id                                                     651
dtype: int64


In [31]:
print(combined.loc[:, 'dissatisfied'].value_counts(dropna=False))
print('\n')

Percent_Average = 100*(240/(240+411))
print("Percent Average COMBINED Dissatisfied Regardless Their Age =", Percent_Average)
print('\n')

combined_updated = combined.dropna(axis=1, thresh = 400).copy()
not_missing = combined_updated.notnull().sum()
print(not_missing)
False    411
True     240
Name: dissatisfied, dtype: int64


Percent Average COMBINED Dissatisfied Regardless Their Age = 36.86635944700461


id                   651
separationtype       651
cease_date           635
position             598
employment_status    597
gender               592
age                  596
institute_service    563
dissatisfied         651
institute            651
service_cat          563
age_groups           596
dtype: int64
In [32]:
combined_updated_2 = combined_updated
combined_updated_2.loc[:, 'dissatisfied'] = combined_updated_2.loc[:, 'dissatisfied'].fillna(False)
diss_pct1 = combined_updated_2.pivot_table(index='service_cat', values='dissatisfied')
print(diss_pct1.info)
<bound method DataFrame.info of              dissatisfied
service_cat              
Established      0.516129
Experienced      0.343023
New              0.295337
Veteran          0.485294>
In [33]:
# Use matplotlib function to plot a bar chart to provide visual
# comparison between age groups for percent employees resigning
# due to some kind of dissatisfaction.

%matplotlib inline
diss_pct1.plot(kind='bar', rot=30, ylim=(0,0.65))

for i in range(4):
    plt.text(i-0.16, 0.15, round(diss_pct1.loc[:, 'dissatisfied'], 3).iloc[i], c='k')
    
plt.xlabel('Years of Service Categories')
plt.ylabel('Proportion of Employees Dissatisfied')
plt.title('Proportion Dissatisfied by Service Categories: Combined')
Out[33]:
Text(0.5, 1.0, 'Proportion Dissatisfied by Service Categories: Combined')

Answers to Question 1 for COMBINED Data

Question 1:

1. Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?

Answers:

The answer to both questions above is YES.

For employees classified as "New" (less than 3 years of service), about 30% resigned due to some kind of dissatisfaction. Veteran and Established groups were the higher ones at 49% and 52% respectively.

So, if a task force was commissioned to focus on the group with the higher percentage, I suppose they wouldn't choose Veteran since they are very close to retirement age. If they chose to focus on the Established group, would it be correct to use the same strategy for both institutes, assuming the Established group has the same issues for both?

In [34]:
# Use the _pd_str.strip and replace_ code to standardize the age group format.

combined_updated_2.loc[:, 'age_groups'].value_counts(dropna=False)

age_2 = combined_updated_2.loc[:, 'age'].str.strip().str.replace('  ', '-')
combined_updated_2.loc[:, 'age_cleaned'] = age_2
print(combined_updated_2.loc[:, 'age_cleaned'].value_counts(dropna=False))
41-45            93
46-50            81
36-40            73
51-55            71
26-30            67
21-25            62
31-35            61
NaN              55
56 or older      29
56-60            26
61 or older      23
20 or younger    10
Name: age_cleaned, dtype: int64
In [35]:
# Consolidate the employee age into five age range groups:

def update_vals(element):
    if element == '20 or younger':
        return 'Twenties'
    if element == '21-25':
        return 'Twenties'
    if element == '26-30':
        return 'Twenties'
    if element == '31-35':
        return 'Thirties'
    if element == '36-40':
        return 'Thirties'
    if element == '41-45':
        return 'Forties'
    if element == '46-50':
        return 'Forties'
    if element == '51-55':
        return 'Fifties'
    if element == '56-60':
        return 'Above 55'
    if element == '56 or older':
        return 'Above 55'
    if element == '61 or older':
        return 'Above 55'
    
    elif pd.isnull(element):
        return np.nan
    
combined_updated_2.loc[:, 'age_groups'] = combined_updated_2.loc[:, 'age_cleaned'].map(update_vals)
print(combined_updated_2.loc[:, 'age_groups'].value_counts(dropna=False))
Forties     174
Twenties    139
Thirties    134
Above 55     78
Fifties      71
NaN          55
Name: age_groups, dtype: int64
In [36]:
print(combined_updated_2.loc[:, 'dissatisfied'].value_counts(dropna=False))

diss_pct2 = combined_updated_2.pivot_table(index='age_groups', values='dissatisfied')
print(diss_pct2.info)
False    411
True     240
Name: dissatisfied, dtype: int64
<bound method DataFrame.info of             dissatisfied
age_groups              
Above 55        0.423077
Fifties         0.422535
Forties         0.379310
Thirties        0.358209
Twenties        0.352518>
In [37]:
# Use matplotlib function to plot a bar chart to provide visual
# comparison between age groups for percent employees resigning
# due to some kind of dissatisfaction.

%matplotlib inline
import matplotlib.pyplot as plt
diss_pct2.plot(kind='bar', rot=30, ylim=(0,0.6))

for i in range(5):
    plt.text(i-0.21, 0.2, round(diss_pct2.loc[:, 'dissatisfied'], 3).iloc[i], c='k')
    
plt.xlabel('Age Group Categories')
plt.ylabel('Proportion of Employees Dissatisfied')
plt.title('Proportion Dissatisfied by Age Group: Combined')
Out[37]:
Text(0.5, 1.0, 'Proportion Dissatisfied by Age Group: Combined')

Answers to Question 2 for Combined Data

Question 2:

Are younger employees resigning due to some kind of dissatisfaction? What about older employees?

Answers:

The answer to both questions above is YES.

There's very little difference in percent dissatisfied for employees in there twenties, thirties, forties: between 35% to 38%. The result for employees in the fifties and over 55 are the same: 42%.

The total range among the groups is only 7. This suggests that age of employee has no bearing on level of percent dissatisfied.

However, is that necessarily true within each of the institutes? Now, for the conclusion of this project.

Conclusions - A Different View

Overall Percent of DETE Employees Resigning Due to Some Dissatisfaction = 49.0%

Overall Percent of TAFE Employees Resigning Due to Some Dissatisfaction = 27.4%

In conclusion, I would say that it paid dividends to analyze the data by each institute, DETE and TAFE separately before combining the data.

If I was part of a Corporate Human Resource Team commissioned to reduce the percentage of employees resigning due to some dissatisfaction, our strategy would be much different if we had the data broken out by institute versus having only the combined data results. Seeing that the overall percent of dissatisfied employees within DETE (49.0%) is almost twice that of TAFE (27.4%), we would most likely begin establishing a strategy starting with the DETE Institute first.

If we happened to have two teams available with each one focusing on one of the two institutes, each would most likely establish a different strategy of attack based on the results in the tables below for each institute.

If we had only the combined data results to base our strategy on, it would be difficult to know where to begin.

                                   % OF EMPLOYEES RESIGNING DUE TO SOME DISSATISFACTION
                                   ----------------------------------------------------

Yrs. Service Group Years DETE % TAFE % Difference Combined %
New < 3 37.5 26.3 **11.2** 29.5
Experienced 3-6 46.1 25.0 **21.1** 34.3
Established 7-10 61.0 33.3 **27.7** 51.6
Veteran > 10 56.0 27.8 **28.2** 48.5
Employee Age Group DETE % TAFE % Difference Combined %
Under 30 44.6 27.0 **17.6** 35.3
Thirties 45.7 25.0 **20.7** 35.8
Forties 46.7 28.6 **18.1** 37.9
Fifties 58.6 28.2 **30.4** 42.3
Over 56 52.2 20.7 **31.5** 42.3

It is interesting to see that in both tables above, the difference **(in red)** in percent between the two institutes increases as years of service or employee age increases.