In this project, we'll work with exit surveys from employees of the Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia. Our goal is to find if younger employees are resigning due to some kind of dissatisfaction?, what about older employees?
import pandas as pd
import numpy as np
dete_survey = pd.read_csv('dete_survey.csv')
tafe_survey = pd.read_csv('tafe_survey.csv')
dete_survey.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 822 entries, 0 to 821 Data columns (total 56 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ID 822 non-null int64 1 SeparationType 822 non-null object 2 Cease Date 822 non-null object 3 DETE Start Date 822 non-null object 4 Role Start Date 822 non-null object 5 Position 817 non-null object 6 Classification 455 non-null object 7 Region 822 non-null object 8 Business Unit 126 non-null object 9 Employment Status 817 non-null object 10 Career move to public sector 822 non-null bool 11 Career move to private sector 822 non-null bool 12 Interpersonal conflicts 822 non-null bool 13 Job dissatisfaction 822 non-null bool 14 Dissatisfaction with the department 822 non-null bool 15 Physical work environment 822 non-null bool 16 Lack of recognition 822 non-null bool 17 Lack of job security 822 non-null bool 18 Work location 822 non-null bool 19 Employment conditions 822 non-null bool 20 Maternity/family 822 non-null bool 21 Relocation 822 non-null bool 22 Study/Travel 822 non-null bool 23 Ill Health 822 non-null bool 24 Traumatic incident 822 non-null bool 25 Work life balance 822 non-null bool 26 Workload 822 non-null bool 27 None of the above 822 non-null bool 28 Professional Development 808 non-null object 29 Opportunities for promotion 735 non-null object 30 Staff morale 816 non-null object 31 Workplace issue 788 non-null object 32 Physical environment 817 non-null object 33 Worklife balance 815 non-null object 34 Stress and pressure support 810 non-null object 35 Performance of supervisor 813 non-null object 36 Peer support 812 non-null object 37 Initiative 813 non-null object 38 Skills 811 non-null object 39 Coach 767 non-null object 40 Career Aspirations 746 non-null object 41 Feedback 792 non-null object 42 Further PD 768 non-null object 43 Communication 814 non-null object 44 My say 812 non-null object 45 Information 816 non-null object 46 Kept informed 813 non-null object 47 Wellness programs 766 non-null object 48 Health & Safety 793 non-null object 49 Gender 798 non-null object 50 Age 811 non-null object 51 Aboriginal 16 non-null object 52 Torres Strait 3 non-null object 53 South Sea 7 non-null object 54 Disability 23 non-null object 55 NESB 32 non-null object dtypes: bool(18), int64(1), object(37) memory usage: 258.6+ KB
dete_survey.head()
ID | SeparationType | Cease Date | DETE Start Date | Role Start Date | Position | Classification | Region | Business Unit | Employment Status | ... | Kept informed | Wellness programs | Health & Safety | Gender | Age | Aboriginal | Torres Strait | South Sea | Disability | NESB | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Ill Health Retirement | 08/2012 | 1984 | 2004 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | N | N | N | Male | 56-60 | NaN | NaN | NaN | NaN | Yes |
1 | 2 | Voluntary Early Retirement (VER) | 08/2012 | Not Stated | Not Stated | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | N | N | N | Male | 56-60 | NaN | NaN | NaN | NaN | NaN |
2 | 3 | Voluntary Early Retirement (VER) | 05/2012 | 2011 | 2011 | Schools Officer | NaN | Central Office | Education Queensland | Permanent Full-time | ... | N | N | N | Male | 61 or older | NaN | NaN | NaN | NaN | NaN |
3 | 4 | Resignation-Other reasons | 05/2012 | 2005 | 2006 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | ... | A | N | A | Female | 36-40 | NaN | NaN | NaN | NaN | NaN |
4 | 5 | Age Retirement | 05/2012 | 1970 | 1989 | Head of Curriculum/Head of Special Education | NaN | South East | NaN | Permanent Full-time | ... | N | A | M | Female | 61 or older | NaN | NaN | NaN | NaN | NaN |
5 rows × 56 columns
tafe_survey.head()
Record ID | Institute | WorkArea | CESSATION YEAR | Reason for ceasing employment | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | ... | Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? | Workplace. Topic:Does your workplace promote and practice the principles of employment equity? | Workplace. Topic:Does your workplace value the diversity of its employees? | Workplace. Topic:Would you recommend the Institute as an employer to others? | Gender. What is your Gender? | CurrentAge. Current Age | Employment Type. Employment Type | Classification. Classification | LengthofServiceOverall. Overall Length of Service at Institute (in years) | LengthofServiceCurrent. Length of Service at current workplace (in years) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6.341330e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | ... | Yes | Yes | Yes | Yes | Female | 26 30 | Temporary Full-time | Administration (AO) | 1-2 | 1-2 |
1 | 6.341337e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Retirement | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 6.341388e+17 | Mount Isa Institute of TAFE | Delivery (teaching) | 2010.0 | Retirement | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | ... | Yes | Yes | Yes | Yes | Male | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
5 rows × 72 columns
dete_survey['DETE Start Date'].value_counts()
Not Stated 73 2011 40 2007 34 2008 31 2012 27 2010 27 2009 24 2006 23 2013 21 1975 21 1970 21 2005 20 1990 20 1999 19 1996 19 1992 18 2000 18 1991 18 2004 18 1989 17 2003 15 1978 15 1976 15 1988 15 2002 15 1980 14 1997 14 1998 14 1979 14 1974 14 1995 14 1993 13 1986 12 1972 12 1977 11 1984 10 2001 10 1969 10 1971 10 1994 10 1981 9 1983 9 1973 8 1985 8 1987 7 1982 4 1963 4 1968 3 1967 2 1966 1 1965 1 Name: DETE Start Date, dtype: int64
dete_survey.isnull().sum()
ID 0 SeparationType 0 Cease Date 0 DETE Start Date 0 Role Start Date 0 Position 5 Classification 367 Region 0 Business Unit 696 Employment Status 5 Career move to public sector 0 Career move to private sector 0 Interpersonal conflicts 0 Job dissatisfaction 0 Dissatisfaction with the department 0 Physical work environment 0 Lack of recognition 0 Lack of job security 0 Work location 0 Employment conditions 0 Maternity/family 0 Relocation 0 Study/Travel 0 Ill Health 0 Traumatic incident 0 Work life balance 0 Workload 0 None of the above 0 Professional Development 14 Opportunities for promotion 87 Staff morale 6 Workplace issue 34 Physical environment 5 Worklife balance 7 Stress and pressure support 12 Performance of supervisor 9 Peer support 10 Initiative 9 Skills 11 Coach 55 Career Aspirations 76 Feedback 30 Further PD 54 Communication 8 My say 10 Information 6 Kept informed 9 Wellness programs 56 Health & Safety 29 Gender 24 Age 11 Aboriginal 806 Torres Strait 819 South Sea 815 Disability 799 NESB 790 dtype: int64
From what we observed some columns have 'Not stated' as values in place of Nan. Both dataframes have columns that we don't need to complete our analysis. Both dataframes contains many of the same columns, with different column names
2.Identify Missing Values and Drop Unnecessary Columns
dete_survey = pd.read_csv('dete_survey.csv', na_values = ['Not Stated']) #assign the 'Not Stated values to 'NaN'
haven identified that some cells have 'Not Stated' as values, i had to read the file again to assign 'Not Stated' as missing values or NaN.
dete_survey_updated = dete_survey.drop(dete_survey.columns[28:49], axis =1)
tafe_survey_updated = tafe_survey.drop(tafe_survey.columns[17:66], axis =1)
in the above cells, i dropped columns that were not necessary for the purpose of the analysis.
dete_survey_updated
ID | SeparationType | Cease Date | DETE Start Date | Role Start Date | Position | Classification | Region | Business Unit | Employment Status | ... | Work life balance | Workload | None of the above | Gender | Age | Aboriginal | Torres Strait | South Sea | Disability | NESB | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Ill Health Retirement | 08/2012 | 1984.0 | 2004.0 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | False | False | True | Male | 56-60 | NaN | NaN | NaN | NaN | Yes |
1 | 2 | Voluntary Early Retirement (VER) | 08/2012 | NaN | NaN | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | False | False | False | Male | 56-60 | NaN | NaN | NaN | NaN | NaN |
2 | 3 | Voluntary Early Retirement (VER) | 05/2012 | 2011.0 | 2011.0 | Schools Officer | NaN | Central Office | Education Queensland | Permanent Full-time | ... | False | False | True | Male | 61 or older | NaN | NaN | NaN | NaN | NaN |
3 | 4 | Resignation-Other reasons | 05/2012 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | ... | False | False | False | Female | 36-40 | NaN | NaN | NaN | NaN | NaN |
4 | 5 | Age Retirement | 05/2012 | 1970.0 | 1989.0 | Head of Curriculum/Head of Special Education | NaN | South East | NaN | Permanent Full-time | ... | True | False | False | Female | 61 or older | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
817 | 819 | Age Retirement | 02/2014 | 1977.0 | 1999.0 | Teacher | Primary | Central Queensland | NaN | Permanent Part-time | ... | False | True | False | Female | 56-60 | NaN | NaN | NaN | NaN | NaN |
818 | 820 | Age Retirement | 01/2014 | 1980.0 | 1980.0 | Teacher | Secondary | North Coast | NaN | Permanent Full-time | ... | False | False | True | Male | 51-55 | NaN | NaN | NaN | NaN | NaN |
819 | 821 | Resignation-Move overseas/interstate | 01/2014 | 2009.0 | 2009.0 | Public Servant | A01-A04 | Central Office | Education Queensland | Permanent Full-time | ... | True | False | False | Female | 31-35 | NaN | NaN | NaN | NaN | NaN |
820 | 822 | Ill Health Retirement | 12/2013 | 2001.0 | 2009.0 | Teacher | Secondary | Darling Downs South West | NaN | Permanent Full-time | ... | False | False | False | Female | 41-45 | NaN | NaN | NaN | NaN | NaN |
821 | 823 | Resignation-Move overseas/interstate | 12/2013 | NaN | NaN | Teacher Aide | NaN | Metropolitan | NaN | NaN | ... | False | False | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
822 rows × 35 columns
tafe_survey_updated
Record ID | Institute | WorkArea | CESSATION YEAR | Reason for ceasing employment | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | ... | Contributing Factors. Study | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | Gender. What is your Gender? | CurrentAge. Current Age | Employment Type. Employment Type | Classification. Classification | LengthofServiceOverall. Overall Length of Service at Institute (in years) | LengthofServiceCurrent. Length of Service at current workplace (in years) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6.341330e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Female | 26 30 | Temporary Full-time | Administration (AO) | 1-2 | 1-2 |
1 | 6.341337e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Retirement | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 6.341388e+17 | Mount Isa Institute of TAFE | Delivery (teaching) | 2010.0 | Retirement | - | - | - | - | - | ... | - | - | - | NONE | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | ... | - | - | - | - | Male | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
697 | 6.350668e+17 | Barrier Reef Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | ... | - | - | - | - | Male | 51-55 | Temporary Full-time | Teacher (including LVT) | 1-2 | 1-2 |
698 | 6.350677e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | ... | - | - | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
699 | 6.350704e+17 | Tropical North Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | - | - | - | - | - | ... | - | - | Other | - | Female | 51-55 | Permanent Full-time | Teacher (including LVT) | 5-6 | 1-2 |
700 | 6.350712e+17 | Southbank Institute of Technology | Non-Delivery (corporate) | 2013.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Female | 41 45 | Temporary Full-time | Professional Officer (PO) | 1-2 | 1-2 |
701 | 6.350730e+17 | Tropical North Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | - | - | Career Move - Self-employment | - | - | ... | - | Travel | - | - | Female | 26 30 | Contract/casual | Administration (AO) | 3-4 | 1-2 |
702 rows × 23 columns
3. Clean Column Names
dete_survey_updated.columns = dete_survey_updated.columns.str.lower().str.strip().str.replace(' ', '_')
tafe_survey_updated = tafe_survey_updated.rename({'Record ID': 'id','CESSATION YEAR': 'cease_date','Reason for ceasing employment': 'separationtype','Gender. What is your Gender?': 'gender','CurrentAge. Current Age': 'age','Employment Type. Employment Type': 'employment_status','Classification. Classification': 'position','LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service','LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service'}, axis = 1)
dete_survey_updated.head()
id | separationtype | cease_date | dete_start_date | role_start_date | position | classification | region | business_unit | employment_status | ... | work_life_balance | workload | none_of_the_above | gender | age | aboriginal | torres_strait | south_sea | disability | nesb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Ill Health Retirement | 08/2012 | 1984.0 | 2004.0 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | False | False | True | Male | 56-60 | NaN | NaN | NaN | NaN | Yes |
1 | 2 | Voluntary Early Retirement (VER) | 08/2012 | NaN | NaN | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | ... | False | False | False | Male | 56-60 | NaN | NaN | NaN | NaN | NaN |
2 | 3 | Voluntary Early Retirement (VER) | 05/2012 | 2011.0 | 2011.0 | Schools Officer | NaN | Central Office | Education Queensland | Permanent Full-time | ... | False | False | True | Male | 61 or older | NaN | NaN | NaN | NaN | NaN |
3 | 4 | Resignation-Other reasons | 05/2012 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | ... | False | False | False | Female | 36-40 | NaN | NaN | NaN | NaN | NaN |
4 | 5 | Age Retirement | 05/2012 | 1970.0 | 1989.0 | Head of Curriculum/Head of Special Education | NaN | South East | NaN | Permanent Full-time | ... | True | False | False | Female | 61 or older | NaN | NaN | NaN | NaN | NaN |
5 rows × 35 columns
tafe_survey_updated.head()
id | Institute | WorkArea | cease_date | separationtype | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | ... | Contributing Factors. Study | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | gender | age | employment_status | position | institute_service | role_service | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6.341330e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Female | 26 30 | Temporary Full-time | Administration (AO) | 1-2 | 1-2 |
1 | 6.341337e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Retirement | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 6.341388e+17 | Mount Isa Institute of TAFE | Delivery (teaching) | 2010.0 | Retirement | - | - | - | - | - | ... | - | - | - | NONE | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | ... | - | - | - | - | Male | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
5 rows × 23 columns
The dete_survey_updated dataframe had different letter case, it was standardized to to fit the tafe_survey_updated column headers.
The tafe_survey_updated has many of the same columns as dete_survey_updated, but with different names. The columns identified were renamed to match those of dete_survey_updated.
4. Filter the Data
dete_survey_updated['separationtype'].value_counts()
Age Retirement 285 Resignation-Other reasons 150 Resignation-Other employer 91 Resignation-Move overseas/interstate 70 Voluntary Early Retirement (VER) 67 Ill Health Retirement 61 Other 49 Contract Expired 34 Termination 15 Name: separationtype, dtype: int64
tafe_survey_updated['separationtype'].value_counts()
Resignation 340 Contract Expired 127 Retrenchment/ Redundancy 104 Retirement 82 Transfer 25 Termination 23 Name: separationtype, dtype: int64
dete_survey_updated['separationtype'] = dete_survey_updated['separationtype'].str.split('-').str[0]
dete_resignation = dete_survey_updated.copy()[dete_survey_updated['separationtype']== 'Resignation']
tafe_resignation = tafe_survey_updated.copy()[tafe_survey_updated['separationtype']== 'Resignation']
dete_resignation
id | separationtype | cease_date | dete_start_date | role_start_date | position | classification | region | business_unit | employment_status | ... | work_life_balance | workload | none_of_the_above | gender | age | aboriginal | torres_strait | south_sea | disability | nesb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 4 | Resignation | 05/2012 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | ... | False | False | False | Female | 36-40 | NaN | NaN | NaN | NaN | NaN |
5 | 6 | Resignation | 05/2012 | 1994.0 | 1997.0 | Guidance Officer | NaN | Central Office | Education Queensland | Permanent Full-time | ... | False | False | False | Female | 41-45 | NaN | NaN | NaN | NaN | NaN |
8 | 9 | Resignation | 07/2012 | 2009.0 | 2009.0 | Teacher | Secondary | North Queensland | NaN | Permanent Full-time | ... | False | False | False | Female | 31-35 | NaN | NaN | NaN | NaN | NaN |
9 | 10 | Resignation | 2012 | 1997.0 | 2008.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | ... | False | False | False | Female | 46-50 | NaN | NaN | NaN | NaN | NaN |
11 | 12 | Resignation | 2012 | 2009.0 | 2009.0 | Teacher | Secondary | Far North Queensland | NaN | Permanent Full-time | ... | False | False | False | Male | 31-35 | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
808 | 810 | Resignation | 12/2013 | 2010.0 | 2010.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | ... | False | False | False | Female | 26-30 | NaN | NaN | NaN | NaN | NaN |
815 | 817 | Resignation | 01/2014 | 2012.0 | 2012.0 | Teacher | Primary | Far North Queensland | NaN | Permanent Full-time | ... | False | False | False | Male | 21-25 | NaN | NaN | NaN | NaN | NaN |
816 | 818 | Resignation | 01/2014 | 2012.0 | 2012.0 | Teacher | Secondary | North Coast | NaN | Permanent Full-time | ... | False | False | False | Female | 21-25 | NaN | NaN | NaN | NaN | NaN |
819 | 821 | Resignation | 01/2014 | 2009.0 | 2009.0 | Public Servant | A01-A04 | Central Office | Education Queensland | Permanent Full-time | ... | True | False | False | Female | 31-35 | NaN | NaN | NaN | NaN | NaN |
821 | 823 | Resignation | 12/2013 | NaN | NaN | Teacher Aide | NaN | Metropolitan | NaN | NaN | ... | False | False | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
311 rows × 35 columns
tafe_resignation
id | Institute | WorkArea | cease_date | separationtype | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | ... | Contributing Factors. Study | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | gender | age | employment_status | position | institute_service | role_service | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | ... | - | - | - | - | Male | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
5 | 6.341475e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | - | - | - | - | ... | - | - | Other | - | Female | 56 or older | Contract/casual | Teacher (including LVT) | 7-10 | 7-10 |
6 | 6.341520e+17 | Barrier Reef Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | Maternity/Family | ... | - | - | Other | - | Male | 20 or younger | Temporary Full-time | Administration (AO) | 3-4 | 3-4 |
7 | 6.341537e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | - | - | - | - | ... | - | - | Other | - | Male | 46 50 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
696 | 6.350660e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | - | Career Move - Private Sector | - | - | - | ... | - | - | - | - | Male | 21 25 | Temporary Full-time | Operational (OO) | 5-6 | 5-6 |
697 | 6.350668e+17 | Barrier Reef Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | ... | - | - | - | - | Male | 51-55 | Temporary Full-time | Teacher (including LVT) | 1-2 | 1-2 |
698 | 6.350677e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | ... | - | - | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
699 | 6.350704e+17 | Tropical North Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | - | - | - | - | - | ... | - | - | Other | - | Female | 51-55 | Permanent Full-time | Teacher (including LVT) | 5-6 | 1-2 |
701 | 6.350730e+17 | Tropical North Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | - | - | Career Move - Self-employment | - | - | ... | - | Travel | - | - | Female | 26 30 | Contract/casual | Administration (AO) | 3-4 | 1-2 |
340 rows × 23 columns
The goal of this project is to find if younger employee are resigning due to some disatisfaction. hence we have to assign to a new variable; all the rows where the cause of separation was 'Resignation'. dete_resignation tafe_resignation
5. Verify the Data
dete_resignation['cease_date'].value_counts()
2012 126 2013 74 01/2014 22 12/2013 17 06/2013 14 09/2013 11 07/2013 9 11/2013 9 10/2013 6 08/2013 4 05/2012 2 05/2013 2 2010 1 07/2012 1 07/2006 1 09/2010 1 Name: cease_date, dtype: int64
dete_resignation['cease_date'] = dete_resignation['cease_date'].str.split('/').str[-1].copy()
dete_resignation['cease_date'] = dete_resignation['cease_date'].astype('float').copy()
dete_resignation['cease_date'].value_counts().sort_index(ascending = True)
2006.0 1 2010.0 2 2012.0 129 2013.0 146 2014.0 22 Name: cease_date, dtype: int64
dete_resignation['dete_start_date'].value_counts().sort_index(ascending = True)
1963.0 1 1971.0 1 1972.0 1 1973.0 1 1974.0 2 1975.0 1 1976.0 2 1977.0 1 1980.0 5 1982.0 1 1983.0 2 1984.0 1 1985.0 3 1986.0 3 1987.0 1 1988.0 4 1989.0 4 1990.0 5 1991.0 4 1992.0 6 1993.0 5 1994.0 6 1995.0 4 1996.0 6 1997.0 5 1998.0 6 1999.0 8 2000.0 9 2001.0 3 2002.0 6 2003.0 6 2004.0 14 2005.0 15 2006.0 13 2007.0 21 2008.0 22 2009.0 13 2010.0 17 2011.0 24 2012.0 21 2013.0 10 Name: dete_start_date, dtype: int64
tafe_resignation['cease_date'].value_counts().sort_index(ascending = True)
2009.0 2 2010.0 68 2011.0 116 2012.0 94 2013.0 55 Name: cease_date, dtype: int64
import matplotlib as plt
%matplotlib inline
dete_resignation.boxplot(column = ['cease_date'])
<matplotlib.axes._subplots.AxesSubplot at 0x7f146e40a610>
tafe_resignation.boxplot(column = ['cease_date'])
<matplotlib.axes._subplots.AxesSubplot at 0x7f146e3a03d0>
6. Create a New Column
dete_resignation['institute_service'] = dete_resignation['cease_date']-dete_resignation['dete_start_date'].copy()
dete_resignation.head()
id | separationtype | cease_date | dete_start_date | role_start_date | position | classification | region | business_unit | employment_status | ... | workload | none_of_the_above | gender | age | aboriginal | torres_strait | south_sea | disability | nesb | institute_service | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 4 | Resignation | 2012.0 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | ... | False | False | Female | 36-40 | NaN | NaN | NaN | NaN | NaN | 7.0 |
5 | 6 | Resignation | 2012.0 | 1994.0 | 1997.0 | Guidance Officer | NaN | Central Office | Education Queensland | Permanent Full-time | ... | False | False | Female | 41-45 | NaN | NaN | NaN | NaN | NaN | 18.0 |
8 | 9 | Resignation | 2012.0 | 2009.0 | 2009.0 | Teacher | Secondary | North Queensland | NaN | Permanent Full-time | ... | False | False | Female | 31-35 | NaN | NaN | NaN | NaN | NaN | 3.0 |
9 | 10 | Resignation | 2012.0 | 1997.0 | 2008.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | ... | False | False | Female | 46-50 | NaN | NaN | NaN | NaN | NaN | 15.0 |
11 | 12 | Resignation | 2012.0 | 2009.0 | 2009.0 | Teacher | Secondary | Far North Queensland | NaN | Permanent Full-time | ... | False | False | Male | 31-35 | NaN | NaN | NaN | NaN | NaN | 3.0 |
5 rows × 36 columns
since our end goal is to find out why employee who worked for a short time are resigning, we need a column in dete_resignation dataframe to analyze survey respondents according to their length of employment, just like we have in tafe_resignation.
7. Identify Dissatisfied Employee
dete_resignation.columns
Index(['id', 'separationtype', 'cease_date', 'dete_start_date', 'role_start_date', 'position', 'classification', 'region', 'business_unit', 'employment_status', 'career_move_to_public_sector', 'career_move_to_private_sector', 'interpersonal_conflicts', 'job_dissatisfaction', 'dissatisfaction_with_the_department', 'physical_work_environment', 'lack_of_recognition', 'lack_of_job_security', 'work_location', 'employment_conditions', 'maternity/family', 'relocation', 'study/travel', 'ill_health', 'traumatic_incident', 'work_life_balance', 'workload', 'none_of_the_above', 'gender', 'age', 'aboriginal', 'torres_strait', 'south_sea', 'disability', 'nesb', 'institute_service'], dtype='object')
tafe_resignation.columns
Index(['id', 'Institute', 'WorkArea', 'cease_date', 'separationtype', 'Contributing Factors. Career Move - Public Sector ', 'Contributing Factors. Career Move - Private Sector ', 'Contributing Factors. Career Move - Self-employment', 'Contributing Factors. Ill Health', 'Contributing Factors. Maternity/Family', 'Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction', 'Contributing Factors. Interpersonal Conflict', 'Contributing Factors. Study', 'Contributing Factors. Travel', 'Contributing Factors. Other', 'Contributing Factors. NONE', 'gender', 'age', 'employment_status', 'position', 'institute_service', 'role_service'], dtype='object')
tafe_resignation['Contributing Factors. Dissatisfaction'].value_counts()
- 277 Contributing Factors. Dissatisfaction 55 Name: Contributing Factors. Dissatisfaction, dtype: int64
tafe_resignation['Contributing Factors. Job Dissatisfaction'].value_counts()
- 270 Job Dissatisfaction 62 Name: Contributing Factors. Job Dissatisfaction, dtype: int64
# Update the values in the contributing factors columns to be either True, False, or NaN
def update_vals(x):
if x == '-':
return False
elif pd.isnull(x):
return np.nan
else:
return True
tafe_resignation['dissatisfied'] = tafe_resignation[['Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction']].applymap(update_vals).any(1, skipna=False)
tafe_resignation_up = tafe_resignation.copy()
tafe_resignation_up.head()
id | Institute | WorkArea | cease_date | separationtype | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | ... | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | gender | age | employment_status | position | institute_service | role_service | dissatisfied | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | ... | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN | False |
4 | 6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | ... | - | - | - | Male | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 | False |
5 | 6.341475e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | - | - | - | - | ... | - | Other | - | Female | 56 or older | Contract/casual | Teacher (including LVT) | 7-10 | 7-10 | False |
6 | 6.341520e+17 | Barrier Reef Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | Maternity/Family | ... | - | Other | - | Male | 20 or younger | Temporary Full-time | Administration (AO) | 3-4 | 3-4 | False |
7 | 6.341537e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | - | - | - | - | ... | - | Other | - | Male | 46 50 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 | False |
5 rows × 24 columns
tafe_resignation_up['dissatisfied'].value_counts(dropna = False)
False 241 True 91 NaN 8 Name: dissatisfied, dtype: int64
dete_resignation['dissatisfied'] = dete_resignation[['job_dissatisfaction','dissatisfaction_with_the_department','physical_work_environment','lack_of_recognition','lack_of_job_security','work_location','employment_conditions','work_life_balance','workload']].any(axis= 1, skipna = False)
dete_resignation_up = dete_resignation.copy()
dete_resignation_up['dissatisfied'].value_counts(dropna = False)
False 162 True 149 Name: dissatisfied, dtype: int64
we identified the columns/variables that depicts dissatisfaction from the respondents from both tafe_resignation and dete_resignation. we create a new column that returns True if any variable in the dissatisfaction columns is True and False is all returns False if all the respondents response is False.
for tafe_resignation, we had to make a function to return false for '-', return NaN if the value is np.nan and True for other elements in both variables('Contributing Factors. Dissatisfaction' and 'Contributing Factors. Job Dissatisfaction')
8. Combine the Data
tafe_resignation_up['institute'] = 'TAFE'
dete_resignation_up['institute'] = 'DETE'
combined = pd.concat([dete_resignation_up, tafe_resignation_up], ignore_index = True)
combined.notnull().sum().sort_values() # Verify the number of non null values in each column
torres_strait 0 south_sea 3 aboriginal 7 disability 8 nesb 9 business_unit 32 classification 161 region 265 role_start_date 271 dete_start_date 283 role_service 290 none_of_the_above 311 work_life_balance 311 traumatic_incident 311 ill_health 311 study/travel 311 relocation 311 maternity/family 311 employment_conditions 311 workload 311 lack_of_job_security 311 career_move_to_public_sector 311 career_move_to_private_sector 311 interpersonal_conflicts 311 work_location 311 dissatisfaction_with_the_department 311 physical_work_environment 311 lack_of_recognition 311 job_dissatisfaction 311 Contributing Factors. Job Dissatisfaction 332 Contributing Factors. Travel 332 Contributing Factors. Maternity/Family 332 Contributing Factors. Ill Health 332 Contributing Factors. Career Move - Self-employment 332 Contributing Factors. Career Move - Private Sector 332 Contributing Factors. Career Move - Public Sector 332 Contributing Factors. Dissatisfaction 332 Contributing Factors. Other 332 Contributing Factors. Interpersonal Conflict 332 Contributing Factors. NONE 332 Contributing Factors. Study 332 Institute 340 WorkArea 340 institute_service 563 gender 592 age 596 employment_status 597 position 598 cease_date 635 dissatisfied 643 separationtype 651 institute 651 id 651 dtype: int64
combined_update = combined.dropna(thresh = 500, axis =1).copy()
we added a new column to each dateframe, and imputed 'DETE'and 'TAFE' as elements in the new column name'institute' respectively, so we can easily identify each respondents survey institute.
we join both dataframe using the concat() function, then remove columns with less than 500 non_null() values
9. Clean the service column
combined_update['institute_service'].value_counts(dropna = False)
NaN 88 Less than 1 year 73 1-2 64 3-4 63 5-6 33 11-20 26 5.0 23 1.0 22 7-10 21 0.0 20 3.0 20 6.0 17 4.0 16 9.0 14 2.0 14 7.0 13 More than 20 years 10 13.0 8 8.0 8 15.0 7 20.0 7 10.0 6 12.0 6 14.0 6 17.0 6 22.0 6 16.0 5 18.0 5 11.0 4 23.0 4 24.0 4 19.0 3 32.0 3 39.0 3 21.0 3 28.0 2 30.0 2 26.0 2 36.0 2 25.0 2 29.0 1 31.0 1 27.0 1 34.0 1 35.0 1 38.0 1 41.0 1 42.0 1 49.0 1 33.0 1 Name: institute_service, dtype: int64
combined_update['institute_service'] = combined_update['institute_service'].astype('str').str.extract(r'(\d+)')
#Above we extracted the years from the service_institute column
combined_update['institute_service'] = combined_update['institute_service'].astype('float') #convert the column to float so we can carry out some analysis on the values od each row
def category(val):
if pd.isnull(val):
return np.nan
elif val <= 3:
return 'New'
elif val >= 3 and val < 6:
return 'Experienced'
elif val >= 7 and val < 10:
return 'Established'
else:
return 'Veteran'
#Above we created a function that can return a value for a particular range of years of experience
combined_update['service_cat'] = combined_update['institute_service'].apply(category) #Apply the function to each element in the service_institute column and attach it to a new column in the Dataframe
combined_update['service_cat'].value_counts()
New 276 Veteran 159 Experienced 72 Established 56 Name: service_cat, dtype: int64
10. Perform Initial Analysis
combined_update['dissatisfied'].value_counts(dropna = False)
False 403 True 240 NaN 8 Name: dissatisfied, dtype: int64
combined_update['dissatisfied'] = combined_update['dissatisfied'].fillna(False) #Fill the missing values with 'False' which is the reasonable estimate since most elements in the row are False
combined_update['dissatisfied'].value_counts(dropna = False)
False 411 True 240 Name: dissatisfied, dtype: int64
%matplotlib inline
graph = combined_update.pivot_table(values ='dissatisfied', index = 'service_cat')
#Above create a pivot_table with index as service_cat and values as dissatisfied. to enable us visualize the the dissatisfied employee in each service category
graph.plot(kind = 'bar', rot = 30)
<matplotlib.axes._subplots.AxesSubplot at 0x7f146e2ece20>
from this initial analysis, we can deduce that employees woth over 7years of experience are more likely to resigh over some type of dissatisfaction.
Lets see amongst the genders, which of them is more dissatisfied
combined_update['gender'].value_counts(dropna = False)
Female 424 Male 168 NaN 59 Name: gender, dtype: int64
gender_resignation = combined_update.pivot_table(values = 'dissatisfied', index = 'gender')
gender_resignation.plot(kind = 'bar')
<matplotlib.axes._subplots.AxesSubplot at 0x7f146e2dbca0>
Above we explored the average dissatisfation among both genders, and men are appear to be more dissatisfied
combined_update.isnull().sum()
id 0 separationtype 0 cease_date 16 position 53 employment_status 54 gender 59 age 55 institute_service 88 dissatisfied 0 institute 0 service_cat 88 dtype: int64
Finally we would explore the age distribution, the see if theres any insight we can get from the dissatified employee and the age distribution
combined_update['age'].value_counts()
51-55 71 41-45 48 41 45 45 46-50 42 36-40 41 46 50 39 26-30 35 21 25 33 36 40 32 26 30 32 31 35 32 56 or older 29 31-35 29 21-25 29 56-60 26 61 or older 23 20 or younger 10 Name: age, dtype: int64
combined_update['age'] = combined_update['age'].astype('str').str.extract(r"(\d+)")
Clean the age column by extracting just the numerical values
combined_update['age'].value_counts(dropna = False)
41 93 46 81 36 73 51 71 26 67 21 62 31 61 56 55 NaN 55 61 23 20 10 Name: age, dtype: int64
def age_category(val):
if pd.isnull(val):
return np.nan
elif val >= 20 and val <30:
return 'Young'
elif val >= 30 and val < 49:
return 'Middle Aged'
else:
return 'Older'
Above we created a function that can return a value for a particular range of years
combined_update['age'] = combined_update['age'].astype('float')
combined_update['age_cat'] = combined_update['age'].apply(age_category)
age_resignation = combined_update.pivot_table(values = 'dissatisfied', index = 'age_cat')
age_resignation.plot(kind = 'barh', )
<matplotlib.axes._subplots.AxesSubplot at 0x7f14aec50a00>
haven cleaned the age column, i plotted the average of each age distribution(age_cat. From the graph, we can see employees from 50 and above are more dissatisfied than the other age distribution
combined_update.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 651 entries, 0 to 650 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 651 non-null float64 1 separationtype 651 non-null object 2 cease_date 635 non-null float64 3 position 598 non-null object 4 employment_status 597 non-null object 5 gender 592 non-null object 6 age 596 non-null object 7 institute_service 563 non-null float64 8 dissatisfied 651 non-null bool 9 institute 651 non-null object 10 service_cat 563 non-null object dtypes: bool(1), float64(3), object(7) memory usage: 51.6+ KB