In the code below, we will:
import pandas as pd # Imported the pandas library
import numpy as np # Imported the numpy library
dete_survey = pd.read_csv('dete_survey.csv', index_col = 0) # dete_survery.csv CSV file read into pandas
dete_survey
SeparationType | Cease Date | DETE Start Date | Role Start Date | Position | Classification | Region | Business Unit | Employment Status | Career move to public sector | ... | Kept informed | Wellness programs | Health & Safety | Gender | Age | Aboriginal | Torres Strait | South Sea | Disability | NESB | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | |||||||||||||||||||||
1 | Ill Health Retirement | 08/2012 | 1984 | 2004 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | True | ... | N | N | N | Male | 56-60 | NaN | NaN | NaN | NaN | Yes |
2 | Voluntary Early Retirement (VER) | 08/2012 | Not Stated | Not Stated | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | False | ... | N | N | N | Male | 56-60 | NaN | NaN | NaN | NaN | NaN |
3 | Voluntary Early Retirement (VER) | 05/2012 | 2011 | 2011 | Schools Officer | NaN | Central Office | Education Queensland | Permanent Full-time | False | ... | N | N | N | Male | 61 or older | NaN | NaN | NaN | NaN | NaN |
4 | Resignation-Other reasons | 05/2012 | 2005 | 2006 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | False | ... | A | N | A | Female | 36-40 | NaN | NaN | NaN | NaN | NaN |
5 | Age Retirement | 05/2012 | 1970 | 1989 | Head of Curriculum/Head of Special Education | NaN | South East | NaN | Permanent Full-time | False | ... | N | A | M | Female | 61 or older | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
819 | Age Retirement | 02/2014 | 1977 | 1999 | Teacher | Primary | Central Queensland | NaN | Permanent Part-time | False | ... | A | A | SA | Female | 56-60 | NaN | NaN | NaN | NaN | NaN |
820 | Age Retirement | 01/2014 | 1980 | 1980 | Teacher | Secondary | North Coast | NaN | Permanent Full-time | False | ... | N | N | N | Male | 51-55 | NaN | NaN | NaN | NaN | NaN |
821 | Resignation-Move overseas/interstate | 01/2014 | 2009 | 2009 | Public Servant | A01-A04 | Central Office | Education Queensland | Permanent Full-time | False | ... | A | N | A | Female | 31-35 | NaN | NaN | NaN | NaN | NaN |
822 | Ill Health Retirement | 12/2013 | 2001 | 2009 | Teacher | Secondary | Darling Downs South West | NaN | Permanent Full-time | False | ... | A | N | A | Female | 41-45 | NaN | NaN | NaN | NaN | NaN |
823 | Resignation-Move overseas/interstate | 12/2013 | Not Stated | Not Stated | Teacher Aide | NaN | Metropolitan | NaN | NaN | False | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
822 rows × 55 columns
tafe_survey = pd.read_csv('tafe_survey.csv', index_col = 0) # tafe_survey.csv CSV file read into pandas
tafe_survey
Institute | WorkArea | CESSATION YEAR | Reason for ceasing employment | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | Contributing Factors. Dissatisfaction | ... | Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? | Workplace. Topic:Does your workplace promote and practice the principles of employment equity? | Workplace. Topic:Does your workplace value the diversity of its employees? | Workplace. Topic:Would you recommend the Institute as an employer to others? | Gender. What is your Gender? | CurrentAge. Current Age | Employment Type. Employment Type | Classification. Classification | LengthofServiceOverall. Overall Length of Service at Institute (in years) | LengthofServiceCurrent. Length of Service at current workplace (in years) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Record ID | |||||||||||||||||||||
6.341330e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | NaN | ... | Yes | Yes | Yes | Yes | Female | 26 30 | Temporary Full-time | Administration (AO) | 1-2 | 1-2 |
6.341337e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Retirement | - | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
6.341388e+17 | Mount Isa Institute of TAFE | Delivery (teaching) | 2010.0 | Retirement | - | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | - | ... | Yes | Yes | Yes | Yes | Male | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350668e+17 | Barrier Reef Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | Male | 51-55 | Temporary Full-time | Teacher (including LVT) | 1-2 | 1-2 |
6.350677e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | - | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
6.350704e+17 | Tropical North Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | - | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | Female | 51-55 | Permanent Full-time | Teacher (including LVT) | 5-6 | 1-2 |
6.350712e+17 | Southbank Institute of Technology | Non-Delivery (corporate) | 2013.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | NaN | ... | No | No | Yes | No | Female | 41 45 | Temporary Full-time | Professional Officer (PO) | 1-2 | 1-2 |
6.350730e+17 | Tropical North Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | - | - | Career Move - Self-employment | - | - | - | ... | Yes | Yes | Yes | Yes | Female | 26 30 | Contract/casual | Administration (AO) | 3-4 | 1-2 |
702 rows × 71 columns
dete_survey.info() # Prints information for the dete_survey dataframe such as the column names, their type, if there are any null values, etc.
<class 'pandas.core.frame.DataFrame'> Int64Index: 822 entries, 1 to 823 Data columns (total 55 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 SeparationType 822 non-null object 1 Cease Date 822 non-null object 2 DETE Start Date 822 non-null object 3 Role Start Date 822 non-null object 4 Position 817 non-null object 5 Classification 455 non-null object 6 Region 822 non-null object 7 Business Unit 126 non-null object 8 Employment Status 817 non-null object 9 Career move to public sector 822 non-null bool 10 Career move to private sector 822 non-null bool 11 Interpersonal conflicts 822 non-null bool 12 Job dissatisfaction 822 non-null bool 13 Dissatisfaction with the department 822 non-null bool 14 Physical work environment 822 non-null bool 15 Lack of recognition 822 non-null bool 16 Lack of job security 822 non-null bool 17 Work location 822 non-null bool 18 Employment conditions 822 non-null bool 19 Maternity/family 822 non-null bool 20 Relocation 822 non-null bool 21 Study/Travel 822 non-null bool 22 Ill Health 822 non-null bool 23 Traumatic incident 822 non-null bool 24 Work life balance 822 non-null bool 25 Workload 822 non-null bool 26 None of the above 822 non-null bool 27 Professional Development 808 non-null object 28 Opportunities for promotion 735 non-null object 29 Staff morale 816 non-null object 30 Workplace issue 788 non-null object 31 Physical environment 817 non-null object 32 Worklife balance 815 non-null object 33 Stress and pressure support 810 non-null object 34 Performance of supervisor 813 non-null object 35 Peer support 812 non-null object 36 Initiative 813 non-null object 37 Skills 811 non-null object 38 Coach 767 non-null object 39 Career Aspirations 746 non-null object 40 Feedback 792 non-null object 41 Further PD 768 non-null object 42 Communication 814 non-null object 43 My say 812 non-null object 44 Information 816 non-null object 45 Kept informed 813 non-null object 46 Wellness programs 766 non-null object 47 Health & Safety 793 non-null object 48 Gender 798 non-null object 49 Age 811 non-null object 50 Aboriginal 16 non-null object 51 Torres Strait 3 non-null object 52 South Sea 7 non-null object 53 Disability 23 non-null object 54 NESB 32 non-null object dtypes: bool(18), object(37) memory usage: 258.5+ KB
tafe_survey.info() # Prints information for the tafe_survey dataframe " " " " " " " " " " " " " " ""
<class 'pandas.core.frame.DataFrame'> Float64Index: 702 entries, 6.34133009996094e+17 to 6.35073030973791e+17 Data columns (total 71 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Institute 702 non-null object 1 WorkArea 702 non-null object 2 CESSATION YEAR 695 non-null float64 3 Reason for ceasing employment 701 non-null object 4 Contributing Factors. Career Move - Public Sector 437 non-null object 5 Contributing Factors. Career Move - Private Sector 437 non-null object 6 Contributing Factors. Career Move - Self-employment 437 non-null object 7 Contributing Factors. Ill Health 437 non-null object 8 Contributing Factors. Maternity/Family 437 non-null object 9 Contributing Factors. Dissatisfaction 437 non-null object 10 Contributing Factors. Job Dissatisfaction 437 non-null object 11 Contributing Factors. Interpersonal Conflict 437 non-null object 12 Contributing Factors. Study 437 non-null object 13 Contributing Factors. Travel 437 non-null object 14 Contributing Factors. Other 437 non-null object 15 Contributing Factors. NONE 437 non-null object 16 Main Factor. Which of these was the main factor for leaving? 113 non-null object 17 InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction 608 non-null object 18 InstituteViews. Topic:2. I was given access to skills training to help me do my job better 613 non-null object 19 InstituteViews. Topic:3. I was given adequate opportunities for personal development 610 non-null object 20 InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL% 608 non-null object 21 InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had 615 non-null object 22 InstituteViews. Topic:6. The organisation recognised when staff did good work 607 non-null object 23 InstituteViews. Topic:7. Management was generally supportive of me 614 non-null object 24 InstituteViews. Topic:8. Management was generally supportive of my team 608 non-null object 25 InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me 610 non-null object 26 InstituteViews. Topic:10. Staff morale was positive within the Institute 602 non-null object 27 InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly 601 non-null object 28 InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently 597 non-null object 29 InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly 601 non-null object 30 WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit 609 non-null object 31 WorkUnitViews. Topic:15. I worked well with my colleagues 605 non-null object 32 WorkUnitViews. Topic:16. My job was challenging and interesting 607 non-null object 33 WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work 610 non-null object 34 WorkUnitViews. Topic:18. I had sufficient contact with other people in my job 613 non-null object 35 WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job 609 non-null object 36 WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job 609 non-null object 37 WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT] 608 non-null object 38 WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job 608 non-null object 39 WorkUnitViews. Topic:23. My job provided sufficient variety 611 non-null object 40 WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job 610 non-null object 41 WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction 611 non-null object 42 WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance 606 non-null object 43 WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area 610 non-null object 44 WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date 609 non-null object 45 WorkUnitViews. Topic:29. There was adequate communication between staff in my unit 603 non-null object 46 WorkUnitViews. Topic:30. Staff morale was positive within my work unit 606 non-null object 47 Induction. Did you undertake Workplace Induction? 619 non-null object 48 InductionInfo. Topic:Did you undertake a Corporate Induction? 432 non-null object 49 InductionInfo. Topic:Did you undertake a Institute Induction? 483 non-null object 50 InductionInfo. Topic: Did you undertake Team Induction? 440 non-null object 51 InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted? 555 non-null object 52 InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted? 555 non-null object 53 InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction? 555 non-null object 54 InductionInfo. Face to Face Topic:Did you undertake a Institute Induction? 530 non-null object 55 InductionInfo. On-line Topic:Did you undertake a Institute Induction? 555 non-null object 56 InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction? 553 non-null object 57 InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category? 555 non-null object 58 InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.] 555 non-null object 59 InductionInfo. Induction Manual Topic: Did you undertake Team Induction? 555 non-null object 60 Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)? 608 non-null object 61 Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? 594 non-null object 62 Workplace. Topic:Does your workplace promote and practice the principles of employment equity? 587 non-null object 63 Workplace. Topic:Does your workplace value the diversity of its employees? 586 non-null object 64 Workplace. Topic:Would you recommend the Institute as an employer to others? 581 non-null object 65 Gender. What is your Gender? 596 non-null object 66 CurrentAge. Current Age 596 non-null object 67 Employment Type. Employment Type 596 non-null object 68 Classification. Classification 596 non-null object 69 LengthofServiceOverall. Overall Length of Service at Institute (in years) 596 non-null object 70 LengthofServiceCurrent. Length of Service at current workplace (in years) 596 non-null object dtypes: float64(1), object(70) memory usage: 394.9+ KB
dete_survey.head() # Prints the first five rows for the dete_survey dataframe
SeparationType | Cease Date | DETE Start Date | Role Start Date | Position | Classification | Region | Business Unit | Employment Status | Career move to public sector | ... | Kept informed | Wellness programs | Health & Safety | Gender | Age | Aboriginal | Torres Strait | South Sea | Disability | NESB | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | |||||||||||||||||||||
1 | Ill Health Retirement | 08/2012 | 1984 | 2004 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | True | ... | N | N | N | Male | 56-60 | NaN | NaN | NaN | NaN | Yes |
2 | Voluntary Early Retirement (VER) | 08/2012 | Not Stated | Not Stated | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | False | ... | N | N | N | Male | 56-60 | NaN | NaN | NaN | NaN | NaN |
3 | Voluntary Early Retirement (VER) | 05/2012 | 2011 | 2011 | Schools Officer | NaN | Central Office | Education Queensland | Permanent Full-time | False | ... | N | N | N | Male | 61 or older | NaN | NaN | NaN | NaN | NaN |
4 | Resignation-Other reasons | 05/2012 | 2005 | 2006 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | False | ... | A | N | A | Female | 36-40 | NaN | NaN | NaN | NaN | NaN |
5 | Age Retirement | 05/2012 | 1970 | 1989 | Head of Curriculum/Head of Special Education | NaN | South East | NaN | Permanent Full-time | False | ... | N | A | M | Female | 61 or older | NaN | NaN | NaN | NaN | NaN |
5 rows × 55 columns
tafe_survey.head() # Prints the first five rows for the tafe_survey dataframe
Institute | WorkArea | CESSATION YEAR | Reason for ceasing employment | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | Contributing Factors. Dissatisfaction | ... | Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? | Workplace. Topic:Does your workplace promote and practice the principles of employment equity? | Workplace. Topic:Does your workplace value the diversity of its employees? | Workplace. Topic:Would you recommend the Institute as an employer to others? | Gender. What is your Gender? | CurrentAge. Current Age | Employment Type. Employment Type | Classification. Classification | LengthofServiceOverall. Overall Length of Service at Institute (in years) | LengthofServiceCurrent. Length of Service at current workplace (in years) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Record ID | |||||||||||||||||||||
6.341330e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | NaN | ... | Yes | Yes | Yes | Yes | Female | 26 30 | Temporary Full-time | Administration (AO) | 1-2 | 1-2 |
6.341337e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Retirement | - | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
6.341388e+17 | Mount Isa Institute of TAFE | Delivery (teaching) | 2010.0 | Retirement | - | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | - | ... | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | - | ... | Yes | Yes | Yes | Yes | Male | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
5 rows × 71 columns
Separation = dete_survey['SeparationType'] # Selects only the values from the SeparationType column and it is stored in the Separation variable
Separation.value_counts() # Prints the number of times each value shows up for the SeparationType column
Age Retirement 285 Resignation-Other reasons 150 Resignation-Other employer 91 Resignation-Move overseas/interstate 70 Voluntary Early Retirement (VER) 67 Ill Health Retirement 61 Other 49 Contract Expired 34 Termination 15 Name: SeparationType, dtype: int64
Cease_date = dete_survey['Cease Date'] # Selects only the values from the Cease Date column
Cease_date # Prints out the values for the cease_date column
ID 1 08/2012 2 08/2012 3 05/2012 4 05/2012 5 05/2012 ... 819 02/2014 820 01/2014 821 01/2014 822 12/2013 823 12/2013 Name: Cease Date, Length: 822, dtype: object
Cease_date.value_counts()# Prints the number of times each value shows up for the Cease_Date column
2012 344 2013 200 01/2014 43 12/2013 40 Not Stated 34 09/2013 34 06/2013 27 07/2013 22 10/2013 20 11/2013 16 08/2013 12 05/2013 7 05/2012 6 04/2014 2 07/2014 2 08/2012 2 04/2013 2 02/2014 2 11/2012 1 09/2010 1 2010 1 2014 1 07/2012 1 09/2014 1 07/2006 1 Name: Cease Date, dtype: int64
ceasing_employment = tafe_survey['Reason for ceasing employment'] # Selects the values only from the reason for ceasing employment column and is stored in a variable called eceasing employment
ceasing_employment # Prints out the values for the ceasing_employment column
Record ID 6.341330e+17 Contract Expired 6.341337e+17 Retirement 6.341388e+17 Retirement 6.341399e+17 Resignation 6.341466e+17 Resignation ... 6.350668e+17 Resignation 6.350677e+17 Resignation 6.350704e+17 Resignation 6.350712e+17 Contract Expired 6.350730e+17 Resignation Name: Reason for ceasing employment, Length: 702, dtype: object
ceasing_employment.value_counts() # Prints out the number of times each value shows up for the ceasing employment column
Resignation 340 Contract Expired 127 Retrenchment/ Redundancy 104 Retirement 82 Transfer 25 Termination 23 Name: Reason for ceasing employment, dtype: int64
tafe_survey.isnull().sum() # Checks for any null values in each column in the dataframe
Institute 0 WorkArea 0 CESSATION YEAR 7 Reason for ceasing employment 1 Contributing Factors. Career Move - Public Sector 265 ... CurrentAge. Current Age 106 Employment Type. Employment Type 106 Classification. Classification 106 LengthofServiceOverall. Overall Length of Service at Institute (in years) 106 LengthofServiceCurrent. Length of Service at current workplace (in years) 106 Length: 71, dtype: int64
dete_survey.isnull().sum() # Checks for any null values in each column in the dataframe
SeparationType 0 Cease Date 0 DETE Start Date 0 Role Start Date 0 Position 5 Classification 367 Region 0 Business Unit 696 Employment Status 5 Career move to public sector 0 Career move to private sector 0 Interpersonal conflicts 0 Job dissatisfaction 0 Dissatisfaction with the department 0 Physical work environment 0 Lack of recognition 0 Lack of job security 0 Work location 0 Employment conditions 0 Maternity/family 0 Relocation 0 Study/Travel 0 Ill Health 0 Traumatic incident 0 Work life balance 0 Workload 0 None of the above 0 Professional Development 14 Opportunities for promotion 87 Staff morale 6 Workplace issue 34 Physical environment 5 Worklife balance 7 Stress and pressure support 12 Performance of supervisor 9 Peer support 10 Initiative 9 Skills 11 Coach 55 Career Aspirations 76 Feedback 30 Further PD 54 Communication 8 My say 10 Information 6 Kept informed 9 Wellness programs 56 Health & Safety 29 Gender 24 Age 11 Aboriginal 806 Torres Strait 819 South Sea 815 Disability 799 NESB 790 dtype: int64
dete_survey = pd.read_csv('dete_survey.csv', na_values = 'Not Stated', index_col = 0) # Not Stated read in as NaN. In short, any value that is 'Not Stated' is replaced with NaN
dete_survey # Prints out the updated dataframe
SeparationType | Cease Date | DETE Start Date | Role Start Date | Position | Classification | Region | Business Unit | Employment Status | Career move to public sector | ... | Kept informed | Wellness programs | Health & Safety | Gender | Age | Aboriginal | Torres Strait | South Sea | Disability | NESB | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | |||||||||||||||||||||
1 | Ill Health Retirement | 08/2012 | 1984.0 | 2004.0 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | True | ... | N | N | N | Male | 56-60 | NaN | NaN | NaN | NaN | Yes |
2 | Voluntary Early Retirement (VER) | 08/2012 | NaN | NaN | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | False | ... | N | N | N | Male | 56-60 | NaN | NaN | NaN | NaN | NaN |
3 | Voluntary Early Retirement (VER) | 05/2012 | 2011.0 | 2011.0 | Schools Officer | NaN | Central Office | Education Queensland | Permanent Full-time | False | ... | N | N | N | Male | 61 or older | NaN | NaN | NaN | NaN | NaN |
4 | Resignation-Other reasons | 05/2012 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | False | ... | A | N | A | Female | 36-40 | NaN | NaN | NaN | NaN | NaN |
5 | Age Retirement | 05/2012 | 1970.0 | 1989.0 | Head of Curriculum/Head of Special Education | NaN | South East | NaN | Permanent Full-time | False | ... | N | A | M | Female | 61 or older | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
819 | Age Retirement | 02/2014 | 1977.0 | 1999.0 | Teacher | Primary | Central Queensland | NaN | Permanent Part-time | False | ... | A | A | SA | Female | 56-60 | NaN | NaN | NaN | NaN | NaN |
820 | Age Retirement | 01/2014 | 1980.0 | 1980.0 | Teacher | Secondary | North Coast | NaN | Permanent Full-time | False | ... | N | N | N | Male | 51-55 | NaN | NaN | NaN | NaN | NaN |
821 | Resignation-Move overseas/interstate | 01/2014 | 2009.0 | 2009.0 | Public Servant | A01-A04 | Central Office | Education Queensland | Permanent Full-time | False | ... | A | N | A | Female | 31-35 | NaN | NaN | NaN | NaN | NaN |
822 | Ill Health Retirement | 12/2013 | 2001.0 | 2009.0 | Teacher | Secondary | Darling Downs South West | NaN | Permanent Full-time | False | ... | A | N | A | Female | 41-45 | NaN | NaN | NaN | NaN | NaN |
823 | Resignation-Move overseas/interstate | 12/2013 | NaN | NaN | Teacher Aide | NaN | Metropolitan | NaN | NaN | False | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
822 rows × 55 columns
dete_survey_updated = dete_survey.drop(dete_survey.columns[28:49], axis = 1) # Drops column numbers 28 through 48
tafe_survey_updated = tafe_survey.drop(tafe_survey.columns[17:66], axis = 1) # Drops column numbers 17 through 65b
tafe_survey_updated
Institute | WorkArea | CESSATION YEAR | Reason for ceasing employment | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | Contributing Factors. Dissatisfaction | ... | Contributing Factors. Study | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | Main Factor. Which of these was the main factor for leaving? | CurrentAge. Current Age | Employment Type. Employment Type | Classification. Classification | LengthofServiceOverall. Overall Length of Service at Institute (in years) | LengthofServiceCurrent. Length of Service at current workplace (in years) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Record ID | |||||||||||||||||||||
6.341330e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | 26 30 | Temporary Full-time | Administration (AO) | 1-2 | 1-2 |
6.341337e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Retirement | - | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
6.341388e+17 | Mount Isa Institute of TAFE | Delivery (teaching) | 2010.0 | Retirement | - | - | - | - | - | - | ... | - | - | - | NONE | NaN | NaN | NaN | NaN | NaN | NaN |
6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | - | ... | - | - | - | - | NaN | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350668e+17 | Barrier Reef Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | - | ... | - | - | - | - | NaN | 51-55 | Temporary Full-time | Teacher (including LVT) | 1-2 | 1-2 |
6.350677e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | - | ... | - | - | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
6.350704e+17 | Tropical North Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | - | - | - | - | - | - | ... | - | - | Other | - | NaN | 51-55 | Permanent Full-time | Teacher (including LVT) | 5-6 | 1-2 |
6.350712e+17 | Southbank Institute of Technology | Non-Delivery (corporate) | 2013.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | 41 45 | Temporary Full-time | Professional Officer (PO) | 1-2 | 1-2 |
6.350730e+17 | Tropical North Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | - | - | Career Move - Self-employment | - | - | - | ... | - | Travel | - | - | Career Move - Self-employment | 26 30 | Contract/casual | Administration (AO) | 3-4 | 1-2 |
702 rows × 22 columns
dete_survey_update
. Second, we dropped column numbers 17 through 65 for the tafe_survey dataframe and stored it into a variable called tafe_survey_updated
. The reason to why we made these changes is because they both contain many columns that we do not need to complete our analysis. Also, each dataframe contains many of the same columns, but the column names are different. Lastly, there are multiple columns/answers that indicate an employee resigned because they were dissatisfied. We want to avoid repetitiondete_survey_updated.columns = dete_survey_updated.columns.str.replace(' ', '_').str.lower().str.strip() # Makes all of the columns lower case, gets rid of any trailing white space from the end of strings, and replaces the spaces with underscores
mapping = {'Record ID': 'id', 'CESSATION YEAR': 'cease_date', 'Reason for ceasing employment'
: 'separationtype', 'Gender. What is your Gender': 'gender', 'CurrentAge. Current Age': 'age',
'Employment Type. Employment Type': 'employment_status', 'Classification. Classification': 'position',
'LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service',
'LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service'}
tafe_survey_updated.rename(columns = mapping, inplace = True) # Used the dataframe.rename() method to change some of the column names to more simpler name
tafe_survey_updated.head() # Prints the first five rows for the tafe_survey_updated dataframe
Institute | WorkArea | cease_date | separationtype | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | Contributing Factors. Dissatisfaction | ... | Contributing Factors. Study | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | Main Factor. Which of these was the main factor for leaving? | age | employment_status | position | institute_service | role_service | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Record ID | |||||||||||||||||||||
6.341330e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | 26 30 | Temporary Full-time | Administration (AO) | 1-2 | 1-2 |
6.341337e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Retirement | - | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
6.341388e+17 | Mount Isa Institute of TAFE | Delivery (teaching) | 2010.0 | Retirement | - | - | - | - | - | - | ... | - | - | - | NONE | NaN | NaN | NaN | NaN | NaN | NaN |
6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | - | ... | - | - | - | - | NaN | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
5 rows × 22 columns
dete_survey_updated.head() # Prints the first five rows for the dete_survey_updated dataframe
separationtype | cease_date | dete_start_date | role_start_date | position | classification | region | business_unit | employment_status | career_move_to_public_sector | ... | work_life_balance | workload | none_of_the_above | professional_development | age | aboriginal | torres_strait | south_sea | disability | nesb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | |||||||||||||||||||||
1 | Ill Health Retirement | 08/2012 | 1984.0 | 2004.0 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | True | ... | False | False | True | A | 56-60 | NaN | NaN | NaN | NaN | Yes |
2 | Voluntary Early Retirement (VER) | 08/2012 | NaN | NaN | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | False | ... | False | False | False | A | 56-60 | NaN | NaN | NaN | NaN | NaN |
3 | Voluntary Early Retirement (VER) | 05/2012 | 2011.0 | 2011.0 | Schools Officer | NaN | Central Office | Education Queensland | Permanent Full-time | False | ... | False | False | True | N | 61 or older | NaN | NaN | NaN | NaN | NaN |
4 | Resignation-Other reasons | 05/2012 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | False | ... | False | False | False | A | 36-40 | NaN | NaN | NaN | NaN | NaN |
5 | Age Retirement | 05/2012 | 1970.0 | 1989.0 | Head of Curriculum/Head of Special Education | NaN | South East | NaN | Permanent Full-time | False | ... | True | False | False | A | 61 or older | NaN | NaN | NaN | NaN | NaN |
5 rows × 34 columns
DataFrame.rename()
method to update some of the columns in the tafe_survey_updated dataframe. The reason to why we did this is because both of the dataframes contain many of the same columns but the column names are different. Also, because eventually we want to combine the columns, we had to standardize the column names. I also printed the first five rows of each of the updated dataframes to make sure that my changes look good.dete_survey_updated['separationtype'].value_counts() # Counts up the number of times each unique value shows up in the separationtype column in the dataset
Age Retirement 285 Resignation-Other reasons 150 Resignation-Other employer 91 Resignation-Move overseas/interstate 70 Voluntary Early Retirement (VER) 67 Ill Health Retirement 61 Other 49 Contract Expired 34 Termination 15 Name: separationtype, dtype: int64
tafe_survey_updated['separationtype'].value_counts() # Counts up the number of times each unique value shows up in the separationtype column in the dataset
Resignation 340 Contract Expired 127 Retrenchment/ Redundancy 104 Retirement 82 Transfer 25 Termination 23 Name: separationtype, dtype: int64
dete_survey_updated.copy() # Copied dataset to avoid the SettingWithCopy warning
separationtype | cease_date | dete_start_date | role_start_date | position | classification | region | business_unit | employment_status | career_move_to_public_sector | ... | work_life_balance | workload | none_of_the_above | professional_development | age | aboriginal | torres_strait | south_sea | disability | nesb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | |||||||||||||||||||||
1 | Ill Health Retirement | 08/2012 | 1984.0 | 2004.0 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | True | ... | False | False | True | A | 56-60 | NaN | NaN | NaN | NaN | Yes |
2 | Voluntary Early Retirement (VER) | 08/2012 | NaN | NaN | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | False | ... | False | False | False | A | 56-60 | NaN | NaN | NaN | NaN | NaN |
3 | Voluntary Early Retirement (VER) | 05/2012 | 2011.0 | 2011.0 | Schools Officer | NaN | Central Office | Education Queensland | Permanent Full-time | False | ... | False | False | True | N | 61 or older | NaN | NaN | NaN | NaN | NaN |
4 | Resignation-Other reasons | 05/2012 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | False | ... | False | False | False | A | 36-40 | NaN | NaN | NaN | NaN | NaN |
5 | Age Retirement | 05/2012 | 1970.0 | 1989.0 | Head of Curriculum/Head of Special Education | NaN | South East | NaN | Permanent Full-time | False | ... | True | False | False | A | 61 or older | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
819 | Age Retirement | 02/2014 | 1977.0 | 1999.0 | Teacher | Primary | Central Queensland | NaN | Permanent Part-time | False | ... | False | True | False | SA | 56-60 | NaN | NaN | NaN | NaN | NaN |
820 | Age Retirement | 01/2014 | 1980.0 | 1980.0 | Teacher | Secondary | North Coast | NaN | Permanent Full-time | False | ... | False | False | True | A | 51-55 | NaN | NaN | NaN | NaN | NaN |
821 | Resignation-Move overseas/interstate | 01/2014 | 2009.0 | 2009.0 | Public Servant | A01-A04 | Central Office | Education Queensland | Permanent Full-time | False | ... | True | False | False | A | 31-35 | NaN | NaN | NaN | NaN | NaN |
822 | Ill Health Retirement | 12/2013 | 2001.0 | 2009.0 | Teacher | Secondary | Darling Downs South West | NaN | Permanent Full-time | False | ... | False | False | False | A | 41-45 | NaN | NaN | NaN | NaN | NaN |
823 | Resignation-Move overseas/interstate | 12/2013 | NaN | NaN | Teacher Aide | NaN | Metropolitan | NaN | NaN | False | ... | False | False | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
822 rows × 34 columns
tafe_survey_updated.copy() # Copied dataset to avoid the SettingWithCopy warning
Institute | WorkArea | cease_date | separationtype | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | Contributing Factors. Dissatisfaction | ... | Contributing Factors. Study | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | Main Factor. Which of these was the main factor for leaving? | age | employment_status | position | institute_service | role_service | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Record ID | |||||||||||||||||||||
6.341330e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | 26 30 | Temporary Full-time | Administration (AO) | 1-2 | 1-2 |
6.341337e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Retirement | - | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
6.341388e+17 | Mount Isa Institute of TAFE | Delivery (teaching) | 2010.0 | Retirement | - | - | - | - | - | - | ... | - | - | - | NONE | NaN | NaN | NaN | NaN | NaN | NaN |
6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | - | ... | - | - | - | - | NaN | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350668e+17 | Barrier Reef Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | - | ... | - | - | - | - | NaN | 51-55 | Temporary Full-time | Teacher (including LVT) | 1-2 | 1-2 |
6.350677e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | - | ... | - | - | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
6.350704e+17 | Tropical North Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | - | - | - | - | - | - | ... | - | - | Other | - | NaN | 51-55 | Permanent Full-time | Teacher (including LVT) | 5-6 | 1-2 |
6.350712e+17 | Southbank Institute of Technology | Non-Delivery (corporate) | 2013.0 | Contract Expired | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | 41 45 | Temporary Full-time | Professional Officer (PO) | 1-2 | 1-2 |
6.350730e+17 | Tropical North Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | - | - | Career Move - Self-employment | - | - | - | ... | - | Travel | - | - | Career Move - Self-employment | 26 30 | Contract/casual | Administration (AO) | 3-4 | 1-2 |
702 rows × 22 columns
tafe_resignations = tafe_survey_updated[tafe_survey_updated['separationtype'] == "Resignation"] # Selects the rows where the separationtype is resignation for the tafe_survey_updated dataframe
tafe_resignations
Institute | WorkArea | cease_date | separationtype | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | Contributing Factors. Dissatisfaction | ... | Contributing Factors. Study | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | Main Factor. Which of these was the main factor for leaving? | age | employment_status | position | institute_service | role_service | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Record ID | |||||||||||||||||||||
6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | - | ... | - | - | - | - | NaN | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
6.341475e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | - | - | - | - | - | ... | - | - | Other | - | NaN | 56 or older | Contract/casual | Teacher (including LVT) | 7-10 | 7-10 |
6.341520e+17 | Barrier Reef Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | Maternity/Family | - | ... | - | - | Other | - | Career Move - Private Sector | 20 or younger | Temporary Full-time | Administration (AO) | 3-4 | 3-4 |
6.341537e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | - | - | - | - | - | ... | - | - | Other | - | NaN | 46 50 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350660e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | - | Career Move - Private Sector | - | - | - | - | ... | - | - | - | - | NaN | 21 25 | Temporary Full-time | Operational (OO) | 5-6 | 5-6 |
6.350668e+17 | Barrier Reef Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | - | ... | - | - | - | - | NaN | 51-55 | Temporary Full-time | Teacher (including LVT) | 1-2 | 1-2 |
6.350677e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | - | ... | - | - | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
6.350704e+17 | Tropical North Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | - | - | - | - | - | - | ... | - | - | Other | - | NaN | 51-55 | Permanent Full-time | Teacher (including LVT) | 5-6 | 1-2 |
6.350730e+17 | Tropical North Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | - | - | Career Move - Self-employment | - | - | - | ... | - | Travel | - | - | Career Move - Self-employment | 26 30 | Contract/casual | Administration (AO) | 3-4 | 1-2 |
340 rows × 22 columns
dete_resignations = dete_survey_updated[(dete_survey_updated["separationtype"] == "Resignation-Other reasons")
| (dete_survey_updated["separationtype"] == "Resignation-Other employer")
| (dete_survey_updated["separationtype"] == "Resignation-Move overseas/interstate")]
# Selects the rows that have a separation type value of either resignation- other reasons, resignation - other employer, or
# resignation- move overseas/interstate and stored into the dete_resignations variable
dete_resignations # Prints out the updated dete_survey dataframe
separationtype | cease_date | dete_start_date | role_start_date | position | classification | region | business_unit | employment_status | career_move_to_public_sector | ... | work_life_balance | workload | none_of_the_above | professional_development | age | aboriginal | torres_strait | south_sea | disability | nesb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | |||||||||||||||||||||
4 | Resignation-Other reasons | 05/2012 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | False | ... | False | False | False | A | 36-40 | NaN | NaN | NaN | NaN | NaN |
6 | Resignation-Other reasons | 05/2012 | 1994.0 | 1997.0 | Guidance Officer | NaN | Central Office | Education Queensland | Permanent Full-time | False | ... | False | False | False | SD | 41-45 | NaN | NaN | NaN | NaN | NaN |
9 | Resignation-Other reasons | 07/2012 | 2009.0 | 2009.0 | Teacher | Secondary | North Queensland | NaN | Permanent Full-time | False | ... | False | False | False | D | 31-35 | NaN | NaN | NaN | NaN | NaN |
10 | Resignation-Other employer | 2012 | 1997.0 | 2008.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | False | ... | False | False | False | SD | 46-50 | NaN | NaN | NaN | NaN | NaN |
12 | Resignation-Move overseas/interstate | 2012 | 2009.0 | 2009.0 | Teacher | Secondary | Far North Queensland | NaN | Permanent Full-time | False | ... | False | False | False | N | 31-35 | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
810 | Resignation-Other reasons | 12/2013 | 2010.0 | 2010.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | False | ... | False | False | False | A | 26-30 | NaN | NaN | NaN | NaN | NaN |
817 | Resignation-Other employer | 01/2014 | 2012.0 | 2012.0 | Teacher | Primary | Far North Queensland | NaN | Permanent Full-time | False | ... | False | False | False | A | 21-25 | NaN | NaN | NaN | NaN | NaN |
818 | Resignation-Move overseas/interstate | 01/2014 | 2012.0 | 2012.0 | Teacher | Secondary | North Coast | NaN | Permanent Full-time | False | ... | False | False | False | SA | 21-25 | NaN | NaN | NaN | NaN | NaN |
821 | Resignation-Move overseas/interstate | 01/2014 | 2009.0 | 2009.0 | Public Servant | A01-A04 | Central Office | Education Queensland | Permanent Full-time | False | ... | True | False | False | A | 31-35 | NaN | NaN | NaN | NaN | NaN |
823 | Resignation-Move overseas/interstate | 12/2013 | NaN | NaN | Teacher Aide | NaN | Metropolitan | NaN | NaN | False | ... | False | False | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
311 rows × 34 columns
DataFrame.copy()
method to avoid the SettingWithCopy warning. For the tafe_survey_updated dataframe we selected the rows that have a value of resignation for the reason for ceasing employment column and stored it into a variable called tafe_resignations
. For the dete_survey_updated we selected the rows that either have a value of resignation-other reasons, resignation-other employer, resignation-move overseas/interstate values for the separationtype column and store it into a variable called dete_resignations
. We did this because if we look at the unique values in the separationtype
columns in each of the dataframes, we'll see that each contains a couple of different separation types. We only want to analyze survey respondents who resigned, so their separationtype contains the string Resignation
.dete_resignations['cease_date'].value_counts() # Views how many times each of the unique values in the cease_date column show up
2012 126 2013 74 01/2014 22 12/2013 17 06/2013 14 09/2013 11 07/2013 9 11/2013 9 10/2013 6 08/2013 4 05/2012 2 05/2013 2 07/2012 1 2010 1 09/2010 1 07/2006 1 Name: cease_date, dtype: int64
pattern = r"([1-2][0-9][0-9][0-9])" # What each digit in the years part can be. The first three digits can be between 0 and 9 and the last digit can be either 1 or 2
dete_resignations['cease_date'] = dete_resignations['cease_date'].str.extract(pattern) # Extracts years from the cease_date column by using the Series.str.extract() method
/var/folders/0s/88sw5bwj56v6qymnqpqx8sjm0000gn/T/ipykernel_21902/2022234742.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy dete_resignations['cease_date'] = dete_resignations['cease_date'].str.extract(pattern) # Extracts years from the cease_date column by using the Series.str.extract() method
dete_resignations['cease_date'] = dete_resignations['cease_date'].astype('float') # Converts the values in the ceasedate column to a float or decimal
/var/folders/0s/88sw5bwj56v6qymnqpqx8sjm0000gn/T/ipykernel_21902/3777933587.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy dete_resignations['cease_date'] = dete_resignations['cease_date'].astype('float') # Converts the values in the ceasedate column to a float or decimal
dete_resignations # Most up to date dataframe
separationtype | cease_date | dete_start_date | role_start_date | position | classification | region | business_unit | employment_status | career_move_to_public_sector | ... | work_life_balance | workload | none_of_the_above | professional_development | age | aboriginal | torres_strait | south_sea | disability | nesb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | |||||||||||||||||||||
4 | Resignation-Other reasons | 2012.0 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | False | ... | False | False | False | A | 36-40 | NaN | NaN | NaN | NaN | NaN |
6 | Resignation-Other reasons | 2012.0 | 1994.0 | 1997.0 | Guidance Officer | NaN | Central Office | Education Queensland | Permanent Full-time | False | ... | False | False | False | SD | 41-45 | NaN | NaN | NaN | NaN | NaN |
9 | Resignation-Other reasons | 2012.0 | 2009.0 | 2009.0 | Teacher | Secondary | North Queensland | NaN | Permanent Full-time | False | ... | False | False | False | D | 31-35 | NaN | NaN | NaN | NaN | NaN |
10 | Resignation-Other employer | 2012.0 | 1997.0 | 2008.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | False | ... | False | False | False | SD | 46-50 | NaN | NaN | NaN | NaN | NaN |
12 | Resignation-Move overseas/interstate | 2012.0 | 2009.0 | 2009.0 | Teacher | Secondary | Far North Queensland | NaN | Permanent Full-time | False | ... | False | False | False | N | 31-35 | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
810 | Resignation-Other reasons | 2013.0 | 2010.0 | 2010.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | False | ... | False | False | False | A | 26-30 | NaN | NaN | NaN | NaN | NaN |
817 | Resignation-Other employer | 2014.0 | 2012.0 | 2012.0 | Teacher | Primary | Far North Queensland | NaN | Permanent Full-time | False | ... | False | False | False | A | 21-25 | NaN | NaN | NaN | NaN | NaN |
818 | Resignation-Move overseas/interstate | 2014.0 | 2012.0 | 2012.0 | Teacher | Secondary | North Coast | NaN | Permanent Full-time | False | ... | False | False | False | SA | 21-25 | NaN | NaN | NaN | NaN | NaN |
821 | Resignation-Move overseas/interstate | 2014.0 | 2009.0 | 2009.0 | Public Servant | A01-A04 | Central Office | Education Queensland | Permanent Full-time | False | ... | True | False | False | A | 31-35 | NaN | NaN | NaN | NaN | NaN |
823 | Resignation-Move overseas/interstate | 2013.0 | NaN | NaN | Teacher Aide | NaN | Metropolitan | NaN | NaN | False | ... | False | False | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
311 rows × 34 columns
dete_resignations['cease_date'].value_counts() # Number of times each unique value shows up in the cease_date column
2013.0 146 2012.0 129 2014.0 22 2010.0 2 2006.0 1 Name: cease_date, dtype: int64
dete_resignations['dete_start_date'].value_counts().sort_index(ascending = True) # Number of times each unique value shows up in the dete_start_date column
1963.0 1 1971.0 1 1972.0 1 1973.0 1 1974.0 2 1975.0 1 1976.0 2 1977.0 1 1980.0 5 1982.0 1 1983.0 2 1984.0 1 1985.0 3 1986.0 3 1987.0 1 1988.0 4 1989.0 4 1990.0 5 1991.0 4 1992.0 6 1993.0 5 1994.0 6 1995.0 4 1996.0 6 1997.0 5 1998.0 6 1999.0 8 2000.0 9 2001.0 3 2002.0 6 2003.0 6 2004.0 14 2005.0 15 2006.0 13 2007.0 21 2008.0 22 2009.0 13 2010.0 17 2011.0 24 2012.0 21 2013.0 10 Name: dete_start_date, dtype: int64
tafe_resignations['cease_date'].value_counts().sort_index(ascending = True) # Counts up the number of times each unique value shows up in the cessation year column in ascending order
2009.0 2 2010.0 68 2011.0 116 2012.0 94 2013.0 55 Name: cease_date, dtype: int64
tafe_survey_updated.info()
<class 'pandas.core.frame.DataFrame'> Float64Index: 702 entries, 6.34133009996094e+17 to 6.35073030973791e+17 Data columns (total 22 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Institute 702 non-null object 1 WorkArea 702 non-null object 2 cease_date 695 non-null float64 3 separationtype 701 non-null object 4 Contributing Factors. Career Move - Public Sector 437 non-null object 5 Contributing Factors. Career Move - Private Sector 437 non-null object 6 Contributing Factors. Career Move - Self-employment 437 non-null object 7 Contributing Factors. Ill Health 437 non-null object 8 Contributing Factors. Maternity/Family 437 non-null object 9 Contributing Factors. Dissatisfaction 437 non-null object 10 Contributing Factors. Job Dissatisfaction 437 non-null object 11 Contributing Factors. Interpersonal Conflict 437 non-null object 12 Contributing Factors. Study 437 non-null object 13 Contributing Factors. Travel 437 non-null object 14 Contributing Factors. Other 437 non-null object 15 Contributing Factors. NONE 437 non-null object 16 Main Factor. Which of these was the main factor for leaving? 113 non-null object 17 age 596 non-null object 18 employment_status 596 non-null object 19 position 596 non-null object 20 institute_service 596 non-null object 21 role_service 596 non-null object dtypes: float64(1), object(21) memory usage: 126.1+ KB
dete_resignations.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 311 entries, 4 to 823 Data columns (total 34 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 separationtype 311 non-null object 1 cease_date 300 non-null float64 2 dete_start_date 283 non-null float64 3 role_start_date 271 non-null float64 4 position 308 non-null object 5 classification 161 non-null object 6 region 265 non-null object 7 business_unit 32 non-null object 8 employment_status 307 non-null object 9 career_move_to_public_sector 311 non-null bool 10 career_move_to_private_sector 311 non-null bool 11 interpersonal_conflicts 311 non-null bool 12 job_dissatisfaction 311 non-null bool 13 dissatisfaction_with_the_department 311 non-null bool 14 physical_work_environment 311 non-null bool 15 lack_of_recognition 311 non-null bool 16 lack_of_job_security 311 non-null bool 17 work_location 311 non-null bool 18 employment_conditions 311 non-null bool 19 maternity/family 311 non-null bool 20 relocation 311 non-null bool 21 study/travel 311 non-null bool 22 ill_health 311 non-null bool 23 traumatic_incident 311 non-null bool 24 work_life_balance 311 non-null bool 25 workload 311 non-null bool 26 none_of_the_above 311 non-null bool 27 professional_development 305 non-null object 28 age 306 non-null object 29 aboriginal 7 non-null object 30 torres_strait 0 non-null object 31 south_sea 3 non-null object 32 disability 8 non-null object 33 nesb 9 non-null object dtypes: bool(18), float64(3), object(13) memory usage: 46.8+ KB
%matplotlib inline
import matplotlib.pyplot as plt
boxplot1 = dete_resignations['dete_start_date'].plot(kind = 'box') # Boxplot of the dete_start_date column
# that is in the dete_resignations dataframe
boxplot2 = dete_resignations['cease_date'].plot(kind = 'box') # Boxplot of the cease_date column
# that is in the dete_resignations dataframe
boxplot3 = tafe_resignations['cease_date'].plot(kind = 'box') # Boxplot of the cessation year
cease_date
column from the dete_resignations
dataframe showed up. After that , we extracted the years from the cease_date
column by using vectorized string methods. We did this because since the cease_date
is the last year of the person's employment and the dete_start_date
is the person's first year of employment, it wouldn't make any sense to have years after the current date. Then we converted it to a float. After that, we checked the number of times each unique value showed up in the cease_date
column in the tafe_resignations
dataframe. Then we made a couple of box and whisker plots and we can see that each of the columns in the box and whisker plots have some outlierstafe_resignations
Institute | WorkArea | cease_date | separationtype | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | Contributing Factors. Dissatisfaction | ... | Contributing Factors. Study | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | Main Factor. Which of these was the main factor for leaving? | age | employment_status | position | institute_service | role_service | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Record ID | |||||||||||||||||||||
6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | - | ... | - | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | - | ... | - | - | - | - | NaN | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
6.341475e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | - | - | - | - | - | ... | - | - | Other | - | NaN | 56 or older | Contract/casual | Teacher (including LVT) | 7-10 | 7-10 |
6.341520e+17 | Barrier Reef Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | Maternity/Family | - | ... | - | - | Other | - | Career Move - Private Sector | 20 or younger | Temporary Full-time | Administration (AO) | 3-4 | 3-4 |
6.341537e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | - | - | - | - | - | ... | - | - | Other | - | NaN | 46 50 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350660e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | - | Career Move - Private Sector | - | - | - | - | ... | - | - | - | - | NaN | 21 25 | Temporary Full-time | Operational (OO) | 5-6 | 5-6 |
6.350668e+17 | Barrier Reef Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | - | ... | - | - | - | - | NaN | 51-55 | Temporary Full-time | Teacher (including LVT) | 1-2 | 1-2 |
6.350677e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | - | ... | - | - | - | - | NaN | NaN | NaN | NaN | NaN | NaN |
6.350704e+17 | Tropical North Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | - | - | - | - | - | - | ... | - | - | Other | - | NaN | 51-55 | Permanent Full-time | Teacher (including LVT) | 5-6 | 1-2 |
6.350730e+17 | Tropical North Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | - | - | Career Move - Self-employment | - | - | - | ... | - | Travel | - | - | Career Move - Self-employment | 26 30 | Contract/casual | Administration (AO) | 3-4 | 1-2 |
340 rows × 22 columns
dete_resignations
separationtype | cease_date | dete_start_date | role_start_date | position | classification | region | business_unit | employment_status | career_move_to_public_sector | ... | work_life_balance | workload | none_of_the_above | professional_development | age | aboriginal | torres_strait | south_sea | disability | nesb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | |||||||||||||||||||||
4 | Resignation-Other reasons | 2012.0 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | False | ... | False | False | False | A | 36-40 | NaN | NaN | NaN | NaN | NaN |
6 | Resignation-Other reasons | 2012.0 | 1994.0 | 1997.0 | Guidance Officer | NaN | Central Office | Education Queensland | Permanent Full-time | False | ... | False | False | False | SD | 41-45 | NaN | NaN | NaN | NaN | NaN |
9 | Resignation-Other reasons | 2012.0 | 2009.0 | 2009.0 | Teacher | Secondary | North Queensland | NaN | Permanent Full-time | False | ... | False | False | False | D | 31-35 | NaN | NaN | NaN | NaN | NaN |
10 | Resignation-Other employer | 2012.0 | 1997.0 | 2008.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | False | ... | False | False | False | SD | 46-50 | NaN | NaN | NaN | NaN | NaN |
12 | Resignation-Move overseas/interstate | 2012.0 | 2009.0 | 2009.0 | Teacher | Secondary | Far North Queensland | NaN | Permanent Full-time | False | ... | False | False | False | N | 31-35 | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
810 | Resignation-Other reasons | 2013.0 | 2010.0 | 2010.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | False | ... | False | False | False | A | 26-30 | NaN | NaN | NaN | NaN | NaN |
817 | Resignation-Other employer | 2014.0 | 2012.0 | 2012.0 | Teacher | Primary | Far North Queensland | NaN | Permanent Full-time | False | ... | False | False | False | A | 21-25 | NaN | NaN | NaN | NaN | NaN |
818 | Resignation-Move overseas/interstate | 2014.0 | 2012.0 | 2012.0 | Teacher | Secondary | North Coast | NaN | Permanent Full-time | False | ... | False | False | False | SA | 21-25 | NaN | NaN | NaN | NaN | NaN |
821 | Resignation-Move overseas/interstate | 2014.0 | 2009.0 | 2009.0 | Public Servant | A01-A04 | Central Office | Education Queensland | Permanent Full-time | False | ... | True | False | False | A | 31-35 | NaN | NaN | NaN | NaN | NaN |
823 | Resignation-Move overseas/interstate | 2013.0 | NaN | NaN | Teacher Aide | NaN | Metropolitan | NaN | NaN | False | ... | False | False | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
311 rows × 34 columns
dete_resignations['institute_service'] = dete_resignations['cease_date'] - dete_resignations['dete_start_date'] # Created a new column
# name called institute services which is simply the cease date minus the dete start date
/var/folders/0s/88sw5bwj56v6qymnqpqx8sjm0000gn/T/ipykernel_21902/260048519.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy dete_resignations['institute_service'] = dete_resignations['cease_date'] - dete_resignations['dete_start_date'] # Created a new column
institute_services
for the dete_resignations
dataframe. We did this because in order to analyze both of the surveys together, we would have to create a corresponding institute_service
column in dete_resignations
. Also, for our information, in the human resources field, the length of time an employee spent in a workplace is referred to as their years of service. This is the question we want to answer:tafe_resignations['Contributing Factors. Dissatisfaction'].value_counts().sort_index(ascending= True) # Prints out the number of
# times each unique value shows up in the contributing factors dissatisfaction column in the tafe_resignations dataframe
- 277 Contributing Factors. Dissatisfaction 55 Name: Contributing Factors. Dissatisfaction, dtype: int64
tafe_resignations['Contributing Factors. Job Dissatisfaction'].value_counts().sort_index(ascending= True) # Prints out the number of
# times each unique value shows up in the contributing factors job dissatisfaction column in the tafe_resignations dataframe
- 270 Job Dissatisfaction 62 Name: Contributing Factors. Job Dissatisfaction, dtype: int64
def update_vals(value): # Created a function called update_vals that has one parameter called value
if pd.isnull(value):
return np.nan
if value == '-':
return False
else:
return True
tafe_resignations[['Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction']].applymap(update_vals)
# Applies some of the column names from the tafe_resignations dataframe and sees if any of them contain a true, false , or NaN value. If it is false, then it will not be in a column called
# dissatisfied
Contributing Factors. Dissatisfaction | Contributing Factors. Job Dissatisfaction | |
---|---|---|
Record ID | ||
6.341399e+17 | False | False |
6.341466e+17 | False | False |
6.341475e+17 | False | False |
6.341520e+17 | False | False |
6.341537e+17 | False | False |
... | ... | ... |
6.350660e+17 | False | False |
6.350668e+17 | False | False |
6.350677e+17 | False | False |
6.350704e+17 | False | False |
6.350730e+17 | False | False |
340 rows × 2 columns
tafe_resignations['dissatisfied'] = tafe_resignations[['Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction']].any(axis = 1, skipna = False) #
# Creates a new column called dissatisfied for the tafe_resignations dataframe
# In the dissatisfied column, it will only be true if the employee left only because they were dissatisfed
# But if there were other reasons other than being dissatisfied, then it will be false
/var/folders/0s/88sw5bwj56v6qymnqpqx8sjm0000gn/T/ipykernel_21902/3873368079.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy tafe_resignations['dissatisfied'] = tafe_resignations[['Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction']].any(axis = 1, skipna = False) #
dete_resignations['dissatisfied'] = dete_resignations[['job_dissatisfaction', 'dissatisfaction_with_the_department',
'physical_work_environment', 'lack_of_recognition','lack_of_job_security', 'work_location', 'employment_conditions',
'work_life_balance', 'workload' ]].any(axis = 1, skipna = False)
# Creates a new column called dissatisfied for the dete_resignations dataframe.
# In the dissatisfied column, it will only be true if the employee left only because they were dissatisfed
# But if there were other reasons other than being dissatisfied, then it will be false
/var/folders/0s/88sw5bwj56v6qymnqpqx8sjm0000gn/T/ipykernel_21902/2955350181.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy dete_resignations['dissatisfied'] = dete_resignations[['job_dissatisfaction', 'dissatisfaction_with_the_department',
tafe_resignations_up = tafe_resignations.copy() # Copies the file so that it will avoid a SettingWithCopy Warning
tafe_resignations_up
Institute | WorkArea | cease_date | separationtype | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | Contributing Factors. Dissatisfaction | ... | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | Main Factor. Which of these was the main factor for leaving? | age | employment_status | position | institute_service | role_service | dissatisfied | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Record ID | |||||||||||||||||||||
6.341399e+17 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | - | - | - | - | - | ... | Travel | - | - | NaN | NaN | NaN | NaN | NaN | NaN | True |
6.341466e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | - | - | ... | - | - | - | NaN | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 | True |
6.341475e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | - | - | - | - | - | ... | - | Other | - | NaN | 56 or older | Contract/casual | Teacher (including LVT) | 7-10 | 7-10 | True |
6.341520e+17 | Barrier Reef Institute of TAFE | Non-Delivery (corporate) | 2010.0 | Resignation | - | Career Move - Private Sector | - | - | Maternity/Family | - | ... | - | Other | - | Career Move - Private Sector | 20 or younger | Temporary Full-time | Administration (AO) | 3-4 | 3-4 | True |
6.341537e+17 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2010.0 | Resignation | - | - | - | - | - | - | ... | - | Other | - | NaN | 46 50 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 | True |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350660e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | - | Career Move - Private Sector | - | - | - | - | ... | - | - | - | NaN | 21 25 | Temporary Full-time | Operational (OO) | 5-6 | 5-6 | True |
6.350668e+17 | Barrier Reef Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | - | ... | - | - | - | NaN | 51-55 | Temporary Full-time | Teacher (including LVT) | 1-2 | 1-2 | True |
6.350677e+17 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | Career Move - Public Sector | - | - | - | - | - | ... | - | - | - | NaN | NaN | NaN | NaN | NaN | NaN | True |
6.350704e+17 | Tropical North Institute of TAFE | Delivery (teaching) | 2013.0 | Resignation | - | - | - | - | - | - | ... | - | Other | - | NaN | 51-55 | Permanent Full-time | Teacher (including LVT) | 5-6 | 1-2 | True |
6.350730e+17 | Tropical North Institute of TAFE | Non-Delivery (corporate) | 2013.0 | Resignation | - | - | Career Move - Self-employment | - | - | - | ... | Travel | - | - | Career Move - Self-employment | 26 30 | Contract/casual | Administration (AO) | 3-4 | 1-2 | True |
340 rows × 23 columns
dete_resignations_up = dete_resignations.copy() # Copies the file so that it will avoid a SettingWithCopy Warning
dete_resignations_up
separationtype | cease_date | dete_start_date | role_start_date | position | classification | region | business_unit | employment_status | career_move_to_public_sector | ... | none_of_the_above | professional_development | age | aboriginal | torres_strait | south_sea | disability | nesb | institute_service | dissatisfied | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | |||||||||||||||||||||
4 | Resignation-Other reasons | 2012.0 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | False | ... | False | A | 36-40 | NaN | NaN | NaN | NaN | NaN | 7.0 | False |
6 | Resignation-Other reasons | 2012.0 | 1994.0 | 1997.0 | Guidance Officer | NaN | Central Office | Education Queensland | Permanent Full-time | False | ... | False | SD | 41-45 | NaN | NaN | NaN | NaN | NaN | 18.0 | True |
9 | Resignation-Other reasons | 2012.0 | 2009.0 | 2009.0 | Teacher | Secondary | North Queensland | NaN | Permanent Full-time | False | ... | False | D | 31-35 | NaN | NaN | NaN | NaN | NaN | 3.0 | False |
10 | Resignation-Other employer | 2012.0 | 1997.0 | 2008.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | False | ... | False | SD | 46-50 | NaN | NaN | NaN | NaN | NaN | 15.0 | True |
12 | Resignation-Move overseas/interstate | 2012.0 | 2009.0 | 2009.0 | Teacher | Secondary | Far North Queensland | NaN | Permanent Full-time | False | ... | False | N | 31-35 | NaN | NaN | NaN | NaN | NaN | 3.0 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
810 | Resignation-Other reasons | 2013.0 | 2010.0 | 2010.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | False | ... | False | A | 26-30 | NaN | NaN | NaN | NaN | NaN | 3.0 | False |
817 | Resignation-Other employer | 2014.0 | 2012.0 | 2012.0 | Teacher | Primary | Far North Queensland | NaN | Permanent Full-time | False | ... | False | A | 21-25 | NaN | NaN | NaN | NaN | NaN | 2.0 | False |
818 | Resignation-Move overseas/interstate | 2014.0 | 2012.0 | 2012.0 | Teacher | Secondary | North Coast | NaN | Permanent Full-time | False | ... | False | SA | 21-25 | NaN | NaN | NaN | NaN | NaN | 2.0 | False |
821 | Resignation-Move overseas/interstate | 2014.0 | 2009.0 | 2009.0 | Public Servant | A01-A04 | Central Office | Education Queensland | Permanent Full-time | False | ... | False | A | 31-35 | NaN | NaN | NaN | NaN | NaN | 5.0 | True |
823 | Resignation-Move overseas/interstate | 2013.0 | NaN | NaN | Teacher Aide | NaN | Metropolitan | NaN | NaN | False | ... | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | False |
311 rows × 36 columns
Contributing Factors. Job Dissatisfaction
and Contributing Factors. Dissatisfaction
columns in the tafe_resignations
dataframe. After that, we made a function called update_vals()
that included a parameter called value
that does these following things:NaN
, return np.nan
-
, return FalseTrue
df.applymap()
method for both of the tafe_resignations
and dete_resignations
dataframes to apply the update_vals()
. After that, we used the Dataframe,any()
in order to make a new column called dissatisfied
. Then after that, we copied both of the files using the df.copy()
method to avoid the SettingWithCopy Warningdete_resignations_up['institute'] = 'DETE' # Created a new column called institute and have each row have a value called DETE
tafe_resignations_up['institute'] = 'TAFE' # Created a new column called institue and have each row have a value called TAFE
combined = pd.concat([dete_resignations_up, tafe_resignations_up])
# Combines both of the dete_resignation and tafe_resignation dataframes horizontally
combined
/opt/anaconda3/envs/cpi/lib/python3.10/site-packages/IPython/lib/pretty.py:778: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead. output = repr(obj) /opt/anaconda3/envs/cpi/lib/python3.10/site-packages/IPython/core/formatters.py:343: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead. return method()
separationtype | cease_date | dete_start_date | role_start_date | position | classification | region | business_unit | employment_status | career_move_to_public_sector | ... | Contributing Factors. Maternity/Family | Contributing Factors. Dissatisfaction | Contributing Factors. Job Dissatisfaction | Contributing Factors. Interpersonal Conflict | Contributing Factors. Study | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | Main Factor. Which of these was the main factor for leaving? | role_service | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4.000000e+00 | Resignation-Other reasons | 2012.0 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | False | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
6.000000e+00 | Resignation-Other reasons | 2012.0 | 1994.0 | 1997.0 | Guidance Officer | NaN | Central Office | Education Queensland | Permanent Full-time | False | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
9.000000e+00 | Resignation-Other reasons | 2012.0 | 2009.0 | 2009.0 | Teacher | Secondary | North Queensland | NaN | Permanent Full-time | False | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1.000000e+01 | Resignation-Other employer | 2012.0 | 1997.0 | 2008.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | False | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1.200000e+01 | Resignation-Move overseas/interstate | 2012.0 | 2009.0 | 2009.0 | Teacher | Secondary | Far North Queensland | NaN | Permanent Full-time | False | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350660e+17 | Resignation | 2013.0 | NaN | NaN | Operational (OO) | NaN | NaN | NaN | Temporary Full-time | NaN | ... | - | - | - | - | - | - | - | - | NaN | 5-6 |
6.350668e+17 | Resignation | 2013.0 | NaN | NaN | Teacher (including LVT) | NaN | NaN | NaN | Temporary Full-time | NaN | ... | - | - | - | - | - | - | - | - | NaN | 1-2 |
6.350677e+17 | Resignation | 2013.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | - | - | - | - | - | - | - | - | NaN | NaN |
6.350704e+17 | Resignation | 2013.0 | NaN | NaN | Teacher (including LVT) | NaN | NaN | NaN | Permanent Full-time | NaN | ... | - | - | - | - | - | - | Other | - | NaN | 1-2 |
6.350730e+17 | Resignation | 2013.0 | NaN | NaN | Administration (AO) | NaN | NaN | NaN | Contract/casual | NaN | ... | - | - | - | - | - | Travel | - | - | Career Move - Self-employment | 1-2 |
651 rows × 53 columns
combined_updated = combined.dropna(thresh = 500, axis = 1) # Drops any columns that have less than 500 non null values
combined_updated
separationtype | cease_date | position | employment_status | age | institute_service | dissatisfied | institute | |
---|---|---|---|---|---|---|---|---|
4.000000e+00 | Resignation-Other reasons | 2012.0 | Teacher | Permanent Full-time | 36-40 | 7.0 | False | DETE |
6.000000e+00 | Resignation-Other reasons | 2012.0 | Guidance Officer | Permanent Full-time | 41-45 | 18.0 | True | DETE |
9.000000e+00 | Resignation-Other reasons | 2012.0 | Teacher | Permanent Full-time | 31-35 | 3.0 | False | DETE |
1.000000e+01 | Resignation-Other employer | 2012.0 | Teacher Aide | Permanent Part-time | 46-50 | 15.0 | True | DETE |
1.200000e+01 | Resignation-Move overseas/interstate | 2012.0 | Teacher | Permanent Full-time | 31-35 | 3.0 | False | DETE |
... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350660e+17 | Resignation | 2013.0 | Operational (OO) | Temporary Full-time | 21 25 | 5-6 | True | TAFE |
6.350668e+17 | Resignation | 2013.0 | Teacher (including LVT) | Temporary Full-time | 51-55 | 1-2 | True | TAFE |
6.350677e+17 | Resignation | 2013.0 | NaN | NaN | NaN | NaN | True | TAFE |
6.350704e+17 | Resignation | 2013.0 | Teacher (including LVT) | Permanent Full-time | 51-55 | 5-6 | True | TAFE |
6.350730e+17 | Resignation | 2013.0 | Administration (AO) | Contract/casual | 26 30 | 3-4 | True | TAFE |
651 rows × 8 columns
institute
for the dete_resignation_up
dataframe and had each row have a value of DETE
for that column. Then, we created a column called institute
for the tafe_resignations_up
dataframe. We also had each row have a vale of TAFE
for that column. Then, we combined both of the datasets and stored it into a variable called combined
. Then we dropped any columns that have less than 500 non null values by using the Dataframe.dropna()
method. We used the thresh
command and ended up storing that in a variable called combined_updated
combined_updated['institute_service'] = combined_updated['institute_service'].astype('str') # Changing the column name to a string type
/var/folders/0s/88sw5bwj56v6qymnqpqx8sjm0000gn/T/ipykernel_21902/3024573208.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy combined_updated['institute_service'] = combined_updated['institute_service'].astype('str') # Changing the column name to a string type
combined_updated
separationtype | cease_date | position | employment_status | age | institute_service | dissatisfied | institute | |
---|---|---|---|---|---|---|---|---|
4.000000e+00 | Resignation-Other reasons | 2012.0 | Teacher | Permanent Full-time | 36-40 | 7.0 | False | DETE |
6.000000e+00 | Resignation-Other reasons | 2012.0 | Guidance Officer | Permanent Full-time | 41-45 | 18.0 | True | DETE |
9.000000e+00 | Resignation-Other reasons | 2012.0 | Teacher | Permanent Full-time | 31-35 | 3.0 | False | DETE |
1.000000e+01 | Resignation-Other employer | 2012.0 | Teacher Aide | Permanent Part-time | 46-50 | 15.0 | True | DETE |
1.200000e+01 | Resignation-Move overseas/interstate | 2012.0 | Teacher | Permanent Full-time | 31-35 | 3.0 | False | DETE |
... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350660e+17 | Resignation | 2013.0 | Operational (OO) | Temporary Full-time | 21 25 | 5-6 | True | TAFE |
6.350668e+17 | Resignation | 2013.0 | Teacher (including LVT) | Temporary Full-time | 51-55 | 1-2 | True | TAFE |
6.350677e+17 | Resignation | 2013.0 | NaN | NaN | NaN | nan | True | TAFE |
6.350704e+17 | Resignation | 2013.0 | Teacher (including LVT) | Permanent Full-time | 51-55 | 5-6 | True | TAFE |
6.350730e+17 | Resignation | 2013.0 | Administration (AO) | Contract/casual | 26 30 | 3-4 | True | TAFE |
651 rows × 8 columns
pattern = r'(\d+)' # The pattern that we are looking for when extracting the years of service from the institute_service column
combined_updated['institute_service'] = combined_updated['institute_service'].str.extract(pattern) # Extracts the years of service following the pattern that stored in the pattern variable
/var/folders/0s/88sw5bwj56v6qymnqpqx8sjm0000gn/T/ipykernel_21902/2246548646.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy combined_updated['institute_service'] = combined_updated['institute_service'].str.extract(pattern) # Extracts the years of service following the pattern that stored in the pattern variable
combined_updated['institute_service'] = combined_updated['institute_service'].astype('float') # Converted it to a float
/var/folders/0s/88sw5bwj56v6qymnqpqx8sjm0000gn/T/ipykernel_21902/650572983.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy combined_updated['institute_service'] = combined_updated['institute_service'].astype('float') # Converted it to a float
def update_vals(value): # Created a function called update_vals() that maps each year value to one of the career stages
# listed in the instructions
if pd.isnull(value): # If the value is NAN then it returns a value of NaN
return np.nan
if value < 3: # If they were there for less than three years it will be returned as new
return 'New'
if value >= 3 and value <= 6: # If between 3 - 6 years, it will be returned as experienced
return 'Experienced'
if value >= 7 and value <= 10: # If between 7 and 10 years it will be returned as Established
return 'Established'
else:
return 'Veteran'
combined_updated['institute_service'].value_counts()
1.0 159 3.0 83 5.0 56 7.0 34 11.0 30 0.0 20 20.0 17 6.0 17 4.0 16 9.0 14 2.0 14 13.0 8 8.0 8 15.0 7 17.0 6 10.0 6 22.0 6 14.0 6 12.0 6 18.0 5 16.0 5 24.0 4 23.0 4 21.0 3 19.0 3 39.0 3 32.0 3 25.0 2 28.0 2 26.0 2 36.0 2 30.0 2 34.0 1 27.0 1 29.0 1 42.0 1 33.0 1 41.0 1 35.0 1 49.0 1 38.0 1 31.0 1 Name: institute_service, dtype: int64
combined_updated['service_cat'] = combined_updated['institute_service'].apply(update_vals) # Creates a new column called service_cat which is essentialy the career stage that each of the employees
# are in whether it be new, experienced, established, or veteran
/var/folders/0s/88sw5bwj56v6qymnqpqx8sjm0000gn/T/ipykernel_21902/3687251441.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy combined_updated['service_cat'] = combined_updated['institute_service'].apply(update_vals) # Creates a new column called service_cat which is essentialy the career stage that each of the employees
combined_updated
separationtype | cease_date | position | employment_status | age | institute_service | dissatisfied | institute | service_cat | |
---|---|---|---|---|---|---|---|---|---|
4.000000e+00 | Resignation-Other reasons | 2012.0 | Teacher | Permanent Full-time | 36-40 | 7.0 | False | DETE | Established |
6.000000e+00 | Resignation-Other reasons | 2012.0 | Guidance Officer | Permanent Full-time | 41-45 | 18.0 | True | DETE | Veteran |
9.000000e+00 | Resignation-Other reasons | 2012.0 | Teacher | Permanent Full-time | 31-35 | 3.0 | False | DETE | Experienced |
1.000000e+01 | Resignation-Other employer | 2012.0 | Teacher Aide | Permanent Part-time | 46-50 | 15.0 | True | DETE | Veteran |
1.200000e+01 | Resignation-Move overseas/interstate | 2012.0 | Teacher | Permanent Full-time | 31-35 | 3.0 | False | DETE | Experienced |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350660e+17 | Resignation | 2013.0 | Operational (OO) | Temporary Full-time | 21 25 | 5.0 | True | TAFE | Experienced |
6.350668e+17 | Resignation | 2013.0 | Teacher (including LVT) | Temporary Full-time | 51-55 | 1.0 | True | TAFE | New |
6.350677e+17 | Resignation | 2013.0 | NaN | NaN | NaN | NaN | True | TAFE | NaN |
6.350704e+17 | Resignation | 2013.0 | Teacher (including LVT) | Permanent Full-time | 51-55 | 5.0 | True | TAFE | Experienced |
6.350730e+17 | Resignation | 2013.0 | Administration (AO) | Contract/casual | 26 30 | 3.0 | True | TAFE | Experienced |
651 rows × 9 columns
Series.astype()
method to change the type for the institute_service
column to str
. After that, we used vectorized string methods to extract the years from each pattern for the institute_service
column. For the pattern
variable, we have a value, r(\d+)
and we used that for the str.extract()
method. After that we used the Series.astype()
method to change the column to a float
type. Then we mapped, each value to one of the career stages listed in the instructions which would either be New
, Experienced
, Established
, or Veteran
. We created a function called update_vals()
with a parameter called value
. If the value for the value
parameter is null, it will return NaN. If it is less than 3
, it would return the value, New
. If it is between 3
and 6
years, it would return the value, Experienced
. If it is between 7
and 10
years, then it will return the value, Established
. Lastly, if the value for the parameter is 11
or more years, then it would return a value of, Veteran
. We then used the Series.apply()
method to apply the function to the institute_service
column. We then stored it into a new column called service_cat
combined_updated['service_cat'].value_counts() # Counts up the number of times each of the values in the service_cat appears
New 193 Experienced 172 Veteran 136 Established 62 Name: service_cat, dtype: int64
combined_updated['dissatisfied']
4.000000e+00 False 6.000000e+00 True 9.000000e+00 False 1.000000e+01 True 1.200000e+01 False ... 6.350660e+17 True 6.350668e+17 True 6.350677e+17 True 6.350704e+17 True 6.350730e+17 True Name: dissatisfied, Length: 651, dtype: bool
combined_updated['dissatisfied'].value_counts(dropna = False) # Prints the number of times each value
# for the dissatisfied column shows up as well as the missing values
True 489 False 162 Name: dissatisfied, dtype: int64
combined_updated
separationtype | cease_date | position | employment_status | age | institute_service | dissatisfied | institute | service_cat | |
---|---|---|---|---|---|---|---|---|---|
4.000000e+00 | Resignation-Other reasons | 2012.0 | Teacher | Permanent Full-time | 36-40 | 7.0 | False | DETE | Established |
6.000000e+00 | Resignation-Other reasons | 2012.0 | Guidance Officer | Permanent Full-time | 41-45 | 18.0 | True | DETE | Veteran |
9.000000e+00 | Resignation-Other reasons | 2012.0 | Teacher | Permanent Full-time | 31-35 | 3.0 | False | DETE | Experienced |
1.000000e+01 | Resignation-Other employer | 2012.0 | Teacher Aide | Permanent Part-time | 46-50 | 15.0 | True | DETE | Veteran |
1.200000e+01 | Resignation-Move overseas/interstate | 2012.0 | Teacher | Permanent Full-time | 31-35 | 3.0 | False | DETE | Experienced |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350660e+17 | Resignation | 2013.0 | Operational (OO) | Temporary Full-time | 21 25 | 5.0 | True | TAFE | Experienced |
6.350668e+17 | Resignation | 2013.0 | Teacher (including LVT) | Temporary Full-time | 51-55 | 1.0 | True | TAFE | New |
6.350677e+17 | Resignation | 2013.0 | NaN | NaN | NaN | NaN | True | TAFE | NaN |
6.350704e+17 | Resignation | 2013.0 | Teacher (including LVT) | Permanent Full-time | 51-55 | 5.0 | True | TAFE | Experienced |
6.350730e+17 | Resignation | 2013.0 | Administration (AO) | Contract/casual | 26 30 | 3.0 | True | TAFE | Experienced |
651 rows × 9 columns
combined_updated['age'].isna().sum() # Counts up the number of null values that are in the age column
55
combined_updated['age'].value_counts(dropna = False) # Counts up the number of times each value in the age column shows up
# Also there is a dropna parameter that is set equaled to False
# To confirm the number of times NaN shows up which is a missing value
51-55 71 NaN 55 41-45 48 41 45 45 46-50 42 36-40 41 46 50 39 26-30 35 21 25 33 36 40 32 31 35 32 26 30 32 21-25 29 31-35 29 56 or older 29 56-60 26 61 or older 23 20 or younger 10 Name: age, dtype: int64
combined_updated['age'] = combined_updated['age'].fillna(method = "ffill") # Fills in the missing values in the age column
/var/folders/0s/88sw5bwj56v6qymnqpqx8sjm0000gn/T/ipykernel_21902/2764011580.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy combined_updated['age'] = combined_updated['age'].fillna(method = "ffill") # Fills in the missing values in the age column
combined_updated['age'].isna().sum() # Confirming that there are no longer any more missing values in the age column
0
combined_updated['dissatisfied'].value_counts(dropna = False)
True 489 False 162 Name: dissatisfied, dtype: int64
combined_updated['dissatisfied'] = combined_updated['dissatisfied'].replace("-", np.nan) # Replaces the "-" value
# in the dissatisfied column with, "NaN"
/var/folders/0s/88sw5bwj56v6qymnqpqx8sjm0000gn/T/ipykernel_21902/1487453079.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy combined_updated['dissatisfied'] = combined_updated['dissatisfied'].replace("-", np.nan) # Replaces the "-" value
combined_updated['dissatisfied'].isna().sum() # Counts up the number of missing values that are in the
# dissatisfied column
0
combined_updated['dissatisfied'] = combined_updated['dissatisfied'].fillna(False) # Fills in the missing values with False
/var/folders/0s/88sw5bwj56v6qymnqpqx8sjm0000gn/T/ipykernel_21902/3080839073.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy combined_updated['dissatisfied'] = combined_updated['dissatisfied'].fillna(False) # Fills in the missing values with False
combined_updated
separationtype | cease_date | position | employment_status | age | institute_service | dissatisfied | institute | service_cat | |
---|---|---|---|---|---|---|---|---|---|
4.000000e+00 | Resignation-Other reasons | 2012.0 | Teacher | Permanent Full-time | 36-40 | 7.0 | False | DETE | Established |
6.000000e+00 | Resignation-Other reasons | 2012.0 | Guidance Officer | Permanent Full-time | 41-45 | 18.0 | True | DETE | Veteran |
9.000000e+00 | Resignation-Other reasons | 2012.0 | Teacher | Permanent Full-time | 31-35 | 3.0 | False | DETE | Experienced |
1.000000e+01 | Resignation-Other employer | 2012.0 | Teacher Aide | Permanent Part-time | 46-50 | 15.0 | True | DETE | Veteran |
1.200000e+01 | Resignation-Move overseas/interstate | 2012.0 | Teacher | Permanent Full-time | 31-35 | 3.0 | False | DETE | Experienced |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350660e+17 | Resignation | 2013.0 | Operational (OO) | Temporary Full-time | 21 25 | 5.0 | True | TAFE | Experienced |
6.350668e+17 | Resignation | 2013.0 | Teacher (including LVT) | Temporary Full-time | 51-55 | 1.0 | True | TAFE | New |
6.350677e+17 | Resignation | 2013.0 | NaN | NaN | 51-55 | NaN | True | TAFE | NaN |
6.350704e+17 | Resignation | 2013.0 | Teacher (including LVT) | Permanent Full-time | 51-55 | 5.0 | True | TAFE | Experienced |
6.350730e+17 | Resignation | 2013.0 | Administration (AO) | Contract/casual | 26 30 | 3.0 | True | TAFE | Experienced |
651 rows × 9 columns
combined_updated['dissatisfied'].value_counts()
True 489 False 162 Name: dissatisfied, dtype: int64
combined_updated['dissatisfied'].isna().sum() # Confirming that there are no longer anymore missing values in
# the dissatisfied column
0
# Creating a dataframe
df = pd.DataFrame({'A': ['John', 'Boby', 'Mina', 'Peter', 'Nicky'],
'B': ['Masters', 'Graduate', 'Graduate', 'Masters', 'Graduate'],
'C': [27, 23, 21, 23, 24]})
df
A | B | C | |
---|---|---|---|
0 | John | Masters | 27 |
1 | Boby | Graduate | 23 |
2 | Mina | Graduate | 21 |
3 | Peter | Masters | 23 |
4 | Nicky | Graduate | 24 |
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 5 entries, 0 to 4 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 A 5 non-null object 1 B 5 non-null object 2 C 5 non-null int64 dtypes: int64(1), object(2) memory usage: 248.0+ bytes
# Simplest pivot table must have a dataframe and an index/list of index
table = pd.pivot_table(df, index = ['A', 'B'])
table
C | ||
---|---|---|
A | B | |
Boby | Graduate | 23 |
John | Masters | 27 |
Mina | Graduate | 21 |
Nicky | Graduate | 24 |
Peter | Masters | 23 |
table.info()
<class 'pandas.core.frame.DataFrame'> MultiIndex: 5 entries, ('Boby', 'Graduate') to ('Peter', 'Masters') Data columns (total 1 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 C 5 non-null int64 dtypes: int64(1) memory usage: 206.0+ bytes
# Creates a pivot table dataframe
table = pd.pivot_table(df, values = 'A', index = ['B', 'C'], columns = ['B'], aggfunc = np.sum)
table
B | Graduate | Masters | |
---|---|---|---|
B | C | ||
Graduate | 21 | Mina | NaN |
23 | Boby | NaN | |
24 | Nicky | NaN | |
Masters | 23 | NaN | Peter |
27 | NaN | John |
table.info()
<class 'pandas.core.frame.DataFrame'> MultiIndex: 5 entries, ('Graduate', 21) to ('Masters', 27) Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Graduate 3 non-null object 1 Masters 2 non-null object dtypes: object(2) memory usage: 238.0+ bytes
combined_updated
separationtype | cease_date | position | employment_status | age | institute_service | dissatisfied | institute | service_cat | |
---|---|---|---|---|---|---|---|---|---|
4.000000e+00 | Resignation-Other reasons | 2012.0 | Teacher | Permanent Full-time | 36-40 | 7.0 | False | DETE | Established |
6.000000e+00 | Resignation-Other reasons | 2012.0 | Guidance Officer | Permanent Full-time | 41-45 | 18.0 | True | DETE | Veteran |
9.000000e+00 | Resignation-Other reasons | 2012.0 | Teacher | Permanent Full-time | 31-35 | 3.0 | False | DETE | Experienced |
1.000000e+01 | Resignation-Other employer | 2012.0 | Teacher Aide | Permanent Part-time | 46-50 | 15.0 | True | DETE | Veteran |
1.200000e+01 | Resignation-Move overseas/interstate | 2012.0 | Teacher | Permanent Full-time | 31-35 | 3.0 | False | DETE | Experienced |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6.350660e+17 | Resignation | 2013.0 | Operational (OO) | Temporary Full-time | 21 25 | 5.0 | True | TAFE | Experienced |
6.350668e+17 | Resignation | 2013.0 | Teacher (including LVT) | Temporary Full-time | 51-55 | 1.0 | True | TAFE | New |
6.350677e+17 | Resignation | 2013.0 | NaN | NaN | 51-55 | NaN | True | TAFE | NaN |
6.350704e+17 | Resignation | 2013.0 | Teacher (including LVT) | Permanent Full-time | 51-55 | 5.0 | True | TAFE | Experienced |
6.350730e+17 | Resignation | 2013.0 | Administration (AO) | Contract/casual | 26 30 | 3.0 | True | TAFE | Experienced |
651 rows × 9 columns
combined_updated = pd.pivot_table(combined_updated, values = 'dissatisfied', index = ['service_cat'])
combined_updated
dissatisfied | |
---|---|
service_cat | |
Established | 0.741935 |
Experienced | 0.761628 |
New | 0.818653 |
Veteran | 0.676471 |