import pandas as pd
pd.options.display.max_columns=None
datasets = pd.read_csv('./inputs/HR-Employee-Attrition.csv')
datasets.head()
Age | Attrition | BusinessTravel | DailyRate | Department | DistanceFromHome | Education | EducationField | EmployeeCount | EmployeeNumber | EnvironmentSatisfaction | Gender | HourlyRate | JobInvolvement | JobLevel | JobRole | JobSatisfaction | MaritalStatus | MonthlyIncome | MonthlyRate | NumCompaniesWorked | Over18 | OverTime | PercentSalaryHike | PerformanceRating | RelationshipSatisfaction | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 41 | Yes | Travel_Rarely | 1102 | Sales | 1 | 2 | Life Sciences | 1 | 1 | 2 | Female | 94 | 3 | 2 | Sales Executive | 4 | Single | 5993 | 19479 | 8 | Y | Yes | 11 | 3 | 1 | 80 | 0 | 8 | 0 | 1 | 6 | 4 | 0 | 5 |
1 | 49 | No | Travel_Frequently | 279 | Research & Development | 8 | 1 | Life Sciences | 1 | 2 | 3 | Male | 61 | 2 | 2 | Research Scientist | 2 | Married | 5130 | 24907 | 1 | Y | No | 23 | 4 | 4 | 80 | 1 | 10 | 3 | 3 | 10 | 7 | 1 | 7 |
2 | 37 | Yes | Travel_Rarely | 1373 | Research & Development | 2 | 2 | Other | 1 | 4 | 4 | Male | 92 | 2 | 1 | Laboratory Technician | 3 | Single | 2090 | 2396 | 6 | Y | Yes | 15 | 3 | 2 | 80 | 0 | 7 | 3 | 3 | 0 | 0 | 0 | 0 |
3 | 33 | No | Travel_Frequently | 1392 | Research & Development | 3 | 4 | Life Sciences | 1 | 5 | 4 | Female | 56 | 3 | 1 | Research Scientist | 3 | Married | 2909 | 23159 | 1 | Y | Yes | 11 | 3 | 3 | 80 | 0 | 8 | 3 | 3 | 8 | 7 | 3 | 0 |
4 | 27 | No | Travel_Rarely | 591 | Research & Development | 2 | 1 | Medical | 1 | 7 | 1 | Male | 40 | 3 | 1 | Laboratory Technician | 2 | Married | 3468 | 16632 | 9 | Y | No | 12 | 3 | 4 | 80 | 1 | 6 | 3 | 3 | 2 | 2 | 2 | 2 |
datasets.shape
(1470, 35)
A look at the data shows a mix of categorical and numerical features.
datasets.dtypes
Age                         int64
Attrition                   object
BusinessTravel              object
DailyRate                   int64
Department                  object
DistanceFromHome            int64
Education                   int64
EducationField              object
EmployeeCount               int64
EmployeeNumber              int64
EnvironmentSatisfaction     int64
Gender                      object
HourlyRate                  int64
JobInvolvement              int64
JobLevel                    int64
JobRole                     object
JobSatisfaction             int64
MaritalStatus               object
MonthlyIncome               int64
MonthlyRate                 int64
NumCompaniesWorked          int64
Over18                      object
OverTime                    object
PercentSalaryHike           int64
PerformanceRating           int64
RelationshipSatisfaction    int64
StandardHours               int64
StockOptionLevel            int64
TotalWorkingYears           int64
TrainingTimesLastYear       int64
WorkLifeBalance             int64
YearsAtCompany              int64
YearsInCurrentRole          int64
YearsSinceLastPromotion     int64
YearsWithCurrManager        int64
dtype: object
Target variable: Attrition
Convert Yes/No to 1/0.
datasets['Attrition_idx'] = datasets['Attrition']\
.apply(lambda x: 1 if x == 'Yes' else 0)
datasets.head()
Age | Attrition | BusinessTravel | DailyRate | Department | DistanceFromHome | Education | EducationField | EmployeeCount | EmployeeNumber | EnvironmentSatisfaction | Gender | HourlyRate | JobInvolvement | JobLevel | JobRole | JobSatisfaction | MaritalStatus | MonthlyIncome | MonthlyRate | NumCompaniesWorked | Over18 | OverTime | PercentSalaryHike | PerformanceRating | RelationshipSatisfaction | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager | Attrition_idx | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 41 | Yes | Travel_Rarely | 1102 | Sales | 1 | 2 | Life Sciences | 1 | 1 | 2 | Female | 94 | 3 | 2 | Sales Executive | 4 | Single | 5993 | 19479 | 8 | Y | Yes | 11 | 3 | 1 | 80 | 0 | 8 | 0 | 1 | 6 | 4 | 0 | 5 | 1 |
1 | 49 | No | Travel_Frequently | 279 | Research & Development | 8 | 1 | Life Sciences | 1 | 2 | 3 | Male | 61 | 2 | 2 | Research Scientist | 2 | Married | 5130 | 24907 | 1 | Y | No | 23 | 4 | 4 | 80 | 1 | 10 | 3 | 3 | 10 | 7 | 1 | 7 | 0 |
2 | 37 | Yes | Travel_Rarely | 1373 | Research & Development | 2 | 2 | Other | 1 | 4 | 4 | Male | 92 | 2 | 1 | Laboratory Technician | 3 | Single | 2090 | 2396 | 6 | Y | Yes | 15 | 3 | 2 | 80 | 0 | 7 | 3 | 3 | 0 | 0 | 0 | 0 | 1 |
3 | 33 | No | Travel_Frequently | 1392 | Research & Development | 3 | 4 | Life Sciences | 1 | 5 | 4 | Female | 56 | 3 | 1 | Research Scientist | 3 | Married | 2909 | 23159 | 1 | Y | Yes | 11 | 3 | 3 | 80 | 0 | 8 | 3 | 3 | 8 | 7 | 3 | 0 | 0 |
4 | 27 | No | Travel_Rarely | 591 | Research & Development | 2 | 1 | Medical | 1 | 7 | 1 | Male | 40 | 3 | 1 | Laboratory Technician | 2 | Married | 3468 | 16632 | 9 | Y | No | 12 | 3 | 4 | 80 | 1 | 6 | 3 | 3 | 2 | 2 | 2 | 2 | 0 |
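As an aside, an equivalent mapping with Series.map fails more loudly than the lambda above, which silently turns any label other than 'Yes' into 0; a minimal alternative sketch:

# Map the string labels directly; labels other than Yes/No become NaN instead of 0.
datasets['Attrition_idx'] = datasets['Attrition'].map({'Yes': 1, 'No': 0})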
col_names = datasets.columns
col_names
Index(['Age', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department', 'DistanceFromHome', 'Education', 'EducationField', 'EmployeeCount', 'EmployeeNumber', 'EnvironmentSatisfaction', 'Gender', 'HourlyRate', 'JobInvolvement', 'JobLevel', 'JobRole', 'JobSatisfaction', 'MaritalStatus', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked', 'Over18', 'OverTime', 'PercentSalaryHike', 'PerformanceRating', 'RelationshipSatisfaction', 'StandardHours', 'StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance', 'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion', 'YearsWithCurrManager', 'Attrition_idx'], dtype='object')
Some variables carry no useful information: EmployeeCount, EmployeeNumber, Over18, StandardHours.
print(datasets.Over18.value_counts())
print(datasets.EmployeeCount.value_counts())
print(datasets.StandardHours.value_counts())
Y    1470
Name: Over18, dtype: int64
1    1470
Name: EmployeeCount, dtype: int64
80    1470
Name: StandardHours, dtype: int64
# Exclude the target and the uninformative columns from the feature list.
col_names = col_names\
.drop(['Attrition_idx', 'Attrition', 'Over18',
'EmployeeCount', 'EmployeeNumber', 'StandardHours'])
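The constant columns could also be detected programmatically instead of read off by eye; a small sketch (the variable name is illustrative, and the expected output follows from the value counts above):

# A column with a single unique value carries no signal for any model.
constant_cols = [c for c in datasets.columns if datasets[c].nunique() == 1]
print(constant_cols)  # ['EmployeeCount', 'Over18', 'StandardHours']
# EmployeeNumber appears to be a per-row identifier rather than a constant,
# so it is dropped separately.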
Now let's deal with the columns themselves. First, separate the categorical columns from the numerical ones.
categorical_features = []
numerical_features = []
target = 'Attrition_idx'
# Split the features into two groups by dtype.
for col in col_names:
if datasets[col].dtype == 'O':
categorical_features.append(col)
else:
numerical_features.append(col)
print('Number of categorical features :', len(categorical_features))
print('Number of numerical features :', len(numerical_features))
Number of categorical features : 7
Number of numerical features : 23
categorical_features
['BusinessTravel', 'Department', 'EducationField', 'Gender', 'JobRole', 'MaritalStatus', 'OverTime']
numerical_features
['Age', 'DailyRate', 'DistanceFromHome', 'Education', 'EnvironmentSatisfaction', 'HourlyRate', 'JobInvolvement', 'JobLevel', 'JobSatisfaction', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked', 'PercentSalaryHike', 'PerformanceRating', 'RelationshipSatisfaction', 'StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance', 'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion', 'YearsWithCurrManager']
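The same split can be written more compactly with select_dtypes; a sketch equivalent to the loop above:

categorical_features = datasets[col_names].select_dtypes(include=['object']).columns.tolist()
numerical_features = datasets[col_names].select_dtypes(exclude=['object']).columns.tolist()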
Convert the categorical data to one-hot vectors, using get_dummies from pandas.
categorical_datasets = pd.get_dummies(datasets[categorical_features])
categorical_datasets.head()
BusinessTravel_Non-Travel | BusinessTravel_Travel_Frequently | BusinessTravel_Travel_Rarely | Department_Human Resources | Department_Research & Development | Department_Sales | EducationField_Human Resources | EducationField_Life Sciences | EducationField_Marketing | EducationField_Medical | EducationField_Other | EducationField_Technical Degree | Gender_Female | Gender_Male | JobRole_Healthcare Representative | JobRole_Human Resources | JobRole_Laboratory Technician | JobRole_Manager | JobRole_Manufacturing Director | JobRole_Research Director | JobRole_Research Scientist | JobRole_Sales Executive | JobRole_Sales Representative | MaritalStatus_Divorced | MaritalStatus_Married | MaritalStatus_Single | OverTime_No | OverTime_Yes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
2 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
3 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
4 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
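One optional variation: passing drop_first=True to get_dummies removes one redundant column per feature (e.g. OverTime_No is always 1 - OverTime_Yes). A decision tree is insensitive to this redundancy, but it matters for linear models; a sketch, not used below:

# Drop one dummy per categorical feature to avoid perfectly collinear columns.
categorical_datasets = pd.get_dummies(datasets[categorical_features], drop_first=True)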
numerical_datasets = datasets[numerical_features]
numerical_datasets.head()
Age | DailyRate | DistanceFromHome | Education | EnvironmentSatisfaction | HourlyRate | JobInvolvement | JobLevel | JobSatisfaction | MonthlyIncome | MonthlyRate | NumCompaniesWorked | PercentSalaryHike | PerformanceRating | RelationshipSatisfaction | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 41 | 1102 | 1 | 2 | 2 | 94 | 3 | 2 | 4 | 5993 | 19479 | 8 | 11 | 3 | 1 | 0 | 8 | 0 | 1 | 6 | 4 | 0 | 5 |
1 | 49 | 279 | 8 | 1 | 3 | 61 | 2 | 2 | 2 | 5130 | 24907 | 1 | 23 | 4 | 4 | 1 | 10 | 3 | 3 | 10 | 7 | 1 | 7 |
2 | 37 | 1373 | 2 | 2 | 4 | 92 | 2 | 1 | 3 | 2090 | 2396 | 6 | 15 | 3 | 2 | 0 | 7 | 3 | 3 | 0 | 0 | 0 | 0 |
3 | 33 | 1392 | 3 | 4 | 4 | 56 | 3 | 1 | 3 | 2909 | 23159 | 1 | 11 | 3 | 3 | 0 | 8 | 3 | 3 | 8 | 7 | 3 | 0 |
4 | 27 | 591 | 2 | 1 | 1 | 40 | 3 | 1 | 2 | 3468 | 16632 | 9 | 12 | 3 | 4 | 1 | 6 | 3 | 3 | 2 | 2 | 2 | 2 |
Concatenate the categorical and numerical datasets; the combined table is the feature matrix fed to the model.
X = pd.concat([categorical_datasets, numerical_datasets], axis=1)
X.head()
BusinessTravel_Non-Travel | BusinessTravel_Travel_Frequently | BusinessTravel_Travel_Rarely | Department_Human Resources | Department_Research & Development | Department_Sales | EducationField_Human Resources | EducationField_Life Sciences | EducationField_Marketing | EducationField_Medical | EducationField_Other | EducationField_Technical Degree | Gender_Female | Gender_Male | JobRole_Healthcare Representative | JobRole_Human Resources | JobRole_Laboratory Technician | JobRole_Manager | JobRole_Manufacturing Director | JobRole_Research Director | JobRole_Research Scientist | JobRole_Sales Executive | JobRole_Sales Representative | MaritalStatus_Divorced | MaritalStatus_Married | MaritalStatus_Single | OverTime_No | OverTime_Yes | Age | DailyRate | DistanceFromHome | Education | EnvironmentSatisfaction | HourlyRate | JobInvolvement | JobLevel | JobSatisfaction | MonthlyIncome | MonthlyRate | NumCompaniesWorked | PercentSalaryHike | PerformanceRating | RelationshipSatisfaction | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 41 | 1102 | 1 | 2 | 2 | 94 | 3 | 2 | 4 | 5993 | 19479 | 8 | 11 | 3 | 1 | 0 | 8 | 0 | 1 | 6 | 4 | 0 | 5 |
1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 49 | 279 | 8 | 1 | 3 | 61 | 2 | 2 | 2 | 5130 | 24907 | 1 | 23 | 4 | 4 | 1 | 10 | 3 | 3 | 10 | 7 | 1 | 7 |
2 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 37 | 1373 | 2 | 2 | 4 | 92 | 2 | 1 | 3 | 2090 | 2396 | 6 | 15 | 3 | 2 | 0 | 7 | 3 | 3 | 0 | 0 | 0 | 0 |
3 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 33 | 1392 | 3 | 4 | 4 | 56 | 3 | 1 | 3 | 2909 | 23159 | 1 | 11 | 3 | 3 | 0 | 8 | 3 | 3 | 8 | 7 | 3 | 0 |
4 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 27 | 591 | 2 | 1 | 1 | 40 | 3 | 1 | 2 | 3468 | 16632 | 9 | 12 | 3 | 4 | 1 | 6 | 3 | 3 | 2 | 2 | 2 | 2 |
y = datasets[target]
y.head()
0    1
1    0
2    1
3    0
4    0
Name: Attrition_idx, dtype: int64
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = \
train_test_split(X, y, test_size=0.2, random_state=42)
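Since the target turns out to be quite imbalanced (explored below), a stratified split that preserves the 0/1 ratio in both partitions is often safer; a sketch, not used for the results that follow:

# stratify=y keeps the class proportions identical in train and test.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)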
Build a classifier using a decision tree, and find the following hyperparameters via grid search.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
params = {
'max_depth': [5,7,9],
'min_samples_split': [2],
'min_samples_leaf': [1, 2, 3, 4]
}
grid_search_cv = \
GridSearchCV(
DecisionTreeClassifier(random_state=42),
params,
n_jobs=-1,
verbose=1,
cv=3)
grid_search_cv.fit(x_train, y_train)
Fitting 3 folds for each of 12 candidates, totalling 36 fits
[Parallel(n_jobs=-1)]: Done 36 out of 36 | elapsed: 0.2s finished
GridSearchCV(cv=3, error_score='raise',
       estimator=DecisionTreeClassifier(class_weight=None, criterion='gini',
            max_depth=None, max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
            splitter='best'),
       fit_params=None, iid=True, n_jobs=-1,
       param_grid={'max_depth': [5, 7, 9], 'min_samples_split': [2],
                   'min_samples_leaf': [1, 2, 3, 4]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=1)
# From the search results, take the tree model with the best-scoring parameters.
tree_classifier = grid_search_cv.best_estimator_
tree_classifier
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=5,
            max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0,
            min_impurity_split=None, min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
            splitter='best')
# Check the best cross-validation score.
grid_search_cv.best_score_
0.8273809523809523
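The winning parameter combination can also be read off directly (values as in the estimator repr above):

grid_search_cv.best_params_
# e.g. {'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 2}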
pred_train = tree_classifier.predict(x_train)
pred_test = tree_classifier.predict(x_test)
from sklearn.metrics import accuracy_score, classification_report
Let's look at the results on the training set.
# 1. Confusion Matrix
print('\n Train Confusion Matrix :')
display(pd.crosstab(y_train, pred_train, rownames=['Actual'], colnames=['Predict']))
# 2. Accuracy
print('\n Train accuracy :', accuracy_score(y_train, pred_train))
# 3. Classification Report
print('\n Classification Report : \n', classification_report(y_train, pred_train))
Train Confusion Matrix :
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 951 | 27 |
1 | 98 | 100 |
Train accuracy : 0.8937074829931972

Classification Report :
              precision    recall  f1-score   support

          0       0.91      0.97      0.94       978
          1       0.79      0.51      0.62       198

avg / total       0.89      0.89      0.88      1176
Let's look at the results on the test set.
# 1. Confusion Matrix
print('\n Test Confusion Matrix :')
display(pd.crosstab(y_test, pred_test, rownames=['Actual'], colnames=['Predict']))
# 2. Accuracy
print('\n Test accuracy :', accuracy_score(y_test, pred_test))
# 3. Classification Report
print('\n Classification Report : \n', classification_report(y_test, pred_test))
Test Confusion Matrix :
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 235 | 20 |
1 | 33 | 6 |
Test accuracy : 0.8197278911564626

Classification Report :
              precision    recall  f1-score   support

          0       0.88      0.92      0.90       255
          1       0.23      0.15      0.18        39

avg / total       0.79      0.82      0.80       294
The test accuracy of about 82% is not as meaningful as it sounds.
datasets.Attrition_idx.value_counts()
0    1233
1     237
Name: Attrition_idx, dtype: int64
1233/1470
0.8387755102040816
The counts show that class 0 outnumbers class 1 by roughly 5:1. A classifier that simply labels every sample 0 would therefore already score about 83.9% accuracy. In other words, the model is doing a poor job of actually identifying class 1 (employees who leave).
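This majority-class baseline can be checked directly with a dummy classifier; a minimal sketch (the variable name is illustrative; on this particular test split the majority class covers 255 of 294 samples, about 86.7%):

from sklearn.dummy import DummyClassifier

# Always predict the most frequent training class (0 = stays).
baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(x_train, y_train)
print('Baseline test accuracy :', baseline.score(x_test, y_test))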
Suppose this model were used to prevent attrition by offering larger bonuses to employees who look likely to quit. That could go seriously wrong: among the employees the model predicts will stay, the share who actually quit is quite high.
Let's tune the model a little by adjusting the class weights. Raising the weight of class 1 (leavers), for example, makes the model better at catching employees who actually tend to quit, at the cost of flagging some employees who would never quit as potential leavers. That is, we get better at preventing attrition. (Compare lending: when issuing loans, it is better to reject a few marginally creditworthy applicants than to approve a low-credit one; the extra false positives are an affordable error.)
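scikit-learn can also derive the weights automatically from the class frequencies; a minimal sketch using class_weight='balanced' as an alternative to the manual sweep below (the variable name is illustrative):

# 'balanced' weights each class by n_samples / (n_classes * class_count),
# so the rarer class 1 (leavers) is upweighted by roughly 5x here.
balanced_tree = DecisionTreeClassifier(max_depth=5, random_state=42,
                                       class_weight='balanced')
balanced_tree.fit(x_train, y_train)
print(classification_report(y_test, balanced_tree.predict(x_test)))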
Test the model while varying the class weights.
import numpy as np
# One row per class-weight setting tried below (11 settings x 10 metrics).
tuning_results = pd.DataFrame(np.empty((11, 10)))
tuning_results.columns = ['class_0_weight', 'class_1_weight',
                          'train_accuracy', 'test_accuracy',
                          'precision_class_0', 'precision_class_1', 'precision_overall',
                          'recall_class_0', 'recall_class_1', 'recall_overall']
# Used below to locate each metric's position in the split report string.
print(classification_report(y_test, pred_test).split())
['precision', 'recall', 'f1-score', 'support', '0', '0.88', '0.92', '0.90', '255', '1', '0.23', '0.15', '0.18', '39', 'avg', '/', 'total', '0.79', '0.82', '0.80', '294']
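Note that splitting the formatted string is fragile (the report layout changes between scikit-learn versions). In scikit-learn 0.20 and later the report is also available as a nested dict; a sketch:

report = classification_report(y_test, pred_test, output_dict=True)
print(report['1']['precision'], report['1']['recall'])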
class_0_weight = [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99]
for i in range(len(class_0_weight)):
class_weights = {0: class_0_weight[i], 1: 1 - class_0_weight[i]}
tree_classifier = DecisionTreeClassifier(criterion='gini',
max_depth=5,
min_samples_split=2,
min_samples_leaf=1,
random_state=42,
class_weight=class_weights)
tree_classifier.fit(x_train, y_train)
pred_train = tree_classifier.predict(x_train)
pred_test = tree_classifier.predict(x_test)
tuning_results.loc[i, 'class_0_weight'] = class_weights[0]
tuning_results.loc[i, 'class_1_weight'] = class_weights[1]
tuning_results.loc[i, 'train_accuracy'] = round(accuracy_score(y_train, pred_train), 4)
tuning_results.loc[i, 'test_accuracy'] = round(accuracy_score(y_test, pred_test), 4)
c_r = classification_report(y_test, pred_test).split()
tuning_results.loc[i, 'precision_class_0'] = float(c_r[5])
tuning_results.loc[i, 'precision_class_1'] = float(c_r[10])
tuning_results.loc[i, 'precision_overall'] = float(c_r[17])
    tuning_results.loc[i, 'recall_class_0'] = float(c_r[6])
tuning_results.loc[i, 'recall_class_1'] = float(c_r[11])
tuning_results.loc[i, 'recall_overall'] = float(c_r[18])
print(class_weights)
print('Test accuracy :', accuracy_score(y_test, pred_test))
display(pd.crosstab(y_test, pred_test, rownames=['Actual'], colnames=['Predict']))
{0: 0.01, 1: 0.99}
Test accuracy : 0.2925170068027211
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 50 | 205 |
1 | 3 | 36 |
{0: 0.1, 1: 0.9}
Test accuracy : 0.6972789115646258
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 183 | 72 |
1 | 17 | 22 |
{0: 0.2, 1: 0.8}
Test accuracy : 0.7891156462585034
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 216 | 39 |
1 | 23 | 16 |
{0: 0.3, 1: 0.7}
Test accuracy : 0.8095238095238095
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 226 | 29 |
1 | 27 | 12 |
{0: 0.4, 1: 0.6}
Test accuracy : 0.7993197278911565
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 225 | 30 |
1 | 29 | 10 |
{0: 0.5, 1: 0.5}
Test accuracy : 0.8197278911564626
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 235 | 20 |
1 | 33 | 6 |
{0: 0.6, 1: 0.4}
Test accuracy : 0.8469387755102041
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 247 | 8 |
1 | 37 | 2 |
{0: 0.7, 1: 0.30000000000000004}
Test accuracy : 0.8537414965986394
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 248 | 7 |
1 | 36 | 3 |
{0: 0.8, 1: 0.19999999999999996}
Test accuracy : 0.8571428571428571
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 250 | 5 |
1 | 37 | 2 |
{0: 0.9, 1: 0.09999999999999998}
Test accuracy : 0.8673469387755102
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 253 | 2 |
1 | 37 | 2 |
{0: 0.99, 1: 0.010000000000000009}
Test accuracy : 0.8707482993197279
Predict | 0 | 1 |
---|---|---|
Actual | ||
0 | 255 | 0 |
1 | 38 | 1 |
tuning_results
class_0_weight | class_1_weight | train_accuracy | test_accuracy | precision_class_0 | precision_class_1 | precision_overall | recall_class_0 | recall_class_1 | recall_overall | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0.01 | 0.99 | 0.3580 | 0.2925 | 0.94 | 0.15 | 0.84 | 0.20 | 0.92 | 0.29 |
1 | 0.10 | 0.90 | 0.7976 | 0.6973 | 0.92 | 0.23 | 0.82 | 0.72 | 0.56 | 0.70 |
2 | 0.20 | 0.80 | 0.8759 | 0.7891 | 0.90 | 0.29 | 0.82 | 0.85 | 0.41 | 0.79 |
3 | 0.30 | 0.70 | 0.8912 | 0.8095 | 0.89 | 0.29 | 0.81 | 0.89 | 0.31 | 0.81 |
4 | 0.40 | 0.60 | 0.8903 | 0.7993 | 0.89 | 0.25 | 0.80 | 0.88 | 0.26 | 0.80 |
5 | 0.50 | 0.50 | 0.8937 | 0.8197 | 0.88 | 0.23 | 0.79 | 0.92 | 0.15 | 0.82 |
6 | 0.60 | 0.40 | 0.8954 | 0.8469 | 0.87 | 0.20 | 0.78 | 0.97 | 0.05 | 0.85 |
7 | 0.70 | 0.30 | 0.8963 | 0.8537 | 0.87 | 0.30 | 0.80 | 0.97 | 0.08 | 0.85 |
8 | 0.80 | 0.20 | 0.8869 | 0.8571 | 0.87 | 0.29 | 0.79 | 0.98 | 0.05 | 0.86 |
9 | 0.90 | 0.10 | 0.8622 | 0.8673 | 0.87 | 0.50 | 0.82 | 0.99 | 0.05 | 0.87 |
10 | 0.99 | 0.01 | 0.8435 | 0.8707 | 0.87 | 1.00 | 0.89 | 1.00 | 0.03 | 0.87 |
As the weight on class 0 grows, the model predicts class 0 more and more often. Because so many samples get assigned to class 0, its recall rises (eventually reaching 1.00), but its precision falls, since more and more actual leavers are swept into the class 0 predictions. The opposite happens to class 1.
Looking at the results, a class 0 weight of 0.3 gives a reasonable balance of accuracy, precision, and recall.
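To make that choice concrete, the final model would be refit with those weights; a short sketch assuming the hyperparameters found by the grid search (the variable name is illustrative):

# Refit with the weights chosen from the sweep above.
final_tree = DecisionTreeClassifier(max_depth=5, min_samples_split=2,
                                    min_samples_leaf=1, random_state=42,
                                    class_weight={0: 0.3, 1: 0.7})
final_tree.fit(x_train, y_train)
print(classification_report(y_test, final_tree.predict(x_test)))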