!pip install pandas_profiling
Successfully installed confuse-1.0.0 htmlmin-0.1.12 missingno-0.4.2 pandas-profiling-2.3.0 phik-0.9.8 pytest-pylint-0.14.1
!pip install lightgbm
Successfully installed lightgbm-2.2.3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import ipaddress
import pandas_profiling as pp
%matplotlib inline
from sklearn import preprocessing
plt.rc("font", size=14)
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.filterwarnings("ignore")
import time
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import lightgbm as lgb
from lightgbm import LGBMClassifier
import seaborn as sns
sns.set(style="white")
sns.set(style="whitegrid", color_codes=True)
import types
from botocore.client import Config
import ibm_boto3
def __iter__(self): return 0
# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_554c07c959184a5fb85f7723b7045646 = ibm_boto3.client(
    service_name='s3',
    ibm_api_key_id='<YOUR_IBM_API_KEY>',  # credential redacted; supply your own key
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')
body = client_554c07c959184a5fb85f7723b7045646.get_object(Bucket='fraudpredictionseries-donotdelete-pr-goroseftzd9ob6',Key='fraud_data.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )
df = pd.read_csv(body)
print(df.head())
print(df.shape)
   Gender  Married  Dependents  Education  Self_Employed  ApplicantIncome  \
0       1        0           0          1              0             5849
1       1        1           1          1              1             4583
2       1        1           0          1              1             3000
3       1        1           0          0              1             2583
4       1        0           0          1              0             6000

   CoapplicantIncome  LoanAmount  Loan_Term  Credit_History_Available  \
0                  0         146        360                         1
1               1508         128        360                         1
2                  0          66        360                         1
3               2358         120        360                         1
4                  0         141        360                         1

   Housing  Locality  Fraud_Risk
0        1         1           0
1        1         3           1
2        1         1           1
3        1         1           1
4        1         1           0
(821, 13)
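The `types.MethodType` patch in the loading cell attaches a plain function to a single object as a bound method. A minimal sketch of that mechanism, using a hypothetical `Stream` class in place of the real Cloud Object Storage body:

```python
import types


class Stream:
    """Hypothetical stand-in for an object that lacks __iter__."""

    def __init__(self, lines):
        self.lines = lines


def iterate(self):
    # Plain function; `self` is filled in once it is bound to an instance.
    return iter(self.lines)


body = Stream(["a,b", "1,2"])
if not hasattr(body, "__iter__"):
    # Bind the function to this one instance only; the class is untouched.
    body.__iter__ = types.MethodType(iterate, body)

rows = list(body.__iter__())
```

After the patch, `hasattr(body, "__iter__")` is true, which is the kind of attribute check the original comment refers to. Note that the built-in `iter(body)` looks the method up on the class, so it would not see an instance-level patch like this one.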
pp.ProfileReport(df)
We can observe that there are no missing values and no duplicates. We can do detailed analysis of each attribute to understand the data better.
# Assumption: Fraud_Risk == 1 marks a fraud-risk record and 0 a non-fraud record.
count_fraud = len(df[df['Fraud_Risk']==1])
count_non_fraud = len(df[df['Fraud_Risk']==0])
pct_of_non_fraud = count_non_fraud/(count_non_fraud +count_fraud)
print("percentage of non Fraud Risk is", round(pct_of_non_fraud*100,2))
pct_of_fraud = count_fraud/(count_non_fraud +count_fraud)
print("percentage of Fraud Risk is", round(pct_of_fraud*100,2))
percentage of non Fraud Risk is 41.9
percentage of Fraud Risk is 58.1
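The class shares can also be read off in a single call. A small sketch on a hypothetical toy column (the values are made up for illustration, not taken from `fraud_data.csv`):

```python
import pandas as pd

# Toy target column: two records of class 0, three of class 1.
toy_target = pd.Series([0, 0, 1, 1, 1], name="Fraud_Risk")

# normalize=True returns fractions instead of raw counts.
shares = toy_target.value_counts(normalize=True).sort_index()
percentages = (shares * 100).round(2)
print(percentages)
```

On the real data, the same call would be `df['Fraud_Risk'].value_counts(normalize=True)`.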
Plot the target attribute
sns.countplot(x='Fraud_Risk',data=df, palette='hls')
plt.show()
df.groupby('Fraud_Risk').mean()
Fraud_Risk | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Term | Credit_History_Available | Housing | Locality |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.572674 | 0.000000 | 0.313953 | 0.805233 | 0.174419 | 4785.148256 | 1111.700581 | 125.720930 | 346.453488 | 0.973837 | 0.700581 | 2.011628 |
1 | 0.851153 | 0.834382 | 0.905660 | 0.777778 | 0.865828 | 5530.683438 | 1774.714885 | 152.104822 | 331.849057 | 0.819706 | 0.620545 | 1.958071 |
df.corr(method ='pearson')
| | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Term | Credit_History_Available | Housing | Locality | Fraud_Risk |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Gender | 1.000000 | 0.356961 | 0.121822 | -0.080729 | 0.265646 | 0.008332 | 0.116489 | 0.087919 | -0.074207 | -0.019188 | -0.013977 | -0.008063 | 0.311131 |
Married | 0.356961 | 1.000000 | 0.372524 | -0.030681 | 0.832014 | 0.071055 | 0.102828 | 0.174027 | -0.106939 | -0.083401 | -0.093887 | -0.022953 | 0.823742 |
Dependents | 0.121822 | 0.372524 | 1.000000 | -0.022988 | 0.349570 | 0.129937 | 0.024323 | 0.181592 | -0.048651 | -0.083369 | -0.022418 | -0.014326 | 0.311572 |
Education | -0.080729 | -0.030681 | -0.022988 | 1.000000 | -0.026170 | 0.128624 | 0.052806 | 0.155956 | 0.108287 | 0.055851 | 0.000979 | -0.102404 | -0.033216 |
Self_Employed | 0.265646 | 0.832014 | 0.349570 | -0.026170 | 1.000000 | 0.140173 | 0.076750 | 0.231262 | -0.103865 | -0.048297 | -0.100422 | -0.047370 | 0.690325 |
ApplicantIncome | 0.008332 | 0.071055 | 0.129937 | 0.128624 | 0.140173 | 1.000000 | -0.121032 | 0.564648 | -0.003925 | -0.021712 | -0.051459 | -0.020718 | 0.065581 |
CoapplicantIncome | 0.116489 | 0.102828 | 0.024323 | 0.052806 | 0.076750 | -0.121032 | 1.000000 | 0.165143 | -0.053702 | -0.034498 | -0.018697 | 0.001634 | 0.116479 |
LoanAmount | 0.087919 | 0.174027 | 0.181592 | 0.155956 | 0.231262 | 0.564648 | 0.165143 | 1.000000 | 0.074216 | -0.024232 | -0.086455 | 0.019726 | 0.162672 |
Loan_Term | -0.074207 | -0.106939 | -0.048651 | 0.108287 | -0.103865 | -0.003925 | -0.053702 | 0.074216 | 1.000000 | 0.075339 | 0.026265 | 0.086213 | -0.095366 |
Credit_History_Available | -0.019188 | -0.083401 | -0.083369 | 0.055851 | -0.048297 | -0.021712 | -0.034498 | -0.024232 | 0.075339 | 1.000000 | 0.017112 | -0.004215 | -0.237737 |
Housing | -0.013977 | -0.093887 | -0.022418 | 0.000979 | -0.100422 | -0.051459 | -0.018697 | -0.086455 | 0.026265 | 0.017112 | 1.000000 | 0.068134 | -0.083019 |
Locality | -0.008063 | -0.022953 | -0.014326 | -0.102404 | -0.047370 | -0.020718 | 0.001634 | 0.019726 | 0.086213 | -0.004215 | 0.068134 | 1.000000 | -0.034356 |
Fraud_Risk | 0.311131 | 0.823742 | 0.311572 | -0.033216 | 0.690325 | 0.065581 | 0.116479 | 0.162672 | -0.095366 | -0.237737 | -0.083019 | -0.034356 | 1.000000 |
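We can observe a strong positive correlation between the target variable, Fraud_Risk, and the attributes Married (0.82) and Self_Employed (0.69). Married and Self_Employed are also strongly correlated with each other (0.83), so the two carry overlapping information.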
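To read the target's correlations without scanning the full matrix, the `Fraud_Risk` column can be pulled out and sorted by absolute value. A sketch on a hypothetical toy frame (column names borrowed from the dataset, values invented):

```python
import pandas as pd

# Toy data: Married tracks the target exactly, Education only weakly.
toy = pd.DataFrame({
    "Married":    [0, 0, 1, 1, 0, 1],
    "Education":  [1, 0, 1, 0, 1, 0],
    "Fraud_Risk": [0, 0, 1, 1, 0, 1],
})

# Correlation of every feature with the target, strongest first.
target_corr = toy.corr(method="pearson")["Fraud_Risk"].drop("Fraud_Risk")
ranked = target_corr.abs().sort_values(ascending=False)
print(ranked)
```

Sorting by absolute value keeps strongly negative correlations near the top as well.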
X = df[df.columns[0:12]]
y = df[df.columns[12:]]
df.dtypes
Gender                      int64
Married                     int64
Dependents                  int64
Education                   int64
Self_Employed               int64
ApplicantIncome             int64
CoapplicantIncome           int64
LoanAmount                  int64
Loan_Term                   int64
Credit_History_Available    int64
Housing                     int64
Locality                    int64
Fraud_Risk                  int64
dtype: object
We can observe that every attribute has the int64 data type.
Check for null values
df.isna()
| | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Term | Credit_History_Available | Housing | Locality | Fraud_Risk |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | False | False | False | False | False | False | False | False | False | False | False | False | False |
1 | False | False | False | False | False | False | False | False | False | False | False | False | False |
2 | False | False | False | False | False | False | False | False | False | False | False | False | False |
3 | False | False | False | False | False | False | False | False | False | False | False | False | False |
4 | False | False | False | False | False | False | False | False | False | False | False | False | False |
5 | False | False | False | False | False | False | False | False | False | False | False | False | False |
6 | False | False | False | False | False | False | False | False | False | False | False | False | False |
7 | False | False | False | False | False | False | False | False | False | False | False | False | False |
8 | False | False | False | False | False | False | False | False | False | False | False | False | False |
9 | False | False | False | False | False | False | False | False | False | False | False | False | False |
10 | False | False | False | False | False | False | False | False | False | False | False | False | False |
11 | False | False | False | False | False | False | False | False | False | False | False | False | False |
12 | False | False | False | False | False | False | False | False | False | False | False | False | False |
13 | False | False | False | False | False | False | False | False | False | False | False | False | False |
14 | False | False | False | False | False | False | False | False | False | False | False | False | False |
15 | False | False | False | False | False | False | False | False | False | False | False | False | False |
16 | False | False | False | False | False | False | False | False | False | False | False | False | False |
17 | False | False | False | False | False | False | False | False | False | False | False | False | False |
18 | False | False | False | False | False | False | False | False | False | False | False | False | False |
19 | False | False | False | False | False | False | False | False | False | False | False | False | False |
20 | False | False | False | False | False | False | False | False | False | False | False | False | False |
21 | False | False | False | False | False | False | False | False | False | False | False | False | False |
22 | False | False | False | False | False | False | False | False | False | False | False | False | False |
23 | False | False | False | False | False | False | False | False | False | False | False | False | False |
24 | False | False | False | False | False | False | False | False | False | False | False | False | False |
25 | False | False | False | False | False | False | False | False | False | False | False | False | False |
26 | False | False | False | False | False | False | False | False | False | False | False | False | False |
27 | False | False | False | False | False | False | False | False | False | False | False | False | False |
28 | False | False | False | False | False | False | False | False | False | False | False | False | False |
29 | False | False | False | False | False | False | False | False | False | False | False | False | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
791 | False | False | False | False | False | False | False | False | False | False | False | False | False |
792 | False | False | False | False | False | False | False | False | False | False | False | False | False |
793 | False | False | False | False | False | False | False | False | False | False | False | False | False |
794 | False | False | False | False | False | False | False | False | False | False | False | False | False |
795 | False | False | False | False | False | False | False | False | False | False | False | False | False |
796 | False | False | False | False | False | False | False | False | False | False | False | False | False |
797 | False | False | False | False | False | False | False | False | False | False | False | False | False |
798 | False | False | False | False | False | False | False | False | False | False | False | False | False |
799 | False | False | False | False | False | False | False | False | False | False | False | False | False |
800 | False | False | False | False | False | False | False | False | False | False | False | False | False |
801 | False | False | False | False | False | False | False | False | False | False | False | False | False |
802 | False | False | False | False | False | False | False | False | False | False | False | False | False |
803 | False | False | False | False | False | False | False | False | False | False | False | False | False |
804 | False | False | False | False | False | False | False | False | False | False | False | False | False |
805 | False | False | False | False | False | False | False | False | False | False | False | False | False |
806 | False | False | False | False | False | False | False | False | False | False | False | False | False |
807 | False | False | False | False | False | False | False | False | False | False | False | False | False |
808 | False | False | False | False | False | False | False | False | False | False | False | False | False |
809 | False | False | False | False | False | False | False | False | False | False | False | False | False |
810 | False | False | False | False | False | False | False | False | False | False | False | False | False |
811 | False | False | False | False | False | False | False | False | False | False | False | False | False |
812 | False | False | False | False | False | False | False | False | False | False | False | False | False |
813 | False | False | False | False | False | False | False | False | False | False | False | False | False |
814 | False | False | False | False | False | False | False | False | False | False | False | False | False |
815 | False | False | False | False | False | False | False | False | False | False | False | False | False |
816 | False | False | False | False | False | False | False | False | False | False | False | False | False |
817 | False | False | False | False | False | False | False | False | False | False | False | False | False |
818 | False | False | False | False | False | False | False | False | False | False | False | False | False |
819 | False | False | False | False | False | False | False | False | False | False | False | False | False |
820 | False | False | False | False | False | False | False | False | False | False | False | False | False |
821 rows × 13 columns
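A cell-by-cell table of booleans is hard to scan. Summing the mask gives one count per column instead. A sketch on a hypothetical toy frame with one deliberately missing value:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Gender": [1, 0, 1],
    "LoanAmount": [146.0, np.nan, 66.0],  # one deliberately missing value
})

missing_per_column = toy.isna().sum()         # count of NaN per column
total_missing = int(missing_per_column.sum())
duplicate_rows = int(toy.duplicated().sum())  # fully identical rows
print(missing_per_column)
```

On the notebook's data, `df.isna().sum()` would be expected to show zero for every column, in line with the profiling report.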
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print("Train_x Shape :: ", X_train.shape)
print("Train_y Shape :: ", y_train.shape)
print("Test_x Shape :: ", X_test.shape)
print("Test_y Shape :: ", y_test.shape)
Train_x Shape :: (574, 12)
Train_y Shape :: (574, 1)
Test_x Shape :: (247, 12)
Test_y Shape :: (247, 1)
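The printed shapes can be checked by hand. The arithmetic below assumes the test fraction is rounded up and the remaining rows go to training, which matches the sizes reported above:

```python
import math

n_samples = 821   # rows in df, from df.shape
test_size = 0.3   # fraction passed to train_test_split

n_test = math.ceil(test_size * n_samples)  # 246.3 rounds up to 247
n_train = n_samples - n_test               # remaining rows
print(n_train, n_test)
```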
d_train = lgb.Dataset(X_train, label=y_train)
Building the model with default parameters
def LGBM_classifier(features, target):
    """
    Train an LGBM classifier on the given features and target.
    :param features: training feature matrix
    :param target: training labels
    :return: trained LGBM classifier
    """
    model = LGBMClassifier(metric='binary_logloss', objective='binary')
    model.fit(features, target)
    return model
start = time.time()
trained_model = LGBM_classifier(X_train, y_train.values.ravel())
print("> Completion Time : ", time.time() - start)
print("Trained LGBM model :: ", trained_model)
predictions = trained_model.predict(X_test)
> Completion Time : 193.7288317680359
Trained LGBM model :: LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
        importance_type='split', learning_rate=0.1, max_depth=-1,
        metric='binary_logloss', min_child_samples=20, min_child_weight=0.001,
        min_split_gain=0.0, n_estimators=100, n_jobs=-1, num_leaves=31,
        objective='binary', random_state=None, reg_alpha=0.0, reg_lambda=0.0,
        silent=True, subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
print("Train Accuracy :: ", accuracy_score(y_train, trained_model.predict(X_train)))
print("LGBM Model Test Accuracy is :: ", accuracy_score(y_test, predictions))
Train Accuracy :: 1.0
LGBM Model Test Accuracy is :: 0.9230769230769231
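We can observe that the model achieves about 92% accuracy on the test data and 100% accuracy on the training data. The gap between the two suggests that the default model overfits the training set to some degree.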
print(" Confusion matrix ", confusion_matrix(y_test, predictions))
Confusion matrix [[102   5]
 [ 14 126]]
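The confusion matrix holds more than the accuracy figure. A short sketch deriving precision, recall and F1 for class 1 from the four reported cells, assuming scikit-learn's layout (rows are true labels, columns are predicted labels, classes in the order 0, 1):

```python
# Cells copied from the confusion matrix reported above.
tn, fp = 102, 5    # true class 0: predicted 0, predicted 1
fn, tp = 14, 126   # true class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted 1s that are truly 1
recall = tp / (tp + fn)      # share of true 1s that were found
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
```

The accuracy computed this way is 228/247, which agrees with the test accuracy printed earlier. The 14 class-1 records predicted as class 0 are the larger of the two error groups.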
feat_imp = pd.Series(trained_model.feature_importances_, index=X.columns)
feat_imp.nlargest(12).plot(kind='barh', figsize=(8,10))
Feature importance as per the model
!pip install shap
Successfully installed shap-0.30.1
import shap
shap.initjs()
shap_values = shap.TreeExplainer(trained_model.booster_).shap_values(X_train)
shap.summary_plot(shap_values, X_train)
We can observe that attributes such as Married, ApplicantIncome, Credit_History_Available, LoanAmount and CoapplicantIncome have a high impact on the model's predictions of the target variable.
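The ordering in a SHAP summary plot reflects the overall magnitude of each feature's SHAP values. A rough sketch of such a ranking by mean absolute value, using a hypothetical toy matrix rather than the notebook's real `shap_values` (whose exact shape depends on the shap version):

```python
import numpy as np

feature_names = ["Married", "ApplicantIncome", "Housing"]  # subset, for illustration

# Hypothetical SHAP matrix: one row per sample, one column per feature.
toy_shap = np.array([
    [ 2.0, -0.5,  0.1],
    [-1.5,  0.7, -0.1],
    [ 1.8, -0.6,  0.0],
])

# Average magnitude per feature; the sign is ignored on purpose.
mean_abs = np.abs(toy_shap).mean(axis=0)
order = np.argsort(mean_abs)[::-1]  # largest first
ranking = [(feature_names[i], float(mean_abs[i])) for i in order]
print(ranking)
```

Taking the absolute value first matters: a feature that pushes some predictions up and others down would otherwise average out to roughly zero.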