Example of outlier detection with autoencoders. Dataset https://www.kaggle.com/mlg-ulb/creditcardfraud from Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles).
It is a highly unbalanced dataset with a very low percentage of fraudulent credit card transactions. Our goal is to build a classifier that detects fraudulent transactions. In this example we treat them as outliers and use an autoencoder to detect them.
!wget -O creditfraud.zip https://www.dropbox.com/s/tl20yp9bcl56oxt/creditcardfraud.zip?dl=0
--2019-11-28 04:07:01-- https://www.dropbox.com/s/tl20yp9bcl56oxt/creditcardfraud.zip?dl=0
[... redirects to dl.dropboxusercontent.com ...]
HTTP request sent, awaiting response... 200 OK Length: 69155672 (66M) [application/zip] Saving to: ‘creditfraud.zip’
2019-11-28 04:07:04 (45.5 MB/s) - ‘creditfraud.zip’ saved [69155672/69155672]
!unzip creditfraud.zip
Archive: creditfraud.zip replace creditcard.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: y inflating: creditcard.csv
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
from keras.models import Model, load_model
from keras.layers import Input, Dense
dat=pd.read_csv('creditcard.csv')
dat.head()
 | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0.0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | 0.090794 | -0.551600 | -0.617801 | -0.991390 | -0.311169 | 1.468177 | -0.470401 | 0.207971 | 0.025791 | 0.403993 | 0.251412 | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0 |
1 | 0.0 | 1.191857 | 0.266151 | 0.166480 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | -0.166974 | 1.612727 | 1.065235 | 0.489095 | -0.143772 | 0.635558 | 0.463917 | -0.114805 | -0.183361 | -0.145783 | -0.069083 | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.167170 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0 |
2 | 1.0 | -1.358354 | -1.340163 | 1.773209 | 0.379780 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | 0.207643 | 0.624501 | 0.066084 | 0.717293 | -0.165946 | 2.345865 | -2.890083 | 1.109969 | -0.121359 | -2.261857 | 0.524980 | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0 |
3 | 1.0 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | -0.054952 | -0.226487 | 0.178228 | 0.507757 | -0.287924 | -0.631418 | -1.059647 | -0.684093 | 1.965775 | -1.232622 | -0.208038 | -0.108300 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.50 | 0 |
4 | 2.0 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | 0.753074 | -0.822843 | 0.538196 | 1.345852 | -1.119670 | 0.175121 | -0.451449 | -0.237033 | -0.038195 | 0.803487 | 0.408542 | -0.009431 | 0.798278 | -0.137458 | 0.141267 | -0.206010 | 0.502292 | 0.219422 | 0.215153 | 69.99 | 0 |
The dataset is highly unbalanced, with very few fraudulent transactions:
dat['Class'].value_counts()/dat['Class'].count()
0 0.998273 1 0.001727 Name: Class, dtype: float64
sns.countplot(x='Class',data=dat)
<matplotlib.axes._subplots.AxesSubplot at 0x7f343caa49e8>
dat = dat.drop(['Time'], axis=1)
dat['Amount'] = StandardScaler().fit_transform(dat['Amount'].values.reshape(-1, 1))
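As a quick sanity check on what `StandardScaler` does to the `Amount` column, here is a toy sketch on made-up values (not the actual data): fit_transform centers the column to mean 0 and scales it to unit variance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the 'Amount' column (hypothetical values)
amounts = np.array([1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)

# fit_transform centers to mean 0 and scales to unit (population) variance
scaled = StandardScaler().fit_transform(amounts)
print(scaled.mean(), scaled.std())  # ~0.0 and ~1.0
```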
Splitting into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(dat.drop('Class',1) , dat['Class'], test_size=0.5, random_state=0)
y_test.value_counts()/y_test.count()
0 0.998294 1 0.001706 Name: Class, dtype: float64
y_train.value_counts()/y_train.count()
0 0.998251 1 0.001749 Name: Class, dtype: float64
For our first example we train the autoencoder only on non-fraudulent cases.
X_train_normal = X_train[y_train==0]
X_train_fraud = X_train[y_train==1]
Building an autoencoder with a single 12-unit hidden layer: the 29 input features are encoded to 12 dimensions and decoded back to 29.
input_layer = Input(shape=(29, ))
encoded = Dense(12,activation='tanh')(input_layer)
decoded = Dense(29,activation='sigmoid')(encoded)
autoencoder = Model(input_layer,decoded)
autoencoder.compile(optimizer='adam',loss='mean_squared_error')
autoencoder.fit(X_train_normal, X_train_normal, epochs=100, batch_size=128,
                validation_data=(X_train_normal, X_train_normal))  # note: validates on the training data itself
Train on 142154 samples, validate on 142154 samples
Epoch 1/100 142154/142154 [==============================] - 3s 18us/step - loss: 0.9911 - val_loss: 0.8843
...
Epoch 100/100 142154/142154 [==============================] - 2s 17us/step - loss: 0.7679 - val_loss: 0.7675
<keras.callbacks.History at 0x7f343aae3390>
predictions = autoencoder.predict(X_train)
mse = np.mean(np.power(X_train - predictions, 2), axis=1)
error_df = pd.DataFrame({'reconstruction_error': mse,
'true_class': y_train})
error_df.groupby('true_class').describe()
true_class | count | mean | std | min | 25% | 50% | 75% | max
---|---|---|---|---|---|---|---|---
0 | 142154.0 | 0.767519 | 3.439808 | 0.037731 | 0.227251 | 0.396262 | 0.645605 | 318.941692
1 | 249.0 | 29.855354 | 43.107802 | 0.118304 | 4.228176 | 10.783928 | 26.808312 | 282.265950
As we can see above, the reconstruction error for non-fraudulent cases is much lower than for fraudulent ones. We use a threshold of the normal-class mean plus 3 standard deviations to classify the test set.
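The mean-plus-3-standard-deviations rule can be sketched as follows, on made-up reconstruction errors rather than the real `error_df` (the values below are hypothetical):

```python
import pandas as pd

# Made-up reconstruction errors standing in for error_df (hypothetical values)
error_df = pd.DataFrame({
    "reconstruction_error": [0.2, 0.4, 0.6, 12.0, 30.0],
    "true_class":           [0,   0,   0,   1,    1],
})

# Threshold: mean + 3 sd of the NORMAL class only
normal_err = error_df.loc[error_df["true_class"] == 0, "reconstruction_error"]
threshold = normal_err.mean() + 3 * normal_err.std()

# Flag everything above the threshold as fraud
y_pred = (error_df["reconstruction_error"] >= threshold).astype(int)
```

Applying this recipe to the table above gives roughly 0.767519 + 3 × 3.439808 ≈ 11.09, which is where the threshold used on the test set comes from.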
test_predictions=autoencoder.predict(X_test)
mse = np.mean(np.power(X_test - test_predictions, 2), axis=1)
y_pred = [1 if er >= 11.078922 else 0 for er in mse]  # threshold ~ normal-class mean + 3 sd
conf_matrix = metrics.confusion_matrix(y_test,y_pred)
ax=plt.subplot()
sns.heatmap(conf_matrix, annot=True, ax=ax, fmt='g')  # annot=True annotates cells; fmt='g' avoids scientific notation
ax.set_xlabel('Predicted labels'); ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix');
ax.xaxis.set_ticklabels(['Normal', 'Fraud']); ax.yaxis.set_ticklabels(['Normal', 'Fraud']);
ax.set(yticks=[0, 2],
xticks=[0.5, 1.5])
ax.yaxis.set_major_locator(ticker.IndexLocator(base=1, offset=0.5))
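From the confusion matrix one can read off recall and precision; for fraud detection, recall on the fraud class is usually the key number. A sketch on hypothetical labels (not the actual test-set results):

```python
from sklearn import metrics

# Hypothetical true labels and predictions, just to show how to unpack the matrix
y_true = [0, 0, 0, 0, 0, 1, 1, 1]
y_hat  = [0, 0, 0, 1, 0, 1, 1, 0]

# sklearn orders the 2x2 matrix as [[tn, fp], [fn, tp]]
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_hat).ravel()
recall = tp / (tp + fn)      # fraction of frauds caught
precision = tp / (tp + fp)   # fraction of flagged cases that are fraud
```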
We now train a second autoencoder on all cases (fraud and non-fraud) in the train dataset and use its encoder to map instances into a 12-dimensional space. The encoded cases are fed to k-NN for classification.
input_layer_all = Input(shape=(29, ))
encoded_all = Dense(12,activation='tanh')(input_layer_all)
decoded_all = Dense(29,activation='sigmoid')(encoded_all)
autoencoder_all = Model(input_layer_all,decoded_all)
autoencoder_all.compile(optimizer='adam',loss='mean_squared_error')
autoencoder_all.fit(X_train, X_train, epochs = 100, batch_size=128,
validation_data=(X_train,X_train))
Train on 142403 samples, validate on 142403 samples
Epoch 1/100 142403/142403 [==============================] - 3s 20us/step - loss: 1.0494 - val_loss: 0.9373
...
Epoch 100/100 142403/142403 [==============================] - 2s 16us/step - loss: 0.8181 - val_loss: 0.8182
<keras.callbacks.History at 0x7f3432236dd8>
encoder_all = Model(input_layer_all,encoded_all)
enc_all = encoder_all.predict(X_train)
Loading the k-NN classifier from scikit-learn
from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier(n_neighbors=3)
# Train the model using the training sets
knn_model.fit(enc_all,y_train)
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=3, p=2, weights='uniform')
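The k-NN step can be illustrated on toy 2-D points standing in for the 12-dimensional encodings (the data below is hypothetical): each query point gets the majority label of its 3 nearest training neighbours.

```python
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated toy clusters standing in for encoded normal/fraud cases
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
pred = knn.predict([[0.5, 0.5], [5.5, 5.5]])  # majority vote of the 3 nearest neighbours
```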
%%time
knn_predicted= knn_model.predict(encoder_all.predict(X_test))
CPU times: user 34.1 s, sys: 130 ms, total: 34.2 s Wall time: 33.6 s
conf_matrix = metrics.confusion_matrix(y_test,knn_predicted)
ax=plt.subplot()
sns.heatmap(conf_matrix, annot=True, ax=ax, fmt='g')  # annot=True annotates cells; fmt='g' avoids scientific notation
ax.set_xlabel('Predicted labels'); ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix');
ax.xaxis.set_ticklabels(['Normal', 'Fraud']); ax.yaxis.set_ticklabels(['Normal', 'Fraud']);
ax.set(yticks=[0, 2],
xticks=[0.5, 1.5])
ax.yaxis.set_major_locator(ticker.IndexLocator(base=1, offset=0.5))