! pip uninstall pycaret
!pip install git+https://github.com/amjadraza/pycaret.git@feature/gcp_zure_integration
from pycaret.classification import *
from pycaret.datasets import get_data
dataset = get_data('credit')
LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_1 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 20000 | 2 | 2 | 1 | 24 | 2 | 2 | -1 | -1 | -2 | -2 | 3913.0 | 3102.0 | 689.0 | 0.0 | 0.0 | 0.0 | 0.0 | 689.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
1 | 90000 | 2 | 2 | 2 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 29239.0 | 14027.0 | 13559.0 | 14331.0 | 14948.0 | 15549.0 | 1518.0 | 1500.0 | 1000.0 | 1000.0 | 1000.0 | 5000.0 | 0 |
2 | 50000 | 2 | 2 | 1 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 46990.0 | 48233.0 | 49291.0 | 28314.0 | 28959.0 | 29547.0 | 2000.0 | 2019.0 | 1200.0 | 1100.0 | 1069.0 | 1000.0 | 0 |
3 | 50000 | 1 | 2 | 1 | 57 | -1 | 0 | -1 | 0 | 0 | 0 | 8617.0 | 5670.0 | 35835.0 | 20940.0 | 19146.0 | 19131.0 | 2000.0 | 36681.0 | 10000.0 | 9000.0 | 689.0 | 679.0 | 0 |
4 | 50000 | 1 | 1 | 2 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 64400.0 | 57069.0 | 57608.0 | 19394.0 | 19619.0 | 20024.0 | 2500.0 | 1815.0 | 657.0 | 1000.0 | 1000.0 | 800.0 | 0 |
data = dataset.sample(frac=0.95, random_state=786).reset_index(drop=True)
data_unseen = dataset.drop(data.index).reset_index(drop=True)
print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))
Data for Modeling: (22800, 24) Unseen Data For Predictions: (1200, 24)
exp_clf101 = setup(data = data, target = 'default', session_id=123)
rf = create_model('rf')
Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | |
---|---|---|---|---|---|---|---|
0 | 0.8095 | 0.7531 | 0.3428 | 0.6269 | 0.4432 | 0.3400 | 0.3626 |
1 | 0.8127 | 0.7451 | 0.3399 | 0.6452 | 0.4453 | 0.3453 | 0.3710 |
2 | 0.8076 | 0.7714 | 0.3258 | 0.6250 | 0.4283 | 0.3262 | 0.3512 |
3 | 0.7989 | 0.7185 | 0.3144 | 0.5842 | 0.4088 | 0.3006 | 0.3215 |
4 | 0.8051 | 0.7249 | 0.3229 | 0.6129 | 0.4230 | 0.3191 | 0.3428 |
5 | 0.8152 | 0.7324 | 0.3569 | 0.6495 | 0.4607 | 0.3603 | 0.3839 |
6 | 0.8039 | 0.7244 | 0.3371 | 0.6010 | 0.4319 | 0.3246 | 0.3444 |
7 | 0.8158 | 0.7711 | 0.3399 | 0.6630 | 0.4494 | 0.3523 | 0.3807 |
8 | 0.8139 | 0.7183 | 0.3258 | 0.6609 | 0.4364 | 0.3400 | 0.3706 |
9 | 0.8107 | 0.7419 | 0.3569 | 0.6269 | 0.4549 | 0.3506 | 0.3710 |
Mean | 0.8093 | 0.7401 | 0.3363 | 0.6295 | 0.4382 | 0.3359 | 0.3600 |
SD | 0.0052 | 0.0190 | 0.0134 | 0.0243 | 0.0149 | 0.0172 | 0.0186 |
tuned_rf = tune_model(rf)
Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | |
---|---|---|---|---|---|---|---|
0 | 0.8258 | 0.7863 | 0.3654 | 0.7049 | 0.4813 | 0.3891 | 0.4194 |
1 | 0.8227 | 0.7977 | 0.3541 | 0.6944 | 0.4690 | 0.3758 | 0.4066 |
2 | 0.8233 | 0.8225 | 0.3853 | 0.6766 | 0.4910 | 0.3937 | 0.4165 |
3 | 0.8177 | 0.7713 | 0.3598 | 0.6615 | 0.4661 | 0.3675 | 0.3923 |
4 | 0.8227 | 0.7805 | 0.3513 | 0.6966 | 0.4670 | 0.3743 | 0.4059 |
5 | 0.8227 | 0.7955 | 0.3683 | 0.6842 | 0.4788 | 0.3834 | 0.4101 |
6 | 0.8158 | 0.7568 | 0.3371 | 0.6648 | 0.4474 | 0.3507 | 0.3799 |
7 | 0.8377 | 0.7941 | 0.3768 | 0.7733 | 0.5067 | 0.4231 | 0.4623 |
8 | 0.8227 | 0.7671 | 0.3569 | 0.6923 | 0.4710 | 0.3773 | 0.4073 |
9 | 0.8138 | 0.7833 | 0.3654 | 0.6386 | 0.4649 | 0.3621 | 0.3828 |
Mean | 0.8225 | 0.7855 | 0.3620 | 0.6887 | 0.4743 | 0.3797 | 0.4083 |
SD | 0.0062 | 0.0176 | 0.0128 | 0.0339 | 0.0154 | 0.0188 | 0.0220 |
predict_model(tuned_rf);
Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | |
---|---|---|---|---|---|---|---|---|
0 | Random Forest Classifier | 0.8135 | 0.7563 | 0.3245 | 0.6591 | 0.4349 | 0.3383 | 0.3688 |
final_rf = finalize_model(tuned_rf)
#Final Random Forest model parameters for deployment
print(final_rf)
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=10, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=2, min_samples_split=10, min_weight_fraction_leaf=0.0, n_estimators=70, n_jobs=-1, oob_score=False, random_state=123, verbose=0, warm_start=False)
predict_model(final_rf);
Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | |
---|---|---|---|---|---|---|---|---|
0 | Random Forest Classifier | 0.8345 | 0.8222 | 0.3629 | 0.7657 | 0.4924 | 0.4082 | 0.4489 |
This is the code to deploy model on Microsft azure using pycaret
functionalities.
! pip install azure-storage-blob
Requirement already satisfied: azure-storage-blob in /usr/local/lib/python3.6/dist-packages (12.3.2) Requirement already satisfied: cryptography>=2.1.4 in /usr/local/lib/python3.6/dist-packages (from azure-storage-blob) (3.0) Requirement already satisfied: msrest>=0.6.10 in /usr/local/lib/python3.6/dist-packages (from azure-storage-blob) (0.6.18) Requirement already satisfied: azure-core<2.0.0,>=1.6.0 in /usr/local/lib/python3.6/dist-packages (from azure-storage-blob) (1.7.0) Requirement already satisfied: six>=1.4.1 in /usr/local/lib/python3.6/dist-packages (from cryptography>=2.1.4->azure-storage-blob) (1.15.0) Requirement already satisfied: cffi!=1.11.3,>=1.8 in /usr/local/lib/python3.6/dist-packages (from cryptography>=2.1.4->azure-storage-blob) (1.14.1) Requirement already satisfied: requests~=2.16 in /usr/local/lib/python3.6/dist-packages (from msrest>=0.6.10->azure-storage-blob) (2.23.0) Requirement already satisfied: requests-oauthlib>=0.5.0 in /usr/local/lib/python3.6/dist-packages (from msrest>=0.6.10->azure-storage-blob) (1.3.0) Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from msrest>=0.6.10->azure-storage-blob) (2020.6.20) Requirement already satisfied: isodate>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from msrest>=0.6.10->azure-storage-blob) (0.6.0) Requirement already satisfied: pycparser in /usr/local/lib/python3.6/dist-packages (from cffi!=1.11.3,>=1.8->cryptography>=2.1.4->azure-storage-blob) (2.20) Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests~=2.16->msrest>=0.6.10->azure-storage-blob) (3.0.4) Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests~=2.16->msrest>=0.6.10->azure-storage-blob) (2.10) Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests~=2.16->msrest>=0.6.10->azure-storage-blob) (1.24.3) Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.6/dist-packages (from requests-oauthlib>=0.5.0->msrest>=0.6.10->azure-storage-blob) (3.1.0)
## Enter connection string when running in google colab
connect_str = '' #@param {type:"string"}
print(connect_str)
#! export AZURE_STORAGE_CONNECTION_STRING=connect_str
os.environ['AZURE_STORAGE_CONNECTION_STRING']= connect_str
! echo $AZURE_STORAGE_CONNECTION_STRING
os.getenv('AZURE_STORAGE_CONNECTION_STRING')
authentication = {'container': 'pycaret-cls-101'}
model_name = 'rf-clf-101'
deploy_model(final_rf, model_name, authentication, platform = 'azure')
Deploying model to Microsoft Azure Uploading to Azure Storage as blob: rf-clf-101.pkl
authentication = {'container': 'pycaret-cls-101'}
model_name = 'rf-clf-101'
model_azure = load_model(model_name,
platform = 'azure',
authentication = authentication,
verbose=True)
Loading model from Microsoft Azure Downloading blob to rf-clf-101.pkl Blob rf-clf-101.pkl downloaded to rf-clf-101.pkl. Transformation Pipeline and Model Successfully Loaded
authentication = {'container': 'pycaret-cls-101'}
model_name = 'rf-clf-101'
unseen_predictions = predict_model(model_name, data=data_unseen, platform='azure', authentication=authentication, verbose=True)
Loading model from Microsoft Azure Downloading blob to rf-clf-101.pkl Blob rf-clf-101.pkl downloaded to rf-clf-101.pkl. Transformation Pipeline and Model Successfully Loaded
unseen_predictions
LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_1 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default | Label | Score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 50000 | 2 | 2 | 1 | 48 | 0 | 0 | 0 | 0 | 0 | 0 | 48572.0 | 45067.0 | 46492.0 | 47368.0 | 7988.0 | 8011.0 | 2028.0 | 2453.0 | 2329.0 | 431.0 | 300.0 | 500.0 | 0 | 0 | 0.1591 |
1 | 200000 | 2 | 1 | 1 | 40 | 2 | 2 | 2 | 2 | 2 | 2 | 80468.0 | 82874.0 | 84900.0 | 85758.0 | 87003.0 | 89112.0 | 4200.0 | 4100.0 | 3000.0 | 3400.0 | 3500.0 | 0.0 | 1 | 1 | 0.7779 |
2 | 50000 | 2 | 3 | 1 | 44 | 1 | 2 | 3 | 2 | 4 | 3 | 13112.0 | 14679.0 | 15143.0 | 16892.0 | 16341.0 | 15798.0 | 2100.0 | 1000.0 | 2300.0 | 0.0 | 0.0 | 0.0 | 1 | 1 | 0.6478 |
3 | 60000 | 2 | 2 | 1 | 31 | 2 | 2 | -1 | 0 | 0 | 0 | 63201.0 | 56600.0 | 54952.0 | 32094.0 | 31232.0 | 30384.0 | 1132.0 | 60994.0 | 1436.0 | 1047.0 | 1056.0 | 1053.0 | 1 | 1 | 0.5038 |
4 | 120000 | 2 | 3 | 2 | 32 | -1 | 0 | 0 | 0 | 0 | 0 | 66551.0 | 67876.0 | 69903.0 | 71446.0 | 79589.0 | 81354.0 | 2429.0 | 3120.0 | 3300.0 | 10000.0 | 3200.0 | 3200.0 | 0 | 0 | 0.1394 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1195 | 80000 | 1 | 2 | 2 | 34 | 2 | 2 | 2 | 2 | 2 | 2 | 72557.0 | 77708.0 | 79384.0 | 77519.0 | 82607.0 | 81158.0 | 7000.0 | 3500.0 | 0.0 | 7000.0 | 0.0 | 4000.0 | 1 | 1 | 0.7523 |
1196 | 150000 | 1 | 3 | 2 | 43 | -1 | -1 | -1 | -1 | 0 | 0 | 1683.0 | 1828.0 | 3502.0 | 8979.0 | 5190.0 | 0.0 | 1837.0 | 3526.0 | 8998.0 | 129.0 | 0.0 | 0.0 | 0 | 0 | 0.1499 |
1197 | 30000 | 1 | 2 | 2 | 37 | 4 | 3 | 2 | -1 | 0 | 0 | 3565.0 | 3356.0 | 2758.0 | 20878.0 | 20582.0 | 19357.0 | 0.0 | 0.0 | 22000.0 | 4200.0 | 2000.0 | 3100.0 | 1 | 0 | 0.4876 |
1198 | 80000 | 1 | 3 | 1 | 41 | 1 | -1 | 0 | 0 | 0 | -1 | -1645.0 | 78379.0 | 76304.0 | 52774.0 | 11855.0 | 48944.0 | 85900.0 | 3409.0 | 1178.0 | 1926.0 | 52964.0 | 1804.0 | 1 | 0 | 0.2613 |
1199 | 50000 | 1 | 2 | 1 | 46 | 0 | 0 | 0 | 0 | 0 | 0 | 47929.0 | 48905.0 | 49764.0 | 36535.0 | 32428.0 | 15313.0 | 2078.0 | 1800.0 | 1430.0 | 1000.0 | 1000.0 | 1000.0 | 1 | 0 | 0.1569 |
1200 rows × 26 columns
After the model is finalised and you are happy with the model, you can deploy the model on your cloud of choice. In this section, we deploy the model on the google cloud platform.
from google.colab import auth
auth.authenticate_user()
! pip install awscli
# GCP project name, Change the name based on your own GCP project.
CLOUD_PROJECT = 'gcpessentials-rz' # GCP project name
bucket_name = 'pycaret-clf101-test1' # bucket name for storage of your model
BUCKET = 'gs://' + CLOUD_PROJECT + '-{}'.format(bucket_name)
# Set the gcloud consol to $CLOUD_PROJECT Environment Variable for your Desired Project)
!gcloud config set project $CLOUD_PROJECT
authentication = {'project': CLOUD_PROJECT, 'bucket' : bucket_name}
model_name = 'rf-clf'
deploy_model(final_rf, model_name, authentication, platform = 'gcp')
authentication = {'project': CLOUD_PROJECT, 'bucket' : bucket_name}
model_name = 'rf-clf'
model_gcp = load_model(model_name,
platform = 'gcp',
authentication = authentication,
verbose=True)
estimator_ = load_model(model_name, platform='gcp',
authentication=authentication,
verbose=True)
authentication = {'project': CLOUD_PROJECT, 'bucket' : bucket_name}
model_name = 'rf-clf'
unseen_predictions = predict_model(model_name, data=data_unseen, platform='gcp', authentication=authentication, verbose=True)
unseen_predictions
authentication
import inspect as i
import sys
sys.stdout.write(i.getsource(predict_model))
The predict_model()
function is also used to predict on the unseen dataset. The only difference from section 11 above is that this time we will pass the data_unseen
parameter. data_unseen
is the variable created at the beginning of the tutorial and contains 5% (1200 samples) of the original dataset which was never exposed to PyCaret. (see section 5 for explanation)
unseen_predictions = predict_model(final_rf, data=data_unseen)
unseen_predictions.head()
The Label
and Score
columns are added onto the data_unseen
set. Label is the prediction and score is the probability of the prediction. Notice that predicted results are concatenated to the original dataset while all the transformations are automatically performed in the background.
We have now finished the experiment by finalizing the tuned_rf
model which is now stored in final_rf
variable. We have also used the model stored in final_rf
to predict data_unseen
. This brings us to the end of our experiment, but one question is still to be asked: What happens when you have more new data to predict? Do you have to go through the entire experiment again? The answer is no, PyCaret's inbuilt function save_model()
allows you to save the model along with entire transformation pipeline for later use.
save_model(final_rf,'Final RF Model 08Feb2020')
(TIP : It's always good to use date in the filename when saving models, it's good for version control.)
To load a saved model at a future date in the same or an alternative environment, we would use PyCaret's load_model()
function and then easily apply the saved model on new unseen data for prediction.
saved_final_rf = load_model('Final RF Model 08Feb2020')
Once the model is loaded in the environment, you can simply use it to predict on any new data using the same predict_model()
function. Below we have applied the loaded model to predict the same data_unseen
that we used in section 13 above.
new_prediction = predict_model(saved_final_rf, data=data_unseen)
new_prediction.head()
Notice that the results of unseen_predictions
and new_prediction
are identical.
This tutorial has covered the entire machine learning pipeline from data ingestion, pre-processing, training the model, hyperparameter tuning, prediction and saving the model for later use. We have completed all of these steps in less than 10 commands which are naturally constructed and very intuitive to remember such as create_model()
, tune_model()
, compare_models()
. Re-creating the entire experiment without PyCaret would have taken well over 100 lines of code in most libraries.
We have only covered the basics of pycaret.classification
. In following tutorials we will go deeper into advanced pre-processing, ensembling, generalized stacking and other techniques that allow you to fully customize your machine learning pipeline and are must know for any data scientist.
See you at the next tutorial. Follow the link to Binary Classification Tutorial (CLF102) - Intermediate Level