Created using: PyCaret 2.2
Date Updated: November 20, 2020
Welcome to the Binary Classification Tutorial (CLF102) - Level Intermediate. This tutorial assumes that you have completed Binary Classification Tutorial (CLF101) - Level Beginner. If you haven't used PyCaret before and this is your first tutorial, we strongly recommend that you go back and work through the beginner tutorial to understand the basics of working in PyCaret.
In this tutorial we will use the pycaret.classification module to learn:
Read Time: Approx. 60 Minutes
If you haven't installed PyCaret yet, please follow the link to the Beginner's Tutorial for installation instructions.
If you are running this notebook on Google Colab, run the following code at the top of your notebook to display interactive visuals.
from pycaret.utils import enable_colab
enable_colab()
Before we get into the practical execution of the techniques mentioned in Section 1, it is important to understand what these techniques are and when to use them. Most of these techniques tend to benefit linear and parametric algorithms, although it is not surprising to see performance gains in tree-based models as well. The explanations below are brief, and we recommend that you do some extra reading to get a deeper and more thorough understanding of these techniques.
The AGE feature ranges from 21 to 79, while other numeric features range from 10,000 to 1,000,000.
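To make the effect of normalization concrete, here is a minimal NumPy sketch of two common rescalings, z-score and min-max, applied to an age-like column and a balance-like column. The arrays, the name `limit_bal`, and the helper functions are hypothetical and for illustration only; they are not taken from the credit dataset. In PyCaret itself this step is, as we understand it, enabled through the `normalize` argument of `setup()` rather than written by hand.

```python
import numpy as np

# Hypothetical values for illustration only (NOT taken from the credit dataset)
age = np.array([21.0, 35.0, 48.0, 79.0])
limit_bal = np.array([10_000.0, 120_000.0, 500_000.0, 1_000_000.0])

def zscore(x):
    """Rescale x to mean 0 and (population) standard deviation 1."""
    return (x - x.mean()) / x.std()

def minmax(x):
    """Rescale x linearly onto the interval [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

# After rescaling, both features live on comparable scales
age_z, bal_z = zscore(age), zscore(limit_bal)
age_mm, bal_mm = minmax(age), minmax(limit_bal)

print(age_z.round(2), bal_z.round(2))
print(age_mm.round(2), bal_mm.round(2))
```

Z-score keeps the shape of each distribution while centering it; min-max is easier to read but sensitive to extreme values, since a single outlier sets the range.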
In the credit dataset there are features called BILL_AMT1 .. BILL_AMT6, which are related in such a way that BILL_AMT1 is the amount of the bill 1 month ago and BILL_AMT6 is the amount of the bill 6 months ago. Such features can be used to extract additional features based on the statistical properties of their distribution, such as the mean, median, variance and standard deviation.
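As a minimal sketch of this kind of feature extraction, the pandas code below computes row-wise statistics across the six bill columns. The three rows of bill amounts are synthetic round numbers chosen for illustration, and the new column names (`BILL_AMT_MEAN` and so on) are hypothetical; PyCaret may name or generate such features differently.

```python
import pandas as pd

# Synthetic bill amounts for three hypothetical customers (illustrative only)
bill_cols = [f"BILL_AMT{i}" for i in range(1, 7)]
df = pd.DataFrame(
    [
        [100, 200, 300, 400, 500, 600],
        [0, 0, 0, 0, 0, 0],
        [1000, 1000, 1000, 1000, 1000, 1000],
    ],
    columns=bill_cols,
)

# Row-wise summary statistics across the six monthly bills
# (pandas var/std use the sample estimator, ddof=1, by default)
df["BILL_AMT_MEAN"] = df[bill_cols].mean(axis=1)
df["BILL_AMT_MEDIAN"] = df[bill_cols].median(axis=1)
df["BILL_AMT_VAR"] = df[bill_cols].var(axis=1)
df["BILL_AMT_STD"] = df[bill_cols].std(axis=1)

print(df[["BILL_AMT_MEAN", "BILL_AMT_MEDIAN", "BILL_AMT_VAR", "BILL_AMT_STD"]])
```

Because `bill_cols` is fixed before the new columns are added, each statistic is computed only over the six original bill columns.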
Boosting. Stacking is also a type of ensemble learning, in which the predictions of multiple models are used as input features for a meta-model that predicts the final outcome.
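The following is a minimal sketch of stacking, assuming scikit-learn is installed (PyCaret depends on it). It uses scikit-learn's `StackingClassifier` directly on synthetic data from `make_classification`; this is a swapped-in illustration of the technique, not PyCaret's own `stack_models` function, and the choice of base models and meta-model is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (a stand-in for the credit dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Base models: their out-of-fold predictions become the meta-model's inputs
base_models = [
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=42)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
]

# Logistic regression acts as the meta-model (final_estimator);
# cv=5 means the meta-model is trained on 5-fold cross-validated predictions
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"Stacked test accuracy: {accuracy:.3f}")
```

Training the meta-model on cross-validated rather than in-sample predictions keeps it from simply learning to trust base models that have overfit the training data.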
For this tutorial we will use the same dataset that was used in Binary Classification Tutorial (CLF101) - Level Beginner.
Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.
The original dataset and data dictionary can be found at the UCI Machine Learning Repository.
You can download the data from the original source and load it using the pandas read_csv function, or you can use PyCaret's data repository to load it with the get_data function (this requires an internet connection).
from pycaret.datasets import get_data
dataset = get_data('credit', profile=True)