Created using: PyCaret 2.2
Date Updated: November 25, 2020
Welcome to the regression tutorial (REG102) - Level Intermediate. This tutorial assumes that you have completed Regression Tutorial (REG101) - Level Beginner. If you haven't used PyCaret before and this is your first tutorial, we strongly recommend that you go back and work through the beginner tutorial to understand the basics of working in PyCaret.
In this tutorial we will use the pycaret.regression module to learn:
Read Time : Approx 60 Minutes
If you haven't installed PyCaret yet, please follow the link to the Beginner's Tutorial for instructions on how to install pycaret.
If you are running this notebook on Google Colab, run the following code at the top of your notebook to display interactive visuals.
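A minimal sketch of that Colab setup, assuming PyCaret 2.x, where `enable_colab` is exposed from `pycaret.utils`. The wrapper function `try_enable_colab` is a hypothetical helper added here so that the cell does not fail when PyCaret is missing or the code runs outside a notebook; it is not part of PyCaret.

```python
def try_enable_colab():
    """Try to switch on PyCaret's Colab display mode.

    Returns True if the call succeeded and False otherwise.
    `try_enable_colab` is a hypothetical helper, not a PyCaret function.
    """
    try:
        # Assumption: PyCaret 2.x, which provides pycaret.utils.enable_colab
        from pycaret.utils import enable_colab
        enable_colab()
        return True
    except Exception as exc:
        # PyCaret is not installed, or we are not inside a notebook session
        print(f"Colab mode not enabled: {exc}")
        return False


colab_enabled = try_enable_colab()
```

In a Colab notebook with PyCaret 2.x installed, calling `enable_colab()` directly after the import is enough; the guard only matters when the same notebook is also run elsewhere.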
Before we get into the practical execution of the techniques mentioned above in Section 1, it is important to understand what these techniques are and when to use them. More often than not, most of these techniques will help linear and parametric algorithms; however, it is not surprising to also see performance gains in tree-based models. The explanations below are brief, and we recommend that you do extra reading to get a more thorough understanding of these techniques.
For this tutorial we will be using the same dataset that was used in Regression Tutorial (REG101) - Level Beginner.
This case was prepared by Greg Mills (MBA ’07) under the supervision of Phillip E. Pfeifer, Alumni Research Professor of Business Administration. Copyright (c) 2007 by the University of Virginia Darden School Foundation, Charlottesville, VA. All rights reserved.
The original dataset and description can be found here.
You can download the data from the original source found here and load it using the pandas read_csv function, or you can use PyCaret's data repository to load the data using the get_data function (this requires an internet connection).
from pycaret.datasets import get_data
dataset = get_data('diamond', profile=True)
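If you prefer the pandas route instead of get_data, the loading step might look like the sketch below. The file name `example.csv`, the helper `load_csv`, and the two columns written to the file are placeholders invented for illustration; they are not the diamond data. Replace `csv_path` with the location of the file you downloaded.

```python
import os
import tempfile

import pandas as pd


def load_csv(path):
    """Read a CSV file into a DataFrame and print its shape."""
    df = pd.read_csv(path)
    print(f"Loaded {df.shape[0]} rows and {df.shape[1]} columns")
    return df


# Placeholder file standing in for a downloaded copy of the data.
# The column names and values are made up for illustration only.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "example.csv")
with open(csv_path, "w") as f:
    f.write("feature,target\n1.0,10\n2.0,20\n3.0,30\n")

dataset_from_csv = load_csv(csv_path)
```

The resulting DataFrame can be passed to the rest of the tutorial in the same way as the one returned by get_data.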