The CorrPlot builder takes a dataframe (Kotlin Map<*, *>) as the input and builds a correlation plot.
If the input has NxN shape and contains only numbers in the range [-1..1], it is plotted as is. Otherwise CorrPlot computes the correlation coefficients using Pearson's method.
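For readers who want to see what the Pearson coefficient involves, here is a minimal plain-Kotlin sketch. The helpers pearson and correlationMatrix are hypothetical names introduced for illustration only; they are not part of Lets-Plot, and the library's own implementation may handle missing values and non-numeric columns differently.

```kotlin
import kotlin.math.sqrt

// Pearson correlation coefficient of two equally sized samples.
// Yields NaN when either sample has zero variance (0.0 / 0.0).
fun pearson(x: List<Double>, y: List<Double>): Double {
    require(x.size == y.size && x.isNotEmpty()) { "samples must be non-empty and of equal size" }
    val meanX = x.average()
    val meanY = y.average()
    var cov = 0.0
    var varX = 0.0
    var varY = 0.0
    for (i in x.indices) {
        val dx = x[i] - meanX
        val dy = y[i] - meanY
        cov += dx * dy
        varX += dx * dx
        varY += dy * dy
    }
    return cov / sqrt(varX * varY)
}

// Pairwise correlation matrix for a column-oriented map (column name -> values).
fun correlationMatrix(data: Map<String, List<Double>>): Map<String, Map<String, Double>> =
    data.mapValues { (_, a) -> data.mapValues { (_, b) -> pearson(a, b) } }
```

The result is an NxN map of maps, which is the same shape of input that the text above says is plotted as is.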
CorrPlot lets you combine 'tile', 'point' and 'label' layers in a matrix of "full", "lower" or "upper" type.
A call to the terminal build() method creates the resulting 'plot' object.
This 'plot' object can be further refined using the regular Lets-Plot (ggplot) API, for example + ggsize().
The Ames Housing dataset for this demo was downloaded from House Prices - Advanced Regression Techniques (train.csv), (c) Kaggle.
%useLatestDescriptors
%use lets-plot
%use dataframe
LetsPlot.getInfo()
Lets-Plot Kotlin API v.4.4.2. Frontend: Notebook with dynamically loaded JS. Lets-Plot JS v.4.0.0.
// Cars MPG dataset
var mpg_df = DataFrame.readCSV("https://raw.githubusercontent.com/JetBrains/lets-plot-kotlin/master/docs/examples/data/mpg.csv")
mpg_df.head(3)
DataFrame: rowsCount = 3, columnsCount = 12
mpg_df = mpg_df.remove("")
mpg_df.head(3)
DataFrame: rowsCount = 3, columnsCount = 11
val mpg_dat = mpg_df.toMap()
When combining layers, CorrPlot chooses an acceptable plot configuration by default.
gggrid(
listOf(
CorrPlot(mpg_dat, "Tiles").tiles().build(),
CorrPlot(mpg_dat, "Points").points().build(),
CorrPlot(mpg_dat, "Tiles and labels").tiles().labels().build(),
CorrPlot(mpg_dat, "Tiles, points and labels").points().labels().tiles().build()
), 2, 400, 320)
The default plot configuration adapts to the changing options: compare the "Tiles and labels" plot above and below.
You can also override the default plot configuration using the type parameter: compare the "Tiles, points and labels" plot above and below.
gggrid(
listOf(
CorrPlot(mpg_dat, "Tiles and labels").tiles().labels(color="white").build(),
CorrPlot(mpg_dat, "Tiles, points and labels")
.tiles(type="upper")
.points(type="lower")
.labels(type="full").build()
), 2, 400, 320)
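The type and diag options select which cells of the correlation matrix a layer covers. The following plain-Kotlin sketch illustrates that selection. It assumes rows are numbered top to bottom and columns left to right, so "lower" means the cells below the main diagonal; visibleCells is a hypothetical helper, not a Lets-Plot function, and the orientation the library actually renders may differ.

```kotlin
// Cells (row, col) of an n x n matrix kept by a given matrix type.
// `diag` controls whether the main diagonal is included.
fun visibleCells(n: Int, type: String, diag: Boolean = true): List<Pair<Int, Int>> {
    require(type in setOf("full", "lower", "upper")) { "unknown type: $type" }
    return (0 until n)
        .flatMap { row -> (0 until n).map { col -> row to col } }
        .filter { (row, col) ->
            when {
                row == col -> diag          // main diagonal
                type == "full" -> true      // everything off the diagonal
                type == "lower" -> row > col
                else -> row < col           // "upper"
            }
        }
}
```

Under these assumptions, combining an "upper" layer with a "lower" layer covers every off-diagonal cell exactly once, which is why the two can share a plot without overlapping.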
Instead of the default blue-grey-red gradient you can define your own lower-middle-upper colors, or choose one of the available 'Brewer' diverging palettes.
Let's create a gradient resembling one of Seaborn gradients.
val corrPlot = CorrPlot(mpg_dat).points().labels().tiles()
// Configure gradient resembling one of Seaborn gradients.
val withGradientColors = (corrPlot
.paletteGradient(low="#417555", mid="#EDEDED", high="#963CA7")
.build()) + ggtitle("Custom gradient")
// Configure Brewer 'Spectral' palette.
val withBrewerColors = (corrPlot
.paletteSpectral()
.build()) + ggtitle("Brewer 'Spectral'")
// Show both plots
gggrid(listOf(withGradientColors, withBrewerColors), 2, 400, 320)
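To make the role of the low, mid and high colors concrete, here is a minimal sketch of a diverging gradient that maps a correlation value to a color. It assumes simple linear interpolation in RGB with the midpoint at zero; the function names are hypothetical, and Lets-Plot may interpolate in a different color space.

```kotlin
import kotlin.math.roundToInt

// Parse "#RRGGBB" into its red, green and blue channels.
fun parseHex(hex: String): List<Int> =
    listOf(1, 3, 5).map { hex.substring(it, it + 2).toInt(16) }

// Linear RGB interpolation between two colors, t in [0, 1].
fun lerpColor(from: String, to: String, t: Double): String {
    val a = parseHex(from)
    val b = parseHex(to)
    return a.zip(b).joinToString("", prefix = "#") { (p, q) ->
        "%02X".format((p + (q - p) * t).roundToInt())
    }
}

// Map a correlation r in [-1, 1] onto a low-mid-high gradient centered at 0.
fun divergingColor(r: Double, low: String, mid: String, high: String): String {
    require(r in -1.0..1.0) { "correlation must be within [-1, 1]" }
    return if (r < 0) lerpColor(mid, low, -r) else lerpColor(mid, high, r)
}
```

With the colors from the cell above, a correlation of 0 maps to the mid color and the extremes -1 and 1 map to the low and high colors respectively.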
The Kaggle House Prices dataset contains 81 variables.
val housing_df = DataFrame.readCSV("../data/Ames_house_prices_train.csv")
housing_df.head(3)
DataFrame: rowsCount = 3, columnsCount = 81
A correlation plot that shows all the correlations in this dataset is too large to be useful.
CorrPlot(housing_df.toMap())
.tiles(type="lower")
.paletteBrBG()
.build()
threshold parameter
The threshold parameter lets us specify a level of significance (a minimum absolute correlation value) below which variables are not shown.
CorrPlot(housing_df.toMap(), "Threshold: 0.5", threshold = 0.5, adjustSize = 0.7)
.tiles(type="full", diag=false)
.paletteBrBG()
.build()
Let's further increase our threshold in order to see only highly correlated variables.
CorrPlot(housing_df.toMap(), "Threshold: 0.8", threshold = 0.8)
.tiles(diag=false)
.labels(color="white", diag=false)
.paletteBrBG()
.build()
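The effect of a threshold can be sketched in plain Kotlin. The sketch assumes that a variable is kept when at least one of its correlations with a different variable reaches the threshold in absolute value; applyThreshold is a hypothetical helper, and the exact rule Lets-Plot applies (for instance, how it treats individual cells of the kept variables) may differ.

```kotlin
import kotlin.math.abs

// Keep only variables that have at least one correlation with a different
// variable whose absolute value is greater than or equal to the threshold.
fun applyThreshold(
    corr: Map<String, Map<String, Double>>,
    threshold: Double
): Map<String, Map<String, Double>> {
    val kept = corr.filter { (name, row) ->
        row.any { (other, r) -> other != name && abs(r) >= threshold }
    }.keys
    // Restrict both rows and columns to the surviving variables.
    return kept.associateWith { name -> corr.getValue(name).filterKeys { it in kept } }
}
```

Raising the threshold can only shrink the set of kept variables, which matches the progression from 0.5 to 0.8 in the plots above.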