Correlation Plot¶

The CorrPlot builder takes a dataframe (Kotlin Map<*, *>) as the input and builds a correlation plot.

If the input has NxN shape and contains only numbers in range [-1..1], then it is plotted as is. Otherwise CorrPlot will compute correlation coefficients using Pearson's method.
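For reference, Pearson's coefficient for two variables can be sketched in plain Kotlin as below. This is only an illustration of the formula (covariance divided by the product of the standard deviations), not the code CorrPlot uses internally; the function name `pearson` and the sample values are made up for this example.

```kotlin
import kotlin.math.sqrt

// Pearson correlation coefficient of two equally sized samples.
// Returns NaN if either sample has zero variance.
fun pearson(xs: List<Double>, ys: List<Double>): Double {
    require(xs.size == ys.size && xs.size > 1) { "samples must have the same size (at least 2)" }
    val meanX = xs.average()
    val meanY = ys.average()
    var cov = 0.0
    var varX = 0.0
    var varY = 0.0
    for (i in xs.indices) {
        val dx = xs[i] - meanX
        val dy = ys[i] - meanY
        cov += dx * dy
        varX += dx * dx
        varY += dy * dy
    }
    return cov / sqrt(varX * varY)
}

// Illustrative data: ys grows (or shrinks) exactly linearly with xs,
// so the coefficients are 1 and -1 up to floating-point rounding.
val xs = listOf(1.0, 2.0, 3.0, 4.0)
val rPositive = pearson(xs, listOf(2.0, 4.0, 6.0, 8.0))
val rNegative = pearson(xs, listOf(8.0, 6.0, 4.0, 2.0))
```

A matrix of such coefficients, one per pair of numeric columns, is what a correlation plot visualizes.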

CorrPlot lets you combine 'tile', 'point' or 'label' layers in a matrix of "full", "lower" or "upper" type.

A call to the terminal build() method creates the resulting 'plot' object. This 'plot' object can be further refined using the regular Lets-Plot (ggplot) API, for example + ggsize().

The Ames Housing dataset for this demo was downloaded from House Prices - Advanced Regression Techniques (train.csv), (c) Kaggle.

In [1]:

%useLatestDescriptors
%use lets-plot
%use dataframe

LetsPlot.getInfo()

Out[1]:

Lets-Plot Kotlin API v.4.4.2. Frontend: Notebook with dynamically loaded JS. Lets-Plot JS v.4.0.0.

In [2]:

// Cars MPG dataset
var mpg_df = DataFrame.readCSV("https://raw.githubusercontent.com/JetBrains/lets-plot-kotlin/master/docs/examples/data/mpg.csv")
mpg_df.head(3)

Out[2]:

DataFrame: rowsCount = 3, columnsCount = 12

In [3]:

mpg_df = mpg_df.remove("")
mpg_df.head(3)

Out[3]:

DataFrame: rowsCount = 3, columnsCount = 12

In [4]:

val mpg_dat = mpg_df.toMap()

Combining 'tile', 'point' and 'label' layers.¶

When combining layers, CorrPlot chooses an acceptable plot configuration by default.

In [5]:

gggrid(
    listOf(
        CorrPlot(mpg_dat, "Tiles").tiles().build(),
        CorrPlot(mpg_dat, "Points").points().build(), 
        CorrPlot(mpg_dat, "Tiles and labels").tiles().labels().build(),
        CorrPlot(mpg_dat, "Tiles, points and labels").points().labels().tiles().build()
    ), 2, 400, 320)

Out[5]:

The default plot configuration adapts to the changing options - compare "Tiles and labels" plot above and below.

You can also override the default plot configuration using the `type` parameter - compare "Tiles, points and labels" plot above and below.

In [6]:

gggrid(
    listOf(
        CorrPlot(mpg_dat, "Tiles and labels").tiles().labels(color="white").build(),
        CorrPlot(mpg_dat, "Tiles, points and labels")
         .tiles(type="upper")
         .points(type="lower")
         .labels(type="full").build()
    ), 2, 400, 320)

Out[6]:

Customizing colors.¶

Instead of the default blue-grey-red gradient you can define your own lower-middle-upper colors, or choose one of the available 'Brewer' diverging palettes.
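To make the idea of a lower-middle-upper gradient concrete, here is a minimal sketch of how a correlation value in [-1..1] could be mapped to a color. It assumes simple linear interpolation in RGB; Lets-Plot may interpolate differently, and the helpers `parseHex`, `blend` and `divergingColor` are hypothetical names used only here.

```kotlin
import kotlin.math.roundToInt

// Parse "#RRGGBB" into its red, green and blue components.
fun parseHex(hex: String): List<Int> =
    listOf(1, 3, 5).map { hex.substring(it, it + 2).toInt(16) }

// Linear blend of two "#RRGGBB" colors; t = 0 gives `from`, t = 1 gives `to`.
fun blend(from: String, to: String, t: Double): String {
    val a = parseHex(from)
    val b = parseHex(to)
    val mixed = a.indices.map { i -> (a[i] + (b[i] - a[i]) * t).roundToInt() }
    return "#" + mixed.joinToString("") { "%02X".format(it) }
}

// Map a correlation value to a color: -1 -> low, 0 -> mid, +1 -> high.
fun divergingColor(r: Double, low: String, mid: String, high: String): String {
    val v = r.coerceIn(-1.0, 1.0)
    return if (v < 0) blend(mid, low, -v) else blend(mid, high, v)
}

// Anchor points of a green / light grey / purple gradient.
val colorAtMinus1 = divergingColor(-1.0, "#417555", "#EDEDED", "#963CA7")
val colorAtZero = divergingColor(0.0, "#417555", "#EDEDED", "#963CA7")
val colorAtPlus1 = divergingColor(1.0, "#417555", "#EDEDED", "#963CA7")
```

The middle color marks zero correlation, so strong negative and strong positive values are pulled toward visually distinct ends of the scale.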

Let's create a gradient resembling one of the Seaborn gradients.

In [7]:

val corrPlot = CorrPlot(mpg_dat).points().labels().tiles()

// Configure gradient resembling one of Seaborn gradients.
val withGradientColors = (corrPlot
            .paletteGradient(low="#417555", mid="#EDEDED", high="#963CA7")
            .build()) + ggtitle("Custom gradient")

// Configure Brewer 'Spectral' palette.
val withBrewerColors = (corrPlot
            .paletteSpectral()
            .build()) + ggtitle("Brewer 'Spectral'")

// Show both plots
gggrid(listOf(withGradientColors, withBrewerColors), 2, 400, 320)

Out[7]:

Correlation plot with large number of variables in dataset.¶

The Kaggle House Prices dataset contains 81 variables.

In [8]:

val housing_df = DataFrame.readCSV("../data/Ames_house_prices_train.csv")
housing_df.head(3)

Out[8]:

DataFrame: rowsCount = 3, columnsCount = 81

A correlation plot that shows all the correlations in this dataset is too large to be useful.

In [9]:

CorrPlot(housing_df.toMap())
    .tiles(type="lower")
    .paletteBrBG()
    .build()

Out[9]:

The `threshold` parameter.¶

The threshold parameter lets us specify a minimum absolute correlation value; variables whose correlations fall below it are not shown.
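The effect of such a cutoff can be sketched in plain Kotlin. This is an illustration under assumptions, not CorrPlot's implementation: the helper `filterByThreshold` is hypothetical, the coefficients are made-up values, and whether the library treats the boundary as inclusive is not confirmed here.

```kotlin
import kotlin.math.abs

// Hypothetical helper: keep only the pairs whose absolute
// correlation reaches the threshold (boundary assumed inclusive).
fun filterByThreshold(
    correlations: Map<Pair<String, String>, Double>,
    threshold: Double
): Map<Pair<String, String>, Double> {
    require(threshold in 0.0..1.0) { "threshold must be in [0, 1]" }
    return correlations.filterValues { abs(it) >= threshold }
}

// Made-up coefficients for three variables a, b and c.
val sample = mapOf(
    ("a" to "b") to 0.9,
    ("a" to "c") to -0.7,
    ("b" to "c") to 0.1
)

// Only the a-b and a-c pairs survive a threshold of 0.5.
val strong = filterByThreshold(sample, 0.5)
```

Note that the comparison uses the absolute value, so strong negative correlations are kept along with strong positive ones.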

In [10]:

CorrPlot(housing_df.toMap(), "Threshold: 0.5", threshold = 0.5, adjustSize = 0.7)
    .tiles(type="full", diag=false)
    .paletteBrBG()
    .build()

Out[10]:

Let's further increase our threshold in order to see only highly correlated variables.

In [11]:

CorrPlot(housing_df.toMap(), "Threshold: 0.8", threshold = 0.8)
    .tiles(diag=false)
    .labels(color="white", diag=false)
    .paletteBrBG()
    .build()

Out[11]:
