Kotlin for Jupyter Notebooks

This notebook shows how to use Kotlin with Jupyter notebooks.

Installing kernel

Currently, the Kotlin Jupyter kernel can be installed only via conda:

conda install kotlin-jupyter-kernel -c jetbrains

Installation via pip install will also become possible later.

Note that Kotlin Jupyter requires Java 8 to be installed:

apt-get install openjdk-8-jre

Once these requirements are satisfied, run jupyter notebook and switch to the Kotlin kernel.

Running cells

Here's a simple example with Kotlin code:

In [1]:
class Greeter(val name: String) {
    fun greet() {
        println("Hello, $name!")
    }
}
In [2]:
Greeter("Jupyter").greet() // Run me
Hello, Jupyter!

Configuring Maven dependencies

Here's another example, courtesy of thomasnield/kotlin-statistics, showing how to load additional dependencies into the notebook from Maven repositories:

In [3]:
@file:Repository("https://repo1.maven.org/maven2")
@file:DependsOn("org.nield:kotlin-statistics:1.2.1")
In [4]:
import java.time.LocalDate
import java.time.temporal.ChronoUnit
import org.nield.kotlinstatistics.*

data class Patient(val firstName: String,
                   val lastName: String,
                   val gender: Gender,
                   val birthday: LocalDate,
                   val whiteBloodCellCount: Int) {

    val age = ChronoUnit.YEARS.between(birthday, LocalDate.now())
}

val patients = listOf(
        Patient("John", "Simone", Gender.MALE, LocalDate.of(1989, 1, 7), 4500),
        Patient("Sarah", "Marley", Gender.FEMALE, LocalDate.of(1970, 2, 5), 6700),
        Patient("Jessica", "Arnold", Gender.FEMALE, LocalDate.of(1980, 3, 9), 3400),
        Patient("Sam", "Beasley", Gender.MALE, LocalDate.of(1981, 4, 17), 8800),
        Patient("Dan", "Forney", Gender.MALE, LocalDate.of(1985, 9, 13), 5400),
        Patient("Lauren", "Michaels", Gender.FEMALE, LocalDate.of(1975, 8, 21), 5000),
        Patient("Michael", "Erlich", Gender.MALE, LocalDate.of(1985, 12, 17), 4100),
        Patient("Jason", "Miles", Gender.MALE, LocalDate.of(1991, 11, 1), 3900),
        Patient("Rebekah", "Earley", Gender.FEMALE, LocalDate.of(1985, 2, 18), 4600),
        Patient("James", "Larson", Gender.MALE, LocalDate.of(1974, 4, 10), 5100),
        Patient("Dan", "Ulrech", Gender.MALE, LocalDate.of(1991, 7, 11), 6000),
        Patient("Heather", "Eisner", Gender.FEMALE, LocalDate.of(1994, 3, 6), 6000),
        Patient("Jasper", "Martin", Gender.MALE, LocalDate.of(1971, 7, 1), 6000)
)

enum class Gender {
    MALE,
    FEMALE
}

val clusters = patients.multiKMeansCluster(k = 3,
        maxIterations = 10000,
        trialCount = 50,
        xSelector = { it.age.toDouble() },
        ySelector = { it.whiteBloodCellCount.toDouble() }
)
In [5]:
clusters.forEachIndexed { index, item ->
    println("CENTROID: $index")
    item.points.forEach {
        println("\t$it")
    }
}
CENTROID: 0
	Patient(firstName=Dan, lastName=Forney, gender=MALE, birthday=1985-09-13, whiteBloodCellCount=5400)
	Patient(firstName=Lauren, lastName=Michaels, gender=FEMALE, birthday=1975-08-21, whiteBloodCellCount=5000)
	Patient(firstName=James, lastName=Larson, gender=MALE, birthday=1974-04-10, whiteBloodCellCount=5100)
	Patient(firstName=Dan, lastName=Ulrech, gender=MALE, birthday=1991-07-11, whiteBloodCellCount=6000)
	Patient(firstName=Heather, lastName=Eisner, gender=FEMALE, birthday=1994-03-06, whiteBloodCellCount=6000)
	Patient(firstName=Jasper, lastName=Martin, gender=MALE, birthday=1971-07-01, whiteBloodCellCount=6000)
CENTROID: 1
	Patient(firstName=John, lastName=Simone, gender=MALE, birthday=1989-01-07, whiteBloodCellCount=4500)
	Patient(firstName=Jessica, lastName=Arnold, gender=FEMALE, birthday=1980-03-09, whiteBloodCellCount=3400)
	Patient(firstName=Michael, lastName=Erlich, gender=MALE, birthday=1985-12-17, whiteBloodCellCount=4100)
	Patient(firstName=Jason, lastName=Miles, gender=MALE, birthday=1991-11-01, whiteBloodCellCount=3900)
	Patient(firstName=Rebekah, lastName=Earley, gender=FEMALE, birthday=1985-02-18, whiteBloodCellCount=4600)
CENTROID: 2
	Patient(firstName=Sarah, lastName=Marley, gender=FEMALE, birthday=1970-02-05, whiteBloodCellCount=6700)
	Patient(firstName=Sam, lastName=Beasley, gender=MALE, birthday=1981-04-17, whiteBloodCellCount=8800)
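
For intuition about what a k-means clustering call like the one above computes, here is a minimal sketch in plain Kotlin. It is not the kotlin-statistics implementation: the Point type, the helper names, and the choice to seed the centroids with the first k points are assumptions made only for this illustration, and the sketch does no feature scaling and runs no repeated trials.

```kotlin
// Hypothetical 2-D point type, used only for this sketch.
data class Point(val x: Double, val y: Double)

// Squared Euclidean distance between two points.
fun squaredDistance(a: Point, b: Point): Double {
    val dx = a.x - b.x
    val dy = a.y - b.y
    return dx * dx + dy * dy
}

// Index of the centroid closest to the given point.
fun nearestCentroid(point: Point, centroids: List<Point>): Int {
    var best = 0
    for (i in centroids.indices) {
        if (squaredDistance(point, centroids[i]) < squaredDistance(point, centroids[best])) best = i
    }
    return best
}

// Plain k-means (Lloyd's algorithm). The first k points seed the centroids,
// which keeps the sketch deterministic; real implementations usually seed randomly.
fun kMeans(points: List<Point>, k: Int, maxIterations: Int = 100): List<List<Point>> {
    require(k in 1..points.size) { "k must be between 1 and the number of points" }
    var centroids = points.take(k)
    for (iteration in 1..maxIterations) {
        // Assignment step: group points by their nearest centroid.
        val groups = points.groupBy { nearestCentroid(it, centroids) }
        // Update step: move each centroid to the mean of its group (keep it if the group is empty).
        val updated = centroids.mapIndexed { i, old ->
            val members = groups[i]
            if (members == null) old
            else Point(members.map { it.x }.average(), members.map { it.y }.average())
        }
        if (updated == centroids) break // converged: the centroids stopped moving
        centroids = updated
    }
    // Final clusters: one list of points per centroid that received points, ordered by centroid index.
    return points.groupBy { nearestCentroid(it, centroids) }.toSortedMap().values.toList()
}

fun main() {
    val points = listOf(Point(1.0, 1.0), Point(8.0, 8.0), Point(1.2, 0.8), Point(8.2, 7.9))
    kMeans(points, k = 2).forEachIndexed { index, cluster ->
        println("CLUSTER $index: $cluster")
    }
}
```

In the patient example the two coordinates would be age and white blood cell count, as selected by xSelector and ySelector.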

Configuring the built-in libraries via magics

For a more straightforward setup, the Kotlin kernel pre-configures certain libraries and lets the notebook user load them via special commands, also known as magics. To load pre-configured libraries into a notebook, list their names, separated by commas, after %use. Here's how it works:

In [6]:
%use kotlin-statistics

When such a cell is executed, the kernel makes sure that the corresponding Maven repo is configured, the library is loaded, the necessary import statements are added (e.g. in this case import org.nield.kotlinstatistics.* won't be needed), and the necessary renderers are configured. The libraries currently supported by %use include kotlin-statistics, klaxon, krangl, kravis, and lets-plot.
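
As a rough point of comparison, the manual steps that %use kotlin-statistics stands in for look like the cells from the Maven example earlier in this notebook. The sketch below reuses the repository URL and artifact coordinates from those cells; the version that the magic actually resolves may differ, and renderer configuration has no manual counterpart here.

```kotlin
// First cell: configure the Maven repository and declare the dependency
// (coordinates copied from the earlier Maven example).
@file:Repository("https://repo1.maven.org/maven2")
@file:DependsOn("org.nield:kotlin-statistics:1.2.1")

// Following cell: the import that the magic would otherwise add automatically.
import org.nield.kotlinstatistics.*
```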

Here's another example, showcasing the krangl and lets-plot libraries:

In [7]:
%use lets-plot, krangl
In [8]:
val df = DataFrame.readCSV("data/iris.csv")
df.head()
Out[8]:

sepal_length  sepal_width  petal_length  petal_width  species
5.1           3.5          1.4           0.2          Iris-setosa
4.9           3.0          1.4           0.2          Iris-setosa
4.7           3.2          1.3           0.2          Iris-setosa
4.6           3.1          1.5           0.2          Iris-setosa
5.0           3.6          1.4           0.2          Iris-setosa
In [9]:
df.groupBy("species").count()
Out[9]:

species          n
Iris-setosa      50
Iris-versicolor  50
Iris-virginica   50
In [10]:
val points = geom_point(
    data = mapOf(
        "x" to df["sepal_length"].asDoubles().toList(),
        "y" to df["sepal_width"].asDoubles().toList(),
        "color" to df["species"].asStrings().toList()
        
    ), alpha=1.0)
{
    x = "x" 
    y = "y"
    color = "color"
}

ggplot() + points
Out[10]:

[Plot output: scatter plot of sepal_length (x) against sepal_width (y), colored by species]

Useful libraries

  • kotlin-statistics is a library that provides a set of extension functions to perform exploratory and production statistics. It supports basic numeric list/sequence/array functions (from sum to skewness), slicing operators (e.g. countBy, simpleRegressionBy, etc.), binning operations, discrete PDF sampling, a naive Bayes classifier, clustering, linear regression, and more.
  • kmath is a library inspired by numpy; this library supports algebraic structures and operations, array-like structures, math expressions, histograms, streaming operations, wrappers around commons-math and koma, and more.
  • krangl is a library inspired by R's dplyr and Python's pandas; this library provides functionality for data manipulation using a functional-style API; it lets you filter, transform, aggregate, and reshape tabular data.
  • lets-plot is a library for declaratively creating plots based on tabular data; it is inspired by Python's ggplot and The Grammar of Graphics; this library is integrated tightly with the Kotlin kernel; the library is multi-platform and can be used not just with the JVM but also from JS and Python.
  • kravis is another library inspired by Python's ggplot for visualization of tabular data.
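
To illustrate what a slicing operator such as countBy does conceptually, here is a minimal sketch that uses only the Kotlin standard library. The Flower type and the countByKey and averageByKey helpers are hypothetical names introduced for this illustration; they are not part of kotlin-statistics or krangl.

```kotlin
// Hypothetical record type, used only for this sketch.
data class Flower(val species: String, val sepalLength: Double)

// Count items per key, similar in spirit to a countBy-style slicing operator.
fun <T, K> countByKey(items: List<T>, keySelector: (T) -> K): Map<K, Int> =
    items.groupingBy(keySelector).eachCount()

// Average a numeric value per key, built from groupBy and average.
fun <T, K> averageByKey(
    items: List<T>,
    keySelector: (T) -> K,
    valueSelector: (T) -> Double
): Map<K, Double> =
    items.groupBy(keySelector).mapValues { (_, group) -> group.map(valueSelector).average() }

fun main() {
    val flowers = listOf(
        Flower("setosa", 5.0),
        Flower("setosa", 5.5),
        Flower("virginica", 6.5)
    )
    println(countByKey(flowers) { it.species })                        // {setosa=2, virginica=1}
    println(averageByKey(flowers, { it.species }, { it.sepalLength })) // {setosa=5.25, virginica=6.5}
}
```

The libraries listed above offer such operations as ready-made extension functions, so a notebook would normally call those instead of hand-rolling helpers like these.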

Documentation and contribution

The kernel's source code along with documentation is available on GitHub.

The community has already started adopting Kotlin for data science, and this adoption is growing. We highly recommend watching a talk by Holger Brandl (the creator of krangl, a Kotlin analog of Python’s pandas) or another talk by Thomas Nield (the creator of kotlin-statistics), or reading his article.
