This notebook shows how to use Kotlin with Jupyter notebooks.
Currently, the Kotlin Jupyter kernel can be installed only via conda:
conda install kotlin-jupyter-kernel -c jetbrains
Later it will also be possible to install it via pip install.
Note that the Kotlin Jupyter kernel requires Java 8 to be installed:
apt-get install openjdk-8-jre
Once these requirements are satisfied, run jupyter notebook and switch to the Kotlin kernel.
Here's a simple example with Kotlin code:
class Greeter(val name: String) {
    fun greet() {
        println("Hello, $name!")
    }
}
Greeter("Jupyter").greet() // Run me
Hello, Jupyter!
Here's another example, courtesy of thomasnield/kotlin-statistics, showing how to load additional dependencies into the notebook from Maven repos:
@file:Repository("https://repo1.maven.org/maven2")
@file:DependsOn("org.nield:kotlin-statistics:1.2.1")
import java.time.LocalDate
import java.time.temporal.ChronoUnit
import org.nield.kotlinstatistics.*
data class Patient(val firstName: String,
                   val lastName: String,
                   val gender: Gender,
                   val birthday: LocalDate,
                   val whiteBloodCellCount: Int) {
    val age = ChronoUnit.YEARS.between(birthday, LocalDate.now())
}
val patients = listOf(
    Patient("John", "Simone", Gender.MALE, LocalDate.of(1989, 1, 7), 4500),
    Patient("Sarah", "Marley", Gender.FEMALE, LocalDate.of(1970, 2, 5), 6700),
    Patient("Jessica", "Arnold", Gender.FEMALE, LocalDate.of(1980, 3, 9), 3400),
    Patient("Sam", "Beasley", Gender.MALE, LocalDate.of(1981, 4, 17), 8800),
    Patient("Dan", "Forney", Gender.MALE, LocalDate.of(1985, 9, 13), 5400),
    Patient("Lauren", "Michaels", Gender.FEMALE, LocalDate.of(1975, 8, 21), 5000),
    Patient("Michael", "Erlich", Gender.MALE, LocalDate.of(1985, 12, 17), 4100),
    Patient("Jason", "Miles", Gender.MALE, LocalDate.of(1991, 11, 1), 3900),
    Patient("Rebekah", "Earley", Gender.FEMALE, LocalDate.of(1985, 2, 18), 4600),
    Patient("James", "Larson", Gender.MALE, LocalDate.of(1974, 4, 10), 5100),
    Patient("Dan", "Ulrech", Gender.MALE, LocalDate.of(1991, 7, 11), 6000),
    Patient("Heather", "Eisner", Gender.FEMALE, LocalDate.of(1994, 3, 6), 6000),
    Patient("Jasper", "Martin", Gender.MALE, LocalDate.of(1971, 7, 1), 6000)
)
enum class Gender {
    MALE,
    FEMALE
}
val clusters = patients.multiKMeansCluster(k = 3,
    maxIterations = 10000,
    trialCount = 50,
    xSelector = { it.age.toDouble() },
    ySelector = { it.whiteBloodCellCount.toDouble() }
)
clusters.forEachIndexed { index, item ->
    println("CENTROID: $index")
    item.points.forEach {
        println("\t$it")
    }
}
CENTROID: 0
    Patient(firstName=Dan, lastName=Forney, gender=MALE, birthday=1985-09-13, whiteBloodCellCount=5400)
    Patient(firstName=Lauren, lastName=Michaels, gender=FEMALE, birthday=1975-08-21, whiteBloodCellCount=5000)
    Patient(firstName=James, lastName=Larson, gender=MALE, birthday=1974-04-10, whiteBloodCellCount=5100)
    Patient(firstName=Dan, lastName=Ulrech, gender=MALE, birthday=1991-07-11, whiteBloodCellCount=6000)
    Patient(firstName=Heather, lastName=Eisner, gender=FEMALE, birthday=1994-03-06, whiteBloodCellCount=6000)
    Patient(firstName=Jasper, lastName=Martin, gender=MALE, birthday=1971-07-01, whiteBloodCellCount=6000)
CENTROID: 1
    Patient(firstName=John, lastName=Simone, gender=MALE, birthday=1989-01-07, whiteBloodCellCount=4500)
    Patient(firstName=Jessica, lastName=Arnold, gender=FEMALE, birthday=1980-03-09, whiteBloodCellCount=3400)
    Patient(firstName=Michael, lastName=Erlich, gender=MALE, birthday=1985-12-17, whiteBloodCellCount=4100)
    Patient(firstName=Jason, lastName=Miles, gender=MALE, birthday=1991-11-01, whiteBloodCellCount=3900)
    Patient(firstName=Rebekah, lastName=Earley, gender=FEMALE, birthday=1985-02-18, whiteBloodCellCount=4600)
CENTROID: 2
    Patient(firstName=Sarah, lastName=Marley, gender=FEMALE, birthday=1970-02-05, whiteBloodCellCount=6700)
    Patient(firstName=Sam, lastName=Beasley, gender=MALE, birthday=1981-04-17, whiteBloodCellCount=8800)
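To give a rough idea of what a k-means clustering call computes, here is a minimal plain-Kotlin sketch of a single k-means run (Lloyd's algorithm). It is not the kotlin-statistics implementation: the Point type, the kMeans function, and the sample coordinates are hypothetical, the first k points are used as initial centroids purely for determinism, and it makes no attempt to reproduce the multi-trial behavior suggested by the trialCount parameter above.

```kotlin
import kotlin.math.pow

// Hypothetical 2D point type, used only for this sketch.
data class Point(val x: Double, val y: Double)

fun squaredDistance(a: Point, b: Point): Double =
    (a.x - b.x).pow(2) + (a.y - b.y).pow(2)

// One run of Lloyd's k-means. Assumption: the first k points serve as the
// initial centroids (a simplification, not what the library does).
fun kMeans(points: List<Point>, k: Int, maxIterations: Int = 100): List<List<Point>> {
    require(k in 1..points.size) { "k must be between 1 and the number of points" }
    var centroids = points.take(k)
    var clusters: List<List<Point>> = emptyList()
    repeat(maxIterations) {
        // Assignment step: attach each point to its nearest centroid.
        val grouped = points.groupBy { p ->
            centroids.indices.minByOrNull { i -> squaredDistance(p, centroids[i]) }!!
        }
        clusters = centroids.indices.map { grouped[it] ?: emptyList() }
        // Update step: move each centroid to the mean of its members.
        val updated = clusters.mapIndexed { i, members ->
            if (members.isEmpty()) centroids[i]
            else Point(members.map { it.x }.average(), members.map { it.y }.average())
        }
        // Stop as soon as the centroids no longer move.
        if (updated == centroids) return clusters
        centroids = updated
    }
    return clusters
}

fun main() {
    // Made-up sample data: one group near the origin, one far from it.
    val points = listOf(
        Point(1.0, 1.0), Point(1.5, 2.0), Point(1.0, 0.5),
        Point(8.0, 8.0), Point(9.0, 9.5), Point(8.5, 9.0)
    )
    kMeans(points, k = 2).forEachIndexed { index, cluster ->
        println("CENTROID: $index")
        cluster.forEach { println("\t$it") }
    }
}
```

On this sample the sketch separates the three points near the origin from the three points far from it. A library routine with random initialization and multiple trials is less sensitive to the choice of starting centroids than this fixed-start version.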
For a more straightforward approach, the Kotlin kernel pre-configures certain libraries and lets the notebook user load them via special commands, also known as magics. To load pre-configured libraries into a notebook, list their names, comma-separated, after %use. Here's how it works:
%use kotlin-statistics
When such a cell is executed, the kernel makes sure that the corresponding Maven repo is configured, the library is loaded, the necessary import statements are added (so in this case import org.nield.kotlinstatistics.* is not needed), and the necessary renderers are configured. The libraries currently supported by %use include kotlin-statistics, klaxon, krangl, kravis, and lets-plot.
%use lets-plot, krangl
val df = DataFrame.readCSV("data/iris.csv")
df.head()
sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
df.groupBy("species").count()
species | n |
---|---|
Iris-setosa | 50 |
Iris-versicolor | 50 |
Iris-virginica | 50 |
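For readers who want to see the same grouping idea without any data-frame library, the following is a small sketch using only the Kotlin standard library. The Iris row type, the countBySpecies helper, and the sample rows are hypothetical and exist only for illustration; the sketch is not how krangl implements groupBy and count.

```kotlin
// Hypothetical row type standing in for one line of the iris dataset.
data class Iris(val sepalLength: Double, val species: String)

// Count rows per species, analogous to grouping by a column and counting.
fun countBySpecies(rows: List<Iris>): Map<String, Int> =
    rows.groupingBy { it.species }.eachCount()

fun main() {
    // Illustrative sample rows, not the full dataset.
    val rows = listOf(
        Iris(5.1, "Iris-setosa"),
        Iris(4.9, "Iris-setosa"),
        Iris(7.0, "Iris-versicolor"),
        Iris(6.3, "Iris-virginica")
    )
    countBySpecies(rows).forEach { (species, n) -> println("$species | $n |") }
}
```

groupingBy with eachCount avoids building the intermediate lists that groupBy would create, which is why it is used here for a pure count.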
val points = geom_point(
    data = mapOf(
        "x" to df["sepal_length"].asDoubles().toList(),
        "y" to df["sepal_width"].asDoubles().toList(),
        "color" to df["species"].asStrings().toList()
    ), alpha=1.0)
{
    x = "x"
    y = "y"
    color = "color"
}
ggplot() + points
- kotlin-statistics provides statistical functions (from sum to skewness), slicing operators (e.g. countBy, simpleRegressionBy, etc.), binning operations, discrete PDF sampling, a naive Bayes classifier, clustering, linear regression, and more.
- A library inspired by numpy; this library supports algebraic structures and operations, array-like structures, math expressions, histograms, streaming operations, wrappers around commons-math and koma, and more.
- krangl is inspired by R's dplyr and Python's pandas; this library provides functionality for data manipulation using a functional-style API; it lets you filter, transform, aggregate, and reshape tabular data.
- lets-plot is inspired by ggplot and The Grammar of Graphics; this library is tightly integrated with the Kotlin kernel; the library is multi-platform and can be used not just with the JVM but also from JS and Python.
- kravis is another library inspired by ggplot for visualization of tabular data.

The kernel's source code along with documentation is available on GitHub.
The community has already started adopting Kotlin for data science, and this adoption is only growing. We highly recommend watching a talk by Holger Brandl (the creator of krangl, Kotlin's analog of Python's pandas) or another talk by Thomas Nield (the creator of kotlin-statistics), or reading his article.