To start working with Kotlin DataFrame in a notebook, run a cell with the following code:
%useLatestDescriptors
%use dataframe
This will load all necessary DataFrame dependencies (of the latest stable version) and all imports, as well as DataFrame rendering. Learn more here.
Kotlin DataFrame supports all popular data formats, including CSV, JSON, and Excel, as well as reading from various databases. Read a CSV file with the "JetBrains Repositories" dataset into the df variable:
val df = DataFrame.readCsv(
    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
)
To display your dataframe as a cell output, place it in the last line of the cell:
df
full_name | html_url | stargazers_count | topics | watchers |
---|---|---|---|---|
JetBrains/JPS | https://github.com/JetBrains/JPS | 23 | [] | 23 |
JetBrains/YouTrackSharp | https://github.com/JetBrains/YouTrack... | 115 | [jetbrains, jetbrains-youtrack, youtr... | 115 |
JetBrains/colorSchemeTool | https://github.com/JetBrains/colorSch... | 290 | [] | 290 |
JetBrains/ideavim | https://github.com/JetBrains/ideavim | 6120 | [ideavim, intellij, intellij-platform... | 6120 |
JetBrains/youtrack-vcs-hooks | https://github.com/JetBrains/youtrack... | 5 | [] | 5 |
JetBrains/youtrack-rest-ruby-library | https://github.com/JetBrains/youtrack... | 8 | [] | 8 |
JetBrains/emacs4ij | https://github.com/JetBrains/emacs4ij | 47 | [] | 47 |
JetBrains/codereview4intellij | https://github.com/JetBrains/coderevi... | 11 | [] | 11 |
JetBrains/teamcity-nuget-support | https://github.com/JetBrains/teamcity... | 41 | [nuget, nuget-feed, teamcity, teamcit... | 41 |
JetBrains/Grammar-Kit | https://github.com/JetBrains/Grammar-Kit | 534 | [] | 534 |
JetBrains/intellij-starteam-plugin | https://github.com/JetBrains/intellij... | 6 | [] | 6 |
JetBrains/la-clojure | https://github.com/JetBrains/la-clojure | 218 | [] | 218 |
JetBrains/MPS | https://github.com/JetBrains/MPS | 1241 | [domain-specific-language, dsl] | 1241 |
JetBrains/intellij-community | https://github.com/JetBrains/intellij... | 12926 | [code-editor, ide, intellij, intellij... | 12926 |
JetBrains/TeamCity.ServiceMessages | https://github.com/JetBrains/TeamCity... | 39 | [c-sharp, teamcity, teamcity-service-... | 39 |
JetBrains/youtrack-rest-python-library | https://github.com/JetBrains/youtrack... | 118 | [] | 118 |
JetBrains/intellij-scala | https://github.com/JetBrains/intellij... | 1066 | [intellij-idea, intellij-plugin, scala] | 1066 |
JetBrains/teamcity-messages | https://github.com/JetBrains/teamcity... | 125 | [] | 125 |
JetBrains/teamcity-cpp | https://github.com/JetBrains/teamcity... | 27 | [] | 27 |
JetBrains/kotlin | https://github.com/JetBrains/kotlin | 39402 | [compiler, gradle-plugin, intellij-pl... | 39402 |
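If the remote CSV is not reachable, a small frame built in code is enough to try out the operations in this guide. The sketch below assumes the DataFrame library is already available (for example, after `%use dataframe`); the variable name `sample` and the choice of three rows from the table above are illustrative and not part of the original guide.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Column names first, then the values listed row by row.
val sample = dataFrameOf("full_name", "stargazers_count", "topics")(
    "JetBrains/MPS", 1241, "[domain-specific-language, dsl]",
    "JetBrains/Grammar-Kit", 534, "[]",
    "JetBrains/la-clojure", 218, "[]",
)

println(sample.rowsCount())    // number of rows
println(sample.columnNames())  // column names in declaration order
```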
Kotlin Notebook has special interactive outputs for DataFrame. Learn more about them here.
Use the .describe() method to get a dataset summary: column types, number of nulls, and simple statistics.
df.describe()
name | type | count | unique | nulls | top | freq | mean | std | min | p25 | median | p75 | max |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
full_name | String | 562 | 562 | 0 | JetBrains/JPS | 1 | null | null | JetBrains/Android-Tuts-Samples | JetBrains/eslint-config | JetBrains/lightbeam | JetBrains/teamcity-bitbucket-issues | JetBrains/ztools |
html_url | URL | 562 | 562 | 0 | https://github.com/JetBrains/JPS | 1 | null | null | null | null | null | null | null |
stargazers_count | Int | 562 | 165 | 0 | 1 | 100 | 244.759786 | 1862.801982 | 0 | 2.000000 | 8.000000 | 48.000000 | 39402 |
topics | String | 562 | 145 | 0 | [] | 401 | null | null | [2d, graphics, java, skia] | [] | [] | [awt, swing] | [youtrack, youtrack-workflow] |
watchers | Int | 562 | 165 | 0 | 1 | 100 | 244.759786 | 1862.801982 | 0 | 2.000000 | 8.000000 | 48.000000 | 39402 |
Kotlin DataFrame features a typesafe Columns Selection DSL, enabling flexible and safe selection of any combination of columns.
Column selectors are widely used across operations. One of the simplest examples is .select { }, which returns a new DataFrame with only the columns chosen in the Columns Selection expression.
After executing the cell where a DataFrame variable is declared, an extension with properties for its columns is automatically generated.
These properties can then be used in the Columns Selection DSL expression for typesafe and convenient column access.
Select some columns:
// Select "full_name", "stargazers_count" and "topics" columns
val dfSelected = df.select { full_name and stargazers_count and topics }
dfSelected
full_name | stargazers_count | topics |
---|---|---|
JetBrains/JPS | 23 | [] |
JetBrains/YouTrackSharp | 115 | [jetbrains, jetbrains-youtrack, youtr... |
JetBrains/colorSchemeTool | 290 | [] |
JetBrains/ideavim | 6120 | [ideavim, intellij, intellij-platform... |
JetBrains/youtrack-vcs-hooks | 5 | [] |
JetBrains/youtrack-rest-ruby-library | 8 | [] |
JetBrains/emacs4ij | 47 | [] |
JetBrains/codereview4intellij | 11 | [] |
JetBrains/teamcity-nuget-support | 41 | [nuget, nuget-feed, teamcity, teamcit... |
JetBrains/Grammar-Kit | 534 | [] |
JetBrains/intellij-starteam-plugin | 6 | [] |
JetBrains/la-clojure | 218 | [] |
JetBrains/MPS | 1241 | [domain-specific-language, dsl] |
JetBrains/intellij-community | 12926 | [code-editor, ide, intellij, intellij... |
JetBrains/TeamCity.ServiceMessages | 39 | [c-sharp, teamcity, teamcity-service-... |
JetBrains/youtrack-rest-python-library | 118 | [] |
JetBrains/intellij-scala | 1066 | [intellij-idea, intellij-plugin, scala] |
JetBrains/teamcity-messages | 125 | [] |
JetBrains/teamcity-cpp | 27 | [] |
JetBrains/kotlin | 39402 | [compiler, gradle-plugin, intellij-pl... |
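The generated properties become available only after the declaring cell has been executed. As a hedged alternative, columns can also be selected by their names as strings, which does not depend on generated code. In the sketch below, `repos` is a hypothetical two-row stand-in for `df`, and the DataFrame library is assumed to be loaded.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical stand-in for df with a subset of its columns.
val repos = dataFrameOf("full_name", "stargazers_count", "watchers")(
    "JetBrains/MPS", 1241, 1241,
    "JetBrains/Grammar-Kit", 534, 534,
)

// Select columns by name; no generated properties are needed.
val selectedByName = repos.select("full_name", "stargazers_count")
println(selectedByName.columnNames())
```

The property-based form shown above stays preferable inside a notebook, because a misspelled column name is then caught at compile time rather than at run time.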
Some operations use a RowExpression, i.e., an expression that is evaluated for each DataFrame row.
For example, .filter { } returns a new DataFrame with the rows that satisfy a condition given by a row expression.
Inside a row expression, you can access the values of the current row by column names through auto-generated properties. This is similar to the Columns Selection DSL, but here the properties represent actual values, not column references.
Filter rows by "stargazers_count" value:
// Keep only rows where "stargazers_count" value is at least 1000
val dfFiltered = dfSelected.filter { stargazers_count >= 1000 }
dfFiltered
full_name | stargazers_count | topics |
---|---|---|
JetBrains/ideavim | 6120 | [ideavim, intellij, intellij-platform... |
JetBrains/MPS | 1241 | [domain-specific-language, dsl] |
JetBrains/intellij-community | 12926 | [code-editor, ide, intellij, intellij... |
JetBrains/intellij-scala | 1066 | [intellij-idea, intellij-plugin, scala] |
JetBrains/kotlin | 39402 | [compiler, gradle-plugin, intellij-pl... |
JetBrains/intellij-plugins | 1737 | [] |
JetBrains/Exposed | 5688 | [dao, kotlin, orm, sql] |
JetBrains/kotlin-web-site | 1074 | [kotlin] |
JetBrains/idea-gitignore | 1181 | [gitignore, ignore-files, intellij, i... |
JetBrains/swot | 1072 | [] |
JetBrains/phpstorm-stubs | 1110 | [] |
JetBrains/gradle-intellij-plugin | 1058 | [gradle, gradle-intellij-plugin, grad... |
JetBrains/svg-sprite-loader | 1815 | [sprite, svg, svg-sprite, svg-stack, ... |
JetBrains/resharper-unity | 1017 | [hacktoberfest, jetbrains, plugin, re... |
JetBrains/kotlin-native | 7101 | [c, compiler, kotlin, llvm, objective-c] |
JetBrains/create-react-kotlin-app | 2424 | [create-react-app, jetbrains-ui, kotl... |
JetBrains/ring-ui | 2836 | [components, jetbrains-ui, react] |
JetBrains/kotlinconf-app | 2628 | [] |
JetBrains/JetBrainsMono | 6059 | [coding-font, font, ligatures, monosp... |
JetBrains/intellij-platform-plugin-te... | 1133 | [intellij, intellij-idea, intellij-id... |
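To make the boundary of the condition concrete, here is a minimal sketch of the same filter on a tiny frame, written with string column names instead of generated properties. The frame `counts` and its repository names are hypothetical; the star counts are chosen to sit around the threshold.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical rows with values just below, at, and above the threshold.
val counts = dataFrameOf("full_name", "stargazers_count")(
    "example/below", 999,
    "example/exact", 1000,
    "example/above", 1001,
)

// "stargazers_count"<Int>() reads the current row's value as an Int.
val popular = counts.filter { "stargazers_count"<Int>() >= 1000 }
println(popular["full_name"].toList())
```

Because the comparison is `>=`, the row with exactly 1000 stars is kept; with `>` it would be dropped.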
Columns can be renamed using the .rename { } operation, which also uses the Columns Selection DSL to select a column to rename.
The rename operation does not perform the renaming immediately; instead, it creates an intermediate object that must be finalized into a new DataFrame by calling the .into() function with the new column name.
Rename "full_name" and "stargazers_count" columns:
// Rename "full_name" column into "name"
val dfRenamed = dfFiltered
    .rename { full_name }.into("name")
    // And "stargazers_count" into "starsCount"
    .rename { stargazers_count }.into("starsCount")
dfRenamed
name | starsCount | topics |
---|---|---|
JetBrains/ideavim | 6120 | [ideavim, intellij, intellij-platform... |
JetBrains/MPS | 1241 | [domain-specific-language, dsl] |
JetBrains/intellij-community | 12926 | [code-editor, ide, intellij, intellij... |
JetBrains/intellij-scala | 1066 | [intellij-idea, intellij-plugin, scala] |
JetBrains/kotlin | 39402 | [compiler, gradle-plugin, intellij-pl... |
JetBrains/intellij-plugins | 1737 | [] |
JetBrains/Exposed | 5688 | [dao, kotlin, orm, sql] |
JetBrains/kotlin-web-site | 1074 | [kotlin] |
JetBrains/idea-gitignore | 1181 | [gitignore, ignore-files, intellij, i... |
JetBrains/swot | 1072 | [] |
JetBrains/phpstorm-stubs | 1110 | [] |
JetBrains/gradle-intellij-plugin | 1058 | [gradle, gradle-intellij-plugin, grad... |
JetBrains/svg-sprite-loader | 1815 | [sprite, svg, svg-sprite, svg-stack, ... |
JetBrains/resharper-unity | 1017 | [hacktoberfest, jetbrains, plugin, re... |
JetBrains/kotlin-native | 7101 | [c, compiler, kotlin, llvm, objective-c] |
JetBrains/create-react-kotlin-app | 2424 | [create-react-app, jetbrains-ui, kotl... |
JetBrains/ring-ui | 2836 | [components, jetbrains-ui, react] |
JetBrains/kotlinconf-app | 2628 | [] |
JetBrains/JetBrainsMono | 6059 | [coding-font, font, ligatures, monosp... |
JetBrains/intellij-platform-plugin-te... | 1133 | [intellij, intellij-idea, intellij-id... |
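The two-step shape of rename can be seen on a one-row frame. This is a sketch under assumptions: it uses the string-based overload of `rename` rather than the generated properties, and the frame `original` is a hypothetical stand-in for `dfFiltered`.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical one-row stand-in for dfFiltered.
val original = dataFrameOf("full_name", "stargazers_count")(
    "JetBrains/MPS", 1241,
)

// rename(...) alone yields an intermediate clause; into(...) produces the DataFrame.
val renamed = original.rename("full_name").into("name")

println(original.columnNames()) // the source frame keeps its column names
println(renamed.columnNames())
```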
Columns can be modified using the update { } and convert { } operations.
Both operations select columns to modify via the Columns Selection DSL and, similar to rename, create an intermediate object that must be finalized to produce a new DataFrame.
The update operation preserves the original column types, while convert allows changing the type.
In both cases, column names and their positions remain unchanged.
Update "name" and convert "topics":
val dfUpdated = dfRenamed
    // Update "name" values with only its second part (after '/')
    .update { name }.with { it.split("/")[1] }
    // Convert "topics" `String` values into `List<String>` by splitting:
    .convert { topics }.with { it.removeSurrounding("[", "]").split(", ") }
dfUpdated
name | starsCount | topics |
---|---|---|
ideavim | 6120 | [ideavim, intellij, intellij-platform... |
MPS | 1241 | [domain-specific-language, dsl] |
intellij-community | 12926 | [code-editor, ide, intellij, intellij... |
intellij-scala | 1066 | [intellij-idea, intellij-plugin, scala] |
kotlin | 39402 | [compiler, gradle-plugin, intellij-pl... |
intellij-plugins | 1737 | [] |
Exposed | 5688 | [dao, kotlin, orm, sql] |
kotlin-web-site | 1074 | [kotlin] |
idea-gitignore | 1181 | [gitignore, ignore-files, intellij, i... |
swot | 1072 | [] |
phpstorm-stubs | 1110 | [] |
gradle-intellij-plugin | 1058 | [gradle, gradle-intellij-plugin, grad... |
svg-sprite-loader | 1815 | [sprite, svg, svg-sprite, svg-stack, ... |
resharper-unity | 1017 | [hacktoberfest, jetbrains, plugin, re... |
kotlin-native | 7101 | [c, compiler, kotlin, llvm, objective-c] |
create-react-kotlin-app | 2424 | [create-react-app, jetbrains-ui, kotl... |
ring-ui | 2836 | [components, jetbrains-ui, react] |
kotlinconf-app | 2628 | [] |
JetBrainsMono | 6059 | [coding-font, font, ligatures, monosp... |
intellij-platform-plugin-template | 1133 | [intellij, intellij-idea, intellij-id... |
Check the new type of the "topics" column:
dfUpdated.topics.type()
kotlin.collections.List<kotlin.String>
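The parsing expression used in the convert step can be tried on plain strings with the Kotlin standard library alone. The sketch below wraps it in a helper named `parseTopics` (a name introduced here for illustration) and adds a hypothetical variant, `parseTopicsOrEmpty`, because splitting an empty string yields a list with one empty element rather than an empty list.

```kotlin
// Same parsing expression as in the convert step above.
fun parseTopics(raw: String): List<String> =
    raw.removeSurrounding("[", "]").split(", ")

// Hypothetical variant that maps "[]" to an empty list instead.
fun parseTopicsOrEmpty(raw: String): List<String> =
    raw.removeSurrounding("[", "]")
        .split(", ")
        .filter { it.isNotEmpty() }

println(parseTopics("[dao, kotlin, orm, sql]"))  // [dao, kotlin, orm, sql]
println(parseTopics("[]").size)                  // 1: a single empty string
println(parseTopicsOrEmpty("[]").size)           // 0
```

Whether the difference matters depends on later steps: a membership check such as `"intellij" in topics` gives the same answer either way, while counting topics per repository would not.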
The .add { } function creates a DataFrame with a new column, where the value for each row is computed from the existing values in that row. These values can be accessed within the row expression.
Add a new Boolean column "isIntellij":
// Add a `Boolean` column indicating whether the `name` contains the "intellij" substring
// or the topics include "intellij".
val dfWithIsIntellij = dfUpdated.add("isIntellij") {
    name.contains("intellij") || "intellij" in topics
}
dfWithIsIntellij
name | starsCount | topics | isIntellij |
---|---|---|---|
ideavim | 6120 | [ideavim, intellij, intellij-platform... | true |
MPS | 1241 | [domain-specific-language, dsl] | false |
intellij-community | 12926 | [code-editor, ide, intellij, intellij... | true |
intellij-scala | 1066 | [intellij-idea, intellij-plugin, scala] | true |
kotlin | 39402 | [compiler, gradle-plugin, intellij-pl... | false |
intellij-plugins | 1737 | [] | true |
Exposed | 5688 | [dao, kotlin, orm, sql] | false |
kotlin-web-site | 1074 | [kotlin] | false |
idea-gitignore | 1181 | [gitignore, ignore-files, intellij, i... | true |
swot | 1072 | [] | false |
phpstorm-stubs | 1110 | [] | false |
gradle-intellij-plugin | 1058 | [gradle, gradle-intellij-plugin, grad... | true |
svg-sprite-loader | 1815 | [sprite, svg, svg-sprite, svg-stack, ... | false |
resharper-unity | 1017 | [hacktoberfest, jetbrains, plugin, re... | false |
kotlin-native | 7101 | [c, compiler, kotlin, llvm, objective-c] | false |
create-react-kotlin-app | 2424 | [create-react-app, jetbrains-ui, kotl... | false |
ring-ui | 2836 | [components, jetbrains-ui, react] | false |
kotlinconf-app | 2628 | [] | false |
JetBrainsMono | 6059 | [coding-font, font, ligatures, monosp... | false |
intellij-platform-plugin-template | 1133 | [intellij, intellij-idea, intellij-id... | true |
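The condition inside the row expression is ordinary Kotlin, so its behavior can be checked in isolation. The sketch below restates it as a standalone function; the function form and the second and third inputs are illustrative assumptions, while the first input is taken from the table above.

```kotlin
// The row expression from the add step, written as a plain function.
fun isIntellij(name: String, topics: List<String>): Boolean =
    name.contains("intellij") || "intellij" in topics

// Matches through the name.
println(isIntellij("intellij-scala", listOf("intellij-idea", "intellij-plugin", "scala")))  // true
// Hypothetical name: `in` looks for an element equal to "intellij", so "intellij-idea" does not count.
println(isIntellij("some-plugin", listOf("intellij-idea", "intellij-plugin")))              // false
// contains is case-sensitive, so a hypothetical "IntelliJ-Tools" does not match either.
println(isIntellij("IntelliJ-Tools", emptyList()))                                           // false
```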
A DataFrame can be grouped by column keys, meaning its rows are split into groups based on the values in the key columns.
The .groupBy { } operation selects columns and groups the DataFrame by their values, using them as grouping keys.
The result is a GroupBy, a DataFrame-like structure that associates each key with the corresponding subset of the original DataFrame.
Group dfWithIsIntellij by "isIntellij":
val groupedByIsIntellij = dfWithIsIntellij.groupBy { isIntellij }
groupedByIsIntellij
isIntellij | group |
---|---|
true | DataFrame [7 x 4] ... showing only top 5 of 7 rows |
false | DataFrame [17 x 4] ... showing only top 5 of 17 rows |
A GroupBy can be aggregated, that is, you can compute one or several summary statistics for each group.
The result of the aggregation is a DataFrame containing the key columns along with new columns holding the computed statistics for the corresponding group.
For example, count() computes the size of each group:
groupedByIsIntellij.count()
isIntellij | count |
---|---|
true | 7 |
false | 17 |
Compute several statistics with .aggregate { }, which provides a DSL for aggregating:
groupedByIsIntellij.aggregate {
    // Compute sum and max of "starsCount" within each group into "sumStars" and "maxStars" columns
    sumOf { starsCount } into "sumStars"
    maxOf { starsCount } into "maxStars"
}
isIntellij | sumStars | maxStars |
---|---|---|
true | 25221 | 12926 |
false | 85392 | 39402 |
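For readers more used to Kotlin collections, the grouping and aggregation above computes roughly what the following standard-library code does. This is an analogy, not how DataFrame implements it; the `Repo` and `Stats` classes are hypothetical, and only four rows from the table above are used, so the numbers differ from the full result.

```kotlin
data class Repo(val name: String, val starsCount: Int, val isIntellij: Boolean)
data class Stats(val sumStars: Int, val maxStars: Int)

// Four rows copied from the table above.
val repoList = listOf(
    Repo("ideavim", 6120, true),
    Repo("MPS", 1241, false),
    Repo("intellij-community", 12926, true),
    Repo("kotlin", 39402, false),
)

// Split rows by the key, then compute one Stats value per group.
val statsByKey: Map<Boolean, Stats> = repoList
    .groupBy { it.isIntellij }
    .mapValues { (_, group) ->
        Stats(
            sumStars = group.sumOf { it.starsCount },
            maxStars = group.maxOf { it.starsCount },
        )
    }

println(statsByKey)
```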
.sortBy { } and .sortByDesc { } sort rows by the values in the selected columns, returning a new DataFrame.
take(n) returns a new DataFrame with the first n rows.
Combine them to get Top-10 repositories by number of stars:
val dfTop10 = dfWithIsIntellij
    // Sort by "starsCount" value descending
    .sortByDesc { starsCount }
    .take(10)
dfTop10
name | starsCount | topics | isIntellij |
---|---|---|---|
kotlin | 39402 | [compiler, gradle-plugin, intellij-pl... | false |
intellij-community | 12926 | [code-editor, ide, intellij, intellij... | true |
kotlin-native | 7101 | [c, compiler, kotlin, llvm, objective-c] | false |
compose-jb | 6805 | [android, awt, compose, declarative-u... | false |
ideavim | 6120 | [ideavim, intellij, intellij-platform... | true |
JetBrainsMono | 6059 | [coding-font, font, ligatures, monosp... | false |
Exposed | 5688 | [dao, kotlin, orm, sql] | false |
ring-ui | 2836 | [components, jetbrains-ui, react] | false |
kotlinconf-app | 2628 | [] | false |
create-react-kotlin-app | 2424 | [create-react-app, jetbrains-ui, kotl... | false |
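The same sort-then-take combination can be checked on a frame small enough to verify by eye. The sketch assumes the DataFrame library is loaded and uses string column names; the frame `stars` is a hypothetical three-row excerpt of the data above, deliberately listed out of order.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Three rows copied from the tables above, in arbitrary order.
val stars = dataFrameOf("name", "starsCount")(
    "MPS", 1241,
    "kotlin", 39402,
    "ideavim", 6120,
)

// Sort descending by "starsCount", then keep the first two rows.
val top2 = stars.sortByDesc("starsCount").take(2)
println(top2["name"].toList())
```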
Kandy is a Kotlin plotting library designed to bring Kotlin DataFrame features into chart creation, providing a convenient and typesafe way to build data visualizations.
Kandy can be loaded into a notebook using %use kandy:
%use kandy
Build a simple bar chart with the .plot { } extension for DataFrame, which lets you use extension properties inside the Kandy plotting DSL (the plot is rendered as output after the cell is executed):
dfTop10.plot {
    bars {
        x(name)
        y(starsCount)
    }
    layout.title = "Top 10 JetBrains repositories by stars count"
}
DataFrame supports writing to (almost) all formats that it is capable of reading.
Write to Excel:
dfWithIsIntellij.writeExcel("jb_repos.xlsx")
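Writing and reading can be combined into a quick round-trip check. The sketch below uses CSV rather than Excel and assumes that `writeCsv` and `readCsv` accept a `java.io.File`; the frame `toSave` and the temporary file are hypothetical and only serve the illustration.

```kotlin
import java.io.File
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.readCsv
import org.jetbrains.kotlinx.dataframe.io.writeCsv

// Two rows copied from the tables above.
val toSave = dataFrameOf("name", "starsCount")(
    "kotlin", 39402,
    "ideavim", 6120,
)

// A temporary file keeps the working directory clean.
val csvFile = File.createTempFile("jb_repos", ".csv")
toSave.writeCsv(csvFile)

// Read the file back and compare the basic shape with the original.
val restored = DataFrame.readCsv(csvFile)
println(restored.rowsCount())
println(restored.columnNames())
csvFile.delete()
```

A round trip like this is a cheap way to confirm that column names and row counts survive the chosen format before relying on the exported file elsewhere.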