A picture is worth a thousand words. In machine learning, we usually handle high-dimensional data, which is impossible to draw on display directly. But a variety of statistical plots are tremendously valuable for us to grasp the characteristics of many data points. Smile provides data visualization tools such as plots and maps for researchers to understand information more easily and quickly.
With smile.plot.vega
package, we can create a specification that describes visualizations as mappings from data to properties of graphical marks (e.g., points or bars). The specification is based on Vega-Lite. The Vega-Lite compiler automatically produces visualization components including axes, legends, and scales. It then determines properties of these components based on a set of carefully designed rules.
This approach allows specifications to be succinct and expressive, but also provide user control. As Vega-Lite is designed for analysis, it supports data transformations such as aggregation, binning, filtering, sorting, and visual transformations including stacking and faceting. Moreover, Vega-Lite specifications can be composed into layered and multi-view displays, and made interactive with selections.
import $ivy.`com.github.haifengl::smile-scala:2.6.0`
import $ivy.`org.slf4j:slf4j-simple:1.7.30`
import java.lang.Math._
import smile._
import smile.json._
import smile.plot.vega._
import smile.plot.show
import smile.plot.Render._
A bar chart encodes quantitative values as the extent of rectangular bars.
val bar = VegaLite.view().
mark("bar").
heightStep(17).
x(field = "a", `type` = "ordinal", axis = JsObject("labelAngel" -> JsInt(0))).
y(field = "b", `type` = "quantitative").
data(jsan"""
[
{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
{"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53},
{"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52}
]"""
).
description("A simple bar chart with embedded data.")
show(bar)
A bar chart showing the US population distribution of age groups in 2000.
val aggregateBar = VegaLite("https://vega.github.io/vega-lite/examples/data/population.json").
mark("bar").
heightStep(17).
x(field = "people", `type` = "quantitative", aggregate = "sum", title = "population").
y(field = "age", `type` = "ordinal").
transform(json"""{"filter": "datum.year == 2000"}""").
description("A bar chart showing the US population distribution of age groups in 2000.")
show(aggregateBar)
A bar chart that sorts the y-values by the x-values.
val sortedAggregateBar = VegaLite("https://vega.github.io/vega-lite/examples/data/population.json").
mark("bar").
heightStep(17).
x(field = "people", `type` = "quantitative", aggregate = "sum", title = "population").
y(field = "age", `type` = "ordinal", sort = Some("-x")).
transform(json"""{"filter": "datum.year == 2000"}""").
description("A bar chart that sorts the y-values by the x-values.")
show(sortedAggregateBar)
val groupedBar = VegaLite.facet("https://vega.github.io/vega-lite/examples/data/population.json").
column(field = "age", `type` = "ordinal", spacing = Some(10)).
mark("bar").
widthStep(12).
x(field = "gender", `type` = "nominal", title = null).
y(field = "people", `type` = "quantitative", aggregate = "sum", axis = json"""{"title": "population", "grid": false}""").
color(field = "gender", `type` = "nominal", scale = json"""{"range": ["#675193", "#ca8861"]}""").
transform(
json"""{"filter": "datum.year == 2000"}""",
json"""{"calculate": "datum.sex == 2 ? 'Female' : 'Male'", "as": "gender"}"""
).
config(json"""{"view": {"stroke": "transparent"}, "axis": {"domainWidth": 1}}""")
show(groupedBar)
val stackedBar = VegaLite("https://vega.github.io/vega-lite/examples/data/seattle-weather.csv").
mark("bar").
x(field = "date", `type` = "ordinal", timeUnit = "month", title = "Month of the year").
y(field = null, aggregate = "count", `type` = "quantitative").
color(field = "weather", `type` = "nominal",
scale = json"""{
"domain": ["sun", "fog", "drizzle", "rain", "snow"],
"range": ["#e7ba52", "#c7c7c7", "#aec7e8", "#1f77b4", "#9467bd"]
}""",
legend = JsObject("title" -> JsString("Weather type"))
)
show(stackedBar)
val stackedRoundedBar = VegaLite("https://vega.github.io/vega-lite/examples/data/seattle-weather.csv").
mark(JsObject("type" -> "bar", "cornerRadiusTopLeft" -> 3, "cornerRadiusTopRight" -> 3)).
x(field = "date", `type` = "ordinal", timeUnit = "month").
y(field = null, aggregate = "count", `type` = "quantitative").
color(field = "weather", `type` = "nominal")
show(stackedRoundedBar)
val horizontalStackedBar = VegaLite("https://vega.github.io/vega-lite/examples/data/barley.json").
mark("bar").
x(field = "yield", `type` = "quantitative", aggregate = "sum").
y(field = "variety", `type` = "nominal").
color(field = "site", `type` = "nominal")
show(horizontalStackedBar)
val normalizedStackedBar = VegaLite("https://vega.github.io/vega-lite/examples/data/population.json").
mark("bar").
widthStep(17).
x(field = "age", `type` = "ordinal").
y(field = "people", `type` = "quantitative", aggregate = "sum", title = "population", stack = "normalize").
color(field = "gender", `type` = "nominal", scale = json"""{"range": ["#675193", "#ca8861"]}""").
transform(
json"""{"filter": "datum.year == 2000"}""",
json"""{"calculate": "datum.sex == 2 ? 'Female' : 'Male'", "as": "gender"}"""
)
show(normalizedStackedBar)
val layeredBar = VegaLite("https://vega.github.io/vega-lite/examples/data/population.json").
mark("bar").
widthStep(17).
x(field = "age", `type` = "ordinal").
y(field = "people", `type` = "quantitative", aggregate = "sum", title = "population", stack = JsNull).
color(field = "gender", `type` = "nominal", scale = json"""{"range": ["#675193", "#ca8861"]}""").
opacity(0.7).
transform(
json"""{"filter": "datum.year == 2000"}""",
json"""{"calculate": "datum.sex == 2 ? 'Female' : 'Male'", "as": "gender"}"""
)
show(layeredBar)
val gantt = VegaLite.view().
mark("bar").
y(field = "task", `type` = "ordinal").
x(field = "start", `type` = "quantitative").
x2(field = "end").
data(jsan"""
[
{"task": "A", "start": 1, "end": 3},
{"task": "B", "start": 3, "end": 8},
{"task": "C", "start": 8, "end": 10}
]"""
)
show(gantt)
val colorBar = VegaLite.view().
mark("bar").
x(field = "color", `type` = "nominal").
y(field = "b", `type` = "quantitative").
color(field = "color", `type` = "nominal", scale = JsNull).
data(jsan"""
[
{"color": "red", "b": 28},
{"color": "green", "b": 55},
{"color": "blue", "b": 43}
]"""
)
show(colorBar)
val histogram = VegaLite("https://vega.github.io/vega-lite/examples/data/movies.json").
mark("bar").
x(field = "IMDB_Rating", `type` = "quantitative", bin = Left(true)).
y(field = null, `type` = "quantitative", aggregate = "count")
show(histogram)
val freqHistogram = VegaLite("https://vega.github.io/vega-lite/examples/data/cars.json").
mark(JsObject("type" -> "bar", "tooltip" -> true)).
x(field = "bin_Horsepwoer", `type` = "quantitative", bin = Right(json"""{"binned":true}"""), title = "Horsepower").
x2(field = "bin_Horsepwoer_end").
y(field = "PercentOfTotal", `type` = "quantitative", title = "Relative Frequency", axis = json"""{"format": ".1~%"}""").
transform(jsan"""[
{
"bin": true, "field": "Horsepower", "as": "bin_Horsepwoer"
},
{
"aggregate": [{"op": "count", "as": "Count"}],
"groupby": ["bin_Horsepwoer", "bin_Horsepwoer_end"]
},
{
"joinaggregate": [{"op": "sum", "field": "Count", "as": "TotalCount"}]
},
{
"calculate": "datum.Count/datum.TotalCount", "as": "PercentOfTotal"
}
]"""
)
show(freqHistogram)
val histHeatmap = VegaLite("https://vega.github.io/vega-lite/examples/data/movies.json").
mark("rect").
x(field = "IMDB_Rating", `type` = "quantitative", bin = Right(json"""{"maxbins":60}"""), title = "IMDB Rating").
y(field = "Rotten_Tomatoes_Rating", `type` = "quantitative", bin = Right(json"""{"maxbins":40}""")).
color(field = null, `type` = "quantitative", aggregate = "count").
width(300).
height(200).
transform(json"""{
"filter": {
"and": [
{"field": "IMDB_Rating", "valid": true},
{"field": "Rotten_Tomatoes_Rating", "valid": true}
]}}"""
).
config(JsObject("view" -> json"""{"stroke": "transparent"}"""))
show(histHeatmap)
val density = VegaLite("https://vega.github.io/vega-lite/examples/data/movies.json").
mark("area").
x(field = "value", `type` = "quantitative", title = "IMDB Rating").
y(field = "density", `type` = "quantitative").
width(400).
height(100).
transform(json"""
{
"density": "IMDB_Rating",
"bandwidth": 0.3
}"""
)
show(density)
val cdf = VegaLite("https://vega.github.io/vega-lite/examples/data/movies.json").
mark("area").
x(field = "IMDB_Rating", `type` = "quantitative").
y(field = "cumulative_count", `type` = "quantitative").
transform(json"""
{
"sort": [{"field": "IMDB_Rating"}],
"window": [{"op": "count", "field": "count", "as": "cumulative_count"}],
"frame": [null, 0]
}"""
)
show(cdf)
val scatter = VegaLite("https://vega.github.io/vega-lite/examples/data/cars.json").
mark("point").
x(field = "Horsepower", `type` = "quantitative").
y(field = "Miles_per_Gallon", `type` = "quantitative").
color(field = "Origin", `type` = "nominal").
shape(field = "Origin", `type` = "nominal")
show(scatter)
val bubble = VegaLite("https://vega.github.io/vega-lite/examples/data/cars.json").
mark("point").
x(field = "Horsepower", `type` = "quantitative").
y(field = "Miles_per_Gallon", `type` = "quantitative").
size(field = "Acceleration", `type` = "quantitative")
show(bubble)
val disaster = VegaLite("https://vega.github.io/vega-lite/examples/data/disasters.csv").
mark(JsObject(
"type" -> "circle",
"opacity" -> 0.8,
"stroke" -> "black",
"strokeWidth" -> 1
)).
width(600).
height(400).
x(field = "Year", `type` = "ordinal", axis = json"""{"labelAngle": 0}""").
y(field = "Entity", `type` = "nominal", title = null).
size(field = "Deaths", `type` = "quantitative",
legend = json"""{"title": "Annual Global Deaths", "clipHeight": 30}""",
scale = json"""{"range": [0, 5000]}"""
).
color(field = "Entity", `type` = "nominal", legend = JsNull).
transform(json"""{"filter": "datum.Entity !== 'All natural disasters'"}""")
show(disaster)
val textPlot = VegaLite("https://vega.github.io/vega-lite/examples/data/cars.json").
mark("text").
x(field = "Horsepower", `type` = "quantitative").
y(field = "Miles_per_Gallon", `type` = "quantitative").
color(field = "Brand", `type` = "nominal").
text(field = "Brand", `type` = "nominal").
transform(json"""
{
"calculate": "split(datum.Name, ' ')[0]",
"as": "Brand"
}"""
)
show(textPlot)
val line = VegaLite("https://vega.github.io/vega-lite/examples/data/stocks.csv").
mark("line").
x(field = "date", `type` = "temporal").
y(field = "price", `type` = "quantitative").
transform(json"""{"filter": "datum.symbol==='GOOG'"}""")
show(line)
val pointLine = VegaLite("https://vega.github.io/vega-lite/examples/data/stocks.csv").
mark(JsObject("type" -> "line", "point" -> true)).
x(field = "date", `type` = "temporal", timeUnit = "year").
y(field = "price", `type` = "quantitative", aggregate = "mean").
color(field = "symbol", `type` = "nominal")
show(pointLine)
val confidenceInterval = VegaLite.layer(
"https://vega.github.io/vega-lite/examples/data/cars.json",
"json",
VegaLite.view().
mark(JsObject("type" -> "errorband", "extent" -> "ci")).
y(field = "Miles_per_Gallon", `type` = "quantitative", title = "Mean of Miles per Gallon (95% CIs)"),
VegaLite.view().
mark("line").
y(field = "Miles_per_Gallon", `type` = "quantitative", aggregate = "mean")
).x(field = "Year", `type` = "temporal", timeUnit = "year")
show(confidenceInterval)
val rollingAverages = VegaLite.layer(
"https://vega.github.io/vega-lite/examples/data/seattle-weather.csv",
"csv",
VegaLite.view().
mark(JsObject("type" -> "point", "opacity" -> 0.3)).
x(field = "date", `type` = "temporal", title = "Date").
y(field = "temp_max", `type` = "quantitative", title = "Max Temperature"),
VegaLite.view().
mark(JsObject("color" -> "red", "size" -> 3, "type" -> "line")).
x(field = "date", `type` = "temporal").
y(field = "rolling_mean", `type` = "quantitative"),
).
width(400).
height(300).
transform(json"""
{
"frame": [-15, 15],
"window": [
{
"field": "temp_max",
"op": "mean",
"as": "rolling_mean"
}
]
}"""
)
show(rollingAverages)
val area = VegaLite("https://vega.github.io/vega-lite/examples/data/stocks.csv").
mark(JsObject("type" -> "area", "line" -> true, "point" -> true)).
x(field = "date", `type` = "temporal").
y(field = "price", `type` = "quantitative").
transform(json"""{"filter": "datum.symbol==='GOOG'"}""")
show(area)
val heatmap = VegaLite("https://vega.github.io/vega-lite/examples/data/seattle-temps.csv").
mark("rect").
x(field = "date", `type` = "ordinal", timeUnit = "date", title = "Day", axis = json"""{"labelAngle": 0, "format": "%e"}""").
y(field = "date", `type` = "ordinal", timeUnit = "month", title = "Month").
color(field = "temp", `type` = "quantitative", aggregate = "max", legend = json"""{"title": null}""").
config(json"""
{
"view": {
"strokeWidth": 0,
"step": 13
},
"axis": {
"domain": false
}
}"""
).
title("2010 Daily Max Temperature (F) in Seattle, WA")
show(heatmap)
val donut = VegaLite(jsan"""
[
{"category": 1, "value": 4},
{"category": 2, "value": 6},
{"category": 3, "value": 10},
{"category": 4, "value": 3},
{"category": 5, "value": 7},
{"category": 6, "value": 8}
]"""
).
mark(JsObject("type" -> "arc", "innerRadius" -> 50)).
view(JsObject("stroke" -> JsNull)).
theta(field = "value", `type` = "quantitative").
color(field = "category", `type` = "nominal")
show(donut)
This radial plot uses both angular and radial extent to convey multiple dimensions of data. However, this approach is not perceptually effective, as viewers will most likely be drawn to the total area of the shape, conflating the two dimensions. This example also demonstrates a way to add labels to circular plots.
val radial = VegaLite.layer(
jsan"""[12, 23, 47, 6, 52, 19]""",
VegaLite.view().
mark(JsObject("type" -> "arc", "innerRadius" -> 20, "stroke" -> "#fff")),
VegaLite.view().mark(JsObject("type" -> "text", "radiusOffset" -> 10)).
text(field = "data", `type` = "quantitative")
).
mark(JsObject("type" -> "arc", "innerRadius" -> 50)).
view(JsObject("stroke" -> JsNull)).
theta(field = "data", `type` = "quantitative", stack = true).
radius(field = "data", `type` = "quantitative", scale = json"""{"type": "sqrt", "zero": true, "range": [20, 100]}""").
color(field = "data", `type` = "nominal", legend = JsNull)
show(radial)
val boxplot = VegaLite("https://vega.github.io/vega-lite/examples/data/population.json").
mark(JsObject("type" -> "boxplot", "extent" -> "min-max")).
view(JsObject("stroke" -> JsNull)).
x(field = "age", `type` = "ordinal").
y(field = "people", `type` = "quantitative", title = "population")
show(boxplot)
show(VegaLite.hconcat(bar, aggregateBar, groupedBar))
val iris = read.arff("../data/weka/iris.arff")
show(VegaLite.splom(iris, "class"))
val geo = VegaLite(
"https://vega.github.io/vega-lite/examples/data/us-10m.json",
JsObject("type" -> "topojson", "feature" -> "counties")
).
mark("geoshape").
color(field = "rate", `type` = "quantitative").
projection(json"""{"type": "albersUsa"}""").
width(500).
height(300).
transform(json"""
{
"lookup": "id",
"from": {
"data": {
"url": "https://vega.github.io/vega-lite/examples/data/unemployment.tsv"
},
"key": "id",
"fields": ["rate"]
}
}"""
)
show(geo)