import pandas as pd
import lux
from lux.vis.VisList import VisList
from lux.vis.Vis import Vis
# Collecting basic usage statistics for Lux (For more information, see: https://tinyurl.com/logging-consent)
lux.logger = True # Remove this line if you do not want your interactions recorded
The Lux intent specification can be defined as a context-free grammar (CFG). Here, we introduce a formal definition of the intent language in Lux for interested readers.
df = pd.read_csv('https://github.com/lux-org/lux-datasets/blob/master/data/cars.csv?raw=true')
df["Year"] = pd.to_datetime(df["Year"], format='%Y')
Clause objects
An intent in Lux corresponds to the Kleene star of Clause objects, i.e., it can contain zero, one, or multiple Clauses.
spec1 = lux.Clause("MilesPerGal")
spec2 = lux.Clause("Horsepower")
spec3 = lux.Clause("Origin=USA")
intent = [spec1, spec2, spec3]
Here is an example of how we can formulate an intent as a list of Clause objects and generate a visualization. In this tutorial, we will discuss how a Clause breaks down into different production rules.
Vis(intent,df)
<Vis (x: MilesPerGal, y: Horsepower -- [Origin=USA]) mark: scatter, score: 0.0 >
A Clause can either be an Axis specification or a Filter specification.
Note that it is not possible for a Clause to be both an Axis and a Filter, but they can be specified as separate Clauses in the intent.
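The two statements so far (an intent is the Kleene star of Clause objects, and a Clause is either an Axis or a Filter) can be written as production rules. The nonterminal names $\langle Intent \rangle$, $\langle Clause \rangle$ and $\langle Filter \rangle$ are chosen here for illustration and simply follow the notation used for the other rules in this tutorial:
\begin{equation}
\begin{split}
\langle Intent \rangle &\rightarrow \langle Clause \rangle^{*} \\
\langle Clause \rangle &\rightarrow \langle Axis \rangle \\
&\rightarrow \langle Filter \rangle
\end{split}
\end{equation}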
axisSpec = lux.Clause(attribute="MilesPerGal")
# Equivalent, easier-to-specify Clause syntax : lux.Clause("MilesPerGal")
axisSpec
filterSpec = lux.Clause(attribute="Origin",filter_op="=",value="USA")
# Equivalent, easier-to-specify Clause syntax : lux.Clause("Origin=USA")
filterSpec
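To make the equivalence between the shorthand string and the keyword form concrete, here is a minimal sketch in plain Python. It does not use or reproduce Lux's own parser: `ClauseSketch` and `parse_shorthand` are hypothetical names introduced only for this illustration, and the sketch assumes `=` is the only filter operator.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for illustration; this is not the lux.Clause class.
@dataclass
class ClauseSketch:
    attribute: str
    filter_op: Optional[str] = None
    value: Optional[str] = None

    @property
    def kind(self) -> str:
        # A clause is a Filter when it carries an operator, otherwise an Axis.
        return "Filter" if self.filter_op is not None else "Axis"


def parse_shorthand(text: str) -> ClauseSketch:
    """Split 'attr=value' into a Filter clause; treat anything else as an Axis."""
    if "=" in text:
        attribute, value = text.split("=", 1)
        return ClauseSketch(attribute.strip(), "=", value.strip())
    return ClauseSketch(text.strip())


axis = parse_shorthand("MilesPerGal")
filt = parse_shorthand("Origin=USA")
print(axis.kind, axis)
print(filt.kind, filt)
```

Because the kind is derived from whether an operator is present, a single clause in this sketch can never be both an Axis and a Filter, which mirrors the rule stated above.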
Axis specification
An Axis requires an attribute specification, and an optional channel, aggregation, or bin_size specification.
\begin{equation}
\langle Axis \rangle \rightarrow \langle attribute \rangle \langle channel \rangle \langle aggregation \rangle \langle bin\_size \rangle
\end{equation}
An attribute can either be a single column in the dataset, a list of columns, or a wildcard.
\begin{equation}
\begin{split}
\langle attribute \rangle &\rightarrow \textrm{attribute} \\
&\rightarrow \textrm{attribute} \cup \langle attribute \rangle\\
&\rightarrow \langle wildcard \rangle
\end{split}
\end{equation}
# user is interested in "MilesPerGal"
attribute = lux.Clause("MilesPerGal")
# user is interested in "MilesPerGal","Horsepower", or "Weight"
attribute = lux.Clause(["MilesPerGal","Horsepower","Weight"])
# user is interested in any attribute
attribute = lux.Clause("?")
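The three attribute forms (single column, list of columns, wildcard) can be read as denoting a set of candidate columns. The sketch below shows that reading in plain Python; `expand_attribute` is a hypothetical helper written for this illustration, not a Lux function, and the column list is a hand-picked subset of the cars dataset.

```python
from typing import List, Union


def expand_attribute(spec: Union[str, List[str]], columns: List[str]) -> List[str]:
    """Return the candidate columns that an attribute specification denotes."""
    if spec == "?":
        # Wildcard: every column is a candidate.
        return list(columns)
    if isinstance(spec, str):
        # Single attribute: treat it as a one-element list.
        spec = [spec]
    unknown = [name for name in spec if name not in columns]
    if unknown:
        raise ValueError(f"unknown attribute(s): {unknown}")
    return list(spec)


# Assumed subset of column names, for illustration only.
columns = ["MilesPerGal", "Horsepower", "Weight", "Origin"]

print(expand_attribute("MilesPerGal", columns))
print(expand_attribute(["MilesPerGal", "Horsepower", "Weight"], columns))
print(expand_attribute("?", columns))
```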
Optional specifications of the Axis include:
# Ensure that "MilesPerGal" is placed on the x axis
axisSpec = lux.Clause("MilesPerGal",channel="x")
# Apply sum on "MilesPerGal"
axisSpec = lux.Clause("MilesPerGal",aggregation="sum")
# Divide "MilesPerGal" into 50 bins
axisSpec = lux.Clause("MilesPerGal",bin_size=50)
By default, if we specify only an attribute, the system automatically infers the appropriate channel, aggregation, and bin_size.
axisSpec = "MilesPerGal"
Vis([axisSpec],df)
We can increase the bin_size as an optional parameter:
bin50MPG = lux.Clause("MilesPerGal",bin_size=50)
Vis([bin50MPG],df)
For bar charts, Lux uses a default aggregation of mean and displays a horizontal bar chart with Origin on the y axis.
axisSpec1 = "MilesPerGal"
axisSpec2 = "Origin"
Vis([axisSpec1,axisSpec2],df)
We can change the mean to a sum aggregation:
axisSpec1 = lux.Clause("MilesPerGal",aggregation="sum")
axisSpec2 = "Origin"
Vis([axisSpec1,axisSpec2],df)
Or we can set the Origin on the x-axis to get a vertical bar chart instead:
axisSpec1 = "MilesPerGal"
axisSpec2 = lux.Clause("Origin",channel="x")
Vis([axisSpec1,axisSpec2],df)
Wildcard attribute specifier
The wildcard consists of an "any" specifier (?) with an optional constraint clause that restricts the set of attributes that Lux enumerates over.
# user is interested in any temporal attribute
wildcard = lux.Clause("?",data_type="temporal")
# user is interested in any measure attribute
wildcard = lux.Clause("?",data_model="measure")
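One way to picture a constrained wildcard is as a filter over per-column metadata. The sketch below is plain Python and only illustrates the idea: `COLUMN_INFO` and `expand_wildcard` are hypothetical, and the metadata is written by hand here, whereas Lux derives such properties from the dataframe.

```python
from typing import Dict, List, Optional

# Hypothetical, hand-written metadata for a few columns of the cars dataset.
COLUMN_INFO: Dict[str, Dict[str, str]] = {
    "MilesPerGal": {"data_type": "quantitative", "data_model": "measure"},
    "Horsepower": {"data_type": "quantitative", "data_model": "measure"},
    "Origin": {"data_type": "nominal", "data_model": "dimension"},
    "Year": {"data_type": "temporal", "data_model": "dimension"},
}


def expand_wildcard(
    info: Dict[str, Dict[str, str]],
    data_type: Optional[str] = None,
    data_model: Optional[str] = None,
) -> List[str]:
    """Return the columns that satisfy every constraint that was given."""
    matches = []
    for name, props in info.items():
        if data_type is not None and props["data_type"] != data_type:
            continue
        if data_model is not None and props["data_model"] != data_model:
            continue
        matches.append(name)
    return matches


print(expand_wildcard(COLUMN_INFO, data_type="temporal"))
print(expand_wildcard(COLUMN_INFO, data_model="measure"))
print(expand_wildcard(COLUMN_INFO))  # no constraint: every column
```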
Origin with respect to other measure variables
origin = lux.Clause("Origin")
anyMeasure = lux.Clause("?",data_model="measure")
VisList([origin, anyMeasure],df)
Filter specification
# user is interested in only Ford cars
value = "ford"
filterSpec = lux.Clause(attribute="Brand", filter_op="=",value=value)
# user is interested in cars that are either Ford, Chevrolet, or Toyota
value = ["ford","chevrolet","toyota"]
filterSpec = lux.Clause(attribute="Brand", filter_op="=",value=value)
# user is interested in cars that are of any Brand
value = "?"
filterSpec = lux.Clause(attribute="Brand", filter_op="=",value=value)
Horsepower for different Brands
horsepower = lux.Clause("Horsepower")
anyBrand = lux.Clause(attribute="Brand", filter_op="=",value="?")
VisList([horsepower, anyBrand],df)
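The filter examples above take a single value, a list of values, or the wildcard "?". A plausible reading, sketched below in plain Python, is that each form expands to one concrete filter per value, with the wildcard enumerating the distinct values found in the data. This is an assumption made for illustration, not a description of Lux internals: `expand_filter` is a hypothetical helper and the rows are made up rather than taken from the cars dataset.

```python
from typing import Dict, List, Tuple, Union

# Made-up rows for illustration; not the cars dataset.
rows: List[Dict[str, object]] = [
    {"Brand": "ford", "Horsepower": 130},
    {"Brand": "toyota", "Horsepower": 95},
    {"Brand": "ford", "Horsepower": 150},
]


def expand_filter(
    rows: List[Dict[str, object]],
    attribute: str,
    value: Union[str, List[str]],
) -> List[Tuple[str, str, object]]:
    """Expand a filter value into one (attribute, '=', value) triple per value."""
    if value == "?":
        # Wildcard: collect distinct values in first-seen order.
        values: List[object] = []
        for row in rows:
            if row[attribute] not in values:
                values.append(row[attribute])
    elif isinstance(value, list):
        values = list(value)
    else:
        values = [value]
    return [(attribute, "=", v) for v in values]


print(expand_filter(rows, "Brand", "ford"))
print(expand_filter(rows, "Brand", ["ford", "chevrolet", "toyota"]))
print(expand_filter(rows, "Brand", "?"))
```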