Formal Language Definition

In [ ]:
import pandas as pd
import lux
from lux.vis.VisList import VisList
from lux.vis.Vis import Vis
In [ ]:
# Collecting basic usage statistics for Lux (For more information, see: https://tinyurl.com/logging-consent)
lux.logger = True # Remove this line if you do not want your interactions recorded

The Lux intent specification can be defined by a context-free grammar (CFG). This tutorial presents a formal definition of the Lux intent language for interested readers.

In [ ]:
df = pd.read_csv('https://github.com/lux-org/lux-datasets/blob/master/data/cars.csv?raw=true')
df["Year"] = pd.to_datetime(df["Year"], format='%Y') 

Composing a Lux Intent with Clause objects

An intent in Lux corresponds to the Kleene star of Clause objects, i.e., it can contain zero, one, or multiple Clauses.

\begin{equation} \langle Intent\rangle \rightarrow \langle Clause\rangle^* \\ \end{equation}
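To make the Kleene star concrete, here is a minimal sketch that models the production with plain Python dataclasses. The `Axis`, `Filter`, and `is_intent` names are hypothetical illustrations, not part of the Lux API. The only point is that an intent is well-formed whether it holds zero, one, or many Clauses.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical stand-ins for the two kinds of Clause; these are not Lux classes.
@dataclass
class Axis:
    attribute: str


@dataclass
class Filter:
    attribute: str
    filter_op: str
    value: str


Clause = Union[Axis, Filter]
Intent = List[Clause]  # <Intent> -> <Clause>*


def is_intent(candidate: object) -> bool:
    """Accept any list whose items are all Clauses, including the empty list."""
    return isinstance(candidate, list) and all(
        isinstance(item, (Axis, Filter)) for item in candidate
    )


print(is_intent([]))                     # zero Clauses
print(is_intent([Axis("MilesPerGal")]))  # one Clause
print(is_intent([Axis("MilesPerGal"), Axis("Horsepower"), Filter("Origin", "=", "USA")]))
print(is_intent([Axis("MilesPerGal"), 42]))  # 42 is not a Clause in this model
```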
In [4]:
spec1 = lux.Clause("MilesPerGal")
spec2 = lux.Clause("Horsepower")
spec3 = lux.Clause("Origin=USA")
intent = [spec1, spec2, spec3]

Here is an example of how we can formulate an intent as a list of Clauses and generate a visualization from it. In the rest of this tutorial, we discuss how a Clause breaks down into different production rules.

In [5]:
Vis(intent,df)
Out[5]:
<Vis  (x: MilesPerGal, y: Horsepower -- [Origin=USA]) mark: scatter, score: 0.0 >

A Clause is either an Axis specification or a Filter specification. A single Clause cannot be both an Axis and a Filter, but an Axis and a Filter can be specified as separate Clauses in the same intent.

\begin{equation} \begin{split} \langle Clause\rangle &\rightarrow \langle Axis \rangle \\ &\rightarrow \langle Filter \rangle \end{split} \end{equation}
In [ ]:
axisSpec = lux.Clause(attribute="MilesPerGal") 
# Equivalent, easier-to-specify Clause syntax: lux.Clause("MilesPerGal")
axisSpec
In [ ]:
filterSpec = lux.Clause(attribute="Origin",filter_op="=",value="USA")
# Equivalent, easier-to-specify Clause syntax: lux.Clause("Origin=USA")
filterSpec
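The shorthand strings in the comments above suggest how one string resolves to either the Axis or the Filter production. Below is a minimal sketch of that idea, assuming a string is a Filter exactly when it contains one of the comparison operators `=`, `>`, `<`, `>=`, `<=`, `!=`. The `parse_clause` function and the dataclasses are hypothetical and are not Lux's own parser.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for the two Clause productions; these are not Lux classes.
@dataclass
class Axis:
    attribute: str


@dataclass
class Filter:
    attribute: str
    filter_op: str
    value: str


# Two-character operators come first so that ">=" is not split at ">".
OPERATORS = ["!=", ">=", "<=", "=", ">", "<"]


def parse_clause(text: str) -> Union[Axis, Filter]:
    """A string with a comparison operator becomes a Filter; otherwise it is an Axis."""
    for op in OPERATORS:
        if op in text:
            attribute, value = text.split(op, 1)
            return Filter(attribute.strip(), op, value.strip())
    return Axis(text.strip())


print(parse_clause("MilesPerGal"))  # Axis(attribute='MilesPerGal')
print(parse_clause("Origin=USA"))   # Filter(attribute='Origin', filter_op='=', value='USA')
```

Because each string matches exactly one branch, the sketch mirrors the rule that a Clause is an Axis or a Filter, never both.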

Axis specification

An Axis requires an attribute specification and optionally takes a channel, aggregation, or bin_size specification.

\begin{equation} \langle Axis \rangle \rightarrow \langle attribute \rangle \langle channel \rangle \langle aggregation \rangle \langle bin\_size \rangle \end{equation}

An attribute can be a single column in the dataset, a list of columns, or a wildcard.

\begin{equation} \begin{split} \langle attribute \rangle &\rightarrow \textrm{attribute} \\ &\rightarrow \textrm{attribute} \cup \langle attribute \rangle\\ &\rightarrow \langle wildcard \rangle \end{split} \end{equation}

In [ ]:
# user is interested in "MilesPerGal"
attribute = lux.Clause("MilesPerGal")

# user is interested in "MilesPerGal","Horsepower", or "Weight"
attribute = lux.Clause(["MilesPerGal","Horsepower","Weight"]) 

# user is interested in any attribute
attribute = lux.Clause("?") 
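The three productions for the attribute can be read as rules for expanding a specification into the columns it may stand for. The sketch below uses a hypothetical `expand_attribute` helper and an illustrative subset of the cars columns; Lux performs its own expansion internally, and the details may differ.

```python
from typing import List, Union

# Illustrative subset of the cars dataset's columns (assumed for this sketch).
COLUMNS = ["MilesPerGal", "Horsepower", "Weight", "Origin", "Brand", "Year"]


def expand_attribute(spec: Union[str, List[str]], columns: List[str]) -> List[str]:
    """Hypothetical expansion of an <attribute> into candidate columns."""
    if spec == "?":  # <wildcard>: any column
        return list(columns)
    # A single column, or a list read as attribute ∪ <attribute>.
    names = spec if isinstance(spec, list) else [spec]
    unknown = [name for name in names if name not in columns]
    if unknown:
        raise KeyError(f"not in dataset: {unknown}")
    return list(names)


print(expand_attribute("MilesPerGal", COLUMNS))
print(expand_attribute(["MilesPerGal", "Horsepower", "Weight"], COLUMNS))
print(expand_attribute("?", COLUMNS))
```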

Optional specifications of the Axis include:

\begin{equation} \begin{aligned} &\langle channel\rangle \rightarrow (\textrm{x } |\textrm{ y }|\textrm{ color }|\textrm{ auto})\\ &\langle aggregation\rangle \rightarrow (\textrm{mean }| \textrm{ sum } | \textrm{ count } | \textrm{ min } | \textrm{ max } | \textrm{ any numpy aggregation function }| \textrm{ auto})\\ &\langle bin\_size \rangle \rightarrow ( \textrm{any integer } | \textrm{ auto})\\ \end{aligned} \end{equation}
In [ ]:
# Ensure that "MilesPerGal" is placed on the x axis
axisSpec = lux.Clause("MilesPerGal",channel="x")

# Aggregate "MilesPerGal" with sum
axisSpec = lux.Clause("MilesPerGal",aggregation="sum")

# Divide "MilesPerGal" into 50 bins
axisSpec = lux.Clause("MilesPerGal",bin_size=50)
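The optional productions above each admit a fixed set of values plus `auto`. As a sketch of what the grammar permits (not of how Lux validates its arguments), a hypothetical `check_axis_options` function can accept or reject each parameter. Callables stand in for "any numpy aggregation function" here, which is an assumption of the sketch.

```python
from typing import Any, Dict

CHANNELS = {"x", "y", "color"}
AGGREGATIONS = {"mean", "sum", "count", "min", "max"}


def check_axis_options(channel: Any = "auto", aggregation: Any = "auto",
                       bin_size: Any = "auto") -> Dict[str, Any]:
    """Hypothetical check of the <channel>, <aggregation>, <bin_size> productions.

    "auto" is always allowed and means "let the system infer this value".
    """
    if channel != "auto" and channel not in CHANNELS:
        raise ValueError(f"invalid channel: {channel!r}")
    # A callable stands in for "any numpy aggregation function" in this sketch.
    if aggregation != "auto" and not callable(aggregation) and aggregation not in AGGREGATIONS:
        raise ValueError(f"invalid aggregation: {aggregation!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if bin_size != "auto" and (not isinstance(bin_size, int) or isinstance(bin_size, bool)):
        raise ValueError(f"invalid bin_size: {bin_size!r}")
    return {"channel": channel, "aggregation": aggregation, "bin_size": bin_size}


print(check_axis_options(channel="x"))
print(check_axis_options(aggregation="sum"))
print(check_axis_options(bin_size=50))
```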

Example: Effects of optional attribute specification parameters

By default, if we specify only an attribute, the system automatically infers the appropriate channel, aggregation, and bin_size.

In [ ]:
axisSpec = "MilesPerGal"
Vis([axisSpec],df)

We can increase the number of bins by passing bin_size as an optional parameter:

In [ ]:
bin50MPG = lux.Clause("MilesPerGal",bin_size=50)
Vis([bin50MPG],df)
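Assuming bin_size is the number of equal-width bins (as the comment "Divide MilesPerGal into 50 bins" earlier suggests), the counts behind such a histogram can be sketched in plain Python. The values below are made up, not taken from the cars dataset, and Lux's own binning code may differ; `numpy.histogram(values, bins=50)` follows the same convention of closing the last bin.

```python
from typing import List


def equal_width_bins(values: List[float], bin_size: int) -> List[int]:
    """Count how many values fall into each of `bin_size` equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_size or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * bin_size
    for v in values:
        # Bins are half-open [left, right), except that the maximum goes into the last bin.
        index = min(int((v - lo) / width), bin_size - 1)
        counts[index] += 1
    return counts


mpg = [9.0, 15.5, 18.0, 22.0, 26.0, 31.0, 36.1, 46.6]  # made-up values for illustration
coarse = equal_width_bins(mpg, 10)
fine = equal_width_bins(mpg, 50)
print(len(coarse), sum(coarse))  # 10 8
print(len(fine), sum(fine))      # 50 8
```

More bins means narrower intervals over the same range, so each bar in the 50-bin chart covers a smaller slice of MilesPerGal.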

For bar charts, such as MilesPerGal broken down by Origin, Lux uses mean as the default aggregation and displays a horizontal bar chart with Origin on the y-axis.

In [ ]:
axisSpec1 = "MilesPerGal"
axisSpec2 = "Origin"
Vis([axisSpec1,axisSpec2],df)

We can change the mean to a sum aggregation:

In [ ]:
axisSpec1 = lux.Clause("MilesPerGal",aggregation="sum")
axisSpec2 = "Origin"
Vis([axisSpec1,axisSpec2],df)
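The aggregation parameter decides how the MilesPerGal values within each Origin are combined into one bar height. The sketch below shows that group-then-aggregate step on made-up rows with a hypothetical `aggregate` helper; in pandas the equivalent would be roughly `df.groupby("Origin")["MilesPerGal"].mean()` or `.sum()`, though Lux's internal computation may differ.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Callable, Dict, List


def aggregate(rows: List[Dict[str, Any]], group_by: str, measure: str,
              how: Callable[[List[float]], float]) -> Dict[Any, float]:
    """Group rows by `group_by` and reduce each group's `measure` values with `how`."""
    groups: Dict[Any, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[measure])
    return {key: how(values) for key, values in groups.items()}


rows = [  # made-up rows, not the cars dataset
    {"Origin": "USA", "MilesPerGal": 18.0},
    {"Origin": "USA", "MilesPerGal": 22.0},
    {"Origin": "Japan", "MilesPerGal": 31.0},
    {"Origin": "Europe", "MilesPerGal": 26.0},
    {"Origin": "Japan", "MilesPerGal": 35.0},
]
print(aggregate(rows, "Origin", "MilesPerGal", mean))  # {'USA': 20.0, 'Japan': 33.0, 'Europe': 26.0}
print(aggregate(rows, "Origin", "MilesPerGal", sum))   # {'USA': 40.0, 'Japan': 66.0, 'Europe': 26.0}
```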

Or we can place Origin on the x-axis to get a vertical bar chart instead:

In [ ]:
axisSpec1 = "MilesPerGal"
axisSpec2 = lux.Clause("Origin",channel="x")
Vis([axisSpec1,axisSpec2],df)

Wildcard attribute specifier

The wildcard consists of an "any" specifier (?) with an optional constraint clause that restricts the set of attributes Lux enumerates over.

\begin{equation} \langle wildcard \rangle \rightarrow \textrm{( ? )} \langle constraint\rangle \end{equation}
\begin{equation} \langle constraint \rangle \rightarrow \langle data\_model\rangle \langle data\_type\rangle \end{equation}
\begin{equation} \begin{aligned} &\langle data\_type\rangle \rightarrow (\textrm{quantitative }| \textrm{ nominal } | \textrm{ ordinal } | \textrm{ temporal } | \textrm{ auto})\\ &\langle data\_model\rangle \rightarrow (\textrm{dimension }|\textrm{ measure }|\textrm{ auto})\\ \end{aligned} \end{equation}
In [ ]:
# user is interested in any temporal attribute
wildcard = lux.Clause("?",data_type="temporal")

# user is interested in any measure attribute
wildcard = lux.Clause("?",data_model="measure")
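A constrained wildcard can be read as "every column whose metadata satisfies all non-`auto` constraints." The sketch below assumes a hypothetical `METADATA` table assigning a data model and data type to a few cars columns (Lux infers these itself, and its assignments may differ) and a hypothetical `expand_wildcard` helper.

```python
from typing import Dict, List, Tuple

# Assumed (data_model, data_type) for a few cars columns, for illustration only.
METADATA: Dict[str, Tuple[str, str]] = {
    "MilesPerGal": ("measure", "quantitative"),
    "Horsepower": ("measure", "quantitative"),
    "Weight": ("measure", "quantitative"),
    "Origin": ("dimension", "nominal"),
    "Brand": ("dimension", "nominal"),
    "Year": ("dimension", "temporal"),
}


def expand_wildcard(metadata: Dict[str, Tuple[str, str]],
                    data_model: str = "auto", data_type: str = "auto") -> List[str]:
    """Keep the columns that satisfy every constraint; "auto" places no restriction."""
    return [
        column
        for column, (model, dtype) in metadata.items()
        if data_model in ("auto", model) and data_type in ("auto", dtype)
    ]


print(expand_wildcard(METADATA, data_type="temporal"))   # ['Year']
print(expand_wildcard(METADATA, data_model="measure"))   # ['MilesPerGal', 'Horsepower', 'Weight']
```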

Example: Origin with respect to other measure variables

In [ ]:
origin = lux.Clause("Origin") 
anyMeasure = lux.Clause("?",data_model="measure")
VisList([origin, anyMeasure],df)
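Conceptually, a VisList like the one above arises from expanding each Clause into its candidates and taking one combination per visualization. The sketch below shows that enumeration with `itertools.product`; the list of measures is an assumed expansion of the wildcard (the real cars dataset may contain more measures), and Lux additionally scores and ranks the results, which this sketch omits.

```python
from itertools import product
from typing import List


def enumerate_vis(expanded_clauses: List[List[str]]) -> List[List[str]]:
    """Hypothetical: one candidate visualization per combination of expanded Clauses."""
    return [list(combo) for combo in product(*expanded_clauses)]


origin = ["Origin"]
# Assumed result of expanding lux.Clause("?", data_model="measure").
any_measure = ["MilesPerGal", "Horsepower", "Weight"]
candidates = enumerate_vis([origin, any_measure])
for vis in candidates:
    print(vis)
```

A Clause without a wildcard contributes exactly one candidate, so only the wildcard determines how many visualizations appear.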

Filter specification

\begin{equation} \langle Filter \rangle \rightarrow \langle attribute\rangle (=|>|<|\leq|\geq|\neq) \langle value\rangle \end{equation}
\begin{equation} \begin{split} \langle value \rangle &\rightarrow \textrm{value} \\ &\rightarrow \textrm{value} \cup \langle value \rangle\\ &\rightarrow (\textrm{?}) \end{split} \end{equation}
In [ ]:
# user is interested in only Ford cars
value = "ford"
filterSpec = lux.Clause(attribute="Brand", filter_op="=",value=value) 

# user is interested in cars that are either Ford, Chevrolet, or Toyota
value = ["ford","chevrolet","toyota"]
filterSpec = lux.Clause(attribute="Brand", filter_op="=",value=value) 

# user is interested in cars that are of any Brand
value = "?"
filterSpec = lux.Clause(attribute="Brand", filter_op="=",value=value) 
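The Filter production pairs an attribute, a comparison operator, and a value, and a list value acts as a union. The sketch below applies such a filter to made-up rows with a hypothetical `apply_filter` helper, assuming the ASCII spellings `>=`, `<=`, `!=` for the grammar's ≥, ≤, ≠. In pandas, the first two cases correspond roughly to `df[df["Brand"] == "ford"]` and `df[df["Brand"].isin(["ford", "chevrolet", "toyota"])]`.

```python
import operator
from typing import Any, Dict, List

# The grammar's comparison operators, spelled in ASCII (an assumption of this sketch).
OPS = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
}


def apply_filter(rows: List[Dict[str, Any]], attribute: str,
                 filter_op: str, value: Any) -> List[Dict[str, Any]]:
    """Keep rows whose `attribute` satisfies the filter.

    A list value is read as a union: a row is kept if it matches any listed value.
    """
    compare = OPS[filter_op]
    values = value if isinstance(value, list) else [value]
    return [row for row in rows if any(compare(row[attribute], v) for v in values)]


rows = [  # made-up rows for illustration
    {"Brand": "ford", "Horsepower": 150},
    {"Brand": "toyota", "Horsepower": 95},
    {"Brand": "bmw", "Horsepower": 110},
    {"Brand": "chevrolet", "Horsepower": 165},
]
only_ford = apply_filter(rows, "Brand", "=", "ford")
three_brands = apply_filter(rows, "Brand", "=", ["ford", "chevrolet", "toyota"])
powerful = apply_filter(rows, "Horsepower", ">", 100)
print([r["Brand"] for r in only_ford])     # ['ford']
print([r["Brand"] for r in three_brands])  # ['ford', 'toyota', 'chevrolet']
print([r["Brand"] for r in powerful])      # ['ford', 'bmw', 'chevrolet']
```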

Example: Distribution of Horsepower for different Brands

In [ ]:
horsepower = lux.Clause("Horsepower")
anyBrand = lux.Clause(attribute="Brand", filter_op="=",value="?") 
VisList([horsepower, anyBrand],df)
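A `?` value can be read as "one Filter for each distinct value of the attribute," so pairing it with Horsepower yields one Horsepower distribution per brand. The sketch below shows that expansion on made-up rows with a hypothetical `expand_value_wildcard` helper; the sorting is only for deterministic output, since Lux ranks the resulting visualizations by its own score. In pandas, the distinct values would come from `df["Brand"].unique()`.

```python
from typing import Any, Dict, List, Tuple

Filter = Tuple[str, str, Any]  # (attribute, filter_op, value)


def expand_value_wildcard(rows: List[Dict[str, Any]], attribute: str,
                          filter_op: str, value: Any) -> List[Filter]:
    """Turn a "?" value into one Filter per distinct value of `attribute`."""
    if value != "?":
        return [(attribute, filter_op, value)]
    distinct = sorted({row[attribute] for row in rows})
    return [(attribute, filter_op, v) for v in distinct]


rows = [  # made-up rows for illustration
    {"Brand": "ford", "Horsepower": 150},
    {"Brand": "toyota", "Horsepower": 95},
    {"Brand": "ford", "Horsepower": 140},
    {"Brand": "bmw", "Horsepower": 110},
]
filters = expand_value_wildcard(rows, "Brand", "=", "?")
for f in filters:
    print(f)
```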