# Formal Language Definition

In [ ]:
import pandas as pd
import lux
from lux.vis.VisList import VisList
from lux.vis.Vis import Vis

In [ ]:
# Collecting basic usage statistics for Lux (For more information, see: https://tinyurl.com/logging-consent)
lux.logger = True # Remove this line if you do not want your interactions recorded


The Lux intent specification can be defined as a context-free grammar (CFG). Here, we introduce a formal definition of the intent language in Lux for interested readers.

In [ ]:
df = pd.read_csv('https://github.com/lux-org/lux-datasets/blob/master/data/cars.csv?raw=true')
df["Year"] = pd.to_datetime(df["Year"], format='%Y')


## Composing a Lux Intent with Clause objects

An intent in Lux corresponds to the Kleene star of Clause objects, i.e., it can have either zero, one, or multiple Clauses.

$$\langle Intent\rangle \rightarrow \langle Clause\rangle^* \\$$
In [4]:
spec1 = lux.Clause("MilesPerGal")
spec2 = lux.Clause("Horsepower")
spec3 = lux.Clause("Origin=USA")
intent = [spec1, spec2, spec3]


Here is an example of how we can formulate an intent as a list of Clause objects and generate a visualization from it. The rest of this tutorial discusses how a Clause breaks down into different production rules.

In [5]:
Vis(intent,df)

Out[5]:
<Vis  (x: MilesPerGal, y: Horsepower -- [Origin=USA]) mark: scatter, score: 0.0 >

A Clause can either be an Axis specification or a Filter specification. Note that it is not possible for a Clause to be both an Axis and a Filter, but they can be specified as separate Clauses in the intent.

$$\begin{split} \langle Clause\rangle &\rightarrow \langle Axis \rangle \\ &\rightarrow \langle Filter \rangle \end{split}$$
In [ ]:
axisSpec = lux.Clause(attribute="MilesPerGal")
# Equivalent, easier-to-specify Clause syntax : lux.Clause("MilesPerGal")
axisSpec

In [ ]:
filterSpec = lux.Clause(attribute="Origin",filter_op="=",value="USA")
# Equivalent, easier-to-specify Clause syntax : lux.Clause("Origin=USA")
filterSpec
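
The split between Axis and Filter can also be seen in the shorthand strings used above. The following is a minimal sketch of how such a string might be classified, not Lux's own parser: `parse_shorthand` is a hypothetical helper, and the ASCII operator spellings `>=`, `<=`, and `!=` are assumptions made for illustration.

```python
import re

# Hypothetical helper (not part of Lux): classify a shorthand string as an
# Axis clause or a Filter clause and split it into its fields.
# Two-character operators come first so ">=" is not read as ">" plus "=".
_FILTER = re.compile(
    r"^(?P<attr>[^<>=!]+?)\s*(?P<op>>=|<=|!=|=|>|<)\s*(?P<value>.+)$"
)

def parse_shorthand(text):
    text = text.strip()
    match = _FILTER.match(text)
    if match is None:
        # No comparison operator found: treat the string as an Axis.
        return {"kind": "axis", "attribute": text}
    return {
        "kind": "filter",
        "attribute": match["attr"],
        "filter_op": match["op"],
        "value": match["value"],
    }

axis = parse_shorthand("MilesPerGal")
flt = parse_shorthand("Origin=USA")
```

A string never yields both kinds at once, which mirrors the rule that a Clause is either an Axis or a Filter.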


## Axis specification

An Axis requires an attribute specification and accepts optional channel, aggregation, and bin_size specifications.

$$\langle Axis \rangle \rightarrow \langle attribute \rangle \langle channel \rangle \langle aggregation \rangle \langle bin\_size \rangle$$

An attribute can be a single column in the dataset, a list of columns, or a wildcard.

$$\begin{split} \langle attribute \rangle &\rightarrow \textrm{attribute} \\ &\rightarrow \textrm{attribute} \cup \langle attribute \rangle\\ &\rightarrow \langle wildcard \rangle \end{split}$$

In [ ]:
# user is interested in "MilesPerGal"
attribute = lux.Clause("MilesPerGal")

# user is interested in "MilesPerGal","Horsepower", or "Weight"
attribute = lux.Clause(["MilesPerGal","Horsepower","Weight"])

# user is interested in any attribute
attribute = lux.Clause("?")
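
The three attribute productions can be read as describing a set of candidate columns. Below is a rough sketch of that reading in plain Python; `expand_attribute` and the column list are hypothetical and only illustrate the grammar, not how Lux resolves attributes internally.

```python
def expand_attribute(spec, columns):
    """Return the candidate columns described by an attribute specification."""
    if spec == "?":
        # Wildcard: every column is a candidate.
        return list(columns)
    if isinstance(spec, str):
        # Single attribute.
        return [spec]
    # List of attributes (the union production in the grammar).
    return list(spec)

# Assumed column names, for illustration only.
columns = ["MilesPerGal", "Horsepower", "Weight", "Origin"]

single = expand_attribute("MilesPerGal", columns)
several = expand_attribute(["MilesPerGal", "Horsepower", "Weight"], columns)
anything = expand_attribute("?", columns)
```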


Optional specifications of the Axis include:

$$\begin{aligned} &\langle channel\rangle \rightarrow (\textrm{x } |\textrm{ y }|\textrm{ color }|\textrm{ auto})\\ &\langle aggregation\rangle \rightarrow (\textrm{mean }| \textrm{ sum } | \textrm{ count } | \textrm{ min } | \textrm{ max } | \textrm{ any numpy aggregation function }| \textrm{ auto})\\ &\langle bin\_size \rangle \rightarrow ( \textrm{any integer } | \textrm{ auto})\\ \end{aligned}$$
In [ ]:
# Ensure that "MilesPerGal" is placed on the x axis
axisSpec = lux.Clause("MilesPerGal",channel="x")

# Apply sum on "MilesPerGal"
axisSpec = lux.Clause("MilesPerGal",aggregation="sum")

# Divide "MilesPerGal" into 50 bins
axisSpec = lux.Clause("MilesPerGal",bin_size=50)


#### Example: Effects of optional attribute specification parameters

By default, if we specify only an attribute, the system automatically infers the appropriate channel, aggregation, and bin_size.

In [ ]:
axisSpec = "MilesPerGal"
Vis([axisSpec],df)


We can increase the bin_size as an optional parameter:

In [ ]:
bin50MPG = lux.Clause("MilesPerGal",bin_size=50)
Vis([bin50MPG],df)
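
To make the effect of bin_size concrete, here is a small sketch of equal-width binning in plain Python. It assumes bin_size is the number of bins, as in the "50 bins" comment above; `bin_counts` is a hypothetical helper and the values are made up, so this is not necessarily how Lux computes its histograms.

```python
def bin_counts(values, bin_size):
    """Count how many values fall into each of bin_size equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_size
    counts = [0] * bin_size
    for v in values:
        if width == 0:
            index = 0  # all values identical: everything lands in the first bin
        else:
            # Clamp so the maximum value falls into the last bin.
            index = min(int((v - lo) / width), bin_size - 1)
        counts[index] += 1
    return counts

values = list(range(10))        # toy data: 0, 1, ..., 9
coarse = bin_counts(values, 2)
fine = bin_counts(values, 5)
```

More bins give a finer-grained picture of the same distribution, while the total count stays the same.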


For bar charts, Lux uses a default aggregation of mean and displays a horizontal bar chart with Origin on the y axis.

In [ ]:
axisSpec1 = "MilesPerGal"
axisSpec2 = "Origin"
Vis([axisSpec1,axisSpec2],df)


We can change the mean to a sum aggregation:

In [ ]:
axisSpec1 = lux.Clause("MilesPerGal",aggregation="sum")
axisSpec2 = "Origin"
Vis([axisSpec1,axisSpec2],df)
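
The difference between the two aggregations can be sketched without Lux. The example below groups a few made-up rows (not taken from the cars dataset) by a category and applies either mean or sum; `aggregate` is a hypothetical helper used only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Toy (Origin, MilesPerGal) rows, invented for illustration.
rows = [("USA", 20.0), ("USA", 30.0), ("Japan", 40.0)]

def aggregate(rows, func):
    """Group values by key, then reduce each group with func."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: func(vals) for key, vals in groups.items()}

by_mean = aggregate(rows, mean)  # one bar per Origin, height = group mean
by_sum = aggregate(rows, sum)    # same bars, height = group total
```

With sum, groups containing more rows produce taller bars even when their typical values are similar, which is why the two charts can look quite different.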


Or we can set the Origin on the x-axis to get a vertical bar chart instead:

In [ ]:
axisSpec1 = "MilesPerGal"
axisSpec2 = lux.Clause("Origin",channel="x")
Vis([axisSpec1,axisSpec2],df)


### Wildcard attribute specifier

The wildcard consists of an "any" specifier (?) with an optional constraint clause that restricts the set of attributes Lux enumerates over.

$$\langle wildcard \rangle \rightarrow \textrm{( ? )} \langle constraint\rangle$$

$$\langle constraint \rangle \rightarrow \langle data\_model\rangle \langle data\_type\rangle$$

$$\begin{aligned} &\langle data\_type\rangle \rightarrow (\textrm{quantitative }| \textrm{ nominal } | \textrm{ ordinal } | \textrm{ temporal } | \textrm{ auto})\\ &\langle data\_model\rangle \rightarrow (\textrm{dimension }|\textrm{ measure }|\textrm{ auto})\\ \end{aligned}$$
In [ ]:
# user is interested in any temporal attribute
wildcard = lux.Clause("?",data_type="temporal")

# user is interested in any measure attribute
wildcard = lux.Clause("?",data_model="measure")


#### Example: Origin with respect to other measure variables

In [ ]:
origin = lux.Clause("Origin")
anyMeasure = lux.Clause("?",data_model="measure")
VisList([origin, anyMeasure],df)
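
Conceptually, a constrained wildcard selects every attribute whose metadata matches the constraint, and each match is then paired with the other clauses in the intent. The sketch below illustrates that selection step only; the `METADATA` table and `match_wildcard` are assumptions made for illustration and do not reflect the types Lux actually infers for the cars dataset.

```python
# Assumed attribute metadata: name -> (data_type, data_model).
METADATA = {
    "Origin": ("nominal", "dimension"),
    "Horsepower": ("quantitative", "measure"),
    "Weight": ("quantitative", "measure"),
    "Year": ("temporal", "dimension"),
}

def match_wildcard(metadata, data_type="auto", data_model="auto"):
    """Return the attributes that satisfy the wildcard's constraint.

    "auto" means the corresponding property is left unconstrained.
    """
    return [
        name
        for name, (dtype, model) in metadata.items()
        if data_type in ("auto", dtype) and data_model in ("auto", model)
    ]

measures = match_wildcard(METADATA, data_model="measure")
temporal = match_wildcard(METADATA, data_type="temporal")
unconstrained = match_wildcard(METADATA)
```

Under this reading, the example above would pair Origin with each matching measure in turn, one visualization per pairing.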


## Filter specification

$$\langle Filter \rangle \rightarrow \langle attribute\rangle (=|>|<|\leq|\geq|\neq) \langle value\rangle$$

$$\begin{split} \langle value \rangle &\rightarrow \textrm{value} \\ &\rightarrow \textrm{value} \cup \langle value \rangle\\ &\rightarrow (\textrm{?}) \end{split}$$
In [ ]:
# user is interested in only Ford cars
value = "ford"
filterSpec = lux.Clause(attribute="Brand", filter_op="=",value=value)

# user is interested in cars that are either Ford, Chevrolet, or Toyota
value = ["ford","chevrolet","toyota"]
filterSpec = lux.Clause(attribute="Brand", filter_op="=",value=value)

# user is interested in cars that are of any Brand
value = "?"
filterSpec = lux.Clause(attribute="Brand", filter_op="=",value=value)


#### Example: Distribution of Horsepower for different Brands

In [ ]:
horsepower = lux.Clause("Horsepower")
anyBrand = lux.Clause(attribute="Brand", filter_op="=",value="?")
VisList([horsepower, anyBrand],df)
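
The Filter grammar can be sketched in plain Python as two steps: expand the value specification into concrete values, then apply the comparison to the rows. This is only an illustration under stated assumptions: `OPS`, `expand_values`, and `apply_filter` are hypothetical names, the rows are made up, the ASCII operator spellings are assumed, and treating a list or wildcard value as one filter per value is an assumption about the enumeration rather than a confirmed description of Lux's behavior.

```python
import operator

# Assumed ASCII spellings of the six comparison operators in the grammar.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}

def expand_values(rows, attribute, value):
    """Turn a value specification into a list of concrete values."""
    if value == "?":
        # Wildcard: every distinct value observed in the data.
        return sorted({row[attribute] for row in rows})
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]

def apply_filter(rows, attribute, filter_op, value):
    """Keep the rows for which `row[attribute] <op> value` holds."""
    compare = OPS[filter_op]
    return [row for row in rows if compare(row[attribute], value)]

# Toy rows, invented for illustration.
rows = [
    {"Brand": "ford", "Horsepower": 130},
    {"Brand": "toyota", "Horsepower": 95},
    {"Brand": "ford", "Horsepower": 150},
]

brands = expand_values(rows, "Brand", "?")
subsets = {b: apply_filter(rows, "Brand", "=", b) for b in brands}
```

Each entry of `subsets` corresponds to the data behind one filtered visualization, for example the Horsepower distribution of a single Brand.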