Lux provides a flexible language for communicating your analysis intent to the system, so that Lux can provide better and more relevant recommendations to you. In this tutorial, we will see different ways of specifying the intent, including the attributes and values that you are interested or not interested in, enumeration specifiers, as well as any constraints on the visualization encoding.
The primary way to set the current intent associated with a dataframe is by setting the intent property of the dataframe and providing a list of specifications as input. We will first describe how intent can be specified through convenient shorthand descriptions as string inputs, then we will describe advanced usage via the lux.Clause object.
import pandas as pd
import lux
df = pd.read_csv("../data/college.csv")
lux.config.default_display = "lux" # Setting default display as Lux
You can indicate that you are interested in an attribute, let's say AverageCost.
df.intent = ['AverageCost']
df
You might be interested in multiple attributes; for instance, you might want to look at both AverageCost and FundingModel. When multiple clauses are specified, Lux applies all the clauses in the intent and searches for visualizations that are relevant to AverageCost and FundingModel.
df.intent = ['AverageCost','FundingModel']
df
Let's say that in addition to AverageCost, you are interested in looking at a list of attributes related to different financial measures, such as Expenditure or MedianDebt, and how they break down with respect to FundingModel.
You can specify a list of desired attributes separated by the | symbol, which indicates an OR relationship between the attributes. If multiple clauses are specified, Lux automatically creates combinations of the specified attributes.
possible_attributes = "AverageCost|Expenditure|MedianDebt|MedianEarnings"
df.intent = [possible_attributes,"FundingModel"]
df
Alternatively, you could also provide the specification as a list:
possible_attributes = ['AverageCost','Expenditure','MedianDebt','MedianEarnings']
df.intent = [possible_attributes,"FundingModel"]
df
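To make the idea of "combinations of the specified attributes" concrete, here is a minimal sketch in plain Python that enumerates the attribute pairings implied by an intent like the ones above. The helper name expand_intent is hypothetical and is not part of Lux; it only illustrates the OR semantics as a Cartesian product and does not reflect how Lux implements or ranks its recommendations internally.

```python
from itertools import product

def expand_intent(intent):
    """Enumerate the attribute combinations implied by an intent list.

    Hypothetical helper for illustration only (not a Lux function).
    Each clause is either an "A|B|C" string or a list of attribute names.
    """
    options = []
    for clause in intent:
        if isinstance(clause, str):
            # "A|B|C" means A OR B OR C; a plain name yields a single option
            options.append(clause.split("|"))
        else:
            options.append(list(clause))
    # One combination per way of picking a single option from every clause
    return [list(combo) for combo in product(*options)]

combos = expand_intent(
    ["AverageCost|Expenditure|MedianDebt|MedianEarnings", "FundingModel"]
)
for combo in combos:
    print(combo)
```

Under this reading, both the string form and the list form of possible_attributes describe the same four pairings, each combining one financial measure with FundingModel.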
In Lux, you can also specify particular values corresponding to subsets of the data that you might be interested in. For example, you may be interested in only colleges located in New England.
df.intent = ["Region=New England"]
df
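Conceptually, a value clause such as "Region=New England" names a subset of rows. The sketch below shows the equivalent row selection in plain pandas, assuming a small made-up dataframe in place of the college data; the helper subset_for and the toy rows are hypothetical and only illustrate what subset the clause describes, not how Lux processes it.

```python
import pandas as pd

# Toy stand-in for the college data; the rows are made up for illustration.
toy = pd.DataFrame(
    {
        "Name": ["College A", "College B", "College C"],
        "Region": ["New England", "Southeast", "New England"],
    }
)

def subset_for(frame, clause):
    """Return the rows matching an "attribute=value" clause.

    Hypothetical helper for illustration only (not a Lux function).
    """
    attribute, value = clause.split("=", 1)
    return frame[frame[attribute] == value]

new_england = subset_for(toy, "Region=New England")
print(new_england)
```

With the intent set on the dataframe, Lux handles this subsetting for you and recommends visualizations over the matching rows, so you do not need to filter the dataframe yourself.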
You can also specify multiple values of interest using the same | notation that we saw earlier.
For example, you might want to compare the MedianDebt of students from colleges in New England, Southeast, and Far West.
Write the corresponding code to specify this intent.
# Write code to specify the intent here
# Hint: Try printing out your dataframe to see if your intent is specified correctly
Lux supports advanced capabilities for specifying the intent, such as constraining an attribute to display on a specific axis, or modifying the aggregation or binning parameters, through the lux.Clause object, which can be thought of as a single unit of intent. We will see some examples of how lux.Clause is used throughout the subsequent tutorials; see this page for more information.