This notebook contains the code for the accompanying blog post, "Intelligent Visual Data Discovery with Lux — A Python library" by Parul Pandey.
Artwork by @allison_horst
The Palmer penguins dataset has recently become a favorite of the data science community and is a drop-in replacement for the overused Iris dataset. It contains data for 344 penguins, collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER. The data can be downloaded from here. Let’s start by installing and importing the Lux library.
#Uncomment the code to install the library and desired extensions
#! pip install lux-api
#Activating extension for Jupyter notebook
#! jupyter nbextension install --py luxwidget
#! jupyter nbextension enable --py luxwidget
#Activating extension for Jupyter lab
#! jupyter labextension install @jupyter-widgets/jupyterlab-manager
#! jupyter labextension install luxwidget
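After installation, a quick way to confirm that the packages are visible to the current kernel is to look them up without importing them. This is a minimal sketch using only the standard library. The helper name is_installed is made up for illustration, and lux and luxwidget are assumed to be the import names of the packages installed above.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate the module."""
    return find_spec(module_name) is not None


# Report whether each package is visible to this kernel.
for name in ("lux", "luxwidget"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either name is reported as missing, the install commands above were probably run in a different environment from the one backing the notebook kernel.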
For more details, such as using Lux with a SQL engine, read the documentation, which is thorough and contains many hands-on examples.
Once the Lux library has been installed, we’ll import it along with our dataset.
import pandas as pd
import lux
df = pd.read_csv('../data/penguins.csv')
A nice thing about Lux is that it works as-is with pandas dataframes and doesn’t require any changes to existing syntax. For instance, if you drop a column or row, the recommendations are regenerated based on the updated dataframe. All the functionality that we get from pandas, such as dropping columns and importing CSVs, is preserved. Let’s get an overview of the dataset.
df.info()
There are some missing values in the dataset. Let’s get rid of those.
df = df.dropna()
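Before dropping rows, it can help to see how many values are missing per column and how many rows dropna() would remove. The sketch below uses a tiny made-up frame (the values are illustrative, not taken from the penguins data) so that it runs on its own. The helper name missing_report is hypothetical.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.Series:
    """Count missing values per column, keeping only columns that have any."""
    counts = frame.isna().sum()
    return counts[counts > 0]


# Illustrative toy data only; not real penguin measurements.
toy = pd.DataFrame({
    "species": ["A", "A", "B", "B"],
    "culmen_length_mm": [39.0, np.nan, 46.5, 50.0],
    "sex": ["MALE", None, "FEMALE", None],
})

report = missing_report(toy)
print(report)

# Compare row counts before and after dropping incomplete rows.
rows_before = len(toy)
rows_after = len(toy.dropna())
print(f"dropna() would remove {rows_before - rows_after} of {rows_before} rows")
```

The same helper could be called on the notebook's df just before the dropna() step.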
Our data is now in memory, and we are all set to see how Lux can ease the EDA process for us.
When we print out the data frame, we see the default pandas table display. We can toggle it to get a set of recommendations generated automatically by Lux.
df
The recommendations in Lux are organized into three tabs, which represent potential next steps that users can take in their exploration. From the visualizations we can infer that there are three species of penguins: Adelie, Chinstrap, and Gentoo. There are also three islands: Torgersen, Biscoe, and Dream. Both male and female penguins are included in the dataset.
Beyond the basic recommendations, we can also specify our analysis intent. Let's say that we want to find out how the culmen length varies with the species. We can set the intent here as ['culmen_length_mm', 'species'].
When we print out the data frame again, we can see that the recommendations are steered to what is relevant to the intent that we’ve specified.
df.intent = ['culmen_length_mm','species']
df
On the left-hand side of the image below, we see the Current Visualization, which corresponds to the attributes that we have selected. On the right-hand side, we have the Enhance tab, which shows what happens when we add an attribute to the current selection. We also have the Filter tab, which adds a filter while keeping the selected variables fixed.
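A filter can also be written directly into the intent rather than picked from the Filter tab. As far as we understand Lux's intent syntax, an equality filter is expressed as a string of the form 'column=value'. The sketch below only builds such a list in plain Python. The helper name build_intent is hypothetical, and the final assignment to df.intent is left commented out because it needs Lux and the loaded dataframe.

```python
from typing import Dict, List, Optional


def build_intent(attributes: List[str],
                 filters: Optional[Dict[str, str]] = None) -> List[str]:
    """Combine attribute names and equality filters into a list of intent strings."""
    intent = list(attributes)
    for column, value in (filters or {}).items():
        # Assumed Lux convention: an equality filter is written as "column=value".
        intent.append(f"{column}={value}")
    return intent


intent = build_intent(["culmen_length_mm"], {"species": "Adelie"})
print(intent)  # ['culmen_length_mm', 'species=Adelie']

# With Lux imported and df loaded as above, this could be applied with:
# df.intent = intent
```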
If you look closely at the correlations within each species, culmen length and depth are positively correlated, even though the relationship appears negative when all penguins are pooled together. This reversal is a classic example of Simpson’s paradox.
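The pattern can be checked numerically by comparing the pooled correlation with the per-group correlations. The sketch below uses a small made-up frame (the values are chosen only to reproduce the effect and are not real penguin measurements) and a hypothetical helper, correlations_by_group. With the notebook's df, the same call would take 'culmen_length_mm', 'culmen_depth_mm', and 'species', assuming culmen_depth_mm is the name of the depth column in the CSV.

```python
import pandas as pd


def correlations_by_group(frame, x, y, group):
    """Return the pooled Pearson correlation and one correlation per group."""
    pooled = frame[x].corr(frame[y])
    per_group = {
        name: part[x].corr(part[y])
        for name, part in frame.groupby(group)
    }
    return pooled, per_group


# Made-up numbers, chosen only to reproduce the effect.
toy = pd.DataFrame({
    "group": ["a"] * 3 + ["b"] * 3,
    "length": [1.0, 2.0, 3.0, 6.0, 7.0, 8.0],
    "depth": [6.0, 7.0, 8.0, 1.0, 2.0, 3.0],
})

pooled, per_group = correlations_by_group(toy, "length", "depth", "group")
print(f"pooled: {pooled:.2f}")
for name, value in per_group.items():
    print(f"{name}: {value:.2f}")
```

In the toy data each group trends upward on its own, while the pooled correlation is negative because one group sits high on length and low on depth relative to the other.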
Finally, you can get a pretty clear separation between all three species by looking at flipper length versus culmen length.
Lux also makes it easy to export and share the generated visualizations. The visualizations can be exported to a static HTML file as follows:
df.save_as_html('file.html')
We can also access the set of recommendations generated for the dataframe via the recommendation property. The output is a dictionary, keyed by the name of the recommendation category.
df.recommendation
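Since the property behaves like a dictionary, its categories can be listed with ordinary Python. The helper below, summarize_recommendations, is a hypothetical name. It assumes only that each value supports len(), which should hold for Lux's visualization lists. A plain dictionary with placeholder entries stands in for df.recommendation so that the sketch runs without Lux.

```python
from typing import Dict, Mapping, Sized


def summarize_recommendations(recommendations: Mapping[str, Sized]) -> Dict[str, int]:
    """Map each recommendation category to the number of visualizations in it."""
    return {name: len(vis_list) for name, vis_list in recommendations.items()}


# Stand-in for df.recommendation; the entries are placeholders, not real Lux objects.
fake_recommendations = {
    "Enhance": ["vis0", "vis1", "vis2"],
    "Filter": ["vis0", "vis1"],
}

for name, count in summarize_recommendations(fake_recommendations).items():
    print(f"{name}: {count} visualizations")
```

With Lux loaded, passing df.recommendation in place of the stand-in would show which category names are available before indexing into one of them.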
We can export visualizations not only as HTML but also as code: either as Altair code for further editing or as a Vega-Lite specification. More details can be found in the documentation.
vis = df.recommendation["Enhance"][1]
vis
print(vis.to_altair())
print(vis.to_vegalite())
For more support and resources on Lux: