Altair is a declarative statistical visualization library for Python. It offers a powerful and concise visualization grammar that enables users to build a wide range of statistical visualizations quickly and simply.
Here are some benefits of using Altair for visualization:
Charts can easily be made interactive
Every visualization generated by Altair can be downloaded as a PNG file by clicking the three dots at the upper right of the chart
The syntax is consistently structured, which makes it easy to add features
(Note: the material here covers commonly useful visualization methods collected from https://altair-viz.github.io/. If it does not address your specific visualization problem, please visit the website for further reference.)
If you are using pip:
! pip install altair vega_datasets
If you are using conda:
! conda install -c conda-forge altair vega_datasets
Loading the Altair package is similar to loading other packages.
import pandas as pd
import numpy as np
import altair as alt
The fundamental object in Altair is the Chart, which takes a dataframe as a single argument.
alt.Chart(dataframe)
However, on its own, it will not draw anything because we have not yet told the chart to do anything with the data. We need to specify a mark to successfully draw the graph.
The mark property lets you specify how the data needs to be represented on the plot.
alt.Chart(dataframe).mark_point()
Once we have the data and have determined how it is represented, we need to specify which columns of the dataframe map to which visual properties. That is, we need to set up the x and y data, size, color, etc. This is where we use encodings.
alt.Chart(dataframe).mark_point().encode()
Now that we know the general structure of the command, we can explore each component more deeply.
The mark property lets you specify how the data needs to be represented on the plot. Following are the common mark properties provided by Altair:
(For detailed marks, visit https://altair-viz.github.io/user_guide/marks.html)
Mark Type | Command | Description |
---|---|---|
area | mark_area() | A filled area |
line | mark_line() | A line plot |
bar | mark_bar() | A bar plot |
point | mark_point() | A scatter plot with hollow points |
circle | mark_circle() | A scatter plot with solid points |
text | mark_text() | A scatter plot with points drawn as text |
square | mark_square() | A scatter plot with square points |
rect | mark_rect() | A heatmap |
box plot | mark_boxplot() | A box plot |
from vega_datasets import data
cars = data.cars()
alt.Chart(cars).mark_circle().encode(
x='Horsepower:Q',
y='Acceleration:Q'
)
df = pd.DataFrame({'x':np.array([-1,0,1,2,3]),'y':np.array([-1,0,1,2,3])**2})
alt.Chart(df).mark_line().encode(
x='x:Q',
y='y:Q'
)
alt.Chart(cars).mark_boxplot().encode(
y='Cylinders:O',
x='Acceleration:Q'
)
alt.Chart(cars).mark_rect().encode(
y='Cylinders:O',
x='Origin:O',
color = 'count():Q'
).properties(
width=200,
height=200
)
Following are some common encoding channels to use inside .encode():
(For detailed encoding channels, visit https://altair-viz.github.io/user_guide/encoding.html)
Encoding | Command | Description |
---|---|---|
x-axis | alt.X() | x-axis data |
y-axis | alt.Y() | y-axis data |
size | alt.Size() | size changes with data |
shape | alt.Shape() | shape changes with data |
color | alt.Color() | color changes with data |
Sometimes we need to deal with different types of data. For example, for continuous data, we want the color to change gradually, but for categorical data, we want the color to be distinct for each category.
The data type of each field can be specified inside encode():
alt.Chart(df).mark_point().encode(x = ':Q')
or
alt.Chart(df).mark_point().encode(alt.X(field = '', type = 'quantitative'))
Here are possible data types:
Data Type | Shorthand Code | Description |
---|---|---|
quantitative | Q | a continuous real-valued quantity |
ordinal | O | a discrete ordered quantity |
nominal | N | a discrete unordered category |
temporal | T | a time or date value |
geojson | G | a geographic shape |
base = alt.Chart(cars).mark_point().encode(
alt.X('Horsepower',
type = 'quantitative',
title = 'Horsepower'),
alt.Y('Acceleration',
type = 'quantitative',
title = 'Acceleration')
).properties(
width=150,
height=150
)
# horizontally concat three graphs
alt.hconcat(
base.encode(alt.Color('Cylinders', type = 'quantitative')).properties(title='quantitative'),
base.encode(alt.Color('Cylinders', type = 'ordinal')).properties(title='ordinal'),
base.encode(alt.Color('Cylinders', type = 'nominal')).properties(title='nominal')
)
A tooltip lets us show the details of a data point when hovering over it.
tooltip = [alt.Tooltip('')]
To make the graph an interactive plot, we can add .interactive() after all the marks and encodings.
alt.Chart(dataframe).mark_point().encode().interactive()
alt.Chart(cars).mark_point(size = 50).encode(
alt.X('Horsepower',
type = 'quantitative',
title = 'Horsepower'),
alt.Y('Acceleration',
type = 'quantitative',
title = 'Acceleration'),
alt.Color('Cylinders', type = 'ordinal'),
alt.Shape('Origin', type = 'nominal'),
# include horsepower, acceleration, cylinders and origin information for each point
tooltip = [alt.Tooltip('Horsepower'),
alt.Tooltip('Acceleration'),
alt.Tooltip('Cylinders'),
alt.Tooltip('Origin')
]
).interactive() # make the plot interactive
There are several ways to transform the original data during visualization. Here I have selected several useful transformations.
(For detailed transformation methods, visit https://altair-viz.github.io/user_guide/transform/index.html)
alt.Chart(cars).mark_area(interpolate='step').encode(
alt.X("Acceleration:Q",
axis = alt.Axis(title = "Acceleration"),
bin = alt.Bin(maxbins=10)),
alt.Y("count():Q",
axis = alt.Axis(title = "Count"),
stack=None)
)
There are two common conventions for storing data in a dataframe, sometimes called long-form and wide-form.
Altair’s grammar works best with long-form data, in which each row corresponds to a single observation along with its features. Hence, we can convert wide-form data to the long-form data used by Altair:
.transform_fold([features], as_=[key, value])
(For detailed information, visit: https://altair-viz.github.io/user_guide/transform/fold.html#user-guide-fold-transform)
wide_form = pd.DataFrame({'Date': ['2021-08-01', '2021-09-01', '2021-10-01'],
'Orange': [5,6,7],
'Apple': [3,4,4],
'Peach': [5,5.5,5]})
wide_form
 | Date | Orange | Apple | Peach |
---|---|---|---|---|
0 | 2021-08-01 | 5 | 3 | 5.0 |
1 | 2021-09-01 | 6 | 4 | 5.5 |
2 | 2021-10-01 | 7 | 4 | 5.0 |
alt.Chart(wide_form).transform_fold(
['Orange', 'Apple', 'Peach'],
as_=['fruit', 'price']
).mark_line().encode(
x='Date:T',
y='price:Q',
color='fruit:N'
)
np.random.seed(42)
df = pd.DataFrame({
'x': range(100),
'y': np.random.randn(100).cumsum()
})
chart = alt.Chart(df).mark_point().encode(
x='x',
y='y'
)
chart + chart.transform_loess('x', 'y').mark_line()
(For detailed ways to compound charts, visit https://altair-viz.github.io/user_guide/compound_charts.html)
Layered charts allow the user to overlay two different charts on the same set of axes.
(The example above shows one way to layer two graphs: a scatter plot and a line plot.)
alt.layer
from altair.expr import datum
stocks = data.stocks()
base = alt.Chart(stocks).encode(
x='date:T',
y='price:Q',
color='symbol:N'
).transform_filter(
datum.symbol == 'GOOG'
)
alt.layer(
base.mark_line(),
base.mark_point(),
base.mark_rule()
)
chart1 = alt.Chart(cars).mark_point().encode(
x='Horsepower:Q',
y='Miles_per_Gallon:Q',
color='Origin:N'
).properties(
height=300,
width=300
)
chart2 = alt.Chart(cars).mark_bar().encode(
x='count():Q',
y=alt.Y('Miles_per_Gallon:Q', bin=alt.Bin(maxbins=10)),
color='Origin:N'
).properties(
height=300,
width=100
)
chart1 | chart2
The .facet() method splits the data into facets:
alt.Chart(cars).mark_point().encode(
x='Horsepower:Q',
y='Acceleration:Q',
color='Origin:N'
).properties(
width=180,
height=180
).facet(
column='Origin:N'
)
(For detailed customization options, visit https://altair-viz.github.io/user_guide/customization.html)
Global configuration acts on an entire chart object.
Every chart type has a "config" property at the top level that acts as a sort of theme for the whole chart and all of its sub-charts. Here you can specify things like axes properties, mark properties, selection properties, and more. Altair allows you to access these through the configure_*() methods of the chart.
E.g. alt.Chart().mark_point().encode().configure_mark(opacity=0.2, color='red')
(Detailed ways to change global configuration: https://altair-viz.github.io/user_guide/configuration.html)
Local configuration acts on one mark of the chart.
If you would like to configure the look of the mark locally, such that the setting only affects the particular chart property you reference, this can be done via a local configuration setting. In the case of mark properties, the best approach is to set the property as an argument to the mark_*() method.
E.g. alt.Chart().mark_point(opacity=0.2, color='red').encode()
Unlike when using the global configuration, here it is possible to use the resulting chart as a layer or facet in a compound chart.
Local config settings like this one will always override global settings.
Encodings can also be used to set some chart properties, such as the axis scale.
Add a Scale property to the X encoding that specifies zero=False:
alt.Chart(cars).mark_point().encode(
alt.X('Horsepower:Q',
scale=alt.Scale(zero=False)
),
y='Acceleration:Q'
)
To specify exact axis limits, you can use the domain property of the scale.
alt.Chart(cars).mark_point().encode(
alt.X('Horsepower:Q',
scale=alt.Scale(domain=(40, 200))
),
y='Acceleration:Q'
)
One problem with rescaling is that some data may fall outside the domain, and we need to tell Altair what to do with this data. One option is to “clip” the data by setting the clip property of the mark to True.
alt.Chart(cars).mark_point(clip=True).encode(
alt.X('Horsepower:Q',
scale=alt.Scale(domain=(40, 200))
),
y='Acceleration:Q'
)
In addition to the properties and methods selected in this notebook, the website https://altair-viz.github.io/ documents a wide array of other ways to build advanced visualizations in Python using Altair. Its left column includes a detailed user guide that covers many of the problems that may come up during the visualization process.
alt.Chart(df).mark_point().encode(
alt.X(':Q',
scale=alt.Scale(zero= ),
title='' # x-axis name
),
alt.Y(':Q',
scale=alt.Scale(domain=(,)),
aggregate='',
title='' # y-axis name
),
alt.Color(':Q',
title='' # color legend name
),
alt.Size(':Q',
title='' # size legend name
),
alt.Shape(':N', # shape is meant for categorical data
title='' # shape legend name
),
tooltip = [alt.Tooltip('')] # columns to show when hovering over a data point
).properties(
width=,
height=,
title=''
).interactive()