This tutorial will guide you through the process of creating visualizations in Altair. For details on installing Altair or its underlying philosophy, please see the Altair Documentation.
This tutorial is written in the form of a Jupyter Notebook; we suggest downloading the notebook and following along, executing the code yourself as we go. To create Altair visualizations in the notebook, all that is required is to install the package and its dependencies and import the Altair namespace:
import altair as alt
Data in Altair is built around the Pandas DataFrame.
For the purposes of this tutorial, we'll start by importing Pandas and creating a simple DataFrame to visualize, with a categorical variable in column a and a numerical variable in column b:
import pandas as pd
data = pd.DataFrame({'a': list('CCCDDDEEE'),
                     'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})
data
 | a | b |
---|---|---|
0 | C | 2 |
1 | C | 7 |
2 | C | 4 |
3 | D | 1 |
4 | D | 2 |
5 | D | 6 |
6 | E | 8 |
7 | E | 4 |
8 | E | 7 |
In Altair, every dataset should be provided as a DataFrame, or as a URL referencing an appropriate dataset (see Defining Data).
The fundamental object in Altair is the Chart. It takes the DataFrame as a single argument:
chart = alt.Chart(data)
Fundamentally, a Chart is an object that knows how to emit a JSON dictionary representing the data and visualization encodings (see below), which can be sent to the notebook and rendered by the Vega-Lite JavaScript library.
Here is what that JSON looks like for the current chart (since the chart is not yet complete, we turn off chart validation):
chart.to_dict(validate=False)
{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}}, 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'}, '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json', 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2}, {'a': 'C', 'b': 7}, {'a': 'C', 'b': 4}, {'a': 'D', 'b': 1}, {'a': 'D', 'b': 2}, {'a': 'D', 'b': 6}, {'a': 'E', 'b': 8}, {'a': 'E', 'b': 4}, {'a': 'E', 'b': 7}]}}
At this point the specification contains only the data and the default configuration, but no visualization specification.
Next we can decide what sort of mark we would like to use to represent our data.
For example, we can choose the point mark to represent each row of the data as a point on the plot:
chart = alt.Chart(data).mark_point()
chart
The result is a visualization with one point per row in the data, though it is not particularly interesting: all the points are stacked right on top of each other! To see how this affects the specification, we can once again examine the dictionary representation:
chart.to_dict()
{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}}, 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'}, 'mark': 'point', '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json', 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2}, {'a': 'C', 'b': 7}, {'a': 'C', 'b': 4}, {'a': 'D', 'b': 1}, {'a': 'D', 'b': 2}, {'a': 'D', 'b': 6}, {'a': 'E', 'b': 8}, {'a': 'E', 'b': 4}, {'a': 'E', 'b': 7}]}}
Notice that now in addition to the data, the specification includes information about the mark type.
The next step is to add visual encodings (or encodings for short) to the chart. A visual encoding specifies how a given data column should be mapped onto the visual properties of the visualization. Some of the more frequently used visual encodings are x and y (position along each axis), color, opacity, shape, and size (appearance of the mark), and row and column (position within a grid of facet plots).
For a complete list of these encodings, see the Encodings section of the documentation.
Visual encodings can be created with the encode() method of the Chart object. For example, we can start by mapping the y axis of the chart to column a:
chart = alt.Chart(data).mark_point().encode(y='a')
chart
The result is a one-dimensional visualization representing the values taken on by a.
As above, we can view the JSON data generated for this visualization:
chart.to_dict()
{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}}, 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'}, 'mark': 'point', 'encoding': {'y': {'type': 'nominal', 'field': 'a'}}, '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json', 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2}, {'a': 'C', 'b': 7}, {'a': 'C', 'b': 4}, {'a': 'D', 'b': 1}, {'a': 'D', 'b': 2}, {'a': 'D', 'b': 6}, {'a': 'E', 'b': 8}, {'a': 'E', 'b': 4}, {'a': 'E', 'b': 7}]}}
The result is the same as above with the addition of the 'encoding' key, which specifies the visualization channel (y), the name of the field (a), and the type of the variable (nominal).
Altair is able to automatically determine the type of the variable using built-in heuristics. Altair and Vega-Lite support four primitive data types:
Data Type | Code | Description |
---|---|---|
quantitative | Q | Numerical quantity (real-valued) |
nominal | N | Name / Unordered categorical |
ordinal | O | Ordered categorical |
temporal | T | Date/time |
You can set the data type of a column explicitly by appending a one-letter code to the column name, separated by a colon:
alt.Chart(data).mark_point().encode(y='a:N')
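To give a feel for what a type heuristic might look like, here is a deliberately simplified sketch that guesses a type from a column's Pandas dtype. The function guess_vegalite_type is hypothetical and is not Altair's actual implementation, which may handle more cases (ordered categoricals, for instance).

```python
import pandas as pd
from pandas.api import types as pdt


def guess_vegalite_type(series: pd.Series) -> str:
    """Rough dtype-based guess; NOT Altair's real inference code."""
    # Booleans count as numeric in Pandas, so check them first.
    if pdt.is_bool_dtype(series):
        return "nominal"
    if pdt.is_numeric_dtype(series):
        return "quantitative"
    if pdt.is_datetime64_any_dtype(series):
        return "temporal"
    # Strings and anything else are treated as unordered categories.
    return "nominal"


data = pd.DataFrame({"a": list("CCCDDDEEE"),
                     "b": [2, 7, 4, 1, 2, 6, 8, 4, 7]})

for column in data.columns:
    print(column, guess_vegalite_type(data[column]))
```

A heuristic like this cannot tell an ordinal column from a nominal or quantitative one, which is one reason the explicit type codes are useful.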
The visualization can be made more interesting by adding another channel to the encoding: let's encode column b as the x position:
alt.Chart(data).mark_point().encode(
    y='a',
    x='b'
)
With two visual channels encoded, we can see the raw data points in the DataFrame. A different mark type can be chosen using a different mark_*() method, such as mark_bar():
alt.Chart(data).mark_bar().encode(
    alt.Y('a'),
    alt.X('b')
)
Notice that we have used a slightly different syntax for specifying the channels, using classes (alt.X and alt.Y) passed as positional arguments. These classes allow additional arguments to be passed to each channel, as we will see below.
Here are some of the more commonly used mark_*() methods supported in Altair and Vega-Lite; for more detail see Marks in the Altair documentation:
Method |
---|
mark_area() |
mark_bar() |
mark_circle() |
mark_line() |
mark_point() |
mark_rule() |
mark_square() |
mark_text() |
mark_tick() |
Altair and Vega-Lite also support a variety of built-in data transformations, such as aggregation. The easiest way to specify such aggregations is through a string-function syntax in the argument to the column name. For example, here we will plot not all the values, but a single point representing the mean of the x-values for a given y-value:
alt.Chart(data).mark_point().encode(
    y='a',
    x='mean(b)'
)
Conceptually, this is equivalent to the following groupby operation:
data.groupby('a').mean()
a | b |
---|---|
C | 4.333333 |
D | 3.000000 |
E | 6.333333 |
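The same numbers can be checked by hand. As a sketch using only the standard library, the code below groups the values of b by the label in a and averages each group:

```python
from collections import defaultdict
from statistics import mean

a = list("CCCDDDEEE")
b = [2, 7, 4, 1, 2, 6, 8, 4, 7]

# Collect the b values that belong to each category in a.
groups = defaultdict(list)
for key, value in zip(a, b):
    groups[key].append(value)

# One mean per category, mirroring the grouped aggregation.
means = {key: mean(values) for key, values in groups.items()}
for key, value in means.items():
    print(key, round(value, 6))
```

Each category contributes three values, so the means are 13/3, 9/3, and 19/3, matching the table above.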
More typically, aggregated values are displayed using bar charts.
Making this change is as simple as replacing mark_point() with mark_bar():
chart = alt.Chart(data).mark_bar().encode(
    y='a',
    x='mean(b)'
)
chart
As above, Altair's role in this visualization is converting the resulting object into an appropriate JSON dict. Here it is; note the new aggregate entry in the x encoding:
chart.to_dict()
{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}}, 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'}, 'mark': 'bar', 'encoding': {'x': {'type': 'quantitative', 'aggregate': 'mean', 'field': 'b'}, 'y': {'type': 'nominal', 'field': 'a'}}, '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json', 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2}, {'a': 'C', 'b': 7}, {'a': 'C', 'b': 4}, {'a': 'D', 'b': 1}, {'a': 'D', 'b': 2}, {'a': 'D', 'b': 6}, {'a': 'E', 'b': 8}, {'a': 'E', 'b': 4}, {'a': 'E', 'b': 7}]}}
Notice that Altair has taken the string 'mean(b)' and converted it to a mapping that includes field, type, and aggregate. The full shorthand syntax for the column names in Altair also includes the explicit type code separated by a colon:
x = alt.X('mean(b):Q')
x.to_dict()
{'type': 'quantitative', 'aggregate': 'mean', 'field': 'b'}
This shorthand is equivalent to spelling out these properties by name (average is a Vega-Lite synonym for mean):
x = alt.X('b', aggregate='average', type='quantitative')
x.to_dict()
{'type': 'quantitative', 'aggregate': 'average', 'field': 'b'}
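To make the shorthand concrete, here is an illustrative parser written with a regular expression. The names parse_shorthand_sketch, SHORTHAND, and TYPE_CODES are hypothetical; this is a simplified sketch of the idea and not the code Altair itself uses.

```python
import re

TYPE_CODES = {"Q": "quantitative", "N": "nominal",
              "O": "ordinal", "T": "temporal"}

# Either "aggregate(field)" or a bare field name, then an optional ":X" code.
SHORTHAND = re.compile(
    r"^(?:(?P<aggregate>\w+)\((?P<inner>\w*)\)|(?P<field>\w+))"
    r"(?::(?P<code>[QNOT]))?$"
)


def parse_shorthand_sketch(shorthand):
    """Split a shorthand string into field, aggregate, and type."""
    match = SHORTHAND.match(shorthand)
    if match is None:
        raise ValueError(f"cannot parse {shorthand!r}")
    result = {}
    if match["aggregate"]:
        result["aggregate"] = match["aggregate"]
    field = match["inner"] or match["field"]
    if field:
        result["field"] = field
    if match["code"]:
        result["type"] = TYPE_CODES[match["code"]]
    return result


print(parse_shorthand_sketch("mean(b):Q"))
print(parse_shorthand_sketch("a"))
```

When the type code is omitted, the sketch simply leaves out the type; Altair fills that gap by looking at the DataFrame column instead.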
This is one benefit of using the Altair API over writing the Vega-Lite spec from scratch: valid Vega-Lite specifications can be created very succinctly, with less boilerplate code.
To speed the process of data exploration, Altair (via Vega-Lite) makes some choices about default properties of the visualization.
Altair also provides an API to customize the look of the visualization. For example, we can use the X object we saw above to override the default x-axis title:
alt.Chart(data).mark_bar().encode(
    y='a',
    x=alt.X('mean(b)', axis=alt.Axis(title='Mean of quantity b'))
)
The properties of marks can be configured by passing keyword arguments to the mark_*() methods; for example, any named HTML color is supported:
alt.Chart(data).mark_bar(color='firebrick').encode(
    y='a',
    x=alt.X('mean(b)', axis=alt.Axis(title='Mean of quantity b'))
)
Similarly, we can set properties of the chart such as width and height using the properties() method:
chart = alt.Chart(data).mark_bar().encode(
    y='a',
    x=alt.X('average(b)', axis=alt.Axis(title='Average of b'))
).properties(
    width=400,
    height=300
)
chart
As above, we can inspect how these configuration options affect the resulting Vega-Lite specification:
chart.to_dict()
{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}}, 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'}, 'mark': 'bar', 'encoding': {'x': {'type': 'quantitative', 'aggregate': 'average', 'axis': {'title': 'Average of b'}, 'field': 'b'}, 'y': {'type': 'nominal', 'field': 'a'}}, 'height': 300, 'width': 400, '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json', 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2}, {'a': 'C', 'b': 7}, {'a': 'C', 'b': 4}, {'a': 'D', 'b': 1}, {'a': 'D', 'b': 2}, {'a': 'D', 'b': 6}, {'a': 'E', 'b': 8}, {'a': 'E', 'b': 4}, {'a': 'E', 'b': 7}]}}
To learn more about the various properties of chart objects, you can use Jupyter's help syntax:
alt.Chart?
Init signature: alt.Chart( data=Undefined, encoding=Undefined, mark=Undefined, width=Undefined, height=Undefined, **kwargs, ) Docstring: Create a basic Altair/Vega-Lite chart. Although it is possible to set all Chart properties as constructor attributes, it is more idiomatic to use methods such as ``mark_point()``, ``encode()``, ``transform_filter()``, ``properties()``, etc. See Altair's documentation for details and examples: http://altair-viz.github.io/. Attributes ---------- data : Data An object describing the data source mark : AnyMark A string describing the mark type (one of `"bar"`, `"circle"`, `"square"`, `"tick"`, `"line"`, * `"area"`, `"point"`, `"rule"`, `"geoshape"`, and `"text"`) or a MarkDef object. encoding : FacetedEncoding A key-value mapping between encoding channels and definition of fields. autosize : anyOf(AutosizeType, AutoSizeParams) Sets how the visualization size should be determined. If a string, should be one of `"pad"`, `"fit"` or `"none"`. Object values can additionally specify parameters for content sizing and automatic resizing. `"fit"` is only supported for single and layered views that don't use `rangeStep`. __Default value__: `pad` background : string CSS color property to use as the background of visualization. **Default value:** none (transparent) config : Config Vega-Lite configuration object. This property can only be defined at the top-level of a specification. description : string Description of this mark for commenting purpose. height : float The height of a visualization. name : string Name of the visualization for later reference. padding : Padding The default visualization padding, in pixels, from the edge of the visualization canvas to the data rectangle. If a number, specifies padding for all sides. If an object, the value should have the format `{"left": 5, "top": 5, "right": 5, "bottom": 5}` to specify padding for each side of the visualization. 
__Default value__: `5` projection : Projection An object defining properties of geographic projection. Works with `"geoshape"` marks and `"point"` or `"line"` marks that have a channel (one or more of `"X"`, `"X2"`, `"Y"`, `"Y2"`) with type `"latitude"`, or `"longitude"`. selection : Mapping(required=[]) A key-value mapping between selection names and definitions. title : anyOf(string, TitleParams) Title for the plot. transform : List(Transform) An array of data transformations such as filter and new field calculation. width : float The width of a visualization. File: ~/miniconda3/lib/python3.7/site-packages/altair/vegalite/v4/api.py Type: type Subclasses:
You can also read more in Altair's Configuration documentation.
Because Altair produces Vega-Lite specifications, it is relatively straightforward to export charts and publish them on the web as Vega-Lite plots.
All that is required is to load the Vega-Lite JavaScript library and pass it the JSON plot specification output by Altair.
For convenience, Altair provides a save() method, which will save any chart to HTML:
chart.save('chart.html')
!cat chart.html
<!DOCTYPE html> <html> <head> <style> .error { color: red; } </style> <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega@5"></script> <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-lite@4.0.0"></script> <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-embed@6"></script> </head> <body> <div id="vis"></div> <script> (function(vegaEmbed) { var spec = {"config": {"view": {"continuousWidth": 400, "continuousHeight": 300}}, "data": {"name": "data-347f1284ea3247c0f55cb966abbdd2d8"}, "mark": "bar", "encoding": {"x": {"type": "quantitative", "aggregate": "average", "axis": {"title": "Average of b"}, "field": "b"}, "y": {"type": "nominal", "field": "a"}}, "height": 300, "width": 400, "$schema": "https://vega.github.io/schema/vega-lite/v4.0.0.json", "datasets": {"data-347f1284ea3247c0f55cb966abbdd2d8": [{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4}, {"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6}, {"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}]}}; var embedOpt = {"mode": "vega-lite"}; function showError(el, error){ el.innerHTML = ('<div class="error" style="color:red;">' + '<p>JavaScript Error: ' + error.message + '</p>' + "<p>This usually means there's a typo in your chart specification. " + "See the javascript console for the full traceback.</p>" + '</div>'); throw error; } const el = document.getElementById('vis'); vegaEmbed("#vis", spec, embedOpt) .catch(error => showError(el, error)); })(vegaEmbed); </script> </body> </html>
Notice that the chart specification is passed to the vegaEmbed library in the spec variable; the rest of the code is a template that is constant regardless of the chart.
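To illustrate the template idea, here is a minimal sketch that fills a fixed HTML page with a specification using only the standard library. The names PAGE and render_page are hypothetical, the script URLs are copied from the saved file shown above, and the small hand-written spec stands in for the output of chart.to_dict(). This is not how save() is implemented internally; it only mirrors the structure of its output.

```python
import json
from string import Template

# Fixed page; only $spec changes from chart to chart.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
  <script src="https://cdn.jsdelivr.net/npm//vega@5"></script>
  <script src="https://cdn.jsdelivr.net/npm//vega-lite@4.0.0"></script>
  <script src="https://cdn.jsdelivr.net/npm//vega-embed@6"></script>
</head>
<body>
  <div id="vis"></div>
  <script>
    vegaEmbed("#vis", $spec, {"mode": "vega-lite"});
  </script>
</body>
</html>
""")


def render_page(spec: dict) -> str:
    """Serialize the spec to JSON and drop it into the page."""
    return PAGE.substitute(spec=json.dumps(spec))


# Hand-written stand-in for the dictionary a chart would produce.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.0.0.json",
    "data": {"values": [{"a": "C", "b": 2}, {"a": "D", "b": 1}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "b", "type": "quantitative"},
        "y": {"field": "a", "type": "nominal"},
    },
}

html = render_page(spec)
print(html)
```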
We can view the output in an iframe within the notebook (note that some online notebook viewers will not display iframes):
# Display IFrame in IPython
from IPython.display import IFrame
IFrame('chart.html', width=400, height=200)
Alternatively, you can use your web browser to open the file manually to confirm that it works: chart.html.
For more information on Altair, please refer to Altair's online documentation: http://altair-viz.github.io/
You can also see some of the example plots listed in the accompanying notebooks.