Tutorial: Introduction to Altair¶

This tutorial will guide you through the process of creating visualizations in Altair. For details on installing Altair or its underlying philosophy, please see the Altair documentation.

Outline:

The data
The Chart object
Data encodings and marks
Data transformation: Aggregation
Customizing your visualization
Publishing a visualization online

This tutorial is written in the form of a Jupyter Notebook; we suggest downloading the notebook and following along, executing the code yourself as we go. For creating Altair visualizations in the notebook, all that is required is to install the package and its dependencies and import the Altair namespace:

In [1]:

import altair as alt

The data¶

Data in Altair is built around the pandas DataFrame. For the purposes of this tutorial, we'll start by importing pandas and creating a simple DataFrame to visualize, with a categorical variable in column a and a numerical variable in column b:

In [2]:

import pandas as pd
data = pd.DataFrame({'a': list('CCCDDDEEE'),
                     'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})
data

Out[2]:

	a	b
0	C	2
1	C	7
2	C	4
3	D	1
4	D	2
5	D	6
6	E	8
7	E	4
8	E	7

In Altair, every dataset should be provided as a DataFrame, or as a URL referencing an appropriate dataset (see Defining Data).

The Chart object¶

The fundamental object in Altair is the Chart. It takes the DataFrame as its single argument:

In [3]:

chart = alt.Chart(data)

Fundamentally, a Chart is an object which knows how to emit a JSON dictionary representing the data and visualization encodings (see below), which can be sent to the notebook and rendered by the Vega-Lite JavaScript library.

Here is what that JSON looks like for the current chart (since the chart is not yet complete, we turn off chart validation):

In [4]:

chart.to_dict(validate=False)

Out[4]:

{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},
 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'},
 '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json',
 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2},
   {'a': 'C', 'b': 7},
   {'a': 'C', 'b': 4},
   {'a': 'D', 'b': 1},
   {'a': 'D', 'b': 2},
   {'a': 'D', 'b': 6},
   {'a': 'E', 'b': 8},
   {'a': 'E', 'b': 4},
   {'a': 'E', 'b': 7}]}}

At this point the specification contains only the data and the default configuration, but no visualization specification.

Chart Marks¶

Next we can decide what sort of mark we would like to use to represent our data. For example, we can choose the point mark to represent each row of the data as a point on the plot:

In [5]:

chart = alt.Chart(data).mark_point()
chart

Out[5]:

The result is a visualization with one point per row in the data, though it is not particularly interesting: all the points are stacked right on top of each other! To see how this affects the specification, we can once again examine the dictionary representation:

In [6]:

chart.to_dict()

Out[6]:

{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},
 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'},
 'mark': 'point',
 '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json',
 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2},
   {'a': 'C', 'b': 7},
   {'a': 'C', 'b': 4},
   {'a': 'D', 'b': 1},
   {'a': 'D', 'b': 2},
   {'a': 'D', 'b': 6},
   {'a': 'E', 'b': 8},
   {'a': 'E', 'b': 4},
   {'a': 'E', 'b': 7}]}}

Notice that now, in addition to the data, the specification includes information about the mark type.

Data encodings¶

The next step is to add visual encodings (or encodings for short) to the chart. A visual encoding specifies how a given data column should be mapped onto the visual properties of the visualization. Some of the more frequently used visual encodings are listed here:

X: x-axis value
Y: y-axis value
Color: color of the mark
Opacity: transparency/opacity of the mark
Shape: shape of the mark
Size: size of the mark
Row: row within a grid of facet plots
Column: column within a grid of facet plots

For a complete list of these encodings, see the Encodings section of the documentation.

Visual encodings can be created with the encode() method of the Chart object. For example, we can start by mapping the y axis of the chart to column a:

In [7]:

chart = alt.Chart(data).mark_point().encode(y='a')
chart

Out[7]:

The result is a one-dimensional visualization representing the values taken on by a. As above, we can view the JSON data generated for this visualization:

In [8]:

chart.to_dict()

Out[8]:

{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},
 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'},
 'mark': 'point',
 'encoding': {'y': {'type': 'nominal', 'field': 'a'}},
 '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json',
 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2},
   {'a': 'C', 'b': 7},
   {'a': 'C', 'b': 4},
   {'a': 'D', 'b': 1},
   {'a': 'D', 'b': 2},
   {'a': 'D', 'b': 6},
   {'a': 'E', 'b': 8},
   {'a': 'E', 'b': 4},
   {'a': 'E', 'b': 7}]}}

The result is the same as above with the addition of the 'encoding' key, which specifies the visualization channel (y), the name of the field (a), and the type of the variable (nominal).

Altair is able to automatically determine the type of the variable using built-in heuristics. Altair and Vega-Lite support four primitive data types:

Data Type	Code	Description
quantitative	Q	Numerical quantity (real-valued)
nominal	N	Name / Unordered categorical
ordinal	O	Ordered categorical
temporal	T	Date/time

You can set the data type of a column explicitly using a one letter code attached to the column name with a colon:

In [9]:

alt.Chart(data).mark_point().encode(y='a:N')

Out[9]:

The visualization can be made more interesting by adding another channel to the encoding: let's encode column b as the x position:

In [10]:

alt.Chart(data).mark_point().encode(
    y='a',
    x='b'
)

Out[10]:

With two visual channels encoded, we can see the raw data points in the DataFrame. A different mark type can be chosen using a different mark_*() method, such as mark_bar():

In [11]:

alt.Chart(data).mark_bar().encode(
    alt.Y('a'),
    alt.X('b')
)

Out[11]:

Notice that we have used a slightly different syntax for specifying the channels: classes (alt.X and alt.Y) passed as positional arguments. These classes allow additional arguments to be passed to each channel, as we will see below.

Here are some of the more commonly used mark_*() methods supported in Altair and Vega-Lite; for more detail see Marks in the Altair documentation:

Method
`mark_area()`
`mark_bar()`
`mark_circle()`
`mark_line()`
`mark_point()`
`mark_rule()`
`mark_square()`
`mark_text()`
`mark_tick()`

Data transformation: Aggregation¶

Altair and Vega-Lite also support a variety of built-in data transformations, such as aggregation. The easiest way to specify an aggregation is to wrap the column name in a function-like string. For example, here we will plot not all the values, but a single point representing the mean of the x-values for a given y-value:

In [12]:

alt.Chart(data).mark_point().encode(
    y='a',
    x='mean(b)'
)

Out[12]:

Conceptually, this is equivalent to the following groupby operation:

In [13]:

data.groupby('a').mean()

Out[13]:

	b
a
C	4.333333
D	3.000000
E	6.333333

More typically, aggregated values are displayed using bar charts. Making this change is as simple as replacing mark_point() with mark_bar():

In [14]:

chart = alt.Chart(data).mark_bar().encode(
    y='a',
    x='mean(b)'
)
chart

Out[14]:

As above, Altair's role in this visualization is converting the resulting object into an appropriate JSON dict. Here is the full dictionary, including the embedded data:

In [15]:

chart.to_dict()

Out[15]:

{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},
 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'},
 'mark': 'bar',
 'encoding': {'x': {'type': 'quantitative', 'aggregate': 'mean', 'field': 'b'},
  'y': {'type': 'nominal', 'field': 'a'}},
 '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json',
 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2},
   {'a': 'C', 'b': 7},
   {'a': 'C', 'b': 4},
   {'a': 'D', 'b': 1},
   {'a': 'D', 'b': 2},
   {'a': 'D', 'b': 6},
   {'a': 'E', 'b': 8},
   {'a': 'E', 'b': 4},
   {'a': 'E', 'b': 7}]}}

Notice that Altair has taken the string 'mean(b)' and converted it to a mapping that includes field, type, and aggregate. The full shorthand syntax for the column names in Altair also includes the explicit type code separated by a colon:

In [16]:

x = alt.X('mean(b):Q')
x.to_dict()

Out[16]:

{'type': 'quantitative', 'aggregate': 'mean', 'field': 'b'}

This shorthand is equivalent to spelling out these properties by name (Vega-Lite treats average as a synonym for mean):

In [17]:

x = alt.X('b', aggregate='average', type='quantitative')
x.to_dict()

Out[17]:

{'type': 'quantitative', 'aggregate': 'average', 'field': 'b'}

This is one benefit of using the Altair API over writing the Vega-Lite spec from scratch: valid Vega-Lite specifications can be created very succinctly, with less boilerplate code.

Customizing your visualization¶

To speed the process of data exploration, Altair (via Vega-Lite) makes some choices about default properties of the visualization. Altair also provides an API to customize the look of the visualization. For example, we can use the X object we saw above to override the default x-axis title:

In [18]:

alt.Chart(data).mark_bar().encode(
    y='a',
    x=alt.X('mean(b)', axis=alt.Axis(title='Mean of quantity b'))
)

Out[18]:

The properties of marks can be configured by passing keyword arguments to the mark_*() methods; for example, any named HTML color is supported:

In [19]:

alt.Chart(data).mark_bar(color='firebrick').encode(
    y='a',
    x=alt.X('mean(b)', axis=alt.Axis(title='Mean of quantity b'))
)

Out[19]:

Similarly, we can set properties of the chart such as width and height using the properties() method:

In [20]:

chart = alt.Chart(data).mark_bar().encode(
    y='a',
    x=alt.X('average(b)', axis=alt.Axis(title='Average of b'))
).properties(
    width=400,
    height=300
)

chart

Out[20]:

As above, we can inspect how these configuration options affect the resulting Vega-Lite specification:

In [21]:

chart.to_dict()

Out[21]:

{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},
 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'},
 'mark': 'bar',
 'encoding': {'x': {'type': 'quantitative',
   'aggregate': 'average',
   'axis': {'title': 'Average of b'},
   'field': 'b'},
  'y': {'type': 'nominal', 'field': 'a'}},
 'height': 300,
 'width': 400,
 '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json',
 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2},
   {'a': 'C', 'b': 7},
   {'a': 'C', 'b': 4},
   {'a': 'D', 'b': 1},
   {'a': 'D', 'b': 2},
   {'a': 'D', 'b': 6},
   {'a': 'E', 'b': 8},
   {'a': 'E', 'b': 4},
   {'a': 'E', 'b': 7}]}}

To learn more about the various properties of chart objects, you can use Jupyter's help syntax:

In [22]:

alt.Chart?

Init signature:
alt.Chart(
    data=Undefined,
    encoding=Undefined,
    mark=Undefined,
    width=Undefined,
    height=Undefined,
    **kwargs,
)
Docstring:     
Create a basic Altair/Vega-Lite chart.

Although it is possible to set all Chart properties as constructor attributes,
it is more idiomatic to use methods such as ``mark_point()``, ``encode()``,
``transform_filter()``, ``properties()``, etc. See Altair's documentation
for details and examples: http://altair-viz.github.io/.

Attributes
----------
data : Data
    An object describing the data source
mark : AnyMark
    A string describing the mark type (one of `"bar"`, `"circle"`, `"square"`, `"tick"`,
     `"line"`, * `"area"`, `"point"`, `"rule"`, `"geoshape"`, and `"text"`) or a
     MarkDef object.
encoding : FacetedEncoding
    A key-value mapping between encoding channels and definition of fields.
autosize : anyOf(AutosizeType, AutoSizeParams)
    Sets how the visualization size should be determined. If a string, should be one of
    `"pad"`, `"fit"` or `"none"`. Object values can additionally specify parameters for
    content sizing and automatic resizing. `"fit"` is only supported for single and
    layered views that don't use `rangeStep`.  __Default value__: `pad`
background : string
    CSS color property to use as the background of visualization.

    **Default value:** none (transparent)
config : Config
    Vega-Lite configuration object.  This property can only be defined at the top-level
    of a specification.
description : string
    Description of this mark for commenting purpose.
height : float
    The height of a visualization.
name : string
    Name of the visualization for later reference.
padding : Padding
    The default visualization padding, in pixels, from the edge of the visualization
    canvas to the data rectangle.  If a number, specifies padding for all sides. If an
    object, the value should have the format `{"left": 5, "top": 5, "right": 5,
    "bottom": 5}` to specify padding for each side of the visualization.  __Default
    value__: `5`
projection : Projection
    An object defining properties of geographic projection.  Works with `"geoshape"`
    marks and `"point"` or `"line"` marks that have a channel (one or more of `"X"`,
    `"X2"`, `"Y"`, `"Y2"`) with type `"latitude"`, or `"longitude"`.
selection : Mapping(required=[])
    A key-value mapping between selection names and definitions.
title : anyOf(string, TitleParams)
    Title for the plot.
transform : List(Transform)
    An array of data transformations such as filter and new field calculation.
width : float
    The width of a visualization.
File:           ~/miniconda3/lib/python3.7/site-packages/altair/vegalite/v4/api.py
Type:           type
Subclasses:

You can also read more in Altair's Configuration documentation.

Publishing a visualization online¶

Because Altair produces Vega-Lite specifications, it is relatively straightforward to export charts and publish them on the web as Vega-Lite plots. All that is required is to load the Vega-Lite JavaScript library and pass it the JSON plot specification output by Altair. For convenience, Altair provides a save() method, which will save any chart to HTML:

In [23]:

chart.save('chart.html')

In [24]:

!cat chart.html

<!DOCTYPE html>
<html>
<head>
  <style>
    .error {
        color: red;
    }
  </style>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega@5"></script>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-lite@4.0.0"></script>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-embed@6"></script>
</head>
<body>
  <div id="vis"></div>
  <script>
    (function(vegaEmbed) {
      var spec = {"config": {"view": {"continuousWidth": 400, "continuousHeight": 300}}, "data": {"name": "data-347f1284ea3247c0f55cb966abbdd2d8"}, "mark": "bar", "encoding": {"x": {"type": "quantitative", "aggregate": "average", "axis": {"title": "Average of b"}, "field": "b"}, "y": {"type": "nominal", "field": "a"}}, "height": 300, "width": 400, "$schema": "https://vega.github.io/schema/vega-lite/v4.0.0.json", "datasets": {"data-347f1284ea3247c0f55cb966abbdd2d8": [{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4}, {"a": "D", "b": 1}, {"a": "D", "b": 2}, {"a": "D", "b": 6}, {"a": "E", "b": 8}, {"a": "E", "b": 4}, {"a": "E", "b": 7}]}};
      var embedOpt = {"mode": "vega-lite"};

      function showError(el, error){
          el.innerHTML = ('<div class="error" style="color:red;">'
                          + '<p>JavaScript Error: ' + error.message + '</p>'
                          + "<p>This usually means there's a typo in your chart specification. "
                          + "See the javascript console for the full traceback.</p>"
                          + '</div>');
          throw error;
      }
      const el = document.getElementById('vis');
      vegaEmbed("#vis", spec, embedOpt)
        .catch(error => showError(el, error));
    })(vegaEmbed);

  </script>
</body>
</html>

Notice that the chart specification is passed to the vegaEmbed library in the spec variable; the rest of the code is a template that is constant regardless of the chart.

We can view the output in an iframe within the notebook (note that some online notebook viewers will not display iframes):

In [25]:

# Display IFrame in IPython
from IPython.display import IFrame
IFrame('chart.html', width=400, height=200)

Out[25]:

Alternatively, you can use your web browser to open the file manually to confirm that it works: chart.html.
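The split between a fixed template and a variable specification, noted above, can be sketched with the standard library alone. The code below is a simplified stand-in for the page that save() writes, not Altair's actual template: the PAGE constant and the render_page helper are hypothetical names, error handling is omitted, and the specification is a small hand-written Vega-Lite dictionary.

```python
import json
from string import Template

# Simplified page: three script tags load the libraries, one embeds the chart.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
  <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-lite@4.0.0"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
  <div id="vis"></div>
  <script>
    vegaEmbed("#vis", $spec, {"mode": "vega-lite"});
  </script>
</body>
</html>
""")


def render_page(spec):
    """Return an HTML page embedding the given specification (hypothetical helper)."""
    return PAGE.substitute(spec=json.dumps(spec))


# A small hand-written Vega-Lite specification used as the variable part.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.0.0.json",
    "data": {"values": [{"a": "C", "b": 2}, {"a": "D", "b": 1}, {"a": "E", "b": 8}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "b", "type": "quantitative"},
        "y": {"field": "a", "type": "nominal"},
    },
}

html = render_page(spec)
```

A dictionary returned by chart.to_dict() could be passed to render_page in place of the hand-written one.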

Learning More¶

For more information on Altair, please refer to Altair's online documentation: http://altair-viz.github.io/

You can also see some of the example plots listed in the accompanying notebooks.