Data Visualization in Python: Altair¶

Introduction¶

Altair is a declarative statistical visualization library for Python. It offers a powerful and concise visualization grammar that enables users to build a wide range of statistical visualizations quickly and simply.

Here are some benefits of using Altair for visualization:

Graphs can easily be made interactive.
Every visualization generated by Altair can be downloaded as a PNG file by clicking the three dots at the upper right of the graph.
The syntax is consistently structured, which makes it easy to add features.

(Note: the material here is a selection of commonly useful visualization methods collected from https://altair-viz.github.io/. If it does not cover a specific visualization problem, please visit the website for further reference.)

Installation¶

If you are using pip: ! pip install altair vega_datasets

If you are using conda: ! conda install -c conda-forge altair vega_datasets

Loading Package¶

Loading the Altair package is similar to loading other packages.

In [1]:

import pandas as pd
import numpy as np
import altair as alt

Basic Graph Format¶

Chart¶

The fundamental object in Altair is the Chart, which takes a dataframe as a single argument.

alt.Chart(dataframe)

However, on its own, it will not draw anything because we have not yet told the chart what to do with the data. We need to specify a mark to successfully draw the graph.

Marks¶

The mark property lets you specify how the data needs to be represented on the plot.

alt.Chart(dataframe).mark_point()

Encodings¶

Once we have the data and have decided how it is represented, we need to specify which columns of the dataframe drive which visual properties. That is, we need to set up the x and y data, size, color, etc. This is where we use encodings.

alt.Chart(dataframe).mark_point().encode()

Now that we know the general structure of the command, we can explore each category in more depth.

1. Marks¶

The mark property lets you specify how the data needs to be represented on the plot. Following are the common mark properties provided by Altair:

(For detailed marks, visit https://altair-viz.github.io/user_guide/marks.html)

Mark Type	Command	Description
area	`mark_area()`	A filled area
line	`mark_line()`	A line plot
bar	`mark_bar()`	A bar plot
point	`mark_point()`	A scatter plot with hollow point
circle	`mark_circle()`	A scatter plot with solid point
text	`mark_text()`	A scatter plot with point as text
square	`mark_square()`	A scatter plot with square point
rect	`mark_rect()`	A heatmap
box plot	`mark_boxplot()`	A box plot

Example 1. Scatter Plot of Acceleration vs. Horsepower in Cars Dataset¶

In [2]:

from vega_datasets import data
cars = data.cars()
alt.Chart(cars).mark_circle().encode(
    x='Horsepower:Q',
    y='Acceleration:Q'
)

Out[2]:

Example 2. Line Plot¶

In [3]:

df = pd.DataFrame({'x':np.array([-1,0,1,2,3]),'y':np.array([-1,0,1,2,3])**2})
alt.Chart(df).mark_line().encode(
    x='x:Q',
    y='y:Q'
)

Out[3]:

Example 3. Box Plot of Acceleration vs. Cylinders in Cars Dataset¶

In [4]:

alt.Chart(cars).mark_boxplot().encode(
    y='Cylinders:O',
    x='Acceleration:Q'
)

Out[4]:

Example 4. Heatmap of Cylinders vs. Origin in Cars Dataset¶

In [5]:

alt.Chart(cars).mark_rect().encode(
    y='Cylinders:O',
    x='Origin:O',
    color = 'count():Q'
).properties(
    width=200,
    height=200
)

Out[5]:

2. Encodings¶

Following are some common encoding parameters to put inside .encode():

(For detailed encoding channels, visit https://altair-viz.github.io/user_guide/encoding.html)

Encoding	Command	Description
x-axis	`alt.X()`	x-axis data
y-axis	`alt.Y()`	y-axis data
size	`alt.Size()`	size change with data
shape	`alt.Shape()`	shape change with data
color	`alt.Color()`	color change with data

2.1 Encoding Data Type¶

Sometimes we need to deal with different types of data. For example, for continuous data, we want the color to change gradually, but for categorical data, we want the color to be distinct for each category.

There are two equivalent ways to specify the data type inside .encode(). Append a shorthand type code to the column name:

alt.Chart(df).mark_point().encode(x = ':Q')

or pass the type explicitly:

alt.Chart(df).mark_point().encode(alt.X(field = '', type = 'quantitative'))

Here are possible data types:

Data Type	Shorthand Code	Description
quantitative	`Q`	a continuous real-valued quantity
ordinal	`O`	a discrete ordered quantity
nominal	`N`	a discrete unordered category
temporal	`T`	a time or date value
geojson	`G`	a geographic shape

Example 1. Scatter plot of the acceleration vs. horsepower colored by number of cylinders in three different ways, with the color encoded as a quantitative, ordinal, and nominal type.¶

In [6]:

base = alt.Chart(cars).mark_point().encode(
    alt.X('Horsepower',
          type = 'quantitative',
          title = 'Horsepower'),
    alt.Y('Acceleration',
          type = 'quantitative',
          title = 'Acceleration')
).properties(
    width=150,
    height=150
)

# horizontally concat three graphs
alt.hconcat(
   base.encode(alt.Color('Cylinders', type = 'quantitative')).properties(title='quantitative'),
   base.encode(alt.Color('Cylinders', type = 'ordinal')).properties(title='ordinal'),
   base.encode(alt.Color('Cylinders', type = 'nominal')).properties(title='nominal')
)

Out[6]:

2.2 Tooltip¶

A tooltip shows the details of a data point when the cursor hovers over it.

tooltip = [alt.Tooltip('')]

2.3 Interactive¶

To make the graph an interactive plot (pan and zoom), we can add .interactive() after all the marks and encodings.

alt.Chart(dataframe).mark_point().encode().interactive()

Example. Interactive scatter plot of Acceleration vs. Horsepower in cars data, colored by number of cylinders and shaped by origin.¶

In [7]:

alt.Chart(cars).mark_point(size = 50).encode(
    alt.X('Horsepower',
          type = 'quantitative',
          title = 'Horsepower'), 
    alt.Y('Acceleration',
          type = 'quantitative',
          title = 'Acceleration'), 
    alt.Color('Cylinders', type = 'ordinal'),
    alt.Shape('Origin', type = 'nominal'),
    # show horsepower, acceleration, cylinders and origin for each point
    tooltip = [alt.Tooltip('Horsepower'),
               alt.Tooltip('Acceleration'),
               alt.Tooltip('Cylinders'),
               alt.Tooltip('Origin')
              ]
).interactive() # make the plot interactive

Out[7]:

3. Data Transformation¶

There are several ways to transform the original data during visualization. A few useful transformations are selected here.

(For detailed transformation methods, visit https://altair-viz.github.io/user_guide/transform/index.html)

3.1 Bin Transform (Histogram)¶

Example. Histogram of Acceleration distribution in Cars dataset¶

In [8]:

alt.Chart(cars).mark_area(interpolate='step').encode(
    alt.X("Acceleration:Q",
          axis = alt.Axis(title = "Acceleration"),
          bin = alt.Bin(maxbins=10)),
    alt.Y("count():Q",
          axis = alt.Axis(title = "Count"), 
          stack=None)
)

Out[8]:

3.2 Convert wide-form data into long-form data¶

There are two common conventions for storing data in a dataframe, sometimes called long-form and wide-form.

wide-form data has one row per independent variable, with different features recorded in different columns.
long-form data has one row per observation, with features recorded within the table as values.

Altair’s grammar works best with long-form data, in which each row corresponds to a single observation along with its features. Hence, we can convert wide-form data to the long-form data used by Altair with the fold transform:

.transform_fold([features], as_=[key, value])

(For detailed information, visit: https://altair-viz.github.io/user_guide/transform/fold.html#user-guide-fold-transform)

Example. Wide-form Data of Daily Fruit Price¶

In [9]:

wide_form = pd.DataFrame({'Date': ['2021-08-01', '2021-09-01', '2021-10-01'],
                          'Orange': [5,6,7],
                          'Apple': [3,4,4],
                          'Peach': [5,5.5,5]})
wide_form

Out[9]:

	Date	Orange	Apple	Peach
0	2021-08-01	5	3	5.0
1	2021-09-01	6	4	5.5
2	2021-10-01	7	4	5.0

In [10]:

alt.Chart(wide_form).transform_fold(
    ['Orange', 'Apple', 'Peach'],
    as_=['fruit', 'price']
).mark_line().encode(
    x='Date:T',
    y='price:Q',
    color='fruit:N'
)

Out[10]:
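
To see what the fold produces, the same reshaping can be done up front in pandas. This is a sketch using `DataFrame.melt` as a stand-in for the fold transform (it is not what Altair runs internally); the column names `fruit` and `price` match the `as_` argument used above.

```python
import pandas as pd

# Same wide-form table as in the example above
wide_form = pd.DataFrame({'Date': ['2021-08-01', '2021-09-01', '2021-10-01'],
                          'Orange': [5, 6, 7],
                          'Apple': [3, 4, 4],
                          'Peach': [5, 5.5, 5]})

# One row per (Date, fruit) pair, with the price stored as a value
long_form = wide_form.melt(
    id_vars='Date',
    value_vars=['Orange', 'Apple', 'Peach'],
    var_name='fruit',
    value_name='price',
)
print(long_form)
```

The resulting `long_form` table can be passed to `alt.Chart` directly, with `x='Date:T'`, `y='price:Q'` and `color='fruit:N'`, without calling `transform_fold`.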

3.3 LOESS transform¶

The LOESS transform (LOcally Estimated Scatterplot Smoothing) uses a locally-estimated regression to produce a trend line. LOESS performs a sequence of local weighted regressions over a sliding window of nearest-neighbor points.

Example.¶

In [11]:

np.random.seed(42)

df = pd.DataFrame({
    'x': range(100),
    'y': np.random.randn(100).cumsum()
})

chart = alt.Chart(df).mark_point().encode(
    x='x',
    y='y'
)

chart + chart.transform_loess('x', 'y').mark_line()

Out[11]:

4. Compound Charts¶

(For detailed ways to compound charts, visit https://altair-viz.github.io/user_guide/compound_charts.html)

4.1 Layered Charts¶

Layered charts allow the user to overlay two or more charts on the same set of axes.

(The LOESS example above shows one way to layer two graphs, a scatter plot and a line plot, using the + operator.)

Example. Using `alt.layer`¶

In [12]:

from altair.expr import datum
stocks = data.stocks()

base = alt.Chart(stocks).encode(
    x='date:T',
    y='price:Q',
    color='symbol:N'
).transform_filter(
    datum.symbol == 'GOOG'
)
alt.layer(
  base.mark_line(),
  base.mark_point(),
  base.mark_rule()
)

Out[12]:

4.2 Horizontal/Vertical Concatenation¶

Two plots can be displayed side by side using the alt.hconcat() function or the | operator.

Similarly, two plots can be vertically combined via the vconcat() function or the & operator.

Example. Horizontal Concatenation of Cars Data¶

In [13]:

chart1 = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
).properties(
    height=300,
    width=300
)

chart2 = alt.Chart(cars).mark_bar().encode(
    x='count():Q',
    y=alt.Y('Miles_per_Gallon:Q', bin=alt.Bin(maxbins=10)),
    color='Origin:N'
).properties(
    height=300,
    width=100
)

chart1 | chart2

Out[13]:

4.3 Faceted Charts¶

The .facet() method of a chart splits the data into facets, drawing one panel per value of the faceting column:

In [14]:

alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Acceleration:Q',
    color='Origin:N'
).properties(
    width=180,
    height=180
).facet(
    column='Origin:N'
)

Out[14]:

5. Customizing Visualizations¶

(https://altair-viz.github.io/user_guide/customization.html)

5.1 Global Config¶

Acts on an entire chart object.

Every chart type has a "config" property at the top level that acts as a sort of theme for the whole chart and all of its sub-charts. Here you can specify things like axes properties, mark properties, selection properties, and more. Altair allows you to access these through the configure_*() methods of the chart.

E.g. alt.Chart().mark_point().encode().configure_mark(opacity=0.2, color='red')

By design, configurations will affect every mark used within the chart.
The global configuration is only permissible at the top-level; so, for example, if you tried to layer the above chart with another, it would result in an error.

(Detailed ways to change global configuration: https://altair-viz.github.io/user_guide/configuration.html)

5.2 Local Config¶

Acts on one mark of the chart.

If you would like to configure the look of the mark locally, such that the setting only affects the particular chart property you reference, this can be done via a local configuration setting. In the case of mark properties, the best approach is to set the property as an argument to the mark_*() method.

E.g. alt.Chart().mark_point(opacity=0.2, color='red').encode()

Unlike when using the global configuration, here it is possible to use the resulting chart as a layer or facet in a compound chart.
Local config settings like this one will always override global settings.

5.3 Encoding channels¶

Encoding channels can also be used to set some chart properties.

Encoding settings will always override local or global configuration settings.

5.4 Adjusting Axis Limits¶

5.4.1. Not Starting from Zero¶

Add a Scale property to the X encoding that specifies zero=False:

In [15]:

alt.Chart(cars).mark_point().encode(
    alt.X('Horsepower:Q',
        scale=alt.Scale(zero=False)
    ),
    y='Acceleration:Q'
)

Out[15]:

5.4.2. Rescale¶

To specify exact axis limits, you can use the domain property of the scale.

In [16]:

alt.Chart(cars).mark_point().encode(
    alt.X('Horsepower:Q',
        scale=alt.Scale(domain=(40, 200))
    ),
    y='Acceleration:Q'
)

Out[16]:

One problem with rescaling is that some data may fall outside the domain and be drawn beyond the axes, so we need to tell Altair what to do with this data. One option is to “clip” the data by setting the clip property of the mark to True.

In [17]:

alt.Chart(cars).mark_point(clip=True).encode(
    alt.X('Horsepower:Q',
        scale=alt.Scale(domain=(40, 200))
    ),
    y='Acceleration:Q'
)

Out[17]:

In addition to the properties and methods selected in this notebook, the website https://altair-viz.github.io/ describes a wide array of further options for advanced visualizations in Python using Altair. Its left column links to a detailed user guide that covers many of the problems that may arise during the visualization process.

Possible Template to Use¶

In [ ]:

# Fill in the blanks (column names, scale values, titles, sizes) before running
alt.Chart(df).mark_point().encode(
    alt.X(':Q',
          scale=alt.Scale(zero= ),
          title='' # x-axis name
         ),
    alt.Y(':Q',
          scale=alt.Scale(domain=(,)),
          aggregate='',
          title='' # y-axis name
         ),
    alt.Color(':Q',
              title='' # color legend name
             ),
    alt.Size(':Q',
             title='' # size legend name
            ),
    alt.Shape(':N',
              title='' # shape legend name
             ),
    tooltip = [alt.Tooltip('')] # columns to show when hovering over a data point
).properties(
    width=,
    height=,
    title=''
).interactive()