Static visualizations are limited in how much information they can show. To move beyond these limitations, we can create animated and/or interactive visualizations. Animations make it possible for our visualizations to tell a story through movement of the plot components (e.g., bars, points, lines). Interactivity makes it possible to explore the data visually by hiding and displaying information based on user interest. In this section, we will focus on creating animated visualizations using Matplotlib before moving on to create interactive visualizations in the next section.
In the previous section, we made a couple of visualizations to help us understand the number of Stack Overflow questions per library and how it changed over time. However, each of these came with some limitations.
We made a bar plot that captured the total number of questions per library, but it couldn't show us the growth in pandas questions over time (or how the growth rate changed over time):
We also made an area plot showing the number of questions per day over time for the top 4 libraries, but by limiting the libraries shown we lost some information:
Both of these visualizations gave us insight into the dataset. For example, we could see that pandas has by far the largest number of questions and has been growing at a faster rate than the other libraries. While this comes from studying the plots, an animation would make this much more obvious and, at the same time, capture the exponential growth in pandas questions that helped pandas overtake both Matplotlib and NumPy in cumulative questions.
Let's use Matplotlib to create an animated bar plot of cumulative questions over time to show this.
We will start by reading in our Stack Overflow dataset, but this time, we will calculate the total number of questions per month and then calculate the cumulative value over time:
import pandas as pd
questions_per_library = pd.read_csv(
'../data/stackoverflow.zip', parse_dates=True, index_col='creation_date'
).loc[:,'pandas':'bokeh'].resample('1M').sum().cumsum().reindex(
pd.date_range('2008-08', '2021-10', freq='M')
).fillna(0)
questions_per_library.tail()
 | pandas | matplotlib | numpy | seaborn | geopandas | geoviews | altair | yellowbrick | vega | holoviews | hvplot | bokeh |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-05-31 | 200734.0 | 57853.0 | 89812.0 | 6855.0 | 1456.0 | 57.0 | 716.0 | 46.0 | 532.0 | 513.0 | 84.0 | 4270.0 |
2021-06-30 | 205065.0 | 58602.0 | 91026.0 | 7021.0 | 1522.0 | 57.0 | 760.0 | 48.0 | 557.0 | 521.0 | 88.0 | 4308.0 |
2021-07-31 | 209235.0 | 59428.0 | 92254.0 | 7174.0 | 1579.0 | 62.0 | 781.0 | 50.0 | 572.0 | 528.0 | 89.0 | 4341.0 |
2021-08-31 | 213410.0 | 60250.0 | 93349.0 | 7344.0 | 1631.0 | 62.0 | 797.0 | 52.0 | 589.0 | 541.0 | 92.0 | 4372.0 |
2021-09-30 | 214919.0 | 60554.0 | 93797.0 | 7414.0 | 1652.0 | 63.0 | 804.0 | 54.0 | 598.0 | 542.0 | 92.0 | 4386.0 |
Source: Stack Exchange Network
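To see what each step in that method chain contributes, here is a minimal sketch on made-up data. The dates, counts, and the some_library column are invented for illustration and are not from the Stack Overflow dataset. It also passes pd.offsets.MonthEnd() instead of the 'M' alias, on the assumption that you may be running a newer pandas release where that alias is deprecated:

```python
import pandas as pd

# Hypothetical daily question counts for one made-up library
daily = pd.DataFrame(
    {'some_library': [1, 2, 3, 4]},
    index=pd.to_datetime(['2021-02-10', '2021-02-20', '2021-03-05', '2021-04-15'])
)

month_end = pd.offsets.MonthEnd()

# Total per month, then a running total across months
monthly_totals = daily.resample(month_end).sum()
cumulative = monthly_totals.cumsum()

# Reindex to a longer monthly range: months before the first question become NaN,
# which fillna(0) turns into "no questions yet"
full_range = pd.date_range('2021-01', '2021-05', freq=month_end)
cumulative = cumulative.reindex(full_range).fillna(0)

print(cumulative)
```

The reindexing step is what guarantees that every frame of the animation has a row, even for months before a library received its first question.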
To create animations with Matplotlib, we will be using the FuncAnimation class, so let's import it now:
from matplotlib.animation import FuncAnimation
At a minimum, we will need to provide the following when instantiating a FuncAnimation object: the Figure object to draw on and a function to call at each frame to update the plot. In the next few steps, we will work on the logic for these.
Since we are required to pass in a Figure object and bake all the plot update logic into a function, we will start by building up an initial plot. Here, we create a bar plot with bars of width 0, so that they don't show up for now. The y-axis is set up so that the libraries with the most questions overall are at the top:
import matplotlib.pyplot as plt
from matplotlib import ticker
from utils import despine
def bar_plot(data):
    fig, ax = plt.subplots(figsize=(8, 6))
    # order the libraries by their final cumulative totals
    sort_order = data.last('1M').squeeze().sort_values().index
    # draw zero-width bars and label each one with its library name,
    # so that the update function can look up the matching column later
    bars = ax.barh(sort_order, [0] * data.shape[1])
    for label, bar in zip(sort_order, bars):
        bar.set_label(label)
    ax.set_xlabel('total questions', fontweight='bold')
    ax.set_xlim(0, 250_000)
    ax.xaxis.set_major_formatter(ticker.EngFormatter())
    ax.xaxis.set_tick_params(labelsize=12)
    ax.yaxis.set_tick_params(labelsize=12)
    despine(ax)
    fig.tight_layout()
    return fig, ax
This gives us a plot that we can update:
%config InlineBackend.figure_formats = ['svg']
%matplotlib inline
bar_plot(questions_per_library)
(<Figure size 576x432 with 1 Axes>, <AxesSubplot:xlabel='total questions'>)
We will also need to initialize annotations for each of the bars and some text to show the date in the animation (month and year):
def generate_plot_text(ax):
    annotations = [
        ax.annotate(
            '', xy=(0, bar.get_y() + bar.get_height()/2),
            ha='left', va='center'
        ) for bar in ax.patches
    ]
    time_text = ax.text(
        0.9, 0.1, '', transform=ax.transAxes,
        fontsize=15, ha='center', va='center'
    )
    return annotations, time_text
Tip: We pass transform=ax.transAxes when placing our time text so that its location is specified in the Axes object's coordinates rather than in the coordinates of the data in the plot, which makes it easier to place.
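As a rough illustration of the difference, the sketch below places one text in Axes coordinates and one in data coordinates, then changes the x-axis limits. The display_position() helper and the label contents are hypothetical names introduced only for this example, and the Agg backend is selected just so the sketch runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 250_000)

# Axes coordinates: (0, 0) is the bottom-left corner of the Axes and
# (1, 1) is the top-right, regardless of the data limits
axes_label = ax.text(0.9, 0.1, 'Jan\n2021', transform=ax.transAxes)

# Data coordinates (the default): the position depends on the axis limits
data_label = ax.text(200_000, 0.5, 'in data coordinates')

def display_position(text):
    """Return the pixel position of a Text artist's anchor point."""
    return tuple(text.get_transform().transform(text.get_position()))

axes_before, data_before = display_position(axes_label), display_position(data_label)
ax.set_xlim(0, 10)  # drastically change the data limits
axes_after, data_after = display_position(axes_label), display_position(data_label)

print('Axes-coordinate text moved:', axes_before != axes_after)
print('Data-coordinate text moved:', data_before != data_after)
plt.close(fig)
```

Only the text placed in data coordinates moves when the limits change, which is why Axes coordinates are convenient for labels that should stay in a fixed corner.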
Next, we will make our plot update function. This will be called at each frame. We will extract that frame's data (the cumulative questions for that month), and then update the width of each of the bars. In addition, we will annotate the bars if their widths are greater than 0. At every frame, we will also need to update our time annotation (time_text):
def update(frame, *, ax, df, annotations, time_text):
    data = df.loc[frame, :]
    # update bars
    for rect, text in zip(ax.patches, annotations):
        col = rect.get_label()
        if data[col]:
            rect.set_width(data[col])
            text.set_x(data[col])
            text.set_text(f' {data[col]:,.0f}')
    # update time
    time_text.set_text(frame.strftime('%b\n%Y'))
Tip: The asterisk in the function signature requires all arguments after it to be passed in by name. This makes sure that we explicitly define the components for the animation when calling the function. This syntax is described in PEP 3102 (keyword-only arguments).
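A small sketch of how the asterisk behaves, using a hypothetical describe_frame() function rather than the workshop's update() function:

```python
def describe_frame(frame, *, ax, df):
    """Toy stand-in for a plot update function; only `frame` may be positional."""
    return f'frame={frame}, ax={ax}, df={df}'

# Works: everything after the asterisk is passed by name
print(describe_frame(0, ax='axes', df='data'))

# Fails: arguments after the asterisk cannot be passed by position
try:
    describe_frame(0, 'axes', 'data')
except TypeError as error:
    print(f'TypeError: {error}')
```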
The last step before creating our animation is to create a function that will assemble everything we need to pass to FuncAnimation. Note that our update() function requires multiple parameters, but we would pass in the same values every time (only the value for frame changes). To make this simpler, we create a partial function, which binds values to each of those arguments so that we only have to pass in frame when we call the partial. This is essentially a closure, where bar_plot_init() is the enclosing function and update() is the nested function, which we defined in the previous code block for readability:
from functools import partial
def bar_plot_init(questions_per_library):
    fig, ax = bar_plot(questions_per_library)
    annotations, time_text = generate_plot_text(ax)
    bar_plot_update = partial(
        update, ax=ax, df=questions_per_library,
        annotations=annotations, time_text=time_text
    )
    return fig, bar_plot_update
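If partial functions are new to you, the following minimal sketch shows the binding step in isolation. The scaled_value() function and its scale and unit parameters are hypothetical and exist only for this illustration:

```python
from functools import partial

def scaled_value(frame, *, scale, unit):
    """Hypothetical per-frame function that needs extra, unchanging arguments."""
    return f'{frame * scale} {unit}'

# Bind the arguments that stay the same for every frame
per_frame = partial(scaled_value, scale=10, unit='questions')

# The partial now only needs the frame value
results = [per_frame(frame) for frame in range(3)]
print(results)
```

Binding the arguments up front keeps the per-frame call as simple as possible, which matches how the animation calls the update function with just the current frame.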
Finally, we are ready to create our animation. We start by calling the bar_plot_init() function from the previous code block to generate the Figure object and partial function for the update of the plot. Then, we pass in the Figure object and update function when initializing our FuncAnimation object. We also specify the frames argument as the index of our DataFrame (the dates) and that the animation shouldn't repeat because we will save it as an MP4 video:
fig, update_func = bar_plot_init(questions_per_library)
ani = FuncAnimation(
fig, update_func, frames=questions_per_library.index, repeat=False
)
ani.save(
'../media/stackoverflow_questions.mp4',
writer='ffmpeg', fps=10, bitrate=100, dpi=300
)
plt.close()
Important: The FuncAnimation object must be assigned to a variable when creating it; otherwise, without any references to it, Python will garbage collect it, which ends the animation. For more information on garbage collection in Python, see the documentation for the standard library's gc module.
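The general mechanism can be sketched with the standard library alone. The Animation class below is a hypothetical stand-in (not Matplotlib's class), and weak references are used only to observe whether each object is still alive:

```python
import gc
import weakref

class Animation:
    """Hypothetical stand-in for an object that must stay alive to keep working."""

# Not assigned to a variable: nothing references the object after this line
unassigned = weakref.ref(Animation())

# Assigned to a variable: the name `ani` keeps the object alive
ani = Animation()
assigned = weakref.ref(ani)

gc.collect()  # request a collection explicitly rather than relying on timing

print('unassigned object still alive:', unassigned() is not None)
print('assigned object still alive:', assigned() is not None)
```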
Now, let's view the animation we just saved as an MP4 file:
from IPython import display
display.Video(
'../media/stackoverflow_questions.mp4', width=600, height=400,
embed=True, html_attributes='controls muted autoplay'
)
As with the previous example, the histograms of daily Manhattan subway entries in 2018 (from the first section of the workshop) don't tell the whole story of the dataset because the distributions changed drastically in 2020 and 2021:
We will make an animated version of these histograms that enables us to see the distributions changing over time. Note that this example will have two key differences from the previous one. The first is that we will be animating subplots rather than a single plot, and the second is that we will use a technique called blitting to only update the portion of the subplots that has changed. This requires that we return the artists that need to be redrawn in the plot update function.
We will build this visualization step by step.
As we did previously, we will read in the subway dataset, which contains the total entries and exits per day per borough:
subway = pd.read_csv(
'../data/NYC_subway_daily.csv', parse_dates=['Datetime'],
index_col=['Borough', 'Datetime']
)
subway_daily = subway.unstack(0)
subway_daily.head()
Datetime | Entries (Bk) | Entries (Bx) | Entries (M) | Entries (Q) | Exits (Bk) | Exits (Bx) | Exits (M) | Exits (Q) |
---|---|---|---|---|---|---|---|---|
2017-02-04 | 617650.0 | 247539.0 | 1390496.0 | 408736.0 | 417449.0 | 148237.0 | 1225689.0 | 279699.0 |
2017-02-05 | 542667.0 | 199078.0 | 1232537.0 | 339716.0 | 405607.0 | 139856.0 | 1033610.0 | 268626.0 |
2017-02-06 | 1184916.0 | 472846.0 | 2774016.0 | 787206.0 | 761166.0 | 267991.0 | 2240027.0 | 537780.0 |
2017-02-07 | 1192638.0 | 470573.0 | 2892462.0 | 790557.0 | 763653.0 | 270007.0 | 2325024.0 | 544828.0 |
2017-02-08 | 1243658.0 | 497412.0 | 2998897.0 | 825679.0 | 788356.0 | 275695.0 | 2389534.0 | 559639.0 |
For this visualization, we will just be working with the entries in Manhattan:
manhattan_entries = subway_daily['Entries']['M']
Before we can set up the subplots, we have to calculate the bin ranges for the histograms so that the bins stay the same from frame to frame and our animation is smooth. NumPy provides the histogram() function, which returns the number of data points in each bin and the bin ranges (the bin edges), in that order. We will also be using this function to update the histograms during the animation:
import numpy as np
count_per_bin, bin_ranges = np.histogram(manhattan_entries, bins=30)
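The sketch below shows the two return values and how the bin ranges can be reused for a subset of the data. The values are randomly generated stand-ins, not the subway data:

```python
import numpy as np

# Made-up values standing in for daily entry counts
rng = np.random.default_rng(0)
values = rng.normal(loc=100, scale=15, size=500)

# Compute the bin edges once from the full dataset
counts, bin_edges = np.histogram(values, bins=30)
# 30 bins are described by 31 edges, and every value lands in some bin
print(len(counts), len(bin_edges), counts.sum())

# Reuse the same edges for a subset so its bars line up with the full histogram
subset_counts, subset_edges = np.histogram(values[:100], bins=bin_edges)
print(np.array_equal(subset_edges, bin_edges), subset_counts.sum())
```

Passing the precomputed edges back in as bins is what keeps the bars in the same place while only their heights change.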
Next, we will handle the logic for building our initial histogram, packaging it in a function:
def subway_histogram(data, bins, date_range):
    _, bin_ranges = np.histogram(data, bins=bins)
    weekday_mask = data.index.weekday < 5
    configs = [
        {'label': 'Weekend', 'mask': ~weekday_mask, 'ymax': 60},
        {'label': 'Weekday', 'mask': weekday_mask, 'ymax': 120}
    ]
    fig, axes = plt.subplots(1, 2, figsize=(8, 4), sharex=True)
    for ax, config in zip(axes, configs):
        _, _, config['hist'] = ax.hist(
            data[config['mask']].loc[date_range], bin_ranges, ec='black'
        )
        ax.xaxis.set_major_formatter(ticker.EngFormatter())
        ax.set(
            xlim=(0, None), ylim=(0, config['ymax']),
            xlabel=f'{config["label"]} Entries'
        )
        despine(ax)
    axes[0].set_ylabel('Frequency')
    fig.suptitle('Histogram of Daily Subway Entries in Manhattan')
    fig.tight_layout()
    return fig, axes, bin_ranges, configs
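The weekday mask in this function relies on pandas numbering the days of the week from Monday (0) through Sunday (6). A minimal sketch with made-up values shows how the mask and its negation split a series:

```python
import pandas as pd

# Friday through Monday, with made-up values
index = pd.to_datetime(['2017-02-03', '2017-02-04', '2017-02-05', '2017-02-06'])
values = pd.Series([10, 20, 30, 40], index=index)

# Monday=0 ... Sunday=6, so anything below 5 is a weekday
weekday_mask = values.index.weekday < 5

weekdays = values[weekday_mask]
weekends = values[~weekday_mask]  # ~ negates the Boolean mask

print(weekdays)
print(weekends)
```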
Notice that our plot this time starts out with data already – this is because we want to show the change in the distribution of daily entries in the last year:
_ = subway_histogram(manhattan_entries, bins=30, date_range='2017')
We will once again include some text that indicates the time period as the animation runs. This is similar to what we had in the previous example:
def add_time_text(ax):
    time_text = ax.text(
        0.15, 0.9, '', transform=ax.transAxes,
        fontsize=15, ha='center', va='center'
    )
    return time_text
Now, we will create our update function. This time, we have to update both subplots and return any artists that need to be redrawn since we are going to use blitting:
def update(frame, *, data, configs, time_text, bin_ranges):
    artists = []
    time = frame.strftime('%b\n%Y')
    if time != time_text.get_text():
        time_text.set_text(time)
        artists.append(time_text)
    for config in configs:
        time_frame_mask = \
            (data.index > frame - pd.Timedelta(days=365)) & (data.index <= frame)
        counts, _ = np.histogram(
            data[time_frame_mask & config['mask']],
            bin_ranges
        )
        for count, rect in zip(counts, config['hist'].patches):
            if count != rect.get_height():
                rect.set_height(count)
                artists.append(rect)
    return artists
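To isolate the idea of returning only the artists that changed, here is a stripped-down sketch with a three-bin histogram. The update_heights() function and the data are hypothetical, and the function is called directly rather than through FuncAnimation so that the result is easy to inspect:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
bin_edges = np.array([0, 1, 2, 3])
# Initial bar heights are 1, 2, and 0
_, _, bars = ax.hist([0.5, 1.5, 1.5], bins=bin_edges)

def update_heights(new_values):
    """Set bar heights from `new_values` and return only the bars that changed."""
    counts, _ = np.histogram(new_values, bin_edges)
    changed = []
    for count, rect in zip(counts, bars.patches):
        if count != rect.get_height():
            rect.set_height(count)
            changed.append(rect)
    return changed

changed = update_heights([0.5, 1.5, 2.5])  # new heights are 1, 1, and 1
print(f'{len(changed)} of {len(bars.patches)} bars need to be redrawn')
plt.close(fig)
```

The first bar keeps its height, so it is left out of the returned list; with blitting, that is the kind of work the animation gets to skip.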
As our final step before generating the animation, we bind our arguments to the update function using a partial function:
def histogram_init(data, bins, initial_date_range):
    fig, axes, bin_ranges, configs = subway_histogram(data, bins, initial_date_range)
    update_func = partial(
        update, data=data, configs=configs,
        time_text=add_time_text(axes[0]),
        bin_ranges=bin_ranges
    )
    return fig, update_func
Finally, we will animate the plot using FuncAnimation like before. Notice that this time we are passing in blit=True, so that only the artists that we returned in the update() function are redrawn. We generate one frame for each day in the data starting on August 1, 2019:
fig, update_func = histogram_init(
manhattan_entries, bins=30, initial_date_range=slice('2017', '2019-07')
)
ani = FuncAnimation(
fig, update_func, frames=manhattan_entries['2019-08':'2021'].index,
repeat=False, blit=True
)
ani.save(
'../media/subway_entries_subplots.mp4',
writer='ffmpeg', fps=30, bitrate=500, dpi=300
)
plt.close()
Tip: We are using a slice object to pass a date range for pandas to use with loc[]. More information on slice() can be found in the Python documentation on built-in functions.
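As a quick check that a slice object behaves like the usual start:stop syntax, consider this sketch on a made-up series (the dates and values are invented for illustration):

```python
import pandas as pd

# Made-up values on a sorted DatetimeIndex
series = pd.Series(
    range(6),
    index=pd.to_datetime([
        '2017-01-15', '2018-06-15', '2019-07-01',
        '2019-07-31', '2019-08-01', '2020-01-15'
    ])
)

date_range = slice('2017', '2019-07')

# A slice object stored in a variable selects the same rows as start:stop
from_slice = series.loc[date_range]
from_syntax = series.loc['2017':'2019-07']

print(from_slice.equals(from_syntax))
print(from_slice)
```

Note that the end of the range is inclusive here, and the partial date string '2019-07' covers the whole month, so July 31 is selected while August 1 is not.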
Our animation makes it easy to see the change in the distributions over time:
from IPython import display
display.Video(
'../media/subway_entries_subplots.mp4', width=600, height=400,
embed=True, html_attributes='controls muted autoplay'
)
We start by reading in the dataset:
import pandas as pd
manhattan_entries = pd.read_csv(
'../data/NYC_subway_daily.csv', parse_dates=['Datetime'],
index_col=['Borough', 'Datetime']
).unstack(0)['Entries']['M']
manhattan_entries.head()
Datetime
2017-02-04    1390496.0
2017-02-05    1232537.0
2017-02-06    2774016.0
2017-02-07    2892462.0
2017-02-08    2998897.0
Name: M, dtype: float64
Next, we need to handle our imports:
from functools import partial
from matplotlib.animation import FuncAnimation
import matplotlib.pyplot as plt
from matplotlib import ticker
import numpy as np
from utils import despine
We can make this animation with the following changes to the original code:
The subway_histogram() function, to account for bar color and transparency, as well as plotting everything on a single Axes object.
The histogram_init() function, to account for a single Axes object.
First, the subway_histogram() function:
def subway_histogram(data, bins, date_range):
    _, bin_ranges = np.histogram(data, bins=bins)
    weekday_mask = data.index.weekday < 5
    configs = [ # CHANGE: add bar color to config
        {'label': 'Weekend', 'mask': ~weekday_mask, 'color': 'green'},
        {'label': 'Weekday', 'mask': weekday_mask, 'color': 'blue'}
    ]
    fig, ax = plt.subplots(1, 1, figsize=(8, 4)) # CHANGE: single Axes
    for config in configs:
        _, _, config['hist'] = ax.hist(
            data[config['mask']].loc[date_range], bin_ranges, ec='black',
            facecolor=config['color'], alpha=0.5, label=config['label']
        ) # CHANGES: ^ color the bar and ^ add transparency
    ax.xaxis.set_major_formatter(ticker.EngFormatter())
    despine(ax)
    # CHANGES: update formatting and add legend
    ax.set(xlim=(0, None), ylim=(0, 120), xlabel='Entries', ylabel='Frequency')
    ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.05), ncol=2, frameon=False)
    fig.suptitle('Histogram of Daily Subway Entries in Manhattan')
    fig.tight_layout()
    return fig, ax, bin_ranges, configs
def add_time_text(ax):
    time_text = ax.text(
        0.1, 0.9, '', transform=ax.transAxes,
        fontsize=15, ha='center', va='center'
    )
    return time_text
Note that we don't need to change the update() function for this exercise:
def update(frame, *, data, configs, time_text, bin_ranges):
    artists = []
    time = frame.strftime('%b\n%Y')
    if time != time_text.get_text():
        time_text.set_text(time)
        artists.append(time_text)
    for config in configs:
        time_frame_mask = \
            (data.index > frame - pd.Timedelta(days=365)) & (data.index <= frame)
        counts, _ = np.histogram(
            data[time_frame_mask & config['mask']],
            bin_ranges
        )
        for count, rect in zip(counts, config['hist'].patches):
            if count != rect.get_height():
                rect.set_height(count)
                artists.append(rect)
    return artists
Next, the histogram_init() function, to account for a single Axes object:
def histogram_init(data, bins, initial_date_range):
    fig, ax, bin_ranges, configs = subway_histogram(
        data, bins, initial_date_range
    ) # CHANGE: rename variable `ax`
    update_func = partial(
        update, data=data, configs=configs,
        time_text=add_time_text(ax), # CHANGE: pass in `ax`
        bin_ranges=bin_ranges
    )
    return fig, update_func
fig, update_func = histogram_init(
manhattan_entries, bins=30, initial_date_range=slice('2017', '2019-07')
)
ani = FuncAnimation(
fig, update_func, frames=manhattan_entries['2019-08':'2021'].index,
repeat=False, blit=True
)
ani.save(
'../media/subway_entries_exercise.mp4', # CHANGE: new filename
writer='ffmpeg', fps=30, bitrate=500, dpi=300
)
plt.close()
The new animation looks like this:
from IPython import display
display.Video(
'../media/subway_entries_exercise.mp4', width=600, height=400,
embed=True, html_attributes='controls muted autoplay'
)
HoloViz provides multiple high-level tools that aim to simplify data visualization in Python. For this example, we will be looking at HoloViews and GeoViews, which extends HoloViews for use with geographic data. HoloViews abstracts away some of the plotting logic, removing boilerplate code and making it possible to easily switch backends (e.g., switch from Matplotlib to Bokeh for JavaScript-powered, interactive plotting). To wrap up our discussion on animation, we will use GeoViews to create an animation of earthquakes per month in 2020 on a map of the world.
We will build this visualization step by step.
Here, we import GeoPandas and then use the read_file() function to read the earthquakes GeoJSON data into a GeoDataFrame object:
import geopandas as gpd
earthquakes = gpd.read_file('../data/earthquakes.geojson').assign(
time=lambda x: pd.to_datetime(x.time, unit='ms'),
month=lambda x: x.time.dt.month
)[['geometry', 'mag', 'time', 'month']]
earthquakes.shape
(188527, 4)
Our data looks like this:
earthquakes.head()
 | geometry | mag | time | month |
---|---|---|---|---|
0 | POINT Z (-67.12750 19.21750 12.00000) | 2.75 | 2020-01-01 00:01:56.590 | 1 |
1 | POINT Z (-67.09010 19.07660 6.00000) | 2.55 | 2020-01-01 00:03:38.210 | 1 |
2 | POINT Z (-66.85410 17.87050 6.00000) | 1.81 | 2020-01-01 00:05:09.440 | 1 |
3 | POINT Z (-66.86360 17.89930 8.00000) | 1.84 | 2020-01-01 00:05:36.930 | 1 |
4 | POINT Z (-66.86850 17.90660 8.00000) | 1.64 | 2020-01-01 00:09:20.060 | 1 |
Source: USGS API
Since our earthquakes dataset contains geometries, we will use GeoViews in addition to HoloViews to create our animation. For this example, we will be using the Matplotlib backend:
import geoviews as gv
import geoviews.feature as gf
import holoviews as hv
gv.extension('matplotlib')
Next, we will write a function to plot each earthquake as a point on the world map. Since our dataset has geometries, we can use that information to plot them and then color each point by the earthquake magnitude. Note that, since earthquakes are measured on a logarithmic scale, some magnitudes are negative:
import calendar
def plot_earthquakes(data, month_num):
    points = gv.Points(
        data.query(f'month == {month_num}'),
        kdims=['longitude', 'latitude'], # key dimensions (for coordinates in this case)
        vdims=['mag'] # value dimensions (for modifying the plot in this case)
    ).redim.range(mag=(-2, 10), latitude=(-90, 90))
    # create an overlay by combining Cartopy features and the points with *
    overlay = gf.land * gf.coastline * gf.borders * points
    return overlay.opts(
        gv.opts.Points(color='mag', cmap='fire_r', colorbar=True, alpha=0.75),
        gv.opts.Overlay(
            global_extent=False, title=f'{calendar.month_name[month_num]}', fontscale=2
        )
    )
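The title of each frame comes from looking up the month number in calendar.month_name. As a small aside, this sketch shows why the month number can be used as the index directly:

```python
import calendar

# month_name is indexed by month number; index 0 holds an empty string
# so that January lines up with 1 and December with 12
titles = [calendar.month_name[month_num] for month_num in range(1, 13)]

print(titles[0], titles[-1])
print(repr(calendar.month_name[0]))
```

The names follow the current locale, so they may differ from the English ones if your environment is configured for another language.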
Our function returns an Overlay of earthquakes (represented as Points) on a map of the world. Under the hood, GeoViews is using Cartopy to create the map:
plot_earthquakes(earthquakes, 1).opts(
fig_inches=(6, 3), aspect=2, fig_size=250, fig_bounds=(0.07, 0.05, 0.87, 0.95)
)
Tip: One thing that makes working with geospatial data difficult is handling projections. When working with datasets that use different projections, GeoViews can help align them; the GeoViews documentation includes a tutorial on this.
We will create a HoloMap of the frames to include in our animation. This maps the frame to the plot that should be rendered at that frame:
frames = {
month_num: plot_earthquakes(earthquakes, month_num)
for month_num in range(1, 13)
}
holomap = hv.HoloMap(frames)
Now, we will output our HoloMap as a GIF animation, which may take a while to run:
hv.output(
holomap.opts(
fig_inches=(6, 3), aspect=2, fig_size=250,
fig_bounds=(0.07, 0.05, 0.87, 0.95)
), holomap='gif', fps=5
)
To save the animation to a file, run the following code:
hv.save(
holomap.opts(
fig_inches=(6, 3), aspect=2, fig_size=250,
fig_bounds=(0.07, 0.05, 0.87, 0.95)
), 'earthquakes.gif', fps=5
)
We start by reading in the dataset:
import geopandas as gpd
import pandas as pd
earthquakes = gpd.read_file('../data/earthquakes.geojson').assign(
time=lambda x: pd.to_datetime(x.time, unit='ms'),
month=lambda x: x.time.dt.month
)[['geometry', 'mag', 'time', 'month']]
earthquakes.head()
 | geometry | mag | time | month |
---|---|---|---|---|
0 | POINT Z (-67.12750 19.21750 12.00000) | 2.75 | 2020-01-01 00:01:56.590 | 1 |
1 | POINT Z (-67.09010 19.07660 6.00000) | 2.55 | 2020-01-01 00:03:38.210 | 1 |
2 | POINT Z (-66.85410 17.87050 6.00000) | 1.81 | 2020-01-01 00:05:09.440 | 1 |
3 | POINT Z (-66.86360 17.89930 8.00000) | 1.84 | 2020-01-01 00:05:36.930 | 1 |
4 | POINT Z (-66.86850 17.90660 8.00000) | 1.64 | 2020-01-01 00:09:20.060 | 1 |
Next, we handle our plotting imports:
import geoviews as gv
import geoviews.feature as gf
import holoviews as hv
gv.extension('matplotlib')
We can make this animation as follows:
The plot_earthquakes() function, to filter by date instead of month and use the date for the title.
The HoloMap object.
First, the plot_earthquakes() function:
def plot_earthquakes(data, date):
    points = gv.Points( # CHANGE: filter `data` by `date`
        data.query(f'time.dt.strftime("%Y-%m-%d") == "{date}"'),
        kdims=['longitude', 'latitude'],
        vdims=['mag']
    ).redim.range(mag=(-2, 10), latitude=(-90, 90))
    overlay = gf.land * gf.coastline * gf.borders * points
    return overlay.opts(
        gv.opts.Points(color='mag', cmap='fire_r', colorbar=True, alpha=0.75),
        gv.opts.Overlay(
            global_extent=False, title=f'{date:%B %d, %Y}', fontscale=2
        ) # CHANGE: title each frame with the date ^
    )
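Both the query filter and the title in this function depend on how a date object is rendered inside an f-string. A short sketch with an arbitrary example date:

```python
import datetime as dt

date = dt.date(2020, 4, 1)

# Without a format spec, the date is rendered in ISO format, which is the
# same shape that strftime("%Y-%m-%d") produces inside the query string
iso_string = f'{date}'

# A format spec after the colon is handled like a strftime() format
title = f'{date:%B %d, %Y}'

print(iso_string)
print(title)
```

This is why the function can compare the formatted time column against the date directly, without converting the date to a string first.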
Next, the HoloMap object:
import datetime as dt
frames = {
day: plot_earthquakes(earthquakes, dt.date(2020, 4, day))
for day in range(1, 31)
}
holomap = hv.HoloMap(frames)
hv.output(
holomap.opts(
fig_inches=(6, 3), aspect=2, fig_size=250,
fig_bounds=(0.07, 0.05, 0.87, 0.95)
), holomap='gif', fps=5
)
matplotlib.animation API overview
FuncAnimation documentation