Tutorial 6. Interlinked Plots

hvPlot lets you quickly generate many different types of plot from a standard API by building HoloViews objects, as discussed in the previous notebook. These objects are rendered with Bokeh, which offers a number of standard ways to interact with your plot, such as panning and zooming tools.

Many other modes of interactivity are possible when building an exploratory visualization (such as a dashboard) and these forms of interactivity cannot be achieved using hvPlot alone.

In this notebook, we will drop down to the HoloViews level of representation to directly build a visualization consisting of linked plots that update when you interactively select a particular earthquake with the mouse. The goal is to show how more sophisticated forms of interactivity can be built when needed, in a way that's fully compatible with all the examples shown in earlier sections.

First let us load our initial imports:

In [ ]:

import numpy as np
import pandas as pd
import dask.dataframe as dd
import hvplot.pandas  # noqa
import datashader.geo
from holoviews.element import tiles

Then clean the data before filtering (for magnitude >= 7) and projecting to Web Mercator as before:

In [ ]:

df = dd.read_parquet('../data/earthquakes.parq').repartition(npartitions=4)
cleaned_df = df.copy()
cleaned_df['mag'] = df.mag.where(df.mag > 0)
cleaned_reindexed_df = cleaned_df.set_index(cleaned_df.time)
cleaned_reindexed_df = cleaned_reindexed_df.persist()

most_severe = cleaned_reindexed_df[cleaned_reindexed_df.mag >= 7].compute()
x, y = datashader.geo.lnglat_to_meters(most_severe.longitude, most_severe.latitude)
most_severe_projected = most_severe.join([pd.DataFrame({'easting': x}), pd.DataFrame({'northing': y})])
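For reference, the projection step can also be written out by hand. The sketch below is a plain-Python rendering of the standard spherical Web Mercator formulas for a single point. The function name `lnglat_to_web_mercator` and the sample coordinates are made up for this illustration; this is not datashader's own code.

```python
import math

# Half the circumference of the Web Mercator sphere (radius 6378137 m).
ORIGIN_SHIFT = math.pi * 6378137.0

def lnglat_to_web_mercator(longitude, latitude):
    """Project one longitude/latitude pair (degrees) to easting/northing (meters)."""
    easting = longitude * ORIGIN_SHIFT / 180.0
    northing = (math.log(math.tan((90.0 + latitude) * math.pi / 360.0))
                * ORIGIN_SHIFT / math.pi)
    return easting, northing

# An arbitrary illustrative point (not taken from the dataset)
easting, northing = lnglat_to_web_mercator(142.37, 38.30)
print(round(easting), round(northing))
```

Eastings scale linearly with longitude, while northings grow faster than latitude towards the poles, which is why Web Mercator coordinates cannot be read as distances on the ground.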

Towards the end of the previous notebook we generated a scatter plot of earthquakes across the earth with magnitude >= 7, projected using datashader and overlaid on top of a map tile source:

In [ ]:

high_mag_quakes = most_severe_projected.hvplot.points(x='easting', y='northing', c='mag', 
                                                      title='Earthquakes with magnitude >= 7')
esri = tiles.ESRI().redim(x='easting', y='northing')
esri * high_mag_quakes

And saw how this object is a HoloViews Points object:

In [ ]:

print(high_mag_quakes)

This object is an example of a HoloViews Element which is an object that can display itself. These elements are thin wrappers around your data and the raw input data is always available on the .data attribute. For instance, we can look at the head of the most_severe_projected DataFrame as follows:

In [ ]:

high_mag_quakes.data.head()

We will now learn a little more about HoloViews elements, including how to build them up from scratch so that we can control every aspect of them.

An Introduction to HoloViews Elements¶

HoloViews elements are the atomic, visualizable components that can be rendered by a plotting library such as Bokeh. We don't actually need to use hvPlot to create these element objects: we can create them directly by importing HoloViews (and loading the extension if we have not loaded hvPlot):

In [ ]:

import holoviews as hv
hv.extension("bokeh") # Optional here as we have already loaded hvplot.pandas

Now we can create our own example of a Points element. In the next cell we plot 100 points drawn from independent normal distributions in the x and y directions:

In [ ]:

xs = np.random.randn(100)
ys = np.random.randn(100)
hv.Points((xs, ys))

Note that the axis labels are 'x' and 'y', the default dimensions for this element type. We can use a different set of dimensions along the x- and y-axis (say 'weight' and 'height') and we can also associate additional fitness information with each point if we wish:

In [ ]:

xs = np.random.randn(100)
ys = np.random.randn(100)
fitness = np.random.randn(100)
height_v_weight = hv.Points((xs, ys, fitness), ['weight', 'height'], 'fitness')
height_v_weight

Now we can look at the printed representation of this object:

In [ ]:

print(height_v_weight)

Here the printed representation shows the key dimensions that we specified in square brackets as [weight,height] and the additional value dimension fitness in parentheses as (fitness). The key dimensions map to the axes and the value dimensions can be visually represented by other visual attributes as we shall see shortly.

For more information on HoloViews dimensions, see this user guide.

Exercise¶

Visit the HoloViews reference gallery and browse the available set of elements. Pick an element type and try running one of the self-contained examples in the following cell.

In [ ]:

Setting Visual Options¶

The two Points elements above look quite different from the one returned by hvplot showing the earthquake positions. This is because hvplot makes use of the HoloViews options system to customize the visual representation of these element objects.

Let us color the height_v_weight scatter by the fitness value and use a larger point size:

In [ ]:

height_v_weight.opts(color='fitness', size=8, colorbar=True, aspect='square')

Exercise¶

Copy the line above into the next cell and try changing the points to 'blue' or 'green' or another dimension of the data such as 'height' or 'weight'.

Are the results what you expect?

In [ ]:

The `help` system¶

You can learn more about the .opts method and the HoloViews options system in the corresponding user guide. To easily learn about the available options from inside a notebook, you can use hv.help and inspect the 'Style Options'.

In [ ]:

# Commented as there is a lot of help output!
# hv.help(hv.Scatter) 

At this point, we have some insight into the sort of HoloViews object hvPlot is building behind the scenes for our earthquake example:

In [ ]:

esri * hv.Points(most_severe_projected, ['easting', 'northing'], 'mag').opts(color='mag', size=8, aspect='equal')

Exercise¶

Try using hv.help to inspect the options available for different element types such as the Points element used above. Copy the line above into the cell below and pick a Points option that makes sense to you and try using it in the .opts method.

In [ ]:

Hint

If you can't decide on an option to pick, a good choice is marker. For instance, try:

marker='+'
marker='d'.

HoloViews uses matplotlib's conventions for specifying the various marker types. Try finding out which ones are supported by Bokeh.

Custom interactivity for Elements¶

When rasterization of the population density data via hvplot was introduced in the last notebook, we saw that the HoloViews object returned was not an element but a DynamicMap.

A DynamicMap enables custom interactivity beyond the Bokeh defaults by dynamically generating elements that get displayed and updated as the plot is interacted with.

There is a counterpart to the DynamicMap, called the HoloMap, that does not require a live Python server to be running. The HoloMap container will not be covered in this tutorial but you can learn more about it in the containers user guide.

Now let us build a very simple DynamicMap that is driven by a linked stream (specifically a PointerXY stream) that represents the position of the cursor over the plot:

In [ ]:

from holoviews import streams
pointer = streams.PointerXY(x=0, y=0) # x=0 and y=0 are the initialized values

def crosshair(x, y):
    return hv.Ellipse(0, 0, 1) * hv.HLine(y) * hv.VLine(x)

hv.DynamicMap(crosshair, streams=[pointer])

Try moving your mouse over the plot and you should see the crosshair follow your mouse position.

The core concepts here are:

The plot shows an overlay built with the * operator introduced in the previous notebook.
There is a callback that builds and returns this overlay according to the supplied x and y arguments. A DynamicMap always contains a callback that returns a HoloViews object such as an Element or Overlay.
These x and y arguments are supplied by the PointerXY stream, which reflects the position of the mouse on the plot.
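The relationship between a stream and a callback can be sketched without any plotting at all. The toy model below is not how HoloViews is implemented; `ToyPointerStream` and `crosshair_description` are hypothetical names used only to show the flow of values from an event to a callback.

```python
class ToyPointerStream:
    """Toy stand-in for a stream: holds parameter values and notifies subscribers."""

    def __init__(self, x=0, y=0):
        self.contents = {'x': x, 'y': y}
        self._subscribers = []

    def add_subscriber(self, callback):
        self._subscribers.append(callback)

    def event(self, **new_values):
        # Update the stored values, then re-run every subscribed callback.
        self.contents.update(new_values)
        for callback in self._subscribers:
            callback(**self.contents)


def crosshair_description(x, y):
    # A real callback would return HoloViews elements; a string keeps the toy simple.
    return 'HLine at y=%s, VLine at x=%s' % (y, x)


rendered = []  # Stands in for the plot being redrawn
toy_pointer = ToyPointerStream(x=0, y=0)
toy_pointer.add_subscriber(lambda x, y: rendered.append(crosshair_description(x, y)))

toy_pointer.event(x=1.5, y=-0.5)  # Simulate the mouse moving
toy_pointer.event(x=2.0)          # Only x changes; y keeps its previous value
print(rendered)
```

Each event re-runs the callback with the latest values, which is the same pattern the DynamicMap follows when the mouse moves over the plot.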

Exercise¶

Look up the Ellipse, HLine, and VLine elements in the HoloViews reference guide and see if the definitions of these elements align with your initial intuitions.

Exercise (additional)¶

If you have time, try running one of the examples in the 'Streams' section of the HoloViews reference guide in the cell below. All the examples in the reference guide should be relatively short and self-contained.

In [ ]:

Selecting a particular earthquake with the mouse¶

Now we only need two more concepts before we can set up the appropriate mechanism to select a particular earthquake on the hvPlot-generated Points plot we started with.

First, we can attach a stream to an existing HoloViews element such as the earthquake distribution generated with hvplot:

In [ ]:

selection_stream = streams.Selection1D(source=high_mag_quakes)

Next we need to add the 'tap' tool to our Points element so that Bokeh enables the desired selection mechanism in the browser.

In [ ]:

high_mag_quakes.opts(tools=['tap'])

The Bokeh default alpha of points which are unselected is going to be too low when we overlay these points on a tile source. We can use the HoloViews options system to pick a better default as follows:

In [ ]:

hv.opts.defaults(hv.opts.Points(nonselection_alpha=0.4))

The tap tool is in the toolbar with the icon showing the concentric circles and plus symbol. If you enable this tool, you should be able to pick individual earthquakes above by tapping on them.

Now we can make a DynamicMap that uses the stream we defined to show the index of the earthquake selected via the hv.Text element:

In [ ]:

def labelled_callback(index):
    if len(index) == 0:
        return hv.Text(x=0, y=0, text='')
    first_index = index[0] # Pick only the first one if multiple are selected
    row = most_severe_projected.iloc[first_index]
    return hv.Text(x=row.easting, y=row.northing, text='%d : %s' % (first_index, row.place)).opts(color='white')

labeller = hv.DynamicMap(labelled_callback, streams=[selection_stream])

This labeller receives the index argument from the Selection1D stream, which corresponds to the row of the original dataframe (most_severe) that was selected. This lets us present the index and place value using hv.Text, which we then position at the corresponding easting and northing to label the chosen earthquake.

Finally, we overlay this labeller DynamicMap over the original plot. Now by using the tap tool you can see the index number of an earthquake followed by the assigned place name:

In [ ]:

(esri * high_mag_quakes * labeller).opts(hv.opts.Points(tools=['tap', 'hover']))

Exercise¶

Pick an earthquake point above and using the displayed index, display the corresponding row of the most_severe dataframe using the .iloc method in the following cell.

In [ ]:

Building a linked earthquake visualizer¶

Now we will build a visualization that achieves the following:

The user can select an earthquake with magnitude >= 7 using the tap tool in the manner illustrated in the last section.
In addition to the existing label, we will add concentric circles to further highlight the selected earthquake location.
All earthquakes inside a box 0.5 degrees wide in latitude and longitude, centered on the selected earthquake (roughly 50 km across), will then be used to supply data for two linked plots:
1. A histogram showing the distribution of magnitudes in the selected area.
2. A timeseries scatter plot showing the magnitudes of earthquakes over time in the selected area.
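The rough 50 km figure can be checked with a little arithmetic. The sketch below assumes a spherical Earth with a mean radius of 6371 km; `degrees_to_km` is a hypothetical helper written for this illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # Mean Earth radius; assumes a spherical Earth

def degrees_to_km(degrees, latitude):
    """Return the (north-south, east-west) extent in km of an angular span at a latitude."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360.0
    north_south = degrees * km_per_degree
    # Meridians converge towards the poles, so east-west distances shrink with cos(latitude).
    east_west = degrees * km_per_degree * math.cos(math.radians(latitude))
    return north_south, east_west

for latitude in (0, 30, 60):
    ns, ew = degrees_to_km(0.5, latitude)
    print('latitude %2d: %.1f km north-south, %.1f km east-west' % (latitude, ns, ew))
```

The north-south extent stays near 55 km everywhere, while the east-west extent of the same 0.5 degrees narrows at higher latitudes, so the selected region is not a fixed size on the ground.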

The first step is to generate a concentric-circle marker using a similar approach to the labeller above. We can write a function that uses Ellipse to mark a particular earthquake and pass it to a DynamicMap:

In [ ]:

def mark_earthquake(index):
    if len(index) == 0:
        return  hv.Overlay([])
    first_index = index[0] # Pick only the first one if multiple are selected
    row = most_severe_projected.iloc[first_index]
    return (  hv.Ellipse(row.easting, row.northing, 1.5e6).opts(color='white', alpha=0.5)
            * hv.Ellipse(row.easting, row.northing, 3e6).opts(color='white', alpha=0.5))

quake_marker = hv.DynamicMap(mark_earthquake, streams=[selection_stream])

Now we can test this component by building an overlay of the ESRI tile source, the >=7 magnitude points and quake_marker:

In [ ]:

esri * high_mag_quakes.opts(tools=['tap']) * quake_marker

Note that you may need to zoom in to your selected earthquake to see the localized, lower magnitude earthquakes around it.

Filtering earthquakes by location¶

We wish to analyse the earthquakes that occur around a particular latitude and longitude. To do this we will define a function that, given a latitude and longitude, returns the rows of a suitable dataframe that correspond to earthquakes inside a box 0.5 degrees wide centered on that position:

In [ ]:

def earthquakes_around_point(df, lat, lon, degrees_dist=0.5):
    half_dist = degrees_dist / 2.0
    return df[((df['latitude'] - lat).abs() < half_dist) 
              & ((df['longitude'] - lon).abs() < half_dist)].compute()
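To see which points such a box filter keeps, the same comparison can be applied to a handful of plain records. This is a sketch on made-up data: `records_around_point` and the sample values are illustrative and are not part of the tutorial's dataset.

```python
def records_around_point(records, lat, lon, degrees_dist=0.5):
    """Keep records inside a box degrees_dist wide, centered on (lat, lon)."""
    half_dist = degrees_dist / 2.0
    return [r for r in records
            if abs(r['latitude'] - lat) < half_dist
            and abs(r['longitude'] - lon) < half_dist]

# Made-up sample records, not taken from the earthquake dataset
sample = [
    {'name': 'a', 'latitude': 10.00, 'longitude': 20.00},  # the center itself
    {'name': 'b', 'latitude': 10.20, 'longitude': 20.10},  # inside: both offsets < 0.25
    {'name': 'c', 'latitude': 10.30, 'longitude': 20.00},  # outside: latitude offset is 0.30
    {'name': 'd', 'latitude': 10.00, 'longitude': 19.70},  # outside: longitude offset is 0.30
]
nearby = records_around_point(sample, lat=10.0, lon=20.0)
print([r['name'] for r in nearby])
```

Because the comparison uses half of degrees_dist on each side, any two selected points differ by less than degrees_dist in both latitude and longitude.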

As it can be slow to filter our dataframes in this way, we can define the following function that can cache the result of filtering cleaned_reindexed_df (containing all earthquakes) based on an index pulled from the most_severe dataframe:

In [ ]:

def index_to_selection(indices, cache={}):
    if not indices: 
        return most_severe.iloc[[]]
    index = indices[0]   # Pick only the first one if multiple are selected
    if index in cache: return cache[index]
    row = most_severe.iloc[index]
    selected_df = earthquakes_around_point(cleaned_reindexed_df, row.latitude, row.longitude)
    cache[index] = selected_df
    return selected_df 

The caching will be useful as we know both of our planned linked plots (i.e. the histogram and scatter over time) make use of the same earthquake selection once a particular index is supplied from a user selection. This particular caching strategy is rather awkward (and leaks memory!) but it is simple and will serve for the current example. A better approach to caching will be presented in the Advanced Dashboards section of the tutorial.
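One way to bound the memory used by such a cache is the standard library's `functools.lru_cache`. The sketch below is only an illustration of that idea and is not necessarily the approach taken later in the tutorial; `expensive_filter`, `cached_selection` and `index_to_cached_selection` are hypothetical names, and the string result stands in for a filtered dataframe.

```python
from functools import lru_cache

calls = []  # Records each time the expensive step actually runs

def expensive_filter(index):
    # Placeholder for a slow filtering step such as earthquakes_around_point
    calls.append(index)
    return 'selection for index %d' % index

@lru_cache(maxsize=32)  # Keeps at most 32 results, evicting the least recently used
def cached_selection(index):
    return expensive_filter(index)

def index_to_cached_selection(indices):
    if not indices:
        return None
    # Lists are unhashable, so the cache is keyed on the first index only
    return cached_selection(indices[0])

index_to_cached_selection([235])
index_to_cached_selection([235])  # Served from the cache
index_to_cached_selection([12])
print(calls)
```

Note that a cached dataframe would be returned as the same object on every hit, so callers should avoid modifying it in place.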

Exercise¶

Test the index_to_selection function above for the index you picked in the previous exercise. Note that the stream supplies a list of indices and that the function above only uses the first value given in that list. Do the selected rows look correct?

In [ ]:

Exercise¶

Convince yourself that the selected earthquakes are within 0.5$^\circ$ of each other in both latitude and longitude.

In [ ]:

Hint

For a given chosen index, you can see the distance difference using the following code:

chosen = 235
delta_long = index_to_selection([chosen]).longitude.max() - index_to_selection([chosen]).longitude.min()
delta_lat = index_to_selection([chosen]).latitude.max() - index_to_selection([chosen]).latitude.min()
print("Difference in longitude: %s" % delta_long)
print("Difference in latitude: %s" % delta_lat)

Linked plots¶

So far we have overlaid the display updates on top of the existing spatial distribution of earthquakes. However, there is no requirement that the data is overlaid and we might want to simply attach an entirely new, derived plot that dynamically updates to the side.

Using the same principles as we have already seen, we can define a DynamicMap that returns Histogram distributions of earthquake magnitude:

In [ ]:

def histogram_callback(index):
    title = 'Distribution of all magnitudes within half a degree of selection'
    selected_df = index_to_selection(index)
    return selected_df.hvplot.hist(y='mag', bin_range=(0,10), bins=20, color='red', title=title)

histogram = hv.DynamicMap(histogram_callback, streams=[selection_stream])

The only real difference in the approach here is that we can still use .hvplot to generate our elements instead of declaring the HoloViews elements explicitly. In this example, .hvplot.hist is used.
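The `bin_range=(0, 10)` and `bins=20` arguments used in the histogram callback imply bins that are 0.5 magnitude units wide. The sketch below spells out that binning in plain Python, assuming the conventional scheme in which each bin includes its lower edge and the last bin also includes the upper edge (as in `numpy.histogram`). `histogram_counts` and the sample magnitudes are made up for this illustration.

```python
def histogram_counts(values, bin_range=(0, 10), bins=20):
    """Count values into equal-width bins spanning bin_range."""
    low, high = bin_range
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        # NaN fails both comparisons, so missing magnitudes are skipped
        if low <= value <= high:
            # The upper edge of the range belongs to the last bin
            position = min(int((value - low) / width), bins - 1)
            counts[position] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts

magnitudes = [1.2, 1.4, 4.9, 5.0, 7.1, 10.0, float('nan')]  # Made-up values
edges, counts = histogram_counts(magnitudes)
print(edges[:3], counts)
```

Fixing the range and bin count in this way keeps the bin edges identical from one selection to the next, so histograms for different earthquakes can be compared directly.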

The exact same principles can be used to build the scatter callback and temporal_distribution DynamicMap:

In [ ]:

def scatter_callback(index):
    title = 'Temporal distribution of all magnitudes within half a degree of selection '
    selected_df = index_to_selection(index)
    return selected_df.hvplot.scatter('time', 'mag', color='green', title=title)

temporal_distribution = hv.DynamicMap(scatter_callback, streams=[selection_stream])

Lastly, let us define a DynamicMap that draws a VLine to mark the time at which the selected earthquake occurs so we can see which tremors may have been aftershocks immediately after that major earthquake occurred:

In [ ]:

def vline_callback(index):
    if not index:
        return hv.VLine(0).opts(alpha=0)
    row = most_severe.iloc[index[0]]
    return hv.VLine(row.time).opts(line_width=2, color='black')

temporal_vline = hv.DynamicMap(vline_callback, streams=[selection_stream])

We now have all the pieces we need to build an interactive, linked visualization of earthquake data.

Exercise¶

In the following cell, test the histogram_callback and scatter_callback callback functions by supplying your chosen index, remembering that these functions require a list argument.

In [ ]:

Putting it together¶

Now we can combine the components we have already built as follows to create a dynamically updating plot together with the associated, linked histogram and temporal distribution:

In [ ]:

((esri * high_mag_quakes.opts(tools=['tap']) * labeller * quake_marker)
 + histogram + temporal_distribution * temporal_vline).cols(1)

We now have a custom interactive visualization that builds on the output of hvplot by making use of the underlying HoloViews objects that it generates.

Conclusion¶

When exploring data it can be convenient to use the .hvplot API to quickly visualize a particular dataset. By calling .hvplot to generate different plots over the course of a session, it is possible to gradually build up a mental model of how a particular dataset is structured. While this works well for simple datasets, it can be more efficient to build a linked visualization with support for direct user interaction as a tool for more rapidly gaining insight.

In the workflow presented here, building such custom interaction is relatively quick and easy and does not involve throwing away prior code used to generate simpler plots. In the spirit of 'short cuts not dead ends', we can use the HoloViews output of hvplot that we used in our initial exploration to build rich visualizations with custom interaction to explore our data at a deeper level.

These interactive visualizations not only allow for custom interactions beyond the scope of hvplot alone, but they can display visual annotations not offered by the .hvplot API. In particular, we can overlay our data on top of tile sources, generate interactive textual annotations, draw shapes such as circles, add horizontal and vertical marker lines, and much more. Using HoloViews you can build visualizations that allow you to directly interact with your data in a useful and intuitive manner.

In this notebook, the earthquakes plotted were either filtered early on by magnitude (>=7) or dynamically to analyse only the earthquakes within a small geographic distance. This allowed us to use Bokeh directly without any special handling and without having to worry about the performance issues that would occur if we were to try to render the whole dataset at once.

In the next section we will see how such large datasets can be visualized directly using Datashader.
