hvPlot lets you quickly generate many different types of plot from a standard API by building HoloViews objects, as discussed in the previous notebook. These objects are rendered with Bokeh, which offers a number of standard ways to interact with your plot, such as panning and zooming tools.
Many other modes of interactivity are possible when building an exploratory visualization (such as a dashboard), and these forms of interactivity cannot be achieved using hvPlot alone.
In this notebook, we will drop down to the HoloViews level of representation to build a visualization directly. It consists of linked plots that update when you interactively select a particular earthquake with the mouse. The goal is to show how more sophisticated forms of interactivity can be built when needed, in a way that's fully compatible with all the examples shown in earlier sections.
First let us load our initial imports:
import numpy as np
import pandas as pd
import dask.dataframe as dd
import hvplot.pandas # noqa
import datashader.geo
from holoviews.element import tiles
Next, we clean the data, filter it (for magnitude >= 7) and project it to Web Mercator as before:
df = dd.read_parquet('../data/earthquakes.parq').repartition(npartitions=4)
cleaned_df = df.copy()
cleaned_df['mag'] = df.mag.where(df.mag > 0)
cleaned_reindexed_df = cleaned_df.set_index(cleaned_df.time)
cleaned_reindexed_df = cleaned_reindexed_df.persist()
most_severe = cleaned_reindexed_df[cleaned_reindexed_df.mag >= 7].compute()
x, y = datashader.geo.lnglat_to_meters(most_severe.longitude, most_severe.latitude)
most_severe_projected = most_severe.join([pd.DataFrame({'easting': x}), pd.DataFrame({'northing': y})])
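The projection step converts longitude and latitude in degrees to easting and northing in metres. We assume that `datashader.geo.lnglat_to_meters` uses the standard spherical Web Mercator formula (EPSG:3857). The scalar helper below, `lnglat_to_meters_sketch`, is a hypothetical re-implementation for illustration only and is not datashader's code.

```python
import math

# Sphere radius used by spherical (Web) Mercator, in metres (WGS84 semi-major axis).
EARTH_RADIUS = 6378137.0
# Half the Earth's circumference: the easting reached at longitude 180 degrees.
ORIGIN_SHIFT = math.pi * EARTH_RADIUS

def lnglat_to_meters_sketch(longitude, latitude):
    """Project one (longitude, latitude) pair in degrees to Web Mercator metres.

    Hypothetical illustration of the standard formula, not datashader's implementation.
    """
    # Easting grows linearly with longitude.
    easting = longitude * ORIGIN_SHIFT / 180.0
    # Northing stretches increasingly towards the poles.
    northing = math.log(math.tan((90.0 + latitude) * math.pi / 360.0)) * ORIGIN_SHIFT / math.pi
    return easting, northing

# The point where the prime meridian crosses the equator maps to (almost exactly) the origin.
print(lnglat_to_meters_sketch(0.0, 0.0))
```

The vertical stretching towards the poles is why tile-based maps, and the earthquake points overlaid on them, look increasingly distorted at high latitudes.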
Towards the end of the previous notebook, we generated a scatter plot of earthquakes across the earth with a magnitude >= 7. This plot was projected using Datashader and overlaid on top of a map tile source:
high_mag_quakes = most_severe_projected.hvplot.points(x='easting', y='northing', c='mag',
title='Earthquakes with magnitude >= 7')
esri = tiles.ESRI().redim(x='easting', y='northing')
esri * high_mag_quakes
We also saw that this object is a HoloViews Points object:
print(high_mag_quakes)
This object is an example of a HoloViews Element, which is an object that can display itself. Elements are thin wrappers around your data, and the raw input data is always available on the .data attribute. For instance, we can look at the head of the most_severe_projected DataFrame as follows:
high_mag_quakes.data.head()
We will now learn a little more about HoloViews elements, including how to build them up from scratch so that we can control every aspect of them.
HoloViews elements are the atomic, visualizable components that can be rendered by a plotting library such as Bokeh. We don't actually need to use hvPlot to create these element objects: we can create them directly by importing HoloViews (and loading the extension if we have not loaded hvPlot):
import holoviews as hv
hv.extension("bokeh") # Optional here as we have already loaded hvplot.pandas
Now we can create our own example of a Points element. In the next cell, we plot 100 points drawn from independent normal distributions in the x and y directions:
xs = np.random.randn(100)
ys = np.random.randn(100)
hv.Points((xs, ys))
Note that the axis labels are 'x' and 'y', the default dimensions for this element type. We can use a different set of dimensions along the x- and y-axis (say 'weight' and 'height'), and we can also associate additional fitness information with each point if we wish:
xs = np.random.randn(100)
ys = np.random.randn(100)
fitness = np.random.randn(100)
height_v_weight = hv.Points((xs, ys, fitness), ['weight', 'height'], 'fitness')
height_v_weight
Now we can look at the printed representation of this object:
print(height_v_weight)
Here the printed representation shows the key dimensions that we specified in square brackets as [weight,height] and the additional value dimension fitness in parentheses as (fitness). The key dimensions map to the axes, and the value dimensions can be represented by other visual attributes, as we shall see shortly.
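When given a tuple of columns, HoloViews reads them positionally. The leading columns fill the key dimensions and the remaining columns fill the value dimensions. The plain-Python helper below, `split_dimensions`, is hypothetical and not a HoloViews function. It sketches that mapping for the height_v_weight example, using short made-up lists in place of the random arrays.

```python
def split_dimensions(columns, kdims, vdims):
    """Map positional columns onto named key and value dimensions.

    Hypothetical sketch: the first len(kdims) columns become key dimensions
    (the axes) and the rest become value dimensions (extra per-point data).
    """
    if len(columns) != len(kdims) + len(vdims):
        raise ValueError("expected exactly one column per dimension")
    named = dict(zip(list(kdims) + list(vdims), columns))
    key_part = {name: named[name] for name in kdims}
    value_part = {name: named[name] for name in vdims}
    return key_part, value_part

xs = [0.1, -0.3]
ys = [1.2, 0.4]
fitness = [0.5, -1.0]
keys, values = split_dimensions((xs, ys, fitness), ['weight', 'height'], ['fitness'])
print(keys)    # columns plotted along the x- and y-axis
print(values)  # columns available for color, size, etc.
```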
For more information on HoloViews dimensions, see the corresponding HoloViews user guide.
Visit the HoloViews reference gallery and browse the available set of elements. Pick an element type and try running one of the self-contained examples in the following cell.
The two Points elements above look quite different from the one returned by hvplot showing the earthquake positions. This is because hvplot uses the HoloViews options system to customize the visual representation of these element objects.
Let us color the height_v_weight scatter by the fitness value and use a larger point size:
height_v_weight.opts(color='fitness', size=8, colorbar=True, aspect='square')
Copy the line above into the next cell and try changing the color to 'blue' or 'green', or to another dimension of the data such as 'height' or 'weight'. Are the results what you expect?
The help system
You can learn more about the .opts method and the HoloViews options system in the corresponding user guide. To easily learn about the available options from inside a notebook, you can use hv.help and inspect the 'Style Options'.
# Commented as there is a lot of help output!
# hv.help(hv.Scatter)
At this point, we can gain some insight into the sort of HoloViews object hvPlot builds behind the scenes for our earthquake example:
esri * hv.Points(most_severe_projected, ['easting', 'northing'], 'mag').opts(color='mag', size=8, aspect='equal')
Try using hv.help to inspect the options available for different element types, such as the Points element used above. Copy the line above into the cell below, pick a Points option that makes sense to you, and try using it in the .opts method.
If you can't decide on an option to pick, a good choice is marker. For instance, try marker='+' or marker='d'. HoloViews uses matplotlib's conventions for specifying the various marker types. Try finding out which ones are supported by Bokeh.
When rasterization of the population density data via hvplot was introduced in the last notebook, we saw that the HoloViews object returned was not an element but a DynamicMap.
A DynamicMap enables custom interactivity beyond the Bokeh defaults by dynamically generating elements that get displayed and updated as the plot is interacted with.
The DynamicMap has a counterpart, called the HoloMap, that does not require a live Python server to be running. The HoloMap container will not be covered in this tutorial, but you can learn more about it in the containers user guide.
Now let us build a very simple DynamicMap driven by a linked stream (specifically a PointerXY stream) that represents the position of the cursor over the plot:
from holoviews import streams
pointer = streams.PointerXY(x=0, y=0) # x=0 and y=0 are the initialized values
def crosshair(x, y):
    # Fixed ellipse plus horizontal and vertical lines through the cursor position
    return hv.Ellipse(0, 0, 1) * hv.HLine(y) * hv.VLine(x)
hv.DynamicMap(crosshair, streams=[pointer])
Try moving your mouse over the plot and you should see the crosshair follow your mouse position.
The core concepts here are:
- The plot is an overlay built with the * operator introduced in the previous notebook.
- The crosshair callback builds this overlay from its x and y arguments. A DynamicMap always contains a callback that returns a HoloViews object such as an Element or Overlay.
- The x and y arguments are supplied by the PointerXY stream and reflect the position of the mouse on the plot.
Look up the Ellipse, HLine, and VLine elements in the HoloViews reference guide and see if the definitions of these elements align with your initial intuitions.
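The relationship between a stream and a DynamicMap callback can be modelled in a few lines of plain Python. This is a toy sketch and not how HoloViews is implemented. `ToyPointerStream` and `ToyDynamicMap` are hypothetical classes, and `crosshair` returns strings instead of elements. In the real library, the browser sends mouse positions back to the live Python kernel, the PointerXY stream updates its x and y values, and the DynamicMap re-runs its callback with them.

```python
class ToyPointerStream:
    """Hypothetical stand-in for a linked stream such as PointerXY."""

    def __init__(self, **initial):
        self.contents = dict(initial)  # current parameter values, e.g. x and y
        self.subscribers = []          # callables re-run on every event

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def event(self, **updates):
        # Update the stream parameters, then re-run every subscriber with the full set of values.
        self.contents.update(updates)
        return [callback(**self.contents) for callback in self.subscribers]


class ToyDynamicMap:
    """Hypothetical model of a DynamicMap: keeps the latest output of its callback."""

    def __init__(self, callback, stream):
        self.callback = callback
        self.current = callback(**stream.contents)  # first render uses the stream's initial values
        stream.subscribe(self._update)

    def _update(self, **kwargs):
        self.current = self.callback(**kwargs)
        return self.current


def crosshair(x, y):
    # Strings stand in for hv.Ellipse(0, 0, 1) * hv.HLine(y) * hv.VLine(x).
    return ('Ellipse(0, 0, 1)', f'HLine({y})', f'VLine({x})')


pointer = ToyPointerStream(x=0, y=0)
dmap = ToyDynamicMap(crosshair, pointer)
print(dmap.current)
pointer.event(x=0.5, y=-0.25)  # roughly what a mouse move triggers
print(dmap.current)
```

The key design point is that the callback is a pure function of the stream values, so the same callback works whether it is called once at startup or many times as events arrive.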
If you have time, try running one of the examples in the 'Streams' section of the HoloViews reference guide in the cell below. All the examples in the reference guide should be relatively short and self-contained.
Now we need only two more concepts before we can set up the mechanism to select a particular earthquake on the hvPlot-generated Points plot we started with.
First, we can attach a stream to an existing HoloViews element such as the earthquake distribution generated with hvplot:
selection_stream = streams.Selection1D(source=high_mag_quakes)
Next, we need to enable the 'tap' tool on our Points element to instruct Bokeh to enable the desired selection mechanism in the browser.
high_mag_quakes.opts(tools=['tap'])
The Bokeh default alpha for unselected points is too low once we overlay these points on a tile source. We can use the HoloViews options system to pick a better default as follows:
hv.opts.defaults(hv.opts.Points(nonselection_alpha=0.4))
The tap tool is in the toolbar with the icon showing the concentric circles and plus symbol. If you enable this tool, you should be able to pick individual earthquakes above by tapping on them.
Now we can make a DynamicMap that uses the stream we defined to show the index of the selected earthquake via the hv.Text element:
def labelled_callback(index):
    if len(index) == 0:
        return hv.Text(x=0, y=0, text='')  # Nothing selected: return an empty label
    first_index = index[0]  # Pick only the first one if multiple are selected
    row = most_severe_projected.iloc[first_index]
    return hv.Text(x=row.easting, y=row.northing, text='%d : %s' % (first_index, row.place)).opts(color='white')
labeller = hv.DynamicMap(labelled_callback, streams=[selection_stream])
This labeller receives the index argument from the Selection1D stream, which gives the position of the selected row in the original dataframe (most_severe). This lets us present the index and place value using hv.Text, which we then position at the corresponding easting and northing (the projected longitude and latitude) to label the chosen earthquake.
Finally, we overlay this labeller DynamicMap over the original plot. Now, by using the tap tool, you can see the index number of an earthquake followed by the assigned place name:
(esri * high_mag_quakes * labeller).opts(hv.opts.Points(tools=['tap', 'hover']))
Pick an earthquake point above and, using the displayed index, display the corresponding row of the most_severe dataframe using the .iloc method in the following cell.
Now we will build a visualization that achieves the following:
- The user can select an earthquake with magnitude >= 7 using the tap tool in the manner illustrated in the last section.
- In addition to the existing label, concentric circles further highlight the selected earthquake location.
- All earthquakes inside a box 0.5 degrees wide in latitude and longitude, centred on the selected earthquake (roughly 50 km across), then supply data for two linked plots: a histogram of their magnitudes and a scatter plot of their magnitudes over time.
The first step is to generate a concentric-circle marker using a similar approach to the labeller above. We can write a function that uses Ellipse to mark a particular earthquake and pass it to a DynamicMap:
def mark_earthquake(index):
    if len(index) == 0:
        return hv.Overlay([])  # Nothing selected: empty overlay
    first_index = index[0]  # Pick only the first one if multiple are selected
    row = most_severe_projected.iloc[first_index]
    # Two concentric circles (diameters in Web Mercator metres) centred on the earthquake
    return (hv.Ellipse(row.easting, row.northing, 1.5e6).opts(color='white', alpha=0.5)
            * hv.Ellipse(row.easting, row.northing, 3e6).opts(color='white', alpha=0.5))
quake_marker = hv.DynamicMap(mark_earthquake, streams=[selection_stream])
Now we can test this component by building an overlay of the ESRI tile source, the >= 7 magnitude points and quake_marker:
esri* high_mag_quakes.opts(tools=['tap']) * quake_marker
Note that you may need to zoom in to your selected earthquake to see the localized, lower magnitude earthquakes around it.
We wish to analyse the earthquakes that occur around a particular latitude and longitude. To do this, we define a function that, given a latitude and longitude, returns the rows of a suitable dataframe corresponding to earthquakes within a box 0.5 degrees wide (±0.25 degrees) centred on that position:
def earthquakes_around_point(df, lat, lon, degrees_dist=0.5):
    half_dist = degrees_dist / 2.0  # half the box width on each side of the point
    return df[((df['latitude'] - lat).abs() < half_dist)
              & ((df['longitude'] - lon).abs() < half_dist)].compute()
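The filtering rule is easy to check without Dask. The pure-Python sketch below keeps a record only if both its latitude and its longitude lie strictly within `degrees_dist / 2` of the chosen point. `earthquakes_around_point_sketch` and the three records are made up for illustration. Note that this is a box in degrees rather than a true distance: a degree of latitude spans roughly 111 km, while a degree of longitude shrinks with the cosine of the latitude.

```python
def earthquakes_around_point_sketch(quakes, lat, lon, degrees_dist=0.5):
    """Hypothetical list-of-dicts version of earthquakes_around_point.

    Keeps quakes inside a square box degrees_dist wide, centred on (lat, lon).
    """
    half_dist = degrees_dist / 2.0
    return [q for q in quakes
            if abs(q['latitude'] - lat) < half_dist
            and abs(q['longitude'] - lon) < half_dist]

# Made-up records, not taken from the earthquake dataset.
quakes = [
    {'latitude': 10.00, 'longitude': 20.00, 'mag': 7.5},  # the selected event itself
    {'latitude': 10.15, 'longitude': 20.20, 'mag': 5.2},  # 0.15 N and 0.20 E: inside the box
    {'latitude': 10.00, 'longitude': 20.30, 'mag': 4.8},  # 0.30 E: outside the box
]
nearby = earthquakes_around_point_sketch(quakes, 10.0, 20.0)
print([q['mag'] for q in nearby])
```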
As it can be slow to filter our dataframes in this way, we define the following function. It caches the result of filtering cleaned_reindexed_df (containing all earthquakes) based on an index pulled from the most_severe dataframe:
def index_to_selection(indices, cache={}):  # the mutable default dict persists between calls
    if not indices:
        return most_severe.iloc[[]]  # empty selection: return an empty frame
    index = indices[0]  # Pick only the first one if multiple are selected
    if index in cache:
        return cache[index]
    row = most_severe.iloc[index]
    selected_df = earthquakes_around_point(cleaned_reindexed_df, row.latitude, row.longitude)
    cache[index] = selected_df
    return selected_df
The caching is useful because both of our planned linked plots (i.e. the histogram and the scatter over time) use the same earthquake selection once a particular index is supplied from a user selection. This caching strategy is rather awkward (and leaks memory!), but it is simple and will serve for the current example. A better approach to caching will be presented in the Advanced Dashboards section of the tutorial.
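The memory leak comes from the `cache={}` default argument, which keeps every selection forever. One way to bound the memory is the standard library's `functools.lru_cache` with a `maxsize`. This is a swapped-in technique and not the approach the tutorial adopts later. `expensive_filter` and `index_to_selection_sketch` below are hypothetical stand-ins. Because the stream supplies a list, which is unhashable, the wrapper caches on the first index, an int.

```python
from functools import lru_cache

calls = {'filter': 0}  # counts how often the expensive filter actually runs

def expensive_filter(index):
    # Hypothetical stand-in for earthquakes_around_point(cleaned_reindexed_df, ...).
    calls['filter'] += 1
    return ('rows near quake', index)

@lru_cache(maxsize=32)  # bounded: least recently used selections are evicted
def _cached_selection(index):
    return expensive_filter(index)

def index_to_selection_sketch(indices):
    """Stream-facing wrapper: lists are unhashable, so cache on the first index only."""
    if not indices:
        return None
    return _cached_selection(indices[0])

# Two linked plots asking for the same selection trigger only one filter run.
index_to_selection_sketch([235])
index_to_selection_sketch([235, 7])
print(calls['filter'])
```

With a real DataFrame, every caller would receive the same cached object, so the callbacks should treat it as read-only.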
Test the index_to_selection function above for the index you picked in the previous exercise. Note that the stream supplies a list of indices and that the function above only uses the first value in that list. Do the selected rows look correct?
Convince yourself that the selected earthquakes are within 0.5° of each other in both latitude and longitude. For a given chosen index, you can see the spread in longitude and latitude using the following code:
chosen = 235
selection = index_to_selection([chosen])  # look the selection up once
delta_long = selection.longitude.max() - selection.longitude.min()
delta_lat = selection.latitude.max() - selection.latitude.min()
print("Difference in longitude: %s" % delta_long)
print("Difference in latitude: %s" % delta_lat)
So far, we have overlaid the display updates on top of the existing spatial distribution of earthquakes. However, the data need not be overlaid. We might instead want to attach an entirely new, derived plot alongside that updates dynamically.
Using the same principles as we have already seen, we can define a DynamicMap that returns Histogram distributions of earthquake magnitude:
def histogram_callback(index):
    title = 'Distribution of all magnitudes within half a degree of selection'
    selected_df = index_to_selection(index)
    # 20 bins spanning magnitudes 0 to 10
    return selected_df.hvplot.hist(y='mag', bin_range=(0, 10), bins=20, color='red', title=title)
histogram = hv.DynamicMap(histogram_callback, streams=[selection_stream])
The only real difference in approach here is that we can still use .hvplot to generate our elements instead of declaring the HoloViews elements explicitly. In this example, .hvplot.hist is used.
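With `bin_range=(0, 10)` and `bins=20`, each histogram bin is 0.5 magnitude units wide. The sketch below illustrates that arithmetic. `bin_counts` is a hypothetical helper and not hvPlot's implementation. It follows the same convention as `numpy.histogram`, where the last bin includes its right edge. It also skips NaN magnitudes, which the earlier `mag > 0` cleaning step can introduce.

```python
import math

def bin_counts(values, bin_range=(0, 10), bins=20):
    """Count values into equal-width bins spanning bin_range (hypothetical sketch)."""
    lo, hi = bin_range
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Ignore missing magnitudes and values outside the range.
        if math.isnan(v) or v < lo or v > hi:
            continue
        # The upper edge (v == hi) falls into the last bin rather than a new one.
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    return width, counts

width, counts = bin_counts([4.1, 4.4, 4.6, 7.0, 10.0, float('nan')])
print(width)
print(counts)
```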
The exact same principles can be used to build the scatter callback and the temporal_distribution DynamicMap:
def scatter_callback(index):
    title = 'Temporal distribution of all magnitudes within half a degree of selection'
    selected_df = index_to_selection(index)
    return selected_df.hvplot.scatter('time', 'mag', color='green', title=title)
temporal_distribution = hv.DynamicMap(scatter_callback, streams=[selection_stream])
Lastly, let us define a DynamicMap that draws a VLine to mark the time at which the selected earthquake occurred. This lets us see which tremors may have been aftershocks immediately after that major earthquake:
def vline_callback(index):
    if not index:
        return hv.VLine(0).opts(alpha=0)  # Invisible placeholder when nothing is selected
    row = most_severe.iloc[index[0]]
    return hv.VLine(row.time).opts(line_width=2, color='black')
temporal_vline = hv.DynamicMap(vline_callback, streams=[selection_stream])
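The VLine splits the temporal plot into events before and after the selected earthquake. To make "immediately after" concrete, the sketch below selects the events within a fixed window following the main shock. `events_after` and its seven-day default are hypothetical choices for illustration only. A fixed time window is a crude heuristic, not a seismological definition of an aftershock.

```python
from datetime import datetime, timedelta

def events_after(times, mainshock_time, window=timedelta(days=7)):
    """Return the times falling in [mainshock_time, mainshock_time + window).

    Hypothetical helper: roughly the points just to the right of the VLine.
    """
    end = mainshock_time + window
    return [t for t in times if mainshock_time <= t < end]

# Made-up timestamps for illustration.
main = datetime(2010, 1, 1, 12, 0)
times = [main - timedelta(days=1), main, main + timedelta(hours=5), main + timedelta(days=30)]
print(len(events_after(times, main)))
```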
We now have all the pieces we need to build an interactive, linked visualization of earthquake data.
Test the histogram_callback and scatter_callback functions in the following cell by supplying your chosen index, remembering that these functions require a list argument.
Now we can combine the components we have already built to create a dynamically updating plot together with its linked histogram and temporal scatter plot:
((esri * high_mag_quakes.opts(tools=['tap']) * labeller * quake_marker)
+ histogram + temporal_distribution * temporal_vline).cols(1)
We now have a custom interactive visualization that builds on the output of hvplot by making use of the underlying HoloViews objects that it generates.
When exploring data, it can be convenient to use the .plot API to quickly visualize a particular dataset. By calling .plot to generate different plots over the course of a session, it is possible to gradually build up a mental model of how a particular dataset is structured. While this works well for simple datasets, a linked visualization that supports direct user interaction can be a more efficient tool for gaining insight quickly.
In the workflow presented here, building such custom interaction is relatively quick and easy, and it does not involve throwing away the code used to generate simpler plots. In the spirit of 'short cuts not dead ends', we can reuse the HoloViews output of hvplot from our initial exploration. From it, we can build rich visualizations with custom interaction to explore our data at a deeper level.
These interactive visualizations allow custom interactions beyond the scope of hvplot alone, and they can also display visual annotations not offered by the .plot API. In particular, we can overlay our data on top of tile sources, generate interactive textual annotations, draw shapes such as circles, mark horizontal and vertical lines, and much more. Using HoloViews, you can build visualizations that let you interact directly with your data in a useful and intuitive manner.
In this notebook, the earthquakes plotted were either filtered early on by magnitude (>= 7) or filtered dynamically to analyse only the earthquakes within a small geographic distance. This allowed us to use Bokeh directly, without any special handling. It also avoided the performance issues that would occur if we tried to render the whole dataset at once.
In the next section we will see how such large datasets can be visualized directly using Datashader.