In this Jupyter notebook, we will be using the NumPy, Pandas, and HoloViews libraries to work with our data, using the standard abbreviations np
, pd
, and hv
. In order to show how HoloViews works, we'll focus on the native HoloViews API and not the simplified Pandas-style HVPlot interface, but you can use either one as you prefer.
import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
hv.extension('bokeh')
loads the Bokeh plotting support in HoloViews; hv.extension('matplotlib')
would use the matplotlib support instead. If pyviz
is installed correctly as described at pyviz.org, you should see a HoloViews logo and a small Bokeh logo when you run the cell above.
By default, Python will typically display your data in a numerical form. For instance, let's say you have a simple Pandas dataframe containing 41 samples of a quadratic function y=100-x^2
, at different values of x
:
xs = np.arange(-10, 10.5, 0.5)
ys = 100-xs**2
df = pd.DataFrame(dict(x=xs, y=ys))
This dataframe will display as a table of numbers:
df
It's important to be able to access this numerical representation, but it's awkward to work with, hard to take in at a glance, and even this tiny dataset won't fit on a single screen in table form. In many cases, a graphical representation would be much more natural to work with, but visualizing anything but the simplest data typically requires writing a complex series of plotting commands that are distracting when exploring data. HoloViews provides an alternative approach: instead of having plotting as a separate task, annotate your data to make it instantly visualizable on demand in whatever context it ends up. You can then switch between graphical and numerical representations whenever you wish.
In this case, we know that the data is a set of samples from a continuous function of x, and we can capture that information by declaring a HoloViews object of type Curve
:
simple_curve = hv.Curve(df,'x','y')
print(simple_curve)
:Curve [x] (y)
is HoloViews's shorthand for saying that the data in df
is a set of samples from a continuous function y
of one independent variable x
, and simple_curve
simply pairs your dataframe df
with this semantic declaration.
Once we've captured this crucial bit of metadata, HoloViews now knows enough about this object to represent it graphically, as it will do by default in a Jupyter notebook:
simple_curve
This Bokeh plot is much more convenient to examine than a column of numbers, because it conveys the entire set of data in a compact, easily appreciated, interactively explorable format. HoloViews knew that a continuous curve like this is the right representation for what would otherwise be just a table of numbers, because we explicitly declared the element type as hv.Curve
. Crucially, simple_curve
itself is not a plot, it's just a simple wrapper around your data that happens to have a convenient graphical representation. The full dataframe will always be available as simple_curve.data
, for any numerical computations you would like to do:
simple_curve.data.tail()
As you can see, with HoloViews you don't have to select between plotting your data and working with it numerically. Any HoloViews object will let you do both conveniently; you can simply choose whatever representation is the most appropriate way to approach the task you are doing. This approach is very different from a traditional plotting program, where the objects you create (e.g. a Matplotlib figure or a native Bokeh plot) are a dead end from an analysis perspective, useful only for plotting. This tutorial will show you how to create visualizable Elements
like the Curve
object above, and some of the ways you can annotate them to make sure the visual representation accurately conveys the properties of your data.
Elements are HoloViews' most basic, core primitives. Elements allow you to give a dataset a semantically meaningful visual representation, while preserving the raw data you supplied and allowing various analysis operations to be applied to it. Nearly every Element can be constructed in the same way as you saw for Curve
above (apart from Histogram
and certain annotations):
hv.Element(data, kdims=None, vdims=None, **kwargs)
This standard signature consists of the same five types of information:
Element
: any of the dozens of element types shown in the reference gallery.data
: your data in one of a number of formats described below, such as tabular dataframes or multidimensional gridded Xarray or Numpy arrays.kdims
: "key dimension(s)", also called independent variables or index dimensions in other contexts---the values for which your data was measured.vdims
: "value dimension(s)", also called dependent variables or measurements---what was measured or recorded for each value of the key dimensions.kwargs
: optional keyword arguments specific to that Element
type (rarely needed).For simple_curve
, we selected column x
as the key dimension and y
as the value dimension, corresponding to how we generated the data (y
as a function of x
). Other element types might expect multiple key dimensions as a list, or allow multiple value dimensions (to show multiple data points for the same keys). We supplied a Pandas dataframe for the data
, but we could just as well have supplied the original Numpy arrays xs
and ys
, Python lists, or a Python dictionary:
hv.Curve((xs,ys)) + hv.Curve((list(xs),list(ys))) + hv.Curve({'x':xs,'y':ys})
(We will discuss the +
operator below, but here it just lets us put several HoloViews objects side by side). Each of these data formats would be represented in slightly different textual forms by Python, but as you can see, the graphical representation is the same for all of them because they all represent the same datapoints on a continuous curve.
Of course, maybe you don't want to treat these samples as a curve; perhaps for your purposes you want to see the individual data points, or maybe you want to draw attention to the area under the curve. You can provide this information to HoloViews by selecting the appropriate Element
, such as Scatter
, Area
, or [Bars
]((http://holoviews.org/reference/elements/bokeh/Bars.html). Try creating some of those elements here as an exercise!
# Exercise: Instead of hv.Curve(df,'x','y'), try hv.Area or hv.Scatter to choose how your dataframe appears
Wrapping your data (df
or (xs,ys)
here) as a HoloViews element is sufficient to make it visualizable, but there are many other aspects of the data that we can capture to convey more about its meaning to HoloViews. For instance, we might want to specify what the x-axis and y-axis actually correspond to, in the real world. Perhaps this parabola is the trajectory of a ball thrown into the air, in which case we could declare the object as:
trajectory = hv.Curve(df, ('x','Horizontal distance'), ('y','Height'))
trajectory
Here x
is a short and convenient tab-completeable Python identifier for the dimension that must match the column name when using dataframes, while Horizontal distance
is a human-readable label conveying more about what it means than the original "x
".
Even though the additional information we provided is a description of the data, not parameters of a plotting object, HoloViews is designed to reveal the properties of your data accurately, and so the axes now update to show what these dimensions represent. We'll look more at Dimension
s like x
below, including how to add units or other additional semantic information.
# Exercise: Take a look at trajectory.vdims
The type of an element is a declaration of important facts about your data, which gives HoloViews the appropriate hint required to generate a suitable visual representation from it. For instance, calling it a Curve
is a declaration from the user that the data consists of samples from a continuous function, which is why HoloViews plots it as a connected object. If we convert to an hv.Scatter
object instead, the same set of data will show up as separated points, because "Scatter" does not make an assumption that the data is meant to be continuous:
hv.Scatter(simple_curve) + hv.Area(simple_curve)
Casting the same data between different Element types in this way is often useful as a way to see your data differently, particularly if you are not certain of a single best way to interpret the data. Casting preserves your declared metadata as much as possible, propagating your declarations from the original object to the new one.
# How do you predict the representation for hv.Scatter(trajectory) will differ from
# hv.Scatter(simple_curve) above? Try it!
# Also try casting the trajectory to an area then back to a curve.
x = np.linspace(0, 10, 500)
y = np.linspace(0, 10, 500)
xx, yy = np.meshgrid(x, y)
arr = np.sin(xx)*np.cos(yy)
As above, we know that this data was sampled from a continuous function, but this time it's a function of two key dimensions, so we declare it as an hv.Image
object:
image = hv.Image(arr)
image
A very commonly useful method on all types of elements is the .hist
method, which adjoins a plot displaying the distribution of values:
image.hist()
As you can see, the default names for the dimensions of an Image
are x
and y
for the two key dimensions and z
for the value dimension (shown as color). Overriding those names is easy:
hv.Image(arr, ['xaxis','yaxis'], 'h').hist()
# Exercise: Try visualizing different two-dimensional arrays.
# You can try a new function entirely or simple modifications of the existing one
# E.g., explore the effect of squaring and cubing the arguments to sine and cosine
economic_data = pd.read_csv('../data/macro.csv')
economic_data.tail()
Here country
and year
are possible key dimensions, determining what measurements were taken, and the rest of the columns are possible value dimensions, i.e., the results of those measurements. Notice that the dataframe itself makes no distinction between the two types of dimension; this is information that a human must supply to map the values into something meaningfully visualizable.
As an example, let's build an element that helps us understand how the percentage growth in US GDP varies over time. As our dataframe contains GDP growth data for lots of countries, let us select the United States from the table and create a Curve
element from it:
US_data = economic_data[economic_data['country'] == 'United States'] # Select data for the US only
US_data.tail()
growth_curve = hv.Curve(US_data, 'year', 'growth')
growth_curve
# Exercise: Plot the unemployment (unem) by year
If we want to clarify that 'growth' is 'GDP growth', we could do so in the original call using hv.Curve(US_data, 'year', ('growth','GDP growth'))
, which is a convenient shortcut for creating a Dimension
object with a label. We can also supply dimension labels after the fact using the redim
method:
gdp_growth = growth_curve.redim.label(growth='GDP growth', year='Year')
gdp_growth
Here the redim
method associates a dimension label with each of the two key dimensions, creating and returning a new element called gdp_growth
(you can check for yourself that growth_curve
is unchanged).
The redim
utility lets you easily change other dimension parameters, and as an example let's give our GDP growth dimension the appropriate unit:
gdp_growth.redim.unit(growth='%')
# Exercise: Use redim.unit to give the year dimension a better unit
# For instance, relabel to 'Time' then give the unit as 'years'
So far we have mostly considered elements that map data values directly onto the screen, whether a curve where the x- and y-coordinates are drawn on the screen or an Image where the luminance of each pixel is drawn on the screen. Many other types of visualizations instead summarize the data, typically by computing statistics from it.
One of the most common such summaries is a Histogram
. To compute a standard histogram we can use the np.histogram
function and wrap the return value in a Histogram element:
hv.Histogram(np.histogram(economic_data.growth, normed=True), kdims='Growth')
Alternatively, we can let HoloViews compute the histogram using kernel density estimation by declaring a Distribution
element, which accepts data along with the key dimension (a column of a dataframe, in this case) whose distribution we want to visualize:
hv.Distribution(economic_data, 'growth')
Another option for summarizing a dataset is using a box plot, which we can declare using a BoxWhisker
element, specifying the 'country'
as a kdim and growth
as the vdim, giving us an overview of the distribution of growth percentages for each country:
hv.BoxWhisker(economic_data, 'country', 'growth')
As you saw above, we very often want to combine multiple elements into a single plot, both to save space and to show how things are related. In this section, we introduce the two composition operators +
and *
, which build Layout
and Overlay
objects.
Layouts are useful for grouping related data side by side:
layout = trajectory + hv.Scatter(trajectory) + hv.Area(trajectory) + hv.Spikes(trajectory)
layout.cols(2)
Putting items together like this saves space, but even more importantly it acts as a declaration to HoloViews that these objects are related to each other. HoloViews thus ensures that all of them will share the same axis ranges for all dimensions that match, and zooming or panning on any one of them will make the corresponding change to the others (try it!). The next section of this tutorial will describe how to change or override this behavior when appropriate.
If we look closely, we can see that the result of applying the +
operator is an hv.Layout
object (with a hint that a two-column layout is desired):
print(layout)
Now let us build a new layout by selecting elements from layout
:
layout.Curve.I + layout.Spikes.I
We see that a Layout
lets us pick component elements via two levels of tab-completable attribute access. Note that by default the type of the element defines the first level of access and the second level of access uses Roman numerals by default (because Python identifiers cannot start with numbers).
These two levels correspond to another type of semantic declaration that applies to the elements directly (rather than their dimensions), called group
and label
. Specifically, group
allows you to declare what kind of thing this object is, while label
allows you to label which specific object it is. What you put in those declarations, if anything, will form the title of the plot:
cannonball = trajectory.relabel('Cannonball', group='Trajectory')
integral = hv.Area(cannonball).relabel('Filled')
labelled_layout = cannonball + integral
labelled_layout
# Exercise: Try out the tab-completion of labelled_layout to build a new layout swapping the position of these elements
# Optional: Try using two levels of dictionary-style access to grab the cannonball trajectory
Layout places objects side by side, allowing it to collect (almost!) any HoloViews objects that you want to indicate are related. Another operator *
allows you to overlay elements into a single plot, if they live in the same space (with matching dimensions, and preferably with similar ranges over those dimensions). The result of *
is an Overlay
:
trajectory * hv.Spikes(trajectory)
# Exercise: Make an overlay of the Spikes object from layout on top of the filled trajectory area of labelled_layout
One thing that is specific to Overlays is the use of color cycles to automatically differentiate between elements of the same type and group
:
tennis_ball = cannonball.clone((xs, 0.5*np.array(ys)), label='Tennis Ball')
cannonball + tennis_ball + (cannonball * tennis_ball)
Here we use the clone
method to make a shallower tennis-ball trajectory: the clone
method create a new object that preserves semantic metadata while allowing overrides (in this case we override the input data and the label
).
As you can see, HoloViews can determine that the two overlaid curves will be distinguished by color, and so it also provides a legend so that the mapping from color to data is clear. In an Overlay, the title will contain whatever is shared between the elements in the plot, and for the third plot here, the title is just the group name "Trajectory" because the two labels differ.
# Optional Exercise:
# 1. Create a thrown_ball curve with half the height of tennis_ball by cloning it and assigning the label 'Thrown ball'
# 2. Add thrown_ball to the overlay
HoloViews elements can easily be sliced using array-style syntax or using the .select
method. The following example shows how we can slice the cannonball
trajectory into its ascending and descending components:
ascending = cannonball[-10.0:0.5].relabel('ascending')
descending = cannonball.select(x=(0,None)).relabel('descending')
ascending * descending
As before, the color automatically cycles when overlaying two elements of the same type, HoloViews adds a legend when the different components are labelled, and the plot's title is "Trajectory" because the group name is all that is common to these elements.
Note that the slicing in HoloViews is done in the continuous space of the dimension and not in the integer space of individual data samples. In this instance, the slice is over the Horizontal distance
dimension and we can see that the slicing semantics follow the usual Python convention of an inclusive lower bound and an exclusive upper bound.
This example also illustrates why we typically use simple identifiers for dimension names and reserve longer descriptions for the dimension labels: certain methods such as the select
method shown above accept dimension names as keywords.
In later tutorials, we will see how elements and the principles of composition extend to containers, which makes exploring more complex data quick, easy, and interactive. Before we examine the container types, the next tutorial will look at how to customize the appearance of elements, change the plotting extension, and specify output formats.
For a quick demonstration related to what we will be covering, hit the kernel restart button (⟳) in the toolbar for this notebook, change hv.extension('bokeh')
to hv.extension('matplotlib')
in the first cell, and rerun the notebook!