
The PyData ecosystem has a number of core Python data containers that allow users to work with a wide array of datatypes, including:

Pandas: DataFrame, Series (columnar/tabular data)
Rapids cuDF: GPU DataFrame, Series (columnar/tabular data)
Dask: DataFrame, Series (distributed/out of core arrays and columnar data)
XArray: Dataset, DataArray (labelled multidimensional arrays)
Streamz: DataFrame(s), Series(s) (streaming columnar data)
Intake: DataSource (data catalogues)
GeoPandas: GeoDataFrame (geometry data)
NetworkX: Graph (network graphs)

Several of these libraries offer a high-level plotting API that lets users generate common plot types easily. These native plotting APIs are generally built on Matplotlib, which provides a solid foundation but means that users miss out on the benefits of modern, interactive, web-oriented plotting libraries such as Bokeh and HoloViews.

hvPlot provides a high-level plotting API built on HoloViews that offers a general and consistent interface for plotting data in all of the formats listed above. hvPlot can integrate with an individual library where that library offers an extension mechanism for its native plotting API, or it can be used as a standalone component.

Basic usage

hvPlot provides an alternative to the static plotting API offered by Pandas and other libraries. By default it produces interactive Bokeh-based plots that support panning, zooming, hovering, and clickable/selectable legends. Let's first create some data.

In [ ]:

import numpy as np
import pandas as pd

# 1000 daily timestamps and four random-walk columns (cumulative sums of Gaussian noise)
idx = pd.date_range('1/1/2000', periods=1000)
df = pd.DataFrame(np.random.randn(1000, 4), index=idx, columns=list('ABCD')).cumsum()

We need to import hvplot.pandas. This import has two side effects:

It makes the .hvplot accessor available on Pandas DataFrame and Series objects. After the import, df.hvplot is valid; before it, accessing df.hvplot raises an AttributeError.
It sets Bokeh as the default plotting library and loads the corresponding extension. In a notebook, this injects front-end code into the output of the import cell. HoloViews plots need this code to behave correctly, so make sure not to remove this cell.

In [ ]:

import hvplot.pandas  # noqa

Now simply call .hvplot() on the DataFrame as you would call .plot().

In [ ]:

df.hvplot()

hvPlot works with multiple data sources and ships with some built-in sample data, which is loaded through an Intake data catalog.

In [ ]:

from hvplot.sample_data import us_crime

columns = ['Burglary rate', 'Larceny-theft rate', 'Robbery rate', 'Violent Crime rate']
us_crime.hvplot.violin(y=columns, group_label='Type of crime', value_label='Rate per 100k', invert=True, color='Type of crime')

hvPlot output can easily be composed using * to overlay plots or + to lay them out side by side:

In [ ]:

# * binds tighter than +, so the overlay is built first and the table is placed beside it
(
    us_crime.hvplot.bivariate('Burglary rate', 'Property crime rate', legend=False, width=500, height=400)
    * us_crime.hvplot.scatter('Burglary rate', 'Property crime rate', color='black', size=15, legend=False)
    + us_crime.hvplot.table(['Burglary rate', 'Property crime rate'], width=350, height=350)
)

When used with streamz DataFrames, hvPlot can plot streaming data as a live-updating plot:

In [ ]:

import hvplot.streamz  # noqa
from streamz.dataframe import Random

streaming_df = Random(freq='5ms') 

streaming_df.hvplot(backlog=100, height=400, width=500) +\
streaming_df.hvplot.hexbin(x='x', y='z', backlog=2000, height=400, width=500);

For multidimensional data that Pandas does not support well, you can use an Xarray Dataset. The example below uses gridded North American air temperatures over time and also demonstrates support for geographic projections:

In [ ]:

import xarray as xr, cartopy.crs as crs
import hvplot.xarray  # noqa

air_ds = xr.tutorial.open_dataset('air_temperature').load()
proj = crs.Orthographic(-90, 30)

air_ds.air.isel(time=slice(0, 9, 3)).hvplot.quadmesh(
    'lon', 'lat', projection=proj, project=True, global_extent=True, 
    cmap='viridis', rasterize=True, dynamic=False, coastline=True, 
    frame_width=500)

hvPlot shows widgets such as the "Time" slider here whenever your data is indexed by dimensions that are not mapped onto the plot axes, which makes complex datasets much easier to explore.

Lastly, hvPlot provides drop-in replacements for the NetworkX plotting functions, which makes it easy to generate interactive plots of network graphs:

In [ ]:

import networkx as nx
import hvplot.networkx as hvnx

G = nx.karate_club_graph()

hvnx.draw_spring(G, labels='club', font_size='10pt', node_color='club', cmap='Category10', width=500, height=500)

Using Matplotlib or Plotly

hvPlot can also create Matplotlib and Plotly plots. Load the chosen plotting library with the hvplot.extension function.

In [ ]:

hvplot.extension('matplotlib')

In [ ]:

air_ds.air.isel(time=slice(0, 9, 3)).hvplot.quadmesh(
    'lon', 'lat', projection=proj, project=True, global_extent=True, 
    cmap='viridis', rasterize=True, dynamic=False, coastline=True,
    xaxis=None, yaxis=None, width=500
)

Once multiple backends are loaded you can switch between them with hvplot.output.

In [ ]:

hvplot.output(backend='bokeh')

hvPlot is designed to work well both inside and outside the Jupyter notebook and, thanks to built-in Datashader support, scales easily to millions or even billions of datapoints.

Pandas backend

With Pandas 0.25.0 or later, we can also swap the default plotting backend so that the regular Pandas plotting calls produce hvPlot output:

In [ ]:

pd.options.plotting.backend = 'holoviews'

df.A.hist()

For more information on using .hvplot(), take a look at the User Guide.