Let's say you want to make it easy to explore some dataset, i.e.:
You can definitely do that in Python, but you would expect to:
Here we'll show a simple, flexible, powerful, step-by-step workflow, explaining which open-source tools solve each of the problems involved:
import holoviews as hv, geoviews as gv, param, dask.dataframe as dd, cartopy.crs as crs
from colorcet import cm, fire
from holoviews.operation import decimate
from holoviews.operation.datashader import datashade
from holoviews.streams import RangeXY
from geoviews.tile_sources import EsriImagery
%time df = dd.read_parquet('../data/nyc_taxi_wide.parq').persist()
print(len(df))
df.head(2)
hv.extension('bokeh')
points = hv.Points(df, ['pickup_x', 'pickup_y'])
decimate(points)
Here Points declares an object wrapping df, visualized as a scatterplot of the pickup locations. decimate limits how many points will be sent to the browser so it won't crash.
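One simple way to limit how many points reach the browser is random subsampling: if the dataset exceeds a fixed budget, keep only a random subset of that size. The sketch below illustrates that idea in plain Python. It is not HoloViews' implementation, and the helper name decimate_points and its max_samples default are assumptions made for this example.

```python
import random

def decimate_points(points, max_samples=5000, seed=42):
    """Return at most max_samples points, chosen uniformly at random.

    Hypothetical helper for illustration; not part of HoloViews.
    """
    if len(points) <= max_samples:
        return list(points)
    rng = random.Random(seed)  # fixed seed so repeated calls agree
    return rng.sample(points, max_samples)

# 100,000 synthetic (x, y) pairs stand in for the taxi pickups
points = [(i % 1000, i // 1000) for i in range(100_000)]
subset = decimate_points(points)
print(len(points), "->", len(subset))
```

The trade-off is visible in the plot above: the browser stays responsive, but most of the data is never drawn.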
As you can see, HoloViews makes it simple to pop up a visualization of your data, getting something on screen with only a few characters of typing. But it's not particularly pretty, so let's customize it a bit:
opts = dict(width=700, height=600, xaxis=None, yaxis=None, bgcolor='black')
decimate(points.options(**opts))
That looks a bit better, but it's still decimating the data nearly beyond recognition, so let's try using Datashader to rasterize it into a fixed-size image to send to the browser:
taxi_trips = datashade(points, cmap=fire).options(**opts)
taxi_trips
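Rasterizing can use every point because the amount of data sent to the browser depends on the image size, not on the number of rows. A minimal sketch of that aggregation step follows, written in plain Python rather than taken from Datashader's own optimized code; the function rasterize_counts and its arguments are hypothetical.

```python
def rasterize_counts(points, x_range, y_range, width, height):
    """Count how many points fall into each cell of a width x height grid."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the viewport, so not drawn
        # Map coordinates to integer cell indices, clamping the upper edge
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid

points = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (2.0, 2.0)]
grid = rasterize_counts(points, (0, 1), (0, 1), width=2, height=2)
print(grid)
```

However many points are supplied, the result is always width x height counts, which a colormap such as fire can then turn into pixel colors.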
Ok, that looks good now; there's clearly lots to explore in this dataset. Notice that the aspect ratio changed because Datashader is using every point, including some distant outliers. One way to fix the aspect ratio is to indicate that it's geographic data by overlaying it on a map:
taxi_trips = datashade(points, x_sampling=1, y_sampling=1, cmap=fire).options(**opts)
EsriImagery * taxi_trips
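An overlay like this only lines up if the points use the same coordinate system as the map tiles. Web tile services are generally drawn in Web Mercator (the projection cartopy exposes as GOOGLE_MERCATOR), and the example above assumes the pickup_x and pickup_y columns are already in that system. If your own data is stored as longitude and latitude instead, a conversion along these lines would be needed first; the function name lonlat_to_web_mercator is made up for this sketch.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator, in meters

def lonlat_to_web_mercator(lon, lat):
    """Convert degrees of longitude/latitude to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# Approximate longitude/latitude of midtown Manhattan
x, y = lonlat_to_web_mercator(-73.985, 40.758)
print(round(x), round(y))
```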
We could add lots more visual elements (laying out additional plots left and right, overlaying annotations, etc.), but let's say that this is the basic visualization we want to share.
To sum up what we've done so far, here are the complete 10 lines of code required to generate this geo-located interactive plot of millions of datapoints in Jupyter:
import holoviews as hv, geoviews as gv, dask.dataframe as dd
from colorcet import fire
from holoviews.operation.datashader import datashade
from geoviews.tile_sources import EsriImagery
hv.extension('bokeh')
df = dd.read_parquet('../data/nyc_taxi_wide.parq').persist()
opts = dict(width=700, height=600, xaxis=None, yaxis=None, bgcolor='black')
points = hv.Points(df, ['pickup_x', 'pickup_y'])
taxi_trips = datashade(points, x_sampling=1, y_sampling=1, cmap=fire).options(**opts)
EsriImagery * taxi_trips
Now that we've prototyped a nice plot, we could keep editing the code above to explore this data. But at this point we often want to start sharing our results with people who are not familiar with programming visualizations in this way.
So the next step: figure out what we want our intended user to be able to change, and declare those variables or parameters with:
The Param library allows declaring Python attributes having these features (and more, such as dynamic values and inheritance), letting you set up a well-defined space for a user (or you!) to explore.
cmaps = ['bgy','bgyw','bmw','bmy','fire','gray','kbc','kgy']
class NYCTaxiExplorer(param.Parameterized):
    alpha      = param.Magnitude(default=0.75, doc="Alpha value for the map opacity")
    plot       = param.ObjectSelector(default="pickup", objects=["pickup","dropoff"])
    colormap   = param.ObjectSelector(default='fire', objects=cmaps)
    passengers = param.Range(default=(0, 10), bounds=(0, 10), doc="""
        Filter for taxi trips by number of passengers""")
Each Parameter is a normal Python attribute, but with special checks and functions run automatically when getting or setting.
Parameters capture your goals and your knowledge about your domain, declaratively.
NYCTaxiExplorer.alpha
NYCTaxiExplorer.alpha = 0.5
NYCTaxiExplorer.alpha
try:
    NYCTaxiExplorer.alpha = '0'
except Exception as e:
    print(e)
try:
    NYCTaxiExplorer.passengers = (0,100)
except Exception as e:
    print(e)
explorer = NYCTaxiExplorer(alpha=0.6)
explorer.alpha
NYCTaxiExplorer.alpha
import panel as pn
pn.Row(NYCTaxiExplorer)
NYCTaxiExplorer.passengers
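To make the idea of checks that run automatically on assignment concrete, here is a stripped-down sketch using a plain Python descriptor. It is not how Param is implemented, and the names BoundedNumber and Explorer are invented for this illustration; it only mimics the behavior shown above, where a class-level default can be overridden per instance and invalid values are rejected.

```python
class BoundedNumber:
    """Descriptor for a numeric attribute restricted to [lo, hi]. Hypothetical."""

    def __init__(self, default, bounds):
        self.bounds = bounds
        self.default = self._validate(default)

    def __set_name__(self, owner, name):
        self.name = name  # attribute name, used as the per-instance storage key

    def _validate(self, value):
        lo, hi = self.bounds
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"expected a number, got {type(value).__name__}")
        if not lo <= value <= hi:
            raise ValueError(f"{value} is outside the bounds {self.bounds}")
        return value

    def __get__(self, obj, owner=None):
        if obj is None:
            return self.default  # class-level access returns the default
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = self._validate(value)


class Explorer:
    alpha = BoundedNumber(default=0.75, bounds=(0, 1))


explorer = Explorer()
explorer.alpha = 0.6  # valid, stored on this instance only
print(explorer.alpha, Explorer.alpha)

for bad in ("0", 1.5):  # wrong type, then out of range
    try:
        explorer.alpha = bad
    except (TypeError, ValueError) as e:
        print(e)
```

Param additionally validates assignments made on the class itself, as in the cells above, which this sketch does not attempt.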
We've now defined the space that's available for exploration, and the next step is to link up the parameter space with the code that specifies the plot:
class NYCTaxiExplorer(param.Parameterized):
    alpha      = param.Magnitude(default=0.75, doc="Alpha value for the map opacity")
    colormap   = param.ObjectSelector(default='fire', objects=cmaps)
    plot       = param.ObjectSelector(default="pickup", objects=["pickup","dropoff"])
    passengers = param.Range(default=(0, 10), bounds=(0, 10))

    def make_view(self, x_range=None, y_range=None, **kwargs):
        points = hv.Points(df, kdims=[self.plot+'_x', self.plot+'_y'], vdims=['passenger_count'])
        selected = points.select(passenger_count=self.passengers)
        taxi_trips = datashade(selected, x_sampling=1, y_sampling=1, cmap=cm[self.colormap],
                               width=800, height=475)
        return EsriImagery.clone(crs=crs.GOOGLE_MERCATOR).options(alpha=self.alpha, **opts) * taxi_trips
Note that the NYCTaxiExplorer class is entirely declarative (no widgets), and can be used "by hand" to provide range-checked and type-checked plotting for values from the declared parameter space:
explorer = NYCTaxiExplorer(alpha=0.4, plot="dropoff")
explorer.make_view()
But in practice, why not pop up the widgets to make it fully interactive?
explorer = NYCTaxiExplorer()
r = pn.Row(explorer, explorer.make_view)
r
Ok, now you've got something worth sharing, running inside Jupyter. But if you want to share your interactive app with people who don't use Python, you'll now want to run a server with this same code.
Add this line to the end of the script to tell Bokeh Server which object to show in the dashboard:
r.server_doc();
Then launch the server from the command line:
bokeh serve nyc_taxi/main.py
with open('apps/nyc_taxi/main.py', 'r') as f: print(f.read())
The other sections in this tutorial will expand on steps in this workflow, providing more step-by-step instructions for each of the major tasks. These techniques can create much more ambitious apps with very little additional code or effort: