Written by: James A. Bednar and Philipp Rudiger
Created: January 13, 2020
Last updated: August 3, 2021
The Bokeh Python plotting library lets users build interactive apps and plots in a web browser for a very wide variety of data types. The high-level library HoloViews builds on Bokeh, making it easier to use for common data-processing tasks, and the corresponding GeoViews library adds support for plotting in geographic coordinate systems.
These tools now (as of development releases in 1/2020) all support interactively collecting data from the user, not just interacting with existing data. Components provided by HoloViews (and by GeoViews for data on maps) make it simple to get that data into Python, ready to process and to use for tagging data for machine-learning pipelines (or any other purpose!). These "annotation" and "drawing" tools can be used to annotate existing data sets or geographic locations, classifying individual examples or regions into categories or attaching numeric or other labels to them.
These tools make it possible to work directly with data in its native coordinates and values, and then immediately use it for further processing in Python. Other tools like labelImg will usually be faster and easier to use for the specific things they do, so if one of those meets your need, use it! Meanwhile, use the Bokeh/HoloViews annotation tools if you want to quickly create a fully custom app for special purposes, especially if you want to keep working with data you are already using in Python, in its native coordinates.
Here, we will focus only on the easy-to-use, high-level "annotator" components from HoloViews; fully custom control is always available by using Bokeh's drawing tools directly.
import holoviews as hv
import geoviews as gv
hv.extension('bokeh')
Let's imagine our task is to mark the locations of trees on satellite images. For convenience, we'll use a tile-based web mapping server where these images have been cleaned up and put into geographic lat,lon coordinates already. In GeoViews, you can easily get a tile source to work with:
tiles = gv.tile_sources.EsriImagery()
tiles
Next, we'll need an object to collect some lat,lon locations. Here's an example with three points already identified:
pts = dict(
    Longitude = [-121.932619100, -121.932362392, -121.933530027],
    Latitude  = [  36.631164244,   36.629475356,   36.630623206])
opts = dict(size=10, line_color='black', padding=0.1, min_height=400)
points = gv.Points(pts).opts(**opts)
points
These particular points are locations of actual trees somewhere in Monterey, California, as you can see if you overlay them onto the tiles (where * means "overlay" in HoloViews):
tiles * points
The Bokeh tools in the tool bar let you pan and zoom on this plot interactively, but the data in it is fixed. What if we wanted to label all the trees that we can see here, i.e. add more data points? That's where the HoloViews Annotators come in.
A HoloViews (or GeoViews) annotator lets you add or change data in a Bokeh plot, or attach extra information to it, then get the data back into Python easily. Here, let's make an annotator for the points, then overlay the annotated points on the map tiles like we had before:
points_annotator = hv.annotate.instance()
hv.annotate.compose(tiles, points_annotator(points, annotations=dict(Size=int, Type=str)))
You'll see that there is now a table of coordinates and also that there is now a PointDraw tool in the toolbar:
Once you select that tool, you should be able to click and drag any of the existing points and see the location update in the table. Whether you click on the table or the points, the same object should be selected in each, so that you can see how the graphical and tabular representations relate.
The PointDraw tool also allows us to add completely new points; once the tool is selected, just click on the plot above in a location not already containing a point, and a new point and a new table row will appear, ready for editing. You can also delete points by selecting them in the plot or the table, then moving back to the plot (if needed) and pressing Backspace or Delete (depending on operating system).
Whether for existing or newly added points, you can use the table to edit the latitude and longitude values numerically or add an optional "Size" estimate or "Type" description for each point.
There are also save and restore tools that help make sure you don't lose work once you've added a lot of data, but we won't have time to cover those here.
Now that we've added some points, let's get the data back into Python as a Pandas DataFrame:
points_annotator.annotated.dframe()
You should see that you can access the current set of user-provided or user-modified points and their user-provided labels from within Python, ready for saving to disk or any subsequent processing you need to do.
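As a minimal sketch of the "saving to disk" step, the example below writes annotation records to CSV text and reads them back using only the standard library. The `records` list is a stand-in for what `points_annotator.annotated.dframe().to_dict("records")` might return, assuming columns named Longitude, Latitude, Size, and Type and assuming every row has been filled in; the Size and Type values are invented for illustration, and the helper names are hypothetical.

```python
import csv
import io

# Stand-in for the annotator's DataFrame rows. Coordinates are the three
# trees defined earlier; the Size and Type values are made up.
records = [
    {"Longitude": -121.932619100, "Latitude": 36.631164244, "Size": 5, "Type": "oak"},
    {"Longitude": -121.932362392, "Latitude": 36.629475356, "Size": 3, "Type": "pine"},
    {"Longitude": -121.933530027, "Latitude": 36.630623206, "Size": 8, "Type": "oak"},
]


def records_to_csv(rows):
    """Serialize annotation records to CSV text, one row per point."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_records(text):
    """Parse the CSV text back into records, restoring the numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "Longitude": float(row["Longitude"]),
            "Latitude": float(row["Latitude"]),
            "Size": int(row["Size"]),
            "Type": row["Type"],
        })
    return rows


csv_text = records_to_csv(records)
restored = csv_to_records(csv_text)
```

With a real DataFrame, calling its `to_csv` method would likely be the shorter route; the point here is only that the annotations are ordinary tabular data once they are back in Python.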
We can also access the currently selected points, in case we care only about a subset of the points (which will be empty if no points/rows are selected):
points_annotator.selected.dframe()
HoloViews data types that can currently be annotated include:
Points/Scatter
Curve
Path
Polygons
Rectangles
Let's look at the Rectangles annotator, which behaves very similarly to the Points annotator, except that it uses the BoxEdit tool in the toolbar:
rectangles = gv.Rectangles([(0, 0, 3, 3), (12, 12, 15, 15)]).opts(fill_alpha=0.2)
box_annotator = hv.annotate.instance()
labels = gv.tile_sources.StamenLabels()
layout = box_annotator(rectangles, name="Rectangles")
hv.annotate.compose(tiles, layout, labels)
As for Points, we can access the data using the annotated property on the annotator instance, and then use these coordinates as part of any subsequent workflow:
box_annotator.annotated.dframe()
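One possible "subsequent workflow" is to find which annotated box, if any, contains a given location. The sketch below assumes each rectangle is available as a `(left, bottom, right, top)` tuple, matching the order passed to `gv.Rectangles` above; `box_index` is a hypothetical helper, not part of HoloViews or GeoViews.

```python
# The two rectangles defined above, as (left, bottom, right, top) tuples.
boxes = [(0, 0, 3, 3), (12, 12, 15, 15)]


def box_index(lon, lat, boxes):
    """Return the index of the first box containing (lon, lat), or None."""
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        # Sort each pair so a box with swapped corners is still handled.
        left, right = sorted((x0, x1))
        bottom, top = sorted((y0, y1))
        if left <= lon <= right and bottom <= lat <= top:
            return i
    return None


inside_first = box_index(1, 2, boxes)
inside_second = box_index(13, 14, boxes)
outside = box_index(5, 5, boxes)
```

Points on a box edge count as inside here; whether that is the right choice depends on the task.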
Annotated points associate data with each point, and annotated Rectangles associate data with the entire Rectangle. Annotated Paths and Polygons allow both, i.e. associating one value with the entire object ("this polygon is Arizona"), and associating specific values with each vertex of the object ("this position along the border has elevation X"). This capability makes these annotators more complex (see the HoloViews Annotators user guide and the PolyDraw and PolyEdit docs for more details), but we'll do a brief demo here.
To edit and annotate the vertices, use the draw tool or the first table to select a particular path/polygon and then navigate to the Vertices tab.
path = gv.Path([([-3.208222, -3.203861, -3.203865, -3.202945, -3.205764, -3.208222],
                 [55.868081, 55.867272, 55.867866, 55.868922, 55.869360, 55.868081]),
                ([-3.208646, -3.206124, -3.208234, -3.211137, -3.208646],
                 [55.864370, 55.863135, 55.861888, 55.862793, 55.864370])])
path_annotator = hv.annotate.instance()
layout = path_annotator(path, annotations=['Label'], vertex_annotations=['Value'])
hv.annotate.compose(tiles, layout)
To access the data, we can use the iloc method on Path objects to select a particular path, and then either access its .data or convert it to a dataframe:
path_annotator.annotated.iloc[0].dframe()
By the way, for any of the annotators but most usefully for paths and polygons, we can also get the data back as a Shapely geometry if that's more convenient to work with:
path_annotator.annotated.geom()
path_annotator.annotated.iloc[0].geom()
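Once the vertices are back in Python, ordinary geometry applies. As a rough illustration, the sketch below estimates the length in meters of the first path defined above, using the haversine great-circle formula on its longitude/latitude vertices. The spherical Earth model, the radius constant, and the function names are assumptions made for this example and are not something the annotators provide.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def path_length_m(lons, lats):
    """Sum the segment lengths between consecutive vertices of one path."""
    coords = list(zip(lons, lats))
    return sum(haversine_m(*p, *q) for p, q in zip(coords, coords[1:]))


# Vertices of the first path from the example above.
lons = [-3.208222, -3.203861, -3.203865, -3.202945, -3.205764, -3.208222]
lats = [55.868081, 55.867272, 55.867866, 55.868922, 55.869360, 55.868081]

length = path_length_m(lons, lats)
```

For anything beyond a quick estimate, a projected coordinate system or a geodesic library would be a better fit than this spherical approximation.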
As you can see above, it's fairly straightforward to build an annotator to collect a specific type of data. To collect data at a large scale, you'll want to focus on usability, which will often mean creating a special-purpose app to collect data across multiple images, multiple datasets, by multiple raters, etc. Doing so is beyond the scope of this introduction, but can be straightforward using the separate Panel library for building apps, also based on Bokeh and having full support for HoloViews. The annotator objects can be included directly in a Panel layout and connected to other Panel objects for seamless updating and integration into a larger workflow.
For more details, see:
And please let us know if you find any rough edges or missing functionality in the annotators; these are relatively new to Bokeh, HoloViews, and GeoViews, and can probably be improved as more people try them out for new tasks as long as we know what's needed!