In [1]:

import intake
import pandas as pd

df = intake.open_csv('./data/bird_migration/{species}.csv').read()

def fill_day(v):
    next_year = v.assign(day=v.day + v.day.max())
    last_year = v.assign(day=v.day - v.day.max())
    surrounding_years = pd.concat([last_year, v, next_year])
    filled = surrounding_years.assign(
        lat=surrounding_years.lat.interpolate(), 
        lon=surrounding_years.lon.interpolate())
    this_year = filled[filled.day.isin(v.day)]
    return this_year

df = pd.concat([fill_day(v) for k, v in df.groupby('species')])

Holoviews¶

All of the plots that we made with hvplot in the last notebook are actually holoviews objects rendered in bokeh. holoviews is centered around the idea of annotating your data and letting it display itself. Here we will quickly go through some of the same workflows as we did in the last notebook, but using pure holoviews

In [2]:

import holoviews as hv
from holoviews import opts

hv.extension('bokeh')

Data structures¶

holoviews has it's own concept of a dataset which can accept lots of different types of array-like data such as numpy.array, pandas.Dataframe, xarray.Dataset. To create a holoviews.Dataset we'll define which of the columns in our dataframe are the key dimensions and which are the value dimensions.

In [3]:

bird_ds = hv.Dataset(df, kdims=['lon', 'lat'], vdims=['day', 'species'])
bird_ds

Out[3]:

:Dataset   [lon,lat]   (day,species)

In [4]:

bird_ds.data.head()

Out[4]:

	day	lon	lat	species
0	1	-59.059606	-30.904415	Small-billed_Elaenia
1	2	-59.078229	-30.918065	Small-billed_Elaenia
2	3	-59.097509	-30.931192	Small-billed_Elaenia
3	4	-59.117439	-30.943741	Small-billed_Elaenia
4	5	-59.138009	-30.955660	Small-billed_Elaenia

So we've essentially wrapped some metadata around our data.

Visualizing data¶

Since we annotated our data with which dimensions are keys and which are value dimensions, we don't have to specify that later in the call to plot the data.

In [5]:

p = bird_ds.to(hv.Points).opts(opts.Points(height=500, width=700))
p

Out[5]:

In [6]:

print(p)

:Points   [lon,lat]   (day,species)

We can look at the bird density across the timespan by instead making a hextiles plot.

In [7]:

hv.HexTiles(bird_ds).opts(opts.HexTiles(height=500, width=700))

Out[7]:

Styling¶

We can use a special colormap from colorcet to get the maximum visual distance between each species. Here I am using Glasbey colors and creating a colormap where each bird is mapped to a specific color.

In [8]:

import colorcet as cc

species_cmap = dict(zip(df.species.cat.categories, cc.glasbey))

In [9]:

p.opts(opts.Points(color_index='species', height=500, width=800, legend_position='right', cmap=species_cmap))

Out[9]:

Notice that p itself was not altered, just the way that p was rendered in that one cell.

Grouping¶

Just as we did in hvplot we can group these plots and add widgets to enhance the dimensionality of out visualization.

In [10]:

grouped_birds = p.groupby('day')
grouped_birds.opts(opts.Points(color_index='species', height=500, width=700,
                               show_legend=False, tools=['hover', 'tap', 'box_select'],
                               cmap=species_cmap, size=5))

Out[10]:

In [11]:

print(grouped_birds)

:HoloMap   [day]
   :Points   [lon,lat]   (day,species)

Adding another layer of data¶

It is often useful to add another layer of information under a dataset. In this case we might suspect that birds are motivated to migrate because of changing temperature. We will explore that hypothesis by importing data from a global climate model. For this we will use xarray.

In [12]:

import os
import xarray as xr
import hvplot.xarray

In [13]:

data_url = 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep/air.day.ltm.nc'

# I downloaded the file locally because I was hitting rate limits.
local_file = './data/air.day.ltm.nc'
if os.path.isfile(local_file):
    data_url = local_file
    
ds = xr.open_dataset(data_url)

/Users/jsignell/conda/envs/birds/lib/python3.7/site-packages/xarray/coding/times.py:419: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  self.use_cftime)
/Users/jsignell/conda/envs/birds/lib/python3.7/site-packages/numpy/core/numeric.py:538: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  return array(a, dtype, copy=False, order=order)

The time is in a strange format because it is day of year means for each grid cell. We can convert those to integer day of year like we have for our bird data.

In [14]:

ds = ds.rename(time='day')
ds['day'] = list(range(1,366))
ds.day

Out[14]:

<xarray.DataArray 'day' (day: 365)>
array([  1,   2,   3, ..., 363, 364, 365])
Coordinates:
  * day      (day) int64 1 2 3 4 5 6 7 8 9 ... 358 359 360 361 362 363 364 365

We'll use hvplot to quickly take a look at this new data.

In [15]:

ds['air'].hvplot(x='lon', y='lat', groupby=['level', 'day'], height=500)

Out[15]:

Since we are interested in the temperature near the surface of the earth, we only really need the 1000mbar level. We can select that directly from the xarray.dataset.

In [16]:

grouped_air = ds.sel(level=1000).hvplot('lon', 'lat', groupby='day')
grouped_air

Out[16]:

In [17]:

print(grouped_air)

:DynamicMap   [day]
   :Image   [lon,lat]   (air)

NOTE: hvplot defaults to dynamic mapping rather than pre-computed mapping. This makes it much quicker to render, but means that all the plots aren't computed ahead of time, so there is a bit of a lag sometimes.

We can explore how this works by dragging the day slider and checking the keys that we have on our plot:

In [18]:

grouped_air.keys()

Out[18]:

[1]

Since the bird plot and the air plot are both holoviews object on the same axes, we can combine them into one plot. Remember that the air temperature plot was made using hvplot and the bird plot was made using holoviews - but since the outputs are all holoviews objects this history doesn't matter.

In [19]:

(grouped_air * grouped_birds).opts(height=400)

Out[19]:

Hmmm. That doesn't look great. Turns out that the birds and the temperature use different conventions for longitude. This is a great time to realize that all of these data really belong in a geographic context.

Next Section