import intake
import pandas as pd
df = intake.open_csv('./data/bird_migration/{species}.csv').read()
def fill_day(v):
next_year = v.assign(day=v.day + v.day.max())
last_year = v.assign(day=v.day - v.day.max())
surrounding_years = pd.concat([last_year, v, next_year])
filled = surrounding_years.assign(
lat=surrounding_years.lat.interpolate(),
lon=surrounding_years.lon.interpolate())
this_year = filled[filled.day.isin(v.day)]
return this_year
df = pd.concat([fill_day(v) for k, v in df.groupby('species')])
All of the plots that we made with hvplot
in the last notebook are actually holoviews
objects rendered in bokeh. holoviews
is centered around the idea of annotating your data and letting it display itself. Here we will quickly go through some of the same workflows as we did in the last notebook, but using pure holoviews
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
holoviews
has it's own concept of a dataset which can accept lots of different types of array-like data such as numpy.array
, pandas.Dataframe
, xarray.Dataset
. To create a holoviews.Dataset
we'll define which of the columns in our dataframe are the key dimensions and which are the value dimensions.
bird_ds = hv.Dataset(df, kdims=['lon', 'lat'], vdims=['day', 'species'])
bird_ds
:Dataset [lon,lat] (day,species)
bird_ds.data.head()
day | lon | lat | species | |
---|---|---|---|---|
0 | 1 | -59.059606 | -30.904415 | Small-billed_Elaenia |
1 | 2 | -59.078229 | -30.918065 | Small-billed_Elaenia |
2 | 3 | -59.097509 | -30.931192 | Small-billed_Elaenia |
3 | 4 | -59.117439 | -30.943741 | Small-billed_Elaenia |
4 | 5 | -59.138009 | -30.955660 | Small-billed_Elaenia |
So we've essentially wrapped some metadata around our data.
Since we annotated our data with which dimensions are keys and which are value dimensions, we don't have to specify that later in the call to plot the data.
p = bird_ds.to(hv.Points).opts(opts.Points(height=500, width=700))
p
print(p)
:Points [lon,lat] (day,species)
We can look at the bird density across the timespan by instead making a hextiles plot.
hv.HexTiles(bird_ds).opts(opts.HexTiles(height=500, width=700))
We can use a special colormap from colorcet
to get the maximum visual distance between each species. Here I am using Glasbey colors and creating a colormap where each bird is mapped to a specific color.
import colorcet as cc
species_cmap = dict(zip(df.species.cat.categories, cc.glasbey))
p.opts(opts.Points(color_index='species', height=500, width=800, legend_position='right', cmap=species_cmap))
Notice that p
itself was not altered, just the way that p
was rendered in that one cell.
Just as we did in hvplot
we can group these plots and add widgets to enhance the dimensionality of out visualization.
grouped_birds = p.groupby('day')
grouped_birds.opts(opts.Points(color_index='species', height=500, width=700,
show_legend=False, tools=['hover', 'tap', 'box_select'],
cmap=species_cmap, size=5))
print(grouped_birds)
:HoloMap [day] :Points [lon,lat] (day,species)
It is often useful to add another layer of information under a dataset. In this case we might suspect that birds are motivated to migrate because of changing temperature. We will explore that hypothesis by importing data from a global climate model. For this we will use xarray
.
import os
import xarray as xr
import hvplot.xarray
data_url = 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep/air.day.ltm.nc'
# I downloaded the file locally because I was hitting rate limits.
local_file = './data/air.day.ltm.nc'
if os.path.isfile(local_file):
data_url = local_file
ds = xr.open_dataset(data_url)
/Users/jsignell/conda/envs/birds/lib/python3.7/site-packages/xarray/coding/times.py:419: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range self.use_cftime) /Users/jsignell/conda/envs/birds/lib/python3.7/site-packages/numpy/core/numeric.py:538: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range return array(a, dtype, copy=False, order=order)
The time is in a strange format because it is day of year means for each grid cell. We can convert those to integer day of year like we have for our bird data.
ds = ds.rename(time='day')
ds['day'] = list(range(1,366))
ds.day
<xarray.DataArray 'day' (day: 365)> array([ 1, 2, 3, ..., 363, 364, 365]) Coordinates: * day (day) int64 1 2 3 4 5 6 7 8 9 ... 358 359 360 361 362 363 364 365
We'll use hvplot
to quickly take a look at this new data.
ds['air'].hvplot(x='lon', y='lat', groupby=['level', 'day'], height=500)
Since we are interested in the temperature near the surface of the earth, we only really need the 1000mbar level. We can select that directly from the xarray.dataset
.
grouped_air = ds.sel(level=1000).hvplot('lon', 'lat', groupby='day')
grouped_air
print(grouped_air)
:DynamicMap [day] :Image [lon,lat] (air)
NOTE: hvplot
defaults to dynamic mapping rather than pre-computed mapping. This makes it much quicker to render, but means that all the plots aren't computed ahead of time, so there is a bit of a lag sometimes.
We can explore how this works by dragging the day
slider and checking the keys that we have on our plot:
grouped_air.keys()
[1]
Since the bird plot and the air plot are both holoviews object on the same axes, we can combine them into one plot. Remember that the air temperature plot was made using hvplot
and the bird plot was made using holoviews
- but since the outputs are all holoviews objects this history doesn't matter.
(grouped_air * grouped_birds).opts(height=400)
Hmmm. That doesn't look great. Turns out that the birds and the temperature use different conventions for longitude. This is a great time to realize that all of these data really belong in a geographic context.