GeoViews
¶A team at Google Research & Cloud are making parts of the ECMWF Reanalysis version 5 (aka ERA-5) accessible in a Analysis Ready, Cloud Optimized (aka ARCO) format.
In this notebook, we will do the following:
import fsspec
import xarray as xr
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import scipy.spatial
import numpy as np
import geoviews as gv
from geoviews import opts
from pyproj import Transformer
from holoviews.operation.datashader import rasterize
Let's open the single-level-reanalysis Zarr file and save two variables from it
reanalysis = xr.open_zarr(
'gs://gcp-public-data-arco-era5/co/single-level-reanalysis.zarr',
chunks={'time': 48},
consolidated=True,
)
msl = reanalysis.msl
t2m = reanalysis.t2m
These reanalyses are in their native, Guassian coordinates. We will define and use several functions to convert them to a lat-lon grid, as documented in the ARCO ERA-5 GCP example notebooks
def mirror_point_at_360(ds):
extra_point = (
ds.where(ds.longitude == 0, drop=True)
.assign_coords(longitude=lambda x: x.longitude + 360)
)
return xr.concat([ds, extra_point], dim='values')
def build_triangulation(x, y):
grid = np.stack([x, y], axis=1)
return scipy.spatial.Delaunay(grid)
def interpolate(data, tri, mesh):
indices = tri.find_simplex(mesh)
ndim = tri.transform.shape[-1]
T_inv = tri.transform[indices, :ndim, :]
r = tri.transform[indices, ndim, :]
c = np.einsum('...ij,...j', T_inv, mesh - r)
c = np.concatenate([c, 1 - c.sum(axis=-1, keepdims=True)], axis=-1)
result = np.einsum('...i,...i', data[:, tri.simplices[indices]], c)
return np.where(indices == -1, np.nan, result)
Select a particular time range from the dataset.
msl93 = msl.sel(time=slice('1993-03-13T18:00:00','1993-03-14T00:00:00')).compute().pipe(mirror_point_at_360)
t2m93 = t2m.sel(time=slice('1993-03-13T18:00:00','1993-03-14T00:00:00')).compute().pipe(mirror_point_at_360)
Regrid to a lat-lon grid.
tri = build_triangulation(msl93.longitude, msl93.latitude)
longitude = np.linspace(0, 360, num=360*4+1)
latitude = np.linspace(-90, 90, num=180*4+1)
mesh = np.stack(np.meshgrid(longitude, latitude, indexing='ij'), axis=-1)
grid_mesh_msl = interpolate(msl93.values, tri, mesh)
grid_mesh_t2m = interpolate(t2m93.values, tri, mesh)
Construct an Xarray DataArray
from the regridded array.
da_msl = xr.DataArray(data=grid_mesh_msl,
dims=["time", "longitude", "latitude"],
coords=[('time', msl93.time.data), ('longitude', longitude), ('latitude', latitude)],
name='msl')
da_t2m = xr.DataArray(data=grid_mesh_t2m,
dims=["time", "longitude", "latitude"],
coords=[('time', t2m93.time.data), ('longitude', longitude), ('latitude', latitude)],
name='t2m')
Convert to hPa and deg C.
slp = da_msl/100
t2m = da_t2m-273.15
gv.extension('bokeh')
If you want the plot restricted to a specific part of the world, you can specify where, however, we need to transform the coordinates from PlateCarree to WebMercator (aka EPSG-3857), which is the default projection for Geoviews with the Bokeh backend.
lonW, lonE, latS, latN = -130, -60, 20, 55
transformer = Transformer.from_crs(ccrs.PlateCarree(), "EPSG:3857")
lon1, lat1 = transformer.transform(lonW,latS)
lon2, lat2 = transformer.transform(lonE, latN)
Convert our Xarray datasets/data arrays into Geoviews dataset objects, specifying the different dimensions of the dataset (lat, lon, time) as well as the variable we want to plot. For this first example, we'll just visualize the first time in the selected time range.
Note:
By default, Geoviews expects a dataset whose coordinates are lat-lon (i.e., using aPlateCarree
projection) to have the longitude dimension vary from -180 to 180. In this case, the ERA-5's longitudes range from 0 to 360.
crs
of the element declares the actual central_longitude
, e.g. 180 degrees rather than the default 0 degrees.
dataset_era = gv.Dataset(slp.isel(time=0), kdims=['longitude','latitude'],vdims='msl') #create a geoviews dataset
contour = gv.project(dataset_era.to(gv.LineContours, ['longitude', 'latitude'],crs=ccrs.PlateCarree(central_longitude=180))) #create a Geoviews contour object
cint = np.arange(900,1080,4)
(gv.tile_sources.OSM * contour).opts(
opts.LineContours(tools=['hover'], levels=cint, frame_width=700, frame_height=400,show_legend=False, line_width=3,xlim=(lon1,lon2),ylim=(lat1,lat2)))
Use the Bokeh tools to the right of the plot to zoom in/out or reset to the original dimensions. The bottom-most tool tip (the hover tool) will produce a pop-up window that shows the sea-level pressure value at the closest grid point to where your cursor lies.
Info
Notice how this only plots a single time. We can iterate over the time dimension by specifying time as a dimension when we create the Geoviews dataset, as shown below!dataset_era = gv.Dataset(slp, kdims=['longitude','latitude','time'],vdims='msl') #create a geoviews dataset; here we do not select just a single time
contour = gv.project(dataset_era.to(gv.LineContours, ['longitude', 'latitude'])) #create line contours
Create the interactive visualization. We overlay the SLP contours on a map from a web tile server. We specify various options, such as frame size, spatial extent, and line thickness as well.
(gv.tile_sources.OSM * contour).opts(
opts.LineContours(tools=['hover'], levels=cint, frame_width=700, frame_height=400,show_legend=False, line_width=3,xlim=(lon1,lon2),ylim=(lat1,lat2)))
Since there are multiple times in the dataset, we now get a time slider that you can manipulate. Notice that as you change the time, the plot automatically updates!
Note
Did you notice that it took a little longer for the graphic to appear? That is because we have loaded seven total time steps, instead of just one.Now let's plot 2 meter temperatures. This time, we'll visualize the data via a Quadmesh (gv.Quadmesh
) instead of using contour lines (gv.LineContours
).
Quadmesh is more computationally expensive, so it is good practice to subset the data from its original global extent to an area that you're interested in. Here, we select the same region as we did earlier for sea-level pressure.
def lon_to_360(dlon: float) -> float:
return ((360 + (dlon % 360)) % 360)
cond = (t2m.longitude > lon_to_360(lonW)) & (t2m.latitude > latS) & \
(t2m.longitude < lon_to_360(lonE)) & (t2m.latitude < latN)
t2m_cropped = t2m.where(cond, drop=True)
dataset_era = gv.Dataset(t2m_cropped, kdims=['longitude','latitude','time'],vdims='t2m')
qm = gv.project(dataset_era.to(gv.QuadMesh, ['longitude', 'latitude']))
(gv.tile_sources.OSM().opts(xlim=(lon1, lon2),ylim=(lat1, lat2),title=f'ERA-5 2mT', frame_width=700, frame_height=400) * (qm.opts(tools=['hover'], axiswise=True, cmap='coolwarm', colorbar=True, clim=(-50,50), alpha=0.8)))
You probably noticed this visualization took a while to render. A more performant strategy is to rasterize the Geoviews dataset.
Warning
If you plot a large dataset without rasterizing it, e.g. the entire globe for multiple timeplots of data, load times and RAM will scale up in tandem.qm_raster = rasterize(qm, precompute=True)
(gv.tile_sources.OSM().opts(xlim=(lon1, lon2),ylim=(lat1, lat2),title=f'ERA-5 2mT', frame_width=700, frame_height=400) * (qm_raster.opts(tools=['hover'], axiswise=True, cmap='coolwarm', colorbar=True, clim=(-50,50), alpha=0.8)))
Time slider bug!
While the visualization loaded much faster, the rasterized dataset does not update as we manipulate the time slider!In this notebook, we have accessed one of the ARCO ERA-5 datasets, regridded from the ECMWF native spectral to cartesian lat-lon coordinates, and used geoviews to create interactive maps of sea-level pressure and 2-meter temperature.