Zarr is a storage format for chunked, N-dimensional arrays. It works well with object storage systems like Azure Blob Storage and open-source libraries like xarray. It's widely used in the geosciences, especially within the Pangeo community.
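To make "chunked" concrete, here is a minimal sketch of how an array index maps to a chunk. It assumes the Zarr version 2 layout, where each chunk is stored under a key built from its chunk-grid indices joined by "." (the default separator). The helper name chunk_key and the example shapes are illustrative and not part of any library.

```python
def chunk_key(index, chunks, separator="."):
    """Return the Zarr v2-style storage key of the chunk holding `index`."""
    if len(index) != len(chunks):
        raise ValueError("index and chunks must have the same number of dimensions")
    # Integer division gives the position of the chunk on the chunk grid.
    grid_position = [i // c for i, c in zip(index, chunks)]
    return separator.join(str(p) for p in grid_position)


# Hypothetical 3-D array split into chunks of 365 x 584 x 284 elements.
key = chunk_key(index=(400, 10, 10), chunks=(365, 584, 284))
print(key)  # 1.0.0
```

Because each chunk is stored as a separate object, a reader only needs to fetch the objects whose keys cover the requested region, which is a large part of why the format suits object storage.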
This example loads Daymet data stored in Zarr format into an xarray Dataset. Daymet provides gridded weather data for North America. We'll look at the daily data covering Hawaii.
The STAC Collections provided by the Planetary Computer contain assets with links to the root of the Zarr store.
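import pystac_client
import planetary_computer

# Open the Planetary Computer STAC API; sign_inplace signs the returned
# STAC objects so that their assets can be read.
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1/",
    modifier=planetary_computer.sign_inplace,
)
# Fetch the daily Daymet collection for Hawaii and pick its Zarr asset.
collection = catalog.get_collection("daymet-daily-hi")
asset = collection.assets["zarr-abfs"]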
The Zarr assets provided by the Planetary Computer implement the xarray-assets extension. The extension's fields specify the required and recommended keyword arguments for loading the data with fsspec-based filesystems and xarray.
list(asset.extra_fields.keys())
['xarray:open_kwargs', 'xarray:storage_options']
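As a rough illustration of how these two fields feed into an open call, the sketch below flattens them into a single keyword dictionary. The values in example_fields are made-up placeholders (the real ones come from the asset), and build_open_kwargs is a hypothetical helper, not part of pystac or xarray.

```python
def build_open_kwargs(extra_fields):
    """Combine the xarray-assets fields into keyword arguments for an open call."""
    # Copy so the asset's own dictionaries are never mutated.
    kwargs = dict(extra_fields.get("xarray:open_kwargs", {}))
    storage_options = extra_fields.get("xarray:storage_options")
    if storage_options is not None:
        kwargs["storage_options"] = dict(storage_options)
    return kwargs


# Placeholder values for illustration only; real assets supply their own.
example_fields = {
    "xarray:open_kwargs": {"consolidated": True},
    "xarray:storage_options": {"account_name": "exampleaccount"},
}
open_kwargs = build_open_kwargs(example_fields)
print(open_kwargs)
```

The resulting dictionary could then be passed along as xr.open_zarr(asset.href, **open_kwargs).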
In this case, the xarray:open_kwargs entry specifies that the dataset should be opened with consolidated metadata.
import xarray as xr

# Open the store lazily, passing along the keywords recommended by the asset.
ds = xr.open_zarr(
    asset.href,
    **asset.extra_fields["xarray:open_kwargs"],
    storage_options=asset.extra_fields["xarray:storage_options"],
)
ds
<xarray.Dataset>
Dimensions:                  (time: 14965, y: 584, x: 284, nv: 2)
Coordinates:
    lat                      (y, x) float32 dask.array<chunksize=(584, 284), meta=np.ndarray>
    lon                      (y, x) float32 dask.array<chunksize=(584, 284), meta=np.ndarray>
  * time                     (time) datetime64[ns] 1980-01-01T12:00:00 ... 20...
  * x                        (x) float32 -5.802e+06 -5.801e+06 ... -5.519e+06
  * y                        (y) float32 -3.9e+04 -4e+04 ... -6.21e+05 -6.22e+05
Dimensions without coordinates: nv
Data variables:
    dayl                     (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>
    lambert_conformal_conic  int16 ...
    prcp                     (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>
    srad                     (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>
    swe                      (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>
    time_bnds                (time, nv) datetime64[ns] dask.array<chunksize=(365, 2), meta=np.ndarray>
    tmax                     (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>
    tmin                     (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>
    vp                       (time, y, x) float32 dask.array<chunksize=(365, 584, 284), meta=np.ndarray>
    yearday                  (time) int16 dask.array<chunksize=(365,), meta=np.ndarray>
Attributes:
    Conventions:       CF-1.6
    Version_data:      Daymet Data Version 4.0
    Version_software:  Daymet Software Version 4.0
    citation:          Please see http://daymet.ornl.gov/ for current Daymet ...
    references:        Please see http://daymet.ornl.gov/ for current informa...
    source:            Daymet Software Version 4.0
    start_year:        1980
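The chunk sizes in this output give a sense of how much data a read will pull. As a back-of-the-envelope sketch using only the numbers printed above (float32 is 4 bytes per value), the following works out the in-memory size of one tmax chunk and the number of chunks along each dimension. The year-to-chunk mapping at the end is an assumption: it presumes each 365-step chunk lines up with one calendar year starting in 1980, which the length 14965 = 41 × 365 suggests but the output does not state.

```python
import math

shape = (14965, 584, 284)    # (time, y, x) from the Dimensions line
chunks = (365, 584, 284)     # chunksize reported for tmax
itemsize = 4                 # bytes per float32 value

# Uncompressed size of a single chunk once loaded into memory.
chunk_bytes = math.prod(chunks) * itemsize

# Integer ceiling division: number of chunks needed along each dimension.
chunks_per_dim = tuple((s + c - 1) // c for s, c in zip(shape, chunks))
total_chunks = math.prod(chunks_per_dim)

print(chunk_bytes)       # 242149760 bytes, roughly 231 MiB
print(chunks_per_dim)    # (41, 1, 1)
print(total_chunks)      # 41

# Assumption: chunk k along time holds the year start_year + k.
start_year = 1980        # from the start_year attribute
chunk_for_2009 = 2009 - start_year
print(chunk_for_2009)    # 29
```

Under that assumption, selecting a single year such as 2009 would read one chunk per variable. The stored chunks are typically compressed, so the bytes transferred may be smaller than the in-memory figure.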
At this point we can load the data, aggregate it, and plot it.
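import warnings
import matplotlib.pyplot as plt

# Suppress RuntimeWarnings emitted while the mean is computed.
warnings.simplefilter("ignore", RuntimeWarning)
fig, ax = plt.subplots(figsize=(12, 12))
# Average the daily maximum temperature over 2009 and plot it as an image.
ds.sel(time="2009")["tmax"].mean(dim="time").plot.imshow(ax=ax, cmap="inferno");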
The xarray User Guide has more information on reading Zarr data. For more about the Daymet dataset, see the Daymet website cited in the dataset attributes (http://daymet.ornl.gov/).