Accessing Daymet data with the Planetary Computer STAC API
The Daymet dataset contains daily minimum temperature, maximum temperature, precipitation, shortwave radiation, vapor pressure, snow water equivalent, and day length at 1 km resolution for North America, Hawaii, and Puerto Rico. Annual and monthly summaries are also available. The dataset covers the period from January 1, 1980 to December 31, 2020.
Daymet is available in both NetCDF and Zarr formats on Azure; this notebook demonstrates using the Planetary Computer STAC API to access the Zarr data, which can be read into an xarray Dataset. If you only need a subset of the data, we recommend using xarray and Zarr to avoid downloading the full dataset unnecessarily.
There are 9 STAC Collections representing the unique combinations of the 3 regions (na, hi, pr) and 3 frequencies (daily, monthly, annual); each collection ID is prefixed with daymet, for example daymet-monthly-na or daymet-annual-pr. Each collection contains all the metadata necessary to load the dataset with xarray, and can be read by PySTAC.
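As a sketch of the loading pattern (it assumes, as the Planetary Computer Daymet collections do, a Zarr asset keyed zarr-abfs carrying xarray:open_kwargs and xarray:storage_options metadata; inspect collection.assets if the keys differ):

```python
import planetary_computer
import pystac_client
import xarray as xr

# Open the Planetary Computer STAC API, signing assets for Azure Blob access.
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

# Load one of the nine collections, e.g. daily data for Hawaii.
collection = catalog.get_collection("daymet-daily-hi")

# The "zarr-abfs" key and its xarray:* metadata are assumptions about the
# asset layout; check collection.assets if this raises a KeyError.
asset = collection.assets["zarr-abfs"]
ds = xr.open_zarr(
    asset.href,
    **asset.extra_fields["xarray:open_kwargs"],
    storage_options=asset.extra_fields["xarray:storage_options"],
)
```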
Using xarray, we can quickly select subsets of the data, perform an aggregation, and plot the result. For example, we'll plot the average of the maximum temperature for the year 2020. We can tell xarray to keep the attributes from the Dataset on the resulting DataArray.
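A minimal sketch of that step, using the ds opened above and Daymet's tmax variable:

```python
import xarray as xr

# Average the 2020 daily maximum temperature per pixel. keep_attrs=True
# carries the units and long name through to the plot labels.
with xr.set_options(keep_attrs=True):
    mean_tmax_2020 = ds["tmax"].sel(time="2020").mean(dim="time")

mean_tmax_2020.plot(figsize=(10, 8))
```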
Each of the datasets is chunked to allow for parallel, out-of-core, or distributed processing with Dask. Along the time dimension, every frequency (daily, monthly, annual) is chunked so that each year falls in a single chunk. Along the x and y dimensions, each region is chunked so that no single chunk is larger than about 250 MB, which mainly matters for the large na region.
So our prcp array has shape (14965, 584, 284), where each chunk is (365, 584, 284): one year of daily data per chunk. Examining the store for monthly North America, we see chunks of size (12, 2000, 2000): one year of monthly data, tiled spatially.
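You can read the chunking straight off the Dask-backed array (the shapes shown match the daily Hawaii collection opened above):

```python
# Inspect the shape and per-chunk shape of the precipitation variable.
prcp = ds["prcp"]
print(prcp.shape)           # (14965, 584, 284)
print(prcp.data.chunksize)  # (365, 584, 284): one year of daily data per chunk
```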
North America is considerably larger than the Hawaii or Puerto Rico datasets, so let's downsample a bit for quicker plotting. We'll also start a Dask cluster to do the reads and processing in parallel. If you're running this on the Hub, use the following URL in the Dask Extension to see progress. If you're not running it on the Hub, you can use a distributed.LocalCluster to achieve the same result (though it will take longer, since it runs on a single machine).
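A sketch of the local-cluster alternative, reopening monthly North America with the same pattern as before and coarsening it; the 8x8 block-mean factor is an arbitrary choice for illustration, not a Daymet convention:

```python
import planetary_computer
import pystac_client
import xarray as xr
from distributed import Client, LocalCluster

# A local stand-in for the Hub's Dask Gateway cluster.
client = Client(LocalCluster())
print(client.dashboard_link)  # open this URL to watch task progress

# Open monthly North America (same pattern as above; asset key assumed).
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
asset = catalog.get_collection("daymet-monthly-na").assets["zarr-abfs"]
ds_na = xr.open_zarr(
    asset.href,
    **asset.extra_fields["xarray:open_kwargs"],
    storage_options=asset.extra_fields["xarray:storage_options"],
)

# Downsample by averaging 8x8 blocks of 1 km cells for quicker plotting.
ds_coarse = ds_na.coarsen(x=8, y=8, boundary="trim").mean()
```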
The dataset contains a lot of months. Let's compute the average precipitation over the whole continent for each month of the last 10 years. From that, we'll grab the six wettest months and plot them.
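A sketch of that computation on the coarsened monthly dataset from above; the 2011-2020 window and the prcp variable name follow the dataset's conventions:

```python
# Continent-wide mean precipitation per month over the last decade.
prcp = ds_coarse["prcp"].sel(time=slice("2011", "2020"))
monthly_mean = prcp.mean(dim=["x", "y"]).compute()

# Pick the six wettest months and plot their spatial fields side by side.
top6_times = monthly_mean.to_series().nlargest(6).index
prcp.sel(time=top6_times).plot(col="time", col_wrap=3)
```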