#!/usr/bin/env python
# coding: utf-8

# ## Climate Impact Lab Global Downscaled Projections for Climate Impacts Research
# 
# The Climate Impact Lab Global Downscaled Projections for Climate Impacts Research (CIL-GDPCIR) collections contain bias-corrected and downscaled 1/4° CMIP6 projections for temperature and precipitation.
# 
# See the project homepage for more information: [github.com/ClimateImpactLab/downscaleCMIP6](https://github.com/ClimateImpactLab/downscaleCMIP6).
# 
# This tutorial covers accessing the STAC API to explore the collections and open a dataset. Additional tutorials are available at [github.com/microsoft/PlanetaryComputerExamples](https://github.com/microsoft/PlanetaryComputerExamples/blob/main/datasets/cil-gdpcir).
# 

# In[1]:


import planetary_computer
import pystac_client


# ### STAC Metadata
# 
# The [CIL-GDPCIR datasets](https://planetarycomputer.microsoft.com/dataset/group/cil-gdpcir) are grouped into two collections, depending on the license the data are provided under.
# 
# - [CIL-GDPCIR-CC0](https://planetarycomputer.microsoft.com/dataset/cil-gdpcir-cc0)
# - [CIL-GDPCIR-CC-BY](https://planetarycomputer.microsoft.com/dataset/cil-gdpcir-cc-by)
# 
# The data assets in these collections are a set of [Zarr](https://zarr.readthedocs.io/) groups, which can be opened by tools like [xarray](https://xarray.pydata.org/). Each Zarr group contains a single data variable (either `pr`, `tasmax`, or `tasmin`). The Planetary Computer provides a single STAC item per experiment, and each STAC item has one asset per data variable.
# 
# To access the data, we'll create a `pystac_client.Client` for the Planetary Computer data catalog, passing `modifier=planetary_computer.sign_inplace` so that the returned results are signed automatically. See the [reading STAC quickstart](https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/) for more.

# In[2]:


catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1/",
    modifier=planetary_computer.sign_inplace,
)
collection = catalog.get_collection("cil-gdpcir-cc-by")
item = collection.get_item("cil-gdpcir-NUIST-NESM3-ssp585-r1i1p1f1-day")
item.assets
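

# The item ID used above appears to follow the pattern `cil-gdpcir-{institution_id}-{source_id}-{experiment_id}-{member_id}-{frequency}`. This is an assumption inferred from the single ID shown in this tutorial, not a documented guarantee, and `make_item_id` is a hypothetical helper. A minimal sketch:

```python
def make_item_id(institution_id, source_id, experiment_id,
                 member_id="r1i1p1f1", frequency="day"):
    """Join the ID components with hyphens.

    The component order is an assumption based on the example ID
    "cil-gdpcir-NUIST-NESM3-ssp585-r1i1p1f1-day".
    """
    parts = ["cil-gdpcir", institution_id, source_id,
             experiment_id, member_id, frequency]
    return "-".join(parts)


item_id = make_item_id("NUIST", "NESM3", "ssp585")
print(item_id)
```

# If the pattern holds for other models, the result could be passed to `collection.get_item(item_id)`. Searching on the `cmip6:` properties is the more robust way to locate an item.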


# The STAC metadata includes all the required fields from the [CMIP6 controlled vocabularies](https://wcrp-cmip.github.io/CMIP6_CVs/).

# In[3]:


item.properties["cmip6:source_id"]


# ### Querying the STAC API
# 
# Use the Planetary Computer STAC API to find the exact data you want. You'll most likely want to query on the controlled vocabulary fields, which live under the `cmip6:` prefix. See the collection summary for the set of allowed values for each of those fields.

# In[4]:


collection.summaries.to_dict()
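

# Before sending a search, it can help to check the requested values against the collection summaries. The sketch below assumes that the summaries dictionary maps each `cmip6:` field to a plain list of allowed values. `check_filters` is a hypothetical helper, and the `summaries` dictionary is a hand-written stand-in rather than the real catalog response.

```python
def check_filters(summaries, filters):
    """Return the {field: value} pairs whose value is not listed in summaries."""
    invalid = {}
    for field, value in filters.items():
        allowed = summaries.get(field)
        # Only fields summarized as a plain list of values are checked;
        # unknown fields and range-style summaries are skipped.
        if isinstance(allowed, list) and value not in allowed:
            invalid[field] = value
    return invalid


# Hand-written stand-in for collection.summaries.to_dict()
summaries = {
    "cmip6:source_id": ["NESM3"],
    "cmip6:experiment_id": ["historical", "ssp585"],
}

invalid = check_filters(
    summaries,
    {"cmip6:source_id": "NESM3", "cmip6:experiment_id": "ssp999"},
)
print(invalid)
```

# An empty result means every requested value appears in the summaries.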


# In[5]:


search = catalog.search(
    collections=["cil-gdpcir-cc-by"],
    query={"cmip6:source_id": {"eq": "NESM3"}, "cmip6:experiment_id": {"eq": "ssp585"}},
)
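# item_collection() replaces get_all_items(), which is deprecated in recent pystac_client releases.
items = search.item_collection()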
len(items)
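

# The `query` argument used in the search is a dictionary that maps each property name to an operator and a value. A minimal sketch of building it from keyword arguments, assuming only equality filters on `cmip6:` fields are needed; `cmip6_query` is a hypothetical helper, not part of `pystac_client`.

```python
def cmip6_query(**filters):
    """Build an equality-only query dictionary for cmip6-prefixed properties."""
    return {f"cmip6:{name}": {"eq": value} for name, value in filters.items()}


query = cmip6_query(source_id="NESM3", experiment_id="ssp585")
print(query)
```

# The resulting dictionary has the same shape as the literal passed to `catalog.search(..., query=...)` in this tutorial.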


# In[6]:


item = items[0]
item


# We can now open a single asset, here `tasmax`, with xarray:

# In[7]:


import xarray as xr

asset = item.assets["tasmax"]

ds = xr.open_dataset(asset.href, **asset.extra_fields["xarray:open_kwargs"])
ds


# To form a datacube containing every variable, open each asset and combine the per-variable datasets with [`xarray.combine_by_coords`](https://docs.xarray.dev/en/stable/generated/xarray.combine_by_coords.html).

# In[8]:


# Open every asset (one Zarr group per variable) and merge them on their shared coordinates.
ds = xr.combine_by_coords(
    [
        xr.open_dataset(asset.href, **asset.extra_fields["xarray:open_kwargs"])
        for asset in item.assets.values()
    ],
    combine_attrs="drop_conflicts",
)
ds
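

# To see what `combine_by_coords` does without touching remote storage, the sketch below applies the same call to three small synthetic datasets. The coordinate values, fill values, attributes, and the `make_variable_dataset` helper are made up for illustration and do not reflect the real CIL-GDPCIR data.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_variable_dataset(name, fill_value):
    """Create a tiny dataset holding one variable on a shared (time, lat, lon) grid."""
    time = pd.date_range("2015-01-01", periods=2, freq="D")
    lat = [-0.125, 0.125]
    lon = [0.125, 0.375]
    data = np.full((2, 2, 2), fill_value, dtype="float32")
    return xr.Dataset(
        {name: (("time", "lat", "lon"), data)},
        coords={"time": time, "lat": lat, "lon": lon},
        # variable_id differs between datasets; source_id is identical.
        attrs={"variable_id": name, "source_id": "SYNTHETIC"},
    )


parts = [
    make_variable_dataset("pr", 1.0),
    make_variable_dataset("tasmax", 300.0),
    make_variable_dataset("tasmin", 280.0),
]

# Same call as in the tutorial: attributes that conflict between inputs are dropped.
cube = xr.combine_by_coords(parts, combine_attrs="drop_conflicts")
print(sorted(cube.data_vars))
print(cube.attrs)
```

# Because all three inputs share identical coordinates, the result is a single dataset with one data variable per input.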


# We'll load `tasmax` for the first time step and plot the result.

# In[9]:


tmax = ds.tasmax.isel(time=0).load()
tmax.plot(size=10);