#!/usr/bin/env python
# coding: utf-8

# ## Accessing ECMWF Open Data – Real Time
# 
# The Planetary Computer includes data from the ECMWF Open Data (real time) program. See the [dataset](https://planetarycomputer.microsoft.com/dataset/ecmwf-forecast) page and [ECMWF Uesr Guide](https://confluence.ecmwf.int/display/UDOC/ECMWF+Open+Data+-+Real+Time) for more.
# 
# Each item in this collection includes metadata about the *stream* (or forecasting system) and *type* that produced that particular dataset. Filter on these values to select the item of interest. For example, we can select data from the `wave` stream, with the `fc` type that are forecast `0h` hours out.

# In[ ]:


import pystac_client
import planetary_computer

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
search = catalog.search(
    collections=["ecmwf-forecast"],
    query={
        "ecmwf:stream": {"eq": "wave"},
        "ecmwf:type": {"eq": "fc"},
        "ecmwf:step": {"eq": "0h"},
    },
)
items = search.get_all_items()
len(items)


# We'll select the most recent item, using the item's datetime.

# In[2]:


item = max(items, key=lambda item: item.datetime)
item


# This STAC item has two assets. One asset is the actual GRIB2 file with the data. The second asset is the "index" file, which contains information about the messages within the GRIB2 file.

# In[3]:


url = item.assets["data"].href
url


# To open the file with xarray, we can download it locally and open it with `cfgrib`.

# In[4]:


import urllib.request
import xarray as xr

filename, _ = urllib.request.urlretrieve(url)

ds = xr.open_dataset(filename, engine="cfgrib")
ds


# We can plot the various data variables, for example the significant height of combined wind waves and swell.

# In[5]:


import matplotlib.pyplot as plt
import cartopy.crs as ccrs

projection = ccrs.Robinson()
fig, ax = plt.subplots(figsize=(16, 9), subplot_kw=dict(projection=projection))

ds.swh.plot(ax=ax, transform=ccrs.PlateCarree());


# Or the joint distribution between the wave period and height.

# In[6]:


import seaborn as sns

grid = sns.jointplot(
    x=ds.mwp.data.ravel(), y=ds.swh.data.ravel(), alpha=0.25, marker=".", height=12
)
grid.ax_joint.set(xlabel="Mean wave period", ylabel="Significant wave height");


# ### GRIB2 files with multiple datasets 
# 
# Some GRIB2 files contain many messages that for separate xarray Datasets. For example, we can find the STAC item containing the atmospheric fields ensemble forecast (`stream=enfo`) from the ensemble forecast model (`type=ef`).

# In[7]:


search = catalog.search(
    collections=["ecmwf-forecast"],
    query={
        "ecmwf:stream": {"eq": "enfo"},
        "ecmwf:type": {"eq": "ef"},
        "ecmwf:step": {"eq": "0h"},
    },
)
items = search.get_all_items()
print(len(items), "matched")
# select the most recent forecast
item = max(items, key=lambda item: item.datetime)


# In[8]:


url = item.assets["data"].href
filename, _ = urllib.request.urlretrieve(url)


# If we provided just the `filename` to `xarray.open_dataset`, we'd get an error from `cfgrib` saying it can't form a valid DataArray from the file. That's because the GRIB2 file contains multiple data variables that don't form a neat hypercube. Provide `filter_by_keys` to indicate which subset of the data to read in.

# In[9]:


ds = xr.open_dataset(
    filename,
    engine="cfgrib",
    filter_by_keys={"dataType": "pf", "typeOfLevel": "surface"},
)
ds


# Now we can plot the skin temperature, `skt`.

# In[ ]:


import matplotlib.pyplot as plt
import cartopy.crs as ccrs

projection = projection = ccrs.Robinson()
fig, ax = plt.subplots(figsize=(16, 9), subplot_kw=dict(projection=projection))

ds.skt[0].plot(ax=ax, transform=ccrs.PlateCarree());


# There are many more streams and forecast types available. See the [catalog](http://planetarycomputer.microsoft.com/dataset/ecmwf-forecast) for more.