ESGF Node at DKRZ: https://esgf-data.dkrz.de/search/cmip5-dkrz/
Using esgf-pyclient: https://esgf-pyclient.readthedocs.io/en/latest/notebooks/examples/search.html
from pyesgf.search import SearchConnection
conn = SearchConnection('http://esgf-data.dkrz.de/esg-search', distrib=False)
Search only CMIP5 files locally available at DKRZ
ctx = conn.new_context(project='CMIP5', data_node='esgf1.dkrz.de,esgf3.dkrz.de', latest=True, replica=False)
ctx.hit_count
Select a single dataset. The variable name is stored separately because it is reused later for filtering.
variable = 'tas'
results = ctx.search(
    institute='MPI-M',
    model='MPI-ESM-MR',
    experiment='historical',
    realm='atmos',
    variable=variable,
    time_frequency='day',
    ensemble='r1i1p1',
)
len(results)
ds = results[0]
ds.json
Get a dataset identifier used by rook
dataset_id = ds.json['instance_id']
dataset_id
Time range
f"{ds.json['datetime_start']}/{ds.json['datetime_stop']}"
Bounding Box: (West, South, East, North)
f"({ds.json['west_degrees']}, {ds.json['south_degrees']}, {ds.json['east_degrees']}, {ds.json['north_degrees']})"
Size in GB
f"{ds.json['size'] / 1024 / 1024 / 1024} GB"
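The three metadata lookups above (time range, bounding box, size) can be gathered in one small helper. This is only a sketch: the function name summarize_metadata is hypothetical, it assumes a plain dict with the same keys as ds.json, and the example values are made up for illustration rather than taken from a real search result.

```python
def summarize_metadata(meta):
    """Collect time range, bounding box and size from a metadata dict.

    Assumes `meta` has the same keys as the `ds.json` record used above.
    """
    time_range = f"{meta['datetime_start']}/{meta['datetime_stop']}"
    # Order follows the notebook: (West, South, East, North)
    bbox = (
        meta['west_degrees'],
        meta['south_degrees'],
        meta['east_degrees'],
        meta['north_degrees'],
    )
    # Bytes divided by 1024**3, the same conversion as in the cell above
    size_gb = meta['size'] / 1024 ** 3
    return {'time_range': time_range, 'bbox': bbox, 'size_gb': size_gb}


# Hypothetical values for illustration only
example = {
    'datetime_start': '1850-01-01T12:00:00Z',
    'datetime_stop': '2005-12-31T12:00:00Z',
    'west_degrees': 0.0,
    'south_degrees': -90.0,
    'east_degrees': 360.0,
    'north_degrees': 90.0,
    'size': 2 * 1024 ** 3,
}
summary = summarize_metadata(example)
print(summary)
```

In the notebook, the helper would be called as summarize_metadata(ds.json).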
import os
os.environ['ROOK_URL'] = 'http://rook.dkrz.de/wps'
os.environ['ROOK_MODE'] = 'async'
from rooki import operators as ops
Run subset workflow
bbox_africa = "-23.906250,-35.746512,63.632813,37.996163"
wf = ops.Subset(
    ops.Input(variable, [f"{dataset_id}.{variable}"]),
    time="1850-01-01/1850-12-31",
    area=bbox_africa,
)
resp = wf.orchestrate()
resp.ok
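Because the area argument is passed as a plain comma-separated string, a typo is easy to miss until the remote job fails. The following is a minimal local sanity check, assuming the "west,south,east,north" order used for bbox_africa. The helper name parse_bbox is hypothetical and not part of rooki.

```python
def parse_bbox(area):
    """Parse a 'west,south,east,north' string into a tuple of four floats.

    Hypothetical helper for local validation; it does not call rook.
    """
    parts = [float(p) for p in area.split(',')]
    if len(parts) != 4:
        raise ValueError(f"expected 4 values, got {len(parts)}: {area!r}")
    west, south, east, north = parts
    # Latitudes must be ordered and lie within [-90, 90].
    if not (-90 <= south <= north <= 90):
        raise ValueError(f"invalid latitude range: {south}, {north}")
    # West/east order is not checked, since a box may cross the antimeridian.
    return west, south, east, north


bbox_africa = "-23.906250,-35.746512,63.632813,37.996163"
print(parse_bbox(bbox_africa))
```

Running the check before wf.orchestrate() avoids submitting a request with an obviously malformed area.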
Metalink URL
resp.url
Number of files
resp.num_files
Total size in MB
resp.size_in_mb
Download URLs
resp.download_urls()
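If the files are fetched manually instead of through resp.datasets(), each URL needs a local target path. The sketch below assumes resp.download_urls() yields a list of URL strings. The helper local_filenames and the example URL are hypothetical.

```python
import os
from urllib.parse import urlparse


def local_filenames(urls, target_dir='.'):
    """Map each download URL to a local path inside `target_dir`.

    The file name is taken from the last component of the URL path.
    """
    return [
        os.path.join(target_dir, os.path.basename(urlparse(url).path))
        for url in urls
    ]


# Hypothetical URL for illustration only
example_urls = ['http://example.org/outputs/tas_subset.nc']
print(local_filenames(example_urls, target_dir='data'))
```

The resulting paths could then be handed to any download tool, for example urllib.request.urlretrieve.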
Download and open with xarray
ds_0 = resp.datasets()[0]
ds_0
Provenance information is recorded using the W3C PROV standard (see the Python prov package: https://pypi.org/project/prov/).
Provenance: URL of the JSON document
resp.provenance()
Provenance Plot
from IPython.display import Image
Image(resp.provenance_image())