ESGF Node at DKRZ: https://esgf-data.dkrz.de/search/cmip6-dkrz/
Using esgf-pyclient: https://esgf-pyclient.readthedocs.io/en/latest/notebooks/examples/search.html
from pyesgf.search import SearchConnection
conn = SearchConnection('http://esgf-data.dkrz.de/esg-search', distrib=False)
Search only CMIP6 files locally available at DKRZ
ctx = conn.new_context(project='CMIP6', data_node='esgf1.dkrz.de,esgf3.dkrz.de', latest=True, replica=False)
ctx.hit_count
Select only one dataset
results = ctx.search(
    institution_id='MPI-M',
    source_id='MPI-ESM1-2-HR',
    experiment_id='historical',
    variable='tas',
    frequency='day',
    variant_label='r1i1p1f1',
)
len(results)
ds = results[0]
ds.json
Get a dataset identifier used by rook
dataset_id = ds.json['instance_id']
dataset_id
Time range
f"{ds.json['datetime_start']}/{ds.json['datetime_stop']}"
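The `time` argument of the subset operator further down uses the same `start/stop` form, but with plain dates. A minimal sketch of that conversion, assuming the metadata values are ISO 8601 timestamps ending in `Z`; the helper name `time_range` and the sample values are illustrative, not part of esgf-pyclient or rooki:

```python
from datetime import datetime


def time_range(meta):
    """Build a 'YYYY-MM-DD/YYYY-MM-DD' string from a dataset metadata dict."""
    def to_date(value):
        # Replace a trailing 'Z' so fromisoformat() also works on older Python versions
        return datetime.fromisoformat(value.replace("Z", "+00:00")).date().isoformat()

    return f"{to_date(meta['datetime_start'])}/{to_date(meta['datetime_stop'])}"


# Illustrative values only, not taken from a real search result
example = {
    "datetime_start": "1850-01-01T12:00:00Z",
    "datetime_stop": "2014-12-31T12:00:00Z",
}
print(time_range(example))
```

With a real search result, `time_range(ds.json)` would be the equivalent call, provided both keys are present.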
Bounding Box: (West, South, East, North)
f"({ds.json['west_degrees']}, {ds.json['south_degrees']}, {ds.json['east_degrees']}, {ds.json['north_degrees']})"
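The subset step later in this notebook passes the area as a comma-separated `west,south,east,north` string. A small sketch of building such a string from the metadata keys used above, with a basic latitude check; the helper name `bbox_string` and the sample values are hypothetical:

```python
def bbox_string(meta):
    """Return 'west,south,east,north' from a dataset metadata dict."""
    keys = ("west_degrees", "south_degrees", "east_degrees", "north_degrees")
    west, south, east, north = (float(meta[key]) for key in keys)
    # Only latitudes are checked; longitude conventions (0..360 vs -180..180) vary
    if not -90 <= south <= north <= 90:
        raise ValueError(f"invalid latitude range: {south}..{north}")
    return f"{west},{south},{east},{north}"


# Illustrative values only, not taken from a real search result
example_bbox = {
    "west_degrees": 0.0,
    "south_degrees": -89.28,
    "east_degrees": 359.06,
    "north_degrees": 89.28,
}
print(bbox_string(example_bbox))
```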
Size in GB
f"{ds.json['size'] / 1024 / 1024 / 1024} GB"
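Dividing by 1024 three times yields binary gigabytes (GiB), and the raw result carries many decimal places. A minimal sketch of a formatter that picks a suitable unit and rounds to two decimals, assuming `size` is a byte count; the helper name `format_size` is hypothetical:

```python
def format_size(num_bytes):
    """Format a byte count with binary units (1 KiB = 1024 B)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024, or at the largest unit
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024


print(format_size(1536))          # 1.50 KiB
print(format_size(5 * 1024**3))   # 5.00 GiB
```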
import os
os.environ['ROOK_URL'] = 'http://rook.dkrz.de/wps'
os.environ['ROOK_MODE'] = 'async'
from rooki import operators as ops
Run subset workflow
bbox_africa = "-23.906250,-35.746512,63.632813,37.996163"
wf = ops.Subset(
    ops.Input(
        'tas', [dataset_id]
    ),
    time="1850-01-01/1850-12-31",
    area=bbox_africa,
)
resp = wf.orchestrate()
resp.ok
Metalink URL
resp.url
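A Metalink document is an XML file listing the output files and their download locations. The sketch below parses an inline sample with the standard library, assuming the Metalink 4 layout (RFC 5854, namespace `urn:ietf:params:xml:ns:metalink`). The sample document, its `example.org` URL, and the helper name `list_files` are made up for illustration; the document actually returned by the service may differ.

```python
import xml.etree.ElementTree as ET

# Namespace used by Metalink 4 (RFC 5854)
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}

# Hypothetical sample document, not a real service response
SAMPLE = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="example_subset.nc">
    <size>1048576</size>
    <url>https://example.org/outputs/example_subset.nc</url>
  </file>
</metalink>
"""


def list_files(metalink_xml):
    """Return a list of dicts with name, size in bytes, and URLs per file."""
    root = ET.fromstring(metalink_xml)
    files = []
    for node in root.findall("ml:file", NS):
        files.append({
            "name": node.get("name"),
            "size": int(node.findtext("ml:size", default="0", namespaces=NS)),
            "urls": [url.text.strip() for url in node.findall("ml:url", NS)],
        })
    return files


for entry in list_files(SAMPLE):
    print(entry["name"], entry["size"], entry["urls"])
```

In the notebook itself this is not needed, since the response object already exposes the file count, total size, and download URLs directly.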
Number of files
resp.num_files
Total size in MB
resp.size_in_mb
Download URLs
resp.download_urls()
Download and open with xarray
ds_0 = resp.datasets()[0]
ds_0
Provenance information is recorded following the W3C PROV standard. A Python implementation is available at https://pypi.org/project/prov/
Provenance: URL to the JSON document
resp.provenance()
Provenance Plot
from IPython.display import Image
Image(resp.provenance_image())