Here we use the fsspec ReferenceFileSystem to read data from 300 National Water Model (NWM) forecast NetCDF files in the AWS Public Dataset as a single virtual dataset. To construct the best time series, we use tau=0 from each past forecast plus the latest forecast. The only new file we created is the JSON file that points to the original NetCDF files. We read that JSON with the Zarr library.
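The best-time-series selection described above can be sketched in plain Python. This is an illustration only, not the code used to build the JSON: the function names `best_time_series` and `valid_time` are hypothetical, and it assumes that "tau=0" corresponds to the first output of each cycle (lead hour 1) and that each cycle has 18 hourly outputs, as the `model_total_valid_times: 18` entry in the catalog metadata suggests.

```python
from datetime import datetime, timedelta

def best_time_series(cycles, first_lead=1, n_leads=18):
    """Pick (init_time, lead_hour) pairs for a best time series.

    Every past cycle contributes only its first output; the latest
    cycle contributes all of its outputs. `first_lead` and `n_leads`
    are assumptions, not values confirmed by the notebook's author.
    """
    cycles = sorted(cycles)
    picks = [(init, first_lead) for init in cycles[:-1]]
    last = cycles[-1]
    picks += [(last, lead) for lead in range(first_lead, first_lead + n_leads)]
    return picks

def valid_time(init, lead):
    """Valid time of an output: initialization time plus lead hours."""
    return init + timedelta(hours=lead)

# Three consecutive hourly cycles, starting at the initialization time
# reported in the catalog metadata (2021-07-09 22:00).
cycles = [datetime(2021, 7, 9, 22) + timedelta(hours=h) for h in range(3)]
picks = best_time_series(cycles)
times = [valid_time(init, lead) for init, lead in picks]
```

With hourly cycles, the selected valid times form a gap-free hourly sequence: one step from each past cycle, followed by the full latest forecast.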
import fsspec
import xarray as xr
import json
import intake
import hvplot.xarray
from dask.distributed import Client
client = Client()
client
Client: Client-d3c6ed91-eb2f-11eb-87f9-866b929f3eda
Connection method: Cluster object | Cluster type: LocalCluster
Dashboard: http://127.0.0.1:8787/status

LocalCluster 9df726f9
Status: running | Using processes: True
Workers: 4 | Total threads: 4 | Total memory: 7.00 GiB

Scheduler: Scheduler-98506d3e-8ca6-4612-ac0f-87706ed9ab44
Comm: tcp://127.0.0.1:45887 | Started: Just now

Workers (each: Total threads: 1 | Memory: 1.75 GiB):
Comm: tcp://127.0.0.1:40837 | Dashboard: http://127.0.0.1:34083/status | Nanny: tcp://127.0.0.1:37085
Local directory: /home/shared/users/rsignell/notebooks/NWM/dask-worker-space/worker-oev7tcdk
Comm: tcp://127.0.0.1:38157 | Dashboard: http://127.0.0.1:38067/status | Nanny: tcp://127.0.0.1:36763
Local directory: /home/shared/users/rsignell/notebooks/NWM/dask-worker-space/worker-pz9n07nb
Comm: tcp://127.0.0.1:39359 | Dashboard: http://127.0.0.1:42049/status | Nanny: tcp://127.0.0.1:43469
Local directory: /home/shared/users/rsignell/notebooks/NWM/dask-worker-space/worker-494hnb5v
Comm: tcp://127.0.0.1:34085 | Dashboard: http://127.0.0.1:34643/status | Nanny: tcp://127.0.0.1:34419
Local directory: /home/shared/users/rsignell/notebooks/NWM/dask-worker-space/worker-s29u13l3
%%time
cat = intake.open_catalog('s3://esip-qhub/usgs/nwm_intake.yml',
storage_options={"requester_pays": True})
CPU times: user 176 ms, sys: 46 ms, total: 222 ms
Wall time: 1.5 s
list(cat)
['nwm-rfs', 'nwm-forecast']
ds = cat['nwm-forecast'].to_dask()
ds.streamflow
<xarray.DataArray 'streamflow' (time: 300, feature_id: 2776738)>
dask.array<open_dataset-08570eeab2de47aca18174cf56c6b308streamflow, shape=(300, 2776738), dtype=float64, chunksize=(1, 925580), chunktype=numpy.ndarray>
Coordinates:
  * feature_id  (feature_id) float64 101.0 179.0 181.0 ... 1.18e+09 1.18e+09
  * time        (time) datetime64[ns] 2021-07-09T23:00:00 ... 2021-07-23T10:0...
Attributes:
    _Netcdf4Dimid:  0
    coordinates:    latitude longitude
    grid_mapping:   crs
    long_name:      River Flow
    units:          m3 s-1
    valid_range:    [0, 5000000]
array([1.010000e+02, 1.790000e+02, 1.810000e+02, ..., 1.180002e+09, 1.180002e+09, 1.180002e+09])
array(['2021-07-09T23:00:00.000000000', '2021-07-10T00:00:00.000000000', '2021-07-10T01:00:00.000000000', ..., '2021-07-23T08:00:00.000000000', '2021-07-23T09:00:00.000000000', '2021-07-23T10:00:00.000000000'], dtype='datetime64[ns]')
%%time
imax = ds['streamflow'].sel(time='2021-07-22 00:00:00', method='nearest').argmax().values
CPU times: user 114 ms, sys: 22.2 ms, total: 137 ms
Wall time: 4.78 s
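The cell above selects the time step nearest to a target time and then finds the position of the largest streamflow along `feature_id`. A minimal NumPy sketch of the same two steps on a tiny synthetic array is shown here; the flow values, the four feature IDs, and the target time are made up for illustration and are not NWM data.

```python
import numpy as np

# Synthetic stand-ins for the time and feature_id coordinates.
times = np.array(
    ["2021-07-21T23:00", "2021-07-22T00:00", "2021-07-22T01:00"],
    dtype="datetime64[ns]",
)
feature_id = np.array([101, 179, 181, 183])

# Synthetic streamflow with shape (time, feature_id).
flow = np.array(
    [
        [1.0, 5.0, 2.0, 0.5],
        [0.8, 4.0, 9.0, 0.4],
        [0.7, 3.0, 6.0, 0.3],
    ]
)

# Step 1: nearest-time lookup, analogous to .sel(time=..., method='nearest').
target = np.datetime64("2021-07-22T00:20")
it = int(np.abs(times - target).argmin())

# Step 2: position of the largest flow at that time step.
imax = int(flow[it].argmax())

# The full time series at that position, analogous to ds.streamflow[:, imax].
series = flow[:, imax]
print(feature_id[imax], series)
```

Note that `imax` is a positional index, not a `feature_id` value; indexing the coordinate array with it recovers the corresponding ID.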
%%time
ds.streamflow[:,imax].hvplot(x='time', grid=True)
CPU times: user 957 ms, sys: 101 ms, total: 1.06 s
Wall time: 15.9 s
print(cat.text)
sources:
  nwm-rfs:
    driver: intake_xarray.xzarr.ZarrSource
    description: 'National Water Model Reanalysis'
    args:
      urlpath: "reference://"
      storage_options:
        target_options:
          requester_pays: true
        target_protocol: s3
        fo: 's3://coawst-public/testing/nwm.json'
        remote_options:
          anon: true
        remote_protocol: s3
  nwm-forecast:
    driver: intake_xarray.xzarr.ZarrSource
    description: 'National Water Model Forecast Best Time Series'
    args:
      urlpath: "reference://"
      storage_options:
        target_options:
          requester_pays: true
        target_protocol: s3
        fo: 's3://esip-qhub/usgs/forecast/nwm.json'
        remote_options:
          anon: true
        remote_protocol: s3
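The `storage_options` in the catalog entry can also be passed to fsspec directly, without intake. The following is a sketch under assumptions, not the notebook's own method: the helper names `reference_storage_options` and `open_reference_dataset` are hypothetical, the option values are copied from the `nwm-forecast` entry above, and the `consolidated=False` setting assumes the reference JSON carries no consolidated Zarr metadata. Behavior may differ across fsspec, xarray, and zarr versions.

```python
def reference_storage_options(fo):
    """Build ReferenceFileSystem options mirroring the catalog entry.

    `fo` is the reference JSON; target_* options describe how to read
    that JSON, remote_* options describe how to read the NetCDF bytes.
    """
    return {
        "fo": fo,
        "target_protocol": "s3",
        "target_options": {"requester_pays": True},
        "remote_protocol": "s3",
        "remote_options": {"anon": True},
    }

def open_reference_dataset(fo):
    """Open the virtual dataset lazily (requires network and S3 access)."""
    # Imported here so the option-building helper works without these packages.
    import fsspec
    import xarray as xr

    mapper = fsspec.get_mapper("reference://", **reference_storage_options(fo))
    return xr.open_dataset(
        mapper,
        engine="zarr",
        chunks={},  # keep variables as lazy dask arrays
        backend_kwargs={"consolidated": False},  # assumption, see lead-in
    )

opts = reference_storage_options("s3://esip-qhub/usgs/forecast/nwm.json")
# ds = open_reference_dataset("s3://esip-qhub/usgs/forecast/nwm.json")
```

The final call is left commented out because it needs credentials for the requester-pays bucket that holds the JSON.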
cat['nwm-forecast']
nwm-forecast:
  args:
    storage_options:
      fo: s3://esip-qhub/usgs/forecast/nwm.json
      remote_options:
        anon: true
      remote_protocol: s3
      target_options:
        requester_pays: true
      target_protocol: s3
    urlpath: reference://
  description: National Water Model Forecast Best Time Series
  driver: intake_xarray.xzarr.ZarrSource
  metadata:
    Conventions: CF-1.6
    NWM_version_number: v2.1
    TITLE: OUTPUT FROM NWM v2.1
    _NCProperties: version=2,netcdf=4.7.4,hdf5=1.10.6,
    catalog_dir: s3://esip-qhub/usgs
    cdm_datatype: Station
    code_version: v5.2.0-beta2
    coords: !!python/tuple
    - feature_id
    - time
    data_vars:
      nudge:
      - feature_id
      - time
      qBtmVertRunoff:
      - feature_id
      - time
      qBucket:
      - feature_id
      - time
      qSfcLatRunoff:
      - feature_id
      - time
      streamflow:
      - feature_id
      - time
      velocity:
      - feature_id
      - time
    dev: dev_ prefix indicates development/internal meta data
    dev_NOAH_TIMESTEP: 3600
    dev_OVRTSWCRT: 1
    dev_channelBucket_only: 0
    dev_channel_only: 0
    dims:
      feature_id: 2776738
      time: 300
    featureType: timeSeries
    model_configuration: short_range
    model_initialization_time: 2021-07-09_22:00:00
    model_output_type: channel_rt
    model_output_valid_time: 2021-07-09_23:00:00
    model_total_valid_times: 18
    proj4: +proj=lcc +units=m +a=6370000.0 +b=6370000.0 +lat_1=30.0 +lat_2=60.0 +lat_0=40.0 +lon_0=-97.0 +x_0=0 +y_0=0 +k_0=1.0 +nadgrids=@
    station_dimension: feature_id
    stream_order_output: 1