Read NWM Forecast "Best Time Series"

Here we use the fsspec ReferenceFileSystem to read 300 National Water Model (NWM) forecast NetCDF files from the AWS Public Dataset as a single virtual dataset. To construct the "best time series" virtual dataset, we took the tau=0 time step from each past forecast and added the latest forecast. The only new file we created is the JSON file that points to the original NetCDF files; the data are read through that JSON with the Zarr library.
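The notebook does not show how the 300 files were chosen, so the sketch below is only one plausible reading of the description above: take a single lead time from every past forecast cycle and every lead time from the latest cycle. The function name `best_time_series`, the hourly cycles, and the 3-hour horizon are illustrative assumptions, not values taken from the NWM archive.

```python
from datetime import datetime, timedelta

def best_time_series(cycles, horizon_hours, first_lead=0):
    """Pick (cycle, lead_hours) pairs for a 'best time series'.

    Past cycles contribute only their first lead time (tau = first_lead);
    the latest cycle contributes every lead time up to horizon_hours.
    """
    *past, latest = sorted(cycles)
    picks = [(cycle, first_lead) for cycle in past]
    picks += [(latest, lead) for lead in range(first_lead, horizon_hours + 1)]
    return picks

# Hypothetical example: four hourly cycles and a 3-hour forecast horizon.
start = datetime(2021, 7, 23, 0)
cycles = [start + timedelta(hours=h) for h in range(4)]
picks = best_time_series(cycles, horizon_hours=3)

# Valid time = cycle start + lead time.
valid_times = [cycle + timedelta(hours=lead) for cycle, lead in picks]
for t in valid_times:
    print(t.isoformat())
```

With hourly cycles this yields one valid time per hour with no overlap: the most recent analysis where one exists, and forecast values only beyond the latest cycle.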

In [1]:

import fsspec
import xarray as xr
import json
import intake
import hvplot.xarray

In [2]:

from dask.distributed import Client

In [3]:

client = Client()

In [4]:

client

Out[4]:

Client

Client-d3c6ed91-eb2f-11eb-87f9-866b929f3eda

Connection method: Cluster object	Cluster type: LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster Info

LocalCluster

9df726f9

Status: running	Using processes: True
Dashboard: http://127.0.0.1:8787/status	Workers: 4
Total threads: 4	Total memory: 7.00 GiB

Scheduler Info

Scheduler

Scheduler-98506d3e-8ca6-4612-ac0f-87706ed9ab44

Comm: tcp://127.0.0.1:45887	Workers: 4
Dashboard: http://127.0.0.1:8787/status	Total threads: 4
Started: Just now	Total memory: 7.00 GiB

Workers

Worker: 0

Comm: tcp://127.0.0.1:40837	Total threads: 1
Dashboard: http://127.0.0.1:34083/status	Memory: 1.75 GiB
Nanny: tcp://127.0.0.1:37085
Local directory: /home/shared/users/rsignell/notebooks/NWM/dask-worker-space/worker-oev7tcdk

Worker: 1

Comm: tcp://127.0.0.1:38157	Total threads: 1
Dashboard: http://127.0.0.1:38067/status	Memory: 1.75 GiB
Nanny: tcp://127.0.0.1:36763
Local directory: /home/shared/users/rsignell/notebooks/NWM/dask-worker-space/worker-pz9n07nb

Worker: 2

Comm: tcp://127.0.0.1:39359	Total threads: 1
Dashboard: http://127.0.0.1:42049/status	Memory: 1.75 GiB
Nanny: tcp://127.0.0.1:43469
Local directory: /home/shared/users/rsignell/notebooks/NWM/dask-worker-space/worker-494hnb5v

Worker: 3

Comm: tcp://127.0.0.1:34085	Total threads: 1
Dashboard: http://127.0.0.1:34643/status	Memory: 1.75 GiB
Nanny: tcp://127.0.0.1:34419
Local directory: /home/shared/users/rsignell/notebooks/NWM/dask-worker-space/worker-s29u13l3

Load the consolidated JSON for the last 300 time steps using an Intake catalog

In [5]:

%%time
cat = intake.open_catalog('s3://esip-qhub/usgs/nwm_intake.yml', 
                          storage_options={"requester_pays": True})

CPU times: user 176 ms, sys: 46 ms, total: 222 ms
Wall time: 1.5 s
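For orientation, the consolidated JSON is a reference file in the format read by fsspec's ReferenceFileSystem: a mapping from Zarr keys to either inline metadata or a `[url, offset, length]` triple that locates a chunk's bytes inside an original NetCDF file. The toy builder below mimics that layout for a single variable. The bucket name, byte offsets, and lengths are made up, and a real reference file also carries `.zarray` and `.zattrs` entries for every variable.

```python
import json

def build_refs(file_urls, offset, length, n_feature_chunks=3):
    """Build a toy version-1 reference dict with one time step per file."""
    # Metadata entries are stored inline as JSON strings.
    refs = {".zgroup": json.dumps({"zarr_format": 2})}
    for t, url in enumerate(file_urls):
        for c in range(n_feature_chunks):
            # Zarr chunk key "variable/i.j" -> [url, byte offset, byte length]
            refs[f"streamflow/{t}.{c}"] = [url, offset + c * length, length]
    return {"version": 1, "refs": refs}

# Hypothetical file names; offsets and lengths are placeholders.
urls = [f"s3://example-bucket/forecast_{i:03d}.nc" for i in range(300)]
reference = build_refs(urls, offset=4096, length=1000)

chunk_keys = [k for k in reference["refs"] if k.startswith("streamflow/")]
print(len(chunk_keys), "chunk references")
print(json.dumps(reference["refs"]["streamflow/1.2"]))
```

With 300 files and three chunks along `feature_id`, the toy file holds 900 chunk references, the same chunk count xarray reports for `streamflow` in this notebook.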

In [6]:

list(cat)

Out[6]:

['nwm-rfs', 'nwm-forecast']

In [7]:

ds = cat['nwm-forecast'].to_dask()

ds.streamflow

Out[7]:

<xarray.DataArray 'streamflow' (time: 300, feature_id: 2776738)>
dask.array<open_dataset-08570eeab2de47aca18174cf56c6b308streamflow, shape=(300, 2776738), dtype=float64, chunksize=(1, 925580), chunktype=numpy.ndarray>
Coordinates:
  * feature_id  (feature_id) float64 101.0 179.0 181.0 ... 1.18e+09 1.18e+09
  * time        (time) datetime64[ns] 2021-07-09T23:00:00 ... 2021-07-23T10:0...
Attributes:
    _Netcdf4Dimid:  0
    coordinates:    latitude longitude
    grid_mapping:   crs
    long_name:      River Flow
    units:          m3 s-1
    valid_range:    [0, 5000000]

xarray.DataArray 'streamflow' (time: 300, feature_id: 2776738)

dask.array<chunksize=(1, 925580), meta=np.ndarray>

        Array            Chunk
Bytes   6.21 GiB         7.06 MiB
Shape   (300, 2776738)   (1, 925580)
Count   901 Tasks        900 Chunks
Type    float64          numpy.ndarray

Coordinates: (2)
- feature_id  (feature_id)  float64  101.0 179.0 ... 1.18e+09 1.18e+09
    NAME: feature_id
    _Netcdf4Dimid: 0
    cf_role: timeseries_id
    comment: NHDPlusv2 ComIDs within CONUS, arbitrary Reach IDs outside of CONUS
    long_name: Reach ID
```
array([1.010000e+02, 1.790000e+02, 1.810000e+02, ..., 1.180002e+09,
       1.180002e+09, 1.180002e+09])
```
- time  (time)  datetime64[ns]  2021-07-09T23:00:00 ... 2021-07-...
    NAME: time
    _Netcdf4Dimid: 1
    long_name: valid output time
    standard_name: time
    valid_max: 27098880
    valid_min: 27097860
```
array(['2021-07-09T23:00:00.000000000', '2021-07-10T00:00:00.000000000',
       '2021-07-10T01:00:00.000000000', ..., '2021-07-23T08:00:00.000000000',
       '2021-07-23T09:00:00.000000000', '2021-07-23T10:00:00.000000000'],
      dtype='datetime64[ns]')
```
Attributes: (6)
    _Netcdf4Dimid: 0
    coordinates: latitude longitude
    grid_mapping: crs
    long_name: River Flow
    units: m3 s-1
    valid_range: [0, 5000000]

Find the site with the largest streamflow at the time step nearest 2021-07-22 00:00:

In [8]:

%%time
imax = ds['streamflow'].sel(time='2021-07-22 00:00:00', method='nearest').argmax().values

CPU times: user 114 ms, sys: 22.2 ms, total: 137 ms
Wall time: 4.78 s
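The chunk layout explains much of the cost of a selection. With chunks of shape (1, 925580), a single time step spans only three chunks, whereas a full time series at one reach touches one chunk per time step. The arithmetic below is a rough sketch using the shape, chunk size, and dtype from the `streamflow` repr; `chunks_touched` is a hypothetical helper, and the byte figures are for uncompressed data, so actual transfer sizes are likely smaller.

```python
import math

def chunks_touched(shape, chunks, selection):
    """Count chunks read for a selection.

    selection holds one entry per dimension: None selects the whole
    dimension, an integer selects a single index along it.
    """
    n = 1
    for size, chunk, sel in zip(shape, chunks, selection):
        n *= math.ceil(size / chunk) if sel is None else 1
    return n

shape = (300, 2776738)   # (time, feature_id), from the repr
chunks = (1, 925580)     # chunk shape, from the repr
itemsize = 8             # float64

chunk_mib = chunks[0] * chunks[1] * itemsize / 2**20

one_time_step = chunks_touched(shape, chunks, (0, None))  # all reaches, one time
one_site = chunks_touched(shape, chunks, (None, 0))       # all times, one reach

print(f"chunk size: {chunk_mib:.2f} MiB")
print(f"one time step: {one_time_step} chunks, {one_time_step * chunk_mib:.1f} MiB")
print(f"one site:      {one_site} chunks, {one_site * chunk_mib:.1f} MiB")
```

A time series at a single reach therefore decodes about a hundred times more data than the single-time-step `argmax`, even though it returns only 300 values.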

Plot the "best time series" from that location:

In [9]:

%%time
ds.streamflow[:,imax].hvplot(x='time', grid=True)

CPU times: user 957 ms, sys: 101 ms, total: 1.06 s
Wall time: 15.9 s

Out[9]:

[Interactive hvplot line plot: streamflow versus time at the selected reach]

What does this magical intake catalog look like?

In [10]:

print(cat.text)

sources:
  nwm-rfs:
    driver: intake_xarray.xzarr.ZarrSource
    description: 'National Water Model Reanalysis'
    args:
      urlpath: "reference://"
      storage_options:
        target_options:
          requester_pays: true
        target_protocol: s3
        fo: 's3://coawst-public/testing/nwm.json'
        remote_options:
          anon: true
        remote_protocol: s3
  nwm-forecast:
    driver: intake_xarray.xzarr.ZarrSource
    description: 'National Water Model Forecast Best Time Series'
    args:
      urlpath: "reference://"
      storage_options:
        target_options:
          requester_pays: true
        target_protocol: s3
        fo: 's3://esip-qhub/usgs/forecast/nwm.json'
        remote_options:
          anon: true
        remote_protocol: s3
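The `storage_options` in each catalog entry are the arguments that fsspec's reference filesystem needs, so the same dataset can in principle be opened without Intake. The sketch below assumes xarray with the zarr engine, fsspec, and s3fs are installed and that AWS credentials are available for the requester-pays bucket. The helper names are ours, and the exact keyword layout may vary with library versions.

```python
def reference_storage_options(fo):
    """Mirror the storage_options block of the catalog entry."""
    return {
        "fo": fo,                                    # the reference JSON
        "target_protocol": "s3",                     # where the JSON lives
        "target_options": {"requester_pays": True},
        "remote_protocol": "s3",                     # where the NetCDF files live
        "remote_options": {"anon": True},
    }

def open_reference_dataset(fo):
    """Open the virtual dataset lazily; needs network access and credentials."""
    import xarray as xr  # imported here so the helper above has no dependencies

    return xr.open_dataset(
        "reference://",
        engine="zarr",
        chunks={},  # keep variables as lazy dask arrays
        backend_kwargs={
            "consolidated": False,
            "storage_options": reference_storage_options(fo),
        },
    )

options = reference_storage_options("s3://esip-qhub/usgs/forecast/nwm.json")
print(options)

# Not run here because it requires S3 access:
# ds = open_reference_dataset("s3://esip-qhub/usgs/forecast/nwm.json")
```

The catalog remains the more convenient route, since it keeps these options in one shared place instead of in every notebook.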

In [11]:

cat['nwm-forecast']

nwm-forecast:
  args:
    storage_options:
      fo: s3://esip-qhub/usgs/forecast/nwm.json
      remote_options:
        anon: true
      remote_protocol: s3
      target_options:
        requester_pays: true
      target_protocol: s3
    urlpath: reference://
  description: National Water Model Forecast Best Time Series
  driver: intake_xarray.xzarr.ZarrSource
  metadata:
    Conventions: CF-1.6
    NWM_version_number: v2.1
    TITLE: OUTPUT FROM NWM v2.1
    _NCProperties: version=2,netcdf=4.7.4,hdf5=1.10.6,
    catalog_dir: s3://esip-qhub/usgs
    cdm_datatype: Station
    code_version: v5.2.0-beta2
    coords: !!python/tuple
    - feature_id
    - time
    data_vars:
      nudge:
      - feature_id
      - time
      qBtmVertRunoff:
      - feature_id
      - time
      qBucket:
      - feature_id
      - time
      qSfcLatRunoff:
      - feature_id
      - time
      streamflow:
      - feature_id
      - time
      velocity:
      - feature_id
      - time
    dev: dev_ prefix indicates development/internal meta data
    dev_NOAH_TIMESTEP: 3600
    dev_OVRTSWCRT: 1
    dev_channelBucket_only: 0
    dev_channel_only: 0
    dims:
      feature_id: 2776738
      time: 300
    featureType: timeSeries
    model_configuration: short_range
    model_initialization_time: 2021-07-09_22:00:00
    model_output_type: channel_rt
    model_output_valid_time: 2021-07-09_23:00:00
    model_total_valid_times: 18
    proj4: +proj=lcc +units=m +a=6370000.0 +b=6370000.0 +lat_1=30.0 +lat_2=60.0 +lat_0=40.0
      +lon_0=-97.0 +x_0=0 +y_0=0 +k_0=1.0 +nadgrids=@
    station_dimension: feature_id
    stream_order_output: 1
