Just as when opening from a local filesystem, when opening from S3 xarray reads only the metadata up front and loads the data lazily.
We demonstrate this here by opening a 25 GB NetCDF file stored on S3 in a matter of seconds, with memory usage below 1 GB.
import fsspec
import xarray as xr
fs = fsspec.filesystem('s3', anon=True, client_kwargs=dict(endpoint_url='https://ncsa.osn.xsede.org'))  # anonymous access to a custom S3 endpoint
ncfile_on_s3 = 's3://esip/examples/adcirc/adcirc_01.nc'
fs.size(ncfile_on_s3) / 1e9  # file size in GB
26.140007264
%%time
ds = xr.open_dataset(fs.open(ncfile_on_s3), chunks={'time':10, 'node':141973})
CPU times: user 3.27 s, sys: 312 ms, total: 3.58 s
Wall time: 9.4 s
ds.zeta
<xarray.DataArray 'zeta' (time: 720, node: 9228245)>
dask.array<open_dataset-247d281a3fb8e3c3d9504cca1b0c84eczeta, shape=(720, 9228245), dtype=float64, chunksize=(10, 141973), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 2031-08-03T02:10:00 ... 2031-08-08T02:00:00
    x        (node) float64 dask.array<chunksize=(141973,), meta=np.ndarray>
    y        (node) float64 dask.array<chunksize=(141973,), meta=np.ndarray>
Dimensions without coordinates: node
Attributes:
    location:       node
    long_name:      water surface elevation above geoid
    mesh:           adcirc_mesh
    standard_name:  sea_surface_height_above_geoid
    units:          m
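To get a feel for what the `chunks` argument implies, the chunk grid can be worked out by hand from the shape and chunksize shown in the repr above. The following is only a back-of-the-envelope sketch in plain Python; `chunk_layout` is a hypothetical helper written for this illustration, not part of xarray or Dask, and it assumes 8 bytes per value (float64, as reported in the repr).

```python
import math

def chunk_layout(shape, chunks, itemsize):
    """Hypothetical helper: summarize a regular chunk grid.

    Returns (chunks per dimension, total number of chunks,
    bytes in one full chunk, bytes in the whole decoded array).
    """
    per_dim = tuple(math.ceil(n / c) for n, c in zip(shape, chunks))
    n_chunks = math.prod(per_dim)
    chunk_bytes = math.prod(chunks) * itemsize
    total_bytes = math.prod(shape) * itemsize
    return per_dim, n_chunks, chunk_bytes, total_bytes

# Shape and chunksize copied from the repr; itemsize=8 assumes float64.
per_dim, n_chunks, chunk_bytes, total_bytes = chunk_layout(
    shape=(720, 9228245), chunks=(10, 141973), itemsize=8
)
print(per_dim, n_chunks)        # (72, 65) 4680
print(chunk_bytes / 1e6, "MB")  # about 11.4 MB per chunk
print(total_bytes / 1e9, "GB")  # about 53.2 GB for the decoded variable
```

Each chunk is only about 11 MB in memory, which is consistent with peak memory staying small as long as only a few chunks are held at once. The decoded in-memory size computed here refers to the float64 array and need not match the size of the file on S3.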
The data is loaded lazily, chunk by chunk, and Dask processes the chunks in parallel using the available cores.
da = ds.zeta[:30,:]
da
<xarray.DataArray 'zeta' (time: 30, node: 9228245)>
dask.array<getitem, shape=(30, 9228245), dtype=float64, chunksize=(10, 141973), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 2031-08-03T02:10:00 ... 2031-08-03T07:00:00
    x        (node) float64 dask.array<chunksize=(141973,), meta=np.ndarray>
    y        (node) float64 dask.array<chunksize=(141973,), meta=np.ndarray>
Dimensions without coordinates: node
Attributes:
    location:       node
    long_name:      water surface elevation above geoid
    mesh:           adcirc_mesh
    standard_name:  sea_surface_height_above_geoid
    units:          m
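Slicing `ds.zeta[:30,:]` does not read any data; it only narrows down which chunks a later computation will need. As a rough illustration, the sketch below works out which chunks overlap a slice along one dimension. `chunks_touched` is a hypothetical helper for this example, and it assumes uniformly sized chunks as in the repr above.

```python
def chunks_touched(start, stop, chunk_size):
    """Hypothetical helper: indices of the chunks along one dimension
    that overlap the half-open index range [start, stop)."""
    if stop <= start:
        return range(0)
    first = start // chunk_size
    last = (stop - 1) // chunk_size
    return range(first, last + 1)

# ds.zeta[:30, :] with chunks of 10 along time and 141973 along node.
time_chunks = chunks_touched(0, 30, 10)
node_chunks = chunks_touched(0, 9228245, 141973)

print(list(time_chunks))                    # [0, 1, 2]
print(len(time_chunks) * len(node_chunks))  # 195
```

Under these assumptions, the 30-step subset touches 195 of the variable's chunks, and only those would need to be fetched from S3 when a computation is triggered.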
%%time
da.mean(dim='time').compute()
CPU times: user 18.7 s, sys: 2.71 s, total: 21.5 s
Wall time: 1min 4s
<xarray.DataArray 'zeta' (node: 9228245)>
array([5.01569067, 5.01422469, 5.01385336, ..., nan, nan, nan])
Coordinates:
    x        (node) float64 -91.79 -91.8 -91.8 -91.8 ... -88.19 -88.19 -88.19
    y        (node) float64 30.98 30.98 30.98 30.98 ... 30.58 30.58 30.58 30.58
Dimensions without coordinates: node
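A mean over time can be computed one chunk at a time because only a running sum and a count per node need to be kept. The sketch below shows that idea in plain Python on made-up toy values. It is not Dask's actual reduction algorithm; `streaming_nanmean` is a hypothetical helper. It skips NaN values, as xarray's `mean` does by default for floating-point data, so a node with no valid values comes out as NaN.

```python
import math

def streaming_nanmean(chunks):
    """Hypothetical helper: mean over time, consuming one time-chunk at a time.

    `chunks` is an iterable of chunks; each chunk is a list of rows,
    one row per time step, one value per node. NaNs are skipped.
    """
    sums = None
    counts = None
    for chunk in chunks:
        for row in chunk:
            if sums is None:
                # Allocate the accumulators once the node count is known.
                sums = [0.0] * len(row)
                counts = [0] * len(row)
            for i, value in enumerate(row):
                if not math.isnan(value):
                    sums[i] += value
                    counts[i] += 1
    if sums is None:
        return []
    return [s / c if c else math.nan for s, c in zip(sums, counts)]

nan = math.nan
# Toy data: 3 time steps split into two chunks, 3 nodes; node 2 is always NaN.
toy_chunks = [
    [[1.0, 2.0, nan], [3.0, nan, nan]],
    [[5.0, 4.0, nan]],
]
result = streaming_nanmean(toy_chunks)
print(result)  # [3.0, 3.0, nan]
```

The accumulators have the size of a single time step, so memory use is independent of how many time steps are averaged. The trailing `nan` entries in the output above are what this kind of reduction produces for nodes that have no valid values in the selected time range.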