The original NetCDF file here (ROMS model output) has float32 variables with _FillValue set to 1e37. xarray (with decode_cf=True) correctly sets these values to NaN.
import fsspec
import xarray as xr
_FillValue is correctly converted to NaN.
fs = fsspec.filesystem('s3', anon=True, client_kwargs={'endpoint_url': 'https://mghp.osn.xsede.org'})
url = 's3://rsignellbucket1/COAWST/coawst_us_20220101_01.nc'
ds = xr.open_dataset(fs.open(url), decode_cf=True)
ds.temp[0,0,0,0].values
array(nan)
ds = xr.open_dataset(fs.open(url), decode_cf=False)
ds.temp[0,0,0,0].values
array(1.e+37, dtype=float32)
ds.temp._FillValue
1e+37
The user sees NaN in xarray, but fill_value (the attribute used to store the fill value in Zarr) is 9.999999933815813e+36 instead of 1.e+37.
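The number 9.999999933815813e+36 is what the nearest float32 to 1e37 looks like once it is widened to a 64-bit float. The sketch below illustrates this with the standard-library `struct` module; the helper `as_float32` is a made-up name for illustration. That the widening happens when the Zarr metadata is written is an assumption here, not something checked against the Zarr source.

```python
import struct

def as_float32(x: float) -> float:
    """Round x to the nearest float32, then widen back to a Python float (float64)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

nominal = 1e37                # _FillValue as written in the NetCDF attributes
stored = as_float32(nominal)  # the closest value a float32 variable can hold

print(repr(stored))                               # 9.999999933815813e+36
print(stored == nominal)                          # False: compared as float64
print(as_float32(stored) == as_float32(nominal))  # True: compared as float32
```

The two numbers are the same value at float32 precision and differ only when compared as float64.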
%%time
ds[['temp','salt']].isel(ocean_time=slice(0,2)).to_zarr('foo.zarr', compute=True, mode='w')
CPU times: user 736 ms, sys: 136 ms, total: 871 ms Wall time: 9.74 s
<xarray.backends.zarr.ZarrStore at 0x7fc955646200>
ds2 = xr.open_dataset('foo.zarr', engine='zarr', decode_cf=False)
ds2.temp._FillValue
1e+37
ds2.temp[0,0,0,0].values
array(1.e+37, dtype=float32)
ds2 = xr.open_dataset('foo.zarr', engine='zarr', decode_cf=True)
ds2.temp[0,0,0,0].values
array(nan)
! cat ./foo.zarr/temp/.zarray
{ "chunks": [ 1, 4, 84, 448 ], "compressor": { "blocksize": 0, "clevel": 5, "cname": "lz4", "id": "blosc", "shuffle": 1 }, "dtype": "<f4", "fill_value": 9.999999933815813e+36, "filters": null, "order": "C", "shape": [ 2, 16, 336, 896 ], "zarr_format": 2 }
Here the user doesn't get NaN values in the masked regions, but a value close to, but not exactly, 1e37.
json_url = 's3://rsignellbucket1/COAWST/jsons/coawst_us_20220101_01.nc.json'
Try with decode_cf=True:
s_opts = dict(skip_instance_cache=True, anon=True, client_kwargs={'endpoint_url': 'https://mghp.osn.xsede.org'}) #json
r_opts = dict(anon=True, client_kwargs={'endpoint_url': 'https://mghp.osn.xsede.org'}) #data
fs = fsspec.filesystem("reference", fo=json_url, ref_storage_args=s_opts,
remote_protocol='s3', remote_options=r_opts)
m = fs.get_mapper("")
ds = xr.open_dataset(m, engine="zarr", chunks={},
backend_kwargs=dict(consolidated=False), decode_cf=True)
ds.temp[0,0,0,0].values
array(9.99999993e+36)
Print with full precision:
format(ds.temp[0,0,0,0].values, '.60g')
'9999999933815812510711506376257961984'
So did these come in as non-NaN values because they differ from fill_value: 9.999999933815813e+36?
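One way to probe that question is to compare the literal numbers printed in this issue as 64-bit floats. This is only a sketch: it uses plain Python floats copied from the outputs above, the variable names are made up for illustration, and it does not reproduce xarray's decoding logic.

```python
# Value read through the reference filesystem, printed at full precision above.
read_back = float("9999999933815812510711506376257961984")
# fill_value / _FillValue as stored in the Zarr and kerchunk metadata.
stored_fill = 9.999999933815813e+36
# _FillValue as reported for the original NetCDF file.
nominal_fill = 1e37

matches_stored = read_back == stored_fill
matches_nominal = read_back == nominal_fill
print(matches_stored, matches_nominal)  # True False
```

If the first comparison holds, the value read back is numerically identical to the stored fill value, which would suggest the missing mask is not explained simply by the two numbers differing.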
Try with decode_cf=False:
ds = xr.open_dataset(m, engine="zarr", chunks={},
backend_kwargs=dict(consolidated=False), decode_cf=False)
ds.temp[0,0,0,0].values
array(1.e+37, dtype=float32)
The kerchunk-generated JSON of course reflects what the Zarr file has:
fs.download('temp/.zattrs', 'foo')
!more foo
{ "_ARRAY_DIMENSIONS": [ "ocean_time", "s_rho", "eta_rho", "xi_rho" ], "_FillValue": 9.999999933815813e+36, "coordinates": "lon_rho lat_rho s_rho ocean_time", "field": "temperature, scalar, series", "grid": "grid", "location": "face", "long_name": "potential temperature", "time": "ocean_time", "units": "Celsius" }