import xarray as xr
niter = 100
In this case, all my ensemble members are named "hydro_ensemble_LHC_X" where X is the member number
path = "/glade/scratch/kdagon/archive/"
PPE = "hydro_ensemble_LHC_"
var = ['FPSN', 'EFLX_LH_TOT']
Here is where we use Unix wildcards (e.g., *) to get all the files in each directory
Note that not all wildcard functionality works the same way as it does in the command line
#full_paths = [path+PPE+str(i+1)+"/lnd/hist/*{001[6-9],20-}*" for i in range(niter)] # specific to years 16-20; NOTE: this wildcard doesn't work with open_mfdataset
full_paths = [path+PPE+str(i+1)+"/lnd/hist/*" for i in range(niter)] # all history files in the folder
full_paths[:10] # look at the first 10 paths
['/glade/scratch/kdagon/archive/hydro_ensemble_LHC_1/lnd/hist/*', '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_2/lnd/hist/*', '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_3/lnd/hist/*', '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_4/lnd/hist/*', '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_5/lnd/hist/*', '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_6/lnd/hist/*', '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_7/lnd/hist/*', '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_8/lnd/hist/*', '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_9/lnd/hist/*', '/glade/scratch/kdagon/archive/hydro_ensemble_LHC_10/lnd/hist/*']
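Since shell-style brace wildcards (like the years 16-20 pattern above) don't work with open_mfdataset, one workaround is to expand a plain wildcard with glob and filter the file list in Python before opening. A minimal sketch, assuming the usual CLM monthly history naming convention `<case>.clm2.h0.YYYY-MM.nc` (the helper name is ours, not part of the original):

```python
import glob
import re

def filter_years(files, y0, y1):
    """Keep only monthly h0 history files whose 4-digit year is in [y0, y1]."""
    keep = []
    for f in files:
        # Assumes filenames end in .h0.YYYY-MM.nc (standard CLM history naming)
        m = re.search(r'\.h0\.(\d{4})-\d{2}\.nc$', f)
        if m and y0 <= int(m.group(1)) <= y1:
            keep.append(f)
    return keep

# Expand the wildcard ourselves, then filter to years 16-20, e.g.:
# files = filter_years(sorted(glob.glob(path + PPE + "1/lnd/hist/*.nc")), 16, 20)
```

The filtered list can then be passed directly to xr.open_mfdataset, which accepts an explicit list of paths as well as a glob string.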
def preprocess(ds):
return ds[var]
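open_mfdataset applies preprocess to each file before combining, so trimming each dataset down to just the variables of interest keeps the merge cheap. A small self-contained illustration with an in-memory stand-in for one history file (the extra variable name here is made up for the example; definitions are repeated so the snippet runs on its own):

```python
import numpy as np
import xarray as xr

var = ['FPSN', 'EFLX_LH_TOT']

def preprocess(ds):
    return ds[var]

# Stand-in for one history file: the two variables we want plus an
# extra one (hypothetical name) that preprocess should drop.
ds = xr.Dataset({
    "FPSN": ("time", np.zeros(3)),
    "EFLX_LH_TOT": ("time", np.zeros(3)),
    "TSOI": ("time", np.zeros(3)),
})

trimmed = preprocess(ds)
```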
%%time
da_model = xr.open_mfdataset(full_paths[0], combine='by_coords', preprocess=preprocess)
CPU times: user 9.21 s, sys: 51.6 ms, total: 9.26 s Wall time: 11.5 s
da_model
<xarray.Dataset> Dimensions: (lat: 46, lon: 72, time: 60) Coordinates: * lon (lon) float32 0.0 5.0 10.0 15.0 ... 340.0 345.0 350.0 355.0 * lat (lat) float32 -90.0 -86.0 -82.0 -78.0 ... 78.0 82.0 86.0 90.0 * time (time) object 0016-02-01 00:00:00 ... 0021-01-01 00:00:00 Data variables: FPSN (time, lat, lon) float32 dask.array<shape=(60, 46, 72), chunksize=(1, 46, 72)> EFLX_LH_TOT (time, lat, lon) float32 dask.array<shape=(60, 46, 72), chunksize=(1, 46, 72)> Attributes: title: CLM History file information comment: NOTE: None of the variables ar... Conventions: CF-1.0 history: created on 05/28/18 20:36:54 source: Community Land Model CLM4.0 hostname: cheyenne username: kdagon version: unknown revision_id: $Id: histFileMod.F90 42903 201... case_title: UNSET case_id: hydro_ensemble_LHC_1 Surface_dataset: surfdata_4x5_16pfts_Irrig_CMIP... Initial_conditions_dataset: finidat_interp_dest.nc PFT_physiological_constants_dataset: hydro_ensemble_LHC_1.nc ltype_vegetated_or_bare_soil: 1 ltype_crop: 2 ltype_UNUSED: 3 ltype_landice_multiple_elevation_classes: 4 ltype_deep_lake: 5 ltype_wetland: 6 ltype_urban_tbd: 7 ltype_urban_hd: 8 ltype_urban_md: 9 ctype_vegetated_or_bare_soil: 1 ctype_crop: 2 ctype_crop_noncompete: 2*100+m, m=cft_lb,cft_ub ctype_landice: 3 ctype_landice_multiple_elevation_classes: 4*100+m, m=1,glcnec ctype_deep_lake: 5 ctype_wetland: 6 ctype_urban_roof: 71 ctype_urban_sunwall: 72 ctype_urban_shadewall: 73 ctype_urban_impervious_road: 74 ctype_urban_pervious_road: 75 cft_c3_crop: 1 cft_c3_irrigated: 2 time_period_freq: month_1 Time_constant_3Dvars_filename: ./hydro_ensemble_LHC_1.clm2.h0... Time_constant_3Dvars: ZSOI:DZSOI:WATSAT:SUCSAT:BSW:H...
Start with the first 10 ensemble members to test functionality
%%time
da_model = [xr.open_mfdataset(p, combine='by_coords', preprocess=preprocess) for p in full_paths[:10]]
CPU times: user 1min 33s, sys: 1.17 s, total: 1min 34s Wall time: 1min 55s
Note: this takes ~2 min to read 10 ensemble members, each with 5 years of monthly history files (4x5 resolution)
Again, start with the first 10 members
ensdim = xr.DataArray(list(range(1,11)), dims='ens', name='ens') # or can use np.arange
ensdim
<xarray.DataArray 'ens' (ens: 10)> array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) Dimensions without coordinates: ens
da_model_concat = xr.concat(da_model, dim=ensdim)
da_model_concat
<xarray.Dataset> Dimensions: (ens: 10, lat: 46, lon: 72, time: 60) Coordinates: * lon (lon) float32 0.0 5.0 10.0 15.0 ... 340.0 345.0 350.0 355.0 * lat (lat) float32 -90.0 -86.0 -82.0 -78.0 ... 78.0 82.0 86.0 90.0 * time (time) object 0016-02-01 00:00:00 ... 0021-01-01 00:00:00 * ens (ens) int64 1 2 3 4 5 6 7 8 9 10 Data variables: FPSN (ens, time, lat, lon) float32 dask.array<shape=(10, 60, 46, 72), chunksize=(1, 1, 46, 72)> EFLX_LH_TOT (ens, time, lat, lon) float32 dask.array<shape=(10, 60, 46, 72), chunksize=(1, 1, 46, 72)> Attributes: title: CLM History file information comment: NOTE: None of the variables ar... Conventions: CF-1.0 history: created on 05/28/18 20:36:54 source: Community Land Model CLM4.0 hostname: cheyenne username: kdagon version: unknown revision_id: $Id: histFileMod.F90 42903 201... case_title: UNSET case_id: hydro_ensemble_LHC_1 Surface_dataset: surfdata_4x5_16pfts_Irrig_CMIP... Initial_conditions_dataset: finidat_interp_dest.nc PFT_physiological_constants_dataset: hydro_ensemble_LHC_1.nc ltype_vegetated_or_bare_soil: 1 ltype_crop: 2 ltype_UNUSED: 3 ltype_landice_multiple_elevation_classes: 4 ltype_deep_lake: 5 ltype_wetland: 6 ltype_urban_tbd: 7 ltype_urban_hd: 8 ltype_urban_md: 9 ctype_vegetated_or_bare_soil: 1 ctype_crop: 2 ctype_crop_noncompete: 2*100+m, m=cft_lb,cft_ub ctype_landice: 3 ctype_landice_multiple_elevation_classes: 4*100+m, m=1,glcnec ctype_deep_lake: 5 ctype_wetland: 6 ctype_urban_roof: 71 ctype_urban_sunwall: 72 ctype_urban_shadewall: 73 ctype_urban_impervious_road: 74 ctype_urban_pervious_road: 75 cft_c3_crop: 1 cft_c3_irrigated: 2 time_period_freq: month_1 Time_constant_3Dvars_filename: ./hydro_ensemble_LHC_1.clm2.h0... Time_constant_3Dvars: ZSOI:DZSOI:WATSAT:SUCSAT:BSW:H...
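With the members stacked along the new 'ens' dimension, ensemble statistics become one-line reductions. A sketch of the same concat-then-reduce pattern using tiny synthetic datasets in place of the real history files (sizes and values are illustrative only):

```python
import numpy as np
import xarray as xr

# Three toy "ensemble members": member i is filled with the constant i.
members = [
    xr.Dataset({"FPSN": (("time", "lat", "lon"),
                         np.full((2, 3, 4), float(i)))})
    for i in (1, 2, 3)
]

# Same pattern as above: concatenate along a new 'ens' coordinate...
ensdim = xr.DataArray([1, 2, 3], dims='ens', name='ens')
ds = xr.concat(members, dim=ensdim)

# ...then reduce across members, e.g. the ensemble mean and spread.
ens_mean = ds["FPSN"].mean(dim="ens")
ens_std = ds["FPSN"].std(dim="ens")
```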
file_path = "/glade/scratch/nanr/forKatie/daily/"
This is 1/4 degree atmosphere daily output, RCP8.5 from 2070-2100
file = "b.e13.BRCP85C5CN.ne120_g16.001.cam.h1.PRECT.20700101-21001231.FV.nc"
You need to know your approximate dimension sizes to select sensible chunk sizes
For example, here I am splitting lat and lon into 2 chunks each (total lat = 768, total lon = 1152)
And I am chunking time (the largest dimension) into chunks of size 100
ds = xr.open_dataset(file_path+file, chunks={'time': 100, 'lat': 384, 'lon': 576})
More about how to choose chunk sizes:
https://docs.dask.org/en/latest/array-best-practices.html
https://examples.dask.org/xarray.html
PRECT = ds.PRECT
PRECT
<xarray.DataArray 'PRECT' (time: 11315, lat: 768, lon: 1152)> dask.array<shape=(11315, 768, 1152), dtype=float32, chunksize=(100, 384, 576)> Coordinates: * lat (lat) float64 -90.0 -89.77 -89.53 -89.3 ... 89.3 89.53 89.77 90.0 * lon (lon) float64 0.0 0.3125 0.625 0.9375 ... 358.8 359.1 359.4 359.7 * time (time) object 2070-01-02 00:00:00 ... 2101-01-01 00:00:00 Attributes: units: m/s long_name: Total (convective and large-scale) precipitation rate (li... cell_methods: time: mean cell_measures: area: area
PRECT.data
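Because the variable is dask-backed, operations on it only build a task graph; nothing is read or computed until you ask for it. A minimal sketch of that laziness using a small in-memory array chunked the same way open_dataset(..., chunks=...) would chunk data read from disk (sizes here are toy values, not the real PRECT dimensions; requires dask):

```python
import numpy as np
import xarray as xr

# Chunk an in-memory stand-in for PRECT into dask blocks.
prect = xr.DataArray(
    np.ones((10, 8, 12), dtype=np.float32),
    dims=("time", "lat", "lon"),
).chunk({"time": 4, "lat": 4, "lon": 6})

# This only builds a lazy dask graph; no computation happens yet.
clim = prect.mean(dim="time")

# .compute() (or .load()) triggers the actual chunked computation.
result = clim.compute()
```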