All of the climate data used by the Analytics Engine is stored in a publicly accessible AWS S3 bucket. If you are familiar with Python, you can easily access the data using the intake package to create an xarray dataset. This xarray dataset can then be exported to NetCDF and saved locally on your computer.
# If running this notebook outside of the Cal-Adapt Analytics Engine JupyterHub, make sure the intake-esm and s3fs packages are installed
import intake
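If you are working outside the hosted JupyterHub, the two packages mentioned above can be installed from PyPI first. In a notebook cell this looks like (conda works as well):

# Install dependencies once if they are not already available
!pip install intake-esm s3fs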
To connect to the data catalog that stores all the relevant metadata needed to access the data, run the following:
# Open catalog of available data sets using intake-esm package
cat = intake.open_esm_datastore('https://cadcat.s3.amazonaws.com/cae-collection.json')
# Inspecting the catalog object will show the number of datasets and unique attributes
cat
This catalog object can be accessed as a pandas DataFrame to easily view its contents:
# Access catalog as dataframe and inspect the first few rows
cat_df = cat.df
cat_df.head()
You can also list just the column names in the catalog:
# Print column names
for col in cat_df:
    print(col)
To see the unique values in each column, run the following code:
# Unique values in each column. Not all combinations of values will link to a dataset.
for col in cat_df:
    print(cat_df[col].unique())
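If it is hard to tell which set of values belongs to which column, a small variation prints the column name alongside its unique values:

# Print each column name together with its unique values
for col in cat_df:
    print(col, cat_df[col].unique())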
This will give you an idea of the available query parameters that can be used to retrieve a particular set of data. Below is a sample query against the full catalog that narrows its entries down to those of interest:
# Form query dictionary
query = {
    # Downscaling method
    'activity_id': 'WRF',
    # GCM name
    'source_id': 'CESM2',
    # Time period - historical or emissions scenario
    'experiment_id': ['historical', 'ssp370'],
    # Variable
    'variable_id': 't2',
    # Monthly time resolution
    'table_id': 'mon',
    # Grid resolution: d01 = 45 km, d02 = 9 km, d03 = 3 km
    'grid_label': 'd03'
}
# Subset the catalog, requiring that each 'source_id' group contains all queried values
cat_subset = cat.search(require_all_on=['source_id'], **query)
cat_subset
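As with the full catalog, the subset can be viewed as a dataframe to confirm that the query matched the expected entries:

# Inspect the matching catalog rows
cat_subset.df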
The Zarr datasets of interest can then be loaded into memory as xarray datasets using:
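# Load the matching Zarr stores from S3 into a dictionary of xarray datasets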
dsets = cat_subset.to_dataset_dict(xarray_open_kwargs={'consolidated': True},
storage_options={'anon': True})
To see the dataset keys, type:
# See object keys in dsets
list(dsets)
To narrow down to a single dataset of interest, use its key:
# Subset to historical time period and examine data object
data = dsets['WRF.UCLA.CESM2.historical.mon.d03']
data
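Monthly d03 data can still be fairly large, so you may want to subset before exporting. As a minimal sketch, assuming the dataset has a 'time' coordinate (check data.coords for the actual names) and using an illustrative date range:

# Select a time slice to keep the exported file size manageable
# ('time' coordinate and date range are illustrative; adjust to your dataset)
data_subset = data.sel(time=slice('1990-01-01', '2000-12-31'))

The export step below can then be applied to data_subset instead of the full dataset.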
Finally, to save a dataset to a NetCDF file on your local machine, use:
data.to_netcdf('WRF-UCLA-CESM2-historical-mon-d03.nc')
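The saved file can later be reopened locally with xarray, with no S3 access required:

import xarray as xr

# Reopen the saved NetCDF file from local disk
ds = xr.open_dataset('WRF-UCLA-CESM2-historical-mon-d03.nc')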