Julius Busecke1 and Ryan Abernathey2
(1 Princeton University 2 Columbia University)
The Coupled Model Intercomparison Project (CMIP; currently in its 6th iteration) involves contributions from countries around the world, constituting a combined dataset of 20+ petabytes.
Many years of planning go into the inception of the different scenarios, the various model intercomparison projects (MIPs), and the delivery of the output data from the modeling community around the globe. A major challenge for the Earth science community now is to effectively analyze this dataset and answer science questions that improve our understanding of the Earth system in the coming years.
The Pangeo project recently introduced large parts of the CMIP6 data archive into the cloud. This enables, for the first time, centralized, reproducible science on state-of-the-art climate simulations without users needing to maintain large storage systems or a supercomputer.
The data itself, however, still presents significant challenges for analysis, one of which is applying operations across many models. Two of the major hurdles are inconsistent naming/metadata between modeling centers and complex grid layouts, particularly for the ocean components of climate models. Past workflows often involved interpolating/remapping desired variables and writing the results to new files, which adds organizational burden and increases storage requirements.
We will demonstrate two Pangeo tools that enable seamless calculation of common operations like vector calculus operators (grad, curl, div) and weighted averages/integrals across a wide range of CMIP6 models, directly on the data stored in the cloud. cmip6_preprocessing provides numerous tools to unify naming conventions and to reconstruct grid information and metrics (like distances). This information is used by xgcm to enable finite volume analysis on the native model grids. The combination of both tools facilitates fast analysis while ensuring a reproducible and accurate workflow.
from dask.distributed import Client, progress
from dask_gateway import Gateway
gateway = Gateway()
cluster = gateway.new_cluster()
cluster.scale(60)
cluster
client = Client(cluster)
client
In order to load the CMIP data from cloud storage, we use intake-esm, and choose to load all monthly ocean model output (table_id='Omon') of sea surface temperature (variable_id='tos') on the native grid (grid_label='gn') for the 'historical' experiment (experiment_id='historical').
For more details on how to use intake-esm we refer to this EarthCube presentation.
import intake
import warnings
warnings.filterwarnings("ignore")
url = "https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json"
col = intake.open_esm_datastore(url)
models = col.df['source_id'].unique()
# we will have to eliminate some models that do not *yet* play nice with cmip6_preprocessing
# ignore_models = ['FGOALS-f3-L','EC-Earth3','EC-Earth3-Veg-LR', 'CESM2-FV2',
# 'GFDL-ESM4', 'MPI-ESM-1-2-HAM', 'MPI-ESM1-2-LR', 'NorCPM1',
# 'GISS-E2-1-H', 'IPSL-CM6A-LR', 'MRI-ESM2-0', 'CESM2-WACCM',
# 'GISS-E2-1-G', 'FIO-ESM-2-0', 'MCM-UA-1-0', 'FGOALS-g3']
# models = [m for m in models if 'AWI' not in m and m not in ignore_models]
# These might change in the future so it is better to explicitly allow the models that work
models = ['TaiESM1','BCC-ESM1','CNRM-ESM2-1','MIROC6','UKESM1-0-LL','NorESM2-LM',
'BCC-CSM2-MR','CanESM5','ACCESS-ESM1-5','E3SM-1-1-ECA','E3SM-1-1',
'MIROC-ES2L','CESM2','CNRM-CM6-1','GFDL-CM4','CAMS-CSM1-0','CAS-ESM2-0',
'CanESM5-CanOE','IITM-ESM','CNRM-CM6-1-HR','ACCESS-CM2','E3SM-1-0',
'EC-Earth3-LR','EC-Earth3-Veg','INM-CM4-8','INM-CM5-0','HadGEM3-GC31-LL',
'HadGEM3-GC31-MM','MPI-ESM1-2-HR','GISS-E2-1-G-CC','GISS-E2-2-G','CESM2-WACCM-FV2',
'NorESM1-F','NorESM2-MM','KACE-1-0-G','GFDL-AM4','NESM3',
'SAM0-UNICON','CIESM','CESM1-1-CAM5-CMIP5','CMCC-CM2-HR4','CMCC-CM2-VHR4',
'EC-Earth3P-HR','EC-Earth3P','ECMWF-IFS-HR','ECMWF-IFS-LR','INM-CM5-H',
'IPSL-CM6A-ATM-HR','HadGEM3-GC31-HM','HadGEM3-GC31-LM','MPI-ESM1-2-XR','MRI-AGCM3-2-H',
'MRI-AGCM3-2-S','GFDL-CM4C192','GFDL-OM4p5B']
cat = col.search(table_id='Omon', grid_label='gn', experiment_id='historical', variable_id='tos', source_id=models)
We have to eliminate some models from this analysis: the AWI models, due to their unstructured native grids, as well as others for various subtleties that have yet to be resolved. These will be addressed in a future update of cmip6_preprocessing. If you do not see your favorite model, please consider raising an issue on github.
cat.df['source_id'].nunique() # This represents the number of models as of 6/14/2020.
# More models might be added in the future. See commented parts in previous code cell.
27
This gives us 27 different models ('source_id's) to work with. Let's load them into a dictionary and inspect them more closely.
# Fix described here: https://github.com/intake/filesystem_spec/issues/317
# Would cause `Name (gs) already in the registry and clobber is False` error somewhere in `intake-esm`
import fsspec
fsspec.filesystem('gs')
ddict = cat.to_dataset_dict(zarr_kwargs={'consolidated':True, 'decode_times':False})
--> The keys in the returned dictionary of datasets are constructed as follows: 'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
What we ultimately want is to apply an analysis or just a visualization across all these models. So before we jump into that, let's inspect a few of the datasets:
ddict['CMIP.NCAR.CESM2.historical.Omon.gn']
[Output: the CESM2 dataset repr, with dimensions time, nlat (1-384), nlon (1-320), and 11 member_ids (r1i1p1f1 ... r11i1p1f1)]
ddict['CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.Omon.gn']
[Output: the CNRM-CM6-1 dataset repr, with 30 member_ids and no 'vertex' dimension]
ddict['CMIP.CSIRO.ACCESS-ESM1-5.historical.Omon.gn']
[Output: the ACCESS-ESM1-5 dataset repr, with dimensions time, i (0-359), j (0-299), and 3 member_ids]
We can immediately spot problems:
- There is no consistent convention for labeling dimensions (note that the logical 1D dimension in the x-direction is called x, nlon, or i).
- Some models (here: CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.Omon.gn) are missing the 'vertex'/'vertices' dimension, because their cell geometry is given by longitude and latitude 'bounds' centered on the cell faces rather than located at the cell corners.
There are more issues that both make working with the data cumbersome and can quickly introduce errors; for instance, some models give depth in units of cm rather than m. But instead of focusing on problems, here we would like to illustrate how easy analysis across models with different grids can be, using cmip6_preprocessing.
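The dimension harmonization can be pictured as a simple rename mapping, shown here as an illustrative pure-python sketch (the table and function names are hypothetical, not the actual cmip6_preprocessing internals):

```python
# Map each model's dimension names onto one common convention.
# This table is illustrative only, not the package's full mapping.
RENAME = {
    "nlon": "x", "i": "x",   # CESM2-style and ACCESS-style zonal dims
    "nlat": "y", "j": "y",   # ...and their meridional equivalents
}

def harmonize_dims(dims):
    """Return dimension names translated to the common convention."""
    return [RENAME.get(d, d) for d in dims]
```

The real package additionally handles coordinates, bounds/vertices, units, and metadata, but the core idea is this kind of lookup applied uniformly to every model.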
Using cmip6_preprocessing to harmonize different model output
cmip6_preprocessing was born out of the cmip6-hackathon and aims to provide a central package through which these conventions can be harmonized. For convenience we will make use of the preprocess functionality and apply the 'catch-all' function combined_preprocessing to the data before it gets aggregated. For a more detailed description of the corrections applied, we refer to the documentation and examples therein.
from cmip6_preprocessing.preprocessing import combined_preprocessing
ddict_pp = cat.to_dataset_dict(
zarr_kwargs={'consolidated':True, 'decode_times':False},
preprocess=combined_preprocessing
)
--> The keys in the returned dictionary of datasets are constructed as follows: 'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
Note that all functions in the cmip6_preprocessing.preprocessing module can also be used on 'raw' datasets, e.g. from a local netcdf file, as long as the metadata is intact.
Now let's look at the preprocessed versions of the same datasets from above:
ddict_pp['CMIP.NCAR.CESM2.historical.Omon.gn']
[Output: the preprocessed CESM2 dataset repr, now with dimensions x, y, vertex (4), bnds (2), monotonic 1D x/y coordinates, and 11 member_ids]
ddict_pp['CMIP.CNRM-CERFACS.CNRM-CM6-1.historical.Omon.gn']
[Output: the preprocessed CNRM-CM6-1 dataset repr, with the same harmonized dimensions and 30 member_ids]
ddict_pp['CMIP.CSIRO.ACCESS-ESM1-5.historical.Omon.gn']
[Output: the preprocessed ACCESS-ESM1-5 dataset repr, with the same harmonized dimensions and 3 member_ids]
Much better!
As you can see, they all have consistent dimension names and coordinates. Time to see if this works as advertised by plotting the sea surface temperature (SST) for all models:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
fig, axarr = plt.subplots(
ncols=5, nrows=6, figsize=[15, 15], subplot_kw={"projection": ccrs.Robinson(190)}
)
for ax, (k, ds) in zip(axarr.flat, ddict_pp.items()):
# Select a single member for each model
if 'member_id' in ds.dims:
ds = ds.isel(member_id=-1)
# select the first time step
da = ds.tos.isel(time=0)
    # some models have large values instead of nans, so we mask unreasonable values
da = da.where(da < 1e30)
# and plot the resulting 2d array using xarray
pp = da.plot(
ax=ax,
x="lon",
y="lat",
vmin=-1,
vmax=28,
transform=ccrs.PlateCarree(),
infer_intervals=False,
add_colorbar=False,
)
ax.set_title(ds.attrs['source_id'])
ax.add_feature(cfeature.LAND, facecolor='0.6')
ax.coastlines()
fig.subplots_adjust(hspace=0.05, wspace=0.05)
fig.subplots_adjust(right=0.9)
cbar_ax = fig.add_axes([0.92, 0.3, 0.015, 0.4])
fig.colorbar(pp, cax=cbar_ax, label="Sea Surface Temperature [$^\circ C$]")
That was very little code for such a comprehensive figure, even though the models all have different resolutions and grid architectures.
cmip6_preprocessing helps keep the logic needed for simple analysis and visualization to a minimum, and the newest version also facilitates the use of xgcm with the native model grids, making it a powerful tool for more complex analysis across the models.
Finite volume analysis on native model grids with xgcm
Most modern circulation models discretize the partial differential equations needed to simulate the earth system on a logically rectangular grid. This means the grid for a single time step can be represented as a 3-dimensional array of cells. Operations like a derivative are then approximated by a finite difference between neighboring cells, divided by the appropriate distance.
Calculating operators like gradient, curl or divergence usually involves a lot of bookkeeping to make sure the difference is taken between the correct grid points. This often leads users to prefer interpolated data (on regular lon/lat grids), which has to be processed and stored alongside the native grid data, both increasing storage requirements and potentially losing some of the detailed structure of the high-resolution model fields.
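The finite-difference idea can be sketched in a few lines of numpy (a toy 1D example on a uniform grid, for intuition only; real model grids are curvilinear with spatially varying cell distances):

```python
import numpy as np

# toy tracer (e.g. a temperature) on a uniform 1D grid with spacing dx
dx = 0.5
x = np.arange(0.0, 10.0, dx)
tracer = np.sin(x)

# the difference between neighboring cells, divided by the distance
# between them, approximates the derivative...
dtracer_dx = np.diff(tracer) / dx

# ...and the result lives at the cell faces, halfway between centers
x_face = x[:-1] + dx / 2
```

This is exactly the bookkeeping that becomes tedious on a staggered, curvilinear model grid: keeping track of where each differenced quantity lives and which distance to divide by.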
The combination of cmip6_preprocessing and xgcm enables the user to quickly calculate operators like gradient or curl (shown in detail below) on the native model grids, preserving the maximum detail the models provide and avoiding unnecessary storage burden. cmip6_preprocessing parses the detailed grid information for the different models so that xgcm can carry out the computations, without requiring the user to dive into the details of each model grid.
As an example, let us compute and plot the gradient magnitude of the SST:
$ \left|\nabla SST\right| = \sqrt{\left(\frac{\partial SST}{\partial x}\right)^2 + \left(\frac{\partial SST}{\partial y}\right)^2} $
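On a regular Cartesian grid this formula reduces to a few lines of numpy (a toy sketch with a linear temperature field; the point of xgcm is to do the same thing correctly on the models' curvilinear, staggered grids):

```python
import numpy as np

# regular grid with 1 km spacing and a linearly varying toy SST field
dx = dy = 1000.0  # meters
y, x = np.meshgrid(np.arange(50), np.arange(60), indexing="ij")
sst = 2e-5 * (x * dx) + 1e-5 * (y * dy)  # degC, constant gradient

# finite differences along each axis, then the gradient magnitude
dsst_dy, dsst_dx = np.gradient(sst, dy, dx)
grad_mag = np.sqrt(dsst_dx**2 + dsst_dy**2)
```

For this linear field the gradient magnitude is constant everywhere; on a native model grid, the spacings dx and dy vary from cell to cell, which is what the reconstructed grid metrics below provide.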
We will start with an example of just computing the zonal gradient $\frac{\partial SST}{\partial x}$:
First, we need to create a suitable grid, with dimensions for both the cell center (where our tracer is located) and the cell faces. These are needed for xgcm to calculate the finite difference version of the above equation.
The function combine_staggered_grid parses a dataset so that it is compatible with xgcm. It also provides a 'ready-to-use' xgcm Grid object for convenience.
# we pick one of the many models
ds = ddict_pp['CMIP.CNRM-CERFACS.CNRM-CM6-1-HR.historical.Omon.gn']
ds
[Output: the preprocessed CNRM-CM6-1-HR dataset repr, with a single member r1i1p1f2]
from cmip6_preprocessing.grids import combine_staggered_grid
# Now we parse the necessary additional dimensions
grid,ds = combine_staggered_grid(ds)
ds
[Output: the same dataset repr, now including the additional staggered coordinates x_left and y_right]
Note how the new dimensions x_left and y_right have been added; they are associated with the 'eastern' and 'northern' cell faces.
We can now easily calculate the finite difference in the logical X direction:
delta_t_x = grid.diff(ds.tos, 'X')
delta_t_x
[Output: the difference array repr, located on the x_left dimension]
This array is now located on the center of the eastern cell face. But in order to approximate a derivative, we need to divide this difference by an appropriate distance. Usually these distances are provided with the model output, but currently no such data is publicly available for the CMIP models.
To solve this problem, cmip6_preprocessing can recalculate grid distances.
Be aware that this can lead to rather large biases where the grid is strongly warped, usually around the Arctic. More on that at the end of the notebook.
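The reconstruction is essentially spherical geometry: the zonal distance between two points on the same latitude circle follows from the longitude difference and the latitude. A simplified sketch (the package's actual implementation additionally handles staggered positions and grid discontinuities):

```python
import numpy as np

EARTH_RADIUS = 6.371e6  # meters

def zonal_distance(dlon_deg, lat_deg):
    """Arc length (m) along a latitude circle for a longitude difference."""
    return EARTH_RADIUS * np.cos(np.deg2rad(lat_deg)) * np.deg2rad(dlon_deg)
```

At the equator one degree of longitude spans roughly 111 km, while at 60 degrees latitude the same longitude difference spans only half that, which is why recomputed per-cell metrics, rather than constant spacings, are needed. It also hints at why strongly warped grid regions, where neighboring points no longer sit on a common latitude circle, are the problematic case.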
grid, ds = combine_staggered_grid(ds, recalculate_metrics=True)
ds
[Output: the dataset repr, now including reconstructed distance coordinates (e.g. dx_gx) at the various grid positions]
Now there are additional coordinates, representing distances in the x and y direction at different grid locations. We can now calculate the derivative with respect to x.
dt_dx = grid.diff(ds.tos, 'X') / ds.dx_gx
dt_dx
[Output: the zonal derivative repr, located on x_left]
xgcm can handle these cases even more elegantly, by automatically picking the right grid metric, in this case the zonal distance:
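Conceptually, the automatic metric selection is a lookup from an axis and grid position to the matching distance coordinate. A pure-python sketch of that bookkeeping (the mapping and names here are hypothetical, not xgcm's actual internals):

```python
# hypothetical mapping from (axis, grid position) to a distance coordinate
METRICS = {
    ("X", "center"): "dx_t",   # zonal distance at tracer points
    ("X", "left"): "dx_gx",    # zonal distance at the eastern cell faces
    ("Y", "center"): "dy_t",
    ("Y", "right"): "dy_gy",
}

def pick_metric(axis, position):
    """Return the distance coordinate matching a field's grid position."""
    return METRICS[(axis, position)]
```

Because the differenced field ends up on the x_left position, the lookup selects the face distance rather than the tracer-point distance, which is exactly the choice a user would otherwise have to make by hand.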
dt_dx_auto = grid.derivative(ds.tos, 'X')
dt_dx_auto
[Output: the zonal derivative repr, as above]
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=[13,5])
dt_dx.isel(time=0).sel(x_left=slice(120, 140), y=slice(20,40)).plot(ax=ax1, vmax=1e-4)
dt_dx_auto.isel(time=0).sel(x_left=slice(120, 140), y=slice(20,40)).plot(ax=ax2, vmax=1e-4)
The results are identical, and the user does not have to remember the names of the distance coordinates; xgcm takes care of it. The derivative in the y-direction can be calculated similarly.
Now let's compute the full gradient magnitude for all the models in one loop, averaging over all members and over the first 5 years of the historical period.
fig, axarr = plt.subplots(
ncols=4, nrows=7, figsize=[15, 15], subplot_kw={"projection": ccrs.Robinson(190)}
)
for ax, (k, ds) in zip(axarr.flat, ddict_pp.items()):
# the cmip6_preprocessing magic:
# create an xgcm grid object and dataset with reconstructed grid metrics
grid, ds = combine_staggered_grid(ds, recalculate_metrics=True)
da = ds.tos
    # some models have large values instead of nans, so we mask unreasonable values
da = da.where(da < 1e30)
# calculate the zonal temperature gradient
dt_dx = grid.derivative(da, 'X')
dt_dy = grid.derivative(da, 'Y', boundary='extend')
# these values are now situated on the cell faces, we need to
# interpolate them back to the center to combine them
dt_dx = grid.interp(dt_dx, 'X')
dt_dy = grid.interp(dt_dy, 'Y', boundary='extend')
grad = (dt_dx**2 + dt_dy**2)**0.5
ds['grad'] = grad
# take an average over the first 5 years of the run
ds = ds.isel(time=slice(0,12*5)).mean('time', keep_attrs=True)
# take the average over all available model members
if "member_id" in ds.dims:
ds = ds.mean('member_id', keep_attrs=True)
# and plot the resulting 2d array using xarray
pp = ds.grad.plot(
ax=ax,
x="lon",
y="lat",
vmin=1e-9,
vmax=1e-5,
transform=ccrs.PlateCarree(),
infer_intervals=False,
add_colorbar=False,
)
ax.set_title(ds.attrs['source_id'])
ax.add_feature(cfeature.LAND, facecolor='0.6')
ax.coastlines()
fig.subplots_adjust(hspace=0.05, wspace=0.05)
fig.subplots_adjust(right=0.9)
cbar_ax = fig.add_axes([0.92, 0.3, 0.015, 0.4])
fig.colorbar(pp, cax=cbar_ax, label="Sea Surface Temperature gradient magnitude [$^\circ C/m$]")
Not bad for five additional lines of code!
Features like the western boundary currents, the fronts along the Antarctic Circumpolar Current, and the equatorial upwelling zones are clearly visible and present in most models.
But this was still just operating on a tracer field. What about velocity fields? We can also combine different variables into an xgcm-compatible dataset.
The recommended workflow is to first load the datasets separately for each desired variable:
variables = ['thetao', 'uo', 'vo']
ddict_multi = {var:{} for var in variables}
for var in variables:
cat_var = col.search(table_id='Omon',
grid_label='gn',
experiment_id='historical',
variable_id=var,
source_id=models)
ddict_multi[var] = cat_var.to_dataset_dict(
zarr_kwargs={'consolidated':True, 'decode_times':False},
preprocess=combined_preprocessing
)
--> The keys in the returned dictionary of datasets are constructed as follows: 'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
CESM2: Unexpected unit (centimeters) for coordinate `lev` detected. Converted to `m`
CESM2-WACCM-FV2: Unexpected unit (centimeters) for coordinate `lev` detected. Converted to `m`
MIROC6: No units found
MIROC-ES2L: No units found
Since we need all three variables for each model, we keep only the keys that appear in every sub-dictionary.
ddict_multi_clean = {var: {} for var in variables}
for k in ddict_multi['thetao']:
    if all([k in ddict_multi[var] for var in variables]) and 'lev' in ddict_multi['thetao'][k].dims:
        for var in variables:
            ddict_multi_clean[var][k] = ddict_multi[var][k]
As before, we can create a complete grid dataset, but now we pass additional datasets to be combined via the other_ds argument. This can be a list of datasets containing different variables; cmip6_preprocessing automatically detects the appropriate grid position for each.
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from xgcm import Grid

fig, axarr = plt.subplots(
    ncols=4, nrows=6, figsize=[25, 20], subplot_kw={"projection": ccrs.PlateCarree()}
)
for ai, (ax, k) in enumerate(zip(axarr.flat, ddict_multi_clean['thetao'].keys())):
    ds_t = ddict_multi_clean['thetao'][k]  # One tracer should always be the reference dataset!
    ds_u = ddict_multi_clean['uo'][k]
    ds_v = ddict_multi_clean['vo'][k]
    if 'lev' in ds_t:  # show only models with depth coordinates in the vertical
        # align only the intersection of the member_ids
        ds_t, ds_u, ds_v = xr.align(
            ds_t, ds_u, ds_v, join='inner',
            exclude=[di for di in ds_t.dims if di != 'member_id']
        )
        # combine all datasets and create a grid with metrics
        grid, ds = combine_staggered_grid(ds_t, other_ds=[ds_u, ds_v], recalculate_metrics=True)
        # interpolate grid metrics at the corner points (these are not *yet* constructed by cmip6_preprocessing)
        ds.coords['dx_temp'] = grid.interp(ds['dx_gx'], 'Y', boundary='extend')
        ds.coords['dy_temp'] = grid.interp(ds['dy_gy'], 'X')
        ds = ds.chunk({'x': 360})
        # select the surface value
        ds = ds.isel(lev=0)
        # for demonstration purposes select the first member and a single monthly timestep
        ds = ds.isel(time=10)
        if "member_id" in ds.dims:
            ds = ds.isel(member_id=0)
        grid = Grid(
            ds, periodic=['X'],
            metrics={'X': [co for co in ds.coords if 'dx' in co],
                     'Y': [co for co in ds.coords if 'dy' in co]}
        )
        dv_dx = grid.derivative(ds.vo, 'X')
        du_dy = grid.derivative(ds.uo, 'Y', boundary='extend')
        # check the position of the derivatives and interpolate back to the tracer point for plotting
        if any(['x_' in di for di in dv_dx.dims]):
            dv_dx = grid.interp(dv_dx, 'X')
        if any(['y_' in di for di in dv_dx.dims]):
            dv_dx = grid.interp(dv_dx, 'Y', boundary='extend')
        if any(['x_' in di for di in du_dy.dims]):
            du_dy = grid.interp(du_dy, 'X')
        if any(['y_' in di for di in du_dy.dims]):
            du_dy = grid.interp(du_dy, 'Y', boundary='extend')
        curl = dv_dx - du_dy
        ds['curl'] = curl
        # select the North Atlantic
        ds = ds.sel(x=slice(270, 335), y=slice(10, 55))
        # and plot the resulting 2D array using xarray
        pp = ds.curl.plot.contourf(
            levels=51,
            ax=ax,
            x="lon",
            y="lat",
            vmax=1e-5,
            transform=ccrs.PlateCarree(),
            infer_intervals=False,
            add_colorbar=False,
            add_labels=False,
        )
        ax.text(
            0.2, 0.9, ds.attrs['source_id'],
            horizontalalignment='center', verticalalignment='center',
            transform=ax.transAxes, fontsize=10,
        )
        ax.coastlines()
        ax.set_extent([-90, -45, 20, 45], ccrs.PlateCarree())
fig.subplots_adjust(wspace=0.01, hspace=0.01, right=0.9)
cbar_ax = fig.add_axes([0.92, 0.3, 0.015, 0.4])
fig.colorbar(pp, cax=cbar_ax, label="Sea Surface Vorticity [$1/s$]")
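The heavy lifting above is done by grid.derivative. For intuition, here is a minimal finite-difference version of the curl on a uniform grid (pure NumPy, with hypothetical grid spacing; xgcm additionally handles the staggered grid positions, metric-aware spacing, and boundaries that real model grids require):

```python
import numpy as np

dx = dy = 1.0e4  # hypothetical uniform grid spacing [m]
ny, nx = 50, 60
y, x = np.meshgrid(np.arange(ny) * dy, np.arange(nx) * dx, indexing="ij")

# solid-body rotation: u = -omega*y, v = omega*x  ->  curl = 2*omega everywhere
omega = 1e-5
u = -omega * y
v = omega * x

# centered differences in the interior (no boundary handling in this sketch)
dv_dx = (v[:, 2:] - v[:, :-2]) / (2 * dx)
du_dy = (u[2:, :] - u[:-2, :]) / (2 * dy)
curl = dv_dx[1:-1, :] - du_dy[:, 1:-1]

print(curl.mean())  # ≈ 2e-5 for solid-body rotation
```

The test field is chosen so the answer is known analytically, which is a useful sanity check before applying the same operator to real model output.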
For all the examples shown here, we have relied on a simplified reconstruction of the grid metrics (in this case, the distances between different points on the grid). We can check the quality of the reconstruction indirectly by comparing it to the only horizontal grid metric that is saved with the ocean model output: the horizontal grid cell area.
We reconstruct the area naively as the product of the x and y distances centered on the tracer cell, dx_t and dy_t.
ds = ddict_multi_clean['thetao']['CMIP.NOAA-GFDL.GFDL-CM4.historical.Omon.gn']
_, ds = combine_staggered_grid(ds, recalculate_metrics=True)
area_reconstructed = ds.dx_t * ds.dy_t
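To see why a product of centered distances is a reasonable estimate in the first place, here is a self-contained check on an idealized regular latitude-longitude grid (pure NumPy, no CMIP data; the 1-degree spacing is an arbitrary choice for illustration):

```python
import numpy as np

R = 6.371e6                    # Earth radius [m]
dlat = dlon = np.deg2rad(1.0)  # hypothetical 1-degree regular grid
lat = np.deg2rad(np.arange(-89.5, 90.0, 1.0))  # cell-center latitudes

# naive reconstruction: product of the centered x and y distances
dx = R * np.cos(lat) * dlon
dy = R * dlat
area_naive = dx * dy

# exact spherical cell area: R^2 * dlon * (sin(lat_north) - sin(lat_south))
area_exact = R**2 * dlon * (np.sin(lat + dlat / 2) - np.sin(lat - dlat / 2))

rel_err = np.abs(area_naive - area_exact) / area_exact * 100  # percent
print(f"max relative error: {rel_err.max():.5f} %")
```

On this idealized grid the error is uniformly tiny at every latitude; the large high-latitude errors we find for real models below stem from their curvilinear (e.g. tripolar) grid geometry, which this idealization does not capture.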
Now let's load the actual area and compare the two.
import intake

url = "https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json"
col = intake.open_esm_datastore(url)
cat = col.search(table_id='Ofx', variable_id='areacello', source_id='GFDL-CM4', grid_label='gn')  # switch to `grid_label='gr'` for the regridded file
ddict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True}, preprocess=combined_preprocessing)
_, ds_area = ddict.popitem()
area = ds_area.areacello.isel(member_id=0).squeeze()
--> The keys in the returned dictionary of datasets are constructed as follows: 'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=[20,5])
area_reconstructed.plot(ax=ax1, vmax=1e9)
area.plot(ax=ax2, vmax=1e9)
((area_reconstructed - area) / area * 100).plot(ax=ax3, vmax=0.5)
ax1.set_title('reconstructed area')
ax2.set_title('saved area')
ax3.set_title('difference in %')
You can see that for this particular model (GFDL-CM4), the reconstruction reproduces the grid cell area with less than 0.5% error between approximately 60°S and 60°N. North of that, the grid geometry becomes vastly more complicated, and the simple reconstruction fails. The region over which the reconstruction is accurate varies from model to model, so caution should always be exercised when analyzing data with reconstructed metrics.
At present, however, this is the only way to run these kinds of analyses, since the original grid metrics are not routinely provided with the CMIP6 output. To truly reproduce analyses such as budgets, the community requires the native model geometry.
We have shown that, using cmip6_preprocessing and xgcm, users can analyze complex native ocean model grids without the need to interpolate data or keep track of the intricacies of individual model grids. These tools already enable users to calculate common operators like the gradient and curl, weighted averages, and more, in a 'grid-agnostic' way, with decent precision outside of the polar regions.
Depending on the availability of more grid metric output, and building on these tools, complex analyses like various budgets could become a matter of a few lines and be calculated across all models in the CMIP6 archive.
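As a sketch of what "a few lines" could look like, here is an area-weighted mean using xarray's built-in weighted aggregation, with synthetic stand-in data (the names tos and areacello merely mimic CMIP conventions; no real model output is used):

```python
import numpy as np
import xarray as xr

# synthetic stand-ins for model output: a 2D tracer and a matching cell-area metric
rng = np.random.default_rng(0)
tos = xr.DataArray(rng.normal(15.0, 5.0, size=(4, 8)), dims=["y", "x"], name="tos")
areacello = xr.DataArray(
    np.cos(np.deg2rad(np.linspace(-60, 60, 4)))[:, None] * np.ones((4, 8)),
    dims=["y", "x"], name="areacello",
)

# area-weighted mean: one line once the metric is available
weighted_mean = tos.weighted(areacello).mean(dim=["y", "x"])

# equivalent manual computation, for comparison
manual = (tos * areacello).sum() / areacello.sum()
print(float(weighted_mean), float(manual))
```

With reconstructed (or, ideally, model-provided) cell areas attached to each dataset, the same one-liner applies unchanged across every model in the archive.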
By combining generalizable and reproducible analysis with the publicly available CMIP6 data, more users will be able to analyze the data efficiently, leading to faster understanding and synthesis of the vast amount of data provided by modern climate modeling.
Julius Busecke's contributions to this work are motivated and developed as part of an ongoing project together with his postdoc advisor Prof. Laure Resplandy. This project investigates the influence of equatorial dynamics on possible expansions of Oxygen Minimum Zones in the Pacific and is funded under the NOAA Cooperative Institute for Climate Science agreement NA14OAR4320106.