This notebook provides discussion, examples, and best practices for working with netCDF files in Python. Topics include:
netcdf4-python
librarysalishsea_tools.nc_tools
code moduleThis notebook is a companion to the Exploring netCDF Datasets from ERDDAP notebook. That notebooks focusses on reading data from netCDF datasets stored on ERDDAP servers on the Internet. The Salish Sea project maintains an ERDDAP server at https://salishsea.eos.ubc.ca/erddap/.
Creating netCDF files and working with their attribute metadata is documented elsewhere: http://salishsea-meopar-docs.readthedocs.org/en/latest/code-notes/salishsea-nemo/nemo-forcing/netcdf4.html
The netcdf4-python
library
does all of the heavy lifting to let us work with netCDF files and their data.
Follow the link to get to the library documentation.
The salishsea_tools.nc_tools code module provides some shortcut functions for exploring netCDF datasets.
Let's go ahead and import those two packages,
We'll also import numpy
because we're going to use it later and it's good Python form
to keep all of our imports at the top of the file.
This notebook assumes that you are working in Python 3. If you don't have a Python 3 environment set up, please see our Anaconda Python Distribution docs for instructions on how to set one up.
import netCDF4 as nc
import numpy as np
from salishsea_tools import nc_tools
Note that:
netCDF4
to nc
and numpy
to np
so that we don't have to type as much
from ... import ...
form to get nc_tools
so that we can avoid typing salishsea_tools.nc_tools
everywhere
netCDF
provides a Dataset
object that allows us to load the contents
of a netCDF file into a Python data structure by simply passing in the
path and file name.
Let's explore the Salish Sea NEMO model bathymetry data:
grid = nc.Dataset('../../NEMO-forcing/grid/bathy_meter_SalishSea2.nc')
Note:
netCDF4.Dataset
can also access datasets stored on ERDDAP servers
like https://salishsea.eos.ubc.ca/erddap/.
If you don't have local access to the tools/NEMO-forcing/grid/bathy_meter_SalishSea2.nc
,
please see the Exploring netCDF Datasets from ERDDAP notebook.
netCDF files are organized around 4 big concepts:
NEMO doesn't use netCDF groups, so we'll ignore them.
nc_tools
provides useful (convenience) functions to look at the other 3.
nc_tools.show_dimensions(grid)
<class 'netCDF4._netCDF4.Dimension'>: name = 'y', size = 898 <class 'netCDF4._netCDF4.Dimension'>: name = 'x', size = 398
nc_tools.show_variables(grid)
odict_keys(['nav_lon', 'nav_lat', 'Bathymetry'])
So, we have a dataset that has 2 dimensions called y
and x
of size 898 and 398, respectively,
and 3 variables called nav_lon
, nav_lat
, and Bathymetry
.
We'll see how the dimensions and variables are related,
and how to work with the data in the variables in a moment,
but first, let's look at the dataset attributes:
nc_tools.show_dataset_attrs(grid)
file format: NETCDF4 Conventions: CF-1.6 title: Salish Sea NEMO Bathymetry institution: Dept of Earth, Ocean & Atmospheric Sciences, University of British Columbia references: https://bitbucket.org/salishsea/nemo-forcing/src/tip/grid/bathy_meter_SalishSea.nc comment: Based on 1_bathymetry_seagrid_WestCoast.nc file from 2-Oct-2013 WCSD_PREP tarball provided by J-P Paquin. source: https://github.com/SalishSeaCast/tools/blob/master/bathymetry/SalishSeaBathy.ipynb https://github.com/SalishSeaCast/tools/blob/master/bathymetry/SmoothMouthJdF.ipynb history: [2013-10-30 13:18] Created netCDF4 zlib=True dataset. [2013-10-30 15:22] Set depths between 0 and 4m to 4m and those >428m to 428m. [2013-10-31 17:10] Algorithmic smoothing. [2013-11-21 19:53] Reverted to pre-smothing dataset (repo rev 3b301b5b9b6d). [2013-11-21 20:14] Updated dataset and variable attributes to CF-1.6 conventions & project standards. [2013-11-21 20:47] Removed east end of Jervis Inlet and Toba Inlet region due to deficient source bathymetry data in Cascadia dataset. [2013-11-21 21:52] Algorithmic smoothing. [2014-01-01 14:44] Smoothed mouth of Juan de Fuca
netCDF attributes are metadata.
In the cast of the dataset attributes they tell us about the dataset as a whole:
how, when, and by whom it was created, how it has been modified, etc.
The meanings of the various attributes and the conventions for them that we use
in the Salish Sea MEOPAR project are documented elsewhere.
Variables also have attributes and nc_tools
provides a function to display them too:
nc_tools.show_variable_attrs(grid, 'nav_lon')
<class 'netCDF4._netCDF4.Variable'> float64 nav_lon(y, x) units: degrees east valid_range: [-126.40029144 -121.31835175] long_name: Longitude unlimited dimensions: current shape = (898, 398) filling on, default _FillValue of 9.969209968386869e+36 used
This tells us a whole lot of useful information about the longitude data values in our bathymetry dataset, for instance:
y
and x
dimensions, in that orderYou can list as many variable names as you want in the show_variable_attrs()
call
to get information about several variables at once.
If you don't provide any variables names,
you get the attributes of all of the variables in the dataset:
nc_tools.show_variable_attrs(grid)
<class 'netCDF4._netCDF4.Variable'> float64 nav_lon(y, x) units: degrees east valid_range: [-126.40029144 -121.31835175] long_name: Longitude unlimited dimensions: current shape = (898, 398) filling on, default _FillValue of 9.969209968386869e+36 used <class 'netCDF4._netCDF4.Variable'> float64 nav_lat(y, x) units: degrees north valid_range: [ 46.85966492 51.10480118] long_name: Latitude unlimited dimensions: current shape = (898, 398) filling on, default _FillValue of 9.969209968386869e+36 used <class 'netCDF4._netCDF4.Variable'> float64 Bathymetry(y, x) _FillValue: 0.0 least_significant_digit: 1 units: m valid_range: [ 0. 428.] long_name: Depth positive: down unlimited dimensions: current shape = (898, 398) filling on
Before we can go further exploring and working with the variables we need to
associate them with Python variables names.
We do that by accessing them by name in the variables
attribute of our Dataset
object.
variables
is a Python dict
.
We can use any Python variable names we like, so let's shorten them
(being careful not to sacrifice readability for ease of typing):
lons = grid.variables['nav_lon']
lats = grid.variables['nav_lat']
bathy = grid.variables['Bathymetry']
Having done that, we can now access the attributes of our variables using dotted notation:
bathy.units, bathy.valid_range
('m', array([ 0., 428.]))
Our variables are instances of the netCDF.Variable
object.
In addition to their attributes, they carry a bunch of other
useful properties and methods that you can read about in the netCDF4-python docs.
Perhaps more importantly the data associated with the variables
are stored as NumPy arrays.
So, we can use NumPy indexing and slicing to access the data values.
For instance, to get the latitudes and longitudes of the 4 corners of the domain:
lats.shape
(898, 398)
print('Latitudes and longitudes of domain corners:')
pt = (0, 0)
print(' 0, 0: ', lats[pt], lons[pt])
pt = (0, lats.shape[1] - 1)
print(' 0, x-max: ', lats[pt], lons[pt])
pt = (lats.shape[0] - 1, 0)
print(' y-max, 0: ', lats[pt], lons[pt])
pt = (lats.shape[0] - 1, lats.shape[1] - 1)
print(' y-max, x-max:', lats[pt], lons[pt])
Latitudes and longitudes of domain corners: 0, 0: 46.859664917 -123.42943573 0, x-max: 47.6009216309 -121.318351746 y-max, 0: 50.3899269104 -126.400291443 y-max, x-max: 51.104801178 -124.34198761
You can also access the entire variable data array, or subsets of it using slicing.
The [:]
slice notation is a convenient shorthand that means "the entire array".
lats[:]
array([[ 46.85966492, 46.86154556, 46.86342621, ..., 47.59721375, 47.59906769, 47.60092163], [ 46.86278915, 46.86481476, 46.86677933, ..., 47.60125732, 47.60311127, 47.60496521], [ 46.86606979, 46.86814499, 46.87015915, ..., 47.60529709, 47.60715485, 47.60900879], ..., [ 50.38191605, 50.38397598, 50.38602448, ..., 51.09400177, 51.09560776, 51.09720612], [ 50.38591766, 50.38798523, 50.39004135, ..., 51.09781265, 51.0994072 , 51.10100174], [ 50.38992691, 50.39200592, 50.39406967, ..., 51.10162354, 51.10321808, 51.10480118]])
lons[42:45, 128:135]
array([[-122.884552 , -122.87927246, -122.87399292, -122.86871338, -122.86342621, -122.85814667, -122.85286713], [-122.88778687, -122.88250732, -122.87722778, -122.87194824, -122.8666687 , -122.86138916, -122.85610962], [-122.89102936, -122.88574982, -122.88047028, -122.87519073, -122.86991119, -122.86463165, -122.85934448]])
lons[:2, :2], lats[-2:, -2:]
(array([[-123.42943573, -123.42411804], [-123.43196869, -123.42677307]]), array([[ 51.0994072 , 51.10100174], [ 51.10321808, 51.10480118]]))
Note that the zero and maximum dimension values may be omitted for slices that extend to the ends of array dimensions.
In some cases, like our bathymetry depths,
the netCDF variable has a _FillingValue
attribute value that is equal
to values in the variable data.
In that case the data are represented by a NumPy Masked Array with the
mask applied there the data values equal the _FillingValue
:
bathy[:]
masked_array(data = [[-- -- -- ..., -- -- --] [-- -- -- ..., -- -- --] [-- -- -- ..., -- -- --] ..., [-- -- -- ..., -- -- --] [-- -- -- ..., -- -- --] [-- -- -- ..., -- -- --]], mask = [[ True True True ..., True True True] [ True True True ..., True True True] [ True True True ..., True True True] ..., [ True True True ..., True True True] [ True True True ..., True True True] [ True True True ..., True True True]], fill_value = 0.0)
You can test to see if a variables data is masked like this:
np.ma.is_masked(bathy[:])
True
Masked arrays are useful because require less storage than a comparable size fully populated array. Also, when masked arrays are plotted the maked values are all plotted in the same colour (white by default). We'll see in other example notebooks how this allows us to very easily plot our bathymetry in a meaningfully way, and use it, or other values to mask velocity component, salinity, etc. results so that they show values only in the water areas of the domain.