Let's explore a netCDF file from the Atlantic Real-Time Ocean Forecast System
First, import netcdf4-python and numpy:
import netCDF4
import numpy as np
f is a Dataset object, representing an open netCDF file; printing it gives summary information, similar to the output of ncdump -h.
f = netCDF4.Dataset('data/rtofs_glo_3dz_f006_6hrly_reg3.nc')
print(f)
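The global attributes in that summary are also reachable programmatically; a small sketch using the Dataset's ncattrs method:
# list the file's global attribute names and read each value
for attr in f.ncattrs():
    print('%s: %s' % (attr, getattr(f, attr)))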
Variable objects are stored by name in the Dataset's variables dict; printing one gives summary info (including attributes), but no data is read yet.
print(f.variables.keys()) # get all variable names
temp = f.variables['temperature'] # temperature variable
print(temp)
Let's list all the dimensions in this file. Note that the MT dimension is special (unlimited), which means it can be appended to.
for d in f.dimensions.items():
    print(d)
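A Dimension object can report this directly; a quick sketch using its isunlimited method:
for name, dim in f.dimensions.items():
    # len(dim) is the current size; an unlimited dimension can still grow
    print(name, len(dim), 'unlimited' if dim.isunlimited() else 'fixed')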
Each variable has a dimensions and a shape attribute.
print(temp.dimensions)
print(temp.shape)
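To see which size belongs to which dimension, zip the two together:
# pair each dimension name with the corresponding length
for dname, dlen in zip(temp.dimensions, temp.shape):
    print(dname, dlen)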
mt = f.variables['MT']
depth = f.variables['Depth']
x,y = f.variables['X'], f.variables['Y']
print(mt)
print(x)
time = mt[:] # Reads the netCDF variable MT, array of one element
print(time)
dpth = depth[:] # examine depth array
print(dpth)
xx,yy = x[:],y[:]
print('shape of temp variable: %s' % repr(temp.shape))
Slicing a netCDF variable object returns a numpy array with the data. Unlike numpy fancy indexing, only 1-d boolean arrays and integer sequences are allowed as indices, and they work independently along each dimension (similar to vector subscripts in Fortran).
tempslice = temp[0, dpth > 400, yy > yy.max()/2, xx > xx.max()/2]
print('shape of temp slice: %s' % repr(tempslice.shape))
Suppose we want the sea surface temperature and salinity at 50N, 140W. The X and Y dimensions don't look like longitudes and latitudes, so we turn to the 2-d auxiliary coordinate variables Latitude and Longitude.
lat, lon = f.variables['Latitude'], f.variables['Longitude']
print(lat)
Aha! So we need to find array indices iy and ix such that Latitude[iy, ix] is close to 50.0 and Longitude[iy, ix] is close to -140.0 ...
# extract lat/lon values (in degrees) to numpy arrays
latvals = lat[:]; lonvals = lon[:]
# a function to find the index of the point closest
# (in squared distance) to a given lat/lon value.
def getclosest_ij(lats, lons, latpt, lonpt):
    # find squared distance of every point on grid
    dist_sq = (lats - latpt)**2 + (lons - lonpt)**2
    # 1D index of minimum dist_sq element
    minindex_flattened = dist_sq.argmin()
    # Get 2D index for latvals and lonvals arrays from 1D index
    return np.unravel_index(minindex_flattened, lats.shape)
iy_min, ix_min = getclosest_ij(latvals, lonvals, 50., -140)
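A caveat: getclosest_ij minimizes squared differences in degrees, a flat-grid approximation that can pick the wrong point near the poles or the dateline. If that matters, you could minimize great-circle (haversine) distance instead; the getclosest_ij_gc helper below is a hypothetical sketch, not part of the original workflow:
def getclosest_ij_gc(lats, lons, latpt, lonpt):
    # hypothetical alternative: nearest grid point by great-circle distance
    lats_r, lons_r = np.radians(lats), np.radians(lons)
    latpt_r, lonpt_r = np.radians(latpt), np.radians(lonpt)
    # haversine formula; the argmin of a is the argmin of the true distance
    a = (np.sin((lats_r - latpt_r)/2)**2 +
         np.cos(lats_r)*np.cos(latpt_r)*np.sin((lons_r - lonpt_r)/2)**2)
    return np.unravel_index(a.argmin(), lats.shape)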
Now we have all the information we need to find our answer:
|----------+--------|
| Variable | Index  |
|----------+--------|
| MT       | 0      |
| Depth    | 0      |
| Y        | iy_min |
| X        | ix_min |
|----------+--------|
sal = f.variables['salinity']
# Read values out of the netCDF file for temperature and salinity
print('%7.4f %s' % (temp[0,0,iy_min,ix_min], temp.units))
print('%7.4f %s' % (sal[0,0,iy_min,ix_min], sal.units))
The following example showcases a nice netCDF feature: remote GRIB2 data on a THREDDS server can be opened over OPeNDAP exactly as if it were a local netCDF file.
import datetime
date = datetime.datetime.now()
# keep moving back 6 hours until a valid URL is found
validURL = False; ncount = 0
while (not validURL and ncount < 10):
    # build URL for the latest synoptic analysis time
    # (rebuilt each pass, so each retry actually tries an earlier time)
    URL = 'http://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p5deg/GFS_Global_0p5deg_%04i%02i%02i_%02i%02i.grib2/GC' %\
        (date.year, date.month, date.day, 6*(date.hour//6), 0)
    print(URL)
    try:
        gfs = netCDF4.Dataset(URL)
        validURL = True
    except (RuntimeError, OSError):  # newer netCDF4 versions raise OSError on failure
        date -= datetime.timedelta(hours=6)
        ncount += 1
# Look at metadata for a specific variable
# gfs.variables.keys() will show all available variables.
sfctmp = gfs.variables['Temperature_surface']
# get info about sfctmp
print(sfctmp)
# print coord vars associated with this variable
for dname in sfctmp.dimensions:
    print(gfs.variables[dname])
When data == var.missing_value somewhere, a masked array is returned.
soilmvar = gfs.variables['Volumetric_Soil_Moisture_Content_depth_below_surface_layer']
# flip the data in latitude so North Hemisphere is up on the plot
soilm = soilmvar[0,0,::-1,:]
print('shape=%s, type=%s, missing_value=%s' % \
(soilm.shape, type(soilm), soilmvar.missing_value))
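As a quick check that masking actually happened, count the masked points with numpy's masked-array utilities (only numpy is assumed):
import numpy.ma as ma
# ocean points carry no soil moisture, so they come back masked
print('%d of %d points are masked' % (ma.count_masked(soilm), soilm.size))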
import matplotlib.pyplot as plt
%matplotlib inline
cs = plt.contourf(soilm)
There is a similar feature for variables with scale_factor and add_offset attributes: packed integer data is automatically unpacked to floating point when the variable is sliced.
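This file has no packed variable handy to demonstrate on, but the behavior can be toggled per variable; a minimal sketch, assuming v is a netCDF4 variable carrying scale_factor and add_offset:
# v = some netCDF4 variable with packed data (assumption for this sketch)
v.set_auto_scale(False)   # turn automatic unpacking off
raw = v[:]                # the integers as stored on disk
v.set_auto_scale(True)    # restore the default behavior
unpacked = v[:]           # floats: raw*scale_factor + add_offset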
Time variables usually measure time relative to a fixed date on a certain calendar, with units specified like hours since YY:MM:DD hh-mm-ss. The num2date and date2num convenience functions convert between these numeric time coordinates and handy python datetime instances; date2index finds the time index corresponding to a datetime instance.
from netCDF4 import num2date, date2num, date2index
timedim = sfctmp.dimensions[0] # time dim name
print('name of time dimension = %s' % timedim)
times = gfs.variables[timedim] # time coord var
print('units = %s, values = %s' % (times.units, times[:]))
dates = num2date(times[:], times.units)
print([date.strftime('%Y-%m-%d %H:%M:%S') for date in dates[:10]]) # print only first ten...
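date2num is the inverse operation; a quick round-trip sketch using the same units string (the values should match times[:]):
# convert the datetime objects back to numeric values in the variable's units
numeric_times = date2num(dates, times.units)
print(numeric_times[:3])
print(times[:3])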
Now get the index associated with a specified date, and extract the forecast data for that date.
from datetime import datetime, timedelta
date = datetime.now() + timedelta(days=3)
print(date)
ntime = date2index(date,times,select='nearest')
print('index = %s, date = %s' % (ntime, dates[ntime]))
To get the temperature forecast for Boulder (near 40N, -105W), use the getclosest_ij function we created before...
lats, lons = gfs.variables['lat'][:], gfs.variables['lon'][:]
# lats, lons are 1-d. Make them 2-d using numpy.meshgrid.
lons, lats = np.meshgrid(lons,lats)
j, i = getclosest_ij(lats,lons,40,-105)
fcst_temp = sfctmp[ntime,j,i]
print('Boulder forecast valid at %s UTC = %5.1f %s' % \
(dates[ntime],fcst_temp,sfctmp.units))
What if you have a bunch of netcdf files, each with data for a different year, and you want to access all the data as if it were in one file?
!ls -l data/prmsl*nc
MFDataset uses file globbing to patch together all the files into one big Dataset. You can also pass it a list of specific files.
Limitations: it can only aggregate along the leftmost dimension of each variable, and it only works with NETCDF3 or NETCDF4_CLASSIC formatted files.
mf = netCDF4.MFDataset('data/prmsl*nc')
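To pass an explicit list instead of a glob pattern, build the list yourself; a small sketch using Python's glob module (equivalent to the call above):
import glob
# same aggregation, from an explicit, sorted list of file names
files = sorted(glob.glob('data/prmsl*nc'))
mf = netCDF4.MFDataset(files)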
times = mf.variables['time']
dates = num2date(times[:],times.units)
print('starting date = %s' % dates[0])
print('ending date = %s'% dates[-1])
prmsl = mf.variables['prmsl']
print('times shape = %s' % times.shape)
print('prmsl dimensions = %s, prmsl shape = %s' %\
(prmsl.dimensions, prmsl.shape))
It's good practice to close netCDF files, but it's not actually necessary when a Dataset is open for read access only.
f.close()
gfs.close()
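Alternatively, recent versions of netcdf4-python let you use Dataset as a context manager, which closes the file for you; a minimal sketch:
with netCDF4.Dataset('data/rtofs_glo_3dz_f006_6hrly_reg3.nc') as nc:
    # the file is open inside the block ...
    print(nc.variables.keys())
# ... and closed automatically when the block exits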
Now you're ready to start exploring your data interactively.
To be continued with Writing netCDF data ....