Let's explore a netCDF file from the Atlantic Real-Time Ocean Forecast System.
First, import netcdf4-python and numpy:
import netCDF4
import numpy as np
f is a Dataset object, representing an open netCDF file. Printing the object produces summary output similar to ncdump -h.
f = netCDF4.Dataset('data/rtofs_glo_3dz_f006_6hrly_reg3.nc')
print(f)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    Conventions: CF-1.0
    title: HYCOM ATLb2.00
    institution: National Centers for Environmental Prediction
    source: HYCOM archive file
    experiment: 90.9
    history: archv2ncdf3z
    dimensions(sizes): MT(1), Y(850), X(712), Depth(10)
    variables(dimensions): float64 MT(MT), float64 Date(MT), float32 Depth(Depth), int32 Y(Y), int32 X(X), float32 Latitude(Y, X), float32 Longitude(Y, X), float32 u(MT, Depth, Y, X), float32 v(MT, Depth, Y, X), float32 temperature(MT, Depth, Y, X), float32 salinity(MT, Depth, Y, X)
    groups:
Variable objects are stored by name in the Dataset's variables dict. Printing a variable gives summary information, including all of its attributes; no actual data is read yet.
print(f.variables.keys()) # get all variable names
temp = f.variables['temperature']  # temperature variable
print(temp)
dict_keys(['MT', 'Date', 'Depth', 'Y', 'X', 'Latitude', 'Longitude', 'u', 'v', 'temperature', 'salinity'])
<class 'netCDF4._netCDF4.Variable'>
float32 temperature(MT, Depth, Y, X)
    coordinates: Longitude Latitude Date
    standard_name: sea_water_potential_temperature
    units: degC
    _FillValue: 1.2676506e+30
    valid_range: [-5.078603 11.1498995]
    long_name: temp [90.9H]
unlimited dimensions: MT
current shape = (1, 10, 850, 712)
filling on
Note that the MT dimension is special (unlimited), which means it can be appended to.
for d in f.dimensions.items():
    print(d)
('MT', <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'MT', size = 1)
('Y', <class 'netCDF4._netCDF4.Dimension'>: name = 'Y', size = 850)
('X', <class 'netCDF4._netCDF4.Dimension'>: name = 'X', size = 712)
('Depth', <class 'netCDF4._netCDF4.Dimension'>: name = 'Depth', size = 10)
Each variable has a dimensions and a shape attribute.
temp.dimensions
('MT', 'Depth', 'Y', 'X')
temp.shape
(1, 10, 850, 712)
mt = f.variables['MT']
depth = f.variables['Depth']
x,y = f.variables['X'], f.variables['Y']
print(mt)
print(x)
<class 'netCDF4._netCDF4.Variable'>
float64 MT(MT)
    long_name: time
    units: days since 1900-12-31 00:00:00
    calendar: standard
    axis: T
unlimited dimensions: MT
current shape = (1,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
int32 X(X)
    point_spacing: even
    axis: X
unlimited dimensions:
current shape = (712,)
filling on, default _FillValue of -2147483647 used
time = mt[:] # Reads the netCDF variable MT, array of one element
print(time)
[41023.25]
dpth = depth[:] # examine depth array
print(dpth)
[ 0. 100. 200. 400. 700. 1000. 2000. 3000. 4000. 5000.]
xx,yy = x[:],y[:]
print('shape of temp variable: %s' % repr(temp.shape))
tempslice = temp[0, dpth > 400, yy > yy.max()/2, xx > xx.max()/2]
print('shape of temp slice: %s' % repr(tempslice.shape))
shape of temp variable: (1, 10, 850, 712)
shape of temp slice: (6, 425, 356)
lat, lon = f.variables['Latitude'], f.variables['Longitude']
print(lat)
<class 'netCDF4._netCDF4.Variable'>
float32 Latitude(Y, X)
    standard_name: latitude
    units: degrees_north
unlimited dimensions:
current shape = (850, 712)
filling on, default _FillValue of 9.969209968386869e+36 used
Aha! So to get the value at 50N, 140W we need to find the array indices iy and ix such that Latitude[iy, ix] is close to 50.0 and Longitude[iy, ix] is close to -140.0 ...
# extract lat/lon values (in degrees) to numpy arrays
latvals = lat[:]; lonvals = lon[:]
# a function to find the index of the point closest (in squared distance)
# to a given lat/lon value
def getclosest_ij(lats, lons, latpt, lonpt):
    # find squared distance of every point on grid
    dist_sq = (lats - latpt)**2 + (lons - lonpt)**2
    # 1D index of minimum dist_sq element
    minindex_flattened = dist_sq.argmin()
    # Get 2D index for latvals and lonvals arrays from 1D index
    return np.unravel_index(minindex_flattened, lats.shape)
iy_min, ix_min = getclosest_ij(latvals, lonvals, 50., -140)
Now we have all the information we need to read the value at that point:

|----------+--------|
| Variable | Index  |
|----------+--------|
| MT       | 0      |
| Depth    | 0      |
| Y        | iy_min |
| X        | ix_min |
|----------+--------|
sal = f.variables['salinity']
# Read values out of the netCDF file for temperature and salinity
print('%7.4f %s' % (temp[0,0,iy_min,ix_min], temp.units))
print('%7.4f %s' % (sal[0,0,iy_min,ix_min], sal.units))
6.4631 degC
32.6572 psu
The next example opens a remote dataset (a GFS forecast served via OPeNDAP from a THREDDS Data Server) and showcases some nice netCDF features:
import datetime
date = datetime.datetime.now()
# keep moving back 6 hours until a valid URL is found
validURL = False; ncount = 0
while (not validURL and ncount < 10):
    # build URL for the latest synoptic analysis time
    URL = 'https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p5deg/GFS_Global_0p5deg_%04i%02i%02i_%02i%02i.grib2/GC' %\
        (date.year, date.month, date.day, 6*(date.hour//6), 0)
    print(URL)
    try:
        gfs = netCDF4.Dataset(URL)
        validURL = True
    except (RuntimeError, OSError):
        date -= datetime.timedelta(hours=6)
        ncount += 1
https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p5deg/GFS_Global_0p5deg_20230525_1200.grib2/GC
# Look at metadata for a specific variable
# gfs.variables.keys() will show all available variables.
sfctmp = gfs.variables['Temperature_surface']
# get info about sfctmp
print(sfctmp)
# print coord vars associated with this variable
for dname in sfctmp.dimensions:
    print(gfs.variables[dname])
<class 'netCDF4._netCDF4.Variable'>
float32 Temperature_surface(time1, lat, lon)
    long_name: Temperature @ Ground or water surface
    units: K
    abbreviation: TMP
    missing_value: nan
    grid_mapping: LatLon_Projection
    coordinates: reftime time1 lat lon
    Grib_Variable_Id: VAR_0-0-0_L1
    Grib2_Parameter: [0 0 0]
    Grib2_Parameter_Discipline: Meteorological products
    Grib2_Parameter_Category: Temperature
    Grib2_Parameter_Name: Temperature
    Grib2_Level_Type: 1
    Grib2_Level_Desc: Ground or water surface
    Grib2_Generating_Process_Type: Forecast
    Grib2_Statistical_Process_Type: UnknownStatType--1
unlimited dimensions:
current shape = (129, 361, 720)
filling off
<class 'netCDF4._netCDF4.Variable'>
float64 time1(time1)
    units: Hour since 2023-05-25T12:00:00Z
    standard_name: time
    long_name: GRIB forecast or observation time
    calendar: proleptic_gregorian
    _CoordinateAxisType: Time
unlimited dimensions:
current shape = (129,)
filling off
<class 'netCDF4._netCDF4.Variable'>
float32 lat(lat)
    units: degrees_north
    _CoordinateAxisType: Lat
unlimited dimensions:
current shape = (361,)
filling off
<class 'netCDF4._netCDF4.Variable'>
float32 lon(lon)
    units: degrees_east
    _CoordinateAxisType: Lon
unlimited dimensions:
current shape = (720,)
filling off
When data == var.missing_value somewhere in the requested slice, a masked array is returned. We illustrate this with the soil moisture variable, which is only defined over land.
soilmvar = gfs.variables['Volumetric_Soil_Moisture_Content_depth_below_surface_layer']
# flip the data in latitude so North Hemisphere is up on the plot
soilm = soilmvar[0,0,::-1,:]
print('shape=%s, type=%s, missing_value=%s' % \
(soilm.shape, type(soilm), soilmvar.missing_value))
import matplotlib.pyplot as plt
%matplotlib inline
cs = plt.contourf(soilm)
shape=(361, 720), type=<class 'numpy.ma.core.MaskedArray'>, missing_value=nan
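Because the returned object is a numpy masked array, the masked (ocean) points are simply skipped by reductions such as mean(); and if you would rather get a plain array containing the raw fill values, automatic masking can be switched off per variable with set_auto_mask. A small sketch reusing soilm and soilmvar from above:

# masked (ocean) points are ignored by masked-array reductions
print('mean soil moisture over land = %.3f' % soilm.mean())
print('number of masked points = %d' % soilm.mask.sum())

# ask for plain ndarrays (raw fill values included) instead of masked arrays
soilmvar.set_auto_mask(False)
raw = soilmvar[0, 0, ::-1, :]
print(type(raw))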
There is a similar feature for variables with scale_factor and add_offset attributes: packed integer data is automatically unpacked into floats, with the scale and offset applied, when the variable is read.
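As a minimal, self-contained sketch of that behaviour, the snippet below creates a tiny hypothetical in-memory variable (packed_temp, with made-up scale_factor and add_offset values) and reads it back with unpacking switched on and off via set_auto_scale:

import numpy as np
import netCDF4

# hypothetical in-memory file, just to have something packed to read back
ds = netCDF4.Dataset('packed_demo.nc', 'w', diskless=True)
ds.createDimension('x', 3)
pv = ds.createVariable('packed_temp', 'i2', ('x',))
pv.scale_factor = 0.01     # unpacked = packed * scale_factor + add_offset
pv.add_offset = 273.15
pv.set_auto_scale(False)   # store the raw packed integers as-is
pv[:] = np.array([150, 275, -420], dtype='i2')

pv.set_auto_scale(True)    # default behaviour on read: unpack to floats
print(pv[:])               # approx. [274.65 275.9  268.95]

pv.set_auto_scale(False)   # disable unpacking to see the raw integers
print(pv[:])               # [ 150  275 -420]
ds.close()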
Time variables usually contain values like hours since YY:MM:DD hh:mm:ss. The num2date and date2num convenience functions convert between these numeric time coordinates and handy Python datetime instances, and date2index finds the time index corresponding to a datetime instance.
from netCDF4 import num2date, date2num, date2index
timedim = sfctmp.dimensions[0] # time dim name
print('name of time dimension = %s' % timedim)
times = gfs.variables[timedim] # time coord var
print('units = %s, values = %s' % (times.units, times[:]))
name of time dimension = time1
units = Hour since 2023-05-25T12:00:00Z, values = [ 0. 3. 6. 9. 12. 15. 18. 21. 24. 27. 30. 33. 36. 39. 42. 45. 48. 51. 54. 57. 60. 63. 66. 69. 72. 75. 78. 81. 84. 87. 90. 93. 96. 99. 102. 105. 108. 111. 114. 117. 120. 123. 126. 129. 132. 135. 138. 141. 144. 147. 150. 153. 156. 159. 162. 165. 168. 171. 174. 177. 180. 183. 186. 189. 192. 195. 198. 201. 204. 207. 210. 213. 216. 219. 222. 225. 228. 231. 234. 237. 240. 243. 246. 249. 252. 255. 258. 261. 264. 267. 270. 273. 276. 279. 282. 285. 288. 291. 294. 297. 300. 303. 306. 309. 312. 315. 318. 321. 324. 327. 330. 333. 336. 339. 342. 345. 348. 351. 354. 357. 360. 363. 366. 369. 372. 375. 378. 381. 384.]
dates = num2date(times[:], times.units)
print([date.strftime('%Y-%m-%d %H:%M:%S') for date in dates[:10]]) # print only first ten...
['2023-05-25 12:00:00', '2023-05-25 15:00:00', '2023-05-25 18:00:00', '2023-05-25 21:00:00', '2023-05-26 00:00:00', '2023-05-26 03:00:00', '2023-05-26 06:00:00', '2023-05-26 09:00:00', '2023-05-26 12:00:00', '2023-05-26 15:00:00']
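date2num is the inverse operation: it converts datetime instances back into numeric values in the specified units. A quick round-trip sketch, reusing times and dates from above:

# convert the first ten datetimes back to numeric time values;
# this should reproduce the first ten entries of times[:]
numeric_times = date2num(dates[:10], times.units)
print(numeric_times)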
from datetime import datetime, timedelta
date = datetime.now() + timedelta(days=3)
print(date)
ntime = date2index(date,times,select='nearest')
print('index = %s, date = %s' % (ntime, dates[ntime]))
2023-05-28 15:57:27.760935
index = 25, date = 2023-05-28 15:00:00
Now get the forecast for Boulder (near 40N, 105W), using the getclosest_ij function we created before...
lats, lons = gfs.variables['lat'][:], gfs.variables['lon'][:]
# lats, lons are 1-d. Make them 2-d using numpy.meshgrid.
lons, lats = np.meshgrid(lons,lats)
j, i = getclosest_ij(lats,lons,40,-105)
fcst_temp = sfctmp[ntime,j,i]
print('Boulder forecast valid at %s UTC = %5.1f %s' % \
(dates[ntime],fcst_temp,sfctmp.units))
Boulder forecast valid at 2023-05-28 15:00:00 UTC = 297.6 K
What if you have a bunch of netcdf files, each with data for a different year, and you want to access all the data as if it were in one file?
!ls -ldgG data/prmsl*nc
-rw-rw-r-- 1 8985332 May 17 15:27 data/prmsl.2000.nc
-rw-rw-r-- 1 8968789 May 17 15:27 data/prmsl.2001.nc
-rw-rw-r-- 1 8972796 May 17 15:27 data/prmsl.2002.nc
-rw-rw-r-- 1 8974435 May 17 15:27 data/prmsl.2003.nc
-rw-rw-r-- 1 8997438 May 17 15:27 data/prmsl.2004.nc
-rw-rw-r-- 1 8976678 May 17 15:27 data/prmsl.2005.nc
-rw-rw-r-- 1 8969714 May 17 15:27 data/prmsl.2006.nc
-rw-rw-r-- 1 8974360 May 17 15:27 data/prmsl.2007.nc
-rw-rw-r-- 1 8994260 May 17 15:27 data/prmsl.2008.nc
-rw-rw-r-- 1 8974678 May 17 15:27 data/prmsl.2009.nc
-rw-rw-r-- 1 8970732 May 17 15:27 data/prmsl.2010.nc
-rw-rw-r-- 1 8976285 May 17 15:27 data/prmsl.2011.nc
MFDataset uses file globbing to patch together all the files into one big Dataset.
You can also pass it an explicit list of files, as sketched below.
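For instance, a short sketch aggregating just two of the yearly files listed above by passing a list of filenames instead of a glob pattern:

# aggregate two specific years by passing an explicit list of files
mf2 = netCDF4.MFDataset(['data/prmsl.2000.nc', 'data/prmsl.2001.nc'])
print(mf2.variables['prmsl'].shape)   # time dimension now spans both years
mf2.close()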
Limitations: it only works with NETCDF3- or NETCDF4_CLASSIC-formatted files.
mf = netCDF4.MFDataset('data/prmsl*nc')
times = mf.variables['time']
dates = num2date(times[:],times.units)
print('starting date = %s' % dates[0])
print('ending date = %s'% dates[-1])
prmsl = mf.variables['prmsl']
print('times shape = %s' % times.shape)
print('prmsl dimensions = %s, prmsl shape = %s' %\
(prmsl.dimensions, prmsl.shape))
starting date = 2000-01-01 00:00:00
ending date = 2011-12-31 00:00:00
times shape = 4383
prmsl dimensions = ('time', 'lat', 'lon'), prmsl shape = (4383, 91, 180)
It's good to close netCDF files, although it is not strictly necessary when the Dataset has been opened for read access only.
f.close()
gfs.close()
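Dataset also supports the context-manager protocol, which closes the file automatically; for example, re-opening the regional ocean file from the beginning of this section:

# the file is closed automatically when the with-block exits
with netCDF4.Dataset('data/rtofs_glo_3dz_f006_6hrly_reg3.nc') as ds:
    print(ds.variables['temperature'].shape)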
Now you're ready to start exploring your data interactively.
To be continued with Writing netCDF data ....