Figuring out how to transform a day's ONC CTD data from a SoG node into a netCDF file that is part of an ERDDAP dataset:

- include only qaqcFlag == 1 samples
- store the qaqcFlag arrays as variable attributes
- calculate mean, standard deviation, and count for each variable in each time bin
- generate a /opt/tomcat/content/erddap/datasets.xml fragment

from collections import OrderedDict
import os
import arrow
from lxml import etree
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from salishsea_tools import data_tools
from salishsea_tools.places import PLACES
%matplotlib inline
scalardata Web Service

Access to the ONC web services requires a user token which you can generate on the Web Services API tab of your ONC account profile page. I have stored mine in an environment variable so as not to publish it to the world in this notebook.
TOKEN = os.environ['ONC_USER_TOKEN']
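For this to work, the ONC_USER_TOKEN environment variable must be set in the shell before the notebook server is launched; a minimal sketch, assuming a bash-like shell (the token value is a placeholder):

$ export ONC_USER_TOKEN=<your token>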
Request a day's worth of CTD salinity and temperature data and parse them into an xarray.Dataset:
onc_data = data_tools.get_onc_data(
'scalardata', 'getByStation', TOKEN,
station='SCVIP', deviceCategory='CTD',
sensors='salinity,temperature',
dateFrom=data_tools.onc_datetime('2015-12-27 00:00', 'utc'),
)
ctd_data = data_tools.onc_json_to_dataset(onc_data)
ctd_data
<xarray.Dataset> Dimensions: (sampleTime: 86398) Coordinates: * sampleTime (sampleTime) datetime64[ns] 2015-12-27T00:00:00.259000 ... Data variables: salinity (sampleTime) float64 31.23 31.23 31.23 31.23 31.23 31.23 ... temperature (sampleTime) float64 9.959 9.959 9.959 9.959 9.959 9.959 ... Attributes: dateFrom: 2015-12-27T00:00:00.000Z deviceCategory: CTD dateTo: None rowLimit: None outputFormat: None sensors: salinity,temperature nextDateFrom: 2015-12-28T00:00:00.090Z totalActualSamples: 172796 station: SCVIP
ctd_data.data_vars['salinity']
<xarray.DataArray 'salinity' (sampleTime: 86398)> array([ 31.22826354, 31.22836401, 31.22826354, ..., 28.26344878, 28.26375019, 28.26415208]) Coordinates: * sampleTime (sampleTime) datetime64[ns] 2015-12-27T00:00:00.259000 ... Attributes: unitOfMeasure: g/kg qaqcFlag: [1 1 1 ..., 4 4 4] sensorName: Reference Salinity actualSamples: 86398
qaqcFlag Values

Filter the salinity data to exclude samples for which qaqcFlag != 1:
ctd_data.salinity
<xarray.DataArray 'salinity' (sampleTime: 86398)> array([ 31.22826354, 31.22836401, 31.22826354, ..., 28.26344878, 28.26375019, 28.26415208]) Coordinates: * sampleTime (sampleTime) datetime64[ns] 2015-12-27T00:00:00.259000 ... Attributes: unitOfMeasure: g/kg qaqcFlag: [1 1 1 ..., 4 4 4] sensorName: Reference Salinity actualSamples: 86398
salinity_qaqc_mask = ctd_data.salinity.attrs['qaqcFlag'] == 1
salinity = xr.DataArray(
name='salinity',
data=ctd_data.salinity[salinity_qaqc_mask].values,
coords={'time': ctd_data.salinity.sampleTime[salinity_qaqc_mask].values},
)
salinity
<xarray.DataArray 'salinity' (time: 54393)> array([ 31.22826354, 31.22836401, 31.22826354, ..., 31.22977061, 31.22967014, 31.22977061]) Coordinates: * time (time) datetime64[ns] 2015-12-27T00:00:00.259000 ...
Filter the temperature data to exclude samples for which qaqcFlag != 1:
ctd_data.temperature
<xarray.DataArray 'temperature' (sampleTime: 86398)> array([ 9.959 , 9.959 , 9.9591, ..., 9.9531, 9.9533, 9.9531]) Coordinates: * sampleTime (sampleTime) datetime64[ns] 2015-12-27T00:00:00.259000 ... Attributes: unitOfMeasure: C qaqcFlag: [1 1 1 ..., 1 1 1] sensorName: Temperature actualSamples: 86398
temperature_qaqc_mask = ctd_data.temperature.attrs['qaqcFlag'] == 1
temperature = xr.DataArray(
name='temperature',
data=ctd_data.temperature[temperature_qaqc_mask].values,
coords={'time': ctd_data.temperature.sampleTime[temperature_qaqc_mask].values},
)
temperature
<xarray.DataArray 'temperature' (time: 86395)> array([ 9.959 , 9.959 , 9.9591, ..., 9.9531, 9.9533, 9.9531]) Coordinates: * time (time) datetime64[ns] 2015-12-27T00:00:00.259000 ...
Station-specific metadata for the dataset:
xr_metadata = {
'SCVIP': {
'place_name': 'Central node',
'ONC_station': 'Central',
'ONC_stationCode': PLACES['Central node']['ONC stationCode'],
'ONC_stationDescription':
'Pacific, Salish Sea, Strait of Georgia, Central, Strait of Georgia VENUS Instrument Platform',
'ONC_data_product_url': 'http://dmas.uvic.ca/DataSearch?location=SCVIP&deviceCategory=CTD',
},
'SEVIP': {
'place_name': 'East node',
'ONC_station': 'East',
'ONC_stationCode': PLACES['East node']['ONC stationCode'],
'ONC_stationDescription':
'Pacific, Salish Sea, Strait of Georgia, East, Strait of Georgia VENUS Instrument Platform',
'ONC_data_product_url': 'http://dmas.uvic.ca/DataSearch?location=SEVIP&deviceCategory=CTD',
},
}
Define an aggregation function to count the samples in each resampling interval:
def count(values, axis):
    """Return the number of samples in a resampling interval."""
    return values.size
Create a dataset of resampled data and their statistics:
onc_station = 'SCVIP'
ds = xr.Dataset(
data_vars={
'salinity': xr.DataArray(
name='salinity',
data=salinity.resample('15Min', 'time', how='mean'),
attrs={
'ioos_category': 'Salinity',
'standard_name': 'sea_water_reference_salinity',
'long_name': 'reference salinity',
'units': 'g/kg',
'aggregation_operation': 'mean',
'aggregation_interval': 15*60,
'aggregation_interval_units': 'seconds',
},
),
'salinity_std_dev': xr.DataArray(
name='salinity_std_dev',
data=salinity.resample('15Min', 'time', how='std'),
attrs={
'ioos_category': 'Salinity',
                'standard_name': 'sea_water_reference_salinity_standard_deviation',
'long_name': 'reference salinity standard deviation',
'units': 'g/kg',
'aggregation_operation': 'standard deviation',
'aggregation_interval': 15*60,
'aggregation_interval_units': 'seconds',
},
),
'salinity_sample_count': xr.DataArray(
name='salinity_sample_count',
data=salinity.resample('15Min', 'time', how=count),
attrs={
'standard_name': 'sea_water_reference_salinity_sample_count',
'long_name': 'reference salinity sample count',
'aggregation_operation': 'count',
'aggregation_interval': 15*60,
'aggregation_interval_units': 'seconds',
},
),
'temperature': xr.DataArray(
name='temperature',
data=temperature.resample('15Min', 'time', how='mean'),
attrs={
'ioos_category': 'Temperature',
'standard_name': 'sea_water_temperature',
'long_name': 'temperature',
                'units': 'degrees_Celsius',
'aggregation_operation': 'mean',
'aggregation_interval': 15*60,
'aggregation_interval_units': 'seconds',
},
),
'temperature_std_dev': xr.DataArray(
name='temperature_std_dev',
data=temperature.resample('15Min', 'time', how='std'),
attrs={
'ioos_category': 'Temperature',
'standard_name': 'sea_water_temperature_standard_deviation',
'long_name': 'temperature standard deviation',
                'units': 'degrees_Celsius',
'aggregation_operation': 'standard deviation',
'aggregation_interval': 15*60,
'aggregation_interval_units': 'seconds',
},
),
'temperature_sample_count': xr.DataArray(
name='temperature_sample_count',
data=temperature.resample('15Min', 'time', how=count),
attrs={
'standard_name': 'sea_water_temperature_sample_count',
'long_name': 'temperature sample count',
'aggregation_operation': 'count',
'aggregation_interval': 15*60,
'aggregation_interval_units': 'seconds',
},
),
},
coords={
'depth': PLACES[xr_metadata[onc_station]['place_name']]['depth'],
'longitude': PLACES[xr_metadata[onc_station]['place_name']]['lon lat'][0],
'latitude': PLACES[xr_metadata[onc_station]['place_name']]['lon lat'][1],
},
attrs={
'history': """
{0} Download raw data from ONC scalardata API.
{0} Filter to exclude data with qaqcFlag != 1.
{0} Resample data to 15 minute intervals using mean, standard deviation and count as aggregation functions.
{0} Store as netCDF4 file.
""".format(arrow.now().format('YYYY-MM-DD HH:mm:ss')),
'ONC_station': xr_metadata[onc_station]['ONC_station'],
'ONC_stationCode': PLACES[xr_metadata[onc_station]['place_name']]['ONC stationCode'],
'ONC_stationDescription': xr_metadata[onc_station]['ONC_stationDescription'],
'ONC_data_product_url': xr_metadata[onc_station]['ONC_data_product_url'],
},
)
If any of the DataArrays are short compared to the others the missing values are filled with NaNs. That makes sense for temperature and salinity values, and their standard deviations, but not for their sample counts. So, we change NaNs to zeros in the sample count DataArrays:
ds.salinity_sample_count.values = np.nan_to_num(ds.salinity_sample_count.values)
ds.temperature_sample_count.values = np.nan_to_num(ds.temperature_sample_count.values)
ds
<xarray.Dataset> Dimensions: (time: 96) Coordinates: * time (time) datetime64[ns] 2015-12-27 ... longitude float64 -123.4 depth int64 294 latitude float64 49.04 Data variables: salinity (time) float64 31.23 31.23 31.23 31.23 31.23 ... salinity_std_dev (time) float64 0.0001472 0.0001813 0.0001711 ... temperature_sample_count (time) int64 900 900 900 900 900 900 900 900 ... temperature_std_dev (time) float64 0.0001285 0.0001598 0.000142 ... salinity_sample_count (time) float64 900.0 899.0 900.0 900.0 900.0 ... temperature (time) float64 9.959 9.959 9.959 9.959 9.958 ... Attributes: history: 2016-09-19 13:07:14 Download raw data from ONC scalardata API. 2016-09-19 13:07:14 Filter to exclude data with qaqcFlag != 1. 2016-09-19 13:07:14 Resample data to 15 minute intervals using mean, standard deviation and count as aggregation functions. 2016-09-19 13:07:14 Store as netCDF4 file. ONC_stationDescription: Pacific, Salish Sea, Strait of Georgia, Central, Strait of Georgia VENUS Instrument Platform ONC_stationCode: SCVIP ONC_station: Central ONC_data_product_url: http://dmas.uvic.ca/DataSearch?location=SCVIP&deviceCategory=CTD
print(ds.salinity_sample_count)
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(18, 6))
salinity.plot(ax=ax1)
ax1.set_title('Raw Data')
ds.salinity.plot(ax=ax2)
ax2.set_title('15 min Averaged')
ds.salinity_std_dev.plot(ax=ax3)
ax3.set_title('15 min Std Dev')
<xarray.DataArray 'salinity_sample_count' (time: 96)> array([ 900., 899., 900., 900., 900., 899., 899., 900., 900., 900., 899., 897., 900., 896., 899., 900., 899., 899., 900., 900., 899., 900., 899., 899., 900., 899., 899., 899., 899., 897., 900., 899., 900., 900., 900., 899., 899., 899., 899., 896., 895., 899., 899., 900., 898., 898., 900., 899., 899., 900., 900., 899., 900., 899., 900., 899., 898., 898., 900., 900., 446., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]) Coordinates: * time (time) datetime64[ns] 2015-12-27 2015-12-27T00:15:00 ... longitude float64 -123.4 depth int64 294 latitude float64 49.04 Attributes: standard_name: sea_water_reference_salinity_sample_count aggregation_operation: count aggregation_interval: 900 aggregation_interval_units: seconds long_name: reference salinity sample count
<matplotlib.text.Text at 0x7fab0b9b0e10>
print(ds.temperature_sample_count)
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(18, 6))
temperature.plot(ax=ax1)
ax1.set_title('Raw Data')
ds.temperature.plot(ax=ax2)
ax2.set_title('15 min Averaged')
ds.temperature_std_dev.plot(ax=ax3)
ax3.set_title('15 min Std Dev')
<xarray.DataArray 'temperature_sample_count' (time: 96)> array([900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 899, 901, 899, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 897, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 900, 899, 900, 900]) Coordinates: * time (time) datetime64[ns] 2015-12-27 2015-12-27T00:15:00 ... longitude float64 -123.4 depth int64 294 latitude float64 49.04 Attributes: standard_name: sea_water_temperature_sample_count aggregation_operation: count aggregation_interval: 900 aggregation_interval_units: seconds long_name: temperature sample count
<matplotlib.text.Text at 0x7fab0b8d7be0>
ERDDAP requires that all files in a dataset have the same units for their time variable. On the other hand, xarray defaults to using the first time value in the dataset as the time-base for the units. So, we have to explicitly define the time units as an encoding when the dataset is stored as a netCDF4 file.
ds.to_netcdf(
'/results/observations/ONC/CTD/{station}/{station}_CTD_15m_20151227.nc'
.format(station=onc_station),
encoding={'time': {'units': 'minutes since 1970-01-01 00:00'}})
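To confirm that the encoding took effect, the file can be re-opened with time decoding disabled so that the stored units string is visible. This check is an illustration added here, not part of the original workflow:

check = xr.open_dataset(
    '/results/observations/ONC/CTD/SCVIP/SCVIP_CTD_15m_20151227.nc',
    decode_times=False)
# With decode_times=False the raw units string stays in the time attrs
print(check.time.attrs['units'])  # expect: minutes since 1970-01-01 00:00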
Use the /opt/tomcat/webapps/erddap/WEB-INF/GenerateDatasetsXml.sh script to generate the initial version of an XML fragment for a dataset:
$ cd /opt/tomcat/webapps/erddap/WEB-INF/
$ bash GenerateDatasetsXml.sh EDDTableFromNcFiles /results/observations/ONC/CTD/SCVIP/
The EDDTableFromNcFiles and /results/observations/ONC/CTD/SCVIP/ arguments tell the script which EDDType and what parent directory to use, avoiding having to type those in answer to prompts. Answer the remaining prompts, for example:
File name regex (e.g., ".*\.nc") (default="")
? .*SCVIP_CTD_15m_\d{8}\.nc$
A sample full file name (default="")
? /results/observations/ONC/CTD/SCVIP/SCVIP_CTD_15m_20160724.nc
DimensionsCSV (or "" for default) (default="")
?
ReloadEveryNMinutes (e.g., 10080) (default="")
? 10080
PreExtractRegex (default="")
?
PostExtractRegex (default="")
?
ExtractRegex (default="")
?
Column name for extract (default="")
?
Sorted column source name (default="")
?
Sort files by sourceName (default="")
?
infoUrl (default="")
? https://salishsea-meopar-tools.readthedocs.org/en/latest/results_server/
institution (default="")
? UBC EOAS
summary (default="")
?
title (default="")
? ONC, Strait of Georgia, Central Node, Salinity and Temperature, 15min, v1
The output is written to /results/erddap/logs/GenerateDatasetsXml.out.
The metadata dictionary below contains information for dataset attribute tags whose values need to be changed, or that need to be added for all datasets. The keys are the dataset attribute names. The values are dicts containing a required text item and perhaps an optional after item. The value associated with the text key is the text content for the attribute tag. When present, the value associated with the after key is the name of the dataset attribute after which a new attribute tag containing the text value is to be inserted.
metadata = OrderedDict([
('cdm_data_type', {'text': 'TimeSeries'}),
('cdm_timeseries_variables', {
'text': 'depth, longitude, latitude',
'after': 'cdm_data_type',
}),
('institution_fullname', {
'text': 'Earth, Ocean & Atmospheric Sciences, University of British Columbia',
'after': 'institution',
}),
('license', {
'text': '''The Salish Sea MEOPAR observation datasets are copyright 2013 – present
by the Salish Sea MEOPAR Project Contributors, The University of British Columbia, and Ocean Networks Canada.
They are licensed under the Apache License, Version 2.0. http://www.apache.org/licenses/LICENSE-2.0
Raw instrument data on which this dataset is based were provided by Ocean Networks Canada.''',
}),
('project', {
'text':'Salish Sea MEOPAR NEMO Model',
'after': 'title',
}),
('creator_name', {
'text': 'Salish Sea MEOPAR Project Contributors',
}),
('creator_email', {
'text': 'sallen@eos.ubc.ca',
'after': 'creator_name',
}),
('creator_url', {
'text': 'https://salishsea-meopar-docs.readthedocs.org/',
}),
('acknowledgement', {
'text': 'MEOPAR, ONC, Compute Canada',
'after': 'creator_url',
}),
('drawLandMask', {
'text': 'over',
'after': 'acknowledgement',
}),
])
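For example, the cdm_timeseries_variables entry above has an after item, so the code later in this notebook inserts a new tag immediately after the cdm_data_type one:

<att name="cdm_data_type">TimeSeries</att>
<att name="cdm_timeseries_variables">depth, longitude, latitude</att>

Entries without an after item, like license, instead replace the text of an att tag that GenerateDatasetsXml.sh already produced.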
The datasets dictionary below provides the content for the dataset title and summary attributes. The title attribute content appears in the datasets list table (among other places). The summary attribute content appears (among other places) when a user hovers the cursor over the ? icon beside the title content in the datasets list table. The text that is inserted into the summary attribute tag by code later in this notebook is the title content followed by the summary content, separated by a blank line.
The keys of the datasets dict are the datasetID strings that are used in many places by the ERDDAP server. They are structured as follows:

- ubc to indicate that the dataset was produced at UBC
- ONC to indicate that the dataset is a product of filtering, resampling, etc. of raw instrument data provided by Ocean Networks Canada (ONC)
- SCVIP the ONC station code
- CTD the device category
- 15m the resampling interval
- V1 the dataset version

So: ubcONCSCVIPCTD15mV1 is the version 1 dataset of 15 minute resampled CTD temperature and salinity data from the ONC Strait of Georgia Central node VENUS instrument platform.
The dataset version part of the datasetID is used to indicate changes in the variables contained in the dataset. All datasets start at V1 and their summary ends with a notation about the variables that they contain; e.g.

v1: reference salinity, reference salinity standard deviation, reference salinity sample counts,
temperature, temperature standard deviation, temperature sample counts variables

When a dataset version is incremented a line describing the change is added to the end of its summary.
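A minimal sketch of the naming scheme; the make_dataset_id helper is hypothetical, purely to illustrate how the parts compose:

def make_dataset_id(station_code, device_category, interval, version):
    """Hypothetical helper illustrating the datasetID naming scheme."""
    return 'ubcONC{}{}{}V{}'.format(
        station_code, device_category, interval, version)

make_dataset_id('SCVIP', 'CTD', '15m', 1)  # 'ubcONCSCVIPCTD15mV1'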
datasets = {
'ubcONCSCVIPCTD15mV1' :{
'type': 'resampled CTD',
'title': 'ONC, Strait of Georgia, Central Node, Salinity and Temperature, 15min, v1',
'summary':'''Temperature and salinity data from the Ocean Networks Canada (ONC)
Strait of Georgia Central Node VENUS Instrument Platform CTD.
The data are resampled from the raw instrument data to 15 minute mean values.
They are accompanied by standard deviations and sample counts for each of the 15 minute
aggregation intervals.
v1: reference salinity, reference salinity standard deviation, reference salinity sample counts,
temperature, temperature standard deviation, temperature sample counts variables''',
'keywords': '''15min aggregation, ONC Central Node VENUS Instrument Platform, Ocean Networks Canada,
depth, UBC EOAS, Strait of Georgia, latitude, longitude, ocean, SCVIP, observations, CTD,
Oceans > Ocean Temperature > Water Temperature,
reference salinity, salinity_sample_count, salinity_std_dev, sea_water_reference_salinity,
sea_water_reference_salinity_sample_count, sea_water_reference_salinity_standard_deviation,
sea_water_temperature, sea_water_temperature_sample_count, sea_water_temperature_standard_deviation,
temperature, temperature_sample_count, temperature_std_dev, time''',
'fileNameRegex': '.*SCVIP_CTD_15m_\d{8}\.nc$'
},
'ubcONCSEVIPCTD15mV1' :{
'type': 'resampled CTD',
'title': 'ONC, Strait of Georgia, East Node, Salinity and Temperature, 15min, v1',
'summary':'''Temperature and salinity data from the Ocean Networks Canada (ONC)
Strait of Georgia East Node VENUS Instrument Platform CTD.
The data are resampled from the raw instrument data to 15 minute mean values.
They are accompanied by standard deviations and sample counts for each of the 15 minute
aggregation intervals.
v1: reference salinity, reference salinity standard deviation, reference salinity sample counts,
temperature, temperature standard deviation, temperature sample counts variables''',
'keywords': '''15min aggregation, ONC East Node VENUS Instrument Platform, Ocean Networks Canada,
depth, UBC EOAS, Strait of Georgia, latitude, longitude, ocean, SEVIP, observations, CTD,
Oceans > Ocean Temperature > Water Temperature,
reference salinity, salinity_sample_count, salinity_std_dev, sea_water_reference_salinity,
sea_water_reference_salinity_sample_count, sea_water_reference_salinity_standard_deviation,
sea_water_temperature, sea_water_temperature_sample_count, sea_water_temperature_standard_deviation,
temperature, temperature_sample_count, temperature_std_dev, time''',
'fileNameRegex': '.*SEVIP_CTD_15m_\d{8}\.nc$'
},
}
A few convenience functions to reduce code repetition:
def print_tree(root):
"""Display an XML tree fragment with indentation.
"""
print(etree.tostring(root, pretty_print=True).decode('ascii'))
def find_att(root, att):
"""Return the dataset attribute element named att
or raise a ValueError exception if it cannot be found.
"""
e = root.find('.//att[@name="{}"]'.format(att))
if e is None:
raise ValueError('{} attribute element not found'.format(att))
return e
The code below:

- parses the output of GenerateDatasetsXml.sh into an XML tree data structure
- sets the datasetID dataset attribute value
- sets the recursive dataset attribute value to false
- sets the fileNameRegex dataset attribute value because it loses its \ characters during parsing(?)
- adds a cf_role attribute element with value timeseries_id to the time variable
- applies the metadata dict, and the title, summary, and keywords values from the datasets dict defined above
- sets colour map limits for the salinity, temperature, and depth variables, and deletes the colorBar* attributes from the variables for which they are nonsensical
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('/results/erddap/logs/GenerateDatasetsXml.out', parser)
root = tree.getroot()
datasetID = 'ubcONCSEVIPCTD15mV1'
root.attrib['datasetID'] = datasetID
root.find('.//recursive').text = 'false'
root.find('.//fileNameRegex').text = datasets[datasetID]['fileNameRegex']
vars = [e.text for e in tree.findall('//sourceName')]
vars
['time', 'salinity', 'temperature', 'temperature_std_dev', 'salinity_std_dev', 'salinity_sample_count', 'temperature_sample_count', 'latitude', 'longitude', 'depth']
e = etree.Element('att', name='cf_role')
e.text = 'timeseries_id'
tree.find('//dataVariable[{}]/addAttributes'.format(vars.index('time')+1)).append(e)
for att, info in metadata.items():
    e = etree.Element('att', name=att)
    e.text = info['text']
    try:
        # Entries with an 'after' item are inserted as new att tags
        # after the named attribute tag
        root.find('.//att[@name="{}"]'.format(info['after'])).addnext(e)
    except KeyError:
        # Entries without an 'after' item replace the text of att tags
        # that GenerateDatasetsXml.sh already produced
        find_att(root, att).text = info['text']
title = datasets[datasetID]['title']
find_att(root, 'title').text = title
find_att(root, 'summary').text = '{0}\n\n{1}'.format(title, datasets[datasetID]['summary'])
find_att(root, 'keywords').text = datasets[datasetID]['keywords']
# Salinity colour map limits
e = tree.find(
'//dataVariable[{}]/addAttributes/att[@name="colorBarMinimum"]'
.format(vars.index('salinity')+1))
e.text = '0.0'
e = tree.find(
'//dataVariable[{}]/addAttributes/att[@name="colorBarMaximum"]'
.format(vars.index('salinity')+1))
e.text = '34.0'
# Temperature colour map limits
e = tree.find(
'//dataVariable[{}]/addAttributes/att[@name="colorBarMinimum"]'
.format(vars.index('temperature')+1))
e.text = '4.0'
e = tree.find(
'//dataVariable[{}]/addAttributes/att[@name="colorBarMaximum"]'
.format(vars.index('temperature')+1))
e.text = '20.0'
# Depth colour map limits
e = tree.find(
'//dataVariable[{}]/addAttributes/att[@name="colorBarMinimum"]'
.format(vars.index('depth')+1))
e.text = '0.0'
e = tree.find(
'//dataVariable[{}]/addAttributes/att[@name="colorBarMaximum"]'
.format(vars.index('depth')+1))
e.text = '450.0'
# Delete nonsensical colourBar* attributes
no_cbar_vars = [
'temperature_sample_count', 'temperature_std_dev',
'salinity_sample_count', 'salinity_std_dev']
for var in no_cbar_vars:
for att in ('colorBarMinimum', 'colorBarMaximum'):
e = tree.find(
'//dataVariable[{0}]/addAttributes/att[@name="{1}"]'
.format(vars.index(var)+1, att))
e.getparent().remove(e)
Inspect the resulting dataset XML fragment below and edit the dicts and code cell above until it is what is required for the dataset:
print_tree(root)
<dataset type="EDDTableFromNcFiles" datasetID="ubcONCSEVIPCTD15mV1" active="true"> <reloadEveryNMinutes>10080</reloadEveryNMinutes> <updateEveryNMillis>10000</updateEveryNMillis> <fileDir>/results/observations/ONC/CTD/SEVIP/</fileDir> <recursive>false</recursive> <fileNameRegex>.*SEVIP_CTD_15m_\d{8}\.nc$</fileNameRegex> <metadataFrom>last</metadataFrom> <preExtractRegex/> <postExtractRegex/> <extractRegex/> <columnNameForExtract/> <sortedColumnSourceName>time</sortedColumnSourceName> <sortFilesBySourceNames>time</sortFilesBySourceNames> <fileTableInMemory>false</fileTableInMemory> <accessibleViaFiles>false</accessibleViaFiles> <!-- sourceAttributes> <att name="_NCProperties">version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17</att> <att name="coordinates">latitude longitude depth</att> <att name="history"> 2016-09-10 15:28:46 Download raw data from ONC scalardata API. 2016-09-10 15:28:46 Filter to exclude data with qaqcFlag != 1. 2016-09-10 15:28:46 Resample data to 15 minute intervals using mean, standard deviation and count as aggregation functions. 2016-09-10 15:28:46 Store as netCDF4 file. </att> <att name="ONC_data_product_url">http://dmas.uvic.ca/DataSearch?location=SEVIP&deviceCategory=CTD</att> <att name="ONC_station">East</att> <att name="ONC_stationCode">SEVIP</att> <att name="ONC_stationDescription">Pacific, Salish Sea, Strait of Georgia, East, Strait of Georgia VENUS Instrument Platform</att> </sourceAttributes --> <!-- Please specify the actual cdm_data_type (TimeSeries?) and related info below, for example... <att name="cdm_timeseries_variables">station, longitude, latitude</att> <att name="subsetVariables">station, longitude, latitude</att> --> <addAttributes> <att name="cdm_data_type">TimeSeries</att> <att name="cdm_timeseries_variables">depth, longitude, latitude</att> <att name="Conventions">COARDS, CF-1.6, ACDD-1.3</att> <att name="creator_name">Salish Sea MEOPAR Project Contributors</att> <att name="creator_email">sallen@eos.ubc.ca</att> <att name="creator_url">https://salishsea-meopar-docs.readthedocs.org/</att> <att name="acknowledgement">MEOPAR, ONC, Compute Canada</att> <att name="drawLandMask">over</att> <att name="infoUrl">https://salishsea-meopar-tools.readthedocs.org/en/latest/results_server/</att> <att name="institution">UBC EOAS</att> <att name="institution_fullname">Earth, Ocean & Atmospheric Sciences, University of British Columbia</att> <att name="keywords">15min aggregation, ONC East Node VENUS Instrument Platform, Ocean Networks Canada, depth, UBC EOAS, Strait of Georgia, latitude, longitude, ocean, SEVIP, observations, CTD, Oceans &gt; Ocean Temperature &gt; Water Temperature, reference salinity, salinity_sample_count, salinity_std_dev, sea_water_reference_salinity, sea_water_reference_salinity_sample_count, sea_water_reference_salinity_standard_deviation, sea_water_temperature, sea_water_temperature_sample_count, sea_water_temperature_standard_deviation, temperature, temperature_sample_count, temperature_std_dev, time</att> <att name="keywords_vocabulary">GCMD Science Keywords</att> <att name="license">The Salish Sea MEOPAR observation datasets are copyright 2013 – present by the Salish Sea MEOPAR Project Contributors, The University of British Columbia, and Ocean Networks Canada. They are licensed under the Apache License, Version 2.0. 
http://www.apache.org/licenses/LICENSE-2.0 Raw instrument data on which this dataset is based were provided by Ocean Networks Canada.</att> <att name="sourceUrl">(local files)</att> <att name="standard_name_vocabulary">CF Standard Name Table v29</att> <att name="summary">ONC, Strait of Georgia, East Node, Salinity and Temperature, 15min, v1 Temperature and salinity data from the Ocean Networks Canada (ONC) Strait of Georgia East Node VENUS Instrument Platform CTD. The data are resampled from the raw instrument data to 15 minute mean values. They are accompanied by standard deviations and sample counts for each of the 15 minute aggregation intervals. v1: reference salinity, reference salinity standard deviation, reference salinity sample counts, temperature, temperature standard deviation, temperature sample counts variables</att> <att name="title">ONC, Strait of Georgia, East Node, Salinity and Temperature, 15min, v1</att> <att name="project">Salish Sea MEOPAR NEMO Model</att> </addAttributes> <dataVariable> <sourceName>time</sourceName> <destinationName>time</destinationName> <dataType>long</dataType> <!-- sourceAttributes> <att name="calendar">proleptic_gregorian</att> <att name="units">minutes since 1970-01-01</att> </sourceAttributes --> <addAttributes> <att name="long_name">Time</att> <att name="standard_name">time</att> <att name="cf_role">timeseries_id</att> </addAttributes> </dataVariable> <dataVariable> <sourceName>salinity</sourceName> <destinationName>salinity</destinationName> <dataType>double</dataType> <!-- sourceAttributes> <att name="aggregation_interval" type="long">900</att> <att name="aggregation_interval_units">seconds</att> <att name="aggregation_operation">mean</att> <att name="ioos_category">Salinity</att> <att name="long_name">reference salinity</att> <att name="standard_name">sea_water_reference_salinity</att> <att name="units">g/kg</att> </sourceAttributes --> <addAttributes> <att name="colorBarMaximum" type="double">34.0</att> <att name="colorBarMinimum" type="double">0.0</att> </addAttributes> </dataVariable> <dataVariable> <sourceName>temperature</sourceName> <destinationName>temperature</destinationName> <dataType>double</dataType> <!-- sourceAttributes> <att name="aggregation_interval" type="long">900</att> <att name="aggregation_interval_units">seconds</att> <att name="aggregation_operation">mean</att> <att name="ioos_category">Temperature</att> <att name="long_name">temperature</att> <att name="standard_name">sea_water_temperature</att> <att name="units">degrees_Celcius</att> </sourceAttributes --> <addAttributes> <att name="colorBarMaximum" type="double">20.0</att> <att name="colorBarMinimum" type="double">4.0</att> </addAttributes> </dataVariable> <dataVariable> <sourceName>temperature_std_dev</sourceName> <destinationName>temperature_std_dev</destinationName> <dataType>double</dataType> <!-- sourceAttributes> <att name="aggregation_interval" type="long">900</att> <att name="aggregation_interval_units">seconds</att> <att name="aggregation_operation">standard deviation</att> <att name="ioos_category">Temperature</att> <att name="long_name">temperature standard deviation</att> <att name="standard_name">sea_water_temperature_standard_deviation</att> <att name="units">degrees_Celcius</att> </sourceAttributes --> <addAttributes/> </dataVariable> <dataVariable> <sourceName>salinity_std_dev</sourceName> <destinationName>salinity_std_dev</destinationName> <dataType>double</dataType> <!-- sourceAttributes> <att name="aggregation_interval" type="long">900</att> 
<att name="aggregation_interval_units">seconds</att> <att name="aggregation_operation">standard deviation</att> <att name="ioos_category">Salinity</att> <att name="long_name">reference salinity standard deviation</att> <att name="standard_name">sea_water_reference_salinity_standard_deviation</att> <att name="units">g/kg</att> </sourceAttributes --> <addAttributes/> </dataVariable> <dataVariable> <sourceName>salinity_sample_count</sourceName> <destinationName>salinity_sample_count</destinationName> <dataType>long</dataType> <!-- sourceAttributes> <att name="aggregation_interval" type="long">900</att> <att name="aggregation_interval_units">seconds</att> <att name="aggregation_operation">count</att> <att name="long_name">reference salinity sample count</att> <att name="standard_name">sea_water_reference_salinity_sample_count</att> </sourceAttributes --> <addAttributes/> </dataVariable> <dataVariable> <sourceName>temperature_sample_count</sourceName> <destinationName>temperature_sample_count</destinationName> <dataType>long</dataType> <!-- sourceAttributes> <att name="aggregation_interval" type="long">900</att> <att name="aggregation_interval_units">seconds</att> <att name="aggregation_operation">count</att> <att name="long_name">temperature sample count</att> <att name="standard_name">sea_water_temperature_sample_count</att> </sourceAttributes --> <addAttributes/> </dataVariable> <dataVariable> <sourceName>latitude</sourceName> <destinationName>latitude</destinationName> <dataType>double</dataType> <!-- sourceAttributes> </sourceAttributes --> <addAttributes> <att name="colorBarMaximum" type="double">90.0</att> <att name="colorBarMinimum" type="double">-90.0</att> <att name="long_name">Latitude</att> <att name="standard_name">latitude</att> <att name="units">degrees_north</att> </addAttributes> </dataVariable> <dataVariable> <sourceName>longitude</sourceName> <destinationName>longitude</destinationName> <dataType>double</dataType> <!-- sourceAttributes> </sourceAttributes --> <addAttributes> <att name="colorBarMaximum" type="double">180.0</att> <att name="colorBarMinimum" type="double">-180.0</att> <att name="long_name">Longitude</att> <att name="standard_name">longitude</att> <att name="units">degrees_east</att> </addAttributes> </dataVariable> <dataVariable> <sourceName>depth</sourceName> <destinationName>depth</destinationName> <dataType>long</dataType> <!-- sourceAttributes> </sourceAttributes --> <addAttributes> <att name="colorBarMaximum" type="double">450.0</att> <att name="colorBarMinimum" type="double">0.0</att> <att name="colorBarPalette">OceanDepth</att> <att name="long_name">Depth</att> <att name="standard_name">depth</att> <att name="units">m</att> </addAttributes> </dataVariable> </dataset>
Store the XML fragment for the dataset:
with open('/results/erddap_datasets_xml/{}.xml'.format(datasetID), 'wb') as f:
f.write(etree.tostring(root, pretty_print=True))
Edit /opt/tomcat/content/erddap/datasets.xml to include the XML fragment for the dataset that was stored by the above cell.
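The fragment belongs inside the top-level erddapDatasets element, alongside the other dataset elements; schematically (a sketch of the file layout, not its full content):

<erddapDatasets>
  ...
  <dataset type="EDDTableFromNcFiles" datasetID="ubcONCSEVIPCTD15mV1" active="true">
    ...
  </dataset>
</erddapDatasets>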
Create a flag file to signal the ERDDAP server process to load the dataset:
$ cd /results/erddap/flag/
$ touch <datasetID>
Confirm that the dataset and its metadata were correctly added to ERDDAP by inspecting https://salishsea.eos.ubc.ca/erddap/tabledap/. If there is a problem, error messages can be found in /results/erddap/logs/log.txt.