This notebook shows how to:
from argopy import DataFetcher # This is the class to work with Argo data
from argopy import ArgoIndex # This is the class to work with Argo index
from argopy import ArgoNVSReferenceTables # This is the class to retrieve data from Argo reference tables
from argopy import ArgoColors # This is a class with usefull pre-defined colors
from argopy.plot import scatter_map, scatter_plot # This is a function to easily make maps
# Make a fresh start
import argopy
argopy.reset_options()
argopy.clear_cache()
argopy.set_options(cachedir='cache_bgc')
#
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot as plt
import cmocean
import xarray as xr
xr.set_options(display_expand_attrs = False)
/Users/gmaze/miniconda3/envs/argopy-docs/lib/python3.8/site-packages/pyproj/__init__.py:89: UserWarning: pyproj unable to set database path. _pyproj_global_context_initialize()
<xarray.core.options.set_options at 0x7f897b7d7e80>
import logging
logging.getLogger("matplotlib").setLevel(logging.ERROR)
logging.getLogger("pyproj").setLevel(logging.ERROR)
logging.getLogger("fsspec").setLevel(logging.ERROR)
logging.getLogger("parso").setLevel(logging.ERROR)
logging.getLogger("asyncio").setLevel(logging.ERROR)
DEBUGFORMATTER = '%(asctime)s [%(levelname)s] [%(name)s] %(filename)s:%(lineno)d: %(message)s'
logging.basicConfig(
level=logging.DEBUG,
format=DEBUGFORMATTER,
datefmt='%I:%M:%S %p',
handlers=[logging.FileHandler("nb-docs.log", mode='w')]
)
For this demo notebook, we'll work with BGC floats in the Labrador Sea
For your own use, you can simply overwrite the BOX
variable content with your region.
# Format: [lon_min, lon_max, lat_min, lat_max, pres_min, pres_max, datim_min, datim_max]
BOX = [-56, -45, 54, 60, 0, 2000, '2022-01', '2023-01']
# BOX = [-56, -45, 54, 60, 0, 2000, '2022-09', '2023-01']
BOX = [-56, -45, 54, 60, 0, 500, '2019-01', '2023-01']
# BOX = [-75, -62, 38, 42, 0, 2000, '2021-01', '2022-01']
# Load the BGC-synthetic profiles index
# rq: we work with synthetic profiles because that's the only dataset available from the erddap at this point (2023/07/21)
idx = ArgoIndex(index_file='bgc-s').load()
idx
<argoindex.pyarrow> Host: https://data-argo.ifremer.fr Index: argo_synthetic-profile_index.txt Convention: argo_synthetic-profile_index (Synthetic-Profile directory file of the Argo GDAC) Loaded: True (287723 records) Searched: False
# Select profile in a space/time domain:
index_BOX = [BOX[ii] for ii in [0, 1, 2, 3, 6, 7]] # We don't want the pressure axis BOX limits
idx.search_lat_lon_tim(index_BOX)
<argoindex.pyarrow> Host: https://data-argo.ifremer.fr Index: argo_synthetic-profile_index.txt Convention: argo_synthetic-profile_index (Synthetic-Profile directory file of the Argo GDAC) Loaded: True (287723 records) Searched: True (1411 matches, 0.4904%)
# Get the list of all parameters for this region:
idx.read_params()
['BBP700', 'CDOM', 'CHLA', 'DOWNWELLING_PAR', 'DOWN_IRRADIANCE380', 'DOWN_IRRADIANCE412', 'DOWN_IRRADIANCE490', 'DOXY', 'NITRATE', 'PH_IN_SITU_TOTAL', 'PRES', 'PSAL', 'TEMP']
# Extract the list of BGC parameters:
bgc_params = idx.read_params()
[bgc_params.remove(p) for p in ['PRES', 'TEMP', 'PSAL']] # Remove core variables from the list
bgc_params
['BBP700', 'CDOM', 'CHLA', 'DOWNWELLING_PAR', 'DOWN_IRRADIANCE380', 'DOWN_IRRADIANCE412', 'DOWN_IRRADIANCE490', 'DOXY', 'NITRATE', 'PH_IN_SITU_TOTAL']
# How many different floats in the region:
len(idx.read_wmo())
48
In the following DataFetcher
command, we use the parallel
option to chunk the requests into a collection of smaller domains (with a maximum length of 30 days)
%%time
# f = DataFetcher(ds='bgc', mode='expert', params='all', parallel=True, progress=True).region(BOX).load() # Fetch everything !
f = DataFetcher(ds='bgc', mode='expert', params='all',
parallel=True, progress=True, cache=False,
chunks_maxsize={'time': 30},
)
f = f.region(BOX).load()
f
Final post-processing of the merged dataset () ... CPU times: user 1min 10s, sys: 2.65 s, total: 1min 12s Wall time: 4min 51s
<datafetcher.erddap> Name: Ifremer erddap Argo BGC data fetcher for a space/time region API: https://erddap.ifremer.fr/erddap Domain: [x=-56.00/-45.00; y=54.00/60.0 ... 00.0; t=2019-01-01/2023-01-01] BGC variables: ['BBP700', 'CDOM', 'CHLA', 'DOWNWELLING_PAR', 'DOWN_IRRADIANCE380', 'DOWN_IRRADIANCE412', 'DOWN_IRRADIANCE490', 'DOXY', 'NITRATE', 'PH_IN_SITU_TOTAL', 'PRES', 'PSAL', 'TEMP'] BGC 'must be measured' variables: [] Performances: cache=False, parallel=True User mode: expert Dataset: bgc
# Check the data structure (xarray.dataset):
ds = f.data
ds
<xarray.Dataset> Dimensions: (N_POINTS: 710679) Coordinates: LATITUDE (N_POINTS) float64 56.93 56.93 ... 59.67 LONGITUDE (N_POINTS) float64 -52.52 ... -50.98 TIME (N_POINTS) datetime64[ns] 2019-01-01T0... * N_POINTS (N_POINTS) int64 0 1 2 ... 21749 21750 Data variables: (12/84) BBP700 (N_POINTS) float32 nan nan ... 0.0002474 BBP700_ADJUSTED (N_POINTS) float32 nan nan ... nan nan BBP700_ADJUSTED_ERROR (N_POINTS) float32 nan nan ... nan nan BBP700_ADJUSTED_QC (N_POINTS) int64 99999 99999 ... 0 0 BBP700_DATA_MODE (N_POINTS) <U1 'R' 'R' 'R' ... '' '' '' BBP700_QC (N_POINTS) int64 99999 99999 ... 2 2 ... ... TEMP_ADJUSTED (N_POINTS) float32 3.108 3.11 ... 3.718 TEMP_ADJUSTED_ERROR (N_POINTS) float32 0.002 0.002 ... nan TEMP_ADJUSTED_QC (N_POINTS) int64 1 1 1 1 1 ... 8 8 8 8 8 TEMP_DATA_MODE (N_POINTS) <U1 'D' 'D' 'D' ... '' '' '' TEMP_QC (N_POINTS) int64 1 1 1 1 1 ... 8 8 8 8 8 TIME_QC (N_POINTS) int64 1 1 1 1 1 ... 1 1 1 1 1 Attributes: (7)
# Check the data through the argo xarray accessor:
ds.argo
<xarray.Dataset.argo> This is a collection of Argo points N_POINTS(42348) ~ N_PROF(1413) x N_LEVELS(1420)
# and check the corresponding index structure (pandas.dataframe):
df = f.index
df
file | date | latitude | longitude | ocean | profiler_code | institution_code | parameters | parameter_data_mode | date_update | wmo | cyc | institution | profiler | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | aoml/4903365/profiles/SD4903365_010.nc | 2021-11-11 04:59:42 | 57.866 | -46.017 | A | 869 | AO | PRES TEMP PSAL DOXY CHLA BBP700 CDOM PH_IN_SIT... | AAADAARDD | 2023-07-05 09:04:39 | 4903365 | 10 | AOML, USA | Unknown |
1 | aoml/4903365/profiles/SD4903365_011.nc | 2021-11-21 03:43:40 | 57.474 | -46.290 | A | 869 | AO | PRES TEMP PSAL DOXY CHLA BBP700 CDOM PH_IN_SIT... | AAADAARDD | 2023-07-05 09:04:51 | 4903365 | 11 | AOML, USA | Unknown |
2 | coriolis/1902578/profiles/SR1902578_001.nc | 2022-05-30 05:01:26 | 56.757 | -52.358 | A | 834 | IF | PRES TEMP PSAL DOXY CHLA BBP700 CDOM PH_IN_SIT... | RRRAARRR | 2023-05-03 10:11:30 | 1902578 | 1 | Ifremer, France | Unknown |
3 | coriolis/1902578/profiles/SR1902578_001D.nc | 2022-05-29 12:11:57 | 56.816 | -52.258 | A | 834 | IF | PRES TEMP PSAL CHLA BBP700 CDOM | RRRARR | 2023-05-03 10:11:16 | 1902578 | 1 | Ifremer, France | Unknown |
4 | coriolis/1902578/profiles/SR1902578_002.nc | 2022-05-31 14:23:37 | 56.664 | -52.604 | A | 834 | IF | PRES TEMP PSAL DOXY CHLA BBP700 CDOM PH_IN_SIT... | RRRAARRR | 2023-05-03 10:11:43 | 1902578 | 2 | Ifremer, France | Unknown |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1406 | meds/4902508/profiles/SR4902508_055.nc | 2022-02-06 09:31:00 | 56.947 | -50.378 | A | 844 | ME | PRES TEMP PSAL | RRR | 2022-06-29 02:29:26 | 4902508 | 55 | MEDS, Canada | ARVOR float with SBE conductivity sensor |
1407 | meds/4902512/profiles/SR4902512_054.nc | 2022-01-27 04:35:00 | 57.890 | -46.448 | A | 844 | ME | PRES TEMP PSAL | RRR | 2022-06-29 02:32:08 | 4902512 | 54 | MEDS, Canada | ARVOR float with SBE conductivity sensor |
1408 | meds/4902512/profiles/SR4902512_055.nc | 2022-02-06 09:43:00 | 57.940 | -46.851 | A | 844 | ME | PRES TEMP PSAL | RRR | 2022-06-29 02:32:20 | 4902512 | 55 | MEDS, Canada | ARVOR float with SBE conductivity sensor |
1409 | meds/4902513/profiles/SR4902513_054.nc | 2022-01-27 04:37:00 | 59.242 | -53.157 | A | 844 | ME | PRES TEMP PSAL | RRR | 2022-06-29 02:32:59 | 4902513 | 54 | MEDS, Canada | ARVOR float with SBE conductivity sensor |
1410 | meds/4902513/profiles/SR4902513_055.nc | 2022-02-06 09:44:00 | 58.702 | -53.878 | A | 844 | ME | PRES TEMP PSAL | RRR | 2022-06-29 02:33:11 | 4902513 | 55 | MEDS, Canada | ARVOR float with SBE conductivity sensor |
1411 rows × 14 columns
Rq: we could have done this without downloading the data by working directly with the ArgoIndex
scatter_map(idx.to_dataframe(), set_global=False);
# But here, we use the index retrieved with the DataFetcher:
scatter_map(df, traj=False, set_global=False, legend=False);
We can also make a scatter map with one BGC parameter data mode.
# Randomly select one BGC parameter to work with:
a_param = bgc_params[np.random.randint(len(bgc_params))]
a_param
'CHLA'
# Get more verbose information about this parameter (usefull for plot titles):
reftbl = ArgoNVSReferenceTables().tbl('R03')
param_info = reftbl[reftbl['altLabel']==a_param].iloc[0].to_dict()
param_info
{'altLabel': 'CHLA', 'prefLabel': 'Chlorophyll-a concentration', 'definition': 'Chlorophyll-a concentration (mg/m^3).', 'deprecated': 'false', 'id': 'http://vocab.nerc.ac.uk/collection/R03/current/CHLA/'}
# To make the scatter map, we need to have the data mode available in one DataFrame column
# so we need to add a new column with the DATA_MODE of the PARAMETER:
df["variables"] = df["parameters"].apply(lambda x: x.split())
df["%s_DM" % a_param] = df.apply(lambda x: x['parameter_data_mode'][x['variables'].index(a_param)] if a_param in x['variables'] else '', axis=1)
np.unique(df["%s_DM" % a_param])
array(['', 'A'], dtype=object)
# Finally plot the map:
fig, ax = scatter_map(df,
hue="%s_DM" % a_param,
cmap="data_mode",
markersize=24,
markeredgecolor='w',
traj_color='gray',
legend_title='Data mode')
ax.set_title("Data mode for %s (%s)\n%i profiles from the %s\n%i profiles downloaded" % (param_info['prefLabel'], a_param,
idx.N_MATCH, idx.convention_title, df.shape[0]));
Rq: If some points have no data mode in the above map, it simply means that these profiles have not the request parameters
We will be using the argopy scatter_plot method that has the following signature:
scatter_plot(
ds: xarray.core.dataset.Dataset,
this_param,
this_x='TIME',
this_y='PRES',
figsize=(18, 6),
cmap=None,
vmin=None,
vmax=None,
s=4,
)
# Try to define readible color bounds for each BGC variables
# (this is probably NOT appropriate for all regions and periods !)
c_bounds = {'BBP700': (20e-5, 70e-5),
'CDOM': (0.4, 0.8),
'CHLA': (0, 0.2),
'DOWNWELLING_PAR': (0, 10),
'DOWN_IRRADIANCE380': (0, .1),
'DOWN_IRRADIANCE412': (0, .1),
'DOWN_IRRADIANCE490': (0, .1),
'DOXY': (250,300),
'NITRATE': (-2, 20),
'PH_IN_SITU_TOTAL': (6, 8.5),
}
# We''l make use of the argopy color scheme:
ArgoColors('qc')
Quality control flag scale | ||
Names: qc, qc_flag, quality_control, quality_control_flag, quality_control_flag_scale | ||
0 | No QC performed | |
1 | Good data | |
2 | Probably good data | |
3 | Probably bad data that are potentially adjustable | |
4 | Bad data | |
5 | Value changed | |
6 | Not used | |
7 | Not used | |
8 | Estimated value | |
9 | Missing value |
fig, ax = scatter_plot(ds, a_param + '_QC', this_x = a_param,
vmin=0, vmax=9, cmap=ArgoColors('qc').cmap, figsize=(5,5))
ax.set_title("QC for %s [%s]\n'%s' mission" % (param_info['prefLabel'], a_param, f.mission),
fontdict={'weight': 'bold', 'size': 14});
fig, ax = scatter_plot(ds, a_param + '_ADJUSTED_QC', this_x = a_param + '_ADJUSTED',
vmin=0, vmax=9, cmap=ArgoColors('qc').cmap, figsize=(5,5))
ax.set_title("QC for Adjusted %s [%s]\n'%s' mission" % (param_info['prefLabel'], a_param + '_ADJUSTED', f.mission),
fontdict={'weight': 'bold', 'size': 14});
fig, ax = scatter_plot(ds, 'CHLA_ADJUSTED', this_x = 'TEMP_ADJUSTED', this_y = 'DOXY_ADJUSTED', figsize=(5,5))
fig, ax = scatter_plot(ds, 'TEMP', vmin=2, vmax=8)
ax.set_title("%s ('%s' mission)" % ('TEMP', f.mission), fontdict={'weight': 'bold', 'size': 14});
fig, ax = scatter_plot(ds, 'PSAL', cmap=cmocean.cm.haline, vmin=34.5, vmax=35)
ax.set_title("%s ('%s' mission)" % ('PSAL', f.mission), fontdict={'weight': 'bold', 'size': 14});
# Plot all BGC params
for param in bgc_params:
vmin, vmax = c_bounds[param] if param in c_bounds else (None, None)
fig, ax = scatter_plot(ds, param, vmin=vmin, vmax=vmax)
ax.set_title("%s ('%s' mission)" % (param, f.mission), fontdict={'weight': 'bold', 'size': 12});