This notebook demonstrates the evaluation of four bias adjustment methods - ISIMIP (Lange 2021), CDFt (Vrac 2017), Quantile Delta Mapping (Cannon et al 2015) and Quantile Mapping - to the variables tas (daily mean 2m surface temperature, K) and pr (total daily precipitation flux, kg/m2/s). ISIMIP is additionally applied to tasmin (daily minimum 2m surface temperature, K) and tasmax (daily maximum 2m surface temperature, K).
The notebook is structured along the following three components of the module:
This step requires the user to have downloaded and preprocessed observational or reanalysis data (such as ERA5) as well as simulations of a climate model during a historical and future (or application) period. Necessary pre-processing steps include regridding the datasets to the same area and grid in space and time and conducted checks for corrupted or missing data.
For the purpose of this demonstration, we will work with the testing data uploaded to the github folder of ibicus. This data is already preprocessed and numpy arrays of data have been extracted. We can get it using:
!wget https://github.com/ecmwf-projects/ibicus/blob/main/notebooks/testing_data.zip -c
!unzip testing_data.zip
import numpy as np
import iris
import math
from scipy.stats import norm, laplace, gamma, beta
from ibicus.variables import *
from ibicus.utils import gen_PrecipitationHurdleModel, gen_PrecipitationGammaLeftCensoredModel
from ibicus.debias import ECDFM, ISIMIP, QuantileMapping, DeltaChange, CDFt
from ibicus.debias import QuantileDeltaMapping, ScaledDistributionMapping, LinearScaling
from ibicus.evaluate import assumptions, correlation, marginal, multivariate, trend
from ibicus.evaluate.metrics import *
The second component of the evaluation requires a split of the historical period into training and validation period.
def get_data(variable, path = "testing_data/", split_ratio = 0.7037):
data = np.load(f"{path}{variable}.npz", allow_pickle = True)
cut_off = math.floor(data["obs"].shape[0]*split_ratio)
obs = data["obs"][0:cut_off, :, :]
obs_validate = data["obs"][cut_off:, :, :]
cm_hist = data["cm_hist"][0:cut_off, :, :]
cm_validate = data["cm_hist"][cut_off:, :, :]
cm_future = data["cm_future"]
time_obs = data["time_obs"][0:cut_off]
time_obs_validate = data["time_obs"][cut_off:]
time_cm_hist = data["time_cm_hist"][0:cut_off]
time_cm_validate = data["time_cm_hist"][cut_off:]
time_cm_future = data["time_cm_future"]
return obs, obs_validate, cm_hist, cm_validate, cm_future, time_obs, time_obs_validate, time_cm_hist, time_cm_validate, time_cm_future
tas_obs, tas_obs_validate, tas_cm_hist, tas_cm_validate, tas_cm_future, tas_time_obs, tas_time_obs_validate, tas_time_cm_hist, tas_time_cm_validate, tas_time_cm_future = get_data("tas")
pr_obs, pr_obs_validate, pr_cm_hist, pr_cm_validate, pr_cm_future, pr_time_obs, pr_time_obs_validate, pr_time_cm_hist, pr_time_cm_validate, pr_time_cm_future = get_data("pr")
The following section initializes and applies the debiasers ISIMIP, CDFt, Quantile Mapping (QM) and Quantile Delta Mapping (QDM) to the chosen variables. For a more detailed explanaition of the bias adjustment methods we refer to the documentation, and for some examples on how to customize different debiasers, we refer to the notebook on adjusting bias adjustment methods.
ISIMIP
tas_debiaser_ISIMIP = ISIMIP.from_variable(variable = 'tas')
tas_val_debiased_ISIMIP = tas_debiaser_ISIMIP.apply(tas_obs, tas_cm_hist, tas_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate, parallel=True)
tas_fut_debiased_ISIMIP = tas_debiaser_ISIMIP.apply(tas_obs, tas_cm_hist, tas_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future, parallel=True)
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/debias/_debiaser.py:536: UserWarning: progressbar argument is ignored when parallel = True. warnings.warn("progressbar argument is ignored when parallel = True.")
pr_debiaser_ISIMIP = ISIMIP.from_variable(variable = 'pr')
pr_val_debiased_ISIMIP = pr_debiaser_ISIMIP.apply(pr_obs, pr_cm_hist, pr_cm_validate, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_validate, parallel=True)
pr_fut_debiased_ISIMIP = pr_debiaser_ISIMIP.apply(pr_obs, pr_cm_hist, pr_cm_future, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_future, parallel=True)
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:3133: RuntimeWarning: divide by zero encountered in double_scalars aest = (3-s + np.sqrt((s-3)**2 + 24*s)) / (12*s) /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:3132: RuntimeWarning: invalid value encountered in double_scalars func = lambda a: np.log(a) - sc.digamma(a) - s /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:3133: RuntimeWarning: divide by zero encountered in double_scalars aest = (3-s + np.sqrt((s-3)**2 + 24*s)) / (12*s) /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:3132: RuntimeWarning: invalid value encountered in double_scalars func = lambda a: np.log(a) - sc.digamma(a) - s
CDFt
tas_debiaser_CDFT = CDFt.from_variable(variable = 'tas')
tas_val_debiased_CDFT = tas_debiaser_CDFT.apply(tas_obs, tas_cm_hist, tas_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate)
tas_fut_debiased_CDFT = tas_debiaser_CDFT.apply(tas_obs, tas_cm_hist, tas_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future)
100%|█████████████████████████████████████████| 225/225 [00:09<00:00, 24.53it/s] 100%|█████████████████████████████████████████| 225/225 [00:30<00:00, 7.48it/s]
pr_debiaser_CDFT = CDFt.from_variable(variable = 'pr')
pr_val_debiased_CDFT = pr_debiaser_CDFT.apply(pr_obs, pr_cm_hist, pr_cm_validate, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_validate)
pr_fut_debiased_CDFT = pr_debiaser_CDFT.apply(pr_obs, pr_cm_hist, pr_cm_future, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_future)
100%|█████████████████████████████████████████| 225/225 [00:08<00:00, 25.06it/s] 100%|█████████████████████████████████████████| 225/225 [00:29<00:00, 7.53it/s]
Quantile Delta Mapping
tas_debiaser_QDM = QuantileDeltaMapping.from_variable(variable = "tas")
tas_val_debiased_QDM = tas_debiaser_QDM.apply(tas_obs, tas_cm_hist, tas_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate, parallel=True)
tas_fut_debiased_QDM = tas_debiaser_QDM.apply(tas_obs, tas_cm_hist, tas_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future, parallel=True)
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/debias/_debiaser.py:560: UserWarning: The debiaser output contains values outside the reasonable physical range of [100, 400] for the variable: Daily mean near-surface air temperature. This might be due to values outside the range in the input, or to a problem of the debiaser for the given dataset at hand. It is recommended to check the output carefully. self._check_output(output)
pr_debiaser_QDM = QuantileDeltaMapping.from_variable(variable = "pr")
pr_val_debiased_QDM = pr_debiaser_QDM.apply(pr_obs, pr_cm_hist, pr_cm_validate, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_validate, parallel=True)
pr_fut_debiased_QDM = pr_debiaser_QDM.apply(pr_obs, pr_cm_hist, pr_cm_future, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_future, parallel=True)
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/debias/_quantile_delta_mapping.py:322: RuntimeWarning: divide by zero encountered in divide cm_future /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/debias/_debiaser.py:560: UserWarning: The debiaser output contains inf or nan values. This might be due to inf or nan values inside the input, or to a problem of the debiaser for the given dataset at hand. It is recommended to check the output carefully self._check_output(output) /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/debias/_debiaser.py:560: UserWarning: The debiaser output contains values outside the reasonable physical range of [0, 0.01] for the variable: Daily mean precipitation. This might be due to values outside the range in the input, or to a problem of the debiaser for the given dataset at hand. It is recommended to check the output carefully. self._check_output(output) /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/debias/_quantile_delta_mapping.py:322: RuntimeWarning: divide by zero encountered in divide cm_future
Quantile Mapping
tas_debiaser_QM = QuantileMapping.from_variable(variable = "tas")
tas_val_debiased_QM = tas_debiaser_QM.apply(tas_obs, tas_cm_hist, tas_cm_validate)
tas_fut_debiased_QM = tas_debiaser_QM.apply(tas_obs, tas_cm_hist, tas_cm_future)
pr_debiaser_QM = QuantileMapping.from_variable(variable = "pr")
pr_val_debiased_QM = pr_debiaser_QM.apply(pr_obs, pr_cm_hist, pr_cm_validate)
pr_fut_debiased_QM = pr_debiaser_QM.apply(pr_obs, pr_cm_hist, pr_cm_future)
100%|███████████████████████████████████████| 225/225 [00:00<00:00, 1452.93it/s] 100%|████████████████████████████████████████| 225/225 [00:00<00:00, 924.95it/s] 100%|████████████████████████████████████████| 225/225 [00:00<00:00, 228.97it/s] 100%|████████████████████████████████████████| 225/225 [00:02<00:00, 103.39it/s]
There are two types of metrics that the evaluation module enables you to analyse: bias in the mean or statistical properties such as the 5th or 95th percentile, and bias in threshold metrics such as frost days or location-wise defined high mean daily temperature. The reason for including the second type of metrics is that it is often specific threshold metrics that are relevant for impact models (for example agricultural or hydrological models), and where the success of bias adjustment methods is particularly desirable.
The following table gives an overview, which types of analysis are currently available in this release of the package.
Statistical Properties | Threshold Metrics | |
---|---|---|
Marginal | x - marginal bias | x - marginal bias |
Temporal | x - spell length | |
Spatial | x - RMSE corr matrices | x - spatial extent |
Spatioteporal | x - cluster size | |
Multivariate | x - correlation |
A new threshold metric can be defined by the user by simply specifying at least the a) threshold value or values, b) the type of threshold (whether values above, below, between or outside the given threshold(s) are of interest), c) the variable it applies to and d) a metric name.
mean_frost_days = ThresholdMetric(name="Mean frost days (tas<0°C)", variable="tas", threshold_value=273.13, threshold_type="lower")
Thresholds defined per day, month or season are also possible using the threshold_scope argument, as well as per location (“threshold_locality”) or any combination of the two. Since precipitation is an accumulative variable, a child class with special features is defined for precipitation metrics.
tas_heatwave = ThresholdMetric.from_quantile(tas_obs, 0.95, threshold_type = "higher",
threshold_scope = "season",
time = tas_time_cm_validate,
name = "Mean daily temperature >95th \n seasonal percentile of gridcell")
For each variable, the package implements a number of out-of-the-box metrics which users can edit, next to creating new ones:
dry_days
AccumulativeThresholdMetric(threshold_value=1.1574074074074073e-05, threshold_type='lower', threshold_scope='overall', threshold_locality='global', name='Dry days \n (< 1 mm/day)', variable='pr')
dry_days.threshold_type ="lower"
A number of threshold metrics are associated with each variable by default:
tas_metrics = [warm_days, cold_days, mean_frost_days]
pr_metrics = [dry_days, wet_days, R10mm, R20mm]
Overall, all ETCCDI climate indices are fully integrated and compatible with the ibicus threshold metrics class (https://www.climdex.org/learn/indices/).
We first investigate the location-wise bias of the chosen metrics and descriptive statistics. The plot below shows the distribution of marginal (location-wise) biases across locations. For the mean, this is for example calculated as follows:
$$ \text{Bias}_{ij} = 100 * \frac{\bar{\text{tas}}_{obs, ij} - \bar{\text{tas}}_{cm, ij}}{\bar{\text{tas}}_{obs, ij}} $$tas_marginal_bias_data = marginal.calculate_marginal_bias(metrics = tas_metrics,
statistics = ['mean', 0.05, 0.95],
percentage_or_absolute = 'absolute',
obs = tas_obs_validate,
raw = tas_cm_validate,
CDFT = tas_val_debiased_CDFT,
ISIMIP = tas_val_debiased_ISIMIP,
QM = tas_val_debiased_QM, QDM = tas_val_debiased_QDM)
tas_marginal_bias_plot = marginal.plot_marginal_bias(variable = 'tas',
bias_df = tas_marginal_bias_data,
metrics_title = "Absolute bias [days / year]",
statistics_title = "Absolute bias [K]")
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/utils/_utils.py:464: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead. for index, value in row.iteritems():
pr_marginal_bias_data = marginal.calculate_marginal_bias(metrics = pr_metrics,
statistics = ['mean', 0.05, 0.95],
percentage_or_absolute = 'percentage',
obs = pr_obs_validate,
raw = pr_cm_validate,
CDFT = pr_val_debiased_CDFT,
ISIMIP = pr_val_debiased_ISIMIP,
QDM = pr_val_debiased_QDM,
QM = pr_val_debiased_QM)
pr_marginal_bias_plot = marginal.plot_marginal_bias(variable = 'pr',
bias_df = pr_marginal_bias_data,
remove_outliers = True,
outlier_threshold_statistics = 10,
metrics_title = 'Absolute bias [days / year]',
statistics_title = 'Absolute bias [mm]')
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/evaluate/marginal.py:94: RuntimeWarning: divide by zero encountered in divide bias = 100 * (cm_metric - obs_metric) / obs_metric /var/folders/3g/80lqcgbn19l711ck1nnjxplr0000gn/T/ipykernel_21176/1936019458.py:1: UserWarning: raw: Division by zero encountered in bias of AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', threshold_scope='overall', threshold_locality='global', name='Extremely wet days \n (> 20 mm/day)', variable='pr') calculation, not showing results for this debiaser. pr_marginal_bias_data = marginal.calculate_marginal_bias(metrics = pr_metrics, /var/folders/3g/80lqcgbn19l711ck1nnjxplr0000gn/T/ipykernel_21176/1936019458.py:1: UserWarning: CDFT: Division by zero encountered in bias of AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', threshold_scope='overall', threshold_locality='global', name='Extremely wet days \n (> 20 mm/day)', variable='pr') calculation, not showing results for this debiaser. pr_marginal_bias_data = marginal.calculate_marginal_bias(metrics = pr_metrics, /var/folders/3g/80lqcgbn19l711ck1nnjxplr0000gn/T/ipykernel_21176/1936019458.py:1: UserWarning: ISIMIP: Division by zero encountered in bias of AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', threshold_scope='overall', threshold_locality='global', name='Extremely wet days \n (> 20 mm/day)', variable='pr') calculation, not showing results for this debiaser. pr_marginal_bias_data = marginal.calculate_marginal_bias(metrics = pr_metrics, /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/evaluate/marginal.py:51: RuntimeWarning: overflow encountered in divide 100 /var/folders/3g/80lqcgbn19l711ck1nnjxplr0000gn/T/ipykernel_21176/1936019458.py:1: UserWarning: QDM: Division by zero encountered in bias of mean calculation, not showing results for this debiaser. pr_marginal_bias_data = marginal.calculate_marginal_bias(metrics = pr_metrics, /var/folders/3g/80lqcgbn19l711ck1nnjxplr0000gn/T/ipykernel_21176/1936019458.py:1: UserWarning: QDM: Division by zero encountered in bias of AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', threshold_scope='overall', threshold_locality='global', name='Extremely wet days \n (> 20 mm/day)', variable='pr') calculation, not showing results for this debiaser. pr_marginal_bias_data = marginal.calculate_marginal_bias(metrics = pr_metrics, /var/folders/3g/80lqcgbn19l711ck1nnjxplr0000gn/T/ipykernel_21176/1936019458.py:1: UserWarning: QM: Division by zero encountered in bias of AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', threshold_scope='overall', threshold_locality='global', name='Extremely wet days \n (> 20 mm/day)', variable='pr') calculation, not showing results for this debiaser. pr_marginal_bias_data = marginal.calculate_marginal_bias(metrics = pr_metrics, /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/utils/_utils.py:464: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead. for index, value in row.iteritems():
Spatial distribution of marginal biases: The boxplots above plot the bias across locations. To investigate this further, we can plot the bias spatially. The function shown below works with input of the type produced by the output of the 'calculate_marginal_bias' function, and produces simple heatmaps that have no further package dependencies other than numpy. The user can build on this to produce more sophisticated plots using packages such as xarray.
tas_bias_map_mean = marginal.plot_bias_spatial(variable = 'tas', metric = '0.05 qn', bias_df = tas_marginal_bias_data)
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/utils/_utils.py:464: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead. for index, value in row.iteritems():
pr_bias_map_mean = marginal.plot_bias_spatial(variable = 'pr', metric = 'Mean', bias_df = pr_marginal_bias_data)
For each threshold metric, the temporal spell length, the spatial extent and the 'spatiotemporal cluster size' can be calculated as follows:
spelllength_dry = dry_days.calculate_spell_length(minimum_length= 3,
obs = pr_obs_validate,
raw = pr_cm_validate, CDFT = pr_val_debiased_CDFT,
ISIMIP = pr_val_debiased_ISIMIP,
QM = pr_val_debiased_QM,
QDM = pr_val_debiased_QDM
)
spatiotemporal_dry = dry_days.calculate_spatiotemporal_clusters(
obs = pr_obs_validate,
raw = pr_cm_validate, CDFT = pr_val_debiased_CDFT,
ISIMIP = pr_val_debiased_ISIMIP,
QM = pr_val_debiased_QM,
QDM = pr_val_debiased_QDM
)
spatial_dry = dry_days.calculate_spatial_extent(
obs = pr_obs_validate,
raw = pr_cm_validate, CDFT = pr_val_debiased_CDFT,
ISIMIP = pr_val_debiased_ISIMIP,
QM = pr_val_debiased_QM,
QDM = pr_val_debiased_QDM)
A useful way of visualising these metrics is to plot their empirical cumulative distribution functions as visual identification of differences in the distributions are possible more easily.
spatiotemporal_fig = marginal.plot_spatiotemporal(data = [spelllength_dry, spatiotemporal_dry, spatial_dry])
We see that bias adjustment can significantly modify the spatiotemporal extent of threshold metrics. Depending on the application case and context, individual spatiotemporal extent metrics can and should be investigated in more detail.
So far, the spatiotemporal evaluation has focused on threshold metrics. To evaluate the spatial correlation for a single variable, a function to calculate the RMSE between the observed and the model correlation matrix for a single location is currently implemented in ibicus. At each location, the RMSE between the correlation matrix of this location with all others in the observational data, and the correlation matrix in the climate model data is calculated and output in the following map as a single number at each location.
tas_rmsd_spatial = correlation.rmse_spatial_correlation_distribution(variable = 'tas',
obs_data = tas_obs_validate, raw = tas_cm_future,
CDFT = tas_val_debiased_CDFT,
ISIMIP = tas_val_debiased_ISIMIP,
QM = tas_val_debiased_QM,
QDM = tas_val_debiased_QDM
)
tas_rmsd_spatial_plot = correlation.rmse_spatial_correlation_boxplot(variable = 'tas', dataset = tas_rmsd_spatial)
tas_rmsd_spatial_plot.show()
/var/folders/3g/80lqcgbn19l711ck1nnjxplr0000gn/T/ipykernel_21176/777485544.py:9: UserWarning: Matplotlib is currently using module://matplotlib_inline.backend_inline, which is a non-GUI backend, so cannot show the figure. tas_rmsd_spatial_plot.show()