This notebook demonstrates the evaluation of three debiasers, ISIMIP (Lange 2021), CDFt (Vrac 2017) and Quantile Delta Mapping (Cannon et al 2015) to the variables tas (daily mean 2m surface temperature, K) and pr (total daily precipitation flux, kg/m2/s). ISIMIP is additionally applied to tasmin (daily minimum 2m surface temperature, K) and tasmax (daily maximum 2m surface temperature, K).
The evaluation module provides a set of functionalities that can help you assess the performance of your bias correction method. The notebook is structured along the following three components of the module:
This step requires you to have downloaded and preprocessed observational or reanalysis data (such as ERA5) as well as simulations of a climate model during a historical and future (or application) period. Necessary pre-processing steps include regridding the datasets to the same area and grid in space and time and conducted checks for corrupted or missing data.
For the purpose of this demonstration, we will work with the testing data uploaded to the github folder of ibicus. This data is already preprocessed and numpy arrays of data have been extracted. We can get it using:
!wget https://github.com/ecmwf-projects/ibicus/blob/main/notebooks/testing_data.zip -c
!unzip testing_data.zip
import numpy as np
import iris
import math
from scipy.stats import norm, laplace, gamma, beta
from ibicus.variables import *
from ibicus.utils import gen_PrecipitationHurdleModel, gen_PrecipitationGammaLeftCensoredModel
from ibicus.debias import ECDFM, ISIMIP, QuantileMapping, DeltaChange, CDFt
from ibicus.debias import QuantileDeltaMapping, ScaledDistributionMapping, LinearScaling
from ibicus.evaluate import assumptions, correlation, marginal, multivariate, trend
from ibicus.evaluate.metrics import *
The second component of the evaluation requires a split of the historical period into training and validation period which is implemented in the following function. As a default value, the split_ratio is set such that the training period ends at the end of a year.
def get_data(variable, path = "testing_data/", split_ratio = 0.7037):
data = np.load(f"{path}{variable}.npz", allow_pickle = True)
cut_off = math.floor(data["obs"].shape[0]*split_ratio)
obs = data["obs"][0:cut_off, :, :]
obs_validate = data["obs"][cut_off:, :, :]
cm_hist = data["cm_hist"][0:cut_off, :, :]
cm_validate = data["cm_hist"][cut_off:, :, :]
cm_future = data["cm_future"]
time_obs = data["time_obs"][0:cut_off]
time_obs_validate = data["time_obs"][cut_off:]
time_cm_hist = data["time_cm_hist"][0:cut_off]
time_cm_validate = data["time_cm_hist"][cut_off:]
time_cm_future = data["time_cm_future"]
return obs, obs_validate, cm_hist, cm_validate, cm_future, time_obs, time_obs_validate, time_cm_hist, time_cm_validate, time_cm_future
Initializing the testing data and dates for tas, pr, tasmin, tasmax:
tas_obs, tas_obs_validate, tas_cm_hist, tas_cm_validate, tas_cm_future, tas_time_obs, tas_time_obs_validate, tas_time_cm_hist, tas_time_cm_validate, tas_time_cm_future = get_data("tas")
tasmin_obs, tasmin_obs_validate, tasmin_cm_hist, tasmin_cm_validate, tasmin_cm_future, tasmin_time_obs, tasmin_time_obs_validate, tasmin_time_cm_hist, tasmin_time_cm_validate, tasmin_time_cm_future = get_data("tasmin")
tasmax_obs, tasmax_obs_validate, tasmax_cm_hist, tasmax_cm_validate, tasmax_cm_future, tasmax_time_obs, tasmax_time_obs_validate, tasmax_time_cm_hist, tasmax_time_cm_validate, tasmax_time_cm_future = get_data("tasmax")
pr_obs, pr_obs_validate, pr_cm_hist, pr_cm_validate, pr_cm_future, pr_time_obs, pr_time_obs_validate, pr_time_cm_hist, pr_time_cm_validate, pr_time_cm_future = get_data("pr")
Calculating tasmin and tasmax: currently only ISIMIP covers tasmin and tasmax. It also does not bias correct these two variables directly, but rather bias corrects the variables tasrange and tasskew, where
$$ \text{tasrange} = \text{tasmax} - \text{tasmin} $$and
$$ \text{tasskew} = \frac{\text{tas} − \text{tasmin}}{\text{tasrange}} $$The utils function 'get_tasmin_tasmax' calculates tasmin and tasmax from tasrange and tasskew.
tasrange_obs, tasskew_obs = utils.get_tasrange_tasskew(tas = tas_obs, tasmin = tasmin_obs, tasmax = tasmax_obs)
tasrange_obs_validate, tasskew_obs_validate = utils.get_tasrange_tasskew(tas = tas_obs_validate, tasmin = tasmin_obs_validate, tasmax = tasmax_obs_validate)
tasrange_cm_hist, tasskew_cm_hist = utils.get_tasrange_tasskew(tas = tas_cm_hist, tasmin = tasmin_cm_hist, tasmax = tasmax_cm_hist)
tasrange_cm_validate, tasskew_cm_validate = utils.get_tasrange_tasskew(tas = tas_cm_validate, tasmin = tasmin_cm_validate, tasmax = tasmax_cm_validate)
tasrange_cm_future, tasskew_cm_future = utils.get_tasrange_tasskew(tas = tas_cm_future, tasmin = tasmin_cm_future, tasmax = tasmax_cm_future)
Different bias correction methods rely on different assumptions, as described above. A detailed overview of assumptions associated with a specific bias correction method can be found in the documentation of each debiaser. For the sake of demonstration, we investigate the goodness of fit in this notebook:
For all parametric methods, distributions are fitted to the data. Default distributions for each variable are specified in the individual debiasers. We assess the following two components:
Step 1: Calculating the AIC and plottinge the worst fit
The Akaike Information Criterion is a statistical method for comparative evaluation of different statistical models. The AIC weighs model complexity with goodness of fit, with a lower AIC indicating a 'better model', and is computed as follows:
$$ AIC = 2 \frac{k}{n} - 2 \frac{l}{n}$$whereby $l$ is the log likelihood function calculated in the following way:
$$ l = - \frac{n}{2} (1+ \ln(2 \pi) + \ln (\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2)) $$To demonstrate, we test the fit of the normal and the beta distribution to the temperature data:
tas_obs_aic = assumptions.calculate_aic('tas', tas_obs_validate, scipy.stats.norm, scipy.stats.beta)
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:639: RuntimeWarning: invalid value encountered in sqrt sk = 2*(b-a)*np.sqrt(a + b + 1) / (a + b + 2) / np.sqrt(a*b) /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/scipy/optimize/minpack.py:175: RuntimeWarning: The iteration is not making good progress, as measured by the improvement from the last ten iterations. warnings.warn(msg, RuntimeWarning)
The following boxplot shows the distribution of the AIC across locations for the normal (left) and the beta (right) distribution:
assumptions.plot_aic(variable = 'tas', aic_values = tas_obs_aic)
[Text(0.5, 1.0, 'Distribution of AIC values across locations \n Daily mean near-surface air temperature')]
tas_obs_worst_fit = assumptions.plot_fit_worst_aic(variable = 'tas', dataset = tas_obs,
data_type = 'Observed', distribution = scipy.stats.beta,
nr_bins = 100, aic_values = tas_obs_aic)
Given the plot we see above, and the strong bi-modal distribution of the temperature data, we make the assumption that might be seasonal variation at play here. This can be further investigated by using the following function: it plots the time-series and autocorrelation function of the quantile residuals. It also plot a QQ plot of the normalized quantile residuals to get a picture of the goodness of fit for different quantiles.
tas_obs_plot_gof = assumptions.plot_quantile_residuals(variable = 'tas', dataset = tas_obs[:,0,0], data_type = 'observation data',
distribution = scipy.stats.norm)
Conclusion:
Overall the distributions do not seem to fit the temperature data too well, mainly due to the bi-modal distribution of temperature. Based on the quantile residuals, we conclude that there is a strong seasonality at play. One option to address this issue would be to build a statistical model that explicitely includes the season as covariate. An alternative approach is to use a so-called 'running-window-mode' (applied for example in the QDM and ISIMIP debiasers) where a separate mapping is conducted each day (or month) of the year, which also takes care of the seasonality. For more details on the running-window-mode, have a look at the documentation of different debiasers.
The following section initializes and applies the debiasers ISIMIP, CDFt, Quantile Mapping (QM) and Quantile Delta Mapping (QDM) to the chosen variables. For a more detailed explanaition of the bias correction methods we refer to the documentation, and for some examples on how to customize different debiasers, we refer to the notebook 03 Adjusting Debiasers.
ISIMIP
tas_debiaser_ISIMIP = ISIMIP.from_variable(variable = 'tas')
tas_val_debiased_ISIMIP = tas_debiaser_ISIMIP.apply(tas_obs, tas_cm_hist, tas_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate, verbosity='ERRORS_ONLY', parallel=True)
tas_fut_debiased_ISIMIP = tas_debiaser_ISIMIP.apply(tas_obs, tas_cm_hist, tas_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future, verbosity='ERRORS_ONLY', parallel=True)
pr_debiaser_ISIMIP = ISIMIP.from_variable(variable = 'pr')
pr_val_debiased_ISIMIP = pr_debiaser_ISIMIP.apply(pr_obs, pr_cm_hist, pr_cm_validate, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_validate, verbosity='ERRORS_ONLY', parallel=True)
pr_fut_debiased_ISIMIP = pr_debiaser_ISIMIP.apply(pr_obs, pr_cm_hist, pr_cm_future, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_future, verbosity='ERRORS_ONLY', parallel=True)
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:2757: RuntimeWarning: divide by zero encountered in double_scalars aest = (3-s + np.sqrt((s-3)**2 + 24*s)) / (12*s) /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:2756: RuntimeWarning: invalid value encountered in double_scalars func = lambda a: np.log(a) - sc.digamma(a) - s /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:2756: RuntimeWarning: invalid value encountered in log func = lambda a: np.log(a) - sc.digamma(a) - s WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:2757: RuntimeWarning: divide by zero encountered in double_scalars aest = (3-s + np.sqrt((s-3)**2 + 24*s)) / (12*s) /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:2756: RuntimeWarning: invalid value encountered in double_scalars func = lambda a: np.log(a) - sc.digamma(a) - s /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/scipy/stats/_continuous_distns.py:2756: RuntimeWarning: invalid value encountered in log func = lambda a: np.log(a) - sc.digamma(a) - s WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: There are no values between thresholds in cm_future, but there should be some after bias adjustment. Using nonparametric quantile mapping (instead of parametric one) between the values in cm_future and the pseudo future observations to calculate the values between threshold. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead. WARNING:root:Step 6: Goodness of fit not good enough according to ks-test. Using nonparametric quantile mapping instead.
tasrange_debiaser_ISIMIP = ISIMIP.from_variable(variable = 'tasrange')
tasrange_val_debiased_ISIMIP = pr_debiaser_ISIMIP.apply(tasrange_obs, tasrange_cm_hist, tasrange_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate, verbosity='ERRORS_ONLY', parallel=True)
tasrange_fut_debiased_ISIMIP = pr_debiaser_ISIMIP.apply(tasrange_obs, tasrange_cm_hist, tasrange_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future, verbosity='ERRORS_ONLY', parallel=True)
tasskew_debiaser_ISIMIP = ISIMIP.from_variable(variable = 'tasrange')
tasskew_val_debiased_ISIMIP = pr_debiaser_ISIMIP.apply(tasskew_obs, tasskew_cm_hist, tasskew_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate, verbosity='ERRORS_ONLY', parallel=True)
tasskew_fut_debiased_ISIMIP = pr_debiaser_ISIMIP.apply(tasskew_obs, tasskew_cm_hist, tasskew_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future, verbosity='ERRORS_ONLY', parallel=True)
tasmin_val_debiased_ISIMIP, tasmax_val_debiased_ISIMIP = utils.get_tasmin_tasmax(tas = tas_val_debiased_ISIMIP, tasrange = tasrange_val_debiased_ISIMIP, tasskew = tasskew_val_debiased_ISIMIP)
tasmin_fut_debiased_ISIMIP, tasmax_fut_debiased_ISIMIP = utils.get_tasmin_tasmax(tas = tas_fut_debiased_ISIMIP, tasrange = tasrange_fut_debiased_ISIMIP, tasskew = tasskew_fut_debiased_ISIMIP)
CDFt
tas_debiaser_CDFT = CDFt.from_variable(variable = 'tas')
tas_val_debiased_CDFT = tas_debiaser_CDFT.apply(tas_obs, tas_cm_hist, tas_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate, verbosity='ERRORS_ONLY')
tas_fut_debiased_CDFT = tas_debiaser_CDFT.apply(tas_obs, tas_cm_hist, tas_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future, verbosity='ERRORS_ONLY')
100%|███████████████████████████████████████████| 225/225 [00:10<00:00, 21.21it/s] 100%|███████████████████████████████████████████| 225/225 [00:34<00:00, 6.48it/s]
pr_debiaser_CDFT = CDFt.from_variable(variable = 'pr')
pr_val_debiased_CDFT = pr_debiaser_CDFT.apply(pr_obs, pr_cm_hist, pr_cm_validate, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_validate, verbosity='ERRORS_ONLY')
pr_fut_debiased_CDFT = pr_debiaser_CDFT.apply(pr_obs, pr_cm_hist, pr_cm_future, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_future, verbosity='ERRORS_ONLY')
100%|███████████████████████████████████████████| 225/225 [00:10<00:00, 21.99it/s] 100%|███████████████████████████████████████████| 225/225 [00:34<00:00, 6.54it/s]
Quantile Delta Mapping
tas_debiaser_QDM = QuantileDeltaMapping.from_variable(variable = "tas")
tas_val_debiased_QDM = tas_debiaser_QDM.apply(tas_obs, tas_cm_hist, tas_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate, verbosity='ERRORS_ONLY', parallel=True)
tas_fut_debiased_QDM = tas_debiaser_QDM.apply(tas_obs, tas_cm_hist, tas_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future, verbosity='ERRORS_ONLY', parallel=True)
pr_debiaser_QDM = QuantileDeltaMapping.from_variable(variable = "pr")
pr_val_debiased_QDM = pr_debiaser_QDM.apply(pr_obs, pr_cm_hist, pr_cm_validate, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_validate, verbosity='ERRORS_ONLY', parallel=True)
pr_fut_debiased_QDM = pr_debiaser_QDM.apply(pr_obs, pr_cm_hist, pr_cm_future, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_future, verbosity='ERRORS_ONLY', parallel=True)
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/utils/_math_utils.py:347: RuntimeWarning: invalid value encountered in log return -np.sum(scipy.stats.gamma.logpdf(x, a=params[0], scale=params[1])) - nr_censored_x * np.log( /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/utils/_math_utils.py:347: RuntimeWarning: invalid value encountered in log return -np.sum(scipy.stats.gamma.logpdf(x, a=params[0], scale=params[1])) - nr_censored_x * np.log( /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/utils/_math_utils.py:347: RuntimeWarning: invalid value encountered in log return -np.sum(scipy.stats.gamma.logpdf(x, a=params[0], scale=params[1])) - nr_censored_x * np.log( /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/utils/_math_utils.py:347: RuntimeWarning: invalid value encountered in log return -np.sum(scipy.stats.gamma.logpdf(x, a=params[0], scale=params[1])) - nr_censored_x * np.log( /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/debias/_quantile_delta_mapping.py:268: RuntimeWarning: divide by zero encountered in true_divide cm_future * self.distribution.ppf(tau_t, *fit_obs) / self.distribution.ppf(tau_t, *fit_cm_hist) /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/utils/_math_utils.py:347: RuntimeWarning: invalid value encountered in log return -np.sum(scipy.stats.gamma.logpdf(x, a=params[0], scale=params[1])) - nr_censored_x * np.log( /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/utils/_math_utils.py:347: RuntimeWarning: invalid value encountered in log return -np.sum(scipy.stats.gamma.logpdf(x, a=params[0], scale=params[1])) - nr_censored_x * np.log( /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/utils/_math_utils.py:347: RuntimeWarning: invalid value encountered in log return -np.sum(scipy.stats.gamma.logpdf(x, a=params[0], scale=params[1])) - nr_censored_x * np.log( /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/utils/_math_utils.py:347: RuntimeWarning: invalid value encountered in log return -np.sum(scipy.stats.gamma.logpdf(x, a=params[0], scale=params[1])) - nr_censored_x * np.log( /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/debias/_quantile_delta_mapping.py:268: RuntimeWarning: divide by zero encountered in true_divide cm_future * self.distribution.ppf(tau_t, *fit_obs) / self.distribution.ppf(tau_t, *fit_cm_hist)
Quantile Mapping
tas_debiaser_QM = QuantileMapping.from_variable(variable = "tas")
tas_val_debiased_QM = tas_debiaser_QM.apply(tas_obs, tas_cm_hist, tas_cm_validate, verbosity='INFO')
tas_fut_debiased_QM = tas_debiaser_QM.apply(tas_obs, tas_cm_hist, tas_cm_future, verbosity='INFO')
pr_debiaser_QM = QuantileMapping.from_variable(variable = "pr")
pr_val_debiased_QM = pr_debiaser_QM.apply(pr_obs, pr_cm_hist, pr_cm_validate, verbosity='INFO')
pr_fut_debiased_QM = pr_debiaser_QM.apply(pr_obs, pr_cm_hist, pr_cm_future, verbosity='INFO')
INFO:root:----- Running debiasing for variable: Daily mean near-surface air temperature ----- 100%|█████████████████████████████████████████| 225/225 [00:00<00:00, 1060.98it/s] INFO:root:----- Running debiasing for variable: Daily mean near-surface air temperature ----- 100%|██████████████████████████████████████████| 225/225 [00:00<00:00, 874.85it/s] INFO:root:----- Running debiasing for variable: Daily mean precipitation ----- 100%|██████████████████████████████████████████| 225/225 [00:01<00:00, 223.61it/s] INFO:root:----- Running debiasing for variable: Daily mean precipitation ----- 100%|███████████████████████████████████████████| 225/225 [00:02<00:00, 99.52it/s]
There are essentially two types of analysis that the evaluation module enables you to conduct: one ist an analysis of the statistical properties of the debiased variable - this includes the marginal bias of descriptive statistics such as the mean, or 5th and 95th percentile, as well as the difference in spatial correlation structure. The other is an analysis of threshold metrics. What we mean by the term threshold metrics here, is that specific thresholds are often relevent for impact studies and other applications - for example the number of frost days for crop modelling. Here we can analyse not only the marginal bias, but also the spell length of days of threshold exceedance, or their spatial extent and spatiotemporal cluster size.
The following table gives an overview, which types of analysis are currently available in this release of the package.
Statistical Properties | Threshold Metrics | |
---|---|---|
Marginal | x - marginal bias | x - marginal bias |
Temporal | x - spell length | |
Spatial | x - RMSE corr matrices | x - spatial extent |
Spatioteporal | x - cluster size | |
Multivariate | x - correlation |
For each variable, we implemented a number of standard metrics, inspired by the climate extreme indices https://www.climdex.org/learn/indices/
tas_metrics = [warm_days, cold_days]
pr_metrics = [dry_days, wet_days, R10mm, R20mm]
tasmin_metrics = [tropical_nights, frost_days]
tasmax_metrics = [summer_days, icing_days]
Users can have a look at the properties of different threshold metrics:
dry_days
AccumulativeThresholdMetric(threshold_value=1.1574074074074073e-05, threshold_type='lower', name='Dry days \n (< 1 mm/day)', variable='pr')
And edit their properties:
dry_days.threshold_type ="lower"
In addition, the user can define entirely new metrics:
R95p = AccumulativeThresholdMetric(name = '95th percentile \n precipitation', variable = 'pr',
threshold_value = np.quantile(np.ndarray.flatten(pr_obs), 0.95), threshold_type = 'higher')
R99p = AccumulativeThresholdMetric(name = '99th percentile \n precipitation', variable = 'pr',
threshold_value = np.quantile(np.ndarray.flatten(pr_obs), 0.99), threshold_type = 'higher')
We start by investigating the marginal bias of our chosen metrics and descriptive statistics. The user can start with a broader overview plot (see below) and then further investigate that have a particularly large bias, or a particularly large spread across locations, to investigate further.
The plot below shows the distribution of marginal (location-wise) biases across locations. For the mean, this is for example calculated as follows:
$$ \text{Bias}_{ij} = 100 * \frac{\bar{\text{tas}}_{obs, ij} - \bar{\text{tas}}_{cm, ij}}{\bar{\text{tas}}_{obs, ij}} $$tas_marginal_bias_data = marginal.calculate_marginal_bias(metrics = tas_metrics,
remove_outliers = True,
obs_data = tas_obs_validate,
raw = tas_cm_validate,
CDFT = tas_val_debiased_CDFT,
ISIMIP = tas_val_debiased_ISIMIP,
QM = tas_val_debiased_QM, QDM = tas_val_debiased_QDM)
tas_marginal_bias_plot = marginal.plot_marginal_bias(variable = 'tas', bias_df = tas_marginal_bias_data)
plt.show()
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/evaluate/marginal.py:54: RuntimeWarning: invalid value encountered in true_divide bias = 100 * (obs_metric - cm_metric) / obs_metric /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/evaluate/marginal.py:54: RuntimeWarning: divide by zero encountered in true_divide bias = 100 * (obs_metric - cm_metric) / obs_metric WARNING:root:raw: Bias of ThresholdMetric(threshold_value=275, threshold_type='lower', name='Mean cold days (K)', variable='tas') > 1000% at on location at least. Because remove_outliers is set to True, the ThresholdMetric(threshold_value=275, threshold_type='lower', name='Mean cold days (K)', variable='tas') bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. WARNING:root:CDFT: Bias of ThresholdMetric(threshold_value=275, threshold_type='lower', name='Mean cold days (K)', variable='tas') > 1000% at on location at least. Because remove_outliers is set to True, the ThresholdMetric(threshold_value=275, threshold_type='lower', name='Mean cold days (K)', variable='tas') bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. WARNING:root:ISIMIP: Bias of ThresholdMetric(threshold_value=295, threshold_type='higher', name='Mean warm days (K)', variable='tas') > 1000% at on location at least. Because remove_outliers is set to True, the ThresholdMetric(threshold_value=295, threshold_type='higher', name='Mean warm days (K)', variable='tas') bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. WARNING:root:ISIMIP: Bias of ThresholdMetric(threshold_value=275, threshold_type='lower', name='Mean cold days (K)', variable='tas') > 1000% at on location at least. Because remove_outliers is set to True, the ThresholdMetric(threshold_value=275, threshold_type='lower', name='Mean cold days (K)', variable='tas') bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. WARNING:root:QM: Bias of ThresholdMetric(threshold_value=275, threshold_type='lower', name='Mean cold days (K)', variable='tas') > 1000% at on location at least. Because remove_outliers is set to True, the ThresholdMetric(threshold_value=275, threshold_type='lower', name='Mean cold days (K)', variable='tas') bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. WARNING:root:QDM: Bias of ThresholdMetric(threshold_value=295, threshold_type='higher', name='Mean warm days (K)', variable='tas') > 1000% at on location at least. Because remove_outliers is set to True, the ThresholdMetric(threshold_value=295, threshold_type='higher', name='Mean warm days (K)', variable='tas') bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. WARNING:root:QDM: Bias of ThresholdMetric(threshold_value=275, threshold_type='lower', name='Mean cold days (K)', variable='tas') > 1000% at on location at least. Because remove_outliers is set to True, the ThresholdMetric(threshold_value=275, threshold_type='lower', name='Mean cold days (K)', variable='tas') bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser.
If remove_outliers is set to True, the function above will not plot distributions that contain inf values, or percentage biases larger than 1000%. Those biases call still be calculated by investigating the dataframe when setting remove_outliers to False.
We can further visually inspect the bias correction of temperature by plotting the associated histogram at one location:
tas_marginal_histograms = marginal.plot_histogram(variable = 'tas', data_obs = tas_obs_validate[:, 0,0],
raw = tas_cm_validate[:, 0,0], CDFT = tas_val_debiased_CDFT[:, 0,0],
ISIMIP = tas_val_debiased_ISIMIP[:, 0,0],
QDM = tas_val_debiased_QDM[:, 0,0],
QM = tas_val_debiased_QM[:, 0,0])
plt.show()
pr_marginal_bias_data = marginal.calculate_marginal_bias(metrics = pr_metrics, obs_data = pr_obs_validate,
raw = pr_cm_validate,
CDFT = pr_val_debiased_CDFT,
ISIMIP = pr_val_debiased_ISIMIP,
QDM = pr_val_debiased_QDM,
QM = pr_val_debiased_QM)
pr_marginal_bias_plot = marginal.plot_marginal_bias(variable = 'pr', bias_df = pr_marginal_bias_data)
plt.show()
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/evaluate/marginal.py:54: RuntimeWarning: divide by zero encountered in true_divide bias = 100 * (obs_metric - cm_metric) / obs_metric WARNING:root:raw: Bias of AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', name='Extremely wet days \n (> 20 mm/day)', variable='pr') > 1000% at on location at least. Because remove_outliers is set to True, the AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', name='Extremely wet days \n (> 20 mm/day)', variable='pr') bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. WARNING:root:CDFT: Bias of low quantile > 1000% at on location at least. Because remove_outliers is set to True, the low quantile bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. WARNING:root:CDFT: Bias of AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', name='Extremely wet days \n (> 20 mm/day)', variable='pr') > 1000% at on location at least. Because remove_outliers is set to True, the AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', name='Extremely wet days \n (> 20 mm/day)', variable='pr') bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. WARNING:root:ISIMIP: Bias of AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', name='Extremely wet days \n (> 20 mm/day)', variable='pr') > 1000% at on location at least. Because remove_outliers is set to True, the AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', name='Extremely wet days \n (> 20 mm/day)', variable='pr') bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/evaluate/marginal.py:26: RuntimeWarning: overflow encountered in true_divide mean_bias = 100 * (np.mean(obs_data, axis=0) - np.mean(cm_data, axis=0)) / np.mean(obs_data, axis=0) WARNING:root:QDM: Bias of mean > 1000% at on location at least. Because remove_outliers is set to True, the mean bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. WARNING:root:QDM: Bias of high quantile > 1000% at on location at least. Because remove_outliers is set to True, the high quantile bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. WARNING:root:QDM: Bias of AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', name='Extremely wet days \n (> 20 mm/day)', variable='pr') > 1000% at on location at least. Because remove_outliers is set to True, the AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', name='Extremely wet days \n (> 20 mm/day)', variable='pr') bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser. WARNING:root:QM: Bias of AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', name='Extremely wet days \n (> 20 mm/day)', variable='pr') > 1000% at on location at least. Because remove_outliers is set to True, the AccumulativeThresholdMetric(threshold_value=0.0002314814814814815, threshold_type='higher', name='Extremely wet days \n (> 20 mm/day)', variable='pr') bias for this debiaser is not shown for the sake of readability. Set remove_outliers to False to include this debiaser.
For precipitation, the percentage bias of the mean is much larger than for temperature. Also, biases in very high or low percentiles (more extreme values) naturally tend to be larger. It is also advised to interprete percentage bias values with care.
Tasmin and tasmax are currently only bias corrected in ISIMIP:
tasmax_marginal_bias_data = marginal.calculate_marginal_bias(metrics = tasmax_metrics,
remove_outliers = False,
obs_data = tasmax_obs_validate,
raw = tasmax_cm_validate,
ISIMIP = tasmax_val_debiased_ISIMIP)
tasmax_marginal_bias_plot = marginal.plot_marginal_bias(variable = 'tasmax', bias_df = tasmax_marginal_bias_data)
plt.show()
/Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/evaluate/marginal.py:54: RuntimeWarning: divide by zero encountered in true_divide bias = 100 * (obs_metric - cm_metric) / obs_metric /Users/fionaspuler/opt/anaconda3/lib/python3.9/site-packages/ibicus/evaluate/marginal.py:54: RuntimeWarning: invalid value encountered in true_divide bias = 100 * (obs_metric - cm_metric) / obs_metric