This notebook demonstrates the evaluation of three debiasers, ISIMIP (Lange 2021), CDFt (Vrac 2017) and Quantile Delta Mapping (Cannon et al. 2015), applied to the variables tas (daily mean 2m surface temperature, K) and pr (total daily precipitation flux, kg/m2/s). ISIMIP is additionally applied to tasmin (daily minimum 2m surface temperature, K) and tasmax (daily maximum 2m surface temperature, K).

The evaluation module provides a set of functionalities that can help you assess the performance of your bias correction method. The notebook is structured along the following three components of the module:

**Testing assumptions of different debiasers**: Different debiasers rely on different assumptions - some are parametric, others non-parametric; some bias correct each day of the year separately, others are applied to all days of the year in the same way. This component checks some of these assumptions and helps the user rule out debiasers that are not fit for purpose in a specific application.

**Evaluating the bias corrected model on a validation period**: In order to assess the performance of a bias correction method, the bias corrected model data should be compared to observational / reanalysis data. This component provides insight into the correction of marginal biases, as well as temporal, spatial and spatiotemporal metrics.

**Investigating whether the climate change trend is preserved**: Bias correction methods can significantly modify the trend projected in the climate model simulation (Switanek 2017). Have a look at the documentation of individual debiasers to see whether the method is explicitly trend-preserving or not. This component shows the bias in trend of different metrics between the raw and bias corrected climate model.

This step requires you to have downloaded and preprocessed observational or reanalysis data (such as ERA5), as well as simulations of a climate model over a historical and a future (or application) period. Necessary preprocessing steps include regridding the datasets to the same area and grid in space and time, and checking for corrupted or missing data.
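As a minimal illustration of such checks, the sketch below verifies matching spatial grids and the absence of missing values for arrays of shape (time, lat, lon). The function name and dummy arrays are illustrative, not part of ibicus:

```python
import numpy as np

def check_preprocessed(obs, cm_hist):
    """Basic sanity checks on preprocessed arrays of shape (time, lat, lon)."""
    # After regridding, the spatial grids of observations and model must match
    assert obs.shape[1:] == cm_hist.shape[1:], "Spatial grids differ"
    # No corrupted or missing values should remain
    assert not np.isnan(obs).any(), "Observations contain NaNs"
    assert not np.isnan(cm_hist).any(), "Model data contain NaNs"
    return True

# Illustrative dummy data: 100 resp. 120 days on a 3x3 grid
obs = np.random.rand(100, 3, 3)
cm_hist = np.random.rand(120, 3, 3)
print(check_preprocessed(obs, cm_hist))  # True
```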

For the purpose of this demonstration, we will work with the testing data uploaded to the GitHub repository of ibicus. This data is already preprocessed and numpy arrays of data have been extracted. We can get it using:

```
!wget https://github.com/ecmwf-projects/ibicus/raw/main/notebooks/testing_data.zip -c
!unzip testing_data.zip
```

In [1]:

```
import math

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import iris
from scipy.stats import norm, laplace, gamma, beta

from ibicus import utils
from ibicus.variables import *
from ibicus.utils import gen_PrecipitationHurdleModel, gen_PrecipitationGammaLeftCensoredModel
from ibicus.debias import ECDFM, ISIMIP, QuantileMapping, DeltaChange, CDFt
from ibicus.debias import QuantileDeltaMapping, ScaledDistributionMapping, LinearScaling
from ibicus.evaluate import assumptions, correlation, marginal, multivariate, trend
from ibicus.evaluate.metrics import *
```

In [2]:

```
def get_data(variable, path = "testing_data/", split_ratio = 0.7037):
    # Load the pre-extracted numpy arrays and split the historical period
    # into a training part and a validation part
    data = np.load(f"{path}{variable}.npz", allow_pickle = True)
    cut_off = math.floor(data["obs"].shape[0] * split_ratio)
    obs = data["obs"][0:cut_off, :, :]
    obs_validate = data["obs"][cut_off:, :, :]
    cm_hist = data["cm_hist"][0:cut_off, :, :]
    cm_validate = data["cm_hist"][cut_off:, :, :]
    cm_future = data["cm_future"]
    time_obs = data["time_obs"][0:cut_off]
    time_obs_validate = data["time_obs"][cut_off:]
    time_cm_hist = data["time_cm_hist"][0:cut_off]
    time_cm_validate = data["time_cm_hist"][cut_off:]
    time_cm_future = data["time_cm_future"]
    return obs, obs_validate, cm_hist, cm_validate, cm_future, time_obs, time_obs_validate, time_cm_hist, time_cm_validate, time_cm_future
```

Initializing the testing data and dates for tas, pr, tasmin, tasmax:

In [3]:

```
tas_obs, tas_obs_validate, tas_cm_hist, tas_cm_validate, tas_cm_future, tas_time_obs, tas_time_obs_validate, tas_time_cm_hist, tas_time_cm_validate, tas_time_cm_future = get_data("tas")
tasmin_obs, tasmin_obs_validate, tasmin_cm_hist, tasmin_cm_validate, tasmin_cm_future, tasmin_time_obs, tasmin_time_obs_validate, tasmin_time_cm_hist, tasmin_time_cm_validate, tasmin_time_cm_future = get_data("tasmin")
tasmax_obs, tasmax_obs_validate, tasmax_cm_hist, tasmax_cm_validate, tasmax_cm_future, tasmax_time_obs, tasmax_time_obs_validate, tasmax_time_cm_hist, tasmax_time_cm_validate, tasmax_time_cm_future = get_data("tasmax")
pr_obs, pr_obs_validate, pr_cm_hist, pr_cm_validate, pr_cm_future, pr_time_obs, pr_time_obs_validate, pr_time_cm_hist, pr_time_cm_validate, pr_time_cm_future = get_data("pr")
```

**Calculating tasmin and tasmax**: currently only ISIMIP covers tasmin and tasmax. It does not bias correct these two variables directly, but rather bias corrects the variables tasrange and tasskew, where

$$ \text{tasrange} = \text{tasmax} - \text{tasmin} $$

and

$$ \text{tasskew} = \frac{\text{tas} - \text{tasmin}}{\text{tasrange}} $$

The utils function `get_tasrange_tasskew` calculates tasrange and tasskew from tas, tasmin and tasmax; `get_tasmin_tasmax` performs the inverse transformation.
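As a direct illustration of these two definitions, a hand-rolled numpy sketch (not the ibicus utils implementation):

```python
import numpy as np

def tasrange_tasskew(tas, tasmin, tasmax):
    # tasrange = tasmax - tasmin
    tasrange = tasmax - tasmin
    # tasskew = (tas - tasmin) / tasrange
    tasskew = (tas - tasmin) / tasrange
    return tasrange, tasskew

# A single illustrative value in Kelvin
tas, tasmin, tasmax = np.array([285.0]), np.array([280.0]), np.array([290.0])
tasrange, tasskew = tasrange_tasskew(tas, tasmin, tasmax)
print(tasrange, tasskew)  # [10.] [0.5]
```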

In [4]:

```
tasrange_obs, tasskew_obs = utils.get_tasrange_tasskew(tas = tas_obs, tasmin = tasmin_obs, tasmax = tasmax_obs)
tasrange_obs_validate, tasskew_obs_validate = utils.get_tasrange_tasskew(tas = tas_obs_validate, tasmin = tasmin_obs_validate, tasmax = tasmax_obs_validate)
tasrange_cm_hist, tasskew_cm_hist = utils.get_tasrange_tasskew(tas = tas_cm_hist, tasmin = tasmin_cm_hist, tasmax = tasmax_cm_hist)
tasrange_cm_validate, tasskew_cm_validate = utils.get_tasrange_tasskew(tas = tas_cm_validate, tasmin = tasmin_cm_validate, tasmax = tasmax_cm_validate)
tasrange_cm_future, tasskew_cm_future = utils.get_tasrange_tasskew(tas = tas_cm_future, tasmin = tasmin_cm_future, tasmax = tasmax_cm_future)
```

Different bias correction methods rely on different assumptions, as described above. A detailed overview of the assumptions associated with a specific bias correction method can be found in the documentation of each debiaser. For the sake of demonstration, this notebook investigates goodness of fit:

For all parametric methods, distributions are fitted to the data. Default distributions for each variable are specified in the individual debiasers. We assess the following two components:

1. Is the fit of the default distribution 'good enough' or should a different distribution be used?
2. Is there any seasonality in the data that should be accounted for, for example by applying a 'running window mode' (meaning that the bias correction is fitted separately for different parts of the year, i.e. windows)?

**Step 1: Calculating the AIC and plotting the worst fit**

The Akaike Information Criterion (AIC) is a statistical method for the comparative evaluation of different statistical models. The AIC weighs model complexity against goodness of fit, with a lower AIC indicating a 'better' model, and is computed as follows:

$$ AIC = 2 \frac{k}{n} - 2 \frac{l}{n} $$

where $k$ is the number of fitted parameters, $n$ the number of data points and $l$ the log likelihood function, calculated in the following way:

$$ l = - \frac{n}{2} \left( 1 + \ln(2 \pi) + \ln \left( \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2 \right) \right) $$

To demonstrate, we test the fit of the normal and the beta distribution to the temperature data:
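Before using the module function below, the normalised AIC can be sketched by hand with scipy. This is a hedged illustration, not the ibicus implementation: it evaluates the log likelihood directly as the sum of log densities and counts parameters via the length of scipy's fitted parameter tuple:

```python
import numpy as np
import scipy.stats

def normalised_aic(data, distribution):
    """Normalised AIC: 2k/n - 2l/n for a fitted scipy distribution."""
    params = distribution.fit(data)
    k = len(params)                                  # number of fitted parameters
    n = len(data)
    l = np.sum(distribution.logpdf(data, *params))   # log likelihood of the fit
    return 2 * k / n - 2 * l / n

rng = np.random.default_rng(0)
data = rng.normal(loc=283.0, scale=5.0, size=1000)
# On normally distributed data, the normal fit should score a lower (better) AIC
# than a poorly matching candidate such as the Cauchy distribution
print(normalised_aic(data, scipy.stats.norm) < normalised_aic(data, scipy.stats.cauchy))
```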

In [5]:

```
tas_obs_aic = assumptions.calculate_aic('tas', tas_obs_validate, scipy.stats.norm, scipy.stats.beta)
```

In [6]:

```
assumptions.plot_aic(variable = 'tas', aic_values = tas_obs_aic)
```

Out[6]:

[Text(0.5, 1.0, 'Distribution of AIC values across locations \n Daily mean near-surface air temperature')]

In [7]:

```
tas_obs_worst_fit = assumptions.plot_fit_worst_aic(variable = 'tas', dataset = tas_obs,
data_type = 'Observed', distribution = scipy.stats.beta,
nr_bins = 100, aic_values = tas_obs_aic)
```

In [8]:

```
tas_obs_plot_gof = assumptions.plot_quantile_residuals(variable = 'tas', dataset = tas_obs[:,0,0], data_type = 'observation data',
distribution = scipy.stats.norm)
```

**Conclusion:**

Overall the distributions do not seem to fit the temperature data too well, mainly due to the bi-modal distribution of temperature. Based on the quantile residuals, we conclude that there is a strong seasonality at play. One option to address this issue would be to build a statistical model that explicitly includes the season as a covariate. An alternative approach is to use a so-called 'running-window-mode' (applied for example in the QDM and ISIMIP debiasers), where a separate mapping is conducted for each day (or month) of the year, which also takes care of the seasonality. For more details on the running-window-mode, have a look at the documentation of different debiasers.

The following section initializes and applies the debiasers ISIMIP, CDFt, Quantile Mapping (QM) and Quantile Delta Mapping (QDM) to the chosen variables. For a more detailed explanation of the bias correction methods we refer to the documentation, and for some examples on how to customize different debiasers, we refer to the notebook 03 Adjusting Debiasers.

**ISIMIP**

In [9]:

```
tas_debiaser_ISIMIP = ISIMIP.from_variable(variable = 'tas')
tas_val_debiased_ISIMIP = tas_debiaser_ISIMIP.apply(tas_obs, tas_cm_hist, tas_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate, verbosity='ERRORS_ONLY', parallel=True)
tas_fut_debiased_ISIMIP = tas_debiaser_ISIMIP.apply(tas_obs, tas_cm_hist, tas_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future, verbosity='ERRORS_ONLY', parallel=True)
```

In [10]:

```
pr_debiaser_ISIMIP = ISIMIP.from_variable(variable = 'pr')
pr_val_debiased_ISIMIP = pr_debiaser_ISIMIP.apply(pr_obs, pr_cm_hist, pr_cm_validate, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_validate, verbosity='ERRORS_ONLY', parallel=True)
pr_fut_debiased_ISIMIP = pr_debiaser_ISIMIP.apply(pr_obs, pr_cm_hist, pr_cm_future, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_future, verbosity='ERRORS_ONLY', parallel=True)
```

In [11]:

```
tasrange_debiaser_ISIMIP = ISIMIP.from_variable(variable = 'tasrange')
tasrange_val_debiased_ISIMIP = tasrange_debiaser_ISIMIP.apply(tasrange_obs, tasrange_cm_hist, tasrange_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate, verbosity='ERRORS_ONLY', parallel=True)
tasrange_fut_debiased_ISIMIP = tasrange_debiaser_ISIMIP.apply(tasrange_obs, tasrange_cm_hist, tasrange_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future, verbosity='ERRORS_ONLY', parallel=True)
```

In [12]:

```
tasskew_debiaser_ISIMIP = ISIMIP.from_variable(variable = 'tasskew')
tasskew_val_debiased_ISIMIP = tasskew_debiaser_ISIMIP.apply(tasskew_obs, tasskew_cm_hist, tasskew_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate, verbosity='ERRORS_ONLY', parallel=True)
tasskew_fut_debiased_ISIMIP = tasskew_debiaser_ISIMIP.apply(tasskew_obs, tasskew_cm_hist, tasskew_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future, verbosity='ERRORS_ONLY', parallel=True)
```

In [13]:

```
tasmin_val_debiased_ISIMIP, tasmax_val_debiased_ISIMIP = utils.get_tasmin_tasmax(tas = tas_val_debiased_ISIMIP, tasrange = tasrange_val_debiased_ISIMIP, tasskew = tasskew_val_debiased_ISIMIP)
tasmin_fut_debiased_ISIMIP, tasmax_fut_debiased_ISIMIP = utils.get_tasmin_tasmax(tas = tas_fut_debiased_ISIMIP, tasrange = tasrange_fut_debiased_ISIMIP, tasskew = tasskew_fut_debiased_ISIMIP)
```

**CDFt**

In [14]:

```
tas_debiaser_CDFT = CDFt.from_variable(variable = 'tas')
tas_val_debiased_CDFT = tas_debiaser_CDFT.apply(tas_obs, tas_cm_hist, tas_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate, verbosity='ERRORS_ONLY')
tas_fut_debiased_CDFT = tas_debiaser_CDFT.apply(tas_obs, tas_cm_hist, tas_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future, verbosity='ERRORS_ONLY')
```

In [15]:

```
pr_debiaser_CDFT = CDFt.from_variable(variable = 'pr')
pr_val_debiased_CDFT = pr_debiaser_CDFT.apply(pr_obs, pr_cm_hist, pr_cm_validate, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_validate, verbosity='ERRORS_ONLY')
pr_fut_debiased_CDFT = pr_debiaser_CDFT.apply(pr_obs, pr_cm_hist, pr_cm_future, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_future, verbosity='ERRORS_ONLY')
```

**Quantile Delta Mapping**

In [16]:

```
tas_debiaser_QDM = QuantileDeltaMapping.from_variable(variable = "tas")
tas_val_debiased_QDM = tas_debiaser_QDM.apply(tas_obs, tas_cm_hist, tas_cm_validate, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_validate, verbosity='ERRORS_ONLY', parallel=True)
tas_fut_debiased_QDM = tas_debiaser_QDM.apply(tas_obs, tas_cm_hist, tas_cm_future, time_obs = tas_time_obs, time_cm_hist = tas_time_cm_hist, time_cm_future = tas_time_cm_future, verbosity='ERRORS_ONLY', parallel=True)
```

In [17]:

```
pr_debiaser_QDM = QuantileDeltaMapping.from_variable(variable = "pr")
pr_val_debiased_QDM = pr_debiaser_QDM.apply(pr_obs, pr_cm_hist, pr_cm_validate, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_validate, verbosity='ERRORS_ONLY', parallel=True)
pr_fut_debiased_QDM = pr_debiaser_QDM.apply(pr_obs, pr_cm_hist, pr_cm_future, time_obs = pr_time_obs, time_cm_hist = pr_time_cm_hist, time_cm_future = pr_time_cm_future, verbosity='ERRORS_ONLY', parallel=True)
```

**Quantile Mapping**

In [18]:

```
tas_debiaser_QM = QuantileMapping.from_variable(variable = "tas")
tas_val_debiased_QM = tas_debiaser_QM.apply(tas_obs, tas_cm_hist, tas_cm_validate, verbosity='INFO')
tas_fut_debiased_QM = tas_debiaser_QM.apply(tas_obs, tas_cm_hist, tas_cm_future, verbosity='INFO')
pr_debiaser_QM = QuantileMapping.from_variable(variable = "pr")
pr_val_debiased_QM = pr_debiaser_QM.apply(pr_obs, pr_cm_hist, pr_cm_validate, verbosity='INFO')
pr_fut_debiased_QM = pr_debiaser_QM.apply(pr_obs, pr_cm_hist, pr_cm_future, verbosity='INFO')
```

There are essentially two types of analysis that the evaluation module enables you to conduct. One is an analysis of the statistical properties of the debiased variable - this includes the marginal bias of descriptive statistics such as the mean, or the 5th and 95th percentile, as well as the difference in spatial correlation structure. The other is an analysis of threshold metrics. By threshold metrics we mean specific thresholds that are often relevant for impact studies and other applications - for example the number of frost days for crop modelling. Here we can analyse not only the marginal bias, but also the spell length of threshold exceedances, their spatial extent and their spatiotemporal cluster size.
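To illustrate what is meant by spell length, a small numpy sketch counting runs of consecutive threshold exceedances (hand-rolled for illustration, not the ibicus implementation):

```python
import numpy as np

def spell_lengths(data, threshold):
    """Lengths of consecutive runs of values exceeding a threshold."""
    exceed = np.asarray(data) > threshold
    # Pad with False so every run has a well-defined start and end
    padded = np.concatenate(([False], exceed, [False]))
    # Indices where the exceedance state flips: alternating starts and ends
    changes = np.flatnonzero(np.diff(padded.astype(int)))
    starts, ends = changes[::2], changes[1::2]
    return ends - starts

# Two spells above 1: one of length 2, one of length 3
print(spell_lengths([0, 2, 3, 0, 5, 5, 5, 0], threshold=1))  # [2 3]
```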

The following table gives an overview of the types of analysis currently available in this release of the package.

| | Statistical Properties | Threshold Metrics |
|---|---|---|
| Marginal | x - marginal bias | x - marginal bias |
| Temporal | | x - spell length |
| Spatial | x - RMSE corr matrices | x - spatial extent |
| Spatiotemporal | | x - cluster size |
| Multivariate | x - correlation | |

In [19]:

```
tas_metrics = [warm_days, cold_days]
pr_metrics = [dry_days, wet_days, R10mm, R20mm]
tasmin_metrics = [tropical_nights, frost_days]
tasmax_metrics = [summer_days, icing_days]
```

Users can have a look at the properties of different threshold metrics:

In [20]:

```
dry_days
```

Out[20]:

AccumulativeThresholdMetric(threshold_value=1.1574074074074073e-05, threshold_type='lower', name='Dry days \n (< 1 mm/day)', variable='pr')

And edit their properties:

In [21]:

```
dry_days.threshold_type ="lower"
```

In addition, the user can define entirely new metrics:

In [22]:

```
R95p = AccumulativeThresholdMetric(name = '95th percentile \n precipitation', variable = 'pr',
threshold_value = np.quantile(np.ndarray.flatten(pr_obs), 0.95), threshold_type = 'higher')
R99p = AccumulativeThresholdMetric(name = '99th percentile \n precipitation', variable = 'pr',
threshold_value = np.quantile(np.ndarray.flatten(pr_obs), 0.99), threshold_type = 'higher')
```

We start by investigating the marginal bias of our chosen metrics and descriptive statistics. The user can start with a broad overview plot (see below) and then further investigate metrics that show a particularly large bias, or a particularly large spread across locations.

The plot below shows the distribution of marginal (location-wise) biases across locations. For the mean, this is for example calculated as follows:

$$ \text{Bias}_{ij} = 100 \cdot \frac{\bar{\text{tas}}_{obs, ij} - \bar{\text{tas}}_{cm, ij}}{\bar{\text{tas}}_{obs, ij}} $$

In [24]:

```
tas_marginal_bias_data = marginal.calculate_marginal_bias(metrics = tas_metrics,
remove_outliers = True,
obs_data = tas_obs_validate,
raw = tas_cm_validate,
CDFT = tas_val_debiased_CDFT,
ISIMIP = tas_val_debiased_ISIMIP,
QM = tas_val_debiased_QM, QDM = tas_val_debiased_QDM)
tas_marginal_bias_plot = marginal.plot_marginal_bias(variable = 'tas', bias_df = tas_marginal_bias_data)
plt.show()
```
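The percentage bias of the mean can be sketched by hand for arrays of shape (time, lat, lon). This is an illustrative reimplementation of the formula above only; `marginal.calculate_marginal_bias` handles this, and more, internally:

```python
import numpy as np

def mean_percentage_bias(obs, cm):
    """Location-wise percentage bias of the mean, returned with shape (lat, lon)."""
    obs_mean = obs.mean(axis=0)  # time average at each grid point
    cm_mean = cm.mean(axis=0)
    return 100 * (obs_mean - cm_mean) / obs_mean

# Illustrative dummy data: a constant 10-unit offset on a 2x2 grid
obs = np.full((10, 2, 2), 200.0)
cm = np.full((10, 2, 2), 190.0)
print(mean_percentage_bias(obs, cm))  # 5% bias everywhere
```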

If remove_outliers is set to True, the function above will not plot distributions that contain inf values, or percentage biases larger than 1000%. Those biases can still be inspected in the returned dataframe by setting remove_outliers to False.

We can further visually inspect the bias correction of temperature by plotting the associated histogram at one location:

In [25]:

```
tas_marginal_histograms = marginal.plot_histogram(variable = 'tas', data_obs = tas_obs_validate[:, 0,0],
raw = tas_cm_validate[:, 0,0], CDFT = tas_val_debiased_CDFT[:, 0,0],
ISIMIP = tas_val_debiased_ISIMIP[:, 0,0],
QDM = tas_val_debiased_QDM[:, 0,0],
QM = tas_val_debiased_QM[:, 0,0])
plt.show()
```

In [26]:

```
pr_marginal_bias_data = marginal.calculate_marginal_bias(metrics = pr_metrics, obs_data = pr_obs_validate,
raw = pr_cm_validate,
CDFT = pr_val_debiased_CDFT,
ISIMIP = pr_val_debiased_ISIMIP,
QDM = pr_val_debiased_QDM,
QM = pr_val_debiased_QM)
pr_marginal_bias_plot = marginal.plot_marginal_bias(variable = 'pr', bias_df = pr_marginal_bias_data)
plt.show()
```

For precipitation, the percentage bias of the mean is much larger than for temperature. Biases in very high or low percentiles (more extreme values) also naturally tend to be larger. Percentage bias values should therefore be interpreted with care.

Tasmin and tasmax are currently only bias corrected in ISIMIP:

In [27]:

```
tasmax_marginal_bias_data = marginal.calculate_marginal_bias(metrics = tasmax_metrics,
remove_outliers = False,
obs_data = tasmax_obs_validate,
raw = tasmax_cm_validate,
ISIMIP = tasmax_val_debiased_ISIMIP)
tasmax_marginal_bias_plot = marginal.plot_marginal_bias(variable = 'tasmax', bias_df = tasmax_marginal_bias_data)
plt.show()
```