xskillscore-tutorial

Welcome to the xskillscore tutorial.

This was created for a talk at the Data Science Study Group: South Florida on April 1st, 2020. The slides associated with the talk can be found here.

The repository for this tutorial is hosted on GitHub here: xskillscore-tutorial.

Motivation for xskillscore

xskillscore provides a one-stop shop for metrics used in verification of forecasts.

It is an extension of xarray, a library for working with labelled n-dimensional arrays. Find out more about xarray here.

History of xskillscore

xskillscore was developed by Ray Bell while at the University of Miami during the SubX project in 2018.

In 2019, Aaron Spring, Andrew Huang and Riley Brady greatly improved xskillscore. Aaron, Andrew and Riley provided upstream fixes and enhancements of xskillscore, as it is used extensively in climpred.

xskillscore overview

The verification metrics in xskillscore are split into two types: deterministic and probabilistic.

Deterministic metrics consist of correlation metrics (e.g. Pearson's r) and distance metrics (e.g. root-mean-square error). These metrics adapt the implementations in scikit-learn and scipy.stats.
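To make the two families concrete, here is a minimal NumPy sketch of one distance metric (RMSE) and one correlation metric (Pearson's r) on made-up data. In practice you would call `xs.rmse` and `xs.pearson_r` on xarray objects; this just shows the arithmetic those metrics perform.

```python
import numpy as np

# Hypothetical observation/forecast pairs (values invented for illustration).
obs = np.array([3.0, 5.0, 2.0, 8.0])
fct = np.array([2.5, 5.5, 2.0, 7.0])

# Root-mean-square error: square root of the mean squared difference.
rmse = np.sqrt(np.mean((fct - obs) ** 2))

# Pearson's r: correlation between forecast and observation
# (np.corrcoef returns the 2x2 correlation matrix).
pearson_r = np.corrcoef(fct, obs)[0, 1]
```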

Probabilistic metrics can be calculated when the forecast consists of multiple forecasts for the same target. Examples include the Continuous Ranked Probability Score (CRPS) and the Brier Score.
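For intuition, the empirical CRPS of an ensemble can be written as E|X − y| − ½·E|X − X′|, where X and X′ are members drawn from the ensemble and y is the observation. The sketch below evaluates that estimator in NumPy on invented numbers; the library's `xs.crps_ensemble` handles this (and more) on xarray objects.

```python
import numpy as np

# A single observed value and a hypothetical 5-member ensemble forecast
# (numbers invented for illustration).
obs = 4.0
ens = np.array([3.0, 4.5, 5.0, 2.5, 4.0])

# Empirical ensemble CRPS: E|X - y| - 0.5 * E|X - X'|.
term1 = np.mean(np.abs(ens - obs))
term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
crps = term1 - term2
```

A sharp, well-centred ensemble drives both terms (and hence the CRPS) toward zero.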

xskillscore works on xarray objects, which require data that can be cast to an ndarray. It therefore works with numpy.array, pandas.DataFrame and dask.array.

You can see the metrics available in xskillscore by running dir(xs):

In [1]:
import xskillscore as xs
dir(xs)
Out[1]:
['XSkillScoreAccessor',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'brier_score',
 'core',
 'crps_ensemble',
 'crps_gaussian',
 'crps_quadrature',
 'effective_sample_size',
 'mae',
 'mape',
 'median_absolute_error',
 'mse',
 'pearson_r',
 'pearson_r_eff_p_value',
 'pearson_r_p_value',
 'r2',
 'rmse',
 'smape',
 'spearman_r',
 'spearman_r_eff_p_value',
 'spearman_r_p_value',
 'threshold_brier_score']

Table of Contents

In this notebook I show how xskillscore can be dropped in a typical data science task where the data is a pandas.DataFrame.

I use the metric root-mean-square error (RMSE) to verify forecasts of items sold.

I also show how you can apply weights to the verification and handle missing values.
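As a preview of what weighting and missing-value handling mean, here is a hedged NumPy sketch of a weighted RMSE that skips NaN observations, analogous to passing `weights=` and `skipna=True` to an xskillscore metric. All numbers (and the choice of weights) are invented for illustration.

```python
import numpy as np

# Hypothetical observed vs. forecast item sales, with one missing observation.
obs = np.array([10.0, np.nan, 7.0, 12.0])
fct = np.array([11.0, 8.0, 6.0, 10.0])

# Example weights (e.g. weighting recent days more heavily -- made up here).
weights = np.array([1.0, 1.0, 2.0, 4.0])

# Drop pairs with missing data, then take a weighted mean of squared errors.
valid = ~np.isnan(obs) & ~np.isnan(fct)
sq_err = (fct[valid] - obs[valid]) ** 2
rmse = np.sqrt(np.sum(weights[valid] * sq_err) / np.sum(weights[valid]))
```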

This notebook shows how to use probabilistic metrics in a typical data science task where the data is a pandas.DataFrame.

The metric Continuous Ranked Probability Score (CRPS) is used to verify multiple forecasts for the same target.

Because xarray can handle big data, xskillscore can handle big data too.

In this notebook I verify 12 million forecasts in a couple of seconds using the RMSE metric on a dask.array.
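The speed comes from expressing verification as a single vectorized reduction. The sketch below does this with a plain NumPy array of one million synthetic forecasts (sizes and data are made up); swapping a chunked dask.array in for the NumPy array lets the same reduction run lazily and in parallel.

```python
import numpy as np

# Synthetic observations and forecasts with small random errors
# (data invented for illustration).
rng = np.random.default_rng(42)
n = 1_000_000
obs = rng.normal(size=n)
fct = obs + rng.normal(scale=0.1, size=n)

# One vectorized reduction verifies all forecasts at once.
rmse = np.sqrt(np.mean((fct - obs) ** 2))
```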

References

This tutorial was adapted from the dask-tutorial.

The interactive session is hosted by Binder and runs on Google Kubernetes Engine (GKE).
