Welcome to the xskillscore tutorial.
This was created for a talk at the Data Science Study Group: South Florida on April 1st, 2020. The slides for the talk can be found HERE.
The repository for this tutorial is hosted on GitHub here: xskillscore-tutorial.
xskillscore was developed by Ray Bell while at the University of Miami during the SubX project in 2018.
In 2019, Aaron Spring, Andrew Huang and Riley Brady greatly improved xskillscore. Aaron, Andrew and Riley provided upstream fixes and enhancements to xskillscore, as it is used extensively in climpred.
The verification metrics in xskillscore are split into two types: deterministic and probabilistic.
Deterministic metrics consist of correlation metrics (e.g. Pearson r) and distance metrics (e.g. root-mean-square error). These metrics adapt the implementations in scikit-learn and scipy.stats.
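To make the two families of deterministic metrics concrete, here is a minimal sketch in plain Python of what a distance metric (RMSE) and a correlation metric (Pearson r) compute for a single series. The helper functions below are illustrative and are not the xskillscore implementation; in xskillscore the corresponding metrics are exposed as rmse and pearson_r and operate on xarray objects.

```python
import math


def rmse(forecast, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(forecast, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_f * var_o)


observed = [1.0, 2.0, 3.0, 4.0]
forecast = [2.0, 3.0, 4.0, 5.0]  # every forecast is exactly 1 too high

print(rmse(forecast, observed))       # distance metric: penalises the offset
print(pearson_r(forecast, observed))  # correlation metric: ignores the offset
```

The example shows why both kinds of metric are useful: a forecast with a constant bias can be perfectly correlated with the observations while still having a non-zero error.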
Probabilistic metrics can be calculated when the forecast consists of multiple forecasts for the same target. Examples include the Continuous Ranked Probability Score and the Brier Score.
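As a rough illustration of these two scores, the sketch below uses plain Python and the standard textbook definitions: the ensemble CRPS as the mean absolute error of the members minus half the mean absolute difference between members, and the Brier Score as the mean squared difference between forecast probabilities and binary outcomes. The function bodies are assumptions made for illustration and may differ in detail from what xskillscore does internally.

```python
def crps_ensemble(observation, members):
    """CRPS of an ensemble forecast for one observed value (lower is better)."""
    m = len(members)
    # Average distance between each member and the observation.
    mean_abs_error = sum(abs(x - observation) for x in members) / m
    # Average distance between all pairs of members (ensemble spread).
    mean_spread = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return mean_abs_error - 0.5 * mean_spread


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


# Three forecasts (ensemble members) for the same target.
print(crps_ensemble(3.0, [2.0, 3.0, 4.0]))

# With a single member the CRPS reduces to the absolute error.
print(crps_ensemble(3.0, [5.0]))

# Forecast probabilities of an event, and whether it occurred (1) or not (0).
print(brier_score([0.9, 0.2], [1, 0]))
```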
xskillscore works on xarray objects, which require the data to be castable to an ndarray. It works with numpy.array, pandas.DataFrame and dask.array.
You can see the metrics available in xskillscore by running dir(xs):
import xskillscore as xs
dir(xs)
['XSkillScoreAccessor', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'brier_score', 'core', 'crps_ensemble', 'crps_gaussian', 'crps_quadrature', 'effective_sample_size', 'mae', 'mape', 'median_absolute_error', 'mse', 'pearson_r', 'pearson_r_eff_p_value', 'pearson_r_p_value', 'r2', 'rmse', 'smape', 'spearman_r', 'spearman_r_eff_p_value', 'spearman_r_p_value', 'threshold_brier_score']
In this notebook I show how xskillscore can be dropped into a typical data science task where the data is a pandas.DataFrame.
I use the root-mean-squared error (RMSE) metric to verify forecasts of items sold.
I also show how you can apply weights to the verification and handle missing values.
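The idea of weighting a verification and skipping missing values can be sketched in plain Python as follows. This is one common convention (drop any pair containing a NaN, then take a weighted mean of the squared errors); the weighted_rmse helper and its skipna flag are hypothetical names for this illustration, and the exact behaviour in xskillscore may differ.

```python
import math


def weighted_rmse(forecast, observed, weights, skipna=True):
    """Weighted RMSE; pairs containing NaN are dropped when skipna is True."""
    weighted_sq_error = 0.0
    weight_sum = 0.0
    for f, o, w in zip(forecast, observed, weights):
        if math.isnan(f) or math.isnan(o):
            if skipna:
                continue  # ignore this pair and its weight
            return math.nan  # a missing value propagates to the result
        weighted_sq_error += w * (f - o) ** 2
        weight_sum += w
    return math.sqrt(weighted_sq_error / weight_sum)


# Items sold on four days; the third observation is missing.
observed = [10.0, 12.0, float("nan"), 8.0]
forecast = [11.0, 12.0, 9.0, 6.0]
weights = [1.0, 1.0, 1.0, 2.0]  # the last day counts double

print(weighted_rmse(forecast, observed, weights))                # NaN pair skipped
print(weighted_rmse(forecast, observed, weights, skipna=False))  # NaN propagates
```

Here the missing day is excluded from both the error sum and the weight sum, so it does not dilute the score.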
This notebook shows how to use probabilistic metrics in a typical data science task where the data is a pandas.DataFrame.
The metric Continuous Ranked Probability Score (CRPS) is used to verify multiple forecasts for the same target.
xarray can handle big data, and therefore so can xskillscore.
In this notebook I verify 12 million forecasts in a couple of seconds using the RMSE metric on a dask.array.