#!/usr/bin/env python
# coding: utf-8

# ## xskillscore-tutorial

# Welcome to the [xskillscore](https://github.com/raybellwaves/xskillscore) tutorial.
#
# This was created for a talk at the [Data Science Study Group: South Florida](https://www.meetup.com/Data-Science-Study-Group-South-Florida/) on April 1st, 2020. The slides for the talk can be found [here](https://github.com/raybellwaves/xskillscore-tutorial/blob/master/xskillscore-tutorial.pdf).
#
# The repository for this tutorial is hosted on GitHub here: [xskillscore-tutorial](https://github.com/raybellwaves/xskillscore-tutorial).

# ## Motivation for xskillscore

# `xskillscore` provides a one-stop shop for metrics used in the verification of forecasts.
#
# It is an extension of [`xarray`](http://xarray.pydata.org/en/stable/), a library that handles labelled n-dimensional arrays. Find out more about `xarray` [here](http://xarray.pydata.org/en/stable/why-xarray.html).

# ## History of xskillscore

# `xskillscore` was developed by Ray Bell while at the University of Miami during the [SubX project](https://journals.ametsoc.org/doi/full/10.1175/BAMS-D-18-0270.1) in 2018.
#
# In 2019, Aaron Spring, Andrew Huang and Riley Brady greatly improved `xskillscore`. Aaron, Andrew and Riley provided upstream fixes and enhancements to `xskillscore`, as it is used extensively in [climpred](https://climpred.readthedocs.io/en/stable/).

# ## xskillscore overview

# The verification metrics in `xskillscore` are split into two types: **deterministic** and **probabilistic**.
#
# **Deterministic** metrics consist of correlation metrics (e.g. [Pearson r](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient)) and distance metrics (e.g. [root-mean-square error](https://en.wikipedia.org/wiki/Root-mean-square_deviation)). These metrics adapt the implementations in [`scikit-learn`](https://scikit-learn.org/stable/) and [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html).
#
# **Probabilistic** metrics can be calculated when the forecast consists of multiple forecasts for the same target. Examples include the [Continuous Ranked Probability Score](https://climpred.readthedocs.io/en/stable/metrics.html#continuous-ranked-probability-score-crps) and the [Brier Score](https://journals.ametsoc.org/doi/abs/10.1175/1520-0493%281950%29078%3C0001%3AVOFEIT%3E2.0.CO%3B2).
#
# `xskillscore` works on `xarray` objects, which require data to be castable to an `ndarray`. It works with [`numpy.array`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html), [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) and [`dask.array`](https://docs.dask.org/en/latest/array.html).

# You can see the metrics available in `xskillscore` by running `dir(xs)`:

# In[1]:


import xskillscore as xs

dir(xs)


# ## Table of Contents

# ## [01_Deterministic.ipynb](https://github.com/raybellwaves/xskillscore-tutorial/blob/master/01_Determinisitic.ipynb)

# In this notebook I show how `xskillscore` can be dropped into a typical data science task where the data is a [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).
#
# I use the root-mean-squared error (RMSE) metric to verify forecasts of items sold.
#
# I also show how you can apply weights to the verification and handle missing values.

# ## [02_Probabilistic.ipynb](https://github.com/raybellwaves/xskillscore-tutorial/blob/master/02_Probabilistic.ipynb)

# This notebook shows how to use probabilistic metrics in a typical data science task where the data is a [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).
#
# The Continuous Ranked Probability Score (CRPS) metric is used to verify multiple forecasts for the same target.
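#
# To make "multiple forecasts for the same target" concrete, here is a NumPy-only sketch of the commonly used ensemble form of the CRPS, `E|X - y| - 0.5 * E|X - X'|`, for a single target. It illustrates the definition and is not `xskillscore`'s implementation; the function name `crps_ensemble_sketch` and the toy numbers are hypothetical.

```python
import numpy as np


def crps_ensemble_sketch(observation, members):
    """Ensemble CRPS for one target: E|X - y| - 0.5 * E|X - X'|."""
    members = np.asarray(members, dtype=float)
    # Mean absolute error of the members against the observation.
    error_term = np.mean(np.abs(members - observation))
    # Mean absolute difference over all pairs of members (ensemble spread).
    spread_term = np.mean(np.abs(members[:, None] - members[None, :]))
    return error_term - 0.5 * spread_term


# Three hypothetical forecasts of items sold for one target value of 2.
score = crps_ensemble_sketch(2.0, [1.0, 2.0, 3.0])

# An ensemble whose members all equal the observation scores zero.
perfect = crps_ensemble_sketch(2.0, [2.0, 2.0, 2.0])

print(score, perfect)
```

# Lower is better: the score grows when the forecasts sit further from the observed value.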

# ## [03_Big_Data.ipynb](https://github.com/raybellwaves/xskillscore-tutorial/blob/master/03_Big_Data.ipynb)

# `xarray` can handle big data, therefore `xskillscore` can handle big data.
#
# In this notebook I verify 12 million forecasts in a couple of seconds using the RMSE metric on a `dask.array`.

# ## References

# This tutorial was adapted from the [dask-tutorial](https://github.com/dask/dask-tutorial).
#
# The interactive session is hosted by [Binder](https://mybinder.readthedocs.io/en/latest/)
# and runs on [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine).

# In[ ]: