You can run this notebook in a live session or view it on Github.

Quick Start¶

The easiest way to get up and running is to load in one of our example datasets and to convert them to either a {py:class}.HindcastEnsemble or {py:class}.PerfectModelEnsemble object.

climpred provides various example datasets. See our examples to see some more analysis cases.

In [1]:

# linting
%load_ext nb_black
%load_ext lab_black

In [2]:

%matplotlib inline
import matplotlib.pyplot as plt
import xarray as xr

from climpred import HindcastEnsemble
from climpred.tutorial import load_dataset
import climpred

You can view the example datasets available to be loaded with {py:func}.climpred.tutorial.load_dataset without passing any arguments:

In [3]:

load_dataset()

'MPI-control-1D': area averages for the MPI control run of SST/SSS.
'MPI-control-3D': lat/lon/time for the MPI control run of SST/SSS.
'MPI-PM-DP-1D': perfect model decadal prediction ensemble area averages of SST/SSS/AMO.
'MPI-PM-DP-3D': perfect model decadal prediction ensemble lat/lon/time of SST/SSS/AMO.
'CESM-DP-SST': hindcast decadal prediction ensemble of global mean SSTs.
'CESM-DP-SSS': hindcast decadal prediction ensemble of global mean SSS.
'CESM-DP-SST-3D': hindcast decadal prediction ensemble of eastern Pacific SSTs.
'CESM-LE': uninitialized ensemble of global mean SSTs.
'MPIESM_miklip_baseline1-hind-SST-global': hindcast initialized ensemble of global mean SSTs
'MPIESM_miklip_baseline1-hist-SST-global': uninitialized ensemble of global mean SSTs
'MPIESM_miklip_baseline1-assim-SST-global': assimilation in MPI-ESM of global mean SSTs
'ERSST': observations of global mean SSTs.
'FOSI-SST': reconstruction of global mean SSTs.
'FOSI-SSS': reconstruction of global mean SSS.
'FOSI-SST-3D': reconstruction of eastern Pacific SSTs
'GMAO-GEOS-RMM1': daily RMM1 from the GMAO-GEOS-V2p1 model for SubX
'RMM-INTERANN-OBS': observed RMM with interannual variablity included
'ECMWF_S2S_Germany': S2S ECMWF on-the-fly hindcasts from the S2S Project for Germany
'Observations_Germany': CPC/ERA5 observations for S2S forecasts over Germany
'NMME_hindcast_Nino34_sst': monthly multi-member hindcasts of sea-surface temperature averaged over the Nino3.4 region from the NMME project from IRIDL
'NMME_OIv2_Nino34_sst': monthly Reyn_SmithOIv2 sea-surface temperature observations averaged over the Nino3.4 region

From here, loading a dataset is easy. Note that you need to be connected to the internet for this to work -- the datasets are being pulled from the climpred-data repository. Once loaded, it is cached on your computer so you can reload extremely quickly. These datasets are very small (< 1MB each) so they won't take up much space.

In [4]:

initialized = climpred.tutorial.load_dataset("CESM-DP-SST")
# Add lead attribute units.
initialized["lead"].attrs["units"] = "years"
obs = climpred.tutorial.load_dataset("ERSST")

Make sure your prediction ensemble's dimension labeling conforms to climpred's standards. In other words, you need an init, lead, and (optional) member dimension. Make sure that your init and lead dimensions align.

Note that we here have a special case with ints in the init coords. For CESM-DP, a November 1st 1954 initialization should be labeled as init=1954, so that the lead=1 forecast related to valid_time=1955.

In [5]:

initialized.coords

Out[5]:

Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
  * member   (member) int32 1 2 3 4 5 6 7 8 9 10
  * init     (init) float32 1.954e+03 1.955e+03 ... 2.016e+03 2.017e+03

We'll quickly process the data to create anomalies. CESM-DPLE's drift-correction occurs over 1964-2014, so we'll remove that from the observations.

In [6]:

obs = obs - obs.sel(time=slice(1964, 2014)).mean("time")

We can create a {py:class}.HindcastEnsemble object and add our observations.

In [7]:

hindcast = HindcastEnsemble(initialized)
hindcast = hindcast.add_observations(obs)
hindcast

/Users/aaron.spring/Coding/climpred/climpred/utils.py:191: UserWarning: Assuming annual resolution starting Jan 1st due to numeric inits. Please change ``init`` to a datetime if it is another resolution. We recommend using xr.CFTimeIndex as ``init``, see https://climpred.readthedocs.io/en/stable/setting-up-data.html.
  warnings.warn(
/Users/aaron.spring/Coding/climpred/climpred/utils.py:191: UserWarning: Assuming annual resolution starting Jan 1st due to numeric inits. Please change ``init`` to a datetime if it is another resolution. We recommend using xr.CFTimeIndex as ``init``, see https://climpred.readthedocs.io/en/stable/setting-up-data.html.
  warnings.warn(

climpred.HindcastEnsemble

<Observations>
Dimensions:  (time: 61)
Coordinates:
  * time     (time) object 1955-01-01 00:00:00 ... 2015-01-01 00:00:00
Data variables:
    SST      (time) float32 -0.4015 -0.3524 -0.1851 ... 0.2481 0.346 0.4502

Observations

Dimensions:
- time: 61

Coordinates: (1)

time

(time)

object

1955-01-01 00:00:00 ... 2015-01-...

long_name :: time
standard_name :: time

array([cftime.DatetimeProlepticGregorian(1955, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1956, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1957, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1958, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1959, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1960, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1961, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1962, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1963, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1964, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1965, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1966, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1967, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1968, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1969, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1970, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1971, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1972, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1973, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1974, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1975, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1976, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1977, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1978, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1979, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1980, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1981, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1982, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1983, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1984, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1985, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1986, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1987, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1988, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1989, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1990, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1991, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1992, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1993, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1994, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1995, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1996, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1997, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1998, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(1999, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2000, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2001, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2002, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2003, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2004, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2005, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2006, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2007, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2008, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2009, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2010, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2011, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2012, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2013, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2014, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2015, 1, 1, 0, 0, 0, 0, has_year_zero=True)],
      dtype=object)

Data variables: (1)

SST

(time)

float32

-0.4015 -0.3524 ... 0.346 0.4502

array([-0.40146065, -0.35238647, -0.1850853 , -0.15909958, -0.20505714,
       -0.22090721, -0.1948986 , -0.21372604, -0.19957924, -0.3476467 ,
       -0.30198097, -0.2612896 , -0.27322388, -0.2529316 , -0.09751511,
       -0.21275139, -0.33102226, -0.15143204, -0.1470375 , -0.27275276,
       -0.2956009 , -0.25230026, -0.07224655, -0.13404083, -0.01206017,
       -0.0067482 , -0.04417992, -0.04536247,  0.01559448, -0.07734871,
       -0.10858917, -0.05820274,  0.07967186,  0.03335953, -0.01163864,
        0.08754349,  0.07234192, -0.01018333, -0.01288033,  0.0184803 ,
        0.05579376,  0.02571869,  0.16826057,  0.21926689,  0.02901459,
        0.05236626,  0.148283  ,  0.18193245,  0.21125221,  0.20121765,
        0.21643448,  0.20848656,  0.13706398,  0.13442993,  0.24890518,
        0.26042938,  0.16648293,  0.22451782,  0.24811554,  0.34601402,
        0.45021248], dtype=float32)

Attributes: (0)

Out[7]:

In forecast verification, valid_time for the initialized data shall be matched with time for observations.

In [8]:

hindcast.get_initialized().coords

Out[8]:

Coordinates:
  * lead        (lead) int32 1 2 3 4 5 6 7 8 9 10
  * member      (member) int32 1 2 3 4 5 6 7 8 9 10
  * init        (init) object 1954-01-01 00:00:00 ... 2017-01-01 00:00:00
    valid_time  (lead, init) object 1955-01-01 00:00:00 ... 2027-01-01 00:00:00

In [9]:

hindcast.get_observations().coords

Out[9]:

Coordinates:
  * time     (time) object 1955-01-01 00:00:00 ... 2015-01-01 00:00:00

{py:meth}.PredictionEnsemble.plot shows all associated datasets (initialized, uninitialized if present, observations if present) if only climpred dimension (lead, init, member, time) are present, e.g. plot() does not work for lat, lon, model, ...

In [10]:

hindcast.plot()

Out[10]:

<AxesSubplot:xlabel='validity time', ylabel='SST'>

We'll also remove a quadratic trend so that it doesn't artificially boost our predictability. PredictionEnsemble.map(func) tries to apply/map a callable func to all associated datasets. Those calls do not raise errors such as ValueError, KeyError, DimensionError, but show respective warnings, which can be filtered away with warnings.filterwarnings("ignore").

In [11]:

from climpred.stats import rm_poly

hindcast.map(rm_poly, dim="init", deg=2).map(rm_poly, dim="time", deg=2)

Alternatively, when supplying the kwargs dim="init_or_time", the matching dim is applied only and hence does not raise UserWarnings.

In [12]:

hindcast = hindcast.map(rm_poly, dim="init_or_time", deg=2)
hindcast.plot()

Out[12]:

<AxesSubplot:xlabel='validity time', ylabel='SST'>

Now we'll quickly calculate forecast with {py:meth}.HindcastEnsemble.verify. We require users to define metric, comparison, dim, and alignment. This ensures that climpred isn't treated like a black box -- there are no "defaults" to the prediction analysis framework. You can choose from a variety of possible metrics by entering their associated strings. comparisons strategies vary for hindcast and perfect model systems. Here we chose to compare the ensemble mean to observations ("e2o"). We reduce this operation over the initialization dimension. Lastly, we choose the "same_verif" alignment, which uses the same set of verification dates across all leads, see alignment strategies.

In [13]:

result = hindcast.verify(
    metric="rmse", comparison="e2o", dim="init", alignment="same_verif",
)
result

In [14]:

plt.rcParams["lines.linewidth"] = 3
plt.rcParams["lines.markersize"] = 10
plt.rcParams["lines.marker"] = "o"
plt.rcParams["figure.figsize"] = (8, 3)

In [15]:

result.SST.plot()
plt.title("Global Mean SST Predictability")
plt.ylabel("Anomaly \n Correlation Coefficient")
plt.xlabel("Lead Year")
plt.show()

We can also check the association of forecasts and observations with the anomaly correlation coefficient metric="acc" {py:func}.climpred.metrics._pearson_r against multiple reference forecasts. Choose reference from ["climatology","persistence","uninitialized"].

In [16]:

result = hindcast.verify(
    metric="acc",
    comparison="e2o",
    dim="init",
    alignment="same_verif",
    reference=["persistence", "climatology"],
)

In [17]:

result.SST.plot(hue="skill")
plt.title("Global Mean SST Forecast Error")
plt.ylabel("ACC")
plt.xlabel("Lead Year")
plt.show()