The Brier score is the most commonly used verification metric for evaluating a probabilistic forecast of a binary outcome, such as a "chance of rainfall" forecast.
Probabilistic forecasts of binary events are expressed as values between 0 and 1, while observations are exactly 0 (the event did not occur) or 1 (the event occurred).
The score is then calculated in the same way as the mean squared error (MSE) between the forecast probabilities and the observations. The Brier score is a strictly proper scoring rule; lower values are better (it is negatively oriented), with a perfect score of 0 and a worst possible score of 1.
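Written as a formula, the Brier score over $n$ forecast–observation pairs is the mean squared difference between the forecast probability $f_i$ and the binary observation $o_i$:

$$\text{BS} = \frac{1}{n}\sum_{i=1}^{n}\left(f_i - o_i\right)^2$$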
from scores.probability import brier_score
from scipy.stats import beta, binom
import numpy as np
import xarray as xr
# To learn more about the implementation of the Brier score, uncomment the following
# help(brier_score)
We generate two synthetic forecasts. By design, fcst1 is a good forecast, while fcst2 is a poor forecast. We measure the difference in skill by calculating and comparing their Brier scores.
# Draw forecast probabilities for fcst1 from a Beta(2, 1) distribution
fcst1 = beta.rvs(2, 1, size=1000)
# Draw binary observations with event probability equal to fcst1, so fcst1 is well calibrated
obs = binom.rvs(1, fcst1)
# fcst2 is drawn independently of the observations, so it is a poor forecast
fcst2 = beta.rvs(0.5, 1, size=1000)
# Wrap everything in xarray DataArrays that share a "time" dimension
fcst1 = xr.DataArray(data=fcst1, dims="time", coords={"time": np.arange(0, 1000)})
fcst2 = xr.DataArray(data=fcst2, dims="time", coords={"time": np.arange(0, 1000)})
obs = xr.DataArray(data=obs, dims="time", coords={"time": np.arange(0, 1000)})
brier_fcst1 = brier_score(fcst1, obs)
brier_fcst2 = brier_score(fcst2, obs)
print(f"Brier score for fcst1 = {brier_fcst1.item():.2f}")
print(f"Brier score for fcst2 = {brier_fcst2.item():.2f}")
Brier score for fcst1 = 0.16
Brier score for fcst2 = 0.43
As expected, fcst1 has the lower Brier score, quantifying the degree to which it is a better forecast than fcst2.
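Because the Brier score is simply the MSE of the probability forecast against the binary observations, we can sanity-check the result by computing it directly (a quick check, reusing the fcst1 and obs arrays defined above); with the default reduction over all dimensions it should match the value returned by brier_score.

# Compute the mean squared difference between forecast probabilities and observations
manual_brier_fcst1 = ((fcst1 - obs) ** 2).mean()
print(f"Manually computed Brier score for fcst1 = {manual_brier_fcst1.item():.2f}")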
By default, brier_score checks that its inputs are valid (forecast values between 0 and 1, observations exactly 0 or 1). To skip these input checks, set the check_args arg to False in brier_score.
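For example (a minimal sketch, assuming check_args is passed as a keyword argument and reusing fcst1 and obs from above; the variable name is just for illustration):

# Compute the Brier score without the input validity checks
brier_fcst1_nocheck = brier_score(fcst1, obs, check_args=False)
print(f"Brier score for fcst1 (checks disabled) = {brier_fcst1_nocheck.item():.2f}")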