import operator
import scores.categorical
import xarray as xr
Examples of binary categorical scores include: hit rate (true positive rate), false alarm rate and accuracy.
These scores are calculated based on a binary contingency table, which comprises true positives (hits), true negatives (correct negatives), false positives (false alarms) and false negatives (misses).
If you would like to know more, visit the Methods for dichotomous (yes/no) forecasts information produced by the WWRP/WGNE Joint Working Group on Forecast Verification Research.
Producing these categorical statistics is inherently a three-step process.
scores contains a wide range of binary categorical scores, and provides support for all three steps, plus convenient functions for working efficiently. Most of the scores APIs are functional in nature; however, introducing some classes (see below) makes binary categorical metrics much easier to calculate and work with. This notebook starts with simple examples and builds up to more complex ones.
In some instances, data will already be binary. Examples of binary data include forecasts of "events" - such as "it will rain" or "a tropical cyclone will occur".
If the data is already binary, ignore this step and go straight to step two.
However, in many situations, data is not binary and is instead either continuous or categorical. An example of continuous data is a physical quantity, such as a temperature in degrees Celsius. An example of categorical data is the Beaufort Wind Scale.
In order to use binary contingency scores, any non-binary data must first be converted into binary data.
Options include either:
- using your own custom method to convert the non-binary data into binary data (without using scores), then inputting the binary data into scores (in which case, go to step two), OR
- using scores to efficiently convert the non-binary data into binary data, via the scores "event operator" class.
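The first option can be as simple as an element-wise threshold comparison. Below is a minimal sketch using NumPy for illustration; the values and the threshold of 50.0 are arbitrary, and the same comparison works directly on xarray DataArrays:

```python
import numpy as np

# Continuous precipitation values (mm); values are illustrative only
precip = np.array([[35.9, 51.4],
                   [52.8, 22.3]])

# An element-wise comparison converts continuous data into binary event data
binary_events = (precip >= 50.0).astype(float)
print(binary_events)  # 1.0 where an "event" occurred, 0.0 otherwise
```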
scores provides a general class called an "EventOperator", along with in-built event operator types (e.g. greater-than, less-than and many others). Alternatively, users can define their own event operator implementation for use within scores (for example, for more complex event definitions involving multiple variables).
The rest of this section focuses on using scores to convert continuous data into binary data using the "event operator" class.
For instance, let's say you need to verify how accurately a numerical weather prediction (NWP) system forecasts "heavy rainfall". As the NWP precipitation rate predictions are continuous, you will need to first convert the precipitation rate predictions into "heavy rainfall events" as binary data. To do this, you define what counts as a "heavy rainfall" event (for example, precipitation exceeding a threshold) and apply an event operator to the continuous data.
A more complex event like "a thunderstorm" may involve more complex logic to determine when the event occurs, incorporating factors such as wind and lightning data as well as the rainfall rate.
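Such a compound event can be expressed as element-wise boolean logic across several fields. The sketch below is purely illustrative - the variables, thresholds and the simplified "thunderstorm" definition are all assumptions, not a meteorological standard:

```python
import numpy as np

# Illustrative co-located fields at three grid points
rain_rate = np.array([12.0, 2.0, 15.0])    # mm/h
wind_gust = np.array([20.0, 25.0, 8.0])    # m/s
lightning = np.array([True, True, False])  # any strikes detected

# A "thunderstorm" event here requires all three conditions simultaneously
thunderstorm = (rain_rate >= 10.0) & (wind_gust >= 15.0) & lightning
print(thunderstorm)  # [ True False False]
```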
Here is an example of using scores to convert non-binary data into binary data using the "event operator" class.
Consider the following two idealized tables of forecast and observed values (in mm) for precipitation.
# Provides a basic forecast data structure in three dimensions
simple_forecast = xr.DataArray(
[
[
[35.9, 44.0, 49],
[50.7, 51.4, 52.8],
[38.4, 38.5, 22.3],
],
[
[25.6, 38.1, 38.2],
[52.4, 55.9, 51.0],
[29.1, 33.1, 24.5],
],
],
coords=[[10, 11], [0, 1, 2], [5, 6, 7]], dims=["lead_hour", "lat", "lon"])
# Idealized observations are within 1.0 or 4.0 of the forecast in all cases except one
# This can be used to find some exact matches, and some close matches
simple_obs = xr.DataArray(
[
[
[35.8, 44.2, 52],
[53.7, 52.3, 53.7],
[38.3, 36.4, 21.4],
],
[
[25.7, 38.2, 38.7],
[51.7, 56.2, 49.9],
[29.6, 30.2, 16.9],
],
],
coords=[[10, 11], [0, 1, 2], [5, 6, 7]], dims=["lead_hour", "lat", "lon"])
For the purposes of this example, let's say an event occurs if the precipitation value reaches 50. We therefore define a threshold based event operator with a threshold of 50.
Applying this operator means scores converts the non-binary data into binary data. It does this by creating binary event data: precipitation with a value less than 50 is "false" and precipitation with a value greater than or equal to 50 is "true".
The following code creates a "greater than or equal to" operator - using the scores built-in ThresholdEventOperator - which creates events with a threshold of ">= 50". It can be called repeatedly with different event thresholds, for reasons that will be explained later.
# An event here is defined as precipitation with a value greater than or equal to 50
# The ThresholdEventOperator can take a variety of operators from the python "operator" module, or a user-defined function
# The default is operator.ge, which is the same as ">=" but in functional form.
# Here we pass it in explicitly so it's very clear what will happen
event_operator = scores.categorical.ThresholdEventOperator(default_event_threshold=50.0, default_op_fn=operator.ge)
# event_operator = scores.categorical.ThresholdEventOperator(default_event_threshold=50.0) # This is the same thing
# https://docs.python.org/3/library/operator.html provides an overview of what standard operator types can be supplied
# You may wish to experiment with the following alternatives now
# Greater-than (not equal to) operator
# event_operator = scores.categorical.ThresholdEventOperator(default_event_threshold=50, default_op_fn=operator.gt)
# Less-than operator
# event_operator = scores.categorical.ThresholdEventOperator(default_event_threshold=50, default_op_fn=operator.lt)
# The event operator creates binary "event tables".
# If you don't wish to visualise the binary "event tables", you can ignore this section.
# However, if you wish to visualise or utilise the "event tables" directly, you can do so as follows:
forecast_binary, observed_binary = event_operator.make_event_tables(simple_forecast, simple_obs)
# It is also possible to vary both the event threshold and the event operator from the defaults during this step
# This commented-out code uses the less-than operator with a threshold of 45
# forecast_binary, observed_binary = event_operator.make_event_tables(simple_forecast, simple_obs, event_threshold=45, op_fn=operator.lt)
print(forecast_binary)
<xarray.DataArray (lead_hour: 2, lat: 3, lon: 3)> Size: 144B array([[[0., 0., 0.], [1., 1., 1.], [0., 0., 0.]], [[0., 0., 0.], [1., 1., 1.], [0., 0., 0.]]]) Coordinates: * lead_hour (lead_hour) int64 16B 10 11 * lat (lat) int64 24B 0 1 2 * lon (lon) int64 24B 5 6 7
print(observed_binary)
<xarray.DataArray (lead_hour: 2, lat: 3, lon: 3)> Size: 144B array([[[0., 0., 1.], [1., 1., 1.], [0., 0., 0.]], [[0., 0., 0.], [1., 1., 0.], [0., 0., 0.]]]) Coordinates: * lead_hour (lead_hour) int64 16B 10 11 * lat (lat) int64 24B 0 1 2 * lon (lon) int64 24B 5 6 7
Once the event tables have been made, the next step is to make a contingency manager object. It is possible (and quicker) to make a contingency manager from the event operator directly, but we will first show the process as separate steps, and then show how to do the two things at once. The contingency manager stores the binary forecast event data, the binary observed event data and the contingency table itself, plus provides functions for aggregating the data along various dimensions, and calculating metrics from that information.
contingency_manager = scores.categorical.BinaryContingencyManager(forecast_binary, observed_binary)
contingency_manager.get_table() # Print a view of the contingency table
<xarray.DataArray (contingency: 5)> Size: 40B array([ 5., 11., 1., 1., 18.]) Coordinates: * contingency (contingency) <U11 220B 'tp_count' 'tn_count' ... 'total_count'
# Uncomment this line to view the full functionality provided by a contingency manager
# help(contingency_manager)
# It is also possible, and briefer, to make the binary contingency table directly from the event operator, as follows:
contingency_manager = event_operator.make_contingency_manager(simple_forecast, simple_obs)
# It is also possible to vary both the event threshold and the event operator from the defaults during this step
# contingency_manager = event_operator.make_contingency_manager(simple_forecast, simple_obs, event_threshold=2.0, op_fn=operator.lt)
contingency_manager.get_table()
<xarray.DataArray (contingency: 5)> Size: 40B array([ 5., 11., 1., 1., 18.]) Coordinates: * contingency (contingency) <U11 220B 'tp_count' 'tn_count' ... 'total_count'
# Further, if it turns out you want them, the event tables are still there
contingency_manager.fcst_events
<xarray.DataArray (lead_hour: 2, lat: 3, lon: 3)> Size: 144B array([[[0., 0., 0.], [1., 1., 1.], [0., 0., 0.]], [[0., 0., 0.], [1., 1., 1.], [0., 0., 0.]]]) Coordinates: * lead_hour (lead_hour) int64 16B 10 11 * lat (lat) int64 24B 0 1 2 * lon (lon) int64 24B 5 6 7
The binary contingency manager can then be utilised to calculate a wide variety of scores which are based on the contingency table.
Below are two examples of using scores to calculate metrics directly from the contingency manager:
contingency_manager.accuracy()
<xarray.DataArray ()> Size: 8B array(0.88888889)
contingency_manager.false_alarm_rate()
<xarray.DataArray ()> Size: 8B array(0.08333333)
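These values follow directly from the contingency table shown earlier (tp_count=5, tn_count=11, fp_count=1, fn_count=1, total_count=18). The arithmetic below re-derives them by hand using the standard formulas:

```python
# Counts taken from the contingency table printed above
tp, tn, fp, fn = 5, 11, 1, 1
total = tp + tn + fp + fn

accuracy = (tp + tn) / total       # fraction of all cases that were correct
false_alarm_rate = fp / (fp + tn)  # false alarms relative to observed non-events
print(round(accuracy, 8), round(false_alarm_rate, 8))  # 0.88888889 0.08333333
```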
Using the "transform" operation on the contingency table can be helpful for handling very large data sets.
Transforming the table triggers the 'compute' function against the dask arrays, which until this point may have been loaded lazily. This is insignificant for small data sets, but calculating a contingency table for a multi-terabyte gridded data set can be time-consuming. The transformed table contains only the resultant contingency table, which is thereafter very fast to work with.
Here we call 'transform' with no arguments to trigger the computation of the aggregated contingency table. This may be an expensive operation, but the resultant table is very small, and all the relevant scores can be calculated quickly as the table is re-used.
computed = contingency_manager.transform()
computed.accuracy()
<xarray.DataArray ()> Size: 8B array(0.88888889)
The transform function can also be utilised to explore the dimensionality of the data. This function accepts preserve_dims and reduce_dims as per the rest of the scores codebase.
While the team behind scores intend to add functionality so that scores can support weights during transformation, this functionality has not yet been added.
At present, the following approach can be used to examine contingency scores along the data dimensions.
# Transform the contingency manager, preserving the lead_hour dimension
new_manager = contingency_manager.transform(preserve_dims='lead_hour')
print(new_manager)
Contingency Manager (xarray view of table): <xarray.DataArray (contingency: 5, lead_hour: 2)> Size: 80B array([[3., 2.], [5., 6.], [0., 1.], [1., 0.], [9., 9.]]) Coordinates: * lead_hour (lead_hour) int64 16B 10 11 * contingency (contingency) <U11 220B 'tp_count' 'tn_count' ... 'total_count'
# Calculating the accuracy on this transformed manager will show the accuracy along the lead_hour dimension
new_manager.accuracy()
<xarray.DataArray (lead_hour: 2)> Size: 16B array([0.88888889, 0.88888889]) Coordinates: * lead_hour (lead_hour) int64 16B 10 11
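The per-lead-hour accuracies can be checked by hand from the transformed table above, reading the counts column by column (lead_hour 10: tp=3, tn=5, fp=0, fn=1; lead_hour 11: tp=2, tn=6, fp=1, fn=0):

```python
# Contingency counts per lead hour, read from the transformed table above
counts = {
    10: {"tp": 3, "tn": 5, "fp": 0, "fn": 1},
    11: {"tp": 2, "tn": 6, "fp": 1, "fn": 0},
}

accuracies = {
    lead: (c["tp"] + c["tn"]) / (c["tp"] + c["tn"] + c["fp"] + c["fn"])
    for lead, c in counts.items()
}
print(accuracies)  # both lead hours: 8/9, i.e. 0.888...
```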
An interesting next step could include: