ModelSkill is a general purpose model skill assessment library for (spatio)/temporal data.
If your data is in a tabular format, where each row corresponds to a time step and you have at least one column with an observed value and another column with modelled values, then you can use the ModelSkill library to assess the model skill.
import pandas as pd
import modelskill as ms
The csv file has 5 columns:
df = pd.read_csv("../tests/testdata/matched_data.csv", index_col='datetime', parse_dates=True)
df.head()
Station X | linear | quadratic | random_forest_n10 | |
---|---|---|---|---|
datetime | ||||
2019-01-01 00:00:00 | 0.756527 | 0.775651 | 0.770355 | 0.768863 |
2019-01-01 01:00:00 | 0.579304 | 0.584794 | 0.591272 | 0.594818 |
2019-01-01 02:00:00 | 0.999306 | 1.043984 | 1.063773 | 1.007354 |
2019-01-01 03:00:00 | 1.011664 | 1.003572 | 1.031288 | 1.000223 |
2019-01-01 04:00:00 | 0.802189 | 0.811493 | 0.812934 | 0.806587 |
In order to use this dataset for skill assessment we create a Comparer
with the modelskill.from_matched()
function, in order to get nice labels on the plots, we also define which physical quantity this represents.
cmp = ms.from_matched(df, obs_item='Station X', quantity=ms.Quantity(name="Some variable",unit="s"))
cmp
<Comparer> Quantity: Some variable [s] Observation: Station X, n_points=250 Model: quadratic, rmse=0.040 Model: random_forest_n10, rmse=0.025 Model: linear, rmse=0.054
cmp.sel(model='linear').plot.timeseries();
cmp.sel(model='linear').plot.scatter(skill_table=True, title='Linear regression');
cmp.sel(model='random_forest_n10').plot.scatter();
cmp.skill().round(2)
observation | n | bias | rmse | urmse | mae | cc | si | r2 | |
---|---|---|---|---|---|---|---|---|---|
model | |||||||||
linear | Station X | 250 | 0.01 | 0.05 | 0.05 | 0.04 | 0.94 | 0.07 | 0.88 |
quadratic | Station X | 250 | 0.01 | 0.04 | 0.04 | 0.03 | 0.97 | 0.05 | 0.93 |
random_forest_n10 | Station X | 250 | 0.00 | 0.03 | 0.03 | 0.01 | 0.99 | 0.03 | 0.97 |
cmp.plot.timeseries(backend="plotly")