This notebook provides a reference for writing forecasting results in a format that can be incorporated into a workflow with modelskill.
import pandas as pd
import modelskill as ms
def path_to_file(filename):
    return f"../tests/testdata/forecast_skill/{filename}"
def melt_df_by_date(df, name):
    # Unpivot the horizon representation: one row per (date, lead_time) pair.
    melted_df = pd.melt(df.reset_index(), id_vars=["date"], var_name="lead_time", value_name=name)
    melted_df["lead_time"] = melted_df["lead_time"].astype(int)
    return melted_df
def wide_to_long_representation(forecasts):
    def get_horizon_series(row):
        # Horizon (in days) between the forecast issue date and the target date.
        new_index = (pd.to_datetime(row.name) - pd.to_datetime(row.index)).days
        return pd.Series(row.values, index=new_index)

    # Transpose so rows are target dates, then re-index each row by horizon.
    predictions_by_horizon = forecasts.T.apply(
        get_horizon_series, axis=1
    ).dropna(axis=1, how="all")
    predictions_by_horizon.index.name = "date"
    predictions_by_horizon.columns = predictions_by_horizon.columns.astype(int)
    return predictions_by_horizon
Let's assume we are developing a model to forecast the next 7 days of a variable X. There are multiple strategies for storing the results of such a model; the following dataframe shows one arbitrary method. Every row in this dataframe represents a week of forecasts issued on the date denoted by the row index, so each row contains only 7 values: the forecasts for the next 7 days. Let's call this the wide representation.
Note that this strategy is only reasonable when the dataset is small enough, since this dataframe is very sparse. Still, one can follow the same structure with a different storage method, e.g., a folder tree with a subfolder for every day, where every subfolder contains a file with the forecast values.
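To make the structure concrete, here is a minimal sketch that builds a toy wide-representation frame with synthetic values and a 3-day horizon (the dates and numbers are illustrative, not taken from the dataset below):

```python
import pandas as pd

# Issue dates (rows) and target dates (columns) for a 3-day-ahead toy model.
issue_dates = pd.date_range("2023-01-01", periods=3, freq="D")
target_dates = pd.date_range("2023-01-02", periods=5, freq="D")

wide = pd.DataFrame(index=issue_dates, columns=target_dates, dtype=float)
for issue in issue_dates:
    for h in range(1, 4):  # horizons 1..3 days ahead
        # Synthetic value; a real model would put its forecast here.
        wide.loc[issue, issue + pd.Timedelta(days=h)] = 100.0 + h

# Each row holds exactly 3 values; everything else is NaN, hence the sparsity.
print(wide)
```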
forecast_model_1 = pd.read_csv(path_to_file("forecast_model_1.csv"), parse_dates=True, index_col=0)
forecast_model_1.head(10).round(2)
2023-01-01 | 2023-01-02 | 2023-01-03 | 2023-01-04 | 2023-01-05 | 2023-01-06 | 2023-01-07 | 2023-01-08 | 2023-01-09 | 2023-01-10 | ... | 2024-01-12 | 2024-01-13 | 2024-01-14 | 2024-01-15 | 2024-01-16 | 2024-01-17 | 2024-01-18 | 2024-01-19 | 2024-01-20 | 2024-01-21 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022-12-31 | 15521.82 | 14296.53 | 13821.23 | 13312.48 | 13241.17 | 12847.62 | 13003.76 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2023-01-01 | NaN | 13195.95 | 13037.03 | 12753.71 | 12843.03 | 12563.93 | 12801.62 | 13731.58 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2023-01-02 | NaN | NaN | 13207.27 | 12875.01 | 12929.46 | 12625.52 | 12845.50 | 13762.85 | 13145.25 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2023-01-03 | NaN | NaN | NaN | 13342.34 | 13262.45 | 12862.78 | 13014.56 | 13883.31 | 13231.08 | 13000.59 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2023-01-04 | NaN | NaN | NaN | NaN | 13998.87 | 13387.50 | 13388.44 | 14149.71 | 13420.90 | 13135.85 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2023-01-05 | NaN | NaN | NaN | NaN | NaN | 13267.53 | 13302.96 | 14088.80 | 13377.50 | 13104.92 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2023-01-06 | NaN | NaN | NaN | NaN | NaN | NaN | 13804.10 | 14445.89 | 13631.93 | 13286.21 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2023-01-07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 14813.45 | 13893.84 | 13472.83 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2023-01-08 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 14173.23 | 13671.91 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2023-01-09 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 13939.92 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
10 rows × 386 columns
When evaluating forecasting results, it is often interesting to compare them at different horizons. We can write the previous dataframe more efficiently by sorting it by how long in advance each value was forecast. We can call this the horizon representation.
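As a sanity check, the transform can be verified on a toy frame; the helper from above is restated so the snippet is self-contained:

```python
import pandas as pd

def wide_to_long_representation(forecasts):
    def get_horizon_series(row):
        # Horizon (in days) between the forecast issue date and the target date.
        new_index = (pd.to_datetime(row.name) - pd.to_datetime(row.index)).days
        return pd.Series(row.values, index=new_index)

    out = forecasts.T.apply(get_horizon_series, axis=1).dropna(axis=1, how="all")
    out.index.name = "date"
    out.columns = out.columns.astype(int)
    return out

# Two issue dates, each forecasting 2 days ahead.
wide = pd.DataFrame(
    {
        pd.Timestamp("2023-01-02"): [10.0, None],
        pd.Timestamp("2023-01-03"): [11.0, 20.0],
        pd.Timestamp("2023-01-04"): [None, 21.0],
    },
    index=[pd.Timestamp("2023-01-01"), pd.Timestamp("2023-01-02")],
)
horizon = wide_to_long_representation(wide)
# The row for 2023-01-03 holds the 1-day-ahead value 20.0
# (issued 2023-01-02) and the 2-day-ahead value 11.0 (issued 2023-01-01).
```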
results_model_1 = wide_to_long_representation(forecast_model_1).dropna()
results_model_1.head(10).round(2)
1 | 2 | 3 | 4 | 5 | 6 | 7 | |
---|---|---|---|---|---|---|---|
date | |||||||
2023-01-07 | 13804.10 | 13302.96 | 13388.44 | 13014.56 | 12845.50 | 12801.62 | 13003.76 |
2023-01-08 | 14813.45 | 14445.89 | 14088.80 | 14149.71 | 13883.31 | 13762.85 | 13731.58 |
2023-01-09 | 14173.23 | 13893.84 | 13631.93 | 13377.50 | 13420.90 | 13231.08 | 13145.25 |
2023-01-10 | 13939.92 | 13671.91 | 13472.83 | 13286.21 | 13104.92 | 13135.85 | 13000.59 |
2023-01-11 | 14669.68 | 13872.50 | 13681.53 | 13539.68 | 13406.71 | 13277.53 | 13299.57 |
2023-01-12 | 13441.08 | 14314.18 | 13746.16 | 13610.09 | 13509.02 | 13414.27 | 13322.23 |
2023-01-13 | 14470.25 | 13174.05 | 13796.16 | 13391.43 | 13294.47 | 13222.46 | 13154.95 |
2023-01-14 | 13611.65 | 14044.00 | 13120.41 | 13563.69 | 13275.30 | 13206.22 | 13154.91 |
2023-01-15 | 15539.59 | 14242.69 | 14550.75 | 13892.66 | 14208.51 | 14003.03 | 13953.80 |
2023-01-16 | 14242.70 | 14503.05 | 13578.96 | 13798.46 | 13329.56 | 13554.61 | 13408.19 |
Notice that, even though we are working with a univariate time series, we need two dimensions to evaluate the forecast results. One dimension (the rows) marks the date for which we aim to forecast the value of X; this dimension is sometimes called the analysis time. The other dimension (the columns) represents the horizon of our forecast, i.e., how many days in advance we are forecasting X. The horizon of a forecast often has a significant impact on the model results, since it is normally easier to predict values that are closer in time.
In order to compare these results with those from a different model using modelskill, we need to write the horizon representation in long format, so that we can use the lead time as an auxiliary feature. We can call this the long representation. Finally, we need to include the actual observations of X during the analysis time to assess the performance of the model.
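Before running it on the real data, the melt-and-merge step can be sketched on a toy horizon frame (the column name model_1 mirrors the one used below; the numbers are made up):

```python
import pandas as pd

# Toy horizon representation: rows are analysis dates, columns are lead times.
horizon = pd.DataFrame(
    {1: [10.0, 12.0], 2: [11.0, 13.0]},
    index=pd.to_datetime(["2023-01-02", "2023-01-03"]),
)
horizon.index.name = "date"

# Unpivot: one row per (date, lead_time) pair.
long = pd.melt(horizon.reset_index(), id_vars=["date"],
               var_name="lead_time", value_name="model_1")

# Attach the observed values so each forecast can be scored.
obs = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02", "2023-01-03"]),
    "observation": [10.5, 12.5],
})
long = pd.merge(left=long, right=obs, on="date")
```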
melted_results_model_1 = melt_df_by_date(results_model_1, name="model_1")
observations = pd.read_csv(path_to_file("observations.csv"), parse_dates=["date"])
melted_results_model_1 = pd.merge(left=melted_results_model_1, right=observations, on="date")
melted_results_model_1.head().round(2)
date | lead_time | model_1 | observation | |
---|---|---|---|---|
0 | 2023-01-07 | 1 | 13804.10 | 14319.96 |
1 | 2023-01-08 | 1 | 14813.45 | 15205.57 |
2 | 2023-01-09 | 1 | 14173.23 | 14549.37 |
3 | 2023-01-10 | 1 | 13939.92 | 15058.72 |
4 | 2023-01-11 | 1 | 14669.68 | 13444.34 |
The following image shows a schematic of the previous workflow, where the labels A and B denote the wide representation and the horizon representation, respectively. The label C represents the matched representation, which is described in the following section. The green box to the right contains the actual observations of X during the analysis time.
(For the sake of brevity, the image shows an example where the horizon is 3 time-steps, instead of 7.)
After writing the data in the long format, we can call modelskill to evaluate the performance of the model. As mentioned above, we separate the forecast results by lead time (or horizon). As expected, all metrics get worse as the lead time increases.
cmp = ms.from_matched(melted_results_model_1, mod_items=["model_1"], aux_items=["lead_time"], obs_item="observation")
cmp.skill(by=["model", "lead_time"]).sort_index().round(2).style()
observation | n | bias | rmse | urmse | mae | cc | si | r2 | ||
---|---|---|---|---|---|---|---|---|---|---|
model | lead_time | |||||||||
model_1 | 1 | observation | 374 | -172.710 | 811.960 | 793.380 | 635.330 | 0.900 | 0.050 | 0.800 |
2 | observation | 374 | -296.600 | 996.070 | 950.890 | 794.870 | 0.850 | 0.060 | 0.690 | |
3 | observation | 374 | -384.350 | 1184.790 | 1120.720 | 950.580 | 0.780 | 0.070 | 0.560 | |
4 | observation | 374 | -446.930 | 1323.600 | 1245.860 | 1080.360 | 0.720 | 0.080 | 0.460 | |
5 | observation | 374 | -491.980 | 1412.620 | 1324.180 | 1166.210 | 0.680 | 0.090 | 0.380 | |
6 | observation | 374 | -523.710 | 1475.760 | 1379.710 | 1221.850 | 0.650 | 0.090 | 0.320 | |
7 | observation | 374 | -545.460 | 1521.410 | 1420.270 | 1267.010 | 0.630 | 0.090 | 0.280 |
Typically we will be interested in comparing different model candidates. In order to do that, we need to load the results from a different model and combine the results of the two models into a single dataframe object.
forecast_model_2 = pd.read_csv(path_to_file("forecast_model_2.csv"), parse_dates=True, index_col=0)
results_model_2 = wide_to_long_representation(forecast_model_2).dropna()
melted_results_model_2 = melt_df_by_date(results_model_2, name="model_2")
melted_results_model_2 = pd.merge(left=melted_results_model_2, right=observations, on="date")
matched_model_results = melted_results_model_1.merge(
melted_results_model_2,
how="inner",
on=["date","lead_time", "observation"]
).sort_values(
by=["date", "lead_time"]
).set_index(
"date"
)
matched_model_results = matched_model_results.reindex(
sorted(matched_model_results.columns),
axis=1
)
# We save the results as well for later use
matched_model_results.to_csv(path_to_file("matched_model_results.csv"))
matched_model_results.head().round(2)
lead_time | model_1 | model_2 | observation | |
---|---|---|---|---|
date | ||||
2023-01-07 | 1 | 13804.10 | 12663.04 | 14319.96 |
2023-01-07 | 2 | 13302.96 | 12898.16 | 14319.96 |
2023-01-07 | 3 | 13388.44 | 13505.24 | 14319.96 |
2023-01-07 | 4 | 13014.56 | 13444.72 | 14319.96 |
2023-01-07 | 5 | 12845.50 | 13086.09 | 14319.96 |
Let's call this format the matched representation, as introduced earlier.
With the previous dataframe object, we can easily compare the performance of our two candidates.
cmp = ms.from_matched(matched_model_results, mod_items=["model_1", "model_2"], aux_items=["lead_time"], obs_item="observation")
cmp.skill(by=["model", "lead_time"]).sort_index().round(2).style()
observation | n | bias | rmse | urmse | mae | cc | si | r2 | ||
---|---|---|---|---|---|---|---|---|---|---|
model | lead_time | |||||||||
model_1 | 1 | observation | 374 | -172.710 | 811.960 | 793.380 | 635.330 | 0.900 | 0.050 | 0.800 |
2 | observation | 374 | -296.600 | 996.070 | 950.890 | 794.870 | 0.850 | 0.060 | 0.690 | |
3 | observation | 374 | -384.350 | 1184.790 | 1120.720 | 950.580 | 0.780 | 0.070 | 0.560 | |
4 | observation | 374 | -446.930 | 1323.600 | 1245.860 | 1080.360 | 0.720 | 0.080 | 0.460 | |
5 | observation | 374 | -491.980 | 1412.620 | 1324.180 | 1166.210 | 0.680 | 0.090 | 0.380 | |
6 | observation | 374 | -523.710 | 1475.760 | 1379.710 | 1221.850 | 0.650 | 0.090 | 0.320 | |
7 | observation | 374 | -545.460 | 1521.410 | 1420.270 | 1267.010 | 0.630 | 0.090 | 0.280 | |
model_2 | 1 | observation | 374 | -1281.480 | 2020.670 | 1562.340 | 1512.510 | 0.500 | 0.100 | -0.270 |
2 | observation | 374 | -1186.980 | 1967.200 | 1568.750 | 1459.470 | 0.490 | 0.100 | -0.200 | |
3 | observation | 374 | -506.580 | 1669.490 | 1590.770 | 1256.730 | 0.470 | 0.100 | 0.140 | |
4 | observation | 374 | -773.380 | 1810.800 | 1637.340 | 1323.620 | 0.420 | 0.110 | -0.020 | |
5 | observation | 374 | -1268.830 | 2069.900 | 1635.410 | 1510.860 | 0.420 | 0.110 | -0.330 | |
6 | observation | 374 | -770.870 | 1832.630 | 1662.620 | 1337.590 | 0.390 | 0.110 | -0.040 | |
7 | observation | 374 | -1176.300 | 2048.960 | 1677.660 | 1488.200 | 0.370 | 0.110 | -0.300 |