And spatial skill assessments
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('png')
import modelskill as ms
fn = '../tests/testdata/NorthSeaHD_and_windspeed.dfsu'
mr = ms.ModelResult(fn, name='HD', item=0)
mr
<DfsuModelResult> 'HD'
In this case, the track observations are stored in a csv file, which we can read in using pandas. Any file format that can be read into a pandas dataframe can be used here.
fn = '../tests/testdata/altimetry_NorthSea_20171027.csv'
df = pd.read_csv(fn, index_col=0, parse_dates=True)
df.head()
lon | lat | surface_elevation | significant_wave_height | wind_speed | |
---|---|---|---|---|---|
date | |||||
2017-10-26 04:37:37 | 8.757272 | 53.926136 | 1.6449 | 0.426 | 6.100000 |
2017-10-26 04:37:54 | 8.221631 | 54.948459 | 1.1200 | 1.634 | 9.030000 |
2017-10-26 04:37:55 | 8.189390 | 55.008547 | 1.0882 | 1.717 | 9.370000 |
2017-10-26 04:37:56 | 8.157065 | 55.068627 | 1.0309 | 1.869 | 9.559999 |
2017-10-26 04:37:58 | 8.124656 | 55.128700 | 1.0369 | 1.939 | 9.980000 |
mr.quantity
Quantity(name='Surface Elevation', unit='meter')
# the dataframe doesn't include the metadata on which quantity it contains, we add this manually, consistent with the model result
o1 = ms.TrackObservation(df, item="surface_elevation", name='alti', quantity=ms.Quantity(name="Surface Elevation", unit="meter"))
o1
/home/jan/src/modelskill/modelskill/utils.py:134: UserWarning: Time axis has duplicate entries. Now adding milliseconds to non-unique entries to make index unique. warnings.warn(
TrackObservation: alti, n=1115
ms.plotting.spatial_overview(o1, mr)
cc = ms.compare(o1, mr)[0]
cc
C:\Users\rywm\Repos\modelskill\modelskill\comparison\_comparison.py:301: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning. res = df.groupby(by=by).apply(calc_metrics)
<Comparer> Quantity: Surface Elevation [meter] Observation: alti, n_points=544 Model: HD, rmse=0.115
cc.data
<xarray.Dataset> Dimensions: (time: 544) Coordinates: * time (time) datetime64[ns] 2017-10-27T10:45:19 ... 2017-10-29T13:... Data variables: x (time) float64 1.262 1.231 1.2 1.168 ... 6.908 6.971 7.034 y (time) float64 55.3 55.24 55.18 55.13 ... 55.24 55.28 55.32 HD (time) float32 0.3699 0.356 0.3559 ... 0.7068 0.7068 0.685 Observation (time) float64 0.3778 0.4375 0.4489 ... 0.8562 0.8368 0.8218 Attributes: gtype: track name: alti quantity_name: Surface Elevation modelskill_version: 1.0.dev2
cc.plot.scatter();
ModelResult is now a dfs0
fn = '../tests/testdata/NorthSeaHD_extracted_track.dfs0'
mr = ms.TrackModelResult(fn, name='HD', item=2) # explicitly define type as Track
mr.data
/home/jan/src/modelskill/modelskill/utils.py:134: UserWarning: Time axis has duplicate entries. Now adding milliseconds to non-unique entries to make index unique. warnings.warn(
x | y | Model_surface_elevation | |
---|---|---|---|
2017-10-26 04:37:37 | 8.757272 | 53.926136 | NaN |
2017-10-26 04:37:54 | 8.221631 | 54.948460 | NaN |
2017-10-26 04:37:55 | 8.189390 | 55.008549 | NaN |
2017-10-26 04:37:56 | 8.157065 | 55.068626 | NaN |
2017-10-26 04:37:58 | 8.124656 | 55.128700 | NaN |
... | ... | ... | ... |
2017-10-30 20:54:43 | 3.703790 | 55.363529 | NaN |
2017-10-30 20:54:44 | 3.672147 | 55.420830 | NaN |
2017-10-30 20:54:45 | 3.640425 | 55.478127 | NaN |
2017-10-30 20:54:46 | 3.608623 | 55.535419 | NaN |
2017-10-30 20:54:47 | 3.576741 | 55.592701 | NaN |
1115 rows × 3 columns
mr
<TrackModelResult> 'HD'
fn = '../tests/testdata/altimetry_NorthSea_20171027.csv'
df = pd.read_csv(fn, index_col=0, parse_dates=True)
o1 = ms.TrackObservation(df, item=2, name='alti')
/home/jan/src/modelskill/modelskill/utils.py:134: UserWarning: Time axis has duplicate entries. Now adding milliseconds to non-unique entries to make index unique. warnings.warn(
o1.geometry
o1.data
x | y | surface_elevation | |
---|---|---|---|
date | |||
2017-10-26 04:37:37 | 8.757272 | 53.926136 | 1.6449 |
2017-10-26 04:37:54 | 8.221631 | 54.948459 | 1.1200 |
2017-10-26 04:37:55 | 8.189390 | 55.008547 | 1.0882 |
2017-10-26 04:37:56 | 8.157065 | 55.068627 | 1.0309 |
2017-10-26 04:37:58 | 8.124656 | 55.128700 | 1.0369 |
... | ... | ... | ... |
2017-10-30 20:54:43 | 3.703790 | 55.363528 | 0.1111 |
2017-10-30 20:54:44 | 3.672147 | 55.420831 | 0.0647 |
2017-10-30 20:54:45 | 3.640425 | 55.478128 | 0.0364 |
2017-10-30 20:54:46 | 3.608623 | 55.535418 | 0.0283 |
2017-10-30 20:54:47 | 3.576741 | 55.592701 | 0.0601 |
1115 rows × 3 columns
cc = ms.compare(o1, mr)
cc.plot.scatter();
Load model, load observation, add observation to model and extract.
fn = '../tests/testdata/NorthSeaHD_and_windspeed.dfsu'
mr = ms.ModelResult(fn, name='HD', item=0)
fn = '../tests/testdata/altimetry_NorthSea_20171027.csv'
df = pd.read_csv(fn, index_col=0, parse_dates=True)
o1 = ms.TrackObservation(df, item=2, name='alti')
cc = ms.compare(o1, mr)
cc
/home/jan/src/modelskill/modelskill/utils.py:134: UserWarning: Time axis has duplicate entries. Now adding milliseconds to non-unique entries to make index unique. warnings.warn(
<ComparerCollection> Comparer: alti
Get metrics binned by a regular spatial grid, returns xarray Dataset
ss = cc.spatial_skill(metrics=['bias'])
C:\Users\rywm\Repos\modelskill\modelskill\comparison\_comparison.py:301: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning. res = df.groupby(by=by).apply(calc_metrics)
ss.ds['n']
<xarray.DataArray 'n' (y: 5, x: 5)> array([[ 3, 17, 0, 0, 0], [ 0, 50, 9, 0, 0], [ 0, 36, 51, 0, 0], [14, 72, 33, 15, 28], [37, 83, 0, 20, 76]]) Coordinates: * y (y) float64 50.6 51.66 52.7 53.75 54.8 * x (x) float64 -0.436 1.543 3.517 5.492 7.466 observation <U4 'alti' Attributes: long_name: Number of observations units: -
Plot using xarray - convenient methods coming soon!
fig, axes = plt.subplots(ncols=2, nrows=1, figsize = (10, 5))
ss.plot('n', ax=axes[0])
ss.plot('bias', ax=axes[1]);
ss = cc.spatial_skill(metrics=['bias'], n_min=25)
fig, axes = plt.subplots(ncols=2, nrows=1, figsize=(10, 5))
ss.plot('n', ax=axes[0])
ss.plot('bias', ax=axes[1]);
C:\Users\rywm\Repos\modelskill\modelskill\comparison\_comparison.py:301: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning. res = df.groupby(by=by).apply(calc_metrics)
Get data from comparer as dataframe and add a water level category as a new column.
all_df = cc.to_dataframe()
all_df["wl category"] = 'high'
all_df.loc[all_df['mod_val']<0, "wl category"] = 'low'
Add the "wl category" to the comparer's data structure.
cmp = cc[0]
cmp.data["wl category"] = all_df["wl category"]
cmp.data
<xarray.Dataset> Dimensions: (time: 544) Coordinates: * time (time) datetime64[ns] 2017-10-27T10:45:19 ... 2017-10-29T13:... Data variables: x (time) float64 1.262 1.231 1.2 1.168 ... 6.908 6.971 7.034 y (time) float64 55.3 55.24 55.18 55.13 ... 55.24 55.28 55.32 HD (time) float32 0.3699 0.356 0.3559 ... 0.7068 0.7068 0.685 Observation (time) float64 0.3778 0.4375 0.4489 ... 0.8562 0.8368 0.8218 wl category (time) object 'high' 'high' 'high' ... 'high' 'high' 'high' Attributes: gtype: track name: alti quantity_name: Undefined modelskill_version: 1.0.dev2
Now aggregate the data by the new column (and x and y):
ss = cmp.spatial_skill(by=['wl category'], metrics=['bias'], n_min=5)
ss
C:\Users\rywm\Repos\modelskill\modelskill\comparison\_comparison.py:301: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning. res = df.groupby(by=by).apply(calc_metrics)
<xarray.Dataset> Dimensions: (y: 5, x: 5, wl category: 2) Coordinates: * y (y) float64 50.6 51.66 52.7 53.75 54.8 * x (x) float64 -0.436 1.543 3.517 5.492 7.466 * wl category (wl category) object 'high' 'low' Data variables: n (y, x, wl category) int64 0 3 14 3 0 0 0 0 ... 6 0 0 20 0 72 4 bias (y, x, wl category) float64 nan nan -0.2252 ... nan -0.1065 nan
ss.plot('n');
ss.plot('bias');
Add fake 2nd observation to model
df2 = df.copy()
df2['surface_elevation'] = df2['surface_elevation'] - 0.2
o2 = ms.TrackObservation(df2, item=2, name='alti2')
cmp2 = ms.compare(o2, mr)[0]
/home/jan/src/modelskill/modelskill/utils.py:134: UserWarning: Time axis has duplicate entries. Now adding milliseconds to non-unique entries to make index unique. warnings.warn(
Extract, spatial skill, add attrs, plot.
cc = cc + cmp2
ss = cc.spatial_skill(metrics=['bias'], n_min=20)
ss.ds['bias'].attrs = dict(long_name="Bias of surface elevation", units="m")
ss.plot('bias', figsize=(10,5));
C:\Users\rywm\Repos\modelskill\modelskill\comparison\_comparison.py:301: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning. res = df.groupby(by=by).apply(calc_metrics)