This notebook contains examples of forecasting with neural network models.
import torch
import random
import pandas as pd
import numpy as np
from etna.datasets.tsdataset import TSDataset
from etna.transforms import DateFlagsTransform
from etna.transforms import LagTransform
from etna.transforms import LinearTrendTransform
from etna.metrics import SMAPE, MAPE, MAE
from etna.model_selection import TimeSeriesCrossValidation
from etna.analysis import plot_backtest
from etna.models import SeasonalMovingAverageModel
import warnings
warnings.filterwarnings("ignore")
We are going to use the transformed [Household Electric Power Consumption] dataset. Let's load it and take a look.
original_df = pd.read_csv("data/example_dataset.csv")
original_df.head()
|   | timestamp  | segment   | target |
|---|------------|-----------|--------|
| 0 | 2019-01-01 | segment_a | 170    |
| 1 | 2019-01-02 | segment_a | 243    |
| 2 | 2019-01-03 | segment_a | 267    |
| 3 | 2019-01-04 | segment_a | 287    |
| 4 | 2019-01-05 | segment_a | 279    |
Our library works with the special data structure TSDataset. Let's create it as we did in the "Get started" notebook.
df = TSDataset.to_dataset(original_df)
ts = TSDataset(df, freq="D")
ts.head(5)
| segment    | segment_a | segment_b | segment_c | segment_d |
|------------|-----------|-----------|-----------|-----------|
| feature    | target    | target    | target    | target    |
| timestamp  |           |           |           |           |
| 2019-01-01 | 170       | 102       | 92        | 238       |
| 2019-01-02 | 243       | 123       | 107       | 358       |
| 2019-01-03 | 267       | 130       | 103       | 366       |
| 2019-01-04 | 287       | 138       | 103       | 385       |
| 2019-01-05 | 279       | 137       | 104       | 384       |
Our library uses PyTorch Forecasting to work with time series neural networks. To include it in our current architecture we use the PytorchForecastingTransform class.
Let's look at it more closely.
from etna.transforms import PytorchForecastingTransform
?PytorchForecastingTransform
"""
Init signature:
PytorchForecastingTransform(
max_encoder_length: int = 30,
min_encoder_length: int = None,
min_prediction_idx: int = None,
min_prediction_length: int = None,
max_prediction_length: int = 1,
static_categoricals: List[str] = [],
static_reals: List[str] = [],
time_varying_known_categoricals: List[str] = [],
time_varying_known_reals: List[str] = [],
time_varying_unknown_categoricals: List[str] = [],
time_varying_unknown_reals: List[str] = [],
variable_groups: Dict[str, List[int]] = {},
dropout_categoricals: List[str] = [],
constant_fill_strategy: Dict[str, Union[str, float, int, bool]] = {},
allow_missings: bool = True,
lags: Dict[str, List[int]] = {},
add_relative_time_idx: bool = True,
add_target_scales: bool = True,
add_encoder_length: Union[bool, str] = True,
target_normalizer: Union[pytorch_forecasting.data.encoders.TorchNormalizer, pytorch_forecasting.data.encoders.NaNLabelEncoder, pytorch_forecasting.data.encoders.EncoderNormalizer, str, List[Union[pytorch_forecasting.data.encoders.TorchNormalizer, pytorch_forecasting.data.encoders.NaNLabelEncoder, pytorch_forecasting.data.encoders.EncoderNormalizer]], Tuple[Union[pytorch_forecasting.data.encoders.TorchNormalizer, pytorch_forecasting.data.encoders.NaNLabelEncoder, pytorch_forecasting.data.encoders.EncoderNormalizer]]] = 'auto',
categorical_encoders: Dict[str, pytorch_forecasting.data.encoders.NaNLabelEncoder] = None,
scalers: Dict[str, Union[sklearn.preprocessing._data.StandardScaler, sklearn.preprocessing._data.RobustScaler, pytorch_forecasting.data.encoders.TorchNormalizer, pytorch_forecasting.data.encoders.EncoderNormalizer]] = {},
)
Docstring: Transform for models from PytorchForecasting library.
Init docstring:
Parameters for TimeSeriesDataSet object.
Reference
---------
https://github.com/jdb78/pytorch-forecasting/blob/v0.8.5/pytorch_forecasting/data/timeseries.py#L117
"""
We can see a pretty scary signature, but don't panic: we will go through only the most important parameters.

* `time_varying_known_reals` - known real values that change over time (real regressors); for now it is necessary to add the "time_idx" variable to this list;
* `time_varying_unknown_reals` - our real-valued target, set it to `["target"]`;
* `max_prediction_length` - our forecasting horizon;
* `max_encoder_length` - length of the past context to use;
* `static_categoricals` - static categorical values; for example, if we work with multiple segments this can be some of their characteristics, including the identifier: "segment";
* `time_varying_known_categoricals` - known categorical values that change over time (categorical regressors);
* `target_normalizer` - class for normalizing the target across different segments.

In this section we will test our models on this example.
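Before that, here is a minimal illustrative sketch of how the parameters above map onto a transform. The concrete values (a one-week context and horizon) are placeholders for illustration only; the actual configuration we use for DeepAR is built below.

```python
from pytorch_forecasting.data import GroupNormalizer

from etna.transforms import PytorchForecastingTransform

# Illustrative only: use one week of history to predict one week ahead.
sketch_transform = PytorchForecastingTransform(
    max_encoder_length=7,                                   # length of the past context
    max_prediction_length=7,                                # forecasting horizon
    time_varying_known_reals=["time_idx"],                  # known real regressors
    time_varying_unknown_reals=["target"],                  # the target itself
    static_categoricals=["segment"],                        # static per-segment identifier
    target_normalizer=GroupNormalizer(groups=["segment"]),  # normalize target per segment
)
```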
Before training let's fix seeds for reproducibility.
torch.manual_seed(42)
random.seed(42)
np.random.seed(42)
Let's create the transforms for DeepAR.
from pytorch_forecasting.data import GroupNormalizer
HORIZON = 7
# Add day-of-week as a categorical feature (a known categorical regressor).
transform_date = DateFlagsTransform(day_number_in_week=True, day_number_in_month=False)
# Add lags of the target starting from HORIZON, so all lag features are known at forecast time.
num_lags = 10
transform_lag = LagTransform(in_column="target", lags=[HORIZON+i for i in range(num_lags)])
# Names of the created lag columns, to pass them as known real regressors.
lag_columns = [f"regressor_target_lag_{HORIZON+i}" for i in range(num_lags)]
transform_deepar = PytorchForecastingTransform(
max_encoder_length=HORIZON,
max_prediction_length=HORIZON,
time_varying_known_reals=["time_idx"]+lag_columns,
time_varying_unknown_reals=["target"],
time_varying_known_categoricals=["regressor_day_number_in_week"],
target_normalizer=GroupNormalizer(groups=["segment"]),
)
Now we are going to start the backtest.
from etna.models.nn import DeepARModel
model_deepar = DeepARModel(max_epochs=150, learning_rate=[0.01], gpus=0, batch_size=64)
metrics = [SMAPE(), MAPE(), MAE()]
tscv_deepar = TimeSeriesCrossValidation(
model=model_deepar, horizon=HORIZON, metrics=metrics, n_folds=3, n_jobs=1
)
results_deepar = tscv_deepar.backtest(ts, transforms=[transform_lag, transform_date, transform_deepar])
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name                   | Type                   | Params
------------------------------------------------------------------
0 | loss                   | NormalDistributionLoss | 0
1 | logging_metrics        | ModuleList             | 0
2 | embeddings             | MultiEmbedding         | 35
3 | rnn                    | LSTM                   | 2.2 K
4 | distribution_projector | Linear                 | 22
------------------------------------------------------------------
2.3 K     Trainable params
0         Non-trainable params
2.3 K     Total params
0.009     Total estimated model params size (MB)

(the same model summary is printed for each of the 3 folds)

[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed:  3.4min remaining: 0.0s
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed:  6.7min remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed:  9.8min finished
Let's compare results across different segments.
results_deepar[0]
|   | segment   | SMAPE     | MAPE      | MAE        | fold_number |
|---|-----------|-----------|-----------|------------|-------------|
| 2 | segment_a | 7.356097  | 7.037268  | 42.945461  | 0           |
| 2 | segment_a | 2.956163  | 3.041791  | 15.743748  | 1           |
| 2 | segment_a | 11.065429 | 13.657979 | 58.187932  | 2           |
| 1 | segment_b | 4.707505  | 4.580171  | 12.263770  | 0           |
| 1 | segment_b | 4.195955  | 4.072031  | 10.687613  | 1           |
| 1 | segment_b | 11.304928 | 12.714467 | 28.793474  | 2           |
| 3 | segment_c | 7.110642  | 6.987627  | 13.824657  | 0           |
| 3 | segment_c | 29.520678 | 25.431216 | 67.461661  | 1           |
| 3 | segment_c | 21.930446 | 26.494975 | 40.162896  | 2           |
| 0 | segment_d | 10.680206 | 10.042157 | 101.930368 | 0           |
| 0 | segment_d | 4.660242  | 4.495999  | 42.028399  | 1           |
| 0 | segment_d | 11.619863 | 12.886463 | 105.870850 | 2           |
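To compare the segments directly, we can also aggregate these per-fold metrics ourselves. This is an optional sketch using plain pandas on the metrics DataFrame returned by the backtest; the column names are the ones shown in the table above.

```python
# Optional: average SMAPE per segment across the three folds.
smape_per_segment = (
    results_deepar[0]
    .groupby("segment")["SMAPE"]
    .mean()
    .sort_values()
)
print(smape_per_segment)
```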
To summarize, we will take the mean value of the SMAPE metric because it is scale-tolerant.
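For reference, SMAPE for a single segment is usually defined as (etna's exact normalization may differ slightly):

$$\mathrm{SMAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\frac{2\,|y_t - \hat{y}_t|}{|y_t| + |\hat{y}_t|}$$

It is expressed in percent and does not depend on the scale of the series, which makes it suitable for averaging across segments.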
score = results_deepar[0]["SMAPE"].mean()
print(f"Average SMAPE for DeepAR: {score:.3f}")
Average SMAPE for DeepAR: 10.592
Let's visualize the results.
plot_backtest(results_deepar[1], ts, history_len=20)
Let's move on to the next model, TFT.
torch.manual_seed(42)
random.seed(42)
np.random.seed(42)
transform_date = DateFlagsTransform(day_number_in_week=True, day_number_in_month=False)
num_lags = 10
transform_lag = LagTransform(in_column="target", lags=[HORIZON+i for i in range(num_lags)])
lag_columns = [f"regressor_target_lag_{HORIZON+i}" for i in range(num_lags)]
transform_tft = PytorchForecastingTransform(
max_encoder_length=HORIZON,
max_prediction_length=HORIZON,
time_varying_known_reals=["time_idx"],
time_varying_unknown_reals=["target"],
time_varying_known_categoricals=["regressor_day_number_in_week"],
static_categoricals=["segment"],
target_normalizer=GroupNormalizer(groups=["segment"]),
)
from etna.models.nn import TFTModel
model_tft = TFTModel(max_epochs=200, learning_rate=[0.01], gpus=0, batch_size=64)
tscv_tft = TimeSeriesCrossValidation(
model=model_tft, horizon=HORIZON, metrics=metrics, n_folds=3, n_jobs=1
)
results_tft = tscv_tft.backtest(ts, transforms=[transform_lag, transform_date, transform_tft])
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

   | Name                                | Type                            | Params
----------------------------------------------------------------------------------------
0  | loss                                | QuantileLoss                    | 0
1  | logging_metrics                     | ModuleList                      | 0
2  | input_embeddings                    | MultiEmbedding                  | 47
3  | prescalers                          | ModuleDict                      | 96
4  | static_variable_selection           | VariableSelectionNetwork        | 1.8 K
5  | encoder_variable_selection          | VariableSelectionNetwork        | 1.9 K
6  | decoder_variable_selection          | VariableSelectionNetwork        | 1.3 K
7  | static_context_variable_selection   | GatedResidualNetwork            | 1.1 K
8  | static_context_initial_hidden_lstm  | GatedResidualNetwork            | 1.1 K
9  | static_context_initial_cell_lstm    | GatedResidualNetwork            | 1.1 K
10 | static_context_enrichment           | GatedResidualNetwork            | 1.1 K
11 | lstm_encoder                        | LSTM                            | 2.2 K
12 | lstm_decoder                        | LSTM                            | 2.2 K
13 | post_lstm_gate_encoder              | GatedLinearUnit                 | 544
14 | post_lstm_add_norm_encoder          | AddNorm                         | 32
15 | static_enrichment                   | GatedResidualNetwork            | 1.4 K
16 | multihead_attn                      | InterpretableMultiHeadAttention | 676
17 | post_attn_gate_norm                 | GateAddNorm                     | 576
18 | pos_wise_ff                         | GatedResidualNetwork            | 1.1 K
19 | pre_output_gate_norm                | GateAddNorm                     | 576
20 | output_layer                        | Linear                          | 119
----------------------------------------------------------------------------------------
18.9 K    Trainable params
0         Non-trainable params
18.9 K    Total params
0.075     Total estimated model params size (MB)

(the same model summary is printed for each of the 3 folds)

[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed:  6.3min remaining: 0.0s
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 13.7min remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 20.4min finished
results_tft[0]
|   | segment   | SMAPE     | MAPE      | MAE        | fold_number |
|---|-----------|-----------|-----------|------------|-------------|
| 2 | segment_a | 8.543791  | 8.241657  | 48.721048  | 0           |
| 2 | segment_a | 7.446091  | 7.743637  | 43.059692  | 1           |
| 2 | segment_a | 10.303624 | 12.855012 | 53.768236  | 2           |
| 1 | segment_b | 3.470987  | 3.410261  | 8.963939   | 0           |
| 1 | segment_b | 3.652748  | 3.616559  | 9.414638   | 1           |
| 1 | segment_b | 10.980510 | 12.086171 | 27.739066  | 2           |
| 3 | segment_c | 11.993670 | 12.430382 | 24.533615  | 0           |
| 3 | segment_c | 21.916415 | 19.152201 | 51.425435  | 1           |
| 3 | segment_c | 14.719156 | 16.762780 | 25.828378  | 2           |
| 0 | segment_d | 10.015856 | 9.492810  | 93.228498  | 0           |
| 0 | segment_d | 8.286156  | 8.244600  | 76.196568  | 1           |
| 0 | segment_d | 13.162213 | 14.071577 | 119.847665 | 2           |
score = results_tft[0]["SMAPE"].mean()
print(f"Average SMAPE for TFT: {score:.3f}")
Average SMAPE for TFT: 10.374
plot_backtest(results_tft[1], ts, history_len=20)
For comparison, let's train a much simpler model.
model_sma = SeasonalMovingAverageModel(window=5, seasonality=7)
tscv_sma = TimeSeriesCrossValidation(
model=model_sma, horizon=HORIZON, metrics=metrics, n_folds=3, n_jobs=1
)
# Remove a per-segment linear trend before applying the seasonal moving average.
linear_trend_transform = LinearTrendTransform(in_column='target')
results_sma = tscv_sma.backtest(ts, transforms=[linear_trend_transform])
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.3s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.4s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.4s finished
results_sma[0]
|   | segment   | SMAPE     | MAPE      | MAE        | fold_number |
|---|-----------|-----------|-----------|------------|-------------|
| 2 | segment_a | 7.260226  | 6.890553  | 42.417014  | 0           |
| 2 | segment_a | 2.842970  | 2.818930  | 16.192259  | 1           |
| 2 | segment_a | 12.530490 | 15.023041 | 66.406420  | 2           |
| 1 | segment_b | 4.973334  | 4.832433  | 12.651281  | 0           |
| 1 | segment_b | 4.520458  | 4.361738  | 11.165200  | 1           |
| 1 | segment_b | 12.673402 | 13.596187 | 32.172506  | 2           |
| 3 | segment_c | 6.011002  | 5.831433  | 11.684907  | 0           |
| 3 | segment_c | 29.948908 | 25.662033 | 68.456659  | 1           |
| 3 | segment_c | 25.850774 | 31.364255 | 48.537652  | 2           |
| 0 | segment_d | 9.915012  | 9.374096  | 92.073727  | 0           |
| 0 | segment_d | 3.756376  | 3.696013  | 34.462304  | 1           |
| 0 | segment_d | 13.923461 | 14.897909 | 126.628731 | 2           |
score = results_sma[0]["SMAPE"].mean()
print(f"Average SMAPE for Seasonal MA: {score:.3f}")
Average SMAPE for Seasonal MA: 11.184
plot_backtest(results_sma[1], ts, history_len=20)
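As an optional final step, we can gather the average SMAPE of each model into one small table to make the comparison explicit. This sketch simply reuses the metric DataFrames computed above.

```python
# Optional: collect the average SMAPE of each model into a single comparison table.
comparison = pd.DataFrame(
    {
        "model": ["DeepAR", "TFT", "Seasonal MA"],
        "SMAPE": [
            results_deepar[0]["SMAPE"].mean(),
            results_tft[0]["SMAPE"].mean(),
            results_sma[0]["SMAPE"].mean(),
        ],
    }
).sort_values("SMAPE")
comparison
```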
As we can see, the neural network models perform a bit better than the simple seasonal moving average baseline in this particular case.