KDD2024 Tutorial / A Hands-On Introduction to Time Series Classification and Regression
In this notebook, we will explore the convolution-based methods for time series machine learning available in aeon. We will also demonstrate how aeon estimators are compatible with scikit-learn's hyperparameter tuning tools.
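Convolutional methods are a class of algorithms that apply large numbers of convolutional kernels to a time series and pool each convolution output into summary features. Most of these kernels are randomly generated; MiniRocket's are almost deterministic [2]. A simple linear model, typically a ridge model, is then fitted on these features. These methods have been shown to be accurate on time series classification tasks, and they are very fast to train and apply.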
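As a rough illustration of the idea, the sketch below follows the kernel sampling scheme described in the ROCKET paper [1]:

- a random length from {7, 9, 11},
- normally distributed, mean-centred weights,
- a uniform bias,
- an exponentially sampled dilation,
- random padding.

It pools each convolution output into two features: the proportion of positive values (PPV) and the maximum. This is a slow, univariate-only NumPy sketch with hypothetical function names (`generate_kernels`, `apply_kernel`, `transform`). It is not aeon's implementation.

```python
import numpy as np


def generate_kernels(n_timepoints, n_kernels, rng):
    """Sample random kernels in the style of ROCKET (simplified sketch)."""
    kernels = []
    for _ in range(n_kernels):
        length = rng.choice([7, 9, 11])
        weights = rng.normal(0.0, 1.0, length)
        weights -= weights.mean()  # mean-centre the weights
        bias = rng.uniform(-1.0, 1.0)
        # dilation sampled on an exponential scale so the kernel fits in the series
        max_exponent = np.log2((n_timepoints - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0, max_exponent))
        # pad with probability 0.5
        padding = ((length - 1) * dilation) // 2 if rng.integers(2) == 1 else 0
        kernels.append((weights, bias, dilation, padding))
    return kernels


def apply_kernel(x, weights, bias, dilation, padding):
    """Dilated convolution of one series, pooled to (PPV, max)."""
    length = len(weights)
    x = np.pad(x, padding)
    out_len = len(x) - (length - 1) * dilation
    out = np.array([
        bias + np.dot(weights, x[i : i + length * dilation : dilation])
        for i in range(out_len)
    ])
    return (out > 0).mean(), out.max()


def transform(X, kernels):
    """Map each univariate series in X to 2 features per kernel."""
    return np.array([[f for k in kernels for f in apply_kernel(x, *k)] for x in X])


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))  # 5 toy univariate series of length 100
kernels = generate_kernels(100, 50, rng)
features = transform(X, kernels)
print(features.shape)  # (5, 100): 2 features for each of the 50 kernels
```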
This notebook mostly focuses on classification, as this is the learning task these approaches were designed and published for, but they are easily adapted to regression as well. The regression versions and their evaluations are unpublished, but they are available in aeon.
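Adapting these methods to regression mostly means changing the model fitted on the transformed features. The ROCKET-family papers typically pair the transform with a ridge classifier [1]; swapping it for ridge regression gives a natural regression counterpart. We believe this mirrors aeon's default `RocketRegressor` head, but please check the aeon documentation.

The NumPy sketch below fits both kinds of head on the same feature matrix. The random matrix `F` is a hypothetical stand-in for the output of a convolutional transform, and `fit_ridge` is a hypothetical helper.

```python
import numpy as np


def fit_ridge(F, Y, alpha=1.0):
    """Closed-form ridge regression on standardised features, with intercept."""
    mu, sd = F.mean(axis=0), F.std(axis=0) + 1e-8
    Z = (F - mu) / sd
    y_mean = Y.mean(axis=0)
    W = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (Y - y_mean))
    return lambda F_new: ((F_new - mu) / sd) @ W + y_mean


rng = np.random.default_rng(0)
F = rng.normal(size=(40, 20))  # stand-in features: 40 cases x 20 features

# Regression head: predict the real-valued target directly.
y_reg = 2.0 * F[:, 0] + rng.normal(scale=0.1, size=40)
predict_reg = fit_ridge(F, y_reg)

# Classification head: one-vs-rest ridge on +/-1 targets, predict the argmax.
y_cls = (F[:, 1] > 0).astype(int)
classes = np.unique(y_cls)
Y_pm = np.where(y_cls[:, None] == classes[None, :], 1.0, -1.0)
predict_scores = fit_ridge(F, Y_pm)
y_hat = classes[predict_scores(F).argmax(axis=1)]

print("train accuracy:", (y_hat == y_cls).mean())
```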
!pip install aeon==0.11.0 torch
!mkdir -p data
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSC_TRAIN.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSC_TEST.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSC_TRAIN.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSC_TEST.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSER_TRAIN.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSER_TEST.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSER_TRAIN.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSER_TEST.ts -P data/
# There are some deprecation warnings in this notebook; we ignore them here.
# Remove this cell if you want to find out what is changing soon: aeon will
# have big changes in its v1.0.0 release!
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
from aeon.registry import all_estimators
all_estimators(
"classifier", filter_tags={"algorithm_type": "convolution"}, as_dataframe=True
)
| | name | estimator |
---|---|---|
0 | Arsenal | <class 'aeon.classification.convolution_based.... |
1 | HydraClassifier | <class 'aeon.classification.convolution_based.... |
2 | MultiRocketHydraClassifier | <class 'aeon.classification.convolution_based.... |
3 | RocketClassifier | <class 'aeon.classification.convolution_based.... |
all_estimators(
"regressor", filter_tags={"algorithm_type": "convolution"}, as_dataframe=True
)
| | name | estimator |
---|---|---|
0 | HydraRegressor | <class 'aeon.regression.convolution_based._hyd... |
1 | MultiRocketHydraRegressor | <class 'aeon.regression.convolution_based._mr_... |
2 | RocketRegressor | <class 'aeon.regression.convolution_based._roc... |
from aeon.datasets import load_from_tsfile
X_train_c, y_train_c = load_from_tsfile("./data/KDD_MTSC_TRAIN.ts")
X_test_c, y_test_c = load_from_tsfile("./data/KDD_MTSC_TEST.ts")
print("Train shape:", X_train_c.shape)
print("Test shape:", X_test_c.shape)
Train shape: (40, 4, 100)
Test shape: (40, 4, 100)
X_train_r, y_train_r = load_from_tsfile("./data/KDD_MTSER_TRAIN.ts")
X_test_r, y_test_r = load_from_tsfile("./data/KDD_MTSER_TEST.ts")
print("Train shape:", X_train_r.shape)
print("Test shape:", X_test_r.shape)
Train shape: (72, 4, 100)
Test shape: (72, 4, 100)
from aeon.classification.convolution_based import RocketClassifier
classifier = RocketClassifier(rocket_transform="rocket", random_state=42)
classifier.fit(X_train_c, y_train_c)
classifier.score(X_test_c, y_test_c)
0.425
from aeon.classification.convolution_based import RocketClassifier
classifier = RocketClassifier(rocket_transform="minirocket", random_state=42)
classifier.fit(X_train_c, y_train_c)
classifier.score(X_test_c, y_test_c)
0.9
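MiniRocket [2] replaces ROCKET's random kernels with a small fixed set. Each kernel has length 9, with weight 2 at three positions and -1 at the other six, so every kernel sums to zero. There are C(9, 3) = 84 such kernels, which is why MiniRocket organises its features in multiples of 84.

The sketch below enumerates them. It illustrates the construction described in the paper and is not aeon code.

```python
from itertools import combinations

import numpy as np

# Each MiniRocket-style kernel has length 9: weight 2 at three positions, -1 elsewhere.
KERNEL_LENGTH = 9
kernels = []
for positions in combinations(range(KERNEL_LENGTH), 3):
    w = -np.ones(KERNEL_LENGTH)
    w[list(positions)] = 2.0
    kernels.append(w)
kernels = np.array(kernels)

print(kernels.shape)  # (84, 9)
print(kernels.sum(axis=1).max())  # every kernel sums to zero
```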
RocketRegressor(rocket_transform="minirocket")
from aeon.regression.convolution_based import RocketRegressor
from sklearn.metrics import r2_score
regressor = RocketRegressor(rocket_transform="minirocket", random_state=42)
regressor.fit(X_train_r, y_train_r)
preds = regressor.predict(X_test_r)
r2_score(y_test_r, preds)
0.4110569698148909
from aeon.visualisation import plot_scatter_predictions
plot_scatter_predictions(y_test_r, preds)
[Figure: predicted vs. actual values (x-axis: Actual values, y-axis: Predicted values)]
from sklearn.model_selection import GridSearchCV
import numpy as np
import matplotlib.pyplot as plt
# ==============================================================================
# == fit =======================================================================
# ==============================================================================
param_grid = {
"num_kernels" : [100, 1000, 10000],
"max_dilations_per_kernel" : [16, 32, 64],
}
gs = GridSearchCV(
RocketRegressor(rocket_transform="minirocket", random_state=42),
param_grid,
)
gs.fit(X_train_r, y_train_r)
# ==============================================================================
# == plot results ==============================================================
# ==============================================================================
_scores = gs.cv_results_["mean_test_score"]
_params = np.array([
    f"d={p['max_dilations_per_kernel']}\nk={p['num_kernels']}"  # look up by name, not position
    for p in gs.cv_results_["params"]
])
_order = _scores.argsort()
_, a = plt.subplots(1, 1, figsize = (6, 4))
a.plot(_params[_order], _scores[_order], ".-")
a.set(xlabel = "d/k", ylabel = "r2")
plt.show()
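Score with the best configuration (by default, GridSearchCV refits it on the whole training set). On this split, the tuned model scores slightly below the default configuration above (0.411). A better cross-validation score does not guarantee a better test score.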
gs.score(X_test_r, y_test_r)
0.39129226372986603
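Why might several points in this grid behave identically? We assume aeon follows the reference MiniRocket implementation [2]. Under that assumption, `num_kernels` is really the total number of features. It is split across the 84 base kernels (rounded down). Each kernel's share is then spread over at most `max_dilations_per_kernel` exponentially spaced dilations, and the number of dilations can never exceed that share.

The sketch below is our transcription of that dilation logic, with a hypothetical function name `fit_dilations`, for series of length 100 as in this dataset.

```python
import numpy as np


def fit_dilations(input_length, num_features, max_dilations_per_kernel):
    """Sketch of MiniRocket-style dilation selection for 84 kernels of length 9.

    Returns the dilations used and how many features each dilation gets per kernel.
    """
    num_kernels = 84
    features_per_kernel = num_features // num_kernels
    true_max = min(features_per_kernel, max_dilations_per_kernel)
    multiplier = features_per_kernel / true_max
    max_exponent = np.log2((input_length - 1) / (9 - 1))
    dilations, counts = np.unique(
        np.logspace(0, max_exponent, true_max, base=2).astype(np.int32),
        return_counts=True,
    )
    counts = (counts * multiplier).astype(np.int32)
    # spread any rounding remainder across the dilations
    remainder = features_per_kernel - counts.sum()
    i = 0
    while remainder > 0:
        counts[i] += 1
        remainder -= 1
        i = (i + 1) % len(counts)
    return dilations, counts


for k in [100, 1000, 10000]:
    for d in [16, 32, 64]:
        dilations, _ = fit_dilations(100, k, d)
        print(f"num_kernels={k:>5}, max_dilations={d}: {len(dilations)} dilations")
```

Under this logic, `num_kernels=100` gives one feature per kernel and `num_kernels=1000` gives 11. The three `max_dilations_per_kernel` values in the grid therefore only change the transform when `num_kernels=10000`.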
from aeon.classification.convolution_based import RocketClassifier
classifier = RocketClassifier(rocket_transform="multirocket", random_state=42)
classifier.fit(X_train_c, y_train_c)
classifier.score(X_test_c, y_test_c)
0.775
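MultiRocket [3] extends MiniRocket in two ways. It computes four pooling operators on each convolution output instead of PPV alone, and it also applies the kernels to the first-order difference of the series. The four operators are:

- PPV: proportion of positive values,
- MPV: mean of positive values,
- MIPV: mean of indices of positive values,
- LSPV: longest stretch of positive values.

The sketch below implements them on a single convolution output, using 0-based indices. The function name is hypothetical, and the fallback values when there are no positive values (0 for MPV, -1 for MIPV) are our assumption.

```python
import numpy as np


def multirocket_pooling(z):
    """PPV, MPV, MIPV and LSPV of one convolution output z (sketch)."""
    positive = z > 0
    n_pos = positive.sum()
    ppv = n_pos / len(z)
    # mean of the positive values (0 if there are none)
    mpv = z[positive].mean() if n_pos else 0.0
    # mean 0-based index of the positive values (-1 if there are none)
    mipv = np.flatnonzero(positive).mean() if n_pos else -1.0
    # longest run of consecutive positive values
    lspv, run = 0, 0
    for is_pos in positive:
        run = run + 1 if is_pos else 0
        lspv = max(lspv, run)
    return float(ppv), float(mpv), float(mipv), int(lspv)


z = np.array([1.0, -1.0, 2.0, 3.0, -1.0])
print(multirocket_pooling(z))  # PPV=0.6, MPV=2.0, MIPV=5/3, LSPV=2
```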
from aeon.classification.convolution_based import HydraClassifier
classifier = HydraClassifier(random_state=42)
classifier.fit(X_train_c, y_train_c)
classifier.score(X_test_c, y_test_c)
0.925
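Hydra [4] organises random kernels into `n_groups` groups of `n_kernels` kernels. At every time point, the kernels within a group compete, and each kernel's features count how often it gives the largest (and the smallest) response.

The NumPy sketch below follows that description for one univariate series and one dilation. For the maximum it uses soft counting (summing the winning response); for the minimum it uses hard counting. This is our reading of the paper's reference code. It omits Hydra's multiple dilations, first-order differences and feature scaling, and the function name is hypothetical.

```python
import numpy as np


def hydra_features(x, n_kernels=8, n_groups=4, kernel_length=9, seed=0):
    """Competing-kernel counts for one univariate series (single dilation)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_groups, n_kernels, kernel_length))
    # normalise each kernel to zero mean and unit absolute sum
    W -= W.mean(axis=-1, keepdims=True)
    W /= np.abs(W).sum(axis=-1, keepdims=True)
    # sliding windows over the series: (n_positions, kernel_length)
    windows = np.lib.stride_tricks.sliding_window_view(x, kernel_length)
    # responses of every kernel at every position: (n_groups, n_kernels, n_positions)
    Z = np.einsum("gkl,pl->gkp", W, windows)
    n_pos = Z.shape[-1]
    soft_max = np.zeros((n_groups, n_kernels))
    hard_min = np.zeros((n_groups, n_kernels))
    arg_max = Z.argmax(axis=1)  # winning kernel per group and position
    arg_min = Z.argmin(axis=1)
    for g in range(n_groups):
        # soft count: add the winning kernel's response at each position
        np.add.at(soft_max[g], arg_max[g], Z[g, arg_max[g], np.arange(n_pos)])
        # hard count: add 1 for the kernel with the smallest response
        np.add.at(hard_min[g], arg_min[g], 1.0)
    return np.concatenate([soft_max.ravel(), hard_min.ravel()])


rng = np.random.default_rng(1)
x = rng.normal(size=100)
features = hydra_features(x)
print(features.shape)  # (64,): 2 x n_groups x n_kernels
```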
HydraRegressor()
from aeon.regression.convolution_based import HydraRegressor
from sklearn.metrics import r2_score
regressor = HydraRegressor(random_state=42)
regressor.fit(X_train_r, y_train_r)
preds = regressor.predict(X_test_r)
r2_score(y_test_r, preds)
0.43152035511445563
from aeon.visualisation import plot_scatter_predictions
plot_scatter_predictions(y_test_r, preds)
[Figure: predicted vs. actual values (x-axis: Actual values, y-axis: Predicted values)]
from sklearn.model_selection import RandomizedSearchCV
import numpy as np
import matplotlib.pyplot as plt
# ==============================================================================
# == fit =======================================================================
# ==============================================================================
param_grid = {
"n_kernels" : [2, 4, 8, 16, 32],
"n_groups" : [1, 2, 4, 8, 16, 32, 64, 128],
}
rs = RandomizedSearchCV(
HydraRegressor(random_state=42),
param_grid,
random_state=42,
)
rs.fit(X_train_r, y_train_r)
# ==============================================================================
# == plot results ==============================================================
# ==============================================================================
_scores = rs.cv_results_["mean_test_score"]
_params = np.array([
    f"k={p['n_kernels']}\ng={p['n_groups']}"  # look up by name, not position
    for p in rs.cv_results_["params"]
])
_order = _scores.argsort()
_, a = plt.subplots(1, 1, figsize = (6, 4))
a.plot(_params[_order], _scores[_order], ".-")
a.set(xlabel = "k/g", ylabel = "r2")
plt.show()
Score with the best configuration, which RandomizedSearchCV refits on the whole training set by default:
rs.score(X_test_r, y_test_r)
0.44497187214434764
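The Hydra search space above has 5 × 8 = 40 combinations. RandomizedSearchCV evaluates only `n_iter` of them (10 by default), and when every parameter is given as a list it samples without replacement. The best configuration it reports is therefore the best of those 10.

The plain-Python sketch below mimics that coverage. It uses Python's `random` module rather than scikit-learn's own sampler, so the configurations it picks will differ.

```python
import itertools
import random

param_grid = {
    "n_kernels": [2, 4, 8, 16, 32],
    "n_groups": [1, 2, 4, 8, 16, 32, 64, 128],
}
# every combination in the grid
all_configs = [
    dict(zip(param_grid, values)) for values in itertools.product(*param_grid.values())
]

# 10 distinct configurations, as a randomized search with n_iter=10 would evaluate
sampled = random.Random(42).sample(all_configs, k=10)

print(len(all_configs))  # 40
print(len(sampled) / len(all_configs))  # 0.25
```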
from aeon.classification.convolution_based import MultiRocketHydraClassifier
classifier = MultiRocketHydraClassifier(random_state=42)
classifier.fit(X_train_c, y_train_c)
classifier.score(X_test_c, y_test_c)
0.8
Below we show the performance of the convolution-based approaches covered in this notebook, together with a 1NN-DTW baseline, on the univariate UCR TSC archive datasets [5]. The results come from a large-scale comparison of TSC algorithms [6], and the results files are stored on timeseriesclassification.com.
from aeon.benchmarking import get_estimator_results_as_array
from aeon.datasets.tsc_datasets import univariate
names = ["Rocket", "MiniRocket", "MultiRocket", "Hydra", "MultiRocketHydra", "1NN-DTW"]
results, present_names = get_estimator_results_as_array(
names, univariate, include_missing=False
)
results.shape
(112, 6)
import numpy as np
np.mean(results, axis=0)
array([0.85581294, 0.85803176, 0.86567363, 0.85349016, 0.86802991, 0.74206624])
from aeon.visualisation import plot_critical_difference
plot_critical_difference(results, names)
[Figure: critical difference diagram of the six estimators]
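A critical difference diagram places estimators by their average rank over datasets. Rank 1 is the most accurate estimator on a dataset, and tied estimators share the mean of their ranks. The diagram also groups estimators whose differences are not judged significant.

The sketch below computes only the average-rank part from a (datasets × estimators) accuracy array shaped like `results`. The statistical tests that `plot_critical_difference` uses to form the groups are not reproduced, and `average_ranks` is a hypothetical helper.

```python
import numpy as np


def average_ranks(results):
    """Average rank per column of a (n_datasets, n_estimators) accuracy array.

    Higher accuracy gets a better (lower) rank; tied values share the mean rank.
    """
    n_datasets, n_estimators = results.shape
    ranks = np.zeros_like(results, dtype=float)
    for i, row in enumerate(results):
        order = np.argsort(-row, kind="stable")
        row_ranks = np.empty(n_estimators)
        row_ranks[order] = np.arange(1, n_estimators + 1)
        # give tied accuracies the average of their ranks
        for value in np.unique(row):
            tied = row == value
            row_ranks[tied] = row_ranks[tied].mean()
        ranks[i] = row_ranks
    return ranks.mean(axis=0)


toy = np.array([
    [0.90, 0.85, 0.80],
    [0.70, 0.75, 0.70],
    [0.60, 0.60, 0.65],
])
print(average_ranks(toy))
```

Applied to the `results` array loaded above, `average_ranks(results)` gives one average rank per entry of `names`.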
from aeon.visualisation import plot_boxplot_median
plot_boxplot_median(results, names, plot_type="boxplot")
[Figure: box plot comparing the six estimators across datasets]
[1] Dempster, A., Petitjean, F., & Webb, G. I. (2020). ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Mining and Knowledge Discovery, 34(5), 1454-1495.
[2] Dempster, A., Schmidt, D. F., & Webb, G. I. (2021, August). Minirocket: A very fast (almost) deterministic transform for time series classification. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining (pp. 248-257).
[3] Tan, C. W., Dempster, A., Bergmeir, C., & Webb, G. I. (2022). MultiRocket: multiple pooling operators and transformations for fast and effective time series classification. Data Mining and Knowledge Discovery, 36(5), 1623-1646.
[4] Dempster, A., Schmidt, D. F., & Webb, G. I. (2023). Hydra: Competing convolutional kernels for fast and accurate time series classification. Data Mining and Knowledge Discovery, 37(5), 1779-1805.
[5] Dau, H. A., et al. (2019). The UCR time series archive. IEEE/CAA Journal of Automatica Sinica, 6(6), 1293-1305.
[6] Middlehurst, M., Schäfer, P., & Bagnall, A. (2024). Bake off redux: a review and experimental evaluation of recent time series classification algorithms. Data Mining and Knowledge Discovery, 1-74.