KDD2024 Tutorial / A Hands-On Introduction to Time Series Classification and Regression
aeon
Interval-based approaches look at phase-dependent intervals of the full series, calculating summary statistics from selected subseries to use in classification and regression. They are particularly useful when the phase of the time series is important for the task at hand. Depending on the features extracted, interval-based approaches can be more resistant to noise than other phase-dependent approaches. For example, in the case of the ethanol level series below, the phase of the signal is important for determining the level of ethanol in a bottle of alcohol, but a large amount of irrelevant high-variation noise is present which would confound approaches such as distance-based methods.
Currently, seven published interval-based approaches are implemented in aeon
, all of which can handle classification and five of which can handle regression. These are: the Time Series Forest (TSF) [1], the Random Interval Spectral Ensemble (RISE) [2], the Canonical Interval Forest (CIF) [3], DrCIF [4], the Supervised Time Series Forest (STSF) [5], RSTSF [6] and QUANT [7].
In this notebook, we will demonstrate how to use four interval-based estimators on our EEG example: TSF, DrCIF, RSTSF and QUANT.
!pip install aeon==0.11.0 torch
!mkdir -p data
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSC_TRAIN.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSC_TEST.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSC_TRAIN.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSC_TEST.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSER_TRAIN.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSER_TEST.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSER_TRAIN.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSER_TEST.ts -P data/
# There are some deprecation warnings present in the notebook; we will ignore them.
# Remove this cell if you are interested in finding out what is changing soon, as
# there will be big changes in our aeon v1.0.0 release!
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
from aeon.registry import all_estimators
all_estimators(
"classifier", filter_tags={"algorithm_type": "interval"}, as_dataframe=True
)
| | name | estimator |
| --- | --- | --- |
| 0 | CanonicalIntervalForestClassifier | <class 'aeon.classification.interval_based._ci... |
| 1 | DrCIFClassifier | <class 'aeon.classification.interval_based._dr... |
| 2 | IntervalForestClassifier | <class 'aeon.classification.interval_based._in... |
| 3 | QUANTClassifier | <class 'aeon.classification.interval_based._qu... |
| 4 | RSTSF | <class 'aeon.classification.interval_based._rs... |
| 5 | RandomIntervalClassifier | <class 'aeon.classification.interval_based._in... |
| 6 | RandomIntervalSpectralEnsembleClassifier | <class 'aeon.classification.interval_based._ri... |
| 7 | SupervisedIntervalClassifier | <class 'aeon.classification.interval_based._in... |
| 8 | SupervisedTimeSeriesForest | <class 'aeon.classification.interval_based._st... |
| 9 | TimeSeriesForestClassifier | <class 'aeon.classification.interval_based._ts... |
all_estimators(
"regressor", filter_tags={"algorithm_type": "interval"}, as_dataframe=True
)
| | name | estimator |
| --- | --- | --- |
| 0 | CanonicalIntervalForestRegressor | <class 'aeon.regression.interval_based._cif.Ca... |
| 1 | DrCIFRegressor | <class 'aeon.regression.interval_based._drcif.... |
| 2 | IntervalForestRegressor | <class 'aeon.regression.interval_based._interv... |
| 3 | RandomIntervalRegressor | <class 'aeon.regression.interval_based._interv... |
| 4 | RandomIntervalSpectralEnsembleRegressor | <class 'aeon.regression.interval_based._rise.R... |
| 5 | TimeSeriesForestRegressor | <class 'aeon.regression.interval_based._tsf.Ti... |
from aeon.datasets import load_from_tsfile
X_train_c, y_train_c = load_from_tsfile("./data/KDD_MTSC_TRAIN.ts")
X_test_c, y_test_c = load_from_tsfile("./data/KDD_MTSC_TEST.ts")
print("Train shape:", X_train_c.shape)
print("Test shape:", X_test_c.shape)
Train shape: (40, 4, 100)
Test shape: (40, 4, 100)
from aeon.visualisation import plot_collection_by_class
plot_collection_by_class(X_train_c[:,2,:], y_train_c)
(<Figure size 1200x600 with 2 Axes>, array([<Axes: >, <Axes: >], dtype=object))
X_train_r, y_train_r = load_from_tsfile("./data/KDD_MTSER_TRAIN.ts")
X_test_r, y_test_r = load_from_tsfile("./data/KDD_MTSER_TEST.ts")
print("Train shape:", X_train_r.shape)
print("Test shape:", X_test_r.shape)
Train shape: (72, 4, 100)
Test shape: (72, 4, 100)
from matplotlib import pyplot as plt
plt.plot(X_train_r[0].T)
plt.legend(["Dim 0", "Dim 1", "Dim 2", "Dim 3"])
<matplotlib.legend.Legend at 0x1f1ecf8b190>
The time series forest (TSF) [1] is an ensemble of tree classifiers built on randomly selected intervals. For each tree, sqrt(n_timepoints)
interval subseries are selected with random length and position. From each of these intervals the mean, standard deviation and slope is extracted from each time series and concatenated into a feature vector. These new features are then used to build the tree, which is added to the ensemble.
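The per-tree feature extraction can be sketched in plain numpy. This is a toy univariate series, and the minimum interval length of 3 is illustrative rather than aeon's exact setting:

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy univariate series and sqrt(n_timepoints) random intervals, as in TSF.
n_timepoints = 100
series = rng.normal(size=n_timepoints)
n_intervals = int(np.sqrt(n_timepoints))  # 10

features = []
for _ in range(n_intervals):
    # Random position and length, with a minimum interval length of 3.
    start = rng.integers(0, n_timepoints - 3)
    end = rng.integers(start + 3, n_timepoints + 1)
    interval = series[start:end]
    # The three TSF summary statistics: mean, standard deviation and slope.
    slope = np.polyfit(np.arange(end - start), interval, 1)[0]
    features.extend([interval.mean(), interval.std(), slope])

feature_vector = np.array(features)
print(feature_vector.shape)  # (30,): 10 intervals x 3 features
```

Each tree in the ensemble repeats this with its own random intervals, which is where the diversity of the forest comes from.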
The TimeSeriesForestClassifier
class is an aeon
classifier for the TSF algorithm. The classifier can be fitted to the training data and used to predict the class of the test data.
from sklearn.metrics import accuracy_score
from aeon.classification.interval_based import TimeSeriesForestClassifier
tsf_cls = TimeSeriesForestClassifier(random_state=42)
tsf_cls.fit(X_train_c, y_train_c)
tsf_preds_c = tsf_cls.predict(X_test_c)
accuracy_score(y_test_c, tsf_preds_c)
0.85
Interval based classifiers can provide insight into the importance of different parts of the time series. The temporal_importance_curves
method returns the importance of each interval in the time series. These curves can be plotted to show the importance of different intervals and features in the time series.
Currently, only BaseIntervalForest
estimators using the ContinuousIntervalTree
base support creating temporal importance curves.
In the plot below, we see that the standard deviation extracted from the beginning and end of channel 0 and from the beginning and middle of channel 1 is important for the classification task.
from aeon.classification.sklearn import ContinuousIntervalTree
tsf_tic = TimeSeriesForestClassifier(base_estimator=ContinuousIntervalTree(), random_state=42)
tsf_tic.fit(X_train_c, y_train_c)
TimeSeriesForestClassifier(base_estimator=ContinuousIntervalTree(), random_state=42)
from aeon.visualisation import plot_temporal_importance_curves
names, curves = tsf_tic.temporal_importance_curves()
fig, ax = plot_temporal_importance_curves(curves, names, top_curves_shown=3, plot_mean=True)
fig.set_size_inches(10, 6)
fig.show()
The TimeSeriesForestRegressor
class is an aeon
regressor for the TSF algorithm.
from sklearn.metrics import mean_squared_error
from aeon.regression.interval_based import TimeSeriesForestRegressor
tsf_reg = TimeSeriesForestRegressor(random_state=42)
tsf_reg.fit(X_train_r, y_train_r)
tsf_preds_r = tsf_reg.predict(X_test_r)
mean_squared_error(y_test_r, tsf_preds_r)
0.65566000777802
from aeon.visualisation import plot_scatter_predictions
plot_scatter_predictions(y_test_r, tsf_preds_r)
(<Figure size 600x600 with 1 Axes>, <Axes: xlabel='Actual values', ylabel='Predicted values'>)
The canonical interval forest (CIF) [3] extends the TSF algorithm. In addition to the three summary statistics used by TSF, CIF makes use of features from the Catch22 [8] transformation. To increase the diversity of the ensemble, the pool of TSF and Catch22 attributes is randomly subsampled per tree, with eight selected by default. Intervals remain randomly selected.
DrCIF [4] makes use of multiple series representations: the series is transformed into periodogram and first order differences representations, and intervals are extracted from these as well as the original series.
While the Catch22 features are slower to calculate than the summary statistics, they are more informative and can improve the performance of the classifier.
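The additional representations DrCIF draws intervals from can be sketched with numpy; the exact periodogram computation in aeon may differ from this simple FFT version:

```python
import numpy as np

rng = np.random.default_rng(0)
series = rng.normal(size=100)

# The three DrCIF representations: the original series, a periodogram and
# the first order differences. Intervals are drawn from each of them.
periodogram = np.abs(np.fft.rfft(series)) ** 2  # length n // 2 + 1 = 51
differences = np.diff(series)                   # length n - 1 = 99

print(len(series), len(periodogram), len(differences))  # 100 51 99
```

Note that both derived representations are shorter than the input, which matters when interpreting interval positions in the importance curves below.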
The DrCIFClassifier
class is an aeon
classifier for the DrCIF algorithm. The algorithm performs well on our EEG example, achieving an accuracy of 0.95 on the below configuration.
Using the temporal importance curves plot, we can see that the most informative features come from the periodogram representation of the time series with the mean and max of the intervals from the end of the periodogram. The periodogram transform shortens the series, so the end of the curve plot from time point 64 onwards does not exist for this representation.
from aeon.classification.sklearn import ContinuousIntervalTree
from sklearn.metrics import accuracy_score
from aeon.classification.interval_based import DrCIFClassifier
cif_cls = DrCIFClassifier(n_estimators=100, base_estimator=ContinuousIntervalTree(), random_state=42)
cif_cls.fit(X_train_c, y_train_c)
cif_preds_c = cif_cls.predict(X_test_c)
accuracy_score(y_test_c, cif_preds_c)
0.95
from aeon.visualisation import plot_temporal_importance_curves
names, curves = cif_cls.temporal_importance_curves()
fig, ax = plot_temporal_importance_curves(curves, names, top_curves_shown=3, plot_mean=True)
fig.set_size_inches(10, 6)
The DrCIFRegressor
class is an aeon
regressor for the DrCIF algorithm.
from sklearn.metrics import mean_squared_error
from aeon.regression.interval_based import DrCIFRegressor
cif_reg = DrCIFRegressor(n_estimators=100, random_state=42)
cif_reg.fit(X_train_r, y_train_r)
cif_preds_r = cif_reg.predict(X_test_r)
mean_squared_error(y_test_r, cif_preds_r)
0.7120495131469782
from aeon.visualisation import plot_scatter_predictions
plot_scatter_predictions(y_test_r, cif_preds_r)
(<Figure size 600x600 with 1 Axes>, <Axes: xlabel='Actual values', ylabel='Predicted values'>)
The supervised time series forest (STSF) [5] makes a number of adjustments to the original TSF algorithm. A supervised method of selecting intervals replaces random selection. Features are extracted from intervals drawn from additional periodogram and first order differences representations, and the median, min, max and interquartile range are added to the summary statistics extracted.
RSTSF [6] uses the same supervised process but with additional randomisation to increase diversity. Rather than an ensemble, the intervals are all extracted at once and the resulting features used to build an extra trees classifier in a pipeline format.
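The pipeline structure, extracting interval features once and then fitting a single extra trees classifier on the result, can be sketched with plain numpy and scikit-learn. The fixed intervals, summary statistics and random labels here are purely illustrative; RSTSF selects its intervals in a supervised manner:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(42)
n_cases, n_timepoints = 20, 100
X = rng.normal(size=(n_cases, n_timepoints))
y = np.array([0] * 10 + [1] * 10)

# Extract all interval features once up front (fixed illustrative intervals
# and summary statistics, not RSTSF's supervised selection)...
intervals = [(0, 50), (25, 75), (50, 100)]
features = np.column_stack(
    [f(X[:, s:e], axis=1) for s, e in intervals for f in (np.mean, np.std, np.median)]
)
print(features.shape)  # (20, 9)

# ...then build a single extra trees classifier on the feature matrix,
# rather than refitting a transform inside every ensemble member.
clf = ExtraTreesClassifier(n_estimators=50, random_state=42).fit(features, y)
```

Extracting the features once rather than per ensemble member is what makes the pipeline formulation faster than the original STSF ensemble.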
The supervised interval extraction is only available for classification tasks currently. The RSTSF
class is an aeon
classifier for the RSTSF algorithm.
from aeon.classification.interval_based import RSTSF
from sklearn.metrics import accuracy_score
rstsf_cls = RSTSF(random_state=42)
rstsf_cls.fit(X_train_c, y_train_c)
rstsf_preds_c = rstsf_cls.predict(X_test_c)
accuracy_score(y_test_c, rstsf_preds_c)
0.95
The QUANT interval classifier [7] extracts multiple quantiles from a fixed set of dyadic intervals. The series is repeatedly halved, with the quantiles from each subseries concatenated to form a feature vector. For each set of intervals extracted, the window is shifted by half the interval length to extract further intervals. Like the previous algorithms, multiple series representations are used: feature extraction is performed on the first order differences, second order differences and a Fourier transform of the input series, along with the original series.
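The dyadic interval scheme can be sketched as follows. This simplification uses a fixed quantile count per interval and omits the half-length window shift and the extra representations:

```python
import numpy as np

series = np.arange(64, dtype=float)  # toy series of dyadic length

# Repeatedly halve the series; at each depth, extract quantiles from every
# subseries and concatenate them into one feature vector.
features = []
length = len(series)
while length >= 4:  # illustrative minimum interval length
    for i in range(len(series) // length):
        interval = series[i * length : (i + 1) * length]
        # A fixed set of four quantiles per interval (QUANT itself uses a
        # quantile count proportional to the interval length).
        features.extend(np.quantile(interval, [0.2, 0.4, 0.6, 0.8]))
    length //= 2

feature_vector = np.array(features)
print(feature_vector.shape)  # (124,): 31 intervals x 4 quantiles
```

Because the intervals are fixed by the series length rather than sampled, the transform is deterministic and cheap, which is why QUANT pairs it with a single extra trees classifier rather than a bespoke ensemble.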
Currently, QUANT is only available for classification tasks; a version for regression will soon be available. The QUANTClassifier
class is an aeon
classifier for the QUANT algorithm.
Note: QUANT is implemented using torch
. You will need to pip install torch
to run this code.
from sklearn.metrics import accuracy_score
from aeon.classification.interval_based import QUANTClassifier
quant_cls = QUANTClassifier(random_state=42)
quant_cls.fit(X_train_c, y_train_c)
quant_preds_c = quant_cls.predict(X_test_c)
accuracy_score(y_test_c, quant_preds_c)
0.95
Below we show the performance of the interval classifiers we have introduced in this notebook on the UCR TSC archive datasets [9] using results from a large scale comparison of TSC algorithms [10]. The results files are stored on timeseriesclassification.com.
from aeon.benchmarking import get_estimator_results_as_array
from aeon.datasets.tsc_datasets import univariate
names = ["TSF", "RISE", "CIF", "DrCIF", "STSF", "RSTSF", "QUANT", "1NN-DTW"]
results, present_names = get_estimator_results_as_array(
names, univariate, include_missing=False
)
results.shape
(112, 8)
import numpy as np
np.mean(results, axis=0)
array([0.78949898, 0.79287376, 0.83218294, 0.84985472, 0.82956842, 0.84855813, 0.85439771, 0.74206624])
from aeon.visualisation import plot_critical_difference
plot_critical_difference(results, names)
(<Figure size 600x270 with 1 Axes>, <Axes: >)
from aeon.visualisation import plot_boxplot_median
plot_boxplot_median(results, names, plot_type="boxplot")
(<Figure size 1000x600 with 1 Axes>, <Axes: >)
aeon
has transformation classes available for both the random and supervised methods of interval extraction.
from aeon.transformations.collection.interval_based import RandomIntervals
from aeon.utils.numba.stats import row_mean, row_std
rand_int = RandomIntervals(n_intervals=3, features=[row_mean, row_std])
rand_int.fit(X_train_c)
rand_int.transform(X_train_c)
array([[ 2.79178773e-06, 7.00553355e-06, -1.03336568e-05, 5.91278019e-06, -1.21532110e-06, 1.40214100e-05], [-1.19249840e-05, 1.92607752e-05, -7.73571803e-07, 7.83381023e-06, 2.04542303e-06, 1.54176836e-05], [ 2.16624288e-08, 1.23857351e-05, -5.97589174e-06, 5.10042374e-06, 4.82148204e-07, 1.32838369e-05], [-1.21167348e-05, 6.15997514e-06, -4.79926485e-06, 4.29394145e-06, 6.99026606e-07, 1.55456224e-05], [ 3.06131845e-06, 1.22718683e-05, -2.02003339e-06, 8.36990447e-06, 2.81618386e-06, 1.43893458e-05], [ 1.01280600e-05, 4.36681589e-06, -5.95266932e-06, 8.26655474e-06, 4.97379146e-06, 1.57106994e-05], [-6.61164787e-06, 9.12496777e-06, 1.10538169e-06, 5.59794057e-06, 1.32845475e-06, 1.31395063e-05], [-1.38681897e-05, 1.06896067e-05, 1.05977125e-06, 4.68217327e-06, 5.82584819e-06, 1.51930658e-05], [ 3.79417107e-06, 6.62607155e-06, 5.78732659e-07, 5.38827732e-06, 6.60740332e-06, 1.44981593e-05], [-6.88725118e-06, 1.26864034e-05, -7.91266509e-06, 5.80978202e-06, -4.07073127e-06, 1.48045736e-05], [-2.77813457e-06, 6.12300182e-06, -1.64955962e-07, 6.47926905e-06, 7.37534857e-07, 1.29796680e-05], [-8.98364551e-06, 8.91946180e-06, -8.85418011e-06, 4.80818644e-06, 4.96045506e-06, 1.56850591e-05], [-1.72087738e-06, 6.63406838e-06, -6.00830006e-06, 8.26884393e-06, -1.18551630e-05, 1.49924130e-05], [ 6.01516115e-06, 1.47044499e-05, 1.02665067e-06, 4.47330431e-06, 7.69532780e-06, 1.45330685e-05], [-2.82369035e-06, 1.85428726e-05, 2.88355746e-06, 7.45194534e-06, 5.45649320e-06, 1.83465248e-05], [-1.39334915e-05, 1.10270286e-05, 1.22343338e-06, 9.19145073e-06, -2.20060340e-06, 1.17517847e-05], [-3.54812456e-06, 1.70380565e-05, -3.68138437e-06, 5.08911914e-06, -4.99570672e-06, 1.46138768e-05], [-6.95260945e-06, 9.91589514e-06, -3.92004785e-06, 1.07268084e-05, -2.23117627e-06, 1.35153779e-05], [-1.18201485e-06, 8.98774574e-06, 2.97138443e-06, 4.42813424e-06, 3.79477590e-06, 1.46833351e-05], [-5.42352597e-06, 9.31518466e-06, -2.73249880e-06, 4.42825422e-06, -5.13404479e-06, 
1.40283134e-05], [ 3.54620422e-06, 4.66777282e-06, 3.56934843e-06, 4.98415177e-06, -2.04932819e-05, 1.84881647e-05], [ 3.92186585e-06, 1.13116533e-05, 9.79880445e-07, 4.20147588e-06, 2.14822322e-07, 1.51430994e-05], [-8.44343680e-06, 5.32603766e-06, -3.43256663e-06, 4.29753213e-06, 6.02679120e-06, 9.72551733e-06], [ 3.61386609e-06, 2.96745175e-06, 6.63931679e-06, 3.85510078e-06, 3.25399006e-06, 7.21193033e-06], [ 1.29655023e-05, 3.01751646e-06, -1.32151838e-05, 6.47455104e-06, 1.19208269e-05, 1.39330967e-05], [-1.27530891e-05, 3.76517160e-06, 1.78286538e-06, 6.08102163e-06, -7.18325417e-06, 8.59114066e-06], [-1.03754122e-05, 2.50297130e-06, -1.90387181e-06, 4.90865374e-06, -5.64319010e-06, 7.59944394e-06], [ 3.26993790e-06, 2.90988199e-06, -8.21731698e-06, 6.87692508e-06, 4.44424946e-06, 1.10854631e-05], [ 1.69358435e-05, 4.10978411e-06, 7.96482001e-06, 5.88389048e-06, 1.56967049e-05, 1.42263968e-05], [-5.33565637e-06, 6.26703985e-06, -3.78084330e-06, 3.70554924e-06, -5.22148574e-06, 1.09143179e-05], [ 4.36674061e-06, 8.18412795e-06, 2.13309909e-05, 5.08039063e-06, 6.82103369e-06, 8.44103983e-06], [ 1.35910160e-06, 4.91535172e-06, -7.08370460e-06, 5.86146490e-06, -5.24920398e-06, 9.86561566e-06], [ 2.18805856e-05, 2.59611487e-06, 1.19971802e-05, 5.17597019e-06, 2.03438885e-05, 9.00783595e-06], [-3.35750001e-06, 2.92724860e-06, -2.55303081e-05, 6.04488777e-06, -1.01784484e-05, 1.94103319e-05], [-1.12986535e-05, 3.65393271e-06, -6.70277914e-06, 6.52399648e-06, 1.06592810e-05, 1.00678394e-05], [-1.35680300e-05, 5.23476331e-06, 4.63219623e-06, 5.06754422e-06, -2.91339938e-06, 1.60459294e-05], [ 9.12246019e-06, 3.74859292e-06, -1.33767586e-05, 7.22029925e-06, -1.75959770e-06, 9.31221466e-06], [ 1.60798671e-05, 7.89538007e-06, 1.08511808e-05, 5.98028608e-06, 1.25350003e-05, 1.00985848e-05], [ 8.60716478e-06, 4.87122855e-06, 9.84524652e-06, 4.40366535e-06, 1.63980425e-06, 8.64908567e-06], [-8.33453930e-06, 3.70112232e-06, -3.63708762e-06, 6.54974380e-06, 3.54094464e-06, 
1.50164337e-05]])
from aeon.transformations.collection.interval_based import SupervisedIntervals
sup_int = SupervisedIntervals(n_intervals=1, features=[row_mean, row_std])
sup_int.fit(X_train_c, y_train_c)
sup_int.transform(X_train_c)
array([[-1.53204310e-05, -1.69415540e-05, -2.00034729e-05, ..., 7.49260582e-06, 8.78522462e-06, 3.27827601e-06], [-1.03658132e-05, 3.88462199e-06, -1.61891666e-06, ..., 6.11356875e-06, 7.19938184e-06, 5.92081090e-06], [-2.79596369e-06, -3.78408311e-06, -3.50255592e-06, ..., 5.39698660e-06, 5.51183168e-06, 3.62738874e-06], ..., [ 1.86491690e-05, 1.94140163e-05, 1.70760024e-05, ..., 5.51905175e-06, 3.94290242e-06, 4.84783972e-06], [-4.50684085e-06, 8.65383598e-07, 2.33870474e-06, ..., 4.34097676e-06, 3.98221724e-06, 2.79533143e-06], [ 2.98438226e-05, 1.28175026e-05, 1.72065405e-05, ..., 5.63069008e-06, 5.77779821e-06, 3.77031967e-06]])
A composable class is also available for both classification and regression, allowing for a configurable interval extraction ensemble.
from aeon.utils.numba.stats import row_mean, row_std, row_numba_max, row_numba_min
from aeon.classification.interval_based import IntervalForestClassifier
from aeon.classification.sklearn import ContinuousIntervalTree
from aeon.transformations.collection import PeriodogramTransformer
from sklearn.metrics import accuracy_score
int_clf = IntervalForestClassifier(
base_estimator=ContinuousIntervalTree(),
interval_selection_method="random",
n_intervals=10,
interval_features=[row_numba_min, row_numba_max, row_mean, row_std],
series_transformers=[None, PeriodogramTransformer()],
random_state=42,
)
int_clf.fit(X_train_c, y_train_c)
int_preds_c = int_clf.predict(X_test_c)
accuracy_score(y_test_c, int_preds_c)
0.775
from aeon.visualisation import plot_temporal_importance_curves
names, curves = int_clf.temporal_importance_curves()
fig, ax = plot_temporal_importance_curves(curves, names, top_curves_shown=3, plot_mean=True)
fig.set_size_inches(10, 6)
[1] Deng, Houtao, et al. "A time series forest for classification and feature extraction." Information Sciences 239 (2013): 142-153.
[2] Flynn, Michael, James Large, and Tony Bagnall. "The contract random interval spectral ensemble (c-RISE): The effect of contracting a classifier on accuracy." Hybrid Artificial Intelligent Systems: 14th International Conference, HAIS 2019, León, Spain, September 4–6, 2019, Proceedings 14. Springer International Publishing, 2019.
[3] Middlehurst, Matthew, James Large, and Anthony Bagnall. "The canonical interval forest (CIF) classifier for time series classification." 2020 IEEE international conference on big data (big data). IEEE, 2020.
[4] Middlehurst, Matthew, et al. "HIVE-COTE 2.0: a new meta ensemble for time series classification." Machine Learning 110.11 (2021): 3211-3243.
[5] Cabello, Nestor, et al. "Fast and accurate time series classification through supervised interval search." 2020 IEEE International Conference on Data Mining (ICDM). IEEE, 2020.
[6] Cabello, Nestor, et al. "Fast, accurate and explainable time series classification through randomization." Data Mining and Knowledge Discovery 38.2 (2024): 748-811.
[7] Dempster, Angus, Daniel F. Schmidt, and Geoffrey I. Webb. "Quant: A minimalist interval method for time series classification." Data Mining and Knowledge Discovery (2024): 1-26.
[8] Lubba, Carl H., et al. "catch22: CAnonical Time-series CHaracteristics: Selected through highly comparative time-series analysis." Data Mining and Knowledge Discovery 33.6 (2019): 1821-1852.
[9] Dau, Hoang Anh, et al. "The UCR time series archive." IEEE/CAA Journal of Automatica Sinica 6.6 (2019): 1293-1305.
[10] Middlehurst, Matthew, Patrick Schäfer, and Anthony Bagnall. "Bake off redux: a review and experimental evaluation of recent time series classification algorithms." Data Mining and Knowledge Discovery (2024): 1-74.