KDD2024 Tutorial / A Hands-On Introduction to Time Series Classification and Regression
To finish off, we build all of these classifiers and regressors on our example EEG data and compare accuracy. We do not go into depth on relative performance, because these are very small toy datasets. We start by listing classifiers and regressors by their capabilities.
!pip install aeon==0.11.0
!mkdir -p data
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSC_TRAIN.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSC_TEST.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSC_TRAIN.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSC_TEST.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSER_TRAIN.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSER_TEST.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSER_TRAIN.ts -P data/
!wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSER_TEST.ts -P data/
# There are some deprecation warnings present in the notebook; we will ignore them.
# Remove this cell if you are interested in finding out what is changing soon. For
# aeon, there will be big changes in our v1.0.0 release!
import warnings
warnings.filterwarnings("ignore")
from aeon.registry import all_estimators
uni_cls = all_estimators("classifier", filter_tags={"capability:multivariate": False},
                         as_dataframe=True)
print("Univariate series only classifiers\n", uni_cls.iloc[:, 0])
uni_reg = all_estimators("regressor", filter_tags={"capability:multivariate": False},
                         as_dataframe=True)
print("Univariate only regressors\n", uni_reg.iloc[:, 0])
multi_cls = all_estimators("classifier", filter_tags={"capability:multivariate": True},
                           as_dataframe=True)
print("Classifiers that can handle multivariate\n", multi_cls.iloc[:, 0])
multi_reg = all_estimators("regressor", filter_tags={"capability:multivariate": True},
                           as_dataframe=True)
print("Regressors that can handle multivariate\n", multi_reg.iloc[:, 0])
Univariate series only classifiers
0     BOSSEnsemble
1     ClassifierPipeline
2     ContractableBOSS
3     HIVECOTEV1
4     IndividualBOSS
5     MrSQMClassifier
6     ProximityForest
7     ProximityTree
8     RSASTClassifier
9     SASTClassifier
10    WEASEL
11    WEASEL_V2
12    WeightedEnsembleClassifier
Name: name, dtype: object
Univariate only regressors
0    RegressorPipeline
Name: name, dtype: object
Classifiers that can handle multivariate
0     Arsenal
1     CNNClassifier
2     CanonicalIntervalForestClassifier
3     Catch22Classifier
4     ChannelEnsembleClassifier
5     DrCIFClassifier
6     DummyClassifier
7     ElasticEnsemble
8     EncoderClassifier
9     FCNClassifier
10    FreshPRINCEClassifier
11    HIVECOTEV2
12    HydraClassifier
13    InceptionTimeClassifier
14    IndividualInceptionClassifier
15    IndividualLITEClassifier
16    IndividualOrdinalTDE
17    IndividualTDE
18    IntervalForestClassifier
19    KNeighborsTimeSeriesClassifier
20    LITETimeClassifier
21    LearningShapeletClassifier
22    MLPClassifier
23    MUSE
24    MultiRocketHydraClassifier
25    OrdinalTDE
26    QUANTClassifier
27    RDSTClassifier
28    REDCOMETS
29    RISTClassifier
30    RSTSF
31    RandomIntervalClassifier
32    RandomIntervalSpectralEnsembleClassifier
33    ResNetClassifier
34    RocketClassifier
35    ShapeletTransformClassifier
36    SignatureClassifier
37    SummaryClassifier
38    SupervisedIntervalClassifier
39    SupervisedTimeSeriesForest
40    TSFreshClassifier
41    TapNetClassifier
42    TemporalDictionaryEnsemble
43    TimeCNNClassifier
44    TimeSeriesForestClassifier
Name: name, dtype: object
Regressors that can handle multivariate
0     CNNRegressor
1     CanonicalIntervalForestRegressor
2     Catch22Regressor
3     DrCIFRegressor
4     DummyRegressor
5     EncoderRegressor
6     FCNRegressor
7     FreshPRINCERegressor
8     HydraRegressor
9     InceptionTimeRegressor
10    IndividualInceptionRegressor
11    IndividualLITERegressor
12    IntervalForestRegressor
13    KNeighborsTimeSeriesRegressor
14    LITETimeRegressor
15    MLPRegressor
16    MultiRocketHydraRegressor
17    RDSTRegressor
18    RISTRegressor
19    RandomIntervalRegressor
20    RandomIntervalSpectralEnsembleRegressor
21    ResNetRegressor
22    RocketRegressor
23    SummaryRegressor
24    TSFreshRegressor
25    TapNetRegressor
26    TimeCNNRegressor
27    TimeSeriesForestRegressor
Name: name, dtype: object
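You can also query a capability tag on a single estimator class directly, without constructing it. A minimal sketch, assuming the get_class_tag interface on aeon's base estimator classes:
from aeon.classification.convolution_based import RocketClassifier

# Look up one capability tag on the class itself; no fitting required.
print(RocketClassifier.get_class_tag("capability:multivariate"))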
Currently, few classifiers and regressors support unequal-length series, and none internally handle missing values. This will change soon. Until then, we advise padding or truncating your series; a minimal padding sketch follows the listing below.
uneq_cls = all_estimators("classifier", filter_tags={"capability:unequal_length": True},
                          as_dataframe=True)
print("Classifiers that can handle unequal length series\n", uneq_cls.iloc[:, 0])
uneq_reg = all_estimators("regressor", filter_tags={"capability:unequal_length": True},
                          as_dataframe=True)
print("Regressors that can handle unequal length series\n", uneq_reg.iloc[:, 0])
Classifiers that can handle unequal length series
0    Catch22Classifier
1    DummyClassifier
2    ElasticEnsemble
3    KNeighborsTimeSeriesClassifier
4    RDSTClassifier
Name: name, dtype: object
Regressors that can handle unequal length series
0    Catch22Regressor
1    DummyRegressor
2    KNeighborsTimeSeriesRegressor
3    RDSTRegressor
Name: name, dtype: object
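To follow the padding advice above without waiting for built-in support, you can pad a collection by hand. Below is a minimal sketch in plain NumPy (not an aeon API): pad_collection and the toy series are our own illustrative names. It right-pads every series with a fill value up to the longest length.
import numpy as np

def pad_collection(X, fill_value=0.0):
    """Right-pad a list of (n_channels, n_timepoints) arrays to equal length."""
    max_len = max(x.shape[1] for x in X)
    padded = np.full((len(X), X[0].shape[0], max_len), fill_value)
    for i, x in enumerate(X):
        padded[i, :, : x.shape[1]] = x  # original values first, fill_value after
    return padded

# Toy unequal-length collection: two univariate series of lengths 3 and 5.
X_unequal = [np.array([[1.0, 2.0, 3.0]]), np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])]
print(pad_collection(X_unequal).shape)  # (2, 1, 5)
Truncation is the mirror image: slice every series down to the shortest length instead of padding up to the longest.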
We can create, fit and predict with these lists of classifiers. We use the EEG data made for this tutorial. Do not read much into relative performance: this is for illustrative purposes only. However, the variance in results does suggest that the classifiers work differently. We exclude estimators that require constructor arguments, such as pipelines.
from aeon.datasets import load_from_tsfile
X_train_c, y_train_c = load_from_tsfile("./data/KDD_UTSC_TRAIN.ts")
X_test_c, y_test_c = load_from_tsfile("./data/KDD_UTSC_TEST.ts")
for _, c in uni_cls.iterrows():
    if c[0] not in ["ClassifierPipeline", "MrSQMClassifier", "WeightedEnsembleClassifier"]:
        clf = c[1]()
        clf.fit(X_train_c, y_train_c)
        print(c[0], " accuracy = ", clf.score(X_test_c, y_test_c))
        print()
BOSSEnsemble accuracy = 0.6
ContractableBOSS accuracy = 0.6
HIVECOTEV1 accuracy = 0.825
IndividualBOSS accuracy = 0.525
ProximityForest accuracy = 0.55
ProximityTree accuracy = 0.6
RSASTClassifier accuracy = 0.6
SASTClassifier accuracy = 0.625
WEASEL accuracy = 0.725
WEASEL_V2 accuracy = 0.6
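If you would rather see a ranking than a stream of prints, collect the scores first. This is a small, hypothetical variant of the loop above; the results dict is our own addition:
results = {}
for _, c in uni_cls.iterrows():
    if c[0] not in ["ClassifierPipeline", "MrSQMClassifier", "WeightedEnsembleClassifier"]:
        clf = c[1]()
        clf.fit(X_train_c, y_train_c)
        results[c[0]] = clf.score(X_test_c, y_test_c)
# Print best to worst.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")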
We can use multivariate classifiers on univariate data (except for MUSE). Some are excluded from this example because they require constructor arguments, are very slow (especially on CPU), require non-standard imports or generate many warnings on this data. This cell will take a while to execute.
excl = ["MUSE", "EncoderClassifier", "ChannelEnsembleClassifier","FCNClassifier",
"HIVECOTEV2", "IntervalForestClassifier", "LearningShapeletClassifier",
"InceptionTimeClassifier","IndividualInceptionClassifier",
"IndividualOrdinalTDE","OrdinalTDE","TapNetClassifier",
"SignatureClassifier","ResNetClassifier","LITETimeClassifier",
"IndividualLITETimeClassifier", "MLPClassifier","CNNClassifier",
"SupervisedIntervalClassifier","REDCOMETS", "ElasticEnsemble"]
for _, c in multi_cls.iterrows():
if c[0] not in excl:
clf = c[1]()
clf.fit(X_train_c, y_train_c)
print(c[0]," accuracy = ", clf.score(X_test_c, y_test_c))
print()
Arsenal accuracy = 0.425
CanonicalIntervalForestClassifier accuracy = 0.8
Catch22Classifier accuracy = 0.8
DrCIFClassifier accuracy = 0.75
DummyClassifier accuracy = 0.5
FreshPRINCEClassifier accuracy = 0.8
HydraClassifier accuracy = 0.775
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 207ms/step
IndividualLITEClassifier accuracy = 0.875
IndividualTDE accuracy = 0.625
KNeighborsTimeSeriesClassifier accuracy = 0.55
MultiRocketHydraClassifier accuracy = 0.825
QUANTClassifier accuracy = 0.775
RDSTClassifier accuracy = 0.9
RISTClassifier accuracy = 0.85
RSTSF accuracy = 0.8
RandomIntervalClassifier accuracy = 0.75
RandomIntervalSpectralEnsembleClassifier accuracy = 0.9
RocketClassifier accuracy = 0.425
ShapeletTransformClassifier accuracy = 0.65
SummaryClassifier accuracy = 0.7
SupervisedTimeSeriesForest accuracy = 0.8
TSFreshClassifier accuracy = 0.825
TemporalDictionaryEnsemble accuracy = 0.5
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step
TimeCNNClassifier accuracy = 0.5
TimeSeriesForestClassifier accuracy = 0.725
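The exclusion list above has to be maintained by hand. A more defensive pattern, sketched below with our own timing report added, is to wrap each estimator in a try/except so one failure does not stop the whole comparison:
import time

for _, c in multi_cls.iterrows():
    if c[0] in excl:
        continue
    try:
        clf = c[1]()
        start = time.time()
        clf.fit(X_train_c, y_train_c)
        acc = clf.score(X_test_c, y_test_c)
        print(f"{c[0]}: accuracy = {acc:.3f} (fit and score took {time.time() - start:.1f}s)")
    except Exception as e:  # deliberately broad: we just want to keep going
        print(f"{c[0]}: skipped ({type(e).__name__})")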
All of the regressors, bar RegressorPipeline, can handle multivariate data.
from aeon.datasets import load_from_tsfile
from sklearn.metrics import mean_squared_error
X_train_c, y_train_c = load_from_tsfile("./data/KDD_UTSER_TRAIN.ts")
X_test_c, y_test_c = load_from_tsfile("./data/KDD_UTSER_TEST.ts")
excl = ["RegressorPipeline","CNNRegressor","FCNRegressor",
"InceptionTimeRegressor","IndividualInceptionRegressor",
"EncoderRegressor","ResNetRegressor","IndividualLITERegressor",
"LITETimeRegressor", "TapNetRegressor"]
for _, c in multi_reg.iterrows():
if c[0] not in excl:
clf = c[1]()
clf.fit(X_train_c, y_train_c)
y_pred= clf.predict(X_test_c)
print(c[0]," MSE = ", mean_squared_error(y_test_c, y_pred))
print()
CanonicalIntervalForestRegressor MSE = 0.8481033528977171
Catch22Regressor MSE = 0.8885051219549379
DrCIFRegressor MSE = 0.7587243502357529
DummyRegressor MSE = 1.3269463963115031
FreshPRINCERegressor MSE = 0.7817647290295615
HydraRegressor MSE = 0.9064081505288237
IntervalForestRegressor MSE = 0.8274072847127886
KNeighborsTimeSeriesRegressor MSE = 1.3924300952980344
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
MLPRegressor MSE = 22.353821472855486
MultiRocketHydraRegressor MSE = 0.9717904133897317
RDSTRegressor MSE = 0.8214062698138948
RISTRegressor MSE = 0.7026925551976874
RandomIntervalRegressor MSE = 0.8220888655006664
RandomIntervalSpectralEnsembleRegressor MSE = 0.8848753973613874
RocketRegressor MSE = 1.1483157858379491
SummaryRegressor MSE = 0.8242166680716194
TSFreshRegressor MSE = 0.7928987608546907
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step
TimeCNNRegressor MSE = 1.6028747634206029
TimeSeriesForestRegressor MSE = 0.8388172082622043
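A useful sanity check on these numbers is to compare each regressor against DummyRegressor, which predicts a constant (by default the mean of the training targets). A sketch, assuming you first collect the MSEs into a results dict as in the classifier ranking example earlier:
# Hypothetical: results maps each regressor name to its test MSE.
baseline = results["DummyRegressor"]
for name, mse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: MSE = {mse:.3f} ({mse / baseline:.2f}x the dummy baseline)")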
We can pull published results directly from our website, timeseriesclassification.com. See this notebook for more details on how to do this.
from aeon.benchmarking import get_available_estimators
from aeon.benchmarking.results_loaders import get_estimator_results_as_array
from aeon.visualisation import plot_critical_difference
cls = get_available_estimators(task="classification", return_dataframe=False)
print(len(cls), " classifier results available\n", cls)
resamples_all, data_names = get_estimator_results_as_array(
    estimators=cls, default_only=False
)
# Results are loaded from
# https://timeseriesclassification.com/results/ReferenceResults.
# You can download the files directly from there.
print(resamples_all.shape)
classifiers = [
    "FreshPRINCE",
    "HIVECOTEV2",
    "InceptionTime",
    "WEASEL-D",
    "MR-Hydra",
    "RDST",
    "QUANT",
    "PF",
]
resamples_all, data_names = get_estimator_results_as_array(
    estimators=classifiers, default_only=False
)
plot = plot_critical_difference(
    resamples_all, classifiers, test="wilcoxon", correction="holm"
)
40  classifier results available
['1NN-DTW', 'Arsenal', 'BOSS', 'CIF', 'CNN', 'Catch22', 'DrCIF', 'EE',
 'FreshPRINCE', 'GRAIL', 'H-InceptionTime', 'HC1', 'HC2', 'Hydra',
 'InceptionTime', 'LiteTime', 'MR', 'MR-Hydra', 'MiniROCKET', 'MrSQM',
 'PF', 'QUANT', 'R-STSF', 'RDST', 'RISE', 'RIST', 'ROCKET', 'RSF',
 'ResNet', 'STC', 'STSF', 'ShapeDTW', 'Signatures', 'TDE', 'TS-CHIEF',
 'TSF', 'TSFresh', 'WEASEL-1.0', 'WEASEL-2.0', 'cBOSS']
(112, 40)
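Beyond the critical difference diagram, the loaded results are just a datasets-by-estimators array, so you can summarise them directly. A minimal sketch, assuming (as the printed shape suggests) one row per dataset and one column per classifier:
# Mean accuracy per classifier across all datasets, best first.
mean_acc = resamples_all.mean(axis=0)
for name, acc in sorted(zip(classifiers, mean_acc), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean accuracy over {resamples_all.shape[0]} datasets = {acc:.3f}")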