catch22 [1] is a collection of 22 time series features selected from the 7000+ features available in the hctsa [2][3] toolbox. Features that performed better than random chance were kept, and hierarchical clustering was then performed on their correlation matrix to remove redundancy. The resulting clusters were ranked by balanced accuracy (measured with a decision tree classifier), and a single feature was chosen from each of the 22 clusters, taking into account balanced accuracy, computational efficiency and interpretability.
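The redundancy-reduction step described above can be sketched with SciPy's hierarchical clustering. This is a minimal illustration, not the catch22 authors' pipeline. It assumes a hypothetical matrix `perf` of per-task balanced accuracies (one row per feature), uses 1 - |correlation| as the distance with average linkage, and simply keeps the feature with the highest mean accuracy in each cluster. The real selection also weighed computational cost and interpretability.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def select_representatives(perf, n_clusters):
    """Cluster features by correlated performance and keep one per cluster.

    perf: hypothetical array of shape (n_features, n_tasks) holding
    balanced accuracies; this is an illustrative sketch only.
    """
    corr = np.corrcoef(perf)  # feature-by-feature correlation
    # Strongly (anti-)correlated features are treated as redundant, i.e. close.
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    mean_acc = perf.mean(axis=1)
    selected = []
    for cluster in np.unique(labels):
        members = np.flatnonzero(labels == cluster)
        # Keep the most accurate feature in this cluster.
        selected.append(int(members[np.argmax(mean_acc[members])]))
    return sorted(selected), labels


# Toy data: three orthogonal performance patterns over four hypothetical tasks,
# each shared by three features with different offsets and scales.
patterns = np.array([[1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]], dtype=float)
perf = np.array(
    [offset + scale * p for p in patterns
     for offset, scale in [(0.70, 0.05), (0.75, 0.10), (0.80, 0.15)]]
)
selected, labels = select_representatives(perf, n_clusters=3)
print(selected)  # [2, 5, 8]
```

Each group of three redundant features collapses into one cluster, and the member with the highest mean accuracy survives.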
In this notebook, we demonstrate how to use the Catch22 transformer on the univariate ItalyPowerDemand and multivariate BasicMotions datasets. We also show how catch22 features can be used for classification with a random forest classifier.
[1] Lubba, C. H., Sethi, S. S., Knaute, P., Schultz, S. R., Fulcher, B. D., & Jones, N. S. (2019). catch22: CAnonical Time-series CHaracteristics. Data Mining and Knowledge Discovery, 33(6), 1821-1852.
[2] Fulcher, B. D., & Jones, N. S. (2017). hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems, 5(5), 527-531.
[3] Fulcher, B. D., Little, M. A., & Jones, N. S. (2013). Highly comparative time-series analysis: the empirical structure of time series and their methods. Journal of the Royal Society Interface, 10(83), 20130048.
from sklearn import metrics
from aeon.classification.feature_based import Catch22Classifier
from aeon.datasets import load_basic_motions, load_italy_power_demand
from aeon.transformations.collection.feature_based import Catch22
IPD_X_train, IPD_y_train = load_italy_power_demand(split="train")
IPD_X_test, IPD_y_test = load_italy_power_demand(split="test")
# Use only the first 50 test cases in this demonstration
IPD_X_test = IPD_X_test[:50]
IPD_y_test = IPD_y_test[:50]
print(IPD_X_train.shape, IPD_y_train.shape, IPD_X_test.shape, IPD_y_test.shape)
BM_X_train, BM_y_train = load_basic_motions(split="train")
BM_X_test, BM_y_test = load_basic_motions(split="test")
print(BM_X_train.shape, BM_y_train.shape, BM_X_test.shape, BM_y_test.shape)
(67, 1, 24) (67,) (50, 1, 24) (50,)
(40, 6, 100) (40,) (40, 6, 100) (40,)
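The printed shapes follow aeon's convention for a collection of time series: a 3D NumPy array of shape (n_cases, n_channels, n_timepoints). ItalyPowerDemand has a single channel of length 24, and BasicMotions has six channels of length 100. A small NumPy sketch of the convention, using random stand-in data rather than the real datasets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for a univariate collection stored as a 2D (n_cases, n_timepoints) array.
X_2d = rng.normal(size=(67, 24))
# Insert a channel axis to get the 3D (n_cases, n_channels, n_timepoints) layout.
X_uv = X_2d[:, np.newaxis, :]

# Random stand-in for a multivariate collection with 6 channels.
X_mv = rng.normal(size=(40, 6, 100))
n_cases, n_channels, n_timepoints = X_mv.shape

print(X_uv.shape)  # (67, 1, 24)
print(n_cases, n_channels, n_timepoints)  # 40 6 100
```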
c22_uv = Catch22()
c22_uv.fit(IPD_X_train, IPD_y_train)
transformed_data_uv = c22_uv.transform(IPD_X_train)
print(transformed_data_uv.shape)
(67, 22)
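Each of the 22 output columns is a single scalar summary of a series. As an illustration, one catch22 feature, SB_BinaryStats_mean_longstretch1, measures the longest run of consecutive values above the series mean. The function below is a simplified re-implementation written for this sketch (the name `longest_stretch_above_mean` is hypothetical), and it may differ from the reference catch22 code in edge cases.

```python
import numpy as np


def longest_stretch_above_mean(x):
    """Length of the longest run of consecutive values strictly above the mean."""
    x = np.asarray(x, dtype=float)
    above = x > x.mean()
    longest = current = 0
    for flag in above:
        # Extend the current run, or reset it when a value is not above the mean.
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest


print(longest_stretch_above_mean([1, 5, 6, 7, 2, 8, 1]))  # 3

# Applied per case to a (n_cases, 1, n_timepoints) collection, it yields one column.
X = np.array([[[1, 5, 6, 7, 2, 8, 1]], [[9, 9, 0, 0, 9, 0, 0]]], dtype=float)
column = np.array([longest_stretch_above_mean(case[0]) for case in X])
print(column)  # [3 2]
```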
Catch22 also supports transformation of multivariate data. By default, the 22 features are computed on each channel independently and the results are concatenated, so BasicMotions (6 channels) yields 22 × 6 = 132 features per case.
c22_mv = Catch22()
c22_mv.fit(BM_X_train, BM_y_train)
Catch22()
transformed_data_mv = c22_mv.transform(BM_X_train)
print(transformed_data_mv.shape)
(40, 132)
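The 132 columns are 22 features for each of the 6 channels. The per-channel-then-concatenate idea can be sketched with two hypothetical stand-in features (mean and standard deviation, which are not catch22 features). The column order used here, all features for channel 0 followed by all features for channel 1 and so on, is an assumption of this sketch and may not match aeon's layout.

```python
import numpy as np


def per_channel_features(X, feature_funcs):
    """Apply each univariate feature to every channel and concatenate the results.

    X has shape (n_cases, n_channels, n_timepoints); the output has shape
    (n_cases, n_channels * len(feature_funcs)).
    """
    n_cases, n_channels, _ = X.shape
    out = np.empty((n_cases, n_channels * len(feature_funcs)))
    for i in range(n_cases):
        row = []
        for c in range(n_channels):
            # Compute every feature on this single channel.
            row.extend(f(X[i, c]) for f in feature_funcs)
        out[i] = row
    return out


rng = np.random.default_rng(0)
X_mv = rng.normal(size=(40, 6, 100))  # random stand-in with BasicMotions' shape
features = per_channel_features(X_mv, [np.mean, np.std])
print(features.shape)  # (40, 12)
```

With 22 features instead of 2, the same scheme gives 22 × 6 = 132 columns.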
For classification tasks, the default classifier used with catch22 features is a random forest. For ease of use, this combination is provided as the Catch22Classifier, which builds a scikit-learn RandomForestClassifier on catch22 features.
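Conceptually, this is a two-step pipeline: map each series to a fixed-length feature vector, then train a random forest on those vectors. The sketch below shows that idea with scikit-learn's RandomForestClassifier on synthetic data. `toy_features` is a hypothetical stand-in for the Catch22 transform (so the sketch runs without aeon), and the two-class dataset is made up for illustration; this is not the Catch22Classifier implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def toy_features(X):
    """Hypothetical stand-in for Catch22: per-channel mean and std, concatenated."""
    return np.concatenate([X.mean(axis=2), X.std(axis=2)], axis=1)


def make_toy_data(n_cases, rng):
    """Synthetic two-class data: sine waves whose amplitude depends on the class."""
    t = np.linspace(0, 2 * np.pi, 50)
    y = rng.integers(0, 2, size=n_cases)
    amplitude = np.where(y == 1, 3.0, 1.0)
    X = amplitude[:, None] * np.sin(t)[None, :] + rng.normal(0, 0.1, size=(n_cases, 50))
    return X[:, np.newaxis, :], y  # shape (n_cases, 1, 50)


rng = np.random.default_rng(0)
X_train, y_train = make_toy_data(40, rng)
X_test, y_test = make_toy_data(20, rng)

# Step 1: series -> feature vectors; step 2: random forest on the features.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(toy_features(X_train), y_train)
preds = forest.predict(toy_features(X_test))
print("Toy accuracy:", accuracy_score(y_test, preds))
```

Catch22Classifier wraps both steps behind a single fit/predict interface, which is what the next cell uses.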
c22f = Catch22Classifier(random_state=0)
c22f.fit(IPD_X_train, IPD_y_train)
Catch22Classifier(random_state=0)
c22f_preds = c22f.predict(IPD_X_test)
print("C22F Accuracy: " + str(metrics.accuracy_score(IPD_y_test, c22f_preds)))
C22F Accuracy: 0.86
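The score above is plain accuracy, the fraction of the 50 test cases predicted correctly (0.86 × 50 = 43). The feature selection described at the top of the notebook instead used balanced accuracy, the mean of the per-class recalls, which is less sensitive to class imbalance. A small comparison on made-up labels with scikit-learn's metrics:

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Made-up labels: an imbalanced test set where the minority class is half missed.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)  # 5 correct out of 6
bal = balanced_accuracy_score(y_true, y_pred)  # mean per-class recall: (1.0 + 0.5) / 2
print(round(acc, 4), round(bal, 4))  # 0.8333 0.75
```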