#!/usr/bin/env python
# coding: utf-8

# # The Canonical Time-series Characteristics (catch22) transform
#
# catch22\[1\] is a collection of 22 time series features extracted from the 7000+ present in the _hctsa_ \[2\]\[3\] toolbox.
# To remove redundancy, a hierarchical clustering was performed on the correlation matrix of the features that performed better than random chance.
# The resulting clusters were sorted by balanced accuracy using a decision tree classifier, and a single feature was selected from each of the 22 clusters formed, taking into account balanced accuracy results, computational efficiency and interpretability.
#
# In this notebook, we demonstrate how to use the catch22 transformer on the ItalyPowerDemand univariate and BasicMotions multivariate datasets. We also show catch22 used for classification with a random forest classifier.
#
# #### References:
#
# \[1\] Lubba, C. H., Sethi, S. S., Knaute, P., Schultz, S. R., Fulcher, B. D., & Jones, N. S. (2019). catch22: CAnonical Time-series CHaracteristics. Data Mining and Knowledge Discovery, 33(6), 1821-1852.
#
# \[2\] Fulcher, B. D., & Jones, N. S. (2017). hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems, 5(5), 527-531.
#
# \[3\] Fulcher, B. D., Little, M. A., & Jones, N. S. (2013). Highly comparative time-series analysis: the empirical structure of time series and their methods. Journal of the Royal Society Interface, 10(83), 20130048.

# ## 1. Imports

# In[18]:


from sklearn import metrics

from aeon.classification.feature_based import Catch22Classifier
from aeon.datasets import load_basic_motions, load_italy_power_demand
from aeon.transformations.collection.catch22 import Catch22


# ## 2. Load data

# In[19]:


# Univariate dataset; only the first 50 test cases are kept to keep the demo fast.
IPD_X_train, IPD_y_train = load_italy_power_demand(split="train")
IPD_X_test, IPD_y_test = load_italy_power_demand(split="test")
IPD_X_test = IPD_X_test[:50]
IPD_y_test = IPD_y_test[:50]

print(IPD_X_train.shape, IPD_y_train.shape, IPD_X_test.shape, IPD_y_test.shape)

# Multivariate dataset.
BM_X_train, BM_y_train = load_basic_motions(split="train")
BM_X_test, BM_y_test = load_basic_motions(split="test")

print(BM_X_train.shape, BM_y_train.shape, BM_X_test.shape, BM_y_test.shape)


# ## 3. catch22 transform
#
# ### Univariate
#
# The catch22 features are provided in the form of a transformer, `Catch22`.
# The transformed data can then be used for a variety of time series analysis tasks.

# In[20]:


c22_uv = Catch22()
c22_uv.fit(IPD_X_train, IPD_y_train)

transformed_data_uv = c22_uv.transform(IPD_X_train)
print(transformed_data_uv.shape)


# ### Multivariate
#
# Transformation of multivariate data is supported by `Catch22`.
# The default procedure will concatenate each column prior to transformation.

# In[21]:


c22_mv = Catch22()
c22_mv.fit(BM_X_train, BM_y_train)


# In[22]:


transformed_data_mv = c22_mv.transform(BM_X_train)
print(transformed_data_mv.shape)


# ## 4. catch22 Forest Classifier
#
# For classification tasks, the default classifier to use with the catch22 features is a random forest classifier.
# For ease of use, an implementation that builds sklearn's `RandomForestClassifier` on catch22 features is provided in the form of the `Catch22Classifier`.

# In[23]:


c22f = Catch22Classifier(random_state=0)
c22f.fit(IPD_X_train, IPD_y_train)


# In[24]:


# Evaluate on the 50 held-out test cases.
c22f_preds = c22f.predict(IPD_X_test)
print(f"C22F Accuracy: {metrics.accuracy_score(IPD_y_test, c22f_preds)}")
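
# As a rough illustration of the feature-based idea used throughout this notebook: each series is reduced to a fixed-length vector of summary statistics, and the resulting 2D table is what a classifier such as a random forest is trained on.
# The sketch below is a toy stand-in, not the catch22 implementation. It uses three hand-picked statistics (mean, standard deviation, lag-1 autocorrelation) rather than the 22 catch22 features, and `toy_transform` and `lag1_autocorr` are hypothetical helper names.
# It computes the statistics per channel and places them side by side, which is one possible way of handling multivariate input and is an assumption here, not a statement of what `Catch22` does by default. The input is random data with arbitrarily chosen shapes.

```python
import numpy as np


def lag1_autocorr(x: np.ndarray) -> float:
    """Lag-1 autocorrelation of a 1D series (0.0 for a constant series)."""
    centred = x - x.mean()
    denom = np.dot(centred, centred)
    if denom == 0:
        return 0.0
    return float(np.dot(centred[:-1], centred[1:]) / denom)


def toy_transform(X: np.ndarray) -> np.ndarray:
    """Map (n_cases, n_channels, n_timepoints) to (n_cases, 3 * n_channels)."""
    n_cases, n_channels, _ = X.shape
    features = np.zeros((n_cases, 3 * n_channels))
    for i in range(n_cases):
        for c in range(n_channels):
            series = X[i, c]
            # Three toy statistics per channel, stored in adjacent columns.
            features[i, 3 * c : 3 * c + 3] = [
                series.mean(),
                series.std(),
                lag1_autocorr(series),
            ]
    return features


rng = np.random.default_rng(0)
X_uv = rng.normal(size=(10, 1, 24))   # 10 univariate series of length 24
X_mv = rng.normal(size=(10, 6, 100))  # 10 series with 6 channels of length 100

print(toy_transform(X_uv).shape)  # (10, 3)
print(toy_transform(X_mv).shape)  # (10, 18)
```

# Whatever the original series length, the output has one row per case and a fixed number of columns, so it can be passed to any tabular estimator.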