#!/usr/bin/env python
# coding: utf-8

# # The Canonical Time-series Characteristics (catch22) transform
# 
# catch22 \[1\] is a collection of 22 time series features selected from the 7000+ features in the _hctsa_ \[2\]\[3\] toolbox.
# To remove redundancy, features that performed better than random chance were grouped by hierarchical clustering on their correlation matrix.
# The clusters were sorted by balanced accuracy using a decision tree classifier, and a single feature was selected from each of the 22 clusters formed, taking into account balanced accuracy, computational efficiency and interpretability.
# 
# In this notebook, we will demonstrate how to use the catch22 transformer on the ItalyPowerDemand univariate and BasicMotions multivariate datasets. We also show catch22 used for classification with a random forest classifier.
# 
# #### References:
# 
# \[1\] Lubba, C. H., Sethi, S. S., Knaute, P., Schultz, S. R., Fulcher, B. D., & Jones, N. S. (2019). catch22: CAnonical Time-series CHaracteristics. Data Mining and Knowledge Discovery, 33(6), 1821-1852.
# 
# \[2\] Fulcher, B. D., & Jones, N. S. (2017). hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems, 5(5), 527-531.
# 
# \[3\] Fulcher, B. D., Little, M. A., & Jones, N. S. (2013). Highly comparative time-series analysis: the empirical structure of time series and their methods. Journal of the Royal Society Interface, 10(83), 20130048.

# ## 1. Imports

# In[18]:


from sklearn import metrics

from aeon.classification.feature_based import Catch22Classifier
from aeon.datasets import load_basic_motions, load_italy_power_demand
from aeon.transformations.collection.catch22 import Catch22


# ## 2. Load data

# In[19]:


IPD_X_train, IPD_y_train = load_italy_power_demand(split="train")
IPD_X_test, IPD_y_test = load_italy_power_demand(split="test")
# Keep only the first 50 test cases.
IPD_X_test = IPD_X_test[:50]
IPD_y_test = IPD_y_test[:50]

print(IPD_X_train.shape, IPD_y_train.shape, IPD_X_test.shape, IPD_y_test.shape)

BM_X_train, BM_y_train = load_basic_motions(split="train")
BM_X_test, BM_y_test = load_basic_motions(split="test")

print(BM_X_train.shape, BM_y_train.shape, BM_X_test.shape, BM_y_test.shape)


# ## 3. catch22 transform
# 
# ### Univariate
# 
# The catch22 features are provided in the form of a transformer, `Catch22`.
# The transformed data can then be used for a variety of time series analysis tasks.

# In[20]:


c22_uv = Catch22()
c22_uv.fit(IPD_X_train, IPD_y_train)
transformed_data_uv = c22_uv.transform(IPD_X_train)
print(transformed_data_uv.shape)
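

# The transformer also accepts options that change which features are returned.
# The cell below is an illustrative sketch rather than part of the original walkthrough. It uses a small synthetic collection (`X_demo`, a hypothetical array) and assumes the `catch24` and `features` parameters of `Catch22` behave as in recent aeon releases: `catch24=True` appends the series mean and standard deviation, and `features` takes a list of feature names.

# In[ ]:


import numpy as np

from aeon.transformations.collection.catch22 import Catch22

# Synthetic univariate collection: 8 cases, 1 channel, 60 time points (illustrative only).
X_demo = np.random.default_rng(0).normal(size=(8, 1, 60))

# catch24 adds the mean and standard deviation to the 22 base features.
c24_demo = Catch22(catch24=True).fit_transform(X_demo)
print(c24_demo.shape)

# Request only a subset of the features by name (names assumed to follow the hctsa convention).
subset_demo = Catch22(
    features=["DN_HistogramMode_5", "DN_HistogramMode_10"]
).fit_transform(X_demo)
print(subset_demo.shape)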


# ### Multivariate
# 
# Transformation of multivariate data is supported by `Catch22`.
# By default, the features are extracted from each channel separately and the per-channel results are concatenated, giving 22 features per channel.

# In[21]:


c22_mv = Catch22()
c22_mv.fit(BM_X_train, BM_y_train)


# In[22]:


transformed_data_mv = c22_mv.transform(BM_X_train)
print(transformed_data_mv.shape)
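

# One way to see how channels are handled is to compare the output width with the number of channels.
# The cell below is a minimal sketch on synthetic data (`X_mv_demo` is a hypothetical array, not one of the datasets above). It prints the output shape and whether the first 22 columns match the features of the first channel transformed on its own, assuming default `Catch22` settings.

# In[ ]:


import numpy as np

from aeon.transformations.collection.catch22 import Catch22

# Synthetic multivariate collection: 5 cases, 3 channels, 40 time points (illustrative only).
X_mv_demo = np.random.default_rng(0).normal(size=(5, 3, 40))

mv_demo = Catch22().fit_transform(X_mv_demo)
print(mv_demo.shape)

# Transform the first channel alone and compare it with the leading columns.
first_channel_demo = Catch22().fit_transform(X_mv_demo[:, :1, :])
print(np.allclose(mv_demo[:, :22], first_channel_demo, equal_nan=True))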


# ## 4. catch22 Forest Classifier
# 
# For classification tasks, the default classifier to use with the catch22 features is a random forest.
# For ease of use, an implementation that applies the `RandomForestClassifier` from sklearn to catch22 features is provided in the form of the `Catch22Classifier`.

# In[23]:


c22f = Catch22Classifier(random_state=0)
c22f.fit(IPD_X_train, IPD_y_train)


# In[24]:


c22f_preds = c22f.predict(IPD_X_test)
print(f"C22F Accuracy: {metrics.accuracy_score(IPD_y_test, c22f_preds)}")
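

# The same idea can be sketched by hand as an sklearn pipeline that feeds the `Catch22` output into a `RandomForestClassifier`.
# This is an approximation under stated assumptions, not the internals of `Catch22Classifier`: the data (`X_pipe`, `y_pipe`) is synthetic, and the settings `replace_nans=True` and `n_estimators=200` are choices made for this illustration.

# In[ ]:


import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

from aeon.transformations.collection.catch22 import Catch22

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 50)

# Two synthetic classes: pure noise (label 0) and a noisy sine wave (label 1).
X_noise = rng.normal(size=(20, 1, 50))
X_sine = np.sin(t) + 0.3 * rng.normal(size=(20, 1, 50))
X_pipe = np.concatenate([X_noise, X_sine])
y_pipe = np.array([0] * 20 + [1] * 20)

# Extract catch22 features, then classify them with a random forest.
pipe = make_pipeline(
    Catch22(replace_nans=True),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
pipe.fit(X_pipe, y_pipe)

pipe_preds = pipe.predict(X_pipe)
print("Pipeline training accuracy:", (pipe_preds == y_pipe).mean())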