#!/usr/bin/env python
# coding: utf-8

# `KDD2024 Tutorial / A Hands-On Introduction to Time Series Classification and Regression`
#
# # Dictionary-based Time Series Machine Learning in `aeon`
#
# Similar to other subseries extraction algorithms, dictionary approaches identify phase-independent subseries within time series data. These approaches convert each window into a sequence of discrete symbols, commonly referred to as a word. Dictionary methods differentiate themselves by analysing word frequency, a concept often described as a bag-of-words approach. This process can be summarised as follows (a minimal hand-rolled sketch of the whole pipeline is given after the setup cells below):
#
# 1. Extracting subseries, or windows, from the time series;
# 2. Converting each window of real values into a discrete-valued word (a sequence of symbols from a fixed alphabet);
# 3. Constructing a sparse feature vector based on histograms of word counts; and
# 4. Finally, applying a machine learning classification method to these feature vectors.

# ## Table of Contents
#
# * [Load example data](#load-data)
# * [Symbolic Approximation Methods](#symbols)
#     * [SAX: Symbolic Aggregate approXimation](#sax)
#         * [Transforming Time Series To Symbols with SAX](#sax-transform)
#     * [SFA: Symbolic Fourier Approximation](#sfa)
#         * [Transforming Time Series To Symbols with SFA](#sfa-transform)
# * [Bag-of-SFA-Symbols (BOSS)](#boss)
# * [Temporal Dictionary Ensemble (TDE)](#tde)
# * [Multiple Representations Sequence Miner (MrSQM)](#mrsqm)
# * [WEASEL and WEASEL2](#weasel)
# * [Performance on the UCR univariate classification datasets](#evaluation)
# * [References](#references)

# In[ ]:

get_ipython().system('pip install aeon==0.11.0')
get_ipython().system('mkdir -p data')
get_ipython().system('wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSC_TRAIN.ts -P data/')
get_ipython().system('wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSC_TEST.ts -P data/')
get_ipython().system('wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSC_TRAIN.ts -P data/')
get_ipython().system('wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSC_TEST.ts -P data/')
get_ipython().system('wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSER_TRAIN.ts -P data/')
get_ipython().system('wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_MTSER_TEST.ts -P data/')
get_ipython().system('wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSER_TRAIN.ts -P data/')
get_ipython().system('wget -nc https://raw.githubusercontent.com/aeon-tutorials/KDD-2024/main/Notebooks/data/KDD_UTSER_TEST.ts -P data/')

# In[1]:

# There are some deprecation warnings present in the notebook; we will ignore them.
# Remove this cell if you are interested in finding out what is changing soon, as
# for aeon there will be big changes in our v1.0.0 release!

import warnings

warnings.filterwarnings("ignore", category=FutureWarning)
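# To make the four steps above concrete, the cell below builds a tiny bag-of-words pipeline by hand. It is a minimal sketch rather than any particular published algorithm: the helper name `bag_of_words`, the window size, the word length and the fixed binning thresholds are all illustrative choices.

# In[ ]:

import numpy as np

def bag_of_words(series, window=8, word_length=4):
    # illustrative helper, not an aeon API
    letters = "abcd"
    counts = {}
    # step 1: extract sliding windows
    for i in range(len(series) - window + 1):
        w = series[i:i + window]
        w = (w - w.mean()) / (w.std() + 1e-8)  # z-normalise each window
        # step 2: discretise the mean of each segment against fixed thresholds
        bins = np.array([-0.5, 0.0, 0.5])  # illustrative thresholds only
        word = "".join(
            letters[np.searchsorted(bins, seg.mean())]
            for seg in np.array_split(w, word_length)
        )
        # step 3: build the histogram of word counts
        counts[word] = counts.get(word, 0) + 1
    return counts

x = np.sin(np.linspace(0, 4 * np.pi, 60))
print(bag_of_words(x))
# step 4 would feed such histograms, one per series, into any classifier

# `aeon` provides optimised implementations of pipelines of this kind; the cells below list the dictionary-based estimators it currently ships.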
# In[2]:

from aeon.registry import all_estimators

all_estimators(
    "classifier", filter_tags={"algorithm_type": "dictionary"}, as_dataframe=True
)

# In[3]:

all_estimators(
    "regressor", filter_tags={"algorithm_type": "dictionary"}, as_dataframe=True
)

# ## Load example data

# In[4]:

from aeon.datasets import load_from_tsfile

X_train_c, y_train_c = load_from_tsfile("./data/KDD_UTSC_TRAIN.ts")
X_test_c, y_test_c = load_from_tsfile("./data/KDD_UTSC_TEST.ts")

print("Train shape:", X_train_c.shape)
print("Test shape:", X_test_c.shape)

# In[5]:

from aeon.visualisation import plot_collection_by_class

plot_collection_by_class(X_train_c[:, 0, :], y_train_c)

# ## Symbolic Approximation Methods

# Two well-known approaches for approximating segments of a time series with discrete symbols are:

# ### SAX: Symbolic Aggregate approXimation
#
# SAX [[2]](#references) first applies a Piecewise Aggregate Approximation (PAA) [[1]](#references), a dimensionality reduction technique that replaces each segment of the series by its mean value. Each PAA value is then replaced by the symbol associated with it in a lookup table. This lookup table, referred to as the dictionary, is defined using the Percent Point Function (PPF, the inverse CDF) of the standard Gaussian distribution, so that each symbol covers an equally probable region of a z-normalised series.

# *Figure: SAX transformation of a time series*
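# The cell below sketches the two SAX ingredients by hand: PAA averaging and Gaussian breakpoints from the PPF. It is a simplified illustration, and the helper name `sax_word` is our own; aeon's full implementation is used in the next subsection.

# In[ ]:

import numpy as np
from scipy.stats import norm

def sax_word(series, n_segments=8, alphabet_size=4):
    # illustrative helper, not aeon's SAX
    letters = "abcdefghijklmnop"[:alphabet_size]
    # z-normalise so that Gaussian breakpoints are appropriate
    series = (series - series.mean()) / series.std()
    # PAA: replace each segment by its mean
    paa = np.array([seg.mean() for seg in np.array_split(series, n_segments)])
    # breakpoints giving equiprobable regions under the standard Gaussian
    breakpoints = norm.ppf(np.linspace(0, 1, alphabet_size + 1)[1:-1])
    # look up each PAA value in the "dictionary" of intervals
    return "".join(letters[i] for i in np.searchsorted(breakpoints, paa))

print(sax_word(np.sin(np.linspace(0, 2 * np.pi, 64))))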
# #### Transforming Time Series To Symbols with SAX

# In[6]:

from aeon.transformations.collection.dictionary_based import SAX

# a very large alphabet is used here so that each symbol sits close to its PAA
# mean, letting the inverse transform below track the PAA reconstruction
sax_transformer = SAX(n_segments=20, alphabet_size=500000, znormalized=True)
X_train_c_sax = sax_transformer.fit_transform(X_train_c)
X_train_c_sax.shape

# In[7]:

X_train_c_sax[0, 0, :]

# In[8]:

import matplotlib.pyplot as plt

X_train_c_inv_sax = sax_transformer.inverse_sax(
    X_train_c_sax, original_length=int(X_train_c.shape[-1])
)

plt.plot(X_train_c[0, 0, :], color="blue")
plt.plot(X_train_c_inv_sax[0, 0, :], color="red")
plt.legend(["original series", "sax series"])

# ### SFA: Symbolic Fourier Approximation
#
# SFA [[3]](#references) operates by first normalising the values in each window to achieve amplitude invariance. Next, dimensionality reduction is performed using a truncated Fourier transform, retaining only the first few coefficients, which helps filter out noise. Discretisation bins are then derived through Multiple Coefficient Binning (MCB), where the real and imaginary components of the Fourier coefficients are binned using either equi-depth or equi-width methods. Finally, each coefficient is discretised into a symbol from a fixed-size alphabet, enhancing robustness against noise.

# *Figure: SFA transformation of a time series*
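# As a rough illustration of the SFA idea, the cell below truncates the Fourier transform of each series and bins each retained real/imaginary component with equi-depth (quantile) bins learned from a small collection. The helpers `fit_mcb_bins` and `sfa_word` are our own simplification of MCB, not aeon's implementation, which is used next.

# In[ ]:

import numpy as np

def fit_mcb_bins(X, n_coeffs=3, alphabet_size=4):
    # keep real and imaginary parts of the first few Fourier coefficients
    coeffs = np.fft.rfft(X, axis=-1)[:, :n_coeffs]
    feats = np.column_stack([coeffs.real, coeffs.imag])
    # equi-depth binning: per-column quantile breakpoints
    qs = np.linspace(0, 1, alphabet_size + 1)[1:-1]
    return np.quantile(feats, qs, axis=0)  # shape (alphabet_size - 1, 2 * n_coeffs)

def sfa_word(x, bins, n_coeffs=3):
    c = np.fft.rfft(x)[:n_coeffs]
    feats = np.concatenate([c.real, c.imag])
    letters = "abcdefgh"
    # one symbol per binned Fourier component
    return "".join(
        letters[np.searchsorted(bins[:, j], f)] for j, f in enumerate(feats)
    )

X_demo = np.random.default_rng(1).standard_normal((20, 64))
bins = fit_mcb_bins(X_demo)
print(sfa_word(X_demo[0], bins))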
# #### Transforming Time Series To Symbols with SFA

# In[9]:

from aeon.transformations.collection.dictionary_based import SFA

sfa_transformer = SFA(word_length=5)
X_train_c_sfa = sfa_transformer.fit(X_train_c).transform_words(X_train_c)
X_train_c_sfa[0]

# ## Bag-of-SFA-Symbols (BOSS)
#
# BOSS [[4]](#references) follows a similar process to the one described earlier: each sliding window is converted into a word using the SFA method. A feature vector is then created by tallying the frequency of each word across all windows. Finally, a non-symmetric distance function is applied in conjunction with a 1-NN classifier to classify new instances.

# *Figure: the BOSS pipeline*

# In[10]:

from aeon.classification.dictionary_based import BOSSEnsemble
from sklearn.metrics import accuracy_score

boss = BOSSEnsemble(random_state=42)
boss.fit(X_train_c, y_train_c)
boss_preds = boss.predict(X_test_c)
accuracy_score(y_test_c, boss_preds)

# `ContractableBOSS` [[5]](#references) is a variation of the BOSS algorithm that builds its ensemble more efficiently, filtering randomly parameterised members rather than grid-searching over them.

# In[11]:

from aeon.classification.dictionary_based import ContractableBOSS

cboss = ContractableBOSS(random_state=42)
cboss.fit(X_train_c, y_train_c)
cboss_preds = cboss.predict(X_test_c)
accuracy_score(y_test_c, cboss_preds)

# ## Temporal Dictionary Ensemble (TDE)
#
# The Temporal Dictionary Ensemble [[6]](#references) further extends the BOSS dictionary ensemble concept, making improvements to the word extraction and ensemble building processes. Inherited from WEASEL (introduced below), TDE uses information gain to select the bins that map Fourier values to letters when forming words. TDE also uses a spatial pyramid of words to capture the temporal structure of the time series data.
#
# Rather than randomly selecting parameters or finding them through a grid search like previous ensembles, TDE uses a Bayesian-optimisation-inspired approach to select the parameters for its ensemble members.

# *Figure: information gain binning*
#
# *Figure: spatial pyramid of words in TDE*

# In[12]:

from aeon.classification.dictionary_based import TemporalDictionaryEnsemble
from sklearn.metrics import accuracy_score

tde = TemporalDictionaryEnsemble(random_state=42)
tde.fit(X_train_c, y_train_c)
tde_preds = tde.predict(X_test_c)
accuracy_score(y_test_c, tde_preds)

# ## Multiple Representations Sequence Miner (MrSQM)
#
# MrSQM [[7,8]](#references) combines SFA and SAX symbolic transformations to discretise time series subseries into words and uses a logistic regression classifier. It uniquely employs a trie to store and rank frequent substrings, selecting features using either a supervised chi-squared test or an unsupervised random substring sampling method to avoid redundancy. Additionally, MrSQM adjusts the number of learned representations and the window size parameter based on the time series length, using an exponential scale.
#
# __Note:__ You will need to `pip install mrsqm` to run this code. The package is only available for macOS and some Linux distributions.

# *Figure: the MrSQM pipeline*

# In[13]:

from aeon.classification.dictionary_based import MrSQMClassifier

mrsqm = MrSQMClassifier(random_state=42)
mrsqm.fit(X_train_c, y_train_c)
mrsqm_preds = mrsqm.predict(X_test_c)
accuracy_score(y_test_c, mrsqm_preds)

# ## WEASEL and WEASEL2
#
# WEASEL v1.0 [[9]](#references) is a time series classification pipeline that identifies and retains discriminative words by generating histograms of word counts across various window sizes and word lengths. It uses a Chi-squared test for feature selection, discarding non-discriminative words, and then trains a linear Ridge classifier on the selected features. WEASEL also employs a supervised variant of SFA with an information-gain approach to optimise word creation for class separation.
#
# WEASEL v2.0 [[10]](#references) is an enhanced version of the original WEASEL classifier that reduces memory usage and improves accuracy by incorporating randomly parameterised SFA transformations and a dilated sliding window approach. This approach extracts subseries with fixed gaps between values, applies a Fourier transform, and generates words using SFA. Additionally, it introduces a variance-based feature selection strategy that retains only the Fourier values with the highest variance, further improving performance.

# *Figure: the WEASEL pipeline*
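# To illustrate the dilation idea from WEASEL v2.0, the cell below extracts windows whose values are taken with a fixed gap (the dilation) between them. The helper `dilated_windows` is our own minimal sketch of dilated windowing only, not of the full WEASEL v2.0 pipeline, which is run in the cells that follow.

# In[ ]:

import numpy as np

def dilated_windows(series, window=4, dilation=3):
    # a window of length 4 with dilation 3 spans 1 + (4 - 1) * 3 = 10 points
    span = (window - 1) * dilation + 1
    return np.array(
        [series[i:i + span:dilation] for i in range(len(series) - span + 1)]
    )

x = np.arange(20)
print(dilated_windows(x)[:3])
# each row would then be Fourier-transformed and turned into an SFA word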
# In[14]:

from aeon.classification.dictionary_based import WEASEL
from sklearn.metrics import accuracy_score

weasel = WEASEL()
weasel.fit(X_train_c, y_train_c)
weasel_preds = weasel.predict(X_test_c)
accuracy_score(y_test_c, weasel_preds)

# In[15]:

from aeon.classification.dictionary_based import WEASEL_V2

weasel2 = WEASEL_V2()
weasel2.fit(X_train_c, y_train_c)
weasel2_preds = weasel2.predict(X_test_c)
accuracy_score(y_test_c, weasel2_preds)

# ## Performance on the UCR univariate classification datasets
#
# Below we show the performance of the `BOSS`, `cBOSS`, `TDE`, `MrSQM`, `WEASEL` and `WEASEL_V2` models, alongside a `1NN-DTW` baseline, on the UCR TSC archive datasets [[11]](#references), using results from the 2024 TSC bake off [[12]](#references).

# In[16]:

from aeon.benchmarking import get_estimator_results_as_array
from aeon.datasets.tsc_datasets import univariate

names = ["BOSS", "cBOSS", "TDE", "MrSQM", "WEASEL", "WEASEL_V2", "1NN-DTW"]

results, present_names = get_estimator_results_as_array(
    names, univariate, include_missing=False
)
results.shape

# In[17]:

from aeon.visualisation import plot_critical_difference
from aeon.visualisation import plot_boxplot_median

plot_critical_difference(results, names)
plot_boxplot_median(results, names, plot_type="boxplot")

# ## References
#
# [1] Keogh, Eamonn, and Michael Pazzani. "Scaling up dynamic time warping for datamining applications." Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2000): 285-289.
#
# [2] Lin, Jessica, Eamonn Keogh, Li Wei, and Stefano Lonardi. "Experiencing SAX: a novel symbolic representation of time series." Data Mining and Knowledge Discovery 15.2 (2007): 107-144.
#
# [3] Schäfer, Patrick, and Mikael Högqvist. "SFA: a symbolic Fourier approximation and index for similarity search in high dimensional datasets." Proceedings of the 15th International Conference on Extending Database Technology (2012).
#
# [4] Schäfer, Patrick. "The BOSS is concerned with time series classification in the presence of noise." Data Mining and Knowledge Discovery 29 (2015): 1505-1530.
#
# [5] Middlehurst, Matthew, William Vickers, and Anthony Bagnall. "Scalable dictionary classifiers for time series classification." Intelligent Data Engineering and Automated Learning (IDEAL 2019), Part I (2019): 11-19. Springer International Publishing.
#
# [6] Middlehurst, Matthew, James Large, Gavin Cawley, and Anthony Bagnall. "The temporal dictionary ensemble (TDE) classifier for time series classification." Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020), Part I (2021): 660-676. Springer International Publishing.
#
# [7] Nguyen, Thach Le, and Georgiana Ifrim. "Fast time series classification with random symbolic subsequences." Advanced Analytics and Learning on Temporal Data: 7th ECML PKDD Workshop, AALTD (2022).
#
# [8] Nguyen, Thach Le, and Georgiana Ifrim. "MrSQM: Fast time series classification with symbolic representations." arXiv preprint arXiv:2109.01036 (2021).
#
# [9] Schäfer, Patrick, and Ulf Leser. "Fast and accurate time series classification with WEASEL." Proceedings of the 2017 ACM Conference on Information and Knowledge Management (2017).
#
# [10] Schäfer, Patrick, and Ulf Leser. "WEASEL 2.0: a random dilated dictionary transform for fast, accurate and memory constrained time series classification." Machine Learning 112.12 (2023): 4763-4788.
#
# [11] Dau, Hoang Anh, et al. "The UCR time series archive." IEEE/CAA Journal of Automatica Sinica 6.6 (2019): 1293-1305.
#
# [12] Middlehurst, Matthew, Patrick Schäfer, and Anthony Bagnall. "Bake off redux: a review and experimental evaluation of recent time series classification algorithms." Data Mining and Knowledge Discovery (2024): 1-74.
#
# [Return to Table of Contents](#toc)