#!/usr/bin/env python
# coding: utf-8

# # Distance based time series classification in aeon
#
# Distance based classifiers use a time series specific distance function to
# measure the similarity between time series. Time series distance functions
# are often called elastic distances, since they compensate for possible
# misalignment between series by shifting or editing the series.
#
# Dynamic time warping (DTW) is the best known elastic distance measure. The
# image below demonstrates how a warping path is found between two series.
#
# [Image: a visualisation of dynamic time warping, warping two series to the
# best alignment.]
#
# We have a range of elastic distance functions in the distances module.
# Please see the distances notebook for more information. Distance functions
# have mostly been used with a nearest neighbour (NN) classifier.
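# As a quick illustration of why elastic distances matter, the sketch below
# compares the lock-step Euclidean distance with DTW on two phase-shifted
# sine waves. It assumes the `dtw_distance` and `euclidean_distance`
# functions from `aeon.distances` (see the distances notebook): Euclidean
# penalises the misalignment point by point, while DTW warps the series into
# alignment and reports a much smaller distance.

# In[ ]:

import numpy as np

from aeon.distances import dtw_distance, euclidean_distance

a = np.sin(np.linspace(0, 2 * np.pi, 50))
b = np.sin(np.linspace(0, 2 * np.pi, 50) + 0.5)  # same shape, shifted in phase

print("Euclidean:", euclidean_distance(a, b))  # lock-step comparison
print("DTW:", dtw_distance(a, b))  # elastic: realigns the series first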
# ## Load data and list distance based classifiers

# In[1]:

from sklearn import metrics

from aeon.datasets import load_italy_power_demand
from aeon.registry import all_estimators

X_train, y_train = load_italy_power_demand(split="train", return_X_y=True)
X_test, y_test = load_italy_power_demand(split="test", return_X_y=True)
X_test = X_test[:10]
y_test = y_test[:10]
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

# ## Distance based classifiers

# In[2]:

# Search for all distance based classifiers. This may raise UserWarnings if
# soft dependencies are not installed; rerun the cell to clear the warnings.
all_estimators("classifier", filter_tags={"algorithm_type": "distance"})

# In[3]:

from aeon.classification.distance_based import (
    ElasticEnsemble,
    KNeighborsTimeSeriesClassifier,
    ShapeDTW,
)

# ## K-NN: KNeighborsTimeSeriesClassifier in aeon
#
# k-NN is often called a lazy classifier because little work is done in the
# fit operation: fit simply stores the training data. To make a prediction
# for a new time series, k-NN measures the distance between the new series
# and every series in the training data and records the classes of the k
# closest training series. The class labels of these nearest neighbours are
# used to make a prediction: if they all have the same label, that is the
# prediction; if they differ, some form of voting mechanism is required. For
# example, we may predict the most common class label amongst the nearest
# neighbours of the test instance.
#
# KNeighborsTimeSeriesClassifier in aeon can be configured to use any of the
# distance functions in the distances module, or it can be passed a bespoke
# callable; a sketch of this appears after the ShapeDTW example below. You
# can set the number of neighbours and the weights. Weights are used in the
# prediction process when neighbours differ in class values. By default all
# neighbours have an equal vote. There is an option to weight by distance,
# meaning closer neighbours carry more weight in the vote.

# In[4]:

knn = KNeighborsTimeSeriesClassifier(distance="msm", n_neighbors=3, weights="distance")
knn.fit(X_train, y_train)
knn_preds = knn.predict(X_test)
metrics.accuracy_score(y_test, knn_preds)

# ## Elastic Ensemble: ElasticEnsemble in aeon
#
# The first algorithm to significantly outperform 1-NN DTW on the UCR data
# was the Elastic Ensemble (EE) [1]. EE is a weighted ensemble of 11 1-NN
# classifiers, each using a different elastic distance measure. It was the
# best performing distance based classifier in the bake off. Elastic
# distances can be slow, and EE requires cross validation to find the weight
# of each classifier in the ensemble. You can configure EE to use specified
# distance functions, and tell it how much of the training data and how many
# of each distance's parameter options to use when finding the weights.

# In[5]:

ee = ElasticEnsemble(
    distance_measures=["dtw", "msm"],
    proportion_of_param_options=0.1,
    proportion_train_in_param_finding=0.3,
    proportion_train_for_test=0.5,
)
ee.fit(X_train, y_train)
ee_preds = ee.predict(X_test)
metrics.accuracy_score(y_test, ee_preds)

# ### Shape Dynamic Time Warping: ShapeDTW in aeon
#
# Shape based DTW (ShapeDTW) [2] works by extracting a set of shape
# descriptors (such as slope and derivative) over windows of each series.
# The resulting series-to-series transformed data are then classified with a
# 1-NN classifier using DTW.

# In[6]:

shape = ShapeDTW()
shape.fit(X_train, y_train)
shape_preds = shape.predict(X_test)
metrics.accuracy_score(y_test, shape_preds)
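# ShapeDTW's behaviour is governed by the choice of shape descriptor. The
# following is a hedged sketch only: the `subsequence_length` and
# `shape_descriptor_function` parameter names and the "derivative" descriptor
# are taken from the ShapeDTW docstring and may differ between aeon versions,
# so check the class documentation before relying on them.

# In[ ]:

# Describe each window by its first differences rather than its raw values
# (assumed parameter names, see the note above).
shape_deriv = ShapeDTW(subsequence_length=12, shape_descriptor_function="derivative")
shape_deriv.fit(X_train, y_train)
metrics.accuracy_score(y_test, shape_deriv.predict(X_test))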
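# Returning to KNeighborsTimeSeriesClassifier: as noted in the K-NN section
# above, the distance argument also accepts a bespoke callable. A minimal
# sketch, assuming the callable is given two series as numpy arrays and must
# return a float; `flat_euclidean` is a toy distance defined here purely for
# illustration.

# In[ ]:

import numpy as np


def flat_euclidean(x, y):
    """Toy lock-step distance: Euclidean norm of the flattened difference."""
    return np.linalg.norm(x.ravel() - y.ravel())


knn_custom = KNeighborsTimeSeriesClassifier(distance=flat_euclidean, n_neighbors=3)
knn_custom.fit(X_train, y_train)
metrics.accuracy_score(y_test, knn_custom.predict(X_test))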
# ### Proximity Forest
#
# Proximity Forest [3] is a distance based ensemble of decision trees. It is
# the most accurate purely distance based technique for TSC that we know of.
# We do not currently have a working version of PF in aeon, but would very
# much like to have one. Please see this issue:
# https://github.com/aeon-toolkit/aeon/issues/159

# ### Comparing performance: coming soon

# ## References
#
# [1] Lines J, Bagnall A (2015) Time series classification with ensembles of
# elastic distance measures. Data Mining and Knowledge Discovery 29:565–592
#
# [2] Zhao J, Itti L (2018) shapeDTW: Shape Dynamic Time Warping. Pattern
# Recognition 74:171–184 https://arxiv.org/pdf/1606.01601.pdf
#
# [3] Lucas B, et al. (2019) Proximity Forest: an effective and scalable
# distance-based classifier for time series. Data Mining and Knowledge
# Discovery 33:607–635 https://arxiv.org/abs/1808.10594