#!/usr/bin/env python
# coding: utf-8
# # Distance based time series classification in aeon
#
# Distance based classifiers use a time series specific distance function to measure the
# similarity between time series. Time series distance functions are
# often called elastic distances, since they compensate for possible misalignment
# between series by shifting or editing the series.
#
# Dynamic time warping (DTW) is the best known elastic distance measure: it finds
# a warping path that aligns the points of one series with those of another so as
# to minimise the total distance between the aligned points.
#
# We have a range of elastic distance functions in the distances module. Please see
# the distances notebook for more information. Distance functions have mostly been
# used with nearest neighbour (NN) classifiers.
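# As a minimal sketch of the distances module, the cell below compares the
# Euclidean and DTW distances between two short series that are offset copies of
# one another; because DTW can realign the points, it reports a smaller distance
# here.
# In[ ]:
import numpy as np
from aeon.distances import dtw_distance, euclidean_distance

a = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
b = np.array([[2.0, 3.0, 4.0, 5.0, 6.0]])
print(euclidean_distance(a, b), dtw_distance(a, b))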
#
#
#
#
# ## Load data and list distance based classifiers
# In[1]:
from sklearn import metrics
from aeon.datasets import load_italy_power_demand
from aeon.registry import all_estimators
X_train, y_train = load_italy_power_demand(split="train", return_X_y=True)
X_test, y_test = load_italy_power_demand(split="test", return_X_y=True)
X_test = X_test[:10]
y_test = y_test[:10]
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
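# ItalyPowerDemand is a univariate problem: each case is stored as a 2D numpy
# array with a single channel. As a quick sanity check, the cell below shows the
# first train case.
# In[ ]:
print(type(X_train), X_train[0].shape)
X_train[0]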
# ## Distance based classifiers
#
# In[2]:
# Search for all classifiers tagged as distance based. This may raise some
# UserWarnings if soft dependencies are not installed; rerun to remove the
# warnings.
all_estimators("classifier", filter_tags={"algorithm_type": "distance"})
#
#
#
# In[3]:
from aeon.classification.distance_based import (
ElasticEnsemble,
KNeighborsTimeSeriesClassifier,
ShapeDTW,
)
# ## K-NN: KNeighborsTimeSeriesClassifier in aeon
# k-NN is often called a lazy classifier, because little work is done in the fit
# operation: fit simply stores the training data. To make a prediction for a new
# time series, k-NN measures the distance between the new series and every series
# in the training data and records the classes of the closest k train series. The
# class labels of these nearest neighbours are used to make a prediction: if they
# all have the same label, then that is the prediction. If they differ, then some
# form of voting mechanism is required. For example, we may predict the most
# common class label amongst the nearest neighbours of the test instance.
#
# KNeighborsTimeSeriesClassifier in aeon can be configured to use any of the
# distance functions in the distances module, or it can be passed a bespoke
# callable (an example follows the next cell). You can set the number of
# neighbours and the weights. Weights are used in the prediction process when
# neighbours differ in class values. By default all neighbours have an equal
# vote. There is an option to weight by distance, meaning closer neighbours
# carry more weight in the vote.
# In[4]:
knn = KNeighborsTimeSeriesClassifier(distance="msm", n_neighbors=3, weights="distance")
knn.fit(X_train, y_train)
knn_preds = knn.predict(X_test)
metrics.accuracy_score(y_test, knn_preds)
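# As mentioned above, a bespoke callable can be used in place of a named distance.
# The sketch below uses a plain squared Euclidean distance; we assume the callable
# receives two numpy arrays (one series each) and returns a single float.
# In[ ]:
import numpy as np


def squared_euclidean(x, y):
    # Toy distance: sum of squared differences between the two series.
    return float(np.sum((x - y) ** 2))


knn_custom = KNeighborsTimeSeriesClassifier(distance=squared_euclidean, n_neighbors=3)
knn_custom.fit(X_train, y_train)
metrics.accuracy_score(y_test, knn_custom.predict(X_test))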
# ## Elastic Ensemble: ElasticEnsemble in aeon
#
# The first algorithm to significantly outperform 1-NN DTW on the UCR data was the
# Elastic Ensemble (EE) [1]. EE is a weighted ensemble of 11 1-NN classifiers, each
# using a different elastic distance measure. It was the best performing distance
# based classifier in the bake off. Elastic distances can be slow, and EE requires
# cross validation to find the weights of each classifier in the ensemble. You can
# configure EE to use a subset of the distance measures, and control how much of
# the parameter space and training data it uses when setting the ensemble weights,
# trading accuracy for speed, as in the cell below.
#
#
# In[5]:
ee = ElasticEnsemble(
distance_measures=["dtw", "msm"],
proportion_of_param_options=0.1,
proportion_train_in_param_finding=0.3,
proportion_train_for_test=0.5,
)
ee.fit(X_train, y_train)
ee_preds = ee.predict(X_test)
metrics.accuracy_score(y_test, ee_preds)
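# Because EE weights its constituent 1-NN classifiers, predict_proba returns a
# weighted vote over the classes for each test case rather than a hard 0/1 count.
# In[ ]:
ee.predict_proba(X_test)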
# ## Shape Dynamic Time Warping: ShapeDTW in aeon
# Shape based DTW (ShapeDTW) [2] works by extracting a set of shape descriptors
# (such as slope and derivative) over sliding windows of each series. The
# transformed series are then classified with 1-NN using DTW.
#
#
#
# In[6]:
shape = ShapeDTW()
shape.fit(X_train, y_train)
shape_preds = shape.predict(X_test)
metrics.accuracy_score(y_test, shape_preds)
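# By default ShapeDTW uses the raw subsequences as the shape descriptor. The
# sketch below configures the slope descriptor instead; note that the parameter
# names subsequence_length and shape_descriptor_function are assumptions based on
# recent aeon versions, so check the ShapeDTW docstring for your release.
# In[ ]:
# NOTE: parameter names below are assumed, not taken from this notebook
shape_slope = ShapeDTW(subsequence_length=8, shape_descriptor_function="slope")
shape_slope.fit(X_train, y_train)
metrics.accuracy_score(y_test, shape_slope.predict(X_test))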
# ## Proximity Forest
#
# Proximity Forest [3] is a distance based ensemble of decision trees. It is the
# most accurate purely distance based technique for TSC that we know of. We do not
# currently have a working version of PF in aeon, but would very much like one:
# please see this issue https://github.com/aeon-toolkit/aeon/issues/159
# ## Comparing performance: coming soon
#
#
# ## References
#
# [1] Lines J and Bagnall A (2015) Time series classification with ensembles of
# elastic distance measures. Data Mining and Knowledge Discovery 29:565–592
#
# [2] Zhao J and Itti L (2018) shapeDTW: Shape Dynamic Time Warping. Pattern
# Recognition 74:171–184 https://arxiv.org/pdf/1606.01601.pdf
#
# [3] Lucas B et al. (2019) Proximity Forest: an effective and scalable
# distance-based classifier for time series. Data Mining and Knowledge Discovery
# 33:607–635 https://arxiv.org/abs/1808.10594
#
#