tsfresh is a tool for extracting summary features from a collection of time series. It is an unsupervised transformation, and as such can easily be used as a pipeline stage for classification, clustering and regression in conjunction with a scikit-learn compatible estimator.
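To make the idea of summary features concrete, here is a minimal sketch that maps each series in a collection to a fixed-length feature vector. The helper `toy_features` and its four features are assumptions made for illustration; they are not tsfresh's feature set, which is far larger.

```python
import statistics


def toy_features(series):
    """Summarise one series as a fixed-length list of features.

    Hypothetical helper for illustration: tsfresh computes a much richer set.
    """
    return [
        statistics.fmean(series),   # average level
        statistics.pstdev(series),  # spread around the mean
        min(series),
        max(series),
    ]


# A collection of series; lengths may differ, the feature count does not.
collection = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, 5.0],
]
features = [toy_features(s) for s in collection]
print(len(features), len(features[0]))  # number of cases, features per case
```

The result is an ordinary table of cases by features, which is why any tabular estimator can consume it.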
You have to install tsfresh if you haven't already. To install it, uncomment the cell below:
# !pip install --upgrade tsfresh
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from aeon.datasets import load_arrow_head, load_basic_motions
from aeon.transformations.collection.feature_based import TSFresh, TSFreshRelevant
We use the ArrowHead data from the UCR TSC archive as an example dataset. See the dataset notebook for more details. We use only the first few cases to keep the notebook fast.
X, y = load_arrow_head()
n_cases = 24
X_train = X[:n_cases, :, :]
y_train = y[:n_cases]
X_test = X[n_cases : 2 * n_cases, :, :]
y_test = y[n_cases : 2 * n_cases]
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(24, 1, 251) (24,) (24, 1, 251) (24,)
There are two versions of the TSFresh feature extractor wrapped in aeon. The first is the unsupervised TSFresh, which by default extracts 777 features per channel, so 777 features for the univariate ArrowHead data. See the documentation for parameter configuration.
t = TSFresh()
Xt = t.fit_transform(X_train)
Xt2 = t.transform(X_test)
print(f"Train shape = {Xt.shape} test shape = {Xt2.shape}")
Train shape = (24, 777) test shape = (24, 777)
The second is TSFreshRelevant, which uses y to select the most relevant features.
t = TSFreshRelevant()
t.fit(X_train, y_train)
Xt = t.transform(X_test)
Xt.shape
(24, 75)
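The feature subset is chosen during fit using the training labels, and the same columns are then kept in transform, which is why the test data can be transformed without its labels. The sketch below illustrates that fit/transform contract with a deliberately simple stand-in score (the gap between class means, scaled by the overall spread). tsfresh itself relies on statistical hypothesis tests, so `ToyRelevantSelector` and its `threshold` parameter are hypothetical and only show the general shape of a supervised filter.

```python
import statistics


class ToyRelevantSelector:
    """Keep feature columns whose class means differ clearly (illustration only)."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold  # hypothetical cut-off, not a tsfresh parameter

    def fit(self, X, y):
        labels = sorted(set(y))
        self.selected_ = []
        for j in range(len(X[0])):
            column = [row[j] for row in X]
            spread = statistics.pstdev(column)
            if spread == 0:
                continue  # a constant feature cannot separate classes
            class_means = [
                statistics.fmean([v for v, lab in zip(column, y) if lab == c])
                for c in labels
            ]
            score = (max(class_means) - min(class_means)) / spread
            if score >= self.threshold:
                self.selected_.append(j)
        return self

    def transform(self, X):
        # Apply the columns chosen in fit; no labels are needed here.
        return [[row[j] for j in self.selected_] for row in X]


# Toy feature table: column 0 separates the classes, column 1 is constant,
# column 2 varies but has the same mean in both classes.
X_toy_train = [
    [1.0, 5.0, 0.0],
    [1.1, 5.0, 1.0],
    [3.0, 5.0, 0.0],
    [3.1, 5.0, 1.0],
]
y_toy_train = ["a", "a", "b", "b"]

selector = ToyRelevantSelector().fit(X_toy_train, y_toy_train)
print(selector.selected_)
print(selector.transform([[2.0, 5.0, 0.5]]))
```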
You can use the tsfresh transformer with any scikit-learn compatible estimator.
classifier = make_pipeline(
TSFresh(default_fc_parameters="efficient", show_warnings=False),
RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
0.625
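For a classifier pipeline, score reports mean accuracy: the fraction of test cases predicted correctly. A minimal sketch of that calculation follows. The label lists are made up for illustration, and `accuracy` is a hypothetical helper rather than a library call.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration: 3 of 4 predictions are correct.
y_true_toy = ["0", "1", "2", "1"]
y_pred_toy = ["0", "1", "2", "0"]
print(accuracy(y_true_toy, y_pred_toy))  # 0.75

# With 24 test cases, 15 correct predictions give a score of 0.625.
print(15 / 24)  # 0.625
```

Because a random forest is randomised, rerunning the cell without a fixed seed may give a slightly different score.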
For convenience and consistency of use, we also provide a dedicated TSFresh classifier, regressor and clusterer.
from aeon.classification.feature_based import TSFreshClassifier
from aeon.clustering.feature_based import TSFreshClusterer
cls = TSFreshClassifier(relevant_feature_extractor=False)
clst = TSFreshClusterer(n_clusters=2)
cls.fit(X_train, y_train)
cls.score(X_test, y_test)
clst.fit(X_train)
print(cls.predict(X_test))
print(clst.predict(X_test))
['0' '1' '0' '1' '1' '2' '0' '1' '1' '0' '1' '1' '0' '2' '0' '0' '0' '2' '2' '1' '0' '0' '0' '0']
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]
By default, the TSFreshClassifier uses the supervised TSFreshRelevant and the scikit-learn RandomForestClassifier. You can change this through the constructor.
from aeon.classification.sklearn import RotationForestClassifier
cls = TSFreshClassifier(estimator=RotationForestClassifier(n_estimators=5))
cls.fit(X_train, y_train)
cls.score(X_test, y_test)
0.5833333333333334
By default, the TSFreshClusterer uses the unsupervised TSFresh and the sklearn clusterer KMeans with default parameters (which fits 8 clusters). You can also configure this through the constructor.
from sklearn.cluster import KMeans
clst = TSFreshClusterer(estimator=KMeans(n_clusters=3))
clst.fit(X_train)
print(clst.predict(X_test))
[1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 2 0 0 1]
By default, the TSFreshRegressor uses the supervised TSFreshRelevant and the scikit-learn RandomForestRegressor.
from aeon.regression.feature_based import TSFreshRegressor
reg = TSFreshRegressor(relevant_feature_extractor=False)
from aeon.datasets import load_covid_3month
X, y = load_covid_3month(split="train")
reg.fit(X, y)
TSFreshRegressor(relevant_feature_extractor=False)
The TSFresh transformers and all three estimators can be used with multivariate time series. The transform calculates the features on each channel independently, then concatenates the results. The full transform creates 777*n_channels features.
X_train, y_train = load_basic_motions(split="train")
X_test, y_test = load_basic_motions(split="test")
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(40, 6, 100) (40,) (40, 6, 100) (40,)
tsfresh = TSFresh()
X = tsfresh.fit_transform(X_train, y_train)
X.shape
(40, 4662)
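The per-channel behaviour can be sketched as follows: compute a feature vector for every channel, then join the vectors end to end. The helper names and the three toy features are assumptions for illustration only. The real transform produces 777 features per channel, which for the six BasicMotions channels gives 777 * 6 = 4662 columns, matching the shape above.

```python
def channel_features(series):
    """Toy per-channel summary: mean, minimum and maximum (illustrative only)."""
    return [sum(series) / len(series), min(series), max(series)]


def multivariate_features(case):
    """Extract features from each channel independently, then concatenate."""
    row = []
    for channel in case:
        row.extend(channel_features(channel))
    return row


# One case with 2 channels and 4 time points per channel.
case = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 10.0, 20.0, 20.0],
]
row = multivariate_features(case)
print(row)  # [2.5, 1.0, 4.0, 15.0, 10.0, 20.0]

# The feature count scales linearly with the number of channels.
n_features_per_channel, n_channels = 777, 6
print(n_features_per_channel * n_channels)  # 4662
```

Because channels are processed independently, this kind of transform does not capture relationships between channels; any such interaction has to be learned by the downstream estimator.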