In this notebook we demonstrate how to automatically tune the regularisation hyperparameters of Slisemap and Slipmap (the procedure described in this notebook works for both methods).
The scikit-optimize library is required to run this notebook (e.g., via `pip install "slisemap[tuning]"` or `pip install scikit-optimize`).
We use the Magic Telescope dataset.
import sys
import numpy as np
import pandas as pd
from pathlib import Path
from urllib.request import urlretrieve
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
sys.path.insert(0, "..")
from slisemap import Slipmap
from slisemap.local_models import LogisticRegression
from slisemap.tuning import optimise_with_test
from slisemap.metrics import accuracy
Slisemap and Slipmap have several hyperparameters. Below, the most important ones are described in more detail:
- `local_model` / `local_loss`: The prediction and loss functions for the local models. The local model must be selected to suit the data. Slisemap comes with some already implemented models: linear regression (for regression) and logistic regression (for classification).
- `lasso` / `ridge`: Coefficients for Lasso/L1 regularisation and Ridge/L2 regularisation. Since the local models are local (each is fitted to only a subset of the data) there is a greater risk of overfitting. Therefore, it is strongly recommended to use some kind of regularisation. Slisemap comes with two common types of regularisation that can be enabled through the `lasso` and `ridge` parameters.
- `radius`: The `radius` controls the size of the embedding. The larger the radius, the more "independent clusters" can exist in the embedding at once.
- `d`: The number of embedding dimensions. Since most computer monitors are 2D it is very convenient to keep `d=2`.
- `intercept`: Whether to include an intercept term. For linear models it is common to include an intercept term. Slisemap can automatically add a constant column to the data (creating an intercept term in the linear local models).
In this notebook we focus on `lasso` and `ridge`, since these are perhaps the least intuitive of the hyperparameters. The choice of local model (including loss and intercept) depends on the data, so it is left for manual selection. The sketch below shows where each hyperparameter goes in the constructor.
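To make these concrete, here is a minimal sketch of how the hyperparameters might be passed to the constructor (the values and the toy data below are purely illustrative, not recommendations):

import numpy as np
from slisemap import Slisemap
from slisemap.local_models import LogisticRegression

# Toy classification data, only to show where each hyperparameter goes
X_toy = np.random.normal(size=(100, 5))
p_toy = 1 / (1 + np.exp(-X_toy[:, 0]))
y_toy = np.stack([p_toy, 1 - p_toy], -1)  # two-column class probabilities

sm_toy = Slisemap(
    X_toy,
    y_toy,
    d=2,  # number of embedding dimensions
    radius=3.5,  # size of the embedding
    lasso=0.01,  # L1 regularisation coefficient
    ridge=0.001,  # L2 regularisation coefficient
    intercept=True,  # add a constant column for an intercept term
    local_model=LogisticRegression,  # white box model for classification
)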
To reduce execution times, we provide pretrained Slisemap models. If you want to train them yourself instead, please set `USE_CACHE=False`.
USE_CACHE = True
SM_V1_CACHE_PATH = Path("cache") / "03_hyperparameter_tuning_example.v1.sp"
SM_V2_CACHE_PATH = Path("cache") / "03_hyperparameter_tuning_example.v2.sp"
for path in [SM_V1_CACHE_PATH, SM_V2_CACHE_PATH]:
    path.parent.mkdir(exist_ok=True, parents=True)
    if USE_CACHE and not path.exists():
        try:
            urlretrieve(
                f"https://raw.githubusercontent.com/edahelsinki/slisemap/data/examples/cache/{path.name}",
                path,
            )
        except Exception:
            # If the download fails, fall back to training the models locally
            pass
Magic Telescope dataset: The data are MC generated to simulate registration of high energy gamma particles in a ground-based atmospheric Cherenkov gamma telescope using the imaging technique. Cherenkov gamma telescope observes high energy gamma rays, taking advantage of the radiation emitted by charged particles produced inside the electromagnetic showers initiated by the gammas, and developing in the atmosphere. This Cherenkov radiation (of visible to UV wavelengths) leaks through the atmosphere and gets recorded in the detector, allowing reconstruction of the shower parameters.
Bock, R.K., Chilingarian, A., Gaug, M., Hakl, F., Hengstebeck, T., Jirina, M., Klaschka, J., Kotrc, E., Savicky, P., Towers, S., Vaicilius, A., Wittek W. (2004). Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope. Nucl.Instr.Meth. A, 516, pp. 511-528.
# Download the Magic Telescope dataset from https://www.openml.org/d/1120
data = fetch_openml(data_id=1120)
data = data.frame.dropna()
X = data.iloc[:, :-1]
scale_x = StandardScaler()
X = pd.DataFrame(scale_x.fit_transform(X), columns=X.columns)
y = pd.get_dummies(data.iloc[:, -1:])
y.columns = ["signal", "background"]
random_forest = RandomForestClassifier(random_state=42).fit(X.iloc[::2], y.iloc[::2])
# With multi-output targets, predict_proba returns one array per output column;
# [1] selects the (n, 2) probability array for the second output ("background"),
# whose columns are [P(signal), P(background)]
y2 = random_forest.predict_proba(X)[1]
print("Correctly classified:", np.mean(np.argmax(y.to_numpy(), 1)[1::2] == np.argmax(y2, 1)[1::2]))
Correctly classified: 0.8802313354363828
We subsample 2000 data items for training, 2000 for testing, and the rest as validation data.
# Subsample the data
X_train, X_test, y_train, y_test = train_test_split(
X, y2, train_size=2000, stratify=y.iloc[:, 0], random_state=42
)
X_test, X_validation, y_test, y_validation = train_test_split(
X_test, y_test, train_size=2000, stratify=y_test[:, 0] > 0.5, random_state=42
)
Now we create a Slipmap object and optimise the solution. Since we are dealing with a classification task we replace the default white box model with logistic regression.
if USE_CACHE and SM_V1_CACHE_PATH.exists():
    sm = Slipmap.load(SM_V1_CACHE_PATH, device="cpu")
else:
    # Create a Slipmap object with lasso and ridge regularisation
    sm = Slipmap(
        X_train,
        y_train,
        lasso=0.0001,
        ridge=0.0001,
        intercept=False,
        local_model=LogisticRegression,
    )
    # Optimise the solution
    sm.optimise()
    # Metadata
    sm.metadata.set_targets(y.columns)
    sm.metadata.set_scale_X(scale_x.mean_, scale_x.scale_)
    # Cache the results
    sm.save(SM_V1_CACHE_PATH)
Here we guessed that `lasso=0.0001` and `ridge=0.0001` would be good values. But how can we check if this is actually true?
To measure how well the Slisemap solution generalises to unseen data we can use the `accuracy` function from `slisemap.metrics`. This function predicts the values for the unseen data items (using `Slisemap.predict` to copy an existing local model) and then calculates the local losses:
print(f"Accuracy (without tuning): {accuracy(sm, X_validation, y_validation):.5f}")
Accuracy (without tuning): 0.04165
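Conceptually, the `accuracy` function does something like the following simplified sketch (for intuition only, not the library implementation: we assume that `predict` takes the new data matrix as its first argument, and squared error stands in for whatever local loss the model actually uses):

def sketch_accuracy(sm, X_new, y_new):
    # Predict by copying the most suitable existing local models (sketch)
    y_pred = np.asarray(sm.predict(X_new))
    # Average the local losses (squared error as a stand-in here)
    return np.mean((y_pred - np.asarray(y_new)) ** 2)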
This "accuracy" value can be used to automatically tune the hyperparameters. Slisemap comes with three built-in methods for tuning the hyperparameters:
- `slisemap.tuning.hyperparameter_tune`: Full "Bayesian" hyperparameter optimisation. Can be quite slow, since `Slisemap.optimise` is restarted multiple times.
- `slisemap.tuning.optimise_with_test`: Works like the `Slisemap.optimise` function, but inserts a local search for better hyperparameters in each iteration. Can be quite fast, but is vulnerable to local optima and requires starting values.
- `slisemap.tuning.optimise_with_cv`: Works like `optimise_with_test` but uses cross-validation instead of a test set. Can be more stable (and slightly slower) than the previous option, but still requires starting values.

In this notebook we demonstrate the second option:
if USE_CACHE and SM_V2_CACHE_PATH.exists():
    sm2 = Slipmap.load(SM_V2_CACHE_PATH, device="cpu")
else:
    sm2 = sm.copy()
    # Optimise with hyperparameter tuning on the test set
    sm2 = optimise_with_test(sm2, X_test, y_test)
    # Cache the results
    sm2.save(SM_V2_CACHE_PATH)
print(f"Tuned hyperparameters: lasso = {sm2.lasso:.5f}, ridge = {sm2.ridge:.5f}")
print(f"Accuracy (without tuning): {accuracy(sm2, X_validation, y_validation):.5f}")
Tuned hyperparameters: lasso = 0.00562, ridge = 0.00082 Accuracy (without tuning): 0.03678
With the tuned hyperparameters we reduce the "accuracy" metric on the validation data by more than ten percent. However, the losses are already quite good before the tuning (for this dataset, with 2000 training items):
print("Accuracy (without tuning), Training: ", accuracy(sm, X_train, y_train))
print("Accuracy (without tuning), Testing: ", accuracy(sm, X_test, y_test))
print("Accuracy (without tuning), Validation:", accuracy(sm, X_validation, y_validation))
print()
print("Accuracy (after tuning), Training: ", accuracy(sm2, X_train, y_train))
print("Accuracy (after tuning), Testing: ", accuracy(sm2, X_test, y_test))
print("Accuracy (after tuning), Validation: ", accuracy(sm2, X_validation, y_validation))
Accuracy (without tuning), Training:   0.0013598452787846327
Accuracy (without tuning), Testing:    0.0435044951736927
Accuracy (without tuning), Validation: 0.041654691100120544

Accuracy (after tuning), Training:     0.012171604670584202
Accuracy (after tuning), Testing:      0.03755825012922287
Accuracy (after tuning), Validation:   0.03677966818213463
Furthermore, a common use case for Slisemap is visualisation and exploratory data analysis. Therefore, the choice of hyperparameters is also subjective: the visualisations need to suit the user's preferences, not only match the random forest predictions. For example, increasing `lasso` tends to make the local models more sparse, which makes them easier to interpret. The difference in sparsity can be seen when plotting the two solutions:
sm.plot(title=f"Without hyperparameter tuning (lasso = {sm.lasso:.5f})", clusters=5, bars=True)
sm2.plot(title=f"After hyperparameter tuning (lasso = {sm2.lasso:.5f})", clusters=5, bars=True)
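The difference in sparsity can also be quantified numerically. Here is a quick sketch (assuming that `get_B()` returns the matrix of local model coefficients, as in the other example notebooks):

# Fraction of (near) zero local model coefficients as a rough sparsity measure
for name, model in [("without tuning", sm), ("after tuning", sm2)]:
    sparsity = np.mean(np.abs(model.get_B()) < 0.01)
    print(f"Near-zero coefficients ({name}): {sparsity:.1%}")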
Thus, the tuned hyperparameters can be seen as a "baseline" for manual tweaking (if necessary). As demonstrated in the previous example notebook, when tweaking hyperparameters it is possible to continue the optimisation from a previous solution (instead of starting from scratch), roughly as in the sketch below.
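For example (a sketch only; we assume that the `lasso` attribute can be assigned directly, which is what the tuning procedures do internally):

# Tweak the tuned regularisation and continue optimising from the old solution
sm3 = sm2.copy()
sm3.lasso = sm3.lasso * 2  # e.g., ask for even sparser local models
sm3.optimise()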
In this notebook we demonstrated how additional data can be used to improve the results through hyperparameter tuning. Training on a subset of the data can be motivated by speed (although Slipmap and/or GPU acceleration also helps), but we also don't really need to visualise more points than we have pixels.
The main targets for the hyperparameter tuning are the regularisation coefficients and the embedding radius. Regularisation is most useful when the number of variables is large or the number of data items is small (unlike the dataset used in this example).
Two of the hyperparameter optimisation procedures that come with Slisemap are designed for speed through the use of (less exhaustive) local search. In all cases, the objective of the tuning is to find the solution that generalises best to unseen data, not the Slisemap solution with the best visualisation. Hence, manual tweaking can still be advantageous (with the automatically tuned values as a starting point).