In this notebook we demonstrate how to automatically tune the regularisation hyperparameters of Slisemap and Slipmap (the procedure described in this notebook works for both methods).
The scikit-optimize library is required to run this notebook (e.g., via `pip install "slisemap[tuning]"` or `pip install scikit-optimize`).
We use the Magic Telescope dataset.
import sys
import numpy as np
import pandas as pd
from pathlib import Path
from urllib.request import urlretrieve
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
sys.path.insert(0, "..")
from slisemap import Slipmap
from slisemap.local_models import LogisticRegression
from slisemap.tuning import optimise_with_test
from slisemap.metrics import accuracy
Slisemap and Slipmap have several hyperparameters. Below, the most important ones are described in more detail:
- `local_model` / `local_loss`: The prediction and loss functions for the local models. The local model must be selected to suit the data. Slisemap comes with some already implemented models: linear regression (for regression) and logistic regression (for classification).
- `lasso` / `ridge`: Coefficients for Lasso/L1 regularisation and Ridge/L2 regularisation. Since the local models are local (each is fitted to only a subset of the data) there is a greater risk of overfitting. Therefore, it is strongly recommended to use some kind of regularisation. Slisemap comes with two common types of regularisation that can be enabled through the `lasso` and `ridge` parameters.
- `radius`: The `radius` controls the size of the embedding. The larger the radius, the more "independent clusters" can exist in the embedding at once.
- `d`: The number of embedding dimensions. Since most computer monitors are 2D it is very convenient to keep `d=2`.
- `intercept`: Whether to include an intercept term. For linear models it is common to include an intercept term. Slisemap can automatically add a constant column to the data (creating an intercept term in the linear local models).
In this notebook we focus on `lasso` and `ridge`, since these are perhaps the least intuitive of the hyperparameters. The choice of local model (including loss and intercept) depends on the data, so it is left for manual selection. The sketch below shows where each hyperparameter goes in the constructor.
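To make these concrete, here is a minimal sketch of how the hyperparameters might be passed to the constructor (the values and the toy data below are purely illustrative, not recommendations):

import numpy as np
from slisemap import Slisemap
from slisemap.local_models import LogisticRegression

# Toy classification data, only to show where each hyperparameter goes
X_toy = np.random.normal(size=(100, 5))
p_toy = 1 / (1 + np.exp(-X_toy[:, 0]))
y_toy = np.stack([p_toy, 1 - p_toy], -1)  # two-column class probabilities

sm_toy = Slisemap(
    X_toy,
    y_toy,
    d=2,  # number of embedding dimensions
    radius=3.5,  # size of the embedding
    lasso=0.01,  # L1 regularisation coefficient
    ridge=0.001,  # L2 regularisation coefficient
    intercept=True,  # add a constant column for an intercept term
    local_model=LogisticRegression,  # white box model for classification
)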
To reduce execution times, we provide pretrained Slisemap models. If you want to train them yourself instead, please set `USE_CACHE=False`.
USE_CACHE = True
SM_V1_CACHE_PATH = Path("cache") / "03_hyperparameter_tuning_example.v1.sp"
SM_V2_CACHE_PATH = Path("cache") / "03_hyperparameter_tuning_example.v2.sp"
for path in [SM_V1_CACHE_PATH, SM_V2_CACHE_PATH]:
    path.parent.mkdir(exist_ok=True, parents=True)
    if USE_CACHE and not path.exists():
        try:
            urlretrieve(
                f"https://raw.githubusercontent.com/edahelsinki/slisemap/data/examples/cache/{path.name}",
                path,
            )
        except Exception:
            # If the download fails, fall back to training the models locally
            pass
Magic Telescope dataset: The data are MC generated to simulate registration of high energy gamma particles in a ground-based atmospheric Cherenkov gamma telescope using the imaging technique. Cherenkov gamma telescope observes high energy gamma rays, taking advantage of the radiation emitted by charged particles produced inside the electromagnetic showers initiated by the gammas, and developing in the atmosphere. This Cherenkov radiation (of visible to UV wavelengths) leaks through the atmosphere and gets recorded in the detector, allowing reconstruction of the shower parameters.
Bock, R.K., Chilingarian, A., Gaug, M., Hakl, F., Hengstebeck, T., Jirina, M., Klaschka, J., Kotrc, E., Savicky, P., Towers, S., Vaicilius, A., Wittek W. (2004). Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope. Nucl.Instr.Meth. A, 516, pp. 511-528.
# Download the Magic Telescope dataset from https://www.openml.org/d/1120
data = fetch_openml(data_id=1120)
data = data.frame.dropna()
X = data.iloc[:, :-1]
scale_x = StandardScaler()
X = pd.DataFrame(scale_x.fit_transform(X), columns=X.columns)
y = pd.get_dummies(data.iloc[:, -1:])
y.columns = ["signal", "background"]
random_forest = RandomForestClassifier(random_state=42).fit(X.iloc[::2], y.iloc[::2])
# With multi-output targets, predict_proba returns one array per output column;
# [1] selects the (n, 2) probability array for the second output ("background"),
# whose columns are [P(signal), P(background)]
y2 = random_forest.predict_proba(X)[1]
print("Correctly classified:", np.mean(np.argmax(y.to_numpy(), 1)[1::2] == np.argmax(y2, 1)[1::2]))
Correctly classified: 0.8802313354363828
We subsample 2000 data items for training, 2000 for testing, and the rest as validation data.
# Subsample the data
X_train, X_test, y_train, y_test = train_test_split(
X, y2, train_size=2000, stratify=y.iloc[:, 0], random_state=42
)
X_test, X_validation, y_test, y_validation = train_test_split(
X_test, y_test, train_size=2000, stratify=y_test[:, 0] > 0.5, random_state=42
)
Now we create a Slipmap object and optimise the solution. Since we are dealing with a classification task we replace the default white box model with logistic regression.
if USE_CACHE and SM_V1_CACHE_PATH.exists():
    sm = Slipmap.load(SM_V1_CACHE_PATH, device="cpu")
else:
    # Create a Slipmap object with lasso and ridge regularisation
    sm = Slipmap(
        X_train,
        y_train,
        lasso=0.0001,
        ridge=0.0001,
        intercept=False,
        local_model=LogisticRegression,
    )
    # Optimise the solution
    sm.optimise()
    # Metadata
    sm.metadata.set_targets(y.columns)
    sm.metadata.set_scale_X(scale_x.mean_, scale_x.scale_)
    # Cache the results
    sm.save(SM_V1_CACHE_PATH)
Here we guessed that `lasso=0.0001` and `ridge=0.0001` would be good values. But how can we check if this is actually true?
To measure how well the Slisemap solution generalises to unseen data we can use the `accuracy` function from `slisemap.metrics`. This function predicts the values for the unseen data items (using `Slisemap.predict` to copy an existing local model) and then calculates the local losses:
print(f"Accuracy (without tuning): {accuracy(sm, X_validation, y_validation):.5f}")
Accuracy (without tuning): 0.04165
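Conceptually, the `accuracy` function does something like the following simplified sketch (for intuition only, not the library implementation: we assume that `predict` takes the new data matrix as its first argument, and squared error stands in for whatever local loss the model actually uses):

def sketch_accuracy(sm, X_new, y_new):
    # Predict by copying the most suitable existing local models (sketch)
    y_pred = np.asarray(sm.predict(X_new))
    # Average the local losses (squared error as a stand-in here)
    return np.mean((y_pred - np.asarray(y_new)) ** 2)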
This "accuracy" value can be used to automatically tune the hyperparameters. Slisemap comes with three built-in methods for tuning the hyperparameters:
- `slisemap.tuning.hyperparameter_tune`: Full "Bayesian" hyperparameter optimisation. Can be quite slow, since `Slisemap.optimise` is restarted multiple times.
- `slisemap.tuning.optimise_with_test`: Works like the `Slisemap.optimise` function, but inserts a local search for better hyperparameters in each iteration. Can be quite fast, but is vulnerable to local optima and requires starting values.
- `slisemap.tuning.optimise_with_cv`: Works like `optimise_with_test` but uses cross-validation instead of a test set. Can be more stable (and slightly slower) than the previous option, but still requires starting values.

In this notebook we demonstrate the second option:
if USE_CACHE and SM_V2_CACHE_PATH.exists():
    sm2 = Slipmap.load(SM_V2_CACHE_PATH, device="cpu")
else:
    sm2 = sm.copy()
    # Optimise with hyperparameter tuning on the test set
    sm2 = optimise_with_test(sm2, X_test, y_test)
    # Cache the results
    sm2.save(SM_V2_CACHE_PATH)
print(f"Tuned hyperparameters: lasso = {sm2.lasso:.5f}, ridge = {sm2.ridge:.5f}")
print(f"Accuracy (without tuning): {accuracy(sm2, X_validation, y_validation):.5f}")
Tuned hyperparameters: lasso = 0.00562, ridge = 0.00082 Accuracy (without tuning): 0.03678
With the tuned hyperparameters we reduce the "accuracy" metric on the validation data by more than ten percent. However, the losses are already quite good before the tuning (for this dataset, with 2000 training items):
print("Accuracy (without tuning), Training: ", accuracy(sm, X_train, y_train))
print("Accuracy (without tuning), Testing: ", accuracy(sm, X_test, y_test))
print("Accuracy (without tuning), Validation:", accuracy(sm, X_validation, y_validation))
print()
print("Accuracy (after tuning), Training: ", accuracy(sm2, X_train, y_train))
print("Accuracy (after tuning), Testing: ", accuracy(sm2, X_test, y_test))
print("Accuracy (after tuning), Validation: ", accuracy(sm2, X_validation, y_validation))
Accuracy (without tuning), Training:   0.0013598452787846327
Accuracy (without tuning), Testing:    0.0435044951736927
Accuracy (without tuning), Validation: 0.041654691100120544

Accuracy (after tuning), Training:     0.012171604670584202
Accuracy (after tuning), Testing:      0.03755825012922287
Accuracy (after tuning), Validation:   0.03677966818213463
Furthermore, a common use case for Slisemap is visualisation and exploratory data analysis. Therefore, the choice of hyperparameters is also subjective: the visualisations need to suit the user's preferences, not only match the random forest predictions. For example, increasing `lasso` tends to make the local models more sparse, which makes them easier to interpret. The difference in sparsity can be seen when plotting the two solutions:
sm.plot(title=f"Without hyperparameter tuning (lasso = {sm.lasso:.5f})", clusters=5, bars=True)
sm2.plot(title=f"After hyperparameter tuning (lasso = {sm2.lasso:.5f})", clusters=5, bars=True)
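The difference in sparsity can also be quantified numerically. Here is a quick sketch (assuming that `get_B()` returns the matrix of local model coefficients, as in the other example notebooks):

# Fraction of (near) zero local model coefficients as a rough sparsity measure
for name, model in [("without tuning", sm), ("after tuning", sm2)]:
    sparsity = np.mean(np.abs(model.get_B()) < 0.01)
    print(f"Near-zero coefficients ({name}): {sparsity:.1%}")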
Thus, the tuned hyperparameters can be seen as a "baseline" for manual tweaking (if necessary). As demonstrated in the previous example notebook, when tweaking hyperparameters it is possible to continue the optimisation from a previous solution (instead of starting from scratch), roughly as in the sketch below.
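For example (a sketch only; we assume that the `lasso` attribute can be assigned directly, which is what the tuning procedures do internally):

# Tweak the tuned regularisation and continue optimising from the old solution
sm3 = sm2.copy()
sm3.lasso = sm3.lasso * 2  # e.g., ask for even sparser local models
sm3.optimise()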
In this notebook we demonstrated how additional data can be used to improve the results through hyperparameter tuning. Training on a subset of the data can be motivated by speed (although Slipmap and/or GPU acceleration also helps), but we also don't really need to visualise more points than we have pixels.
The main targets for the hyperparameter tuning are the regularisation coefficients and the embedding radius. Regularisation is most useful when the number of variables is large or the number of data items is small (unlike the dataset used in this example).
Two of the hyperparameter optimisation procedures that come with Slisemap are designed for speed through the use of (less exhaustive) local search. In all cases, the objective of the tuning is to find the solution that generalises best to unseen data, not the Slisemap solution with the best visualisation. Hence, manual tweaking can still be advantageous (with the automatically tuned values as a starting point).