This notebook shows how to use sklearn-genetic-opt
for hyperparameter optimization based on genetic algorithms (evolutionary computation). If you are interested in understanding how it works: sklearn-genetic-opt
uses DEAP under the hood.
%load_ext watermark
%watermark -p scikit-learn,sklearn,deap,sklearn_genetic
scikit-learn   : 1.0
sklearn        : 1.0
deap           : 1.3.1
sklearn_genetic: 0.7.0
from sklearn import model_selection
from sklearn.model_selection import train_test_split
from sklearn import datasets
data = datasets.load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = \
train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
X_train_sub, X_valid, y_train_sub, y_valid = \
train_test_split(X_train, y_train, test_size=0.2, random_state=1, stratify=y_train)
print('Train/Valid/Test sizes:', y_train.shape[0], y_valid.shape[0], y_test.shape[0])
Train/Valid/Test sizes: 398 80 171
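As a sanity check, the reported sizes follow directly from the split fractions. A minimal sketch, assuming the dataset's 569 total samples and scikit-learn's behavior of rounding the test fraction up:

```python
import math

n_total = 569                        # samples in load_breast_cancer()

n_test = math.ceil(n_total * 0.3)    # test_size=0.3 -> 171
n_train = n_total - n_test           # 398
n_valid = math.ceil(n_train * 0.2)   # test_size=0.2 of the train split -> 80
n_train_sub = n_train - n_valid      # 318

print(n_train, n_valid, n_test)      # 398 80 171
```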
Install: pip install sklearn-genetic-opt[all]
More info: https://sklearn-genetic-opt.readthedocs.io/en/stable/#
import numpy as np
import scipy.stats
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Integer, Categorical, Continuous
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=123)
params = {
'min_samples_split': Integer(2, 12),
'min_impurity_decrease': Continuous(0.0, 0.5),
'max_depth': Categorical([6, 16, None])
}
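Conceptually, each space type tells the genetic algorithm how to draw one gene of a candidate. The sketch below illustrates this with plain-Python stand-ins (uniform sampling is assumed; these are not the sklearn_genetic classes themselves):

```python
import random

random.seed(123)

def sample_integer(low, high):      # Integer: uniform int in [low, high]
    return random.randint(low, high)

def sample_continuous(low, high):   # Continuous: uniform float in [low, high)
    return random.uniform(low, high)

def sample_categorical(choices):    # Categorical: uniform pick from choices
    return random.choice(choices)

# One randomly sampled candidate from the search space defined above
candidate = {
    'min_samples_split': sample_integer(2, 12),
    'min_impurity_decrease': sample_continuous(0.0, 0.5),
    'max_depth': sample_categorical([6, 16, None]),
}
print(candidate)
```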
search = GASearchCV(
estimator=clf,
cv=5,
population_size=15,
generations=20,
tournament_size=3,
elitism=True,
keep_top_k=4,
crossover_probability=0.9,
mutation_probability=0.05,
param_grid=params,
criteria='max',
algorithm='eaMuCommaLambda',
n_jobs=-1)
search.fit(X_train, y_train)
search.best_score_
gen  nevals  fitness   fitness_std  fitness_max  fitness_min
0    15      0.773962  0.131052     0.914778     0.628165
1    28      0.888608  0.0588224    0.914778     0.673165
2    29      0.911424  0.00855215   0.914778     0.88962
3    28      0.914778  4.44089e-16  0.914778     0.914778
4    28      0.914778  4.44089e-16  0.914778     0.914778
5    28      0.914778  4.44089e-16  0.914778     0.914778
6    29      0.914778  4.44089e-16  0.914778     0.914778
7    27      0.918297  0.00703797   0.932373     0.914778
8    27      0.922989  0.0087779    0.932373     0.914778
9    29      0.928854  0.00703797   0.932373     0.914778
10   29      0.932373  3.33067e-16  0.932373     0.932373
11   29      0.932373  3.33067e-16  0.932373     0.932373
12   29      0.932373  3.33067e-16  0.932373     0.932373
13   29      0.932861  0.000974684  0.93481      0.932373
14   29      0.933023  0.00107755   0.93481      0.932373
15   28      0.93416   0.00107755   0.93481      0.932373
16   29      0.93481   3.33067e-16  0.93481      0.93481
17   29      0.93481   3.33067e-16  0.93481      0.93481
18   29      0.93481   3.33067e-16  0.93481      0.93481
19   28      0.93481   3.33067e-16  0.93481      0.93481
20   29      0.93481   3.33067e-16  0.93481      0.93481
0.9348101265822784
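The log above shows the population's fitness improving generation by generation. To make the mechanics concrete, here is a toy generational GA on a one-dimensional problem, mirroring the settings used above (population 15, 20 generations, tournament size 3, crossover/mutation probabilities 0.9/0.05). It is only a conceptual sketch: it uses blend crossover, Gaussian mutation, and simple elitism, not DEAP's actual eaMuCommaLambda algorithm.

```python
import random

random.seed(1)

# Toy fitness to maximize: f(x) = -(x - 3)^2, optimum at x = 3.
def fitness(x):
    return -(x - 3.0) ** 2

POP, GENS, TOURN = 15, 20, 3
CXPB, MUTPB = 0.9, 0.05

population = [random.uniform(-10, 10) for _ in range(POP)]

def tournament(pop):
    # Sample TOURN individuals at random; the fittest becomes a parent.
    return max(random.sample(pop, TOURN), key=fitness)

for gen in range(GENS):
    elite = max(population, key=fitness)   # elitism: best individual survives
    offspring = [elite]
    while len(offspring) < POP:
        p1, p2 = tournament(population), tournament(population)
        # Blend crossover with probability CXPB, else copy a parent
        child = (p1 + p2) / 2 if random.random() < CXPB else p1
        if random.random() < MUTPB:
            child += random.gauss(0, 1)    # Gaussian mutation
        offspring.append(child)
    population = offspring

best = max(population, key=fitness)
print(best)  # close to the optimum x = 3
```

GASearchCV does the same kind of loop, except each individual is a hyperparameter set and its fitness is the cross-validated score of the estimator.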
search.best_params_
{'min_samples_split': 8, 'min_impurity_decrease': 0.006258039752250311, 'max_depth': 16}
print(f"Training Accuracy: {search.best_estimator_.score(X_train, y_train):0.2f}")
print(f"Test Accuracy: {search.best_estimator_.score(X_test, y_test):0.2f}")
Training Accuracy: 0.99
Test Accuracy: 0.94