Guillaume Baudart, Martin Hirzel, Kiran Kate, Pari Ram, and Avi Shinnar
27 March 2020
Examples, documentation, code: https://github.com/ibm/lale
!pip install lale
Requirement already satisfied: lale 0.3.5 and its dependencies (among them scikit-learn 0.20.3, hyperopt 0.2.3, xgboost 0.90, lightgbm 2.2.3, pandas 0.25.0, numpy 1.17.0, scipy 1.3.0, jsonschema 3.2.0).
import lale.datasets
(train_X_all, train_y_all), (test_X, test_y) = lale.datasets.covtype_df(test_size=0.1)
print(f'shape train_X_all {train_X_all.shape}, test_X {test_X.shape}')
shape train_X_all (522910, 54), test_X (58102, 54)
import sklearn.model_selection
train_X, other_X, train_y, other_y = sklearn.model_selection.train_test_split(
    train_X_all, train_y_all, test_size=0.9)
print(f'shape train_X {train_X.shape}, other_X {other_X.shape}')
shape train_X (52291, 54), other_X (470619, 54)
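These shapes follow from scikit-learn's rounding rule for a fractional `test_size`: the test set receives `ceil(test_size * n_samples)` rows and, when no `train_size` is given, the training set receives the remainder. The arithmetic check below assumes that `lale.datasets.covtype_df` splits the 581,012-row covertype data with the same rule. The helper `split_sizes` is hypothetical and only mirrors that computation.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a float test_size, using the
    ceil-for-test, remainder-for-train rule of train_test_split."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# Outer split inside covtype_df (assumed): 10% held out for testing.
print(split_sizes(581012, 0.1))  # (522910, 58102)
# Inner split in this notebook: keep only 10% of train_X_all for training.
print(split_sizes(522910, 0.9))  # (52291, 470619)
```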
import pandas as pd
pd.set_option('display.max_columns', None)
pd.concat([pd.DataFrame({'y': train_y}, index=train_X.index),
           train_X], axis=1).tail(10)
index | y | Elevation | Aspect | Slope | Horizontal_Distance_To_Hydrology | Vertical_Distance_To_Hydrology | Horizontal_Distance_To_Roadways | Hillshade_9am | Hillshade_Noon | Hillshade_3pm | Horizontal_Distance_To_Fire_Points | Wilderness_Area1 | Wilderness_Area2 | Wilderness_Area3 | Wilderness_Area4 | Soil_Type1 | Soil_Type2 | Soil_Type3 | Soil_Type4 | Soil_Type5 | Soil_Type6 | Soil_Type7 | Soil_Type8 | Soil_Type9 | Soil_Type10 | Soil_Type11 | Soil_Type12 | Soil_Type13 | Soil_Type14 | Soil_Type15 | Soil_Type16 | Soil_Type17 | Soil_Type18 | Soil_Type19 | Soil_Type20 | Soil_Type21 | Soil_Type22 | Soil_Type23 | Soil_Type24 | Soil_Type25 | Soil_Type26 | Soil_Type27 | Soil_Type28 | Soil_Type29 | Soil_Type30 | Soil_Type31 | Soil_Type32 | Soil_Type33 | Soil_Type34 | Soil_Type35 | Soil_Type36 | Soil_Type37 | Soil_Type38 | Soil_Type39 | Soil_Type40 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
325384 | 2 | 3064.0 | 86.0 | 25.0 | 702.0 | 259.0 | 721.0 | 247.0 | 189.0 | 56.0 | 1714.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
442177 | 1 | 3277.0 | 31.0 | 15.0 | 454.0 | 70.0 | 1570.0 | 215.0 | 206.0 | 124.0 | 2754.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
185316 | 2 | 3138.0 | 257.0 | 14.0 | 228.0 | 30.0 | 5649.0 | 185.0 | 248.0 | 200.0 | 3051.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
189541 | 3 | 2317.0 | 150.0 | 8.0 | 150.0 | 42.0 | 644.0 | 231.0 | 240.0 | 141.0 | 781.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
428374 | 2 | 2970.0 | 47.0 | 25.0 | 319.0 | 100.0 | 1919.0 | 220.0 | 178.0 | 80.0 | 3060.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
234638 | 1 | 3278.0 | 335.0 | 5.0 | 360.0 | 35.0 | 5763.0 | 209.0 | 233.0 | 163.0 | 646.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
172207 | 1 | 3175.0 | 343.0 | 17.0 | 162.0 | 3.0 | 4395.0 | 183.0 | 212.0 | 166.0 | 2965.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
240801 | 1 | 3355.0 | 346.0 | 16.0 | 180.0 | 6.0 | 1922.0 | 188.0 | 213.0 | 163.0 | 4906.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
435277 | 1 | 3154.0 | 316.0 | 26.0 | 339.0 | 122.0 | 2688.0 | 143.0 | 209.0 | 201.0 | 2720.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
297100 | 7 | 3344.0 | 313.0 | 20.0 | 0.0 | 0.0 | 4317.0 | 163.0 | 221.0 | 196.0 | 4092.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
from sklearn.decomposition import PCA
from xgboost import XGBClassifier as XGBoost
lale.wrap_imported_operators()
manual_trainable = PCA(n_components=6) >> XGBoost(n_estimators=3)
manual_trainable.visualize()
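Conceptually, `>>` pipes the output of one operator into the next. At fit time, PCA is fit on the raw features and XGBoost is fit on PCA's output. At predict time, the data flows through the same chain. The toy sketch below mimics that data flow with a hypothetical `Pipe` class and two hypothetical stand-in operators, `MeanCenter` and `NearestMean`. It illustrates the semantics only and is not Lale's implementation. In particular, Lale's `fit` returns a new trained operator, whereas this sketch trains in place.

```python
class Op:
    """Minimal base class: `a >> b` builds a two-step pipeline."""
    def __rshift__(self, other):
        return Pipe(self, other)


class Pipe(Op):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def fit(self, X, y):
        # Fit the first step, transform the data, then fit the second step on it.
        self.first.fit(X, y)
        self.second.fit(self.first.transform(X), y)
        return self

    def transform(self, X):
        return self.second.transform(self.first.transform(X))

    def predict(self, X):
        return self.second.predict(self.first.transform(X))


class MeanCenter(Op):
    """Toy transformer: subtracts the per-column training mean."""
    def fit(self, X, y=None):
        n = len(X)
        self.means = [sum(col) / n for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means)] for row in X]


class NearestMean(Op):
    """Toy classifier: predicts the label whose class mean is closest."""
    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sqdist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: sqdist(row, self.centroids[c]))
                for row in X]


pipeline = MeanCenter() >> NearestMean()
trained = pipeline.fit([[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]], [1, 1, 2, 2])
print(trained.predict([[0.5, 0.2], [9.5, 9.9]]))  # [1, 2]
```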
%%time
manual_trained = manual_trainable.fit(train_X, train_y)
CPU times: user 2.34 s, sys: 1.2 s, total: 3.55 s
Wall time: 2.05 s
import sklearn.metrics
manual_y = manual_trained.predict(test_X)
print(f'accuracy {sklearn.metrics.accuracy_score(test_y, manual_y):.1%}')
accuracy 64.5%
XGBoost.hyperparam_schema('n_estimators')
{'description': 'Number of trees to fit.', 'type': 'integer', 'default': 100, 'minimumForOptimizer': 10, 'maximumForOptimizer': 1500}
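The `minimumForOptimizer` and `maximumForOptimizer` fields bound the range an optimizer explores, separately from the `default` that applies when the user sets nothing. The sketch below is a rough illustration of how those bounds could drive sampling. It is not necessarily how Lale turns schemas into Hyperopt search spaces. The helper `sample_int`, its fallback to plain `minimum`/`maximum`, and the uniform distribution are all assumptions.

```python
import random

# Schema copied from the output above.
n_estimators_schema = {
    'description': 'Number of trees to fit.',
    'type': 'integer',
    'default': 100,
    'minimumForOptimizer': 10,
    'maximumForOptimizer': 1500,
}


def sample_int(schema, rng):
    """Hypothetical helper: draw a uniform integer within the optimizer bounds,
    falling back to 'minimum'/'maximum' when optimizer bounds are absent."""
    assert schema['type'] == 'integer'
    low = schema.get('minimumForOptimizer', schema.get('minimum'))
    high = schema.get('maximumForOptimizer', schema.get('maximum'))
    return rng.randint(low, high)  # randint includes both endpoints


rng = random.Random(0)
samples = [sample_int(n_estimators_schema, rng) for _ in range(5)]
print(samples)
```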
print(PCA.documentation_url())
https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html
from lale.lib.lale import Hyperopt
import lale.schemas as schemas
CustomPCA = PCA.customize_schema(n_components=schemas.Int(min=2, max=54))
CustomXGBoost = XGBoost.customize_schema(n_estimators=schemas.Int(min=1, max=10))
hpo_planned = CustomPCA >> CustomXGBoost
hpo_trainable = Hyperopt(estimator=hpo_planned, max_evals=10, cv=3)
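Here `customize_schema` narrows each search range, restricting `n_components` to [2, 54] and `n_estimators` to [1, 10]. Hyperopt then minimizes a loss. This is presumably why the *best loss* reported below is negative: it is the negated cross-validated accuracy, and the summary table's `loss` column is likewise negative. The sketch below swaps Hyperopt's TPE algorithm for plain random search. It also uses a hypothetical `toy_cv_accuracy` function in place of real 3-fold cross-validation, so it only shows the shape of the loop.

```python
import random


def toy_cv_accuracy(n_components, n_estimators):
    """Hypothetical stand-in for 3-fold cross-validated accuracy:
    more components and more trees help, with a fixed ceiling of 0.8."""
    return 0.5 + 0.2 * (n_components / 54) + 0.1 * (n_estimators / 10)


def random_search(max_evals, seed=0):
    """Random search (not TPE) over the customized ranges, minimizing loss."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_evals):
        params = {
            'n_components': rng.randint(2, 54),  # customized PCA range
            'n_estimators': rng.randint(1, 10),  # customized XGBoost range
        }
        loss = -toy_cv_accuracy(**params)  # minimize negated accuracy
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


best_loss, best_params = random_search(max_evals=10)
print(best_loss, best_params)
```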
%%time
hpo_trained = hpo_trainable.fit(train_X, train_y)
100%|███████| 10/10 [01:20<00:00, 6.64s/trial, best loss: -0.7885106540569516]
CPU times: user 1min 50s, sys: 22.2 s, total: 2min 12s
Wall time: 1min 28s
hpo_y = hpo_trained.predict(test_X)
print(f'accuracy {sklearn.metrics.accuracy_score(test_y, hpo_y):.1%}')
accuracy 80.1%
hpo_trained.get_pipeline().visualize()
hpo_trained.get_pipeline().pretty_print(ipython_display=True)
from lale.lib.sklearn import PCA
from lale.lib.xgboost.xgb_classifier import XGBoost
import lale
lale.wrap_imported_operators()
pca = PCA(n_components=39, svd_solver='full')
xg_boost = XGBoost(colsample_bylevel=0.6016063807304212, colsample_bytree=0.7763972782064467, learning_rate=0.16389357351003786, max_depth=10, min_child_weight=5, n_estimators=5, reg_alpha=0.10485915855270356, reg_lambda=0.9268502695024392, subsample=0.4503841871781402)
pipeline = pca >> xg_boost
hpo_trained.summary()
name | tid | loss | time | log_loss | status |
---|---|---|---|---|---|
p0 | 0 | -0.667916 | 1.532263 | 1.250336 | ok |
p1 | 1 | -0.635559 | 1.395001 | 1.120280 | ok |
p2 | 2 | -0.670229 | 2.745617 | 1.087269 | ok |
p3 | 3 | -0.788511 | 5.876360 | 1.049096 | ok |
p4 | 4 | -0.718938 | 3.725537 | 0.661428 | ok |
p5 | 5 | -0.482052 | 1.952195 | 1.241045 | ok |
p6 | 6 | -0.482052 | 1.209477 | 1.338511 | ok |
p7 | 7 | -0.669484 | 2.106700 | 0.844174 | ok |
p8 | 8 | -0.632346 | 1.612136 | 0.925707 | ok |
p9 | 9 | -0.622306 | 1.474229 | 1.882534 | ok |
worst_name = hpo_trained.summary().loss.idxmax()  # row label of the largest (worst) loss
print(worst_name)
p5
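This cell selects the trial with the largest loss. With the pandas version used here (0.25), `Series.argmax` still returned the row label. Since pandas 1.0, however, it returns an integer position, and `Series.idxmax` is the label-returning method. Note also that `p5` and `p6` tie at -0.482052, and the first occurrence wins. The sketch below makes the same selection in plain Python, using the loss column copied from the summary table.

```python
# Loss column of hpo_trained.summary(), copied from the table above.
losses = {
    'p0': -0.667916, 'p1': -0.635559, 'p2': -0.670229, 'p3': -0.788511,
    'p4': -0.718938, 'p5': -0.482052, 'p6': -0.482052, 'p7': -0.669484,
    'p8': -0.632346, 'p9': -0.622306,
}

# max/min iterate in insertion order and return the first key that attains
# the extreme value, matching idxmax/idxmin tie-breaking.
worst_name = max(losses, key=losses.get)
best_name = min(losses, key=losses.get)
print(worst_name, best_name)  # p5 p3
```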
hpo_trained.get_pipeline(worst_name).visualize()
hpo_trained.get_pipeline(worst_name).pretty_print(ipython_display=True, show_imports=False)
pca = PCA(n_components=48, svd_solver='full', whiten=True)
xg_boost = XGBoost(booster='gblinear', colsample_bylevel=0.41777546097517426, colsample_bytree=0.6852556915729863, learning_rate=0.4299362917360751, max_depth=15, min_child_weight=18, n_estimators=7, reg_alpha=0.5266202371276923, reg_lambda=0.494226267796831, subsample=0.8015579071911012)
pipeline = pca >> xg_boost
from sklearn.preprocessing import Normalizer as Norm
from sklearn.linear_model import LogisticRegression as LR
from sklearn.tree import DecisionTreeClassifier as Tree
from sklearn.neighbors import KNeighborsClassifier as KNN
from lale.lib.lale import NoOp
lale.wrap_imported_operators()
KNN = KNN.customize_schema(n_neighbors=schemas.Int(min=1, max=10))
transp_planned = (Norm | NoOp) >> (Tree | LR(dual=True) | KNN)
transp_planned.visualize()
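The `|` combinator marks an algorithmic choice for the optimizer to resolve. So `(Norm | NoOp) >> (Tree | LR | KNN)` describes 2 × 3 = 6 candidate pipeline shapes, before any hyperparameters are counted. The parentheses matter because, in Python, `>>` binds tighter than `&`, which binds tighter than `|`. The toy model below uses hypothetical classes `Op`, `Seq`, `Par`, and `Choice` that merely enumerate shapes as strings. It is not Lale's API.

```python
import itertools


class Op:
    def __init__(self, name):
        self.name = name

    def __rshift__(self, other):  # pipe
        return Seq(self, other)

    def __and__(self, other):  # parallel branches
        return Par(self, other)

    def __or__(self, other):  # choice
        return Choice(self, other)

    def shapes(self):
        return [self.name]


class Seq(Op):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def shapes(self):
        return [f'{a} >> {b}' for a, b in
                itertools.product(self.left.shapes(), self.right.shapes())]


class Par(Op):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def shapes(self):
        return [f'({a} & {b})' for a, b in
                itertools.product(self.left.shapes(), self.right.shapes())]


class Choice(Op):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def shapes(self):
        # A choice contributes the shapes of either branch.
        return self.left.shapes() + self.right.shapes()


Norm, NoOp, Tree, LR, KNN = (Op(n) for n in ['Norm', 'NoOp', 'Tree', 'LR', 'KNN'])
transp = (Norm | NoOp) >> (Tree | LR | KNN)
for shape in transp.shapes():
    print(shape)
print(len(transp.shapes()))  # 6

# The later nonlin_planned pipeline has only one choice (Norm | NoOp).
nonlin = ((Op('Project') >> Op('FeatSel'))
          & (Op('Project') >> (Norm | NoOp))) >> Op('Concat') >> KNN
print(len(nonlin.shapes()))  # 2
```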
%%time
transp_trained = transp_planned.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=3, max_evals=3)
100%|█████████| 3/3 [01:48<00:00, 32.59s/trial, best loss: -0.8376392446578157]
CPU times: user 1min 50s, sys: 1.12 s, total: 1min 51s
Wall time: 1min 49s
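One choice in this planned pipeline is between `NoOp` and `Norm`, which here is scikit-learn's `Normalizer`. With its default `norm='l2'`, `Normalizer` rescales each *row* (sample) to unit Euclidean length rather than scaling each column. The stdlib-only sketch below shows that row-wise transform. Like scikit-learn, it leaves all-zero rows unchanged; the function name `l2_normalize_rows` is hypothetical.

```python
import math


def l2_normalize_rows(X):
    """Scale each row to unit L2 norm; rows whose norm is 0 are returned as-is."""
    out = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row] if norm > 0 else list(row))
    return out


print(l2_normalize_rows([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]))
# [[0.6, 0.8], [0.0, 0.0], [1.0, 0.0]]
```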
transp_trained.pretty_print(ipython_display=True, show_imports=False)
transp_trained.visualize()
knn = KNN(algorithm='ball_tree', metric='manhattan', n_neighbors=9)
pipeline = NoOp() >> knn
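The selected classifier is k-nearest neighbors with k = 9 and the Manhattan (L1) metric. The `algorithm='ball_tree'` setting only changes how neighbors are searched for, not which ones are found. The brute-force sketch below applies the same prediction rule, a majority vote among the k closest training rows (scikit-learn's default `weights='uniform'`), to made-up toy data. The function names are hypothetical.

```python
from collections import Counter


def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x in L1 distance."""
    ranked = sorted(range(len(train_X)), key=lambda i: manhattan(train_X[i], x))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [1, 1, 1, 2, 2, 2]
print(knn_predict(train_X, train_y, [1, 1], k=3))  # 1
print(knn_predict(train_X, train_y, [4, 5], k=3))  # 2
```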
%%time
transp_y = transp_trained.predict(test_X)
print(f'accuracy {sklearn.metrics.accuracy_score(test_y, transp_y):.1%}')
accuracy 86.6%
CPU times: user 50.6 s, sys: 15.6 ms, total: 50.6 s
Wall time: 50.7 s
test_X.json_schema
{'description': 'Features of forest covertypes dataset (classification).', 'documentation_url': 'https://scikit-learn.org/0.20/datasets/index.html#forest-covertypes', 'type': 'array', 'items': {'type': 'array', 'minItems': 54, 'maxItems': 54, 'items': [{'description': 'Elevation', 'type': 'integer'}, {'description': 'Aspect', 'type': 'integer'}, {'description': 'Slope', 'type': 'integer'}, {'description': 'Horizontal_Distance_To_Hydrology', 'type': 'integer'}, {'description': 'Vertical_Distance_To_Hydrology', 'type': 'integer'}, {'description': 'Horizontal_Distance_To_Roadways', 'type': 'integer'}, {'description': 'Hillshade_9am', 'type': 'integer'}, {'description': 'Hillshade_Noon', 'type': 'integer'}, {'description': 'Hillshade_3pm', 'type': 'integer'}, {'description': 'Horizontal_Distance_To_Fire_Points', 'type': 'integer'}, {'description': 'Wilderness_Area1', 'enum': [0, 1]}, {'description': 'Wilderness_Area2', 'enum': [0, 1]}, {'description': 'Wilderness_Area3', 'enum': [0, 1]}, {'description': 'Wilderness_Area4', 'enum': [0, 1]}, {'description': 'Soil_Type1', 'enum': [0, 1]}, {'description': 'Soil_Type2', 'enum': [0, 1]}, {'description': 'Soil_Type3', 'enum': [0, 1]}, {'description': 'Soil_Type4', 'enum': [0, 1]}, {'description': 'Soil_Type5', 'enum': [0, 1]}, {'description': 'Soil_Type6', 'enum': [0, 1]}, {'description': 'Soil_Type7', 'enum': [0, 1]}, {'description': 'Soil_Type8', 'enum': [0, 1]}, {'description': 'Soil_Type9', 'enum': [0, 1]}, {'description': 'Soil_Type10', 'enum': [0, 1]}, {'description': 'Soil_Type11', 'enum': [0, 1]}, {'description': 'Soil_Type12', 'enum': [0, 1]}, {'description': 'Soil_Type13', 'enum': [0, 1]}, {'description': 'Soil_Type14', 'enum': [0, 1]}, {'description': 'Soil_Type15', 'enum': [0, 1]}, {'description': 'Soil_Type16', 'enum': [0, 1]}, {'description': 'Soil_Type17', 'enum': [0, 1]}, {'description': 'Soil_Type18', 'enum': [0, 1]}, {'description': 'Soil_Type19', 'enum': [0, 1]}, {'description': 'Soil_Type20', 'enum': [0, 1]}, {'description': 'Soil_Type21', 'enum': [0, 1]}, {'description': 'Soil_Type22', 'enum': [0, 1]}, {'description': 'Soil_Type23', 'enum': [0, 1]}, {'description': 'Soil_Type24', 'enum': [0, 1]}, {'description': 'Soil_Type25', 'enum': [0, 1]}, {'description': 'Soil_Type26', 'enum': [0, 1]}, {'description': 'Soil_Type27', 'enum': [0, 1]}, {'description': 'Soil_Type28', 'enum': [0, 1]}, {'description': 'Soil_Type29', 'enum': [0, 1]}, {'description': 'Soil_Type30', 'enum': [0, 1]}, {'description': 'Soil_Type31', 'enum': [0, 1]}, {'description': 'Soil_Type32', 'enum': [0, 1]}, {'description': 'Soil_Type33', 'enum': [0, 1]}, {'description': 'Soil_Type34', 'enum': [0, 1]}, {'description': 'Soil_Type35', 'enum': [0, 1]}, {'description': 'Soil_Type36', 'enum': [0, 1]}, {'description': 'Soil_Type37', 'enum': [0, 1]}, {'description': 'Soil_Type38', 'enum': [0, 1]}, {'description': 'Soil_Type39', 'enum': [0, 1]}, {'description': 'Soil_Type40', 'enum': [0, 1]}]}, 'minItems': 58102, 'maxItems': 58102}
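The dataset schema states that each row has exactly 54 items: ten integer-valued measurements followed by 44 binary indicator columns (`enum: [0, 1]`). The sketch below is a hypothetical stdlib-only checker for those two kinds of item schemas. Like JSON Schema draft 6 and later, it accepts floats with a zero fractional part (such as 3064.0) as integers. The test row takes its ten measurements from the first row of the training-data table. Its indicator block is simplified for illustration, with only `Wilderness_Area1` set.

```python
# Item schemas rebuilt from the json_schema output above.
numeric_columns = [
    'Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology',
    'Vertical_Distance_To_Hydrology', 'Horizontal_Distance_To_Roadways',
    'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm',
    'Horizontal_Distance_To_Fire_Points',
]
indicator_columns = ([f'Wilderness_Area{i}' for i in range(1, 5)]
                     + [f'Soil_Type{i}' for i in range(1, 41)])
item_schemas = ([{'description': c, 'type': 'integer'} for c in numeric_columns]
                + [{'description': c, 'enum': [0, 1]} for c in indicator_columns])


def check_row(row, schemas):
    """Hypothetical checker: return descriptions of items violating their schema."""
    if len(row) != len(schemas):
        return ['<wrong number of items>']
    violations = []
    for value, schema in zip(row, schemas):
        if 'enum' in schema:
            ok = value in schema['enum']  # 1.0 == 1 in Python, so floats pass
        else:
            # Accept ints and floats with no fractional part (e.g. 3064.0).
            ok = isinstance(value, (int, float)) and float(value).is_integer()
        if not ok:
            violations.append(schema['description'])
    return violations


row = ([3064.0, 86.0, 25.0, 702.0, 259.0, 721.0, 247.0, 189.0, 56.0, 1714.0]
       + [1.0, 0.0, 0.0, 0.0] + [0.0] * 40)
bad_row = list(row)
bad_row[14] = 2.0  # position 14 is Soil_Type1

print(len(item_schemas), check_row(row, item_schemas))  # 54 []
print(check_row(bad_row, item_schemas))  # ['Soil_Type1']
```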
area_columns = [f'Wilderness_Area{i}' for i in range(1, 5)]
soil_columns = [f'Soil_Type{i}' for i in range(1, 41)]
binary_columns = area_columns + soil_columns
other_columns = [c for c in train_X.columns if c not in binary_columns]
print(f'other columns: {", ".join(other_columns)}')
other columns: Elevation, Aspect, Slope, Horizontal_Distance_To_Hydrology, Vertical_Distance_To_Hydrology, Horizontal_Distance_To_Roadways, Hillshade_9am, Hillshade_Noon, Hillshade_3pm, Horizontal_Distance_To_Fire_Points
from lale.lib.lale import Project
from lale.lib.lale import ConcatFeatures as Concat
from sklearn.feature_selection import SelectKBest as FeatSel
lale.wrap_imported_operators()
binary_prep = Project(columns=binary_columns) >> FeatSel
other_prep = Project(columns=other_columns) >> (Norm | NoOp)
nonlin_planned = (binary_prep & other_prep) >> Concat >> KNN
nonlin_planned.visualize()
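In `nonlin_planned`, each `Project` keeps only the named columns, and the two branches joined by `&` see the same input. `Concat` then glues their outputs side by side. The sketch below traces that data flow on made-up rows stored as dicts. The functions `project` and `concat` are illustrative only, not Lale's API, and the `FeatSel` step on the binary branch is left out.

```python
def project(rows, columns):
    """Keep only the named columns (in the given order) of dict-shaped rows."""
    return [[row[c] for c in columns] for row in rows]


def concat(left, right):
    """Horizontally concatenate two equally long lists of feature rows."""
    assert len(left) == len(right)
    return [a + b for a, b in zip(left, right)]


# Made-up toy rows using a handful of covertype column names.
rows = [
    {'Elevation': 3064.0, 'Slope': 25.0, 'Wilderness_Area1': 1.0, 'Soil_Type2': 0.0},
    {'Elevation': 2317.0, 'Slope': 8.0, 'Wilderness_Area1': 0.0, 'Soil_Type2': 1.0},
]
binary_cols = ['Wilderness_Area1', 'Soil_Type2']
other_cols = ['Elevation', 'Slope']

# Both branches of `&` read the same input; Concat joins their outputs.
features = concat(project(rows, binary_cols), project(rows, other_cols))
print(features)  # [[1.0, 0.0, 3064.0, 25.0], [0.0, 1.0, 2317.0, 8.0]]
```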
%%time
nonlin_trained = nonlin_planned.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=3, max_evals=3)
100%|█████████| 3/3 [02:08<00:00, 34.88s/trial, best loss: -0.8584651324517477]
CPU times: user 2min 9s, sys: 62.5 ms, total: 2min 9s
Wall time: 2min 10s
nonlin_trained.visualize()
nonlin_trained.pretty_print(ipython_display=True, show_imports=False, combinators=False)
project_0 = Project(columns=['Wilderness_Area1', 'Wilderness_Area2', 'Wilderness_Area3', 'Wilderness_Area4', 'Soil_Type1', 'Soil_Type2', 'Soil_Type3', 'Soil_Type4', 'Soil_Type5', 'Soil_Type6', 'Soil_Type7', 'Soil_Type8', 'Soil_Type9', 'Soil_Type10', 'Soil_Type11', 'Soil_Type12', 'Soil_Type13', 'Soil_Type14', 'Soil_Type15', 'Soil_Type16', 'Soil_Type17', 'Soil_Type18', 'Soil_Type19', 'Soil_Type20', 'Soil_Type21', 'Soil_Type22', 'Soil_Type23', 'Soil_Type24', 'Soil_Type25', 'Soil_Type26', 'Soil_Type27', 'Soil_Type28', 'Soil_Type29', 'Soil_Type30', 'Soil_Type31', 'Soil_Type32', 'Soil_Type33', 'Soil_Type34', 'Soil_Type35', 'Soil_Type36', 'Soil_Type37', 'Soil_Type38', 'Soil_Type39', 'Soil_Type40'])
feat_sel = FeatSel(k=8)
pipeline_0 = make_pipeline(project_0, feat_sel)
project_1 = Project(columns=['Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology', 'Vertical_Distance_To_Hydrology', 'Horizontal_Distance_To_Roadways', 'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm', 'Horizontal_Distance_To_Fire_Points'])
pipeline_1 = make_pipeline(project_1, NoOp())
union = make_union(pipeline_0, pipeline_1)
knn = KNN(algorithm='kd_tree', n_neighbors=7, weights='distance')
pipeline = make_pipeline(union, knn)
%%time
nonlin_y = nonlin_trained.predict(test_X)
print(f'accuracy {sklearn.metrics.accuracy_score(test_y, nonlin_y):.1%}')
accuracy 88.6%
CPU times: user 4.12 s, sys: 93.8 ms, total: 4.22 s
Wall time: 4.19 s
binary_prep_trainable = Project(columns=binary_columns) >> FeatSel(k=8)
binary_prep_trained = binary_prep_trainable.fit(train_X, train_y)
binary_prep_trained.transform(test_X.head(10))
index | Wilderness_Area1 | Wilderness_Area4 | Soil_Type2 | Soil_Type3 | Soil_Type4 | Soil_Type10 | Soil_Type38 | Soil_Type39 |
---|---|---|---|---|---|---|---|---|
0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
6 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
7 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
8 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
9 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
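`FeatSel` is scikit-learn's `SelectKBest`, which by default scores each column with `f_classif`, the one-way ANOVA F-statistic between that column and the class label. It keeps the `k` highest-scoring columns (k = 8 above). The sketch below computes that scoring with the standard library on toy data. It is a simplification of scikit-learn: it skips p-values and, as its own assumption, scores constant columns as 0 rather than NaN. The function names are hypothetical.

```python
from collections import defaultdict


def anova_f(column, y):
    """One-way ANOVA F-statistic of a numeric column grouped by class label."""
    groups = defaultdict(list)
    for value, label in zip(column, y):
        groups[label].append(value)
    n, k = len(column), len(groups)
    grand_mean = sum(column) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    if ss_within == 0:
        # Perfect separation gives an infinite score; constant columns get 0 here.
        return float('inf') if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X, y, k):
    """Return indices of the k columns with the highest F-scores, best first."""
    scores = [anova_f(list(col), y) for col in zip(*X)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy binary indicators: column 0 tracks the label, column 1 is noisy,
# column 2 is constant.
X = [[1, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 1, 2, 2, 2]
print(select_k_best(X, y, k=2))  # [0, 1]
```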