This notebook shows how to use Lale directly with sklearn operators. The function lale.wrap_imported_operators() will automatically wrap known sklearn operators into Lale operators.
import sklearn.datasets
import sklearn.model_selection
digits = sklearn.datasets.load_digits()
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)
print(f'truth {y_test.tolist()[:20]}')
truth [6, 9, 3, 7, 2, 1, 5, 2, 5, 2, 1, 9, 4, 0, 4, 2, 3, 7, 8, 8]
import lale
from sklearn.linear_model import LogisticRegression as LR
lale.wrap_imported_operators()
trainable_lr = LR(LR.enum.solver.lbfgs, C=0.0001)
trained_lr = trainable_lr.fit(X_train, y_train)
predictions = trained_lr.predict(X_test)
print(f'actual {predictions.tolist()[:20]}')
actual [6, 9, 3, 7, 2, 2, 5, 2, 5, 2, 1, 4, 4, 0, 4, 2, 3, 7, 8, 8]
from sklearn.metrics import accuracy_score
print(f'accuracy {accuracy_score(y_test, predictions):.1%}')
accuracy 92.2%
Lale uses JSON Schema to check for valid hyperparameters. These schemas enable not just validation but also interactive documentation. Because both come from a single source of truth, the documentation is correct by construction.
from jsonschema import ValidationError
try:
    lale_lr = LR(solver='adam', C=0.01)
except ValidationError as e:
    print(e.message)
Invalid configuration for LR(solver='adam', C=0.01) due to invalid value solver=adam.
Schema of argument solver: {
    "default": "lbfgs",
    "description": "Algorithm for optimization problem.",
    "enum": ["newton-cg", "lbfgs", "liblinear", "sag", "saga"],
}
Value: adam
LR.hyperparam_schema('C')
{'description': 'Inverse regularization strength. Smaller values specify stronger regularization.', 'type': 'number', 'distribution': 'loguniform', 'minimum': 0.0, 'exclusiveMinimum': True, 'default': 1.0, 'minimumForOptimizer': 0.03125, 'maximumForOptimizer': 32768}
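The same mechanism covers numeric ranges. Since the schema above constrains C to be strictly greater than 0, a non-positive value should be rejected at construction time; the following is an illustrative sketch (the exact error text may differ):

try:
    lale_lr = LR(C=-1)  # violates the exclusiveMinimum of 0.0 in the schema for C
except ValidationError as e:
    print(e.message)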
LR.get_defaults()
mappingproxy({'solver': 'lbfgs', 'penalty': 'l2', 'dual': False, 'C': 1.0, 'tol': 0.0001, 'fit_intercept': True, 'intercept_scaling': 1.0, 'class_weight': None, 'random_state': None, 'max_iter': 100, 'multi_class': 'auto', 'verbose': 0, 'warm_start': False, 'n_jobs': None, 'l1_ratio': None})
Lale includes a compiler that converts types (expressed as JSON Schema) to optimizer search spaces. It currently has back-ends for hyperopt, GridSearchCV, and SMAC. We are also actively working toward other forms of AI automation using additional tools.
from lale.search.op2hp import hyperopt_search_space
from hyperopt import STATUS_OK, Trials, fmin, tpe, space_eval
import lale.helpers
import warnings
warnings.filterwarnings("ignore")
def objective(hyperparams):
    trainable = LR(**lale.helpers.dict_without(hyperparams, 'name'))
    trained = trainable.fit(X_train, y_train)
    predictions = trained.predict(X_test)
    return {'loss': -accuracy_score(y_test, predictions), 'status': STATUS_OK}
search_space = hyperopt_search_space(LR)
trials = Trials()
fmin(objective, search_space, algo=tpe.suggest, max_evals=10, trials=trials)
best_hps = space_eval(search_space, trials.argmin)
print(f'best hyperparams {lale.helpers.dict_without(best_hps, "name")}\n')
print(f'accuracy {-min(trials.losses()):.1%}')
100%|██████████| 10/10 [00:03<00:00, 2.93trial/s, best loss: -0.975]
best hyperparams {'dual': False, 'fit_intercept': False, 'intercept_scaling': 0.03784617564805115, 'max_iter': 99, 'multi_class': 'auto', 'solver': 'saga', 'tol': 0.005801390831569728}

accuracy 97.5%
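To turn the search result into a usable model, we can rebuild a trainable operator from the best hyperparameters and fit it, mirroring what the objective function does per trial:

best_trainable = LR(**lale.helpers.dict_without(best_hps, 'name'))
best_trained = best_trainable.fit(X_train, y_train)
print(f'accuracy {accuracy_score(y_test, best_trained.predict(X_test)):.1%}')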
Lale supports composite models, which resemble sklearn pipelines but are more expressive.
Symbol | Name | Description | Sklearn feature
---|---|---|---
`>>` | pipe | Feed to next | make_pipeline
`&` | and | Run both | make_union, includes concat
`\|` | or | Choose one | (missing)
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from lale.lib.lale import ConcatFeatures as Cat
from lale.lib.lale import NoOp
lale.wrap_imported_operators()
optimizable = (PCA & NoOp) >> Cat >> (LR | SVC)
optimizable.visualize()
from lale.operators import make_pipeline, make_union, make_choice
optimizable = make_pipeline(make_union(PCA, NoOp), make_choice(LR, SVC))
optimizable.visualize()
import lale.lib.lale.hyperopt
Optimizer = lale.lib.lale.hyperopt.Hyperopt
trained = optimizable.auto_configure(X_train, y_train, optimizer=Optimizer, max_evals=10)
100%|██████████| 10/10 [00:41<00:00, 4.18s/trial, best loss: -0.982597754548974]
1 out of 10 trials failed, call summary() for details. Run with verbose=True to see per-trial exceptions.
predictions = trained.predict(X_test)
print(f'accuracy {accuracy_score(y_test, predictions):.1%}')
trained.visualize()
accuracy 98.9%
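To see which of the two classifiers and which hyperparameters auto_configure picked, the trained pipeline can also be printed as Python code; a minimal sketch, assuming the operator's pretty_print() method:

print(trained.pretty_print())  # prints Python code equivalent to the trained pipeline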
Besides schemas for hyperparameters, Lale also provides operator tags and schemas for the input and output data of operators.
LR.get_tags()
{'pre': ['~categoricals'], 'op': ['estimator', 'classifier', 'interpretable'], 'post': ['probabilities']}
LR.get_schema('input_fit')
{'type': 'object', 'required': ['X', 'y'], 'additionalProperties': False, 'properties': {'X': {'description': 'Features; the outer array is over samples.', 'type': 'array', 'items': {'type': 'array', 'items': {'type': 'number'}}}, 'y': {'description': 'Target class labels; the array is over samples.', 'anyOf': [{'type': 'array', 'items': {'type': 'number'}}, {'type': 'array', 'items': {'type': 'string'}}, {'type': 'array', 'items': {'type': 'boolean'}}]}}}
LR.get_schema('output_predict')
{'description': 'Predicted class label per sample.', 'anyOf': [{'type': 'array', 'items': {'type': 'number'}}, {'type': 'array', 'items': {'type': 'string'}}, {'type': 'array', 'items': {'type': 'boolean'}}]}
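Since these are plain JSON Schema documents, they can also be checked outside of Lale with the jsonschema package; a small illustrative sketch validating a toy dataset against the fit schema:

import jsonschema
toy_data = {'X': [[0.0, 1.0], [2.0, 3.0]], 'y': [0, 1]}
jsonschema.validate(instance=toy_data, schema=LR.get_schema('input_fit'))  # raises ValidationError on non-conforming data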