Since the schema of the C hyperparameter of LR specifies an exclusive minimum of zero, passing zero is not valid. Lale internally calls an off-the-shelf JSON Schema validator when an operator gets configured with concrete hyperparameter values.
from sklearn.linear_model import LogisticRegression as LR
import lale
lale.wrap_imported_operators()
from lale.settings import set_disable_data_schema_validation, set_disable_hyperparams_schema_validation
# Enable schema validation explicitly for this notebook
set_disable_data_schema_validation(False)
set_disable_hyperparams_schema_validation(False)
import jsonschema
import sys
try:
    LR(C=0.0)
except jsonschema.ValidationError as e:
    message = e.message
    print(message, file=sys.stderr)
    assert message.startswith('Invalid configuration for LR(C=0.0)')
Invalid configuration for LR(C=0.0) due to invalid value C=0.0.
Some possible fixes include:
- set C=1.0
Schema of argument C: {
    "description": "Inverse regularization strength. Smaller values specify stronger regularization.",
    "type": "number",
    "distribution": "loguniform",
    "minimum": 0.0,
    "exclusiveMinimum": true,
    "default": 1.0,
    "minimumForOptimizer": 0.03125,
    "maximumForOptimizer": 32768,
}
Invalid value: 0.0
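Under the hood this is ordinary JSON Schema checking. As a minimal sketch, the same per-hyperparameter check can be reproduced with the jsonschema package directly; the schema below is hand-copied from the error message above, and note that draft-04 treats exclusiveMinimum as a boolean flag that strengthens minimum:

```python
import jsonschema

# Draft-04 schema for the C hyperparameter, as shown in the error message.
# In draft-04, "exclusiveMinimum": true turns "minimum" into a strict bound.
schema = {"type": "number", "minimum": 0.0, "exclusiveMinimum": True}

validator = jsonschema.Draft4Validator(schema)
print(validator.is_valid(0.0))  # False: violates the exclusive minimum
print(validator.is_valid(1.0))  # True: strictly positive
```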
Besides per-hyperparameter types, there are also conditional inter-hyperparameter constraints. These are checked using the same call to an off-the-shelf JSON Schema validator.
try:
    LR(LR.enum.solver.sag, LR.enum.penalty.l1)
except jsonschema.ValidationError as e:
    message = e.message
    print(message, file=sys.stderr)
    assert message.find('support only l2 or no penalties') != -1
Invalid configuration for LR(solver='sag', penalty='l1') due to constraint the newton-cg, sag, and lbfgs solvers support only l2 or no penalties.
Some possible fixes include:
- set penalty='l2'
Schema of failing constraint: https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.logistic_regression.html#constraint-1
Invalid value: {'solver': 'sag', 'penalty': 'l1', 'dual': False, 'C': 1.0, 'tol': 0.0001, 'fit_intercept': True, 'intercept_scaling': 1.0, 'class_weight': None, 'random_state': None, 'max_iter': 100, 'multi_class': 'auto', 'verbose': 0, 'warm_start': False, 'n_jobs': None, 'l1_ratio': None}
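Such a conditional constraint can itself be phrased in JSON Schema. The following is a hand-written sketch (not Lale's actual schema) of the solver/penalty rule as a logical implication, encoded with anyOf: either the solver is not one of the restricted ones, or the penalty must be l2 or none:

```python
import jsonschema

# Sketch of "newton-cg, sag, and lbfgs support only l2 or no penalties"
# as an implication: NOT(restricted solver) OR (penalty in {l2, none}).
constraint = {
    "anyOf": [
        {"not": {"properties": {"solver": {"enum": ["newton-cg", "sag", "lbfgs"]}}}},
        {"properties": {"penalty": {"enum": ["l2", "none"]}}},
    ]
}

validator = jsonschema.Draft4Validator(constraint)
print(validator.is_valid({"solver": "sag", "penalty": "l2"}))        # True
print(validator.is_valid({"solver": "sag", "penalty": "l1"}))        # False
print(validator.is_valid({"solver": "liblinear", "penalty": "l1"}))  # True
```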
There are even constraints that affect three different hyperparameters.
try:
    LR(LR.enum.penalty.l2, LR.enum.solver.sag, dual=True)
except jsonschema.ValidationError as e:
    message = str(e)
    print(message, file=sys.stderr)
    assert message.find('dual formulation is only implemented for') != -1
Invalid configuration for LR(penalty='l2', solver='sag', dual=True) due to constraint the dual formulation is only implemented for l2 penalty with the liblinear solver.
Some possible fixes include:
- set dual=False
Schema of failing constraint: https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.logistic_regression.html#constraint-2
Invalid value: {'penalty': 'l2', 'solver': 'sag', 'dual': True, 'C': 1.0, 'tol': 0.0001, 'fit_intercept': True, 'intercept_scaling': 1.0, 'class_weight': None, 'random_state': None, 'max_iter': 100, 'multi_class': 'auto', 'verbose': 0, 'warm_start': False, 'n_jobs': None, 'l1_ratio': None}
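A three-way constraint follows the same pattern. Here is a hand-written sketch (again, an illustration rather than Lale's actual schema) of the dual-formulation rule: dual=True is only allowed together with penalty='l2' and solver='liblinear':

```python
import jsonschema

# Sketch of "dual is only implemented for l2 penalty with liblinear":
# NOT(dual=True) OR (penalty='l2' AND solver='liblinear').
constraint = {
    "anyOf": [
        {"not": {"properties": {"dual": {"enum": [True]}}}},
        {"properties": {"penalty": {"enum": ["l2"]},
                        "solver": {"enum": ["liblinear"]}}},
    ]
}

validator = jsonschema.Draft4Validator(constraint)
print(validator.is_valid({"dual": True, "penalty": "l2", "solver": "liblinear"}))  # True
print(validator.is_valid({"dual": True, "penalty": "l2", "solver": "sag"}))        # False
print(validator.is_valid({"dual": False, "penalty": "l2", "solver": "sag"}))       # True
```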
Lale uses JSON Schema validation not only for hyperparameters but also for data. The dataset train_X is multimodal: some columns contain text strings whereas others contain numbers.
import pandas as pd
from lale.datasets.uci.uci_datasets import fetch_drugscom
train_X, train_y, test_X, test_y = fetch_drugscom()
pd.concat([train_X.head(), train_y.head()], axis=1)
|   | drugName | condition | review | date | usefulCount | rating |
|---|---|---|---|---|---|---|
| 0 | Valsartan | Left Ventricular Dysfunction | "It has no side effect, I take it in combinati... | May 20, 2012 | 27 | 9.0 |
| 1 | Guanfacine | ADHD | "My son is halfway through his fourth week of ... | April 27, 2010 | 192 | 8.0 |
| 2 | Lybrel | Birth Control | "I used to take another oral contraceptive, wh... | December 14, 2009 | 17 | 5.0 |
| 3 | Ortho Evra | Birth Control | "This is my first time using any form of birth... | November 3, 2015 | 10 | 8.0 |
| 4 | Buprenorphine / naloxone | Opiate Dependence | "Suboxone has completely turned my life around... | November 27, 2016 | 37 | 9.0 |
# Enable schema validation for data
from lale.settings import set_disable_data_schema_validation
set_disable_data_schema_validation(False)
from lale.pretty_print import ipython_display
ipython_display(lale.datasets.data_schemas.to_schema(train_X))
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "array",
"items": {
"type": "array",
"minItems": 5,
"maxItems": 5,
"items": [
{"description": "drugName", "type": "string"},
{
"description": "condition",
"anyOf": [{"type": "string"}, {"enum": [NaN]}],
},
{"description": "review", "type": "string"},
{"description": "date", "type": "string"},
{"description": "usefulCount", "type": "integer", "minimum": 0},
],
},
"minItems": 161297,
"maxItems": 161297,
}
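The idea behind deriving such a schema from a DataFrame can be sketched as mapping each pandas column dtype to a JSON Schema type. The function below is a simplified illustration, not Lale's actual to_schema implementation (which, among other things, also emits the anyOf with NaN seen above for string columns containing missing values):

```python
import pandas as pd

# Simplified sketch of deriving a JSON Schema from a DataFrame:
# each column dtype becomes a JSON Schema type in a tuple-style "items" list.
def sketch_to_schema(df: pd.DataFrame) -> dict:
    def col_schema(col):
        if pd.api.types.is_integer_dtype(col):
            return {"description": col.name, "type": "integer"}
        if pd.api.types.is_float_dtype(col):
            return {"description": col.name, "type": "number"}
        return {"description": col.name, "type": "string"}
    n_cols = len(df.columns)
    return {
        "type": "array",                       # outer array: rows
        "minItems": len(df), "maxItems": len(df),
        "items": {
            "type": "array",                   # inner array: one row
            "minItems": n_cols, "maxItems": n_cols,
            "items": [col_schema(df[c]) for c in df.columns],
        },
    }

df = pd.DataFrame({"name": ["a", "b"], "count": [1, 2]})
print(sketch_to_schema(df))
```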
Since train_X contains strings but LR expects only numbers, the call to fit reports a type error.
trainable_lr = LR(max_iter=1000)
try:
    LR.validate_schema(train_X, train_y)
except ValueError as e:
    message = str(e)
    print(message, file=sys.stderr)
    assert message.startswith('LR.fit() invalid X')
LR.fit() invalid X, the schema of the actual data is not a subschema of the expected schema of the argument.
actual_schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "array",
    "items": {
        "type": "array",
        "minItems": 5,
        "maxItems": 5,
        "items": [
            {"description": "drugName", "type": "string"},
            {
                "description": "condition",
                "anyOf": [{"type": "string"}, {"enum": [NaN]}],
            },
            {"description": "review", "type": "string"},
            {"description": "date", "type": "string"},
            {"description": "usefulCount", "type": "integer", "minimum": 0},
        ],
    },
    "minItems": 161297,
    "maxItems": 161297,
}
expected_schema = {
    "description": "Features; the outer array is over samples.",
    "type": "array",
    "items": {"type": "array", "items": {"type": "number"}},
}
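Note that the error is phrased as a subschema (subtype) check rather than a validation of individual values: the schema of the actual data must describe a subset of what the expected schema allows. A minimal sketch of this idea, restricted to primitive types only (the real check is far more general):

```python
# Minimal sketch of the subschema idea for primitive types only: data
# described as "string" cannot flow into an argument expecting "number".
def is_primitive_subschema(sub: dict, sup: dict) -> bool:
    if sup.get("type") == "number":
        # every integer is a number, so "integer" is a subtype of "number"
        return sub.get("type") in ("number", "integer")
    return sub.get("type") == sup.get("type")

print(is_primitive_subschema({"type": "integer"}, {"type": "number"}))  # True
print(is_primitive_subschema({"type": "string"}, {"type": "number"}))   # False
```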
Load a pure numerical dataset instead.
from lale.datasets import load_iris_df
(train_X, train_y), (test_X, test_y) = load_iris_df()
train_X.head()
|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.0 | 3.4 | 1.6 | 0.4 |
| 1 | 6.3 | 3.3 | 4.7 | 1.6 |
| 2 | 5.1 | 3.4 | 1.5 | 0.2 |
| 3 | 4.8 | 3.0 | 1.4 | 0.1 |
| 4 | 6.7 | 3.1 | 4.7 | 1.5 |
Training LR with the Iris dataset works fine.
trained_lr = trainable_lr.fit(train_X, train_y)
Lale encourages separating the lifecycle states, here represented by trainable_lr vs. trained_lr. The predict method should only be called on a trained model.
predicted = trained_lr.predict(test_X)
print(f'test_y {[*test_y]}')
print(f'predicted {[*predicted]}')
test_y [2, 1, 1, 0, 2, 0, 1, 1, 0, 0, 1, 0, 1, 1, 2, 0, 2, 1, 1, 0, 0, 2, 2, 0, 2, 1, 0, 2, 1, 0]
predicted [2, 1, 1, 0, 2, 0, 1, 1, 0, 0, 1, 0, 1, 1, 2, 0, 2, 1, 1, 0, 0, 2, 2, 0, 2, 1, 0, 2, 1, 0]
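The trainable-vs-trained separation can be sketched in plain Python; the classes below are an illustration of the pattern, not Lale's actual implementation. The key point is that fit returns a new object of a different type instead of mutating the trainable in place:

```python
# Sketch of the lifecycle-state pattern: fit on a trainable returns a
# separate trained object, so learned state cannot be silently overwritten.
class TrainableModel:
    def __init__(self, **hyperparams):
        self.hyperparams = hyperparams
    def fit(self, X, y):
        coef = sum(y) / len(y)          # stand-in for real training
        return TrainedModel(self.hyperparams, coef)

class TrainedModel:
    def __init__(self, hyperparams, coef):
        self.hyperparams = hyperparams
        self.coef = coef
    def predict(self, X):               # only the trained object can predict
        return [self.coef for _ in X]

trainable = TrainableModel(C=1.0)
trained = trainable.fit([[0], [1]], [0, 1])
print(trained.predict([[2], [3]]))      # [0.5, 0.5]
```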
On the other hand, the predict method should not be called on a trainable model.
import warnings
warnings.filterwarnings("error", category=DeprecationWarning)
try:
    predicted = trainable_lr.predict(test_X)
except DeprecationWarning as w:
    message = str(w)
    print(message, file=sys.stderr)
    assert message.startswith('The `predict` method is deprecated on a trainable')
print(f'test_y {[*test_y]}')
print(f'predicted {[*predicted]}')
test_y [2, 1, 1, 0, 2, 0, 1, 1, 0, 0, 1, 0, 1, 1, 2, 0, 2, 1, 1, 0, 0, 2, 2, 0, 2, 1, 0, 2, 1, 0]
predicted [2, 1, 1, 0, 2, 0, 1, 1, 0, 0, 1, 0, 1, 1, 2, 0, 2, 1, 1, 0, 0, 2, 2, 0, 2, 1, 0, 2, 1, 0]
The `predict` method is deprecated on a trainable operator, because the learned coefficients could be accidentally overwritten by retraining. Call `predict` on the trained operator returned by `fit` instead.
LogisticRegression is an estimator and therefore does not have a transform method, even when trained.
try:
    trained_lr.transform(train_X)
except AttributeError as e:
    message = str(e)
    print(message, file=sys.stderr)
    assert 'transform' in message