Since the schema of the C hyperparameter of LR specifies an exclusive minimum of zero, passing zero is not valid. Lale internally calls an off-the-shelf JSON Schema validator when an operator gets configured with concrete hyperparameter values.
from sklearn.linear_model import LogisticRegression as LR
import lale
lale.wrap_imported_operators()
import jsonschema
import sys
try:
    LR(C=0.0)
except jsonschema.ValidationError as e:
    message = e.message
    print(message, file=sys.stderr)
    assert message.startswith('Invalid configuration for LR(C=0.0)')
Invalid configuration for LR(C=0.0) due to invalid value C=0.0.
Schema of argument C: {
    'description': 'Inverse regularization strength. Smaller values specify stronger regularization.',
    'type': 'number',
    'distribution': 'loguniform',
    'minimum': 0.0,
    'exclusiveMinimum': true,
    'default': 1.0,
    'minimumForOptimizer': 0.03125,
    'maximumForOptimizer': 32768}
Value: 0.0
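The schema above uses the draft-04 boolean form of exclusiveMinimum, which turns the minimum keyword into a strict bound. As an illustration of what the numeric part of the check amounts to, here is a tiny hand-rolled version (a sketch for this one keyword pair only, not Lale's or jsonschema's implementation):

```python
def check_exclusive_minimum(value, schema):
    """Draft-04 semantics: exclusiveMinimum is a boolean that makes
    the bound in 'minimum' strict."""
    minimum = schema.get('minimum')
    if minimum is None:
        return True
    if schema.get('exclusiveMinimum', False):
        return value > minimum
    return value >= minimum

schema = {'type': 'number', 'minimum': 0.0, 'exclusiveMinimum': True}
assert not check_exclusive_minimum(0.0, schema)  # rejected, as in the error above
assert check_exclusive_minimum(0.5, schema)      # any strictly positive value is fine
```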
Besides per-hyperparameter types, there are also conditional inter-hyperparameter constraints. These are checked using the same call to an off-the-shelf JSON Schema validator.
try:
    LR(LR.solver.sag, LR.penalty.l1)
except jsonschema.ValidationError as e:
    message = e.message
    print(message, file=sys.stderr)
    assert message.find('support only l2 penalties') != -1
Invalid configuration for LR(solver='sag', penalty='l1') due to constraint the newton-cg, sag, and lbfgs solvers support only l2 penalties.
Schema of constraint 1: {
    'description': 'The newton-cg, sag, and lbfgs solvers support only l2 penalties.',
    'anyOf': [
        { 'type': 'object',
          'properties': {
            'solver': { 'not': { 'enum': ['newton-cg', 'sag', 'lbfgs']}}}},
        { 'type': 'object',
          'properties': {
            'penalty': { 'enum': ['l2']}}}]}
Value: {'solver': 'sag', 'penalty': 'l1', 'dual': False, 'C': 1.0, 'tol': 0.0001, 'fit_intercept': True, 'intercept_scaling': 1.0, 'class_weight': None, 'random_state': None, 'max_iter': 100, 'multi_class': 'ovr', 'verbose': 0, 'warm_start': False, 'n_jobs': None}
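The anyOf in the constraint schema encodes a disjunction: either the solver is not one of newton-cg, sag, or lbfgs, or the penalty is l2. A minimal sketch of evaluating that disjunction by hand, covering only the tiny subset of JSON Schema keywords this constraint uses (hypothetical helper, not Lale's validator):

```python
def satisfies(values, branch):
    """Check a hyperparameter dict against one anyOf branch that only uses
    'properties' with 'enum' or 'not'/'enum' (a tiny subset of JSON Schema)."""
    for name, rule in branch.get('properties', {}).items():
        if name not in values:
            continue
        if 'enum' in rule and values[name] not in rule['enum']:
            return False
        if 'not' in rule and values[name] in rule['not'].get('enum', []):
            return False
    return True

constraint = {
    'anyOf': [
        {'type': 'object',
         'properties': {'solver': {'not': {'enum': ['newton-cg', 'sag', 'lbfgs']}}}},
        {'type': 'object',
         'properties': {'penalty': {'enum': ['l2']}}}]}

def valid(values):
    # anyOf: at least one branch must accept the configuration
    return any(satisfies(values, b) for b in constraint['anyOf'])

assert not valid({'solver': 'sag', 'penalty': 'l1'})    # the failing combination
assert valid({'solver': 'sag', 'penalty': 'l2'})        # l2 penalty is fine for sag
assert valid({'solver': 'liblinear', 'penalty': 'l1'})  # l1 is fine for liblinear
```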
There are even constraints that affect three different hyperparameters.
try:
    LR(LR.penalty.l2, LR.solver.sag, dual=True)
except jsonschema.ValidationError as e:
    message = e.message
    print(message, file=sys.stderr)
    assert message.find('dual formulation is only implemented for') != -1
Invalid configuration for LR(penalty='l2', solver='sag', dual=True) due to constraint the dual formulation is only implemented for l2 penalty with the liblinear solver.
Schema of constraint 2: {
    'description': 'The dual formulation is only implemented for l2 penalty with the liblinear solver.',
    'anyOf': [
        { 'type': 'object',
          'properties': { 'dual': { 'enum': [false]}}},
        { 'type': 'object',
          'properties': {
            'penalty': { 'enum': ['l2']},
            'solver': { 'enum': ['liblinear']}}}]}
Value: {'penalty': 'l2', 'solver': 'sag', 'dual': True, 'C': 1.0, 'tol': 0.0001, 'fit_intercept': True, 'intercept_scaling': 1.0, 'class_weight': None, 'random_state': None, 'max_iter': 100, 'multi_class': 'ovr', 'verbose': 0, 'warm_start': False, 'n_jobs': None}
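Read as propositional logic, this three-hyperparameter constraint says: either dual is false, or penalty is 'l2' and solver is 'liblinear'. A direct Boolean encoding, for illustration only:

```python
def valid(hps):
    """Direct encoding of the anyOf above:
    dual is False, OR (penalty == 'l2' AND solver == 'liblinear')."""
    return (hps['dual'] is False) or \
           (hps['penalty'] == 'l2' and hps['solver'] == 'liblinear')

assert not valid({'penalty': 'l2', 'solver': 'sag', 'dual': True})    # rejected above
assert valid({'penalty': 'l2', 'solver': 'liblinear', 'dual': True})  # dual OK here
assert valid({'penalty': 'l1', 'solver': 'saga', 'dual': False})      # dual off: anything goes
```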
Lale uses JSON Schema validation not only for hyperparameters but also for data. The dataset train_X is multimodal: some columns contain text strings whereas others contain numbers.
import pandas as pd
from lale.datasets.uci.uci_datasets import fetch_drugscom
train_X, train_y, test_X, test_y = fetch_drugscom()
pd.concat([train_X.head(), train_y.head()], axis=1)
| | drugName | condition | review | date | usefulCount | rating |
|---|---|---|---|---|---|---|
| 0 | Valsartan | Left Ventricular Dysfunction | "It has no side effect, I take it in combinati... | May 20, 2012 | 27 | 9.0 |
| 1 | Guanfacine | ADHD | "My son is halfway through his fourth week of ... | April 27, 2010 | 192 | 8.0 |
| 2 | Lybrel | Birth Control | "I used to take another oral contraceptive, wh... | December 14, 2009 | 17 | 5.0 |
| 3 | Ortho Evra | Birth Control | "This is my first time using any form of birth... | November 3, 2015 | 10 | 8.0 |
| 4 | Buprenorphine / naloxone | Opiate Dependence | "Suboxone has completely turned my life around... | November 27, 2016 | 37 | 9.0 |
from lale.pretty_print import ipython_display
ipython_display(lale.datasets.data_schemas.to_schema(train_X))
{
'type': 'array',
'items': {
'type': 'array',
'minItems': 5,
'maxItems': 5,
'items': [
{ 'description': 'drugName',
'type': 'string'},
{ 'description': 'condition',
'anyOf': [
{ 'type': 'string'},
{ 'enum': [NaN]}]},
{ 'description': 'review',
'type': 'string'},
{ 'description': 'date',
'type': 'string'},
{ 'description': 'usefulCount',
'type': 'integer',
'minimum': 0}]},
'minItems': 161297,
'maxItems': 161297}
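The schema printed above is inferred from the data itself: each column maps to a per-item schema, with types such as string or non-negative integer. A very simplified sketch of per-column schema inference (a hypothetical helper working on plain Python values, not Lale's actual to_schema logic, which inspects NumPy dtypes and NaNs):

```python
def column_schema(name, values):
    """Infer a JSON Schema fragment for one column from its values
    (simplified sketch; booleans are checked first since bool is a
    subclass of int in Python)."""
    if all(isinstance(v, bool) for v in values):
        return {'description': name, 'type': 'boolean'}
    if all(isinstance(v, int) for v in values):
        item = {'description': name, 'type': 'integer'}
        if min(values) >= 0:
            item['minimum'] = 0  # all observed values are non-negative
        return item
    if all(isinstance(v, (int, float)) for v in values):
        return {'description': name, 'type': 'number'}
    return {'description': name, 'type': 'string'}

assert column_schema('usefulCount', [27, 192, 17]) == \
    {'description': 'usefulCount', 'type': 'integer', 'minimum': 0}
assert column_schema('drugName', ['Valsartan', 'Guanfacine'])['type'] == 'string'
```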
Since train_X contains strings but LR expects only numbers, validating this data against the schema of fit reports a type error.
trainable_lr = LR()
try:
    LR.validate_schema(train_X, train_y)
except ValueError as e:
    message = str(e)
    print(message, file=sys.stderr)
    assert message.startswith('LR.fit() invalid X')
LR.fit() invalid X: Expected sub to be a subschema of super.
sub = {
    'type': 'array',
    'items': {
        'type': 'array',
        'minItems': 5,
        'maxItems': 5,
        'items': [
            { 'description': 'drugName', 'type': 'string'},
            { 'description': 'condition',
              'anyOf': [{ 'type': 'string'}, { 'enum': [NaN]}]},
            { 'description': 'review', 'type': 'string'},
            { 'description': 'date', 'type': 'string'},
            { 'description': 'usefulCount', 'type': 'integer', 'minimum': 0}]},
    'minItems': 161297,
    'maxItems': 161297}
super = {
    'description': 'Features; the outer array is over samples.',
    'type': 'array',
    'items': { 'type': 'array', 'items': { 'type': 'number'}}}
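The error comes from a subschema check: the schema inferred from the actual data must be a subschema of what fit expects, meaning every value the data schema allows is also allowed by the expected schema. A toy fragment of that idea, restricted to the type keyword only (the real check handles the full schema language):

```python
def type_subsumed(sub, sup):
    """Toy subschema check on 'type' alone: every value allowed by `sub`
    must also be allowed by `sup`."""
    if sub == sup:
        return True
    # every integer is a number, so 'integer' data fits a 'number' schema
    return sub == 'integer' and sup == 'number'

assert type_subsumed('integer', 'number')     # usefulCount would be acceptable
assert not type_subsumed('string', 'number')  # the failure reported above
```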
Load a pure numerical dataset instead.
from lale.datasets import load_iris_df
(train_X, train_y), (test_X, test_y) = load_iris_df()
train_X.head()
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.0 | 3.4 | 1.6 | 0.4 |
| 1 | 6.3 | 3.3 | 4.7 | 1.6 |
| 2 | 5.1 | 3.4 | 1.5 | 0.2 |
| 3 | 4.8 | 3.0 | 1.4 | 0.1 |
| 4 | 6.7 | 3.1 | 4.7 | 1.5 |
Training LR with the Iris dataset works fine.
trained_lr = trainable_lr.fit(train_X, train_y)
Lale encourages separating the lifecycle states, here represented by trainable_lr vs. trained_lr. The predict method should only be called on a trained model.
predicted = trained_lr.predict(test_X)
print(f'test_y {[*test_y]}')
print(f'predicted {[*predicted]}')
test_y [2, 1, 1, 0, 2, 0, 1, 1, 0, 0, 1, 0, 1, 1, 2, 0, 2, 1, 1, 0, 0, 2, 2, 0, 2, 1, 0, 2, 1, 0]
predicted [2, 1, 1, 0, 2, 0, 1, 1, 0, 0, 1, 0, 1, 1, 2, 0, 2, 1, 1, 0, 0, 2, 2, 0, 2, 1, 0, 2, 1, 0]
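The two printed lists agree at every position. A small helper makes that check explicit (sklearn.metrics.accuracy_score computes the same quantity):

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

test_y    = [2, 1, 1, 0, 2, 0, 1, 1, 0, 0, 1, 0, 1, 1, 2,
             0, 2, 1, 1, 0, 0, 2, 2, 0, 2, 1, 0, 2, 1, 0]
predicted = [2, 1, 1, 0, 2, 0, 1, 1, 0, 0, 1, 0, 1, 1, 2,
             0, 2, 1, 1, 0, 0, 2, 2, 0, 2, 1, 0, 2, 1, 0]
assert accuracy(test_y, predicted) == 1.0  # the lists above agree everywhere
```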
On the other hand, the predict method should not be called on a trainable model.
import warnings
warnings.filterwarnings("error", category=DeprecationWarning)
try:
    predicted = trainable_lr.predict(test_X)
except DeprecationWarning as w:
    message = str(w)
    print(message, file=sys.stderr)
    assert message.startswith('The `predict` method is deprecated on a trainable')
print(f'test_y {[*test_y]}')
print(f'predicted {[*predicted]}')
test_y [2, 1, 1, 0, 2, 0, 1, 1, 0, 0, 1, 0, 1, 1, 2, 0, 2, 1, 1, 0, 0, 2, 2, 0, 2, 1, 0, 2, 1, 0]
predicted [2, 1, 1, 0, 2, 0, 1, 1, 0, 0, 1, 0, 1, 1, 2, 0, 2, 1, 1, 0, 0, 2, 2, 0, 2, 1, 0, 2, 1, 0]
The `predict` method is deprecated on a trainable operator, because the learned coefficients could be accidentally overwritten by retraining. Call `predict` on the trained operator returned by `fit` instead.
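The lifecycle discipline behind this warning can be sketched with two hypothetical classes (an illustration, not Lale's implementation): fit on a trainable object returns a fresh trained object rather than mutating its receiver, and predict on the trainable object merely warns.

```python
import warnings

class TrainedMean:
    """Hypothetical trained operator: holds immutable learned state."""
    def __init__(self, mean):
        self.mean = mean
    def predict(self, xs):
        return [self.mean for _ in xs]

class TrainableMean:
    """Hypothetical trainable operator: fit returns a new trained object
    instead of mutating self, mirroring Lale's lifecycle separation."""
    def fit(self, xs):
        return TrainedMean(sum(xs) / len(xs))
    def predict(self, xs):
        warnings.warn('predict called on a trainable operator; use the '
                      'trained operator returned by fit instead',
                      DeprecationWarning)
        return self.fit(xs).predict(xs)

trained = TrainableMean().fit([1.0, 2.0, 3.0])
assert trained.predict([10, 20]) == [2.0, 2.0]  # trained state cannot be overwritten
```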
LogisticRegression is an estimator and therefore does not have a transform method, even when trained.
try:
    trained_lr.transform(train_X)
except AttributeError as e:
    message = 'AttributeError'
    print(message, file=sys.stderr)
    assert message.startswith('AttributeError')
AttributeError