#!/usr/bin/env python
# coding: utf-8

# # Lale: Auto-ML and Types for Scikit-learn
#
# This notebook is an introductory guide to
# [Lale](https://github.com/ibm/lale) for scikit-learn users.
# [Scikit-learn](https://scikit-learn.org) is a popular, easy-to-use,
# and comprehensive data science library for Python. This notebook aims
# to show how Lale can make scikit-learn even better in two areas:
# auto-ML and type checking. First, if you do not want to manually
# select all algorithms or tune all hyperparameters, you can leave it to
# Lale to do that for you automatically. Second, when you pass
# hyperparameters or datasets to scikit-learn, Lale checks that these
# are type-correct. For both auto-ML and type-checking, Lale uses a
# single source of truth: machine-readable schemas associated with
# scikit-learn compatible transformers and estimators. Rather than
# invent a new schema specification language, Lale uses [JSON
# Schema](https://json-schema.org/understanding-json-schema/), because
# it is popular, widely supported, and makes it easy to store or send
# hyperparameters as JSON objects. Furthermore, by using the same
# schemas both for auto-ML and for type-checking, Lale ensures that
# auto-ML is consistent with type checking while also reducing the
# maintenance burden to a single set of schemas.
#
# Lale is an open-source Python library and you can install it with
# `pip install lale`. See
# [installation](https://github.com/IBM/lale/blob/master/docs/installation.rst)
# for further instructions. Lale uses the term *operator* to refer to
# what scikit-learn calls a machine-learning transformer or estimator.
# Lale provides schemas for 180
# [operators](https://github.com/IBM/lale/tree/master/lale/lib). Most of
# these operators come from scikit-learn itself, but there are also
# operators from other frameworks such as XGBoost or PyTorch.
# If Lale does not yet support your favorite operator, you can add it
# yourself by following this
# [guide](https://nbviewer.jupyter.org/github/IBM/lale/blob/master/examples/docs_new_operators.ipynb).
# If you do add a new operator, please consider contributing it back to
# Lale!
#
# The rest of this notebook first demonstrates auto-ML, then reveals
# some of the schemas that make that possible, and finally demonstrates
# how to also use the very same schemas for type checking.

# ## 1. Auto-ML with Lale
#
# Lale serves as an interface for two auto-ML tasks: hyperparameter
# tuning and algorithm selection. Rather than provide new
# implementations for these tasks, Lale reuses existing implementations.
# The next few cells demonstrate how to use Hyperopt and GridSearchCV
# from Lale. Lale also supports additional optimizers, not shown in this
# notebook. In all cases, the syntax for specifying the search space is
# the same.
#
# ### 1.1 Hyperparameter Tuning with Lale and Hyperopt
#
# Let's start by looking at hyperparameter tuning, which is an important
# subtask of auto-ML. To demonstrate it, we first need a dataset.
# Therefore, we load the California Housing dataset and display the
# first few rows to get a feeling for the data. Lale can process both
# Pandas dataframes and Numpy ndarrays; here we use dataframes.

# In[1]:

import pandas as pd
import lale.datasets

(train_X, train_y), (test_X, test_y) = lale.datasets.california_housing_df()
pd.concat([train_X.head(), train_y.head()], axis=1)

# As you can see, the target column is a continuous number, indicating
# that this is a regression task.
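#
# Before composing a pipeline, a quick sanity check on the data can save
# debugging time later. The following cell is only an illustrative
# sketch and relies on standard pandas attributes, not on any Lale APIs:

# Quick look at the shape and column dtypes of the training data; all
# features should come back numeric, so no categorical encoding is
# needed before PCA or a decision tree.
print(train_X.shape)
print(train_X.dtypes)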
# Besides the target, there are eight
# feature columns, which are also all continuous numbers. That means
# many scikit-learn operators will work out of the box on this data
# without needing to preprocess it first. Next, we need to import a few
# operators. `PCA` (principal component analysis) is a transformer from
# scikit-learn for linear dimensionality reduction.
# `DecisionTreeRegressor` is an estimator from scikit-learn that can
# predict the target column. `Pipeline` is how scikit-learn composes
# operators into a sequence. `Hyperopt` is a Lale wrapper for
# the [hyperopt](http://hyperopt.github.io/hyperopt/) auto-ML library.
# And finally, `wrap_imported_operators` augments `PCA`, `Tree`, and
# `Pipeline` with schemas to enable Lale to tune their hyperparameters.

# In[2]:

from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeRegressor as Tree
from sklearn.pipeline import Pipeline
from lale.lib.lale import Hyperopt

lale.wrap_imported_operators()

# Next, we create a two-step pipeline of `PCA` and `Tree`. This code
# looks almost the same as in scikit-learn. The only difference is that
# since we want Lale to tune the hyperparameters for us, we do not
# specify them by hand. Specifically, we just write `PCA` instead of
# `PCA(...)`, omitting the hyperparameters for `PCA`. Analogously, we
# just write `Tree` instead of `Tree(...)`, omitting the hyperparameters
# for `Tree`. Rather than binding hyperparameters by hand, we leave them
# free to be tuned by hyperopt.

# In[3]:

pca_tree_planned = Pipeline(steps=[('tfm', PCA), ('estim', Tree)])

# We call `auto_configure` on the pipeline and pass `Hyperopt` as the
# optimizer. Lale derives a search space from the schemas of the
# operators in the pipeline and searches it for the best-performing
# pipeline. In this case, the search runs 10 trials. Each trial draws
# values for the hyperparameters from the ranges specified by the JSON
# schemas associated with the operators in the pipeline.

# In[4]:

get_ipython().run_cell_magic('time', '', 'pca_tree_trained = pca_tree_planned.auto_configure(\n train_X, train_y, optimizer=Hyperopt, cv=3, max_evals=10, verbose=True)\n')

# By default, Hyperopt evaluates each trial with k-fold cross validation
# and a default scoring metric based on the task. The end result is the
# pipeline that performed best out of all trials. In addition to the
# cross-validation score, we can also evaluate this best pipeline
# against the test data. We simply use the existing R2 score metric from
# scikit-learn for this purpose.

# In[5]:

import sklearn.metrics

predicted = pca_tree_trained.predict(test_X)
print(f'R2 score {sklearn.metrics.r2_score(test_y, predicted):.2f}')

# ### 1.2 Inspecting the Results of Automation
#
# In the previous example, the automation picked hyperparameter values
# for PCA and the decision tree. We know the values were valid and we
# know how well the pipeline performed with them. But we might also want
# to know exactly which values were picked. One way to do that is by
# visualizing the pipeline and using tooltips. If you are looking at
# this notebook in a viewer that supports tooltips, you can hover the
# mouse pointer over either one of the operators to see its
# hyperparameters.

# In[6]:

pca_tree_trained.visualize()

# Another way to view the results of hyperparameter tuning in Lale is by
# pretty-printing the pipeline as Python source code. Calling the
# `pretty_print` method with `ipython_display=True` prints the code with
# syntax highlighting in a Jupyter notebook. The pretty-printed code
# contains the hyperparameters.

# In[7]:

pca_tree_trained.pretty_print(ipython_display=True)
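# If you want to work with the generated code programmatically, for
# example to save it to a file, the same method can be called without
# `ipython_display=True`. The next cell is a sketch that assumes
# `pretty_print()` then returns the generated source as a plain string:

# Assumption: with ipython_display left at its default, pretty_print()
# returns the pretty-printed pipeline code as a str rather than
# displaying it.
code_str = pca_tree_trained.pretty_print()
print(code_str)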
# ### 1.3 Hyperparameter Tuning with Lale and GridSearchCV
#
# Lale supports multiple auto-ML tools, not just hyperopt. For instance,
# you can also use
# [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)
# from scikit-learn. You could use the exact same `pca_tree_planned`
# pipeline for this as we did with the hyperopt tool.
# However, to avoid running for a long time, here we simplify the search
# space: for `PCA`, we bind the `svd_solver` so only the remaining
# hyperparameters are searched, and for `Tree`, we call
# `freeze_trainable()` to bind all hyperparameters to their defaults.
# Lale again uses the schemas attached to the operators in the pipeline
# to generate a suitable search grid. Here, instead of scikit-learn's
# `Pipeline(...)` API, we use the `make_pipeline` function. This
# function exists in both scikit-learn and Lale; the Lale version yields
# a Lale pipeline that supports `auto_configure`. Note that, to be
# compatible with scikit-learn, `lale.lib.lale.GridSearchCV` can also
# take a `param_grid` argument if the user prefers a handcrafted grid
# over the one generated automatically.

# In[8]:

get_ipython().run_cell_magic('time', '', "from lale.lib.lale import GridSearchCV\nfrom lale.operators import make_pipeline\ngrid_search_planned = make_pipeline(\n PCA(svd_solver='auto'), Tree().freeze_trainable())\ngrid_search_result = grid_search_planned.auto_configure(\n train_X, train_y, optimizer=GridSearchCV, cv=3)\n")

# Just like we saw earlier with hyperopt, you can use the best pipeline
# found for scoring and evaluate the quality of the predictions.

# In[9]:

predicted = grid_search_result.predict(test_X)
print(f'R2 score {sklearn.metrics.r2_score(test_y, predicted):.2f}')

# Similarly, to inspect the results of grid search, you have the same
# options as demonstrated earlier for hyperopt. For instance, you can
# pretty-print the best pipeline found by grid search back as Python
# source code, and then look at its hyperparameters.

# In[10]:

grid_search_result.visualize()
grid_search_result.pretty_print(ipython_display=True, combinators=False)

# If we do not pretty-print with `combinators=False`, the pretty-printed
# code is rendered slightly differently, using `>>` instead of
# `make_pipeline`.

# In[11]:

grid_search_result.pretty_print(ipython_display=True)

# ### 1.4 Pipeline Combinators
#
# We already saw that `>>` is syntactic sugar for `make_pipeline`. Lale
# refers to `>>` as the *pipe combinator*. Besides `>>`, Lale supports
# two additional combinators. Before we introduce them, let's import a
# few more things.

# In[12]:

from lale.lib.lale import NoOp, ConcatFeatures
from sklearn.linear_model import LinearRegression as LinReg
from xgboost import XGBRegressor as XGBoost

lale.wrap_imported_operators()

# Lale emulates the scikit-learn APIs for composing pipelines using
# functions. We already saw `make_pipeline`. Another function in
# scikit-learn is `make_union`, which composes multiple sub-pipelines to
# run on the same data, then concatenates the features. In other words,
# `make_union` produces a horizontal stack of the data transformed by
# its sub-pipelines. To support auto-ML, Lale introduces a third
# function, `make_choice`, which does not exist in scikit-learn. The
# `make_choice` function specifies an algorithmic choice for auto-ML to
# resolve.
# In other words, `make_choice` creates a search space for
# automated algorithm selection.

# In[13]:

dag_with_functions = lale.operators.make_pipeline(
    lale.operators.make_union(PCA, NoOp),
    lale.operators.make_choice(Tree, LinReg, XGBoost(booster='gbtree')))
dag_with_functions.visualize()

# The visualization shows `make_union` as multiple sub-pipelines feeding
# into `ConcatFeatures`, and it shows `make_choice` using an `|`
# combinator. Operators shown in white are already fully trained; in
# this case, these operators actually do not have any learnable
# coefficients, nor do they have hyperparameters. For each of the three
# functions `make_pipeline`, `make_choice`, and `make_union`, Lale also
# provides a corresponding combinator. We already saw the pipe
# combinator (`>>`) and the choice combinator (`|`). To get the effect
# of `make_union`, use the *and combinator* (`&`) with the
# `ConcatFeatures` operator. The next example shows the exact same
# pipeline as before, but written using combinators instead of
# functions.

# In[14]:

dag_with_combinators = (
    (PCA(svd_solver='full') & NoOp)
    >> ConcatFeatures
    >> (Tree | LinReg | XGBoost(booster='gbtree')))
dag_with_combinators.visualize()

# ### 1.5 Combined Algorithm Selection and Hyperparameter Optimization
#
# Since `dag_with_functions` specifies an algorithm choice, when we feed
# it to `Hyperopt`, hyperopt will do algorithm selection for us. And
# since some of the operators in the dag do not have all their
# hyperparameters bound, hyperopt will also tune their free
# hyperparameters for us. Note that `booster` for `XGBoost` is fixed to
# `gbtree`, so Hyperopt will not tune it.

# In[15]:

get_ipython().run_cell_magic('time', '', 'multi_alg_trained = dag_with_functions.auto_configure(\n train_X, train_y, optimizer=Hyperopt, cv=3, max_evals=10)\n')

# Visualizing the best estimator reveals which algorithms hyperopt
# chose.

# In[16]:

multi_alg_trained.visualize()

# Pretty-printing the best estimator reveals how hyperopt tuned the
# hyperparameters. For instance, we can see that a `randomized`
# `svd_solver` was chosen for PCA.

# In[17]:

multi_alg_trained.pretty_print(ipython_display=True, show_imports=False)

# Of course, the trained pipeline can be used for predictions as usual,
# and we can use scikit-learn metrics to evaluate those predictions.

# In[18]:

predicted = multi_alg_trained.predict(test_X)
print(f'R2 score {sklearn.metrics.r2_score(test_y, predicted):.2f}')
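# R2 is not the only option: because the trained pipeline behaves like a
# regular scikit-learn estimator, any other regression metric from
# `sklearn.metrics` can be applied to the same predictions. For example
# (a sketch using only standard scikit-learn functions):

# Complementary error metrics for the same predictions; nothing here is
# Lale-specific.
mae = sklearn.metrics.mean_absolute_error(test_y, predicted)
mse = sklearn.metrics.mean_squared_error(test_y, predicted)
print(f'MAE {mae:.2f}, MSE {mse:.2f}')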
# ## 2. Viewing and Customizing Schemas
#
# This section reveals more of what happens behind the scenes for
# auto-ML with Lale. In particular, it shows the JSON Schemas used for
# auto-ML, and demonstrates how to customize them if desired.
#
# ### 2.1 Looking at Schemas from a Notebook
#
# When writing data science code, I often don't remember all the API
# information about what hyperparameters and datasets an operator
# expects. Lale attaches this information to the operators and uses it
# for auto-ML as demonstrated above. The same information can also be
# useful as interactive documentation in a notebook. Most individual
# operators in the visualizations shown earlier in this notebook
# actually contain a hyperlink to the excellent online documentation of
# scikit-learn. We can also retrieve that hyperlink using a method call.

# In[19]:

print(Tree.documentation_url())

# Lale's helper function `ipython_display` pretty-prints JSON documents
# and JSON schemas in a Jupyter notebook. You can get a quick overview
# of the constructor arguments of an operator by calling the
# `get_defaults` method.

# In[20]:

from lale.pretty_print import ipython_display

ipython_display(dict(Tree.get_defaults()))

# Hyperparameters can be categorical (meaning they accept a few
# discrete values) or continuous (integers or real numbers).
# As an example of a categorical hyperparameter, let's look at
# `criterion`. JSON Schema can encode categoricals as an `enum`.

# In[21]:

ipython_display(Tree.hyperparam_schema('criterion'))

# As an example of a continuous hyperparameter, let's look at
# `max_depth`. The decision tree regressor in scikit-learn accepts
# either an integer for that, or `None`, which has its own meaning.
# JSON Schema can express these two choices as an `anyOf`, and
# encodes the Python `None` as a JSON `null`. Also, while
# any positive integer is a valid value, in the context of auto-ML,
# Lale specifies a bounded range for the optimizer to search over.

# In[22]:

ipython_display(Tree.hyperparam_schema('max_depth'))

# Besides hyperparameter schemas, Lale also provides dataset schemas.
# For example, NMF, which stands for non-negative matrix factorization,
# requires a non-negative matrix as `X`. In JSON Schema, we express this
# as an array of arrays of numbers with `minimum: 0`. While NMF also
# accepts a second argument `y`, it does not use that argument.
# Therefore, Lale gives `y` the schema `{'laleType': 'Any'}`, which
# permits any values.

# In[23]:

from sklearn.decomposition import NMF

lale.wrap_imported_operators()
ipython_display(NMF.input_schema_fit())

# ### 2.2 Customizing Schemas from a Notebook
#
# While you can use Lale schemas as-is, you can also customize the
# schemas to exert more control over the automation. As one example, it
# is common to tune XGBoost with a large value for `n_estimators`.
# However, you might want to reduce the number of trees in an XGBoost
# forest to reduce memory consumption or to improve explainability. As
# another example, you might want to hand-pick one of the boosters to
# reduce the search space and thus hopefully speed up the search.

# In[24]:

import lale.schemas as schemas

Grove = XGBoost.customize_schema(
    n_estimators=schemas.Int(minimum=2, maximum=6),
    booster=schemas.Enum(['gbtree'], default='gbtree'))

# As this example demonstrates, Lale provides a simple Python API for
# writing schemas, which it then converts to JSON Schema internally. The
# result of customization is a new copy of the operator that can be used
# in the same way as any other operator in Lale. In particular, it can
# be part of a pipeline as before.

# In[25]:

grove_planned = lale.operators.make_pipeline(
    lale.operators.make_union(PCA, NoOp),
    Grove)
grove_planned.visualize()

# Given this new planned pipeline, we use hyperopt as before to search
# for a good trained pipeline.

# In[26]:

get_ipython().run_cell_magic('time', '', 'grove_trained = grove_planned.auto_configure(\n train_X, train_y, optimizer=Hyperopt, cv=3, max_evals=10)\n')

# As with all trained Lale pipelines, we can evaluate `grove_trained`
# with metrics to see how well it does. Also, we can pretty-print
# it back as Python code to double-check whether hyperopt obeyed the
# customized schemas for `n_estimators` and `booster`.

# In[27]:

predicted = grove_trained.predict(test_X)
print(f'R2 score {sklearn.metrics.r2_score(test_y, predicted):.2f}')
grove_trained.pretty_print(ipython_display=True, show_imports=False)
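# We can also check the customization itself, independently of any
# training, by looking at the schema that the customized copy now
# carries. This is a quick sketch that assumes the override is visible
# through the same `hyperparam_schema` accessor we used earlier for
# `Tree`:

# The customized copy should report the narrowed range for n_estimators
# (minimum 2, maximum 6) instead of the original XGBoost range.
ipython_display(Grove.hyperparam_schema('n_estimators'))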
# ## 3. Type-Checking with Lale
#
# The rest of this notebook gives examples of how the same schemas
# that serve for auto-ML can also serve for error checking. We will
# give comparative examples of error checking in scikit-learn (without
# schemas) and in Lale (with schemas). To make it clear which version
# of an operator is being used, all of the following examples use
# fully-qualified names (e.g., `sklearn.feature_selection.RFE`). The
# fully-qualified names are for presentation purposes only; in typical
# usage of either scikit-learn or Lale, these would be simple names
# (e.g., just `RFE`).
#
# ### 3.1 Hyperparameter Error Example in Scikit-Learn
#
# First, we import a few things.

# In[28]:

import sys
import sklearn
from sklearn import pipeline, feature_selection, ensemble, tree

# We use `make_pipeline` to compose a pipeline of two steps: an RFE
# transformer and a decision tree regressor. RFE performs recursive
# feature elimination, keeping only those features of the input data
# that are the most useful for its `estimator` argument. For RFE's
# `estimator` argument, the following code uses a random forest with 10
# trees.

# In[29]:

sklearn_hyperparam_error = sklearn.pipeline.make_pipeline(
    sklearn.feature_selection.RFE(
        estimator=sklearn.ensemble.RandomForestRegressor(n_estimators=10)),
    sklearn.tree.DecisionTreeRegressor(max_depth=-1))

# The `max_depth` argument for a decision tree cannot be a negative
# number. Hence, the above code actually contains a bug: it sets
# `max_depth=-1`. Scikit-learn does not check for this mistake in the
# `__init__` method; otherwise, we would have seen an error message
# already. Instead, scikit-learn checks for this mistake during `fit`.
# Unfortunately, it takes a few seconds to get the exception, because
# scikit-learn first trains the RFE transformer and uses it to transform
# the data. Only then does it pass the data to the decision tree.

# In[30]:

get_ipython().run_cell_magic('time', '', 'try:\n sklearn_hyperparam_error.fit(train_X, train_y)\nexcept ValueError as e:\n message = str(e)\nprint(message, file=sys.stderr)\n')

# Fortunately, this error message is pretty clear. Scikit-learn
# implements the error check imperatively, using Python if-statements
# to raise an exception when hyperparameters are configured wrong.
# This notebook is part of Lale's regression test suite and gets run
# automatically when changes are pushed to the Lale source code
# repository. The assertion in the following cell is a test that the
# error check indeed behaves as expected and documented here.

# In[31]:

assert message.startswith("max_depth must be greater than zero.")

# ### 3.2 Checking Hyperparameters with Types
#
# Lale performs the same error checks, but using JSON Schema validation
# instead of Python if-statements and raise-statements. First, we import
# the `jsonschema` validator so we can catch its exceptions.

# In[1]:

import jsonschema

# enable schema validation explicitly for the notebook
from lale.settings import set_disable_data_schema_validation, set_disable_hyperparams_schema_validation

set_disable_data_schema_validation(False)
set_disable_hyperparams_schema_validation(False)

# Below is the exact same pipeline as before, but written in Lale
# instead of directly in scikit-learn. In both cases, the underlying
# implementation is in scikit-learn; Lale only adds thin wrappers to
# support type checking and auto-ML.
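# The fully qualified `lale.lib.sklearn` names in the next cell assume
# that the wrapper module is available under that name. If it has not
# already been imported earlier in your session, a plain import should
# be all that is needed (sketch):

# Assumption: lale.lib.sklearn provides the schema-annotated wrappers
# (RFE, RandomForestRegressor, DecisionTreeRegressor, NMF, ...) used in
# the remaining cells.
import lale.lib.sklearn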
# In[33]:

get_ipython().run_cell_magic('time', '', 'try:\n lale_hyperparam_error = lale.operators.make_pipeline(\n lale.lib.sklearn.RFE(\n estimator=lale.lib.sklearn.RandomForestRegressor(n_estimators=10)),\n lale.lib.sklearn.DecisionTreeRegressor(max_depth=-1))\nexcept jsonschema.ValidationError as e:\n message = e.message\nprint(message, file=sys.stderr)\n')

# In[34]:

assert message.startswith("Invalid configuration for DecisionTreeRegressor(max_depth=-1)")

# Just like in the scikit-learn example, the error message in the Lale
# example also pinpoints the problem as passing `max_depth=-1` to the
# decision tree. It does so in a more stylized way, printing the
# relevant JSON schema for this hyperparameter. Lale detects the error
# as soon as the invalid hyperparameter is passed as an argument, thus
# reducing the amount of code you have to look at to find the root
# cause. Furthermore, Lale takes only tens of milliseconds to detect
# the error, because it does not attempt to train the RFE transformer
# first. In this example, that only saves a few seconds, which may not
# be significant. But there are situations with larger time savings,
# such as when using larger datasets, slower operators, or when auto-ML
# tries out many pipelines.

# ### 3.3 Dataset Error Example in Scikit-Learn
#
# Above, we saw an example of detecting a hyperparameter error in
# scikit-learn and in Lale. Next, we look at an analogous example for a
# dataset error. Again, let's first look at the experience with
# scikit-learn and then the same thing with Lale.

# In[35]:

from sklearn import decomposition

# We use scikit-learn to compose a pipeline of two steps: an RFE
# transformer as before, this time followed by an NMF transformer.

# In[36]:

sklearn_dataset_error = sklearn.pipeline.make_pipeline(
    sklearn.feature_selection.RFE(
        estimator=sklearn.ensemble.RandomForestRegressor(n_estimators=10)),
    sklearn.decomposition.NMF())

# NMF, or non-negative matrix factorization, does not allow any negative
# numbers in its input matrix. The California Housing dataset contains
# some negative numbers and the RFE does not eliminate those features.
# To detect the mistake, scikit-learn must first train the RFE and
# transform the data with it, which takes a few seconds. Then, NMF
# detects the error and throws an exception.

# In[37]:

get_ipython().run_cell_magic('time', '', 'try:\n sklearn_dataset_error.fit(train_X, train_y)\nexcept ValueError as e:\n message = str(e)\nprint(message, file=sys.stderr)\n')

# In[38]:

assert message.startswith("Negative values in data passed to NMF (input X)")

# ### 3.4 Types for Dataset Checking
#
# Lale uses types (as expressed using JSON schemas) to check for
# dataset-related mistakes. Below is the same pipeline as before, using
# thin Lale wrappers around scikit-learn operators. We redefine the
# pipeline to enable Lale type-checking for it.

# In[39]:

lale_dataset_error = lale.operators.make_pipeline(
    lale.lib.sklearn.RFE(
        estimator=lale.lib.sklearn.RandomForestRegressor(n_estimators=10)),
    lale.lib.sklearn.NMF())

# When we call `fit` on the pipeline, Lale can check, before doing the
# actual training, that the schemas are consistent at each step of the
# pipeline. In other words, it checks whether the schema of the input
# data is valid for the first step of the pipeline, and that the schema
# of the output from each step is valid for the next step. By saving the
# time for training the RFE, this completes in tens of milliseconds
# instead of seconds as before.
# In[40]:

# Enable the data schema validation in lale settings
from lale.settings import set_disable_data_schema_validation

set_disable_data_schema_validation(False)

# In[41]:

get_ipython().run_cell_magic('time', '', 'try:\n lale_dataset_error.fit(train_X, train_y)\nexcept ValueError as e:\n message = str(e)\nprint(message, file=sys.stderr)\n')

# In[42]:

assert message.startswith('NMF.fit() invalid X, the schema of the actual data is not a subschema of the expected schema of the argument.')

# In this example, the schemas for `X` differ: whereas the data is an
# array of arrays of unconstrained numbers, NMF expects an array of
# arrays of only non-negative numbers.

# ### 3.5 Hyperparameter Constraint Example in Scikit-Learn
#
# Sometimes, the validity of hyperparameters cannot be checked in
# isolation. Instead, the value of one hyperparameter can restrict
# which values are valid for another hyperparameter. For example,
# scikit-learn imposes a conditional hyperparameter constraint between
# the `svd_solver` and `n_components` arguments to PCA.

# In[43]:

sklearn_constraint_error = sklearn.pipeline.make_pipeline(
    sklearn.feature_selection.RFE(
        estimator=sklearn.ensemble.RandomForestRegressor(n_estimators=10)),
    sklearn.decomposition.PCA(svd_solver='arpack', n_components='mle'))

# The above notebook cell completed successfully, because scikit-learn
# did not yet check for the constraint. To observe the error message
# with scikit-learn, we must attempt to fit the pipeline.

# In[44]:

get_ipython().run_cell_magic('time', '', 'message=None\ntry:\n sklearn_constraint_error.fit(train_X, train_y)\nexcept ValueError as e:\n message = str(e)\nprint(message, file=sys.stderr)\n')

# In[45]:

assert message.startswith("n_components='mle' cannot be a string with svd_solver='arpack'")

# Scikit-learn implements constraint checking as Python code with
# if-statements and raise-statements. After a few seconds, we get an
# exception, and the error message explains what went wrong.

# ### 3.6 Types for Constraint Checking
#
# Lale specifies constraints using JSON Schemas. When you configure an
# operator with actual hyperparameters, Lale immediately validates them
# against their schema, including constraints.

# In[46]:

get_ipython().run_cell_magic('time', '', "try:\n lale_constraint_error = lale.operators.make_pipeline(\n lale.lib.sklearn.RFE(\n estimator=lale.lib.sklearn.RandomForestRegressor(n_estimators=10)),\n PCA(svd_solver='arpack', n_components='mle'))\nexcept jsonschema.ValidationError as e:\n message = str(e)\nprint(message, file=sys.stderr)\n")

# In[47]:

assert message.startswith("Invalid configuration for PCA(svd_solver='arpack', n_components='mle')")

# Lale reports the error more quickly than scikit-learn, taking only
# tens of milliseconds instead of multiple seconds. The error message
# contains both a natural-language description of the constraint and its
# formal representation in JSON Schema. The `'anyOf'` implements an
# 'or', so you can read the constraint as
#
# ```python
# (not (n_components in ['mle'])) or (svd_solver in ['full', 'auto'])
# ```
#
# By basic Boolean algebra, this is equivalent to an implication
#
# ```python
# (n_components in ['mle']) implies (svd_solver in ['full', 'auto'])
# ```
#
# Since the constraint is specified declaratively in the schema, it gets
# applied wherever the schema gets used. Specifically, the constraint
# gets applied both during auto-ML and during type-checking.
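# Schematically, and only as an illustration, the constraint from the
# error message corresponds to a JSON Schema fragment of roughly the
# following shape (written here as a Python dict; the exact schema that
# ships with Lale may differ in its details):

# Read as: (n_components is not 'mle') or (svd_solver is 'full' or 'auto').
pca_constraint_sketch = {
    'anyOf': [
        {'type': 'object',
         'properties': {'n_components': {'not': {'enum': ['mle']}}}},
        {'type': 'object',
         'properties': {'svd_solver': {'enum': ['full', 'auto']}}}]}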
# In the context of auto-ML, the constraint prunes the search space: it
# eliminates some hyperparameter combinations so that the auto-ML tool
# does not have to try them out. We have observed cases where this
# pruning makes a big difference in search convergence.

# ## 4. Conclusion
#
# This notebook showed additions to scikit-learn that simplify both
# auto-ML and error checking. The common foundation for these additions
# is schemas for operators. For further reading, return to the Lale
# GitHub [repository](https://github.com/ibm/lale), where you can find
# installation instructions, an FAQ, and links to further documentation,
# notebooks, talks, etc.