In this example, we will show how to create condition validators in a simplified way by using the condition_validator decorator. This lets you easily create a condition validator based on a user-defined function (UDF).
Let's say you are logging a numerical column col1, and you want to trigger an action whenever a row value for this column is not less than 4 (that is, 4 or greater). To do so, we'll define two functions: an action and a condition. We will then decorate the condition function with the condition_validator decorator and pass the action function as an argument.
# Note: you may need to restart the kernel to use updated packages.
%pip install whylogs
import pandas as pd
from typing import Any
from whylogs.experimental.core.validators import condition_validator
from whylogs.experimental.core.udf_schema import udf_schema
import whylogs as why
data = pd.DataFrame({"col1": [1, 3, 7]})
def do_something_important(validator_name, condition_name: str, value: Any, column_id=None):
    print("Validator: {}\n Condition name {} failed for value {}".format(validator_name, condition_name, value))
    return
@condition_validator(["col1"], condition_name="less_than_four", actions=[do_something_important])
def lt_4(x):
    return x < 4
schema = udf_schema()
why.log(data, schema=schema).view()
No session found. Call whylogs.init() to initialize a session and authenticate. See https://docs.whylabs.ai/docs/whylabs-whylogs-init for more information.
Validator: less_than_four Condition name less_than_four failed for value 7
<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7fe2a606a520>
You can see that the action was triggered once for the value 7.
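To make the mechanics concrete, here is a minimal pure-Python sketch of the same pattern. It is not how whylogs implements the decorator: `simple_condition_validator`, `REGISTRY`, `record_failure`, and `run_validators` are hypothetical names introduced only for illustration. The sketch assumes that a condition returning False counts as a failure and that every failure calls each action with the validator name, the condition name, and the offending value.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry: column name -> list of (condition_name, condition, actions).
REGISTRY: Dict[str, List[tuple]] = {}


def simple_condition_validator(columns: List[str], condition_name: str, actions: List[Callable]):
    """Hypothetical stand-in for a validator decorator; not the whylogs implementation."""
    def decorator(condition: Callable[[Any], bool]) -> Callable[[Any], bool]:
        for column in columns:
            REGISTRY.setdefault(column, []).append((condition_name, condition, actions))
        return condition
    return decorator


failed_values: List[Any] = []


def record_failure(validator_name: str, condition_name: str, value: Any, column_id=None) -> None:
    # Same parameter order as do_something_important, but it collects values instead of printing.
    failed_values.append(value)


@simple_condition_validator(["col1"], condition_name="less_than_four", actions=[record_failure])
def lt_4(x):
    return x < 4


def run_validators(rows: Dict[str, List[Any]]) -> None:
    """Evaluate each registered condition on each value and call the actions on failure."""
    for column, values in rows.items():
        for condition_name, condition, actions in REGISTRY.get(column, []):
            for value in values:
                if not condition(value):
                    for action in actions:
                        # Assumption: the validator name equals the condition name,
                        # mirroring the output shown above.
                        action(condition_name, condition_name, value)


run_validators({"col1": [1, 3, 7]})
print(failed_values)  # prints [7]
```

Only 7 fails the `x < 4` check, so the action runs once, matching the single message in the output above.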
Condition Validators are compatible with Dataset UDFs. Through Dataset UDFs, you can create new columns based on the values of other columns. In this example, we will create a new column add5 that is equal to col1 + 5. We will then assign a condition validator to the newly created column:
from typing import Dict, List, Union
from whylogs.experimental.core.udf_schema import register_dataset_udf
@register_dataset_udf(["col1"])
def add5(x: Union[Dict[str, List], pd.DataFrame]) -> Union[List, pd.Series]:
    return [xx + 5 for xx in x["col1"]]
@condition_validator(["add5"], condition_name="less_than_four", actions=[do_something_important])
def lt_4(x):
    return x < 4
schema = udf_schema()
why.log(data, schema=schema).view()
Validator: less_than_four Condition name less_than_four failed for value 7
Validator: less_than_four Condition name less_than_four failed for value 6
Validator: less_than_four Condition name less_than_four failed for value 8
Validator: less_than_four Condition name less_than_four failed for value 12
<whylogs.core.view.dataset_profile_view.DatasetProfileView at 0x7fe3038f8040>
Now our action was triggered 4 times: once for col1's value 7, and 3 times for add5's values 6, 8 and 12.
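The count of four can be checked by hand with plain Python, without whylogs. The sketch below recomputes the derived column and applies the same `x < 4` condition to both columns; `add5_values` is a hypothetical helper that mirrors the arithmetic of the add5 UDF and is not part of the library.

```python
from typing import Dict, List


def add5_values(columns: Dict[str, List[int]]) -> List[int]:
    # Same arithmetic as the add5 dataset UDF: col1 + 5, element by element.
    return [value + 5 for value in columns["col1"]]


def lt_4(x) -> bool:
    return x < 4


columns: Dict[str, List[int]] = {"col1": [1, 3, 7]}
columns["add5"] = add5_values(columns)

# Collect, per column, the values for which the condition does not hold.
failures = {name: [v for v in values if not lt_4(v)] for name, values in columns.items()}
total_failures = sum(len(values) for values in failures.values())

print(failures)        # prints {'col1': [7], 'add5': [6, 8, 12]}
print(total_failures)  # prints 4
```

Every value of add5 is at least 6, so all three rows fail the condition there, while col1 contributes only the value 7.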
You can access the assigned condition validators through the schema object. In the following code snippet, we can see that there's one condition validator assigned to col1 and one to add5, both named less_than_four:
schema.validators
defaultdict(list, {'col1': [ConditionValidator(name='less_than_four', conditions={'less_than_four': <function lt_4 at 0x7fe2a6059af0>}, actions=[<function do_something_important at 0x7fe2ea7c6d30>], total=0, failures={'less_than_four': 2}, enable_sampling=True, _samples=[], _sampler=<whylogs_sketching.var_opt_sketch object at 0x7fe2a6067130>, sample_size=10)], 'add5': [ConditionValidator(name='less_than_four', conditions={'less_than_four': <function lt_4 at 0x7fe2a4d531f0>}, actions=[<function do_something_important at 0x7fe2ea7c6d30>], total=0, failures={'less_than_four': 3}, enable_sampling=True, _samples=[], _sampler=<whylogs_sketching.var_opt_sketch object at 0x7fe2a4d461b0>, sample_size=10)]})
We can also get a sample of the values that failed the condition. Let's do that for the first (and only) condition validator for the add5 column:
schema.validators["add5"][0].get_samples()
[6, 8, 12]
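If you want a compact view of failure counts per column, a small helper can walk a mapping shaped like the schema.validators output shown earlier. This is a sketch under assumptions: `summarize_failures` is a hypothetical helper, and it relies only on the `failures` attribute that appears in that output. To stay runnable without whylogs, it is demonstrated on stand-in objects that carry the same counts.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Mapping


def summarize_failures(validators: Mapping[str, List[Any]]) -> Dict[str, Dict[str, int]]:
    """Return {column: {condition_name: failure_count}} (hypothetical helper)."""
    summary: Dict[str, Dict[str, int]] = {}
    for column, column_validators in validators.items():
        counts = summary.setdefault(column, {})
        for validator in column_validators:
            # Assumption: each validator exposes a dict named `failures`,
            # as seen in the schema.validators output.
            for condition_name, count in validator.failures.items():
                counts[condition_name] = counts.get(condition_name, 0) + count
    return summary


# Stand-in objects carrying the failure counts from the output shown earlier.
stand_in = {
    "col1": [SimpleNamespace(name="less_than_four", failures={"less_than_four": 2})],
    "add5": [SimpleNamespace(name="less_than_four", failures={"less_than_four": 3})],
}

print(summarize_failures(stand_in))
# prints {'col1': {'less_than_four': 2}, 'add5': {'less_than_four': 3}}
```

The count of 2 for col1 is consistent with that validator having seen the value 7 in both logging calls of this example, though that reading is an inference from the printed output rather than something the output states.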