In this example, we'll explore the basics of logging data with whylogs and a user-defined function (UDF).
%pip install 'whylogs>=1.5.0'
Before showing how we can log data, we first need the data itself. Let's create a simple Pandas DataFrame:
import pandas as pd
data = {
    "animal": ["cat", "hawk", "clam", "cat", "mongoose", "octopus"],
    "legs": [4, 2, 1, 4, 4, 8],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.4, 3.2],
}
df = pd.DataFrame(data)
Here we use a metric UDF targeting the column named animal as an example of how to add features to a dataframe for custom monitoring. The UDF models some custom logic for deciding whether the animal has a cool name. This is a toy example that just checks whether the name is longer than 4 characters and returns a binary label, but you could also return a score based on the values in a column.
import whylogs as why
from whylogs.experimental.core.udf_schema import udf_schema
from whylogs.experimental.core.metrics.udf_metric import register_metric_udf
@register_metric_udf(col_name="animal")
def has_cool_animal_name(text):
    if len(text) > 4:  # long names are cool
        return 1
    else:
        return 0
custom_schema = udf_schema()
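Before logging, it can help to sanity-check the UDF logic on its own. The sketch below is a plain-Python copy of the rule above, without the whylogs decorator, applied to the animal names from the DataFrame. The names `cool_name_label` and `name_length_score` are hypothetical and used only for illustration; the score variant is one assumed way to return a score instead of a binary label, not something whylogs prescribes.

```python
# Plain-Python copy of the UDF rule (no whylogs decorator), for checking the logic.
def cool_name_label(text):
    # Names longer than 4 characters count as "cool".
    return 1 if len(text) > 4 else 0


# Hypothetical score-style variant: name length capped at 10, scaled to [0, 1].
def name_length_score(text):
    return min(len(text), 10) / 10


animals = ["cat", "hawk", "clam", "cat", "mongoose", "octopus"]

labels = [cool_name_label(name) for name in animals]
scores = [name_length_score(name) for name in animals]

print(labels)  # [0, 0, 0, 0, 1, 1]
print(scores)
```

Note that "hawk" and "clam" have exactly 4 characters, so the strict `> 4` comparison labels them 0; only "mongoose" and "octopus" are labeled 1.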
To obtain a profile of your data, call whylogs' log function with the UDF schema defined earlier. This will attach a feature named animal.has_cool_animal_name, which you can then see in WhyLabs.
import whylogs as why
why.init(force_local=True)
results = why.log(df, name="udf_demo", schema=custom_schema)
results.view().to_pandas()
Unlike metric UDFs, dataset UDFs can take multiple columns as input. A dataset UDF creates a new column in your pandas dataframe, which is then profiled along with your inputs.
from whylogs.experimental.core.udf_schema import register_dataset_udf
import pandas as pd
@register_dataset_udf(["legs", "weight"])
def weight_per_leg(data: pd.DataFrame) -> pd.Series:
    return data["weight"] / data["legs"]
custom_schema2 = udf_schema()
results = why.log(df, schema=custom_schema2)
results.view().to_pandas()
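To know what to expect in the new column, the division can be worked out by hand. The sketch below uses plain Python lists (no pandas or whylogs) with the same legs and weight values as the DataFrame above; the variable name `expected_weight_per_leg` is hypothetical.

```python
# Same values as the "legs" and "weight" columns of the example DataFrame.
legs = [4, 2, 1, 4, 4, 8]
weight = [4.3, 1.8, 1.3, 4.1, 5.4, 3.2]

# Element-wise weight / legs, mirroring what the dataset UDF computes per row.
expected_weight_per_leg = [round(w / n, 3) for w, n in zip(weight, legs)]

print(expected_weight_per_leg)  # [1.075, 0.9, 1.3, 1.025, 1.35, 0.4]
```

One difference worth keeping in mind: a row with zero legs would raise `ZeroDivisionError` in this plain-Python version, whereas pandas element-wise division of a positive weight by zero yields `inf`, which would then be profiled like any other value.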
For more details on the different kinds of UDFs (for instance, if you want to calculate a score based on multiple columns), see this example: