🚩 Create a free WhyLabs account to get more value out of whylogs!

Did you know you can store, visualize, and monitor whylogs profiles with the WhyLabs Observability Platform? Sign up for a free WhyLabs account to leverage the power of whylogs and WhyLabs together!

Getting Started¶

In this example, we'll explore the basics of logging data with whylogs and a user-defined function (UDF).

Installing whylogs¶

In [ ]:

%pip install 'whylogs>=1.5.0'

Loading a Pandas DataFrame¶

Before showing how we can log data, we first need the data itself. Let's create a simple Pandas DataFrame:

In [1]:

import pandas as pd
data = {
    "animal": ["cat", "hawk", "clam", "cat", "mongoose", "octopus"],
    "legs": [4, 2, 1, 4, 4, 8],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.4, 3.2],
}

df = pd.DataFrame(data)

Defining a simple metric UDF¶

Here we use a metric UDF targeting the named column animal to show how we can add features to a dataframe for custom monitoring. The UDF models some custom logic that decides whether an animal has a cool name. This is a toy example that does a binary classification: it returns 1 if the name is longer than 4 characters and 0 otherwise. You could also return a score based on the values in a column.

In [2]:

import whylogs as why
from whylogs.experimental.core.udf_schema import udf_schema
from whylogs.experimental.core.metrics.udf_metric import register_metric_udf


@register_metric_udf(col_name="animal")
def has_cool_animal_name(text):
    # Toy rule: names longer than 4 characters count as "cool".
    if len(text) > 4:
        return 1
    else:
        return 0

custom_schema = udf_schema()
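
To check the toy rule on its own before profiling, the sketch below copies the body of the UDF into a plain function, without the whylogs decorator, and applies it to the animal names from the DataFrame. The name cool_name_label is hypothetical and chosen only so that it does not shadow the registered UDF; nothing here calls whylogs.

```python
# Plain-Python copy of the UDF logic (no whylogs decorator).
# `cool_name_label` is a hypothetical name used only for this check.
def cool_name_label(text: str) -> int:
    # Same rule as the UDF: names longer than 4 characters are "cool".
    return 1 if len(text) > 4 else 0


animals = ["cat", "hawk", "clam", "cat", "mongoose", "octopus"]
labels = [cool_name_label(name) for name in animals]

for name, label in zip(animals, labels):
    print(f"{name}: {label}")
```

Only mongoose and octopus are labeled 1, since hawk and clam have exactly 4 characters and the rule requires more than 4.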

Profiling with whylogs + UDFs¶

To profile your data, pass the UDF schema defined earlier to whylogs' log call. This will attach a feature named animal.has_cool_animal_name, which you can then see in WhyLabs.

In [ ]:

import whylogs as why

why.init(force_local=True)

results = why.log(df, name="udf_demo", schema=custom_schema)
results.view().to_pandas()

Going Further with UDFs¶

Unlike metric UDFs, dataset UDFs can take multiple columns as input. A dataset UDF creates a new column in your pandas DataFrame, which is then profiled along with your inputs.

In [4]:

from whylogs.experimental.core.udf_schema import register_dataset_udf
import pandas as pd

@register_dataset_udf(["legs", "weight"])
def weight_per_leg(data: pd.DataFrame) -> pd.Series:
    return data["weight"] / data["legs"]

In [ ]:

custom_schema2 = udf_schema()
results = why.log(df, schema=custom_schema2)
results.view().to_pandas()
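
To preview the values that the weight_per_leg column should contain, the sketch below runs the same division directly in pandas, without whylogs. The names preview_df and weight_per_leg_preview are hypothetical and used only so that the notebook's df and the registered UDF are left untouched.

```python
import pandas as pd

# Same legs/weight values as the DataFrame above, under a separate
# (hypothetical) name so the notebook's `df` is not overwritten.
preview_df = pd.DataFrame(
    {
        "legs": [4, 2, 1, 4, 4, 8],
        "weight": [4.3, 1.8, 1.3, 4.1, 5.4, 3.2],
    }
)


def weight_per_leg_preview(data: pd.DataFrame) -> pd.Series:
    # Same computation as the dataset UDF: element-wise weight / legs.
    return data["weight"] / data["legs"]


# Attach the derived column next to its inputs for inspection.
preview = preview_df.assign(weight_per_leg=weight_per_leg_preview(preview_df))
print(preview)
```

Note that a row with zero legs would produce an infinite value here, so real data may need a guard before dividing.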

For more details on the different kinds of UDFs (for example, to calculate a score based on multiple columns), see this example:

https://github.com/whylabs/whylogs/blob/mainline/python/examples/experimental/whylogs_UDF_examples.ipynb
