
🚩 Create a free WhyLabs account to get more value out of whylogs!

Did you know you can store, visualize, and monitor whylogs profiles with the WhyLabs Observability Platform? Sign up for a free WhyLabs account to leverage the power of whylogs and WhyLabs together!

Simple Constraints - Examples and Usage¶

This is a whylogs v1 example. For the analogous feature in v0, please refer to this example.

In this example, we'll define a number of simple constraints and show how to use them. For the basics of building your own set of constraints, see the example Data Validation with Metric Constraints.

The constraints are listed according to the metric namespace used when defining them. For each category, we import helper functions for simple and popular constraints from whylogs.core.constraints.factories. Each helper function has a brief explanation in its docstring. After importing the helper functions, we'll show a simple example of how to build constraints with them and visualize the results as a report with the visualization module.

Note: The constraints shown here are still experimental and subject to further changes. Stay tuned for upgrades!

Completeness Constraints¶

constraint	parameters	semantic	metric
no_missing_values	column name	Checks that there are no missing values in the column.	Counts
null_values_below_number	column name, number	Number of null values must be below given number.	Counts
null_percentage_below_number	column name, number	Percentage of null values must be below given number.	Counts
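
To make the three completeness checks concrete, here is a minimal plain-Python sketch of what each one compares. It is not the whylogs implementation: the function names are hypothetical, the checks run on raw values instead of profile metrics, and it assumes that "percentage" means a fraction between 0 and 1 and that "below" is a strict comparison.

```python
from typing import Optional, Sequence


def count_nulls(values: Sequence[Optional[float]]) -> int:
    """Count entries that are None (treated here as missing)."""
    return sum(1 for v in values if v is None)


def has_no_missing(values: Sequence[Optional[float]]) -> bool:
    """Pass only if the column has zero missing values."""
    return count_nulls(values) == 0


def nulls_below(values: Sequence[Optional[float]], number: int) -> bool:
    """Pass if the null count is strictly below `number` (assumed strict)."""
    return count_nulls(values) < number


def null_fraction_below(values: Sequence[Optional[float]], number: float) -> bool:
    """Pass if nulls / total is strictly below `number` (assumed to be a fraction)."""
    if not values:
        return True  # hypothetical choice for an empty column
    return count_nulls(values) / len(values) < number


# Same "legs" column as the Counters example later in this notebook.
legs = [4, 2, 0, None, 6]
print(has_no_missing(legs))             # False: one value is missing
print(nulls_below(legs, 1))             # False: 1 is not below 1
print(null_fraction_below(legs, 0.05))  # False: 1/5 = 0.2
print(null_fraction_below(legs, 0.25))  # True
```

Under these assumptions, a single None in five rows is enough to fail all three checks with the thresholds used in the Counters example.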

Consistency Constraints¶

constraint	parameters	semantic	metric
greater_than_number	column name, number	Minimum value of given column must be above defined number.	Distribution
smaller_than_number	column name, number	Maximum value of given column must be below defined number.	Distribution
is_in_range	column name, lower, upper	Checks that all of column's values are in defined range (inclusive).	Distribution
is_non_negative	column name	Checks that all values in a column are non-negative.	Distribution
n_most_common_items_in_set	column name, n, reference set	Checks if the top n most common items appear in the reference set.	Frequent Items
frequent_strings_in_reference_set	column name, reference set	Checks if every frequent string in a string column appears in the reference set.	Frequent Items
count_below_number	column name, number	Checks if the number of elements in a column is below the given number.	Counts
distinct_number_in_range	column name, lower, upper	Checks if number of distinct categories is between lower and upper values (inclusive).	Cardinality
column_is_nullable_integral	column name	Checks that the column contains only integral values (nulls allowed).	Types
column_is_nullable_boolean	column name	Checks that the column contains only boolean values (nulls allowed).	Types
column_is_nullable_fractional	column name	Checks that the column contains only fractional values (nulls allowed).	Types
column_is_nullable_object	column name	Checks that the column contains only object values (nulls allowed).	Types
column_is_nullable_string	column name	Checks that the column contains only string values (nulls allowed).	Types
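
The range and frequent-items rows are easier to tell apart with a small example. The following is a plain-Python sketch of the semantics only. The helper names are hypothetical, and it works on raw values with exact counts, whereas whylogs evaluates constraints against profile metrics, so results on large columns may differ.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Sequence, Set


def values_in_range(values: Sequence[Optional[float]], lower: float, upper: float) -> bool:
    """Pass if both the minimum and the maximum lie in [lower, upper]."""
    present = [v for v in values if v is not None]
    return bool(present) and lower <= min(present) and max(present) <= upper


def top_n_in_reference(values: Iterable[Hashable], n: int, reference_set: Set[Hashable]) -> bool:
    """Pass if each of the n most common items belongs to the reference set."""
    top_items = [item for item, _ in Counter(values).most_common(n)]
    return all(item in reference_set for item in top_items)


def all_items_in_reference(values: Iterable[Hashable], reference_set: Set[Hashable]) -> bool:
    """Pass if every observed item belongs to the reference set.

    Assumption: in a column this small, every distinct string counts as "frequent".
    """
    return set(values) <= set(reference_set)


animal = ["cat", "snake", "snake", "cat", "mosquito"]
weight = [4.3, 1.8, 1.3, 4.1, 5.5e-6]
reference_set = {"cat", "snake"}

print(values_in_range(weight, 1.1, 3.2))              # False: min and max fall outside
print(values_in_range([4, 2, 0, 4, 6], 0, 6))         # True: bounds are inclusive
print(top_n_in_reference(animal, 2, reference_set))   # True: top 2 are cat and snake
print(all_items_in_reference(animal, reference_set))  # False: mosquito is not in the set
```

The last two lines show the practical difference: a rare value outside the reference set does not affect the top-n check, but it does fail the check over all frequent strings.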

Condition Count Constraints¶

Please refer to the example Metric Constraints with Condition Count Metrics for examples on how to use these constraints.

constraint	parameters	semantic	metric
condition_meets	column name, condition_name	Fails if at least one value does not meet the condition.	Condition Count
condition_never_meets	column name, condition_name	Fails if the condition is met at least once.	Condition Count
condition_count_below	column name, condition_name, max_count	Fails if the condition is met more than max_count times.	Condition Count
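
As a rough illustration of how these three checks relate, the sketch below counts how many values satisfy a named condition and then compares that count with the total. Everything here is hypothetical plain Python, not the whylogs API: the class, the function names, and the "short_name" condition are made up for illustration, and it assumes that condition_meets requires every value to match.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class ConditionCounts:
    """Total number of values seen and matches per named condition."""
    total: int
    matches: Dict[str, int]


def count_conditions(
    values: Iterable[str], conditions: Dict[str, Callable[[str], bool]]
) -> ConditionCounts:
    """Evaluate every named condition against every value and tally matches."""
    values = list(values)
    matches = {
        name: sum(1 for v in values if predicate(v))
        for name, predicate in conditions.items()
    }
    return ConditionCounts(total=len(values), matches=matches)


def always_met(counts: ConditionCounts, name: str) -> bool:
    """Assumed reading of condition_meets: every value matches."""
    return counts.matches[name] == counts.total


def never_met(counts: ConditionCounts, name: str) -> bool:
    """Reading of condition_never_meets: no value matches."""
    return counts.matches[name] == 0


def met_at_most(counts: ConditionCounts, name: str, max_count: int) -> bool:
    """Reading of condition_count_below: matches do not exceed max_count."""
    return counts.matches[name] <= max_count


animal = ["cat", "hawk", "snake", "cat", "mosquito"]
conditions = {"short_name": lambda v: len(v) <= 5}  # made-up condition
counts = count_conditions(animal, conditions)

print(counts.matches["short_name"])          # 4
print(always_met(counts, "short_name"))      # False: "mosquito" does not match
print(never_met(counts, "short_name"))       # False
print(met_at_most(counts, "short_name", 4))  # True
```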

Statistics Constraints¶

constraint	parameters	semantic	metric
mean_between_range	column name, lower, upper	Mean must be between range defined by lower and upper bounds.	Distribution
stddev_between_range	column name, lower, upper	Standard deviation must be between range defined by lower and upper bounds.	Distribution
quantile_between_range	column name, quantile, lower, upper	Q-th quantile value must be within the range defined by lower and upper boundaries.	Distribution
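
For a feel of the numbers these constraints compare, the sketch below computes the mean, standard deviation, and median of the weight column used later in this notebook with Python's statistics module. This is only an approximation of what whylogs evaluates: it uses exact values instead of profile metrics, it assumes the sample standard deviation, and it covers only the 0.5 quantile. The between helper is hypothetical.

```python
import statistics


def between(value: float, lower: float, upper: float) -> bool:
    """Inclusive range check used for all three statistics (assumed inclusive)."""
    return lower <= value <= upper


weight = [4.3, 1.8, 1.3, 4.1, 5.5e-6]

mean = statistics.fmean(weight)
stddev = statistics.stdev(weight)   # sample standard deviation (assumption)
median = statistics.median(weight)  # exact 0.5 quantile

print(round(mean, 3), between(mean, 2, 3))      # 2.3 True
print(round(stddev, 3), between(stddev, 1, 3))  # 1.856 True
print(median, between(median, 1.5, 2.0))        # 1.8 True
```

With these bounds, all three statistics fall inside their ranges, which matches the thresholds passed to mean_between_range, stddev_between_range, and quantile_between_range in the Distribution example.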

Table of Contents¶

Installing and Importing Modules
Distribution Metrics Constraints
Frequent Items/Frequent Strings Metrics Constraints
Counters Constraints
Cardinality Constraints
Types Constraints
Combined Metrics Constraints

Installing whylogs and importing modules ¶

If you haven't already, install whylogs:

In [1]:

# Note: you may need to restart the kernel to use updated packages.
%pip install 'whylogs[viz]'

Then, let's import the helper functions needed to define the constraints:

In [2]:

from whylogs.core.constraints import ConstraintsBuilder
from whylogs.core.constraints.factories import (
    greater_than_number,
    is_in_range,
    is_non_negative,
    mean_between_range,
    smaller_than_number,
    stddev_between_range,
    quantile_between_range
)

Examples - Distribution Metrics Constraints¶

In [3]:

import whylogs as why
import pandas as pd
data = {
    "animal": ["cat", "hawk", "snake", "cat", "mosquito"],
    "legs": [4, 2, 0, 4, 6],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.5e-6],
}

results = why.log(pd.DataFrame(data))
profile_view = results.view()

In [4]:

builder = ConstraintsBuilder(dataset_profile_view=profile_view)
builder.add_constraint(greater_than_number(column_name="weight", number=0.14))
builder.add_constraint(mean_between_range(column_name="weight", lower=2, upper=3))
builder.add_constraint(smaller_than_number(column_name="weight", number=20.5))
builder.add_constraint(stddev_between_range(column_name="weight", lower=1, upper=3))
builder.add_constraint(quantile_between_range(column_name="weight", quantile=0.5, lower=1.5, upper=2.0))
builder.add_constraint(is_in_range(column_name="weight", lower=1.1, upper=3.2))
builder.add_constraint(is_in_range(column_name="legs", lower=0, upper=6))
builder.add_constraint(is_non_negative(column_name="legs"))

# The animal column has no distribution metrics, so this constraint passes only
# if skip_missing=True. With skip_missing=False, as here, it fails.
builder.add_constraint(
    quantile_between_range(
        column_name="animal", 
        quantile=0.5, 
        lower=1.5, 
        upper=2.0, 
        skip_missing=False
    )
)

constraints = builder.build()

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)

Out[4]:

Frequent Items/Frequent Strings Constraints ¶

In [5]:

from whylogs.core.constraints.factories import n_most_common_items_in_set, frequent_strings_in_reference_set

Examples - Frequent Items/Frequent Strings Constraints¶

In [6]:

import whylogs as why
import pandas as pd
data = {
    "animal": ["cat", "snake", "snake", "cat", "mosquito"],
    "legs": [0, 1, 2, 3, 4],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.5e-6],
}

results = why.log(pd.DataFrame(data))
profile_view = results.view()

In [7]:

builder = ConstraintsBuilder(dataset_profile_view=profile_view)
reference_set = {"cat","snake"}
builder.add_constraint(frequent_strings_in_reference_set(column_name="animal", reference_set=reference_set))
builder.add_constraint(n_most_common_items_in_set(column_name="animal",n=2,reference_set=reference_set))

constraints = builder.build()

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)

Out[7]:

Counters Constraints ¶

In [8]:

from whylogs.core.constraints.factories import no_missing_values, count_below_number, null_percentage_below_number, null_values_below_number

Examples - Counters Constraints¶

In [9]:

import whylogs as why
import pandas as pd
data = {
    "animal": ["cat", "snake", "snake", "cat", "mosquito"],
    "legs": [4, 2, 0, None, 6],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.5e-6],
}

results = why.log(pd.DataFrame(data))
profile_view = results.view()

In [10]:

builder = ConstraintsBuilder(dataset_profile_view=profile_view)
builder.add_constraint(count_below_number(column_name="legs", number=10))
builder.add_constraint(null_percentage_below_number(column_name="legs", number=0.05))
builder.add_constraint(null_values_below_number(column_name="legs", number=1))
builder.add_constraint(no_missing_values(column_name="legs"))
builder.add_constraint(no_missing_values(column_name="animal"))

constraints = builder.build()

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)

Out[10]:

Cardinality Constraints ¶

In [11]:

from whylogs.core.constraints.factories import distinct_number_in_range

Examples - Cardinality Constraints¶

In [12]:

import whylogs as why
import pandas as pd
data = {
    "animal": ["cat", "snake", "snake", "cat", "mosquito"],
    "legs": [4, 2, 0, None, 6],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.5e-6],
}

results = why.log(pd.DataFrame(data))
profile_view = results.view()

In [13]:

builder = ConstraintsBuilder(dataset_profile_view=profile_view)
builder.add_constraint(distinct_number_in_range(column_name = "animal", lower = 3, upper = 6))

constraints = builder.build()

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)

Out[13]:

Types Metrics ¶

Examples - Types Metrics¶

In [14]:

import whylogs as why
import pandas as pd

data = {
    "animal": ["cat", "snake", "snake", "cat", "mosquito"],
    "legs": [4, 2, 0, None, 6],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.5e-6],
    "flies": [False, False, "False", False, True],
    "obj": [{"a":1}, None, {"a":1}, {"a":1}, {"a":1}]
}
df = pd.DataFrame(data)
results = why.log(df)
profile_view = results.view()

Check Nullable Types¶

In [15]:

from whylogs.core.constraints.factories import ( 
    column_is_nullable_integral,
    column_is_nullable_boolean, 
    column_is_nullable_fractional,
    column_is_nullable_object,
    column_is_nullable_string,
)
from whylogs.core.constraints import ConstraintsBuilder


builder = ConstraintsBuilder(dataset_profile_view=profile_view)
builder.add_constraint(column_is_nullable_string(column_name="animal"))
builder.add_constraint(column_is_nullable_integral(column_name="legs"))
builder.add_constraint(column_is_nullable_fractional(column_name="weight"))
builder.add_constraint(column_is_nullable_boolean(column_name="flies"))
builder.add_constraint(column_is_nullable_object(column_name="obj"))

constraints = builder.build()

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)

Out[15]:

The constraints above will pass if all values are of a given type. Null values are accepted.

Note that the constraint for legs failed. This is because whylogs uses pandas dtypes when they are available: when a None is present in an integer column, pandas stores the column as floating point, so whylogs considers it fractional even though the remaining values were originally integers.
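
The dtype behavior described above can be checked directly in pandas, independent of whylogs. The snippet below shows the default upcast to float64 and, for comparison, pandas' nullable Int64 extension dtype, which keeps integers alongside a missing value. How whylogs profiles an Int64 column is not verified here, so treat the second half as a pandas observation only.

```python
import pandas as pd

# Default inference: the None becomes NaN, which forces a float64 column.
legs = pd.Series([4, 2, 0, None, 6])
print(legs.dtype)  # float64

# Nullable extension dtype: the values stay integers and the None becomes <NA>.
legs_nullable = pd.Series([4, 2, 0, None, 6], dtype="Int64")
print(legs_nullable.dtype)              # Int64
print(int(legs_nullable.isna().sum()))  # 1
```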

Combined Constraints ¶

Examples - Combined Metrics¶

To create a constraint that checks for a non-nullable type, we combine two separate constraints on the same column:

the column is of the nullable datatype
the number of null values is below 1

In [16]:

import whylogs as why
import pandas as pd

data = {
    "animal": ["cat", "snake", "snake", "cat", "mosquito"],
    "legs": [4, 2, 0, None, 6],
    "weight": [4.3, 1.8, 1.3, 4.1, 5.5e-6],
    "flies": [False, False, "False", False, True],
    "obj": [{"a":1}, None, {"a":1}, {"a":1}, {"a":1}]
}
df = pd.DataFrame(data)
results = why.log(df)
profile_view = results.view()

Check Non-nullable Types¶

In [17]:

from whylogs.core.constraints.factories import ( 
    column_is_nullable_integral,
    column_is_nullable_boolean, 
    column_is_nullable_fractional,
    column_is_nullable_object,
    column_is_nullable_string,
    null_values_below_number,
)
from whylogs.core.constraints import ConstraintsBuilder


builder = ConstraintsBuilder(dataset_profile_view=profile_view)
# Together, these two constraints check for a non-nullable string column
builder.add_constraint(column_is_nullable_string(column_name="animal"))
builder.add_constraint(null_values_below_number(column_name="animal", number=1))

# Together, these two constraints check for a non-nullable integral column
builder.add_constraint(column_is_nullable_integral(column_name="legs"))
builder.add_constraint(null_values_below_number(column_name="legs", number=1))

# Together, these two constraints check for a non-nullable fractional column
builder.add_constraint(column_is_nullable_fractional(column_name="weight"))
builder.add_constraint(null_values_below_number(column_name="weight", number=1))

# Together, these two constraints check for a non-nullable boolean column
builder.add_constraint(column_is_nullable_boolean(column_name="flies"))
builder.add_constraint(null_values_below_number(column_name="flies", number=1))

# Together, these two constraints check for a non-nullable object column
builder.add_constraint(column_is_nullable_object(column_name="obj"))
builder.add_constraint(null_values_below_number(column_name="obj", number=1))

constraints = builder.build()

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)

Out[17]:
