🚩 Create a free WhyLabs account to get more value out of whylogs!

Did you know you can store, visualize, and monitor whylogs profiles with the WhyLabs Observability Platform? Sign up for a free WhyLabs account to leverage the power of whylogs and WhyLabs together!

Drift Algorithm Configuration

In whylogs, you can calculate drift scores and generate a summary drift report between two profiles, as shown in the Notebook Profile Visualizer example.

In this example, we will show how to calculate drift with the default algorithm selection, and how to customize the calculation in two ways: by choosing the algorithm yourself and by changing the algorithm's internal parameters and drift detection thresholds. We will also show how to calculate drift scores in a standalone manner, without generating the summary report visualization.

Currently, whylogs supports the following drift algorithms: Kolmogorov-Smirnov Test, Chi-Square Test, and Hellinger distance. Stay tuned for more algorithms to be added in the future!

Installing whylogs

In [ ]:

# Note: you may need to restart the kernel to use updated packages.
%pip install whylogs

Generating the Target and Reference Profiles

First, we will generate two profiles, one as the target and one as the reference.

We will then use those profiles to calculate a drift score for each column.

In [1]:

import whylogs as why
import pandas as pd

data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, None, 4.1],
}

df = pd.DataFrame(data)

data2 = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [13, 34, 99, 123],
    "weight": [4.9, 13.3, None, 232.3],
}

df2 = pd.DataFrame(data2)


target_view = why.log(df).profile().view()
ref_view = why.log(df2).profile().view()

Calculating Drift - Default Behavior

You can calculate drift scores between your profiles in two ways. The first is to use calculate_drift_scores, which will give you a dictionary of drift scores for each feature.

The second is to view it integrated into the Notebook Profile Visualizer by calling summary_drift_report. This will give you a drift summary report in the format of an in-notebook visualization or a downloadable HTML file.

Let's see both cases for the default behavior scenario - we won't specify any drift algorithms or parameters.

To get a dictionary with the drift scores, you can use the calculate_drift_scores function:

In [2]:

from whylogs.viz.drift.column_drift_algorithms import calculate_drift_scores

scores = calculate_drift_scores(target_view=target_view, reference_view=ref_view, with_thresholds=True)
scores

Out[2]:

{'animal': {'algorithm': 'chi-square',
  'pvalue': 1.0,
  'statistic': 0.0,
  'thresholds': {'NO_DRIFT': (0.15, 1),
   'POSSIBLE_DRIFT': (0.05, 0.15),
   'DRIFT': (0, 0.05)},
  'drift_category': 'NO_DRIFT'},
 'legs': {'algorithm': 'ks',
  'pvalue': 0.0,
  'statistic': 1.0,
  'thresholds': {'NO_DRIFT': (0.15, 1),
   'POSSIBLE_DRIFT': (0.05, 0.15),
   'DRIFT': (0, 0.05)},
  'drift_category': 'DRIFT'},
 'weight': {'algorithm': 'ks',
  'pvalue': 0.0,
  'statistic': 1.0,
  'thresholds': {'NO_DRIFT': (0.15, 1),
   'POSSIBLE_DRIFT': (0.05, 0.15),
   'DRIFT': (0, 0.05)},
  'drift_category': 'DRIFT'}}

The scores object is a dictionary with the drift scores for each feature with additional metadata.
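Because the result is a plain dictionary, it is easy to post-process. The sketch below lists the columns that fall into a given drift category. It uses a trimmed, hand-copied version of the output above so that it runs without whylogs, and `columns_in_category` is a hypothetical helper, not part of the library.

```python
# Trimmed copy of the Out[2] result; thresholds are omitted for brevity.
scores = {
    "animal": {"algorithm": "chi-square", "pvalue": 1.0, "statistic": 0.0, "drift_category": "NO_DRIFT"},
    "legs": {"algorithm": "ks", "pvalue": 0.0, "statistic": 1.0, "drift_category": "DRIFT"},
    "weight": {"algorithm": "ks", "pvalue": 0.0, "statistic": 1.0, "drift_category": "DRIFT"},
}


def columns_in_category(scores, category):
    """Return the sorted column names whose drift_category equals `category`."""
    return sorted(
        column
        for column, result in scores.items()
        # Defensive guard (an assumption): skip columns that carry no result.
        if result is not None and result.get("drift_category") == category
    )


print(columns_in_category(scores, "DRIFT"))  # ['legs', 'weight']
```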

Algorithm Selection

We can see that the KS test was applied to both weight and legs, and chi-square was applied to animal. The default logic for choosing a drift algorithm is the following: KS is calculated if a distribution metric exists for the column. If not, chi-square is calculated if the frequent items, cardinality, and count metrics exist. If none of these apply, no drift value is associated with the column.
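The default selection order can be sketched as a small decision function. This is only an illustration of the rule as described here, not the whylogs implementation: `pick_default_algorithm` is a hypothetical name, and the strings used for the metrics are illustrative labels.

```python
from typing import Optional, Set


def pick_default_algorithm(metrics: Set[str]) -> Optional[str]:
    """Mirror the described fallback order: KS first, then chi-square, else nothing."""
    if "distribution" in metrics:
        return "ks"
    # Chi-square needs all three of these metrics to be present.
    if {"frequent_items", "cardinality", "counts"} <= metrics:
        return "chi-square"
    return None


# A numeric column typically has a distribution metric, a string column does not.
print(pick_default_algorithm({"distribution", "counts", "cardinality"}))
print(pick_default_algorithm({"frequent_items", "cardinality", "counts"}))
print(pick_default_algorithm({"counts"}))
```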

Thresholds

We can also see the thresholds defined by default for each algorithm. Each drift category contains a tuple defining a range: if the measure falls within the range, then the drift category is assigned to the column. For each range, the lower bound is inclusive, while the upper bound is exclusive, except for the maximum upper bound, which is inclusive.
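To make the boundary rule concrete, here is a minimal sketch of the range check in plain Python. `categorize` is a hypothetical helper written for this example and is not taken from the whylogs source; the thresholds are the p-value defaults shown in the output above.

```python
def categorize(value, thresholds):
    """Return the category whose range contains `value`, or None if there is none.

    Lower bounds are inclusive and upper bounds are exclusive, except for the
    largest upper bound, which is inclusive.
    """
    max_upper = max(upper for _, upper in thresholds.values())
    for category, (lower, upper) in thresholds.items():
        in_half_open_range = lower <= value < upper
        at_inclusive_maximum = upper == max_upper and value == upper
        if in_half_open_range or at_inclusive_maximum:
            return category
    return None


pvalue_thresholds = {
    "NO_DRIFT": (0.15, 1),
    "POSSIBLE_DRIFT": (0.05, 0.15),
    "DRIFT": (0, 0.05),
}

print(categorize(1.0, pvalue_thresholds))   # NO_DRIFT (maximum upper bound is inclusive)
print(categorize(0.05, pvalue_thresholds))  # POSSIBLE_DRIFT (lower bound is inclusive)
print(categorize(0.0, pvalue_thresholds))   # DRIFT
```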

The drift categorization will use either the pvalue or statistic value, depending on the algorithm. Both KS and Chi Square tests compare the pvalue against the thresholds, while the Hellinger distance compares the statistic value against the thresholds.
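As a small illustration of that rule, the sketch below picks the value to compare against the thresholds from a single result entry. `drift_measure` is a hypothetical helper, and the algorithm names are the strings that appear in the `algorithm` field of the output above.

```python
# Algorithms whose categorization is based on the p-value, per the text above.
PVALUE_ALGORITHMS = {"ks", "chi-square"}


def drift_measure(result):
    """Return the value that is compared against the thresholds for this result."""
    if result["algorithm"] in PVALUE_ALGORITHMS:
        return result["pvalue"]
    # Hellinger distance reports no p-value, so the statistic is used instead.
    return result["statistic"]


print(drift_measure({"algorithm": "ks", "pvalue": 0.0, "statistic": 1.0}))
print(drift_measure({"algorithm": "hellinger", "pvalue": None, "statistic": 0.4283729905961321}))
```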

We can also visualize this information integrated with the NotebookProfileVisualizer summary_drift_report:

In [3]:

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.set_profiles(target_profile_view=target_view, reference_profile_view=ref_view)

visualization.summary_drift_report()

Out[3]:

Feel free to explore the dashboard! You can search by column names and filter by drift categorization. You can also click on each column's drift category to check the thresholds that were used for the categorization.

Calculating Drift - Choosing a Drift Algorithm

You can override the default algorithm selection logic by explicitly stating which algorithms you want to run.

Suppose now we want the Hellinger distance to be used for the weight column, and chi-square to be used for the legs column. We can do this by passing a dictionary with the column names as keys and the drift algorithms as values.

In [4]:

from whylogs.viz.drift.column_drift_algorithms import Hellinger, ChiSquare

drift_map = {"weight": Hellinger(), "legs": ChiSquare()}

scores = calculate_drift_scores(target_view=target_view, reference_view=ref_view, with_thresholds=True, drift_map=drift_map)

scores

Out[4]:

{'animal': {'algorithm': 'chi-square',
  'pvalue': 1.0,
  'statistic': 0.0,
  'thresholds': {'NO_DRIFT': (0.15, 1),
   'POSSIBLE_DRIFT': (0.05, 0.15),
   'DRIFT': (0, 0.05)},
  'drift_category': 'NO_DRIFT'},
 'legs': {'algorithm': 'chi-square',
  'pvalue': 0.0,
  'statistic': inf,
  'thresholds': {'NO_DRIFT': (0.15, 1),
   'POSSIBLE_DRIFT': (0.05, 0.15),
   'DRIFT': (0, 0.05)},
  'drift_category': 'DRIFT'},
 'weight': {'algorithm': 'hellinger',
  'pvalue': None,
  'statistic': 0.4283729905961321,
  'thresholds': {'NO_DRIFT': (0, 0.15),
   'POSSIBLE_DRIFT': (0.15, 0.4),
   'DRIFT': (0.4, 1)},
  'drift_category': 'DRIFT'}}

Note that we didn't specify an algorithm for animal, and we got a drift score nonetheless. If you don't specify an algorithm for a column, the default algorithm selection logic will be used.
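The fallback behavior can be pictured as a dictionary lookup with a default. The sketch below uses plain strings in place of algorithm objects, and `resolve_algorithms` is a hypothetical helper for illustration, not a whylogs function.

```python
def resolve_algorithms(default_choices, drift_map):
    """Use the explicit choice for a column when given, otherwise keep the default."""
    return {
        column: drift_map.get(column, default)
        for column, default in default_choices.items()
    }


# Defaults as observed in the first run, and the overrides from this section.
default_choices = {"animal": "chi-square", "legs": "ks", "weight": "ks"}
overrides = {"weight": "hellinger", "legs": "chi-square"}

print(resolve_algorithms(default_choices, overrides))
# {'animal': 'chi-square', 'legs': 'chi-square', 'weight': 'hellinger'}
```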

In the Visualizer's case, you can choose the algorithms by using the add_drift_config method.

The cell below will also define hellinger for weight and chi-squared for legs:

In [5]:

visualization.add_drift_config(column_names=["weight"], algorithm=Hellinger())
visualization.add_drift_config(column_names=["legs"], algorithm=ChiSquare())

visualization.summary_drift_report()

Out[5]:

Calculating Drift - Customizing Internal Parameters

In addition to selecting which algorithms to use, you can also customize the internal parameters of each algorithm.

For example, suppose we want to change the thresholds for the hellinger algorithm, making it less sensitive to drift. We can do this by passing a parameter_config object when instantiating the Hellinger algorithm. One of the parameters in the parameter_config object is a DriftThresholds object, which contains the thresholds for the drift categorization.

We might also want to change the KS algorithm. In whylogs, the quantiles are split into 100 bins by default. If you want to use another number, you can create a KSTestConfig object with your own value for quantiles and pass it to the KSTest algorithm.

Finally, suppose we don't want the Chi Square algorithm to categorize into 3 different classes. We want only a binary categorization, where the column is either drifted or not. We can do this by passing a ChiSquareConfig object with a DriftThresholds object with only two thresholds.

Let's see how to create those config objects and how to pass them to either calculate_drift_scores or summary_drift_report:

In [6]:

from whylogs.viz.drift.configs import KSTestConfig, HellingerConfig, ChiSquareConfig, DriftThresholds

hellingerconfig = HellingerConfig(thresholds=DriftThresholds(NO_DRIFT=(0, 0.15), POSSIBLE_DRIFT=(0.15, 0.5), DRIFT=(0.5, 1)))

quantiles = [0.0, 0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99, 1.0]
ksconfig = KSTestConfig(quantiles=quantiles)

chisquareconfig = ChiSquareConfig(thresholds=DriftThresholds(DRIFT=(0, 0.1), NO_DRIFT=(0.1, 1)))
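When writing custom thresholds, it can help to check that the ranges leave no gaps, so that every possible value maps to a category. The sketch below is one way to do that on plain tuples. It assumes the measure lies in [0, 1], and `covers_unit_interval` is a hypothetical helper, not something whylogs provides.

```python
import math


def covers_unit_interval(thresholds):
    """Return True if the ranges start at 0, end at 1, and are contiguous."""
    ranges = sorted(thresholds.values())
    if not ranges:
        return False
    starts_at_zero = math.isclose(ranges[0][0], 0.0, abs_tol=1e-12)
    ends_at_one = math.isclose(ranges[-1][1], 1.0)
    # Each range must begin exactly where the previous one ended.
    contiguous = all(
        math.isclose(previous[1], following[0])
        for previous, following in zip(ranges, ranges[1:])
    )
    return starts_at_zero and ends_at_one and contiguous


binary_thresholds = {"DRIFT": (0, 0.1), "NO_DRIFT": (0.1, 1)}
gapped_thresholds = {"DRIFT": (0, 0.1), "NO_DRIFT": (0.2, 1)}

print(covers_unit_interval(binary_thresholds))  # True
print(covers_unit_interval(gapped_thresholds))  # False
```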

With the configs at hand, we can now pass them to the Drift Algorithms:

In [7]:

from whylogs.viz.drift.column_drift_algorithms import Hellinger, ChiSquare, KS

drift_map = {
    "weight": Hellinger(hellingerconfig),
    "animal": ChiSquare(chisquareconfig),
    "legs": KS(ksconfig),
}

scores = calculate_drift_scores(target_view=target_view, reference_view=ref_view, with_thresholds=True, drift_map=drift_map)
scores

Out[7]:

{'animal': {'algorithm': 'chi-square',
  'pvalue': 1.0,
  'statistic': 0.0,
  'thresholds': {'NO_DRIFT': (0.1, 1), 'DRIFT': (0, 0.1)},
  'drift_category': 'NO_DRIFT'},
 'legs': {'algorithm': 'ks',
  'pvalue': 0.0,
  'statistic': 1.0,
  'thresholds': {'NO_DRIFT': (0.15, 1),
   'POSSIBLE_DRIFT': (0.05, 0.15),
   'DRIFT': (0, 0.05)},
  'drift_category': 'DRIFT'},
 'weight': {'algorithm': 'hellinger',
  'pvalue': None,
  'statistic': 0.4283729905961321,
  'thresholds': {'NO_DRIFT': (0, 0.15),
   'POSSIBLE_DRIFT': (0.15, 0.5),
   'DRIFT': (0.5, 1)},
  'drift_category': 'POSSIBLE_DRIFT'}}

In [8]:

visualization.add_drift_config(column_names=["weight"], algorithm=Hellinger(hellingerconfig))
visualization.add_drift_config(column_names=["animal"], algorithm=ChiSquare(chisquareconfig))
visualization.add_drift_config(column_names=["legs"], algorithm=KS(ksconfig))

visualization.summary_drift_report()

Overwriting existing drift configuration for column weight.
Overwriting existing drift configuration for column legs.

Out[8]:

You can check either the scores object or the drift report to confirm that the changes were applied.
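For the weight column, the effect of the new thresholds can be reproduced by hand. The sketch below takes the Hellinger statistic reported above and checks it against the default and the customized ranges. `category_of` is a hypothetical helper that uses a simple half-open range check, which is enough for this value.

```python
def category_of(value, thresholds):
    """Return the first category whose [lower, upper) range contains `value`."""
    for category, (lower, upper) in thresholds.items():
        if lower <= value < upper:
            return category
    return None


weight_statistic = 0.4283729905961321  # Hellinger statistic from the output above

default_thresholds = {"NO_DRIFT": (0, 0.15), "POSSIBLE_DRIFT": (0.15, 0.4), "DRIFT": (0.4, 1)}
custom_thresholds = {"NO_DRIFT": (0, 0.15), "POSSIBLE_DRIFT": (0.15, 0.5), "DRIFT": (0.5, 1)}

print(category_of(weight_statistic, default_thresholds))  # DRIFT
print(category_of(weight_statistic, custom_thresholds))   # POSSIBLE_DRIFT
```

Widening the POSSIBLE_DRIFT range from 0.4 to 0.5 is what moves weight out of the DRIFT category, which matches the less sensitive behavior we asked for.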

Stay tuned for more Drift Algorithms to be added to whylogs!
