MLflow + whylogs Integration

In this notebook, we will explore the MLflow integration in whylogs.

This example uses the wine quality dataset from MLflow's tutorial for demonstration purposes.

This tutorial shows how to use the whylogs integration to:

- Capture data quality metrics for each batch of data run through a linear regression model tracked in MLflow
- Extract whylogs data from the MLflow backend back into an in-memory format
- Visualize this data

Getting Started

To run this tutorial:

1. Install conda.
2. Create a new environment with conda: conda create --name whylogs-mlflow python=3.8
   - Activate the environment: conda activate whylogs-mlflow
   - Install pip into the conda environment: conda install pip
   - To make the environment work with Jupyter notebooks, install the kernel module: pip install ipykernel
   - Register the environment as a Jupyter notebook kernel: python -m ipykernel install --user --name=whylogs-mlflow
3. Install MLflow with scikit-learn: pip install "mlflow[extras]" (the quotes keep shells such as zsh from expanding the brackets)
4. Install whylogs with matplotlib: pip install "whylogs[viz]<1.0". This notebook uses the whylogs v0 session API (get_or_create_session, enable_mlflow), which is not part of whylogs 1.x.
5. Alternatively, install the necessary libraries separately:
   - MLflow: pip install mlflow
   - whylogs: pip install "whylogs<1.0"
   - scikit-learn: pip install scikit-learn
   - matplotlib: pip install matplotlib
6. In your notebook, select whylogs-mlflow as the kernel.
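Before running the notebook, it can help to confirm which of these packages the selected kernel actually sees. The sketch below is one way to do that with the standard library's importlib.metadata (available from Python 3.8, matching the environment above); the helper name installed_versions is hypothetical and not part of whylogs or MLflow.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version string, or None if it is missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # the package is not installed in this environment/kernel
            report[name] = None
    return report


if __name__ == "__main__":
    for pkg, ver in installed_versions(["mlflow", "whylogs", "scikit-learn", "matplotlib"]).items():
        print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```

If whylogs reports a 1.x version here, the session-based cells below are expected to fail.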

Setup

First, we filter out noisy warnings.

In [1]:

import warnings
# ignore all warnings for the rest of the session (this covers what simplefilter('ignore') would do)
warnings.filterwarnings("ignore")
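Ignoring warnings globally also hides useful messages, such as deprecation notices from MLflow or whylogs. If you only want to silence a specific noisy call, a narrower option is the standard library's warnings.catch_warnings context manager, which restores the previous filters on exit. A minimal sketch; the helper call_quietly and the function noisy are hypothetical.

```python
import warnings


def call_quietly(func, *args, **kwargs):
    """Run func with all warnings suppressed, restoring the previous warning filters afterwards."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func(*args, **kwargs)


def noisy():
    # stand-in for a library call that emits a warning
    warnings.warn("this would normally be shown", UserWarning)
    return 42


if __name__ == "__main__":
    print(call_quietly(noisy))  # the warning is suppressed for this call only
```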

In [2]:

import random
import time

import pandas as pd
import mlflow
import whylogs

from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

Enable whylogs Integration

Enable whylogs in MLflow so that a whylogs statistical profile is stored with every run. whylogs.enable_mlflow returns True if whylogs is able to patch MLflow.

In [3]:

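from packaging import version
from whylogs import get_or_create_session

# MLflow integration needs whylogs 0.1.13 or later; this notebook uses the v0 session API (< 1.0)
whylogs_version = version.parse(whylogs.__version__)
assert version.parse("0.1.13") <= whylogs_version < version.parse("1.0"), whylogs.__version__
session = get_or_create_session(".whylogs_mlflow.yaml")
whylogs.enable_mlflow(session)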

Out[3]:

True
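Version strings should be compared numerically, not as text: as plain strings, "0.1.9" >= "0.1.13" evaluates to True because "9" sorts after "1". The packaging library handles this in general; the standard-library sketch below illustrates the idea for plain dotted releases only. The helper names and the default bounds (0.1.13 up to, but excluding, 1.0.0) are assumptions chosen to match the setup above.

```python
def parse_release(version_string):
    """Turn a plain release string like '0.1.13' into a tuple of ints (0, 1, 13).

    Simplification: only dot-separated numeric parts are supported; pre-release or
    local suffixes such as '0.7.0rc1' or '1.0.0+local' are not handled.
    """
    return tuple(int(part) for part in version_string.split("."))


def is_supported(version_string, minimum="0.1.13", below="1.0.0"):
    """True if minimum <= version < below, comparing numerically rather than as text."""
    v = parse_release(version_string)
    return parse_release(minimum) <= v < parse_release(below)


if __name__ == "__main__":
    print("0.1.9" >= "0.1.13")     # True: string comparison is lexicographic
    print(is_supported("0.1.9"))   # False: 9 < 13 numerically
    print(is_supported("0.7.1"))   # True
```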

Dataset Preparation

Download and prepare the UCI wine quality dataset. Later, we sample the test set into small batches to simulate data arriving every second.

In [4]:

data_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
data = pd.read_csv(data_url, sep=";")
data

Out[4]:

	fixed acidity	volatile acidity	citric acid	residual sugar	chlorides	free sulfur dioxide	total sulfur dioxide	density	pH	sulphates	alcohol	quality
0	7.4	0.700	0.00	1.9	0.076	11.0	34.0	0.99780	3.51	0.56	9.4	5
1	7.8	0.880	0.00	2.6	0.098	25.0	67.0	0.99680	3.20	0.68	9.8	5
2	7.8	0.760	0.04	2.3	0.092	15.0	54.0	0.99700	3.26	0.65	9.8	5
3	11.2	0.280	0.56	1.9	0.075	17.0	60.0	0.99800	3.16	0.58	9.8	6
4	7.4	0.700	0.00	1.9	0.076	11.0	34.0	0.99780	3.51	0.56	9.4	5
...	...	...	...	...	...	...	...	...	...	...	...	...
1594	6.2	0.600	0.08	2.0	0.090	32.0	44.0	0.99490	3.45	0.58	10.5	5
1595	5.9	0.550	0.10	2.2	0.062	39.0	51.0	0.99512	3.52	0.76	11.2	6
1596	6.3	0.510	0.13	2.3	0.076	29.0	40.0	0.99574	3.42	0.75	11.0	6
1597	5.9	0.645	0.12	2.0	0.075	32.0	44.0	0.99547	3.57	0.71	10.2	5
1598	6.0	0.310	0.47	3.6	0.067	18.0	42.0	0.99549	3.39	0.66	11.0	6

1599 rows × 12 columns

In [5]:

# Split the data into training and test sets
train, test = train_test_split(data)
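With no test_size argument, train_test_split holds out 25% of the rows, which is why the training profile below counts 1199 of the 1599 rows. The split is random on every run; passing random_state=<some int> makes it reproducible. The standard-library sketch below mirrors scikit-learn's sizing for a float test_size (round the test set up, give the rest to training); split_rows is a hypothetical helper, not the scikit-learn implementation.

```python
import math
import random


def split_rows(rows, test_size=0.25, seed=None):
    """Shuffle rows and split them into (train, test).

    Sizing: n_test = ceil(test_size * n) and n_train = n - n_test.
    A dedicated random.Random(seed) makes the shuffle reproducible.
    """
    n = len(rows)
    n_test = math.ceil(test_size * n)
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train_rows, test_rows = split_rows(range(1599), seed=0)
    print(len(train_rows), len(test_rows))  # 1199 400
```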

We can quickly get a sense of how the training data is distributed by profiling it with whylogs. Later, we can compare these baseline metrics to the profiles of the batches as they flow through our model.

If you'd like to learn more about whylogs, check out our introductory notebook.

In [6]:

summary = session.profile_dataframe(train, "training-data").flat_summary()['summary']

summary

Out[6]:

	column	count	numeric_count	max	mean	min	stddev	nunique_numbers	...	quantile_0.0000	quantile_0.0100	quantile_0.0500	quantile_0.2500	quantile_0.5000	quantile_0.7500	quantile_0.9500	quantile_0.9900	quantile_1.0000
0	volatile acidity	1199.0	1199.0	1.58000	0.526405	0.12000	0.177930	135.0	...	0.12000	0.1800	0.2700	0.39000	0.52000	0.6300	0.8350	1.0200	1.58000
1	fixed acidity	1199.0	1199.0	15.90000	8.299333	4.60000	1.722787	93.0	...	4.60000	5.2000	6.2000	7.10000	7.90000	9.2000	11.7000	13.7000	15.90000
2	chlorides	1199.0	1199.0	0.61100	0.086253	0.01200	0.044362	134.0	...	0.01200	0.0430	0.0540	0.07000	0.07900	0.0890	0.1240	0.3680	0.61100
3	residual sugar	1199.0	1199.0	15.50000	2.552168	0.90000	1.432984	85.0	...	0.90000	1.3000	1.5000	1.90000	2.20000	2.6000	5.2000	8.8000	15.50000
4	quality	1199.0	1199.0	8.00000	5.629691	3.00000	0.807717	6.0	...	3.00000	4.0000	5.0000	5.00000	6.00000	6.0000	7.0000	7.0000	8.00000
5	pH	1199.0	1199.0	4.01000	3.312952	2.86000	0.153886	85.0	...	2.86000	2.9800	3.0600	3.21000	3.31000	3.4100	3.5700	3.7100	4.01000
6	density	1199.0	1199.0	1.00369	0.996719	0.99007	0.001885	385.0	...	0.99007	0.9922	0.9936	0.99558	0.99675	0.9978	0.9998	1.0022	1.00369
7	alcohol	1199.0	1199.0	14.90000	10.432569	8.40000	1.058259	60.0	...	8.40000	9.0000	9.2000	9.50000	10.20000	11.1000	12.4000	13.4000	14.90000
8	total sulfur dioxide	1199.0	1199.0	289.00000	46.918265	6.00000	33.241459	137.0	...	6.00000	8.0000	11.0000	22.00000	38.00000	63.0000	112.0000	144.0000	289.00000
9	free sulfur dioxide	1199.0	1199.0	68.00000	16.123436	1.00000	10.542138	58.0	...	1.00000	3.0000	4.0000	8.00000	14.00000	22.0000	36.0000	51.0000	68.00000
10	citric acid	1199.0	1199.0	0.78000	0.270534	0.00000	0.193169	77.0	...	0.00000	0.0000	0.0000	0.10000	0.26000	0.4300	0.6000	0.7300	0.78000
11	sulphates	1199.0	1199.0	1.98000	0.655446	0.33000	0.165561	89.0	...	0.33000	0.4200	0.4700	0.55000	0.62000	0.7300	0.9300	1.3100	1.98000

12 rows × 40 columns
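Columns such as count, mean, stddev, min and max above can be computed in a single pass over the data, and whylogs profiles are designed so that partial results from separate batches can be merged. The sketch below shows that idea for these simple moments only, using Welford's online update and Chan et al.'s pairwise merge. It is a simplified illustration, not whylogs' internal implementation (which also keeps sketches for quantiles and distinct counts), and it uses the sample standard deviation (n - 1 in the denominator).

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Streaming count/mean/min/max/stddev for one numeric column."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0              # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, x):
        # Welford's online update: one value at a time, no need to keep raw data
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def merge(self, other):
        # Chan et al. pairwise combination of two partial results into a new object
        if other.count == 0:
            return RunningStats(self.count, self.mean, self.m2, self.minimum, self.maximum)
        if self.count == 0:
            return RunningStats(other.count, other.mean, other.m2, other.minimum, other.maximum)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return RunningStats(n, mean, m2, min(self.minimum, other.minimum),
                            max(self.maximum, other.maximum))

    @property
    def stddev(self):
        # sample standard deviation
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0
```

Merging the summaries of two batches gives the same result as summarizing all their rows at once, up to floating-point rounding.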

Now that we've taken a peek at our training data metrics, there's one last item on our to-do list: split the test data into batches, so we can feed them through our model later on.

In [7]:

# Separate the target variable "quality" (y) from the features (x)
train_x = train.drop(["quality"], axis=1).reset_index(drop=True)
test_x = test.drop(["quality"], axis=1).reset_index(drop=True)
train_y = train[["quality"]].reset_index(drop=True)
test_y = test[["quality"]].reset_index(drop=True)

subset_test_x = []
subset_test_y = []
num_batches = 20
for i in range(num_batches):
    # pick 5 distinct row positions per batch; the same row may appear in several batches
    indices = random.sample(range(len(test)), 5)
    subset_test_x.append(test_x.loc[indices, :])
    subset_test_y.append(test_y.loc[indices, :])
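Because test_x and test_y were rebuilt with reset_index(drop=True), their labels run from 0 to len(test) - 1, so the sampled positions can be used directly with .loc. The batches above change every time the cell runs because random.sample uses the global random state. The sketch below shows one way to make them reproducible with a dedicated, seeded random.Random; sample_batches and the seed value are hypothetical.

```python
import random


def sample_batches(n_rows, num_batches=20, batch_size=5, seed=None):
    """Draw num_batches lists of row positions in [0, n_rows).

    Rows are unique within a batch (sample draws without replacement),
    but the same row may appear in several batches, as in the cell above.
    """
    rng = random.Random(seed)  # independent of the global random state
    return [rng.sample(range(n_rows), batch_size) for _ in range(num_batches)]


if __name__ == "__main__":
    batches = sample_batches(400, seed=42)
    print(len(batches), batches[0])
```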

Train the model

We'll train an ElasticNet model using scikit-learn, then run it on each batch of data, logging the model parameters, the mean absolute error, and whylogs data metrics for every batch.

Note that whylogs profile data is logged automatically when mlflow.end_run() is called, either explicitly or implicitly (for example, when a with mlflow.start_run(): block exits).
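The "implicit" case relies on Python's context-manager protocol: leaving a with block calls the object's __exit__ method, which can end the run and trigger any end-of-run hooks. The toy class below is a hypothetical stand-in used only to illustrate that pattern; it does not reproduce MLflow's or whylogs' internals.

```python
class ToyRun:
    """Hypothetical stand-in for a tracked run.

    Leaving the `with` block calls end() automatically (the implicit case);
    calling end() yourself is the explicit case. Either way the hooks run once.
    """

    def __init__(self, on_end):
        self.on_end = list(on_end)
        self.ended = False

    def end(self):
        if not self.ended:
            self.ended = True
            for hook in self.on_end:
                hook()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.end()      # runs even if the block raised
        return False    # do not swallow exceptions


if __name__ == "__main__":
    events = []
    with ToyRun(on_end=[lambda: events.append("profile flushed")]):
        events.append("batch logged")
    print(events)
```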

In [8]:

# Create an MLflow experiment for our demo
experiment_name = "whylogs demo"
mlflow.set_experiment(experiment_name)

model_params = {"alpha": 1.0,
                "l1_ratio": 0.7}

lr = ElasticNet(**model_params)
lr.fit(train_x, train_y)
print("ElasticNet model (%s):" % model_params)

2022/02/25 14:00:22 INFO mlflow.tracking.fluent: Experiment with name 'whylogs demo' does not exist. Creating a new experiment.

ElasticNet model ({'alpha': 1.0, 'l1_ratio': 0.7}):
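For reference, scikit-learn's documentation gives the ElasticNet objective below, where ρ stands for l1_ratio. Substituting the model_params used here shows how the two penalty terms are weighted.

```latex
\min_{w}\; \frac{1}{2\, n_{\text{samples}}} \lVert y - Xw \rVert_2^2
  \;+\; \alpha \rho \lVert w \rVert_1
  \;+\; \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2,
\qquad \rho = \texttt{l1\_ratio}

\text{With } \alpha = 1.0,\ \rho = 0.7:\qquad
\alpha \rho = 0.7, \qquad \frac{\alpha (1 - \rho)}{2} = \frac{1.0 \times 0.3}{2} = 0.15
```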

In [9]:

# run predictions on the batches of data we set up earlier and log whylogs data
for i in range(num_batches):
    with mlflow.start_run(run_name=f"Run {i+1}"):
        batch = subset_test_x[i]
        predicted_output = lr.predict(batch)

        mae = mean_absolute_error(subset_test_y[i], predicted_output)
        print("Subset %d, mean absolute error: %s" % (i + 1, mae))

        mlflow.log_params(model_params)
        mlflow.log_metric("mae", mae)

        # use whylogs to log data quality metrics for the current batch
        mlflow.whylogs.log_pandas(batch)

    # wait a second between runs to create a time series of prediction results
    time.sleep(1)

Subset 1, mean absolute error: 0.7208114956462538
Subset 2, mean absolute error: 0.4008819363599402
Subset 3, mean absolute error: 0.5837474925658647
Subset 4, mean absolute error: 0.43951696217029623
Subset 5, mean absolute error: 0.5041402918562679
Subset 6, mean absolute error: 0.8977011208878751
Subset 7, mean absolute error: 0.5341530775871286
Subset 8, mean absolute error: 0.486432571693188
Subset 9, mean absolute error: 0.6258337238624618
Subset 10, mean absolute error: 0.5217704719950854
Subset 11, mean absolute error: 0.7837474925658648
Subset 12, mean absolute error: 0.7451900067797711
Subset 13, mean absolute error: 0.31774599013983984
Subset 14, mean absolute error: 0.2855501352978765
Subset 15, mean absolute error: 0.6441147203945459
Subset 16, mean absolute error: 0.5797230107106193
Subset 17, mean absolute error: 0.8103090728104844
Subset 18, mean absolute error: 0.8362973766860998
Subset 19, mean absolute error: 0.4783836079826974
Subset 20, mean absolute error: 0.7003474300030671
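Each value above is the mean absolute error, MAE = (1/n) Σ |y_i - ŷ_i|, over a batch of only 5 rows, so part of the spread between batches (roughly 0.29 to 0.90 here) is likely small-sample noise. A minimal pure-Python version of the formula, for illustration only (the notebook itself uses sklearn.metrics.mean_absolute_error); the example values are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted| over paired values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # hypothetical 5-row batch: true quality scores vs. a constant prediction of 5.5
    print(mae([5, 6, 5, 7, 6], [5.5, 5.5, 5.5, 5.5, 5.5]))  # 0.7
```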

Accessing whylogs Data From Your Experiment

Now, let's explore our whylogs data inside the MLflow experiment.

In [10]:

client = mlflow.tracking.MlflowClient()
experiment = client.get_experiment_by_name(experiment_name)
experiment.name, experiment.experiment_id

Out[10]:

('whylogs demo', '1')

MLflow stores the data profiles as artifacts. These can be retrieved with the same MLflow tracking APIs you use to access runs, parameters, and metrics.

whylogs also exposes a helper API for accessing the whylogs-specific output of an experiment.
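When tracking to a local file store, as in this run, the artifact_uri values in the RunInfo output below follow the layout mlruns/<experiment_id>/<run_id>/artifacts. The standard-library sketch below lists those directories so you can inspect the stored files by hand. It assumes that local layout; remote tracking servers or artifact stores need MLflow's client APIs instead. The helper local_artifact_dirs and the demo_layout function are hypothetical.

```python
import tempfile
from pathlib import Path


def local_artifact_dirs(mlruns_root, experiment_id):
    """Map run_id -> artifacts directory for one experiment in a local MLflow file store.

    Assumes the layout <mlruns_root>/<experiment_id>/<run_id>/artifacts.
    Plain files in the experiment directory (such as meta.yaml) are skipped.
    """
    experiment_dir = Path(mlruns_root) / str(experiment_id)
    result = {}
    if not experiment_dir.is_dir():
        return result
    for run_dir in sorted(experiment_dir.iterdir()):
        artifacts = run_dir / "artifacts"
        if run_dir.is_dir() and artifacts.is_dir():
            result[run_dir.name] = artifacts
    return result


def demo_layout():
    """Build a throwaway directory tree with one run and return the run IDs found."""
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "1" / "abc123" / "artifacts").mkdir(parents=True)
        (Path(root) / "1" / "meta.yaml").touch()
        return sorted(local_artifact_dirs(root, "1"))


if __name__ == "__main__":
    for run_id, path in local_artifact_dirs("mlruns", "1").items():
        print(run_id, [p.name for p in path.rglob("*") if p.is_file()])
```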

In [11]:

whylogs.mlflow.list_whylogs_runs(experiment.experiment_id)

Out[11]:

[<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/79206072408e4b2c974620af776ffb66/artifacts', end_time=1645808453917, experiment_id='1', lifecycle_stage='active', run_id='79206072408e4b2c974620af776ffb66', run_uuid='79206072408e4b2c974620af776ffb66', start_time=1645808453471, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/2e4bcc59569b467483b9ca3d327cd2e1/artifacts', end_time=1645808452436, experiment_id='1', lifecycle_stage='active', run_id='2e4bcc59569b467483b9ca3d327cd2e1', run_uuid='2e4bcc59569b467483b9ca3d327cd2e1', start_time=1645808452052, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/9ce955a844154b84a3544a82dcff7dc8/artifacts', end_time=1645808451014, experiment_id='1', lifecycle_stage='active', run_id='9ce955a844154b84a3544a82dcff7dc8', run_uuid='9ce955a844154b84a3544a82dcff7dc8', start_time=1645808450591, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/013c2024164143aaa2133f0fdcc00a15/artifacts', end_time=1645808449526, experiment_id='1', lifecycle_stage='active', run_id='013c2024164143aaa2133f0fdcc00a15', run_uuid='013c2024164143aaa2133f0fdcc00a15', start_time=1645808449100, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/8e45419f7d664a9d82bfaf2bb6f189fe/artifacts', end_time=1645808448058, experiment_id='1', lifecycle_stage='active', run_id='8e45419f7d664a9d82bfaf2bb6f189fe', run_uuid='8e45419f7d664a9d82bfaf2bb6f189fe', start_time=1645808447560, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/e2cdeba39490490192d8e9c9eaddc6c4/artifacts', end_time=1645808446523, experiment_id='1', lifecycle_stage='active', run_id='e2cdeba39490490192d8e9c9eaddc6c4', run_uuid='e2cdeba39490490192d8e9c9eaddc6c4', start_time=1645808446132, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/15e68a30eb3449e9b3aad5b6dcde6448/artifacts', end_time=1645808445083, experiment_id='1', lifecycle_stage='active', run_id='15e68a30eb3449e9b3aad5b6dcde6448', run_uuid='15e68a30eb3449e9b3aad5b6dcde6448', start_time=1645808444476, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/df7df825e1f74ddc91fe8c7a41278b68/artifacts', end_time=1645808443425, experiment_id='1', lifecycle_stage='active', run_id='df7df825e1f74ddc91fe8c7a41278b68', run_uuid='df7df825e1f74ddc91fe8c7a41278b68', start_time=1645808442821, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/41a171472cf04e16af19593337bc9a73/artifacts', end_time=1645808441771, experiment_id='1', lifecycle_stage='active', run_id='41a171472cf04e16af19593337bc9a73', run_uuid='41a171472cf04e16af19593337bc9a73', start_time=1645808441077, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/9f560c6cfcac48d6b895b02bd64f928b/artifacts', end_time=1645808440023, experiment_id='1', lifecycle_stage='active', run_id='9f560c6cfcac48d6b895b02bd64f928b', run_uuid='9f560c6cfcac48d6b895b02bd64f928b', start_time=1645808439434, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/a3748043ee7d4828b5c8a61a07ff6ad3/artifacts', end_time=1645808438394, experiment_id='1', lifecycle_stage='active', run_id='a3748043ee7d4828b5c8a61a07ff6ad3', run_uuid='a3748043ee7d4828b5c8a61a07ff6ad3', start_time=1645808437873, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/f255ba85ce6344d3a7a7a089b160966b/artifacts', end_time=1645808436832, experiment_id='1', lifecycle_stage='active', run_id='f255ba85ce6344d3a7a7a089b160966b', run_uuid='f255ba85ce6344d3a7a7a089b160966b', start_time=1645808435946, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/0a98d85ff1cd46a999851871beb9f724/artifacts', end_time=1645808434895, experiment_id='1', lifecycle_stage='active', run_id='0a98d85ff1cd46a999851871beb9f724', run_uuid='0a98d85ff1cd46a999851871beb9f724', start_time=1645808434328, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/692776d9518a46c78725d93dbcad69c7/artifacts', end_time=1645808433288, experiment_id='1', lifecycle_stage='active', run_id='692776d9518a46c78725d93dbcad69c7', run_uuid='692776d9518a46c78725d93dbcad69c7', start_time=1645808432673, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/f5128e7e05dd4260859af061d916cde1/artifacts', end_time=1645808431632, experiment_id='1', lifecycle_stage='active', run_id='f5128e7e05dd4260859af061d916cde1', run_uuid='f5128e7e05dd4260859af061d916cde1', start_time=1645808431076, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/4a76ab88334d4e6fbcd42e682bdd0b8f/artifacts', end_time=1645808430033, experiment_id='1', lifecycle_stage='active', run_id='4a76ab88334d4e6fbcd42e682bdd0b8f', run_uuid='4a76ab88334d4e6fbcd42e682bdd0b8f', start_time=1645808429554, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/c215c5f4008a4316bc96817643d8edb4/artifacts', end_time=1645808428510, experiment_id='1', lifecycle_stage='active', run_id='c215c5f4008a4316bc96817643d8edb4', run_uuid='c215c5f4008a4316bc96817643d8edb4', start_time=1645808428045, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/19893b217bf5498f83ef13255874d3b0/artifacts', end_time=1645808427001, experiment_id='1', lifecycle_stage='active', run_id='19893b217bf5498f83ef13255874d3b0', run_uuid='19893b217bf5498f83ef13255874d3b0', start_time=1645808426457, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/3ef3a8a52529446b9f61594b35b9900a/artifacts', end_time=1645808425419, experiment_id='1', lifecycle_stage='active', run_id='3ef3a8a52529446b9f61594b35b9900a', run_uuid='3ef3a8a52529446b9f61594b35b9900a', start_time=1645808424953, status='FINISHED', user_id='felipeadachi'>,
 <RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/275c797d66294ebe86e16a57d044dfaf/artifacts', end_time=1645808423911, experiment_id='1', lifecycle_stage='active', run_id='275c797d66294ebe86e16a57d044dfaf', run_uuid='275c797d66294ebe86e16a57d044dfaf', start_time=1645808423122, status='FINISHED', user_id='felipeadachi'>]
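The runs above are listed newest first, and start_time/end_time are milliseconds since the Unix epoch. The sketch below sorts such records oldest first and converts them to readable times and durations. It formats times in UTC, while the MLflow log line earlier was printed in local time; run_timeline is a hypothetical helper and the two sample records are copied from the output above.

```python
from datetime import datetime, timezone


def run_timeline(runs):
    """Sort (run_id, start_ms, end_ms) records oldest first.

    Returns (run_id, start time as HH:MM:SS in UTC, duration in ms) tuples.
    """
    rows = []
    for run_id, start_ms, end_ms in sorted(runs, key=lambda r: r[1]):
        started = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
        rows.append((run_id, started.strftime("%H:%M:%S"), end_ms - start_ms))
    return rows


if __name__ == "__main__":
    sample = [
        ("79206072408e4b2c974620af776ffb66", 1645808453471, 1645808453917),
        ("275c797d66294ebe86e16a57d044dfaf", 1645808423122, 1645808423911),
    ]
    for row in run_timeline(sample):
        print(row)
```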

Visualizing whylogs Data

Our integration allows you to quickly collect the statistical profiles produced during experimentation.

In [12]:

mlflow_profiles = whylogs.mlflow.get_experiment_profiles(experiment.experiment_id)
mlflow_profiles

Out[12]:

[<whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e559c8b0>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e55aab20>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e553be50>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e554d100>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e555d2e0>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54f0550>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54ff7c0>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e5510a30>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e5521ca0>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54b1f70>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54c5220>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54d4520>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54e6700>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e5479970>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e5487be0>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e549ae50>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54aa100>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e543e370>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e544f5e0>,
 <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54608e0>]

You can then use whylogs.viz to easily produce visualizations for the whylogs profile data.

Below, you can see how the values in the free sulfur dioxide column changed across batches over time.
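Conceptually, a distribution-over-time plot summarizes each batch by a few order statistics. The sketch below computes min, quartiles, and max per batch from raw values with statistics.quantiles (Python 3.8+). It is only a rough stand-in: whylogs works from the summaries stored in its profiles rather than raw rows, and its rendering may differ. The helper per_batch_quartiles is hypothetical; the sample batch uses the free sulfur dioxide values of the first five rows of the dataset shown earlier.

```python
import statistics


def per_batch_quartiles(batches):
    """For each batch of numeric values, return (min, Q1, median, Q3, max).

    Each batch needs at least two values (a requirement of statistics.quantiles).
    """
    summary = []
    for values in batches:
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary.append((min(values), q1, median, q3, max(values)))
    return summary


if __name__ == "__main__":
    first_rows = [11.0, 25.0, 15.0, 17.0, 11.0]
    print(per_batch_quartiles([first_rows]))
```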

In [13]:

from whylogs.viz import ProfileVisualizer

viz = ProfileVisualizer()
viz.set_profiles(mlflow_profiles)

In [14]:

viz.plot_distribution("free sulfur dioxide", ts_format="%H:%M:%S")

Out[14]:

In [15]:

viz.plot_uniqueness("free sulfur dioxide", ts_format="%H:%M:%S")

Out[15]:

In [16]:

viz.plot_missing_values("free sulfur dioxide", ts_format="%H:%M:%S")

Out[16]:

In [17]:

viz.plot_data_types("free sulfur dioxide", ts_format="%H:%M:%S")

Out[17]:

We can also plot the mean absolute error (MAE) of each run for comparison.

In [18]:

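import matplotlib.pyplot as plt

plt.close('all')

# search_runs returns a pandas DataFrame; sort by start time so the line is drawn chronologically
runs = mlflow.search_runs([experiment.experiment_id]).sort_values("start_time")
plt.figure(figsize=(10, 2))
plt.plot(runs['start_time'], runs['metrics.mae'])
plt.xlabel("run start time")
plt.ylabel("MAE")
plt.show()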

In [19]:

session.close()

With whylogs, collecting and visualizing data quality metrics at both training and inference time for your MLflow runs is made dead simple. These metrics can be invaluable when trying to debug model failures or optimize their performance.

whylogs data can be visualized in more complex ways. Check out whylogs.viz for details on the API.

You can also see how WhyLabs can help you visualize data quality metrics by visiting our sandbox. Feel free to reach out to our Slack channel if you have any questions!