In this notebook, we will explore the MLflow integration in whylogs.
This example uses the data from MLflow's tutorial for demonstration purposes.
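This tutorial showcases how you can use the whylogs integration with MLflow to store whylogs data profiles with every MLflow run, retrieve them from an experiment, and visualize them.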
To run this tutorial, create and activate a conda environment, then install pip and ipykernel:
conda create --name whylogs-mlflow python=3.8
conda activate whylogs-mlflow
conda install pip
pip install ipykernel
Install the kernel module:
python -m ipykernel install --user --name=whylogs-mlflow
Install the dependencies:
pip install mlflow[extras]
pip install whylogs[viz]
pip install mlflow
pip install whylogs
pip install scikit-learn
pip install matplotlib
Then select whylogs-mlflow as your kernel.
First, we want to filter out noisy warnings.
import warnings
warnings.filterwarnings("ignore")
warnings.simplefilter('ignore')
import random
import time
import pandas as pd
import mlflow
import whylogs
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
Enable whylogs in MLflow to allow storing whylogs statistical profiles with every run. This method returns True if whylogs is able to patch MLflow.
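Once MLflow is patched, later cells can call mlflow.whylogs.log_pandas(batch). The toy sketch below only illustrates the general idea of attaching a helper namespace to a module object at runtime. It is an assumption about the pattern, not whylogs' actual implementation, and every name in it (fake_mlflow, enable_toy_integration, logged_batches) is hypothetical.

```python
import types

# Hypothetical stand-in for the real mlflow module.
fake_mlflow = types.ModuleType("fake_mlflow")


def enable_toy_integration(module: types.ModuleType) -> bool:
    """Attach a hypothetical 'whylogs' namespace to `module` and report success."""
    logged_batches = []

    def log_pandas(batch):
        # A real integration would profile the batch; this toy only records it.
        logged_batches.append(batch)

    module.whylogs = types.SimpleNamespace(
        log_pandas=log_pandas, logged_batches=logged_batches
    )
    return True


print(enable_toy_integration(fake_mlflow))  # True
fake_mlflow.whylogs.log_pandas({"alcohol": [9.4, 9.8]})
print(len(fake_mlflow.whylogs.logged_batches))  # 1
```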
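from packaging.version import Version
from whylogs import get_or_create_session
# MLflow integration requires whylogs 0.1.13 or later; compare parsed versions, not raw strings
assert Version(whylogs.__version__) >= Version("0.1.13")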
session = get_or_create_session(".whylogs_mlflow.yaml")
whylogs.enable_mlflow(session)
True
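A version check needs care: compared as plain strings, versions are ordered character by character, so "0.1.9" >= "0.1.13" evaluates to True even though 0.1.9 is the older release. This minimal stdlib sketch compares integer tuples instead. It assumes purely numeric dotted versions (no suffixes such as rc1); packaging.version.Version handles the general case.

```python
def version_tuple(version: str) -> tuple:
    """Turn a purely numeric dotted version such as '0.1.13' into (0, 1, 13)."""
    return tuple(int(part) for part in version.split("."))


# Plain string comparison goes wrong: at the first differing character, '9' > '1'.
print("0.1.9" >= "0.1.13")  # True (misleading)

# Tuple comparison is element-wise on integers, so 9 < 13 decides it.
print(version_tuple("0.1.9") >= version_tuple("0.1.13"))  # False
print(version_tuple("0.1.13") >= version_tuple("0.1.13"))  # True
```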
Download and prepare the UCI wine quality dataset. We sample the test dataset further to represent batches of data produced every second.
data_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
data = pd.read_csv(data_url, sep=";")
data
| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.880 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.99680 | 3.20 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.760 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.99700 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.280 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.99800 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1594 | 6.2 | 0.600 | 0.08 | 2.0 | 0.090 | 32.0 | 44.0 | 0.99490 | 3.45 | 0.58 | 10.5 | 5 |
| 1595 | 5.9 | 0.550 | 0.10 | 2.2 | 0.062 | 39.0 | 51.0 | 0.99512 | 3.52 | 0.76 | 11.2 | 6 |
| 1596 | 6.3 | 0.510 | 0.13 | 2.3 | 0.076 | 29.0 | 40.0 | 0.99574 | 3.42 | 0.75 | 11.0 | 6 |
| 1597 | 5.9 | 0.645 | 0.12 | 2.0 | 0.075 | 32.0 | 44.0 | 0.99547 | 3.57 | 0.71 | 10.2 | 5 |
| 1598 | 6.0 | 0.310 | 0.47 | 3.6 | 0.067 | 18.0 | 42.0 | 0.99549 | 3.39 | 0.66 | 11.0 | 6 |
1599 rows × 12 columns
# Split the data into training and test sets
train, test = train_test_split(data)
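train_test_split is called here without test_size or train_size, so scikit-learn falls back to its default test fraction of 0.25. The arithmetic below reproduces how the 1,599 rows are divided: scikit-learn rounds the test count up and gives the remaining rows to training, which matches the count of 1199.0 reported in the training profile summary.

```python
import math

n_samples = 1599          # rows in the red wine dataset
default_test_size = 0.25  # scikit-learn's default when neither size is given

# The test count is rounded up; the training set receives the remaining rows.
n_test = math.ceil(default_test_size * n_samples)  # ceil(399.75) = 400
n_train = n_samples - n_test                       # 1599 - 400 = 1199

print(n_train, n_test)  # 1199 400
```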
We can quickly get a sense of the shape of the training dataset by profiling it with whylogs. Later, we can compare the baseline data metrics to the profiles of the batches as they flow through our model.
If you'd like to learn more about whylogs, check out our introductory notebook.
summary = session.profile_dataframe(train, "training-data").flat_summary()['summary']
summary
| | column | count | null_count | bool_count | numeric_count | max | mean | min | stddev | nunique_numbers | ... | stddev_token_length | quantile_0.0000 | quantile_0.0100 | quantile_0.0500 | quantile_0.2500 | quantile_0.5000 | quantile_0.7500 | quantile_0.9500 | quantile_0.9900 | quantile_1.0000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | volatile acidity | 1199.0 | 0.0 | 0.0 | 1199.0 | 1.58000 | 0.526405 | 0.12000 | 0.177930 | 135.0 | ... | 0.0 | 0.12000 | 0.1800 | 0.2700 | 0.39000 | 0.52000 | 0.6300 | 0.8350 | 1.0200 | 1.58000 |
| 1 | fixed acidity | 1199.0 | 0.0 | 0.0 | 1199.0 | 15.90000 | 8.299333 | 4.60000 | 1.722787 | 93.0 | ... | 0.0 | 4.60000 | 5.2000 | 6.2000 | 7.10000 | 7.90000 | 9.2000 | 11.7000 | 13.7000 | 15.90000 |
| 2 | chlorides | 1199.0 | 0.0 | 0.0 | 1199.0 | 0.61100 | 0.086253 | 0.01200 | 0.044362 | 134.0 | ... | 0.0 | 0.01200 | 0.0430 | 0.0540 | 0.07000 | 0.07900 | 0.0890 | 0.1240 | 0.3680 | 0.61100 |
| 3 | residual sugar | 1199.0 | 0.0 | 0.0 | 1199.0 | 15.50000 | 2.552168 | 0.90000 | 1.432984 | 85.0 | ... | 0.0 | 0.90000 | 1.3000 | 1.5000 | 1.90000 | 2.20000 | 2.6000 | 5.2000 | 8.8000 | 15.50000 |
| 4 | quality | 1199.0 | 0.0 | 0.0 | 1199.0 | 8.00000 | 5.629691 | 3.00000 | 0.807717 | 6.0 | ... | 0.0 | 3.00000 | 4.0000 | 5.0000 | 5.00000 | 6.00000 | 6.0000 | 7.0000 | 7.0000 | 8.00000 |
| 5 | pH | 1199.0 | 0.0 | 0.0 | 1199.0 | 4.01000 | 3.312952 | 2.86000 | 0.153886 | 85.0 | ... | 0.0 | 2.86000 | 2.9800 | 3.0600 | 3.21000 | 3.31000 | 3.4100 | 3.5700 | 3.7100 | 4.01000 |
| 6 | density | 1199.0 | 0.0 | 0.0 | 1199.0 | 1.00369 | 0.996719 | 0.99007 | 0.001885 | 385.0 | ... | 0.0 | 0.99007 | 0.9922 | 0.9936 | 0.99558 | 0.99675 | 0.9978 | 0.9998 | 1.0022 | 1.00369 |
| 7 | alcohol | 1199.0 | 0.0 | 0.0 | 1199.0 | 14.90000 | 10.432569 | 8.40000 | 1.058259 | 60.0 | ... | 0.0 | 8.40000 | 9.0000 | 9.2000 | 9.50000 | 10.20000 | 11.1000 | 12.4000 | 13.4000 | 14.90000 |
| 8 | total sulfur dioxide | 1199.0 | 0.0 | 0.0 | 1199.0 | 289.00000 | 46.918265 | 6.00000 | 33.241459 | 137.0 | ... | 0.0 | 6.00000 | 8.0000 | 11.0000 | 22.00000 | 38.00000 | 63.0000 | 112.0000 | 144.0000 | 289.00000 |
| 9 | free sulfur dioxide | 1199.0 | 0.0 | 0.0 | 1199.0 | 68.00000 | 16.123436 | 1.00000 | 10.542138 | 58.0 | ... | 0.0 | 1.00000 | 3.0000 | 4.0000 | 8.00000 | 14.00000 | 22.0000 | 36.0000 | 51.0000 | 68.00000 |
| 10 | citric acid | 1199.0 | 0.0 | 0.0 | 1199.0 | 0.78000 | 0.270534 | 0.00000 | 0.193169 | 77.0 | ... | 0.0 | 0.00000 | 0.0000 | 0.0000 | 0.10000 | 0.26000 | 0.4300 | 0.6000 | 0.7300 | 0.78000 |
| 11 | sulphates | 1199.0 | 0.0 | 0.0 | 1199.0 | 1.98000 | 0.655446 | 0.33000 | 0.165561 | 89.0 | ... | 0.0 | 0.33000 | 0.4200 | 0.4700 | 0.55000 | 0.62000 | 0.7300 | 0.9300 | 1.3100 | 1.98000 |
12 rows × 40 columns
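The profile summarizes each column with statistics such as count, null_count, mean, min, max, and stddev. To give a feel for how such statistics can be maintained in a single pass over streaming data, here is a minimal sketch using Welford's online algorithm. It is an illustration only, not whylogs' implementation: it ignores the quantiles and distinct-value counts shown above, and it uses the sample (n - 1) standard deviation, which may not match how whylogs defines stddev. The trailing None is an added, hypothetical null.

```python
import math
import statistics


class RunningSummary:
    """Single-pass count/null/mean/min/max/stddev for one numeric column."""

    def __init__(self):
        self.count = 0       # values seen, including nulls
        self.null_count = 0
        self.n = 0           # non-null values
        self.mean = 0.0
        self.m2 = 0.0        # sum of squared deviations from the running mean
        self.min = math.inf
        self.max = -math.inf

    def update(self, value):
        self.count += 1
        if value is None:
            self.null_count += 1
            return
        # Welford's update keeps the running mean and m2 numerically stable.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        self.min = min(self.min, value)
        self.max = max(self.max, value)

    @property
    def stddev(self):
        # Sample standard deviation; returns 0.0 when fewer than two values were seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


# First five "fixed acidity" values from the dataset table, plus one hypothetical null.
values = [7.4, 7.8, 7.8, 11.2, 7.4, None]
summary = RunningSummary()
for v in values:
    summary.update(v)

non_null = [v for v in values if v is not None]
print(summary.count, summary.null_count, summary.min, summary.max)  # 6 1 7.4 11.2
print(math.isclose(summary.mean, statistics.mean(non_null)))   # True
print(math.isclose(summary.stddev, statistics.stdev(non_null)))  # True
```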
Now that we've taken a peek at our training data metrics, there's one last item on our to-do list: split the test data into batches, so we can feed them through our model later on.
# Relocate predicted variable "quality" to y vectors
train_x = train.drop(["quality"], axis=1).reset_index(drop=True)
test_x = test.drop(["quality"], axis=1).reset_index(drop=True)
train_y = train[["quality"]].reset_index(drop=True)
test_y = test[["quality"]].reset_index(drop=True)
subset_test_x = []
subset_test_y = []
num_batches = 20
for i in range(num_batches):
    indices = random.sample(range(len(test)), 5)
    subset_test_x.append(test_x.loc[indices, :])
    subset_test_y.append(test_y.loc[indices, :])
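Each batch is drawn with random.sample, so the five rows within a batch are distinct, but the same row can appear in several batches, and the batches change every time the notebook runs because the generator is not seeded. A minimal stdlib sketch of the same sampling scheme follows; the helper name sample_batches and the explicit seed are assumptions added for reproducibility, not part of the notebook.

```python
import random


def sample_batches(n_rows, num_batches, batch_size, seed=None):
    """Return `num_batches` lists of row positions, each drawn without replacement."""
    rng = random.Random(seed)  # private generator; a fixed seed makes batches repeatable
    return [rng.sample(range(n_rows), batch_size) for _ in range(num_batches)]


# The default split leaves 400 of the 1,599 rows for testing; 20 batches of 5 rows.
batches = sample_batches(n_rows=400, num_batches=20, batch_size=5, seed=42)
print(len(batches), len(batches[0]))  # 20 5
print(batches == sample_batches(400, 20, 5, seed=42))  # True
```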
We'll train an ElasticNet model using scikit-learn, then run it on each batch of data, logging the model parameters, the mean absolute error, and the whylogs data metrics.
Note that whylogs profile data is automatically logged when mlflow.end_run() is called (implicitly or explicitly).
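Ending a run implicitly refers to leaving a with mlflow.start_run(): block, whose exit ends the run even if the body raises an exception. The toy sketch below models that behavior with a hypothetical ToyRun class: profiles buffered during the run are flushed when the with block exits. It only illustrates the context-manager pattern; it is not MLflow's or whylogs' code.

```python
class ToyRun:
    """Hypothetical run that flushes buffered profiles when it ends."""

    def __init__(self, name):
        self.name = name
        self.pending = []  # profiles collected during the run
        self.stored = []   # what has been "logged" as artifacts
        self.active = False

    def log_profile(self, profile):
        self.pending.append(profile)

    def end(self):
        # Explicit end: move everything buffered into storage.
        self.stored.extend(self.pending)
        self.pending.clear()
        self.active = False

    def __enter__(self):
        self.active = True
        return self

    def __exit__(self, exc_type, exc, tb):
        # Implicit end: runs on normal exit and when an exception escapes the block.
        self.end()
        return False  # do not swallow exceptions


with ToyRun("Run 1") as run:
    run.log_profile({"column": "alcohol", "count": 5})

print(run.active, len(run.pending), len(run.stored))  # False 0 1
```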
# Create an MLflow experiment for our demo
experiment_name = "whylogs demo"
mlflow.set_experiment(experiment_name)
model_params = {"alpha": 1.0,
"l1_ratio": 0.7}
lr = ElasticNet(**model_params)
lr.fit(train_x, train_y)
print("ElasticNet model (%s):" % model_params)
2022/02/25 14:00:22 INFO mlflow.tracking.fluent: Experiment with name 'whylogs demo' does not exist. Creating a new experiment.
ElasticNet model ({'alpha': 1.0, 'l1_ratio': 0.7}):
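For reference, scikit-learn's ElasticNet documentation states the objective it minimizes as follows, where n is the number of training samples, α is alpha, and ρ is l1_ratio. With the parameters used here (α = 1.0, ρ = 0.7), the L1 term is weighted by αρ = 0.7 and the squared L2 term by α(1 - ρ)/2 = 0.15.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha(1-\rho)}{2}\,\lVert w \rVert_2^2
```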
# run predictions on the batches of data we set up earlier and log whylogs data
for i in range(num_batches):
    with mlflow.start_run(run_name=f"Run {i+1}"):
        batch = subset_test_x[i]
        predicted_output = lr.predict(batch)
        mae = mean_absolute_error(subset_test_y[i], predicted_output)
        print("Subset %.0f, mean absolute error: %s" % (i + 1, mae))
        mlflow.log_params(model_params)
        mlflow.log_metric("mae", mae)
        # use whylogs to log data quality metrics for the current batch
        mlflow.whylogs.log_pandas(batch)
    # wait a second between runs to create a time series of prediction results
    time.sleep(1)
Subset 1, mean absolute error: 0.7208114956462538
Subset 2, mean absolute error: 0.4008819363599402
Subset 3, mean absolute error: 0.5837474925658647
Subset 4, mean absolute error: 0.43951696217029623
Subset 5, mean absolute error: 0.5041402918562679
Subset 6, mean absolute error: 0.8977011208878751
Subset 7, mean absolute error: 0.5341530775871286
Subset 8, mean absolute error: 0.486432571693188
Subset 9, mean absolute error: 0.6258337238624618
Subset 10, mean absolute error: 0.5217704719950854
Subset 11, mean absolute error: 0.7837474925658648
Subset 12, mean absolute error: 0.7451900067797711
Subset 13, mean absolute error: 0.31774599013983984
Subset 14, mean absolute error: 0.2855501352978765
Subset 15, mean absolute error: 0.6441147203945459
Subset 16, mean absolute error: 0.5797230107106193
Subset 17, mean absolute error: 0.8103090728104844
Subset 18, mean absolute error: 0.8362973766860998
Subset 19, mean absolute error: 0.4783836079826974
Subset 20, mean absolute error: 0.7003474300030671
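mean_absolute_error averages the absolute differences between the true quality scores and the model's predictions, so an MAE of 0.5 means the predictions are off by half a quality point on average. Here is a minimal stdlib version of the same formula; the helper name mae_sketch is hypothetical (chosen so it does not shadow scikit-learn's import), and the toy values are illustrative, not taken from the runs above.

```python
def mae_sketch(y_true, y_pred):
    """Average of |true - predicted| over paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Five hypothetical quality scores and predictions, like one 5-row batch.
y_true = [5, 6, 5, 7, 6]
y_pred = [5.5, 5.5, 5.0, 6.0, 6.5]
print(mae_sketch(y_true, y_pred))  # 0.5
```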
Now, let's explore our whylogs data inside the MLflow experiment.
client = mlflow.tracking.MlflowClient()
experiment = client.get_experiment_by_name(experiment_name)
experiment.name, experiment.experiment_id
('whylogs demo', '1')
MLflow stores the data profiles as artifacts. These can be retrieved in the same way you access MLflow projects, parameters, and metrics.
whylogs exposes a helper API for accessing the whylogs-specific output of an experiment.
whylogs.mlflow.list_whylogs_runs(experiment.experiment_id)
[<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/79206072408e4b2c974620af776ffb66/artifacts', end_time=1645808453917, experiment_id='1', lifecycle_stage='active', run_id='79206072408e4b2c974620af776ffb66', run_uuid='79206072408e4b2c974620af776ffb66', start_time=1645808453471, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/2e4bcc59569b467483b9ca3d327cd2e1/artifacts', end_time=1645808452436, experiment_id='1', lifecycle_stage='active', run_id='2e4bcc59569b467483b9ca3d327cd2e1', run_uuid='2e4bcc59569b467483b9ca3d327cd2e1', start_time=1645808452052, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/9ce955a844154b84a3544a82dcff7dc8/artifacts', end_time=1645808451014, experiment_id='1', lifecycle_stage='active', run_id='9ce955a844154b84a3544a82dcff7dc8', run_uuid='9ce955a844154b84a3544a82dcff7dc8', start_time=1645808450591, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/013c2024164143aaa2133f0fdcc00a15/artifacts', end_time=1645808449526, experiment_id='1', lifecycle_stage='active', run_id='013c2024164143aaa2133f0fdcc00a15', run_uuid='013c2024164143aaa2133f0fdcc00a15', start_time=1645808449100, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/8e45419f7d664a9d82bfaf2bb6f189fe/artifacts', end_time=1645808448058, experiment_id='1', lifecycle_stage='active', run_id='8e45419f7d664a9d82bfaf2bb6f189fe', run_uuid='8e45419f7d664a9d82bfaf2bb6f189fe', start_time=1645808447560, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/e2cdeba39490490192d8e9c9eaddc6c4/artifacts', end_time=1645808446523, experiment_id='1', lifecycle_stage='active', run_id='e2cdeba39490490192d8e9c9eaddc6c4', run_uuid='e2cdeba39490490192d8e9c9eaddc6c4', start_time=1645808446132, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/15e68a30eb3449e9b3aad5b6dcde6448/artifacts', end_time=1645808445083, experiment_id='1', lifecycle_stage='active', run_id='15e68a30eb3449e9b3aad5b6dcde6448', run_uuid='15e68a30eb3449e9b3aad5b6dcde6448', start_time=1645808444476, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/df7df825e1f74ddc91fe8c7a41278b68/artifacts', end_time=1645808443425, experiment_id='1', lifecycle_stage='active', run_id='df7df825e1f74ddc91fe8c7a41278b68', run_uuid='df7df825e1f74ddc91fe8c7a41278b68', start_time=1645808442821, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/41a171472cf04e16af19593337bc9a73/artifacts', end_time=1645808441771, experiment_id='1', lifecycle_stage='active', run_id='41a171472cf04e16af19593337bc9a73', run_uuid='41a171472cf04e16af19593337bc9a73', start_time=1645808441077, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/9f560c6cfcac48d6b895b02bd64f928b/artifacts', end_time=1645808440023, experiment_id='1', lifecycle_stage='active', run_id='9f560c6cfcac48d6b895b02bd64f928b', run_uuid='9f560c6cfcac48d6b895b02bd64f928b', start_time=1645808439434, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/a3748043ee7d4828b5c8a61a07ff6ad3/artifacts', end_time=1645808438394, experiment_id='1', lifecycle_stage='active', run_id='a3748043ee7d4828b5c8a61a07ff6ad3', run_uuid='a3748043ee7d4828b5c8a61a07ff6ad3', start_time=1645808437873, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/f255ba85ce6344d3a7a7a089b160966b/artifacts', end_time=1645808436832, experiment_id='1', lifecycle_stage='active', run_id='f255ba85ce6344d3a7a7a089b160966b', run_uuid='f255ba85ce6344d3a7a7a089b160966b', start_time=1645808435946, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/0a98d85ff1cd46a999851871beb9f724/artifacts', end_time=1645808434895, experiment_id='1', lifecycle_stage='active', run_id='0a98d85ff1cd46a999851871beb9f724', run_uuid='0a98d85ff1cd46a999851871beb9f724', start_time=1645808434328, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/692776d9518a46c78725d93dbcad69c7/artifacts', end_time=1645808433288, experiment_id='1', lifecycle_stage='active', run_id='692776d9518a46c78725d93dbcad69c7', run_uuid='692776d9518a46c78725d93dbcad69c7', start_time=1645808432673, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/f5128e7e05dd4260859af061d916cde1/artifacts', end_time=1645808431632, experiment_id='1', lifecycle_stage='active', run_id='f5128e7e05dd4260859af061d916cde1', run_uuid='f5128e7e05dd4260859af061d916cde1', start_time=1645808431076, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/4a76ab88334d4e6fbcd42e682bdd0b8f/artifacts', end_time=1645808430033, experiment_id='1', lifecycle_stage='active', run_id='4a76ab88334d4e6fbcd42e682bdd0b8f', run_uuid='4a76ab88334d4e6fbcd42e682bdd0b8f', start_time=1645808429554, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/c215c5f4008a4316bc96817643d8edb4/artifacts', end_time=1645808428510, experiment_id='1', lifecycle_stage='active', run_id='c215c5f4008a4316bc96817643d8edb4', run_uuid='c215c5f4008a4316bc96817643d8edb4', start_time=1645808428045, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/19893b217bf5498f83ef13255874d3b0/artifacts', end_time=1645808427001, experiment_id='1', lifecycle_stage='active', run_id='19893b217bf5498f83ef13255874d3b0', run_uuid='19893b217bf5498f83ef13255874d3b0', start_time=1645808426457, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/3ef3a8a52529446b9f61594b35b9900a/artifacts', end_time=1645808425419, experiment_id='1', lifecycle_stage='active', run_id='3ef3a8a52529446b9f61594b35b9900a', run_uuid='3ef3a8a52529446b9f61594b35b9900a', start_time=1645808424953, status='FINISHED', user_id='felipeadachi'>,
<RunInfo: artifact_uri='file:///mnt/c/Users/felip/OneDrive/Documentos/Projects/whylogs-examples/python/mlruns/1/275c797d66294ebe86e16a57d044dfaf/artifacts', end_time=1645808423911, experiment_id='1', lifecycle_stage='active', run_id='275c797d66294ebe86e16a57d044dfaf', run_uuid='275c797d66294ebe86e16a57d044dfaf', start_time=1645808423122, status='FINISHED', user_id='felipeadachi'>]
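In the RunInfo output above, runs are listed newest first, and start_time and end_time are Unix timestamps in milliseconds. The small stdlib sketch below uses three (run_id, start_time, end_time) triples copied from that output to sort runs chronologically and convert the timestamps into UTC datetimes and run durations. The helper name ms_to_utc is illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms):
    """Convert a Unix timestamp in milliseconds to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


# (run_id, start_time, end_time) copied from the list_whylogs_runs output.
runs = [
    ("79206072408e4b2c974620af776ffb66", 1645808453471, 1645808453917),
    ("3ef3a8a52529446b9f61594b35b9900a", 1645808424953, 1645808425419),
    ("275c797d66294ebe86e16a57d044dfaf", 1645808423122, 1645808423911),
]

for run_id, start, end in sorted(runs, key=lambda r: r[1]):  # oldest first
    print(run_id[:8], ms_to_utc(start).strftime("%Y-%m-%d %H:%M:%S"), f"{end - start} ms")
# 275c797d 2022-02-25 17:00:23 789 ms
# 3ef3a8a5 2022-02-25 17:00:24 466 ms
# 79206072 2022-02-25 17:00:53 446 ms
```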
Our integration allows you to quickly collect the statistical profiles produced during experimentation.
mlflow_profiles = whylogs.mlflow.get_experiment_profiles(experiment.experiment_id)
mlflow_profiles
[<whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e559c8b0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e55aab20>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e553be50>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e554d100>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e555d2e0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54f0550>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54ff7c0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e5510a30>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e5521ca0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54b1f70>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54c5220>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54d4520>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54e6700>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e5479970>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e5487be0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e549ae50>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54aa100>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e543e370>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e544f5e0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fa0e54608e0>]
You can then use whylogs.viz to easily produce visualizations for the whylogs profile data.
Below, you can see how the data in our batches changed over time for the column called free sulfur dioxide.
from whylogs.viz import ProfileVisualizer
viz = ProfileVisualizer()
viz.set_profiles(mlflow_profiles)
viz.plot_distribution("free sulfur dioxide", ts_format="%H:%M:%S")
viz.plot_uniqueness("free sulfur dioxide", ts_format="%H:%M:%S")
viz.plot_missing_values("free sulfur dioxide", ts_format="%H:%M:%S")
viz.plot_data_types("free sulfur dioxide", ts_format="%H:%M:%S")
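The ts_format argument passed to these plotting calls looks like a standard strftime pattern, so "%H:%M:%S" should label each batch by hour, minute, and second. That reading is an assumption about how ProfileVisualizer uses the argument; the stdlib sketch below only shows what the pattern itself produces for a few illustrative timestamps one second apart.

```python
from datetime import datetime, timedelta

ts_format = "%H:%M:%S"
first_batch = datetime(2022, 2, 25, 14, 0, 23)  # illustrative timestamp, not from the runs
labels = [(first_batch + timedelta(seconds=i)).strftime(ts_format) for i in range(3)]
print(labels)  # ['14:00:23', '14:00:24', '14:00:25']
```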
We can also plot the mean absolute error of each run for comparison.
import matplotlib.pyplot as plt
plt.close('all')
runs = mlflow.search_runs(experiment.experiment_id)
plt.figure(figsize=(10,2))
plt.plot(runs['start_time'], runs['metrics.mae'])
plt.show()
session.close()
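Beyond the plot, the MAE values printed by the prediction loop can be compared directly. This stdlib sketch copies those 20 values and reports the best and worst batches along with the mean and spread. Since the same model scores every batch, the spread reflects differences between the randomly sampled 5-row batches rather than changes in the model.

```python
import statistics

# Per-batch MAE values copied from the prediction loop's printed output.
maes = [
    0.7208114956462538, 0.4008819363599402, 0.5837474925658647, 0.43951696217029623,
    0.5041402918562679, 0.8977011208878751, 0.5341530775871286, 0.486432571693188,
    0.6258337238624618, 0.5217704719950854, 0.7837474925658648, 0.7451900067797711,
    0.31774599013983984, 0.2855501352978765, 0.6441147203945459, 0.5797230107106193,
    0.8103090728104844, 0.8362973766860998, 0.4783836079826974, 0.7003474300030671,
]

best = min(range(len(maes)), key=maes.__getitem__)   # 0-based index of the lowest MAE
worst = max(range(len(maes)), key=maes.__getitem__)  # 0-based index of the highest MAE
print(f"best: subset {best + 1} ({maes[best]:.3f})")    # best: subset 14 (0.286)
print(f"worst: subset {worst + 1} ({maes[worst]:.3f})")  # worst: subset 6 (0.898)
print(f"mean {statistics.mean(maes):.3f}, stdev {statistics.stdev(maes):.3f}")
```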
whylogs makes it simple to collect and visualize data quality metrics at both training and inference time for your MLflow runs. These metrics can be invaluable when debugging model failures or optimizing model performance.
whylogs data can be visualized in more complex ways. Check out whylogs.viz for details on the API.
In addition, you can check out how WhyLabs can help you visualize data quality metrics by visiting our sandbox. Feel free to reach out on our Slack channel if you have any questions!