In this notebook, we will explore the MLflow integration in whylogs.
This example uses the data from MLflow's tutorial for demonstration purposes.
This tutorial showcases how you can use the whylogs integration with mlflow to log data profiles with every run, retrieve those profiles afterwards, and visualize how your data changes across runs.
To run this tutorial:
conda create --name whylogs-mlflow python=3.8
conda activate whylogs-mlflow
conda install pip
pip install ipykernel
python -m ipykernel install --user --name=whylogs-mlflow  # to install the kernel module
pip install mlflow[extras]
pip install whylogs[viz]
pip install scikit-learn
pip install matplotlib
Select whylogs-mlflow as your kernel when running this notebook.
First, we want to filter out noisy warnings:
import warnings
warnings.filterwarnings("ignore")
warnings.simplefilter('ignore')
import random
import time
import pandas as pd
import mlflow
import whylogs
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
Enable whylogs in MLflow to allow storing whylogs statistical profiles with every run. This method returns True if whylogs is able to patch MLflow.
assert whylogs.__version__ >= "0.1.13" # we need 0.1.13 or later for MLflow integration
whylogs.enable_mlflow()
True
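After patching, MLflow exposes a whylogs entry point that we will use later as mlflow.whylogs.log_pandas. As a quick, optional sanity check (not part of the original notebook), you can confirm the patch took effect:
# Optional sanity check: enable_mlflow() patches the mlflow module so that a
# `whylogs` attribute becomes available (used later via mlflow.whylogs.log_pandas).
assert hasattr(mlflow, "whylogs")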
Download and prepare the UCI wine quality dataset. We sample the test dataset further to represent batches of data produced every second.
data_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
data = pd.read_csv(data_url, sep=";")
data
| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.880 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.99680 | 3.20 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.760 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.99700 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.280 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.99800 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1594 | 6.2 | 0.600 | 0.08 | 2.0 | 0.090 | 32.0 | 44.0 | 0.99490 | 3.45 | 0.58 | 10.5 | 5 |
| 1595 | 5.9 | 0.550 | 0.10 | 2.2 | 0.062 | 39.0 | 51.0 | 0.99512 | 3.52 | 0.76 | 11.2 | 6 |
| 1596 | 6.3 | 0.510 | 0.13 | 2.3 | 0.076 | 29.0 | 40.0 | 0.99574 | 3.42 | 0.75 | 11.0 | 6 |
| 1597 | 5.9 | 0.645 | 0.12 | 2.0 | 0.075 | 32.0 | 44.0 | 0.99547 | 3.57 | 0.71 | 10.2 | 5 |
| 1598 | 6.0 | 0.310 | 0.47 | 3.6 | 0.067 | 18.0 | 42.0 | 0.99549 | 3.39 | 0.66 | 11.0 | 6 |
1599 rows × 12 columns
# Split the data into training and test sets
train, test = train_test_split(data)
We can quickly get a sense of the shape of the training dataset by profiling it with whylogs. Later, we can compare the baseline data metrics to the profiles of the batches as they flow through our model.
If you'd like to learn more about whylogs, check out our introductory notebook.
from whylogs import get_or_create_session
with get_or_create_session() as session:
summary = session.profile_dataframe(train, "training-data").flat_summary()['summary']
summary
| | column | count | null_count | bool_count | numeric_count | max | mean | min | stddev | nunique_numbers | ... | nunique_str_upper | quantile_0.0000 | quantile_0.0100 | quantile_0.0500 | quantile_0.2500 | quantile_0.5000 | quantile_0.7500 | quantile_0.9500 | quantile_0.9900 | quantile_1.0000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | chlorides | 1199.0 | 0.0 | 0.0 | 1199.0 | 0.61100 | 0.086965 | 0.01200 | 0.044101 | 134.0 | ... | 0.0 | 0.01200 | 0.04100 | 0.05400 | 0.07000 | 0.07900 | 0.0900 | 0.123 | 0.3680 | 0.61100 |
| 1 | quality | 1199.0 | 0.0 | 0.0 | 1199.0 | 8.00000 | 5.626355 | 3.00000 | 0.785196 | 6.0 | ... | 0.0 | 3.00000 | 4.00000 | 5.00000 | 5.00000 | 6.00000 | 6.0000 | 7.000 | 8.0000 | 8.00000 |
| 2 | total sulfur dioxide | 1199.0 | 0.0 | 0.0 | 1199.0 | 289.00000 | 46.488741 | 6.00000 | 33.239143 | 141.0 | ... | 0.0 | 6.00000 | 8.00000 | 11.00000 | 22.00000 | 37.00000 | 63.0000 | 115.000 | 148.0000 | 289.00000 |
| 3 | alcohol | 1199.0 | 0.0 | 0.0 | 1199.0 | 14.90000 | 10.404393 | 8.40000 | 1.060160 | 65.0 | ... | 0.0 | 8.40000 | 9.00000 | 9.20000 | 9.50000 | 10.10000 | 11.0000 | 12.500 | 13.4000 | 14.90000 |
| 4 | density | 1199.0 | 0.0 | 0.0 | 1199.0 | 1.00369 | 0.996796 | 0.99007 | 0.001863 | 390.0 | ... | 0.0 | 0.99007 | 0.99235 | 0.99362 | 0.99566 | 0.99677 | 0.9979 | 1.000 | 1.0022 | 1.00369 |
| 5 | free sulfur dioxide | 1199.0 | 0.0 | 0.0 | 1199.0 | 72.00000 | 15.822769 | 1.00000 | 10.503918 | 57.0 | ... | 0.0 | 1.00000 | 3.00000 | 4.00000 | 7.00000 | 13.00000 | 21.0000 | 35.000 | 48.0000 | 72.00000 |
| 6 | volatile acidity | 1199.0 | 0.0 | 0.0 | 1199.0 | 1.33000 | 0.530367 | 0.12000 | 0.178894 | 136.0 | ... | 0.0 | 0.12000 | 0.18000 | 0.27000 | 0.40000 | 0.53000 | 0.6400 | 0.855 | 1.0350 | 1.33000 |
| 7 | sulphates | 1199.0 | 0.0 | 0.0 | 1199.0 | 1.98000 | 0.657131 | 0.33000 | 0.168270 | 89.0 | ... | 0.0 | 0.33000 | 0.43000 | 0.47000 | 0.55000 | 0.62000 | 0.7300 | 0.950 | 1.3400 | 1.98000 |
| 8 | citric acid | 1199.0 | 0.0 | 0.0 | 1199.0 | 0.78000 | 0.269700 | 0.00000 | 0.193572 | 75.0 | ... | 0.0 | 0.00000 | 0.00000 | 0.00000 | 0.09000 | 0.26000 | 0.4200 | 0.600 | 0.6900 | 0.78000 |
| 9 | pH | 1199.0 | 0.0 | 0.0 | 1199.0 | 4.01000 | 3.310234 | 2.89000 | 0.153440 | 82.0 | ... | 0.0 | 2.89000 | 2.93000 | 3.06000 | 3.21000 | 3.31000 | 3.4000 | 3.560 | 3.6800 | 4.01000 |
| 10 | residual sugar | 1199.0 | 0.0 | 0.0 | 1199.0 | 15.50000 | 2.575438 | 0.90000 | 1.459931 | 85.0 | ... | 0.0 | 0.90000 | 1.30000 | 1.60000 | 1.90000 | 2.20000 | 2.6000 | 5.200 | 8.3000 | 15.50000 |
| 11 | fixed acidity | 1199.0 | 0.0 | 0.0 | 1199.0 | 15.90000 | 8.328774 | 4.60000 | 1.723000 | 90.0 | ... | 0.0 | 4.60000 | 5.20000 | 6.10000 | 7.10000 | 7.90000 | 9.3000 | 11.700 | 13.0000 | 15.90000 |
12 rows × 32 columns
Now that we've taken a peek at our training data metrics, there's one last item on our to-do list: split the test data into batches, so we can feed them through our model later on.
# Relocate predicted variable "quality" to y vectors
train_x = train.drop(["quality"], axis=1).reset_index(drop=True)
test_x = test.drop(["quality"], axis=1).reset_index(drop=True)
train_y = train[["quality"]].reset_index(drop=True)
test_y = test[["quality"]].reset_index(drop=True)
subset_test_x = []
subset_test_y = []
num_batches = 20
for i in range(num_batches):
indices = random.sample(range(len(test)), 5)
subset_test_x.append(test_x.loc[indices, :])
subset_test_y.append(test_y.loc[indices, :])
We'll train an ElasticNet model using scikit-learn, then run this model for each of the batches of data, logging the model parameters, mean absolute error, and whylogs data metrics.
Note that whylogs profiler data is automatically logged when mlflow.end_run() is called (implicitly or explicitly).
# Create an MLflow experiment for our demo
experiment_name = "whylogs demo"
mlflow.set_experiment(experiment_name)
model_params = {"alpha": 1.0,
"l1_ratio": 0.7}
lr = ElasticNet(**model_params)
lr.fit(train_x, train_y)
print("ElasticNet model (%s):" % model_params)
INFO: 'whylogs demo' does not exist. Creating a new experiment
ElasticNet model ({'alpha': 1.0, 'l1_ratio': 0.7}):
# run predictions on the batches of data we set up earlier and log whylogs data
for i in range(num_batches):
with mlflow.start_run(run_name=f"Run {i+1}"):
batch = subset_test_x[i]
predicted_output = lr.predict(batch)
mae = mean_absolute_error(subset_test_y[i], predicted_output)
print("Subset %.0f, mean absolute error: %s" % (i + 1, mae))
mlflow.log_params(model_params)
mlflow.log_metric("mae", mae)
# use whylogs to log data quality metrics for the current batch
mlflow.whylogs.log_pandas(batch)
# wait a second between runs to create a time series of prediction results
time.sleep(1)
Subset 1, mean absolute error: 0.5789794874388365
Subset 2, mean absolute error: 0.6111029246160566
Subset 3, mean absolute error: 0.8536210353690337
Subset 4, mean absolute error: 1.0097638903189077
Subset 5, mean absolute error: 1.1419334387935294
Subset 6, mean absolute error: 0.4389454235595741
Subset 7, mean absolute error: 0.6943716888788721
Subset 8, mean absolute error: 0.5281415556056386
Subset 9, mean absolute error: 0.8662166919084389
Subset 10, mean absolute error: 0.8462118256399723
Subset 11, mean absolute error: 0.9189648886334384
Subset 12, mean absolute error: 0.6721440596916878
Subset 13, mean absolute error: 0.8699212967729693
Subset 14, mean absolute error: 0.7364127140011986
Subset 15, mean absolute error: 0.8105048112918138
Subset 16, mean absolute error: 0.5930083232393926
Subset 17, mean absolute error: 0.6795532694207491
Subset 18, mean absolute error: 0.38086797237556774
Subset 19, mean absolute error: 0.7352651500373737
Subset 20, mean absolute error: 0.6262069593715065
Now, let's explore our whylogs data inside the MLflow experiment.
client = mlflow.tracking.MlflowClient()
experiment = client.get_experiment_by_name(experiment_name)
experiment.name, experiment.experiment_id
('whylogs demo', '1')
MLflow stores the data profiles as artifacts. These can be retrieved in the same way you access MLflow projects, parameters, and metrics.
whylogs exposes a helper API for accessing the whylogs-specific output of an experiment.
whylogs.mlflow.list_whylogs_runs(experiment.experiment_id)
[<RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/964c5ebd92904c5fb26d86d4a7ff90eb/artifacts', end_time=1612491706925, experiment_id='1', lifecycle_stage='active', run_id='964c5ebd92904c5fb26d86d4a7ff90eb', run_uuid='964c5ebd92904c5fb26d86d4a7ff90eb', start_time=1612491706903, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/1b3f6274772a464cb1e81d860dd7ff7e/artifacts', end_time=1612491705899, experiment_id='1', lifecycle_stage='active', run_id='1b3f6274772a464cb1e81d860dd7ff7e', run_uuid='1b3f6274772a464cb1e81d860dd7ff7e', start_time=1612491705876, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/8c3130a637d846eebede052e3bb2d1cf/artifacts', end_time=1612491704870, experiment_id='1', lifecycle_stage='active', run_id='8c3130a637d846eebede052e3bb2d1cf', run_uuid='8c3130a637d846eebede052e3bb2d1cf', start_time=1612491704847, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/da99de11af7841ff983392cffd7c3706/artifacts', end_time=1612491703840, experiment_id='1', lifecycle_stage='active', run_id='da99de11af7841ff983392cffd7c3706', run_uuid='da99de11af7841ff983392cffd7c3706', start_time=1612491703814, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/f2c5079de6984fb7af10f47789caf2d2/artifacts', end_time=1612491702809, experiment_id='1', lifecycle_stage='active', run_id='f2c5079de6984fb7af10f47789caf2d2', run_uuid='f2c5079de6984fb7af10f47789caf2d2', start_time=1612491702785, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/f48cdd04ad7946d8954d9bf9f03ce9a4/artifacts', end_time=1612491701780, experiment_id='1', lifecycle_stage='active', run_id='f48cdd04ad7946d8954d9bf9f03ce9a4', run_uuid='f48cdd04ad7946d8954d9bf9f03ce9a4', start_time=1612491701747, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/31dd9654f58a440c9e87130273cadb36/artifacts', end_time=1612491700742, experiment_id='1', lifecycle_stage='active', run_id='31dd9654f58a440c9e87130273cadb36', run_uuid='31dd9654f58a440c9e87130273cadb36', start_time=1612491700718, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/cefb2663adaa430798e57bf75ab68830/artifacts', end_time=1612491699714, experiment_id='1', lifecycle_stage='active', run_id='cefb2663adaa430798e57bf75ab68830', run_uuid='cefb2663adaa430798e57bf75ab68830', start_time=1612491699692, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/49609961a6c74f429f20a1ac35d286b5/artifacts', end_time=1612491698688, experiment_id='1', lifecycle_stage='active', run_id='49609961a6c74f429f20a1ac35d286b5', run_uuid='49609961a6c74f429f20a1ac35d286b5', start_time=1612491698666, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/5fee982192dd428cb916c86b4a92d02b/artifacts', end_time=1612491697660, experiment_id='1', lifecycle_stage='active', run_id='5fee982192dd428cb916c86b4a92d02b', run_uuid='5fee982192dd428cb916c86b4a92d02b', start_time=1612491697638, status='FINISHED', user_id='alex'>, <RunInfo: 
artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/12ef4b57219f49bd8cb46a51258fb9bb/artifacts', end_time=1612491696634, experiment_id='1', lifecycle_stage='active', run_id='12ef4b57219f49bd8cb46a51258fb9bb', run_uuid='12ef4b57219f49bd8cb46a51258fb9bb', start_time=1612491696613, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/936c053196674015b1ff93b27e6b50d3/artifacts', end_time=1612491695611, experiment_id='1', lifecycle_stage='active', run_id='936c053196674015b1ff93b27e6b50d3', run_uuid='936c053196674015b1ff93b27e6b50d3', start_time=1612491695590, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/13e6f333d02d4555b3b6c1592c4b55d7/artifacts', end_time=1612491694588, experiment_id='1', lifecycle_stage='active', run_id='13e6f333d02d4555b3b6c1592c4b55d7', run_uuid='13e6f333d02d4555b3b6c1592c4b55d7', start_time=1612491694566, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/32ebbdbe3922426e8857138553213a2d/artifacts', end_time=1612491693561, experiment_id='1', lifecycle_stage='active', run_id='32ebbdbe3922426e8857138553213a2d', run_uuid='32ebbdbe3922426e8857138553213a2d', start_time=1612491693535, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/358fa648910f4505a47505a9978c5179/artifacts', end_time=1612491692533, experiment_id='1', lifecycle_stage='active', run_id='358fa648910f4505a47505a9978c5179', run_uuid='358fa648910f4505a47505a9978c5179', start_time=1612491692510, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/8149206986e24d4ebc64de7e6a50b7e7/artifacts', end_time=1612491691504, experiment_id='1', lifecycle_stage='active', run_id='8149206986e24d4ebc64de7e6a50b7e7', run_uuid='8149206986e24d4ebc64de7e6a50b7e7', start_time=1612491691484, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/baeff8fdb79a457e89dc13b85fc6c4d8/artifacts', end_time=1612491690482, experiment_id='1', lifecycle_stage='active', run_id='baeff8fdb79a457e89dc13b85fc6c4d8', run_uuid='baeff8fdb79a457e89dc13b85fc6c4d8', start_time=1612491690461, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/39fd55aef2f74443a48d9ed3b8ad4840/artifacts', end_time=1612491689454, experiment_id='1', lifecycle_stage='active', run_id='39fd55aef2f74443a48d9ed3b8ad4840', run_uuid='39fd55aef2f74443a48d9ed3b8ad4840', start_time=1612491689435, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/34f58f05a6064eb2a43f23fb3f70d62a/artifacts', end_time=1612491688429, experiment_id='1', lifecycle_stage='active', run_id='34f58f05a6064eb2a43f23fb3f70d62a', run_uuid='34f58f05a6064eb2a43f23fb3f70d62a', start_time=1612491688403, status='FINISHED', user_id='alex'>, <RunInfo: artifact_uri='file:///Users/alex/Code/whylogs-examples/python/mlruns/1/0a8985c8f01a42c2bad8a31dfbb9699d/artifacts', end_time=1612491687400, experiment_id='1', lifecycle_stage='active', run_id='0a8985c8f01a42c2bad8a31dfbb9699d', run_uuid='0a8985c8f01a42c2bad8a31dfbb9699d', start_time=1612491687377, status='FINISHED', user_id='alex'>]
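Since each profile is stored as a regular MLflow artifact, you can also list it with the standard MLflow client API. Below is a minimal sketch (not part of the original run output); the exact artifact paths depend on your whylogs version.
# Minimal sketch: list the artifacts attached to the first whylogs-enabled run.
# The whylogs profile appears alongside any other artifacts logged during that run.
whylogs_runs = whylogs.mlflow.list_whylogs_runs(experiment.experiment_id)
for artifact in client.list_artifacts(whylogs_runs[0].run_id):
    print(artifact.path)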
Our integration allows you to quickly collect the statistical profiles produced during experimentation.
mlflow_profiles = whylogs.mlflow.get_experiment_profiles(experiment.experiment_id)
mlflow_profiles
[<whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fbc39d0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd09d00>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd10f70>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd10d90>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd22e80>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd2a7f0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd32b20>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd3bdc0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd43ee0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd3bc70>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd586a0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd62a60>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd69c40>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd72e20>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd72d60>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd699d0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd907c0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fd989d0>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fda2d60>, <whylogs.core.datasetprofile.DatasetProfile at 0x7fbc7fdaaf10>]
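Each entry is a whylogs DatasetProfile, so you can flatten a batch profile into the same tabular summary we produced for the training data and compare it against that baseline. A minimal sketch (the column selection is illustrative):
# Minimal sketch: flatten the first retrieved batch profile the same way we flattened
# the training profile, then inspect a few of its metrics next to the baseline summary.
batch_summary = mlflow_profiles[0].flat_summary()['summary']
batch_summary[['column', 'count', 'mean', 'stddev', 'min', 'max']]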
You can then use whylogs.viz to easily produce visualizations for the whylogs profile data. Below, you can see how the data changed over time in our batches for the column called free sulfur dioxide.
from whylogs.viz import ProfileVisualizer
viz = ProfileVisualizer()
viz.set_profiles(mlflow_profiles)
viz.plot_distribution("free sulfur dioxide", ts_format="%d-%b-%y %H:%M:%S")
viz.plot_uniqueness("free sulfur dioxide", ts_format="%d-%b-%y %H:%M:%S")
viz.plot_missing_values("free sulfur dioxide", ts_format="%d-%b-%y %H:%M:%S")
viz.plot_data_types("free sulfur dioxide", ts_format="%d-%b-%y %H:%M:%S")
We can also plot the mean absolute error of each run for comparison.
import matplotlib.pyplot as plt
plt.close('all')
runs = mlflow.search_runs(experiment.experiment_id)
plt.figure(figsize=(10,2))
plt.plot(runs['start_time'], runs['metrics.mae'])
plt.show()
With whylogs, collecting and visualizing data quality metrics at both training and inference time for your MLflow runs is simple. These metrics can be invaluable when debugging model failures or tuning model performance.
whylogs data can be visualized in more complex ways. Check out whylogs.viz for details on the API.
In addition, you can check out how WhyLabs can help you visualize data quality metrics by visiting our sandbox. Feel free to reach out on our Slack channel if you have any questions!