Run AI with Certainty
To use this example notebook, first head to WhyLabs and sign up for a free account.
You can skip the onboarding code example if you are using this notebook.
As part of the onboarding workflow, you will receive an organization ID, which identifies your account.
You'll also need to create an access token as part of the onboarding flow.
Please go to Settings -> Access Tokens to generate tokens.
To begin, uncomment the cell below and install the whylogs library.
✅ The whylogs library profiles data in real time, collecting thousands of metrics from structured data, unstructured data, and ML model predictions with zero configuration.
✅ This library runs locally on your machine and collects relevant metrics in dataset profiles that can both be logged to disk and uploaded to the WhyLabs Platform for monitoring.
# Note: you may need to restart the kernel to use updated packages.
### The following WhyLabs Platform integration example requires the latest whylogs version:
%pip install whylogs
The example data comes from our public S3 bucket, where we have prepared a few CSV files for this walkthrough.
import pandas as pd

# Load the seven demo batches into a list of DataFrames.
pdfs = []
for i in range(1, 8):
    path = f"https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_{i}.csv"
    print(f"Loading data from {path}")
    df = pd.read_csv(path)
    pdfs.append(df)
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_1.csv
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_2.csv
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_3.csv
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_4.csv
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_5.csv
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_6.csv
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches/input_batch_7.csv
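The loop above stops at the first download that fails. A minimal sketch of a more defensive variant, assuming you would rather skip a batch that cannot be loaded; `batch_urls` and `load_batches` are hypothetical helper names, not part of pandas or whylogs:

```python
from typing import Callable, List

BASE_URL = "https://whylabs-public.s3.us-west-2.amazonaws.com/demo_batches"


def batch_urls(first: int = 1, last: int = 7) -> List[str]:
    """Build the URLs of the demo CSV batches, inclusive on both ends."""
    return [f"{BASE_URL}/input_batch_{i}.csv" for i in range(first, last + 1)]


def load_batches(urls: List[str], reader: Callable) -> list:
    """Load every URL with `reader`, skipping and reporting any that fail."""
    frames = []
    for url in urls:
        try:
            frames.append(reader(url))
        except Exception as exc:  # e.g. a network error or a malformed CSV
            print(f"Skipping {url}: {exc}")
    return frames
```

With pandas imported as above, `pdfs = load_batches(batch_urls(), pd.read_csv)` would stand in for the loop. Keep in mind that a skipped batch shifts the position of every later batch in the list, which matters if the list index is later used to assign dates.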
pdfs[0].describe()
 | Unnamed: 0 | id | member_id | loan_amnt | funded_amnt | funded_amnt_inv | int_rate | installment | annual_inc | desc | ... | hardship_loan_status | orig_projected_additional_accrued_interest | hardship_payoff_balance_amount | hardship_last_payment_amount | debt_settlement_flag_date | settlement_status | settlement_date | settlement_amount | settlement_percentage | settlement_term |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 407.000000 | 4.070000e+02 | 0.0 | 407.000000 | 407.000000 | 407.000000 | 407.000000 | 407.000000 | 407.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
mean | 12548.717445 | 1.158631e+08 | NaN | 14203.746929 | 14203.746929 | 14202.948403 | 13.514054 | 418.020344 | 78818.956069 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
std | 125.354772 | 1.207642e+06 | NaN | 9351.142374 | 9351.142374 | 9350.997874 | 5.446881 | 271.096531 | 55864.939403 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
min | 12325.000000 | 1.121538e+08 | NaN | 1000.000000 | 1000.000000 | 1000.000000 | 5.320000 | 34.220000 | 0.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
25% | 12442.500000 | 1.150769e+08 | NaN | 7000.000000 | 7000.000000 | 7000.000000 | 9.930000 | 235.580000 | 43325.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
50% | 12550.000000 | 1.157004e+08 | NaN | 12000.000000 | 12000.000000 | 12000.000000 | 12.620000 | 357.250000 | 63300.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
75% | 12653.500000 | 1.168245e+08 | NaN | 20000.000000 | 20000.000000 | 20000.000000 | 16.020000 | 553.515000 | 95000.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
max | 12862.000000 | 1.181592e+08 | NaN | 40000.000000 | 40000.000000 | 40000.000000 | 30.990000 | 1417.710000 | 495000.000000 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
8 rows × 126 columns
whylogs, by default, does not send statistics to WhyLabs.
A few small setup steps are needed first. If you don't have an access key yet, please onboard with WhyLabs and generate an API key at https://hub.whylabsapp.com/settings/access-tokens.
WhyLabs only requires whylogs profiles - your raw data never leaves your machine.
import whylogs as why
# Create a model in the dashboard and use that model ID as the default dataset ID in the prompt here. It will be
# saved in your whylogs config for future use. You can optionally supply reinit=True to reset your config.
why.init()
❓ What kind of session do you want to use?
⤷ 1. WhyLabs. Use an api key to upload to WhyLabs.
⤷ 2. WhyLabs Anonymous. Upload data anonymously to WhyLabs and get a viewing url.
Initializing session with config /home/jamie/.config/whylogs/config.ini
✅ Using session type: WHYLABS_ANONYMOUS
⤷ session id: <will be generated before upload>
<whylogs.api.whylabs.session.session.GuestSession at 0x7fbf885ff880>
You can also run this initialization from the command line:
python -m whylogs.api.whylabs.session.why_init
Use this to reset your config if you want to change your API key or default dataset ID.
Ensure you have a model ID (also called dataset ID) before you start!
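For non-interactive runs, such as a scheduled job, the prompt shown above is inconvenient. A minimal sketch of a pre-flight check, assuming whylogs picks up credentials from the `WHYLABS_API_KEY`, `WHYLABS_DEFAULT_ORG_ID`, and `WHYLABS_DEFAULT_DATASET_ID` environment variables; verify those names against the whylogs version you have installed. `missing_whylabs_settings` is a hypothetical helper and the values are placeholders:

```python
import os

# Assumed variable names; verify them against your installed whylogs version.
REQUIRED_VARS = (
    "WHYLABS_API_KEY",
    "WHYLABS_DEFAULT_ORG_ID",
    "WHYLABS_DEFAULT_DATASET_ID",
)


def missing_whylabs_settings(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Placeholder values for illustration only; substitute your own.
example_env = {
    "WHYLABS_API_KEY": "<your-api-key>",
    "WHYLABS_DEFAULT_ORG_ID": "<your-org-id>",
}
print(missing_whylabs_settings(example_env))  # ['WHYLABS_DEFAULT_DATASET_ID']
```

Calling `missing_whylabs_settings()` with no argument checks the real process environment, so a job can fail early with a clear message instead of stalling on a prompt.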
If you don't pass whylogs a dataset_timestamp parameter, it'll default to UTC now.
import datetime
import whylogs as why
for i, df in enumerate(pdfs):
    # Walk backwards one day per batch. Each dataset has to map to a different date
    # to show up as a separate batch in WhyLabs.
    dt = datetime.datetime.now(tz=datetime.timezone.utc) - datetime.timedelta(days=i)
    # Log each day's data and set the date on the profile.
    results = why.log(df, dataset_timestamp=dt)
✅ Aggregated 407 rows into profile
Visualize and explore this profile with one-click 🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1725321600000&sessionToken=session-GKTK6PAd
✅ Aggregated 390 rows into profile
Visualize and explore this profile with one-click 🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1725235200000&sessionToken=session-GKTK6PAd
✅ Aggregated 382 rows into profile
Visualize and explore this profile with one-click 🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1725148800000&sessionToken=session-GKTK6PAd
✅ Aggregated 371 rows into profile
Visualize and explore this profile with one-click 🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1725062400000&sessionToken=session-GKTK6PAd
✅ Aggregated 301 rows into profile
Visualize and explore this profile with one-click 🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1724976000000&sessionToken=session-GKTK6PAd
✅ Aggregated 392 rows into profile
Visualize and explore this profile with one-click 🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1724889600000&sessionToken=session-GKTK6PAd
✅ Aggregated 283 rows into profile
Visualize and explore this profile with one-click 🔍 https://hub.whylabsapp.com/resources/model-1/profiles?profile=1724803200000&sessionToken=session-GKTK6PAd
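The mapping from batch index to date can be sketched on its own, without whylogs. This assumes daily batches and truncates each timestamp to midnight UTC so the values are easy to compare; `batch_timestamps` is a hypothetical helper and `fixed_now` is an illustrative value:

```python
import datetime


def batch_timestamps(n_batches, now=None):
    """Return one UTC timestamp per batch, walking back one day at a time."""
    if now is None:
        now = datetime.datetime.now(tz=datetime.timezone.utc)
    # Truncate to midnight so each value marks the start of its day.
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return [today - datetime.timedelta(days=i) for i in range(n_batches)]


# A fixed "now" keeps the example reproducible.
fixed_now = datetime.datetime(2024, 9, 3, 15, 30, tzinfo=datetime.timezone.utc)
for i, ts in enumerate(batch_timestamps(3, now=fixed_now)):
    print(i, ts.isoformat(), int(ts.timestamp() * 1000))
```

With `fixed_now` on 2024-09-03, the first timestamp is 1725321600000 in epoch milliseconds, and each later one is 86400000 smaller. Those are the same values that appear in the `profile=` query parameter of the URLs in the log output.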
from IPython.core.display import HTML
from whylogs.api.whylabs.session.session_manager import get_current_session
session = get_current_session()
model_id = session.config.get_default_dataset_id()
HTML(f'To view your statistics, go to the <a href="https://hub.whylabsapp.com/models/{model_id}/summary" target="_blank">model dashboard</a>')
WhyLabs stores the statistics that whylogs is configured to collect.
Note that these statistics are organized in batches, so if you run the cells above again, you'll see the statistics change.
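The changing numbers are consistent with profiles for the same batch being merged rather than replaced. A conceptual sketch of why summaries can be merged without access to the raw rows, using a toy summary that tracks only count, min, max, and sum; this is not the whylogs implementation, and `Summary` is a hypothetical class:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """A toy mergeable summary of a numeric column."""

    count: int
    minimum: float
    maximum: float
    total: float

    @classmethod
    def of(cls, values):
        """Summarize an iterable of numbers (must be non-empty)."""
        values = list(values)
        return cls(len(values), min(values), max(values), sum(values))

    def merge(self, other):
        """Combine two summaries as if their raw rows had been concatenated."""
        return Summary(
            self.count + other.count,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
        )


first_run = Summary.of([1000.0, 12000.0, 40000.0])
second_run = Summary.of([1000.0, 12000.0, 40000.0])
merged = first_run.merge(second_run)
print(merged.count, merged.minimum, merged.maximum)  # 6 1000.0 40000.0
```

Logging the same rows twice into one batch doubles the count while the minimum and maximum stay the same, which is the kind of change you would expect to see after re-running the cells.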
To go further, visit our documentation for more detail on everything you can do to start monitoring your ML and data pipelines.
You can also join our Community Slack Channel for questions related to whylogs, or open a ticket with us if you encounter issues with WhyLabs onboarding.