Transactions allow several profiles to be committed to WhyLabs as a group. Let's start with some setup.
!pip install whylogs
import whylogs as why
from whylabs_client.api.transactions_api import TransactionsApi
from whylogs.core.schema import DatasetSchema
from whylogs.core.segmentation_partition import segment_on_column
from whylogs.api.writer.whylabs import WhyLabsWriter, WhyLabsTransaction
import os
from uuid import uuid4
from whylogs.datasets import Ecommerce
import numpy as np
import pandas as pd
from datetime import datetime, timedelta, timezone
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "org-XXX"
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "model-XXX"
os.environ["WHYLABS_API_KEY"] = "XXXX:org-XXX"
dataset = Ecommerce()
daily_batches = dataset.get_inference_data(number_batches=20)
list_daily_batches = list(daily_batches)
columns = ['product','sales_last_week','market_price','rating','category','output_discount','output_prediction','output_score']
df = list_daily_batches[0].data[columns]
df.head()
date | product | sales_last_week | market_price | rating | category | output_discount | output_prediction | output_score |
---|---|---|---|---|---|---|---|---|
2024-02-23 00:00:00+00:00 | 1-2-3 Noodles - Veg Masala Flavour | 2 | 12.0 | 4.200000 | Snacks and Branded Foods | 0 | 0 | 1.000000 |
2024-02-23 00:00:00+00:00 | Jaggery Powder - Organic, Sulphur Free | 1 | 280.0 | 3.996552 | Gourmet and World Food | 0 | 0 | 0.571833 |
2024-02-23 00:00:00+00:00 | Pudding - Assorted | 3 | 50.0 | 4.400000 | Gourmet and World Food | 0 | 1 | 0.600000 |
2024-02-23 00:00:00+00:00 | Perfectly Moist Dark Chocolate Fudge Cake Mix ... | 1 | 495.0 | 4.000000 | Gourmet and World Food | 0 | 1 | 0.517833 |
2024-02-23 00:00:00+00:00 | Pasta/Spaghetti Spoon - Nylon, Silicon Handle,... | 1 | 299.0 | 3.732046 | Kitchen, Garden and Pets | 1 | 1 | 0.950000 |
why.init(force_local=True)
Initializing session with config /root/.config/whylogs/config.ini
✅ Using session type: LOCAL. Profiles won't be uploaded or written anywhere automatically.
<whylogs.api.whylabs.session.session.LocalSession at 0x7f57fad06ec0>
writer = WhyLabsWriter()
WhyLabsWriter::start_transaction() signals the start of a transaction. Profiles sent to WhyLabs with WhyLabsWriter::write() during the transaction are uploaded to WhyLabs immediately, but won't be processed until WhyLabsWriter::commit_transaction() is called.
transaction_id = writer.start_transaction()
print(f"Started transaction {transaction_id}")
for i in range(5):
    batch_df = list_daily_batches[i].data[columns]
    profile = why.log(batch_df)
    # Give each batch its own day, counting back from yesterday
    timestamp = datetime.now(tz=timezone.utc) - timedelta(days=i+1)
    profile.set_dataset_timestamp(timestamp)
    status, id = writer.write(profile)
    print(status, id)
writer.commit_transaction()
print("Committed transaction")
Started transaction df4ff687-f881-4633-8a44-3d1c24f631d3
True log-v12vewf7Cu9j3aVV
True log-PWW3D23edKlU0aFt
True log-hDYs8dGamli2LHdq
True log-4JIe3jWBahpMou07
True log-2bmc0Rl3u4oGBIu8
Committed transaction
The WhyLabsTransaction context manager can simplify error handling.
timestamp = datetime.now(tz=timezone.utc) - timedelta(days=2)
timestamp
datetime.datetime(2024, 2, 20, 0, 14, 58, 753029, tzinfo=datetime.timezone.utc)
try:
    with WhyLabsTransaction(writer):
        print("Started transaction")
        for i in range(5):
            batch_df = list_daily_batches[i].data[columns]
            profile = why.log(batch_df)
            profile.set_dataset_timestamp(timestamp)
            status, id = writer.write(profile)
            print(status, id)
except Exception:
    print("Transaction failed")
else:
    # Only report success when the with-block finished without raising
    print("Committed transaction")
Started transaction
True log-yaHvpXyNRO53ilWo
True log-Zsa0lbCCqjzzjLGJ
True log-pg57yHO6RuvO4Q8J
True log-FSYoOwtmE8x51xSr
True log-v3G6VyLUn1x1crVy
Committed transaction
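To illustrate why a context manager helps here, the following is a minimal sketch of the general pattern: start a transaction on entry and commit only if the body finishes without raising. It is not the whylogs implementation of WhyLabsTransaction. The names `RecordingWriter` and `simple_transaction` are hypothetical stand-ins invented for this illustration, and what the real class does when the body fails is not covered by this sketch.

```python
from contextlib import contextmanager
from typing import Iterator, List


class RecordingWriter:
    """Hypothetical stand-in for a writer; it only records which calls were made."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def start_transaction(self) -> str:
        self.calls.append("start")
        return "txn-demo"

    def commit_transaction(self) -> None:
        self.calls.append("commit")


@contextmanager
def simple_transaction(writer: RecordingWriter) -> Iterator[str]:
    """Start a transaction, run the with-body, and commit only on success."""
    transaction_id = writer.start_transaction()
    # If the with-body raises, the exception is re-raised at this yield,
    # so the commit below is skipped.
    yield transaction_id
    writer.commit_transaction()
```

With this pattern the caller no longer has to remember to call the commit method, and a failure inside the body cannot lead to an accidental commit.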
If a write() call returns a status of False, that profile is not included in the transaction. You can retry the write; if you don't, the profile is left out of the transaction, but the profiles that were written successfully are still included.
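One way to retry a failed write is a small helper like the one below. This is a sketch under stated assumptions: it only relies on the writer having a write() method that returns a (status, id) pair, as in the cells above. The helper `write_with_retry`, its `max_attempts` and `delay_seconds` parameters, and the `FlakyWriter` test double are hypothetical names introduced for this example and are not part of whylogs.

```python
import time
from typing import Any, Tuple


def write_with_retry(writer: Any, profile: Any, max_attempts: int = 3,
                     delay_seconds: float = 1.0) -> Tuple[bool, str]:
    """Call writer.write(profile) until it reports success or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, ref = False, ""
    for attempt in range(1, max_attempts + 1):
        status, ref = writer.write(profile)
        if status:
            break
        # Pause before the next attempt, but not after the last one
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return status, ref


class FlakyWriter:
    """Hypothetical stand-in: fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def write(self, profile: Any) -> Tuple[bool, str]:
        self.calls += 1
        if self.calls <= self.failures:
            return False, ""
        return True, f"log-attempt-{self.calls}"
```

Inside a transaction, the loop body would then call `status, id = write_with_retry(writer, profile)` in place of `writer.write(profile)`. If the status is still False after the last attempt, the profile is simply left out of the transaction.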
Each segment in a segmented profile is uploaded to WhyLabs in a separate S3 interaction. Segmented profiles can be sent as a transaction so that all the segments are committed to WhyLabs at once. In this case, the status returned from WhyLabsWriter::write() is the logical AND of the statuses of the individual segments, and the returned ID lists the IDs of all the segments.
schema = DatasetSchema(segments=segment_on_column("output_discount"))
profile = why.log(df, schema=schema)
with WhyLabsTransaction(writer):
    status, id = writer.write(profile)
print(f"{status} {id}")
True log-Rhlr7KzY6pp7vla5; log-8ihsF7KAbhAfNFM6
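The way per-segment results roll up into a single return value can be sketched as follows. This is an illustration of the logical-AND rule and of the "; "-joined ID string seen in the output above, not the whylogs implementation; the function name `combine_segment_results` and the sample IDs are made up for this example.

```python
from typing import List, Tuple


def combine_segment_results(results: List[Tuple[bool, str]]) -> Tuple[bool, str]:
    """Fold per-segment (status, id) pairs into one overall (status, ids) pair."""
    # The overall status is True only if every segment was written successfully
    overall_status = all(status for status, _ in results)
    # The segment IDs are joined into one string, mirroring the output format above
    joined_ids = "; ".join(segment_id for _, segment_id in results)
    return overall_status, joined_ids
```

For example, `combine_segment_results([(True, "log-a"), (False, "log-b")])` yields a False status, because a single failed segment makes the whole write count as failed.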