whylogs lets you log different types of data and then use those logs to monitor that data. We'll walk through examples of logging different types of data and look more closely at the available options. Before we get going, let's install whylogs.
# Note: you may need to restart the kernel to use updated packages.
%pip install whylogs
We'll generate a log by reading a CSV into a pandas DataFrame and logging it with the whylogs Python library.
import os.path
import pandas as pd
# Read in a CSV; this one is hosted in a public S3 bucket
retail_daily = pd.read_csv('https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_examples/retail-daily-features.csv')
retail_daily
| | Transaction ID | Customer ID | Product Subcategory Code | Product Category Code | Item Price | Total Tax | Total Amount | Store Type | Product Category | Product Subcategory | Date of Birth | Gender | City Code | Age at Transaction Date | Purchase Canceled | Transaction Day of Week | Transaction Week | Transaction Batch |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | T25601292314 | C268458 | 12 | 6 | 114.9 | 24.1290 | 253.9290 | TeleShop | Home and kitchen | Tools | 1976-10-08 | M | 1.0 | 36.0 | 0.0 | 0 | 0 | 0 |
1 | T1465175267 | C271344 | 3 | 5 | 107.7 | 22.6170 | 238.0170 | e-Shop | Books | Comics | 1970-01-29 | F | 5.0 | 43.0 | 0.0 | 0 | 0 | 0 |
2 | T4968790114 | C272305 | 4 | 3 | 14.6 | 7.6650 | 80.6650 | e-Shop | Electronics | Mobiles | 1975-08-25 | F | 10.0 | 37.0 | 0.0 | 0 | 0 | 0 |
3 | T50504166310 | C275057 | 4 | 4 | 15.7 | 4.9455 | 52.0455 | MBR | Bags | Women | 1980-09-17 | M | 7.0 | 32.0 | 0.0 | 0 | 0 | 0 |
4 | T10877729712 | C270074 | 10 | 5 | 144.1 | 45.3915 | 477.6915 | e-Shop | Books | Non-Fiction | 1983-02-20 | M | 10.0 | 30.0 | 0.0 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
904 | T89167826318 | C274270 | 10 | 5 | 68.2 | 14.3220 | 150.7220 | Flagship store | Books | Non-Fiction | 1972-06-06 | F | 1.0 | 40.0 | 0.0 | 0 | 0 | 0 |
905 | T87193008634 | C271051 | 11 | 6 | 124.2 | 52.1640 | 548.9640 | e-Shop | Home and kitchen | Bath | 1976-02-13 | F | 3.0 | 36.0 | 0.0 | 0 | 0 | 0 |
906 | T84036395834 | C270763 | 11 | 5 | 77.9 | 16.3590 | 172.1590 | TeleShop | Books | Children | 1991-02-10 | F | 8.0 | 21.0 | 0.0 | 0 | 0 | 0 |
907 | T72150045625 | C270432 | 11 | 5 | 11.8 | 4.9560 | 52.1560 | e-Shop | Books | Children | 1982-09-17 | M | 7.0 | 30.0 | 1.0 | 0 | 0 | 0 |
908 | T97942600110 | C269559 | 4 | 3 | 67.9 | 35.6475 | 375.1475 | e-Shop | Electronics | Mobiles | 1972-06-24 | M | 1.0 | 40.0 | 0.0 | 0 | 0 | 0 |
909 rows × 18 columns
import whylogs as why
# Log the DataFrame. This is equivalent to why.log(retail_daily) and why.log(data=retail_daily)
results = why.log(pandas=retail_daily)
# Get the profile from the results
profile = results.profile()
# See the section on displaying a log below for an explanation of this output
profile.view().to_pandas()
column | counts/n | counts/null | types/integral | types/fractional | types/boolean | types/string | types/object | cardinality/est | cardinality/upper_1 | cardinality/lower_1 | ... | distribution/min | distribution/q_10 | distribution/q_25 | distribution/median | distribution/q_75 | distribution/q_90 | type | ints/max | ints/min | frequent_items/frequent_strings |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Purchase Canceled | 909 | 72 | 0 | 837 | 0 | 0 | 0 | 2.000000 | 2.000100 | 2.000000 | ... | 0.000 | 0.0000 | 0.000 | 0.0000 | 0.0000 | 0.0000 | SummaryType.COLUMN | NaN | NaN | NaN |
Age at Transaction Date | 909 | 0 | 0 | 909 | 0 | 0 | 0 | 25.000001 | 25.001250 | 25.000000 | ... | 19.000 | 21.0000 | 25.000 | 31.0000 | 37.0000 | 40.0000 | SummaryType.COLUMN | NaN | NaN | NaN |
Transaction Week | 909 | 0 | 909 | 0 | 0 | 0 | 0 | 1.000000 | 1.000050 | 1.000000 | ... | 0.000 | 0.0000 | 0.000 | 0.0000 | 0.0000 | 0.0000 | SummaryType.COLUMN | 0.0 | 0.0 | [FrequentItem(value='0.000000', est=909, upper... |
Store Type | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 4.000000 | 4.000200 | 4.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='e-Shop', est=375, upper=3... |
Product Category | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 6.000000 | 6.000300 | 6.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='Books', est=232, upper=23... |
Gender | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 2.000000 | 2.000100 | 2.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='M', est=455, upper=455, l... |
Transaction ID | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 904.722898 | 916.565225 | 893.168643 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='T40336799311', est=3, upp... |
Item Price | 909 | 0 | 0 | 909 | 0 | 0 | 0 | 672.542875 | 681.346093 | 663.953801 | ... | 7.100 | 18.2000 | 43.200 | 80.1000 | 116.2000 | 137.0000 | SummaryType.COLUMN | NaN | NaN | NaN |
Total Tax | 909 | 0 | 0 | 909 | 0 | 0 | 0 | 800.975225 | 811.459552 | 790.745935 | ... | 0.861 | 4.8825 | 10.017 | 20.5170 | 36.1725 | 54.2640 | SummaryType.COLUMN | NaN | NaN | NaN |
Product Category Code | 909 | 0 | 909 | 0 | 0 | 0 | 0 | 6.000000 | 6.000300 | 6.000000 | ... | 1.000 | 1.0000 | 2.000 | 4.0000 | 5.0000 | 6.0000 | SummaryType.COLUMN | 6.0 | 1.0 | [FrequentItem(value='5.000000', est=232, upper... |
Transaction Day of Week | 909 | 0 | 909 | 0 | 0 | 0 | 0 | 1.000000 | 1.000050 | 1.000000 | ... | 0.000 | 0.0000 | 0.000 | 0.0000 | 0.0000 | 0.0000 | SummaryType.COLUMN | 0.0 | 0.0 | [FrequentItem(value='0.000000', est=909, upper... |
City Code | 909 | 1 | 0 | 908 | 0 | 0 | 0 | 10.000000 | 10.000500 | 10.000000 | ... | 1.000 | 1.0000 | 3.000 | 5.0000 | 8.0000 | 10.0000 | SummaryType.COLUMN | NaN | NaN | NaN |
Transaction Batch | 909 | 0 | 909 | 0 | 0 | 0 | 0 | 1.000000 | 1.000050 | 1.000000 | ... | 0.000 | 0.0000 | 0.000 | 0.0000 | 0.0000 | 0.0000 | SummaryType.COLUMN | 0.0 | 0.0 | [FrequentItem(value='0.000000', est=909, upper... |
Date of Birth | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 801.978113 | 812.475567 | 791.736016 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='1981-03-29', est=4, upper... |
Customer ID | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 847.420398 | 858.512667 | 836.597955 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='C274278', est=4, upper=3,... |
Total Amount | 909 | 0 | 0 | 909 | 0 | 0 | 0 | 842.098548 | 853.121157 | 831.344071 | ... | -767.975 | 25.3045 | 82.433 | 188.4025 | 358.6830 | 555.2625 | SummaryType.COLUMN | NaN | NaN | NaN |
Product Subcategory | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 18.000001 | 18.000899 | 18.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='Women', est=133, upper=13... |
Product Subcategory Code | 909 | 0 | 909 | 0 | 0 | 0 | 0 | 12.000000 | 12.000599 | 12.000000 | ... | 1.000 | 1.0000 | 3.000 | 5.0000 | 10.0000 | 11.0000 | SummaryType.COLUMN | 12.0 | 1.0 | [FrequentItem(value='4.000000', est=148, upper... |
18 rows × 24 columns
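One thing a profile like this makes easy to spot is which columns behave like identifiers. The sketch below is only an illustration: it copies a few `counts/n` and `cardinality/est` values from the table above into a plain dictionary and compares them. It assumes that `cardinality/est` is an estimated count of distinct values, which is what its name and values suggest; the `UNIQUENESS_THRESHOLD` cutoff and the helper name `uniqueness_ratio` are hypothetical and not part of whylogs.

```python
# Values copied from the counts/n and cardinality/est columns of the profile above.
profile_stats = {
    "Transaction ID": {"counts/n": 909, "cardinality/est": 904.722898},
    "Customer ID": {"counts/n": 909, "cardinality/est": 847.420398},
    "Store Type": {"counts/n": 909, "cardinality/est": 4.0},
    "Gender": {"counts/n": 909, "cardinality/est": 2.0},
}

# Hypothetical cutoff: a column is "identifier-like" when almost every value is distinct.
UNIQUENESS_THRESHOLD = 0.9


def uniqueness_ratio(stats):
    """Estimated distinct values divided by the number of logged values."""
    n = stats["counts/n"]
    return stats["cardinality/est"] / n if n else 0.0


uniqueness = {column: uniqueness_ratio(stats) for column, stats in profile_stats.items()}
identifier_like = [column for column, ratio in uniqueness.items() if ratio >= UNIQUENESS_THRESHOLD]

for column, ratio in uniqueness.items():
    print(f"{column}: {ratio:.3f}")
print("Identifier-like columns:", identifier_like)
```

Columns such as `Store Type` and `Gender` come out with a very low ratio, which is what you would expect from categorical features.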
Sometimes a quick log is all you need, and you don't want to set up a DataFrame. In that case, we can log a dictionary as if it were a single row of data. This works best when the dictionary's values are scalars: any collection or nested values are tracked with only a basic type counter, and those entries are counted as objects.
Suppose we want to log the art prints being shown to see which ones sell best.
import whylogs as why
example_data = {"height": 100, "length": 1000, "status": "sold", "price": 58.00, "medium": ["watercolor", "digital"] }
# Log the dictionary. This is equivalent to why.log(example_data)
dict_results = why.log(row=example_data)
# Retrieve the profile
profile_from_dict = dict_results.profile()
# See the section on displaying a log below for an explanation of this output
profile_from_dict.view().to_pandas()
column | counts/n | counts/null | types/integral | types/fractional | types/boolean | types/string | types/object | cardinality/est | cardinality/upper_1 | cardinality/lower_1 | ... | distribution/n | distribution/max | distribution/min | distribution/q_10 | distribution/q_25 | distribution/median | distribution/q_75 | distribution/q_90 | ints/max | ints/min |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
status | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1.0 | 1.00005 | 1.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
medium | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
price | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1.0 | 1.00005 | 1.0 | ... | 1.0 | 58.0 | 58.0 | 58.0 | 58.0 | 58.0 | 58.0 | 58.0 | NaN | NaN |
height | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1.0 | 1.00005 | 1.0 | ... | 1.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
length | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1.0 | 1.00005 | 1.0 | ... | 1.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 |
5 rows × 24 columns
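To make the type counters in this output easier to read, here is a rough sketch of how the values in `example_data` line up with the `types/*` columns. This is not how whylogs is implemented; `classify_value` is a hypothetical helper written only to mirror the table above, and it ignores cases such as `None` values.

```python
from numbers import Integral, Real


def classify_value(value):
    """Return the name of the type counter that a value lines up with in the table above."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "types/boolean"
    if isinstance(value, Integral):
        return "types/integral"
    if isinstance(value, Real):
        return "types/fractional"
    if isinstance(value, str):
        return "types/string"
    # Lists, dicts, and other collection or nested values fall through to the object counter.
    return "types/object"


example_data = {"height": 100, "length": 1000, "status": "sold", "price": 58.00, "medium": ["watercolor", "digital"]}

counters = {key: classify_value(value) for key, value in example_data.items()}
for key, counter in counters.items():
    print(f"{key}: {counter}")
```

The `medium` list lands in `types/object`, which matches its row in the profile: the object counter is 1 and the cardinality and distribution columns are `NaN`.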
There are many ways to display the data! The examples in "Integrations", "WhyLabs", and "Use Cases" showcase a variety of tools for viewing your data. The Notebook_Profile_Visualizer example also helps you display the profile with a variety of charts.
The log call above returns a result set that includes the profile. We can view this profile and export it as a pandas DataFrame.
# Note: run the DataFrame example above first to get the results used in this block
# Grab the profile from the result set
profile = results.profile()
# Grab a 'view' of the profile for inspection
prof_view = profile.view()
# Inspect the profile as a pandas DataFrame
prof_df = prof_view.to_pandas()
prof_df
column | counts/n | counts/null | types/integral | types/fractional | types/boolean | types/string | types/object | cardinality/est | cardinality/upper_1 | cardinality/lower_1 | ... | distribution/min | distribution/q_10 | distribution/q_25 | distribution/median | distribution/q_75 | distribution/q_90 | type | ints/max | ints/min | frequent_items/frequent_strings |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Purchase Canceled | 909 | 72 | 0 | 837 | 0 | 0 | 0 | 2.000000 | 2.000100 | 2.000000 | ... | 0.000 | 0.0000 | 0.000 | 0.0000 | 0.0000 | 0.0000 | SummaryType.COLUMN | NaN | NaN | NaN |
Age at Transaction Date | 909 | 0 | 0 | 909 | 0 | 0 | 0 | 25.000001 | 25.001250 | 25.000000 | ... | 19.000 | 21.0000 | 25.000 | 31.0000 | 37.0000 | 40.0000 | SummaryType.COLUMN | NaN | NaN | NaN |
Transaction Week | 909 | 0 | 909 | 0 | 0 | 0 | 0 | 1.000000 | 1.000050 | 1.000000 | ... | 0.000 | 0.0000 | 0.000 | 0.0000 | 0.0000 | 0.0000 | SummaryType.COLUMN | 0.0 | 0.0 | [FrequentItem(value='0.000000', est=909, upper... |
Store Type | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 4.000000 | 4.000200 | 4.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='e-Shop', est=375, upper=3... |
Product Category | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 6.000000 | 6.000300 | 6.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='Books', est=232, upper=23... |
Gender | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 2.000000 | 2.000100 | 2.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='M', est=455, upper=455, l... |
Transaction ID | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 904.722898 | 916.565225 | 893.168643 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='T40336799311', est=3, upp... |
Item Price | 909 | 0 | 0 | 909 | 0 | 0 | 0 | 672.542875 | 681.346093 | 663.953801 | ... | 7.100 | 18.2000 | 43.200 | 80.1000 | 116.2000 | 137.0000 | SummaryType.COLUMN | NaN | NaN | NaN |
Total Tax | 909 | 0 | 0 | 909 | 0 | 0 | 0 | 800.975225 | 811.459552 | 790.745935 | ... | 0.861 | 4.8825 | 10.017 | 20.5170 | 36.1725 | 54.2640 | SummaryType.COLUMN | NaN | NaN | NaN |
Product Category Code | 909 | 0 | 909 | 0 | 0 | 0 | 0 | 6.000000 | 6.000300 | 6.000000 | ... | 1.000 | 1.0000 | 2.000 | 4.0000 | 5.0000 | 6.0000 | SummaryType.COLUMN | 6.0 | 1.0 | [FrequentItem(value='5.000000', est=232, upper... |
Transaction Day of Week | 909 | 0 | 909 | 0 | 0 | 0 | 0 | 1.000000 | 1.000050 | 1.000000 | ... | 0.000 | 0.0000 | 0.000 | 0.0000 | 0.0000 | 0.0000 | SummaryType.COLUMN | 0.0 | 0.0 | [FrequentItem(value='0.000000', est=909, upper... |
City Code | 909 | 1 | 0 | 908 | 0 | 0 | 0 | 10.000000 | 10.000500 | 10.000000 | ... | 1.000 | 1.0000 | 3.000 | 5.0000 | 8.0000 | 10.0000 | SummaryType.COLUMN | NaN | NaN | NaN |
Transaction Batch | 909 | 0 | 909 | 0 | 0 | 0 | 0 | 1.000000 | 1.000050 | 1.000000 | ... | 0.000 | 0.0000 | 0.000 | 0.0000 | 0.0000 | 0.0000 | SummaryType.COLUMN | 0.0 | 0.0 | [FrequentItem(value='0.000000', est=909, upper... |
Date of Birth | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 801.978113 | 812.475567 | 791.736016 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='1981-03-29', est=4, upper... |
Customer ID | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 847.420398 | 858.512667 | 836.597955 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='C274278', est=4, upper=3,... |
Total Amount | 909 | 0 | 0 | 909 | 0 | 0 | 0 | 842.098548 | 853.121157 | 831.344071 | ... | -767.975 | 25.3045 | 82.433 | 188.4025 | 358.6830 | 555.2625 | SummaryType.COLUMN | NaN | NaN | NaN |
Product Subcategory | 909 | 0 | 0 | 0 | 0 | 909 | 0 | 18.000001 | 18.000899 | 18.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | SummaryType.COLUMN | NaN | NaN | [FrequentItem(value='Women', est=133, upper=13... |
Product Subcategory Code | 909 | 0 | 909 | 0 | 0 | 0 | 0 | 12.000000 | 12.000599 | 12.000000 | ... | 1.000 | 1.0000 | 3.000 | 5.0000 | 10.0000 | 11.0000 | SummaryType.COLUMN | 12.0 | 1.0 | [FrequentItem(value='4.000000', est=148, upper... |
18 rows × 24 columns
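Once the profile is in tabular form, simple data-quality checks become one-liners. As a minimal sketch, the snippet below computes the fraction of missing values per column from `counts/null` and `counts/n`. To stay self-contained it uses a plain dictionary holding values copied from the table above rather than the real `prof_df`; the helper name `null_fraction` is hypothetical.

```python
# Values copied from the counts/n and counts/null columns of the table above.
profile_counts = {
    "Purchase Canceled": {"counts/n": 909, "counts/null": 72},
    "City Code": {"counts/n": 909, "counts/null": 1},
    "Total Amount": {"counts/n": 909, "counts/null": 0},
}


def null_fraction(counts):
    """Share of logged values that were null, guarding against an empty column."""
    n = counts["counts/n"]
    return counts["counts/null"] / n if n else 0.0


null_fractions = {column: null_fraction(counts) for column, counts in profile_counts.items()}

# Keep only columns that have missing values, worst offenders first.
columns_with_nulls = sorted(
    (column for column, fraction in null_fractions.items() if fraction > 0),
    key=null_fractions.get,
    reverse=True,
)

for column in columns_with_nulls:
    print(f"{column}: {null_fractions[column]:.2%} missing")
```

With the actual DataFrame, the same ratio should be available as `prof_df["counts/null"] / prof_df["counts/n"]`, since both are ordinary columns of the exported profile.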