whylogs provides a standard to log any kind of data.
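Logging data with whylogs generates statistical summaries called profiles. These profiles can be used in a number of ways, for example for visualization and for merging with other profiles.
In this example, we'll explore the basics of logging data with whylogs: installing the package, profiling a DataFrame, inspecting the resulting profile, and writing a profile to disk and reading it back.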
whylogs is available as a Python package. You can get the latest version from PyPI with pip install whylogs:
# Note: you may need to restart the kernel to use updated packages.
%pip install whylogs
Minimal requirements:
Before showing how we can log data, we first need the data itself. Let's create a simple Pandas DataFrame:
import pandas as pd
data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}
df = pd.DataFrame(data)
To obtain a profile of your data, you can simply use whylogs' log call, and navigate through the result to a specific profile with profile():
import whylogs as why
results = why.log(df)
profile = results.profile()
Once you're done logging the data, you can generate a Profile View and inspect it in pandas DataFrame format:
prof_view = profile.view()
prof_df = prof_view.to_pandas()
prof_df
column | cardinality/est | cardinality/lower_1 | cardinality/upper_1 | counts/inf | counts/n | counts/nan | counts/null | distribution/max | distribution/mean | distribution/median | ... | frequent_items/frequent_strings | type | types/boolean | types/fractional | types/integral | types/object | types/string | types/tensor | ints/max | ints/min |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
animal | 3.0 | 3.0 | 3.00015 | 0 | 4 | 0 | 0 | NaN | 0.000 | NaN | ... | [FrequentItem(value='cat', est=2, upper=2, low... | SummaryType.COLUMN | 0 | 0 | 0 | 0 | 4 | 0 | NaN | NaN |
legs | 3.0 | 3.0 | 3.00015 | 0 | 4 | 0 | 0 | 4.0 | 2.500 | 4.0 | ... | [FrequentItem(value='4', est=2, upper=2, lower... | SummaryType.COLUMN | 0 | 0 | 4 | 0 | 0 | 0 | 4.0 | 0.0 |
weight | 4.0 | 4.0 | 4.00020 | 0 | 4 | 0 | 0 | 4.3 | 2.875 | 4.1 | ... | NaN | SummaryType.COLUMN | 0 | 4 | 0 | 0 | 0 | 0 | NaN | NaN |
3 rows × 31 columns
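The profile view provides statistics for each column (feature), such as cardinality estimates, counts (total, null, NaN, and infinite values), distribution metrics (max, mean, median), frequent items, and inferred data types.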
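As a rough illustration of what a few of these per-column statistics mean, the sketch below recomputes them for the same toy data using only the Python standard library. It is not how whylogs works internally: whylogs reports estimates with bounds, while this sketch counts exactly. The helper summarize_column and the key cardinality/exact are hypothetical names introduced only for this example.

```python
from statistics import fmean

# Same toy data as the DataFrame above, as plain Python lists.
data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}


def summarize_column(values):
    """Hypothetical helper: exact, toy versions of a few profile statistics."""
    non_null = [v for v in values if v is not None]
    summary = {
        "counts/n": len(values),
        "counts/null": len(values) - len(non_null),
        # Assumption: exact distinct count, standing in for an estimated cardinality.
        "cardinality/exact": len(set(non_null)),
    }
    # Only compute distribution statistics when every non-null value is numeric.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if non_null and is_numeric:
        summary["distribution/mean"] = fmean(non_null)
        summary["distribution/max"] = max(non_null)
    return summary


toy_profile = {name: summarize_column(values) for name, values in data.items()}
for name, stats in toy_profile.items():
    print(name, stats)
```

The string column animal gets counts and a distinct count but no distribution statistics, which mirrors the NaN entries for distribution/max in the table above.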
You can also store your profile on disk for further inspection:
why.write(profile, "profile.bin")
This will create a profile binary file in your local filesystem.
You can read the profile back into memory with:
n_prof = why.read("profile.bin")
Note: write expects a profile as parameter, while read returns a Profile View. That means that you can use the loaded profile for visualization purposes and merging, but not for further tracking and updates.
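To give some intuition for why a read-only summary can still be merged, here is a minimal sketch of a mergeable summary in plain Python. It is a toy analogy, not whylogs' implementation: the class ToySummary and its fields are hypothetical, and it tracks only a count, a sum, and a maximum. The dataclass is frozen to loosely mirror a view that cannot be updated with new data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySummary:
    """Hypothetical read-only summary of a numeric column."""

    n: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    @classmethod
    def from_values(cls, values):
        values = list(values)
        return cls(len(values), float(sum(values)), max(values, default=float("-inf")))

    def merge(self, other):
        # Merging returns a new summary; neither input is modified.
        return ToySummary(
            self.n + other.n,
            self.total + other.total,
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        return self.total / self.n if self.n else float("nan")


# Summaries of two batches of the "legs" column, built independently.
batch_1 = ToySummary.from_values([4, 2])
batch_2 = ToySummary.from_values([0, 4])

merged = batch_1.merge(batch_2)
whole = ToySummary.from_values([4, 2, 0, 4])
print(merged == whole, merged.mean)
```

In this sketch, merging the two batch summaries gives the same result as summarizing all four values at once, without access to the raw data.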
There's a lot you can do with the profiles you just created. Go to the examples page for the complete list of examples!