whylogs provides a standard to log any kind of data.
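whylogs logs data by generating statistical summaries called profiles. These profiles can be used in a number of ways, such as visualization and merging with other profiles.
In this example, we'll explore the basics of logging data with whylogs: installing the package, profiling a Pandas DataFrame, inspecting the resulting profile, and writing it to and reading it back from disk.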
whylogs is made available as a Python package. You can get the latest version from PyPI with pip install whylogs:
!pip install -q whylogs --pre
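Because the command above passes `--pre`, pip may pick a pre-release build, so it can be useful to confirm which version ended up in your environment. The following is a minimal sketch using only the standard library; the `installed_version` helper is a hypothetical name introduced for illustration and is not part of whylogs.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    Hypothetical helper for illustration; not part of whylogs.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


whylogs_version = installed_version("whylogs")
print(whylogs_version or "whylogs is not installed")
```

If this prints a version string, the installation succeeded and the import in the next steps should work.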
Before showing how we can log data, we first need the data itself. Let's create a simple Pandas DataFrame:
import pandas as pd
data = {
"animal": ["cat", "hawk", "snake", "cat"],
"legs": [4, 2, 0, 4],
"weight": [4.3, 1.8, 1.3, 4.1],
}
df = pd.DataFrame(data)
To obtain a profile of your data, you can simply use whylogs' log call, and navigate through the result to the profile with profile():
import whylogs as why
results = why.log(df)
profile = results.profile()
Once you're done logging the data, you can generate a Profile View and inspect it in a Pandas DataFrame format:
prof_view = profile.view()
prof_df = prof_view.to_pandas()
prof_df
column | counts/n | counts/null | types/integral | types/fractional | types/boolean | types/string | types/object | cardinality/est | cardinality/upper_1 | cardinality/lower_1 | ... | distribution/n | distribution/max | distribution/min | distribution/q_10 | distribution/q_25 | distribution/median | distribution/q_75 | distribution/q_90 | ints/max | ints/min
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
animal | 8 | 0 | 0 | 0 | 0 | 8 | 0 | 6.0 | 6.00030 | 6.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
weight | 8 | 0 | 0 | 8 | 0 | 0 | 0 | 7.0 | 7.00035 | 7.0 | ... | 8.0 | 30.1 | 1.3 | 1.3 | 4.1 | 4.3 | 14.3 | 30.1 | NaN | NaN
legs | 8 | 0 | 8 | 0 | 0 | 0 | 0 | 3.0 | 3.00015 | 3.0 | ... | 8.0 | 4.0 | 0.0 | 0.0 | 2.0 | 4.0 | 4.0 | 4.0 | 4.0 | 0.0
3 rows × 24 columns
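This will provide you with valuable statistics on a column (feature) basis, such as total and null counts, inferred data types, estimated cardinality, and distribution metrics (minimum, maximum, median, and other quantiles).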
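To make the idea of per-column statistics concrete, here is a minimal pure-Python sketch that computes a few of the metrics named in the table above for the example data. It is only an illustration under simplifying assumptions (exact values, no sketching) and is not how whylogs computes profiles internally; the `summarize_column` helper is hypothetical, and its keys were chosen to mirror the table's column headers.

```python
from numbers import Integral, Real


def summarize_column(values):
    """Toy per-column summary; hypothetical helper, not part of whylogs."""
    non_null = [v for v in values if v is not None]
    summary = {
        "counts/n": len(values),
        "counts/null": len(values) - len(non_null),
        # bool is a subclass of int, so exclude it from the integral count.
        "types/integral": sum(
            isinstance(v, Integral) and not isinstance(v, bool) for v in non_null
        ),
        "types/fractional": sum(
            isinstance(v, Real) and not isinstance(v, Integral) for v in non_null
        ),
        "types/string": sum(isinstance(v, str) for v in non_null),
    }
    # Distribution metrics only make sense for numeric columns.
    numeric = [v for v in non_null if isinstance(v, Real) and not isinstance(v, bool)]
    if numeric:
        summary["distribution/min"] = min(numeric)
        summary["distribution/max"] = max(numeric)
    return summary


data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}

summaries = {name: summarize_column(column) for name, column in data.items()}
for name, summary in summaries.items():
    print(name, summary)
```

As in the profile table, the string column gets counts and type information but no distribution metrics, while the numeric columns get both.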
You can also store your profile on disk for further inspection:
why.write(profile, "profile.bin")
This will create a profile binary file in your local filesystem.
You can read the profile back into memory with:
n_prof = why.read("profile.bin")
Note: write expects a profile as parameter, while read returns a Profile View. That means that you can use the loaded profile for visualization purposes and merging, but not for further tracking and updates.
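The distinction between a read-only summary that can still be merged and one that can be updated with new data can be illustrated with a toy example. The sketch below assumes a very small summary (count, minimum, maximum) that is immutable but can be combined with another summary without revisiting the raw values. `ToySummary` and its methods are hypothetical names that stand in for, rather than reproduce, whylogs' own Profile View.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: instances cannot be updated in place
class ToySummary:
    """Hypothetical stand-in for a read-only summary; not part of whylogs."""

    n: int
    minimum: float
    maximum: float

    @classmethod
    def from_values(cls, values):
        # Build a summary from a non-empty batch of numeric values.
        return cls(len(values), min(values), max(values))

    def merge(self, other):
        # Combine two summaries into a new one using only their statistics.
        return ToySummary(
            self.n + other.n,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )


batch_a = ToySummary.from_values([4.3, 1.8])
batch_b = ToySummary.from_values([1.3, 4.1])
merged = batch_a.merge(batch_b)

# Merging the two batch summaries matches summarizing all values at once.
full = ToySummary.from_values([4.3, 1.8, 1.3, 4.1])
print(merged == full)
```

Merging here always returns a new object, which is why a summary can be read-only and still mergeable.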
There's a lot you can do with the profiles you just created. Keep getting your hands dirty with the following examples!