#!/usr/bin/env python
# coding: utf-8

# >### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*
# 
# >*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Getting_Started)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Getting_Started) to leverage the power of whylogs and WhyLabs together!*

# # Getting Started

# [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/basic/Getting_Started.ipynb)

# whylogs provides a standard to log any kind of data.
# 
# In this example, we show how to log data with whylogs, generating statistical summaries called *profiles*. These profiles can be used in many ways, such as:
# 
# * Data Visualization
# * Data Validation
# * Tracking changes in your datasets

# ## Table of Contents

# In this example, we'll explore the basics of logging data with whylogs:
# 
# - Installing whylogs
# - Profiling data
# - Interacting with the profile
# - Writing/Reading profiles to/from disk

# ## Installing whylogs

# whylogs is available as a Python package. You can get the latest version from PyPI with `pip install whylogs`:

# In[1]:


# Note: you may need to restart the kernel to use updated packages.
# IPython/Jupyter only: in a plain Python script, run `pip install whylogs` from your shell instead.
get_ipython().run_line_magic('pip', 'install whylogs')


# Minimal requirements:
# 
# - Python 3.7 up to Python 3.10
# - Windows, Linux x86_64, and macOS 10+

# ## Loading a Pandas DataFrame

# Before showing how we can log data, we first need the data itself.
# Let's create a simple Pandas DataFrame:

# In[8]:


import pandas as pd

data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}

df = pd.DataFrame(data)


# ## Profiling with whylogs

# To obtain a profile of your data, call whylogs' `log` function and retrieve the profile from the returned results with `profile()`:

# In[3]:


import whylogs as why

results = why.log(df)
profile = results.profile()


# ## Analyzing Profiles

# Once you're done logging the data, you can generate a `Profile View` and inspect it as a Pandas DataFrame:

# In[9]:


prof_view = profile.view()
prof_df = prof_view.to_pandas()

# In a notebook this displays the DataFrame; in a plain script, use print(prof_df).
prof_df


# This provides useful statistics on a per-column (feature) basis, such as:
# 
# - Counters, such as the number of samples and null values
# - Inferred types, such as integral, fractional and boolean
# - Estimated cardinality
# - Frequent items
# - Distribution metrics: min, max, median and quantile values

# ## Writing to Disk

# You can also store your profile on disk for further inspection:

# In[7]:


why.write(profile, "profile.bin")


# This creates a binary profile file in your local filesystem.

# ## Reading from Disk

# You can read the profile back into memory with:

# In[8]:


n_prof = why.read("profile.bin")


# > Note: `write` expects a profile as a parameter, while `read` returns a `Profile View`. This means you can use the loaded profile for visualization and merging, but not for further tracking and updates.

# ## What's Next?

# There's a lot you can do with the profiles you just created. Keep getting your hands dirty with the following examples!
# - Basic
#     - [Visualizing Profiles](https://whylogs.readthedocs.io/en/stable/examples/basic/Notebook_Profile_Visualizer.html) - Compare profiles to detect distribution shifts, visualize histograms and bar charts, and explore your data
#     - [Logging Data](https://whylogs.readthedocs.io/en/stable/examples/basic/Logging_Different_Data.html) - See the different ways you can log your data with whylogs
#     - [Inspecting Profiles](https://whylogs.readthedocs.io/en/stable/examples/basic/Inspecting_Profiles.html) - A deeper dive into the metrics generated by whylogs
#     - [Schema Configuration for Tracking Metrics](https://whylogs.readthedocs.io/en/stable/examples/basic/Schema_Configuration.html) - Configure tracking metrics according to data type or column features
#     - [Data Constraints](https://whylogs.readthedocs.io/en/stable/examples/advanced/Metric_Constraints.html) - Set constraints on your data to ensure its quality
#     - [Merging Profiles](https://whylogs.readthedocs.io/en/stable/examples/basic/Merging_Profiles.html) - Merge profiles logged across different computing instances, time periods or data segments
# - Integrations
#     - [WhyLabs](https://whylogs.readthedocs.io/en/stable/examples/integrations/writers/Writing_to_WhyLabs.html) - Monitor your profiles continuously with the WhyLabs Observability Platform
#     - [PySpark](https://whylogs.readthedocs.io/en/stable/examples/integrations/Pyspark_Profiling.html) - Use whylogs with PySpark
#     - [Writing Profiles](https://whylogs.readthedocs.io/en/stable/examples/integrations/writers/Writing_Profiles.html) - See different ways and locations to output your profiles
#     - [Flask](https://whylogs.readthedocs.io/en/stable/examples/integrations/flask_streaming/flask_with_whylogs.html) - See how you can create a Flask app with whylogs and WhyLabs integration
#     - [Feature Stores](https://whylogs.readthedocs.io/en/stable/examples/integrations/Feature_Stores_and_whylogs.html) - Learn how to log features from your Feature Store with Feast and whylogs
#     - [BigQuery](https://whylogs.readthedocs.io/en/stable/examples/integrations/BigQuery_Example.html) - Profile data queried from a Google BigQuery table
#     - [MLflow](https://whylogs.readthedocs.io/en/stable/examples/integrations/Mlflow_Logging.html) - Log your whylogs profiles to an MLflow environment
# 
# Or go to the [examples page](https://whylogs.readthedocs.io/en/stable/examples.html) for the complete list of examples!
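To make the per-column statistics listed under *Analyzing Profiles* more concrete, here is a minimal plain-Python sketch that computes exact versions of a few of them (counts, nulls, a rough inferred type, distinct count, most frequent item, min/max/median) for the same toy `data` used in this example. It is an illustration under our own assumptions, not how whylogs works internally: whylogs reports an *estimated* cardinality and approximate distribution metrics, and the helper `summarize_column` is hypothetical, not part of the whylogs API. It only needs the standard library, so it runs without pandas or whylogs installed.

```python
from collections import Counter
from statistics import median

# Same toy data as the DataFrame above, kept as a plain dict.
data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}


def summarize_column(values):
    """Hypothetical helper: exact per-column stats, similar in spirit to a profile."""
    non_null = [v for v in values if v is not None]
    summary = {
        # Counters: total samples and missing values.
        "count": len(values),
        "null_count": len(values) - len(non_null),
        # Rough type inference: how many values fall into each category.
        "types": {
            "integral": sum(isinstance(v, int) and not isinstance(v, bool) for v in non_null),
            "fractional": sum(isinstance(v, float) for v in non_null),
            "boolean": sum(isinstance(v, bool) for v in non_null),
            "string": sum(isinstance(v, str) for v in non_null),
        },
        # Exact distinct count (whylogs estimates this instead).
        "cardinality": len(set(non_null)),
        # Most frequent item as a (value, count) pair.
        "most_frequent": Counter(non_null).most_common(1)[0] if non_null else None,
    }
    numeric = [v for v in non_null if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric:
        # Distribution metrics, only for numeric columns.
        summary.update(min=min(numeric), max=max(numeric), median=median(numeric))
    return summary


summaries = {name: summarize_column(values) for name, values in data.items()}
for name, stats in summaries.items():
    print(name, stats)
```

Exact values are used here only for readability: an exact distinct count or median requires keeping every value, so a profiling library that must scale and merge typically relies on approximate sketches instead.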
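The note under *Reading from Disk* mentions that a loaded profile can still be used for merging. The idea behind merging can be sketched with a toy summary whose fields combine without revisiting the raw rows: counts add up, and min/max take the min/max of both sides. The `ToySummary` class below is hypothetical and for illustration only; it is not how whylogs stores or merges profiles, which involves richer, sketch-based metrics. The values are the toy weights from this example, split into two batches, with one missing value added to exercise the null counter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySummary:
    """Hypothetical mergeable summary for one numeric column (illustration only)."""

    count: int = 0
    null_count: int = 0
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def track(self, value):
        # Update the summary with a single value, as a logger would per row.
        self.count += 1
        if value is None:
            self.null_count += 1
            return
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    def merge(self, other):
        # Combine two summaries without access to the original rows.
        def pick(a, b, fn):
            if a is None:
                return b
            if b is None:
                return a
            return fn(a, b)

        return ToySummary(
            count=self.count + other.count,
            null_count=self.null_count + other.null_count,
            minimum=pick(self.minimum, other.minimum, min),
            maximum=pick(self.maximum, other.maximum, max),
        )


def summarize(values):
    # Build a summary by tracking each value in turn.
    summary = ToySummary()
    for v in values:
        summary.track(v)
    return summary


# Two batches of the "weight" column, e.g. logged on different machines or days.
batch_a = summarize([4.3, 1.8])
batch_b = summarize([1.3, None, 4.1])
merged = batch_a.merge(batch_b)
print(merged)
```

The property that matters is that merging the two batch summaries gives the same result as summarizing all values at once, so partial summaries computed separately can be combined later.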