#!/usr/bin/env python
# coding: utf-8

# # Getting Started

# [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/1.0.x/python/examples/basic/Getting_Started.ipynb)

# whylogs provides a standard to log any kind of data.
# 
# This example shows how to log data with whylogs and generate statistical summaries called *profiles*. Profiles can be used in a number of ways, such as:
# 
# * Data Visualization
# * Data Validation
# * Tracking changes in your datasets

# ## Table of Contents

# In this example, we'll explore the basics of logging data with whylogs:
# - Installing whylogs
# - Profiling data
# - Interacting with the profile
# - Writing/Reading profiles to/from disk

# ## Installing whylogs

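# whylogs is available as a Python package. You can get the latest stable version from PyPI with `pip install whylogs`. The cell below adds `--pre`, which also allows pre-release versions, and `-q`, which reduces pip's output: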

# In[1]:


get_ipython().system('pip install -q whylogs --pre')


# ## Loading a Pandas DataFrame

# Before showing how we can log data, we first need the data itself. Let's create a simple Pandas DataFrame:

# In[2]:


import pandas as pd
data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}

df = pd.DataFrame(data)


# ## Profiling with whylogs

# To profile your data, call whylogs' `log` function, then retrieve the profile from the returned result with `profile()`:

# In[3]:


import whylogs as why

results = why.log(df)
profile = results.profile()


# ## Analyzing Profiles

# Once you're done logging the data, you can generate a `Profile View` and inspect it as a Pandas DataFrame:

# In[6]:


prof_view = profile.view()
prof_df = prof_view.to_pandas()

prof_df


# The resulting DataFrame contains statistics for each column (feature), such as:
# 
# - Counters, such as number of samples and null values
# - Inferred types, such as integral, fractional and boolean
# - Estimated Cardinality
# - Frequent Items
# - Distribution Metrics: min, max, median, quantile values
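
# To make these statistics concrete, here is a minimal pure-Python sketch that computes a few of them exactly for the toy data above. The `summarize` helper is hypothetical and is not part of whylogs; in particular, it counts distinct values exactly, whereas the profile reports an estimated cardinality.

```python
from statistics import median

# Same toy data as the DataFrame above, kept as plain Python lists.
data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}


def summarize(values):
    """Hypothetical helper: exact per-column statistics (not whylogs code)."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),                       # number of samples
        "null_count": len(values) - len(present),   # number of missing values
        "distinct": len(set(present)),              # exact distinct count
    }
    # Distribution metrics only make sense for numeric columns.
    # bool is a subclass of int in Python, so it is excluded explicitly.
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        summary.update(min=min(present), max=max(present), median=median(present))
    return summary


summaries = {name: summarize(column) for name, column in data.items()}
for name, summary in summaries.items():
    print(name, summary)
```

# Comparing this output with the matching rows of `prof_df` is a quick way to check how you are reading the profile's columns.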

# ## Writing to Disk

# You can also store your profile on disk for later inspection:

# In[7]:


why.write(profile, "profile.bin")


# This creates a binary profile file named `profile.bin` in your local filesystem.

# ## Reading from Disk

# You can read the profile back into memory with:

# In[8]:


n_prof = why.read("profile.bin")


# > Note: `write` expects a profile as a parameter, while `read` returns a `Profile View`. This means you can use the loaded profile for visualization and merging, but not for further tracking or updates.
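
# One way to picture why a read-only view can still be merged is a toy summary that keeps only aggregate statistics. The sketch below is an illustration under that assumption: `ColumnSummary` and its `merge` method are hypothetical names, not whylogs classes, and the real profile tracks many more metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSummary:
    """Hypothetical stand-in for one column's statistics (not a whylogs class)."""

    count: int
    minimum: float
    maximum: float

    @classmethod
    def from_values(cls, values):
        # Build a summary from raw values; the raw values are not stored.
        return cls(count=len(values), minimum=min(values), maximum=max(values))

    def merge(self, other):
        # Combine two summaries without access to the original data.
        return ColumnSummary(
            count=self.count + other.count,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


# Two batches of the `weight` column, summarized independently.
batch_a = ColumnSummary.from_values([4.3, 1.8])
batch_b = ColumnSummary.from_values([1.3, 4.1])

merged = batch_a.merge(batch_b)
print(merged)
```

# The merged summary equals the summary of all four values taken together, even though neither batch summary kept the underlying rows.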

# ## What's Next?

# There's a lot you can do with the profiles you just created. Keep getting your hands dirty with the following examples!

# - Basic
#     - [Visualizing Profiles](https://whylogs-v1-doc-dev.netlify.app/examples/basic/notebook_profile_visualizer) - Compare profiles to detect distribution shifts, visualize histograms and bar charts, and explore your data
#     - [Schema Configuration for Tracking Metrics](https://whylogs-v1-doc-dev.netlify.app/examples/basic/schema_configuration) - Configure tracking metrics according to data type or column features
#     - More to Come!