🚩 Create a free WhyLabs account to get more value out of whylogs!
Did you know you can store, visualize, and monitor whylogs profiles with the WhyLabs Observability Platform? Sign up for a free WhyLabs account to leverage the power of whylogs and WhyLabs together!
This is a whylogs v1 example. For the analogous example in v0, please refer to this example.
In this walkthrough, we'll see how you can use Feast and whylogs together at different parts of your ML pipeline. We'll use Feast to set up an online feature store and then use it to enrich our serving data with additional features. After assembling our feature vector, we'll log it with whylogs. As requests for prediction arrive, the logged input features will be statistically profiled. We will explore these profiles to see what kind of insights we can get.
To do so, we'll use a sample dataset of daily taxi rides in NYC, extracted from here. Our final goal could be a prediction requested at the start of a given ride. This prediction could be whether the customer will give a high tip to the driver, or whether the customer will give the driver a good review. As input to the prediction model, in addition to the ride information (like number of passengers, day of the week, or trip distance), we might be interested in enriching our feature vector with information about the driver, like the driver's average speed, average rating, or average trips in the last 24 hours, in the hopes of improving the model's performance.
The info about the specific ride will be known at inference time. However, the driver statistics might be available to us in a different data source, updated at specific time intervals. We will join this information to assemble a single feature vector by using Feast to set up an online feature store. Feast will materialize the features into the online store from a data source file. This data source will have driver statistics, keyed by each driver's ID and updated on an hourly basis.
We will simulate a production pipeline, where requests for predictions are made at different timestamps. We'll log the feature vectors for each request into daily profiles for a period of 7 days, and then see how we can compare the obtained profiles for possible data issues or drifts between days.
Let's consider some scenarios in which logging and visualizing features would be helpful.
In this example, we have updated information about drivers on an hourly basis. Let's simulate a scenario in which this frequency is affected for some reason, and for a particular period new information is accessible only in 2-hour cycles.
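As a minimal pandas sketch of how such a slowdown can be simulated (the Appendix at the end of this notebook does essentially this for Feb 11), we can keep only the even-hour rows of an hourly stats dataframe. The toy dataframe below is purely illustrative, not the real data source:

import pandas as pd

# Toy hourly stats for a single day (placeholder values).
hourly_stats = pd.DataFrame({
    "event_timestamp": pd.date_range("2020-02-11", periods=24, freq="1H"),
    "avg_speed": 20.0,
})

# Keep only even hours, so new driver information arrives in 2-hour cycles.
two_hour_stats = hourly_stats[hourly_stats["event_timestamp"].dt.hour % 2 == 0]
print(len(hourly_stats), "->", len(two_hour_stats))  # 24 -> 12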
Let's consider a scenario where people's behavior changes: maybe people are riding less. For example, when COVID started, the number of rides certainly plummeted. We could also have a change in the criteria people use to rate a driver. For example, the ratings, or reviews, given to each driver could now be affected by specific services provided, like the presence of hand sanitizer and/or physical barriers to ensure social distancing.
First of all, let's install the required packages for this tutorial:
# Note: you may need to restart the kernel to use updated packages.
%pip install --upgrade pip -qq
%pip install feast==0.22.4 -qq
%pip install Pygments -qq
%pip install whylogs[viz] -U
In order to deploy our feature store, we need to create a feature repository. In Feast's quickstart example, this is traditionally done with the feast init command. This example is based on the quickstart, but with some changes to the Python and configuration files. For this reason, let's quickly create a folder with the required files for a feature repository adapted to our use case.
%%sh
mkdir feature_repo
mkdir feature_repo/data
mkdir feature_repo/whylogs_output
touch feature_repo/__init__.py
Writing our feature definition in the example.py file inside our feature_repo folder:
%%writefile feature_repo/example.py
# This is an example feature definition file
from datetime import timedelta
from feast import Entity, FeatureView, Field, FeatureService, FileSource, ValueType
from feast.types import Float32, Int64
# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_hourly_stats = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
# Define an entity for the driver. You can think of entity as a primary key used to
# fetch features.
# Entity has a name used for later reference (in a feature view, eg)
# and join_key to identify physical field name used in storages
driver = Entity(name="driver", value_type=ValueType.INT64, join_keys=["driver_id"], description="driver id",)
# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver"],  # reference entity by name
    ttl=timedelta(seconds=86400 * 1),
    schema=[
        Field(name="rate_1m", dtype=Int64),
        Field(name="avg_daily_trips", dtype=Int64),
        Field(name="avg_speed", dtype=Float32),
    ],
    online=True,
    source=driver_hourly_stats,
    tags={},
)
driver_stats_fs = FeatureService(
    name="driver_activity",
    features=[driver_hourly_stats_view]
)
Overwriting feature_repo/example.py
Writing the feature_store.yaml configuration file:
%%writefile feature_repo/feature_store.yaml
project: feature_repo
registry: data/registry.db
provider: local
online_store:
    path: data/online_store.db
Overwriting feature_repo/feature_store.yaml
Let's first navigate to our feature repository folder:
%cd feature_repo
/mnt/c/Users/felip/Documents/Projects-WhyLabs/whylogs2/python/examples/integrations/feature_repo
Make sure you're in the right folder. You should see an empty data folder (we'll populate it with our data source later), the example.py python script, which contains our feature definitions, and the feature_store.yaml configuration file.
%ls -R
.:
__init__.py*  data/  example.py*  feature_store.yaml*  whylogs_output/

./data:
driver_stats.parquet*  registry.db*

./whylogs_output:
Now, let's download our data source and store it locally in our feature repository:
import feast
feast.__version__
'0.22.4'
import pandas as pd
path = f"https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_examples/feast_integration/driver_stats.parquet"
print(f"Loading data from {path}")
driver_stats = pd.read_parquet(path)
print(f"Saving file source locally")
driver_stats.to_parquet("data/driver_stats.parquet")
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_examples/feast_integration/driver_stats.parquet Saving file source locally
In the data source, we have driver statistics on an hourly basis, such as the average number of trips in the last 24 hours, the average rating over the last month, and the average driving speed. You can see more information on how this data was created at the end of this notebook, in the Appendix.
driver_stats.head()
 | index | event_timestamp | driver_id | created | avg_daily_trips | rate_1m | avg_speed |
---|---|---|---|---|---|---|---
0 | 0.0 | 2020-02-10 00:00:00 | 1001 | 2022-02-16 16:17:56.446774 | 25 | 3 | 16.87 |
1 | 0.0 | 2020-02-10 00:00:00 | 1002 | 2022-02-16 16:17:56.446774 | 35 | 1 | 20.21 |
2 | 1.0 | 2020-02-10 01:00:00 | 1001 | 2022-02-16 16:17:56.446774 | 19 | 4 | 20.77 |
3 | 1.0 | 2020-02-10 01:00:00 | 1002 | 2022-02-16 16:17:56.446774 | 29 | 3 | 19.20 |
4 | 2.0 | 2020-02-10 02:00:00 | 1001 | 2022-02-16 16:17:56.446774 | 31 | 3 | 17.41 |
driver_stats.dtypes
index                       float64
event_timestamp      datetime64[ns]
driver_id                     int64
created              datetime64[ns]
avg_daily_trips               int64
rate_1m                       int64
avg_speed                   float64
dtype: object
Now, we will scan the python files in our feature repository for feature views/entity definitions, register the objects, and deploy the infrastructure with the feast apply command.
!feast apply
/mnt/c/Users/felip/Documents/Projects-WhyLabs/whylogs2/python/.venv/lib/python3.8/site-packages/feast/entity.py:110: DeprecationWarning: The `value_type` parameter is being deprecated. Instead, the type of an entity should be specified as a Field in the schema of a feature view. Feast 0.24 and onwards will not support the `value_type` parameter. The `entities` parameter of feature views should also be changed to a List[Entity] instead of a List[str]; if this is not done, entity columns will be mistakenly interpreted as feature columns.
  warnings.warn(
/mnt/c/Users/felip/Documents/Projects-WhyLabs/whylogs2/python/.venv/lib/python3.8/site-packages/feast/feature_view.py:180: DeprecationWarning: The `entities` parameter should be a list of `Entity` objects. Feast 0.24 and onwards will not support passing in a list of strings to define entities.
  warnings.warn(
Created entity driver
Created feature view driver_hourly_stats
Created feature service driver_activity
Created sqlite table feature_repo_driver_hourly_stats
Let's also load our rides dataframe. In it, we have features about rides made from Feb 10 to Feb 16, 2020, such as the number of passengers, trip distance, and pickup date and time.
import pandas as pd
path = f"https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_examples/nyc_taxi_rides_feb_2020_changed.parquet"
print(f"Loading data from {path}")
rides_df = pd.read_parquet(path)
rides_df.head()
Loading data from https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_examples/nyc_taxi_rides_feb_2020_changed.parquet
 | pickup_weekday | passenger_count | trip_distance | PULocationID | tpep_pickup_datetime | pickup_date |
---|---|---|---|---|---|---
225897 | 0 | 1.0 | 1.20 | 249 | 2020-02-10 00:23:21 | 2020-02-10 |
108301 | 0 | 5.0 | 19.03 | 132 | 2020-02-10 01:19:01 | 2020-02-10 |
196729 | 0 | 6.0 | 0.38 | 68 | 2020-02-10 01:29:23 | 2020-02-10 |
239495 | 0 | 1.0 | 2.90 | 263 | 2020-02-10 02:44:20 | 2020-02-10 |
72014 | 0 | 6.0 | 16.05 | 233 | 2020-02-10 04:12:22 | 2020-02-10 |
rides_df['passenger_count'] = rides_df['passenger_count'].fillna(0).astype('int64')
The real dataset doesn't contain information regarding the taxi driver that conducted the ride. Since our goal is to enrich the dataset with driver features from an external data source, we will create a driver_id column. For simplicity, let's consider that this dataset contains ride information for only 2 drivers (IDs 1001 and 1002).
import numpy as np
rides_df['driver_id'] = np.random.randint(1001, 1003, rides_df.shape[0])
We will iterate over rides_df, where each row represents a point in time at which we will request a prediction. For each request, we will:
materialize the latest hourly driver features into the online store (whenever the request crosses into a new hour)
fetch the driver's features from the online store and assemble the complete feature vector together with the ride's features
log the assembled feature vector with whylogs
We'll consider that the materialization job is run hourly. To simulate that, we will call materialize for the last rounded hour, based on the request's timestamp tpep_pickup_datetime.
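To make that rounding logic concrete, here's a minimal sketch using a hypothetical pickup timestamp; the full loop below applies the same idea against the feature store:

from datetime import datetime, timedelta

# Hypothetical request: a ride picked up at 14:37.
request_timestamp = datetime(2020, 2, 10, 14, 37)

# Round down to the last full hour and materialize the preceding one-hour window.
target_time = request_timestamp.replace(minute=0, second=0, microsecond=0)
prev_time = target_time - timedelta(hours=1)
print(prev_time, "->", target_time)  # 2020-02-10 13:00:00 -> 2020-02-10 14:00:00
# store.materialize(start_date=prev_time, end_date=target_time)  # store: a feast.FeatureStore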
We will iterate through all the requests on the dataset, generate profiles for daily batches of data, and then write the profiles to disk in a binary file for each of the seven days:
from datetime import datetime, timedelta
from pprint import pprint
from feast import FeatureStore
import os
import whylogs as why

store = FeatureStore(repo_path=".")

prev_time = datetime(2020, 2, 10, 0, 0)
target_time = datetime(2020, 2, 10, 1, 0)
store.materialize(start_date=prev_time, end_date=target_time)

# Initializing logger for the first day
day_to_log = datetime(2020, 2, 10)
profile = None

for index, row in rides_df.iterrows():
    request_timestamp = row['tpep_pickup_datetime']

    # If the new request is from the next day, write the current day's profile to disk and start a new profile for the next day
    if request_timestamp.day > day_to_log.day:
        # let's write our profiles to the whylogs_output folder
        why.write(profile, os.path.join("whylogs_output", "profile_{}_{}_{}.bin".format(day_to_log.day, day_to_log.month, day_to_log.year)))
        day_to_log = request_timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
        print("Starting logger for day {}....".format(day_to_log))
        profile = None

    # Materialize the last rounded hour whenever the request crosses into a new hour
    if request_timestamp > target_time + timedelta(hours=1):
        target_time = datetime(request_timestamp.year, request_timestamp.month, request_timestamp.day, request_timestamp.hour)
        prev_time = target_time - timedelta(hours=1)
        store.materialize(start_date=prev_time, end_date=target_time)

    driver_feature_vector = store.get_online_features(
        features=[
            "driver_hourly_stats:rate_1m",
            "driver_hourly_stats:avg_daily_trips",
            "driver_hourly_stats:avg_speed"
        ],
        entity_rows=[{"driver_id": row['driver_id']},],
    ).to_dict()

    # Get features from both ride and driver
    assembled_feature_vector = {
        "pickup_weekday": row["pickup_weekday"],
        "passenger_count": row["passenger_count"],
        "trip_distance": row["trip_distance"],
        "PULocationID": row["PULocationID"],
        "driver_avg_daily_trips": driver_feature_vector["avg_daily_trips"][0],
        "driver_rate_1m": driver_feature_vector["rate_1m"][0],
        "driver_avg_speed": driver_feature_vector["avg_speed"][0],
    }

    # Now that we have the complete set of features, model prediction could go here.

    # The first time data is logged to a profile, we call log(). For subsequent data to be logged in the same profile, let's use track(), until the daily batch is finished.
    if not profile:
        profile = why.log(row=assembled_feature_vector).profile()
    else:
        profile.track(assembled_feature_vector)

# Write the last day's profile to disk
why.write(profile, os.path.join("whylogs_output", "profile_{}_{}_{}.bin".format(day_to_log.day, day_to_log.month, day_to_log.year)))
Materializing 1 feature views from 2020-02-10 00:00:00-03:00 to 2020-02-10 01:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 97.08it/s]
Materializing 1 feature views from 2020-02-10 01:00:00-03:00 to 2020-02-10 02:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 101.23it/s]
Materializing 1 feature views from 2020-02-10 03:00:00-03:00 to 2020-02-10 04:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 117.07it/s]
Materializing 1 feature views from 2020-02-10 04:00:00-03:00 to 2020-02-10 05:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 135.54it/s]
Materializing 1 feature views from 2020-02-10 05:00:00-03:00 to 2020-02-10 06:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 141.93it/s]
Materializing 1 feature views from 2020-02-10 06:00:00-03:00 to 2020-02-10 07:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 154.83it/s]
Materializing 1 feature views from 2020-02-10 07:00:00-03:00 to 2020-02-10 08:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 159.83it/s]
Materializing 1 feature views from 2020-02-10 08:00:00-03:00 to 2020-02-10 09:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 123.47it/s]
Materializing 1 feature views from 2020-02-10 09:00:00-03:00 to 2020-02-10 10:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 95.67it/s]
Materializing 1 feature views from 2020-02-10 10:00:00-03:00 to 2020-02-10 11:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.11it/s]
Materializing 1 feature views from 2020-02-10 11:00:00-03:00 to 2020-02-10 12:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 78.79it/s]
Materializing 1 feature views from 2020-02-10 12:00:00-03:00 to 2020-02-10 13:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 127.50it/s]
Materializing 1 feature views from 2020-02-10 13:00:00-03:00 to 2020-02-10 14:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 108.23it/s]
Materializing 1 feature views from 2020-02-10 14:00:00-03:00 to 2020-02-10 15:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 131.21it/s]
Materializing 1 feature views from 2020-02-10 15:00:00-03:00 to 2020-02-10 16:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 114.58it/s]
Materializing 1 feature views from 2020-02-10 16:00:00-03:00 to 2020-02-10 17:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 100.24it/s]
Materializing 1 feature views from 2020-02-10 17:00:00-03:00 to 2020-02-10 18:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 92.09it/s]
Materializing 1 feature views from 2020-02-10 18:00:00-03:00 to 2020-02-10 19:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 133.12it/s]
Materializing 1 feature views from 2020-02-10 19:00:00-03:00 to 2020-02-10 20:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 102.93it/s]
Materializing 1 feature views from 2020-02-10 20:00:00-03:00 to 2020-02-10 21:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 96.86it/s]
Materializing 1 feature views from 2020-02-10 21:00:00-03:00 to 2020-02-10 22:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 168.66it/s]
Materializing 1 feature views from 2020-02-10 22:00:00-03:00 to 2020-02-10 23:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 88.05it/s]
Starting logger for day 2020-02-11 00:00:00.... Materializing 1 feature views from 2020-02-10 23:00:00-03:00 to 2020-02-11 00:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 158.66it/s]
Materializing 1 feature views from 2020-02-11 00:00:00-03:00 to 2020-02-11 01:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 157.59it/s]
Materializing 1 feature views from 2020-02-11 04:00:00-03:00 to 2020-02-11 05:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 149.97it/s]
Materializing 1 feature views from 2020-02-11 05:00:00-03:00 to 2020-02-11 06:00:00-03:00 into the sqlite online store. driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 06:00:00-03:00 to 2020-02-11 07:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 148.77it/s]
Materializing 1 feature views from 2020-02-11 07:00:00-03:00 to 2020-02-11 08:00:00-03:00 into the sqlite online store. driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 08:00:00-03:00 to 2020-02-11 09:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 96.71it/s]
Materializing 1 feature views from 2020-02-11 09:00:00-03:00 to 2020-02-11 10:00:00-03:00 into the sqlite online store. driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 10:00:00-03:00 to 2020-02-11 11:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 100.11it/s]
Materializing 1 feature views from 2020-02-11 11:00:00-03:00 to 2020-02-11 12:00:00-03:00 into the sqlite online store. driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 12:00:00-03:00 to 2020-02-11 13:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.29it/s]
Materializing 1 feature views from 2020-02-11 13:00:00-03:00 to 2020-02-11 14:00:00-03:00 into the sqlite online store. driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 14:00:00-03:00 to 2020-02-11 15:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 108.01it/s]
Materializing 1 feature views from 2020-02-11 15:00:00-03:00 to 2020-02-11 16:00:00-03:00 into the sqlite online store. driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 16:00:00-03:00 to 2020-02-11 17:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 94.35it/s]
Materializing 1 feature views from 2020-02-11 17:00:00-03:00 to 2020-02-11 18:00:00-03:00 into the sqlite online store. driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 18:00:00-03:00 to 2020-02-11 19:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 110.94it/s]
Materializing 1 feature views from 2020-02-11 19:00:00-03:00 to 2020-02-11 20:00:00-03:00 into the sqlite online store. driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 20:00:00-03:00 to 2020-02-11 21:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 119.28it/s]
Materializing 1 feature views from 2020-02-11 21:00:00-03:00 to 2020-02-11 22:00:00-03:00 into the sqlite online store. driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-11 22:00:00-03:00 to 2020-02-11 23:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 98.93it/s]
Starting logger for day 2020-02-12 00:00:00.... Materializing 1 feature views from 2020-02-11 23:00:00-03:00 to 2020-02-12 00:00:00-03:00 into the sqlite online store. driver_hourly_stats:
0it [00:00, ?it/s]
Materializing 1 feature views from 2020-02-12 00:00:00-03:00 to 2020-02-12 01:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 111.18it/s]
Materializing 1 feature views from 2020-02-12 03:00:00-03:00 to 2020-02-12 04:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 142.89it/s]
Materializing 1 feature views from 2020-02-12 06:00:00-03:00 to 2020-02-12 07:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 83.19it/s]
Materializing 1 feature views from 2020-02-12 07:00:00-03:00 to 2020-02-12 08:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 103.58it/s]
Materializing 1 feature views from 2020-02-12 08:00:00-03:00 to 2020-02-12 09:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 139.98it/s]
Materializing 1 feature views from 2020-02-12 09:00:00-03:00 to 2020-02-12 10:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 105.49it/s]
Materializing 1 feature views from 2020-02-12 10:00:00-03:00 to 2020-02-12 11:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 124.73it/s]
Materializing 1 feature views from 2020-02-12 11:00:00-03:00 to 2020-02-12 12:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 128.29it/s]
Materializing 1 feature views from 2020-02-12 12:00:00-03:00 to 2020-02-12 13:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 105.62it/s]
Materializing 1 feature views from 2020-02-12 13:00:00-03:00 to 2020-02-12 14:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 103.11it/s]
Materializing 1 feature views from 2020-02-12 14:00:00-03:00 to 2020-02-12 15:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 123.69it/s]
Materializing 1 feature views from 2020-02-12 15:00:00-03:00 to 2020-02-12 16:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 84.25it/s]
Materializing 1 feature views from 2020-02-12 16:00:00-03:00 to 2020-02-12 17:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 127.68it/s]
Materializing 1 feature views from 2020-02-12 17:00:00-03:00 to 2020-02-12 18:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 111.89it/s]
Materializing 1 feature views from 2020-02-12 18:00:00-03:00 to 2020-02-12 19:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 108.54it/s]
Materializing 1 feature views from 2020-02-12 19:00:00-03:00 to 2020-02-12 20:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 92.51it/s]
Materializing 1 feature views from 2020-02-12 20:00:00-03:00 to 2020-02-12 21:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 101.55it/s]
Materializing 1 feature views from 2020-02-12 21:00:00-03:00 to 2020-02-12 22:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 93.28it/s]
Materializing 1 feature views from 2020-02-12 22:00:00-03:00 to 2020-02-12 23:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 118.25it/s]
Starting logger for day 2020-02-13 00:00:00.... Materializing 1 feature views from 2020-02-13 01:00:00-03:00 to 2020-02-13 02:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 122.30it/s]
Materializing 1 feature views from 2020-02-13 03:00:00-03:00 to 2020-02-13 04:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 124.17it/s]
Materializing 1 feature views from 2020-02-13 04:00:00-03:00 to 2020-02-13 05:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 141.48it/s]
Materializing 1 feature views from 2020-02-13 05:00:00-03:00 to 2020-02-13 06:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 120.50it/s]
Materializing 1 feature views from 2020-02-13 06:00:00-03:00 to 2020-02-13 07:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 125.49it/s]
Materializing 1 feature views from 2020-02-13 07:00:00-03:00 to 2020-02-13 08:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 90.89it/s]
Materializing 1 feature views from 2020-02-13 08:00:00-03:00 to 2020-02-13 09:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 133.53it/s]
Materializing 1 feature views from 2020-02-13 09:00:00-03:00 to 2020-02-13 10:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 96.20it/s]
Materializing 1 feature views from 2020-02-13 10:00:00-03:00 to 2020-02-13 11:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 81.72it/s]
Materializing 1 feature views from 2020-02-13 11:00:00-03:00 to 2020-02-13 12:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 88.51it/s]
Materializing 1 feature views from 2020-02-13 12:00:00-03:00 to 2020-02-13 13:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 112.87it/s]
Materializing 1 feature views from 2020-02-13 13:00:00-03:00 to 2020-02-13 14:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 126.05it/s]
Materializing 1 feature views from 2020-02-13 14:00:00-03:00 to 2020-02-13 15:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.44it/s]
Materializing 1 feature views from 2020-02-13 15:00:00-03:00 to 2020-02-13 16:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 140.34it/s]
Materializing 1 feature views from 2020-02-13 16:00:00-03:00 to 2020-02-13 17:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 104.42it/s]
Materializing 1 feature views from 2020-02-13 17:00:00-03:00 to 2020-02-13 18:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 148.64it/s]
Materializing 1 feature views from 2020-02-13 18:00:00-03:00 to 2020-02-13 19:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 131.96it/s]
Materializing 1 feature views from 2020-02-13 19:00:00-03:00 to 2020-02-13 20:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 129.22it/s]
Materializing 1 feature views from 2020-02-13 20:00:00-03:00 to 2020-02-13 21:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 108.97it/s]
Materializing 1 feature views from 2020-02-13 21:00:00-03:00 to 2020-02-13 22:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 111.21it/s]
Materializing 1 feature views from 2020-02-13 22:00:00-03:00 to 2020-02-13 23:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 117.89it/s]
Starting logger for day 2020-02-14 00:00:00.... Materializing 1 feature views from 2020-02-13 23:00:00-03:00 to 2020-02-14 00:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 127.16it/s]
Materializing 1 feature views from 2020-02-14 00:00:00-03:00 to 2020-02-14 01:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 100.66it/s]
Materializing 1 feature views from 2020-02-14 01:00:00-03:00 to 2020-02-14 02:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 138.51it/s]
Materializing 1 feature views from 2020-02-14 05:00:00-03:00 to 2020-02-14 06:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.10it/s]
Materializing 1 feature views from 2020-02-14 06:00:00-03:00 to 2020-02-14 07:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 146.52it/s]
Materializing 1 feature views from 2020-02-14 07:00:00-03:00 to 2020-02-14 08:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 133.95it/s]
Materializing 1 feature views from 2020-02-14 08:00:00-03:00 to 2020-02-14 09:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 92.94it/s]
Materializing 1 feature views from 2020-02-14 09:00:00-03:00 to 2020-02-14 10:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 123.47it/s]
Materializing 1 feature views from 2020-02-14 10:00:00-03:00 to 2020-02-14 11:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 117.00it/s]
Materializing 1 feature views from 2020-02-14 11:00:00-03:00 to 2020-02-14 12:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 104.38it/s]
Materializing 1 feature views from 2020-02-14 12:00:00-03:00 to 2020-02-14 13:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 144.85it/s]
Materializing 1 feature views from 2020-02-14 13:00:00-03:00 to 2020-02-14 14:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 50.76it/s]
Materializing 1 feature views from 2020-02-14 14:00:00-03:00 to 2020-02-14 15:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 80.35it/s]
Materializing 1 feature views from 2020-02-14 15:00:00-03:00 to 2020-02-14 16:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.47it/s]
Materializing 1 feature views from 2020-02-14 16:00:00-03:00 to 2020-02-14 17:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 93.62it/s]
Materializing 1 feature views from 2020-02-14 17:00:00-03:00 to 2020-02-14 18:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 97.90it/s]
Materializing 1 feature views from 2020-02-14 18:00:00-03:00 to 2020-02-14 19:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 128.72it/s]
Materializing 1 feature views from 2020-02-14 19:00:00-03:00 to 2020-02-14 20:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 68.97it/s]
Materializing 1 feature views from 2020-02-14 20:00:00-03:00 to 2020-02-14 21:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 103.13it/s]
Materializing 1 feature views from 2020-02-14 21:00:00-03:00 to 2020-02-14 22:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 134.36it/s]
Materializing 1 feature views from 2020-02-14 22:00:00-03:00 to 2020-02-14 23:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 100.09it/s]
Starting logger for day 2020-02-15 00:00:00.... Materializing 1 feature views from 2020-02-14 23:00:00-03:00 to 2020-02-15 00:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.28it/s]
Materializing 1 feature views from 2020-02-15 00:00:00-03:00 to 2020-02-15 01:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 134.45it/s]
Materializing 1 feature views from 2020-02-15 01:00:00-03:00 to 2020-02-15 02:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 121.67it/s]
Materializing 1 feature views from 2020-02-15 02:00:00-03:00 to 2020-02-15 03:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 82.45it/s]
Materializing 1 feature views from 2020-02-15 06:00:00-03:00 to 2020-02-15 07:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 131.26it/s]
Materializing 1 feature views from 2020-02-15 07:00:00-03:00 to 2020-02-15 08:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 100.35it/s]
Materializing 1 feature views from 2020-02-15 08:00:00-03:00 to 2020-02-15 09:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 106.04it/s]
Materializing 1 feature views from 2020-02-15 09:00:00-03:00 to 2020-02-15 10:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 124.26it/s]
Materializing 1 feature views from 2020-02-15 10:00:00-03:00 to 2020-02-15 11:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 121.92it/s]
Materializing 1 feature views from 2020-02-15 11:00:00-03:00 to 2020-02-15 12:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 124.89it/s]
Materializing 1 feature views from 2020-02-15 12:00:00-03:00 to 2020-02-15 13:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 107.71it/s]
Materializing 1 feature views from 2020-02-15 13:00:00-03:00 to 2020-02-15 14:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.57it/s]
Materializing 1 feature views from 2020-02-15 14:00:00-03:00 to 2020-02-15 15:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 108.74it/s]
Materializing 1 feature views from 2020-02-15 15:00:00-03:00 to 2020-02-15 16:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 115.71it/s]
Materializing 1 feature views from 2020-02-15 16:00:00-03:00 to 2020-02-15 17:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 111.62it/s]
Materializing 1 feature views from 2020-02-15 17:00:00-03:00 to 2020-02-15 18:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 97.34it/s]
Materializing 1 feature views from 2020-02-15 18:00:00-03:00 to 2020-02-15 19:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 93.39it/s]
Materializing 1 feature views from 2020-02-15 19:00:00-03:00 to 2020-02-15 20:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 101.70it/s]
Materializing 1 feature views from 2020-02-15 21:00:00-03:00 to 2020-02-15 22:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 102.01it/s]
Materializing 1 feature views from 2020-02-15 22:00:00-03:00 to 2020-02-15 23:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 93.55it/s]
Starting logger for day 2020-02-16 00:00:00.... Materializing 1 feature views from 2020-02-15 23:00:00-03:00 to 2020-02-16 00:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 115.39it/s]
Materializing 1 feature views from 2020-02-16 00:00:00-03:00 to 2020-02-16 01:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 120.52it/s]
Materializing 1 feature views from 2020-02-16 01:00:00-03:00 to 2020-02-16 02:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 113.76it/s]
Materializing 1 feature views from 2020-02-16 04:00:00-03:00 to 2020-02-16 05:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 125.81it/s]
Materializing 1 feature views from 2020-02-16 07:00:00-03:00 to 2020-02-16 08:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 86.17it/s]
Materializing 1 feature views from 2020-02-16 08:00:00-03:00 to 2020-02-16 09:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 107.42it/s]
Materializing 1 feature views from 2020-02-16 09:00:00-03:00 to 2020-02-16 10:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 123.10it/s]
Materializing 1 feature views from 2020-02-16 10:00:00-03:00 to 2020-02-16 11:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 121.85it/s]
Materializing 1 feature views from 2020-02-16 11:00:00-03:00 to 2020-02-16 12:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 131.44it/s]
Materializing 1 feature views from 2020-02-16 12:00:00-03:00 to 2020-02-16 13:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 141.69it/s]
Materializing 1 feature views from 2020-02-16 13:00:00-03:00 to 2020-02-16 14:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 99.13it/s]
Materializing 1 feature views from 2020-02-16 14:00:00-03:00 to 2020-02-16 15:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 98.62it/s]
Materializing 1 feature views from 2020-02-16 15:00:00-03:00 to 2020-02-16 16:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 149.06it/s]
Materializing 1 feature views from 2020-02-16 16:00:00-03:00 to 2020-02-16 17:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 112.62it/s]
Materializing 1 feature views from 2020-02-16 17:00:00-03:00 to 2020-02-16 18:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 92.72it/s]
Materializing 1 feature views from 2020-02-16 18:00:00-03:00 to 2020-02-16 19:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 102.09it/s]
Materializing 1 feature views from 2020-02-16 19:00:00-03:00 to 2020-02-16 20:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 131.01it/s]
Materializing 1 feature views from 2020-02-16 20:00:00-03:00 to 2020-02-16 21:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 136.77it/s]
Materializing 1 feature views from 2020-02-16 21:00:00-03:00 to 2020-02-16 22:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|█████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 93.03it/s]
Materializing 1 feature views from 2020-02-16 22:00:00-03:00 to 2020-02-16 23:00:00-03:00 into the sqlite online store. driver_hourly_stats:
100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 135.85it/s]
Let's confirm that the profiles for each day were indeed written to disk:
%%sh
ls whylogs_output
profile_10_2_2020.bin profile_11_2_2020.bin profile_12_2_2020.bin profile_13_2_2020.bin profile_14_2_2020.bin profile_15_2_2020.bin profile_16_2_2020.bin
We can rehydrate each of those profiles to check some of the metrics provided in the profile. Let's take the first day as our reference profile:
reference_profile = why.read(os.path.join("whylogs_output","profile_10_2_2020.bin"))
# we generate a profile view, and then call to_pandas() to have a dataframe with the metrics to be inspected
reference_metrics = reference_profile.view().to_pandas()
reference_metrics
column | cardinality/est | cardinality/lower_1 | cardinality/upper_1 | counts/n | counts/null | distribution/max | distribution/mean | distribution/median | distribution/min | distribution/n | ... | distribution/stddev | frequent_items/frequent_strings | ints/max | ints/min | type | types/boolean | types/fractional | types/integral | types/object | types/string
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
PULocationID | 39.000004 | 39.0 | 39.001951 | 98 | 0 | 264.000000 | 161.663265 | 161.00 | 41.00 | 98 | ... | 64.712997 | [FrequentItem(value='161.000000', est=6, upper... | 264.0 | 41.0 | SummaryType.COLUMN | 0 | 0 | 98 | 0 | 0 |
driver_avg_daily_trips | 17.000001 | 17.0 | 17.000849 | 98 | 0 | 44.000000 | 29.428571 | 30.00 | 9.00 | 98 | ... | 7.729979 | [FrequentItem(value='36.000000', est=14, upper... | 44.0 | 9.0 | SummaryType.COLUMN | 0 | 0 | 98 | 0 | 0 |
driver_avg_speed | 35.000003 | 35.0 | 35.001750 | 98 | 0 | 31.139999 | 21.182551 | 20.98 | 13.21 | 98 | ... | 3.748177 | NaN | NaN | NaN | SummaryType.COLUMN | 0 | 98 | 0 | 0 | 0 |
driver_rate_1m | 4.000000 | 4.0 | 4.000200 | 98 | 0 | 4.000000 | 2.540816 | 3.00 | 1.00 | 98 | ... | 0.801653 | [FrequentItem(value='3.000000', est=45, upper=... | 4.0 | 1.0 | SummaryType.COLUMN | 0 | 0 | 98 | 0 | 0 |
passenger_count | 6.000000 | 6.0 | 6.000300 | 98 | 0 | 6.000000 | 1.408163 | 1.00 | 0.00 | 98 | ... | 1.199972 | [FrequentItem(value='1.000000', est=77, upper=... | 6.0 | 0.0 | SummaryType.COLUMN | 0 | 0 | 98 | 0 | 0 |
pickup_weekday | 1.000000 | 1.0 | 1.000050 | 98 | 0 | 0.000000 | 0.000000 | 0.00 | 0.00 | 98 | ... | 0.000000 | [FrequentItem(value='0.000000', est=98, upper=... | 0.0 | 0.0 | SummaryType.COLUMN | 0 | 0 | 98 | 0 | 0 |
trip_distance | 83.000017 | 83.0 | 83.004161 | 98 | 0 | 20.220000 | 2.791531 | 1.62 | 0.24 | 98 | ... | 3.606351 | NaN | NaN | NaN | SummaryType.COLUMN | 0 | 98 | 0 | 0 | 0 |
7 rows × 28 columns
If you want to know more about inspecting profiles and the metrics contained in them, check the example on Inspecting Profiles!
Now, let's add some data error issues into the dataset and see how we can visually inspect them with some of whylogs' functionalities. Some of the changes applied are shown below:
Feb 10: No changes
Feb 11: (Data update error) New driver features are available only in 2-hour cycles, simulating a scenario in which the sampling frequency is affected due to changes upstream.
Feb 16: (Feature drift) Based on the considerations made in the section Changes in Data, we will: a) reduce the number of passengers (passenger_count) and b) increase the standard deviation of rate_1m's distribution. For more information on how that was done, please see Appendix - Changing the Dataset.
We already have our reference profile, so let's load from disk two other profiles that contain the data update and feature drift issues, respectively.
target_profile_1 = why.read(os.path.join("whylogs_output","profile_11_2_2020.bin"))
target_profile_2 = why.read(os.path.join("whylogs_output","profile_16_2_2020.bin"))
The data update issue is a subtle one, since we still have data available with the expected shape and values. The only difference is that the values are being updated less often. Ideally, this could be checked elsewhere in our pipeline, but with the information available in our assembled feature vector, we can get signals of this issue indirectly by inspecting the cardinality of the features collected from the driver source.
Let's check the cardinality of the average speed for our reference profile, which is a float variable:
card = reference_metrics.loc['driver_avg_speed']['cardinality/est']
print("Cardinality for driver average speed for Reference dataset:",card)
Cardinality for driver average speed for Reference dataset: 35.000002955397264
For the same update frequency, we can expect cardinality estimates around the value seen in our baseline.
Let's now compare the cardinality estimations in the other two profiles:
profile_1_metrics = target_profile_1.view().to_pandas()
profile_2_metrics = target_profile_2.view().to_pandas()
print("Cardinality for driver average speed for profile #1:")
print(profile_1_metrics.loc['driver_avg_speed']['cardinality/est'])
print("Cardinality for driver average speed for profile #2:")
print(profile_2_metrics.loc['driver_avg_speed']['cardinality/est'])
Cardinality for driver average speed for profile #1:
24.00000137090692
Cardinality for driver average speed for profile #2:
35.000002955397264
We can see there's a significant difference for the profile that is updated less frequently.
You could automate this type of assertion by using Constraints to perform data validation on your data. If you want to know more, please see the example on Building Metric Constraints!
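As a rough sketch of what such a check could look like, using whylogs' ConstraintsBuilder, MetricConstraint and MetricsSelector, a constraint on the estimated cardinality of driver_avg_speed might be written along these lines. The condition's attribute access on the cardinality metric (hll.value.get_estimate()) is an assumption about its internals:

from whylogs.core.constraints import ConstraintsBuilder, MetricConstraint, MetricsSelector

# Build constraints against the daily profile we want to validate (here, the less frequently updated one).
builder = ConstraintsBuilder(target_profile_1.view())

# Hypothetical check: the cardinality estimate for driver_avg_speed should stay close to the baseline (~35).
builder.add_constraint(
    MetricConstraint(
        name="driver_avg_speed cardinality >= 30",
        condition=lambda metric: metric.hll.value.get_estimate() >= 30,
        metric_selector=MetricsSelector(metric_name="cardinality", column_name="driver_avg_speed"),
    )
)

constraints = builder.build()
print(constraints.report())  # list of (constraint name, passed, failed) results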
For February 16, we have a change in the average daily trips and also in the driver's monthly rating.
We can compare both profiles in order to detect data drift and generate a report for every feature in the profiles by using the NotebookProfileVisualizer:
from whylogs.viz import NotebookProfileVisualizer
visualization = NotebookProfileVisualizer()
visualization.set_profiles(target_profile_view=target_profile_2.view(), reference_profile_view=reference_profile.view())
visualization.summary_drift_report()
The report warns us of 3 possible drifts in the following features:
driver_avg_daily_trips
driver_rate_1m
pickup_weekday
Indeed, we artificially changed the first two features, so we might expect to see a drift alert for those. The third one is also a drift, but probably not a relevant one, since it's pretty obvious that a feature that reflects the day of the week will be different for daily batches (unless both of them are from the same day of the week!)
We can inspect these features further by using distribution_chart() for the driver's rating and double_histogram() for the daily trips:
Note: Even though both features are integers, each one is better visualized with a different type of chart. That happens because integers can be seen as numbers proper, or as a kind of encoding for categorical variables. Since the driver's rating has only a few possible values, and it wouldn't make sense to group different ratings into a single bin, this feature is better visualized by treating its values as categorical variables, and therefore using the distribution_chart() visualization.
visualization.distribution_chart(feature_name="driver_rate_1m")
There are a lot of ratings that don't even show up in the reference profile, so it really looks like there's a significant drift here.
visualization.double_histogram(feature_name="driver_avg_daily_trips")
Likewise, these histograms almost don't overlap, so it's pretty clear that these distributions are different.
The NotebookProfileVisualizer has a number of other features and visualizations. If you'd like to know more, be sure to check the example on the Notebook Profile Visualizer.
This section is not really a part of the demonstration. It's here just to show the changes made to the dataset that originated the driver_stats_changed.parquet file used at the beginning of the notebook.
The NYC taxi dataset provides only information about rides, but in this example we want to show how an online feature store can be used to enrich ride information with driver statistics. So, we'll fabricate some driver statistics and link them with the rides dataset (nyc_taxi_rides_feb_2020.parquet) through the Driver_ID key.
import pandas as pd
dstats = pd.DataFrame(
    {'event_timestamp': pd.date_range('2020-02-10', '2020-02-17', freq='1H', closed='left')}
)
dstats['driver_id'] = '1001'
dstats2 = pd.DataFrame(
    {'event_timestamp': pd.date_range('2020-02-10', '2020-02-17', freq='1H', closed='left')}
)
dstats2['driver_id'] = '1002'
dstats_tot = pd.concat([dstats, dstats2])
dstats_tot = dstats_tot.sort_values(by=["event_timestamp","driver_id"])
import datetime
dstats_tot['created'] = datetime.datetime.now()
import numpy as np
mu, sigma = 30, 6 # mean and standard deviation
s = np.random.normal(mu, sigma, len(dstats_tot))
daily_trips = np.round(s)
daily_trips = [int(x) for x in daily_trips]
dstats_tot['avg_daily_trips'] = daily_trips
from scipy.stats import truncnorm
def get_truncated_normal(mean=3, sd=0.75, low=1, upp=11):
    return truncnorm(
        (low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)
X = get_truncated_normal()
dstats_tot['rate_1m'] = [int(x) for x in X.rvs(len(dstats_tot))]
import numpy as np
mu, sigma = 20, 4 # mean and standard deviation
s = np.random.normal(mu, sigma, len(dstats_tot))
avg_speed = np.round(s,2)
avg_speed
dstats_tot['avg_speed'] = avg_speed
dstats_tot = dstats_tot.reset_index()
cond = (dstats_tot['event_timestamp'].dt.day==11) & (dstats_tot['event_timestamp'].dt.month==2) & ((dstats_tot['event_timestamp'].dt.hour%2)!=0)
df2 = dstats_tot.loc[cond]
dstats_tot = dstats_tot[~dstats_tot.isin(df2)].dropna()
We're assuming that this change in customers' behaviour would not change the mean of the distribution, but would increase its standard deviation, making the ratings more spread out and increasing the frequency of extreme ratings (positive or negative).
import numpy as np
cond = (dstats_tot['event_timestamp'].dt.day==14) & (dstats_tot['event_timestamp'].dt.month==2)
size = len(dstats_tot.loc[cond])
X = get_truncated_normal(mean=3, sd=2, low=1, upp=11)
rate_1m = [int(x) for x in X.rvs(size)]
dstats_tot.loc[cond, 'rate_1m'] = rate_1m
import numpy as np
cond = (dstats_tot['event_timestamp'].dt.day==15) & (dstats_tot['event_timestamp'].dt.month==2)
size = len(dstats_tot.loc[cond])
X = get_truncated_normal(mean=3, sd=3, low=1, upp=11)
rate_1m = [int(x) for x in X.rvs(size)]
dstats_tot.loc[cond, 'rate_1m'] = rate_1m
import numpy as np
cond = (dstats_tot['event_timestamp'].dt.day==16) & (dstats_tot['event_timestamp'].dt.month==2)
size = len(dstats_tot.loc[cond])
X = get_truncated_normal(mean=3, sd=4, low=1, upp=11)
rate_1m = [int(x) for x in X.rvs(size)]
dstats_tot.loc[cond, 'rate_1m'] = rate_1m
cond = (dstats_tot['event_timestamp'].dt.day==14) & (dstats_tot['event_timestamp'].dt.month==2)
size = len(dstats_tot.loc[cond])
mu, sigma = 24, 6 # mean and standard deviation
s = np.random.normal(mu, sigma, size)
daily_trips = np.round(s)
daily_trips = [int(x) if x>0 else 0 for x in daily_trips]
# daily_trips
dstats_tot.loc[cond, 'avg_daily_trips'] = daily_trips
cond = (dstats_tot['event_timestamp'].dt.day==15) & (dstats_tot['event_timestamp'].dt.month==2)
size = len(dstats_tot.loc[cond])
mu, sigma = 12, 6 # mean and standard deviation
s = np.random.normal(mu, sigma, size)
daily_trips = np.round(s)
daily_trips = [int(x) if x>0 else 0 for x in daily_trips]
# daily_trips
dstats_tot.loc[cond, 'avg_daily_trips'] = daily_trips
cond = (dstats_tot['event_timestamp'].dt.day==16) & (dstats_tot['event_timestamp'].dt.month==2)
size = len(dstats_tot.loc[cond])
mu, sigma = 3, 6 # mean and standard deviation
s = np.random.normal(mu, sigma, size)
daily_trips = np.round(s)
daily_trips = [int(x) if x>0 else 0 for x in daily_trips]
# daily_trips
dstats_tot.loc[cond, 'avg_daily_trips'] = daily_trips
dstats_tot = dstats_tot.astype({'driver_id': 'int64','avg_daily_trips':'int64','rate_1m':'int64'})
# dstats_tot.to_parquet("driver_stats.parquet")
The nyc_taxi_rides_feb_2020.parquet file was extracted from the TLC trip record data. We randomly sampled the data and selected a few chosen features, in order to reduce the dataset for this demonstration.
In addition, one feature was created: the day of the week, derived from tpep_pickup_datetime.
The original features are described in this data dictionary.