Feature importance is a class of techniques that indicate the relative importance of each feature for a given prediction model. This is done by calculating and assigning a score, or weight, to each feature. Tracking the feature weights of a model over time can be useful for several reasons.
WhyLabs supports feature weight tracking and visualization in your monitoring dashboard. The most straightforward way to send feature weights to WhyLabs, and to retrieve them again, is through whylogs, and that's exactly what we'll show in this example.
In this example, we will calculate feature weights with a linear regression model, send them to WhyLabs, and then retrieve them from WhyLabs.
First, let's make sure you have the required packages installed:
# Note: you may need to restart the kernel to use updated packages.
%pip install whylogs
%pip install scikit-learn==1.0.2
There are several different ways of calculating feature importance. One such way is to fit a linear regression model and use its coefficients, which can be interpreted as feature importance scores when the features are on comparable scales. That's the method we'll use for this demonstration. We'll generate a synthetic dataset with 5 informative features and 5 uninformative ones.
The code below is based on the article "How to Calculate Feature Importance With Python" by Jason Brownlee. I definitely recommend it if you are interested in other ways of calculating feature importance.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)
# define the model
model = LinearRegression()
# fit the model
model.fit(X, y)
# get importance
importance = model.coef_
# map each coefficient to a feature name (Feature_0, Feature_1, ...)
weights = {f"Feature_{index}": value for index, value in enumerate(importance)}
weights
{'Feature_0': 3.4778530780712388e-15, 'Feature_1': 12.444827855389764, 'Feature_2': -3.108624468950438e-14, 'Feature_3': -1.9095836023552692e-14, 'Feature_4': 93.32225450776932, 'Feature_5': 86.50810998606799, 'Feature_6': 26.74606669803453, 'Feature_7': 3.285346398262155, 'Feature_8': -2.531308496145357e-14, 'Feature_9': 1.9539925233402755e-14}
We end up with a dictionary with the features as keys and the respective scores as values. This is an example of global feature importance, as opposed to local feature importance, which would show the contribution of features for a specific prediction. Currently, WhyLabs and whylogs support only global feature importance. Therefore, this is the structure we'll use to later send the Feature Weights to WhyLabs.
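To read a dictionary like this, it can help to rank the features by the magnitude of their weights. The sketch below uses rounded copies of the values printed above; the `TOLERANCE` cutoff is a hypothetical choice made for this illustration and is not part of whylogs or WhyLabs.

```python
# Rounded copies of the coefficients printed above.
weights = {
    "Feature_0": 3.48e-15,
    "Feature_1": 12.44,
    "Feature_2": -3.11e-14,
    "Feature_3": -1.91e-14,
    "Feature_4": 93.32,
    "Feature_5": 86.51,
    "Feature_6": 26.75,
    "Feature_7": 3.29,
    "Feature_8": -2.53e-14,
    "Feature_9": 1.95e-14,
}

# Rank by absolute value: the sign gives the direction of the effect,
# not its strength.
ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical cutoff below which a weight is treated as numerically zero.
TOLERANCE = 1e-9
informative = [name for name, value in ranked if abs(value) > TOLERANCE]

print(informative)
# ['Feature_4', 'Feature_5', 'Feature_6', 'Feature_1', 'Feature_7']
```

The five features that survive the cutoff match the five informative features requested from make_regression; the remaining weights are floating-point noise around zero.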
In order to send our profile to WhyLabs, let's first set up an account. You can skip this if you already have an account and a model set up.
We will need three pieces of information: an API token, an org ID, and a dataset ID (the ID of your model or project).
First, grab a free WhyLabs account if you haven't already. You can follow along with the examples if you wish, but if you’re interested in only following this demonstration, you can go ahead and skip the quick start instructions.
After that, you’ll be prompted to create an API token. Once you create it, copy and store it locally. The second important piece of information here is your org ID. Take note of it as well. After you get your API token and org ID, you can go to https://hub.whylabsapp.com/models to see your projects dashboard. You can create a new project and take note of its ID (if it's a model project, it will look like model-xxxx).
import whylogs as why
why.init()
Initializing session with config /home/anthony/.config/whylogs/config.ini ✅ Using session type: WHYLABS ⤷ org id: org-JpsdM6 ⤷ api key: l70gARzBVZ ⤷ default dataset: model-66 In production, you should pass the api key as an environment variable WHYLABS_API_KEY, the org id as WHYLABS_DEFAULT_ORG_ID, and the default dataset id as WHYLABS_DEFAULT_DATASET_ID.
<whylogs.api.whylabs.session.session.ApiKeySession at 0x7fd8b24392b0>
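The session output above names three environment variables for production use. As a minimal sketch, you could check that they are present before writing anything. The helper `missing_whylabs_variables` is hypothetical (it is not part of whylogs), and the values in `example_env` are placeholders, not real credentials.

```python
import os

# Variable names taken from the session message above.
REQUIRED_VARIABLES = (
    "WHYLABS_API_KEY",
    "WHYLABS_DEFAULT_ORG_ID",
    "WHYLABS_DEFAULT_DATASET_ID",
)


def missing_whylabs_variables(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


# Example with placeholder values; the dataset ID is deliberately left out.
example_env = {
    "WHYLABS_API_KEY": "<your-api-key>",
    "WHYLABS_DEFAULT_ORG_ID": "org-xxxx",
}
print(missing_whylabs_variables(example_env))
# ['WHYLABS_DEFAULT_DATASET_ID']
```

Calling `missing_whylabs_variables()` with no argument checks the real process environment instead of the example dictionary.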
Once the feature weights are calculated, sending them to WhyLabs is very simple.
We first need to wrap the dictionary in a FeatureWeights object:
from whylogs.core.feature_weights import FeatureWeights
feature_weights = FeatureWeights(weights)
And then use the WhyLabsWriter to write it, provided your environment variables are properly set:
from whylogs.api.writer.whylabs import WhyLabsWriter
result = feature_weights.writer("whylabs").write()
result
(True, '200')
Another way of doing exactly the same thing is to instantiate the writer itself and then call write:
WhyLabsWriter().write(feature_weights)
(True, '200')
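Both calls above return a (success, status) pair. In a script or pipeline, you may want a failed upload to stop execution rather than pass silently. The helper below is a hypothetical sketch that assumes the tuple shape shown in the output above; other whylogs versions may return a different structure.

```python
def ensure_written(result):
    """Raise if a (success, status) write result reports a failure."""
    success, status = result
    if not success:
        raise RuntimeError(f"Writing feature weights failed with status {status}")
    return status


# Same shape as the output shown above.
print(ensure_written((True, "200")))
```

With the writer from this example, this could be used as `ensure_written(WhyLabsWriter().write(feature_weights))`.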
You can also get the feature weights from WhyLabs with get_feature_weights():
result = WhyLabsWriter().get_feature_weights()
print(result.weights)
print(result.metadata)
{'Feature_4': 93.32225450776932, 'Feature_5': 86.50810998606799, 'Feature_6': 26.74606669803453, 'Feature_1': 12.444827855389764, 'Feature_7': 3.285346398262155, 'Feature_2': -3.108624468950438e-14, 'Feature_8': -2.531308496145357e-14, 'Feature_9': 1.9539925233402755e-14, 'Feature_3': -1.9095836023552692e-14, 'Feature_0': 3.4778530780712388e-15} {'author': 'system', 'updated_timestamp': 1694207704547, 'version': 2}
As you can see, the result contains the set of weights in result.weights, along with additional metadata in result.metadata.
If you write multiple sets of weights to the same model at WhyLabs, the content will be overwritten. When using get_feature_weights(), you'll get the latest version, that is, the last set of weights you sent. You can see which version it is in the metadata, along with the timestamp of the last update.
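The metadata timestamp is easier to read once converted to a date. The sketch below uses the metadata dictionary printed above and assumes that updated_timestamp is expressed in milliseconds since the Unix epoch, which the 13-digit value suggests but the output does not state.

```python
from datetime import datetime, timezone

# Metadata as printed above.
metadata = {"author": "system", "updated_timestamp": 1694207704547, "version": 2}

# Assumption: the timestamp is in milliseconds since the Unix epoch (UTC).
updated_at = datetime.fromtimestamp(
    metadata["updated_timestamp"] / 1000, tz=timezone.utc
)

print(f"version {metadata['version']} updated on {updated_at:%Y-%m-%d %H:%M} UTC")
```

Under that assumption, the weights retrieved above were last updated on 2023-09-08.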