We start by defining our black-box model, typically represented by

$$ f(\mathbf{x}) = \mathbf{y} $$

where $\mathbf{x}=\{x_1, x_2, \dots, x_m\}$ and $\mathbf{y}=\{y_1, y_2, \dots, y_n\}$.

Our example toy model takes an all-numerical input $\mathbf{x}$ and returns a $\mathbf{y}$ of either `true` or `false`, depending on whether the sum of the $\mathbf{x}$ components is within a threshold $\epsilon$ of a point $\mathbf{C}$, that is:

$$ f(\mathbf{x}) = \begin{cases} \text{true}, & \text{if } \left|\sum_{i=1}^{m} x_i - \mathbf{C}\right| \leq \epsilon \\ \text{false}, & \text{otherwise} \end{cases} $$

This model is provided by the `TestModels` class in the `trustyai.utils` module. We instantiate it with $\mathbf{C}=500$ and $\epsilon=1.0$:
from trustyai.utils import TestModels
center = 500.0
epsilon = 1.0
model = TestModels.getSumThresholdModel(center, epsilon)
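For intuition, here is a minimal pure-Python sketch of the decision rule this model implements (an illustrative equivalent of the definition above, not the actual `TestModels` implementation):

def sum_threshold(x, center=500.0, epsilon=1.0):
    # True if the sum of the inputs is within epsilon of center
    return abs(sum(x) - center) <= epsilon

print(sum_threshold([100.0, 150.0, 250.5]))  # sum = 500.5, within 1.0 of 500 -> True
print(sum_threshold([1.0, 2.0, 3.0]))        # sum = 6.0, far from 500 -> False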
Next we need to define a goal. If our model is $f(\mathbf{x'})=\mathbf{y'}$, we are defining our $\mathbf{y'}$, and the counterfactual result will be the $\mathbf{x'}$ which satisfies $f(\mathbf{x'})=\mathbf{y'}$.
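More formally, the counterfactual search can be seen as the following optimisation problem (a standard formulation shown here for context, not TrustyAI-specific notation):

$$ \mathbf{x'} = \underset{\mathbf{z}}{\operatorname{arg\,min}}\; d(\mathbf{x}, \mathbf{z}) \quad \text{subject to} \quad f(\mathbf{z}) = \mathbf{y'} $$

where $d$ is a distance measure between inputs, so the search returns the valid input closest to the original $\mathbf{x}$.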
We will define our goal as `true`, that is, the sum is within the vicinity of the previously defined point $\mathbf{C}$. The goal is a list of `Output` objects, each created with the `output` helper, which takes a name, a dtype, and a value:

from trustyai.model import output

goal = [output(name="inside", dtype="bool", value=True)]
We will now define our initial features, $\mathbf{x}$. Each feature can be created with the `feature` helper from `trustyai.model`; in this case we want numerical features, so we pass `dtype="number"`.
Finally, we also specify the bounds for the counterfactual search. Typically these can be set using domain-specific knowledge or taken from the data. In this case we simply specify an arbitrary (but sensible) range, e.g. all the features can vary between `0` and `1000`.
import random
from trustyai.model import feature
features = [
    feature(name=f"x{i+1}", value=random.random() * 10.0, dtype="number", domain=(0.0, 1000.0))
    for i in range(4)
]
As we can see, the sum of the features will not be within $\epsilon$ ($1.0$) of $\mathbf{C}$ ($500.0$). As such, the model prediction will be `false`:
feature_sum = 0.0
for f in features:
    value = f.value.as_number()
    print(f"Feature {f.name} has value {value}")
    feature_sum += value
print(f"\nFeatures sum is {feature_sum}")
Feature x1 has value 5.691481718331826
Feature x2 has value 2.224205523610917
Feature x3 has value 2.2922923494291747
Feature x4 has value 7.44220396685685

Features sum is 17.65018355822877
We can now instantiate the explainer itself using `CounterfactualExplainer`. To do so, we need to configure the termination criteria: for this example we specify that the counterfactual search should execute a maximum of 10,000 iterations before stopping and returning the best result found so far.
from trustyai.explainers import CounterfactualExplainer
explainer = CounterfactualExplainer(steps=10_000)
We will now express the counterfactual problem as defined above:

- `inputs` represents our $\mathbf{x}$, which currently gives a prediction of `False`
- `goal` represents our $\mathbf{y'}$, that is, our desired prediction (`True`)
- the feature `domain`s represent the boundaries for the counterfactual search

We now request the counterfactual $\mathbf{x'}$ which is closest to $\mathbf{x}$ and which satisfies $f(\mathbf{x'}, \epsilon, \mathbf{C})=\mathbf{y'}$:
explanation = explainer.explain(inputs=features, goal=goal, model=model)
We can see the counterfactual $\mathbf{x'}$ that was found:
df = explanation.as_dataframe()
display(df)
print(f"\nFeature sum is {df.proposed.sum()}")
| | features | proposed | original | constrained | difference |
|---|---|---|---|---|---|
| 0 | x1 | 5.691482 | 5.691482 | False | 0.000000 |
| 1 | x2 | 6.167213 | 2.224206 | False | 3.943007 |
| 2 | x3 | 2.292292 | 2.292292 | False | 0.000000 |
| 3 | x4 | 484.898373 | 7.442204 | False | 477.456169 |

Feature sum is 499.0493597932051
explanation.plot()
As we've seen, it is possible to constrain a specific feature $x_i$ so that the search cannot change it. In this example, we now want to fix $x_1$ and $x_4$; that is, these features should have the same value in the counterfactual $\mathbf{x'}$ as in the original $\mathbf{x}$. By simply omitting the search domain for a feature, we constrain it:
features = [
    feature(name="x1", value=random.random() * 10.0, dtype="number"),
    feature(name="x2", value=random.random() * 10.0, dtype="number", domain=(0.0, 1000.0)),
    feature(name="x3", value=random.random() * 10.0, dtype="number", domain=(0.0, 1000.0)),
    feature(name="x4", value=random.random() * 10.0, dtype="number"),
]
And request a new counterfactual explanation:
explanation = explainer.explain(inputs=features, goal=goal, model=model)
We can see that $x_1$ and $x_4$ have the same values as the original, and the model still satisfies the goal.
explanation.as_dataframe()
| | features | proposed | original | constrained | difference |
|---|---|---|---|---|---|
| 0 | x1 | 6.067308 | 6.067308 | True | 0.000000 |
| 1 | x2 | 487.776213 | 4.606031 | False | 483.170182 |
| 2 | x3 | 4.078248 | 4.078248 | False | 0.000000 |
| 3 | x4 | 1.355742 | 1.355742 | True | 0.000000 |
explanation.plot()
We will now show how to use a custom Python model with TrustyAI counterfactual explanations. The model is an XGBoost classifier trained on the credit-bias dataset (available here). For convenience, the model is pre-trained and serialised with `joblib`, so for this example we simply need to deserialise it:
import joblib
xgb = joblib.load("models/credit-bias-xgboost.joblib")
print(xgb)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1, importance_type='gain', interaction_constraints='', learning_rate=0.07, max_delta_step=0, max_depth=8, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=200, n_jobs=12, num_parallel_tree=1, random_state=27, reg_alpha=0, reg_lambda=1, scale_pos_weight=0.9861206227457426, seed=27, subsample=1, tree_method='exact', validate_parameters=1, verbosity=None)
This model has a single boolean output, `PaidLoan`, which predicts whether a certain loan applicant will repay the loan in time or not. The model is slightly more complex than the previous examples, with the following input features:
| Input feature | Type | Note |
|---|---|---|
| `NewCreditCustomer` | boolean | |
| `Amount` | numerical | |
| `Interest` | numerical | |
| `LoanDuration` | numerical | In months |
| `Education` | numerical | Level (1, 2, 3, ...) |
| `NrOfDependants` | numerical | Integer |
| `EmploymentDurationCurrentEmployer` | numerical | Integer (years) |
| `IncomeFromPrincipalEmployer` | numerical | |
| `IncomeFromPension` | numerical | |
| `IncomeFromFamilyAllowance` | numerical | |
| `IncomeFromSocialWelfare` | numerical | |
| `IncomeFromLeavePay` | numerical | |
| `IncomeFromChildSupport` | numerical | |
| `IncomeOther` | numerical | |
| `ExistingLiabilities` | numerical | Integer |
| `RefinanceLiabilities` | numerical | Integer |
| `DebtToIncome` | numerical | |
| `FreeCash` | numerical | |
| `CreditScoreEeMini` | numerical | Integer |
| `NoOfPreviousLoansBeforeLoan` | numerical | Integer |
| `AmountOfPreviousLoansBeforeLoan` | numerical | |
| `PreviousRepaymentsBeforeLoan` | numerical | |
| `PreviousEarlyRepaymentsBefoleLoan` | numerical | |
| `PreviousEarlyRepaymentsCountBeforeLoan` | numerical | Integer |
| `Council_house` | boolean | |
| `Homeless` | boolean | |
| `Joint_ownership` | boolean | |
| `Joint_tenant` | boolean | |
| `Living_with_parents` | boolean | |
| `Mortgage` | boolean | |
| `Other` | boolean | |
| `Owner` | boolean | |
| `Owner_with_encumbrance` | boolean | |
| `Tenant` | boolean | |
| `Entrepreneur` | boolean | |
| `Fully` | boolean | |
| `Partially` | boolean | |
| `Retiree` | boolean | |
| `Self_employed` | boolean | |
We will start by testing the model with an input that we are quite sure (from the original data) will be predicted as `false`:
import pandas as pd
feature_values = {
    "NewCreditCustomer": False,
    "Amount": 2125.0,
    "Interest": 20.97,
    "LoanDuration": 60.0,
    "Education": 4.0,
    "NrOfDependants": 0.0,
    "EmploymentDurationCurrentEmployer": 6.0,
    "IncomeFromPrincipalEmployer": 0.0,
    "IncomeFromPension": 301.0,
    "IncomeFromFamilyAllowance": 0.0,
    "IncomeFromSocialWelfare": 53.0,
    "IncomeFromLeavePay": 0.0,
    "IncomeFromChildSupport": 0.0,
    "IncomeOther": 0.0,
    "ExistingLiabilities": 8.0,
    "RefinanceLiabilities": 6.0,
    "DebtToIncome": 26.29,
    "FreeCash": 10.92,
    "CreditScoreEeMini": 1000.0,
    "NoOfPreviousLoansBeforeLoan": 1.0,
    "AmountOfPreviousLoansBeforeLoan": 500.0,
    "PreviousRepaymentsBeforeLoan": 590.95,
    "PreviousEarlyRepaymentsBefoleLoan": 0.0,
    "PreviousEarlyRepaymentsCountBeforeLoan": 0.0,
    "Council_house": False,
    "Homeless": False,
    "Joint_ownership": False,
    "Joint_tenant": False,
    "Living_with_parents": False,
    "Mortgage": False,
    "Other": False,
    "Owner": False,
    "Owner_with_encumbrance": True,
    "Tenant": True,
    "Entrepreneur": False,
    "Fully": False,
    "Partially": False,
    "Retiree": True,
    "Self_employed": False,
}
feature_df = pd.DataFrame([feature_values])
feature_df.T
| | 0 |
|---|---|
| NewCreditCustomer | False |
| Amount | 2125.0 |
| Interest | 20.97 |
| LoanDuration | 60.0 |
| Education | 4.0 |
| NrOfDependants | 0.0 |
| EmploymentDurationCurrentEmployer | 6.0 |
| IncomeFromPrincipalEmployer | 0.0 |
| IncomeFromPension | 301.0 |
| IncomeFromFamilyAllowance | 0.0 |
| IncomeFromSocialWelfare | 53.0 |
| IncomeFromLeavePay | 0.0 |
| IncomeFromChildSupport | 0.0 |
| IncomeOther | 0.0 |
| ExistingLiabilities | 8.0 |
| RefinanceLiabilities | 6.0 |
| DebtToIncome | 26.29 |
| FreeCash | 10.92 |
| CreditScoreEeMini | 1000.0 |
| NoOfPreviousLoansBeforeLoan | 1.0 |
| AmountOfPreviousLoansBeforeLoan | 500.0 |
| PreviousRepaymentsBeforeLoan | 590.95 |
| PreviousEarlyRepaymentsBefoleLoan | 0.0 |
| PreviousEarlyRepaymentsCountBeforeLoan | 0.0 |
| Council_house | False |
| Homeless | False |
| Joint_ownership | False |
| Joint_tenant | False |
| Living_with_parents | False |
| Mortgage | False |
| Other | False |
| Owner | False |
| Owner_with_encumbrance | True |
| Tenant | True |
| Entrepreneur | False |
| Fully | False |
| Partially | False |
| Retiree | True |
| Self_employed | False |
We can now confirm that this input leads to a `false` `PaidLoan` prediction, with a probability of $\sim78\%$:
print(xgb.predict_proba(feature_df))
print(f"Paid loan is predicted as: {xgb.predict(feature_df)}")
[[0.7835957 0.21640435]]
Paid loan is predicted as: [False]
We will now prepare the XGBoost model to be used by the TrustyAI counterfactual engine. To do so, we simply need to wrap the `predict` function in a TrustyAI `Model` class. Since our model outputs a NumPy array, we need to specify the column names of the output, so we can later use those names to set the goals of our counterfactual search.
from trustyai.model import Model
model = Model(xgb.predict, output_names=['PaidLoan'])
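As a quick sanity check (optional, but it confirms the wrapper behaves like the underlying model), the wrapped `Model` should reproduce the XGBoost prediction for our original input:

model(feature_df)  # should match xgb.predict(feature_df), i.e. array([False])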
To get started, we will search for a counterfactual with no constraints at all. This is not a realistic use case, but it gives us a baseline. We will also create a single set of identical bounds for all the features; again, this is not realistic, but it serves the same purpose.
from trustyai.model.domain import feature_domain
domains = []
for v in feature_values.values():
    if type(v) is bool:
        domains.append(feature_domain([True, False]))
    else:
        domains.append(feature_domain((0, 10000.0)))
We want our goal to be the model predicting that the loan will be paid (`PaidLoan=True`), so we specify it as:
goal = [output(name="PaidLoan", dtype="bool", value=True)]
We are now ready to search for a counterfactual:
from trustyai.explainers import CounterfactualExplainer
explainer = CounterfactualExplainer(steps=10_000)
explanation = explainer.explain(inputs=feature_df, goal=goal, model=model, feature_domains=domains)
First we confirm that our counterfactual changes the outcome, by running it through the model:
model(explanation.proposed_features_dataframe)
array([ True])
explanation.as_dataframe()
| | features | proposed | original | constrained | difference |
|---|---|---|---|---|---|
| 0 | NewCreditCustomer | 0.00 | 0.00 | False | 0.0 |
| 1 | Amount | 2125.00 | 2125.00 | False | 0.0 |
| 2 | Interest | 20.97 | 20.97 | False | 0.0 |
| 3 | LoanDuration | 60.00 | 60.00 | False | 0.0 |
| 4 | Education | 4.00 | 4.00 | False | 0.0 |
| 5 | NrOfDependants | 0.00 | 0.00 | False | 0.0 |
| 6 | EmploymentDurationCurrentEmployer | 6.00 | 6.00 | False | 0.0 |
| 7 | IncomeFromPrincipalEmployer | 0.00 | 0.00 | False | 0.0 |
| 8 | IncomeFromPension | 301.00 | 301.00 | False | 0.0 |
| 9 | IncomeFromFamilyAllowance | 0.00 | 0.00 | False | 0.0 |
| 10 | IncomeFromSocialWelfare | 53.00 | 53.00 | False | 0.0 |
| 11 | IncomeFromLeavePay | 0.00 | 0.00 | False | 0.0 |
| 12 | IncomeFromChildSupport | 0.00 | 0.00 | False | 0.0 |
| 13 | IncomeOther | 0.00 | 0.00 | False | 0.0 |
| 14 | ExistingLiabilities | 8.00 | 8.00 | False | 0.0 |
| 15 | RefinanceLiabilities | 0.00 | 6.00 | False | -6.0 |
| 16 | DebtToIncome | 26.29 | 26.29 | False | 0.0 |
| 17 | FreeCash | 10.92 | 10.92 | False | 0.0 |
| 18 | CreditScoreEeMini | 1000.00 | 1000.00 | False | 0.0 |
| 19 | NoOfPreviousLoansBeforeLoan | 1.00 | 1.00 | False | 0.0 |
| 20 | AmountOfPreviousLoansBeforeLoan | 500.00 | 500.00 | False | 0.0 |
| 21 | PreviousRepaymentsBeforeLoan | 590.95 | 590.95 | False | 0.0 |
| 22 | PreviousEarlyRepaymentsBefoleLoan | 0.00 | 0.00 | False | 0.0 |
| 23 | PreviousEarlyRepaymentsCountBeforeLoan | 6.00 | 0.00 | False | 6.0 |
| 24 | Council_house | 0.00 | 0.00 | False | 0.0 |
| 25 | Homeless | 0.00 | 0.00 | False | 0.0 |
| 26 | Joint_ownership | 0.00 | 0.00 | False | 0.0 |
| 27 | Joint_tenant | 0.00 | 0.00 | False | 0.0 |
| 28 | Living_with_parents | 0.00 | 0.00 | False | 0.0 |
| 29 | Mortgage | 0.00 | 0.00 | False | 0.0 |
| 30 | Other | 0.00 | 0.00 | False | 0.0 |
| 31 | Owner | 0.00 | 0.00 | False | 0.0 |
| 32 | Owner_with_encumbrance | 1.00 | 1.00 | False | 0.0 |
| 33 | Tenant | 1.00 | 1.00 | False | 0.0 |
| 34 | Entrepreneur | 0.00 | 0.00 | False | 0.0 |
| 35 | Fully | 0.00 | 0.00 | False | 0.0 |
| 36 | Partially | 0.00 | 0.00 | False | 0.0 |
| 37 | Retiree | 1.00 | 1.00 | False | 0.0 |
| 38 | Self_employed | 0.00 | 0.00 | False | 0.0 |
And indeed it changes. We will now verify which features were changed:
df = explanation.as_dataframe()
df[df.difference != 0.0]
| | features | proposed | original | constrained | difference |
|---|---|---|---|---|---|
| 15 | RefinanceLiabilities | 0.0 | 6.0 | False | -6.0 |
| 23 | PreviousEarlyRepaymentsCountBeforeLoan | 6.0 | 0.0 | False | 6.0 |
explanation.plot()
Here we can see the problem with the unconstrained search: some of the fields that were changed might be difficult (or impossible) to change in practice. We will now try a more realistic search, which incorporates domain-specific knowledge (and common sense). To do so, we will constrain the features that we feel shouldn't (or mustn't) change, and specify sensible search bounds for the rest. The constraints should be self-explanatory: any feature without an explicit domain below is held fixed, while the remaining features get domain-appropriate bounds:
_domains = {
    "Amount": (0.0, 1000.0),
    "LoanDuration": (0.0, 120.0),
    "EmploymentDurationCurrentEmployer": (0.0, 40.0),
    "IncomeFromPrincipalEmployer": (0.0, 1000.0),
    "IncomeFromPension": (0.0, 1000.0),
    "IncomeFromFamilyAllowance": (0.0, 1000.0),
    "IncomeFromSocialWelfare": (0.0, 1000.0),
    "IncomeFromLeavePay": (0.0, 1000.0),
    "IncomeFromChildSupport": (0.0, 1000.0),
    "IncomeOther": (0.0, 1000.0),
    "DebtToIncome": (0.0, 100.0),
    "FreeCash": (0.0, 100.0),
    "CreditScoreEeMini": (0.0, 10000.0),
}

domains = {key: None for key in feature_values.keys()}
for key in _domains.keys():
    domains[key] = feature_domain(_domains[key])
domains = list(domains.values())
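As a quick illustrative check, every feature whose domain is `None` will be held fixed during the search:

n_fixed = sum(1 for d in domains if d is None)
print(f"{n_fixed} of {len(domains)} features are held fixed")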
And we start a new search:
explanation = explainer.explain(inputs=feature_df, goal=goal, model=model, feature_domains=domains)
We test that the counterfactual does change the outcome:
model(explanation.proposed_features_dataframe)
array([ True])
And we confirm that no constrained features were changed:
df = explanation.as_dataframe()
df[df.difference != 0.0]
| | features | proposed | original | constrained | difference |
|---|---|---|---|---|---|
| 3 | LoanDuration | 55.86733 | 60.0 | False | -4.13267 |
| 10 | IncomeFromSocialWelfare | 60.00000 | 53.0 | False | 7.00000 |
explanation.plot()
We can see that the prediction confidence for the previous counterfactual is very close to $50\%$, right at the decision boundary. With TrustyAI we also have the possibility of requiring a minimum probability for the result (when the model supports prediction confidences); so far we have effectively accepted any desired outcome, regardless of its probability. Let's say we want a result for which the model is at least $75\%$ confident that the loan will be repaid. We can express this by exposing the model's confidence as a numerical output and making that probability the goal:
def fn(x):
    return xgb.predict_proba(x)[:, 1]

model = Model(fn, output_names=["probability"])
goal = [output(name="probability", dtype="number", value=0.75)]
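Before re-running the search, we can sanity-check the new wrapper on the original input; from the earlier `predict_proba` output we expect a value of roughly `0.2164`:

model(feature_df)  # expected: approximately array([0.2164])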
We can then re-run the search with all the data as defined previously:
explanation = explainer.explain(inputs=feature_df, goal=goal, model=model, feature_domains=domains)
As previously, we check that the answer is what we are looking for:
model(explanation.proposed_features_dataframe)
array([0.75004345], dtype=float32)
And we show which features need to change to achieve the desired outcome:
df = explanation.as_dataframe()
df[df.difference != 0.0]
| | features | proposed | original | constrained | difference |
|---|---|---|---|---|---|
| 3 | LoanDuration | 12.949312 | 60.0 | False | -47.050688 |
| 9 | IncomeFromFamilyAllowance | 49.624817 | 0.0 | False | 49.624817 |
| 10 | IncomeFromSocialWelfare | 23.144654 | 53.0 | False | -29.855346 |
| 12 | IncomeFromChildSupport | 21.172252 | 0.0 | False | 21.172252 |
explanation.plot()
Let's try to visualise counterfactuals in action with the following example: we construct three clusters of data and train a model to assign points to a cluster. Our counterfactual question can then be: if we have a point $X$ belonging to a certain cluster $C$, how far would it need to move in order to belong to a desired cluster $C^\prime$?
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=2000, n_features=2, random_state=0, centers=[(-5,0), (0, 5), (5, 0)])
import matplotlib.pyplot as plt
plt.figure(figsize=(20,10))
plt.scatter(X[:,0], X[:,1], c=y, s=5)
plt.show()
We now train a KNN model in order to classify points and assign them to a cluster.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
KNeighborsClassifier(n_neighbors=3)
If we take a point such as `(2.5, -1)`, it is clear this point belongs to cluster `2`.
plt.figure(figsize=(20,10))
ORIGINAL = [2.5, -1.0]
plt.scatter(X[:,0], X[:,1], c=y, s=5)
plt.scatter(ORIGINAL[0], ORIGINAL[1], c="red", s=50)
plt.plot([ORIGINAL[0], 5], [ORIGINAL[1], 0], c="red")
plt.plot([ORIGINAL[0], 0], [ORIGINAL[1], 5], c="grey", linestyle="dashed")
plt.show()
import numpy as np
knn.predict_proba(np.array([[2.5, -1.0]]))
array([[0., 0., 1.]])
We can now create our prediction function. It simply takes the `x` and `y` coordinates and returns the cluster classification (the index of the most probable cluster):
def knn_classify(inputs):
    result = knn.predict_proba(inputs)
    index = np.argmax(result, axis=1)[0]
    return np.array([index]).astype(float)
We wrap this function in the `Model` wrapper:
from trustyai.model import Model
knn_model = Model(knn_classify, output_names=['cluster'])
We can test it and confirm it works as expected.
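For instance, the wrapped model should assign the original point to cluster `2`, matching the `predict_proba` result above:

knn_model(np.array([[2.5, -1.0]]))  # expected: array([2.])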
We now define our goal: we want the point to belong to cluster `1`.
goal = [output(name="cluster", dtype="number", value=1)]
features = [
    feature("x", "number", 2.5, (-10, 10)),
    feature("y", "number", -1.0, (-10, 10)),
]
We pass this data to the explainer along with the search boundaries.
explanation = explainer.explain(inputs=features, goal=goal, model=knn_model)
The counterfactual we get is:
result = explanation.proposed_features_array[0]
result
array([2.49380669, 2.46489674])
We can confirm this point belongs to cluster `1`:
knn_model(explanation.proposed_features_array)
array([1.])
And visually:
plt.figure(figsize=(20,10))
plt.scatter(X[:,0], X[:,1], c=y, s=5)
plt.scatter(ORIGINAL[0], ORIGINAL[1], c="red", s=50)
plt.scatter(result[0], result[1], c="green", s=50)
plt.plot([result[0], 5], [result[1], 0], c="grey", linestyle="dashed")
plt.plot([result[0], 0], [result[1], 5], c="green")
plt.show()
explanation.as_dataframe()
| | features | proposed | original | constrained | difference |
|---|---|---|---|---|---|
| 0 | x | 2.493807 | 2.5 | False | -0.006193 |
| 1 | y | 2.464897 | -1.0 | False | 3.464897 |
explanation.plot()