Exponentiated gradient reduction is an in-processing technique that reduces fair classification to a sequence of cost-sensitive classification problems, returning a randomized classifier with the lowest empirical error subject to fair classification constraints. The code for exponentiated gradient reduction wraps the source class fairlearn.reductions.ExponentiatedGradient available in the https://github.com/fairlearn/fairlearn library, licensed under the MIT License, Copyright Microsoft Corporation.
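Conceptually, the algorithm runs a two-player game: a learner best-responds to cost-sensitive reweightings of the data, while an auditor raises multiplicative weights on violated fairness constraints. Below is a minimal, illustrative sketch of that loop for a single demographic-parity constraint; it is a toy, not fairlearn's implementation (the real class supports general moments, convergence checks, and prediction on new data):

import numpy as np
from sklearn.linear_model import LogisticRegression

def eg_demographic_parity_sketch(X, y, g, n_iter=20, eta=2.0, eps=0.02, bound=10.0):
    """Toy exponentiated-gradient loop for one demographic-parity
    constraint |P(h=1 | g=1) - P(h=1)| <= eps; illustrative only."""
    p1 = g.mean()                        # P(g = 1)
    lam_pos = lam_neg = 1.0              # multipliers for the +/- constraint
    iterates = []
    for _ in range(n_iter):
        lam = lam_pos - lam_neg
        # Learner best response: cost of predicting 1 minus cost of
        # predicting 0, with the constraint penalty folded into the costs.
        delta = (1 - 2 * y) + lam * (g / p1 - 1)
        z = (delta < 0).astype(int)      # relabeled targets
        w = np.abs(delta)                # cost-sensitive sample weights
        if z.min() == z.max():           # degenerate case: constant response
            h = np.full(len(y), z[0])
        else:
            h = LogisticRegression(solver='liblinear').fit(
                X, z, sample_weight=w).predict(X)
        iterates.append(h)
        # Auditor: multiplicative-weights update on the constraint violation.
        gap = h[g == 1].mean() - h.mean()
        lam_pos = min(lam_pos * np.exp(eta * (gap - eps)), bound)
        lam_neg = min(lam_neg * np.exp(eta * (-gap - eps)), bound)
    # The randomized classifier averages over the iterates.
    return np.mean(iterates, axis=0)     # P(h(x) = 1) per training sample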
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from aif360.sklearn.inprocessing import ExponentiatedGradientReduction
from aif360.sklearn.datasets import fetch_adult
from aif360.sklearn.metrics import average_odds_error
Datasets are formatted as separate X (# samples × # features) and y (# samples × # labels) DataFrames. The index of each DataFrame contains the protected attribute values per sample. Datasets may also include a sample_weight Series for use with certain algorithms/metrics. This format keeps aif360 compatible with scikit-learn objects.
For example, we can easily load the Adult dataset from UCI with the following line:
X, y, sample_weight = fetch_adult()
X.head()
| race | sex | age | workclass | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Non-white | Male | 25.0 | Private | 11th | 7.0 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0.0 | 0.0 | 40.0 | United-States |
| White | Male | 38.0 | Private | HS-grad | 9.0 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0.0 | 0.0 | 50.0 | United-States |
| White | Male | 28.0 | Local-gov | Assoc-acdm | 12.0 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0.0 | 0.0 | 40.0 | United-States |
| Non-white | Male | 44.0 | Private | Some-college | 10.0 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688.0 | 0.0 | 40.0 | United-States |
| White | Male | 34.0 | Private | 10th | 6.0 | Never-married | Other-service | Not-in-family | White | Male | 0.0 | 0.0 | 30.0 | United-States |
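Since the protected attributes live in the DataFrame index, they can be inspected with ordinary pandas calls, for example:
X.index.get_level_values('race').value_counts()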
To match the old version, we also remap the "race" feature to "White"/"Non-white",
X.race = X.race.cat.set_categories(['Non-white', 'White'], ordered=True).fillna('Non-white')
We can then map the protected attributes to integers,
X.index = pd.MultiIndex.from_arrays(X.index.codes, names=X.index.names)
y.index = pd.MultiIndex.from_arrays(y.index.codes, names=y.index.names)
and the target classes to 0/1,
y = pd.Series(y.factorize(sort=True)[0], index=y.index)
and split the dataset,
(X_train, X_test,
y_train, y_test) = train_test_split(X, y, train_size=0.7, random_state=1234567)
We use scikit-learn's one-hot encoder so we can easily reference the columns associated with the protected attributes, information which Exponentiated Gradient Reduction requires:
ohe = make_column_transformer(
(OneHotEncoder(sparse_output=False), X_train.dtypes == 'category'),
remainder='passthrough', verbose_feature_names_out=False)
X_train = pd.DataFrame(ohe.fit_transform(X_train), columns=ohe.get_feature_names_out(), index=X_train.index)
X_test = pd.DataFrame(ohe.transform(X_test), columns=ohe.get_feature_names_out(), index=X_test.index)
X_train.head()
| race | sex | workclass_Federal-gov | workclass_Local-gov | workclass_Private | workclass_Self-emp-inc | workclass_Self-emp-not-inc | workclass_State-gov | workclass_Without-pay | education_10th | education_11th | education_12th | ... | native-country_Thailand | native-country_Trinadad&Tobago | native-country_United-States | native-country_Vietnam | native-country_Yugoslavia | age | education-num | capital-gain | capital-loss | hours-per-week |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 58.0 | 11.0 | 0.0 | 0.0 | 42.0 |
|  | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 51.0 | 12.0 | 0.0 | 0.0 | 30.0 |
|  | 1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 26.0 | 14.0 | 0.0 | 1887.0 | 40.0 |
|  | 1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 44.0 | 3.0 | 0.0 | 0.0 | 40.0 |
|  | 1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 33.0 | 6.0 | 0.0 | 0.0 | 40.0 |
5 rows × 100 columns
The protected attribute information is also replicated in the labels:
y_train.head()
race  sex
1     1      0
      0      1
      1      1
      1      0
      1      0
dtype: int64
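Because the labels carry the protected attributes in their index, per-group statistics come for free; for example, the base rate of positive labels per sex:
y_train.groupby('sex').mean()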
With the data in this format, we can easily train a scikit-learn model and get predictions for the test data:
y_pred = LogisticRegression(solver='liblinear').fit(X_train, y_train).predict(X_test)
lr_acc = accuracy_score(y_test, y_pred)
lr_acc
0.8460234392275374
We can assess how close the predictions are to equality of odds. average_odds_error() computes the (unweighted) average of the absolute values of the true positive rate (TPR) difference and false positive rate (FPR) difference between the protected groups, i.e.:

$$\tfrac{1}{2}\left(|\Delta \mathrm{FPR}| + |\Delta \mathrm{TPR}|\right)$$
lr_aoe_sex = average_odds_error(y_test, y_pred, prot_attr='sex')
lr_aoe_sex
0.09335303807799161
lr_aoe_race = average_odds_error(y_test, y_pred, prot_attr='race')
lr_aoe_race
0.06751597777565721
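As a sanity check, the same quantity can be recomputed by hand from group-wise confusion matrices. A quick sketch (it assumes binary labels and exactly two groups):

from sklearn.metrics import confusion_matrix

def manual_aoe(y_true, y_pred, groups):
    # TPR and FPR per group, then the average absolute difference
    rates = []
    for g in np.unique(groups):
        mask = groups == g
        tn, fp, fn, tp = confusion_matrix(y_true[mask], y_pred[mask]).ravel()
        rates.append((tp / (tp + fn), fp / (fp + tn)))
    (tpr0, fpr0), (tpr1, fpr1) = rates
    return (abs(tpr0 - tpr1) + abs(fpr0 - fpr1)) / 2

sex = y_test.index.get_level_values('sex').to_numpy()
manual_aoe(y_test.to_numpy(), y_pred, sex)  # should match lr_aoe_sex above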
Choose a base model for the randomized classifier:
estimator = LogisticRegression(solver='liblinear')
Determine the columns associated with the protected attribute(s):
prot_attr_cols = [colname for colname in X_train if "sex" in colname or "race" in colname]
Train the randomized classifier and observe the test accuracy. Other options for constraints include "DemographicParity", "TruePositiveRateParity", "FalsePositiveRateParity", and "ErrorRateParity" (a sketch comparing these appears below):
np.random.seed(0)  # for reproducibility
exp_grad_red = ExponentiatedGradientReduction(prot_attr=prot_attr_cols,
estimator=estimator,
constraints="EqualizedOdds",
drop_prot_attr=False)
exp_grad_red.fit(X_train, y_train)
egr_acc = exp_grad_red.score(X_test, y_test)
print(egr_acc)
# Check that the accuracy is comparable
assert abs(lr_acc - egr_acc) <= 0.03
0.834303825458834
egr_aoe_sex = average_odds_error(y_test, exp_grad_red.predict(X_test), prot_attr='sex')
print(egr_aoe_sex)
# Check for improvement in average odds error for sex
assert egr_aoe_sex < lr_aoe_sex
0.02361168550972803
egr_aoe_race = average_odds_error(y_test, exp_grad_red.predict(X_test), prot_attr='race')
print(egr_aoe_race)
# Check for improvement in average odds error for race
assert egr_aoe_race < lr_aoe_race
0.024975550258025947
Number of calls made to the underlying cost-sensitive oracle (the base estimator):
exp_grad_red.model_.n_oracle_calls_
29
Maximum number of iterations permitted:
exp_grad_red.max_iter
50
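The other constraint strings listed earlier plug into the same fit/score pattern. A sketch for comparing them (each call retrains the full reduction, so this takes a while):

for constraint in ["DemographicParity", "TruePositiveRateParity",
                   "FalsePositiveRateParity", "ErrorRateParity"]:
    np.random.seed(0)  # for reproducibility
    model = ExponentiatedGradientReduction(prot_attr=prot_attr_cols,
                                           estimator=LogisticRegression(solver='liblinear'),
                                           constraints=constraint,
                                           drop_prot_attr=False)
    model.fit(X_train, y_train)
    print(constraint, model.score(X_test, y_test))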
Instead of passing in a string value for constraints, we can also pass a fairlearn.reductions.Moment object. Here we use a predefined moment, but you could also create a custom moment using the fairlearn library.
import fairlearn.reductions as red
np.random.seed(0)  # for reproducibility
exp_grad_red2 = ExponentiatedGradientReduction(prot_attr=prot_attr_cols,
estimator=estimator,
constraints=red.EqualizedOdds(),
drop_prot_attr=False)
exp_grad_red2.fit(X_train, y_train)
exp_grad_red2.score(X_test, y_test)
0.834303825458834
average_odds_error(y_test, exp_grad_red2.predict(X_test), prot_attr='sex')
0.02361168550972803
average_odds_error(y_test, exp_grad_red2.predict(X_test), prot_attr='race')
0.024975550258025947
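A benefit of passing a Moment object directly is access to its constructor options. For example, fairlearn's EqualizedOdds accepts a difference_bound that controls how tightly the odds must match; the sketch below uses an arbitrary bound of 0.02, and loosening it generally trades fairness for accuracy:

np.random.seed(0)  # for reproducibility
exp_grad_red3 = ExponentiatedGradientReduction(prot_attr=prot_attr_cols,
                                               estimator=estimator,
                                               constraints=red.EqualizedOdds(difference_bound=0.02),
                                               drop_prot_attr=False)
exp_grad_red3.fit(X_train, y_train)
exp_grad_red3.score(X_test, y_test)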