Learning fair representations (LFR) [1] is a pre-processing technique that finds a latent representation which encodes the data well but obfuscates information about protected attributes. We will see how to use this algorithm to learn representations that encourage individual fairness, and apply it to the Adult dataset.
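As a quick refresher (following [1], with notation slightly simplified), LFR maps each input $x_n$ to a soft assignment over $K$ prototypes and minimizes a weighted combination of three losses,

$$L = A_x L_x + A_y L_y + A_z L_z,$$

where $L_x$ penalizes poor reconstruction of the inputs from the prototypes, $L_y$ penalizes prediction error on the labels, and $L_z = \sum_{k=1}^{K} \lvert M_k^{+} - M_k^{-} \rvert$ penalizes differences between the two groups' average prototype memberships. The weights $A_x$, $A_y$, $A_z$ are the hyperparameters referred to in the code comments below.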
%matplotlib inline
# Load all necessary packages
import sys
sys.path.append("../")
from aif360.datasets import BinaryLabelDataset
from aif360.datasets import AdultDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric
from aif360.metrics.utils import compute_boolean_conditioning_vector
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_adult
from aif360.algorithms.preprocessing.lfr import LFR
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from IPython.display import Markdown, display
import matplotlib.pyplot as plt
# Get the dataset and split into train and test
dataset_orig = load_preproc_data_adult()
dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)
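Note that the split is shuffled, so the exact numbers below will vary from run to run. Recent AIF360 releases accept a seed argument in split (an assumption worth checking against your installed version), which makes the notebook reproducible:

# Sketch: a seeded, reproducible split (the seed kwarg is assumed available in your AIF360 version)
dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True, seed=42)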
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(dataset_orig_train.privileged_protected_attributes,
      dataset_orig_train.unprivileged_protected_attributes)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)
(34189, 18)
(1.0, 0.0)
['sex', 'race']
([array([1.]), array([1.])], [array([0.]), array([0.])])
['race', 'sex', 'Age (decade)=10', 'Age (decade)=20', 'Age (decade)=30', 'Age (decade)=40', 'Age (decade)=50', 'Age (decade)=60', 'Age (decade)=>=70', 'Education Years=6', 'Education Years=7', 'Education Years=8', 'Education Years=9', 'Education Years=10', 'Education Years=11', 'Education Years=12', 'Education Years=<6', 'Education Years=>12']
# Metric for the original dataset
privileged_groups = [{'sex': 1.0}]
unprivileged_groups = [{'sex': 0.0}]
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train,
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())
Difference in mean outcomes between unprivileged and privileged groups = -0.196576
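For reference, mean_difference() is the statistical parity difference of the dataset labels,

$$\text{mean difference} = \Pr(Y = y_{\text{fav}} \mid D = \text{unprivileged}) - \Pr(Y = y_{\text{fav}} \mid D = \text{privileged}),$$

so the negative value above means the unprivileged group (sex = 0) receives the favorable label about 19.7 percentage points less often than the privileged group in the training data.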
# LFR objective weights:
#   Input reconstruction quality - Ax
#   Fairness constraint - Az
#   Output prediction error - Ay
privileged_groups = [{'sex': 1}]
unprivileged_groups = [{'sex': 0}]
TR = LFR(unprivileged_groups=unprivileged_groups, privileged_groups=privileged_groups)
TR = TR.fit(dataset_orig_train)
Objective value printed every 250 iterations during training:
 250 20650.5619275    500 19749.3611659    750 18978.3028853   1000 17995.3008952
1250 16691.6273592   1500 16443.051177    1750 16096.538506    2000 15761.1807924
2250 17279.0619677   2500 15737.4111706   2750 15673.1203091   3000 15576.8576761
3250 15454.6964063   3500 15416.9562705   3750 15316.2346051   4000 15247.5321307
4250 15177.9090497   4500 15138.4650177   4750 15108.6571538   5000 15082.626066
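The call above relies on the library defaults for the prototype count and the objective weights. To make the trade-off explicit you can pass them yourself; the parameter names and values below are AIF360's documented defaults at the time of writing, so treat them as an assumption and check your installed version.

# Sketch: the same fit with the (assumed) default hyperparameters spelled out
TR = LFR(unprivileged_groups=unprivileged_groups,
         privileged_groups=privileged_groups,
         k=5,       # number of prototypes in the latent representation
         Ax=0.01,   # weight on input reconstruction quality
         Ay=1.0,    # weight on output prediction error
         Az=50.0)   # weight on the fairness constraint
TR = TR.fit(dataset_orig_train)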
# Transform training data and align features
dataset_transf_train = TR.transform(dataset_orig_train)
from sklearn.metrics import classification_report
thresholds = [0.2, 0.3, 0.35, 0.4, 0.5]
for threshold in thresholds:
    # Transform training data and align features
    dataset_transf_train = TR.transform(dataset_orig_train, threshold=threshold)
    metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train,
                                                   unprivileged_groups=unprivileged_groups,
                                                   privileged_groups=privileged_groups)
    display(Markdown("#### Transformed training dataset"))
    print("Classification threshold = %f" % threshold)
    #print(classification_report(dataset_orig_train.labels, dataset_transf_train.labels))
    print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())
Classification threshold = 0.200000
Difference in mean outcomes between unprivileged and privileged groups = -0.501460
Classification threshold = 0.300000
Difference in mean outcomes between unprivileged and privileged groups = -0.358144
Classification threshold = 0.350000
Difference in mean outcomes between unprivileged and privileged groups = -0.258428
Classification threshold = 0.400000
Difference in mean outcomes between unprivileged and privileged groups = -0.311679
Classification threshold = 0.500000
Difference in mean outcomes between unprivileged and privileged groups = -0.214736
Note that the disparity of the transformed labels depends strongly on the classification threshold and, in this run, is not reduced relative to the original training data (-0.197) at any of the thresholds tried. The clearer gain from LFR appears in the individual fairness metric below.
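To see the threshold dependence at a glance, here is a small optional sketch (reusing the matplotlib import from the top of the notebook) that re-runs the sweep and plots the disparity against the threshold:

# Sketch: plot mean difference of the transformed labels vs. classification threshold
mean_diffs = []
for t in thresholds:
    d = TR.transform(dataset_orig_train, threshold=t)
    m = BinaryLabelDatasetMetric(d, unprivileged_groups=unprivileged_groups,
                                 privileged_groups=privileged_groups)
    mean_diffs.append(m.mean_difference())
plt.plot(thresholds, mean_diffs, marker='o', label='transformed')
plt.axhline(metric_orig_train.mean_difference(), linestyle='--', label='original')
plt.xlabel('classification threshold')
plt.ylabel('mean difference')
plt.legend()
plt.show()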
display(Markdown("#### Individual fairness metrics"))
print("Consistency of labels in transformed training dataset= %f" %metric_transf_train.consistency())
print("Consistency of labels in original training dataset= %f" %metric_orig_train.consistency())
Consistency of labels in transformed training dataset= 1.000000 Consistency of labels in original training dataset= 0.742326
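consistency() quantifies individual fairness: as implemented in AIF360 (with k = 5 neighbors by default, an assumption worth checking against your version), it is one minus the average gap between each instance's label and the mean label of its k nearest neighbors in feature space,

$$\text{consistency} = 1 - \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - \frac{1}{k} \sum_{j \in \mathrm{kNN}(x_i)} \hat{y}_j \right|.$$

A value of 1 means every instance is labeled exactly like its neighbors, which the transformed data attains because LFR assigns labels through the shared prototypes.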
## PCA Analysis of consistency
To see why the transformed labels are so consistent, we project both the original and the transformed features onto their first three principal components and compare how the variance is distributed.
import pandas as pd
feat_cols = dataset_orig_train.feature_names
orig_df = pd.DataFrame(dataset_orig_train.features,columns=feat_cols)
orig_df['label'] = dataset_orig_train.labels
orig_df['label'] = orig_df['label'].apply(lambda i: str(i))
transf_df = pd.DataFrame(dataset_transf_train.features,columns=feat_cols)
transf_df['label'] = dataset_transf_train.labels
transf_df['label'] = transf_df['label'].apply(lambda i: str(i))
from sklearn.decomposition import PCA
orig_pca = PCA(n_components=3)
orig_pca_result = orig_pca.fit_transform(orig_df[feat_cols].values)
orig_df['pca-one'] = orig_pca_result[:,0]
orig_df['pca-two'] = orig_pca_result[:,1]
orig_df['pca-three'] = orig_pca_result[:,2]
display(Markdown("#### Original training dataset"))
print('Explained variance ratio per principal component:')
print(orig_pca.explained_variance_ratio_)
Explained variance ratio per principal component:
[0.15355025 0.14652464 0.12614707]
transf_pca = PCA(n_components=3)
transf_pca_result = transf_pca.fit_transform(transf_df[feat_cols].values)
transf_df['pca-one'] = transf_pca_result[:,0]
transf_df['pca-two'] = transf_pca_result[:,1]
transf_df['pca-three'] = transf_pca_result[:,2]
display(Markdown("#### Transformed training dataset"))
print('Explained variance ratio per principal component:')
print(transf_pca.explained_variance_ratio_)
Explained variance ratio per principal component:
[0.63337409 0.3302964 0.03632951]
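The transformed representation concentrates almost all of the variance in the first two components, consistent with the data collapsing onto a small number of prototypes. An optional sketch to visualize this, using the pca-one/pca-two columns computed above:

# Sketch: scatter the first two principal components, colored by label
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, df, title in [(axes[0], orig_df, 'Original'),
                      (axes[1], transf_df, 'Transformed')]:
    for label, grp in df.groupby('label'):
        ax.scatter(grp['pca-one'], grp['pca-two'], s=2, alpha=0.3, label=label)
    ax.set_title(title)
    ax.set_xlabel('pca-one')
    ax.set_ylabel('pca-two')
    ax.legend()
plt.show()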
display(Markdown("#### Testing Dataset shape"))
print(dataset_orig_test.features.shape)
metric_orig_test = BinaryLabelDatasetMetric(dataset_orig_test,
                                            unprivileged_groups=unprivileged_groups,
                                            privileged_groups=privileged_groups)
display(Markdown("#### Original test dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test.mean_difference())
# Transform the test data, making the threshold explicit (0.5, the last value from the sweep above)
dataset_transf_test = TR.transform(dataset_orig_test, threshold=0.5)
metric_transf_test = BinaryLabelDatasetMetric(dataset_transf_test,
                                              unprivileged_groups=unprivileged_groups,
                                              privileged_groups=privileged_groups)
print("Consistency of labels in tranformed test dataset= %f" %metric_transf_test.consistency())
Consistency of labels in tranformed test dataset= 1.000000
print("Consistency of labels in original test dataset= %f" %metric_orig_test.consistency())
Consistency of labels in original test dataset= 0.738798
def check_algorithm_success():
    """Transformed dataset consistency should be greater than original dataset."""
    assert metric_transf_test.consistency() > metric_orig_test.consistency(), \
        "Transformed dataset consistency should be greater than original dataset."

check_algorithm_success()
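The imports at the top also bring in LogisticRegression, StandardScaler, and accuracy_score, which the walkthrough above never uses. A natural follow-up, sketched here as one possible next step rather than part of the original tutorial, is to train a downstream classifier on the fair representation and score it against the original test labels:

# Sketch: downstream classifier trained on the transformed (fair) training data
scaler = StandardScaler()
X_train = scaler.fit_transform(dataset_transf_train.features)
y_train = dataset_transf_train.labels.ravel()
X_test = scaler.transform(dataset_transf_test.features)
y_test = dataset_orig_test.labels.ravel()

clf = LogisticRegression()
clf.fit(X_train, y_train)
print("Test accuracy against original labels = %f"
      % accuracy_score(y_test, clf.predict(X_test)))

Any accuracy drop relative to a model trained on the raw features reflects the fairness/utility trade-off controlled by Ax, Ay, and Az.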
References:
[1] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork, "Learning Fair Representations," International Conference on Machine Learning, 2013.