The algorithm provides a way to construct balanced rankings of candidates based on precalculated scores.
The notebook is organized as follows:
Ranking algorithms are at the core of search and recommendation systems used in, among other areas, hiring, college admissions, and web search. Algorithmic bias in such systems can create or amplify unacceptable discrimination based on race, gender, or other attributes, so a way of constructing "fair" rankings is needed.
The algorithms presented here take as input a dataset whose items have already been ranked (scored), possibly by some other machine learning model, and reorder the items to satisfy specified fairness requirements, expressed in the form of a distribution over the protected attributes.
Let's say we have a collection of 10 balls: 5 red and 5 blue. We assign each of them a score from 0 to 100; however, for some reason the red balls have a higher average score than the blue ones.
import numpy as np
import pandas as pd
balls = pd.DataFrame([['r', 100],['r', 90],['r', 85],['r', 70],['b', 70],['b', 60],['b', 50],['b', 40],['b', 30],['r', 20]],
columns=['color', 'score'])
print(f"Red mean score: {np.mean(balls[balls['color'] == 'r']['score'])}")
print(f"Blue mean score: {np.mean(balls[balls['color'] == 'b']['score'])}")
Red mean score: 73.0
Blue mean score: 50.0
Now, say we want to take the 6 best balls based on their score. In real life, the need for this limited "sub-ranking" may arise for many reasons. For example, the landing page of our ball-selling website may only have space for 6 items.
This is similar to the case presented in the original paper, where the algorithm is used to rank job-seekers for recruiters on LinkedIn.
balls.sort_values(by='score', ascending=False)[:6]
| | color | score |
|---|---|---|
| 0 | r | 100 |
| 1 | r | 90 |
| 2 | r | 85 |
| 3 | r | 70 |
| 4 | b | 70 |
| 5 | b | 60 |
Of course, we notice that we have only 2 blue balls in this ranking, and they occupy the last two positions! On one hand, this seems fair in terms of scores. However, we may want a more equal representation of the different colors in the ranking. The possible motivations for that in real life are clear.
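To quantify the imbalance, we can recreate the `balls` DataFrame from above and measure the share of blue balls in the top 6:

```python
import pandas as pd

balls = pd.DataFrame([['r', 100], ['r', 90], ['r', 85], ['r', 70], ['b', 70],
                      ['b', 60], ['b', 50], ['b', 40], ['b', 30], ['r', 20]],
                     columns=['color', 'score'])
top6 = balls.sort_values(by='score', ascending=False)[:6]
# Blue balls are half of the collection but only a third of the top 6.
blue_share = (top6['color'] == 'b').mean()
print(blue_share)  # about 0.33
```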
In our case, the color is the protected attribute, and the red balls are a privileged class.
We may get a fairer ranking using the `DeterministicReranking` class:
from aif360.datasets import RegressionDataset
from aif360.algorithms.postprocessing.deterministic_reranking import DeterministicReranking
# Initialize a RegressionDataset with color as the protected attribute and red as the privileged class.
balls_ds = RegressionDataset(df=balls, dep_var_name='score', protected_attribute_names=['color'], privileged_classes=[['r']])
# keep the un-normalized scores for clarity; RegressionDataset normalizes them to be from 0 to 1.
balls_ds.labels = np.transpose([balls['score']])
To initialize the DeterministicReranking class, we need to pass the protected attribute values of the privileged and unprivileged groups. We do that using a list of dictionaries for each group, with the name of the attribute as the key.
# The RegressionDataset class automatically maps the privileged attribute value (color='r') to 1 and the unprivileged one to 0.
dr = DeterministicReranking(unprivileged_groups=[{'color': 0}], privileged_groups=[{'color': 1}])
Use the `fit_predict` method to get the ranking. The arguments are:

- `dataset` is the dataset to construct a ranking from;
- `rec_size` is the size of the ranking we need (in our case 6);
- `target_prop` is the desired proportion of items of each group in the ranking, in the form of a list with one entry per group. We need equal representation, so we pass [0.5, 0.5];
- `rerank_type` is the algorithm to use; for further details, skip to section 4 of this notebook. For now, we stick to the default Constrained;
- `renormalize_scores` will normalize the scores in the result so that the lowest is 0 and the highest is 1. Default is False.

fair_ranking = dr.fit_predict(dataset=balls_ds, rec_size=6, target_prop=[0.5, 0.5], rerank_type='Constrained')
fair_ranking.convert_to_dataframe()[0]
| | color | score |
|---|---|---|
| 0 | 1.0 | 100.0 |
| 4 | 0.0 | 70.0 |
| 1 | 1.0 | 90.0 |
| 5 | 0.0 | 60.0 |
| 2 | 1.0 | 85.0 |
| 6 | 0.0 | 50.0 |
In this result, the proportions are equal. Additionally, as the algorithm goes through positions in the ranking one by one, checking each time for violations of fairness, the items belonging to the unprivileged group (blue balls) aren't all at the "bottom" of the ranking.
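A quick sanity check on this property: taking the colors of the fair ranking from the table above, top to bottom, every prefix of the list stays within the 50/50 target, up to rounding:

```python
import math

# Colors of the fair ranking above, top to bottom (1 = red, 0 = blue
# in the dataframe; written here as 'r'/'b' for readability).
fair_colors = ['r', 'b', 'r', 'b', 'r', 'b']
target = {'r': 0.5, 'b': 0.5}

# At every prefix length k, each group's count stays between
# floor(p * k) and ceil(p * k), where p is its target proportion.
for k in range(1, len(fair_colors) + 1):
    prefix = fair_colors[:k]
    for group, p in target.items():
        count = prefix.count(group)
        assert math.floor(p * k) <= count <= math.ceil(p * k)
print("Every prefix respects the 50/50 target.")
```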
We can complicate the task a little by adding another protected attribute: let's call it `size`, which can be "large" or "small".
balls['size'] = ['l', 's', 'l', 's', 'l', 's', 'l', 's', 'l', 'l']
balls_ds = RegressionDataset(df=balls, dep_var_name='score',
protected_attribute_names=['color', 'size'],
privileged_classes=[['r'], ['l']])
balls_ds.convert_to_dataframe()[0]
| | color | size | score |
|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0000 |
| 1 | 1.0 | 0.0 | 0.8750 |
| 2 | 1.0 | 1.0 | 0.8125 |
| 3 | 1.0 | 0.0 | 0.6250 |
| 4 | 0.0 | 1.0 | 0.6250 |
| 5 | 0.0 | 0.0 | 0.5000 |
| 6 | 0.0 | 1.0 | 0.3750 |
| 7 | 0.0 | 0.0 | 0.2500 |
| 8 | 0.0 | 1.0 | 0.1250 |
| 9 | 1.0 | 1.0 | 0.0000 |
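The 0-to-1 scores in the table come from `RegressionDataset`'s normalization; assuming simple min-max scaling, we can reproduce them from the original scores:

```python
import pandas as pd

# Original scores, in the same row order as the table above.
scores = pd.Series([100, 90, 85, 70, 70, 60, 50, 40, 30, 20])
# Min-max scaling: (x - min) / (max - min), mapping 20 -> 0.0 and 100 -> 1.0.
normalized = (scores - scores.min()) / (scores.max() - scores.min())
print(normalized.tolist())
# [1.0, 0.875, 0.8125, 0.625, 0.625, 0.5, 0.375, 0.25, 0.125, 0.0]
```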
Let's say that we want all possible combinations of the values to be represented equally in the output. We can do so by specifying more than one privileged/unprivileged group (the distinction between privileged and unprivileged groups isn't relevant in this algorithm).
dr = DeterministicReranking(unprivileged_groups=[{'color': 0, 'size': 0}, {'color': 0, 'size': 1}, {'color': 1, 'size': 0}],
privileged_groups=[{'color': 1, 'size': 1}])
fair_ranking = dr.fit_predict(dataset=balls_ds, rec_size=6, target_prop=[0.25, 0.25, 0.25, 0.25])
fair_ranking.convert_to_dataframe()[0]
| | color | size | score |
|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0000 |
| 1 | 1.0 | 0.0 | 0.8750 |
| 4 | 0.0 | 1.0 | 0.6250 |
| 5 | 0.0 | 0.0 | 0.5000 |
| 2 | 1.0 | 1.0 | 0.8125 |
| 3 | 1.0 | 0.0 | 0.6250 |
The `predict` and `fit_predict` methods of the algorithm include a `rerank_type` parameter. It refers to the different algorithms described in the original paper: Greedy, Conservative, Relaxed, and Constrained. The difference between them lies in how, given a desired proportion of groups, they choose the next element in the ranking.
Before getting into the details, we need to formalize the properties we want our ranking to satisfy. For every position $k \leq |\tau_r|$ and every group $a \in A$:

$$count_k(a) \geq \lfloor p_a \cdot k \rfloor$$

$$count_k(a) \leq \lceil p_a \cdot k \rceil$$

where $|\tau_r|$ is the size of the ranking, $A$ is the set of groups, $count_k(a)$ is the number of elements of group $a$ in the first $k$ elements of the ranking, and $p_a$ is the desired proportion of elements belonging to group $a$ in the ranking.
We refer to these properties as the minimum and the maximum representation constraints, respectively.
The demand for the properties to be satisfied at every $k$ comes from the observation that the position of an element in the ranking can have a significant impact on the response of the user.
The Greedy variant works as follows:
The Conservative and Relaxed variants give preference to underrepresented groups:
The Constrained variant differs significantly from both of the previous ones:
Experimental results (see reference) show that all algorithms exhibit similar performance in terms of both list utility and fairness. The Greedy algorithm generates ranked lists with higher utility, but doesn't strictly adhere to the fairness constraints; among the rest, Constrained shows the best utility.
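To make the ideas above concrete, here is a minimal greedy-style sketch (an illustration under the representation constraints above, not the AIF360 implementation): at each position it first serves any group that has fallen below its minimum representation, and otherwise picks the highest-scoring remaining item among groups still below their maximum.

```python
import math

def greedy_fair_rerank(items, target_prop, rec_size):
    """Illustrative greedy fair re-ranking (hypothetical helper).

    items: list of (group, score) pairs; target_prop: group -> proportion.
    """
    # Each group's scores, best first.
    pools = {g: sorted((s for grp, s in items if grp == g), reverse=True)
             for g in target_prop}
    picked = {g: 0 for g in target_prop}
    ranking = []
    for k in range(1, rec_size + 1):
        # Groups that fell below their minimum representation go first.
        allowed = [g for g, p in target_prop.items()
                   if picked[g] < math.floor(p * k) and pools[g]]
        if not allowed:
            # Otherwise, any group still below its maximum is eligible.
            allowed = [g for g, p in target_prop.items()
                       if picked[g] < math.ceil(p * k) and pools[g]]
        # Among eligible groups, take the highest-scoring next item.
        best = max(allowed, key=lambda g: pools[g][0])
        ranking.append((best, pools[best].pop(0)))
        picked[best] += 1
    return ranking

balls = [('r', 100), ('r', 90), ('r', 85), ('r', 70), ('r', 20),
         ('b', 70), ('b', 60), ('b', 50), ('b', 40), ('b', 30)]
fair = greedy_fair_rerank(balls, {'r': 0.5, 'b': 0.5}, 6)
print(fair)  # [('r', 100), ('b', 70), ('r', 90), ('b', 60), ('r', 85), ('b', 50)]
```

On this toy data the sketch happens to produce the same ordering as the Constrained run earlier in the notebook; the variants only diverge on inputs where the score order and the representation constraints pull in different directions.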