The algorithm provides a way to construct balanced rankings of candidates based on precalculated scores.
The notebook is organized as follows:
Ranking algorithms are at the core of search and recommendation systems used in, among other areas, hiring, college admissions, and web search. Algorithmic bias in such systems can create or amplify unacceptable discrimination based on race, gender, or other attributes, so a way of constructing "fair" rankings is needed.
The algorithms presented here take as input a dataset whose items have already been ranked (scored), possibly by some other machine learning model, and reorder the items to satisfy specified fairness requirements, expressed in the form of a distribution over the protected attributes.
Let's say we have a collection of 10 balls: 5 red and 5 blue. We assign each of them a score from 0 to 100; however, for some reason the red balls have a higher average score than the blue ones.
import numpy as np
import pandas as pd
balls = pd.DataFrame([['r', 100],['r', 90],['r', 85],['r', 70],['b', 70],['b', 60],['b', 50],['b', 40],['b', 30],['r', 20]],
columns=['color', 'score'])
print(f"Red mean score: {np.mean(balls[balls['color'] == 'r']['score'])}")
print(f"Blue mean score: {np.mean(balls[balls['color'] == 'b']['score'])}")
Red mean score: 73.0
Blue mean score: 50.0
Now, say we want to take the 6 best balls based on their score. In real life, the need for this limited "sub-ranking" may arise for many reasons. For example, the landing page of our ball-selling website may only have space for 6 items.
This is similar to the case presented in the original paper, where the algorithm is used to rank job-seekers for recruiters on LinkedIn.
balls.sort_values(by='score', ascending=False)[:6]
| | color | score |
|---|---|---|
| 0 | r | 100 |
| 1 | r | 90 |
| 2 | r | 85 |
| 3 | r | 70 |
| 4 | b | 70 |
| 5 | b | 60 |
Of course, we notice that we have only 2 blue balls in this ranking, and they occupy the last two positions! On one hand, this seems fair in terms of scores. However, we may want a more equal representation of the different colors in the ranking. The possible motivations for that in real life are clear.
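To quantify the imbalance, we can recreate the `balls` DataFrame from above and measure the share of blue balls in the top 6:

```python
import pandas as pd

balls = pd.DataFrame([['r', 100], ['r', 90], ['r', 85], ['r', 70], ['b', 70],
                      ['b', 60], ['b', 50], ['b', 40], ['b', 30], ['r', 20]],
                     columns=['color', 'score'])
top6 = balls.sort_values(by='score', ascending=False)[:6]
# Blue balls are half of the collection but only a third of the top 6.
blue_share = (top6['color'] == 'b').mean()
print(blue_share)  # about 0.33
```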
In our case, the color is the protected attribute, and the red balls are a privileged class.
We may get a fairer ranking using the `DeterministicReranking` class:
from aif360.datasets import RegressionDataset
from aif360.algorithms.postprocessing.deterministic_reranking import DeterministicReranking
# Initialize a RegressionDataset with color as the protected attribute and red as the privileged class.
balls_ds = RegressionDataset(df=balls, dep_var_name='score', protected_attribute_names=['color'], privileged_classes=[['r']])
# keep the un-normalized scores for clarity; RegressionDataset normalizes them to be from 0 to 1.
balls_ds.labels = np.transpose([balls['score']])
To initialize the DeterministicReranking class, we need to pass the protected attribute values of the privileged and unprivileged groups. We do that using a list of dictionaries for each group, with the name of the attribute as the key.
# The RegressionDataset class automatically maps the privileged attribute value (color='r') to 1 and the unprivileged one to 0.
dr = DeterministicReranking(unprivileged_groups=[{'color': 0}], privileged_groups=[{'color': 1}])
Use the `fit_predict` method to get the ranking. The arguments are:

- `dataset` is the dataset to construct a ranking from;
- `rec_size` is the size of the ranking we need (in our case 6);
- `target_prop` is the desired proportion of items of each group in the ranking, in the form of a list with one entry per group. We need equal representation, so we pass [0.5, 0.5];
- `rerank_type` is the algorithm to use; for further details, skip to section 4 of this notebook. For now, we stick to the default Constrained;
- `renormalize_scores` will normalize the scores in the result so that the lowest is 0 and the highest is 1. Default is False.

fair_ranking = dr.fit_predict(dataset=balls_ds, rec_size=6, target_prop=[0.5, 0.5], rerank_type='Constrained')
fair_ranking.convert_to_dataframe()[0]
| | color | score |
|---|---|---|
| 0 | 1.0 | 100.0 |
| 4 | 0.0 | 70.0 |
| 1 | 1.0 | 90.0 |
| 5 | 0.0 | 60.0 |
| 2 | 1.0 | 85.0 |
| 6 | 0.0 | 50.0 |
In this result, the proportions are equal. Additionally, as the algorithm goes through positions in the ranking one by one, checking each time for violations of fairness, the items belonging to the unprivileged group (blue balls) aren't all at the "bottom" of the ranking.
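A quick sanity check on this property: taking the colors of the fair ranking from the table above, top to bottom, every prefix of the list stays within the 50/50 target, up to rounding:

```python
import math

# Colors of the fair ranking above, top to bottom (1 = red, 0 = blue
# in the dataframe; written here as 'r'/'b' for readability).
fair_colors = ['r', 'b', 'r', 'b', 'r', 'b']
target = {'r': 0.5, 'b': 0.5}

# At every prefix length k, each group's count stays between
# floor(p * k) and ceil(p * k), where p is its target proportion.
for k in range(1, len(fair_colors) + 1):
    prefix = fair_colors[:k]
    for group, p in target.items():
        count = prefix.count(group)
        assert math.floor(p * k) <= count <= math.ceil(p * k)
print("Every prefix respects the 50/50 target.")
```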
We can complicate the task a little by adding another protected attribute: let's call it `size`, which can be "large" or "small".
balls['size'] = ['l', 's', 'l', 's', 'l', 's', 'l', 's', 'l', 'l']
balls_ds = RegressionDataset(df=balls, dep_var_name='score',
protected_attribute_names=['color', 'size'],
privileged_classes=[['r'], ['l']])
balls_ds.convert_to_dataframe()[0]
| | color | size | score |
|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0000 |
| 1 | 1.0 | 0.0 | 0.8750 |
| 2 | 1.0 | 1.0 | 0.8125 |
| 3 | 1.0 | 0.0 | 0.6250 |
| 4 | 0.0 | 1.0 | 0.6250 |
| 5 | 0.0 | 0.0 | 0.5000 |
| 6 | 0.0 | 1.0 | 0.3750 |
| 7 | 0.0 | 0.0 | 0.2500 |
| 8 | 0.0 | 1.0 | 0.1250 |
| 9 | 1.0 | 1.0 | 0.0000 |
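The 0-to-1 scores in the table come from `RegressionDataset`'s normalization; assuming simple min-max scaling, we can reproduce them from the original scores:

```python
import pandas as pd

# Original scores, in the same row order as the table above.
scores = pd.Series([100, 90, 85, 70, 70, 60, 50, 40, 30, 20])
# Min-max scaling: (x - min) / (max - min), mapping 20 -> 0.0 and 100 -> 1.0.
normalized = (scores - scores.min()) / (scores.max() - scores.min())
print(normalized.tolist())
# [1.0, 0.875, 0.8125, 0.625, 0.625, 0.5, 0.375, 0.25, 0.125, 0.0]
```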
Let's say that we want all possible combinations of the values to be represented equally in the output. We can do so by specifying more than one privileged/unprivileged group (the distinction between privileged and unprivileged groups isn't relevant in this algorithm).
dr = DeterministicReranking(unprivileged_groups=[{'color': 0, 'size': 0}, {'color': 0, 'size': 1}, {'color': 1, 'size': 0}],
privileged_groups=[{'color': 1, 'size': 1}])
fair_ranking = dr.fit_predict(dataset=balls_ds, rec_size=6, target_prop=[0.25, 0.25, 0.25, 0.25])
fair_ranking.convert_to_dataframe()[0]
| | color | size | score |
|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0000 |
| 1 | 1.0 | 0.0 | 0.8750 |
| 4 | 0.0 | 1.0 | 0.6250 |
| 5 | 0.0 | 0.0 | 0.5000 |
| 2 | 1.0 | 1.0 | 0.8125 |
| 3 | 1.0 | 0.0 | 0.6250 |
The `predict` and `fit_predict` methods of the algorithm include a `rerank_type` parameter. It refers to the different algorithms described in the original paper: Greedy, Conservative, Relaxed, and Constrained. The difference between them lies in how, given a desired proportion of groups, they choose the next element in the ranking.
Before getting into the details, we need to formalize the properties we want our ranking to satisfy. For every position $k \leq |\tau_r|$ and every group $a \in A$:

$$count_k(a) \geq \lfloor p_a \cdot k \rfloor$$

$$count_k(a) \leq \lceil p_a \cdot k \rceil$$

where $|\tau_r|$ is the size of the ranking, $A$ is the set of groups, $count_k(a)$ is the number of elements of group $a$ in the first $k$ elements of the ranking, and $p_a$ is the desired proportion of elements belonging to group $a$ in the ranking.
We refer to these properties as the minimum and the maximum representation constraints, respectively.
The demand for the properties to be satisfied at every $k$ comes from the observation that the position of an element in the ranking can have a significant impact on the response of the user.
The Greedy variant works as follows:
The Conservative and Relaxed variants give preference to underrepresented groups:
The Constrained variant differs significantly from both of the previous ones:
Experimental results (see reference) show that all algorithms exhibit similar performance in terms of both list utility and fairness. The Greedy algorithm generates ranked lists with higher utility, but doesn't strictly adhere to the fairness constraints; among the rest, Constrained shows the best utility.
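To make the ideas above concrete, here is a minimal greedy-style sketch (an illustration under the representation constraints above, not the AIF360 implementation): at each position it first serves any group that has fallen below its minimum representation, and otherwise picks the highest-scoring remaining item among groups still below their maximum.

```python
import math

def greedy_fair_rerank(items, target_prop, rec_size):
    """Illustrative greedy fair re-ranking (hypothetical helper).

    items: list of (group, score) pairs; target_prop: group -> proportion.
    """
    # Each group's scores, best first.
    pools = {g: sorted((s for grp, s in items if grp == g), reverse=True)
             for g in target_prop}
    picked = {g: 0 for g in target_prop}
    ranking = []
    for k in range(1, rec_size + 1):
        # Groups that fell below their minimum representation go first.
        allowed = [g for g, p in target_prop.items()
                   if picked[g] < math.floor(p * k) and pools[g]]
        if not allowed:
            # Otherwise, any group still below its maximum is eligible.
            allowed = [g for g, p in target_prop.items()
                       if picked[g] < math.ceil(p * k) and pools[g]]
        # Among eligible groups, take the highest-scoring next item.
        best = max(allowed, key=lambda g: pools[g][0])
        ranking.append((best, pools[best].pop(0)))
        picked[best] += 1
    return ranking

balls = [('r', 100), ('r', 90), ('r', 85), ('r', 70), ('r', 20),
         ('b', 70), ('b', 60), ('b', 50), ('b', 40), ('b', 30)]
fair = greedy_fair_rerank(balls, {'r': 0.5, 'b': 0.5}, 6)
print(fair)  # [('r', 100), ('b', 70), ('r', 90), ('b', 60), ('r', 85), ('b', 50)]
```

On this toy data the sketch happens to produce the same ordering as the Constrained run earlier in the notebook; the variants only diverge on inputs where the score order and the representation constraints pull in different directions.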