* عادلة (Adila): a feminine Arabic given name meaning just and fair.
2025, COIN, A Probabilistic Greedy Attempt to be Fair in Neural Team Recommendation. (Under Review)
2023, BIAS-ECIR, Bootless Application of Greedy Re-ranking Algorithms in Fair Neural Team Formation.
Team recommendation aims to automate forming teams of experts who can collaborate and successfully solve tasks. While state-of-the-art methods can efficiently analyze massive collections of experts to recommend effective collaborative teams, they largely ignore fairness in the recommended experts; our experiments show that they are biased toward popular and male experts. With Adila, we aim to mitigate such biases for fair team recommendation. Fairness breeds innovation and increases teams' success by enabling a stronger sense of community, reducing conflict, and stimulating more creative thinking.
We have studied the application of state-of-the-art deterministic greedy re-ranking methods [Geyik et al., KDD'19] as well as probabilistic greedy re-ranking methods [Zehlike et al., IP&M'22] to mitigate popularity bias and gender bias, based on the equal opportunity and demographic parity notions of fairness, for state-of-the-art neural team formation methods from OpeNTF. Our experiments show that:
- Although deterministic re-ranking algorithms mitigate either popularity or gender bias, they hurt the efficacy of teams, i.e., they yield higher fairness metrics but lower utility metrics (successful teams).
- Probabilistic greedy re-ranking algorithms mitigate popularity bias significantly while maintaining utility. In terms of gender, however, such algorithms fail due to extreme bias in the dataset.
Currently, we are investigating:
- Other fairness factors like demographic attributes, including age and race;
- Developing machine learning-based models using Learning-to-Rank (L2R) techniques to mitigate bias, as opposed to deterministic greedy algorithms.
Adila needs Python >= 3.8 and installs required packages lazily and on-demand, i.e., as it goes through the steps of the pipeline, it installs a package if the package or the correct version is not available in the environment. For further details, refer to requirements.txt and pkgmgr.py. To set up an environment locally:
#python3.8
python -m venv adila_venv
source adila_venv/bin/activate #non-windows
#adila_venv\Scripts\activate #windows
pip install --upgrade pip
pip install -r requirements.txt
cd src
python main.py data.fpred=../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/f0.test.pred \ # the recommended teams for the test set of size |test|×|experts|, to be reranked for fairness
data.fteamsvecs=../output/dblp/toy.dblp.v12.json/teamsvecs.pkl \ # the sparse 1-hot representation of all teams of size |dataset|×|skills| and |dataset|×|experts|
data.fgender=../output/dblp/toy.dblp.v12.json/females.csv \ # column indices of females (minority labels) in teamsvecs.pkl
data.fsplits=../output/dblp/toy.dblp.v12.json/splits.f3.r0.85.pkl \ # the splits information, including the row ids of teams in the test and train sets
data.output=../output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000 \ # output folder for the reranked version and the respective eval files
"fair.algorithm=[fa-ir]" \ # fairness-aware reranking algorithm
"fair.notion=[eo]" \ # notion of fairness, equal opportunity
"fair.attribute=[gender]" \ # protected/sensitive attribute
"eval.fair_metrics=[ndkl,skew]" \ # metrics to measure fairness of the original (before) vs. reranked (after) versions of recommendations
"eval.utility_metrics.trec=[P_topk,ndcg_cut_topk]" \ # metrics to measure accuracy of the original (before) vs. reranked (after) versions of recommendations
eval.utility_metrics.topk='2,5,10'

The above run loads member recommendations by the random model in OpeNTF for the test teams of the tiny toy example dataset toy.dblp.v12.json from dblp. It then reranks the members of each team using the fairness algorithm fa-ir to provide a fair distribution of experts based on their gender and mitigate the bias against the minority group, i.e., females. For a step-by-step guide and output trace, see our colab script.
Adila needs preprocessed information about the teams in the form of a sparse matrix representation (data.fteamsvecs) and neural team formation prediction file(s) (data.fpred), obtained from OpeNTF:
.
├── data
│   └── {dblp, imdb, uspt}
└── output
    └── dblp
        └── toy.dblp.v12.json
            ├── females.csv
            ├── teamsvecs.pkl
            ├── splits.f3.r0.85.pkl
            └── splits.f3.r0.85
                └── rnd.b1000
                    ├── f0.test.pred
                    ├── f0.test.pred.eval.mean.csv
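To make the expected inputs concrete, the following is a minimal, hedged sketch of loading them; it assumes teamsvecs.pkl and splits.f3.r0.85.pkl are plain pickles and that f0.test.pred is torch-loadable, which may differ from OpeNTF's actual serialization:

```python
# Hedged sketch of loading Adila's inputs; the pickle/torch formats are assumptions based on
# the descriptions above, not a specification of OpeNTF's files.
import pickle
import torch  # assumption: OpeNTF is PyTorch-based and f0.test.pred is torch-loadable

root = '../output/dblp/toy.dblp.v12.json'

with open(f'{root}/teamsvecs.pkl', 'rb') as f:
    teamsvecs = pickle.load(f)   # assumed: dict of sparse matrices, e.g., 'skill' and 'member'

with open(f'{root}/splits.f3.r0.85.pkl', 'rb') as f:
    splits = pickle.load(f)      # assumed: row ids of the train/test teams per fold

pred = torch.load(f'{root}/splits.f3.r0.85/rnd.b1000/f0.test.pred')  # |test| x |experts| scores

print(type(teamsvecs), type(splits), getattr(pred, 'shape', None))
```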
Adila has three main steps:

Based on the distribution of experts over teams, which follows a power law (long tail) as shown in the figure, we label experts in the tail as nonpopular and those in the head as popular. To find the cutoff between the head and the tail, we either use the average number of teams per expert over the entire dataset (avg) or an equal-area-under-the-curve split (auc). The result is the set of expert ids of popular experts as the minority group and is saved in {data.output}/adila/popularity.{avg,auc}/labels.csv like ./output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/popularity.avg/labels.csv.
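For intuition, a minimal sketch of the avg-based labeling is shown below; the teamsvecs.pkl layout (a pickled dict whose 'member' entry is the |dataset|×|experts| sparse matrix) and the avg_popularity_labels helper are assumptions for illustration, not Adila's actual code:

```python
# Minimal sketch of avg-based popularity labeling (assumption: teamsvecs.pkl is a pickled
# dict whose 'member' entry is the |dataset| x |experts| sparse occurrence matrix described
# above; check OpeNTF's output for the exact format).
import pickle
import numpy as np

def avg_popularity_labels(fteamsvecs: str) -> np.ndarray:
    with open(fteamsvecs, 'rb') as f:
        teamsvecs = pickle.load(f)
    member = teamsvecs['member']                        # sparse |dataset| x |experts|
    teams_per_expert = np.asarray(member.sum(axis=0)).ravel()
    cutoff = teams_per_expert.mean()                    # avg #teams per expert over the dataset
    return teams_per_expert > cutoff                    # True => popular (head), False => nonpopular (tail)

# popular = avg_popularity_labels('../output/dblp/toy.dblp.v12.json/teamsvecs.pkl')
# np.flatnonzero(popular)  # column ids of popular (minority) experts, cf. labels.csv
```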
As seen in the above figures for the training datasets `imdb`, `dblp`, and `uspt` in team recommendation, gender distributions are highly biased toward the majority `males` and unfair to the `minority` `females`. We obtain gender labels for experts either from the original dataset or via `https://gender-api.com/` and `https://genderize.io/`, located at [`./output/dblp/toy.dblp.v12.json/females.csv`](./output/dblp/toy.dblp.v12.json/females.csv).
We treat popularity as the protected attribute, but the protected group is the set of non-popular experts, who are the majority, as opposed to the minority popular experts.
We treat gender as the protected attribute, and the protected group is the set of female experts, who are the minority, as opposed to the majority male experts.
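Both attributes ultimately reduce to a boolean protected-group flag per expert. As a hedged sketch for the gender case (assuming females.csv lists one expert column index per line, as the run arguments above describe), such a mask might be built as follows; the helper name is hypothetical:

```python
# Hedged sketch: build a boolean protected-group mask from females.csv
# (assumption: one expert column index per line, no header).
import numpy as np

def gender_protected_mask(fgender: str, n_experts: int) -> np.ndarray:
    female_idx = np.loadtxt(fgender, dtype=int, ndmin=1)   # column indices of female experts
    mask = np.zeros(n_experts, dtype=bool)
    mask[female_idx] = True                                # True => protected (female, minority)
    return mask

# mask = gender_protected_mask('../output/dblp/toy.dblp.v12.json/females.csv',
#                              n_experts=teamsvecs['member'].shape[1])
```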
We apply rerankers including `det_greedy`, `det_cons`, `det_relaxed`, and `fa-ir` to mitigate popularity or gender bias. The reranker needs a cutoff fair.k_max.
The reranked predictions are saved in {data.output}/adila/{fair.attribute: gender, popularity}/{fair.notion: dp, eo}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred like ./output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/gender/dp/f0.test.pred.det_cons.5.rerank.pred.
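For intuition, reranking one team's recommendation with FA*IR via the fairsearchcore library could look roughly like the sketch below; the k, p, and alpha values and the synthetic ranking are illustrative, and Adila's own wrapper and parameter choices may differ:

```python
# Illustrative FA*IR reranking of one team's recommended experts with fairsearchcore;
# parameter values and the synthetic ranking are for illustration only.
import fairsearchcore as fsc
from fairsearchcore.models import FairScoreDoc

k = 10       # cutoff (cf. fair.k_max); fairsearchcore expects k in [10, 400]
p = 0.5      # target minimum proportion of protected experts within every top-i prefix
alpha = 0.1  # significance level of the statistical test

fair = fsc.Fair(k, p, alpha)

# (expert_id, score, is_protected), sorted by score; here the protected (e.g., female)
# experts sit at the bottom of the list to simulate a biased recommendation.
ranking = [FairScoreDoc(eid, 1.0 - 0.05 * rank, rank >= 6) for rank, eid in enumerate(range(10))]

print(fair.is_fair(ranking))           # does every prefix contain enough protected experts?
reranked = fair.re_rank(ranking)       # FA*IR reordering toward the required proportions
print([doc.id for doc in reranked])
```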
We evaluate fairness and utility metrics before and after applying rerankers on team predictions to see whether re-ranking algorithms improve the fairness in team recommendations while maintaining their accuracy.
The results of the fairness metrics before and after reranking will be stored in {data.output}/adila/{fair.attribute: gender, popularity}/{fair.notion: dp, eo}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred.eval.fair.{instance, mean}.csv like ./output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/gender/dp/f0.test.pred.det_cons.5.rerank.pred.eval.fair.mean.csv.
The results of the utility metrics before and after reranking will be stored in {data.output}/adila/{fair.attribute: gender, popularity}/{fair.notion: dp, eo}/{data.fpred}.{fair.algorithm}.{fair.k_max}.rerank.pred.eval.utility.{instance, mean}.csv like ./output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/adila/gender/dp/f0.test.pred.det_cons.5.rerank.pred.eval.utility.mean.csv.
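For reference, the fairness metrics named above (ndkl and skew) are commonly formulated as in Geyik et al. (KDD'19); the sketch below follows that formulation, with the eps smoothing being our assumption rather than Adila's exact implementation:

```python
# Sketch of skew@k and ndkl for one ranked list of experts, following Geyik et al. (KDD'19);
# the eps smoothing is our assumption, not necessarily Adila's exact implementation.
import numpy as np

def skew_at_k(protected: np.ndarray, desired_p: float, k: int, eps: float = 1e-12) -> float:
    """log of (observed share of protected experts in the top-k) over (desired share)."""
    observed_p = protected[:k].mean()
    return float(np.log((observed_p + eps) / (desired_p + eps)))

def ndkl(protected: np.ndarray, desired_p: float, eps: float = 1e-12) -> float:
    """Position-discounted KL divergence between top-i distributions and the desired one."""
    n = len(protected)
    weights = 1.0 / np.log2(np.arange(2, n + 2))           # 1/log2(i+1) for i = 1..n
    d = np.array([desired_p, 1.0 - desired_p]) + eps       # desired (protected, unprotected)
    total = 0.0
    for i in range(1, n + 1):
        p_i = protected[:i].mean()
        d_i = np.array([p_i, 1.0 - p_i]) + eps             # distribution within the top-i
        total += weights[i - 1] * np.sum(d_i * np.log(d_i / d))
    return float(total / weights.sum())

# ranking = np.array of booleans, True where the ranked expert is protected (e.g., female)
# skew_at_k(ranking, desired_p=0.5, k=10); ndkl(ranking, desired_p=0.5)
```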
After a successful run of all steps, {data.output}, e.g., ./output/dblp/toy.dblp.v12.json/splits.f3.r0.85/rnd.b1000/, contains:
.
├── f0.test.pred
├── f0.test.pred.eval.instance.csv
├── f0.test.pred.eval.mean.csv
├── adila
│   ├── gender
│   │   ├── dp
│   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred
│   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred.eval.fair.instance.csv
│   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred.eval.fair.mean.csv
│   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred.eval.utility.instance.csv
│   │   │   └── f0.test.pred.fa-ir.10.5.rerank.pred.eval.utility.mean.csv
│   │   ├── eo
│   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred
│   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred.eval.fair.instance.csv
│   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred.eval.fair.mean.csv
│   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred.eval.utility.instance.csv
│   │   │   ├── f0.test.pred.fa-ir.10.5.rerank.pred.eval.utility.mean.csv
│   │   │   └── ratios.pkl
│   │   ├── labels.csv
│   │   └── stats.pkl
│   ├── popularity.auc
│   │   ├── dp
│   │   │   ├── f0.test.pred.fa-ir.auc.10.5.rerank.pred.eval.fair.instance.csv
│   │   │   ├── f0.test.pred.fa-ir.auc.10.5.rerank.pred.eval.fair.mean.csv
│   │   │   ├── f0.test.pred.fa-ir.auc.10.5.rerank.pred.eval.utility.instance.csv
│   │   │   ├── f0.test.pred.fa-ir.auc.10.5.rerank.pred.eval.utility.mean.csv
│   │   │   └── f0.test.pred.fa-ir.auc.10.5.rerank.pred
│   │   ├── eo
│   │   │   ├── f0.test.pred.fa-ir.auc.10.5.rerank.pred
│   │   │   ├── f0.test.pred.fa-ir.auc.10.5.rerank.pred.eval.fair.instance.csv
│   │   │   ├── f0.test.pred.fa-ir.auc.10.5.rerank.pred.eval.fair.mean.csv
│   │   │   ├── f0.test.pred.fa-ir.auc.10.5.rerank.pred.eval.utility.instance.csv
│   │   │   ├── f0.test.pred.fa-ir.auc.10.5.rerank.pred.eval.utility.mean.csv
│   │   │   └── ratios.pkl
│   │   ├── labels.csv
│   │   └── stats.pkl

We benefit from `reranking` and `fairsearchcore`, among other libraries, and would like to thank the authors of these libraries and helpful resources.
©2025. This work is licensed under a CC BY-NC-SA 4.0 license.




