This project aims to identify, measure, and mitigate social biases, such as gender-, race-, or profession-related stereotypes, in lightweight transformer models through hands-on fine-tuning and evaluation on targeted NLP tasks. More specifically, the project should implement a four-step methodology, defined as follows:
- Choose a lightweight pre-trained transformer model (e.g., DistilBERT, ALBERT, RoBERTa-base) suitable for local fine-tuning and evaluation.
- Evaluate the presence and extent of social bias (e.g., gender, racial, or occupational stereotypes) using dedicated benchmark datasets such as StereoSet, examining both quantitative metrics and qualitative outputs (see the first sketch after this list).
- Apply a bias mitigation technique, such as fine-tuning on curated counter-stereotypical data, integrating adapter layers, or employing contrastive learning, while keeping the solution computationally efficient and transparent (see the fine-tuning sketch after this list).
- Re-assess the model using the same benchmark(s) to measure improvements: compare pre- and post-intervention results, discuss trade-offs (e.g., performance vs. fairness), and visualize the impact of the intervention (see the plotting sketch after this list).
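Below, a minimal sketch of steps 1 and 2, assuming `distilbert-base-uncased` as the lightweight model and a StereoSet-style masked-LM probe. The pseudo-log-likelihood scoring and the probe pair are illustrative stand-ins, not the official StereoSet metric:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")
model.eval()

def sentence_score(sentence: str) -> float:
    """Pseudo-log-likelihood: mask each token in turn and sum its log-probability."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip the [CLS] and [SEP] special tokens
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# The model "prefers" the stereotype on this pair if the first score is higher.
stereo = "The nurse said she would be late."
anti = "The nurse said he would be late."
print(sentence_score(stereo) > sentence_score(anti))
```

Aggregating this comparison over many such pairs yields a stereotype preference rate, where 0.5 would indicate no systematic preference.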
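For step 3, a minimal sketch of the counter-stereotypical fine-tuning option, implemented as masked-LM training on gender-swapped sentences; the two-sentence corpus and the hyperparameters are placeholders, not a vetted recipe:

```python
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")

# Counter-stereotypical sentences: role/pronoun pairings that invert the stereotype.
texts = [
    "The nurse said he would be late.",
    "The engineer said she would fix it.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=64),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="debiased-distilbert",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
    # Standard MLM collator: randomly masks 15% of tokens as the training objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```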
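Finally, for step 4, a sketch of the pre/post comparison plot; the two scores are hypothetical placeholders for the preference rates measured before and after mitigation:

```python
import matplotlib.pyplot as plt

# Hypothetical stereotype preference rates; 0.5 would mean no systematic preference.
scores = {"pre-intervention": 0.62, "post-intervention": 0.54}

plt.bar(list(scores.keys()), list(scores.values()), color=["tab:red", "tab:green"])
plt.axhline(0.5, linestyle="--", color="gray", label="unbiased reference (0.5)")
plt.ylabel("Stereotype preference rate")
plt.title("Bias probe before vs. after mitigation")
plt.legend()
plt.savefig("bias_comparison.png")
```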
References:
- Nadeem, M., Bethke, A., & Reddy, S. (2021). StereoSet: Measuring stereotypical bias in pretrained language models. In Proceedings of ACL-IJCNLP 2021 (ACL Anthology: 2021.acl-long.416; arXiv preprint arXiv:2004.09456).
- StereoSet dataset card, Hugging Face.
- Zhang, Y., & Zhou, F. (2024). Bias mitigation in fine-tuning pre-trained models for enhanced fairness and efficiency. arXiv preprint arXiv:2403.00625.
- Fu, C. L., Chen, Z. C., Lee, Y. R., & Lee, H. Y. (2022). AdapterBias: Parameter-efficient token-dependent representation shift for adapters in NLP tasks. arXiv preprint arXiv:2205.00305.
- Park, K., Oh, S., Kim, D., & Kim, J. (2024). Contrastive learning as a polarizer: Mitigating gender bias by fair and biased sentences. In Findings of the Association for Computational Linguistics: NAACL 2024 (pp. 4725–4736).