Biases in Large Language Models: Origins, Inventory, and Discussion

Cited by: 77
Authors
Navigli, Roberto [1]
Conia, Simone [1]
Ross, Bjorn [2]
Affiliations
[1] Sapienza Univ Rome, Via Ariosto 25, Rome, Italy
[2] Univ Edinburgh, 10 Crichton St, Edinburgh, Midlothian, Scotland
Source
ACM JOURNAL OF DATA AND INFORMATION QUALITY | 2023 / Vol. 15 / Issue 02
Keywords
Bias in NLP; language models; HEALTH;
DOI
10.1145/3597307
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In this article, we introduce and discuss the pervasive issue of bias in the large language models that are currently at the core of mainstream approaches to Natural Language Processing (NLP). We first introduce data selection bias, that is, the bias caused by the choice of texts that make up a training corpus. Then, we survey the different types of social bias evidenced in the text generated by language models trained on such corpora, ranging from gender to age, from sexual orientation to ethnicity, and from religion to culture. We conclude with directions focused on measuring, reducing, and tackling the aforementioned types of bias.
Pages: 21