Gender Bias and Under-Representation in Natural Language Processing Across Human Languages

Cited by: 18

Authors:

Chen, Yan [1]

Mahoney, Christopher [1]

Grasso, Isabella [1]

Wali, Esma [1]

Matthews, Abigail [2]

Middleton, Thomas [1]

Njie, Mariama [3]

Matthews, Jeanna [1]

Affiliations:

[1] Clarkson Univ, Potsdam, NY 13676 USA

[2] Univ Wisconsin Madison, Madison, WI USA

[3] Iona Coll, New York, NY USA

Source:

AIES '21: PROCEEDINGS OF THE 2021 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY | 2021

Keywords:

bias; gender bias; natural language processing

DOI:

10.1145/3461702.3462530

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems making crucial recommendations about our future world. However, these systems reflect a wide range of biases, from gender bias to a bias in which voices they represent. In this paper, a team including speakers of 9 languages - Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof - reports and analyzes measurements of gender bias in the Wikipedia corpora for these 9 languages. In the process, we also document how our work exposes crucial gaps in the NLP-pipeline for many languages. Despite substantial investments in multilingual support, the modern NLP-pipeline still systematically and dramatically under-represents the majority of human voices in the NLP-guided decisions that are shaping our collective future. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to 8 other languages, including languages like Spanish, Arabic, German, French and Urdu that have grammatically gendered nouns including different feminine, masculine and neuter profession words. We compare these gender bias measurements across the Wikipedia corpora in different languages as well as across some corpora of more traditional literature.


Pages: 24-34

Page count: 11

50 records in total

[1] Natural language processing for under-resourced languages: Developing a Welsh natural language toolkit
Cunliffe, Daniel
Vlachidis, Andreas
Williams, Daniel
Tudhope, Douglas
COMPUTER SPEECH AND LANGUAGE, 2022, 72
[2] Gender Bias in Natural Language Processing and Computer Vision: A Comparative Survey
Bartl, Marion
Mandal, Abhishek
Leavy, Susan
Little, Suzanne
ACM COMPUTING SURVEYS, 2025, 57 (06)
[3] Natural Language Processing in Translation of Relational Languages
Dudas, Adam
Skrinarova, Jarmila
IPSI BGD TRANSACTIONS ON INTERNET RESEARCH, 2023, 19 (01): 17-23
[4] Intelligent Approaches for Natural Language Processing for Indic Languages
Kumar, Rashi
Sahula, Vineet
2021 IEEE INTERNATIONAL SYMPOSIUM ON SMART ELECTRONIC SYSTEMS (ISES 2021), 2021: 331-334
[5] A KNOWLEDGE REPRESENTATION LANGUAGE FOR NATURAL LANGUAGE PROCESSING, SIMULATION AND REASONING
McShane, Marjorie
Nirenburg, Sergei
INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING, 2012, 6 (01): 3-23
[6] Editorial: Bias, Subjectivity and Perspectives in Natural Language Processing
Basile, Valerio
Caselli, Tommaso
Balahur, Alexandra
Ku, Lun-Wei
FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2022, 5
[7] Natural language processing as human language engineering
Di Felippo, Ariani
Dias-da-Silva, Bento Carlos
CALIDOSCOPIO, 2009, 7 (03): 183-191
[8] Uncovering gender dimensions in energy policy using Natural Language Processing
Carroll, P.
Singh, B.
Mangina, E.
RENEWABLE & SUSTAINABLE ENERGY REVIEWS, 2024, 193
[9] Designing a Uniform Meaning Representation for Natural Language Processing
Jens E. L. Van Gysel
Meagan Vigus
Jayeol Chun
Kenneth Lai
Sarah Moeller
Jiarui Yao
Tim O’Gorman
Andrew Cowell
William Croft
Chu-Ren Huang
Jan Hajič
James H. Martin
Stephan Oepen
Martha Palmer
James Pustejovsky
Rosa Vallejos
Nianwen Xue
KI - Künstliche Intelligenz, 2021, 35: 343-360
[10] Designing a Uniform Meaning Representation for Natural Language Processing
Van Gysel, Jens E. L.
Vigus, Meagan
Chun, Jayeol
Lai, Kenneth
Moeller, Sarah
Yao, Jiarui
O'Gorman, Tim
Cowell, Andrew
Croft, William
Huang, Chu-Ren
Hajic, Jan
Martin, James H.
Oepen, Stephan
Palmer, Martha
Pustejovsky, James
Vallejos, Rosa
Xue, Nianwen
KUNSTLICHE INTELLIGENZ, 2021, 35 (3-4): 343-360
