Gender Bias and Under-Representation in Natural Language Processing Across Human Languages

Cited by: 18

Authors:

Chen, Yan [1]

Mahoney, Christopher [1]

Grasso, Isabella [1]

Wali, Esma [1]

Matthews, Abigail [2]

Middleton, Thomas [1]

Njie, Mariama [3]

Matthews, Jeanna [1]

Affiliations:

[1] Clarkson Univ, Potsdam, NY 13676 USA

[2] Univ Wisconsin Madison, Madison, WI USA

[3] Iona Coll, New York, NY USA

Source:

AIES '21: PROCEEDINGS OF THE 2021 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY | 2021

Keywords:

bias; gender bias; natural language processing

DOI:

10.1145/3461702.3462530

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems making crucial recommendations about our future world. However, these systems reflect a wide range of biases, from gender bias to a bias in which voices they represent. In this paper, a team including speakers of 9 languages - Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof - reports and analyzes measurements of gender bias in the Wikipedia corpora for these 9 languages. In the process, we also document how our work exposes crucial gaps in the NLP-pipeline for many languages. Despite substantial investments in multilingual support, the modern NLP-pipeline still systematically and dramatically under-represents the majority of human voices in the NLP-guided decisions that are shaping our collective future. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to 8 other languages, including languages like Spanish, Arabic, German, French and Urdu that have grammatically gendered nouns including different feminine, masculine and neuter profession words. We compare these gender bias measurements across the Wikipedia corpora in different languages as well as across some corpora of more traditional literature.

Citation

Pages: 24-34

Number of pages: 11
