Gender Bias and Under-Representation in Natural Language Processing Across Human Languages

Cited by: 18
Authors
Chen, Yan [1]
Mahoney, Christopher [1]
Grasso, Isabella [1]
Wali, Esma [1]
Matthews, Abigail [2]
Middleton, Thomas [1]
Njie, Mariama [3]
Matthews, Jeanna [1]
Affiliations
[1] Clarkson Univ, Potsdam, NY 13676 USA
[2] Univ Wisconsin Madison, Madison, WI USA
[3] Iona Coll, New York, NY USA
Source
AIES '21: PROCEEDINGS OF THE 2021 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY | 2021
Keywords
bias; gender bias; natural language processing
DOI
10.1145/3461702.3462530
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems making crucial recommendations about our future world. However, these systems reflect a wide range of biases, from gender bias to a bias in which voices they represent. In this paper, a team including speakers of 9 languages - Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof - reports and analyzes measurements of gender bias in the Wikipedia corpora for these 9 languages. In the process, we also document how our work exposes crucial gaps in the NLP pipeline for many languages. Despite substantial investments in multilingual support, the modern NLP pipeline still systematically and dramatically under-represents the majority of human voices in the NLP-guided decisions that are shaping our collective future. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to 8 other languages, including languages such as Spanish, Arabic, German, French, and Urdu that have grammatically gendered nouns, including different feminine, masculine, and neuter profession words. We compare these gender bias measurements across the Wikipedia corpora in different languages as well as across some corpora of more traditional literature.
Pages: 24 - 34
Number of pages: 11
Related papers
50 in total
  • [31] Construction and Application of Text Classification Model under Natural Language Processing
    Sun, Zhongnuo
    Gao, Pan
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON MODELING, NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING, CMNM 2024, 2024, : 226 - 231
  • [32] Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups
    Thompson, Hale M.
    Sharma, Brihat
    Bhalla, Sameer
    Boley, Randy
    McCluskey, Connor
    Dligach, Dmitriy
    Churpek, Matthew M.
    Karnik, Niranjan S.
    Afshar, Majid
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2021, 28 (11) : 2393 - 2403
  • [33] Gender Stereotypes in Natural Language: Word Embeddings Show Robust Consistency Across Child and Adult Language Corpora of More Than 65 Million Words
    Charlesworth, Tessa E. S.
    Yang, Victor
    Mann, Thomas C.
    Kurdi, Benedek
    Banaji, Mahzarin R.
    PSYCHOLOGICAL SCIENCE, 2021, 32 (02) : 218 - 240
  • [34] A Survey on Challenges and Advances in Natural Language Processing with a Focus on Legal Informatics and Low-Resource Languages
    Krasadakis, Panteleimon
    Sakkopoulos, Evangelos
    Verykios, Vassilios S.
    ELECTRONICS, 2024, 13 (03)
  • [35] Evaluating the Availability of Resources, Research Hubs, and Financial Supports for Nigerian Languages Natural Language Processing Research
    Asubiaro, Toluwase
    CANADIAN JOURNAL OF INFORMATION AND LIBRARY SCIENCE-REVUE CANADIENNE DES SCIENCES DE L INFORMATION ET DE BIBLIOTHECONOMIE, 2021, 43 (03) : 269 - 290
  • [36] Using Natural Language Processing and Machine Learning to Replace Human Content Coders
    Wang, Yilei
    Tian, Jingyuan
    Yazar, Yagizhan
    Ones, Deniz S.
    Landers, Richard N.
    PSYCHOLOGICAL METHODS, 2022, : 1148 - 1163
  • [37] Textinator: an Internationalized Tool for Annotation and Human Evaluation in Natural Language Processing and Generation
    Kalpakchi, Dmytro
    Boye, Johan
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 856 - 866
  • [38] Global Research on Natural Disasters and Human Health: a Mapping Study Using Natural Language Processing Techniques
    Ye, Xin
    Lin, Hugo
    CURRENT ENVIRONMENTAL HEALTH REPORTS, 2024, 11 (01) : 61 - 70
  • [39] The dynamics of natural language processing and text mining under emerging artificial intelligence techniques
    Dimlo, U. M. Fernandes
    Rupesh, V.
    Raju, Yeligeti
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024, 15 (09) : 4512 - 4526