Gender Bias and Under-Representation in Natural Language Processing Across Human Languages

Cited by: 18
Authors
Chen, Yan [1]
Mahoney, Christopher [1]
Grasso, Isabella [1]
Wali, Esma [1]
Matthews, Abigail [2]
Middleton, Thomas [1]
Njie, Mariama [3]
Matthews, Jeanna [1]
Affiliations
[1] Clarkson Univ, Potsdam, NY 13676 USA
[2] Univ Wisconsin Madison, Madison, WI USA
[3] Iona Coll, New York, NY USA
Source
AIES '21: PROCEEDINGS OF THE 2021 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY | 2021
Keywords
bias; gender bias; natural language processing
DOI
10.1145/3461702.3462530
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems that make crucial recommendations about our future world. However, these systems reflect a wide range of biases, from gender bias to a bias in which voices they represent. In this paper, a team including speakers of nine languages (Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof) reports and analyzes measurements of gender bias in the Wikipedia corpora for these nine languages. In the process, we also document how our work exposes crucial gaps in the NLP pipeline for many languages. Despite substantial investments in multilingual support, the modern NLP pipeline still systematically and dramatically under-represents the majority of human voices in the NLP-guided decisions that are shaping our collective future. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to eight other languages, including languages such as Spanish, Arabic, German, French, and Urdu that have grammatically gendered nouns, with distinct feminine, masculine, and neuter profession words. We compare these gender bias measurements across the Wikipedia corpora in different languages, as well as across some corpora of more traditional literature.
Pages: 24-34
Page count: 11
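The abstract refers to profession-level and corpus-level gender bias metrics, extended from English to grammatically gendered languages, but this record does not give their formulas. The sketch below therefore rests on an assumption: a common embedding-based formulation in which a profession's bias is its mean cosine similarity to male words minus its mean cosine similarity to female words, and the corpus-level score is the mean absolute profession-level bias. For gendered languages it hypothetically averages the vectors of a profession's masculine and feminine forms. The function names, word lists, and 2-d toy vectors are all illustrative and are not the paper's method or data.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise mean; here it merges the gendered forms of one profession."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def profession_bias(emb: Dict[str, Vector], forms: List[str],
                    male_words: List[str], female_words: List[str]) -> float:
    """Assumed metric: mean cosine to male words minus mean cosine to female words.
    Positive leans male, negative leans female, 0.0 is balanced."""
    vec = mean_vector([emb[w] for w in forms])
    male = sum(cosine(vec, emb[w]) for w in male_words) / len(male_words)
    female = sum(cosine(vec, emb[w]) for w in female_words) / len(female_words)
    return male - female

def corpus_bias(emb: Dict[str, Vector], professions: Dict[str, List[str]],
                male_words: List[str], female_words: List[str]) -> float:
    """Assumed corpus-level score: mean absolute profession-level bias."""
    scores = [abs(profession_bias(emb, forms, male_words, female_words))
              for forms in professions.values()]
    return sum(scores) / len(scores)

# Hypothetical 2-d toy embeddings; real ones would be trained per language on a corpus.
toy_embeddings = {
    "he": [1.0, 0.0], "she": [0.0, 1.0],
    "engineer": [1.0, 0.0], "nurse": [0.0, 1.0],
    "médico": [1.0, 0.0], "médica": [0.0, 1.0],  # masculine/feminine forms
}
MALE_WORDS, FEMALE_WORDS = ["he"], ["she"]
toy_professions = {
    "engineer": ["engineer"],
    "nurse": ["nurse"],
    "doctor": ["médico", "médica"],  # gendered forms averaged into one vector
}

for name, forms in toy_professions.items():
    print(name, round(profession_bias(toy_embeddings, forms, MALE_WORDS, FEMALE_WORDS), 4))
print("corpus", round(corpus_bias(toy_embeddings, toy_professions, MALE_WORDS, FEMALE_WORDS), 4))
```

Averaging the masculine and feminine forms is only one possible way to handle gendered profession words; scoring each form separately and reporting both is an equally plausible alternative, and the toy vectors are chosen only so that the results are easy to check by hand.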