Gender Bias and Under-Representation in Natural Language Processing Across Human Languages

Cited by: 18
Authors
Chen, Yan [1]
Mahoney, Christopher [1]
Grasso, Isabella [1]
Wali, Esma [1]
Matthews, Abigail [2]
Middleton, Thomas [1]
Njie, Mariama [3]
Matthews, Jeanna [1]
Affiliations
[1] Clarkson Univ, Potsdam, NY 13676 USA
[2] Univ Wisconsin Madison, Madison, WI USA
[3] Iona Coll, New York, NY USA
Source
AIES '21: PROCEEDINGS OF THE 2021 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY | 2021
Keywords
bias; gender bias; natural language processing
DOI
10.1145/3461702.3462530
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems making crucial recommendations about our future world. However, these systems reflect a wide range of biases, from gender bias to a bias in which voices they represent. In this paper, a team including speakers of 9 languages - Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof - reports and analyzes measurements of gender bias in the Wikipedia corpora for these 9 languages. In the process, we also document how our work exposes crucial gaps in the NLP-pipeline for many languages. Despite substantial investments in multilingual support, the modern NLP-pipeline still systematically and dramatically under-represents the majority of human voices in the NLP-guided decisions that are shaping our collective future. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to 8 other languages, including languages like Spanish, Arabic, German, French and Urdu that have grammatically gendered nouns including different feminine, masculine and neuter profession words. We compare these gender bias measurements across the Wikipedia corpora in different languages as well as across some corpora of more traditional literature.
Pages: 24-34
Page count: 11