Persistent Anti-Muslim Bias in Large Language Models

被引:140
作者
Abid, Abubakar [1 ]
Farooqi, Maheen [2 ]
Zou, James [1 ]
机构
[1] Stanford Univ, Stanford, CA 94305 USA
[2] McMaster Univ, Hamilton, ON, Canada
来源
AIES '21: PROCEEDINGS OF THE 2021 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY | 2021年
关键词
machine learning; language models; bias; stereotypes; ethics;
D O I
10.1145/3461702.3462624
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
It has been observed that large-scale language models capture undesirable societal biases, e.g. relating to race and gender; yet religious bias has been relatively unexplored. We demonstrate that GPT-3, a state-of-the-art contextual language model, captures persistent Muslim-violence bias. We probe GPT-3 in various ways, including prompt completion, analogical reasoning, and story generation, to understand this anti-Muslim bias, demonstrating that it appears consistently and creatively in different uses of the model and that it is severe even compared to biases about other religious groups. For instance, "Muslim" is analogized to "terrorist" in 23% of test cases, while "Jewish" is mapped to its most common stereotype, "money," in 5% of test cases. We quantify the positive distraction needed to overcome this bias with adversarial text prompts, and find that use of the most positive 6 adjectives reduces violent completions for "Muslims" from 66% to 20%, but which is still higher than for other religious groups.
引用
收藏
页码:298 / 306
页数:9
相关论文
共 50 条
  • [31] Modeling Global and local Codon Bias with Deep Language Models
    Fujimoto, M. Stanley
    Bodily, Paul M.
    Lyman, Cole A.
    Jacobsen, J. Andrew
    Snell, Quinn
    Clement, Mark J.
    2017 IEEE 17TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2017, : 151 - 156
  • [32] Systematic testing of three Language Models reveals low language accuracy, absence of response stability, and a yes- response bias
    Dentella, Vittoria
    Guenther, Fritz
    Leivada, Evelina
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2023, 120 (51)
  • [33] Enhancing Bias Assessment for Complex Term Groups in Language Embedding Models: Quantitative Comparison of Methods
    Gray, Magnus
    Milanova, Mariofanna
    Wu, Leihong
    JMIR MEDICAL INFORMATICS, 2024, 12
  • [34] Automated Research Review Support Using Machine Learning, Large Language Models, and Natural Language Processing
    Pendyala, Vishnu S.
    Kamdar, Karnavee
    Mulchandani, Kapil
    ELECTRONICS, 2025, 14 (02):
  • [35] Large Language Models and Simple, Stupid Bugs
    Jesse, Kevin
    Ahmed, Toufique
    Devanbu, Premkumar T.
    Morgan, Emily
    2023 IEEE/ACM 20TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES, MSR, 2023, : 563 - 575
  • [36] Detection avoidance techniques for large language models
    Schneider, Sinclair
    Steuber, Florian
    Schneider, Joao A. G.
    Rodosek, Gabi Dreo
    DATA & POLICY, 2025, 7
  • [37] Perspective: Large Language Models in Applied Mechanics
    Brodnik, Neal R.
    Carton, Samuel
    Muir, Caelin
    Ghosh, Satanu
    Downey, Doug
    Echlin, McLean P.
    Pollock, Tresa M.
    Daly, Samantha
    JOURNAL OF APPLIED MECHANICS-TRANSACTIONS OF THE ASME, 2023, 90 (10):
  • [38] Debiasing large language models: research opportunities
    Yogarajan, Vithya
    Dobbie, Gillian
    Keegan, Te Taka
    JOURNAL OF THE ROYAL SOCIETY OF NEW ZEALAND, 2025, 55 (02) : 372 - 395
  • [39] Level Generation Through Large Language Models
    Todd, Graham
    Earle, Sam
    Nasir, Muhammad Umair
    Green, Michael Cerny
    Togelius, Julian
    PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON THE FOUNDATIONS OF DIGITAL GAMES, FDG 2023, 2023,
  • [40] Large language models and the treaty interpretation game
    Nelson, Jack Wright
    CAMBRIDGE INTERNATIONAL LAW JOURNAL, 2023, 12 (02) : 305 - 327