Bias Unveiled: Enhancing Fairness in German Word Embeddings with Large Language Models

Cited by: 0
Authors
Saeid, Yasser [1 ]
Kopinski, Thomas [1 ]
Affiliations
[1] South Westphalia Univ Appl Sci, Meschede, Germany
Source
SPEECH AND COMPUTER, SPECOM 2024, PT II | 2025, Vol. 15300
Keywords
Stereotypical biases; Gender bias; Machine learning systems; Word embedding algorithms; Bias amplification; Embedding bias; Origins of bias; Specific training documents; Efficacy; Abating bias; Methodology; Insights; Matrix; German Wikipedia corpora; Empirical endeavor; Precision; Sources of bias; Equanimity; Impartiality; LLM
DOI
10.1007/978-3-031-78014-1_23
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Gender bias in word embedding algorithms has garnered significant attention due to its integration into machine learning systems and its potential to reinforce stereotypes. Despite ongoing efforts, the root causes of bias in trained word embeddings, particularly for German, remain unclear. This research presents a novel approach to the problem, opening new avenues of investigation. Our methodology involves a comprehensive analysis of word embeddings, focusing on how manipulations of the training data affect the resulting biases. By examining how biases originate within specific training documents, we identify subsets whose removal effectively mitigates these effects. Additionally, we explore both conventional methods and new approaches using large language models (LLMs) to ensure that generated text adheres to fairness criteria. Using few-shot prompting, we generate gender-bias-free text, employing GPT-4 as a benchmark to evaluate the fairness of this process for German. Our method traces the intricate origins of bias within word embeddings and is validated through rigorous application to German Wikipedia corpora. Our findings demonstrate the efficacy of the method: removing certain document subsets significantly diminishes bias in the resulting embeddings, as detailed in the analysis "Unlocking the Limits: Document Removal with an Upper Bound" in the experimental results section. Ultimately, this research offers a practical framework for uncovering and mitigating bias in word embedding algorithms during training, advancing machine learning systems that prioritize fairness and impartiality by revealing and addressing latent sources of bias.
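The pipeline sketched in the abstract has two measurable ingredients: a bias score over the trained embeddings, and a search for training documents whose removal lowers that score. The following is a minimal sketch, not the authors' implementation: it assumes gensim Word2Vec embeddings, uses small invented German word lists in place of the paper's target/attribute sets, a WEAT-style association gap in place of its bias metric, and a brute-force greedy loop in place of its document-selection procedure.

```python
# Minimal sketch (not the paper's code): WEAT-style gender-bias scoring on
# German embeddings plus a greedy document-removal loop.
# Assumes gensim is installed and `docs` is a list of tokenized German documents.
import numpy as np
from gensim.models import Word2Vec

# Illustrative word sets; the paper's actual lists are not given in the abstract.
MALE   = ["mann", "er", "vater", "junge"]
FEMALE = ["frau", "sie", "mutter", "mädchen"]
CAREER = ["karriere", "beruf", "gehalt", "chef"]
FAMILY = ["familie", "haushalt", "kinder", "ehe"]

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def assoc(wv, w, A, B):
    """Mean cosine similarity of w to set A minus set B (WEAT-style);
    assumes at least one word of each set is in the vocabulary."""
    a = [cos(wv[w], wv[x]) for x in A if x in wv]
    b = [cos(wv[w], wv[x]) for x in B if x in wv]
    return np.mean(a) - np.mean(b)

def bias_score(wv):
    """Career-vs-family gap in male/female association; 0 = no measured bias."""
    c = [assoc(wv, w, MALE, FEMALE) for w in CAREER if w in wv]
    f = [assoc(wv, w, MALE, FEMALE) for w in FAMILY if w in wv]
    return float(np.mean(c) - np.mean(f))

def embed(docs):
    """Retrain Word2Vec on the (possibly pruned) corpus and return its vectors."""
    return Word2Vec(sentences=docs, vector_size=100, window=5,
                    min_count=2, seed=0, workers=1).wv

def greedy_removal(docs, budget):
    """Drop up to `budget` documents, each time the one whose removal most
    reduces |bias|. Retraining per candidate is brute force and only feasible
    for small corpora; it merely illustrates the idea of removing documents
    under an upper bound."""
    kept = list(docs)
    for _ in range(budget):
        base = abs(bias_score(embed(kept)))
        scores = [(abs(bias_score(embed(kept[:i] + kept[i + 1:]))), i)
                  for i in range(len(kept))]
        best, i = min(scores)
        if best >= base:  # no single document reduces the bias further
            break
        kept.pop(i)
    return kept
```

The abstract's other ingredient, few-shot prompting of GPT-4 to produce gender-bias-free German text, could look roughly as follows; the prompt wording and the few-shot pair are invented for illustration, and the call uses the OpenAI Python SDK's chat-completions interface.

```python
# Sketch of few-shot debiasing with GPT-4; the examples are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FEW_SHOT = [
    {"role": "system",
     "content": "Formuliere deutsche Sätze so um, dass sie frei von "
                "Geschlechterstereotypen sind, ohne den Inhalt zu verändern."},
    {"role": "user", "content": "Jeder Lehrer kennt seine Schüler."},
    {"role": "assistant", "content": "Alle Lehrkräfte kennen ihre Klasse."},
]

def debias(sentence: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=FEW_SHOT + [{"role": "user", "content": sentence}],
        temperature=0.0,
    )
    return resp.choices[0].message.content
```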
Pages: 308 - 325
Page count: 18
Related Papers
12 items in total
  • [1] Assessing the Risk of Bias in Randomized Clinical Trials With Large Language Models
    Lai, Honghao; Ge, Long; Sun, Mingyao; Pan, Bei; Huang, Jiajie; Hou, Liangying; Yang, Qiuyu; Liu, Jiayi; Liu, Jianing; Ye, Ziying; Xia, Danni; Zhao, Weilong; Wang, Xiaoman; Liu, Ming; Talukdar, Jhalok Ronjan; Tian, Jinhui; Yang, Kehu; Estill, Janne
    JAMA NETWORK OPEN, 2024, 7 (05): E2412687
  • [2] Leveraging Large Language Models for Enhancing Literature-Based Discovery
    Taleb, Ikbal; Navaz, Alramzana Nujum; Serhani, Mohamed Adel
    BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (11)
  • [3] MarIA and BETO are sexist: evaluating gender bias in large language models for Spanish
    Garrido-Munoz, Ismael; Martinez-Santiago, Fernando; Montejo-Raez, Arturo
    LANGUAGE RESOURCES AND EVALUATION, 2024, 58 (04): 1387 - 1417
  • [4] Fairness in AI-Driven Oncology: Investigating Racial and Gender Biases in Large Language Models
    Agrawal, Anjali
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (09)
  • [5] GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models
    Tang, Kunsheng; Zhou, Wenbo; Zhang, Jie; Liu, Aishan; Deng, Gelei; Li, Shuai; Qi, Peigui; Zhang, Weiming; Zhang, Tianwei; Yu, Nenghai
    PROCEEDINGS OF THE 2024 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2024, 2024: 1196 - 1210
  • [6] Bias of AI-generated content: an examination of news produced by large language models
    Fang, Xiao; Che, Shangkun; Mao, Minjia; Zhang, Hongzhe; Zhao, Ming; Zhao, Xiaohang
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [7] Enhancing Drug Safety Documentation Search Capabilities with Large Language Models: A User-Centric Approach
    Painter, Jeffery E.; Mahaux, Olivia; Vanini, Marco; Kara, Vijay; Roshan, Christie; Karwowski, Marcin; Chalamalasetti, Venkateswara Rao; Bate, Andrew
    2023 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE, CSCI 2023, 2023: 49 - 56
  • [8] Enhancing Supply Chain Efficiency through Retrieve-Augmented Generation Approach in Large Language Models
    Zhu, Beilei; Vuppalapati, Chandrasekar
    2024 IEEE 10TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND MACHINE LEARNING APPLICATIONS, BIGDATASERVICE 2024, 2024: 117 - 121
  • [9] Vox Populi, Vox AI? Using Large Language Models to Estimate German Vote Choice
    von der Heyde, Leah; Haensch, Anna-Carolina; Wenz, Alexander
    SOCIAL SCIENCE COMPUTER REVIEW, 2025
  • [10] Cost, Usability, Credibility, Fairness, Accountability, Transparency, and Explainability Framework for Safe and Effective Large Language Models in Medical Education: Narrative Review and Qualitative Study
    Quttainah, Majdi; Mishra, Vinaytosh; Madakam, Somayya; Lurie, Yotam; Mark, Shlomo
    JMIR AI, 2024, 3