Bias Unveiled: Enhancing Fairness in German Word Embeddings with Large Language Models

Cited by: 0
Authors
Saeid, Yasser [1 ]
Kopinski, Thomas [1 ]
Affiliations
[1] South Westphalia Univ Appl Sci, Meschede, Germany
Source
SPEECH AND COMPUTER, SPECOM 2024, PT II | 2025, Vol. 15300
Keywords
Stereotypical biases; Gender bias; Machine learning systems; Word embedding algorithms; Bias amplification; Embedding bias; Origins of bias; Specific training documents; Efficacy; Abating bias; Methodology; Insights; Matrix; German Wikipedia corpora; Empirical endeavor; Precision; Sources of bias; Equanimity; Impartiality; LLM
DOI
10.1007/978-3-031-78014-1_23
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Gender bias in word embedding algorithms has garnered significant attention due to its integration into machine learning systems and its potential to reinforce stereotypes. Despite ongoing efforts, the root causes of bias in trained word embeddings, particularly for German, remain unclear. This research presents a novel approach to the problem, opening new avenues of investigation. Our methodology involves a comprehensive analysis of word embeddings, focusing on how manipulations of the training data affect the resulting biases. By examining how biases originate within specific training documents, we identify subsets whose removal effectively mitigates these effects. Additionally, we explore both conventional methods and new approaches using large language models (LLMs) to ensure that generated text adheres to fairness criteria. Using few-shot prompting, we generate gender-bias-free text, employing GPT-4 as a benchmark to evaluate the fairness of this process for German. Our method traces the intricate origins of bias within word embeddings and is validated through rigorous application to German Wikipedia corpora. Our findings demonstrate the efficacy of the method: removing certain document subsets significantly diminishes bias in the resulting embeddings, as detailed in the analysis "Unlocking the Limits: Document Removal with an Upper Bound" in the experimental results section. Ultimately, this research offers a practical framework for uncovering and mitigating bias in word embedding algorithms during training, advancing machine learning systems that prioritize fairness and impartiality by revealing and addressing latent sources of bias.
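The pipeline sketched in the abstract has two measurable ingredients: a bias score over the trained embeddings, and a search for training documents whose removal lowers that score. The following is a minimal sketch, not the authors' implementation: it assumes gensim Word2Vec embeddings, uses small invented German word lists in place of the paper's target/attribute sets, a WEAT-style association gap in place of its bias metric, and a brute-force greedy loop in place of its document-selection procedure.

```python
# Minimal sketch (not the paper's code): WEAT-style gender-bias scoring on
# German embeddings plus a greedy document-removal loop.
# Assumes gensim is installed and `docs` is a list of tokenized German documents.
import numpy as np
from gensim.models import Word2Vec

# Illustrative word sets; the paper's actual lists are not given in the abstract.
MALE   = ["mann", "er", "vater", "junge"]
FEMALE = ["frau", "sie", "mutter", "mädchen"]
CAREER = ["karriere", "beruf", "gehalt", "chef"]
FAMILY = ["familie", "haushalt", "kinder", "ehe"]

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def assoc(wv, w, A, B):
    """Mean cosine similarity of w to set A minus set B (WEAT-style);
    assumes at least one word of each set is in the vocabulary."""
    a = [cos(wv[w], wv[x]) for x in A if x in wv]
    b = [cos(wv[w], wv[x]) for x in B if x in wv]
    return np.mean(a) - np.mean(b)

def bias_score(wv):
    """Career-vs-family gap in male/female association; 0 = no measured bias."""
    c = [assoc(wv, w, MALE, FEMALE) for w in CAREER if w in wv]
    f = [assoc(wv, w, MALE, FEMALE) for w in FAMILY if w in wv]
    return float(np.mean(c) - np.mean(f))

def embed(docs):
    """Retrain Word2Vec on the (possibly pruned) corpus and return its vectors."""
    return Word2Vec(sentences=docs, vector_size=100, window=5,
                    min_count=2, seed=0, workers=1).wv

def greedy_removal(docs, budget):
    """Drop up to `budget` documents, each time the one whose removal most
    reduces |bias|. Retraining per candidate is brute force and only feasible
    for small corpora; it merely illustrates the idea of removing documents
    under an upper bound."""
    kept = list(docs)
    for _ in range(budget):
        base = abs(bias_score(embed(kept)))
        scores = [(abs(bias_score(embed(kept[:i] + kept[i + 1:]))), i)
                  for i in range(len(kept))]
        best, i = min(scores)
        if best >= base:  # no single document reduces the bias further
            break
        kept.pop(i)
    return kept
```

The abstract's other ingredient, few-shot prompting of GPT-4 to produce gender-bias-free German text, could look roughly as follows; the prompt wording and the few-shot pair are invented for illustration, and the call uses the OpenAI Python SDK's chat-completions interface.

```python
# Sketch of few-shot debiasing with GPT-4; the examples are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FEW_SHOT = [
    {"role": "system",
     "content": "Formuliere deutsche Sätze so um, dass sie frei von "
                "Geschlechterstereotypen sind, ohne den Inhalt zu verändern."},
    {"role": "user", "content": "Jeder Lehrer kennt seine Schüler."},
    {"role": "assistant", "content": "Alle Lehrkräfte kennen ihre Klasse."},
]

def debias(sentence: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=FEW_SHOT + [{"role": "user", "content": sentence}],
        temperature=0.0,
    )
    return resp.choices[0].message.content
```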
Pages: 308 - 325
Page count: 18
Related Papers
12 items in total
  • [1] Assessing the Risk of Bias in Randomized Clinical Trials With Large Language Models
    Lai, Honghao; Ge, Long; Sun, Mingyao; Pan, Bei; Huang, Jiajie; Hou, Liangying; Yang, Qiuyu; Liu, Jiayi; Liu, Jianing; Ye, Ziying; Xia, Danni; Zhao, Weilong; Wang, Xiaoman; Liu, Ming; Talukdar, Jhalok Ronjan; Tian, Jinhui; Yang, Kehu; Estill, Janne
    JAMA NETWORK OPEN, 2024, 7 (05): E2412687
  • [2] Leveraging Large Language Models for Enhancing Literature-Based Discovery
    Taleb, Ikbal; Navaz, Alramzana Nujum; Serhani, Mohamed Adel
    BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (11)
  • [3] MarIA and BETO are sexist: evaluating gender bias in large language models for Spanish
    Garrido-Munoz, Ismael; Martinez-Santiago, Fernando; Montejo-Raez, Arturo
    LANGUAGE RESOURCES AND EVALUATION, 2024, 58 (04): 1387 - 1417
  • [4] Fairness in AI-Driven Oncology: Investigating Racial and Gender Biases in Large Language Models
    Agrawal, Anjali
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (09)
  • [5] GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models
    Tang, Kunsheng; Zhou, Wenbo; Zhang, Jie; Liu, Aishan; Deng, Gelei; Li, Shuai; Qi, Peigui; Zhang, Weiming; Zhang, Tianwei; Yu, Nenghai
    PROCEEDINGS OF THE 2024 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2024, 2024: 1196 - 1210
  • [6] Bias of AI-generated content: an examination of news produced by large language models
    Fang, Xiao; Che, Shangkun; Mao, Minjia; Zhang, Hongzhe; Zhao, Ming; Zhao, Xiaohang
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [7] Enhancing Drug Safety Documentation Search Capabilities with Large Language Models: A User-Centric Approach
    Painter, Jeffery E.; Mahaux, Olivia; Vanini, Marco; Kara, Vijay; Roshan, Christie; Karwowski, Marcin; Chalamalasetti, Venkateswara Rao; Bate, Andrew
    2023 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE, CSCI 2023, 2023: 49 - 56
  • [8] Enhancing Supply Chain Efficiency through Retrieve-Augmented Generation Approach in Large Language Models
    Zhu, Beilei; Vuppalapati, Chandrasekar
    2024 IEEE 10TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND MACHINE LEARNING APPLICATIONS, BIGDATASERVICE 2024, 2024: 117 - 121
  • [9] Vox Populi, Vox AI? Using Large Language Models to Estimate German Vote Choice
    von der Heyde, Leah; Haensch, Anna-Carolina; Wenz, Alexander
    SOCIAL SCIENCE COMPUTER REVIEW, 2025
  • [10] Cost, Usability, Credibility, Fairness, Accountability, Transparency, and Explainability Framework for Safe and Effective Large Language Models in Medical Education: Narrative Review and Qualitative Study
    Quttainah, Majdi; Mishra, Vinaytosh; Madakam, Somayya; Lurie, Yotam; Mark, Shlomo
    JMIR AI, 2024, 3