Fairness Testing of Machine Translation Systems

Cited by: 1
Authors
Sun, Zeyu [1]
Chen, Zhenpeng [2]
Zhang, Jie [3]
Hao, Dan [4]
Affiliations
[1] Chinese Acad Sci, Inst Software, Sci & Technol Integrated Informat Syst Lab, Beijing, Peoples R China
[2] Nanyang Technol Univ, Singapore, Singapore
[3] Kings Coll London, London, England
[4] Peking Univ, Sch Comp Sci, Key Lab High Confidence Software Technol, MoE, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Fairness testing; metamorphic testing; machine translation; protected attributes;
DOI
10.1145/3664608
CLC Number
TP31 [Computer Software];
Discipline Code
081202 ; 0835 ;
Abstract
Machine translation is integral to international communication and extensively employed in diverse human-related applications. Despite remarkable progress, fairness issues persist within current machine translation systems. In this article, we propose FairMT, an automated fairness testing approach tailored for machine translation systems. FairMT operates on the assumption that translations of semantically similar sentences, containing protected attributes from distinct demographic groups, should maintain comparable meanings. It comprises three key steps: (1) test input generation, producing inputs covering various demographic groups; (2) test oracle generation, identifying potential unfair translations based on semantic similarity measurements; and (3) regression, discerning genuine fairness issues from those caused by low-quality translation. Leveraging FairMT, we conduct an empirical study on three leading machine translation systems: Google Translate, T5, and Transformer. Our investigation uncovers up to 832, 1,984, and 2,627 unfair translations in the three systems, respectively. Intriguingly, we observe that fair translations tend to exhibit superior translation performance, challenging the conventional wisdom of a fairness-performance tradeoff prevalent in the fairness literature.
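
The three steps above lend themselves to a compact illustration. The Python sketch below captures only the metamorphic core of steps (1) and (2): mutate a protected attribute in a source sentence, translate both variants, and flag the pair when the two translations drift apart in meaning. Everything in it is an assumption made for illustration, not the authors' implementation: the mutation list covers a single gender axis, the similarity model ("all-MiniLM-L6-v2" via the sentence-transformers library) and the 0.85 threshold are stand-ins, and `translate` is a placeholder for whichever MT system is under test.

# A minimal sketch of the metamorphic idea behind FairMT (steps 1 and 2).
# Not the authors' implementation; all names and values are illustrative.
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

# Hypothetical protected-attribute substitutions along one demographic axis.
MUTATIONS = [("he", "she"), ("man", "woman"), ("his", "her")]

# Assumed off-the-shelf semantic-similarity model, used as the test oracle.
_model = SentenceTransformer("all-MiniLM-L6-v2")

def generate_pairs(sentence):
    """Step 1: yield input pairs that differ only in a protected attribute."""
    tokens = sentence.split()
    for a, b in MUTATIONS:
        if a in tokens:
            mutant = " ".join(b if t == a else t for t in tokens)
            yield sentence, mutant

def is_potentially_unfair(src, mutant, translate, threshold=0.85):
    """Step 2: flag the pair if the two translations diverge semantically.

    `translate` is a caller-supplied callable wrapping the MT system under test.
    """
    t1, t2 = translate(src), translate(mutant)
    emb = _model.encode([t1, t2], convert_to_tensor=True)
    similarity = util.cos_sim(emb[0], emb[1]).item()
    return similarity < threshold, (t1, t2, similarity)

A flagged pair is only a candidate: step (3), the regression step, would still be needed to separate genuine fairness issues from pairs whose translations diverge simply because both are of low quality.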
Pages: 27