Fairness Testing of Machine Translation Systems

Cited by: 1
Authors
Sun, Zeyu [1]
Chen, Zhenpeng [2]
Zhang, Jie [3]
Hao, Dan [4]
Affiliations
[1] Chinese Acad Sci, Inst Software, Sci & Technol Integrated Informat Syst Lab, Beijing, Peoples R China
[2] Nanyang Technol Univ, Singapore, Singapore
[3] Kings Coll London, London, England
[4] Peking Univ, Sch Comp Sci, Key Lab High Confidence Software Technol, MoE, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Fairness testing; metamorphic testing; machine translation; protected attributes;
DOI
10.1145/3664608
CLC Number
TP31 [Computer Software];
Discipline Code
081202 ; 0835 ;
Abstract
Machine translation is integral to international communication and extensively employed in diverse human-related applications. Despite remarkable progress, fairness issues persist within current machine translation systems. In this article, we propose FairMT, an automated fairness testing approach tailored for machine translation systems. FairMT operates on the assumption that translations of semantically similar sentences, containing protected attributes from distinct demographic groups, should maintain comparable meanings. It comprises three key steps: (1) test input generation, producing inputs covering various demographic groups; (2) test oracle generation, identifying potential unfair translations based on semantic similarity measurements; and (3) regression, discerning genuine fairness issues from those caused by low-quality translation. Leveraging FairMT, we conduct an empirical study on three leading machine translation systems: Google Translate, T5, and Transformer. Our investigation uncovers up to 832, 1,984, and 2,627 unfair translations in the three systems, respectively. Intriguingly, we observe that fair translations tend to exhibit superior translation performance, challenging the conventional wisdom of a fairness-performance tradeoff prevalent in the fairness literature.
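
The three steps above lend themselves to a compact illustration. The Python sketch below captures only the metamorphic core of steps (1) and (2): mutate a protected attribute in a source sentence, translate both variants, and flag the pair when the two translations drift apart in meaning. Everything in it is an assumption made for illustration, not the authors' implementation: the mutation list covers a single gender axis, the similarity model ("all-MiniLM-L6-v2" via the sentence-transformers library) and the 0.85 threshold are stand-ins, and `translate` is a placeholder for whichever MT system is under test.

# A minimal sketch of the metamorphic idea behind FairMT (steps 1 and 2).
# Not the authors' implementation; all names and values are illustrative.
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

# Hypothetical protected-attribute substitutions along one demographic axis.
MUTATIONS = [("he", "she"), ("man", "woman"), ("his", "her")]

# Assumed off-the-shelf semantic-similarity model, used as the test oracle.
_model = SentenceTransformer("all-MiniLM-L6-v2")

def generate_pairs(sentence):
    """Step 1: yield input pairs that differ only in a protected attribute."""
    tokens = sentence.split()
    for a, b in MUTATIONS:
        if a in tokens:
            mutant = " ".join(b if t == a else t for t in tokens)
            yield sentence, mutant

def is_potentially_unfair(src, mutant, translate, threshold=0.85):
    """Step 2: flag the pair if the two translations diverge semantically.

    `translate` is a caller-supplied callable wrapping the MT system under test.
    """
    t1, t2 = translate(src), translate(mutant)
    emb = _model.encode([t1, t2], convert_to_tensor=True)
    similarity = util.cos_sim(emb[0], emb[1]).item()
    return similarity < threshold, (t1, t2, similarity)

A flagged pair is only a candidate: step (3), the regression step, would still be needed to separate genuine fairness issues from pairs whose translations diverge simply because both are of low quality.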
Pages: 27