Gender Biases in Automatic Evaluation Metrics for Image Captioning

Cited: 0
Authors
Qiu, Haoyi [1 ]
Dou, Zi-Yi [1 ]
Wang, Tianlu [2 ]
Celikyilmaz, Asli [2 ]
Peng, Nanyun [1 ]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] Meta AI Res, Menlo Pk, CA USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Model-based evaluation metrics (e.g., CLIPScore and GPTScore) have demonstrated decent correlations with human judgments in various language generation tasks. However, their impact on fairness remains largely unexplored. It is widely recognized that pretrained models can inadvertently encode societal biases, thus employing these models for evaluation purposes may inadvertently perpetuate and amplify biases. For example, an evaluation metric may favor the caption "a woman is calculating an account book" over "a man is calculating an account book," even if the image only shows male accountants. In this paper, we conduct a systematic study of gender biases in model-based automatic evaluation metrics for image captioning tasks. We start by curating a dataset comprising profession, activity, and object concepts associated with stereotypical gender associations. Then, we demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations, as well as the propagation of biases to generation models through reinforcement learning. Finally, we present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments. Our dataset and framework lay the foundation for understanding the potential harm of model-based evaluation metrics, and facilitate future work to develop more inclusive evaluation metrics.
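The abstract's motivating example (a metric preferring "a woman is calculating an account book" over the male counterpart for an image of male accountants) suggests a simple probe: score gender-swapped caption pairs for the same image and measure the average gap. Below is a minimal, hypothetical sketch of such a probe; `score_fn` and `gender_bias_gap` are illustrative names standing in for any reference-free image-caption scorer (e.g., CLIPScore), not the paper's actual framework.

```python
# Hypothetical bias probe: for gender-swapped captions of the same image,
# a fair metric should give (near-)identical scores. A mean gap far from
# zero suggests the metric favors one gendered phrasing regardless of
# image content. `score_fn` is an assumed interface: score_fn(image, caption) -> float.

def gender_bias_gap(score_fn, image, caption_pairs):
    """Mean of score_fn(image, fem) - score_fn(image, masc) over
    gender-swapped caption pairs (masc, fem). Positive values mean the
    metric systematically prefers the 'feminine' phrasing, negative
    values the 'masculine' one."""
    gaps = [score_fn(image, fem) - score_fn(image, masc)
            for masc, fem in caption_pairs]
    return sum(gaps) / len(gaps)


# Illustrative usage with a mock scorer that encodes a stereotype.
mock_score = lambda img, cap: 0.9 if "woman" in cap else 0.8
pairs = [("a man is calculating an account book",
          "a woman is calculating an account book")]
bias = gender_bias_gap(mock_score, "image-001", pairs)
```

A real audit would replace `mock_score` with the metric under study and aggregate gaps over the curated profession, activity, and object concepts the paper describes.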
Pages: 8358 - 8375
Page count: 18
Related Papers
50 in total
  • [1] Re-evaluating Automatic Metrics for Image Captioning
    Kilickaya, Mert
    Erdem, Aykut
    Ikizler-Cinbis, Nazli
    Erdem, Erkut
    15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017, : 199 - 209
  • [2] Are metrics measuring what they should? An evaluation of Image Captioning task metrics
    Gonzalez-Chavez, Othon
    Ruiz, Guillermo
    Moctezuma, Daniela
    Ramirez-delReal, Tania
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2024, 120
  • [3] Image Captioning Methods and Metrics
    Sargar, Omkar
    Kinger, Shakti
    2021 INTERNATIONAL CONFERENCE ON EMERGING SMART COMPUTING AND INFORMATICS (ESCI), 2021, : 522 - 526
  • [4] Automatic image captioning
    Pan, JY
    Yang, HJ
    Duygulu, P
    Faloutsos, C
    2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1-3, 2004, : 1987 - 1990
  • [5] A thorough review of models, evaluation metrics, and datasets on image captioning
    Luo, Gaifang
    Cheng, Lijun
    Jing, Chao
    Zhao, Can
    Song, Guozhu
    IET IMAGE PROCESSING, 2022, 16 (02) : 311 - 332
  • [6] On Diversity in Image Captioning: Metrics and Methods
    Wang, Qingzhong
    Wan, Jia
    Chan, Antoni B.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (02) : 1035 - 1049
  • [7] Understanding and Evaluating Racial Biases in Image Captioning
    Zhao, Dora
    Wang, Angelina
    Russakovsky, Olga
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 14810 - 14820
  • [8] Evaluation metrics for video captioning: A survey
    Inacio, Andrei de Souza
    Lopes, Heitor Silverio
    MACHINE LEARNING WITH APPLICATIONS, 2023, 13
  • [9] A Study of Evaluation Metrics and Datasets for Video Captioning
    Park, Jaehui
    Song, Chibon
    Han, Ji-hyeong
    2017 2ND INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATICS AND BIOMEDICAL SCIENCES (ICIIBMS), 2017, : 172 - 175
  • [10] Chittron: An Automatic Bangla Image Captioning System
    Rahman, Matiur
    Mohammed, Nabeel
    Mansoor, Nafees
    Momen, Sifat
    PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY [ICICT-2019], 2019, 154 : 636 - 642