Gender Biases in Automatic Evaluation Metrics for Image Captioning

Cited by: 0
Authors
Qiu, Haoyi [1]
Dou, Zi-Yi [1]
Wang, Tianlu [2]
Celikyilmaz, Asli [2]
Peng, Nanyun [1]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] Meta AI Res, Menlo Pk, CA USA
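Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes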
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Model-based evaluation metrics (e.g., CLIPScore and GPTScore) have demonstrated decent correlations with human judgments in various language generation tasks. However, their impact on fairness remains largely unexplored. It is widely recognized that pretrained models can inadvertently encode societal biases, so employing these models for evaluation purposes may perpetuate and amplify biases. For example, an evaluation metric may favor the caption "a woman is calculating an account book" over "a man is calculating an account book," even if the image only shows male accountants. In this paper, we conduct a systematic study of gender biases in model-based automatic evaluation metrics for image captioning tasks. We start by curating a dataset comprising profession, activity, and object concepts associated with stereotypical gender associations. Then, we demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations, as well as the propagation of biases to generation models through reinforcement learning. Finally, we present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments. Our dataset and framework lay the foundation for understanding the potential harm of model-based evaluation metrics, and facilitate future work to develop more inclusive evaluation metrics.
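The accountant example in the abstract suggests a simple probe: score the same image against two captions that differ only in the gender word, and look at the gap. The sketch below illustrates that idea only; it is not the paper's protocol. The names `gender_score_gap` and `make_toy_metric` are hypothetical, the "image" is a stand-in set of content tags, and the toy scorer with its artificial stereotype bonus is an assumption used in place of a real model-based metric such as CLIPScore.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Assumption: an "image" is represented as a set of content tags so the
# example runs without any vision model. A real study would pass pixels
# to a model-based metric instead.
Image = FrozenSet[str]
ScoreFn = Callable[[Image, str], float]


def gender_score_gap(
    score_fn: ScoreFn,
    image: Image,
    template: str,
    terms: Tuple[str, str] = ("man", "woman"),
) -> Tuple[Dict[str, float], float]:
    """Score captions that differ only in the gender word.

    Returns the per-term scores and the gap score(terms[1]) - score(terms[0]).
    """
    scores = {term: score_fn(image, template.format(person=term)) for term in terms}
    return scores, scores[terms[1]] - scores[terms[0]]


def make_toy_metric(stereotype_bonus: float) -> ScoreFn:
    """Build a toy scorer: word overlap with the image tags, plus an
    artificial bonus that simulates a stereotyped association."""

    def score(image: Image, caption: str) -> float:
        words = caption.lower().split()
        overlap = sum(word in image for word in words) / len(words)
        stereotyped = "woman" in words and "account" in words
        return overlap + (stereotype_bonus if stereotyped else 0.0)

    return score


# The image shows a man, so an unbiased scorer should prefer the "man" caption.
image = frozenset({"man", "calculating", "account", "book"})
template = "a {person} is calculating an account book"

_, fair_gap = gender_score_gap(make_toy_metric(0.0), image, template)
_, biased_gap = gender_score_gap(make_toy_metric(0.5), image, template)

print(f"gap without simulated bias: {fair_gap:+.3f}")
print(f"gap with simulated bias:    {biased_gap:+.3f}")
```

A positive gap means the scorer prefers the "woman" caption even though the image content matches the "man" caption. Swapping the toy scorer for a real metric would only require a function with the same `(image, caption) -> float` shape.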
Pages: 8358-8375
Page count: 18