A thorough review of models, evaluation metrics, and datasets on image captioning

Times cited: 12
Authors
Luo, Gaifang [1 ]
Cheng, Lijun [1 ]
Jing, Chao [1 ]
Zhao, Can [1 ]
Song, Guozhu [1 ]
Affiliation
[1] Shanxi Agricultural University, School of Software, Jinzhong 030801, People's Republic of China
Keywords
LANGUAGE; SCENE
DOI
10.1049/ipr2.12367
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image captioning is the task of automatically generating descriptive sentences for a query image. As an emerging visual task, it has recently received widespread attention from the computer vision and natural language processing communities. Both the visual and the linguistic components have evolved considerably by exploiting object regions, attributes, attention mechanisms, recognition of novel entities, and improved training strategies. Despite these impressive results, research in the area has not yet reached a conclusive answer. This survey provides a comprehensive overview of image captioning, covering technical architectures, benchmark datasets, evaluation metrics, and a comparison of state-of-the-art methods. In particular, image captioning methods are divided into categories according to the technique they adopt; representative methods in each category are summarized, and their advantages and limitations are discussed. In addition, many state-of-the-art studies are compared quantitatively to identify recent trends and future directions in image captioning. The ultimate goal of this work is to serve as a tool for understanding the existing literature and to highlight future directions in image captioning from which the Computer Vision and Natural Language Processing communities may benefit.
Pages: 311-332 (22 pages)