A thorough review of models, evaluation metrics, and datasets on image captioning

Cited by: 12

Authors:

Luo, Gaifang [1]

Cheng, Lijun [1]

Jing, Chao [1]

Zhao, Can [1]

Song, Guozhu [1]

Affiliations:

[1] Shanxi Agr Univ, Sch Software, Jinzhong 030801, Peoples R China

Source:

IET IMAGE PROCESSING | 2022 / Vol. 16 / Issue 02

Keywords:

LANGUAGE; SCENE;

DOI:

10.1049/ipr2.12367

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Image captioning means automatically generating descriptive sentences from a query image. It has recently received widespread attention from the computer vision and natural language processing communities as an emerging visual task. Currently, both components have evolved considerably by exploiting object regions, attributes, attention mechanism methods, entity recognition with novelties, and training strategies. However, despite the impressive results, the research has not yet come to a conclusive answer. This survey aims to provide a comprehensive overview of image captioning methods, from technical architectures to benchmark datasets, evaluation metrics, and comparison of state-of-the-art methods. In particular, image captioning methods are divided into different categories based on the technique adopted. Representative methods in each class are summarized, and their advantages and limitations are discussed. Moreover, many related state-of-the-art studies are quantitatively compared to determine the recent trends and future directions in image captioning. The ultimate goal of this work is to serve as a tool for understanding the existing literature and for highlighting future directions in image captioning, from which the computer vision and natural language processing communities may benefit.

Pages: 311-332

Page count: 22

50 items in total

[1] Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods
Wajid, Mohammad Saif
Terashima-Marin, Hugo
Najafirad, Peyman
Wajid, Mohd Anas
ENGINEERING REPORTS, 2024, 6 (01)
[2] A Study of Evaluation Metrics and Datasets for Video Captioning
Park, Jaehui
Song, Chibon
Han, Ji-hyeong
2017 2ND INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATICS AND BIOMEDICAL SCIENCES (ICIIBMS), 2017, : 172 - 175
[3] Underwater image captioning: Challenges, models, and datasets
Li, Huanyu
Wang, Hao
Zhang, Ying
Li, Li
Ren, Peng
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2025, 220 : 440 - 453
[4] Are metrics measuring what they should? An evaluation of Image Captioning task metrics
Gonzalez-Chavez, Othon
Ruiz, Guillermo
Moctezuma, Daniela
Ramirez-delReal, Tania
SIGNAL PROCESSING-IMAGE COMMUNICATION, 2024, 120
[5] Gender Biases in Automatic Evaluation Metrics for Image Captioning
Qiu, Haoyi
Dou, Zi-Yi
Wang, Tianlu
Celikyilmaz, Asli
Peng, Nanyun
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 8358 - 8375
[6] Image synthesis: a review of methods, datasets, evaluation metrics, and future outlook
Baraheem, Samah Saeed
Le, Trung-Nghia
Nguyen, Tam V. V.
ARTIFICIAL INTELLIGENCE REVIEW, 2023, 56 (10) : 10813 - 10865
[7] Image synthesis: a review of methods, datasets, evaluation metrics, and future outlook
Samah Saeed Baraheem
Trung-Nghia Le
Tam V. Nguyen
Artificial Intelligence Review, 2023, 56 : 10813 - 10865
[8] Improving the Performance of Image Captioning Models Trained on Small Datasets
du Plessis, Mikkel
Brink, Willie
ARTIFICIAL INTELLIGENCE RESEARCH, SACAIR 2021, 2022, 1551 : 77 - 91
[9] Evolution of visual data captioning Methods, Datasets, and evaluation Metrics: A comprehensive survey
Sharma, Dhruv
Dhiman, Chhavi
Kumar, Dinesh
EXPERT SYSTEMS WITH APPLICATIONS, 2023, 221
[10] Image Captioning Methods and Metrics
Sargar, Omkar
Kinger, Shakti
2021 INTERNATIONAL CONFERENCE ON EMERGING SMART COMPUTING AND INFORMATICS (ESCI), 2021, : 522 - 526