A comprehensive survey on deep-learning-based visual captioning

Cited by: 1

Authors:

Xin, Bowen [1]

Xu, Ning [2]

Zhai, Yingchen [2]

Zhang, Tingting [2]

Lu, Zimu [2]

Liu, Jing [2]

Nie, Weizhi [2]

Li, Xuanya [3]

Liu, An-An [2,4]

Affiliations:

[1] Heilongjiang Univ, Sch Elect Engn, Harbin 150006, Heilongjiang, Peoples R China

[2] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

[3] Baidu Inc, Beijing 100085, Peoples R China

[4] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China

Source:

MULTIMEDIA SYSTEMS | 2023 / Vol. 29 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Visual captioning; Deep learning; Survey; LONG-TERM; IMAGE; ALGORITHMS; NETWORK; VISION; GRAPH;

DOI:

10.1007/s00530-023-01175-x

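Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract: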

Generating a description for an image or video is termed the visual captioning task. It requires the model to capture the semantic information of visual content and translate it into syntactically and semantically correct human language. Connecting the research communities of computer vision (CV) and natural language processing (NLP), visual captioning poses the major challenge of bridging the gap between low-level visual features and high-level language information. Thanks to recent advances in deep learning, which are widely applied to visual and language modeling, visual captioning methods based on deep neural networks have demonstrated state-of-the-art performance. In this paper, we present a comprehensive survey of existing deep-learning-based visual captioning methods. Based on the mechanisms and techniques adopted to narrow the semantic gap, we divide visual captioning methods into several groups. Representative categories in each group are summarized, and their strengths and limitations are discussed. Quantitative evaluations of state-of-the-art approaches on popular benchmark datasets are also presented and analyzed. Finally, we discuss future research directions.

Pages: 3781 - 3804

Page count: 24

50 items in total

[21] Deep-learning-based visual data analytics for smart construction management
Pal, Aritra
Hsieh, Shang-Hsien
AUTOMATION IN CONSTRUCTION, 2021, 131
[22] Survey of deep learning and architectures for visual captioning-transitioning between media and natural languages
Sur, Chiranjib
MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (22) : 32187 - 32237
[23] Deep-Learning-Based Semantic Segmentation of Remote Sensing Images: A Survey
Huang, Liwei
Jiang, Bitao
Lv, Shouye
Liu, Yanbo
Fu, Ying
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 8370 - 8396
[24] Toward Deep-Learning-Based Methods in Image Forgery Detection: A Survey
Pham, Nam Thanh
Park, Chun-Su
IEEE ACCESS, 2023, 11 : 11224 - 11237
[25] Survey of Visual SLAM Based on Deep Learning
Huang Z.
Shao C.
Jiqiren/Robot, 2023, 45 (06) : 756 - 768
[26] Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
Xu, Yuecong
Cao, Haozhi
Xie, Lihua
Li, Xiao-Li
Chen, Zhenghua
Yang, Jianfei
ACM COMPUTING SURVEYS, 2024, 56 (12)
[27] A comprehensive literature review on image captioning methods and metrics based on deep learning technique
Ahmad Sami Al-Shamayleh
Omar Adwan
Mohammad A. Alsharaiah
Abdelrahman H. Hussein
Qasem M. Kharma
Christopher Ifeanyi Eke
Multimedia Tools and Applications, 2024, 83 : 34219 - 34268
[28] Video super-resolution based on deep learning: a comprehensive survey
Liu, Hongying
Ruan, Zhubo
Zhao, Peng
Dong, Chao
Shang, Fanhua
Liu, Yuanyuan
Yang, Linlin
Timofte, Radu
ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 (08) : 5981 - 6035
[29] A Comprehensive Survey of Recommender Systems Based on Deep Learning
Zhou, Hongde
Xiong, Fei
Chen, Hongshu
APPLIED SCIENCES-BASEL, 2023, 13 (20)
[30] An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
Michelsanti, Daniel
Tan, Zheng-Hua
Zhang, Shi-Xiong
Xu, Yong
Yu, Meng
Yu, Dong
Jensen, Jesper
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1368 - 1396
