A comprehensive survey on deep-learning-based visual captioning

Times cited: 1

Authors:

Xin, Bowen [1]

Xu, Ning [2]

Zhai, Yingchen [2]

Zhang, Tingting [2]

Lu, Zimu [2]

Liu, Jing [2]

Nie, Weizhi [2]

Li, Xuanya [3]

Liu, An-An [2,4]

Affiliations:

[1] Heilongjiang Univ, Sch Elect Engn, Harbin 150006, Heilongjiang, Peoples R China

[2] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

[3] Baidu Inc, Beijing 100085, Peoples R China

[4] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China

Source:

MULTIMEDIA SYSTEMS | 2023 / Vol. 29 / Issue 06

Funding:

National Natural Science Foundation of China;

Keywords:

Visual captioning; Deep learning; Survey; LONG-TERM; IMAGE; ALGORITHMS; NETWORK; VISION; GRAPH;

DOI:

10.1007/s00530-023-01175-x

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline code:

0812;

Abstract:

Generating a description for an image or video is termed the visual captioning task. It requires the model to capture the semantic information of the visual content and translate it into syntactically and semantically correct human language. Connecting the research communities of computer vision (CV) and natural language processing (NLP), visual captioning poses the major challenge of bridging the gap between low-level visual features and high-level language information. Thanks to recent advances in deep learning, which has been widely applied to visual and language modeling, visual captioning methods based on deep neural networks have demonstrated state-of-the-art performance. In this paper, we present a comprehensive survey of existing deep-learning-based visual captioning methods. According to the mechanism and technique each method adopts to narrow the semantic gap, we divide visual captioning methods into several groups. Representative categories in each group are summarized, and their strengths and limitations are discussed. Quantitative evaluations of state-of-the-art approaches on popular benchmark datasets are also presented and analyzed. Furthermore, we discuss future research directions.


Pages: 3781-3804

Page count: 24

50 entries in total

[1] A comprehensive survey on deep-learning-based visual captioning
Bowen Xin
Ning Xu
Yingchen Zhai
Tingting Zhang
Zimu Lu
Jing Liu
Weizhi Nie
Xuanya Li
An-An Liu
Multimedia Systems, 2023, 29 (6) : 3781 - 3804
[2] A Comprehensive Survey of Deep Learning for Image Captioning
Hossain, Md Zakir
Sohel, Ferdous
Shiratuddin, Mohd Fairuz
Laga, Hamid
ACM COMPUTING SURVEYS, 2019, 51 (06)
[3] A Comprehensive Survey on Deep-Learning-Based Breast Cancer Diagnosis
Mridha, Muhammad Firoz
Hamid, Md. Abdul
Monowar, Muhammad Mostafa
Keya, Ashfia Jannat
Ohi, Abu Quwsar
Islam, Md. Rashedul
Kim, Jong-Myon
CANCERS, 2021, 13 (23)
[4] Deep-Learning-Based Precision Visual Tracking
Peng, Xiaoming
Xu, Zhiyong
Ji, Xiang
Peng, Yufan
Zhang, Jianlin
Zuo, Haorui
Wei, Yuxing
INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2023, 37 (06)
[5] A survey on Deep-Learning-based image steganography
Song, Bingbing
Wei, Ping
Wu, Sixing
Lin, Yu
Zhou, Wei
EXPERT SYSTEMS WITH APPLICATIONS, 2024, 254
[6] Exploring Video Captioning Techniques: A Comprehensive Survey on Deep Learning Methods
Islam S.
Dash A.
Seum A.
Raj A.H.
Hossain T.
Shah F.M.
SN Computer Science, 2021, 2 (2)
[7] Deep Learning for Visual Tracking: A Comprehensive Survey
Marvasti-Zadeh, Seyed Mojtaba
Cheng, Li
Ghanei-Yakhdan, Hossein
Kasaei, Shohreh
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (05) : 3943 - 3968
[8] Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning
Oluwasammi, Ariyo
Aftab, Muhammad Umar
Qin, Zhiguang
Son Tung Ngo
Thang Van Doan
Son Ba Nguyen
Son Hoang Nguyen
Giang Hoang Nguyen
COMPLEXITY, 2021, 2021
[9] A Survey on Deep-Learning-Based Diabetic Retinopathy Classification
Sebastian, Anila
Elharrouss, Omar
Al-Maadeed, Somaya
Almaadeed, Noor
DIAGNOSTICS, 2023, 13 (03)
[10] A Comprehensive Survey for Deep-Learning-Based Abnormality Detection in Smart Grids with Multimodal Image Data
Zhou, Fangrong
Wen, Gang
Ma, Yi
Geng, Hao
Huang, Ran
Pei, Ling
Yu, Wenxian
Chu, Lei
Qiu, Robert
APPLIED SCIENCES-BASEL, 2022, 12 (11)
