A comprehensive survey on deep-learning-based visual captioning

Cited by: 1
Authors
Xin, Bowen [1]
Xu, Ning [2]
Zhai, Yingchen [2]
Zhang, Tingting [2]
Lu, Zimu [2]
Liu, Jing [2]
Nie, Weizhi [2]
Li, Xuanya [3]
Liu, An-An [2, 4]
Affiliations
[1] Heilongjiang Univ, Sch Elect Engn, Harbin 150006, Heilongjiang, Peoples R China
[2] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[3] Baidu Inc, Beijing 100085, Peoples R China
[4] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visual captioning; Deep learning; Survey; LONG-TERM; IMAGE; ALGORITHMS; NETWORK; VISION; GRAPH;
DOI
10.1007/s00530-023-01175-x
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Generating a description for an image or video is termed the visual captioning task. It requires the model to capture the semantic information of visual content and translate it into syntactically and semantically correct human language. Connecting the research communities of computer vision (CV) and natural language processing (NLP), visual captioning poses the major challenge of bridging the gap between low-level visual features and high-level language information. Thanks to recent advances in deep learning, which is widely applied to visual and language modeling, visual captioning methods based on deep neural networks have demonstrated state-of-the-art performance. In this paper, we aim to present a comprehensive survey of existing deep-learning-based visual captioning methods. We divide visual captioning methods into groups according to the mechanism and technique they adopt to narrow the semantic gap. Representative categories in each group are summarized, and their strengths and limitations are discussed. Quantitative evaluations of state-of-the-art approaches on popular benchmark datasets are also presented and analyzed. Finally, we discuss future research directions.
Pages: 3781-3804
Number of pages: 24
Related papers
50 in total
  • [1] A comprehensive survey on deep-learning-based visual captioning
    Bowen Xin
    Ning Xu
    Yingchen Zhai
    Tingting Zhang
    Zimu Lu
    Jing Liu
    Weizhi Nie
    Xuanya Li
    An-An Liu
    Multimedia Systems, 2023, 29 (6) : 3781 - 3804
  • [2] A Comprehensive Survey of Deep Learning for Image Captioning
    Hossain, Md Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    ACM COMPUTING SURVEYS, 2019, 51 (06)
  • [3] A Comprehensive Survey on Deep-Learning-Based Breast Cancer Diagnosis
    Mridha, Muhammad Firoz
    Hamid, Md. Abdul
    Monowar, Muhammad Mostafa
    Keya, Ashfia Jannat
    Ohi, Abu Quwsar
    Islam, Md. Rashedul
    Kim, Jong-Myon
    CANCERS, 2021, 13 (23)
  • [4] Deep-Learning-Based Precision Visual Tracking
    Peng, Xiaoming
    Xu, Zhiyong
    Ji, Xiang
    Peng, Yufan
    Zhang, Jianlin
    Zuo, Haorui
    Wei, Yuxing
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2023, 37 (06)
  • [5] A survey on Deep-Learning-based image steganography
    Song, Bingbing
    Wei, Ping
    Wu, Sixing
    Lin, Yu
    Zhou, Wei
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 254
  • [6] Exploring Video Captioning Techniques: A Comprehensive Survey on Deep Learning Methods
    Islam S.
    Dash A.
    Seum A.
    Raj A.H.
    Hossain T.
    Shah F.M.
    SN Computer Science, 2021, 2 (2)
  • [7] Deep Learning for Visual Tracking: A Comprehensive Survey
    Marvasti-Zadeh, Seyed Mojtaba
    Cheng, Li
    Ghanei-Yakhdan, Hossein
    Kasaei, Shohreh
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (05) : 3943 - 3968
  • [8] Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning
    Oluwasammi, Ariyo
    Aftab, Muhammad Umar
    Qin, Zhiguang
    Son Tung Ngo
    Thang Van Doan
    Son Ba Nguyen
    Son Hoang Nguyen
    Giang Hoang Nguyen
    COMPLEXITY, 2021, 2021
  • [9] A Survey on Deep-Learning-Based Diabetic Retinopathy Classification
    Sebastian, Anila
    Elharrouss, Omar
    Al-Maadeed, Somaya
    Almaadeed, Noor
    DIAGNOSTICS, 2023, 13 (03)
  • [10] A Comprehensive Survey for Deep-Learning-Based Abnormality Detection in Smart Grids with Multimodal Image Data
    Zhou, Fangrong
    Wen, Gang
    Ma, Yi
    Geng, Hao
    Huang, Ran
    Pei, Ling
    Yu, Wenxian
    Chu, Lei
    Qiu, Robert
    APPLIED SCIENCES-BASEL, 2022, 12 (11)