Video description: A comprehensive survey of deep learning approaches

Cited by: 10
Authors
Rafiq, Ghazala [1]
Rafiq, Muhammad [2]
Choi, Gyu Sang [1]
Affiliations
[1] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea
[2] Keimyung Univ, Dept Game & Mobile Engn, 1095 Dalgubeol Daero, Daegu 42601, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Deep learning; Encoder-Decoder architecture; Text description; Video captioning techniques; Video description approaches; Video captioning; Vision to text; NETWORKS;
DOI
10.1007/s10462-023-10414-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video description refers to understanding visual content and transforming that acquired understanding into automatic textual narration. It bridges the key AI fields of computer vision and natural language processing in conjunction with real-time and practical applications. Deep learning-based approaches employed for video description have demonstrated enhanced results compared to conventional approaches. The current literature lacks a thorough interpretation of the recently developed and employed sequence-to-sequence techniques for video description. This paper fills that gap by focusing mainly on deep learning-enabled approaches to automatic caption generation. Sequence-to-sequence models follow an Encoder-Decoder architecture employing a specific composition of CNN, RNN, or the variants LSTM or GRU as an encoder and decoder block. This standard architecture can be fused with an attention mechanism to focus on specific distinctive features, achieving high-quality results. Reinforcement learning employed within the Encoder-Decoder structure can progressively deliver state-of-the-art captions by following exploration and exploitation strategies. The transformer mechanism is a modern and efficient transductive architecture for robust output. Free from recurrence and based solely on self-attention, it allows parallelization along with training on a massive amount of data. It can fully utilize the available GPUs for most NLP tasks. Recently, with the emergence of several versions of transformers, long-term dependency handling is no longer an issue for researchers engaged in video processing for summarization and description, or for autonomous-vehicle, surveillance, and instructional purposes. Such researchers can draw promising directions from this research.
Pages: 13293-13372
Page count: 80