Automatic video captioning using tree hierarchical deep convolutional neural network and ASRNN-bi-directional LSTM

Cited by: 14
Authors
Kavitha, N. [1]
Soundar, K. Ruba [2]
Karthick, R. [3]
Kohila, J. [4]
Affiliations
[1] Mepco Schlenk Engn Coll Autonomous, Dept Comp Sci & Engn, Trichy, Tamil Nadu, India
[2] Mepco Schlenk Engn Coll Autonomous, Dept Comp Sci & Engn, Sivakasi, Tamil Nadu, India
[3] KLN Coll Engn, Dept Comp Sci & Engn, Sivaganga, India
[4] PSR Engn Coll, Dept Elect & Elect Engn, Sivakasi 626146, Tamil Nadu, India
Keywords
Attention segmental recurrent neural network; Automatic video captioning; Bi-directional LSTM; Tree hierarchical deep convolutional neural network;
DOI
10.1007/s00607-024-01334-6
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Automatic video understanding technology is in high demand because of the growth of mass video data, such as surveillance videos and personal videos. Several methods for automatic video captioning have been presented previously. However, the existing methods have problems: they consume considerable time when processing a large number of frames, and they suffer from overfitting. These problems make it difficult to automate video captioning and reduce the accuracy of the final captions. To overcome these issues, this paper proposes automatic video captioning using a Tree Hierarchical Deep Convolutional Neural Network and an attention segmental recurrent neural network with bi-directional Long Short-Term Memory (ASRNN-bi-directional LSTM). The captioning part contains two phases: a feature encoder and a decoder. In the feature encoder phase, the tree hierarchical Deep Convolutional Neural Network (Tree CNN) encodes the video into a vector representation and extracts three kinds of features. In the decoder phase, the attention segmental recurrent neural network (ASRNN) decodes the vector into a textual description. ASRNN-based methods suffer from the long-term dependency problem. To address it, the method attends to all generated words from the bi-directional LSTM and the caption generator to extract global context information, because the context presented by the hidden state of the caption generator is local and incomplete. Golden Eagle Optimization is then used to optimize the ASRNN weight parameters. The proposed method is implemented in Python. It achieves 34.89%, 29.06% and 20.78% higher accuracy and 23.65%, 22.10% and 29.68% lower Mean Squared Error compared with the existing methods.
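The idea of attending to all previously generated words to build a global context can be illustrated with a minimal sketch. It assumes plain dot-product attention with a softmax over the word states; the abstract does not specify the scoring function, so this is not the authors' implementation. The names `softmax`, `global_context`, `query` and `word_states` are hypothetical, and the toy vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_context(query, word_states):
    """Weight every generated-word state by its similarity to the query.

    Assumption: dot-product scoring. `query` stands in for the current
    decoder state, `word_states` for the states of all words generated so far.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in word_states]
    weights = softmax(scores)
    dim = len(word_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, word_states))
        for i in range(dim)
    ]
    return weights, context


# Toy states for three generated words (illustrative values only).
word_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, context = global_context(query, word_states)
```

Because the context vector is a weighted sum over every word state rather than only the latest hidden state, words generated early in the caption can still influence the next prediction.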
Pages: 3691-3709
Number of pages: 19