Automatic video captioning using tree hierarchical deep convolutional neural network and ASRNN-bi-directional LSTM

Cited by: 14

Authors:

Kavitha, N. [1]

Soundar, K. Ruba [2]

Karthick, R. [3]

Kohila, J. [4]

Affiliations:

[1] Mepco Schlenk Engn Coll Autonomous, Dept Comp Sci & Engn, Trichy, Tamil Nadu, India

[2] Mepco Schlenk Engn Coll Autonomous, Dept Comp Sci & Engn, Sivakasi, Tamil Nadu, India

[3] KLN Coll Engn, Dept Comp Sci & Engn, Sivaganga, India

[4] PSR Engn Coll, Dept Elect & Elect Engn, Sivakasi 626146, Tamil Nadu, India

Source:

COMPUTING | 2024 / Vol. 106 / Issue 11

Keywords:

Attention segmental recurrent neural network; Automatic video captioning; Bi-directional LSTM; Tree hierarchical deep convolutional neural network;

DOI:

10.1007/s00607-024-01334-6

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Automatic video understanding technology is in high demand because of the rise of mass video data, such as surveillance videos and personal videos. Several methods have previously been presented for automatic video captioning, but they have problems: they consume considerable time when processing a large number of frames, and they suffer from overfitting. These problems make it difficult to automate video captioning and reduce the accuracy of the final caption. To overcome these issues, this paper proposes Automatic Video Captioning using a Tree Hierarchical Deep Convolutional Neural Network and an attention segmental recurrent neural network with bi-directional Long Short-Term Memory (ASRNN-bi-directional LSTM). The captioning part contains two phases: a feature encoder and a decoder. In the feature encoder phase, the tree hierarchical Deep Convolutional Neural Network (Tree CNN) encodes the vector representation of the video and extracts three kinds of features. In the decoder phase, the attention segmental recurrent neural network (ASRNN) decodes the vector into a textual description. ASRNN-based methods struggle with the long-term dependency issue. To address this issue, the method attends to all generated words from the bi-directional LSTM and the caption generator to extract global context information, because the context represented by the hidden state of the caption generator is local and incomplete. Golden Eagle Optimization is then used to optimize the ASRNN weight parameters. The proposed method is implemented in Python. The proposed technique achieves 34.89%, 29.06% and 20.78% higher accuracy and 23.65%, 22.10% and 29.68% lower Mean Squared Error compared with the existing methods.
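The abstract reports its Mean Squared Error results as percentage reductions relative to existing methods. The following is a minimal sketch of how such a figure could be computed, assuming the percentage is a relative reduction against a baseline; the abstract does not state the exact formula. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction_percent(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline` (assumed formula)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical toy values for illustration only, not results from the paper.
targets = [1.0, 0.0, 1.0, 1.0]
baseline_pred = [0.5, 0.5, 0.5, 0.5]
proposed_pred = [0.75, 0.25, 0.75, 0.75]

baseline_mse = mean_squared_error(targets, baseline_pred)
proposed_mse = mean_squared_error(targets, proposed_pred)
reduction = relative_reduction_percent(baseline_mse, proposed_mse)

print(f"baseline MSE={baseline_mse}, proposed MSE={proposed_mse}, "
      f"reduction={reduction}%")
```

Under this reading, a statement such as "23.65% lower Mean Squared Error" would mean the proposed method's MSE is 76.35% of the baseline's MSE.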


Pages: 3691-3709

Page count: 19
