Image Caption Generation using Deep Learning For Video Summarization Applications

Cited by: 1

Authors:

Inayathulla, Mohammed [1]

Karthikeyan, C. [1]

Affiliations:

[1] Koneru Lakshmaiah Educ Fdn, Dept Comp Sci & Engn, Guntur, Andhra Pradesh, India

Source:

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS | 2024, Vol. 15, No. 01

Keywords:

Video summarization; deep learning; image caption synthesis; DenseNet201; GloVe embeddings; LSTM

DOI:

10.14569/IJACSA.2024.0150155

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Automatic image caption synthesis using deep learning is a promising approach for video summarization applications. The approach uses neural networks to automatically produce detailed textual descriptions of significant frames or moments in a video. By analyzing visual elements, deep learning models can detect and classify objects, scenes, and actions, which enables the generation of coherent and useful captions. This paper presents a novel methodology for generating image captions in the context of video summarization applications. The DenseNet201 architecture is used to extract image features, capturing comprehensive visual information from keyframes in the videos. For text processing, GloVe embeddings, which are pre-trained word vectors that capture semantic associations between words, are employed to represent textual information efficiently. These embeddings provide a basis for modeling the contextual variation and semantic significance of the words in the captions. LSTM models then process the GloVe embeddings to produce captions that maintain coherence, context, and readability. The integration of GloVe embeddings with LSTM models facilitates the effective fusion of visual and textual data, leading to captions that are both informative and contextually relevant for video summarization. The proposed model significantly enhances performance by combining the strengths of convolutional neural networks for image analysis and recurrent neural networks for natural language generation. The experimental results demonstrate the effectiveness of the proposed approach in generating informative captions for video summarization, offering a valuable tool for content understanding, retrieval, and recommendation.
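The text side of the pipeline described in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation. It assumes the standard GloVe text format (a word followed by its vector components on each line) and assumes greedy decoding, which the abstract does not specify. The function names, the `startseq`/`endseq` sentinel tokens, and the `next_word_scores` callable (a stand-in for a trained DenseNet201 + LSTM model) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical sentinel tokens marking caption start and end (an assumption).
START, END = "startseq", "endseq"


def load_glove(lines: Sequence[str]) -> Dict[str, List[float]]:
    """Parse GloVe text format: each line is a word followed by its vector."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(
    vocab: Sequence[str], glove: Dict[str, List[float]], dim: int
) -> List[List[float]]:
    """Row i holds the GloVe vector for vocab[i]; unknown words stay all-zero."""
    return [list(glove.get(word, [0.0] * dim)) for word in vocab]


def greedy_caption(
    image_features: Sequence[float],
    next_word_scores: Callable[[Sequence[float], List[str]], Sequence[float]],
    vocab: Sequence[str],
    max_len: int = 20,
) -> str:
    """Build a caption word by word, always taking the highest-scoring word.

    `next_word_scores` stands in for the trained model: given the keyframe
    features and the words generated so far, it returns one score per
    vocabulary entry.
    """
    words = [START]
    for _ in range(max_len):
        scores = next_word_scores(image_features, words)
        best = max(range(len(vocab)), key=lambda i: scores[i])
        if vocab[best] == END:
            break
        words.append(vocab[best])
    return " ".join(words[1:])  # drop the start sentinel
```

In a full system, the rows produced by `build_embedding_matrix` would typically initialize the embedding layer that feeds the LSTM, and `next_word_scores` would wrap the model's forward pass over the keyframe features and the partial caption.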

Pages: 565-572

Page count: 8
