Video Captioning with Visual and Semantic Features

Cited: 5
Authors
Lee, Sujin [1 ]
Kim, Incheol [2 ]
Affiliations
[1] Kyonggi Univ, Dept Comp Sci, Grad Sch, Suwon, South Korea
[2] Kyonggi Univ, Dept Comp Sci, Suwon, South Korea
Source
JOURNAL OF INFORMATION PROCESSING SYSTEMS | 2018, Vol. 14, No. 6
Keywords
Attention-Based Caption Generation; Deep Neural Networks; Semantic Feature; Video Captioning;
DOI
10.3745/JIPS.02.0098
Chinese Library Classification (CLC) Number
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Video captioning is the task of extracting features from a video and generating natural-language captions from those features. This paper introduces a deep neural network model, together with its training method, for effective video captioning. The model uses not only visual features but also semantic features that effectively describe the video. The visual features are extracted with convolutional neural networks such as C3D and ResNet, while the semantic features are extracted with a semantic feature extraction network proposed in this paper. An attention-based caption generation network is then proposed to generate video captions effectively from the extracted features. The performance and effectiveness of the proposed model are verified through experiments on two large-scale video benchmarks, the Microsoft Video Description (MSVD) and Microsoft Research Video-to-Text (MSR-VTT) datasets.
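The record does not spell out the attention mechanism used by the caption generation network. As an illustrative sketch only (not the authors' implementation), the following NumPy code shows Bahdanau-style additive attention, a common choice in attention-based video captioning: per-frame visual features (e.g. from C3D or ResNet) are scored against the decoder's current hidden state and pooled into a context vector. All parameter names (`W_f`, `W_h`, `v`) are hypothetical.

```python
import numpy as np

def attend(frame_feats, hidden, W_f, W_h, v):
    """Additive (Bahdanau-style) attention over video frame features.

    frame_feats: (T, D_f) per-frame features (e.g. from C3D/ResNet)
    hidden:      (D_h,)  decoder hidden state at the current time step
    W_f, W_h, v: projection parameters (hypothetical names)
    Returns (context, alpha): weighted feature vector and attention weights.
    """
    # Score each frame against the current decoder state.
    scores = np.tanh(frame_feats @ W_f + hidden @ W_h) @ v   # shape (T,)
    # Softmax over frames (shifted by the max for numerical stability).
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()
    # Context vector: attention-weighted sum of the frame features.
    context = alpha @ frame_feats                            # shape (D_f,)
    return context, alpha

# Toy example with random parameters.
rng = np.random.default_rng(0)
T, D_f, D_h, D_a = 8, 16, 12, 10
frame_feats = rng.normal(size=(T, D_f))
hidden = rng.normal(size=(D_h,))
W_f = rng.normal(size=(D_f, D_a))
W_h = rng.normal(size=(D_h, D_a))
v = rng.normal(size=(D_a,))

context, alpha = attend(frame_feats, hidden, W_f, W_h, v)
```

In a full decoder, `context` would be concatenated with the previous word embedding and fed to an RNN step before predicting the next caption word; the semantic features described in the abstract could be injected the same way.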
Pages: 1318-1330
Page count: 13
Related Papers
50 total
  • [41] Video captioning – a survey
    Vaishnavi J.
    Narmatha V.
    Multimedia Tools and Applications, 2025, 84 (2) : 947 - 978
  • [42] Spatio-Temporal Graph-based Semantic Compositional Network for Video Captioning
    Li, Shun
    Zhang, Ze-Fan
    Ji, Yi
    Li, Ying
    Liu, Chun-Ping
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [43] Video Captioning Based on Cascaded Attention-Guided Visual Feature Fusion
    Shuqin Chen
    Li Yang
    Yikang Hu
    Neural Processing Letters, 2023, 55 (8) : 11509 - 11526
  • [45] Video Captioning based on Image Captioning as Subsidiary Content
    Vaishnavi, J.
    Narmatha, V
    2022 SECOND INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL, COMPUTING, COMMUNICATION AND SUSTAINABLE TECHNOLOGIES (ICAECT), 2022,
  • [46] Rethink video retrieval representation for video captioning
    Tian, Mingkai
    Li, Guorong
    Qi, Yuankai
    Wang, Shuhui
    Sheng, Quan Z.
    Huang, Qingming
    PATTERN RECOGNITION, 2024, 156
  • [47] A Review Of Video Captioning Methods
    Mahajan, Dewarthi
    Bhosale, Sakshi
    Nighot, Yash
    Tayal, Madhuri
    INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING, 2021, 12 (05): : 708 - 715
  • [48] Towards Knowledge-Aware Video Captioning via Transitive Visual Relationship Detection
    Wu, Bofeng
    Niu, Guocheng
    Yu, Jun
    Xiao, Xinyan
    Zhang, Jian
    Wu, Hua
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 6753 - 6765
  • [49] Sequence in sequence for video captioning
    Wang, Huiyun
    Gao, Chongyang
    Han, Yahong
    PATTERN RECOGNITION LETTERS, 2020, 130 : 327 - 334
  • [50] Multirate Multimodal Video Captioning
    Yang, Ziwei
    Xu, Youjiang
    Wang, Huiyun
    Wang, Bo
    Han, Yahong
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1877 - 1882