Video Captioning with Visual and Semantic Features

Cited: 5
Authors
Lee, Sujin [1 ]
Kim, Incheol [2 ]
Affiliations
[1] Kyonggi Univ, Dept Comp Sci, Grad Sch, Suwon, South Korea
[2] Kyonggi Univ, Dept Comp Sci, Suwon, South Korea
Source
JOURNAL OF INFORMATION PROCESSING SYSTEMS | 2018, Vol. 14, No. 6
Keywords
Attention-Based Caption Generation; Deep Neural Networks; Semantic Feature; Video Captioning;
DOI
10.3745/JIPS.02.0098
Chinese Library Classification (CLC) Number
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Video captioning is the task of extracting features from a video and generating natural-language captions from those features. This paper introduces a deep neural network model, together with its training method, for effective video captioning. The model uses not only visual features but also semantic features that effectively describe the video. The visual features are extracted with convolutional neural networks such as C3D and ResNet, while the semantic features are extracted with a semantic feature extraction network proposed in this paper. An attention-based caption generation network is then proposed to generate video captions effectively from the extracted features. The performance and effectiveness of the proposed model are verified through experiments on two large-scale video benchmarks, the Microsoft Video Description (MSVD) and Microsoft Research Video-to-Text (MSR-VTT) datasets.
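The record does not spell out the attention mechanism used by the caption generation network. As an illustrative sketch only (not the authors' implementation), the following NumPy code shows Bahdanau-style additive attention, a common choice in attention-based video captioning: per-frame visual features (e.g. from C3D or ResNet) are scored against the decoder's current hidden state and pooled into a context vector. All parameter names (`W_f`, `W_h`, `v`) are hypothetical.

```python
import numpy as np

def attend(frame_feats, hidden, W_f, W_h, v):
    """Additive (Bahdanau-style) attention over video frame features.

    frame_feats: (T, D_f) per-frame features (e.g. from C3D/ResNet)
    hidden:      (D_h,)  decoder hidden state at the current time step
    W_f, W_h, v: projection parameters (hypothetical names)
    Returns (context, alpha): weighted feature vector and attention weights.
    """
    # Score each frame against the current decoder state.
    scores = np.tanh(frame_feats @ W_f + hidden @ W_h) @ v   # shape (T,)
    # Softmax over frames (shifted by the max for numerical stability).
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()
    # Context vector: attention-weighted sum of the frame features.
    context = alpha @ frame_feats                            # shape (D_f,)
    return context, alpha

# Toy example with random parameters.
rng = np.random.default_rng(0)
T, D_f, D_h, D_a = 8, 16, 12, 10
frame_feats = rng.normal(size=(T, D_f))
hidden = rng.normal(size=(D_h,))
W_f = rng.normal(size=(D_f, D_a))
W_h = rng.normal(size=(D_h, D_a))
v = rng.normal(size=(D_a,))

context, alpha = attend(frame_feats, hidden, W_f, W_h, v)
```

In a full decoder, `context` would be concatenated with the previous word embedding and fed to an RNN step before predicting the next caption word; the semantic features described in the abstract could be injected the same way.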
Pages: 1318-1330
Page count: 13
Related Papers
50 total
  • [41] Video captioning – a survey
    Vaishnavi J.
    Narmatha V.
    Multimedia Tools and Applications, 2025, 84 (2) : 947 - 978
  • [42] Spatio-Temporal Graph-based Semantic Compositional Network for Video Captioning
    Li, Shun
    Zhang, Ze-Fan
    Ji, Yi
    Li, Ying
    Liu, Chun-Ping
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [43] Video Captioning Based on Cascaded Attention-Guided Visual Feature Fusion
    Shuqin Chen
    Li Yang
    Yikang Hu
    Neural Processing Letters, 2023, 55 (8) : 11509 - 11526
  • [45] Video Captioning based on Image Captioning as Subsidiary Content
    Vaishnavi, J.
    Narmatha, V
    2022 SECOND INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL, COMPUTING, COMMUNICATION AND SUSTAINABLE TECHNOLOGIES (ICAECT), 2022,
  • [46] Rethink video retrieval representation for video captioning
    Tian, Mingkai
    Li, Guorong
    Qi, Yuankai
    Wang, Shuhui
    Sheng, Quan Z.
    Huang, Qingming
    PATTERN RECOGNITION, 2024, 156
  • [47] A Review Of Video Captioning Methods
    Mahajan, Dewarthi
    Bhosale, Sakshi
    Nighot, Yash
    Tayal, Madhuri
    INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING, 2021, 12 (05): : 708 - 715
  • [48] Towards Knowledge-Aware Video Captioning via Transitive Visual Relationship Detection
    Wu, Bofeng
    Niu, Guocheng
    Yu, Jun
    Xiao, Xinyan
    Zhang, Jian
    Wu, Hua
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 6753 - 6765
  • [49] Sequence in sequence for video captioning
    Wang, Huiyun
    Gao, Chongyang
    Han, Yahong
    PATTERN RECOGNITION LETTERS, 2020, 130 : 327 - 334
  • [50] Multirate Multimodal Video Captioning
    Yang, Ziwei
    Xu, Youjiang
    Wang, Huiyun
    Wang, Bo
    Han, Yahong
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1877 - 1882