Video Captioning with Visual and Semantic Features

Cited by: 5
Authors
Lee, Sujin [1 ]
Kim, Incheol [2 ]
Affiliations
[1] Kyonggi Univ, Dept Comp Sci, Grad Sch, Suwon, South Korea
[2] Kyonggi Univ, Dept Comp Sci, Suwon, South Korea
Source
JOURNAL OF INFORMATION PROCESSING SYSTEMS | 2018 / Vol. 14 / No. 6
Keywords
Attention-Based Caption Generation; Deep Neural Networks; Semantic Feature; Video Captioning;
DOI
10.3745/JIPS.02.0098
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Video captioning refers to the process of extracting features from a video and generating video captions using the extracted features. This paper introduces a deep neural network model and its learning method for effective video captioning. In this study, both visual features and semantic features that effectively express the video are used. The visual features of the video are extracted using convolutional neural networks such as C3D and ResNet, while the semantic features are extracted using a semantic feature extraction network proposed in this paper. Furthermore, an attention-based caption generation network is proposed for the effective generation of video captions from the extracted features. The performance and effectiveness of the proposed model are verified through various experiments on two large-scale video benchmarks, the Microsoft Video Description (MSVD) and the Microsoft Research Video-to-Text (MSR-VTT) datasets.
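The abstract describes an attention-based caption generator that fuses per-frame visual features (from CNNs such as C3D and ResNet) with a semantic feature vector. The PyTorch-style decoder below is a minimal sketch of that general scheme, conditioning an LSTM at each step on a word embedding, an attended visual context, and a global semantic vector. It is not the authors' implementation; all names, feature dimensions, and the additive-attention formulation are illustrative assumptions.

# Minimal sketch (not the paper's released code) of an attention-based
# caption decoder over per-frame visual features plus a global semantic
# feature vector. All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionCaptionDecoder(nn.Module):
    def __init__(self, vocab_size, feat_dim=2048, sem_dim=300,
                 embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Additive (Bahdanau-style) attention over frame features.
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_out = nn.Linear(hidden_dim, 1)
        # Each LSTM input concatenates the previous word embedding, the
        # attended visual context, and the fixed semantic feature vector.
        self.lstm = nn.LSTMCell(embed_dim + feat_dim + sem_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def attend(self, frame_feats, h):
        # frame_feats: (B, T, feat_dim); h: (B, hidden_dim)
        scores = self.att_out(torch.tanh(
            self.att_feat(frame_feats) + self.att_hid(h).unsqueeze(1)))
        alpha = torch.softmax(scores, dim=1)        # (B, T, 1)
        return (alpha * frame_feats).sum(dim=1)     # (B, feat_dim)

    def forward(self, frame_feats, sem_feat, captions):
        # captions: (B, L) token ids; teacher forcing over L-1 steps.
        B, L = captions.shape
        h = frame_feats.new_zeros(B, self.lstm.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(L - 1):
            ctx = self.attend(frame_feats, h)
            x = torch.cat([self.embed(captions[:, t]), ctx, sem_feat], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)           # (B, L-1, vocab_size)

In training, the returned logits would be compared against captions[:, 1:] with cross-entropy; the sem_feat argument stands in for the output of the paper's semantic feature extraction network.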
Pages: 1318-1330
Page count: 13
Related Papers (50 total)
  • [31] Rich Visual and Language Representation with Complementary Semantics for Video Captioning
    Tang, Pengjie
    Wang, Hanli
    Li, Qinyu
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (02)
  • [32] Learning to enhance aerial video captioning with visual question answering
    Al Mehmadi, Shima M.
    Bazi, Yakoub
    Al Rahhal, Mohamad M.
    Zuair, Mansour
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2024, 45 (18) : 6395 - 6407
  • [33] MIVCN: Multimodal interaction video captioning network based on semantic association graph
    Wang, Ying
    Huang, Guoheng
    Lin, Yuming
    Yuan, Haoliang
    Pun, Chi-Man
    Ling, Wing-Kuen
    Cheng, Lianglun
    APPLIED INTELLIGENCE, 2022, 52 (05) : 5241 - 5260
  • [34] Memory-attended semantic context-aware network for video captioning
    Chen, Shuqin
    Zhong, Xian
    Wu, Shifeng
    Sun, Zhixin
    Liu, Wenxuan
    Jia, Xuemei
    Xia, Hongxia
    SOFT COMPUTING, 2024, 28 (Suppl 2) : 425 - 425
  • [36] Incorporating the Graph Representation of Video and Text into Video Captioning
    Lu, Min
    Li, Yuan
    2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2022 : 396 - 401
  • [37] Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
    Liu, Fenglin
    Wu, Xian
    You, Chenyu
    Ge, Shen
    Zou, Yuexian
    Sun, Xu
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 9255 - 9268
  • [38] Video Captioning Based on C3D and Visual Elements
    Xiao H.
    Shi J.
    2018, South China University of Technology (46) : 88 - 95
  • [39] Multi-scale features with temporal information guidance for video captioning
    Zhao, Hong
    Chen, Zhiwen
    Yang, Yi
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
  • [40] Multimodal Deep Neural Network with Image Sequence Features for Video Captioning
    Oura, Soichiro
    Matsukawa, Tetsu
    Suzuki, Einoshin
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018