Video Captioning with Visual and Semantic Features

Cited by: 5
Authors
Lee, Sujin [1 ]
Kim, Incheol [2 ]
Affiliations
[1] Kyonggi Univ, Dept Comp Sci, Grad Sch, Suwon, South Korea
[2] Kyonggi Univ, Dept Comp Sci, Suwon, South Korea
Source
JOURNAL OF INFORMATION PROCESSING SYSTEMS | 2018, Vol. 14, No. 6
Keywords
Attention-Based Caption Generation; Deep Neural Networks; Semantic Feature; Video Captioning;
DOI
10.3745/JIPS.02.0098
CLC classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Video captioning is the process of extracting features from a video and generating captions from those features. This paper introduces a deep neural network model and its training method for effective video captioning. The model uses both visual features and semantic features that effectively represent the video. Visual features are extracted with convolutional neural networks such as C3D and ResNet, while semantic features are extracted with a semantic feature extraction network proposed in this paper. An attention-based caption generation network is then proposed to generate video captions effectively from the extracted features. The performance and effectiveness of the proposed model are verified through experiments on two large-scale video benchmarks: the Microsoft Video Description (MSVD) and the Microsoft Research Video-To-Text (MSR-VTT) datasets.
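The attention-based caption generation the abstract describes (soft attention over frame-level visual features, fused with a semantic feature vector before word prediction) can be sketched roughly as below. This is a minimal NumPy illustration, not the paper's exact architecture: the dimensions, the projection matrices `W_v`, `W_h`, `W_s`, `w_a`, and the additive (Bahdanau-style) attention form are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: T frames, d-dim visual features,
# k-dim semantic vector, h-dim decoder hidden state.
T, d, k, h = 8, 16, 10, 32

V = rng.normal(size=(T, d))   # frame-level visual features (e.g., from C3D/ResNet)
s = rng.normal(size=(k,))     # semantic feature vector (e.g., tag probabilities)
h_t = rng.normal(size=(h,))   # decoder hidden state at the current step

# Hypothetical projection matrices; these would be learned in practice.
W_v = rng.normal(size=(h, d))
W_h = rng.normal(size=(h, h))
w_a = rng.normal(size=(h,))
W_s = rng.normal(size=(h, k))

# Additive attention scores over the T frames, conditioned on h_t.
scores = np.tanh(V @ W_v.T + h_t @ W_h.T) @ w_a   # shape (T,)

# Softmax normalization: attention weights over frames, summing to 1.
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()

# Attended visual context: weighted average of frame features.
context = alpha @ V                               # shape (d,)

# Fuse the visual context with the semantic feature; this fused vector
# would feed the word-prediction softmax at each decoding step.
fused = np.tanh(W_v @ context + W_s @ s)          # shape (h,)
```

At each decoding step the weights `alpha` are recomputed from the current hidden state, so the decoder can attend to different frames while emitting different words of the caption.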
Pages: 1318-1330 (13 pages)
Related Papers
50 items
  • [1] Attentive Visual Semantic Specialized Network for Video Captioning
    Perez-Martin, Jesus
    Bustos, Benjamin
    Perez, Jorge
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5767 - 5774
  • [2] Richer Semantic Visual and Language Representation for Video Captioning
    Tang, Pengjie
    Wang, Hanli
    Wang, Hanzhang
    Xu, Kaisheng
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1871 - 1876
  • [3] Video Captioning with Semantic Guiding
    Yuan, Jin
    Tian, Chunna
    Zhang, Xiangnan
    Ding, Yuxuan
    Wei, Wei
    2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018,
  • [4] Chained semantic generation network for video captioning
    Mao, Lu
    Gao, Hong
    Yang, Dong
    Zhang, Rui
    Guangxue Jingmi Gongcheng/Optics and Precision Engineering, 2022, 30 (24): : 3198 - 3209
  • [5] Modeling Context-Guided Visual and Linguistic Semantic Feature for Video Captioning
    Sun, Zhixin
    Zhong, Xian
    Chen, Shuqin
    Liu, Wenxuan
    Feng, Duxiu
    Li, Lin
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895 : 677 - 689
  • [6] Semantic Embedding Guided Attention with Explicit Visual Feature Fusion for Video Captioning
    Dong, Shanshan
    Niu, Tianzi
    Luo, Xin
    Liu, Wu
    Xu, Xinshun
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (02)
  • [7] Multi-Level Visual Representation with Semantic-Reinforced Learning for Video Captioning
    Dong, Chengbo
    Chen, Xinru
    Chen, Aozhu
    Hu, Fan
    Wang, Zihan
    Li, Xirong
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4750 - 4754
  • [8] Global semantic enhancement network for video captioning
    Luo, Xuemei
    Luo, Xiaotong
    Wang, Di
    Liu, Jinhui
    Wan, Bo
    Zhao, Lin
    PATTERN RECOGNITION, 2024, 145
  • [9] Adaptive semantic guidance network for video captioning
    Liu, Yuanyuan
    Zhu, Hong
    Wu, Zhong
    Du, Sen
    Wu, Shuning
    Shi, Jing
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [10] MULTIMODAL SEMANTIC ATTENTION NETWORK FOR VIDEO CAPTIONING
    Sun, Liang
    Li, Bing
    Yuan, Chunfeng
    Zha, Zhengjun
    Hu, Weiming
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1300 - 1305