Semantic Enhanced Video Captioning with Multi-feature Fusion

Cited by: 3
Authors
Niu, Tian-Zi [1]
Dong, Shan-Shan [1]
Chen, Zhen-Duo [1]
Luo, Xin [1]
Guo, Shanqing [2]
Huang, Zi [3]
Xu, Xin-Shun [1]
Affiliations
[1] Shandong Univ, Sch Software, Jinan 250101, Peoples R China
[2] Shandong Univ, Sch Cyber Sci & Technol, Qingdao 266237, Peoples R China
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Australia
Funding
National Natural Science Foundation of China
Keywords
Video captioning; semantic encoder; discrete selection; multi-feature fusion; NETWORK;
DOI
10.1145/3588572
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Video captioning aims to automatically describe a video clip with informative sentences. At present, deep learning-based models have become the mainstream for this task and achieved competitive results on public datasets. Usually, these methods leverage different types of features to generate sentences, e.g., semantic information, 2D or 3D features. However, some methods only treat semantic information as a complement of visual representations and cannot fully exploit it; some of them ignore the relationship between different types of features. In addition, most of them select multiple frames of a video with an equally spaced sampling scheme, resulting in much redundant information. To address these issues, we present a novel video-captioning framework, Semantic Enhanced video captioning with Multi-feature Fusion, SEMF for short. It optimizes the use of different types of features from three aspects. First, a semantic encoder is designed to enhance meaningful semantic features through a semantic dictionary to boost performance. Second, a discrete selection module pays attention to important features and obtains different contexts at different steps to reduce feature redundancy. Finally, a multi-feature fusion module uses a novel relation-aware attention mechanism to separate the common and complementary components of different features to provide more effective visual features for the next step. Moreover, the entire framework can be trained in an end-to-end manner. Extensive experiments are conducted on the Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets. The results demonstrate that SEMF is able to achieve state-of-the-art results.
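For readers unfamiliar with the equally spaced sampling scheme that the abstract names as a source of redundancy, the following is a minimal Python sketch of that baseline only. It is not the SEMF discrete selection module, whose details are not given in this record. The function name `equally_spaced_indices` and the choice to take the centre of each segment are assumptions made for illustration.

```python
from typing import List


def equally_spaced_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly over a clip.

    Hypothetical helper: illustrates the generic baseline sampling scheme,
    not any component of SEMF.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    # A clip shorter than the sample budget is returned whole.
    if num_samples >= num_frames:
        return list(range(num_frames))
    # Split the clip into num_samples equal-width segments and take the
    # centre frame of each segment (an assumed convention).
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


if __name__ == "__main__":
    print(equally_spaced_indices(100, 4))
```

Because the indices depend only on the clip length, static or near-duplicate stretches of a video are sampled as often as informative ones, which is the redundancy the abstract refers to.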
Pages: 21
Related papers
50 in total
  • [41] Survival Situation Awareness Based on Multi-feature Fusion
    Zhao, Jinhui
    Shuo, Liangxun
    Qian, Xu
    INDUSTRIAL INSTRUMENTATION AND CONTROL SYSTEMS, PTS 1-4, 2013, 241-244 : 2528 - +
  • [42] Pipeline signal feature extraction with improved VMD and multi-feature fusion
    Zhou, Yina
    Zhang, Yong
    Yang, Dandi
    Lu, Jingyi
    Dong, Hongli
    Li, Gongfa
    SYSTEMS SCIENCE & CONTROL ENGINEERING, 2020, 8 (01) : 318 - 327
  • [43] Sequence Neural Network for Recommendation with Multi-feature Fusion
    Gu, Xiao
    Zhao, Haiping
    Jian, Ling
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 210
  • [44] Multi-Feature Fusion for Enhancing Image Similarity Learning
    Lu, Jian
    Ma, Cheng-Xian
    Zhou, Yan-Ran
    Luo, Mao-Xin
    Zhang, Kai-Bing
    IEEE ACCESS, 2019, 7 : 167547 - 167556
  • [45] ADAPTIVE MULTI-FEATURE FUSION FOR ROBUST OBJECT TRACKING
    Liu, Mengxue
    Qi, Yujuan
    Wang, Yanjiang
    Liu, Baodi
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 1884 - 1888
  • [46] Multi-level video captioning method based on semantic space
    Yao, Xiao
    Zeng, Yuanlin
    Gu, Min
    Yuan, Ruxi
    Li, Jie
    Ge, Junyi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (28) : 72113 - 72130
  • [47] MO-QoE: Video QoE using multi-feature fusion based Optimized Learning Models
    Ghosh, Monalisa
    Singhal, Chetna
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2022, 107
  • [48] CenterNet-SPP based on multi-feature fusion for basketball posture recognition
    Jin, Zhouxiang
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2022, 9 (36):
  • [49] A Model for Yellow Tea Polyphenols Content Estimation Based on Multi-Feature Fusion
    Yang, Baohua
    Zhu, Yue
    Wang, Mengxuan
    Ning, Jingming
    IEEE ACCESS, 2019, 7 : 180054 - 180063
  • [50] Video Captioning with Visual and Semantic Features
    Lee, Sujin
    Kim, Incheol
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2018, 14 (06) : 1318 - 1330