Semantic Enhanced Video Captioning with Multi-feature Fusion

Cited by: 3

Authors:

Niu, Tian-Zi [1]

Dong, Shan-Shan [1]

Chen, Zhen-Duo [1]

Luo, Xin [1]

Guo, Shanqing [2]

Huang, Zi [3]

Xu, Xin-Shun [1]

Affiliations:

[1] Shandong Univ, Sch Software, Jinan 250101, Peoples R China

[2] Shandong Univ, Sch Cyber Sci & Technol, Qingdao 266237, Peoples R China

[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Australia

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2023 / Vol. 19 / Issue 06

Funding:

National Natural Science Foundation of China;

Keywords:

Video captioning; semantic encoder; discrete selection; multi-feature fusion; NETWORK;

DOI:

10.1145/3588572

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Video captioning aims to automatically describe a video clip with informative sentences. At present, deep learning-based models have become the mainstream approach for this task and have achieved competitive results on public datasets. Usually, these methods leverage different types of features to generate sentences, e.g., semantic information and 2D or 3D features. However, some methods treat semantic information only as a complement to visual representations and cannot fully exploit it, while others ignore the relationships between different types of features. In addition, most of them select multiple frames of a video with an equally spaced sampling scheme, which results in a large amount of redundant information. To address these issues, we present a novel video-captioning framework, Semantic Enhanced video captioning with Multi-feature Fusion, SEMF for short. It optimizes the use of different types of features in three respects. First, a semantic encoder is designed to enhance meaningful semantic features through a semantic dictionary to boost performance. Second, a discrete selection module pays attention to important features and obtains different contexts at different steps to reduce feature redundancy. Finally, a multi-feature fusion module uses a novel relation-aware attention mechanism to separate the common and complementary components of different features to provide more effective visual features for the next step. Moreover, the entire framework can be trained in an end-to-end manner. Extensive experiments are conducted on the Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets. The results demonstrate that SEMF is able to achieve state-of-the-art results.
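
For readers unfamiliar with the equally spaced sampling scheme that the abstract criticizes, the following is a minimal illustrative sketch of that baseline only. It is not taken from the paper and is not SEMF's discrete selection module. The function name `equally_spaced_indices` and its parameters are hypothetical, chosen here for illustration.

```python
from typing import List


def equally_spaced_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly over a clip.

    Hypothetical helper illustrating the baseline sampling scheme;
    it looks only at frame positions, never at frame content.
    """
    if num_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= num_frames:
        # Fewer frames than requested samples: keep every frame.
        return list(range(num_frames))
    if num_samples == 1:
        return [0]
    # Distance between consecutive picks so that the first and last
    # frames of the clip are both included.
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


if __name__ == "__main__":
    print(equally_spaced_indices(9, 5))
```

Because the indices depend only on the clip length, a static scene and a fast-changing scene of equal length yield the same picks, which is one way such a scheme can retain redundant frames.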


Pages: 21
