Semantic Enhanced Video Captioning with Multi-feature Fusion

Cited by: 3

Authors:

Niu, Tian-Zi [1]

Dong, Shan-Shan [1]

Chen, Zhen-Duo [1]

Luo, Xin [1]

Guo, Shanqing [2]

Huang, Zi [3]

Xu, Xin-Shun [1]

Affiliations:

[1] Shandong Univ, Sch Software, Jinan 250101, Peoples R China

[2] Shandong Univ, Sch Cyber Sci & Technol, Qingdao 266237, Peoples R China

[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Australia

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2023, Vol. 19, No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Video captioning; semantic encoder; discrete selection; multi-feature fusion; NETWORK;

DOI:

10.1145/3588572

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Video captioning aims to automatically describe a video clip with informative sentences. At present, deep learning-based models have become the mainstream for this task and achieved competitive results on public datasets. Usually, these methods leverage different types of features to generate sentences, e.g., semantic information, 2D or 3D features. However, some methods only treat semantic information as a complement of visual representations and cannot fully exploit it; some of them ignore the relationship between different types of features. In addition, most of them select multiple frames of a video with an equally spaced sampling scheme, resulting in much redundant information. To address these issues, we present a novel video-captioning framework, Semantic Enhanced video captioning with Multi-feature Fusion, SEMF for short. It optimizes the use of different types of features from three aspects. First, a semantic encoder is designed to enhance meaningful semantic features through a semantic dictionary to boost performance. Second, a discrete selection module pays attention to important features and obtains different contexts at different steps to reduce feature redundancy. Finally, a multi-feature fusion module uses a novel relation-aware attention mechanism to separate the common and complementary components of different features to provide more effective visual features for the next step. Moreover, the entire framework can be trained in an end-to-end manner. Extensive experiments are conducted on Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets. The results demonstrate that SEMF is able to achieve state-of-the-art results.

Pages: 21
