Semantic Enhanced Video Captioning with Multi-feature Fusion

Cited by: 3

Authors:

Niu, Tian-Zi [1]

Dong, Shan-Shan [1]

Chen, Zhen-Duo [1]

Luo, Xin [1]

Guo, Shanqing [2]

Huang, Zi [3]

Xu, Xin-Shun [1]

Affiliations:

[1] Shandong Univ, Sch Software, Jinan 250101, Peoples R China

[2] Shandong Univ, Sch Cyber Sci & Technol, Qingdao 266237, Peoples R China

[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Australia

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2023 / Vol. 19 / Issue 06

Funding:

National Natural Science Foundation of China;

Keywords:

Video captioning; semantic encoder; discrete selection; multi-feature fusion; NETWORK;

DOI:

10.1145/3588572

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Video captioning aims to automatically describe a video clip with informative sentences. At present, deep learning-based models have become the mainstream approach for this task and have achieved competitive results on public datasets. Usually, these methods leverage different types of features to generate sentences, e.g., semantic information and 2D or 3D features. However, some methods treat semantic information only as a complement to visual representations and cannot fully exploit it; others ignore the relationship between different types of features. In addition, most of them select multiple frames of a video with an equally spaced sampling scheme, resulting in much redundant information. To address these issues, we present a novel video-captioning framework, Semantic Enhanced video captioning with Multi-feature Fusion, SEMF for short. It optimizes the use of different types of features from three aspects. First, a semantic encoder is designed to enhance meaningful semantic features through a semantic dictionary to boost performance. Second, a discrete selection module pays attention to important features and obtains different contexts at different steps to reduce feature redundancy. Finally, a multi-feature fusion module uses a novel relation-aware attention mechanism to separate the common and complementary components of different features to provide more effective visual features for the next step. Moreover, the entire framework can be trained in an end-to-end manner. Extensive experiments are conducted on the Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets. The results demonstrate that SEMF is able to achieve state-of-the-art results.
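
As a point of reference for the equally spaced sampling scheme the abstract identifies as a source of redundancy, the baseline can be sketched as below. This is an illustrative sketch only and is not taken from the paper; the function name `equally_spaced_indices` and its parameters `num_frames` and `num_samples` are assumptions introduced here.

```python
def equally_spaced_indices(num_frames: int, num_samples: int) -> list[int]:
    """Return num_samples frame indices spread evenly over [0, num_frames - 1].

    Hypothetical helper for illustration; not part of SEMF.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        # With a single sample, take the middle frame.
        return [num_frames // 2]
    # Fixed stride between consecutive samples, independent of frame content.
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


if __name__ == "__main__":
    print(equally_spaced_indices(10, 4))
```

Because the stride is fixed regardless of content, a static stretch of video contributes as many sampled frames as an eventful one, which is the redundancy a content-dependent selection module is meant to reduce.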


Pages: 21
