Video captioning with global and local text attention

Cited by: 1
Authors
Peng, Yuqing [1 ]
Wang, Chenxi [1 ]
Pei, Yixin [1 ]
Li, Yingjun [1 ]
Affiliations
[1] Hebei Univ Technol, Sch Artificial Intelligence, Tianjin 300401, Peoples R China
Keywords
Video captioning; Global control; Local strengthening; Bidirectional; AGGREGATION;
DOI
10.1007/s00371-021-02294-0
Chinese Library Classification
TP31 [Computer Software];
Discipline Codes
081202 ; 0835 ;
Abstract
Video captioning aims to generate a textual description corresponding to the content of a video, which places stringent demands on fine-grained video feature extraction and on the language processing of the caption text. This paper proposes a new method that applies global control of the text and local strengthening during training. With this method, the surrounding context can be consulted as text is generated during training. In addition, greater attention is given to important words in the text, such as nouns and predicate verbs, which substantially improves the recognition of objects and yields more accurate prediction of actions in the video. Moreover, the authors adopt multimodal 2D and 3D feature extraction for video features, achieving better results through fine-grained feature capture with global attention and the fusion of bidirectional time flows. The method obtains good results on both the MSR-VTT and MSVD datasets.
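The abstract describes two text-side mechanisms: attention that weights video features against the decoding context ("global control"), and extra training weight on content words such as nouns and predicate verbs ("local strengthening"). The record does not include reference code, so the following is only a minimal illustrative sketch of those two ideas in NumPy; the function names, the POS-tag interface, and the `boost` factor are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def global_attention(query, frame_feats):
    """Scaled dot-product attention of a decoder query over per-frame video features."""
    scores = frame_feats @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    context = weights @ frame_feats              # attended video context vector
    return context, weights

def locally_strengthened_loss(log_probs, targets, pos_tags, boost=2.0):
    """Token cross-entropy where nouns and predicate verbs get `boost` times the weight."""
    w = np.array([boost if tag in ("NOUN", "VERB") else 1.0 for tag in pos_tags])
    nll = -log_probs[np.arange(len(targets)), targets]
    return float((w * nll).sum() / w.sum())
```

In a full model the attended context would condition each decoding step, and the weighted loss would replace plain cross-entropy so that errors on objects and actions are penalized more heavily than errors on function words.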
Pages: 4267-4278
Number of pages: 12