Video captioning with global and local text attention

Cited by: 1
Authors
Peng, Yuqing [1 ]
Wang, Chenxi [1 ]
Pei, Yixin [1 ]
Li, Yingjun [1 ]
Affiliations
[1] Hebei Univ Technol, Sch Artificial Intelligence, Tianjin 300401, Peoples R China
Source
VISUAL COMPUTER | 2022, Vol. 38, Issue 12
Keywords
Video captioning; Global control; Local strengthening; Bidirectional; Aggregation
DOI
10.1007/s00371-021-02294-0
Chinese Library Classification (CLC)
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Video captioning aims to generate a description that corresponds to the video content, which places stringent requirements both on the extraction of fine-grained video features and on the language processing of the caption text. This paper proposes a new method that applies global control of the text and local strengthening during training, so that context can be consulted while the caption text is generated. In addition, more attention is given to important words in the text, such as nouns and predicate verbs; this markedly improves the recognition of objects and yields more accurate prediction of actions in the video. Moreover, the authors adopt 2D and 3D multimodal feature extraction for the video, and better results are achieved through the fine-grained feature capture of global attention and the fusion of bidirectional temporal flow. The method obtains good results on both the MSR-VTT and MSVD datasets.
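The abstract's "local strengthening" of important words (nouns and predicate verbs) can be read as upweighting those tokens during training. The paper's exact mechanism and weight values are not given here, so the sketch below is only an illustrative assumption: a per-token weighted caption loss in NumPy, where the POS tags, weight values, and function names are hypothetical.

```python
import numpy as np

def local_strengthening_weights(pos_tags, strong_weight=2.0, base_weight=1.0):
    """Assign a larger loss weight to nouns and verbs than to other tokens.

    The weight values and the choice of POS categories are illustrative
    assumptions, not the paper's reported configuration.
    """
    strong = {"NOUN", "VERB"}
    return np.array([strong_weight if t in strong else base_weight
                     for t in pos_tags])

def weighted_caption_loss(log_probs, target_ids, weights):
    """Per-token negative log-likelihood, scaled by the strengthening weights."""
    nll = -log_probs[np.arange(len(target_ids)), target_ids]
    return float(np.sum(weights * nll) / np.sum(weights))

# Toy example: a 4-token caption over a 5-word vocabulary.
pos_tags = ["DET", "NOUN", "VERB", "ADP"]   # e.g. "a dog runs across"
logits = np.random.default_rng(0).normal(size=(4, 5))
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
targets = np.array([0, 1, 2, 3])

w = local_strengthening_weights(pos_tags)
loss = weighted_caption_loss(log_probs, targets, w)
```

Under this reading, the noun and verb positions contribute twice as much to the training signal as function words, which is one plausible way the method could bias the decoder toward accurate objects and actions.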
Pages: 4267-4278
Page count: 12
Related papers
50 records total
  • [1] Video captioning with global and local text attention
    Yuqing Peng
    Chenxi Wang
    Yixin Pei
    Yingjun Li
    The Visual Computer, 2022, 38 : 4267 - 4278
  • [2] Recurrent convolutional video captioning with global and local attention
    Jin, Tao
    Li, Yingming
    Zhang, Zhongfei
    NEUROCOMPUTING, 2019, 370 : 118 - 127
  • [3] Dense video captioning based on local attention
    Qian, Yong
    Mao, Yingchi
    Chen, Zhihao
    Li, Chang
    Bloh, Olano Teah
    Huang, Qian
    IET IMAGE PROCESSING, 2023, 17 (09) : 2673 - 2685
  • [4] Video Captioning Using Global-Local Representation
    Yan, Liqi
    Ma, Siqi
    Wang, Qifan
    Chen, Yingjie
    Zhang, Xiangyu
    Savakis, Andreas
    Liu, Dongfang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 6642 - 6656
  • [5] Variational Stacked Local Attention Networks for Diverse Video Captioning
    Deb, Tonmoay
    Sadmanee, Akib
    Bhaumik, Kishor Kumar
    Ali, Amin Ahsan
    Amin, M. Ashraful
    Rahman, A. K. M. Mahbubur
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 2493 - 2502
  • [6] Local-global visual interaction attention for image captioning
    Wang, Changzhi
    Gu, Xiaodong
    DIGITAL SIGNAL PROCESSING, 2022, 130
  • [7] Video captioning with text-based dynamic attention and step-by-step learning
    Xiao, Huanhou
    Shi, Jinglun
    PATTERN RECOGNITION LETTERS, 2020, 133 : 305 - 312
  • [8] A GLOBAL-LOCAL CONTRASTIVE LEARNING FRAMEWORK FOR VIDEO CAPTIONING
    Huang, Qunyue
    Fang, Bin
    Ai, Xi
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2410 - 2414
  • [9] Hierarchical Global-Local Temporal Modeling for Video Captioning
    Hu, Yaosi
    Chen, Zhenzhong
    Zha, Zheng-Jun
    Wu, Feng
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 774 - 783
  • [10] Global-Local Combined Semantic Generation Network for Video Captioning
    Mao, L.
    Gao, H.
    Yang, D.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2023, 35 (09): : 1374 - 1382