Video captioning with global and local text attention

Cited by: 1
Authors
Peng, Yuqing [1 ]
Wang, Chenxi [1 ]
Pei, Yixin [1 ]
Li, Yingjun [1 ]
Affiliations
[1] Hebei Univ Technol, Sch Artificial Intelligence, Tianjin 300401, Peoples R China
Source
VISUAL COMPUTER | 2022, Vol. 38, Issue 12
Keywords
Video captioning; Global control; Local strengthening; Bidirectional; Aggregation
DOI
10.1007/s00371-021-02294-0
Chinese Library Classification (CLC)
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Video captioning aims to generate a description that corresponds to the video content, which places stringent requirements both on the extraction of fine-grained video features and on the language processing of the caption text. This paper proposes a new method that applies global control of the text and local strengthening during training, so that context can be consulted while the caption text is generated. In addition, more attention is given to important words in the text, such as nouns and predicate verbs; this markedly improves the recognition of objects and yields more accurate prediction of actions in the video. Moreover, the authors adopt 2D and 3D multimodal feature extraction for the video, and better results are achieved through the fine-grained feature capture of global attention and the fusion of bidirectional temporal flow. The method obtains good results on both the MSR-VTT and MSVD datasets.
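The abstract's "local strengthening" of important words (nouns and predicate verbs) can be read as upweighting those tokens during training. The paper's exact mechanism and weight values are not given here, so the sketch below is only an illustrative assumption: a per-token weighted caption loss in NumPy, where the POS tags, weight values, and function names are hypothetical.

```python
import numpy as np

def local_strengthening_weights(pos_tags, strong_weight=2.0, base_weight=1.0):
    """Assign a larger loss weight to nouns and verbs than to other tokens.

    The weight values and the choice of POS categories are illustrative
    assumptions, not the paper's reported configuration.
    """
    strong = {"NOUN", "VERB"}
    return np.array([strong_weight if t in strong else base_weight
                     for t in pos_tags])

def weighted_caption_loss(log_probs, target_ids, weights):
    """Per-token negative log-likelihood, scaled by the strengthening weights."""
    nll = -log_probs[np.arange(len(target_ids)), target_ids]
    return float(np.sum(weights * nll) / np.sum(weights))

# Toy example: a 4-token caption over a 5-word vocabulary.
pos_tags = ["DET", "NOUN", "VERB", "ADP"]   # e.g. "a dog runs across"
logits = np.random.default_rng(0).normal(size=(4, 5))
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
targets = np.array([0, 1, 2, 3])

w = local_strengthening_weights(pos_tags)
loss = weighted_caption_loss(log_probs, targets, w)
```

Under this reading, the noun and verb positions contribute twice as much to the training signal as function words, which is one plausible way the method could bias the decoder toward accurate objects and actions.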
Pages: 4267-4278
Page count: 12
Related papers
50 records total
  • [1] Video captioning with global and local text attention
    Yuqing Peng
    Chenxi Wang
    Yixin Pei
    Yingjun Li
    The Visual Computer, 2022, 38 : 4267 - 4278
  • [2] Recurrent convolutional video captioning with global and local attention
    Jin, Tao
    Li, Yingming
    Zhang, Zhongfei
    NEUROCOMPUTING, 2019, 370 : 118 - 127
  • [3] Dense video captioning based on local attention
    Qian, Yong
    Mao, Yingchi
    Chen, Zhihao
    Li, Chang
    Bloh, Olano Teah
    Huang, Qian
    IET IMAGE PROCESSING, 2023, 17 (09) : 2673 - 2685
  • [4] Video Captioning Using Global-Local Representation
    Yan, Liqi
    Ma, Siqi
    Wang, Qifan
    Chen, Yingjie
    Zhang, Xiangyu
    Savakis, Andreas
    Liu, Dongfang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 6642 - 6656
  • [5] Variational Stacked Local Attention Networks for Diverse Video Captioning
    Deb, Tonmoay
    Sadmanee, Akib
    Bhaumik, Kishor Kumar
    Ali, Amin Ahsan
    Amin, M. Ashraful
    Rahman, A. K. M. Mahbubur
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 2493 - 2502
  • [6] Local-global visual interaction attention for image captioning
    Wang, Changzhi
    Gu, Xiaodong
    DIGITAL SIGNAL PROCESSING, 2022, 130
  • [7] Video captioning with text-based dynamic attention and step-by-step learning
    Xiao, Huanhou
    Shi, Jinglun
    PATTERN RECOGNITION LETTERS, 2020, 133 : 305 - 312
  • [8] A GLOBAL-LOCAL CONTRASTIVE LEARNING FRAMEWORK FOR VIDEO CAPTIONING
    Huang, Qunyue
    Fang, Bin
    Ai, Xi
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2410 - 2414
  • [9] Hierarchical Global-Local Temporal Modeling for Video Captioning
    Hu, Yaosi
    Chen, Zhenzhong
    Zha, Zheng-Jun
    Wu, Feng
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 774 - 783
  • [10] Global-Local Combined Semantic Generation Network for Video Captioning
    Mao, L.
    Gao, H.
    Yang, D.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2023, 35 (09): : 1374 - 1382