Video captioning with global and local text attention

Cited by: 1
Authors
Peng, Yuqing [1 ]
Wang, Chenxi [1 ]
Pei, Yixin [1 ]
Li, Yingjun [1 ]
Affiliations
[1] Hebei Univ Technol, Sch Artificial Intelligence, Tianjin 300401, Peoples R China
Keywords
Video captioning; Global control; Local strengthening; Bidirectional; AGGREGATION;
DOI
10.1007/s00371-021-02294-0
Chinese Library Classification
TP31 [Computer Software];
Discipline Codes
081202 ; 0835 ;
Abstract
Video captioning aims to generate a textual description corresponding to the content of a video, which places stringent demands on fine-grained video feature extraction and on the language processing of the caption text. This paper proposes a new method that applies global control of the text and local strengthening during training. With this method, the surrounding context can be consulted as text is generated during training. In addition, greater attention is given to important words in the text, such as nouns and predicate verbs, which substantially improves the recognition of objects and yields more accurate prediction of actions in the video. Moreover, the authors adopt multimodal 2D and 3D feature extraction for video features, achieving better results through fine-grained feature capture with global attention and the fusion of bidirectional time flows. The method obtains good results on both the MSR-VTT and MSVD datasets.
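The abstract describes two text-side mechanisms: attention that weights video features against the decoding context ("global control"), and extra training weight on content words such as nouns and predicate verbs ("local strengthening"). The record does not include reference code, so the following is only a minimal illustrative sketch of those two ideas in NumPy; the function names, the POS-tag interface, and the `boost` factor are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def global_attention(query, frame_feats):
    """Scaled dot-product attention of a decoder query over per-frame video features."""
    scores = frame_feats @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    context = weights @ frame_feats              # attended video context vector
    return context, weights

def locally_strengthened_loss(log_probs, targets, pos_tags, boost=2.0):
    """Token cross-entropy where nouns and predicate verbs get `boost` times the weight."""
    w = np.array([boost if tag in ("NOUN", "VERB") else 1.0 for tag in pos_tags])
    nll = -log_probs[np.arange(len(targets)), targets]
    return float((w * nll).sum() / w.sum())
```

In a full model the attended context would condition each decoding step, and the weighted loss would replace plain cross-entropy so that errors on objects and actions are penalized more heavily than errors on function words.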
Pages: 4267-4278
Number of pages: 12