Video captioning with global and local text attention

Cited by: 1
Authors
Peng, Yuqing [1 ]
Wang, Chenxi [1 ]
Pei, Yixin [1 ]
Li, Yingjun [1 ]
Affiliations
[1] Hebei Univ Technol, Sch Artificial Intelligence, Tianjin 300401, Peoples R China
Source
VISUAL COMPUTER, 2022, Vol. 38, Issue 12
Keywords
Video captioning; Global control; Local strengthening; Bidirectional; AGGREGATION
DOI
10.1007/s00371-021-02294-0
Chinese Library Classification (CLC)
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
The task of video captioning is to generate a description that corresponds to the video content, which places stringent requirements on the extraction of fine-grained video features and on the language processing of the label text. This paper proposes a new method that applies global control of the text and local strengthening during training, so that the context can be consulted while the text is generated. In addition, more attention is given to important words in the text, such as nouns and predicate verbs; this greatly improves the recognition of objects and yields more accurate prediction of actions in the video. Moreover, 2D and 3D multimodal feature extraction is adopted in the video feature extraction process. Better results are achieved through the fine-grained feature capture of global attention and the fusion of bidirectional time flow. The method obtains good results on both the MSR-VTT and MSVD datasets.
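The abstract does not give the exact formulation of "local strengthening", but one common way to give nouns and predicate verbs extra weight during caption training is a weighted cross-entropy over the caption tokens. The following is a minimal illustrative sketch of that idea, not the authors' implementation; the function name, the binary `pos_tags` mask, and the `strong_weight` factor are all assumptions introduced here for clarity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def local_strengthening_loss(logits, targets, pos_tags, strong_weight=2.0):
    """Weighted cross-entropy over a caption (illustrative sketch only).

    logits:   list of per-step score lists, one per generated token
    targets:  list of ground-truth token indices, one per step
    pos_tags: 1 for noun/predicate-verb positions, 0 for other words
    Positions tagged 1 contribute strong_weight times as much to the loss,
    which is one plausible reading of "local strengthening".
    """
    total, weight_sum = 0.0, 0.0
    for scores, target, tag in zip(logits, targets, pos_tags):
        prob = softmax(scores)[target]
        w = strong_weight if tag == 1 else 1.0
        total += w * -math.log(prob)
        weight_sum += w
    return total / weight_sum
```

Under this reading, up-weighting noun/verb positions pushes the decoder to fit exactly the tokens that carry object and action information, matching the claimed gains in object recognition and action prediction.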
Pages: 4267-4278
Page count: 12