Global-Attention-Based Neural Networks for Vision Language Intelligence

Cited by: 14
Authors
Liu, Pei [1]
Zhou, Yingjie [1]
Peng, Dezhong [1, 2, 3]
Wu, Dapeng [4]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Sichuan Zhiqian Technol Co Ltd, Chengdu 610041, Peoples R China
[3] Shenzhen Peng Cheng Lab, Shenzhen 518052, Peoples R China
[4] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
Funding
National Natural Science Foundation of China
Keywords
Global attention; image captioning; latent contribution;
DOI
10.1109/JAS.2020.1003402
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we develop a novel global-attention-based neural network (GANN) for vision language intelligence, specifically image captioning (generating a language description of a given image). As in many previous works, our model adopts the encoder-decoder framework. The encoder encodes the region proposal features and extracts a global caption feature using a specially designed module that predicts the caption objects. The decoder generates captions by taking the global caption feature, along with the encoded visual features, as inputs to each attention head of the decoder layer. The global caption feature is introduced to explore the latent contributions of region proposals to image captioning and to help the decoder focus on the most relevant proposals, so that it extracts more accurate visual features at each time step of caption generation. GANN is implemented by incorporating the global caption feature into the attention weight calculation during word prediction in each head of the decoder layer. In our experiments, we qualitatively analyze the proposed model and quantitatively evaluate several state-of-the-art schemes with GANN on the MS-COCO dataset. Experimental results demonstrate the effectiveness of the proposed global attention mechanism for image captioning.
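The abstract says only that the global caption feature enters the attention weight calculation in each decoder head; it does not give the formula. The following is a minimal sketch of one way that could look, not the authors' implementation. It assumes the global feature is scored against each region key and added to the usual scaled dot-product logit. The function name `global_attention_head` and all argument names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def global_attention_head(query, keys, values, global_feature):
    """One attention head whose weights also depend on a global feature.

    ASSUMPTION (not from the paper): the global caption feature is scored
    against every region key and that score is added to the standard
    query-key logit before the softmax.
    """
    scale = math.sqrt(len(query))
    logits = [(dot(query, k) + dot(global_feature, k)) / scale for k in keys]
    weights = softmax(logits)
    dim = len(values[0])
    # Weighted sum of region values gives the attended visual feature.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy example: three region proposals with 2-dimensional features.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = global_attention_head(query, keys, values, [0.0, 2.0])
print(weights, context)
```

Under this assumption, a zero global feature reduces the head to ordinary scaled dot-product attention, while a non-zero one shifts weight toward the regions it scores highly.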
Pages: 1243-1252
Page count: 10
Source
IEEE/CAA Journal of Automatica Sinica, 2021, 8 (07): 1243-1252