Global-Attention-Based Neural Networks for Vision Language Intelligence

Cited by: 14
Authors
Liu, Pei [1 ]
Zhou, Yingjie [1 ]
Peng, Dezhong [1 ,2 ,3 ]
Wu, Dapeng [4 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Sichuan Zhiqian Technol Co Ltd, Chengdu 610041, Peoples R China
[3] Shenzhen Peng Cheng Lab, Shenzhen 518052, Peoples R China
[4] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
Funding
National Natural Science Foundation of China
Keywords
Global attention; image captioning; latent contribution;
DOI
10.1109/JAS.2020.1003402
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
In this paper, we develop a novel global-attention-based neural network (GANN) for vision language intelligence, specifically image captioning (generating a language description of a given image). As in many previous works, our model adopts the encoder-decoder framework: the encoder encodes the region-proposal features and extracts a global caption feature through a specially designed module that predicts the objects to be captioned, while the decoder generates captions by feeding the global caption feature, together with the encoded visual features, into each attention head of the decoder layer. The global caption feature is introduced to explore the latent contributions of region proposals to image captioning and to help the decoder focus on the most relevant proposals, so that more accurate visual features are extracted at each time step of caption generation. GANN is implemented by incorporating the global caption feature into the attention-weight calculation of the word-prediction process in each head of the decoder layer. In our experiments, we qualitatively analyzed the proposed model and quantitatively compared GANN with several state-of-the-art schemes on the MS-COCO dataset. Experimental results demonstrate the effectiveness of the proposed global attention mechanism for image captioning.
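The abstract describes injecting the global caption feature into the attention-weight calculation of each decoder attention head. The PyTorch sketch below is a rough illustration only, not the authors' released code: the module name GlobalAttentionHead, the additive combination of local and global attention scores, and all dimensions are assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's implementation) of a decoder attention
# head whose scores are biased by a global caption feature g, so region proposals
# relevant to the caption as a whole receive larger attention weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalAttentionHead(nn.Module):
    """One attention head conditioned on a global caption feature (illustrative)."""

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_head)   # query from the decoder state
        self.k_proj = nn.Linear(d_model, d_head)   # keys from region-proposal features
        self.v_proj = nn.Linear(d_model, d_head)   # values from region-proposal features
        self.g_proj = nn.Linear(d_model, d_head)   # projection of the global caption feature
        self.scale = d_head ** -0.5

    def forward(self, query, regions, global_feat):
        # query:       (batch, 1, d_model)          decoder hidden state at the current step
        # regions:     (batch, n_regions, d_model)  encoded region-proposal features
        # global_feat: (batch, d_model)             global caption feature from the encoder
        q = self.q_proj(query)                      # (batch, 1, d_head)
        k = self.k_proj(regions)                    # (batch, n_regions, d_head)
        v = self.v_proj(regions)                    # (batch, n_regions, d_head)
        g = self.g_proj(global_feat).unsqueeze(1)   # (batch, 1, d_head)

        # Bias the usual scaled dot-product scores with a query-independent global term.
        local_scores = (q * self.scale) @ k.transpose(1, 2)   # (batch, 1, n_regions)
        global_scores = (g * self.scale) @ k.transpose(1, 2)  # (batch, 1, n_regions)
        attn = F.softmax(local_scores + global_scores, dim=-1)
        return attn @ v                                        # (batch, 1, d_head)


if __name__ == "__main__":
    head = GlobalAttentionHead(d_model=512, d_head=64)
    out = head(torch.randn(2, 1, 512), torch.randn(2, 36, 512), torch.randn(2, 512))
    print(out.shape)  # torch.Size([2, 1, 64])
```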
Pages: 1243-1252 (10 pages)