Global-Attention-Based Neural Networks for Vision Language Intelligence

Cited by: 14
Authors
Liu, Pei [1]
Zhou, Yingjie [1]
Peng, Dezhong [1, 2, 3]
Wu, Dapeng [4]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Sichuan Zhiqian Technol Co Ltd, Chengdu 610041, Peoples R China
[3] Shenzhen Peng Cheng Lab, Shenzhen 518052, Peoples R China
[4] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
Funding
National Natural Science Foundation of China
Keywords
Global attention; image captioning; latent contribution
DOI
10.1109/JAS.2020.1003402
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we develop a novel global-attention-based neural network (GANN) for vision language intelligence, specifically image captioning (generating a natural-language description of a given image). As in many previous works, our model adopts the encoder-decoder framework. The encoder encodes the region proposal features and extracts a global caption feature using a specially designed module that predicts the caption objects; the decoder generates captions by taking the global caption feature, along with the encoded visual features, as inputs to each attention head of the decoder layer. The global caption feature is introduced to explore the latent contributions of region proposals to image captioning and to help the decoder focus on the most relevant proposals, so that it extracts more accurate visual features at each time step of caption generation. GANN is implemented by incorporating the global caption feature into the attention-weight calculation of the word prediction process in each head of the decoder layer. In our experiments, we qualitatively analyzed the proposed model and quantitatively evaluated several state-of-the-art schemes with GANN on the MS-COCO dataset. Experimental results demonstrate the effectiveness of the proposed global attention mechanism for image captioning.
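The abstract says GANN incorporates the global caption feature into the attention-weight calculation of each decoder head, but it does not give the formula. The following Python sketch illustrates one plausible way such a mechanism could work, under assumptions that are not taken from the paper: the global caption feature is simply added to the decoder query before scaled dot-product scoring, region features serve as both keys and values, and there are no learned projections. The function names (`global_attention_head`, `dot`, `softmax`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention_head(
    query: Vector,
    regions: List[Vector],
    global_feature: Vector,
) -> Tuple[Vector, List[float]]:
    """One attention head whose weights also depend on a global caption feature.

    Assumption (hypothetical, not the paper's formula): the global caption
    feature is added to the decoder query before scaled dot-product scoring,
    and each region feature is used as both key and value.
    """
    d = len(query)
    # Bias the query toward regions that agree with the global caption feature.
    guided_query = [q + g for q, g in zip(query, global_feature)]
    # Scaled dot-product score for every region proposal.
    scores = [dot(guided_query, r) / math.sqrt(d) for r in regions]
    weights = softmax(scores)
    # Attended visual feature: weighted sum of the region features.
    context = [sum(w * r[j] for w, r in zip(weights, regions)) for j in range(d)]
    return context, weights


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]
    query = [0.0, 0.0]
    # Without global guidance, both regions receive equal weight.
    print(global_attention_head(query, regions, [0.0, 0.0])[1])
    # A global feature aligned with the second region shifts weight toward it.
    print(global_attention_head(query, regions, [0.0, 2.0])[1])
```

In a multi-head decoder, each head would apply the same idea to its own slice (or projection) of the query, region, and global features; additive fusion is chosen here only because it is the simplest way to make the weights depend on the global feature.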
Pages: 1243-1252
Number of pages: 10