Global-Attention-Based Neural Networks for Vision Language Intelligence

Cited by: 14

Authors:

Liu, Pei ^{[1]}

Zhou, Yingjie ^{[1]}

Peng, Dezhong ^{[1,2,3]}

Wu, Dapeng ^{[4]}

Affiliations:

[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China

[2] Sichuan Zhiqian Technol Co Ltd, Chengdu 610041, Peoples R China

[3] Shenzhen Peng Cheng Lab, Shenzhen 518052, Peoples R China

[4] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA

Source:

IEEE-CAA JOURNAL OF AUTOMATICA SINICA | 2021 / Vol. 8 / No. 07

Funding:

National Natural Science Foundation of China;

Keywords:

Global attention; image captioning; latent contribution;

DOI:

10.1109/JAS.2020.1003402

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

In this paper, we develop a novel global-attention-based neural network (GANN) for vision language intelligence, specifically image captioning (generating a language description of a given image). As in many previous works, our proposed model adopts the encoder-decoder framework. The encoder encodes the region proposal features and extracts a global caption feature using a specially designed module that predicts the caption objects. The decoder generates captions by taking the global caption feature, along with the encoded visual features, as inputs to each attention head of the decoder layer. The global caption feature is introduced to explore the latent contributions of region proposals to image captioning and to help the decoder focus on the most relevant proposals, so that it extracts a more accurate visual feature at each time step of caption generation. Our GANN is implemented by incorporating the global caption feature into the attention weight calculation phase of the word prediction process in each head of the decoder layer. In our experiments, we qualitatively analyzed the proposed model and quantitatively evaluated several state-of-the-art schemes with GANN on the MS-COCO dataset. Experimental results demonstrate the effectiveness of the proposed global attention mechanism for image captioning.

Citation

Pages: 1243-1252

Page count: 10

