Attention mechanisms in computer vision: A survey

Cited by: 1436
Authors
Guo, Meng-Hao [1]
Xu, Tian-Xing [1]
Liu, Jiang-Jiang [2]
Liu, Zheng-Ning [1]
Jiang, Peng-Tao [2]
Mu, Tai-Jiang [1]
Zhang, Song-Hai [1]
Martin, Ralph R. [3]
Cheng, Ming-Ming [2]
Hu, Shi-Min [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Beijing 100084, Peoples R China
[2] Nankai Univ, Coll Comp Sci, TKLNDST, Tianjin 300350, Peoples R China
[3] Cardiff Univ, Sch Comp Sci & Informat, Cardiff, Wales
Funding
National Natural Science Foundation of China
Keywords
attention; transformer; computer vision; deep learning; salience; SPATIAL-TEMPORAL ATTENTION; NETWORK; MODEL;
DOI
10.1007/s41095-022-0271-y
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multimodal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
Pages: 331-368
Page count: 38