Attention mechanisms in computer vision: A survey

Cited by: 1436
Authors
Guo, Meng-Hao [1]
Xu, Tian-Xing [1]
Liu, Jiang-Jiang [2]
Liu, Zheng-Ning [1]
Jiang, Peng-Tao [2]
Mu, Tai-Jiang [1]
Zhang, Song-Hai [1]
Martin, Ralph R. [3]
Cheng, Ming-Ming [2]
Hu, Shi-Min [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Beijing 100084, Peoples R China
[2] Nankai Univ, Coll Comp Sci, TKLNDST, Tianjin 300350, Peoples R China
[3] Cardiff Univ, Sch Comp Sci & Informat, Cardiff, Wales
Funding
National Natural Science Foundation of China;
Keywords
attention; transformer; computer vision; deep learning; salience; SPATIAL-TEMPORAL ATTENTION; NETWORK; MODEL;
DOI
10.1007/s41095-022-0271-y
Chinese Library Classification number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multimodal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
Pages: 331-368
Page count: 38