An Empirical Study of Spatial Attention Mechanisms in Deep Networks

Cited by: 424

Authors:

Zhu, Xizhou [1, 2]

Cheng, Dazhi [2]

Zhang, Zheng [2]

Lin, Stephen [2]

Dai, Jifeng [2]

Affiliations:

[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China

[2] Microsoft Res Asia, Beijing, Peoples R China

Source:

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019

Keywords:

DOI:

10.1109/ICCV.2019.00679

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance. Toward a better general understanding of attention mechanisms, we present an empirical study that ablates various spatial attention elements within a generalized attention formulation, encompassing the dominant Transformer attention as well as the prevalent deformable convolution and dynamic convolution modules. Conducted on a variety of applications, the study yields significant findings about spatial attention in deep networks, some of which run counter to conventional understanding. For example, we find that the comparison of query and key content in Transformer attention is negligible for self-attention, but vital for encoder-decoder attention. On the other hand, a proper combination of deformable convolution with key content saliency achieves the best accuracy-efficiency tradeoff in self-attention. Our results suggest that there exists much room for improvement in the design of attention mechanisms.


Pages: 6687-6696

Page count: 10

