Rotate to Attend: Convolutional Triplet Attention Module

Cited by: 653
Authors
Misra, Diganta [1 ]
Nalamada, Trikay [2 ]
Arasanipalai, Ajay Uppili [3 ]
Hou, Qibin [4 ]
Affiliations
[1] Landskape, Sailu, India
[2] Indian Institute of Technology Guwahati, India
[3] University of Illinois at Urbana-Champaign, IL, USA
[4] National University of Singapore, Singapore
Source
2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021) | 2021
Funding
National Science Foundation (NSF);
DOI
10.1109/WACV48630.2021.00318
CLC classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Benefiting from the capability of building inter-dependencies among channels or spatial locations, attention mechanisms have been extensively studied and broadly used in a variety of computer vision tasks recently. In this paper, we investigate light-weight but effective attention mechanisms and present triplet attention, a novel method for computing attention weights by capturing cross-dimension interaction using a three-branch structure. For an input tensor, triplet attention builds inter-dimensional dependencies by the rotation operation followed by residual transformations and encodes inter-channel and spatial information with negligible computational overhead. Our method is simple as well as efficient and can be easily plugged into classic backbone networks as an add-on module. We demonstrate the effectiveness of our method on various challenging tasks including image classification on ImageNet-1k and object detection on MSCOCO and PASCAL VOC datasets. Furthermore, we provide extensive insight into the performance of triplet attention by visually inspecting the GradCAM and GradCAM++ results. The empirical evaluation of our method supports our intuition on the importance of capturing dependencies across dimensions when computing attention weights. Code for this paper can be publicly accessed at https://github.com/LandskapeAI/triplet-attention.
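The three-branch mechanism the abstract describes can be sketched in a few lines. The following is a minimal NumPy illustration based only on the abstract, not the paper's implementation: the learned k×k convolution and batch normalization in each branch are replaced by a fixed channel average (a hypothetical stand-in), and the `z_pool`, `branch_attention`, and `triplet_attention` names are assumptions for this sketch. It shows the core idea: each branch rotates (permutes) the tensor so a different pair of dimensions interacts, pools the remaining dimension, and applies a sigmoid gate.

```python
import numpy as np

def z_pool(x, axis=0):
    # Concatenate max- and mean-pooling along `axis`, giving a 2-channel map.
    return np.stack([x.max(axis=axis), x.mean(axis=axis)], axis=0)

def branch_attention(x):
    # One branch: Z-pool the leading dimension, collapse the 2-channel map
    # with a plain average (stand-in for the paper's learned convolution),
    # then gate the input with sigmoid attention weights.
    pooled = z_pool(x, axis=0)            # (2, d1, d2)
    logits = pooled.mean(axis=0)          # (d1, d2)
    gate = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, values in (0, 1)
    return x * gate                       # broadcast over the pooled dim

def triplet_attention(x):
    # x: (C, H, W). Each branch permutes a different dimension to the front
    # ("rotation"), gates, permutes back, and the three outputs are averaged.
    y1 = branch_attention(x)                                    # gate over (H, W)
    y2 = np.transpose(branch_attention(np.transpose(x, (1, 0, 2))), (1, 0, 2))
    y3 = np.transpose(branch_attention(np.transpose(x, (2, 1, 0))), (2, 1, 0))
    return (y1 + y2 + y3) / 3.0

x = np.random.randn(8, 4, 4)
out = triplet_attention(x)
print(out.shape)  # (8, 4, 4) -- output shape matches the input
```

Because every branch only rescales the input by weights in (0, 1), the module is shape-preserving and adds no parameters in this sketch; in the paper, the only parameters are the small per-branch convolutions, which is why the overhead is described as negligible.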
Pages: 3138-3147
Page count: 10
Cited references
36 records in total
[1]   Look and Think Twice: Capturing Top-Down Visual Attention with Feedback Convolutional Neural Networks [J].
Cao, Chunshui ;
Liu, Xianming ;
Yang, Yi ;
Yu, Yinan ;
Wang, Jiang ;
Wang, Zilei ;
Huang, Yongzhen ;
Wang, Liang ;
Huang, Chang ;
Xu, Wei ;
Ramanan, Deva ;
Huang, Thomas S. .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :2956-2964
[2]   GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond [J].
Cao, Yue ;
Xu, Jiarui ;
Lin, Stephen ;
Wei, Fangyun ;
Hu, Han .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, :1971-1980
[3]   Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks [J].
Chattopadhay, Aditya ;
Sarkar, Anirban ;
Howlader, Prantik ;
Balasubramanian, Vineeth N. .
2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, :839-847
[4]  
Chen K., 2019, arXiv:1906.07155
[5]   SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning [J].
Chen, Long ;
Zhang, Hanwang ;
Xiao, Jun ;
Nie, Liqiang ;
Shao, Jian ;
Liu, Wei ;
Chua, Tat-Seng .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6298-6306
[6]  
Chen YP, 2018, ADV NEUR IN, V31
[7]  
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPR.2009.5206848
[8]   The PASCAL Visual Object Classes Challenge: A Retrospective [J].
Everingham, Mark ;
Eslami, S. M. Ali ;
Van Gool, Luc ;
Williams, Christopher K. I. ;
Winn, John ;
Zisserman, Andrew .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2015, 111 (01) :98-136
[9]  
FUKUI H, 2019, IEEE C COMP VIS PATT
[10]   Global Second-order Pooling Convolutional Networks [J].
Gao, Zilin ;
Xie, Jiangtao ;
Wang, Qilong ;
Li, Peihua .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :3019-3028