Attention Augmented Convolutional Networks

Cited by: 895
Authors
Bello, Irwan [1 ]
Zoph, Barret [1 ]
Vaswani, Ashish [1 ]
Shlens, Jonathon [1 ]
Le, Quoc V. [1 ]
Affiliations
[1] Google Brain, Mountain View, CA 94043 USA
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
Keywords
ARCHITECTURES;
DOI
10.1109/ICCV.2019.00338
CLC Number (Chinese Library Classification)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Convolutional networks have been the paradigm of choice in many computer vision applications. The convolution operation, however, has a significant weakness in that it only operates on a local neighborhood, thus missing global information. Self-attention, on the other hand, has emerged as a recent advance to capture long-range interactions, but has mostly been applied to sequence modeling and generative modeling tasks. In this paper, we consider the use of self-attention for discriminative visual tasks as an alternative to convolutions. We introduce a novel two-dimensional relative self-attention mechanism that proves competitive in replacing convolutions as a stand-alone computational primitive for image classification. We find in control experiments that the best results are obtained when combining both convolutions and self-attention. We therefore propose to augment convolutional operators with this self-attention mechanism by concatenating convolutional feature maps with a set of feature maps produced via self-attention. Extensive experiments show that Attention Augmentation leads to consistent improvements in image classification on ImageNet and object detection on COCO across many different models and scales, including ResNets and a state-of-the-art mobile constrained network, while keeping the number of parameters similar. In particular, our method achieves a 1.3% top-1 accuracy improvement on ImageNet classification over a ResNet50 baseline and outperforms other attention mechanisms for images such as Squeeze-and-Excitation [17]. It also achieves an improvement of 1.4 mAP in COCO object detection on top of a RetinaNet baseline.
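As a rough illustration of the augmentation described in the abstract, the minimal sketch below concatenates a convolutional branch with a self-attention branch along the channel dimension. It assumes PyTorch and uses plain multi-head self-attention over spatial positions in place of the paper's two-dimensional relative self-attention; the class name AttentionAugmentedConv, the channel split, and all hyperparameters are hypothetical and do not reproduce the authors' reference implementation.

import torch
import torch.nn as nn

class AttentionAugmentedConv(nn.Module):
    # Hypothetical sketch: concatenate convolutional feature maps with
    # feature maps produced by global self-attention.
    def __init__(self, in_ch, conv_ch, attn_ch, heads=4, kernel_size=3):
        super().__init__()
        # Convolutional branch supplies conv_ch of the output channels.
        self.conv = nn.Conv2d(in_ch, conv_ch, kernel_size, padding=kernel_size // 2)
        # 1x1 projection followed by self-attention over all spatial positions
        # supplies the remaining attn_ch channels. The paper uses 2D *relative*
        # self-attention; plain multi-head attention is used here for brevity.
        self.proj = nn.Conv2d(in_ch, attn_ch, 1)
        self.attn = nn.MultiheadAttention(attn_ch, heads, batch_first=True)

    def forward(self, x):
        b, _, h, w = x.shape
        conv_out = self.conv(x)                          # (B, conv_ch, H, W)
        a = self.proj(x).flatten(2).transpose(1, 2)      # (B, H*W, attn_ch)
        attn_out, _ = self.attn(a, a, a)                 # every position attends to all others
        attn_out = attn_out.transpose(1, 2).reshape(b, -1, h, w)
        return torch.cat([conv_out, attn_out], dim=1)    # channel-wise concatenation

# Example: a drop-in stand-in for a 64-channel convolution, with 48 output
# channels coming from convolution and 16 from self-attention.
x = torch.randn(2, 64, 32, 32)
layer = AttentionAugmentedConv(64, conv_ch=48, attn_ch=16, heads=4)
print(layer(x).shape)  # torch.Size([2, 64, 32, 32])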
Pages: 3285-3294
Number of pages: 10
Related papers
55 records in total
[41] Shaw P., 2018, Proceedings of NAACL-HLT, p. 464.
[42] So D. R., 2019, CoRR.
[43] Szegedy C., 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Szegedy C., 2015, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), p. 1, DOI: 10.1109/CVPR.2015.7298594.
[45] Szegedy C., 2016, International Conference on Learning Representations (ICLR) Workshop.
[46] Vaswani A., 2017, Proceedings of the 31st International Conference on Neural Information Processing Systems, p. 5998.
[47] Vinyals O., 2015, Advances in Neural Information Processing Systems, Vol. 28.
[48] Wang X., Girshick R., Gupta A., He K. Non-local Neural Networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7794-7803.
[49] Woo S., Park J., Lee J.-Y., Kweon I. S. CBAM: Convolutional Block Attention Module. Computer Vision - ECCV 2018, Part VII, LNCS 11211, 2018, pp. 3-19.
[50] Yamada Y., 2018, arXiv:1802.02375.