Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation

Times cited: 472

Authors:

Wang, Huiyu [1]

Zhu, Yukun [2]

Green, Bradley [2]

Adam, Hartwig [3]

Yuille, Alan [1]

Chen, Liang-Chieh [3]

Affiliations:

[1] Johns Hopkins Univ, Baltimore, MD 21218 USA

[2] Google Res, Seattle, WA USA

[3] Google Res, Los Angeles, CA USA

Source:

COMPUTER VISION - ECCV 2020, PT IV | 2020 / Vol. 12349

Keywords:

Bottom-up panoptic segmentation; Self-attention; Algorithm; Transform

DOI:

10.1007/978-3-030-58548-8_7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Convolution exploits locality for efficiency at the cost of missing long-range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works show that self-attention layers can be stacked to obtain a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computational complexity and allows attention to be performed within a larger or even global region. Alongside this, we propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that can be stacked to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves PQ by 2.8% over the bottom-up state of the art on COCO test-dev. Our small variant attains this previous state of the art while being 3.8x more parameter-efficient and 27x more computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
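The factorization of 2D self-attention into two 1D self-attentions can be illustrated with a small plain-Python sketch. This is not the authors' implementation: it assumes single-head attention, uses the input features directly as queries, keys and values (no learned projections), and omits the position-sensitive terms entirely. The helper names `attend_1d`, `axial_attention` and `pair_counts` are hypothetical and chosen for illustration only.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend_1d(queries, keys, values):
    """Plain dot-product self-attention over one 1D sequence of vectors."""
    dim = len(values[0])
    out = []
    for q in queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(logits)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


def axial_attention(x):
    """Height-axis attention followed by width-axis attention.

    x[i][j] is the feature vector at row i, column j. Assumption for this
    sketch: identity q/k/v projections and no positional terms.
    """
    h, w = len(x), len(x[0])
    # Height axis: each column attends only along its own h positions.
    y = [[None] * w for _ in range(h)]
    for j in range(w):
        col = [x[i][j] for i in range(h)]
        attended = attend_1d(col, col, col)
        for i in range(h):
            y[i][j] = attended[i]
    # Width axis: each row of the intermediate map attends along its w positions.
    return [attend_1d(row, row, row) for row in y]


def pair_counts(h, w):
    """Query-key pairs scored by global 2D attention vs. one axial pass."""
    global_2d = (h * w) ** 2      # every pixel attends to every pixel
    axial = h * w * (h + w)       # every pixel attends to its column, then its row
    return global_2d, axial


if __name__ == "__main__":
    g, a = pair_counts(64, 64)
    print(g, a, g // a)  # 16777216 524288 32
```

For a 64×64 feature map, global 2D attention scores (hw)^2 = 16,777,216 query-key pairs, whereas one height-then-width pass scores hw(h+w) = 524,288, a 32x reduction. Because the width pass operates on outputs of the height pass, every output position still depends on every input position, which is how the factorization can keep a global receptive field.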


Pages: 108-126

Page count: 19

Number of references: 98
