Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation

Cited by: 472
Authors
Wang, Huiyu [1]
Zhu, Yukun [2]
Green, Bradley [2]
Adam, Hartwig [3]
Yuille, Alan [1]
Chen, Liang-Chieh [3]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Google Res, Seattle, WA USA
[3] Google Res, Los Angeles, CA USA
Source
COMPUTER VISION - ECCV 2020, PT IV | 2020 / Vol. 12349
Keywords
Bottom-up panoptic segmentation; Self-attention; ALGORITHM; TRANSFORM;
DOI
10.1007/978-3-030-58548-8_7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Convolution exploits locality for efficiency at a cost of missing long range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works prove it possible to stack self-attention layers to obtain a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computation complexity and allows performing attention within a larger or even global region. In companion, we also propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that one could stack to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves 2.8% PQ over bottom-up state-of-the-art on COCO test-dev. This previous state-of-the-art is attained by our small variant that is 3.8x parameter-efficient and 27x computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
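The factorization described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it uses a single head, omits the position-sensitive terms, and the names `attention_1d`, `axial_attention`, and `attention_pair_counts` are hypothetical. The order of the two passes (height, then width) and the square projection matrices are assumptions made to keep the example short.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_1d(x, wq, wk, wv):
    """Single-head self-attention along the second-to-last axis.

    x has shape (..., L, C); each leading index is an independent sequence.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(logits, axis=-1) @ v


def axial_attention(x, params_h, params_w):
    """Apply 1D attention along height, then along width.

    x has shape (H, W, C); params_* are (wq, wk, wv) tuples of (C, C) arrays.
    """
    # Height pass: every column becomes a sequence of length H.
    cols = np.swapaxes(x, 0, 1)                # (W, H, C)
    cols = attention_1d(cols, *params_h)
    x = np.swapaxes(cols, 0, 1)                # back to (H, W, C)
    # Width pass: every row is a sequence of length W.
    return attention_1d(x, *params_w)


def attention_pair_counts(h, w):
    """Query-key pairs scored by full 2D attention vs. the two 1D passes."""
    full = (h * w) ** 2        # every pixel attends to every pixel
    axial = h * w * (h + w)    # every pixel attends to its column and its row
    return full, axial


rng = np.random.default_rng(0)
H, W, C = 6, 5, 4
feature_map = rng.normal(size=(H, W, C))
params_h = tuple(rng.normal(size=(C, C)) for _ in range(3))
params_w = tuple(rng.normal(size=(C, C)) for _ in range(3))
out = axial_attention(feature_map, params_h, params_w)
print(out.shape)
print(attention_pair_counts(64, 64))
```

For a 64 x 64 feature map, the pair count drops from (64 * 64)^2 to 64 * 64 * (64 + 64), a factor of 32. After both passes, every output position can still draw on information from every input position, because the width pass reads features that the height pass has already mixed along each column.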
Pages: 108-126
Page count: 19