MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

Cited by: 273
Authors
Wang, Huiyu [1, 3]
Zhu, Yukun [2]
Adam, Hartwig [2]
Yuille, Alan [1]
Chen, Liang-Chieh [2]
Affiliations
[1] Johns Hopkins University, Baltimore, MD 21218 USA
[2] Google Research, Mountain View, CA USA
[3] Google, Mountain View, CA 94043 USA
Source
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 | 2021
DOI
10.1109/CVPR46437.2021.00542
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline, which depends heavily on surrogate sub-tasks and hand-designed components such as box detection, non-maximum suppression, and thing-stuff merging. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, MaX-DeepLab directly predicts class-labeled masks with a mask transformer and is trained with a loss inspired by panoptic quality (PQ) via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path alongside a CNN path, allowing direct communication with any CNN layer. As a result, MaX-DeepLab achieves a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. A small variant of MaX-DeepLab improves PQ by 3.0% over DETR with a similar number of parameters and M-Adds. Furthermore, MaX-DeepLab, without test-time augmentation, achieves a new state-of-the-art 51.3% PQ on the COCO test-dev set.
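The abstract states that MaX-DeepLab is trained via bipartite matching between predicted and ground-truth masks. The sketch below illustrates only that matching step, under stated assumptions: each prediction is a (class-probability list, binary mask) pair, and the matching score is the predicted probability of the ground-truth class multiplied by the Dice coefficient of the two masks, a PQ-style similarity assumed here for illustration. The helper names (`dice`, `pq_style_similarity`, `bipartite_match`) are hypothetical, and the brute-force search is a stand-in for an efficient assignment solver; this is not the authors' implementation.

```python
from itertools import permutations


def dice(a, b, eps=1e-6):
    """Dice coefficient between two binary masks given as flat lists of 0/1."""
    inter = sum(x * y for x, y in zip(a, b))
    return (2.0 * inter) / (sum(a) + sum(b) + eps)


def pq_style_similarity(gt_class, gt_mask, pred_probs, pred_mask):
    # Assumed similarity: predicted probability of the ground-truth class
    # multiplied by the Dice overlap of the ground-truth and predicted masks.
    return pred_probs[gt_class] * dice(gt_mask, pred_mask)


def bipartite_match(gts, preds):
    """Match each ground truth (class, mask) to a distinct prediction
    (probs, mask) so that the total similarity is maximized.

    Returns (assignment, score), where assignment[i] is the index of the
    prediction matched to ground truth i. Assumes len(preds) >= len(gts);
    leftover predictions would be treated as 'no object'. Brute force over
    permutations, so only suitable for tiny illustrative inputs.
    """
    best, best_score = None, float("-inf")
    for perm in permutations(range(len(preds)), len(gts)):
        score = sum(
            pq_style_similarity(c, m, preds[j][0], preds[j][1])
            for (c, m), j in zip(gts, perm)
        )
        if score > best_score:
            best, best_score = perm, score
    return list(best), best_score


# Toy example: 2 ground-truth segments, 3 predictions, 2 classes, 4 pixels.
gts = [(0, [1, 1, 0, 0]), (1, [0, 0, 1, 1])]
preds = [
    ([0.1, 0.9], [0, 0, 1, 1]),
    ([0.8, 0.2], [1, 1, 0, 0]),
    ([0.5, 0.5], [1, 0, 1, 0]),
]
assignment, score = bipartite_match(gts, preds)
print(assignment, round(score, 3))  # prints [1, 0] 1.7
```

Here the third prediction is left unmatched. In practice a polynomial-time solver (for example, SciPy's `linear_sum_assignment` with `maximize=True`) would replace the permutation search, whose cost grows factorially with the number of predictions.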
Pages: 5459-5470
Number of pages: 12