MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

Cited by: 291

Authors:

Wang, Huiyu [1,3]

Zhu, Yukun [2]

Adam, Hartwig [2]

Yuille, Alan [1]

Chen, Liang-Chieh [2]

Affiliations:

[1] Johns Hopkins Univ, Baltimore, MD 21218 USA

[2] Google Res, Mountain View, CA USA

[3] Google, Mountain View, CA 94043 USA

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

Keywords:

DOI:

10.1109/CVPR46437.2021.00542

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic quality inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layers. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. A small variant of MaX-DeepLab improves 3.0% PQ over DETR with similar parameters and M-Adds. Furthermore, MaX-DeepLab, without test time augmentation, achieves new state-of-the-art 51.3% PQ on COCO test-dev set.

Pages: 5459-5470

Page count: 12
