SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

Cited by: 109

Authors:

Cao, Jiale [1]

Anwer, Rao Muhammad [2,3]

Cholakkal, Hisham [2,3]

Khan, Fahad Shahbaz [2,3]

Pang, Yanwei [1]

Shao, Ling [2,3]

Affiliations:

[1] Tianjin Univ, Tianjin, Peoples R China

[2] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates

[3] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates

Source:

COMPUTER VISION - ECCV 2020, PT XIV | 2020 / Vol. 12359

Keywords:

Instance segmentation; Real-time; Spatial preservation;

DOI:

10.1007/978-3-030-58568-6_1

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Single-stage instance segmentation approaches have recently gained popularity due to their speed and simplicity, but still lag behind two-stage methods in accuracy. We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating the mask prediction of an instance into different sub-regions of a detected bounding-box. Our main contribution is a novel light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for each sub-region within a bounding-box, leading to improved mask predictions. It also enables accurate delineation of spatially adjacent instances. Further, we introduce a mask alignment weighting loss and a feature alignment scheme to better correlate mask prediction with object detection. On COCO test-dev, our SipMask outperforms the existing single-stage methods. Compared to the state-of-the-art single-stage TensorMask, SipMask obtains an absolute gain of 1.0% (mask AP), while providing a four-fold speedup. In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp. We also evaluate our SipMask for real-time video instance segmentation, achieving promising results on the YouTube-VIS dataset. The source code is available at https://github.com/JialeCao001/SipMask.
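
The abstract does not say how the per-sub-region spatial coefficients are turned into a mask. As a rough illustration only, the NumPy sketch below assumes that a set of shared basis masks is linearly combined with one coefficient vector per sub-region of the box, on a 2x2 grid. The basis masks, the grid size, the sigmoid, and the function name assemble_mask are all assumptions made for this example and are not taken from the record; the authors' code at the URL above is the reference.

```python
import numpy as np


def assemble_mask(basis, coeffs, box, grid=(2, 2)):
    """Hypothetical sketch: one coefficient vector per sub-region of a box.

    basis  : (K, H, W) array of shared basis masks (assumed, not from the abstract).
    coeffs : (rows * cols, K) array, one coefficient vector per sub-region.
    box    : (x1, y1, x2, y2) integer pixel box, exclusive at x2 and y2.
    grid   : (rows, cols) split of the box into sub-regions (assumed 2x2).
    """
    _, height, width = basis.shape
    rows, cols = grid
    x1, y1, x2, y2 = box

    # Pixel boundaries of the sub-regions along each axis.
    ys = np.linspace(y1, y2, rows + 1).round().astype(int)
    xs = np.linspace(x1, x2, cols + 1).round().astype(int)

    # Pixels outside the box stay at zero.
    mask = np.zeros((height, width))
    for r in range(rows):
        for c in range(cols):
            region = basis[:, ys[r]:ys[r + 1], xs[c]:xs[c + 1]]
            # Linear combination of the basis masks using this
            # sub-region's own coefficients, then a sigmoid.
            logits = np.tensordot(coeffs[r * cols + c], region, axes=1)
            mask[ys[r]:ys[r + 1], xs[c]:xs[c + 1]] = 1.0 / (1.0 + np.exp(-logits))
    return mask
```

Because each sub-region has its own coefficients, the same basis masks can give foreground in one quadrant of a box and background in another, which is the kind of spatial distinction a single coefficient vector per box cannot express.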

Pages: 1-18

Page count: 18

References (55 in total):

[1] Pixelwise Instance Segmentation with a Dynamically Instantiated Network [J]. Arnab, Anurag; Torr, Philip H. S. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 879-888

[2] Bolya D, 2020, arXiv:1912.06218

[3] YOLACT Real-time Instance Segmentation [J]. Bolya, Daniel; Zhou, Chong; Xiao, Fanyi; Lee, Yong Jae. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 9156-9165

[4] D2Det: Towards High Quality Object Detection and Instance Segmentation [J]. Cao, Jiale; Cholakkal, Hisham; Anwer, Rao Muhammad; Khan, Fahad Shahbaz; Pang, Yanwei; Shao, Ling. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 11482-11491

[5] Hierarchical Shot Detector [J]. Cao, Jiale; Pang, Yanwei; Han, Jungong; Li, Xuelong. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 9704-9713

[6] Triply Supervised Decoder Networks for Joint Detection and Segmentation [J]. Cao, Jiale; Pang, Yanwei; Li, Xuelong. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 7384-7393

[7] BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation [J]. Chen, Hao; Sun, Kunyang; Tian, Zhi; Shen, Chunhua; Huang, Yongming; Yan, Youliang. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 8570-8578

[8] Hybrid Task Cascade for Instance Segmentation [J]. Chen, Kai; Pang, Jiangmiao; Wang, Jiaqi; Xiong, Yu; Li, Xiaoxiao; Sun, Shuyang; Feng, Wansen; Liu, Ziwei; Shi, Jianping; Ouyang, Wanli; Loy, Chen Change; Lin, Dahua. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 4969-4978

[9] MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features [J]. Chen, Liang-Chieh; Hermans, Alexander; Papandreou, George; Schroff, Florian; Wang, Peng; Adam, Hartwig. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 4013-4022

[10] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J]. Chen, Liang-Chieh; Papandreou, George; Kokkinos, Iasonas; Murphy, Kevin; Yuille, Alan L. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04): 834-848
