SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

Cited by: 109
Authors
Cao, Jiale [1]
Anwer, Rao Muhammad [2, 3]
Cholakkal, Hisham [2, 3]
Khan, Fahad Shahbaz [2, 3]
Pang, Yanwei [1]
Shao, Ling [2, 3]
Affiliations
[1] Tianjin Univ, Tianjin, Peoples R China
[2] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[3] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
Source
COMPUTER VISION - ECCV 2020, PT XIV | 2020 / Vol. 12359
Keywords
Instance segmentation; Real-time; Spatial preservation
DOI
10.1007/978-3-030-58568-6_1
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Single-stage instance segmentation approaches have recently gained popularity due to their speed and simplicity, but still lag behind two-stage methods in accuracy. We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating the mask prediction of an instance into different sub-regions of a detected bounding box. Our main contribution is a novel light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for each sub-region within a bounding box, leading to improved mask predictions. It also enables accurate delineation of spatially adjacent instances. Further, we introduce a mask alignment weighting loss and a feature alignment scheme to better correlate mask prediction with object detection. On COCO test-dev, our SipMask outperforms the existing single-stage methods. Compared to the state-of-the-art single-stage TensorMask, SipMask obtains an absolute gain of 1.0% (mask AP), while providing a four-fold speedup. In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp. We also evaluate our SipMask for real-time video instance segmentation, achieving promising results on the YouTube-VIS dataset. The source code is available at https://github.com/JialeCao001/SipMask.
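The idea of a separate set of spatial coefficients per sub-region can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a YOLACT-style linear combination of shared basis masks followed by a sigmoid, and a fixed grid of sub-regions inside the box, none of which the abstract spells out. The function name `subregion_mask` and all parameter names are hypothetical.

```python
import numpy as np

def subregion_mask(basis, coeffs, box):
    """Illustrative only: combine shared basis masks with per-sub-region coefficients.

    basis  : (k, H, W) array of basis masks shared by all instances (assumption)
    coeffs : (rows, cols, k) array, one coefficient vector per sub-region of the box
    box    : (x0, y0, x1, y1) integer box, upper bounds exclusive
    """
    k, H, W = basis.shape
    rows, cols, _ = coeffs.shape
    x0, y0, x1, y1 = box
    mask = np.zeros((H, W), dtype=np.float64)  # pixels outside the box stay 0

    # Split the box into a rows x cols grid of sub-regions.
    ys = np.linspace(y0, y1, rows + 1).round().astype(int)
    xs = np.linspace(x0, x1, cols + 1).round().astype(int)

    for i in range(rows):
        for j in range(cols):
            region = basis[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            # Each sub-region uses its own coefficient vector over the k basis masks.
            logits = np.tensordot(coeffs[i, j], region, axes=1)
            mask[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = 1.0 / (1.0 + np.exp(-logits))
    return mask
```

With a single coefficient vector per box (a 1 x 1 grid) this reduces to one linear combination for the whole instance; using a finer grid lets different parts of the box weight the basis masks differently, which is the property the abstract attributes to the SP module.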
Pages: 1-18
Number of pages: 18