Segmenting Beyond the Bounding Box for Instance Segmentation

被引：26

作者：

Zhang, Xiaoliang ^{[1
]}

Li, Hongliang ^{[1
]}

Meng, Fanman ^{[1
]}

Song, Zichen ^{[1
]}

Xu, Linfeng ^{[1
]}

机构：

[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China

来源：

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2022年 / 32卷 / 02期

基金：

中国国家自然科学基金;

关键词：

Object detection; Image segmentation; Semantics; Proposals; Detectors; Task analysis; Training; Instance segmentation; bounding box; pixel embedding;

D O I：

10.1109/TCSVT.2021.3063377

中图分类号：

TM [电工技术]; TN [电子技术、通信技术];

学科分类号：

0808 ; 0809 ;

摘要：

Instance segmentation needs to locate all instances in an image correctly and segment each instance precisely. Currently, the most dominant methods for instance segmentation take object detection as a pre-task. However, they rely on the accuracy of object detection incredibly. If the pre-task cannot predict an accurate bounding box, the performance of instance segmentation will degenerate. In this paper, we present a novel method for instance segmentation to solve this problem, which is called Segmenting Beyond the Bounding Box (S3B-Net). Our S3B-Net designs a sub-network to help instance segmentation methods based on object detection to segment the part of an instance beyond the bounding box. Specifically, the sub-network first predicts a two-dimensional pixel embedding for each pixel. Then, the Gaussian function is employed to calculate a pixel's probability belongs to a corresponding instance according to the two-dimensional pixel embedding. Finally, the output of the sub-network combines with the output of instance segmentation based on object detection to generate a more precise instance mask. Our sub-network can easily extend on the existing instance segmentation method based on object detection to segment instance beyond the bounding box. We do our experiments on dominant instance segmentation datasets, such as the COCO dataset and Cityscapes dataset. The results show that our method can achieve 6.8 points gain compared with the baseline Mask R-CNN with ResNet-50-FPN in Cityscapes datasets, and 1.7 points gain with ResNet-101-FPN-DCN in COCO datasets. Our S3B-Net outperforms the previous state-of-the-art instance segmentation method, which proves our method is competitive. The source code of our method will be made available.

引用

页码：704 / 714

页数：11

共 42 条

[1] Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations [J].

Ahn, Jiwoon ;

Cho, Sunghyun ;

Kwak, Suha .

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :2204-2213

[2]

[Anonymous], 2015, P 3 INT C LEARN REPR

[3] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J].

Badrinarayanan, Vijay ;

Kendall, Alex ;

Cipolla, Roberto .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) :2481-2495

[4] YOLACT Real-time Instance Segmentation [J].

Bolya, Daniel ;

Zhou, Chong ;

Xiao, Fanyi ;

Lee, Yong Jae .

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :9156-9165

[5]

Chen K., 2019, CoRR abs/1906.07155

[6] Hybrid Task Cascade for Instance Segmentation [J].

Chen, Kai ;

Pang, Jiangmiao ;

Wang, Jiaqi ;

Xiong, Yu ;

Li, Xiaoxiao ;

Sun, Shuyang ;

Feng, Wansen ;

Liu, Ziwei ;

Shi, Jianping ;

Ouyang, Wanli ;

Loy, Chen Change ;

Lin, Dahua .

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :4969-4978

[7] TensorMask: A Foundation for Dense Object Segmentation [J].

Chen, Xinlei ;

Girshick, Ross ;

He, Kaiming ;

Dollar, Piotr .

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :2061-2069

[8]

Cheng T., 2019, P EUR C COMP VIS, P660

[9] The Cityscapes Dataset for Semantic Urban Scene Understanding [J].

Cordts, Marius ;

Omran, Mohamed ;

Ramos, Sebastian ;

Rehfeld, Timo ;

Enzweiler, Markus ;

Benenson, Rodrigo ;

Franke, Uwe ;

Roth, Stefan ;

Schiele, Bernt .

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :3213-3223

[10] Deformable Convolutional Networks [J].

Dai, Jifeng ;

Qi, Haozhi ;

Xiong, Yuwen ;

Li, Yi ;

Zhang, Guodong ;

Hu, Han ;

Wei, Yichen .

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :764-773

← 1 2 3 4 5 →