Mask encoding: A general instance mask representation for object segmentation

Cited by: 5

Authors:

Zhang, Rufeng [1]

Kong, Tao [2]

Wang, Xinlong [3]

You, Mingyu [1,4]

Affiliations:

[1] Tongji Univ, Dept Control Sci & Engn, Shanghai, Peoples R China

[2] ByteDance AI Lab, Beijing, Peoples R China

[3] Univ Adelaide, Sch Comp Sci, Adelaide, SA, Australia

[4] Tongji Univ, Frontiers Sci Ctr Intelligent Autonomous Syst, Shanghai, Peoples R China

Source:

PATTERN RECOGNITION | 2022 / Vol. 124

Funding:

National Natural Science Foundation of China;

Keywords:

Mask encoding; Instance segmentation; Video instance segmentation;

DOI:

10.1016/j.patcog.2021.108505

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Instance segmentation is one of the most challenging tasks in computer vision: it requires separating each instance at the pixel level. To date, a low-resolution binary mask has been the dominant representation of an instance mask. For example, the predicted mask in Mask R-CNN is usually 28 × 28. In general, a low-resolution mask cannot capture object details well, while a high-resolution mask dramatically increases training complexity. In this work, we propose a flexible and effective approach that encodes the high-resolution structured mask into a compact representation, combining high quality with low complexity. The proposed mask representation can be easily integrated into two-stage pipelines such as Mask R-CNN, improving mask AP by 0.9% on the COCO dataset, 1.4% on the LVIS dataset, and 2.1% on the Cityscapes dataset. Moreover, a novel single-shot instance segmentation framework can be constructed by extending an existing one-stage detector with a mask branch for this instance representation. Our model outperforms explicit contour-based pipelines in accuracy at similar computational complexity. We also evaluate our method on video instance segmentation, achieving promising results on the YouTube-VIS dataset. Code is available at: https://git.io/AdelaiDet. (c) 2021 Elsevier Ltd. All rights reserved.
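
Illustrative note: the abstract does not say how the high-resolution mask is encoded. The sketch below assumes, purely for illustration, a linear codec fitted with PCA (via SVD) on flattened binary masks. The function names, the 32 × 32 mask size, the 40-dimensional code, and the toy rectangle data are all hypothetical and are not taken from the record or from the authors' code.

```python
import numpy as np


def fit_linear_codec(masks, n_components):
    """Fit a PCA-style codec on binary masks of shape (N, H, W). Assumed scheme."""
    flat = masks.reshape(len(masks), -1).astype(np.float64)
    mean = flat.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode(masks, mean, basis):
    """Project (N, H, W) masks onto the basis, giving (N, n_components) codes."""
    flat = masks.reshape(len(masks), -1).astype(np.float64)
    return (flat - mean) @ basis.T


def decode(codes, mean, basis, shape):
    """Back-project codes to pixel space and threshold at 0.5 to get binary masks."""
    flat = codes @ basis + mean
    return (flat > 0.5).reshape(len(codes), *shape)


def random_box_masks(n, size, rng):
    """Hypothetical toy data: one filled axis-aligned rectangle per mask."""
    masks = np.zeros((n, size, size), dtype=bool)
    for m in masks:
        y0, x0 = rng.integers(0, size // 2, size=2)
        h, w = rng.integers(size // 4, size // 2, size=2)
        m[y0:y0 + h, x0:x0 + w] = True
    return masks


rng = np.random.default_rng(0)
train = random_box_masks(500, 32, rng)          # 500 toy masks, 32 x 32 each
mean, basis = fit_linear_codec(train, n_components=40)
codes = encode(train[:10], mean, basis)         # 1024 pixels -> 40 numbers per mask
recon = decode(codes, mean, basis, (32, 32))    # back to 32 x 32 binary masks
```

Under this assumption, a network would regress the short code vector per instance instead of every mask pixel, and the fixed decoder would restore the full-resolution mask at inference time.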

References

Pages: 12

Total: 37 entries

[1] Pixelwise Instance Segmentation with a Dynamically Instantiated Network [J].

Arnab, Anurag ;

Torr, Philip H. S. .

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :879-888

[2] Athar, Ali, 2020, Computer Vision - ECCV 2020, 16th European Conference, Proceedings, Lecture Notes in Computer Science (LNCS 12356), p. 158, DOI 10.1007/978-3-030-58621-8_10

[3] YOLACT Real-time Instance Segmentation [J].

Bolya, Daniel ;

Zhou, Chong ;

Xiao, Fanyi ;

Lee, Yong Jae .

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :9156-9165

[4] JSPNet: Learning joint semantic & instance segmentation of point clouds via feature self-similarity and cross-task probability [J].

Chen, Feng ;

Wu, Fei ;

Gao, Guangwei ;

Ji, Yimu ;

Xu, Jing ;

Jiang, Guo-Ping ;

Jing, Xiao-Yuan .

PATTERN RECOGNITION, 2022, 122

[5] Frequency Domain Compact 3D Convolutional Neural Networks [J].

Chen, Hanting ;

Wang, Yunhe ;

Shu, Han ;

Tang, Yehui ;

Xu, Chunjing ;

Shi, Boxin ;

Xu, Chao ;

Tian, Qi ;

Xu, Chang .

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :1638-1647

[6] Dual-force convolutional neural networks for accurate brain tumor segmentation [J].

Chen, Shengcong ;

Ding, Changxing ;

Liu, Minfeng .

PATTERN RECOGNITION, 2019, 88 :90-100

[7] TensorMask: A Foundation for Dense Object Segmentation [J].

Chen, Xinlei ;

Girshick, Ross ;

He, Kaiming ;

Dollar, Piotr .

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :2061-2069

[8] SEMEDA: Enhancing segmentation precision with semantic edge aware loss [J].

Chen, Yifu ;

Dapogny, Arnaud ;

Cord, Matthieu .

PATTERN RECOGNITION, 2020, 108

[9] The Cityscapes Dataset for Semantic Urban Scene Understanding [J].

Cordts, Marius ;

Omran, Mohamed ;

Ramos, Sebastian ;

Rehfeld, Timo ;

Enzweiler, Markus ;

Benenson, Rodrigo ;

Franke, Uwe ;

Roth, Stefan ;

Schiele, Bernt .

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :3213-3223

[10] Deformable Convolutional Networks [J].

Dai, Jifeng ;

Qi, Haozhi ;

Xiong, Yuwen ;

Li, Yi ;

Zhang, Guodong ;

Hu, Han ;

Wei, Yichen .

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :764-773
