Mask encoding: A general instance mask representation for object segmentation

Cited by: 5
Authors
Zhang, Rufeng [1]
Kong, Tao [2]
Wang, Xinlong [3]
You, Mingyu [1, 4]
Affiliations
[1] Tongji Univ, Dept Control Sci & Engn, Shanghai, Peoples R China
[2] ByteDance AI Lab, Beijing, Peoples R China
[3] Univ Adelaide, Sch Comp Sci, Adelaide, SA, Australia
[4] Tongji Univ, Frontiers Sci Ctr Intelligent Autonomous Syst, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Mask encoding; Instance segmentation; Video instance segmentation;
DOI
10.1016/j.patcog.2021.108505
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Instance segmentation is one of the most challenging tasks in computer vision: it requires separating each instance at the pixel level. To date, the low-resolution binary mask is the dominant paradigm for representing instance masks. For example, the predicted mask in Mask R-CNN is usually 28 × 28. In general, a low-resolution mask cannot capture object details well, while a high-resolution mask dramatically increases training complexity. In this work, we propose a flexible and effective approach that encodes the high-resolution structured mask into a compact representation, combining the advantages of high quality and low complexity. The proposed mask representation can be easily integrated into two-stage pipelines such as Mask R-CNN, improving mask AP by 0.9% on the COCO dataset, 1.4% on the LVIS dataset, and 2.1% on the Cityscapes dataset. Moreover, a novel single-shot instance segmentation framework can be constructed by extending an existing one-stage detector with a mask branch for this instance representation. Our model is more accurate than explicit contour-based pipelines at similar computational complexity. We also evaluate our method on video instance segmentation, achieving promising results on the YouTube-VIS dataset. Code is available at: https://git.io/AdelaiDet. © 2021 Elsevier Ltd. All rights reserved.
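The abstract does not say how the compact representation is computed. As a rough illustration of the general idea only, the sketch below assumes a linear, PCA-style projection learned from training masks; that choice, the toy disk data, and every name in it (`fit_linear_mask_encoder`, `encode`, `decode`, `make_disk_masks`) are assumptions made for this example and are not taken from this record or from the AdelaiDet code.

```python
import numpy as np


def fit_linear_mask_encoder(masks, n_components):
    """Fit a PCA-style linear encoder (an assumed choice, not the paper's stated method).

    masks: (N, H, W) binary array.
    Returns (mean, basis), where basis has shape (n_components, H * W).
    """
    flat = masks.reshape(len(masks), -1).astype(np.float64)
    mean = flat.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered masks.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode(mask, mean, basis):
    """Project one (H, W) mask onto the basis, giving a short code vector."""
    return basis @ (mask.reshape(-1).astype(np.float64) - mean)


def decode(code, mean, basis, shape):
    """Map a code back to pixel space and threshold it into a binary mask."""
    recon = code @ basis + mean
    return (recon.reshape(shape) > 0.5).astype(np.uint8)


def make_disk_masks(n, size, rng):
    """Hypothetical toy data: n random filled disks on a size x size grid."""
    ys, xs = np.mgrid[0:size, 0:size]
    masks = []
    for _ in range(n):
        cy, cx = rng.uniform(size * 0.3, size * 0.7, 2)
        r = rng.uniform(size * 0.15, size * 0.3)
        masks.append((ys - cy) ** 2 + (xs - cx) ** 2 <= r ** 2)
    return np.stack(masks).astype(np.uint8)


rng = np.random.default_rng(0)
masks = make_disk_masks(200, 28, rng)

# Compress each 28 x 28 mask (784 pixels) into a 60-dimensional code.
mean, basis = fit_linear_mask_encoder(masks, n_components=60)
code = encode(masks[0], mean, basis)
recon = decode(code, mean, basis, masks[0].shape)

# Intersection-over-union between the original and the decoded mask.
inter = np.logical_and(masks[0], recon).sum()
union = np.logical_or(masks[0], recon).sum()
print("code length:", code.shape[0], "IoU:", inter / union)
```

In a setup like this, a detector would regress the short code instead of a full pixel grid, and the decoder would recover the mask at the original resolution. The toy masks here are only 28 × 28 to keep the example fast; the same functions accept larger grids.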
Pages: 12