IEMask R-CNN: Information-Enhanced Mask R-CNN

Cited by: 30
Authors
Bi, Xiuli [1]
Hu, Jinwu [1]
Xiao, Bin [1]
Li, Weisheng [1]
Gao, Xinbo [1]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Dept Comp Sci & Technol, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Object segmentation; Semantics; Feature extraction; Image segmentation; Location awareness; Head; Instance segmentation; information-enhanced FPN; adaptive feature fusion; encoding-decoding mask head; INSTANCE SEGMENTATION;
DOI
10.1109/TBDATA.2022.3187413
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Instance segmentation is a relatively difficult task in computer vision, requiring not only high-quality masks but also highly accurate instance category classification. Mask R-CNN has proven to be a feasible method. However, because its Feature Pyramid Network (FPN) structure lacks useful channel information, global information, and low-level texture information, and because its mask branch cannot obtain useful local-global information, Mask R-CNN is prevented from producing high-quality masks and accurate instance category classification. We therefore propose Information-Enhanced Mask R-CNN, called IEMask R-CNN. In the FPN structure of IEMask R-CNN, the information-enhanced FPN enhances the useful channel information and the global information of the feature maps, addressing the loss of useful channel information in high-level feature maps and the resulting inaccurate instance category classification; meanwhile, the bottom-up path enhancement with adaptive feature fusion exploits the precise localization signal in the lower layers to enhance the feature pyramid. In the mask branch of IEMask R-CNN, an encoding-decoding mask head strengthens local-global information to obtain high-quality masks. Without bells and whistles, IEMask R-CNN achieves significant gains of about 2.60%, 4.00%, and 3.17% over Mask R-CNN on the MS COCO2017, Cityscapes, and LVIS1.0 benchmarks, respectively.
Pages: 688 - 700
Page count: 13
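
The bottom-up path enhancement with adaptive feature fusion described in the abstract can be illustrated with a minimal PyTorch sketch. The module below fuses two pyramid levels by predicting per-pixel fusion weights from the concatenated features; the class name AdaptiveFeatureFusion, the softmax weighting scheme, and the channel count are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch (assumed, not the paper's exact design) of adaptive feature
# fusion between two pyramid levels in a bottom-up enhancement path.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveFeatureFusion(nn.Module):
    """Fuse a finer (lower) and a coarser (higher) pyramid level with
    per-pixel weights predicted from the concatenated features."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # One weight map per input branch; softmax makes them sum to 1 per pixel.
        self.weight_pred = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Downsample the finer, lower-level map to the coarser level's size so
        # its precise localization signal can enhance the higher pyramid level.
        low = F.interpolate(low, size=high.shape[-2:], mode="nearest")
        weights = torch.softmax(self.weight_pred(torch.cat([low, high], dim=1)), dim=1)
        return weights[:, 0:1] * low + weights[:, 1:2] * high


# Usage: fuse a stride-8 level into the stride-16 level above it.
fuse = AdaptiveFeatureFusion(channels=256)
p3 = torch.randn(1, 256, 100, 100)  # finer, lower pyramid level
p4 = torch.randn(1, 256, 50, 50)    # coarser, higher pyramid level
fused = fuse(p3, p4)                # shape: (1, 256, 50, 50)
```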