Efficient Task-Specific Feature Re-Fusion for More Accurate Object Detection and Instance Segmentation

Cited by: 1
Authors
Wang, Cheng [1 ,2 ]
Fang, Yuxin [1 ]
Fang, Jiemin [1 ,2 ]
Guo, Peng [1 ]
Wu, Rui [3 ]
Huang, He [3 ]
Wang, Xinggang [1 ]
Huang, Chang [3 ]
Liu, Wenyu [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Inst Artificial Intelligence, Wuhan 430074, Peoples R China
[3] Horizon Robot Inc, Beijing 100190, Peoples R China
Keywords
Computer vision; deep learning; object detection; instance segmentation
DOI
10.1109/TCSVT.2023.3344713
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communications Technology]
Subject Classification Codes
0808; 0809
Abstract
Feature pyramid representations have been widely adopted in the object detection literature to better handle scale variation, providing rich information from multiple spatial levels for the classification and localization sub-tasks. We find that inter-sub-task feature disentanglement and intra-sub-task feature re-fusion are both crucial for final prediction quality, yet difficult to achieve simultaneously without sacrificing computational efficiency. We show that this dilemma can be resolved through careful module design. In this paper, we propose an Efficient Task-specific Feature Re-fusion (ETFR) module: ETFR disentangles inter-sub-task features, reduces the output channels of multi-scale features according to their importance, and re-fuses intra-sub-task features via a concatenation operation. As a plug-and-play module, ETFR consistently and markedly improves well-established, highly optimized object detection and instance segmentation methods such as RetinaNet, FCOS, BlendMask and CondInst, with negligible extra computational cost. Extensive experiments demonstrate that ETFR generalizes well across challenging datasets, including COCO, LVIS and Cityscapes.
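The abstract only sketches the ETFR design, so the following is a minimal, hypothetical PyTorch rendering of the three steps it describes: task-specific disentanglement, per-level channel reduction, and intra-task re-fusion by concatenation. The class name ETFRSketch, the channel budgets, and the nearest-neighbor resizing to a reference level are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the ETFR idea described in the abstract.
# Channel-importance weighting, level handling, and fusion details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ETFRSketch(nn.Module):
    """Per-task re-fusion of FPN features: reduce channels at each level,
    resize to a reference resolution, and re-fuse by concatenation."""

    def __init__(self, in_channels=256, num_levels=5, reduced_channels=64):
        super().__init__()
        # Task-specific 1x1 convs disentangle features from the shared pyramid
        # and shrink the channel width of every level (budget is an assumption).
        self.reduce = nn.ModuleList(
            [nn.Conv2d(in_channels, reduced_channels, kernel_size=1)
             for _ in range(num_levels)]
        )
        # Project the concatenated multi-level features back to the head width.
        self.fuse = nn.Conv2d(reduced_channels * num_levels, in_channels, 1)

    def forward(self, feats):
        # feats: list of FPN maps [P3, ..., P7], highest resolution first.
        ref_size = feats[0].shape[-2:]
        reduced = [
            F.interpolate(conv(f), size=ref_size, mode="nearest")
            for conv, f in zip(self.reduce, feats)
        ]
        # Intra-task re-fusion via concatenation along the channel axis.
        return self.fuse(torch.cat(reduced, dim=1))


if __name__ == "__main__":
    feats = [torch.randn(1, 256, 64 // 2**i, 64 // 2**i) for i in range(5)]
    print(ETFRSketch()(feats).shape)  # torch.Size([1, 256, 64, 64])
```

In a detector such as FCOS one copy of such a module would presumably serve the classification branch and a separate copy the regression branch, which is what keeps the sub-task features disentangled while each branch still sees all pyramid levels.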
Pages: 5350-5360
Page count: 11