Efficient Task-Specific Feature Re-Fusion for More Accurate Object Detection and Instance Segmentation

Cited by: 1
Authors
Wang, Cheng [1 ,2 ]
Fang, Yuxin [1 ]
Fang, Jiemin [1 ,2 ]
Guo, Peng [1 ]
Wu, Rui [3 ]
Huang, He [3 ]
Wang, Xinggang [1 ]
Huang, Chang [3 ]
Liu, Wenyu [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Inst Artificial Intelligence, Wuhan 430074, Peoples R China
[3] Horizon Robot Inc, Beijing 100190, Peoples R China
Keywords
Computer vision; deep learning; object detection; instance segmentation
DOI
10.1109/TCSVT.2023.3344713
CLC Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Code
0808; 0809
Abstract
Feature pyramid representations have been widely adopted in the object detection literature to better handle scale variation, providing rich information from multiple spatial levels for the classification and localization sub-tasks. We find that inter-sub-task feature disentanglement and intra-sub-task feature re-fusion are both crucial for final prediction performance, yet are hard to achieve simultaneously under computational-efficiency constraints; careful module design, however, can resolve this tension. In this paper, we propose an Efficient Task-specific Feature Re-fusion (ETFR) module to mitigate the dilemma. ETFR disentangles inter-sub-task features, reduces the output channels of the multi-scale features according to their importance, and re-fuses intra-sub-task features via a concatenation operation. As a plug-and-play module, ETFR remarkably and consistently improves well-established, highly optimized object detection and instance segmentation methods such as RetinaNet, FCOS, BlendMask, and CondInst, with negligible extra computational cost. Extensive experiments demonstrate that ETFR generalizes well across challenging datasets, including COCO, LVIS, and Cityscapes.
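The per-task re-fusion described in the abstract (importance-based channel reduction per pyramid level, then concatenation at a common resolution) can be sketched schematically. The following NumPy code is an illustrative reconstruction only, not the authors' implementation: the function names, the fixed per-level weight matrices standing in for learned 1x1 convolutions, and the nearest-neighbor upsampling are all assumptions.

```python
import numpy as np

def reduce_channels(feat, w):
    # Stand-in for a learned 1x1 conv: channel mixing only.
    # feat: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)
    return np.tensordot(w, feat, axes=([1], [0]))

def etfr_branch(pyramid, reducers, target_hw):
    """Schematic re-fusion for one sub-task branch (e.g. classification).

    In ETFR each sub-task gets its own disentangled branch; only one
    branch is shown here. Each level is channel-reduced (fewer channels
    for less important levels), brought to a common resolution, and the
    results are re-fused by concatenation along the channel axis.

    pyramid:  list of (C, H_l, W_l) feature maps, coarser as l grows
    reducers: list of (C_out_l, C) weight matrices, one per level
    target_hw: (H, W) resolution at which features are concatenated
    """
    fused = []
    for feat, w in zip(pyramid, reducers):
        r = reduce_channels(feat, w)          # importance-based channel reduction
        fy = target_hw[0] // r.shape[1]       # nearest-neighbor upsample factors
        fx = target_hw[1] // r.shape[2]
        r = np.repeat(np.repeat(r, fy, axis=1), fx, axis=2)
        fused.append(r)
    return np.concatenate(fused, axis=0)      # re-fusion via concatenation
```

For example, three pyramid levels with 8 input channels each, reduced to 4, 2, and 2 channels respectively and fused at an 8x8 resolution, yield a single (8, 8, 8) task-specific feature map; the total channel count stays small because less important levels contribute fewer channels.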
Pages: 5350-5360
Number of pages: 11
Related Papers
75 in total
  • [51] Tian Z., Shen C., Chen H., He T., "FCOS: Fully Convolutional One-Stage Object Detection," ICCV 2019, pp. 9626-9635
  • [52] Lin T.-Y., Goyal P., Girshick R., He K., Dollár P., "Focal Loss for Dense Object Detection," ICCV 2017, pp. 2999-3007
  • [53] Wang B., Ji R., Zhang L., Wu Y., "Bridging Multi-Scale Context-Aware Representation for Object Detection," IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(5): 2317-2329
  • [54] Wang Q.L., CVPR 2020, p. 11531, DOI 10.1109/CVPR42600.2020.01155
  • [55] Wang W.H., 2023, arXiv:2106.13797
  • [56] Wang X.J., CVPR 2020, p. 13356, DOI 10.1109/CVPR42600.2020.01337
  • [57] Wang Xinlong, Advances in Neural Information Processing Systems, 2020, vol. 33
  • [58] Woo S., 2023, arXiv
  • [59] Woo S., Park J., Lee J.-Y., Kweon I.S., "CBAM: Convolutional Block Attention Module," ECCV 2018, Pt VII, LNCS 11211, pp. 3-19
  • [60] Wu Y., 2019, Detectron2