Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection

Cited by: 0
Authors
Zhe Chen
Jing Zhang
Yufei Xu
Dacheng Tao
Affiliations
[1] The University of Sydney, Faculty of Engineering, School of Computer Science
Source
International Journal of Computer Vision | 2023, Vol. 131
Keywords
Object detection; Feature pyramid; Context modeling
DOI
Not available
Abstract
Current object detectors typically include a feature pyramid (FP) module for multi-level feature fusion (MFF), which aims to mitigate the gap between features from different levels and form a comprehensive object representation for better detection performance. However, they usually require heavy cross-level connections or iterative refinement to obtain better MFF results, making them structurally complicated and computationally inefficient. To address these issues, we propose a novel and efficient context modeling mechanism that helps existing FPs deliver better MFF results while effectively reducing computational costs. In particular, we introduce the insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency: a locally concentrated representation and a globally summarized representation, where the former focuses on extracting context cues from nearby areas while the latter extracts general contextual representations of the whole image scene as global context cues. After collecting the condensed contexts, we employ a Transformer decoder to investigate the relations between them and each local feature from the FP, and then refine the MFF results accordingly. As a result, we obtain a simple and lightweight Transformer-based Context Condensation (TCC) module, which can boost various FPs and lower their computational costs simultaneously. Extensive experimental results on the challenging MS COCO dataset show that TCC is compatible with four representative FPs and consistently improves their detection accuracy by up to 7.8% in average precision while reducing their complexity by up to around 20% in GFLOPs, helping them achieve state-of-the-art performance more efficiently. Code will be released at https://github.com/zhechen/TCC.
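As a rough illustration of the mechanism described in the abstract, the PyTorch-style sketch below condenses contexts into a small set of local tokens (pooled from nearby areas of each pyramid level) and global tokens (learned queries summarizing the whole scene), then lets a standard Transformer decoder relate each pyramid feature to these condensed contexts. All names and design details (e.g., TCCModule, num_global_tokens, local_pool) are assumptions for illustration only, not the authors' released implementation at the URL above.

```python
# Hypothetical sketch of Transformer-based Context Condensation (TCC),
# reconstructed from the abstract; module names and hyperparameters are assumptions.
import torch
import torch.nn as nn


class TCCModule(nn.Module):
    def __init__(self, channels=256, num_global_tokens=4, local_pool=7,
                 num_heads=8, num_layers=1):
        super().__init__()
        # Locally concentrated representation: pool each level into a small grid
        # so each token condenses context cues from a nearby area.
        self.local_pool = nn.AdaptiveAvgPool2d(local_pool)
        # Globally summarized representation: a few learned queries attend over
        # the coarsest level to form image-level context tokens.
        self.global_queries = nn.Parameter(torch.randn(num_global_tokens, channels))
        self.global_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Transformer decoder: pyramid features act as queries, condensed contexts as memory.
        layer = nn.TransformerDecoderLayer(channels, num_heads,
                                           dim_feedforward=4 * channels,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)

    def forward(self, fpn_feats):
        # fpn_feats: list of (B, C, H_l, W_l) feature maps from an existing FP.
        b, c = fpn_feats[0].shape[:2]
        # Condense local contexts from every level into a short token sequence.
        local_tokens = torch.cat(
            [self.local_pool(f).flatten(2).transpose(1, 2) for f in fpn_feats], dim=1)
        # Condense global contexts from the coarsest level.
        coarsest = fpn_feats[-1].flatten(2).transpose(1, 2)        # (B, H*W, C)
        queries = self.global_queries.unsqueeze(0).expand(b, -1, -1)
        global_tokens, _ = self.global_attn(queries, coarsest, coarsest)
        memory = torch.cat([local_tokens, global_tokens], dim=1)
        # Refine each level by relating its local features to the condensed contexts.
        refined = []
        for f in fpn_feats:
            h, w = f.shape[-2:]
            q = f.flatten(2).transpose(1, 2)                       # (B, H*W, C)
            out = self.decoder(q, memory)                          # cross-attend to contexts
            refined.append(out.transpose(1, 2).reshape(b, c, h, w) + f)
        return refined
```

Under this reading, the efficiency gain would come from the memory being only a few dozen condensed tokens rather than all pyramid locations; how the actual TCC module avoids the cost of dense query self-attention is not specified in the abstract and is left out of this sketch.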
Pages: 2738–2756
Number of pages: 18