Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection

Authors
Zhe Chen
Jing Zhang
Yufei Xu
Dacheng Tao
Affiliations
[1] The University of Sydney,Faculty of Engineering, School of Computer Science
Source
International Journal of Computer Vision | 2023 / Vol. 131
Keywords
Object detection; Feature pyramid; Context modeling; 35A01; 65L10; 65L12; 65L20; 65L70;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF), which aims to mitigate the gap between features from different levels and form a comprehensive object representation to achieve better detection performance. However, they usually require heavy cross-level connections or iterative refinement to obtain better MFF results, making them complicated in structure and inefficient in computation. To address these issues, we propose a novel and efficient context modeling mechanism that can help existing FPs deliver better MFF results while effectively reducing computational costs. In particular, we introduce a novel insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency. The two representations are a locally concentrated representation and a globally summarized representation: the former focuses on extracting context cues from nearby areas, while the latter extracts general contextual representations of the whole image scene as global context cues. After collecting the condensed contexts, we employ a Transformer decoder to investigate the relations between them and each local feature from the FP, and then refine the MFF results accordingly. As a result, we obtain a simple and lightweight Transformer-based Context Condensation (TCC) module, which can boost various FPs and lower their computational costs simultaneously. Extensive experimental results on the challenging MS COCO dataset show that TCC is compatible with four representative FPs, consistently improves their detection accuracy by up to 7.8% in terms of average precision, and reduces their complexity by up to around 20% in terms of GFLOPs, helping them achieve state-of-the-art performance more efficiently. Code will be released at https://github.com/zhechen/TCC.
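The idea described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names are hypothetical, average pooling is only a stand-in for however TCC actually condenses local and global contexts, and a single unprojected attention step stands in for the learned Transformer decoder. It only shows the data flow: condense the feature map into a few context tokens, then let each local feature attend to those tokens and add the result back.

```python
import math

def local_context_tokens(feature_map, window):
    """Stand-in for the locally concentrated representation:
    average each non-overlapping window x window patch into one token."""
    h, w = len(feature_map), len(feature_map[0])
    c = len(feature_map[0][0])
    tokens = []
    for i in range(0, h, window):
        for j in range(0, w, window):
            patch = [feature_map[y][x]
                     for y in range(i, min(i + window, h))
                     for x in range(j, min(j + window, w))]
            tokens.append([sum(v[k] for v in patch) / len(patch) for k in range(c)])
    return tokens

def global_context_tokens(feature_map):
    """Stand-in for the globally summarized representation:
    one token holding the mean feature of the whole map."""
    flat = [v for row in feature_map for v in row]
    c = len(flat[0])
    return [[sum(v[k] for v in flat) / len(flat) for k in range(c)]]

def cross_attend(query, context):
    """Single-head scaled dot-product attention without learned projections
    (an assumption; a real Transformer decoder uses learned weights)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(d) for tok in context]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(wt * tok[k] for wt, tok in zip(weights, context)) for k in range(d)]

def refine(feature_map, window=2):
    """Refine every local feature with the condensed contexts (residual update)."""
    context = local_context_tokens(feature_map, window) + global_context_tokens(feature_map)
    return [[[f + a for f, a in zip(vec, cross_attend(vec, context))] for vec in row]
            for row in feature_map]

# Toy 4 x 4 feature map with 2 channels per location.
toy_map = [[[float(y), float(x)] for x in range(4)] for y in range(4)]
refined = refine(toy_map, window=2)
```

In this toy setting, each of the 16 local features attends to only 5 condensed tokens (4 local, 1 global) instead of all 16 locations, which is the kind of cost reduction that condensing contexts is meant to provide.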
Pages: 2738-2756
Page count: 18
Related papers
50 in total
  • [21] TANet: Transformer-based asymmetric network for RGB-D salient object detection
    Liu, Chang
    Yang, Gang
    Wang, Shuo
    Wang, Hangxu
    Zhang, Yunhua
    Wang, Yutao
    IET COMPUTER VISION, 2023, 17 (04) : 415 - 430
  • [22] TOD-Net: An end-to-end transformer-based object detection network
    Sirisha, Museboyina
    Sudha, S. V.
    COMPUTERS & ELECTRICAL ENGINEERING, 2023, 108
  • [23] FPDT: a multi-scale feature pyramidal object detection transformer
    Huang, Kailai
    Wen, Mi
    Wang, Chen
    Ling, Lina
    JOURNAL OF APPLIED REMOTE SENSING, 2023, 17 (02)
  • [24] Swin Transformer-Based Object Detection Model Using Explainable Meta-Learning Mining
    Baek, Ji-Won
    Chung, Kyungyong
    APPLIED SCIENCES-BASEL, 2023, 13 (05)
  • [25] Transformer-Based Optimized Multimodal Fusion for 3D Object Detection in Autonomous Driving
    Alaba, Simegnew Yihunie
    Ball, John E.
    IEEE ACCESS, 2024, 12 : 50165 - 50176
  • [26] A Detection Transformer-Based Environmental Foreign Object Feature Detection Algorithm for the Social Internet of Things: Addressing high positioning accuracy requirements and environmental factor interference
    Cai, Kewei
    IEEE SYSTEMS MAN AND CYBERNETICS MAGAZINE, 2025, 11 (01) : 57 - 66
  • [27] A uniform transformer-based structure for feature fusion and enhancement for RGB-D saliency detection
    Wang, Yue
    Jia, Xu
    Zhang, Lu
    Li, Yuke
    Elder, James H.
    Lu, Huchuan
    PATTERN RECOGNITION, 2023, 140
  • [28] QAGA-Net: enhanced vision transformer-based object detection for remote sensing images
    Song, Huaxiang
    Xia, Hanjun
    Wang, Wenhui
    Zhou, Yang
    Liu, Wanbo
    Liu, Qun
    Liu, Jinling
    INTERNATIONAL JOURNAL OF INTELLIGENT COMPUTING AND CYBERNETICS, 2025, 18 (01) : 133 - 152
  • [29] Multi-scale Feature Fusion Object Detection Based on Swin Transformer
    Zhang, Ying
    Wu, Lin
    Deng, Huaxuan
    Hu, Jun
    Li, Xifan
    39TH YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION, YAC 2024, 2024 : 1982 - 1987
  • [30] Transformer Based Remote Sensing Object Detection With Enhanced Multispectral Feature Extraction
    Zhu, Jiahe
    Chen, Xu
    Zhang, Huan
    Tan, Zelong
    Wang, Shengjin
    Ma, Hongbing
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20