Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection

Cited by: 0
Authors
Zhe Chen
Jing Zhang
Yufei Xu
Dacheng Tao
Affiliation
[1] The University of Sydney, Faculty of Engineering, School of Computer Science
Source
International Journal of Computer Vision | 2023, Vol. 131
Keywords
Object detection; Feature pyramid; Context modeling
DOI
Not available
Abstract
Current object detectors typically include a feature pyramid (FP) module for multi-level feature fusion (MFF), which aims to bridge the gap between features from different levels and form a comprehensive object representation for better detection performance. However, existing FPs usually require heavy cross-level connections or iterative refinement to obtain good MFF results, making them structurally complicated and computationally inefficient. To address these issues, we propose a novel and efficient context modeling mechanism that helps existing FPs deliver better MFF results while effectively reducing computational costs. In particular, we introduce the insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency: a locally concentrated representation, which extracts context cues from nearby areas, and a globally summarized representation, which extracts a general contextual representation of the whole image scene as global context cues. Given the condensed contexts, we employ a Transformer decoder to investigate the relations between them and each local feature from the FP, and then refine the MFF results accordingly. The result is a simple and lightweight Transformer-based Context Condensation (TCC) module that boosts various FPs while lowering their computational costs. Extensive experiments on the challenging MS COCO dataset show that TCC is compatible with four representative FPs, consistently improving their detection accuracy by up to 7.8% in average precision and reducing their complexity by up to around 20% in GFLOPs, helping them achieve state-of-the-art performance more efficiently. Code will be released at https://github.com/zhechen/TCC.
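The abstract describes condensing contexts into a local and a global representation and then refining each pyramid feature via decoder-style cross-attention over those condensed tokens. The following is a minimal NumPy sketch of that idea, not the authors' implementation: it assumes single-head attention, mean pooling as the condensation operator, and flattened (position, channel) features; names such as `condense_contexts` and `tcc_refine` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def condense_contexts(features, local_win=3):
    """Condense an (N, C) sequence of spatial features into context tokens:
    one locally pooled token per position plus one global token."""
    N, _ = features.shape
    # Locally concentrated representation: mean over a sliding neighborhood.
    local = np.stack([
        features[max(0, i - local_win):i + local_win + 1].mean(axis=0)
        for i in range(N)
    ])
    # Globally summarized representation: mean over the whole scene.
    global_ctx = features.mean(axis=0, keepdims=True)
    return local, global_ctx

def tcc_refine(features, local_ctx, global_ctx):
    """Decoder-style single-head cross-attention: each local feature queries
    the condensed context tokens; the attended result is added residually."""
    C = features.shape[1]
    ctx = np.concatenate([local_ctx, global_ctx], axis=0)   # (N+1, C) tokens
    attn = softmax(features @ ctx.T / np.sqrt(C), axis=-1)  # (N, N+1) weights
    return features + attn @ ctx                            # refined features

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8))        # 16 positions, 8 channels
local_ctx, global_ctx = condense_contexts(feats)
refined = tcc_refine(feats, local_ctx, global_ctx)
print(refined.shape)                        # (16, 8)
```

In the paper's setting the queries would come from multi-level FP features and the attention would use learned projections inside a Transformer decoder; the sketch only illustrates why condensation is efficient: attention is computed over N+1 condensed tokens rather than over dense cross-level feature maps.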
Pages: 2738–2756 (18 pages)