Knowledge Amalgamation for Object Detection With Transformers

Cited by: 8
Authors
Zhang, Haofei [1]
Mao, Feng [2]
Xue, Mengqi [3]
Fang, Gongfan [4]
Feng, Zunlei [5]
Song, Jie [5]
Song, Mingli [6, 7, 8, 9]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou 310027, Peoples R China
[2] Alibaba Grp, Xixi Campus, Hangzhou 311121, Peoples R China
[3] Hangzhou City Univ, Sch Comp & Comp Sci, Hangzhou 310028, Peoples R China
[4] Natl Univ Singapore, Elect & Comp Engn, Singapore 119077, Singapore
[5] Zhejiang Univ, Coll Software Technol, Hangzhou 310027, Peoples R China
[6] Zhejiang Univ, Shanghai Inst Adv Study, Shanghai 200080, Peoples R China
[7] Zhejiang Univ, Coll Comp Sci, Hangzhou 310027, Peoples R China
[8] Zhejiang Univ, Zhejiang Prov Key Lab Serv Robot, Hangzhou 310027, Peoples R China
[9] ZJU Bangsun Joint Res Ctr, Hangzhou 310058, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Task analysis; Object detection; Detectors; Training; Computer architecture; Feature extraction; Model reusing; knowledge amalgamation; knowledge distillation; object detection; vision transformers
DOI
10.1109/TIP.2023.3263105
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge amalgamation (KA) is a novel deep model reusing task that aims to transfer knowledge from several well-trained teachers to a multi-talented and compact student. Currently, most KA approaches are tailored for convolutional neural networks (CNNs). However, Transformers, with a completely different architecture, are starting to challenge the dominance of CNNs in many computer vision tasks, and directly applying previous KA methods to Transformers leads to severe performance degradation. In this work, we explore a more effective KA scheme for Transformer-based object detection models. Specifically, considering the architectural characteristics of Transformers, we propose to decompose KA into two aspects: sequence-level amalgamation (SA) and task-level amalgamation (TA). In sequence-level amalgamation, a hint is generated by concatenating the teacher sequences instead of redundantly aggregating them into a fixed-size one, as previous KA approaches do. In task-level amalgamation, the student learns heterogeneous detection tasks efficiently through soft targets. Extensive experiments on PASCAL VOC and COCO show that sequence-level amalgamation significantly boosts the performance of students, whereas the previous methods impair it. Moreover, the Transformer-based students excel at learning amalgamated knowledge: they master heterogeneous detection tasks rapidly and achieve performance superior, or at least comparable, to that of the teachers in their specializations.
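The contrast the abstract draws between concatenating teacher sequences and aggregating them into a fixed-size one can be illustrated with a small sketch. This is not the paper's implementation: the function names (`concat_hint`, `mean_pool_hint`, `mse`), the token-wise averaging used as a stand-in for "fixed-size aggregation", and the mean-squared-error hint loss are all assumptions made purely for illustration.

```python
# Hypothetical sketch: names, shapes, and the MSE loss are assumptions,
# not taken from the paper. Sequences are lists of token vectors.

def concat_hint(teacher_seqs):
    """Build a hint by concatenating teacher sequences along the token axis."""
    hint = []
    for seq in teacher_seqs:
        hint.extend(seq)
    return hint

def mean_pool_hint(teacher_seqs):
    """Assumed fixed-size alternative: average the teachers token by token.

    Requires all teachers to have the same sequence length and width.
    """
    n = len(teacher_seqs)
    return [[sum(vals) / n for vals in zip(*tokens)]
            for tokens in zip(*teacher_seqs)]

def mse(student_seq, hint):
    """Mean squared error between two equally shaped sequences."""
    total, count = 0.0, 0
    for s_tok, h_tok in zip(student_seq, hint):
        for s, h in zip(s_tok, h_tok):
            total += (s - h) ** 2
            count += 1
    return total / count

teacher_a = [[1.0, 0.0], [0.0, 1.0]]   # 2 tokens, width 2
teacher_b = [[2.0, 2.0], [4.0, 4.0]]   # 2 tokens, width 2

hint = concat_hint([teacher_a, teacher_b])       # 4 tokens: every teacher token is kept
pooled = mean_pool_hint([teacher_a, teacher_b])  # 2 tokens: teachers are blended together

student = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 5.0]]
loss = mse(student, hint)
```

In this toy setting the concatenated hint preserves each teacher's tokens separately, while the pooled hint mixes them into one shorter sequence; how the actual method aligns the student sequence with the hint is not specified in this record.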
Pages: 2093-2106
Page count: 14