Knowledge Amalgamation for Object Detection With Transformers

Cited by: 8
Authors
Zhang, Haofei [1 ]
Mao, Feng [2 ]
Xue, Mengqi [3 ]
Fang, Gongfan [4 ]
Feng, Zunlei [5 ]
Song, Jie [5 ]
Song, Mingli [6 ,7 ,8 ,9 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou 310027, Peoples R China
[2] Alibaba Grp, Xixi Campus, Hangzhou 311121, Peoples R China
[3] Hangzhou City Univ, Sch Comp & Comp Sci, Hangzhou 310028, Peoples R China
[4] Natl Univ Singapore, Elect & Comp Engn, Singapore 119077, Singapore
[5] Zhejiang Univ, Coll Software Technol, Hangzhou 310027, Peoples R China
[6] Zhejiang Univ, Shanghai Inst Adv Study, Shanghai 200080, Peoples R China
[7] Zhejiang Univ, Coll Comp Sci, Hangzhou 310027, Peoples R China
[8] Zhejiang Univ, Zhejiang Prov Key Lab Serv Robot, Hangzhou 310027, Peoples R China
[9] ZJU Bangsun Joint Res Ctr, Hangzhou 310058, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Task analysis; Object detection; Detectors; Training; Computer architecture; Feature extraction; Model reusing; knowledge amalgamation; knowledge distillation; object detection; vision transformers;
DOI
10.1109/TIP.2023.3263105
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge amalgamation (KA) is a novel deep model reuse task that aims to transfer knowledge from several well-trained teachers to a single versatile and compact student. Most existing KA approaches are tailored to convolutional neural networks (CNNs). However, Transformers, built on a fundamentally different architecture, have begun to challenge the dominance of CNNs in many computer vision tasks, and directly applying previous KA methods to Transformers leads to severe performance degradation. In this work, we explore a more effective KA scheme for Transformer-based object detection models. Considering the architectural characteristics of Transformers, we decompose KA into two aspects: sequence-level amalgamation (SA) and task-level amalgamation (TA). In sequence-level amalgamation, the hint is generated by concatenating the teacher sequences rather than redundantly aggregating them into a single fixed-size sequence, as previous KA approaches do. In task-level amalgamation, the student efficiently learns heterogeneous detection tasks through soft targets. Extensive experiments on PASCAL VOC and COCO show that sequence-level amalgamation significantly boosts student performance, whereas previous methods impair it. Moreover, Transformer-based students excel at learning amalgamated knowledge: they master heterogeneous detection tasks rapidly and achieve performance superior, or at least comparable, to that of the teachers in their respective specializations.
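To make the two amalgamation aspects in the abstract concrete, below is a minimal PyTorch-style sketch. It only illustrates the idea of concatenating teacher sequences (rather than aggregating them to a fixed size) and of learning from soft targets; the function names, tensor shapes, loss choices (MSE for the sequence hint, temperature-scaled KL divergence for soft targets), and the temperature value are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of sequence-level and task-level amalgamation as
# described in the abstract. Shapes, losses, and the temperature are
# assumptions, not the paper's implementation details.
import torch
import torch.nn.functional as F

def sequence_level_hint_loss(teacher_seqs, student_seq):
    """Sequence-level amalgamation (SA): concatenate the teachers' token
    sequences along the length dimension to form the hint, instead of
    aggregating them into one fixed-size sequence.

    teacher_seqs: list of tensors, each (batch, len_i, dim)
    student_seq:  tensor (batch, sum(len_i), dim), assumed to already be
                  sized to match the concatenated hint
    """
    hint = torch.cat(teacher_seqs, dim=1)        # (batch, sum(len_i), dim)
    return F.mse_loss(student_seq, hint)

def task_level_soft_target_loss(student_logits, teacher_logits, T=2.0):
    """Task-level amalgamation (TA): the student learns each teacher's
    detection task from the teacher's soft targets (a KD-style objective)."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)
```

Concatenation keeps every teacher token available to the student, which matches the abstract's point that aggregating teacher sequences into a fixed-size one discards information and hurts Transformer students.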
Pages: 2093-2106
Number of pages: 14
Related Papers
50 records in total
• [21] Chen, Chaofan; Yang, Xiaoshan; Zhang, Jinpeng; Dong, Bo; Xu, Changsheng. Category Knowledge-Guided Parameter Calibration for Few-Shot Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32: 1092-1107
• [22] Ning, Kanglin; Liu, Yanfei; Su, Yanzhao; Jiang, Ke. Diversity Knowledge Distillation for LiDAR-Based 3-D Object Detection. IEEE SENSORS JOURNAL, 2023, 23 (11): 11181-11193
• [23] Xu, Yifan; Zhang, Mengdan; Yang, Xiaoshan; Xu, Changsheng. Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33: 6253-6267
• [24] Zhao, Huaqi; Peng, Xiang; Wang, Su; Li, Jun-Bao; Pan, Jeng-Shyang; Su, Xiaoguang; Liu, Xiaomin. Improved object detection method for unmanned driving based on Transformers. FRONTIERS IN NEUROROBOTICS, 2024, 18
• [25] Han, Jiaming; Ding, Jian; Li, Jie; Xia, Gui-Song. Align Deep Features for Oriented Object Detection. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
• [26] Liu, Feng; Zhang, Xiaosong; Wan, Fang; Ji, Xiangyang; Ye, Qixiang. Domain Contrast for Domain Adaptive Object Detection. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12): 8227-8237
• [27] Tu, Zhangping; Zhou, Wujie; Qian, Xiaohong; Yan, Weiqing. Hybrid Knowledge Distillation Network for RGB-D Co-Salient Object Detection. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2025, 55 (04): 2695-2706
• [28] Zhen, Peining; Yan, Xiaotao; Wang, Wei; Hou, Tianshu; Wei, Hao; Chen, Hai-Bao. Toward Compact Transformers for End-to-End Object Detection With Decomposed Chain Tensor Structure. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (02): 872-885
• [29] Hao, Chao; Yu, Zitong; Liu, Xin; Xu, Jun; Yue, Huanjing; Yang, Jingyu. A Simple Yet Effective Network Based on Vision Transformer for Camouflaged Object and Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34: 608-622
• [30] Joseph, K. J.; Rajasegaran, Jathushan; Khan, Salman; Khan, Fahad Shahbaz; Balasubramanian, Vineeth N. Incremental Object Detection via Meta-Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12): 9209-9216