Knowledge Amalgamation for Object Detection With Transformers

Cited by: 8
Authors
Zhang, Haofei [1]
Mao, Feng [2]
Xue, Mengqi [3]
Fang, Gongfan [4]
Feng, Zunlei [5]
Song, Jie [5]
Song, Mingli [6,7,8,9]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou 310027, Peoples R China
[2] Alibaba Grp, Xixi Campus, Hangzhou 311121, Peoples R China
[3] Hangzhou City Univ, Sch Comp & Comp Sci, Hangzhou 310028, Peoples R China
[4] Natl Univ Singapore, Elect & Comp Engn, Singapore 119077, Singapore
[5] Zhejiang Univ, Coll Software Technol, Hangzhou 310027, Peoples R China
[6] Zhejiang Univ, Shanghai Inst Adv Study, Shanghai 200080, Peoples R China
[7] Zhejiang Univ, Coll Comp Sci, Hangzhou 310027, Peoples R China
[8] Zhejiang Univ, Zhejiang Prov Key Lab Serv Robot, Hangzhou 310027, Peoples R China
[9] ZJU Bangsun Joint Res Ctr, Hangzhou 310058, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Task analysis; Object detection; Detectors; Training; Computer architecture; Feature extraction; Model reusing; knowledge amalgamation; knowledge distillation; object detection; vision transformers;
DOI
10.1109/TIP.2023.3263105
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Knowledge amalgamation (KA) is a novel deep-model reuse task that aims to transfer knowledge from several well-trained teachers to a multi-talented and compact student. Most existing approaches are tailored to convolutional neural networks (CNNs). However, Transformers, built on a completely different architecture, have begun to challenge the dominance of CNNs in many computer vision tasks, and directly applying previous KA methods to Transformers leads to severe performance degradation. In this work, we explore a more effective KA scheme for Transformer-based object detection models. Considering the architectural characteristics of Transformers, we propose to decompose KA into two aspects: sequence-level amalgamation (SA) and task-level amalgamation (TA). In sequence-level amalgamation, a hint is generated by concatenating the teacher sequences rather than redundantly aggregating them into a single fixed-size sequence as in previous KA approaches. In task-level amalgamation, the student efficiently learns heterogeneous detection tasks through soft targets. Extensive experiments on PASCAL VOC and COCO show that sequence-level amalgamation significantly boosts student performance, whereas the previous methods impair it. Moreover, Transformer-based students excel at learning amalgamated knowledge: they master heterogeneous detection tasks rapidly and achieve performance superior, or at least comparable, to that of the teachers in their respective specializations.
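To make the two amalgamation aspects concrete, the following is a minimal NumPy sketch, not the paper's implementation: the function names, shapes, and the plain MSE/KL losses are illustrative assumptions (the actual method operates on learned Transformer features, with projections and detection-specific targets).

```python
import numpy as np

def sequence_level_hint_loss(teacher_seqs, student_seq):
    """Sequence-level amalgamation (SA) sketch: concatenate the teachers'
    token sequences along the sequence axis to form a hint, then measure
    the student's distance to it (plain MSE here)."""
    hint = np.concatenate(teacher_seqs, axis=0)      # (sum of lengths, dim)
    return float(np.mean((student_seq - hint) ** 2))

def _softmax(logits, temperature):
    z = np.exp((logits - logits.max(axis=-1, keepdims=True)) / temperature)
    return z / z.sum(axis=-1, keepdims=True)

def task_level_soft_target_loss(teacher_logits, student_logits, temperature=2.0):
    """Task-level amalgamation (TA) sketch: KL divergence between softened
    teacher and student class distributions (standard soft-target distillation)."""
    p = _softmax(teacher_logits, temperature)
    q = _softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# Two teachers with token sequences of length 3 and 2, feature dim 4.
t1, t2 = np.ones((3, 4)), np.zeros((2, 4))
student = np.concatenate([t1, t2], axis=0)           # student matches the hint
print(sequence_level_hint_loss([t1, t2], student))   # → 0.0
```

Note the key contrast with prior CNN-oriented KA: concatenation preserves every teacher token (the hint grows with the number of teachers), instead of pooling all teachers into one fixed-size feature map.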
Pages: 2093-2106
Page count: 14
Related Papers
50 in total
  • [1] Learning Implicit Class Knowledge for RGB-D Co-Salient Object Detection With Transformers
    Zhang, Ni
    Han, Junwei
    Liu, Nian
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4556 - 4570
  • [2] TransVOD: End-to-End Video Object Detection With Spatial-Temporal Transformers
    Zhou, Qianyu
    Li, Xiangtai
    He, Lu
    Yang, Yibo
    Cheng, Guangliang
    Tong, Yunhai
    Ma, Lizhuang
    Tao, Dacheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 7853 - 7869
  • [3] Object Detection Using Deep Learning, CNNs and Vision Transformers: A Review
    Amjoud, Ayoub Benali
    Amrouch, Mustapha
    IEEE ACCESS, 2023, 11 : 35479 - 35516
  • [4] TCNet: Co-Salient Object Detection via Parallel Interaction of Transformers and CNNs
    Ge, Yanliang
    Zhang, Qiao
    Xiang, Tian-Zhu
    Zhang, Cong
    Bi, Hongbo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (06) : 2600 - 2615
  • [5] Auto Learner of Objects Co-Occurrence Knowledge for Object Detection in Remote Sensing Images
    Zheng, Kunlong
    Dong, Yifan
    Xu, Wei
    Tan, Weixian
    Huang, Pingping
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [6] Unsupervised Pre-Training for Detection Transformers
    Dai, Zhigang
    Cai, Bolun
    Lin, Yugeng
    Chen, Junying
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 12772 - 12782
  • [7] Transformed Dynamic Feature Pyramid for Small Object Detection
    Liang, Hong
    Yang, Ying
    Zhang, Qian
    Feng, Linxia
    Ren, Jie
    Liang, Qiyao
    IEEE ACCESS, 2021, 9 : 134649 - 134659
  • [8] A Refined Hybrid Network for Object Detection in Aerial Images
    Yu, Ying
    Yang, Xi
    Li, Jie
    Gao, Xinbo
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [9] L-DETR: A Light-Weight Detector for End-to-End Object Detection With Transformers
    Li, Tianyang
    Wang, Jian
    Zhang, Tibing
    IEEE ACCESS, 2022, 10 : 105685 - 105692
  • [10] Cross-Domain Adaptive Object Detection Based on Refined Knowledge Transfer and Mined Guidance in Autonomous Vehicles
    Wang, Ke
    Pu, Liang
    Dong, Wenjie
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01): : 1899 - 1908