Knowledge Amalgamation for Object Detection With Transformers

Cited by: 8

Authors:

Zhang, Haofei [1]

Mao, Feng [2]

Xue, Mengqi [3]

Fang, Gongfan [4]

Feng, Zunlei [5]

Song, Jie [5]

Song, Mingli [6,7,8,9]

Affiliations:

[1] Zhejiang Univ, Coll Comp Sci, Hangzhou 310027, Peoples R China

[2] Alibaba Grp, Xixi Campus, Hangzhou 311121, Peoples R China

[3] Hangzhou City Univ, Sch Comp & Comp Sci, Hangzhou 310028, Peoples R China

[4] Natl Univ Singapore, Elect & Comp Engn, Singapore 119077, Singapore

[5] Zhejiang Univ, Coll Software Technol, Hangzhou 310027, Peoples R China

[6] Zhejiang Univ, Shanghai Inst Adv Study, Shanghai 200080, Peoples R China

[7] Zhejiang Univ, Coll Comp Sci, Hangzhou 310027, Peoples R China

[8] Zhejiang Univ, Zhejiang Prov Key Lab Serv Robot, Hangzhou 310027, Peoples R China

[9] ZJU Bangsun Joint Res Ctr, Hangzhou 310058, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2023 / Vol. 32

Funding:

National Natural Science Foundation of China;

Keywords:

Transformers; Task analysis; Object detection; Detectors; Training; Computer architecture; Feature extraction; Model reusing; knowledge amalgamation; knowledge distillation; object detection; vision transformers;

DOI:

10.1109/TIP.2023.3263105

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Knowledge amalgamation (KA) is a novel deep model reusing task aiming to transfer knowledge from several well-trained teachers to a multi-talented and compact student. Currently, most of these approaches are tailored for convolutional neural networks (CNNs). However, Transformers, with a completely different architecture, are starting to challenge the dominance of CNNs in many computer vision tasks, and directly applying the previous KA methods to Transformers leads to severe performance degradation. In this work, we explore a more effective KA scheme for Transformer-based object detection models. Specifically, considering the architectural characteristics of Transformers, we propose to decompose KA into two aspects: sequence-level amalgamation (SA) and task-level amalgamation (TA). In particular, a hint is generated within the sequence-level amalgamation by concatenating teacher sequences instead of redundantly aggregating them into a fixed-size one, as previous KA approaches do. In addition, the student efficiently learns heterogeneous detection tasks through soft targets in the task-level amalgamation. Extensive experiments on PASCAL VOC and COCO show that the sequence-level amalgamation significantly boosts the performance of students, while the previous methods impair them. Moreover, the Transformer-based students excel in learning amalgamated knowledge: they master heterogeneous detection tasks rapidly and achieve performance superior, or at least comparable, to that of the teachers in their specializations.
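The contrast the abstract draws between concatenating teacher sequences and aggregating them into a fixed-size one can be illustrated with a small sketch. This is not the authors' implementation: the function names (concat_hint, mean_hint, mse), the use of element-wise averaging as the stand-in for "fixed-size aggregation", and the mean-squared-error hint loss are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List

# One token sequence: num_tokens x hidden_dim (plain lists, for illustration).
Sequence = List[List[float]]


def concat_hint(teacher_sequences: List[Sequence]) -> Sequence:
    """Hypothetical hint builder: concatenate teacher sequences along the
    token axis, so every teacher token is kept in the hint."""
    hint: Sequence = []
    for seq in teacher_sequences:
        hint.extend(list(token) for token in seq)
    return hint


def mean_hint(teacher_sequences: List[Sequence]) -> Sequence:
    """Assumed stand-in for fixed-size aggregation: element-wise mean across
    teachers. Requires all teachers to share one shape."""
    n = len(teacher_sequences)
    return [
        [sum(values) / n for values in zip(*tokens)]
        for tokens in zip(*teacher_sequences)
    ]


def mse(a: Sequence, b: Sequence) -> float:
    """Mean squared error between two equally shaped sequences
    (an assumed choice of hint loss)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same number of tokens")
    total, count = 0.0, 0
    for token_a, token_b in zip(a, b):
        for x, y in zip(token_a, token_b):
            total += (x - y) ** 2
            count += 1
    return total / count


teacher_a = [[1.0, 0.0], [0.0, 1.0]]
teacher_b = [[2.0, 2.0], [4.0, 0.0]]

hint = concat_hint([teacher_a, teacher_b])   # 4 tokens: nothing is merged
fixed = mean_hint([teacher_a, teacher_b])    # 2 tokens: teachers are blended
```

In this toy setting the concatenated hint keeps all four teacher tokens, while the averaged version collapses them into two blended tokens. A student trained against the concatenated hint would need an output sequence of matching length, which is a further assumption of the sketch.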


Pages: 2093 - 2106

Page count: 14
