Lightweight Model Pre-Training via Language Guided Knowledge Distillation

Cited by: 1
Authors
Li, Mingsheng [1]
Zhang, Lin [1]
Zhu, Mingzhen [1]
Huang, Zilong [2]
Yu, Gang [2]
Fan, Jiayuan [3]
Chen, Tao [1]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Tencent GY Lab, Shanghai 200000, Peoples R China
[3] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Visualization; Semantics; Task analysis; Feature extraction; Training; Computational modeling; Image segmentation; Lightweight model pre-training; language-guided distillation; textual semantics bank; visual semantics banks;
DOI
10.1109/TMM.2024.3410532
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
This paper studies the problem of pre-training for small models, which is essential for many mobile devices. Current state-of-the-art methods on this problem transfer the representational knowledge of a large network (as a Teacher) into a smaller model (as a Student) using self-supervised distillation, improving the performance of the small model on downstream tasks. However, existing approaches are insufficient in extracting the crucial knowledge that is useful for discerning categories in downstream tasks during the distillation process. In this paper, for the first time, we introduce language guidance to the distillation process and propose a new method named Language-Guided Distillation (LGD), which uses category names of the target downstream task to help refine the knowledge transferred between the teacher and the student. To this end, we utilize a pre-trained text encoder to extract semantic embeddings from language and construct a textual semantic space called the Textual Semantics Bank (TSB). Furthermore, we design a Language-Guided Knowledge Aggregation (LGKA) module to construct the visual semantic space, also named the Visual Semantics Bank (VSB). The task-related knowledge is transferred by driving a student encoder to mimic the similarity score distribution inferred by the teacher over the TSB and the VSB. Experimental results show that, compared with other small models obtained by either ImageNet pre-training or self-supervised distillation, the lightweight model distilled with the proposed LGD method achieves state-of-the-art performance, which is validated on various downstream tasks, including classification, detection, and segmentation.
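To make the transfer objective described in the abstract concrete, the following is a minimal PyTorch-style sketch of one way a student encoder could be driven to mimic a teacher's similarity score distribution over a semantics bank. The specific loss (KL divergence over softmax-normalized cosine similarities), the temperature value, and all names below (similarity_distribution, lgd_style_loss, text_bank, visual_bank) are illustrative assumptions drawn only from the abstract, not the paper's exact LGD formulation or its LGKA module.

    # Sketch of language-guided distillation as summarized in the abstract:
    # the student matches the teacher's similarity-score distribution over
    # a textual semantics bank (embeddings of downstream category names)
    # and a visual semantics bank. All details here are assumptions.
    import torch
    import torch.nn.functional as F

    def similarity_distribution(features, bank, temperature=0.07):
        """Log-softmax over cosine similarities between features (B, D) and bank entries (K, D)."""
        features = F.normalize(features, dim=-1)
        bank = F.normalize(bank, dim=-1)
        logits = features @ bank.t() / temperature
        return F.log_softmax(logits, dim=-1)

    def lgd_style_loss(student_feats, teacher_feats, text_bank, visual_bank):
        """KL divergence between student and (detached) teacher distributions over both banks."""
        loss = 0.0
        for bank in (text_bank, visual_bank):
            p_teacher = similarity_distribution(teacher_feats, bank).exp().detach()
            log_p_student = similarity_distribution(student_feats, bank)
            loss = loss + F.kl_div(log_p_student, p_teacher, reduction="batchmean")
        return loss

    # Usage sketch: `teacher` and `student` are image encoders (teacher frozen),
    # `text_bank` holds category-name embeddings from a pre-trained text encoder,
    # and `visual_bank` is a visual semantics bank built alongside it.
    # loss = lgd_style_loss(student(images), teacher(images).detach(), text_bank, visual_bank)

In this sketch, gradients flow only through the student features, while the teacher and the two banks supply the target distributions; how the paper actually constructs the visual bank and weights the two terms is not reproduced here.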
Pages: 10720-10730
Number of pages: 11