Lightweight Model Pre-Training via Language Guided Knowledge Distillation

Cited by: 0
Authors
Li, Mingsheng [1]
Zhang, Lin [1]
Zhu, Mingzhen [1]
Huang, Zilong [2]
Yu, Gang [2]
Fan, Jiayuan [3]
Chen, Tao [1]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Tencent GY Lab, Shanghai 200000, Peoples R China
[3] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Visualization; Semantics; Task analysis; Feature extraction; Training; Computational modeling; Image segmentation; Lightweight model pre-training; language-guided distillation; textual semantics bank; visual semantics banks;
DOI
10.1109/TMM.2024.3410532
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
This paper studies the problem of pre-training small models, which is essential for many mobile devices. Current state-of-the-art methods transfer the representational knowledge of a large network (the teacher) into a smaller model (the student) via self-supervised distillation, improving the performance of the small model on downstream tasks. However, existing approaches fall short of extracting, during distillation, the knowledge that is crucial for discerning categories in downstream tasks. In this paper, we introduce language guidance into the distillation process for the first time and propose a new method, Language-Guided Distillation (LGD), which uses the category names of the target downstream task to help refine the knowledge transferred between the teacher and the student. To this end, we utilize a pre-trained text encoder to extract semantic embeddings from language and construct a textual semantic space called the Textual Semantics Bank (TSB). Furthermore, we design a Language-Guided Knowledge Aggregation (LGKA) module to construct the visual semantic space, named the Visual Semantics Bank (VSB). Task-related knowledge is transferred by driving the student encoder to mimic the similarity-score distribution inferred by the teacher over the TSB and VSB. Compared with other small models obtained by either ImageNet pre-training or self-supervised distillation, experimental results show that the lightweight model distilled with the proposed LGD method achieves state-of-the-art performance on various downstream tasks, including classification, detection, and segmentation.
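To make the core mechanism in the abstract concrete, the sketch below illustrates, in generic PyTorch, what "driving the student encoder to mimic the similarity-score distribution inferred by the teacher over a semantic bank" can look like. This is a minimal, hypothetical illustration derived only from the abstract, not the authors' implementation: the function name bank_distillation_loss, the KL-divergence objective, the temperature value, and all tensor shapes are assumptions.

    # Minimal sketch of a bank-based distillation loss, assuming the bank holds
    # text-derived semantic embeddings (e.g., encodings of category names).
    # NOT the authors' code; names, shapes, and the temperature are illustrative.
    import torch
    import torch.nn.functional as F

    def bank_distillation_loss(student_feat, teacher_feat, semantic_bank, temperature=0.07):
        """KL divergence between teacher and student similarity distributions
        over a semantic bank."""
        # L2-normalize so dot products are cosine similarities.
        s = F.normalize(student_feat, dim=-1)        # (B, D)
        t = F.normalize(teacher_feat, dim=-1)        # (B, D)
        bank = F.normalize(semantic_bank, dim=-1)    # (K, D)

        # Similarity of each image feature to every bank entry.
        s_logits = s @ bank.t() / temperature        # (B, K)
        t_logits = t @ bank.t() / temperature        # (B, K)

        # Student mimics the teacher's distribution over the bank.
        return F.kl_div(F.log_softmax(s_logits, dim=-1),
                        F.softmax(t_logits, dim=-1),
                        reduction="batchmean")

    if __name__ == "__main__":
        # Toy usage with random tensors standing in for encoder outputs.
        B, D, K = 8, 512, 100                        # batch, feature dim, bank size
        loss = bank_distillation_loss(torch.randn(B, D),
                                      torch.randn(B, D),
                                      torch.randn(K, D))
        print(loss.item())

In the paper's terminology, semantic_bank would correspond to the TSB (embeddings of category names from a pre-trained text encoder) or to the VSB produced by the LGKA module; how those banks are actually built and combined is described in the full paper.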
Pages: 10720-10730
Number of pages: 11