A Teacher-Free Graph Knowledge Distillation Framework With Dual Self-Distillation

Times Cited: 1
Authors
Wu, Lirong [1 ]
Lin, Haitao [1 ]
Gao, Zhangyang [1 ]
Zhao, Guojiang [1 ]
Li, Stan Z. [1 ]
Affiliations
[1] Westlake Univ, Res Ctr Ind Future, AI Lab, Hangzhou 310000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Graph neural networks; Training; Self-supervised learning; Inference algorithms; Task analysis; Standards; Knowledge engineering; graph knowledge distillation; inference acceleration
DOI
10.1109/TKDE.2024.3374773
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Recent years have witnessed great success in handling graph-related tasks with Graph Neural Networks (GNNs). Despite the great academic success of GNNs, Multi-Layer Perceptrons (MLPs) remain the primary workhorse for practical industrial applications. One reason for this academic-industry gap is the neighborhood-fetching latency incurred by data dependency in GNNs. To bridge this gap, Graph Knowledge Distillation (GKD) has been proposed, usually built on a standard teacher-student architecture, to distill knowledge from a large teacher GNN into a lightweight student GNN or MLP. However, we find in this paper that neither teachers nor GNNs are necessary for graph knowledge distillation. We propose a Teacher-Free Graph Self-Distillation (TGS) framework that requires neither a teacher model nor GNNs during training or inference. More importantly, the proposed TGS framework is purely based on MLPs, where structural information is only implicitly used to guide dual knowledge self-distillation between each target node and its neighborhood. As a result, TGS enjoys the benefits of graph topology awareness during training but is free from data dependency at inference. Extensive experiments show that dual self-distillation greatly improves the performance of vanilla MLPs: TGS improves over vanilla MLPs by 15.54% on average and outperforms state-of-the-art GKD algorithms on six real-world datasets. In terms of inference speed, TGS infers 75x-89x faster than existing GNNs and 16x-25x faster than classical inference acceleration methods.
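To make the dual self-distillation idea in the abstract concrete, below is a minimal sketch assuming a PyTorch setup: a plain MLP produces node logits, and the graph structure is used only at training time, to distill each target node's prediction against an aggregate of its neighbors' predictions and vice versa. The names PlainMLP and dual_self_distillation_loss, the mean-aggregation of neighbor logits, the temperature tau, and the weight alpha are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PlainMLP(nn.Module):
        # Pure-MLP backbone: no neighborhood fetching is needed at inference time.
        def __init__(self, in_dim, hid_dim, n_classes):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hid_dim), nn.ReLU(),
                nn.Linear(hid_dim, n_classes),
            )

        def forward(self, x):
            return self.net(x)

    def dual_self_distillation_loss(logits, edge_index, labels, train_mask,
                                    tau=1.0, alpha=0.5):
        # Standard supervised term on labeled nodes.
        ce = F.cross_entropy(logits[train_mask], labels[train_mask])

        # Mean-aggregate neighbor logits per target node; this is the only place
        # the graph structure enters, and only during training.
        src, dst = edge_index
        n = logits.size(0)
        neigh = torch.zeros_like(logits).index_add_(0, dst, logits[src])
        deg = torch.zeros(n, device=logits.device).index_add_(
            0, dst, torch.ones(src.size(0), device=logits.device)).clamp(min=1)
        neigh = neigh / deg.unsqueeze(-1)

        log_p_node = F.log_softmax(logits / tau, dim=-1)
        log_p_neigh = F.log_softmax(neigh / tau, dim=-1)

        # Dual distillation: node -> neighborhood and neighborhood -> node, each
        # against a detached (stop-gradient) target distribution.
        kl_n2g = F.kl_div(log_p_node, log_p_neigh.detach(),
                          reduction="batchmean", log_target=True)
        kl_g2n = F.kl_div(log_p_neigh, log_p_node.detach(),
                          reduction="batchmean", log_target=True)
        return ce + alpha * (kl_n2g + kl_g2n)

A training step would compute logits = model(features) and minimize this loss over the edge list; at inference only model(features) is evaluated, which is where the freedom from neighborhood-fetching latency comes from.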
Pages: 4375-4385
Number of Pages: 11