Frameless Graph Knowledge Distillation

Cited by: 0
Authors
Shi, Dai [1]
Shao, Zhiqi [2]
Gao, Junbin [2]
Wang, Zhiyong [3]
Guo, Yi [1]
Affiliations
[1] Western Sydney Univ, Sch Comp Data & Math Sci, Parramatta, NSW 2150, Australia
[2] Univ Sydney, Business Sch, Discipline Business Analyt, Sydney, NSW 2006, Australia
[3] Univ Sydney, Sch Comp Sci, Sydney, NSW 2050, Australia
Keywords
Computational modeling; Adaptation models; Analytical models; Knowledge engineering; Graph neural networks; Data models; Spectral analysis; Graph framelets; graph neural networks (GNNs); knowledge distillation (KD)
DOI
10.1109/TNNLS.2024.3442379
CLC number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation (KD) has shown great potential for transferring knowledge from a complex teacher model to a simple student model, so that a heavy learning task can be accomplished efficiently without losing much prediction accuracy. Recently, many attempts have been made to apply the KD mechanism to graph representation learning models such as graph neural networks (GNNs) in order to accelerate inference via student models. However, many existing KD-based GNNs use a multilayer perceptron (MLP) as a universal approximator in the student model to imitate the teacher model's behavior, without considering the graph knowledge held by the teacher. In this work, we provide a KD-based framework for multiscale GNNs, known as graph framelets, and prove that by adequately utilizing the graph knowledge supplied in a multiscale manner by the graph framelet decomposition, the student model is capable of adapting to both homophilic and heterophilic graphs and has the potential to alleviate the oversquashing issue with a simple yet effective graph surgery. Furthermore, we show, from both algebraic and geometric perspectives, how the graph knowledge supplied by the teacher is learned and digested by the student model. Comprehensive experiments show that our proposed model can achieve learning accuracy identical to, or even surpassing, that of the teacher model while maintaining a high inference speed.
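For readers unfamiliar with the distillation mechanism the abstract refers to, the following is a minimal, hypothetical PyTorch sketch of the standard soft-label (Hinton-style) teacher-student loss. It illustrates generic KD only, not the paper's framelet-based objective; the function name kd_loss, the temperature T, and the mixing weight alpha are assumptions introduced here for illustration.

# A minimal, hypothetical sketch of the generic soft-label distillation loss.
# This is NOT the paper's framelet-based objective; T and alpha are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            labels: torch.Tensor,
            T: float = 2.0,
            alpha: float = 0.5) -> torch.Tensor:
    """Hinton-style KD: hard-label cross-entropy plus a tempered KL term."""
    # Soft targets: align the student's tempered distribution with the teacher's.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard supervised loss on the ground-truth node labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

In the graph setting described above, student_logits would come from the student model (in this paper, one that consumes the framelet-decomposed graph knowledge) and teacher_logits from the pretrained GNN teacher, evaluated on the same nodes.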
Pages: 8125-8139
Page count: 15